Date: Mon, 12 Oct 2020 17:30:32 +0100
From: Ionela Voinescu
To: Lukasz Luba
Subject: Re: [PATCH v2 2/2] [RFC] CPUFreq: Add support for cpu-perf-dependencies
Message-ID: <20201012163032.GA30838@arm.com>
Hi Lukasz,

On Monday 12 Oct 2020 at 14:48:20 (+0100), Lukasz Luba wrote:
> On 10/12/20 11:59 AM, Ionela Voinescu wrote:
> > On Monday 12 Oct 2020 at 11:22:57 (+0100), Lukasz Luba wrote:
> > [..]
> > > > > I thought about it and looked for other platforms' DT to see if we
> > > > > can reuse existing opp information. Unfortunately I don't think it
> > > > > is optimal. The reason being that cpus having the same opp table
> > > > > does not necessarily mean that they share a clock wire. It just
> > > > > tells us that they have the same capabilities (literally just tells
> > > > > us they have the same V/f op points). Unless I am missing something?
> > > > >
> > > > > When comparing with ACPI/_PSD it becomes more intuitive that there
> > > > > is no equivalent way to reveal "perf-dependencies" in DT.
> > > >
> > > > You should be able to, by examining the clock tree. But perhaps SCMI
> > > > abstracts all that and just presents virtual clocks without parent
> > > > clocks available to determine what clocks are shared? Fix SCMI if
> > > > that's the case.
> > >
> > > True, the SCMI clock does not support discovery of the clock tree:
> > > (from 4.6.1 Clock management protocol background)
> > > 'The protocol does not cover discovery of the clock tree, which must be
> > > described through firmware tables instead.' [1]
> > >
> > > In this situation, would it make sense, instead of this binding from
> > > patch 1/2, to create a binding for an internal firmware/scmi node?
> > >
> > > Something like:
> > >
> > > firmware {
> > > 	scmi {
> > > 	...
> > > 		scmi-perf-dep {
> > > 			compatible = "arm,scmi-perf-dependencies";
> > > 			cpu-perf-dep0 {
> > > 				cpu-perf-affinity = <&CPU0>, <&CPU1>;
> > > 			};
> > > 			cpu-perf-dep1 {
> > > 				cpu-perf-affinity = <&CPU3>, <&CPU4>;
> > > 			};
> > > 			cpu-perf-dep2 {
> > > 				cpu-perf-affinity = <&CPU7>;
> > > 			};
> > > 		};
> > > 	};
> > > };
> > >
> > > The code which is going to parse the binding would be inside the
> > > scmi perf protocol code and used via an API by scmi-cpufreq.c.
> > >
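(For the record, parsing such a node could look roughly like the sketch
below. This is only a sketch against your proposed, not-yet-standardised
names - "scmi-perf-dep" and "cpu-perf-affinity" - the helper name is made
up and the caller is assumed to provide the array of cpumasks:)

#include <linux/of.h>
#include <linux/cpumask.h>

/* Hypothetical parser for the proposed binding above; not existing code. */
static int scmi_parse_perf_deps(struct device_node *scmi_np,
				cpumask_var_t *masks, int max_deps)
{
	struct device_node *deps, *child, *cpu_np;
	int ndeps = 0;

	deps = of_get_child_by_name(scmi_np, "scmi-perf-dep");
	if (!deps)
		return 0;	/* no dependency information provided */

	for_each_child_of_node(deps, child) {
		int i = 0, cpu;

		if (ndeps >= max_deps) {
			of_node_put(child);
			break;
		}

		/* each phandle in cpu-perf-affinity points at a cpu node */
		while ((cpu_np = of_parse_phandle(child, "cpu-perf-affinity",
						  i++))) {
			cpu = of_cpu_node_to_id(cpu_np);
			if (cpu >= 0)
				cpumask_set_cpu(cpu, masks[ndeps]);
			of_node_put(cpu_np);
		}
		ndeps++;
	}

	of_node_put(deps);
	return ndeps;	/* number of dependency domains found */
}

Each cpu-perf-depN child would then yield one cpumask per dependency
domain, which scmi-cpufreq.c could query via the perf protocol ops.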
> >
> > While SCMI cpufreq would be able to benefit from the functionality that
> > Nicola is trying to introduce, it's not the only driver, and more
> > importantly, it's not *going* to be the only driver benefiting from
> > this.
> >
> > Currently there is also qcom-cpufreq-hw.c and the future
> > mediatek-cpufreq-hw.c that is currently under review [1]. They both do
> > their frequency setting by interacting with HW/FW, and could either take
> > or update their OPP tables from there. Therefore, if the platform
> > required it, they could also expose different controls for frequency
> > setting and could benefit from additional information about clock
> > domains (either through opp-shared or the new entries in Nicola's
> > patch), without driver changes.
> >
> > Another point to be made is that I strongly believe this is going to be
> > the norm in the future. Directly setting PLLs and regulator voltages
> > has been proven unsafe and insecure.
> >
> > Therefore, I see this as support for a generic cpufreq feature (a
> > hardware coordination type), rather than support for a specific driver.
> >
> > [1] https://lkml.org/lkml/2020/9/10/11
> >
> > >
> > > Now regarding the 'dependent_cpus' mask.
> > >
> > > We could avoid adding a new field 'dependent_cpus' in the policy
> > > struct, but I am not sure about one bit - the Frequency Invariant
> > > Engine (which is also not fixed by just adding a new cpumask).
> >        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >        Let's take it step by step..
> > >
> > > We have 3 subsystems to fix:
> > > 1. EAS - EM has an API function which takes a custom cpumask, so no issue,
> >    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >    keep in mind that EAS is using the max aggregation method
> >    that schedutil is using. So if we are to describe the
> >    functionality correctly, it needs both a cpumask describing
> >    the frequency domains and an aggregation method.
>
> EAS does not use schedutil max aggregation, it calculates max_util
> internally.
> But isn't it the same logical mechanism that schedutil uses?
> The compute_energy() loops through the CPUs in the domain and
> takes the utilization from them via schedutil_cpu_util(cpu_rq(cpu)).
> It figures out max_util and then em_cpu_energy() maps it to the next
  ^^^^^^^^^^^^^^^^^^^^^^^

Same for schedutil: sugov_next_freq_shared() calls sugov_get_util()
which then calls schedutil_cpu_util().

If your point is that one is applying the max function in
compute_energy() while the other is doing it in sugov_next_freq_shared(),
I'll reinforce my argument that they are logically doing the same *type*
of aggregation. EAS relies on it and schedutil was purposely modified for
this purpose:

938e5e4b0d15 sched/cpufreq: Prepare schedutil for Energy Aware Scheduling
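(To make "same *type* of aggregation" concrete: both logically reduce the
per-CPU utilization of a frequency domain to a single value with max().
A simplified illustration only, not the actual kernel code -
cpu_util_estimate() is a made-up stand-in for schedutil_cpu_util():)

/* Illustration: the aggregation pattern shared by EAS (compute_energy())
 * and schedutil (sugov_next_freq_shared()), reduced to its core.
 */
static unsigned long domain_max_util(const struct cpumask *domain_cpus)
{
	unsigned long max_util = 0;
	int cpu;

	for_each_cpu(cpu, domain_cpus)
		max_util = max(max_util, cpu_util_estimate(cpu));

	/* one utilization value -> one frequency for the whole domain */
	return max_util;
}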
> frequency for the cluster. It just needs proper utilization from
> CPUs, which is taken from the run-queues, which is a sum of the
> utilization of the tasks on them. This leads to the problem of how we
> account the utilization of a task. This is the place where the FIE is
> involved. EAS assumes the utilization is calculated properly.

This is separate. Above we were discussing the aggregation method and
what CPUs this is applied on. I'll continue on FIE below.

> >
> > > fix would be to use it via the scmi-cpufreq.c
> >
> > > 2. IPA (for calculating the power of a cluster; not the whole of
> > > thermal needs this knowledge about 'dependent cpus') - this can be
> > > fixed internally
> >
> > > 3. Frequency Invariant Engine (FIE) - currently it relies on schedutil
> > > filtering and providing the max freq of all cpus in the cluster to the
> > > FIE; this info is then propagated to all 'related_cpus', which will
> > > have this freq (we know, because there are no other freq requests);
> > > Issues:
> > > 3.1. Schedutil is not going to check all cpus in the cluster to take
> > > the max freq, which is then passed into the cpufreq driver and FIE
> > > 3.2. FIE would have to (or maybe we would drop it) have a logic similar
> > > to what schedutil does (max freq search and set, then filter the next
> > > freq requests from other cpus in the next period, e.g. 10ms)
> > > 3.3. Schedutil is going to invoke a freq change for each cpu
> > > independently and the current code just calls arch_set_freq_scale() -
> > > adding just 'dependent_cpus' won't help
> >
> > I don't believe these are issues. As we need changes for EAS and IPA,
> > we'd need changes for FIE. We don't need more than the cpumask that
> > shows frequency domains, as we already have the aggregation method that
> > schedutil uses to propagate the max frequency in a domain across CPUs.
>
> Schedutil is going to work in !policy_is_shared() mode, which leads to
> sugov_update_single() being the 'main' function. We won't have the
> schedutil goodness which handles the related_cpus use case.
>

Agreed! I did not mean that I'd rely on schedutil to do the aggregation
and hand me the answer. But my suggestion is to use the same logical
method - maximum - for cases where counters are not present.

> Then in software FIE would you just change the call from:
> arch_set_freq_scale(policy->related_cpus,...)
> to:
> arch_set_freq_scale(policy->dependent_cpus,...)
> ?
>
> This code would be called from any CPU (without filtering) and it
> would loop through the cpumask updating freq_scale, which is wrong IMO.
> You need some 'logic', which is not currently in there.
>

Definitely! But that's because the FIE changes above are incomplete.
That's why whoever does these changes should go beyond
s/related_cpus/dependent_cpus. We don't need more information from DT in
addition to this dependent_cpus mask, but that does not mean the end
solution for making use of it will be a simple
"s/related_cpus/dependent_cpus".

> Leaving 'related_cpus' would also be wrong (because the real CPU
> frequency is different, so we would account task utilization wrongly).
>
> >
> > This would be the default method if cycle counters are not present. It
> > might not reflect the frequency the cores actually get from HW, but for
> > that cycle counters should be used.
>
> IMHO configurations with per-cpu freq requests, where there are
> 'dependent' CPUs and no HW counters to use for task utilization
> accounting, should be blocked. Then we don't need 'dependent_cpus' in
> software FIE, and that's one less item on your requirements list for the
> new cpumask.
>

I'd go for a default.. better to have something than to remove it
altogether, but we'll see.

I'll stop this here as I think we're getting distracted a bit from the
main purpose of this RFC. I don't believe FIE brings an additional
requirement. "Software" FIE will need fixing/optimizing/bypassing (we'll
agree later on the implementation) but it does not need anything else
from DT/ACPI.

Thank you,
Ionela.

> > > 3.4 What would be the real frequency of these cpus and what would be
> > > set in FIE?
> > > 3.5 FIE is going to filter requests from other dependent cpus too soon?
> > >
> > > IMHO the FIE needs more bits than just a new cpumask.
> > > Maybe we should consider moving the FIE arch_set_freq_scale() call
> > > into the cpufreq driver, which will know better how to
> > > aggregate/filter requests and then call the FIE update?
> >
> > I'm quite strongly against this :). As described before, this is not a
> > feature that a single driver needs, and even if it was, the aggregation
> > method for FIE is not a driver policy.
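(Sketching that direction - and only sketching it: 'dependent_cpus' does
not exist yet, locking/filtering is hand-waved away, and
arch_set_freq_scale() is assumed to have its current arch_topology
signature - the aggregation would live in generic FIE code, e.g.:)

#include <linux/cpufreq.h>
#include <linux/percpu.h>

/* Hand-wavy sketch, not a real patch: remember the last frequency
 * requested by each CPU and apply the max of the requests across the
 * (hypothetical) dependent_cpus mask to the whole domain.
 * arch_set_freq_scale() comes from the arch's asm/topology.h.
 */
static DEFINE_PER_CPU(unsigned long, last_freq_req);

static void fie_update_scale(struct cpufreq_policy *policy,
			     unsigned int cpu, unsigned long freq)
{
	unsigned long domain_freq = 0;
	int i;

	WRITE_ONCE(per_cpu(last_freq_req, cpu), freq);

	/* the domain effectively runs at the max of the per-CPU requests */
	for_each_cpu(i, policy->dependent_cpus)	/* hypothetical mask */
		domain_freq = max(domain_freq,
				  READ_ONCE(per_cpu(last_freq_req, i)));

	arch_set_freq_scale(policy->dependent_cpus, domain_freq,
			    policy->cpuinfo.max_freq);
}

The point being that this aggregation belongs with FIE itself, not with
any particular driver.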
>
> The software version of FIE has issues in this case; schedutil or EAS
> won't help (different code path).
>
> Regards,
> Lukasz