Subject: Re: [PATCH V3 2/2] thermal: cpufreq_cooling: Reuse sched_cpu_util() for SMP platforms
To: Viresh Kumar
Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Amit Daniel Kachhap,
    Daniel Lezcano, Javi Merino, Zhang Rui, Amit Kucheria,
    linux-kernel@vger.kernel.org, Quentin Perret, linux-pm@vger.kernel.org
From: Lukasz Luba
Date: Mon, 23 Nov 2020 11:34:44 +0000
In-Reply-To: <20201123104119.g46y6idk734pw7fl@vireshk-i7>

On 11/23/20 10:41 AM, Viresh Kumar wrote:
> On 20-11-20, 14:51, Lukasz Luba wrote:
>> On 11/19/20 7:38 AM, Viresh Kumar wrote:
>>> Scenario 1: The CPUs were mostly idle in the previous polling window
>>> of the IPA governor, as the tasks were sleeping. Here are the details
>>> from the traces (load is in %):
>>>
>>> Old: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=203 load={{0x35,0x1,0x0,0x31,0x0,0x0,0x64,0x0}} dynamic_power=1339
>>> New: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=600 load={{0x60,0x46,0x45,0x45,0x48,0x3b,0x61,0x44}} dynamic_power=3960
>>>
>>> Here, the "Old" line gives the load and requested_power
>>> (dynamic_power here) numbers calculated using the idle-time based
>>> implementation, while "New" is based on the CPU utilization from the
>>> scheduler.
>>>
>>> As can be clearly seen, the load and requested_power numbers are
>>> simply incorrect in the idle-time based approach, and the numbers
>>> collected from the CPU's utilization are much closer to reality.
>>
>> That contradicts the 'Scenario 1' description, doesn't it?
>
> At least I didn't think so when I wrote this, and am still not sure :)
>
>> Frequency at 1.2GHz, 75% total_load, power 4W... I'd say if the CPUs
>> were mostly idle, then 1.3W would better reflect that state.
>
> The CPUs were idle because the tasks were sleeping, but once the tasks
> resume work, we need a frequency that matches the real load of the
> tasks. This is exactly what schedutil would ask for as well, since it
> uses the same metric, so we should be looking to ask for the same
> power budget.

Yes, agree.
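BTW, for anyone following the thread: the way I read the new SMP path,
it boils down to something like this (a paraphrase from my notes, not
the exact hunk; sched_cpu_util() here is the two-argument form that
patch 1/2 of this series introduces):

/*
 * Sketch of the util-based load (paraphrased, not the exact patch hunk).
 * The result is the load in % of the CPU's max capacity, matching what
 * the old idle-time based path returned.
 */
static u32 get_load(struct cpufreq_cooling_device *cpufreq_cdev, int cpu,
		    int cpu_idx)
{
	unsigned long max = arch_scale_cpu_capacity(cpu);
	unsigned long util = sched_cpu_util(cpu, max);

	return (util * 100) / max;
}

So the load is now the PELT view of the CPU in percent of max capacity,
rather than (busy_time / window_time) over the last polling window only.
That is also why the two trace lines can disagree so much whenever the
IPA window and the PELT horizon don't line up.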
>> What was the IPA period in your setup?
>
> It is 100 ms by default, though I remember that I tried with 10 ms as
> well.
>
>> It depends on your platform's IPA period (e.g. 100ms) and the current
>> runqueue state (at that sampling point in time). The PELT decay/rise
>> period is different. I am not sure the system's average load over the
>> last e.g. 100ms is what you observe when looking at these signals.
>> Maybe the IPA period is too short/long and couldn't catch up with the
>> PELT signals? But we don't want too short an averaging period, since
>> 16ms is a display tick.
>>
>> IMHO, based on this result, it looks like the util either lost older
>> information from the past or hadn't converged down to this low load
>> yet.
>>
>>> Scenario 2: The CPUs were busy in the previous polling window of the
>>> IPA governor:
>>>
>>> Old: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=800 load={{0x64,0x64,0x64,0x64,0x64,0x64,0x64,0x64}} dynamic_power=5280
>>> New: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=708 load={{0x4d,0x5c,0x5c,0x5b,0x5c,0x5c,0x51,0x5b}} dynamic_power=4672
>>>
>>> As can be seen, the idle-time based load is 100% for all the CPUs, as
>>> it took only the last window into account, but in reality the CPUs
>>> aren't that loaded, as the utilization numbers show.
>>
>> This is also odd: ~88% of total_load. It looks like the signal had
>> started decaying, hadn't converged to 100% yet, or some task vanished?
>
> They must have decayed a bit because of the idle period, so that looks
> okay.

I have experimented with this new estimation and compared it against a
real power meter and other models. It looks good, better than the
current mainline. I will continue the experiments, but this patch LGTM
and I will add my Reviewed-by today (after finishing them).

It would make more sense to adjust the IPA period to the util signal
than the other way around. I have to play with this a bit...

Regards,
Lukasz
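p.s. A quick back-of-the-envelope for the window-vs-PELT interaction
above, assuming the standard 32ms PELT half-life (y^32 = 0.5). This is
just a small userspace model I used to sanity-check the numbers, not
kernel code:

#include <math.h>
#include <stdio.h>

int main(void)
{
	/* Per-millisecond PELT decay factor: y such that y^32 = 0.5 */
	double y = pow(0.5, 1.0 / 32.0);
	double window_ms = 100.0;	/* default IPA polling period */

	/* Fraction of util left after a fully idle 100ms window */
	printf("decays to %.1f%% of the old value\n",
	       100.0 * pow(y, window_ms));

	/* Fraction of max reached after a fully busy 100ms window,
	 * starting from util ~0 */
	printf("rises to %.1f%% of max\n",
	       100.0 * (1.0 - pow(y, window_ms)));

	return 0;
}

It prints ~11.5% for the decay and ~88.5% for the rise. The rise number
is curiously close to Scenario 2's total_load=708 out of 800 (~88.5%),
which would fit the "didn't converge to 100% yet" reading.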