From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S936960Ab3DJKpV (ORCPT ); Wed, 10 Apr 2013 06:45:21 -0400 Received: from mail-bk0-f52.google.com ([209.85.214.52]:56106 "EHLO mail-bk0-f52.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751586Ab3DJKpU (ORCPT ); Wed, 10 Apr 2013 06:45:20 -0400 MIME-Version: 1.0 In-Reply-To: <20130410113854.31734308@amdc308.digital.local> References: <1364804657-16590-1-git-send-email-jonghwa3.lee@samsung.com> <20130409123719.7399d5ad@amdc308.digital.local> <20130409184440.4cd87c1b@amdc308.digital.local> <20130410104452.661902af@amdc308.digital.local> <20130410113854.31734308@amdc308.digital.local> Date: Wed, 10 Apr 2013 12:45:17 +0200 Message-ID: Subject: Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor. From: Vincent Guittot To: Lukasz Majewski , sanjay rawat Cc: Daniel Lezcano , Lorenzo Pieralisi , Viresh Kumar , Jonghwa Lee , "Rafael J. Wysocki" , "linux-kernel@vger.kernel.org" , Linux PM list , "cpufreq@vger.kernel.org" , MyungJoo Ham , Kyungmin Park , Chanwoo Choi , "sw0312.kim@samsung.com" , Marek Szyprowski Content-Type: text/plain; charset=ISO-8859-1 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 10 April 2013 11:38, Lukasz Majewski wrote: > Hi Vincent, > >> On 10 April 2013 10:44, Lukasz Majewski >> wrote: >> > Hi Vincent, >> > >> >> >> >> >> >> On Tuesday, 9 April 2013, Lukasz Majewski >> >> wrote: >> >> > Hi Viresh and Vincent, >> >> > >> >> >> On 9 April 2013 16:07, Lukasz Majewski >> >> >> wrote: >> >> >> >> On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee >> >> >> > Our approach is a bit different than cpufreq_ondemand one. >> >> >> > Ondemand takes the per CPU idle time, then on that basis >> >> >> > calculates per cpu load. The next step is to choose the >> >> >> > highest load and then use this value to properly scale >> >> >> > frequency. 
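For readers following the thread: the ondemand behavior described above (per-CPU load derived from idle time, with the highest load driving the frequency choice) can be sketched roughly as follows. This is an illustrative model only - the frequency table, the 80% threshold, and the helper names are hypothetical, not the kernel implementation:

```python
# Hypothetical OPP table (kHz) - illustrative values, not a real SoC's.
FREQS_KHZ = [200_000, 400_000, 800_000, 1_000_000]

def cpu_load(wall_us, idle_us):
    """Load in percent over one sampling window, from the idle counter."""
    busy = wall_us - idle_us
    return 100 * busy // wall_us

def ondemand_target(per_cpu_idle_us, wall_us, up_threshold=80):
    """Pick a target frequency from the maximum per-CPU load.

    Mirrors the scheme described in the thread: jump straight to max
    when any CPU is above the threshold, otherwise scale with the load.
    """
    max_load = max(cpu_load(wall_us, idle) for idle in per_cpu_idle_us)
    if max_load >= up_threshold:
        return FREQS_KHZ[-1]
    # Scale proportionally, then round up to the next available step.
    target = FREQS_KHZ[-1] * max_load // 100
    return min(f for f in FREQS_KHZ if f >= target)
```

With a 100 ms window, one nearly busy CPU is enough to push the target to the top frequency - which is the behavior the rest of this thread contrasts with LAB.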
>> >> >> >
>> >> >> > On the other hand, LAB tries to model a different behavior:
>> >> >> >
>> >> >> > As a first step we applied Vincent Guittot's "pack small
>> >> >> > tasks" [*] patch to improve "race to idle" behavior:
>> >> >> > http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks
>> >> >>
>> >> >> Luckily he is part of my team :)
>> >> >>
>> >> >> http://www.linaro.org/linux-on-arm/meet-the-team/power-management
>> >> >>
>> >> >> BTW, he is using the ondemand governor for all his work.
>> >> >>
>> >> >> > Afterwards, we decided to investigate a different approach to
>> >> >> > power governing:
>> >> >> >
>> >> >> > Use the number of sleeping CPUs (not the maximal per-CPU
>> >> >> > load) to change frequency. We therefore depend on [*] to "pack"
>> >> >> > as many tasks onto a CPU as possible and allow the others to
>> >> >> > sleep.
>> >> >>
>> >> >> He packs only small tasks.
>> >> >
>> >> > What about packing not only small tasks? I will investigate the
>> >> > possibility of aggressively packing (even at the cost of
>> >> > performance degradation) as many tasks as possible onto a single
>> >> > CPU.
>> >>
>> >> Hi Lukasz,
>> >>
>> >> I've got the same comment on my current patch and I'm preparing a
>> >> new version that can pack tasks more aggressively based on the
>> >> same buddy mechanism. This will be done at the cost of
>> >> performance, of course.
>> >
>> > Can you share your development tree?
>>
>> The dev is not finished yet but I will share it as soon as possible.
>
> Ok
>
>>
>> >
>> >>
>> >>
>> >> >
>> >> > It seems a good idea for power consumption reduction.
>> >>
>> >> In fact, it's not always true and depends on several inputs, like
>> >> the number of tasks that run simultaneously.
>> >
>> > In my understanding, we can try to couple (affine) the maximal
>> > number of tasks with a CPU. Performance will decrease, but we will
>> > avoid the costs of task migration.
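A rough sketch of the LAB idea as described in this thread - choosing a frequency limit from the number of idle CPUs rather than from per-CPU load. The table values below are illustrative guesses, not taken from the actual patch; only the "all cores busy -> roughly 30% below max" point comes from the discussion:

```python
# Hypothetical maximum frequency for a 4-core SoC (kHz).
MAX_FREQ_KHZ = 1_000_000

# Frequency cap indexed by the number of idle CPUs. Only the 0-idle
# entry (~30% below max) reflects the thread; the rest are made up
# to show the monotonic shape of such a policy.
LAB_LIMIT = {
    0: 700_000,        # all cores busy: cap ~30% below max
    1: 800_000,
    2: 900_000,
    3: MAX_FREQ_KHZ,   # mostly idle: allow full speed
    4: MAX_FREQ_KHZ,
}

def lab_freq_limit(nr_idle_cpus):
    """Frequency ceiling from the idle-CPU count, clamped to 4 cores."""
    return LAB_LIMIT[min(nr_idle_cpus, 4)]
```

Note how this inverts ondemand's logic: the busier the whole package, the lower the ceiling, trading some performance for a bounded power (and thus temperature) envelope.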
>> >
>> > If I remember correctly, I've asked you about some testbench/test
>> > program for scheduler evaluation. I assume that nothing has changed
>> > and there isn't any "common" set of scheduler tests?
>>
>> There are a bunch of benchmarks used to evaluate the scheduler, like
>> hackbench and pgbench, but they generally fill all CPUs in order to
>> test max performance. Are you looking for that kind of benchmark?
>
> I'd rather see a slightly different set of tests - something similar
> to the "cyclic" tests for the PREEMPT_RT patch.
>
> For sched work it would be welcome to spawn a lot of processes with
> different durations and workloads, and on this basis observe whether
> e.g. 2 or 3 processors are idle.

Sanjay is working on something like that:
https://git.linaro.org/gitweb?p=people/sanjayrawat/cyclicTest.git;a=shortlog;h=refs/heads/master

>
>>
>> >
>> >>
>> >> >
>> >> >> And if there are many small tasks we are
>> >> >> packing, then load must be high and so the ondemand gov will
>> >> >> increase the freq.
>> >> >
>> >> > This is of course true for "packing" all tasks to a single CPU.
>> >> > If we stay within the power consumption envelope, we can even
>> >> > overclock the frequency.
>> >> >
>> >> > But what if others - let's say 3 CPUs - are under heavy workload?
>> >> > Ondemand will switch frequency to maximum, and as Jonghwa pointed
>> >> > out this can cause a dangerous temperature increase.
>> >>
>> >> IIUC, your main concern is to stay within a power consumption
>> >> budget so as not to overheat and face the side effects of high
>> >> temperature, like a decrease in power efficiency. So your governor
>> >> modifies the max frequency based on the number of running/idle CPUs
>> > Yes, this is correct.
>> >
>> >> to have an
>> >> almost stable power consumption?
>> >
>> > From our observations it seems that for 3 or 4 running CPUs under
>> > heavy load we see a much larger power consumption reduction.
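The kind of observation such a test bench would need - spawn workloads, sample per-CPU counters twice (e.g. in the format /proc/stat exposes), and report which CPUs stayed idle in between - might look like this. The sample format and the 95% threshold are assumptions for illustration:

```python
def idle_cpus(before, after, idle_fraction=0.95):
    """Report CPUs that were (almost) fully idle between two samples.

    Each sample maps a CPU name to (total_jiffies, idle_jiffies),
    as could be parsed from /proc/stat on Linux.
    """
    idle = []
    for cpu, (tot0, idle0) in before.items():
        tot1, idle1 = after[cpu]
        dt = tot1 - tot0
        # Count a CPU as idle if it spent >= idle_fraction of the
        # interval in the idle state.
        if dt and (idle1 - idle0) / dt >= idle_fraction:
            idle.append(cpu)
    return idle
```

A bench built around this could spawn processes of varying duration and workload, then check after each interval whether e.g. 2 or 3 processors remained idle, as suggested above.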
>>
>> That's logical, because you will reduce the voltage.
>>
>> >
>> > To put it another way - ondemand would increase the frequency to
>> > max for all 4 CPUs. On the other hand, if user experience drops to
>> > an acceptable level, we can reduce power consumption.
>> >
>> > Reducing frequency and CPU voltage (by DVS) has the side effect
>> > that temperature stays at an acceptable level.
>> >
>> >>
>> >> Have you also looked at the power clamp driver, which has a
>> >> similar target?
>> >
>> > I might be wrong here, but in my opinion the power clamp driver is
>> > a bit different:
>>
>> yes, it periodically forces the cluster into a low power state
>>
>> >
>> > 1. It is dedicated to Intel SoCs, which provide a special set of
>> > registers (i.e. MSR_PKG_Cx_RESIDENCY [*]) that force a processor
>> > to enter a certain C state for a given duration. The idle duration
>> > is calculated by a per-CPU set of high-priority kthreads (which
>> > also program the [*] registers).
>>
>> IIRC, a trial on an ARM platform has been done by Lorenzo and Daniel.
>> Lorenzo, Daniel, do you have more information?
>
> More information would be welcome :-)
>
>>
>> >
>> > 2. ARM SoCs don't have such infrastructure, so we depend on SW
>> > here. The scheduler has to remove tasks from a particular CPU and
>> > "execute" the idle_task on it.
>> > Moreover, on Exynos4 the thermal control loop depends on SW, since
>> > we can only read the SoC temperature via the TMU (Thermal
>> > Management Unit) block.
>>
>> The idle duration is quite small and should not perturb normal
>> behavior
>
> What do you mean by "small idle duration"? Do you mean the exact time
> needed to enter the idle state (ARM's WFI), or the time during which
> the CPU is idle?
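For comparison, the idle-injection idea behind the power clamp driver, combined with the kind of SW thermal loop mentioned for Exynos4, could be modeled very roughly as below. All names, step sizes, windows, and limits are hypothetical, purely to illustrate the control scheme, not the driver's actual code:

```python
def adjust_idle_ratio(ratio_pct, temp_mc, target_mc, step=5, max_pct=50):
    """Simple SW thermal loop: raise the injected idle ratio when the
    SoC temperature (millidegrees C, e.g. as read from a TMU) exceeds
    the target, and lower it otherwise."""
    if temp_mc > target_mc:
        return min(ratio_pct + step, max_pct)
    return max(ratio_pct - step, 0)

def forced_idle_us(window_us, ratio_pct):
    """How long per control window the high-priority idle thread would
    keep its CPU in a low-power state."""
    return window_us * ratio_pct // 100
```

The point of the question above is visible here: with a short window, each forced-idle slice is small, so normal scheduling behavior should not be perturbed much.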
The time in which CPUs are idle

Vincent

>
>>
>> Vincent
>> >
>> >
>> > Correct me again, but it seems to me that on ARM we can use CPU
>> > hotplug (which, as Thomas Gleixner stated recently, is going to be
>> > "refactored" :-) ) or "ask" the scheduler to use the smallest
>> > possible number of CPUs and enter a C state for idling CPUs.
>> >
>> >>
>> >> Vincent
>> >>
>> >> >> > On the contrary, when all cores are heavily loaded, we
>> >> >> > decided to reduce the frequency by around 30%. With this
>> >> >> > approach the user experience reduction is still acceptable
>> >> >> > (with much less power consumption).
>> >> >>
>> >> >> Don't know.. running many CPUs at a lower freq for a long
>> >> >> duration will probably take more power than running them at a
>> >> >> high freq for a short duration and making the system idle again.
>> >> >>
>> >> >> > We have posted this "RFC" patch mainly for discussion, and I
>> >> >> > think it fits its purpose :-).
>> >> >>
>> >> >> Yes, no issues with your RFC idea.. it's perfect..
>> >> >>
>> >> >> @Vincent: Can you please follow this thread a bit and tell us
>> >> >> what your views are?
>> >> >>
>> >> >> --
>> >> >> viresh
>> >> >
>> >> > --
>> >> > Best regards,
>> >> >
>> >> > Lukasz Majewski
>> >> >
>> >> > Samsung R&D Poland (SRPOL) | Linux Platform Group