* [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor. @ 2013-04-01 8:24 Jonghwa Lee 2013-04-01 8:24 ` [RFC PATCH 1/2] cpuidle: Add idle enter/exit time stamp for notifying current idle state Jonghwa Lee ` (2 more replies) 0 siblings, 3 replies; 26+ messages in thread From: Jonghwa Lee @ 2013-04-01 8:24 UTC (permalink / raw) To: Rafael J. Wysocki Cc: linux-kernel, linux-pm, cpufreq, MyungJoo Ham, Lukasz Majewski, Kyungmin Park, Chanwoo Choi, sw0312.kim, m.szyprowski, Jonghwa Lee This patchset adds new cpufreq governor named LAB(Legacy Application Boost). Basically, this governor is based on ondemand governor. ** Introduce LAB (Legacy Application Boost) governor <<Purpose>> One of the problem of ondemand is that it considers the most busy cpu only while doesn't care how many cpu is in busy state at the moment. This may results in unnecessary power consumption, and it'll be critical for the system having limited power source. To get the best energy efficiency, LAB governor considers not only idle time but also the number of idle cpus. It primarily focuses on supplying adequate performance to users within the limited resource. It checks the number of idle cpus then controls the maximum frequency dynamically. And it also applies different frequency increasing level depends on that information. In simple terms the less the number of busy cpus, the better performance will be given. In addition, stable system's power consumption in the busy state can be achieved also with using LAB governor. This will help to manage and estimate power consumption in certain system. <<Algorithm>> - Count the idle cpus : Number of idle cpus are determined by historical result. It stores cpu idle usage derived by dividing cpu idling time with a period at every time when cpufreq timer is called. And it calculates average of idle usage from the most recent stored data. It uses cpu state usage information from per-cpu cpuidle_device. However, because that cpu state usage information is updated only when cpu exits the state, so the information may differ with the real one while cpu is in idle at checking time. To detect whether cpu is idling, it uses timestamps of cpuidle devices. It needs to re-calculate idle state usage for the following cases. The possible 3 cases which cpu is in idle at checking time. (Shaded section represents staying in idle) 1) During the last period, idle state would be broken more than once. |____________|__ _____|_______ | |////////////|//|____|/////|//____________| t1 t2 @ t4 2) During the last period, there was no idle state entered, current idle is the first one. | | ________|_______ | |____________|____|////////|////__________| t1 t2 @ t4 3) During the last whole period, the core is in idle state. | ______|_____________|________ | |_____|//////|/////////////|////__________| t1 t2 @ t4 (@ : Current checking point) After calculating idle state usage, it decides whether cpu was idle in the last period with the idle threshold. If average of idle usage is bigger than threshold, then it treats the cpu as a idle cpu. For the test, I set default threshold value to 90. - Limitation of maximum frequency : With information got from counting the idle cpus phase, it sets maximum frequency for the next period. By default, it limits the current policy's maximum frequency by from 0 to 35% depends on the number of idle cpus. - Setting next frequency LAB governor changes the frequency step by step like the conservative governor. However, in LAB governor, the step is changing dynamically depending on how many cpus are in idle. next_freq = current_freq + current_idle_cpus * increasing_step; <<To do>> The prototype of this feature was developed as a cpuidle driver, so it it uses cpuidle framework information temporarily. I'd like to use the data of per-cpu tick_sched variable, it has all information that I want exactly but it can't be accessed from cpufreq side. I tested this patch on pegasus qaud board. Any comments are welcomed. Jonghwa Lee (2): cpuidle: Add idle enter/exit time stamp for notifying current idle state. cpufreq: Introduce new cpufreq governor, LAB(Legacy Application Boost). drivers/cpufreq/Kconfig | 26 ++ drivers/cpufreq/Makefile | 1 + drivers/cpufreq/cpufreq_governor.h | 14 + drivers/cpufreq/cpufreq_lab.c | 553 ++++++++++++++++++++++++++++++++++++ drivers/cpuidle/cpuidle.c | 8 +- include/linux/cpufreq.h | 3 + include/linux/cpuidle.h | 4 + 7 files changed, 605 insertions(+), 4 deletions(-) create mode 100644 drivers/cpufreq/cpufreq_lab.c -- 1.7.9.5 ^ permalink raw reply [flat|nested] 26+ messages in thread
* [RFC PATCH 1/2] cpuidle: Add idle enter/exit time stamp for notifying current idle state. 2013-04-01 8:24 [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor Jonghwa Lee @ 2013-04-01 8:24 ` Jonghwa Lee 2013-04-02 5:00 ` Daniel Lezcano 2013-04-01 8:24 ` [RFC PATCH 2/2] cpufreq: Introduce new cpufreq governor, LAB(Legacy Application Boost) Jonghwa Lee 2013-04-01 15:37 ` [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor Viresh Kumar 2 siblings, 1 reply; 26+ messages in thread From: Jonghwa Lee @ 2013-04-01 8:24 UTC (permalink / raw) To: Rafael J. Wysocki Cc: linux-kernel, linux-pm, cpufreq, MyungJoo Ham, Lukasz Majewski, Kyungmin Park, Chanwoo Choi, sw0312.kim, m.szyprowski, Jonghwa Lee This patch adds idle state time stamp to cpuidle device structure to notify its current idle state. If last enter time is newer than last exit time, then it means that the core is in idle now. Signed-off-by: Jonghwa Lee <jonghwa3.lee@samsung.com> --- drivers/cpuidle/cpuidle.c | 8 ++++---- include/linux/cpuidle.h | 4 ++++ 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c index eba6929..1e830cc 100644 --- a/drivers/cpuidle/cpuidle.c +++ b/drivers/cpuidle/cpuidle.c @@ -233,18 +233,18 @@ int cpuidle_wrap_enter(struct cpuidle_device *dev, int (*enter)(struct cpuidle_device *dev, struct cpuidle_driver *drv, int index)) { - ktime_t time_start, time_end; s64 diff; - time_start = ktime_get(); + dev->last_idle_start = ktime_get(); index = enter(dev, drv, index); - time_end = ktime_get(); + dev->last_idle_end = ktime_get(); local_irq_enable(); - diff = ktime_to_us(ktime_sub(time_end, time_start)); + diff = ktime_to_us(ktime_sub(dev->last_idle_end, + dev->last_idle_start)); if (diff > INT_MAX) diff = INT_MAX; diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h index 480c14d..d1af05f 100644 --- a/include/linux/cpuidle.h +++ b/include/linux/cpuidle.h @@ -16,6 +16,7 @@ #include <linux/kobject.h> #include <linux/completion.h> #include <linux/hrtimer.h> +#include <linux/ktime.h> #define CPUIDLE_STATE_MAX 8 #define CPUIDLE_NAME_LEN 16 @@ -74,6 +75,9 @@ struct cpuidle_device { struct kobject kobj; struct completion kobj_unregister; + ktime_t last_idle_start; + ktime_t last_idle_end; + #ifdef CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED int safe_state_index; cpumask_t coupled_cpus; -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [RFC PATCH 1/2] cpuidle: Add idle enter/exit time stamp for notifying current idle state. 2013-04-01 8:24 ` [RFC PATCH 1/2] cpuidle: Add idle enter/exit time stamp for notifying current idle state Jonghwa Lee @ 2013-04-02 5:00 ` Daniel Lezcano 0 siblings, 0 replies; 26+ messages in thread From: Daniel Lezcano @ 2013-04-02 5:00 UTC (permalink / raw) To: Jonghwa Lee Cc: Rafael J. Wysocki, linux-kernel, linux-pm, cpufreq, MyungJoo Ham, Lukasz Majewski, Kyungmin Park, Chanwoo Choi, sw0312.kim, m.szyprowski On 04/01/2013 10:24 AM, Jonghwa Lee wrote: > This patch adds idle state time stamp to cpuidle device structure to > notify its current idle state. If last enter time is newer than last > exit time, then it means that the core is in idle now. > > Signed-off-by: Jonghwa Lee <jonghwa3.lee@samsung.com> > --- The patch description does not explain what problem you want to solve, how to solve it and the patch itself shows nothing. Could you elaborate ? Thanks -- Daniel > drivers/cpuidle/cpuidle.c | 8 ++++---- > include/linux/cpuidle.h | 4 ++++ > 2 files changed, 8 insertions(+), 4 deletions(-) > > diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c > index eba6929..1e830cc 100644 > --- a/drivers/cpuidle/cpuidle.c > +++ b/drivers/cpuidle/cpuidle.c > @@ -233,18 +233,18 @@ int cpuidle_wrap_enter(struct cpuidle_device *dev, > int (*enter)(struct cpuidle_device *dev, > struct cpuidle_driver *drv, int index)) > { > - ktime_t time_start, time_end; > s64 diff; > > - time_start = ktime_get(); > + dev->last_idle_start = ktime_get(); > > index = enter(dev, drv, index); > > - time_end = ktime_get(); > + dev->last_idle_end = ktime_get(); > > local_irq_enable(); > > - diff = ktime_to_us(ktime_sub(time_end, time_start)); > + diff = ktime_to_us(ktime_sub(dev->last_idle_end, > + dev->last_idle_start)); > if (diff > INT_MAX) > diff = INT_MAX; > > diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h > index 480c14d..d1af05f 100644 > --- a/include/linux/cpuidle.h > +++ b/include/linux/cpuidle.h > @@ -16,6 +16,7 @@ > #include <linux/kobject.h> > #include <linux/completion.h> > #include <linux/hrtimer.h> > +#include <linux/ktime.h> > > #define CPUIDLE_STATE_MAX 8 > #define CPUIDLE_NAME_LEN 16 > @@ -74,6 +75,9 @@ struct cpuidle_device { > struct kobject kobj; > struct completion kobj_unregister; > > + ktime_t last_idle_start; > + ktime_t last_idle_end; > + > #ifdef CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED > int safe_state_index; > cpumask_t coupled_cpus; > -- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC PATCH 1/2] cpuidle: Add idle enter/exit time stamp for notifying current idle state. @ 2013-04-02 5:00 ` Daniel Lezcano 0 siblings, 0 replies; 26+ messages in thread From: Daniel Lezcano @ 2013-04-02 5:00 UTC (permalink / raw) To: Jonghwa Lee Cc: Rafael J. Wysocki, linux-kernel, linux-pm, cpufreq, MyungJoo Ham, Lukasz Majewski, Kyungmin Park, Chanwoo Choi, sw0312.kim, m.szyprowski On 04/01/2013 10:24 AM, Jonghwa Lee wrote: > This patch adds idle state time stamp to cpuidle device structure to > notify its current idle state. If last enter time is newer than last > exit time, then it means that the core is in idle now. > > Signed-off-by: Jonghwa Lee <jonghwa3.lee@samsung.com> > --- The patch description does not explain what problem you want to solve, how to solve it and the patch itself shows nothing. Could you elaborate ? Thanks -- Daniel > drivers/cpuidle/cpuidle.c | 8 ++++---- > include/linux/cpuidle.h | 4 ++++ > 2 files changed, 8 insertions(+), 4 deletions(-) > > diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c > index eba6929..1e830cc 100644 > --- a/drivers/cpuidle/cpuidle.c > +++ b/drivers/cpuidle/cpuidle.c > @@ -233,18 +233,18 @@ int cpuidle_wrap_enter(struct cpuidle_device *dev, > int (*enter)(struct cpuidle_device *dev, > struct cpuidle_driver *drv, int index)) > { > - ktime_t time_start, time_end; > s64 diff; > > - time_start = ktime_get(); > + dev->last_idle_start = ktime_get(); > > index = enter(dev, drv, index); > > - time_end = ktime_get(); > + dev->last_idle_end = ktime_get(); > > local_irq_enable(); > > - diff = ktime_to_us(ktime_sub(time_end, time_start)); > + diff = ktime_to_us(ktime_sub(dev->last_idle_end, > + dev->last_idle_start)); > if (diff > INT_MAX) > diff = INT_MAX; > > diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h > index 480c14d..d1af05f 100644 > --- a/include/linux/cpuidle.h > +++ b/include/linux/cpuidle.h > @@ -16,6 +16,7 @@ > #include <linux/kobject.h> > #include <linux/completion.h> > #include <linux/hrtimer.h> > +#include <linux/ktime.h> > > #define CPUIDLE_STATE_MAX 8 > #define CPUIDLE_NAME_LEN 16 > @@ -74,6 +75,9 @@ struct cpuidle_device { > struct kobject kobj; > struct completion kobj_unregister; > > + ktime_t last_idle_start; > + ktime_t last_idle_end; > + > #ifdef CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED > int safe_state_index; > cpumask_t coupled_cpus; > -- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC PATCH 1/2] cpuidle: Add idle enter/exit time stamp for notifying current idle state. 2013-04-02 5:00 ` Daniel Lezcano (?) @ 2013-04-02 6:17 ` jonghwa3.lee 2013-04-02 7:34 ` Daniel Lezcano -1 siblings, 1 reply; 26+ messages in thread From: jonghwa3.lee @ 2013-04-02 6:17 UTC (permalink / raw) To: Daniel Lezcano Cc: Rafael J. Wysocki, linux-kernel, linux-pm, cpufreq, MyungJoo Ham, Lukasz Majewski, Kyungmin Park, Chanwoo Choi, sw0312.kim, m.szyprowski On 2013년 04월 02일 14:00, Daniel Lezcano wrote: > On 04/01/2013 10:24 AM, Jonghwa Lee wrote: >> This patch adds idle state time stamp to cpuidle device structure to >> notify its current idle state. If last enter time is newer than last >> exit time, then it means that the core is in idle now. >> >> Signed-off-by: Jonghwa Lee <jonghwa3.lee@samsung.com> >> --- > > The patch description does not explain what problem you want to solve, > how to solve it and the patch itself shows nothing. > > Could you elaborate ? I'm sorry for lacking description. I supplement more. This patch does add time-stamp for idle enter/exit only nothing more. The reason why I needed them is that I wanted to know current cpu idle state. It is hard to know whether cpu is in idle or not now. When I check the cpuidle state usage, sometimes the information is wrong. Because it is updated only when the cpu exits the idle state. So while the cpu is idling, the cpuidle state usage holds past one. Therefore I put the time-stamp for cpuidle enter/exit for checking current idling and calculating idle state usage correctly. I just make this patch temporary for my cpufreq governor work. So, it just use time-stamp for all idle state together. After RFC working, I have a plan to update this patch to use timestamp for each idle state. Thanks, And it also can be used to calculate cpuidle state usage even the cpu is idling. I knew the code looks worthless, but as this patchset is just suggesting idea, > > Thanks > -- Daniel > >> drivers/cpuidle/cpuidle.c | 8 ++++---- >> include/linux/cpuidle.h | 4 ++++ >> 2 files changed, 8 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c >> index eba6929..1e830cc 100644 >> --- a/drivers/cpuidle/cpuidle.c >> +++ b/drivers/cpuidle/cpuidle.c >> @@ -233,18 +233,18 @@ int cpuidle_wrap_enter(struct cpuidle_device *dev, >> int (*enter)(struct cpuidle_device *dev, >> struct cpuidle_driver *drv, int index)) >> { >> - ktime_t time_start, time_end; >> s64 diff; >> >> - time_start = ktime_get(); >> + dev->last_idle_start = ktime_get(); >> >> index = enter(dev, drv, index); >> >> - time_end = ktime_get(); >> + dev->last_idle_end = ktime_get(); >> >> local_irq_enable(); >> >> - diff = ktime_to_us(ktime_sub(time_end, time_start)); >> + diff = ktime_to_us(ktime_sub(dev->last_idle_end, >> + dev->last_idle_start)); >> if (diff > INT_MAX) >> diff = INT_MAX; >> >> diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h >> index 480c14d..d1af05f 100644 >> --- a/include/linux/cpuidle.h >> +++ b/include/linux/cpuidle.h >> @@ -16,6 +16,7 @@ >> #include <linux/kobject.h> >> #include <linux/completion.h> >> #include <linux/hrtimer.h> >> +#include <linux/ktime.h> >> >> #define CPUIDLE_STATE_MAX 8 >> #define CPUIDLE_NAME_LEN 16 >> @@ -74,6 +75,9 @@ struct cpuidle_device { >> struct kobject kobj; >> struct completion kobj_unregister; >> >> + ktime_t last_idle_start; >> + ktime_t last_idle_end; >> + >> #ifdef CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED >> int safe_state_index; >> cpumask_t coupled_cpus; >> > > ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC PATCH 1/2] cpuidle: Add idle enter/exit time stamp for notifying current idle state. 2013-04-02 6:17 ` jonghwa3.lee @ 2013-04-02 7:34 ` Daniel Lezcano 2013-04-02 9:37 ` jonghwa3.lee 0 siblings, 1 reply; 26+ messages in thread From: Daniel Lezcano @ 2013-04-02 7:34 UTC (permalink / raw) To: jonghwa3.lee Cc: Rafael J. Wysocki, linux-kernel, linux-pm, cpufreq, MyungJoo Ham, Lukasz Majewski, Kyungmin Park, Chanwoo Choi, sw0312.kim, m.szyprowski On 04/02/2013 08:17 AM, jonghwa3.lee@samsung.com wrote: > On 2013년 04월 02일 14:00, Daniel Lezcano wrote: > >> On 04/01/2013 10:24 AM, Jonghwa Lee wrote: >>> This patch adds idle state time stamp to cpuidle device structure to >>> notify its current idle state. If last enter time is newer than last >>> exit time, then it means that the core is in idle now. >>> >>> Signed-off-by: Jonghwa Lee <jonghwa3.lee@samsung.com> >>> --- >> >> The patch description does not explain what problem you want to solve, >> how to solve it and the patch itself shows nothing. >> >> Could you elaborate ? > > > I'm sorry for lacking description. I supplement more. > > This patch does add time-stamp for idle enter/exit only nothing more. > The reason why I needed them is that I wanted to know current cpu idle > state. It is hard to know whether cpu is in idle or not now. Did you looked at: include/linux/sched.h:extern int idle_cpu(int cpu); ? > When I check the cpuidle state usage, sometimes the information is wrong. > Because it is updated only when the cpu exits the idle state. So while the > cpu is idling, the cpuidle state usage holds past one. Therefore I put > the time-stamp for cpuidle enter/exit for checking current idling and > calculating idle state usage correctly. > > I just make this patch temporary for my cpufreq governor work. So, it just > use time-stamp for all idle state together. After RFC working, I have a plan > to update this patch to use timestamp for each idle state. I suggest you look at the enter_idle / exit_idle function and make your governor to subscribe to the IDLE_START/EXIT notifiers. arch/x86/kernel/process.c These are defined for the x86 architecture, maybe worth to add it to another architecture. Thanks -- Daniel -- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC PATCH 1/2] cpuidle: Add idle enter/exit time stamp for notifying current idle state. 2013-04-02 7:34 ` Daniel Lezcano @ 2013-04-02 9:37 ` jonghwa3.lee 2013-04-02 10:08 ` Daniel Lezcano 0 siblings, 1 reply; 26+ messages in thread From: jonghwa3.lee @ 2013-04-02 9:37 UTC (permalink / raw) To: Daniel Lezcano Cc: Rafael J. Wysocki, linux-kernel, linux-pm, cpufreq, MyungJoo Ham, Lukasz Majewski, Kyungmin Park, Chanwoo Choi, sw0312.kim, m.szyprowski On 2013년 04월 02일 16:34, Daniel Lezcano wrote: > On 04/02/2013 08:17 AM, jonghwa3.lee@samsung.com wrote: >> On 2013년 04월 02일 14:00, Daniel Lezcano wrote: >> >>> On 04/01/2013 10:24 AM, Jonghwa Lee wrote: >>>> This patch adds idle state time stamp to cpuidle device structure to >>>> notify its current idle state. If last enter time is newer than last >>>> exit time, then it means that the core is in idle now. >>>> >>>> Signed-off-by: Jonghwa Lee <jonghwa3.lee@samsung.com> >>>> --- >>> >>> The patch description does not explain what problem you want to solve, >>> how to solve it and the patch itself shows nothing. >>> >>> Could you elaborate ? >> >> >> I'm sorry for lacking description. I supplement more. >> >> This patch does add time-stamp for idle enter/exit only nothing more. >> The reason why I needed them is that I wanted to know current cpu idle >> state. It is hard to know whether cpu is in idle or not now. > > Did you looked at: > > include/linux/sched.h:extern int idle_cpu(int cpu); > Yes, I did. > ? > >> When I check the cpuidle state usage, sometimes the information is wrong. >> Because it is updated only when the cpu exits the idle state. So while the >> cpu is idling, the cpuidle state usage holds past one. Therefore I put >> the time-stamp for cpuidle enter/exit for checking current idling and >> calculating idle state usage correctly. >> >> I just make this patch temporary for my cpufreq governor work. So, it just >> use time-stamp for all idle state together. After RFC working, I have a plan >> to update this patch to use timestamp for each idle state. > > I suggest you look at the enter_idle / exit_idle function and make your > governor to subscribe to the IDLE_START/EXIT notifiers. > > arch/x86/kernel/process.c > > These are defined for the x86 architecture, maybe worth to add it to > another architecture. > Thanks for your opinion. Actually, I work on ARM architecture and I knew that the attempt of applying idle notifier was failed. You probably knew it, because the link you gave me before is that attempt. (https://lkml.org/lkml/2012/2/7/504) :) Currently, there is only notifying call which is for led in arch/arm/kernel/process.c and I think it isn't for me to use. Anyway, Do you really think it is better way to use notifier than my way? Because I think it is too heavy for me. On my board, sometimes entering idle happened hundreds times during the 100ms. I don't want to call notifier that much time. IMO, just moving local variable to per-cpu variable for showing the enter/exit time looks better although it requires code modification on cpudile side. What do you think? Thanks, Jonghwa. > Thanks > -- Daniel > ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC PATCH 1/2] cpuidle: Add idle enter/exit time stamp for notifying current idle state. 2013-04-02 9:37 ` jonghwa3.lee @ 2013-04-02 10:08 ` Daniel Lezcano 2013-04-02 11:07 ` jonghwa3.lee 0 siblings, 1 reply; 26+ messages in thread From: Daniel Lezcano @ 2013-04-02 10:08 UTC (permalink / raw) To: jonghwa3.lee Cc: Rafael J. Wysocki, linux-kernel, linux-pm, cpufreq, MyungJoo Ham, Lukasz Majewski, Kyungmin Park, Chanwoo Choi, sw0312.kim, m.szyprowski On 04/02/2013 11:37 AM, jonghwa3.lee@samsung.com wrote: > On 2013년 04월 02일 16:34, Daniel Lezcano wrote: > >> On 04/02/2013 08:17 AM, jonghwa3.lee@samsung.com wrote: >>> On 2013년 04월 02일 14:00, Daniel Lezcano wrote: >>> >>>> On 04/01/2013 10:24 AM, Jonghwa Lee wrote: >>>>> This patch adds idle state time stamp to cpuidle device structure to >>>>> notify its current idle state. If last enter time is newer than last >>>>> exit time, then it means that the core is in idle now. >>>>> >>>>> Signed-off-by: Jonghwa Lee <jonghwa3.lee@samsung.com> >>>>> --- >>>> >>>> The patch description does not explain what problem you want to solve, >>>> how to solve it and the patch itself shows nothing. >>>> >>>> Could you elaborate ? >>> >>> >>> I'm sorry for lacking description. I supplement more. >>> >>> This patch does add time-stamp for idle enter/exit only nothing more. >>> The reason why I needed them is that I wanted to know current cpu idle >>> state. It is hard to know whether cpu is in idle or not now. >> >> Did you looked at: >> >> include/linux/sched.h:extern int idle_cpu(int cpu); >> > > > Yes, I did. > >> ? >> >>> When I check the cpuidle state usage, sometimes the information is wrong. >>> Because it is updated only when the cpu exits the idle state. So while the >>> cpu is idling, the cpuidle state usage holds past one. Therefore I put >>> the time-stamp for cpuidle enter/exit for checking current idling and >>> calculating idle state usage correctly. >>> >>> I just make this patch temporary for my cpufreq governor work. So, it just >>> use time-stamp for all idle state together. After RFC working, I have a plan >>> to update this patch to use timestamp for each idle state. >> >> I suggest you look at the enter_idle / exit_idle function and make your >> governor to subscribe to the IDLE_START/EXIT notifiers. >> >> arch/x86/kernel/process.c >> >> These are defined for the x86 architecture, maybe worth to add it to >> another architecture. >> > > > Thanks for your opinion. > > Actually, I work on ARM architecture and I knew that the attempt of applying > idle notifier was failed. You probably knew it, because the link you gave me > before is that attempt. (https://lkml.org/lkml/2012/2/7/504) :) Yeah, now I recall this thread. > Currently, there > is only notifying call which is for led in arch/arm/kernel/process.c and I think > it isn't for me to use. Anyway, Do you really think it is better way to use > notifier than my way? Because I think it is too heavy for me. On my board, > sometimes entering idle happened hundreds times during the 100ms. I don't want > to call notifier that much time. IMO, just moving local variable to per-cpu > variable for showing the enter/exit time looks better although it requires code > modification on cpudile side. What do you think? Sorry, but it is hard to figure out what you are trying to achieve with a single patch. IIUC, you want to know how long the cpu is idle including the current state, right ? So you need to know if the cpu is idle and when it entered the idle state, correct ? If the cpu is idle and the information is per cpu, how will you read this value from another cpu without introducing a locking mechanism ? Does it mean the cpufreq governor needs cpuidle ? I am wondering if these informations shouldn't be retrieved from the scheduler, not from cpuidle. -- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC PATCH 1/2] cpuidle: Add idle enter/exit time stamp for notifying current idle state. 2013-04-02 10:08 ` Daniel Lezcano @ 2013-04-02 11:07 ` jonghwa3.lee 2013-04-02 11:18 ` Daniel Lezcano 0 siblings, 1 reply; 26+ messages in thread From: jonghwa3.lee @ 2013-04-02 11:07 UTC (permalink / raw) To: Daniel Lezcano Cc: Rafael J. Wysocki, linux-kernel, linux-pm, cpufreq, MyungJoo Ham, Lukasz Majewski, Kyungmin Park, Chanwoo Choi, sw0312.kim, m.szyprowski On 2013년 04월 02일 19:08, Daniel Lezcano wrote: > On 04/02/2013 11:37 AM, jonghwa3.lee@samsung.com wrote: >> On 2013년 04월 02일 16:34, Daniel Lezcano wrote: >> >>> On 04/02/2013 08:17 AM, jonghwa3.lee@samsung.com wrote: >>>> On 2013년 04월 02일 14:00, Daniel Lezcano wrote: >>>> >>>>> On 04/01/2013 10:24 AM, Jonghwa Lee wrote: >>>>>> This patch adds idle state time stamp to cpuidle device structure to >>>>>> notify its current idle state. If last enter time is newer than last >>>>>> exit time, then it means that the core is in idle now. >>>>>> >>>>>> Signed-off-by: Jonghwa Lee <jonghwa3.lee@samsung.com> >>>>>> --- >>>>> >>>>> The patch description does not explain what problem you want to solve, >>>>> how to solve it and the patch itself shows nothing. >>>>> >>>>> Could you elaborate ? >>>> >>>> >>>> I'm sorry for lacking description. I supplement more. >>>> >>>> This patch does add time-stamp for idle enter/exit only nothing more. >>>> The reason why I needed them is that I wanted to know current cpu idle >>>> state. It is hard to know whether cpu is in idle or not now. >>> >>> Did you looked at: >>> >>> include/linux/sched.h:extern int idle_cpu(int cpu); >>> >> >> >> Yes, I did. >> >>> ? >>> >>>> When I check the cpuidle state usage, sometimes the information is wrong. >>>> Because it is updated only when the cpu exits the idle state. So while the >>>> cpu is idling, the cpuidle state usage holds past one. Therefore I put >>>> the time-stamp for cpuidle enter/exit for checking current idling and >>>> calculating idle state usage correctly. >>>> >>>> I just make this patch temporary for my cpufreq governor work. So, it just >>>> use time-stamp for all idle state together. After RFC working, I have a plan >>>> to update this patch to use timestamp for each idle state. >>> >>> I suggest you look at the enter_idle / exit_idle function and make your >>> governor to subscribe to the IDLE_START/EXIT notifiers. >>> >>> arch/x86/kernel/process.c >>> >>> These are defined for the x86 architecture, maybe worth to add it to >>> another architecture. >>> >> >> >> Thanks for your opinion. >> >> Actually, I work on ARM architecture and I knew that the attempt of applying >> idle notifier was failed. You probably knew it, because the link you gave me >> before is that attempt. (https://lkml.org/lkml/2012/2/7/504) :) > > Yeah, now I recall this thread. > Oh my, I thought you gave the link but you didn't. It was Viresh Kumar from other patch of the patchset. Sorry. >> Currently, there >> is only notifying call which is for led in arch/arm/kernel/process.c and I think >> it isn't for me to use. Anyway, Do you really think it is better way to use >> notifier than my way? Because I think it is too heavy for me. On my board, >> sometimes entering idle happened hundreds times during the 100ms. I don't want >> to call notifier that much time. IMO, just moving local variable to per-cpu >> variable for showing the enter/exit time looks better although it requires code >> modification on cpudile side. What do you think? > > Sorry, but it is hard to figure out what you are trying to achieve with > a single patch. > > IIUC, you want to know how long the cpu is idle including the current > state, right ? So you need to know if the cpu is idle and when it > entered the idle state, correct ? > Exactly. > If the cpu is idle and the information is per cpu, how will you read > this value from another cpu without introducing a locking mechanism ? > I think it might be tolerated for incoherency of that data. Governor reads the data only, and if recoded start time or end time are different in few usec with real one then it doesn't effect to the result much. > Does it mean the cpufreq governor needs cpuidle ? I am wondering if > these informations shouldn't be retrieved from the scheduler, not from > cpuidle. > Yes, tick_sched per-cpu variable has all information that I need. But it isn't global variable. And I'm afraid to change static variable to global one as my pleases. Thanks, Jonghwa ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC PATCH 1/2] cpuidle: Add idle enter/exit time stamp for notifying current idle state. 2013-04-02 11:07 ` jonghwa3.lee @ 2013-04-02 11:18 ` Daniel Lezcano 2013-04-03 6:10 ` jonghwa3.lee 0 siblings, 1 reply; 26+ messages in thread From: Daniel Lezcano @ 2013-04-02 11:18 UTC (permalink / raw) To: jonghwa3.lee Cc: Rafael J. Wysocki, linux-kernel, linux-pm, cpufreq, MyungJoo Ham, Lukasz Majewski, Kyungmin Park, Chanwoo Choi, sw0312.kim, m.szyprowski On 04/02/2013 01:07 PM, jonghwa3.lee@samsung.com wrote: > On 2013년 04월 02일 19:08, Daniel Lezcano wrote: > >> On 04/02/2013 11:37 AM, jonghwa3.lee@samsung.com wrote: >>> On 2013년 04월 02일 16:34, Daniel Lezcano wrote: >>> >>>> On 04/02/2013 08:17 AM, jonghwa3.lee@samsung.com wrote: >>>>> On 2013년 04월 02일 14:00, Daniel Lezcano wrote: >>>>> >>>>>> On 04/01/2013 10:24 AM, Jonghwa Lee wrote: >>>>>>> This patch adds idle state time stamp to cpuidle device structure to >>>>>>> notify its current idle state. If last enter time is newer than last >>>>>>> exit time, then it means that the core is in idle now. >>>>>>> >>>>>>> Signed-off-by: Jonghwa Lee <jonghwa3.lee@samsung.com> >>>>>>> --- >>>>>> >>>>>> The patch description does not explain what problem you want to solve, >>>>>> how to solve it and the patch itself shows nothing. >>>>>> >>>>>> Could you elaborate ? >>>>> >>>>> >>>>> I'm sorry for lacking description. I supplement more. >>>>> >>>>> This patch does add time-stamp for idle enter/exit only nothing more. >>>>> The reason why I needed them is that I wanted to know current cpu idle >>>>> state. It is hard to know whether cpu is in idle or not now. >>>> >>>> Did you looked at: >>>> >>>> include/linux/sched.h:extern int idle_cpu(int cpu); >>>> >>> >>> >>> Yes, I did. >>> >>>> ? >>>> >>>>> When I check the cpuidle state usage, sometimes the information is wrong. >>>>> Because it is updated only when the cpu exits the idle state. So while the >>>>> cpu is idling, the cpuidle state usage holds past one. Therefore I put >>>>> the time-stamp for cpuidle enter/exit for checking current idling and >>>>> calculating idle state usage correctly. >>>>> >>>>> I just make this patch temporary for my cpufreq governor work. So, it just >>>>> use time-stamp for all idle state together. After RFC working, I have a plan >>>>> to update this patch to use timestamp for each idle state. >>>> >>>> I suggest you look at the enter_idle / exit_idle function and make your >>>> governor to subscribe to the IDLE_START/EXIT notifiers. >>>> >>>> arch/x86/kernel/process.c >>>> >>>> These are defined for the x86 architecture, maybe worth to add it to >>>> another architecture. >>>> >>> >>> >>> Thanks for your opinion. >>> >>> Actually, I work on ARM architecture and I knew that the attempt of applying >>> idle notifier was failed. You probably knew it, because the link you gave me >>> before is that attempt. (https://lkml.org/lkml/2012/2/7/504) :) >> >> Yeah, now I recall this thread. >> > > > Oh my, I thought you gave the link but you didn't. It was Viresh Kumar from > other patch of the patchset. Sorry. > >>> Currently, there >>> is only notifying call which is for led in arch/arm/kernel/process.c and I think >>> it isn't for me to use. Anyway, Do you really think it is better way to use >>> notifier than my way? Because I think it is too heavy for me. On my board, >>> sometimes entering idle happened hundreds times during the 100ms. I don't want >>> to call notifier that much time. IMO, just moving local variable to per-cpu >>> variable for showing the enter/exit time looks better although it requires code >>> modification on cpudile side. What do you think? >> >> Sorry, but it is hard to figure out what you are trying to achieve with >> a single patch. >> >> IIUC, you want to know how long the cpu is idle including the current >> state, right ? So you need to know if the cpu is idle and when it >> entered the idle state, correct ? >> > > > Exactly. > > >> If the cpu is idle and the information is per cpu, how will you read >> this value from another cpu without introducing a locking mechanism ? >> > > > I think it might be tolerated for incoherency of that data. Governor reads the > data only, and if recoded start time or end time are different in few usec with > real one then it doesn't effect to the result much. > > >> Does it mean the cpufreq governor needs cpuidle ? I am wondering if >> these informations shouldn't be retrieved from the scheduler, not from >> cpuidle. >> > > > Yes, tick_sched per-cpu variable has all information that I need. But it isn't > global variable. And I'm afraid to change static variable to global one as my > pleases. It is a global variable but there is a function to get access: extern struct tick_sched *tick_get_tick_sched(int cpu); Does it fit better for what you want to achieve ? Thanks -- Daniel -- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC PATCH 1/2] cpuidle: Add idle enter/exit time stamp for notifying current idle state. 2013-04-02 11:18 ` Daniel Lezcano @ 2013-04-03 6:10 ` jonghwa3.lee 0 siblings, 0 replies; 26+ messages in thread From: jonghwa3.lee @ 2013-04-03 6:10 UTC (permalink / raw) To: Daniel Lezcano Cc: Rafael J. Wysocki, linux-kernel, linux-pm, cpufreq, MyungJoo Ham, Lukasz Majewski, Kyungmin Park, Chanwoo Choi, sw0312.kim, m.szyprowski On 2013년 04월 02일 20:18, Daniel Lezcano wrote: > On 04/02/2013 01:07 PM, jonghwa3.lee@samsung.com wrote: >> On 2013년 04월 02일 19:08, Daniel Lezcano wrote: >> >>> On 04/02/2013 11:37 AM, jonghwa3.lee@samsung.com wrote: >>>> On 2013년 04월 02일 16:34, Daniel Lezcano wrote: >>>> >>>>> On 04/02/2013 08:17 AM, jonghwa3.lee@samsung.com wrote: >>>>>> On 2013년 04월 02일 14:00, Daniel Lezcano wrote: >>>>>> >>>>>>> On 04/01/2013 10:24 AM, Jonghwa Lee wrote: >>>>>>>> This patch adds idle state time stamp to cpuidle device structure to >>>>>>>> notify its current idle state. If last enter time is newer than last >>>>>>>> exit time, then it means that the core is in idle now. >>>>>>>> >>>>>>>> Signed-off-by: Jonghwa Lee <jonghwa3.lee@samsung.com> >>>>>>>> --- >>>>>>> >>>>>>> The patch description does not explain what problem you want to solve, >>>>>>> how to solve it and the patch itself shows nothing. >>>>>>> >>>>>>> Could you elaborate ? >>>>>> >>>>>> >>>>>> I'm sorry for lacking description. I supplement more. >>>>>> >>>>>> This patch does add time-stamp for idle enter/exit only nothing more. >>>>>> The reason why I needed them is that I wanted to know current cpu idle >>>>>> state. It is hard to know whether cpu is in idle or not now. >>>>> >>>>> Did you looked at: >>>>> >>>>> include/linux/sched.h:extern int idle_cpu(int cpu); >>>>> >>>> >>>> >>>> Yes, I did. >>>> >>>>> ? >>>>> >>>>>> When I check the cpuidle state usage, sometimes the information is wrong. >>>>>> Because it is updated only when the cpu exits the idle state. So while the >>>>>> cpu is idling, the cpuidle state usage holds past one. Therefore I put >>>>>> the time-stamp for cpuidle enter/exit for checking current idling and >>>>>> calculating idle state usage correctly. >>>>>> >>>>>> I just make this patch temporary for my cpufreq governor work. So, it just >>>>>> use time-stamp for all idle state together. After RFC working, I have a plan >>>>>> to update this patch to use timestamp for each idle state. >>>>> >>>>> I suggest you look at the enter_idle / exit_idle function and make your >>>>> governor to subscribe to the IDLE_START/EXIT notifiers. >>>>> >>>>> arch/x86/kernel/process.c >>>>> >>>>> These are defined for the x86 architecture, maybe worth to add it to >>>>> another architecture. >>>>> >>>> >>>> >>>> Thanks for your opinion. >>>> >>>> Actually, I work on ARM architecture and I knew that the attempt of applying >>>> idle notifier was failed. You probably knew it, because the link you gave me >>>> before is that attempt. (https://lkml.org/lkml/2012/2/7/504) :) >>> >>> Yeah, now I recall this thread. >>> >> >> >> Oh my, I thought you gave the link but you didn't. It was Viresh Kumar from >> other patch of the patchset. Sorry. >> >>>> Currently, there >>>> is only notifying call which is for led in arch/arm/kernel/process.c and I think >>>> it isn't for me to use. Anyway, Do you really think it is better way to use >>>> notifier than my way? Because I think it is too heavy for me. On my board, >>>> sometimes entering idle happened hundreds times during the 100ms. I don't want >>>> to call notifier that much time. IMO, just moving local variable to per-cpu >>>> variable for showing the enter/exit time looks better although it requires code >>>> modification on cpudile side. What do you think? >>> >>> Sorry, but it is hard to figure out what you are trying to achieve with >>> a single patch. >>> >>> IIUC, you want to know how long the cpu is idle including the current >>> state, right ? So you need to know if the cpu is idle and when it >>> entered the idle state, correct ? >>> >> >> >> Exactly. >> >> >>> If the cpu is idle and the information is per cpu, how will you read >>> this value from another cpu without introducing a locking mechanism ? >>> >> >> >> I think it might be tolerated for incoherency of that data. Governor reads the >> data only, and if recoded start time or end time are different in few usec with >> real one then it doesn't effect to the result much. >> >> >>> Does it mean the cpufreq governor needs cpuidle ? I am wondering if >>> these informations shouldn't be retrieved from the scheduler, not from >>> cpuidle. >>> >> >> >> Yes, tick_sched per-cpu variable has all information that I need. But it isn't >> global variable. And I'm afraid to change static variable to global one as my >> pleases. > > It is a global variable but there is a function to get access: > > extern struct tick_sched *tick_get_tick_sched(int cpu); > > Does it fit better for what you want to achieve ? Yes, it seems exactly what I needed. I'll check it. Thanks for your advice :) Thanks, Jonghwa. > > Thanks > -- Daniel > > ^ permalink raw reply [flat|nested] 26+ messages in thread
* [RFC PATCH 2/2] cpufreq: Introduce new cpufreq governor, LAB(Legacy Application Boost). 2013-04-01 8:24 [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor Jonghwa Lee 2013-04-01 8:24 ` [RFC PATCH 1/2] cpuidle: Add idle enter/exit time stamp for notifying current idle state Jonghwa Lee @ 2013-04-01 8:24 ` Jonghwa Lee 2013-04-01 15:37 ` [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor Viresh Kumar 2 siblings, 0 replies; 26+ messages in thread From: Jonghwa Lee @ 2013-04-01 8:24 UTC (permalink / raw) To: Rafael J. Wysocki Cc: linux-kernel, linux-pm, cpufreq, MyungJoo Ham, Lukasz Majewski, Kyungmin Park, Chanwoo Choi, sw0312.kim, m.szyprowski, Jonghwa Lee This patch introduces new cpufreq governor named 'LAB'. LAB governor will use historical cpuidle state usage infomation to determine how many cpus are in busy now. The results, the number of idle cpus, will affect to next frequency dynamically. For instance, we can assume that it is working on quard core processor. When all 4 cores are busy then this governor throttles next frequency in maximum. For 3 cores, throttling will be loosen. The less core are busy, the more high frequency will be set. So, when only one core is in busy, then it releases maximum frequency that system allows. The name of 'Legacy Application Boost' came from the such above aspect which system has the highest performance for single threaded process. This is tested on Pegasus Quad board. Signed-off-by: Jonghwa Lee <jonghwa3.lee@samsung.com> Signed-off-by: Lukasz Majewski <l.majewski@samsung.com> Signed-off-by: Myungjoo Ham <myungjoo.ham@samsung.com> --- drivers/cpufreq/Kconfig | 26 ++ drivers/cpufreq/Makefile | 1 + drivers/cpufreq/cpufreq_governor.h | 14 + drivers/cpufreq/cpufreq_lab.c | 553 ++++++++++++++++++++++++++++++++++++ include/linux/cpufreq.h | 3 + 5 files changed, 597 insertions(+) create mode 100644 drivers/cpufreq/cpufreq_lab.c diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig index cbcb21e..d0b22de 100644 --- a/drivers/cpufreq/Kconfig +++ b/drivers/cpufreq/Kconfig @@ -102,6 +102,18 @@ config CPU_FREQ_DEFAULT_GOV_CONSERVATIVE Be aware that not all cpufreq drivers support the conservative governor. If unsure have a look at the help section of the driver. Fallback governor will be the performance governor. + +config CPU_FREQ_DEFAULT_GOV_LAB + bool "lab" + select CPU_FREQ_GOV_LAB + select CPU_FREQ_GOV_PERFORMANCE + help + Use the CPUFreq governor 'lab' as default. This allows + you to get a full dynamic frequency capable system by simply + loading your cpufreq low-level hardware driver. + Be aware that not all cpufreq drivers support the lab governor. + If unsure have a look at the help section of the driver. + Fallback governor will be the performance governor. endchoice config CPU_FREQ_GOV_PERFORMANCE @@ -184,6 +196,20 @@ config CPU_FREQ_GOV_CONSERVATIVE If in doubt, say N. +config CPU_FREQ_GOV_LAB + tristate "'lab' cpufreq policy governor" + select CPU_FREQ_TABLE + select CPU_FREQ_GOV_COMMON + help + 'lab' - This driver adds a dynamic cpufreq policy governor. + + To compile this driver as a module, choose M here: the + module will be called cpufreq_ondemand. + + For details, take a look at linux/Documentation/cpu-freq. + + If in doubt, say N. + config GENERIC_CPUFREQ_CPU0 tristate "Generic CPU0 cpufreq driver" depends on HAVE_CLK && REGULATOR && PM_OPP && OF diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile index 863fd18..2520fa7 100644 --- a/drivers/cpufreq/Makefile +++ b/drivers/cpufreq/Makefile @@ -9,6 +9,7 @@ obj-$(CONFIG_CPU_FREQ_GOV_POWERSAVE) += cpufreq_powersave.o obj-$(CONFIG_CPU_FREQ_GOV_USERSPACE) += cpufreq_userspace.o obj-$(CONFIG_CPU_FREQ_GOV_ONDEMAND) += cpufreq_ondemand.o obj-$(CONFIG_CPU_FREQ_GOV_CONSERVATIVE) += cpufreq_conservative.o +obj-$(CONFIG_CPU_FREQ_GOV_LAB) += cpufreq_lab.o obj-$(CONFIG_CPU_FREQ_GOV_COMMON) += cpufreq_governor.o # CPUfreq cross-arch helpers diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h index 46bde01..3062d60 100644 --- a/drivers/cpufreq/cpufreq_governor.h +++ b/drivers/cpufreq/cpufreq_governor.h @@ -103,6 +103,19 @@ struct cs_cpu_dbs_info_s { unsigned int enable:1; }; +struct lb_cpu_dbs_info_s { + struct cpu_dbs_common_info cdbs; + u64 prev_cpu_iowait; + struct cpufreq_frequency_table *freq_table; + unsigned int freq_lo; + unsigned int freq_lo_jiffies; + unsigned int freq_hi_jiffies; + unsigned int rate_mult; + unsigned int sample_type:1; + + unsigned int last_sampling_rate; +}; + /* Governers sysfs tunables */ struct od_dbs_tuners { unsigned int ignore_nice; @@ -128,6 +141,7 @@ struct dbs_data { /* Common across governors */ #define GOV_ONDEMAND 0 #define GOV_CONSERVATIVE 1 + #define GOV_LAB 2 int governor; unsigned int min_sampling_rate; struct attribute_group *attr_group; diff --git a/drivers/cpufreq/cpufreq_lab.c b/drivers/cpufreq/cpufreq_lab.c new file mode 100644 index 0000000..7841a50 --- /dev/null +++ b/drivers/cpufreq/cpufreq_lab.c @@ -0,0 +1,553 @@ +/* + * drivers/cpufreq/cpufreq_lab.c + * + * LAB(Legacy Application Boost) cpufreq governor + * + * Copyright (C) SAMSUNG Electronics. CO. + * Jonghwa Lee <jonghw3.lee@samusng.com> + * Lukasz Majewski <l.majewski@samsung.com> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include <linux/cpufreq.h> +#include <linux/init.h> +#include <linux/kernel.h> +#include <linux/kernel_stat.h> +#include <linux/kobject.h> +#include <linux/module.h> +#include <linux/mutex.h> +#include <linux/percpu-defs.h> +#include <linux/sysfs.h> +#include <linux/tick.h> +#include <linux/types.h> +#include <linux/cpuidle.h> +#include <linux/slab.h> + +#include "cpufreq_governor.h" + +/* On-demand governor macros */ +#define DEF_FREQUENCY_DOWN_DIFFERENTIAL (10) +#define DEF_FREQUENCY_UP_THRESHOLD (80) +#define DEF_SAMPLING_DOWN_FACTOR (1) +#define MAX_SAMPLING_DOWN_FACTOR (100000) +#define MICRO_FREQUENCY_DOWN_DIFFERENTIAL (3) +#define MICRO_FREQUENCY_UP_THRESHOLD (95) +#define MICRO_FREQUENCY_MIN_SAMPLE_RATE (10000) +#define MIN_FREQUENCY_UP_THRESHOLD (11) +#define MAX_FREQUENCY_UP_THRESHOLD (100) + +#define MAX_HIST 10 +#define FREQ_STEP 50000 +#define IDLE_THRESHOLD 90 + +static unsigned int lab_limit_rate[5] = {35, 20, 10, 0, 0}; +static unsigned long *ref_time; +static unsigned int *idle_avg; +static unsigned int **idle_hist; + +static struct dbs_data lb_dbs_data; +static DEFINE_PER_CPU(struct lb_cpu_dbs_info_s, lb_cpu_dbs_info); + +#ifndef CONFIG_CPU_FREQ_DEFAULT_GOV_LAB +static struct cpufreq_governor cpufreq_gov_lab; +#endif + +static struct od_dbs_tuners lb_tuners = { + .up_threshold = DEF_FREQUENCY_UP_THRESHOLD, + .sampling_down_factor = DEF_SAMPLING_DOWN_FACTOR, + .adj_up_threshold = DEF_FREQUENCY_UP_THRESHOLD - + DEF_FREQUENCY_DOWN_DIFFERENTIAL, + .ignore_nice = 0, +}; + +static void dbs_freq_increase(struct cpufreq_policy *p, unsigned int freq) +{ + if (p->cur == freq) + return; + + __cpufreq_driver_target(p, freq, CPUFREQ_RELATION_L); +} + +static inline int cpu_idle_calc_avg(unsigned int *p, int size) +{ + int i, sum; + + for (i = 0, sum = 0; i < size; p++, i++) + sum += *p; + + return (int) (sum / size); +} + +/* + * Every sampling_rate, we check, if current idle time is less than 20% + * (default), then we try to increase frequency. Every sampling_rate, we look + * for the lowest frequency which can sustain the load while keeping idle time + * over 30%. If such a frequency exist, we try to decrease to this frequency. + * + * Any frequency increase takes it to the maximum frequency. Frequency reduction + * happens at minimum steps of 5% (default) of current frequency + */ +static void lb_check_cpu(int cpu, unsigned int load_freq) +{ + struct lb_cpu_dbs_info_s *dbs_info = &per_cpu(lb_cpu_dbs_info, cpu); + struct cpufreq_policy *policy = dbs_info->cdbs.cur_policy; + struct cpuidle_device *dev; + int i, idx, idle_cpus; + static int cnt; + unsigned int _max; + + dbs_info->freq_lo = 0; + + idle_cpus = 0; + + idx = cnt++ % MAX_HIST; + /* Check for LAB limitation */ + for_each_possible_cpu(i) { + ktime_t cur; + ktime_t last_idle_enter; + ktime_t last_idle_exit; + s64 delta_time; + unsigned int last_sampling_rate = dbs_info->last_sampling_rate; + + dev = per_cpu(cpuidle_devices, i); + last_idle_enter = dev->last_idle_start; + last_idle_exit = dev->last_idle_end; + + cur = ktime_get(); + + /* Check whether the i'th core is in idle */ + if (ktime_to_us(ktime_sub(last_idle_enter, + last_idle_exit)) > 0) { + if (ktime_to_us(ktime_sub(cur, last_idle_enter)) + > last_sampling_rate) { + delta_time = last_sampling_rate; + } else { + delta_time = ktime_to_us(ktime_sub(cur, + last_idle_enter)); + if (ktime_to_us(ktime_sub(cur, last_idle_exit)) + < last_sampling_rate) + delta_time += dev->states_usage[0].time + - ref_time[i]; + } + } else { + delta_time = dev->states_usage[0].time - ref_time[i]; + } + + delta_time *= 100; + if (last_sampling_rate > 0) + delta_time = div_s64(delta_time, last_sampling_rate); + else + delta_time = 100; + + if (delta_time > 100) + delta_time = 100; + + idle_hist[i][idx] = delta_time; + + ref_time[i] = dev->states_usage[0].time; + + idle_avg[i] = cpu_idle_calc_avg(idle_hist[i], + cnt < 10 ? cnt : MAX_HIST); + if (idle_avg[i] > IDLE_THRESHOLD) + idle_cpus++; + } + + _max = policy->max * (100 - lab_limit_rate[idle_cpus]); + _max /= 100; + + if (!idx) + pr_debug("_max : %d, idle_cpus : %d, avg : %d %d %d %d\n", _max, + idle_cpus, idle_avg[0], idle_avg[1], idle_avg[2], idle_avg[3]); + + /* Check for frequency increase */ + if (load_freq > lb_tuners.up_threshold * policy->cur) { + unsigned int freq_next; + static unsigned int inc; + + if (!idle_cpus) + inc += FREQ_STEP / 2; + else + inc += FREQ_STEP * idle_cpus; + + freq_next = min(policy->cur + inc, _max); + if (freq_next == _max) + inc = 0; + /* If switching to max speed, apply sampling_down_factor */ + if (policy->cur < _max) + dbs_info->rate_mult = + lb_tuners.sampling_down_factor; + + dbs_freq_increase(policy, freq_next); + return; + } + + /* Check for frequency decrease */ + /* if we cannot reduce the frequency anymore, break out early */ + if (policy->cur == policy->min) + return; + + /* + * The optimal frequency is the frequency that is the lowest that can + * support the current CPU usage without triggering the up policy. To be + * safe, we focus 10 points under the threshold. + */ + if (load_freq < lb_tuners.adj_up_threshold * policy->cur) { + unsigned int freq_next; + freq_next = load_freq / lb_tuners.adj_up_threshold; + + /* No longer fully busy, reset rate_mult */ + dbs_info->rate_mult = 1; + + if (freq_next < policy->min) + freq_next = policy->min; + + __cpufreq_driver_target(policy, freq_next, + CPUFREQ_RELATION_L); + } +} + +static void lb_dbs_timer(struct work_struct *work) +{ + struct delayed_work *dw = to_delayed_work(work); + struct lb_cpu_dbs_info_s *dbs_info = + container_of(work, struct lb_cpu_dbs_info_s, cdbs.work.work); + unsigned int cpu = dbs_info->cdbs.cur_policy->cpu; + struct lb_cpu_dbs_info_s *core_dbs_info = &per_cpu(lb_cpu_dbs_info, + cpu); + int delay, sample_type = core_dbs_info->sample_type; + bool eval_load; + + mutex_lock(&core_dbs_info->cdbs.timer_mutex); + eval_load = need_load_eval(&core_dbs_info->cdbs, + lb_tuners.sampling_rate); + + /* Common NORMAL_SAMPLE setup */ + core_dbs_info->sample_type = OD_NORMAL_SAMPLE; + if (sample_type == OD_SUB_SAMPLE) { + delay = core_dbs_info->freq_lo_jiffies; + if (eval_load) + __cpufreq_driver_target(core_dbs_info->cdbs.cur_policy, + core_dbs_info->freq_lo, + CPUFREQ_RELATION_H); + } else { + if (eval_load) + dbs_check_cpu(&lb_dbs_data, cpu); + if (core_dbs_info->freq_lo) { + /* Setup timer for SUB_SAMPLE */ + core_dbs_info->sample_type = OD_SUB_SAMPLE; + delay = core_dbs_info->freq_hi_jiffies; + } else { + delay = delay_for_sampling_rate(lb_tuners.sampling_rate + * core_dbs_info->rate_mult); + } + } + + dbs_info->last_sampling_rate = jiffies_to_usecs(delay); + + schedule_delayed_work_on(smp_processor_id(), dw, delay); + mutex_unlock(&core_dbs_info->cdbs.timer_mutex); +} + +/************************** sysfs interface ************************/ + +static ssize_t show_sampling_rate_min(struct kobject *kobj, + struct attribute *attr, char *buf) +{ + return sprintf(buf, "%u\n", lb_dbs_data.min_sampling_rate); +} + +/** + * update_sampling_rate - update sampling rate effective immediately if needed. + * @new_rate: new sampling rate + * + * If new rate is smaller than the old, simply updating + * dbs_tuners_int.sampling_rate might not be appropriate. For example, if the + * original sampling_rate was 1 second and the requested new sampling rate is 10 + * ms because the user needs immediate reaction from lab governor, but not + * sure if higher frequency will be required or not, then, the governor may + * change the sampling rate too late; up to 1 second later. Thus, if we are + * reducing the sampling rate, we need to make the new value effective + * immediately. + */ +static void update_sampling_rate(unsigned int new_rate) +{ + int cpu; + + lb_tuners.sampling_rate = new_rate = max(new_rate, + lb_dbs_data.min_sampling_rate); + + for_each_online_cpu(cpu) { + struct cpufreq_policy *policy; + struct lb_cpu_dbs_info_s *dbs_info; + unsigned long next_sampling, appointed_at; + + policy = cpufreq_cpu_get(cpu); + if (!policy) + continue; + if (policy->governor != &cpufreq_gov_lab) { + cpufreq_cpu_put(policy); + continue; + } + dbs_info = &per_cpu(lb_cpu_dbs_info, cpu); + cpufreq_cpu_put(policy); + + mutex_lock(&dbs_info->cdbs.timer_mutex); + + if (!delayed_work_pending(&dbs_info->cdbs.work)) { + mutex_unlock(&dbs_info->cdbs.timer_mutex); + continue; + } + + next_sampling = jiffies + usecs_to_jiffies(new_rate); + appointed_at = dbs_info->cdbs.work.timer.expires; + + if (time_before(next_sampling, appointed_at)) { + + mutex_unlock(&dbs_info->cdbs.timer_mutex); + cancel_delayed_work_sync(&dbs_info->cdbs.work); + mutex_lock(&dbs_info->cdbs.timer_mutex); + + schedule_delayed_work_on(cpu, &dbs_info->cdbs.work, + usecs_to_jiffies(new_rate)); + + } + mutex_unlock(&dbs_info->cdbs.timer_mutex); + } +} + +static ssize_t store_sampling_rate(struct kobject *a, struct attribute *b, + const char *buf, size_t count) +{ + unsigned int input; + int ret; + ret = sscanf(buf, "%u", &input); + if (ret != 1) + return -EINVAL; + update_sampling_rate(input); + return count; +} + +static ssize_t store_io_is_busy(struct kobject *a, struct attribute *b, + const char *buf, size_t count) +{ + unsigned int input; + int ret; + + ret = sscanf(buf, "%u", &input); + if (ret != 1) + return -EINVAL; + lb_tuners.io_is_busy = !!input; + return count; +} + +static ssize_t store_up_threshold(struct kobject *a, struct attribute *b, + const char *buf, size_t count) +{ + unsigned int input; + int ret; + ret = sscanf(buf, "%u", &input); + + if (ret != 1 || input > MAX_FREQUENCY_UP_THRESHOLD || + input < MIN_FREQUENCY_UP_THRESHOLD) { + return -EINVAL; + } + /* Calculate the new adj_up_threshold */ + lb_tuners.adj_up_threshold += input; + lb_tuners.adj_up_threshold -= lb_tuners.up_threshold; + + lb_tuners.up_threshold = input; + return count; +} + +static ssize_t store_sampling_down_factor(struct kobject *a, + struct attribute *b, const char *buf, size_t count) +{ + unsigned int input, j; + int ret; + ret = sscanf(buf, "%u", &input); + + if (ret != 1 || input > MAX_SAMPLING_DOWN_FACTOR || input < 1) + return -EINVAL; + lb_tuners.sampling_down_factor = input; + + /* Reset down sampling multiplier in case it was active */ + for_each_online_cpu(j) { + struct lb_cpu_dbs_info_s *dbs_info = &per_cpu(lb_cpu_dbs_info, + j); + dbs_info->rate_mult = 1; + } + return count; +} + +static ssize_t store_ignore_nice_load(struct kobject *a, struct attribute *b, + const char *buf, size_t count) +{ + unsigned int input; + int ret; + + unsigned int j; + + ret = sscanf(buf, "%u", &input); + if (ret != 1) + return -EINVAL; + + if (input > 1) + input = 1; + + if (input == lb_tuners.ignore_nice) { /* nothing to do */ + return count; + } + lb_tuners.ignore_nice = input; + + /* we need to re-evaluate prev_cpu_idle */ + for_each_online_cpu(j) { + struct lb_cpu_dbs_info_s *dbs_info; + dbs_info = &per_cpu(lb_cpu_dbs_info, j); + dbs_info->cdbs.prev_cpu_idle = get_cpu_idle_time(j, + &dbs_info->cdbs.prev_cpu_wall); + if (lb_tuners.ignore_nice) + dbs_info->cdbs.prev_cpu_nice = + kcpustat_cpu(j).cpustat[CPUTIME_NICE]; + + } + return count; +} + +show_one(lb, sampling_rate, sampling_rate); +show_one(lb, io_is_busy, io_is_busy); +show_one(lb, up_threshold, up_threshold); +show_one(lb, sampling_down_factor, sampling_down_factor); +show_one(lb, ignore_nice_load, ignore_nice); + +define_one_global_rw(sampling_rate); +define_one_global_rw(io_is_busy); +define_one_global_rw(up_threshold); +define_one_global_rw(sampling_down_factor); +define_one_global_rw(ignore_nice_load); +define_one_global_ro(sampling_rate_min); + +static struct attribute *dbs_attributes[] = { + &sampling_rate_min.attr, + &sampling_rate.attr, + &up_threshold.attr, + &sampling_down_factor.attr, + &ignore_nice_load.attr, + &io_is_busy.attr, + NULL +}; + +static struct attribute_group lb_attr_group = { + .attrs = dbs_attributes, + .name = "lab", +}; + +/************************** sysfs end ************************/ + +define_get_cpu_dbs_routines(lb_cpu_dbs_info); + +static struct od_ops lb_ops = { + .freq_increase = dbs_freq_increase, +}; + +static struct dbs_data lb_dbs_data = { + .governor = GOV_LAB, + .attr_group = &lb_attr_group, + .tuners = &lb_tuners, + .get_cpu_cdbs = get_cpu_cdbs, + .get_cpu_dbs_info_s = get_cpu_dbs_info_s, + .gov_dbs_timer = lb_dbs_timer, + .gov_check_cpu = lb_check_cpu, + .gov_ops = &lb_ops, +}; + +static int lb_cpufreq_governor_dbs(struct cpufreq_policy *policy, + unsigned int event) +{ + return cpufreq_governor_dbs(&lb_dbs_data, policy, event); +} + +#ifndef CONFIG_CPU_FREQ_DEFAULT_GOV_LAB +static +#endif +struct cpufreq_governor cpufreq_gov_lab = { + .name = "lab", + .governor = lb_cpufreq_governor_dbs, + .max_transition_latency = TRANSITION_LATENCY_LIMIT, + .owner = THIS_MODULE, +}; + +static int __init cpufreq_gov_dbs_init(void) +{ + u64 idle_time; + int i, cpu = get_cpu(); + + mutex_init(&lb_dbs_data.mutex); + idle_time = get_cpu_idle_time_us(cpu, NULL); + put_cpu(); + if (idle_time != -1ULL) { + /* Idle micro accounting is supported. Use finer thresholds */ + lb_tuners.up_threshold = MICRO_FREQUENCY_UP_THRESHOLD; + lb_tuners.adj_up_threshold = MICRO_FREQUENCY_UP_THRESHOLD - + MICRO_FREQUENCY_DOWN_DIFFERENTIAL; + /* + * In nohz/micro accounting case we set the minimum frequency + * not depending on HZ, but fixed (very low). The deferred + * timer might skip some samples if idle/sleeping as needed. + */ + lb_dbs_data.min_sampling_rate = MICRO_FREQUENCY_MIN_SAMPLE_RATE; + } else { + /* For correct statistics, we need 10 ticks for each measure */ + lb_dbs_data.min_sampling_rate = MIN_SAMPLING_RATE_RATIO * + jiffies_to_usecs(10); + } + + /* Initialize arrays */ + ref_time = kzalloc(GFP_KERNEL, + num_possible_cpus() * sizeof(unsigned long)); + idle_avg = kzalloc(GFP_KERNEL, + num_possible_cpus() * sizeof(unsigned int)); + idle_hist = kzalloc(GFP_KERNEL, + num_possible_cpus() * sizeof(unsigned int *)); + for (i = 0; i < num_possible_cpus(); i++) + idle_hist[i] = kzalloc(GFP_KERNEL, + MAX_HIST * sizeof(unsigned int)); + + return cpufreq_register_governor(&cpufreq_gov_lab); +} + +static void __exit cpufreq_gov_dbs_exit(void) +{ + int i; + + if (!ref_time) + kfree(ref_time); + if (!idle_avg) + kfree(idle_avg); + if (!idle_hist) { + for (i = 0; i < num_possible_cpus(); i++) { + if (!idle_hist[i]) + kfree(idle_hist[i]); + } + kfree(idle_hist); + } + + cpufreq_unregister_governor(&cpufreq_gov_lab); +} + +MODULE_AUTHOR("Jonghwa Lee <jonghwa3.lee@samsung.com>"); +MODULE_AUTHOR("Lukasz Majewski <l.majewski@samsung.com>"); +MODULE_DESCRIPTION("'cpufreq_lab' - A dynamic cpufreq governor for " + "Legacy Application Boosting"); +MODULE_LICENSE("GPL"); + +#ifdef CONFIG_CPU_FREQ_DEFAULT_GOV_LAB +fs_initcall(cpufreq_gov_dbs_init); +#else +module_init(cpufreq_gov_dbs_init); +#endif +module_exit(cpufreq_gov_dbs_exit); diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index a22944c..050f28b 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -382,6 +382,9 @@ extern struct cpufreq_governor cpufreq_gov_ondemand; #elif defined(CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE) extern struct cpufreq_governor cpufreq_gov_conservative; #define CPUFREQ_DEFAULT_GOVERNOR (&cpufreq_gov_conservative) +#elif defined(CONFIG_CPU_FREQ_DEFAULT_GOV_LAB) +extern struct cpufreq_governor cpufreq_gov_lab; +#define CPUFREQ_DEFAULT_GOVERNOR (&cpufreq_gov_lab) #endif -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor. 2013-04-01 8:24 [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor Jonghwa Lee 2013-04-01 8:24 ` [RFC PATCH 1/2] cpuidle: Add idle enter/exit time stamp for notifying current idle state Jonghwa Lee 2013-04-01 8:24 ` [RFC PATCH 2/2] cpufreq: Introduce new cpufreq governor, LAB(Legacy Application Boost) Jonghwa Lee @ 2013-04-01 15:37 ` Viresh Kumar 2013-04-09 10:37 ` Lukasz Majewski 2 siblings, 1 reply; 26+ messages in thread From: Viresh Kumar @ 2013-04-01 15:37 UTC (permalink / raw) To: Jonghwa Lee Cc: Rafael J. Wysocki, linux-kernel, Linux PM list, cpufreq, MyungJoo Ham, Lukasz Majewski, Kyungmin Park, Chanwoo Choi, sw0312.kim, Marek Szyprowski On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee <jonghwa3.lee@samsung.com> wrote: > <<Purpose>> > One of the problem of ondemand is that it considers the most busy > cpu only while doesn't care how many cpu is in busy state at the > moment. This may results in unnecessary power consumption, and it'll > be critical for the system having limited power source. > > To get the best energy efficiency, LAB governor considers not only > idle time but also the number of idle cpus. It primarily focuses on > supplying adequate performance to users within the limited resource. > It checks the number of idle cpus then controls the maximum frequency > dynamically. And it also applies different frequency increasing level > depends on that information. In simple terms the less the number of > busy cpus, the better performance will be given. > In addition, stable system's power consumption in the busy state can > be achieved also with using LAB governor. This will help to manage and > estimate power consumption in certain system. Hi Jonghwa, First of all, i should accept that i haven't got to the minute details about your patch until now but have done a broad review of it. There are many things that i am concerned about: - I don't want an additional governor to be added to cpufreq unless there is a very very strong reason for it. See what happened to earlier attempts: https://lkml.org/lkml/2012/2/7/504 But it doesn't mean you can't get it in. :) - What the real logic behind your patchset: I haven't got it completely with your mails. So what you said is: - The lesser the number of busy cpus: you want to run at higher freqs - The more the number of busy cpus: you want to run at lower freqs But the basic idea i had about this stuff was: The more the number of busy cpus, the more loaded the system is, otherwise scheduler wouldn't have used so many cpus and so there is need to run at higher frequency rather than a lower one. Which would save power in a sense.. Finish work early and let most of the cpus enter idle state as early as possible. But with your solution we would run at lower frequencies and so these cpus will take longer to get into idle state again. This will really kill lot of power. Think about it. - In case you need some sort of support on this use case, why replicate ondemand governor again by creating another governor. I have had some hard time removing the amount of redundancy inside governors and you are again going towards that direction. Modifying ondemand governor for this response would be a better option. - You haven't rebased of latest code from linux-next :) -- viresh ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor. 2013-04-01 15:37 ` [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor Viresh Kumar @ 2013-04-09 10:37 ` Lukasz Majewski 2013-04-09 12:02 ` Viresh Kumar 2013-04-09 12:25 ` jonghwa3.lee 0 siblings, 2 replies; 26+ messages in thread From: Lukasz Majewski @ 2013-04-09 10:37 UTC (permalink / raw) To: Viresh Kumar Cc: Jonghwa Lee, Rafael J. Wysocki, linux-kernel, Linux PM list, cpufreq, MyungJoo Ham, Kyungmin Park, Chanwoo Choi, sw0312.kim, Marek Szyprowski Hi Viresh, First of all I'd like to apologize for a late response. Please find my comments below. > On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee > <jonghwa3.lee@samsung.com> wrote: > > <<Purpose>> > > One of the problem of ondemand is that it considers the most busy > > cpu only while doesn't care how many cpu is in busy state at the > > moment. This may results in unnecessary power consumption, and it'll > > be critical for the system having limited power source. > > > > To get the best energy efficiency, LAB governor considers not only > > idle time but also the number of idle cpus. It primarily focuses on > > supplying adequate performance to users within the limited resource. > > It checks the number of idle cpus then controls the maximum > > frequency dynamically. And it also applies different frequency > > increasing level depends on that information. In simple terms the > > less the number of busy cpus, the better performance will be given. > > In addition, stable system's power consumption in the busy state > > can be achieved also with using LAB governor. This will help to > > manage and estimate power consumption in certain system. > > Hi Jonghwa, > > First of all, i should accept that i haven't got to the minute details > about your > patch until now but have done a broad review of it. > > There are many things that i am concerned about: > - I don't want an additional governor to be added to cpufreq unless > there is a very very strong reason for it. See what happened to > earlier attempts: > > https://lkml.org/lkml/2012/2/7/504 > > But it doesn't mean you can't get it in. :) > > - What the real logic behind your patchset: I haven't got it > completely with your > mails. So what you said is: > > - The lesser the number of busy cpus: you want to run at higher > freqs > - The more the number of busy cpus: you want to run at lower freqs > > But the basic idea i had about this stuff was: The more the number of > busy cpus, the more loaded the system is, otherwise scheduler wouldn't > have used so many cpus and so there is need to run at higher frequency > rather than a lower one. Which would save power in a sense.. Finish > work early and let most of the cpus enter idle state as early as > possible. But with your solution we would run at lower frequencies > and so these cpus will take longer to get into idle state again. This > will really kill lot of power. Our approach is a bit different than cpufreq_ondemand one. Ondemand takes the per CPU idle time, then on that basis calculates per cpu load. The next step is to choose the highest load and then use this value to properly scale frequency. On the other hand LAB tries to model different behavior: As a first step we applied Vincent Guittot's "pack small tasks" [*] patch to improve "race to idle" behavior: http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks Afterwards, we decided to investigate different approach for power governing: Use the number of sleeping CPUs (not the maximal per-CPU load) to change frequency. We thereof depend on [*] to "pack" as many tasks to CPU as possible and allow other to sleep. On this basis we can increase (even overclock) frequency (when other CPUs sleep) to improve performance of the only running CPU. Contrary, when all cores are heavily loaded, we decided to reduce frequency by around 30%. With this approach user experience recution is still acceptable (with much less power consumption). When system is "idle" - only small background tasks are running, the frequency is reduced to minimum. To sum up: Different approach (number of idle CPUs) is used by us to control ondemand governor. We also heavily depend on [*] patch set. > > Think about it. > > - In case you need some sort of support on this use case, why > replicate ondemand governor again by creating another governor. I > have had some hard time removing the amount of redundancy inside > governors and you are again going towards that direction. Modifying > ondemand governor for this response would be a better option. We have only posted the "RFC" so, we are open for suggestions. At cpufreq_governor.c file the dbs_check_cpu function is responsible for calculating the maximal load (among CPUs). I think that we could also count the number of sleeping CPUs (as the average of time accumulated data). Then we could pass this data to newly written function (e.g. lab_check_cpu) defined at cpufreq_ondemand.c (and pointed to by gov_check_cpu). This would require change to dbs_check_cpu function and extending ONDEMAND governor by lab_check_cpu() function. > > - You haven't rebased of latest code from linux-next :) > We have posted this "RFC" patch mainly for discussion, and I think it fits its purpose :-). -- Best regards, Lukasz Majewski Samsung R&D Poland (SRPOL) | Linux Platform Group ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor. 2013-04-09 10:37 ` Lukasz Majewski @ 2013-04-09 12:02 ` Viresh Kumar 2013-04-09 16:44 ` Lukasz Majewski 2013-04-09 12:25 ` jonghwa3.lee 1 sibling, 1 reply; 26+ messages in thread From: Viresh Kumar @ 2013-04-09 12:02 UTC (permalink / raw) To: Lukasz Majewski Cc: Jonghwa Lee, Rafael J. Wysocki, linux-kernel, Linux PM list, cpufreq, MyungJoo Ham, Kyungmin Park, Chanwoo Choi, sw0312.kim, Marek Szyprowski, Vincent Guittot On 9 April 2013 16:07, Lukasz Majewski <l.majewski@samsung.com> wrote: >> On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee > Our approach is a bit different than cpufreq_ondemand one. Ondemand > takes the per CPU idle time, then on that basis calculates per cpu load. > The next step is to choose the highest load and then use this value to > properly scale frequency. > > On the other hand LAB tries to model different behavior: > > As a first step we applied Vincent Guittot's "pack small tasks" [*] > patch to improve "race to idle" behavior: > http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks Luckily he is part of my team :) http://www.linaro.org/linux-on-arm/meet-the-team/power-management BTW, he is using ondemand governor for all his work. > Afterwards, we decided to investigate different approach for power > governing: > > Use the number of sleeping CPUs (not the maximal per-CPU load) to > change frequency. We thereof depend on [*] to "pack" as many tasks to > CPU as possible and allow other to sleep. He packs only small tasks. And if there are many small tasks we are packing, then load must be high and so ondemand gov will increase freq. > Contrary, when all cores are heavily loaded, we decided to reduce > frequency by around 30%. With this approach user experience recution is > still acceptable (with much less power consumption). Don't know.. running many cpus at lower freq for long duration will probably take more power than running them at high freq for short duration and making system idle again. > We have posted this "RFC" patch mainly for discussion, and I think it > fits its purpose :-). Yes, no issues with your RFC idea.. its perfect.. @Vincent: Can you please follow this thread a bit and tell us what your views are? -- viresh ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor. 2013-04-09 12:02 ` Viresh Kumar @ 2013-04-09 16:44 ` Lukasz Majewski [not found] ` <CAKfTPtD6MK9ogq7mOijSxLSsH0n65Xra48XfRSB3DFs35GT=2g@mail.gmail.com> 0 siblings, 1 reply; 26+ messages in thread From: Lukasz Majewski @ 2013-04-09 16:44 UTC (permalink / raw) To: Viresh Kumar, Vincent Guittot Cc: Jonghwa Lee, Rafael J. Wysocki, linux-kernel, Linux PM list, cpufreq, MyungJoo Ham, Kyungmin Park, Chanwoo Choi, sw0312.kim, Marek Szyprowski Hi Viresh and Vincent, > On 9 April 2013 16:07, Lukasz Majewski <l.majewski@samsung.com> wrote: > >> On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee > > Our approach is a bit different than cpufreq_ondemand one. Ondemand > > takes the per CPU idle time, then on that basis calculates per cpu > > load. The next step is to choose the highest load and then use this > > value to properly scale frequency. > > > > On the other hand LAB tries to model different behavior: > > > > As a first step we applied Vincent Guittot's "pack small tasks" [*] > > patch to improve "race to idle" behavior: > > http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks > > Luckily he is part of my team :) > > http://www.linaro.org/linux-on-arm/meet-the-team/power-management > > BTW, he is using ondemand governor for all his work. > > > Afterwards, we decided to investigate different approach for power > > governing: > > > > Use the number of sleeping CPUs (not the maximal per-CPU load) to > > change frequency. We thereof depend on [*] to "pack" as many tasks > > to CPU as possible and allow other to sleep. > > He packs only small tasks. What's about packing not only small tasks? I will investigate the possibility to aggressively pack (even with a cost of performance degradation) as many tasks as possible to a single CPU. It seems a good idea for a power consumption reduction. > And if there are many small tasks we are > packing, then load must be high and so ondemand gov will increase > freq. This is of course true for "packing" all tasks to a single CPU. If we stay at the power consumption envelope, we can even overclock the frequency. But what if other - lets say 3 CPUs - are under heavy workload? Ondemand will switch frequency to maximum, and as Jonghwa pointed out this can cause dangerous temperature increase. > > > Contrary, when all cores are heavily loaded, we decided to reduce > > frequency by around 30%. With this approach user experience > > recution is still acceptable (with much less power consumption). > > Don't know.. running many cpus at lower freq for long duration will > probably take more power than running them at high freq for short > duration and making system idle again. > > > We have posted this "RFC" patch mainly for discussion, and I think > > it fits its purpose :-). > > Yes, no issues with your RFC idea.. its perfect.. > > @Vincent: Can you please follow this thread a bit and tell us what > your views are? > > -- > viresh -- Best regards, Lukasz Majewski Samsung R&D Poland (SRPOL) | Linux Platform Group ^ permalink raw reply [flat|nested] 26+ messages in thread
[parent not found: <CAKfTPtD6MK9ogq7mOijSxLSsH0n65Xra48XfRSB3DFs35GT=2g@mail.gmail.com>]
* Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor. [not found] ` <CAKfTPtD6MK9ogq7mOijSxLSsH0n65Xra48XfRSB3DFs35GT=2g@mail.gmail.com> @ 2013-04-10 6:56 ` Vincent Guittot 2013-04-10 8:44 ` Lukasz Majewski 1 sibling, 0 replies; 26+ messages in thread From: Vincent Guittot @ 2013-04-10 6:56 UTC (permalink / raw) To: Lukasz Majewski Cc: Viresh Kumar, Jonghwa Lee, Rafael J. Wysocki, linux-kernel, Linux PM list, cpufreq, MyungJoo Ham, Kyungmin Park, Chanwoo Choi, sw0312.kim, Marek Szyprowski On 9 April 2013 20:52, Vincent Guittot <vincent.guittot@linaro.org> wrote: > > > On Tuesday, 9 April 2013, Lukasz Majewski <l.majewski@samsung.com> wrote: >> Hi Viresh and Vincent, >> >>> On 9 April 2013 16:07, Lukasz Majewski <l.majewski@samsung.com> wrote: >>> >> On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee >>> > Our approach is a bit different than cpufreq_ondemand one. Ondemand >>> > takes the per CPU idle time, then on that basis calculates per cpu >>> > load. The next step is to choose the highest load and then use this >>> > value to properly scale frequency. >>> > >>> > On the other hand LAB tries to model different behavior: >>> > >>> > As a first step we applied Vincent Guittot's "pack small tasks" [*] >>> > patch to improve "race to idle" behavior: >>> > >>> > http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks >>> >>> Luckily he is part of my team :) >>> >>> http://www.linaro.org/linux-on-arm/meet-the-team/power-management >>> >>> BTW, he is using ondemand governor for all his work. >>> >>> > Afterwards, we decided to investigate different approach for power >>> > governing: >>> > >>> > Use the number of sleeping CPUs (not the maximal per-CPU load) to >>> > change frequency. We thereof depend on [*] to "pack" as many tasks >>> > to CPU as possible and allow other to sleep. >>> >>> He packs only small tasks. >> >> What's about packing not only small tasks? I will investigate the >> possibility to aggressively pack (even with a cost of performance >> degradation) as many tasks as possible to a single CPU. > > Hi Lukasz, > > I've got same comment on my current patch and I'm preparing a new version > that can pack tasks more agressively based on the same buddy mecanism. This > will be done at the cost of performance of course. > > > >> >> It seems a good idea for a power consumption reduction. > > In fact, it's not always true and depends several inputs like the number of > tasks that run simultaneously > > >> >>> And if there are many small tasks we are >>> packing, then load must be high and so ondemand gov will increase >>> freq. >> >> This is of course true for "packing" all tasks to a single CPU. If we >> stay at the power consumption envelope, we can even overclock the >> frequency. >> >> But what if other - lets say 3 CPUs - are under heavy workload? >> Ondemand will switch frequency to maximum, and as Jonghwa pointed out >> this can cause dangerous temperature increase. > > IIUC, your main concern is to stay in a power consumption budget to not over > heat and have to face the side effect of high temperature like a decrease of > power efficiency. So your governor modifies the max frequency based on the > number of running/idle CPU to have an almost stable power consumtpion ? > > Have you also looked at the power clamp driver that have similar target ? > > > Vincent > > >> >>> >>> > Contrary, when all cores are heavily loaded, we decided to reduce >>> > frequency by around 30%. With this approach user experience >>> > recution is still acceptable (with much less power consumption). >>> >>> Don't know.. running many cpus at lower freq for long duration will >>> probably take more power than running them at high freq for short >>> duration and making system idle again. >>> >>> > We have posted this "RFC" patch mainly for discussion, and I think >>> > it fits its purpose :-). >>> >>> Yes, no issues with your RFC idea.. its perfect.. >>> >>> @Vincent: Can you please follow this thread a bit and tell us what >>> your views are? >>> >>> -- >>> viresh >> >> >> >> -- >> Best regards, >> >> Lukasz Majewski >> >> Samsung R&D Poland (SRPOL) | Linux Platform Group >> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor. [not found] ` <CAKfTPtD6MK9ogq7mOijSxLSsH0n65Xra48XfRSB3DFs35GT=2g@mail.gmail.com> @ 2013-04-10 8:44 ` Lukasz Majewski 2013-04-10 8:44 ` Lukasz Majewski 1 sibling, 0 replies; 26+ messages in thread From: Lukasz Majewski @ 2013-04-10 8:44 UTC (permalink / raw) To: Vincent Guittot Cc: Viresh Kumar, Jonghwa Lee, Rafael J. Wysocki, linux-kernel, Linux PM list, cpufreq, MyungJoo Ham, Kyungmin Park, Chanwoo Choi, sw0312.kim, Marek Szyprowski Hi Vincent, > > > On Tuesday, 9 April 2013, Lukasz Majewski <l.majewski@samsung.com> > wrote: > > Hi Viresh and Vincent, > > > >> On 9 April 2013 16:07, Lukasz Majewski <l.majewski@samsung.com> > >> wrote: > >> >> On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee > >> > Our approach is a bit different than cpufreq_ondemand one. > >> > Ondemand takes the per CPU idle time, then on that basis > >> > calculates per cpu load. The next step is to choose the highest > >> > load and then use this value to properly scale frequency. > >> > > >> > On the other hand LAB tries to model different behavior: > >> > > >> > As a first step we applied Vincent Guittot's "pack small > >> > tasks" [*] patch to improve "race to idle" behavior: > >> > http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks > >> > >> Luckily he is part of my team :) > >> > >> http://www.linaro.org/linux-on-arm/meet-the-team/power-management > >> > >> BTW, he is using ondemand governor for all his work. > >> > >> > Afterwards, we decided to investigate different approach for > >> > power governing: > >> > > >> > Use the number of sleeping CPUs (not the maximal per-CPU load) to > >> > change frequency. We thereof depend on [*] to "pack" as many > >> > tasks to CPU as possible and allow other to sleep. > >> > >> He packs only small tasks. > > > > What's about packing not only small tasks? I will investigate the > > possibility to aggressively pack (even with a cost of performance > > degradation) as many tasks as possible to a single CPU. > > Hi Lukasz, > > I've got same comment on my current patch and I'm preparing a new > version that can pack tasks more agressively based on the same buddy > mecanism. This will be done at the cost of performance of course. Can you share your development tree? > > > > > > It seems a good idea for a power consumption reduction. > > In fact, it's not always true and depends several inputs like the > number of tasks that run simultaneously In my understanding, we can try to couple (affine) maximal number of task with a CPU. Performance shall decrease, but we will avoid costs of tasks migration. If I remember correctly, I've asked you about some testbench/test program for scheduler evaluation. I assume that nothing has changed and there isn't any "common" set of scheduler tests? > > > > >> And if there are many small tasks we are > >> packing, then load must be high and so ondemand gov will increase > >> freq. > > > > This is of course true for "packing" all tasks to a single CPU. If > > we stay at the power consumption envelope, we can even overclock the > > frequency. > > > > But what if other - lets say 3 CPUs - are under heavy workload? > > Ondemand will switch frequency to maximum, and as Jonghwa pointed > > out this can cause dangerous temperature increase. > > IIUC, your main concern is to stay in a power consumption budget to > not over heat and have to face the side effect of high temperature > like a decrease of power efficiency. So your governor modifies the > max frequency based on the number of running/idle CPU Yes, this is correct. > to have an > almost stable power consumtpion ? >From our observation it seems, that for 3 or 4 running CPUs under heavy load we see much more power consumption reduction. To put it in another way - ondemand would increase frequency to max for all 4 CPUs. On the other hand, if user experience drops to the acceptable level we can reduce power consumption. Reducing frequency and CPU voltage (by DVS) causes as a side effect, that temperature stays at acceptable level. > > Have you also looked at the power clamp driver that have similar > target ? I might be wrong here, but in my opinion the power clamp driver is a bit different: 1. It is dedicated to Intel SoCs, which provide special set of registers (i.e. MSR_PKG_Cx_RESIDENCY [*]), which forces a processor to enter certain C state for a given duration. Idle duration is calculated by per CPU set of high priority kthreads (which also program [*] registers). 2. ARM SoCs don't have such infrastructure, so we depend on SW here. Scheduler has to remove tasks from a particular CPU and "execute" on it the idle_task. Moreover at Exynos4 thermal control loop depends on SW, since we can only read SoC temperature via TMU (Thermal Management Unit) block. Correct me again, but it seems to me that on ARM we can use CPU hotplug (which as Tomas Glexner stated recently is going to be "refactored" :-) ) or "ask" scheduler to use smallest possible number of CPUs and enter C state for idling CPUs. > > > Vincent > > > > >> > >> > Contrary, when all cores are heavily loaded, we decided to reduce > >> > frequency by around 30%. With this approach user experience > >> > recution is still acceptable (with much less power consumption). > >> > >> Don't know.. running many cpus at lower freq for long duration will > >> probably take more power than running them at high freq for short > >> duration and making system idle again. > >> > >> > We have posted this "RFC" patch mainly for discussion, and I > >> > think it fits its purpose :-). > >> > >> Yes, no issues with your RFC idea.. its perfect.. > >> > >> @Vincent: Can you please follow this thread a bit and tell us what > >> your views are? > >> > >> -- > >> viresh > > > > > > > > -- > > Best regards, > > > > Lukasz Majewski > > > > Samsung R&D Poland (SRPOL) | Linux Platform Group > > -- Best regards, Lukasz Majewski Samsung R&D Poland (SRPOL) | Linux Platform Group ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor. @ 2013-04-10 8:44 ` Lukasz Majewski 0 siblings, 0 replies; 26+ messages in thread From: Lukasz Majewski @ 2013-04-10 8:44 UTC (permalink / raw) To: Vincent Guittot Cc: Viresh Kumar, Jonghwa Lee, Rafael J. Wysocki, linux-kernel, Linux PM list, cpufreq, MyungJoo Ham, Kyungmin Park, Chanwoo Choi, sw0312.kim, Marek Szyprowski Hi Vincent, > > > On Tuesday, 9 April 2013, Lukasz Majewski <l.majewski@samsung.com> > wrote: > > Hi Viresh and Vincent, > > > >> On 9 April 2013 16:07, Lukasz Majewski <l.majewski@samsung.com> > >> wrote: > >> >> On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee > >> > Our approach is a bit different than cpufreq_ondemand one. > >> > Ondemand takes the per CPU idle time, then on that basis > >> > calculates per cpu load. The next step is to choose the highest > >> > load and then use this value to properly scale frequency. > >> > > >> > On the other hand LAB tries to model different behavior: > >> > > >> > As a first step we applied Vincent Guittot's "pack small > >> > tasks" [*] patch to improve "race to idle" behavior: > >> > http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks > >> > >> Luckily he is part of my team :) > >> > >> http://www.linaro.org/linux-on-arm/meet-the-team/power-management > >> > >> BTW, he is using ondemand governor for all his work. > >> > >> > Afterwards, we decided to investigate different approach for > >> > power governing: > >> > > >> > Use the number of sleeping CPUs (not the maximal per-CPU load) to > >> > change frequency. We thereof depend on [*] to "pack" as many > >> > tasks to CPU as possible and allow other to sleep. > >> > >> He packs only small tasks. > > > > What's about packing not only small tasks? I will investigate the > > possibility to aggressively pack (even with a cost of performance > > degradation) as many tasks as possible to a single CPU. > > Hi Lukasz, > > I've got same comment on my current patch and I'm preparing a new > version that can pack tasks more agressively based on the same buddy > mecanism. This will be done at the cost of performance of course. Can you share your development tree? > > > > > > It seems a good idea for a power consumption reduction. > > In fact, it's not always true and depends several inputs like the > number of tasks that run simultaneously In my understanding, we can try to couple (affine) maximal number of task with a CPU. Performance shall decrease, but we will avoid costs of tasks migration. If I remember correctly, I've asked you about some testbench/test program for scheduler evaluation. I assume that nothing has changed and there isn't any "common" set of scheduler tests? > > > > >> And if there are many small tasks we are > >> packing, then load must be high and so ondemand gov will increase > >> freq. > > > > This is of course true for "packing" all tasks to a single CPU. If > > we stay at the power consumption envelope, we can even overclock the > > frequency. > > > > But what if other - lets say 3 CPUs - are under heavy workload? > > Ondemand will switch frequency to maximum, and as Jonghwa pointed > > out this can cause dangerous temperature increase. > > IIUC, your main concern is to stay in a power consumption budget to > not over heat and have to face the side effect of high temperature > like a decrease of power efficiency. So your governor modifies the > max frequency based on the number of running/idle CPU Yes, this is correct. > to have an > almost stable power consumtpion ? From our observation it seems, that for 3 or 4 running CPUs under heavy load we see much more power consumption reduction. To put it in another way - ondemand would increase frequency to max for all 4 CPUs. On the other hand, if user experience drops to the acceptable level we can reduce power consumption. Reducing frequency and CPU voltage (by DVS) causes as a side effect, that temperature stays at acceptable level. > > Have you also looked at the power clamp driver that have similar > target ? I might be wrong here, but in my opinion the power clamp driver is a bit different: 1. It is dedicated to Intel SoCs, which provide special set of registers (i.e. MSR_PKG_Cx_RESIDENCY [*]), which forces a processor to enter certain C state for a given duration. Idle duration is calculated by per CPU set of high priority kthreads (which also program [*] registers). 2. ARM SoCs don't have such infrastructure, so we depend on SW here. Scheduler has to remove tasks from a particular CPU and "execute" on it the idle_task. Moreover at Exynos4 thermal control loop depends on SW, since we can only read SoC temperature via TMU (Thermal Management Unit) block. Correct me again, but it seems to me that on ARM we can use CPU hotplug (which as Tomas Glexner stated recently is going to be "refactored" :-) ) or "ask" scheduler to use smallest possible number of CPUs and enter C state for idling CPUs. > > > Vincent > > > > >> > >> > Contrary, when all cores are heavily loaded, we decided to reduce > >> > frequency by around 30%. With this approach user experience > >> > recution is still acceptable (with much less power consumption). > >> > >> Don't know.. running many cpus at lower freq for long duration will > >> probably take more power than running them at high freq for short > >> duration and making system idle again. > >> > >> > We have posted this "RFC" patch mainly for discussion, and I > >> > think it fits its purpose :-). > >> > >> Yes, no issues with your RFC idea.. its perfect.. > >> > >> @Vincent: Can you please follow this thread a bit and tell us what > >> your views are? > >> > >> -- > >> viresh > > > > > > > > -- > > Best regards, > > > > Lukasz Majewski > > > > Samsung R&D Poland (SRPOL) | Linux Platform Group > > -- Best regards, Lukasz Majewski Samsung R&D Poland (SRPOL) | Linux Platform Group ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor. 2013-04-10 8:44 ` Lukasz Majewski (?) @ 2013-04-10 9:13 ` Vincent Guittot 2013-04-10 9:38 ` Lukasz Majewski 2013-04-23 7:28 ` Lukasz Majewski -1 siblings, 2 replies; 26+ messages in thread From: Vincent Guittot @ 2013-04-10 9:13 UTC (permalink / raw) To: Lukasz Majewski, Daniel Lezcano, Lorenzo Pieralisi Cc: Viresh Kumar, Jonghwa Lee, Rafael J. Wysocki, linux-kernel, Linux PM list, cpufreq, MyungJoo Ham, Kyungmin Park, Chanwoo Choi, sw0312.kim, Marek Szyprowski On 10 April 2013 10:44, Lukasz Majewski <l.majewski@samsung.com> wrote: > Hi Vincent, > >> >> >> On Tuesday, 9 April 2013, Lukasz Majewski <l.majewski@samsung.com> >> wrote: >> > Hi Viresh and Vincent, >> > >> >> On 9 April 2013 16:07, Lukasz Majewski <l.majewski@samsung.com> >> >> wrote: >> >> >> On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee >> >> > Our approach is a bit different than cpufreq_ondemand one. >> >> > Ondemand takes the per CPU idle time, then on that basis >> >> > calculates per cpu load. The next step is to choose the highest >> >> > load and then use this value to properly scale frequency. >> >> > >> >> > On the other hand LAB tries to model different behavior: >> >> > >> >> > As a first step we applied Vincent Guittot's "pack small >> >> > tasks" [*] patch to improve "race to idle" behavior: >> >> > http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks >> >> >> >> Luckily he is part of my team :) >> >> >> >> http://www.linaro.org/linux-on-arm/meet-the-team/power-management >> >> >> >> BTW, he is using ondemand governor for all his work. >> >> >> >> > Afterwards, we decided to investigate different approach for >> >> > power governing: >> >> > >> >> > Use the number of sleeping CPUs (not the maximal per-CPU load) to >> >> > change frequency. We thereof depend on [*] to "pack" as many >> >> > tasks to CPU as possible and allow other to sleep. >> >> >> >> He packs only small tasks. >> > >> > What's about packing not only small tasks? I will investigate the >> > possibility to aggressively pack (even with a cost of performance >> > degradation) as many tasks as possible to a single CPU. >> >> Hi Lukasz, >> >> I've got same comment on my current patch and I'm preparing a new >> version that can pack tasks more agressively based on the same buddy >> mecanism. This will be done at the cost of performance of course. > > Can you share your development tree? The dev is not finished yet but i will share it as soon as possible > >> >> >> > >> > It seems a good idea for a power consumption reduction. >> >> In fact, it's not always true and depends several inputs like the >> number of tasks that run simultaneously > > In my understanding, we can try to couple (affine) maximal number of > task with a CPU. Performance shall decrease, but we will avoid costs of > tasks migration. > > If I remember correctly, I've asked you about some testbench/test > program for scheduler evaluation. I assume that nothing has changed and > there isn't any "common" set of scheduler tests? There are a bunch of bench that are used to evaluate scheduler like hackbench, pgbench but they generally fills all CPU in order to test max performance. Are you looking for such kind of bench ? > >> >> > >> >> And if there are many small tasks we are >> >> packing, then load must be high and so ondemand gov will increase >> >> freq. >> > >> > This is of course true for "packing" all tasks to a single CPU. If >> > we stay at the power consumption envelope, we can even overclock the >> > frequency. >> > >> > But what if other - lets say 3 CPUs - are under heavy workload? >> > Ondemand will switch frequency to maximum, and as Jonghwa pointed >> > out this can cause dangerous temperature increase. >> >> IIUC, your main concern is to stay in a power consumption budget to >> not over heat and have to face the side effect of high temperature >> like a decrease of power efficiency. So your governor modifies the >> max frequency based on the number of running/idle CPU > Yes, this is correct. > >> to have an >> almost stable power consumtpion ? > > From our observation it seems, that for 3 or 4 running CPUs under heavy > load we see much more power consumption reduction. That's logic because you will reduce the voltage > > To put it in another way - ondemand would increase frequency to max for > all 4 CPUs. On the other hand, if user experience drops to the > acceptable level we can reduce power consumption. > > Reducing frequency and CPU voltage (by DVS) causes as a side effect, > that temperature stays at acceptable level. > >> >> Have you also looked at the power clamp driver that have similar >> target ? > > I might be wrong here, but in my opinion the power clamp driver is a bit > different: yes, it periodically forces the cluster in a low power state > > 1. It is dedicated to Intel SoCs, which provide special set of > registers (i.e. MSR_PKG_Cx_RESIDENCY [*]), which forces a processor to > enter certain C state for a given duration. Idle duration is calculated > by per CPU set of high priority kthreads (which also program [*] > registers). IIRC, a trial on ARM platform have been done by lorenzo and daniel. Lorenzo, Daniel, have you more information ? > > 2. ARM SoCs don't have such infrastructure, so we depend on SW here. > Scheduler has to remove tasks from a particular CPU and "execute" on > it the idle_task. > Moreover at Exynos4 thermal control loop depends on SW, since we can > only read SoC temperature via TMU (Thermal Management Unit) block. The idle duration is quite small and should not perturb normal behavior Vincent > > > Correct me again, but it seems to me that on ARM we can use CPU hotplug > (which as Tomas Glexner stated recently is going to be "refactored" :-) > ) or "ask" scheduler to use smallest possible number of CPUs and enter C > state for idling CPUs. > > > >> >> >> Vincent >> >> > >> >> >> >> > Contrary, when all cores are heavily loaded, we decided to reduce >> >> > frequency by around 30%. With this approach user experience >> >> > recution is still acceptable (with much less power consumption). >> >> >> >> Don't know.. running many cpus at lower freq for long duration will >> >> probably take more power than running them at high freq for short >> >> duration and making system idle again. >> >> >> >> > We have posted this "RFC" patch mainly for discussion, and I >> >> > think it fits its purpose :-). >> >> >> >> Yes, no issues with your RFC idea.. its perfect.. >> >> >> >> @Vincent: Can you please follow this thread a bit and tell us what >> >> your views are? >> >> >> >> -- >> >> viresh >> > >> > >> > >> > -- >> > Best regards, >> > >> > Lukasz Majewski >> > >> > Samsung R&D Poland (SRPOL) | Linux Platform Group >> > > > > -- > Best regards, > > Lukasz Majewski > > Samsung R&D Poland (SRPOL) | Linux Platform Group ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor. 2013-04-10 9:13 ` Vincent Guittot @ 2013-04-10 9:38 ` Lukasz Majewski 2013-04-10 10:45 ` Vincent Guittot 2013-04-23 7:28 ` Lukasz Majewski 1 sibling, 1 reply; 26+ messages in thread From: Lukasz Majewski @ 2013-04-10 9:38 UTC (permalink / raw) To: Vincent Guittot Cc: Daniel Lezcano, Lorenzo Pieralisi, Viresh Kumar, Jonghwa Lee, Rafael J. Wysocki, linux-kernel, Linux PM list, cpufreq, MyungJoo Ham, Kyungmin Park, Chanwoo Choi, sw0312.kim, Marek Szyprowski Hi Vincent, > On 10 April 2013 10:44, Lukasz Majewski <l.majewski@samsung.com> > wrote: > > Hi Vincent, > > > >> > >> > >> On Tuesday, 9 April 2013, Lukasz Majewski <l.majewski@samsung.com> > >> wrote: > >> > Hi Viresh and Vincent, > >> > > >> >> On 9 April 2013 16:07, Lukasz Majewski <l.majewski@samsung.com> > >> >> wrote: > >> >> >> On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee > >> >> > Our approach is a bit different than cpufreq_ondemand one. > >> >> > Ondemand takes the per CPU idle time, then on that basis > >> >> > calculates per cpu load. The next step is to choose the > >> >> > highest load and then use this value to properly scale > >> >> > frequency. > >> >> > > >> >> > On the other hand LAB tries to model different behavior: > >> >> > > >> >> > As a first step we applied Vincent Guittot's "pack small > >> >> > tasks" [*] patch to improve "race to idle" behavior: > >> >> > http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks > >> >> > >> >> Luckily he is part of my team :) > >> >> > >> >> http://www.linaro.org/linux-on-arm/meet-the-team/power-management > >> >> > >> >> BTW, he is using ondemand governor for all his work. > >> >> > >> >> > Afterwards, we decided to investigate different approach for > >> >> > power governing: > >> >> > > >> >> > Use the number of sleeping CPUs (not the maximal per-CPU > >> >> > load) to change frequency. We thereof depend on [*] to "pack" > >> >> > as many tasks to CPU as possible and allow other to sleep. > >> >> > >> >> He packs only small tasks. > >> > > >> > What's about packing not only small tasks? I will investigate the > >> > possibility to aggressively pack (even with a cost of performance > >> > degradation) as many tasks as possible to a single CPU. > >> > >> Hi Lukasz, > >> > >> I've got same comment on my current patch and I'm preparing a new > >> version that can pack tasks more agressively based on the same > >> buddy mecanism. This will be done at the cost of performance of > >> course. > > > > Can you share your development tree? > > The dev is not finished yet but i will share it as soon as possible Ok > > > > >> > >> > >> > > >> > It seems a good idea for a power consumption reduction. > >> > >> In fact, it's not always true and depends several inputs like the > >> number of tasks that run simultaneously > > > > In my understanding, we can try to couple (affine) maximal number of > > task with a CPU. Performance shall decrease, but we will avoid > > costs of tasks migration. > > > > If I remember correctly, I've asked you about some testbench/test > > program for scheduler evaluation. I assume that nothing has changed > > and there isn't any "common" set of scheduler tests? > > There are a bunch of bench that are used to evaluate scheduler like > hackbench, pgbench but they generally fills all CPU in order to test > max performance. Are you looking for such kind of bench ? I'd rather see a bit different set of tests - something similar to "cyclic" tests for PREEMPT_RT patch. For sched work it would be welcome to spawn a lot of processes with different duration and workload. And on this basis observe if e.g. 2 or 3 processors are idle. > > > > >> > >> > > >> >> And if there are many small tasks we are > >> >> packing, then load must be high and so ondemand gov will > >> >> increase freq. > >> > > >> > This is of course true for "packing" all tasks to a single CPU. > >> > If we stay at the power consumption envelope, we can even > >> > overclock the frequency. > >> > > >> > But what if other - lets say 3 CPUs - are under heavy workload? > >> > Ondemand will switch frequency to maximum, and as Jonghwa pointed > >> > out this can cause dangerous temperature increase. > >> > >> IIUC, your main concern is to stay in a power consumption budget to > >> not over heat and have to face the side effect of high temperature > >> like a decrease of power efficiency. So your governor modifies the > >> max frequency based on the number of running/idle CPU > > Yes, this is correct. > > > >> to have an > >> almost stable power consumtpion ? > > > > From our observation it seems, that for 3 or 4 running CPUs under > > heavy load we see much more power consumption reduction. > > That's logic because you will reduce the voltage > > > > > To put it in another way - ondemand would increase frequency to max > > for all 4 CPUs. On the other hand, if user experience drops to the > > acceptable level we can reduce power consumption. > > > > Reducing frequency and CPU voltage (by DVS) causes as a side effect, > > that temperature stays at acceptable level. > > > >> > >> Have you also looked at the power clamp driver that have similar > >> target ? > > > > I might be wrong here, but in my opinion the power clamp driver is > > a bit different: > > yes, it periodically forces the cluster in a low power state > > > > > 1. It is dedicated to Intel SoCs, which provide special set of > > registers (i.e. MSR_PKG_Cx_RESIDENCY [*]), which forces a processor > > to enter certain C state for a given duration. Idle duration is > > calculated by per CPU set of high priority kthreads (which also > > program [*] registers). > > IIRC, a trial on ARM platform have been done by lorenzo and daniel. > Lorenzo, Daniel, have you more information ? More information would be welcome :-) > > > > > 2. ARM SoCs don't have such infrastructure, so we depend on SW here. > > Scheduler has to remove tasks from a particular CPU and "execute" on > > it the idle_task. > > Moreover at Exynos4 thermal control loop depends on SW, since we can > > only read SoC temperature via TMU (Thermal Management Unit) block. > > The idle duration is quite small and should not perturb normal > behavior What do you mean by "small idle duration"? You think about exact time needed to enter idle state (ARM's WFI) or the time in which CPU is idle. > > Vincent > > > > > > Correct me again, but it seems to me that on ARM we can use CPU > > hotplug (which as Tomas Glexner stated recently is going to be > > "refactored" :-) ) or "ask" scheduler to use smallest possible > > number of CPUs and enter C state for idling CPUs. > > > > > > > >> > >> > >> Vincent > >> > >> > > >> >> > >> >> > Contrary, when all cores are heavily loaded, we decided to > >> >> > reduce frequency by around 30%. With this approach user > >> >> > experience recution is still acceptable (with much less power > >> >> > consumption). > >> >> > >> >> Don't know.. running many cpus at lower freq for long duration > >> >> will probably take more power than running them at high freq > >> >> for short duration and making system idle again. > >> >> > >> >> > We have posted this "RFC" patch mainly for discussion, and I > >> >> > think it fits its purpose :-). > >> >> > >> >> Yes, no issues with your RFC idea.. its perfect.. > >> >> > >> >> @Vincent: Can you please follow this thread a bit and tell us > >> >> what your views are? > >> >> > >> >> -- > >> >> viresh > >> > > >> > > >> > > >> > -- > >> > Best regards, > >> > > >> > Lukasz Majewski > >> > > >> > Samsung R&D Poland (SRPOL) | Linux Platform Group > >> > > > > > > > -- > > Best regards, > > > > Lukasz Majewski > > > > Samsung R&D Poland (SRPOL) | Linux Platform Group -- Best regards, Lukasz Majewski Samsung R&D Poland (SRPOL) | Linux Platform Group ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor. 2013-04-10 9:38 ` Lukasz Majewski @ 2013-04-10 10:45 ` Vincent Guittot 0 siblings, 0 replies; 26+ messages in thread From: Vincent Guittot @ 2013-04-10 10:45 UTC (permalink / raw) To: Lukasz Majewski, sanjay rawat Cc: Daniel Lezcano, Lorenzo Pieralisi, Viresh Kumar, Jonghwa Lee, Rafael J. Wysocki, linux-kernel, Linux PM list, cpufreq, MyungJoo Ham, Kyungmin Park, Chanwoo Choi, sw0312.kim, Marek Szyprowski On 10 April 2013 11:38, Lukasz Majewski <l.majewski@samsung.com> wrote: > Hi Vincent, > >> On 10 April 2013 10:44, Lukasz Majewski <l.majewski@samsung.com> >> wrote: >> > Hi Vincent, >> > >> >> >> >> >> >> On Tuesday, 9 April 2013, Lukasz Majewski <l.majewski@samsung.com> >> >> wrote: >> >> > Hi Viresh and Vincent, >> >> > >> >> >> On 9 April 2013 16:07, Lukasz Majewski <l.majewski@samsung.com> >> >> >> wrote: >> >> >> >> On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee >> >> >> > Our approach is a bit different than cpufreq_ondemand one. >> >> >> > Ondemand takes the per CPU idle time, then on that basis >> >> >> > calculates per cpu load. The next step is to choose the >> >> >> > highest load and then use this value to properly scale >> >> >> > frequency. >> >> >> > >> >> >> > On the other hand LAB tries to model different behavior: >> >> >> > >> >> >> > As a first step we applied Vincent Guittot's "pack small >> >> >> > tasks" [*] patch to improve "race to idle" behavior: >> >> >> > http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks >> >> >> >> >> >> Luckily he is part of my team :) >> >> >> >> >> >> http://www.linaro.org/linux-on-arm/meet-the-team/power-management >> >> >> >> >> >> BTW, he is using ondemand governor for all his work. >> >> >> >> >> >> > Afterwards, we decided to investigate different approach for >> >> >> > power governing: >> >> >> > >> >> >> > Use the number of sleeping CPUs (not the maximal per-CPU >> >> >> > load) to change frequency. We thereof depend on [*] to "pack" >> >> >> > as many tasks to CPU as possible and allow other to sleep. >> >> >> >> >> >> He packs only small tasks. >> >> > >> >> > What's about packing not only small tasks? I will investigate the >> >> > possibility to aggressively pack (even with a cost of performance >> >> > degradation) as many tasks as possible to a single CPU. >> >> >> >> Hi Lukasz, >> >> >> >> I've got same comment on my current patch and I'm preparing a new >> >> version that can pack tasks more agressively based on the same >> >> buddy mecanism. This will be done at the cost of performance of >> >> course. >> > >> > Can you share your development tree? >> >> The dev is not finished yet but i will share it as soon as possible > > Ok > >> >> > >> >> >> >> >> >> > >> >> > It seems a good idea for a power consumption reduction. >> >> >> >> In fact, it's not always true and depends several inputs like the >> >> number of tasks that run simultaneously >> > >> > In my understanding, we can try to couple (affine) maximal number of >> > task with a CPU. Performance shall decrease, but we will avoid >> > costs of tasks migration. >> > >> > If I remember correctly, I've asked you about some testbench/test >> > program for scheduler evaluation. I assume that nothing has changed >> > and there isn't any "common" set of scheduler tests? >> >> There are a bunch of bench that are used to evaluate scheduler like >> hackbench, pgbench but they generally fills all CPU in order to test >> max performance. Are you looking for such kind of bench ? > > I'd rather see a bit different set of tests - something similar to > "cyclic" tests for PREEMPT_RT patch. > > For sched work it would be welcome to spawn a lot of processes with > different duration and workload. And on this basis observe if e.g. 2 or > 3 processors are idle. sanjay is working on something like that: https://git.linaro.org/gitweb?p=people/sanjayrawat/cyclicTest.git;a=shortlog;h=refs/heads/master > >> >> > >> >> >> >> > >> >> >> And if there are many small tasks we are >> >> >> packing, then load must be high and so ondemand gov will >> >> >> increase freq. >> >> > >> >> > This is of course true for "packing" all tasks to a single CPU. >> >> > If we stay at the power consumption envelope, we can even >> >> > overclock the frequency. >> >> > >> >> > But what if other - lets say 3 CPUs - are under heavy workload? >> >> > Ondemand will switch frequency to maximum, and as Jonghwa pointed >> >> > out this can cause dangerous temperature increase. >> >> >> >> IIUC, your main concern is to stay in a power consumption budget to >> >> not over heat and have to face the side effect of high temperature >> >> like a decrease of power efficiency. So your governor modifies the >> >> max frequency based on the number of running/idle CPU >> > Yes, this is correct. >> > >> >> to have an >> >> almost stable power consumtpion ? >> > >> > From our observation it seems, that for 3 or 4 running CPUs under >> > heavy load we see much more power consumption reduction. >> >> That's logic because you will reduce the voltage >> >> > >> > To put it in another way - ondemand would increase frequency to max >> > for all 4 CPUs. On the other hand, if user experience drops to the >> > acceptable level we can reduce power consumption. >> > >> > Reducing frequency and CPU voltage (by DVS) causes as a side effect, >> > that temperature stays at acceptable level. >> > >> >> >> >> Have you also looked at the power clamp driver that have similar >> >> target ? >> > >> > I might be wrong here, but in my opinion the power clamp driver is >> > a bit different: >> >> yes, it periodically forces the cluster in a low power state >> >> > >> > 1. It is dedicated to Intel SoCs, which provide special set of >> > registers (i.e. MSR_PKG_Cx_RESIDENCY [*]), which forces a processor >> > to enter certain C state for a given duration. Idle duration is >> > calculated by per CPU set of high priority kthreads (which also >> > program [*] registers). >> >> IIRC, a trial on ARM platform have been done by lorenzo and daniel. >> Lorenzo, Daniel, have you more information ? > > More information would be welcome :-) > >> >> > >> > 2. ARM SoCs don't have such infrastructure, so we depend on SW here. >> > Scheduler has to remove tasks from a particular CPU and "execute" on >> > it the idle_task. >> > Moreover at Exynos4 thermal control loop depends on SW, since we can >> > only read SoC temperature via TMU (Thermal Management Unit) block. >> >> The idle duration is quite small and should not perturb normal >> behavior > > What do you mean by "small idle duration"? You think about exact time > needed to enter idle state (ARM's WFI) or the time in which CPU is idle. The time in which CPUs are idle Vincent > >> >> Vincent >> > >> > >> > Correct me again, but it seems to me that on ARM we can use CPU >> > hotplug (which as Tomas Glexner stated recently is going to be >> > "refactored" :-) ) or "ask" scheduler to use smallest possible >> > number of CPUs and enter C state for idling CPUs. >> > >> > >> > >> >> >> >> >> >> Vincent >> >> >> >> > >> >> >> >> >> >> > Contrary, when all cores are heavily loaded, we decided to >> >> >> > reduce frequency by around 30%. With this approach user >> >> >> > experience recution is still acceptable (with much less power >> >> >> > consumption). >> >> >> >> >> >> Don't know.. running many cpus at lower freq for long duration >> >> >> will probably take more power than running them at high freq >> >> >> for short duration and making system idle again. >> >> >> >> >> >> > We have posted this "RFC" patch mainly for discussion, and I >> >> >> > think it fits its purpose :-). >> >> >> >> >> >> Yes, no issues with your RFC idea.. its perfect.. >> >> >> >> >> >> @Vincent: Can you please follow this thread a bit and tell us >> >> >> what your views are? >> >> >> >> >> >> -- >> >> >> viresh >> >> > >> >> > >> >> > >> >> > -- >> >> > Best regards, >> >> > >> >> > Lukasz Majewski >> >> > >> >> > Samsung R&D Poland (SRPOL) | Linux Platform Group >> >> > >> > >> > >> > -- >> > Best regards, >> > >> > Lukasz Majewski >> > >> > Samsung R&D Poland (SRPOL) | Linux Platform Group > > > > -- > Best regards, > > Lukasz Majewski > > Samsung R&D Poland (SRPOL) | Linux Platform Group ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor. 2013-04-10 9:13 ` Vincent Guittot 2013-04-10 9:38 ` Lukasz Majewski @ 2013-04-23 7:28 ` Lukasz Majewski 2013-04-23 7:53 ` Vincent Guittot 1 sibling, 1 reply; 26+ messages in thread From: Lukasz Majewski @ 2013-04-23 7:28 UTC (permalink / raw) To: Vincent Guittot Cc: Daniel Lezcano, Lorenzo Pieralisi, Viresh Kumar, Jonghwa Lee, Linux PM list, cpufreq, MyungJoo Ham, Kyungmin Park, Marek Szyprowski Hi Vincent, > >> Hi Lukasz, > >> > >> I've got same comment on my current patch and I'm preparing a new > >> version that can pack tasks more agressively based on the same > >> buddy mecanism. This will be done at the cost of performance of > >> course. > > > > Can you share your development tree? > > The dev is not finished yet but i will share it as soon as possible May I ask how the work with "more aggressive tasks packing" is going? -- Best regards, Lukasz Majewski Samsung R&D Poland (SRPOL) | Linux Platform Group ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor. 2013-04-23 7:28 ` Lukasz Majewski @ 2013-04-23 7:53 ` Vincent Guittot 0 siblings, 0 replies; 26+ messages in thread From: Vincent Guittot @ 2013-04-23 7:53 UTC (permalink / raw) To: Lukasz Majewski Cc: Daniel Lezcano, Lorenzo Pieralisi, Viresh Kumar, Jonghwa Lee, Linux PM list, cpufreq, MyungJoo Ham, Kyungmin Park, Marek Szyprowski On 23 April 2013 09:28, Lukasz Majewski <l.majewski@samsung.com> wrote: > Hi Vincent, > >> >> Hi Lukasz, >> >> >> >> I've got same comment on my current patch and I'm preparing a new >> >> version that can pack tasks more agressively based on the same >> >> buddy mecanism. This will be done at the cost of performance of >> >> course. >> > >> > Can you share your development tree? >> >> The dev is not finished yet but i will share it as soon as possible > > May I ask how the work with "more aggressive tasks packing" is going? Hi Lukasz, I should be able to send a proposal by the end of this week. I will add you in the list Regards, Vincent > > -- > Best regards, > > Lukasz Majewski > > Samsung R&D Poland (SRPOL) | Linux Platform Group ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor. 2013-04-10 8:44 ` Lukasz Majewski (?) (?) @ 2013-04-10 10:14 ` Lorenzo Pieralisi -1 siblings, 0 replies; 26+ messages in thread From: Lorenzo Pieralisi @ 2013-04-10 10:14 UTC (permalink / raw) To: Lukasz Majewski Cc: Vincent Guittot, Viresh Kumar, Jonghwa Lee, Rafael J. Wysocki, linux-kernel, Linux PM list, cpufreq, MyungJoo Ham, Kyungmin Park, Chanwoo Choi, sw0312.kim, Marek Szyprowski On Wed, Apr 10, 2013 at 09:44:52AM +0100, Lukasz Majewski wrote: [...] > > Have you also looked at the power clamp driver that have similar > > target ? > > I might be wrong here, but in my opinion the power clamp driver is a bit > different: > > 1. It is dedicated to Intel SoCs, which provide special set of > registers (i.e. MSR_PKG_Cx_RESIDENCY [*]), which forces a processor to > enter certain C state for a given duration. Idle duration is calculated > by per CPU set of high priority kthreads (which also program [*] > registers). > Those registers are used for compensation (ie user asked a given idle ratio but HW stats show a mismatch) and they are not "programmed" they are just read. That code is Intel specific but it can be easily ported to ARM, I did that and most of the code is common with zero dependency on the architecture. > 2. ARM SoCs don't have such infrastructure, so we depend on SW here. Well, it is true that most of the SoCs I am working on do not have a programming interface to monitor C-state residency, granted, this is a problem. If those stats can be retrieved somehow (I did that on our TC2 platform) then power clamp can be used on ARM with minor modifications. Lorenzo ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor. 2013-04-09 10:37 ` Lukasz Majewski 2013-04-09 12:02 ` Viresh Kumar @ 2013-04-09 12:25 ` jonghwa3.lee 1 sibling, 0 replies; 26+ messages in thread From: jonghwa3.lee @ 2013-04-09 12:25 UTC (permalink / raw) To: Lukasz Majewski Cc: Viresh Kumar, Rafael J. Wysocki, linux-kernel, Linux PM list, cpufreq, MyungJoo Ham, Kyungmin Park, Chanwoo Choi, sw0312.kim, Marek Szyprowski Hi, sorry for my late reply. I just want to add comment to assist Lukasz's. I put my comments below of Lukasz's. On 2013년 04월 09일 19:37, Lukasz Majewski wrote: > Hi Viresh, > > First of all I'd like to apologize for a late response. > Please find my comments below. > >> On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee >> <jonghwa3.lee@samsung.com> wrote: >>> <<Purpose>> >>> One of the problem of ondemand is that it considers the most busy >>> cpu only while doesn't care how many cpu is in busy state at the >>> moment. This may results in unnecessary power consumption, and it'll >>> be critical for the system having limited power source. >>> >>> To get the best energy efficiency, LAB governor considers not only >>> idle time but also the number of idle cpus. It primarily focuses on >>> supplying adequate performance to users within the limited resource. >>> It checks the number of idle cpus then controls the maximum >>> frequency dynamically. And it also applies different frequency >>> increasing level depends on that information. In simple terms the >>> less the number of busy cpus, the better performance will be given. >>> In addition, stable system's power consumption in the busy state >>> can be achieved also with using LAB governor. This will help to >>> manage and estimate power consumption in certain system. >> >> Hi Jonghwa, >> >> First of all, i should accept that i haven't got to the minute details >> about your >> patch until now but have done a broad review of it. >> >> There are many things that i am concerned about: >> - I don't want an additional governor to be added to cpufreq unless >> there is a very very strong reason for it. See what happened to >> earlier attempts: >> >> https://lkml.org/lkml/2012/2/7/504 >> >> But it doesn't mean you can't get it in. :) >> >> - What the real logic behind your patchset: I haven't got it >> completely with your >> mails. So what you said is: >> >> - The lesser the number of busy cpus: you want to run at higher >> freqs >> - The more the number of busy cpus: you want to run at lower freqs >> >> But the basic idea i had about this stuff was: The more the number of >> busy cpus, the more loaded the system is, otherwise scheduler wouldn't >> have used so many cpus and so there is need to run at higher frequency >> rather than a lower one. Which would save power in a sense.. Finish >> work early and let most of the cpus enter idle state as early as >> possible. But with your solution we would run at lower frequencies >> and so these cpus will take longer to get into idle state again. This >> will really kill lot of power. > > Our approach is a bit different than cpufreq_ondemand one. Ondemand > takes the per CPU idle time, then on that basis calculates per cpu load. > The next step is to choose the highest load and then use this value to > properly scale frequency. > > On the other hand LAB tries to model different behavior: > > As a first step we applied Vincent Guittot's "pack small tasks" [*] > patch to improve "race to idle" behavior: > http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks > > Afterwards, we decided to investigate different approach for power > governing: > > Use the number of sleeping CPUs (not the maximal per-CPU load) to > change frequency. We thereof depend on [*] to "pack" as many tasks to > CPU as possible and allow other to sleep. > On this basis we can increase (even overclock) frequency (when other > CPUs sleep) to improve performance of the only running CPU. > > Contrary, when all cores are heavily loaded, we decided to reduce > frequency by around 30%. With this approach user experience recution is > still acceptable (with much less power consumption). > When system is "idle" - only small background tasks are running, the > frequency is reduced to minimum. > > To sum up: > > Different approach (number of idle CPUs) is used by us to control > ondemand governor. We also heavily depend on [*] patch set. > In additions, it is hard to say just letting high performance to busy system is the best way for reducing the power consumption. Yes, as like Viresh said, we can save the time of busy working, but that is not always perfect solution. If we push all CPUs to keep high performance as much as they can, the temperature will increase rapidly. In my test, on-demand governor reached the thermal limits frequently, while lab governor didn't. And the high temperature also effects the power consumption increasing. I got a rough result which has 10% differences in power consumption between 20% temperature differed conditions. <<Consumed power with maximum frequency at different temperature>> Temperature Power consumption(mWh) Loss(%) 65'C 53.89 Base 80'C 59.88 10 So, to reduce the power consumption, it looks we have to give our cares more to avoid the situation where system goes to high temperature. It is more meaningful for the mobile environment. In mobile devices, high temperature will also affect user's experience badly and can't be decreased easily. Maybe temperature problem is not the duty of cpufreq governor. However, if we just take a look at power consumption gain, we can find the right of the this governor either. <<Consumed power during the web browsing with foreground movie play>> Governor Power consumption(mWh) Percentage(%) ondemand 70.28 100 lab 55.14 78 ===> There is almost 22% gain for power consumption. <<Results of various performance test>> ## v8 benchmark test (http://v8.googlecode.com) Governor score Percentage(%) ondemand 1736.75 100 lab 1576.75 91 ===> 10% performance loss. ## Dhrystone Governor dhrystone Percentage(%) ondemand 2362204 100 lab 2197802 93 ===> 7% performance loss. ## Egypt GLbenchmark (http://www.glbenchmark.com) Governor Frames FPS ondemand 5612 49.65 lab 5559 49.19 ===> Almost no differences. (Above whole tests were tested on the EXYNOS4412 SoC based board.) Thanks, Jonghwa >> >> Think about it. >> >> - In case you need some sort of support on this use case, why >> replicate ondemand governor again by creating another governor. I >> have had some hard time removing the amount of redundancy inside >> governors and you are again going towards that direction. Modifying >> ondemand governor for this response would be a better option. > > We have only posted the "RFC" so, we are open for suggestions. > > At cpufreq_governor.c file the dbs_check_cpu function is responsible > for calculating the maximal load (among CPUs). I think that we could > also count the number of sleeping CPUs (as the average of time > accumulated data). Then we could pass this data to newly written > function (e.g. lab_check_cpu) defined at cpufreq_ondemand.c (and > pointed to by gov_check_cpu). > > This would require change to dbs_check_cpu function and extending > ONDEMAND governor by lab_check_cpu() function. > >> >> - You haven't rebased of latest code from linux-next :) >> > > We have posted this "RFC" patch mainly for discussion, and I think it > fits its purpose :-). > > > ^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2013-04-23 7:53 UTC | newest] Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2013-04-01 8:24 [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor Jonghwa Lee 2013-04-01 8:24 ` [RFC PATCH 1/2] cpuidle: Add idle enter/exit time stamp for notifying current idle state Jonghwa Lee 2013-04-02 5:00 ` Daniel Lezcano 2013-04-02 5:00 ` Daniel Lezcano 2013-04-02 6:17 ` jonghwa3.lee 2013-04-02 7:34 ` Daniel Lezcano 2013-04-02 9:37 ` jonghwa3.lee 2013-04-02 10:08 ` Daniel Lezcano 2013-04-02 11:07 ` jonghwa3.lee 2013-04-02 11:18 ` Daniel Lezcano 2013-04-03 6:10 ` jonghwa3.lee 2013-04-01 8:24 ` [RFC PATCH 2/2] cpufreq: Introduce new cpufreq governor, LAB(Legacy Application Boost) Jonghwa Lee 2013-04-01 15:37 ` [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor Viresh Kumar 2013-04-09 10:37 ` Lukasz Majewski 2013-04-09 12:02 ` Viresh Kumar 2013-04-09 16:44 ` Lukasz Majewski [not found] ` <CAKfTPtD6MK9ogq7mOijSxLSsH0n65Xra48XfRSB3DFs35GT=2g@mail.gmail.com> 2013-04-10 6:56 ` Vincent Guittot 2013-04-10 8:44 ` Lukasz Majewski 2013-04-10 8:44 ` Lukasz Majewski 2013-04-10 9:13 ` Vincent Guittot 2013-04-10 9:38 ` Lukasz Majewski 2013-04-10 10:45 ` Vincent Guittot 2013-04-23 7:28 ` Lukasz Majewski 2013-04-23 7:53 ` Vincent Guittot 2013-04-10 10:14 ` Lorenzo Pieralisi 2013-04-09 12:25 ` jonghwa3.lee
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.