linux-kernel.vger.kernel.org archive mirror
* [PATCH 1/1] sched/eas: introduce system-wide overutil indicator
@ 2019-09-19  7:20 YT Chang
  2019-09-19  8:00 ` Vincent Guittot
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: YT Chang @ 2019-09-19  7:20 UTC
  To: Peter Zijlstra, Matthias Brugger
  Cc: wsd_upstream, linux-kernel, linux-arm-kernel, linux-mediatek, YT Chang

When the system is overutilization, the load-balance crossing
clusters will be triggered and scheduler will not use energy
aware scheduling to choose CPUs.

The overutilization means the loading of  ANY CPUs
exceeds threshold (80%).

However, a single heavy task or a while(1) program running on a
highest-capacity CPU is still enough to trigger overutilization, so
the system stops using energy aware scheduling.

To avoid this, introduce a system-wide over-utilization indicator to
trigger load balancing across clusters.

The policy is:
	The loading of "ALL CPUs in the highest capacity"
						exceeds threshold(80%) or
	The loading of "Any CPUs not in the highest capacity"
						exceed threshold(80%)
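
As a rough illustration of the policy above, here is a standalone sketch that is not kernel code. The struct, the function names and the 1280 margin (1024/1280 = 80%) are assumptions made for the example, and "ALL CPUs in the highest capacity" is read here as the summed utilization of those CPUs against their summed capacity, which is only one possible interpretation.

```c
#include <stdbool.h>
#include <stddef.h>

#define CAPACITY_SCALE	1024UL
#define MARGIN		1280UL	/* assumed: 1024/1280 = 80% threshold */

/* Hypothetical per-CPU snapshot; not a kernel structure. */
struct cpu_sample {
	unsigned long capacity;	/* original capacity, 0..1024 */
	unsigned long util;	/* utilization on the same scale */
};

/* True when util is above 80% of capacity. */
static bool over_threshold(unsigned long util, unsigned long capacity)
{
	return capacity * CAPACITY_SCALE < util * MARGIN;
}

/*
 * Overutilized if any CPU below the highest capacity is above 80%,
 * or if the highest-capacity CPUs taken together are above 80%.
 */
static bool system_overutilized(const struct cpu_sample *cpus, size_t n)
{
	unsigned long max_cap = 0, big_cap = 0, big_util = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (cpus[i].capacity > max_cap)
			max_cap = cpus[i].capacity;

	for (i = 0; i < n; i++) {
		if (cpus[i].capacity < max_cap) {
			if (over_threshold(cpus[i].util, cpus[i].capacity))
				return true;
		} else {
			big_cap += cpus[i].capacity;
			big_util += cpus[i].util;
		}
	}

	return over_threshold(big_util, big_cap);
}
```

With two big CPUs (capacity 1024) and two little CPUs (capacity 512), a single task saturating one big CPU leaves this sketch's indicator clear, while one little CPU at util 450 sets it.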

Signed-off-by: YT Chang <yt.chang@mediatek.com>
---
 kernel/sched/fair.c | 76 +++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 65 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 036be95..f4c3d70 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5182,10 +5182,71 @@ static inline bool cpu_overutilized(int cpu)
 static inline void update_overutilized_status(struct rq *rq)
 {
 	if (!READ_ONCE(rq->rd->overutilized) && cpu_overutilized(rq->cpu)) {
-		WRITE_ONCE(rq->rd->overutilized, SG_OVERUTILIZED);
-		trace_sched_overutilized_tp(rq->rd, SG_OVERUTILIZED);
+		if (capacity_orig_of(cpu_of(rq)) < rq->rd->max_cpu_capacity) {
+			WRITE_ONCE(rq->rd->overutilized, SG_OVERUTILIZED);
+			trace_sched_overutilized_tp(rq->rd, SG_OVERUTILIZED);
+		}
 	}
 }
+
+static
+void update_system_overutilized(struct sched_domain *sd, struct cpumask *cpus)
+{
+	unsigned long group_util;
+	bool intra_overutil = false;
+	unsigned long max_capacity;
+	struct sched_group *group = sd->groups;
+	struct root_domain *rd;
+	int this_cpu;
+	bool overutilized;
+	int i;
+
+	this_cpu = smp_processor_id();
+	rd = cpu_rq(this_cpu)->rd;
+	overutilized = READ_ONCE(rd->overutilized);
+	max_capacity = rd->max_cpu_capacity;
+
+	do {
+		group_util = 0;
+		for_each_cpu_and(i, sched_group_span(group), cpus) {
+			group_util += cpu_util(i);
+			if (cpu_overutilized(i)) {
+				if (capacity_orig_of(i) < max_capacity) {
+					intra_overutil = true;
+					break;
+				}
+			}
+		}
+
+		/*
+		 * A capacity base hint for over-utilization.
+		 * Not to trigger system overutiled if heavy tasks
+		 * in Big.cluster, so
+		 * add the free room(20%) of Big.cluster is impacted which means
+		 * system-wide over-utilization,
+		 * that considers whole cluster not single cpu
+		 */
+		if (group->group_weight > 1 && (group->sgc->capacity * 1024 <
+						group_util * capacity_margin)) {
+			intra_overutil = true;
+			break;
+		}
+
+		group = group->next;
+
+	} while (group != sd->groups && !intra_overutil);
+
+	if (overutilized != intra_overutil) {
+		if (intra_overutil == true) {
+			WRITE_ONCE(rd->overutilized, SG_OVERUTILIZED);
+			trace_sched_overutilized_tp(rd, SG_OVERUTILIZED);
+		} else {
+			WRITE_ONCE(rd->overutilized, 0);
+			trace_sched_overutilized_tp(rd, 0);
+		}
+	}
+}
+
 #else
 static inline void update_overutilized_status(struct rq *rq) { }
 #endif
@@ -8242,15 +8303,6 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 
 		/* update overload indicator if we are at root domain */
 		WRITE_ONCE(rd->overload, sg_status & SG_OVERLOAD);
-
-		/* Update over-utilization (tipping point, U >= 0) indicator */
-		WRITE_ONCE(rd->overutilized, sg_status & SG_OVERUTILIZED);
-		trace_sched_overutilized_tp(rd, sg_status & SG_OVERUTILIZED);
-	} else if (sg_status & SG_OVERUTILIZED) {
-		struct root_domain *rd = env->dst_rq->rd;
-
-		WRITE_ONCE(rd->overutilized, SG_OVERUTILIZED);
-		trace_sched_overutilized_tp(rd, SG_OVERUTILIZED);
 	}
 }
 
@@ -8476,6 +8528,8 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 	 */
 	update_sd_lb_stats(env, &sds);
 
+	update_system_overutilized(env->sd, env->cpus);
+
 	if (sched_energy_enabled()) {
 		struct root_domain *rd = env->dst_rq->rd;
 
-- 
1.9.1



* Re: [PATCH 1/1] sched/eas: introduce system-wide overutil indicator
  2019-09-19  7:20 [PATCH 1/1] sched/eas: introduce system-wide overutil indicator YT Chang
@ 2019-09-19  8:00 ` Vincent Guittot
  2019-09-19  8:10 ` kbuild test robot
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Vincent Guittot @ 2019-09-19  8:00 UTC
  To: YT Chang
  Cc: Peter Zijlstra, Matthias Brugger, wsd_upstream, linux-kernel,
	LAK, linux-mediatek

On Thu, 19 Sep 2019 at 09:20, YT Chang <yt.chang@mediatek.com> wrote:
>
> When the system is overutilization, the load-balance crossing

s/overutilization/overutilized/

> clusters will be triggered and scheduler will not use energy
> aware scheduling to choose CPUs.
>
> The overutilization means the loading of  ANY CPUs

s/ANY/any/

> exceeds threshold (80%).
>
> However, only 1 heavy task or while-1 program will run on highest
> capacity CPUs and it still result to trigger overutilization. So
> the system will not use Energy Aware scheduling.
>
> To avoid it, a system-wide over-utilization indicator to trigger
> load-balance cross clusters.

The current rd->overutilized is already system wide. I mean that as
soon as one CPU is overutilized, the whole system is considered
overutilized, whereas you would like a finer-grained level of
overutilization.
I remember a patch that proposed a per sched_domain
overutilization detection. The load_balance at one sched_domain level
was enabled only if the child level was not able to handle the
overutilization, and energy aware scheduling was still used in the
other sched_domains.

>
> The policy is:
>         The loading of "ALL CPUs in the highest capacity"
>                                                 exceeds threshold(80%) or
>         The loading of "Any CPUs not in the highest capacity"
>                                                 exceed threshold(80%)

Do you have use cases or figures that show a benefit with this change?

>
> Signed-off-by: YT Chang <yt.chang@mediatek.com>
> ---
>  kernel/sched/fair.c | 76 +++++++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 65 insertions(+), 11 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 036be95..f4c3d70 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5182,10 +5182,71 @@ static inline bool cpu_overutilized(int cpu)
>  static inline void update_overutilized_status(struct rq *rq)
>  {
>         if (!READ_ONCE(rq->rd->overutilized) && cpu_overutilized(rq->cpu)) {
> -               WRITE_ONCE(rq->rd->overutilized, SG_OVERUTILIZED);
> -               trace_sched_overutilized_tp(rq->rd, SG_OVERUTILIZED);
> +               if (capacity_orig_of(cpu_of(rq)) < rq->rd->max_cpu_capacity) {
> +                       WRITE_ONCE(rq->rd->overutilized, SG_OVERUTILIZED);
> +                       trace_sched_overutilized_tp(rq->rd, SG_OVERUTILIZED);
> +               }
>         }
>  }
> +
> +static
> +void update_system_overutilized(struct sched_domain *sd, struct cpumask *cpus)
> +{
> +       unsigned long group_util;
> +       bool intra_overutil = false;
> +       unsigned long max_capacity;
> +       struct sched_group *group = sd->groups;
> +       struct root_domain *rd;
> +       int this_cpu;
> +       bool overutilized;
> +       int i;
> +
> +       this_cpu = smp_processor_id();
> +       rd = cpu_rq(this_cpu)->rd;
> +       overutilized = READ_ONCE(rd->overutilized);
> +       max_capacity = rd->max_cpu_capacity;
> +
> +       do {
> +               group_util = 0;
> +               for_each_cpu_and(i, sched_group_span(group), cpus) {
> +                       group_util += cpu_util(i);
> +                       if (cpu_overutilized(i)) {
> +                               if (capacity_orig_of(i) < max_capacity) {
> +                                       intra_overutil = true;
> +                                       break;
> +                               }
> +                       }
> +               }
> +
> +               /*
> +                * A capacity base hint for over-utilization.
> +                * Not to trigger system overutiled if heavy tasks
> +                * in Big.cluster, so
> +                * add the free room(20%) of Big.cluster is impacted which means
> +                * system-wide over-utilization,
> +                * that considers whole cluster not single cpu
> +                */
> +               if (group->group_weight > 1 && (group->sgc->capacity * 1024 <
> +                                               group_util * capacity_margin)) {
> +                       intra_overutil = true;
> +                       break;
> +               }
> +
> +               group = group->next;
> +
> +       } while (group != sd->groups && !intra_overutil);
> +
> +       if (overutilized != intra_overutil) {
> +               if (intra_overutil == true) {
> +                       WRITE_ONCE(rd->overutilized, SG_OVERUTILIZED);
> +                       trace_sched_overutilized_tp(rd, SG_OVERUTILIZED);
> +               } else {
> +                       WRITE_ONCE(rd->overutilized, 0);
> +                       trace_sched_overutilized_tp(rd, 0);
> +               }
> +       }
> +}
> +
>  #else
>  static inline void update_overutilized_status(struct rq *rq) { }
>  #endif
> @@ -8242,15 +8303,6 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
>
>                 /* update overload indicator if we are at root domain */
>                 WRITE_ONCE(rd->overload, sg_status & SG_OVERLOAD);
> -
> -               /* Update over-utilization (tipping point, U >= 0) indicator */
> -               WRITE_ONCE(rd->overutilized, sg_status & SG_OVERUTILIZED);
> -               trace_sched_overutilized_tp(rd, sg_status & SG_OVERUTILIZED);
> -       } else if (sg_status & SG_OVERUTILIZED) {
> -               struct root_domain *rd = env->dst_rq->rd;
> -
> -               WRITE_ONCE(rd->overutilized, SG_OVERUTILIZED);
> -               trace_sched_overutilized_tp(rd, SG_OVERUTILIZED);
>         }
>  }
>
> @@ -8476,6 +8528,8 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
>          */
>         update_sd_lb_stats(env, &sds);
>
> +       update_system_overutilized(env->sd, env->cpus);

This should be called only if (sched_energy_enabled())

> +
>         if (sched_energy_enabled()) {
>                 struct root_domain *rd = env->dst_rq->rd;
>
> --
> 1.9.1
>


* Re: [PATCH 1/1] sched/eas: introduce system-wide overutil indicator
  2019-09-19  7:20 [PATCH 1/1] sched/eas: introduce system-wide overutil indicator YT Chang
  2019-09-19  8:00 ` Vincent Guittot
@ 2019-09-19  8:10 ` kbuild test robot
  2019-09-19  8:10 ` Quentin Perret
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: kbuild test robot @ 2019-09-19  8:10 UTC
  To: YT Chang
  Cc: kbuild-all, Peter Zijlstra, Matthias Brugger, wsd_upstream,
	linux-kernel, linux-arm-kernel, linux-mediatek, YT Chang

[-- Attachment #1: Type: text/plain, Size: 3293 bytes --]

Hi YT,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[cannot apply to v5.3 next-20190918]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/YT-Chang/sched-eas-introduce-system-wide-overutil-indicator/20190919-152213
config: i386-defconfig (attached as .config)
compiler: gcc-7 (Debian 7.4.0-13) 7.4.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   kernel/sched/fair.c: In function 'update_system_overutilized':
>> kernel/sched/fair.c:5234:20: error: 'capacity_margin' undeclared (first use in this function); did you mean 'capacity_of'?
          group_util * capacity_margin)) {
                       ^~~~~~~~~~~~~~~
                       capacity_of
   kernel/sched/fair.c:5234:20: note: each undeclared identifier is reported only once for each function it appears in
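
The error indicates that `capacity_margin` does not exist in the tree being built. One possible way to express the same group check, assuming the tree provides a `fits_capacity()` helper of the form shown below (an assumption to verify against the actual `kernel/sched/fair.c`), is sketched here as a standalone program. `group_overutilized()` is a hypothetical name used only for illustration.

```c
#include <stdbool.h>

/* Assumed to mirror the helper in kernel/sched/fair.c; verify in your tree. */
#define fits_capacity(cap, max)	((cap) * 1280 < (max) * 1024)

/*
 * Stand-in for the failing condition
 *   group->sgc->capacity * 1024 < group_util * capacity_margin
 * written without capacity_margin. Note the boundary differs slightly:
 * this version also reports overutilization when both sides are equal.
 */
static bool group_overutilized(unsigned long group_capacity,
			       unsigned long group_util)
{
	return !fits_capacity(group_util, group_capacity);
}
```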

vim +5234 kernel/sched/fair.c

  5195	
  5196	static
  5197	void update_system_overutilized(struct sched_domain *sd, struct cpumask *cpus)
  5198	{
  5199		unsigned long group_util;
  5200		bool intra_overutil = false;
  5201		unsigned long max_capacity;
  5202		struct sched_group *group = sd->groups;
  5203		struct root_domain *rd;
  5204		int this_cpu;
  5205		bool overutilized;
  5206		int i;
  5207	
  5208		this_cpu = smp_processor_id();
  5209		rd = cpu_rq(this_cpu)->rd;
  5210		overutilized = READ_ONCE(rd->overutilized);
  5211		max_capacity = rd->max_cpu_capacity;
  5212	
  5213		do {
  5214			group_util = 0;
  5215			for_each_cpu_and(i, sched_group_span(group), cpus) {
  5216				group_util += cpu_util(i);
  5217				if (cpu_overutilized(i)) {
  5218					if (capacity_orig_of(i) < max_capacity) {
  5219						intra_overutil = true;
  5220						break;
  5221					}
  5222				}
  5223			}
  5224	
  5225			/*
  5226			 * A capacity base hint for over-utilization.
  5227			 * Not to trigger system overutiled if heavy tasks
  5228			 * in Big.cluster, so
  5229			 * add the free room(20%) of Big.cluster is impacted which means
  5230			 * system-wide over-utilization,
  5231			 * that considers whole cluster not single cpu
  5232			 */
  5233			if (group->group_weight > 1 && (group->sgc->capacity * 1024 <
> 5234							group_util * capacity_margin)) {
  5235				intra_overutil = true;
  5236				break;
  5237			}
  5238	
  5239			group = group->next;
  5240	
  5241		} while (group != sd->groups && !intra_overutil);
  5242	
  5243		if (overutilized != intra_overutil) {
  5244			if (intra_overutil == true) {
  5245				WRITE_ONCE(rd->overutilized, SG_OVERUTILIZED);
  5246				trace_sched_overutilized_tp(rd, SG_OVERUTILIZED);
  5247			} else {
  5248				WRITE_ONCE(rd->overutilized, 0);
  5249				trace_sched_overutilized_tp(rd, 0);
  5250			}
  5251		}
  5252	}
  5253	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 28088 bytes --]


* Re: [PATCH 1/1] sched/eas: introduce system-wide overutil indicator
  2019-09-19  7:20 [PATCH 1/1] sched/eas: introduce system-wide overutil indicator YT Chang
  2019-09-19  8:00 ` Vincent Guittot
  2019-09-19  8:10 ` kbuild test robot
@ 2019-09-19  8:10 ` Quentin Perret
  2019-09-21 14:44 ` kbuild test robot
  2019-09-23  8:05 ` Dietmar Eggemann
  4 siblings, 0 replies; 6+ messages in thread
From: Quentin Perret @ 2019-09-19  8:10 UTC
  To: YT Chang
  Cc: Peter Zijlstra, Matthias Brugger, wsd_upstream, linux-kernel,
	linux-arm-kernel, linux-mediatek

Hi,

Could you please CC me on later versions of this ? I'm interested.

On Thursday 19 Sep 2019 at 15:20:22 (+0800), YT Chang wrote:
> When the system is overutilization, the load-balance crossing
> clusters will be triggered and scheduler will not use energy
> aware scheduling to choose CPUs.
> 
> The overutilization means the loading of  ANY CPUs
> exceeds threshold (80%).
> 
> However, only 1 heavy task or while-1 program will run on highest
> capacity CPUs and it still result to trigger overutilization. So
> the system will not use Energy Aware scheduling.
> 
> To avoid it, a system-wide over-utilization indicator to trigger
> load-balance cross clusters.
> 
> The policy is:
> 	The loading of "ALL CPUs in the highest capacity"
> 						exceeds threshold(80%) or
> 	The loading of "Any CPUs not in the highest capacity"
> 						exceed threshold(80%)
> 
> Signed-off-by: YT Chang <yt.chang@mediatek.com>

Right, so we originally went for the simpler implementation because in
general when you have the biggest CPUs of the system running flat out at
max freq, the micro-optimizations for energy on littles don't matter all
that much. Is there a use-case where you see a big difference ?

A second thing is RT pressure. If a big CPU is used at 50% by a CFS task
and 50% by RT, we should mark it overutilized. Otherwise EAS will think
the CFS task is 50% and try to down-migrate it. But the truth is, we
don't know the size of the task ... So, I believe your patch breaks that
ATM.
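
To put numbers on the RT pressure case, here is a standalone sketch that is not kernel code. The struct and function names are hypothetical, and it assumes the per-CPU check compares CFS utilization against the capacity left over after RT pressure, with an 80% margin.

```c
#include <stdbool.h>

#define MARGIN	1280UL	/* assumed 80% threshold: cap * 1024 < util * 1280 */

/* Hypothetical snapshot of one CPU; fields are illustrative only. */
struct cpu_state {
	unsigned long capacity_orig;	/* capacity with no RT pressure */
	unsigned long capacity_cfs;	/* capacity left for CFS tasks */
	unsigned long cfs_util;		/* utilization of CFS tasks */
};

/* Per-CPU check in the spirit of cpu_overutilized(). */
static bool cpu_over(const struct cpu_state *c)
{
	return c->capacity_cfs * 1024UL < c->cfs_util * MARGIN;
}

/* Per-CPU part of the proposed check: biggest CPUs are never flagged. */
static bool cpu_over_patched(const struct cpu_state *c, unsigned long max_cap)
{
	return cpu_over(c) && c->capacity_orig < max_cap;
}
```

For a big CPU with `capacity_orig = 1024`, `capacity_cfs = 512` and `cfs_util = 512`, `cpu_over()` is true (512 * 1024 < 512 * 1280), while `cpu_over_patched()` with `max_cap = 1024` is false, which is the case described above.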

And there is a similar problem with misfit. That is, a task running flat
out on a big CPU will be flagged as misfit, even if there is nothing we
can do about it (we can't up-migrate it for obvious reasons). So perhaps we
should look at a common solution for both issues, if deemed useful.

> ---
>  kernel/sched/fair.c | 76 +++++++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 65 insertions(+), 11 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 036be95..f4c3d70 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5182,10 +5182,71 @@ static inline bool cpu_overutilized(int cpu)
>  static inline void update_overutilized_status(struct rq *rq)
>  {
>  	if (!READ_ONCE(rq->rd->overutilized) && cpu_overutilized(rq->cpu)) {
> -		WRITE_ONCE(rq->rd->overutilized, SG_OVERUTILIZED);
> -		trace_sched_overutilized_tp(rq->rd, SG_OVERUTILIZED);
> +		if (capacity_orig_of(cpu_of(rq)) < rq->rd->max_cpu_capacity) {
> +			WRITE_ONCE(rq->rd->overutilized, SG_OVERUTILIZED);
> +			trace_sched_overutilized_tp(rq->rd, SG_OVERUTILIZED);
> +		}
>  	}
>  }
> +
> +static
> +void update_system_overutilized(struct sched_domain *sd, struct cpumask *cpus)
> +{
> +	unsigned long group_util;
> +	bool intra_overutil = false;
> +	unsigned long max_capacity;
> +	struct sched_group *group = sd->groups;
> +	struct root_domain *rd;
> +	int this_cpu;
> +	bool overutilized;
> +	int i;
> +
> +	this_cpu = smp_processor_id();
> +	rd = cpu_rq(this_cpu)->rd;
> +	overutilized = READ_ONCE(rd->overutilized);
> +	max_capacity = rd->max_cpu_capacity;
> +
> +	do {
> +		group_util = 0;
> +		for_each_cpu_and(i, sched_group_span(group), cpus) {
> +			group_util += cpu_util(i);
> +			if (cpu_overutilized(i)) {
> +				if (capacity_orig_of(i) < max_capacity) {

This is what breaks things with RT pressure I think.

> +					intra_overutil = true;
> +					break;
> +				}
> +			}
> +		}
> +
> +		/*
> +		 * A capacity base hint for over-utilization.
> +		 * Not to trigger system overutiled if heavy tasks
> +		 * in Big.cluster, so
> +		 * add the free room(20%) of Big.cluster is impacted which means
> +		 * system-wide over-utilization,
> +		 * that considers whole cluster not single cpu
> +		 */
> +		if (group->group_weight > 1 && (group->sgc->capacity * 1024 <
> +						group_util * capacity_margin)) {
> +			intra_overutil = true;
> +			break;
> +		}

What if we have only one big MC domain with both big and little CPUs and
no DIE ? Say you have 4 big tasks, 4 big CPUs, 4 little CPUs (idle).
You'll fail to mark the system overutilized no ?
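
The scenario can be modelled with a small standalone sketch that is not kernel code. It assumes that in a single MC domain every sched_group holds exactly one CPU, uses summed CPU capacity in place of `group->sgc->capacity`, and uses hypothetical struct and function names.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_GROUP_CPUS	4

struct cpu { unsigned long cap, util; };

/* Hypothetical model of a sched_group: its CPUs and their count. */
struct group {
	struct cpu cpus[MAX_GROUP_CPUS];
	unsigned int weight;
};

static bool over(unsigned long cap, unsigned long util)
{
	return cap * 1024UL < util * 1280UL;	/* assumed 80% threshold */
}

/* Simplified walk over the groups, following the logic of the patch. */
static bool patched_overutilized(const struct group *groups, size_t n,
				 unsigned long max_cap)
{
	size_t g;
	unsigned int i;

	for (g = 0; g < n; g++) {
		unsigned long cap = 0, util = 0;

		for (i = 0; i < groups[g].weight; i++) {
			const struct cpu *c = &groups[g].cpus[i];

			cap += c->cap;
			util += c->util;
			/* Per-CPU check skips the biggest CPUs. */
			if (c->cap < max_cap && over(c->cap, c->util))
				return true;
		}
		/* Group check only applies to multi-CPU groups. */
		if (groups[g].weight > 1 && over(cap, util))
			return true;
	}
	return false;
}
```

With eight single-CPU groups (four big CPUs at util 1024, four idle littles) the function returns false, whereas one four-CPU group of saturated big CPUs returns true.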

> +
> +		group = group->next;
> +
> +	} while (group != sd->groups && !intra_overutil);
> +
> +	if (overutilized != intra_overutil) {
> +		if (intra_overutil == true) {
> +			WRITE_ONCE(rd->overutilized, SG_OVERUTILIZED);
> +			trace_sched_overutilized_tp(rd, SG_OVERUTILIZED);
> +		} else {
> +			WRITE_ONCE(rd->overutilized, 0);
> +			trace_sched_overutilized_tp(rd, 0);
> +		}
> +	}
> +}
> +
>  #else
>  static inline void update_overutilized_status(struct rq *rq) { }
>  #endif
> @@ -8242,15 +8303,6 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
>  
>  		/* update overload indicator if we are at root domain */
>  		WRITE_ONCE(rd->overload, sg_status & SG_OVERLOAD);
> -
> -		/* Update over-utilization (tipping point, U >= 0) indicator */
> -		WRITE_ONCE(rd->overutilized, sg_status & SG_OVERUTILIZED);
> -		trace_sched_overutilized_tp(rd, sg_status & SG_OVERUTILIZED);
> -	} else if (sg_status & SG_OVERUTILIZED) {
> -		struct root_domain *rd = env->dst_rq->rd;
> -
> -		WRITE_ONCE(rd->overutilized, SG_OVERUTILIZED);
> -		trace_sched_overutilized_tp(rd, SG_OVERUTILIZED);
>  	}
>  }
>  
> @@ -8476,6 +8528,8 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
>  	 */
>  	update_sd_lb_stats(env, &sds);
>  
> +	update_system_overutilized(env->sd, env->cpus);
> +
>  	if (sched_energy_enabled()) {
>  		struct root_domain *rd = env->dst_rq->rd;
>  
> -- 
> 1.9.1
> 


* Re: [PATCH 1/1] sched/eas: introduce system-wide overutil indicator
  2019-09-19  7:20 [PATCH 1/1] sched/eas: introduce system-wide overutil indicator YT Chang
                   ` (2 preceding siblings ...)
  2019-09-19  8:10 ` Quentin Perret
@ 2019-09-21 14:44 ` kbuild test robot
  2019-09-23  8:05 ` Dietmar Eggemann
  4 siblings, 0 replies; 6+ messages in thread
From: kbuild test robot @ 2019-09-21 14:44 UTC
  To: YT Chang
  Cc: kbuild-all, Peter Zijlstra, Matthias Brugger, wsd_upstream,
	linux-kernel, linux-arm-kernel, linux-mediatek, YT Chang

[-- Attachment #1: Type: text/plain, Size: 5636 bytes --]

Hi YT,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[cannot apply to v5.3 next-20190918]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/YT-Chang/sched-eas-introduce-system-wide-overutil-indicator/20190919-152213
config: x86_64-randconfig-s1-201937 (attached as .config)
compiler: gcc-6 (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 
:::::: branch date: 2 hours ago
:::::: commit date: 2 hours ago

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   kernel/sched/fair.c: In function 'update_system_overutilized':
>> kernel/sched/fair.c:5234:20: error: 'capacity_margin' undeclared (first use in this function)
          group_util * capacity_margin)) {
                       ^~~~~~~~~~~~~~~
   kernel/sched/fair.c:5234:20: note: each undeclared identifier is reported only once for each function it appears in

# https://github.com/0day-ci/linux/commit/58f2ed2a11501d4de287fafc0a7b3385d54f8238
git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout 58f2ed2a11501d4de287fafc0a7b3385d54f8238
vim +/capacity_margin +5234 kernel/sched/fair.c

58f2ed2a11501d YT Chang 2019-09-19  5195  
58f2ed2a11501d YT Chang 2019-09-19  5196  static
58f2ed2a11501d YT Chang 2019-09-19  5197  void update_system_overutilized(struct sched_domain *sd, struct cpumask *cpus)
58f2ed2a11501d YT Chang 2019-09-19  5198  {
58f2ed2a11501d YT Chang 2019-09-19  5199  	unsigned long group_util;
58f2ed2a11501d YT Chang 2019-09-19  5200  	bool intra_overutil = false;
58f2ed2a11501d YT Chang 2019-09-19  5201  	unsigned long max_capacity;
58f2ed2a11501d YT Chang 2019-09-19  5202  	struct sched_group *group = sd->groups;
58f2ed2a11501d YT Chang 2019-09-19  5203  	struct root_domain *rd;
58f2ed2a11501d YT Chang 2019-09-19  5204  	int this_cpu;
58f2ed2a11501d YT Chang 2019-09-19  5205  	bool overutilized;
58f2ed2a11501d YT Chang 2019-09-19  5206  	int i;
58f2ed2a11501d YT Chang 2019-09-19  5207  
58f2ed2a11501d YT Chang 2019-09-19  5208  	this_cpu = smp_processor_id();
58f2ed2a11501d YT Chang 2019-09-19  5209  	rd = cpu_rq(this_cpu)->rd;
58f2ed2a11501d YT Chang 2019-09-19  5210  	overutilized = READ_ONCE(rd->overutilized);
58f2ed2a11501d YT Chang 2019-09-19  5211  	max_capacity = rd->max_cpu_capacity;
58f2ed2a11501d YT Chang 2019-09-19  5212  
58f2ed2a11501d YT Chang 2019-09-19  5213  	do {
58f2ed2a11501d YT Chang 2019-09-19  5214  		group_util = 0;
58f2ed2a11501d YT Chang 2019-09-19  5215  		for_each_cpu_and(i, sched_group_span(group), cpus) {
58f2ed2a11501d YT Chang 2019-09-19  5216  			group_util += cpu_util(i);
58f2ed2a11501d YT Chang 2019-09-19  5217  			if (cpu_overutilized(i)) {
58f2ed2a11501d YT Chang 2019-09-19  5218  				if (capacity_orig_of(i) < max_capacity) {
58f2ed2a11501d YT Chang 2019-09-19  5219  					intra_overutil = true;
58f2ed2a11501d YT Chang 2019-09-19  5220  					break;
58f2ed2a11501d YT Chang 2019-09-19  5221  				}
58f2ed2a11501d YT Chang 2019-09-19  5222  			}
58f2ed2a11501d YT Chang 2019-09-19  5223  		}
58f2ed2a11501d YT Chang 2019-09-19  5224  
58f2ed2a11501d YT Chang 2019-09-19  5225  		/*
58f2ed2a11501d YT Chang 2019-09-19  5226  		 * A capacity base hint for over-utilization.
58f2ed2a11501d YT Chang 2019-09-19  5227  		 * Not to trigger system overutiled if heavy tasks
58f2ed2a11501d YT Chang 2019-09-19  5228  		 * in Big.cluster, so
58f2ed2a11501d YT Chang 2019-09-19  5229  		 * add the free room(20%) of Big.cluster is impacted which means
58f2ed2a11501d YT Chang 2019-09-19  5230  		 * system-wide over-utilization,
58f2ed2a11501d YT Chang 2019-09-19  5231  		 * that considers whole cluster not single cpu
58f2ed2a11501d YT Chang 2019-09-19  5232  		 */
58f2ed2a11501d YT Chang 2019-09-19  5233  		if (group->group_weight > 1 && (group->sgc->capacity * 1024 <
58f2ed2a11501d YT Chang 2019-09-19 @5234  						group_util * capacity_margin)) {
58f2ed2a11501d YT Chang 2019-09-19  5235  			intra_overutil = true;
58f2ed2a11501d YT Chang 2019-09-19  5236  			break;
58f2ed2a11501d YT Chang 2019-09-19  5237  		}
58f2ed2a11501d YT Chang 2019-09-19  5238  
58f2ed2a11501d YT Chang 2019-09-19  5239  		group = group->next;
58f2ed2a11501d YT Chang 2019-09-19  5240  
58f2ed2a11501d YT Chang 2019-09-19  5241  	} while (group != sd->groups && !intra_overutil);
58f2ed2a11501d YT Chang 2019-09-19  5242  
58f2ed2a11501d YT Chang 2019-09-19  5243  	if (overutilized != intra_overutil) {
58f2ed2a11501d YT Chang 2019-09-19  5244  		if (intra_overutil == true) {
58f2ed2a11501d YT Chang 2019-09-19  5245  			WRITE_ONCE(rd->overutilized, SG_OVERUTILIZED);
58f2ed2a11501d YT Chang 2019-09-19  5246  			trace_sched_overutilized_tp(rd, SG_OVERUTILIZED);
58f2ed2a11501d YT Chang 2019-09-19  5247  		} else {
58f2ed2a11501d YT Chang 2019-09-19  5248  			WRITE_ONCE(rd->overutilized, 0);
58f2ed2a11501d YT Chang 2019-09-19  5249  			trace_sched_overutilized_tp(rd, 0);
58f2ed2a11501d YT Chang 2019-09-19  5250  		}
58f2ed2a11501d YT Chang 2019-09-19  5251  	}
58f2ed2a11501d YT Chang 2019-09-19  5252  }
58f2ed2a11501d YT Chang 2019-09-19  5253  

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 34240 bytes --]


* Re: [PATCH 1/1] sched/eas: introduce system-wide overutil indicator
  2019-09-19  7:20 [PATCH 1/1] sched/eas: introduce system-wide overutil indicator YT Chang
                   ` (3 preceding siblings ...)
  2019-09-21 14:44 ` kbuild test robot
@ 2019-09-23  8:05 ` Dietmar Eggemann
  4 siblings, 0 replies; 6+ messages in thread
From: Dietmar Eggemann @ 2019-09-23  8:05 UTC
  To: YT Chang, Peter Zijlstra, Matthias Brugger
  Cc: wsd_upstream, linux-kernel, linux-arm-kernel, linux-mediatek

On 9/19/19 9:20 AM, YT Chang wrote:
> When the system is overutilization, the load-balance crossing
> clusters will be triggered and scheduler will not use energy
> aware scheduling to choose CPUs.

We're currently transitioning from traditional big.LITTLE (the CPUs of 1
cluster (all having the same CPU (original) capacity) represent a DIE
Sched Domain (SD) level Sched Group (SG)) to DynamIQ systems. The latter
can share CPUs with different CPU (original) capacity in one cluster.
In Linux mainline with today's DynamIQ systems you will
only have 1 cluster, i.e. 1 MC SD level SG.

For those systems the current approach is much more applicable.

Or do you apply the out-of-tree Phantom Domain concept, which creates n
(n=2 or 3 ((huge,) big, little)) DIE SGs on your 1 cluster DynamIQ system?

> The overutilization means the loading of  ANY CPUs
> exceeds threshold (80%).
> 
> However, only 1 heavy task or while-1 program will run on highest
> capacity CPUs and it still result to trigger overutilization. So
> the system will not use Energy Aware scheduling.

The patch-header of commit 2802bf3cd936 ("sched/fair: Add
over-utilization/tipping point indicator") explains why the current
approach is defined so conservatively.

> To avoid it, a system-wide over-utilization indicator to trigger
> load-balance cross clusters.
> 
> The policy is:
> 	The loading of "ALL CPUs in the highest capacity"
> 						exceeds threshold(80%) or
> 	The loading of "Any CPUs not in the highest capacity"
> 						exceed threshold(80%)

We experimented with an overutilized (tipping point) indicator per SD
from Thara Gopinath (Linaro), mentioned by Vincent already, till v2 of
the Energy Aware Scheduling patch-set in 2018 but we couldn't find any
advantage using it over the one you now find in mainline.

https://lore.kernel.org/r/20180406153607.17815-4-dietmar.eggemann@arm.com

Maybe you can have a look at this patch and see if it gives you an
advantage with your use cases and system topology layout?

The 'system-wide' in the name of the patch is misleading. The current
approach is also system-wide, we have the overutilized information on
the root domain (system here stands for root domain). You change the
detection mechanism from per-CPU to a mixed-mode detection (per-CPU and
per-SG).

> Signed-off-by: YT Chang <yt.chang@mediatek.com>
> ---
>  kernel/sched/fair.c | 76 +++++++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 65 insertions(+), 11 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 036be95..f4c3d70 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5182,10 +5182,71 @@ static inline bool cpu_overutilized(int cpu)
>  static inline void update_overutilized_status(struct rq *rq)
>  {
>  	if (!READ_ONCE(rq->rd->overutilized) && cpu_overutilized(rq->cpu)) {
> -		WRITE_ONCE(rq->rd->overutilized, SG_OVERUTILIZED);
> -		trace_sched_overutilized_tp(rq->rd, SG_OVERUTILIZED);
> +		if (capacity_orig_of(cpu_of(rq)) < rq->rd->max_cpu_capacity) {
> +			WRITE_ONCE(rq->rd->overutilized, SG_OVERUTILIZED);
> +			trace_sched_overutilized_tp(rq->rd, SG_OVERUTILIZED);
> +		}
>  	}
>  }
> +
> +static
> +void update_system_overutilized(struct sched_domain *sd, struct cpumask *cpus)
> +{
> +	unsigned long group_util;
> +	bool intra_overutil = false;
> +	unsigned long max_capacity;
> +	struct sched_group *group = sd->groups;
> +	struct root_domain *rd;
> +	int this_cpu;
> +	bool overutilized;
> +	int i;
> +
> +	this_cpu = smp_processor_id();
> +	rd = cpu_rq(this_cpu)->rd;
> +	overutilized = READ_ONCE(rd->overutilized);
> +	max_capacity = rd->max_cpu_capacity;
> +
> +	do {
> +		group_util = 0;
> +		for_each_cpu_and(i, sched_group_span(group), cpus) {
> +			group_util += cpu_util(i);
> +			if (cpu_overutilized(i)) {
> +				if (capacity_orig_of(i) < max_capacity) {
> +					intra_overutil = true;
> +					break;
> +				}
> +			}
> +		}
> +
> +		/*
> +		 * A capacity base hint for over-utilization.
> +		 * Not to trigger system overutiled if heavy tasks
> +		 * in Big.cluster, so
> +		 * add the free room(20%) of Big.cluster is impacted which means
> +		 * system-wide over-utilization,
> +		 * that considers whole cluster not single cpu
> +		 */
> +		if (group->group_weight > 1 && (group->sgc->capacity * 1024 <
> +						group_util * capacity_margin)) {

Why 'group->group_weight > 1' ? Do you have some out-of-tree code which
lets SGs with 1 CPU survive?

[...]

