linux-kernel.vger.kernel.org archive mirror
* [PATCH -v4 00/10] sched/balancing: Misc updates & cleanups
@ 2024-03-08 10:58 Ingo Molnar
  2024-03-08 10:58 ` [PATCH 01/10] sched/balancing: Switch the 'DEFINE_SPINLOCK(balancing)' spinlock into an 'atomic_t sched_balance_running' flag Ingo Molnar
                   ` (9 more replies)
  0 siblings, 10 replies; 35+ messages in thread
From: Ingo Molnar @ 2024-03-08 10:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dietmar Eggemann, Linus Torvalds, Peter Zijlstra,
	Shrikanth Hegde, Valentin Schneider, Vincent Guittot

Improve a handful of things I noticed while looking through the
scheduler balancing code. No change in functionality intended.

Changes in -v4:

 - Incorporate review feedback
 - Bump SCHEDSTAT_VERSION to 16, due to the changed order of columns
 - Allow CONFIG_SCHEDSTAT builds more widely

Thanks,

     Ingo

==============================>
Ingo Molnar (9):
  sched/balancing: Switch the 'DEFINE_SPINLOCK(balancing)' spinlock into an 'atomic_t sched_balance_running' flag
  sched/balancing: Change 'enum cpu_idle_type' to have more natural definitions
  sched/debug: Increase SCHEDSTAT_VERSION to 16
  sched/debug: Allow CONFIG_SCHEDSTATS even on !DEBUG_KERNEL kernels
  sched/balancing: Change comment formatting to not overlap Git conflict marker lines
  sched/balancing: Fix comments (trying to) refer to NOHZ_BALANCE_KICK
  sched/balancing: Update run_rebalance_domains() comments
  sched/balancing: Vertically align the comments of 'struct sg_lb_stats' and 'struct sd_lb_stats'
  sched/balancing: Update comments in 'struct sg_lb_stats' and 'struct sd_lb_stats'

Shrikanth Hegde (1):
  sched/balancing: Remove reliance on 'enum cpu_idle_type' ordering when iterating [CPU_MAX_IDLE_TYPES] arrays in show_schedstat()

 include/linux/sched/idle.h |   2 +-
 kernel/sched/fair.c        | 103 +++++++++++++++++++++++++++++++++++++----------------------------
 kernel/sched/stats.c       |   5 ++--
 lib/Kconfig.debug          |   2 +-
 4 files changed, 62 insertions(+), 50 deletions(-)

-- 
2.40.1


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 01/10] sched/balancing: Switch the 'DEFINE_SPINLOCK(balancing)' spinlock into an 'atomic_t sched_balance_running' flag
  2024-03-08 10:58 [PATCH -v4 00/10] sched/balancing: Misc updates & cleanups Ingo Molnar
@ 2024-03-08 10:58 ` Ingo Molnar
  2024-03-12 12:00   ` [tip: sched/core] " tip-bot2 for Ingo Molnar
  2024-03-15  9:25   ` [PATCH 01/10] " Shrikanth Hegde
  2024-03-08 10:58 ` [PATCH 02/10] sched/balancing: Remove reliance on 'enum cpu_idle_type' ordering when iterating [CPU_MAX_IDLE_TYPES] arrays in show_schedstat() Ingo Molnar
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 35+ messages in thread
From: Ingo Molnar @ 2024-03-08 10:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dietmar Eggemann, Linus Torvalds, Peter Zijlstra,
	Shrikanth Hegde, Valentin Schneider, Vincent Guittot

The 'balancing' spinlock added in:

  08c183f31bdb ("[PATCH] sched: add option to serialize load balancing")

... is taken when the SD_SERIALIZE flag is set in a domain, but in reality it
is a glorified global atomic flag serializing the load-balancing of
those domains.

It doesn't have any explicit locking semantics per se: we just
spin_trylock() it.

Turn it into a ... global atomic flag. This makes it more
clear what is going on here, and reduces overhead and code
size a bit:

  # kernel/sched/fair.o: [x86-64 defconfig]

     text	   data	    bss	    dec	    hex	filename
    60730	   2721	    104	  63555	   f843	fair.o.before
    60718	   2721	    104	  63543	   f837	fair.o.after

Also document the flag a bit.

No change in functionality intended.
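
For reference, the resulting trylock-style pattern looks roughly like this
(a minimal sketch - the wrapper function is made up for illustration, the
real change is in the diff below):

  static atomic_t sched_balance_running = ATOMIC_INIT(0);

  /* Hypothetical helper, only to show the pattern: */
  static void sched_balance_serialized(void)
  {
          /* 'trylock': cmpxchg returns the old value; 0 means we now own the flag */
          if (atomic_cmpxchg_acquire(&sched_balance_running, 0, 1))
                  return;         /* another pass is running: skip it, don't retry */

          /* ... load-balance the SD_SERIALIZE domains ... */

          /* 'unlock': the release here pairs with the acquire above */
          atomic_set_release(&sched_balance_running, 0);
  }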

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shrikanth Hegde <sshegde@linux.ibm.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6a16129f9a5c..2ef89b36aed1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11633,7 +11633,20 @@ static int active_load_balance_cpu_stop(void *data)
 	return 0;
 }
 
-static DEFINE_SPINLOCK(balancing);
+/*
+ * This flag serializes load-balancing passes over large domains
+ * (above the NODE topology level) - only one load-balancing instance
+ * may run at a time, to reduce overhead on very large systems with
+ * lots of CPUs and large NUMA distances.
+ *
+ * - Note that load-balancing passes triggered while another one
+ *   is executing are skipped and not re-tried.
+ *
+ * - Also note that this does not serialize rebalance_domains()
+ *   execution, as non-SD_SERIALIZE domains will still be
+ *   load-balanced in parallel.
+ */
+static atomic_t sched_balance_running = ATOMIC_INIT(0);
 
 /*
  * Scale the max load_balance interval with the number of CPUs in the system.
@@ -11711,7 +11724,7 @@ static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle)
 
 		need_serialize = sd->flags & SD_SERIALIZE;
 		if (need_serialize) {
-			if (!spin_trylock(&balancing))
+			if (atomic_cmpxchg_acquire(&sched_balance_running, 0, 1))
 				goto out;
 		}
 
@@ -11729,7 +11742,7 @@ static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle)
 			interval = get_sd_balance_interval(sd, busy);
 		}
 		if (need_serialize)
-			spin_unlock(&balancing);
+			atomic_set_release(&sched_balance_running, 0);
 out:
 		if (time_after(next_balance, sd->last_balance + interval)) {
 			next_balance = sd->last_balance + interval;
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 02/10] sched/balancing: Remove reliance on 'enum cpu_idle_type' ordering when iterating [CPU_MAX_IDLE_TYPES] arrays in show_schedstat()
  2024-03-08 10:58 [PATCH -v4 00/10] sched/balancing: Misc updates & cleanups Ingo Molnar
  2024-03-08 10:58 ` [PATCH 01/10] sched/balancing: Switch the 'DEFINE_SPINLOCK(balancing)' spinlock into an 'atomic_t sched_balance_running' flag Ingo Molnar
@ 2024-03-08 10:58 ` Ingo Molnar
  2024-03-08 13:26   ` Vincent Guittot
  2024-03-12 12:00   ` [tip: sched/core] " tip-bot2 for Shrikanth Hegde
  2024-03-08 10:58 ` [PATCH 03/10] sched/balancing: Change 'enum cpu_idle_type' to have more natural definitions Ingo Molnar
                   ` (7 subsequent siblings)
  9 siblings, 2 replies; 35+ messages in thread
From: Ingo Molnar @ 2024-03-08 10:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dietmar Eggemann, Linus Torvalds, Peter Zijlstra,
	Shrikanth Hegde, Valentin Schneider, Vincent Guittot

From: Shrikanth Hegde <sshegde@linux.ibm.com>

show_schedstat() output breaks and doesn't print all entries
if the ordering of the definitions in 'enum cpu_idle_type' is changed,
because show_schedstat() assumes that 'CPU_IDLE' is 0.

Fix it before we change the definition order & values.
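
To illustrate the assumption (a simplified sketch, not the verbatim
show_schedstat() code - note that the reordered enum in the next patch
starts with __CPU_NOT_IDLE == 0, so CPU_IDLE is no longer 0):

  enum cpu_idle_type itype;

  /* Ordering-dependent: only covers all entries while CPU_IDLE happens to be 0: */
  for (itype = CPU_IDLE; itype < CPU_MAX_IDLE_TYPES; itype++)
          ;       /* print the per-itype counters */

  /* Ordering-independent: always walks the whole [CPU_MAX_IDLE_TYPES] array: */
  for (itype = 0; itype < CPU_MAX_IDLE_TYPES; itype++)
          ;       /* print the per-itype counters */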

[ mingo: Added changelog. ]

Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Valentin Schneider <vschneid@redhat.com>
---
 kernel/sched/stats.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
index 857f837f52cb..85277953cc72 100644
--- a/kernel/sched/stats.c
+++ b/kernel/sched/stats.c
@@ -150,8 +150,7 @@ static int show_schedstat(struct seq_file *seq, void *v)
 
 			seq_printf(seq, "domain%d %*pb", dcount++,
 				   cpumask_pr_args(sched_domain_span(sd)));
-			for (itype = CPU_IDLE; itype < CPU_MAX_IDLE_TYPES;
-					itype++) {
+			for (itype = 0; itype < CPU_MAX_IDLE_TYPES; itype++) {
 				seq_printf(seq, " %u %u %u %u %u %u %u %u",
 				    sd->lb_count[itype],
 				    sd->lb_balanced[itype],
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 03/10] sched/balancing: Change 'enum cpu_idle_type' to have more natural definitions
  2024-03-08 10:58 [PATCH -v4 00/10] sched/balancing: Misc updates & cleanups Ingo Molnar
  2024-03-08 10:58 ` [PATCH 01/10] sched/balancing: Switch the 'DEFINE_SPINLOCK(balancing)' spinlock into an 'atomic_t sched_balance_running' flag Ingo Molnar
  2024-03-08 10:58 ` [PATCH 02/10] sched/balancing: Remove reliance on 'enum cpu_idle_type' ordering when iterating [CPU_MAX_IDLE_TYPES] arrays in show_schedstat() Ingo Molnar
@ 2024-03-08 10:58 ` Ingo Molnar
  2024-03-12 12:00   ` [tip: sched/core] " tip-bot2 for Ingo Molnar
  2024-03-08 10:58 ` [PATCH 04/10] sched/debug: Increase SCHEDSTAT_VERSION to 16 Ingo Molnar
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Ingo Molnar @ 2024-03-08 10:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dietmar Eggemann, Linus Torvalds, Peter Zijlstra,
	Shrikanth Hegde, Valentin Schneider, Vincent Guittot

The cpu_idle_type enum has the confusingly inverted property
that 'not idle' is 1, and 'idle' is 0.

This resulted in a number of unnecessary complications in the code.

Reverse the order, remove the CPU_NOT_IDLE type, and convert
all code to a natural boolean form.

It's much more readable:

  -       enum cpu_idle_type idle = this_rq->idle_balance ?
  -                                               CPU_IDLE : CPU_NOT_IDLE;
  -
  +       enum cpu_idle_type idle = this_rq->idle_balance;

  --------------------------------

  -       if (env->idle == CPU_NOT_IDLE || !busiest->sum_nr_running)
  +       if (!env->idle || !busiest->sum_nr_running)

  --------------------------------

And gets rid of the double negation in these usages:

  -               if (env->idle != CPU_NOT_IDLE && env->src_rq->nr_running <= 1)
  +               if (env->idle && env->src_rq->nr_running <= 1)

Furthermore, this makes it much more obvious in the code where
CPU_IDLE and CPU_NEWLY_IDLE are differentiated.
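
For reference, after the reordering (see the include/linux/sched/idle.h hunk
below) the enum values line up with the boolean rq->idle_balance flag, which
is what makes the direct assignment above valid. The inline comments here are
annotations for this changelog, not part of the patch:

  enum cpu_idle_type {
          __CPU_NOT_IDLE = 0,     /* rq->idle_balance == 0: the CPU is busy */
          CPU_IDLE,               /* rq->idle_balance == 1: the CPU is idle */
          CPU_NEWLY_IDLE,         /* passed explicitly on the newidle-balance path */
          CPU_MAX_IDLE_TYPES
  };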

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shrikanth Hegde <sshegde@linux.ibm.com>
---
 include/linux/sched/idle.h |  2 +-
 kernel/sched/fair.c        | 27 ++++++++++++---------------
 2 files changed, 13 insertions(+), 16 deletions(-)

diff --git a/include/linux/sched/idle.h b/include/linux/sched/idle.h
index 478084f9105e..e670ac282333 100644
--- a/include/linux/sched/idle.h
+++ b/include/linux/sched/idle.h
@@ -5,8 +5,8 @@
 #include <linux/sched.h>
 
 enum cpu_idle_type {
+	__CPU_NOT_IDLE = 0,
 	CPU_IDLE,
-	CPU_NOT_IDLE,
 	CPU_NEWLY_IDLE,
 	CPU_MAX_IDLE_TYPES
 };
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2ef89b36aed1..3a510cf1fb00 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9070,7 +9070,7 @@ static int detach_tasks(struct lb_env *env)
 		 * We don't want to steal all, otherwise we may be treated likewise,
 		 * which could at worst lead to a livelock crash.
 		 */
-		if (env->idle != CPU_NOT_IDLE && env->src_rq->nr_running <= 1)
+		if (env->idle && env->src_rq->nr_running <= 1)
 			break;
 
 		env->loop++;
@@ -9803,7 +9803,7 @@ static inline bool smt_vs_nonsmt_groups(struct sched_group *sg1,
 static inline bool smt_balance(struct lb_env *env, struct sg_lb_stats *sgs,
 			       struct sched_group *group)
 {
-	if (env->idle == CPU_NOT_IDLE)
+	if (!env->idle)
 		return false;
 
 	/*
@@ -9827,7 +9827,7 @@ static inline long sibling_imbalance(struct lb_env *env,
 	int ncores_busiest, ncores_local;
 	long imbalance;
 
-	if (env->idle == CPU_NOT_IDLE || !busiest->sum_nr_running)
+	if (!env->idle || !busiest->sum_nr_running)
 		return 0;
 
 	ncores_busiest = sds->busiest->cores;
@@ -9927,8 +9927,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 				sgs->group_misfit_task_load = rq->misfit_task_load;
 				*sg_status |= SG_OVERLOAD;
 			}
-		} else if ((env->idle != CPU_NOT_IDLE) &&
-			   sched_reduced_capacity(rq, env->sd)) {
+		} else if (env->idle && sched_reduced_capacity(rq, env->sd)) {
 			/* Check for a task running on a CPU with reduced capacity */
 			if (sgs->group_misfit_task_load < load)
 				sgs->group_misfit_task_load = load;
@@ -9940,7 +9939,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 	sgs->group_weight = group->group_weight;
 
 	/* Check if dst CPU is idle and preferred to this group */
-	if (!local_group && env->idle != CPU_NOT_IDLE && sgs->sum_h_nr_running &&
+	if (!local_group && env->idle && sgs->sum_h_nr_running &&
 	    sched_group_asym(env, sgs, group))
 		sgs->group_asym_packing = 1;
 
@@ -10698,7 +10697,7 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 			 * waiting task in this overloaded busiest group. Let's
 			 * try to pull it.
 			 */
-			if (env->idle != CPU_NOT_IDLE && env->imbalance == 0) {
+			if (env->idle && env->imbalance == 0) {
 				env->migration_type = migrate_task;
 				env->imbalance = 1;
 			}
@@ -10913,7 +10912,7 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 		goto force_balance;
 
 	if (busiest->group_type != group_overloaded) {
-		if (env->idle == CPU_NOT_IDLE) {
+		if (!env->idle) {
 			/*
 			 * If the busiest group is not overloaded (and as a
 			 * result the local one too) but this CPU is already
@@ -11121,7 +11120,7 @@ asym_active_balance(struct lb_env *env)
 	 * the lower priority @env::dst_cpu help it. Do not follow
 	 * CPU priority.
 	 */
-	return env->idle != CPU_NOT_IDLE && sched_use_asym_prio(env->sd, env->dst_cpu) &&
+	return env->idle && sched_use_asym_prio(env->sd, env->dst_cpu) &&
 	       (sched_asym_prefer(env->dst_cpu, env->src_cpu) ||
 		!sched_use_asym_prio(env->sd, env->src_cpu));
 }
@@ -11159,7 +11158,7 @@ static int need_active_balance(struct lb_env *env)
 	 * because of other sched_class or IRQs if more capacity stays
 	 * available on dst_cpu.
 	 */
-	if ((env->idle != CPU_NOT_IDLE) &&
+	if (env->idle &&
 	    (env->src_rq->cfs.h_nr_running == 1)) {
 		if ((check_cpu_capacity(env->src_rq, sd)) &&
 		    (capacity_of(env->src_cpu)*sd->imbalance_pct < capacity_of(env->dst_cpu)*100))
@@ -11735,8 +11734,8 @@ static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle)
 				 * env->dst_cpu, so we can't know our idle
 				 * state even if we migrated tasks. Update it.
 				 */
-				idle = idle_cpu(cpu) ? CPU_IDLE : CPU_NOT_IDLE;
-				busy = idle != CPU_IDLE && !sched_idle_cpu(cpu);
+				idle = idle_cpu(cpu);
+				busy = !idle && !sched_idle_cpu(cpu);
 			}
 			sd->last_balance = jiffies;
 			interval = get_sd_balance_interval(sd, busy);
@@ -12416,9 +12415,7 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
 static __latent_entropy void run_rebalance_domains(struct softirq_action *h)
 {
 	struct rq *this_rq = this_rq();
-	enum cpu_idle_type idle = this_rq->idle_balance ?
-						CPU_IDLE : CPU_NOT_IDLE;
-
+	enum cpu_idle_type idle = this_rq->idle_balance;
 	/*
 	 * If this CPU has a pending nohz_balance_kick, then do the
 	 * balancing on behalf of the other idle CPUs whose ticks are
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 04/10] sched/debug: Increase SCHEDSTAT_VERSION to 16
  2024-03-08 10:58 [PATCH -v4 00/10] sched/balancing: Misc updates & cleanups Ingo Molnar
                   ` (2 preceding siblings ...)
  2024-03-08 10:58 ` [PATCH 03/10] sched/balancing: Change 'enum cpu_idle_type' to have more natural definitions Ingo Molnar
@ 2024-03-08 10:58 ` Ingo Molnar
  2024-03-08 16:40   ` Shrikanth Hegde
  2024-03-12 12:00   ` [tip: sched/core] " tip-bot2 for Ingo Molnar
  2024-03-08 10:58 ` [PATCH 05/10] sched/debug: Allow CONFIG_SCHEDSTATS even on !DEBUG_KERNEL kernels Ingo Molnar
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 35+ messages in thread
From: Ingo Molnar @ 2024-03-08 10:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dietmar Eggemann, Linus Torvalds, Peter Zijlstra,
	Shrikanth Hegde, Valentin Schneider, Vincent Guittot

We changed the order of definitions within 'enum cpu_idle_type',
which changed the order of [CPU_MAX_IDLE_TYPES] columns in
show_schedstat().

Bump SCHEDSTAT_VERSION to 16 so that tools parsing /proc/schedstat
can detect the changed column order and adapt (or abort).

Suggested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shrikanth Hegde <sshegde@linux.ibm.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/stats.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
index 85277953cc72..78e48f5426ee 100644
--- a/kernel/sched/stats.c
+++ b/kernel/sched/stats.c
@@ -113,7 +113,7 @@ void __update_stats_enqueue_sleeper(struct rq *rq, struct task_struct *p,
  * Bump this up when changing the output format or the meaning of an existing
  * format, so that tools can adapt (or abort)
  */
-#define SCHEDSTAT_VERSION 15
+#define SCHEDSTAT_VERSION 16
 
 static int show_schedstat(struct seq_file *seq, void *v)
 {
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 05/10] sched/debug: Allow CONFIG_SCHEDSTATS even on !DEBUG_KERNEL kernels
  2024-03-08 10:58 [PATCH -v4 00/10] sched/balancing: Misc updates & cleanups Ingo Molnar
                   ` (3 preceding siblings ...)
  2024-03-08 10:58 ` [PATCH 04/10] sched/debug: Increase SCHEDSTAT_VERSION to 16 Ingo Molnar
@ 2024-03-08 10:58 ` Ingo Molnar
  2024-03-12 12:00   ` [tip: sched/core] " tip-bot2 for Ingo Molnar
  2024-03-08 10:58 ` [PATCH 06/10] sched/balancing: Change comment formatting to not overlap Git conflict marker lines Ingo Molnar
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Ingo Molnar @ 2024-03-08 10:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dietmar Eggemann, Linus Torvalds, Peter Zijlstra,
	Shrikanth Hegde, Valentin Schneider, Vincent Guittot

All major Linux distributions enable CONFIG_SCHEDSTATS, so remove its
DEBUG_KERNEL dependency and make it available on non-debug kernels as well.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shrikanth Hegde <sshegde@linux.ibm.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
---
 lib/Kconfig.debug | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index ef36b829ae1f..0d6700f23ca4 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1250,7 +1250,7 @@ config SCHED_INFO
 
 config SCHEDSTATS
 	bool "Collect scheduler statistics"
-	depends on DEBUG_KERNEL && PROC_FS
+	depends on PROC_FS
 	select SCHED_INFO
 	help
 	  If you say Y here, additional code will be inserted into the
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 06/10] sched/balancing: Change comment formatting to not overlap Git conflict marker lines
  2024-03-08 10:58 [PATCH -v4 00/10] sched/balancing: Misc updates & cleanups Ingo Molnar
                   ` (4 preceding siblings ...)
  2024-03-08 10:58 ` [PATCH 05/10] sched/debug: Allow CONFIG_SCHEDSTATS even on !DEBUG_KERNEL kernels Ingo Molnar
@ 2024-03-08 10:58 ` Ingo Molnar
  2024-03-12 12:00   ` [tip: sched/core] " tip-bot2 for Ingo Molnar
  2024-03-08 10:58 ` [PATCH 07/10] sched/balancing: Fix comments (trying to) refer to NOHZ_BALANCE_KICK Ingo Molnar
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Ingo Molnar @ 2024-03-08 10:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dietmar Eggemann, Linus Torvalds, Peter Zijlstra,
	Shrikanth Hegde, Valentin Schneider, Vincent Guittot

So the scheduler has two comment blocks that use '=' as a double underline:

        /*
         * VRUNTIME
         * ========
         *

'========' also happens to be a Git conflict marker, throwing off a simple
search in an editor for this pattern.

Change them to '-------' type of underline instead - it looks just as good.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shrikanth Hegde <sshegde@linux.ibm.com>
---
 kernel/sched/fair.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3a510cf1fb00..84d4791cf628 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3679,7 +3679,7 @@ static void reweight_eevdf(struct cfs_rq *cfs_rq, struct sched_entity *se,
 
 	/*
 	 * VRUNTIME
-	 * ========
+	 * --------
 	 *
 	 * COROLLARY #1: The virtual runtime of the entity needs to be
 	 * adjusted if re-weight at !0-lag point.
@@ -3762,7 +3762,7 @@ static void reweight_eevdf(struct cfs_rq *cfs_rq, struct sched_entity *se,
 
 	/*
 	 * DEADLINE
-	 * ========
+	 * --------
 	 *
 	 * When the weight changes, the virtual time slope changes and
 	 * we should adjust the relative virtual deadline accordingly.
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 07/10] sched/balancing: Fix comments (trying to) refer to NOHZ_BALANCE_KICK
  2024-03-08 10:58 [PATCH -v4 00/10] sched/balancing: Misc updates & cleanups Ingo Molnar
                   ` (5 preceding siblings ...)
  2024-03-08 10:58 ` [PATCH 06/10] sched/balancing: Change comment formatting to not overlap Git conflict marker lines Ingo Molnar
@ 2024-03-08 10:58 ` Ingo Molnar
  2024-03-08 12:03   ` Vincent Guittot
  2024-03-12 12:00   ` [tip: sched/core] " tip-bot2 for Ingo Molnar
  2024-03-08 10:58 ` [PATCH 08/10] sched/balancing: Update run_rebalance_domains() comments Ingo Molnar
                   ` (2 subsequent siblings)
  9 siblings, 2 replies; 35+ messages in thread
From: Ingo Molnar @ 2024-03-08 10:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dietmar Eggemann, Linus Torvalds, Peter Zijlstra,
	Shrikanth Hegde, Valentin Schneider, Vincent Guittot

Fix two typos:

 - There's no such thing as 'nohz_balancing_kick', the
   flag is named 'BALANCE' and is capitalized:  NOHZ_BALANCE_KICK.

 - Likewise there's no such thing as a 'pending nohz_balance_kick'
   either, the NOHZ_BALANCE_KICK flag is all upper-case.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shrikanth Hegde <sshegde@linux.ibm.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 84d4791cf628..f3c03c6db3c8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12409,15 +12409,16 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
 }
 
 /*
- * run_rebalance_domains is triggered when needed from the scheduler tick.
- * Also triggered for nohz idle balancing (with nohz_balancing_kick set).
+ * This softirq may be triggered from the scheduler tick, or by
+ * any of the flags in NOHZ_KICK_MASK: NOHZ_BALANCE_KICK,
+ * NOHZ_STATS_KICK or NOHZ_NEXT_KICK.
  */
 static __latent_entropy void run_rebalance_domains(struct softirq_action *h)
 {
 	struct rq *this_rq = this_rq();
 	enum cpu_idle_type idle = this_rq->idle_balance;
 	/*
-	 * If this CPU has a pending nohz_balance_kick, then do the
+	 * If this CPU has a pending NOHZ_BALANCE_KICK, then do the
 	 * balancing on behalf of the other idle CPUs whose ticks are
 	 * stopped. Do nohz_idle_balance *before* rebalance_domains to
 	 * give the idle CPUs a chance to load balance. Else we may
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 08/10] sched/balancing: Update run_rebalance_domains() comments
  2024-03-08 10:58 [PATCH -v4 00/10] sched/balancing: Misc updates & cleanups Ingo Molnar
                   ` (6 preceding siblings ...)
  2024-03-08 10:58 ` [PATCH 07/10] sched/balancing: Fix comments (trying to) refer to NOHZ_BALANCE_KICK Ingo Molnar
@ 2024-03-08 10:58 ` Ingo Molnar
  2024-03-08 12:02   ` Vincent Guittot
  2024-03-12 12:00   ` [tip: sched/core] " tip-bot2 for Ingo Molnar
  2024-03-08 10:59 ` [PATCH 09/10] sched/balancing: Vertically align the comments of 'struct sg_lb_stats' and 'struct sd_lb_stats' Ingo Molnar
  2024-03-08 10:59 ` [PATCH 10/10] sched/balancing: Update comments in " Ingo Molnar
  9 siblings, 2 replies; 35+ messages in thread
From: Ingo Molnar @ 2024-03-08 10:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dietmar Eggemann, Linus Torvalds, Peter Zijlstra,
	Shrikanth Hegde, Valentin Schneider, Vincent Guittot

The first sentence of the comment explaining run_rebalance_domains()
is historic and not true anymore:

    * run_rebalance_domains is triggered when needed from the scheduler tick.

... contradicted/modified by the second sentence:

    * Also triggered for NOHZ idle balancing (with NOHZ_BALANCE_KICK set).

Avoid that kind of confusion straight away and explain from what
places sched_balance_softirq() is triggered.
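
For context, the two trigger paths look roughly like this (a simplified
sketch of the existing call chains, not something introduced by this patch):

  1) Periodic load balancing, raised directly by the local tick:

       scheduler_tick()
         -> trigger_load_balance(rq)
              -> raise_softirq(SCHED_SOFTIRQ)

  2) NOHZ idle balancing, kicked remotely via an SMP cross-call:

       scheduler_tick()                          [on a busy CPU]
         -> trigger_load_balance(rq)
              -> nohz_balancer_kick(rq)
                   -> kick_ilb()
                        -> smp_call_function_single_async()

       nohz_csd_func()                           [on the kicked idle CPU]
         -> raise_softirq_irqoff(SCHED_SOFTIRQ)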

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shrikanth Hegde <sshegde@linux.ibm.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f3c03c6db3c8..b567c0790f44 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12409,9 +12409,12 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
 }
 
 /*
- * This softirq may be triggered from the scheduler tick, or by
- * any of the flags in NOHZ_KICK_MASK: NOHZ_BALANCE_KICK,
- * NOHZ_STATS_KICK or NOHZ_NEXT_KICK.
+ * This softirq handler is triggered via SCHED_SOFTIRQ from two places:
+ *
+ * - directly from the local scheduler_tick() for periodic load balancing
+ *
+ * - indirectly from a remote scheduler_tick() for NOHZ idle balancing
+ *   through the SMP cross-call nohz_csd_func()
  */
 static __latent_entropy void run_rebalance_domains(struct softirq_action *h)
 {
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 09/10] sched/balancing: Vertically align the comments of 'struct sg_lb_stats' and 'struct sd_lb_stats'
  2024-03-08 10:58 [PATCH -v4 00/10] sched/balancing: Misc updates & cleanups Ingo Molnar
                   ` (7 preceding siblings ...)
  2024-03-08 10:58 ` [PATCH 08/10] sched/balancing: Update run_rebalance_domains() comments Ingo Molnar
@ 2024-03-08 10:59 ` Ingo Molnar
  2024-03-08 12:01   ` Vincent Guittot
  2024-03-12 12:00   ` [tip: sched/core] " tip-bot2 for Ingo Molnar
  2024-03-08 10:59 ` [PATCH 10/10] sched/balancing: Update comments in " Ingo Molnar
  9 siblings, 2 replies; 35+ messages in thread
From: Ingo Molnar @ 2024-03-08 10:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dietmar Eggemann, Linus Torvalds, Peter Zijlstra,
	Shrikanth Hegde, Valentin Schneider, Vincent Guittot

Make them easier to read.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shrikanth Hegde <sshegde@linux.ibm.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b567c0790f44..40b98e43d794 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9436,19 +9436,19 @@ static void update_blocked_averages(int cpu)
  * sg_lb_stats - stats of a sched_group required for load_balancing
  */
 struct sg_lb_stats {
-	unsigned long avg_load; /*Avg load across the CPUs of the group */
-	unsigned long group_load; /* Total load over the CPUs of the group */
+	unsigned long avg_load;			/* Avg load across the CPUs of the group */
+	unsigned long group_load;		/* Total load over the CPUs of the group */
 	unsigned long group_capacity;
-	unsigned long group_util; /* Total utilization over the CPUs of the group */
-	unsigned long group_runnable; /* Total runnable time over the CPUs of the group */
-	unsigned int sum_nr_running; /* Nr of tasks running in the group */
-	unsigned int sum_h_nr_running; /* Nr of CFS tasks running in the group */
+	unsigned long group_util;		/* Total utilization over the CPUs of the group */
+	unsigned long group_runnable;		/* Total runnable time over the CPUs of the group */
+	unsigned int sum_nr_running;		/* Nr of tasks running in the group */
+	unsigned int sum_h_nr_running;		/* Nr of CFS tasks running in the group */
 	unsigned int idle_cpus;
 	unsigned int group_weight;
 	enum group_type group_type;
-	unsigned int group_asym_packing; /* Tasks should be moved to preferred CPU */
-	unsigned int group_smt_balance;  /* Task on busy SMT be moved */
-	unsigned long group_misfit_task_load; /* A CPU has a task too big for its capacity */
+	unsigned int group_asym_packing;	/* Tasks should be moved to preferred CPU */
+	unsigned int group_smt_balance;		/* Task on busy SMT be moved */
+	unsigned long group_misfit_task_load;	/* A CPU has a task too big for its capacity */
 #ifdef CONFIG_NUMA_BALANCING
 	unsigned int nr_numa_running;
 	unsigned int nr_preferred_running;
@@ -9460,15 +9460,15 @@ struct sg_lb_stats {
  *		 during load balancing.
  */
 struct sd_lb_stats {
-	struct sched_group *busiest;	/* Busiest group in this sd */
-	struct sched_group *local;	/* Local group in this sd */
-	unsigned long total_load;	/* Total load of all groups in sd */
-	unsigned long total_capacity;	/* Total capacity of all groups in sd */
-	unsigned long avg_load;	/* Average load across all groups in sd */
-	unsigned int prefer_sibling; /* tasks should go to sibling first */
+	struct sched_group *busiest;		/* Busiest group in this sd */
+	struct sched_group *local;		/* Local group in this sd */
+	unsigned long total_load;		/* Total load of all groups in sd */
+	unsigned long total_capacity;		/* Total capacity of all groups in sd */
+	unsigned long avg_load;			/* Average load across all groups in sd */
+	unsigned int prefer_sibling;		/* tasks should go to sibling first */
 
-	struct sg_lb_stats busiest_stat;/* Statistics of the busiest group */
-	struct sg_lb_stats local_stat;	/* Statistics of the local group */
+	struct sg_lb_stats busiest_stat;	/* Statistics of the busiest group */
+	struct sg_lb_stats local_stat;		/* Statistics of the local group */
 };
 
 static inline void init_sd_lb_stats(struct sd_lb_stats *sds)
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 10/10] sched/balancing: Update comments in 'struct sg_lb_stats' and 'struct sd_lb_stats'
  2024-03-08 10:58 [PATCH -v4 00/10] sched/balancing: Misc updates & cleanups Ingo Molnar
                   ` (8 preceding siblings ...)
  2024-03-08 10:59 ` [PATCH 09/10] sched/balancing: Vertically align the comments of 'struct sg_lb_stats' and 'struct sd_lb_stats' Ingo Molnar
@ 2024-03-08 10:59 ` Ingo Molnar
  2024-03-08 12:01   ` Vincent Guittot
  2024-03-12 12:00   ` [tip: sched/core] " tip-bot2 for Ingo Molnar
  9 siblings, 2 replies; 35+ messages in thread
From: Ingo Molnar @ 2024-03-08 10:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dietmar Eggemann, Linus Torvalds, Peter Zijlstra,
	Shrikanth Hegde, Valentin Schneider, Vincent Guittot

- Align for readability
- Capitalize consistently

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shrikanth Hegde <sshegde@linux.ibm.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 40b98e43d794..116a640534b9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9433,17 +9433,17 @@ static void update_blocked_averages(int cpu)
 /********** Helpers for find_busiest_group ************************/
 
 /*
- * sg_lb_stats - stats of a sched_group required for load_balancing
+ * sg_lb_stats - stats of a sched_group required for load-balancing:
  */
 struct sg_lb_stats {
-	unsigned long avg_load;			/* Avg load across the CPUs of the group */
-	unsigned long group_load;		/* Total load over the CPUs of the group */
-	unsigned long group_capacity;
-	unsigned long group_util;		/* Total utilization over the CPUs of the group */
+	unsigned long avg_load;			/* Avg load            over the CPUs of the group */
+	unsigned long group_load;		/* Total load          over the CPUs of the group */
+	unsigned long group_capacity;		/* Capacity            over the CPUs of the group */
+	unsigned long group_util;		/* Total utilization   over the CPUs of the group */
 	unsigned long group_runnable;		/* Total runnable time over the CPUs of the group */
-	unsigned int sum_nr_running;		/* Nr of tasks running in the group */
+	unsigned int sum_nr_running;		/* Nr of all tasks running in the group */
 	unsigned int sum_h_nr_running;		/* Nr of CFS tasks running in the group */
-	unsigned int idle_cpus;
+	unsigned int idle_cpus;                 /* Nr of idle CPUs         in the group */
 	unsigned int group_weight;
 	enum group_type group_type;
 	unsigned int group_asym_packing;	/* Tasks should be moved to preferred CPU */
@@ -9456,8 +9456,7 @@ struct sg_lb_stats {
 };
 
 /*
- * sd_lb_stats - Structure to store the statistics of a sched_domain
- *		 during load balancing.
+ * sd_lb_stats - stats of a sched_domain required for load-balancing:
  */
 struct sd_lb_stats {
 	struct sched_group *busiest;		/* Busiest group in this sd */
@@ -9465,7 +9464,7 @@ struct sd_lb_stats {
 	unsigned long total_load;		/* Total load of all groups in sd */
 	unsigned long total_capacity;		/* Total capacity of all groups in sd */
 	unsigned long avg_load;			/* Average load across all groups in sd */
-	unsigned int prefer_sibling;		/* tasks should go to sibling first */
+	unsigned int prefer_sibling;		/* Tasks should go to sibling first */
 
 	struct sg_lb_stats busiest_stat;	/* Statistics of the busiest group */
 	struct sg_lb_stats local_stat;		/* Statistics of the local group */
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 10/10] sched/balancing: Update comments in 'struct sg_lb_stats' and 'struct sd_lb_stats'
  2024-03-08 10:59 ` [PATCH 10/10] sched/balancing: Update comments in " Ingo Molnar
@ 2024-03-08 12:01   ` Vincent Guittot
  2024-03-12 12:00   ` [tip: sched/core] " tip-bot2 for Ingo Molnar
  1 sibling, 0 replies; 35+ messages in thread
From: Vincent Guittot @ 2024-03-08 12:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Dietmar Eggemann, Linus Torvalds, Peter Zijlstra,
	Shrikanth Hegde, Valentin Schneider

On Fri, 8 Mar 2024 at 11:59, Ingo Molnar <mingo@kernel.org> wrote:
>
> - Align for readability
> - Capitalize consistently
>
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> Reviewed-by: Valentin Schneider <vschneid@redhat.com>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Shrikanth Hegde <sshegde@linux.ibm.com>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>

Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>

> ---
>  kernel/sched/fair.c | 19 +++++++++----------
>  1 file changed, 9 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 40b98e43d794..116a640534b9 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9433,17 +9433,17 @@ static void update_blocked_averages(int cpu)
>  /********** Helpers for find_busiest_group ************************/
>
>  /*
> - * sg_lb_stats - stats of a sched_group required for load_balancing
> + * sg_lb_stats - stats of a sched_group required for load-balancing:
>   */
>  struct sg_lb_stats {
> -       unsigned long avg_load;                 /* Avg load across the CPUs of the group */
> -       unsigned long group_load;               /* Total load over the CPUs of the group */
> -       unsigned long group_capacity;
> -       unsigned long group_util;               /* Total utilization over the CPUs of the group */
> +       unsigned long avg_load;                 /* Avg load            over the CPUs of the group */
> +       unsigned long group_load;               /* Total load          over the CPUs of the group */
> +       unsigned long group_capacity;           /* Capacity            over the CPUs of the group */
> +       unsigned long group_util;               /* Total utilization   over the CPUs of the group */
>         unsigned long group_runnable;           /* Total runnable time over the CPUs of the group */
> -       unsigned int sum_nr_running;            /* Nr of tasks running in the group */
> +       unsigned int sum_nr_running;            /* Nr of all tasks running in the group */
>         unsigned int sum_h_nr_running;          /* Nr of CFS tasks running in the group */
> -       unsigned int idle_cpus;
> +       unsigned int idle_cpus;                 /* Nr of idle CPUs         in the group */
>         unsigned int group_weight;
>         enum group_type group_type;
>         unsigned int group_asym_packing;        /* Tasks should be moved to preferred CPU */
> @@ -9456,8 +9456,7 @@ struct sg_lb_stats {
>  };
>
>  /*
> - * sd_lb_stats - Structure to store the statistics of a sched_domain
> - *              during load balancing.
> + * sd_lb_stats - stats of a sched_domain required for load-balancing:
>   */
>  struct sd_lb_stats {
>         struct sched_group *busiest;            /* Busiest group in this sd */
> @@ -9465,7 +9464,7 @@ struct sd_lb_stats {
>         unsigned long total_load;               /* Total load of all groups in sd */
>         unsigned long total_capacity;           /* Total capacity of all groups in sd */
>         unsigned long avg_load;                 /* Average load across all groups in sd */
> -       unsigned int prefer_sibling;            /* tasks should go to sibling first */
> +       unsigned int prefer_sibling;            /* Tasks should go to sibling first */
>
>         struct sg_lb_stats busiest_stat;        /* Statistics of the busiest group */
>         struct sg_lb_stats local_stat;          /* Statistics of the local group */
> --
> 2.40.1
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 09/10] sched/balancing: Vertically align the comments of 'struct sg_lb_stats' and 'struct sd_lb_stats'
  2024-03-08 10:59 ` [PATCH 09/10] sched/balancing: Vertically align the comments of 'struct sg_lb_stats' and 'struct sd_lb_stats' Ingo Molnar
@ 2024-03-08 12:01   ` Vincent Guittot
  2024-03-12 12:00   ` [tip: sched/core] " tip-bot2 for Ingo Molnar
  1 sibling, 0 replies; 35+ messages in thread
From: Vincent Guittot @ 2024-03-08 12:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Dietmar Eggemann, Linus Torvalds, Peter Zijlstra,
	Shrikanth Hegde, Valentin Schneider

On Fri, 8 Mar 2024 at 11:59, Ingo Molnar <mingo@kernel.org> wrote:
>
> Make them easier to read.
>
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> Reviewed-by: Valentin Schneider <vschneid@redhat.com>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Shrikanth Hegde <sshegde@linux.ibm.com>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>

Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>

> ---
>  kernel/sched/fair.c | 34 +++++++++++++++++-----------------
>  1 file changed, 17 insertions(+), 17 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index b567c0790f44..40b98e43d794 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9436,19 +9436,19 @@ static void update_blocked_averages(int cpu)
>   * sg_lb_stats - stats of a sched_group required for load_balancing
>   */
>  struct sg_lb_stats {
> -       unsigned long avg_load; /*Avg load across the CPUs of the group */
> -       unsigned long group_load; /* Total load over the CPUs of the group */
> +       unsigned long avg_load;                 /* Avg load across the CPUs of the group */
> +       unsigned long group_load;               /* Total load over the CPUs of the group */
>         unsigned long group_capacity;
> -       unsigned long group_util; /* Total utilization over the CPUs of the group */
> -       unsigned long group_runnable; /* Total runnable time over the CPUs of the group */
> -       unsigned int sum_nr_running; /* Nr of tasks running in the group */
> -       unsigned int sum_h_nr_running; /* Nr of CFS tasks running in the group */
> +       unsigned long group_util;               /* Total utilization over the CPUs of the group */
> +       unsigned long group_runnable;           /* Total runnable time over the CPUs of the group */
> +       unsigned int sum_nr_running;            /* Nr of tasks running in the group */
> +       unsigned int sum_h_nr_running;          /* Nr of CFS tasks running in the group */
>         unsigned int idle_cpus;
>         unsigned int group_weight;
>         enum group_type group_type;
> -       unsigned int group_asym_packing; /* Tasks should be moved to preferred CPU */
> -       unsigned int group_smt_balance;  /* Task on busy SMT be moved */
> -       unsigned long group_misfit_task_load; /* A CPU has a task too big for its capacity */
> +       unsigned int group_asym_packing;        /* Tasks should be moved to preferred CPU */
> +       unsigned int group_smt_balance;         /* Task on busy SMT be moved */
> +       unsigned long group_misfit_task_load;   /* A CPU has a task too big for its capacity */
>  #ifdef CONFIG_NUMA_BALANCING
>         unsigned int nr_numa_running;
>         unsigned int nr_preferred_running;
> @@ -9460,15 +9460,15 @@ struct sg_lb_stats {
>   *              during load balancing.
>   */
>  struct sd_lb_stats {
> -       struct sched_group *busiest;    /* Busiest group in this sd */
> -       struct sched_group *local;      /* Local group in this sd */
> -       unsigned long total_load;       /* Total load of all groups in sd */
> -       unsigned long total_capacity;   /* Total capacity of all groups in sd */
> -       unsigned long avg_load; /* Average load across all groups in sd */
> -       unsigned int prefer_sibling; /* tasks should go to sibling first */
> +       struct sched_group *busiest;            /* Busiest group in this sd */
> +       struct sched_group *local;              /* Local group in this sd */
> +       unsigned long total_load;               /* Total load of all groups in sd */
> +       unsigned long total_capacity;           /* Total capacity of all groups in sd */
> +       unsigned long avg_load;                 /* Average load across all groups in sd */
> +       unsigned int prefer_sibling;            /* tasks should go to sibling first */
>
> -       struct sg_lb_stats busiest_stat;/* Statistics of the busiest group */
> -       struct sg_lb_stats local_stat;  /* Statistics of the local group */
> +       struct sg_lb_stats busiest_stat;        /* Statistics of the busiest group */
> +       struct sg_lb_stats local_stat;          /* Statistics of the local group */
>  };
>
>  static inline void init_sd_lb_stats(struct sd_lb_stats *sds)
> --
> 2.40.1
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 08/10] sched/balancing: Update run_rebalance_domains() comments
  2024-03-08 10:58 ` [PATCH 08/10] sched/balancing: Update run_rebalance_domains() comments Ingo Molnar
@ 2024-03-08 12:02   ` Vincent Guittot
  2024-03-12 12:00   ` [tip: sched/core] " tip-bot2 for Ingo Molnar
  1 sibling, 0 replies; 35+ messages in thread
From: Vincent Guittot @ 2024-03-08 12:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Dietmar Eggemann, Linus Torvalds, Peter Zijlstra,
	Shrikanth Hegde, Valentin Schneider

On Fri, 8 Mar 2024 at 11:59, Ingo Molnar <mingo@kernel.org> wrote:
>
> The first sentence of the comment explaining run_rebalance_domains()
> is historic and not true anymore:
>
>     * run_rebalance_domains is triggered when needed from the scheduler tick.
>
> ... contradicted/modified by the second sentence:
>
>     * Also triggered for NOHZ idle balancing (with NOHZ_BALANCE_KICK set).
>
> Avoid that kind of confusion straight away and explain from what
> places sched_balance_softirq() is triggered.
>
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Shrikanth Hegde <sshegde@linux.ibm.com>
> Cc: Valentin Schneider <vschneid@redhat.com>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>

Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>

> ---
>  kernel/sched/fair.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index f3c03c6db3c8..b567c0790f44 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -12409,9 +12409,12 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
>  }
>
>  /*
> - * This softirq may be triggered from the scheduler tick, or by
> - * any of the flags in NOHZ_KICK_MASK: NOHZ_BALANCE_KICK,
> - * NOHZ_STATS_KICK or NOHZ_NEXT_KICK.
> + * This softirq handler is triggered via SCHED_SOFTIRQ from two places:
> + *
> + * - directly from the local scheduler_tick() for periodic load balancing
> + *
> + * - indirectly from a remote scheduler_tick() for NOHZ idle balancing
> + *   through the SMP cross-call nohz_csd_func()
>   */
>  static __latent_entropy void run_rebalance_domains(struct softirq_action *h)
>  {
> --
> 2.40.1
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 07/10] sched/balancing: Fix comments (trying to) refer to NOHZ_BALANCE_KICK
  2024-03-08 10:58 ` [PATCH 07/10] sched/balancing: Fix comments (trying to) refer to NOHZ_BALANCE_KICK Ingo Molnar
@ 2024-03-08 12:03   ` Vincent Guittot
  2024-03-12 12:00   ` [tip: sched/core] " tip-bot2 for Ingo Molnar
  1 sibling, 0 replies; 35+ messages in thread
From: Vincent Guittot @ 2024-03-08 12:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Dietmar Eggemann, Linus Torvalds, Peter Zijlstra,
	Shrikanth Hegde, Valentin Schneider

On Fri, 8 Mar 2024 at 11:59, Ingo Molnar <mingo@kernel.org> wrote:
>
> Fix two typos:
>
>  - There's no such thing as 'nohz_balancing_kick', the
>    flag is named 'BALANCE' and is capitalized:  NOHZ_BALANCE_KICK.
>
>  - Likewise there's no such thing as a 'pending nohz_balance_kick'
>    either, the NOHZ_BALANCE_KICK flag is all upper-case.
>
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> Reviewed-by: Valentin Schneider <vschneid@redhat.com>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Shrikanth Hegde <sshegde@linux.ibm.com>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>

Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>

> ---
>  kernel/sched/fair.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 84d4791cf628..f3c03c6db3c8 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -12409,15 +12409,16 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
>  }
>
>  /*
> - * run_rebalance_domains is triggered when needed from the scheduler tick.
> - * Also triggered for nohz idle balancing (with nohz_balancing_kick set).
> + * This softirq may be triggered from the scheduler tick, or by
> + * any of the flags in NOHZ_KICK_MASK: NOHZ_BALANCE_KICK,
> + * NOHZ_STATS_KICK or NOHZ_NEXT_KICK.
>   */
>  static __latent_entropy void run_rebalance_domains(struct softirq_action *h)
>  {
>         struct rq *this_rq = this_rq();
>         enum cpu_idle_type idle = this_rq->idle_balance;
>         /*
> -        * If this CPU has a pending nohz_balance_kick, then do the
> +        * If this CPU has a pending NOHZ_BALANCE_KICK, then do the
>          * balancing on behalf of the other idle CPUs whose ticks are
>          * stopped. Do nohz_idle_balance *before* rebalance_domains to
>          * give the idle CPUs a chance to load balance. Else we may
> --
> 2.40.1
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 02/10] sched/balancing: Remove reliance on 'enum cpu_idle_type' ordering when iterating [CPU_MAX_IDLE_TYPES] arrays in show_schedstat()
  2024-03-08 10:58 ` [PATCH 02/10] sched/balancing: Remove reliance on 'enum cpu_idle_type' ordering when iterating [CPU_MAX_IDLE_TYPES] arrays in show_schedstat() Ingo Molnar
@ 2024-03-08 13:26   ` Vincent Guittot
  2024-03-12 12:00   ` [tip: sched/core] " tip-bot2 for Shrikanth Hegde
  1 sibling, 0 replies; 35+ messages in thread
From: Vincent Guittot @ 2024-03-08 13:26 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Dietmar Eggemann, Linus Torvalds, Peter Zijlstra,
	Shrikanth Hegde, Valentin Schneider

On Fri, 8 Mar 2024 at 11:59, Ingo Molnar <mingo@kernel.org> wrote:
>
> From: Shrikanth Hegde <sshegde@linux.ibm.com>
>
> show_schedstat() output breaks and doesn't print all entries
> if the ordering of the definitions in 'enum cpu_idle_type' is changed,
> because show_schedstat() assumes that 'CPU_IDLE' is 0.
>
> Fix it before we change the definition order & values.
>
> [ mingo: Added changelog. ]
>
> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Valentin Schneider <vschneid@redhat.com>

Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>

> ---
>  kernel/sched/stats.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
> index 857f837f52cb..85277953cc72 100644
> --- a/kernel/sched/stats.c
> +++ b/kernel/sched/stats.c
> @@ -150,8 +150,7 @@ static int show_schedstat(struct seq_file *seq, void *v)
>
>                         seq_printf(seq, "domain%d %*pb", dcount++,
>                                    cpumask_pr_args(sched_domain_span(sd)));
> -                       for (itype = CPU_IDLE; itype < CPU_MAX_IDLE_TYPES;
> -                                       itype++) {
> +                       for (itype = 0; itype < CPU_MAX_IDLE_TYPES; itype++) {
>                                 seq_printf(seq, " %u %u %u %u %u %u %u %u",
>                                     sd->lb_count[itype],
>                                     sd->lb_balanced[itype],
> --
> 2.40.1
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 04/10] sched/debug: Increase SCHEDSTAT_VERSION to 16
  2024-03-08 10:58 ` [PATCH 04/10] sched/debug: Increase SCHEDSTAT_VERSION to 16 Ingo Molnar
@ 2024-03-08 16:40   ` Shrikanth Hegde
  2024-03-12 10:01     ` Ingo Molnar
  2024-03-12 12:00   ` [tip: sched/core] " tip-bot2 for Ingo Molnar
  1 sibling, 1 reply; 35+ messages in thread
From: Shrikanth Hegde @ 2024-03-08 16:40 UTC (permalink / raw)
  To: Ingo Molnar, linux-kernel
  Cc: Dietmar Eggemann, Linus Torvalds, Peter Zijlstra,
	Valentin Schneider, Vincent Guittot, Gautham R . Shenoy



On 3/8/24 4:28 PM, Ingo Molnar wrote:
> We changed the order of definitions within 'enum cpu_idle_type',
> which changed the order of [CPU_MAX_IDLE_TYPES] columns in
> show_schedstat().
> 

> +++ b/kernel/sched/stats.c
> @@ -113,7 +113,7 @@ void __update_stats_enqueue_sleeper(struct rq *rq, struct task_struct *p,
>   * Bump this up when changing the output format or the meaning of an existing
>   * format, so that tools can adapt (or abort)
>   */
> -#define SCHEDSTAT_VERSION 15
> +#define SCHEDSTAT_VERSION 16

Please add the info about version, and change of the order 
briefly in Documentation/scheduler/sched-stats.rst as well.

One recent user that I recollect is the sched scoreboard. The version number should
be able to help users when a format change breaks their scripts.

+Gautham, Any thoughts?

>  
>  static int show_schedstat(struct seq_file *seq, void *v)
>  {

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 04/10] sched/debug: Increase SCHEDSTAT_VERSION to 16
  2024-03-08 16:40   ` Shrikanth Hegde
@ 2024-03-12 10:01     ` Ingo Molnar
  2024-03-14  5:21       ` Gautham R. Shenoy
  0 siblings, 1 reply; 35+ messages in thread
From: Ingo Molnar @ 2024-03-12 10:01 UTC (permalink / raw)
  To: Shrikanth Hegde
  Cc: linux-kernel, Dietmar Eggemann, Linus Torvalds, Peter Zijlstra,
	Valentin Schneider, Vincent Guittot, Gautham R . Shenoy


* Shrikanth Hegde <sshegde@linux.ibm.com> wrote:

> 
> 
> On 3/8/24 4:28 PM, Ingo Molnar wrote:
> > We changed the order of definitions within 'enum cpu_idle_type',
> > which changed the order of [CPU_MAX_IDLE_TYPES] columns in
> > show_schedstat().
> > 
> 
> > +++ b/kernel/sched/stats.c
> > @@ -113,7 +113,7 @@ void __update_stats_enqueue_sleeper(struct rq *rq, struct task_struct *p,
> >   * Bump this up when changing the output format or the meaning of an existing
> >   * format, so that tools can adapt (or abort)
> >   */
> > -#define SCHEDSTAT_VERSION 15
> > +#define SCHEDSTAT_VERSION 16
> 
> Please add the info about version, and change of the order 
> briefly in Documentation/scheduler/sched-stats.rst as well.

Good point, I've added this paragraph to sched-stats.rst:

 +Version 16 of schedstats changed the order of definitions within
 +'enum cpu_idle_type', which changed the order of [CPU_MAX_IDLE_TYPES]
 +columns in show_schedstat(). In particular the position of CPU_IDLE
 +and __CPU_NOT_IDLE changed places. The size of the array is unchanged.

> One recent user that I recollect is the sched scoreboard. The version number should
> be able to help users when a format change breaks their scripts.
> 
> +Gautham, Any thoughts?

If it's really a problem, I suspect we could maintain the v15 order of 
output.
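
For example, a tool can key off the version line that /proc/schedstat prints
first and bail out on a mismatch - a minimal sketch:

  #include <stdio.h>

  static int schedstat_version_ok(void)
  {
          FILE *f = fopen("/proc/schedstat", "r");
          unsigned int ver = 0;
          int ok;

          if (!f)
                  return 0;
          ok = (fscanf(f, "version %u", &ver) == 1 && ver == 16);
          fclose(f);

          return ok;      /* only version 16 output is understood here */
  }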

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [tip: sched/core] sched/balancing: Update comments in 'struct sg_lb_stats' and 'struct sd_lb_stats'
  2024-03-08 10:59 ` [PATCH 10/10] sched/balancing: Update comments in " Ingo Molnar
  2024-03-08 12:01   ` Vincent Guittot
@ 2024-03-12 12:00   ` tip-bot2 for Ingo Molnar
  1 sibling, 0 replies; 35+ messages in thread
From: tip-bot2 for Ingo Molnar @ 2024-03-12 12:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ingo Molnar, Valentin Schneider, Vincent Guittot, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     33928ed8bde066316dc3cb6ccf7f74400073652f
Gitweb:        https://git.kernel.org/tip/33928ed8bde066316dc3cb6ccf7f74400073652f
Author:        Ingo Molnar <mingo@kernel.org>
AuthorDate:    Fri, 08 Mar 2024 11:59:01 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 12 Mar 2024 11:59:59 +01:00

sched/balancing: Update comments in 'struct sg_lb_stats' and 'struct sd_lb_stats'

- Align for readability
- Capitalize consistently

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20240308105901.1096078-11-mingo@kernel.org
---
 kernel/sched/fair.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 40b98e4..116a640 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9433,17 +9433,17 @@ static void update_blocked_averages(int cpu)
 /********** Helpers for find_busiest_group ************************/
 
 /*
- * sg_lb_stats - stats of a sched_group required for load_balancing
+ * sg_lb_stats - stats of a sched_group required for load-balancing:
  */
 struct sg_lb_stats {
-	unsigned long avg_load;			/* Avg load across the CPUs of the group */
-	unsigned long group_load;		/* Total load over the CPUs of the group */
-	unsigned long group_capacity;
-	unsigned long group_util;		/* Total utilization over the CPUs of the group */
+	unsigned long avg_load;			/* Avg load            over the CPUs of the group */
+	unsigned long group_load;		/* Total load          over the CPUs of the group */
+	unsigned long group_capacity;		/* Capacity            over the CPUs of the group */
+	unsigned long group_util;		/* Total utilization   over the CPUs of the group */
 	unsigned long group_runnable;		/* Total runnable time over the CPUs of the group */
-	unsigned int sum_nr_running;		/* Nr of tasks running in the group */
+	unsigned int sum_nr_running;		/* Nr of all tasks running in the group */
 	unsigned int sum_h_nr_running;		/* Nr of CFS tasks running in the group */
-	unsigned int idle_cpus;
+	unsigned int idle_cpus;                 /* Nr of idle CPUs         in the group */
 	unsigned int group_weight;
 	enum group_type group_type;
 	unsigned int group_asym_packing;	/* Tasks should be moved to preferred CPU */
@@ -9456,8 +9456,7 @@ struct sg_lb_stats {
 };
 
 /*
- * sd_lb_stats - Structure to store the statistics of a sched_domain
- *		 during load balancing.
+ * sd_lb_stats - stats of a sched_domain required for load-balancing:
  */
 struct sd_lb_stats {
 	struct sched_group *busiest;		/* Busiest group in this sd */
@@ -9465,7 +9464,7 @@ struct sd_lb_stats {
 	unsigned long total_load;		/* Total load of all groups in sd */
 	unsigned long total_capacity;		/* Total capacity of all groups in sd */
 	unsigned long avg_load;			/* Average load across all groups in sd */
-	unsigned int prefer_sibling;		/* tasks should go to sibling first */
+	unsigned int prefer_sibling;		/* Tasks should go to sibling first */
 
 	struct sg_lb_stats busiest_stat;	/* Statistics of the busiest group */
 	struct sg_lb_stats local_stat;		/* Statistics of the local group */

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [tip: sched/core] sched/balancing: Vertically align the comments of 'struct sg_lb_stats' and 'struct sd_lb_stats'
  2024-03-08 10:59 ` [PATCH 09/10] sched/balancing: Vertically align the comments of 'struct sg_lb_stats' and 'struct sd_lb_stats' Ingo Molnar
  2024-03-08 12:01   ` Vincent Guittot
@ 2024-03-12 12:00   ` tip-bot2 for Ingo Molnar
  1 sibling, 0 replies; 35+ messages in thread
From: tip-bot2 for Ingo Molnar @ 2024-03-12 12:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ingo Molnar, Valentin Schneider, Vincent Guittot, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     e492e1b0e0721f3929ef9d9708d029144b396dd7
Gitweb:        https://git.kernel.org/tip/e492e1b0e0721f3929ef9d9708d029144b396dd7
Author:        Ingo Molnar <mingo@kernel.org>
AuthorDate:    Fri, 08 Mar 2024 11:59:00 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 12 Mar 2024 11:59:59 +01:00

sched/balancing: Vertically align the comments of 'struct sg_lb_stats' and 'struct sd_lb_stats'

Make them easier to read.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20240308105901.1096078-10-mingo@kernel.org
---
 kernel/sched/fair.c | 36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b567c07..40b98e4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9436,19 +9436,19 @@ static void update_blocked_averages(int cpu)
  * sg_lb_stats - stats of a sched_group required for load_balancing
  */
 struct sg_lb_stats {
-	unsigned long avg_load; /*Avg load across the CPUs of the group */
-	unsigned long group_load; /* Total load over the CPUs of the group */
+	unsigned long avg_load;			/* Avg load across the CPUs of the group */
+	unsigned long group_load;		/* Total load over the CPUs of the group */
 	unsigned long group_capacity;
-	unsigned long group_util; /* Total utilization over the CPUs of the group */
-	unsigned long group_runnable; /* Total runnable time over the CPUs of the group */
-	unsigned int sum_nr_running; /* Nr of tasks running in the group */
-	unsigned int sum_h_nr_running; /* Nr of CFS tasks running in the group */
+	unsigned long group_util;		/* Total utilization over the CPUs of the group */
+	unsigned long group_runnable;		/* Total runnable time over the CPUs of the group */
+	unsigned int sum_nr_running;		/* Nr of tasks running in the group */
+	unsigned int sum_h_nr_running;		/* Nr of CFS tasks running in the group */
 	unsigned int idle_cpus;
 	unsigned int group_weight;
 	enum group_type group_type;
-	unsigned int group_asym_packing; /* Tasks should be moved to preferred CPU */
-	unsigned int group_smt_balance;  /* Task on busy SMT be moved */
-	unsigned long group_misfit_task_load; /* A CPU has a task too big for its capacity */
+	unsigned int group_asym_packing;	/* Tasks should be moved to preferred CPU */
+	unsigned int group_smt_balance;		/* Task on busy SMT be moved */
+	unsigned long group_misfit_task_load;	/* A CPU has a task too big for its capacity */
 #ifdef CONFIG_NUMA_BALANCING
 	unsigned int nr_numa_running;
 	unsigned int nr_preferred_running;
@@ -9460,15 +9460,15 @@ struct sg_lb_stats {
  *		 during load balancing.
  */
 struct sd_lb_stats {
-	struct sched_group *busiest;	/* Busiest group in this sd */
-	struct sched_group *local;	/* Local group in this sd */
-	unsigned long total_load;	/* Total load of all groups in sd */
-	unsigned long total_capacity;	/* Total capacity of all groups in sd */
-	unsigned long avg_load;	/* Average load across all groups in sd */
-	unsigned int prefer_sibling; /* tasks should go to sibling first */
-
-	struct sg_lb_stats busiest_stat;/* Statistics of the busiest group */
-	struct sg_lb_stats local_stat;	/* Statistics of the local group */
+	struct sched_group *busiest;		/* Busiest group in this sd */
+	struct sched_group *local;		/* Local group in this sd */
+	unsigned long total_load;		/* Total load of all groups in sd */
+	unsigned long total_capacity;		/* Total capacity of all groups in sd */
+	unsigned long avg_load;			/* Average load across all groups in sd */
+	unsigned int prefer_sibling;		/* tasks should go to sibling first */
+
+	struct sg_lb_stats busiest_stat;	/* Statistics of the busiest group */
+	struct sg_lb_stats local_stat;		/* Statistics of the local group */
 };
 
 static inline void init_sd_lb_stats(struct sd_lb_stats *sds)

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [tip: sched/core] sched/balancing: Update run_rebalance_domains() comments
  2024-03-08 10:58 ` [PATCH 08/10] sched/balancing: Update run_rebalance_domains() comments Ingo Molnar
  2024-03-08 12:02   ` Vincent Guittot
@ 2024-03-12 12:00   ` tip-bot2 for Ingo Molnar
  1 sibling, 0 replies; 35+ messages in thread
From: tip-bot2 for Ingo Molnar @ 2024-03-12 12:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ingo Molnar, Vincent Guittot, Valentin Schneider, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     3dc6f6c8efe2d9148865664fffbe9dd89fb0d157
Gitweb:        https://git.kernel.org/tip/3dc6f6c8efe2d9148865664fffbe9dd89fb0d157
Author:        Ingo Molnar <mingo@kernel.org>
AuthorDate:    Fri, 08 Mar 2024 11:58:59 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 12 Mar 2024 11:59:43 +01:00

sched/balancing: Update run_rebalance_domains() comments

The first sentence of the comment explaining run_rebalance_domains()
is historic and not true anymore:

    * run_rebalance_domains is triggered when needed from the scheduler tick.

... contradicted/modified by the second sentence:

    * Also triggered for NOHZ idle balancing (with NOHZ_BALANCE_KICK set).

Avoid that kind of confusion straight away and explain from what
places sched_balance_softirq() is triggered.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Acked-by: Valentin Schneider <vschneid@redhat.com>
Link: https://lore.kernel.org/r/20240308105901.1096078-9-mingo@kernel.org
---
 kernel/sched/fair.c |  9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f3c03c6..b567c07 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12409,9 +12409,12 @@ out:
 }
 
 /*
- * This softirq may be triggered from the scheduler tick, or by
- * any of the flags in NOHZ_KICK_MASK: NOHZ_BALANCE_KICK,
- * NOHZ_STATS_KICK or NOHZ_NEXT_KICK.
+ * This softirq handler is triggered via SCHED_SOFTIRQ from two places:
+ *
+ * - directly from the local scheduler_tick() for periodic load balancing
+ *
+ * - indirectly from a remote scheduler_tick() for NOHZ idle balancing
+ *   through the SMP cross-call nohz_csd_func()
  */
 static __latent_entropy void run_rebalance_domains(struct softirq_action *h)
 {

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [tip: sched/core] sched/balancing: Change comment formatting to not overlap Git conflict marker lines
  2024-03-08 10:58 ` [PATCH 06/10] sched/balancing: Change comment formatting to not overlap Git conflict marker lines Ingo Molnar
@ 2024-03-12 12:00   ` tip-bot2 for Ingo Molnar
  0 siblings, 0 replies; 35+ messages in thread
From: tip-bot2 for Ingo Molnar @ 2024-03-12 12:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ingo Molnar, Valentin Schneider, Vincent Guittot, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     be8858dba9a2c3aec454a6b382671101fd0dc3b7
Gitweb:        https://git.kernel.org/tip/be8858dba9a2c3aec454a6b382671101fd0dc3b7
Author:        Ingo Molnar <mingo@kernel.org>
AuthorDate:    Fri, 08 Mar 2024 11:58:57 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 12 Mar 2024 11:03:41 +01:00

sched/balancing: Change comment formatting to not overlap Git conflict marker lines

So the scheduler has two such comment blocks, with '=' used as a double underline:

        /*
         * VRUNTIME
         * ========
         *

'========' also happens to be a Git conflict marker, throwing off a simple
search in an editor for this pattern.

Change them to '-------' type of underline instead - it looks just as good.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20240308105901.1096078-7-mingo@kernel.org
---
 kernel/sched/fair.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3a510cf..84d4791 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3679,7 +3679,7 @@ static void reweight_eevdf(struct cfs_rq *cfs_rq, struct sched_entity *se,
 
 	/*
 	 * VRUNTIME
-	 * ========
+	 * --------
 	 *
 	 * COROLLARY #1: The virtual runtime of the entity needs to be
 	 * adjusted if re-weight at !0-lag point.
@@ -3762,7 +3762,7 @@ static void reweight_eevdf(struct cfs_rq *cfs_rq, struct sched_entity *se,
 
 	/*
 	 * DEADLINE
-	 * ========
+	 * --------
 	 *
 	 * When the weight changes, the virtual time slope changes and
 	 * we should adjust the relative virtual deadline accordingly.

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [tip: sched/core] sched/balancing: Fix comments (trying to) refer to NOHZ_BALANCE_KICK
  2024-03-08 10:58 ` [PATCH 07/10] sched/balancing: Fix comments (trying to) refer to NOHZ_BALANCE_KICK Ingo Molnar
  2024-03-08 12:03   ` Vincent Guittot
@ 2024-03-12 12:00   ` tip-bot2 for Ingo Molnar
  1 sibling, 0 replies; 35+ messages in thread
From: tip-bot2 for Ingo Molnar @ 2024-03-12 12:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ingo Molnar, Valentin Schneider, Vincent Guittot, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     3a5fe9305719c680ccf63216781a4d4068c8e3f3
Gitweb:        https://git.kernel.org/tip/3a5fe9305719c680ccf63216781a4d4068c8e3f3
Author:        Ingo Molnar <mingo@kernel.org>
AuthorDate:    Fri, 08 Mar 2024 11:58:58 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 12 Mar 2024 11:03:42 +01:00

sched/balancing: Fix comments (trying to) refer to NOHZ_BALANCE_KICK

Fix two typos:

 - There's no such thing as 'nohz_balancing_kick', the
   flag is named 'BALANCE' and is capitalized:  NOHZ_BALANCE_KICK.

 - Likewise there's no such thing as a 'pending nohz_balance_kick'
   either, the NOHZ_BALANCE_KICK flag is all upper-case.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20240308105901.1096078-8-mingo@kernel.org
---
 kernel/sched/fair.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 84d4791..f3c03c6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12409,15 +12409,16 @@ out:
 }
 
 /*
- * run_rebalance_domains is triggered when needed from the scheduler tick.
- * Also triggered for nohz idle balancing (with nohz_balancing_kick set).
+ * This softirq may be triggered from the scheduler tick, or by
+ * any of the flags in NOHZ_KICK_MASK: NOHZ_BALANCE_KICK,
+ * NOHZ_STATS_KICK or NOHZ_NEXT_KICK.
  */
 static __latent_entropy void run_rebalance_domains(struct softirq_action *h)
 {
 	struct rq *this_rq = this_rq();
 	enum cpu_idle_type idle = this_rq->idle_balance;
 	/*
-	 * If this CPU has a pending nohz_balance_kick, then do the
+	 * If this CPU has a pending NOHZ_BALANCE_KICK, then do the
 	 * balancing on behalf of the other idle CPUs whose ticks are
 	 * stopped. Do nohz_idle_balance *before* rebalance_domains to
 	 * give the idle CPUs a chance to load balance. Else we may

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [tip: sched/core] sched/debug: Allow CONFIG_SCHEDSTATS even on !KERNEL_DEBUG kernels
  2024-03-08 10:58 ` [PATCH 05/10] sched/debug: Allow CONFIG_SCHEDSTATS even on !KERNEL_DEBUG kernels Ingo Molnar
@ 2024-03-12 12:00   ` tip-bot2 for Ingo Molnar
  0 siblings, 0 replies; 35+ messages in thread
From: tip-bot2 for Ingo Molnar @ 2024-03-12 12:00 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Ingo Molnar, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     9ab121d65e03b4dc38f207871070eb353b396b05
Gitweb:        https://git.kernel.org/tip/9ab121d65e03b4dc38f207871070eb353b396b05
Author:        Ingo Molnar <mingo@kernel.org>
AuthorDate:    Fri, 08 Mar 2024 11:58:56 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 12 Mar 2024 11:03:41 +01:00

sched/debug: Allow CONFIG_SCHEDSTATS even on !KERNEL_DEBUG kernels

All major Linux distributions enable CONFIG_SCHEDSTATS,
so make it more widely available.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240308105901.1096078-6-mingo@kernel.org
---
 lib/Kconfig.debug | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 6c596e6..ed2ad1a 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1250,7 +1250,7 @@ config SCHED_INFO
 
 config SCHEDSTATS
 	bool "Collect scheduler statistics"
-	depends on DEBUG_KERNEL && PROC_FS
+	depends on PROC_FS
 	select SCHED_INFO
 	help
 	  If you say Y here, additional code will be inserted into the

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [tip: sched/core] sched/debug: Increase SCHEDSTAT_VERSION to 16
  2024-03-08 10:58 ` [PATCH 04/10] sched/debug: Increase SCHEDSTAT_VERSION to 16 Ingo Molnar
  2024-03-08 16:40   ` Shrikanth Hegde
@ 2024-03-12 12:00   ` tip-bot2 for Ingo Molnar
  1 sibling, 0 replies; 35+ messages in thread
From: tip-bot2 for Ingo Molnar @ 2024-03-12 12:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Shrikanth Hegde, Ingo Molnar, Gautham R. Shenoy, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     11b0bfa5d463b17cac5bf6b94fea4921713530c3
Gitweb:        https://git.kernel.org/tip/11b0bfa5d463b17cac5bf6b94fea4921713530c3
Author:        Ingo Molnar <mingo@kernel.org>
AuthorDate:    Fri, 08 Mar 2024 11:58:55 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 12 Mar 2024 11:03:40 +01:00

sched/debug: Increase SCHEDSTAT_VERSION to 16

We changed the order of definitions within 'enum cpu_idle_type',
which changed the order of [CPU_MAX_IDLE_TYPES] columns in
show_schedstat().

Suggested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: "Gautham R. Shenoy" <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20240308105901.1096078-5-mingo@kernel.org
---
 Documentation/scheduler/sched-stats.rst | 5 +++++
 kernel/sched/stats.c                    | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/Documentation/scheduler/sched-stats.rst b/Documentation/scheduler/sched-stats.rst
index 03c0629..73c4126 100644
--- a/Documentation/scheduler/sched-stats.rst
+++ b/Documentation/scheduler/sched-stats.rst
@@ -2,6 +2,11 @@
 Scheduler Statistics
 ====================
 
+Version 16 of schedstats changed the order of definitions within
+'enum cpu_idle_type', which changed the order of [CPU_MAX_IDLE_TYPES]
+columns in show_schedstat(). In particular the position of CPU_IDLE
+and __CPU_NOT_IDLE changed places. The size of the array is unchanged.
+
 Version 15 of schedstats dropped counters for some sched_yield:
 yld_exp_empty, yld_act_empty and yld_both_empty. Otherwise, it is
 identical to version 14.
diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
index 8527795..78e48f5 100644
--- a/kernel/sched/stats.c
+++ b/kernel/sched/stats.c
@@ -113,7 +113,7 @@ void __update_stats_enqueue_sleeper(struct rq *rq, struct task_struct *p,
  * Bump this up when changing the output format or the meaning of an existing
  * format, so that tools can adapt (or abort)
  */
-#define SCHEDSTAT_VERSION 15
+#define SCHEDSTAT_VERSION 16
 
 static int show_schedstat(struct seq_file *seq, void *v)
 {

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [tip: sched/core] sched/balancing: Remove reliance on 'enum cpu_idle_type' ordering when iterating [CPU_MAX_IDLE_TYPES] arrays in show_schedstat()
  2024-03-08 10:58 ` [PATCH 02/10] sched/balancing: Remove reliance on 'enum cpu_idle_type' ordering when iterating [CPU_MAX_IDLE_TYPES] arrays in show_schedstat() Ingo Molnar
  2024-03-08 13:26   ` Vincent Guittot
@ 2024-03-12 12:00   ` tip-bot2 for Shrikanth Hegde
  1 sibling, 0 replies; 35+ messages in thread
From: tip-bot2 for Shrikanth Hegde @ 2024-03-12 12:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Shrikanth Hegde, Ingo Molnar, Vincent Guittot, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     02a61f325a8e62a7c76479c5f2f7ddcba16877e8
Gitweb:        https://git.kernel.org/tip/02a61f325a8e62a7c76479c5f2f7ddcba16877e8
Author:        Shrikanth Hegde <sshegde@linux.ibm.com>
AuthorDate:    Fri, 08 Mar 2024 11:58:53 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 12 Mar 2024 11:03:40 +01:00

sched/balancing: Remove reliance on 'enum cpu_idle_type' ordering when iterating [CPU_MAX_IDLE_TYPES] arrays in show_schedstat()

show_schedstat() output breaks and doesn't print all entries
if the ordering of the definitions in 'enum cpu_idle_type' is changed,
because show_schedstat() assumes that 'CPU_IDLE' is 0.

Fix it before we change the definition order & values.

[ mingo: Added changelog. ]

Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20240308105901.1096078-3-mingo@kernel.org
---
 kernel/sched/stats.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
index 857f837..8527795 100644
--- a/kernel/sched/stats.c
+++ b/kernel/sched/stats.c
@@ -150,8 +150,7 @@ static int show_schedstat(struct seq_file *seq, void *v)
 
 			seq_printf(seq, "domain%d %*pb", dcount++,
 				   cpumask_pr_args(sched_domain_span(sd)));
-			for (itype = CPU_IDLE; itype < CPU_MAX_IDLE_TYPES;
-					itype++) {
+			for (itype = 0; itype < CPU_MAX_IDLE_TYPES; itype++) {
 				seq_printf(seq, " %u %u %u %u %u %u %u %u",
 				    sd->lb_count[itype],
 				    sd->lb_balanced[itype],

^ permalink raw reply related	[flat|nested] 35+ messages in thread
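
The underlying C idiom is general: an array sized by a trailing
CPU_MAX_IDLE_TYPES enumerator should be walked from index 0, not from a named
enumerator whose value may move. A small standalone sketch of the idiom; the
enum uses the reordered layout introduced by the cpu_idle_type patch in this
series, and the lb_count[] values are made up for illustration:

  #include <stdio.h>

  enum cpu_idle_type {
          __CPU_NOT_IDLE = 0,
          CPU_IDLE,
          CPU_NEWLY_IDLE,
          CPU_MAX_IDLE_TYPES
  };

  int main(void)
  {
          /* Hypothetical per-idle-type counters, like sd->lb_count[]. */
          unsigned int lb_count[CPU_MAX_IDLE_TYPES] = { 3, 5, 7 };
          int itype;

          /*
           * Starting at 0 visits every enumerator no matter how the earlier
           * entries are ordered; starting at CPU_IDLE would now silently
           * skip __CPU_NOT_IDLE, which is the breakage being fixed above.
           */
          for (itype = 0; itype < CPU_MAX_IDLE_TYPES; itype++)
                  printf("idle type %d: %u\n", itype, lb_count[itype]);

          return 0;
  }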

* [tip: sched/core] sched/balancing: Change 'enum cpu_idle_type' to have more natural definitions
  2024-03-08 10:58 ` [PATCH 03/10] sched/balancing: Change 'enum cpu_idle_type' to have more natural definitions Ingo Molnar
@ 2024-03-12 12:00   ` tip-bot2 for Ingo Molnar
  0 siblings, 0 replies; 35+ messages in thread
From: tip-bot2 for Ingo Molnar @ 2024-03-12 12:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ingo Molnar, Valentin Schneider, Vincent Guittot,
	Gautham R. Shenoy, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     38d707c54df4ca58cd9ceae2ddcbd6f606b99e9f
Gitweb:        https://git.kernel.org/tip/38d707c54df4ca58cd9ceae2ddcbd6f606b99e9f
Author:        Ingo Molnar <mingo@kernel.org>
AuthorDate:    Fri, 08 Mar 2024 11:58:54 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 12 Mar 2024 11:03:40 +01:00

sched/balancing: Change 'enum cpu_idle_type' to have more natural definitions

The cpu_idle_type enum has the confusingly inverted property
that 'not idle' is 1, and 'idle' is '0'.

This resulted in a number of unnecessary complications in the code.

Reverse the order, remove the CPU_NOT_IDLE type, and convert
all code to a natural boolean form.

It's much more readable:

  -       enum cpu_idle_type idle = this_rq->idle_balance ?
  -                                               CPU_IDLE : CPU_NOT_IDLE;
  -
  +       enum cpu_idle_type idle = this_rq->idle_balance;

  --------------------------------

  -       if (env->idle == CPU_NOT_IDLE || !busiest->sum_nr_running)
  +       if (!env->idle || !busiest->sum_nr_running)

  --------------------------------

And gets rid of the double negation in these usages:

  -               if (env->idle != CPU_NOT_IDLE && env->src_rq->nr_running <= 1)
  +               if (env->idle && env->src_rq->nr_running <= 1)

Furthermore, this makes code much more obvious where there's
differentiation between CPU_IDLE and CPU_NEWLY_IDLE.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: "Gautham R. Shenoy" <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20240308105901.1096078-4-mingo@kernel.org
---
 include/linux/sched/idle.h |  2 +-
 kernel/sched/fair.c        | 27 ++++++++++++---------------
 2 files changed, 13 insertions(+), 16 deletions(-)

diff --git a/include/linux/sched/idle.h b/include/linux/sched/idle.h
index 478084f..e670ac2 100644
--- a/include/linux/sched/idle.h
+++ b/include/linux/sched/idle.h
@@ -5,8 +5,8 @@
 #include <linux/sched.h>
 
 enum cpu_idle_type {
+	__CPU_NOT_IDLE = 0,
 	CPU_IDLE,
-	CPU_NOT_IDLE,
 	CPU_NEWLY_IDLE,
 	CPU_MAX_IDLE_TYPES
 };
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2ef89b3..3a510cf 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9070,7 +9070,7 @@ static int detach_tasks(struct lb_env *env)
 		 * We don't want to steal all, otherwise we may be treated likewise,
 		 * which could at worst lead to a livelock crash.
 		 */
-		if (env->idle != CPU_NOT_IDLE && env->src_rq->nr_running <= 1)
+		if (env->idle && env->src_rq->nr_running <= 1)
 			break;
 
 		env->loop++;
@@ -9803,7 +9803,7 @@ static inline bool smt_vs_nonsmt_groups(struct sched_group *sg1,
 static inline bool smt_balance(struct lb_env *env, struct sg_lb_stats *sgs,
 			       struct sched_group *group)
 {
-	if (env->idle == CPU_NOT_IDLE)
+	if (!env->idle)
 		return false;
 
 	/*
@@ -9827,7 +9827,7 @@ static inline long sibling_imbalance(struct lb_env *env,
 	int ncores_busiest, ncores_local;
 	long imbalance;
 
-	if (env->idle == CPU_NOT_IDLE || !busiest->sum_nr_running)
+	if (!env->idle || !busiest->sum_nr_running)
 		return 0;
 
 	ncores_busiest = sds->busiest->cores;
@@ -9927,8 +9927,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 				sgs->group_misfit_task_load = rq->misfit_task_load;
 				*sg_status |= SG_OVERLOAD;
 			}
-		} else if ((env->idle != CPU_NOT_IDLE) &&
-			   sched_reduced_capacity(rq, env->sd)) {
+		} else if (env->idle && sched_reduced_capacity(rq, env->sd)) {
 			/* Check for a task running on a CPU with reduced capacity */
 			if (sgs->group_misfit_task_load < load)
 				sgs->group_misfit_task_load = load;
@@ -9940,7 +9939,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 	sgs->group_weight = group->group_weight;
 
 	/* Check if dst CPU is idle and preferred to this group */
-	if (!local_group && env->idle != CPU_NOT_IDLE && sgs->sum_h_nr_running &&
+	if (!local_group && env->idle && sgs->sum_h_nr_running &&
 	    sched_group_asym(env, sgs, group))
 		sgs->group_asym_packing = 1;
 
@@ -10698,7 +10697,7 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 			 * waiting task in this overloaded busiest group. Let's
 			 * try to pull it.
 			 */
-			if (env->idle != CPU_NOT_IDLE && env->imbalance == 0) {
+			if (env->idle && env->imbalance == 0) {
 				env->migration_type = migrate_task;
 				env->imbalance = 1;
 			}
@@ -10913,7 +10912,7 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 		goto force_balance;
 
 	if (busiest->group_type != group_overloaded) {
-		if (env->idle == CPU_NOT_IDLE) {
+		if (!env->idle) {
 			/*
 			 * If the busiest group is not overloaded (and as a
 			 * result the local one too) but this CPU is already
@@ -11121,7 +11120,7 @@ asym_active_balance(struct lb_env *env)
 	 * the lower priority @env::dst_cpu help it. Do not follow
 	 * CPU priority.
 	 */
-	return env->idle != CPU_NOT_IDLE && sched_use_asym_prio(env->sd, env->dst_cpu) &&
+	return env->idle && sched_use_asym_prio(env->sd, env->dst_cpu) &&
 	       (sched_asym_prefer(env->dst_cpu, env->src_cpu) ||
 		!sched_use_asym_prio(env->sd, env->src_cpu));
 }
@@ -11159,7 +11158,7 @@ static int need_active_balance(struct lb_env *env)
 	 * because of other sched_class or IRQs if more capacity stays
 	 * available on dst_cpu.
 	 */
-	if ((env->idle != CPU_NOT_IDLE) &&
+	if (env->idle &&
 	    (env->src_rq->cfs.h_nr_running == 1)) {
 		if ((check_cpu_capacity(env->src_rq, sd)) &&
 		    (capacity_of(env->src_cpu)*sd->imbalance_pct < capacity_of(env->dst_cpu)*100))
@@ -11735,8 +11734,8 @@ static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle)
 				 * env->dst_cpu, so we can't know our idle
 				 * state even if we migrated tasks. Update it.
 				 */
-				idle = idle_cpu(cpu) ? CPU_IDLE : CPU_NOT_IDLE;
-				busy = idle != CPU_IDLE && !sched_idle_cpu(cpu);
+				idle = idle_cpu(cpu);
+				busy = !idle && !sched_idle_cpu(cpu);
 			}
 			sd->last_balance = jiffies;
 			interval = get_sd_balance_interval(sd, busy);
@@ -12416,9 +12415,7 @@ out:
 static __latent_entropy void run_rebalance_domains(struct softirq_action *h)
 {
 	struct rq *this_rq = this_rq();
-	enum cpu_idle_type idle = this_rq->idle_balance ?
-						CPU_IDLE : CPU_NOT_IDLE;
-
+	enum cpu_idle_type idle = this_rq->idle_balance;
 	/*
 	 * If this CPU has a pending nohz_balance_kick, then do the
 	 * balancing on behalf of the other idle CPUs whose ticks are

^ permalink raw reply related	[flat|nested] 35+ messages in thread
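
The boolean usage works because __CPU_NOT_IDLE is now the zero value: any idle
state is truthy, and a 0/1 rq->idle_balance flag maps directly onto the first
two enumerators. A compilable sketch of that property; the lb_env_stub type
and describe() helper are stand-ins for illustration, not scheduler code:

  #include <stdio.h>
  #include <stdbool.h>

  enum cpu_idle_type {
          __CPU_NOT_IDLE = 0,
          CPU_IDLE,
          CPU_NEWLY_IDLE,
          CPU_MAX_IDLE_TYPES
  };

  struct lb_env_stub {
          enum cpu_idle_type idle;
  };

  static const char *describe(const struct lb_env_stub *env)
  {
          if (!env->idle)                         /* replaces env->idle == CPU_NOT_IDLE */
                  return "busy CPU: be conservative";
          if (env->idle == CPU_NEWLY_IDLE)        /* explicit check still available */
                  return "newly idle CPU: pull aggressively";
          return "idle CPU: periodic balance";
  }

  int main(void)
  {
          bool idle_balance = true;               /* as read from rq->idle_balance */
          struct lb_env_stub env = { .idle = idle_balance };

          printf("%s\n", describe(&env));
          return 0;
  }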

* [tip: sched/core] sched/balancing: Switch the 'DEFINE_SPINLOCK(balancing)' spinlock into an 'atomic_t sched_balance_running' flag
  2024-03-08 10:58 ` [PATCH 01/10] sched/balancing: Switch the 'DEFINE_SPINLOCK(balancing)' spinlock into an 'atomic_t sched_balance_running' flag Ingo Molnar
@ 2024-03-12 12:00   ` tip-bot2 for Ingo Molnar
  2024-03-15  9:25   ` [PATCH 01/10] " Shrikanth Hegde
  1 sibling, 0 replies; 35+ messages in thread
From: tip-bot2 for Ingo Molnar @ 2024-03-12 12:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ingo Molnar, Valentin Schneider, Shrikanth Hegde, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     214c1b7f13954559cf09d5d04b934bf32ba4d618
Gitweb:        https://git.kernel.org/tip/214c1b7f13954559cf09d5d04b934bf32ba4d618
Author:        Ingo Molnar <mingo@kernel.org>
AuthorDate:    Fri, 08 Mar 2024 11:58:52 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 12 Mar 2024 11:03:39 +01:00

sched/balancing: Switch the 'DEFINE_SPINLOCK(balancing)' spinlock into an 'atomic_t sched_balance_running' flag

The 'balancing' spinlock added in:

  08c183f31bdb ("[PATCH] sched: add option to serialize load balancing")

... is taken when the SD_SERIALIZE flag is set in a domain, but in reality it
is a glorified global atomic flag serializing the load-balancing of
those domains.

It doesn't have any explicit locking semantics per se: we just
spin_trylock() it.

Turn it into a ... global atomic flag. This makes it more
clear what is going on here, and reduces overhead and code
size a bit:

  # kernel/sched/fair.o: [x86-64 defconfig]

     text	   data	    bss	    dec	    hex	filename
    60730	   2721	    104	  63555	   f843	fair.o.before
    60718	   2721	    104	  63543	   f837	fair.o.after

Also document the flag a bit.

No change in functionality intended.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Cc: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://lore.kernel.org/r/20240308105901.1096078-2-mingo@kernel.org
---
 kernel/sched/fair.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6a16129..2ef89b3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11633,7 +11633,20 @@ out_unlock:
 	return 0;
 }
 
-static DEFINE_SPINLOCK(balancing);
+/*
+ * This flag serializes load-balancing passes over large domains
+ * (above the NODE topology level) - only one load-balancing instance
+ * may run at a time, to reduce overhead on very large systems with
+ * lots of CPUs and large NUMA distances.
+ *
+ * - Note that load-balancing passes triggered while another one
+ *   is executing are skipped and not re-tried.
+ *
+ * - Also note that this does not serialize rebalance_domains()
+ *   execution, as non-SD_SERIALIZE domains will still be
+ *   load-balanced in parallel.
+ */
+static atomic_t sched_balance_running = ATOMIC_INIT(0);
 
 /*
  * Scale the max load_balance interval with the number of CPUs in the system.
@@ -11711,7 +11724,7 @@ static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle)
 
 		need_serialize = sd->flags & SD_SERIALIZE;
 		if (need_serialize) {
-			if (!spin_trylock(&balancing))
+			if (atomic_cmpxchg_acquire(&sched_balance_running, 0, 1))
 				goto out;
 		}
 
@@ -11729,7 +11742,7 @@ static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle)
 			interval = get_sd_balance_interval(sd, busy);
 		}
 		if (need_serialize)
-			spin_unlock(&balancing);
+			atomic_set_release(&sched_balance_running, 0);
 out:
 		if (time_after(next_balance, sd->last_balance + interval)) {
 			next_balance = sd->last_balance + interval;

^ permalink raw reply related	[flat|nested] 35+ messages in thread
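
The atomic_cmpxchg_acquire()/atomic_set_release() pair is effectively a
non-blocking try-lock on a single flag: whoever loses the cmpxchg simply skips
the pass instead of spinning. A minimal user-space analogue using C11
<stdatomic.h> and pthreads; the serialized_balance_pass() name and the thread
bodies are illustrative, not scheduler code:

  #include <stdatomic.h>
  #include <stdio.h>
  #include <pthread.h>

  /* 0 = no serialized balancing pass in flight, 1 = one is running. */
  static atomic_int balance_running;

  static void serialized_balance_pass(long cpu)
  {
          int expected = 0;

          /* Try-lock: if another pass holds the flag, skip and do not retry. */
          if (!atomic_compare_exchange_strong_explicit(&balance_running,
                                                       &expected, 1,
                                                       memory_order_acquire,
                                                       memory_order_relaxed)) {
                  printf("cpu%ld: another pass is running, skipping\n", cpu);
                  return;
          }

          printf("cpu%ld: running the serialized balancing pass\n", cpu);

          /* Release: publish our updates and let the next caller in. */
          atomic_store_explicit(&balance_running, 0, memory_order_release);
  }

  static void *cpu_thread(void *arg)
  {
          serialized_balance_pass((long)arg);
          return NULL;
  }

  int main(void)
  {
          pthread_t t[4];
          long i;

          for (i = 0; i < 4; i++)
                  pthread_create(&t[i], NULL, cpu_thread, (void *)i);
          for (i = 0; i < 4; i++)
                  pthread_join(t[i], NULL);
          return 0;
  }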

* Re: [PATCH 04/10] sched/debug: Increase SCHEDSTAT_VERSION to 16
  2024-03-12 10:01     ` Ingo Molnar
@ 2024-03-14  5:21       ` Gautham R. Shenoy
  2024-03-14  6:04         ` Swapnil Sapkal
  0 siblings, 1 reply; 35+ messages in thread
From: Gautham R. Shenoy @ 2024-03-14  5:21 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Shrikanth Hegde, linux-kernel, Dietmar Eggemann, Linus Torvalds,
	Peter Zijlstra, Valentin Schneider, Vincent Guittot,
	Swapnil Sapkal

Hello Ingo, Shrikanth,

On Tue, Mar 12, 2024 at 11:01:54AM +0100, Ingo Molnar wrote:
> 
> * Shrikanth Hegde <sshegde@linux.ibm.com> wrote:
> 
> > 
> > 
> > On 3/8/24 4:28 PM, Ingo Molnar wrote:
> > > We changed the order of definitions within 'enum cpu_idle_type',
> > > which changed the order of [CPU_MAX_IDLE_TYPES] columns in
> > > show_schedstat().
> > > 
> > 
> > > +++ b/kernel/sched/stats.c
> > > @@ -113,7 +113,7 @@ void __update_stats_enqueue_sleeper(struct rq *rq, struct task_struct *p,
> > >   * Bump this up when changing the output format or the meaning of an existing
> > >   * format, so that tools can adapt (or abort)
> > >   */
> > > -#define SCHEDSTAT_VERSION 15
> > > +#define SCHEDSTAT_VERSION 16
> > 
> > Please add the info about version, and change of the order 
> > briefly in Documentation/scheduler/sched-stats.rst as well.
> 
> Good point, I've added this paragraph to sched-stats.rst:
> 
>  +Version 16 of schedstats changed the order of definitions within
>  +'enum cpu_idle_type', which changed the order of [CPU_MAX_IDLE_TYPES]
>  +columns in show_schedstat(). In particular the position of CPU_IDLE
>  +and __CPU_NOT_IDLE changed places. The size of the array is unchanged.

Thanks for this change!

Since we are considering bumping up the version for this change, it
might be good to get rid of "lb_imbalance" which is currently a sum of
all the different kinds of imbalances (due to load, utilization,
number of tasks, misfit tasks). This is not useful.

Swapnil had a patch to print the different imbalances, which will
change the number of fields. Can we include this change in v16 as
well?

Swapnil, could you please rebase your patch on tip/sched/core and post
it ?

> 
> > One recent user that I recollect is sched scoreboard. The version number should
> > be able to help the users when it breaks their scripts.
> >

> > +Gautham, Any thoughts?

Yes, the scoreboard looks at the version. If it doesn't understand a
version, it will not parse it.  It can be extended to understand the
new version, since that's only about adding a new json.

> 
> If it's really a problem, I suspect we could maintain the v15 order of 
> output.

I don't think moving to v16 is an issue.

> 
> Thanks,
> 
> 	Ingo

--
Thanks and Regards
gautham.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 04/10] sched/debug: Increase SCHEDSTAT_VERSION to 16
  2024-03-14  5:21       ` Gautham R. Shenoy
@ 2024-03-14  6:04         ` Swapnil Sapkal
  2024-03-15  4:37           ` Shrikanth Hegde
  0 siblings, 1 reply; 35+ messages in thread
From: Swapnil Sapkal @ 2024-03-14  6:04 UTC (permalink / raw)
  To: Gautham R. Shenoy, Ingo Molnar
  Cc: Shrikanth Hegde, linux-kernel, Dietmar Eggemann, Linus Torvalds,
	Peter Zijlstra, Valentin Schneider, Vincent Guittot



On 3/14/2024 10:51 AM, Gautham R. Shenoy wrote:
> Hello Ingo, Shrikanth,
> 
> On Tue, Mar 12, 2024 at 11:01:54AM +0100, Ingo Molnar wrote:
>>
>> * Shrikanth Hegde <sshegde@linux.ibm.com> wrote:
>>
>>>
>>>
>>> On 3/8/24 4:28 PM, Ingo Molnar wrote:
>>>> We changed the order of definitions within 'enum cpu_idle_type',
>>>> which changed the order of [CPU_MAX_IDLE_TYPES] columns in
>>>> show_schedstat().
>>>>
>>>
>>>> +++ b/kernel/sched/stats.c
>>>> @@ -113,7 +113,7 @@ void __update_stats_enqueue_sleeper(struct rq *rq, struct task_struct *p,
>>>>    * Bump this up when changing the output format or the meaning of an existing
>>>>    * format, so that tools can adapt (or abort)
>>>>    */
>>>> -#define SCHEDSTAT_VERSION 15
>>>> +#define SCHEDSTAT_VERSION 16
>>>
>>> Please add the info about version, and change of the order
>>> briefly in Documentation/scheduler/sched-stats.rst as well.
>>
>> Good point, I've added this paragraph to sched-stats.rst:
>>
>>   +Version 16 of schedstats changed the order of definitions within
>>   +'enum cpu_idle_type', which changed the order of [CPU_MAX_IDLE_TYPES]
>>   +columns in show_schedstat(). In particular the position of CPU_IDLE
>>   +and __CPU_NOT_IDLE changed places. The size of the array is unchanged.
> 
> Thanks for this change!
> 
> Since we are considering bumping up the version for this change, it
> might be good to get rid of "lb_imbalance" which is currently a sum of
> all the different kinds of imbalances (due to load, utilization,
> number of tasks, misfit tasks). This is not useful.
> 
> Swapnil had a patch to print the different imbalances, which will
> change the number of fields. Can we include this change into v16 as
> well?
> 
> Swapnil, could you please rebase your patch on tip/sched/core and post
> it ?
Please find the patch below. This is based on tip/sched/core.

---
 From deeed5bf937bddf227deb1cdb9e2e6c164c48053 Mon Sep 17 00:00:00 2001
From: Swapnil Sapkal <swapnil.sapkal@amd.com>
Date: Thu, 15 Jun 2023 04:55:09 +0000
Subject: [PATCH] sched: Report the different kinds of imbalances in /proc/schedstat

In /proc/schedstat, lb_imbalance reports the sum of imbalances
discovered in sched domains with each call to sched_balance_rq(), which is
not very useful because lb_imbalance is an aggregate of the imbalances
due to load, utilization, nr_tasks and misfit_tasks. Remove this field
from /proc/schedstat.

Currently there are no fields in /proc/schedstat to report different types
of imbalances. Introduce new fields in /proc/schedstat to report the
total imbalances in load, utilization, nr_tasks or misfit_tasks.

Added fields to /proc/schedstat:
  	- lb_imbalance_load: Total imbalance due to load.
	- lb_imbalance_util: Total imbalance due to utilization.
	- lb_imbalance_task: Total imbalance due to number of tasks.
	- lb_imbalance_misfit: Total imbalance due to misfit tasks.

Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
---
  Documentation/scheduler/sched-stats.rst | 104 +++++++++++++-----------
  include/linux/sched/topology.h          |   5 +-
  kernel/sched/fair.c                     |  15 +++-
  kernel/sched/stats.c                    |   7 +-
  4 files changed, 80 insertions(+), 51 deletions(-)

diff --git a/Documentation/scheduler/sched-stats.rst b/Documentation/scheduler/sched-stats.rst
index 7c2b16c4729d..d6e9a8a5619c 100644
--- a/Documentation/scheduler/sched-stats.rst
+++ b/Documentation/scheduler/sched-stats.rst
@@ -6,6 +6,9 @@ Version 16 of schedstats changed the order of definitions within
  'enum cpu_idle_type', which changed the order of [CPU_MAX_IDLE_TYPES]
  columns in show_schedstat(). In particular the position of CPU_IDLE
  and __CPU_NOT_IDLE changed places. The size of the array is unchanged.
+Also stop reporting 'lb_imbalance' as it has no significance anymore
+and instead add more relevant fields namely 'lb_imbalance_load',
+'lb_imbalance_util', 'lb_imbalance_task' and 'lb_imbalance_misfit'.
  
  Version 15 of schedstats dropped counters for some sched_yield:
  yld_exp_empty, yld_act_empty and yld_both_empty. Otherwise, it is
@@ -73,86 +76,93 @@ One of these is produced per domain for each cpu described. (Note that if
  CONFIG_SMP is not defined, *no* domains are utilized and these lines
  will not appear in the output.)
  
-domain<N> <cpumask> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
+domain<N> <cpumask> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
  
  The first field is a bit mask indicating what cpus this domain operates over.
  
-The next 24 are a variety of sched_balance_rq() statistics in grouped into types
+The next 33 are a variety of sched_balance_rq() statistics in grouped into types
  of idleness (idle, busy, and newly idle):
  
      1)  # of times in this domain sched_balance_rq() was called when the
+        cpu was busy
+    2)  # of times in this domain sched_balance_rq() checked but found the
+        load did not require balancing when busy
+    3)  # of times in this domain sched_balance_rq() tried to move one or
+        more tasks and failed, when the cpu was busy
+    4)  Total imbalance in load when the cpu was busy
+    5)  Total imbalance in utilization when the cpu was busy
+    6)  Total imbalance in number of tasks when the cpu was busy
+    7)  Total imbalance due to misfit tasks when the cpu was busy
+    8)  # of times in this domain pull_task() was called when busy
+    9)  # of times in this domain pull_task() was called even though the
+        target task was cache-hot when busy
+    10) # of times in this domain sched_balance_rq() was called but did not
+        find a busier queue while the cpu was busy
+    11) # of times in this domain a busier queue was found while the cpu
+        was busy but no busier group was found
+
+    12) # of times in this domain sched_balance_rq() was called when the
          cpu was idle
-    2)  # of times in this domain sched_balance_rq() checked but found
+    13) # of times in this domain sched_balance_rq() checked but found
          the load did not require balancing when the cpu was idle
-    3)  # of times in this domain sched_balance_rq() tried to move one or
+    14) # of times in this domain sched_balance_rq() tried to move one or
          more tasks and failed, when the cpu was idle
-    4)  sum of imbalances discovered (if any) with each call to
-        sched_balance_rq() in this domain when the cpu was idle
-    5)  # of times in this domain pull_task() was called when the cpu
+    15) Total imbalance in load when the cpu was idle
+    16) Total imbalance in utilization when the cpu was idle
+    17) Total imbalance in number of tasks when the cpu was idle
+    18) Total imbalance due to misfit tasks when the cpu was idle
+    19) # of times in this domain pull_task() was called when the cpu
          was idle
-    6)  # of times in this domain pull_task() was called even though
+    20) # of times in this domain pull_task() was called even though
          the target task was cache-hot when idle
-    7)  # of times in this domain sched_balance_rq() was called but did
+    21) # of times in this domain sched_balance_rq() was called but did
          not find a busier queue while the cpu was idle
-    8)  # of times in this domain a busier queue was found while the
+    22) # of times in this domain a busier queue was found while the
          cpu was idle but no busier group was found
-    9)  # of times in this domain sched_balance_rq() was called when the
-        cpu was busy
-    10) # of times in this domain sched_balance_rq() checked but found the
-        load did not require balancing when busy
-    11) # of times in this domain sched_balance_rq() tried to move one or
-        more tasks and failed, when the cpu was busy
-    12) sum of imbalances discovered (if any) with each call to
-        sched_balance_rq() in this domain when the cpu was busy
-    13) # of times in this domain pull_task() was called when busy
-    14) # of times in this domain pull_task() was called even though the
-        target task was cache-hot when busy
-    15) # of times in this domain sched_balance_rq() was called but did not
-        find a busier queue while the cpu was busy
-    16) # of times in this domain a busier queue was found while the cpu
-        was busy but no busier group was found
  
-    17) # of times in this domain sched_balance_rq() was called when the
-        cpu was just becoming idle
-    18) # of times in this domain sched_balance_rq() checked but found the
+    23) # of times in this domain sched_balance_rq() was called when the
+        cpu was just becoming idle
+    24) # of times in this domain sched_balance_rq() checked but found the
          load did not require balancing when the cpu was just becoming idle
-    19) # of times in this domain sched_balance_rq() tried to move one or more
+    25) # of times in this domain sched_balance_rq() tried to move one or more
          tasks and failed, when the cpu was just becoming idle
-    20) sum of imbalances discovered (if any) with each call to
-        sched_balance_rq() in this domain when the cpu was just becoming idle
-    21) # of times in this domain pull_task() was called when newly idle
-    22) # of times in this domain pull_task() was called even though the
+    26) Total imbalance in load when the cpu was just becoming idle
+    27) Total imbalance in utilization when the cpu was just becoming idle
+    28) Total imbalance in number of tasks when the cpu was just becoming idle
+    29) Total imbalance due to misfit tasks when the cpu was just becoming idle
+    30) # of times in this domain pull_task() was called when newly idle
+    31) # of times in this domain pull_task() was called even though the
          target task was cache-hot when just becoming idle
-    23) # of times in this domain sched_balance_rq() was called but did not
+    32) # of times in this domain sched_balance_rq() was called but did not
          find a busier queue while the cpu was just becoming idle
-    24) # of times in this domain a busier queue was found while the cpu
+    33) # of times in this domain a busier queue was found while the cpu
          was just becoming idle but no busier group was found
  
     Next three are active_load_balance() statistics:
  
-    25) # of times active_load_balance() was called
-    26) # of times active_load_balance() tried to move a task and failed
-    27) # of times active_load_balance() successfully moved a task
+    34) # of times active_load_balance() was called
+    35) # of times active_load_balance() tried to move a task and failed
+    36) # of times active_load_balance() successfully moved a task
  
     Next three are sched_balance_exec() statistics:
  
-    28) sbe_cnt is not used
-    29) sbe_balanced is not used
-    30) sbe_pushed is not used
+    37) sbe_cnt is not used
+    38) sbe_balanced is not used
+    39) sbe_pushed is not used
  
     Next three are sched_balance_fork() statistics:
  
-    31) sbf_cnt is not used
-    32) sbf_balanced is not used
-    33) sbf_pushed is not used
+    40) sbf_cnt is not used
+    41) sbf_balanced is not used
+    42) sbf_pushed is not used
  
     Next three are try_to_wake_up() statistics:
  
-    34) # of times in this domain try_to_wake_up() awoke a task that
+    43) # of times in this domain try_to_wake_up() awoke a task that
          last ran on a different cpu in this domain
-    35) # of times in this domain try_to_wake_up() moved a task to the
+    44) # of times in this domain try_to_wake_up() moved a task to the
          waking cpu because it was cache-cold on its own cpu anyway
-    36) # of times in this domain try_to_wake_up() started passive balancing
+    45) # of times in this domain try_to_wake_up() started passive balancing
  
  /proc/<pid>/schedstat
  ---------------------
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index c8fe9bab981b..15685c40a713 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -114,7 +114,10 @@ struct sched_domain {
  	unsigned int lb_count[CPU_MAX_IDLE_TYPES];
  	unsigned int lb_failed[CPU_MAX_IDLE_TYPES];
  	unsigned int lb_balanced[CPU_MAX_IDLE_TYPES];
-	unsigned int lb_imbalance[CPU_MAX_IDLE_TYPES];
+	unsigned int lb_imbalance_load[CPU_MAX_IDLE_TYPES];
+	unsigned int lb_imbalance_util[CPU_MAX_IDLE_TYPES];
+	unsigned int lb_imbalance_task[CPU_MAX_IDLE_TYPES];
+	unsigned int lb_imbalance_misfit[CPU_MAX_IDLE_TYPES];
  	unsigned int lb_gained[CPU_MAX_IDLE_TYPES];
  	unsigned int lb_hot_gained[CPU_MAX_IDLE_TYPES];
  	unsigned int lb_nobusyg[CPU_MAX_IDLE_TYPES];
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a19ea290b790..515258f97ba3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11288,7 +11288,20 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq,
  
  	WARN_ON_ONCE(busiest == env.dst_rq);
  
-	schedstat_add(sd->lb_imbalance[idle], env.imbalance);
+	switch (env.migration_type) {
+	case migrate_load:
+		schedstat_add(sd->lb_imbalance_load[idle], env.imbalance);
+		break;
+	case migrate_util:
+		schedstat_add(sd->lb_imbalance_util[idle], env.imbalance);
+		break;
+	case migrate_task:
+		schedstat_add(sd->lb_imbalance_task[idle], env.imbalance);
+		break;
+	case migrate_misfit:
+		schedstat_add(sd->lb_imbalance_misfit[idle], env.imbalance);
+		break;
+	}
  
  	env.src_cpu = busiest->cpu;
  	env.src_rq = busiest;
diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
index 78e48f5426ee..a02bc9db2f1c 100644
--- a/kernel/sched/stats.c
+++ b/kernel/sched/stats.c
@@ -151,11 +151,14 @@ static int show_schedstat(struct seq_file *seq, void *v)
  			seq_printf(seq, "domain%d %*pb", dcount++,
  				   cpumask_pr_args(sched_domain_span(sd)));
  			for (itype = 0; itype < CPU_MAX_IDLE_TYPES; itype++) {
-				seq_printf(seq, " %u %u %u %u %u %u %u %u",
+				seq_printf(seq, " %u %u %u %u %u %u %u %u %u %u %u",
  				    sd->lb_count[itype],
  				    sd->lb_balanced[itype],
  				    sd->lb_failed[itype],
-				    sd->lb_imbalance[itype],
+				    sd->lb_imbalance_load[itype],
+				    sd->lb_imbalance_util[itype],
+				    sd->lb_imbalance_task[itype],
+				    sd->lb_imbalance_misfit[itype],
  				    sd->lb_gained[itype],
  				    sd->lb_hot_gained[itype],
  				    sd->lb_nobusyq[itype],
-- 
2.34.1


> 
>>
> >>> One recent user that I recollect is sched scoreboard. The version number should
> >>> be able to help the users when it breaks their scripts.
>>>
> 
>>> +Gautham, Any thoughts?
> 
> Yes, the scoreboard looks at the version. If it doesn't understand a
> version, it will not parse it.  It can be extended to understand the
> new version, since that's only about adding a new json.
> 
>>
>> If it's really a problem, I suspect we could maintain the v15 order of
>> output.
> 
> I don't think moving to v16 is an issue.
> 
>>
>> Thanks,
>>
>> 	Ingo
> 
> --
> Thanks and Regards
> gautham.

--
Thanks and Regards,
Swapnil

^ permalink raw reply related	[flat|nested] 35+ messages in thread
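
On the consumer side, the proposed layout turns each idle-type group into 11
fields, with the four imbalance counters at positions 4-7 of each group (busy,
idle and newly-idle groups in that order). A rough user-space reader against
that layout; note this targets the unmerged patch above, so the field order is
an assumption that may still change:

  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
          static const char *group_names[] = { "busy", "idle", "newly idle" };
          char line[4096];
          FILE *fp = fopen("/proc/schedstat", "r");

          if (!fp) {
                  perror("fopen /proc/schedstat");
                  return 1;
          }

          while (fgets(line, sizeof(line), fp)) {
                  char name[32], mask[128];
                  unsigned int f[33];
                  int pos = 0, n;
                  char *p;

                  /* Only "domainN <cpumask> ..." lines carry these counters. */
                  if (strncmp(line, "domain", 6) != 0)
                          continue;
                  if (sscanf(line, "%31s %127s%n", name, mask, &pos) != 2)
                          continue;

                  /* Read the 3 x 11 per-idle-type counters. */
                  p = line + pos;
                  for (n = 0; n < 33; n++) {
                          int used = 0;

                          if (sscanf(p, " %u%n", &f[n], &used) != 1)
                                  break;
                          p += used;
                  }
                  if (n < 33)
                          continue;

                  for (n = 0; n < 3; n++)
                          printf("%s %s: load=%u util=%u task=%u misfit=%u\n",
                                 name, group_names[n],
                                 f[n * 11 + 3], f[n * 11 + 4],
                                 f[n * 11 + 5], f[n * 11 + 6]);
          }

          fclose(fp);
          return 0;
  }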

* Re: [PATCH 04/10] sched/debug: Increase SCHEDSTAT_VERSION to 16
  2024-03-14  6:04         ` Swapnil Sapkal
@ 2024-03-15  4:37           ` Shrikanth Hegde
  2024-03-15  9:06             ` Swapnil Sapkal
  0 siblings, 1 reply; 35+ messages in thread
From: Shrikanth Hegde @ 2024-03-15  4:37 UTC (permalink / raw)
  To: Swapnil Sapkal
  Cc: linux-kernel, Dietmar Eggemann, Linus Torvalds, Peter Zijlstra,
	Valentin Schneider, Vincent Guittot, Gautham R. Shenoy,
	Ingo Molnar



On 3/14/24 11:34 AM, Swapnil Sapkal wrote:
> 
> 
> Please find the patch below. This is based on tip/sched/core.
> 
> ---
> From deeed5bf937bddf227deb1cdb9e2e6c164c48053 Mon Sep 17 00:00:00 2001
> From: Swapnil Sapkal <swapnil.sapkal@amd.com>
> Date: Thu, 15 Jun 2023 04:55:09 +0000
> Subject: [PATCH] sched: Report the different kinds of imbalances in
> /proc/schedstat
> 
> In /proc/schedstat, lb_imbalance reports the sum of imbalances
> discovered in sched domains with each call to sched_balance_rq(), which is
> not very useful because lb_imbalance is an aggregate of the imbalances
> due to load, utilization, nr_tasks and misfit_tasks. Remove this field
> from /proc/schedstat.
> 

Yes, this makes sense. Any one of them can skew the value.

> Currently there are no fields in /proc/schedstat to report different types
> of imbalances. Introduce new fields in /proc/schedstat to report the
> total imbalances in load, utilization, nr_tasks or misfit_tasks.
> 
> Added fields to /proc/schedstat:
>      - lb_imbalance_load: Total imbalance due to load.
>     - lb_imbalance_util: Total imbalance due to utilization.
>     - lb_imbalance_task: Total imbalance due to number of tasks.
>     - lb_imbalance_misfit: Total imbalance due to misfit tasks.
> 
> Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
> ---
>  Documentation/scheduler/sched-stats.rst | 104 +++++++++++++-----------
>  include/linux/sched/topology.h          |   5 +-
>  kernel/sched/fair.c                     |  15 +++-
>  kernel/sched/stats.c                    |   7 +-
>  4 files changed, 80 insertions(+), 51 deletions(-)
> 
> diff --git a/Documentation/scheduler/sched-stats.rst
> b/Documentation/scheduler/sched-stats.rst
> index 7c2b16c4729d..d6e9a8a5619c 100644
> --- a/Documentation/scheduler/sched-stats.rst
> +++ b/Documentation/scheduler/sched-stats.rst
> @@ -6,6 +6,9 @@ Version 16 of schedstats changed the order of
> definitions within
>  'enum cpu_idle_type', which changed the order of [CPU_MAX_IDLE_TYPES]
>  columns in show_schedstat(). In particular the position of CPU_IDLE
>  and __CPU_NOT_IDLE changed places. The size of the array is unchanged.
> +Also stop reporting 'lb_imbalance' as it has no significance anymore
> +and instead add more relevant fields namely 'lb_imbalance_load',
> +'lb_imbalance_util', 'lb_imbalance_task' and 'lb_imbalance_misfit'.
>  
>  Version 15 of schedstats dropped counters for some sched_yield:
>  yld_exp_empty, yld_act_empty and yld_both_empty. Otherwise, it is
> @@ -73,86 +76,93 @@ One of these is produced per domain for each cpu
> described. (Note that if
>  CONFIG_SMP is not defined, *no* domains are utilized and these lines
>  will not appear in the output.)
>  
> -domain<N> <cpumask> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
> 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
> +domain<N> <cpumask> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
> 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
>  
>  The first field is a bit mask indicating what cpus this domain operates
> over.
>  

IIUC, this is editing the content of Version 15, but the changes would happen in Version 16.
Instead, can we add a section for Version 16 and not modify the one for 15?

> -The next 24 are a variety of sched_balance_rq() statistics in grouped
> into types
> +The next 33 are a variety of sched_balance_rq() statistics in grouped
> into types
>  of idleness (idle, busy, and newly idle):
>  
>      1)  # of times in this domain sched_balance_rq() was called when the
> +        cpu was busy
> +    2)  # of times in this domain sched_balance_rq() checked but found the
> +        load did not require balancing when busy
> +    3)  # of times in this domain sched_balance_rq() tried to move one or
> +        more tasks and failed, when the cpu was busy
> +    4)  Total imbalance in load when the cpu was busy
> +    5)  Total imbalance in utilization when the cpu was busy
> +    6)  Total imbalance in number of tasks when the cpu was busy
> +    7)  Total imbalance due to misfit tasks when the cpu was busy
> +    8)  # of times in this domain pull_task() was called when busy
> +    9)  # of times in this domain pull_task() was called even though the
> +        target task was cache-hot when busy
> +    10) # of times in this domain sched_balance_rq() was called but did
> not
> +        find a busier queue while the cpu was busy
> +    11) # of times in this domain a busier queue was found while the cpu
> +        was busy but no busier group was found
> +
> +    12) # of times in this domain sched_balance_rq() was called when the
>          cpu was idle
> -    2)  # of times in this domain sched_balance_rq() checked but found
> +    13) # of times in this domain sched_balance_rq() checked but found
>          the load did not require balancing when the cpu was idle
> -    3)  # of times in this domain sched_balance_rq() tried to move one or
> +    14) # of times in this domain sched_balance_rq() tried to move one or
>          more tasks and failed, when the cpu was idle
> -    4)  sum of imbalances discovered (if any) with each call to
> -        sched_balance_rq() in this domain when the cpu was idle
> -    5)  # of times in this domain pull_task() was called when the cpu
> +    15) Total imbalance in load when the cpu was idle
> +    16) Total imbalance in utilization when the cpu was idle
> +    17) Total imbalance in number of tasks when the cpu was idle
> +    18) Total imbalance due to misfit tasks when the cpu was idle
> +    19) # of times in this domain pull_task() was called when the cpu
>          was idle
> -    6)  # of times in this domain pull_task() was called even though
> +    20) # of times in this domain pull_task() was called even though
>          the target task was cache-hot when idle
> -    7)  # of times in this domain sched_balance_rq() was called but did
> +    21) # of times in this domain sched_balance_rq() was called but did
>          not find a busier queue while the cpu was idle
> -    8)  # of times in this domain a busier queue was found while the
> +    22) # of times in this domain a busier queue was found while the
>          cpu was idle but no busier group was found
> -    9)  # of times in this domain sched_balance_rq() was called when the
> -        cpu was busy
> -    10) # of times in this domain sched_balance_rq() checked but found the
> -        load did not require balancing when busy
> -    11) # of times in this domain sched_balance_rq() tried to move one or
> -        more tasks and failed, when the cpu was busy
> -    12) sum of imbalances discovered (if any) with each call to
> -        sched_balance_rq() in this domain when the cpu was busy
> -    13) # of times in this domain pull_task() was called when busy
> -    14) # of times in this domain pull_task() was called even though the
> -        target task was cache-hot when busy
> -    15) # of times in this domain sched_balance_rq() was called but did
> not
> -        find a busier queue while the cpu was busy
> -    16) # of times in this domain a busier queue was found while the cpu
> -        was busy but no busier group was found
>  
> -    17) # of times in this domain sched_balance_rq() was called when the
> -        cpu was just becoming idle
> -    18) # of times in this domain sched_balance_rq() checked but found the
> +    23) # of times in this domain sched_balance_rq() was called when the
> +        was just becoming idle
> +    24) # of times in this domain sched_balance_rq() checked but found the
>          load did not require balancing when the cpu was just becoming idle
> -    19) # of times in this domain sched_balance_rq() tried to move one
> or more
> +    25) # of times in this domain sched_balance_rq() tried to move one
> or more
>          tasks and failed, when the cpu was just becoming idle
> -    20) sum of imbalances discovered (if any) with each call to
> -        sched_balance_rq() in this domain when the cpu was just
> becoming idle
> -    21) # of times in this domain pull_task() was called when newly idle
> -    22) # of times in this domain pull_task() was called even though the
> +    26) Total imbalance in load when the cpu was just becoming idle
> +    27) Total imbalance in utilization when the cpu was just becoming idle
> +    28) Total imbalance in number of tasks when the cpu was just
> becoming idle
> +    29) Total imbalance due to misfit tasks when the cpu was just
> becoming idle
> +    30) # of times in this domain pull_task() was called when newly idle
> +    31) # of times in this domain pull_task() was called even though the
>          target task was cache-hot when just becoming idle
> -    23) # of times in this domain sched_balance_rq() was called but did
> not
> +    32) # of times in this domain sched_balance_rq() was called but did
> not
>          find a busier queue while the cpu was just becoming idle
> -    24) # of times in this domain a busier queue was found while the cpu
> +    33) # of times in this domain a busier queue was found while the cpu
>          was just becoming idle but no busier group was found
>  
>     Next three are active_load_balance() statistics:
>  
> -    25) # of times active_load_balance() was called
> -    26) # of times active_load_balance() tried to move a task and failed
> -    27) # of times active_load_balance() successfully moved a task
> +    34) # of times active_load_balance() was called
> +    35) # of times active_load_balance() tried to move a task and failed
> +    36) # of times active_load_balance() successfully moved a task
>  
>     Next three are sched_balance_exec() statistics:
>  
> -    28) sbe_cnt is not used
> -    29) sbe_balanced is not used
> -    30) sbe_pushed is not used
> +    37) sbe_cnt is not used
> +    38) sbe_balanced is not used
> +    39) sbe_pushed is not used
>  
>     Next three are sched_balance_fork() statistics:
>  
> -    31) sbf_cnt is not used
> -    32) sbf_balanced is not used
> -    33) sbf_pushed is not used
> +    40) sbf_cnt is not used
> +    41) sbf_balanced is not used
> +    42) sbf_pushed is not used
>  
>     Next three are try_to_wake_up() statistics:
>  
> -    34) # of times in this domain try_to_wake_up() awoke a task that
> +    43) # of times in this domain try_to_wake_up() awoke a task that
>          last ran on a different cpu in this domain
> -    35) # of times in this domain try_to_wake_up() moved a task to the
> +    44) # of times in this domain try_to_wake_up() moved a task to the
>          waking cpu because it was cache-cold on its own cpu anyway
> -    36) # of times in this domain try_to_wake_up() started passive
> balancing
> +    45) # of times in this domain try_to_wake_up() started passive
> balancing
>  
>  /proc/<pid>/schedstat
>  ---------------------
> diff --git a/include/linux/sched/topology.h
> b/include/linux/sched/topology.h
> index c8fe9bab981b..15685c40a713 100644
> --- a/include/linux/sched/topology.h
> +++ b/include/linux/sched/topology.h
> @@ -114,7 +114,10 @@ struct sched_domain {
>      unsigned int lb_count[CPU_MAX_IDLE_TYPES];
>      unsigned int lb_failed[CPU_MAX_IDLE_TYPES];
>      unsigned int lb_balanced[CPU_MAX_IDLE_TYPES];
> -    unsigned int lb_imbalance[CPU_MAX_IDLE_TYPES];
> +    unsigned int lb_imbalance_load[CPU_MAX_IDLE_TYPES];
> +    unsigned int lb_imbalance_util[CPU_MAX_IDLE_TYPES];
> +    unsigned int lb_imbalance_task[CPU_MAX_IDLE_TYPES];
> +    unsigned int lb_imbalance_misfit[CPU_MAX_IDLE_TYPES];
>      unsigned int lb_gained[CPU_MAX_IDLE_TYPES];
>      unsigned int lb_hot_gained[CPU_MAX_IDLE_TYPES];
>      unsigned int lb_nobusyg[CPU_MAX_IDLE_TYPES];
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index a19ea290b790..515258f97ba3 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -11288,7 +11288,20 @@ static int sched_balance_rq(int this_cpu,
> struct rq *this_rq,
>  
>      WARN_ON_ONCE(busiest == env.dst_rq);
>  
> -    schedstat_add(sd->lb_imbalance[idle], env.imbalance);
> +    switch (env.migration_type) {
> +    case migrate_load:
> +        schedstat_add(sd->lb_imbalance_load[idle], env.imbalance);
> +        break;
> +    case migrate_util:
> +        schedstat_add(sd->lb_imbalance_util[idle], env.imbalance);
> +        break;
> +    case migrate_task:
> +        schedstat_add(sd->lb_imbalance_task[idle], env.imbalance);
> +        break;
> +    case migrate_misfit:
> +        schedstat_add(sd->lb_imbalance_misfit[idle], env.imbalance);
> +        break;
> +    }
>  

This switch statement could use a helper function. 
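Something along these lines, perhaps (only a sketch to illustrate the shape;
the helper name update_lb_imbalance_stat() is made up here, not taken from the
posted patch):

	static void
	update_lb_imbalance_stat(struct lb_env *env, struct sched_domain *sd,
				 enum cpu_idle_type idle)
	{
		/* Credit the imbalance to the metric that drove this balance attempt. */
		switch (env->migration_type) {
		case migrate_load:
			schedstat_add(sd->lb_imbalance_load[idle], env->imbalance);
			break;
		case migrate_util:
			schedstat_add(sd->lb_imbalance_util[idle], env->imbalance);
			break;
		case migrate_task:
			schedstat_add(sd->lb_imbalance_task[idle], env->imbalance);
			break;
		case migrate_misfit:
			schedstat_add(sd->lb_imbalance_misfit[idle], env->imbalance);
			break;
		}
	}

That way the call site in sched_balance_rq() shrinks to a single
update_lb_imbalance_stat(&env, sd, idle); line.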

>      env.src_cpu = busiest->cpu;
>      env.src_rq = busiest;
> diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
> index 78e48f5426ee..a02bc9db2f1c 100644
> --- a/kernel/sched/stats.c
> +++ b/kernel/sched/stats.c
> @@ -151,11 +151,14 @@ static int show_schedstat(struct seq_file *seq,
> void *v)
>              seq_printf(seq, "domain%d %*pb", dcount++,
>                     cpumask_pr_args(sched_domain_span(sd)));
>              for (itype = 0; itype < CPU_MAX_IDLE_TYPES; itype++) {
> -                seq_printf(seq, " %u %u %u %u %u %u %u %u",
> +                seq_printf(seq, " %u %u %u %u %u %u %u %u %u %u %u",
>                      sd->lb_count[itype],
>                      sd->lb_balanced[itype],
>                      sd->lb_failed[itype],
> -                    sd->lb_imbalance[itype],
> +                    sd->lb_imbalance_load[itype],
> +                    sd->lb_imbalance_util[itype],
> +                    sd->lb_imbalance_task[itype],
> +                    sd->lb_imbalance_misfit[itype],
>                      sd->lb_gained[itype],
>                      sd->lb_hot_gained[itype],
>                      sd->lb_nobusyq[itype],

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 04/10] sched/debug: Increase SCHEDSTAT_VERSION to 16
  2024-03-15  4:37           ` Shrikanth Hegde
@ 2024-03-15  9:06             ` Swapnil Sapkal
  2024-03-15  9:20               ` Shrikanth Hegde
  0 siblings, 1 reply; 35+ messages in thread
From: Swapnil Sapkal @ 2024-03-15  9:06 UTC (permalink / raw)
  To: Shrikanth Hegde
  Cc: linux-kernel, Dietmar Eggemann, Linus Torvalds, Peter Zijlstra,
	Valentin Schneider, Vincent Guittot, Gautham R. Shenoy,
	Ingo Molnar

Hello Shrikanth,

Thank you for reviewing.

On 3/15/2024 10:07 AM, Shrikanth Hegde wrote:
>
> On 3/14/24 11:34 AM, Swapnil Sapkal wrote:
>>
>> Please find the patch below. This is based on tip/sched/core.
>>
>> ---
>>  From deeed5bf937bddf227deb1cdb9e2e6c164c48053 Mon Sep 17 00:00:00 2001
>> From: Swapnil Sapkal <swapnil.sapkal@amd.com>
>> Date: Thu, 15 Jun 2023 04:55:09 +0000
>> Subject: [PATCH] sched: Report the different kinds of imbalances in
>> /proc/schedstat
>>
>> In /proc/schedstat, lb_imbalance reports the sum of imbalances
>> discovered in sched domains with each call to sched_balance_rq(), which is
>> not very useful because lb_imbalance is an aggregate of the imbalances
>> due to load, utilization, nr_tasks and misfit_tasks. Remove this field
>> from /proc/schedstat.
>>
> Yes. This makes sense. Any one of them can skew the value.

Yes.

>> Currently there are no fields in /proc/schedstat to report different types
>> of imbalances. Introduce new fields in /proc/schedstat to report the
>> total imbalances in load, utilization, nr_tasks or misfit_tasks.
>>
>> Added fields to /proc/schedstat:
>>       - lb_imbalance_load: Total imbalance due to load.
>>      - lb_imbalance_util: Total imbalance due to utilization.
>>      - lb_imbalance_task: Total imbalance due to number of tasks.
>>      - lb_imbalance_misfit: Total imbalance due to misfit tasks.
>>
>> Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
>> ---
>>   Documentation/scheduler/sched-stats.rst | 104 +++++++++++++-----------
>>   include/linux/sched/topology.h          |   5 +-
>>   kernel/sched/fair.c                     |  15 +++-
>>   kernel/sched/stats.c                    |   7 +-
>>   4 files changed, 80 insertions(+), 51 deletions(-)
>>
>> diff --git a/Documentation/scheduler/sched-stats.rst
>> b/Documentation/scheduler/sched-stats.rst
>> index 7c2b16c4729d..d6e9a8a5619c 100644
>> --- a/Documentation/scheduler/sched-stats.rst
>> +++ b/Documentation/scheduler/sched-stats.rst
>> @@ -6,6 +6,9 @@ Version 16 of schedstats changed the order of
>> definitions within
>>   'enum cpu_idle_type', which changed the order of [CPU_MAX_IDLE_TYPES]
>>   columns in show_schedstat(). In particular the position of CPU_IDLE
>>   and __CPU_NOT_IDLE changed places. The size of the array is unchanged.
>> +Also stop reporting 'lb_imbalance' as it has no significance anymore
>> +and instead add more relevant fields namely 'lb_imbalance_load',
>> +'lb_imbalance_util', 'lb_imbalance_task' and 'lb_imbalance_misfit'.
>>   
>>   Version 15 of schedstats dropped counters for some sched_yield:
>>   yld_exp_empty, yld_act_empty and yld_both_empty. Otherwise, it is
>> @@ -73,86 +76,93 @@ One of these is produced per domain for each cpu
>> described. (Note that if
>>   CONFIG_SMP is not defined, *no* domains are utilized and these lines
>>   will not appear in the output.)
>>   
>> -domain<N> <cpumask> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
>> 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
>> +domain<N> <cpumask> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
>> 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
>>   
>>   The first field is a bit mask indicating what cpus this domain operates
>> over.
>>   
> IIUC, this is editing the content of Version 15, But changes would happen on Version 16.
> Instead can we add the section for Version 16 and not modify for 15?

This file contains the field details of the current schedstat version and a short
summary of what changed across versions. Maintaining each version's details would
increase the file size. Instead, I will add links to the previous versions'
documentation.
  
Thoughts on this?

>> -The next 24 are a variety of sched_balance_rq() statistics in grouped
>> into types
>> +The next 33 are a variety of sched_balance_rq() statistics in grouped
>> into types
>>   of idleness (idle, busy, and newly idle):
>>   
>>       1)  # of times in this domain sched_balance_rq() was called when the
>> +        cpu was busy
>> +    2)  # of times in this domain sched_balance_rq() checked but found the
>> +        load did not require balancing when busy
>> +    3)  # of times in this domain sched_balance_rq() tried to move one or
>> +        more tasks and failed, when the cpu was busy
>> +    4)  Total imbalance in load when the cpu was busy
>> +    5)  Total imbalance in utilization when the cpu was busy
>> +    6)  Total imbalance in number of tasks when the cpu was busy
>> +    7)  Total imbalance due to misfit tasks when the cpu was busy
>> +    8)  # of times in this domain pull_task() was called when busy
>> +    9)  # of times in this domain pull_task() was called even though the
>> +        target task was cache-hot when busy
>> +    10) # of times in this domain sched_balance_rq() was called but did
>> not
>> +        find a busier queue while the cpu was busy
>> +    11) # of times in this domain a busier queue was found while the cpu
>> +        was busy but no busier group was found
>> +
>> +    12) # of times in this domain sched_balance_rq() was called when the
>>           cpu was idle
>> -    2)  # of times in this domain sched_balance_rq() checked but found
>> +    13) # of times in this domain sched_balance_rq() checked but found
>>           the load did not require balancing when the cpu was idle
>> -    3)  # of times in this domain sched_balance_rq() tried to move one or
>> +    14) # of times in this domain sched_balance_rq() tried to move one or
>>           more tasks and failed, when the cpu was idle
>> -    4)  sum of imbalances discovered (if any) with each call to
>> -        sched_balance_rq() in this domain when the cpu was idle
>> -    5)  # of times in this domain pull_task() was called when the cpu
>> +    15) Total imbalance in load when the cpu was idle
>> +    16) Total imbalance in utilization when the cpu was idle
>> +    17) Total imbalance in number of tasks when the cpu was idle
>> +    18) Total imbalance due to misfit tasks when the cpu was idle
>> +    19) # of times in this domain pull_task() was called when the cpu
>>           was idle
>> -    6)  # of times in this domain pull_task() was called even though
>> +    20) # of times in this domain pull_task() was called even though
>>           the target task was cache-hot when idle
>> -    7)  # of times in this domain sched_balance_rq() was called but did
>> +    21) # of times in this domain sched_balance_rq() was called but did
>>           not find a busier queue while the cpu was idle
>> -    8)  # of times in this domain a busier queue was found while the
>> +    22) # of times in this domain a busier queue was found while the
>>           cpu was idle but no busier group was found
>> -    9)  # of times in this domain sched_balance_rq() was called when the
>> -        cpu was busy
>> -    10) # of times in this domain sched_balance_rq() checked but found the
>> -        load did not require balancing when busy
>> -    11) # of times in this domain sched_balance_rq() tried to move one or
>> -        more tasks and failed, when the cpu was busy
>> -    12) sum of imbalances discovered (if any) with each call to
>> -        sched_balance_rq() in this domain when the cpu was busy
>> -    13) # of times in this domain pull_task() was called when busy
>> -    14) # of times in this domain pull_task() was called even though the
>> -        target task was cache-hot when busy
>> -    15) # of times in this domain sched_balance_rq() was called but did
>> not
>> -        find a busier queue while the cpu was busy
>> -    16) # of times in this domain a busier queue was found while the cpu
>> -        was busy but no busier group was found
>>   
>> -    17) # of times in this domain sched_balance_rq() was called when the
>> -        cpu was just becoming idle
>> -    18) # of times in this domain sched_balance_rq() checked but found the
>> +    23) # of times in this domain sched_balance_rq() was called when the
>> +        was just becoming idle
>> +    24) # of times in this domain sched_balance_rq() checked but found the
>>           load did not require balancing when the cpu was just becoming idle
>> -    19) # of times in this domain sched_balance_rq() tried to move one
>> or more
>> +    25) # of times in this domain sched_balance_rq() tried to move one
>> or more
>>           tasks and failed, when the cpu was just becoming idle
>> -    20) sum of imbalances discovered (if any) with each call to
>> -        sched_balance_rq() in this domain when the cpu was just
>> becoming idle
>> -    21) # of times in this domain pull_task() was called when newly idle
>> -    22) # of times in this domain pull_task() was called even though the
>> +    26) Total imbalance in load when the cpu was just becoming idle
>> +    27) Total imbalance in utilization when the cpu was just becoming idle
>> +    28) Total imbalance in number of tasks when the cpu was just
>> becoming idle
>> +    29) Total imbalance due to misfit tasks when the cpu was just
>> becoming idle
>> +    30) # of times in this domain pull_task() was called when newly idle
>> +    31) # of times in this domain pull_task() was called even though the
>>           target task was cache-hot when just becoming idle
>> -    23) # of times in this domain sched_balance_rq() was called but did
>> not
>> +    32) # of times in this domain sched_balance_rq() was called but did
>> not
>>           find a busier queue while the cpu was just becoming idle
>> -    24) # of times in this domain a busier queue was found while the cpu
>> +    33) # of times in this domain a busier queue was found while the cpu
>>           was just becoming idle but no busier group was found
>>   
>>      Next three are active_load_balance() statistics:
>>   
>> -    25) # of times active_load_balance() was called
>> -    26) # of times active_load_balance() tried to move a task and failed
>> -    27) # of times active_load_balance() successfully moved a task
>> +    34) # of times active_load_balance() was called
>> +    35) # of times active_load_balance() tried to move a task and failed
>> +    36) # of times active_load_balance() successfully moved a task
>>   
>>      Next three are sched_balance_exec() statistics:
>>   
>> -    28) sbe_cnt is not used
>> -    29) sbe_balanced is not used
>> -    30) sbe_pushed is not used
>> +    37) sbe_cnt is not used
>> +    38) sbe_balanced is not used
>> +    39) sbe_pushed is not used
>>   
>>      Next three are sched_balance_fork() statistics:
>>   
>> -    31) sbf_cnt is not used
>> -    32) sbf_balanced is not used
>> -    33) sbf_pushed is not used
>> +    40) sbf_cnt is not used
>> +    41) sbf_balanced is not used
>> +    42) sbf_pushed is not used
>>   
>>      Next three are try_to_wake_up() statistics:
>>   
>> -    34) # of times in this domain try_to_wake_up() awoke a task that
>> +    43) # of times in this domain try_to_wake_up() awoke a task that
>>           last ran on a different cpu in this domain
>> -    35) # of times in this domain try_to_wake_up() moved a task to the
>> +    44) # of times in this domain try_to_wake_up() moved a task to the
>>           waking cpu because it was cache-cold on its own cpu anyway
>> -    36) # of times in this domain try_to_wake_up() started passive
>> balancing
>> +    45) # of times in this domain try_to_wake_up() started passive
>> balancing
>>   
>>   /proc/<pid>/schedstat
>>   ---------------------
>> diff --git a/include/linux/sched/topology.h
>> b/include/linux/sched/topology.h
>> index c8fe9bab981b..15685c40a713 100644
>> --- a/include/linux/sched/topology.h
>> +++ b/include/linux/sched/topology.h
>> @@ -114,7 +114,10 @@ struct sched_domain {
>>       unsigned int lb_count[CPU_MAX_IDLE_TYPES];
>>       unsigned int lb_failed[CPU_MAX_IDLE_TYPES];
>>       unsigned int lb_balanced[CPU_MAX_IDLE_TYPES];
>> -    unsigned int lb_imbalance[CPU_MAX_IDLE_TYPES];
>> +    unsigned int lb_imbalance_load[CPU_MAX_IDLE_TYPES];
>> +    unsigned int lb_imbalance_util[CPU_MAX_IDLE_TYPES];
>> +    unsigned int lb_imbalance_task[CPU_MAX_IDLE_TYPES];
>> +    unsigned int lb_imbalance_misfit[CPU_MAX_IDLE_TYPES];
>>       unsigned int lb_gained[CPU_MAX_IDLE_TYPES];
>>       unsigned int lb_hot_gained[CPU_MAX_IDLE_TYPES];
>>       unsigned int lb_nobusyg[CPU_MAX_IDLE_TYPES];
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index a19ea290b790..515258f97ba3 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -11288,7 +11288,20 @@ static int sched_balance_rq(int this_cpu,
>> struct rq *this_rq,
>>   
>>       WARN_ON_ONCE(busiest == env.dst_rq);
>>   
>> -    schedstat_add(sd->lb_imbalance[idle], env.imbalance);
>> +    switch (env.migration_type) {
>> +    case migrate_load:
>> +        schedstat_add(sd->lb_imbalance_load[idle], env.imbalance);
>> +        break;
>> +    case migrate_util:
>> +        schedstat_add(sd->lb_imbalance_util[idle], env.imbalance);
>> +        break;
>> +    case migrate_task:
>> +        schedstat_add(sd->lb_imbalance_task[idle], env.imbalance);
>> +        break;
>> +    case migrate_misfit:
>> +        schedstat_add(sd->lb_imbalance_misfit[idle], env.imbalance);
>> +        break;
>> +    }
>>   
> This switch statement could use a helper function.

Sure, will update this.

>   
>
>>       env.src_cpu = busiest->cpu;
>>       env.src_rq = busiest;
>> diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
>> index 78e48f5426ee..a02bc9db2f1c 100644
>> --- a/kernel/sched/stats.c
>> +++ b/kernel/sched/stats.c
>> @@ -151,11 +151,14 @@ static int show_schedstat(struct seq_file *seq,
>> void *v)
>>               seq_printf(seq, "domain%d %*pb", dcount++,
>>                      cpumask_pr_args(sched_domain_span(sd)));
>>               for (itype = 0; itype < CPU_MAX_IDLE_TYPES; itype++) {
>> -                seq_printf(seq, " %u %u %u %u %u %u %u %u",
>> +                seq_printf(seq, " %u %u %u %u %u %u %u %u %u %u %u",
>>                       sd->lb_count[itype],
>>                       sd->lb_balanced[itype],
>>                       sd->lb_failed[itype],
>> -                    sd->lb_imbalance[itype],
>> +                    sd->lb_imbalance_load[itype],
>> +                    sd->lb_imbalance_util[itype],
>> +                    sd->lb_imbalance_task[itype],
>> +                    sd->lb_imbalance_misfit[itype],
>>                       sd->lb_gained[itype],
>>                       sd->lb_hot_gained[itype],
>>                       sd->lb_nobusyq[itype],

--
Thanks and Regards,
Swapnil


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 04/10] sched/debug: Increase SCHEDSTAT_VERSION to 16
  2024-03-15  9:06             ` Swapnil Sapkal
@ 2024-03-15  9:20               ` Shrikanth Hegde
  2024-03-15 14:03                 ` Swapnil Sapkal
  0 siblings, 1 reply; 35+ messages in thread
From: Shrikanth Hegde @ 2024-03-15  9:20 UTC (permalink / raw)
  To: Swapnil Sapkal
  Cc: linux-kernel, Dietmar Eggemann, Linus Torvalds, Peter Zijlstra,
	Valentin Schneider, Vincent Guittot, Gautham R. Shenoy,
	Ingo Molnar



On 3/15/24 2:36 PM, Swapnil Sapkal wrote:
> Hello Shrikanth,
> 
> Thank you for reviewing.
> 
> On 3/15/2024 10:07 AM, Shrikanth Hegde wrote:
>>

>>> 19 20
>>> 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
>>> +domain<N> <cpumask> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
>>> 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43
>>> 44 45
>>>     The first field is a bit mask indicating what cpus this domain
>>> operates
>>> over.
>>>   
>> IIUC, this is editing the content of Version 15, But changes would
>> happen on Version 16.
>> Instead can we add the section for Version 16 and not modify for 15?
> 
> This file contains the field details of current schedstat version and
> the short
> summary of what change across the versions. Maintaining each versions
> details will
> increase the file size. Instead I will add the links to previous version's
> documentation.
>  

I hadn't seen that. What you are saying is right; it would bloat up.

> Thoughts on this?
> 
>>> -The next 24 are a variety of sched_balance_rq() statistics in grouped
>>> into types
>>> +The next 33 are a variety of sched_balance_rq() statistics in grouped
>>> into types
>>>   of idleness (idle, busy, and newly idle):

Please change this to
 of idleness (busy, idle, and newly idle): 

>>>         1)  # of times in this domain sched_balance_rq() was called
>>> when the
>>> +        cpu was busy
>>> +    2)  # of times in this domain sched_balance_rq() checked but
>>> found the
>>> +        load did not require balancing when busy

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 01/10] sched/balancing: Switch the 'DEFINE_SPINLOCK(balancing)' spinlock into an 'atomic_t sched_balance_running' flag
  2024-03-08 10:58 ` [PATCH 01/10] sched/balancing: Switch the 'DEFINE_SPINLOCK(balancing)' spinlock into an 'atomic_t sched_balance_running' flag Ingo Molnar
  2024-03-12 12:00   ` [tip: sched/core] " tip-bot2 for Ingo Molnar
@ 2024-03-15  9:25   ` Shrikanth Hegde
  1 sibling, 0 replies; 35+ messages in thread
From: Shrikanth Hegde @ 2024-03-15  9:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Dietmar Eggemann, Peter Zijlstra, Valentin Schneider,
	Vincent Guittot, LKML



On 3/8/24 4:28 PM, Ingo Molnar wrote:
> The 'balancing' spinlock added in:
> 

> Also document the flag a bit.
> 
> No change in functionality intended.
> 

> -static DEFINE_SPINLOCK(balancing);
> +/*
> + * This flag serializes load-balancing passes over large domains
> + * (above the NODE topology level) - only one load-balancing instance
> + * may run at a time, to reduce overhead on very large systems with
> + * lots of CPUs and large NUMA distances.
> + *
> + * - Note that load-balancing passes triggered while another one
> + *   is executing are skipped and not re-tried.
> + *
> + * - Also note that this does not serialize rebalance_domains()

nit: please change rebalance_domains to sched_balance_domains. 

> + *   execution, as non-SD_SERIALIZE domains will still be
> + *   load-balanced in parallel.
> + */
> +static atomic_t sched_balance_running = ATOMIC_INIT(0);
>  
>  /*
>   * Scale the max load_balance interval with the number of CPUs in the system.
> @@ -11711,7 +11724,7 @@ static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle)
>  
>  		need_serialize = sd->flags & SD_SERIALIZE;
>  		if (need_serialize) {
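
FWIW, the trylock semantics described in the new comment boil down to roughly
this flow (a sketch of my reading of it, assuming the usual
atomic_cmpxchg_acquire()/atomic_set_release() pair; the exact hunk in the
patch may differ):

	need_serialize = sd->flags & SD_SERIALIZE;
	if (need_serialize) {
		/* Someone else is balancing a serialized domain: skip this pass. */
		if (atomic_cmpxchg_acquire(&sched_balance_running, 0, 1))
			goto out;
	}

	/* ... run sched_balance_rq() for this domain ... */

	if (need_serialize)
		atomic_set_release(&sched_balance_running, 0);

i.e. a skipped pass is simply dropped, not re-tried, which matches the comment above.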


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 04/10] sched/debug: Increase SCHEDSTAT_VERSION to 16
  2024-03-15  9:20               ` Shrikanth Hegde
@ 2024-03-15 14:03                 ` Swapnil Sapkal
  0 siblings, 0 replies; 35+ messages in thread
From: Swapnil Sapkal @ 2024-03-15 14:03 UTC (permalink / raw)
  To: Shrikanth Hegde
  Cc: linux-kernel, Dietmar Eggemann, Linus Torvalds, Peter Zijlstra,
	Valentin Schneider, Vincent Guittot, Gautham R. Shenoy,
	Ingo Molnar


On 3/15/2024 2:50 PM, Shrikanth Hegde wrote:
> On 3/15/24 2:36 PM, Swapnil Sapkal wrote:
>> Hello Shrikanth,
>>
>> Thank you for reviewing.
>>
>> On 3/15/2024 10:07 AM, Shrikanth Hegde wrote:
>>>> 19 20
>>>> 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
>>>> +domain<N> <cpumask> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
>>>> 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43
>>>> 44 45
>>>>      The first field is a bit mask indicating what cpus this domain
>>>> operates
>>>> over.
>>>>    
>>> IIUC, this is editing the content of Version 15, But changes would
>>> happen on Version 16.
>>> Instead can we add the section for Version 16 and not modify for 15?
>> This file contains the field details of current schedstat version and
>> the short
>> summary of what change across the versions. Maintaining each versions
>> details will
>> increase the file size. Instead I will add the links to previous version's
>> documentation.
>>   
> I hadn't seen that. what you are saying is right. it would bloat up.
>
>> Thoughts on this?
>>
>>>> -The next 24 are a variety of sched_balance_rq() statistics in grouped
>>>> into types
>>>> +The next 33 are a variety of sched_balance_rq() statistics in grouped
>>>> into types
>>>>    of idleness (idle, busy, and newly idle):
> Please change this to
>   of idleness (busy, idle, and newly idle):

Thanks for catching this. I have updated this in v2. v2 is available at:
https://lore.kernel.org/all/20240315135501.1778620-1-swapnil.sapkal@amd.com/

>>>>          1)  # of times in this domain sched_balance_rq() was called
>>>> when the
>>>> +        cpu was busy
>>>> +    2)  # of times in this domain sched_balance_rq() checked but
>>>> found the
>>>> +        load did not require balancing when busy

--
Thanks and Regards,
Swapnil


^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread

Thread overview: 35+ messages
2024-03-08 10:58 [PATCH -v4 00/10] sched/balancing: Misc updates & cleanups Ingo Molnar
2024-03-08 10:58 ` [PATCH 01/10] sched/balancing: Switch the 'DEFINE_SPINLOCK(balancing)' spinlock into an 'atomic_t sched_balance_running' flag Ingo Molnar
2024-03-12 12:00   ` [tip: sched/core] " tip-bot2 for Ingo Molnar
2024-03-15  9:25   ` [PATCH 01/10] " Shrikanth Hegde
2024-03-08 10:58 ` [PATCH 02/10] sched/balancing: Remove reliance on 'enum cpu_idle_type' ordering when iterating [CPU_MAX_IDLE_TYPES] arrays in show_schedstat() Ingo Molnar
2024-03-08 13:26   ` Vincent Guittot
2024-03-12 12:00   ` [tip: sched/core] " tip-bot2 for Shrikanth Hegde
2024-03-08 10:58 ` [PATCH 03/10] sched/balancing: Change 'enum cpu_idle_type' to have more natural definitions Ingo Molnar
2024-03-12 12:00   ` [tip: sched/core] " tip-bot2 for Ingo Molnar
2024-03-08 10:58 ` [PATCH 04/10] sched/debug: Increase SCHEDSTAT_VERSION to 16 Ingo Molnar
2024-03-08 16:40   ` Shrikanth Hegde
2024-03-12 10:01     ` Ingo Molnar
2024-03-14  5:21       ` Gautham R. Shenoy
2024-03-14  6:04         ` Swapnil Sapkal
2024-03-15  4:37           ` Shrikanth Hegde
2024-03-15  9:06             ` Swapnil Sapkal
2024-03-15  9:20               ` Shrikanth Hegde
2024-03-15 14:03                 ` Swapnil Sapkal
2024-03-12 12:00   ` [tip: sched/core] " tip-bot2 for Ingo Molnar
2024-03-08 10:58 ` [PATCH 05/10] sched/debug: Allow CONFIG_SCHEDSTATS even on !KERNEL_DEBUG kernels Ingo Molnar
2024-03-12 12:00   ` [tip: sched/core] " tip-bot2 for Ingo Molnar
2024-03-08 10:58 ` [PATCH 06/10] sched/balancing: Change comment formatting to not overlap Git conflict marker lines Ingo Molnar
2024-03-12 12:00   ` [tip: sched/core] " tip-bot2 for Ingo Molnar
2024-03-08 10:58 ` [PATCH 07/10] sched/balancing: Fix comments (trying to) refer to NOHZ_BALANCE_KICK Ingo Molnar
2024-03-08 12:03   ` Vincent Guittot
2024-03-12 12:00   ` [tip: sched/core] " tip-bot2 for Ingo Molnar
2024-03-08 10:58 ` [PATCH 08/10] sched/balancing: Update run_rebalance_domains() comments Ingo Molnar
2024-03-08 12:02   ` Vincent Guittot
2024-03-12 12:00   ` [tip: sched/core] " tip-bot2 for Ingo Molnar
2024-03-08 10:59 ` [PATCH 09/10] sched/balancing: Vertically align the comments of 'struct sg_lb_stats' and 'struct sd_lb_stats' Ingo Molnar
2024-03-08 12:01   ` Vincent Guittot
2024-03-12 12:00   ` [tip: sched/core] " tip-bot2 for Ingo Molnar
2024-03-08 10:59 ` [PATCH 10/10] sched/balancing: Update comments in " Ingo Molnar
2024-03-08 12:01   ` Vincent Guittot
2024-03-12 12:00   ` [tip: sched/core] " tip-bot2 for Ingo Molnar
