* [PATCH 0/7 v3] move update blocked load outside newidle_balance
@ 2021-02-12 14:17 Vincent Guittot
  2021-02-12 14:17 ` [PATCH 1/7 v3] sched/fair: remove update of blocked load from newidle_balance Vincent Guittot
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: Vincent Guittot @ 2021-02-12 14:17 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, bristot, linux-kernel, joel, valentin.schneider
  Cc: fweisbec, tglx, qais.yousef, Vincent Guittot

Joel reported long preempt- and irq-off sequences in newidle_balance caused
by a large number of CPU cgroups in use and having to be updated. This
patchset moves the update outside newidle_balance. This makes it possible,
for example, to abort the updates early when an irq is pending.
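
To make this concrete, the early abort referred to here already exists in
the ILB loop that walks the idle CPUs; a simplified sketch (adapted from
_nohz_idle_balance() in kernel/sched/fair.c, not a verbatim excerpt):

	for_each_cpu(balance_cpu, nohz.idle_cpus_mask) {
		if (!idle_cpu(balance_cpu))
			continue;

		/*
		 * If this_cpu gets work to do (e.g. an irq wakes a task),
		 * stop updating the blocked load of the remaining idle
		 * CPUs; the next ILB owner will pick up the remainder.
		 */
		if (need_resched()) {
			has_blocked_load = true;
			goto abort;
		}

		has_blocked_load |= update_nohz_stats(cpu_rq(balance_cpu));
	}

Running this outside the preempt/irq-off section of newidle_balance lets a
pending irq actually trigger this abort.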

Instead of kicking a normal ILB that would wake up a CPU which is already
idle, patch 6 triggers the update of statistics in the idle thread of
the CPU before it selects and enters an idle state.

Changes in v3:
- Fixed a compilation error for !CONFIG_SMP && CONFIG_NO_HZ_COMMON
  reported by kernel test robot <lkp@intel.com>
- Took advantage of this new version to add a short description of
  nohz_run_idle_balance

Changes in v2:
- Fixed some typos and updated some comments
- Added more cleanup
- Changed the way the ILB is triggered in idle thread context to remove a
  possible race condition between the normal softirq ILB and this new
  mechanism. The cpu can already be set in idle_cpus_mask because, even
  though the cpu is only added when entering idle, it might not have been
  removed yet from the previous idle phase.
  
Vincent Guittot (7):
  sched/fair: remove update of blocked load from newidle_balance
  sched/fair: remove unused return of _nohz_idle_balance
  sched/fair: remove unused parameter of update_nohz_stats
  sched/fair: merge for each idle cpu loop of ILB
  sched/fair: reorder newidle_balance pulled_task tests
  sched/fair: trigger the update of blocked load on newly idle cpu
  sched/fair: reduce the window for duplicated update

 kernel/sched/fair.c  | 118 +++++++++++++++++--------------------------
 kernel/sched/idle.c  |   6 +++
 kernel/sched/sched.h |   5 ++
 3 files changed, 57 insertions(+), 72 deletions(-)

-- 
2.17.1



* [PATCH 1/7 v3] sched/fair: remove update of blocked load from newidle_balance
  2021-02-12 14:17 [PATCH 0/7 v3] move update blocked load outside newidle_balance Vincent Guittot
@ 2021-02-12 14:17 ` Vincent Guittot
  2021-02-12 14:17 ` [PATCH 2/7 v3] sched/fair: remove unused return of _nohz_idle_balance Vincent Guittot
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Vincent Guittot @ 2021-02-12 14:17 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, bristot, linux-kernel, joel, valentin.schneider
  Cc: fweisbec, tglx, qais.yousef, Vincent Guittot

newidle_balance() runs with both preemption and irqs disabled, which
prevents local irqs from running during this period. The time spent
updating the blocked load of CPUs varies with the number of CPU cgroups
with non-decayed load and extends this critical section to an
uncontrolled level.

Remove the update from newidle_balance and trigger a normal ILB that
will take care of the update instead.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 33 +++++----------------------------
 1 file changed, 5 insertions(+), 28 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 59b645e3c4fd..bfe1e235fe01 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7392,8 +7392,6 @@ enum migration_type {
 #define LBF_NEED_BREAK	0x02
 #define LBF_DST_PINNED  0x04
 #define LBF_SOME_PINNED	0x08
-#define LBF_NOHZ_STATS	0x10
-#define LBF_NOHZ_AGAIN	0x20
 
 struct lb_env {
 	struct sched_domain	*sd;
@@ -8397,9 +8395,6 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 	for_each_cpu_and(i, sched_group_span(group), env->cpus) {
 		struct rq *rq = cpu_rq(i);
 
-		if ((env->flags & LBF_NOHZ_STATS) && update_nohz_stats(rq, false))
-			env->flags |= LBF_NOHZ_AGAIN;
-
 		sgs->group_load += cpu_load(rq);
 		sgs->group_util += cpu_util(i);
 		sgs->group_runnable += cpu_runnable(rq);
@@ -8940,11 +8935,6 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 	struct sg_lb_stats tmp_sgs;
 	int sg_status = 0;
 
-#ifdef CONFIG_NO_HZ_COMMON
-	if (env->idle == CPU_NEWLY_IDLE && READ_ONCE(nohz.has_blocked))
-		env->flags |= LBF_NOHZ_STATS;
-#endif
-
 	do {
 		struct sg_lb_stats *sgs = &tmp_sgs;
 		int local_group;
@@ -8981,14 +8971,6 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 	/* Tag domain that child domain prefers tasks go to siblings first */
 	sds->prefer_sibling = child && child->flags & SD_PREFER_SIBLING;
 
-#ifdef CONFIG_NO_HZ_COMMON
-	if ((env->flags & LBF_NOHZ_AGAIN) &&
-	    cpumask_subset(nohz.idle_cpus_mask, sched_domain_span(env->sd))) {
-
-		WRITE_ONCE(nohz.next_blocked,
-			   jiffies + msecs_to_jiffies(LOAD_AVG_PERIOD));
-	}
-#endif
 
 	if (env->sd->flags & SD_NUMA)
 		env->fbq_type = fbq_classify_group(&sds->busiest_stat);
@@ -10517,16 +10499,11 @@ static void nohz_newidle_balance(struct rq *this_rq)
 	    time_before(jiffies, READ_ONCE(nohz.next_blocked)))
 		return;
 
-	raw_spin_unlock(&this_rq->lock);
 	/*
-	 * This CPU is going to be idle and blocked load of idle CPUs
-	 * need to be updated. Run the ilb locally as it is a good
-	 * candidate for ilb instead of waking up another idle CPU.
-	 * Kick an normal ilb if we failed to do the update.
+	 * Blocked load of idle CPUs need to be updated.
+	 * Kick an ILB to update statistics.
 	 */
-	if (!_nohz_idle_balance(this_rq, NOHZ_STATS_KICK, CPU_NEWLY_IDLE))
-		kick_ilb(NOHZ_STATS_KICK);
-	raw_spin_lock(&this_rq->lock);
+	kick_ilb(NOHZ_STATS_KICK);
 }
 
 #else /* !CONFIG_NO_HZ_COMMON */
@@ -10587,8 +10564,6 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
 			update_next_balance(sd, &next_balance);
 		rcu_read_unlock();
 
-		nohz_newidle_balance(this_rq);
-
 		goto out;
 	}
 
@@ -10654,6 +10629,8 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
 
 	if (pulled_task)
 		this_rq->idle_stamp = 0;
+	else
+		nohz_newidle_balance(this_rq);
 
 	rq_repin_lock(this_rq, rf);
 
-- 
2.17.1



* [PATCH 2/7 v3] sched/fair: remove unused return of _nohz_idle_balance
  2021-02-12 14:17 [PATCH 0/7 v3] move update blocked load outside newidle_balance Vincent Guittot
  2021-02-12 14:17 ` [PATCH 1/7 v3] sched/fair: remove update of blocked load from newidle_balance Vincent Guittot
@ 2021-02-12 14:17 ` Vincent Guittot
  2021-02-12 14:17 ` [PATCH 3/7 v3] sched/fair: remove unused parameter of update_nohz_stats Vincent Guittot
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Vincent Guittot @ 2021-02-12 14:17 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, bristot, linux-kernel, joel, valentin.schneider
  Cc: fweisbec, tglx, qais.yousef, Vincent Guittot

The return value of _nohz_idle_balance() is not used anymore, so remove
it.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bfe1e235fe01..f3f0f8cca061 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10354,10 +10354,8 @@ void nohz_balance_enter_idle(int cpu)
  * Internal function that runs load balance for all idle cpus. The load balance
  * can be a simple update of blocked load or a complete load balance with
  * tasks movement depending of flags.
- * The function returns false if the loop has stopped before running
- * through all idle CPUs.
  */
-static bool _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
+static void _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
 			       enum cpu_idle_type idle)
 {
 	/* Earliest time when we have to do rebalance again */
@@ -10367,7 +10365,6 @@ static bool _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
 	int update_next_balance = 0;
 	int this_cpu = this_rq->cpu;
 	int balance_cpu;
-	int ret = false;
 	struct rq *rq;
 
 	SCHED_WARN_ON((flags & NOHZ_KICK_MASK) == NOHZ_BALANCE_KICK);
@@ -10447,15 +10444,10 @@ static bool _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
 	WRITE_ONCE(nohz.next_blocked,
 		now + msecs_to_jiffies(LOAD_AVG_PERIOD));
 
-	/* The full idle balance loop has been done */
-	ret = true;
-
 abort:
 	/* There is still blocked load, enable periodic update */
 	if (has_blocked_load)
 		WRITE_ONCE(nohz.has_blocked, 1);
-
-	return ret;
 }
 
 /*
-- 
2.17.1



* [PATCH 3/7 v3] sched/fair: remove unused parameter of update_nohz_stats
  2021-02-12 14:17 [PATCH 0/7 v3] move update blocked load outside newidle_balance Vincent Guittot
  2021-02-12 14:17 ` [PATCH 1/7 v3] sched/fair: remove update of blocked load from newidle_balance Vincent Guittot
  2021-02-12 14:17 ` [PATCH 2/7 v3] sched/fair: remove unused return of _nohz_idle_balance Vincent Guittot
@ 2021-02-12 14:17 ` Vincent Guittot
  2021-02-12 14:17 ` [PATCH 4/7 v3] sched/fair: merge for each idle cpu loop of ILB Vincent Guittot
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Vincent Guittot @ 2021-02-12 14:17 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, bristot, linux-kernel, joel, valentin.schneider
  Cc: fweisbec, tglx, qais.yousef, Vincent Guittot

The idle load balance is the only user of update_nohz_stats() and doesn't
use the force parameter. Remove it.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f3f0f8cca061..4573a0abd38a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8352,7 +8352,7 @@ group_type group_classify(unsigned int imbalance_pct,
 	return group_has_spare;
 }
 
-static bool update_nohz_stats(struct rq *rq, bool force)
+static bool update_nohz_stats(struct rq *rq)
 {
 #ifdef CONFIG_NO_HZ_COMMON
 	unsigned int cpu = rq->cpu;
@@ -8363,7 +8363,7 @@ static bool update_nohz_stats(struct rq *rq, bool force)
 	if (!cpumask_test_cpu(cpu, nohz.idle_cpus_mask))
 		return false;
 
-	if (!force && !time_after(jiffies, rq->last_blocked_load_update_tick))
+	if (!time_after(jiffies, rq->last_blocked_load_update_tick))
 		return true;
 
 	update_blocked_averages(cpu);
@@ -10401,7 +10401,7 @@ static void _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
 
 		rq = cpu_rq(balance_cpu);
 
-		has_blocked_load |= update_nohz_stats(rq, true);
+		has_blocked_load |= update_nohz_stats(rq);
 
 		/*
 		 * If time for next balance is due,
-- 
2.17.1



* [PATCH 4/7 v3] sched/fair: merge for each idle cpu loop of ILB
  2021-02-12 14:17 [PATCH 0/7 v3] move update blocked load outside newidle_balance Vincent Guittot
                   ` (2 preceding siblings ...)
  2021-02-12 14:17 ` [PATCH 3/7 v3] sched/fair: remove unused parameter of update_nohz_stats Vincent Guittot
@ 2021-02-12 14:17 ` Vincent Guittot
  2021-02-12 14:17 ` [PATCH 5/7 v3] sched/fair: reorder newidle_balance pulled_task tests Vincent Guittot
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Vincent Guittot @ 2021-02-12 14:17 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, bristot, linux-kernel, joel, valentin.schneider
  Cc: fweisbec, tglx, qais.yousef, Vincent Guittot

Remove the special case that handles this_cpu outside the for_each_cpu()
loop when running the ILB. Instead, use for_each_cpu_wrap() and start with
the next cpu after this_cpu so that the loop still finishes with this_cpu.

update_nohz_stats() is now used for this_cpu too and prevents unnecessary
updates. We no longer need a special case for updating nohz.next_balance
for this_cpu because it is now handled by the loop like the other CPUs.
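
As an illustration of the iteration order this relies on (example values
only):

	/*
	 * With this_cpu = 2 and nohz.idle_cpus_mask = {0, 1, 2, 5}, the loop
	 *
	 *	for_each_cpu_wrap(balance_cpu, nohz.idle_cpus_mask, this_cpu + 1)
	 *
	 * visits 5, 0, 1 and finally 2: the other idle CPUs are updated
	 * first and this_cpu is always handled last.
	 */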

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 32 +++++++-------------------------
 1 file changed, 7 insertions(+), 25 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4573a0abd38a..ac4add026f32 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10043,22 +10043,9 @@ static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle)
 	 * When the cpu is attached to null domain for ex, it will not be
 	 * updated.
 	 */
-	if (likely(update_next_balance)) {
+	if (likely(update_next_balance))
 		rq->next_balance = next_balance;
 
-#ifdef CONFIG_NO_HZ_COMMON
-		/*
-		 * If this CPU has been elected to perform the nohz idle
-		 * balance. Other idle CPUs have already rebalanced with
-		 * nohz_idle_balance() and nohz.next_balance has been
-		 * updated accordingly. This CPU is now running the idle load
-		 * balance for itself and we need to update the
-		 * nohz.next_balance accordingly.
-		 */
-		if ((idle == CPU_IDLE) && time_after(nohz.next_balance, rq->next_balance))
-			nohz.next_balance = rq->next_balance;
-#endif
-	}
 }
 
 static inline int on_null_domain(struct rq *rq)
@@ -10385,8 +10372,12 @@ static void _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
 	 */
 	smp_mb();
 
-	for_each_cpu(balance_cpu, nohz.idle_cpus_mask) {
-		if (balance_cpu == this_cpu || !idle_cpu(balance_cpu))
+	/*
+	 * Start with the next CPU after this_cpu so we will end with this_cpu and let a
+	 * chance for other idle cpu to pull load.
+	 */
+	for_each_cpu_wrap(balance_cpu,  nohz.idle_cpus_mask, this_cpu+1) {
+		if (!idle_cpu(balance_cpu))
 			continue;
 
 		/*
@@ -10432,15 +10423,6 @@ static void _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
 	if (likely(update_next_balance))
 		nohz.next_balance = next_balance;
 
-	/* Newly idle CPU doesn't need an update */
-	if (idle != CPU_NEWLY_IDLE) {
-		update_blocked_averages(this_cpu);
-		has_blocked_load |= this_rq->has_blocked_load;
-	}
-
-	if (flags & NOHZ_BALANCE_KICK)
-		rebalance_domains(this_rq, CPU_IDLE);
-
 	WRITE_ONCE(nohz.next_blocked,
 		now + msecs_to_jiffies(LOAD_AVG_PERIOD));
 
-- 
2.17.1



* [PATCH 5/7 v3] sched/fair: reorder newidle_balance pulled_task tests
  2021-02-12 14:17 [PATCH 0/7 v3] move update blocked load outside newidle_balance Vincent Guittot
                   ` (3 preceding siblings ...)
  2021-02-12 14:17 ` [PATCH 4/7 v3] sched/fair: merge for each idle cpu loop of ILB Vincent Guittot
@ 2021-02-12 14:17 ` Vincent Guittot
  2021-02-12 14:17 ` [PATCH 6/7 v3] sched/fair: trigger the update of blocked load on newly idle cpu Vincent Guittot
  2021-02-12 14:17 ` [PATCH 7/7 v3] sched/fair: reduce the window for duplicated update Vincent Guittot
  6 siblings, 0 replies; 13+ messages in thread
From: Vincent Guittot @ 2021-02-12 14:17 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, bristot, linux-kernel, joel, valentin.schneider
  Cc: fweisbec, tglx, qais.yousef, Vincent Guittot

Reorder the tests and skip the useless ones when no load balance has been
performed and the rq lock has not been released.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ac4add026f32..5d285d93e433 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10584,7 +10584,6 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
 	if (curr_cost > this_rq->max_idle_balance_cost)
 		this_rq->max_idle_balance_cost = curr_cost;
 
-out:
 	/*
 	 * While browsing the domains, we released the rq lock, a task could
 	 * have been enqueued in the meantime. Since we're not going idle,
@@ -10593,14 +10592,15 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
 	if (this_rq->cfs.h_nr_running && !pulled_task)
 		pulled_task = 1;
 
-	/* Move the next balance forward */
-	if (time_after(this_rq->next_balance, next_balance))
-		this_rq->next_balance = next_balance;
-
 	/* Is there a task of a high priority class? */
 	if (this_rq->nr_running != this_rq->cfs.h_nr_running)
 		pulled_task = -1;
 
+out:
+	/* Move the next balance forward */
+	if (time_after(this_rq->next_balance, next_balance))
+		this_rq->next_balance = next_balance;
+
 	if (pulled_task)
 		this_rq->idle_stamp = 0;
 	else
-- 
2.17.1



* [PATCH 6/7 v3] sched/fair: trigger the update of blocked load on newly idle cpu
  2021-02-12 14:17 [PATCH 0/7 v3] move update blocked load outside newidle_balance Vincent Guittot
                   ` (4 preceding siblings ...)
  2021-02-12 14:17 ` [PATCH 5/7 v3] sched/fair: reorder newidle_balance pulled_task tests Vincent Guittot
@ 2021-02-12 14:17 ` Vincent Guittot
  2021-02-12 19:19   ` Valentin Schneider
  2021-02-12 14:17 ` [PATCH 7/7 v3] sched/fair: reduce the window for duplicated update Vincent Guittot
  6 siblings, 1 reply; 13+ messages in thread
From: Vincent Guittot @ 2021-02-12 14:17 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, bristot, linux-kernel, joel, valentin.schneider
  Cc: fweisbec, tglx, qais.yousef, Vincent Guittot

Instead of waking up a random and already idle CPU, we can take advantage
of this_cpu being about to enter idle to run the ILB and update the
blocked load.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c  | 24 +++++++++++++++++++++---
 kernel/sched/idle.c  |  6 ++++++
 kernel/sched/sched.h |  5 +++++
 3 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5d285d93e433..cd0ea635225e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10453,6 +10453,24 @@ static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 	return true;
 }
 
+/*
+ * Check if we need to run the ILB for updating blocked load before entering
+ * idle state.
+ */
+void nohz_run_idle_balance(int cpu)
+{
+	unsigned int flags;
+
+	flags = atomic_fetch_andnot(NOHZ_KICK_MASK, nohz_flags(cpu));
+
+	if (flags && !need_resched()) {
+		struct rq *rq = cpu_rq(cpu);
+
+		rq->nohz_idle_balance = flags;
+		nohz_idle_balance(rq, CPU_IDLE);
+	}
+}
+
 static void nohz_newidle_balance(struct rq *this_rq)
 {
 	int this_cpu = this_rq->cpu;
@@ -10474,10 +10492,10 @@ static void nohz_newidle_balance(struct rq *this_rq)
 		return;
 
 	/*
-	 * Blocked load of idle CPUs need to be updated.
-	 * Kick an ILB to update statistics.
+	 * Set the need to trigger ILB in order to update blocked load
+	 * before entering idle state.
 	 */
-	kick_ilb(NOHZ_STATS_KICK);
+	atomic_or(NOHZ_STATS_KICK, nohz_flags(this_cpu));
 }
 
 #else /* !CONFIG_NO_HZ_COMMON */
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 305727ea0677..52a4e9ce2f9b 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -261,6 +261,12 @@ static void cpuidle_idle_call(void)
 static void do_idle(void)
 {
 	int cpu = smp_processor_id();
+
+	/*
+	 * Check if we need to update some blocked load
+	 */
+	nohz_run_idle_balance(cpu);
+
 	/*
 	 * If the arch has a polling bit, we maintain an invariant:
 	 *
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 6edc67df3554..17de50acb88d 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2374,6 +2374,11 @@ extern void nohz_balance_exit_idle(struct rq *rq);
 static inline void nohz_balance_exit_idle(struct rq *rq) { }
 #endif
 
+#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
+extern void nohz_run_idle_balance(int cpu);
+#else
+static inline void nohz_run_idle_balance(int cpu) { }
+#endif
 
 #ifdef CONFIG_SMP
 static inline
-- 
2.17.1



* [PATCH 7/7 v3] sched/fair: reduce the window for duplicated update
  2021-02-12 14:17 [PATCH 0/7 v3] move update blocked load outside newidle_balance Vincent Guittot
                   ` (5 preceding siblings ...)
  2021-02-12 14:17 ` [PATCH 6/7 v3] sched/fair: trigger the update of blocked load on newly idle cpu Vincent Guittot
@ 2021-02-12 14:17 ` Vincent Guittot
  2021-02-12 19:19   ` Valentin Schneider
  6 siblings, 1 reply; 13+ messages in thread
From: Vincent Guittot @ 2021-02-12 14:17 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, bristot, linux-kernel, joel, valentin.schneider
  Cc: fweisbec, tglx, qais.yousef, Vincent Guittot

Update last_blocked_load_update_tick at the start of the update to reduce
the window during which another cpu can start the same update.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cd0ea635225e..7ef0911529ee 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7852,16 +7852,20 @@ static inline bool others_have_blocked(struct rq *rq)
 	return false;
 }
 
-static inline void update_blocked_load_status(struct rq *rq, bool has_blocked)
+static inline void update_blocked_load_tick(struct rq *rq)
 {
-	rq->last_blocked_load_update_tick = jiffies;
+	WRITE_ONCE(rq->last_blocked_load_update_tick, jiffies);
+}
 
+static inline void update_blocked_load_status(struct rq *rq, bool has_blocked)
+{
 	if (!has_blocked)
 		rq->has_blocked_load = 0;
 }
 #else
 static inline bool cfs_rq_has_blocked(struct cfs_rq *cfs_rq) { return false; }
 static inline bool others_have_blocked(struct rq *rq) { return false; }
+static inline void update_blocked_load_tick(struct rq *rq) {}
 static inline void update_blocked_load_status(struct rq *rq, bool has_blocked) {}
 #endif
 
@@ -8022,6 +8026,7 @@ static void update_blocked_averages(int cpu)
 	struct rq_flags rf;
 
 	rq_lock_irqsave(rq, &rf);
+	update_blocked_load_tick(rq);
 	update_rq_clock(rq);
 
 	decayed |= __update_blocked_others(rq, &done);
@@ -8363,7 +8368,7 @@ static bool update_nohz_stats(struct rq *rq)
 	if (!cpumask_test_cpu(cpu, nohz.idle_cpus_mask))
 		return false;
 
-	if (!time_after(jiffies, rq->last_blocked_load_update_tick))
+	if (!time_after(jiffies, READ_ONCE(rq->last_blocked_load_update_tick)))
 		return true;
 
 	update_blocked_averages(cpu);
-- 
2.17.1



* Re: [PATCH 6/7 v3] sched/fair: trigger the update of blocked load on newly idle cpu
  2021-02-12 14:17 ` [PATCH 6/7 v3] sched/fair: trigger the update of blocked load on newly idle cpu Vincent Guittot
@ 2021-02-12 19:19   ` Valentin Schneider
  2021-02-15 15:02     ` Vincent Guittot
  0 siblings, 1 reply; 13+ messages in thread
From: Valentin Schneider @ 2021-02-12 19:19 UTC (permalink / raw)
  To: Vincent Guittot, mingo, peterz, juri.lelli, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, linux-kernel, joel
  Cc: fweisbec, tglx, qais.yousef, Vincent Guittot

On 12/02/21 15:17, Vincent Guittot wrote:
> Instead of waking up a random and already idle CPU, we can take advantage
> of this_cpu being about to enter idle to run the ILB and update the
> blocked load.
>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>  kernel/sched/fair.c  | 24 +++++++++++++++++++++---
>  kernel/sched/idle.c  |  6 ++++++
>  kernel/sched/sched.h |  5 +++++
>  3 files changed, 32 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 5d285d93e433..cd0ea635225e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10453,6 +10453,24 @@ static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>       return true;
>  }
>
> +/*
> + * Check if we need to run the ILB for updating blocked load before entering
> + * idle state.
> + */
> +void nohz_run_idle_balance(int cpu)
> +{
> +	unsigned int flags;
> +
> +	flags = atomic_fetch_andnot(NOHZ_KICK_MASK, nohz_flags(cpu));
> +
> +	if (flags && !need_resched()) {
> +		struct rq *rq = cpu_rq(cpu);
> +
> +		rq->nohz_idle_balance = flags;
> +		nohz_idle_balance(rq, CPU_IDLE);
> +	}

So this can now run a full-fledged nohz_idle_balance() if NOHZ_BALANCE_MASK
is set.

I don't think there is anything inherently wrong with it - the
nohz_idle_balance() call resulting from the kick_ilb() IPI will just bail
out due to the flags being cleared here. This wasn't immediately clear to
me however.

> +}
> +


* Re: [PATCH 7/7 v3] sched/fair: reduce the window for duplicated update
  2021-02-12 14:17 ` [PATCH 7/7 v3] sched/fair: reduce the window for duplicated update Vincent Guittot
@ 2021-02-12 19:19   ` Valentin Schneider
  0 siblings, 0 replies; 13+ messages in thread
From: Valentin Schneider @ 2021-02-12 19:19 UTC (permalink / raw)
  To: Vincent Guittot, mingo, peterz, juri.lelli, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, linux-kernel, joel
  Cc: fweisbec, tglx, qais.yousef, Vincent Guittot

On 12/02/21 15:17, Vincent Guittot wrote:
> Update last_blocked_load_update_tick at the start of the update to reduce
> the window during which another cpu can start the same update.
>

IIUC this can happen if e.g. a CPU is busy updating load in
update_blocked_averages() while another enters update_nohz_stats() for the
same rq. They'll be serialized by the RQ lock, but both can still enter
update_blocked_averages() at roughly the same time.

Shouldn't then the rq->last_blocked_load_update_tick check be either
deferred to or re-done within update_blocked_averages(), with the rq lock
held?
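
A sketch of the re-done variant of that check, under the rq lock
(illustrative only, not a tested patch; the body of the update is elided):

	static void update_blocked_averages(int cpu)
	{
		struct rq *rq = cpu_rq(cpu);
		struct rq_flags rf;

		rq_lock_irqsave(rq, &rf);
		/*
		 * Re-check now that the rq lock is held: another CPU may
		 * have completed the update while we were waiting for it.
		 */
		if (!time_after(jiffies, rq->last_blocked_load_update_tick)) {
			rq_unlock_irqrestore(rq, &rf);
			return;
		}
		update_blocked_load_tick(rq);
		update_rq_clock(rq);
		/* ... existing update of the blocked averages ... */
		rq_unlock_irqrestore(rq, &rf);
	}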


* Re: [PATCH 6/7 v3] sched/fair: trigger the update of blocked load on newly idle cpu
  2021-02-12 19:19   ` Valentin Schneider
@ 2021-02-15 15:02     ` Vincent Guittot
  2021-02-17 11:51       ` Valentin Schneider
  0 siblings, 1 reply; 13+ messages in thread
From: Vincent Guittot @ 2021-02-15 15:02 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, linux-kernel, Joel Fernandes,
	Frederic Weisbecker, Thomas Gleixner, Qais Yousef

On Fri, 12 Feb 2021 at 20:19, Valentin Schneider
<valentin.schneider@arm.com> wrote:
>
> On 12/02/21 15:17, Vincent Guittot wrote:
> > Instead of waking up a random and already idle CPU, we can take advantage
> > of this_cpu being about to enter idle to run the ILB and update the
> > blocked load.
> >
> > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> > ---
> >  kernel/sched/fair.c  | 24 +++++++++++++++++++++---
> >  kernel/sched/idle.c  |  6 ++++++
> >  kernel/sched/sched.h |  5 +++++
> >  3 files changed, 32 insertions(+), 3 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 5d285d93e433..cd0ea635225e 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -10453,6 +10453,24 @@ static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
> >       return true;
> >  }
> >
> > +/*
> > + * Check if we need to run the ILB for updating blocked load before entering
> > + * idle state.
> > + */
> > +void nohz_run_idle_balance(int cpu)
> > +{
> > +     unsigned int flags;
> > +
> > +     flags = atomic_fetch_andnot(NOHZ_KICK_MASK, nohz_flags(cpu));
> > +
> > +     if (flags && !need_resched()) {
> > +             struct rq *rq = cpu_rq(cpu);
> > +
> > +             rq->nohz_idle_balance = flags;
> > +             nohz_idle_balance(rq, CPU_IDLE);
> > +     }
>
> So this can now run a full-fledged nohz_idle_balance() if NOHZ_BALANCE_MASK
> is set.

Yes.
>
> I don't think there is anything inherently wrong with it - the
> nohz_idle_balance() call resulting from the kick_ilb() IPI will just bail
> out due to the flags being cleared here. This wasn't immediately clear to
> me however.

In fact, I forgot to replace the WARN_ON in nohz_csd_func() by a
simple return as reported by kernel test robot / oliver.sang@intel.com
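
i.e. something along these lines in nohz_csd_func() (a sketch of the
intended fix, not the final patch):

	static void nohz_csd_func(void *info)
	{
		struct rq *rq = info;
		int cpu = cpu_of(rq);
		unsigned int flags;

		/* Release rq::nohz_csd. */
		flags = atomic_fetch_andnot(NOHZ_KICK_MASK, nohz_flags(cpu));

		/*
		 * The flags may already have been cleared by
		 * nohz_run_idle_balance(), which is no longer a bug:
		 * just return instead of WARN_ON().
		 */
		if (!(flags & NOHZ_KICK_MASK))
			return;

		rq->idle_balance = idle_cpu(cpu);
		if (rq->idle_balance && !need_resched()) {
			rq->nohz_idle_balance = flags;
			raise_softirq_irqoff(SCHED_SOFTIRQ);
		}
	}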

>
> > +}
> > +


* Re: [PATCH 6/7 v3] sched/fair: trigger the update of blocked load on newly idle cpu
  2021-02-15 15:02     ` Vincent Guittot
@ 2021-02-17 11:51       ` Valentin Schneider
  2021-02-17 15:34         ` Vincent Guittot
  0 siblings, 1 reply; 13+ messages in thread
From: Valentin Schneider @ 2021-02-17 11:51 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, linux-kernel, Joel Fernandes,
	Frederic Weisbecker, Thomas Gleixner, Qais Yousef

On 15/02/21 16:02, Vincent Guittot wrote:
> On Fri, 12 Feb 2021 at 20:19, Valentin Schneider
> <valentin.schneider@arm.com> wrote:
>> I don't think there is anything inherently wrong with it - the
>> nohz_idle_balance() call resulting from the kick_ilb() IPI will just bail
>> out due to the flags being cleared here. This wasn't immediately clear to
>> me however.
>
> In fact, I forgot to replace the WARN_ON in nohz_csd_func() by a
> simple return as reported by kernel test robot / oliver.sang@intel.com
>

Can't that actually be a problem? kick_ilb() says:

         * Access to rq::nohz_csd is serialized by NOHZ_KICK_MASK; he who sets
         * the first flag owns it; cleared by nohz_csd_func().

So if you have:

  kick_ilb() -> kicks CPU42

And then said CPU42 goes through, before nohz_csd_func(),:

  do_idle() -> nohz_run_idle_balance()

you could have yet another CPU do:

  kick_ilb() -> kicks CPU42

which would break rq->nohz_csd serialization.

>>
>> > +}
>> > +


* Re: [PATCH 6/7 v3] sched/fair: trigger the update of blocked load on newly idle cpu
  2021-02-17 11:51       ` Valentin Schneider
@ 2021-02-17 15:34         ` Vincent Guittot
  0 siblings, 0 replies; 13+ messages in thread
From: Vincent Guittot @ 2021-02-17 15:34 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, linux-kernel, Joel Fernandes,
	Frederic Weisbecker, Thomas Gleixner, Qais Yousef

On Wed, 17 Feb 2021 at 12:51, Valentin Schneider
<valentin.schneider@arm.com> wrote:
>
> On 15/02/21 16:02, Vincent Guittot wrote:
> > On Fri, 12 Feb 2021 at 20:19, Valentin Schneider
> > <valentin.schneider@arm.com> wrote:
> >> I don't think there is anything inherently wrong with it - the
> >> nohz_idle_balance() call resulting from the kick_ilb() IPI will just bail
> >> out due to the flags being cleared here. This wasn't immediately clear to
> >> me however.
> >
> > In fact, I forgot to replace the WARN_ON in nohz_csd_func() by a
> > simple return as reported by kernel test robot / oliver.sang@intel.com
> >
>
> Can't that actually be a problem? kick_ilb() says:
>
>          * Access to rq::nohz_csd is serialized by NOHZ_KICK_MASK; he who sets
>          * the first flag owns it; cleared by nohz_csd_func().
>
> So if you have:
>
>   kick_ilb() -> kicks CPU42
>
> And then said CPU42 goes through, before nohz_csd_func(),:
>
>   do_idle() -> nohz_run_idle_balance()
>
> you could have yet another CPU do:
>
>   kick_ilb() -> kicks CPU42
>
> which would break rq->nohz_csd serialization.

Yeah, there are even further problems: I get some rcu_sched logs on my
large server with one benchmark and one specific parameter, which I
can't reproduce on my smaller system. Right now, I'm working on making
both mechanisms exclusive, which should mainly be a matter of testing
whether this_cpu is set in nohz.idle_cpus_mask.
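
Something along these lines (untested, just to illustrate the direction):

	void nohz_run_idle_balance(int cpu)
	{
		unsigned int flags;

		/*
		 * If this cpu is still set in nohz.idle_cpus_mask, a remote
		 * kick_ilb() may own rq::nohz_csd: leave the update to the
		 * softirq ILB so the two mechanisms stay exclusive.
		 */
		if (cpumask_test_cpu(cpu, nohz.idle_cpus_mask))
			return;

		flags = atomic_fetch_andnot(NOHZ_KICK_MASK, nohz_flags(cpu));

		if (flags && !need_resched()) {
			struct rq *rq = cpu_rq(cpu);

			rq->nohz_idle_balance = flags;
			nohz_idle_balance(rq, CPU_IDLE);
		}
	}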

>
> >>
> >> > +}
> >> > +
