stable.vger.kernel.org archive mirror
* [PATCH v3] sched/fair: handle case of task_h_load() returning 0
@ 2020-07-10 15:24 Vincent Guittot
  2020-07-16  0:27 ` Sasha Levin
  2020-07-17 11:21 ` [tip: sched/urgent] " tip-bot2 for Vincent Guittot
  0 siblings, 2 replies; 4+ messages in thread
From: Vincent Guittot @ 2020-07-10 15:24 UTC
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, linux-kernel
  Cc: valentin.schneider, Vincent Guittot, stable

task_h_load() can return 0 in some situations, for example when running
stress-ng mmapfork, which forks thousands of threads, in a sched group on
a 224-core system. Load balancing doesn't handle this correctly:
env->imbalance never decreases, so detach_tasks() stops pulling tasks only
after reaching loop_max, which can be equal to the number of running tasks
of the cfs. Make sure that imbalance decreases by at least 1 per detached
task.

Misfit task handling is the other feature that doesn't handle this
situation correctly, although the problem is probably harder to hit there
because heterogeneous systems have fewer CPUs and fewer running tasks.

We can't simply make task_h_load() return at least 1, because that would
require handling underflow in other places.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: <stable@vger.kernel.org> # v4.4+
---

Changes v3:
- Fix warning about cast reported by lkp@intel.com

 kernel/sched/fair.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b9b9f19e80c1..71a372e3707a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4049,7 +4049,11 @@ static inline void update_misfit_status(struct task_struct *p, struct rq *rq)
 		return;
 	}
 
-	rq->misfit_task_load = task_h_load(p);
+	/*
+	 * Make sure that misfit_task_load will not be null even if
+	 * task_h_load() returns 0.
+	 */
+	rq->misfit_task_load = max_t(unsigned long, task_h_load(p), 1);
 }
 
 #else /* CONFIG_SMP */
@@ -7648,7 +7652,14 @@ static int detach_tasks(struct lb_env *env)
 
 		switch (env->migration_type) {
 		case migrate_load:
-			load = task_h_load(p);
+			/*
+			 * Depending of the number of CPUs and tasks and the
+			 * cgroup hierarchy, task_h_load() can return a null
+			 * value. Make sure that env->imbalance decreases
+			 * otherwise detach_tasks() will stop only after
+			 * detaching up to loop_max tasks.
+			 */
+			load = max_t(unsigned long, task_h_load(p), 1);
 
 			if (sched_feat(LB_MIN) &&
 			    load < 16 && !env->sd->nr_balance_failed)
-- 
2.17.1



* Re: [PATCH v3] sched/fair: handle case of task_h_load() returning 0
  2020-07-10 15:24 [PATCH v3] sched/fair: handle case of task_h_load() returning 0 Vincent Guittot
@ 2020-07-16  0:27 ` Sasha Levin
  2020-07-16  7:21   ` Vincent Guittot
  2020-07-17 11:21 ` [tip: sched/urgent] " tip-bot2 for Vincent Guittot
  1 sibling, 1 reply; 4+ messages in thread
From: Sasha Levin @ 2020-07-16  0:27 UTC
  To: Sasha Levin, Vincent Guittot, mingo, peterz, juri.lelli
  Cc: valentin.schneider, stable

Hi

[This is an automated email]

This commit has been processed because it contains a -stable tag.
The stable tag indicates that it's relevant for the following trees: 4.4+

The bot has tested the following trees: v5.7.8, v5.4.51, v4.19.132, v4.14.188, v4.9.230, v4.4.230.

v5.7.8: Build OK!
v5.4.51: Failed to apply! Possible dependencies:
    0b0695f2b34a4 ("sched/fair: Rework load_balance()")
    490ba971d8b49 ("sched/fair: Clean up asym packing")
    a349834703010 ("sched/fair: Rename sg_lb_stats::sum_nr_running to sum_h_nr_running")
    fcf0553db6f4c ("sched/fair: Remove meaningless imbalance calculation")

v4.19.132: Failed to apply! Possible dependencies:
    0b0695f2b34a4 ("sched/fair: Rework load_balance()")
    1c1b8a7b03ef5 ("sched/fair: Replace source_load() & target_load() with weighted_cpuload()")
    3b1baa6496e6b ("sched/fair: Add 'group_misfit_task' load-balance type")
    4ad3831a9d4af ("sched/fair: Don't move tasks to lower capacity CPUs unless necessary")
    575638d1047eb ("sched/core: Change root_domain->overload type to int")
    630246a06ae2a ("sched/fair: Clean-up update_sg_lb_stats parameters")
    6aa140fa45089 ("sched/topology: Reference the Energy Model of CPUs when available")
    757ffdd705ee9 ("sched/fair: Set rq->rd->overload when misfit")
    a3df067974c52 ("sched/fair: Rename weighted_cpuload() to cpu_runnable_load()")
    cad68e552e777 ("sched/fair: Consider misfit tasks when load-balancing")
    dbbad719449e0 ("sched/fair: Change 'prefer_sibling' type to bool")
    e90c8fe15a3bf ("sched/fair: Wrap rq->rd->overload accesses with READ/WRITE_ONCE()")
    fdf5f315d5cfa ("sched/fair: Disable LB_BIAS by default")

v4.14.188: Failed to apply! Possible dependencies:
    0b0695f2b34a4 ("sched/fair: Rework load_balance()")
    1c1b8a7b03ef5 ("sched/fair: Replace source_load() & target_load() with weighted_cpuload()")
    2a2f5d4e44ed1 ("sched/fair: Rewrite cfs_rq->removed_*avg")
    3b1baa6496e6b ("sched/fair: Add 'group_misfit_task' load-balance type")
    7f65ea42eb00b ("sched/fair: Add util_est on top of PELT")
    97fb7a0a8944b ("sched: Clean up and harmonize the coding style of the scheduler code base")
    a3df067974c52 ("sched/fair: Rename weighted_cpuload() to cpu_runnable_load()")
    cad68e552e777 ("sched/fair: Consider misfit tasks when load-balancing")
    d18be45dbfef2 ("sched/cpufreq: Split utilization signals")
    d4edd662ac165 ("sched/cpufreq: Use the DEADLINE utilization signal")
    f01415fdbfe83 ("sched/fair: Use 'unsigned long' for utilization, consistently")

v4.9.230: Failed to apply! Possible dependencies:
    3b1baa6496e6b ("sched/fair: Add 'group_misfit_task' load-balance type")
    4eb5aaa3af8a5 ("sched/headers: Prepare for new header dependencies before moving code to <linux/sched/autogroup.h>")
    4f17722c7256a ("sched/headers: Prepare for new header dependencies before moving code to <linux/sched/loadavg.h>")
    555570d744f81 ("sched/clock: Update static_key usage")
    5eca1c10cbaa9 ("sched/headers: Clean up <linux/sched.h>")
    6e84f31522f93 ("sched/headers: Prepare for new header dependencies before moving code to <linux/sched/mm.h>")
    7f65ea42eb00b ("sched/fair: Add util_est on top of PELT")
    983de5f97169a ("firmware: tegra: Add BPMP support")
    9881b024b7d76 ("sched/clock: Delay switching sched_clock to stable")
    acb04058de494 ("sched/clock: Fix hotplug crash")
    ae7e81c077d60 ("sched/headers: Prepare for new header dependencies before moving code to <uapi/linux/sched/types.h>")
    b52992c06c902 ("drm/i915: Support asynchronous waits on struct fence from i915_gem_request")
    ca791d7f42563 ("firmware: tegra: Add IVC library")
    e601757102cfd ("sched/headers: Prepare for new header dependencies before moving code to <linux/sched/clock.h>")
    ea8b1c4a6019f ("drivers: psci: PSCI checker module")
    ee6a3d19f15b9 ("sched/headers: Remove the <linux/topology.h> include from <linux/sched.h>")
    fd7712337ff09 ("sched/headers: Prepare to remove the <linux/gfp.h> include from <linux/sched.h>")

v4.4.230: Failed to apply! Possible dependencies:
    051f263098a90 ("IB/mlx5: Add driver cross-channel support")
    146d2f1af3245 ("IB/mlx5: Allocate a Transport Domain for each ucontext")
    2811ba51b0495 ("IB/mlx5: Add RoCE fields to Address Vector")
    37aa5c36aa70c ("IB/mlx5: Add UARs write-combining and non-cached mapping")
    3b1baa6496e6b ("sched/fair: Add 'group_misfit_task' load-balance type")
    3cca26069a4b7 ("IB/mlx5: Support IB device's callbacks for adding/deleting GIDs")
    3f89a643eb295 ("IB/mlx5: Extend query_device/port to support RoCE")
    5eca1c10cbaa9 ("sched/headers: Clean up <linux/sched.h>")
    6e84f31522f93 ("sched/headers: Prepare for new header dependencies before moving code to <linux/sched/mm.h>")
    7c2344c3bbf97 ("IB/mlx5: Implements disassociate_ucontext API")
    7f65ea42eb00b ("sched/fair: Add util_est on top of PELT")
    a8c21a5451d83 ("drm/etnaviv: add initial etnaviv DRM driver")
    a9709d6874d55 ("vhost: convert pre sorted vhost memory array to interval tree")
    b368d7cb8ceb7 ("IB/mlx5: Add hca_core_clock_offset to udata in init_ucontext")
    cfb5e088e26ae ("IB/mlx5: Add CQE version 1 support to user QPs and SRQs")
    d69e3bcf79764 ("IB/mlx5: Mmap the HCA's core clock register to user-space")
    ebd61f68e1c79 ("IB/mlx5: Support IB device's callback for getting the link layer")
    ee6a3d19f15b9 ("sched/headers: Remove the <linux/topology.h> include from <linux/sched.h>")
    fc24fc5e9530a ("IB/mlx5: Support IB device's callback for getting its netdev")
    fd7712337ff09 ("sched/headers: Prepare to remove the <linux/gfp.h> include from <linux/sched.h>")


NOTE: The patch will not be queued to stable trees until it is upstream.

How should we proceed with this patch?

-- 
Thanks
Sasha


* Re: [PATCH v3] sched/fair: handle case of task_h_load() returning 0
  2020-07-16  0:27 ` Sasha Levin
@ 2020-07-16  7:21   ` Vincent Guittot
  0 siblings, 0 replies; 4+ messages in thread
From: Vincent Guittot @ 2020-07-16  7:21 UTC
  To: Sasha Levin
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Valentin Schneider, # v4 . 16+

On Thu, 16 Jul 2020 at 02:27, Sasha Levin <sashal@kernel.org> wrote:
>
> Hi
>
> [This is an automated email]
>
> This commit has been processed because it contains a -stable tag.
> The stable tag indicates that it's relevant for the following trees: 4.4+
>
> The bot has tested the following trees: v5.7.8, v5.4.51, v4.19.132, v4.14.188, v4.9.230, v4.4.230.
>
> v5.7.8: Build OK!
> v5.4.51: Failed to apply! Possible dependencies:
>     0b0695f2b34a4 ("sched/fair: Rework load_balance()")
>     490ba971d8b49 ("sched/fair: Clean up asym packing")
>     a349834703010 ("sched/fair: Rename sg_lb_stats::sum_nr_running to sum_h_nr_running")
>     fcf0553db6f4c ("sched/fair: Remove meaningless imbalance calculation")
>
> v4.19.132: Failed to apply! Possible dependencies:
>     0b0695f2b34a4 ("sched/fair: Rework load_balance()")
>     1c1b8a7b03ef5 ("sched/fair: Replace source_load() & target_load() with weighted_cpuload()")
>     3b1baa6496e6b ("sched/fair: Add 'group_misfit_task' load-balance type")
>     4ad3831a9d4af ("sched/fair: Don't move tasks to lower capacity CPUs unless necessary")
>     575638d1047eb ("sched/core: Change root_domain->overload type to int")
>     630246a06ae2a ("sched/fair: Clean-up update_sg_lb_stats parameters")
>     6aa140fa45089 ("sched/topology: Reference the Energy Model of CPUs when available")
>     757ffdd705ee9 ("sched/fair: Set rq->rd->overload when misfit")
>     a3df067974c52 ("sched/fair: Rename weighted_cpuload() to cpu_runnable_load()")
>     cad68e552e777 ("sched/fair: Consider misfit tasks when load-balancing")
>     dbbad719449e0 ("sched/fair: Change 'prefer_sibling' type to bool")
>     e90c8fe15a3bf ("sched/fair: Wrap rq->rd->overload accesses with READ/WRITE_ONCE()")
>     fdf5f315d5cfa ("sched/fair: Disable LB_BIAS by default")
>
> v4.14.188: Failed to apply! Possible dependencies:
>     0b0695f2b34a4 ("sched/fair: Rework load_balance()")
>     1c1b8a7b03ef5 ("sched/fair: Replace source_load() & target_load() with weighted_cpuload()")
>     2a2f5d4e44ed1 ("sched/fair: Rewrite cfs_rq->removed_*avg")
>     3b1baa6496e6b ("sched/fair: Add 'group_misfit_task' load-balance type")
>     7f65ea42eb00b ("sched/fair: Add util_est on top of PELT")
>     97fb7a0a8944b ("sched: Clean up and harmonize the coding style of the scheduler code base")
>     a3df067974c52 ("sched/fair: Rename weighted_cpuload() to cpu_runnable_load()")
>     cad68e552e777 ("sched/fair: Consider misfit tasks when load-balancing")
>     d18be45dbfef2 ("sched/cpufreq: Split utilization signals")
>     d4edd662ac165 ("sched/cpufreq: Use the DEADLINE utilization signal")
>     f01415fdbfe83 ("sched/fair: Use 'unsigned long' for utilization, consistently")
>
> v4.9.230: Failed to apply! Possible dependencies:
>     3b1baa6496e6b ("sched/fair: Add 'group_misfit_task' load-balance type")
>     4eb5aaa3af8a5 ("sched/headers: Prepare for new header dependencies before moving code to <linux/sched/autogroup.h>")
>     4f17722c7256a ("sched/headers: Prepare for new header dependencies before moving code to <linux/sched/loadavg.h>")
>     555570d744f81 ("sched/clock: Update static_key usage")
>     5eca1c10cbaa9 ("sched/headers: Clean up <linux/sched.h>")
>     6e84f31522f93 ("sched/headers: Prepare for new header dependencies before moving code to <linux/sched/mm.h>")
>     7f65ea42eb00b ("sched/fair: Add util_est on top of PELT")
>     983de5f97169a ("firmware: tegra: Add BPMP support")
>     9881b024b7d76 ("sched/clock: Delay switching sched_clock to stable")
>     acb04058de494 ("sched/clock: Fix hotplug crash")
>     ae7e81c077d60 ("sched/headers: Prepare for new header dependencies before moving code to <uapi/linux/sched/types.h>")
>     b52992c06c902 ("drm/i915: Support asynchronous waits on struct fence from i915_gem_request")
>     ca791d7f42563 ("firmware: tegra: Add IVC library")
>     e601757102cfd ("sched/headers: Prepare for new header dependencies before moving code to <linux/sched/clock.h>")
>     ea8b1c4a6019f ("drivers: psci: PSCI checker module")
>     ee6a3d19f15b9 ("sched/headers: Remove the <linux/topology.h> include from <linux/sched.h>")
>     fd7712337ff09 ("sched/headers: Prepare to remove the <linux/gfp.h> include from <linux/sched.h>")
>
> v4.4.230: Failed to apply! Possible dependencies:
>     051f263098a90 ("IB/mlx5: Add driver cross-channel support")
>     146d2f1af3245 ("IB/mlx5: Allocate a Transport Domain for each ucontext")
>     2811ba51b0495 ("IB/mlx5: Add RoCE fields to Address Vector")
>     37aa5c36aa70c ("IB/mlx5: Add UARs write-combining and non-cached mapping")
>     3b1baa6496e6b ("sched/fair: Add 'group_misfit_task' load-balance type")
>     3cca26069a4b7 ("IB/mlx5: Support IB device's callbacks for adding/deleting GIDs")
>     3f89a643eb295 ("IB/mlx5: Extend query_device/port to support RoCE")
>     5eca1c10cbaa9 ("sched/headers: Clean up <linux/sched.h>")
>     6e84f31522f93 ("sched/headers: Prepare for new header dependencies before moving code to <linux/sched/mm.h>")
>     7c2344c3bbf97 ("IB/mlx5: Implements disassociate_ucontext API")
>     7f65ea42eb00b ("sched/fair: Add util_est on top of PELT")
>     a8c21a5451d83 ("drm/etnaviv: add initial etnaviv DRM driver")
>     a9709d6874d55 ("vhost: convert pre sorted vhost memory array to interval tree")
>     b368d7cb8ceb7 ("IB/mlx5: Add hca_core_clock_offset to udata in init_ucontext")
>     cfb5e088e26ae ("IB/mlx5: Add CQE version 1 support to user QPs and SRQs")
>     d69e3bcf79764 ("IB/mlx5: Mmap the HCA's core clock register to user-space")
>     ebd61f68e1c79 ("IB/mlx5: Support IB device's callback for getting the link layer")
>     ee6a3d19f15b9 ("sched/headers: Remove the <linux/topology.h> include from <linux/sched.h>")
>     fc24fc5e9530a ("IB/mlx5: Support IB device's callback for getting its netdev")
>     fd7712337ff09 ("sched/headers: Prepare to remove the <linux/gfp.h> include from <linux/sched.h>")
>
>
> NOTE: The patch will not be queued to stable trees until it is upstream.
>
> How should we proceed with this patch?

Once it is queued, I will provide a backport for the stable
kernels that can't apply it.

>
> --
> Thanks
> Sasha


* [tip: sched/urgent] sched/fair: handle case of task_h_load() returning 0
  2020-07-10 15:24 [PATCH v3] sched/fair: handle case of task_h_load() returning 0 Vincent Guittot
  2020-07-16  0:27 ` Sasha Levin
@ 2020-07-17 11:21 ` tip-bot2 for Vincent Guittot
  1 sibling, 0 replies; 4+ messages in thread
From: tip-bot2 for Vincent Guittot @ 2020-07-17 11:21 UTC
  To: linux-tip-commits
  Cc: Vincent Guittot, Peter Zijlstra (Intel),
	Valentin Schneider, Dietmar Eggemann, stable, x86, LKML

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     01cfcde9c26d8555f0e6e9aea9d6049f87683998
Gitweb:        https://git.kernel.org/tip/01cfcde9c26d8555f0e6e9aea9d6049f87683998
Author:        Vincent Guittot <vincent.guittot@linaro.org>
AuthorDate:    Fri, 10 Jul 2020 17:24:26 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 16 Jul 2020 23:19:48 +02:00

sched/fair: handle case of task_h_load() returning 0

task_h_load() can return 0 in some situations, for example when running
stress-ng mmapfork, which forks thousands of threads, in a sched group on
a 224-core system. Load balancing doesn't handle this correctly:
env->imbalance never decreases, so detach_tasks() stops pulling tasks only
after reaching loop_max, which can be equal to the number of running tasks
of the cfs. Make sure that imbalance decreases by at least 1 per detached
task.

Misfit task handling is the other feature that doesn't handle this
situation correctly, although the problem is probably harder to hit there
because heterogeneous systems have fewer CPUs and fewer running tasks.

We can't simply make task_h_load() return at least 1, because that would
require handling underflow in other places.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: <stable@vger.kernel.org> # v4.4+
Link: https://lkml.kernel.org/r/20200710152426.16981-1-vincent.guittot@linaro.org
---
 kernel/sched/fair.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 658aa7a..04fa8db 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4039,7 +4039,11 @@ static inline void update_misfit_status(struct task_struct *p, struct rq *rq)
 		return;
 	}
 
-	rq->misfit_task_load = task_h_load(p);
+	/*
+	 * Make sure that misfit_task_load will not be null even if
+	 * task_h_load() returns 0.
+	 */
+	rq->misfit_task_load = max_t(unsigned long, task_h_load(p), 1);
 }
 
 #else /* CONFIG_SMP */
@@ -7638,7 +7642,14 @@ static int detach_tasks(struct lb_env *env)
 
 		switch (env->migration_type) {
 		case migrate_load:
-			load = task_h_load(p);
+			/*
+			 * Depending of the number of CPUs and tasks and the
+			 * cgroup hierarchy, task_h_load() can return a null
+			 * value. Make sure that env->imbalance decreases
+			 * otherwise detach_tasks() will stop only after
+			 * detaching up to loop_max tasks.
+			 */
+			load = max_t(unsigned long, task_h_load(p), 1);
 
 			if (sched_feat(LB_MIN) &&
 			    load < 16 && !env->sd->nr_balance_failed)
