linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] sched/fair: sanitize vruntime of entity being migrated
@ 2023-03-06 13:24 Zhang Qiao
  2023-03-06 13:53 ` Vincent Guittot
                   ` (4 more replies)
  0 siblings, 5 replies; 59+ messages in thread
From: Zhang Qiao @ 2023-03-06 13:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, vschneid, rkagan,
	zhangqiao22

Commit 829c1651e9c4 ("sched/fair: sanitize vruntime of
entity being placed") fix an overflowing bug, but ignore
a case that se->exec_start is reset after a migration.

For fixing this case, we reset the vruntime of a long
sleeping task in migrate_task_rq_fair().

Fixes: 829c1651e9c4 ("sched/fair: sanitize vruntime of entity being placed")
Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
---

v1 -> v2:
- fix some typos and update comments
- reformat the patch

---
 kernel/sched/fair.c | 76 ++++++++++++++++++++++++++++++++-------------
 1 file changed, 55 insertions(+), 21 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7a1b1f855b96..74c9918ffe76 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4648,11 +4648,45 @@ static void check_spread(struct cfs_rq *cfs_rq, struct sched_entity *se)
 #endif
 }
 
+static inline bool entity_is_long_sleep(struct sched_entity *se)
+{
+	struct cfs_rq *cfs_rq;
+	u64 sleep_time;
+
+	if (se->exec_start == 0)
+		return false;
+
+	cfs_rq = cfs_rq_of(se);
+	sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
+	if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
+		return true;
+
+	return false;
+}
+
+static inline u64 sched_sleeper_credit(struct sched_entity *se)
+{
+	unsigned long thresh;
+
+	if (se_is_idle(se))
+		thresh = sysctl_sched_min_granularity;
+	else
+		thresh = sysctl_sched_latency;
+
+	/*
+	 * Halve their sleep time's effect, to allow
+	 * for a gentler effect of sleepers:
+	 */
+	if (sched_feat(GENTLE_FAIR_SLEEPERS))
+		thresh >>= 1;
+
+	return thresh;
+}
+
 static void
 place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
 {
 	u64 vruntime = cfs_rq->min_vruntime;
-	u64 sleep_time;
 
 	/*
 	 * The 'current' period is already promised to the current tasks,
@@ -4664,23 +4698,8 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
 		vruntime += sched_vslice(cfs_rq, se);
 
 	/* sleeps up to a single latency don't count. */
-	if (!initial) {
-		unsigned long thresh;
-
-		if (se_is_idle(se))
-			thresh = sysctl_sched_min_granularity;
-		else
-			thresh = sysctl_sched_latency;
-
-		/*
-		 * Halve their sleep time's effect, to allow
-		 * for a gentler effect of sleepers:
-		 */
-		if (sched_feat(GENTLE_FAIR_SLEEPERS))
-			thresh >>= 1;
-
-		vruntime -= thresh;
-	}
+	if (!initial)
+		vruntime -= sched_sleeper_credit(se);
 
 	/*
 	 * Pull vruntime of the entity being placed to the base level of
@@ -4689,8 +4708,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
 	 * the base as it may be too far off and the comparison may get
 	 * inversed due to s64 overflow.
 	 */
-	sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
-	if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
+	if (entity_is_long_sleep(se))
 		se->vruntime = vruntime;
 	else
 		se->vruntime = max_vruntime(se->vruntime, vruntime);
@@ -7635,7 +7653,23 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)
 	if (READ_ONCE(p->__state) == TASK_WAKING) {
 		struct cfs_rq *cfs_rq = cfs_rq_of(se);
 
-		se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);
+		/*
+		 * We determine whether a task sleeps for long by checking
+		 * se->exec_start, and if it is, we sanitize its vruntime at
+		 * place_entity(). However, after a migration, this detection
+		 * method fails due to se->exec_start being reset.
+		 *
+		 * For fixing this case, we add the same check here. For a task
+		 * which has slept for a long time, its vruntime should be reset
+		 * to cfs_rq->min_vruntime with a sleep credit. Because waking
+		 * task's vruntime will be added to cfs_rq->min_vruntime when
+		 * enqueue, we only need to reset the se->vruntime of waking task
+		 * to a credit here.
+		 */
+		if (entity_is_long_sleep(se))
+			se->vruntime = -sched_sleeper_credit(se);
+		else
+			se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);
 	}
 
 	if (!task_on_rq_migrating(p)) {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread
* [PATCH v2] sched/fair: sanitize vruntime of entity being migrated
@ 2023-03-17 16:08 Vincent Guittot
  2023-03-18  7:45 ` Zhang Qiao
                   ` (3 more replies)
  0 siblings, 4 replies; 59+ messages in thread
From: Vincent Guittot @ 2023-03-17 16:08 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, bristot, vschneid, linux-kernel, zhangqiao22
  Cc: Vincent Guittot

Commit 829c1651e9c4 ("sched/fair: sanitize vruntime of entity being placed")
fixes an overflowing bug, but ignore a case that se->exec_start is reset
after a migration.

For fixing this case, we delay the reset of se->exec_start after
placing the entity which se->exec_start to detect long sleeping task.

In order to take into account a possible divergence between the clock_task
of 2 rqs, we increase the threshold to around 104 days.


Fixes: 829c1651e9c4 ("sched/fair: sanitize vruntime of entity being placed")
Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---

My last proposal was not yet correct as the exec_start was not always
reset after migrating a task. I finally found this solution which keeps
the long sleep detection to one place as well as the reset of se->exec_start.

 kernel/sched/fair.c | 57 +++++++++++++++++++++++++++++++++++----------
 1 file changed, 45 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0f499e9a74b5..421173fde351 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4648,11 +4648,33 @@ static void check_spread(struct cfs_rq *cfs_rq, struct sched_entity *se)
 #endif
 }
 
+static inline bool entity_is_long_sleeper(struct sched_entity *se)
+{
+	struct cfs_rq *cfs_rq;
+	u64 sleep_time;
+
+	if (se->exec_start == 0)
+		return false;
+
+	cfs_rq = cfs_rq_of(se);
+
+	sleep_time = rq_clock_task(rq_of(cfs_rq));
+
+	/* Happen while migrating because of clock task divergence */
+	if (sleep_time <= se->exec_start)
+		return false;
+
+	sleep_time -= se->exec_start;
+	if (sleep_time > ((1ULL << 63) / scale_load_down(NICE_0_LOAD)))
+		return true;
+
+	return false;
+}
+
 static void
 place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
 {
 	u64 vruntime = cfs_rq->min_vruntime;
-	u64 sleep_time;
 
 	/*
 	 * The 'current' period is already promised to the current tasks,
@@ -4684,13 +4706,24 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
 
 	/*
 	 * Pull vruntime of the entity being placed to the base level of
-	 * cfs_rq, to prevent boosting it if placed backwards.  If the entity
-	 * slept for a long time, don't even try to compare its vruntime with
-	 * the base as it may be too far off and the comparison may get
-	 * inversed due to s64 overflow.
-	 */
-	sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
-	if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
+	 * cfs_rq, to prevent boosting it if placed backwards.
+	 * However, min_vruntime can advance much faster than real time, with
+	 * the extreme being when an entity with the minimal weight always runs
+	 * on the cfs_rq. If the waking entity slept for a long time, its
+	 * vruntime difference from min_vruntime may overflow s64 and their
+	 * comparison may get inversed, so ignore the entity's original
+	 * vruntime in that case.
+	 * The maximal vruntime speedup is given by the ratio of normal to
+	 * minimal weight: scale_load_down(NICE_0_LOAD) / MIN_SHARES.
+	 * When placing a migrated waking entity, its exec_start has been set
+	 * from a different rq. In order to take into account a possible
+	 * divergence between new and prev rq's clocks task because of irq and
+	 * stolen time, we take an additional margin.
+	 * So, cutting off on the sleep time of
+	 *     2^63 / scale_load_down(NICE_0_LOAD) ~ 104 days
+	 * should be safe.
+	 */
+	if (entity_is_long_sleeper(se))
 		se->vruntime = vruntime;
 	else
 		se->vruntime = max_vruntime(se->vruntime, vruntime);
@@ -4770,6 +4803,9 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 
 	if (flags & ENQUEUE_WAKEUP)
 		place_entity(cfs_rq, se, 0);
+	/* Entity has migrated, no longer consider this task hot */
+	if (flags & ENQUEUE_MIGRATED)
+		se->exec_start = 0;
 
 	check_schedstat_required();
 	update_stats_enqueue_fair(cfs_rq, se, flags);
@@ -7665,9 +7701,6 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)
 	/* Tell new CPU we are migrated */
 	se->avg.last_update_time = 0;
 
-	/* We have migrated, no longer consider this task hot */
-	se->exec_start = 0;
-
 	update_scan_period(p, new_cpu);
 }
 
@@ -8701,7 +8734,7 @@ static void attach_task(struct rq *rq, struct task_struct *p)
 	lockdep_assert_rq_held(rq);
 
 	WARN_ON_ONCE(task_rq(p) != rq);
-	activate_task(rq, p, ENQUEUE_NOCLOCK);
+	activate_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_MIGRATED);
 	check_preempt_curr(rq, p, 0);
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

end of thread, other threads:[~2023-03-24  4:05 UTC | newest]

Thread overview: 59+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-06 13:24 [PATCH v2] sched/fair: sanitize vruntime of entity being migrated Zhang Qiao
2023-03-06 13:53 ` Vincent Guittot
2023-03-07 10:26   ` Vincent Guittot
2023-03-07 11:05     ` Zhang Qiao
2023-03-07 13:41     ` Zhang Qiao
2023-03-08  8:01       ` Vincent Guittot
2023-03-08 12:55         ` Vincent Guittot
2023-03-09  8:37           ` Zhang Qiao
2023-03-09  9:09             ` Dietmar Eggemann
2023-03-09  9:30               ` Zhang Qiao
2023-03-09 10:48             ` Vincent Guittot
2023-03-09 14:23               ` Zhang Qiao
2023-03-07  2:16 ` kernel test robot
2023-03-07 12:45 ` Dietmar Eggemann
2023-03-07 14:06   ` Zhang Qiao
2023-03-09  9:43   ` Zhang Qiao
2023-03-08 14:33 ` Chen Yu
2023-03-09 13:05 ` Peter Zijlstra
2023-03-09 13:34   ` Vincent Guittot
2023-03-09 14:28     ` Peter Zijlstra
2023-03-09 14:36       ` Peter Zijlstra
2023-03-09 15:14         ` Vincent Guittot
2023-03-10 14:29           ` Vincent Guittot
2023-03-11  9:57             ` Zhang Qiao
2023-03-13 14:23               ` Vincent Guittot
2023-03-14 11:03                 ` Zhang Qiao
2023-03-14 13:26                   ` Vincent Guittot
2023-03-14 13:38                     ` Zhang Qiao
2023-03-14 13:39                       ` Vincent Guittot
2023-03-14 15:32                         ` Vincent Guittot
2023-03-15  9:16                           ` Zhang Qiao
2023-03-15 15:30                             ` Vincent Guittot
2023-03-13  9:06             ` Dietmar Eggemann
2023-03-13 18:17               ` Dietmar Eggemann
2023-03-14  7:41                 ` Vincent Guittot
2023-03-14 12:07                   ` Peter Zijlstra
2023-03-14 13:24                     ` Vincent Guittot
2023-03-14 17:16                       ` Peter Zijlstra
2023-03-15  7:18                         ` Vincent Guittot
2023-03-15  8:42                           ` Vincent Guittot
2023-03-15 10:15                             ` Dietmar Eggemann
2023-03-15 10:21                               ` Vincent Guittot
2023-03-15 13:35                                 ` Dietmar Eggemann
2023-03-15 15:32                                   ` Vincent Guittot
2023-03-14 13:29                     ` Dietmar Eggemann
2023-03-14 13:37                       ` Dietmar Eggemann
2023-03-17 16:08 Vincent Guittot
2023-03-18  7:45 ` Zhang Qiao
2023-03-20 12:29   ` Dietmar Eggemann
2023-03-20 13:26     ` Vincent Guittot
2023-03-21 10:02 ` Peter Zijlstra
2023-03-21 10:29   ` Dietmar Eggemann
2023-03-21 10:49     ` Peter Zijlstra
2023-03-21 11:12       ` Vincent Guittot
2023-03-21 11:13       ` Dietmar Eggemann
2023-03-21 12:26         ` Peter Zijlstra
2023-03-21 12:28 ` Peter Zijlstra
2023-03-21 12:38   ` Vincent Guittot
2023-03-24  4:05 ` Chen Yu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).