All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vincent Guittot <vincent.guittot@linaro.org>
To: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
	dietmar.eggemann@arm.com, rostedt@goodmis.org,
	bsegall@google.com, mgorman@suse.de, bristot@redhat.com,
	vschneid@redhat.com, linux-kernel@vger.kernel.org,
	zhangqiao22@huawei.com
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Subject: [PATCH v2] sched/fair: sanitize vruntime of entity being migrated
Date: Fri, 17 Mar 2023 17:08:10 +0100	[thread overview]
Message-ID: <20230317160810.107988-1-vincent.guittot@linaro.org> (raw)

Commit 829c1651e9c4 ("sched/fair: sanitize vruntime of entity being placed")
fixes an overflowing bug, but ignore a case that se->exec_start is reset
after a migration.

For fixing this case, we delay the reset of se->exec_start after
placing the entity which se->exec_start to detect long sleeping task.

In order to take into account a possible divergence between the clock_task
of 2 rqs, we increase the threshold to around 104 days.


Fixes: 829c1651e9c4 ("sched/fair: sanitize vruntime of entity being placed")
Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---

My last proposal was not yet correct as the exec_start was not always
reset after migrating a task. I finally found this solution which keeps
the long sleep detection to one place as well as the reset of se->exec_start.

 kernel/sched/fair.c | 57 +++++++++++++++++++++++++++++++++++----------
 1 file changed, 45 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0f499e9a74b5..421173fde351 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4648,11 +4648,33 @@ static void check_spread(struct cfs_rq *cfs_rq, struct sched_entity *se)
 #endif
 }
 
+static inline bool entity_is_long_sleeper(struct sched_entity *se)
+{
+	struct cfs_rq *cfs_rq;
+	u64 sleep_time;
+
+	if (se->exec_start == 0)
+		return false;
+
+	cfs_rq = cfs_rq_of(se);
+
+	sleep_time = rq_clock_task(rq_of(cfs_rq));
+
+	/* Happen while migrating because of clock task divergence */
+	if (sleep_time <= se->exec_start)
+		return false;
+
+	sleep_time -= se->exec_start;
+	if (sleep_time > ((1ULL << 63) / scale_load_down(NICE_0_LOAD)))
+		return true;
+
+	return false;
+}
+
 static void
 place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
 {
 	u64 vruntime = cfs_rq->min_vruntime;
-	u64 sleep_time;
 
 	/*
 	 * The 'current' period is already promised to the current tasks,
@@ -4684,13 +4706,24 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
 
 	/*
 	 * Pull vruntime of the entity being placed to the base level of
-	 * cfs_rq, to prevent boosting it if placed backwards.  If the entity
-	 * slept for a long time, don't even try to compare its vruntime with
-	 * the base as it may be too far off and the comparison may get
-	 * inversed due to s64 overflow.
-	 */
-	sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
-	if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
+	 * cfs_rq, to prevent boosting it if placed backwards.
+	 * However, min_vruntime can advance much faster than real time, with
+	 * the extreme being when an entity with the minimal weight always runs
+	 * on the cfs_rq. If the waking entity slept for a long time, its
+	 * vruntime difference from min_vruntime may overflow s64 and their
+	 * comparison may get inversed, so ignore the entity's original
+	 * vruntime in that case.
+	 * The maximal vruntime speedup is given by the ratio of normal to
+	 * minimal weight: scale_load_down(NICE_0_LOAD) / MIN_SHARES.
+	 * When placing a migrated waking entity, its exec_start has been set
+	 * from a different rq. In order to take into account a possible
+	 * divergence between new and prev rq's clocks task because of irq and
+	 * stolen time, we take an additional margin.
+	 * So, cutting off on the sleep time of
+	 *     2^63 / scale_load_down(NICE_0_LOAD) ~ 104 days
+	 * should be safe.
+	 */
+	if (entity_is_long_sleeper(se))
 		se->vruntime = vruntime;
 	else
 		se->vruntime = max_vruntime(se->vruntime, vruntime);
@@ -4770,6 +4803,9 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 
 	if (flags & ENQUEUE_WAKEUP)
 		place_entity(cfs_rq, se, 0);
+	/* Entity has migrated, no longer consider this task hot */
+	if (flags & ENQUEUE_MIGRATED)
+		se->exec_start = 0;
 
 	check_schedstat_required();
 	update_stats_enqueue_fair(cfs_rq, se, flags);
@@ -7665,9 +7701,6 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)
 	/* Tell new CPU we are migrated */
 	se->avg.last_update_time = 0;
 
-	/* We have migrated, no longer consider this task hot */
-	se->exec_start = 0;
-
 	update_scan_period(p, new_cpu);
 }
 
@@ -8701,7 +8734,7 @@ static void attach_task(struct rq *rq, struct task_struct *p)
 	lockdep_assert_rq_held(rq);
 
 	WARN_ON_ONCE(task_rq(p) != rq);
-	activate_task(rq, p, ENQUEUE_NOCLOCK);
+	activate_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_MIGRATED);
 	check_preempt_curr(rq, p, 0);
 }
 
-- 
2.34.1


             reply	other threads:[~2023-03-17 16:08 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-17 16:08 Vincent Guittot [this message]
2023-03-18  7:45 ` [PATCH v2] sched/fair: sanitize vruntime of entity being migrated Zhang Qiao
2023-03-20 12:29   ` Dietmar Eggemann
2023-03-20 13:26     ` Vincent Guittot
2023-03-21 10:02 ` Peter Zijlstra
2023-03-21 10:29   ` Dietmar Eggemann
2023-03-21 10:49     ` Peter Zijlstra
2023-03-21 11:12       ` Vincent Guittot
2023-03-21 11:13       ` Dietmar Eggemann
2023-03-21 12:26         ` Peter Zijlstra
2023-03-21 12:28 ` Peter Zijlstra
2023-03-21 12:38   ` Vincent Guittot
2023-03-22  9:07 ` [tip: sched/urgent] sched/fair: Sanitize " tip-bot2 for Vincent Guittot
2023-03-24  4:05 ` [PATCH v2] sched/fair: sanitize " Chen Yu
  -- strict thread matches above, loose matches on Subject: below --
2023-03-06 13:24 Zhang Qiao
2023-03-06 13:53 ` Vincent Guittot
2023-03-07 10:26   ` Vincent Guittot
2023-03-07 11:05     ` Zhang Qiao
2023-03-07 13:41     ` Zhang Qiao
2023-03-08  8:01       ` Vincent Guittot
2023-03-08 12:55         ` Vincent Guittot
2023-03-09  8:37           ` Zhang Qiao
2023-03-09  9:09             ` Dietmar Eggemann
2023-03-09  9:30               ` Zhang Qiao
2023-03-09 10:48             ` Vincent Guittot
2023-03-09 14:23               ` Zhang Qiao
2023-03-07  2:16 ` kernel test robot
2023-03-07 12:45 ` Dietmar Eggemann
2023-03-07 14:06   ` Zhang Qiao
2023-03-09  9:43   ` Zhang Qiao
2023-03-08 14:33 ` Chen Yu
2023-03-09 13:05 ` Peter Zijlstra
2023-03-09 13:34   ` Vincent Guittot
2023-03-09 14:28     ` Peter Zijlstra
2023-03-09 14:36       ` Peter Zijlstra
2023-03-09 15:14         ` Vincent Guittot
2023-03-10 14:29           ` Vincent Guittot
2023-03-11  9:57             ` Zhang Qiao
2023-03-13 14:23               ` Vincent Guittot
2023-03-14 11:03                 ` Zhang Qiao
2023-03-14 13:26                   ` Vincent Guittot
2023-03-14 13:38                     ` Zhang Qiao
2023-03-14 13:39                       ` Vincent Guittot
2023-03-14 15:32                         ` Vincent Guittot
2023-03-15  9:16                           ` Zhang Qiao
2023-03-15 15:30                             ` Vincent Guittot
2023-03-13  9:06             ` Dietmar Eggemann
2023-03-13 18:17               ` Dietmar Eggemann
2023-03-14  7:41                 ` Vincent Guittot
2023-03-14 12:07                   ` Peter Zijlstra
2023-03-14 13:24                     ` Vincent Guittot
2023-03-14 17:16                       ` Peter Zijlstra
2023-03-15  7:18                         ` Vincent Guittot
2023-03-15  8:42                           ` Vincent Guittot
2023-03-15 10:15                             ` Dietmar Eggemann
2023-03-15 10:21                               ` Vincent Guittot
2023-03-15 13:35                                 ` Dietmar Eggemann
2023-03-15 15:32                                   ` Vincent Guittot
2023-03-14 13:29                     ` Dietmar Eggemann
2023-03-14 13:37                       ` Dietmar Eggemann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230317160810.107988-1-vincent.guittot@linaro.org \
    --to=vincent.guittot@linaro.org \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=vschneid@redhat.com \
    --cc=zhangqiao22@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.