From: Zhang Qiao <zhangqiao22@huawei.com>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: <linux-kernel@vger.kernel.org>, <mingo@redhat.com>,
<juri.lelli@redhat.com>, <dietmar.eggemann@arm.com>,
<rostedt@goodmis.org>, <bsegall@google.com>, <mgorman@suse.de>,
<bristot@redhat.com>, <vschneid@redhat.com>, <rkagan@amazon.de>,
Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH v2] sched/fair: sanitize vruntime of entity being migrated
Date: Sat, 11 Mar 2023 17:57:04 +0800 [thread overview]
Message-ID: <ef2f07f1-fe3a-624f-52e7-1089138dc137@huawei.com> (raw)
In-Reply-To: <ZAs+zV0o9ShO7nLT@vingu-book>
在 2023/3/10 22:29, Vincent Guittot 写道:
> Le jeudi 09 mars 2023 à 16:14:38 (+0100), Vincent Guittot a écrit :
>> On Thu, 9 Mar 2023 at 15:37, Peter Zijlstra <peterz@infradead.org> wrote:
>>>
>>> On Thu, Mar 09, 2023 at 03:28:25PM +0100, Peter Zijlstra wrote:
>>>> On Thu, Mar 09, 2023 at 02:34:05PM +0100, Vincent Guittot wrote:
>>>>
>>>>> Then, even if we don't clear exec_start before migrating and keep
>>>>> current value to be used in place_entity on the new cpu, we can't
>>>>> compare the rq_clock_task(rq_of(cfs_rq)) of 2 different rqs AFAICT
>>>>
>>>> Blergh -- indeed, irq and steal time can skew them between CPUs :/
>>>> I suppose we can fudge that... wait_start (which is basically what we're
>>>> making it do) also does that IIRC.
>>>>
>>>> I really dislike having this placement muck spreadout like proposed.
>>>
>>> Also, I think we might be over-engineering this, we don't care about
>>> accuracy at all, all we really care about is 'long-time'.
>>
>> you mean taking the patch 1/2 that you mentioned here to add a
>> migrated field:
>> https://lore.kernel.org/all/68832dfbb60fda030540b5f4e39c5801942689b1.1648228023.git.tim.c.chen@linux.intel.com/T/#ma5637eb8010f3f4a4abff778af8db705429d003b
>>
>> And assume that the divergence between the rq_clock_task() can be ignored ?
>>
>> That could probably work but we need to replace the (60LL *
>> NSEC_PER_SEC) by ((1ULL << 63) / NICE_0_LOAD) because 60sec divergence
>> would not be unrealistic.
>> and a comment to explain why it's acceptable
>
> Zhang,
>
> Could you try the patch below ?
> This is a rebase/merge/update of:
> -patch 1/2 above and
> -https://lore.kernel.org/lkml/20230209193107.1432770-1-rkagan@amazon.de/
I applyed and tested this patch, and it make hackbench slower.
According to my previous test results. The good result is 82.1(s).
But the result of this patch is 108.725(s).
> version1: v6.2
> version2: v6.2 + commit 829c1651e9c4
> version3: v6.2 + commit 829c1651e9c4 + this patch
>
> -------------------------------------------------
> version1 version2 version3
> test1 81.0 118.1 82.1
> test2 82.1 116.9 80.3
> test3 83.2 103.9 83.3
> avg(s) 82.1 113.0 81.9
>
> -------------------------------------------------
>
> The proposal accepts a divergence of up to 52 days between the 2 rqs.
>
> If this work, we will prepare a proper patch
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 63d242164b1a..cb8af0a137f7 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -550,6 +550,7 @@ struct sched_entity {
> struct rb_node run_node;
> struct list_head group_node;
> unsigned int on_rq;
> + unsigned int migrated;
>
> u64 exec_start;
> u64 sum_exec_runtime;
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 7a1b1f855b96..36acd9598b40 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1057,6 +1057,7 @@ update_stats_curr_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
> /*
> * We are starting a new run period:
> */
> + se->migrated = 0;
> se->exec_start = rq_clock_task(rq_of(cfs_rq));
> }
>
> @@ -4684,13 +4685,23 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
>
> /*
> * Pull vruntime of the entity being placed to the base level of
> - * cfs_rq, to prevent boosting it if placed backwards. If the entity
> - * slept for a long time, don't even try to compare its vruntime with
> - * the base as it may be too far off and the comparison may get
> - * inversed due to s64 overflow.
> + * cfs_rq, to prevent boosting it if placed backwards.
> + * However, min_vruntime can advance much faster than real time, with
> + * the exterme being when an entity with the minimal weight always runs
> + * on the cfs_rq. If the new entity slept for long, its vruntime
> + * difference from min_vruntime may overflow s64 and their comparison
> + * may get inversed, so ignore the entity's original vruntime in that
> + * case.
> + * The maximal vruntime speedup is given by the ratio of normal to
> + * minimal weight: NICE_0_LOAD / MIN_SHARES, so cutting off on the
why not is `scale_load_down(NICE_0_LOAD) / MIN_SHARES` here ?
> + * sleep time of 2^63 / NICE_0_LOAD should be safe.
> + * When placing a migrated waking entity, its exec_start has been set
> + * from a different rq. In order to take into account a possible
> + * divergence between new and prev rq's clocks task because of irq and
This divergence might be larger, it cause `sleep_time` maybe negative.
> + * stolen time, we take an additional margin.
> */
> sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
> - if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
> + if ((s64)sleep_time > (1ULL << 63) / NICE_0_LOAD / 2)> se->vruntime = vruntime;
> else
> se->vruntime = max_vruntime(se->vruntime, vruntime);
> @@ -7658,7 +7669,7 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)
> se->avg.last_update_time = 0;
>
> /* We have migrated, no longer consider this task hot */
> - se->exec_start = 0;
> + se->migrated = 1;
>
> update_scan_period(p, new_cpu);
> }
> @@ -8344,6 +8355,9 @@ static int task_hot(struct task_struct *p, struct lb_env *env)
> if (sysctl_sched_migration_cost == 0)
> return 0;
>
> + if (p->se.migrated)
> + return 0;
> +
> delta = rq_clock_task(env->src_rq) - p->se.exec_start;
>
> return delta < (s64)sysctl_sched_migration_cost;
>
>
>
>>
>>
>>>
>>>
> .
>
next prev parent reply other threads:[~2023-03-11 9:57 UTC|newest]
Thread overview: 59+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-03-06 13:24 [PATCH v2] sched/fair: sanitize vruntime of entity being migrated Zhang Qiao
2023-03-06 13:53 ` Vincent Guittot
2023-03-07 10:26 ` Vincent Guittot
2023-03-07 11:05 ` Zhang Qiao
2023-03-07 13:41 ` Zhang Qiao
2023-03-08 8:01 ` Vincent Guittot
2023-03-08 12:55 ` Vincent Guittot
2023-03-09 8:37 ` Zhang Qiao
2023-03-09 9:09 ` Dietmar Eggemann
2023-03-09 9:30 ` Zhang Qiao
2023-03-09 10:48 ` Vincent Guittot
2023-03-09 14:23 ` Zhang Qiao
2023-03-07 2:16 ` kernel test robot
2023-03-07 12:45 ` Dietmar Eggemann
2023-03-07 14:06 ` Zhang Qiao
2023-03-09 9:43 ` Zhang Qiao
2023-03-08 14:33 ` Chen Yu
2023-03-09 13:05 ` Peter Zijlstra
2023-03-09 13:34 ` Vincent Guittot
2023-03-09 14:28 ` Peter Zijlstra
2023-03-09 14:36 ` Peter Zijlstra
2023-03-09 15:14 ` Vincent Guittot
2023-03-10 14:29 ` Vincent Guittot
2023-03-11 9:57 ` Zhang Qiao [this message]
2023-03-13 14:23 ` Vincent Guittot
2023-03-14 11:03 ` Zhang Qiao
2023-03-14 13:26 ` Vincent Guittot
2023-03-14 13:38 ` Zhang Qiao
2023-03-14 13:39 ` Vincent Guittot
2023-03-14 15:32 ` Vincent Guittot
2023-03-15 9:16 ` Zhang Qiao
2023-03-15 15:30 ` Vincent Guittot
2023-03-13 9:06 ` Dietmar Eggemann
2023-03-13 18:17 ` Dietmar Eggemann
2023-03-14 7:41 ` Vincent Guittot
2023-03-14 12:07 ` Peter Zijlstra
2023-03-14 13:24 ` Vincent Guittot
2023-03-14 17:16 ` Peter Zijlstra
2023-03-15 7:18 ` Vincent Guittot
2023-03-15 8:42 ` Vincent Guittot
2023-03-15 10:15 ` Dietmar Eggemann
2023-03-15 10:21 ` Vincent Guittot
2023-03-15 13:35 ` Dietmar Eggemann
2023-03-15 15:32 ` Vincent Guittot
2023-03-14 13:29 ` Dietmar Eggemann
2023-03-14 13:37 ` Dietmar Eggemann
2023-03-17 16:08 Vincent Guittot
2023-03-18 7:45 ` Zhang Qiao
2023-03-20 12:29 ` Dietmar Eggemann
2023-03-20 13:26 ` Vincent Guittot
2023-03-21 10:02 ` Peter Zijlstra
2023-03-21 10:29 ` Dietmar Eggemann
2023-03-21 10:49 ` Peter Zijlstra
2023-03-21 11:12 ` Vincent Guittot
2023-03-21 11:13 ` Dietmar Eggemann
2023-03-21 12:26 ` Peter Zijlstra
2023-03-21 12:28 ` Peter Zijlstra
2023-03-21 12:38 ` Vincent Guittot
2023-03-24 4:05 ` Chen Yu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ef2f07f1-fe3a-624f-52e7-1089138dc137@huawei.com \
--to=zhangqiao22@huawei.com \
--cc=bristot@redhat.com \
--cc=bsegall@google.com \
--cc=dietmar.eggemann@arm.com \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=rkagan@amazon.de \
--cc=rostedt@goodmis.org \
--cc=vincent.guittot@linaro.org \
--cc=vschneid@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.