From: Dietmar Eggemann <dietmar.eggemann@arm.com>
To: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Yuyang Du <yuyang.du@intel.com>, Leo Yan <leo.yan@linaro.org>,
	Steve Muckle <steve.muckle@linaro.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	linux-kernel@vger.kernel.org, eas-dev@lists.linaro.org
Subject: Re: [PATCH RFC] sched/fair: let cpu's cfs_rq to reflect task migration
Date: Wed, 6 Apr 2016 19:53:53 +0100	[thread overview]
Message-ID: <57055B41.6000906@arm.com> (raw)
In-Reply-To: <20160406083702.GE18516@e105550-lin.cambridge.arm.com>

On 06/04/16 09:37, Morten Rasmussen wrote:
> On Tue, Apr 05, 2016 at 06:00:40PM +0100, Dietmar Eggemann wrote:
>> @@ -2893,8 +2906,12 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
>>         se->avg.last_update_time = cfs_rq->avg.last_update_time;
>>         cfs_rq->avg.load_avg += se->avg.load_avg;
>>         cfs_rq->avg.load_sum += se->avg.load_sum;
>> -       cfs_rq->avg.util_avg += se->avg.util_avg;
>> -       cfs_rq->avg.util_sum += se->avg.util_sum;
>> +
>> +       if (!entity_is_task(se))
>> +               return;
>> +
>> +       rq_of(cfs_rq)->cfs.avg.util_avg += se->avg.util_avg;
>> +       rq_of(cfs_rq)->cfs.avg.util_sum += se->avg.util_sum;
> 
> To me it seems that you cannot be sure that the rq_of(cfs_rq)->cfs.avg
> time stamp is aligned with se->avg time stamp, which is necessary before
> you can add/subtract two geometric series without introducing an error.
> 
> attach_entity_load_avg() is called (through a couple of other functions)
> from the for_each_sched_entity() loop in enqueue_task_fair() which works
> its way towards the root cfs_rq, i.e. rq_of(cfs_rq)->cfs. So in the loop
> iteration where you attach the task sched_entity, we haven't yet visited
> and updated rq_of(cfs_rq)->cfs.avg.
> 
> If you just add the task contribution and discover later that there is a
> time delta when you update rq_of(cfs_rq)->cfs.avg you end up decaying
> the task contribution which was already up-to-date and its util
> contribution to rq_of(cfs_rq)->cfs.avg ends up being smaller than it
> should be.
> 
> Am I missing something?

No, that's definitely wrong in the current patch. I could defer the attach into
the update_cfs_rq_load_avg() call for the root cfs_rq whenever
'&rq_of(cfs_rq)->cfs != cfs_rq' in attach_entity_load_avg().

Something like this (only lightly tested!):

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 51d675715776..d31d9cd453a1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2856,6 +2856,16 @@ static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
        decayed = __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
                scale_load_down(cfs_rq->load.weight), cfs_rq->curr != NULL, cfs_rq);
 
+       if (cfs_rq->added_util_avg) {
+               sa->util_avg += cfs_rq->added_util_avg;
+               cfs_rq->added_util_avg = 0;
+       }
+
+       if (cfs_rq->added_util_sum) {
+               sa->util_sum += cfs_rq->added_util_sum;
+               cfs_rq->added_util_sum = 0;
+       }
+
 #ifndef CONFIG_64BIT
        smp_wmb();
        cfs_rq->load_last_update_time_copy = sa->last_update_time;
@@ -2910,8 +2920,13 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
        if (!entity_is_task(se))
                return;
 
-       rq_of(cfs_rq)->cfs.avg.util_avg += se->avg.util_avg;
-       rq_of(cfs_rq)->cfs.avg.util_sum += se->avg.util_sum;
+       if (&rq_of(cfs_rq)->cfs == cfs_rq) {
+               rq_of(cfs_rq)->cfs.avg.util_avg += se->avg.util_avg;
+               rq_of(cfs_rq)->cfs.avg.util_sum += se->avg.util_sum;
+       } else {
+               rq_of(cfs_rq)->cfs.added_util_avg = se->avg.util_avg;
+               rq_of(cfs_rq)->cfs.added_util_sum = se->avg.util_sum;
+       }
 }
 
 static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
@@ -4344,8 +4359,11 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
                cfs_rq = cfs_rq_of(se);
                cfs_rq->h_nr_running++;
 
-               if (cfs_rq_throttled(cfs_rq))
+               if (cfs_rq_throttled(cfs_rq)) {
+                       rq_of(cfs_rq)->cfs.added_util_avg = 0;
+                       rq_of(cfs_rq)->cfs.added_util_sum = 0;
                        break;
+               }

                update_load_avg(se, 1);
                update_cfs_shares(cfs_rq);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b2ff5a2bd6df..f0ea3a7eaf07 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -391,6 +391,8 @@ struct cfs_rq {
        unsigned long tg_load_avg_contrib;
 #endif
        atomic_long_t removed_load_avg, removed_util_avg;
+       u32 added_util_sum;
+       unsigned long added_util_avg;
 #ifndef CONFIG_64BIT
        u64 load_last_update_time_copy;

But attach_entity_load_avg() is not only called from enqueue_entity_load_avg() for
migrated tasks; it is also called from attach_task_cfs_rq(), which is invoked from
switched_to_fair() and task_move_group_fair(). In those paths we can't assume that a
call to update_cfs_rq_load_avg() follows the enqueue_entity_load_avg(), as it does in

enqueue_task_fair():

    for_each_sched_entity(se)
        enqueue_entity()
            enqueue_entity_load_avg()
                update_cfs_rq_load_avg(now, cfs_rq)
                if (migrated) attach_entity_load_avg()

    for_each_sched_entity(se)
        update_load_avg()
            update_cfs_rq_load_avg(now, cfs_rq)

	  
Not sure if we can simply bring the root cfs_rq up to se->avg.last_update_time before we
add se->avg.util_[avg/sum] to rq_of(cfs_rq)->cfs.avg.util_[avg/sum] in
attach_entity_load_avg()?

cfs_rq throttling has to be considered as well ...

Thread overview: 18+ messages
2016-04-01 16:38 [PATCH RFC] sched/fair: let cpu's cfs_rq to reflect task migration Leo Yan
2016-04-01 19:49 ` Peter Zijlstra
2016-04-01 22:28   ` Steve Muckle
2016-04-02  7:11     ` Leo Yan
2016-04-04  8:48       ` Morten Rasmussen
2016-04-04 18:30         ` Yuyang Du
2016-04-05  7:51           ` Morten Rasmussen
2016-04-05  0:15             ` Yuyang Du
2016-04-05 17:00               ` Dietmar Eggemann
2016-04-06  8:37                 ` Morten Rasmussen
2016-04-06 12:14                   ` Vincent Guittot
2016-04-06 18:53                   ` Dietmar Eggemann [this message]
2016-04-07 13:04                     ` Vincent Guittot
2016-04-07 20:30                       ` Dietmar Eggemann
2016-04-08  6:05                         ` Vincent Guittot
2016-04-05  6:56         ` Leo Yan
2016-04-05  9:13           ` Morten Rasmussen
2016-04-04  9:01 ` Morten Rasmussen
