* [PATCH] sched: fix first task of a task group is attached twice @ 2016-05-24 13:08 Vincent Guittot 2016-05-25 15:01 ` [PATCH v2] " Vincent Guittot 0 siblings, 1 reply; 23+ messages in thread From: Vincent Guittot @ 2016-05-24 13:08 UTC (permalink / raw) To: peterz, mingo, linux-kernel; +Cc: yuyang.du, dietmar.eggemann, Vincent Guittot The cfs_rq->avg.last_update_time is initialized to 0, with the main effect that the first sched_entity to be attached keeps its last_update_time set to 0 and is attached once again during the enqueue. Initialize cfs_rq->avg.last_update_time to the current rq's clock instead. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> --- kernel/sched/fair.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 218f8e8..a4e2c10 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -8586,6 +8586,14 @@ void init_tg_cfs_entry(struct task_group *tg, struct cfs_rq *cfs_rq, se->depth = parent->depth + 1; } + /* + * Set last_update_time to something different from 0 to make + * sure the 1st sched_entity will not be attached twice: once + * when attaching the task to the group and one more time when + * enqueueing the task. + */ + tg->cfs_rq[cpu]->avg.last_update_time = rq_clock_task(rq_of(cfs_rq)); + se->my_q = cfs_rq; /* guarantee group entities always have weight */ update_load_set(&se->load, NICE_0_LOAD); -- 1.9.1 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v2] sched: fix first task of a task group is attached twice 2016-05-24 13:08 [PATCH] sched: fix first task of a task group is attached twice Vincent Guittot @ 2016-05-25 15:01 ` Vincent Guittot 2016-05-25 22:38 ` Yuyang Du 2016-05-27 15:48 ` Dietmar Eggemann 0 siblings, 2 replies; 23+ messages in thread From: Vincent Guittot @ 2016-05-25 15:01 UTC (permalink / raw) To: peterz, mingo, linux-kernel; +Cc: yuyang.du, dietmar.eggemann, Vincent Guittot The cfs_rq->avg.last_update_time is initialized to 0, with the main effect that the first sched_entity to be attached keeps its last_update_time set to 0 and is attached once again during the enqueue. Initialize cfs_rq->avg.last_update_time to 1 instead. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> --- v2: - rq_clock_task(rq_of(cfs_rq)) can't be used because the lock is not held kernel/sched/fair.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 218f8e8..3724656 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -8586,6 +8586,14 @@ void init_tg_cfs_entry(struct task_group *tg, struct cfs_rq *cfs_rq, se->depth = parent->depth + 1; } + /* + * Set last_update_time to something different from 0 to make + * sure the 1st sched_entity will not be attached twice: once + * when attaching the task to the group and one more time when + * enqueueing the task. + */ + tg->cfs_rq[cpu]->avg.last_update_time = 1; + se->my_q = cfs_rq; /* guarantee group entities always have weight */ update_load_set(&se->load, NICE_0_LOAD); -- 1.9.1 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH v2] sched: fix first task of a task group is attached twice 2016-05-25 15:01 ` [PATCH v2] " Vincent Guittot @ 2016-05-25 22:38 ` Yuyang Du 2016-05-26 8:26 ` Vincent Guittot 2016-05-27 15:48 ` Dietmar Eggemann 1 sibling, 1 reply; 23+ messages in thread From: Yuyang Du @ 2016-05-25 22:38 UTC (permalink / raw) To: Vincent Guittot; +Cc: peterz, mingo, linux-kernel, dietmar.eggemann On Wed, May 25, 2016 at 05:01:11PM +0200, Vincent Guittot wrote: > The cfs_rq->avg.last_update_time is initialize to 0 with the main effect > that the 1st sched_entity that will be attached, will keep its > last_update_time set to 0 and will attached once again during the > enqueue. > Initialize cfs_rq->avg.last_update_time to 1 instead. Actually, the first time (attach_task_cfs_rq()) is called even way before init_entity_runnable_average(), no? ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2] sched: fix first task of a task group is attached twice 2016-05-25 22:38 ` Yuyang Du @ 2016-05-26 8:26 ` Vincent Guittot 2016-05-26 0:40 ` Yuyang Du 0 siblings, 1 reply; 23+ messages in thread From: Vincent Guittot @ 2016-05-26 8:26 UTC (permalink / raw) To: Yuyang Du; +Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Dietmar Eggemann On 26 May 2016 at 00:38, Yuyang Du <yuyang.du@intel.com> wrote: > On Wed, May 25, 2016 at 05:01:11PM +0200, Vincent Guittot wrote: >> The cfs_rq->avg.last_update_time is initialize to 0 with the main effect >> that the 1st sched_entity that will be attached, will keep its >> last_update_time set to 0 and will attached once again during the >> enqueue. >> Initialize cfs_rq->avg.last_update_time to 1 instead. > > Actually, the first time (attach_task_cfs_rq()) is called even way > before init_entity_runnable_average(), no? Maybe there is an issue during the fork of a task too, but this patch is about the init of the sched_avg of a task group. The problem happens when the first task is moved to the group; it is not about the fork of a task. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2] sched: fix first task of a task group is attached twice 2016-05-26 8:26 ` Vincent Guittot @ 2016-05-26 0:40 ` Yuyang Du 2016-05-26 8:51 ` Vincent Guittot 0 siblings, 1 reply; 23+ messages in thread From: Yuyang Du @ 2016-05-26 0:40 UTC (permalink / raw) To: Vincent Guittot Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Dietmar Eggemann On Thu, May 26, 2016 at 10:26:54AM +0200, Vincent Guittot wrote: > On 26 May 2016 at 00:38, Yuyang Du <yuyang.du@intel.com> wrote: > > On Wed, May 25, 2016 at 05:01:11PM +0200, Vincent Guittot wrote: > >> The cfs_rq->avg.last_update_time is initialize to 0 with the main effect > >> that the 1st sched_entity that will be attached, will keep its > >> last_update_time set to 0 and will attached once again during the > >> enqueue. > >> Initialize cfs_rq->avg.last_update_time to 1 instead. > > > > Actually, the first time (attach_task_cfs_rq()) is called even way > > before init_entity_runnable_average(), no? > > maybe there is an issue during the fork of a task too > This patch is about the init of the sched_avg of a task group. The > problem happens when the 1st task is moved the group but it's not > about the fork of a task Right, but the root cause is not addressed, which is that when forked, a task's sched avgs should not be touched before init_entity_runnable_average() in wake_up_new_task(). What do you think? ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2] sched: fix first task of a task group is attached twice 2016-05-26 0:40 ` Yuyang Du @ 2016-05-26 8:51 ` Vincent Guittot 0 siblings, 0 replies; 23+ messages in thread From: Vincent Guittot @ 2016-05-26 8:51 UTC (permalink / raw) To: Yuyang Du; +Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Dietmar Eggemann On 26 May 2016 at 02:40, Yuyang Du <yuyang.du@intel.com> wrote: > On Thu, May 26, 2016 at 10:26:54AM +0200, Vincent Guittot wrote: >> On 26 May 2016 at 00:38, Yuyang Du <yuyang.du@intel.com> wrote: >> > On Wed, May 25, 2016 at 05:01:11PM +0200, Vincent Guittot wrote: >> >> The cfs_rq->avg.last_update_time is initialize to 0 with the main effect >> >> that the 1st sched_entity that will be attached, will keep its >> >> last_update_time set to 0 and will attached once again during the >> >> enqueue. >> >> Initialize cfs_rq->avg.last_update_time to 1 instead. >> > >> > Actually, the first time (attach_task_cfs_rq()) is called even way >> > before init_entity_runnable_average(), no? >> >> maybe there is an issue during the fork of a task too >> This patch is about the init of the sched_avg of a task group. The >> problem happens when the 1st task is moved the group but it's not >> about the fork of a task > > Right, but the root cause is not addressed, which is when forked, the task > should not be touched on sched avgs before init_entity_runnable_average() > in wake_up_new_task(). You think? So yes, this patch does not address the fact that sched avgs should not be touched before init_entity_runnable_average(). This patch is only about the init of the sched_avg of a task group. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2] sched: fix first task of a task group is attached twice 2016-05-25 15:01 ` [PATCH v2] " Vincent Guittot 2016-05-25 22:38 ` Yuyang Du @ 2016-05-27 15:48 ` Dietmar Eggemann 2016-05-27 17:16 ` Vincent Guittot 1 sibling, 1 reply; 23+ messages in thread From: Dietmar Eggemann @ 2016-05-27 15:48 UTC (permalink / raw) To: Vincent Guittot, peterz, mingo, linux-kernel; +Cc: yuyang.du On 25/05/16 16:01, Vincent Guittot wrote: > The cfs_rq->avg.last_update_time is initialize to 0 with the main effect > that the 1st sched_entity that will be attached, will keep its > last_update_time set to 0 and will attached once again during the > enqueue. > Initialize cfs_rq->avg.last_update_time to 1 instead. > > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> > --- > > v2: > - rq_clock_task(rq_of(cfs_rq)) can't be used because lock is not held > > kernel/sched/fair.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 218f8e8..3724656 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -8586,6 +8586,14 @@ void init_tg_cfs_entry(struct task_group *tg, struct cfs_rq *cfs_rq, > se->depth = parent->depth + 1; > } > > + /* > + * Set last_update_time to something different from 0 to make > + * sure the 1st sched_entity will not be attached twice: once > + * when attaching the task to the group and one more time when > + * enqueueing the task. > + */ > + tg->cfs_rq[cpu]->avg.last_update_time = 1; > + > se->my_q = cfs_rq; > /* guarantee group entities always have weight */ > update_load_set(&se->load, NICE_0_LOAD); So why not set the last_update_time value for those cfs_rqs when we have the lock? E.g. in task_move_group_fair() or attach_task_cfs_rq(). 
@@ -8490,12 +8493,20 @@ void init_cfs_rq(struct cfs_rq *cfs_rq) #ifdef CONFIG_FAIR_GROUP_SCHED static void task_move_group_fair(struct task_struct *p) { +#ifdef CONFIG_SMP + struct cfs_rq *cfs_rq = NULL; +#endif + detach_task_cfs_rq(p); set_task_rq(p, task_cpu(p)); #ifdef CONFIG_SMP /* Tell se's cfs_rq has been changed -- migrated */ p->se.avg.last_update_time = 0; + + cfs_rq = cfs_rq_of(&p->se); + if (!cfs_rq->avg.last_update_time) + cfs_rq->avg.last_update_time = rq_clock_task(rq_of(cfs_rq)); #endif or @@ -8423,6 +8423,9 @@ static void attach_task_cfs_rq(struct task_struct *p) se->depth = se->parent ? se->parent->depth + 1 : 0; #endif + if (!cfs_rq->avg.last_update_time) + cfs_rq->avg.last_update_time = rq_clock_task(rq_of(cfs_rq)); + /* Synchronize task with its cfs_rq */ attach_entity_load_avg(cfs_rq, se); ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2] sched: fix first task of a task group is attached twice 2016-05-27 15:48 ` Dietmar Eggemann @ 2016-05-27 17:16 ` Vincent Guittot 2016-05-27 20:38 ` Dietmar Eggemann 0 siblings, 1 reply; 23+ messages in thread From: Vincent Guittot @ 2016-05-27 17:16 UTC (permalink / raw) To: Dietmar Eggemann; +Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Yuyang Du On 27 May 2016 at 17:48, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote: > On 25/05/16 16:01, Vincent Guittot wrote: >> The cfs_rq->avg.last_update_time is initialize to 0 with the main effect >> that the 1st sched_entity that will be attached, will keep its >> last_update_time set to 0 and will attached once again during the >> enqueue. >> Initialize cfs_rq->avg.last_update_time to 1 instead. >> >> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> >> --- >> >> v2: >> - rq_clock_task(rq_of(cfs_rq)) can't be used because lock is not held >> >> kernel/sched/fair.c | 8 ++++++++ >> 1 file changed, 8 insertions(+) >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> index 218f8e8..3724656 100644 >> --- a/kernel/sched/fair.c >> +++ b/kernel/sched/fair.c >> @@ -8586,6 +8586,14 @@ void init_tg_cfs_entry(struct task_group *tg, struct cfs_rq *cfs_rq, >> se->depth = parent->depth + 1; >> } >> >> + /* >> + * Set last_update_time to something different from 0 to make >> + * sure the 1st sched_entity will not be attached twice: once >> + * when attaching the task to the group and one more time when >> + * enqueueing the task. >> + */ >> + tg->cfs_rq[cpu]->avg.last_update_time = 1; >> + >> se->my_q = cfs_rq; >> /* guarantee group entities always have weight */ >> update_load_set(&se->load, NICE_0_LOAD); > > So why not setting the last_update_time value for those cfs_rq's when > we have the lock? E.g. in task_move_group_fair() or attach_task_cfs_rq(). I'm not sure it's worth adding this init to functions that are called often, when it's only needed for the init. 
If you are concerned about the update of the load of the first task to be attached, a long time can still have elapsed between the creation of the group and the first enqueue of a task. This was the case for the test I did when I found this issue. Besides this point, I have to send a new version to set load_last_update_time_copy for non-64-bit systems. Fengguang pointed the issue out to me. Regards, Vincent > > @@ -8490,12 +8493,20 @@ void init_cfs_rq(struct cfs_rq *cfs_rq) > #ifdef CONFIG_FAIR_GROUP_SCHED > static void task_move_group_fair(struct task_struct *p) > { > +#ifdef CONFIG_SMP > + struct cfs_rq *cfs_rq = NULL; > +#endif > + > detach_task_cfs_rq(p); > set_task_rq(p, task_cpu(p)); > > #ifdef CONFIG_SMP > /* Tell se's cfs_rq has been changed -- migrated */ > p->se.avg.last_update_time = 0; > + > + cfs_rq = cfs_rq_of(&p->se); > + if (!cfs_rq->avg.last_update_time) > + cfs_rq->avg.last_update_time = rq_clock_task(rq_of(cfs_rq)); > #endif > > or > > @@ -8423,6 +8423,9 @@ static void attach_task_cfs_rq(struct task_struct *p) > se->depth = se->parent ? se->parent->depth + 1 : 0; > #endif > > + if (!cfs_rq->avg.last_update_time) > + cfs_rq->avg.last_update_time = rq_clock_task(rq_of(cfs_rq)); > + > /* Synchronize task with its cfs_rq */ > attach_entity_load_avg(cfs_rq, se); ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2] sched: fix first task of a task group is attached twice 2016-05-27 17:16 ` Vincent Guittot @ 2016-05-27 20:38 ` Dietmar Eggemann 2016-05-30 7:04 ` Vincent Guittot 2016-05-30 15:54 ` [PATCH v2] " Vincent Guittot 0 siblings, 2 replies; 23+ messages in thread From: Dietmar Eggemann @ 2016-05-27 20:38 UTC (permalink / raw) To: Vincent Guittot; +Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Yuyang Du On 27/05/16 18:16, Vincent Guittot wrote: > On 27 May 2016 at 17:48, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote: >> On 25/05/16 16:01, Vincent Guittot wrote: >>> The cfs_rq->avg.last_update_time is initialize to 0 with the main effect >>> that the 1st sched_entity that will be attached, will keep its >>> last_update_time set to 0 and will attached once again during the >>> enqueue. >>> Initialize cfs_rq->avg.last_update_time to 1 instead. >>> >>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> >>> --- >>> >>> v2: >>> - rq_clock_task(rq_of(cfs_rq)) can't be used because lock is not held >>> >>> kernel/sched/fair.c | 8 ++++++++ >>> 1 file changed, 8 insertions(+) >>> >>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >>> index 218f8e8..3724656 100644 >>> --- a/kernel/sched/fair.c >>> +++ b/kernel/sched/fair.c >>> @@ -8586,6 +8586,14 @@ void init_tg_cfs_entry(struct task_group *tg, struct cfs_rq *cfs_rq, >>> se->depth = parent->depth + 1; >>> } >>> >>> + /* >>> + * Set last_update_time to something different from 0 to make >>> + * sure the 1st sched_entity will not be attached twice: once >>> + * when attaching the task to the group and one more time when >>> + * enqueueing the task. 
>>> + */ >>> + tg->cfs_rq[cpu]->avg.last_update_time = 1; >>> + Couldn't you just set the value in init_cfs_rq(): @@ -8482,6 +8482,7 @@ void init_cfs_rq(struct cfs_rq *cfs_rq) cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime; #endif #ifdef CONFIG_SMP + cfs_rq->avg.last_update_time = 1; atomic_long_set(&cfs_rq->removed_load_avg, 0); atomic_long_set(&cfs_rq->removed_util_avg, 0); #endif >>> se->my_q = cfs_rq; >>> /* guarantee group entities always have weight */ >>> update_load_set(&se->load, NICE_0_LOAD); >> >> So why not setting the last_update_time value for those cfs_rq's when >> we have the lock? E.g. in task_move_group_fair() or attach_task_cfs_rq(). > > I'm not sure that it's worth adding this init in functions that are > then used often only for the init of it. Yeah, there will be this if condition overhead. > If you are concerned by the update of the load of the 1st task that > will be attached, it can still have elapsed a long time between the > creation of the group and the 1st enqueue of a task. This was the case > for the test i did when i found this issue. Understood, but for me, creation of the task group is cpu_cgroup_css_alloc -> sched_create_group() -> ... -> init_cfs_rq(), init_tg_cfs_entry(), ... and the functions which are called when the first task is put into the task group are cpu_cgroup_attach() and cpu_cgroup_fork(), and they would trigger the initial setup of the cfs_rq->avg.last_update_time. > > Beside this point, I have to send a new version to set > load_last_update_time_copy for not 64 bits system. Fengguang points me > the issue OK. [...] >> >> + if (!cfs_rq->avg.last_update_time) >> + cfs_rq->avg.last_update_time = rq_clock_task(rq_of(cfs_rq)); >> + >> /* Synchronize task with its cfs_rq */ >> attach_entity_load_avg(cfs_rq, se); > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2] sched: fix first task of a task group is attached twice 2016-05-27 20:38 ` Dietmar Eggemann @ 2016-05-30 7:04 ` Vincent Guittot 2016-05-30 15:52 ` [PATCH v3] " Vincent Guittot 2016-05-30 15:54 ` [PATCH v2] " Vincent Guittot 1 sibling, 1 reply; 23+ messages in thread From: Vincent Guittot @ 2016-05-30 7:04 UTC (permalink / raw) To: Dietmar Eggemann; +Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Yuyang Du On 27 May 2016 at 22:38, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote: > On 27/05/16 18:16, Vincent Guittot wrote: >> On 27 May 2016 at 17:48, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote: >>> On 25/05/16 16:01, Vincent Guittot wrote: >>>> The cfs_rq->avg.last_update_time is initialize to 0 with the main effect >>>> that the 1st sched_entity that will be attached, will keep its >>>> last_update_time set to 0 and will attached once again during the >>>> enqueue. >>>> Initialize cfs_rq->avg.last_update_time to 1 instead. >>>> >>>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> >>>> --- >>>> >>>> v2: >>>> - rq_clock_task(rq_of(cfs_rq)) can't be used because lock is not held >>>> >>>> kernel/sched/fair.c | 8 ++++++++ >>>> 1 file changed, 8 insertions(+) >>>> >>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >>>> index 218f8e8..3724656 100644 >>>> --- a/kernel/sched/fair.c >>>> +++ b/kernel/sched/fair.c >>>> @@ -8586,6 +8586,14 @@ void init_tg_cfs_entry(struct task_group *tg, struct cfs_rq *cfs_rq, >>>> se->depth = parent->depth + 1; >>>> } >>>> >>>> + /* >>>> + * Set last_update_time to something different from 0 to make >>>> + * sure the 1st sched_entity will not be attached twice: once >>>> + * when attaching the task to the group and one more time when >>>> + * enqueueing the task. 
>>>> + */ >>>> + tg->cfs_rq[cpu]->avg.last_update_time = 1; >>>> + > > Couldn't you not just set the value in init_cfs_rq(): yes, there is no good reason to use init_tg_cfs_entry instead of init_cfs_rq > > @@ -8482,6 +8482,7 @@ void init_cfs_rq(struct cfs_rq *cfs_rq) > cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime; > #endif > #ifdef CONFIG_SMP > + cfs_rq->avg.last_update_time = 1; > atomic_long_set(&cfs_rq->removed_load_avg, 0); > atomic_long_set(&cfs_rq->removed_util_avg, 0); > #endif > >>>> se->my_q = cfs_rq; >>>> /* guarantee group entities always have weight */ >>>> update_load_set(&se->load, NICE_0_LOAD); >>> >>> So why not setting the last_update_time value for those cfs_rq's when >>> we have the lock? E.g. in task_move_group_fair() or attach_task_cfs_rq(). >> >> I'm not sure that it's worth adding this init in functions that are >> then used often only for the init of it. > > Yeah, there will be this if condition overhead. > >> If you are concerned by the update of the load of the 1st task that >> will be attached, it can still have elapsed a long time between the >> creation of the group and the 1st enqueue of a task. This was the case >> for the test i did when i found this issue. > > Understood, but for me, creation of the task group is > cpu_cgroup_css_alloc -> sched_create_group() -> ... -> init_cfs_rq(), > init_tg_cfs_entry(), ... > > and the functions which are called when the first task is put into the > task group are cpu_cgroup_attach() and cpu_cgroup_fork() and they whould > trigger the initial setup of the cfs_rq->avg.last_update_time. > >> >> Beside this point, I have to send a new version to set >> load_last_update_time_copy for not 64 bits system. Fengguang points me >> the issue > > OK. > > [...] 
>>> >>> + if (!cfs_rq->avg.last_update_time) >>> + cfs_rq->avg.last_update_time = rq_clock_task(rq_of(cfs_rq)); >>> + >>> /* Synchronize task with its cfs_rq */ >>> attach_entity_load_avg(cfs_rq, se); >> ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v3] sched: fix first task of a task group is attached twice 2016-05-30 7:04 ` Vincent Guittot @ 2016-05-30 15:52 ` Vincent Guittot 2016-05-30 19:48 ` Yuyang Du ` (2 more replies) 0 siblings, 3 replies; 23+ messages in thread From: Vincent Guittot @ 2016-05-30 15:52 UTC (permalink / raw) To: peterz, mingo, linux-kernel; +Cc: yuyang.du, dietmar.eggemann, Vincent Guittot The cfs_rq->avg.last_update_time is initialized to 0, with the main effect that the first sched_entity to be attached keeps its last_update_time set to 0 and is attached once again during the enqueue. Initialize cfs_rq->avg.last_update_time to 1 instead. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> --- v3: - add initialization of load_last_update_time_copy for non-64-bit systems - move init into init_cfs_rq v2: - rq_clock_task(rq_of(cfs_rq)) can't be used because the lock is not held kernel/sched/fair.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 218f8e8..86be9c1 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -8459,6 +8459,16 @@ void init_cfs_rq(struct cfs_rq *cfs_rq) cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime; #endif #ifdef CONFIG_SMP + /* + * Set last_update_time to something different from 0 to make + * sure the 1st sched_entity will not be attached twice: once + * when attaching the task to the group and one more time when + * enqueueing the task. + */ + cfs_rq->avg.last_update_time = 1; +#ifndef CONFIG_64BIT + cfs_rq->load_last_update_time_copy = 1; +#endif atomic_long_set(&cfs_rq->removed_load_avg, 0); atomic_long_set(&cfs_rq->removed_util_avg, 0); #endif -- 1.9.1 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH v3] sched: fix first task of a task group is attached twice 2016-05-30 15:52 ` [PATCH v3] " Vincent Guittot @ 2016-05-30 19:48 ` Yuyang Du 2016-05-31 7:28 ` Vincent Guittot 2016-06-01 15:31 ` Dietmar Eggemann 2016-06-15 19:19 ` Yuyang Du 2 siblings, 1 reply; 23+ messages in thread From: Yuyang Du @ 2016-05-30 19:48 UTC (permalink / raw) To: Vincent Guittot; +Cc: peterz, mingo, linux-kernel, dietmar.eggemann On Mon, May 30, 2016 at 05:52:20PM +0200, Vincent Guittot wrote: > The cfs_rq->avg.last_update_time is initialize to 0 with the main effect > that the 1st sched_entity that will be attached, will keep its > last_update_time set to 0 and will attached once again during the > enqueue. > Initialize cfs_rq->avg.last_update_time to 1 instead. > > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> > --- > > v3: > - add initialization of load_last_update_time_copy for not 64bits system > - move init into init_cfs_rq > > v2: > - rq_clock_task(rq_of(cfs_rq)) can't be used because lock is not held > > kernel/sched/fair.c | 10 ++++++++++ > 1 file changed, 10 insertions(+) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 218f8e8..86be9c1 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -8459,6 +8459,16 @@ void init_cfs_rq(struct cfs_rq *cfs_rq) > cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime; > #endif > #ifdef CONFIG_SMP > + /* > + * Set last_update_time to something different from 0 to make > + * sure the 1st sched_entity will not be attached twice:once > + * when attaching the task to the group and one more time when > + * enqueueing the task. > + */ The first time: "once when attaching the task to the group". That attaching is purely wrong, but will not have any effect (at least load/util wise), because the task will later be inited in init_entity_runnable_average(). The second time: "one more time when enqueueing the task". 
In enqueue_entity_load_avg(), your patch will not have any effect either, because of the above overwriting the new task's load in init_entity_runnable_average(), no? ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v3] sched: fix first task of a task group is attached twice 2016-05-30 19:48 ` Yuyang Du @ 2016-05-31 7:28 ` Vincent Guittot 2016-05-31 0:44 ` Yuyang Du 0 siblings, 1 reply; 23+ messages in thread From: Vincent Guittot @ 2016-05-31 7:28 UTC (permalink / raw) To: Yuyang Du; +Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Dietmar Eggemann Hi Yuyang, On 30 May 2016 at 21:48, Yuyang Du <yuyang.du@intel.com> wrote: > On Mon, May 30, 2016 at 05:52:20PM +0200, Vincent Guittot wrote: >> The cfs_rq->avg.last_update_time is initialize to 0 with the main effect >> that the 1st sched_entity that will be attached, will keep its >> last_update_time set to 0 and will attached once again during the >> enqueue. >> Initialize cfs_rq->avg.last_update_time to 1 instead. >> >> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> >> --- >> >> v3: >> - add initialization of load_last_update_time_copy for not 64bits system >> - move init into init_cfs_rq >> >> v2: >> - rq_clock_task(rq_of(cfs_rq)) can't be used because lock is not held >> >> kernel/sched/fair.c | 10 ++++++++++ >> 1 file changed, 10 insertions(+) >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> index 218f8e8..86be9c1 100644 >> --- a/kernel/sched/fair.c >> +++ b/kernel/sched/fair.c >> @@ -8459,6 +8459,16 @@ void init_cfs_rq(struct cfs_rq *cfs_rq) >> cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime; >> #endif >> #ifdef CONFIG_SMP >> + /* >> + * Set last_update_time to something different from 0 to make >> + * sure the 1st sched_entity will not be attached twice:once >> + * when attaching the task to the group and one more time when >> + * enqueueing the task. >> + */ > > The first time: "once when attaching the task to the group". > > That attaching is purely wrong, but will not have any effect (at least > load/util wise), because the task will later be inited in > init_entity_runnable_average(). 
This patch is not related to the init of a task but to the init of the cfs_rq and to what happens with the first task that is enqueued on it. Let's take a task A that has already been scheduled on another cfs_rq, so its se->avg.last_update_time is different from 0. Create a new task group TGB. At creation, the cfs_rq->avg.last_update_time of this TGB is set to 0. Now move task A to TGB. A is attached to TGB, so se->avg.last_update_time = cfs_rq->avg.last_update_time, which is 0. A is then enqueued on the cfs_rq, and because se->avg.last_update_time == 0, A will be attached one more time on the cfs_rq. This patch sets cfs_rq->avg.last_update_time to 1 at creation, so the first time A is attached to TGB, se->avg.last_update_time = cfs_rq->avg.last_update_time = 1 and A will not be attached one more time during the enqueue. > > The second time: "one more time when enqueueing the task". > > In enqueue_entity_load_avg(), your patch will not have any effect either, > because of the above overwriting the new task's load in > init_entity_runnable_average(), no? ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v3] sched: fix first task of a task group is attached twice 2016-05-31 7:28 ` Vincent Guittot @ 2016-05-31 0:44 ` Yuyang Du 0 siblings, 0 replies; 23+ messages in thread From: Yuyang Du @ 2016-05-31 0:44 UTC (permalink / raw) To: Vincent Guittot Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Dietmar Eggemann On Tue, May 31, 2016 at 09:28:30AM +0200, Vincent Guittot wrote: > Hi Yuyang, > > On 30 May 2016 at 21:48, Yuyang Du <yuyang.du@intel.com> wrote: > > On Mon, May 30, 2016 at 05:52:20PM +0200, Vincent Guittot wrote: > >> The cfs_rq->avg.last_update_time is initialize to 0 with the main effect > >> that the 1st sched_entity that will be attached, will keep its > >> last_update_time set to 0 and will attached once again during the > >> enqueue. > >> Initialize cfs_rq->avg.last_update_time to 1 instead. > >> > >> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> > >> --- > >> > >> v3: > >> - add initialization of load_last_update_time_copy for not 64bits system > >> - move init into init_cfs_rq > >> > >> v2: > >> - rq_clock_task(rq_of(cfs_rq)) can't be used because lock is not held > >> > >> kernel/sched/fair.c | 10 ++++++++++ > >> 1 file changed, 10 insertions(+) > >> > >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > >> index 218f8e8..86be9c1 100644 > >> --- a/kernel/sched/fair.c > >> +++ b/kernel/sched/fair.c > >> @@ -8459,6 +8459,16 @@ void init_cfs_rq(struct cfs_rq *cfs_rq) > >> cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime; > >> #endif > >> #ifdef CONFIG_SMP > >> + /* > >> + * Set last_update_time to something different from 0 to make > >> + * sure the 1st sched_entity will not be attached twice:once > >> + * when attaching the task to the group and one more time when > >> + * enqueueing the task. > >> + */ > > > > The first time: "once when attaching the task to the group". 
> > > > That attaching is purely wrong, but will not have any effect (at least > > load/util wise), because the task will later be inited in > > init_entity_runnable_average(). > > This patch is not related to the init of a task but related to the > init of the cfs_rq and to what happen with the 1st task that is > enqueued on it. > > Lets take a task A that has already been scheduled on other cfs_rq so > its se->avg.last_update_time is different from 0. I understand it, finally, :) > Create a new task group TGB > At creation, the cfs_rq->avg.last_update_time of this TGB is set to 0. > > Now move task A on TGB. > A is attached to TGB so se->avg.last_update_time = > cfs_rq->avg.last_update_time which is 0 > A is then enqueued on th cfs_rq and because se->avg.last_update_time > == 0, A will be attached one more time on the cfs_rq > > This patch set cfs_rq->avg.last_update_time to 1 at creation so the > 1st time that A is attached to TGB, se->avg.last_update_time = > cfs_rq->avg.last_update_time = 1 and A will not bve attached one more > time during the enqueue. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v3] sched: fix first task of a task group is attached twice 2016-05-30 15:52 ` [PATCH v3] " Vincent Guittot 2016-05-30 19:48 ` Yuyang Du @ 2016-06-01 15:31 ` Dietmar Eggemann 2016-06-01 15:54 ` Vincent Guittot 2016-06-15 19:19 ` Yuyang Du 2 siblings, 1 reply; 23+ messages in thread From: Dietmar Eggemann @ 2016-06-01 15:31 UTC (permalink / raw) To: Vincent Guittot, peterz, mingo, linux-kernel; +Cc: yuyang.du On 30/05/16 16:52, Vincent Guittot wrote: > The cfs_rq->avg.last_update_time is initialize to 0 with the main effect > that the 1st sched_entity that will be attached, will keep its attached in .task_move_group ? I'm not sure if we can have a .switched_to() directly followed by a .enqueue_task() into a cfs_rq with avg.last_update_time = 0. > last_update_time set to 0 and will attached once again during the > enqueue. > Initialize cfs_rq->avg.last_update_time to 1 instead. Maybe worth mentioning in the header: This double se attaching for the first task which moves into the task group owning this cfs_rq (.task_move_group() and .enqueue_task()) can (obviously) only happen if CONFIG_FAIR_GROUP_SCHED is set. The reason for this is that we set 'se->avg.last_update_time = cfs_rq->avg.last_update_time' in attach_entity_load_avg() and use 'migrated = !sa->last_update_time' as a flag in enqueue_entity_load_avg() to decide if we call attach_entity_load_avg() (again) or only update se->avg . 
Tested-by: Dietmar Eggemann <Dietmar.Eggemann@arm.com> > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> > --- > > v3: > - add initialization of load_last_update_time_copy for not 64bits system > - move init into init_cfs_rq > > v2: > - rq_clock_task(rq_of(cfs_rq)) can't be used because lock is not held > > kernel/sched/fair.c | 10 ++++++++++ > 1 file changed, 10 insertions(+) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 218f8e8..86be9c1 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -8459,6 +8459,16 @@ void init_cfs_rq(struct cfs_rq *cfs_rq) > cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime; > #endif > #ifdef CONFIG_SMP > + /* > + * Set last_update_time to something different from 0 to make > + * sure the 1st sched_entity will not be attached twice: once > + * when attaching the task to the group and one more time when > + * enqueueing the task. > + */ > + cfs_rq->avg.last_update_time = 1; > +#ifndef CONFIG_64BIT > + cfs_rq->load_last_update_time_copy = 1; > +#endif > atomic_long_set(&cfs_rq->removed_load_avg, 0); > atomic_long_set(&cfs_rq->removed_util_avg, 0); > #endif ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v3] sched: fix first task of a task group is attached twice 2016-06-01 15:31 ` Dietmar Eggemann @ 2016-06-01 15:54 ` Vincent Guittot 2016-06-06 19:32 ` Dietmar Eggemann 0 siblings, 1 reply; 23+ messages in thread From: Vincent Guittot @ 2016-06-01 15:54 UTC (permalink / raw) To: Dietmar Eggemann; +Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Yuyang Du On 1 June 2016 at 17:31, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote: > On 30/05/16 16:52, Vincent Guittot wrote: >> The cfs_rq->avg.last_update_time is initialize to 0 with the main effect >> that the 1st sched_entity that will be attached, will keep its > > attached in .task_move_group ? > > I'm not sure if we can have a .switched_to() directly followed by a > .enqueue_task() into a cfs_rq with avg.last_update_time = 0. I think it can happen as well if the task is not queued during the change of scheduling class because when we attach the task in switched_to, the task->se.avg.last_update_time will stay at 0. So when the task is enqueued, it will be attached one more time > >> last_update_time set to 0 and will attached once again during the >> enqueue. >> Initialize cfs_rq->avg.last_update_time to 1 instead. > > Maybe worth mentioning in the header: > > This double se attaching for the first task which moves into the task > group owning this cfs_rq (.task_move_group() and .enqueue_task()) can > (obviously) only happen if CONFIG_FAIR_GROUP_SCHED is set. > > The reason for this is that we set 'se->avg.last_update_time = > cfs_rq->avg.last_update_time' in attach_entity_load_avg() and use > 'migrated = !sa->last_update_time' as a flag in > enqueue_entity_load_avg() to decide if we call attach_entity_load_avg() > (again) or only update se->avg . 
> > Tested-by: Dietmar Eggemann <Dietmar.Eggemann@arm.com> Thanks > >> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> >> --- >> >> v3: >> - add initialization of load_last_update_time_copy for not 64bits system >> - move init into init_cfs_rq >> >> v2: >> - rq_clock_task(rq_of(cfs_rq)) can't be used because lock is not held >> >> kernel/sched/fair.c | 10 ++++++++++ >> 1 file changed, 10 insertions(+) >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> index 218f8e8..86be9c1 100644 >> --- a/kernel/sched/fair.c >> +++ b/kernel/sched/fair.c >> @@ -8459,6 +8459,16 @@ void init_cfs_rq(struct cfs_rq *cfs_rq) >> cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime; >> #endif >> #ifdef CONFIG_SMP >> + /* >> + * Set last_update_time to something different from 0 to make >> + * sure the 1st sched_entity will not be attached twice: once >> + * when attaching the task to the group and one more time when >> + * enqueueing the task. >> + */ >> + cfs_rq->avg.last_update_time = 1; >> +#ifndef CONFIG_64BIT >> + cfs_rq->load_last_update_time_copy = 1; >> +#endif >> atomic_long_set(&cfs_rq->removed_load_avg, 0); >> atomic_long_set(&cfs_rq->removed_util_avg, 0); >> #endif ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v3] sched: fix first task of a task group is attached twice 2016-06-01 15:54 ` Vincent Guittot @ 2016-06-06 19:32 ` Dietmar Eggemann 2016-06-07 7:35 ` Vincent Guittot 0 siblings, 1 reply; 23+ messages in thread From: Dietmar Eggemann @ 2016-06-06 19:32 UTC (permalink / raw) To: Vincent Guittot; +Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Yuyang Du On 01/06/16 16:54, Vincent Guittot wrote: > On 1 June 2016 at 17:31, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote: >> On 30/05/16 16:52, Vincent Guittot wrote: >>> The cfs_rq->avg.last_update_time is initialize to 0 with the main effect >>> that the 1st sched_entity that will be attached, will keep its >> >> attached in .task_move_group ? >> >> I'm not sure if we can have a .switched_to() directly followed by a >> .enqueue_task() into a cfs_rq with avg.last_update_time = 0. > > I think it can happen as well if the task is not queued during the > change of scheduling class because when we attach the task in > switched_to, the task->se.avg.last_update_time will stay to 0. So when > the task will be enqueued, it will be attached one more time You're right. So I started msc_test as an rt task in task group tg_1 (css.id=2) (1) and when it slept I changed it to become a cfs task (2). (1) # chrt -r 99 ./cg.sh /tg_1 ./msc_test (2) # chrt -o -p 0 2093 ... 
---> busy [ 84.336570] dequeue_task_rt: cpu=2 comm=msc_test pid=2093 tg_css_id=2 [ 84.342948] enqueue_task_rt: cpu=1 comm=msc_test pid=2093 tg_css_id=2 ---> sleep [ 86.548655] dequeue_task_rt: cpu=1 comm=msc_test pid=2093 tg_css_id=2 [ 91.008457] switched_from_rt: cpu=1 comm=msc_test pid=2093 [ 91.013896] switched_to_fair: cpu=1 comm=msc_test pid=2093 [ 91.019336] attach_entity_load_avg: cpu=1 comm=msc_test pid=2093 last_update_time=0 tg->css.id=2 [ 91.028499] do_sched_setscheduler comm=msc_test pid=2093 policy=0 rv=0 [ 91.548807] enqueue_task_fair: cpu=1 comm=msc_test pid=2093 tg_css_id=2 [ 91.555363] attach_entity_load_avg: cpu=1 comm=msc_test pid=2093 last_update_time=91548795220 tg->css.id=2 ---> busy ... ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v3] sched: fix first task of a task group is attached twice 2016-06-06 19:32 ` Dietmar Eggemann @ 2016-06-07 7:35 ` Vincent Guittot 0 siblings, 0 replies; 23+ messages in thread From: Vincent Guittot @ 2016-06-07 7:35 UTC (permalink / raw) To: Dietmar Eggemann; +Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Yuyang Du Hi Dietmar, On 6 June 2016 at 21:32, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote: > On 01/06/16 16:54, Vincent Guittot wrote: >> On 1 June 2016 at 17:31, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote: >>> On 30/05/16 16:52, Vincent Guittot wrote: >>>> The cfs_rq->avg.last_update_time is initialize to 0 with the main effect >>>> that the 1st sched_entity that will be attached, will keep its >>> >>> attached in .task_move_group ? >>> >>> I'm not sure if we can have a .switched_to() directly followed by a >>> .enqueue_task() into a cfs_rq with avg.last_update_time = 0. >> >> I think it can happen as well if the task is not queued during the >> change of scheduling class because when we attach the task in >> switched_to, the task->se.avg.last_update_time will stay to 0. So when >> the task will be enqueued, it will be attached one more time > > You're right. > > So I started msc_test as an rt task in task group tg_1 (css.id=2) (1) and when it slept > I changed it to become a cfs task (2). > > > (1) # chrt -r 99 ./cg.sh /tg_1 ./msc_test > > (2) # chrt -o -p 0 2093 > > ... 
> ---> busy > [ 84.336570] dequeue_task_rt: cpu=2 comm=msc_test pid=2093 tg_css_id=2 > [ 84.342948] enqueue_task_rt: cpu=1 comm=msc_test pid=2093 tg_css_id=2 > ---> sleep > [ 86.548655] dequeue_task_rt: cpu=1 comm=msc_test pid=2093 tg_css_id=2 > [ 91.008457] switched_from_rt: cpu=1 comm=msc_test pid=2093 > [ 91.013896] switched_to_fair: cpu=1 comm=msc_test pid=2093 > [ 91.019336] attach_entity_load_avg: cpu=1 comm=msc_test pid=2093 last_update_time=0 tg->css.id=2 > [ 91.028499] do_sched_setscheduler comm=msc_test pid=2093 policy=0 rv=0 > [ 91.548807] enqueue_task_fair: cpu=1 comm=msc_test pid=2093 tg_css_id=2 > [ 91.555363] attach_entity_load_avg: cpu=1 comm=msc_test pid=2093 last_update_time=91548795220 tg->css.id=2 > ---> busy > ... Thanks for testing the sequence ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v3] sched: fix first task of a task group is attached twice 2016-05-30 15:52 ` [PATCH v3] " Vincent Guittot 2016-05-30 19:48 ` Yuyang Du 2016-06-01 15:31 ` Dietmar Eggemann @ 2016-06-15 19:19 ` Yuyang Du 2016-06-16 7:12 ` Vincent Guittot 2 siblings, 1 reply; 23+ messages in thread From: Yuyang Du @ 2016-06-15 19:19 UTC (permalink / raw) To: Vincent Guittot; +Cc: peterz, mingo, linux-kernel, dietmar.eggemann On Mon, May 30, 2016 at 05:52:20PM +0200, Vincent Guittot wrote: > The cfs_rq->avg.last_update_time is initialize to 0 with the main effect > that the 1st sched_entity that will be attached, will keep its > last_update_time set to 0 and will attached once again during the > enqueue. > Initialize cfs_rq->avg.last_update_time to 1 instead. > > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> > --- > > v3: > - add initialization of load_last_update_time_copy for not 64bits system > - move init into init_cfs_rq > > v2: > - rq_clock_task(rq_of(cfs_rq)) can't be used because lock is not held > > kernel/sched/fair.c | 10 ++++++++++ > 1 file changed, 10 insertions(+) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 218f8e8..86be9c1 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -8459,6 +8459,16 @@ void init_cfs_rq(struct cfs_rq *cfs_rq) > cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime; > #endif > #ifdef CONFIG_SMP > + /* > + * Set last_update_time to something different from 0 to make > + * sure the 1st sched_entity will not be attached twice: once > + * when attaching the task to the group and one more time when > + * enqueueing the task. > + */ > + cfs_rq->avg.last_update_time = 1; > +#ifndef CONFIG_64BIT > + cfs_rq->load_last_update_time_copy = 1; > +#endif > atomic_long_set(&cfs_rq->removed_load_avg, 0); > atomic_long_set(&cfs_rq->removed_util_avg, 0); > #endif Then, when enqueued, both cfs_rq and task will be decayed to 0, due to a large gap between 1 and now, no? 
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v3] sched: fix first task of a task group is attached twice 2016-06-15 19:19 ` Yuyang Du @ 2016-06-16 7:12 ` Vincent Guittot 2016-06-15 23:24 ` Yuyang Du 0 siblings, 1 reply; 23+ messages in thread From: Vincent Guittot @ 2016-06-16 7:12 UTC (permalink / raw) To: Yuyang Du; +Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Dietmar Eggemann On 15 June 2016 at 21:19, Yuyang Du <yuyang.du@intel.com> wrote: > On Mon, May 30, 2016 at 05:52:20PM +0200, Vincent Guittot wrote: >> The cfs_rq->avg.last_update_time is initialize to 0 with the main effect >> that the 1st sched_entity that will be attached, will keep its >> last_update_time set to 0 and will attached once again during the >> enqueue. >> Initialize cfs_rq->avg.last_update_time to 1 instead. >> >> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> >> --- >> >> v3: >> - add initialization of load_last_update_time_copy for not 64bits system >> - move init into init_cfs_rq >> >> v2: >> - rq_clock_task(rq_of(cfs_rq)) can't be used because lock is not held >> >> kernel/sched/fair.c | 10 ++++++++++ >> 1 file changed, 10 insertions(+) >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> index 218f8e8..86be9c1 100644 >> --- a/kernel/sched/fair.c >> +++ b/kernel/sched/fair.c >> @@ -8459,6 +8459,16 @@ void init_cfs_rq(struct cfs_rq *cfs_rq) >> cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime; >> #endif >> #ifdef CONFIG_SMP >> + /* >> + * Set last_update_time to something different from 0 to make >> + * sure the 1st sched_entity will not be attached twice: once >> + * when attaching the task to the group and one more time when >> + * enqueueing the task. 
>> + */ >> + cfs_rq->avg.last_update_time = 1; >> +#ifndef CONFIG_64BIT >> + cfs_rq->load_last_update_time_copy = 1; >> +#endif >> atomic_long_set(&cfs_rq->removed_load_avg, 0); >> atomic_long_set(&cfs_rq->removed_util_avg, 0); >> #endif > > Then, when enqueued, both cfs_rq and task will be decayed to 0, due to > a large gap between 1 and now, no? yes, like it is done currently (but 1ns later). ^ permalink raw reply [flat|nested] 23+ messages in thread
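Yuyang's point about the large gap between 1 and now can be illustrated with a rough model of PELT decay. This is a sketch, not the kernel's implementation: the kernel uses a precomputed y^n lookup table with per-period fixups, while this model keeps only the defining property that a contribution halves every 32 periods of ~1024us:

```c
/* Rough model of PELT decay: a load/util value halves every 32 periods
 * of ~1024us (y^32 = 1/2). Assumption: only the halving matters for the
 * trend; sub-period remainders and the kernel's table are ignored. */
static unsigned long long pelt_decay_sim(unsigned long long val,
					 unsigned long long delta_ns)
{
	unsigned long long periods = delta_ns >> 20; /* ~1024us periods */
	unsigned long long halvings = periods / 32;  /* y^32 = 1/2 */

	if (halvings >= 64)	/* shifted past the word width */
		return 0;
	return val >> halvings;
}
```

Starting from last_update_time = 1, a first enqueue that happens even a couple of seconds later spans thousands of periods, so any stale contribution decays to 0 before the task is added: for instance `pelt_decay_sim(1024, 2ULL * 1000 * 1000 * 1000)` (a 2s gap) yields 0. This matches Vincent's reply that the behaviour is the same as what already happens, just 1ns later.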
* Re: [PATCH v3] sched: fix first task of a task group is attached twice 2016-06-16 7:12 ` Vincent Guittot @ 2016-06-15 23:24 ` Yuyang Du 2016-06-16 9:42 ` Vincent Guittot 0 siblings, 1 reply; 23+ messages in thread From: Yuyang Du @ 2016-06-15 23:24 UTC (permalink / raw) To: Vincent Guittot Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Dietmar Eggemann On Thu, Jun 16, 2016 at 09:12:58AM +0200, Vincent Guittot wrote: > > Then, when enqueued, both cfs_rq and task will be decayed to 0, due to > > a large gap between 1 and now, no? > > yes, like it is done currently (but 1ns later) . Well, currently, cfs_rq will be decayed to 0, but will then add the task. So it turns out the current result is right. Attached twice, but result is right. Correct? ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v3] sched: fix first task of a task group is attached twice 2016-06-15 23:24 ` Yuyang Du @ 2016-06-16 9:42 ` Vincent Guittot 0 siblings, 0 replies; 23+ messages in thread From: Vincent Guittot @ 2016-06-16 9:42 UTC (permalink / raw) To: Yuyang Du; +Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Dietmar Eggemann On 16 June 2016 at 01:24, Yuyang Du <yuyang.du@intel.com> wrote: > On Thu, Jun 16, 2016 at 09:12:58AM +0200, Vincent Guittot wrote: >> > Then, when enqueued, both cfs_rq and task will be decayed to 0, due to >> > a large gap between 1 and now, no? >> >> yes, like it is done currently (but 1ns later). > > Well, currently, cfs_rq will be decayed to 0, but will then add the task. > So it turns out the current result is right. Attached twice, but result > is right. Correct? So the load looks accidentally correct, but not the utilization, which will be overestimated up to twice the max. With this change, the behavior of the 1st task becomes the same as what happens to the other tasks that will be attached to a task group that has been idle for a while and that has an old last_update_time. Then, I have another pending patch to fix this behavior, as mentioned in a previous email. ^ permalink raw reply [flat|nested] 23+ messages in thread
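Vincent's "overestimated up to twice the max" can be made concrete with a small sketch. The assumption here (simplified from the real code, which also decays between updates) is that each attach adds the entity's full utilization contribution to the group sum, so a double attach double-counts it:

```c
#define SCHED_CAPACITY_SIM 1024 /* max utilization of one fully busy CPU */

/* Sum the same entity's util contribution once per attach. With the
 * double attach described in the thread, a fully busy task (util near
 * 1024) makes the group report close to 2048 -- twice the max. */
static unsigned long sim_group_util(unsigned long se_util, int attaches)
{
	unsigned long util = 0;
	int i;

	for (i = 0; i < attaches; i++)
		util += se_util;
	return util;
}
```

The load sum does not show the same inflation because, as Yuyang notes, the stale group load is decayed to 0 before the task's load is added back, so the double attach happens to cancel out there.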
* Re: [PATCH v2] sched: fix first task of a task group is attached twice 2016-05-27 20:38 ` Dietmar Eggemann 2016-05-30 7:04 ` Vincent Guittot @ 2016-05-30 15:54 ` Vincent Guittot 1 sibling, 0 replies; 23+ messages in thread From: Vincent Guittot @ 2016-05-30 15:54 UTC (permalink / raw) To: Dietmar Eggemann; +Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Yuyang Du On 27 May 2016 at 22:38, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote: > On 27/05/16 18:16, Vincent Guittot wrote: >> On 27 May 2016 at 17:48, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote: >>> On 25/05/16 16:01, Vincent Guittot wrote: >>>> The cfs_rq->avg.last_update_time is initialize to 0 with the main effect >>>> that the 1st sched_entity that will be attached, will keep its >>>> last_update_time set to 0 and will attached once again during the >>>> enqueue. >>>> Initialize cfs_rq->avg.last_update_time to 1 instead. >>>> >>>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> >>>> --- >>>> >>>> v2: >>>> - rq_clock_task(rq_of(cfs_rq)) can't be used because lock is not held >>>> >>>> kernel/sched/fair.c | 8 ++++++++ >>>> 1 file changed, 8 insertions(+) >>>> >>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >>>> index 218f8e8..3724656 100644 >>>> --- a/kernel/sched/fair.c >>>> +++ b/kernel/sched/fair.c >>>> @@ -8586,6 +8586,14 @@ void init_tg_cfs_entry(struct task_group *tg, struct cfs_rq *cfs_rq, >>>> se->depth = parent->depth + 1; >>>> } >>>> >>>> + /* >>>> + * Set last_update_time to something different from 0 to make >>>> + * sure the 1st sched_entity will not be attached twice: once >>>> + * when attaching the task to the group and one more time when >>>> + * enqueueing the task. 
>>>> + */ >>>> + tg->cfs_rq[cpu]->avg.last_update_time = 1; >>>> + > > Couldn't you just set the value in init_cfs_rq(): > > @@ -8482,6 +8482,7 @@ void init_cfs_rq(struct cfs_rq *cfs_rq) > cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime; > #endif > #ifdef CONFIG_SMP > + cfs_rq->avg.last_update_time = 1; > atomic_long_set(&cfs_rq->removed_load_avg, 0); > atomic_long_set(&cfs_rq->removed_util_avg, 0); > #endif > >>>> se->my_q = cfs_rq; >>>> /* guarantee group entities always have weight */ >>>> update_load_set(&se->load, NICE_0_LOAD); >>> >>> So why not setting the last_update_time value for those cfs_rq's when >>> we have the lock? E.g. in task_move_group_fair() or attach_task_cfs_rq(). >> >> I'm not sure that it's worth adding this init in functions that are >> then used often only for the init of it. > > Yeah, there will be this if condition overhead. > >> If you are concerned by the update of the load of the 1st task that >> will be attached, it can still have elapsed a long time between the >> creation of the group and the 1st enqueue of a task. This was the case >> for the test i did when i found this issue. > > Understood, but for me, creation of the task group is > cpu_cgroup_css_alloc -> sched_create_group() -> ... -> init_cfs_rq(), > init_tg_cfs_entry(), ... > > and the functions which are called when the first task is put into the > task group are cpu_cgroup_attach() and cpu_cgroup_fork() and they would > trigger the initial setup of the cfs_rq->avg.last_update_time. 
Adding a test and the init of cfs_rq->avg.last_update_time in cpu_cgroup_attach() and cpu_cgroup_fork(), in order to have an almost up-to-date cfs_rq->avg.last_update_time at creation, will only solve part of a wider issue that happens when moving a task to a cfs_rq that has not been updated for a while (not only since the creation but also since the last update of a blocked cfs_rq). I have another pending patch for this kind of issue that I haven't sent yet. > >> Besides this point, I have to send a new version to set >> load_last_update_time_copy for non-64-bit systems. Fengguang pointed >> the issue out to me > > OK. > > [...] >>> >>> + if (!cfs_rq->avg.last_update_time) >>> + cfs_rq->avg.last_update_time = rq_clock_task(rq_of(cfs_rq)); >>> + >>> /* Synchronize task with its cfs_rq */ >>> attach_entity_load_avg(cfs_rq, se); >> ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2016-06-16 9:42 UTC | newest] Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-05-24 13:08 [PATCH] sched: fix first task of a task group is attached twice Vincent Guittot 2016-05-25 15:01 ` [PATCH v2] " Vincent Guittot 2016-05-25 22:38 ` Yuyang Du 2016-05-26 8:26 ` Vincent Guittot 2016-05-26 0:40 ` Yuyang Du 2016-05-26 8:51 ` Vincent Guittot 2016-05-27 15:48 ` Dietmar Eggemann 2016-05-27 17:16 ` Vincent Guittot 2016-05-27 20:38 ` Dietmar Eggemann 2016-05-30 7:04 ` Vincent Guittot 2016-05-30 15:52 ` [PATCH v3] " Vincent Guittot 2016-05-30 19:48 ` Yuyang Du 2016-05-31 7:28 ` Vincent Guittot 2016-05-31 0:44 ` Yuyang Du 2016-06-01 15:31 ` Dietmar Eggemann 2016-06-01 15:54 ` Vincent Guittot 2016-06-06 19:32 ` Dietmar Eggemann 2016-06-07 7:35 ` Vincent Guittot 2016-06-15 19:19 ` Yuyang Du 2016-06-16 7:12 ` Vincent Guittot 2016-06-15 23:24 ` Yuyang Du 2016-06-16 9:42 ` Vincent Guittot 2016-05-30 15:54 ` [PATCH v2] " Vincent Guittot