All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dietmar Eggemann <dietmar.eggemann@arm.com>
To: Miguel de Dios <migueldedios@google.com>,
	Steve Muckle <smuckle@google.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>
Cc: linux-kernel@vger.kernel.org, kernel-team@android.com,
	Todd Kjos <tkjos@google.com>, Paul Turner <pjt@google.com>,
	Quentin Perret <quentin.perret@arm.com>,
	Patrick Bellasi <Patrick.Bellasi@arm.com>,
	Chris Redpath <Chris.Redpath@arm.com>,
	Morten Rasmussen <Morten.Rasmussen@arm.com>,
	John Dias <joaodias@google.com>
Subject: Re: [PATCH] sched/fair: vruntime should normalize when switching from fair
Date: Thu, 23 Aug 2018 18:52:08 +0200	[thread overview]
Message-ID: <b27b2354-cbfe-7c16-bdc1-d88ed3be90bd@arm.com> (raw)
In-Reply-To: <dcd2fa21-7fef-d648-017d-9d42d11793e0@google.com>

Hi,

On 08/21/2018 01:54 AM, Miguel de Dios wrote:
> On 08/17/2018 11:27 AM, Steve Muckle wrote:
>> From: John Dias <joaodias@google.com>
>>
>> When rt_mutex_setprio changes a task's scheduling class to RT,
>> we're seeing cases where the task's vruntime is not updated
>> correctly upon return to the fair class.
>> Specifically, the following is being observed:
>> - task is deactivated while still in the fair class
>> - task is boosted to RT via rt_mutex_setprio, which changes
>>    the task to RT and calls check_class_changed.
>> - check_class_changed leads to detach_task_cfs_rq, at which point
>>    the vruntime_normalized check sees that the task's state is 
>> TASK_WAKING,
>>    which results in skipping the subtraction of the rq's min_vruntime
>>    from the task's vruntime
>> - later, when the prio is deboosted and the task is moved back
>>    to the fair class, the fair rq's min_vruntime is added to
>>    the task's vruntime, even though it wasn't subtracted earlier.
>> The immediate result is inflation of the task's vruntime, giving
>> it lower priority (starving it if there's enough available work).
>> The longer-term effect is inflation of all vruntimes because the
>> task's vruntime becomes the rq's min_vruntime when the higher
>> priority tasks go idle. That leads to a vicious cycle, where
>> the vruntime inflation repeatedly doubled.
>>
>> The change here is to detect when vruntime_normalized is being
>> called when the task is waking but is waking in another class,
>> and to conclude that this is a case where vruntime has not
>> been normalized.
>>
>> Signed-off-by: John Dias <joaodias@google.com>
>> Signed-off-by: Steve Muckle <smuckle@google.com>
>> ---
>>   kernel/sched/fair.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index b39fb596f6c1..14011d7929d8 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -9638,7 +9638,8 @@ static inline bool vruntime_normalized(struct 
>> task_struct *p)
>>        * - A task which has been woken up by try_to_wake_up() and
>>        *   waiting for actually being woken up by sched_ttwu_pending().
>>        */
>> -    if (!se->sum_exec_runtime || p->state == TASK_WAKING)
>> +    if (!se->sum_exec_runtime ||
>> +        (p->state == TASK_WAKING && p->sched_class == 
>> &fair_sched_class))
>>           return true;
>>       return false;
> The normalization of vruntime used to exist in task_waking but it was 
> removed and the normalization was moved into migrate_task_rq_fair. The 
> reasoning being that task_waking_fair was only hit when a task is queued 
> onto a different core and migrate_task_rq_fair should do the same work.
> 
> However, we're finding that there's one case which migrate_task_rq_fair 
> doesn't hit: that being the case where rt_mutex_setprio changes a task's 
> scheduling class to RT when its scheduled out. The task never hits 
> migrate_task_rq_fair because it is switched to RT and migrates as an RT 
> task. Because of this we're getting an unbounded addition of 
> min_vruntime when the task is re-attached to the CFS runqueue when it 
> loses the inherited priority. The patch above works because now the 
> kernel specifically checks for this case and normalizes accordingly.
> 
> Here's the patch I was talking about: 
> https://lore.kernel.org/patchwork/patch/677689/. In our testing we were 
> seeing vruntimes nearly double every time after rt_mutex_setprio boosts 
> the task to RT.
> 
> Signed-off-by: Miguel de Dios <migueldedios@google.com>
> Tested-by: Miguel de Dios <migueldedios@google.com>

I tried to catch this issue on my Arm64 Juno board using pi_test (and a 
slightly adapted pip_test (usleep_val = 1500 and keep low as cfs)) from 
rt-tests but wasn't able to do so.

# pi_stress --inversions=1 --duration=1 --groups=1 --sched id=low,policy=cfs

Starting PI Stress Test
Number of thread groups: 1
Duration of test run: 1 seconds
Number of inversions per group: 1
      Admin thread SCHED_FIFO priority 4
1 groups of 3 threads will be created
       High thread SCHED_FIFO priority 3
        Med thread SCHED_FIFO priority 2
        Low thread SCHED_OTHER nice 0

# ./pip_stress

In both cases, the cfs task entering  rt_mutex_setprio() is queued, so 
dequeue_task_fair()->dequeue_entity(), which subtracts 
cfs_rq->min_vruntime from se->vruntime, is called on it before it gets 
the rt prio.

Maybe it requires a very specific use of the pthread library to provoke 
this issue by making sure that the cfs tasks really blocks/sleeps?

  reply	other threads:[~2018-08-23 16:52 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-17 18:27 [PATCH] sched/fair: vruntime should normalize when switching from fair Steve Muckle
2018-08-20 23:54 ` Miguel de Dios
2018-08-23 16:52   ` Dietmar Eggemann [this message]
2018-08-24  6:54     ` Juri Lelli
2018-08-24 21:17       ` Steve Muckle
2018-09-06 23:25       ` Dietmar Eggemann
2018-09-07  7:16         ` Juri Lelli
2018-09-07  7:58           ` Vincent Guittot
2018-09-11  6:24             ` Dietmar Eggemann
2018-08-24  9:32   ` Peter Zijlstra
2018-08-24  9:47     ` Peter Zijlstra
2018-08-24 21:24       ` Steve Muckle
2018-08-27 11:14         ` Peter Zijlstra
2018-08-28 14:53           ` Dietmar Eggemann
2018-08-29 10:54             ` Dietmar Eggemann
2018-08-29 11:59               ` Peter Zijlstra
2018-08-29 15:33                 ` Dietmar Eggemann
2018-08-31 22:24                   ` Steve Muckle
2018-09-26  9:50             ` Wanpeng Li
2018-09-26 22:38               ` Dietmar Eggemann
2018-09-27  1:19                 ` Wanpeng Li
2018-09-27 13:22                   ` Dietmar Eggemann
2018-09-28  0:43                     ` Wanpeng Li
2018-09-28 16:10                       ` Steve Muckle
2018-09-28 16:45                         ` Joel Fernandes
2018-09-28 17:35                         ` Dietmar Eggemann
2018-09-29  1:07                           ` Wanpeng Li
2018-09-28 17:11                       ` Dietmar Eggemann
2018-09-28 16:43                   ` Joel Fernandes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b27b2354-cbfe-7c16-bdc1-d88ed3be90bd@arm.com \
    --to=dietmar.eggemann@arm.com \
    --cc=Chris.Redpath@arm.com \
    --cc=Morten.Rasmussen@arm.com \
    --cc=Patrick.Bellasi@arm.com \
    --cc=joaodias@google.com \
    --cc=kernel-team@android.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=migueldedios@google.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=quentin.perret@arm.com \
    --cc=smuckle@google.com \
    --cc=tkjos@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.