Re: [PATCH] sched/fair: cfs quota cause large schedule latency

From: Peter Zijlstra @ 2018-07-20 13:15 UTC
  To: Xiexiangyou
  Cc: linux-kernel, pjt, tglx, efault, akpm, vincent.guittot,
	Huangweidong (C), weiqi (C),
	longpeng

On Mon, Jul 16, 2018 at 07:08:41AM +0000, Xiexiangyou wrote:
> A virtual machine has the following cgroup hierarchy:
>
>                root
>                 |
>               vm_tg
>               (cfs_rq)
>               /    \
>             (se)    (se)
>             tg_A    tg_B
>           (cfs_rq)    (cfs_rq)
>             /          \
>           (se)          (se)
>           a              b
>
> 'a' and 'b' are two vCPU threads of the VM.
>
> We set a CFS quota on vm_tg, and the scheduling latency of the vCPUs
> ('a'/'b') can become very large, sometimes exceeding 2 s.
>
> Output of perf sched latency:
>
>   Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Maximum delay at       |
>   -----------------------------------------------------------------------------------------------------------------
>   CPU 0/KVM:49609       |    260.261 ms |       50 | avg:   82.017 ms | max: 2510.990 ms | max at:  43335.555886 s
>   .....
>
> We added some trace points and found that the following sequence leads
> to the issue:
>
> - 'a' is the only task of tg_A. When 'a' goes to sleep, tg_A is
>   dequeued, and tg_A->se->load.weight = MIN_SHARES.
> - 'b' keeps running and exhausts the quota, so vm_tg is throttled and
>   tg_A->cfs_rq->throttle_count becomes 1.
> - Some task wakes up 'a'. When tg_A is enqueued, tg_A->se->load.weight
>   cannot be updated because tg_A->cfs_rq->throttle_count = 1.
> - After one CFS quota period, vm_tg is unthrottled.
> - 'a' runs.
> - After one tick, when tg_A->se's vruntime is updated,
>   tg_A->se->load.weight is still MIN_SHARES, so tg_A->se's vruntime
>   grows by a very large amount.
> - As a result, 'a' sees a large scheduling latency.
>
> The fix patch is as follows:
> 
> Signed-off-by: Xiangyou Xie <xiexiangyou@huawei.com<mailto:xiexiangyou@huawei.com>>

The above Changelog violates just about every formatting rule ever
invented. Also you got your email format wrong.

The patch might be OK, but at this point I really can't do anything with
it anyway.

> ---
> kernel/sched/fair.c | 3 ---
> 1 file changed, 3 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 2f0a0be..348ccd6 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3016,9 +3016,6 @@ static void update_cfs_group(struct sched_entity *se)
>         if (!gcfs_rq)
>                 return;
> 
> -       if (throttled_hierarchy(gcfs_rq))
> -               return;
> -
> #ifndef CONFIG_SMP
>         runnable = shares = READ_ONCE(gcfs_rq->tg->shares);
> 
> --
> 1.8.3.1
> 
