From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755332AbcHSBnG (ORCPT ); Thu, 18 Aug 2016 21:43:06 -0400
Received: from mail-wm0-f54.google.com ([74.125.82.54]:35771 "EHLO
	mail-wm0-f54.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754528AbcHSBnD (ORCPT );
	Thu, 18 Aug 2016 21:43:03 -0400
MIME-Version: 1.0
In-Reply-To: <20160818134517.GC27873@e105550-lin.cambridge.arm.com>
References: <1469453670-2660-1-git-send-email-morten.rasmussen@arm.com>
	<1469453670-2660-11-git-send-email-morten.rasmussen@arm.com>
	<20160815142342.GV6879@twins.programming.kicks-ass.net>
	<20160815154237.GE3391@e105550-lin.cambridge.arm.com>
	<20160818084053.GG3391@e105550-lin.cambridge.arm.com>
	<20160818102438.GA27873@e105550-lin.cambridge.arm.com>
	<20160818134517.GC27873@e105550-lin.cambridge.arm.com>
From: Wanpeng Li
Date: Fri, 19 Aug 2016 09:43:00 +0800
Message-ID:
Subject: Re: [PATCH v3 10/13] sched/fair: Compute task/cpu utilization at wake-up more correctly
To: Morten Rasmussen
Cc: Peter Zijlstra, Ingo Molnar, Dietmar Eggemann, Yuyang Du,
	Vincent Guittot, Mike Galbraith, sgurrappadi@nvidia.com,
	Koan-Sin Tan, 小林敬太, "linux-kernel@vger.kernel.org"
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

2016-08-18 21:45 GMT+08:00 Morten Rasmussen:
> On Thu, Aug 18, 2016 at 07:46:44PM +0800, Wanpeng Li wrote:
>> 2016-08-18 18:24 GMT+08:00 Morten Rasmussen:
>> > On Thu, Aug 18, 2016 at 09:40:55AM +0100, Morten Rasmussen wrote:
>> >> On Mon, Aug 15, 2016 at 04:42:37PM +0100, Morten Rasmussen wrote:
>> >> > On Mon, Aug 15, 2016 at 04:23:42PM +0200, Peter Zijlstra wrote:
>> >> > > But unlike that function, it doesn't actually use __update_load_avg().
>> >> > > Why not?
>> >> >
>> >> > Fair question :)
>> >> >
>> >> > We currently exploit the fact that the task utilization is _not_ updated
>> >> > in wake-up balancing to make sure we don't under-estimate the capacity
>> >> > requirements for tasks that have slept for a while. If we update it, we
>> >> > lose the non-decayed 'peak' utilization, but I guess we could just
>> >> > store it somewhere when we do the wake-up decay.
>> >> >
>> >> > I thought there was a better reason when I wrote the patch, but I don't
>> >> > recall right now. I will look into it again and see if we can use
>> >> > __update_load_avg() to do a proper update instead of doing things twice.
>> >>
>> >> AFAICT, we should be able to synchronize the task utilization to the
>> >> previous rq utilization using __update_load_avg() as you suggest. The
>> >> patch below should work as a replacement without any changes to
>> >> subsequent patches. It doesn't solve the under-estimation issue, but I
>> >> have another patch for that.
>> >
>> > And here is a possible solution to the under-estimation issue. The patch
>> > would have to go at the end of this set.
>> >
>> > ---8<---
>> >
>> > From 5bc918995c6c589b833ba1f189a8b92fa22202ae Mon Sep 17 00:00:00 2001
>> > From: Morten Rasmussen
>> > Date: Wed, 17 Aug 2016 15:30:43 +0100
>> > Subject: [PATCH] sched/fair: Track peak per-entity utilization
>> >
>> > When using PELT (per-entity load tracking) utilization to place tasks at
>> > wake-up, using the decayed utilization (due to sleep) leads to
>> > under-estimation of the true utilization of the task. This could mean
>> > putting the task on a cpu with less available capacity than is actually
>> > needed. This issue can be mitigated by using 'peak' utilization instead
>> > of the decayed utilization for placement decisions, e.g. at task
>> > wake-up.
>> >
>> > The 'peak' utilization metric, util_peak, tracks util_avg when the task
>> > is running and retains its previous value while the task is
>> > blocked/waiting on the rq. It is instantly updated to track util_avg
>> > again as soon as the task is running again.
>>
>> Maybe this will disable wake affine because of a spiky peak value
>> for a task with a low average load.
>
> I assume you are referring to using task_util_peak() instead of
> task_util() in wake_cap()?

Yes.

>
> The peak value should never exceed the util_avg accumulated by the task
> last time it ran. So any spike has to be caused by the task accumulating
> more utilization last time it ran. We don't know if it is a spike or a more

I see.

> permanent change in behaviour, so we have to guess. So a spike on an
> asymmetric system could cause us to disable wake affine in some
> circumstances (either prev_cpu or waker cpu has to be low compute
> capacity) for the following wake-up.
>
> SMP should be unaffected as we should bail out on the previous
> condition.

Why use capacity_orig instead of capacity? The check runs at every
wake-up, and the rt class or interrupts may already have consumed a large
share of the cpu's capacity.

>
> The counter-example is a task with a fairly long busy period and a much
> longer period (cycle). Its util_avg might have decayed away since the
> last activation so it appears very small at wake-up and we end up
> putting it on a low capacity cpu every time even though it keeps the cpu
> busy for a long time every time it wakes up.

Agreed, that's the reason for the under-estimation concern.

>
> Did that answer your question?

Yeah, thanks for the clarification.

Regards,
Wanpeng Li
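
For anyone following the thread, the util_peak behaviour described in the
quoted patch (follow util_avg while the task runs, hold the last value while
it is blocked) can be sketched as a small userspace model. This is only a toy
under stated assumptions, not the kernel code: struct util_model,
util_model_update() and util_model_demo() are hypothetical names, one period
is taken to be roughly 1 ms, the task is assumed fully busy whenever it runs,
and the decay factor is chosen so that a contribution halves after 32
periods, as in PELT.

```c
#include <assert.h>
#include <stdio.h>

/* Userspace toy model, not kernel code. All names here are hypothetical. */

#define MODEL_UTIL_MAX 1024.0      /* full utilization scale */
#define MODEL_DECAY_Y  0.97857206  /* y chosen so that y^32 is about 0.5 */

struct util_model {
	double util_avg;   /* geometrically decayed utilization */
	double util_peak;  /* util_avg as last seen while running */
};

/* Advance the model by 'periods' periods of roughly 1 ms each. */
static void util_model_update(struct util_model *um, int running,
			      unsigned int periods)
{
	assert(um != NULL);

	while (periods--) {
		/* Old contributions decay every period, running or not. */
		um->util_avg *= MODEL_DECAY_Y;

		if (running) {
			/* A fully busy period adds its share of the scale. */
			um->util_avg += MODEL_UTIL_MAX * (1.0 - MODEL_DECAY_Y);
			/* The peak follows util_avg only while running. */
			um->util_peak = um->util_avg;
		}
		/* While blocked, util_peak keeps its previous value. */
	}
}

/* Print both signals after a busy phase and again after a long sleep. */
static void util_model_demo(void)
{
	struct util_model um = { 0.0, 0.0 };

	util_model_update(&um, 1, 64);   /* busy for about 64 ms */
	printf("after run:   util_avg=%.0f util_peak=%.0f\n",
	       um.util_avg, um.util_peak);

	util_model_update(&um, 0, 320);  /* asleep for about 320 ms */
	printf("after sleep: util_avg=%.0f util_peak=%.0f\n",
	       um.util_avg, um.util_peak);
}
```

Calling util_model_demo() from a main() shows the counter-example from the
thread: after the long sleep util_avg has decayed to almost nothing, while
util_peak still reflects the last busy phase. The first running period after
wake-up pulls util_peak straight back down to the decayed util_avg, which
matches the "instantly updated" wording in the patch description.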