Date: Tue, 5 Jun 2018 15:15:18 +0200
From: Juri Lelli
To: Quentin Perret
Cc: Vincent Guittot, Peter Zijlstra, Ingo Molnar, linux-kernel,
 "Rafael J. Wysocki", Dietmar Eggemann, Morten Rasmussen, Viresh Kumar,
 Valentin Schneider
Subject: Re: [PATCH v5 00/10] track CPU utilization
Message-ID: <20180605131518.GG16081@localhost.localdomain>
References: <1527253951-22709-1-git-send-email-vincent.guittot@linaro.org>
 <20180605105721.GA12193@e108498-lin.cambridge.arm.com>
 <20180605121153.GD16081@localhost.localdomain>
 <20180605130548.GB12193@e108498-lin.cambridge.arm.com>
In-Reply-To: <20180605130548.GB12193@e108498-lin.cambridge.arm.com>

On 05/06/18 14:05, Quentin Perret wrote:
> On Tuesday 05 Jun 2018 at 14:11:53 (+0200), Juri Lelli wrote:
> > Hi Quentin,
> > 
> > On 05/06/18 11:57, Quentin Perret wrote:
> > 
> > [...]
> > 
> > > What about the diff below (just a quick hack to show the idea) applied
> > > on tip/sched/core ?
> > > 
> > > ---8<---
> > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > > index a8ba6d1f262a..23a4fb1c2c25 100644
> > > --- a/kernel/sched/cpufreq_schedutil.c
> > > +++ b/kernel/sched/cpufreq_schedutil.c
> > > @@ -180,9 +180,12 @@ static void sugov_get_util(struct sugov_cpu *sg_cpu)
> > >          sg_cpu->util_dl = cpu_util_dl(rq);
> > >  }
> > > 
> > > +unsigned long scale_rt_capacity(int cpu);
> > >  static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
> > >  {
> > >          struct rq *rq = cpu_rq(sg_cpu->cpu);
> > > +        int cpu = sg_cpu->cpu;
> > > +        unsigned long util, dl_bw;
> > > 
> > >          if (rq->rt.rt_nr_running)
> > >                  return sg_cpu->max;
> > > @@ -197,7 +200,14 @@ static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
> > >           * util_cfs + util_dl as requested freq. However, cpufreq is not yet
> > >           * ready for such an interface. So, we only do the latter for now.
> > >           */
> > > -        return min(sg_cpu->max, (sg_cpu->util_dl + sg_cpu->util_cfs));
> > > +        util = arch_scale_cpu_capacity(NULL, cpu) * scale_rt_capacity(cpu);
> > 
> > Sorry to be pedantic, but this (ATM) includes the DL avg contribution, so,
> > since we use max below, we will probably have the same problem that we
> > discussed on Vincent's approach (overestimation of the DL contribution
> > while we could use running_bw).
> 
> Ah no, you're right, this isn't great for long-running deadline tasks.
> We should definitely account for the running_bw here, not the dl avg...
> 
> I was trying to address the issue of RT stealing time from CFS here, but
> the DL integration isn't quite right with this patch as-is, I agree ...
> 
> > 
> > > +        util >>= SCHED_CAPACITY_SHIFT;
> > > +        util = arch_scale_cpu_capacity(NULL, cpu) - util;
> > > +        util += sg_cpu->util_cfs;
> > > +        dl_bw = (rq->dl.this_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT;
> > 
> > Why this_bw instead of running_bw?
> 
> So IIUC, this_bw should basically give you the absolute reservation (== the
> sum of runtime/deadline ratios of all DL tasks on that rq).

Yep.

> The reason I added this max is because I'm still not sure I understand
> how we can safely drop the freq below that point ? If we don't guarantee
> to always stay at least at the freq required by DL, aren't we risking to
> start a deadline task stuck at a low freq because of rate limiting ? In
> this case, if that task uses all of its runtime then you might start
> missing deadlines ...

We decided to avoid (software) rate limiting for DL with e97a90f7069b
("sched/cpufreq: Rate limits for SCHED_DEADLINE").

> My feeling is that the only safe thing to do is to guarantee to never go
> below the freq required by DL, and to optimistically add CFS tasks
> without raising the OPP if we have good reasons to think that DL is
> using less than it required (which is what we should get by using
> running_bw above I suppose). Does that make any sense ?

Still, we can't avoid the hardware limits, so using running_bw is a
trade-off between safety (especially considering soft real-time
scenarios) and energy consumption (which seems to work in practice).

Thanks,

- Juri
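---

For reference, below is a minimal sketch (untested, and not part of any
patch posted in this thread) of how the aggregation from the quoted diff
could look with running_bw substituted for this_bw, as the discussion
converges on. The RT-scaling lines are taken from the quoted diff; the
final min()/max() combination is an assumption based on the "we use max
below" remark:

/*
 * Illustrative sketch, not a posted patch: same RT-pressure scaling as
 * the quoted diff, but the DL term uses running_bw (bandwidth of the
 * currently active DL tasks) instead of this_bw (the full reservation),
 * so a mostly idle reservation does not pin the OPP high.
 */
static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
{
        struct rq *rq = cpu_rq(sg_cpu->cpu);
        int cpu = sg_cpu->cpu;
        unsigned long util, dl_bw;

        if (rq->rt.rt_nr_running)
                return sg_cpu->max;

        /* Capacity left for CFS once RT-stolen time is scaled out. */
        util = arch_scale_cpu_capacity(NULL, cpu) * scale_rt_capacity(cpu);
        util >>= SCHED_CAPACITY_SHIFT;
        util = arch_scale_cpu_capacity(NULL, cpu) - util;
        util += sg_cpu->util_cfs;

        /* Active DL bandwidth, converted to the 1024 capacity scale. */
        dl_bw = (rq->dl.running_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT;

        /* Never request less than DL needs, never more than capacity. */
        return min(sg_cpu->max, max(util, dl_bw));
}

As a worked example of the bandwidth conversion: one DL task with runtime
5ms and deadline 10ms contributes a bandwidth of 0.5 in BW_SHIFT fixed
point, which the shift above maps to 512 on the 1024 capacity scale,
i.e. half a CPU.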