From: Vincent Guittot
Date: Wed, 3 Jan 2024 14:41:03 +0100
Subject: Re: [PATCH 1/4] sched/fair: Be less aggressive in calling cpufreq_update_util()
To: Qais Yousef
Cc: Ingo Molnar, Peter Zijlstra, "Rafael J. Wysocki", Viresh Kumar,
	Dietmar Eggemann, linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
	Lukasz Luba, Wei Wang, Rick Yiu, Chung-Kai Mei, Hongyan Xia
In-Reply-To: <20231229002529.sidy6wxmclhzlzib@airbuntu>
References: <20231208015242.385103-1-qyousef@layalina.io>
	<20231208015242.385103-2-qyousef@layalina.io>
	<20231212124037.at47izy5xp6lsxh2@airbuntu>
	<20231229002529.sidy6wxmclhzlzib@airbuntu>
Wysocki" , Viresh Kumar , Dietmar Eggemann , linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Lukasz Luba , Wei Wang , Rick Yiu , Chung-Kai Mei , Hongyan Xia Content-Type: text/plain; charset="UTF-8" On Fri, 29 Dec 2023 at 01:25, Qais Yousef wrote: > > On 12/12/23 12:40, Qais Yousef wrote: > > On 12/12/23 12:06, Vincent Guittot wrote: > > > > > > @@ -6772,6 +6737,8 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags) > > > > enqueue_throttle: > > > > assert_list_leaf_cfs_rq(rq); > > > > > > > > > > Here and in the other places below, you lose : > > > > > > - } else if (decayed) { > > > > > > The decayed condition ensures a rate limit (~1ms) in the number of > > > calls to cpufreq_update_util. > > > > > > enqueue/dequeue/tick don't create any sudden change in the PELT > > > signals that would require to update cpufreq of the change unlike > > > attach/detach > > > > Okay, thanks for the clue. Let me rethink this again. > > Thinking more about this. Do we really need to send freq updates at > enqueue/attach etc? Yes, attach and detach are the 2 events which can make abrupt and significant changes in the utilization of the CPU. > > I did an experiment to remove all the updates except in three places: > > 1. Context switch (done unconditionally) > 2. Tick > 2. update_blocked_averages() >From the PoV of util_avg, attach, detach, tick and update_blocked_averages are mandatory events to report to cpufreq to correctly follow utilization. > > I tried the below patch on mac mini with m1 SoC, 6.6 kernel; speedometers > scores were the same (against this series). > > I ran perf bench sched pipe to see if the addition in context switch will make > things worse > > before (this series applied): > > # Running 'sched/pipe' benchmark: > # Executed 1000000 pipe operations between two processes > > Total time: 20.505 [sec] > > 20.505628 usecs/op > 48767 ops/sec > > after (proposed patch below applied on top): > > # Running 'sched/pipe' benchmark: > # Executed 1000000 pipe operations between two processes > > Total time: 19.475 [sec] > > 19.475838 usecs/op > 51345 ops/sec > > I tried against vanilla kernel (without any patches, including some dependency > backports) using schedutil governor > > # Running 'sched/pipe' benchmark: > # Executed 1000000 pipe operations between two processes > > Total time: 22.428 [sec] > > 22.428166 usecs/op > 44586 ops/sec > > (I got higher run to run variance against this kernel) > > So things got better. I think we can actually unify all updates to be at > context switch and tick for all classes. > > The one hole is for long running CFS tasks without context switch; no updates > until tick this means the dvfs headroom needs to cater for that as worst case > scenario now. I think this is the behavior today actually; but accidental > updates due to enqueue/dequeue could have helped to issue more updates. If none > of that happens, then updating load_avg at entity_tick() is what would have > triggered a frequency update. > > I'm not sure if the blocked average one is required to be honest. But feared > that when the cpu goes to idle we might miss reducing frequencies vote for that > CPU - which matters on shared cpufreq policies. > > I haven't done comprehensive testing of course. But would love to hear any > thoughts on how we can be more strategic and reduce the number of cpufreq > updates we send. And iowait boost state needs to be verified. 
>
> While testing this series more I did notice that short kworker context
> switches can still cause the frequency to go high. I traced this back to
> __balance_callbacks() in finish_lock_switch() after calling
> uclamp_context_switch(). So I think we do have a problem of updates being
> 'accidental', and we need to improve the interaction with the governor to
> be more intentional, keeping in mind all the constraints we have today in
> h/w and software.
>
> --->8---
>
>
> From 6deed09be1d075afa3858ca62dd54826cdb60d44 Mon Sep 17 00:00:00 2001
> From: Qais Yousef
> Date: Tue, 26 Dec 2023 01:23:57 +0000
> Subject: [PATCH] sched/fair: Update freq on tick and context switch and
>  blocked avgs
>
> Signed-off-by: Qais Yousef (Google)
> ---
>  kernel/sched/cpufreq_schedutil.c |  3 ---
>  kernel/sched/fair.c              | 13 +------------
>  kernel/sched/sched.h             | 15 +--------------
>  3 files changed, 2 insertions(+), 29 deletions(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index c0879a985097..553a3d7f02d8 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -166,9 +166,6 @@ static inline bool ignore_short_tasks(int cpu,
>  	struct task_struct *p = cpu_rq(cpu)->curr;
>  	unsigned long task_util;
>
> -	if (!(flags & SCHED_CPUFREQ_PERF_HINTS))
> -		return false;
> -
>  	if (!fair_policy(p->policy))
>  		return false;
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index d63eae534cec..3a30f78b37d3 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5717,8 +5717,6 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
>  	sub_nr_running(rq, task_delta);
>
>  done:
> -	cpufreq_update_util(rq, 0);
> -
>  	/*
>  	 * Note: distribution will already see us throttled via the
>  	 * throttled-list.  rq->lock protects completion.
> @@ -5811,8 +5809,6 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
>  unthrottle_throttle:
>  	assert_list_leaf_cfs_rq(rq);
>
> -	cpufreq_update_util(rq, 0);
> -
>  	/* Determine whether we need to wake up potentially idle CPU: */
>  	if (rq->curr == rq->idle && rq->cfs.nr_running)
>  		resched_curr(rq);
> @@ -6675,8 +6671,6 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>  enqueue_throttle:
>  	assert_list_leaf_cfs_rq(rq);
>
> -	cpufreq_update_util(rq, p->in_iowait ? SCHED_CPUFREQ_IOWAIT : 0);
> -
>  	hrtick_update(rq);
>  }
>
> @@ -6754,7 +6748,6 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>
>  dequeue_throttle:
>  	util_est_update(&rq->cfs, p, task_sleep);
> -	cpufreq_update_util(rq, 0);
>  	hrtick_update(rq);
>  }
>
> @@ -8338,7 +8331,6 @@ done: __maybe_unused;
>
>  	update_misfit_status(p, rq);
>  	sched_fair_update_stop_tick(rq, p);
> -	cpufreq_update_util(rq, 0);
>
>  	return p;
>
> @@ -12460,7 +12452,7 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
>
>  	update_misfit_status(curr, rq);
>  	update_overutilized_status(task_rq(curr));
> -	cpufreq_update_util(rq, SCHED_CPUFREQ_PERF_HINTS);
> +	cpufreq_update_util(rq, current->in_iowait ? SCHED_CPUFREQ_IOWAIT : 0);
>
>  	task_tick_core(rq, curr);
>  }
> @@ -12585,7 +12577,6 @@ static void detach_task_cfs_rq(struct task_struct *p)
>  	struct sched_entity *se = &p->se;
>
>  	detach_entity_cfs_rq(se);
> -	cpufreq_update_util(task_rq(p), 0);
>  }
>
>  static void attach_task_cfs_rq(struct task_struct *p)
> @@ -12593,7 +12584,6 @@ static void attach_task_cfs_rq(struct task_struct *p)
>  	struct sched_entity *se = &p->se;
>
>  	attach_entity_cfs_rq(se);
> -	cpufreq_update_util(task_rq(p), 0);
>  }
>
>  static void switched_from_fair(struct rq *rq, struct task_struct *p)
> @@ -12839,7 +12829,6 @@ static int __sched_group_set_shares(struct task_group *tg, unsigned long shares)
>  		update_load_avg(cfs_rq_of(se), se, UPDATE_TG);
>  		update_cfs_group(se);
>  	}
> -	cpufreq_update_util(rq, 0);
>  	rq_unlock_irqrestore(rq, &rf);
>  }
>
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 516187ea2b81..e1622e2b82be 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -3076,20 +3076,7 @@ static inline bool uclamp_rq_is_capped(struct rq *rq)
>  /* Request freq update on context switch if necessary */
>  static inline void uclamp_context_switch(struct rq *rq)
>  {
> -	unsigned long uclamp_min;
> -	unsigned long uclamp_max;
> -	unsigned long util;
> -
> -	/* Only RT and FAIR tasks are aware of uclamp */
> -	if (!rt_policy(current->policy) && !fair_policy(current->policy))
> -		return;
> -
> -	uclamp_min = uclamp_eff_value(current, UCLAMP_MIN);
> -	uclamp_max = uclamp_eff_value(current, UCLAMP_MAX);
> -	util = rq->cfs.avg.util_avg;
> -
> -	if (uclamp_min > util || uclamp_max < util)
> -		cpufreq_update_util(rq, SCHED_CPUFREQ_PERF_HINTS);
> +	cpufreq_update_util(rq, current->in_iowait ? SCHED_CPUFREQ_IOWAIT : 0);
>  }
>  #else /* CONFIG_UCLAMP_TASK */
>  static inline unsigned long uclamp_eff_value(struct task_struct *p,
> --
> 2.40.1
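
For context on the "} else if (decayed)" hunk referenced at the top of the
thread, below is a simplified sketch of the update_load_avg() pattern in
question - abridged and approximate rather than the exact upstream code (for
instance, upstream issues the notification from inside the attach/detach
helpers). Attach and detach propagate their abrupt utilization change to
cpufreq unconditionally, while the routine enqueue/dequeue/tick paths are
gated on the PELT signal actually having decayed, which can happen at most
roughly once per 1024us period - hence the ~1ms rate limit.

static void example_update_load_avg(struct cfs_rq *cfs_rq,
				    struct sched_entity *se, int flags)
{
	u64 now = cfs_rq_clock_pelt(cfs_rq);
	int decayed;

	/* Non-zero only when the averages crossed a PELT period boundary. */
	decayed  = update_cfs_rq_load_avg(now, cfs_rq);
	decayed |= propagate_entity_load_avg(se);

	if (!se->avg.last_update_time && (flags & DO_ATTACH)) {
		/* Abrupt utilization change: always notify cpufreq. */
		attach_entity_load_avg(cfs_rq, se);
		cfs_rq_util_change(cfs_rq, 0);
	} else if (flags & DO_DETACH) {
		/* Same on detach. */
		detach_entity_load_avg(cfs_rq, se);
		cfs_rq_util_change(cfs_rq, 0);
	} else if (decayed) {
		/* Routine enqueue/dequeue/tick: at most ~1 update per ms. */
		cfs_rq_util_change(cfs_rq, 0);
	}
}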