From mboxrd@z Thu Jan  1 00:00:00 1970
From: David Dai
Date: Wed, 5 Apr 2023 16:36:04 -0700
Subject: Re: [RFC PATCH 1/6] sched/fair: Add util_guest for tasks
To: Saravana Kannan
Cc: Dietmar Eggemann, Ingo Molnar, Peter Zijlstra, Juri Lelli,
 Vincent Guittot, Steven Rostedt, Ben Segall, Mel Gorman,
 Daniel Bristot de Oliveira, Valentin Schneider, kernel-team@android.com,
 linux-kernel@vger.kernel.org
References: <20230330224348.1006691-1-davidai@google.com>
 <20230330224348.1006691-2-davidai@google.com>
 <0da58608-2958-fc6c-effb-2f73013c609f@arm.com>

On Wed, Apr 5, 2023 at 2:43 PM Saravana Kannan wrote:
>
> On Wed, Apr 5, 2023 at 3:50 AM Dietmar Eggemann wrote:
> >
> > On 04/04/2023 03:11, David Dai wrote:
> > > On Mon, Apr 3, 2023 at 4:40 AM Dietmar Eggemann wrote:
> > >>
> > >> Hi David,
> > > Hi Dietmar, thanks for your comments.
> > >>
> > >> On 31/03/2023 00:43, David Dai wrote:
> >
> > [...]
> >
> > >>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > >>> index 6986ea31c984..998649554344 100644
> > >>> --- a/kernel/sched/fair.c
> > >>> +++ b/kernel/sched/fair.c
> > >>> @@ -4276,14 +4276,16 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf);
> > >>>
> > >>>  static inline unsigned long task_util(struct task_struct *p)
> > >>>  {
> > >>> -       return READ_ONCE(p->se.avg.util_avg);
> > >>> +       return max(READ_ONCE(p->se.avg.util_avg),
> > >>> +                  READ_ONCE(p->se.avg.util_guest));
> > >>>  }
> > >>>
> > >>>  static inline unsigned long _task_util_est(struct task_struct *p)
> > >>>  {
> > >>>         struct util_est ue = READ_ONCE(p->se.avg.util_est);
> > >>>
> > >>> -       return max(ue.ewma, (ue.enqueued & ~UTIL_AVG_UNCHANGED));
> > >>> +       return max_t(unsigned long, READ_ONCE(p->se.avg.util_guest),
> > >>> +                    max(ue.ewma, (ue.enqueued & ~UTIL_AVG_UNCHANGED)));
> > >>>  }
> > >>
> > >> I can't see why the existing p->uclamp_req[UCLAMP_MIN].value can't be
> > >> used here instead of p->se.avg.util_guest.
> > > Using p->uclamp_req[UCLAMP_MIN].value would result in folding
> > > uclamp values into task_util and task_util_est for all tasks that have
> > > uclamp values set. The intent of these patches isn't to modify
> > > existing uclamp behaviour. Users would also override util values from
> > > the guest when they set uclamp values.
> > >>
> > >> I do understand the issue of inheriting uclamp values at fork but don't
> > >> get the not being `additive` thing. We are at task level here.
> >
> > > Uclamp values are max aggregated with other tasks at the runqueue
> > > level when deciding CPU frequency. For example, a vCPU runqueue may
> > > have a util of 512 that results in setting uclamp_min to 512 on the
> > > vCPU task. This is insufficient to drive a frequency response if it
> > > shares the runqueue with another host task running with a util of 512,
> > > as it would result in a clamped util value of 512 at the runqueue
> > > (e.g. if a guest thread had just migrated onto this vCPU).
> >
> > OK, see your point now. You want an accurate per-task boost for this
> > vCPU task on the host run-queue.
> > And a scenario in which a vCPU can ask for 100% in these moments is not
> > sufficient I guess? In this case uclamp_min could work.
>
> Right. The vCPU can have whatever utilization, and there can be random
> host threads completely unrelated to the VM. We need to aggregate both
> of their util when deciding CPU freq.
>
> > >> The fact that you have to max util_avg and util_est directly in
> > >> task_util() and _task_util_est() tells me that there are places where
> > >> this helps and uclamp_task_util() is not called there.
> > > Can you clarify this point a bit more?
> >
> > Sorry, I meant s/util_est/util_guest/.
> >
> > The effect of the change in _task_util_est() you see via:
> >
> > enqueue_task_fair()
> >   util_est_enqueue()
> >     cfs_rq->avg.util_est.enqueued += _task_util_est(p)
> >
> > so that `sugov_get_util() -> cpu_util_cfs() ->
> > cfs_rq->avg.util_est.enqueued` can see the effect of util_guest?

That sequence looks correct to me.

> >
> > Not sure about the change in task_util() yet.

task_util() provides some signaling in addition to task_util_est() via:

find_energy_efficient_cpu()
  cpu_util_next()
    lsub_positive(&util, task_util(p));
    ...
    util += task_util(p);    // Can provide a better signal than util_est.

dequeue_task_fair()
  util_est_update()
    ue.enqueued = task_util(p);    // Updates ue.ewma
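Condensed, that dequeue-side update is roughly the following (a
simplified sketch of util_est_update(), not verbatim mainline code;
the UTIL_AVG_UNCHANGED and within-margin early-return handling is
elided):

    struct util_est ue = READ_ONCE(p->se.avg.util_est);
    long last_ewma_diff;

    /* task_util(p) is now floored by util_guest, so a vCPU's estimate
     * converges toward guest-side demand as well as host-side util_avg. */
    ue.enqueued = task_util(p);

    /* ewma <- 0.75 * ewma + 0.25 * enqueued (UTIL_EST_WEIGHT_SHIFT == 2) */
    last_ewma_diff = ue.enqueued - ue.ewma;
    ue.ewma <<= UTIL_EST_WEIGHT_SHIFT;
    ue.ewma  += last_ewma_diff;
    ue.ewma >>= UTIL_EST_WEIGHT_SHIFT;

    WRITE_ONCE(p->se.avg.util_est, ue);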
Thanks,
David

> >
> > >> When you say in the cover letter that you tried uclamp_min, how exactly
> > >> did you use it? Did you run the existing mainline or did you use
> > >> uclamp_min as a replacement for util_guest in this patch here?
> >
> > > I called sched_setattr_nocheck() with .sched_flags =
> > > SCHED_FLAG_UTIL_CLAMP when updating uclamp_min, and uclamp_max is left
> > > at 1024. Uclamp_min was not aggregated with task_util and
> > > task_util_est during my testing. The only caveat there is that I added
> > > a change to only reset uclamp on fork when testing (I realize there is
> > > specifically a SCHED_FLAG_RESET_ON_FORK, but I didn't want to reset
> > > other sched attributes).
> >
> > OK, understood. It's essentially a util_est v2 for vCPU tasks on host.
>
> Yup. We initially looked into just overwriting util_est, but didn't
> think that'll land well with the community :) as it was a bit messier,
> because we needed to make sure the current util_est update paths don't
> run for vCPU tasks on host (because those values would be wrong).
>
> > >>>  static inline unsigned long task_util_est(struct task_struct *p)
> > >>> @@ -6242,6 +6244,15 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > >>>        */
> > >>>       util_est_enqueue(&rq->cfs, p);
> > >>>
> > >>> +     /*
> > >>> +      * The normal code path for host thread enqueue doesn't take into
> > >>> +      * account guest task migrations when updating cpufreq util.
> > >>> +      * So, always update the cpufreq when a vCPU thread has a
> > >>> +      * non-zero util_guest value.
> > >>> +      */
> > >>> +     if (READ_ONCE(p->se.avg.util_guest))
> > >>> +             cpufreq_update_util(rq, 0);
> > >>
> > >>
> > >> This is because enqueue_entity() -> update_load_avg() ->
> > >> attach_entity_load_avg() -> cfs_rq_util_change() requires root run-queue
> > >> (&rq->cfs == cfs_rq) to call cpufreq_update_util()?
> > > The enqueue_entity() would not call into update_load_avg() due to the
> > > check for !se->avg.last_update_time. se->avg.last_update_time is
> > > non-zero because the vCPU task did not migrate before this enqueue.
> > > This enqueue path is reached when util_guest is updated for the vCPU
> > > task through the sched_setattr_nocheck() call, where we want to ensure
> > > a frequency update occurs.
> >
> > OK, vCPU tasks are pinned, so always a !WF_MIGRATED wakeup I guess?
>
> Even if, say, little-vCPU threads are allowed to migrate within the
> little CPUs, this will still be an issue. While a vCPU thread is
> continuously running on a single CPU, a guest thread can migrate into
> that vCPU and cause a huge increase in util_guest. But that won't
> trigger a cpufreq update on the host side, because the host doesn't see
> a task migration. That's what David is trying to address.
>
> -Saravana
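Putting the thread's pieces together, the host-side flow under
discussion can be sketched like this (a rough reconstruction with
illustrative names; the actual VMM-to-host plumbing lives in the rest
of the RFC series and is not shown):

    sched_setattr_nocheck(vcpu_task, &attr)   /* new guest util sample */
      __sched_setscheduler()
        dequeue_task(rq, p, ...)
        p->se.avg.util_guest = <new sample>   /* illustrative store */
        enqueue_task(rq, p, ...)
          enqueue_task_fair()
            util_est_enqueue(&rq->cfs, p)
            /* vCPU thread: force a freq re-evaluation even though the
             * host saw no task migration */
            if (READ_ONCE(p->se.avg.util_guest))
                    cpufreq_update_util(rq, 0)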