All of lore.kernel.org
 help / color / mirror / Atom feed
From: Chen Yu <yu.c.chen@intel.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: <mingo@kernel.org>, <vincent.guittot@linaro.org>,
	<linux-kernel@vger.kernel.org>, <juri.lelli@redhat.com>,
	<dietmar.eggemann@arm.com>, <rostedt@goodmis.org>,
	<bsegall@google.com>, <mgorman@suse.de>, <bristot@redhat.com>,
	<corbet@lwn.net>, <qyousef@layalina.io>, <chris.hyser@oracle.com>,
	<patrick.bellasi@matbug.net>, <pjt@google.com>, <pavel@ucw.cz>,
	<qperret@google.com>, <tim.c.chen@linux.intel.com>,
	<joshdon@google.com>, <timj@gnu.org>, <kprateek.nayak@amd.com>,
	<youssefesmat@chromium.org>, <joel@joelfernandes.org>
Subject: Re: [PATCH 06/10] sched/fair: Add avg_vruntime
Date: Fri, 24 Mar 2023 15:12:19 +0800	[thread overview]
Message-ID: <ZB1NU1Yc8DSi4zrW@chenyu5-mobl1> (raw)
In-Reply-To: <20230321160458.GB2273492@hirez.programming.kicks-ass.net>

On 2023-03-21 at 17:04:58 +0100, Peter Zijlstra wrote:
> On Tue, Mar 21, 2023 at 09:58:13PM +0800, Chen Yu wrote:
> > On 2023-03-06 at 14:25:27 +0100, Peter Zijlstra wrote:
> > [...]
> > >  
> > > +/*
> > > + * Compute virtual time from the per-task service numbers:
> > > + *
> > > + * Fair schedulers conserve lag: \Sum lag_i = 0
> > > + *
> > > + * lag_i = S - s_i = w_i * (V - v_i)
> > > + *
> > The definination of above lag_i seems to be inconsistent with the defininatin
> > of se->lag in PATCH 8. Maybe rename lag_i to something other to avoid confusion?
> 
> Yeah, I ran into that the other day, I think I'll introduce vlag_i = V - v_i
> or so.
> 
> > > + * \Sum lag_i = 0 -> \Sum w_i * (V - v_i) = V * \Sum w_i - \Sum w_i * v_i = 0
> > > + *
> > > + * From which we solve V:
> > > + *
> > > + *     \Sum v_i * w_i
> > > + * V = --------------
> > > + *        \Sum w_i
> > > + *
> > > + * However, since v_i is u64, and the multiplcation could easily overflow
> > > + * transform it into a relative form that uses smaller quantities:
> > > + *
> > > + * Substitute: v_i == (v_i - v) + v
> > > + *
> > > + *     \Sum ((v_i - v) + v) * w_i   \Sum (v_i - v) * w_i
> > > + * V = -------------------------- = -------------------- + v
> > > + *              \Sum w_i                   \Sum w_i
> > > + *
> > > + *
> 
> > Not sure if I understand it correctly, does it mean  (v_i - v) * w_i will not
> > overflow? If the weight of task is 15 (nice 19), then if v_i - v > (S64_MAX / 15)
> > it gets overflow. Is it possible that v_i is much larger than cfs_rq->min_vruntime
> > in this case?
> 
> Or worse, SCHED_IDLE, where weight is 2 (IIRC) or cgroups, then vtime
> advances at 512 times realtime. Now, the tick puts a limit on how long
> we'll overshoot these super low weight entities, for HZ=1000 we still
> only get 0.5s of vtime for weight=2.
>
> That would be only 30 bits used, except we use double FIXEDPOINT_SHIFT
> on 64bit, so we'll end up at 40-ish.
> 
> That should give us enough room to carry an average of deltas around
> min_vruntime.
> 
I'm trying to digest how ticks could prevent the overflow.
In update_curr() -> update_min_vruntime(cfs_rq), the cfs_rq->min_vruntime
is set to
max (cfs_rq->min_vruntime, min(curr->vruntime, leftmost(se->vruntime)))
so, although curr->vruntime increase by 0.5 seconds in each tick,
the leftmost(se->vruntime) could still be very small and unchanged,
thus the delta between v_i and cfs_rq->min_vruntime is still large.
Instead sysctl_sched_latency could decide how far it is between the
se.vruntime and the cfs_rq.min_vruntime, by calculating the vruntime
delta between task1 and task2:

    sched_vslice(task1) = (NICE0_LOAD/se1.weight)  * (w1/Sum wi * sysctl_sched_latency)
    sched_vslice(task2) = (NICE0_LOAD/se2.weight)  * (w2/Sum wi * sysctl_sched_latency)

Besides in patch 10, entity_eligible() checks
\Sum (v_i - v)*w_i >= (v_i - v)*(\Sum w_i)
and the \Sum w_i could become large if there are many runnable tasks and
bring overflow? But yes, if the vi -v is very small we can get rid of this
issue IMO.

thanks,
Chenyu
> But yes, I've seen this go sideways and I need to stare a bit more at
> this. One of the things I've considered is changing the min_vruntime
> update rules to instead move to avg_vruntime() to minimize the deltas.
> But I've not yet actually written that code.
> 

  reply	other threads:[~2023-03-24  7:13 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-06 13:25 [PATCH 00/10] sched: EEVDF using latency-nice Peter Zijlstra
2023-03-06 13:25 ` [PATCH 01/10] sched: Introduce latency-nice as a per-task attribute Peter Zijlstra
2023-03-06 13:25 ` [PATCH 02/10] sched/core: Propagate parent tasks latency requirements to the child task Peter Zijlstra
2023-03-06 13:25 ` [PATCH 03/10] sched: Allow sched_{get,set}attr to change latency_nice of the task Peter Zijlstra
2023-03-06 13:25 ` [PATCH 04/10] sched/fair: Add latency_offset Peter Zijlstra
2023-03-06 13:25 ` [PATCH 05/10] sched/fair: Add sched group latency support Peter Zijlstra
2023-03-06 13:25 ` [PATCH 06/10] sched/fair: Add avg_vruntime Peter Zijlstra
2023-03-21 13:58   ` Chen Yu
2023-03-21 16:04     ` Peter Zijlstra
2023-03-24  7:12       ` Chen Yu [this message]
2023-03-24 10:03         ` Peter Zijlstra
2023-03-06 13:25 ` [PATCH 07/10] sched/fair: Remove START_DEBIT Peter Zijlstra
2023-03-06 13:25 ` [PATCH 08/10] sched/fair: Add lag based placement Peter Zijlstra
2023-03-16 22:49   ` Tim Chen
2023-03-06 13:25 ` [PATCH 09/10] rbtree: Add rb_add_augmented_cached() helper Peter Zijlstra
2023-03-06 13:25 ` [PATCH 10/10] sched/fair: Implement an EEVDF like policy Peter Zijlstra
2023-03-08  8:39   ` Mike Galbraith
2023-03-08  9:26     ` Mike Galbraith
2023-03-08 13:36     ` Mike Galbraith
2023-03-09  4:23       ` Mike Galbraith
2023-03-10 20:38         ` Peter Zijlstra
2023-03-11  5:53           ` Mike Galbraith
2023-03-11  7:56             ` Mike Galbraith
2023-03-09  9:06       ` Peter Zijlstra
2023-03-09 12:44         ` Peter Zijlstra
2023-03-09 15:29           ` Peter Zijlstra
2023-03-09 15:39             ` Peter Zijlstra
2023-03-09 16:24             ` Mike Galbraith
2023-03-09 16:42             ` Peter Zijlstra
2023-03-07 10:27 ` [PATCH 00/10] sched: EEVDF using latency-nice Vincent Guittot
2023-03-07 13:08   ` Peter Zijlstra
2023-03-08 15:13 ` Shrikanth Hegde
2023-03-22  6:49 ` K Prateek Nayak
2023-03-22  9:38   ` K Prateek Nayak
2023-03-23 11:53 ` Pavel Machek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZB1NU1Yc8DSi4zrW@chenyu5-mobl1 \
    --to=yu.c.chen@intel.com \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=chris.hyser@oracle.com \
    --cc=corbet@lwn.net \
    --cc=dietmar.eggemann@arm.com \
    --cc=joel@joelfernandes.org \
    --cc=joshdon@google.com \
    --cc=juri.lelli@redhat.com \
    --cc=kprateek.nayak@amd.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@kernel.org \
    --cc=patrick.bellasi@matbug.net \
    --cc=pavel@ucw.cz \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=qperret@google.com \
    --cc=qyousef@layalina.io \
    --cc=rostedt@goodmis.org \
    --cc=tim.c.chen@linux.intel.com \
    --cc=timj@gnu.org \
    --cc=vincent.guittot@linaro.org \
    --cc=youssefesmat@chromium.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.