Re: [PATCH 08/15] sched,fair: simplify timeslice length code

From: Vincent Guittot <vincent.guittot@linaro.org>
To: Rik van Riel <riel@surriel.com>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
	Kernel Team <kernel-team@fb.com>, Paul Turner <pjt@google.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Mel Gorman <mgorman@techsingularity.net>
Subject: Re: [PATCH 08/15] sched,fair: simplify timeslice length code
Date: Thu, 29 Aug 2019 16:02:26 +0200	[thread overview]
Message-ID: <CAKfTPtDX+keNfNxf78yMoF3QaXSG_fZHJ_nqCFKYDMYGa84A6Q@mail.gmail.com> (raw)
In-Reply-To: <d703071084dadb477b8248b041d0d1aa730d65cd.camel@surriel.com>

On Thu, 29 Aug 2019 at 01:19, Rik van Riel <riel@surriel.com> wrote:
>
> On Wed, 2019-08-28 at 19:32 +0200, Vincent Guittot wrote:
> > On Thu, 22 Aug 2019 at 04:18, Rik van Riel <riel@surriel.com> wrote:
> > > The idea behind __sched_period makes sense, but the results do not
> > > always.
> > >
> > > When a CPU has one high priority task and a large number of low
> > > priority
> > > tasks, __sched_period will return a value larger than
> > > sysctl_sched_latency,
> > > and the one high priority task may end up getting a timeslice all
> > > for itself
> > > that is also much larger than sysctl_sched_latency.
> >
> > note that unless you enable sched_feat(HRTICK), the sched_slice is
> > mainly used to decide how fast we preempt running task at tick but a
> > newly wake up task can preempt it before
> >
> > > The low priority tasks will have their time slices rounded up to
> > > sysctl_sched_min_granularity, resulting in an even larger
> > > scheduling
> > > latency than targeted by __sched_period.
> >
> > Will this not break the fairness between a always running task and a
> > short sleeping one with this changes ?
>
> In what way?
>
> The vruntime for the always running task will continue
> to advance the same way it always has.

Ok so 1st, my brain is probably not yet fully back from vacations as I
have read sysctl_sched_min_granularity instead of sysctl_sched_latency
 and wrongly thought that you were setting
sysctl_sched_min_granularity for all tasks.
That being said, sched_slice is used to prevent other tasks to preempt
the running task before it get a chances to run its ideal time
compared to others and before new tasks modify the ideal sched_slice
of each. By capping this max value, the task can be preempted earlier
than before by newly wake up task and don't get the amount of running
time it could have expect before the situation is changing

>
> > > Simplify the code by simply ripping out __sched_period and always
> > > taking
> > > fractions of sysctl_sched_latency.
> > >
> > > If a high priority task ends up getting a "too small" time slice
> > > compared
> > > to low priority tasks, the vruntime scaling ensures that it will
> > > simply
> > > get scheduled more frequently than low priority tasks.
> >
> > Will you not increase the number of context switch ?
>
> It should actually decrease the number of context
> switches. If a nice +19 task gets a longer time slice
> than it would today, its vruntime will be advanced by

In fact that's already the case today, when a task is scheduled, it
runs a full jiffy even if its sched_slice is smaller than a jiffy
(unless you have enabled sched_feat(HRTICK)).

> more than sysctl_sched_latency, and it will not get
> to run again until another task has caught up with its
> vruntime.
>
> That means the regular (or high) priority task that
> shares the CPU with that nice +19 task might get
> several time slices in a row until the nice +19 task
> gets to run again.
>
> What am I overlooking?

My point is more for task that runs several ticks in a row. Their
sched_slice will be shorter in some cases with your changes so they
can be preempted earlier by other runnable tasks with a lower vruntime
and there will be more context switch

>
> --
> All Rights Reversed.