linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] sched/fair: Align vruntime to last_se when curr_se's timeslice run out
@ 2018-12-05 12:41 weiqi (C)
  2018-12-06  9:36 ` Peter Zijlstra
  0 siblings, 1 reply; 2+ messages in thread
From: weiqi (C) @ 2018-12-05 12:41 UTC (permalink / raw)
  To: linux-kernel, Peter Zijlstra

Hello,
I measuring the cfs scheduling delay, and found that
when a cfs_rq contains multiple tasks, it does not guarantee
every se of this cfs_rq can get a changce to run within
  one sched_latency period. Assume that there are three se
  A, B, and C in cfs_rq, and the default sched_latency=24ms.
In ideal state, A, B, and C take turns to get 8ms, The timing is:
  A(8ms) -> B(8ms) -> C(8ms) -> A(8ms) -> B(8ms) -> C(8ms) ....
So, the delay for each task is approximately 16ms.

However, check_preempt_tick() is the clock interrupt granularity,
and the clock interrupt may also have an indeterminate fine delay,
so the time slice actually expires relative
  to the ideal expiration time delay uncertain time.
This may result in a timing of :
      A -> B -> C -> B -> C -> A
which increases the scheduling delay of A to 32ms.


The more extreme scenario is wakeup sleepers. If a sleeper task
  wakes up to a cpu, its vruntime will be sched_latency/2 less
  than cfs_rq->min_vruntime /* place_entity() function do this */
In the scene of the above A, B, C cycle time slice,
a sleeper thread D woken up
D->vruntime = cfs_rq->min_vruntime - sched_latency/2
then the timing of the pick_task may be:

A(8ms)->B(8ms)->C(8ms)->D(6ms)->D(6ms)->D(6ms,then dequeue)->C(8ms)-> 
B(8ms)->A

causing A's scheduling delay to reach 50ms
so, I make changes to check_preempt_tick():

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ee271bb..1f61b9c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4020,7 +4020,23 @@ static void clear_buddies(struct cfs_rq *cfs_rq, 
struct sched_entity *se)
         ideal_runtime = sched_slice(cfs_rq, curr);
         delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;
         if (delta_exec > ideal_runtime) {
+               struct rb_node *next = NULL;
+               struct rb_node *right_most = NULL;
+               struct sched_entity *last;
                 resched_curr(rq_of(cfs_rq));
+
+                /* always set to max vruntime */
+                if (cfs_rq->nr_running > 1) {
+                        next = &curr->run_node;
+                        do {
+                                right_most = next;
+                                next = rb_next(next);
+                        } while (next);
+
+                        last = rb_entry(right_most,
+                                         struct sched_entity, run_node);
+                        curr->vruntime = last->vruntime + 1; // maybe 
+1 is not needed
+                }
+
                 /*
                  * The current task ran long enough, ensure it doesn't get
                  * re-elected due to buddy favours.


In this way, time sliced task in one sched_latency period will not get 
runtime again,
And for wake sleeper, it can meet the needs of wake-up preemption, and 
this will not let it
take too much time in a scheduling period.

I wrote a small press program, bind it to 12 cpus, this program wake up 
36 sub-processes
  every second and each sub-process work 120ms, looping all the time..
I‘ve tested on linux-stable 4.20-rc1 kernel, the maximum delay of press 
program is about 80ms.
After applying the above modifications, the maximum delay is stable at 27ms.

^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [RFC] sched/fair: Align vruntime to last_se when curr_se's timeslice run out
  2018-12-05 12:41 [RFC] sched/fair: Align vruntime to last_se when curr_se's timeslice run out weiqi (C)
@ 2018-12-06  9:36 ` Peter Zijlstra
  0 siblings, 0 replies; 2+ messages in thread
From: Peter Zijlstra @ 2018-12-06  9:36 UTC (permalink / raw)
  To: weiqi (C); +Cc: linux-kernel, Ingo Molnar, Mike Galbraith

On Wed, Dec 05, 2018 at 08:41:39PM +0800, weiqi (C) wrote:

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index ee271bb..1f61b9c 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4020,7 +4020,23 @@ static void clear_buddies(struct cfs_rq *cfs_rq, struct sched_entity *se)
>         ideal_runtime = sched_slice(cfs_rq, curr);
>         delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;
>         if (delta_exec > ideal_runtime) {
> +               struct rb_node *next = NULL;
> +               struct rb_node *right_most = NULL;
> +               struct sched_entity *last;
>                 resched_curr(rq_of(cfs_rq));
> +
> +                /* always set to max vruntime */
> +                if (cfs_rq->nr_running > 1) {
> +                        next = &curr->run_node;
> +                        do {
> +                                right_most = next;
> +                                next = rb_next(next);
> +                        } while (next);
> +
> +                        last = rb_entry(right_most,
> +                                         struct sched_entity, run_node);

This you can obviously do better by tracking max_vruntime along side
min_vruntime. But for testing this should work fine I suppose.

> +                        curr->vruntime = last->vruntime + 1; // maybe +1 is not needed

This however is completely broken... you've basically reduced a virtual
runtime scheduler to a simple RR one.

Yes, place_entity() is not ideal, for starters we should not insert
relative to min_vruntime but to the 0-lag point (weighted average
vruntime). And IIRC, we should not let negative lag tasks reduce the
runqueue weight.

But those things are computationally expensive to do, so we fudged it.

> +                }
> +
>                 /*
>                  * The current task ran long enough, ensure it doesn't get
>                  * re-elected due to buddy favours.




^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2018-12-06  9:36 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-05 12:41 [RFC] sched/fair: Align vruntime to last_se when curr_se's timeslice run out weiqi (C)
2018-12-06  9:36 ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).