linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Vincent Guittot <vincent.guittot@linaro.org>
To: Tao Zhou <tao.zhou@linux.dev>
Cc: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
	dietmar.eggemann@arm.com, rostedt@goodmis.org,
	bsegall@google.com, mgorman@suse.de,
	linux-kernel@vger.kernel.org, parth@linux.ibm.com,
	qais.yousef@arm.com, chris.hyser@oracle.com,
	pkondeti@codeaurora.org, vschneid@redhat.com,
	patrick.bellasi@matbug.net, David.Laight@aculab.com,
	pjt@google.com, pavel@ucw.cz, tj@kernel.org, qperret@google.com,
	tim.c.chen@linux.intel.com
Subject: Re: [RFC 5/6] sched/fair: Take into account latency nice at wakeup
Date: Mon, 2 May 2022 11:54:31 +0200	[thread overview]
Message-ID: <CAKfTPtD2Hb_ZU8x1J9B6he7fYNvw2AyUBOKdfRk6zcSvJEvYTg@mail.gmail.com> (raw)
In-Reply-To: <Ym6uMoHVkqr9zOpj@geo.homenetwork>

Hi Tao,

On Sun, 1 May 2022 at 17:58, Tao Zhou <tao.zhou@linux.dev> wrote:
>
> Hi Vincent,
>
> Change to Valentin Schneider's now using mail address.

Thanks

>
> On Fri, Mar 11, 2022 at 05:14:05PM +0100, Vincent Guittot wrote:
>
> > Take into account the nice latency priority of a thread when deciding to
> > preempt the current running thread. We don't want to provide more CPU
> > bandwidth to a thread but reorder the scheduling to run latency sensitive
> > task first whenever possible.
> >
> > As long as a thread didn't use its bandwidth, it will be able to preempt
> > the current thread.
> >
> > At the opposite, a thread with a low latency priority will preempt current
> > thread at wakeup only to keep fair CPU bandwidth sharing. Otherwise it will
> > wait for the tick to get its sched slice.
> >
> >                                    curr vruntime
> >                                        |
> >                       sysctl_sched_wakeup_granularity
> >                                    <-->
> > ----------------------------------|----|-----------------------|---------------
> >                                   |    |<--------------------->
> >                                   |    .  sysctl_sched_latency
> >                                   |    .
> > default/current latency entity    |    .
> >                                   |    .
> > 1111111111111111111111111111111111|0000|-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-
> > se preempts curr at wakeup ------>|<- se doesn't preempt curr -----------------
> >                                   |    .
> >                                   |    .
> >                                   |    .
> > low latency entity                |    .
> >                                    ---------------------->|
> >                                % of sysctl_sched_latency  |
> > 1111111111111111111111111111111111111111111111111111111111|0000|-1-1-1-1-1-1-1-
> > preempt ------------------------------------------------->|<- do not preempt --
> >                                   |    .
> >                                   |    .
> >                                   |    .
> > high latency entity               |    .
> >          |<-----------------------|    .
> >          | % of sysctl_sched_latency   .
> > 111111111|0000|-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1
> > preempt->|<- se doesn't preempt curr ------------------------------------------
> >
> > Tests results of nice latency impact on heavy load like hackbench:
> >
> > hackbench -l (2560 / group) -g group
> > group        latency 0             latency 19
> > 1            1.450(+/- 0.60%)      1.376(+/- 0.54%) + 5%
> > 4            1.537(+/- 1.75%)      1.335(+/- 1.81%) +13%
> > 8            1.414(+/- 1.91%)      1.348(+/- 1.88%) + 5%
> > 16           1.423(+/- 1.65%)      1.374(+/- 0.58%) + 3%
> >
> > hackbench -p -l (2560 / group) -g group
> > group
> > 1            1.416(+/- 3.45%)      0.886(+/- 0.54%) +37%
> > 4            1.634(+/- 7.14%)      0.888(+/- 5.40%) +45%
> > 8            1.449(+/- 2.14%)      0.804(+/- 4.55%) +44%
> > 16           0.917(+/- 4.12%)      0.777(+/- 1.41%) +15%
> >
> > By deacreasing the latency prio, we reduce the number of preemption at
>
> s/deacreasing/decreasing/

yes

> s/reduce/increase/

not in the case of hackbench tests above. By decreasing the latency
prio of all hackbench threads, we make sure that they will not preempt
the current thread and let it move forward so we reduce the number of
preemption.

>
> > wakeup and help hackbench making progress.
> >
> > Test results of nice latency impact on short live load like cyclictest
> > while competing with heavy load like hackbench:
> >
> > hackbench -l 10000 -g group &
> > cyclictest --policy other -D 5 -q -n
> >         latency 0           latency -20
> > group   min  avg    max     min  avg    max
> > 0       16    17     28      15   17     27
> > 1       61   382  10603      63   89   4628
> > 4       52   437  15455      54   98  16238
> > 8       56   728  38499      61  125  28983
> > 16      53  1215  52207      61  183  80751
> >
> > group = 0 means that hackbench is not running.
> >
> > The avg is significantly improved with nice latency -20 especially with
> > large number of groups but min and max remain quite similar. If we add the
> > histogram parameters to get details of latency, we have :
> >
> > hackbench -l 10000 -g 16 &
> > cyclictest --policy other -D 5 -q -n  -H 20000 --histfile data.txt
> >               latency 0    latency -20
> > Min Latencies:    63           62
> > Avg Latencies:  1397          218
> > Max Latencies: 44926        42291
> > 50% latencies:   100           98
> > 75% latencies:   762          116
> > 85% latencies:  1118          126
> > 90% latencies:  1540          130
> > 95% latencies:  5610          138
> > 99% latencies: 13738          266
> >
> > With percentile details, we see the benefit of nice latency -20 as
> > 1% of the latencies stays above 266us whereas the default latency has
> > got 25% are above 762us and 10% over the 1ms.
> >

[..]

> > +static long wakeup_latency_gran(int latency_weight)
> > +{
> > +     long thresh = sysctl_sched_latency;
> > +
> > +     if (!latency_weight)
> > +             return 0;
> > +
> > +     if (sched_feat(GENTLE_FAIR_SLEEPERS))
> > +             thresh >>= 1;
> > +
> > +     /*
> > +      * Clamp the delta to stay in the scheduler period range
> > +      * [-sysctl_sched_latency:sysctl_sched_latency]
> > +      */
> > +     latency_weight = clamp_t(long, latency_weight,
> > +                             -1 * NICE_LATENCY_WEIGHT_MAX,
> > +                             NICE_LATENCY_WEIGHT_MAX);
> > +
> > +     return (thresh * latency_weight) >> NICE_LATENCY_SHIFT;
> > +}
> > +
> >  static unsigned long wakeup_gran(struct sched_entity *se)
> >  {
> >       unsigned long gran = sysctl_sched_wakeup_granularity;
> > @@ -7008,6 +7059,10 @@ static int
> >  wakeup_preempt_entity(struct sched_entity *curr, struct sched_entity *se)
> >  {
> >       s64 gran, vdiff = curr->vruntime - se->vruntime;
> > +     int latency_weight = se->latency_weight - curr->latency_weight;
> > +
> > +     latency_weight = min(latency_weight, se->latency_weight);
>
> Let lable A=se->latency_weight, B=curr->latency_weight, C=A-B.
>
> 1 A>0 B>0
>     ::C=A-B<0, min(C,A)=C, latency decrease, C is the real diff value no matter
>       what A is. That means it can be very 'long'(-sched_latency) and vdiff is
>       more possible to be in <= 0 case and return -1.
>     ::C=A-B>0, min(C,A)=A, latency increase, but it is conservative. Limit to

A>0 and B>0  then min(C=A-B, A) = C

>       A/1024*sched_latency.
>     When latecy is decreased, the negtive value added to vdiff is larger than the
>     positive value added to vdiff when latency is increased.

When the weight > 0, the tasks have some latency requirements so we
use their relative priority to decide if we can preempt current which
also has some latency requirement

>
> 2 A>0 B<0
>     ::C=A-B>0 and C>A, min(C,A)=A, latency increase and it is conservative.
>
> 3 A<0 B<0
>     ::C=A-B>0, min(C,A)=A, latency increase but min(C,A)<0, I think if latency
>     increase means the value added to vdiff will be a positive value to increase
>     the chance to return 1. I would say it is max(C,A)=C
>     ::C=A-B<0, min(C,A)=A, latency decrease and the real negtive value.

When the weight < 0, the tasks haven't latency requirement and even
don't care of being scheduled "quickly". In this case, we don't care
about the relative priority but we want to minimize the preemption of
current so we use the weight

>
> 4 A<0,B>0
>     ::C=A-B<0, min(C,A)=C, latency decrease and the real negtive value.
>
> Is there a reason that the value when latency increase and latency decrease
> be treated differently. Latency increase value is limited to se's latency_weight

I have tried to explain why I treat differently if weight is > 0 or < 0.

> but latency decrease value can extend to -sched_latency or treat them the same.
> Penalty latency decrease and conserve latency increase.
>
>
> There is any value that this latency_weight can be considered to be a average.
>
> The delta value choose is ~%5 to 1024. %5*sched_latency=0.05*6ms=0.3ms.(no scale)
> I do not think over that vdiff equation  and others though.
>
> Thanks,
> Tao
> > +     vdiff += wakeup_latency_gran(latency_weight);
> >
> >       if (vdiff <= 0)
> >               return -1;
> > @@ -7117,6 +7172,7 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_
> >               return;
> >
> >       update_curr(cfs_rq_of(se));
> > +
> >       if (wakeup_preempt_entity(se, pse) == 1) {
> >               /*
> >                * Bias pick_next to pick the sched entity that is
> > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> > index 456ad2159eb1..dd92aa9c36f9 100644
> > --- a/kernel/sched/sched.h
> > +++ b/kernel/sched/sched.h
> > @@ -122,6 +122,17 @@ extern void call_trace_sched_update_nr_running(struct rq *rq, int count);
> >   * Default tasks should be treated as a task with latency_nice = 0.
> >   */
> >  #define DEFAULT_LATENCY_NICE 0
> > +#define DEFAULT_LATENCY_PRIO (DEFAULT_LATENCY_NICE + LATENCY_NICE_WIDTH/2)
> > +
> > +/*
> > + * Convert user-nice values [ -20 ... 0 ... 19 ]
> > + * to static latency [ 0..39 ],
> > + * and back.
> > + */
> > +#define NICE_TO_LATENCY(nice)        ((nice) + DEFAULT_LATENCY_PRIO)
> > +#define LATENCY_TO_NICE(prio)        ((prio) - DEFAULT_LATENCY_PRIO)
> > +#define NICE_LATENCY_SHIFT   (SCHED_FIXEDPOINT_SHIFT)
> > +#define NICE_LATENCY_WEIGHT_MAX      (1L << NICE_LATENCY_SHIFT)
> >
> >  /*
> >   * Increase resolution of nice-level calculations for 64-bit architectures.
> > @@ -2098,6 +2109,7 @@ static_assert(WF_TTWU == SD_BALANCE_WAKE);
> >
> >  extern const int             sched_prio_to_weight[40];
> >  extern const u32             sched_prio_to_wmult[40];
> > +extern const int             sched_latency_to_weight[40];
> >
> >  /*
> >   * {de,en}queue flags:
> > --
> > 2.17.1
> >

  reply	other threads:[~2022-05-02  9:57 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-11 16:14 [PATCH 0/6] Add latency_nice priority Vincent Guittot
2022-03-11 16:14 ` [PATCH 1/6] sched: Introduce latency-nice as a per-task attribute Vincent Guittot
2022-03-11 16:14 ` [PATCH 2/6] sched/core: Propagate parent task's latency requirements to the child task Vincent Guittot
2022-03-22  0:22   ` Tim Chen
2022-03-22 14:57     ` Vincent Guittot
2022-03-11 16:14 ` [PATCH 3/6] sched: Allow sched_{get,set}attr to change latency_nice of the task Vincent Guittot
2022-03-22  0:22   ` Tim Chen
2022-03-22 14:55     ` Vincent Guittot
2022-03-28  9:23   ` Dietmar Eggemann
2022-03-28 12:41     ` Vincent Guittot
2022-03-11 16:14 ` [PATCH 4/6] sched/core: Add permission checks for setting the latency_nice value Vincent Guittot
2022-03-11 16:14 ` [RFC 5/6] sched/fair: Take into account latency nice at wakeup Vincent Guittot
2022-03-15  0:53   ` Josh Don
2022-03-15 13:54     ` Vincent Guittot
2022-03-28  9:24   ` Dietmar Eggemann
2022-03-28 12:51     ` Vincent Guittot
2022-05-01 15:58   ` Tao Zhou
2022-05-02  9:54     ` Vincent Guittot [this message]
2022-05-02 12:30       ` Vincent Guittot
2022-05-02 15:08         ` Tao Zhou
2022-05-02 15:26           ` Tao Zhou
2022-05-02 15:47             ` Tao Zhou
2022-05-02 16:21               ` Vincent Guittot
2022-05-02 23:09                 ` Tao Zhou
2022-05-03  2:30     ` Tao Zhou
2022-05-03 12:40       ` Vincent Guittot
2022-05-04 11:14   ` Chen Yu
2022-05-04 12:39     ` Vincent Guittot
2022-03-11 16:14 ` [RFC 6/6] sched/fair: Add sched group latency support Vincent Guittot
2022-03-15  0:58   ` Josh Don
2022-03-15 17:07     ` Vincent Guittot
2022-03-21 17:24   ` Tejun Heo
2022-03-22 16:10     ` Vincent Guittot
2022-03-22 16:40       ` Tejun Heo
2022-03-23 15:04         ` Vincent Guittot
2022-03-23 18:20           ` Chris Hyser
2022-03-22 16:41     ` Tim Chen
2022-03-23 15:23       ` Vincent Guittot
2022-03-22 16:39 ` [PATCH 0/6] Add latency_nice priority Qais Yousef
2022-03-23 15:32   ` Vincent Guittot
2022-03-24 17:25     ` Qais Yousef
2022-03-25 13:27       ` Vincent Guittot
2022-03-28 16:27         ` Qais Yousef
2022-03-30  7:30           ` Vincent Guittot
2022-03-28  9:24 ` Dietmar Eggemann
2022-03-28 12:56   ` Vincent Guittot
2022-04-01 12:15     ` Qais Yousef
2022-04-02  8:46       ` Vincent Guittot
2022-04-09 17:08         ` Qais Yousef
2022-04-09 17:28           ` Steven Rostedt
2022-04-09 18:10             ` Qais Yousef
2022-04-11  7:26               ` Vincent Guittot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAKfTPtD2Hb_ZU8x1J9B6he7fYNvw2AyUBOKdfRk6zcSvJEvYTg@mail.gmail.com \
    --to=vincent.guittot@linaro.org \
    --cc=David.Laight@aculab.com \
    --cc=bsegall@google.com \
    --cc=chris.hyser@oracle.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=parth@linux.ibm.com \
    --cc=patrick.bellasi@matbug.net \
    --cc=pavel@ucw.cz \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=pkondeti@codeaurora.org \
    --cc=qais.yousef@arm.com \
    --cc=qperret@google.com \
    --cc=rostedt@goodmis.org \
    --cc=tao.zhou@linux.dev \
    --cc=tim.c.chen@linux.intel.com \
    --cc=tj@kernel.org \
    --cc=vschneid@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).