linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Usecases for the per-task latency-nice attribute
@ 2019-09-18 12:41 Parth Shah
  2019-09-18 14:18 ` Patrick Bellasi
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Parth Shah @ 2019-09-18 12:41 UTC (permalink / raw)
  To: linux-kernel, Peter Zijlstra, Patrick Bellasi, subhra mazumdar,
	tim.c.chen, Valentin Schneider
  Cc: mingo, morten.rasmussen, dietmar.eggemann, pjt, vincent.guittot,
	quentin.perret, dhaval.giani, daniel.lezcano, tj,
	rafael.j.wysocki, qais.yousef

Hello everyone,

As per the discussion in LPC2019, new per-task property like latency-nice
can be useful in certain scenarios. The scheduler can take proper decision
by knowing latency requirement of a task from the end-user itself.

There has already been an effort from Subhra for introducing Task
latency-nice [1] values and have seen several possibilities where this type of
interface can be used.

From the best of my understanding of the discussion on the mail thread and
in the LPC2019, it seems that there are two dilemmas;

1. Name: What should be the name for such attr for all the possible usecases?
=============
Latency nice is the proposed name as of now where the lower value indicates
that the task doesn't care much for the latency and we can spend some more
time in the kernel to decide a better placement of a task (to save time,
energy, etc.)
But there seems to be a bit of confusion on whether we want biasing as well
(latency-biased) or something similar, in which case "latency-nice" may
confuse the end-user.

2. Value: What should be the range of possible values supported by this new
attr?
==============
The possible values of such task attribute still need community attention.
Do we need a range of values or just binary/ternary values are sufficient?
Also signed or unsigned and so the length of the variable (u64, s32, etc)?



This mail is to initiate the discussion regarding the possible usecases of
such per task attribute and to come up with a specific name and value for
the same.

Hopefully, interested one should plot out their usecase for which this new
attr can potentially help in solving or optimizing it.


Well, to start with, here is my usecase.

-------------------
**Usecases**
-------------------

$> TurboSched
====================
TurboSched [2] tries to minimize the number of active cores in a socket by
packing an un-important and low-utilization (named jitter) task on an
already active core and thus refrains from waking up of a new core if
possible. This requires tagging of tasks from the userspace hinting which
tasks are un-important and thus waking-up a new core to minimize the
latency is un-necessary for such tasks.
As per the discussion on the posted RFC, it will be appropriate to use the
task latency property where a task with the highest latency-nice value can
be packed.
But for this specific use-cases, having just a binary value to know which
task is latency-sensitive and which not is sufficient enough, but having a
range is also a good way to go where above some threshold the task can be
packed.




References:
===========
[1]. https://lkml.org/lkml/2019/8/30/829
[2]. https://lkml.org/lkml/2019/7/25/296


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Usecases for the per-task latency-nice attribute
  2019-09-18 12:41 Usecases for the per-task latency-nice attribute Parth Shah
@ 2019-09-18 14:18 ` Patrick Bellasi
  2019-09-18 15:22   ` Vincent Guittot
                     ` (2 more replies)
  2019-09-18 17:16 ` Tim Chen
  2019-09-19 14:43 ` Qais Yousef
  2 siblings, 3 replies; 17+ messages in thread
From: Patrick Bellasi @ 2019-09-18 14:18 UTC (permalink / raw)
  To: Parth Shah
  Cc: linux-kernel, Peter Zijlstra, subhra mazumdar, tim.c.chen,
	Valentin Schneider, mingo, morten.rasmussen, dietmar.eggemann,
	pjt, vincent.guittot, quentin.perret, dhaval.giani,
	daniel.lezcano, tj, rafael.j.wysocki, qais.yousef,
	Patrick Bellasi


On Wed, Sep 18, 2019 at 13:41:04 +0100, Parth Shah wrote...

> Hello everyone,

Hi Parth,
thanks for staring this discussion.

[ + patrick.bellasi@matbug.net ] my new email address, since with
@arm.com I will not be reachable anymore starting next week.

> As per the discussion in LPC2019, new per-task property like latency-nice
> can be useful in certain scenarios. The scheduler can take proper decision
> by knowing latency requirement of a task from the end-user itself.
>
> There has already been an effort from Subhra for introducing Task
> latency-nice [1] values and have seen several possibilities where this type of
> interface can be used.
>
> From the best of my understanding of the discussion on the mail thread and
> in the LPC2019, it seems that there are two dilemmas;
>
> 1. Name: What should be the name for such attr for all the possible usecases?
> =============
> Latency nice is the proposed name as of now where the lower value indicates
> that the task doesn't care much for the latency

If by "lower value" you mean -19 (in the proposed [-20,19] range), then
I think the meaning should be the opposite.

A -19 latency-nice task is a task which is not willing to give up
latency. For those tasks for example we want to reduce the wake-up
latency at maximum.

This will keep its semantic aligned to that of process niceness values
which range from -20 (most favourable to the process) to 19 (least
favourable to the process).

> and we can spend some more time in the kernel to decide a better
> placement of a task (to save time, energy, etc.)

Tasks with an high latency-nice value (e.g. 19) are "less sensible to
latency". These are tasks we wanna optimize mainly for throughput and
thus, for example, we can spend some more time to find out a better task
placement at wakeup time.

Does that makes sense?

> But there seems to be a bit of confusion on whether we want biasing as well
> (latency-biased) or something similar, in which case "latency-nice" may
> confuse the end-user.

AFAIU PeterZ point was "just" that if we call it "-nice" it has to
behave as "nice values" to avoid confusions to users. But, if we come up
with a different naming maybe we will have more freedom.

Personally, I like both "latency-nice" or "latency-tolerant", where:

 - latency-nice:
   should have a better understanding based on pre-existing concepts

 - latency-tolerant:
   decouples a bit its meaning from the niceness thus giving maybe a bit
   more freedom in its complete definition and perhaps avoid any
   possible interpretation confusion like the one I commented above.

Fun fact: there was also the latency-nasty proposal from PaulMK :)

> 2. Value: What should be the range of possible values supported by this new
> attr?
> ==============
> The possible values of such task attribute still need community attention.
> Do we need a range of values or just binary/ternary values are sufficient?
> Also signed or unsigned and so the length of the variable (u64, s32,
> etc)?

AFAIR, the proposal on the table are essentially two:

 A) use a [-20,19] range
 
    Which has similarities with the niceness concept and gives a minimal
    continuous range. This can be on hand for things like scaling the
    vruntime normalization [3]

 B) use some sort of "profile tagging"
    e.g. background, latency-sensible, etc...
    
    If I correctly got what PaulT was proposing toward the end of the
    discussion at LPC.

This last option deserves better exploration.

At first glance I'm more for option A, I see a range as something that:

  - gives us a bit of flexibility in terms of the possible internal
    usages of the actual value

  - better supports some kind of linear/proportional mapping

  - still supports a "profile tagging" by (possible) exposing to
    user-space some kind of system wide knobs defining threshold that
    maps the continuous value into a "profile"
    e.g. latency-nice >= 15: use SCHED_BATCH

    In the following discussion I'll call "threshold based profiling"
    this approach.


> This mail is to initiate the discussion regarding the possible usecases of
> such per task attribute and to come up with a specific name and value for
> the same.
>
> Hopefully, interested one should plot out their usecase for which this new
> attr can potentially help in solving or optimizing it.

+1

> Well, to start with, here is my usecase.
>
> -------------------
> **Usecases**
> -------------------
>
> $> TurboSched
> ====================
> TurboSched [2] tries to minimize the number of active cores in a socket by
> packing an un-important and low-utilization (named jitter) task on an
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We should really come up with a different name, since jitters clashes
with other RT related concepts.

Maybe we don't even need a name at all, the other two attributes you
specify are good enough to identify those tasks: they are just "small
background" tasks.

  small      : because on their small util_est value
  background : because of their high latency-nice value

> already active core and thus refrains from waking up of a new core if
> possible. This requires tagging of tasks from the userspace hinting which
> tasks are un-important and thus waking-up a new core to minimize the
> latency is un-necessary for such tasks.
> As per the discussion on the posted RFC, it will be appropriate to use the
> task latency property where a task with the highest latency-nice value can
> be packed.

We should better defined here what you mean with "highest" latency-nice
value, do you really mean the top of the range, e.g. 19?

Or...

> But for this specific use-cases, having just a binary value to know which
> task is latency-sensitive and which not is sufficient enough, but having a
> range is also a good way to go where above some threshold the task can be
> packed.

... yes, maybe we can reason about a "threshold based profiling" where
something like for example:

   /proc/sys/kernel/sched_packing_util_max    : 200
   /proc/sys/kernel/sched_packing_latency_min : 17

means that a task with latency-nice >= 17 and util_est <= 200 will be packed?


$> Wakeup path tunings
==========================

Some additional possible use-cases was already discussed in [3]:

 - dynamically tune the policy of a task among SCHED_{OTHER,BATCH,IDLE}
   depending on crossing certain pre-configured threshold of latency
   niceness.
  
 - dynamically bias the vruntime updates we do in place_entity()
   depending on the actual latency niceness of a task.
  
   PeterZ thinks this is dangerous but that we can "(carefully) fumble a
   bit there."
  
 - bias the decisions we take in check_preempt_tick() still depending
   on a relative comparison of the current and wakeup task latency
   niceness values.

> References:
> ===========
> [1]. https://lkml.org/lkml/2019/8/30/829
> [2]. https://lkml.org/lkml/2019/7/25/296

  [3]. Message-ID: <20190905114709.GM2349@hirez.programming.kicks-ass.net>
       https://lore.kernel.org/lkml/20190905114709.GM2349@hirez.programming.kicks-ass.net/


Best,
Patrick

-- 
#include <best/regards.h>

Patrick Bellasi

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Usecases for the per-task latency-nice attribute
  2019-09-18 14:18 ` Patrick Bellasi
@ 2019-09-18 15:22   ` Vincent Guittot
  2019-09-18 15:46     ` Patrick Bellasi
  2019-09-18 15:42   ` Valentin Schneider
  2019-09-19  7:01   ` Parth Shah
  2 siblings, 1 reply; 17+ messages in thread
From: Vincent Guittot @ 2019-09-18 15:22 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: Parth Shah, linux-kernel, Peter Zijlstra, subhra mazumdar,
	Tim Chen, Valentin Schneider, Ingo Molnar, Morten Rasmussen,
	Dietmar Eggemann, Paul Turner, Quentin Perret, Dhaval Giani,
	Daniel Lezcano, Tejun Heo, Rafael J. Wysocki, Qais Yousef,
	Patrick Bellasi

On Wed, 18 Sep 2019 at 16:19, Patrick Bellasi <patrick.bellasi@arm.com> wrote:
>
>
> On Wed, Sep 18, 2019 at 13:41:04 +0100, Parth Shah wrote...
>
> > Hello everyone,
>
> Hi Parth,
> thanks for staring this discussion.
>
> [ + patrick.bellasi@matbug.net ] my new email address, since with
> @arm.com I will not be reachable anymore starting next week.
>
> > As per the discussion in LPC2019, new per-task property like latency-nice
> > can be useful in certain scenarios. The scheduler can take proper decision
> > by knowing latency requirement of a task from the end-user itself.
> >
> > There has already been an effort from Subhra for introducing Task
> > latency-nice [1] values and have seen several possibilities where this type of
> > interface can be used.
> >
> > From the best of my understanding of the discussion on the mail thread and
> > in the LPC2019, it seems that there are two dilemmas;
> >
> > 1. Name: What should be the name for such attr for all the possible usecases?
> > =============
> > Latency nice is the proposed name as of now where the lower value indicates
> > that the task doesn't care much for the latency
>
> If by "lower value" you mean -19 (in the proposed [-20,19] range), then
> I think the meaning should be the opposite.
>
> A -19 latency-nice task is a task which is not willing to give up
> latency. For those tasks for example we want to reduce the wake-up
> latency at maximum.
>
> This will keep its semantic aligned to that of process niceness values
> which range from -20 (most favourable to the process) to 19 (least
> favourable to the process).
>
> > and we can spend some more time in the kernel to decide a better
> > placement of a task (to save time, energy, etc.)
>
> Tasks with an high latency-nice value (e.g. 19) are "less sensible to
> latency". These are tasks we wanna optimize mainly for throughput and
> thus, for example, we can spend some more time to find out a better task
> placement at wakeup time.
>
> Does that makes sense?
>
> > But there seems to be a bit of confusion on whether we want biasing as well
> > (latency-biased) or something similar, in which case "latency-nice" may
> > confuse the end-user.
>
> AFAIU PeterZ point was "just" that if we call it "-nice" it has to
> behave as "nice values" to avoid confusions to users. But, if we come up
> with a different naming maybe we will have more freedom.
>
> Personally, I like both "latency-nice" or "latency-tolerant", where:
>
>  - latency-nice:
>    should have a better understanding based on pre-existing concepts
>
>  - latency-tolerant:
>    decouples a bit its meaning from the niceness thus giving maybe a bit
>    more freedom in its complete definition and perhaps avoid any
>    possible interpretation confusion like the one I commented above.
>
> Fun fact: there was also the latency-nasty proposal from PaulMK :)
>
> > 2. Value: What should be the range of possible values supported by this new
> > attr?
> > ==============
> > The possible values of such task attribute still need community attention.
> > Do we need a range of values or just binary/ternary values are sufficient?
> > Also signed or unsigned and so the length of the variable (u64, s32,
> > etc)?
>
> AFAIR, the proposal on the table are essentially two:
>
>  A) use a [-20,19] range
>
>     Which has similarities with the niceness concept and gives a minimal
>     continuous range. This can be on hand for things like scaling the
>     vruntime normalization [3]
>
>  B) use some sort of "profile tagging"
>     e.g. background, latency-sensible, etc...
>
>     If I correctly got what PaulT was proposing toward the end of the
>     discussion at LPC.
>
> This last option deserves better exploration.
>
> At first glance I'm more for option A, I see a range as something that:
>
>   - gives us a bit of flexibility in terms of the possible internal
>     usages of the actual value
>
>   - better supports some kind of linear/proportional mapping
>
>   - still supports a "profile tagging" by (possible) exposing to
>     user-space some kind of system wide knobs defining threshold that
>     maps the continuous value into a "profile"
>     e.g. latency-nice >= 15: use SCHED_BATCH
>
>     In the following discussion I'll call "threshold based profiling"
>     this approach.
>
>
> > This mail is to initiate the discussion regarding the possible usecases of
> > such per task attribute and to come up with a specific name and value for
> > the same.
> >
> > Hopefully, interested one should plot out their usecase for which this new
> > attr can potentially help in solving or optimizing it.
>
> +1
>
> > Well, to start with, here is my usecase.
> >
> > -------------------
> > **Usecases**
> > -------------------
> >
> > $> TurboSched
> > ====================
> > TurboSched [2] tries to minimize the number of active cores in a socket by
> > packing an un-important and low-utilization (named jitter) task on an
>              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> We should really come up with a different name, since jitters clashes
> with other RT related concepts.
>
> Maybe we don't even need a name at all, the other two attributes you
> specify are good enough to identify those tasks: they are just "small
> background" tasks.
>
>   small      : because on their small util_est value
>   background : because of their high latency-nice value
>
> > already active core and thus refrains from waking up of a new core if
> > possible. This requires tagging of tasks from the userspace hinting which
> > tasks are un-important and thus waking-up a new core to minimize the
> > latency is un-necessary for such tasks.
> > As per the discussion on the posted RFC, it will be appropriate to use the
> > task latency property where a task with the highest latency-nice value can
> > be packed.
>
> We should better defined here what you mean with "highest" latency-nice
> value, do you really mean the top of the range, e.g. 19?
>
> Or...
>
> > But for this specific use-cases, having just a binary value to know which
> > task is latency-sensitive and which not is sufficient enough, but having a
> > range is also a good way to go where above some threshold the task can be
> > packed.
>
> ... yes, maybe we can reason about a "threshold based profiling" where
> something like for example:
>
>    /proc/sys/kernel/sched_packing_util_max    : 200
>    /proc/sys/kernel/sched_packing_latency_min : 17
>
> means that a task with latency-nice >= 17 and util_est <= 200 will be packed?
>
>
> $> Wakeup path tunings
> ==========================
>
> Some additional possible use-cases was already discussed in [3]:
>
>  - dynamically tune the policy of a task among SCHED_{OTHER,BATCH,IDLE}
>    depending on crossing certain pre-configured threshold of latency
>    niceness.
>
>  - dynamically bias the vruntime updates we do in place_entity()
>    depending on the actual latency niceness of a task.
>
>    PeterZ thinks this is dangerous but that we can "(carefully) fumble a
>    bit there."

I agree with Peter that we can easily break the fairness if we bias vruntime

>
>  - bias the decisions we take in check_preempt_tick() still depending
>    on a relative comparison of the current and wakeup task latency
>    niceness values.

This one seems possible as it will mainly enable a task to preempt
"earlier" the running task but will not break the fairness
So the main impact will be the number of context switch between tasks
to favor or not the scheduling latency

>
> > References:
> > ===========
> > [1]. https://lkml.org/lkml/2019/8/30/829
> > [2]. https://lkml.org/lkml/2019/7/25/296
>
>   [3]. Message-ID: <20190905114709.GM2349@hirez.programming.kicks-ass.net>
>        https://lore.kernel.org/lkml/20190905114709.GM2349@hirez.programming.kicks-ass.net/
>
>
> Best,
> Patrick
>
> --
> #include <best/regards.h>
>
> Patrick Bellasi

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Usecases for the per-task latency-nice attribute
  2019-09-18 14:18 ` Patrick Bellasi
  2019-09-18 15:22   ` Vincent Guittot
@ 2019-09-18 15:42   ` Valentin Schneider
  2019-09-19 16:41     ` Parth Shah
  2019-09-19  7:01   ` Parth Shah
  2 siblings, 1 reply; 17+ messages in thread
From: Valentin Schneider @ 2019-09-18 15:42 UTC (permalink / raw)
  To: Patrick Bellasi, Parth Shah
  Cc: linux-kernel, Peter Zijlstra, subhra mazumdar, tim.c.chen, mingo,
	morten.rasmussen, dietmar.eggemann, pjt, vincent.guittot,
	quentin.perret, dhaval.giani, daniel.lezcano, tj,
	rafael.j.wysocki, qais.yousef, Patrick Bellasi

On 18/09/2019 15:18, Patrick Bellasi wrote:
>> 1. Name: What should be the name for such attr for all the possible usecases?
>> =============
>> Latency nice is the proposed name as of now where the lower value indicates
>> that the task doesn't care much for the latency
> 
> If by "lower value" you mean -19 (in the proposed [-20,19] range), then
> I think the meaning should be the opposite.
> 
> A -19 latency-nice task is a task which is not willing to give up
> latency. For those tasks for example we want to reduce the wake-up
> latency at maximum.
> 
> This will keep its semantic aligned to that of process niceness values
> which range from -20 (most favourable to the process) to 19 (least
> favourable to the process).
> 

I don't want to start a bikeshedding session here, but I agree with Parth
on the interpretation of the values.

I've always read niceness values as
-20 (least nice to the system / other processes)
+19 (most nice to the system / other processes)

So following this trend I'd see for latency-nice:
-20 (least nice to latency, i.e. sacrifice latency for throughput)
+19 (most nice to latency, i.e. sacrifice throughput for latency)

However...

>> But there seems to be a bit of confusion on whether we want biasing as well
>> (latency-biased) or something similar, in which case "latency-nice" may
>> confuse the end-user.
> 
> AFAIU PeterZ point was "just" that if we call it "-nice" it has to
> behave as "nice values" to avoid confusions to users. But, if we come up
> with a different naming maybe we will have more freedom.
> 

...just getting rid of the "-nice" would leave us free not to have to
interpret the values as "nice to / not nice to" :)

> Personally, I like both "latency-nice" or "latency-tolerant", where:
> 
>  - latency-nice:
>    should have a better understanding based on pre-existing concepts
> 
>  - latency-tolerant:
>    decouples a bit its meaning from the niceness thus giving maybe a bit
>    more freedom in its complete definition and perhaps avoid any
>    possible interpretation confusion like the one I commented above.
> 
> Fun fact: there was also the latency-nasty proposal from PaulMK :)
> 

[...]

> 
> $> Wakeup path tunings
> ==========================
> 
> Some additional possible use-cases was already discussed in [3]:
> 
>  - dynamically tune the policy of a task among SCHED_{OTHER,BATCH,IDLE}
>    depending on crossing certain pre-configured threshold of latency
>    niceness.
>   
>  - dynamically bias the vruntime updates we do in place_entity()
>    depending on the actual latency niceness of a task.
>   
>    PeterZ thinks this is dangerous but that we can "(carefully) fumble a
>    bit there."
>   
>  - bias the decisions we take in check_preempt_tick() still depending
>    on a relative comparison of the current and wakeup task latency
>    niceness values.

Aren't we missing the point about tweaking the sched domain scans (which
AFAIR was the original point for latency-nice)?

Something like default value is current behaviour and
- Being less latency-sensitive means increasing the scans (e.g. trending
  towards only going through the slow wakeup-path at the extreme setting)
- Being more latency-sensitive means reducing the scans (e.g. trending
  towards a fraction of the domain scanned in the fast-path at the extreme
  setting).

> 

$> Load balance tuning
======================

Already mentioned these in [4]:

- Increase (reduce) nr_balance_failed threshold when trying to active
  balance a latency-sensitive (non-latency-sensitive) task.

- Increase (decrease) sched_migration_cost factor in task_hot() for
  latency-sensitive (non-latency-sensitive) tasks.

>> References:
>> ===========
>> [1]. https://lkml.org/lkml/2019/8/30/829
>> [2]. https://lkml.org/lkml/2019/7/25/296
> 
>   [3]. Message-ID: <20190905114709.GM2349@hirez.programming.kicks-ass.net>
>        https://lore.kernel.org/lkml/20190905114709.GM2349@hirez.programming.kicks-ass.net/
> 

[4]: https://lkml.kernel.org/r/3d3306e4-3a78-5322-df69-7665cf01cc43@arm.com

> 
> Best,
> Patrick
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Usecases for the per-task latency-nice attribute
  2019-09-18 15:22   ` Vincent Guittot
@ 2019-09-18 15:46     ` Patrick Bellasi
  2019-09-18 16:00       ` Vincent Guittot
  0 siblings, 1 reply; 17+ messages in thread
From: Patrick Bellasi @ 2019-09-18 15:46 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Parth Shah, linux-kernel, Peter Zijlstra, subhra mazumdar,
	Tim Chen, Valentin Schneider, Ingo Molnar, Morten Rasmussen,
	Dietmar Eggemann, Paul Turner, Quentin Perret, Dhaval Giani,
	Daniel Lezcano, Tejun Heo, Rafael J. Wysocki, Qais Yousef,
	Patrick Bellasi


On Wed, Sep 18, 2019 at 16:22:32 +0100, Vincent Guittot wrote...

> On Wed, 18 Sep 2019 at 16:19, Patrick Bellasi <patrick.bellasi@arm.com> wrote:

[...]

>> $> Wakeup path tunings
>> ==========================
>>
>> Some additional possible use-cases was already discussed in [3]:
>>
>>  - dynamically tune the policy of a task among SCHED_{OTHER,BATCH,IDLE}
>>    depending on crossing certain pre-configured threshold of latency
>>    niceness.
>>
>>  - dynamically bias the vruntime updates we do in place_entity()
>>    depending on the actual latency niceness of a task.
>>
>>    PeterZ thinks this is dangerous but that we can "(carefully) fumble a
>>    bit there."
>
> I agree with Peter that we can easily break the fairness if we bias vruntime

Just to be more precise here and also to better understand, here I'm
talking about turning the tweaks we already have for:

 - START_DEBIT
 - GENTLE_FAIR_SLEEPERS

a bit more parametric and proportional to the latency-nice of a task.

In principle, if a task declares a positive latency niceness, could we
not read this also as "I accept to be a bit penalised in terms of
fairness at wakeup time"?

Whatever tweaks we do there should affect anyway only one sched_latency
period... although I'm not yet sure if that's possible and how.

>>  - bias the decisions we take in check_preempt_tick() still depending
>>    on a relative comparison of the current and wakeup task latency
>>    niceness values.
>
> This one seems possible as it will mainly enable a task to preempt
> "earlier" the running task but will not break the fairness
> So the main impact will be the number of context switch between tasks
> to favor or not the scheduling latency

Preempting before is definitively a nice-to-have feature.

At the same time it's interesting a support where a low latency-nice
task (e.g. TOP_APP) RUNNABLE on a CPU has better chances to be executed
up to completion without being preempted by an high latency-nice task
(e.g. BACKGROUND) waking up on its CPU.

For that to happen, we need a mechanism to "delay" the execution of a
less important RUNNABLE task up to a certain period.

It's impacting the fairness, true, but latency-nice in this case will
means that we want to "complete faster", not just "start faster".

Is this definition something we can reason about?

Best,
Patrick

-- 
#include <best/regards.h>

Patrick Bellasi

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Usecases for the per-task latency-nice attribute
  2019-09-18 15:46     ` Patrick Bellasi
@ 2019-09-18 16:00       ` Vincent Guittot
  0 siblings, 0 replies; 17+ messages in thread
From: Vincent Guittot @ 2019-09-18 16:00 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: Parth Shah, linux-kernel, Peter Zijlstra, subhra mazumdar,
	Tim Chen, Valentin Schneider, Ingo Molnar, Morten Rasmussen,
	Dietmar Eggemann, Paul Turner, Quentin Perret, Dhaval Giani,
	Daniel Lezcano, Tejun Heo, Rafael J. Wysocki, Qais Yousef,
	Patrick Bellasi

On Wed, 18 Sep 2019 at 17:46, Patrick Bellasi <patrick.bellasi@arm.com> wrote:
>
>
> On Wed, Sep 18, 2019 at 16:22:32 +0100, Vincent Guittot wrote...
>
> > On Wed, 18 Sep 2019 at 16:19, Patrick Bellasi <patrick.bellasi@arm.com> wrote:
>
> [...]
>
> >> $> Wakeup path tunings
> >> ==========================
> >>
> >> Some additional possible use-cases was already discussed in [3]:
> >>
> >>  - dynamically tune the policy of a task among SCHED_{OTHER,BATCH,IDLE}
> >>    depending on crossing certain pre-configured threshold of latency
> >>    niceness.
> >>
> >>  - dynamically bias the vruntime updates we do in place_entity()
> >>    depending on the actual latency niceness of a task.
> >>
> >>    PeterZ thinks this is dangerous but that we can "(carefully) fumble a
> >>    bit there."
> >
> > I agree with Peter that we can easily break the fairness if we bias vruntime
>
> Just to be more precise here and also to better understand, here I'm
> talking about turning the tweaks we already have for:
>
>  - START_DEBIT
>  - GENTLE_FAIR_SLEEPERS

ok. So extending these 2 features could make sense

>
> a bit more parametric and proportional to the latency-nice of a task.
>
> In principle, if a task declares a positive latency niceness, could we
> not read this also as "I accept to be a bit penalised in terms of
> fairness at wakeup time"?

I would say no. It's not because you declare a positive latency
niceness that you should lose some fairness and runtime. If task
accept long latency because it's only care about throughput, it
doesn't want to lost some running time

>
> Whatever tweaks we do there should affect anyway only one sched_latency
> period... although I'm not yet sure if that's possible and how.
>
> >>  - bias the decisions we take in check_preempt_tick() still depending
> >>    on a relative comparison of the current and wakeup task latency
> >>    niceness values.
> >
> > This one seems possible as it will mainly enable a task to preempt
> > "earlier" the running task but will not break the fairness
> > So the main impact will be the number of context switch between tasks
> > to favor or not the scheduling latency
>
> Preempting before is definitively a nice-to-have feature.
>
> At the same time it's interesting a support where a low latency-nice
> task (e.g. TOP_APP) RUNNABLE on a CPU has better chances to be executed
> up to completion without being preempted by an high latency-nice task
> (e.g. BACKGROUND) waking up on its CPU.
>
> For that to happen, we need a mechanism to "delay" the execution of a
> less important RUNNABLE task up to a certain period.
>
> It's impacting the fairness, true, but latency-nice in this case will
> means that we want to "complete faster", not just "start faster".

you TOP_APP task will have to set both nice and latency-nice  if it
wants to make (almost) sure to have time to finish before BACKGROUND


>
> Is this definition something we can reason about?
>
> Best,
> Patrick
>
> --
> #include <best/regards.h>
>
> Patrick Bellasi

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Usecases for the per-task latency-nice attribute
  2019-09-18 12:41 Usecases for the per-task latency-nice attribute Parth Shah
  2019-09-18 14:18 ` Patrick Bellasi
@ 2019-09-18 17:16 ` Tim Chen
  2019-09-19  8:37   ` Parth Shah
  2019-09-19  9:06   ` David Laight
  2019-09-19 14:43 ` Qais Yousef
  2 siblings, 2 replies; 17+ messages in thread
From: Tim Chen @ 2019-09-18 17:16 UTC (permalink / raw)
  To: Parth Shah, linux-kernel, Peter Zijlstra, Patrick Bellasi,
	subhra mazumdar, Valentin Schneider
  Cc: mingo, morten.rasmussen, dietmar.eggemann, pjt, vincent.guittot,
	quentin.perret, dhaval.giani, daniel.lezcano, tj,
	rafael.j.wysocki, qais.yousef

On 9/18/19 5:41 AM, Parth Shah wrote:
> Hello everyone,
> 
> As per the discussion in LPC2019, new per-task property like latency-nice
> can be useful in certain scenarios. The scheduler can take proper decision
> by knowing latency requirement of a task from the end-user itself.
> 
> There has already been an effort from Subhra for introducing Task
> latency-nice [1] values and have seen several possibilities where this type of
> interface can be used.
> 
> From the best of my understanding of the discussion on the mail thread and
> in the LPC2019, it seems that there are two dilemmas;

Thanks for starting the discussion.


> 
> -------------------
> **Usecases**
> -------------------
> 
> $> TurboSched
> ====================
> TurboSched [2] tries to minimize the number of active cores in a socket by
> packing an un-important and low-utilization (named jitter) task on an
> already active core and thus refrains from waking up of a new core if
> possible. This requires tagging of tasks from the userspace hinting which
> tasks are un-important and thus waking-up a new core to minimize the
> latency is un-necessary for such tasks.
> As per the discussion on the posted RFC, it will be appropriate to use the
> task latency property where a task with the highest latency-nice value can
> be packed.
> But for this specific use-cases, having just a binary value to know which
> task is latency-sensitive and which not is sufficient enough, but having a
> range is also a good way to go where above some threshold the task can be
> packed.
> 
> 

$> Separating AVX512 tasks and latency sensitive tasks on separate cores
-------------------------------------------------------------------------
Another usecase we are considering is to segregate those workload that will pull down
core cpu frequency (e.g. AVX512) from workload that are latency sensitive.
There are certain tasks that need to provide a fast response time (latency sensitive)
and they are best scheduled on cpu that has a lighter load and not have other
tasks running on the sibling cpu that could pull down the cpu core frequency.

Some users are running machine learning batch tasks with AVX512, and have observed
that these tasks affect the tasks needing a fast response.  They have to
rely on manual CPU affinity to separate these tasks.  With appropriate
latency hint on task, the scheduler can be taught to separate them.

Tim




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Usecases for the per-task latency-nice attribute
  2019-09-18 14:18 ` Patrick Bellasi
  2019-09-18 15:22   ` Vincent Guittot
  2019-09-18 15:42   ` Valentin Schneider
@ 2019-09-19  7:01   ` Parth Shah
  2 siblings, 0 replies; 17+ messages in thread
From: Parth Shah @ 2019-09-19  7:01 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: linux-kernel, Peter Zijlstra, subhra mazumdar, tim.c.chen,
	Valentin Schneider, mingo, morten.rasmussen, dietmar.eggemann,
	pjt, vincent.guittot, quentin.perret, dhaval.giani,
	daniel.lezcano, tj, rafael.j.wysocki, qais.yousef,
	Patrick Bellasi



On 9/18/19 7:48 PM, Patrick Bellasi wrote:
> 
> On Wed, Sep 18, 2019 at 13:41:04 +0100, Parth Shah wrote...
> 
>> Hello everyone,
> 
> Hi Parth,
> thanks for staring this discussion.
> 
> [ + patrick.bellasi@matbug.net ] my new email address, since with
> @arm.com I will not be reachable anymore starting next week.
> 

Noted. I will send new version with the summary of all the discussion and
add more people to CC. Will change your mail in that, thanks for notifying me.

>> As per the discussion in LPC2019, new per-task property like latency-nice
>> can be useful in certain scenarios. The scheduler can take proper decision
>> by knowing latency requirement of a task from the end-user itself.
>>
>> There has already been an effort from Subhra for introducing Task
>> latency-nice [1] values and have seen several possibilities where this type of
>> interface can be used.
>>
>> From the best of my understanding of the discussion on the mail thread and
>> in the LPC2019, it seems that there are two dilemmas;
>>
>> 1. Name: What should be the name for such attr for all the possible usecases?
>> =============
>> Latency nice is the proposed name as of now where the lower value indicates
>> that the task doesn't care much for the latency
> 
> If by "lower value" you mean -19 (in the proposed [-20,19] range), then
> I think the meaning should be the opposite.
> 

Oops, my bad. i wanted to tell higher value but somehow missed that
latency-nice should be the opposite to the latency sensitivity.

But in the further scope of the discussion, I mean -19 to be the least
value (latency sensitive) and +20 to be the greatest value(does not care
for latency) if range is [-19,20]

> A -19 latency-nice task is a task which is not willing to give up
> latency. For those tasks for example we want to reduce the wake-up
> latency at maximum.
> 
> This will keep its semantic aligned to that of process niceness values
> which range from -20 (most favourable to the process) to 19 (least
> favourable to the process).

Totally agreed upon.

> 
>> and we can spend some more time in the kernel to decide a better
>> placement of a task (to save time, energy, etc.)
> 
> Tasks with an high latency-nice value (e.g. 19) are "less sensible to
> latency". These are tasks we wanna optimize mainly for throughput and
> thus, for example, we can spend some more time to find out a better task
> placement at wakeup time.
> 
> Does that makes sense?

Correct. Task placement is one way to optimize which can benefit to both
the server and embedded world by saving power without compromising much on
performance.

> 
>> But there seems to be a bit of confusion on whether we want biasing as well
>> (latency-biased) or something similar, in which case "latency-nice" may
>> confuse the end-user.
> 
> AFAIU PeterZ point was "just" that if we call it "-nice" it has to
> behave as "nice values" to avoid confusions to users. But, if we come up
> with a different naming maybe we will have more freedom.
> 
> Personally, I like both "latency-nice" or "latency-tolerant", where:
> 
>  - latency-nice:
>    should have a better understanding based on pre-existing concepts
> 
>  - latency-tolerant:
>    decouples a bit its meaning from the niceness thus giving maybe a bit
>    more freedom in its complete definition and perhaps avoid any
>    possible interpretation confusion like the one I commented above.
> 
> Fun fact: there was also the latency-nasty proposal from PaulMK :)
> 

Cool. In that sense, latency-tolerant seems to be more flexible covering
multiple functionality that a scheduler can provide with such userspace hints.


>> 2. Value: What should be the range of possible values supported by this new
>> attr?
>> ==============
>> The possible values of such task attribute still need community attention.
>> Do we need a range of values or just binary/ternary values are sufficient?
>> Also signed or unsigned and so the length of the variable (u64, s32,
>> etc)?
> 
> AFAIR, the proposal on the table are essentially two:
> 
>  A) use a [-20,19] range
> 
>     Which has similarities with the niceness concept and gives a minimal
>     continuous range. This can be on hand for things like scaling the
>     vruntime normalization [3]
> 
>  B) use some sort of "profile tagging"
>     e.g. background, latency-sensible, etc...
>     
>     If I correctly got what PaulT was proposing toward the end of the
>     discussion at LPC.
> 

If I got it right, then for option B, we can have this attr to be used as a
latency_flag just like per-process flags (e.g. PF_IDLE). If so, then we can
piggyback on the p->flags itself, hence I will prefer the range unless we
have multiple usecases which can not get best out of the range.

> This last option deserves better exploration.
> 
> At first glance I'm more for option A, I see a range as something that:
> 
>   - gives us a bit of flexibility in terms of the possible internal
>     usages of the actual value
> 
>   - better supports some kind of linear/proportional mapping
> 
>   - still supports a "profile tagging" by (possible) exposing to
>     user-space some kind of system wide knobs defining threshold that
>     maps the continuous value into a "profile"
>     e.g. latency-nice >= 15: use SCHED_BATCH
> 

+1, good listing to support range for latency-<whatever>

>     In the following discussion I'll call "threshold based profiling"
>     this approach.
> 
> 
>> This mail is to initiate the discussion regarding the possible usecases of
>> such per task attribute and to come up with a specific name and value for
>> the same.
>>
>> Hopefully, interested one should plot out their usecase for which this new
>> attr can potentially help in solving or optimizing it.
> 
> +1
> 
>> Well, to start with, here is my usecase.
>>
>> -------------------
>> **Usecases**
>> -------------------
>>
>> $> TurboSched
>> ====================
>> TurboSched [2] tries to minimize the number of active cores in a socket by
>> packing an un-important and low-utilization (named jitter) task on an
>              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> We should really come up with a different name, since jitters clashes
> with other RT related concepts.
> 

I agree, based on LPC discussion and comments from tglx, I am happy to
rename it to whatever feels functionally correct and non-confusing to end-user.

> Maybe we don't even need a name at all, the other two attributes you
> specify are good enough to identify those tasks: they are just "small
> background" tasks.
> 
>   small      : because on their small util_est value
>   background : because of their high latency-nice value
> 

Correct. If we have latency-nice hints + utilization then we can classify
those tasks for task packing.

>> already active core and thus refrains from waking up of a new core if
>> possible. This requires tagging of tasks from the userspace hinting which
>> tasks are un-important and thus waking-up a new core to minimize the
>> latency is un-necessary for such tasks.
>> As per the discussion on the posted RFC, it will be appropriate to use the
>> task latency property where a task with the highest latency-nice value can
>> be packed.
> 
> We should better defined here what you mean with "highest" latency-nice
> value, do you really mean the top of the range, e.g. 19?
> 

yes, I mean +19 (or +20 whichever is higher) here which does not care for
latency.

> Or...
> 
>> But for this specific use-cases, having just a binary value to know which
>> task is latency-sensitive and which not is sufficient enough, but having a
>> range is also a good way to go where above some threshold the task can be
>> packed.
> 
> ... yes, maybe we can reason about a "threshold based profiling" where
> something like for example:
> 
>    /proc/sys/kernel/sched_packing_util_max    : 200
>    /proc/sys/kernel/sched_packing_latency_min : 17
> 
> means that a task with latency-nice >= 17 and util_est <= 200 will be packed?
> 

yes, something like that.

> 
> $> Wakeup path tunings
> ==========================
> 
> Some additional possible use-cases was already discussed in [3]:
> 
>  1. dynamically tune the policy of a task among SCHED_{OTHER,BATCH,IDLE}
>    depending on crossing certain pre-configured threshold of latency
>    niceness.
>   
>  2. dynamically bias the vruntime updates we do in place_entity()
>    depending on the actual latency niceness of a task.
>   
>    PeterZ thinks this is dangerous but that we can "(carefully) fumble a
>    bit there."
>   
>  3. bias the decisions we take in check_preempt_tick() still depending
>    on a relative comparison of the current and wakeup task latency
>    niceness values.
> 

Nice. Thanks for listing out the usecases.

I guess latency_flags will be difficult to use for usecase 2 and 3, but
range will work for all the three usecases.

>> References:
>> ===========
>> [1]. https://lkml.org/lkml/2019/8/30/829
>> [2]. https://lkml.org/lkml/2019/7/25/296
> 
>   [3]. Message-ID: <20190905114709.GM2349@hirez.programming.kicks-ass.net>
>        https://lore.kernel.org/lkml/20190905114709.GM2349@hirez.programming.kicks-ass.net/
> 
> 
> Best,
> Patrick
> 

Thanks,
Parth


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Usecases for the per-task latency-nice attribute
  2019-09-18 17:16 ` Tim Chen
@ 2019-09-19  8:37   ` Parth Shah
  2019-09-19 16:27     ` Tim Chen
  2019-09-19  9:06   ` David Laight
  1 sibling, 1 reply; 17+ messages in thread
From: Parth Shah @ 2019-09-19  8:37 UTC (permalink / raw)
  To: Tim Chen, linux-kernel, Peter Zijlstra, Patrick Bellasi,
	subhra mazumdar, Valentin Schneider
  Cc: mingo, morten.rasmussen, dietmar.eggemann, pjt, vincent.guittot,
	quentin.perret, dhaval.giani, daniel.lezcano, tj,
	rafael.j.wysocki, qais.yousef



On 9/18/19 10:46 PM, Tim Chen wrote:
> On 9/18/19 5:41 AM, Parth Shah wrote:
>> Hello everyone,
>>
>> As per the discussion in LPC2019, new per-task property like latency-nice
>> can be useful in certain scenarios. The scheduler can take proper decision
>> by knowing latency requirement of a task from the end-user itself.
>>
>> There has already been an effort from Subhra for introducing Task
>> latency-nice [1] values and have seen several possibilities where this type of
>> interface can be used.
>>
>> From the best of my understanding of the discussion on the mail thread and
>> in the LPC2019, it seems that there are two dilemmas;
> 
> Thanks for starting the discussion.
> 
> 
>>
>> -------------------
>> **Usecases**
>> -------------------
>>
>> $> TurboSched
>> ====================
>> TurboSched [2] tries to minimize the number of active cores in a socket by
>> packing an un-important and low-utilization (named jitter) task on an
>> already active core and thus refrains from waking up of a new core if
>> possible. This requires tagging of tasks from the userspace hinting which
>> tasks are un-important and thus waking-up a new core to minimize the
>> latency is un-necessary for such tasks.
>> As per the discussion on the posted RFC, it will be appropriate to use the
>> task latency property where a task with the highest latency-nice value can
>> be packed.
>> But for this specific use-cases, having just a binary value to know which
>> task is latency-sensitive and which not is sufficient enough, but having a
>> range is also a good way to go where above some threshold the task can be
>> packed.
>>
>>
> 
> $> Separating AVX512 tasks and latency sensitive tasks on separate cores
> -------------------------------------------------------------------------
> Another usecase we are considering is to segregate those workload that will pull down
> core cpu frequency (e.g. AVX512) from workload that are latency sensitive.
> There are certain tasks that need to provide a fast response time (latency sensitive)
> and they are best scheduled on cpu that has a lighter load and not have other
> tasks running on the sibling cpu that could pull down the cpu core frequency.
> 
> Some users are running machine learning batch tasks with AVX512, and have observed
> that these tasks affect the tasks needing a fast response.  They have to
> rely on manual CPU affinity to separate these tasks.  With appropriate
> latency hint on task, the scheduler can be taught to separate them.
> 

Thanks for listing out your usecase.

This is interesting. If scheduler has the knowledge of AVX512 tasks then
with these interface the scheduler can refrain from picking such core
occupying AVX512 tasks for the task with "latency-nice = -19".

So I guess for this specific use-case, the value for such per-task
attribute should have range (most probably [-19,20]) and the name
"latency-nice" also suits the need.

Do you have any specific values in mind for such attr?


Thanks,
Parth


^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: Usecases for the per-task latency-nice attribute
  2019-09-18 17:16 ` Tim Chen
  2019-09-19  8:37   ` Parth Shah
@ 2019-09-19  9:06   ` David Laight
  2019-09-19 16:30     ` Tim Chen
  1 sibling, 1 reply; 17+ messages in thread
From: David Laight @ 2019-09-19  9:06 UTC (permalink / raw)
  To: 'Tim Chen',
	Parth Shah, linux-kernel, Peter Zijlstra, Patrick Bellasi,
	subhra mazumdar, Valentin Schneider
  Cc: mingo, morten.rasmussen, dietmar.eggemann, pjt, vincent.guittot,
	quentin.perret, dhaval.giani, daniel.lezcano, tj,
	rafael.j.wysocki, qais.yousef

From: Tim Chen
> Sent: 18 September 2019 18:16
...
> Some users are running machine learning batch tasks with AVX512, and have observed
> that these tasks affect the tasks needing a fast response.  They have to
> rely on manual CPU affinity to separate these tasks.  With appropriate
> latency hint on task, the scheduler can be taught to separate them.

Will (or can) the scheduler pre-empt a low priority process that is spinning
in userspace in order to allow a high priority (or low latency) process run
on that cpu?

My suspicion is that the process switch can't happen until (at least) the
next hardware interrupt - and possibly only a timer tick into the scheduler.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Usecases for the per-task latency-nice attribute
  2019-09-18 12:41 Usecases for the per-task latency-nice attribute Parth Shah
  2019-09-18 14:18 ` Patrick Bellasi
  2019-09-18 17:16 ` Tim Chen
@ 2019-09-19 14:43 ` Qais Yousef
  2019-09-20 10:45   ` Parth Shah
  2 siblings, 1 reply; 17+ messages in thread
From: Qais Yousef @ 2019-09-19 14:43 UTC (permalink / raw)
  To: Parth Shah
  Cc: linux-kernel, Peter Zijlstra, Patrick Bellasi, subhra mazumdar,
	tim.c.chen, Valentin Schneider, mingo, morten.rasmussen,
	dietmar.eggemann, pjt, vincent.guittot, quentin.perret,
	dhaval.giani, daniel.lezcano, tj, rafael.j.wysocki

On 09/18/19 18:11, Parth Shah wrote:
> Hello everyone,
> 
> As per the discussion in LPC2019, new per-task property like latency-nice
> can be useful in certain scenarios. The scheduler can take proper decision
> by knowing latency requirement of a task from the end-user itself.
> 
> There has already been an effort from Subhra for introducing Task
> latency-nice [1] values and have seen several possibilities where this type of
> interface can be used.
> 
> From the best of my understanding of the discussion on the mail thread and
> in the LPC2019, it seems that there are two dilemmas;
> 
> 1. Name: What should be the name for such attr for all the possible usecases?
> =============
> Latency nice is the proposed name as of now where the lower value indicates
> that the task doesn't care much for the latency and we can spend some more
> time in the kernel to decide a better placement of a task (to save time,
> energy, etc.)
> But there seems to be a bit of confusion on whether we want biasing as well
> (latency-biased) or something similar, in which case "latency-nice" may
> confuse the end-user.
> 
> 2. Value: What should be the range of possible values supported by this new
> attr?
> ==============
> The possible values of such task attribute still need community attention.
> Do we need a range of values or just binary/ternary values are sufficient?
> Also signed or unsigned and so the length of the variable (u64, s32, etc)?

IMO the main question is who is the intended user of this new knob/API?

If it's intended for system admins to optimize certain workloads on a system
then I like the latency-nice range.

If we want to support application writers to define the latency requirements of
their tasks then I think latency-nice would be very confusing to use.
Especially when one has to consider they lack a pre-knowledge about the system
they will run on; and what else they are sharing the resources with.

> 
> 
> 
> This mail is to initiate the discussion regarding the possible usecases of
> such per task attribute and to come up with a specific name and value for
> the same.
> 
> Hopefully, interested one should plot out their usecase for which this new
> attr can potentially help in solving or optimizing it.
> 
> 
> Well, to start with, here is my usecase.
> 
> -------------------
> **Usecases**
> -------------------
> 
> $> TurboSched
> ====================
> TurboSched [2] tries to minimize the number of active cores in a socket by
> packing an un-important and low-utilization (named jitter) task on an
> already active core and thus refrains from waking up of a new core if
> possible. This requires tagging of tasks from the userspace hinting which
> tasks are un-important and thus waking-up a new core to minimize the
> latency is un-necessary for such tasks.
> As per the discussion on the posted RFC, it will be appropriate to use the
> task latency property where a task with the highest latency-nice value can
> be packed.
> But for this specific use-cases, having just a binary value to know which
> task is latency-sensitive and which not is sufficient enough, but having a
> range is also a good way to go where above some threshold the task can be
> packed.


$> EAS
====================
The new knob can help EAS path to switch to spreading behavior when
latency-nice is set instead of packing tasks on the most energy efficient CPU.
ie: pick the most energy efficient idle CPU.

--
Qais Yousef

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Usecases for the per-task latency-nice attribute
  2019-09-19  8:37   ` Parth Shah
@ 2019-09-19 16:27     ` Tim Chen
  0 siblings, 0 replies; 17+ messages in thread
From: Tim Chen @ 2019-09-19 16:27 UTC (permalink / raw)
  To: Parth Shah, linux-kernel, Peter Zijlstra, Patrick Bellasi,
	subhra mazumdar, Valentin Schneider
  Cc: mingo, morten.rasmussen, dietmar.eggemann, pjt, vincent.guittot,
	quentin.perret, dhaval.giani, daniel.lezcano, tj,
	rafael.j.wysocki, qais.yousef

On 9/19/19 1:37 AM, Parth Shah wrote:
> 
>>
>> $> Separating AVX512 tasks and latency sensitive tasks on separate cores
>> -------------------------------------------------------------------------
>> Another usecase we are considering is to segregate those workload that will pull down
>> core cpu frequency (e.g. AVX512) from workload that are latency sensitive.
>> There are certain tasks that need to provide a fast response time (latency sensitive)
>> and they are best scheduled on cpu that has a lighter load and not have other
>> tasks running on the sibling cpu that could pull down the cpu core frequency.
>>
>> Some users are running machine learning batch tasks with AVX512, and have observed
>> that these tasks affect the tasks needing a fast response.  They have to
>> rely on manual CPU affinity to separate these tasks.  With appropriate
>> latency hint on task, the scheduler can be taught to separate them.
>>
> 
> Thanks for listing out your usecase.
> 
> This is interesting. If scheduler has the knowledge of AVX512 tasks then
> with these interface the scheduler can refrain from picking such core
> occupying AVX512 tasks for the task with "latency-nice = -19".
> 
> So I guess for this specific use-case, the value for such per-task
> attribute should have range (most probably [-19,20]) and the name
> "latency-nice" also suits the need.

Yes.

> 
> Do you have any specific values in mind for such attr?

Not really.  I assume a [-19 20] range that the user who launch the
task will set.  Probably something towards the -19 end for latency
sensitive task and something towards the 20 end for AVX512 tasks.  And 0
as default for most tasks.

Tim

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Usecases for the per-task latency-nice attribute
  2019-09-19  9:06   ` David Laight
@ 2019-09-19 16:30     ` Tim Chen
  0 siblings, 0 replies; 17+ messages in thread
From: Tim Chen @ 2019-09-19 16:30 UTC (permalink / raw)
  To: David Laight, Parth Shah, linux-kernel, Peter Zijlstra,
	Patrick Bellasi, subhra mazumdar, Valentin Schneider
  Cc: mingo, morten.rasmussen, dietmar.eggemann, pjt, vincent.guittot,
	quentin.perret, dhaval.giani, daniel.lezcano, tj,
	rafael.j.wysocki, qais.yousef

On 9/19/19 2:06 AM, David Laight wrote:
> From: Tim Chen
>> Sent: 18 September 2019 18:16
> ...
>> Some users are running machine learning batch tasks with AVX512, and have observed
>> that these tasks affect the tasks needing a fast response.  They have to
>> rely on manual CPU affinity to separate these tasks.  With appropriate
>> latency hint on task, the scheduler can be taught to separate them.
> 
> Will (or can) the scheduler pre-empt a low priority process that is spinning
> in userspace in order to allow a high priority (or low latency) process run
> on that cpu?
> 
> My suspicion is that the process switch can't happen until (at least) the
> next hardware interrupt - and possibly only a timer tick into the scheduler.
> 

The issue has to do with AVX512 running on the HT sibling, which pulls down
the core frequency.  So latency sensitive tasks are not blocked but are
running concurrently on siblings, but slower.  With latency hint, the scheduler
can try to avoid putting them on the same core.

Tim

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Usecases for the per-task latency-nice attribute
  2019-09-18 15:42   ` Valentin Schneider
@ 2019-09-19 16:41     ` Parth Shah
  2019-09-19 18:07       ` Valentin Schneider
  2019-09-27 13:53       ` Pavel Machek
  0 siblings, 2 replies; 17+ messages in thread
From: Parth Shah @ 2019-09-19 16:41 UTC (permalink / raw)
  To: Valentin Schneider, Patrick Bellasi
  Cc: linux-kernel, Peter Zijlstra, subhra mazumdar, tim.c.chen, mingo,
	morten.rasmussen, dietmar.eggemann, pjt, vincent.guittot,
	quentin.perret, dhaval.giani, daniel.lezcano, tj,
	rafael.j.wysocki, qais.yousef, Patrick Bellasi



On 9/18/19 9:12 PM, Valentin Schneider wrote:
> On 18/09/2019 15:18, Patrick Bellasi wrote:
>>> 1. Name: What should be the name for such attr for all the possible usecases?
>>> =============
>>> Latency nice is the proposed name as of now where the lower value indicates
>>> that the task doesn't care much for the latency
>>
>> If by "lower value" you mean -19 (in the proposed [-20,19] range), then
>> I think the meaning should be the opposite.
>>
>> A -19 latency-nice task is a task which is not willing to give up
>> latency. For those tasks for example we want to reduce the wake-up
>> latency at maximum.
>>
>> This will keep its semantic aligned to that of process niceness values
>> which range from -20 (most favourable to the process) to 19 (least
>> favourable to the process).
>>
> 
> I don't want to start a bikeshedding session here, but I agree with Parth
> on the interpretation of the values.
> 
> I've always read niceness values as
> -20 (least nice to the system / other processes)
> +19 (most nice to the system / other processes)
> 
> So following this trend I'd see for latency-nice:


So jotting down separately, in case if we think to have "latency-nice"
terminology, then we might need to select one of the 2 interpretation:

1).
> -20 (least nice to latency, i.e. sacrifice latency for throughput)
> +19 (most nice to latency, i.e. sacrifice throughput for latency)
> 

2).
-20 (least nice to other task in terms of sacrificing latency, i.e.
latency-sensitive)
+19 (most nice to other tasks in terms of sacrificing latency, i.e.
latency-forgoing)


> However...
> 
>>> But there seems to be a bit of confusion on whether we want biasing as well
>>> (latency-biased) or something similar, in which case "latency-nice" may
>>> confuse the end-user.
>>
>> AFAIU PeterZ point was "just" that if we call it "-nice" it has to
>> behave as "nice values" to avoid confusions to users. But, if we come up
>> with a different naming maybe we will have more freedom.
>>
> 
> ...just getting rid of the "-nice" would leave us free not to have to
> interpret the values as "nice to / not nice to" :)
> 
>> Personally, I like both "latency-nice" or "latency-tolerant", where:
>>
>>  - latency-nice:
>>    should have a better understanding based on pre-existing concepts
>>
>>  - latency-tolerant:
>>    decouples a bit its meaning from the niceness thus giving maybe a bit
>>    more freedom in its complete definition and perhaps avoid any
>>    possible interpretation confusion like the one I commented above.
>>
>> Fun fact: there was also the latency-nasty proposal from PaulMK :)
>>
> 
> [...]
> 
>>
>> $> Wakeup path tunings
>> ==========================
>>
>> Some additional possible use-cases was already discussed in [3]:
>>
>>  - dynamically tune the policy of a task among SCHED_{OTHER,BATCH,IDLE}
>>    depending on crossing certain pre-configured threshold of latency
>>    niceness.
>>   
>>  - dynamically bias the vruntime updates we do in place_entity()
>>    depending on the actual latency niceness of a task.
>>   
>>    PeterZ thinks this is dangerous but that we can "(carefully) fumble a
>>    bit there."
>>   
>>  - bias the decisions we take in check_preempt_tick() still depending
>>    on a relative comparison of the current and wakeup task latency
>>    niceness values.
> 
> Aren't we missing the point about tweaking the sched domain scans (which
> AFAIR was the original point for latency-nice)?
> 
> Something like default value is current behaviour and
> - Being less latency-sensitive means increasing the scans (e.g. trending
>   towards only going through the slow wakeup-path at the extreme setting)
> - Being more latency-sensitive means reducing the scans (e.g. trending
>   towards a fraction of the domain scanned in the fast-path at the extreme
>   setting).
> 

Correct. But I was pondering upon the values required for this case.
Is having just a range from [-20,19] even for larger system sufficient enough?

>>
> 
> $> Load balance tuning
> ======================
> 
> Already mentioned these in [4]:
> 
> - Increase (reduce) nr_balance_failed threshold when trying to active
>   balance a latency-sensitive (non-latency-sensitive) task.
> 
> - Increase (decrease) sched_migration_cost factor in task_hot() for
>   latency-sensitive (non-latency-sensitive) tasks.
> 

Thanks for listing down your ideas.

These are pretty useful optimization in general. But one may wonder if we
reduce the search scans for idle-core in wake-up path and by-chance selects
the busy core, then one would expect load balancer to move the task to idle
core.

If I got it correct, the in such cases, the sched_migration_cost should be
carefully increased, right?


>>> References:
>>> ===========
>>> [1]. https://lkml.org/lkml/2019/8/30/829
>>> [2]. https://lkml.org/lkml/2019/7/25/296
>>
>>   [3]. Message-ID: <20190905114709.GM2349@hirez.programming.kicks-ass.net>
>>        https://lore.kernel.org/lkml/20190905114709.GM2349@hirez.programming.kicks-ass.net/
>>
> 
> [4]: https://lkml.kernel.org/r/3d3306e4-3a78-5322-df69-7665cf01cc43@arm.com
> 
>>
>> Best,
>> Patrick
>>

Thanks,
Parth


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Usecases for the per-task latency-nice attribute
  2019-09-19 16:41     ` Parth Shah
@ 2019-09-19 18:07       ` Valentin Schneider
  2019-09-27 13:53       ` Pavel Machek
  1 sibling, 0 replies; 17+ messages in thread
From: Valentin Schneider @ 2019-09-19 18:07 UTC (permalink / raw)
  To: Parth Shah, Patrick Bellasi
  Cc: linux-kernel, Peter Zijlstra, subhra mazumdar, tim.c.chen, mingo,
	morten.rasmussen, dietmar.eggemann, pjt, vincent.guittot,
	quentin.perret, dhaval.giani, daniel.lezcano, tj,
	rafael.j.wysocki, qais.yousef, Patrick Bellasi

On 19/09/2019 17:41, Parth Shah wrote:
> So jotting down separately, in case if we think to have "latency-nice"
> terminology, then we might need to select one of the 2 interpretation:
> 
> 1).
>> -20 (least nice to latency, i.e. sacrifice latency for throughput)
>> +19 (most nice to latency, i.e. sacrifice throughput for latency)
>>
> 
> 2).
> -20 (least nice to other task in terms of sacrificing latency, i.e.
> latency-sensitive)
> +19 (most nice to other tasks in terms of sacrificing latency, i.e.
> latency-forgoing)
> 
> 

I'd vote for 1 (duh) but won't fight for it, if it comes to it I'd be
happy with a random draw :D

>> Aren't we missing the point about tweaking the sched domain scans (which
>> AFAIR was the original point for latency-nice)?
>>
>> Something like default value is current behaviour and
>> - Being less latency-sensitive means increasing the scans (e.g. trending
>>   towards only going through the slow wakeup-path at the extreme setting)
>> - Being more latency-sensitive means reducing the scans (e.g. trending
>>   towards a fraction of the domain scanned in the fast-path at the extreme
>>   setting).
>>
> 
> Correct. But I was pondering upon the values required for this case.
> Is having just a range from [-20,19] even for larger system sufficient enough?
> 

As I said in the original thread by Subhra, this range should be plenty
enough IMO. You get ~5% deltas in each direction after all.

>>>
>>
>> $> Load balance tuning
>> ======================
>>
>> Already mentioned these in [4]:
>>
>> - Increase (reduce) nr_balance_failed threshold when trying to active
>>   balance a latency-sensitive (non-latency-sensitive) task.
>>
>> - Increase (decrease) sched_migration_cost factor in task_hot() for
>>   latency-sensitive (non-latency-sensitive) tasks.
>>
> 
> Thanks for listing down your ideas.
> 
> These are pretty useful optimization in general. But one may wonder if we
> reduce the search scans for idle-core in wake-up path and by-chance selects
> the busy core, then one would expect load balancer to move the task to idle
> core.
> 
> If I got it correct, the in such cases, the sched_migration_cost should be
> carefully increased, right?
> 

IIUC you're describing a scenario where we fail to find an idle core due to
a wakee being latency-sensitive (thus shorter scan), and place it on a rq
that already has runnable tasks (despite idle rqs being available).

In this case yes, we could potentially have a balance attempt trying to pull
from that rq. We'd try to pull the non-running tasks first, and if a
latency-sensitive task happens to be one of them we should be careful with
what we do - a migration could lead to unwanted latency.

It might be a bit more clear when you're balancing between busy cores - 
overall I think you should try to migrate the non-latency-sensitive
tasks first. Playing with task_hot() could be one of the ways to do that, but
it's just a suggestion at this time.

> 
>>>> References:
>>>> ===========
>>>> [1]. https://lkml.org/lkml/2019/8/30/829
>>>> [2]. https://lkml.org/lkml/2019/7/25/296
>>>
>>>   [3]. Message-ID: <20190905114709.GM2349@hirez.programming.kicks-ass.net>
>>>        https://lore.kernel.org/lkml/20190905114709.GM2349@hirez.programming.kicks-ass.net/
>>>
>>
>> [4]: https://lkml.kernel.org/r/3d3306e4-3a78-5322-df69-7665cf01cc43@arm.com
>>
>>>
>>> Best,
>>> Patrick
>>>
> 
> Thanks,
> Parth
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Usecases for the per-task latency-nice attribute
  2019-09-19 14:43 ` Qais Yousef
@ 2019-09-20 10:45   ` Parth Shah
  0 siblings, 0 replies; 17+ messages in thread
From: Parth Shah @ 2019-09-20 10:45 UTC (permalink / raw)
  To: Qais Yousef
  Cc: linux-kernel, Peter Zijlstra, Patrick Bellasi, subhra mazumdar,
	tim.c.chen, Valentin Schneider, mingo, morten.rasmussen,
	dietmar.eggemann, pjt, vincent.guittot, quentin.perret,
	dhaval.giani, daniel.lezcano, tj, rafael.j.wysocki



On 9/19/19 8:13 PM, Qais Yousef wrote:
> On 09/18/19 18:11, Parth Shah wrote:
>> Hello everyone,
>>
>> As per the discussion in LPC2019, new per-task property like latency-nice
>> can be useful in certain scenarios. The scheduler can take proper decision
>> by knowing latency requirement of a task from the end-user itself.
>>
>> There has already been an effort from Subhra for introducing Task
>> latency-nice [1] values and have seen several possibilities where this type of
>> interface can be used.
>>
>> From the best of my understanding of the discussion on the mail thread and
>> in the LPC2019, it seems that there are two dilemmas;
>>
>> 1. Name: What should be the name for such attr for all the possible usecases?
>> =============
>> Latency nice is the proposed name as of now where the lower value indicates
>> that the task doesn't care much for the latency and we can spend some more
>> time in the kernel to decide a better placement of a task (to save time,
>> energy, etc.)
>> But there seems to be a bit of confusion on whether we want biasing as well
>> (latency-biased) or something similar, in which case "latency-nice" may
>> confuse the end-user.
>>
>> 2. Value: What should be the range of possible values supported by this new
>> attr?
>> ==============
>> The possible values of such task attribute still need community attention.
>> Do we need a range of values or just binary/ternary values are sufficient?
>> Also signed or unsigned and so the length of the variable (u64, s32, etc)?
> 
> IMO the main question is who is the intended user of this new knob/API?
> 
> If it's intended for system admins to optimize certain workloads on a system
> then I like the latency-nice range.
> 
> If we want to support application writers to define the latency requirements of
> their tasks then I think latency-nice would be very confusing to use.
> Especially when one has to consider they lack a pre-knowledge about the system
> they will run on; and what else they are sharing the resources with.
> 

Yes, valid point.
But from my view, this will most certainly be for system admins who can
optimize certain workloads from the systemd, tuned or similar OS daemons.

>>
>>
>>
>> This mail is to initiate the discussion regarding the possible usecases of
>> such per task attribute and to come up with a specific name and value for
>> the same.
>>
>> Hopefully, interested one should plot out their usecase for which this new
>> attr can potentially help in solving or optimizing it.
>>
>>
>> Well, to start with, here is my usecase.
>>
>> -------------------
>> **Usecases**
>> -------------------
>>
>> $> TurboSched
>> ====================
>> TurboSched [2] tries to minimize the number of active cores in a socket by
>> packing an un-important and low-utilization (named jitter) task on an
>> already active core and thus refrains from waking up of a new core if
>> possible. This requires tagging of tasks from the userspace hinting which
>> tasks are un-important and thus waking-up a new core to minimize the
>> latency is un-necessary for such tasks.
>> As per the discussion on the posted RFC, it will be appropriate to use the
>> task latency property where a task with the highest latency-nice value can
>> be packed.
>> But for this specific use-cases, having just a binary value to know which
>> task is latency-sensitive and which not is sufficient enough, but having a
>> range is also a good way to go where above some threshold the task can be
>> packed.
> 
> 
> $> EAS
> ====================
> The new knob can help EAS path to switch to spreading behavior when
> latency-nice is set instead of packing tasks on the most energy efficient CPU.
> ie: pick the most energy efficient idle CPU.
> 

+1

Thanks,
Parth

> --
> Qais Yousef
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Usecases for the per-task latency-nice attribute
  2019-09-19 16:41     ` Parth Shah
  2019-09-19 18:07       ` Valentin Schneider
@ 2019-09-27 13:53       ` Pavel Machek
  1 sibling, 0 replies; 17+ messages in thread
From: Pavel Machek @ 2019-09-27 13:53 UTC (permalink / raw)
  To: Parth Shah
  Cc: Valentin Schneider, Patrick Bellasi, linux-kernel,
	Peter Zijlstra, subhra mazumdar, tim.c.chen, mingo,
	morten.rasmussen, dietmar.eggemann, pjt, vincent.guittot,
	quentin.perret, dhaval.giani, daniel.lezcano, tj,
	rafael.j.wysocki, qais.yousef, Patrick Bellasi

Hi!

> > I don't want to start a bikeshedding session here, but I agree with Parth
> > on the interpretation of the values.
> > 
> > I've always read niceness values as
> > -20 (least nice to the system / other processes)
> > +19 (most nice to the system / other processes)
> > 
> > So following this trend I'd see for latency-nice:
> 
> 
> So jotting down separately, in case if we think to have "latency-nice"
> terminology, then we might need to select one of the 2 interpretation:
> 
> 1).
> > -20 (least nice to latency, i.e. sacrifice latency for throughput)
> > +19 (most nice to latency, i.e. sacrifice throughput for latency)
> > 
> 
> 2).
> -20 (least nice to other task in terms of sacrificing latency, i.e.
> latency-sensitive)
> +19 (most nice to other tasks in terms of sacrificing latency, i.e.
> latency-forgoing)

For the record, interpretation 2 makes sense to me.

									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2019-09-27 13:53 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-18 12:41 Usecases for the per-task latency-nice attribute Parth Shah
2019-09-18 14:18 ` Patrick Bellasi
2019-09-18 15:22   ` Vincent Guittot
2019-09-18 15:46     ` Patrick Bellasi
2019-09-18 16:00       ` Vincent Guittot
2019-09-18 15:42   ` Valentin Schneider
2019-09-19 16:41     ` Parth Shah
2019-09-19 18:07       ` Valentin Schneider
2019-09-27 13:53       ` Pavel Machek
2019-09-19  7:01   ` Parth Shah
2019-09-18 17:16 ` Tim Chen
2019-09-19  8:37   ` Parth Shah
2019-09-19 16:27     ` Tim Chen
2019-09-19  9:06   ` David Laight
2019-09-19 16:30     ` Tim Chen
2019-09-19 14:43 ` Qais Yousef
2019-09-20 10:45   ` Parth Shah

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).