* [PATCH 0/2] sched/eevdf: sched_attr::sched_runtime slice hint
@ 2023-09-15 12:43 peterz
  2023-09-15 12:43 ` [PATCH 1/2] sched/eevdf: Also update slice on placement peterz
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: peterz @ 2023-09-15 12:43 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, peterz, vincent.guittot, juri.lelli,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, corbet,
	qyousef, chris.hyser, patrick.bellasi, pjt, pavel, qperret,
	tim.c.chen, joshdon, timj, kprateek.nayak, yu.c.chen,
	youssefesmat, joel, efault, tglx, daniel.m.jordan

Hi,

As promised a while ago, here is a new version of the variable slice length
hint stuff.  Back when I asked for comments on the latency-nice vs slice length
thing, there was very limited feedback on-list, but a number of people have
expressed interest in the slice length hint.


I'm still working on improving the wakeup latency -- but esp. after commit:

  63304558ba5d ("sched/eevdf: Curb wakeup-preemption")

it needs a little more work. Everything I tried so far made it worse.

As is it behaves ok-ish:

  root@ivb-ep:~/bench# cat doit-latency-slice.sh
  #!/bin/bash

  perf bench sched messaging -g 40 -l 12000 &

  sleep 1
  chrt3 -o --sched-runtime $((`cat /debug/sched/base_slice_ns`*10)) 0 cyclictest --policy other -D 5 -q   -H 20000 --histfile data.txt ; grep Latencies data.txt
  chrt3 -o --sched-runtime 0 0 cyclictest --policy other -D 5 -q   -H 20000 --histfile data.txt ; grep Latencies data.txt
  chrt3 -o --sched-runtime $((`cat /debug/sched/base_slice_ns`/10)) 0 cyclictest --policy other -D 5 -q   -H 20000 --histfile data.txt ; grep Latencies data.txt

  wait $!
  root@ivb-ep:~/bench# ./doit-latency-slice.sh
  # Running 'sched/messaging' benchmark:
  # /dev/cpu_dma_latency set to 0us
  # Min Latencies: 00060
  # Avg Latencies: 00990
  # Max Latencies: 224925
  # /dev/cpu_dma_latency set to 0us
  # Min Latencies: 00020
  # Avg Latencies: 00656
  # Max Latencies: 37595
  # /dev/cpu_dma_latency set to 0us
  # Min Latencies: 00016
  # Avg Latencies: 00354
  # Max Latencies: 16687
  # 20 sender and receiver processes per group
  # 40 groups == 1600 processes run

       Total time: 38.246 [sec]


(chrt3 is a hacked up version of util-linux/chrt that allows --sched-runtime unconditionally)
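
An application can also set the hint on itself directly via sched_setattr() with
sched_attr::sched_runtime filled in for SCHED_OTHER.  A minimal sketch, assuming
a kernel with this series applied (glibc has no wrapper, so raw syscall; the
struct below is trimmed to SCHED_ATTR_SIZE_VER0):

  /* slice-hint.c -- minimal sketch, assumes this series is applied */
  #define _GNU_SOURCE
  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>
  #include <sys/syscall.h>

  /* first 48 bytes of sched_attr (SCHED_ATTR_SIZE_VER0) */
  struct sched_attr {
          uint32_t size;
          uint32_t sched_policy;
          uint64_t sched_flags;
          int32_t  sched_nice;
          uint32_t sched_priority;
          uint64_t sched_runtime;         /* doubles as the slice hint for SCHED_OTHER */
          uint64_t sched_deadline;
          uint64_t sched_period;
  };

  int main(void)
  {
          struct sched_attr attr;

          memset(&attr, 0, sizeof(attr));
          attr.size          = sizeof(attr);
          attr.sched_policy  = 0;                 /* SCHED_OTHER */
          attr.sched_runtime = 100 * 1000;        /* 0.1[ms] request in [ns]; clamped to 0.1..100[ms] */

          if (syscall(SYS_sched_setattr, 0, &attr, 0))    /* pid 0 == current task */
                  perror("sched_setattr");

          return 0;
  }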



* [PATCH 1/2] sched/eevdf: Also update slice on placement
  2023-09-15 12:43 [PATCH 0/2] sched/eevdf: sched_attr::sched_runtime slice hint peterz
@ 2023-09-15 12:43 ` peterz
  2023-10-03 10:42   ` [tip: sched/urgent] " tip-bot2 for Peter Zijlstra
  2023-09-15 12:43 ` [PATCH 2/2] sched/eevdf: Use sched_attr::sched_runtime to set request/slice suggestion peterz
  2023-09-16 21:33 ` [PATCH 0/2] sched/eevdf: sched_attr::sched_runtime slice hint Qais Yousef
  2 siblings, 1 reply; 14+ messages in thread
From: peterz @ 2023-09-15 12:43 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, peterz, vincent.guittot, juri.lelli,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, corbet,
	qyousef, chris.hyser, patrick.bellasi, pjt, pavel, qperret,
	tim.c.chen, joshdon, timj, kprateek.nayak, yu.c.chen,
	youssefesmat, joel, efault, tglx, daniel.m.jordan

[-- Attachment #1: peterz-sched-eevdf-slice-update.patch --]
[-- Type: text/plain, Size: 1128 bytes --]

Tasks that never consume their full slice would not update their slice value.
This means that tasks that are spawned before the sysctl scaling keep their
original (UP) slice length.

Fixes: 147f3efaa241 ("sched/fair: Implement an EEVDF-like scheduling policy")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/fair.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Index: linux-2.6/kernel/sched/fair.c
===================================================================
--- linux-2.6.orig/kernel/sched/fair.c
+++ linux-2.6/kernel/sched/fair.c
@@ -4919,10 +4919,12 @@ static inline void update_misfit_status(
 static void
 place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 {
-	u64 vslice = calc_delta_fair(se->slice, se);
-	u64 vruntime = avg_vruntime(cfs_rq);
+	u64 vslice, vruntime = avg_vruntime(cfs_rq);
 	s64 lag = 0;
 
+	se->slice = sysctl_sched_base_slice;
+	vslice = calc_delta_fair(se->slice, se);
+
 	/*
 	 * Due to how V is constructed as the weighted average of entities,
 	 * adding tasks with positive lag, or removing tasks with negative lag




* [PATCH 2/2] sched/eevdf: Use sched_attr::sched_runtime to set request/slice suggestion
  2023-09-15 12:43 [PATCH 0/2] sched/eevdf: sched_attr::sched_runtime slice hint peterz
  2023-09-15 12:43 ` [PATCH 1/2] sched/eevdf: Also update slice on placement peterz
@ 2023-09-15 12:43 ` peterz
  2023-09-19  7:53   ` Mike Galbraith
  2023-09-19 22:07   ` Qais Yousef
  2023-09-16 21:33 ` [PATCH 0/2] sched/eevdf: sched_attr::sched_runtime slice hint Qais Yousef
  2 siblings, 2 replies; 14+ messages in thread
From: peterz @ 2023-09-15 12:43 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, peterz, vincent.guittot, juri.lelli,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, corbet,
	qyousef, chris.hyser, patrick.bellasi, pjt, pavel, qperret,
	tim.c.chen, joshdon, timj, kprateek.nayak, yu.c.chen,
	youssefesmat, joel, efault, tglx, daniel.m.jordan

[-- Attachment #1: peterz-sched_attr-runtime.patch --]
[-- Type: text/plain, Size: 9418 bytes --]

Allow applications to directly set a suggested request/slice length using
sched_attr::sched_runtime.

The implementation clamps the value to: 0.1[ms] <= slice <= 100[ms]
which is 1/10 the size of HZ=1000 and 10 times the size of HZ=100.

Applications should strive to use their periodic runtime at a high
confidence interval (95%+) as the target slice. Using a smaller slice
will introduce undue preemptions, while using a larger value will
increase latency.

For all the following examples assume a scheduling quantum of 8, and for
consistency all examples have W=4:

  {A,B,C,D}(w=1,r=8):

  ABCD...
  +---+---+---+---

  t=0, V=1.5				t=1, V=3.5
  A  |------<				A          |------<
  B   |------<				B   |------<
  C    |------<				C    |------<
  D     |------<			D     |------<
  ---+*------+-------+---		---+--*----+-------+---

  t=2, V=5.5				t=3, V=7.5
  A          |------<			A          |------<
  B           |------<			B           |------<
  C    |------<				C            |------<
  D     |------<			D     |------<
  ---+----*--+-------+---		---+------*+-------+---

Note: 4 identical tasks in FIFO order

~~~

  {A,B}(w=1,r=16) C(w=2,r=16)

  AACCBBCC...
  +---+---+---+---

  t=0, V=1.25				t=2, V=5.25
  A  |--------------<                   A                  |--------------<
  B   |--------------<                  B   |--------------<
  C    |------<                         C    |------<
  ---+*------+-------+---               ---+----*--+-------+---

  t=4, V=8.25				t=6, V=12.25
  A                  |--------------<   A                  |--------------<
  B   |--------------<                  B                   |--------------<
  C            |------<                 C            |------<
  ---+-------*-------+---               ---+-------+---*---+---

Note: 1 heavy task -- because q=8, double r such that the deadline of the w=2
      task doesn't go below q.

Note: observe the full schedule becomes: W*max(r_i/w_i) = 4*2q = 8q in length.

Note: the period of the heavy task is half the full period at:
      W*(r_i/w_i) = 4*(2q/2) = 4q

~~~

  {A,C,D}(w=1,r=16) B(w=1,r=8):

  BAACCBDD...
  +---+---+---+---

  t=0, V=1.5				t=1, V=3.5
  A  |--------------<			A  |---------------<
  B   |------<				B           |------<
  C    |--------------<			C    |--------------<
  D     |--------------<		D     |--------------<
  ---+*------+-------+---		---+--*----+-------+---

  t=3, V=7.5				t=5, V=11.5
  A                  |---------------<  A                  |---------------<
  B           |------<                  B           |------<
  C    |--------------<                 C                    |--------------<
  D     |--------------<                D     |--------------<
  ---+------*+-------+---               ---+-------+--*----+---

  t=6, V=13.5
  A                  |---------------<
  B                   |------<
  C                    |--------------<
  D     |--------------<
  ---+-------+----*--+---

Note: 1 short task -- again double r so that the deadline of the short task
      won't be below q. Made B short because it's not the leftmost task, but is
      eligible with the 0,1,2,3 spread.

Note: like with the heavy task, the period of the short task observes:
      W*(r_i/w_i) = 4*(1q/1) = 4q

~~~

  A(w=1,r=16) B(w=1,r=8) C(w=2,r=16)

  BCCAABCC...
  +---+---+---+---

  t=0, V=1.25				t=1, V=3.25
  A  |--------------<                   A  |--------------<
  B   |------<                          B           |------<
  C    |------<                         C    |------<
  ---+*------+-------+---               ---+--*----+-------+---

  t=3, V=7.25				t=5, V=11.25
  A  |--------------<                   A                  |--------------<
  B           |------<                  B           |------<
  C            |------<                 C            |------<
  ---+------*+-------+---               ---+-------+--*----+---

  t=6, V=13.25
  A                  |--------------<
  B                   |------<
  C            |------<
  ---+-------+----*--+---

Note: 1 heavy and 1 short task -- combine them all.

Note: both the short and heavy task end up with a period of 4q

~~~

  A(w=1,r=16) B(w=2,r=16) C(w=1,r=8)

  BBCAABBC...
  +---+---+---+---

  t=0, V=1				t=2, V=5
  A  |--------------<                   A  |--------------<
  B   |------<                          B           |------<
  C    |------<                         C    |------<
  ---+*------+-------+---               ---+----*--+-------+---

  t=3, V=7				t=5, V=11
  A  |--------------<                   A                  |--------------<
  B           |------<                  B           |------<
  C            |------<                 C            |------<
  ---+------*+-------+---               ---+-------+--*----+---

  t=7, V=15
  A                  |--------------<
  B                   |------<
  C            |------<
  ---+-------+------*+---

Note: as before but permuted

~~~

From all this it can be deduced that, for the steady state:

 - the total period (P) of a schedule is:	W*max(r_i/w_i)
 - the average period of a task is:		W*(r_i/w_i)
 - each task obtains the fair share:		w_i/W of each full period P
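
For example, taking the last case above, A(w=1,r=16) B(w=2,r=16) C(w=1,r=8),
with q=8 and W=4:

 - P = W*max(r_i/w_i) = 4*max(16/1, 16/2, 8/1) = 4*16 = 8q
 - A: period = 4*(16/1) = 8q, one r=16 activation per P   (w_i/W = 1/4 of P)
 - B: period = 4*(16/2) = 4q, two r=16 activations per P  (w_i/W = 1/2 of P)
 - C: period = 4*( 8/1) = 4q, two r=8 activations per P   (w_i/W = 1/4 of P)

which is exactly the BBCAABBC pattern of length 8q shown above.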

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/sched.h |    3 +++
 kernel/sched/core.c   |   33 ++++++++++++++++++++++++++-------
 kernel/sched/fair.c   |    6 ++++--
 3 files changed, 33 insertions(+), 9 deletions(-)

Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -555,6 +555,9 @@ struct sched_entity {
 	struct list_head		group_node;
 	unsigned int			on_rq;
 
+	unsigned int			custom_slice : 1;
+					/* 31 bits hole */
+
 	u64				exec_start;
 	u64				sum_exec_runtime;
 	u64				prev_sum_exec_runtime;
Index: linux-2.6/kernel/sched/core.c
===================================================================
--- linux-2.6.orig/kernel/sched/core.c
+++ linux-2.6/kernel/sched/core.c
@@ -4501,7 +4501,6 @@ static void __sched_fork(unsigned long c
 	p->se.nr_migrations		= 0;
 	p->se.vruntime			= 0;
 	p->se.vlag			= 0;
-	p->se.slice			= sysctl_sched_base_slice;
 	INIT_LIST_HEAD(&p->se.group_node);
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
@@ -4755,6 +4754,8 @@ int sched_fork(unsigned long clone_flags
 
 		p->prio = p->normal_prio = p->static_prio;
 		set_load_weight(p, false);
+		p->se.custom_slice = 0;
+		p->se.slice = sysctl_sched_base_slice;
 
 		/*
 		 * We don't need the reset flag anymore after the fork. It has
@@ -7552,10 +7553,20 @@ static void __setscheduler_params(struct
 
 	p->policy = policy;
 
-	if (dl_policy(policy))
+	if (dl_policy(policy)) {
 		__setparam_dl(p, attr);
-	else if (fair_policy(policy))
+	} else if (fair_policy(policy)) {
 		p->static_prio = NICE_TO_PRIO(attr->sched_nice);
+		if (attr->sched_runtime) {
+			p->se.custom_slice = 1;
+			p->se.slice = clamp_t(u64, attr->sched_runtime,
+					      NSEC_PER_MSEC/10,   /* HZ=1000 * 10 */
+					      NSEC_PER_MSEC*100); /* HZ=100  / 10 */
+		} else {
+			p->se.custom_slice = 0;
+			p->se.slice = sysctl_sched_base_slice;
+		}
+	}
 
 	/*
 	 * __sched_setscheduler() ensures attr->sched_priority == 0 when
@@ -7740,7 +7751,9 @@ recheck:
 	 * but store a possible modification of reset_on_fork.
 	 */
 	if (unlikely(policy == p->policy)) {
-		if (fair_policy(policy) && attr->sched_nice != task_nice(p))
+		if (fair_policy(policy) &&
+		    (attr->sched_nice != task_nice(p) ||
+		     (attr->sched_runtime && attr->sched_runtime != p->se.slice)))
 			goto change;
 		if (rt_policy(policy) && attr->sched_priority != p->rt_priority)
 			goto change;
@@ -7886,6 +7899,9 @@ static int _sched_setscheduler(struct ta
 		.sched_nice	= PRIO_TO_NICE(p->static_prio),
 	};
 
+	if (p->se.custom_slice)
+		attr.sched_runtime = p->se.slice;
+
 	/* Fixup the legacy SCHED_RESET_ON_FORK hack. */
 	if ((policy != SETPARAM_POLICY) && (policy & SCHED_RESET_ON_FORK)) {
 		attr.sched_flags |= SCHED_FLAG_RESET_ON_FORK;
@@ -8062,12 +8078,14 @@ err_size:
 
 static void get_params(struct task_struct *p, struct sched_attr *attr)
 {
-	if (task_has_dl_policy(p))
+	if (task_has_dl_policy(p)) {
 		__getparam_dl(p, attr);
-	else if (task_has_rt_policy(p))
+	} else if (task_has_rt_policy(p)) {
 		attr->sched_priority = p->rt_priority;
-	else
+	} else {
 		attr->sched_nice = task_nice(p);
+		attr->sched_runtime = p->se.slice;
+	}
 }
 
 /**
@@ -10086,6 +10104,7 @@ void __init sched_init(void)
 	}
 
 	set_load_weight(&init_task, false);
+	init_task.se.slice = sysctl_sched_base_slice;
 
 	/*
 	 * The boot idle thread does lazy MMU switching as well:
Index: linux-2.6/kernel/sched/fair.c
===================================================================
--- linux-2.6.orig/kernel/sched/fair.c
+++ linux-2.6/kernel/sched/fair.c
@@ -974,7 +974,8 @@ static void update_deadline(struct cfs_r
 	 * nice) while the request time r_i is determined by
 	 * sysctl_sched_base_slice.
 	 */
-	se->slice = sysctl_sched_base_slice;
+	if (!se->custom_slice)
+		se->slice = sysctl_sched_base_slice;
 
 	/*
 	 * EEVDF: vd_i = ve_i + r_i / w_i
@@ -4922,7 +4923,8 @@ place_entity(struct cfs_rq *cfs_rq, stru
 	u64 vslice, vruntime = avg_vruntime(cfs_rq);
 	s64 lag = 0;
 
-	se->slice = sysctl_sched_base_slice;
+	if (!se->custom_slice)
+		se->slice = sysctl_sched_base_slice;
 	vslice = calc_delta_fair(se->slice, se);
 
 	/*




* Re: [PATCH 0/2] sched/eevdf: sched_attr::sched_runtime slice hint
  2023-09-15 12:43 [PATCH 0/2] sched/eevdf: sched_attr::sched_runtime slice hint peterz
  2023-09-15 12:43 ` [PATCH 1/2] sched/eevdf: Also update slice on placement peterz
  2023-09-15 12:43 ` [PATCH 2/2] sched/eevdf: Use sched_attr::sched_runtime to set request/slice suggestion peterz
@ 2023-09-16 21:33 ` Qais Yousef
  2023-09-18  3:43   ` Mike Galbraith
  2 siblings, 1 reply; 14+ messages in thread
From: Qais Yousef @ 2023-09-16 21:33 UTC (permalink / raw)
  To: peterz
  Cc: mingo, linux-kernel, vincent.guittot, juri.lelli,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, corbet,
	chris.hyser, patrick.bellasi, pjt, pavel, qperret, tim.c.chen,
	joshdon, timj, kprateek.nayak, yu.c.chen, youssefesmat, joel,
	efault, tglx, daniel.m.jordan

On 09/15/23 14:43, peterz@infradead.org wrote:
> Hi,
> 
> As promised a while ago, here is a new version of the variable slice length
> hint stuff.  Back when I asked for comments on the latency-nice vs slice length
> thing, there was very limited feedback on-list, but a number of people have
> expressed interest in the slice length hint.

I did try to give feedback then, but I don't see my email on lore and not sure
if it arrived.

As it stands having any interface to help describe latency sensitive tasks is
much desired! So I am desperate enough to take whatever.

But at the same time, I do see that we will not stop here. We get a lot of
conflicting requirements from different workloads, and I think we need
a framework that provides a sensible way for us to describe those needs and
help the scheduler manage resources better.

I wasn't sure if you're still planning to send an interface or not, so I was
working on this potential way (patch below) to provide a generic framework for
sched-qos. It's only a shell as I haven't had a chance to implement the
WAKEUP_LATENCY hint yet.

I did add the concept of grouping, so the hint can be meaningful for a group of
tasks sharing a cookie, borrowing the concept from core scheduling. I've seen
many times how priorities (or nice) were used incorrectly with the assumption
that they apply to the app's tasks only, which might be the case with autogroup
on some systems. But it struck a chord then that there's a perception/desire
not to apply it globally but only relative to a group of tasks they care about.
So I catered to allow describing such cases.

I was still trying to wrap my head around implementing the WAKEUP_LATENCY hint,
but the idea I had is to represent WAKEUP_LATENCY in us time and somehow
translate this into lag, which is what I thought would be our admission control.
Based on your patch it seems it might be simpler than this.

I was still thinking this through, to be honest. But it seems it's either speak
now or forever hold my peace, so here you go :)


Cheers

--
Qais Yousef

--->8---

From d6c83e05a81ac4ca34e99cb1f56d1acdacc63362 Mon Sep 17 00:00:00 2001
From: Qais Yousef <qyousef@layalina.io>
Date: Sat, 26 Aug 2023 17:39:31 +0100
Subject: [PATCH] sched: Add a new sched-qos interface

The need to describe the conflicting demands of various workloads has
never been higher. Both hardware and software have moved rapidly in the
past decade, system usage is more diverse, and the number of workloads
expected to run on the same machine, whether in the Mobile or Server
markets, has created a big dilemma on how to better manage those
requirements.

The problem is that we lack mechanisms to allow these workloads to
describe what they need, and then allow the kernel to do its best effort
to manage those demands transparently, based on the hardware it is
running on and the current system state.

Example of conflicting requirements that come across frequently:

	1. Improve wake up latency for SCHED_OTHER. Many tasks
	   end up using SCHED_FIFO/SCHED_RR to compensate for this
	   shortcoming. RT tasks lack power management and fairness and
	   can be hard and error prone to use correctly and portably.

	2. Prefer spreading vs prefer packing on wake up for a group of
	   tasks. Geekbench-like workloads would benefit from
	   parallelising on different CPUs. hackbench type of workloads
	   can benefit from waking up on the same CPUs or a CPU that is
	   closer in the cache hierarchy.

	3. Nice values for SCHED_OTHER are system wide and require
	   privileges. Many workloads would like a way to set relative
	   nice value so they can preempt each other, but not
	   impact or be impacted by other tasks belonging to different
	   workloads on the system.

	4. Provide a way to tag some tasks as 'background' to keep them
	   out of the way. SCHED_IDLE is too strong for some of these
	   tasks, yet they can be computationally heavy. Example
	   tasks are garbage collectors. Their work is both important
	   and not important.

Whether any of these use cases warrants an additional QoS hint is
something to be discussed individually. But the main point is to
introduce an interface that can be extended to cater for those
requirements and potentially more. Wake up latency is the major driving
use case, one that has been brewing for years already, and it is the
first QoS hint to be introduced in later patches.

It is desired to have apps (and benchmarks!) directly use this interface
for optimal perf/watt. But in the absence of such support, it should be
possible to write a userspace daemon to monitor workloads and apply
these QoS hints on the apps' behalf based on analysis done by anyone
interested in improving the performance of those workloads.

Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
---
 Documentation/scheduler/index.rst     |  1 +
 Documentation/scheduler/sched-qos.rst | 44 +++++++++++++++++++++++++
 include/uapi/linux/sched.h            |  4 +++
 include/uapi/linux/sched/types.h      | 46 +++++++++++++++++++++++++++
 kernel/sched/core.c                   |  3 ++
 tools/include/uapi/linux/sched.h      |  4 +++
 6 files changed, 102 insertions(+)
 create mode 100644 Documentation/scheduler/sched-qos.rst

diff --git a/Documentation/scheduler/index.rst b/Documentation/scheduler/index.rst
index 3170747226f6..fef59d7cd8e2 100644
--- a/Documentation/scheduler/index.rst
+++ b/Documentation/scheduler/index.rst
@@ -20,6 +20,7 @@ Scheduler
     sched-rt-group
     sched-stats
     sched-debug
+    sched-qos
 
     text_files
 
diff --git a/Documentation/scheduler/sched-qos.rst b/Documentation/scheduler/sched-qos.rst
new file mode 100644
index 000000000000..0911261cb124
--- /dev/null
+++ b/Documentation/scheduler/sched-qos.rst
@@ -0,0 +1,44 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=============
+Scheduler QoS
+=============
+
+1. Introduction
+===============
+
+Different workloads have different scheduling requirements to operate
+optimally. The same applies to tasks within the same workload.
+
+To enable smarter usage of system resources and to cater for the conflicting
+demands of various tasks, Scheduler QoS provides a mechanism to provide more
+information about those demands so that scheduler can do best-effort to
+honour them.
+
+  @sched_qos_type	what QoS hint to apply
+  @sched_qos_value	value of the QoS hint
+  @sched_qos_cookie	magic cookie to tag a group of tasks for which the QoS
+			applies. If 0, the hint will apply globally system
+			wide. If not 0, the hint will be relative to tasks that
+			has the same cookie value only.
+
+QoS hints are set once and not inherited by children by design. The
+rationale is that each task has its individual characteristics and it is
+encouraged to describe each of these separately. Also since system resources
+are finite, there's a limit to what can be done to honour these requests
+before reaching a tipping point where there are too many requests for
+a particular QoS to possibly service all of them at once, and
+some will start to lose out. For example, if 10 tasks require better wake
+up latencies on a 4-CPU SMP system and they all wake up at once, only
+4 can perceive the hint as honoured and the rest will have to wait. Inheritance
+can easily lead these 10 to become 100 or 1000, and then the QoS
+hint will rapidly lose its meaning and effectiveness. The chances of 10
+tasks waking up at the same time are lower than of 100, and lower than of 1000.
+
+To set multiple QoS hints, a syscall is required for each. This is a
+trade-off to reduce the churn of extending the interface: the hope is for
+it to evolve as workloads and hardware get more sophisticated and the
+need for extensions arises; and when that happens it should be
+simpler to add the kernel extension and let userspace use it readily by
+setting the newly added flag, without having to update the whole of
+sched_attr.
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index 3bac0a8ceab2..67ef99f64ddc 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -102,6 +102,9 @@ struct clone_args {
 	__aligned_u64 set_tid_size;
 	__aligned_u64 cgroup;
 };
+
+enum sched_qos_type {
+};
 #endif
 
 #define CLONE_ARGS_SIZE_VER0 64 /* sizeof first published struct */
@@ -132,6 +135,7 @@ struct clone_args {
 #define SCHED_FLAG_KEEP_PARAMS		0x10
 #define SCHED_FLAG_UTIL_CLAMP_MIN	0x20
 #define SCHED_FLAG_UTIL_CLAMP_MAX	0x40
+#define SCHED_FLAG_QOS			0x80
 
 #define SCHED_FLAG_KEEP_ALL	(SCHED_FLAG_KEEP_POLICY | \
 				 SCHED_FLAG_KEEP_PARAMS)
diff --git a/include/uapi/linux/sched/types.h b/include/uapi/linux/sched/types.h
index f2c4589d4dbf..8c2658ffe4bd 100644
--- a/include/uapi/linux/sched/types.h
+++ b/include/uapi/linux/sched/types.h
@@ -98,6 +98,48 @@ struct sched_param {
  * scheduled on a CPU with no more capacity than the specified value.
  *
  * A task utilization boundary can be reset by setting the attribute to -1.
+ *
+ * Scheduler QoS
+ * =============
+ *
+ * Different workloads have different scheduling requirements to operate
+ * optimally. The same applies to tasks within the same workload.
+ *
+ * To enable smarter usage of system resources and to cater for the conflicting
+ * demands of various tasks, Scheduler QoS provides a mechanism to provide more
+ * information about those demands so that scheduler can do best-effort to
+ * honour them.
+ *
+ *  @sched_qos_type	what QoS hint to apply
+ *  @sched_qos_value	value of the QoS hint
+ *  @sched_qos_cookie	magic cookie to tag a group of tasks for which the QoS
+ *			applies. If 0, the hint will apply globally system
+ *			wide. If not 0, the hint will be relative to tasks that
+ *			has the same cookie value only.
+ *
+ * QoS hints are set once and not inherited by children by design. The
+ * rationale is that each task has its individual characteristics and it is
+ * encouraged to describe each of these separately. Also since system resources
+ * are finite, there's a limit to what can be done to honour these requests
+ * before reaching a tipping point where there are too many requests for
+ * a particular QoS to possibly service all of them at once, and
+ * some will start to lose out. For example, if 10 tasks require better wake
+ * up latencies on a 4-CPU SMP system and they all wake up at once, only
+ * 4 can perceive the hint as honoured and the rest will have to wait. Inheritance
+ * can easily lead these 10 to become 100 or 1000, and then the QoS
+ * hint will rapidly lose its meaning and effectiveness. The chances of 10
+ * tasks waking up at the same time are lower than of 100, and lower than of 1000.
+ *
+ * To set multiple QoS hints, a syscall is required for each. This is a
+ * trade-off to reduce the churn of extending the interface: the hope is for
+ * it to evolve as workloads and hardware get more sophisticated and the
+ * need for extensions arises; and when that happens it should be
+ * simpler to add the kernel extension and let userspace use it readily by
+ * setting the newly added flag, without having to update the whole of
+ * sched_attr.
+ *
+ * Details about the available QoS hints can be found in:
+ * Documentation/scheduler/sched-qos.rst
  */
 struct sched_attr {
 	__u32 size;
@@ -120,6 +162,10 @@ struct sched_attr {
 	__u32 sched_util_min;
 	__u32 sched_util_max;
 
+	__u32 sched_qos_type;
+	__s64 sched_qos_value;
+	__u32 sched_qos_cookie;
+
 };
 
 #endif /* _UAPI_LINUX_SCHED_TYPES_H */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index efe3848978a0..efc658f0f6e7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7680,6 +7680,9 @@ static int __sched_setscheduler(struct task_struct *p,
 			return retval;
 	}
 
+	if (attr->sched_flags & SCHED_FLAG_QOS)
+		return -ENOSYS;
+
 	/*
 	 * SCHED_DEADLINE bandwidth accounting relies on stable cpusets
 	 * information.
diff --git a/tools/include/uapi/linux/sched.h b/tools/include/uapi/linux/sched.h
index 3bac0a8ceab2..67ef99f64ddc 100644
--- a/tools/include/uapi/linux/sched.h
+++ b/tools/include/uapi/linux/sched.h
@@ -102,6 +102,9 @@ struct clone_args {
 	__aligned_u64 set_tid_size;
 	__aligned_u64 cgroup;
 };
+
+enum sched_qos_type {
+};
 #endif
 
 #define CLONE_ARGS_SIZE_VER0 64 /* sizeof first published struct */
@@ -132,6 +135,7 @@ struct clone_args {
 #define SCHED_FLAG_KEEP_PARAMS		0x10
 #define SCHED_FLAG_UTIL_CLAMP_MIN	0x20
 #define SCHED_FLAG_UTIL_CLAMP_MAX	0x40
+#define SCHED_FLAG_QOS			0x80
 
 #define SCHED_FLAG_KEEP_ALL	(SCHED_FLAG_KEEP_POLICY | \
 				 SCHED_FLAG_KEEP_PARAMS)
-- 
2.34.1
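
To illustrate the intended usage, a sketch only: SCHED_QOS_WAKEUP_LATENCY below
is a hypothetical value for the still-empty enum sched_qos_type, and as posted
the kernel simply returns -ENOSYS for SCHED_FLAG_QOS:

  /* Assumes the uapi additions from the patch above (struct sched_attr with
   * the sched_qos_* fields, SCHED_FLAG_QOS) on a kernel carrying it. */
  #define _GNU_SOURCE
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>
  #include <sys/syscall.h>
  #include <linux/sched.h>
  #include <linux/sched/types.h>

  #define SCHED_QOS_WAKEUP_LATENCY 1      /* hypothetical; the enum is still empty */

  int main(void)
  {
          struct sched_attr attr;

          memset(&attr, 0, sizeof(attr));
          attr.size             = sizeof(attr);
          attr.sched_flags      = SCHED_FLAG_QOS | SCHED_FLAG_KEEP_ALL;   /* keep policy/params */
          attr.sched_qos_type   = SCHED_QOS_WAKEUP_LATENCY;
          attr.sched_qos_value  = 200;    /* e.g. tolerated wakeup latency, in us */
          attr.sched_qos_cookie = 0;      /* 0 == applies system wide */

          if (syscall(SYS_sched_setattr, 0, &attr, 0))
                  perror("sched_setattr");        /* -ENOSYS with this patch alone */

          return 0;
  }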



* Re: [PATCH 0/2] sched/eevdf: sched_attr::sched_runtime slice hint
  2023-09-16 21:33 ` [PATCH 0/2] sched/eevdf: sched_attr::sched_runtime slice hint Qais Yousef
@ 2023-09-18  3:43   ` Mike Galbraith
  2023-09-19 21:08     ` Qais Yousef
  0 siblings, 1 reply; 14+ messages in thread
From: Mike Galbraith @ 2023-09-18  3:43 UTC (permalink / raw)
  To: Qais Yousef, peterz
  Cc: mingo, linux-kernel, vincent.guittot, juri.lelli,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, corbet,
	chris.hyser, patrick.bellasi, pjt, pavel, qperret, tim.c.chen,
	joshdon, timj, kprateek.nayak, yu.c.chen, youssefesmat, joel,
	tglx, daniel.m.jordan

On Sat, 2023-09-16 at 22:33 +0100, Qais Yousef wrote:
>
> Example of conflicting requirements that come across frequently:
>
>         1. Improve wake up latency for SCHED_OTHER. Many tasks
>            end up using SCHED_FIFO/SCHED_RR to compensate for this
>            shortcoming. RT tasks lack power management and fairness and
>            can be hard and error prone to use correctly and portably.

This bit appears to be dealt with about as nicely as it can be in a
fair class by the latency nice patch set, and deals with both
individual tasks and groups thereof, ie has cgroups support.

Its trade of slice for latency fits EEVDF nicely IMHO.  As its name
implies, the trade agreement language is relative niceness, which I
find more appropriate than time units, use of which would put the deal
squarely into the realm of RT and thus has no place in a fair class.

I don't yet know how effective it is.  I dinged up schedtool to play
with both it and $subject, but have yet to target any pet piglets or
measure the impact of the shiny new lipstick cannon.

>         2. Prefer spreading vs prefer packing on wake up for a group of
>            tasks. Geekbench-like workloads would benefit from
>            parallelising on different CPUs. hackbench type of workloads
>            can benefit from waking up on the same CPUs or a CPU that is
>            closer in the cache hierarchy.
>
>         3. Nice values for SCHED_OTHER are system wide and require
>            privileges. Many workloads would like a way to set relative
>            nice value so they can preempt each other, but not
>            impact or be impacted by other tasks belonging to different
>            workloads on the system.
>
>         4. Provide a way to tag some tasks as 'background' to keep them
>            out of the way. SCHED_IDLE is too strong for some of these
>            tasks, yet they can be computationally heavy. Example
>            tasks are garbage collectors. Their work is both important
>            and not important.

All three of those make my eyebrows twitch mightily even in their not
well defined form: any notion of applying badges to identify groups of
tasks would constitute creation of yet another cgroups.

	-Mike


* Re: [PATCH 2/2] sched/eevdf: Use sched_attr::sched_runtime to set request/slice suggestion
  2023-09-15 12:43 ` [PATCH 2/2] sched/eevdf: Use sched_attr::sched_runtime to set request/slice suggestion peterz
@ 2023-09-19  7:53   ` Mike Galbraith
  2023-09-24  9:21     ` Mike Galbraith
  2023-09-19 22:07   ` Qais Yousef
  1 sibling, 1 reply; 14+ messages in thread
From: Mike Galbraith @ 2023-09-19  7:53 UTC (permalink / raw)
  To: peterz, mingo
  Cc: linux-kernel, vincent.guittot, juri.lelli, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, corbet, qyousef, chris.hyser,
	patrick.bellasi, pjt, pavel, qperret, tim.c.chen, joshdon, timj,
	kprateek.nayak, yu.c.chen, youssefesmat, joel, tglx,
	daniel.m.jordan

On Fri, 2023-09-15 at 14:43 +0200, peterz@infradead.org wrote:
> Allow applications to directly set a suggested request/slice length using
> sched_attr::sched_runtime.

I met an oddity while fiddling to see what a custom slice would do for
cyclictest, it seeming to be a reasonable target.  $subject seems to be
working fine per live peek with crash, caveat being it's not alone in
otherwise virgin source...

homer:..kernel/linux-tip # make -j8 vmlinux && killall cyclictest

Elsewhere, post build/kill...

instance-a, stock
homer:/root # cyclictest -Smq
# /dev/cpu_dma_latency set to 0us
T: 0 (17763) P: 0 I:1000 C: 297681 Min:      2 Act:   51 Avg:   66 Max:   10482
T: 1 (17764) P: 0 I:1500 C: 198639 Min:      2 Act: 3159 Avg:   70 Max:    9054
T: 2 (17765) P: 0 I:2000 C: 149075 Min:      2 Act:   52 Avg:   78 Max:    8190
T: 3 (17766) P: 0 I:2500 C: 119867 Min:      2 Act:   55 Avg:   77 Max:    8328
T: 4 (17767) P: 0 I:3000 C:  99888 Min:      2 Act:   51 Avg:   90 Max:    8483
T: 5 (17768) P: 0 I:3500 C:  85748 Min:      3 Act:   53 Avg:   76 Max:    8148
T: 6 (17769) P: 0 I:4000 C:  75153 Min:      3 Act:   53 Avg:   92 Max:    7510
T: 7 (17770) P: 0 I:4500 C:  66807 Min:      3 Act:   55 Avg:   81 Max:    8709

instance-b, launched w. custom slice, and verified via live peek with crash.
homer:/root # schedtool -v -s 100000 -e cyclictest -Smq
PID 17753: PRIO   0, POLICY N: SCHED_NORMAL  , NICE   0, EEVDF_NICE   0, EEVDF_SLICE 100000, AFFINITY 0xff
# /dev/cpu_dma_latency set to 0us
T: 0 (17754) P: 0 I:1000 C: 297014 Min:      1 Act:   51 Avg:   79 Max:    9584
T: 1 (17755) P: 0 I:1500 C: 198401 Min:      1 Act: 3845 Avg:  118 Max:    9995
T: 2 (17756) P: 0 I:2000 C: 149137 Min:      1 Act:  103 Avg:  125 Max:    8863
T: 3 (17757) P: 0 I:2500 C: 119519 Min:      1 Act:   52 Avg:  218 Max:    9704
T: 4 (17758) P: 0 I:3000 C:  99760 Min:      1 Act:   51 Avg:  134 Max:   11108
T: 5 (17759) P: 0 I:3500 C:  85731 Min:      1 Act:   53 Avg:  234 Max:    9649
T: 6 (17760) P: 0 I:4000 C:  75321 Min:      2 Act:   53 Avg:  139 Max:    7351
T: 7 (17761) P: 0 I:4500 C:  66929 Min:      3 Act:   51 Avg:  191 Max:    6469
                                                               ^^^ hmm
Those Avg: numbers follow the custom slice.

homer:/root # schedtool -v -s 500000 -e cyclictest -Smq
PID 29755: PRIO   0, POLICY N: SCHED_NORMAL  , NICE   0, EEVDF_NICE   0, EEVDF_SLICE 500000, AFFINITY 0xff
# /dev/cpu_dma_latency set to 0us
T: 0 (29756) P: 0 I:1000 C: 352348 Min:      1 Act:   51 Avg:   67 Max:   10259
T: 1 (29757) P: 0 I:1500 C: 229618 Min:      1 Act:   59 Avg:  121 Max:    8439
T: 2 (29758) P: 0 I:2000 C: 176031 Min:      1 Act:   54 Avg:  159 Max:    8839
T: 3 (29759) P: 0 I:2500 C: 139346 Min:      1 Act:   52 Avg:  186 Max:    9498
T: 4 (29760) P: 0 I:3000 C: 117779 Min:      2 Act:   54 Avg:  172 Max:    8862
T: 5 (29761) P: 0 I:3500 C: 101272 Min:      1 Act:   54 Avg:  180 Max:    9331
T: 6 (29762) P: 0 I:4000 C:  88781 Min:      3 Act:   54 Avg:  208 Max:    7111
T: 7 (29763) P: 0 I:4500 C:  78986 Min:      1 Act:  143 Avg:  168 Max:    6677

homer:/root # cyclictest -Smq
# /dev/cpu_dma_latency set to 0us
T: 0 (29747) P: 0 I:1000 C: 354262 Min:      2 Act:   51 Avg:   65 Max:    9754
T: 1 (29748) P: 0 I:1500 C: 236885 Min:      1 Act:   43 Avg:   56 Max:    8434
T: 2 (29749) P: 0 I:2000 C: 177461 Min:      3 Act:   53 Avg:   75 Max:    9028
T: 3 (29750) P: 0 I:2500 C: 142315 Min:      2 Act:   53 Avg:   74 Max:    7654
T: 4 (29751) P: 0 I:3000 C: 118642 Min:      3 Act:   51 Avg:   78 Max:    8169
T: 5 (29752) P: 0 I:3500 C: 101833 Min:      3 Act:   52 Avg:   75 Max:    7790
T: 6 (29753) P: 0 I:4000 C:  89065 Min:      3 Act:   52 Avg:   76 Max:    8001
T: 7 (29754) P: 0 I:4500 C:  79323 Min:      3 Act:   54 Avg:   78 Max:    7703



* Re: [PATCH 0/2] sched/eevdf: sched_attr::sched_runtime slice hint
  2023-09-18  3:43   ` Mike Galbraith
@ 2023-09-19 21:08     ` Qais Yousef
  2023-09-20  4:02       ` Mike Galbraith
  0 siblings, 1 reply; 14+ messages in thread
From: Qais Yousef @ 2023-09-19 21:08 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: peterz, mingo, linux-kernel, vincent.guittot, juri.lelli,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, corbet,
	chris.hyser, patrick.bellasi, pjt, pavel, qperret, tim.c.chen,
	joshdon, timj, kprateek.nayak, yu.c.chen, youssefesmat, joel,
	tglx, daniel.m.jordan

On 09/18/23 05:43, Mike Galbraith wrote:
> On Sat, 2023-09-16 at 22:33 +0100, Qais Yousef wrote:
> >
> > Example of conflicting requirements that come across frequently:
> >
> >         1. Improve wake up latency for SCHED_OTHER. Many tasks
> >            end up using SCHED_FIFO/SCHED_RR to compensate for this
> >            shortcoming. RT tasks lack power management and fairness and
> >            can be hard and error prone to use correctly and portably.
> 
> This bit appears to be dealt with about as nicely as it can be in a
> fair class by the latency nice patch set, and deals with both
> individual tasks and groups thereof, ie has cgroups support.

AFAIU the latency_nice is no longer going forward. But I could be mistaken.

> Its trade of slice for latency fits EEVDF nicely IMHO.  As its name
> implies, the trade agreement language is relative niceness, which I
> find more appropriate than time units, use of which would put the deal
> squarely into the realm of RT and thus has no place in a fair class.

Nice (or latency nice) gives a global indication that can make sense within the
specific context it was tested in. Like RT priorities.

An abstract notion is fine if you have a better suggestion, but being globally
relative is a problem IMO. The intended consumers are application writers, who
have no prior knowledge about the system they'll be running on. I think that
was the main point against latency_nice, IIUC.

> I don't yet know how effective it is.  I dinged up schedtool to play
> with both it and $subject, but have yet to target any pet piglets or
> measure the impact of the shiny new lipstick cannon.
> 
> >         2. Prefer spreading vs prefer packing on wake up for a group of
> >            tasks. Geekbench-like workloads would benefit from
> >            parallelising on different CPUs. hackbench type of workloads
> >            can benefit from waking up on the same CPUs or a CPU that is
> >            closer in the cache hierarchy.
> >
> >         3. Nice values for SCHED_OTHER are system wide and require
> >            privileges. Many workloads would like a way to set relative
> >            nice value so they can preempt each other, but not
> >            impact or be impacted by other tasks belonging to different
> >            workloads on the system.
> >
> >         4. Provide a way to tag some tasks as 'background' to keep them
> >            out of the way. SCHED_IDLE is too strong for some of these
> >            tasks, yet they can be computationally heavy. Example
> >            tasks are garbage collectors. Their work is both important
> >            and not important.
> 
> All three of those make my eyebrows twitch mightily even in their not
> well defined form: any notion of applying badges to identify groups of
> tasks would constitute creation of yet another cgroups.

cgroups require root privilege. And it is intended for sysadmins to split
system resources between apps. It doesn't help an app to describe the
relationship between its tasks. Nor any requirements for them to do their job
properly. But rather impose something on them regardless of what they want.


Cheers

--
Qais Yousef


* Re: [PATCH 2/2] sched/eevdf: Use sched_attr::sched_runtime to set request/slice suggestion
  2023-09-15 12:43 ` [PATCH 2/2] sched/eevdf: Use sched_attr::sched_runtime to set request/slice suggestion peterz
  2023-09-19  7:53   ` Mike Galbraith
@ 2023-09-19 22:07   ` Qais Yousef
  2023-09-19 22:37     ` Peter Zijlstra
  1 sibling, 1 reply; 14+ messages in thread
From: Qais Yousef @ 2023-09-19 22:07 UTC (permalink / raw)
  To: peterz
  Cc: mingo, linux-kernel, vincent.guittot, juri.lelli,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, corbet,
	chris.hyser, patrick.bellasi, pjt, pavel, qperret, tim.c.chen,
	joshdon, timj, kprateek.nayak, yu.c.chen, youssefesmat, joel,
	efault, tglx, daniel.m.jordan

On 09/15/23 14:43, peterz@infradead.org wrote:
> Allow applications to directly set a suggested request/slice length using
> sched_attr::sched_runtime.

I'm probably as eternally confused as ever, but is this going to be the latency
hint too? I find it hard to correlate runtime to latency if it is.

> 
> The implementation clamps the value to: 0.1[ms] <= slice <= 100[ms]
> which is 1/10 the size of HZ=1000 and 10 times the size of HZ=100.
> 
> Applications should strive to use their periodic runtime at a high
> confidence interval (95%+) as the target slice. Using a smaller slice
> will introduce undue preemptions, while using a larger value will
> increase latency.

I can see this being hard to use in practice. There's a portability issue in
defining a runtime that is universal for all systems. The same workload will run
faster on some systems, and slower on others. Applications can't just pick
a value; they must do some training to discover the right value for a particular
system. Add to that the weird impact HMP and DVFS can have on runtime from
wakeup to wakeup, and things get harder; particularly with shared DVFS policies,
where a task can suddenly find itself taking half the runtime because a busy
task on another CPU doubled its speed.

(slice is not invariant, right?)

And a 95%+ confidence will be hard. A task might not know for sure what it will
do all the time beforehand. There could be a strong correlation for a short
period of time, but the interactive nature of a lot of workloads makes this
hard to predict with such high confidence. And those transition events
are usually what the scheduler struggles to handle well. All history is
suddenly erased, and rebuilding it takes time; during which things get messy.

Binder tasks for example can be latency sensitive, but they're not periodic and
will be run on demand when someone asks for something. They're usually short
lived.

Actually, so far in Android we just had the notion of something being sensitive
to wake up latency without the need to be specific about it. And if a set of
tasks got stuck on the same rq, they had better run first as much as possible. We
did find the need to implement something in the load balancer to spread, as
oversubscription issues are unavoidable. I think the misfit path is the best way
to handle this and I'd be happy to send patches to this effect once we land some
interface.

Of course you might find variations of this from different vendors with their
own SDKs for developers.

How do you see the proposed interface fitting into this picture? I can't see how
to use it, but maybe I didn't understand it. Assuming, of course, this is indeed
about latency :-)


Thanks!

--
Qais Yousef


* Re: [PATCH 2/2] sched/eevdf: Use sched_attr::sched_runtime to set request/slice suggestion
  2023-09-19 22:07   ` Qais Yousef
@ 2023-09-19 22:37     ` Peter Zijlstra
  2023-09-20  0:45       ` Qais Yousef
  2023-12-10 22:47       ` Qais Yousef
  0 siblings, 2 replies; 14+ messages in thread
From: Peter Zijlstra @ 2023-09-19 22:37 UTC (permalink / raw)
  To: Qais Yousef
  Cc: mingo, linux-kernel, vincent.guittot, juri.lelli,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, corbet,
	chris.hyser, patrick.bellasi, pjt, pavel, qperret, tim.c.chen,
	joshdon, timj, kprateek.nayak, yu.c.chen, youssefesmat, joel,
	efault, tglx, daniel.m.jordan

On Tue, Sep 19, 2023 at 11:07:08PM +0100, Qais Yousef wrote:
> On 09/15/23 14:43, peterz@infradead.org wrote:
> > Allow applications to directly set a suggested request/slice length using
> > sched_attr::sched_runtime.
> 
> I'm probably as eternally confused as ever, but is this going to be the latency
> hint too? I find it hard to correlate runtime to latency if it is.

Yes. Think of it as if a task has to save up for its slice. A shorter
slice means a shorter time to save up, which means it can run
sooner. A longer slice means you get to save up longer.

Some people really want longer slices to reduce cache thrashing or
held-lock-preemption like things. Oracle, Facebook, or virt thingies.

Other people just want very short activations but want them quickly.
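
In terms of the patch's own formula, vd_i = ve_i + r_i/w_i: a sketch with two
eligible tasks of equal weight w, both sitting at ve_i = V, one asking for a
0.1ms slice and the other for 10ms:

  vd_short = V + 0.1[ms]/w   -> earlier virtual deadline; picked first, but
                                preempted after at most ~0.1ms
  vd_long  = V + 10[ms]/w    -> picked later, but runs longer once it gets in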


* Re: [PATCH 2/2] sched/eevdf: Use sched_attr::sched_runtime to set request/slice suggestion
  2023-09-19 22:37     ` Peter Zijlstra
@ 2023-09-20  0:45       ` Qais Yousef
  2023-12-10 22:47       ` Qais Yousef
  1 sibling, 0 replies; 14+ messages in thread
From: Qais Yousef @ 2023-09-20  0:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, vincent.guittot, juri.lelli,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, corbet,
	chris.hyser, patrick.bellasi, pjt, pavel, qperret, tim.c.chen,
	joshdon, timj, kprateek.nayak, yu.c.chen, youssefesmat, joel,
	efault, tglx, daniel.m.jordan

On 09/20/23 00:37, Peter Zijlstra wrote:
> On Tue, Sep 19, 2023 at 11:07:08PM +0100, Qais Yousef wrote:
> > On 09/15/23 14:43, peterz@infradead.org wrote:
> > > Allow applications to directly set a suggested request/slice length using
> > > sched_attr::sched_runtime.
> > 
> > I'm probably as eternally confused as ever, but is this going to be the latency
> > hint too? I find it hard to correlate runtime to latency if it is.
> 
> Yes. Think of it as if a task has to save up for its slice. A shorter
> slice means a shorter time to save up, which means it can run
> sooner. A longer slice means you get to save up longer.

Okay, so bias toward latency (short runtime) or throughput (long runtime).

I revisited the paper and can appreciate the importance of the term 'request'
in here.

Is the 95%+ confidence part really mandatory? I can easily see runtime swings
between 2-4ms over a trace for example. Should this task request 4ms as runtime
then? If we request 2ms but we end up needing 4ms, IIUC we'll be preempted
after 2ms as that's what we requested, right?

What is the penalty for lying if we request 4ms but end up needing 2ms?

> Some people really want longer slices to reduce cache thrashing or
> held-lock-preemption like things. Oracle, Facebook, or virt thingies.
> 
> Other people just want very short activations but want them quickly.

Is 3-5ms in the very short region? I think that's the average I see. There are
longer, and shorter, but nothing 'crazy' long.

If we have a bunch of very short tasks stuck on the same rq, IIUC the ones that
actually requested the shorter slice should win, as the others will still have
sysctl_sched_base_slice as their request, hence their deadline will seem further
away in spite of not consuming their full slice. And somehow lag will sort
itself out to ensure fairness if there were too many wake ups of short-request
tasks (latency wakeup storm).

With this interface it'd be sort of compulsory for users to keep their latency
sensitive tasks short, which maybe is a good thing. The question is how short
do they have to be. Is there a number that can be exposed or deduced/calculated
to help advise users what to stay within?

Silly question, do you think this interface is transferable if we move away
from EEVDF in the future for whatever reason? I feel I have to reason about how
EEVDF works to use it, which probably was my first stumbling point as I was
thinking in a more detached/abstract manner.

Sorry, too many questions..


Thanks!

--
Qais Yousef


* Re: [PATCH 0/2] sched/eevdf: sched_attr::sched_runtime slice hint
  2023-09-19 21:08     ` Qais Yousef
@ 2023-09-20  4:02       ` Mike Galbraith
  0 siblings, 0 replies; 14+ messages in thread
From: Mike Galbraith @ 2023-09-20  4:02 UTC (permalink / raw)
  To: Qais Yousef
  Cc: peterz, mingo, linux-kernel, vincent.guittot, juri.lelli,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, corbet,
	chris.hyser, patrick.bellasi, pjt, pavel, qperret, tim.c.chen,
	joshdon, timj, kprateek.nayak, yu.c.chen, youssefesmat, joel,
	tglx, daniel.m.jordan

On Tue, 2023-09-19 at 22:08 +0100, Qais Yousef wrote:
> On 09/18/23 05:43, Mike Galbraith wrote:
> > On Sat, 2023-09-16 at 22:33 +0100, Qais Yousef wrote:
> > >
> > > Example of conflicting requirements that come across frequently:
> > >
> > >         1. Improve wake up latency for SCHED_OTHER. Many tasks
> > >            end up using SCHED_FIFO/SCHED_RR to compensate for this
> > >            shortcoming. RT tasks lack power management and fairness and
> > >            can be hard and error prone to use correctly and portably.
> >
> > This bit appears to be dealt with about as nicely as it can be in a
> > fair class by the latency nice patch set, and deals with both
> > individual tasks and groups thereof, ie has cgroups support.
>
> AFAIU the latency_nice is no longer going forward. But I could be mistaken.

Effectively it is, both making the same request under the hood, the
difference being trade negotiation idiom.

I took both to try out for no particularly good reason.  The only thing
silly looking in the result is one clipping at OMG, the other at OMFG.

> > All three of those make my eyebrows twitch mightily even in their not
> > well defined form: any notion of applying badges to identify groups of
> > tasks would constitute creation of yet another cgroups.
>
> cgroups require root privilege. And it is intended for sysadmins to split
> system resources between apps. It doesn't help an app to describe the
> relationship between its tasks. Nor any requirements for them to do their job
> properly. But rather impose something on them regardless of what they want.

The whys and wherefores are clear.  I suspect that addition of another
task group interface with conflicting scheduling parameters, policies,
hopes and/or prayers to be dealt with at each and every level of the
existing hierarchy is going to be hard to sell, but who knows, maybe
that skeleton looks more attractive to maintainers than it does to me.
I suppose we'll find out once you hang some meat on it.

	-Mike


* Re: [PATCH 2/2] sched/eevdf: Use sched_attr::sched_runtime to set request/slice suggestion
  2023-09-19  7:53   ` Mike Galbraith
@ 2023-09-24  9:21     ` Mike Galbraith
  0 siblings, 0 replies; 14+ messages in thread
From: Mike Galbraith @ 2023-09-24  9:21 UTC (permalink / raw)
  To: peterz, mingo
  Cc: linux-kernel, vincent.guittot, juri.lelli, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, corbet, qyousef, chris.hyser,
	patrick.bellasi, pjt, pavel, qperret, tim.c.chen, joshdon, timj,
	kprateek.nayak, yu.c.chen, youssefesmat, joel, tglx,
	daniel.m.jordan

On Tue, 2023-09-19 at 09:53 +0200, Mike Galbraith wrote:
> On Fri, 2023-09-15 at 14:43 +0200, peterz@infradead.org wrote:
> > Allow applications to directly set a suggested request/slice length using
> > sched_attr::sched_runtime.
>
> I met an oddity while fiddling to see what a custom slice would do for
> cyclictest, it seeming to be a reasonable target...

For the record, that cyclictest oddity was the mixed slice handling
improvement patch from the latency-nice branch not doing so well at the
ultralight end of the spectrum.  Turning the feature off eliminated it.

Some numbers for the terminally bored below.

	-Mike

5 minutes of repeatable mixed load, Blender 1920x1080@24 YouTube clip
(no annoying ads) vs massive_intr (4x88% hogs) with cyclictest -D 300
doing the profile duration timing in dinky/cute rpi4.  Filenames
describe slice and feature settings, fudge in filename means feature
was turned on.

perf sched lat -i perf.data.full.6.6.0-rc2-v8.stock --sort=runtime -S 15 -T

 ----------------------------------------------------------------------------------------------------------
  Task                  |   Runtime ms  | Switches | Avg delay ms    | Max delay ms    | Sum delay ms     |
 ----------------------------------------------------------------------------------------------------------
  massive_intr:(5)      | 816065.611 ms |   917839 | avg:   0.255 ms | max:  42.579 ms | sum:234075.309 ms |
  chromium-browse:(16)  |  59428.589 ms |   138489 | avg:   0.507 ms | max:  28.546 ms | sum:70262.946 ms |
  ThreadPoolForeg:(49)  |  32962.888 ms |    57147 | avg:   0.916 ms | max:  35.043 ms | sum:52354.576 ms |
  mutter:1352           |  24910.265 ms |    52058 | avg:   0.556 ms | max:  21.166 ms | sum:28945.039 ms |
  Chrome_ChildIOT:(14)  |  23785.517 ms |   132307 | avg:   0.345 ms | max:  30.987 ms | sum:45657.621 ms |
  VizCompositorTh:30768 |  14985.421 ms |    64769 | avg:   0.432 ms | max:  24.620 ms | sum:27981.948 ms |
  Xorg:925              |  14802.426 ms |    67407 | avg:   0.339 ms | max:  23.912 ms | sum:22844.860 ms |
  alsa-sink-MAI P:1260  |  13958.874 ms |    33127 | avg:   0.023 ms | max:  15.023 ms | sum:  756.454 ms |
  cyclictest:(5)        |  13345.073 ms |   715171 | avg:   0.271 ms | max:  19.277 ms | sum:193861.885 ms |
  Media:30834           |  12627.366 ms |    64061 | avg:   0.339 ms | max:  29.811 ms | sum:21687.561 ms |
  ThreadPoolSingl:(6)   |   9254.163 ms |    43524 | avg:   0.405 ms | max:  21.251 ms | sum:17617.750 ms |
  V4L2DecoderThre:30887 |   9251.235 ms |    63002 | avg:   0.302 ms | max:  16.819 ms | sum:19000.280 ms |
  VideoFrameCompo:30836 |   7947.943 ms |    47459 | avg:   0.300 ms | max:  19.666 ms | sum:14254.378 ms |
  pulseaudio:1172       |   6638.018 ms |    43467 | avg:   0.219 ms | max:  14.621 ms | sum: 9535.951 ms |
  threaded-ml:30883     |   5358.744 ms |    29193 | avg:   0.349 ms | max:  12.893 ms | sum:10175.289 ms |
 ----------------------------------------------------------------------------------------------------------
  TOTAL:                |1109784.680 ms |  3029507 |                 |       42.579 ms |    845307.017 ms |
 ----------------------------------------------------------------------------------------------------------
  INFO: 0.001% context switch bugs (13 out of 1909170)

perf sched lat -i perf.data.full.6.6.0-rc2-v8.massive_intr-100ms-slice --sort=runtime -S 15 -T

 ----------------------------------------------------------------------------------------------------------
  Task                  |   Runtime ms  | Switches | Avg delay ms    | Max delay ms    | Sum delay ms     |
 ----------------------------------------------------------------------------------------------------------
  massive_intr:(5)      | 861639.948 ms |   812113 | avg:   0.277 ms | max: 138.492 ms | sum:224707.956 ms |
  chromium-browse:(8)   |  51307.141 ms |   125222 | avg:   0.530 ms | max:  42.935 ms | sum:66314.290 ms |
  ThreadPoolForeg:(16)  |  26421.979 ms |    45000 | avg:   0.873 ms | max:  35.752 ms | sum:39306.544 ms |
  Chrome_ChildIOT:(5)   |  22542.183 ms |   131172 | avg:   0.336 ms | max:  51.751 ms | sum:44091.954 ms |
  mutter:1352           |  21835.334 ms |    48508 | avg:   0.566 ms | max:  37.495 ms | sum:27446.519 ms |
  VizCompositorTh:39048 |  14531.018 ms |    63787 | avg:   0.463 ms | max:  56.326 ms | sum:29522.218 ms |
  Xorg:925              |  14497.447 ms |    67315 | avg:   0.397 ms | max:  36.714 ms | sum:26735.175 ms |
  alsa-sink-MAI P:1260  |  13935.472 ms |    33111 | avg:   0.020 ms | max:   6.753 ms | sum:  677.888 ms |
  cyclictest:(5)        |  12696.835 ms |   653111 | avg:   0.425 ms | max:  38.092 ms | sum:277440.622 ms |
  Media:39089           |  12571.118 ms |    67187 | avg:   0.335 ms | max:  26.660 ms | sum:22498.438 ms |
  V4L2DecoderThre:39125 |   9156.299 ms |    66378 | avg:   0.301 ms | max:  23.828 ms | sum:19991.504 ms |
  ThreadPoolSingl:(4)   |   9079.291 ms |    46187 | avg:   0.377 ms | max:  28.850 ms | sum:17422.535 ms |
  VideoFrameCompo:39091 |   8103.756 ms |    50025 | avg:   0.290 ms | max:  33.230 ms | sum:14518.688 ms |
  pulseaudio:1172       |   6575.897 ms |    44952 | avg:   0.259 ms | max:  19.937 ms | sum:11630.128 ms |
  threaded-ml:39123     |   5367.921 ms |    29503 | avg:   0.339 ms | max:  24.648 ms | sum: 9993.313 ms |
 ----------------------------------------------------------------------------------------------------------
  TOTAL:                |1127978.646 ms |  2802143 |                 |      138.492 ms |    920177.765 ms |
 ----------------------------------------------------------------------------------------------------------
  INFO: 0.000% context switch bugs (8 out of 1773282)

perf sched lat -i perf.data.full.6.6.0-rc2-v8.massive_intr-100ms-slice-fudge --sort=runtime -S 15 -T

 ----------------------------------------------------------------------------------------------------------
  Task                  |   Runtime ms  | Switches | Avg delay ms    | Max delay ms    | Sum delay ms     |
 ----------------------------------------------------------------------------------------------------------
  massive_intr:(5)      | 855426.748 ms |   777984 | avg:   0.266 ms | max: 111.756 ms | sum:206723.857 ms |
  chromium-browse:(8)   |  51656.331 ms |   122631 | avg:   0.531 ms | max:  36.305 ms | sum:65094.280 ms |
  ThreadPoolForeg:(16)  |  27473.845 ms |    43053 | avg:   0.972 ms | max:  34.973 ms | sum:41833.237 ms |
  mutter:1352           |  21412.313 ms |    47476 | avg:   0.541 ms | max:  33.553 ms | sum:25685.892 ms |
  Chrome_ChildIOT:(5)   |  20283.623 ms |   119424 | avg:   0.395 ms | max:  31.266 ms | sum:47164.449 ms |
  VizCompositorTh:36026 |  14643.428 ms |    63979 | avg:   0.464 ms | max:  34.832 ms | sum:29708.794 ms |
  Xorg:925              |  14296.586 ms |    67756 | avg:   0.410 ms | max:  23.774 ms | sum:27811.107 ms |
  alsa-sink-MAI P:1260  |  13977.823 ms |    33116 | avg:   0.023 ms | max:   5.513 ms | sum:  750.004 ms |
  cyclictest:(5)        |  12365.030 ms |   645151 | avg:   0.475 ms | max:  35.084 ms | sum:306236.764 ms |
  Media:36076           |  12256.872 ms |    60848 | avg:   0.378 ms | max:  26.714 ms | sum:22978.110 ms |
  ThreadPoolSingl:(4)   |   8983.939 ms |    43538 | avg:   0.401 ms | max:  21.137 ms | sum:17468.417 ms |
  V4L2DecoderThre:36101 |   8910.124 ms |    58505 | avg:   0.316 ms | max:  22.654 ms | sum:18503.766 ms |
  VideoFrameCompo:36081 |   7851.251 ms |    44655 | avg:   0.325 ms | max:  24.662 ms | sum:14532.619 ms |
  pulseaudio:1172       |   6671.226 ms |    44172 | avg:   0.332 ms | max:  19.466 ms | sum:14673.304 ms |
  threaded-ml:36099     |   5338.302 ms |    28879 | avg:   0.379 ms | max:  20.754 ms | sum:10944.051 ms |
 ----------------------------------------------------------------------------------------------------------
  TOTAL:                |1115349.047 ms |  2703902 |                 |      111.756 ms |    959507.323 ms |
 ----------------------------------------------------------------------------------------------------------
  INFO: 0.000% context switch bugs (4 out of 1763882)

perf sched lat -i perf.data.full.6.6.0-rc2-v8.cyclictest-500us-slice --sort=runtime -S 15 -T

 ----------------------------------------------------------------------------------------------------------
  Task                  |   Runtime ms  | Switches | Avg delay ms    | Max delay ms    | Sum delay ms     |
 ----------------------------------------------------------------------------------------------------------
  massive_intr:(5)      | 847672.515 ms |   951686 | avg:   0.233 ms | max:  30.660 ms | sum:222000.736 ms |
  chromium-browse:(8)   |  53628.292 ms |   133414 | avg:   0.456 ms | max:  30.414 ms | sum:60792.606 ms |
  ThreadPoolForeg:(17)  |  26765.405 ms |    43894 | avg:   0.869 ms | max:  43.268 ms | sum:38131.378 ms |
  mutter:1352           |  22161.746 ms |    49865 | avg:   0.514 ms | max:  24.773 ms | sum:25639.654 ms |
  Chrome_ChildIOT:(7)   |  20878.044 ms |   122989 | avg:   0.315 ms | max:  44.034 ms | sum:38735.550 ms |
  Xorg:925              |  14647.685 ms |    66766 | avg:   0.326 ms | max:  16.256 ms | sum:21785.616 ms |
  VizCompositorTh:34571 |  14304.582 ms |    64222 | avg:   0.401 ms | max:  23.334 ms | sum:25767.033 ms |
  alsa-sink-MAI P:1260  |  14006.042 ms |    33136 | avg:   0.022 ms | max:   4.851 ms | sum:  724.564 ms |
  cyclictest:(5)        |  13404.228 ms |   731243 | avg:   0.217 ms | max:  28.997 ms | sum:158740.145 ms |
  Media:34626           |  12790.442 ms |    63825 | avg:   0.327 ms | max:  30.721 ms | sum:20853.022 ms |
  V4L2DecoderThre:34651 |   9278.538 ms |    62619 | avg:   0.284 ms | max:  18.678 ms | sum:17761.070 ms |
  ThreadPoolSingl:(4)   |   9226.802 ms |    42846 | avg:   0.389 ms | max:  18.563 ms | sum:16684.938 ms |
  VideoFrameCompo:34627 |   7788.047 ms |    46681 | avg:   0.285 ms | max:  14.102 ms | sum:13327.258 ms |
  pulseaudio:1172       |   6643.612 ms |    42393 | avg:   0.186 ms | max:   7.567 ms | sum: 7873.616 ms |
  threaded-ml:34649     |   5402.911 ms |    28737 | avg:   0.313 ms | max:  13.276 ms | sum: 9007.151 ms |
 ----------------------------------------------------------------------------------------------------------
  TOTAL:                |1114449.900 ms |  3009626 |                 |       44.034 ms |    740410.836 ms |
 ----------------------------------------------------------------------------------------------------------
  INFO: 0.000% context switch bugs (8 out of 1873597)

perf sched lat -i perf.data.full.6.6.0-rc2-v8.cyclictest-500us-slice-fudge --sort=runtime -S 15 -T

 ----------------------------------------------------------------------------------------------------------
  Task                  |   Runtime ms  | Switches | Avg delay ms    | Max delay ms    | Sum delay ms     |
 ----------------------------------------------------------------------------------------------------------
  massive_intr:(5)      | 838981.090 ms |   880877 | avg:   0.262 ms | max:  28.559 ms | sum:230982.334 ms |
  chromium-browse:(9)   |  55497.942 ms |   128916 | avg:   0.520 ms | max:  39.331 ms | sum:67042.536 ms |
  ThreadPoolForeg:(20)  |  27669.981 ms |    41272 | avg:   0.990 ms | max:  29.775 ms | sum:40857.370 ms |
  mutter:1352           |  22455.332 ms |    47362 | avg:   0.562 ms | max:  19.838 ms | sum:26611.559 ms |
  Chrome_ChildIOT:(7)   |  22295.707 ms |   123559 | avg:   0.344 ms | max:  29.449 ms | sum:42479.505 ms |
  Xorg:925              |  14894.399 ms |    65930 | avg:   0.353 ms | max:  17.650 ms | sum:23303.557 ms |
  VizCompositorTh:37170 |  14567.478 ms |    62477 | avg:   0.438 ms | max:  25.008 ms | sum:27366.073 ms |
  alsa-sink-MAI P:1260  |  14207.866 ms |    33134 | avg:   0.022 ms | max:   4.092 ms | sum:  744.586 ms |
  cyclictest:(5)        |  13483.280 ms |   697504 | avg:   0.375 ms | max:  20.859 ms | sum:261834.795 ms |
  Media:37224           |  12890.016 ms |    62641 | avg:   0.343 ms | max:  14.333 ms | sum:21510.222 ms |
  ThreadPoolSingl:(4)   |   9095.635 ms |    42121 | avg:   0.383 ms | max:  17.168 ms | sum:16116.408 ms |
  V4L2DecoderThre:37261 |   9079.051 ms |    63220 | avg:   0.291 ms | max:  18.144 ms | sum:18421.175 ms |
  VideoFrameCompo:37226 |   8049.344 ms |    46145 | avg:   0.309 ms | max:  18.575 ms | sum:14252.577 ms |
  pulseaudio:1172       |   6767.263 ms |    42736 | avg:   0.205 ms | max:   8.202 ms | sum: 8781.668 ms |
  threaded-ml:37262     |   5404.490 ms |    28428 | avg:   0.332 ms | max:  17.758 ms | sum: 9424.566 ms |
 ----------------------------------------------------------------------------------------------------------
  TOTAL:                |1115092.327 ms |  2896377 |                 |       39.331 ms |    881443.337 ms |
 ----------------------------------------------------------------------------------------------------------
  INFO: 0.001% context switch bugs (12 out of 1845612)


Peeks out window at lovely sunny Sunday, and <poof>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [tip: sched/urgent] sched/eevdf: Also update slice on placement
  2023-09-15 12:43 ` [PATCH 1/2] sched/eevdf: Also update slice on placement peterz
@ 2023-10-03 10:42   ` tip-bot2 for Peter Zijlstra
  0 siblings, 0 replies; 14+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2023-10-03 10:42 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     2f2fc17bab0011430ceb6f2dc1959e7d1f981444
Gitweb:        https://git.kernel.org/tip/2f2fc17bab0011430ceb6f2dc1959e7d1f981444
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Fri, 15 Sep 2023 00:48:55 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 03 Oct 2023 12:32:29 +02:00

sched/eevdf: Also update slice on placement

Tasks that never consume their full slice would not update their slice value.
This means that tasks that are spawned before the sysctl scaling keep their
original (UP) slice length.

Fixes: 147f3efaa241 ("sched/fair: Implement an EEVDF-like scheduling policy")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230915124822.847197830@noisy.programming.kicks-ass.net
---
 kernel/sched/fair.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cb22592..7d73652 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4919,10 +4919,12 @@ static inline void update_misfit_status(struct task_struct *p, struct rq *rq) {}
 static void
 place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 {
-	u64 vslice = calc_delta_fair(se->slice, se);
-	u64 vruntime = avg_vruntime(cfs_rq);
+	u64 vslice, vruntime = avg_vruntime(cfs_rq);
 	s64 lag = 0;
 
+	se->slice = sysctl_sched_base_slice;
+	vslice = calc_delta_fair(se->slice, se);
+
 	/*
 	 * Due to how V is constructed as the weighted average of entities,
 	 * adding tasks with positive lag, or removing tasks with negative lag

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] sched/eevdf: Use sched_attr::sched_runtime to set request/slice suggestion
  2023-09-19 22:37     ` Peter Zijlstra
  2023-09-20  0:45       ` Qais Yousef
@ 2023-12-10 22:47       ` Qais Yousef
  1 sibling, 0 replies; 14+ messages in thread
From: Qais Yousef @ 2023-12-10 22:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, vincent.guittot, juri.lelli,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, corbet,
	chris.hyser, patrick.bellasi, pjt, pavel, qperret, tim.c.chen,
	joshdon, timj, kprateek.nayak, yu.c.chen, youssefesmat, joel,
	efault, tglx, daniel.m.jordan

On 09/20/23 00:37, Peter Zijlstra wrote:
> On Tue, Sep 19, 2023 at 11:07:08PM +0100, Qais Yousef wrote:
> > On 09/15/23 14:43, peterz@infradead.org wrote:
> > > Allow applications to directly set a suggested request/slice length using
> > > sched_attr::sched_runtime.
> > 
> > I'm probably as eternally confused as ever, but is this going to be the latency
> > hint too? I find it hard to correlate runtime to latency if it is.
> 
> Yes. Think of it as if a task has to save up for its slice. A shorter
> slice means a shorter time to save up, so it can run sooner. A longer
> slice means you have to save up longer.
> 
> Some people really want longer slices to reduce cache thrashing or
> held-lock-preemption-like things. Oracle, Facebook, or virt thingies.
> 
> Other people just want very short activations but want them quickly.
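
(For concreteness, here is a minimal userspace sketch of the interface being
discussed -- not taken from the series, just an illustration.  It assumes the
series is applied so that sched_attr::sched_runtime is honoured as a slice
hint for SCHED_OTHER tasks, and it uses the raw syscall plus a local copy of
the UAPI struct layout, since glibc has no sched_setattr() wrapper.)

  #define _GNU_SOURCE
  #include <sched.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>
  #include <sys/syscall.h>

  /* Local mirror of the UAPI struct sched_attr taken by sched_setattr(2). */
  struct sched_attr {
          uint32_t size;
          uint32_t sched_policy;
          uint64_t sched_flags;
          int32_t  sched_nice;
          uint32_t sched_priority;
          uint64_t sched_runtime;         /* slice / request hint, in ns */
          uint64_t sched_deadline;
          uint64_t sched_period;
  };

  /* Suggest a slice length for the calling task (pid 0 == self). */
  static int set_slice_hint(uint64_t slice_ns)
  {
          struct sched_attr attr;

          memset(&attr, 0, sizeof(attr));
          attr.size          = sizeof(attr);
          attr.sched_policy  = SCHED_OTHER;       /* stay a fair task */
          attr.sched_runtime = slice_ns;

          return syscall(SYS_sched_setattr, 0, &attr, 0);
  }

  int main(void)
  {
          if (set_slice_hint(500 * 1000))         /* 500us, as in the runs above */
                  perror("sched_setattr");
          return 0;
  }

So a latency-sensitive task would pass a small value, much like the hacked
chrt does for cyclictest in the numbers above, while a throughput-oriented
task would pass a large one.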

I did check with several folks around here in the Android world, and none of us
can see how we could use this interface in practice.

It is helpful for those who have a specific system and workload that they want
to tune together. But as a generic app-developer interface it will be
impossible to use.

Is that sched-qos thingy worth pursuing as an alternative for app developers?
I think from their perspective all they can practically say is whether they
care about running ASAP or not; so a boolean flag expressing the desire for
short wakeup latency, roughly as sketched below. How to implement that would
be my pain. But do you see an issue in principle with going down that route
and seeing how far I (or we, if anyone else is interested) can get?
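
(Purely to illustrate the shape of that knob -- the flag below is invented for
this email and exists nowhere, so a real kernel would reject it.  It reuses
the struct sched_attr, headers, and syscall from the sketch further up.)

  /* HYPOTHETICAL: name and bit value are made up purely for illustration. */
  #define SCHED_FLAG_WANT_LOW_WAKEUP_LATENCY      (1ULL << 12)

  static int request_low_wakeup_latency(void)
  {
          struct sched_attr attr;

          memset(&attr, 0, sizeof(attr));
          attr.size         = sizeof(attr);
          attr.sched_policy = SCHED_OTHER;
          attr.sched_flags  = SCHED_FLAG_WANT_LOW_WAKEUP_LATENCY;

          return syscall(SYS_sched_setattr, 0, &attr, 0);
  }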

I think the two can co-exist, each serving a different purpose.

Or is there something about this interface that makes it usable in this manner
that I have failed to see?


Thanks!

--
Qais Yousef

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-12-10 23:20 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-15 12:43 [PATCH 0/2] sched/eevdf: sched_attr::sched_runtime slice hint peterz
2023-09-15 12:43 ` [PATCH 1/2] sched/eevdf: Also update slice on placement peterz
2023-10-03 10:42   ` [tip: sched/urgent] " tip-bot2 for Peter Zijlstra
2023-09-15 12:43 ` [PATCH 2/2] sched/eevdf: Use sched_attr::sched_runtime to set request/slice suggestion peterz
2023-09-19  7:53   ` Mike Galbraith
2023-09-24  9:21     ` Mike Galbraith
2023-09-19 22:07   ` Qais Yousef
2023-09-19 22:37     ` Peter Zijlstra
2023-09-20  0:45       ` Qais Yousef
2023-12-10 22:47       ` Qais Yousef
2023-09-16 21:33 ` [PATCH 0/2] sched/eevdf: sched_attr::sched_runtime slice hint Qais Yousef
2023-09-18  3:43   ` Mike Galbraith
2023-09-19 21:08     ` Qais Yousef
2023-09-20  4:02       ` Mike Galbraith
