* [PATCH v2 0/7]  Add latency_nice priority
@ 2022-05-12 16:35 Vincent Guittot
  2022-05-12 16:35 ` [PATCH v2 1/7] sched: Introduce latency-nice as a per-task attribute Vincent Guittot
                   ` (8 more replies)
  0 siblings, 9 replies; 23+ messages in thread
From: Vincent Guittot @ 2022-05-12 16:35 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, bristot, vschneid, linux-kernel, parth
  Cc: qais.yousef, chris.hyser, valentin.schneider, patrick.bellasi,
	David.Laight, pjt, pavel, tj, qperret, tim.c.chen, joshdon,
	Vincent Guittot

This patchset restarts the work on adding a latency nice priority to
describe the latency tolerance of cfs tasks.

The patches [1-4] have been done by Parth:
https://lore.kernel.org/lkml/20200228090755.22829-1-parth@linux.ibm.com/

I have simply rebased and moved the setting of the latency priority
outside the priority update. I have removed the Reviewed-by tags
because the patches are two years old.

The patches [5-7] use the latency nice priority to decide whether a cfs
task can preempt the currently running task. Patch 5 gives some test
results with cyclictest and hackbench to highlight the benefit of the
latency nice priority for short interactive tasks and long CPU-intensive
tasks.


Changes since v1:
- fixed typos
- moved some code into the right patch to keep bisect happy
- simplified and fixed how the weight is computed
- added sched core support (patch 7)

Parth Shah (4):
  sched: Introduce latency-nice as a per-task attribute
  sched/core: Propagate parent task's latency requirements to the child
    task
  sched: Allow sched_{get,set}attr to change latency_nice of the task
  sched/core: Add permission checks for setting the latency_nice value

Vincent Guittot (3):
  sched/fair: Take into account latency nice at wakeup
  sched/fair: Add sched group latency support
  sched/core: support latency nice with sched core

 include/linux/sched.h            |   3 +
 include/uapi/linux/sched.h       |   4 +-
 include/uapi/linux/sched/types.h |  19 ++++++
 init/init_task.c                 |   1 +
 kernel/sched/core.c              |  90 ++++++++++++++++++++++++++
 kernel/sched/debug.c             |   1 +
 kernel/sched/fair.c              | 105 ++++++++++++++++++++++++++++++-
 kernel/sched/sched.h             |  34 ++++++++++
 tools/include/uapi/linux/sched.h |   4 +-
 9 files changed, 257 insertions(+), 4 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v2 1/7] sched: Introduce latency-nice as a per-task attribute
  2022-05-12 16:35 [PATCH v2 0/7] Add latency_nice priority Vincent Guittot
@ 2022-05-12 16:35 ` Vincent Guittot
  2022-05-12 16:35 ` [PATCH v2 2/7] sched/core: Propagate parent task's latency requirements to the child task Vincent Guittot
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Vincent Guittot @ 2022-05-12 16:35 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, bristot, vschneid, linux-kernel, parth
  Cc: qais.yousef, chris.hyser, valentin.schneider, patrick.bellasi,
	David.Laight, pjt, pavel, tj, qperret, tim.c.chen, joshdon,
	Vincent Guittot

From: Parth Shah <parth@linux.ibm.com>

Latency-nice indicates the latency requirements of a task with respect
to the other tasks in the system. The attribute accepts values in the
range [-20, 19], both inclusive, in line with task nice values.

latency_nice = -20 indicates that the task requires the lowest latency,
as compared to tasks with latency_nice = +19.

latency_nice affects only the CFS scheduling class; the latency
requirement is provided from userspace.

Additionally, add debugging bits for the newly added latency_nice
attribute.

Signed-off-by: Parth Shah <parth@linux.ibm.com>
[rebase - minor fixes]
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 include/linux/sched.h |  1 +
 kernel/sched/debug.c  |  1 +
 kernel/sched/sched.h  | 18 ++++++++++++++++++
 3 files changed, 20 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index a27316f5f737..34c6c9c2797c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -775,6 +775,7 @@ struct task_struct {
 	int				static_prio;
 	int				normal_prio;
 	unsigned int			rt_priority;
+	int				latency_nice;
 
 	struct sched_entity		se;
 	struct sched_rt_entity		rt;
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index bb3d63bdf4ae..a3f7876217a6 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -1042,6 +1042,7 @@ void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns,
 #endif
 	P(policy);
 	P(prio);
+	P(latency_nice);
 	if (task_has_dl_policy(p)) {
 		P(dl.runtime);
 		P(dl.deadline);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 4784898e8f83..271ecd37c13d 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -117,6 +117,24 @@ extern void call_trace_sched_update_nr_running(struct rq *rq, int count);
  */
 #define NS_TO_JIFFIES(TIME)	((unsigned long)(TIME) / (NSEC_PER_SEC / HZ))
 
+/*
+ * Latency nice is meant to provide scheduler hints about the relative
+ * latency requirements of a task with respect to other tasks.
+ * Thus a task with latency_nice == 19 can be hinted as having no latency
+ * requirements, in contrast to a task with latency_nice == -20 which
+ * should be given priority in terms of lower latency.
+ */
+#define MAX_LATENCY_NICE	19
+#define MIN_LATENCY_NICE	-20
+
+#define LATENCY_NICE_WIDTH	\
+	(MAX_LATENCY_NICE - MIN_LATENCY_NICE + 1)
+
+/*
+ * Default tasks should be treated as a task with latency_nice = 0.
+ */
+#define DEFAULT_LATENCY_NICE	0
+
 /*
  * Increase resolution of nice-level calculations for 64-bit architectures.
  * The extra resolution improves shares distribution and load balancing of
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 2/7] sched/core: Propagate parent task's latency requirements to the child task
  2022-05-12 16:35 [PATCH v2 0/7] Add latency_nice priority Vincent Guittot
  2022-05-12 16:35 ` [PATCH v2 1/7] sched: Introduce latency-nice as a per-task attribute Vincent Guittot
@ 2022-05-12 16:35 ` Vincent Guittot
  2022-05-12 16:35 ` [PATCH v2 3/7] sched: Allow sched_{get,set}attr to change latency_nice of the task Vincent Guittot
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Vincent Guittot @ 2022-05-12 16:35 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, bristot, vschneid, linux-kernel, parth
  Cc: qais.yousef, chris.hyser, valentin.schneider, patrick.bellasi,
	David.Laight, pjt, pavel, tj, qperret, tim.c.chen, joshdon,
	Vincent Guittot

From: Parth Shah <parth@linux.ibm.com>

Clone the parent task's latency_nice attribute to the forked child task.

Reset the latency_nice value to the default when sched_reset_on_fork is
set for the child task.

Also, initialize init_task.latency_nice with the DEFAULT_LATENCY_NICE
value.

Signed-off-by: Parth Shah <parth@linux.ibm.com>
[rebase - minor fixes]
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 init/init_task.c    | 1 +
 kernel/sched/core.c | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/init/init_task.c b/init/init_task.c
index 73cc8f03511a..225d11a39bc9 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -78,6 +78,7 @@ struct task_struct init_task
 	.prio		= MAX_PRIO - 20,
 	.static_prio	= MAX_PRIO - 20,
 	.normal_prio	= MAX_PRIO - 20,
+	.latency_nice	= DEFAULT_LATENCY_NICE,
 	.policy		= SCHED_NORMAL,
 	.cpus_ptr	= &init_task.cpus_mask,
 	.user_cpus_ptr	= NULL,
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 07bacb050198..1f04b815b588 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4473,6 +4473,9 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
 	 */
 	p->prio = current->normal_prio;
 
+	/* Propagate the parent's latency requirements to the child as well */
+	p->latency_nice = current->latency_nice;
+
 	uclamp_fork(p);
 
 	/*
@@ -4489,6 +4492,7 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
 		p->prio = p->normal_prio = p->static_prio;
 		set_load_weight(p, false);
 
+		p->latency_nice = DEFAULT_LATENCY_NICE;
 		/*
 		 * We don't need the reset flag anymore after the fork. It has
 		 * fulfilled its duty:
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 3/7] sched: Allow sched_{get,set}attr to change latency_nice of the task
  2022-05-12 16:35 [PATCH v2 0/7] Add latency_nice priority Vincent Guittot
  2022-05-12 16:35 ` [PATCH v2 1/7] sched: Introduce latency-nice as a per-task attribute Vincent Guittot
  2022-05-12 16:35 ` [PATCH v2 2/7] sched/core: Propagate parent task's latency requirements to the child task Vincent Guittot
@ 2022-05-12 16:35 ` Vincent Guittot
  2022-05-12 16:35 ` [PATCH v2 4/7] sched/core: Add permission checks for setting the latency_nice value Vincent Guittot
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Vincent Guittot @ 2022-05-12 16:35 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, bristot, vschneid, linux-kernel, parth
  Cc: qais.yousef, chris.hyser, valentin.schneider, patrick.bellasi,
	David.Laight, pjt, pavel, tj, qperret, tim.c.chen, joshdon,
	Vincent Guittot

From: Parth Shah <parth@linux.ibm.com>

Introduce the latency_nice attribute to sched_attr and provide a
mechanism to change the value via the sched_setattr/sched_getattr
syscalls.

Also add a new flag, SCHED_FLAG_LATENCY_NICE, to signal a change of the
task's latency_nice on a sched_setattr syscall.
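
[Not part of the patch] For illustration, a minimal userspace sketch of
setting and reading the new attribute once this series is applied. The
local struct sched_attr copy and the constants mirror the uapi changes
below; treat it as an example under those assumptions, not a reference
test.

#define _GNU_SOURCE
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>

#define SCHED_FLAG_LATENCY_NICE	0x80
#define SCHED_ATTR_SIZE_VER2	60

/* local uapi mirror, must match SCHED_ATTR_SIZE_VER2 (60 bytes) */
struct sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;
	uint64_t sched_deadline;
	uint64_t sched_period;
	uint32_t sched_util_min;
	uint32_t sched_util_max;
	int32_t  sched_latency_nice;	/* new in this series */
};

int main(void)
{
	struct sched_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = SCHED_ATTR_SIZE_VER2;
	attr.sched_policy = 0;				/* SCHED_NORMAL */
	attr.sched_flags = SCHED_FLAG_LATENCY_NICE;
	attr.sched_latency_nice = -10;

	/* pid 0 means the calling thread */
	if (syscall(SYS_sched_setattr, 0, &attr, 0))
		perror("sched_setattr");

	if (syscall(SYS_sched_getattr, 0, &attr, sizeof(attr), 0))
		perror("sched_getattr");
	else
		printf("latency_nice = %d\n", attr.sched_latency_nice);

	return 0;
}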

Signed-off-by: Parth Shah <parth@linux.ibm.com>
[rebase and add a dedicated __setscheduler_latency ]
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 include/uapi/linux/sched.h       |  4 +++-
 include/uapi/linux/sched/types.h | 19 +++++++++++++++++++
 kernel/sched/core.c              | 25 +++++++++++++++++++++++++
 tools/include/uapi/linux/sched.h |  4 +++-
 4 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index 3bac0a8ceab2..b2e932c25be6 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -132,6 +132,7 @@ struct clone_args {
 #define SCHED_FLAG_KEEP_PARAMS		0x10
 #define SCHED_FLAG_UTIL_CLAMP_MIN	0x20
 #define SCHED_FLAG_UTIL_CLAMP_MAX	0x40
+#define SCHED_FLAG_LATENCY_NICE		0x80
 
 #define SCHED_FLAG_KEEP_ALL	(SCHED_FLAG_KEEP_POLICY | \
 				 SCHED_FLAG_KEEP_PARAMS)
@@ -143,6 +144,7 @@ struct clone_args {
 			 SCHED_FLAG_RECLAIM		| \
 			 SCHED_FLAG_DL_OVERRUN		| \
 			 SCHED_FLAG_KEEP_ALL		| \
-			 SCHED_FLAG_UTIL_CLAMP)
+			 SCHED_FLAG_UTIL_CLAMP		| \
+			 SCHED_FLAG_LATENCY_NICE)
 
 #endif /* _UAPI_LINUX_SCHED_H */
diff --git a/include/uapi/linux/sched/types.h b/include/uapi/linux/sched/types.h
index f2c4589d4dbf..db1e8199e8c8 100644
--- a/include/uapi/linux/sched/types.h
+++ b/include/uapi/linux/sched/types.h
@@ -10,6 +10,7 @@ struct sched_param {
 
 #define SCHED_ATTR_SIZE_VER0	48	/* sizeof first published struct */
 #define SCHED_ATTR_SIZE_VER1	56	/* add: util_{min,max} */
+#define SCHED_ATTR_SIZE_VER2	60	/* add: latency_nice */
 
 /*
  * Extended scheduling parameters data structure.
@@ -98,6 +99,22 @@ struct sched_param {
  * scheduled on a CPU with no more capacity than the specified value.
  *
  * A task utilization boundary can be reset by setting the attribute to -1.
+ *
+ * Latency Tolerance Attributes
+ * ============================
+ *
+ * A subset of sched_attr attributes allows specifying the relative latency
+ * requirements of a task with respect to the other tasks running/queued in
+ * the system.
+ *
+ * @ sched_latency_nice	task's latency_nice value
+ *
+ * The latency_nice of a task can have any value in the range
+ * [MIN_LATENCY_NICE..MAX_LATENCY_NICE].
+ *
+ * A task with latency_nice set to the MIN_LATENCY_NICE value can be
+ * taken as a task requiring lower latency, as opposed to a task with a
+ * higher latency_nice value.
  */
 struct sched_attr {
 	__u32 size;
@@ -120,6 +137,8 @@ struct sched_attr {
 	__u32 sched_util_min;
 	__u32 sched_util_max;
 
+	/* latency requirement hints */
+	__s32 sched_latency_nice;
 };
 
 #endif /* _UAPI_LINUX_SCHED_TYPES_H */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1f04b815b588..036bd9ff66e9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7200,6 +7200,15 @@ static void __setscheduler_params(struct task_struct *p,
 	p->rt_priority = attr->sched_priority;
 	p->normal_prio = normal_prio(p);
 	set_load_weight(p, true);
+
+}
+
+static void __setscheduler_latency(struct task_struct *p,
+		const struct sched_attr *attr)
+{
+	if (attr->sched_flags & SCHED_FLAG_LATENCY_NICE) {
+		p->latency_nice = attr->sched_latency_nice;
+	}
 }
 
 /*
@@ -7326,6 +7335,13 @@ static int __sched_setscheduler(struct task_struct *p,
 			return retval;
 	}
 
+	if (attr->sched_flags & SCHED_FLAG_LATENCY_NICE) {
+		if (attr->sched_latency_nice > MAX_LATENCY_NICE)
+			return -EINVAL;
+		if (attr->sched_latency_nice < MIN_LATENCY_NICE)
+			return -EINVAL;
+	}
+
 	if (pi)
 		cpuset_read_lock();
 
@@ -7360,6 +7376,9 @@ static int __sched_setscheduler(struct task_struct *p,
 			goto change;
 		if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP)
 			goto change;
+		if (attr->sched_flags & SCHED_FLAG_LATENCY_NICE &&
+		    attr->sched_latency_nice != p->latency_nice)
+			goto change;
 
 		p->sched_reset_on_fork = reset_on_fork;
 		retval = 0;
@@ -7448,6 +7467,7 @@ static int __sched_setscheduler(struct task_struct *p,
 		__setscheduler_params(p, attr);
 		__setscheduler_prio(p, newprio);
 	}
+	__setscheduler_latency(p, attr);
 	__setscheduler_uclamp(p, attr);
 
 	if (queued) {
@@ -7658,6 +7678,9 @@ static int sched_copy_attr(struct sched_attr __user *uattr, struct sched_attr *a
 	    size < SCHED_ATTR_SIZE_VER1)
 		return -EINVAL;
 
+	if ((attr->sched_flags & SCHED_FLAG_LATENCY_NICE) &&
+	    size < SCHED_ATTR_SIZE_VER2)
+		return -EINVAL;
 	/*
 	 * XXX: Do we want to be lenient like existing syscalls; or do we want
 	 * to be strict and return an error on out-of-bounds values?
@@ -7895,6 +7918,8 @@ SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr,
 	get_params(p, &kattr);
 	kattr.sched_flags &= SCHED_FLAG_ALL;
 
+	kattr.sched_latency_nice = p->latency_nice;
+
 #ifdef CONFIG_UCLAMP_TASK
 	/*
 	 * This could race with another potential updater, but this is fine
diff --git a/tools/include/uapi/linux/sched.h b/tools/include/uapi/linux/sched.h
index 3bac0a8ceab2..ecc4884bfe4b 100644
--- a/tools/include/uapi/linux/sched.h
+++ b/tools/include/uapi/linux/sched.h
@@ -132,6 +132,7 @@ struct clone_args {
 #define SCHED_FLAG_KEEP_PARAMS		0x10
 #define SCHED_FLAG_UTIL_CLAMP_MIN	0x20
 #define SCHED_FLAG_UTIL_CLAMP_MAX	0x40
+#define SCHED_FLAG_LATENCY_NICE		0x80
 
 #define SCHED_FLAG_KEEP_ALL	(SCHED_FLAG_KEEP_POLICY | \
 				 SCHED_FLAG_KEEP_PARAMS)
@@ -143,6 +144,7 @@ struct clone_args {
 			 SCHED_FLAG_RECLAIM		| \
 			 SCHED_FLAG_DL_OVERRUN		| \
 			 SCHED_FLAG_KEEP_ALL		| \
-			 SCHED_FLAG_UTIL_CLAMP)
+			 SCHED_FLAG_UTIL_CLAMP		| \
+			 SCHED_FLAG_LATENCY_NICE)
 
 #endif /* _UAPI_LINUX_SCHED_H */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 4/7] sched/core: Add permission checks for setting the latency_nice value
  2022-05-12 16:35 [PATCH v2 0/7] Add latency_nice priority Vincent Guittot
                   ` (2 preceding siblings ...)
  2022-05-12 16:35 ` [PATCH v2 3/7] sched: Allow sched_{get,set}attr to change latency_nice of the task Vincent Guittot
@ 2022-05-12 16:35 ` Vincent Guittot
  2022-05-12 16:35 ` [PATCH v2 5/7] sched/fair: Take into account latency nice at wakeup Vincent Guittot
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Vincent Guittot @ 2022-05-12 16:35 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, bristot, vschneid, linux-kernel, parth
  Cc: qais.yousef, chris.hyser, valentin.schneider, patrick.bellasi,
	David.Laight, pjt, pavel, tj, qperret, tim.c.chen, joshdon,
	Vincent Guittot

From: Parth Shah <parth@linux.ibm.com>

Since latency_nice uses infrastructure similar to NICE, reuse the
already existing CAP_SYS_NICE security checks for latency_nice. This
returns -EPERM when a non-root user tries to set the task's
latency_nice to a value lower than the current one.
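
[Not part of the patch] A standalone restatement of the rule being
enforced, illustrative only: without CAP_SYS_NICE a task may only relax
(raise) its latency_nice, never request lower latency than it already
has.

#include <errno.h>
#include <stdbool.h>

/* mirrors the check added below, in plain form */
static int latency_nice_permission(int new_nice, int cur_nice,
				   bool has_cap_sys_nice)
{
	if (new_nice < cur_nice && !has_cap_sys_nice)
		return -EPERM;
	return 0;
}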

Signed-off-by: Parth Shah <parth@linux.ibm.com>
[rebase]
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/core.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 036bd9ff66e9..2c0f782a9089 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7340,6 +7340,10 @@ static int __sched_setscheduler(struct task_struct *p,
 			return -EINVAL;
 		if (attr->sched_latency_nice < MIN_LATENCY_NICE)
 			return -EINVAL;
+		/* Use the same security checks as NICE */
+		if (attr->sched_latency_nice < p->latency_nice &&
+		    !capable(CAP_SYS_NICE))
+			return -EPERM;
 	}
 
 	if (pi)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 5/7] sched/fair: Take into account latency nice at wakeup
  2022-05-12 16:35 [PATCH v2 0/7] Add latency_nice priority Vincent Guittot
                   ` (3 preceding siblings ...)
  2022-05-12 16:35 ` [PATCH v2 4/7] sched/core: Add permission checks for setting the latency_nice value Vincent Guittot
@ 2022-05-12 16:35 ` Vincent Guittot
  2022-05-17  0:54   ` Josh Don
  2022-05-12 16:35 ` [PATCH v2 6/7] sched/fair: Add sched group latency support Vincent Guittot
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 23+ messages in thread
From: Vincent Guittot @ 2022-05-12 16:35 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, bristot, vschneid, linux-kernel, parth
  Cc: qais.yousef, chris.hyser, valentin.schneider, patrick.bellasi,
	David.Laight, pjt, pavel, tj, qperret, tim.c.chen, joshdon,
	Vincent Guittot

Take into account the latency nice priority of a thread when deciding
whether to preempt the currently running thread. We don't want to
provide more CPU bandwidth to a thread, but to reorder the scheduling so
that a latency-sensitive task runs first whenever possible.

As long as a thread has not used up its bandwidth, it will be able to
preempt the current thread.

Conversely, a thread with a low latency priority will preempt the
current thread at wakeup only to keep fair CPU bandwidth sharing;
otherwise it will wait for the tick to get its sched slice.

                                   curr vruntime
                                       |
                      sysctl_sched_wakeup_granularity
                                   <-->
----------------------------------|----|-----------------------|---------------
                                  |    |<--------------------->
                                  |    .  sysctl_sched_latency
                                  |    .
default/current latency entity    |    .
                                  |    .
1111111111111111111111111111111111|0000|-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-
se preempts curr at wakeup ------>|<- se doesn't preempt curr -----------------
                                  |    .
                                  |    .
                                  |    .
low latency entity                |    .
                                   ---------------------->|
                               % of sysctl_sched_latency  |
1111111111111111111111111111111111111111111111111111111111|0000|-1-1-1-1-1-1-1-
preempt ------------------------------------------------->|<- do not preempt --
                                  |    .
                                  |    .
                                  |    .
high latency entity               |    .
         |<-----------------------|    .
         | % of sysctl_sched_latency   .
111111111|0000|-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1
preempt->|<- se doesn't preempt curr ------------------------------------------
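
To make the numbers concrete, here is a small standalone model (not part
of the patch) of the offset that wakeup_latency_gran() below adds to
vdiff in wakeup_preempt_entity(). The 24ms sysctl_sched_latency and the
halving by GENTLE_FAIR_SLEEPERS are assumptions for the example; the
weight table is copied from this patch.

#include <stdio.h>

#define NICE_LATENCY_SHIFT	10	/* SCHED_FIXEDPOINT_SHIFT */
#define NICE_LATENCY_WEIGHT_MAX	(1L << NICE_LATENCY_SHIFT)

/* copy of sched_latency_to_weight[], indexed by latency_nice + 20 */
static const int latency_to_weight[40] = {
	1024,  973,  922,  870,  819,  768,  717,  666,  614,  563,
	 512,  461,  410,  358,  307,  256,  205,  154,  102,   51,
	   0,  -51, -102, -154, -205, -256, -307, -358, -410, -461,
	-512, -563, -614, -666, -717, -768, -819, -870, -922, -973,
};

static long latency_offset(int curr_nice, int se_nice, long sched_latency_ns)
{
	long w = latency_to_weight[se_nice + 20];
	long wc = latency_to_weight[curr_nice + 20];
	long thresh = sched_latency_ns / 2;	/* GENTLE_FAIR_SLEEPERS */

	if (w > 0 || wc > 0)
		w -= wc;
	if (!w)
		return 0;
	if (w > NICE_LATENCY_WEIGHT_MAX)
		w = NICE_LATENCY_WEIGHT_MAX;
	else if (w < -NICE_LATENCY_WEIGHT_MAX)
		w = -NICE_LATENCY_WEIGHT_MAX;

	/* arithmetic right shift on negatives, as the kernel code relies on */
	return (thresh * w) >> NICE_LATENCY_SHIFT;
}

int main(void)
{
	long lat = 24000000L;	/* assume sysctl_sched_latency = 24ms */

	/* waking se at latency_nice -20 vs curr at 0: +12ms added to vdiff */
	printf("%ld ns\n", latency_offset(0, -20, lat));
	/* waking se at latency_nice 19 vs curr at 0: about -11.4ms */
	printf("%ld ns\n", latency_offset(0, 19, lat));
	return 0;
}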

Test results of the latency nice impact on a heavy load like hackbench:

hackbench -l (2560 / group) -g group
group        latency 0             latency 19
1            1.399(+/- 1.27%)      1.357(+/- 1.48%) + 3%
4            1.422(+/- 4.69%)      1.267(+/- 2.39%) +11%
8            1.334(+/- 2.92%)      1.290(+/- 0.92%) + 3%
16           1.353(+/- 1.37%)      1.310(+/- 0.35%) + 3%

hackbench -p -l (2560 / group) -g group
group
1            1.450(+/- 9.14%)      0.853(+/- 3.23%) +41%
4            1.539(+/- 6.41%)      0.754(+/- 3.96%) +51%
8            1.380(+/- 8.04%)      0.687(+/- 5.30%) +50%
16           0.774(+/- 6.30%)      0.688(+/- 3.11%) +11%

By decreasing the latency priority (latency nice 19), we reduce the
number of preemptions at wakeup and help hackbench make progress.

Test results of the latency nice impact on a short-lived load like
cyclictest while competing with a heavy load like hackbench:

hackbench -l 10000 -g group &
cyclictest --policy other -D 5 -q -n
        latency 0           latency -20
group   min  avg    max     min  avg    max
0       15    18     28      17   17     27
1       65   386   9154      62   92   6268
4       63   447  14623      54   93   7655
8       63   847  43705      49  124  26500
16      53  1081  66523      44  199  30185

group = 0 means that hackbench is not running.

The avg is significantly improved with latency nice -20, especially with
a large number of groups, but min and max remain quite similar. If we add
the histogram parameters to get latency details, we have:

hackbench -l 10000 -g 16 &
cyclictest --policy other -D 5 -q -n  -H 20000 --histfile data.txt
              latency 0    latency -20
Min Latencies:    63           62
Avg Latencies:  1129          132
Max Latencies: 71331        16762
50% latencies:    92           85
75% latencies:   622           90
85% latencies:  1038           93
90% latencies:  1371           96
95% latencies:  5304          100
99% latencies: 17329          137

With the percentile details, we see the benefit of latency nice -20:
only 1% of the latencies are above 137us, whereas with the default
latency 25% are above 622us and 15% are over 1ms.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---

During the v1 review, the opportunity to take latency_prio into account in
places other than check_preempt_wakeup() was discussed.
The fast wakeup path is mainly about quickly looking for an idle CPU, and I
don't see any place where the added complexity would provide an obvious
benefit.
The only candidate could be wake_affine_weight(), where we already compare
the load; there we could also check the latency_nice priority of the current
tasks.
The slow path is mainly/only used for exec and fork, and I wonder whether
there are cases where we would want, for example, a newly forked task to
preempt the current one as soon as possible.
So I haven't added any new place that takes latency_prio into account for
now.


 include/linux/sched.h |  4 ++-
 init/init_task.c      |  2 +-
 kernel/sched/core.c   | 34 +++++++++++++++++----
 kernel/sched/debug.c  |  2 +-
 kernel/sched/fair.c   | 69 +++++++++++++++++++++++++++++++++++++++++--
 kernel/sched/sched.h  | 12 ++++++++
 6 files changed, 112 insertions(+), 11 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 34c6c9c2797c..d991cbd972ea 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -560,6 +560,8 @@ struct sched_entity {
 	unsigned long			runnable_weight;
 #endif
 
+	int				latency_weight;
+
 #ifdef CONFIG_SMP
 	/*
 	 * Per entity load average tracking.
@@ -775,7 +777,7 @@ struct task_struct {
 	int				static_prio;
 	int				normal_prio;
 	unsigned int			rt_priority;
-	int				latency_nice;
+	int				latency_prio;
 
 	struct sched_entity		se;
 	struct sched_rt_entity		rt;
diff --git a/init/init_task.c b/init/init_task.c
index 225d11a39bc9..e98c71f24981 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -78,7 +78,7 @@ struct task_struct init_task
 	.prio		= MAX_PRIO - 20,
 	.static_prio	= MAX_PRIO - 20,
 	.normal_prio	= MAX_PRIO - 20,
-	.latency_nice	= DEFAULT_LATENCY_NICE,
+	.latency_prio	= NICE_WIDTH - 20,
 	.policy		= SCHED_NORMAL,
 	.cpus_ptr	= &init_task.cpus_mask,
 	.user_cpus_ptr	= NULL,
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2c0f782a9089..ff020b99625c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1308,6 +1308,11 @@ static void set_load_weight(struct task_struct *p, bool update_load)
 	}
 }
 
+static void set_latency_weight(struct task_struct *p)
+{
+	p->se.latency_weight = sched_latency_to_weight[p->latency_prio];
+}
+
 #ifdef CONFIG_UCLAMP_TASK
 /*
  * Serializes updates of utilization clamp values
@@ -4474,7 +4479,7 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
 	p->prio = current->normal_prio;
 
 	/* Propagate the parent's latency requirements to the child as well */
-	p->latency_nice = current->latency_nice;
+	p->latency_prio = current->latency_prio;
 
 	uclamp_fork(p);
 
@@ -4492,7 +4497,9 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
 		p->prio = p->normal_prio = p->static_prio;
 		set_load_weight(p, false);
 
-		p->latency_nice = DEFAULT_LATENCY_NICE;
+		p->latency_prio = NICE_TO_LATENCY(0);
+		set_latency_weight(p);
+
 		/*
 		 * We don't need the reset flag anymore after the fork. It has
 		 * fulfilled its duty:
@@ -7207,7 +7214,8 @@ static void __setscheduler_latency(struct task_struct *p,
 		const struct sched_attr *attr)
 {
 	if (attr->sched_flags & SCHED_FLAG_LATENCY_NICE) {
-		p->latency_nice = attr->sched_latency_nice;
+		p->latency_prio = NICE_TO_LATENCY(attr->sched_latency_nice);
+		set_latency_weight(p);
 	}
 }
 
@@ -7341,7 +7349,7 @@ static int __sched_setscheduler(struct task_struct *p,
 		if (attr->sched_latency_nice < MIN_LATENCY_NICE)
 			return -EINVAL;
 		/* Use the same security checks as NICE */
-		if (attr->sched_latency_nice < p->latency_nice &&
+		if (attr->sched_latency_nice < LATENCY_TO_NICE(p->latency_prio) &&
 		    !capable(CAP_SYS_NICE))
 			return -EPERM;
 	}
@@ -7381,7 +7389,7 @@ static int __sched_setscheduler(struct task_struct *p,
 		if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP)
 			goto change;
 		if (attr->sched_flags & SCHED_FLAG_LATENCY_NICE &&
-		    attr->sched_latency_nice != p->latency_nice)
+		    attr->sched_latency_nice != LATENCY_TO_NICE(p->latency_prio))
 			goto change;
 
 		p->sched_reset_on_fork = reset_on_fork;
@@ -7922,7 +7930,7 @@ SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr,
 	get_params(p, &kattr);
 	kattr.sched_flags &= SCHED_FLAG_ALL;
 
-	kattr.sched_latency_nice = p->latency_nice;
+	kattr.sched_latency_nice = LATENCY_TO_NICE(p->latency_prio);
 
 #ifdef CONFIG_UCLAMP_TASK
 	/*
@@ -11117,6 +11125,20 @@ const u32 sched_prio_to_wmult[40] = {
  /*  15 */ 119304647, 148102320, 186737708, 238609294, 286331153,
 };
 
+/*
+ * latency weight for wakeup preemption
+ */
+const int sched_latency_to_weight[40] = {
+ /* -20 */      1024,       973,       922,       870,       819,
+ /* -15 */       768,       717,       666,       614,       563,
+ /* -10 */       512,       461,       410,       358,       307,
+ /*  -5 */       256,       205,       154,       102,       51,
+ /*   0 */	   0,       -51,      -102,      -154,      -205,
+ /*   5 */      -256,      -307,      -358,      -410,      -461,
+ /*  10 */      -512,      -563,      -614,      -666,      -717,
+ /*  15 */      -768,      -819,      -870,      -922,      -973,
+};
+
 void call_trace_sched_update_nr_running(struct rq *rq, int count)
 {
         trace_sched_update_nr_running_tp(rq, count);
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index a3f7876217a6..06aaa0c81d4b 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -1042,7 +1042,7 @@ void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns,
 #endif
 	P(policy);
 	P(prio);
-	P(latency_nice);
+	P(latency_prio);
 	if (task_has_dl_policy(p)) {
 		P(dl.runtime);
 		P(dl.deadline);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bc9f6e94c84e..3af74f1a79ca 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5619,6 +5619,35 @@ static int sched_idle_cpu(int cpu)
 }
 #endif
 
+static void set_next_buddy(struct sched_entity *se);
+
+static void check_preempt_from_idle(struct cfs_rq *cfs, struct sched_entity *se)
+{
+	struct sched_entity *next;
+
+	if (se->latency_weight <= 0)
+		return;
+
+	if (cfs->nr_running <= 1)
+		return;
+	/*
+	 * When waking from idle, we don't need to check to preempt at wakeup
+	 * the idle thread and don't set next buddy as a candidate for being
+	 * picked in priority.
+	 * In case of simultaneous wakeup from idle, the latency sensitive tasks
+	 * lost opportunity to preempt non sensitive tasks which woke up
+	 * simultaneously.
+	 */
+
+	if (cfs->next)
+		next = cfs->next;
+	else
+		next = __pick_first_entity(cfs);
+
+	if (next && wakeup_preempt_entity(next, se) == 1)
+		set_next_buddy(se);
+}
+
 /*
  * The enqueue_task method is called before nr_running is
  * increased. Here we update the fair scheduling stats and
@@ -5712,6 +5741,9 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	if (!task_new)
 		update_overutilized_status(rq);
 
+	if (rq->curr == rq->idle)
+		check_preempt_from_idle(cfs_rq_of(&p->se), &p->se);
+
 enqueue_throttle:
 	if (cfs_bandwidth_used()) {
 		/*
@@ -5733,8 +5765,6 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	hrtick_update(rq);
 }
 
-static void set_next_buddy(struct sched_entity *se);
-
 /*
  * The dequeue_task method is called before nr_running is
  * decreased. We remove the task from the rbtree and
@@ -6991,6 +7021,37 @@ balance_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 }
 #endif /* CONFIG_SMP */
 
+static long wakeup_latency_gran(struct sched_entity *curr, struct sched_entity *se)
+{
+	int latency_weight = se->latency_weight;
+	long thresh = sysctl_sched_latency;
+
+	/*
+	 * A positive latency weight means that the sched_entity has a latency
+	 * requirement that needs to be evaluated versus the other entity.
+	 * Otherwise, use the latency weight to evaluate how much scheduling
+	 * delay is acceptable by se.
+	 */
+	if ((se->latency_weight > 0) || (curr->latency_weight > 0))
+		latency_weight -= curr->latency_weight;
+
+	if (!latency_weight)
+		return 0;
+
+	if (sched_feat(GENTLE_FAIR_SLEEPERS))
+		thresh >>= 1;
+
+	/*
+	 * Clamp the delta to stay in the scheduler period range
+	 * [-sysctl_sched_latency:sysctl_sched_latency]
+	 */
+	latency_weight = clamp_t(long, latency_weight,
+				-1 * NICE_LATENCY_WEIGHT_MAX,
+				NICE_LATENCY_WEIGHT_MAX);
+
+	return (thresh * latency_weight) >> NICE_LATENCY_SHIFT;
+}
+
 static unsigned long wakeup_gran(struct sched_entity *se)
 {
 	unsigned long gran = sysctl_sched_wakeup_granularity;
@@ -7030,6 +7091,9 @@ wakeup_preempt_entity(struct sched_entity *curr, struct sched_entity *se)
 {
 	s64 gran, vdiff = curr->vruntime - se->vruntime;
 
+	/* Take into account latency priority */
+	vdiff += wakeup_latency_gran(curr, se);
+
 	if (vdiff <= 0)
 		return -1;
 
@@ -7138,6 +7202,7 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_
 		return;
 
 	update_curr(cfs_rq_of(se));
+
 	if (wakeup_preempt_entity(se, pse) == 1) {
 		/*
 		 * Bias pick_next to pick the sched entity that is
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 271ecd37c13d..831b2c8feff1 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -134,6 +134,17 @@ extern void call_trace_sched_update_nr_running(struct rq *rq, int count);
  * Default tasks should be treated as a task with latency_nice = 0.
  */
 #define DEFAULT_LATENCY_NICE	0
+#define DEFAULT_LATENCY_PRIO	(DEFAULT_LATENCY_NICE + LATENCY_NICE_WIDTH/2)
+
+/*
+ * Convert user-nice values [ -20 ... 0 ... 19 ]
+ * to static latency [ 0..39 ],
+ * and back.
+ */
+#define NICE_TO_LATENCY(nice)	((nice) + DEFAULT_LATENCY_PRIO)
+#define LATENCY_TO_NICE(prio)	((prio) - DEFAULT_LATENCY_PRIO)
+#define NICE_LATENCY_SHIFT	(SCHED_FIXEDPOINT_SHIFT)
+#define NICE_LATENCY_WEIGHT_MAX	(1L << NICE_LATENCY_SHIFT)
 
 /*
  * Increase resolution of nice-level calculations for 64-bit architectures.
@@ -2078,6 +2089,7 @@ static_assert(WF_TTWU == SD_BALANCE_WAKE);
 
 extern const int		sched_prio_to_weight[40];
 extern const u32		sched_prio_to_wmult[40];
+extern const int		sched_latency_to_weight[40];
 
 /*
  * {de,en}queue flags:
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 6/7] sched/fair: Add sched group latency support
  2022-05-12 16:35 [PATCH v2 0/7] Add latency_nice priority Vincent Guittot
                   ` (4 preceding siblings ...)
  2022-05-12 16:35 ` [PATCH v2 5/7] sched/fair: Take into account latency nice at wakeup Vincent Guittot
@ 2022-05-12 16:35 ` Vincent Guittot
  2022-05-12 16:35 ` [PATCH v2 7/7] sched/core: support latency nice with sched core Vincent Guittot
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Vincent Guittot @ 2022-05-12 16:35 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, bristot, vschneid, linux-kernel, parth
  Cc: qais.yousef, chris.hyser, valentin.schneider, patrick.bellasi,
	David.Laight, pjt, pavel, tj, qperret, tim.c.chen, joshdon,
	Vincent Guittot

A task can set its latency priority, which is then used to decide whether
to preempt the currently running cfs entity, but sched group entities
still have the default latency priority.

Add a latency field to the task group to set the latency priority of the
group, which will be used against other entities in the parent cfs rq.
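
[Not part of the patch] A hypothetical usage sketch: setting the latency
of an existing cpu cgroup through the new cpu.latency file. It assumes
cgroup v2 mounted at /sys/fs/cgroup with the cpu controller enabled and
an existing group named "mygroup"; the file takes a latency nice value
in [-20..19].

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	const char *path = "/sys/fs/cgroup/mygroup/cpu.latency";
	const char *val = "-20\n";	/* most latency sensitive */
	int fd = open(path, O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (write(fd, val, strlen(val)) < 0)
		perror("write");
	close(fd);
	return 0;
}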

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---

For v1, there were discussions about the best interface to express some
latency constraints for a cfs group. The weight seems to be specific to the
wakeup path and can't easily be reused elsewhere, such as in EAS or the idle
CPU selection path.
The current proposal of a latency prio in the range [0:40] seems the
simplest interface, but on the other hand it removes the notion of either
having a latency constraint (negative value) or relaxing the latency
(positive value); this is an important and easy-to-understand distinction.
So I tend to think that keeping the latency nice value is the easiest way to
express whether a group has a latency constraint (negative value) or doesn't
care about scheduling latency (positive value). Using a range will then help
to order the groups' constraints.

I have studied whether we could use a duration instead, but this would
mainly confuse the user because, whatever the value, we would never be able
to guarantee it.


 kernel/sched/core.c  | 41 +++++++++++++++++++++++++++++++++++++++++
 kernel/sched/fair.c  | 32 ++++++++++++++++++++++++++++++++
 kernel/sched/sched.h |  4 ++++
 3 files changed, 71 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ff020b99625c..95f3ef54447e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -10798,6 +10798,30 @@ static int cpu_idle_write_s64(struct cgroup_subsys_state *css,
 {
 	return sched_group_set_idle(css_tg(css), idle);
 }
+
+static s64 cpu_latency_read_s64(struct cgroup_subsys_state *css,
+			       struct cftype *cft)
+{
+	return css_tg(css)->latency_prio;
+}
+
+static int cpu_latency_write_s64(struct cgroup_subsys_state *css,
+				struct cftype *cft, s64 latency_prio)
+{
+	return sched_group_set_latency(css_tg(css), latency_prio);
+}
+
+static s64 cpu_latency_nice_read_s64(struct cgroup_subsys_state *css,
+			       struct cftype *cft)
+{
+	return LATENCY_TO_NICE(css_tg(css)->latency_prio);
+}
+
+static int cpu_latency_nice_write_s64(struct cgroup_subsys_state *css,
+				struct cftype *cft, s64 latency_nice)
+{
+	return sched_group_set_latency(css_tg(css), NICE_TO_LATENCY(latency_nice));
+}
 #endif
 
 static struct cftype cpu_legacy_files[] = {
@@ -10812,6 +10836,11 @@ static struct cftype cpu_legacy_files[] = {
 		.read_s64 = cpu_idle_read_s64,
 		.write_s64 = cpu_idle_write_s64,
 	},
+	{
+		.name = "latency",
+		.read_s64 = cpu_latency_nice_read_s64,
+		.write_s64 = cpu_latency_nice_write_s64,
+	},
 #endif
 #ifdef CONFIG_CFS_BANDWIDTH
 	{
@@ -11029,6 +11058,12 @@ static struct cftype cpu_files[] = {
 		.read_s64 = cpu_idle_read_s64,
 		.write_s64 = cpu_idle_write_s64,
 	},
+	{
+		.name = "latency",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.read_s64 = cpu_latency_nice_read_s64,
+		.write_s64 = cpu_latency_nice_write_s64,
+	},
 #endif
 #ifdef CONFIG_CFS_BANDWIDTH
 	{
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3af74f1a79ca..71c0762491c5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11529,6 +11529,7 @@ int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent)
 		goto err;
 
 	tg->shares = NICE_0_LOAD;
+	tg->latency_prio = DEFAULT_LATENCY_PRIO;
 
 	init_cfs_bandwidth(tg_cfs_bandwidth(tg));
 
@@ -11627,6 +11628,7 @@ void init_tg_cfs_entry(struct task_group *tg, struct cfs_rq *cfs_rq,
 	}
 
 	se->my_q = cfs_rq;
+	se->latency_weight = sched_latency_to_weight[tg->latency_prio];
 	/* guarantee group entities always have weight */
 	update_load_set(&se->load, NICE_0_LOAD);
 	se->parent = parent;
@@ -11757,6 +11759,36 @@ int sched_group_set_idle(struct task_group *tg, long idle)
 	return 0;
 }
 
+int sched_group_set_latency(struct task_group *tg, long latency_prio)
+{
+	int i;
+
+	if (tg == &root_task_group)
+		return -EINVAL;
+
+	if (latency_prio < 0 ||
+	    latency_prio > LATENCY_NICE_WIDTH)
+		return -EINVAL;
+
+	mutex_lock(&shares_mutex);
+
+	if (tg->latency_prio == latency_prio) {
+		mutex_unlock(&shares_mutex);
+		return 0;
+	}
+
+	tg->latency_prio = latency_prio;
+
+	for_each_possible_cpu(i) {
+		struct sched_entity *se = tg->se[i];
+
+		WRITE_ONCE(se->latency_weight, sched_latency_to_weight[latency_prio]);
+	}
+
+	mutex_unlock(&shares_mutex);
+	return 0;
+}
+
 #else /* CONFIG_FAIR_GROUP_SCHED */
 
 void free_fair_sched_group(struct task_group *tg) { }
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 831b2c8feff1..0c26bb5a742e 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -414,6 +414,8 @@ struct task_group {
 
 	/* A positive value indicates that this is a SCHED_IDLE group. */
 	int			idle;
+	/* latency priority of the group. */
+	int			latency_prio;
 
 #ifdef	CONFIG_SMP
 	/*
@@ -527,6 +529,8 @@ extern int sched_group_set_shares(struct task_group *tg, unsigned long shares);
 
 extern int sched_group_set_idle(struct task_group *tg, long idle);
 
+extern int sched_group_set_latency(struct task_group *tg, long latency);
+
 #ifdef CONFIG_SMP
 extern void set_task_rq_fair(struct sched_entity *se,
 			     struct cfs_rq *prev, struct cfs_rq *next);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 7/7] sched/core: support latency nice with sched core
  2022-05-12 16:35 [PATCH v2 0/7] Add latency_nice priority Vincent Guittot
                   ` (5 preceding siblings ...)
  2022-05-12 16:35 ` [PATCH v2 6/7] sched/fair: Add sched group latency support Vincent Guittot
@ 2022-05-12 16:35 ` Vincent Guittot
  2022-05-13  7:12 ` [PATCH v2 0/7] Add latency_nice priority David Laight
  2022-05-13 21:44 ` Tim Chen
  8 siblings, 0 replies; 23+ messages in thread
From: Vincent Guittot @ 2022-05-12 16:35 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, bristot, vschneid, linux-kernel, parth
  Cc: qais.yousef, chris.hyser, valentin.schneider, patrick.bellasi,
	David.Laight, pjt, pavel, tj, qperret, tim.c.chen, joshdon,
	Vincent Guittot

Take into account wakeup_latency_gran() when ordering the cfs threads.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 71c0762491c5..063e9a3c7e51 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11193,6 +11193,10 @@ bool cfs_prio_less(struct task_struct *a, struct task_struct *b, bool in_fi)
 	delta = (s64)(sea->vruntime - seb->vruntime) +
 		(s64)(cfs_rqb->min_vruntime_fi - cfs_rqa->min_vruntime_fi);
 
+	/* Take into account latency prio */
+	delta += wakeup_latency_gran(sea, seb);
+
+
 	return delta > 0;
 }
 #else
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* RE: [PATCH v2 0/7]  Add latency_nice priority
  2022-05-12 16:35 [PATCH v2 0/7] Add latency_nice priority Vincent Guittot
                   ` (6 preceding siblings ...)
  2022-05-12 16:35 ` [PATCH v2 7/7] sched/core: support latency nice with sched core Vincent Guittot
@ 2022-05-13  7:12 ` David Laight
  2022-05-13 14:29   ` Vincent Guittot
  2022-05-13 21:44 ` Tim Chen
  8 siblings, 1 reply; 23+ messages in thread
From: David Laight @ 2022-05-13  7:12 UTC (permalink / raw)
  To: 'Vincent Guittot',
	mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, bristot, vschneid, linux-kernel, parth
  Cc: qais.yousef, chris.hyser, valentin.schneider, patrick.bellasi,
	pjt, pavel, tj, qperret, tim.c.chen, joshdon

From: Vincent Guittot
> Sent: 12 May 2022 17:35
> 
> This patchset restarts the work on adding a latency nice priority to
> describe the latency tolerance of cfs tasks.
> 
> The patches [1-4] have been done by Parth:
> https://lore.kernel.org/lkml/20200228090755.22829-1-parth@linux.ibm.com/
> 
> I have simply rebased and moved the setting of the latency priority
> outside the priority update. I have removed the Reviewed-by tags
> because the patches are two years old.
> 
> The patches [5-7] use the latency nice priority to decide whether a cfs
> task can preempt the currently running task. Patch 5 gives some test
> results with cyclictest and hackbench to highlight the benefit of the
> latency nice priority for short interactive tasks and long CPU-intensive
> tasks.

I'd have thought the best way to reduce latency would be to look
harder for an idle cpu before trying to preempt the current task.

By far the worst case latency for a cfs task is waking it from an
rt task.
AFAICT the cpu selection algorithm is something like:
1) if the cpu it last ran on is idle, schedule it there.
2) if one of a small subset of cpu is idle run it there.
3) schedule on the current cpu after the rt thread exits.

Then there is the additional behaviour:
An rt thread (almost) always starts on the cpu it last ran on, and
any running cfs thread is put on the run queue for that cpu.

This makes it very difficult to start a background cfs thread
from an active rt thread.
Quite often it won't migrate to an idle cpu until the timer
tick scheduler rebalance happens.

I think the search for an idle cpu is also limited when woken
by a cfs thread.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/7] Add latency_nice priority
  2022-05-13  7:12 ` [PATCH v2 0/7] Add latency_nice priority David Laight
@ 2022-05-13 14:29   ` Vincent Guittot
  0 siblings, 0 replies; 23+ messages in thread
From: Vincent Guittot @ 2022-05-13 14:29 UTC (permalink / raw)
  To: David Laight
  Cc: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, bristot, vschneid, linux-kernel, parth, qais.yousef,
	chris.hyser, valentin.schneider, patrick.bellasi, pjt, pavel, tj,
	qperret, tim.c.chen, joshdon

On Fri, 13 May 2022 at 09:13, David Laight <David.Laight@aculab.com> wrote:
>
> From: Vincent Guittot
> > Sent: 12 May 2022 17:35
> >
> > This patchset restarts the work on adding a latency nice priority to
> > describe the latency tolerance of cfs tasks.
> >
> > The patches [1-4] have been done by Parth:
> > https://lore.kernel.org/lkml/20200228090755.22829-1-parth@linux.ibm.com/
> >
> > I have simply rebased and moved the setting of the latency priority
> > outside the priority update. I have removed the Reviewed-by tags
> > because the patches are two years old.
> >
> > The patches [5-7] use the latency nice priority to decide whether a cfs
> > task can preempt the currently running task. Patch 5 gives some test
> > results with cyclictest and hackbench to highlight the benefit of the
> > latency nice priority for short interactive tasks and long CPU-intensive
> > tasks.
>
> I'd have thought the best way to reduce latency would be to look
> harder for an idle cpu before trying to preempt the current task.

Although it's a good policy, this is not always true:
- The wakeup latency of an idle CPU can be significant for deep idle
states (several ms)
- The cost of looking for an idle CPU can be higher than preempting
the current thread

>
> By far the worst case latency for a cfs task is waking it from an
> rt task.
> AFAICT the cpu selection algorithm is something like:
> 1) if the cpu it last ran on is idle, schedule it there.
> 2) if one of a small subset of cpu is idle run it there.

Yes, the LLC.

> 3) schedule on the current cpu after the rt thread exits.

when the prev and current CPUs are different, we first select which one
will be the starting point for looking for an idle CPU

>
> Then there is the additional behaviour:
> An rt thread (almost) always starts on the cpu it ran on last,
> any running cfs thread is put on the q for that cpu.

and it might even have to wait for other cfs threads to run first, which
is what this patchset aims to improve

>
> This makes it very difficult to start a background cfs thread
> from an active rt thread.
> Quite often it won't migrate to an idle cpu until the timer
> tick scheduler rebalance happens.

So you are speaking about an idle CPU outside the LLC.
But then, this cfs task should try to wake up first on the new CPU and
not on the RT one.

>
> I think the search for an idle cpu is also limited when woken
> by a cfs thread.
>
>         David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/7]  Add latency_nice priority
  2022-05-12 16:35 [PATCH v2 0/7] Add latency_nice priority Vincent Guittot
                   ` (7 preceding siblings ...)
  2022-05-13  7:12 ` [PATCH v2 0/7] Add latency_nice priority David Laight
@ 2022-05-13 21:44 ` Tim Chen
  2022-05-19 14:16   ` Vincent Guittot
  8 siblings, 1 reply; 23+ messages in thread
From: Tim Chen @ 2022-05-13 21:44 UTC (permalink / raw)
  To: Vincent Guittot, mingo, peterz, juri.lelli, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, vschneid, linux-kernel,
	parth
  Cc: qais.yousef, chris.hyser, valentin.schneider, patrick.bellasi,
	David.Laight, pjt, pavel, tj, qperret, joshdon, len.brown

On Thu, 2022-05-12 at 18:35 +0200, Vincent Guittot wrote:
> This patchset restarts the work on adding a latency nice priority to
> describe the latency tolerance of cfs tasks.
> 
> The patches [1-4] have been done by Parth:
> https://lore.kernel.org/lkml/20200228090755.22829-1-parth@linux.ibm.com/
> 
> I have simply rebased and moved the setting of the latency priority
> outside the priority update. I have removed the Reviewed-by tags
> because the patches are two years old.
> 

Vincent,

Thanks for introducing the feature again, which is much needed.  I am trying
to look at the problem again from a usage point of view, and wonder if
there are ways to make the latency_nice knob easier to use.

The latency nice value here is relative.  A latency sensitive task
may not be able to tell whether setting latency_nice to -5 or to -10 is
good enough.  It depends on what other tasks are setting their
latency_nice value to.  What a task does know is what it is doing and its
characteristics.
For instance, for client tasks we may have categories such as

Task Category					latency_nice_range
-------------                                   ------------------
urgent						-19 to -16
media playback					-15 to -11
interactive (e.g.pressing key)  		-10 to -6
normal						-5  to  9
background					 10  to 15
opportunistic soaker task (sched_idle class)	 16 to  20

And we could allow a task to set an attribute indicating which task category
applies to it, and the OS can set a default latency nice value for that task
category.  So a task can just declare what kind of task it is, and not worry
about actually setting a latency nice value for which it may not know what
is appropriate.
If needed, a task could still adjust its latency nice value within the range
to differentiate itself within a task category.  And we would prevent a task
from setting an inappropriate latency nice value outside the right range.
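
To illustrate, such a mapping could look like the sketch below; the
names and default values are purely hypothetical, and the soaker default
is capped at 19 since that is the maximum latency_nice in this series.

enum task_latency_category {
	TASK_LAT_URGENT,	/* -19 to -16 */
	TASK_LAT_MEDIA,		/* -15 to -11 */
	TASK_LAT_INTERACTIVE,	/* -10 to -6  */
	TASK_LAT_NORMAL,	/*  -5 to  9  */
	TASK_LAT_BACKGROUND,	/*  10 to 15  */
	TASK_LAT_SOAKER,	/*  16 to 19  */
};

static const int category_default_latency_nice[] = {
	[TASK_LAT_URGENT]	= -17,
	[TASK_LAT_MEDIA]	= -13,
	[TASK_LAT_INTERACTIVE]	= -8,
	[TASK_LAT_NORMAL]	= 0,
	[TASK_LAT_BACKGROUND]	= 12,
	[TASK_LAT_SOAKER]	= 19,
};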

Knowing a task's characteristics will also be helpful for other
scheduling decisions, like placing a task on a higher-performing
core in heterogeneous systems.

I think the missing piece here is a way for a task to declare 
what kind of task it is.  I think that will make things easier.

Tim

> The patches [5-7] use the latency nice priority to decide whether a cfs
> task can preempt the currently running task. Patch 5 gives some test
> results with cyclictest and hackbench to highlight the benefit of the
> latency nice priority for short interactive tasks and long CPU-intensive
> tasks.
> 
> 
> Changes since v1:
> - fixed typos
> - moved some code into the right patch to keep bisect happy
> - simplified and fixed how the weight is computed
> - added sched core support (patch 7)
> 
> Parth Shah (4):
>   sched: Introduce latency-nice as a per-task attribute
>   sched/core: Propagate parent task's latency requirements to the child
>     task
>   sched: Allow sched_{get,set}attr to change latency_nice of the task
>   sched/core: Add permission checks for setting the latency_nice value
> 
> Vincent Guittot (3):
>   sched/fair: Take into account latency nice at wakeup
>   sched/fair: Add sched group latency support
>   sched/core: support latency nice with sched core
> 
>  include/linux/sched.h            |   3 +
>  include/uapi/linux/sched.h       |   4 +-
>  include/uapi/linux/sched/types.h |  19 ++++++
>  init/init_task.c                 |   1 +
>  kernel/sched/core.c              |  90 ++++++++++++++++++++++++++
>  kernel/sched/debug.c             |   1 +
>  kernel/sched/fair.c              | 105 ++++++++++++++++++++++++++++++-
>  kernel/sched/sched.h             |  34 ++++++++++
>  tools/include/uapi/linux/sched.h |   4 +-
>  9 files changed, 257 insertions(+), 4 deletions(-)
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 6/7] sched/fair: Add sched group latency support
  2022-05-12 16:35 ` [PATCH v2 6/7] sched/fair: Add sched group latency support Vincent Guittot
@ 2022-05-16 13:06 ` Dan Carpenter
  -1 siblings, 0 replies; 23+ messages in thread
From: kernel test robot @ 2022-05-15  8:28 UTC (permalink / raw)
  To: kbuild

CC: kbuild-all@lists.01.org
BCC: lkp@intel.com
In-Reply-To: <20220512163534.2572-7-vincent.guittot@linaro.org>
References: <20220512163534.2572-7-vincent.guittot@linaro.org>
TO: Vincent Guittot <vincent.guittot@linaro.org>

Hi Vincent,

I love your patch! Perhaps something to improve:

[auto build test WARNING on tip/sched/core]
[also build test WARNING on linux/master tip/master linus/master v5.18-rc6 next-20220513]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Vincent-Guittot/Add-latency_nice-priority/20220513-005836
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 734387ec2f9d77b00276042b1fa7c95f48ee879d
:::::: branch date: 3 days ago
:::::: commit date: 3 days ago
config: x86_64-rhel-8.3-kselftests (https://download.01.org/0day-ci/archive/20220515/202205151601.2cHPPsvk-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.2.0-20) 11.2.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

smatch warnings:
kernel/sched/fair.c:11785 sched_group_set_latency() error: buffer overflow 'sched_latency_to_weight' 40 <= 40
kernel/sched/fair.c:11785 sched_group_set_latency() error: buffer overflow 'sched_latency_to_weight' 40 <= 40

vim +/sched_latency_to_weight +11785 kernel/sched/fair.c

304000390f88d0 Josh Don        2021-07-29  11761  
632b7c939b164d Vincent Guittot 2022-05-12  11762  int sched_group_set_latency(struct task_group *tg, long latency_prio)
632b7c939b164d Vincent Guittot 2022-05-12  11763  {
632b7c939b164d Vincent Guittot 2022-05-12  11764  	int i;
632b7c939b164d Vincent Guittot 2022-05-12  11765  
632b7c939b164d Vincent Guittot 2022-05-12  11766  	if (tg == &root_task_group)
632b7c939b164d Vincent Guittot 2022-05-12  11767  		return -EINVAL;
632b7c939b164d Vincent Guittot 2022-05-12  11768  
632b7c939b164d Vincent Guittot 2022-05-12  11769  	if (latency_prio < 0 ||
632b7c939b164d Vincent Guittot 2022-05-12  11770  	    latency_prio > LATENCY_NICE_WIDTH)
632b7c939b164d Vincent Guittot 2022-05-12  11771  		return -EINVAL;
632b7c939b164d Vincent Guittot 2022-05-12  11772  
632b7c939b164d Vincent Guittot 2022-05-12  11773  	mutex_lock(&shares_mutex);
632b7c939b164d Vincent Guittot 2022-05-12  11774  
632b7c939b164d Vincent Guittot 2022-05-12  11775  	if (tg->latency_prio == latency_prio) {
632b7c939b164d Vincent Guittot 2022-05-12  11776  		mutex_unlock(&shares_mutex);
632b7c939b164d Vincent Guittot 2022-05-12  11777  		return 0;
632b7c939b164d Vincent Guittot 2022-05-12  11778  	}
632b7c939b164d Vincent Guittot 2022-05-12  11779  
632b7c939b164d Vincent Guittot 2022-05-12  11780  	tg->latency_prio = latency_prio;
632b7c939b164d Vincent Guittot 2022-05-12  11781  
632b7c939b164d Vincent Guittot 2022-05-12  11782  	for_each_possible_cpu(i) {
632b7c939b164d Vincent Guittot 2022-05-12  11783  		struct sched_entity *se = tg->se[i];
632b7c939b164d Vincent Guittot 2022-05-12  11784  
632b7c939b164d Vincent Guittot 2022-05-12 @11785  		WRITE_ONCE(se->latency_weight, sched_latency_to_weight[latency_prio]);
632b7c939b164d Vincent Guittot 2022-05-12  11786  	}
632b7c939b164d Vincent Guittot 2022-05-12  11787  
632b7c939b164d Vincent Guittot 2022-05-12  11788  	mutex_unlock(&shares_mutex);
632b7c939b164d Vincent Guittot 2022-05-12  11789  	return 0;
632b7c939b164d Vincent Guittot 2022-05-12  11790  }
632b7c939b164d Vincent Guittot 2022-05-12  11791  

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 6/7] sched/fair: Add sched group latency support
@ 2022-05-16 13:06 ` Dan Carpenter
  0 siblings, 0 replies; 23+ messages in thread
From: Dan Carpenter @ 2022-05-16 13:06 UTC (permalink / raw)
  To: kbuild-all

Hi Vincent,

url:    https://github.com/intel-lab-lkp/linux/commits/Vincent-Guittot/Add-latency_nice-priority/20220513-005836
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 734387ec2f9d77b00276042b1fa7c95f48ee879d
config: x86_64-rhel-8.3-kselftests (https://download.01.org/0day-ci/archive/20220515/202205151601.2cHPPsvk-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.2.0-20) 11.2.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

smatch warnings:
kernel/sched/fair.c:11785 sched_group_set_latency() error: buffer overflow 'sched_latency_to_weight' 40 <= 40
kernel/sched/fair.c:11785 sched_group_set_latency() error: buffer overflow 'sched_latency_to_weight' 40 <= 40

vim +/sched_latency_to_weight +11785 kernel/sched/fair.c

632b7c939b164d Vincent Guittot 2022-05-12  11762  int sched_group_set_latency(struct task_group *tg, long latency_prio)
632b7c939b164d Vincent Guittot 2022-05-12  11763  {
632b7c939b164d Vincent Guittot 2022-05-12  11764  	int i;
632b7c939b164d Vincent Guittot 2022-05-12  11765  
632b7c939b164d Vincent Guittot 2022-05-12  11766  	if (tg == &root_task_group)
632b7c939b164d Vincent Guittot 2022-05-12  11767  		return -EINVAL;
632b7c939b164d Vincent Guittot 2022-05-12  11768  
632b7c939b164d Vincent Guittot 2022-05-12  11769  	if (latency_prio < 0 ||
632b7c939b164d Vincent Guittot 2022-05-12  11770  	    latency_prio > LATENCY_NICE_WIDTH)

This would probably be >= instead of >?

632b7c939b164d Vincent Guittot 2022-05-12  11771  		return -EINVAL;
632b7c939b164d Vincent Guittot 2022-05-12  11772  
632b7c939b164d Vincent Guittot 2022-05-12  11773  	mutex_lock(&shares_mutex);
632b7c939b164d Vincent Guittot 2022-05-12  11774  
632b7c939b164d Vincent Guittot 2022-05-12  11775  	if (tg->latency_prio == latency_prio) {
632b7c939b164d Vincent Guittot 2022-05-12  11776  		mutex_unlock(&shares_mutex);
632b7c939b164d Vincent Guittot 2022-05-12  11777  		return 0;
632b7c939b164d Vincent Guittot 2022-05-12  11778  	}
632b7c939b164d Vincent Guittot 2022-05-12  11779  
632b7c939b164d Vincent Guittot 2022-05-12  11780  	tg->latency_prio = latency_prio;
632b7c939b164d Vincent Guittot 2022-05-12  11781  
632b7c939b164d Vincent Guittot 2022-05-12  11782  	for_each_possible_cpu(i) {
632b7c939b164d Vincent Guittot 2022-05-12  11783  		struct sched_entity *se = tg->se[i];
632b7c939b164d Vincent Guittot 2022-05-12  11784  
632b7c939b164d Vincent Guittot 2022-05-12 @11785  		WRITE_ONCE(se->latency_weight, sched_latency_to_weight[latency_prio]);
                                                                                               ^^^^^^^^^^^^^^^^^^^^^^^^
40 elements and latency_prio is 0-40 so off by one.

632b7c939b164d Vincent Guittot 2022-05-12  11786  	}
632b7c939b164d Vincent Guittot 2022-05-12  11787  
632b7c939b164d Vincent Guittot 2022-05-12  11788  	mutex_unlock(&shares_mutex);
632b7c939b164d Vincent Guittot 2022-05-12  11789  	return 0;
632b7c939b164d Vincent Guittot 2022-05-12  11790  }
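
In other words, the suggested fix is the one-character change below (a sketch of
what the warning points at; in this series sched_latency_to_weight has
LATENCY_NICE_WIDTH = 40 entries, so the only valid indices are 0..39):

	/* reject latency_prio == LATENCY_NICE_WIDTH as well */
	if (latency_prio < 0 ||
	    latency_prio >= LATENCY_NICE_WIDTH)
		return -EINVAL;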

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 6/7] sched/fair: Add sched group latency support
  2022-05-16 13:06 ` Dan Carpenter
  (?)
@ 2022-05-16 13:33 ` Vincent Guittot
  -1 siblings, 0 replies; 23+ messages in thread
From: Vincent Guittot @ 2022-05-16 13:33 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 3507 bytes --]

On Mon, 16 May 2022 at 15:07, Dan Carpenter <dan.carpenter@oracle.com> wrote:
>
> Hi Vincent,
>
> url:    https://github.com/intel-lab-lkp/linux/commits/Vincent-Guittot/Add-latency_nice-priority/20220513-005836
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 734387ec2f9d77b00276042b1fa7c95f48ee879d
> config: x86_64-rhel-8.3-kselftests (https://download.01.org/0day-ci/archive/20220515/202205151601.2cHPPsvk-lkp@intel.com/config)
> compiler: gcc-11 (Debian 11.2.0-20) 11.2.0
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
>
> smatch warnings:
> kernel/sched/fair.c:11785 sched_group_set_latency() error: buffer overflow 'sched_latency_to_weight' 40 <= 40
> kernel/sched/fair.c:11785 sched_group_set_latency() error: buffer overflow 'sched_latency_to_weight' 40 <= 40
>
> vim +/sched_latency_to_weight +11785 kernel/sched/fair.c
>
> 632b7c939b164d Vincent Guittot 2022-05-12  11762  int sched_group_set_latency(struct task_group *tg, long latency_prio)
> 632b7c939b164d Vincent Guittot 2022-05-12  11763  {
> 632b7c939b164d Vincent Guittot 2022-05-12  11764        int i;
> 632b7c939b164d Vincent Guittot 2022-05-12  11765
> 632b7c939b164d Vincent Guittot 2022-05-12  11766        if (tg == &root_task_group)
> 632b7c939b164d Vincent Guittot 2022-05-12  11767                return -EINVAL;
> 632b7c939b164d Vincent Guittot 2022-05-12  11768
> 632b7c939b164d Vincent Guittot 2022-05-12  11769        if (latency_prio < 0 ||
> 632b7c939b164d Vincent Guittot 2022-05-12  11770            latency_prio > LATENCY_NICE_WIDTH)
>
> This would probably be >= instead of >?

yes

>
> 632b7c939b164d Vincent Guittot 2022-05-12  11771                return -EINVAL;
> 632b7c939b164d Vincent Guittot 2022-05-12  11772
> 632b7c939b164d Vincent Guittot 2022-05-12  11773        mutex_lock(&shares_mutex);
> 632b7c939b164d Vincent Guittot 2022-05-12  11774
> 632b7c939b164d Vincent Guittot 2022-05-12  11775        if (tg->latency_prio == latency_prio) {
> 632b7c939b164d Vincent Guittot 2022-05-12  11776                mutex_unlock(&shares_mutex);
> 632b7c939b164d Vincent Guittot 2022-05-12  11777                return 0;
> 632b7c939b164d Vincent Guittot 2022-05-12  11778        }
> 632b7c939b164d Vincent Guittot 2022-05-12  11779
> 632b7c939b164d Vincent Guittot 2022-05-12  11780        tg->latency_prio = latency_prio;
> 632b7c939b164d Vincent Guittot 2022-05-12  11781
> 632b7c939b164d Vincent Guittot 2022-05-12  11782        for_each_possible_cpu(i) {
> 632b7c939b164d Vincent Guittot 2022-05-12  11783                struct sched_entity *se = tg->se[i];
> 632b7c939b164d Vincent Guittot 2022-05-12  11784
> 632b7c939b164d Vincent Guittot 2022-05-12 @11785                WRITE_ONCE(se->latency_weight, sched_latency_to_weight[latency_prio]);
>                                                                                                ^^^^^^^^^^^^^^^^^^^^^^^^
> 40 elements and latency_prio is 0-40 so off by one.
>
> 632b7c939b164d Vincent Guittot 2022-05-12  11786        }
> 632b7c939b164d Vincent Guittot 2022-05-12  11787
> 632b7c939b164d Vincent Guittot 2022-05-12  11788        mutex_unlock(&shares_mutex);
> 632b7c939b164d Vincent Guittot 2022-05-12  11789        return 0;
> 632b7c939b164d Vincent Guittot 2022-05-12  11790  }
>
> --
> 0-DAY CI Kernel Test Service
> https://01.org/lkp
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 5/7] sched/fair: Take into account latency nice at wakeup
  2022-05-12 16:35 ` [PATCH v2 5/7] sched/fair: Take into account latency nice at wakeup Vincent Guittot
@ 2022-05-17  0:54   ` Josh Don
  2022-05-19 13:55     ` Vincent Guittot
  0 siblings, 1 reply; 23+ messages in thread
From: Josh Don @ 2022-05-17  0:54 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
	Steven Rostedt, Benjamin Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, linux-kernel,
	parth, Qais Yousef, Hyser,Chris, Valentin Schneider,
	patrick.bellasi, David.Laight, Paul Turner, pavel, Tejun Heo,
	Quentin Perret, Tim Chen

Hi Vincent,

On Thu, May 12, 2022 at 9:36 AM Vincent Guittot
<vincent.guittot@linaro.org> wrote:
>
> Take into account the nice latency priority of a thread when deciding to
> preempt the currently running thread. We don't want to provide more CPU
> bandwidth to a thread, but to reorder the scheduling to run latency-sensitive
> tasks first whenever possible.
>
> As long as a thread hasn't used its bandwidth, it will be able to preempt
> the current thread.
>
> Conversely, a thread with a low latency priority will preempt the current
> thread at wakeup only to keep fair CPU bandwidth sharing. Otherwise it will
> wait for the tick to get its sched slice.
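
For context, the mechanism under discussion boils down to biasing the
wakeup-preemption comparison with a latency-derived offset. A rough sketch of
the idea (not the literal diff; wakeup_latency_gran() stands in for the
series' helper):

	static int wakeup_preempt_entity(struct sched_entity *curr,
					 struct sched_entity *se)
	{
		s64 gran, vdiff = curr->vruntime - se->vruntime;

		/*
		 * Offset derived from the latency weights: positive when
		 * the waking se is more latency sensitive than curr, so
		 * preemption gets easier; negative in the opposite case.
		 */
		vdiff += wakeup_latency_gran(curr, se);

		if (vdiff <= 0)
			return -1;

		gran = wakeup_gran(se);
		if (vdiff > gran)
			return 1;

		return 0;
	}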

Following up from the discussion on the prior series, I'm still not
sure why this approach is exclusive of extending the entity placement
code; I think both changes together would be useful.

By only changing the wakeup preemption decision, you're only
guaranteeing that the latency-sensitive thing on cpu won't be
preempted until the next sched tick, which can occur at any time
offset from the wakeup, from 0ns to the length of one tick. If a
latency-tolerant task wakes up with a lot of sleeper credit, it would
pretty quickly preempt a latency-sensitive task on-cpu, even if it
doesn't initially do so due to the above changes to wakeup preemption.

Adjusting place_entity wouldn't impact cpu bandwidth in steady-state
competition between threads of different latency prio, it would only
impact slightly at wakeup, in a similar but more consistent manner to
the changes above to wakeup_preempt_entity (ie. a task that is not
latency sensitive might have to wait a few ticks to preempt a latency
sensitive task, even if it was recently sleeping for a while).

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 5/7] sched/fair: Take into account latency nice at wakeup
  2022-05-17  0:54   ` Josh Don
@ 2022-05-19 13:55     ` Vincent Guittot
  2022-05-19 20:06       ` Josh Don
  0 siblings, 1 reply; 23+ messages in thread
From: Vincent Guittot @ 2022-05-19 13:55 UTC (permalink / raw)
  To: Josh Don
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
	Steven Rostedt, Benjamin Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, linux-kernel,
	parth, Qais Yousef, Hyser,Chris, Valentin Schneider,
	patrick.bellasi, David.Laight, Paul Turner, pavel, Tejun Heo,
	Quentin Perret, Tim Chen

On Tue, 17 May 2022 at 02:54, Josh Don <joshdon@google.com> wrote:
>
> Hi Vincent,
>
> On Thu, May 12, 2022 at 9:36 AM Vincent Guittot
> <vincent.guittot@linaro.org> wrote:
> >
> > Take into account the nice latency priority of a thread when deciding to
> > preempt the currently running thread. We don't want to provide more CPU
> > bandwidth to a thread, but to reorder the scheduling to run latency-sensitive
> > tasks first whenever possible.
> >
> > As long as a thread hasn't used its bandwidth, it will be able to preempt
> > the current thread.
> >
> > Conversely, a thread with a low latency priority will preempt the current
> > thread at wakeup only to keep fair CPU bandwidth sharing. Otherwise it will
> > wait for the tick to get its sched slice.
>
> Following up from the discussion on the prior series, I'm still not
> sure why this approach is exclusive of extending the entity placement
> code; I think both changes together would be useful.
>
> By only changing the wakeup preemption decision, you're only
> guaranteeing that the latency-sensitive thing on cpu won't be
> preempted until the next sched tick, which can occur at any time
> offset from the wakeup, from 0ns to the length of one tick. If a

In fact, you are ensured a minimum run time of 3ms on a >= 8 core
system before the tick can preempt you. I considered updating this
part as well to increase the value for a negative weight, but didn't
do it for now. I can have a look.
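
For reference, the 3ms falls out of the default tunables; a sketch of the
arithmetic, assuming defaults:

	/*
	 * sysctl_sched_min_granularity defaults to 0.75ms and is scaled
	 * by 1 + ilog2(ncpus) with ncpus capped at 8, so on a >= 8 core
	 * system: 0.75ms * (1 + ilog2(8)) = 0.75ms * 4 = 3ms.
	 * check_preempt_tick() then refuses to preempt the running task
	 * before it has run at least that long:
	 */
	if (delta_exec < sysctl_sched_min_granularity)
		return;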

> latency-tolerant task wakes up with a lot of sleeper credit, it would
> pretty quickly preempt a latency-sensitive task on-cpu, even if it
> doesn't initially do so due to the above changes to wakeup preemption.
>
> Adjusting place_entity wouldn't impact cpu bandwidth in steady-state
> competition between threads of different latency prio, it would only
> impact slightly at wakeup, in a similar but more consistent manner to
> the changes above to wakeup_preempt_entity (ie. a task that is not
> latency sensitive might have to wait a few ticks to preempt a latency
> sensitive task, even if it was recently sleeping for a while).

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/7] Add latency_nice priority
  2022-05-13 21:44 ` Tim Chen
@ 2022-05-19 14:16   ` Vincent Guittot
  2022-05-19 18:14     ` Chris Hyser
  0 siblings, 1 reply; 23+ messages in thread
From: Vincent Guittot @ 2022-05-19 14:16 UTC (permalink / raw)
  To: Tim Chen
  Cc: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, bristot, vschneid, linux-kernel, parth, qais.yousef,
	chris.hyser, valentin.schneider, patrick.bellasi, David.Laight,
	pjt, pavel, tj, qperret, joshdon, len.brown

On Fri, 13 May 2022 at 23:44, Tim Chen <tim.c.chen@linux.intel.com> wrote:
>
> On Thu, 2022-05-12 at 18:35 +0200, Vincent Guittot wrote:
> > This patchset restarts the work about adding a latency nice priority to
> > describe the latency tolerance of cfs tasks.
> >
> > The patches [1-4] have been done by Parth:
> > https://lore.kernel.org/lkml/20200228090755.22829-1-parth@linux.ibm.com/
> >
> > I have just rebased and moved the set of latency priority outside the
> > priority update. I have removed the reviewed tag because the patches
> > are 2 years old.
> >
>
> Vincent,
>
> Thanks for introducing the feature again, which is much needed.  I am trying
> to look at the problem again from a usage point of view, and wonder whether
> there are ways to make the latency_nice knob easier to use.
>
> The latency nice value here is relative.  A latency-sensitive task
> may not be able to tell whether setting latency_nice to -5 or to -10 is good enough.
> It depends on what other tasks are setting their latency_nice value to.
> What a task does know is what it is doing and its characteristics.
> For instance, for client tasks, we may have categories such as
>
> Task Category                                   latency_nice_range
> -------------                                   ------------------
> urgent                                          -19 to -16
> media playback                                  -15 to -11
> interactive (e.g. pressing a key)               -10 to -6
> normal                                          -5  to  9
> background                                       10  to 15
> opportunistic soaker task (sched_idle class)     16 to  19
>
> And we could allow a task to set an attribute indicating which task category applies
> to it, and the OS can set a default latency_nice value for its task category.
> So a task can just declare what kind of task it is, and not worry about
> actually setting a latency_nice value for which it may not know
> what is appropriate.
> If needed, a task could still adjust its latency_nice value within the range to
> differentiate itself within a task category. And we will prevent
> a task from setting an inappropriate latency_nice value outside the right range.

The description above makes sense, but I'm not sure this should be put
as part of the interface; it belongs more in the documentation, to describe
how a system can make use of latency_nice.
>
> Knowing a task's characteristics will also be helpful with other
> scheduling decisions, like placing a task on a higher-performing
> core in hetero systems.

OK, so you would like a more general interface than a latency
interface: a way to set some attributes on a task so we can make
smarter decisions.
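
For concreteness, with this series the knob is set per task through
sched_setattr(); a minimal user-space sketch (the struct layout and flag value
mirror the uapi changes in patches 1-4, so treat them as assumptions rather
than a settled ABI):

	#define _GNU_SOURCE
	#include <stdint.h>
	#include <string.h>
	#include <sys/types.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	struct sched_attr {			/* extended as in this series */
		uint32_t size;
		uint32_t sched_policy;
		uint64_t sched_flags;
		int32_t  sched_nice;
		uint32_t sched_priority;
		uint64_t sched_runtime;
		uint64_t sched_deadline;
		uint64_t sched_period;
		uint32_t sched_util_min;
		uint32_t sched_util_max;
		int32_t  sched_latency_nice;	/* new field, [-20, 19] */
	};

	#define SCHED_FLAG_KEEP_ALL		0x18	/* keep policy and params */
	#define SCHED_FLAG_LATENCY_NICE		0x80	/* assumed value */

	static int set_latency_nice(pid_t pid, int latency_nice)
	{
		struct sched_attr attr;

		memset(&attr, 0, sizeof(attr));
		attr.size = sizeof(attr);
		attr.sched_flags = SCHED_FLAG_KEEP_ALL | SCHED_FLAG_LATENCY_NICE;
		attr.sched_latency_nice = latency_nice;	/* e.g. -15 for media */

		return syscall(SYS_sched_setattr, pid, &attr, 0);
	}

A launcher or library could then map a category table like the one above to
calls such as set_latency_nice(pid, -15).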

>
> I think the missing piece here is a way for a task to declare
> what kind of task it is.  I think that will make things easier.
>
> Tim
>
> > The patches [5-7] use latency nice priority to decide if a cfs task can
> > preempt the current running task. Patch 5 gives some tests results with
> > cyclictests and hackbench to highlight the benefit of latency nice
> > priority for short interactive task or long intensive tasks.
> >
> >
> > Change since v1:
> > - fix typo
> > - move some codes in the right patch to make bisect happy
> > - simplify and fixed how the weight is computed
> > - added support of sched core patch 7
> >
> > Parth Shah (4):
> >   sched: Introduce latency-nice as a per-task attribute
> >   sched/core: Propagate parent task's latency requirements to the child
> >     task
> >   sched: Allow sched_{get,set}attr to change latency_nice of the task
> >   sched/core: Add permission checks for setting the latency_nice value
> >
> > Vincent Guittot (3):
> >   sched/fair: Take into account latency nice at wakeup
> >   sched/fair: Add sched group latency support
> >   sched/core: support latency nice with sched core
> >
> >  include/linux/sched.h            |   3 +
> >  include/uapi/linux/sched.h       |   4 +-
> >  include/uapi/linux/sched/types.h |  19 ++++++
> >  init/init_task.c                 |   1 +
> >  kernel/sched/core.c              |  90 ++++++++++++++++++++++++++
> >  kernel/sched/debug.c             |   1 +
> >  kernel/sched/fair.c              | 105 ++++++++++++++++++++++++++++++-
> >  kernel/sched/sched.h             |  34 ++++++++++
> >  tools/include/uapi/linux/sched.h |   4 +-
> >  9 files changed, 257 insertions(+), 4 deletions(-)
> >
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/7] Add latency_nice priority
  2022-05-19 14:16   ` Vincent Guittot
@ 2022-05-19 18:14     ` Chris Hyser
  2022-05-20 18:46       ` Tim Chen
  0 siblings, 1 reply; 23+ messages in thread
From: Chris Hyser @ 2022-05-19 18:14 UTC (permalink / raw)
  To: Vincent Guittot, Tim Chen
  Cc: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, bristot, vschneid, linux-kernel, parth, qais.yousef,
	valentin.schneider, patrick.bellasi, David.Laight, pjt, pavel,
	tj, qperret, joshdon, len.brown

On 5/19/22 10:16 AM, Vincent Guittot wrote:
> On Fri, 13 May 2022 at 23:44, Tim Chen <tim.c.chen@linux.intel.com> wrote:
>>
>> On Thu, 2022-05-12 at 18:35 +0200, Vincent Guittot wrote:
>>> This patchset restarts the work about adding a latency nice priority to
>>> describe the latency tolerance of cfs tasks.
>>>
>>> The patches [1-4] have been done by Parth:
>>> https://lore.kernel.org/lkml/20200228090755.22829-1-parth@linux.ibm.com/
>>>
>>> I have just rebased and moved the set of latency priority outside the
>>> priority update. I have removed the reviewed tag because the patches
>>> are 2 years old.
>>>
>>
>> Vincent,
>>
>> Thanks for introducing the feature again, which is much needed.  I am trying
>> to look at the problem again from a usage point of view, and wonder whether
>> there are ways to make the latency_nice knob easier to use.
>>
>> The latency nice value here is relative.  A latency-sensitive task
>> may not be able to tell whether setting latency_nice to -5 or to -10 is good enough.
>> It depends on what other tasks are setting their latency_nice value to.
>> What a task does know is what it is doing and its characteristics.
>> For instance, for client tasks, we may have categories such as
>>
>> Task Category                                   latency_nice_range
>> -------------                                   ------------------
>> urgent                                          -19 to -16
>> media playback                                  -15 to -11
>> interactive (e.g. pressing a key)               -10 to -6
>> normal                                          -5  to  9
>> background                                       10  to 15
>> opportunistic soaker task (sched_idle class)     16 to  19
>>
>> And we could allow a task to set an attribute indicating which task category applies
>> to it, and the OS can set a default latency_nice value for its task category.
>> So a task can just declare what kind of task it is, and not worry about
>> actually setting a latency_nice value for which it may not know
>> what is appropriate.
>> If needed, a task could still adjust its latency_nice value within the range to
>> differentiate itself within a task category. And we will prevent
>> a task from setting an inappropriate latency_nice value outside the right range.
> 
> The description above makes sense, but I'm not sure this should be put
> as part of the interface; it belongs more in the documentation, to describe
> how a system can make use of latency_nice.
>>
>> Knowing a task's characteristics will also be helpful with other
>> scheduling decisions, like placing a task on a higher-performing
>> core in hetero systems.
> 
> OK, so you would like a more general interface than a latency
> interface: a way to set some attributes on a task so we can make
> smarter decisions.

The original definition of latency nice was as a task attribute describing the latency sensitivity of the task. The fact 
that it was mapped to 'nice' values created too much granularity and made it look more like a tuning knob than a 
statement about the characteristics of the task as intended.

> 
>>
>> I think the missing piece here is a way for a task to declare
>> what kind of task it is.  I think that will make things easier.

A classification of tasks into categories would be useful, but perhaps one level up, in a user-space tool or a user's 
head (i.e. docs). For any of the categories you describe, there may be a number of per-task attributes beyond latency 
sensitivity that are needed to capture the task characteristics you mention, and that would ideally be set in specific 
ways. Say 'nice' values, oom kill, etc. And others may make sense in the future, like say NUMA sensitivity, etc.

Basically, a category can map to a bunch of desired default values for various scheduler visible task attributes.

Now you could also take the idea in the other direction, where you set a "category value" for a task and have the kernel 
pick the other attribute defaults like 'nice' that would typically apply to tasks in the category, but I think letting 
user space figure stuff out and then set low-level kernel task attribute primitives is cleaner.
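
To make that concrete, a purely hypothetical user-space mapping along those
lines (names and values are illustrative only, nothing from the series):

	/*
	 * User-space policy: map a coarse task category to the low-level
	 * per-task knobs the kernel actually exposes.
	 */
	struct task_category {
		const char *name;
		int latency_nice;	/* from this series */
		int nice;		/* existing knob */
	};

	static const struct task_category categories[] = {
		{ "urgent",       -19, -10 },
		{ "media",        -15,  -5 },
		{ "interactive",  -10,   0 },
		{ "normal",         0,   0 },
		{ "background",    10,  10 },
		{ "soaker",        19,  19 },
	};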

-chrish

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 5/7] sched/fair: Take into account latency nice at wakeup
  2022-05-19 13:55     ` Vincent Guittot
@ 2022-05-19 20:06       ` Josh Don
  2022-05-23 12:58         ` Vincent Guittot
  0 siblings, 1 reply; 23+ messages in thread
From: Josh Don @ 2022-05-19 20:06 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
	Steven Rostedt, Benjamin Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, linux-kernel,
	parth, Qais Yousef, Hyser,Chris, Valentin Schneider,
	patrick.bellasi, David.Laight, Paul Turner, pavel, Tejun Heo,
	Quentin Perret, Tim Chen

On Thu, May 19, 2022 at 6:57 AM Vincent Guittot
<vincent.guittot@linaro.org> wrote:
>
> On Tue, 17 May 2022 at 02:54, Josh Don <joshdon@google.com> wrote:
> >
> > Hi Vincent,
> >
> > On Thu, May 12, 2022 at 9:36 AM Vincent Guittot
> > <vincent.guittot@linaro.org> wrote:
> > >
> > > Take into account the nice latency priority of a thread when deciding to
> > > preempt the currently running thread. We don't want to provide more CPU
> > > bandwidth to a thread, but to reorder the scheduling to run latency-sensitive
> > > tasks first whenever possible.
> > >
> > > As long as a thread hasn't used its bandwidth, it will be able to preempt
> > > the current thread.
> > >
> > > Conversely, a thread with a low latency priority will preempt the current
> > > thread at wakeup only to keep fair CPU bandwidth sharing. Otherwise it will
> > > wait for the tick to get its sched slice.
> >
> > Following up from the discussion on the prior series, I'm still not
> > sure why this approach is exclusive of extending the entity placement
> > code; I think both changes together would be useful.
> >
> > By only changing the wakeup preemption decision, you're only
> > guaranteeing that the latency-sensitive thing on cpu won't be
> > preempted until the next sched tick, which can occur at any time
> > offset from the wakeup, from 0ns to the length of one tick. If a
>
> In fact, you are ensured a minimum run time of 3ms on a >= 8 core
> system before the tick can preempt you. I considered updating this
> part as well to increase the value for a negative weight, but didn't
> do it for now. I can have a look.

If the latency-sensitive thing on cpu has already been running for close
to or longer than that min granularity, that doesn't apply; it can still
get preempted pretty quickly from the tick by the newly woken task.

A possible change would be to reduce the sleeper credit when the
waking entity has a lower latency priority than the current entity (i.e.
similar to the se_is_idle() check already being made in
place_entity()).
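
Sketched against the v5.18-era place_entity(), with a hypothetical
latency_nice_of() helper for the comparison:

	if (!initial) {
		unsigned long thresh = sysctl_sched_latency;

		if (se_is_idle(se))
			thresh = sysctl_sched_min_granularity;
		/*
		 * Hypothetical: trim the sleeper credit of a waker that
		 * is less latency sensitive than the current entity.
		 */
		else if (cfs_rq->curr &&
			 latency_nice_of(se) > latency_nice_of(cfs_rq->curr))
			thresh >>= 1;

		if (sched_feat(GENTLE_FAIR_SLEEPERS))
			thresh >>= 1;

		vruntime -= thresh;
	}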

> > latency-tolerant task wakes up with a lot of sleeper credit, it would
> > pretty quickly preempt a latency-sensitive task on-cpu, even if it
> > doesn't initially do so due to the above changes to wakeup preemption.
> >
> > Adjusting place_entity wouldn't impact cpu bandwidth in steady-state
> > competition between threads of different latency prio, it would only
> > impact slightly at wakeup, in a similar but more consistent manner to
> > the changes above to wakeup_preempt_entity (ie. a task that is not
> > latency sensitive might have to wait a few ticks to preempt a latency
> > sensitive task, even if it was recently sleeping for a while).

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/7] Add latency_nice priority
  2022-05-19 18:14     ` Chris Hyser
@ 2022-05-20 18:46       ` Tim Chen
  2022-05-23 12:52         ` Vincent Guittot
  0 siblings, 1 reply; 23+ messages in thread
From: Tim Chen @ 2022-05-20 18:46 UTC (permalink / raw)
  To: Chris Hyser, Vincent Guittot
  Cc: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, bristot, vschneid, linux-kernel, parth, qais.yousef,
	valentin.schneider, patrick.bellasi, David.Laight, pjt, pavel,
	tj, qperret, joshdon, len.brown

On Thu, 2022-05-19 at 14:14 -0400, Chris Hyser wrote:
> On 5/19/22 10:16 AM, Vincent Guittot wrote:
> > On Fri, 13 May 2022 at 23:44, Tim Chen <tim.c.chen@linux.intel.com> wrote:
> > > On Thu, 2022-05-12 at 18:35 +0200, Vincent Guittot wrote:
> > > > This patchset restarts the work about adding a latency nice priority to
> > > > describe the latency tolerance of cfs tasks.
> > > > 
> > > > The patches [1-4] have been done by Parth:
> > > > https://lore.kernel.org/lkml/20200228090755.22829-1-parth@linux.ibm.com/
> > > > 
> > > > I have just rebased and moved the set of latency priority outside the
> > > > priority update. I have removed the reviewed tag because the patches
> > > > are 2 years old.
> > > > 
> > > 
> > > Vincent,
> > > 
> > > Thanks for introducing the feature again, which is much needed.  I am trying
> > > to look at the problem again from a usage point of view, and wonder whether
> > > there are ways to make the latency_nice knob easier to use.
> > >
> > > The latency nice value here is relative.  A latency-sensitive task
> > > may not be able to tell whether setting latency_nice to -5 or to -10 is good enough.
> > > It depends on what other tasks are setting their latency_nice value to.
> > > What a task does know is what it is doing and its characteristics.
> > > For instance, for client tasks, we may have categories such as
> > > 
> > > Task Category                                   latency_nice_range
> > > -------------                                   ------------------
> > > urgent                                          -19 to -16
> > > media playback                                  -15 to -11
> > > interactive (e.g. pressing a key)               -10 to -6
> > > normal                                          -5  to  9
> > > background                                       10  to 15
> > > opportunistic soaker task (sched_idle class)     16 to  19
> > > 
> > > And we could allow a task to set an attribute indicating which task category applies
> > > to it, and the OS can set a default latency_nice value for its task category.
> > > So a task can just declare what kind of task it is, and not worry about
> > > actually setting a latency_nice value for which it may not know
> > > what is appropriate.
> > > If needed, a task could still adjust its latency_nice value within the range to
> > > differentiate itself within a task category. And we will prevent
> > > a task from setting an inappropriate latency_nice value outside the right range.
> > 
> > The description above makes sense, but I'm not sure this should be put
> > as part of the interface; it belongs more in the documentation, to describe
> > how a system can make use of latency_nice.
> > > Knowing a task's characteristics will also be helpful with other
> > > scheduling decisions, like placing a task on a higher-performing
> > > core in hetero systems.
> > 
> > OK, so you would like a more general interface than a latency
> > interface: a way to set some attributes on a task so we can make
> > smarter decisions.
> 
> The original definition of latency nice was as a task attribute describing the latency sensitivity of the task. The fact 
> that it was mapped to 'nice' values created too much granularity and made it look more like a tuning knob than a 
> statement about the characteristics of the task as intended.
> 
> > > I think the missing piece here is a way for a task to declare
> > > what kind of task it is.  I think that will make things easier.
> 
> A classification of tasks into categories would be useful, but perhaps one level up, in a user-space tool or a user's 
> head (i.e. docs). For any of the categories you describe, there may be a number of per-task attributes beyond latency 
> sensitivity that are needed to capture the task characteristics you mention, and that would ideally be set in specific 
> ways. Say 'nice' values, oom kill, etc. And others may make sense in the future, like say NUMA sensitivity, etc.
> 
> Basically, a category can map to a bunch of desired default values for various scheduler visible task attributes.

Yes.  I think having a default value for each category will make the setting of the attributes
consistent and the life of a task simpler.

Without guidance, one media playback task can set its latency_nice to -5,
and another interactive task doing computation and displaying results to the user
can set its latency_nice to -6. We would then have the interactive task running ahead of the media playback,
which is undesired.  Just letting each task set its own latency_nice value will not achieve the
desired effect.  We need guidance on what attribute value a certain task category
should use (as in the documentation that Vincent mentioned).

Or let the OS set the attribute to some sensible value if it knows the task category.

> 
> Now you could also take the idea in the other direction, where you set a "category value" for a task and have the kernel 
> pick the other attribute defaults like 'nice' that would typically apply to tasks in the category, but I think letting 
> user space figure stuff out and then set low-level kernel task attribute primitives is cleaner.

It is cleaner for the OS to give the knob to the user and say, "set it".
But users need guidance on what knob value they should use,
or even better, they shouldn't need to worry about setting the knob at all.
I think it is easy for the user to set the task category correctly, but
harder to set the low-level knobs.

I can see that it is easy to set the latency_nice knob incorrectly.
The absolute value of the knob doesn't matter. Only the relative values
between tasks do. So we need all the tasks to agree on what latency_nice
value to use for a task category.

Tim


> 
> -chrish


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/7] Add latency_nice priority
  2022-05-20 18:46       ` Tim Chen
@ 2022-05-23 12:52         ` Vincent Guittot
  2022-05-23 13:19           ` David Laight
  0 siblings, 1 reply; 23+ messages in thread
From: Vincent Guittot @ 2022-05-23 12:52 UTC (permalink / raw)
  To: Tim Chen
  Cc: Chris Hyser, mingo, peterz, juri.lelli, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, vschneid, linux-kernel,
	parth, qais.yousef, valentin.schneider, patrick.bellasi,
	David.Laight, pjt, pavel, tj, qperret, joshdon, len.brown

On Fri, 20 May 2022 at 20:46, Tim Chen <tim.c.chen@linux.intel.com> wrote:
>
> On Thu, 2022-05-19 at 14:14 -0400, Chris Hyser wrote:
> > On 5/19/22 10:16 AM, Vincent Guittot wrote:
> > > On Fri, 13 May 2022 at 23:44, Tim Chen <tim.c.chen@linux.intel.com> wrote:
> > > > On Thu, 2022-05-12 at 18:35 +0200, Vincent Guittot wrote:
> > > > > This patchset restarts the work about adding a latency nice priority to
> > > > > describe the latency tolerance of cfs tasks.
> > > > >
> > > > > The patches [1-4] have been done by Parth:
> > > > > https://lore.kernel.org/lkml/20200228090755.22829-1-parth@linux.ibm.com/
> > > > >
> > > > > I have just rebased and moved the set of latency priority outside the
> > > > > priority update. I have removed the reviewed tag because the patches
> > > > > are 2 years old.
> > > > >
> > > >
> > > > Vincent,
> > > >
> > > > Thanks for introducing the feature again, which is much needed.  I am trying
> > > > to look at the problem again from a usage point of view, and wonder whether
> > > > there are ways to make the latency_nice knob easier to use.
> > > >
> > > > The latency nice value here is relative.  A latency-sensitive task
> > > > may not be able to tell whether setting latency_nice to -5 or to -10 is good enough.
> > > > It depends on what other tasks are setting their latency_nice value to.
> > > > What a task does know is what it is doing and its characteristics.
> > > > For instance, for client tasks, we may have categories such as
> > > >
> > > > Task Category                                   latency_nice_range
> > > > -------------                                   ------------------
> > > > urgent                                          -19 to -16
> > > > media playback                                  -15 to -11
> > > > interactive (e.g. pressing a key)               -10 to -6
> > > > normal                                          -5  to  9
> > > > background                                       10  to 15
> > > > opportunistic soaker task (sched_idle class)     16 to  19
> > > >
> > > > And we could allow a task to set an attribute indicating which task category applies
> > > > to it, and the OS can set a default latency_nice value for its task category.
> > > > So a task can just declare what kind of task it is, and not worry about
> > > > actually setting a latency_nice value for which it may not know
> > > > what is appropriate.
> > > > If needed, a task could still adjust its latency_nice value within the range to
> > > > differentiate itself within a task category. And we will prevent
> > > > a task from setting an inappropriate latency_nice value outside the right range.
> > >
> > > The description above makes sense, but I'm not sure this should be put
> > > as part of the interface; it belongs more in the documentation, to describe
> > > how a system can make use of latency_nice.
> > > > Knowing a task's characteristics will also be helpful with other
> > > > scheduling decisions, like placing a task on a higher-performing
> > > > core in hetero systems.
> > >
> > > OK, so you would like a more general interface than a latency
> > > interface: a way to set some attributes on a task so we can make
> > > smarter decisions.
> >
> > The original definition of latency nice was as a task attribute describing the latency sensitivity of the task. The fact
> > that it was mapped to 'nice' values created too much granularity and made it look more like a tuning knob than a
> > statement about the characteristics of the task as intended.
> >
> > > > I think the missing piece here is a way for a task to declare
> > > > what kind of task it is.  I think that will make things easier.
> >
> > A classification of tasks into categories would be useful, but perhaps one level up, in a user-space tool or a user's
> > head (i.e. docs). For any of the categories you describe, there may be a number of per-task attributes beyond latency
> > sensitivity that are needed to capture the task characteristics you mention, and that would ideally be set in specific
> > ways. Say 'nice' values, oom kill, etc. And others may make sense in the future, like say NUMA sensitivity, etc.
> >
> > Basically, a category can map to a bunch of desired default values for various scheduler visible task attributes.
>
> Yes.  I think having a default value for each category will make the setting of the attributes
> consistent and the life of a task simpler.
>
> Without guidance, one media playback task can set its latency_nice to -5,
> and another interactive task doing computation and displaying results to the user
> can set its latency_nice to -6. We would then have the interactive task running ahead of the media playback,
> which is undesired.  Just letting each task set its own latency_nice value will not achieve the
> desired effect.  We need guidance on what attribute value a certain task category
> should use (as in the documentation that Vincent mentioned).
>
> Or let the OS set the attribute to some sensible value if it knows the task category.

I'm in favor of documentation that provides guidance rather than the
OS setting the attribute, because the values are relative to each other,
so we will always find some exceptions to a default configuration for,
let's say, media vs user interaction. With documentation, someone can
always choose not to follow it if they think it doesn't apply to their
system, for whatever reason. With the OS setting attributes, such an
exception would have to abuse its category to get the right priority vs
the others.

>
> >
> > Now you could also take the idea in the other direction, where you set a "category value" for a task and have the kernel
> > pick the other attribute defaults like 'nice' that would typically apply to tasks in the category, but I think letting
> > user space figure stuff out and then set low-level kernel task attribute primitives is cleaner.
>
> It is cleaner for the OS to give the knob to the user and say, "set it".
> But users need guidance on what knob value they should use,
> or even better, they shouldn't need to worry about setting the knob at all.
> I think it is easy for the user to set the task category correctly, but
> harder to set the low-level knobs.
>
> I can see that it is easy to set the latency_nice knob incorrectly.
> The absolute value of the knob doesn't matter. Only the relative values
> between tasks do. So we need all the tasks to agree on what latency_nice
> value to use for a task category.
>
> Tim
>
>
> >
> > -chrish
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 5/7] sched/fair: Take into account latency nice at wakeup
  2022-05-19 20:06       ` Josh Don
@ 2022-05-23 12:58         ` Vincent Guittot
  0 siblings, 0 replies; 23+ messages in thread
From: Vincent Guittot @ 2022-05-23 12:58 UTC (permalink / raw)
  To: Josh Don
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
	Steven Rostedt, Benjamin Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, linux-kernel,
	parth, Qais Yousef, Hyser,Chris, Valentin Schneider,
	patrick.bellasi, David.Laight, Paul Turner, pavel, Tejun Heo,
	Quentin Perret, Tim Chen

On Thu, 19 May 2022 at 22:07, Josh Don <joshdon@google.com> wrote:
>
> On Thu, May 19, 2022 at 6:57 AM Vincent Guittot
> <vincent.guittot@linaro.org> wrote:
> >
> > On Tue, 17 May 2022 at 02:54, Josh Don <joshdon@google.com> wrote:
> > >
> > > Hi Vincent,
> > >
> > > On Thu, May 12, 2022 at 9:36 AM Vincent Guittot
> > > <vincent.guittot@linaro.org> wrote:
> > > >
> > > > Take into account the nice latency priority of a thread when deciding to
> > > > preempt the currently running thread. We don't want to provide more CPU
> > > > bandwidth to a thread, but to reorder the scheduling to run latency-sensitive
> > > > tasks first whenever possible.
> > > >
> > > > As long as a thread hasn't used its bandwidth, it will be able to preempt
> > > > the current thread.
> > > >
> > > > Conversely, a thread with a low latency priority will preempt the current
> > > > thread at wakeup only to keep fair CPU bandwidth sharing. Otherwise it will
> > > > wait for the tick to get its sched slice.
> > >
> > > Following up from the discussion on the prior series, I'm still not
> > > sure why this approach is exclusive of extending the entity placement
> > > code; I think both changes together would be useful.
> > >
> > > By only changing the wakeup preemption decision, you're only
> > > guaranteeing that the latency-sensitive thing on cpu won't be
> > > preempted until the next sched tick, which can occur at any time
> > > offset from the wakeup, from 0ns to the length of one tick. If a
> >
> > In fact, you are ensured a minimum run time of 3ms on a >= 8 core
> > system before the tick can preempt you. I considered updating this
> > part as well to increase the value for a negative weight, but didn't
> > do it for now. I can have a look.
>
> If the latency-sensitive thing on cpu has already been running for close
> to or longer than that min granularity, that doesn't apply; it can still
> get preempted pretty quickly from the tick by the newly woken task.

The 3ms is ensured from the task's wakeup, whatever its previous run
time and whatever wakes up in the meantime without preempting it at
wakeup. As mentioned above, it is probably worth having this value scale
with latency, especially for low-latency tasks.
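
One possible shape for that scaling (purely a sketch of the idea, nothing from
the series; latency_tick_gran() is a hypothetical helper returning a positive
offset for latency-sensitive tasks and a negative one for tolerant tasks):

	/*
	 * In check_preempt_tick(): stretch the no-preempt window for a
	 * latency-sensitive current task, shrink it for a tolerant one.
	 */
	if (delta_exec < sysctl_sched_min_granularity + latency_tick_gran(curr))
		return;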

>
> A possible change would be to reduce the sleeper credit when the
> waking entity has a lower latency priority than the current entity (i.e.
> similar to the se_is_idle() check already being made in
> place_entity()).
>
> > > latency-tolerant task wakes up with a lot of sleeper credit, it would
> > > pretty quickly preempt a latency-sensitive task on-cpu, even if it
> > > doesn't initially do so due to the above changes to wakeup preemption.
> > >
> > > Adjusting place_entity wouldn't impact cpu bandwidth in steady-state
> > > competition between threads of different latency prio, it would only
> > > impact slightly at wakeup, in a similar but more consistent manner to
> > > the changes above to wakeup_preempt_entity (ie. a task that is not
> > > latency sensitive might have to wait a few ticks to preempt a latency
> > > sensitive task, even if it was recently sleeping for a while).

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH v2 0/7] Add latency_nice priority
  2022-05-23 12:52         ` Vincent Guittot
@ 2022-05-23 13:19           ` David Laight
  0 siblings, 0 replies; 23+ messages in thread
From: David Laight @ 2022-05-23 13:19 UTC (permalink / raw)
  To: 'Vincent Guittot', Tim Chen
  Cc: Chris Hyser, mingo, peterz, juri.lelli, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, vschneid, linux-kernel,
	parth, qais.yousef, valentin.schneider, patrick.bellasi, pjt,
	pavel, tj, qperret, joshdon, len.brown

> > > > > media playback                                  -15 to -11

Isn't that what the RT scheduler is for....

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2022-05-23 13:19 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-12 16:35 [PATCH v2 0/7] Add latency_nice priority Vincent Guittot
2022-05-12 16:35 ` [PATCH v2 1/7] sched: Introduce latency-nice as a per-task attribute Vincent Guittot
2022-05-12 16:35 ` [PATCH v2 2/7] sched/core: Propagate parent task's latency requirements to the child task Vincent Guittot
2022-05-12 16:35 ` [PATCH v2 3/7] sched: Allow sched_{get,set}attr to change latency_nice of the task Vincent Guittot
2022-05-12 16:35 ` [PATCH v2 4/7] sched/core: Add permission checks for setting the latency_nice value Vincent Guittot
2022-05-12 16:35 ` [PATCH v2 5/7] sched/fair: Take into account latency nice at wakeup Vincent Guittot
2022-05-17  0:54   ` Josh Don
2022-05-19 13:55     ` Vincent Guittot
2022-05-19 20:06       ` Josh Don
2022-05-23 12:58         ` Vincent Guittot
2022-05-12 16:35 ` [PATCH v2 6/7] sched/fair: Add sched group latency support Vincent Guittot
2022-05-12 16:35 ` [PATCH v2 7/7] sched/core: support latency nice with sched core Vincent Guittot
2022-05-13  7:12 ` [PATCH v2 0/7] Add latency_nice priority David Laight
2022-05-13 14:29   ` Vincent Guittot
2022-05-13 21:44 ` Tim Chen
2022-05-19 14:16   ` Vincent Guittot
2022-05-19 18:14     ` Chris Hyser
2022-05-20 18:46       ` Tim Chen
2022-05-23 12:52         ` Vincent Guittot
2022-05-23 13:19           ` David Laight
2022-05-15  8:28 [PATCH v2 6/7] sched/fair: Add sched group latency support kernel test robot
2022-05-16 13:06 ` Dan Carpenter
2022-05-16 13:33 ` Vincent Guittot
