All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC][PATCH 0/4] sched: Optimizations to sched_class processing
@ 2019-12-19 21:44 Steven Rostedt
  2019-12-19 21:44 ` [RFC][PATCH 1/4] sched: Force the address order of each sched class descriptor Steven Rostedt
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Steven Rostedt @ 2019-12-19 21:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kirill Tkhai, Kirill Tkhai, Peter Zijlstra, mingo, juri.lelli,
	vincent.guittot, dietmar.eggemann, bsegall, mgorman, Ingo Molnar,
	Andrew Morton


As Kirill made a micro-optimization to the processing of pick_next_task()
that required the address locations of the sched_class descriptors to
be that of their priority to one another. This required a linker
script modification to guarantee that order.

After adding the forced order in the linker script, I realized that
we no longer needed the 'next' field in the sched_class descriptor.
Thus, I changed it to use the order of the linker script.

Then decided that the sched_class_highest define could be moved
to the linker script as well to keep the defines of the order to
be in one location, and it be obvious what the highest sched_class is
when SMP is not configured. (BTW, I have not tested this with
!CONFIG_SMP yet, but who does ;-). As the removal of next was a bit
more invasive than the highest sched class change, I moved that to
be the second patch.

Finally I added Kirill's patch at the end, (Which may not have made it
to LKML due to trying to not get it mangled by Exchange).

Kirill Tkhai (1):
      sched: Micro optimization in pick_next_task() and in check_preempt_curr()

Steven Rostedt (VMware) (3):
      sched: Force the address order of each sched class descriptor
      sched: Have sched_class_highest define by vmlinux.lds.h
      sched: Remove struct sched_class next field

----
 include/asm-generic/vmlinux.lds.h | 29 +++++++++++++++++++++++++++++
 kernel/sched/core.c               | 24 +++++++++---------------
 kernel/sched/deadline.c           |  4 ++--
 kernel/sched/fair.c               |  4 ++--
 kernel/sched/idle.c               |  4 ++--
 kernel/sched/rt.c                 |  4 ++--
 kernel/sched/sched.h              | 13 +++++--------
 kernel/sched/stop_task.c          |  4 ++--
 8 files changed, 53 insertions(+), 33 deletions(-)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [RFC][PATCH 1/4] sched: Force the address order of each sched class descriptor
  2019-12-19 21:44 [RFC][PATCH 0/4] sched: Optimizations to sched_class processing Steven Rostedt
@ 2019-12-19 21:44 ` Steven Rostedt
  2019-12-20  8:52   ` Rasmus Villemoes
  2019-12-19 21:44 ` [RFC][PATCH 2/4] sched: Have sched_class_highest define by vmlinux.lds.h Steven Rostedt
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Steven Rostedt @ 2019-12-19 21:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kirill Tkhai, Kirill Tkhai, Peter Zijlstra, mingo, juri.lelli,
	vincent.guittot, dietmar.eggemann, bsegall, mgorman, Ingo Molnar,
	Andrew Morton

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

In order to make a micro optimization in pick_next_task(), the order of the
sched class descriptor address must be in the same order as their priority
to each other. That is:

 &idle_sched_class < &fair_sched_class < &rt_sched_class <
 &dl_sched_class < &stop_sched_class

In order to guarantee this order of the sched class descriptors, add each
one into their own data section and force the order in the linker script.

Link: https://lore.kernel.org/r/157675913272.349305.8936736338884044103.stgit@localhost.localdomain
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 include/asm-generic/vmlinux.lds.h | 19 +++++++++++++++++++
 kernel/sched/deadline.c           |  3 ++-
 kernel/sched/fair.c               |  3 ++-
 kernel/sched/idle.c               |  3 ++-
 kernel/sched/rt.c                 |  3 ++-
 kernel/sched/stop_task.c          |  3 ++-
 6 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index e00f41aa8ec4..772d961c69a5 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -108,6 +108,24 @@
 #define SBSS_MAIN .sbss
 #endif
 
+#ifdef CONFIG_SMP
+#define STOP_SCHED_CLASS	*(__stop_sched_class)
+#else
+#define STOP_SCHED_CLASS
+#endif
+
+/*
+ * The order of the sched class addresses are important, as they are
+ * used to determine the order of the priority of each sched class in
+ * relation to each other.
+ */
+#define SCHED_DATA				\
+	*(__idle_sched_class)			\
+	*(__fair_sched_class)			\
+	*(__rt_sched_class)			\
+	*(__dl_sched_class)			\
+	STOP_SCHED_CLASS
+
 /*
  * Align to a 32 byte boundary equal to the
  * alignment gcc 4.5 uses for a struct
@@ -308,6 +326,7 @@
 #define DATA_DATA							\
 	*(.xiptext)							\
 	*(DATA_MAIN)							\
+	SCHED_DATA							\
 	*(.ref.data)							\
 	*(.data..shared_aligned) /* percpu related */			\
 	MEM_KEEP(init.data*)						\
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 43323f875cb9..5abdbe569f93 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2428,7 +2428,8 @@ static void prio_changed_dl(struct rq *rq, struct task_struct *p,
 	}
 }
 
-const struct sched_class dl_sched_class = {
+const struct sched_class dl_sched_class
+	__attribute__((section("__dl_sched_class"))) = {
 	.next			= &rt_sched_class,
 	.enqueue_task		= enqueue_task_dl,
 	.dequeue_task		= dequeue_task_dl,
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 08a233e97a01..e745fe0e0cd3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10745,7 +10745,8 @@ static unsigned int get_rr_interval_fair(struct rq *rq, struct task_struct *task
 /*
  * All the scheduling class methods:
  */
-const struct sched_class fair_sched_class = {
+const struct sched_class fair_sched_class
+	__attribute__((section("__fair_sched_class"))) = {
 	.next			= &idle_sched_class,
 	.enqueue_task		= enqueue_task_fair,
 	.dequeue_task		= dequeue_task_fair,
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index ffa959e91227..700a9c826f0e 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -454,7 +454,8 @@ static void update_curr_idle(struct rq *rq)
 /*
  * Simple, special scheduling class for the per-CPU idle tasks:
  */
-const struct sched_class idle_sched_class = {
+const struct sched_class idle_sched_class
+	__attribute__((section("__idle_sched_class"))) = {
 	/* .next is NULL */
 	/* no enqueue/yield_task for idle tasks */
 
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index e591d40fd645..5d3f9bcddaeb 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2354,7 +2354,8 @@ static unsigned int get_rr_interval_rt(struct rq *rq, struct task_struct *task)
 		return 0;
 }
 
-const struct sched_class rt_sched_class = {
+const struct sched_class rt_sched_class
+	__attribute__((section("__rt_sched_class"))) = {
 	.next			= &fair_sched_class,
 	.enqueue_task		= enqueue_task_rt,
 	.dequeue_task		= dequeue_task_rt,
diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
index 4c9e9975684f..03bc7530ff75 100644
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -115,7 +115,8 @@ static void update_curr_stop(struct rq *rq)
 /*
  * Simple, special scheduling class for the per-CPU stop tasks:
  */
-const struct sched_class stop_sched_class = {
+const struct sched_class stop_sched_class
+	__attribute__((section("__stop_sched_class"))) = {
 	.next			= &dl_sched_class,
 
 	.enqueue_task		= enqueue_task_stop,
-- 
2.24.0



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC][PATCH 2/4] sched: Have sched_class_highest define by vmlinux.lds.h
  2019-12-19 21:44 [RFC][PATCH 0/4] sched: Optimizations to sched_class processing Steven Rostedt
  2019-12-19 21:44 ` [RFC][PATCH 1/4] sched: Force the address order of each sched class descriptor Steven Rostedt
@ 2019-12-19 21:44 ` Steven Rostedt
  2019-12-20  8:52   ` Rasmus Villemoes
  2020-06-25 11:53   ` [tip: sched/core] " tip-bot2 for Steven Rostedt (VMware)
  2019-12-19 21:44 ` [RFC][PATCH 3/4] sched: Remove struct sched_class next field Steven Rostedt
  2019-12-19 21:44 ` [RFC][PATCH 4/4] sched: Micro optimization in pick_next_task() and in check_preempt_curr() Steven Rostedt
  3 siblings, 2 replies; 16+ messages in thread
From: Steven Rostedt @ 2019-12-19 21:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kirill Tkhai, Kirill Tkhai, Peter Zijlstra, mingo, juri.lelli,
	vincent.guittot, dietmar.eggemann, bsegall, mgorman, Ingo Molnar,
	Andrew Morton

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Now that the sched_class descriptors are defined by the linker script, and
this needs to be aware of the existance of stop_sched_class when SMP is
enabled or not, as it is used as the "highest" priority when defined. Move
the declaration of sched_class_highest to the same location in the linker
script that inserts stop_sched_class, and this will also make it easier to
see what should be defined as the highest class, as this linker script
location defines the priorities as well.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 include/asm-generic/vmlinux.lds.h | 11 ++++++++++-
 kernel/sched/sched.h              |  9 +++------
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 772d961c69a5..1c14c4ddf785 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -109,9 +109,16 @@
 #endif
 
 #ifdef CONFIG_SMP
-#define STOP_SCHED_CLASS	*(__stop_sched_class)
+#define STOP_SCHED_CLASS		\
+	STRUCT_ALIGN();			\
+	sched_class_highest = .;	\
+	*(__stop_sched_class)
+#define BEFORE_DL_SCHED_CLASS
 #else
 #define STOP_SCHED_CLASS
+#define BEFORE_DL_SCHED_CLASS		\
+	STRUCT_ALIGN();			\
+	sched_class_highest = .;
 #endif
 
 /*
@@ -120,9 +127,11 @@
  * relation to each other.
  */
 #define SCHED_DATA				\
+	STRUCT_ALIGN();				\
 	*(__idle_sched_class)			\
 	*(__fair_sched_class)			\
 	*(__rt_sched_class)			\
+	BEFORE_DL_SCHED_CLASS			\
 	*(__dl_sched_class)			\
 	STOP_SCHED_CLASS
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 280a3c735935..0554c588ad85 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1771,17 +1771,14 @@ static inline void set_next_task(struct rq *rq, struct task_struct *next)
 	next->sched_class->set_next_task(rq, next, false);
 }
 
-#ifdef CONFIG_SMP
-#define sched_class_highest (&stop_sched_class)
-#else
-#define sched_class_highest (&dl_sched_class)
-#endif
+/* Defined in include/asm-generic/vmlinux.lds.h */
+extern struct sched_class sched_class_highest;
 
 #define for_class_range(class, _from, _to) \
 	for (class = (_from); class != (_to); class = class->next)
 
 #define for_each_class(class) \
-	for_class_range(class, sched_class_highest, NULL)
+	for_class_range(class, &sched_class_highest, NULL)
 
 extern const struct sched_class stop_sched_class;
 extern const struct sched_class dl_sched_class;
-- 
2.24.0



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC][PATCH 3/4] sched: Remove struct sched_class next field
  2019-12-19 21:44 [RFC][PATCH 0/4] sched: Optimizations to sched_class processing Steven Rostedt
  2019-12-19 21:44 ` [RFC][PATCH 1/4] sched: Force the address order of each sched class descriptor Steven Rostedt
  2019-12-19 21:44 ` [RFC][PATCH 2/4] sched: Have sched_class_highest define by vmlinux.lds.h Steven Rostedt
@ 2019-12-19 21:44 ` Steven Rostedt
  2019-12-20 12:14   ` Rasmus Villemoes
  2020-06-25 11:53   ` [tip: sched/core] sched: Remove struct sched_class::next field tip-bot2 for Steven Rostedt (VMware)
  2019-12-19 21:44 ` [RFC][PATCH 4/4] sched: Micro optimization in pick_next_task() and in check_preempt_curr() Steven Rostedt
  3 siblings, 2 replies; 16+ messages in thread
From: Steven Rostedt @ 2019-12-19 21:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kirill Tkhai, Kirill Tkhai, Peter Zijlstra, mingo, juri.lelli,
	vincent.guittot, dietmar.eggemann, bsegall, mgorman, Ingo Molnar,
	Andrew Morton

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Now that the sched_class descriptors are defined in order via the linker
script vmlinux.lds.h, there's no reason to have a "next" pointer to the
previous priroity structure. The order of the sturctures can be aligned as
an array, and used to index and find the next sched_class descriptor.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 include/asm-generic/vmlinux.lds.h | 1 +
 kernel/sched/deadline.c           | 1 -
 kernel/sched/fair.c               | 1 -
 kernel/sched/idle.c               | 1 -
 kernel/sched/rt.c                 | 1 -
 kernel/sched/sched.h              | 6 +++---
 kernel/sched/stop_task.c          | 1 -
 7 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 1c14c4ddf785..f4d480c4f7c6 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -128,6 +128,7 @@
  */
 #define SCHED_DATA				\
 	STRUCT_ALIGN();				\
+	__start_sched_classes = .;		\
 	*(__idle_sched_class)			\
 	*(__fair_sched_class)			\
 	*(__rt_sched_class)			\
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 5abdbe569f93..9c232214fe63 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2430,7 +2430,6 @@ static void prio_changed_dl(struct rq *rq, struct task_struct *p,
 
 const struct sched_class dl_sched_class
 	__attribute__((section("__dl_sched_class"))) = {
-	.next			= &rt_sched_class,
 	.enqueue_task		= enqueue_task_dl,
 	.dequeue_task		= dequeue_task_dl,
 	.yield_task		= yield_task_dl,
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e745fe0e0cd3..52f2a7b06d9b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10747,7 +10747,6 @@ static unsigned int get_rr_interval_fair(struct rq *rq, struct task_struct *task
  */
 const struct sched_class fair_sched_class
 	__attribute__((section("__fair_sched_class"))) = {
-	.next			= &idle_sched_class,
 	.enqueue_task		= enqueue_task_fair,
 	.dequeue_task		= dequeue_task_fair,
 	.yield_task		= yield_task_fair,
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 700a9c826f0e..f0871a9b8c98 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -456,7 +456,6 @@ static void update_curr_idle(struct rq *rq)
  */
 const struct sched_class idle_sched_class
 	__attribute__((section("__idle_sched_class"))) = {
-	/* .next is NULL */
 	/* no enqueue/yield_task for idle tasks */
 
 	/* dequeue is not valid, we print a debug message there: */
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 5d3f9bcddaeb..d6b330b72c60 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2356,7 +2356,6 @@ static unsigned int get_rr_interval_rt(struct rq *rq, struct task_struct *task)
 
 const struct sched_class rt_sched_class
 	__attribute__((section("__rt_sched_class"))) = {
-	.next			= &fair_sched_class,
 	.enqueue_task		= enqueue_task_rt,
 	.dequeue_task		= dequeue_task_rt,
 	.yield_task		= yield_task_rt,
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 0554c588ad85..30a4615cf480 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1700,7 +1700,6 @@ extern const u32		sched_prio_to_wmult[40];
 #define RETRY_TASK		((void *)-1UL)
 
 struct sched_class {
-	const struct sched_class *next;
 
 #ifdef CONFIG_UCLAMP_TASK
 	int uclamp_enabled;
@@ -1773,12 +1772,13 @@ static inline void set_next_task(struct rq *rq, struct task_struct *next)
 
 /* Defined in include/asm-generic/vmlinux.lds.h */
 extern struct sched_class sched_class_highest;
+extern struct sched_class __start_sched_classes;
 
 #define for_class_range(class, _from, _to) \
-	for (class = (_from); class != (_to); class = class->next)
+	for (class = (_from); class > (_to); class--)
 
 #define for_each_class(class) \
-	for_class_range(class, &sched_class_highest, NULL)
+	for_class_range(class, &sched_class_highest, (&__start_sched_classes) - 1)
 
 extern const struct sched_class stop_sched_class;
 extern const struct sched_class dl_sched_class;
diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
index 03bc7530ff75..0f88eec8d4da 100644
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -117,7 +117,6 @@ static void update_curr_stop(struct rq *rq)
  */
 const struct sched_class stop_sched_class
 	__attribute__((section("__stop_sched_class"))) = {
-	.next			= &dl_sched_class,
 
 	.enqueue_task		= enqueue_task_stop,
 	.dequeue_task		= dequeue_task_stop,
-- 
2.24.0



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC][PATCH 4/4] sched: Micro optimization in pick_next_task() and in check_preempt_curr()
  2019-12-19 21:44 [RFC][PATCH 0/4] sched: Optimizations to sched_class processing Steven Rostedt
                   ` (2 preceding siblings ...)
  2019-12-19 21:44 ` [RFC][PATCH 3/4] sched: Remove struct sched_class next field Steven Rostedt
@ 2019-12-19 21:44 ` Steven Rostedt
  3 siblings, 0 replies; 16+ messages in thread
From: Steven Rostedt @ 2019-12-19 21:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kirill Tkhai, Kirill Tkhai, Peter Zijlstra, mingo, juri.lelli,
	vincent.guittot, dietmar.eggemann, bsegall, mgorman, Ingo Molnar,
	Andrew Morton

From: Kirill Tkhai <tkhai@yandex.ru>

This introduces an optimization based on xxx_sched_class addresses
in two hot scheduler functions: pick_next_task() and check_preempt_curr().

It is possible to compare pointers to sched classes to check, which
of them has a higher priority, instead of current iterations using
for_each_class().

One more result of the patch is that size of object file becomes a little
less (excluding added BUG_ON(), which goes in __init section):

$size kernel/sched/core.o
         text     data      bss	    dec	    hex	filename
before:  66446    18957	    676	  86079	  1503f	kernel/sched/core.o
after:   66398    18957	    676	  86031	  1500f	kernel/sched/core.o

Link: http://lkml.kernel.org/r/711a9c4b-ff32-1136-b848-17c622d548f3@yandex.ru

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 kernel/sched/core.c | 24 +++++++++---------------
 1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 90e4b00ace89..63401807fcf0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1416,20 +1416,10 @@ static inline void check_class_changed(struct rq *rq, struct task_struct *p,
 
 void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
 {
-	const struct sched_class *class;
-
-	if (p->sched_class == rq->curr->sched_class) {
+	if (p->sched_class == rq->curr->sched_class)
 		rq->curr->sched_class->check_preempt_curr(rq, p, flags);
-	} else {
-		for_each_class(class) {
-			if (class == rq->curr->sched_class)
-				break;
-			if (class == p->sched_class) {
-				resched_curr(rq);
-				break;
-			}
-		}
-	}
+	else if (p->sched_class > rq->curr->sched_class)
+		resched_curr(rq);
 
 	/*
 	 * A queue event has occurred, and we're going to schedule.  In
@@ -3914,8 +3904,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 	 * higher scheduling class, because otherwise those loose the
 	 * opportunity to pull in more work from other CPUs.
 	 */
-	if (likely((prev->sched_class == &idle_sched_class ||
-		    prev->sched_class == &fair_sched_class) &&
+	if (likely(prev->sched_class <= &fair_sched_class &&
 		   rq->nr_running == rq->cfs.h_nr_running)) {
 
 		p = pick_next_task_fair(rq, prev, rf);
@@ -6569,6 +6558,11 @@ void __init sched_init(void)
 	unsigned long ptr = 0;
 	int i;
 
+	BUG_ON(&idle_sched_class > &fair_sched_class ||
+		&fair_sched_class > &rt_sched_class ||
+		&rt_sched_class > &dl_sched_class ||
+		&dl_sched_class > &stop_sched_class);
+
 	wait_bit_init();
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
-- 
2.24.0



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH 2/4] sched: Have sched_class_highest define by vmlinux.lds.h
  2019-12-19 21:44 ` [RFC][PATCH 2/4] sched: Have sched_class_highest define by vmlinux.lds.h Steven Rostedt
@ 2019-12-20  8:52   ` Rasmus Villemoes
  2020-06-25 11:53   ` [tip: sched/core] " tip-bot2 for Steven Rostedt (VMware)
  1 sibling, 0 replies; 16+ messages in thread
From: Rasmus Villemoes @ 2019-12-20  8:52 UTC (permalink / raw)
  To: Steven Rostedt, linux-kernel
  Cc: Kirill Tkhai, Kirill Tkhai, Peter Zijlstra, mingo, juri.lelli,
	vincent.guittot, dietmar.eggemann, bsegall, mgorman, Ingo Molnar,
	Andrew Morton

On 19/12/2019 22.44, Steven Rostedt wrote:
> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> 
> Now that the sched_class descriptors are defined by the linker script, and
> this needs to be aware of the existance of stop_sched_class when SMP is
> enabled or not, as it is used as the "highest" priority when defined. Move
> the declaration of sched_class_highest to the same location in the linker
> script that inserts stop_sched_class, and this will also make it easier to
> see what should be defined as the highest class, as this linker script
> location defines the priorities as well.
> 
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> ---
>  include/asm-generic/vmlinux.lds.h | 11 ++++++++++-
>  kernel/sched/sched.h              |  9 +++------
>  2 files changed, 13 insertions(+), 7 deletions(-)
> 
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index 772d961c69a5..1c14c4ddf785 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -109,9 +109,16 @@
>  #endif
>  
>  #ifdef CONFIG_SMP
> -#define STOP_SCHED_CLASS	*(__stop_sched_class)
> +#define STOP_SCHED_CLASS		\
> +	STRUCT_ALIGN();			\
> +	sched_class_highest = .;	\
> +	*(__stop_sched_class)
> +#define BEFORE_DL_SCHED_CLASS
>  #else
>  #define STOP_SCHED_CLASS
> +#define BEFORE_DL_SCHED_CLASS		\
> +	STRUCT_ALIGN();			\
> +	sched_class_highest = .;
>  #endif
>  
>  /*
> @@ -120,9 +127,11 @@
>   * relation to each other.
>   */
>  #define SCHED_DATA				\
> +	STRUCT_ALIGN();				\
>  	*(__idle_sched_class)			\
>  	*(__fair_sched_class)			\
>  	*(__rt_sched_class)			\
> +	BEFORE_DL_SCHED_CLASS			\
>  	*(__dl_sched_class)			\
>  	STOP_SCHED_CLASS

If you reverse the ordering so the highest sched class comes first, this
can just be

sched_class_highest = .
STOP_SCHED_CLASS
*(__dl_sched_class)
...
*(__idle_sched_class)
__end_sched_classes = .

and this from patch 3/4

 #define for_class_range(class, _from, _to) \
-	for (class = (_from); class != (_to); class = class->next)
+	for (class = (_from); class > (_to); class--)

 #define for_each_class(class) \
-	for_class_range(class, &sched_class_highest, NULL)
+	for_class_range(class, &sched_class_highest, (&__start_sched_classes) - 1)

can instead become

+ for (class = (_from); class < (_to); class++)

and

+ for_class_range(class, &sched_class_highest, &__end_sched_classes)

which seem somewhat more readable.

And actually, I don't think you need the STOP_SCHED_CLASS define at all
- in non-SMP, no object file has a __stop_sched_class section, so
including *(__stop_sched_class) unconditionally will DTRT. (BTW, it's a
bit confusing that stop_task.o is compiled in if CONFIG_SMP, but
stop_task.c also has a #ifdef CONFIG_SMP).

Rasmus

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH 1/4] sched: Force the address order of each sched class descriptor
  2019-12-19 21:44 ` [RFC][PATCH 1/4] sched: Force the address order of each sched class descriptor Steven Rostedt
@ 2019-12-20  8:52   ` Rasmus Villemoes
  2019-12-20 10:00     ` Peter Zijlstra
  0 siblings, 1 reply; 16+ messages in thread
From: Rasmus Villemoes @ 2019-12-20  8:52 UTC (permalink / raw)
  To: Steven Rostedt, linux-kernel
  Cc: Kirill Tkhai, Kirill Tkhai, Peter Zijlstra, mingo, juri.lelli,
	vincent.guittot, dietmar.eggemann, bsegall, mgorman, Ingo Molnar,
	Andrew Morton

On 19/12/2019 22.44, Steven Rostedt wrote:
> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> 
> In order to make a micro optimization in pick_next_task(), the order of the
> sched class descriptor address must be in the same order as their priority
> to each other. That is:
> 
>  &idle_sched_class < &fair_sched_class < &rt_sched_class <
>  &dl_sched_class < &stop_sched_class
> 
> In order to guarantee this order of the sched class descriptors, add each
> one into their own data section and force the order in the linker script.

I think it would make the code simpler if one reverses these, see other
reply.

> +/*
> + * The order of the sched class addresses are important, as they are
> + * used to determine the order of the priority of each sched class in
> + * relation to each other.
> + */
> +#define SCHED_DATA				\
> +	*(__idle_sched_class)			\
> +	*(__fair_sched_class)			\
> +	*(__rt_sched_class)			\
> +	*(__dl_sched_class)			\
> +	STOP_SCHED_CLASS
> +
>  /*
>   * Align to a 32 byte boundary equal to the
>   * alignment gcc 4.5 uses for a struct
> @@ -308,6 +326,7 @@
>  #define DATA_DATA							\
>  	*(.xiptext)							\
>  	*(DATA_MAIN)							\
> +	SCHED_DATA							\
>  	*(.ref.data)							\

Doesn't this make the structs end up in .data (writable) rather than
.rodata?

Rasmus

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH 1/4] sched: Force the address order of each sched class descriptor
  2019-12-20  8:52   ` Rasmus Villemoes
@ 2019-12-20 10:00     ` Peter Zijlstra
  2019-12-20 10:12       ` Rasmus Villemoes
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2019-12-20 10:00 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Steven Rostedt, linux-kernel, Kirill Tkhai, Kirill Tkhai, mingo,
	juri.lelli, vincent.guittot, dietmar.eggemann, bsegall, mgorman,
	Ingo Molnar, Andrew Morton

On Fri, Dec 20, 2019 at 09:52:37AM +0100, Rasmus Villemoes wrote:
> On 19/12/2019 22.44, Steven Rostedt wrote:
> > From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> > 
> > In order to make a micro optimization in pick_next_task(), the order of the
> > sched class descriptor address must be in the same order as their priority
> > to each other. That is:
> > 
> >  &idle_sched_class < &fair_sched_class < &rt_sched_class <
> >  &dl_sched_class < &stop_sched_class
> > 
> > In order to guarantee this order of the sched class descriptors, add each
> > one into their own data section and force the order in the linker script.
> 
> I think it would make the code simpler if one reverses these, see other
> reply.

I started out agreeing, because of that mess around STOP_SCHED_CLASS and
that horrid BEFORE_CRUD thing.

Then, when I fixed it all up, I saw what it did to Kyrill's patch (#4)
and that ends up looking like:

-       if (likely((prev->sched_class == &idle_sched_class ||
-                   prev->sched_class == &fair_sched_class) &&
+       if (likely(prev->sched_class >= &fair_sched_class &&

And that's just weird.

Then I had a better look and now...

> > +/*
> > + * The order of the sched class addresses are important, as they are
> > + * used to determine the order of the priority of each sched class in
> > + * relation to each other.
> > + */
> > +#define SCHED_DATA				\
> > +	*(__idle_sched_class)			\
> > +	*(__fair_sched_class)			\
> > +	*(__rt_sched_class)			\
> > +	*(__dl_sched_class)			\
> > +	STOP_SCHED_CLASS

I'm confused, why does that STOP_SCHED_CLASS need magic here at all?
Doesn't the linker deal with empty sections already by making them 0
sized?

> >  /*
> >   * Align to a 32 byte boundary equal to the
> >   * alignment gcc 4.5 uses for a struct
> > @@ -308,6 +326,7 @@
> >  #define DATA_DATA							\
> >  	*(.xiptext)							\
> >  	*(DATA_MAIN)							\
> > +	SCHED_DATA							\
> >  	*(.ref.data)							\
> 
> Doesn't this make the structs end up in .data (writable) rather than
> .rodata?

Right! That wants fixing.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH 1/4] sched: Force the address order of each sched class descriptor
  2019-12-20 10:00     ` Peter Zijlstra
@ 2019-12-20 10:12       ` Rasmus Villemoes
  2019-12-20 10:44         ` Kirill Tkhai
  2019-12-20 12:19         ` Peter Zijlstra
  0 siblings, 2 replies; 16+ messages in thread
From: Rasmus Villemoes @ 2019-12-20 10:12 UTC (permalink / raw)
  To: Peter Zijlstra, Rasmus Villemoes
  Cc: Steven Rostedt, linux-kernel, Kirill Tkhai, Kirill Tkhai, mingo,
	juri.lelli, vincent.guittot, dietmar.eggemann, bsegall, mgorman,
	Ingo Molnar, Andrew Morton

On 20/12/2019 11.00, Peter Zijlstra wrote:
> On Fri, Dec 20, 2019 at 09:52:37AM +0100, Rasmus Villemoes wrote:
>> On 19/12/2019 22.44, Steven Rostedt wrote:
>>> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
>>>
>>> In order to make a micro optimization in pick_next_task(), the order of the
>>> sched class descriptor address must be in the same order as their priority
>>> to each other. That is:
>>>
>>>  &idle_sched_class < &fair_sched_class < &rt_sched_class <
>>>  &dl_sched_class < &stop_sched_class
>>>
>>> In order to guarantee this order of the sched class descriptors, add each
>>> one into their own data section and force the order in the linker script.
>>
>> I think it would make the code simpler if one reverses these, see other
>> reply.
> 
> I started out agreeing, because of that mess around STOP_SCHED_CLASS and
> that horrid BEFORE_CRUD thing.
> 
> Then, when I fixed it all up, I saw what it did to Kyrill's patch (#4)
> and that ends up looking like:
> 
> -       if (likely((prev->sched_class == &idle_sched_class ||
> -                   prev->sched_class == &fair_sched_class) &&
> +       if (likely(prev->sched_class >= &fair_sched_class &&
> 
> And that's just weird.

I kind of agree, but if one can come up with good enough macro names, I
think all that comparison logic should be in helpers either way the
array is laid out. Something like

#define sched_class_lower_eq [something] /* perhaps comment on the array
order */
sched_class_lower_eq(prev->sched_class, &fair_sched_class)

> Then I had a better look and now...
> 
>>> +/*
>>> + * The order of the sched class addresses are important, as they are
>>> + * used to determine the order of the priority of each sched class in
>>> + * relation to each other.
>>> + */
>>> +#define SCHED_DATA				\
>>> +	*(__idle_sched_class)			\
>>> +	*(__fair_sched_class)			\
>>> +	*(__rt_sched_class)			\
>>> +	*(__dl_sched_class)			\
>>> +	STOP_SCHED_CLASS
> 
> I'm confused, why does that STOP_SCHED_CLASS need magic here at all?
> Doesn't the linker deal with empty sections already by making them 0
> sized?

Yes, but dropping the STOP_SCHED_CLASS define doesn't prevent one from
needing some ifdeffery to define highest_sched_class if they are laid
out in (higher sched class <-> higher address) order.

Rasmus

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH 1/4] sched: Force the address order of each sched class descriptor
  2019-12-20 10:12       ` Rasmus Villemoes
@ 2019-12-20 10:44         ` Kirill Tkhai
  2019-12-20 15:18           ` Steven Rostedt
  2019-12-20 12:19         ` Peter Zijlstra
  1 sibling, 1 reply; 16+ messages in thread
From: Kirill Tkhai @ 2019-12-20 10:44 UTC (permalink / raw)
  To: Rasmus Villemoes, Peter Zijlstra
  Cc: Steven Rostedt, linux-kernel, Kirill Tkhai, mingo, juri.lelli,
	vincent.guittot, dietmar.eggemann, bsegall, mgorman, Ingo Molnar,
	Andrew Morton

On 20.12.2019 13:12, Rasmus Villemoes wrote:
> On 20/12/2019 11.00, Peter Zijlstra wrote:
>> On Fri, Dec 20, 2019 at 09:52:37AM +0100, Rasmus Villemoes wrote:
>>> On 19/12/2019 22.44, Steven Rostedt wrote:
>>>> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
>>>>
>>>> In order to make a micro optimization in pick_next_task(), the order of the
>>>> sched class descriptor address must be in the same order as their priority
>>>> to each other. That is:
>>>>
>>>>  &idle_sched_class < &fair_sched_class < &rt_sched_class <
>>>>  &dl_sched_class < &stop_sched_class
>>>>
>>>> In order to guarantee this order of the sched class descriptors, add each
>>>> one into their own data section and force the order in the linker script.
>>>
>>> I think it would make the code simpler if one reverses these, see other
>>> reply.
>>
>> I started out agreeing, because of that mess around STOP_SCHED_CLASS and
>> that horrid BEFORE_CRUD thing.
>>
>> Then, when I fixed it all up, I saw what it did to Kyrill's patch (#4)
>> and that ends up looking like:
>>
>> -       if (likely((prev->sched_class == &idle_sched_class ||
>> -                   prev->sched_class == &fair_sched_class) &&
>> +       if (likely(prev->sched_class >= &fair_sched_class &&
>>
>> And that's just weird.
> 
> I kind of agree, but if one can come up with good enough macro names, I
> think all that comparison logic should be in helpers either way the
> array is laid out. Something like
> 
> #define sched_class_lower_eq [something] /* perhaps comment on the array
> order */
> sched_class_lower_eq(prev->sched_class, &fair_sched_class)

You started from making code simple argument. But in case of we make the code
simple in this single place, the rest of code, which are the many places, will
become wrapped over huge helpers, which make the readability worse IMO.

After we have implemented a reliable sections order in lds file, we may use
it for a long time without remembering the way they are organized. But in case
of we introduce huge helpers, we always will bump into them during analyzing
of difficult asynchronous code, and these helpers size will overflow internal
buffers in our heads just by their excess symbols.

My opinion is to better make some less beautiful thing in a single synchronous place,
than to distribute the redundancy over all the code (especially, when it is asynchronous).

>> Then I had a better look and now...
>>
>>>> +/*
>>>> + * The order of the sched class addresses are important, as they are
>>>> + * used to determine the order of the priority of each sched class in
>>>> + * relation to each other.
>>>> + */
>>>> +#define SCHED_DATA				\
>>>> +	*(__idle_sched_class)			\
>>>> +	*(__fair_sched_class)			\
>>>> +	*(__rt_sched_class)			\
>>>> +	*(__dl_sched_class)			\
>>>> +	STOP_SCHED_CLASS
>>
>> I'm confused, why does that STOP_SCHED_CLASS need magic here at all?
>> Doesn't the linker deal with empty sections already by making them 0
>> sized?
> 
> Yes, but dropping the STOP_SCHED_CLASS define doesn't prevent one from
> needing some ifdeffery to define highest_sched_class if they are laid
> out in (higher sched class <-> higher address) order.
> 
> Rasmus
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH 3/4] sched: Remove struct sched_class next field
  2019-12-19 21:44 ` [RFC][PATCH 3/4] sched: Remove struct sched_class next field Steven Rostedt
@ 2019-12-20 12:14   ` Rasmus Villemoes
  2020-06-25 11:53   ` [tip: sched/core] sched: Remove struct sched_class::next field tip-bot2 for Steven Rostedt (VMware)
  1 sibling, 0 replies; 16+ messages in thread
From: Rasmus Villemoes @ 2019-12-20 12:14 UTC (permalink / raw)
  To: Steven Rostedt, linux-kernel
  Cc: Kirill Tkhai, Kirill Tkhai, Peter Zijlstra, mingo, juri.lelli,
	vincent.guittot, dietmar.eggemann, bsegall, mgorman, Ingo Molnar,
	Andrew Morton

On 19/12/2019 22.44, Steven Rostedt wrote:
> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> 
> Now that the sched_class descriptors are defined in order via the linker
> script vmlinux.lds.h, there's no reason to have a "next" pointer to the
> previous priroity structure. The order of the sturctures can be aligned as
> an array, and used to index and find the next sched_class descriptor.
> 
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> ---
>  include/asm-generic/vmlinux.lds.h | 1 +
>  kernel/sched/deadline.c           | 1 -
>  kernel/sched/fair.c               | 1 -
>  kernel/sched/idle.c               | 1 -
>  kernel/sched/rt.c                 | 1 -
>  kernel/sched/sched.h              | 6 +++---
>  kernel/sched/stop_task.c          | 1 -
>  7 files changed, 4 insertions(+), 8 deletions(-)
> 
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index 1c14c4ddf785..f4d480c4f7c6 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -128,6 +128,7 @@
>   */
>  #define SCHED_DATA				\
>  	STRUCT_ALIGN();				\
> +	__start_sched_classes = .;		\
>  	*(__idle_sched_class)			\
>  	*(__fair_sched_class)			\
>  	*(__rt_sched_class)			\

This is broken. It works by accident on a 64 bit SMP config, since you
start at a 32 byte boundary, then include four 8-byte aligned structs,
so the second STRUCT_ALIGN (not visible in this hunk, but comes from the
STOP_SCHED_CLASS) is a no-op, and stop_sched_class ends up at the right
offset from the previous one.

But, for example, a 32 bit non-smp kernel with CONFIG_FAIR_GROUP_SCHED=y
has sizeof(struct sched_class) == 68, and

$ nm -n vmlinux | grep sched_class
c0728660 D idle_sched_class
c0728660 D __start_sched_classes
c07286a4 D fair_sched_class
c07286e8 D rt_sched_class
c0728740 D dl_sched_class
c0728740 D sched_class_highest

notice dl_sched_class is 88 bytes beyond rt_sched_class, while the
others are properly 68-byte separated.

So just drop the second STRUCT_ALIGN (and maybe the first as well).
Maybe throw in some ASSERTs in the linker script, but since the linker
doesn't know sizeof(struct sched_class), the best one can do is perhaps
some kind of ASSERT(fair_sched_class - idle_sched_class ==
rt_sched_class - fair_sched_class). And/or include a BUG_ON that checks
that the sched_class elements actually constitute a proper "struct
sched_class[]" array.

Rasmus

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH 1/4] sched: Force the address order of each sched class descriptor
  2019-12-20 10:12       ` Rasmus Villemoes
  2019-12-20 10:44         ` Kirill Tkhai
@ 2019-12-20 12:19         ` Peter Zijlstra
  2019-12-20 14:34           ` Rasmus Villemoes
  1 sibling, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2019-12-20 12:19 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Steven Rostedt, linux-kernel, Kirill Tkhai, Kirill Tkhai, mingo,
	juri.lelli, vincent.guittot, dietmar.eggemann, bsegall, mgorman,
	Ingo Molnar, Andrew Morton

On Fri, Dec 20, 2019 at 11:12:37AM +0100, Rasmus Villemoes wrote:
> On 20/12/2019 11.00, Peter Zijlstra wrote:

> >>> +/*
> >>> + * The order of the sched class addresses are important, as they are
> >>> + * used to determine the order of the priority of each sched class in
> >>> + * relation to each other.
> >>> + */
> >>> +#define SCHED_DATA				\
> >>> +	*(__idle_sched_class)			\
> >>> +	*(__fair_sched_class)			\
> >>> +	*(__rt_sched_class)			\
> >>> +	*(__dl_sched_class)			\
> >>> +	STOP_SCHED_CLASS
> > 
> > I'm confused, why does that STOP_SCHED_CLASS need magic here at all?
> > Doesn't the linker deal with empty sections already by making them 0
> > sized?
> 
> Yes, but dropping the STOP_SCHED_CLASS define doesn't prevent one from
> needing some ifdeffery to define highest_sched_class if they are laid
> out in (higher sched class <-> higher address) order.

Would not something like:

	__begin_sched_classes = .;
	*(__idle_sched_class)
	*(__fair_sched_class)
	*(__rt_sched_class)
	*(__dl_sched_class)
	*(__stop_sched_class)
	__end_sched_classes = .;

combined with something like:

extern struct sched_class *__begin_sched_classes;
extern struct sched_class *__end_sched_classes;

#define sched_class_highest (__end_sched_classes - 1)
#define sched_class_lowest  (__begin_sched_classes - 1)

#define for_class_range(class, _from, _to) \
	for (class = (_from); class != (_to), class--)

#define for_each_class(class) \
	for_class_range(class, sched_class_highest, sched_class_lowest)

just work?

When no __stop_sched_class is present, that section is 0 sized, and
__end_sched_classes points to one past __dl_sched_class, no?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH 1/4] sched: Force the address order of each sched class descriptor
  2019-12-20 12:19         ` Peter Zijlstra
@ 2019-12-20 14:34           ` Rasmus Villemoes
  0 siblings, 0 replies; 16+ messages in thread
From: Rasmus Villemoes @ 2019-12-20 14:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, linux-kernel, Kirill Tkhai, Kirill Tkhai, mingo,
	juri.lelli, vincent.guittot, dietmar.eggemann, bsegall, mgorman,
	Ingo Molnar, Andrew Morton

On 20/12/2019 13.19, Peter Zijlstra wrote:
> On Fri, Dec 20, 2019 at 11:12:37AM +0100, Rasmus Villemoes wrote:
>> On 20/12/2019 11.00, Peter Zijlstra wrote:
> 
>>>>> +/*
>>>>> + * The order of the sched class addresses are important, as they are
>>>>> + * used to determine the order of the priority of each sched class in
>>>>> + * relation to each other.
>>>>> + */
>>>>> +#define SCHED_DATA				\
>>>>> +	*(__idle_sched_class)			\
>>>>> +	*(__fair_sched_class)			\
>>>>> +	*(__rt_sched_class)			\
>>>>> +	*(__dl_sched_class)			\
>>>>> +	STOP_SCHED_CLASS
>>>
>>> I'm confused, why does that STOP_SCHED_CLASS need magic here at all?
>>> Doesn't the linker deal with empty sections already by making them 0
>>> sized?
>>
>> Yes, but dropping the STOP_SCHED_CLASS define doesn't prevent one from
>> needing some ifdeffery to define highest_sched_class if they are laid
>> out in (higher sched class <-> higher address) order.
> 
> Would not something like:
> 
> 	__begin_sched_classes = .;
> 	*(__idle_sched_class)
> 	*(__fair_sched_class)
> 	*(__rt_sched_class)
> 	*(__dl_sched_class)
> 	*(__stop_sched_class)
> 	__end_sched_classes = .;
> 
> combined with something like:
> 
> extern struct sched_class *__begin_sched_classes;
> extern struct sched_class *__end_sched_classes;

extern const struct sched_class __begin_sched_classes[];

but yes, I get the idea.

> #define sched_class_highest (__end_sched_classes - 1)
> #define sched_class_lowest  (__begin_sched_classes - 1)
> 
> #define for_class_range(class, _from, _to) \
> 	for (class = (_from); class != (_to), class--)
> 
> #define for_each_class(class) \
> 	for_class_range(class, sched_class_highest, sched_class_lowest)
> 
> just work?

Yes, I think so - I was only thinking of the case where all the symbols
would be defined in the linker script, and for this to work you need the
C compiler to subtract the sizeof().

I'd probably not include the -1 in the definition of sched_class_lowest,
but instead put it in the for_each_class definition (i.e. use
sched_class_lowest-1 as _to parameter).

A whole other option is of course to make the whole thing a bona fide C
array defined in sched/core.c, with fair_sched_class being defined as
&sched_classes[1] etc. But that requires giving all the methods extern
linkage. The advantage might be that the compiler can see how much we
iterate over, though I wouldn't expect it to actually unroll the
for_each_class loops five times. So yes, the above is probably the best
way to go.

Rasmus

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH 1/4] sched: Force the address order of each sched class descriptor
  2019-12-20 10:44         ` Kirill Tkhai
@ 2019-12-20 15:18           ` Steven Rostedt
  0 siblings, 0 replies; 16+ messages in thread
From: Steven Rostedt @ 2019-12-20 15:18 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: Rasmus Villemoes, Peter Zijlstra, linux-kernel, Kirill Tkhai,
	mingo, juri.lelli, vincent.guittot, dietmar.eggemann, bsegall,
	mgorman, Ingo Molnar, Andrew Morton

On Fri, 20 Dec 2019 13:44:05 +0300
Kirill Tkhai <ktkhai@virtuozzo.com> wrote:

> My opinion is to better make some less beautiful thing in a single synchronous place,
> than to distribute the redundancy over all the code (especially, when it is asynchronous).

I very much subscribe to this notion. The TRACE_EVENT() and NMI code
does the same. Keep all other use cases simple by making it complex in
one locality.

-- Steve

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [tip: sched/core] sched: Remove struct sched_class::next field
  2019-12-19 21:44 ` [RFC][PATCH 3/4] sched: Remove struct sched_class next field Steven Rostedt
  2019-12-20 12:14   ` Rasmus Villemoes
@ 2020-06-25 11:53   ` tip-bot2 for Steven Rostedt (VMware)
  1 sibling, 0 replies; 16+ messages in thread
From: tip-bot2 for Steven Rostedt (VMware) @ 2020-06-25 11:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Steven Rostedt (VMware), Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     a87e749e8fa1aaef9b4db32e21c2795e69ce67bf
Gitweb:        https://git.kernel.org/tip/a87e749e8fa1aaef9b4db32e21c2795e69ce67bf
Author:        Steven Rostedt (VMware) <rostedt@goodmis.org>
AuthorDate:    Thu, 19 Dec 2019 16:44:54 -05:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 25 Jun 2020 13:45:44 +02:00

sched: Remove struct sched_class::next field

Now that the sched_class descriptors are defined in order via the linker
script vmlinux.lds.h, there's no reason to have a "next" pointer to the
previous priroity structure. The order of the sturctures can be aligned as
an array, and used to index and find the next sched_class descriptor.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20191219214558.845353593@goodmis.org
---
 kernel/sched/deadline.c  | 1 -
 kernel/sched/fair.c      | 1 -
 kernel/sched/idle.c      | 1 -
 kernel/sched/rt.c        | 1 -
 kernel/sched/sched.h     | 1 -
 kernel/sched/stop_task.c | 1 -
 6 files changed, 6 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index d9e7946..c9cc1d6 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2481,7 +2481,6 @@ static void prio_changed_dl(struct rq *rq, struct task_struct *p,
 
 const struct sched_class dl_sched_class
 	__attribute__((section("__dl_sched_class"))) = {
-	.next			= &rt_sched_class,
 	.enqueue_task		= enqueue_task_dl,
 	.dequeue_task		= dequeue_task_dl,
 	.yield_task		= yield_task_dl,
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3365f6b..a63f400 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11124,7 +11124,6 @@ static unsigned int get_rr_interval_fair(struct rq *rq, struct task_struct *task
  */
 const struct sched_class fair_sched_class
 	__attribute__((section("__fair_sched_class"))) = {
-	.next			= &idle_sched_class,
 	.enqueue_task		= enqueue_task_fair,
 	.dequeue_task		= dequeue_task_fair,
 	.yield_task		= yield_task_fair,
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index f580629..336d478 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -455,7 +455,6 @@ static void update_curr_idle(struct rq *rq)
  */
 const struct sched_class idle_sched_class
 	__attribute__((section("__idle_sched_class"))) = {
-	/* .next is NULL */
 	/* no enqueue/yield_task for idle tasks */
 
 	/* dequeue is not valid, we print a debug message there: */
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 6543d44..f215eea 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2431,7 +2431,6 @@ static unsigned int get_rr_interval_rt(struct rq *rq, struct task_struct *task)
 
 const struct sched_class rt_sched_class
 	__attribute__((section("__rt_sched_class"))) = {
-	.next			= &fair_sched_class,
 	.enqueue_task		= enqueue_task_rt,
 	.dequeue_task		= dequeue_task_rt,
 	.yield_task		= yield_task_rt,
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 4165c06..549e7e6 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1754,7 +1754,6 @@ extern const u32		sched_prio_to_wmult[40];
 #define RETRY_TASK		((void *)-1UL)
 
 struct sched_class {
-	const struct sched_class *next;
 
 #ifdef CONFIG_UCLAMP_TASK
 	int uclamp_enabled;
diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
index f4bbd54..394bc81 100644
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -111,7 +111,6 @@ static void update_curr_stop(struct rq *rq)
  */
 const struct sched_class stop_sched_class
 	__attribute__((section("__stop_sched_class"))) = {
-	.next			= &dl_sched_class,
 
 	.enqueue_task		= enqueue_task_stop,
 	.dequeue_task		= dequeue_task_stop,

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [tip: sched/core] sched: Have sched_class_highest define by vmlinux.lds.h
  2019-12-19 21:44 ` [RFC][PATCH 2/4] sched: Have sched_class_highest define by vmlinux.lds.h Steven Rostedt
  2019-12-20  8:52   ` Rasmus Villemoes
@ 2020-06-25 11:53   ` tip-bot2 for Steven Rostedt (VMware)
  1 sibling, 0 replies; 16+ messages in thread
From: tip-bot2 for Steven Rostedt (VMware) @ 2020-06-25 11:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Steven Rostedt (VMware), Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     c3a340f7e7eadac7662ab104ceb16432e5a4c6b2
Gitweb:        https://git.kernel.org/tip/c3a340f7e7eadac7662ab104ceb16432e5a4c6b2
Author:        Steven Rostedt (VMware) <rostedt@goodmis.org>
AuthorDate:    Thu, 19 Dec 2019 16:44:53 -05:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 25 Jun 2020 13:45:44 +02:00

sched: Have sched_class_highest define by vmlinux.lds.h

Now that the sched_class descriptors are defined by the linker script, and
this needs to be aware of the existance of stop_sched_class when SMP is
enabled or not, as it is used as the "highest" priority when defined. Move
the declaration of sched_class_highest to the same location in the linker
script that inserts stop_sched_class, and this will also make it easier to
see what should be defined as the highest class, as this linker script
location defines the priorities as well.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20191219214558.682913590@goodmis.org
---
 include/asm-generic/vmlinux.lds.h |  5 ++++-
 kernel/sched/core.c               |  8 ++++++++
 kernel/sched/sched.h              | 17 +++++++++--------
 3 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 2186d7b..66fb84c 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -114,11 +114,14 @@
  * relation to each other.
  */
 #define SCHED_DATA				\
+	STRUCT_ALIGN();				\
+	__begin_sched_classes = .;		\
 	*(__idle_sched_class)			\
 	*(__fair_sched_class)			\
 	*(__rt_sched_class)			\
 	*(__dl_sched_class)			\
-	*(__stop_sched_class)
+	*(__stop_sched_class)			\
+	__end_sched_classes = .;
 
 /*
  * Align to a 32 byte boundary equal to the
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0208b71..81640fe 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6646,6 +6646,14 @@ void __init sched_init(void)
 	unsigned long ptr = 0;
 	int i;
 
+	/* Make sure the linker didn't screw up */
+	BUG_ON(&idle_sched_class + 1 != &fair_sched_class ||
+	       &fair_sched_class + 1 != &rt_sched_class ||
+	       &rt_sched_class + 1   != &dl_sched_class);
+#ifdef CONFIG_SMP
+	BUG_ON(&dl_sched_class + 1 != &stop_sched_class);
+#endif
+
 	wait_bit_init();
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 3368876..4165c06 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1811,7 +1811,7 @@ struct sched_class {
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	void (*task_change_group)(struct task_struct *p, int type);
 #endif
-};
+} __aligned(32); /* STRUCT_ALIGN(), vmlinux.lds.h */
 
 static inline void put_prev_task(struct rq *rq, struct task_struct *prev)
 {
@@ -1825,17 +1825,18 @@ static inline void set_next_task(struct rq *rq, struct task_struct *next)
 	next->sched_class->set_next_task(rq, next, false);
 }
 
-#ifdef CONFIG_SMP
-#define sched_class_highest (&stop_sched_class)
-#else
-#define sched_class_highest (&dl_sched_class)
-#endif
+/* Defined in include/asm-generic/vmlinux.lds.h */
+extern struct sched_class __begin_sched_classes[];
+extern struct sched_class __end_sched_classes[];
+
+#define sched_class_highest (__end_sched_classes - 1)
+#define sched_class_lowest  (__begin_sched_classes - 1)
 
 #define for_class_range(class, _from, _to) \
-	for (class = (_from); class != (_to); class = class->next)
+	for (class = (_from); class != (_to); class--)
 
 #define for_each_class(class) \
-	for_class_range(class, sched_class_highest, NULL)
+	for_class_range(class, sched_class_highest, sched_class_lowest)
 
 extern const struct sched_class stop_sched_class;
 extern const struct sched_class dl_sched_class;

^ permalink raw reply related	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2020-06-25 11:54 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-19 21:44 [RFC][PATCH 0/4] sched: Optimizations to sched_class processing Steven Rostedt
2019-12-19 21:44 ` [RFC][PATCH 1/4] sched: Force the address order of each sched class descriptor Steven Rostedt
2019-12-20  8:52   ` Rasmus Villemoes
2019-12-20 10:00     ` Peter Zijlstra
2019-12-20 10:12       ` Rasmus Villemoes
2019-12-20 10:44         ` Kirill Tkhai
2019-12-20 15:18           ` Steven Rostedt
2019-12-20 12:19         ` Peter Zijlstra
2019-12-20 14:34           ` Rasmus Villemoes
2019-12-19 21:44 ` [RFC][PATCH 2/4] sched: Have sched_class_highest define by vmlinux.lds.h Steven Rostedt
2019-12-20  8:52   ` Rasmus Villemoes
2020-06-25 11:53   ` [tip: sched/core] " tip-bot2 for Steven Rostedt (VMware)
2019-12-19 21:44 ` [RFC][PATCH 3/4] sched: Remove struct sched_class next field Steven Rostedt
2019-12-20 12:14   ` Rasmus Villemoes
2020-06-25 11:53   ` [tip: sched/core] sched: Remove struct sched_class::next field tip-bot2 for Steven Rostedt (VMware)
2019-12-19 21:44 ` [RFC][PATCH 4/4] sched: Micro optimization in pick_next_task() and in check_preempt_curr() Steven Rostedt

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.