* [PATCH RFC] sched: Micro optimization in pick_next_task() and in check_preempt_curr()
@ 2019-12-19 12:39 Kirill Tkhai
  2019-12-19 13:12 ` Peter Zijlstra
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Kirill Tkhai @ 2019-12-19 12:39 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, ktkhai
  Cc: linux-kernel

In kernel/sched/Makefile, the files describing the different sched classes
already go in order from the lowest-priority class to the highest-priority class:

idle.o fair.o rt.o deadline.o stop_task.o

The GNU linker documentation says that sections appear in the order
they are seen during the link (see [1]):

>Normally, the linker will place files and sections matched by wildcards
>in the order in which they are seen during the link. You can change this
>by using the SORT keyword, which appears before a wildcard pattern
>in parentheses (e.g., SORT(.text*)).

So we may expect that the const variables from idle.o will be placed before
the ro variables from fair.o in the RO_DATA section, the ro variables from
fair.o before those from rt.o, and so on.

(It also looks like link order is already relied upon elsewhere in the
 kernel, e.g. in drivers/md/Makefile.)

Thus, we can introduce an optimization based on the xxx_sched_class addresses
in two hot scheduler functions: pick_next_task() and check_preempt_curr().

One more result of the patch is that the object file becomes a little
smaller (excluding the added BUG_ON(), which goes into the __init section):

$ size kernel/sched/core.o
            text     data    bss    dec    hex  filename
before:    66446    18957    676  86079  1503f  kernel/sched/core.o
after:     66398    18957    676  86031  1500f  kernel/sched/core.o

[1] https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/4/html/Using_ld_the_GNU_Linker/sections.html
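
As a rough user-space illustration of the link-order assumption above (a
hedged sketch with made-up file and symbol names, not kernel code): when
a.o is listed before b.o on the link line, the default linker script is
expected to place a_class below b_class in .rodata.

/* a.c */
const int a_class = 1;

/* b.c */
const int b_class = 2;

/* main.c -- build: cc -o demo main.c a.c b.c */
#include <stdint.h>
#include <stdio.h>

extern const int a_class, b_class;

int main(void)
{
        /* Expected to print 1 with the default linker script and no LTO;
         * as discussed later in this thread, it is not a guaranteed
         * property. */
        printf("%d\n", (uintptr_t)&a_class < (uintptr_t)&b_class);
        return 0;
}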

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
 kernel/sched/Makefile |    2 ++
 kernel/sched/core.c   |   24 +++++++++---------------
 2 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index 5fc9c9b70862..f78f177c660a 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -23,6 +23,8 @@ CFLAGS_core.o := $(PROFILING) -fno-omit-frame-pointer
 endif
 
 obj-y += core.o loadavg.o clock.o cputime.o
+# Order is significant: a higher-priority class xxx is described by a variable
+# xxx_sched_class with a bigger address. See BUG_ON() in sched_init().
 obj-y += idle.o fair.o rt.o deadline.o
 obj-y += wait.o wait_bit.o swait.o completion.o
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 15508c202bf5..befdd7158b27 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1416,20 +1416,10 @@ static inline void check_class_changed(struct rq *rq, struct task_struct *p,
 
 void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
 {
-	const struct sched_class *class;
-
-	if (p->sched_class == rq->curr->sched_class) {
+	if (p->sched_class == rq->curr->sched_class)
 		rq->curr->sched_class->check_preempt_curr(rq, p, flags);
-	} else {
-		for_each_class(class) {
-			if (class == rq->curr->sched_class)
-				break;
-			if (class == p->sched_class) {
-				resched_curr(rq);
-				break;
-			}
-		}
-	}
+	else if (p->sched_class > rq->curr->sched_class)
+		resched_curr(rq);
 
 	/*
 	 * A queue event has occurred, and we're going to schedule.  In
@@ -3914,8 +3904,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 	 * higher scheduling class, because otherwise those loose the
 	 * opportunity to pull in more work from other CPUs.
 	 */
-	if (likely((prev->sched_class == &idle_sched_class ||
-		    prev->sched_class == &fair_sched_class) &&
+	if (likely(prev->sched_class <= &fair_sched_class &&
 		   rq->nr_running == rq->cfs.h_nr_running)) {
 
 		p = pick_next_task_fair(rq, prev, rf);
@@ -6569,6 +6558,11 @@ void __init sched_init(void)
 	unsigned long ptr = 0;
 	int i;
 
+	BUG_ON(&idle_sched_class > &fair_sched_class ||
+		&fair_sched_class > &rt_sched_class ||
+		&rt_sched_class > &dl_sched_class ||
+		&dl_sched_class > &stop_sched_class);
+
 	wait_bit_init();
 
 #ifdef CONFIG_FAIR_GROUP_SCHED




* Re: [PATCH RFC] sched: Micro optimization in pick_next_task() and in check_preempt_curr()
  2019-12-19 12:39 [PATCH RFC] sched: Micro optimization in pick_next_task() and in check_preempt_curr() Kirill Tkhai
@ 2019-12-19 13:12 ` Peter Zijlstra
  2019-12-19 14:02   ` Peter Zijlstra
  2019-12-19 14:04   ` [Q] ld: Does LTO reorder ro variables in two files? Kirill Tkhai
  2019-12-19 13:50 ` [PATCH RFC] sched: Micro optimization in pick_next_task() and in check_preempt_curr() Steven Rostedt
  2020-06-25 11:53 ` [tip: sched/core] sched: Force the address order of each sched class descriptor tip-bot2 for Steven Rostedt (VMware)
  2 siblings, 2 replies; 24+ messages in thread
From: Peter Zijlstra @ 2019-12-19 13:12 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, linux-kernel

On Thu, Dec 19, 2019 at 03:39:14PM +0300, Kirill Tkhai wrote:
> In kernel/sched/Makefile files, describing different sched classes, already
> go in the order from the lowest priority class to the highest priority class:
> 
> idle.o fair.o rt.o deadline.o stop_task.o
> 
> The documentation of GNU linker says, that section appears in the order
> they are seen during link time (see [1]):
> 
> >Normally, the linker will place files and sections matched by wildcards
> >in the order in which they are seen during the link. You can change this
> >by using the SORT keyword, which appears before a wildcard pattern
> >in parentheses (e.g., SORT(.text*)).
> 
> So, we may expect const variables from idle.o will go before ro variables
> from fair.o in RO_DATA section, while ro variables from fair.o will go
> before ro variables from rt.o, etc.
> 
> (Also, it looks like the linking order is already used in kernel, e.g.
>  in drivers/md/Makefile)
> 
> Thus, we may introduce an optimization based on xxx_sched_class addresses
> in these two hot scheduler functions: pick_next_task() and check_preempt_curr().
> 
> One more result of the patch is that size of object file becomes a little
> less (excluding added BUG_ON(), which goes in __init section):
> 
> $size kernel/sched/core.o
>          text     data      bss	    dec	    hex	filename
> before:  66446    18957	    676	  86079	  1503f	kernel/sched/core.o
> after:   66398    18957	    676	  86031	  1500f	kernel/sched/core.o

Does LTO preserve this behaviour? I've never quite dared do this exact
optimization.


* Re: [PATCH RFC] sched: Micro optimization in pick_next_task() and in check_preempt_curr()
  2019-12-19 12:39 [PATCH RFC] sched: Micro optimization in pick_next_task() and in check_preempt_curr() Kirill Tkhai
  2019-12-19 13:12 ` Peter Zijlstra
@ 2019-12-19 13:50 ` Steven Rostedt
  2019-12-19 13:53   ` Kirill Tkhai
  2019-12-19 13:53   ` Peter Zijlstra
  2020-06-25 11:53 ` [tip: sched/core] sched: Force the address order of each sched class descriptor tip-bot2 for Steven Rostedt (VMware)
  2 siblings, 2 replies; 24+ messages in thread
From: Steven Rostedt @ 2019-12-19 13:50 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	bsegall, mgorman, linux-kernel

On Thu, 19 Dec 2019 15:39:14 +0300
Kirill Tkhai <ktkhai@virtuozzo.com> wrote:

> @@ -6569,6 +6558,11 @@ void __init sched_init(void)
>  	unsigned long ptr = 0;
>  	int i;
>  
> +	BUG_ON(&idle_sched_class > &fair_sched_class ||
> +		&fair_sched_class > &rt_sched_class ||
> +		&rt_sched_class > &dl_sched_class ||
> +		&dl_sched_class > &stop_sched_class);
> +

Can this be a BUILD_BUG_ON()? These addresses should all be constants.

-- Steve



>  	wait_bit_init();
>  


* Re: [PATCH RFC] sched: Micro optimization in pick_next_task() and in check_preempt_curr()
  2019-12-19 13:50 ` [PATCH RFC] sched: Micro optimization in pick_next_task() and in check_preempt_curr() Steven Rostedt
@ 2019-12-19 13:53   ` Kirill Tkhai
  2019-12-19 13:53   ` Peter Zijlstra
  1 sibling, 0 replies; 24+ messages in thread
From: Kirill Tkhai @ 2019-12-19 13:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	bsegall, mgorman, linux-kernel

On 19.12.2019 16:50, Steven Rostedt wrote:
> On Thu, 19 Dec 2019 15:39:14 +0300
> Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
> 
>> @@ -6569,6 +6558,11 @@ void __init sched_init(void)
>>  	unsigned long ptr = 0;
>>  	int i;
>>  
>> +	BUG_ON(&idle_sched_class > &fair_sched_class ||
>> +		&fair_sched_class > &rt_sched_class ||
>> +		&rt_sched_class > &dl_sched_class ||
>> +		&dl_sched_class > &stop_sched_class);
>> +
> 
> Can this be a BUILD_BUG_ON? These address should all be constants.

BUILD_BUG_ON() is a compile-time check, while the addresses are assigned
at link time, aren't they?!

Anyway, plain BUILD_BUG_ON() fails here with the following:

In file included from ./arch/x86/include/asm/current.h:5,
                 from ./include/linux/sched.h:12,
                 from kernel/sched/sched.h:5,
                 from kernel/sched/core.c:9:
kernel/sched/core.c: In function ‘sched_init’:
./include/linux/compiler.h:394:38: error: call to ‘__compiletime_assert_6561’ declared with attribute error: BUILD_BUG_ON failed: &idle_sched_class > &fair_sched_class || &fair_sched_class > &rt_sched_class || &rt_sched_class > &dl_sched_class || &dl_sched_class > &stop_sched_class
  394 |  _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
      |                                      ^
./include/linux/compiler.h:375:4: note: in definition of macro ‘__compiletime_assert’
  375 |    prefix ## suffix();    \
      |    ^~~~~~
./include/linux/compiler.h:394:2: note: in expansion of macro ‘_compiletime_assert’
  394 |  _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
      |  ^~~~~~~~~~~~~~~~~~~
./include/linux/build_bug.h:39:37: note: in expansion of macro ‘compiletime_assert’
   39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
      |                                     ^~~~~~~~~~~~~~~~~~
./include/linux/build_bug.h:50:2: note: in expansion of macro ‘BUILD_BUG_ON_MSG’
   50 |  BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
      |  ^~~~~~~~~~~~~~~~
kernel/sched/core.c:6561:2: note: in expansion of macro ‘BUILD_BUG_ON’
 6561 |  BUILD_BUG_ON(&idle_sched_class > &fair_sched_class ||
      |  ^~~~~~~~~~~~


> -- Steve
> 
> 
> 
>>  	wait_bit_init();
>>  



* Re: [PATCH RFC] sched: Micro optimization in pick_next_task() and in check_preempt_curr()
  2019-12-19 13:50 ` [PATCH RFC] sched: Micro optimization in pick_next_task() and in check_preempt_curr() Steven Rostedt
  2019-12-19 13:53   ` Kirill Tkhai
@ 2019-12-19 13:53   ` Peter Zijlstra
  1 sibling, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2019-12-19 13:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Kirill Tkhai, mingo, juri.lelli, vincent.guittot,
	dietmar.eggemann, bsegall, mgorman, linux-kernel

On Thu, Dec 19, 2019 at 08:50:42AM -0500, Steven Rostedt wrote:
> On Thu, 19 Dec 2019 15:39:14 +0300
> Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
> 
> > @@ -6569,6 +6558,11 @@ void __init sched_init(void)
> >  	unsigned long ptr = 0;
> >  	int i;
> >  
> > +	BUG_ON(&idle_sched_class > &fair_sched_class ||
> > +		&fair_sched_class > &rt_sched_class ||
> > +		&rt_sched_class > &dl_sched_class ||
> > +		&dl_sched_class > &stop_sched_class);
> > +
> 
> Can this be a BUILD_BUG_ON? These address should all be constants.

Nope, BUILD_BUG_ON() is for compile time constants, these are link time
constants.
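
A minimal sketch of the distinction (user-space, hypothetical names, not
kernel code): the relative placement of two global objects is only decided
at link time, so comparing their addresses is not an integer constant
expression and cannot feed a compile-time assertion, while a run-time
check works.

/* demo.c -- build: cc -std=c11 -o demo demo.c */
#include <stdint.h>
#include <stdio.h>

const int lo = 1;
const int hi = 2;

/* Rejected by the compiler: not an integer constant expression.
 *
 * _Static_assert((uintptr_t)&lo < (uintptr_t)&hi, "link-time order");
 */

int main(void)
{
        /* The run-time analogue of the BUG_ON() above is fine; the actual
         * order of 'lo' and 'hi' here is a toolchain detail. */
        printf("lo below hi: %d\n", (uintptr_t)&lo < (uintptr_t)&hi);
        return 0;
}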


* Re: [PATCH RFC] sched: Micro optimization in pick_next_task() and in check_preempt_curr()
  2019-12-19 13:12 ` Peter Zijlstra
@ 2019-12-19 14:02   ` Peter Zijlstra
  2019-12-19 14:25     ` Kirill Tkhai
  2019-12-19 14:04   ` [Q] ld: Does LTO reorder ro variables in two files? Kirill Tkhai
  1 sibling, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2019-12-19 14:02 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, linux-kernel

On Thu, Dec 19, 2019 at 02:12:42PM +0100, Peter Zijlstra wrote:
> On Thu, Dec 19, 2019 at 03:39:14PM +0300, Kirill Tkhai wrote:
> > In kernel/sched/Makefile files, describing different sched classes, already
> > go in the order from the lowest priority class to the highest priority class:
> > 
> > idle.o fair.o rt.o deadline.o stop_task.o
> > 
> > The documentation of GNU linker says, that section appears in the order
> > they are seen during link time (see [1]):
> > 
> > >Normally, the linker will place files and sections matched by wildcards
> > >in the order in which they are seen during the link. You can change this
> > >by using the SORT keyword, which appears before a wildcard pattern
> > >in parentheses (e.g., SORT(.text*)).
> > 
> > So, we may expect const variables from idle.o will go before ro variables
> > from fair.o in RO_DATA section, while ro variables from fair.o will go
> > before ro variables from rt.o, etc.
> > 
> > (Also, it looks like the linking order is already used in kernel, e.g.
> >  in drivers/md/Makefile)
> > 
> > Thus, we may introduce an optimization based on xxx_sched_class addresses
> > in these two hot scheduler functions: pick_next_task() and check_preempt_curr().
> > 
> > One more result of the patch is that size of object file becomes a little
> > less (excluding added BUG_ON(), which goes in __init section):
> > 
> > $size kernel/sched/core.o
> >          text     data      bss	    dec	    hex	filename
> > before:  66446    18957	    676	  86079	  1503f	kernel/sched/core.o
> > after:   66398    18957	    676	  86031	  1500f	kernel/sched/core.o
> 
> Does LTO preserve this behaviour? I've never quite dared do this exact
> optimization.

Also, ld.lld seems a popular option.


* [Q] ld: Does LTO reorder ro variables in two files?
  2019-12-19 13:12 ` Peter Zijlstra
  2019-12-19 14:02   ` Peter Zijlstra
@ 2019-12-19 14:04   ` Kirill Tkhai
  2019-12-19 15:21     ` Jeff Law
  2019-12-19 15:45     ` Alexander Monakov
  1 sibling, 2 replies; 24+ messages in thread
From: Kirill Tkhai @ 2019-12-19 14:04 UTC (permalink / raw)
  To: Peter Zijlstra, gcc-help
  Cc: mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, linux-kernel

CC: gcc-help@gcc.gnu.org

Hi, gcc guys,

this thread starts here: https://lkml.org/lkml/2019/12/19/403

There are two const variables:

   struct sched_class idle_sched_class
and
   struct sched_class fair_sched_class,

which are declared in two files idle.c and fair.c.

1)In Makefile the order is: idle.o fair.o
2)the variables go to the same ro section
3)there is no SORT(.*) keyword in linker script.

Is it always true, that after linkage &idle_sched_class < &fair_sched_class?

Thanks!
Kirill

On 19.12.2019 16:12, Peter Zijlstra wrote:
> On Thu, Dec 19, 2019 at 03:39:14PM +0300, Kirill Tkhai wrote:
>> In kernel/sched/Makefile files, describing different sched classes, already
>> go in the order from the lowest priority class to the highest priority class:
>>
>> idle.o fair.o rt.o deadline.o stop_task.o
>>
>> The documentation of GNU linker says, that section appears in the order
>> they are seen during link time (see [1]):
>>
>>> Normally, the linker will place files and sections matched by wildcards
>>> in the order in which they are seen during the link. You can change this
>>> by using the SORT keyword, which appears before a wildcard pattern
>>> in parentheses (e.g., SORT(.text*)).
>>
>> So, we may expect const variables from idle.o will go before ro variables
>> from fair.o in RO_DATA section, while ro variables from fair.o will go
>> before ro variables from rt.o, etc.
>>
>> (Also, it looks like the linking order is already used in kernel, e.g.
>>  in drivers/md/Makefile)
>>
>> Thus, we may introduce an optimization based on xxx_sched_class addresses
>> in these two hot scheduler functions: pick_next_task() and check_preempt_curr().
>>
>> One more result of the patch is that size of object file becomes a little
>> less (excluding added BUG_ON(), which goes in __init section):
>>
>> $size kernel/sched/core.o
>>          text     data      bss	    dec	    hex	filename
>> before:  66446    18957	    676	  86079	  1503f	kernel/sched/core.o
>> after:   66398    18957	    676	  86031	  1500f	kernel/sched/core.o
> 
> Does LTO preserve this behaviour? I've never quite dared do this exact
> optimization.



* Re: [PATCH RFC] sched: Micro optimization in pick_next_task() and in check_preempt_curr()
  2019-12-19 14:02   ` Peter Zijlstra
@ 2019-12-19 14:25     ` Kirill Tkhai
  2019-12-19 14:31       ` Kirill Tkhai
  2019-12-19 14:43       ` Steven Rostedt
  0 siblings, 2 replies; 24+ messages in thread
From: Kirill Tkhai @ 2019-12-19 14:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, linux-kernel

On 19.12.2019 17:02, Peter Zijlstra wrote:
> On Thu, Dec 19, 2019 at 02:12:42PM +0100, Peter Zijlstra wrote:
>> On Thu, Dec 19, 2019 at 03:39:14PM +0300, Kirill Tkhai wrote:
>>> In kernel/sched/Makefile files, describing different sched classes, already
>>> go in the order from the lowest priority class to the highest priority class:
>>>
>>> idle.o fair.o rt.o deadline.o stop_task.o
>>>
>>> The documentation of GNU linker says, that section appears in the order
>>> they are seen during link time (see [1]):
>>>
>>>> Normally, the linker will place files and sections matched by wildcards
>>>> in the order in which they are seen during the link. You can change this
>>>> by using the SORT keyword, which appears before a wildcard pattern
>>>> in parentheses (e.g., SORT(.text*)).
>>>
>>> So, we may expect const variables from idle.o will go before ro variables
>>> from fair.o in RO_DATA section, while ro variables from fair.o will go
>>> before ro variables from rt.o, etc.
>>>
>>> (Also, it looks like the linking order is already used in kernel, e.g.
>>>  in drivers/md/Makefile)
>>>
>>> Thus, we may introduce an optimization based on xxx_sched_class addresses
>>> in these two hot scheduler functions: pick_next_task() and check_preempt_curr().
>>>
>>> One more result of the patch is that size of object file becomes a little
>>> less (excluding added BUG_ON(), which goes in __init section):
>>>
>>> $size kernel/sched/core.o
>>>          text     data      bss	    dec	    hex	filename
>>> before:  66446    18957	    676	  86079	  1503f	kernel/sched/core.o
>>> after:   66398    18957	    676	  86031	  1500f	kernel/sched/core.o
>>
>> Does LTO preserve this behaviour? I've never quite dared do this exact
>> optimization.
> 
> Also, ld.lld seems a popular option.

I asked on their IRC. Oh, it looks like there is no way to guarantee this.

About the link: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/4/html/Using_ld_the_GNU_Linker/sections.html

(17:19:25) nbjoerg: but it is not guarenteed behavior
(17:19:50) nbjoerg: if for some strange reason you really need to enforce relative orders of global objects, put them in consecutively named sections


* Re: [PATCH RFC] sched: Micro optimization in pick_next_task() and in check_preempt_curr()
  2019-12-19 14:25     ` Kirill Tkhai
@ 2019-12-19 14:31       ` Kirill Tkhai
  2019-12-19 14:43       ` Steven Rostedt
  1 sibling, 0 replies; 24+ messages in thread
From: Kirill Tkhai @ 2019-12-19 14:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, linux-kernel

On 19.12.2019 17:25, Kirill Tkhai wrote:
> On 19.12.2019 17:02, Peter Zijlstra wrote:
>> On Thu, Dec 19, 2019 at 02:12:42PM +0100, Peter Zijlstra wrote:
>>> On Thu, Dec 19, 2019 at 03:39:14PM +0300, Kirill Tkhai wrote:
>>>> In kernel/sched/Makefile files, describing different sched classes, already
>>>> go in the order from the lowest priority class to the highest priority class:
>>>>
>>>> idle.o fair.o rt.o deadline.o stop_task.o
>>>>
>>>> The documentation of GNU linker says, that section appears in the order
>>>> they are seen during link time (see [1]):
>>>>
>>>>> Normally, the linker will place files and sections matched by wildcards
>>>>> in the order in which they are seen during the link. You can change this
>>>>> by using the SORT keyword, which appears before a wildcard pattern
>>>>> in parentheses (e.g., SORT(.text*)).
>>>>
>>>> So, we may expect const variables from idle.o will go before ro variables
>>>> from fair.o in RO_DATA section, while ro variables from fair.o will go
>>>> before ro variables from rt.o, etc.
>>>>
>>>> (Also, it looks like the linking order is already used in kernel, e.g.
>>>>  in drivers/md/Makefile)
>>>>
>>>> Thus, we may introduce an optimization based on xxx_sched_class addresses
>>>> in these two hot scheduler functions: pick_next_task() and check_preempt_curr().
>>>>
>>>> One more result of the patch is that size of object file becomes a little
>>>> less (excluding added BUG_ON(), which goes in __init section):
>>>>
>>>> $size kernel/sched/core.o
>>>>          text     data      bss	    dec	    hex	filename
>>>> before:  66446    18957	    676	  86079	  1503f	kernel/sched/core.o
>>>> after:   66398    18957	    676	  86031	  1500f	kernel/sched/core.o
>>>
>>> Does LTO preserve this behaviour? I've never quite dared do this exact
>>> optimization.
>>
>> Also, ld.lld seems a popular option.
> 
> I asked on their IRC. Oh, it looks like no way is for this.
> 
> About the link: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/4/html/Using_ld_the_GNU_Linker/sections.html
> 
> (17:19:25) nbjoerg: but it is not guarenteed behavior
> (17:19:50) nbjoerg: if for some strange reason you really need to enforce relative orders of global objects, put them in consecutively named sections

Introducing a sched_class::id instead of this patch's approach does not make much sense,
since it would only help in check_preempt_curr(), and it would require too many new lines of code.


* Re: [PATCH RFC] sched: Micro optimization in pick_next_task() and in check_preempt_curr()
  2019-12-19 14:25     ` Kirill Tkhai
  2019-12-19 14:31       ` Kirill Tkhai
@ 2019-12-19 14:43       ` Steven Rostedt
  2019-12-19 14:46         ` Kirill Tkhai
  1 sibling, 1 reply; 24+ messages in thread
From: Steven Rostedt @ 2019-12-19 14:43 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: Peter Zijlstra, mingo, juri.lelli, vincent.guittot,
	dietmar.eggemann, bsegall, mgorman, linux-kernel

On Thu, 19 Dec 2019 17:25:40 +0300
Kirill Tkhai <ktkhai@virtuozzo.com> wrote:

> (17:19:25) nbjoerg: but it is not guarenteed behavior
> (17:19:50) nbjoerg: if for some strange reason you really need to enforce relative orders of global objects, put them in consecutively named sections

Which appears to work. I tried this patch on top of yours:

Not sure how this does with locality though.

-- Steve

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index e00f41aa8ec4..ff12a422ff19 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -108,6 +108,13 @@
 #define SBSS_MAIN .sbss
 #endif
 
+#define SCHED_DATA				\
+	*(__idle_sched_class)			\
+	*(__fair_sched_class)			\
+	*(__rt_sched_class)			\
+	*(__dl_sched_class)			\
+	*(__stop_sched_class)
+
 /*
  * Align to a 32 byte boundary equal to the
  * alignment gcc 4.5 uses for a struct
@@ -308,6 +315,7 @@
 #define DATA_DATA							\
 	*(.xiptext)							\
 	*(DATA_MAIN)							\
+	SCHED_DATA							\
 	*(.ref.data)							\
 	*(.data..shared_aligned) /* percpu related */			\
 	MEM_KEEP(init.data*)						\
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 43323f875cb9..5abdbe569f93 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2428,7 +2428,8 @@ static void prio_changed_dl(struct rq *rq, struct task_struct *p,
 	}
 }
 
-const struct sched_class dl_sched_class = {
+const struct sched_class dl_sched_class
+	__attribute__((section("__dl_sched_class"))) = {
 	.next			= &rt_sched_class,
 	.enqueue_task		= enqueue_task_dl,
 	.dequeue_task		= dequeue_task_dl,
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 08a233e97a01..e745fe0e0cd3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10745,7 +10745,8 @@ static unsigned int get_rr_interval_fair(struct rq *rq, struct task_struct *task
 /*
  * All the scheduling class methods:
  */
-const struct sched_class fair_sched_class = {
+const struct sched_class fair_sched_class
+	__attribute__((section("__fair_sched_class"))) = {
 	.next			= &idle_sched_class,
 	.enqueue_task		= enqueue_task_fair,
 	.dequeue_task		= dequeue_task_fair,
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index ffa959e91227..700a9c826f0e 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -454,7 +454,8 @@ static void update_curr_idle(struct rq *rq)
 /*
  * Simple, special scheduling class for the per-CPU idle tasks:
  */
-const struct sched_class idle_sched_class = {
+const struct sched_class idle_sched_class
+	__attribute__((section("__idle_sched_class"))) = {
 	/* .next is NULL */
 	/* no enqueue/yield_task for idle tasks */
 
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index e591d40fd645..5d3f9bcddaeb 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2354,7 +2354,8 @@ static unsigned int get_rr_interval_rt(struct rq *rq, struct task_struct *task)
 		return 0;
 }
 
-const struct sched_class rt_sched_class = {
+const struct sched_class rt_sched_class
+	__attribute__((section("__rt_sched_class"))) = {
 	.next			= &fair_sched_class,
 	.enqueue_task		= enqueue_task_rt,
 	.dequeue_task		= dequeue_task_rt,
diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
index 4c9e9975684f..03bc7530ff75 100644
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -115,7 +115,8 @@ static void update_curr_stop(struct rq *rq)
 /*
  * Simple, special scheduling class for the per-CPU stop tasks:
  */
-const struct sched_class stop_sched_class = {
+const struct sched_class stop_sched_class
+	__attribute__((section("__stop_sched_class"))) = {
 	.next			= &dl_sched_class,
 
 	.enqueue_task		= enqueue_task_stop,


* Re: [PATCH RFC] sched: Micro optimization in pick_next_task() and in check_preempt_curr()
  2019-12-19 14:43       ` Steven Rostedt
@ 2019-12-19 14:46         ` Kirill Tkhai
  2019-12-19 14:59           ` Steven Rostedt
  0 siblings, 1 reply; 24+ messages in thread
From: Kirill Tkhai @ 2019-12-19 14:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, mingo, juri.lelli, vincent.guittot,
	dietmar.eggemann, bsegall, mgorman, linux-kernel

On 19.12.2019 17:43, Steven Rostedt wrote:
> On Thu, 19 Dec 2019 17:25:40 +0300
> Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
> 
>> (17:19:25) nbjoerg: but it is not guarenteed behavior
>> (17:19:50) nbjoerg: if for some strange reason you really need to enforce relative orders of global objects, put them in consecutively named sections
> 
> Which appears to work. I tried this patch on top of yours:
> 
> Not sure how this does with locality though.

Hm, I'm not sure, but AFAIR some (all?) sections are aligned to 4K.
Will this bring holes (4K - sizeof(struct sched_class)) into the address
space?


> -- Steve
> 
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index e00f41aa8ec4..ff12a422ff19 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -108,6 +108,13 @@
>  #define SBSS_MAIN .sbss
>  #endif
>  
> +#define SCHED_DATA				\
> +	*(__idle_sched_class)			\
> +	*(__fair_sched_class)			\
> +	*(__rt_sched_class)			\
> +	*(__dl_sched_class)			\
> +	*(__stop_sched_class)
> +
>  /*
>   * Align to a 32 byte boundary equal to the
>   * alignment gcc 4.5 uses for a struct
> @@ -308,6 +315,7 @@
>  #define DATA_DATA							\
>  	*(.xiptext)							\
>  	*(DATA_MAIN)							\
> +	SCHED_DATA							\
>  	*(.ref.data)							\
>  	*(.data..shared_aligned) /* percpu related */			\
>  	MEM_KEEP(init.data*)						\
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 43323f875cb9..5abdbe569f93 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -2428,7 +2428,8 @@ static void prio_changed_dl(struct rq *rq, struct task_struct *p,
>  	}
>  }
>  
> -const struct sched_class dl_sched_class = {
> +const struct sched_class dl_sched_class
> +	__attribute__((section("__dl_sched_class"))) = {
>  	.next			= &rt_sched_class,
>  	.enqueue_task		= enqueue_task_dl,
>  	.dequeue_task		= dequeue_task_dl,
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 08a233e97a01..e745fe0e0cd3 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10745,7 +10745,8 @@ static unsigned int get_rr_interval_fair(struct rq *rq, struct task_struct *task
>  /*
>   * All the scheduling class methods:
>   */
> -const struct sched_class fair_sched_class = {
> +const struct sched_class fair_sched_class
> +	__attribute__((section("__fair_sched_class"))) = {
>  	.next			= &idle_sched_class,
>  	.enqueue_task		= enqueue_task_fair,
>  	.dequeue_task		= dequeue_task_fair,
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index ffa959e91227..700a9c826f0e 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -454,7 +454,8 @@ static void update_curr_idle(struct rq *rq)
>  /*
>   * Simple, special scheduling class for the per-CPU idle tasks:
>   */
> -const struct sched_class idle_sched_class = {
> +const struct sched_class idle_sched_class
> +	__attribute__((section("__idle_sched_class"))) = {
>  	/* .next is NULL */
>  	/* no enqueue/yield_task for idle tasks */
>  
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index e591d40fd645..5d3f9bcddaeb 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -2354,7 +2354,8 @@ static unsigned int get_rr_interval_rt(struct rq *rq, struct task_struct *task)
>  		return 0;
>  }
>  
> -const struct sched_class rt_sched_class = {
> +const struct sched_class rt_sched_class
> +	__attribute__((section("__rt_sched_class"))) = {
>  	.next			= &fair_sched_class,
>  	.enqueue_task		= enqueue_task_rt,
>  	.dequeue_task		= dequeue_task_rt,
> diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
> index 4c9e9975684f..03bc7530ff75 100644
> --- a/kernel/sched/stop_task.c
> +++ b/kernel/sched/stop_task.c
> @@ -115,7 +115,8 @@ static void update_curr_stop(struct rq *rq)
>  /*
>   * Simple, special scheduling class for the per-CPU stop tasks:
>   */
> -const struct sched_class stop_sched_class = {
> +const struct sched_class stop_sched_class
> +	__attribute__((section("__stop_sched_class"))) = {
>  	.next			= &dl_sched_class,
>  
>  	.enqueue_task		= enqueue_task_stop,
> 



* Re: [PATCH RFC] sched: Micro optimization in pick_next_task() and in check_preempt_curr()
  2019-12-19 14:46         ` Kirill Tkhai
@ 2019-12-19 14:59           ` Steven Rostedt
  2019-12-19 15:20             ` [PATCH v2] " Kirill Tkhai
  0 siblings, 1 reply; 24+ messages in thread
From: Steven Rostedt @ 2019-12-19 14:59 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: Peter Zijlstra, mingo, juri.lelli, vincent.guittot,
	dietmar.eggemann, bsegall, mgorman, linux-kernel

On Thu, 19 Dec 2019 17:46:37 +0300
Kirill Tkhai <ktkhai@virtuozzo.com> wrote:

> On 19.12.2019 17:43, Steven Rostedt wrote:
> > On Thu, 19 Dec 2019 17:25:40 +0300
> > Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
> >   
> >> (17:19:25) nbjoerg: but it is not guarenteed behavior
> >> (17:19:50) nbjoerg: if for some strange reason you really need to enforce relative orders of global objects, put them in consecutively named sections  
> > 
> > Which appears to work. I tried this patch on top of yours:
> > 
> > Not sure how this does with locality though.  
> 
> Hm, I'm not sure, but AFAIR some (all?) sections are aligned at 4K.
> Will this bring holes (4K-sizeof(struct sched_class)) in address
> space?

I believe only if you set the align attribute in the linker script.
With this and your patch:

# grep sched_class /proc/kallsyms
ffffffff8e760900 D idle_sched_class
ffffffff8e7609e0 D fair_sched_class
ffffffff8e760ac0 D rt_sched_class
ffffffff8e760ba0 D dl_sched_class
ffffffff8e760c80 D stop_sched_class

-- Steve

> 
> 
> > -- Steve
> > 
> > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> > index e00f41aa8ec4..ff12a422ff19 100644
> > --- a/include/asm-generic/vmlinux.lds.h
> > +++ b/include/asm-generic/vmlinux.lds.h
> > @@ -108,6 +108,13 @@
> >  #define SBSS_MAIN .sbss
> >  #endif
> >  
> > +#define SCHED_DATA				\
> > +	*(__idle_sched_class)			\
> > +	*(__fair_sched_class)			\
> > +	*(__rt_sched_class)			\
> > +	*(__dl_sched_class)			\
> > +	*(__stop_sched_class)
> > +
> >  /*
> >   * Align to a 32 byte boundary equal to the
> >   * alignment gcc 4.5 uses for a struct
> > @@ -308,6 +315,7 @@
> >  #define DATA_DATA							\
> >  	*(.xiptext)							\
> >  	*(DATA_MAIN)							\
> > +	SCHED_DATA							\
> >  	*(.ref.data)							\
> >  	*(.data..shared_aligned) /* percpu related */			\
> >  	MEM_KEEP(init.data*)						\
> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index 43323f875cb9..5abdbe569f93 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -2428,7 +2428,8 @@ static void prio_changed_dl(struct rq *rq, struct task_struct *p,
> >  	}
> >  }
> >  
> > -const struct sched_class dl_sched_class = {
> > +const struct sched_class dl_sched_class
> > +	__attribute__((section("__dl_sched_class"))) = {
> >  	.next			= &rt_sched_class,
> >  	.enqueue_task		= enqueue_task_dl,
> >  	.dequeue_task		= dequeue_task_dl,
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 08a233e97a01..e745fe0e0cd3 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -10745,7 +10745,8 @@ static unsigned int get_rr_interval_fair(struct rq *rq, struct task_struct *task
> >  /*
> >   * All the scheduling class methods:
> >   */
> > -const struct sched_class fair_sched_class = {
> > +const struct sched_class fair_sched_class
> > +	__attribute__((section("__fair_sched_class"))) = {
> >  	.next			= &idle_sched_class,
> >  	.enqueue_task		= enqueue_task_fair,
> >  	.dequeue_task		= dequeue_task_fair,
> > diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> > index ffa959e91227..700a9c826f0e 100644
> > --- a/kernel/sched/idle.c
> > +++ b/kernel/sched/idle.c
> > @@ -454,7 +454,8 @@ static void update_curr_idle(struct rq *rq)
> >  /*
> >   * Simple, special scheduling class for the per-CPU idle tasks:
> >   */
> > -const struct sched_class idle_sched_class = {
> > +const struct sched_class idle_sched_class
> > +	__attribute__((section("__idle_sched_class"))) = {
> >  	/* .next is NULL */
> >  	/* no enqueue/yield_task for idle tasks */
> >  
> > diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> > index e591d40fd645..5d3f9bcddaeb 100644
> > --- a/kernel/sched/rt.c
> > +++ b/kernel/sched/rt.c
> > @@ -2354,7 +2354,8 @@ static unsigned int get_rr_interval_rt(struct rq *rq, struct task_struct *task)
> >  		return 0;
> >  }
> >  
> > -const struct sched_class rt_sched_class = {
> > +const struct sched_class rt_sched_class
> > +	__attribute__((section("__rt_sched_class"))) = {
> >  	.next			= &fair_sched_class,
> >  	.enqueue_task		= enqueue_task_rt,
> >  	.dequeue_task		= dequeue_task_rt,
> > diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
> > index 4c9e9975684f..03bc7530ff75 100644
> > --- a/kernel/sched/stop_task.c
> > +++ b/kernel/sched/stop_task.c
> > @@ -115,7 +115,8 @@ static void update_curr_stop(struct rq *rq)
> >  /*
> >   * Simple, special scheduling class for the per-CPU stop tasks:
> >   */
> > -const struct sched_class stop_sched_class = {
> > +const struct sched_class stop_sched_class
> > +	__attribute__((section("__stop_sched_class"))) = {
> >  	.next			= &dl_sched_class,
> >  
> >  	.enqueue_task		= enqueue_task_stop,
> >   



* [PATCH v2] sched: Micro optimization in pick_next_task() and in check_preempt_curr()
  2019-12-19 14:59           ` Steven Rostedt
@ 2019-12-19 15:20             ` Kirill Tkhai
  2019-12-19 15:40               ` Steven Rostedt
  2019-12-19 16:05               ` Steven Rostedt
  0 siblings, 2 replies; 24+ messages in thread
From: Kirill Tkhai @ 2019-12-19 15:20 UTC (permalink / raw)
  To: Steven Rostedt, Peter Zijlstra
  Cc: mingo, juri.lelli, vincent.guittot, dietmar.eggemann, bsegall,
	mgorman, linux-kernel

From: Kirill Tkhai <ktkhai@virtuozzo.com>

This introduces an optimization based on xxx_sched_class addresses
in two hot scheduler functions: pick_next_task() and check_preempt_curr().

After this patch, it becomes possible to compare pointers to sched classes
to check which of them has the higher priority, instead of the current
iteration using for_each_class().
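
A minimal user-space sketch of the resulting comparison (made-up names, not
the kernel interface): once the class descriptors are laid out in priority
order, "is A a higher-priority class than B?" reduces to a pointer
comparison.

#include <stdio.h>

struct sched_class { const char *name; };

/* Here an array guarantees the address order; in the kernel the same
 * guarantee comes from the SCHED_DATA section layout in the patch below. */
static const struct sched_class classes[] = {
        { "idle" }, { "fair" }, { "rt" }, { "dl" }, { "stop" },
};

static int sched_class_above(const struct sched_class *a,
                             const struct sched_class *b)
{
        return a > b;
}

int main(void)
{
        printf("rt above fair: %d\n",
               sched_class_above(&classes[2], &classes[1])); /* prints 1 */
        return 0;
}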

One more result of the patch is that the object file becomes a little
smaller (excluding the added BUG_ON(), which goes into the __init section):

$ size kernel/sched/core.o
            text     data    bss    dec    hex  filename
before:    66446    18957    676  86079  1503f  kernel/sched/core.o
after:     66398    18957    676  86031  1500f  kernel/sched/core.o

The SCHED_DATA improvements guaranteeing the order of the sched classes
were made by Steven Rostedt <rostedt@goodmis.org>.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>

v2: Use Steven's data-section ordering. The hunk with the comment in the Makefile has been removed.
---
 include/asm-generic/vmlinux.lds.h |    8 ++++++++
 kernel/sched/core.c               |   24 +++++++++---------------
 kernel/sched/deadline.c           |    3 ++-
 kernel/sched/fair.c               |    3 ++-
 kernel/sched/idle.c               |    3 ++-
 kernel/sched/rt.c                 |    3 ++-
 kernel/sched/stop_task.c          |    3 ++-
 7 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index e00f41aa8ec4..ff12a422ff19 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -108,6 +108,13 @@
 #define SBSS_MAIN .sbss
 #endif
 
+#define SCHED_DATA				\
+	*(__idle_sched_class)			\
+	*(__fair_sched_class)			\
+	*(__rt_sched_class)			\
+	*(__dl_sched_class)			\
+	*(__stop_sched_class)
+
 /*
  * Align to a 32 byte boundary equal to the
  * alignment gcc 4.5 uses for a struct
@@ -308,6 +315,7 @@
 #define DATA_DATA							\
 	*(.xiptext)							\
 	*(DATA_MAIN)							\
+	SCHED_DATA							\
 	*(.ref.data)							\
 	*(.data..shared_aligned) /* percpu related */			\
 	MEM_KEEP(init.data*)						\
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 15508c202bf5..befdd7158b27 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1416,20 +1416,10 @@ static inline void check_class_changed(struct rq *rq, struct task_struct *p,
 
 void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
 {
-	const struct sched_class *class;
-
-	if (p->sched_class == rq->curr->sched_class) {
+	if (p->sched_class == rq->curr->sched_class)
 		rq->curr->sched_class->check_preempt_curr(rq, p, flags);
-	} else {
-		for_each_class(class) {
-			if (class == rq->curr->sched_class)
-				break;
-			if (class == p->sched_class) {
-				resched_curr(rq);
-				break;
-			}
-		}
-	}
+	else if (p->sched_class > rq->curr->sched_class)
+		resched_curr(rq);
 
 	/*
 	 * A queue event has occurred, and we're going to schedule.  In
@@ -3914,8 +3904,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 	 * higher scheduling class, because otherwise those loose the
 	 * opportunity to pull in more work from other CPUs.
 	 */
-	if (likely((prev->sched_class == &idle_sched_class ||
-		    prev->sched_class == &fair_sched_class) &&
+	if (likely(prev->sched_class <= &fair_sched_class &&
 		   rq->nr_running == rq->cfs.h_nr_running)) {
 
 		p = pick_next_task_fair(rq, prev, rf);
@@ -6569,6 +6558,11 @@ void __init sched_init(void)
 	unsigned long ptr = 0;
 	int i;
 
+	BUG_ON(&idle_sched_class > &fair_sched_class ||
+		&fair_sched_class > &rt_sched_class ||
+		&rt_sched_class > &dl_sched_class ||
+		&dl_sched_class > &stop_sched_class);
+
 	wait_bit_init();
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 43323f875cb9..5abdbe569f93 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2428,7 +2428,8 @@ static void prio_changed_dl(struct rq *rq, struct task_struct *p,
 	}
 }
 
-const struct sched_class dl_sched_class = {
+const struct sched_class dl_sched_class
+	__attribute__((section("__dl_sched_class"))) = {
 	.next			= &rt_sched_class,
 	.enqueue_task		= enqueue_task_dl,
 	.dequeue_task		= dequeue_task_dl,
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8da0222924cf..9379b3804582 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10760,7 +10760,8 @@ static unsigned int get_rr_interval_fair(struct rq *rq, struct task_struct *task
 /*
  * All the scheduling class methods:
  */
-const struct sched_class fair_sched_class = {
+const struct sched_class fair_sched_class
+	__attribute__((section("__fair_sched_class"))) = {
 	.next			= &idle_sched_class,
 	.enqueue_task		= enqueue_task_fair,
 	.dequeue_task		= dequeue_task_fair,
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index ffa959e91227..700a9c826f0e 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -454,7 +454,8 @@ static void update_curr_idle(struct rq *rq)
 /*
  * Simple, special scheduling class for the per-CPU idle tasks:
  */
-const struct sched_class idle_sched_class = {
+const struct sched_class idle_sched_class
+	__attribute__((section("__idle_sched_class"))) = {
 	/* .next is NULL */
 	/* no enqueue/yield_task for idle tasks */
 
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index e591d40fd645..5d3f9bcddaeb 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2354,7 +2354,8 @@ static unsigned int get_rr_interval_rt(struct rq *rq, struct task_struct *task)
 		return 0;
 }
 
-const struct sched_class rt_sched_class = {
+const struct sched_class rt_sched_class
+	__attribute__((section("__rt_sched_class"))) = {
 	.next			= &fair_sched_class,
 	.enqueue_task		= enqueue_task_rt,
 	.dequeue_task		= dequeue_task_rt,
diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
index 4c9e9975684f..03bc7530ff75 100644
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -115,7 +115,8 @@ static void update_curr_stop(struct rq *rq)
 /*
  * Simple, special scheduling class for the per-CPU stop tasks:
  */
-const struct sched_class stop_sched_class = {
+const struct sched_class stop_sched_class
+	__attribute__((section("__stop_sched_class"))) = {
 	.next			= &dl_sched_class,
 
 	.enqueue_task		= enqueue_task_stop,


* Re: [Q] ld: Does LTO reorder ro variables in two files?
  2019-12-19 14:04   ` [Q] ld: Does LTO reorder ro variables in two files? Kirill Tkhai
@ 2019-12-19 15:21     ` Jeff Law
  2019-12-19 15:30       ` Kirill Tkhai
  2019-12-19 15:45     ` Alexander Monakov
  1 sibling, 1 reply; 24+ messages in thread
From: Jeff Law @ 2019-12-19 15:21 UTC (permalink / raw)
  To: Kirill Tkhai, Peter Zijlstra, gcc-help
  Cc: mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, linux-kernel

On Thu, 2019-12-19 at 17:04 +0300, Kirill Tkhai wrote:
> CC: gcc-help@gcc.gnu.org
> 
> Hi, gcc guys,
> 
> this thread starts here: https://lkml.org/lkml/2019/12/19/403
> 
> There are two const variables:
> 
>    struct sched_class idle_sched_class
> and
>    struct sched_class fair_sched_class,
> 
> which are declared in two files idle.c and fair.c.
> 
> 1)In Makefile the order is: idle.o fair.o
> 2)the variables go to the same ro section
> 3)there is no SORT(.*) keyword in linker script.
> 
> Is it always true, that after linkage &idle_sched_class < &fair_sched_class?
I certainly wouldn't depend on it.   The first and most obvious problem
is symbol sorting by the linker.  Longer term I'd be worried about LTO
reordering things.

In the end I'm pretty sure it'd be well outside what I'd be comfortable
depending on.

jeff
> 



* Re: [Q] ld: Does LTO reorder ro variables in two files?
  2019-12-19 15:21     ` Jeff Law
@ 2019-12-19 15:30       ` Kirill Tkhai
  0 siblings, 0 replies; 24+ messages in thread
From: Kirill Tkhai @ 2019-12-19 15:30 UTC (permalink / raw)
  To: law, Peter Zijlstra, gcc-help
  Cc: mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, linux-kernel

On 19.12.2019 18:21, Jeff Law wrote:
> On Thu, 2019-12-19 at 17:04 +0300, Kirill Tkhai wrote:
>> CC: gcc-help@gcc.gnu.org
>>
>> Hi, gcc guys,
>>
>> this thread starts here: https://lkml.org/lkml/2019/12/19/403
>>
>> There are two const variables:
>>
>>    struct sched_class idle_sched_class
>> and
>>    struct sched_class fair_sched_class,
>>
>> which are declared in two files idle.c and fair.c.
>>
>> 1)In Makefile the order is: idle.o fair.o
>> 2)the variables go to the same ro section
>> 3)there is no SORT(.*) keyword in linker script.
>>
>> Is it always true, that after linkage &idle_sched_class < &fair_sched_class?
> I certainly wouldn't depend on it.   The first and most obvious problem
> is symbol sorting by the linker.  Longer term I'd be worried about LTO
> reordering things.
> 
> In the end I'm pretty sure it'd be well outside what I'd be comfortable
> depending on.

Ok, I wouldn't be comfortable depending on it either :) Thanks for the clarification, Jeff.


* Re: [PATCH v2] sched: Micro optimization in pick_next_task() and in check_preempt_curr()
  2019-12-19 15:20             ` [PATCH v2] " Kirill Tkhai
@ 2019-12-19 15:40               ` Steven Rostedt
  2019-12-19 16:08                 ` Kirill Tkhai
  2019-12-19 16:05               ` Steven Rostedt
  1 sibling, 1 reply; 24+ messages in thread
From: Steven Rostedt @ 2019-12-19 15:40 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: Peter Zijlstra, mingo, juri.lelli, vincent.guittot,
	dietmar.eggemann, bsegall, mgorman, linux-kernel

On Thu, 19 Dec 2019 18:20:58 +0300
Kirill Tkhai <ktkhai@virtuozzo.com> wrote:

> From: Kirill Tkhai <ktkhai@virtuozzo.com>
> 
> This introduces an optimization based on xxx_sched_class addresses
> in two hot scheduler functions: pick_next_task() and check_preempt_curr().
> 
> After this patch, it will be possible to compare pointers to sched classes
> to check, which of them has a higher priority, instead of current iterations
> using for_each_class().
> 
> One more result of the patch is that size of object file becomes a little
> less (excluding added BUG_ON(), which goes in __init section):
> 
> $size kernel/sched/core.o
>          text     data      bss	    dec	    hex	filename
> before:  66446    18957	    676	  86079	  1503f	kernel/sched/core.o
> after:   66398    18957	    676	  86031	  1500f	kernel/sched/core.o
> 
> SCHED_DATA improvements guaranteeing order of sched classes are made
> by Steven Rostedt <rostedt@goodmis.org>

For the above changes, you can add:

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

-- Steve

> 
> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> 
> v2: Steven's data sections ordering. Hunk with comment in Makefile is removed.
> ---
>  include/asm-generic/vmlinux.lds.h |    8 ++++++++
>  kernel/sched/core.c               |   24 +++++++++---------------
>  kernel/sched/deadline.c           |    3 ++-
>  kernel/sched/fair.c               |    3 ++-
>  kernel/sched/idle.c               |    3 ++-
>  kernel/sched/rt.c                 |    3 ++-
>  kernel/sched/stop_task.c          |    3 ++-
>  7 files changed, 27 insertions(+), 20 deletions(-)
> 
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index e00f41aa8ec4..ff12a422ff19 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -108,6 +108,13 @@
>  #define SBSS_MAIN .sbss
>  #endif
>  


* Re: [Q] ld: Does LTO reorder ro variables in two files?
  2019-12-19 14:04   ` [Q] ld: Does LTO reorder ro variables in two files? Kirill Tkhai
  2019-12-19 15:21     ` Jeff Law
@ 2019-12-19 15:45     ` Alexander Monakov
  2019-12-19 16:08       ` Kirill Tkhai
  1 sibling, 1 reply; 24+ messages in thread
From: Alexander Monakov @ 2019-12-19 15:45 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: Peter Zijlstra, gcc-help, mingo, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, linux-kernel,
	Jan Hubicka

[adding Jan Hubicka, GCC LTO maintainer]

On Thu, 19 Dec 2019, Kirill Tkhai wrote:

> CC: gcc-help@gcc.gnu.org
> 
> Hi, gcc guys,
> 
> this thread starts here: https://lkml.org/lkml/2019/12/19/403
> 
> There are two const variables:
> 
>    struct sched_class idle_sched_class
> and
>    struct sched_class fair_sched_class,
> 
> which are declared in two files idle.c and fair.c.
> 
> 1)In Makefile the order is: idle.o fair.o
> 2)the variables go to the same ro section
> 3)there is no SORT(.*) keyword in linker script.
> 
> Is it always true, that after linkage &idle_sched_class < &fair_sched_class?

No, with LTO you don't have that guarantee. For functions it's more obvious,
GCC wants to analyze functions in reverse topological order so callees are
generally optimized before callers, and it will emit assembly as it goes, so
function ordering with LTO does not give much care to translation unit
boundaries. For variables it's a bit more subtle, GCC partitions all variables
and functions so it can hand them off to multiple compiler processes while doing
LTO. There's no guarantees about order of variables that end up in different
partitions.

There's __attribute__((no_reorder)) that is intended to enforce ordering even
with LTO (it's documented under "Common function attributes" but works for
global variables as well).
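
A hedged user-space sketch of the no_reorder attribute mentioned above
(GCC-specific; file and symbol names are made up for the example):

/* a.c */
const int first_class __attribute__((no_reorder)) = 1;

/* b.c */
const int second_class __attribute__((no_reorder)) = 2;

/* main.c -- build: gcc -O2 -flto -o demo main.c a.c b.c */
#include <stdint.h>
#include <stdio.h>

extern const int first_class, second_class;

int main(void)
{
        /* no_reorder asks GCC/LTO not to reorder the marked globals
         * against each other, so with a.o linked before b.o this is
         * expected to print 1. */
        printf("%d\n", (uintptr_t)&first_class < (uintptr_t)&second_class);
        return 0;
}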

Alexander

> Thanks!
> Kirill
> 
> On 19.12.2019 16:12, Peter Zijlstra wrote:
> > On Thu, Dec 19, 2019 at 03:39:14PM +0300, Kirill Tkhai wrote:
> >> In kernel/sched/Makefile files, describing different sched classes, already
> >> go in the order from the lowest priority class to the highest priority class:
> >>
> >> idle.o fair.o rt.o deadline.o stop_task.o
> >>
> >> The documentation of GNU linker says, that section appears in the order
> >> they are seen during link time (see [1]):
> >>
> >>> Normally, the linker will place files and sections matched by wildcards
> >>> in the order in which they are seen during the link. You can change this
> >>> by using the SORT keyword, which appears before a wildcard pattern
> >>> in parentheses (e.g., SORT(.text*)).
> >>
> >> So, we may expect const variables from idle.o will go before ro variables
> >> from fair.o in RO_DATA section, while ro variables from fair.o will go
> >> before ro variables from rt.o, etc.
> >>
> >> (Also, it looks like the linking order is already used in kernel, e.g.
> >>  in drivers/md/Makefile)
> >>
> >> Thus, we may introduce an optimization based on xxx_sched_class addresses
> >> in these two hot scheduler functions: pick_next_task() and check_preempt_curr().
> >>
> >> One more result of the patch is that size of object file becomes a little
> >> less (excluding added BUG_ON(), which goes in __init section):
> >>
> >> $size kernel/sched/core.o
> >>          text     data      bss	    dec	    hex	filename
> >> before:  66446    18957	    676	  86079	  1503f	kernel/sched/core.o
> >> after:   66398    18957	    676	  86031	  1500f	kernel/sched/core.o
> > 
> > Does LTO preserve this behaviour? I've never quite dared do this exact
> > optimization.
> 
> 


* Re: [PATCH v2] sched: Micro optimization in pick_next_task() and in check_preempt_curr()
  2019-12-19 15:20             ` [PATCH v2] " Kirill Tkhai
  2019-12-19 15:40               ` Steven Rostedt
@ 2019-12-19 16:05               ` Steven Rostedt
  1 sibling, 0 replies; 24+ messages in thread
From: Steven Rostedt @ 2019-12-19 16:05 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: Peter Zijlstra, mingo, juri.lelli, vincent.guittot,
	dietmar.eggemann, bsegall, mgorman, linux-kernel

On Thu, 19 Dec 2019 18:20:58 +0300
Kirill Tkhai <ktkhai@virtuozzo.com> wrote:

> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index e00f41aa8ec4..ff12a422ff19 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -108,6 +108,13 @@
>  #define SBSS_MAIN .sbss
>  #endif

I would also add a comment here:

/*
 * The order of the sched class addresses is important, as it is
 * used to determine the priority of each sched class relative
 * to the others.
 */

>  
> +#define SCHED_DATA				\
> +	*(__idle_sched_class)			\
> +	*(__fair_sched_class)			\
> +	*(__rt_sched_class)			\
> +	*(__dl_sched_class)			\
> +	*(__stop_sched_class)
> +
>  /*
>   * Align to a 32 byte boundary equal to the
>   * alignment gcc 4.5 uses for a struct
> @@ -308,6 +315,7 @@
>  #define DATA_DATA							\
>  	*(.xiptext)							\
>  	*(DATA_MAIN)							\
> +	SCHED_DATA							\
>  	*(.ref.data)							\
>  	*(.data..shared_aligned) /* percpu related */			\
>  	MEM_KEEP(init.data*)						\


* Re: [PATCH v2] sched: Micro optimization in pick_next_task() and in check_preempt_curr()
  2019-12-19 15:40               ` Steven Rostedt
@ 2019-12-19 16:08                 ` Kirill Tkhai
  2019-12-19 16:22                   ` Steven Rostedt
  0 siblings, 1 reply; 24+ messages in thread
From: Kirill Tkhai @ 2019-12-19 16:08 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, mingo, juri.lelli, vincent.guittot,
	dietmar.eggemann, bsegall, mgorman, linux-kernel

On 19.12.2019 18:40, Steven Rostedt wrote:
> On Thu, 19 Dec 2019 18:20:58 +0300
> Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
> 
>> From: Kirill Tkhai <ktkhai@virtuozzo.com>
>>
>> This introduces an optimization based on xxx_sched_class addresses
>> in two hot scheduler functions: pick_next_task() and check_preempt_curr().
>>
>> After this patch, it will be possible to compare pointers to sched classes
>> to check, which of them has a higher priority, instead of current iterations
>> using for_each_class().
>>
>> One more result of the patch is that size of object file becomes a little
>> less (excluding added BUG_ON(), which goes in __init section):
>>
>> $size kernel/sched/core.o
>>          text     data      bss	    dec	    hex	filename
>> before:  66446    18957	    676	  86079	  1503f	kernel/sched/core.o
>> after:   66398    18957	    676	  86031	  1500f	kernel/sched/core.o
>>
>> SCHED_DATA improvements guaranteeing order of sched classes are made
>> by Steven Rostedt <rostedt@goodmis.org>
> 
> For the above changes, you can add:
> 
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

Should I resend this as two patches, with your changes in a separate?

> 
>>
>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
>>
>> v2: Steven's data sections ordering. Hunk with comment in Makefile is removed.
>> ---
>>  include/asm-generic/vmlinux.lds.h |    8 ++++++++
>>  kernel/sched/core.c               |   24 +++++++++---------------
>>  kernel/sched/deadline.c           |    3 ++-
>>  kernel/sched/fair.c               |    3 ++-
>>  kernel/sched/idle.c               |    3 ++-
>>  kernel/sched/rt.c                 |    3 ++-
>>  kernel/sched/stop_task.c          |    3 ++-
>>  7 files changed, 27 insertions(+), 20 deletions(-)
>>
>> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
>> index e00f41aa8ec4..ff12a422ff19 100644
>> --- a/include/asm-generic/vmlinux.lds.h
>> +++ b/include/asm-generic/vmlinux.lds.h
>> @@ -108,6 +108,13 @@
>>  #define SBSS_MAIN .sbss
>>  #endif
>>  


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Q] ld: Does LTO reorder ro variables in two files?
  2019-12-19 15:45     ` Alexander Monakov
@ 2019-12-19 16:08       ` Kirill Tkhai
  0 siblings, 0 replies; 24+ messages in thread
From: Kirill Tkhai @ 2019-12-19 16:08 UTC (permalink / raw)
  To: Alexander Monakov
  Cc: Peter Zijlstra, gcc-help, mingo, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, linux-kernel,
	Jan Hubicka

On 19.12.2019 18:45, Alexander Monakov wrote:
> [adding Jan Hubicka, GCC LTO maintainer]
> 
> On Thu, 19 Dec 2019, Kirill Tkhai wrote:
> 
>> CC: gcc-help@gcc.gnu.org
>>
>> Hi, gcc guys,
>>
>> this thread starts here: https://lkml.org/lkml/2019/12/19/403
>>
>> There are two const variables:
>>
>>    struct sched_class idle_sched_class
>> and
>>    struct sched_class fair_sched_class,
>>
>> which are declared in two files idle.c and fair.c.
>>
>> 1) In the Makefile the order is: idle.o fair.o
>> 2) the variables go into the same read-only (ro) section
>> 3) there is no SORT(.*) keyword in the linker script.
>>
>> Is it always true that, after linking, &idle_sched_class < &fair_sched_class?
> 
> No, with LTO you don't have that guarantee. For functions it's more obvious:
> GCC wants to analyze functions in reverse topological order so that callees are
> generally optimized before callers, and it emits assembly as it goes, so
> function ordering with LTO pays little attention to translation unit
> boundaries. For variables it's a bit more subtle: GCC partitions all variables
> and functions so it can hand them off to multiple compiler processes while doing
> LTO. There are no guarantees about the order of variables that end up in
> different partitions.
> 
> There's __attribute__((no_reorder)) that is intended to enforce ordering even
> with LTO (it's documented under "Common function attributes" but works for
> global variables as well).

Thanks, Alexander!

Kirill
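
For illustration, a minimal sketch of the attribute Alexander points at, with made-up variable names rather than anything from the patches in this thread:

/* no_reorder_demo.c - hypothetical example.  GCC documents
 * __attribute__((no_reorder)) as keeping the marked functions and
 * variables from being reordered against each other, which is what an
 * address-based priority comparison needs in order to survive LTO. */
struct demo_class { int prio; };

const struct demo_class idle_demo __attribute__((no_reorder)) = { .prio = 0 };
const struct demo_class fair_demo __attribute__((no_reorder)) = { .prio = 1 };

int idle_below_fair(void)
{
	/* Meaningful only if the definition order above is preserved. */
	return &idle_demo < &fair_demo;
}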

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] sched: Micro optimization in pick_next_task() and in check_preempt_curr()
  2019-12-19 16:08                 ` Kirill Tkhai
@ 2019-12-19 16:22                   ` Steven Rostedt
  2019-12-19 20:16                     ` Kirill Tkhai
  0 siblings, 1 reply; 24+ messages in thread
From: Steven Rostedt @ 2019-12-19 16:22 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: Peter Zijlstra, mingo, juri.lelli, vincent.guittot,
	dietmar.eggemann, bsegall, mgorman, linux-kernel

On Thu, 19 Dec 2019 19:08:05 +0300
Kirill Tkhai <ktkhai@virtuozzo.com> wrote:

> Should I resend this as two patches, with your changes in a separate one?

You don't have to; you can include multiple SOBs if a patch was written
by two people.

But perhaps it will be better to do so; that way people will know who to
blame when the linker breaks ;-)

I'll send you a patch that you can apply just before your change. That
may be the cleanest way.

-- Steve

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] sched: Micro optimization in pick_next_task() and in check_preempt_curr()
  2019-12-19 16:22                   ` Steven Rostedt
@ 2019-12-19 20:16                     ` Kirill Tkhai
  2019-12-19 20:20                       ` Steven Rostedt
  0 siblings, 1 reply; 24+ messages in thread
From: Kirill Tkhai @ 2019-12-19 20:16 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, mingo, juri.lelli, vincent.guittot,
	dietmar.eggemann, bsegall, mgorman, linux-kernel

On 19.12.2019 19:22, Steven Rostedt wrote:
> On Thu, 19 Dec 2019 19:08:05 +0300
> Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
> 
>> Should I resend this as two patches, with your changes in a separate one?
> 
> You don't have to; you can include multiple SOBs if a patch was written
> by two people.
> 
> But perhaps it will be better to do so; that way people will know who to
> blame when the linker breaks ;-)
> 
> I'll send you a patch that you can apply just before your change. That
> may be the cleanest way.

Two small patches look better than one huge one, so I prefer to send a patch
on top of yours :)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] sched: Micro optimization in pick_next_task() and in check_preempt_curr()
  2019-12-19 20:16                     ` Kirill Tkhai
@ 2019-12-19 20:20                       ` Steven Rostedt
  0 siblings, 0 replies; 24+ messages in thread
From: Steven Rostedt @ 2019-12-19 20:20 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: Peter Zijlstra, mingo, juri.lelli, vincent.guittot,
	dietmar.eggemann, bsegall, mgorman, linux-kernel

On Thu, 19 Dec 2019 20:16:17 +0000
Kirill Tkhai <ktkhai@virtuozzo.com> wrote:

> Two small patches look better than one huge one, so I prefer to send a patch
> on top of yours :)

My one patch has turned into 3 patches. Having the sorted structs gave
me some more crazy ideas. I'll post the series shortly ;-)

-- Steve

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [tip: sched/core] sched: Force the address order of each sched class descriptor
  2019-12-19 12:39 [PATCH RFC] sched: Micro optimization in pick_next_task() and in check_preempt_curr() Kirill Tkhai
  2019-12-19 13:12 ` Peter Zijlstra
  2019-12-19 13:50 ` [PATCH RFC] sched: Micro optimization in pick_next_task() and in check_preempt_curr() Steven Rostedt
@ 2020-06-25 11:53 ` tip-bot2 for Steven Rostedt (VMware)
  2 siblings, 0 replies; 24+ messages in thread
From: tip-bot2 for Steven Rostedt (VMware) @ 2020-06-25 11:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Steven Rostedt (VMware), Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     590d69796346353878b275c5512c664e3f875f24
Gitweb:        https://git.kernel.org/tip/590d69796346353878b275c5512c664e3f875f24
Author:        Steven Rostedt (VMware) <rostedt@goodmis.org>
AuthorDate:    Thu, 19 Dec 2019 16:44:52 -05:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 25 Jun 2020 13:45:43 +02:00

sched: Force the address order of each sched class descriptor

In order to make a micro optimization in pick_next_task(), the sched class
descriptor addresses must be in the same order as their priorities relative
to each other. That is:

 &idle_sched_class < &fair_sched_class < &rt_sched_class <
 &dl_sched_class < &stop_sched_class

In order to guarantee this order of the sched class descriptors, add each
one to its own data section and force the order in the linker script.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/157675913272.349305.8936736338884044103.stgit@localhost.localdomain
---
 include/asm-generic/vmlinux.lds.h | 13 +++++++++++++
 kernel/sched/deadline.c           |  3 ++-
 kernel/sched/fair.c               |  3 ++-
 kernel/sched/idle.c               |  3 ++-
 kernel/sched/rt.c                 |  3 ++-
 kernel/sched/stop_task.c          |  3 ++-
 6 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index db600ef..2186d7b 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -109,6 +109,18 @@
 #endif
 
 /*
+ * The order of the sched class addresses is important, as they are
+ * used to determine the order of the priority of each sched class in
+ * relation to each other.
+ */
+#define SCHED_DATA				\
+	*(__idle_sched_class)			\
+	*(__fair_sched_class)			\
+	*(__rt_sched_class)			\
+	*(__dl_sched_class)			\
+	*(__stop_sched_class)
+
+/*
  * Align to a 32 byte boundary equal to the
  * alignment gcc 4.5 uses for a struct
  */
@@ -388,6 +400,7 @@
 	.rodata           : AT(ADDR(.rodata) - LOAD_OFFSET) {		\
 		__start_rodata = .;					\
 		*(.rodata) *(.rodata.*)					\
+		SCHED_DATA						\
 		RO_AFTER_INIT_DATA	/* Read only after init */	\
 		. = ALIGN(8);						\
 		__start___tracepoints_ptrs = .;				\
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index d4708e2..d9e7946 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2479,7 +2479,8 @@ static void prio_changed_dl(struct rq *rq, struct task_struct *p,
 	}
 }
 
-const struct sched_class dl_sched_class = {
+const struct sched_class dl_sched_class
+	__attribute__((section("__dl_sched_class"))) = {
 	.next			= &rt_sched_class,
 	.enqueue_task		= enqueue_task_dl,
 	.dequeue_task		= dequeue_task_dl,
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0424a0a..3365f6b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11122,7 +11122,8 @@ static unsigned int get_rr_interval_fair(struct rq *rq, struct task_struct *task
 /*
  * All the scheduling class methods:
  */
-const struct sched_class fair_sched_class = {
+const struct sched_class fair_sched_class
+	__attribute__((section("__fair_sched_class"))) = {
 	.next			= &idle_sched_class,
 	.enqueue_task		= enqueue_task_fair,
 	.dequeue_task		= dequeue_task_fair,
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 8d75ca2..f580629 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -453,7 +453,8 @@ static void update_curr_idle(struct rq *rq)
 /*
  * Simple, special scheduling class for the per-CPU idle tasks:
  */
-const struct sched_class idle_sched_class = {
+const struct sched_class idle_sched_class
+	__attribute__((section("__idle_sched_class"))) = {
 	/* .next is NULL */
 	/* no enqueue/yield_task for idle tasks */
 
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index f395ddb..6543d44 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2429,7 +2429,8 @@ static unsigned int get_rr_interval_rt(struct rq *rq, struct task_struct *task)
 		return 0;
 }
 
-const struct sched_class rt_sched_class = {
+const struct sched_class rt_sched_class
+	__attribute__((section("__rt_sched_class"))) = {
 	.next			= &fair_sched_class,
 	.enqueue_task		= enqueue_task_rt,
 	.dequeue_task		= dequeue_task_rt,
diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
index 3e50a6a..f4bbd54 100644
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -109,7 +109,8 @@ static void update_curr_stop(struct rq *rq)
 /*
  * Simple, special scheduling class for the per-CPU stop tasks:
  */
-const struct sched_class stop_sched_class = {
+const struct sched_class stop_sched_class
+	__attribute__((section("__stop_sched_class"))) = {
 	.next			= &dl_sched_class,
 
 	.enqueue_task		= enqueue_task_stop,
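
The patch descriptions earlier in the thread mention a BUG_ON() in sched_init() that checks the resulting layout; a minimal sketch of such a boot-time sanity check (the exact form used in the merged series may differ) could look like:

/* kernel/sched/core.c (sketch): verify at boot that the linker laid the
 * descriptors out in the order SCHED_DATA is supposed to guarantee.
 * Assumed to be called from sched_init(). */
static void __init check_sched_class_order(void)
{
	BUG_ON(!(&idle_sched_class < &fair_sched_class &&
		 &fair_sched_class < &rt_sched_class  &&
		 &rt_sched_class   < &dl_sched_class  &&
		 &dl_sched_class   < &stop_sched_class));
}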

^ permalink raw reply related	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2020-06-25 11:53 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-19 12:39 [PATCH RFC] sched: Micro optimization in pick_next_task() and in check_preempt_curr() Kirill Tkhai
2019-12-19 13:12 ` Peter Zijlstra
2019-12-19 14:02   ` Peter Zijlstra
2019-12-19 14:25     ` Kirill Tkhai
2019-12-19 14:31       ` Kirill Tkhai
2019-12-19 14:43       ` Steven Rostedt
2019-12-19 14:46         ` Kirill Tkhai
2019-12-19 14:59           ` Steven Rostedt
2019-12-19 15:20             ` [PATCH v2] " Kirill Tkhai
2019-12-19 15:40               ` Steven Rostedt
2019-12-19 16:08                 ` Kirill Tkhai
2019-12-19 16:22                   ` Steven Rostedt
2019-12-19 20:16                     ` Kirill Tkhai
2019-12-19 20:20                       ` Steven Rostedt
2019-12-19 16:05               ` Steven Rostedt
2019-12-19 14:04   ` [Q] ld: Does LTO reorder ro variables in two files? Kirill Tkhai
2019-12-19 15:21     ` Jeff Law
2019-12-19 15:30       ` Kirill Tkhai
2019-12-19 15:45     ` Alexander Monakov
2019-12-19 16:08       ` Kirill Tkhai
2019-12-19 13:50 ` [PATCH RFC] sched: Micro optimization in pick_next_task() and in check_preempt_curr() Steven Rostedt
2019-12-19 13:53   ` Kirill Tkhai
2019-12-19 13:53   ` Peter Zijlstra
2020-06-25 11:53 ` [tip: sched/core] sched: Force the address order of each sched class descriptor tip-bot2 for Steven Rostedt (VMware)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).