* [ANNOUNCE] LinSched for v3.3-rc7
@ 2012-03-15  3:58 Paul Turner
  2012-03-15  4:08 ` Dhaval Giani
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Paul Turner @ 2012-03-15  3:58 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Paul McKenney, Benjamin Segall,
	Ranjit Manomohan, Nikhil Rao, jmc, Dhaval Giani, Suresh Siddha,
	Srivatsa Vaddagiri
  Cc: LKML

Hi All,

[ Take 2, gmail tried to insert a non text/plain component into the last email .. ]

Quick start version:

Available under linsched-alpha at:
  git://git.kernel.org/pub/scm/linux/kernel/git/pjt/linsched.git  .linsched

NOTE: The branch history is still subject to some revision as I am
still re-partitioning some of the patches.  Once this is complete, I
will promote linsched-alpha into a linsched branch at which point it
will no longer be subject to history re-writes.

After checking out the code:
cd tools/linsched
make
cd tests
./run_tests.sh basic_tests
<< then try changing some scheduler parameters, e.g. sched_latency,
and repeating >>

(Note:  The basic_tests are unit-tests; they are calibrated to the
current scheduler tunables and should strictly be considered sanity
tests.  Please see the mcarlo-sim work for a more useful testing
environment.)

Extended version:

First of all, apologies for the delay in posting this -- I know there's
been a lot of interest.  We made the choice to first rebase to v3.3,
since the fairly extensive changes there, especially within the
scheduler, gave us the opportunity to significantly clean up some of
the LinSched code.  (For example, previously we were processing
kernel/sched* using awk as a Makefile step so that we could extract
the necessary structure information without modifying sched.c!)  While
the code benefited greatly from this, several other changes required
fairly extensive modification along the way (and in the meantime the
v3.1 version became less representative due to the extent of the above
changes), which pushed things out much further than I would have
liked.  I suppose the moral of the story is: always release early, and
often.

That said, I'm relatively happy with the current state of integration.
There are certainly some specific areas that can still be greatly
improved (in particular, the main simulator loop has not had as much
attention paid to it as the LinSched<>Kernel interactions, and there's
a long list of TODOs there), but things are now mated fairly cleanly
through the use of a new LinSched architecture.  This is a total
re-write of almost all LinSched<>Kernel interactions versus the
previous (2.6.35) version, and has allowed us to now carry almost zero
modifications against the kernel source.  It's possible both to
develop/test in place and to remain patch compatible.  The remaining
touch-points now total just 20 lines!  Half of these are likely
mergeable, with the other 10 lines being more LinSched specific at
this point in time.  I've broken these down below:

The total damage:
 include/linux/init.h      |    6 ++++++
 include/linux/rcupdate.h  |    3 +++
 kernel/pid.c              |    4 ++++
 kernel/sched/fair.c       |    2 +-
 kernel/sched/stats.c      |    2 +-
 kernel/time/timekeeping.c |    3 ++-
 6 files changed, 17 insertions(+), 3 deletions(-)

Notes on the above:
- include/linux/init.h: linsched ugliness, unfortunately necessary
until we boot-strap proper initcall support.
- include/linux/rcupdate.h: only necessary to allow -O0 compilation,
which is extremely handy for analyzing the scheduler using gdb.
- kernel/pid.c: linsched ugliness, these can go eventually.
- kernel/sched/fair.c: this is just the promotion of 1 structure and
function from static state which weren't published in the sched/
re-factoring that we need from within the simulator.
- kernel/time/timekeeping.c: this fixes a time-dilation error due to
rounding when our clock-source has ns-resolution, e.g. shift==1.

Summarized changes vs 2.6.35 (previous version):

- The original LinSched (understandably) simplified many of the kernel
interactions in order to make simulation easier.  Unfortunately, this
has serious side-effects on the accuracy of simulation.  We've now
introduced a large portion of this state, including: irq and soft-irq
contexts (we now perform periodic load-balance out of SCHED_SOFTIRQ
for example), support for active load-balancing, correctly modeled
nohz interactions, ipi and stop-task support.

- Support for record and replay of application scheduling via perf.
This is not yet well integrated, but the tools to record an
application's behavior using perf sched record, and then play it back
in the simulator, exist under tests.

- Load-balancer scoring.  This one is a very promising new avenue for
load-balancer testing.  We analyzed several workloads and found that
they could be well-modeled using a log-normal distribution.
Parameterizing these models then allows us to construct a large (500)
test-case set of randomly generated workloads that behave similarly.
By integrating the variance between the current load-balance and an
offline computed (currently greedy first-fit) balance we're able to
automatically identify and score an approximation of our distance from
an ideal load-balance.  Historically, such scores have been very
difficult to interpret; that's where our ability to generate a large
set of test-cases comes in.  It lets us exploit a nice property: it's
much easier to design a scoring function that diverges (in this case
the variance) than a nice stable one that converges.  We can then
catch regressions in load-balancer quality by measuring the divergence
in this set of scoring functions across our set of test-cases.  This
particular feature needs a good deal of documentation of its own
(todo), but to get started playing with it see Makefile.mcarlo-sims in
tools/linsched/tests.  In particular, to evaluate the entire set
across a variety of topologies the following command can be issued:
  make -j <num_cpus * 2> -f Makefile.mcarlo-sims
(The included 'diff-mcarlo-500' tool can then be used to make
comparisons across result sets.)

- Validation versus real hardware.  Under tests/validation we've
included a tool for replaying and recording the above simulations on a
live machine.  These can then be compared to simulated runs using the
tools above to check that LinSched is modelling your architecture
reasonably well.  We did some fairly extensive comparisons versus
several x86 topologies in the v3.1 code using this.  It's a
fundamentally hard problem (in particular there's much more clock
drift between events on real hardware), but the results showed the
included topologies to be a reasonable simulacrum under LinSched.
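
To make the load-balancer scoring item above a little more concrete,
here is a minimal sketch of the idea.  It is not the LinSched code:
every function name is hypothetical, task loads are plain integers
rather than samples from the log-normal models, the "greedy first-fit"
variant (first CPU at or below the average load, else the least-loaded
CPU) is an assumption, and the score shown is for a single sample
rather than integrated over a run.

```c
#include <stddef.h>

/*
 * Reference placement (assumed variant of greedy first-fit): put each
 * task on the first CPU that stays at or below the per-CPU target
 * (total load / ncpus, rounded up); if no CPU has room, fall back to
 * the least-loaded CPU.
 */
static void greedy_first_fit(const unsigned long *task_load, size_t ntasks,
			     unsigned long *cpu_load, size_t ncpus)
{
	unsigned long total = 0, target;
	size_t i, c;

	if (ncpus == 0)
		return;

	for (i = 0; i < ntasks; i++)
		total += task_load[i];
	target = (total + ncpus - 1) / ncpus;

	for (c = 0; c < ncpus; c++)
		cpu_load[c] = 0;

	for (i = 0; i < ntasks; i++) {
		size_t pick = 0;

		for (c = 0; c < ncpus; c++) {
			if (cpu_load[c] + task_load[i] <= target) {
				pick = c;	/* first CPU with room */
				break;
			}
			if (cpu_load[c] < cpu_load[pick])
				pick = c;	/* least-loaded fallback */
		}
		cpu_load[pick] += task_load[i];
	}
}

/* Population variance of the per-CPU load. */
static double load_variance(const unsigned long *cpu_load, size_t ncpus)
{
	double mean = 0.0, sum_sq = 0.0;
	size_t c;

	if (ncpus == 0)
		return 0.0;

	for (c = 0; c < ncpus; c++)
		mean += (double)cpu_load[c];
	mean /= (double)ncpus;

	for (c = 0; c < ncpus; c++) {
		double d = (double)cpu_load[c] - mean;

		sum_sq += d * d;
	}
	return sum_sq / (double)ncpus;
}

/* Single-sample score: how much worse the observed spread is than the
 * reference spread.  Zero means "as good as the offline balance". */
static double balance_score(const unsigned long *observed,
			    const unsigned long *reference, size_t ncpus)
{
	return load_variance(observed, ncpus) -
	       load_variance(reference, ncpus);
}

/* Count test-cases whose score grew by more than 'tolerance' between
 * a baseline result set and a patched one. */
static size_t count_regressions(const double *baseline, const double *patched,
				size_t ncases, double tolerance)
{
	size_t i, worse = 0;

	for (i = 0; i < ncases; i++)
		if (patched[i] - baseline[i] > tolerance)
			worse++;
	return worse;
}
```

For example, with task loads {4, 3, 3, 2} on two CPUs the reference
placement comes out as {6, 6}, so an observed placement of {8, 4}
scores 4 while a perfect one scores 0.  Summing such per-sample scores
over a run would approximate the integration described above.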

What's to come?
- More documentation, especially about the use of the new
load-balancer scoring tools.
- The history is very coarse right now as a result of going through a
rebase cement-mixer.  I'd like to incrementally refactor some of the
larger commits; once this is done I will promote linsched-alpha to a
stable linsched branch that won't be subject to history re-writes.
- KBuild integration.  We currently build everything out of the
tools/linsched makefiles.  One of the immediate TODOs involves
re-working the arch/linsched half of this to work with kbuild so that
it's less hacky/fragile.
- Writing up some of the existing TODOs as starting points for anyone
who wants to get involved.

I'd also like to take a moment to specially recognize the effort of
the following contributors, all of whom were involved extensively in
the work above.  Things have come a long way since the 5000 lines of
"#ifdef LINSCHED"; the current status would not be possible without
them.
  Ben Segall, Dhaval Giani, Ranjit Manomohan, Nikhil Rao, and Abhishek
Srivastava

Thanks!


* Re: [ANNOUNCE] LinSched for v3.3-rc7
  2012-03-15  3:58 [ANNOUNCE] LinSched for v3.3-rc7 Paul Turner
@ 2012-03-15  4:08 ` Dhaval Giani
  2012-03-21  9:20   ` Michael Wang
  2012-03-21 14:20   ` Morten Rasmussen
  2012-03-15  7:21 ` Ingo Molnar
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 12+ messages in thread
From: Dhaval Giani @ 2012-03-15  4:08 UTC (permalink / raw)
  To: Paul Turner
  Cc: Ingo Molnar, Peter Zijlstra, Paul McKenney, Benjamin Segall,
	Ranjit Manomohan, Nikhil Rao, jmc, Suresh Siddha,
	Srivatsa Vaddagiri, LKML, Abhishek Srivastava

[Adding abhishek to the cc]



* Re: [ANNOUNCE] LinSched for v3.3-rc7
  2012-03-15  3:58 [ANNOUNCE] LinSched for v3.3-rc7 Paul Turner
  2012-03-15  4:08 ` Dhaval Giani
@ 2012-03-15  7:21 ` Ingo Molnar
  2012-03-23  4:03 ` Michael Wang
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Ingo Molnar @ 2012-03-15  7:21 UTC (permalink / raw)
  To: Paul Turner
  Cc: Peter Zijlstra, Paul McKenney, Benjamin Segall, Ranjit Manomohan,
	Nikhil Rao, jmc, Dhaval Giani, Suresh Siddha, Srivatsa Vaddagiri,
	LKML


* Paul Turner <pjt@google.com> wrote:

> That said, I'm relatively happy with the current state of 
> integration, there's certainly some specific areas that can 
> still be greatly improved (in particular, the main simulator 
> loop has not had as much attention paid as the 
> LinSched<>Kernel interactions and there's a long list of TODOs 
> that could be improved there), but things are now mated fairly 
> cleanly through the use of a new LinSched architecture.  This 
> is a total re-write of almost all LinSched<>Kernel 
> interactions versus the previous (2.6.35) version, and has 
> allowed us to now carry almost zero modifications against the 
> kernel source.  It's both possible to develop/test in place, 
> as well as being patch compatible.  The remaining touch-points 
> now total just 20 lines!  Half of these are likely mergable, 
> with the other 10 lines being more LinSched specific at this 
> point in time, I've broken these down below:
> 
> The total damage:
>  include/linux/init.h      |    6 ++++++   (linsched ugliness,
> unfortunately necessary until we boot-strap proper initcall support)
>  include/linux/rcupdate.h  |    3 +++    (only necessary to allow -O0
> compilation which is extremely handy for analyzing the scheduler using
> gdb)
>  kernel/pid.c              |    4 ++++        (linsched ugliness,
> these can go eventually)
>  kernel/sched/fair.c       |    2 +-          (this is just the
> promotion of 1 structure and function from static state which weren't
> published in the sched/ re-factoring that we need from within the
> simulator)
>  kernel/sched/stats.c      |    2 +-
>  kernel/time/timekeeping.c |    3 ++-    (this fixes a time-dilation
> error due to rounding when our clock-source has ns-resolution, e.g.
> shift==1)
>  6 files changed, 17 insertions(+), 3 deletions(-)

Mind sending these preparatory changes as a standalone series as 
well, against the upstream scheduler, straight away?

Maybe we can find ways to remove the uglies while reviewing and 
integrating all that. Having those bits upstream would make the 
rest of linsched a lot easier to merge as well.

Thanks,

	Ingo


* Re: [ANNOUNCE] LinSched for v3.3-rc7
  2012-03-15  4:08 ` Dhaval Giani
@ 2012-03-21  9:20   ` Michael Wang
  2012-03-21  9:54     ` Paul Turner
  2012-03-21 14:20   ` Morten Rasmussen
  1 sibling, 1 reply; 12+ messages in thread
From: Michael Wang @ 2012-03-21  9:20 UTC (permalink / raw)
  To: Dhaval Giani
  Cc: Paul Turner, Ingo Molnar, Peter Zijlstra, Paul McKenney,
	Benjamin Segall, Ranjit Manomohan, Nikhil Rao, jmc,
	Suresh Siddha, Srivatsa Vaddagiri, LKML, Abhishek Srivastava

On 03/15/2012 12:08 PM, Dhaval Giani wrote:

> [Adding abhishek to the cc]
> 
> On Wed, Mar 14, 2012 at 8:58 PM, Paul Turner <pjt@google.com> wrote:
>>  kernel/time/timekeeping.c |    3 ++-    (this fixes a time-dilation
>> error due to rounding when our clock-source has ns-resolution, e.g.
>> shift==1)


The edit in timekeeping:

xtime.tv_nsec = ((s64)timekeeper.xtime_nsec + (1ULL << timekeeper.shift)
- 1) >> timekeeper.shift;

Looks better than the old code, which blindly adds 1ns for the loss in
rounding.  Is it possible to commit this change to mainline?
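
A minimal sketch of the difference, for illustration only: the helper
names are hypothetical, they work on plain integers rather than the
real timekeeper structure, and the "old" variant is reconstructed from
the description above (truncate, then always add 1ns).

```c
#include <stdint.h>

/* Old behaviour as described in this thread: drop the fractional
 * part, then unconditionally add one nanosecond. */
static int64_t ns_truncate_plus_one(int64_t shifted_ns, unsigned int shift)
{
	return (shifted_ns >> shift) + 1;
}

/* New behaviour: add (2^shift - 1) before shifting, so the result is
 * rounded up only when a fractional remainder is present.  Assumes a
 * non-negative input, as with accumulated nanoseconds. */
static int64_t ns_round_up(int64_t shifted_ns, unsigned int shift)
{
	return (int64_t)(((uint64_t)shifted_ns + (1ULL << shift) - 1) >> shift);
}
```

With shift == 0 (a clock source that already has ns resolution) the
old form reports one nanosecond too many on every update, while the
round-up form returns the value unchanged.  The two agree whenever a
fractional remainder is actually present.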

Regards,
Michael Wang


* Re: [ANNOUNCE] LinSched for v3.3-rc7
  2012-03-21  9:20   ` Michael Wang
@ 2012-03-21  9:54     ` Paul Turner
  2012-03-21 10:11       ` Michael Wang
  0 siblings, 1 reply; 12+ messages in thread
From: Paul Turner @ 2012-03-21  9:54 UTC (permalink / raw)
  To: Michael Wang
  Cc: Dhaval Giani, Ingo Molnar, Peter Zijlstra, Paul McKenney,
	Benjamin Segall, Ranjit Manomohan, Nikhil Rao, jmc,
	Suresh Siddha, Srivatsa Vaddagiri, LKML, Abhishek Srivastava

On Wed, Mar 21, 2012 at 2:20 AM, Michael Wang
<wangyun@linux.vnet.ibm.com> wrote:
>>>  kernel/time/timekeeping.c |    3 ++-    (this fixes a time-dilation
>>> error due to rounding when our clock-source has ns-resolution, e.g.
>>> shift==1)
>
>
> The edit in timekeeping:
>
> xtime.tv_nsec = ((s64)timekeeper.xtime_nsec + (1ULL << timekeeper.shift)
> - 1) >> timekeeper.shift;
>
> Looks better then the old code which blindly add 1ns for the lost in
> rounding, is it possible to commit this change to mainline?
>

Yes, these patches are about to go out as a free-standing series, as
suggested by Ingo.
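
For reference, the difference between the two rounding strategies can
be sketched in a few lines of user-space C.  This is only an
illustration, assuming the old code truncates and then always adds 1ns
as described above; the helper names round_add_one() and
round_up_shift() are hypothetical and are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative only: both helpers take a nanosecond value that is
 * stored left-shifted by 'shift' bits and return whole nanoseconds.
 * The names are made up for this sketch.
 */

/* Truncate, then unconditionally add 1ns for the bits lost in rounding. */
static int64_t round_add_one(int64_t shifted_nsec, unsigned int shift)
{
	assert(shifted_nsec >= 0 && shift < 63);
	return (shifted_nsec >> shift) + 1;
}

/* Round up only when there is a fractional remainder (ceiling division). */
static int64_t round_up_shift(int64_t shifted_nsec, unsigned int shift)
{
	assert(shifted_nsec >= 0 && shift < 63);
	return (int64_t)(((uint64_t)shifted_nsec + (1ULL << shift) - 1) >> shift);
}
```

With shift == 1, a value of exactly 10ns (20 in shifted units) yields
11 from round_add_one() but 10 from round_up_shift(); a value of 10.5ns
(21 in shifted units) yields 11 from both.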

- Paul

> Regards,
> Michael Wang
>
>>>  6 files changed, 17 insertions(+), 3 deletions(-)
>>>
>>> Summarized changes vs 2.6.35 (previous version):
>>>
>>> - The original LinSched (understandably) simplified many of the kernel
>>> interactions in order to make simulation easier.  Unfortunately, this
>>> has serious side-effects on the accuracy of simulation.  We've now
>>> introduced a large portion of this state, including: irq and soft-irq
>>> contexts (we now perform periodic load-balance out of SCHED_SOFTIRQ
>>> for example), support for active load-balancing, correctly modeled
>>> nohz interactions, ipi and stop-task support.
>>>
>>> - Support for record and replay of application scheduling via perf.
>>> This is not yet well integrated, but under tests exist the tools to
>>> record an applications behavior using perf sched record, and then play
>>> it back in the simulator.
>>>
>>> - Load-balancer scoring.  This one is a very promising new avenue for
>>> load-balancer testing.  We analyzed several workloads and found that
>>> they could be well-modeled using a log-normal distribution.
>>> Parameterizing these models then allows us to construct a large (500)
>>> test-case set of randomly generated workloads that behave similarly.
>>> By integrating the variance between the current load-balance and an
>>> offline computed (currently greedy first-fit) balance we're able to
>>> automatically identify and score an approximation of our distance from
>>> an ideal load-balance.  Historically, such scores are very difficult
>>> to interpret, however, that's where our ability to generate a large
>>> set of test-cases above comes in.  This allows us to exploit a nice
>>> property, it's much easier to design a scoring function that diverges
>>> (in this case the variance) than a nice stable one that converges.  We
>>> can then catch regressions in load-balancer quality by measuring the
>>> divergence in this set of scoring functions across our set of
>>> test-cases.  This particular feature needs a large set of
>>> documentation in itself (todo), but to get started with playing with
>>> it see Makefile.mcarlo-sims in tools/linsched/tests.  In particular to
>>> evaluate the entire set across a variety of topologies the following
>>> command can be issued:
>>>  make -j <num_cpus * 2 > -f Makefile.mcarlo-sims
>>> (The included 'diff-mcarlo-500' tool can then be used to make
>>> comparisons across result sets.)
>>>
>>> - Validation versus real hardware.  Under tests/validation we've
>>> included a tool for replaying and recording the above simulations on a
>>> live-machine.  These can then be compared to simulated runs using the
>>> tools above to ensure that LinSched is modelling your architecture
>>> reasonably appropriately.  We did some reasonably extensive
>>> comparisons versus several x86 topologies in the v3.1 code using this;
>>> it's a fundamentally hard problem -- in particular there's much more
>>> clock drift between events on real hardware, but the results showed
>>> the included topologies to be a reasonable simulacrum under LinSched.
>>>
>>> What's to come?
>>> - More documentation, especially about the use of the new
>>> load-balancer scoring tools.
>>> - The history is very coarse right now as a result of going through a
>>> rebase cement-mixer.  I'd like to incrementally refactor some of the
>>> larger commits; once this is done I will promote linsched-alpha to a
>>> stable linsched branch that won't be subject to history re-writes.
>>> - KBuild integration.  We currently build everything out of the
>>> tools/linsched makefiles.  One of the immediate TODOs involves
>>> re-working the arch/linsched half of this to work with kbuild so that
>>> its less hacky/fragile.
>>> - Writing up some of the existing TODOs as starting points for anyone
>>> who wants to get involved.
>>>
>>> I'd also like to take a moment to specially recognize the effort of
>>> the following contributors, all of whom were involved extensively in
>>> the work above.  Things have come a long way since the 5000 lines of
>>> "#ifdef LINSCHED", the current status would not be possible without
>>> them.
>>>  Ben Segall, Dhaval Giani, Ranjit Manomohan, Nikhil Rao, and Abhishek
>>> Srivastava
>>>
>>> Thanks!
>
>


* Re: [ANNOUNCE] LinSched for v3.3-rc7
  2012-03-21  9:54     ` Paul Turner
@ 2012-03-21 10:11       ` Michael Wang
  0 siblings, 0 replies; 12+ messages in thread
From: Michael Wang @ 2012-03-21 10:11 UTC (permalink / raw)
  To: Paul Turner
  Cc: Dhaval Giani, Ingo Molnar, Peter Zijlstra, Paul McKenney,
	Benjamin Segall, Ranjit Manomohan, Nikhil Rao, jmc,
	Suresh Siddha, Srivatsa Vaddagiri, LKML, Abhishek Srivastava

On 03/21/2012 05:54 PM, Paul Turner wrote:

> On Wed, Mar 21, 2012 at 2:20 AM, Michael Wang
> <wangyun@linux.vnet.ibm.com> wrote:
>> On 03/15/2012 12:08 PM, Dhaval Giani wrote:
>>
>>> [Adding abhishek to the cc]
>>>
>>> On Wed, Mar 14, 2012 at 8:58 PM, Paul Turner <pjt@google.com> wrote:
>>>> Hi All,
>>>>
>>>> [ Take 2, gmail tried to a non text/plain component into the last email .. ]
>>>>
>>>> Quick start version:
>>>>
>>>> Available under linsched-alpha at:
>>>>  git://git.kernel.org/pub/scm/linux/kernel/git/pjt/linsched.git  .linsched
>>>>
>>>> NOTE: The branch history is still subject to some revision as I am
>>>> still re-partitioning some of the patches.  Once this is complete, I
>>>> will promote linsched-alpha into a linsched branch at which point it
>>>> will no longer be subject to history re-writes.
>>>>
>>>> After checking out the code:
>>>> cd tools/linsched
>>>> make
>>>> cd tests
>>>> ./run_tests.sh basic_tests
>>>> << then try changing some scheduler parameters, e.g. sched_latency,
>>>> and repeating >>
>>>>
>>>> (Note:  The basic_tests are unit-tests, these are calibrated to the
>>>> current scheduler tunables and should strictly be considered sanity
>>>> tests.  Please see the mcarlo-sim work for a more useful testing
>>>> environment.)
>>>>
>>>> Extended version:
>>>>
>>>> First of all, apologies in the delay to posting this -- I know there's
>>>> been a lot of interest.  We made the choice to first rebase to v3.3
>>>> since there were fairly extensive changes, especially within the
>>>> scheduler, that meant we had the opportunity to significantly clean up
>>>> some of the LinSched code.  (For example, previously we were
>>>> processing kernel/sched* using awk as a Makefile step so that we could
>>>> extract the necessary structure information without modifying
>>>> sched.c!)  While the code benefited greatly from this, there were
>>>> several other changes that required fairly extensive modification in
>>>> this process (and in the meanwhile the v3.1 version became less
>>>> representative due to the extent of the above changes); which pushed
>>>> things out much further than I would have liked.  I suppose the moral
>>>> of the story is always release early, and often.
>>>>
>>>> That said, I'm relatively happy with the current state of integration,
>>>> there's certainly some specific areas that can still be greatly
>>>> improved (in particular, the main simulator loop has not had as much
>>>> attention paid as the LinSched<>Kernel interactions and there's a long
>>>> list of TODOs that could be improved there), but things are now mated
>>>> fairly cleanly through the use of a new LinSched architecture.  This
>>>> is a total re-write of almost all LinSched<>Kernel interactions versus
>>>> the previous (2.6.35) version, and has allowed us to now carry almost
>>>> zero modifications against the kernel source.  It's both possible to
>>>> develop/test in place, as well as being patch compatible.  The
>>>> remaining touch-points now total just 20 lines!  Half of these are
>>>> likely mergable, with the other 10 lines being more LinSched specific
>>>> at this point in time, I've broken these down below:
>>>>
>>>> The total damage:
>>>>  include/linux/init.h      |    6 ++++++   (linsched ugliness,
>>>> unfortunately necessary until we boot-strap proper initcall support)
>>>>  include/linux/rcupdate.h  |    3 +++    (only necessary to allow -O0
>>>> compilation which is extremely handy for analyzing the scheduler using
>>>> gdb)
>>>>  kernel/pid.c              |    4 ++++        (linsched ugliness,
>>>> these can go eventually)
>>>>  kernel/sched/fair.c       |    2 +-          (this is just the
>>>> promotion of 1 structure and function from static state which weren't
>>>> published in the sched/ re-factoring that we need from within the
>>>> simulator)
>>>>  kernel/sched/stats.c      |    2 +-
>>>>  kernel/time/timekeeping.c |    3 ++-    (this fixes a time-dilation
>>>> error due to rounding when our clock-source has ns-resolution, e.g.
>>>> shift==1)
>>
>>
>> The edit in timekeeping:
>>
>> xtime.tv_nsec = ((s64)timekeeper.xtime_nsec + (1ULL << timekeeper.shift)
>> - 1) >> timekeeper.shift;
>>
>> Looks better then the old code which blindly add 1ns for the lost in
>> rounding, is it possible to commit this change to mainline?
>>
> 
> Yes, these patches patches are about to go out as a free-standing
> series as suggested by Ingo.
> 

I see.

I think LinSched is interesting and very useful for studying or
testing the scheduler code.  Do we have the TODOs you mentioned
earlier written up yet?

I'd like to see whether I can help :)

Regards,
Michael Wang

> - Paul
> 
>> Regards,
>> Michael Wang
>>
>>>>  6 files changed, 17 insertions(+), 3 deletions(-)
>>>>
>>>> Summarized changes vs 2.6.35 (previous version):
>>>>
>>>> - The original LinSched (understandably) simplified many of the kernel
>>>> interactions in order to make simulation easier.  Unfortunately, this
>>>> has serious side-effects on the accuracy of simulation.  We've now
>>>> introduced a large portion of this state, including: irq and soft-irq
>>>> contexts (we now perform periodic load-balance out of SCHED_SOFTIRQ
>>>> for example), support for active load-balancing, correctly modeled
>>>> nohz interactions, ipi and stop-task support.
>>>>
>>>> - Support for record and replay of application scheduling via perf.
>>>> This is not yet well integrated, but under tests exist the tools to
>>>> record an applications behavior using perf sched record, and then play
>>>> it back in the simulator.
>>>>
>>>> - Load-balancer scoring.  This one is a very promising new avenue for
>>>> load-balancer testing.  We analyzed several workloads and found that
>>>> they could be well-modeled using a log-normal distribution.
>>>> Parameterizing these models then allows us to construct a large (500)
>>>> test-case set of randomly generated workloads that behave similarly.
>>>> By integrating the variance between the current load-balance and an
>>>> offline computed (currently greedy first-fit) balance we're able to
>>>> automatically identify and score an approximation of our distance from
>>>> an ideal load-balance.  Historically, such scores are very difficult
>>>> to interpret, however, that's where our ability to generate a large
>>>> set of test-cases above comes in.  This allows us to exploit a nice
>>>> property, it's much easier to design a scoring function that diverges
>>>> (in this case the variance) than a nice stable one that converges.  We
>>>> can then catch regressions in load-balancer quality by measuring the
>>>> divergence in this set of scoring functions across our set of
>>>> test-cases.  This particular feature needs a large set of
>>>> documentation in itself (todo), but to get started with playing with
>>>> it see Makefile.mcarlo-sims in tools/linsched/tests.  In particular to
>>>> evaluate the entire set across a variety of topologies the following
>>>> command can be issued:
>>>>  make -j <num_cpus * 2 > -f Makefile.mcarlo-sims
>>>> (The included 'diff-mcarlo-500' tool can then be used to make
>>>> comparisons across result sets.)
>>>>
>>>> - Validation versus real hardware.  Under tests/validation we've
>>>> included a tool for replaying and recording the above simulations on a
>>>> live-machine.  These can then be compared to simulated runs using the
>>>> tools above to ensure that LinSched is modelling your architecture
>>>> reasonably appropriately.  We did some reasonably extensive
>>>> comparisons versus several x86 topologies in the v3.1 code using this;
>>>> it's a fundamentally hard problem -- in particular there's much more
>>>> clock drift between events on real hardware, but the results showed
>>>> the included topologies to be a reasonable simulacrum under LinSched.
>>>>
>>>> What's to come?
>>>> - More documentation, especially about the use of the new
>>>> load-balancer scoring tools.
>>>> - The history is very coarse right now as a result of going through a
>>>> rebase cement-mixer.  I'd like to incrementally refactor some of the
>>>> larger commits; once this is done I will promote linsched-alpha to a
>>>> stable linsched branch that won't be subject to history re-writes.
>>>> - KBuild integration.  We currently build everything out of the
>>>> tools/linsched makefiles.  One of the immediate TODOs involves
>>>> re-working the arch/linsched half of this to work with kbuild so that
>>>> its less hacky/fragile.
>>>> - Writing up some of the existing TODOs as starting points for anyone
>>>> who wants to get involved.
>>>>
>>>> I'd also like to take a moment to specially recognize the effort of
>>>> the following contributors, all of whom were involved extensively in
>>>> the work above.  Things have come a long way since the 5000 lines of
>>>> "#ifdef LINSCHED", the current status would not be possible without
>>>> them.
>>>>  Ben Segall, Dhaval Giani, Ranjit Manomohan, Nikhil Rao, and Abhishek
>>>> Srivastava
>>>>
>>>> Thanks!
>>
>>
> 




* Re: [ANNOUNCE] LinSched for v3.3-rc7
  2012-03-15  4:08 ` Dhaval Giani
  2012-03-21  9:20   ` Michael Wang
@ 2012-03-21 14:20   ` Morten Rasmussen
  1 sibling, 0 replies; 12+ messages in thread
From: Morten Rasmussen @ 2012-03-21 14:20 UTC (permalink / raw)
  To: Dhaval Giani
  Cc: Paul Turner, Ingo Molnar, Peter Zijlstra, Paul McKenney,
	Benjamin Segall, Ranjit Manomohan, Nikhil Rao, jmc,
	Suresh Siddha, Srivatsa Vaddagiri, LKML, Abhishek Srivastava

Hi,

On Thu, Mar 15, 2012 at 04:08:17AM +0000, Dhaval Giani wrote:
> [Adding abhishek to the cc]
> 
> On Wed, Mar 14, 2012 at 8:58 PM, Paul Turner <pjt@google.com> wrote:
> > - Support for record and replay of application scheduling via perf.
> > This is not yet well integrated, but under tests exist the tools to
> > record an applications behavior using perf sched record, and then play
> > it back in the simulator.

I am interested in testing this feature and I could use a hint on how to
get it working. tests/perf_replay.c seems to accept perf sched record
traces which have been post-processed into .rlog traces. Where can I
find a tool to generate rlogs from perf traces?

Thanks,
Morten



* Re: [ANNOUNCE] LinSched for v3.3-rc7
  2012-03-15  3:58 [ANNOUNCE] LinSched for v3.3-rc7 Paul Turner
  2012-03-15  4:08 ` Dhaval Giani
  2012-03-15  7:21 ` Ingo Molnar
@ 2012-03-23  4:03 ` Michael Wang
  2012-03-28  5:19   ` Michael Wang
  2012-04-09  3:29 ` Michael Wang
  2012-07-23  3:03 ` Michael Wang
  4 siblings, 1 reply; 12+ messages in thread
From: Michael Wang @ 2012-03-23  4:03 UTC (permalink / raw)
  To: Paul Turner
  Cc: Ingo Molnar, Peter Zijlstra, Paul McKenney, Benjamin Segall,
	Ranjit Manomohan, Nikhil Rao, jmc, Dhaval Giani, Suresh Siddha,
	Srivatsa Vaddagiri, LKML

On 03/15/2012 11:58 AM, Paul Turner wrote:

> Hi All,
> 
> [ Take 2, gmail tried to a non text/plain component into the last email .. ]
> 
> Quick start version:
> 
> Available under linsched-alpha at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/pjt/linsched.git  .linsched

Hi, All

I am confused by the LinSched main loop.
My understanding is:

	while (not time up) {
		
		get all cpus whose next event is the leftmost
		
		for those cpus {

			simulate hres clock interrupt		
			
			process_all_softirqs() ?			

			if cpu is idle
				enter idle
			else
				check process running time
		}
	}

I wonder why we need to call process_all_softirqs here, since it will
also process other cpus' pending softirqs?
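
To make the loop structure concrete, here is a minimal user-space
sketch of the event-stepping part only.  It is not the LinSched
implementation: struct sim_cpu, sim_next_event() and sim_run() are
hypothetical names, each cpu is assumed to have a single periodic
timer, and the softirq and idle handling from the outline above is
left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-cpu state for this sketch. */
struct sim_cpu {
	uint64_t next_event_ns;    /* time of this cpu's next timer event */
	uint64_t period_ns;        /* assumed fixed tick period */
	unsigned int irqs_handled; /* simulated clock interrupts taken */
};

/* Return the earliest pending event time across all cpus. */
static uint64_t sim_next_event(const struct sim_cpu *cpus, int nr)
{
	uint64_t min = UINT64_MAX;

	for (int i = 0; i < nr; i++)
		if (cpus[i].next_event_ns < min)
			min = cpus[i].next_event_ns;
	return min;
}

/*
 * Advance simulated time until end_ns.  Each iteration jumps to the
 * earliest event and services every cpu whose event is due at that
 * time.  Returns the number of distinct event times processed.
 */
static unsigned int sim_run(struct sim_cpu *cpus, int nr, uint64_t end_ns)
{
	unsigned int steps = 0;

	for (;;) {
		uint64_t now = sim_next_event(cpus, nr);

		if (now >= end_ns)
			break;
		for (int i = 0; i < nr; i++) {
			if (cpus[i].next_event_ns != now)
				continue;
			assert(cpus[i].period_ns > 0);
			cpus[i].irqs_handled++; /* stands in for the clock irq */
			cpus[i].next_event_ns += cpus[i].period_ns;
		}
		steps++;
	}
	return steps;
}
```

For example, two cpus with periods of 10ns and 15ns, first events at
10ns and 15ns, and end_ns == 31 step through the times 10, 15, 20 and
30, with both cpus serviced together at 30.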

Regards,
Michael Wang



* Re: [ANNOUNCE] LinSched for v3.3-rc7
  2012-03-23  4:03 ` Michael Wang
@ 2012-03-28  5:19   ` Michael Wang
  0 siblings, 0 replies; 12+ messages in thread
From: Michael Wang @ 2012-03-28  5:19 UTC (permalink / raw)
  To: Paul Turner
  Cc: Ingo Molnar, Peter Zijlstra, Paul McKenney, Benjamin Segall,
	Ranjit Manomohan, Nikhil Rao, jmc, Dhaval Giani, Suresh Siddha,
	Srivatsa Vaddagiri, LKML

On 03/23/2012 12:03 PM, Michael Wang wrote:

> On 03/15/2012 11:58 AM, Paul Turner wrote:
> 
>> Hi All,
>>
>> [ Take 2, gmail tried to a non text/plain component into the last email .. ]
>>
>> Quick start version:
>>
>> Available under linsched-alpha at:
>>   git://git.kernel.org/pub/scm/linux/kernel/git/pjt/linsched.git  .linsched
> 
> Hi, All
> 
> I got confused with the LinSched main loop...
> My understanding is:
> 
> 	while (not time up) {
> 		
> 		get all cpus whose next event is the left most
> 		
> 		for those cpus {
> 
> 			simulate hres clock interrupt		
> 			
> 			process_all_softirqs() ?			
> 
> 			if cpu is idle
> 				enter idle
> 			else
> 				check process running time
> 		}
> 	}
> 
> Wonder why we need to call process_all_softirqs which will process other
> cpu's pending soft irq here?
> 


After running some tests on this question, I think
"process_all_softirqs" is used here to simulate an interrupt for an
idle cpu; more precisely, it simulates only the part that processes
softirqs and checks for a reschedule.  In my opinion this is wrong and
will make the results inaccurate.

The key softirq for "process_all_softirqs" is HRTIMER_SOFTIRQ.

Generally, in LinSched, all softirqs should be handled in "irq_exit"
after the simulated clock interrupt, except in one case:

		simulate clock irq for cpu x
		last running task going to sleep on cpu x
		clock irq leave
		cpu x enter idle(tick_nohz_idle_enter)

Here in "tick_nohz_idle_enter", if cpu x is the timer cpu,
HRTIMER_SOFTIRQ will be raised.

On a real machine, HRTIMER_SOFTIRQ would be handled at the next
interrupt on cpu x, in order to restart the tick timer.

I think in LinSched this interrupt should arrive when the sleeping
task wakes up on cpu x, or when someone else kicks cpu x with a
reschedule IPI (a nohz balance kick, I suppose).

But if we use process_all_softirqs here, the next sequence will be:
		
		simulate next clock irq for cpu y
		process clock event on cpu y
		clock irq leave
		process_all_softirqs

Here in "process_all_softirqs", the HRTIMER_SOFTIRQ on cpu x will be
handled, and the tick timer of cpu x will be enabled and fire at the
next tick (this is actually another mistake, because we do not call
"tick_nohz_irq_exit" in "irq_exit" even though the cpu is still idle).
This may cause cpu x to do an extra load balance if the next tick is
the time to do it.

This will also happen if time has advanced when simulating the next
clock irq for cpu y (the case where cpu x was the last cpu handled in
the previous clock event): cpu x may trigger a load balance in
"process_all_softirqs" if it is time to do one.

All this means we simulate an extra interrupt after cpu x has gone
idle, which may let cpu x do extra load-balance work.  That doesn't
make sense and will make some cpus more 'active' than others, won't it?
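
The effect can be shown with a toy model.  This is only a sketch under
my own assumptions, not LinSched or kernel code: struct model_cpu,
MODEL_HRTIMER_BIT and the model_* helpers are hypothetical, and each
cpu's pending softirqs are reduced to a simple bitmask.

```c
#include <assert.h>

#define MODEL_NR_CPUS     2
#define MODEL_HRTIMER_BIT (1u << 0) /* illustrative bit, not the kernel's value */

/* Hypothetical per-cpu softirq state for this sketch. */
struct model_cpu {
	unsigned int pending; /* bitmask of raised softirqs */
	unsigned int handled; /* softirqs serviced for this cpu so far */
};

/* Service every pending softirq bit on one cpu; return how many ran. */
static unsigned int model_run_pending(struct model_cpu *c)
{
	unsigned int n = 0;

	while (c->pending) {
		c->pending &= c->pending - 1; /* clear the lowest set bit */
		n++;
	}
	c->handled += n;
	return n;
}

/* What an interrupt exit on 'cpu' would do: only that cpu's softirqs. */
static unsigned int model_process_local(struct model_cpu *cpus, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return model_run_pending(&cpus[cpu]);
}

/* Sweep every cpu, including idle ones that took no interrupt. */
static unsigned int model_process_all(struct model_cpu *cpus)
{
	unsigned int n = 0;

	for (int i = 0; i < MODEL_NR_CPUS; i++)
		n += model_run_pending(&cpus[i]);
	return n;
}
```

If cpu 0 plays the idle cpu x with MODEL_HRTIMER_BIT pending and cpu 1
plays cpu y, model_process_local(cpus, 1) leaves cpu 0's softirq
pending until cpu 0 takes its own interrupt, while model_process_all()
services it immediately, which is the extra activity described above.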

I'd like to do more tests with "process_all_softirqs" disabled, but I
don't know how the expected numbers were obtained.  Does anyone know
what formula we should use to calculate the expected results?

Regards,
Michael Wang
		

> Regards,
> Michael Wang
> 
> 








* Re: [ANNOUNCE] LinSched for v3.3-rc7
  2012-03-15  3:58 [ANNOUNCE] LinSched for v3.3-rc7 Paul Turner
                   ` (2 preceding siblings ...)
  2012-03-23  4:03 ` Michael Wang
@ 2012-04-09  3:29 ` Michael Wang
  2012-07-23  3:03 ` Michael Wang
  4 siblings, 0 replies; 12+ messages in thread
From: Michael Wang @ 2012-04-09  3:29 UTC (permalink / raw)
  To: Paul Turner
  Cc: Ingo Molnar, Peter Zijlstra, Paul McKenney, Benjamin Segall,
	Ranjit Manomohan, Nikhil Rao, jmc, Dhaval Giani, Suresh Siddha,
	Srivatsa Vaddagiri, LKML

On 03/15/2012 11:58 AM, Paul Turner wrote:

> Hi All,
> 
> [ Take 2, gmail tried to a non text/plain component into the last email .. ]
> 
> Quick start version:
> 
> Available under linsched-alpha at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/pjt/linsched.git  .linsched
> 


Hi, Paul

I have some patches for LinSched.  Is there a dedicated mailing list
for LinSched?

Or should I send them to LKML for review?

Regards,
Michael Wang

 




* Re: [ANNOUNCE] LinSched for v3.3-rc7
  2012-03-15  3:58 [ANNOUNCE] LinSched for v3.3-rc7 Paul Turner
                   ` (3 preceding siblings ...)
  2012-04-09  3:29 ` Michael Wang
@ 2012-07-23  3:03 ` Michael Wang
  2012-07-23 12:54   ` Paul Turner
  4 siblings, 1 reply; 12+ messages in thread
From: Michael Wang @ 2012-07-23  3:03 UTC (permalink / raw)
  To: Paul Turner
  Cc: Ingo Molnar, Peter Zijlstra, Paul McKenney, Benjamin Segall,
	Ranjit Manomohan, Nikhil Rao, jmc, Dhaval Giani, Suresh Siddha,
	Srivatsa Vaddagiri, LKML

Is there any recent news about LinSched?
I've not seen any updates, so I'm not sure whether it is ready to
accept patches or still needs some rebuilding.

Regards,
Michael Wang

On 03/15/2012 11:58 AM, Paul Turner wrote:
> Hi All,
> 
> [ Take 2, gmail tried to a non text/plain component into the last email .. ]
> 
> Quick start version:
> 
> Available under linsched-alpha at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/pjt/linsched.git  .linsched
> 
> NOTE: The branch history is still subject to some revision as I am
> still re-partitioning some of the patches.  Once this is complete, I
> will promote linsched-alpha into a linsched branch at which point it
> will no longer be subject to history re-writes.
> 
> After checking out the code:
> cd tools/linsched
> make
> cd tests
> ./run_tests.sh basic_tests
> << then try changing some scheduler parameters, e.g. sched_latency,
> and repeating >>
> 
> (Note:  The basic_tests are unit-tests, these are calibrated to the
> current scheduler tunables and should strictly be considered sanity
> tests.  Please see the mcarlo-sim work for a more useful testing
> environment.)
> 
> Extended version:
> 
> First of all, apologies in the delay to posting this -- I know there's
> been a lot of interest.  We made the choice to first rebase to v3.3
> since there were fairly extensive changes, especially within the
> scheduler, that meant we had the opportunity to significantly clean up
> some of the LinSched code.  (For example, previously we were
> processing kernel/sched* using awk as a Makefile step so that we could
> extract the necessary structure information without modifying
> sched.c!)  While the code benefited greatly from this, there were
> several other changes that required fairly extensive modification in
> this process (and in the meanwhile the v3.1 version became less
> representative due to the extent of the above changes); which pushed
> things out much further than I would have liked.  I suppose the moral
> of the story is always release early, and often.
> 
> That said, I'm relatively happy with the current state of integration,
> there's certainly some specific areas that can still be greatly
> improved (in particular, the main simulator loop has not had as much
> attention paid as the LinSched<>Kernel interactions and there's a long
> list of TODOs that could be improved there), but things are now mated
> fairly cleanly through the use of a new LinSched architecture.  This
> is a total re-write of almost all LinSched<>Kernel interactions versus
> the previous (2.6.35) version, and has allowed us to now carry almost
> zero modifications against the kernel source.  It's both possible to
> develop/test in place, as well as being patch compatible.  The
> remaining touch-points now total just 20 lines!  Half of these are
> likely mergable, with the other 10 lines being more LinSched specific
> at this point in time, I've broken these down below:
> 
> The total damage:
>  include/linux/init.h      |    6 ++++++   (linsched ugliness,
> unfortunately necessary until we boot-strap proper initcall support)
>  include/linux/rcupdate.h  |    3 +++    (only necessary to allow -O0
> compilation which is extremely handy for analyzing the scheduler using
> gdb)
>  kernel/pid.c              |    4 ++++        (linsched ugliness,
> these can go eventually)
>  kernel/sched/fair.c       |    2 +-          (this is just the
> promotion of 1 structure and function from static state which weren't
> published in the sched/ re-factoring that we need from within the
> simulator)
>  kernel/sched/stats.c      |    2 +-
>  kernel/time/timekeeping.c |    3 ++-    (this fixes a time-dilation
> error due to rounding when our clock-source has ns-resolution, e.g.
> shift==1)
>  6 files changed, 17 insertions(+), 3 deletions(-)
> 
> Summarized changes vs 2.6.35 (previous version):
> 
> - The original LinSched (understandably) simplified many of the kernel
> interactions in order to make simulation easier.  Unfortunately, this
> has serious side-effects on the accuracy of simulation.  We've now
> introduced a large portion of this state, including: irq and soft-irq
> contexts (we now perform periodic load-balance out of SCHED_SOFTIRQ
> for example), support for active load-balancing, correctly modeled
> nohz interactions, ipi and stop-task support.
> 
> - Support for record and replay of application scheduling via perf.
> This is not yet well integrated, but under tests exist the tools to
> record an applications behavior using perf sched record, and then play
> it back in the simulator.
> 
> - Load-balancer scoring.  This one is a very promising new avenue for
> load-balancer testing.  We analyzed several workloads and found that
> they could be well-modeled using a log-normal distribution.
> Parameterizing these models then allows us to construct a large (500)
> test-case set of randomly generated workloads that behave similarly.
> By integrating the variance between the current load-balance and an
> offline computed (currently greedy first-fit) balance we're able to
> automatically identify and score an approximation of our distance from
> an ideal load-balance.  Historically, such scores are very difficult
> to interpret, however, that's where our ability to generate a large
> set of test-cases above comes in.  This allows us to exploit a nice
> property, it's much easier to design a scoring function that diverges
> (in this case the variance) than a nice stable one that converges.  We
> can then catch regressions in load-balancer quality by measuring the
> divergence in this set of scoring functions across our set of
> test-cases.  This particular feature needs a large set of
> documentation in itself (todo), but to get started with playing with
> it see Makefile.mcarlo-sims in tools/linsched/tests.  In particular to
> evaluate the entire set across a variety of topologies the following
> command can be issued:
>   make -j <num_cpus * 2 > -f Makefile.mcarlo-sims
> (The included 'diff-mcarlo-500' tool can then be used to make
> comparisons across result sets.)
> 
> - Validation versus real hardware.  Under tests/validation we've
> included a tool for replaying and recording the above simulations on a
> live-machine.  These can then be compared to simulated runs using the
> tools above to ensure that LinSched is modelling your architecture
> reasonably appropriately.  We did some reasonably extensive
> comparisons versus several x86 topologies in the v3.1 code using this;
> it's a fundamentally hard problem -- in particular there's much more
> clock drift between events on real hardware, but the results showed
> the included topologies to be a reasonable simulacrum under LinSched.
> 
> What's to come?
> - More documentation, especially about the use of the new
> load-balancer scoring tools.
> - The history is very coarse right now as a result of going through a
> rebase cement-mixer.  I'd like to incrementally refactor some of the
> larger commits; once this is done I will promote linsched-alpha to a
> stable linsched branch that won't be subject to history re-writes.
> - KBuild integration.  We currently build everything out of the
> tools/linsched makefiles.  One of the immediate TODOs involves
> re-working the arch/linsched half of this to work with kbuild so that
> its less hacky/fragile.
> - Writing up some of the existing TODOs as starting points for anyone
> who wants to get involved.
> 
> I'd also like to take a moment to specially recognize the effort of
> the following contributors, all of whom were involved extensively in
> the work above.  Things have come a long way since the 5000 lines of
> "#ifdef LINSCHED", the current status would not be possible without
> them.
>   Ben Segall, Dhaval Giani, Ranjit Manomohan, Nikhil Rao, and Abhishek
> Srivastava
> 
> Thanks!
> 




* Re: [ANNOUNCE] LinSched for v3.3-rc7
  2012-07-23  3:03 ` Michael Wang
@ 2012-07-23 12:54   ` Paul Turner
  0 siblings, 0 replies; 12+ messages in thread
From: Paul Turner @ 2012-07-23 12:54 UTC (permalink / raw)
  To: Michael Wang
  Cc: Ingo Molnar, Peter Zijlstra, Paul McKenney, Benjamin Segall,
	Ranjit Manomohan, Nikhil Rao, jmc, Dhaval Giani, Suresh Siddha,
	Srivatsa Vaddagiri, LKML

On Sun, Jul 22, 2012 at 8:03 PM, Michael Wang
<wangyun@linux.vnet.ibm.com> wrote:
> Is there any latest info about the linsched?
> I've not seen any updates so I'm not sure whether it is in the status to
> accept patches or still need some rebuilding?
>

Hey Michael,

I'll happily take patches but I've been completely backlogged by internal work.

Dhaval was going to send me a 3.5 rebase; let me sync up with him to
see if that needs any massaging and I'll republish.

I will be out for about two weeks from this Friday, but there should
be at least 2 pushes before LPC in August -- one before I go, and one
when I return.  This is one of the many things on my pre-offline
todo-list :-(

Thanks,

- Paul

> Regards,
> Michael Wang
>
> On 03/15/2012 11:58 AM, Paul Turner wrote:
>> Hi All,
>>
>> [ Take 2, gmail tried to insert a non-text/plain component into the last email .. ]
>>
>> Quick start version:
>>
>> Available under linsched-alpha at:
>>   git://git.kernel.org/pub/scm/linux/kernel/git/pjt/linsched.git  .linsched
>>
>> NOTE: The branch history is still subject to some revision as I am
>> still re-partitioning some of the patches.  Once this is complete, I
>> will promote linsched-alpha into a linsched branch at which point it
>> will no longer be subject to history re-writes.
>>
>> After checking out the code:
>> cd tools/linsched
>> make
>> cd tests
>> ./run_tests.sh basic_tests
>> << then try changing some scheduler parameters, e.g. sched_latency,
>> and repeating >>
>>
>> (Note:  The basic_tests are unit-tests; these are calibrated to the
>> current scheduler tunables and should strictly be considered sanity
>> tests.  Please see the mcarlo-sim work for a more useful testing
>> environment.)
>>
>> Extended version:
>>
>> First of all, apologies for the delay in posting this -- I know there's
>> been a lot of interest.  We made the choice to first rebase to v3.3
>> since there were fairly extensive changes, especially within the
>> scheduler, that meant we had the opportunity to significantly clean up
>> some of the LinSched code.  (For example, previously we were
>> processing kernel/sched* using awk as a Makefile step so that we could
>> extract the necessary structure information without modifying
>> sched.c!)  While the code benefited greatly from this, there were
>> several other changes that required fairly extensive modification in
>> this process (and in the meanwhile the v3.1 version became less
>> representative due to the extent of the above changes); which pushed
>> things out much further than I would have liked.  I suppose the moral
>> of the story is always release early, and often.
>>
>> That said, I'm relatively happy with the current state of integration.
>> There are certainly some specific areas that can still be greatly
>> improved (in particular, the main simulator loop has not had as much
>> attention paid as the LinSched<>Kernel interactions and there's a long
>> list of TODOs that could be improved there), but things are now mated
>> fairly cleanly through the use of a new LinSched architecture.  This
>> is a total re-write of almost all LinSched<>Kernel interactions versus
>> the previous (2.6.35) version, and has allowed us to now carry almost
>> zero modifications against the kernel source.  It's both possible to
>> develop/test in place, as well as being patch compatible.  The
>> remaining touch-points now total just 20 lines!  Half of these are
>> likely mergeable, with the other 10 lines being more LinSched-specific
>> at this point in time.  I've broken these down below:
>>
>> The total damage:
>>  include/linux/init.h      |    6 ++++++
>>    (linsched ugliness, unfortunately necessary until we boot-strap
>>    proper initcall support)
>>  include/linux/rcupdate.h  |    3 +++
>>    (only necessary to allow -O0 compilation which is extremely handy
>>    for analyzing the scheduler using gdb)
>>  kernel/pid.c              |    4 ++++
>>    (linsched ugliness, these can go eventually)
>>  kernel/sched/fair.c       |    2 +-
>>    (this is just the promotion of 1 structure and function from static
>>    state which weren't published in the sched/ re-factoring that we
>>    need from within the simulator)
>>  kernel/sched/stats.c      |    2 +-
>>  kernel/time/timekeeping.c |    3 ++-
>>    (this fixes a time-dilation error due to rounding when our
>>    clock-source has ns-resolution, e.g. shift==1)
>>  6 files changed, 17 insertions(+), 3 deletions(-)
>>
>> Summarized changes vs 2.6.35 (previous version):
>>
>> - The original LinSched (understandably) simplified many of the kernel
>> interactions in order to make simulation easier.  Unfortunately, this
>> has serious side-effects on the accuracy of simulation.  We've now
>> introduced a large portion of this state, including: irq and soft-irq
>> contexts (we now perform periodic load-balance out of SCHED_SOFTIRQ
>> for example), support for active load-balancing, correctly modeled
>> nohz interactions, ipi and stop-task support.
>>
>> - Support for record and replay of application scheduling via perf.
>> This is not yet well integrated, but under tests there are tools to
>> record an application's behavior using perf sched record, and then play
>> it back in the simulator.
>>
>> - Load-balancer scoring.  This one is a very promising new avenue for
>> load-balancer testing.  We analyzed several workloads and found that
>> they could be well-modeled using a log-normal distribution.
>> Parameterizing these models then allows us to construct a large (500)
>> test-case set of randomly generated workloads that behave similarly.
>> By integrating the variance between the current load-balance and an
>> offline computed (currently greedy first-fit) balance we're able to
>> automatically identify and score an approximation of our distance from
>> an ideal load-balance.  Historically, such scores are very difficult
>> to interpret; however, that's where our ability to generate a large
>> set of test-cases above comes in.  This allows us to exploit a nice
>> property: it's much easier to design a scoring function that diverges
>> (in this case the variance) than a nice stable one that converges.  We
>> can then catch regressions in load-balancer quality by measuring the
>> divergence in this set of scoring functions across our set of
>> test-cases.  This particular feature needs a large set of
>> documentation in itself (todo), but to get started with playing with
>> it see Makefile.mcarlo-sims in tools/linsched/tests.  In particular to
>> evaluate the entire set across a variety of topologies the following
>> command can be issued:
>>   make -j <num_cpus * 2> -f Makefile.mcarlo-sims
>> (The included 'diff-mcarlo-500' tool can then be used to make
>> comparisons across result sets.)
>>
>> - Validation versus real hardware.  Under tests/validation we've
>> included a tool for replaying and recording the above simulations on a
>> live-machine.  These can then be compared to simulated runs using the
>> tools above to ensure that LinSched is modelling your architecture
>> reasonably appropriately.  We did some reasonably extensive
>> comparisons versus several x86 topologies in the v3.1 code using this;
>> it's a fundamentally hard problem -- in particular there's much more
>> clock drift between events on real hardware, but the results showed
>> the included topologies to be a reasonable simulacrum under LinSched.
>>
>> What's to come?
>> - More documentation, especially about the use of the new
>> load-balancer scoring tools.
>> - The history is very coarse right now as a result of going through a
>> rebase cement-mixer.  I'd like to incrementally refactor some of the
>> larger commits; once this is done I will promote linsched-alpha to a
>> stable linsched branch that won't be subject to history re-writes.
>> - KBuild integration.  We currently build everything out of the
>> tools/linsched makefiles.  One of the immediate TODOs involves
>> re-working the arch/linsched half of this to work with kbuild so that
>> it's less hacky/fragile.
>> - Writing up some of the existing TODOs as starting points for anyone
>> who wants to get involved.
>>
>> I'd also like to take a moment to specially recognize the effort of
>> the following contributors, all of whom were involved extensively in
>> the work above.  Things have come a long way since the 5000 lines of
>> "#ifdef LINSCHED"; the current status would not be possible without
>> them.
>>   Ben Segall, Dhaval Giani, Ranjit Manomohan, Nikhil Rao, and Abhishek
>> Srivastava
>>
>> Thanks!
>>
>
>


end of thread, other threads:[~2012-07-23 12:55 UTC | newest]

Thread overview: 12+ messages
2012-03-15  3:58 [ANNOUNCE] LinSched for v3.3-rc7 Paul Turner
2012-03-15  4:08 ` Dhaval Giani
2012-03-21  9:20   ` Michael Wang
2012-03-21  9:54     ` Paul Turner
2012-03-21 10:11       ` Michael Wang
2012-03-21 14:20   ` Morten Rasmussen
2012-03-15  7:21 ` Ingo Molnar
2012-03-23  4:03 ` Michael Wang
2012-03-28  5:19   ` Michael Wang
2012-04-09  3:29 ` Michael Wang
2012-07-23  3:03 ` Michael Wang
2012-07-23 12:54   ` Paul Turner
