* [ANNOUNCE] 3.9-rc1-nohz1
@ 2013-03-09  0:50 Frederic Weisbecker
  2013-03-09  8:26 ` Ingo Molnar
  0 siblings, 1 reply; 5+ messages in thread
From: Frederic Weisbecker @ 2013-03-09  0:50 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Ingo Molnar, Li Zhong, Namhyung Kim,
	Paul E. McKenney, Paul Gortmaker, Peter Zijlstra, Steven Rostedt,
	Thomas Gleixner, Kevin Hilman, Mats Liljegren

Hi,

Several fixes in this release, and this version should produce far fewer
spurious warnings. Your testing and reviews are much appreciated.

The first 5 patches of the series are pending in a pull request for -tip
(3.10 material).

I'm now considering how I should upstream the rest of the series.
All the pieces that got merged until now were fairly easy because the various
chunks were pretty self-contained and independent (full dynticks cputime
accounting, printk, RCU user mode, dynticks API generalization, etc.).

Now what remains in this series is hard to cut into individual parts.
Everything depends on defining an interface, based on a kernel parameter,
to partition the set of full dynticks CPUs.

I think we really need to start using a branch in -tip and move incrementally
from there with the following steps:

	1) Set the kernel parameters and config option
	2) Handle timer wakeups, timekeeping, posix cpu timers, perf, sched, etc.
	   on top of the kernel-parameter-based CPU partition
	3) Once we know _everything_ is handled, bring the final dynticks infrastructure
	4) Upstream

This will make everything much easier for everyone: easier piecewise reviews and easier for
other people to contribute.

Because you don't want me to spam you with ~40 commits for 2 more years, right?

Thanks.

This version can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	3.9-rc1-nohz1

---
Changes since 3.8-rc6-nohz4:

* Rebase against 3.9-rc1

* Fixed a few races with exception and preemption handling [1-3/29]

* Dropped commit "sched: Remove broken check for skip clock update"
that was buggy (thanks Steve for pointing that out)

* Ignore noisy stale rq clock detection on boot and other situations
with rq->skip_clock_update [27/29]

* Dropped commit "sched: Update clock of nohz busiest rq before balancing"
that became useless (thanks Li Zhong)

* Don't issue a self IPI on timer enqueue if the CPU didn't stop its
tick [9/29]

* Slightly rename the Kconfig menu after discussion with Borislav [6/29]

* Handle broken full_nohz mask in kernel parameters (thanks Borislav) [6/29]
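
To illustrate the last item: the changelog does not say what counts as a
"broken" mask, so the following is only a userspace sketch under assumptions.
It assumes the parameter is a CPU list such as "1-3,5", that a malformed list
should result in no full dynticks CPUs at all, and that the CPU kept for
timekeeping (see patch 7/29) must never be part of the set. All names below
(model_parse_cpulist, model_sanitize_full_nohz, MODEL_NR_CPUS) are hypothetical
and are not the functions used in the series.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace model, not kernel code: up to 64 CPUs in one word. */
#define MODEL_NR_CPUS 64

/*
 * Parse a CPU list such as "1-3,5" into a bitmask.
 * Returns 0 on success, -1 if the list is empty, malformed or out of range.
 */
static int model_parse_cpulist(const char *s, uint64_t *mask)
{
	uint64_t m = 0;

	if (*s == '\0')
		return -1;

	while (*s != '\0') {
		char *end;
		unsigned long lo, hi, cpu;

		if (*s < '0' || *s > '9')
			return -1;
		lo = strtoul(s, &end, 10);
		hi = lo;
		s = end;

		if (*s == '-') {		/* range "lo-hi" */
			s++;
			if (*s < '0' || *s > '9')
				return -1;
			hi = strtoul(s, &end, 10);
			s = end;
		}

		if (lo > hi || hi >= MODEL_NR_CPUS)
			return -1;
		for (cpu = lo; cpu <= hi; cpu++)
			m |= (uint64_t)1 << cpu;

		if (*s == ',') {
			s++;
			if (*s == '\0')		/* trailing comma */
				return -1;
		} else if (*s != '\0') {
			return -1;
		}
	}

	*mask = m;
	return 0;
}

/*
 * Assumed policy: a malformed list yields an empty set, and the CPU that
 * keeps timekeeping duty (must be < MODEL_NR_CPUS) is always cleared.
 */
static uint64_t model_sanitize_full_nohz(const char *arg,
					 unsigned int timekeeper_cpu)
{
	uint64_t mask;

	if (model_parse_cpulist(arg, &mask) != 0)
		return 0;

	return mask & ~((uint64_t)1 << timekeeper_cpu);
}
```

With these assumptions, model_sanitize_full_nohz("0-3", 0) keeps CPUs 1-3
only, and a reversed range such as "3-1" is rejected as a whole.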

---
TODO list hasn't changed much:

- Posix CPU timers
- Perf events
- sched_class::task_tick()
- various other scheduler details
- ...

---
Frederic Weisbecker (29):
  context_tracking: Move exception handling to generic code
  context_tracking: Restore correct previous context state on exception
    exit
  context_tracking: Restore preempted context state after
    preempt_schedule_irq()
  cputime: Dynamically scale cputime for full dynticks accounting
  context_tracking: Enable probes by default for selftesting
  nohz: Basic full dynticks interface
  nohz: Assign timekeeping duty to a non-full-nohz CPU
  nohz: Trace timekeeping update
  nohz: Wake up full dynticks CPUs when a timer gets enqueued
  rcu: Restart the tick on non-responding full dynticks CPUs
  sched: Comment on rq->clock correctness in ttwu_do_wakeup() in nohz
  sched: Update rq clock on nohz CPU before migrating tasks
  sched: Update rq clock on nohz CPU before setting fair group shares
  sched: Update rq clock on tickless CPUs before calling
    check_preempt_curr()
  sched: Update rq clock earlier in unthrottle_cfs_rq
  sched: Update rq clock before idle balancing
  sched: Update nohz rq clock before searching busiest group on load
    balancing
  nohz: Move nohz load balancer selection into idle logic
  nohz: Full dynticks mode
  nohz: Only stop the tick on RCU nocb CPUs
  nohz: Don't turn off the tick if rcu needs it
  nohz: Don't stop the tick if posix cpu timers are running
  nohz: Add some tracing
  rcu: Don't keep the tick for RCU while in userspace
  timer: Don't run non-pinned timer to full dynticks CPUs
  sched: Use an accessor to read rq clock
  sched: Debug nohz rq clock
  sched: Update rq clock before rt sched average scale
  sched: Disable lb_bias feature for full dynticks

 arch/x86/include/asm/context_tracking.h |   21 ----
 arch/x86/kernel/kvm.c                   |    8 +-
 arch/x86/kernel/traps.c                 |   68 +++++++++-----
 arch/x86/mm/fault.c                     |    8 +-
 include/linux/context_tracking.h        |   24 +++++-
 include/linux/posix-timers.h            |    1 +
 include/linux/rcupdate.h                |    8 ++
 include/linux/sched.h                   |   14 ++-
 include/linux/tick.h                    |    9 ++
 init/Kconfig                            |    1 +
 kernel/fork.c                           |    2 +-
 kernel/hrtimer.c                        |    3 +-
 kernel/posix-cpu-timers.c               |   11 ++
 kernel/rcutree.c                        |   19 +++-
 kernel/rcutree.h                        |    1 -
 kernel/rcutree_plugin.h                 |   13 +--
 kernel/sched/core.c                     |  110 ++++++++++++++++++++--
 kernel/sched/cputime.c                  |  154 ++++++++++++++++---------------
 kernel/sched/fair.c                     |   79 +++++++++++-----
 kernel/sched/features.h                 |    3 +
 kernel/sched/rt.c                       |    8 +-
 kernel/sched/sched.h                    |   61 ++++++++++++
 kernel/sched/stats.h                    |    8 +-
 kernel/sched/stop_task.c                |    8 +-
 kernel/softirq.c                        |    5 +-
 kernel/time/Kconfig                     |    9 ++
 kernel/time/tick-broadcast.c            |    3 +-
 kernel/time/tick-common.c               |    5 +-
 kernel/time/tick-sched.c                |  134 ++++++++++++++++++++++++---
 kernel/timer.c                          |    5 +-
 30 files changed, 587 insertions(+), 216 deletions(-)

-- 
1.7.5.4



* Re: [ANNOUNCE] 3.9-rc1-nohz1
  2013-03-09  0:50 [ANNOUNCE] 3.9-rc1-nohz1 Frederic Weisbecker
@ 2013-03-09  8:26 ` Ingo Molnar
  2013-03-10 23:53   ` Frederic Weisbecker
  0 siblings, 1 reply; 5+ messages in thread
From: Ingo Molnar @ 2013-03-09  8:26 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Alessio Igor Bogani, Andrew Morton, Chris Metcalf,
	Christoph Lameter, Geoff Levand, Gilad Ben Yossef, Hakan Akkan,
	Li Zhong, Namhyung Kim, Paul E. McKenney, Paul Gortmaker,
	Peter Zijlstra, Steven Rostedt, Thomas Gleixner, Kevin Hilman,
	Mats Liljegren


* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> Hi,
> 
> Several fixes in this release, and this version should produce far fewer spurious
> warnings. Your testing and reviews are much appreciated.
> 
> The 5 first patches of the series are pending on a pull request for -tip (3.10 
> material).
> 
> I'm now considering how I should upstream the rest of the series. All the pieces 
> that got merged until now were sort of easy because the various chunks were pretty 
> self-contained and independent (full dynticks cputime accounting, printk, RCU user
> mode, dynticks API generalization, etc...).
> 
> Now what remains in this series is hard to cut into individual parts. Everything 
> depends on defining an interface with kernel parameter to partition the full 
> dynticks CPUs set.
> 
> I think we really need to start using a branch in -tip and move incrementally from 
> there with the following steps:
> 
> 	1) Set the kernel parameters and config option
> 	2) Handle timers wakeup, timekeeping, posix cpu timers, perf, sched etc...
> 	   on top of kernel parameter based CPU partition
> 	3) Once we know _everything_ is handled, bring the final dynticks infrastructure
> 	4) Upstream
> 
> This will make everything much easier for everyone: easier piecewise reviews and 
> easier for other people to contribute.
> 
> Because you don't want me to spam you with ~40 commits for 2 more years, right?
> 
> Thanks.
> 
> This version can be found at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> 	3.9-rc1-nohz1
> 
> ---
> Changes since 3.8-rc6-nohz4:
> 
> * Rebase against 3.9-rc1
> 
> * Fixed a few races with exception and preemption handling [1-3/29]
> 
> * Dropped commit "sched: Remove broken check for skip clock update"
> that was buggy (thanks Steve for pointing that out)
> 
> * Ignore noisy stale rq clock detection on boot and other situations
> with rq->skip_clock_update [27/29]
> 
> * Dropped commit "sched: Update clock of nohz busiest rq before balancing"
> that became useless (thanks Li Zhong)
> 
> * Don't issue a self IPI on timer enqueue if the CPU didn't stop its
> tick [9/29]
> 
> * Rename a bit the Kconfig menu after discussion with Borislav [6/29]
> 
> * Handle broken full_nohz mask in kernel parameters (thanks Borislav) [6/29]
> 
> ---
> TODO list hasn't changed much:
> 
> - Posix CPU timers
> - Perf events
> - sched_class::task_tick()
> - various other scheduler details
> - ...

We could certainly start tip:sched/dynticks (or tip:timers/dynticks) to accelerate 
the upstream merging of it. Nobody expressed deep concerns with the approach, so 
what is left is some more hard work.

Two quick requests:

 - Mind adding a Documentation/... file with a high level description,
   rough design, open problems, etc.?

 - Please outline how the current TODO entries affect upstream
   mergeability. Do they reduce the 'full'-ness of this dynticks mode?
   Cause outright buggy behavior? Other trade-offs?

Thanks,

	Ingo


* Re: [ANNOUNCE] 3.9-rc1-nohz1
  2013-03-09  8:26 ` Ingo Molnar
@ 2013-03-10 23:53   ` Frederic Weisbecker
  2013-03-11  7:39     ` Ingo Molnar
  0 siblings, 1 reply; 5+ messages in thread
From: Frederic Weisbecker @ 2013-03-10 23:53 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Alessio Igor Bogani, Andrew Morton, Chris Metcalf,
	Christoph Lameter, Geoff Levand, Gilad Ben Yossef, Hakan Akkan,
	Li Zhong, Namhyung Kim, Paul E. McKenney, Paul Gortmaker,
	Peter Zijlstra, Steven Rostedt, Thomas Gleixner, Kevin Hilman,
	Mats Liljegren

2013/3/9 Ingo Molnar <mingo@kernel.org>:
> We could certainly start tip:sched/dynticks (or tip:timers/dynticks) to accelerate
> the upstream merging of it. Nobody expressed deep concerns with the approach, so
> what is left is some more hard work.

Great to see you're ok with that direction! I'm working on that then.

>
> Two quick requests:
>
>  - Mind adding a Documentation/... file with a high level description,
>    rough design, open problems, etc.?

Sure! We'll maintain that along the way.

>
>  - Please outline how the current TODO entries affect upstream
>    mergability. Does it reduce the 'full'-ness of this dynticks mode?
>    Outright buggy behavior? Other trade-offs?

Mostly this is about upstream features that won't work in the current
state of the series: a posix cpu timer enqueued on a nohz CPU may end up
being ignored by the target, because no tick runs there to notice its
expiration; perf events may not be round-robined; etc.
I'll make sure to document all these items.

Thanks.


* Re: [ANNOUNCE] 3.9-rc1-nohz1
  2013-03-10 23:53   ` Frederic Weisbecker
@ 2013-03-11  7:39     ` Ingo Molnar
  2013-03-11 16:38       ` Frederic Weisbecker
  0 siblings, 1 reply; 5+ messages in thread
From: Ingo Molnar @ 2013-03-11  7:39 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Alessio Igor Bogani, Andrew Morton, Chris Metcalf,
	Christoph Lameter, Geoff Levand, Gilad Ben Yossef, Hakan Akkan,
	Li Zhong, Namhyung Kim, Paul E. McKenney, Paul Gortmaker,
	Peter Zijlstra, Steven Rostedt, Thomas Gleixner, Kevin Hilman,
	Mats Liljegren


* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> >  - Please outline how the current TODO entries affect upstream
> >    mergability. Does it reduce the 'full'-ness of this dynticks mode?
> >    Outright buggy behavior? Other trade-offs?
> 
> Mostly this is about upstream features that won't be working with the current 
> state of the art: enqueuing a posix cpu timer on a nohz CPU may result in it being 
> ignored by the target due to the lack of ticking until expiration, perf events may 
> not be round-robined, etc... I'll make sure to document all these items.

So it's "buggy behavior of existing features" it appears?

It would be really useful to add some sort of 'make it safe easily' mechanism:

 - if a posix timer is enqueued on a CPU, then the CPU should have a timer ticking

 - if perf events are active on a CPU, then it should have a timer ticking

This would make it mergeable, as most of the time systems don't have any of these
facilities active. Plus this dynticks-off mechanism would also allow us to cover any 
other (still unknown) facility that regresses. So it would be nice to have that 
option.
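
A minimal sketch of what such a 'make it safe easily' mechanism could look
like, written as a userspace model rather than kernel code: each CPU carries
a set of reasons to keep its tick, and the tick may only be stopped while that
set is empty. Every name here (struct model_cpu, the MODEL_DEP_* flags and the
model_* helpers) is hypothetical and only serves the illustration; it does not
describe how the series implements this.

```c
#include <stdbool.h>

/* Hypothetical reasons for a CPU to keep its periodic tick. */
enum model_tick_dep {
	MODEL_DEP_POSIX_CPU_TIMER = 1u << 0,	/* a posix cpu timer is enqueued */
	MODEL_DEP_PERF_EVENTS     = 1u << 1,	/* perf events are active */
};

/* Hypothetical per-CPU state. */
struct model_cpu {
	bool full_nohz;		/* CPU is in the full dynticks set */
	unsigned int tick_deps;	/* OR of enum model_tick_dep flags */
	bool tick_stopped;
};

/* The tick may be stopped only on a full dynticks CPU with no dependency. */
static bool model_can_stop_tick(const struct model_cpu *cpu)
{
	return cpu->full_nohz && cpu->tick_deps == 0;
}

/* Record a dependency and restart the tick if it was stopped. */
static void model_set_tick_dep(struct model_cpu *cpu, unsigned int dep)
{
	cpu->tick_deps |= dep;
	cpu->tick_stopped = false;
}

/* Drop a dependency; the tick stays on until the next stop attempt. */
static void model_clear_tick_dep(struct model_cpu *cpu, unsigned int dep)
{
	cpu->tick_deps &= ~dep;
}

/* Called wherever the CPU would otherwise go tickless. */
static void model_try_stop_tick(struct model_cpu *cpu)
{
	if (model_can_stop_tick(cpu))
		cpu->tick_stopped = true;
}
```

A facility that is later found to regress would only need a new flag and a
pair of set/clear calls, and grepping for the flags shows where the remaining
limitations are.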

Later on we could gradually eliminate these limitations. It would also be apparent 
where they are, just from grepping the source.

If that's done, and if it tests fine for a few weeks then this could be v3.10 
material IMO.

Thanks,

	Ingo


* Re: [ANNOUNCE] 3.9-rc1-nohz1
  2013-03-11  7:39     ` Ingo Molnar
@ 2013-03-11 16:38       ` Frederic Weisbecker
  0 siblings, 0 replies; 5+ messages in thread
From: Frederic Weisbecker @ 2013-03-11 16:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Alessio Igor Bogani, Andrew Morton, Chris Metcalf,
	Christoph Lameter, Geoff Levand, Gilad Ben Yossef, Hakan Akkan,
	Li Zhong, Namhyung Kim, Paul E. McKenney, Paul Gortmaker,
	Peter Zijlstra, Steven Rostedt, Thomas Gleixner, Kevin Hilman,
	Mats Liljegren

2013/3/11 Ingo Molnar <mingo@kernel.org>:
>
> * Frederic Weisbecker <fweisbec@gmail.com> wrote:
>
>> >  - Please outline how the current TODO entries affect upstream
>> >    mergability. Does it reduce the 'full'-ness of this dynticks mode?
>> >    Outright buggy behavior? Other trade-offs?
>>
>> Mostly this is about upstream features that won't be working with the current
>> state of the art: enqueuing a posix cpu timer on a nohz CPU may result in it being
>> ignored by the target due to the lack of ticking until expiration, perf events may
>> not be round-robined, etc... I'll make sure to document all these items.
>
> So it's "buggy behavior of existing features" it appears?

Right.

> It would be really useful to add some sort of 'make it safe easily' mechanism:
>
>  - if a posix timer is enqueued on a CPU, then the CPU should have a timer ticking
>
>  - if perf events are active on a CPU, then it should have a timer ticking
>
> this would make it mergable, as most of the time systems don't have any of these
> facilities active. Plus this dynticks-off mechanism would also allow us to cover any
> other (still unknown) facility that regresses. So it would be nice to have that
> option.

Yeah, that's how I intended to solve the issue for these cases. In fact
I don't worry that much about posix cpu timers and perf. These should
not be hard to cope with. I'm more worried about scheduler details in
scheduler_tick().

I covered the rq clock and a part of update_cpu_load_active().

Now we still have to take care of sched_avg_update(),
calc_load_account_active() and sched_class::task_tick() to make sure
we are not leaving something behind. There is rq->rt_avg, which seems to
be used for load balancing when rt tasks are around. Then there is
calc_load_update. Idle load balancing is affected as well. I haven't
looked deeply into these places, so I don't know what can be shortcut
there and what cannot.

> Later on we could gradually eliminate these limitations. It would also be apparent
> where they are, just from grepping the source.
>
> If that's done, and if it tests fine for a few weeks then this could be v3.10
> material IMO.

Ok, I wouldn't be that optimistic about the release timing, but things are
certainly going to move faster now. I'm going to reshape and send you
what I have now; then we'll have a fresher view of the rest.
