* [ANNOUNCE] 3.8-rc6-nohz4
@ 2013-02-06 18:28 Frederic Weisbecker
  2013-02-07  2:50 ` Steven Rostedt
  0 siblings, 1 reply; 28+ messages in thread
From: Frederic Weisbecker @ 2013-02-06 18:28 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Ingo Molnar, Li Zhong, Namhyung Kim,
	Paul E. McKenney, Paul Gortmaker, Peter Zijlstra, Steven Rostedt,
	Thomas Gleixner

Hi,

Support for printk and cputime accounting on full dynticks CPUs has been
merged into the -tip tree and is likely destined for the 3.9 merge window. So this
new release is a rebase against the relevant branches in -tip and v3.8-rc6.

The remaining set of patches has thus shrunk considerably.

You can pull this branch from:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	3.8-rc6-nohz4

Remember it doesn't yet support everything the tick does, namely it lacks
full support for:

- Posix CPU timers
- Perf events
- sched_class::task_tick()
- various other scheduler details
- ...

So use it with caution!

Thanks.

PS: the next upstream focus will probably be on posix cpu timers. Maybe we can try to
make them work using timer_list or hrtimers; I don't know yet. I need to experiment.

---
Changes since 3.8-rc4-nohz3 (including those of cputime):

* Rebase against v3.8-rc6 and latest tip:/sched/core and tip:/irq/core

* Fix cputime build error with kvm modules (Thanks Sedat Dilek and
Wu Fengguang)

* Fix cputime mistyped header inclusion in ia64

* Fix more missing symbols for kvm

* Removal of profiling's timer hook also applied in -tip


---
Frederic Weisbecker (26):
  nohz: Basic full dynticks interface
  nohz: Assign timekeeping duty to a non-full-nohz CPU
  nohz: Trace timekeeping update
  nohz: Wake up full dynticks CPUs when a timer gets enqueued
  rcu: Restart the tick on non-responding full dynticks CPUs
  sched: Comment on rq->clock correctness in ttwu_do_wakeup() in nohz
  sched: Update rq clock on nohz CPU before migrating tasks
  sched: Update rq clock on nohz CPU before setting fair group shares
  sched: Update rq clock on tickless CPUs before calling
    check_preempt_curr()
  sched: Update rq clock earlier in unthrottle_cfs_rq
  sched: Update clock of nohz busiest rq before balancing
  sched: Update rq clock before idle balancing
  sched: Update nohz rq clock before searching busiest group on load
    balancing
  nohz: Move nohz load balancer selection into idle logic
  nohz: Full dynticks mode
  nohz: Only stop the tick on RCU nocb CPUs
  nohz: Don't turn off the tick if rcu needs it
  nohz: Don't stop the tick if posix cpu timers are running
  nohz: Add some tracing
  rcu: Don't keep the tick for RCU while in userspace
  timer: Don't run non-pinned timer to full dynticks CPUs
  sched: Use an accessor to read rq clock
  sched: Debug nohz rq clock
  sched: Remove broken check for skip clock update
  sched: Update rq clock before rt sched average scale
  sched: Disable lb_bias feature for full dynticks

 include/linux/posix-timers.h |    1 +
 include/linux/rcupdate.h     |    8 +++
 include/linux/sched.h        |   10 +++-
 include/linux/tick.h         |    9 +++
 kernel/hrtimer.c             |    3 +-
 kernel/posix-cpu-timers.c    |   11 ++++
 kernel/rcutree.c             |   19 +++++--
 kernel/rcutree.h             |    1 -
 kernel/rcutree_plugin.h      |   13 +---
 kernel/sched/core.c          |  104 ++++++++++++++++++++++++++++++---
 kernel/sched/fair.c          |   96 ++++++++++++++++++++++--------
 kernel/sched/features.h      |    3 +
 kernel/sched/rt.c            |    8 +-
 kernel/sched/sched.h         |   50 ++++++++++++++++
 kernel/sched/stats.h         |    8 +-
 kernel/sched/stop_task.c     |    8 +-
 kernel/softirq.c             |    5 +-
 kernel/time/Kconfig          |    9 +++
 kernel/time/tick-broadcast.c |    3 +-
 kernel/time/tick-common.c    |    5 +-
 kernel/time/tick-sched.c     |  132 +++++++++++++++++++++++++++++++++++++----
 kernel/timer.c               |    5 +-
 22 files changed, 427 insertions(+), 84 deletions(-)

-- 
1.7.5.4


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [ANNOUNCE] 3.8-rc6-nohz4
  2013-02-06 18:28 [ANNOUNCE] 3.8-rc6-nohz4 Frederic Weisbecker
@ 2013-02-07  2:50 ` Steven Rostedt
  2013-02-07 11:10   ` Ingo Molnar
  2013-02-07 16:41   ` Frederic Weisbecker
  0 siblings, 2 replies; 28+ messages in thread
From: Steven Rostedt @ 2013-02-07  2:50 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Alessio Igor Bogani, Andrew Morton, Chris Metcalf,
	Christoph Lameter, Geoff Levand, Gilad Ben Yossef, Hakan Akkan,
	Ingo Molnar, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Thomas Gleixner

I'll reply to this as I come up with comments.

First thing is, don't call it NO_HZ_FULL. A better name would be
NO_HZ_CPU. I would like to reserve NO_HZ_FULL for when we totally remove
jiffies :-)

And the kconfig help should probably call it "Adaptive tickless" or
"Tickless for single tasks". The full tickless system really sounds like
we totally removed jiffies. It should explain it better. Something like:

  "Adaptive tickless system"

With this option, you may designate CPUs that will turn off the periodic
interrupt "tick" when only a single task is scheduled on the CPU. This
is similar to NO_HZ where the tick is suspended when the CPU goes into
idle. With this option, it takes it one step further. When only a single
task is scheduled on the CPU, the scheduler does not need to keep
track of time slices, as the running task does not need to be preempted
for other tasks. Stopping the tick allows the task to avoid being
interrupted by service routines by the kernel.

CPUs must be designated at time of boot via the kernel command line
parameter (cpu_nohz) and must be a subset of the rcu_nocb parameter,
which prevents RCU service routines from being called on the CPUs as
well.

---

Something like that.
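
As a rough illustration of the subset requirement in the proposed help text, the check could be sketched as follows. This is a userspace sketch, not kernel code: cpu_mask_t, cpu_bit() and nohz_mask_is_valid() are hypothetical names, and the two masks stand in for whatever the cpu_nohz and rcu_nocb boot parameters would parse into.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical representation: one bit per CPU, up to 64 CPUs. */
typedef uint64_t cpu_mask_t;

/* Build a mask with only the given CPU set. */
static cpu_mask_t cpu_bit(unsigned int cpu)
{
	assert(cpu < 64);
	return (cpu_mask_t)1 << cpu;
}

/* True when every CPU listed in nohz is also listed in nocb. */
static bool nohz_mask_is_valid(cpu_mask_t nohz, cpu_mask_t nocb)
{
	return (nohz & ~nocb) == 0;
}
```

With rcu_nocb covering CPUs 1-3, a cpu_nohz set of CPUs 1-2 would pass this check, while a set that also names CPU 4 would be rejected.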

-- Steve




* Re: [ANNOUNCE] 3.8-rc6-nohz4
  2013-02-07  2:50 ` Steven Rostedt
@ 2013-02-07 11:10   ` Ingo Molnar
  2013-02-07 15:41     ` Christoph Lameter
                       ` (2 more replies)
  2013-02-07 16:41   ` Frederic Weisbecker
  1 sibling, 3 replies; 28+ messages in thread
From: Ingo Molnar @ 2013-02-07 11:10 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Frederic Weisbecker, LKML, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Thomas Gleixner


* Steven Rostedt <rostedt@goodmis.org> wrote:

> I'll reply to this as I come up with comments.
> 
> First thing is, don't call it NO_HZ_FULL. A better name would 
> be NO_HZ_CPU. I would like to reserve NO_HZ_FULL when we 
> totally remove jiffies :-)

I don't think we want yet another config option named in a 
weird way.

What we want instead is to just split NO_HZ up into its 
conceptual parts:

   CONFIG_NO_HZ_IDLE
   CONFIG_NO_HZ_USER_SPACE
   CONFIG_NO_HZ_KERNEL_SPACE

Where the current status quo is NO_HZ_IDLE=y, and Frederic is 
about to introduce NO_HZ_USER_SPACE=y. When jiffies get removed 
we get NO_HZ_KERNEL_SPACE=y.

The 'CONFIG_NO_HZ' meta-option, which we should leave for easy 
configurability and for compatibility, should get us the 
currently recommended default, which for the time being might 
be:

   CONFIG_NO_HZ_IDLE=y
   # CONFIG_NO_HZ_USER_SPACE is disabled

Btw., you could add CONFIG_NO_HZ_KERNEL_SPACE right away, just 
keep it false all the time. That would document our future plans 
pretty well.

Once CONFIG_NO_HZ_USER_SPACE is proven problem-free, we might 
default to:

   CONFIG_NO_HZ_IDLE=y
   CONFIG_NO_HZ_USER_SPACE=y

The goal is to have this in the distant future:

   CONFIG_NO_HZ=y

   CONFIG_NO_HZ_IDLE=y
   CONFIG_NO_HZ_USER_SPACE=y
   CONFIG_NO_HZ_KERNEL_SPACE=y

And eventually we might even be able to get rid of all the 3 
variants, and only offer full-on/off.

Agreed?

Thanks,

	Ingo


* Re: [ANNOUNCE] 3.8-rc6-nohz4
  2013-02-07 11:10   ` Ingo Molnar
@ 2013-02-07 15:41     ` Christoph Lameter
  2013-02-07 16:12     ` Steven Rostedt
  2013-02-07 16:25     ` Frederic Weisbecker
  2 siblings, 0 replies; 28+ messages in thread
From: Christoph Lameter @ 2013-02-07 15:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, Frederic Weisbecker, LKML, Alessio Igor Bogani,
	Andrew Morton, Chris Metcalf, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Thomas Gleixner

On Thu, 7 Feb 2013, Ingo Molnar wrote:

> Agreed?

Yes and please also change the texts in Kconfig to accurately describe
what happens to the timer tick.


* Re: [ANNOUNCE] 3.8-rc6-nohz4
  2013-02-07 11:10   ` Ingo Molnar
  2013-02-07 15:41     ` Christoph Lameter
@ 2013-02-07 16:12     ` Steven Rostedt
  2013-02-07 16:30       ` Paul E. McKenney
  2013-02-07 16:25     ` Frederic Weisbecker
  2 siblings, 1 reply; 28+ messages in thread
From: Steven Rostedt @ 2013-02-07 16:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, LKML, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Thomas Gleixner

On Thu, 2013-02-07 at 12:10 +0100, Ingo Molnar wrote:
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > I'll reply to this as I come up with comments.
> > 
> > First thing is, don't call it NO_HZ_FULL. A better name would 
> > be NO_HZ_CPU. I would like to reserve NO_HZ_FULL when we 
> > totally remove jiffies :-)
> 
> I don't think we want yet another config option named in a 
> weird way.
> 
> What we want instead is to just split NO_HZ up into its 
> conceptual parts:
> 
>    CONFIG_NO_HZ_IDLE
>    CONFIG_NO_HZ_USER_SPACE
>    CONFIG_NO_HZ_KERNEL_SPACE
> 
> Where the current status quo is NO_HZ_IDLE=y, and Frederic is 
> about to introduce NO_HZ_USER_SPACE=y. When jiffies get removed 
> we get NO_HZ_KERNEL_SPACE=y.

Calling it NO_HZ_USER_SPACE is a bit of a misnomer, as we don't just stop
the tick for user space; it may remain stopped when entering the
kernel. The rule is that when there's just a single task on a CPU, the
tick can stop (no scheduling work needed). But if the task triggers
something that may require a tick (like printk) then the tick will start
again. Just going into the kernel does not by itself cause a tick restart.

Maybe a better name would be NO_HZ_SINGLE_TASK ?
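
The rule described above can be sketched in C roughly like this. It is an illustrative userspace model only: the tick_dep flags and tick_can_stop() are hypothetical names, not identifiers from the nohz tree, and the list of reasons to keep the tick is an assumption drawn from examples mentioned in this thread (printk, posix cpu timers, RCU).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reasons a CPU may still need its tick; not kernel identifiers. */
enum tick_dep {
	TICK_DEP_PRINTK      = 1u << 0, /* pending printk work */
	TICK_DEP_POSIX_TIMER = 1u << 1, /* a posix cpu timer is armed */
	TICK_DEP_RCU         = 1u << 2, /* RCU still needs this CPU */
	TICK_DEP_ALL         = (1u << 3) - 1,
};

/*
 * The tick may stop only when exactly one task is runnable and nothing has
 * asked for the tick. Whether that task is currently in user or kernel mode
 * is deliberately not an input.
 */
static bool tick_can_stop(unsigned int nr_running, unsigned int deps)
{
	assert((deps & ~(unsigned int)TICK_DEP_ALL) == 0);
	return nr_running == 1 && deps == 0;
}
```

In this model a second runnable task or any pending dependency keeps the tick, which is why a name based purely on user space would be misleading.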

> 
> The 'CONFIG_NO_HZ' meta-option, which we should leave for easy 
> configurability and for compatibility, should get us the 
> currently recommended default, which for the time being might 
> be:
> 
>    CONFIG_NO_HZ_IDLE=y
>    # CONFIG_NO_HZ_USER_SPACE is disabled
> 
> Btw., you could add CONFIG_NO_HZ_KERNEL_SPACE right away, just 
> keep it false all the time. That would document our future plans 
> pretty well.

Maybe the removal of jiffies would be NO_HZ_COMPLETE?


-- Steve




* Re: [ANNOUNCE] 3.8-rc6-nohz4
  2013-02-07 11:10   ` Ingo Molnar
  2013-02-07 15:41     ` Christoph Lameter
  2013-02-07 16:12     ` Steven Rostedt
@ 2013-02-07 16:25     ` Frederic Weisbecker
  2013-02-07 16:41       ` Steven Rostedt
  2013-02-07 19:07       ` Ingo Molnar
  2 siblings, 2 replies; 28+ messages in thread
From: Frederic Weisbecker @ 2013-02-07 16:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, LKML, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Thomas Gleixner

2013/2/7 Ingo Molnar <mingo@kernel.org>:
>
> * Steven Rostedt <rostedt@goodmis.org> wrote:
>
>> I'll reply to this as I come up with comments.
>>
>> First thing is, don't call it NO_HZ_FULL. A better name would
>> be NO_HZ_CPU. I would like to reserve NO_HZ_FULL when we
>> totally remove jiffies :-)
>
> I don't think we want yet another config option named in a
> weird way.
>
> What we want instead is to just split NO_HZ up into its
> conceptual parts:
>
>    CONFIG_NO_HZ_IDLE

Renaming CONFIG_NO_HZ to CONFIG_NO_HZ_IDLE is something I considered.
I was just worried about this option being present in many defconfigs.
Perhaps we can do that renaming and keep CONFIG_NO_HZ around a little
while for backward compatibility (pretty much like what we've done for
CONFIG_PERF_COUNTERS -> CONFIG_PERF_EVENTS).

>    CONFIG_NO_HZ_USER_SPACE
>    CONFIG_NO_HZ_KERNEL_SPACE
>
> Where the current status quo is NO_HZ_IDLE=y, and Frederic is
> about to introduce NO_HZ_USER_SPACE=y. When jiffies get removed
> we get NO_HZ_KERNEL_SPACE=y.

Note that in my tree I stop the tick in both rings. I believe that
restarting the tick on kernel entry isn't something we should
seriously consider. It would be a costly operation that may make
things worse. And in fact there is no big difference: kernelspace just
has more opportunities to be disturbed (RCU IPIs, async timer/work
scheduled by the kernel, etc...) and get its tick restarted sometimes.

>
> The 'CONFIG_NO_HZ' meta-option, which we should leave for easy
> configurability and for compatibility, should get us the
> currently recommended default, which for the time being might
> be:

Ah looks like you considered the compatibility as well :)

>
>    CONFIG_NO_HZ_IDLE=y
>    # CONFIG_NO_HZ_USER_SPACE is disabled
>
> Btw., you could add CONFIG_NO_HZ_KERNEL_SPACE right away, just
> keep it false all the time. That would document our future plans
> pretty well.
>
> Once CONFIG_NO_HZ_USER_SPACE is proven problem-free, we might
> default to:
>
>    CONFIG_NO_HZ_IDLE=y
>    CONFIG_NO_HZ_USER_SPACE=y
>
> The goal is to have this in the distant future:
>
>    CONFIG_NO_HZ=y
>
>    CONFIG_NO_HZ_IDLE=y
>    CONFIG_NO_HZ_USER_SPACE=y
>    CONFIG_NO_HZ_KERNEL_SPACE=y
>
> And eventually we might even be able to get rid of all the 3
> variants, and only offer full-on/off.
>
> Agreed?

At least for now we seem to agree on CONFIG_NO_HZ_IDLE and keep
CONFIG_NO_HZ for compatibility. Are you ok with that? If so I'll send
a patch.


* Re: [ANNOUNCE] 3.8-rc6-nohz4
  2013-02-07 16:12     ` Steven Rostedt
@ 2013-02-07 16:30       ` Paul E. McKenney
  2013-02-07 17:06         ` Steven Rostedt
  0 siblings, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2013-02-07 16:30 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, Frederic Weisbecker, LKML, Alessio Igor Bogani,
	Andrew Morton, Chris Metcalf, Christoph Lameter, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Li Zhong, Namhyung Kim,
	Paul Gortmaker, Peter Zijlstra, Thomas Gleixner

On Thu, Feb 07, 2013 at 11:12:00AM -0500, Steven Rostedt wrote:
> On Thu, 2013-02-07 at 12:10 +0100, Ingo Molnar wrote:
> > * Steven Rostedt <rostedt@goodmis.org> wrote:
> > 
> > > I'll reply to this as I come up with comments.
> > > 
> > > First thing is, don't call it NO_HZ_FULL. A better name would 
> > > be NO_HZ_CPU. I would like to reserve NO_HZ_FULL when we 
> > > totally remove jiffies :-)
> > 
> > I don't think we want yet another config option named in a 
> > weird way.
> > 
> > What we want instead is to just split NO_HZ up into its 
> > conceptual parts:
> > 
> >    CONFIG_NO_HZ_IDLE
> >    CONFIG_NO_HZ_USER_SPACE
> >    CONFIG_NO_HZ_KERNEL_SPACE
> > 
> > Where the current status quo is NO_HZ_IDLE=y, and Frederic is 
> > about to introduce NO_HZ_USER_SPACE=y. When jiffies get removed 
> > we get NO_HZ_KERNEL_SPACE=y.
> 
> Saying NO_HZ_USER_SPACE is a bit of a misnomer. As we don't just stop
> the tick for user space, but it may remain stopped when entering the
> kernel. The rule is that when there's just a single task on a CPU, the
> tick can stop (no scheduling work needed). But if the task triggers
> something that may require a tick (like printk) then the tick will start
> again. But just going into the kernel does not designate a tick restart.
> 
> Maybe a better name would be NO_HZ_SINGLE_TASK ?
> 
> > 
> > The 'CONFIG_NO_HZ' meta-option, which we should leave for easy 
> > configurability and for compatibility, should get us the 
> > currently recommended default, which for the time being might 
> > be:
> > 
> >    CONFIG_NO_HZ_IDLE=y
> >    # CONFIG_NO_HZ_USER_SPACE is disabled
> > 
> > Btw., you could add CONFIG_NO_HZ_KERNEL_SPACE right away, just 
> > keep it false all the time. That would document our future plans 
> > pretty well.
> 
> Maybe the removal of jiffies would be NO_HZ_COMPLETE?

I suspect that removal of jiffies from the kernel will take a few stages,
with RCU being one of the laggards for a while.  Making RCU's state
machine depend wholly on process-based execution will take some care
and experimentation, especially for extreme and corner-case workloads.
For example, having RCU OOM the system just because a specific CPU was
unable to run some RCU kthread for an extended time is something to
be avoided.  ;-)

							Thanx, Paul



* Re: [ANNOUNCE] 3.8-rc6-nohz4
  2013-02-07 16:25     ` Frederic Weisbecker
@ 2013-02-07 16:41       ` Steven Rostedt
  2013-02-07 16:45         ` Frederic Weisbecker
  2013-02-07 19:07       ` Ingo Molnar
  1 sibling, 1 reply; 28+ messages in thread
From: Steven Rostedt @ 2013-02-07 16:41 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, LKML, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Thomas Gleixner

On Thu, 2013-02-07 at 17:25 +0100, Frederic Weisbecker wrote:

> At least for now we seem to agree on CONFIG_NO_HZ_IDLE and keep
> CONFIG_NO_HZ for compatibility. Are you ok with that? If so I'll send
> a patch.

I believe Ingo was suggesting that CONFIG_NO_HZ should gate a set of options
selecting which kind of NO_HZ you want. Something like:

config NO_HZ
	bool "Enable tickless support"

config NO_HZ_IDLE
	bool "Stop tick when CPU is idle"
	default y
	depends on NO_HZ

config NO_HZ_TASK
	bool "Stop tick on specified CPUs when single task is running"
	default n
	depends on NO_HZ

That is, if you select NO_HZ, by default NO_HZ_IDLE is also selected.
But kernel code would test NO_HZ_IDLE rather than NO_HZ.

-- Steve




* Re: [ANNOUNCE] 3.8-rc6-nohz4
  2013-02-07  2:50 ` Steven Rostedt
  2013-02-07 11:10   ` Ingo Molnar
@ 2013-02-07 16:41   ` Frederic Weisbecker
  2013-02-07 17:00     ` Steven Rostedt
  1 sibling, 1 reply; 28+ messages in thread
From: Frederic Weisbecker @ 2013-02-07 16:41 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Alessio Igor Bogani, Andrew Morton, Chris Metcalf,
	Christoph Lameter, Geoff Levand, Gilad Ben Yossef, Hakan Akkan,
	Ingo Molnar, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Thomas Gleixner

2013/2/7 Steven Rostedt <rostedt@goodmis.org>:
> I'll reply to this as I come up with comments.
>
> First thing is, don't call it NO_HZ_FULL. A better name would be
> NO_HZ_CPU. I would like to reserve NO_HZ_FULL when we totally remove
> jiffies :-)

I'm not sure we'll ever be able to completely remove the tick, even if
jiffies is removed.
Ok in any case, NO_HZ_ADAPTIVE is probably a more accurate name.

>
> And the kconfig help should probably call it "Adaptive tickless" or
> "Tickless for single tasks". The full tickless system really sounds like
> we totally removed jiffies. It should explain it better. Something like:
>
>   "Adaptive tickless system"

Right, the problem with "tickless" is that it implies absolute removal.
"Full dynticks" is what I think best reflects what's happening.

> With this option, you may designate CPUs that will turn off the periodic
> interrupt "tick" when only a single task is scheduled on the CPU. This
> is similar to NO_HZ where the tick is suspended when the CPU goes into
> idle. With this option, it takes it one step further. When only a single
> task is scheduled on the CPU, the scheduler does not need to keep
> track of time slices, as the running task does not need to be preempted
> for other tasks. Stopping the tick allows the task to avoid being
> interrupted by service routines by the kernel.
>
> CPUs must be designated at time of boot via the kernel command line
> parameter (cpu_nohz) and must be a subset of the rcu_nocb parameter,
> which prevents RCU service routines from being called on the CPUs as
> well.
>
> ---
>
> Something like that.

I'm not convinced that "single task" must be a fundamental component
of this. It's an implementation detail. We should be able to keep the
tick off in the future when more than one task is on the runqueue and
hrtick is on. Maybe this will never show up as a performance gain but
we don't know yet.

Ok let's talk about that single task constraint in the Kconfig help so
that the user knows the practical constraint as of today. But I
suggest we keep that as an internal detail that we can deal with in
the future.

Hm?


* Re: [ANNOUNCE] 3.8-rc6-nohz4
  2013-02-07 16:41       ` Steven Rostedt
@ 2013-02-07 16:45         ` Frederic Weisbecker
  2013-02-07 17:03           ` Steven Rostedt
  0 siblings, 1 reply; 28+ messages in thread
From: Frederic Weisbecker @ 2013-02-07 16:45 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, LKML, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Thomas Gleixner

2013/2/7 Steven Rostedt <rostedt@goodmis.org>:
> On Thu, 2013-02-07 at 17:25 +0100, Frederic Weisbecker wrote:
>
>> At least for now we seem to agree on CONFIG_NO_HZ_IDLE and keep
>> CONFIG_NO_HZ for compatibility. Are you ok with that? If so I'll send
>> a patch.
>
> I believe that Ingo was suggesting to have CONFIG_NO_HZ give options to
> what type of config NO_HZ you want. Something like:
>
> config NO_HZ
>         bool "Enable tickless support"
>
> config NO_HZ_IDLE
>         bool "Stop tick when CPU is idle"
>         default y
>         depends on NO_HZ

Sounds good!

>
> config NO_HZ_TASK
>         bool "Stop tick on specified CPUs when single task is running"
>         default n
>         depends on NO_HZ

Ok, I launched another debate about that single task thing. I'd rather we
didn't make it a fundamental component but rather an implementation
detail that can be dynamically dealt with in the future. Anyway, let's
discuss that under my previous answer.

>
> That is, if you select NO_HZ, by default NO_HZ_IDLE is also selected.
> But in the kernel the NO_HZ_IDLE is used.

Yeah, nice idea!

>
> -- Steve
>
>


* Re: [ANNOUNCE] 3.8-rc6-nohz4
  2013-02-07 16:41   ` Frederic Weisbecker
@ 2013-02-07 17:00     ` Steven Rostedt
  2013-02-07 17:18       ` Frederic Weisbecker
  0 siblings, 1 reply; 28+ messages in thread
From: Steven Rostedt @ 2013-02-07 17:00 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Alessio Igor Bogani, Andrew Morton, Chris Metcalf,
	Christoph Lameter, Geoff Levand, Gilad Ben Yossef, Hakan Akkan,
	Ingo Molnar, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Thomas Gleixner

On Thu, 2013-02-07 at 17:41 +0100, Frederic Weisbecker wrote:

> I'm not convinced that "single task" must be a fundamental component
> of this. It's an implementation detail. We should be able to keep the
> tick off in the future when more than one task is on the runqueue and
> hrtick is on. Maybe this will never show up as a performance gain but
> we don't know yet.
> 
> Ok let's talk about that single task constraint in the Kconfig help so
> that the user knows the practical constraint as of today. But I
> suggest we keep that as an internal detail that we can deal with in
> the future.

Hmm, but aren't time slices still implemented by ticks? I would think
supporting multiple tasks would be another huge change.

Maybe have:

NO_HZ_IDLE
NO_HZ_SINGLE_TASK
NO_HZ_MULTI_TASK
NO_HZ_COMPLETE

And as Ingo has suggested, maybe in the future we can remove SINGLE and
MULTI and have just COMPLETE.

But anyway, the current method has a strict requirement of a single
task, and that is user visible. I would want to keep the config name
implying that requirement.

-- Steve




* Re: [ANNOUNCE] 3.8-rc6-nohz4
  2013-02-07 16:45         ` Frederic Weisbecker
@ 2013-02-07 17:03           ` Steven Rostedt
  2013-02-07 17:45             ` Frederic Weisbecker
  0 siblings, 1 reply; 28+ messages in thread
From: Steven Rostedt @ 2013-02-07 17:03 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, LKML, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Thomas Gleixner

On Thu, 2013-02-07 at 17:45 +0100, Frederic Weisbecker wrote:

> >
> > config NO_HZ_TASK
> >         bool "Stop tick on specified CPUs when single task is running"
> >         default n
> >         depends on NO_HZ
> 
> Ok I launched another debate about that single task thing. I wish we
> don't make it a fundamental component but rather an implementation
> detail that can be dynamically dealt with in the future.

It's not just an implementation detail, as it is very visible to the
user. If they want to take advantage of a task NO_HZ they have to jump
through a few hoops to make sure only a single task is running on a
CPU. We should be broadcasting this requirement to educate the users on
exactly how they can take advantage of this feature.


>  Anyway let's
> talk about that on my previous answer.

I already did ;-)

-- Steve




* Re: [ANNOUNCE] 3.8-rc6-nohz4
  2013-02-07 16:30       ` Paul E. McKenney
@ 2013-02-07 17:06         ` Steven Rostedt
  2013-02-07 17:37           ` Paul E. McKenney
  0 siblings, 1 reply; 28+ messages in thread
From: Steven Rostedt @ 2013-02-07 17:06 UTC (permalink / raw)
  To: paulmck
  Cc: Ingo Molnar, Frederic Weisbecker, LKML, Alessio Igor Bogani,
	Andrew Morton, Chris Metcalf, Christoph Lameter, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Li Zhong, Namhyung Kim,
	Paul Gortmaker, Peter Zijlstra, Thomas Gleixner

On Thu, 2013-02-07 at 08:30 -0800, Paul E. McKenney wrote:

> I suspect that removal of jiffies from the kernel will take a few stages,
> with RCU being one of the laggards for awhile.  Making RCU's state
> machine depend wholly on process-based execution will take some care
> and experimentation, especially for extreme and corner-case workloads.
> For example, having RCU OOM the system just because a specific CPU was
> unable to run some RCU kthread for an extended time is something to
> be avoided.  ;-)

Tickless doesn't mean no timeouts or periodic timers. I think we will
always have some sort of dynamic tick when needed. It will just be more
event driven than something that goes off constantly.

-- Steve




* Re: [ANNOUNCE] 3.8-rc6-nohz4
  2013-02-07 17:00     ` Steven Rostedt
@ 2013-02-07 17:18       ` Frederic Weisbecker
  2013-02-07 19:14         ` Christoph Lameter
  0 siblings, 1 reply; 28+ messages in thread
From: Frederic Weisbecker @ 2013-02-07 17:18 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Alessio Igor Bogani, Andrew Morton, Chris Metcalf,
	Christoph Lameter, Geoff Levand, Gilad Ben Yossef, Hakan Akkan,
	Ingo Molnar, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Thomas Gleixner

2013/2/7 Steven Rostedt <rostedt@goodmis.org>:
> On Thu, 2013-02-07 at 17:41 +0100, Frederic Weisbecker wrote:
>
>> I'm not convinced that "single task" must be a fundamental component
>> of this. It's an implementation detail. We should be able to keep the
>> tick off in the future when more than one task is on the runqueue and
>> hrtick is on. Maybe this will never show up as a performance gain but
>> we don't know yet.
>>
>> Ok let's talk about that single task constraint in the Kconfig help so
>> that the user knows the practical constraint as of today. But I
>> suggest we keep that as an internal detail that we can deal with in
>> the future.
>
> Hmm, but aren't time slices still implemented by ticks?

Not with hrtick.

> I would think
> implementing multiple tasks would be another huge change.

I don't think so. Really, hrtick should take care of everything.

>
> Maybe have:
>
> NO_HZ_IDLE
> NO_HZ_SINGLE_TASK
> NO_HZ_MULTI_TASK
> NO_HZ_COMPLETE

I still see single task, multitask or complete as implementation
constraints. Once we make hrtick support dynticks, it should be
dynamically handled: if hrtick is enabled then stop the tick even on
multitask, otherwise only stop it when we have one task.

Then when we remove jiffies, the complete coverage comes along.
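
To make the dynamic handling described above concrete, here is a minimal sketch of the decision. It is a hypothetical userspace model, not code from the tree: busy_cpu_can_stop_tick() and its parameters are invented for illustration, and it assumes hrtick has already been taught to work with dynticks, which is not the case today.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical policy helper for a busy CPU. Idle CPUs are out of scope
 * here since the existing NO_HZ idle path already covers them.
 */
static bool busy_cpu_can_stop_tick(unsigned int nr_running, bool hrtick_enabled)
{
	assert(nr_running > 0);

	if (hrtick_enabled)
		return true;	/* hrtick would bound time slices instead of the tick */
	return nr_running == 1;	/* today's constraint: a single runnable task */
}
```

The point of the sketch is that the single-task condition is one branch of a runtime decision, so it would not need its own config option under this approach.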

>
> And as Ingo has suggested, maybe in the future we can remove SINGLE and
> MULTI and have just COMPLETE.

But really, turning these constraints into individually selectable
build-time options doesn't make much sense to me.

> But anyway, the current method has a strict requirement of a single
> task, and that is user visible. I would want to keep the config name
> implying that requirement.

As long as it's specified in the Kconfig help, does it matter? It's a
constraint amongst many others: you need to keep one CPU with a
periodic tick, you need to avoid posix cpu timers, etc...


* Re: [ANNOUNCE] 3.8-rc6-nohz4
  2013-02-07 17:06         ` Steven Rostedt
@ 2013-02-07 17:37           ` Paul E. McKenney
  0 siblings, 0 replies; 28+ messages in thread
From: Paul E. McKenney @ 2013-02-07 17:37 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, Frederic Weisbecker, LKML, Alessio Igor Bogani,
	Andrew Morton, Chris Metcalf, Christoph Lameter, Geoff Levand,
	Gilad Ben Yossef, Hakan Akkan, Li Zhong, Namhyung Kim,
	Paul Gortmaker, Peter Zijlstra, Thomas Gleixner

On Thu, Feb 07, 2013 at 12:06:21PM -0500, Steven Rostedt wrote:
> On Thu, 2013-02-07 at 08:30 -0800, Paul E. McKenney wrote:
> 
> > I suspect that removal of jiffies from the kernel will take a few stages,
> > with RCU being one of the laggards for awhile.  Making RCU's state
> > machine depend wholly on process-based execution will take some care
> > and experimentation, especially for extreme and corner-case workloads.
> > For example, having RCU OOM the system just because a specific CPU was
> > unable to run some RCU kthread for an extended time is something to
> > be avoided.  ;-)
> 
> Tickless doesn't mean no timeouts or periodic timers. I think we will
> always have some sort of dynamic tick when needed. It will just be more
> event driven than something that goes off constantly.

As long as we don't end up replacing a single tick with multiple hrtimers
(or whatever), ending up with more overhead and disruption than we
started with.  ;-)

							Thanx, Paul



* Re: [ANNOUNCE] 3.8-rc6-nohz4
  2013-02-07 17:03           ` Steven Rostedt
@ 2013-02-07 17:45             ` Frederic Weisbecker
  0 siblings, 0 replies; 28+ messages in thread
From: Frederic Weisbecker @ 2013-02-07 17:45 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, LKML, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Thomas Gleixner

2013/2/7 Steven Rostedt <rostedt@goodmis.org>:
> On Thu, 2013-02-07 at 17:45 +0100, Frederic Weisbecker wrote:
>
>> >
>> > config NO_HZ_TASK
>> >         bool "Stop tick on specified CPUs when single task is running"
>> >         default n
>> >         depends on NO_HZ
>>
>> Ok I launched another debate about that single task thing. I wish we
>> don't make it a fundamental component but rather an implementation
>> detail that can be dynamically dealt with in the future.
>
> It's not just an implementation detail, as it is very visible to the
> user. If they want to take advantage of a task NO_HZ they have to jump
> through a few hoops to make sure only a single task is running on a
> CPU. We should be broadcasting this requirement to educate the users on
> exactly how they can take advantage of this feature.

If you guys really insist I can make it CONFIG_NO_HZ_SINGLETASK. I
don't mind that much. Then when we support hrtick we can rename it to
NO_HZ_FULL or whatever.


* Re: [ANNOUNCE] 3.8-rc6-nohz4
  2013-02-07 16:25     ` Frederic Weisbecker
  2013-02-07 16:41       ` Steven Rostedt
@ 2013-02-07 19:07       ` Ingo Molnar
  2013-02-07 19:19         ` Steven Rostedt
  2013-02-08 15:51         ` Frederic Weisbecker
  1 sibling, 2 replies; 28+ messages in thread
From: Ingo Molnar @ 2013-02-07 19:07 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Steven Rostedt, LKML, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Thomas Gleixner


* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> 2013/2/7 Ingo Molnar <mingo@kernel.org>:
> >
> > * Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> >> I'll reply to this as I come up with comments.
> >>
> >> First thing is, don't call it NO_HZ_FULL. A better name would
> >> be NO_HZ_CPU. I would like to reserve NO_HZ_FULL when we
> >> totally remove jiffies :-)
> >
> > I don't think we want yet another config option named in a
> > weird way.
> >
> > What we want instead is to just split NO_HZ up into its
> > conceptual parts:
> >
> >    CONFIG_NO_HZ_IDLE
> 
> Renaming CONFIG_NO_HZ to CONFIG_NO_HZ_IDLE is something I 
> considered. I was just worried about this option being present 
> in many defconfig.

I don't think renaming it is an option - it's present not just 
in defconfigs, but in various distro configs, etc.

But we can add new config variables and use the existing 
CONFIG_NO_HZ value to set their default values.

> Perhaps we can do that renaming and keep CONFIG_NO_HZ around a 
> little while for backward compatibility (pretty much like what 
> we've done for CONFIG_PERF_COUNTERS -> CONFIG_PERF_EVENTS).

Yes.

> >    CONFIG_NO_HZ_USER_SPACE
> >    CONFIG_NO_HZ_KERNEL_SPACE
> >
> > Where the current status quo is NO_HZ_IDLE=y, and Frederic is
> > about to introduce NO_HZ_USER_SPACE=y. When jiffies get removed
> > we get NO_HZ_KERNEL_SPACE=y.
> 
> Note on my tree I stop the tick on both rings. I believe that 
> restarting the tick on kernel entry isn't something we should 
> seriously consider. It would be a costly operation that may 
> make things worse. And in fact there is no big difference. 
> Just kernelspace has more opportunities to be disturbed (RCU 
> IPIs, async timer/work scheduled by the kernel, etc...) and 
> get its tick restarted sometimes.

Ok.

Could we just simplify things and make this an unconditional 
option of NO_HZ? Any reason why we'd want to make this 
configurable, other than debugging?

I'm worried about the proliferation of not easily separable 
config options. We already have way too many timer and scheduler 
options to begin with.

> At least for now we seem to agree on CONFIG_NO_HZ_IDLE and 
> keep CONFIG_NO_HZ for compatibility. Are you ok with that? If 
> so I'll send a patch.

What would be the name of the new config option?

Can we just keep CONFIG_NO_HZ and extend it with your bits, and 
make sure they work well?

Thanks,

	Ingo


* Re: [ANNOUNCE] 3.8-rc6-nohz4
  2013-02-07 17:18       ` Frederic Weisbecker
@ 2013-02-07 19:14         ` Christoph Lameter
  2013-02-07 19:55           ` Ingo Molnar
                             ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Christoph Lameter @ 2013-02-07 19:14 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Steven Rostedt, LKML, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Geoff Levand, Gilad Ben Yossef, Hakan Akkan,
	Ingo Molnar, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Thomas Gleixner

On Thu, 7 Feb 2013, Frederic Weisbecker wrote:

> Not with hrtick.

hrtick? Did we not already try that a couple of years back and it turned
out that the overhead of constantly reprogramming a timer via the PCI bus
was causing too much of a performance regression?


* Re: [ANNOUNCE] 3.8-rc6-nohz4
  2013-02-07 19:07       ` Ingo Molnar
@ 2013-02-07 19:19         ` Steven Rostedt
  2013-02-08 15:51         ` Frederic Weisbecker
  1 sibling, 0 replies; 28+ messages in thread
From: Steven Rostedt @ 2013-02-07 19:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, LKML, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Thomas Gleixner

On Thu, 2013-02-07 at 20:07 +0100, Ingo Molnar wrote:

> Could we just simplify things and make this an unconditional 
> option of NO_HZ? Any reason why we'd want to make this 
> configurable, other than debugging?

I think the worry is the overhead required to keep it active. It
requires context_tracking to be enabled. Although, we may be able to
have both working.

Frederic, can we switch between context_tracking-based and tick-based
timing at run time?

If we can have it enabled without overhead then I see no problem with
it. We still need the boot time kernel parameter to implement it. Hmm,
even if we can't dynamically switch between context_tracking-based and
tick-based timing, we could make that decision at boot up based on the
kernel parameters.


> 
> I'm worried about the proliferation of not easily separable 
> config options. We already have way too many timer and scheduler 
> options to begin with.

I agree.

> 
> > At least for now we seem to agree on CONFIG_NO_HZ_IDLE and 
> > keep CONFIG_NO_HZ for compatibility. Are you ok with that? If 
> > so I'll send a patch.
> 
> What would be the name of the new config option?
> 
> Can we just keep CONFIG_NO_HZ and extend it with your bits, and 
> make sure they work well?

As long as we do not introduce performance regressions. If we can keep
it active without causing the system to slow down when not in use, then
I think it should be always enabled if CONFIG_NO_HZ is selected.

-- Steve




* Re: [ANNOUNCE] 3.8-rc6-nohz4
  2013-02-07 19:14         ` Christoph Lameter
@ 2013-02-07 19:55           ` Ingo Molnar
  2013-02-08  6:18           ` Mike Galbraith
  2013-02-08 15:53           ` Frederic Weisbecker
  2 siblings, 0 replies; 28+ messages in thread
From: Ingo Molnar @ 2013-02-07 19:55 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Frederic Weisbecker, Steven Rostedt, LKML, Alessio Igor Bogani,
	Andrew Morton, Chris Metcalf, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Thomas Gleixner


* Christoph Lameter <cl@linux.com> wrote:

> On Thu, 7 Feb 2013, Frederic Weisbecker wrote:
> 
> > Not with hrtick.
> 
> hrtick? Did we not already try that a couple of years back and 
> it turned out that the overhead of constantly reprogramming a 
> timer via the PCI bus was causing too much of a performance 
> regression?

No, it was simply buggy.

Thanks,

	Ingo


* Re: [ANNOUNCE] 3.8-rc6-nohz4
  2013-02-07 19:14         ` Christoph Lameter
  2013-02-07 19:55           ` Ingo Molnar
@ 2013-02-08  6:18           ` Mike Galbraith
  2013-02-08 15:53           ` Frederic Weisbecker
  2 siblings, 0 replies; 28+ messages in thread
From: Mike Galbraith @ 2013-02-08  6:18 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Frederic Weisbecker, Steven Rostedt, LKML, Alessio Igor Bogani,
	Andrew Morton, Chris Metcalf, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Ingo Molnar, Li Zhong, Namhyung Kim,
	Paul E. McKenney, Paul Gortmaker, Peter Zijlstra,
	Thomas Gleixner

On Thu, 2013-02-07 at 19:14 +0000, Christoph Lameter wrote: 
> On Thu, 7 Feb 2013, Frederic Weisbecker wrote:
> 
> > Not with hrtick.
> 
> hrtick? Did we not already try that a couple of years back and it turned
> out that the overhead of constantly reprogramming a timer via the PCI bus
> was causing too much of a performance regression?

Yup.

-Mike



* Re: [ANNOUNCE] 3.8-rc6-nohz4
  2013-02-07 19:07       ` Ingo Molnar
  2013-02-07 19:19         ` Steven Rostedt
@ 2013-02-08 15:51         ` Frederic Weisbecker
  2013-02-11  9:59           ` Ingo Molnar
  1 sibling, 1 reply; 28+ messages in thread
From: Frederic Weisbecker @ 2013-02-08 15:51 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, LKML, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Thomas Gleixner

2013/2/7 Ingo Molnar <mingo@kernel.org>:
>
> * Frederic Weisbecker <fweisbec@gmail.com> wrote:
>
>> 2013/2/7 Ingo Molnar <mingo@kernel.org>:
>> >
>> > * Steven Rostedt <rostedt@goodmis.org> wrote:
>> >
>> >> I'll reply to this as I come up with comments.
>> >>
>> >> First thing is, don't call it NO_HZ_FULL. A better name would
>> >> be NO_HZ_CPU. I would like to reserve NO_HZ_FULL when we
>> >> totally remove jiffies :-)
>> >
>> > I don't think we want yet another config option named in a
>> > weird way.
>> >
>> > What we want instead is to just split NO_HZ up into its
>> > conceptual parts:
>> >
>> >    CONFIG_NO_HZ_IDLE
>>
>> Renaming CONFIG_NO_HZ to CONFIG_NO_HZ_IDLE is something I
>> considered. I was just worried about this option being present
>> in many defconfig.
>
> I don't think renaming it is an option - it's present not just
> in defconfigs, but in various distro configs, etc.
>
> But we can add new config variables and use the existing
> CONFIG_NO_HZ value to set their default values.

Sure.

>> Note on my tree I stop the tick on both rings. I believe that
>> restarting the tick on kernel entry isn't something we should
>> seriously consider. It would be a costly operation that may
>> make things worse. And in fact there is no big difference.
>> Just kernelspace has more opportunities to be disturbed (RCU
>> IPIs, async timer/work scheduled by the kernel, etc...) and
>> get its tick restarted sometimes.
>
> Ok.
>
> Could we just simplify things and make this an unconditional
> option of NO_HZ? Any reason why we'd want to make this
> configurable, other than debugging?
>
> I'm worried about the proliferation of not easily separable
> config options. We already have way too many timer and scheduler
> options to begin with.

Like Steve said, this is for overhead reasons. The syscall uses the
slow path so that's ok. But we add a callback to every exception, irq
entry/exit, scheduler context switch, signal handling, user and kernel
preemption point. All of this could be reduced using static keys but even
that doesn't make me feel comfortable with this idea.
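
To make the off-case cost concrete, here is a minimal user-space sketch,
not kernel code. All names in it (tracking_enabled, context_tracking_hook,
instrumented_path, hook_calls) are hypothetical. A real static key patches
the branch site at run time; a plain flag test stands in for it here, so
the disabled case still pays one branch per instrumented path.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a static key: a plain global flag.
 * A real static key patches the branch site instead of loading a flag. */
bool tracking_enabled = false;

/* Counts how often the hypothetical tracking callback actually ran. */
unsigned long hook_calls = 0;

/* Hypothetical callback doing the expensive tracking work. */
void context_tracking_hook(void)
{
	hook_calls++;
}

/* Every instrumented path (exception, irq entry/exit, context switch,
 * ...) would go through a guard like this one. */
void instrumented_path(void)
{
	if (tracking_enabled)
		context_tracking_hook();
}
```

With tracking_enabled left false, calling instrumented_path() never
reaches the callback, which is the "off-case" whose cost is in question.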

Moreover, for now this is going to be used only in extreme use cases
such as real time and HPC. If we really have to merge this into an
all-in-one nohz kconfig, I suggest we wait for the feature to mature a
bit and prove that it can be useful beyond those specialized
workloads, and also that we can ensure its off-case overhead is not
significant.


* Re: [ANNOUNCE] 3.8-rc6-nohz4
  2013-02-07 19:14         ` Christoph Lameter
  2013-02-07 19:55           ` Ingo Molnar
  2013-02-08  6:18           ` Mike Galbraith
@ 2013-02-08 15:53           ` Frederic Weisbecker
  2013-02-08 16:18             ` Steven Rostedt
  2013-02-08 18:57             ` Clark Williams
  2 siblings, 2 replies; 28+ messages in thread
From: Frederic Weisbecker @ 2013-02-08 15:53 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Steven Rostedt, LKML, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Geoff Levand, Gilad Ben Yossef, Hakan Akkan,
	Ingo Molnar, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Thomas Gleixner

2013/2/7 Christoph Lameter <cl@linux.com>:
> On Thu, 7 Feb 2013, Frederic Weisbecker wrote:
>
>> Not with hrtick.
>
> hrtick? Did we not already try that a couple of years back and it turned
> out that the overhead of constantly reprogramming a timer via the PCI bus
> was causing too much of a performance regression?

Yeah, Peter said that reprogramming the clock every time we call
schedule() in particular was killing performance. Now maybe, on some
workloads with the tick stopped, we can find some new results.


* Re: [ANNOUNCE] 3.8-rc6-nohz4
  2013-02-08 15:53           ` Frederic Weisbecker
@ 2013-02-08 16:18             ` Steven Rostedt
  2013-02-08 16:24               ` Christoph Lameter
  2013-02-08 18:57             ` Clark Williams
  1 sibling, 1 reply; 28+ messages in thread
From: Steven Rostedt @ 2013-02-08 16:18 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Christoph Lameter, LKML, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Geoff Levand, Gilad Ben Yossef, Hakan Akkan,
	Ingo Molnar, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Thomas Gleixner

On Fri, 2013-02-08 at 16:53 +0100, Frederic Weisbecker wrote:
> 2013/2/7 Christoph Lameter <cl@linux.com>:
> > On Thu, 7 Feb 2013, Frederic Weisbecker wrote:
> >
> >> Not with hrtick.
> >
> > hrtick? Did we not already try that a couple of years back and it turned
> > out that the overhead of constantly reprogramming a timer via the PCI bus
> > was causing too much of a performance regression?
> 
> Yeah Peter said that especially reprogramming the clock everytime we
> call schedule() was killing the performances. Now may be on some
> workloads, with the tick stopped, we can find some new results.

I could imagine this being dynamic. If the system isn't very loaded, and
the scheduler is giving lots of time slices to tasks, then perhaps it
could switch to scheduling based on reprogramming the clock. Or maybe, we
could switch to a "skip ticks" method. That is, instead of completely
disabling the tick, make the tick go off every other time or less, and
use the NO_HZ code to calculate the missed ticks.
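
The missed-tick calculation could look roughly like the sketch below. It
is plain user-space C for illustration only, and missed_ticks() is a
hypothetical helper, not an existing kernel function. It assumes the time
of the last accounted tick and the current time are both available in
nanoseconds.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Number of whole tick periods that elapsed between the last accounted
 * tick and now, for a tick rate of hz ticks per second. */
uint64_t missed_ticks(uint64_t last_tick_ns, uint64_t now_ns, unsigned int hz)
{
	uint64_t period_ns;

	if (hz == 0 || now_ns <= last_tick_ns)
		return 0;

	period_ns = NSEC_PER_SEC / hz;
	return (now_ns - last_tick_ns) / period_ns;
}
```

For example, at HZ=1000 a tick that fires 4 ms after the last accounted
one would account for 4 periods at once instead of taking 4 interrupts.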

-- Steve




* Re: [ANNOUNCE] 3.8-rc6-nohz4
  2013-02-08 16:18             ` Steven Rostedt
@ 2013-02-08 16:24               ` Christoph Lameter
  0 siblings, 0 replies; 28+ messages in thread
From: Christoph Lameter @ 2013-02-08 16:24 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Frederic Weisbecker, LKML, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Geoff Levand, Gilad Ben Yossef, Hakan Akkan,
	Ingo Molnar, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Thomas Gleixner

On Fri, 8 Feb 2013, Steven Rostedt wrote:

> On Fri, 2013-02-08 at 16:53 +0100, Frederic Weisbecker wrote:
> > 2013/2/7 Christoph Lameter <cl@linux.com>:
> > > On Thu, 7 Feb 2013, Frederic Weisbecker wrote:
> > >
> > >> Not with hrtick.
> > >
> > > hrtick? Did we not already try that a couple of years back and it turned
> > > out that the overhead of constantly reprogramming a timer via the PCI bus
> > > was causing too much of a performance regression?
> >
> > Yeah Peter said that especially reprogramming the clock everytime we
> > call schedule() was killing the performances. Now may be on some
> > workloads, with the tick stopped, we can find some new results.
>
> I could imagine this being dynamic. If the system isn't very loaded, and
> the scheduler is giving lots of time slices to tasks, then perhaps it
> could switch to a reprogramming the clock based scheduling. Or maybe, we
> could switch to a "skip ticks" method. That is, instead of completely
> disabling the tick, make the tick go off every other time or less, and
> use the NO_HZ code to calculate the missed ticks.

Ok that sounds good. Automatically reducing the HZ as much as possible
would also quiet down the OS and be beneficial for low latency tasks. We
are configuring the kernels here with the lowest HZ that the hardware
allows to reduce the number of events that impact the app. The main
problem is that the network stack becomes flaky at low HZ.





* Re: [ANNOUNCE] 3.8-rc6-nohz4
  2013-02-08 15:53           ` Frederic Weisbecker
  2013-02-08 16:18             ` Steven Rostedt
@ 2013-02-08 18:57             ` Clark Williams
  2013-02-08 19:43               ` Christoph Lameter
  1 sibling, 1 reply; 28+ messages in thread
From: Clark Williams @ 2013-02-08 18:57 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Christoph Lameter, Steven Rostedt, LKML, Alessio Igor Bogani,
	Andrew Morton, Chris Metcalf, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Ingo Molnar, Li Zhong, Namhyung Kim,
	Paul E. McKenney, Paul Gortmaker, Peter Zijlstra,
	Thomas Gleixner


On Fri, 8 Feb 2013 16:53:17 +0100
Frederic Weisbecker <fweisbec@gmail.com> wrote:

> 2013/2/7 Christoph Lameter <cl@linux.com>:
> > On Thu, 7 Feb 2013, Frederic Weisbecker wrote:
> >
> >> Not with hrtick.
> >
> > hrtick? Did we not already try that a couple of years back and it turned
> > out that the overhead of constantly reprogramming a timer via the PCI bus
> > was causing too much of a performance regression?
> 
> Yeah Peter said that especially reprogramming the clock everytime we
> call schedule() was killing the performances. Now may be on some
> workloads, with the tick stopped, we can find some new results.
> --

I was a little apprehensive when you started talking about multiple
tasks in Adaptive NOHZ mode on a core but the more I started thinking
about it, I realized that we might end up in a cooperative multitasking
mode with no tick at all going. Multiple SCHED_FIFO threads could
run until blocking and another would be picked. Depends on well
behaved threads of course, so probably many cases of users shooting off
some toes with this...

Of course if you mix scheduling policies or have RT throttling turned
on we'll need some sort of tick for preemption. But if we can keep the
timer reprogramming down we may see some big wins for RT and HPC loads. 
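
One way to read the two paragraphs above as a decision rule is sketched
below. This is only an illustration in user-space C: enum policy and
tick_needed() are hypothetical names, and the rule is a simplification of
what is described here, not what any scheduler code does.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified set of scheduling policies. */
enum policy { POLICY_FIFO, POLICY_RR, POLICY_OTHER };

/* Returns true if some sort of tick is needed for preemption on a CPU
 * running the given tasks, under the simplified rule sketched above. */
bool tick_needed(const enum policy *tasks, size_t n, bool rt_throttling)
{
	size_t i;

	/* Zero or one task: nothing to preempt in favour of. */
	if (n <= 1)
		return false;

	/* RT throttling has to be able to stop a running RT task. */
	if (rt_throttling)
		return true;

	/* Anything other than SCHED_FIFO means mixed or time-sliced policies. */
	for (i = 0; i < n; i++)
		if (tasks[i] != POLICY_FIFO)
			return true;

	/* All SCHED_FIFO: cooperative, each runs until it blocks. */
	return false;
}
```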



* Re: [ANNOUNCE] 3.8-rc6-nohz4
  2013-02-08 18:57             ` Clark Williams
@ 2013-02-08 19:43               ` Christoph Lameter
  0 siblings, 0 replies; 28+ messages in thread
From: Christoph Lameter @ 2013-02-08 19:43 UTC (permalink / raw)
  To: Clark Williams
  Cc: Frederic Weisbecker, Steven Rostedt, LKML, Alessio Igor Bogani,
	Andrew Morton, Chris Metcalf, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Ingo Molnar, Li Zhong, Namhyung Kim,
	Paul E. McKenney, Paul Gortmaker, Peter Zijlstra,
	Thomas Gleixner

On Fri, 8 Feb 2013, Clark Williams wrote:

> I was a little apprehensive when you started talking about multiple
> tasks in Adaptive NOHZ mode on a core but the more I started thinking
> about it, I realized that we might end up in a cooperative multitasking
> mode with no tick at all going. Multiple SCHED_FIFO threads could
> run until blocking and another would be picked. Depends on well
> behaved threads of course, so probably many cases of users shooting off
> some toes with this...
>
> Of course if you mix scheduling policies or have RT throttling turned
> on we'll need some sort of tick for preemption. But if we can keep the
> timer reprogramming down we may see some big wins for RT and HPC loads.

We could tune the (hr)timer tick to have the same interval as the time
slice interval for a process and make that constant for all processes on a
hardware thread?



* Re: [ANNOUNCE] 3.8-rc6-nohz4
  2013-02-08 15:51         ` Frederic Weisbecker
@ 2013-02-11  9:59           ` Ingo Molnar
  0 siblings, 0 replies; 28+ messages in thread
From: Ingo Molnar @ 2013-02-11  9:59 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Steven Rostedt, LKML, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Thomas Gleixner


* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> > I'm worried about the proliferation of not easily separable 
> > config options. We already have way too many timer and 
> > scheduler options to begin with.
> 
> Like Steve said, this is for overhead reasons. The syscall 
> uses the slow path so that's ok. But we add a callback to 
> every exception, irq entry/exit, scheduler sched switch, 
> signal handling, user and kernel preemption point. This all 
> could be lowered using static keys but even that doesn't make 
> me feel comfortable with this idea.
> 
> Moreover, for now this is going to be used only on extreme 
> usecases such as real time and HPC. If we really have to merge 
> this into an all-in-one nohz kconfig, I suggest we wait for 
> the feature to mature a bit and prove that it can be useful 
> further those specialized workloads, and also that we can 
> ensure it's off-case overhead is not significant.

I have no problems with making it an option initially - as long 
as the options are logically named and interconnected.

In terms of overhead, a big plus is the reduction in user-space 
execution overhead. At HZ=1000 we easily have 0.5%-1.0% overhead 
currently. That is a *lot* of overhead if the box does mostly 
user-space execution - which most boxes do, both servers and 
desktops - not HPC systems.
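
As a rough back-of-the-envelope reading of those figures, assuming the
overhead is spread evenly over the ticks: 0.5%-1.0% of each second at
1000 ticks per second comes to about 5-10 microseconds per tick. The
helper below just encodes that arithmetic; per_tick_cost_us() is a
hypothetical name used for illustration.

```c
/* Average time spent per tick, in microseconds, given the fraction of
 * CPU time lost to the tick (0.005 means 0.5%) and the tick rate hz. */
double per_tick_cost_us(double overhead_fraction, unsigned int hz)
{
	if (hz == 0)
		return 0.0;

	/* One second is 1e6 microseconds, shared between hz ticks. */
	return overhead_fraction * 1e6 / hz;
}
```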

Thanks,

	Ingo


end of thread, other threads:[~2013-02-11  9:59 UTC | newest]

Thread overview: 28+ messages
2013-02-06 18:28 [ANNOUNCE] 3.8-rc6-nohz4 Frederic Weisbecker
2013-02-07  2:50 ` Steven Rostedt
2013-02-07 11:10   ` Ingo Molnar
2013-02-07 15:41     ` Christoph Lameter
2013-02-07 16:12     ` Steven Rostedt
2013-02-07 16:30       ` Paul E. McKenney
2013-02-07 17:06         ` Steven Rostedt
2013-02-07 17:37           ` Paul E. McKenney
2013-02-07 16:25     ` Frederic Weisbecker
2013-02-07 16:41       ` Steven Rostedt
2013-02-07 16:45         ` Frederic Weisbecker
2013-02-07 17:03           ` Steven Rostedt
2013-02-07 17:45             ` Frederic Weisbecker
2013-02-07 19:07       ` Ingo Molnar
2013-02-07 19:19         ` Steven Rostedt
2013-02-08 15:51         ` Frederic Weisbecker
2013-02-11  9:59           ` Ingo Molnar
2013-02-07 16:41   ` Frederic Weisbecker
2013-02-07 17:00     ` Steven Rostedt
2013-02-07 17:18       ` Frederic Weisbecker
2013-02-07 19:14         ` Christoph Lameter
2013-02-07 19:55           ` Ingo Molnar
2013-02-08  6:18           ` Mike Galbraith
2013-02-08 15:53           ` Frederic Weisbecker
2013-02-08 16:18             ` Steven Rostedt
2013-02-08 16:24               ` Christoph Lameter
2013-02-08 18:57             ` Clark Williams
2013-02-08 19:43               ` Christoph Lameter
