* [RFC PATCH 00/32] Nohz cpusets (was: Nohz Tasks)
@ 2011-08-15 15:51 Frederic Weisbecker
From: Frederic Weisbecker @ 2011-08-15 15:51 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Andrew Morton, Anton Blanchard, Avi Kivity,
	Ingo Molnar, Lai Jiangshan, Paul E . McKenney, Paul Menage,
	Peter Zijlstra, Stephen Hemminger, Thomas Gleixner, Tim Pepper

This is still at the draft stage. It is far from covering everything
the periodic tick does, but it has made some progress since the last
posting, so I think it's time for another early release.

= What's that? =

In the mainline kernel we have a feature (CONFIG_NO_HZ) that turns
off the periodic scheduler tick when the CPU has nothing to do,
namely when it's running the idle task.

The scheduler tick handles many things, like RCU and scheduler
internal state, jiffies accounting, wall time accounting, load
accounting, cputime accounting, the timer wheel, posix cpu timers,
etc...

However, once the CPU runs the idle task and is about to sleep, none
of this work is useful to it, so the tick can be shut down.

The benefit is energy saving: we avoid waking up the CPU needlessly
with these useless interrupts.

What this patchset does is extend that feature to non-idle cases,
implementing a new kind of "adaptive nohz". The purpose is different,
and so is the implementation.

= How does that work? =

It tries to handle all the things that the timer tick usually
handles, but using different tricks. Sometimes we can't afford to
avoid the periodic tick, but sometimes we can, and when we do, we
need to take special care.

- We can't shut down the tick if more than one task is running,
because the tick is needed for preemption. But I believe that one
day we will be able to avoid the periodic tick for that and instead
anticipate when the scheduler really needs the tick.

- We can't shut down the tick if RCU needs to complete a grace
period from the current CPU, or if it has callbacks to handle.

- We can't shut down the tick if we have a posix cpu timer queued. As
with the preemption case, we should be able to anticipate that with a
precise timer and avoid a periodic check based on HZ.

- We must restart the tick when more than one non-idle task is in the runqueue.

- We need to handle process accounting, RCU, rq clock, task tick, etc...

For now, this patchset only handles part of these needs.
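The stop conditions above can be condensed into a single predicate. The sketch below is a hypothetical illustration, not code from the patchset: `struct nohz_cpu_state`, its fields, `can_stop_tick()` and `should_restart_tick()` are invented names standing in for the scheduler, RCU and posix cpu timer checks described in the list.

```c
#include <stdbool.h>

/*
 * Hypothetical snapshot of the per-CPU state that decides whether the
 * tick is needed. Names are illustrative, not kernel API.
 */
struct nohz_cpu_state {
	unsigned int nr_running;     /* non-idle tasks on the runqueue */
	bool rcu_needs_cpu;          /* grace period to complete or callbacks queued */
	bool posix_cpu_timer_queued; /* a posix cpu timer is armed */
	bool in_nohz_cpuset;         /* CPU belongs to a nohz cpuset */
};

/* Return true if the periodic tick may be stopped on this CPU. */
bool can_stop_tick(const struct nohz_cpu_state *s)
{
	if (!s->in_nohz_cpuset)
		return false;   /* adaptive nohz is opt-in via cpusets */
	if (s->nr_running > 1)
		return false;   /* the tick still drives preemption */
	if (s->rcu_needs_cpu)
		return false;   /* RCU still needs this CPU's tick */
	if (s->posix_cpu_timer_queued)
		return false;   /* the timer is checked periodically from the tick */
	return true;
}

/* With the tick stopped, restart it as soon as a condition no longer holds. */
bool should_restart_tick(const struct nohz_cpu_state *s, bool tick_stopped)
{
	return tick_stopped && !can_stop_tick(s);
}
```

A CPU whose tick is stopped would re-evaluate this predicate whenever its state changes (for example when a second task is enqueued) and restart the tick once it returns false.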

= What's the interface? =

We use the cpuset interface by adding a nohz flag to it.
As long as a CPU is part of a nohz cpuset, it will try to enter
adaptive nohz mode whenever it can, even if it also belongs to
another cpuset that is not nohz.
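The membership rule can be sketched as follows, assuming a simplified model where each cpuset is just a 64-bit CPU mask plus the nohz flag. `struct toy_cpuset` and `cpu_in_nohz_cpuset()` are hypothetical names; the real cpuset code uses cpumasks and a cgroup hierarchy.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified cpuset: a CPU mask plus the nohz flag. */
struct toy_cpuset {
	uint64_t cpus;  /* bit N set => CPU N belongs to this cpuset */
	bool nohz;      /* the per-cpuset nohz flag */
};

/*
 * A CPU is an adaptive nohz candidate as soon as ANY cpuset that
 * contains it has the nohz flag set, even if other cpusets
 * containing the same CPU do not.
 */
bool cpu_in_nohz_cpuset(const struct toy_cpuset *sets, size_t nr_sets,
			unsigned int cpu)
{
	uint64_t bit;

	if (cpu >= 64)
		return false;   /* outside this toy model's mask width */
	bit = UINT64_C(1) << cpu;
	for (size_t i = 0; i < nr_sets; i++)
		if (sets[i].nohz && (sets[i].cpus & bit))
			return true;
	return false;
}
```

Walking every cpuset on each tick decision would be costly; one plausible alternative (not necessarily what the patchset does) is to keep a per-CPU count of nohz cpusets, updated whenever the flag or a cpuset's CPU list changes.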

= Why do we need that? =

There are at least two potential users of this feature:

* High performance computing: To optimize throughput, some workloads
run one task per CPU, each of which runs mostly in userspace. These
tasks neither want nor need to suffer the overhead of the timer
interrupt, which consumes CPU time and thrashes the CPU cache.

* Real time: Minimizing timer interrupts means fewer interrupts and
thus fewer critical sections, which usually induce latency.

= What's missing? =

Many things, such as the handling of perf events, irq work, the sched
clock tick, the runqueue clock, sched_class::task_tick(), cpu load, ...

The handling of cputimes is also incomplete, as there are other places
that use utime/stime. Process time accounting as a whole is still incomplete.

Still, the work is moving forward, and an early posting was needed
at this stage.

For those who want to play with it:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
	nohz/cpuset-v1

Frederic Weisbecker (32):
  nohz: Drop useless call in tick_nohz_start_idle()
  nohz: Drop ts->idle_active
  nohz: Drop useless ts->inidle check before rearming the tick
  nohz: Separate idle sleeping time accounting from nohz switching
  nohz: Move rcu dynticks idle mode handling to idle enter/exit APIs
  nohz: Move idle ticks stats tracking out of nohz handlers
  nohz: Rename ts->idle_tick to ts->last_tick
  nohz: Move nohz load balancer selection into idle logic
  nohz: Move ts->idle_calls into strict idle logic
  nohz: Move next idle expiring time record into idle logic area
  cpuset: Set up interface for nohz flag
  nohz: Try not to give the timekeeping duty to a cpuset nohz cpu
  nohz: Adaptive tick stop and restart on nohz cpuset
  nohz/cpuset: Don't turn off the tick if rcu needs it
  nohz/cpuset: Restart tick when switching to idle task
  nohz/cpuset: Wake up adaptive nohz CPU when a timer gets enqueued
  x86: New cpuset nohz irq vector
  nohz/cpuset: Don't stop the tick if posix cpu timers are running
  nohz/cpuset: Restart tick when nohz flag is cleared on cpuset
  nohz/cpuset: Restart the tick if printk needs it
  rcu: Restart the tick on non-responding adaptive nohz CPUs
  rcu: Restart tick if we enqueue a callback in a nohz/cpuset CPU
  nohz/cpuset: Account user and system times in adaptive nohz mode
  nohz/cpuset: Handle kernel entry/exit to account cputime
  nohz/cpuset: New API to flush cputimes on nohz cpusets
  nohz/cpuset: Flush cputime on threads in nohz cpusets when waiting leader
  nohz/cpuset: Flush cputimes on procfs stat file read
  nohz/cpuset: Flush cputimes for getrusage() and times() syscalls
  x86: Syscall hooks for nohz cpusets
  x86: Exception hooks for nohz cpusets
  rcu: Switch to extended quiescent state in userspace from nohz cpuset
  nohz/cpuset: Disable under some configs

 arch/Kconfig                           |    3 +
 arch/arm/kernel/process.c              |    4 +-
 arch/avr32/kernel/process.c            |    4 +-
 arch/blackfin/kernel/process.c         |    4 +-
 arch/microblaze/kernel/process.c       |    4 +-
 arch/mips/kernel/process.c             |    4 +-
 arch/powerpc/kernel/idle.c             |    4 +-
 arch/powerpc/platforms/iseries/setup.c |    8 +-
 arch/s390/kernel/process.c             |    4 +-
 arch/sh/kernel/idle.c                  |    4 +-
 arch/sparc/kernel/process_64.c         |    4 +-
 arch/tile/kernel/process.c             |    4 +-
 arch/um/kernel/process.c               |    4 +-
 arch/unicore32/kernel/process.c        |    4 +-
 arch/x86/Kconfig                       |    1 +
 arch/x86/include/asm/entry_arch.h      |    3 +
 arch/x86/include/asm/hw_irq.h          |    6 +
 arch/x86/include/asm/irq_vectors.h     |    2 +
 arch/x86/include/asm/smp.h             |   11 +
 arch/x86/include/asm/thread_info.h     |   10 +-
 arch/x86/kernel/entry_64.S             |    4 +
 arch/x86/kernel/irqinit.c              |    4 +
 arch/x86/kernel/process_32.c           |    4 +-
 arch/x86/kernel/process_64.c           |    5 +-
 arch/x86/kernel/ptrace.c               |   10 +
 arch/x86/kernel/smp.c                  |   26 ++
 arch/x86/kernel/traps.c                |   22 +-
 arch/x86/mm/fault.c                    |   13 +-
 fs/proc/array.c                        |    2 +
 include/linux/cpuset.h                 |   29 ++
 include/linux/kernel_stat.h            |    2 +
 include/linux/posix-timers.h           |    1 +
 include/linux/rcupdate.h               |    1 +
 include/linux/sched.h                  |   10 +-
 include/linux/tick.h                   |   50 +++-
 init/Kconfig                           |    8 +
 kernel/cpuset.c                        |  105 +++++++
 kernel/exit.c                          |    2 +
 kernel/posix-cpu-timers.c              |   12 +
 kernel/printk.c                        |   17 +-
 kernel/rcutree.c                       |   28 ++-
 kernel/sched.c                         |  132 +++++++++-
 kernel/softirq.c                       |    6 +-
 kernel/sys.c                           |    6 +
 kernel/time/tick-sched.c               |  479 ++++++++++++++++++++++++--------
 kernel/time/timer_list.c               |    4 +-
 kernel/timer.c                         |    8 +-
 47 files changed, 897 insertions(+), 185 deletions(-)

-- 
1.7.5.4




Thread overview: 139+ messages
2011-08-15 15:51 [RFC PATCH 00/32] Nohz cpusets (was: Nohz Tasks) Frederic Weisbecker
2011-08-15 15:51 ` [PATCH 01/32 RESEND] nohz: Drop useless call in tick_nohz_start_idle() Frederic Weisbecker
2011-08-29 14:23   ` Peter Zijlstra
2011-08-29 17:10     ` Frederic Weisbecker
2011-08-15 15:51 ` [PATCH 02/32 RESEND] nohz: Drop ts->idle_active Frederic Weisbecker
2011-08-29 14:23   ` Peter Zijlstra
2011-08-29 16:15     ` Frederic Weisbecker
2011-08-15 15:52 ` [PATCH 03/32 RESEND] nohz: Drop useless ts->inidle check before rearming the tick Frederic Weisbecker
2011-08-29 14:23   ` Peter Zijlstra
2011-08-29 16:58     ` Frederic Weisbecker
2011-08-15 15:52 ` [PATCH 04/32] nohz: Separate idle sleeping time accounting from nohz switching Frederic Weisbecker
2011-08-29 14:23   ` Peter Zijlstra
2011-08-29 16:32     ` Frederic Weisbecker
2011-08-29 17:44       ` Peter Zijlstra
2011-08-29 22:53         ` Frederic Weisbecker
2011-08-29 14:23   ` Peter Zijlstra
2011-08-29 17:01     ` Frederic Weisbecker
2011-08-15 15:52 ` [PATCH 05/32] nohz: Move rcu dynticks idle mode handling to idle enter/exit APIs Frederic Weisbecker
2011-08-29 14:25   ` Peter Zijlstra
2011-08-29 17:11     ` Frederic Weisbecker
2011-08-29 17:49       ` Peter Zijlstra
2011-08-29 17:59         ` Frederic Weisbecker
2011-08-29 18:06           ` Peter Zijlstra
2011-08-29 23:35             ` Frederic Weisbecker
2011-08-30 11:17               ` Peter Zijlstra
2011-08-30 14:11                 ` Frederic Weisbecker
2011-08-30 14:13                   ` Peter Zijlstra
2011-08-30 14:27                     ` Frederic Weisbecker
2011-08-30 11:19               ` Peter Zijlstra
2011-08-30 14:26                 ` Frederic Weisbecker
2011-08-30 15:22                   ` Peter Zijlstra
2011-08-30 18:45                     ` Frederic Weisbecker
2011-08-30 11:21               ` Peter Zijlstra
2011-08-30 14:32                 ` Frederic Weisbecker
2011-08-30 15:26                   ` Peter Zijlstra
2011-08-30 15:33                     ` Frederic Weisbecker
2011-08-30 15:42                       ` Peter Zijlstra
2011-08-30 18:53                         ` Frederic Weisbecker
2011-08-30 20:58                       ` Peter Zijlstra
2011-08-30 22:24                         ` Frederic Weisbecker
2011-08-31  9:17                           ` Peter Zijlstra
2011-08-31 13:37                             ` Frederic Weisbecker
2011-08-31 14:41                               ` Peter Zijlstra
2011-09-01 16:40                                 ` Paul E. McKenney
2011-09-01 17:13                                   ` Peter Zijlstra
2011-09-02  1:41                                     ` Paul E. McKenney
2011-09-02  8:24                                       ` Peter Zijlstra
2011-09-04 19:37                                         ` Paul E. McKenney
2011-09-05 14:28                                           ` Peter Zijlstra
2011-08-15 15:52 ` [PATCH 06/32] nohz: Move idle ticks stats tracking out of nohz handlers Frederic Weisbecker
2011-08-29 14:28   ` Peter Zijlstra
2011-09-06  0:35     ` Frederic Weisbecker
2011-08-15 15:52 ` [PATCH 07/32] nohz: Rename ts->idle_tick to ts->last_tick Frederic Weisbecker
2011-08-15 15:52 ` [PATCH 08/32] nohz: Move nohz load balancer selection into idle logic Frederic Weisbecker
2011-08-29 14:45   ` Peter Zijlstra
2011-09-08 14:08     ` Frederic Weisbecker
2011-09-08 17:16       ` Paul E. McKenney
2011-08-15 15:52 ` [PATCH 09/32] nohz: Move ts->idle_calls into strict " Frederic Weisbecker
2011-08-29 14:47   ` Peter Zijlstra
2011-08-29 17:34     ` Frederic Weisbecker
2011-08-29 17:59       ` Peter Zijlstra
2011-08-29 18:23         ` Frederic Weisbecker
2011-08-29 18:33           ` Peter Zijlstra
2011-08-30 14:45             ` Frederic Weisbecker
2011-08-30 15:33               ` Peter Zijlstra
2011-09-06 16:35                 ` Frederic Weisbecker
2011-08-15 15:52 ` [PATCH 10/32] nohz: Move next idle expiring time record into idle logic area Frederic Weisbecker
2011-08-15 15:52 ` [PATCH 11/32] cpuset: Set up interface for nohz flag Frederic Weisbecker
2011-08-15 15:52 ` [PATCH 12/32] nohz: Try not to give the timekeeping duty to a cpuset nohz cpu Frederic Weisbecker
2011-08-29 14:55   ` Peter Zijlstra
2011-08-30 15:17     ` Frederic Weisbecker
2011-08-30 15:30       ` Dimitri Sivanich
2011-08-30 15:37       ` Peter Zijlstra
2011-08-30 22:44         ` Frederic Weisbecker
2011-08-15 15:52 ` [PATCH 13/32] nohz: Adaptive tick stop and restart on nohz cpuset Frederic Weisbecker
2011-08-29 15:25   ` Peter Zijlstra
2011-09-06 13:03     ` Frederic Weisbecker
2011-08-29 15:28   ` Peter Zijlstra
2011-08-29 18:02     ` Frederic Weisbecker
2011-08-29 18:07       ` Peter Zijlstra
2011-08-29 18:28         ` Frederic Weisbecker
2011-08-30 12:44           ` Peter Zijlstra
2011-08-30 14:38             ` Frederic Weisbecker
2011-08-30 15:28               ` Peter Zijlstra
2011-08-29 15:32   ` Peter Zijlstra
2011-08-15 15:52 ` [PATCH 14/32] nohz/cpuset: Don't turn off the tick if rcu needs it Frederic Weisbecker
2011-08-16 20:13   ` Paul E. McKenney
2011-08-17  2:10     ` Frederic Weisbecker
2011-08-17  2:49       ` Paul E. McKenney
2011-08-29 15:36   ` Peter Zijlstra
2011-08-15 15:52 ` [PATCH 15/32] nohz/cpuset: Restart tick when switching to idle task Frederic Weisbecker
2011-08-29 15:43   ` Peter Zijlstra
2011-08-30 15:04     ` Frederic Weisbecker
2011-08-30 15:35       ` Peter Zijlstra
2011-08-15 15:52 ` [PATCH 16/32] nohz/cpuset: Wake up adaptive nohz CPU when a timer gets enqueued Frederic Weisbecker
2011-08-29 15:51   ` Peter Zijlstra
2011-08-29 15:55   ` Peter Zijlstra
2011-08-30 15:06     ` Frederic Weisbecker
2011-08-15 15:52 ` [PATCH 17/32] x86: New cpuset nohz irq vector Frederic Weisbecker
2011-08-15 15:52 ` [PATCH 18/32] nohz/cpuset: Don't stop the tick if posix cpu timers are running Frederic Weisbecker
2011-08-29 15:59   ` Peter Zijlstra
2011-08-15 15:52 ` [PATCH 19/32] nohz/cpuset: Restart tick when nohz flag is cleared on cpuset Frederic Weisbecker
2011-08-29 16:02   ` Peter Zijlstra
2011-08-30 15:10     ` Frederic Weisbecker
2011-08-15 15:52 ` [PATCH 20/32] nohz/cpuset: Restart the tick if printk needs it Frederic Weisbecker
2011-08-15 15:52 ` [PATCH 21/32] rcu: Restart the tick on non-responding adaptive nohz CPUs Frederic Weisbecker
2011-08-15 15:52 ` [PATCH 22/32] rcu: Restart tick if we enqueue a callback in a nohz/cpuset CPU Frederic Weisbecker
2011-08-16 20:20   ` Paul E. McKenney
2011-08-17  2:18     ` Frederic Weisbecker
2011-08-15 15:52 ` [PATCH 23/32] nohz/cpuset: Account user and system times in adaptive nohz mode Frederic Weisbecker
2011-08-15 15:52 ` [PATCH 24/32] nohz/cpuset: Handle kernel entry/exit to account cputime Frederic Weisbecker
2011-08-16 20:38   ` Paul E. McKenney
2011-08-17  2:30     ` Frederic Weisbecker
2011-08-15 15:52 ` [PATCH 25/32] nohz/cpuset: New API to flush cputimes on nohz cpusets Frederic Weisbecker
2011-08-15 15:52 ` [PATCH 26/32] nohz/cpuset: Flush cputime on threads in nohz cpusets when waiting leader Frederic Weisbecker
2011-08-15 15:52 ` [PATCH 27/32] nohz/cpuset: Flush cputimes on procfs stat file read Frederic Weisbecker
2011-08-15 15:52 ` [PATCH 28/32] nohz/cpuset: Flush cputimes for getrusage() and times() syscalls Frederic Weisbecker
2011-08-15 15:52 ` [PATCH 29/32] x86: Syscall hooks for nohz cpusets Frederic Weisbecker
2011-08-15 15:52 ` [PATCH 30/32] x86: Exception " Frederic Weisbecker
2011-08-15 15:52 ` [PATCH 31/32] rcu: Switch to extended quiescent state in userspace from nohz cpuset Frederic Weisbecker
2011-08-16 20:44   ` Paul E. McKenney
2011-08-17  2:43     ` Frederic Weisbecker
2011-08-15 15:52 ` [PATCH 32/32] nohz/cpuset: Disable under some configs Frederic Weisbecker
2011-08-17 16:36 ` [RFC PATCH 00/32] Nohz cpusets (was: Nohz Tasks) Avi Kivity
2011-08-18 13:25   ` Frederic Weisbecker
2011-08-20  7:45     ` Paul Menage
2011-08-23 16:36       ` Frederic Weisbecker
2011-08-24 14:41 ` Gilad Ben-Yossef
2011-08-30 14:06   ` Frederic Weisbecker
2011-08-31  3:47     ` Mike Galbraith
2011-08-31  9:28       ` Peter Zijlstra
2011-08-31 10:26         ` Mike Galbraith
2011-08-31 10:33           ` Peter Zijlstra
2011-08-31 14:00             ` Gilad Ben-Yossef
2011-08-31 14:26               ` Peter Zijlstra
2011-08-31 14:05           ` Gilad Ben-Yossef
2011-08-31 16:12             ` Mike Galbraith
2011-08-31 13:57     ` Gilad Ben-Yossef
2011-08-31 14:30       ` Peter Zijlstra
