* [PATCHSET] cpuhog: implement and use cpuhog
@ 2010-03-08 15:53 Tejun Heo
  2010-03-08 15:53 ` [PATCH 1/4] cpuhog: implement cpuhog Tejun Heo
                   ` (4 more replies)
  0 siblings, 5 replies; 23+ messages in thread
From: Tejun Heo @ 2010-03-08 15:53 UTC
  To: linux-kernel, rusty, sivanich, heiko.carstens, torvalds, mingo,
	peterz, dipankar, josh, paulmck, oleg, akpm

Hello, all.

This patchset implements cpuhog, a simplistic cpu monopolization
mechanism, then reimplements stop_machine() on top of it and replaces
migration_thread with it.

This allows stop_machine() to be simpler and much more efficient on
very large machines without using more resources while also making the
rather messy overloaded migration_thread usages cleaner.

This should solve the slow boot problem[1] caused by repeated
stop_machine workqueue creation/destruction reported by Dimitri
Sivanich.

The patchset is currently on top of v2.6.33 and contains the following
patches.

 0001-cpuhog-implement-cpuhog.patch
 0002-stop_machine-reimplement-using-cpuhog.patch
 0003-scheduler-replace-migration_thread-with-cpuhog.patch
 0004-scheduler-kill-paranoia-check-in-synchronize_sched_e.patch

0001 implements cpuhog.  0002 converts stop_machine.  0003 converts
migration users and 0004 removes paranoia checks in
synchronize_sched_expedited().  0004 is done separately so that 0003
can serve as a debug/bisection point.

I tested cpu on/offlining, shutdown and all migration usage paths,
including the RCU torture test, at 0003 and 0004, and everything seems
to work fine here.  Dimitri, can you please test whether this solves
the problem you're seeing there?

The patches are available in the following git tree.

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git cpuhog

diffstat follows.

 Documentation/RCU/torture.txt |   10 -
 arch/s390/kernel/time.c       |    1 
 drivers/xen/manage.c          |   14 -
 include/linux/cpuhog.h        |   24 ++
 include/linux/rcutiny.h       |    2 
 include/linux/rcutree.h       |    1 
 include/linux/stop_machine.h  |   20 --
 kernel/Makefile               |    2 
 kernel/cpu.c                  |    8 
 kernel/cpuhog.c               |  362 ++++++++++++++++++++++++++++++++++++++++++
 kernel/module.c               |   14 -
 kernel/rcutorture.c           |    2 
 kernel/sched.c                |  327 +++++++++----------------------------
 kernel/stop_machine.c         |  162 ++++--------------
 14 files changed, 509 insertions(+), 440 deletions(-)

Thanks.

--
tejun

[1] http://thread.gmane.org/gmane.linux.kernel/957726

* [PATCHSET sched/core] cpuhog: implement and use cpuhog, take#2
@ 2010-03-17  8:40 Tejun Heo
  2010-03-17  8:40 ` [PATCH 2/4] stop_machine: reimplement using cpuhog Tejun Heo
  0 siblings, 1 reply; 23+ messages in thread
From: Tejun Heo @ 2010-03-17  8:40 UTC
  To: linux-kernel, rusty, sivanich, heiko.carstens, torvalds, mingo,
	dipankar, josh, paulmck, oleg, akpm, peterz, arjan

Hello,

This is the second take of the cpuhog patchset, which implements a
simplistic cpu monopolization mechanism, reimplements stop_machine()
on top of it and replaces migration_thread with it.

This allows stop_machine() to be simpler and much more efficient on
very large machines without using more resources while also making the
rather messy overloaded migration_thread usages cleaner.

This should solve the slow boot problem[1] caused by repeated
stop_machine workqueue creation/destruction reported by Dimitri
Sivanich.

Changes from the last take[L] are

* cpuhog callbacks are no longer allowed to sleep.  Preemption is
  disabled around them.

* Patches are rebased on top of sched/core.

This patchset contains the following four patches.

 0001-cpuhog-implement-cpuhog.patch
 0002-stop_machine-reimplement-using-cpuhog.patch
 0003-scheduler-replace-migration_thread-with-cpuhog.patch
 0004-scheduler-kill-paranoia-check-in-synchronize_sched_e.patch

I tested cpu on/offlining, shutdown and all migration usage paths,
including the RCU torture test, at 0003 and 0004, and everything seems
to work fine here.

Peter suggested reusing stop_cpu/machine() names instead of
introducing new hog_* names.  While renaming definitely is an option,
I think it's better to keep the distinction between
stop_{cpu|machine}() and this maximum priority scheduler based
monopolization mechanism.

hog_[one_]cpu[s]() schedule the highest priority task to monopolize the
upper half of the cpu[s], but they don't affect the contextless part of
the cpu (hard/soft irq, bh, tasklet...), nor do they coordinate with
each other to make sure all the cpus are synchronized while executing
the callbacks.  In that sense, it is the lowest bottom of the upper
half but not quite stopping the cpu or the machine, and I think the
distinction is meaningful to make.  Now that the callbacks are not
allowed to sleep, they really 'hog' the target cpus too.
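
To make the shape of this concrete, here is a rough userspace model of
the queue-and-wait pattern described above.  It is only a sketch under
stated assumptions, not the code in the patches: it uses pthreads
instead of per-cpu kernel threads, it has no notion of scheduling
priority or preemption, and every name in it (hog_fn_t, struct hogger,
model_hog_one(), hog_demo()) is hypothetical.  It only shows that a
caller hands a callback to a dedicated per-cpu thread and waits for it
to finish, and that nothing synchronizes the threads of different cpus
with each other.

```c
/* Userspace model only: all names are hypothetical, not the patch's API. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef int (*hog_fn_t)(void *arg);

struct hog_work {			/* one queued request */
	hog_fn_t fn;
	void *arg;
	int ret;
	int done;
};

struct hogger {				/* stands in for one cpu's hog thread */
	pthread_mutex_t lock;
	pthread_cond_t cond;
	struct hog_work *pending;	/* single slot keeps the sketch short */
	int stop;
	pthread_t thread;
};

static void *hogger_thread(void *data)
{
	struct hogger *h = data;

	pthread_mutex_lock(&h->lock);
	for (;;) {
		while (!h->pending && !h->stop)
			pthread_cond_wait(&h->cond, &h->lock);
		if (!h->pending)	/* stop requested, nothing queued */
			break;
		struct hog_work *w = h->pending;
		h->pending = NULL;
		pthread_mutex_unlock(&h->lock);
		/* The real callbacks run with preemption disabled and must
		 * not sleep; this model cannot express that. */
		int ret = w->fn(w->arg);
		pthread_mutex_lock(&h->lock);
		w->ret = ret;
		w->done = 1;
		pthread_cond_broadcast(&h->cond);
	}
	pthread_mutex_unlock(&h->lock);
	return NULL;
}

static int hogger_start(struct hogger *h)
{
	pthread_mutex_init(&h->lock, NULL);
	pthread_cond_init(&h->cond, NULL);
	h->pending = NULL;
	h->stop = 0;
	return pthread_create(&h->thread, NULL, hogger_thread, h);
}

static void hogger_stop(struct hogger *h)
{
	pthread_mutex_lock(&h->lock);
	h->stop = 1;
	pthread_cond_broadcast(&h->cond);
	pthread_mutex_unlock(&h->lock);
	pthread_join(h->thread, NULL);
}

/* Queue fn on one modelled cpu and wait until it has run there. */
static int model_hog_one(struct hogger *h, hog_fn_t fn, void *arg)
{
	struct hog_work w = { .fn = fn, .arg = arg };

	pthread_mutex_lock(&h->lock);
	while (h->pending)		/* wait for the slot to free up */
		pthread_cond_wait(&h->cond, &h->lock);
	h->pending = &w;
	pthread_cond_broadcast(&h->cond);
	while (!w.done)
		pthread_cond_wait(&h->cond, &h->lock);
	pthread_mutex_unlock(&h->lock);
	return w.ret;
}

static int bump(void *arg)
{
	int *counter = arg;

	return ++*counter;
}

/* Run bump() once on each of two modelled cpus, one after the other. */
int hog_demo(void)
{
	struct hogger cpus[2];
	int counter = 0, last = 0;

	for (int i = 0; i < 2; i++) {
		int err = hogger_start(&cpus[i]);
		assert(!err);
		(void)err;
	}
	for (int i = 0; i < 2; i++)
		last = model_hog_one(&cpus[i], bump, &counter);
	for (int i = 0; i < 2; i++)
		hogger_stop(&cpus[i]);
	return last;
}
```

Calling hog_demo() from a main() built with -pthread runs the callback
once per modelled cpu and returns the value from the last run.  The
single pending slot is a simplification for brevity; a real
implementation would presumably keep a list of queued works per cpu.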

I wanted to avoid verbs associated with the traditional workqueue -
schedule and queue, while emphasizing that this is something that you
don't want to abuse - so the verb hog.  monopolize_cpu() was the
second choice but hog is shorter and can be used as a noun as-is, so I
chose hog.  If you like/dislike the name, please let me know.

If Rusty and the RCU people agree (where are you? :-), I think it would
be easiest to route the whole thing through the sched tree, so I
rebased it on top of sched/core.  Again, if you disagree, please let me
know.

The patches are available in the following git tree.

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git cpuhog

diffstat follows.

 Documentation/RCU/torture.txt |   10 -
 arch/s390/kernel/time.c       |    1 
 drivers/xen/manage.c          |   14 -
 include/linux/cpuhog.h        |   24 ++
 include/linux/rcutiny.h       |    2 
 include/linux/rcutree.h       |    1 
 include/linux/stop_machine.h  |   20 --
 kernel/Makefile               |    2 
 kernel/cpu.c                  |    8 
 kernel/cpuhog.c               |  368 ++++++++++++++++++++++++++++++++++++++++++
 kernel/module.c               |   14 -
 kernel/rcutorture.c           |    2 
 kernel/sched.c                |  282 +++++---------------------------
 kernel/sched_fair.c           |   39 ++--
 kernel/stop_machine.c         |  162 ++++--------------
 15 files changed, 511 insertions(+), 438 deletions(-)

Thanks.

--
tejun

[1] http://thread.gmane.org/gmane.linux.kernel/957726
[L] http://thread.gmane.org/gmane.linux.kernel/958743

Thread overview: 23+ messages
2010-03-08 15:53 [PATCHSET] cpuhog: implement and use cpuhog Tejun Heo
2010-03-08 15:53 ` [PATCH 1/4] cpuhog: implement cpuhog Tejun Heo
2010-03-08 19:01   ` Oleg Nesterov
2010-03-08 23:18     ` Tejun Heo
2010-03-08 15:53 ` [PATCH 2/4] stop_machine: reimplement using cpuhog Tejun Heo
2010-03-08 16:32   ` Arjan van de Ven
2010-03-08 23:21     ` Tejun Heo
2010-03-08 17:10   ` Heiko Carstens
2010-03-08 18:27     ` Oleg Nesterov
2010-03-08 19:37       ` Heiko Carstens
2010-03-08 23:39         ` Tejun Heo
2010-03-09  7:09           ` Heiko Carstens
2010-03-09  7:16             ` Tejun Heo
2010-03-08 19:06   ` Oleg Nesterov
2010-03-08 23:22     ` Tejun Heo
2010-03-08 15:53 ` [PATCH 3/4] scheduler: replace migration_thread with cpuhog Tejun Heo
2010-03-08 15:53 ` [PATCH 4/4] scheduler: kill paranoia check in synchronize_sched_expedited() Tejun Heo
2010-03-10 19:25 ` [PATCHSET] cpuhog: implement and use cpuhog Peter Zijlstra
2010-03-12  3:13   ` Tejun Heo
2010-03-29  6:46     ` Rusty Russell
2010-03-29  9:11     ` Peter Zijlstra
2010-04-02  5:45       ` Tejun Heo
2010-03-17  8:40 [PATCHSET sched/core] cpuhog: implement and use cpuhog, take#2 Tejun Heo
2010-03-17  8:40 ` [PATCH 2/4] stop_machine: reimplement using cpuhog Tejun Heo