* linux-next: manual merge of the workqueues tree with the tip tree
@ 2009-11-26  8:00 Stephen Rothwell
  2009-11-26  8:12 ` Ingo Molnar
From: Stephen Rothwell @ 2009-11-26  8:00 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-next, linux-kernel, Mike Galbraith, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Peter Zijlstra

Hi Tejun,

Today's linux-next merge of the workqueues tree got a conflict in
kernel/sched.c between commit eae0c9dfb534cb3449888b9601228efa6480fdb5
("sched: Fix and clean up rate-limit newidle code") from the tip tree and
commit 710c15b748f5ce9c573cc047f419cf007a677a9a ("sched: refactor
try_to_wake_up() and implement try_to_wake_up_local()") from the
workqueues tree.

I did the following fixup which should be checked ... I can carry this
fix (if it is suitable).

However, I have gone back to a previous version of the workqueues tree
because of another issue (a build problem caused by an interaction with
the sound tree), so this is not in linux-next today.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

diff --cc kernel/sched.c
index 686be36,e488e07..0000000
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@@ -2323,7 -2336,58 +2335,69 @@@ void task_oncpu_function_call(struct ta
  	preempt_enable();
  }
  
- /***
+ static inline void ttwu_activate(struct task_struct *p, struct rq *rq,
+ 				 bool is_sync, bool is_migrate, bool is_local)
+ {
+ 	schedstat_inc(p, se.nr_wakeups);
+ 	if (is_sync)
+ 		schedstat_inc(p, se.nr_wakeups_sync);
+ 	if (is_migrate)
+ 		schedstat_inc(p, se.nr_wakeups_migrate);
+ 	if (is_local)
+ 		schedstat_inc(p, se.nr_wakeups_local);
+ 	else
+ 		schedstat_inc(p, se.nr_wakeups_remote);
+ 
+ 	activate_task(rq, p, 1);
+ 
+ 	/*
+ 	 * Only attribute actual wakeups done by this task.
+ 	 */
+ 	if (!in_interrupt()) {
+ 		struct sched_entity *se = &current->se;
+ 		u64 sample = se->sum_exec_runtime;
+ 
+ 		if (se->last_wakeup)
+ 			sample -= se->last_wakeup;
+ 		else
+ 			sample -= se->start_runtime;
+ 		update_avg(&se->avg_wakeup, sample);
+ 
+ 		se->last_wakeup = se->sum_exec_runtime;
+ 	}
+ }
+ 
+ static inline void ttwu_woken_up(struct task_struct *p, struct rq *rq,
+ 				 int wake_flags, bool success)
+ {
+ 	trace_sched_wakeup(rq, p, success);
+ 	check_preempt_curr(rq, p, wake_flags);
+ 
+ 	p->state = TASK_RUNNING;
+ #ifdef CONFIG_SMP
+ 	if (p->sched_class->task_wake_up)
+ 		p->sched_class->task_wake_up(rq, p);
++
++	if (unlikely(rq->idle_stamp)) {
++		u64 delta = rq->clock - rq->idle_stamp;
++		u64 max = 2*sysctl_sched_migration_cost;
++
++		if (delta > max)
++			rq->avg_idle = max;
++		else
++			update_avg(&rq->avg_idle, delta);
++		rq->idle_stamp = 0;
++	}
+ #endif
+ 	/*
+ 	 * Wake up is complete, fire wake up notifier.  This allows
+ 	 * try_to_wake_up_local() to be called from wake up notifiers.
+ 	 */
+ 	if (success)
+ 		fire_sched_notifier(p, wakeup);
+ }
+ 
+ /**
   * try_to_wake_up - wake up a thread
   * @p: the to-be-woken-up thread
   * @state: the mask of task states that can be woken


* Re: linux-next: manual merge of the workqueues tree with the tip tree
  2009-11-26  8:00 linux-next: manual merge of the workqueues tree with the tip tree Stephen Rothwell
@ 2009-11-26  8:12 ` Ingo Molnar
  2009-11-26  9:15   ` Tejun Heo
From: Ingo Molnar @ 2009-11-26  8:12 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Tejun Heo, linux-next, linux-kernel, Mike Galbraith,
	Thomas Gleixner, H. Peter Anvin, Peter Zijlstra


* Stephen Rothwell <sfr@canb.auug.org.au> wrote:

> Hi Tejun,
> 
> Today's linux-next merge of the workqueues tree got a conflict in
> kernel/sched.c between commit eae0c9dfb534cb3449888b9601228efa6480fdb5
> ("sched: Fix and clean up rate-limit newidle code") from the tip tree and
> commit 710c15b748f5ce9c573cc047f419cf007a677a9a ("sched: refactor
> try_to_wake_up() and implement try_to_wake_up_local()") from the
> workqueues tree.

Tejun,

Please submit scheduler patches to the scheduler tree. Such level of 
refactoring of a critical scheduler component needs to go through the 
regular scheduler channels. This is a frequently modified piece of code 
and conflicts are likely in the future.

Thanks,

	Ingo


* Re: linux-next: manual merge of the workqueues tree with the tip tree
  2009-11-26  8:12 ` Ingo Molnar
@ 2009-11-26  9:15   ` Tejun Heo
  2009-11-26  9:26     ` Ingo Molnar
From: Tejun Heo @ 2009-11-26  9:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Stephen Rothwell, linux-next, linux-kernel, Mike Galbraith,
	Thomas Gleixner, H. Peter Anvin, Peter Zijlstra

On 11/26/2009 05:12 PM, Ingo Molnar wrote:
> Please submit scheduler patches to the scheduler tree. Such level of 
> refactoring of a critical scheduler component needs to go through the 
> regular scheduler channels. This is a frequently modified piece of code 
> and conflicts are likely in the future.

Sure, which sched/* branch should I base these patches on?
Alternatively, pulling the following branch into one of sched/* should
do the trick too.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-sched

Thanks.

-- 
tejun


* Re: linux-next: manual merge of the workqueues tree with the tip tree
  2009-11-26  9:15   ` Tejun Heo
@ 2009-11-26  9:26     ` Ingo Molnar
  2009-11-26  9:48       ` Tejun Heo
From: Ingo Molnar @ 2009-11-26  9:26 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Stephen Rothwell, linux-next, linux-kernel, Mike Galbraith,
	Thomas Gleixner, H. Peter Anvin, Peter Zijlstra


* Tejun Heo <tj@kernel.org> wrote:

> On 11/26/2009 05:12 PM, Ingo Molnar wrote:
> > Please submit scheduler patches to the scheduler tree. Such level of 
> > refactoring of a critical scheduler component needs to go through the 
> > regular scheduler channels. This is a frequently modified piece of code 
> > and conflicts are likely in the future.
> 
> Sure, which sched/* branch should I base these patches on?

You could send the patch you rely on standalone (it seems to be a single 
patch) and we can look at applying it to the scheduler tree. That 
reduces the conflicts on an ongoing basis. Please Cc: PeterZ and Mike 
Galbraith as well.

Thanks,

	Ingo


* Re: linux-next: manual merge of the workqueues tree with the tip tree
  2009-11-26  9:26     ` Ingo Molnar
@ 2009-11-26  9:48       ` Tejun Heo
  2009-11-26  9:51         ` Ingo Molnar
From: Tejun Heo @ 2009-11-26  9:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Stephen Rothwell, linux-next, linux-kernel, Mike Galbraith,
	Thomas Gleixner, H. Peter Anvin, Peter Zijlstra

Hello, Ingo.

On 11/26/2009 06:26 PM, Ingo Molnar wrote:
>> Sure, which sched/* branch should I base these patches on?
> 
> You could send the patch you rely on standalone (it seems to be a single 
> patch) and we can look at applying it to the scheduler tree. That 
> reduces the conflicts on an ongoing basis. Please Cc: PeterZ and Mike 
> Galbraith as well.

The tree contains four scheduler patches.

0001-sched-rename-preempt_notifier-to-sched_notifier-and-.patch
0002-sched-update-sched_notifier-and-add-wakeup-sleep-not.patch
0003-sched-implement-sched_notifier_wake_up_process.patch
0004-sched-implement-force_cpus_allowed.patch

1, 2 and 4 are somewhat spread throughout sched.c so it would be
better if they are all routed through the sched tree.  Currently the
wq#for-sched contains the following on top of linus#master.

* Adds debugobj support to workqueue.

* Pulls in sched/urgent to receive the scheduler fix.

* Adds the above four patches.

If pulling in from the existing branch is an option, I'd prefer that.
If not, please let me know.  I'll send the above four patches against
sched/urgent.

Thanks.

-- 
tejun


* Re: linux-next: manual merge of the workqueues tree with the tip tree
  2009-11-26  9:48       ` Tejun Heo
@ 2009-11-26  9:51         ` Ingo Molnar
  2009-11-26 10:11           ` [PATCH 1/4 tip/sched/core] sched: rename preempt_notifier to sched_notifier and always enable it Tejun Heo
                             ` (3 more replies)
From: Ingo Molnar @ 2009-11-26  9:51 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Stephen Rothwell, linux-next, linux-kernel, Mike Galbraith,
	Thomas Gleixner, H. Peter Anvin, Peter Zijlstra


* Tejun Heo <tj@kernel.org> wrote:

> Hello, Ingo.
> 
> On 11/26/2009 06:26 PM, Ingo Molnar wrote:
> >> Sure, which sched/* branch should I base these patches on?
> > 
> > You could send the patch you rely on standalone (it seems to be a single 
> > patch) and we can look at applying it to the scheduler tree. That 
> > reduces the conflicts on an ongoing basis. Please Cc: PeterZ and Mike 
> > Galbraith as well.
> 
> The tree contains four scheduler patches.
> 
> 0001-sched-rename-preempt_notifier-to-sched_notifier-and-.patch
> 0002-sched-update-sched_notifier-and-add-wakeup-sleep-not.patch
> 0003-sched-implement-sched_notifier_wake_up_process.patch
> 0004-sched-implement-force_cpus_allowed.patch
> 
> 1, 2 and 4 are somewhat spread throughout sched.c so it would be
> better if they are all routed through the sched tree.  Currently the
> wq#for-sched contains the following on top of linus#master.
> 
> * Adds debugobj support to workqueue.
> 
> * Pulls in sched/urgent to receive the scheduler fix.
> 
> * Adds the above four patches.
> 
> If pulling in from the existing branch is an option, I'd prefer that. 
> If not, please let me know.  I'll send the above four patches against 
> sched/urgent.

I've merged sched/urgent into sched/core and pushed it out - mind basing
any sched.c patches on top of that and sending a series of scheduler-only
patches?

Thanks,

	Ingo


* [PATCH 1/4 tip/sched/core] sched: rename preempt_notifier to sched_notifier and always enable it
  2009-11-26  9:51         ` Ingo Molnar
@ 2009-11-26 10:11           ` Tejun Heo
  2009-11-26 10:29             ` Ingo Molnar
  2009-11-26 10:13           ` [PATCH 2/4 tip/sched/core] sched: update sched_notifier and add wakeup/sleep notifications Tejun Heo
                             ` (2 subsequent siblings)
From: Tejun Heo @ 2009-11-26 10:11 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Stephen Rothwell, linux-next, linux-kernel, Mike Galbraith,
	Thomas Gleixner, H. Peter Anvin, Peter Zijlstra

Rename preempt_notifier to sched_notifier, move it from preempt.h to
sched.h, drop sched_ prefixes from ops names and make sched_notifier
always enabled.

This is to prepare for adding more notification hooks.  This patch
doesn't make any functional changes.
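
For illustration, a minimal sketch of what a user of the renamed API
looks like (the my_* names are hypothetical and not part of this
patch; the real conversion of the KVM user follows in the diff):

struct my_ctx {
	struct sched_notifier notifier;
	/* user state */
};

static void my_in(struct sched_notifier *sn, int cpu)
{
	/* current (the owner of @sn) is about to run on @cpu */
}

static void my_out(struct sched_notifier *sn, struct task_struct *next)
{
	/* current (the owner of @sn) is being preempted by @next */
}

static struct sched_notifier_ops my_ops = {
	.in	= my_in,
	.out	= my_out,
};

static void my_attach(struct my_ctx *ctx)
{
	/* registers on current, as preempt_notifier_register() did */
	sched_notifier_init(&ctx->notifier, &my_ops);
	sched_notifier_register(&ctx->notifier);
}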

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
---
 arch/ia64/kvm/Kconfig    |    1 -
 arch/powerpc/kvm/Kconfig |    1 -
 arch/s390/kvm/Kconfig    |    1 -
 arch/x86/kvm/Kconfig     |    1 -
 include/linux/kvm_host.h |    4 +--
 include/linux/preempt.h  |   43 -------------------------------
 include/linux/sched.h    |   44 ++++++++++++++++++++++++++++---
 init/Kconfig             |    4 ---
 kernel/sched.c           |   64 +++++++++++++++-------------------------------
 virt/kvm/kvm_main.c      |   26 ++++++++----------
 10 files changed, 74 insertions(+), 115 deletions(-)

diff --git a/arch/ia64/kvm/Kconfig b/arch/ia64/kvm/Kconfig
index ef3e7be..a9e2b9c 100644
--- a/arch/ia64/kvm/Kconfig
+++ b/arch/ia64/kvm/Kconfig
@@ -22,7 +22,6 @@ config KVM
 	depends on HAVE_KVM && MODULES && EXPERIMENTAL
 	# for device assignment:
 	depends on PCI
-	select PREEMPT_NOTIFIERS
 	select ANON_INODES
 	select HAVE_KVM_IRQCHIP
 	select KVM_APIC_ARCHITECTURE
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index c299268..092503e 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -18,7 +18,6 @@ if VIRTUALIZATION
 
 config KVM
 	bool
-	select PREEMPT_NOTIFIERS
 	select ANON_INODES
 
 config KVM_440
diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
index bf164fc..e125d45 100644
--- a/arch/s390/kvm/Kconfig
+++ b/arch/s390/kvm/Kconfig
@@ -18,7 +18,6 @@ if VIRTUALIZATION
 config KVM
 	tristate "Kernel-based Virtual Machine (KVM) support"
 	depends on HAVE_KVM && EXPERIMENTAL
-	select PREEMPT_NOTIFIERS
 	select ANON_INODES
 	select S390_SWITCH_AMODE
 	---help---
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index b84e571..b391852 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -22,7 +22,6 @@ config KVM
 	depends on HAVE_KVM
 	# for device assignment:
 	depends on PCI
-	select PREEMPT_NOTIFIERS
 	select MMU_NOTIFIER
 	select ANON_INODES
 	select HAVE_KVM_IRQCHIP
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b7bbb5d..bc0c1d4 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -74,9 +74,7 @@ void kvm_io_bus_unregister_dev(struct kvm *kvm, struct kvm_io_bus *bus,
 
 struct kvm_vcpu {
 	struct kvm *kvm;
-#ifdef CONFIG_PREEMPT_NOTIFIERS
-	struct preempt_notifier preempt_notifier;
-#endif
+	struct sched_notifier sched_notifier;
 	int vcpu_id;
 	struct mutex mutex;
 	int   cpu;
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index 72b1a10..538c675 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -93,47 +93,4 @@ do { \
 
 #endif
 
-#ifdef CONFIG_PREEMPT_NOTIFIERS
-
-struct preempt_notifier;
-
-/**
- * preempt_ops - notifiers called when a task is preempted and rescheduled
- * @sched_in: we're about to be rescheduled:
- *    notifier: struct preempt_notifier for the task being scheduled
- *    cpu:  cpu we're scheduled on
- * @sched_out: we've just been preempted
- *    notifier: struct preempt_notifier for the task being preempted
- *    next: the task that's kicking us out
- */
-struct preempt_ops {
-	void (*sched_in)(struct preempt_notifier *notifier, int cpu);
-	void (*sched_out)(struct preempt_notifier *notifier,
-			  struct task_struct *next);
-};
-
-/**
- * preempt_notifier - key for installing preemption notifiers
- * @link: internal use
- * @ops: defines the notifier functions to be called
- *
- * Usually used in conjunction with container_of().
- */
-struct preempt_notifier {
-	struct hlist_node link;
-	struct preempt_ops *ops;
-};
-
-void preempt_notifier_register(struct preempt_notifier *notifier);
-void preempt_notifier_unregister(struct preempt_notifier *notifier);
-
-static inline void preempt_notifier_init(struct preempt_notifier *notifier,
-				     struct preempt_ops *ops)
-{
-	INIT_HLIST_NODE(&notifier->link);
-	notifier->ops = ops;
-}
-
-#endif
-
 #endif /* __LINUX_PREEMPT_H */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 78ba664..68fffe8 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1210,6 +1210,44 @@ struct sched_rt_entity {
 #endif
 };
 
+struct sched_notifier;
+
+/**
+ * sched_notifier_ops - notifiers called for scheduling events
+ * @in: we're about to be rescheduled:
+ *    notifier: struct sched_notifier for the task being scheduled
+ *    cpu:  cpu we're scheduled on
+ * @out: we've just been preempted
+ *    notifier: struct sched_notifier for the task being preempted
+ *    next: the task that's kicking us out
+ */
+struct sched_notifier_ops {
+	void (*in)(struct sched_notifier *notifier, int cpu);
+	void (*out)(struct sched_notifier *notifier, struct task_struct *next);
+};
+
+/**
+ * sched_notifier - key for installing scheduler notifiers
+ * @link: internal use
+ * @ops: defines the notifier functions to be called
+ *
+ * Usually used in conjunction with container_of().
+ */
+struct sched_notifier {
+	struct hlist_node link;
+	struct sched_notifier_ops *ops;
+};
+
+void sched_notifier_register(struct sched_notifier *notifier);
+void sched_notifier_unregister(struct sched_notifier *notifier);
+
+static inline void sched_notifier_init(struct sched_notifier *notifier,
+				       struct sched_notifier_ops *ops)
+{
+	INIT_HLIST_NODE(&notifier->link);
+	notifier->ops = ops;
+}
+
 struct rcu_node;
 
 struct task_struct {
@@ -1233,10 +1271,8 @@ struct task_struct {
 	struct sched_entity se;
 	struct sched_rt_entity rt;
 
-#ifdef CONFIG_PREEMPT_NOTIFIERS
-	/* list of struct preempt_notifier: */
-	struct hlist_head preempt_notifiers;
-#endif
+	/* list of struct sched_notifier: */
+	struct hlist_head sched_notifiers;
 
 	/*
 	 * fpu_counter contains the number of consecutive context switches
diff --git a/init/Kconfig b/init/Kconfig
index 9e03ef8..0220aa7 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1206,7 +1206,3 @@ config STOP_MACHINE
 	  Need stop_machine() primitive.
 
 source "block/Kconfig"
-
-config PREEMPT_NOTIFIERS
-	bool
-
diff --git a/kernel/sched.c b/kernel/sched.c
index 315ba40..b5278c2 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2538,10 +2538,7 @@ static void __sched_fork(struct task_struct *p)
 	INIT_LIST_HEAD(&p->rt.run_list);
 	p->se.on_rq = 0;
 	INIT_LIST_HEAD(&p->se.group_node);
-
-#ifdef CONFIG_PREEMPT_NOTIFIERS
-	INIT_HLIST_HEAD(&p->preempt_notifiers);
-#endif
+	INIT_HLIST_HEAD(&p->sched_notifiers);
 
 	/*
 	 * We mark the process as running here, but have not actually
@@ -2651,64 +2648,47 @@ void wake_up_new_task(struct task_struct *p, unsigned long clone_flags)
 	task_rq_unlock(rq, &flags);
 }
 
-#ifdef CONFIG_PREEMPT_NOTIFIERS
-
 /**
- * preempt_notifier_register - tell me when current is being preempted & rescheduled
+ * sched_notifier_register - register scheduler notifier
  * @notifier: notifier struct to register
  */
-void preempt_notifier_register(struct preempt_notifier *notifier)
+void sched_notifier_register(struct sched_notifier *notifier)
 {
-	hlist_add_head(&notifier->link, &current->preempt_notifiers);
+	hlist_add_head(&notifier->link, &current->sched_notifiers);
 }
-EXPORT_SYMBOL_GPL(preempt_notifier_register);
+EXPORT_SYMBOL_GPL(sched_notifier_register);
 
 /**
- * preempt_notifier_unregister - no longer interested in preemption notifications
+ * sched_notifier_unregister - unregister scheduler notifier
  * @notifier: notifier struct to unregister
  *
- * This is safe to call from within a preemption notifier.
+ * This is safe to call from within a scheduler notifier.
  */
-void preempt_notifier_unregister(struct preempt_notifier *notifier)
+void sched_notifier_unregister(struct sched_notifier *notifier)
 {
 	hlist_del(&notifier->link);
 }
-EXPORT_SYMBOL_GPL(preempt_notifier_unregister);
+EXPORT_SYMBOL_GPL(sched_notifier_unregister);
 
-static void fire_sched_in_preempt_notifiers(struct task_struct *curr)
+static void fire_sched_in_notifiers(struct task_struct *curr)
 {
-	struct preempt_notifier *notifier;
+	struct sched_notifier *notifier;
 	struct hlist_node *node;
 
-	hlist_for_each_entry(notifier, node, &curr->preempt_notifiers, link)
-		notifier->ops->sched_in(notifier, raw_smp_processor_id());
+	hlist_for_each_entry(notifier, node, &curr->sched_notifiers, link)
+		notifier->ops->in(notifier, raw_smp_processor_id());
 }
 
-static void
-fire_sched_out_preempt_notifiers(struct task_struct *curr,
-				 struct task_struct *next)
+static void fire_sched_out_notifiers(struct task_struct *curr,
+				     struct task_struct *next)
 {
-	struct preempt_notifier *notifier;
+	struct sched_notifier *notifier;
 	struct hlist_node *node;
 
-	hlist_for_each_entry(notifier, node, &curr->preempt_notifiers, link)
-		notifier->ops->sched_out(notifier, next);
-}
-
-#else /* !CONFIG_PREEMPT_NOTIFIERS */
-
-static void fire_sched_in_preempt_notifiers(struct task_struct *curr)
-{
+	hlist_for_each_entry(notifier, node, &curr->sched_notifiers, link)
+		notifier->ops->out(notifier, next);
 }
 
-static void
-fire_sched_out_preempt_notifiers(struct task_struct *curr,
-				 struct task_struct *next)
-{
-}
-
-#endif /* CONFIG_PREEMPT_NOTIFIERS */
-
 /**
  * prepare_task_switch - prepare to switch tasks
  * @rq: the runqueue preparing to switch
@@ -2726,7 +2706,7 @@ static inline void
 prepare_task_switch(struct rq *rq, struct task_struct *prev,
 		    struct task_struct *next)
 {
-	fire_sched_out_preempt_notifiers(prev, next);
+	fire_sched_out_notifiers(prev, next);
 	prepare_lock_switch(rq, next);
 	prepare_arch_switch(next);
 }
@@ -2768,7 +2748,7 @@ static void finish_task_switch(struct rq *rq, struct task_struct *prev)
 	prev_state = prev->state;
 	finish_arch_switch(prev);
 	perf_event_task_sched_in(current, cpu_of(rq));
-	fire_sched_in_preempt_notifiers(current);
+	fire_sched_in_notifiers(current);
 	finish_lock_switch(rq, prev);
 
 	if (mm)
@@ -9584,9 +9564,7 @@ void __init sched_init(void)
 
 	set_load_weight(&init_task);
 
-#ifdef CONFIG_PREEMPT_NOTIFIERS
-	INIT_HLIST_HEAD(&init_task.preempt_notifiers);
-#endif
+	INIT_HLIST_HEAD(&init_task.sched_notifiers);
 
 #ifdef CONFIG_SMP
 	open_softirq(SCHED_SOFTIRQ, run_rebalance_domains);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7495ce3..4e8e33f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -79,7 +79,7 @@ static cpumask_var_t cpus_hardware_enabled;
 struct kmem_cache *kvm_vcpu_cache;
 EXPORT_SYMBOL_GPL(kvm_vcpu_cache);
 
-static __read_mostly struct preempt_ops kvm_preempt_ops;
+static __read_mostly struct sched_notifier_ops kvm_sched_notifier_ops;
 
 struct dentry *kvm_debugfs_dir;
 
@@ -713,7 +713,7 @@ void vcpu_load(struct kvm_vcpu *vcpu)
 
 	mutex_lock(&vcpu->mutex);
 	cpu = get_cpu();
-	preempt_notifier_register(&vcpu->preempt_notifier);
+	sched_notifier_register(&vcpu->sched_notifier);
 	kvm_arch_vcpu_load(vcpu, cpu);
 	put_cpu();
 }
@@ -722,7 +722,7 @@ void vcpu_put(struct kvm_vcpu *vcpu)
 {
 	preempt_disable();
 	kvm_arch_vcpu_put(vcpu);
-	preempt_notifier_unregister(&vcpu->preempt_notifier);
+	sched_notifier_unregister(&vcpu->sched_notifier);
 	preempt_enable();
 	mutex_unlock(&vcpu->mutex);
 }
@@ -1772,7 +1772,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
 	if (IS_ERR(vcpu))
 		return PTR_ERR(vcpu);
 
-	preempt_notifier_init(&vcpu->preempt_notifier, &kvm_preempt_ops);
+	sched_notifier_init(&vcpu->sched_notifier, &kvm_sched_notifier_ops);
 
 	r = kvm_arch_vcpu_setup(vcpu);
 	if (r)
@@ -2690,23 +2690,21 @@ static struct sys_device kvm_sysdev = {
 struct page *bad_page;
 pfn_t bad_pfn;
 
-static inline
-struct kvm_vcpu *preempt_notifier_to_vcpu(struct preempt_notifier *pn)
+static inline struct kvm_vcpu *sched_notifier_to_vcpu(struct sched_notifier *sn)
 {
-	return container_of(pn, struct kvm_vcpu, preempt_notifier);
+	return container_of(sn, struct kvm_vcpu, sched_notifier);
 }
 
-static void kvm_sched_in(struct preempt_notifier *pn, int cpu)
+static void kvm_sched_in(struct sched_notifier *sn, int cpu)
 {
-	struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);
+	struct kvm_vcpu *vcpu = sched_notifier_to_vcpu(sn);
 
 	kvm_arch_vcpu_load(vcpu, cpu);
 }
 
-static void kvm_sched_out(struct preempt_notifier *pn,
-			  struct task_struct *next)
+static void kvm_sched_out(struct sched_notifier *sn, struct task_struct *next)
 {
-	struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);
+	struct kvm_vcpu *vcpu = sched_notifier_to_vcpu(sn);
 
 	kvm_arch_vcpu_put(vcpu);
 }
@@ -2780,8 +2778,8 @@ int kvm_init(void *opaque, unsigned int vcpu_size,
 		goto out_free;
 	}
 
-	kvm_preempt_ops.sched_in = kvm_sched_in;
-	kvm_preempt_ops.sched_out = kvm_sched_out;
+	kvm_sched_notifier_ops.in = kvm_sched_in;
+	kvm_sched_notifier_ops.out = kvm_sched_out;
 
 	kvm_init_debug();
 
-- 
1.6.5.3



* [PATCH 2/4 tip/sched/core] sched: update sched_notifier and add wakeup/sleep notifications
  2009-11-26  9:51         ` Ingo Molnar
  2009-11-26 10:11           ` [PATCH 1/4 tip/sched/core] sched: rename preempt_notifier to sched_notifier and always enable it Tejun Heo
@ 2009-11-26 10:13           ` Tejun Heo
  2009-11-26 10:13           ` [PATCH 3/4 tip/sched/core] sched: refactor try_to_wake_up() and implement try_to_wake_up_local() Tejun Heo
  2009-11-26 10:14           ` [PATCH 4/4 tip/sched/core] sched: implement force_cpus_allowed() Tejun Heo
From: Tejun Heo @ 2009-11-26 10:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Stephen Rothwell, linux-next, linux-kernel, Mike Galbraith,
	Thomas Gleixner, H. Peter Anvin, Peter Zijlstra

Update sched_notifier such that

* in and out ops are symmetric in the parameter they take.

* Use a single fire_sched_notifier() macro instead of a separate
  function for each op.

* Allow NULL ops.

* Add wakeup and sleep notifications (see the sketch below).
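
As a hypothetical sketch (not part of this patch; the my_* names are
assumed), a notifier that only cares about wakeup/sleep transitions
can now leave the other ops NULL:

static void my_wakeup(struct sched_notifier *sn)
{
	/* the task owning @sn has just been woken up */
}

static void my_sleep(struct sched_notifier *sn)
{
	/* the task owning @sn is about to be deactivated */
}

static struct sched_notifier_ops my_wait_ops = {
	.wakeup	= my_wakeup,
	.sleep	= my_sleep,
	/* .in and .out left NULL - fire_sched_notifier() skips them */
};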

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
---
 include/linux/sched.h |   20 +++++++++++++-------
 kernel/sched.c        |   41 ++++++++++++++++++-----------------------
 virt/kvm/kvm_main.c   |    4 ++--
 3 files changed, 33 insertions(+), 32 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 68fffe8..e03a754 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1214,15 +1214,21 @@ struct sched_notifier;
 
 /**
  * sched_notifier_ops - notifiers called for scheduling events
- * @in: we're about to be rescheduled:
- *    notifier: struct sched_notifier for the task being scheduled
- *    cpu:  cpu we're scheduled on
- * @out: we've just been preempted
- *    notifier: struct sched_notifier for the task being preempted
- *    next: the task that's kicking us out
+ * @wakeup: we're waking up
+ *    notifier: struct sched_notifier for the task being woken up
+ * @sleep: we're going to bed
+ *    notifier: struct sched_notifier for the task sleeping
+ * @in: we're now running on the cpu
+ *    notifier: struct sched_notifier for the task being scheduled in
+ *    prev: the task which ran before us
+ * @out: we're leaving the cpu
+ *    notifier: struct sched_notifier for the task being scheduled out
+ *    next: the task which will run after us
  */
 struct sched_notifier_ops {
-	void (*in)(struct sched_notifier *notifier, int cpu);
+	void (*wakeup)(struct sched_notifier *notifier);
+	void (*sleep)(struct sched_notifier *notifier);
+	void (*in)(struct sched_notifier *notifier, struct task_struct *prev);
 	void (*out)(struct sched_notifier *notifier, struct task_struct *next);
 };
 
diff --git a/kernel/sched.c b/kernel/sched.c
index b5278c2..475da1a 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1389,6 +1389,16 @@ static const u32 prio_to_wmult[40] = {
  /*  15 */ 119304647, 148102320, 186737708, 238609294, 286331153,
 };
 
+#define fire_sched_notifier(p, callback, args...) do {			\
+	struct task_struct *__p = (p);					\
+	struct sched_notifier *__sn;					\
+	struct hlist_node *__pos;					\
+									\
+	hlist_for_each_entry(__sn, __pos, &__p->sched_notifiers, link)	\
+		if (__sn->ops->callback)				\
+			__sn->ops->callback(__sn , ##args);		\
+} while (0)
+
 static void activate_task(struct rq *rq, struct task_struct *p, int wakeup);
 
 /*
@@ -2454,6 +2464,8 @@ out_running:
 		rq->idle_stamp = 0;
 	}
 #endif
+	if (success)
+		fire_sched_notifier(p, wakeup);
 out:
 	task_rq_unlock(rq, &flags);
 	put_cpu();
@@ -2670,25 +2682,6 @@ void sched_notifier_unregister(struct sched_notifier *notifier)
 }
 EXPORT_SYMBOL_GPL(sched_notifier_unregister);
 
-static void fire_sched_in_notifiers(struct task_struct *curr)
-{
-	struct sched_notifier *notifier;
-	struct hlist_node *node;
-
-	hlist_for_each_entry(notifier, node, &curr->sched_notifiers, link)
-		notifier->ops->in(notifier, raw_smp_processor_id());
-}
-
-static void fire_sched_out_notifiers(struct task_struct *curr,
-				     struct task_struct *next)
-{
-	struct sched_notifier *notifier;
-	struct hlist_node *node;
-
-	hlist_for_each_entry(notifier, node, &curr->sched_notifiers, link)
-		notifier->ops->out(notifier, next);
-}
-
 /**
  * prepare_task_switch - prepare to switch tasks
  * @rq: the runqueue preparing to switch
@@ -2706,7 +2699,7 @@ static inline void
 prepare_task_switch(struct rq *rq, struct task_struct *prev,
 		    struct task_struct *next)
 {
-	fire_sched_out_notifiers(prev, next);
+	fire_sched_notifier(current, out, next);
 	prepare_lock_switch(rq, next);
 	prepare_arch_switch(next);
 }
@@ -2748,7 +2741,7 @@ static void finish_task_switch(struct rq *rq, struct task_struct *prev)
 	prev_state = prev->state;
 	finish_arch_switch(prev);
 	perf_event_task_sched_in(current, cpu_of(rq));
-	fire_sched_in_notifiers(current);
+	fire_sched_notifier(current, in, prev);
 	finish_lock_switch(rq, prev);
 
 	if (mm)
@@ -5441,10 +5434,12 @@ need_resched_nonpreemptible:
 	clear_tsk_need_resched(prev);
 
 	if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
-		if (unlikely(signal_pending_state(prev->state, prev)))
+		if (unlikely(signal_pending_state(prev->state, prev))) {
 			prev->state = TASK_RUNNING;
-		else
+		} else {
+			fire_sched_notifier(prev, sleep);
 			deactivate_task(rq, prev, 1);
+		}
 		switch_count = &prev->nvcsw;
 	}
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4e8e33f..006358d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2695,11 +2695,11 @@ static inline struct kvm_vcpu *sched_notifier_to_vcpu(struct sched_notifier *sn)
 	return container_of(sn, struct kvm_vcpu, sched_notifier);
 }
 
-static void kvm_sched_in(struct sched_notifier *sn, int cpu)
+static void kvm_sched_in(struct sched_notifier *sn, struct task_struct *prev)
 {
 	struct kvm_vcpu *vcpu = sched_notifier_to_vcpu(sn);
 
-	kvm_arch_vcpu_load(vcpu, cpu);
+	kvm_arch_vcpu_load(vcpu, smp_processor_id());
 }
 
 static void kvm_sched_out(struct sched_notifier *sn, struct task_struct *next)
-- 
1.6.5.3



* [PATCH 3/4 tip/sched/core] sched: refactor try_to_wake_up() and implement try_to_wake_up_local()
  2009-11-26  9:51         ` Ingo Molnar
  2009-11-26 10:11           ` [PATCH 1/4 tip/sched/core] sched: rename preempt_notifier to sched_notifier and always enable it Tejun Heo
  2009-11-26 10:13           ` [PATCH 2/4 tip/sched/core] sched: update sched_notifier and add wakeup/sleep notifications Tejun Heo
@ 2009-11-26 10:13           ` Tejun Heo
  2009-11-26 10:14           ` [PATCH 4/4 tip/sched/core] sched: implement force_cpus_allowed() Tejun Heo
From: Tejun Heo @ 2009-11-26 10:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Stephen Rothwell, linux-next, linux-kernel, Mike Galbraith,
	Thomas Gleixner, H. Peter Anvin, Peter Zijlstra

Factor ttwu_activate() and ttwu_woken_up() out of try_to_wake_up() and
use them to implement try_to_wake_up_local().  try_to_wake_up_local()
is similar to try_to_wake_up() but it assumes the caller has this_rq()
locked and the target task is bound to this_rq().
try_to_wake_up_local() can be called from wakeup and sleep scheduler
notifiers.
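
For example - a hypothetical sketch, where my_standby_task() is an
assumed helper returning a task known to be bound to this rq - a
sleep notifier could hand off to a standby task like this:

static void my_sleep_notifier(struct sched_notifier *sn)
{
	struct task_struct *standby = my_standby_task(sn);

	/*
	 * The sleep notifier fires from schedule() with
	 * this_rq()->lock held, which is exactly the context
	 * try_to_wake_up_local() requires.
	 */
	if (standby)
		try_to_wake_up_local(standby, TASK_INTERRUPTIBLE, 0);
}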

The factoring out doesn't affect try_to_wake_up() much
code-generation-wise.  Depending on configuration options, it ends up
generating the same object code as before or a slightly different one
due to different register assignment.

The refactoring and local wake up function implementation using
refactored functions are based on Peter Zijlstra's suggestion.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
---
 include/linux/sched.h |    2 +
 kernel/sched.c        |  166 +++++++++++++++++++++++++++++++++++--------------
 2 files changed, 120 insertions(+), 48 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index e03a754..c889a58 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2057,6 +2057,8 @@ extern void release_uids(struct user_namespace *ns);
 
 extern void do_timer(unsigned long ticks);
 
+extern bool try_to_wake_up_local(struct task_struct *p, unsigned int state,
+				 int wake_flags);
 extern int wake_up_state(struct task_struct *tsk, unsigned int state);
 extern int wake_up_process(struct task_struct *tsk);
 extern void wake_up_new_task(struct task_struct *tsk,
diff --git a/kernel/sched.c b/kernel/sched.c
index 475da1a..bad92c0 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2333,11 +2333,73 @@ void task_oncpu_function_call(struct task_struct *p,
 	preempt_enable();
 }
 
-/***
+static inline void ttwu_activate(struct task_struct *p, struct rq *rq,
+				 bool is_sync, bool is_migrate, bool is_local)
+{
+	schedstat_inc(p, se.nr_wakeups);
+	if (is_sync)
+		schedstat_inc(p, se.nr_wakeups_sync);
+	if (is_migrate)
+		schedstat_inc(p, se.nr_wakeups_migrate);
+	if (is_local)
+		schedstat_inc(p, se.nr_wakeups_local);
+	else
+		schedstat_inc(p, se.nr_wakeups_remote);
+
+	activate_task(rq, p, 1);
+
+	/*
+	 * Only attribute actual wakeups done by this task.
+	 */
+	if (!in_interrupt()) {
+		struct sched_entity *se = &current->se;
+		u64 sample = se->sum_exec_runtime;
+
+		if (se->last_wakeup)
+			sample -= se->last_wakeup;
+		else
+			sample -= se->start_runtime;
+		update_avg(&se->avg_wakeup, sample);
+
+		se->last_wakeup = se->sum_exec_runtime;
+	}
+}
+
+static inline void ttwu_woken_up(struct task_struct *p, struct rq *rq,
+				 int wake_flags, bool success)
+{
+	trace_sched_wakeup(rq, p, success);
+	check_preempt_curr(rq, p, wake_flags);
+
+	p->state = TASK_RUNNING;
+#ifdef CONFIG_SMP
+	if (p->sched_class->task_wake_up)
+		p->sched_class->task_wake_up(rq, p);
+
+	if (unlikely(rq->idle_stamp)) {
+		u64 delta = rq->clock - rq->idle_stamp;
+		u64 max = 2*sysctl_sched_migration_cost;
+
+		if (delta > max)
+			rq->avg_idle = max;
+		else
+			update_avg(&rq->avg_idle, delta);
+		rq->idle_stamp = 0;
+	}
+#endif
+	/*
+	 * Wake up is complete, fire wake up notifier.  This allows
+	 * try_to_wake_up_local() to be called from wake up notifiers.
+	 */
+	if (success)
+		fire_sched_notifier(p, wakeup);
+}
+
+/**
  * try_to_wake_up - wake up a thread
  * @p: the to-be-woken-up thread
  * @state: the mask of task states that can be woken
- * @sync: do a synchronous wakeup?
+ * @wake_flags: wake modifier flags (WF_*)
  *
  * Put it on the run-queue if it's not already there. The "current"
  * thread is always on the run-queue (except when the actual
@@ -2345,7 +2407,8 @@ void task_oncpu_function_call(struct task_struct *p,
  * the simpler "current->state = TASK_RUNNING" to mark yourself
  * runnable without the overhead of this.
  *
- * returns failure only if the task is already active.
+ * Returns %true if @p was woken up, %false if it was already running
+ * or @state didn't match @p's state.
  */
 static int try_to_wake_up(struct task_struct *p, unsigned int state,
 			  int wake_flags)
@@ -2416,59 +2479,61 @@ static int try_to_wake_up(struct task_struct *p, unsigned int state,
 
 out_activate:
 #endif /* CONFIG_SMP */
-	schedstat_inc(p, se.nr_wakeups);
-	if (wake_flags & WF_SYNC)
-		schedstat_inc(p, se.nr_wakeups_sync);
-	if (orig_cpu != cpu)
-		schedstat_inc(p, se.nr_wakeups_migrate);
-	if (cpu == this_cpu)
-		schedstat_inc(p, se.nr_wakeups_local);
-	else
-		schedstat_inc(p, se.nr_wakeups_remote);
-	activate_task(rq, p, 1);
+	ttwu_activate(p, rq, wake_flags & WF_SYNC, orig_cpu != cpu,
+		      cpu == this_cpu);
 	success = 1;
+out_running:
+	ttwu_woken_up(p, rq, wake_flags, success);
+out:
+	task_rq_unlock(rq, &flags);
+	put_cpu();
 
-	/*
-	 * Only attribute actual wakeups done by this task.
-	 */
-	if (!in_interrupt()) {
-		struct sched_entity *se = &current->se;
-		u64 sample = se->sum_exec_runtime;
-
-		if (se->last_wakeup)
-			sample -= se->last_wakeup;
-		else
-			sample -= se->start_runtime;
-		update_avg(&se->avg_wakeup, sample);
+	return success;
+}
 
-		se->last_wakeup = se->sum_exec_runtime;
-	}
+/**
+ * try_to_wake_up_local - try to wake up a local task with rq lock held
+ * @p: the to-be-woken-up thread
+ * @state: the mask of task states that can be woken
+ * @wake_flags: wake modifier flags (WF_*)
+ *
+ * Put @p on the run-queue if it's not already there.  The caller must
+ * ensure that this_rq() is locked, @p is bound to this_rq() and @p is
+ * not the current task.  this_rq() stays locked over invocation.
+ *
+ * This function can be called from wakeup and sleep scheduler
+ * notifiers.  Be careful not to create deep recursion by chaining
+ * wakeup notifiers.
+ *
+ * Returns %true if @p was woken up, %false if it was already running
+ * or @state didn't match @p's state.
+ */
+bool try_to_wake_up_local(struct task_struct *p, unsigned int state,
+			  int wake_flags)
+{
+	struct rq *rq = task_rq(p);
+	bool success = false;
 
-out_running:
-	trace_sched_wakeup(rq, p, success);
-	check_preempt_curr(rq, p, wake_flags);
+	BUG_ON(rq != this_rq());
+	BUG_ON(p == current);
+	lockdep_assert_held(&rq->lock);
 
-	p->state = TASK_RUNNING;
-#ifdef CONFIG_SMP
-	if (p->sched_class->task_wake_up)
-		p->sched_class->task_wake_up(rq, p);
+	if (!sched_feat(SYNC_WAKEUPS))
+		wake_flags &= ~WF_SYNC;
 
-	if (unlikely(rq->idle_stamp)) {
-		u64 delta = rq->clock - rq->idle_stamp;
-		u64 max = 2*sysctl_sched_migration_cost;
+	if (!(p->state & state))
+		return false;
 
-		if (delta > max)
-			rq->avg_idle = max;
-		else
-			update_avg(&rq->avg_idle, delta);
-		rq->idle_stamp = 0;
+	if (!p->se.on_rq) {
+		if (likely(!task_running(rq, p))) {
+			schedstat_inc(rq, ttwu_count);
+			schedstat_inc(rq, ttwu_local);
+		}
+		ttwu_activate(p, rq, wake_flags & WF_SYNC, false, true);
+		success = true;
 	}
-#endif
-	if (success)
-		fire_sched_notifier(p, wakeup);
-out:
-	task_rq_unlock(rq, &flags);
-	put_cpu();
+
+	ttwu_woken_up(p, rq, wake_flags, success);
 
 	return success;
 }
@@ -5437,6 +5502,11 @@ need_resched_nonpreemptible:
 		if (unlikely(signal_pending_state(prev->state, prev))) {
 			prev->state = TASK_RUNNING;
 		} else {
+			/*
+			 * Fire sleep notifier before changing any scheduler
+			 * state.  This allows try_to_wake_up_local() to be
+			 * called from sleep notifiers.
+			 */
 			fire_sched_notifier(prev, sleep);
 			deactivate_task(rq, prev, 1);
 		}
-- 
1.6.5.3



* [PATCH 4/4 tip/sched/core] sched: implement force_cpus_allowed()
  2009-11-26  9:51         ` Ingo Molnar
                             ` (2 preceding siblings ...)
  2009-11-26 10:13           ` [PATCH 3/4 tip/sched/core] sched: refactor try_to_wake_up() and implement try_to_wake_up_local() Tejun Heo
@ 2009-11-26 10:14           ` Tejun Heo
From: Tejun Heo @ 2009-11-26 10:14 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Stephen Rothwell, linux-next, linux-kernel, Mike Galbraith,
	Thomas Gleixner, H. Peter Anvin, Peter Zijlstra

set_cpus_allowed_ptr() modifies the allowed cpu mask of a task.  The
function performs the following checks before applying the new mask.

* Check whether PF_THREAD_BOUND is set.  This is set for bound
  kthreads so that they can't be moved around.

* Check whether the target cpu is still marked active - cpu_active().
  Active state is cleared early while downing a cpu.

This patch adds force_cpus_allowed() which bypasses the above two
checks.  The caller is responsible for guaranteeing that the
destination cpu doesn't go down until force_cpus_allowed() finishes.

The first check is bypassed by factoring out actual migration part
into __set_cpus_allowed() from set_cpus_allowed_ptr() and calling the
inner function from force_cpus_allowed().

The second check is buried deep down in __migrate_task() which is
executed by migration threads.  @force parameter is added to
__migrate_task().  As the only way to pass parameters from
__set_cpus_allowed() is through migration_req, migration_req->force is
added and the @force parameter is passed down to __migrate_task().

Please note the naming discrepancy between set_cpus_allowed_ptr() and
the new functions.  The _ptr suffix is from the days when the cpumask API
wasn't mature and future changes should drop it from
set_cpus_allowed_ptr() too.

force_cpus_allowed() will be used for concurrency-managed workqueue.
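
A hypothetical sketch of the intended use (my_rebind_worker() is an
assumed name, not part of this patch): rebinding a PF_THREAD_BOUND
worker to a CPU that is online but not yet active during bring-up.

static int my_rebind_worker(struct task_struct *worker, int cpu)
{
	/*
	 * set_cpus_allowed_ptr() would fail here, both because of
	 * PF_THREAD_BOUND and because @cpu may not be cpu_active()
	 * yet.  The caller guarantees @cpu stays up throughout.
	 */
	return force_cpus_allowed(worker, cpumask_of(cpu));
}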

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
---
 include/linux/sched.h |    7 ++++
 kernel/sched.c        |   89 +++++++++++++++++++++++++++++++++----------------
 2 files changed, 67 insertions(+), 29 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index c889a58..82544e8 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1851,6 +1851,8 @@ static inline void rcu_copy_process(struct task_struct *p)
 #ifdef CONFIG_SMP
 extern int set_cpus_allowed_ptr(struct task_struct *p,
 				const struct cpumask *new_mask);
+extern int force_cpus_allowed(struct task_struct *p,
+				  const struct cpumask *new_mask);
 #else
 static inline int set_cpus_allowed_ptr(struct task_struct *p,
 				       const struct cpumask *new_mask)
@@ -1859,6 +1861,11 @@ static inline int set_cpus_allowed_ptr(struct task_struct *p,
 		return -EINVAL;
 	return 0;
 }
+static inline int force_cpus_allowed(struct task_struct *p,
+				     const struct cpumask *new_mask)
+{
+	return set_cpus_allowed_ptr(p, new_mask);
+}
 #endif
 
 #ifndef CONFIG_CPUMASK_OFFSTACK
diff --git a/kernel/sched.c b/kernel/sched.c
index bad92c0..eaa660f 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2107,6 +2107,7 @@ struct migration_req {
 
 	struct task_struct *task;
 	int dest_cpu;
+	bool force;
 
 	struct completion done;
 };
@@ -2115,8 +2116,8 @@ struct migration_req {
  * The task's runqueue lock must be held.
  * Returns true if you have to wait for migration thread.
  */
-static int
-migrate_task(struct task_struct *p, int dest_cpu, struct migration_req *req)
+static int migrate_task(struct task_struct *p, int dest_cpu,
+			struct migration_req *req, bool force)
 {
 	struct rq *rq = task_rq(p);
 
@@ -2133,6 +2134,7 @@ migrate_task(struct task_struct *p, int dest_cpu, struct migration_req *req)
 	init_completion(&req->done);
 	req->task = p;
 	req->dest_cpu = dest_cpu;
+	req->force = force;
 	list_add(&req->list, &rq->migration_queue);
 
 	return 1;
@@ -3171,7 +3173,7 @@ static void sched_migrate_task(struct task_struct *p, int dest_cpu)
 		goto out;
 
 	/* force the process onto the specified CPU */
-	if (migrate_task(p, dest_cpu, &req)) {
+	if (migrate_task(p, dest_cpu, &req, false)) {
 		/* Need to wait for migration thread (might exit: take ref). */
 		struct task_struct *mt = rq->migration_thread;
 
@@ -7099,34 +7101,19 @@ static inline void sched_init_granularity(void)
  * 7) we wake up and the migration is done.
  */
 
-/*
- * Change a given task's CPU affinity. Migrate the thread to a
- * proper CPU and schedule it away if the CPU it's executing on
- * is removed from the allowed bitmask.
- *
- * NOTE: the caller must have a valid reference to the task, the
- * task must not exit() & deallocate itself prematurely. The
- * call is not atomic; no spinlocks may be held.
- */
-int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
+static inline int __set_cpus_allowed(struct task_struct *p,
+				     const struct cpumask *new_mask,
+				     struct rq *rq, unsigned long *flags,
+				     bool force)
 {
 	struct migration_req req;
-	unsigned long flags;
-	struct rq *rq;
 	int ret = 0;
 
-	rq = task_rq_lock(p, &flags);
 	if (!cpumask_intersects(new_mask, cpu_online_mask)) {
 		ret = -EINVAL;
 		goto out;
 	}
 
-	if (unlikely((p->flags & PF_THREAD_BOUND) && p != current &&
-		     !cpumask_equal(&p->cpus_allowed, new_mask))) {
-		ret = -EINVAL;
-		goto out;
-	}
-
 	if (p->sched_class->set_cpus_allowed)
 		p->sched_class->set_cpus_allowed(p, new_mask);
 	else {
@@ -7138,12 +7125,13 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
 	if (cpumask_test_cpu(task_cpu(p), new_mask))
 		goto out;
 
-	if (migrate_task(p, cpumask_any_and(cpu_online_mask, new_mask), &req)) {
+	if (migrate_task(p, cpumask_any_and(cpu_online_mask, new_mask), &req,
+			 force)) {
 		/* Need help from migration thread: drop lock and wait. */
 		struct task_struct *mt = rq->migration_thread;
 
 		get_task_struct(mt);
-		task_rq_unlock(rq, &flags);
+		task_rq_unlock(rq, flags);
 		wake_up_process(rq->migration_thread);
 		put_task_struct(mt);
 		wait_for_completion(&req.done);
@@ -7151,13 +7139,54 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
 		return 0;
 	}
 out:
-	task_rq_unlock(rq, &flags);
+	task_rq_unlock(rq, flags);
 
 	return ret;
 }
+
+/*
+ * Change a given task's CPU affinity. Migrate the thread to a
+ * proper CPU and schedule it away if the CPU it's executing on
+ * is removed from the allowed bitmask.
+ *
+ * NOTE: the caller must have a valid reference to the task, the
+ * task must not exit() & deallocate itself prematurely. The
+ * call is not atomic; no spinlocks may be held.
+ */
+int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
+{
+	unsigned long flags;
+	struct rq *rq;
+
+	rq = task_rq_lock(p, &flags);
+
+	if (unlikely((p->flags & PF_THREAD_BOUND) && p != current &&
+		     !cpumask_equal(&p->cpus_allowed, new_mask))) {
+		task_rq_unlock(rq, &flags);
+		return -EINVAL;
+	}
+
+	return __set_cpus_allowed(p, new_mask, rq, &flags, false);
+}
 EXPORT_SYMBOL_GPL(set_cpus_allowed_ptr);
 
 /*
+ * Similar to set_cpus_allowed_ptr() but bypasses PF_THREAD_BOUND
+ * check and ignores cpu_active() status as long as the cpu is online.
+ * The caller is responsible for guaranteeing that the destination
+ * cpus don't go down until this function finishes and in general
+ * ensuring things don't go bonkers.
+ */
+int force_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
+{
+	unsigned long flags;
+	struct rq *rq;
+
+	rq = task_rq_lock(p, &flags);
+	return __set_cpus_allowed(p, new_mask, rq, &flags, true);
+}
+
+/*
  * Move (not current) task off this cpu, onto dest cpu. We're doing
  * this because either it can't run here any more (set_cpus_allowed()
  * away from this CPU, or CPU going down), or because we're
@@ -7168,12 +7197,13 @@ EXPORT_SYMBOL_GPL(set_cpus_allowed_ptr);
  *
  * Returns non-zero if task was successfully migrated.
  */
-static int __migrate_task(struct task_struct *p, int src_cpu, int dest_cpu)
+static int __migrate_task(struct task_struct *p, int src_cpu, int dest_cpu,
+			  bool force)
 {
 	struct rq *rq_dest, *rq_src;
 	int ret = 0, on_rq;
 
-	if (unlikely(!cpu_active(dest_cpu)))
+	if (!force && unlikely(!cpu_active(dest_cpu)))
 		return ret;
 
 	rq_src = cpu_rq(src_cpu);
@@ -7252,7 +7282,8 @@ static int migration_thread(void *data)
 
 		if (req->task != NULL) {
 			spin_unlock(&rq->lock);
-			__migrate_task(req->task, cpu, req->dest_cpu);
+			__migrate_task(req->task, cpu, req->dest_cpu,
+				       req->force);
 		} else if (likely(cpu == (badcpu = smp_processor_id()))) {
 			req->dest_cpu = RCU_MIGRATION_GOT_QS;
 			spin_unlock(&rq->lock);
@@ -7277,7 +7308,7 @@ static int __migrate_task_irq(struct task_struct *p, int src_cpu, int dest_cpu)
 	int ret;
 
 	local_irq_disable();
-	ret = __migrate_task(p, src_cpu, dest_cpu);
+	ret = __migrate_task(p, src_cpu, dest_cpu, false);
 	local_irq_enable();
 	return ret;
 }
-- 
1.6.5.3



* Re: [PATCH 1/4 tip/sched/core] sched: rename preempt_notifier to sched_notifier and always enable it
  2009-11-26 10:11           ` [PATCH 1/4 tip/sched/core] sched: rename preempt_notifier to sched_notifier and always enable it Tejun Heo
@ 2009-11-26 10:29             ` Ingo Molnar
  2009-11-26 10:32               ` Peter Zijlstra
                                 ` (2 more replies)
From: Ingo Molnar @ 2009-11-26 10:29 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Stephen Rothwell, linux-next, linux-kernel, Mike Galbraith,
	Thomas Gleixner, H. Peter Anvin, Peter Zijlstra


* Tejun Heo <tj@kernel.org> wrote:

> Rename preempt_notifier to sched_notifier, move it from preempt.h to 
> sched.h, drop sched_ prefixes from ops names and make sched_notifier 
> always enabled.
> 
> This is to prepare for adding more notification hooks.  This patch 
> doesn't make any functional changes.

The sched notifiers and the various event notifiers we have in the same 
codepaths should really be unified into a single callback framework.

We have these _5_ callbacks:

...
        perf_event_task_sched_out(prev, next, cpu);
...
        fire_sched_out_notifiers(prev, next);
...
        trace_sched_switch(rq, prev, next);
...
        perf_event_task_sched_in(current, cpu_of(rq));
	fire_sched_in_notifiers(current);
...

That could be done with just two callbacks - one for sched-out, one for 
sched-in.

The best way to do that would be to use two TRACE_EVENT() callbacks, 
make them unconditional and register to them. (with wrappers to make it 
all convenient to use)

This requires some work but needs to be done.
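
Roughly this direction - a hypothetical sketch, assuming the current
trace_sched_switch(rq, prev, next) prototype quoted above and the
register_trace_*() hooks that TRACE_EVENT() generates today:

#include <trace/events/sched.h>

static void my_switch_probe(struct rq *rq, struct task_struct *prev,
			    struct task_struct *next)
{
	/* one callback sees both the sched-out (@prev) and the
	 * sched-in (@next) half of the switch */
}

static int my_hook_init(void)
{
	return register_trace_sched_switch(my_switch_probe);
}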

	Ingo


* Re: [PATCH 1/4 tip/sched/core] sched: rename preempt_notifier to sched_notifier and always enable it
  2009-11-26 10:29             ` Ingo Molnar
@ 2009-11-26 10:32               ` Peter Zijlstra
  2009-11-26 11:23                 ` Peter Zijlstra
  2009-11-26 10:44               ` Tejun Heo
  2009-11-27  3:33               ` Paul Mackerras
From: Peter Zijlstra @ 2009-11-26 10:32 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Tejun Heo, Stephen Rothwell, linux-next, linux-kernel,
	Mike Galbraith, Thomas Gleixner, H. Peter Anvin

On Thu, 2009-11-26 at 11:29 +0100, Ingo Molnar wrote:
> * Tejun Heo <tj@kernel.org> wrote:
> 
> > Rename preempt_notifier to sched_notifier, move it from preempt.h to 
> > sched.h, drop sched_ prefixes from ops names and make sched_notifier 
> > always enabled.
> > 
> > This is to prepare for adding more notification hooks.  This patch 
> > doesn't make any functional changes.
> 
> The sched notifiers and the various event notifiers we have in the same 
> codepaths should really be unified into a single callback framework.
> 
> We have these _5_ callbacks:
> 
> ....
>         perf_event_task_sched_out(prev, next, cpu);
> ....
>         fire_sched_out_notifiers(prev, next);
> ....
>         trace_sched_switch(rq, prev, next);
> ....
>         perf_event_task_sched_in(current, cpu_of(rq));
> 	fire_sched_in_notifiers(current);
> ....
> 
> That could be done with just two callbacks - one for sched-out, one for 
> sched-in.
> 
> The best way to do that would be to use two TRACE_EVENT() callbacks, 
> make them unconditional and register to them. (with wrappers to make it 
> all convenient to use)
> 
> This requires some work but needs to be done.

Ugh,.. it also makes TRACE_EVENT unconditional.

That really wants a separate option.. What we could do is take regular
notifier lists and extend them to auto-generate a tracepoint when the
trace stuff is enabled or something.




* Re: [PATCH 1/4 tip/sched/core] sched: rename preempt_notifier to sched_notifier and always enable it
  2009-11-26 10:29             ` Ingo Molnar
  2009-11-26 10:32               ` Peter Zijlstra
@ 2009-11-26 10:44               ` Tejun Heo
  2009-11-27  3:33               ` Paul Mackerras
From: Tejun Heo @ 2009-11-26 10:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Stephen Rothwell, linux-next, linux-kernel, Mike Galbraith,
	Thomas Gleixner, H. Peter Anvin, Peter Zijlstra

Hello, Ingo.

On 11/26/2009 07:29 PM, Ingo Molnar wrote:
> That could be done with just two callbacks - one for sched-out, one for 
> sched-in.
> 
> The best way to do that would be to use two TRACE_EVENT() callbacks, 
> make them unconditional and register to them. (with wrappers to make it 
> all convenient to use)
> 
> This requires some work but needs to be done.

Thought about that, but trace events and scheduler callbacks have very
different trigger enable/disable conditions.  I couldn't think of a
way to do both in a reasonably efficient manner.  Although they both are
notification mechanisms, they do have pretty different requirements
and I'm not quite sure whether unifying them is a good idea.  Of
course if you have an idea to do both efficiently, no reason not to do
it.

Thanks.

-- 
tejun


* Re: [PATCH 1/4 tip/sched/core] sched: rename preempt_notifier to sched_notifier and always enable it
  2009-11-26 10:32               ` Peter Zijlstra
@ 2009-11-26 11:23                 ` Peter Zijlstra
  2009-11-26 11:56                   ` Ingo Molnar
From: Peter Zijlstra @ 2009-11-26 11:23 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Tejun Heo, Stephen Rothwell, linux-next, linux-kernel,
	Mike Galbraith, Thomas Gleixner, H. Peter Anvin

On Thu, 2009-11-26 at 11:32 +0100, Peter Zijlstra wrote:
> On Thu, 2009-11-26 at 11:29 +0100, Ingo Molnar wrote:
> > * Tejun Heo <tj@kernel.org> wrote:
> > 
> > > Rename preempt_notifier to sched_notifier, move it from preempt.h to 
> > > sched.h, drop sched_ prefixes from ops names and make sched_notifier 
> > > always enabled.
> > > 
> > > This is to prepare for adding more notification hooks.  This patch 
> > > doesn't make any functional changes.
> > 
> > The sched notifiers and the various event notifiers we have in the same 
> > codepaths should really be unified into a single callback framework.
> > 
> > We have these _5_ callbacks:
> > 
> > ....
> >         perf_event_task_sched_out(prev, next, cpu);
> > ....
> >         fire_sched_out_notifiers(prev, next);
> > ....
> >         trace_sched_switch(rq, prev, next);
> > ....
> >         perf_event_task_sched_in(current, cpu_of(rq));
> > 	fire_sched_in_notifiers(current);
> > ....
> > 
> > That could be done with just two callbacks - one for sched-out, one for 
> > sched-in.
> > 
> > The best way to do that would be to use two TRACE_EVENT() callbacks, 
> > make them unconditional and register to them. (with wrappers to make it 
> > all convenient to use)
> > 
> > This requires some work but needs to be done.
> 
> Ugh,.. it also makes TRACE_EVENT unconditional.
> 
> That really wants a separate option.. What we could do is take regular
> notifier lists and extend them to auto-generate a tracepoint when the
> trace stuff is enabled or something.

Also, there is this thing about direct and indirect function calls.


* Re: [PATCH 1/4 tip/sched/core] sched: rename preempt_notifier to sched_notifier and always enable it
  2009-11-26 11:23                 ` Peter Zijlstra
@ 2009-11-26 11:56                   ` Ingo Molnar
  2009-11-26 12:40                     ` Peter Zijlstra
From: Ingo Molnar @ 2009-11-26 11:56 UTC (permalink / raw)
  To: Peter Zijlstra, Steven Rostedt,
	Frédéric Weisbecker
  Cc: Tejun Heo, Stephen Rothwell, linux-next, linux-kernel,
	Mike Galbraith, Thomas Gleixner, H. Peter Anvin


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, 2009-11-26 at 11:32 +0100, Peter Zijlstra wrote:
> > On Thu, 2009-11-26 at 11:29 +0100, Ingo Molnar wrote:
> > > * Tejun Heo <tj@kernel.org> wrote:
> > > 
> > > > Rename preempt_notifier to sched_notifier, move it from preempt.h to 
> > > > sched.h, drop sched_ prefixes from ops names and make sched_notifier 
> > > > always enabled.
> > > > 
> > > > This is to prepare for adding more notification hooks.  This patch 
> > > > doesn't make any functional changes.
> > > 
> > > The sched notifiers and the various event notifiers we have in the same 
> > > codepaths should really be unified into a single callback framework.
> > > 
> > > We have these _5_ callbacks:
> > > 
> > > ....
> > >         perf_event_task_sched_out(prev, next, cpu);
> > > ....
> > >         fire_sched_out_notifiers(prev, next);
> > > ....
> > >         trace_sched_switch(rq, prev, next);
> > > ....
> > >         perf_event_task_sched_in(current, cpu_of(rq));
> > > 	fire_sched_in_notifiers(current);
> > > ....
> > > 
> > > That could be done with just two callbacks - one for sched-out, one for 
> > > sched-in.
> > > 
> > > The best way to do that would be to use two TRACE_EVENT() callbacks, 
> > > make them unconditional and register to them. (with wrappers to make it 
> > > all convenient to use)
> > > 
> > > This requires some work but needs to be done.
> > 
> > Ugh,.. it also makes TRACE_EVENT unconditional.
> > 
> > That really wants a separate option.. What we could do is take regular
> > notifier lists and extend them to auto-generate a tracepoint when the
> > trace stuff is enabled or something.

I wouldn't mind some form of TRACE_EVENT_CALLBACK() whose callback 
facility is always available, even if CONFIG_PERF_EVENTS and 
CONFIG_TRACING are disabled.

It might grow out of notifier.h - albeit I suspect the shorter path 
would be to grow it from TRACE_EVENT().

( The various pagefault notifiers in arch/x86/mm/fault.c could use this 
  facility too. )

What we definitely don't want is a proliferation of callbacks.

> Also, there is this thing about direct and indirect function calls.

Yeah.  The norm would be for those points to be disabled and have 
near-zero overhead.  If a point has callbacks registered, it should 
still be lightweight.

	Ingo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/4 tip/sched/core] sched: rename preempt_notifier to sched_notifier and always enable it
  2009-11-26 11:56                   ` Ingo Molnar
@ 2009-11-26 12:40                     ` Peter Zijlstra
  2009-11-27  2:11                       ` Tejun Heo
  0 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2009-11-26 12:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, Frédéric Weisbecker, Tejun Heo,
	Stephen Rothwell, linux-next, linux-kernel, Mike Galbraith,
	Thomas Gleixner, H. Peter Anvin

On Thu, 2009-11-26 at 12:56 +0100, Ingo Molnar wrote:
> 
> I wouldn't mind some form of TRACE_EVENT_CALLBACK() whose callback
> facility is always available, even if CONFIG_PERF_EVENTS and
> CONFIG_TRACING are disabled.

CALLBACK_EVENT() would be my preferred name, and it shouldn't live
anywhere near the regular tracing bits; the tracing bits could simply
add another callback to it when enabled.
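
A rough sketch of the shape I have in mind - CALLBACK_EVENT() does not
exist anywhere, and every name below is illustrative only
(registration/unregistration helpers left out):

#define CB_PROTO(args...)	args
#define CB_ARGS(args...)	args

/*
 * Hypothetical CALLBACK_EVENT(): declares a callback type plus a
 * plain, always-compiled-in call list for one event.  With
 * CONFIG_TRACING enabled, the tracer would register into the same
 * list as just another callback.
 */
#define CALLBACK_EVENT(name, proto, args)				\
	struct name##_cb {						\
		struct list_head node;					\
		void (*fn)(proto);					\
	};								\
	static LIST_HEAD(name##_cb_list);				\
	static inline void call_##name(proto)				\
	{								\
		struct name##_cb *cb;					\
									\
		list_for_each_entry(cb, &name##_cb_list, node)		\
			cb->fn(args);					\
	}

/* the two scheduler events would then be declared as e.g.: */
CALLBACK_EVENT(sched_out,
	       CB_PROTO(struct task_struct *prev, struct task_struct *next),
	       CB_ARGS(prev, next));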

> It might grow out of notifier.h - albeit I suspect the shorter path
> would be to grow it from TRACE_EVENT().
> 
> ( The various pagefault notifiers in arch/x86/mm/fault.c could use this 
>   facility too. )
> 
> What we definitely don't want is a proliferation of callbacks.

Sure, a first approach would be to find something that can cover both
these extended preempt notifiers and the mmu notifier stuff; clearly
notifier.h isn't cutting it anymore, and rolling these things yourself
has obvious disadvantages, like not being able to add generic bits
without having to touch all of them.

The big downside of TRACE_EVENT()-like things is the implementation:
it's a horrid macro mess.  They're nice enough when you don't have to
touch the implementation, but rather painful when you do.

But I guess there's just no real alternative..

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/4 tip/sched/core] sched: rename preempt_notifier to sched_notifier and always enable it
  2009-11-26 12:40                     ` Peter Zijlstra
@ 2009-11-27  2:11                       ` Tejun Heo
  2009-11-27  4:52                         ` Ingo Molnar
  0 siblings, 1 reply; 39+ messages in thread
From: Tejun Heo @ 2009-11-27  2:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, Frédéric Weisbecker,
	Stephen Rothwell, linux-next, linux-kernel, Mike Galbraith,
	Thomas Gleixner, H. Peter Anvin

Hello, Peter, Ingo.

11/26/2009 09:40 PM, Peter Zijlstra wrote:
> CALLBACK_EVENT() would be my preferred name, and shouldn't live anywhere
> near the regular tracing bits, the tracing bits could simply add another
> callback in it when enabled.

I haven't looked at the mm code, but if the scheduler callback
requirement isn't going to explode big-time soon and we know at build
time which functions are the candidate callbacks, I think this can be
done pretty efficiently with an unsigned long enable mask per task and
fixed function dispatch: the no-callback case goes through a single
likely() conditional test at the tracing point, and enabled callbacks
are dispatched via conditional direct jumps.
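
Roughly like this - the mask field and all the callees below are made
up purely for illustration:

enum {
	SCHED_CB_PERF,		/* perf context switch accounting */
	SCHED_CB_WQ,		/* cmwq worker management */
	SCHED_CB_PREEMPT,	/* preempt-notifier users (kvm) */
};

static inline void fire_sched_out(struct task_struct *prev,
				  struct task_struct *next)
{
	/* hypothetical per-task mask of enabled callback classes */
	unsigned long mask = prev->sched_cb_mask;

	if (likely(!mask))	/* no-callback case: one conditional */
		return;

	/* fixed dispatch: conditional but direct (not indirect) calls */
	if (mask & (1UL << SCHED_CB_PERF))
		perf_cb_sched_out(prev, next);
	if (mask & (1UL << SCHED_CB_WQ))
		wq_cb_sched_out(prev, next);
	if (mask & (1UL << SCHED_CB_PREEMPT))
		preempt_cb_sched_out(prev, next);
}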

The thing is that I've been sitting on these workqueue patches for
months now and I really want them in a stable tree at this point.  So,
how about putting the current simplistic notifier code into a sched/
branch which is not pushed to Linus; then, after pushing the workqueue
patches, I'll work on the notifiers branch before pushing the whole
thing to Linus.  Although the scheduler notifier changes necessary for
c-m-workqueue add more notifiers, they're just an extension of an
existing facility and a pretty isolated change from the other
workqueue changes.

How does that sound?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/4 tip/sched/core] sched: rename preempt_notifier to sched_notifier and always enable it
  2009-11-26 10:29             ` Ingo Molnar
  2009-11-26 10:32               ` Peter Zijlstra
  2009-11-26 10:44               ` Tejun Heo
@ 2009-11-27  3:33               ` Paul Mackerras
  2009-11-27  4:54                 ` Ingo Molnar
  2 siblings, 1 reply; 39+ messages in thread
From: Paul Mackerras @ 2009-11-27  3:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Tejun Heo, Stephen Rothwell, linux-next, linux-kernel,
	Mike Galbraith, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra

Ingo Molnar writes:

> The sched notifiers and the various event notifiers we have in the same 
> codepaths should really be unified into a single callback framework.
> 
> We have these _5_ callbacks:
> 
> ...
>         perf_event_task_sched_out(prev, next, cpu);
> ...
>         fire_sched_out_notifiers(prev, next);
> ...
>         trace_sched_switch(rq, prev, next);
> ...
>         perf_event_task_sched_in(current, cpu_of(rq));
> 	fire_sched_in_notifiers(current);
> ...
> 
> That could be done with just two callbacks - one for sched-out, one for 
> sched-in.
> 
> The best way to do that would be to use two TRACE_EVENT() callbacks, 
> make them unconditional and register to them. (with wrappers to make it 
> all convenient to use)

I'd rather have 5 explicit direct function calls than two direct calls
and five indirect function calls, actually...

Paul.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/4 tip/sched/core] sched: rename preempt_notifier to sched_notifier and always enable it
  2009-11-27  2:11                       ` Tejun Heo
@ 2009-11-27  4:52                         ` Ingo Molnar
  2009-11-27  5:38                           ` Tejun Heo
  0 siblings, 1 reply; 39+ messages in thread
From: Ingo Molnar @ 2009-11-27  4:52 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Peter Zijlstra, Steven Rostedt, Frédéric Weisbecker,
	Stephen Rothwell, linux-next, linux-kernel, Mike Galbraith,
	Thomas Gleixner, H. Peter Anvin


* Tejun Heo <tj@kernel.org> wrote:

> Hello, Peter, Ingo.
> 
> 11/26/2009 09:40 PM, Peter Zijlstra wrote:
> > CALLBACK_EVENT() would be my preferred name, and shouldn't live anywhere
> > near the regular tracing bits, the tracing bits could simply add another
> > callback in it when enabled.
> 
> I haven't looked at the mm code, but if the scheduler callback
> requirement isn't going to explode big-time soon and we know at build
> time which functions are the candidate callbacks, I think this can be
> done pretty efficiently with an unsigned long enable mask per task and
> fixed function dispatch: the no-callback case goes through a single
> likely() conditional test at the tracing point, and enabled callbacks
> are dispatched via conditional direct jumps.

Yes - and that's what the tracepoints infrastructure is about.

Btw., longer term it will be faster than a mask check and a 
default-untaken conditional: there's ongoing work to offer runtime 
instruction patching features for tracing callbacks. There's the jump 
patching optimization and also the immediate values patching 
optimization.

We've got old-style notifiers for regular callbacks, we've got new-style 
tracepoints which are callbacks and event source descriptors - and what 
I'm asking for is to have _one_ callback mechanism, and to use that in 
the scheduler. 5 callbacks using 3 different facilities is excessive - 
I'd like to see just two callbacks using one facility.
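
E.g. hooking the existing sched_switch tracepoint looks like this
(sketch only, with the 2.6.32-era probe signature; names other than
register_trace_sched_switch() are illustrative):

/* probe matching the TP_PROTO() of TRACE_EVENT(sched_switch, ...) */
static void my_sched_out_probe(struct rq *rq, struct task_struct *prev,
			       struct task_struct *next)
{
	/* whatever perf / the tracer / a notifier user needs to do */
}

static int __init my_sched_hook_init(void)
{
	/* register_trace_sched_switch() is generated by TRACE_EVENT() */
	return register_trace_sched_switch(my_sched_out_probe);
}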

	Ingo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/4 tip/sched/core] sched: rename preempt_notifier to sched_notifier and always enable it
  2009-11-27  3:33               ` Paul Mackerras
@ 2009-11-27  4:54                 ` Ingo Molnar
  0 siblings, 0 replies; 39+ messages in thread
From: Ingo Molnar @ 2009-11-27  4:54 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Tejun Heo, Stephen Rothwell, linux-next, linux-kernel,
	Mike Galbraith, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra


* Paul Mackerras <paulus@samba.org> wrote:

> Ingo Molnar writes:
> 
> > The sched notifiers and the various event notifiers we have in the same 
> > codepaths should really be unified into a single callback framework.
> > 
> > We have these _5_ callbacks:
> > 
> > ...
> >         perf_event_task_sched_out(prev, next, cpu);
> > ...
> >         fire_sched_out_notifiers(prev, next);
> > ...
> >         trace_sched_switch(rq, prev, next);
> > ...
> >         perf_event_task_sched_in(current, cpu_of(rq));
> > 	fire_sched_in_notifiers(current);
> > ...
> > 
> > That could be done with just two callbacks - one for sched-out, one for 
> > sched-in.
> > 
> > The best way to do that would be to use two TRACE_EVENT() callbacks, 
> > make them unconditional and register to them. (with wrappers to make 
> > it all convenient to use)
> 
> I'd rather have 5 explicit direct function calls than two direct calls
> and five indirect function calls, actually...

Those five callbacks are typically disabled on a regular Linux system. 
So I'd rather have two sites with some NOPs in them (no branches, no 
calls).

	Ingo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/4 tip/sched/core] sched: rename preempt_notifier to sched_notifier and always enable it
  2009-11-27  4:52                         ` Ingo Molnar
@ 2009-11-27  5:38                           ` Tejun Heo
  2009-11-27  5:46                             ` Ingo Molnar
  0 siblings, 1 reply; 39+ messages in thread
From: Tejun Heo @ 2009-11-27  5:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Steven Rostedt, Frédéric Weisbecker,
	Stephen Rothwell, linux-next, linux-kernel, Mike Galbraith,
	Thomas Gleixner, H. Peter Anvin

Hello,

11/27/2009 01:52 PM, Ingo Molnar wrote:
> Btw., longer term it will be faster than a mask check and a
> default-untaken conditional: there's ongoing work to offer runtime
> instruction patching features for tracing callbacks.  There's the
> jump patching optimization and also the immediate values patching
> optimization.

Scheduler callbacks won't benefit much from it.  There will always be
workqueues, and thus a conditional branch will always be necessary.

> We've got old-style notifiers for regular callbacks, we've got new-style 
> tracepoints which are callbacks and event source descriptors - and what 
> I'm asking for is to have _one_ callback mechanism, and to use that in 
> the scheduler. 5 callbacks using 3 different facilities is excessive - 
> I'd like to see just two callbacks using one facility.

The patches in question don't really change anything in the in/out
paths.  They only add wakeup and sleep callbacks to the existing
notifier mechanism.  Sure, let's unify all of them and make them
prettier and more efficient, but I don't think we need to hold up the
workqueue changes for that, right?  We can do those in separate steps
and have the workqueue changes tested in at least linux-next.

I'll re-post four scheduler patches which reorganize the preempt
notifier but make no functional changes, and another one to add wakeup
and sleep notifications.  The first four can go into sched/core and
the last one into a separate branch.  That way, conflicts will be
minimal yet upstream won't see any functional difference from the
current code.  Later, when the notifier framework is reworked, we can
merge them all up and send them upstream.  How does that sound?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/4 tip/sched/core] sched: rename preempt_notifier to sched_notifier and always enable it
  2009-11-27  5:38                           ` Tejun Heo
@ 2009-11-27  5:46                             ` Ingo Molnar
  2009-11-27  6:01                               ` Tejun Heo
  0 siblings, 1 reply; 39+ messages in thread
From: Ingo Molnar @ 2009-11-27  5:46 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Peter Zijlstra, Steven Rostedt, Frédéric Weisbecker,
	Stephen Rothwell, linux-next, linux-kernel, Mike Galbraith,
	Thomas Gleixner, H. Peter Anvin


* Tejun Heo <tj@kernel.org> wrote:

> Hello,
> 
> 11/27/2009 01:52 PM, Ingo Molnar wrote:
> > Btw., longer term it will be faster than a mask check and a
> > default-untaken conditional: there's ongoing work to offer runtime
> > instruction patching features for tracing callbacks.  There's the
> > jump patching optimization and also the immediate values patching
> > optimization.
> 
> Scheduler callbacks won't benefit much from it.  There will always be
> workqueues, and thus a conditional branch will always be necessary.

Other code will benefit from it though, such as the page fault callbacks 
I already mentioned.

My position on this is rather clear: I want no new callbacks and no 
changes to callbacks in the scheduler until this situation is cleaned 
up. Five callback sites are _way_ too many - so if you want to add 
callbacks or change them, please clean it up and improve it first.

	Ingo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/4 tip/sched/core] sched: rename preempt_notifier to sched_notifier and always enable it
  2009-11-27  5:46                             ` Ingo Molnar
@ 2009-11-27  6:01                               ` Tejun Heo
  2009-11-27  6:13                                 ` Ingo Molnar
  0 siblings, 1 reply; 39+ messages in thread
From: Tejun Heo @ 2009-11-27  6:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Steven Rostedt, Frédéric Weisbecker,
	Stephen Rothwell, linux-next, linux-kernel, Mike Galbraith,
	Thomas Gleixner, H. Peter Anvin

Hello,

11/27/2009 02:46 PM, Ingo Molnar wrote:
> Other code will benefit from it though, such as the page fault callbacks
> I already mentioned.
> 
> My position on this is rather clear: I want no new callbacks and no
> changes to callbacks in the scheduler until this situation is cleaned
> up. Five callback sites are _way_ too many - so if you want to add
> callbacks or change them, please clean it up and improve it first.

Even changes which cause no functional differences?  It's just
logistics at that point, and I'll only be pushing the actual changes
(the addition of wakeup/sleep callbacks) to linux-next so that the
different stages of the workqueue changes can receive some amount of
testing.  If you don't want that in the sched development tree, I can
maintain a temporary branch for linux-next testing, but I really can't
see what the benefit of doing things that way would be.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/4 tip/sched/core] sched: rename preempt_notifier to sched_notifier and always enable it
  2009-11-27  6:01                               ` Tejun Heo
@ 2009-11-27  6:13                                 ` Ingo Molnar
  2009-11-27  6:16                                   ` Tejun Heo
  0 siblings, 1 reply; 39+ messages in thread
From: Ingo Molnar @ 2009-11-27  6:13 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Peter Zijlstra, Steven Rostedt, Frédéric Weisbecker,
	Stephen Rothwell, linux-next, linux-kernel, Mike Galbraith,
	Thomas Gleixner, H. Peter Anvin


* Tejun Heo <tj@kernel.org> wrote:

> Hello,
> 
> 11/27/2009 02:46 PM, Ingo Molnar wrote:
> > Other code will benefit from it though, such as the page fault callbacks
> > I already mentioned.
> > 
> > My position on this is rather clear: I want no new callbacks and no
> > changes to callbacks in the scheduler until this situation is cleaned
> > up. Five callback sites are _way_ too many - so if you want to add
> > callbacks or change them, please clean it up and improve it first.
> 
> Even changes which cause no functional differences? [...]

Such as enabling preempt notifiers unconditionally? That's a functional 
change - it turns a so-far optional callback into an essentially 
mandatory one.

	Ingo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/4 tip/sched/core] sched: rename preempt_notifier to sched_notifier and always enable it
  2009-11-27  6:13                                 ` Ingo Molnar
@ 2009-11-27  6:16                                   ` Tejun Heo
  2009-11-27  6:21                                     ` Ingo Molnar
  0 siblings, 1 reply; 39+ messages in thread
From: Tejun Heo @ 2009-11-27  6:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Steven Rostedt, Frédéric Weisbecker,
	Stephen Rothwell, linux-next, linux-kernel, Mike Galbraith,
	Thomas Gleixner, H. Peter Anvin

11/27/2009 03:13 PM, Ingo Molnar wrote:
>>> My position on this is rather clear: I want no new callbacks and no
>>> changes to callbacks in the scheduler until this situation is cleaned
>>> up. Five callback sites are _way_ too many - so if you want to add
>>> callbacks or change them, please clean it up and improve it first.
>>
>> Even changes which cause no functional differences? [...]
> 
> Such as enabling preempt notifiers unconditionally? That's a functional 
> change - it turns a so-far optional callback into an essentially 
> mandatory one.

No, I'm not gonna do that.  Just patches to reorganize code so that
unnecessary conflicts won't occur.  There will be NO functional
changes.

-- 
tejun

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/4 tip/sched/core] sched: rename preempt_notifier to sched_notifier and always enable it
  2009-11-27  6:16                                   ` Tejun Heo
@ 2009-11-27  6:21                                     ` Ingo Molnar
  2009-11-27  6:38                                       ` Tejun Heo
  0 siblings, 1 reply; 39+ messages in thread
From: Ingo Molnar @ 2009-11-27  6:21 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Peter Zijlstra, Steven Rostedt, Frédéric Weisbecker,
	Stephen Rothwell, linux-next, linux-kernel, Mike Galbraith,
	Thomas Gleixner, H. Peter Anvin


* Tejun Heo <tj@kernel.org> wrote:

> 11/27/2009 03:13 PM, Ingo Molnar wrote:
> >>> My position on this is rather clear: I want no new callbacks and no
> >>> changes to callbacks in the scheduler until this situation is cleaned
> >>> up. Five callback sites are _way_ too many - so if you want to add
> >>> callbacks or change them, please clean it up and improve it first.
> >>
> >> Even changes which cause no functional differences? [...]
> > 
> > Such as enabling preempt notifiers unconditionally? That's a functional 
> > change - it turns a so-far optional callback into an essentially 
> > mandatory one.
> 
> No, I'm not gonna do that.  Just patches to reorganize code so that 
> unnecessary conflicts won't occur.  There will be NO functional 
> changes.

Not without the other changes - which you want to do too, right? Please 
send all sched.c modifications via the scheduler tree. Going via other 
trees is fine when there's agreement by the maintainers - but this is 
one of the rare cases where that's not the case.

	Ingo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/4 tip/sched/core] sched: rename preempt_notifier to sched_notifier and always enable it
  2009-11-27  6:21                                     ` Ingo Molnar
@ 2009-11-27  6:38                                       ` Tejun Heo
  2009-11-27  7:02                                         ` Ingo Molnar
  0 siblings, 1 reply; 39+ messages in thread
From: Tejun Heo @ 2009-11-27  6:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Steven Rostedt, Frédéric Weisbecker,
	Stephen Rothwell, linux-next, linux-kernel, Mike Galbraith,
	Thomas Gleixner, H. Peter Anvin

Hello,

11/27/2009 03:21 PM, Ingo Molnar wrote:
>> No, I'm not gonna do that.  Just patches to reorganize code so that 
>> unnecessary conflicts won't occur.  There will be NO functional 
>> changes.
> 
> Not without the other changes - which you want to do too, right?

The extra things I want can stay in a devel branch until the notifiers
get cleaned up, and it will only be a few patches which aren't very
likely to cause conflicts when exported to linux-next or other testing
branches.

> Please send all sched.c modifications via the scheduler tree. Going
> via other trees is fine when there's agreement by the maintainers -
> but this is one of the rare cases where that's not the case.

Yeah, sure.  So, two patchsets.  One for sched/core doing pure
reorganization without any functional changes.  The other for
sched/notifier (or whatever name you would prefer) which is purely for
development and testing and will not be pushed to Linus unless it
receives notifier framework cleanup.  wq#for-next will pull from
sched/notifier and be exported to linux-next but it will never be
submitted to Linus until sched/notifier is cleaned up.  Am I
understanding it correctly?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/4 tip/sched/core] sched: rename preempt_notifier to sched_notifier and always enable it
  2009-11-27  6:38                                       ` Tejun Heo
@ 2009-11-27  7:02                                         ` Ingo Molnar
  0 siblings, 0 replies; 39+ messages in thread
From: Ingo Molnar @ 2009-11-27  7:02 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Peter Zijlstra, Steven Rostedt, Frédéric Weisbecker,
	Stephen Rothwell, linux-next, linux-kernel, Mike Galbraith,
	Thomas Gleixner, H. Peter Anvin


* Tejun Heo <tj@kernel.org> wrote:

> > Please send all sched.c modifications via the scheduler tree. Going 
> > via other trees is fine when there's agreement by the maintainers - 
> > but this is one of the rare cases where that's not the case.
> 
> Yeah, sure.  So, two patchsets.  One for sched/core doing pure 
> reorganization without any functional changes.  The other for 
> sched/notifier (or whatever name you would prefer) which is purely for 
> development and testing and will not be pushed to Linus unless it 
> receives notifier framework cleanup.  wq#for-next will pull from 
> sched/notifier and be exported to linux-next but it will never be 
> submitted to Linus until sched/notifier is cleaned up.  Am I 
> understanding it correctly?

Yeah, that would be fine.

	Ingo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: linux-next: manual merge of the workqueues tree with the tip tree
  2019-11-18 12:50   ` Ingo Molnar
  2019-11-18 14:56     ` Paul E. McKenney
@ 2019-11-18 15:09     ` Tejun Heo
  1 sibling, 0 replies; 39+ messages in thread
From: Tejun Heo @ 2019-11-18 15:09 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Sebastian Andrzej Siewior, Stephen Rothwell, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Peter Zijlstra,
	Linux Next Mailing List, Linux Kernel Mailing List,
	Joel Fernandes (Google),
	Paul E. McKenney

On Mon, Nov 18, 2019 at 01:50:46PM +0100, Ingo Molnar wrote:
> So 5a6446626d7e is currently queued up for v5.5 as part of the RCU tree. 
> 
> I can cherry pick 5a6446626d7e into tip:core/urgent if Paul and Tejun 
> agree.

Yeah, please go ahead.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: linux-next: manual merge of the workqueues tree with the tip tree
  2019-11-18 12:50   ` Ingo Molnar
@ 2019-11-18 14:56     ` Paul E. McKenney
  2019-11-18 15:09     ` Tejun Heo
  1 sibling, 0 replies; 39+ messages in thread
From: Paul E. McKenney @ 2019-11-18 14:56 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Sebastian Andrzej Siewior, Stephen Rothwell, Tejun Heo,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Peter Zijlstra,
	Linux Next Mailing List, Linux Kernel Mailing List,
	Joel Fernandes (Google)

On Mon, Nov 18, 2019 at 01:50:46PM +0100, Ingo Molnar wrote:
> 
> * Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> 
> > On 2019-11-18 15:08:58 [+1100], Stephen Rothwell wrote:
> > > Hi all,
> > Hi,
> > 
> > > Today's linux-next merge of the workqueues tree got a conflict in:
> > > 
> > >   kernel/workqueue.c
> > > 
> > > between commit:
> > > 
> > >   5a6446626d7e ("workqueue: Convert for_each_wq to use built-in list check")
> > > 
> > > from the tip tree and commit:
> > > 
> > >   49e9d1a9faf2 ("workqueue: Add RCU annotation for pwq list walk")
> > > 
> > > from the workqueues tree.
> > 
> > urgh. So the RCU warning is introduced in commit
> >    28875945ba98d ("rcu: Add support for consolidated-RCU reader checking")
> > 
> > which was merged in v5.4-rc1. I enabled it around -rc7 and saw a few
> > warnings, including in the workqueue code. I asked about this and
> > later posted a patch, which was applied by Tejun. Now I see that the
> > tip tree has a patch for this warning…
> > I would vote for the patch in -tip since it also removes the
> > assert_rcu_or_wq_mutex() macro.
> > It would be nice if this could be part of v5.4, since once the RCU
> > warning is enabled it will yell.
> 
> So 5a6446626d7e is currently queued up for v5.5 as part of the RCU tree. 
> 
> I can cherry pick 5a6446626d7e into tip:core/urgent if Paul and Tejun 
> agree.

No objections here.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: linux-next: manual merge of the workqueues tree with the tip tree
  2019-11-18  9:00 ` Sebastian Andrzej Siewior
@ 2019-11-18 12:50   ` Ingo Molnar
  2019-11-18 14:56     ` Paul E. McKenney
  2019-11-18 15:09     ` Tejun Heo
  0 siblings, 2 replies; 39+ messages in thread
From: Ingo Molnar @ 2019-11-18 12:50 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Stephen Rothwell, Tejun Heo, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Peter Zijlstra, Linux Next Mailing List,
	Linux Kernel Mailing List, Joel Fernandes (Google),
	Paul E. McKenney


* Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> On 2019-11-18 15:08:58 [+1100], Stephen Rothwell wrote:
> > Hi all,
> Hi,
> 
> > Today's linux-next merge of the workqueues tree got a conflict in:
> > 
> >   kernel/workqueue.c
> > 
> > between commit:
> > 
> >   5a6446626d7e ("workqueue: Convert for_each_wq to use built-in list check")
> > 
> > from the tip tree and commit:
> > 
> >   49e9d1a9faf2 ("workqueue: Add RCU annotation for pwq list walk")
> > 
> > from the workqueues tree.
> 
> urgh. So the RCU warning is introduced in commit
>    28875945ba98d ("rcu: Add support for consolidated-RCU reader checking")
> 
> which was merged in v5.4-rc1. I enabled it around -rc7 and saw a few
> warnings, including in the workqueue code. I asked about this and
> later posted a patch, which was applied by Tejun. Now I see that the
> tip tree has a patch for this warning…
> I would vote for the patch in -tip since it also removes the
> assert_rcu_or_wq_mutex() macro.
> It would be nice if this could be part of v5.4, since once the RCU
> warning is enabled it will yell.

So 5a6446626d7e is currently queued up for v5.5 as part of the RCU tree. 

I can cherry pick 5a6446626d7e into tip:core/urgent if Paul and Tejun 
agree.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: linux-next: manual merge of the workqueues tree with the tip tree
  2019-11-18  4:08 Stephen Rothwell
@ 2019-11-18  9:00 ` Sebastian Andrzej Siewior
  2019-11-18 12:50   ` Ingo Molnar
  0 siblings, 1 reply; 39+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-11-18  9:00 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Tejun Heo, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Peter Zijlstra, Linux Next Mailing List,
	Linux Kernel Mailing List, Joel Fernandes (Google),
	Paul E. McKenney

On 2019-11-18 15:08:58 [+1100], Stephen Rothwell wrote:
> Hi all,
Hi,

> Today's linux-next merge of the workqueues tree got a conflict in:
> 
>   kernel/workqueue.c
> 
> between commit:
> 
>   5a6446626d7e ("workqueue: Convert for_each_wq to use built-in list check")
> 
> from the tip tree and commit:
> 
>   49e9d1a9faf2 ("workqueue: Add RCU annotation for pwq list walk")
> 
> from the workqueues tree.

urgh. So the RCU warning is introduced in commit
   28875945ba98d ("rcu: Add support for consolidated-RCU reader checking")

which was merged in v5.4-rc1. I enabled it around -rc7 and saw a few
warnings, including in the workqueue code. I asked about this and
later posted a patch, which was applied by Tejun. Now I see that the
tip tree has a patch for this warning…
I would vote for the patch in -tip since it also removes the
assert_rcu_or_wq_mutex() macro.
It would be nice if this could be part of v5.4, since once the RCU
warning is enabled it will yell.

Sebastian

^ permalink raw reply	[flat|nested] 39+ messages in thread

* linux-next: manual merge of the workqueues tree with the tip tree
@ 2019-11-18  4:08 Stephen Rothwell
  2019-11-18  9:00 ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 39+ messages in thread
From: Stephen Rothwell @ 2019-11-18  4:08 UTC (permalink / raw)
  To: Tejun Heo, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Peter Zijlstra
  Cc: Linux Next Mailing List, Linux Kernel Mailing List,
	Joel Fernandes (Google),
	Paul E. McKenney, Sebastian Andrzej Siewior

Hi all,

Today's linux-next merge of the workqueues tree got a conflict in:

  kernel/workqueue.c

between commit:

  5a6446626d7e ("workqueue: Convert for_each_wq to use built-in list check")

from the tip tree and commit:

  49e9d1a9faf2 ("workqueue: Add RCU annotation for pwq list walk")

from the workqueues tree.

I fixed it up (I just used the former as it is a superset of the latter)
and can carry the fix as necessary. This is now fixed as far as linux-next
is concerned, but any non-trivial conflicts should be mentioned to your
upstream maintainer when your tree is submitted for merging.  You may
also want to consider cooperating with the maintainer of the conflicting
tree to minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell

^ permalink raw reply	[flat|nested] 39+ messages in thread

* linux-next: manual merge of the workqueues tree with the tip tree
@ 2011-12-28  4:37 Stephen Rothwell
  0 siblings, 0 replies; 39+ messages in thread
From: Stephen Rothwell @ 2011-12-28  4:37 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-next, linux-kernel, Jan Beulich, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Peter Zijlstra, Christoph Lameter

Hi Tejun,

Today's linux-next merge of the workqueues tree got a conflict in
arch/x86/include/asm/percpu.h between commit cebef5beed3d ("x86: Fix and
improve percpu_cmpxchg{8,16}b_double()") from the tip tree and commit
933393f58fef ("percpu: Remove irqsafe_cpu_xxx variants") from the
workqueues tree.

I fixed it up (see below) and can carry the fix as necessary.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

diff --cc arch/x86/include/asm/percpu.h
index 529bf07e,562ccb5..0000000
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@@ -462,9 -446,8 +443,8 @@@ do {									
  	__ret;								\
  })
  
 -#define __this_cpu_cmpxchg_double_4(pcp1, pcp2, o1, o2, n1, n2)		percpu_cmpxchg8b_double(pcp1, o1, o2, n1, n2)
 -#define this_cpu_cmpxchg_double_4(pcp1, pcp2, o1, o2, n1, n2)		percpu_cmpxchg8b_double(pcp1, o1, o2, n1, n2)
 +#define __this_cpu_cmpxchg_double_4	percpu_cmpxchg8b_double
 +#define this_cpu_cmpxchg_double_4	percpu_cmpxchg8b_double
- #define irqsafe_cpu_cmpxchg_double_4	percpu_cmpxchg8b_double
  #endif /* CONFIG_X86_CMPXCHG64 */
  
  /*
@@@ -519,9 -503,8 +492,8 @@@
  	__ret;								\
  })
  
 -#define __this_cpu_cmpxchg_double_8(pcp1, pcp2, o1, o2, n1, n2)		percpu_cmpxchg16b_double(pcp1, o1, o2, n1, n2)
 -#define this_cpu_cmpxchg_double_8(pcp1, pcp2, o1, o2, n1, n2)		percpu_cmpxchg16b_double(pcp1, o1, o2, n1, n2)
 +#define __this_cpu_cmpxchg_double_8	percpu_cmpxchg16b_double
 +#define this_cpu_cmpxchg_double_8	percpu_cmpxchg16b_double
- #define irqsafe_cpu_cmpxchg_double_8	percpu_cmpxchg16b_double
  
  #endif
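
(For reference, an illustrative sketch of the kind of user this
double-word cmpxchg serves - simplified from the SLUB allocation
fastpath; the function name is made up and next_tid() is approximated
by tid + 1 for brevity:)

/*
 * Swap the per-cpu freelist head and transaction id as one
 * double-word cmpxchg; nonzero return means success.
 */
static bool freelist_pop(struct kmem_cache *s, void *object, void *next,
			 unsigned long tid)
{
	return this_cpu_cmpxchg_double(s->cpu_slab->freelist,
				       s->cpu_slab->tid,
				       object, tid,
				       next, tid + 1);
}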
  

^ permalink raw reply	[flat|nested] 39+ messages in thread

* linux-next: manual merge of the workqueues tree with the tip tree
@ 2010-12-27  4:38 Stephen Rothwell
  0 siblings, 0 replies; 39+ messages in thread
From: Stephen Rothwell @ 2010-12-27  4:38 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-next, linux-kernel, John Stultz, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Peter Zijlstra

Hi Tejun,

Today's linux-next merge of the workqueues tree got a conflict in
drivers/rtc/rtc-dev.c between commit
042620a018afcfba1d678062b62e463b9e43a68d ("RTC: Remove UIE emulation")
from the tip tree and commit 9db8995be5e1869b5effa117909bc285e06fc09b ("rtc:
don't use flush_scheduled_work()") from the workqueues tree.

The former removes the code that the latter modifies, so I used the
former.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

^ permalink raw reply	[flat|nested] 39+ messages in thread

* linux-next: manual merge of the workqueues tree with the tip tree
@ 2010-08-02  3:26 Stephen Rothwell
  0 siblings, 0 replies; 39+ messages in thread
From: Stephen Rothwell @ 2010-08-02  3:26 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-next, linux-kernel, Paul E. McKenney, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Peter Zijlstra

Hi Tejun,

Today's linux-next merge of the workqueues tree got a conflict in
kernel/workqueue.c between commit
a25909a4d4a29e272f953e12595bf2f04a292dbd ("lockdep: Add an
in_workqueue_context() lockdep-based test function") from the tip tree
and commit 098849516dd522a343e659740c8f1394a5b7fa69 ("workqueue: explain
for_each_*cwq_cpu() iterators") from the workqueues tree (along with a
few others that were previously reported).

I fixed it up (see below) and can carry the fix as necessary.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

diff --cc kernel/workqueue.c
index 59fef15,e2eb351..0000000
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@@ -68,21 -237,68 +237,83 @@@ struct workqueue_struct 
  #endif
  };
  
 +#ifdef CONFIG_LOCKDEP
 +/**
 + * in_workqueue_context() - in context of specified workqueue?
 + * @wq: the workqueue of interest
 + *
 + * Checks lockdep state to see if the current task is executing from
 + * within a workqueue item.  This function exists only if lockdep is
 + * enabled.
 + */
 +int in_workqueue_context(struct workqueue_struct *wq)
 +{
 +	return lock_is_held(&wq->lockdep_map);
 +}
 +#endif
 +
+ struct workqueue_struct *system_wq __read_mostly;
+ struct workqueue_struct *system_long_wq __read_mostly;
+ struct workqueue_struct *system_nrt_wq __read_mostly;
+ struct workqueue_struct *system_unbound_wq __read_mostly;
+ EXPORT_SYMBOL_GPL(system_wq);
+ EXPORT_SYMBOL_GPL(system_long_wq);
+ EXPORT_SYMBOL_GPL(system_nrt_wq);
+ EXPORT_SYMBOL_GPL(system_unbound_wq);
+ 
+ #define for_each_busy_worker(worker, i, pos, gcwq)			\
+ 	for (i = 0; i < BUSY_WORKER_HASH_SIZE; i++)			\
+ 		hlist_for_each_entry(worker, pos, &gcwq->busy_hash[i], hentry)
+ 
+ static inline int __next_gcwq_cpu(int cpu, const struct cpumask *mask,
+ 				  unsigned int sw)
+ {
+ 	if (cpu < nr_cpu_ids) {
+ 		if (sw & 1) {
+ 			cpu = cpumask_next(cpu, mask);
+ 			if (cpu < nr_cpu_ids)
+ 				return cpu;
+ 		}
+ 		if (sw & 2)
+ 			return WORK_CPU_UNBOUND;
+ 	}
+ 	return WORK_CPU_NONE;
+ }
+ 
+ static inline int __next_wq_cpu(int cpu, const struct cpumask *mask,
+ 				struct workqueue_struct *wq)
+ {
+ 	return __next_gcwq_cpu(cpu, mask, !(wq->flags & WQ_UNBOUND) ? 1 : 2);
+ }
+ 
+ /*
+  * CPU iterators
+  *
+  * An extra gcwq is defined for an invalid cpu number
+  * (WORK_CPU_UNBOUND) to host workqueues which are not bound to any
+  * specific CPU.  The following iterators are similar to
+  * for_each_*_cpu() iterators but also considers the unbound gcwq.
+  *
+  * for_each_gcwq_cpu()		: possible CPUs + WORK_CPU_UNBOUND
+  * for_each_online_gcwq_cpu()	: online CPUs + WORK_CPU_UNBOUND
+  * for_each_cwq_cpu()		: possible CPUs for bound workqueues,
+  *				  WORK_CPU_UNBOUND for unbound workqueues
+  */
+ #define for_each_gcwq_cpu(cpu)						\
+ 	for ((cpu) = __next_gcwq_cpu(-1, cpu_possible_mask, 3);		\
+ 	     (cpu) < WORK_CPU_NONE;					\
+ 	     (cpu) = __next_gcwq_cpu((cpu), cpu_possible_mask, 3))
+ 
+ #define for_each_online_gcwq_cpu(cpu)					\
+ 	for ((cpu) = __next_gcwq_cpu(-1, cpu_online_mask, 3);		\
+ 	     (cpu) < WORK_CPU_NONE;					\
+ 	     (cpu) = __next_gcwq_cpu((cpu), cpu_online_mask, 3))
+ 
+ #define for_each_cwq_cpu(cpu, wq)					\
+ 	for ((cpu) = __next_wq_cpu(-1, cpu_possible_mask, (wq));	\
+ 	     (cpu) < WORK_CPU_NONE;					\
+ 	     (cpu) = __next_wq_cpu((cpu), cpu_possible_mask, (wq)))
+ 
  #ifdef CONFIG_DEBUG_OBJECTS_WORK
  
  static struct debug_obj_descr work_debug_descr;
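
(For reference, an illustrative use of the iterators added above,
written against the 2010-era gcwq internals; the function itself is
hypothetical:)

static void dump_worker_counts(void)
{
	unsigned int cpu;

	/* possible CPUs plus the pseudo-CPU hosting unbound gcwqs */
	for_each_gcwq_cpu(cpu) {
		struct global_cwq *gcwq = get_gcwq(cpu);

		spin_lock_irq(&gcwq->lock);
		pr_info("gcwq %u: %d workers, %d idle\n",
			cpu, gcwq->nr_workers, gcwq->nr_idle);
		spin_unlock_irq(&gcwq->lock);
	}
}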

^ permalink raw reply	[flat|nested] 39+ messages in thread

* linux-next: manual merge of the workqueues tree with the tip tree
@ 2010-07-20  4:46 Stephen Rothwell
  0 siblings, 0 replies; 39+ messages in thread
From: Stephen Rothwell @ 2010-07-20  4:46 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-next, linux-kernel, Li Zefan, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Peter Zijlstra

Hi Tejun,

Today's linux-next merge of the workqueues tree got a conflict in
kernel/trace/Kconfig between commit
039ca4e74a1cf60bd7487324a564ecf5c981f254 ("tracing: Remove kmemtrace
ftrace plugin") from the tip tree and commit
64166699752006f1a23a9cf7c96ae36654ccfc2c ("workqueue: temporarily remove
workqueue tracing") from the workqueues tree.

Just context changes.  I fixed it up (see below) and can carry the fix as
necessary.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

diff --cc kernel/trace/Kconfig
index f669092,a0d95c1f..0000000
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@@ -354,17 -371,26 +354,6 @@@ config STACK_TRACE
  
  	  Say N if unsure.
  
- config WORKQUEUE_TRACER
- 	bool "Trace workqueues"
- 	select GENERIC_TRACER
- 	help
- 	  The workqueue tracer provides some statistical information
-           about each cpu workqueue thread such as the number of the
-           works inserted and executed since their creation. It can help
-           to evaluate the amount of work each of them has to perform.
-           For example it can help a developer to decide whether he should
-           choose a per-cpu workqueue instead of a singlethreaded one.
- 
 -config KMEMTRACE
 -	bool "Trace SLAB allocations"
 -	select GENERIC_TRACER
 -	help
 -	  kmemtrace provides tracing for slab allocator functions, such as
 -	  kmalloc, kfree, kmem_cache_alloc, kmem_cache_free, etc. Collected
 -	  data is then fed to the userspace application in order to analyse
 -	  allocation hotspots, internal fragmentation and so on, making it
 -	  possible to see how well an allocator performs, as well as debug
 -	  and profile kernel code.
 -
 -	  This requires an userspace application to use. See
 -	  Documentation/trace/kmemtrace.txt for more information.
 -
 -	  Saying Y will make the kernel somewhat larger and slower. However,
 -	  if you disable kmemtrace at run-time or boot-time, the performance
 -	  impact is minimal (depending on the arch the kernel is built for).
 -
 -	  If unsure, say N.
 -
  config BLK_DEV_IO_TRACE
  	bool "Support for tracing block IO actions"
  	depends on SYSFS

^ permalink raw reply	[flat|nested] 39+ messages in thread

* linux-next: manual merge of the workqueues tree with the tip tree
@ 2010-07-20  4:46 Stephen Rothwell
  0 siblings, 0 replies; 39+ messages in thread
From: Stephen Rothwell @ 2010-07-20  4:46 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-next, linux-kernel, Paul E. McKenney, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Peter Zijlstra

Hi Tejun,

Today's linux-next merge of the workqueues tree got a conflict in
kernel/workqueue.c between commit
a25909a4d4a29e272f953e12595bf2f04a292dbd ("lockdep: Add an
in_workqueue_context() lockdep-based test function") from the tip tree
and several commits from the workqueues tree.

I fixed it up (see below) and can carry the fix as necessary.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

diff --cc kernel/workqueue.c
index 59fef15,aca9472..0000000
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@@ -68,21 -216,55 +216,70 @@@ struct workqueue_struct 
  #endif
  };
  
+ struct workqueue_struct *system_wq __read_mostly;
+ struct workqueue_struct *system_long_wq __read_mostly;
+ struct workqueue_struct *system_nrt_wq __read_mostly;
+ struct workqueue_struct *system_unbound_wq __read_mostly;
+ EXPORT_SYMBOL_GPL(system_wq);
+ EXPORT_SYMBOL_GPL(system_long_wq);
+ EXPORT_SYMBOL_GPL(system_nrt_wq);
+ EXPORT_SYMBOL_GPL(system_unbound_wq);
+ 
+ #define for_each_busy_worker(worker, i, pos, gcwq)			\
+ 	for (i = 0; i < BUSY_WORKER_HASH_SIZE; i++)			\
+ 		hlist_for_each_entry(worker, pos, &gcwq->busy_hash[i], hentry)
+ 
+ static inline int __next_gcwq_cpu(int cpu, const struct cpumask *mask,
+ 				  unsigned int sw)
+ {
+ 	if (cpu < nr_cpu_ids) {
+ 		if (sw & 1) {
+ 			cpu = cpumask_next(cpu, mask);
+ 			if (cpu < nr_cpu_ids)
+ 				return cpu;
+ 		}
+ 		if (sw & 2)
+ 			return WORK_CPU_UNBOUND;
+ 	}
+ 	return WORK_CPU_NONE;
+ }
+ 
+ static inline int __next_wq_cpu(int cpu, const struct cpumask *mask,
+ 				struct workqueue_struct *wq)
+ {
+ 	return __next_gcwq_cpu(cpu, mask, !(wq->flags & WQ_UNBOUND) ? 1 : 2);
+ }
+ 
+ #define for_each_gcwq_cpu(cpu)						\
+ 	for ((cpu) = __next_gcwq_cpu(-1, cpu_possible_mask, 3);		\
+ 	     (cpu) < WORK_CPU_NONE;					\
+ 	     (cpu) = __next_gcwq_cpu((cpu), cpu_possible_mask, 3))
+ 
+ #define for_each_online_gcwq_cpu(cpu)					\
+ 	for ((cpu) = __next_gcwq_cpu(-1, cpu_online_mask, 3);		\
+ 	     (cpu) < WORK_CPU_NONE;					\
+ 	     (cpu) = __next_gcwq_cpu((cpu), cpu_online_mask, 3))
+ 
+ #define for_each_cwq_cpu(cpu, wq)					\
+ 	for ((cpu) = __next_wq_cpu(-1, cpu_possible_mask, (wq));	\
+ 	     (cpu) < WORK_CPU_NONE;					\
+ 	     (cpu) = __next_wq_cpu((cpu), cpu_possible_mask, (wq)))
+ 
 +#ifdef CONFIG_LOCKDEP
 +/**
 + * in_workqueue_context() - in context of specified workqueue?
 + * @wq: the workqueue of interest
 + *
 + * Checks lockdep state to see if the current task is executing from
 + * within a workqueue item.  This function exists only if lockdep is
 + * enabled.
 + */
 +int in_workqueue_context(struct workqueue_struct *wq)
 +{
 +	return lock_is_held(&wq->lockdep_map);
 +}
 +#endif
 +
  #ifdef CONFIG_DEBUG_OBJECTS_WORK
  
  static struct debug_obj_descr work_debug_descr;

^ permalink raw reply	[flat|nested] 39+ messages in thread

* linux-next: manual merge of the workqueues tree with the tip tree
@ 2010-07-20  4:46 Stephen Rothwell
  0 siblings, 0 replies; 39+ messages in thread
From: Stephen Rothwell @ 2010-07-20  4:46 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-next, linux-kernel, Paul E. McKenney, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Peter Zijlstra

Hi Tejun,

Today's linux-next merge of the workqueues tree got a conflict in
include/linux/workqueue.h between commit
a25909a4d4a29e272f953e12595bf2f04a292dbd ("lockdep: Add an
in_workqueue_context() lockdep-based test function") from the tip tree
and commit a0a1a5fd4fb15ec61117c759fe9f5c16c53d9e9c ("workqueue:
reimplement workqueue freeze using max_active") from the workqueues tree.

I fixed it up (see below) and can carry the fix as necessary.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

diff --cc include/linux/workqueue.h
index d0f7c81,d74a529..0000000
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@@ -298,7 -394,10 +394,14 @@@ static inline long work_on_cpu(unsigne
  long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg);
  #endif /* CONFIG_SMP */
  
 +#ifdef CONFIG_LOCKDEP
 +int in_workqueue_context(struct workqueue_struct *wq);
 +#endif
++
+ #ifdef CONFIG_FREEZER
+ extern void freeze_workqueues_begin(void);
+ extern bool freeze_workqueues_busy(void);
+ extern void thaw_workqueues(void);
+ #endif /* CONFIG_FREEZER */
+ 
  #endif
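
(Sketch of the intended use of in_workqueue_context(); my_wq below is
a hypothetical workqueue pointer and the helper is illustrative only:)

static void my_helper(void)
{
#ifdef CONFIG_LOCKDEP
	/* assert that we were reached from a work item of my_wq */
	WARN_ON_ONCE(!in_workqueue_context(my_wq));
#endif
	/* ... code that relies on running on that workqueue ... */
}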

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2019-11-18 15:09 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-11-26  8:00 linux-next: manual merge of the workqueues tree with the tip tree Stephen Rothwell
2009-11-26  8:12 ` Ingo Molnar
2009-11-26  9:15   ` Tejun Heo
2009-11-26  9:26     ` Ingo Molnar
2009-11-26  9:48       ` Tejun Heo
2009-11-26  9:51         ` Ingo Molnar
2009-11-26 10:11           ` [PATCH 1/4 tip/sched/core] sched: rename preempt_notifier to sched_notifier and always enable it Tejun Heo
2009-11-26 10:29             ` Ingo Molnar
2009-11-26 10:32               ` Peter Zijlstra
2009-11-26 11:23                 ` Peter Zijlstra
2009-11-26 11:56                   ` Ingo Molnar
2009-11-26 12:40                     ` Peter Zijlstra
2009-11-27  2:11                       ` Tejun Heo
2009-11-27  4:52                         ` Ingo Molnar
2009-11-27  5:38                           ` Tejun Heo
2009-11-27  5:46                             ` Ingo Molnar
2009-11-27  6:01                               ` Tejun Heo
2009-11-27  6:13                                 ` Ingo Molnar
2009-11-27  6:16                                   ` Tejun Heo
2009-11-27  6:21                                     ` Ingo Molnar
2009-11-27  6:38                                       ` Tejun Heo
2009-11-27  7:02                                         ` Ingo Molnar
2009-11-26 10:44               ` Tejun Heo
2009-11-27  3:33               ` Paul Mackerras
2009-11-27  4:54                 ` Ingo Molnar
2009-11-26 10:13           ` [PATCH 2/4 tip/sched/core] sched: update sched_notifier and add wakeup/sleep notifications Tejun Heo
2009-11-26 10:13           ` [PATCH 3/4 tip/sched/core] sched: refactor try_to_wake_up() and implement try_to_wake_up_local() Tejun Heo
2009-11-26 10:14           ` [PATCH 4/4 tip/sched/core] sched: implement force_cpus_allowed() Tejun Heo
2010-07-20  4:46 linux-next: manual merge of the workqueues tree with the tip tree Stephen Rothwell
2010-07-20  4:46 Stephen Rothwell
2010-07-20  4:46 Stephen Rothwell
2010-08-02  3:26 Stephen Rothwell
2010-12-27  4:38 Stephen Rothwell
2011-12-28  4:37 Stephen Rothwell
2019-11-18  4:08 Stephen Rothwell
2019-11-18  9:00 ` Sebastian Andrzej Siewior
2019-11-18 12:50   ` Ingo Molnar
2019-11-18 14:56     ` Paul E. McKenney
2019-11-18 15:09     ` Tejun Heo
