* [RFC patch 0/5] Per cpu thread hotplug infrastructure
@ 2012-06-13 11:00 Thomas Gleixner
  2012-06-13 11:00 ` [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads Thomas Gleixner
                   ` (4 more replies)
  0 siblings, 5 replies; 53+ messages in thread
From: Thomas Gleixner @ 2012-06-13 11:00 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Srivatsa S. Bhat, Rusty Russell,
	Paul E. McKenney, Tejun Heo

The following series implements the infrastructure for parking and
unparking kernel threads, to avoid the full teardown and fork on cpu
hotplug operations, along with the management infrastructure for
hotplug and its users.

I've converted the most obvious (and simple) users, ksoftirqd and
watchdog, along with one driver (the latter is untested due to lack of
hardware).

Looking at the other (ab)users of per cpu kernel threads:

 - drivers/*: converting all users there should be rather trivial

 - RCU needs more thought, but I'm sure that Paul will figure it out
   in no time :)

   The main issue with RCU is the per node threads, but it should be
   simple to extend the infrastructure to allow registration and
   handling of per node threads as well.

 - workqueues. I did not even try to think about converting them. The
   hotplug related code in there causes strong feelings related to
   zombies and chainsaws. (See https://lwn.net/Articles/378036/)

Thanks,

	tglx



* [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads
  2012-06-13 11:00 [RFC patch 0/5] Per cpu thread hotplug infrastructure Thomas Gleixner
@ 2012-06-13 11:00 ` Thomas Gleixner
  2012-06-13 18:33   ` Paul E. McKenney
                     ` (3 more replies)
  2012-06-13 11:00 ` [RFC patch 1/5] kthread: Implement park/unpark facility Thomas Gleixner
                   ` (3 subsequent siblings)
  4 siblings, 4 replies; 53+ messages in thread
From: Thomas Gleixner @ 2012-06-13 11:00 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Srivatsa S. Bhat, Rusty Russell,
	Paul E. McKenney, Tejun Heo

[-- Attachment #1: smpboot-percpu-thread-infrastructure.patch --]
[-- Type: text/plain, Size: 9066 bytes --]

Provide a generic interface for setting up and tearing down percpu
threads.

On registration the threads for already online cpus are created and
started. On deregistration (modules) the threads are stopped.

During hotplug operations the threads are created, started, parked and
unparked. The data structure for registration provides a pointer to
percpu storage space and optional setup, cleanup, park, unpark
functions. These functions are called when the thread state changes.

Thread functions should look like the following:

int thread_fn(void *cookie)
{
	while (!smpboot_thread_check_parking(cookie)) {
		do_stuff();
	}
	return 0;
}
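
Registration then boils down to providing per cpu storage for the task
pointers and filling in a struct smp_hotplug_thread. A minimal sketch
(the "example" names are illustrative, not part of this series):

static DEFINE_PER_CPU(struct task_struct *, example_thread);

static struct smp_hotplug_thread example_threads = {
	.store		= &example_thread,
	.thread_fn	= thread_fn,
	.thread_comm	= "example/%u",
};

static int __init example_init(void)
{
	/* creates, starts and unparks the threads on all online cpus */
	return smpboot_register_percpu_thread(&example_threads);
}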

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/smpboot.h |   40 +++++++++
 kernel/cpu.c            |    6 +
 kernel/smpboot.c        |  205 ++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/smpboot.h        |    4 
 4 files changed, 255 insertions(+)

Index: tip/include/linux/smpboot.h
===================================================================
--- /dev/null
+++ tip/include/linux/smpboot.h
@@ -0,0 +1,40 @@
+#ifndef _LINUX_SMPBOOT_H
+#define _LINUX_SMPBOOT_H
+
+#include <linux/types.h>
+
+struct task_struct;
+/* Cookie handed to the thread_fn */
+struct smpboot_thread_data;
+
+/**
+ * struct smp_hotplug_thread - CPU hotplug related thread descriptor
+ * @store:		Pointer to per cpu storage for the task pointers
+ * @list:		List head for core management
+ * @thread_fn:		The associated thread function
+ * @setup:		Optional setup function, called when the thread gets
+ *			operational the first time
+ * @cleanup:		Optional cleanup function, called when the thread
+ *			should stop (module exit)
+ * @park:		Optional park function, called when the thread is
+ *			parked (cpu offline)
+ * @unpark:		Optional unpark function, called when the thread is
+ *			unparked (cpu online)
+ * @thread_comm:	The base name of the thread
+ */
+struct smp_hotplug_thread {
+	struct task_struct __percpu	**store;
+	struct list_head		list;
+	int				(*thread_fn)(void *data);
+	void				(*setup)(unsigned int cpu);
+	void				(*cleanup)(unsigned int cpu, bool online);
+	void				(*park)(unsigned int cpu);
+	void				(*unpark)(unsigned int cpu);
+	const char			*thread_comm;
+};
+
+int smpboot_register_percpu_thread(struct smp_hotplug_thread *plug_thread);
+void smpboot_unregister_percpu_thread(struct smp_hotplug_thread *plug_thread);
+int smpboot_thread_check_parking(struct smpboot_thread_data *td);
+
+#endif
Index: tip/kernel/cpu.c
===================================================================
--- tip.orig/kernel/cpu.c
+++ tip/kernel/cpu.c
@@ -280,6 +280,7 @@ static int __ref _cpu_down(unsigned int 
 				__func__, cpu);
 		goto out_release;
 	}
+	smpboot_park_threads(cpu);
 
 	err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
 	if (err) {
@@ -354,6 +355,10 @@ static int __cpuinit _cpu_up(unsigned in
 		goto out;
 	}
 
+	ret = smpboot_create_threads(cpu);
+	if (ret)
+		goto out;
+
 	ret = __cpu_notify(CPU_UP_PREPARE | mod, hcpu, -1, &nr_calls);
 	if (ret) {
 		nr_calls--;
@@ -370,6 +375,7 @@ static int __cpuinit _cpu_up(unsigned in
 
 	/* Now call notifier in preparation. */
 	cpu_notify(CPU_ONLINE | mod, hcpu);
+	smpboot_unpark_threads(cpu);
 
 out_notify:
 	if (ret != 0)
Index: tip/kernel/smpboot.c
===================================================================
--- tip.orig/kernel/smpboot.c
+++ tip/kernel/smpboot.c
@@ -1,11 +1,17 @@
 /*
  * Common SMP CPU bringup/teardown functions
  */
+#include <linux/cpu.h>
 #include <linux/err.h>
 #include <linux/smp.h>
 #include <linux/init.h>
+#include <linux/list.h>
+#include <linux/slab.h>
 #include <linux/sched.h>
+#include <linux/export.h>
 #include <linux/percpu.h>
+#include <linux/kthread.h>
+#include <linux/smpboot.h>
 
 #include "smpboot.h"
 
@@ -65,3 +71,202 @@ void __init idle_threads_init(void)
 	}
 }
 #endif
+
+static LIST_HEAD(hotplug_threads);
+static DEFINE_MUTEX(smpboot_threads_lock);
+
+struct smpboot_thread_data {
+	unsigned int			cpu;
+	unsigned int			status;
+	struct smp_hotplug_thread	*ht;
+};
+
+enum {
+	HP_THREAD_NONE = 0,
+	HP_THREAD_ACTIVE,
+	HP_THREAD_PARKED,
+};
+
+/**
+ * smpboot_thread_check_parking - percpu hotplug thread loop function
+ * @td:		thread data pointer
+ *
+ * Checks for thread stop and park conditions. Calls the necessary
+ * setup, cleanup, park and unpark functions for the registered
+ * thread.
+ *
+ * Returns 1 when the thread should exit, 0 otherwise.
+ */
+int smpboot_thread_check_parking(struct smpboot_thread_data *td)
+{
+	struct smp_hotplug_thread *ht = td->ht;
+
+again:
+	if (kthread_should_stop()) {
+		if (ht->cleanup)
+			ht->cleanup(td->cpu, cpu_online(td->cpu));
+		kfree(td);
+		return 1;
+	}
+
+	BUG_ON(td->cpu != smp_processor_id());
+
+	if (kthread_should_park()) {
+		if (ht->park && td->status == HP_THREAD_ACTIVE) {
+			ht->park(td->cpu);
+			td->status = HP_THREAD_PARKED;
+		}
+		kthread_parkme();
+		/* We might have been woken for stop */
+		goto again;
+	}
+
+	switch (td->status) {
+	case HP_THREAD_NONE:
+		if (ht->setup)
+			ht->setup(td->cpu);
+		td->status = HP_THREAD_ACTIVE;
+		break;
+	case HP_THREAD_PARKED:
+		if (ht->unpark)
+			ht->unpark(td->cpu);
+		td->status = HP_THREAD_ACTIVE;
+		break;
+	}
+	return 0;
+}
+EXPORT_SYMBOL_GPL(smpboot_thread_check_parking);
+
+static int
+__smpboot_create_thread(struct smp_hotplug_thread *ht, unsigned int cpu)
+{
+	struct task_struct *tsk = *per_cpu_ptr(ht->store, cpu);
+	struct smpboot_thread_data *td;
+
+	if (tsk)
+		return 0;
+
+	td = kzalloc_node(sizeof(*td), GFP_KERNEL, cpu_to_node(cpu));
+	if (!td)
+		return -ENOMEM;
+	td->cpu = cpu;
+	td->ht = ht;
+
+	tsk = kthread_create_on_cpu(ht->thread_fn, td, cpu, ht->thread_comm);
+	if (IS_ERR(tsk))
+		return PTR_ERR(tsk);
+
+	get_task_struct(tsk);
+	*per_cpu_ptr(ht->store, cpu) = tsk;
+	return 0;
+}
+
+int smpboot_create_threads(unsigned int cpu)
+{
+	struct smp_hotplug_thread *cur;
+	int ret = 0;
+
+	mutex_lock(&smpboot_threads_lock);
+	list_for_each_entry(cur, &hotplug_threads, list) {
+		ret = __smpboot_create_thread(cur, cpu);
+		if (ret)
+			break;
+	}
+	mutex_unlock(&smpboot_threads_lock);
+	return ret;
+}
+
+static void smpboot_unpark_thread(struct smp_hotplug_thread *ht, unsigned int cpu)
+{
+	struct task_struct *tsk = *per_cpu_ptr(ht->store, cpu);
+
+	kthread_unpark(tsk);
+}
+
+void smpboot_unpark_threads(unsigned int cpu)
+{
+	struct smp_hotplug_thread *cur;
+
+	mutex_lock(&smpboot_threads_lock);
+	list_for_each_entry(cur, &hotplug_threads, list)
+		smpboot_unpark_thread(cur, cpu);
+	mutex_unlock(&smpboot_threads_lock);
+}
+
+static void smpboot_park_thread(struct smp_hotplug_thread *ht, unsigned int cpu)
+{
+	struct task_struct *tsk = *per_cpu_ptr(ht->store, cpu);
+
+	if (tsk)
+		kthread_park(tsk);
+}
+
+void smpboot_park_threads(unsigned int cpu)
+{
+	struct smp_hotplug_thread *cur;
+
+	mutex_lock(&smpboot_threads_lock);
+	list_for_each_entry(cur, &hotplug_threads, list)
+		smpboot_park_thread(cur, cpu);
+	mutex_unlock(&smpboot_threads_lock);
+}
+
+static void smpboot_destroy_threads(struct smp_hotplug_thread *ht)
+{
+	unsigned int cpu;
+
+	/* We need to destroy also the parked threads of offline cpus */
+	for_each_possible_cpu(cpu) {
+		struct task_struct *tsk = *per_cpu_ptr(ht->store, cpu);
+
+		if (tsk) {
+			kthread_stop(tsk);
+			put_task_struct(tsk);
+			*per_cpu_ptr(ht->store, cpu) = NULL;
+		}
+	}
+}
+
+/**
+ * smpboot_register_percpu_thread - Register a per_cpu thread related to hotplug
+ * @plug_thread:	Hotplug thread descriptor
+ *
+ * Creates and starts the threads on all online cpus.
+ */
+int smpboot_register_percpu_thread(struct smp_hotplug_thread *plug_thread)
+{
+	unsigned int cpu;
+	int ret = 0;
+
+	mutex_lock(&smpboot_threads_lock);
+	for_each_online_cpu(cpu) {
+		ret = __smpboot_create_thread(plug_thread, cpu);
+		if (ret) {
+			smpboot_destroy_threads(plug_thread);
+			goto out;
+		}
+		smpboot_unpark_thread(plug_thread, cpu);
+	}
+	list_add(&plug_thread->list, &hotplug_threads);
+out:
+	mutex_unlock(&smpboot_threads_lock);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(smpboot_register_percpu_thread);
+
+/**
+ * smpboot_unregister_percpu_thread - Unregister a per_cpu thread related to hotplug
+ * @plug_thread:	Hotplug thread descriptor
+ *
+ * Stops all threads on all possible cpus.
+ */
+void smpboot_unregister_percpu_thread(struct smp_hotplug_thread *plug_thread)
+{
+	get_online_cpus();
+	mutex_lock(&smpboot_threads_lock);
+	list_del(&plug_thread->list);
+	smpboot_destroy_threads(plug_thread);
+	mutex_unlock(&smpboot_threads_lock);
+	put_online_cpus();
+}
+EXPORT_SYMBOL_GPL(smpboot_unregister_percpu_thread);
Index: tip/kernel/smpboot.h
===================================================================
--- tip.orig/kernel/smpboot.h
+++ tip/kernel/smpboot.h
@@ -13,4 +13,8 @@ static inline void idle_thread_set_boot_
 static inline void idle_threads_init(void) { }
 #endif
 
+int smpboot_create_threads(unsigned int cpu);
+void smpboot_park_threads(unsigned int cpu);
+void smpboot_unpark_threads(unsigned int cpu);
+
 #endif




* [RFC patch 1/5] kthread: Implement park/unpark facility
  2012-06-13 11:00 [RFC patch 0/5] Per cpu thread hotplug infrastructure Thomas Gleixner
  2012-06-13 11:00 ` [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads Thomas Gleixner
@ 2012-06-13 11:00 ` Thomas Gleixner
  2012-06-14  2:32   ` Namhyung Kim
                     ` (3 more replies)
  2012-06-13 11:00 ` [RFC patch 3/5] softirq: Use hotplug thread infrastructure Thomas Gleixner
                   ` (2 subsequent siblings)
  4 siblings, 4 replies; 53+ messages in thread
From: Thomas Gleixner @ 2012-06-13 11:00 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Srivatsa S. Bhat, Rusty Russell,
	Paul E. McKenney, Tejun Heo

[-- Attachment #1: kthread-implement-park-and-percpu.patch --]
[-- Type: text/plain, Size: 8515 bytes --]

To avoid the full teardown/setup of per cpu kthreads in the case of
cpu hot(un)plug, provide a facility which allows putting the kthread
into a park position and unparking it when the cpu comes online again.
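
For illustration, the resulting lifecycle looks roughly like this
(a sketch, assuming a thread created with kthread_create_on_cpu()):

	tsk = kthread_create_on_cpu(thread_fn, data, cpu, "comm/%u");
	/* created in parked state; unpark to let thread_fn run */
	kthread_unpark(tsk);

	/* cpu goes offline: the thread reaches kthread_parkme() and sleeps */
	kthread_park(tsk);

	/* cpu comes online again: rebind (if percpu) and wake */
	kthread_unpark(tsk);

The thread function itself polls kthread_should_park() and calls
kthread_parkme() when parking has been requested.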

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/kthread.h |   11 ++-
 kernel/kthread.c        |  165 ++++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 162 insertions(+), 14 deletions(-)

Index: tip/include/linux/kthread.h
===================================================================
--- tip.orig/include/linux/kthread.h
+++ tip/include/linux/kthread.h
@@ -14,6 +14,11 @@ struct task_struct *kthread_create_on_no
 	kthread_create_on_node(threadfn, data, -1, namefmt, ##arg)
 
 
+struct task_struct *kthread_create_on_cpu(int (*threadfn)(void *data),
+					  void *data,
+					  unsigned int cpu,
+					  const char *namefmt);
+
 /**
  * kthread_run - create and wake a thread.
  * @threadfn: the function to run until signal_pending(current).
@@ -34,9 +39,13 @@ struct task_struct *kthread_create_on_no
 
 void kthread_bind(struct task_struct *k, unsigned int cpu);
 int kthread_stop(struct task_struct *k);
-int kthread_should_stop(void);
+bool kthread_should_stop(void);
+bool kthread_should_park(void);
 bool kthread_freezable_should_stop(bool *was_frozen);
 void *kthread_data(struct task_struct *k);
+int kthread_park(struct task_struct *k);
+void kthread_unpark(struct task_struct *k);
+void kthread_parkme(void);
 
 int kthreadd(void *unused);
 extern struct task_struct *kthreadd_task;
Index: tip/kernel/kthread.c
===================================================================
--- tip.orig/kernel/kthread.c
+++ tip/kernel/kthread.c
@@ -37,11 +37,20 @@ struct kthread_create_info
 };
 
 struct kthread {
-	int should_stop;
+	unsigned long flags;
+	unsigned int cpu;
 	void *data;
+	struct completion parked;
 	struct completion exited;
 };
 
+enum KTHREAD_BITS {
+	KTHREAD_IS_PER_CPU = 0,
+	KTHREAD_SHOULD_STOP,
+	KTHREAD_SHOULD_PARK,
+	KTHREAD_IS_PARKED,
+};
+
 #define to_kthread(tsk)	\
 	container_of((tsk)->vfork_done, struct kthread, exited)
 
@@ -52,13 +61,29 @@ struct kthread {
  * and this will return true.  You should then return, and your return
  * value will be passed through to kthread_stop().
  */
-int kthread_should_stop(void)
+bool kthread_should_stop(void)
 {
-	return to_kthread(current)->should_stop;
+	return test_bit(KTHREAD_SHOULD_STOP, &to_kthread(current)->flags);
 }
 EXPORT_SYMBOL(kthread_should_stop);
 
 /**
+ * kthread_should_park - should this kthread return now?
+ *
+ * When someone calls kthread_park() on your kthread, it will be woken
+ * and this will return true.  You should then return, and your return
+ * value will be passed through to kthread_park().
+ *
+ * Similar to kthread_should_stop(), but this keeps the thread alive
+ * and in a park position. kthread_unpark() "restarts" the thread and
+ * calls the thread function again.
+ */
+bool kthread_should_park(void)
+{
+	return test_bit(KTHREAD_SHOULD_PARK, &to_kthread(current)->flags);
+}
+
+/**
  * kthread_freezable_should_stop - should this freezable kthread return now?
  * @was_frozen: optional out parameter, indicates whether %current was frozen
  *
@@ -96,6 +121,23 @@ void *kthread_data(struct task_struct *t
 	return to_kthread(task)->data;
 }
 
+static void __kthread_parkme(struct kthread *self)
+{
+	__set_current_state(TASK_INTERRUPTIBLE);
+	while (test_bit(KTHREAD_SHOULD_PARK, &self->flags)) {
+		if (!test_and_set_bit(KTHREAD_IS_PARKED, &self->flags))
+			complete(&self->parked);
+		schedule();
+	}
+	clear_bit(KTHREAD_IS_PARKED, &self->flags);
+	__set_current_state(TASK_RUNNING);
+}
+
+void kthread_parkme(void)
+{
+	__kthread_parkme(to_kthread(current));
+}
+
 static int kthread(void *_create)
 {
 	/* Copy data: it's on kthread's stack */
@@ -105,9 +147,10 @@ static int kthread(void *_create)
 	struct kthread self;
 	int ret;
 
-	self.should_stop = 0;
+	self.flags = 0;
 	self.data = data;
 	init_completion(&self.exited);
+	init_completion(&self.parked);
 	current->vfork_done = &self.exited;
 
 	/* OK, tell user we're spawned, wait for stop or wakeup */
@@ -117,9 +160,11 @@ static int kthread(void *_create)
 	schedule();
 
 	ret = -EINTR;
-	if (!self.should_stop)
-		ret = threadfn(data);
 
+	if (!test_bit(KTHREAD_SHOULD_STOP, &self.flags)) {
+		__kthread_parkme(&self);
+		ret = threadfn(data);
+	}
 	/* we can't just return, we must preserve "self" on stack */
 	do_exit(ret);
 }
@@ -172,8 +217,7 @@ static void create_kthread(struct kthrea
  * Returns a task_struct or ERR_PTR(-ENOMEM).
  */
 struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
-					   void *data,
-					   int node,
+					   void *data, int node,
 					   const char namefmt[],
 					   ...)
 {
@@ -210,6 +254,13 @@ struct task_struct *kthread_create_on_no
 }
 EXPORT_SYMBOL(kthread_create_on_node);
 
+static void __kthread_bind(struct task_struct *p, unsigned int cpu)
+{
+	/* It's safe because the task is inactive. */
+	do_set_cpus_allowed(p, cpumask_of(cpu));
+	p->flags |= PF_THREAD_BOUND;
+}
+
 /**
  * kthread_bind - bind a just-created kthread to a cpu.
  * @p: thread created by kthread_create().
@@ -226,14 +277,102 @@ void kthread_bind(struct task_struct *p,
 		WARN_ON(1);
 		return;
 	}
-
-	/* It's safe because the task is inactive. */
-	do_set_cpus_allowed(p, cpumask_of(cpu));
-	p->flags |= PF_THREAD_BOUND;
+	__kthread_bind(p, cpu);
 }
 EXPORT_SYMBOL(kthread_bind);
 
 /**
+ * kthread_create_on_cpu - Create a cpu bound kthread
+ * @threadfn: the function to run until signal_pending(current).
+ * @data: data ptr for @threadfn.
+ * @cpu: The cpu on which the thread should be bound.
+ * @namefmt: printf-style name for the thread.
+ *
+ * Description: This helper function creates and names a kernel thread
+ * and binds it to a given CPU. The thread will be woken and put into
+ * park mode.
+ */
+struct task_struct *kthread_create_on_cpu(int (*threadfn)(void *data),
+					  void *data, unsigned int cpu,
+					  const char *namefmt)
+{
+	struct task_struct *p;
+
+	p = kthread_create_on_node(threadfn, data, cpu_to_node(cpu), namefmt,
+				   cpu);
+	if (IS_ERR(p))
+		return p;
+	/* Park the thread, mark it percpu and then bind it */
+	kthread_park(p);
+	set_bit(KTHREAD_IS_PER_CPU, &to_kthread(p)->flags);
+	to_kthread(p)->cpu = cpu;
+	__kthread_bind(p, cpu);
+	return p;
+}
+
+/**
+ * kthread_unpark - unpark a thread created by kthread_create().
+ * @k:		thread created by kthread_create().
+ *
+ * Sets kthread_should_park() for @k to return false, wakes it, and
+ * waits for it to return. If the thread is marked percpu then it is
+ * bound to the cpu again.
+ */
+void kthread_unpark(struct task_struct *k)
+{
+	struct kthread *kthread;
+
+	get_task_struct(k);
+
+	kthread = to_kthread(k);
+	barrier(); /* it might have exited */
+	if (k->vfork_done != NULL) {
+		clear_bit(KTHREAD_SHOULD_PARK, &kthread->flags);
+		if (test_bit(KTHREAD_IS_PARKED, &kthread->flags)) {
+			if (test_bit(KTHREAD_IS_PER_CPU, &kthread->flags))
+				__kthread_bind(k, kthread->cpu);
+			wake_up_process(k);
+		}
+	}
+	put_task_struct(k);
+}
+
+/**
+ * kthread_park - park a thread created by kthread_create().
+ * @k: thread created by kthread_create().
+ *
+ * Sets kthread_should_park() for @k to return true, wakes it, and
+ * waits for it to return. This can also be called after kthread_create()
+ * instead of calling wake_up_process(): the thread will park without
+ * calling threadfn().
+ *
+ * Returns 0 if the thread is parked, -ENOSYS if the thread exited.
+ * If called by the kthread itself just the park bit is set.
+ */
+int kthread_park(struct task_struct *k)
+{
+	struct kthread *kthread;
+	int ret = -ENOSYS;
+
+	get_task_struct(k);
+
+	kthread = to_kthread(k);
+	barrier(); /* it might have exited */
+	if (k->vfork_done != NULL) {
+		if (!test_bit(KTHREAD_IS_PARKED, &kthread->flags)) {
+			set_bit(KTHREAD_SHOULD_PARK, &kthread->flags);
+			if (k != current) {
+				wake_up_process(k);
+				wait_for_completion(&kthread->parked);
+			}
+		}
+		ret = 0;
+	}
+	put_task_struct(k);
+	return ret;
+}
+
+/**
  * kthread_stop - stop a thread created by kthread_create().
  * @k: thread created by kthread_create().
  *
@@ -259,7 +398,7 @@ int kthread_stop(struct task_struct *k)
 	kthread = to_kthread(k);
 	barrier(); /* it might have exited */
 	if (k->vfork_done != NULL) {
-		kthread->should_stop = 1;
+		set_bit(KTHREAD_SHOULD_STOP, &kthread->flags);
 		wake_up_process(k);
 		wait_for_completion(&kthread->exited);
 	}




* [RFC patch 3/5] softirq: Use hotplug thread infrastructure
  2012-06-13 11:00 [RFC patch 0/5] Per cpu thread hotplug infrastructure Thomas Gleixner
  2012-06-13 11:00 ` [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads Thomas Gleixner
  2012-06-13 11:00 ` [RFC patch 1/5] kthread: Implement park/unpark facility Thomas Gleixner
@ 2012-06-13 11:00 ` Thomas Gleixner
  2012-06-13 11:00 ` [RFC patch 5/5] infiniband: ehca: " Thomas Gleixner
  2012-06-13 11:00 ` [RFC patch 4/5] watchdog: " Thomas Gleixner
  4 siblings, 0 replies; 53+ messages in thread
From: Thomas Gleixner @ 2012-06-13 11:00 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Srivatsa S. Bhat, Rusty Russell,
	Paul E. McKenney, Tejun Heo

[-- Attachment #1: softirq-use-smboot-threads.patch --]
[-- Type: text/plain, Size: 3945 bytes --]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/softirq.c |   84 +++++++++++--------------------------------------------
 1 file changed, 18 insertions(+), 66 deletions(-)

Index: tip/kernel/softirq.c
===================================================================
--- tip.orig/kernel/softirq.c
+++ tip/kernel/softirq.c
@@ -23,6 +23,7 @@
 #include <linux/rcupdate.h>
 #include <linux/ftrace.h>
 #include <linux/smp.h>
+#include <linux/smpboot.h>
 #include <linux/tick.h>
 
 #define CREATE_TRACE_POINTS
@@ -733,24 +734,17 @@ void __init softirq_init(void)
 	open_softirq(HI_SOFTIRQ, tasklet_hi_action);
 }
 
-static int run_ksoftirqd(void * __bind_cpu)
+static int run_ksoftirqd(void *td)
 {
-	set_current_state(TASK_INTERRUPTIBLE);
-
-	while (!kthread_should_stop()) {
+	while (!smpboot_thread_check_parking(td)) {
+		set_current_state(TASK_INTERRUPTIBLE);
 		preempt_disable();
-		if (!local_softirq_pending()) {
+		if (!local_softirq_pending())
 			schedule_preempt_disabled();
-		}
 
 		__set_current_state(TASK_RUNNING);
 
 		while (local_softirq_pending()) {
-			/* Preempt disable stops cpu going offline.
-			   If already offline, we'll be on wrong CPU:
-			   don't process */
-			if (cpu_is_offline((long)__bind_cpu))
-				goto wait_to_die;
 			local_irq_disable();
 			if (local_softirq_pending())
 				__do_softirq();
@@ -758,23 +752,10 @@ static int run_ksoftirqd(void * __bind_c
 			sched_preempt_enable_no_resched();
 			cond_resched();
 			preempt_disable();
-			rcu_note_context_switch((long)__bind_cpu);
+			rcu_note_context_switch(smp_processor_id());
 		}
 		preempt_enable();
-		set_current_state(TASK_INTERRUPTIBLE);
 	}
-	__set_current_state(TASK_RUNNING);
-	return 0;
-
-wait_to_die:
-	preempt_enable();
-	/* Wait for kthread_stop */
-	set_current_state(TASK_INTERRUPTIBLE);
-	while (!kthread_should_stop()) {
-		schedule();
-		set_current_state(TASK_INTERRUPTIBLE);
-	}
-	__set_current_state(TASK_RUNNING);
 	return 0;
 }
 
@@ -841,50 +822,17 @@ static int __cpuinit cpu_callback(struct
 				  unsigned long action,
 				  void *hcpu)
 {
-	int hotcpu = (unsigned long)hcpu;
-	struct task_struct *p;
-
 	switch (action) {
-	case CPU_UP_PREPARE:
-	case CPU_UP_PREPARE_FROZEN:
-		p = kthread_create_on_node(run_ksoftirqd,
-					   hcpu,
-					   cpu_to_node(hotcpu),
-					   "ksoftirqd/%d", hotcpu);
-		if (IS_ERR(p)) {
-			printk("ksoftirqd for %i failed\n", hotcpu);
-			return notifier_from_errno(PTR_ERR(p));
-		}
-		kthread_bind(p, hotcpu);
-  		per_cpu(ksoftirqd, hotcpu) = p;
- 		break;
-	case CPU_ONLINE:
-	case CPU_ONLINE_FROZEN:
-		wake_up_process(per_cpu(ksoftirqd, hotcpu));
-		break;
 #ifdef CONFIG_HOTPLUG_CPU
-	case CPU_UP_CANCELED:
-	case CPU_UP_CANCELED_FROZEN:
-		if (!per_cpu(ksoftirqd, hotcpu))
-			break;
-		/* Unbind so it can run.  Fall thru. */
-		kthread_bind(per_cpu(ksoftirqd, hotcpu),
-			     cpumask_any(cpu_online_mask));
 	case CPU_DEAD:
 	case CPU_DEAD_FROZEN: {
-		static const struct sched_param param = {
-			.sched_priority = MAX_RT_PRIO-1
-		};
-
-		p = per_cpu(ksoftirqd, hotcpu);
-		per_cpu(ksoftirqd, hotcpu) = NULL;
-		sched_setscheduler_nocheck(p, SCHED_FIFO, &param);
-		kthread_stop(p);
+		int hotcpu = (unsigned long)hcpu;
+
 		takeover_tasklets(hotcpu);
 		break;
 	}
 #endif /* CONFIG_HOTPLUG_CPU */
- 	}
+	}
 	return NOTIFY_OK;
 }
 
@@ -892,14 +840,18 @@ static struct notifier_block __cpuinitda
 	.notifier_call = cpu_callback
 };
 
+static struct smp_hotplug_thread softirq_threads = {
+	.store		= &ksoftirqd,
+	.thread_fn	= run_ksoftirqd,
+	.thread_comm	= "ksoftirqd/%u",
+};
+
 static __init int spawn_ksoftirqd(void)
 {
-	void *cpu = (void *)(long)smp_processor_id();
-	int err = cpu_callback(&cpu_nfb, CPU_UP_PREPARE, cpu);
-
-	BUG_ON(err != NOTIFY_OK);
-	cpu_callback(&cpu_nfb, CPU_ONLINE, cpu);
 	register_cpu_notifier(&cpu_nfb);
+
+	BUG_ON(smpboot_register_percpu_thread(&softirq_threads));
+
 	return 0;
 }
 early_initcall(spawn_ksoftirqd);




* [RFC patch 4/5] watchdog: Use hotplug thread infrastructure
  2012-06-13 11:00 [RFC patch 0/5] Per cpu thread hotplug infrastructure Thomas Gleixner
                   ` (3 preceding siblings ...)
  2012-06-13 11:00 ` [RFC patch 5/5] infiniband: ehca: " Thomas Gleixner
@ 2012-06-13 11:00 ` Thomas Gleixner
  4 siblings, 0 replies; 53+ messages in thread
From: Thomas Gleixner @ 2012-06-13 11:00 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Srivatsa S. Bhat, Rusty Russell,
	Paul E. McKenney, Tejun Heo

[-- Attachment #1: watchdog-use-smb-kthreads.patch --]
[-- Type: text/plain, Size: 9022 bytes --]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/watchdog.c |  232 ++++++++++++++++--------------------------------------
 1 file changed, 69 insertions(+), 163 deletions(-)

Index: tip/kernel/watchdog.c
===================================================================
--- tip.orig/kernel/watchdog.c
+++ tip/kernel/watchdog.c
@@ -22,6 +22,7 @@
 #include <linux/notifier.h>
 #include <linux/module.h>
 #include <linux/sysctl.h>
+#include <linux/smpboot.h>
 
 #include <asm/irq_regs.h>
 #include <linux/kvm_para.h>
@@ -29,6 +30,7 @@
 
 int watchdog_enabled = 1;
 int __read_mostly watchdog_thresh = 10;
+static int __read_mostly watchdog_disabled;
 
 static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts);
 static DEFINE_PER_CPU(struct task_struct *, softlockup_watchdog);
@@ -256,6 +258,9 @@ static void watchdog_interrupt_count(voi
 static inline void watchdog_interrupt_count(void) { return; }
 #endif /* CONFIG_HARDLOCKUP_DETECTOR */
 
+static int watchdog_nmi_enable(unsigned int cpu);
+static void watchdog_nmi_disable(unsigned int cpu);
+
 /* watchdog kicker functions */
 static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 {
@@ -327,23 +332,46 @@ static enum hrtimer_restart watchdog_tim
 	return HRTIMER_RESTART;
 }
 
-
-/*
- * The watchdog thread - touches the timestamp.
- */
-static int watchdog(void *unused)
+static void watchdog_enable(unsigned int cpu)
 {
-	struct sched_param param = { .sched_priority = 0 };
 	struct hrtimer *hrtimer = &__raw_get_cpu_var(watchdog_hrtimer);
 
-	/* initialize timestamp */
-	__touch_watchdog();
+	/* Enable the perf event */
+	watchdog_nmi_enable(cpu);
 
 	/* kick off the timer for the hardlockup detector */
+	hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	hrtimer->function = watchdog_timer_fn;
+
 	/* done here because hrtimer_start can only pin to smp_processor_id() */
 	hrtimer_start(hrtimer, ns_to_ktime(get_sample_period()),
 		      HRTIMER_MODE_REL_PINNED);
 
+	/* initialize timestamp */
+	__touch_watchdog();
+}
+
+static void watchdog_disable(unsigned int cpu)
+{
+	struct hrtimer *hrtimer = &__raw_get_cpu_var(watchdog_hrtimer);
+
+	hrtimer_cancel(hrtimer);
+	/* disable the perf event */
+	watchdog_nmi_disable(cpu);
+}
+
+/*
+ * The watchdog thread - touches the timestamp.
+ */
+static int watchdog(void *td)
+{
+	struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };
+
+	if (!watchdog_enabled)
+		kthread_park(current);
+
+	sched_setscheduler(current, SCHED_FIFO, &param);
+
 	set_current_state(TASK_INTERRUPTIBLE);
 	/*
 	 * Run briefly (kicked by the hrtimer callback function) once every
@@ -352,27 +380,16 @@ static int watchdog(void *unused)
 	 * 2*watchdog_thresh seconds then the debug-printout triggers in
 	 * watchdog_timer_fn().
 	 */
-	while (!kthread_should_stop()) {
+	while (!smpboot_thread_check_parking(td)) {
+		set_current_state(TASK_INTERRUPTIBLE);
 		__touch_watchdog();
 		schedule();
-
-		if (kthread_should_stop())
-			break;
-
-		set_current_state(TASK_INTERRUPTIBLE);
 	}
-	/*
-	 * Drop the policy/priority elevation during thread exit to avoid a
-	 * scheduling latency spike.
-	 */
-	__set_current_state(TASK_RUNNING);
-	sched_setscheduler(current, SCHED_NORMAL, &param);
 	return 0;
 }
 
-
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
-static int watchdog_nmi_enable(int cpu)
+static int watchdog_nmi_enable(unsigned int cpu)
 {
 	struct perf_event_attr *wd_attr;
 	struct perf_event *event = per_cpu(watchdog_ev, cpu);
@@ -395,7 +412,6 @@ static int watchdog_nmi_enable(int cpu)
 		goto out_save;
 	}
 
-
 	/* vary the KERN level based on the returned errno */
 	if (PTR_ERR(event) == -EOPNOTSUPP)
 		pr_info("disabled (cpu%i): not supported (no LAPIC?)\n", cpu);
@@ -416,7 +432,7 @@ out:
 	return 0;
 }
 
-static void watchdog_nmi_disable(int cpu)
+static void watchdog_nmi_disable(unsigned int cpu)
 {
 	struct perf_event *event = per_cpu(watchdog_ev, cpu);
 
@@ -430,107 +446,35 @@ static void watchdog_nmi_disable(int cpu
 	return;
 }
 #else
-static int watchdog_nmi_enable(int cpu) { return 0; }
-static void watchdog_nmi_disable(int cpu) { return; }
+static int watchdog_nmi_enable(unsigned int cpu) { return 0; }
+static void watchdog_nmi_disable(unsigned int cpu) { return; }
 #endif /* CONFIG_HARDLOCKUP_DETECTOR */
 
 /* prepare/enable/disable routines */
-static void watchdog_prepare_cpu(int cpu)
-{
-	struct hrtimer *hrtimer = &per_cpu(watchdog_hrtimer, cpu);
-
-	WARN_ON(per_cpu(softlockup_watchdog, cpu));
-	hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
-	hrtimer->function = watchdog_timer_fn;
-}
-
-static int watchdog_enable(int cpu)
-{
-	struct task_struct *p = per_cpu(softlockup_watchdog, cpu);
-	int err = 0;
-
-	/* enable the perf event */
-	err = watchdog_nmi_enable(cpu);
-
-	/* Regardless of err above, fall through and start softlockup */
-
-	/* create the watchdog thread */
-	if (!p) {
-		struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
-		p = kthread_create_on_node(watchdog, NULL, cpu_to_node(cpu), "watchdog/%d", cpu);
-		if (IS_ERR(p)) {
-			pr_err("softlockup watchdog for %i failed\n", cpu);
-			if (!err) {
-				/* if hardlockup hasn't already set this */
-				err = PTR_ERR(p);
-				/* and disable the perf event */
-				watchdog_nmi_disable(cpu);
-			}
-			goto out;
-		}
-		sched_setscheduler(p, SCHED_FIFO, &param);
-		kthread_bind(p, cpu);
-		per_cpu(watchdog_touch_ts, cpu) = 0;
-		per_cpu(softlockup_watchdog, cpu) = p;
-		wake_up_process(p);
-	}
-
-out:
-	return err;
-}
-
-static void watchdog_disable(int cpu)
-{
-	struct task_struct *p = per_cpu(softlockup_watchdog, cpu);
-	struct hrtimer *hrtimer = &per_cpu(watchdog_hrtimer, cpu);
-
-	/*
-	 * cancel the timer first to stop incrementing the stats
-	 * and waking up the kthread
-	 */
-	hrtimer_cancel(hrtimer);
-
-	/* disable the perf event */
-	watchdog_nmi_disable(cpu);
-
-	/* stop the watchdog thread */
-	if (p) {
-		per_cpu(softlockup_watchdog, cpu) = NULL;
-		kthread_stop(p);
-	}
-}
-
 /* sysctl functions */
 #ifdef CONFIG_SYSCTL
 static void watchdog_enable_all_cpus(void)
 {
-	int cpu;
-
-	watchdog_enabled = 0;
-
-	for_each_online_cpu(cpu)
-		if (!watchdog_enable(cpu))
-			/* if any cpu succeeds, watchdog is considered
-			   enabled for the system */
-			watchdog_enabled = 1;
-
-	if (!watchdog_enabled)
-		pr_err("failed to be enabled on some cpus\n");
+	unsigned int cpu;
 
+	if (watchdog_disabled) {
+		watchdog_disabled = 0;
+		for_each_online_cpu(cpu)
+			kthread_unpark(per_cpu(softlockup_watchdog, cpu));
+	}
 }
 
 static void watchdog_disable_all_cpus(void)
 {
-	int cpu;
+	unsigned int cpu;
 
-	for_each_online_cpu(cpu)
-		watchdog_disable(cpu);
-
-	/* if all watchdogs are disabled, then they are disabled for the system */
-	watchdog_enabled = 0;
+	if (!watchdog_disabled) {
+		watchdog_disabled = 1;
+		for_each_online_cpu(cpu)
+			kthread_park(per_cpu(softlockup_watchdog, cpu));
+	}
 }
 
-
 /*
  * proc handler for /proc/sys/kernel/nmi_watchdog,watchdog_thresh
  */
@@ -540,73 +484,35 @@ int proc_dowatchdog(struct ctl_table *ta
 {
 	int ret;
 
+	if (watchdog_disabled < 0)
+		return -ENODEV;
+
 	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
 	if (ret || !write)
-		goto out;
+		return ret;
 
 	if (watchdog_enabled && watchdog_thresh)
 		watchdog_enable_all_cpus();
 	else
 		watchdog_disable_all_cpus();
 
-out:
 	return ret;
 }
 #endif /* CONFIG_SYSCTL */
 
-
-/*
- * Create/destroy watchdog threads as CPUs come and go:
- */
-static int __cpuinit
-cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
-{
-	int hotcpu = (unsigned long)hcpu;
-
-	switch (action) {
-	case CPU_UP_PREPARE:
-	case CPU_UP_PREPARE_FROZEN:
-		watchdog_prepare_cpu(hotcpu);
-		break;
-	case CPU_ONLINE:
-	case CPU_ONLINE_FROZEN:
-		if (watchdog_enabled)
-			watchdog_enable(hotcpu);
-		break;
-#ifdef CONFIG_HOTPLUG_CPU
-	case CPU_UP_CANCELED:
-	case CPU_UP_CANCELED_FROZEN:
-		watchdog_disable(hotcpu);
-		break;
-	case CPU_DEAD:
-	case CPU_DEAD_FROZEN:
-		watchdog_disable(hotcpu);
-		break;
-#endif /* CONFIG_HOTPLUG_CPU */
-	}
-
-	/*
-	 * hardlockup and softlockup are not important enough
-	 * to block cpu bring up.  Just always succeed and
-	 * rely on printk output to flag problems.
-	 */
-	return NOTIFY_OK;
-}
-
-static struct notifier_block __cpuinitdata cpu_nfb = {
-	.notifier_call = cpu_callback
+static struct smp_hotplug_thread watchdog_threads = {
+	.store		= &softlockup_watchdog,
+	.thread_fn	= watchdog,
+	.thread_comm	= "watchdog/%u",
+	.setup		= watchdog_enable,
+	.park		= watchdog_disable,
+	.unpark		= watchdog_enable,
 };
 
 void __init lockup_detector_init(void)
 {
-	void *cpu = (void *)(long)smp_processor_id();
-	int err;
-
-	err = cpu_callback(&cpu_nfb, CPU_UP_PREPARE, cpu);
-	WARN_ON(notifier_to_errno(err));
-
-	cpu_callback(&cpu_nfb, CPU_ONLINE, cpu);
-	register_cpu_notifier(&cpu_nfb);
-
-	return;
+	if (smpboot_register_percpu_thread(&watchdog_threads)) {
+		pr_err("Failed to create watchdog threads, disabled\n");
+		watchdog_disabled = -ENODEV;
+	}
 }




* [RFC patch 5/5] infiniband: ehca: Use hotplug thread infrastructure
  2012-06-13 11:00 [RFC patch 0/5] Per cpu thread hotplug infrastructure Thomas Gleixner
                   ` (2 preceding siblings ...)
  2012-06-13 11:00 ` [RFC patch 3/5] softirq: Use hotplug thread infrastructure Thomas Gleixner
@ 2012-06-13 11:00 ` Thomas Gleixner
  2012-06-18  6:30   ` Rusty Russell
  2012-06-13 11:00 ` [RFC patch 4/5] watchdog: " Thomas Gleixner
  4 siblings, 1 reply; 53+ messages in thread
From: Thomas Gleixner @ 2012-06-13 11:00 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Srivatsa S. Bhat, Rusty Russell,
	Paul E. McKenney, Tejun Heo

[-- Attachment #1: infiniband-ehca-use-smp-threads.patch --]
[-- Type: text/plain, Size: 9759 bytes --]

Get rid of the hotplug notifiers and use the generic hotplug thread
infrastructure.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 drivers/infiniband/hw/ehca/ehca_irq.c |  240 ++++++++++++----------------------
 drivers/infiniband/hw/ehca/ehca_irq.h |    5 
 2 files changed, 92 insertions(+), 153 deletions(-)

Index: tip/drivers/infiniband/hw/ehca/ehca_irq.c
===================================================================
--- tip.orig/drivers/infiniband/hw/ehca/ehca_irq.c
+++ tip/drivers/infiniband/hw/ehca/ehca_irq.c
@@ -42,6 +42,7 @@
  */
 
 #include <linux/slab.h>
+#include <linux/smpboot.h>
 
 #include "ehca_classes.h"
 #include "ehca_irq.h"
@@ -652,7 +653,7 @@ void ehca_tasklet_eq(unsigned long data)
 	ehca_process_eq((struct ehca_shca*)data, 1);
 }
 
-static inline int find_next_online_cpu(struct ehca_comp_pool *pool)
+static int find_next_online_cpu(struct ehca_comp_pool *pool)
 {
 	int cpu;
 	unsigned long flags;
@@ -662,10 +663,15 @@ static inline int find_next_online_cpu(s
 		ehca_dmp(cpu_online_mask, cpumask_size(), "");
 
 	spin_lock_irqsave(&pool->last_cpu_lock, flags);
-	cpu = cpumask_next(pool->last_cpu, cpu_online_mask);
-	if (cpu >= nr_cpu_ids)
-		cpu = cpumask_first(cpu_online_mask);
-	pool->last_cpu = cpu;
+	while (1) {
+		cpu = cpumask_next(pool->last_cpu, cpu_online_mask);
+		if (cpu >= nr_cpu_ids)
+			cpu = cpumask_first(cpu_online_mask);
+		pool->last_cpu = cpu;
+		/* Might be on the way out */
+		if (per_cpu_ptr(pool->cpu_comp_tasks, cpu)->active)
+			break;
+	}
 	spin_unlock_irqrestore(&pool->last_cpu_lock, flags);
 
 	return cpu;
@@ -719,19 +725,18 @@ static void queue_comp_task(struct ehca_
 static void run_comp_task(struct ehca_cpu_comp_task *cct)
 {
 	struct ehca_cq *cq;
-	unsigned long flags;
 
-	spin_lock_irqsave(&cct->task_lock, flags);
+	spin_lock_irq(&cct->task_lock);
 
 	while (!list_empty(&cct->cq_list)) {
 		cq = list_entry(cct->cq_list.next, struct ehca_cq, entry);
-		spin_unlock_irqrestore(&cct->task_lock, flags);
+		spin_unlock_irq(&cct->task_lock);
 
 		comp_event_callback(cq);
 		if (atomic_dec_and_test(&cq->nr_events))
 			wake_up(&cq->wait_completion);
 
-		spin_lock_irqsave(&cct->task_lock, flags);
+		spin_lock_irq(&cct->task_lock);
 		spin_lock(&cq->task_lock);
 		cq->nr_callbacks--;
 		if (!cq->nr_callbacks) {
@@ -741,17 +746,51 @@ static void run_comp_task(struct ehca_cp
 		spin_unlock(&cq->task_lock);
 	}
 
-	spin_unlock_irqrestore(&cct->task_lock, flags);
+	spin_unlock_irq(&cct->task_lock);
 }
 
-static int comp_task(void *__cct)
+static void comp_task_park(unsigned int cpu)
 {
-	struct ehca_cpu_comp_task *cct = __cct;
-	int cql_empty;
+	struct ehca_cpu_comp_task *cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu);
+	struct ehca_cpu_comp_task *target;
+	struct ehca_cq *cq, *tmp;
+	LIST_HEAD(list);
+
+	spin_lock_irq(&cct->task_lock);
+	cct->cq_jobs = 0;
+	cct->active = 0;
+	list_splice_init(&cct->cq_list, &list);
+	spin_unlock_irq(&cct->task_lock);
+
+	cpu = find_next_online_cpu(pool);
+	target = per_cpu_ptr(pool->cpu_comp_tasks, cpu);
+	spin_lock_irq(&target->task_lock);
+	list_for_each_entry_safe(cq, tmp, &list, entry) {
+		list_del(&cq->entry);
+		__queue_comp_task(cq, target);
+	}
+	spin_unlock_irq(&target->task_lock);
+}
+
+static void comp_task_stop(unsigned int cpu, bool online)
+{
+	struct ehca_cpu_comp_task *cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu);
+
+	spin_lock_irq(&cct->task_lock);
+	cct->cq_jobs = 0;
+	cct->active = 0;
+	WARN_ON(!list_empty(&cct->cq_list));
+	spin_unlock_irq(&cct->task_lock);
+}
+
+static int comp_task(void *td)
+{
+	struct ehca_cpu_comp_task *cct = this_cpu_ptr(pool->cpu_comp_tasks);
 	DECLARE_WAITQUEUE(wait, current);
+	int cql_empty;
 
-	set_current_state(TASK_INTERRUPTIBLE);
-	while (!kthread_should_stop()) {
+	while (!smpboot_thread_check_parking(td)) {
+		set_current_state(TASK_INTERRUPTIBLE);
 		add_wait_queue(&cct->wait_queue, &wait);
 
 		spin_lock_irq(&cct->task_lock);
@@ -768,131 +807,21 @@ static int comp_task(void *__cct)
 		cql_empty = list_empty(&cct->cq_list);
 		spin_unlock_irq(&cct->task_lock);
 		if (!cql_empty)
-			run_comp_task(__cct);
-
-		set_current_state(TASK_INTERRUPTIBLE);
+			run_comp_task(cct);
 	}
-	__set_current_state(TASK_RUNNING);
-
 	return 0;
 }
 
-static struct task_struct *create_comp_task(struct ehca_comp_pool *pool,
-					    int cpu)
-{
-	struct ehca_cpu_comp_task *cct;
-
-	cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu);
-	spin_lock_init(&cct->task_lock);
-	INIT_LIST_HEAD(&cct->cq_list);
-	init_waitqueue_head(&cct->wait_queue);
-	cct->task = kthread_create_on_node(comp_task, cct, cpu_to_node(cpu),
-					   "ehca_comp/%d", cpu);
-
-	return cct->task;
-}
-
-static void destroy_comp_task(struct ehca_comp_pool *pool,
-			      int cpu)
-{
-	struct ehca_cpu_comp_task *cct;
-	struct task_struct *task;
-	unsigned long flags_cct;
-
-	cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu);
-
-	spin_lock_irqsave(&cct->task_lock, flags_cct);
-
-	task = cct->task;
-	cct->task = NULL;
-	cct->cq_jobs = 0;
-
-	spin_unlock_irqrestore(&cct->task_lock, flags_cct);
-
-	if (task)
-		kthread_stop(task);
-}
-
-static void __cpuinit take_over_work(struct ehca_comp_pool *pool, int cpu)
-{
-	struct ehca_cpu_comp_task *cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu);
-	LIST_HEAD(list);
-	struct ehca_cq *cq;
-	unsigned long flags_cct;
-
-	spin_lock_irqsave(&cct->task_lock, flags_cct);
-
-	list_splice_init(&cct->cq_list, &list);
-
-	while (!list_empty(&list)) {
-		cq = list_entry(cct->cq_list.next, struct ehca_cq, entry);
-
-		list_del(&cq->entry);
-		__queue_comp_task(cq, this_cpu_ptr(pool->cpu_comp_tasks));
-	}
-
-	spin_unlock_irqrestore(&cct->task_lock, flags_cct);
-
-}
-
-static int __cpuinit comp_pool_callback(struct notifier_block *nfb,
-					unsigned long action,
-					void *hcpu)
-{
-	unsigned int cpu = (unsigned long)hcpu;
-	struct ehca_cpu_comp_task *cct;
-
-	switch (action) {
-	case CPU_UP_PREPARE:
-	case CPU_UP_PREPARE_FROZEN:
-		ehca_gen_dbg("CPU: %x (CPU_PREPARE)", cpu);
-		if (!create_comp_task(pool, cpu)) {
-			ehca_gen_err("Can't create comp_task for cpu: %x", cpu);
-			return notifier_from_errno(-ENOMEM);
-		}
-		break;
-	case CPU_UP_CANCELED:
-	case CPU_UP_CANCELED_FROZEN:
-		ehca_gen_dbg("CPU: %x (CPU_CANCELED)", cpu);
-		cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu);
-		kthread_bind(cct->task, cpumask_any(cpu_online_mask));
-		destroy_comp_task(pool, cpu);
-		break;
-	case CPU_ONLINE:
-	case CPU_ONLINE_FROZEN:
-		ehca_gen_dbg("CPU: %x (CPU_ONLINE)", cpu);
-		cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu);
-		kthread_bind(cct->task, cpu);
-		wake_up_process(cct->task);
-		break;
-	case CPU_DOWN_PREPARE:
-	case CPU_DOWN_PREPARE_FROZEN:
-		ehca_gen_dbg("CPU: %x (CPU_DOWN_PREPARE)", cpu);
-		break;
-	case CPU_DOWN_FAILED:
-	case CPU_DOWN_FAILED_FROZEN:
-		ehca_gen_dbg("CPU: %x (CPU_DOWN_FAILED)", cpu);
-		break;
-	case CPU_DEAD:
-	case CPU_DEAD_FROZEN:
-		ehca_gen_dbg("CPU: %x (CPU_DEAD)", cpu);
-		destroy_comp_task(pool, cpu);
-		take_over_work(pool, cpu);
-		break;
-	}
-
-	return NOTIFY_OK;
-}
-
-static struct notifier_block comp_pool_callback_nb __cpuinitdata = {
-	.notifier_call	= comp_pool_callback,
-	.priority	= 0,
+static struct smp_hotplug_thread comp_pool_threads = {
+	.thread_fn	= comp_task,
+	.thread_comm	= "ehca_comp/%u",
+	.cleanup	= comp_task_stop,
+	.park		= comp_task_park,
 };
 
 int ehca_create_comp_pool(void)
 {
-	int cpu;
-	struct task_struct *task;
+	int cpu, ret = -ENOMEM;
 
 	if (!ehca_scaling_code)
 		return 0;
@@ -905,38 +834,47 @@ int ehca_create_comp_pool(void)
 	pool->last_cpu = cpumask_any(cpu_online_mask);
 
 	pool->cpu_comp_tasks = alloc_percpu(struct ehca_cpu_comp_task);
-	if (pool->cpu_comp_tasks == NULL) {
-		kfree(pool);
-		return -EINVAL;
-	}
+	if (!pool->cpu_comp_tasks)
+		goto out_pool;
 
-	for_each_online_cpu(cpu) {
-		task = create_comp_task(pool, cpu);
-		if (task) {
-			kthread_bind(task, cpu);
-			wake_up_process(task);
-		}
+	pool->cpu_comp_threads = alloc_percpu(struct task_struct *);
+	if (!pool->cpu_comp_threads)
+		goto out_tasks;
+
+	for_each_present_cpu(cpu) {
+		struct ehca_cpu_comp_task *cct;
+
+		cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu);
+		spin_lock_init(&cct->task_lock);
+		INIT_LIST_HEAD(&cct->cq_list);
+		init_waitqueue_head(&cct->wait_queue);
 	}
 
-	register_hotcpu_notifier(&comp_pool_callback_nb);
+	comp_pool_threads.store = pool->cpu_comp_threads;
+	ret = smpboot_register_percpu_thread(&comp_pool_threads);
+	if (ret)
+		goto out_threads;
 
-	printk(KERN_INFO "eHCA scaling code enabled\n");
+	pr_info("eHCA scaling code enabled\n");
+	return ret;
 
-	return 0;
+out_threads:
+	free_percpu(pool->cpu_comp_threads);
+out_tasks:
+	free_percpu(pool->cpu_comp_tasks);
+out_pool:
+	kfree(pool);
+	return ret;
 }
 
 void ehca_destroy_comp_pool(void)
 {
-	int i;
-
 	if (!ehca_scaling_code)
 		return;
 
-	unregister_hotcpu_notifier(&comp_pool_callback_nb);
-
-	for_each_online_cpu(i)
-		destroy_comp_task(pool, i);
+	smpboot_unregister_percpu_thread(&comp_pool_threads);
 
+	free_percpu(pool->cpu_comp_threads);
 	free_percpu(pool->cpu_comp_tasks);
 	kfree(pool);
 }
Index: tip/drivers/infiniband/hw/ehca/ehca_irq.h
===================================================================
--- tip.orig/drivers/infiniband/hw/ehca/ehca_irq.h
+++ tip/drivers/infiniband/hw/ehca/ehca_irq.h
@@ -60,13 +60,14 @@ void ehca_process_eq(struct ehca_shca *s
 struct ehca_cpu_comp_task {
 	wait_queue_head_t wait_queue;
 	struct list_head cq_list;
-	struct task_struct *task;
 	spinlock_t task_lock;
 	int cq_jobs;
+	int active;
 };
 
 struct ehca_comp_pool {
-	struct ehca_cpu_comp_task *cpu_comp_tasks;
+	struct ehca_cpu_comp_task __percpu *cpu_comp_tasks;
+	struct task_struct * __percpu *cpu_comp_threads;
 	int last_cpu;
 	spinlock_t last_cpu_lock;
 };




* Re: [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads
  2012-06-13 11:00 ` [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads Thomas Gleixner
@ 2012-06-13 18:33   ` Paul E. McKenney
  2012-06-13 18:56     ` Thomas Gleixner
  2012-06-13 18:57   ` Paul E. McKenney
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 53+ messages in thread
From: Paul E. McKenney @ 2012-06-13 18:33 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Srivatsa S. Bhat,
	Rusty Russell, Tejun Heo

On Wed, Jun 13, 2012 at 11:00:54AM -0000, Thomas Gleixner wrote:
> Provide a generic interface for setting up and tearing down percpu
> threads.
> 
> On registration the threads for already online cpus are created and
> started. On deregistration (modules) the threads are stopped.
> 
> During hotplug operations the threads are created, started, parked and
> unparked. The data structure for registration provides a pointer to
> percpu storage space and optional setup, cleanup, park, unpark
> functions. These functions are called when the thread state changes.
> 
> Thread functions should look like the following:
> 
> int thread_fn(void *cookie)
> {
> 	while (!smpboot_thread_check_parking(cookie)) {
> 		do_stuff();
> 	}
> 	return 0;
> }

Very cool!!!

So I am currently trying to apply this to RCU's per-CPU kthread.
I don't believe that I need to mess with RCU's per-rcu_node kthread
because it can just have its affinity adjusted when the first CPU
onlines and the last CPU offlines for the corresponding rcu_node.

One question below about the order of parking.

Also, I have not yet figured out how this avoids a parked thread waking
up while the CPU is offline, but I am probably still missing something.

							Thanx, Paul

> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  include/linux/smpboot.h |   40 +++++++++
>  kernel/cpu.c            |    6 +
>  kernel/smpboot.c        |  205 ++++++++++++++++++++++++++++++++++++++++++++++++
>  kernel/smpboot.h        |    4 
>  4 files changed, 255 insertions(+)
> 
> Index: tip/include/linux/smpboot.h
> ===================================================================
> --- /dev/null
> +++ tip/include/linux/smpboot.h
> @@ -0,0 +1,40 @@
> +#ifndef _LINUX_SMPBOOT_H
> +#define _LINUX_SMPBOOT_H
> +
> +#include <linux/types.h>
> +
> +struct task_struct;
> +/* Cookie handed to the thread_fn */
> +struct smpboot_thread_data;
> +
> +/**
> + * struct smp_hotplug_thread - CPU hotplug related thread descriptor
> + * @store:		Pointer to per cpu storage for the task pointers
> + * @list:		List head for core management
> + * @thread_fn:		The associated thread function
> + * @setup:		Optional setup function, called when the thread gets
> + *			operational the first time
> + * @cleanup:		Optional cleanup function, called when the thread
> + *			should stop (module exit)
> + * @park:		Optional park function, called when the thread is
> + *			parked (cpu offline)
> + * @unpark:		Optional unpark function, called when the thread is
> + *			unparked (cpu online)
> + * @thread_comm:	The base name of the thread
> + */
> +struct smp_hotplug_thread {
> +	struct task_struct __percpu	**store;
> +	struct list_head		list;
> +	int				(*thread_fn)(void *data);
> +	void				(*setup)(unsigned int cpu);
> +	void				(*cleanup)(unsigned int cpu, bool online);
> +	void				(*park)(unsigned int cpu);
> +	void				(*unpark)(unsigned int cpu);
> +	const char			*thread_comm;
> +};
> +
> +int smpboot_register_percpu_thread(struct smp_hotplug_thread *plug_thread);
> +void smpboot_unregister_percpu_thread(struct smp_hotplug_thread *plug_thread);
> +int smpboot_thread_check_parking(struct smpboot_thread_data *td);
> +
> +#endif
> Index: tip/kernel/cpu.c
> ===================================================================
> --- tip.orig/kernel/cpu.c
> +++ tip/kernel/cpu.c
> @@ -280,6 +280,7 @@ static int __ref _cpu_down(unsigned int 
>  				__func__, cpu);
>  		goto out_release;
>  	}
> +	smpboot_park_threads(cpu);
> 
>  	err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
>  	if (err) {
> @@ -354,6 +355,10 @@ static int __cpuinit _cpu_up(unsigned in
>  		goto out;
>  	}
> 
> +	ret = smpboot_create_threads(cpu);
> +	if (ret)
> +		goto out;
> +
>  	ret = __cpu_notify(CPU_UP_PREPARE | mod, hcpu, -1, &nr_calls);
>  	if (ret) {
>  		nr_calls--;
> @@ -370,6 +375,7 @@ static int __cpuinit _cpu_up(unsigned in
> 
>  	/* Now call notifier in preparation. */
>  	cpu_notify(CPU_ONLINE | mod, hcpu);
> +	smpboot_unpark_threads(cpu);
> 
>  out_notify:
>  	if (ret != 0)
> Index: tip/kernel/smpboot.c
> ===================================================================
> --- tip.orig/kernel/smpboot.c
> +++ tip/kernel/smpboot.c
> @@ -1,11 +1,17 @@
>  /*
>   * Common SMP CPU bringup/teardown functions
>   */
> +#include <linux/cpu.h>
>  #include <linux/err.h>
>  #include <linux/smp.h>
>  #include <linux/init.h>
> +#include <linux/list.h>
> +#include <linux/slab.h>
>  #include <linux/sched.h>
> +#include <linux/export.h>
>  #include <linux/percpu.h>
> +#include <linux/kthread.h>
> +#include <linux/smpboot.h>
> 
>  #include "smpboot.h"
> 
> @@ -65,3 +71,202 @@ void __init idle_threads_init(void)
>  	}
>  }
>  #endif
> +
> +static LIST_HEAD(hotplug_threads);
> +static DEFINE_MUTEX(smpboot_threads_lock);
> +
> +struct smpboot_thread_data {
> +	unsigned int			cpu;
> +	unsigned int			status;
> +	struct smp_hotplug_thread	*ht;
> +};
> +
> +enum {
> +	HP_THREAD_NONE = 0,
> +	HP_THREAD_ACTIVE,
> +	HP_THREAD_PARKED,
> +};
> +
> +/**
> + * smpboot_thread_check_parking - percpu hotplug thread loop function
> + * @td:		thread data pointer
> + *
> + * Checks for thread stop and park conditions. Calls the necessary
> + * setup, cleanup, park and unpark functions for the registered
> + * thread.
> + *
> + * Returns 1 when the thread should exit, 0 otherwise.
> + */
> +int smpboot_thread_check_parking(struct smpboot_thread_data *td)
> +{
> +	struct smp_hotplug_thread *ht = td->ht;
> +
> +again:
> +	if (kthread_should_stop()) {
> +		if (ht->cleanup)
> +			ht->cleanup(td->cpu, cpu_online(td->cpu));
> +		kfree(td);
> +		return 1;
> +	}
> +
> +	BUG_ON(td->cpu != smp_processor_id());
> +
> +	if (kthread_should_park()) {
> +		if (ht->park && td->status == HP_THREAD_ACTIVE) {
> +			ht->park(td->cpu);
> +			td->status = HP_THREAD_PARKED;
> +		}
> +		kthread_parkme();
> +		/* We might have been woken for stop */
> +		goto again;
> +	}
> +
> +	switch (td->status) {
> +	case HP_THREAD_NONE:
> +		if (ht->setup)
> +			ht->setup(td->cpu);
> +		td->status = HP_THREAD_ACTIVE;
> +		break;
> +	case HP_THREAD_PARKED:
> +		if (ht->unpark)
> +			ht->unpark(td->cpu);
> +		td->status = HP_THREAD_ACTIVE;
> +		break;
> +	}
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(smpboot_thread_check_parking);
> +
> +static int
> +__smpboot_create_thread(struct smp_hotplug_thread *ht, unsigned int cpu)
> +{
> +	struct task_struct *tsk = *per_cpu_ptr(ht->store, cpu);
> +	struct smpboot_thread_data *td;
> +
> +	if (tsk)
> +		return 0;
> +
> +	td = kzalloc_node(sizeof(*td), GFP_KERNEL, cpu_to_node(cpu));
> +	if (!td)
> +		return -ENOMEM;
> +	td->cpu = cpu;
> +	td->ht = ht;
> +
> +	tsk = kthread_create_on_cpu(ht->thread_fn, td, cpu, ht->thread_comm);
> +	if (IS_ERR(tsk))
> +		return PTR_ERR(tsk);
> +
> +	get_task_struct(tsk);
> +	*per_cpu_ptr(ht->store, cpu) = tsk;
> +	return 0;
> +}
> +
> +int smpboot_create_threads(unsigned int cpu)
> +{
> +	struct smp_hotplug_thread *cur;
> +	int ret = 0;
> +
> +	mutex_lock(&smpboot_threads_lock);
> +	list_for_each_entry(cur, &hotplug_threads, list) {
> +		ret = __smpboot_create_thread(cur, cpu);
> +		if (ret)
> +			break;
> +	}
> +	mutex_unlock(&smpboot_threads_lock);
> +	return ret;
> +}
> +
> +static void smpboot_unpark_thread(struct smp_hotplug_thread *ht, unsigned int cpu)
> +{
> +	struct task_struct *tsk = *per_cpu_ptr(ht->store, cpu);
> +
> +	kthread_unpark(tsk);
> +}
> +
> +void smpboot_unpark_threads(unsigned int cpu)
> +{
> +	struct smp_hotplug_thread *cur;
> +
> +	mutex_lock(&smpboot_threads_lock);
> +	list_for_each_entry(cur, &hotplug_threads, list)
> +		smpboot_unpark_thread(cur, cpu);
> +	mutex_unlock(&smpboot_threads_lock);
> +}
> +
> +static void smpboot_park_thread(struct smp_hotplug_thread *ht, unsigned int cpu)
> +{
> +	struct task_struct *tsk = *per_cpu_ptr(ht->store, cpu);
> +
> +	if (tsk)
> +		kthread_park(tsk);
> +}
> +
> +void smpboot_park_threads(unsigned int cpu)
> +{
> +	struct smp_hotplug_thread *cur;
> +
> +	mutex_lock(&smpboot_threads_lock);
> +	list_for_each_entry(cur, &hotplug_threads, list)

Shouldn't this be list_for_each_entry_reverse()?  Yes, the notifiers
still run in the same order for both online and offline, but all uses
of smpboot_park_threads() would be new, so should be OK with the
proper ordering, right?
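
In other words, something like:

	list_for_each_entry_reverse(cur, &hotplug_threads, list)
		smpboot_park_thread(cur, cpu);

so that the threads are parked in the reverse order of their
registration.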

> +		smpboot_park_thread(cur, cpu);
> +	mutex_unlock(&smpboot_threads_lock);
> +}
> +
> +static void smpboot_destroy_threads(struct smp_hotplug_thread *ht)
> +{
> +	unsigned int cpu;
> +
> +	/* We need to destroy also the parked threads of offline cpus */
> +	for_each_possible_cpu(cpu) {
> +		struct task_struct *tsk = *per_cpu_ptr(ht->store, cpu);
> +
> +		if (tsk) {
> +			kthread_stop(tsk);
> +			put_task_struct(tsk);
> +			*per_cpu_ptr(ht->store, cpu) = NULL;
> +		}
> +	}
> +}
> +
> +/**
> + * smpboot_register_percpu_thread - Register a per_cpu thread related to hotplug
> + * @plug_thread:	Hotplug thread descriptor
> + *
> + * Creates and starts the threads on all online cpus.
> + */
> +int smpboot_register_percpu_thread(struct smp_hotplug_thread *plug_thread)
> +{
> +	unsigned int cpu;
> +	int ret = 0;
> +
> +	mutex_lock(&smpboot_threads_lock);
> +	for_each_online_cpu(cpu) {
> +		ret = __smpboot_create_thread(plug_thread, cpu);
> +		if (ret) {
> +			smpboot_destroy_threads(plug_thread);
> +			goto out;
> +		}
> +		smpboot_unpark_thread(plug_thread, cpu);
> +	}
> +	list_add(&plug_thread->list, &hotplug_threads);
> +out:
> +	mutex_unlock(&smpboot_threads_lock);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(smpboot_register_percpu_thread);
> +
> +/**
> + * smpboot_unregister_percpu_thread - Unregister a per_cpu thread related to hotplug
> + * @plug_thread:	Hotplug thread descriptor
> + *
> + * Stops all threads on all possible cpus.
> + */
> +void smpboot_unregister_percpu_thread(struct smp_hotplug_thread *plug_thread)
> +{
> +	get_online_cpus();
> +	mutex_lock(&smpboot_threads_lock);
> +	list_del(&plug_thread->list);
> +	smpboot_destroy_threads(plug_thread);
> +	mutex_unlock(&smpboot_threads_lock);
> +	put_online_cpus();
> +}
> +EXPORT_SYMBOL_GPL(smpboot_unregister_percpu_thread);
> Index: tip/kernel/smpboot.h
> ===================================================================
> --- tip.orig/kernel/smpboot.h
> +++ tip/kernel/smpboot.h
> @@ -13,4 +13,8 @@ static inline void idle_thread_set_boot_
>  static inline void idle_threads_init(void) { }
>  #endif
> 
> +int smpboot_create_threads(unsigned int cpu);
> +void smpboot_park_threads(unsigned int cpu);
> +void smpboot_unpark_threads(unsigned int cpu);
> +
>  #endif
> 
> 


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads
  2012-06-13 18:33   ` Paul E. McKenney
@ 2012-06-13 18:56     ` Thomas Gleixner
  2012-06-14  8:08       ` Peter Zijlstra
  0 siblings, 1 reply; 53+ messages in thread
From: Thomas Gleixner @ 2012-06-13 18:56 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Srivatsa S. Bhat,
	Rusty Russell, Tejun Heo

On Wed, 13 Jun 2012, Paul E. McKenney wrote:
> On Wed, Jun 13, 2012 at 11:00:54AM -0000, Thomas Gleixner wrote:
>
> So I am currently trying to apply this to RCU's per-CPU kthread.
> I don't believe that I need to mess with RCU's per-rcu_node kthread
> because it can just have its affinity adjusted when the first CPU
> onlines and the last CPU offlines for the corresponding rcu_node.
> 
> One question below about the order of parking.
> 
> Also, I have not yet figured out how this avoids a parked thread waking
> up while the CPU is offline, but I am probably still missing something.

If it's just a spurious wakeup then it goes back to sleep right away
as nothing cleared the park bit.

If something calls unpark(), then it's toast. I should put a warning
into the code somewhere to catch that case.
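
For reference, the parkme loop from patch 1/5 boils down to something
like this (a condensed sketch, not the literal patch code):

static void __kthread_parkme(struct kthread *self)
{
	__set_current_state(TASK_INTERRUPTIBLE);
	while (test_bit(KTHREAD_SHOULD_PARK, &self->flags)) {
		if (!test_and_set_bit(KTHREAD_IS_PARKED, &self->flags))
			complete(&self->parked);
		schedule();
		__set_current_state(TASK_INTERRUPTIBLE);
	}
	clear_bit(KTHREAD_IS_PARKED, &self->flags);
	__set_current_state(TASK_RUNNING);
}

A wakeup which does not clear KTHREAD_SHOULD_PARK falls straight back
into schedule(), so the thread stays parked.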

> > +void smpboot_park_threads(unsigned int cpu)
> > +{
> > +	struct smp_hotplug_thread *cur;
> > +
> > +	mutex_lock(&smpboot_threads_lock);
> > +	list_for_each_entry(cur, &hotplug_threads, list)
> 
> Shouldn't this be list_for_each_entry_reverse()?  Yes, the notifiers
> still run in the same order for both online and offline, but all uses
> of smpboot_park_threads() would be new, so should be OK with the
> proper ordering, right?

Duh, yes

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads
  2012-06-13 11:00 ` [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads Thomas Gleixner
  2012-06-13 18:33   ` Paul E. McKenney
@ 2012-06-13 18:57   ` Paul E. McKenney
  2012-06-13 19:02     ` Thomas Gleixner
  2012-06-14  8:31   ` Srivatsa S. Bhat
  2012-06-18  8:47   ` Cyrill Gorcunov
  3 siblings, 1 reply; 53+ messages in thread
From: Paul E. McKenney @ 2012-06-13 18:57 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Srivatsa S. Bhat,
	Rusty Russell, Tejun Heo

On Wed, Jun 13, 2012 at 11:00:54AM -0000, Thomas Gleixner wrote:
> Provide a generic interface for setting up and tearing down percpu
> threads.
> 
> On registration the threads for already online cpus are created and
> started. On deregistration (modules) the threads are stopped.
> 
> During hotplug operations the threads are created, started, parked and
> unparked. The datastructure for registration provides a pointer to
> percpu storage space and optional setup, cleanup, park, unpark
> functions. These functions are called when the thread state changes.
> 
> Thread functions should look like the following:
> 
> int thread_fn(void *cookie)
> {
> 	while (!smpboot_thread_check_parking(cookie)) {
> 		do_stuff();
> 	}
> 	return 0;
> }
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

[ . . . ]

> Index: tip/kernel/cpu.c
> ===================================================================
> --- tip.orig/kernel/cpu.c
> +++ tip/kernel/cpu.c
> @@ -280,6 +280,7 @@ static int __ref _cpu_down(unsigned int 
>  				__func__, cpu);
>  		goto out_release;
>  	}
> +	smpboot_park_threads(cpu);
> 
>  	err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
>  	if (err) {
> @@ -354,6 +355,10 @@ static int __cpuinit _cpu_up(unsigned in
>  		goto out;
>  	}
> 
> +	ret = smpboot_create_threads(cpu);
> +	if (ret)
> +		goto out;
> +
>  	ret = __cpu_notify(CPU_UP_PREPARE | mod, hcpu, -1, &nr_calls);
>  	if (ret) {
>  		nr_calls--;
> @@ -370,6 +375,7 @@ static int __cpuinit _cpu_up(unsigned in
> 
>  	/* Now call notifier in preparation. */
>  	cpu_notify(CPU_ONLINE | mod, hcpu);
> +	smpboot_unpark_threads(cpu);

OK, RCU must use the lower-level interfaces, given that one of
the CPU_ONLINE notifiers might invoke synchronize_rcu().

							Thanx, Paul


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads
  2012-06-13 18:57   ` Paul E. McKenney
@ 2012-06-13 19:02     ` Thomas Gleixner
  2012-06-13 19:17       ` Paul E. McKenney
  0 siblings, 1 reply; 53+ messages in thread
From: Thomas Gleixner @ 2012-06-13 19:02 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Srivatsa S. Bhat,
	Rusty Russell, Tejun Heo

On Wed, 13 Jun 2012, Paul E. McKenney wrote:
> On Wed, Jun 13, 2012 at 11:00:54AM -0000, Thomas Gleixner wrote:
> >  	/* Now call notifier in preparation. */
> >  	cpu_notify(CPU_ONLINE | mod, hcpu);
> > +	smpboot_unpark_threads(cpu);
> 
> OK, RCU must use the lower-level interfaces, given that one of
> the CPU_ONLINE notifiers might invoke synchronize_rcu().

We can start the threads before the notifiers. There is no
restriction.

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads
  2012-06-13 19:02     ` Thomas Gleixner
@ 2012-06-13 19:17       ` Paul E. McKenney
  2012-06-13 20:47         ` Paul E. McKenney
  0 siblings, 1 reply; 53+ messages in thread
From: Paul E. McKenney @ 2012-06-13 19:17 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Srivatsa S. Bhat,
	Rusty Russell, Tejun Heo

On Wed, Jun 13, 2012 at 09:02:55PM +0200, Thomas Gleixner wrote:
> On Wed, 13 Jun 2012, Paul E. McKenney wrote:
> > On Wed, Jun 13, 2012 at 11:00:54AM -0000, Thomas Gleixner wrote:
> > >  	/* Now call notifier in preparation. */
> > >  	cpu_notify(CPU_ONLINE | mod, hcpu);
> > > +	smpboot_unpark_threads(cpu);
> > 
> > OK, RCU must use the lower-level interfaces, given that one of
> > the CPU_ONLINE notifiers might invoke synchronize_rcu().
> 
> We can start the threads before the notifiers. There is no
> restriction.

Sounds very good in both cases!

							Thanx, Paul


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads
  2012-06-13 19:17       ` Paul E. McKenney
@ 2012-06-13 20:47         ` Paul E. McKenney
  2012-06-14  4:51           ` Paul E. McKenney
  0 siblings, 1 reply; 53+ messages in thread
From: Paul E. McKenney @ 2012-06-13 20:47 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Srivatsa S. Bhat,
	Rusty Russell, Tejun Heo

On Wed, Jun 13, 2012 at 12:17:45PM -0700, Paul E. McKenney wrote:
> On Wed, Jun 13, 2012 at 09:02:55PM +0200, Thomas Gleixner wrote:
> > On Wed, 13 Jun 2012, Paul E. McKenney wrote:
> > > On Wed, Jun 13, 2012 at 11:00:54AM -0000, Thomas Gleixner wrote:
> > > >  	/* Now call notifier in preparation. */
> > > >  	cpu_notify(CPU_ONLINE | mod, hcpu);
> > > > +	smpboot_unpark_threads(cpu);
> > > 
> > > OK, RCU must use the lower-level interfaces, given that one of
> > > the CPU_ONLINE notifiers might invoke synchronize_rcu().
> > 
> > We can start the threads before the notifiers. There is no
> > restriction.
> 
> Sounds very good in both cases!

Just for reference, here is what I am using.


Anticipatory changes in advance of Thomas's next patchset

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 6ab5d82..510ba5e 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -372,10 +372,10 @@ static int __cpuinit _cpu_up(unsigned int cpu, int tasks_frozen)
 	if (ret != 0)
 		goto out_notify;
 	BUG_ON(!cpu_online(cpu));
+	smpboot_unpark_threads(cpu);
 
 	/* Now call notifier in preparation. */
 	cpu_notify(CPU_ONLINE | mod, hcpu);
-	smpboot_unpark_threads(cpu);
 
 out_notify:
 	if (ret != 0)
diff --git a/kernel/smpboot.c b/kernel/smpboot.c
index d7a3659..7d54e7c 100644
--- a/kernel/smpboot.c
+++ b/kernel/smpboot.c
@@ -201,7 +201,7 @@ void smpboot_park_threads(unsigned int cpu)
 	struct smp_hotplug_thread *cur;
 
 	mutex_lock(&smpboot_threads_lock);
-	list_for_each_entry(cur, &hotplug_threads, list)
+	list_for_each_entry_reverse(cur, &hotplug_threads, list)
 		smpboot_park_thread(cur, cpu);
 	mutex_unlock(&smpboot_threads_lock);
 }


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [RFC patch 1/5] kthread: Implement park/unpark facility
  2012-06-13 11:00 ` [RFC patch 1/5] kthread: Implement park/unpark facility Thomas Gleixner
@ 2012-06-14  2:32   ` Namhyung Kim
  2012-06-14  8:35     ` Thomas Gleixner
  2012-06-14  8:36   ` Srivatsa S. Bhat
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 53+ messages in thread
From: Namhyung Kim @ 2012-06-14  2:32 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Srivatsa S. Bhat,
	Rusty Russell, Paul E. McKenney, Tejun Heo

Hi, Thomas

A nitpick and questions. :)


On Wed, 13 Jun 2012 11:00:54 -0000, Thomas Gleixner wrote:
> To avoid the full teardown/setup of per cpu kthreads in the case of
> cpu hot(un)plug, provide a facility which allows to put the kthread
> into a park position and unpark it when the cpu comes online again.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
[snip]
>  
>  /**
> + * kthread_create_on_cpu - Create a cpu bound kthread
> + * @threadfn: the function to run until signal_pending(current).
> + * @data: data ptr for @threadfn.
> + * @node: memory node number.

Should be @cpu.


> + * @namefmt: printf-style name for the thread.
> + *
> + * Description: This helper function creates and names a kernel thread
> + * and binds it to a given CPU. The thread will be woken and put into
> + * park mode.
> + */
> +struct task_struct *kthread_create_on_cpu(int (*threadfn)(void *data),
> +					  void *data, unsigned int cpu,
> +					  const char *namefmt)
> +{
> +	struct task_struct *p;
> +
> +	p = kthread_create_on_node(threadfn, data, cpu_to_node(cpu), namefmt,
> +				   cpu);
> +	if (IS_ERR(p))
> +		return p;
> +	/* Park the thread, mark it percpu and then bind it */
> +	kthread_park(p);

Why park? AFAIK p is not running after creation, so binding it
doesn't need parking, right?


> +	set_bit(KTHREAD_IS_PER_CPU, &to_kthread(p)->flags);
> +	to_kthread(p)->cpu = cpu;
> +	__kthread_bind(p, cpu);
> +	return p;
> +}
> +
> +/**
> + * kthread_unpark - unpark a thread created by kthread_create().
> + * @k:		thread created by kthread_create().
> + *
> + * Sets kthread_should_park() for @k to return false, wakes it, and
> + * waits for it to return. If the thread is marked percpu then its
> + * bound to the cpu again.
> + */
> +void kthread_unpark(struct task_struct *k)
> +{
> +	struct kthread *kthread;
> +
> +	get_task_struct(k);
> +
> +	kthread = to_kthread(k);
> +	barrier(); /* it might have exited */
> +	if (k->vfork_done != NULL) {

Does this guarantee it's not exited? If so, since it is used a couple
of times, wouldn't it be better to factor it out into a (static?)
helper (with comments, hopefully), something like kthread_is_alive()?


> +		clear_bit(KTHREAD_SHOULD_PARK, &kthread->flags);
> +		if (test_bit(KTHREAD_IS_PARKED, &kthread->flags)) {
> +			if (test_bit(KTHREAD_IS_PER_CPU, &kthread->flags))
> +				__kthread_bind(k, kthread->cpu);
> +			wake_up_process(k);
> +		}
> +	}
> +	put_task_struct(k);
> +}
> +
> +/**
> + * kthread_park - park a thread created by kthread_create().
> + * @k: thread created by kthread_create().
> + *
> + * Sets kthread_should_park() for @k to return true, wakes it, and
> + * waits for it to return. This can also be called after kthread_create()
> + * instead of calling wake_up_process(): the thread will park without
> + * calling threadfn().
> + *
> + * Returns 0 if the thread is parked, -ENOSYS if the thread exited.
> + * If called by the kthread itself just the park bit is set.
> + */
> +int kthread_park(struct task_struct *k)
> +{
> +	struct kthread *kthread;
> +	int ret = -ENOSYS;
> +
> +	get_task_struct(k);
> +
> +	kthread = to_kthread(k);
> +	barrier(); /* it might have exited */
> +	if (k->vfork_done != NULL) {
> +		if (!test_bit(KTHREAD_IS_PARKED, &kthread->flags)) {
> +			set_bit(KTHREAD_SHOULD_PARK, &kthread->flags);
> +			if (k != current) {
> +				wake_up_process(k);
> +				wait_for_completion(&kthread->parked);
> +			}
> +		}
> +		ret = 0;
> +	}
> +	put_task_struct(k);
> +	return ret;
> +}
> +
> +/**
>   * kthread_stop - stop a thread created by kthread_create().
>   * @k: thread created by kthread_create().
>   *
> @@ -259,7 +398,7 @@ int kthread_stop(struct task_struct *k)
>  	kthread = to_kthread(k);
>  	barrier(); /* it might have exited */
>  	if (k->vfork_done != NULL) {
> -		kthread->should_stop = 1;
> +		set_bit(KTHREAD_SHOULD_STOP, &kthread->flags);

I wonder whether it also needs to unpark @k. Isn't it possible to stop a
parked thread?

Thanks,
Namhyung


>  		wake_up_process(k);
>  		wait_for_completion(&kthread->exited);
>  	}

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads
  2012-06-13 20:47         ` Paul E. McKenney
@ 2012-06-14  4:51           ` Paul E. McKenney
  2012-06-14 11:20             ` Thomas Gleixner
  2012-06-14 22:40             ` Paul E. McKenney
  0 siblings, 2 replies; 53+ messages in thread
From: Paul E. McKenney @ 2012-06-14  4:51 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Srivatsa S. Bhat,
	Rusty Russell, Tejun Heo

On Wed, Jun 13, 2012 at 01:47:25PM -0700, Paul E. McKenney wrote:
> On Wed, Jun 13, 2012 at 12:17:45PM -0700, Paul E. McKenney wrote:
> > On Wed, Jun 13, 2012 at 09:02:55PM +0200, Thomas Gleixner wrote:
> > > On Wed, 13 Jun 2012, Paul E. McKenney wrote:
> > > > On Wed, Jun 13, 2012 at 11:00:54AM -0000, Thomas Gleixner wrote:
> > > > >  	/* Now call notifier in preparation. */
> > > > >  	cpu_notify(CPU_ONLINE | mod, hcpu);
> > > > > +	smpboot_unpark_threads(cpu);
> > > > 
> > > > OK, RCU must use the lower-level interfaces, given that one of
> > > > the CPU_ONLINE notifiers might invoke synchronize_rcu().
> > > 
> > > We can start the threads before the notifiers. There is no
> > > restriction.
> > 
> > Sounds very good in both cases!
> 
> Just for reference, here is what I am using.

And here is a buggy first attempt to make RCU use the smpboot interfaces.
Probably still bugs in my adaptation, as it still hangs in the first
attempt to offline a CPU.  If I revert the softirq smpboot commit, the
offline still hangs somewhere near the __stop_machine() processing, but
the system continues running otherwise.  Will continue debugging tomorrow.

When doing this sort of conversion, renaming the per-CPU variable used
to hold the kthreads' task_struct pointers is highly recommended --
failing to do so cost me substantial confusion.  ;-)

							Thanx, Paul

------------------------------------------------------------------------

rcu: Use smp_hotplug_thread facility for RCU's per-CPU kthread

Bring RCU into the new-age CPU-hotplug fold by modifying RCU's per-CPU
kthread code to use the new smp_hotplug_thread facility.
    
Not-signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

 rcutree.c        |    4 -
 rcutree.h        |    2 
 rcutree_plugin.h |  177 ++++++++-----------------------------------------------
 rcutree_trace.c  |    3 
 4 files changed, 27 insertions(+), 159 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 0da7b88..7813d7d 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -125,7 +125,6 @@ static int rcu_scheduler_fully_active __read_mostly;
  */
 static DEFINE_PER_CPU(struct task_struct *, rcu_cpu_kthread_task);
 DEFINE_PER_CPU(unsigned int, rcu_cpu_kthread_status);
-DEFINE_PER_CPU(int, rcu_cpu_kthread_cpu);
 DEFINE_PER_CPU(unsigned int, rcu_cpu_kthread_loops);
 DEFINE_PER_CPU(char, rcu_cpu_has_work);
 
@@ -1458,7 +1457,6 @@ static void rcu_cleanup_dead_cpu(int cpu, struct rcu_state *rsp)
 	struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */
 
 	/* Adjust any no-longer-needed kthreads. */
-	rcu_stop_cpu_kthread(cpu);
 	rcu_node_kthread_setaffinity(rnp, -1);
 
 	/* Remove the dead CPU from the bitmasks in the rcu_node hierarchy. */
@@ -2514,11 +2512,9 @@ static int __cpuinit rcu_cpu_notify(struct notifier_block *self,
 	case CPU_ONLINE:
 	case CPU_DOWN_FAILED:
 		rcu_node_kthread_setaffinity(rnp, -1);
-		rcu_cpu_kthread_setrt(cpu, 1);
 		break;
 	case CPU_DOWN_PREPARE:
 		rcu_node_kthread_setaffinity(rnp, cpu);
-		rcu_cpu_kthread_setrt(cpu, 0);
 		break;
 	case CPU_DYING:
 	case CPU_DYING_FROZEN:
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index 7f5d138..70883af 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -434,7 +434,6 @@ static int rcu_preempt_blocked_readers_cgp(struct rcu_node *rnp);
 #ifdef CONFIG_HOTPLUG_CPU
 static void rcu_report_unblock_qs_rnp(struct rcu_node *rnp,
 				      unsigned long flags);
-static void rcu_stop_cpu_kthread(int cpu);
 #endif /* #ifdef CONFIG_HOTPLUG_CPU */
 static void rcu_print_detail_task_stall(struct rcu_state *rsp);
 static int rcu_print_task_stall(struct rcu_node *rnp);
@@ -472,7 +471,6 @@ static int __cpuinit rcu_spawn_one_boost_kthread(struct rcu_state *rsp,
 static void invoke_rcu_node_kthread(struct rcu_node *rnp);
 static void rcu_yield(void (*f)(unsigned long), unsigned long arg);
 #endif /* #ifdef CONFIG_RCU_BOOST */
-static void rcu_cpu_kthread_setrt(int cpu, int to_rt);
 static void __cpuinit rcu_prepare_kthreads(int cpu);
 static void rcu_prepare_for_idle_init(int cpu);
 static void rcu_cleanup_after_idle(int cpu);
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 2411000..f789341 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -25,6 +25,7 @@
  */
 
 #include <linux/delay.h>
+#include <linux/smpboot.h>
 
 #define RCU_KTHREAD_PRIO 1
 
@@ -1449,25 +1450,6 @@ static int __cpuinit rcu_spawn_one_boost_kthread(struct rcu_state *rsp,
 	return 0;
 }
 
-#ifdef CONFIG_HOTPLUG_CPU
-
-/*
- * Stop the RCU's per-CPU kthread when its CPU goes offline,.
- */
-static void rcu_stop_cpu_kthread(int cpu)
-{
-	struct task_struct *t;
-
-	/* Stop the CPU's kthread. */
-	t = per_cpu(rcu_cpu_kthread_task, cpu);
-	if (t != NULL) {
-		per_cpu(rcu_cpu_kthread_task, cpu) = NULL;
-		kthread_stop(t);
-	}
-}
-
-#endif /* #ifdef CONFIG_HOTPLUG_CPU */
-
 static void rcu_kthread_do_work(void)
 {
 	rcu_do_batch(&rcu_sched_state, &__get_cpu_var(rcu_sched_data));
@@ -1490,30 +1472,6 @@ static void invoke_rcu_node_kthread(struct rcu_node *rnp)
 }
 
 /*
- * Set the specified CPU's kthread to run RT or not, as specified by
- * the to_rt argument.  The CPU-hotplug locks are held, so the task
- * is not going away.
- */
-static void rcu_cpu_kthread_setrt(int cpu, int to_rt)
-{
-	int policy;
-	struct sched_param sp;
-	struct task_struct *t;
-
-	t = per_cpu(rcu_cpu_kthread_task, cpu);
-	if (t == NULL)
-		return;
-	if (to_rt) {
-		policy = SCHED_FIFO;
-		sp.sched_priority = RCU_KTHREAD_PRIO;
-	} else {
-		policy = SCHED_NORMAL;
-		sp.sched_priority = 0;
-	}
-	sched_setscheduler_nocheck(t, policy, &sp);
-}
-
-/*
  * Timer handler to initiate the waking up of per-CPU kthreads that
  * have yielded the CPU due to excess numbers of RCU callbacks.
  * We wake up the per-rcu_node kthread, which in turn will wake up
@@ -1553,63 +1511,35 @@ static void rcu_yield(void (*f)(unsigned long), unsigned long arg)
 }
 
 /*
- * Handle cases where the rcu_cpu_kthread() ends up on the wrong CPU.
- * This can happen while the corresponding CPU is either coming online
- * or going offline.  We cannot wait until the CPU is fully online
- * before starting the kthread, because the various notifier functions
- * can wait for RCU grace periods.  So we park rcu_cpu_kthread() until
- * the corresponding CPU is online.
- *
- * Return 1 if the kthread needs to stop, 0 otherwise.
- *
- * Caller must disable bh.  This function can momentarily enable it.
- */
-static int rcu_cpu_kthread_should_stop(int cpu)
-{
-	while (cpu_is_offline(cpu) ||
-	       !cpumask_equal(&current->cpus_allowed, cpumask_of(cpu)) ||
-	       smp_processor_id() != cpu) {
-		if (kthread_should_stop())
-			return 1;
-		per_cpu(rcu_cpu_kthread_status, cpu) = RCU_KTHREAD_OFFCPU;
-		per_cpu(rcu_cpu_kthread_cpu, cpu) = raw_smp_processor_id();
-		local_bh_enable();
-		schedule_timeout_uninterruptible(1);
-		if (!cpumask_equal(&current->cpus_allowed, cpumask_of(cpu)))
-			set_cpus_allowed_ptr(current, cpumask_of(cpu));
-		local_bh_disable();
-	}
-	per_cpu(rcu_cpu_kthread_cpu, cpu) = cpu;
-	return 0;
-}
-
-/*
  * Per-CPU kernel thread that invokes RCU callbacks.  This replaces the
  * RCU softirq used in flavors and configurations of RCU that do not
  * support RCU priority boosting.
  */
-static int rcu_cpu_kthread(void *arg)
+static int rcu_cpu_kthread(void *cookie)
 {
-	int cpu = (int)(long)arg;
 	unsigned long flags;
+	struct sched_param sp;
 	int spincnt = 0;
-	unsigned int *statusp = &per_cpu(rcu_cpu_kthread_status, cpu);
+	unsigned int *statusp = &__get_cpu_var(rcu_cpu_kthread_status);
 	char work;
-	char *workp = &per_cpu(rcu_cpu_has_work, cpu);
+	char *workp = &__get_cpu_var(rcu_cpu_has_work);
 
-	trace_rcu_utilization("Start CPU kthread@init");
+	trace_rcu_utilization("Start CPU kthread@unpark");
+	sp.sched_priority = RCU_KTHREAD_PRIO;
+	sched_setscheduler_nocheck(current, SCHED_FIFO, &sp);
 	for (;;) {
 		*statusp = RCU_KTHREAD_WAITING;
 		trace_rcu_utilization("End CPU kthread@rcu_wait");
-		rcu_wait(*workp != 0 || kthread_should_stop());
+		rcu_wait(*workp != 0 ||
+			 smpboot_thread_check_parking(cookie));
 		trace_rcu_utilization("Start CPU kthread@rcu_wait");
 		local_bh_disable();
-		if (rcu_cpu_kthread_should_stop(cpu)) {
+		if (smpboot_thread_check_parking(cookie)) {
 			local_bh_enable();
 			break;
 		}
 		*statusp = RCU_KTHREAD_RUNNING;
-		per_cpu(rcu_cpu_kthread_loops, cpu)++;
+		this_cpu_inc(rcu_cpu_kthread_loops);
 		local_irq_save(flags);
 		work = *workp;
 		*workp = 0;
@@ -1624,59 +1554,14 @@ static int rcu_cpu_kthread(void *arg)
 		if (spincnt > 10) {
 			*statusp = RCU_KTHREAD_YIELDING;
 			trace_rcu_utilization("End CPU kthread@rcu_yield");
-			rcu_yield(rcu_cpu_kthread_timer, (unsigned long)cpu);
+			rcu_yield(rcu_cpu_kthread_timer,
+				  smp_processor_id());
 			trace_rcu_utilization("Start CPU kthread@rcu_yield");
 			spincnt = 0;
 		}
 	}
 	*statusp = RCU_KTHREAD_STOPPED;
-	trace_rcu_utilization("End CPU kthread@term");
-	return 0;
-}
-
-/*
- * Spawn a per-CPU kthread, setting up affinity and priority.
- * Because the CPU hotplug lock is held, no other CPU will be attempting
- * to manipulate rcu_cpu_kthread_task.  There might be another CPU
- * attempting to access it during boot, but the locking in kthread_bind()
- * will enforce sufficient ordering.
- *
- * Please note that we cannot simply refuse to wake up the per-CPU
- * kthread because kthreads are created in TASK_UNINTERRUPTIBLE state,
- * which can result in softlockup complaints if the task ends up being
- * idle for more than a couple of minutes.
- *
- * However, please note also that we cannot bind the per-CPU kthread to its
- * CPU until that CPU is fully online.  We also cannot wait until the
- * CPU is fully online before we create its per-CPU kthread, as this would
- * deadlock the system when CPU notifiers tried waiting for grace
- * periods.  So we bind the per-CPU kthread to its CPU only if the CPU
- * is online.  If its CPU is not yet fully online, then the code in
- * rcu_cpu_kthread() will wait until it is fully online, and then do
- * the binding.
- */
-static int __cpuinit rcu_spawn_one_cpu_kthread(int cpu)
-{
-	struct sched_param sp;
-	struct task_struct *t;
-
-	if (!rcu_scheduler_fully_active ||
-	    per_cpu(rcu_cpu_kthread_task, cpu) != NULL)
-		return 0;
-	t = kthread_create_on_node(rcu_cpu_kthread,
-				   (void *)(long)cpu,
-				   cpu_to_node(cpu),
-				   "rcuc/%d", cpu);
-	if (IS_ERR(t))
-		return PTR_ERR(t);
-	if (cpu_online(cpu))
-		kthread_bind(t, cpu);
-	per_cpu(rcu_cpu_kthread_cpu, cpu) = cpu;
-	WARN_ON_ONCE(per_cpu(rcu_cpu_kthread_task, cpu) != NULL);
-	sp.sched_priority = RCU_KTHREAD_PRIO;
-	sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
-	per_cpu(rcu_cpu_kthread_task, cpu) = t;
-	wake_up_process(t); /* Get to TASK_INTERRUPTIBLE quickly. */
+	trace_rcu_utilization("End CPU kthread@park");
 	return 0;
 }
 
@@ -1788,6 +1673,12 @@ static int __cpuinit rcu_spawn_one_node_kthread(struct rcu_state *rsp,
 	return rcu_spawn_one_boost_kthread(rsp, rnp, rnp_index);
 }
 
+static struct smp_hotplug_thread rcu_cpu_thread_spec = {
+	.store = &rcu_cpu_kthread_task,
+	.thread_fn = rcu_cpu_kthread,
+	.thread_comm = "rcuc/%u",
+};
+
 /*
  * Spawn all kthreads -- called as soon as the scheduler is running.
  */
@@ -1797,11 +1688,9 @@ static int __init rcu_spawn_kthreads(void)
 	struct rcu_node *rnp;
 
 	rcu_scheduler_fully_active = 1;
-	for_each_possible_cpu(cpu) {
+	BUG_ON(smpboot_register_percpu_thread(&rcu_cpu_thread_spec));
+	for_each_possible_cpu(cpu)
 		per_cpu(rcu_cpu_has_work, cpu) = 0;
-		if (cpu_online(cpu))
-			(void)rcu_spawn_one_cpu_kthread(cpu);
-	}
 	rnp = rcu_get_root(rcu_state);
 	(void)rcu_spawn_one_node_kthread(rcu_state, rnp);
 	if (NUM_RCU_NODES > 1) {
@@ -1818,11 +1707,9 @@ static void __cpuinit rcu_prepare_kthreads(int cpu)
 	struct rcu_node *rnp = rdp->mynode;
 
 	/* Fire up the incoming CPU's kthread and leaf rcu_node kthread. */
-	if (rcu_scheduler_fully_active) {
-		(void)rcu_spawn_one_cpu_kthread(cpu);
-		if (rnp->node_kthread_task == NULL)
-			(void)rcu_spawn_one_node_kthread(rcu_state, rnp);
-	}
+	if (rcu_scheduler_fully_active &&
+	    rnp->node_kthread_task == NULL)
+		(void)rcu_spawn_one_node_kthread(rcu_state, rnp);
 }
 
 #else /* #ifdef CONFIG_RCU_BOOST */
@@ -1846,22 +1733,10 @@ static void rcu_preempt_boost_start_gp(struct rcu_node *rnp)
 {
 }
 
-#ifdef CONFIG_HOTPLUG_CPU
-
-static void rcu_stop_cpu_kthread(int cpu)
-{
-}
-
-#endif /* #ifdef CONFIG_HOTPLUG_CPU */
-
 static void rcu_node_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
 {
 }
 
-static void rcu_cpu_kthread_setrt(int cpu, int to_rt)
-{
-}
-
 static int __init rcu_scheduler_really_started(void)
 {
 	rcu_scheduler_fully_active = 1;
diff --git a/kernel/rcutree_trace.c b/kernel/rcutree_trace.c
index d4bc16d..6b4c76b 100644
--- a/kernel/rcutree_trace.c
+++ b/kernel/rcutree_trace.c
@@ -83,11 +83,10 @@ static void print_one_rcu_data(struct seq_file *m, struct rcu_data *rdp)
 			rdp->nxttail[RCU_WAIT_TAIL]],
 		   ".D"[&rdp->nxtlist != rdp->nxttail[RCU_DONE_TAIL]]);
 #ifdef CONFIG_RCU_BOOST
-	seq_printf(m, " kt=%d/%c/%d ktl=%x",
+	seq_printf(m, " kt=%d/%c ktl=%x",
 		   per_cpu(rcu_cpu_has_work, rdp->cpu),
 		   convert_kthread_status(per_cpu(rcu_cpu_kthread_status,
 					  rdp->cpu)),
-		   per_cpu(rcu_cpu_kthread_cpu, rdp->cpu),
 		   per_cpu(rcu_cpu_kthread_loops, rdp->cpu) & 0xffff);
 #endif /* #ifdef CONFIG_RCU_BOOST */
 	seq_printf(m, " b=%ld", rdp->blimit);


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads
  2012-06-13 18:56     ` Thomas Gleixner
@ 2012-06-14  8:08       ` Peter Zijlstra
  2012-06-14  8:17         ` Peter Zijlstra
  2012-06-14  8:37         ` Thomas Gleixner
  0 siblings, 2 replies; 53+ messages in thread
From: Peter Zijlstra @ 2012-06-14  8:08 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Paul E. McKenney, LKML, Ingo Molnar, Srivatsa S. Bhat,
	Rusty Russell, Tejun Heo

On Wed, 2012-06-13 at 20:56 +0200, Thomas Gleixner wrote:
> If it's just a spurious wakeup then it goes back to sleep right away
> as nothing cleared the park bit. 

Your spurious wakeup will have destroyed the binding though. So you need
to be careful.



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads
  2012-06-14  8:08       ` Peter Zijlstra
@ 2012-06-14  8:17         ` Peter Zijlstra
  2012-06-15  1:53           ` Tejun Heo
  2012-06-14  8:37         ` Thomas Gleixner
  1 sibling, 1 reply; 53+ messages in thread
From: Peter Zijlstra @ 2012-06-14  8:17 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Paul E. McKenney, LKML, Ingo Molnar, Srivatsa S. Bhat,
	Rusty Russell, Tejun Heo

On Thu, 2012-06-14 at 10:08 +0200, Peter Zijlstra wrote:
> On Wed, 2012-06-13 at 20:56 +0200, Thomas Gleixner wrote:
> > If it's just a spurious wakeup then it goes back to sleep right away
> > as nothing cleared the park bit. 
> 
> Your spurious wakeup will have destroyed the binding though. So you need
> to be careful.

We should probably do something like the below...

TJ, does this wreck workqueues? It's somewhat 'creative' in that regard
and really wants fixing.

---
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5018,6 +5018,8 @@ void do_set_cpus_allowed(struct task_str
 
 	cpumask_copy(&p->cpus_allowed, new_mask);
 	p->nr_cpus_allowed = cpumask_weight(new_mask);
+	if (p->nr_cpus_allowed != 1)
+		p->flags &= ~PF_THREAD_BOUND;
 }
 
 /*


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads
  2012-06-13 11:00 ` [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads Thomas Gleixner
  2012-06-13 18:33   ` Paul E. McKenney
  2012-06-13 18:57   ` Paul E. McKenney
@ 2012-06-14  8:31   ` Srivatsa S. Bhat
  2012-06-14  8:44     ` Thomas Gleixner
  2012-06-18  8:47   ` Cyrill Gorcunov
  3 siblings, 1 reply; 53+ messages in thread
From: Srivatsa S. Bhat @ 2012-06-14  8:31 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Rusty Russell,
	Paul E. McKenney, Tejun Heo

On 06/13/2012 04:30 PM, Thomas Gleixner wrote:

> Provide a generic interface for setting up and tearing down percpu
> threads.
> 
> On registration the threads for already online cpus are created and
> started. On deregistration (modules) the threads are stopped.
> 
> During hotplug operations the threads are created, started, parked and
> unparked. The datastructure for registration provides a pointer to
> percpu storage space and optional setup, cleanup, park, unpark
> functions. These functions are called when the thread state changes.
> 
> Thread functions should look like the following:
> 
> int thread_fn(void *cookie)
> {
> 	while (!smpboot_thread_check_parking(cookie)) {
> 		do_stuff();
> 	}
> 	return 0;
> }
>


This patchset looks great! But I have some comments below:

 
> Index: tip/kernel/cpu.c
> ===================================================================
> --- tip.orig/kernel/cpu.c
> +++ tip/kernel/cpu.c
> @@ -280,6 +280,7 @@ static int __ref _cpu_down(unsigned int 
>  				__func__, cpu);
>  		goto out_release;
>  	}
> +	smpboot_park_threads(cpu);
> 


If cpu_down fails further down, don't we want to unpark these threads
as part of error recovery?

>  	err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
>  	if (err) {
> @@ -354,6 +355,10 @@ static int __cpuinit _cpu_up(unsigned in
>  		goto out;
>  	}
> 
> +	ret = smpboot_create_threads(cpu);
> +	if (ret)
> +		goto out;
> +


Here also, we might want to clean up on error, right?

>  	ret = __cpu_notify(CPU_UP_PREPARE | mod, hcpu, -1, &nr_calls);
>  	if (ret) {
>  		nr_calls--;
> @@ -370,6 +375,7 @@ static int __cpuinit _cpu_up(unsigned in
> 
>  	/* Now call notifier in preparation. */
>  	cpu_notify(CPU_ONLINE | mod, hcpu);
> +	smpboot_unpark_threads(cpu);
> 
>  out_notify:
>  	if (ret != 0)



Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 1/5] kthread: Implement park/unpark facility
  2012-06-14  2:32   ` Namhyung Kim
@ 2012-06-14  8:35     ` Thomas Gleixner
  2012-06-14  8:59       ` Thomas Gleixner
  0 siblings, 1 reply; 53+ messages in thread
From: Thomas Gleixner @ 2012-06-14  8:35 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Srivatsa S. Bhat,
	Rusty Russell, Paul E. McKenney, Tejun Heo

On Thu, 14 Jun 2012, Namhyung Kim wrote:
> >  /**
> > + * kthread_create_on_cpu - Create a cpu bound kthread
> > + * @threadfn: the function to run until signal_pending(current).
> > + * @data: data ptr for @threadfn.
> > + * @node: memory node number.
> 
> Should be @cpu.

Yup. Copy and paste should be forbidden :)

> > +				   cpu);
> > +	if (IS_ERR(p))
> > +		return p;
> > +	/* Park the thread, mark it percpu and then bind it */
> > +	kthread_park(p);
> 
> Why park? AFAIK p is not running after creation, so binding it
> doesn't need parking, right?
 
Parking is necessary to get it out of the UNINTERRUPTIBLE state,
otherwise the hung task detector would complain. Right, the binding is
not strictly necessary. In fact we can delay it to unpark() where we
know that the target cpu is online already.
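
I.e. the create path would then look something like this (untested
sketch, with the __kthread_bind() call moved to unpark()):

struct task_struct *kthread_create_on_cpu(int (*threadfn)(void *data),
					  void *data, unsigned int cpu,
					  const char *namefmt)
{
	struct task_struct *p;

	p = kthread_create_on_node(threadfn, data, cpu_to_node(cpu),
				   namefmt, cpu);
	if (IS_ERR(p))
		return p;
	/* Park it to get out of TASK_UNINTERRUPTIBLE, mark it percpu */
	kthread_park(p);
	set_bit(KTHREAD_IS_PER_CPU, &to_kthread(p)->flags);
	to_kthread(p)->cpu = cpu;
	return p;
}

kthread_unpark() already rebinds percpu threads via __kthread_bind(),
so nothing else is needed.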
 
> > +void kthread_unpark(struct task_struct *k)
> > +{
> > +	struct kthread *kthread;
> > +
> > +	get_task_struct(k);
> > +
> > +	kthread = to_kthread(k);
> > +	barrier(); /* it might have exited */
> > +	if (k->vfork_done != NULL) {
> 
> Does this guarantee it's not exited? If so, since it is used a couple
> of times, wouldn't it be better to factor it out into a (static?)
> helper (with comments, hopefully), something like kthread_is_alive()?

Good point.
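
Something like the below then (a sketch only; kthread_is_alive() is the
name you suggested, it does not exist yet):

/*
 * vfork_done is set to NULL when a kthread exits, so a non NULL value
 * tells us that the thread is still alive. The caller must hold a
 * reference on the task.
 */
static bool kthread_is_alive(struct task_struct *k)
{
	barrier(); /* it might have exited */
	return k->vfork_done != NULL;
}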

> > +/**
> >   * kthread_stop - stop a thread created by kthread_create().
> >   * @k: thread created by kthread_create().
> >   *
> > @@ -259,7 +398,7 @@ int kthread_stop(struct task_struct *k)
> >  	kthread = to_kthread(k);
> >  	barrier(); /* it might have exited */
> >  	if (k->vfork_done != NULL) {
> > -		kthread->should_stop = 1;
> > +		set_bit(KTHREAD_SHOULD_STOP, &kthread->flags);
> 
> I wonder whether it also needs to unpark @k. Isn't it possible to stop a
> parked thread?

Right, it should be possible and it is. Clearing the park flag is not
strictly necessary, but yes we should do that.

Thanks for the review!

       tglx



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 1/5] kthread: Implement park/unpark facility
  2012-06-13 11:00 ` [RFC patch 1/5] kthread: Implement park/unpark facility Thomas Gleixner
  2012-06-14  2:32   ` Namhyung Kim
@ 2012-06-14  8:36   ` Srivatsa S. Bhat
  2012-06-14 20:01   ` Silas Boyd-Wickizer
  2012-06-14 22:13   ` Silas Boyd-Wickizer
  3 siblings, 0 replies; 53+ messages in thread
From: Srivatsa S. Bhat @ 2012-06-14  8:36 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Rusty Russell,
	Paul E. McKenney, Tejun Heo

On 06/13/2012 04:30 PM, Thomas Gleixner wrote:

> To avoid the full teardown/setup of per cpu kthreads in the case of
> cpu hot(un)plug, provide a facility which allows to put the kthread
> into a park position and unpark it when the cpu comes online again.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> 
>  /**
> + * kthread_should_park - should this kthread return now?
> + *
> + * When someone calls kthread_park() on your kthread, it will be woken
> + * and this will return true.  You should then return, and your return
> + * value will be passed through to kthread_park().
> + *
> + * Similar to kthread_should_stop(), but this keeps the thread alive
> + * and in a park position. kthread_unpark() "restart" the thread and


s/restart/restarts

> + * calls the thread function again.
> + */


Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads
  2012-06-14  8:08       ` Peter Zijlstra
  2012-06-14  8:17         ` Peter Zijlstra
@ 2012-06-14  8:37         ` Thomas Gleixner
  2012-06-14  8:38           ` Peter Zijlstra
  1 sibling, 1 reply; 53+ messages in thread
From: Thomas Gleixner @ 2012-06-14  8:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, LKML, Ingo Molnar, Srivatsa S. Bhat,
	Rusty Russell, Tejun Heo

On Thu, 14 Jun 2012, Peter Zijlstra wrote:

> On Wed, 2012-06-13 at 20:56 +0200, Thomas Gleixner wrote:
> > If it's just a spurious wakeup then it goes back to sleep right away
> > as nothing cleared the park bit. 
> 
> Your spurious wakeup will have destroyed the binding though. So you need
> to be careful.

We explicitly rebind it on unpark().

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads
  2012-06-14  8:37         ` Thomas Gleixner
@ 2012-06-14  8:38           ` Peter Zijlstra
  0 siblings, 0 replies; 53+ messages in thread
From: Peter Zijlstra @ 2012-06-14  8:38 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Paul E. McKenney, LKML, Ingo Molnar, Srivatsa S. Bhat,
	Rusty Russell, Tejun Heo

On Thu, 2012-06-14 at 10:37 +0200, Thomas Gleixner wrote:
> On Thu, 14 Jun 2012, Peter Zijlstra wrote:
> 
> > On Wed, 2012-06-13 at 20:56 +0200, Thomas Gleixner wrote:
> > > If it's just a spurious wakeup then it goes back to sleep right away
> > > as nothing cleared the park bit. 
> > 
> > Your spurious wakeup will have destroyed the binding though. So you need
> > to be careful.
> 
> We explicitly rebind it on unpark().

OK, no worries then.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads
  2012-06-14  8:31   ` Srivatsa S. Bhat
@ 2012-06-14  8:44     ` Thomas Gleixner
  0 siblings, 0 replies; 53+ messages in thread
From: Thomas Gleixner @ 2012-06-14  8:44 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Rusty Russell,
	Paul E. McKenney, Tejun Heo

On Thu, 14 Jun 2012, Srivatsa S. Bhat wrote:
> On 06/13/2012 04:30 PM, Thomas Gleixner wrote:
> > @@ -280,6 +280,7 @@ static int __ref _cpu_down(unsigned int 
> >  				__func__, cpu);
> >  		goto out_release;
> >  	}
> > +	smpboot_park_threads(cpu);
> > 
> 
> 
> If cpu_down fails further down, don't we want to unpark these threads
> as part of error recovery?

Right.
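
I.e. the error path in _cpu_down() wants something like this (untested
sketch):

	err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
	if (err) {
		/* CPU didn't die: tell everyone.  Can't complain. */
		smpboot_unpark_threads(cpu);
		cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu);
		goto out_release;
	}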

> >  	err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
> >  	if (err) {
> > @@ -354,6 +355,10 @@ static int __cpuinit _cpu_up(unsigned in
> >  		goto out;
> >  	}
> > 
> > +	ret = smpboot_create_threads(cpu);
> > +	if (ret)
> > +		goto out;
> > +
> 
> 
> Here also, we might want to clean up on error, right?

Good question. If we failed to create a thread, we can't online the
cpu, but we might try again later. So now the question is whether we
really need to completely destroy the already created and parked ones
or just leave them around. No real opinion on that, I just picked the
lazy way :)

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 1/5] kthread: Implement park/unpark facility
  2012-06-14  8:35     ` Thomas Gleixner
@ 2012-06-14  8:59       ` Thomas Gleixner
  0 siblings, 0 replies; 53+ messages in thread
From: Thomas Gleixner @ 2012-06-14  8:59 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Srivatsa S. Bhat,
	Rusty Russell, Paul E. McKenney, Tejun Heo

On Thu, 14 Jun 2012, Thomas Gleixner wrote:
> On Thu, 14 Jun 2012, Namhyung Kim wrote:
> > I wonder whether it also needs to unpark @k. Isn't it possible to stop a
> > parked thread?
> 
> Right, it should be possible and it is. Clearing the park flag is not
> strictly necessary, but yes we should do that.

And it's necessary, as parkme() won't return otherwise :)
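
I.e. kthread_stop() then looks roughly like this (untested):

int kthread_stop(struct task_struct *k)
{
	struct kthread *kthread;
	int ret;

	get_task_struct(k);

	kthread = to_kthread(k);
	barrier(); /* it might have exited */
	if (k->vfork_done != NULL) {
		set_bit(KTHREAD_SHOULD_STOP, &kthread->flags);
		/* Let a parked thread return from parkme() so it can exit */
		clear_bit(KTHREAD_SHOULD_PARK, &kthread->flags);
		wake_up_process(k);
		wait_for_completion(&kthread->exited);
	}
	ret = k->exit_code;
	put_task_struct(k);
	return ret;
}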

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads
  2012-06-14  4:51           ` Paul E. McKenney
@ 2012-06-14 11:20             ` Thomas Gleixner
  2012-06-14 12:59               ` Paul E. McKenney
  2012-06-14 22:40             ` Paul E. McKenney
  1 sibling, 1 reply; 53+ messages in thread
From: Thomas Gleixner @ 2012-06-14 11:20 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Srivatsa S. Bhat,
	Rusty Russell, Tejun Heo

On Wed, 13 Jun 2012, Paul E. McKenney wrote:

> On Wed, Jun 13, 2012 at 01:47:25PM -0700, Paul E. McKenney wrote:
> > On Wed, Jun 13, 2012 at 12:17:45PM -0700, Paul E. McKenney wrote:
> > > On Wed, Jun 13, 2012 at 09:02:55PM +0200, Thomas Gleixner wrote:
> > > > On Wed, 13 Jun 2012, Paul E. McKenney wrote:
> > > > > On Wed, Jun 13, 2012 at 11:00:54AM -0000, Thomas Gleixner wrote:
> > > > > >  	/* Now call notifier in preparation. */
> > > > > >  	cpu_notify(CPU_ONLINE | mod, hcpu);
> > > > > > +	smpboot_unpark_threads(cpu);
> > > > > 
> > > > > OK, RCU must use the lower-level interfaces, given that one of
> > > > > the CPU_ONLINE notifiers might invoke synchronize_rcu().
> > > > 
> > > > We can start the threads before the notifiers. There is no
> > > > restriction.
> > > 
> > > Sounds very good in both cases!
> > 
> > Just for reference, here is what I am using.
> 
> And here is a buggy first attempt to make RCU use the smpboot interfaces.
> Probably still bugs in my adaptation, as it still hangs in the first
> attempt to offline a CPU.  If I revert the softirq smpboot commit, the
> offline still hangs somewhere near the __stop_machine() processing, but
> the system continues running otherwise.  Will continue debugging tomorrow.

I gave it a quick shot, but I was not able to reproduce the hang yet.

But looking at the thread function made me look into rcu_yield() and I
really wonder what kind of drug induced that particular piece of
horror.

I can't figure out why this yield business is necessary at all. The
commit logs are as helpful as the missing code comments :)

I suspect that it's some starvation issue. But if we need it, then
can't we replace it with something sane like the (untested) patch
below?

Thanks,

	tglx
---
 kernel/rcutree.h        |    2 -
 kernel/rcutree_plugin.h |   89 ++++++++++--------------------------------------
 2 files changed, 19 insertions(+), 72 deletions(-)

Index: tip/kernel/rcutree.h
===================================================================
--- tip.orig/kernel/rcutree.h
+++ tip/kernel/rcutree.h
@@ -469,8 +469,6 @@ static void rcu_boost_kthread_setaffinit
 static int __cpuinit rcu_spawn_one_boost_kthread(struct rcu_state *rsp,
 						 struct rcu_node *rnp,
 						 int rnp_index);
-static void invoke_rcu_node_kthread(struct rcu_node *rnp);
-static void rcu_yield(void (*f)(unsigned long), unsigned long arg);
 #endif /* #ifdef CONFIG_RCU_BOOST */
 static void rcu_cpu_kthread_setrt(int cpu, int to_rt);
 static void __cpuinit rcu_prepare_kthreads(int cpu);
Index: tip/kernel/rcutree_plugin.h
===================================================================
--- tip.orig/kernel/rcutree_plugin.h
+++ tip/kernel/rcutree_plugin.h
@@ -1217,6 +1217,16 @@ static void rcu_initiate_boost_trace(str
 
 #endif /* #else #ifdef CONFIG_RCU_TRACE */
 
+static void rcu_wake_cond(struct task_struct *t, int status)
+{
+	/*
+	 * If the thread is yielding, only wake it when this
+	 * is invoked from idle
+	 */
+	if (status != RCU_KTHREAD_YIELDING || is_idle_task(current))
+		wake_up_process(t);
+}
+
 /*
  * Carry out RCU priority boosting on the task indicated by ->exp_tasks
  * or ->boost_tasks, advancing the pointer to the next task in the
@@ -1289,17 +1299,6 @@ static int rcu_boost(struct rcu_node *rn
 }
 
 /*
- * Timer handler to initiate waking up of boost kthreads that
- * have yielded the CPU due to excessive numbers of tasks to
- * boost.  We wake up the per-rcu_node kthread, which in turn
- * will wake up the booster kthread.
- */
-static void rcu_boost_kthread_timer(unsigned long arg)
-{
-	invoke_rcu_node_kthread((struct rcu_node *)arg);
-}
-
-/*
  * Priority-boosting kthread.  One per leaf rcu_node and one for the
  * root rcu_node.
  */
@@ -1322,8 +1321,9 @@ static int rcu_boost_kthread(void *arg)
 		else
 			spincnt = 0;
 		if (spincnt > 10) {
+			rnp->boost_kthread_status = RCU_KTHREAD_YIELDING;
 			trace_rcu_utilization("End boost kthread@rcu_yield");
-			rcu_yield(rcu_boost_kthread_timer, (unsigned long)rnp);
+			schedule_timeout_interruptible(2);
 			trace_rcu_utilization("Start boost kthread@rcu_yield");
 			spincnt = 0;
 		}
@@ -1361,8 +1361,8 @@ static void rcu_initiate_boost(struct rc
 			rnp->boost_tasks = rnp->gp_tasks;
 		raw_spin_unlock_irqrestore(&rnp->lock, flags);
 		t = rnp->boost_kthread_task;
-		if (t != NULL)
-			wake_up_process(t);
+		if (t)
+			rcu_wake_cond(t, rnp->boost_kthread_status);
 	} else {
 		rcu_initiate_boost_trace(rnp);
 		raw_spin_unlock_irqrestore(&rnp->lock, flags);
@@ -1379,8 +1379,10 @@ static void invoke_rcu_callbacks_kthread
 	local_irq_save(flags);
 	__this_cpu_write(rcu_cpu_has_work, 1);
 	if (__this_cpu_read(rcu_cpu_kthread_task) != NULL &&
-	    current != __this_cpu_read(rcu_cpu_kthread_task))
-		wake_up_process(__this_cpu_read(rcu_cpu_kthread_task));
+	    current != __this_cpu_read(rcu_cpu_kthread_task)) {
+		rcu_wake_cond(__this_cpu_read(rcu_cpu_kthread_task),
+			      __this_cpu_read(rcu_cpu_kthread_status));
+	}
 	local_irq_restore(flags);
 }
 
@@ -1476,20 +1478,6 @@ static void rcu_kthread_do_work(void)
 }
 
 /*
- * Wake up the specified per-rcu_node-structure kthread.
- * Because the per-rcu_node kthreads are immortal, we don't need
- * to do anything to keep them alive.
- */
-static void invoke_rcu_node_kthread(struct rcu_node *rnp)
-{
-	struct task_struct *t;
-
-	t = rnp->node_kthread_task;
-	if (t != NULL)
-		wake_up_process(t);
-}
-
-/*
  * Set the specified CPU's kthread to run RT or not, as specified by
  * the to_rt argument.  The CPU-hotplug locks are held, so the task
  * is not going away.
@@ -1514,45 +1502,6 @@ static void rcu_cpu_kthread_setrt(int cp
 }
 
 /*
- * Timer handler to initiate the waking up of per-CPU kthreads that
- * have yielded the CPU due to excess numbers of RCU callbacks.
- * We wake up the per-rcu_node kthread, which in turn will wake up
- * the booster kthread.
- */
-static void rcu_cpu_kthread_timer(unsigned long arg)
-{
-	struct rcu_data *rdp = per_cpu_ptr(rcu_state->rda, arg);
-	struct rcu_node *rnp = rdp->mynode;
-
-	atomic_or(rdp->grpmask, &rnp->wakemask);
-	invoke_rcu_node_kthread(rnp);
-}
-
-/*
- * Drop to non-real-time priority and yield, but only after posting a
- * timer that will cause us to regain our real-time priority if we
- * remain preempted.  Either way, we restore our real-time priority
- * before returning.
- */
-static void rcu_yield(void (*f)(unsigned long), unsigned long arg)
-{
-	struct sched_param sp;
-	struct timer_list yield_timer;
-	int prio = current->rt_priority;
-
-	setup_timer_on_stack(&yield_timer, f, arg);
-	mod_timer(&yield_timer, jiffies + 2);
-	sp.sched_priority = 0;
-	sched_setscheduler_nocheck(current, SCHED_NORMAL, &sp);
-	set_user_nice(current, 19);
-	schedule();
-	set_user_nice(current, 0);
-	sp.sched_priority = prio;
-	sched_setscheduler_nocheck(current, SCHED_FIFO, &sp);
-	del_timer(&yield_timer);
-}
-
-/*
  * Handle cases where the rcu_cpu_kthread() ends up on the wrong CPU.
  * This can happen while the corresponding CPU is either coming online
  * or going offline.  We cannot wait until the CPU is fully online
@@ -1624,7 +1573,7 @@ static int rcu_cpu_kthread(void *arg)
 		if (spincnt > 10) {
 			*statusp = RCU_KTHREAD_YIELDING;
 			trace_rcu_utilization("End CPU kthread@rcu_yield");
-			rcu_yield(rcu_cpu_kthread_timer, (unsigned long)cpu);
+			schedule_timeout_interruptible(2);
 			trace_rcu_utilization("Start CPU kthread@rcu_yield");
 			spincnt = 0;
 		}

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads
  2012-06-14 11:20             ` Thomas Gleixner
@ 2012-06-14 12:59               ` Paul E. McKenney
  2012-06-14 13:01                 ` Peter Zijlstra
  2012-06-14 13:32                 ` Thomas Gleixner
  0 siblings, 2 replies; 53+ messages in thread
From: Paul E. McKenney @ 2012-06-14 12:59 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Srivatsa S. Bhat,
	Rusty Russell, Tejun Heo

On Thu, Jun 14, 2012 at 01:20:39PM +0200, Thomas Gleixner wrote:
> On Wed, 13 Jun 2012, Paul E. McKenney wrote:
> 
> > On Wed, Jun 13, 2012 at 01:47:25PM -0700, Paul E. McKenney wrote:
> > > On Wed, Jun 13, 2012 at 12:17:45PM -0700, Paul E. McKenney wrote:
> > > > On Wed, Jun 13, 2012 at 09:02:55PM +0200, Thomas Gleixner wrote:
> > > > > On Wed, 13 Jun 2012, Paul E. McKenney wrote:
> > > > > > On Wed, Jun 13, 2012 at 11:00:54AM -0000, Thomas Gleixner wrote:
> > > > > > >  	/* Now call notifier in preparation. */
> > > > > > >  	cpu_notify(CPU_ONLINE | mod, hcpu);
> > > > > > > +	smpboot_unpark_threads(cpu);
> > > > > > 
> > > > > > OK, RCU must use the lower-level interfaces, given that one of
> > > > > > the CPU_ONLINE notifiers might invoke synchronize_rcu().
> > > > > 
> > > > > We can start the threads before the notifiers. There is no
> > > > > restriction.
> > > > 
> > > > Sounds very good in both cases!
> > > 
> > > Just for reference, here is what I am using.
> > 
> > And here is a buggy first attempt to make RCU use the smpboot interfaces.
> > Probably still bugs in my adaptation, as it still hangs in the first
> > attempt to offline a CPU.  If I revert the softirq smpboot commit, the
> > offline still hangs somewhere near the __stop_machine() processing, but
> > the system continues running otherwise.  Will continue debugging tomorrow.
> 
> I gave it a quick shot, but I was not able to reproduce the hang yet.

Really?  I have a strictly Western-Hemisphere bug?  ;-)

> But looking at the thread function made me look into rcu_yield() and I
> really wonder what kind of drug induced that particular piece of
> horror.

When you are working on something like RCU priority boosting, no other
drug is in any way necessary.  ;-)

> I can't figure out why this yield business is necessary at all. The
> commit logs are as helpful as the missing code comments :)
> 
> I suspect that it's some starvation issue. But if we need it, then
> can't we replace it with something sane like the (untested) patch
> below?

Yep, starvation.  I will take a look at your approach after I wake
up a bit more.

							Thanx, Paul

> Thanks,
> 
> 	tglx
> ---
>  kernel/rcutree.h        |    2 -
>  kernel/rcutree_plugin.h |   89 ++++++++++--------------------------------------
>  2 files changed, 19 insertions(+), 72 deletions(-)
> 
> Index: tip/kernel/rcutree.h
> ===================================================================
> --- tip.orig/kernel/rcutree.h
> +++ tip/kernel/rcutree.h
> @@ -469,8 +469,6 @@ static void rcu_boost_kthread_setaffinit
>  static int __cpuinit rcu_spawn_one_boost_kthread(struct rcu_state *rsp,
>  						 struct rcu_node *rnp,
>  						 int rnp_index);
> -static void invoke_rcu_node_kthread(struct rcu_node *rnp);
> -static void rcu_yield(void (*f)(unsigned long), unsigned long arg);
>  #endif /* #ifdef CONFIG_RCU_BOOST */
>  static void rcu_cpu_kthread_setrt(int cpu, int to_rt);
>  static void __cpuinit rcu_prepare_kthreads(int cpu);
> Index: tip/kernel/rcutree_plugin.h
> ===================================================================
> --- tip.orig/kernel/rcutree_plugin.h
> +++ tip/kernel/rcutree_plugin.h
> @@ -1217,6 +1217,16 @@ static void rcu_initiate_boost_trace(str
> 
>  #endif /* #else #ifdef CONFIG_RCU_TRACE */
> 
> +static void rcu_wake_cond(struct task_struct *t, int status)
> +{
> +	/*
> +	 * If the thread is yielding, only wake it when this
> +	 * is invoked from idle
> +	 */
> +	if (status != RCU_KTHREAD_YIELDING || is_idle_task(current))
> +		wake_up_process(t);
> +}
> +
>  /*
>   * Carry out RCU priority boosting on the task indicated by ->exp_tasks
>   * or ->boost_tasks, advancing the pointer to the next task in the
> @@ -1289,17 +1299,6 @@ static int rcu_boost(struct rcu_node *rn
>  }
> 
>  /*
> - * Timer handler to initiate waking up of boost kthreads that
> - * have yielded the CPU due to excessive numbers of tasks to
> - * boost.  We wake up the per-rcu_node kthread, which in turn
> - * will wake up the booster kthread.
> - */
> -static void rcu_boost_kthread_timer(unsigned long arg)
> -{
> -	invoke_rcu_node_kthread((struct rcu_node *)arg);
> -}
> -
> -/*
>   * Priority-boosting kthread.  One per leaf rcu_node and one for the
>   * root rcu_node.
>   */
> @@ -1322,8 +1321,9 @@ static int rcu_boost_kthread(void *arg)
>  		else
>  			spincnt = 0;
>  		if (spincnt > 10) {
> +			rnp->boost_kthread_status = RCU_KTHREAD_YIELDING;
>  			trace_rcu_utilization("End boost kthread@rcu_yield");
> -			rcu_yield(rcu_boost_kthread_timer, (unsigned long)rnp);
> +			schedule_timeout_interruptible(2);
>  			trace_rcu_utilization("Start boost kthread@rcu_yield");
>  			spincnt = 0;
>  		}
> @@ -1361,8 +1361,8 @@ static void rcu_initiate_boost(struct rc
>  			rnp->boost_tasks = rnp->gp_tasks;
>  		raw_spin_unlock_irqrestore(&rnp->lock, flags);
>  		t = rnp->boost_kthread_task;
> -		if (t != NULL)
> -			wake_up_process(t);
> +		if (t)
> +			rcu_wake_cond(t, rnp->boost_kthread_status);
>  	} else {
>  		rcu_initiate_boost_trace(rnp);
>  		raw_spin_unlock_irqrestore(&rnp->lock, flags);
> @@ -1379,8 +1379,10 @@ static void invoke_rcu_callbacks_kthread
>  	local_irq_save(flags);
>  	__this_cpu_write(rcu_cpu_has_work, 1);
>  	if (__this_cpu_read(rcu_cpu_kthread_task) != NULL &&
> -	    current != __this_cpu_read(rcu_cpu_kthread_task))
> -		wake_up_process(__this_cpu_read(rcu_cpu_kthread_task));
> +	    current != __this_cpu_read(rcu_cpu_kthread_task)) {
> +		rcu_wake_cond(__this_cpu_read(rcu_cpu_kthread_task),
> +			      __this_cpu_read(rcu_cpu_kthread_status));
> +	}
>  	local_irq_restore(flags);
>  }
> 
> @@ -1476,20 +1478,6 @@ static void rcu_kthread_do_work(void)
>  }
> 
>  /*
> - * Wake up the specified per-rcu_node-structure kthread.
> - * Because the per-rcu_node kthreads are immortal, we don't need
> - * to do anything to keep them alive.
> - */
> -static void invoke_rcu_node_kthread(struct rcu_node *rnp)
> -{
> -	struct task_struct *t;
> -
> -	t = rnp->node_kthread_task;
> -	if (t != NULL)
> -		wake_up_process(t);
> -}
> -
> -/*
>   * Set the specified CPU's kthread to run RT or not, as specified by
>   * the to_rt argument.  The CPU-hotplug locks are held, so the task
>   * is not going away.
> @@ -1514,45 +1502,6 @@ static void rcu_cpu_kthread_setrt(int cp
>  }
> 
>  /*
> - * Timer handler to initiate the waking up of per-CPU kthreads that
> - * have yielded the CPU due to excess numbers of RCU callbacks.
> - * We wake up the per-rcu_node kthread, which in turn will wake up
> - * the booster kthread.
> - */
> -static void rcu_cpu_kthread_timer(unsigned long arg)
> -{
> -	struct rcu_data *rdp = per_cpu_ptr(rcu_state->rda, arg);
> -	struct rcu_node *rnp = rdp->mynode;
> -
> -	atomic_or(rdp->grpmask, &rnp->wakemask);
> -	invoke_rcu_node_kthread(rnp);
> -}
> -
> -/*
> - * Drop to non-real-time priority and yield, but only after posting a
> - * timer that will cause us to regain our real-time priority if we
> - * remain preempted.  Either way, we restore our real-time priority
> - * before returning.
> - */
> -static void rcu_yield(void (*f)(unsigned long), unsigned long arg)
> -{
> -	struct sched_param sp;
> -	struct timer_list yield_timer;
> -	int prio = current->rt_priority;
> -
> -	setup_timer_on_stack(&yield_timer, f, arg);
> -	mod_timer(&yield_timer, jiffies + 2);
> -	sp.sched_priority = 0;
> -	sched_setscheduler_nocheck(current, SCHED_NORMAL, &sp);
> -	set_user_nice(current, 19);
> -	schedule();
> -	set_user_nice(current, 0);
> -	sp.sched_priority = prio;
> -	sched_setscheduler_nocheck(current, SCHED_FIFO, &sp);
> -	del_timer(&yield_timer);
> -}
> -
> -/*
>   * Handle cases where the rcu_cpu_kthread() ends up on the wrong CPU.
>   * This can happen while the corresponding CPU is either coming online
>   * or going offline.  We cannot wait until the CPU is fully online
> @@ -1624,7 +1573,7 @@ static int rcu_cpu_kthread(void *arg)
>  		if (spincnt > 10) {
>  			*statusp = RCU_KTHREAD_YIELDING;
>  			trace_rcu_utilization("End CPU kthread@rcu_yield");
> -			rcu_yield(rcu_cpu_kthread_timer, (unsigned long)cpu);
> +			schedule_timeout_interruptible(2);
>  			trace_rcu_utilization("Start CPU kthread@rcu_yield");
>  			spincnt = 0;
>  		}
> 


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads
  2012-06-14 12:59               ` Paul E. McKenney
@ 2012-06-14 13:01                 ` Peter Zijlstra
  2012-06-14 14:47                   ` Paul E. McKenney
  2012-06-14 13:32                 ` Thomas Gleixner
  1 sibling, 1 reply; 53+ messages in thread
From: Peter Zijlstra @ 2012-06-14 13:01 UTC (permalink / raw)
  To: paulmck
  Cc: Thomas Gleixner, LKML, Ingo Molnar, Srivatsa S. Bhat,
	Rusty Russell, Tejun Heo

On Thu, 2012-06-14 at 05:59 -0700, Paul E. McKenney wrote:
> Yep, starvation.

What's the particular starvation case?

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads
  2012-06-14 12:59               ` Paul E. McKenney
  2012-06-14 13:01                 ` Peter Zijlstra
@ 2012-06-14 13:32                 ` Thomas Gleixner
  2012-06-14 13:47                   ` Thomas Gleixner
  2012-06-14 14:50                   ` Paul E. McKenney
  1 sibling, 2 replies; 53+ messages in thread
From: Thomas Gleixner @ 2012-06-14 13:32 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Srivatsa S. Bhat,
	Rusty Russell, Tejun Heo

On Thu, 14 Jun 2012, Paul E. McKenney wrote:
> On Thu, Jun 14, 2012 at 01:20:39PM +0200, Thomas Gleixner wrote:
> > I gave it a quick shot, but I was not able to reproduce the hang yet.
> 
> Really?  I have a strictly Western-Hemisphere bug?  ;-)

I guess I need to fire up rcu torture to make it surface.
 
> > But looking at the thread function made me look into rcu_yield() and I
> > really wonder what kind of drug induced that particular piece of
> > horror.
> 
> When you are working on something like RCU priority boosting, no other
> drug is in any way necessary.  ;-)

And how do we protect minors from that ?
 
> > I can't figure out why this yield business is necessary at all. The
> > commit logs are as helpful as the missing code comments :)
> > 
> > I suspect that it's some starvation issue. But if we need it, then
> > can't we replace it with something sane like the (untested) patch
> > below?
> 
> Yep, starvation.  I will take a look at your approach after I wake
> up a bit more.

Btw, if that simpler yield approach is working and I can't see why it
shouldn't then you can get rid of the node task as well. The only
purpose of it is to push up the priority of yielding tasks, right?

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads
  2012-06-14 13:32                 ` Thomas Gleixner
@ 2012-06-14 13:47                   ` Thomas Gleixner
  2012-06-14 14:12                     ` Thomas Gleixner
  2012-06-14 14:50                   ` Paul E. McKenney
  1 sibling, 1 reply; 53+ messages in thread
From: Thomas Gleixner @ 2012-06-14 13:47 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Srivatsa S. Bhat,
	Rusty Russell, Tejun Heo

On Thu, 14 Jun 2012, Thomas Gleixner wrote:

> On Thu, 14 Jun 2012, Paul E. McKenney wrote:
> > On Thu, Jun 14, 2012 at 01:20:39PM +0200, Thomas Gleixner wrote:
> > > I gave it a quick shot, but I was not able to reproduce the hang yet.
> > 
> > Really?  I have a strictly Western-Hemisphere bug?  ;-)
> 
> I guess I need to fire up rcu torture to make it surface.
>  
> > > But looking at the thread function made me look into rcu_yield() and I
> > > really wonder what kind of drug induced that particular piece of
> > > horror.
> > 
> > When you are working on something like RCU priority boosting, no other
> > drug is in any way necessary.  ;-)
> 
> And how do we protect minors from that ?
>  
> > > I can't figure out why this yield business is necessary at all. The
> > > commit logs are as helpful as the missing code comments :)
> > > 
> > > I suspect that it's some starvation issue. But if we need it, then
> > > can't we replace it with something sane like the (untested) patch
> > > below?
> > 
> > Yep, starvation.  I will take a look at your approach after I wake
> > up a bit more.
> 
> Btw, if that simpler yield approach is working and I can't see why it
> shouldn't then you can get rid of the node task as well. The only
> purpose of it is to push up the priority of yielding tasks, right?

Ah, missed that it calls rcu_initiate_boost() as well....

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads
  2012-06-14 13:47                   ` Thomas Gleixner
@ 2012-06-14 14:12                     ` Thomas Gleixner
  2012-06-14 15:01                       ` Paul E. McKenney
  0 siblings, 1 reply; 53+ messages in thread
From: Thomas Gleixner @ 2012-06-14 14:12 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Srivatsa S. Bhat,
	Rusty Russell, Tejun Heo

On Thu, 14 Jun 2012, Thomas Gleixner wrote:
> On Thu, 14 Jun 2012, Thomas Gleixner wrote:
> 
> > On Thu, 14 Jun 2012, Paul E. McKenney wrote:
> > > On Thu, Jun 14, 2012 at 01:20:39PM +0200, Thomas Gleixner wrote:
> > > > I gave it a quick shot, but I was not able to reproduce the hang yet.
> > > 
> > > Really?  I have a strictly Western-Hemisphere bug?  ;-)
> > 
> > I guess I need to fire up rcu torture to make it surface.
> >  
> > > > But looking at the thread function made me look into rcu_yield() and I
> > > > really wonder what kind of drug induced that particular piece of
> > > > horror.
> > > 
> > > When you are working on something like RCU priority boosting, no other
> > > drug is in any way necessary.  ;-)
> > 
> > And how do we protect minors from that ?
> >  
> > > > I can't figure out why this yield business is necessary at all. The
> > > > commit logs are as helpful as the missing code comments :)
> > > > 
> > > > I suspect that it's some starvation issue. But if we need it, then
> > > > can't we replace it with something sane like the (untested) patch
> > > > below?
> > > 
> > > Yep, starvation.  I will take a look at your approach after I wake
> > > up a bit more.
> > 
> > Btw, if that simpler yield approach is working and I can't see why it
> > shouldn't then you can get rid of the node task as well. The only
> > purpose of it is to push up the priority of yielding tasks, right?
> 
> Ah, missed that it calls rcu_initiate_boost() as well....

And looking further, I really don't understand why it's doing
that. That node thread is only woken by these weird yield timers.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads
  2012-06-14 13:01                 ` Peter Zijlstra
@ 2012-06-14 14:47                   ` Paul E. McKenney
  2012-06-14 14:56                     ` Peter Zijlstra
  0 siblings, 1 reply; 53+ messages in thread
From: Paul E. McKenney @ 2012-06-14 14:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, LKML, Ingo Molnar, Srivatsa S. Bhat,
	Rusty Russell, Tejun Heo

On Thu, Jun 14, 2012 at 03:01:27PM +0200, Peter Zijlstra wrote:
> On Thu, 2012-06-14 at 05:59 -0700, Paul E. McKenney wrote:
> > Yep, starvation.
> 
> What's the particular starvation case?

RCU callback processing consumes the entire CPU in RCU_BOOST case where
processing runs at real-time priority.  This is analogous to RT throttling
in the scheduler.
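
To make the failure mode concrete, here is a minimal sketch, with
hypothetical helpers standing in for the real callback machinery (this
is not the actual RCU code path), of why a SCHED_FIFO per-CPU kthread
that always finds more work never lets SCHED_OTHER tasks run:

/* Hypothetical stand-ins for the real callback machinery. */
static bool more_callbacks_pending(void);
static void process_some_callbacks(void);

static int callback_kthread(void *unused)
{
	struct sched_param sp = { .sched_priority = 1 };

	sched_setscheduler_nocheck(current, SCHED_FIFO, &sp);
	while (!kthread_should_stop()) {
		process_some_callbacks();
		/*
		 * If work keeps arriving this branch is never taken,
		 * the FIFO task never sleeps, and SCHED_OTHER tasks
		 * on this CPU starve.
		 */
		if (!more_callbacks_pending())
			schedule_timeout_interruptible(1);
	}
	return 0;
}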

							Thanx, Paul


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads
  2012-06-14 13:32                 ` Thomas Gleixner
  2012-06-14 13:47                   ` Thomas Gleixner
@ 2012-06-14 14:50                   ` Paul E. McKenney
  2012-06-14 15:02                     ` Thomas Gleixner
  2012-06-14 15:38                     ` Paul E. McKenney
  1 sibling, 2 replies; 53+ messages in thread
From: Paul E. McKenney @ 2012-06-14 14:50 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Srivatsa S. Bhat,
	Rusty Russell, Tejun Heo

On Thu, Jun 14, 2012 at 03:32:19PM +0200, Thomas Gleixner wrote:
> On Thu, 14 Jun 2012, Paul E. McKenney wrote:
> > On Thu, Jun 14, 2012 at 01:20:39PM +0200, Thomas Gleixner wrote:
> > > I gave it a quick shot, but I was not able to reproduce the hang yet.
> > 
> > Really?  I have a strictly Western-Hemisphere bug?  ;-)
> 
> I guess I need to fire up rcu torture to make it surface.

A simple offline was triggering it for me.  Perhaps some of my debug
code was inappropriate, will retry.

> > > But looking at the thread function made me look into rcu_yield() and I
> > > really wonder what kind of drug induced that particular piece of
> > > horror.
> > 
> > When you are working on something like RCU priority boosting, no other
> > drug is in any way necessary.  ;-)
> 
> And how do we protect minors from that ?

We rely on their own sense of self-preservation preventing them from
getting involved in such insanity.

> > > I can't figure out why this yield business is necessary at all. The
> > > commit logs are as helpful as the missing code comments :)
> > > 
> > > I suspect that it's some starvation issue. But if we need it, then
> > > can't we replace it with something sane like the (untested) patch
> > > below?
> > 
> > Yep, starvation.  I will take a look at your approach after I wake
> > up a bit more.
> 
> Btw, if that simpler yield approach is working and I can't see why it
> shouldn't then you can get rid of the node task as well. The only
> purpose of it is to push up the priority of yielding tasks, right?

It also boosts the priority of preempted RCU read-side critical sections.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads
  2012-06-14 14:47                   ` Paul E. McKenney
@ 2012-06-14 14:56                     ` Peter Zijlstra
  2012-06-14 15:07                       ` Thomas Gleixner
  2012-06-14 15:45                       ` Paul E. McKenney
  0 siblings, 2 replies; 53+ messages in thread
From: Peter Zijlstra @ 2012-06-14 14:56 UTC (permalink / raw)
  To: paulmck
  Cc: Thomas Gleixner, LKML, Ingo Molnar, Srivatsa S. Bhat,
	Rusty Russell, Tejun Heo

On Thu, 2012-06-14 at 07:47 -0700, Paul E. McKenney wrote:
> RCU callback processing consumes the entire CPU in RCU_BOOST case where
> processing runs at real-time priority.  This is analogous to RT throttling
> in the scheduler. 

But previously we ran in non-preemptible softirq context, why would it
behave differently when done from an RT task?

Also, no, it's not quite like the throttling, which really idles the cpu
even if there are no SCHED_OTHER tasks to run.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads
  2012-06-14 14:12                     ` Thomas Gleixner
@ 2012-06-14 15:01                       ` Paul E. McKenney
  2012-06-14 15:08                         ` Thomas Gleixner
  0 siblings, 1 reply; 53+ messages in thread
From: Paul E. McKenney @ 2012-06-14 15:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Srivatsa S. Bhat,
	Rusty Russell, Tejun Heo

On Thu, Jun 14, 2012 at 04:12:14PM +0200, Thomas Gleixner wrote:
> On Thu, 14 Jun 2012, Thomas Gleixner wrote:
> > On Thu, 14 Jun 2012, Thomas Gleixner wrote:
> > 
> > > On Thu, 14 Jun 2012, Paul E. McKenney wrote:
> > > > On Thu, Jun 14, 2012 at 01:20:39PM +0200, Thomas Gleixner wrote:
> > > > > I gave it a quick shot, but I was not able to reproduce the hang yet.
> > > > 
> > > > Really?  I have a strictly Western-Hemisphere bug?  ;-)
> > > 
> > > I guess I need to fire up rcu torture to make it surface.
> > >  
> > > > > But looking at the thread function made me look into rcu_yield() and I
> > > > > really wonder what kind of drug induced that particular piece of
> > > > > horror.
> > > > 
> > > > When you are working on something like RCU priority boosting, no other
> > > > drug is in any way necessary.  ;-)
> > > 
> > > And how do we protect minors from that ?
> > >  
> > > > > I can't figure out why this yield business is necessary at all. The
> > > > > commit logs are as helpful as the missing code comments :)
> > > > > 
> > > > > I suspect that it's some starvation issue. But if we need it, then
> > > > > can't we replace it with something sane like the (untested) patch
> > > > > below?
> > > > 
> > > > Yep, starvation.  I will take a look at your approach after I wake
> > > > up a bit more.
> > > 
> > > Btw, if that simpler yield approach is working and I can't see why it
> > > shouldn't then you can get rid of the node task as well. The only
> > > purpose of it is to push up the priority of yielding tasks, right?
> > 
> > Ah, missed that it calls rcu_initiate_boost() as well....
> 
> And looking further, I really don't understand why it's doing
> that. That node thread is only woken by these weird yield timers.

If your patch works out, it indeed might be possible to get rid of
->node_kthread_task.  The ->boost_kthread_task needs to stay, however.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads
  2012-06-14 14:50                   ` Paul E. McKenney
@ 2012-06-14 15:02                     ` Thomas Gleixner
  2012-06-14 15:38                     ` Paul E. McKenney
  1 sibling, 0 replies; 53+ messages in thread
From: Thomas Gleixner @ 2012-06-14 15:02 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Srivatsa S. Bhat,
	Rusty Russell, Tejun Heo

On Thu, 14 Jun 2012, Paul E. McKenney wrote:
> On Thu, Jun 14, 2012 at 03:32:19PM +0200, Thomas Gleixner wrote:
> > On Thu, 14 Jun 2012, Paul E. McKenney wrote:
> > > On Thu, Jun 14, 2012 at 01:20:39PM +0200, Thomas Gleixner wrote:
> > > > I gave it a quick shot, but I was not able to reproduce the hang yet.
> > > 
> > > Really?  I have a strictly Western-Hemisphere bug?  ;-)
> > 
> > I guess I need to fire up rcu torture to make it surface.
> 
> A simple offline was triggering it for me.  Perhaps some of my debug
> code was inappropriate, will retry.
> 
> > > > But looking at the thread function made me look into rcu_yield() and I
> > > > really wonder what kind of drug induced that particular piece of
> > > > horror.
> > > 
> > > When you are working on something like RCU priority boosting, no other
> > > drug is in any way necessary.  ;-)
> > 
> > And how do we protect minors from that ?
> 
> We rely on their own sense of self-preservation preventing them from
> getting involved in such insanity.
> 
> > > > I can't figure out why this yield business is necessary at all. The
> > > > commit logs are as helpful as the missing code comments :)
> > > > 
> > > > I suspect that it's some starvation issue. But if we need it, then
> > > > can't we replace it with something sane like the (untested) patch
> > > > below?
> > > 
> > > Yep, starvation.  I will take a look at your approach after I wake
> > > up a bit more.
> > 
> > Btw, if that simpler yield approach is working and I can't see why it
> > shouldn't then you can get rid of the node task as well. The only
> > purpose of it is to push up the priority of yielding tasks, right?
> 
> It also boosts the priority of preempted RCU read-side critical sections.

Though the only way this thread gets invoked is via the timeout of
that yield timer. So I really have a hard time understanding that.

cpu_kthread()
    ....
    yield()
		timer fires -> mark cpu in mask and wakeup node kthread

  node_kthread()		
      do magic boost invocation
      push priority of cpu_kthread marked in mask

For the boost thread it looks like:

boost_kthread()
    .....
    yield()
		timer fires -> wakeup node kthread

  node_kthread()		
      do magic boost invocation, but no prio adjustment of boost thread.

/me scratches head
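
For reference, the timer end of that chain, condensed from the hunk
the patch above removes (kernel/rcutree_plugin.h):

static void rcu_cpu_kthread_timer(unsigned long arg)
{
	struct rcu_data *rdp = per_cpu_ptr(rcu_state->rda, arg);
	struct rcu_node *rnp = rdp->mynode;

	/* Mark the yielding CPU in the node's wakemask ... */
	atomic_or(rdp->grpmask, &rnp->wakemask);
	/* ... and wake the node kthread, the only wakeup it ever gets. */
	invoke_rcu_node_kthread(rnp);
}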

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads
  2012-06-14 14:56                     ` Peter Zijlstra
@ 2012-06-14 15:07                       ` Thomas Gleixner
  2012-06-14 15:45                       ` Paul E. McKenney
  1 sibling, 0 replies; 53+ messages in thread
From: Thomas Gleixner @ 2012-06-14 15:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: paulmck, LKML, Ingo Molnar, Srivatsa S. Bhat, Rusty Russell, Tejun Heo

On Thu, 14 Jun 2012, Peter Zijlstra wrote:

> On Thu, 2012-06-14 at 07:47 -0700, Paul E. McKenney wrote:
> > RCU callback processing consumes the entire CPU in RCU_BOOST case where
> > processing runs at real-time priority.  This is analogous to RT throttling
> > in the scheduler. 
> 
> But previously we ran in non-preemptible softirq context, why would it
> behave differently when done from an RT task?

softirqs are different. They loop ten times and then wake ksoftirqd
which runs with sched_other.

Though that's wonky, because if an interrupt arrives before ksoftirqd
can take over we loop another 10 times in irq_exit(). Rinse and
repeat.....
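
A condensed sketch of that loop, from __do_softirq() in
kernel/softirq.c, with the handler walk folded into an assumed
handle_pending_softirqs() helper and the irq enable/disable and
accounting details omitted:

#define MAX_SOFTIRQ_RESTART 10

asmlinkage void __do_softirq(void)
{
	int max_restart = MAX_SOFTIRQ_RESTART;
	u32 pending = local_softirq_pending();

restart:
	set_softirq_pending(0);
	handle_pending_softirqs(pending);	/* assumed helper */

	pending = local_softirq_pending();
	if (pending && --max_restart)
		goto restart;		/* the "loop ten times" above */

	if (pending)
		wakeup_softirqd();	/* hand off to SCHED_OTHER ksoftirqd */
}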

And the main difference is that the scheduler does not yell on the
softirq context, but it yells when an rt task monopolizes the cpu.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads
  2012-06-14 15:01                       ` Paul E. McKenney
@ 2012-06-14 15:08                         ` Thomas Gleixner
  0 siblings, 0 replies; 53+ messages in thread
From: Thomas Gleixner @ 2012-06-14 15:08 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Srivatsa S. Bhat,
	Rusty Russell, Tejun Heo

On Thu, 14 Jun 2012, Paul E. McKenney wrote:
> On Thu, Jun 14, 2012 at 04:12:14PM +0200, Thomas Gleixner wrote:
> > > > Btw, if that simpler yield approach is working and I can't see why it
> > > > shouldn't then you can get rid of the node task as well. The only
> > > > purpose of it is to push up the priority of yielding tasks, right?
> > > 
> > > Ah, missed that it calls rcu_initiate_boost() as well....
> > 
> > And looking further, I really don't understand why it's doing
> > that. That node thread is only woken by these weird yield timers.
> 
> If your patch works out, it indeed might be possible to get rid of
> ->node_kthread_task.  The ->boost_kthread_task needs to stay, however.

That's right.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads
  2012-06-14 14:50                   ` Paul E. McKenney
  2012-06-14 15:02                     ` Thomas Gleixner
@ 2012-06-14 15:38                     ` Paul E. McKenney
  2012-06-14 16:19                       ` Thomas Gleixner
  1 sibling, 1 reply; 53+ messages in thread
From: Paul E. McKenney @ 2012-06-14 15:38 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Srivatsa S. Bhat,
	Rusty Russell, Tejun Heo

On Thu, Jun 14, 2012 at 07:50:38AM -0700, Paul E. McKenney wrote:
> On Thu, Jun 14, 2012 at 03:32:19PM +0200, Thomas Gleixner wrote:
> > On Thu, 14 Jun 2012, Paul E. McKenney wrote:
> > > On Thu, Jun 14, 2012 at 01:20:39PM +0200, Thomas Gleixner wrote:
> > > > I gave it a quick shot, but I was not able to reproduce the hang yet.
> > > 
> > > Really?  I have a strictly Western-Hemisphere bug?  ;-)
> > 
> > I guess I need to fire up rcu torture to make it surface.
> 
> A simple offline was triggering it for me.  Perhaps some of my debug
> code was inappropriate, will retry.

And removing my debug code did the trick.  I guess some of it was really
just bug code.  In any case, it now passes short rcutorture testing for
me as well.

However, I do see the splat shown below.  The splat confuses me greatly,
as I thought that this kthread should be affinitied to a single CPU and
thus smp_processor_id() should refrain from splatting.  I don't see any
sort of "I broke affinity" message from the scheduler.
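
For context, the splat below comes from debug_smp_processor_id(); a
condensed sketch of its checks (lib/smp_processor_id.c, recursion
guard and backtrace printing omitted) shows why a kthread affine to
exactly one CPU should indeed never trigger it:

notrace unsigned int debug_smp_processor_id(void)
{
	int this_cpu = raw_smp_processor_id();

	if (likely(preempt_count()))
		goto out;		/* preemption disabled: safe */
	if (irqs_disabled())
		goto out;
	/* A task bound to a single CPU cannot migrate: */
	if (cpumask_equal(&current->cpus_allowed, cpumask_of(this_cpu)))
		goto out;
	if (system_state != SYSTEM_RUNNING)
		goto out;		/* early boot is CPU-local */

	printk(KERN_ERR "BUG: using smp_processor_id() in preemptible "
	       "[%08x] code: %s/%d\n", preempt_count() - 1,
	       current->comm, current->pid);
out:
	return this_cpu;
}

So for it to fire, the task's affinity mask must not have been exactly
the CPU it was running on at that moment.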

							Thanx, Paul

[  293.647090] rcu-torture:rcu_torture_onoff task: onlining 1
[  293.921180] BUG: using smp_processor_id() in preemptible [00000000] code: ksoftirqd/1/13
[  293.923686] ------------[ cut here ]------------
[  293.924467] kernel BUG at /media/homes/git/linux-2.6-tip.test/kernel/smpboot.c:107!
[  293.924855] invalid opcode: 0000 [#1] PREEMPT SMP 
[  293.924855] Modules linked in:
[  293.924855] 
[  293.924855] Pid: 13, comm: ksoftirqd/1 Not tainted 3.5.0-rc1+ #1424 Bochs Bochs
[  293.924855] EIP: 0060:[<c108e3e5>] EFLAGS: 00010202 CPU: 0
[  293.924855] EIP is at smpboot_thread_check_parking+0xb5/0xc0
[  293.924855] EAX: 00000000 EBX: df4003c0 ECX: 00000000 EDX: 00000000
[  293.924855] ESI: 00000001 EDI: c19a1ec0 EBP: df47ff54 ESP: df47ff48
[  293.924855]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[  293.924855] CR0: 8005003b CR2: b7782000 CR3: 02cfd000 CR4: 00000690
[  293.924855] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[  293.924855] DR6: ffff0ff0 DR7: 00000400
[  293.924855] Process ksoftirqd/1 (pid: 13, ti=df47e000 task=df480a20 task.ti=df47e000)
[  293.924855] Stack:
[  293.924855]  df480a20 df4003c0 df480a20 df47ff6c c1039cef 00000000 df459ed4 df4003c0
[  293.924855]  c1039cd0 df47ffe4 c105490f 00000001 00000001 df4003c0 00000000 c16e5050
[  293.924855]  dead4ead ffffffff ffffffff c335f98c c34fa620 00000000 c18bb502 df47ffa4
[  293.924855] Call Trace:
[  293.924855]  [<c1039cef>] run_ksoftirqd+0x1f/0x110
[  293.924855]  [<c1039cd0>] ? __do_softirq+0x330/0x330
[  293.924855]  [<c105490f>] kthread+0x8f/0xa0
[  293.924855]  [<c16e5050>] ? nmi_espfix_stack+0x2a/0x6a
[  293.924855]  [<c16e0000>] ? sock_rps_save_rxhash.isra.31.part.32+0xd6/0x127
[  293.924855]  [<c1054880>] ? flush_kthread_worker+0xe0/0xe0
[  293.924855]  [<c16eb17a>] kernel_thread_helper+0x6/0xd
[  293.924855] Code: 85 d2 74 04 8b 03 ff d2 c7 43 04 01 00 00 00 31 c0 5b 5e 5f 5d c3 83 f8 02 74 07 5b 31 c0 5e 5f 5d c3 8b 57 1c 85 d2 75 db eb dd <0f> 0b 89 f6 8d bc 27 00 00 00 00 55 89 e5 57 bf ff ff ff ff 56 
[  293.924855] EIP: [<c108e3e5>] smpboot_thread_check_parking+0xb5/0xc0 SS:ESP 0068:df47ff48
[  293.960654] rcu-torture:rcu_torture_onoff task: onlined 1


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads
  2012-06-14 14:56                     ` Peter Zijlstra
  2012-06-14 15:07                       ` Thomas Gleixner
@ 2012-06-14 15:45                       ` Paul E. McKenney
  2012-06-14 16:20                         ` Thomas Gleixner
  1 sibling, 1 reply; 53+ messages in thread
From: Paul E. McKenney @ 2012-06-14 15:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, LKML, Ingo Molnar, Srivatsa S. Bhat,
	Rusty Russell, Tejun Heo

On Thu, Jun 14, 2012 at 04:56:29PM +0200, Peter Zijlstra wrote:
> On Thu, 2012-06-14 at 07:47 -0700, Paul E. McKenney wrote:
> > RCU callback processing consumes the entire CPU in RCU_BOOST case where
> > processing runs at real-time priority.  This is analogous to RT throttling
> > in the scheduler. 
> 
> But previously we ran in non-preemptible softirq context, why would it
> behave differently when done from an RT task?

In -rt, yes, but in mainline, ksoftirqd does not run at RT prio, right?

> Also, no, it's not quite like the throttling, which really idles the cpu
> even if there are no SCHED_OTHER tasks to run.

Agreed, not -exactly- like throttling, but it has a broadly similar
goal, namely to prevent a given type of processing from starving
everything else in the system.

That said, why does throttling idle the CPU even if there is no other
SCHED_OTHER tasks to run?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads
  2012-06-14 15:38                     ` Paul E. McKenney
@ 2012-06-14 16:19                       ` Thomas Gleixner
  2012-06-14 16:48                         ` Paul E. McKenney
  0 siblings, 1 reply; 53+ messages in thread
From: Thomas Gleixner @ 2012-06-14 16:19 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Srivatsa S. Bhat,
	Rusty Russell, Tejun Heo

On Thu, 14 Jun 2012, Paul E. McKenney wrote:

> On Thu, Jun 14, 2012 at 07:50:38AM -0700, Paul E. McKenney wrote:
> > On Thu, Jun 14, 2012 at 03:32:19PM +0200, Thomas Gleixner wrote:
> > > On Thu, 14 Jun 2012, Paul E. McKenney wrote:
> > > > On Thu, Jun 14, 2012 at 01:20:39PM +0200, Thomas Gleixner wrote:
> > > > > I gave it a quick shot, but I was not able to reproduce the hang yet.
> > > > 
> > > > Really?  I have a strictly Western-Hemisphere bug?  ;-)
> > > 
> > > I guess I need to fire up rcu torture to make it surface.
> > 
> > A simple offline was triggering it for me.  Perhaps some of my debug
> > code was inappropriate, will retry.
> 
> And removing my debug code did the trick.  I guess some of it was really
> just bug code.  In any case, it now passes short rcutorture testing for
> me as well.
> 
> However, I do see the splat shown below.  The splat confuses me greatly,
> as I thought that this kthread should be affinitied to a single CPU and
> thus smp_processor_id() should refrain from splatting.  I don't see any
> sort of "I broke affinity" message from the scheduler.

Hmm. Is that reproducible ?

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads
  2012-06-14 15:45                       ` Paul E. McKenney
@ 2012-06-14 16:20                         ` Thomas Gleixner
  0 siblings, 0 replies; 53+ messages in thread
From: Thomas Gleixner @ 2012-06-14 16:20 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Srivatsa S. Bhat,
	Rusty Russell, Tejun Heo

On Thu, 14 Jun 2012, Paul E. McKenney wrote:

> On Thu, Jun 14, 2012 at 04:56:29PM +0200, Peter Zijlstra wrote:
> > On Thu, 2012-06-14 at 07:47 -0700, Paul E. McKenney wrote:
> > > RCU callback processing consumes the entire CPU in RCU_BOOST case where
> > > processing runs at real-time priority.  This is analogous to RT throttling
> > > in the scheduler. 
> > 
> > But previously we ran in non-preemptible softirq context, why would it
> > behave differently when done from an RT task?
> 
> In -rt, yes, but in mainline, ksoftirqd does not run at RT prio, right?
> 
> > Also, no, it's not quite like the throttling, which really idles the cpu
> > even if there are no SCHED_OTHER tasks to run.
> 
> Agreed, not -exactly- like throttling, but it has a broadly similar
> goal, namely to prevent a given type of processing from starving
> everything else in the system.
> 
> That said, why does throttling idle the CPU even if there is no other
> SCHED_OTHER tasks to run?

For simplicity reasons :)

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads
  2012-06-14 16:19                       ` Thomas Gleixner
@ 2012-06-14 16:48                         ` Paul E. McKenney
  0 siblings, 0 replies; 53+ messages in thread
From: Paul E. McKenney @ 2012-06-14 16:48 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Srivatsa S. Bhat,
	Rusty Russell, Tejun Heo

On Thu, Jun 14, 2012 at 06:19:50PM +0200, Thomas Gleixner wrote:
> On Thu, 14 Jun 2012, Paul E. McKenney wrote:
> 
> > On Thu, Jun 14, 2012 at 07:50:38AM -0700, Paul E. McKenney wrote:
> > > On Thu, Jun 14, 2012 at 03:32:19PM +0200, Thomas Gleixner wrote:
> > > > On Thu, 14 Jun 2012, Paul E. McKenney wrote:
> > > > > On Thu, Jun 14, 2012 at 01:20:39PM +0200, Thomas Gleixner wrote:
> > > > > > I gave it a quick shot, but I was not able to reproduce the hang yet.
> > > > > 
> > > > > Really?  I have a strictly Western-Hemisphere bug?  ;-)
> > > > 
> > > > I guess I need to fire up rcu torture to make it surface.
> > > 
> > > A simple offline was triggering it for me.  Perhaps some of my debug
> > > code was inappropriate, will retry.
> > 
> > And removing my debug code did the trick.  I guess some of it was really
> > just bug code.  In any case, it now passes short rcutorture testing for
> > me as well.
> > 
> > However, I do see the splat shown below.  The splat confuses me greatly,
> > as I thought that this kthread should be affinitied to a single CPU and
> > thus smp_processor_id() should refrain from splatting.  I don't see any
> > sort of "I broke affinity" message from the scheduler.
> 
> Hmm. Is that reproducible ?

Twice so far, will retry.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 1/5] kthread: Implement park/unpark facility
  2012-06-13 11:00 ` [RFC patch 1/5] kthread: Implement park/unpark facility Thomas Gleixner
  2012-06-14  2:32   ` Namhyung Kim
  2012-06-14  8:36   ` Srivatsa S. Bhat
@ 2012-06-14 20:01   ` Silas Boyd-Wickizer
  2012-06-14 20:13     ` Thomas Gleixner
  2012-06-14 22:13   ` Silas Boyd-Wickizer
  3 siblings, 1 reply; 53+ messages in thread
From: Silas Boyd-Wickizer @ 2012-06-14 20:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Srivatsa S. Bhat,
	Rusty Russell, Paul E. McKenney, Tejun Heo

Hello,

I've taken a stab at converting stop_machine to use kthread_park and
kthread_unpark.  I don't think stop_machine can use
smpboot_park_threads (in their current implementation) because, among
other reasons, _cpu_down calls __stop_machine after parking the
stopper threads via smpboot_park_threads.

Silas

Signed-off-by: Silas Boyd-Wickizer <sbw@mit.edu>
---
 kernel/stop_machine.c |   41 ++++++++++++++++++-----------------------
 1 file changed, 18 insertions(+), 23 deletions(-)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 2f194e9..631f133 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -244,7 +244,7 @@ int try_stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg)
 	return ret;
 }
 
-static int cpu_stopper_thread(void *data)
+static __noreturn int cpu_stopper_thread(void *data)
 {
 	struct cpu_stopper *stopper = data;
 	struct cpu_stop_work *work;
@@ -253,10 +253,8 @@ static int cpu_stopper_thread(void *data)
 repeat:
 	set_current_state(TASK_INTERRUPTIBLE);	/* mb paired w/ kthread_stop */
 
-	if (kthread_should_stop()) {
-		__set_current_state(TASK_RUNNING);
-		return 0;
-	}
+	if (kthread_should_park())
+		kthread_parkme();
 
 	work = NULL;
 	spin_lock_irq(&stopper->lock);
@@ -308,23 +306,23 @@ static int __cpuinit cpu_stop_cpu_callback(struct notifier_block *nfb,
 
 	switch (action & ~CPU_TASKS_FROZEN) {
 	case CPU_UP_PREPARE:
-		BUG_ON(stopper->thread || stopper->enabled ||
-		       !list_empty(&stopper->works));
-		p = kthread_create_on_node(cpu_stopper_thread,
-					   stopper,
-					   cpu_to_node(cpu),
-					   "migration/%d", cpu);
-		if (IS_ERR(p))
-			return notifier_from_errno(PTR_ERR(p));
-		get_task_struct(p);
-		kthread_bind(p, cpu);
-		sched_set_stop_task(cpu, p);
-		stopper->thread = p;
+		if (!stopper->thread) {
+			BUG_ON(stopper->enabled ||
+			       !list_empty(&stopper->works));
+			p = kthread_create_on_cpu(cpu_stopper_thread,
+						  stopper,
+						  cpu,
+						  "migration/%d");
+			if (IS_ERR(p))
+				return notifier_from_errno(PTR_ERR(p));
+			get_task_struct(p);
+			stopper->thread = p;
+		}
+		sched_set_stop_task(cpu, stopper->thread);
 		break;
 
 	case CPU_ONLINE:
-		/* strictly unnecessary, as first user will wake it */
-		wake_up_process(stopper->thread);
+		kthread_unpark(stopper->thread);
 		/* mark enabled */
 		spin_lock_irq(&stopper->lock);
 		stopper->enabled = true;
@@ -339,16 +337,13 @@ static int __cpuinit cpu_stop_cpu_callback(struct notifier_block *nfb,
 
 		sched_set_stop_task(cpu, NULL);
 		/* kill the stopper */
-		kthread_stop(stopper->thread);
+		kthread_park(stopper->thread);
 		/* drain remaining works */
 		spin_lock_irq(&stopper->lock);
 		list_for_each_entry(work, &stopper->works, list)
 			cpu_stop_signal_done(work->done, false);
 		stopper->enabled = false;
 		spin_unlock_irq(&stopper->lock);
-		/* release the stopper */
-		put_task_struct(stopper->thread);
-		stopper->thread = NULL;
 		break;
 	}
 #endif

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [RFC patch 1/5] kthread: Implement park/unpark facility
  2012-06-14 20:01   ` Silas Boyd-Wickizer
@ 2012-06-14 20:13     ` Thomas Gleixner
  0 siblings, 0 replies; 53+ messages in thread
From: Thomas Gleixner @ 2012-06-14 20:13 UTC (permalink / raw)
  To: Silas Boyd-Wickizer
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Srivatsa S. Bhat,
	Rusty Russell, Paul E. McKenney, Tejun Heo

On Thu, 14 Jun 2012, Silas Boyd-Wickizer wrote:

> Hello,
> 
> I've taken a stab at converting stop_machine to use kthread_park and
> kthread_unpark.  I don't think stop_machine can use
> smpboot_park_threads (in their current implementation) because, among
> other reasons, _cpu_down calls __stop_machine after parking the
> stopper threads via smpboot_park_threads.

Yes, I know. That's why I haven't touched it yet. That needs some
further modifications down the road.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 1/5] kthread: Implement park/unpark facility
  2012-06-13 11:00 ` [RFC patch 1/5] kthread: Implement park/unpark facility Thomas Gleixner
                     ` (2 preceding siblings ...)
  2012-06-14 20:01   ` Silas Boyd-Wickizer
@ 2012-06-14 22:13   ` Silas Boyd-Wickizer
  2012-06-15  1:44     ` Tejun Heo
  3 siblings, 1 reply; 53+ messages in thread
From: Silas Boyd-Wickizer @ 2012-06-14 22:13 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Srivatsa S. Bhat,
	Rusty Russell, Paul E. McKenney, Tejun Heo

Hello,

Attached is a rough patch that uses park/unpark for workqueue idle
threads.  The patch is a hack, it probably has bugs, and it certainly
doesn't simplify the workqueue code.  Perhaps, however, the patch
might be useful in guiding future changes to smp_boot_threads or
workqueues.

A difficulty in applying park/unpark (or smp_boot_threads
potentially) is that there is no single per-CPU thread for each CPU.
That is, the thread that starts out as the initial worker/idle thread
could be different from any thread that is idling when a CPU goes
offline.  An additional complication is that a per-cpu worker that is
idle may have initially been unbound (e.g. worker creation in the
trustee_thread can cause this).

Silas

Signed-off-by: Silas Boyd-Wickizer <sbw@mit.edu>
---
 kernel/workqueue.c |  114 +++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 87 insertions(+), 27 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 9a3128d..fb5e18d 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -61,6 +61,7 @@ enum {
 	WORKER_REBIND		= 1 << 5,	/* mom is home, come back */
 	WORKER_CPU_INTENSIVE	= 1 << 6,	/* cpu intensive */
 	WORKER_UNBOUND		= 1 << 7,	/* worker is unbound */
+	WORKER_PARK		= 1 << 8,
 
 	WORKER_NOT_RUNNING	= WORKER_PREP | WORKER_ROGUE | WORKER_REBIND |
 				  WORKER_CPU_INTENSIVE | WORKER_UNBOUND,
@@ -1412,6 +1413,13 @@ fail:
 	return NULL;
 }
 
+static void __start_worker(struct worker *worker)
+{
+	worker->flags |= WORKER_STARTED;
+	worker->gcwq->nr_workers++;
+	worker_enter_idle(worker);
+}
+
 /**
  * start_worker - start a newly created worker
  * @worker: worker to start
@@ -1423,12 +1431,29 @@ fail:
  */
 static void start_worker(struct worker *worker)
 {
-	worker->flags |= WORKER_STARTED;
-	worker->gcwq->nr_workers++;
-	worker_enter_idle(worker);
+	__start_worker(worker);
 	wake_up_process(worker->task);
 }
 
+static void unpark_worker(struct worker *worker)
+{
+	worker->flags &= ~WORKER_PARK;
+	__start_worker(worker);
+	kthread_unpark(worker->task);
+}
+
+static void __disconnect_worker(struct worker *worker)
+{
+	struct global_cwq *gcwq = worker->gcwq;
+
+	if (worker->flags & WORKER_STARTED)
+		gcwq->nr_workers--;
+	if (worker->flags & WORKER_IDLE)
+		gcwq->nr_idle--;
+
+	list_del_init(&worker->entry);
+}
+
 /**
  * destroy_worker - destroy a workqueue worker
  * @worker: worker to be destroyed
@@ -1447,12 +1472,7 @@ static void destroy_worker(struct worker *worker)
 	BUG_ON(worker->current_work);
 	BUG_ON(!list_empty(&worker->scheduled));
 
-	if (worker->flags & WORKER_STARTED)
-		gcwq->nr_workers--;
-	if (worker->flags & WORKER_IDLE)
-		gcwq->nr_idle--;
-
-	list_del_init(&worker->entry);
+	__disconnect_worker(worker);
 	worker->flags |= WORKER_DIE;
 
 	spin_unlock_irq(&gcwq->lock);
@@ -1464,6 +1484,23 @@ static void destroy_worker(struct worker *worker)
 	ida_remove(&gcwq->worker_ida, id);
 }
 
+static void park_idle_worker(struct worker *worker)
+{
+	struct global_cwq *gcwq = worker->gcwq;
+
+	BUG_ON(worker->current_work);
+	BUG_ON(!list_empty(&worker->scheduled));
+	BUG_ON(gcwq->first_idle);
+
+	__disconnect_worker(worker);
+	worker->flags = WORKER_PARK | WORKER_PREP;
+
+	spin_unlock_irq(&gcwq->lock);
+	kthread_park(worker->task);
+	spin_lock_irq(&gcwq->lock);
+	gcwq->first_idle = worker;
+}
+
 static void idle_worker_timeout(unsigned long __gcwq)
 {
 	struct global_cwq *gcwq = (void *)__gcwq;
@@ -1953,6 +1990,12 @@ woke_up:
 		return 0;
 	}
 
+	if (worker->flags & WORKER_PARK) {
+		spin_unlock_irq(&gcwq->lock);
+		kthread_parkme();
+		WARN_ON(!worker_maybe_bind_and_lock(worker));
+	}
+
 	worker_leave_idle(worker);
 recheck:
 	/* no more worker necessary? */
@@ -3340,6 +3383,12 @@ static int __cpuinit trustee_thread(void *__gcwq)
 
 	gcwq->flags |= GCWQ_MANAGING_WORKERS;
 
+	if (!gcwq->first_idle && !list_empty(&gcwq->idle_list)) {
+		worker = list_first_entry(&gcwq->idle_list,
+					struct worker, entry);
+		park_idle_worker(worker);
+	}
+
 	list_for_each_entry(worker, &gcwq->idle_list, entry)
 		worker->flags |= WORKER_ROGUE;
 
@@ -3516,15 +3565,7 @@ static int __devinit workqueue_cpu_callback(struct notifier_block *nfb,
 		if (IS_ERR(new_trustee))
 			return notifier_from_errno(PTR_ERR(new_trustee));
 		kthread_bind(new_trustee, cpu);
-		/* fall through */
-	case CPU_UP_PREPARE:
-		BUG_ON(gcwq->first_idle);
-		new_worker = create_worker(gcwq, false);
-		if (!new_worker) {
-			if (new_trustee)
-				kthread_stop(new_trustee);
-			return NOTIFY_BAD;
-		}
+		break;
 	}
 
 	/* some are called w/ irq disabled, don't disturb irq status */
@@ -3540,8 +3581,18 @@ static int __devinit workqueue_cpu_callback(struct notifier_block *nfb,
 		wait_trustee_state(gcwq, TRUSTEE_IN_CHARGE);
 		/* fall through */
 	case CPU_UP_PREPARE:
-		BUG_ON(gcwq->first_idle);
-		gcwq->first_idle = new_worker;
+		if (!gcwq->first_idle) {
+			spin_unlock_irq(&gcwq->lock);
+			new_worker = create_worker(gcwq, false);
+			if (!new_worker) {
+				if (new_trustee)
+					kthread_stop(new_trustee);
+				return NOTIFY_BAD;
+			}
+			spin_lock_irq(&gcwq->lock);
+			BUG_ON(gcwq->first_idle);
+			gcwq->first_idle = new_worker;
+		}
 		break;
 
 	case CPU_DYING:
@@ -3556,10 +3607,14 @@ static int __devinit workqueue_cpu_callback(struct notifier_block *nfb,
 
 	case CPU_POST_DEAD:
 		gcwq->trustee_state = TRUSTEE_BUTCHER;
-		/* fall through */
+		break;
+
 	case CPU_UP_CANCELED:
-		destroy_worker(gcwq->first_idle);
-		gcwq->first_idle = NULL;
+		/* If worker is parked, leave it parked for later */
+		if (!(gcwq->first_idle->flags & WORKER_PARK)) {
+			destroy_worker(gcwq->first_idle);
+			gcwq->first_idle = NULL;
+		}
 		break;
 
 	case CPU_DOWN_FAILED:
@@ -3570,17 +3625,22 @@ static int __devinit workqueue_cpu_callback(struct notifier_block *nfb,
 			wake_up_process(gcwq->trustee);
 			wait_trustee_state(gcwq, TRUSTEE_DONE);
 		}
+		BUG_ON(!gcwq->first_idle);
 
 		/*
 		 * Trustee is done and there might be no worker left.
 		 * Put the first_idle in and request a real manager to
 		 * take a look.
 		 */
-		spin_unlock_irq(&gcwq->lock);
-		kthread_bind(gcwq->first_idle->task, cpu);
-		spin_lock_irq(&gcwq->lock);
 		gcwq->flags |= GCWQ_MANAGE_WORKERS;
-		start_worker(gcwq->first_idle);
+		if (!(gcwq->first_idle->flags & WORKER_PARK)) {
+			spin_unlock_irq(&gcwq->lock);
+			kthread_bind(gcwq->first_idle->task, cpu);
+			spin_lock_irq(&gcwq->lock);
+			start_worker(gcwq->first_idle);
+		} else {
+			unpark_worker(gcwq->first_idle);
+		}
 		gcwq->first_idle = NULL;
 		break;
 	}

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads
  2012-06-14  4:51           ` Paul E. McKenney
  2012-06-14 11:20             ` Thomas Gleixner
@ 2012-06-14 22:40             ` Paul E. McKenney
  1 sibling, 0 replies; 53+ messages in thread
From: Paul E. McKenney @ 2012-06-14 22:40 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Srivatsa S. Bhat,
	Rusty Russell, Tejun Heo

On Wed, Jun 13, 2012 at 09:51:25PM -0700, Paul E. McKenney wrote:
> On Wed, Jun 13, 2012 at 01:47:25PM -0700, Paul E. McKenney wrote:
> > On Wed, Jun 13, 2012 at 12:17:45PM -0700, Paul E. McKenney wrote:
> > > On Wed, Jun 13, 2012 at 09:02:55PM +0200, Thomas Gleixner wrote:
> > > > On Wed, 13 Jun 2012, Paul E. McKenney wrote:
> > > > > On Wed, Jun 13, 2012 at 11:00:54AM -0000, Thomas Gleixner wrote:
> > > > > >  	/* Now call notifier in preparation. */
> > > > > >  	cpu_notify(CPU_ONLINE | mod, hcpu);
> > > > > > +	smpboot_unpark_threads(cpu);
> > > > > 
> > > > > OK, RCU must use the lower-level interfaces, given that one of
> > > > > > the CPU_ONLINE notifiers might invoke synchronize_rcu().
> > > > 
> > > > We can start the threads before the notifiers. There is no
> > > > restriction.
> > > 
> > > Sounds very good in both cases!
> > 
> > Just for reference, here is what I am using.
> 
> And here is a buggy first attempt to make RCU use the smpboot interfaces.
> Probably still bugs in my adaptation, as it still hangs in the first
> attempt to offline a CPU.  If I revert the softirq smpboot commit, the
> offline still hangs somewhere near the __stop_machine() processing, but
> the system continues running otherwise.  Will continue debugging tomorrow.
> 
> When doing this sort of conversion, renaming the per-CPU variable used
> to hold the kthreads' task_struct pointers is highly recommended --
> failing to do so cost me substantial confusion.  ;-)

OK, if it is going to actually work, I guess I can sign it off.

							Thanx, Paul

------------------------------------------------------------------------

rcu: Use smp_hotplug_thread facility for RCU's per-CPU kthread

Bring RCU into the new-age CPU-hotplug fold by modifying RCU's per-CPU
kthread code to use the new smp_hotplug_thread facility.
    
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

 rcutree.c        |    4 -
 rcutree.h        |    2 
 rcutree_plugin.h |  177 ++++++++-----------------------------------------------
 rcutree_trace.c  |    3 
 4 files changed, 27 insertions(+), 159 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 0da7b88..7813d7d 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -125,7 +125,6 @@ static int rcu_scheduler_fully_active __read_mostly;
  */
 static DEFINE_PER_CPU(struct task_struct *, rcu_cpu_kthread_task);
 DEFINE_PER_CPU(unsigned int, rcu_cpu_kthread_status);
-DEFINE_PER_CPU(int, rcu_cpu_kthread_cpu);
 DEFINE_PER_CPU(unsigned int, rcu_cpu_kthread_loops);
 DEFINE_PER_CPU(char, rcu_cpu_has_work);
 
@@ -1458,7 +1457,6 @@ static void rcu_cleanup_dead_cpu(int cpu, struct rcu_state *rsp)
 	struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */
 
 	/* Adjust any no-longer-needed kthreads. */
-	rcu_stop_cpu_kthread(cpu);
 	rcu_node_kthread_setaffinity(rnp, -1);
 
 	/* Remove the dead CPU from the bitmasks in the rcu_node hierarchy. */
@@ -2514,11 +2512,9 @@ static int __cpuinit rcu_cpu_notify(struct notifier_block *self,
 	case CPU_ONLINE:
 	case CPU_DOWN_FAILED:
 		rcu_node_kthread_setaffinity(rnp, -1);
-		rcu_cpu_kthread_setrt(cpu, 1);
 		break;
 	case CPU_DOWN_PREPARE:
 		rcu_node_kthread_setaffinity(rnp, cpu);
-		rcu_cpu_kthread_setrt(cpu, 0);
 		break;
 	case CPU_DYING:
 	case CPU_DYING_FROZEN:
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index 7f5d138..70883af 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -434,7 +434,6 @@ static int rcu_preempt_blocked_readers_cgp(struct rcu_node *rnp);
 #ifdef CONFIG_HOTPLUG_CPU
 static void rcu_report_unblock_qs_rnp(struct rcu_node *rnp,
 				      unsigned long flags);
-static void rcu_stop_cpu_kthread(int cpu);
 #endif /* #ifdef CONFIG_HOTPLUG_CPU */
 static void rcu_print_detail_task_stall(struct rcu_state *rsp);
 static int rcu_print_task_stall(struct rcu_node *rnp);
@@ -472,7 +471,6 @@ static int __cpuinit rcu_spawn_one_boost_kthread(struct rcu_state *rsp,
 static void invoke_rcu_node_kthread(struct rcu_node *rnp);
 static void rcu_yield(void (*f)(unsigned long), unsigned long arg);
 #endif /* #ifdef CONFIG_RCU_BOOST */
-static void rcu_cpu_kthread_setrt(int cpu, int to_rt);
 static void __cpuinit rcu_prepare_kthreads(int cpu);
 static void rcu_prepare_for_idle_init(int cpu);
 static void rcu_cleanup_after_idle(int cpu);
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 2411000..f789341 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -25,6 +25,7 @@
  */
 
 #include <linux/delay.h>
+#include <linux/smpboot.h>
 
 #define RCU_KTHREAD_PRIO 1
 
@@ -1449,25 +1450,6 @@ static int __cpuinit rcu_spawn_one_boost_kthread(struct rcu_state *rsp,
 	return 0;
 }
 
-#ifdef CONFIG_HOTPLUG_CPU
-
-/*
- * Stop the RCU's per-CPU kthread when its CPU goes offline,.
- */
-static void rcu_stop_cpu_kthread(int cpu)
-{
-	struct task_struct *t;
-
-	/* Stop the CPU's kthread. */
-	t = per_cpu(rcu_cpu_kthread_task, cpu);
-	if (t != NULL) {
-		per_cpu(rcu_cpu_kthread_task, cpu) = NULL;
-		kthread_stop(t);
-	}
-}
-
-#endif /* #ifdef CONFIG_HOTPLUG_CPU */
-
 static void rcu_kthread_do_work(void)
 {
 	rcu_do_batch(&rcu_sched_state, &__get_cpu_var(rcu_sched_data));
@@ -1490,30 +1472,6 @@ static void invoke_rcu_node_kthread(struct rcu_node *rnp)
 }
 
 /*
- * Set the specified CPU's kthread to run RT or not, as specified by
- * the to_rt argument.  The CPU-hotplug locks are held, so the task
- * is not going away.
- */
-static void rcu_cpu_kthread_setrt(int cpu, int to_rt)
-{
-	int policy;
-	struct sched_param sp;
-	struct task_struct *t;
-
-	t = per_cpu(rcu_cpu_kthread_task, cpu);
-	if (t == NULL)
-		return;
-	if (to_rt) {
-		policy = SCHED_FIFO;
-		sp.sched_priority = RCU_KTHREAD_PRIO;
-	} else {
-		policy = SCHED_NORMAL;
-		sp.sched_priority = 0;
-	}
-	sched_setscheduler_nocheck(t, policy, &sp);
-}
-
-/*
  * Timer handler to initiate the waking up of per-CPU kthreads that
  * have yielded the CPU due to excess numbers of RCU callbacks.
  * We wake up the per-rcu_node kthread, which in turn will wake up
@@ -1553,63 +1511,35 @@ static void rcu_yield(void (*f)(unsigned long), unsigned long arg)
 }
 
 /*
- * Handle cases where the rcu_cpu_kthread() ends up on the wrong CPU.
- * This can happen while the corresponding CPU is either coming online
- * or going offline.  We cannot wait until the CPU is fully online
- * before starting the kthread, because the various notifier functions
- * can wait for RCU grace periods.  So we park rcu_cpu_kthread() until
- * the corresponding CPU is online.
- *
- * Return 1 if the kthread needs to stop, 0 otherwise.
- *
- * Caller must disable bh.  This function can momentarily enable it.
- */
-static int rcu_cpu_kthread_should_stop(int cpu)
-{
-	while (cpu_is_offline(cpu) ||
-	       !cpumask_equal(&current->cpus_allowed, cpumask_of(cpu)) ||
-	       smp_processor_id() != cpu) {
-		if (kthread_should_stop())
-			return 1;
-		per_cpu(rcu_cpu_kthread_status, cpu) = RCU_KTHREAD_OFFCPU;
-		per_cpu(rcu_cpu_kthread_cpu, cpu) = raw_smp_processor_id();
-		local_bh_enable();
-		schedule_timeout_uninterruptible(1);
-		if (!cpumask_equal(&current->cpus_allowed, cpumask_of(cpu)))
-			set_cpus_allowed_ptr(current, cpumask_of(cpu));
-		local_bh_disable();
-	}
-	per_cpu(rcu_cpu_kthread_cpu, cpu) = cpu;
-	return 0;
-}
-
-/*
  * Per-CPU kernel thread that invokes RCU callbacks.  This replaces the
  * RCU softirq used in flavors and configurations of RCU that do not
  * support RCU priority boosting.
  */
-static int rcu_cpu_kthread(void *arg)
+static int rcu_cpu_kthread(void *cookie)
 {
-	int cpu = (int)(long)arg;
 	unsigned long flags;
+	struct sched_param sp;
 	int spincnt = 0;
-	unsigned int *statusp = &per_cpu(rcu_cpu_kthread_status, cpu);
+	unsigned int *statusp = &__get_cpu_var(rcu_cpu_kthread_status);
 	char work;
-	char *workp = &per_cpu(rcu_cpu_has_work, cpu);
+	char *workp = &__get_cpu_var(rcu_cpu_has_work);
 
-	trace_rcu_utilization("Start CPU kthread@init");
+	trace_rcu_utilization("Start CPU kthread@unpark");
+	sp.sched_priority = RCU_KTHREAD_PRIO;
+	sched_setscheduler_nocheck(current, SCHED_FIFO, &sp);
 	for (;;) {
 		*statusp = RCU_KTHREAD_WAITING;
 		trace_rcu_utilization("End CPU kthread@rcu_wait");
-		rcu_wait(*workp != 0 || kthread_should_stop());
+		rcu_wait(*workp != 0 ||
+			 smpboot_thread_check_parking(cookie));
 		trace_rcu_utilization("Start CPU kthread@rcu_wait");
 		local_bh_disable();
-		if (rcu_cpu_kthread_should_stop(cpu)) {
+		if (smpboot_thread_check_parking(cookie)) {
 			local_bh_enable();
 			break;
 		}
 		*statusp = RCU_KTHREAD_RUNNING;
-		per_cpu(rcu_cpu_kthread_loops, cpu)++;
+		this_cpu_inc(rcu_cpu_kthread_loops);
 		local_irq_save(flags);
 		work = *workp;
 		*workp = 0;
@@ -1624,59 +1554,14 @@ static int rcu_cpu_kthread(void *arg)
 		if (spincnt > 10) {
 			*statusp = RCU_KTHREAD_YIELDING;
 			trace_rcu_utilization("End CPU kthread@rcu_yield");
-			rcu_yield(rcu_cpu_kthread_timer, (unsigned long)cpu);
+			rcu_yield(rcu_cpu_kthread_timer,
+				  smp_processor_id());
 			trace_rcu_utilization("Start CPU kthread@rcu_yield");
 			spincnt = 0;
 		}
 	}
 	*statusp = RCU_KTHREAD_STOPPED;
-	trace_rcu_utilization("End CPU kthread@term");
-	return 0;
-}
-
-/*
- * Spawn a per-CPU kthread, setting up affinity and priority.
- * Because the CPU hotplug lock is held, no other CPU will be attempting
- * to manipulate rcu_cpu_kthread_task.  There might be another CPU
- * attempting to access it during boot, but the locking in kthread_bind()
- * will enforce sufficient ordering.
- *
- * Please note that we cannot simply refuse to wake up the per-CPU
- * kthread because kthreads are created in TASK_UNINTERRUPTIBLE state,
- * which can result in softlockup complaints if the task ends up being
- * idle for more than a couple of minutes.
- *
- * However, please note also that we cannot bind the per-CPU kthread to its
- * CPU until that CPU is fully online.  We also cannot wait until the
- * CPU is fully online before we create its per-CPU kthread, as this would
- * deadlock the system when CPU notifiers tried waiting for grace
- * periods.  So we bind the per-CPU kthread to its CPU only if the CPU
- * is online.  If its CPU is not yet fully online, then the code in
- * rcu_cpu_kthread() will wait until it is fully online, and then do
- * the binding.
- */
-static int __cpuinit rcu_spawn_one_cpu_kthread(int cpu)
-{
-	struct sched_param sp;
-	struct task_struct *t;
-
-	if (!rcu_scheduler_fully_active ||
-	    per_cpu(rcu_cpu_kthread_task, cpu) != NULL)
-		return 0;
-	t = kthread_create_on_node(rcu_cpu_kthread,
-				   (void *)(long)cpu,
-				   cpu_to_node(cpu),
-				   "rcuc/%d", cpu);
-	if (IS_ERR(t))
-		return PTR_ERR(t);
-	if (cpu_online(cpu))
-		kthread_bind(t, cpu);
-	per_cpu(rcu_cpu_kthread_cpu, cpu) = cpu;
-	WARN_ON_ONCE(per_cpu(rcu_cpu_kthread_task, cpu) != NULL);
-	sp.sched_priority = RCU_KTHREAD_PRIO;
-	sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
-	per_cpu(rcu_cpu_kthread_task, cpu) = t;
-	wake_up_process(t); /* Get to TASK_INTERRUPTIBLE quickly. */
+	trace_rcu_utilization("End CPU kthread@park");
 	return 0;
 }
 
@@ -1788,6 +1673,12 @@ static int __cpuinit rcu_spawn_one_node_kthread(struct rcu_state *rsp,
 	return rcu_spawn_one_boost_kthread(rsp, rnp, rnp_index);
 }
 
+static struct smp_hotplug_thread rcu_cpu_thread_spec = {
+	.store = &rcu_cpu_kthread_task,
+	.thread_fn = rcu_cpu_kthread,
+	.thread_comm = "rcuc/%u",
+};
+
 /*
  * Spawn all kthreads -- called as soon as the scheduler is running.
  */
@@ -1797,11 +1688,9 @@ static int __init rcu_spawn_kthreads(void)
 	struct rcu_node *rnp;
 
 	rcu_scheduler_fully_active = 1;
-	for_each_possible_cpu(cpu) {
+	BUG_ON(smpboot_register_percpu_thread(&rcu_cpu_thread_spec));
+	for_each_possible_cpu(cpu)
 		per_cpu(rcu_cpu_has_work, cpu) = 0;
-		if (cpu_online(cpu))
-			(void)rcu_spawn_one_cpu_kthread(cpu);
-	}
 	rnp = rcu_get_root(rcu_state);
 	(void)rcu_spawn_one_node_kthread(rcu_state, rnp);
 	if (NUM_RCU_NODES > 1) {
@@ -1818,11 +1707,9 @@ static void __cpuinit rcu_prepare_kthreads(int cpu)
 	struct rcu_node *rnp = rdp->mynode;
 
 	/* Fire up the incoming CPU's kthread and leaf rcu_node kthread. */
-	if (rcu_scheduler_fully_active) {
-		(void)rcu_spawn_one_cpu_kthread(cpu);
-		if (rnp->node_kthread_task == NULL)
-			(void)rcu_spawn_one_node_kthread(rcu_state, rnp);
-	}
+	if (rcu_scheduler_fully_active &&
+	    rnp->node_kthread_task == NULL)
+		(void)rcu_spawn_one_node_kthread(rcu_state, rnp);
 }
 
 #else /* #ifdef CONFIG_RCU_BOOST */
@@ -1846,22 +1733,10 @@ static void rcu_preempt_boost_start_gp(struct rcu_node *rnp)
 {
 }
 
-#ifdef CONFIG_HOTPLUG_CPU
-
-static void rcu_stop_cpu_kthread(int cpu)
-{
-}
-
-#endif /* #ifdef CONFIG_HOTPLUG_CPU */
-
 static void rcu_node_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
 {
 }
 
-static void rcu_cpu_kthread_setrt(int cpu, int to_rt)
-{
-}
-
 static int __init rcu_scheduler_really_started(void)
 {
 	rcu_scheduler_fully_active = 1;
diff --git a/kernel/rcutree_trace.c b/kernel/rcutree_trace.c
index d4bc16d..6b4c76b 100644
--- a/kernel/rcutree_trace.c
+++ b/kernel/rcutree_trace.c
@@ -83,11 +83,10 @@ static void print_one_rcu_data(struct seq_file *m, struct rcu_data *rdp)
 			rdp->nxttail[RCU_WAIT_TAIL]],
 		   ".D"[&rdp->nxtlist != rdp->nxttail[RCU_DONE_TAIL]]);
 #ifdef CONFIG_RCU_BOOST
-	seq_printf(m, " kt=%d/%c/%d ktl=%x",
+	seq_printf(m, " kt=%d/%c ktl=%x",
 		   per_cpu(rcu_cpu_has_work, rdp->cpu),
 		   convert_kthread_status(per_cpu(rcu_cpu_kthread_status,
 					  rdp->cpu)),
-		   per_cpu(rcu_cpu_kthread_cpu, rdp->cpu),
 		   per_cpu(rcu_cpu_kthread_loops, rdp->cpu) & 0xffff);
 #endif /* #ifdef CONFIG_RCU_BOOST */
 	seq_printf(m, " b=%ld", rdp->blimit);


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [RFC patch 1/5] kthread: Implement park/unpark facility
  2012-06-14 22:13   ` Silas Boyd-Wickizer
@ 2012-06-15  1:44     ` Tejun Heo
  0 siblings, 0 replies; 53+ messages in thread
From: Tejun Heo @ 2012-06-15  1:44 UTC (permalink / raw)
  To: Silas Boyd-Wickizer
  Cc: Thomas Gleixner, LKML, Peter Zijlstra, Ingo Molnar,
	Srivatsa S. Bhat, Rusty Russell, Paul E. McKenney

On Thu, Jun 14, 2012 at 03:13:31PM -0700, Silas Boyd-Wickizer wrote:
> Hello,
> 
> Attached is a rough patch that uses park/unpark for workqueue idle
> threads.  The patch is a hack, it probably has bugs, and it certainly
> doesn't simplify the workqueue code.  Perhaps, however, the patch
> might be useful in guiding future changes to smp_boot_threads or
> workqueues.
> 
> A difficulty in applying park/unpark (or smp_boot_threads
> potentially) is that there is no single per-CPU thread for each CPU.
> That is, the thread that starts out as the initial worker/idle thread
> could be different from any thread that is idling when a CPU goes
> offline.  An additional complication is that a per-cpu worker that is
> idle may have initially been unbound (e.g. worker creation in the
> trustee_thread can cause this).

I haven't looked at it in any detail yet but if the goal is just
keeping workers around for temporarily down CPUs, workqueue changes
can be pretty simple.  I *think* all that's necessary is trustee not
butchering on cpu down.  wq already has mechanisms to recycle dangling
workers when a cpu comes back online.  I'll give it a shot next week.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads
  2012-06-14  8:17         ` Peter Zijlstra
@ 2012-06-15  1:53           ` Tejun Heo
  2012-06-15  9:58             ` Peter Zijlstra
  0 siblings, 1 reply; 53+ messages in thread
From: Tejun Heo @ 2012-06-15  1:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Paul E. McKenney, LKML, Ingo Molnar,
	Srivatsa S. Bhat, Rusty Russell

On Thu, Jun 14, 2012 at 10:17:28AM +0200, Peter Zijlstra wrote:
> On Thu, 2012-06-14 at 10:08 +0200, Peter Zijlstra wrote:
> > On Wed, 2012-06-13 at 20:56 +0200, Thomas Gleixner wrote:
> > > If it's just a spurious wakeup then it goes back to sleep right away
> > > as nothing cleared the park bit. 
> > 
> > Your spurious wakeup will have destroyed the binding though. So you need
> > to be careful.
> 
> We should probably do something like the below..
> 
> TJ, does this wreck workqueues? It's somewhat 'creative' in that regard
> and really wants fixing.
> 
> ---
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5018,6 +5018,8 @@ void do_set_cpus_allowed(struct task_str
>  
>  	cpumask_copy(&p->cpus_allowed, new_mask);
>  	p->nr_cpus_allowed = cpumask_weight(new_mask);
> +	if (p->nr_cpus_allowed != 1)
> +		p->flags &= ~PF_THREAD_BOUND;

The only reason wq workers use PF_THREAD_BOUND is to prevent userland
from mucking with cpus_allowed, so the above wouldn't break anything
in itself although userland would be able to wreck it afterwards.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads
  2012-06-15  1:53           ` Tejun Heo
@ 2012-06-15  9:58             ` Peter Zijlstra
  0 siblings, 0 replies; 53+ messages in thread
From: Peter Zijlstra @ 2012-06-15  9:58 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Thomas Gleixner, Paul E. McKenney, LKML, Ingo Molnar,
	Srivatsa S. Bhat, Rusty Russell

On Fri, 2012-06-15 at 10:53 +0900, Tejun Heo wrote:
> On Thu, Jun 14, 2012 at 10:17:28AM +0200, Peter Zijlstra wrote:
> > On Thu, 2012-06-14 at 10:08 +0200, Peter Zijlstra wrote:
> > > On Wed, 2012-06-13 at 20:56 +0200, Thomas Gleixner wrote:
> > > > If it's just a spurious wakeup then it goes back to sleep right away
> > > > as nothing cleared the park bit. 
> > > 
> > > Your spurious wakeup will have destroyed the binding though. So you need
> > > to be careful.
> > 
> > We should probably do something like the below..
> > 
> > TJ, does this wreck workqueues? It's somewhat 'creative' in that regard
> > and really wants fixing.
> > 
> > ---
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -5018,6 +5018,8 @@ void do_set_cpus_allowed(struct task_str
> >  
> >  	cpumask_copy(&p->cpus_allowed, new_mask);
> >  	p->nr_cpus_allowed = cpumask_weight(new_mask);
> > +	if (p->nr_cpus_allowed != 1)
> > +		p->flags &= ~PF_THREAD_BOUND;
> 
> The only reason wq workers use PF_THREAD_BOUND is to prevent userland
> from mucking with cpus_allowed, so the above wouldn't break anything
> in itself although userland would be able to wreck it afterwards.

Thing is, if userland moving a thread to a different cpu could wreck
things, then the kernel just wrecked them by doing that very same
thing.

PF_THREAD_BOUND isn't called PF_NO_USER_AFFINITY (although that seems
to be a popular interpretation).
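
For reference, the binding is enforced in set_cpus_allowed_ptr(), which
trips kernel callers just the same (quoting from memory, roughly the
2012-era check):

	/* roughly the current check in kernel/sched/core.c: */
	if (unlikely((p->flags & PF_THREAD_BOUND) && p != current)) {
		ret = -EINVAL;
		goto out;
	}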



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 5/5] infiniband: ehca: Use hotplug thread infrastructure
  2012-06-13 11:00 ` [RFC patch 5/5] infiniband: ehca: " Thomas Gleixner
@ 2012-06-18  6:30   ` Rusty Russell
  2012-06-18  8:08     ` Thomas Gleixner
  0 siblings, 1 reply; 53+ messages in thread
From: Rusty Russell @ 2012-06-18  6:30 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: Peter Zijlstra, Ingo Molnar, Srivatsa S. Bhat, Paul E. McKenney,
	Tejun Heo

On Wed, 13 Jun 2012 11:00:56 -0000, Thomas Gleixner <tglx@linutronix.de> wrote:
> @@ -662,10 +663,15 @@ static inline int find_next_online_cpu(s
>  		ehca_dmp(cpu_online_mask, cpumask_size(), "");
>  
>  	spin_lock_irqsave(&pool->last_cpu_lock, flags);
> -	cpu = cpumask_next(pool->last_cpu, cpu_online_mask);
> -	if (cpu >= nr_cpu_ids)
> -		cpu = cpumask_first(cpu_online_mask);
> -	pool->last_cpu = cpu;
> +	while (1) {
> +		cpu = cpumask_next(pool->last_cpu, cpu_online_mask);
> +		if (cpu >= nr_cpu_ids)
> +			cpu = cpumask_first(cpu_online_mask);
> +		pool->last_cpu = cpu;
> +		/* Might be on the way out */
> +		if (per_cpu_ptr(pool->cpu_comp_tasks, cpu)->active)
> +			break;
> +	}

Heh, isn't this what we used to call a "do while" loop? :)

Your infrastructure is a really weird mix.  On the one hand, it's a set
of callbacks: setup, cleanup, park, unpark.  Cool.

On the other hand, instead of a 'run' callback, you've got a thread_fn,
which has to loop and call smpboot_thread_check_parking().

If you just had the thread_fn, it'd be trivial to follow program flow.
If you just had the callbacks, it'd still be pretty easy, though it
seems like a little too much help.

As it is, we have Paul doing setup stuff inside his thread_fn:

     +	trace_rcu_utilization("Start CPU kthread@unpark");
     +	sp.sched_priority = RCU_KTHREAD_PRIO;
     +	sched_setscheduler_nocheck(current, SCHED_FIFO, &sp);

I'm just not sure this complexity wins us anything.  Why not
just let people "register_percpu_kthread" and make the thread_fn
identical to normal kthread fns:

        while (!kthread_should_stop()) {
                if (kthread_should_park()) {
                        kthread_parkme();
                        continue;
                }
                do_stuff();
        }

Maybe implement a 'bool kthread_stop_or_park()' helper.
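
Something like this, perhaps (completely untested sketch, assuming the
kthread_should_park()/kthread_parkme() primitives from patch 1/5):

        /* Park when asked; return true when the thread should exit. */
        static bool kthread_stop_or_park(void)
        {
                while (!kthread_should_stop()) {
                        if (!kthread_should_park())
                                return false;   /* keep running */
                        kthread_parkme();       /* sleep until unparked */
                }
                return true;
        }

Then the canonical thread_fn collapses to:

        while (!kthread_stop_or_park())
                do_stuff();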

I'll whip up a patch on top of yours if you don't think it's crazy...

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 5/5] infiniband: ehca: Use hotplug thread infrastructure
  2012-06-18  6:30   ` Rusty Russell
@ 2012-06-18  8:08     ` Thomas Gleixner
  2012-06-24 10:29       ` Thomas Gleixner
  0 siblings, 1 reply; 53+ messages in thread
From: Thomas Gleixner @ 2012-06-18  8:08 UTC (permalink / raw)
  To: Rusty Russell
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Srivatsa S. Bhat,
	Paul E. McKenney, Tejun Heo

On Mon, 18 Jun 2012, Rusty Russell wrote:

> On Wed, 13 Jun 2012 11:00:56 -0000, Thomas Gleixner <tglx@linutronix.de> wrote:
> > @@ -662,10 +663,15 @@ static inline int find_next_online_cpu(s
> >  		ehca_dmp(cpu_online_mask, cpumask_size(), "");
> >  
> >  	spin_lock_irqsave(&pool->last_cpu_lock, flags);
> > -	cpu = cpumask_next(pool->last_cpu, cpu_online_mask);
> > -	if (cpu >= nr_cpu_ids)
> > -		cpu = cpumask_first(cpu_online_mask);
> > -	pool->last_cpu = cpu;
> > +	while (1) {
> > +		cpu = cpumask_next(pool->last_cpu, cpu_online_mask);
> > +		if (cpu >= nr_cpu_ids)
> > +			cpu = cpumask_first(cpu_online_mask);
> > +		pool->last_cpu = cpu;
> > +		/* Might be on the way out */
> > +		if (per_cpu_ptr(pool->cpu_comp_tasks, cpu)->active)
> > +			break;
> > +	}
> 
> Heh, isn't this what we used to call a "do while" loop? :)

Yep :)
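
i.e. the same loop spelled as what it is (untested):

	do {
		cpu = cpumask_next(pool->last_cpu, cpu_online_mask);
		if (cpu >= nr_cpu_ids)
			cpu = cpumask_first(cpu_online_mask);
		pool->last_cpu = cpu;
		/* Might be on the way out */
	} while (!per_cpu_ptr(pool->cpu_comp_tasks, cpu)->active);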
 
> Your infrastructure is a really weird mix.  On the one hand, it's a set
> of callbacks: setup, cleanup, park, unpark.  Cool.
> 
> On the other hand, instead of a 'run' callback, you've got a thread_fn,
> which has to loop and call smpboot_thread_check_parking().
> 
> If you just had the thread_fn, it'd be trivial to follow program flow.
> If you just had the callbacks, it'd still be pretty easy, though it
> seems like a little too much help.

The reason why I moved stuff into the callbacks and have the state
machine inside of smpboot_thread_check_parking() is that every user
has to do that, i.e. keep track of the state so you won't set up or
tear down stuff twice.

Now look at that function and copy it into all users of the park
infrastructure. Not pretty. I had it that way and it was fugly as
hell. That's why I came up with the callbacks and the generic state
machine.
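
Condensed, the generic side tracks the state once for everybody
(simplified sketch; the exact names in the patch may differ):

	/* sketch, condensed from the infrastructure patch */
	enum {
		HP_THREAD_NONE = 0,
		HP_THREAD_ACTIVE,
		HP_THREAD_PARKED,
	};

	struct smpboot_thread_data {
		unsigned int			cpu;
		unsigned int			status;
		struct smp_hotplug_thread	*ht;
	};

smpboot_thread_check_parking() compares the current state against
td->status and invokes setup/unpark/park exactly once per transition,
so no user can get the bookkeeping wrong.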
 
> As it is, we have Paul doing setup stuff inside his thread_fn:
> 
>      +	trace_rcu_utilization("Start CPU kthread@unpark");
>      +	sp.sched_priority = RCU_KTHREAD_PRIO;
>      +	sched_setscheduler_nocheck(current, SCHED_FIFO, &sp);

I fixed that already :)
 
Thanks,

	tglx

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads
  2012-06-13 11:00 ` [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads Thomas Gleixner
                     ` (2 preceding siblings ...)
  2012-06-14  8:31   ` Srivatsa S. Bhat
@ 2012-06-18  8:47   ` Cyrill Gorcunov
  2012-06-18  8:50     ` Thomas Gleixner
  3 siblings, 1 reply; 53+ messages in thread
From: Cyrill Gorcunov @ 2012-06-18  8:47 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Srivatsa S. Bhat,
	Rusty Russell, Paul E. McKenney, Tejun Heo

On Wed, Jun 13, 2012 at 11:00:54AM -0000, Thomas Gleixner wrote:
> +
> +static int
> +__smpboot_create_thread(struct smp_hotplug_thread *ht, unsigned int cpu)
> +{
> +	struct task_struct *tsk = *per_cpu_ptr(ht->store, cpu);
> +	struct smpboot_thread_data *td;
> +
> +	if (tsk)
> +		return 0;
> +
> +	td = kzalloc_node(sizeof(*td), GFP_KERNEL, cpu_to_node(cpu));
> +	if (!td)
> +		return -ENOMEM;
> +	td->cpu = cpu;
> +	td->ht = ht;
> +
> +	tsk = kthread_create_on_cpu(ht->thread_fn, td, cpu, ht->thread_comm);
> +	if (IS_ERR(tsk))
> +		return PTR_ERR(tsk);
> +

Hi Thomas, I might be missing something obvious, but won't we leak the
td allocated here if kthread_create_on_cpu() fails?

	Cyrill

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads
  2012-06-18  8:47   ` Cyrill Gorcunov
@ 2012-06-18  8:50     ` Thomas Gleixner
  0 siblings, 0 replies; 53+ messages in thread
From: Thomas Gleixner @ 2012-06-18  8:50 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Srivatsa S. Bhat,
	Rusty Russell, Paul E. McKenney, Tejun Heo

On Mon, 18 Jun 2012, Cyrill Gorcunov wrote:
> On Wed, Jun 13, 2012 at 11:00:54AM -0000, Thomas Gleixner wrote:
> > +
> > +static int
> > +__smpboot_create_thread(struct smp_hotplug_thread *ht, unsigned int cpu)
> > +{
> > +	struct task_struct *tsk = *per_cpu_ptr(ht->store, cpu);
> > +	struct smpboot_thread_data *td;
> > +
> > +	if (tsk)
> > +		return 0;
> > +
> > +	td = kzalloc_node(sizeof(*td), GFP_KERNEL, cpu_to_node(cpu));
> > +	if (!td)
> > +		return -ENOMEM;
> > +	td->cpu = cpu;
> > +	td->ht = ht;
> > +
> > +	tsk = kthread_create_on_cpu(ht->thread_fn, td, cpu, ht->thread_comm);
> > +	if (IS_ERR(tsk))
> > +		return PTR_ERR(tsk);
> > +
> 
> Hi Thomas, I might be missing something obvious, but won't we leak the
> td allocated here if kthread_create_on_cpu() fails?

Yes. Good catch!
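
Will fold in the obvious fix, i.e. something like:

	tsk = kthread_create_on_cpu(ht->thread_fn, td, cpu, ht->thread_comm);
	if (IS_ERR(tsk)) {
		kfree(td);	/* don't leak td when thread creation fails */
		return PTR_ERR(tsk);
	}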

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC patch 5/5] infiniband: ehca: Use hotplug thread infrastructure
  2012-06-18  8:08     ` Thomas Gleixner
@ 2012-06-24 10:29       ` Thomas Gleixner
  0 siblings, 0 replies; 53+ messages in thread
From: Thomas Gleixner @ 2012-06-24 10:29 UTC (permalink / raw)
  To: Rusty Russell
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Srivatsa S. Bhat,
	Paul E. McKenney, Tejun Heo

On Mon, 18 Jun 2012, Thomas Gleixner wrote:
> > Your infrastructure is a really weird mix.  On the one hand, it's a set
> > of callbacks: setup, cleanup, park, unpark.  Cool.
> > 
> > On the other hand, instead of a 'run' callback, you've got a thread_fn,
> > which has to loop and call smpboot_thread_check_parking().
> > 
> > If you just had the thread_fn, it'd be trivial to follow program flow.
> > If you just had the callbacks, it'd still be pretty easy, though it
> > seems like a little too much help.

I reread your reply with less sleep deprivation and now I understand
your suggestion of using a run callback :)

Yep, that makes sense as it moves the loop into the generic code. Will
have a go at that.
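
Roughly this shape, I think (untested; the 'run' callback name is made
up):

	static int smpboot_thread_fn(void *data)
	{
		struct smpboot_thread_data *td = data;
		struct smp_hotplug_thread *ht = td->ht;

		while (!kthread_should_stop()) {
			if (kthread_should_park()) {
				kthread_parkme();
				continue;
			}
			/* setup/park/unpark transitions handled here */
			ht->run(td->cpu);	/* 'run' callback, name made up */
		}
		return 0;
	}

so users only supply the callbacks and never write the loop themselves.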

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 53+ messages in thread

end of thread, newest: 2012-06-24 10:29 UTC

Thread overview: 53+ messages
2012-06-13 11:00 [RFC patch 0/5] Per cpu thread hotplug infrastructure Thomas Gleixner
2012-06-13 11:00 ` [RFC patch 2/5] smpboot: Provide infrastructure for percpu hotplug threads Thomas Gleixner
2012-06-13 18:33   ` Paul E. McKenney
2012-06-13 18:56     ` Thomas Gleixner
2012-06-14  8:08       ` Peter Zijlstra
2012-06-14  8:17         ` Peter Zijlstra
2012-06-15  1:53           ` Tejun Heo
2012-06-15  9:58             ` Peter Zijlstra
2012-06-14  8:37         ` Thomas Gleixner
2012-06-14  8:38           ` Peter Zijlstra
2012-06-13 18:57   ` Paul E. McKenney
2012-06-13 19:02     ` Thomas Gleixner
2012-06-13 19:17       ` Paul E. McKenney
2012-06-13 20:47         ` Paul E. McKenney
2012-06-14  4:51           ` Paul E. McKenney
2012-06-14 11:20             ` Thomas Gleixner
2012-06-14 12:59               ` Paul E. McKenney
2012-06-14 13:01                 ` Peter Zijlstra
2012-06-14 14:47                   ` Paul E. McKenney
2012-06-14 14:56                     ` Peter Zijlstra
2012-06-14 15:07                       ` Thomas Gleixner
2012-06-14 15:45                       ` Paul E. McKenney
2012-06-14 16:20                         ` Thomas Gleixner
2012-06-14 13:32                 ` Thomas Gleixner
2012-06-14 13:47                   ` Thomas Gleixner
2012-06-14 14:12                     ` Thomas Gleixner
2012-06-14 15:01                       ` Paul E. McKenney
2012-06-14 15:08                         ` Thomas Gleixner
2012-06-14 14:50                   ` Paul E. McKenney
2012-06-14 15:02                     ` Thomas Gleixner
2012-06-14 15:38                     ` Paul E. McKenney
2012-06-14 16:19                       ` Thomas Gleixner
2012-06-14 16:48                         ` Paul E. McKenney
2012-06-14 22:40             ` Paul E. McKenney
2012-06-14  8:31   ` Srivatsa S. Bhat
2012-06-14  8:44     ` Thomas Gleixner
2012-06-18  8:47   ` Cyrill Gorcunov
2012-06-18  8:50     ` Thomas Gleixner
2012-06-13 11:00 ` [RFC patch 1/5] kthread: Implement park/unpark facility Thomas Gleixner
2012-06-14  2:32   ` Namhyung Kim
2012-06-14  8:35     ` Thomas Gleixner
2012-06-14  8:59       ` Thomas Gleixner
2012-06-14  8:36   ` Srivatsa S. Bhat
2012-06-14 20:01   ` Silas Boyd-Wickizer
2012-06-14 20:13     ` Thomas Gleixner
2012-06-14 22:13   ` Silas Boyd-Wickizer
2012-06-15  1:44     ` Tejun Heo
2012-06-13 11:00 ` [RFC patch 3/5] softirq: Use hotplug thread infrastructure Thomas Gleixner
2012-06-13 11:00 ` [RFC patch 5/5] infiniband: ehca: " Thomas Gleixner
2012-06-18  6:30   ` Rusty Russell
2012-06-18  8:08     ` Thomas Gleixner
2012-06-24 10:29       ` Thomas Gleixner
2012-06-13 11:00 ` [RFC patch 4/5] watchdog: " Thomas Gleixner
