linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] kthread_create
@ 2003-12-31  3:31 Rusty Russell
  2003-12-31  4:33 ` Jeff Garzik
                   ` (2 more replies)
  0 siblings, 3 replies; 47+ messages in thread
From: Rusty Russell @ 2003-12-31  3:31 UTC (permalink / raw)
  To: torvalds, akpm; +Cc: mingo, linux-kernel

Hi all,

	Ingo read through this before and liked it: this is the basis
of the Hotplug CPU patch, and as such has been stressed fairly well.
Tested stand-alone, and included here for wider review.

Feedback welcome!
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

Name: Kernel Thread Control Primitives
Author: Rusty Russell
Status: Tested on 2.6.0-bk3

D: The hotplug CPU code introduces two major problems:
D: 
D: 1) Threads which previously never stopped (migration thread,
D:    ksoftirqd, keventd) have to be stopped cleanly as CPUs go offline.
D: 2) Threads which previously never had to be created now have
D:    to be created when a CPU goes online.
D: 
D: Unfortunately, stopping a thread is fairly baroque, involving memory
D: barriers, a completion and spinning until the task is actually dead.
D: 
D: There are also three problems in starting a thread:
D: 1) Doing it from a random process context risks environment contamination:
D:    better to do it from keventd to guarantee a clean environment, a-la
D:    call_usermodehelper.
D: 2) Getting the task struct without races is a hard: see kernel/sched.c
D:    migration_call(), kernel/workqueue.c create_workqueue_thread().
D: 3) There are races in starting a thread for a CPU which is not yet
D:    online: migration thread does a complex dance at the moment for
D:    a similar reason (there may be no migration thread to migrate us).
D: 
D: Place all this logic in some primitives to make life easier:
D: kthread_create(), kthread_start() and kthread_destroy().  These
D: primitives require no extra data-structures in the caller: they operate
D: on normal "struct task_struct"s.
D: 
D: Other changes:
D:   - Expose keventd_up(), as keventd and migration threads will use
D:     kthread to launch, and kthread normally uses workqueues and must
D:     recognize this case.

diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .32392-linux-2.6.0-test11-bk6/include/linux/kthread.h .32392-linux-2.6.0-test11-bk6.updated/include/linux/kthread.h
--- .32392-linux-2.6.0-test11-bk6/include/linux/kthread.h	1970-01-01 10:00:00.000000000 +1000
+++ .32392-linux-2.6.0-test11-bk6.updated/include/linux/kthread.h	2003-12-09 17:42:29.000000000 +1100
@@ -0,0 +1,38 @@
+#ifndef _LINUX_KTHREAD_H
+#define _LINUX_KTHREAD_H
+/* Simple abstraction for kernel thread usage: the initfn is called at
+ * the start, if that's successful (ie. returns 0), then the thread
+ * sleeps.
+ *
+ * Every time the thread is woken, it will run corefn, until that
+ * returns an error.  The thread must be ended by calling
+ * kthread_destroy().
+ */
+#include <linux/err.h>
+struct task_struct;
+
+/* Part I: create a kthread: if fork fails return ERR_PTR(-errno). */
+struct task_struct *kthread_create(int (*initfn)(void *data),
+				   int (*corefn)(void *data),
+				   void *data,
+				   const char namefmt[], ...);
+
+/* Part II: have thread call initfn(); return thread if successful,
+   otherwise ERR_PTR(-errno). */
+struct task_struct *kthread_start(struct task_struct *k);
+
+/* Convenient wrapper for both of the above. */
+#define kthread_run(initfn, corefn, data, namefmt, ...)			      \
+({									      \
+	struct task_struct *__k						      \
+		= kthread_create(initfn,corefn,data,namefmt, ## __VA_ARGS__); \
+	if (!IS_ERR(__k))						      \
+		__k = kthread_start(__k);				      \
+	__k;								      \
+})
+
+/* Stop the thread.  Return value is last return of corefn() (ie. zero
+ * if exited as normal).  Can be called before kthread_start(). */
+int kthread_destroy(struct task_struct *k);
+
+#endif /* _LINUX_KTHREAD_H */
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .32392-linux-2.6.0-test11-bk6/include/linux/workqueue.h .32392-linux-2.6.0-test11-bk6.updated/include/linux/workqueue.h
--- .32392-linux-2.6.0-test11-bk6/include/linux/workqueue.h	2003-09-22 10:07:08.000000000 +1000
+++ .32392-linux-2.6.0-test11-bk6.updated/include/linux/workqueue.h	2003-12-09 17:42:29.000000000 +1100
@@ -60,6 +60,7 @@ extern int FASTCALL(schedule_work(struct
 extern int FASTCALL(schedule_delayed_work(struct work_struct *work, unsigned long delay));
 extern void flush_scheduled_work(void);
 extern int current_is_keventd(void);
+extern int keventd_up(void);
 
 extern void init_workqueues(void);
 
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .32392-linux-2.6.0-test11-bk6/kernel/Makefile .32392-linux-2.6.0-test11-bk6.updated/kernel/Makefile
--- .32392-linux-2.6.0-test11-bk6/kernel/Makefile	2003-10-09 18:03:02.000000000 +1000
+++ .32392-linux-2.6.0-test11-bk6.updated/kernel/Makefile	2003-12-09 17:42:29.000000000 +1100
@@ -6,7 +6,8 @@ obj-y     = sched.o fork.o exec_domain.o
 	    exit.o itimer.o time.o softirq.o resource.o \
 	    sysctl.o capability.o ptrace.o timer.o user.o \
 	    signal.o sys.o kmod.o workqueue.o pid.o \
-	    rcupdate.o intermodule.o extable.o params.o posix-timers.o
+	    rcupdate.o intermodule.o extable.o params.o posix-timers.o \
+	    kthread.o
 
 obj-$(CONFIG_FUTEX) += futex.o
 obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .32392-linux-2.6.0-test11-bk6/kernel/kthread.c .32392-linux-2.6.0-test11-bk6.updated/kernel/kthread.c
--- .32392-linux-2.6.0-test11-bk6/kernel/kthread.c	1970-01-01 10:00:00.000000000 +1000
+++ .32392-linux-2.6.0-test11-bk6.updated/kernel/kthread.c	2003-12-09 17:42:29.000000000 +1100
@@ -0,0 +1,228 @@
+/* Kernel thread helper functions.
+ *   Copyright (C) 2003 IBM Corporation, Rusty Russell.
+ *
+ * Everything uses keventd, so that we get a clean environment even if
+ * we're invoked from userspace (think modprobe, hotplug cpu).
+ */
+#include <linux/sched.h>
+#include <linux/kthread.h>
+#include <linux/completion.h>
+#include <linux/err.h>
+#include <asm/semaphore.h>
+
+/* This makes sure only one kthread is being talked to at once. */
+static DECLARE_MUTEX(kthread_control);
+
+/* This coordinates communication between the kthread and routine
+ * controlling it.  Strictly unneccessary, but without it it's barrier
+ * hell. */
+static spinlock_t ktm_lock = SPIN_LOCK_UNLOCKED;
+
+/* All thread comms is command -> ack, so we keep it simple. */
+struct kt_message
+{
+	struct task_struct *from, *to;
+	void *info;
+};
+
+static struct kt_message ktm;
+
+static void ktm_send(struct task_struct *to, void *info)
+{
+	spin_lock(&ktm_lock);
+	ktm.to = to;
+	ktm.from = current;
+	ktm.info = info;
+	if (ktm.to)
+		wake_up_process(ktm.to);
+	spin_unlock(&ktm_lock);
+}
+
+static struct kt_message ktm_receive(void)
+{
+	struct kt_message m;
+
+	for (;;) {
+		spin_lock(&ktm_lock);
+		if (ktm.to == current)
+			break;
+		current->state = TASK_INTERRUPTIBLE;
+		spin_unlock(&ktm_lock);
+		schedule();
+	}
+	m = ktm;
+	spin_unlock(&ktm_lock);
+	return m;
+}
+
+struct kthread
+{
+	int (*initfn)(void *data);
+	int (*corefn)(void *data);
+	void *data;
+	char *name;
+};
+
+/* Check if we're being told to stop. */
+static int time_to_die(struct kt_message *m)
+{
+        int ret = 0;
+
+        spin_lock(&ktm_lock);
+        if (ktm.to == current && ktm.info == NULL) {
+                *m = ktm;
+                ret = 1;
+        }
+        spin_unlock(&ktm_lock);
+        return ret;
+}
+
+static int kthread(void *data)
+{
+	/* Copy data: it's on keventd_init's stack */
+	struct kthread k = *(struct kthread *)data;
+	struct kt_message m;
+	int ret = 0;
+	sigset_t blocked;
+
+	strcpy(current->comm, k.name);
+
+	/* Block and flush all signals. */
+	sigfillset(&blocked);
+	sigprocmask(SIG_BLOCK, &blocked, NULL);
+	flush_signals(current);
+
+	/* Send to spawn_kthread, so it knows who we are. */
+	ktm_send(ktm.info, current);
+
+	/* Receive from kthread_start or kthread_destroy */
+	m = ktm_receive();
+	if (!m.info)
+		goto stop;
+	if (k.initfn && (ret = k.initfn(k.data)) < 0)
+		goto stop;
+	ktm_send(m.from, current);
+
+	for (;;) {
+		if (time_to_die(&m))
+			break;
+
+		/* If it fails, just wait until kthread_destroy. */
+		if (k.corefn && (ret = k.corefn(k.data)) < 0)
+			k.corefn = NULL;
+
+		if (time_to_die(&m))
+			break;
+
+		schedule();
+	}
+
+	current->state = TASK_RUNNING;
+stop:
+	ktm_send(m.from, ERR_PTR(ret));
+	return ret;
+}
+
+struct kthread_create
+{
+	struct task_struct *result;
+	struct kthread k;
+	struct completion done;
+};
+
+/* We are keventd().  We create a thread. */
+static void spawn_kthread(void *data)
+{
+	struct kthread_create *kc = data;
+	int ret;
+
+	/* Set up message so they know who their parent is. */
+	ktm_send(NULL, current);
+
+	/* We want our own signal handler (we take no signals by default). */
+	ret = kernel_thread(kthread, &kc->k, CLONE_FS | CLONE_FILES | SIGCHLD);
+	if (ret < 0)
+		kc->result = ERR_PTR(ret);
+	else {
+		/* They tell us who they are. */
+		struct kt_message m = ktm_receive();
+		kc->result = m.info;
+	}
+	complete(&kc->done);
+}
+
+struct task_struct *kthread_create(int (*initfn)(void *data),
+				   int (*corefn)(void *data),
+				   void *data,
+				   const char namefmt[],
+				   ...)
+{
+	va_list args;
+	struct kthread_create kc;
+	DECLARE_WORK(work, spawn_kthread, &kc);
+	/* Or, as we like to say, 16. */
+	char name[sizeof(((struct task_struct *)0)->comm)];
+
+	va_start(args, namefmt);
+	vsnprintf(name, sizeof(name), namefmt, args);
+	va_end(args);
+
+	init_completion(&kc.done);
+	kc.k.initfn = initfn;
+	kc.k.corefn = corefn;
+	kc.k.data = data;
+	kc.k.name = name;
+
+	down(&kthread_control);
+	/* If we're being called to start the first workqueue, we
+	 * can't use keventd. */
+	if (!keventd_up())
+		work.func(work.data);
+	else {
+		schedule_work(&work);
+		wait_for_completion(&kc.done);
+	}
+	up(&kthread_control);
+	return kc.result;
+}
+
+static void wait_for_death(struct task_struct *k)
+{
+	while (!(k->state & TASK_ZOMBIE) && !(k->state & TASK_DEAD))
+		yield();
+}
+
+struct task_struct *kthread_start(struct task_struct *k)
+{
+	struct kt_message m;
+
+	get_task_struct(k);
+
+	down(&kthread_control);
+	ktm_send(k, k);
+	m = ktm_receive();
+	up(&kthread_control);
+
+	if (IS_ERR(m.info))
+		wait_for_death(k);
+	put_task_struct(k);
+
+	return m.info;
+}
+
+int kthread_destroy(struct task_struct *k)
+{
+	struct kt_message m;
+
+	get_task_struct(k);
+
+	down(&kthread_control);
+	ktm_send(k, NULL);
+	m = ktm_receive();
+	up(&kthread_control);
+
+	wait_for_death(k);
+	put_task_struct(k);
+
+	return PTR_ERR(m.info);
+}
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .32392-linux-2.6.0-test11-bk6/kernel/workqueue.c .32392-linux-2.6.0-test11-bk6.updated/kernel/workqueue.c
--- .32392-linux-2.6.0-test11-bk6/kernel/workqueue.c	2003-09-22 10:27:38.000000000 +1000
+++ .32392-linux-2.6.0-test11-bk6.updated/kernel/workqueue.c	2003-12-09 17:42:29.000000000 +1100
@@ -359,6 +359,11 @@ void flush_scheduled_work(void)
 	flush_workqueue(keventd_wq);
 }
 
+int keventd_up(void)
+{
+	return keventd_wq != NULL;
+}
+
 int current_is_keventd(void)
 {
 	struct cpu_workqueue_struct *cwq;

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2003-12-31  3:31 [PATCH 1/2] kthread_create Rusty Russell
@ 2003-12-31  4:33 ` Jeff Garzik
  2003-12-31  5:28   ` Rusty Russell
  2003-12-31  4:49 ` Andrew Morton
  2003-12-31  5:06 ` Davide Libenzi
  2 siblings, 1 reply; 47+ messages in thread
From: Jeff Garzik @ 2003-12-31  4:33 UTC (permalink / raw)
  To: Rusty Russell; +Cc: torvalds, akpm, mingo, linux-kernel

Rusty Russell wrote:
> Hi all,
> 
> 	Ingo read through this before and liked it: this is the basis
> of the Hotplug CPU patch, and as such has been stressed fairly well.
> Tested stand-alone, and included here for wider review.


Hey, this is pretty cool.

Recalling threads from LKML past, there are two mechanisms I (and some 
others) felt were missing from the equally nifty workqueue stuff:
1) one-shot threads
2) keventd overflow

For #1, your patch seems to cover that nicely.

For #2, that's to be used for situations where (a) you need a thread 
context _and_ (b) you simply cannot wait for keventd to become available 
(since there are no time guarantees).

Anyway, thanks for doing this, it fills a need, I believe.

	Jeff




^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2003-12-31  3:31 [PATCH 1/2] kthread_create Rusty Russell
  2003-12-31  4:33 ` Jeff Garzik
@ 2003-12-31  4:49 ` Andrew Morton
  2003-12-31  5:18   ` Rusty Russell
  2003-12-31  5:06 ` Davide Libenzi
  2 siblings, 1 reply; 47+ messages in thread
From: Andrew Morton @ 2003-12-31  4:49 UTC (permalink / raw)
  To: Rusty Russell; +Cc: torvalds, mingo, linux-kernel

Rusty Russell <rusty@rustcorp.com.au> wrote:
>
> Hi all,
> 
> 	Ingo read through this before and liked it: this is the basis
> of the Hotplug CPU patch, and as such has been stressed fairly well.
> Tested stand-alone, and included here for wider review.

It would be nice to be able to see all the hotplug CPU patches in one
place, to get a feel for their shape and size.  That way, we can decide
whether we need to look at this patch ;)

> D: kthread_create(), kthread_start() and kthread_destroy().  These

A few things:

> +static struct kt_message ktm_receive(void)
> +{
> +	struct kt_message m;
> +
> +	for (;;) {
> +		spin_lock(&ktm_lock);
> +		if (ktm.to == current)
> +			break;
> +		current->state = TASK_INTERRUPTIBLE;
> +		spin_unlock(&ktm_lock);
> +		schedule();
> +	}

If the calling task has a signal pending, this could become a tight loop?

> +
> +static int kthread(void *data)
> +{
> +	/* Copy data: it's on keventd_init's stack */
> +	struct kthread k = *(struct kthread *)data;
> +	struct kt_message m;
> +	int ret = 0;
> +	sigset_t blocked;
> +
> +	strcpy(current->comm, k.name);
> +
> +	/* Block and flush all signals. */
> +	sigfillset(&blocked);
> +	sigprocmask(SIG_BLOCK, &blocked, NULL);
> +	flush_signals(current);
> +

deamonize() was not suitable here?

> +	/* Send to spawn_kthread, so it knows who we are. */
> +	ktm_send(ktm.info, current);
> +
> +	/* Receive from kthread_start or kthread_destroy */
> +	m = ktm_receive();
> +	if (!m.info)
> +		goto stop;
> +	if (k.initfn && (ret = k.initfn(k.data)) < 0)
> +		goto stop;
> +	ktm_send(m.from, current);
> +
> +	for (;;) {
> +		if (time_to_die(&m))
> +			break;
> +
> +		/* If it fails, just wait until kthread_destroy. */
> +		if (k.corefn && (ret = k.corefn(k.data)) < 0)
> +			k.corefn = NULL;
> +
> +		if (time_to_die(&m))
> +			break;
> +
> +		schedule();
> +	}

In what state is this schedule() called?  If it's TASK_RUNNING (or
TASK_INTERRUPTIBLE with signal_pending()) and this task has rt priority
higher than the thing it is waiting for we could have a problem?

> +
> +	current->state = TASK_RUNNING;
> +stop:
> +	ktm_send(m.from, ERR_PTR(ret));
> +	return ret;
> +}
> +
> +struct kthread_create
> +{
> +	struct task_struct *result;
> +	struct kthread k;
> +	struct completion done;
> +};
> +

`kthread_create' sounds like the name of a function to me, not a structure.

> +struct task_struct *kthread_create(int (*initfn)(void *data),

I was right! ;)

It would be nice to kerneldocify kthread_create(), kthread_start() and
kthread_destroy() sometime.

> +static void wait_for_death(struct task_struct *k)
> +{
> +	while (!(k->state & TASK_ZOMBIE) && !(k->state & TASK_DEAD))
> +		yield();
> +}
> +

If the calling task has higher rt priority than *k, could this not become a
busy loop?  It would be preferable to use a real sleep/wait primitive here.



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2003-12-31  3:31 [PATCH 1/2] kthread_create Rusty Russell
  2003-12-31  4:33 ` Jeff Garzik
  2003-12-31  4:49 ` Andrew Morton
@ 2003-12-31  5:06 ` Davide Libenzi
  2003-12-31  5:34   ` Rusty Russell
  2 siblings, 1 reply; 47+ messages in thread
From: Davide Libenzi @ 2003-12-31  5:06 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Linus Torvalds, Andrew Morton, mingo, Linux Kernel Mailing List

On Wed, 31 Dec 2003, Rusty Russell wrote:

> +struct kt_message
> +{
> +	struct task_struct *from, *to;
> +	void *info;
> +};
> +
> +static struct kt_message ktm;
> +
> +static void ktm_send(struct task_struct *to, void *info)
> +{
> +	spin_lock(&ktm_lock);
> +	ktm.to = to;
> +	ktm.from = current;
> +	ktm.info = info;
> +	if (ktm.to)
> +		wake_up_process(ktm.to);
> +	spin_unlock(&ktm_lock);
> +}
> +
> +static struct kt_message ktm_receive(void)
> +{
> +	struct kt_message m;
> +
> +	for (;;) {
> +		spin_lock(&ktm_lock);
> +		if (ktm.to == current)
> +			break;
> +		current->state = TASK_INTERRUPTIBLE;
> +		spin_unlock(&ktm_lock);
> +		schedule();
> +	}
> +	m = ktm;
> +	spin_unlock(&ktm_lock);
> +	return m;
> +}

Wouldn't it be better to put a kt_message inside a tast_struct?



- Davide




^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2003-12-31  4:49 ` Andrew Morton
@ 2003-12-31  5:18   ` Rusty Russell
  0 siblings, 0 replies; 47+ messages in thread
From: Rusty Russell @ 2003-12-31  5:18 UTC (permalink / raw)
  To: Andrew Morton; +Cc: torvalds, mingo, linux-kernel

In message <20031230204910.0e767b50.akpm@osdl.org> you write:
> It would be nice to be able to see all the hotplug CPU patches in one
> place, to get a feel for their shape and size.  That way, we can decide
> whether we need to look at this patch ;)

Um, I've had this on kernel.org for a few years now.  It's even at the
top of the page:

	http:://www.kernel.org/pub/linux/kernel/people/rusty

> > +static struct kt_message ktm_receive(void)
> > +{
> > +	struct kt_message m;
> > +
> > +	for (;;) {
> > +		spin_lock(&ktm_lock);
> > +		if (ktm.to == current)
> > +			break;
> > +		current->state = TASK_INTERRUPTIBLE;
> > +		spin_unlock(&ktm_lock);
> > +		schedule();
> > +	}
> 
> If the calling task has a signal pending, this could become a tight loop?

Possibly, but there's not much we can do.  We never wait long, and
we're keventd or a child here, so we're only talking about SIGCHLD.

> > +	strcpy(current->comm, k.name);
> > +
> > +	/* Block and flush all signals. */
> > +	sigfillset(&blocked);
> > +	sigprocmask(SIG_BLOCK, &blocked, NULL);
> > +	flush_signals(current);
> > +
> 
> deamonize() was not suitable here?

No. (1) By design, we're always a purebred kernel thread descended
directly from the init thread and have never had an mm anywhere.  (2)
daemonize is an abhorrent abortion: it's dangerous and presumptive to
try to "clean up" a random thread into a kernel thread.

> > +		/* If it fails, just wait until kthread_destroy. */
> > +		if (k.corefn && (ret = k.corefn(k.data)) < 0)
> > +			k.corefn = NULL;
> > +
> > +		if (time_to_die(&m))
> > +			break;
> > +
> > +		schedule();
> > +	}
> 
> In what state is this schedule() called?  If it's TASK_RUNNING (or
> TASK_INTERRUPTIBLE with signal_pending()) and this task has rt priority
> higher than the thing it is waiting for we could have a problem?

Yep, that's by design.  (1) signal_pending() is not possible, we've
blocked all signals above.  (2) the corefn() MUST set the task state,
as per normal semantics:

       1) set current->state to TASK_INTERRUPTIBLE
       2) check condition
       3) return (so core can schedule)

> > +struct kthread_create
> > +{
> > +	struct task_struct *result;
> > +	struct kthread k;
> > +	struct completion done;
> > +};
> > +
> 
> `kthread_create' sounds like the name of a function to me, not a structure.

OK, I've changed it to kthread_creation.  It's private to kthread.c
anyway.

> It would be nice to kerneldocify kthread_create(), kthread_start() and
> kthread_destroy() sometime.

Sure, if you want.  Does anyone actually read that?  I prefer the
comments on how to use functions belong in the headers, not above the
definitions as seems to be the kerneldoc way.

> > +static void wait_for_death(struct task_struct *k)
> > +{
> > +	while (!(k->state & TASK_ZOMBIE) && !(k->state & TASK_DEAD))
> > +		yield();
> > +}
> > +
> 
> If the calling task has higher rt priority than *k, could this not become a
> busy loop?  It would be preferable to use a real sleep/wait primitive here.

Hmm, if it's an RT task, it'll screw up, yes, because yield() won't
yield().

Fixing this well would require a way of notifying someone who is not
the parent when a task dies, OR taking over the parenthood of the
task.  Both of these required non-trivial changes to exit.c and I
shied away.

All things are possible, however...

Thanks!
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2003-12-31  4:33 ` Jeff Garzik
@ 2003-12-31  5:28   ` Rusty Russell
  2003-12-31  6:34     ` Jeff Garzik
  0 siblings, 1 reply; 47+ messages in thread
From: Rusty Russell @ 2003-12-31  5:28 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: torvalds, akpm, mingo, linux-kernel

In message <3FF251B1.4070404@pobox.com> you write:
> Rusty Russell wrote:
> > Hi all,
> > 
> > 	Ingo read through this before and liked it: this is the basis
> > of the Hotplug CPU patch, and as such has been stressed fairly well.
> > Tested stand-alone, and included here for wider review.
> 
> Hey, this is pretty cool.
> 
> Recalling threads from LKML past, there are two mechanisms I (and some 
> others) felt were missing from the equally nifty workqueue stuff:
> 1) one-shot threads
> 2) keventd overflow
> 
> For #1, your patch seems to cover that nicely.

It's really for persistent threads, but you can use it as one-shot by
either (1) calling exit() in the core function (and noone calls
kthread_destroy), or (2) not having a core function and just having an
init function.  I used this in a test patch.

For #2, if you really can't wait for keventd, perhaps your own
workqueue is in order?

> Anyway, thanks for doing this, it fills a need, I believe.

Yes: stopping threads manually is a real PITA, and things like
complete_and_exit() always make me wince.

Thanks,
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2003-12-31  5:06 ` Davide Libenzi
@ 2003-12-31  5:34   ` Rusty Russell
  2003-12-31  5:56     ` Davide Libenzi
  0 siblings, 1 reply; 47+ messages in thread
From: Rusty Russell @ 2003-12-31  5:34 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Linus Torvalds, Andrew Morton, mingo, Linux Kernel Mailing List

In message <Pine.LNX.4.44.0312302100550.1457-100000@bigblue.dev.mdolabs.com> you write:
> Wouldn't it be better to put a kt_message inside a tast_struct?

Expand task_struct for this one usage?  I don't think that's
worthwhile.

The whole code is written so there is no datastructure associated with
the kthread.  When something like kt_message is needed (to kill a
thread, for example), they grab the lock and use the static one.

This means that threads can exit without having to do cleanup.

Hope that clarifies,
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2003-12-31  5:34   ` Rusty Russell
@ 2003-12-31  5:56     ` Davide Libenzi
  2003-12-31  6:27       ` Rusty Russell
  2003-12-31  6:31       ` Srivatsa Vaddagiri
  0 siblings, 2 replies; 47+ messages in thread
From: Davide Libenzi @ 2003-12-31  5:56 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Linus Torvalds, Andrew Morton, mingo, Linux Kernel Mailing List

On Wed, 31 Dec 2003, Rusty Russell wrote:

> In message <Pine.LNX.4.44.0312302100550.1457-100000@bigblue.dev.mdolabs.com> you write:
> > Wouldn't it be better to put a kt_message inside a tast_struct?
> 
> Expand task_struct for this one usage?  I don't think that's
> worthwhile.
> 
> The whole code is written so there is no datastructure associated with
> the kthread.  When something like kt_message is needed (to kill a
> thread, for example), they grab the lock and use the static one.
> 
> This means that threads can exit without having to do cleanup.

I agree on one side, there's the drawback on a size increase (3 pointers, 
plus eventually a spinlock) of the task struct. But IMO the code would be 
cleaner, since you know who is the target of the message. Also it would 
not require any cleanup since there would be nothing allocated, just a 
struct member inside task_struct.
Also, what happens in the task woke up by a send does not reschedule 
before another CPU does another send? Wouldn't a message be lost?



- Davide



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2003-12-31  5:56     ` Davide Libenzi
@ 2003-12-31  6:27       ` Rusty Russell
  2004-01-01  3:45         ` Davide Libenzi
  2003-12-31  6:31       ` Srivatsa Vaddagiri
  1 sibling, 1 reply; 47+ messages in thread
From: Rusty Russell @ 2003-12-31  6:27 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Linus Torvalds, Andrew Morton, mingo, Linux Kernel Mailing List

In message <Pine.LNX.4.44.0312302149350.1457-100000@bigblue.dev.mdolabs.com> you write:
> plus eventually a spinlock) of the task struct. But IMO the code would be 
> cleaner, since you know who is the target of the message.

<shrug> The code's really not that complicated.

> Also, what happens in the task woke up by a send does not reschedule 
> before another CPU does another send? Wouldn't a message be lost?

There's a lock, so only one communication happens at a time, and all
communication is request-response, so it's pretty straightforward.

But an alternate implementation would be to have a "kthread" kernel
thread, which would actually be parent to the kthread threads.  This
means it can allocate and clean up, since it catches *all* thread
deaths, including "exit()".

What do you think?
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2003-12-31  5:56     ` Davide Libenzi
  2003-12-31  6:27       ` Rusty Russell
@ 2003-12-31  6:31       ` Srivatsa Vaddagiri
  2003-12-31  7:12         ` Davide Libenzi
  1 sibling, 1 reply; 47+ messages in thread
From: Srivatsa Vaddagiri @ 2003-12-31  6:31 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Rusty Russell, Linus Torvalds, Andrew Morton, mingo,
	Linux Kernel Mailing List

On Tue, Dec 30, 2003 at 09:56:05PM -0800, Davide Libenzi wrote:
> Also, what happens in the task woke up by a send does not reschedule 
> before another CPU does another send? Wouldn't a message be lost?
> 

The messages should not be lost because we take the cpucontrol
semaphore in kthread_start or kthread_destroy before sending 
a (start or destroy) message.

-- 


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560033

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2003-12-31  5:28   ` Rusty Russell
@ 2003-12-31  6:34     ` Jeff Garzik
  2003-12-31  8:47       ` Benjamin Herrenschmidt
  2004-01-01 23:51       ` Rusty Russell
  0 siblings, 2 replies; 47+ messages in thread
From: Jeff Garzik @ 2003-12-31  6:34 UTC (permalink / raw)
  To: Rusty Russell; +Cc: torvalds, akpm, mingo, linux-kernel

Rusty Russell wrote:
> For #2, if you really can't wait for keventd, perhaps your own
> workqueue is in order?

Way too wasteful, and doing so is working around a fundamental failing 
of workqueues:  keventd gives no guarantee that your scheduled work will 
be executed this week, this month, or this year :)

keventd is used by two competing classes of users:  those who want 
low-latency, quick execution in task context (Tux-ish), and those who 
just want to run something in task context, where they might sleep 
(perhaps for a long time).

So it would be nice to have thread pool semantics occasionally found in 
userspace:  if thread pool is full when new work is queued, 
_temporarily_ increase the pool size (or create a one-shot kthread). 
Sure you have kthread creation overhead, but at least you have a 
reasonable guarantee that your work won't wait 5-30 seconds or more 
before being performed.

	Jeff




^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2003-12-31  6:31       ` Srivatsa Vaddagiri
@ 2003-12-31  7:12         ` Davide Libenzi
  2003-12-31 23:25           ` Rusty Russell
  0 siblings, 1 reply; 47+ messages in thread
From: Davide Libenzi @ 2003-12-31  7:12 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Rusty Russell, Linus Torvalds, Andrew Morton, mingo,
	Linux Kernel Mailing List

On Wed, 31 Dec 2003, Srivatsa Vaddagiri wrote:

> On Tue, Dec 30, 2003 at 09:56:05PM -0800, Davide Libenzi wrote:
> > Also, what happens in the task woke up by a send does not reschedule 
> > before another CPU does another send? Wouldn't a message be lost?
> > 
> 
> The messages should not be lost because we take the cpucontrol
> semaphore in kthread_start or kthread_destroy before sending 
> a (start or destroy) message.

I see, ok. At that point though, having the message struct inside the task 
struct could save the *to pointer and (because of the big lock above), using 
barrier and proper order in setting *from and *info, the spin lock. OTOH 
the big lock above really make the global message structure private, so it 
does not make much difference.



- Davide








^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2003-12-31  6:34     ` Jeff Garzik
@ 2003-12-31  8:47       ` Benjamin Herrenschmidt
  2004-01-01 23:51       ` Rusty Russell
  1 sibling, 0 replies; 47+ messages in thread
From: Benjamin Herrenschmidt @ 2003-12-31  8:47 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Rusty Russell, Linus Torvalds, Andrew Morton, mingo, Linux Kernel list

On Wed, 2003-12-31 at 17:34, Jeff Garzik wrote:
> Rusty Russell wrote:
> > For #2, if you really can't wait for keventd, perhaps your own
> > workqueue is in order?
> 
> Way too wasteful, and doing so is working around a fundamental failing 
> of workqueues:  keventd gives no guarantee that your scheduled work will 
> be executed this week, this month, or this year :)
> 
> keventd is used by two competing classes of users:  those who want 
> low-latency, quick execution in task context (Tux-ish), and those who 
> just want to run something in task context, where they might sleep 
> (perhaps for a long time).

That why those ones could be replaced by short-lived threads... I use
a short lived thread for ADB probing and I'm considering doing so for
things like ethernet NIC reset etc... which is definitely scheduling
way too long in keventd

> So it would be nice to have thread pool semantics occasionally found in 
> userspace:  if thread pool is full when new work is queued, 
> _temporarily_ increase the pool size (or create a one-shot kthread). 
> Sure you have kthread creation overhead, but at least you have a 
> reasonable guarantee that your work won't wait 5-30 seconds or more 
> before being performed.

The overhead isn't big (and even lower with kthread I suppose due to
not doing daemonize() & reparent_to_init()) so for occasional
long-sleeping crap, short lived kthreads seem to be the solution. If
those end up not using keventd any more (except for actual creation
of the short lived thread), there is probably no need for a thread pool
thing, only low latency stuffs will still use work_queue.

Ben.



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2003-12-31  7:12         ` Davide Libenzi
@ 2003-12-31 23:25           ` Rusty Russell
  0 siblings, 0 replies; 47+ messages in thread
From: Rusty Russell @ 2003-12-31 23:25 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Linus Torvalds, Andrew Morton, mingo, Linux Kernel Mailing List

In message <Pine.LNX.4.44.0312302255080.1457-100000@bigblue.dev.mdolabs.com> you write:
> On Wed, 31 Dec 2003, Srivatsa Vaddagiri wrote:
> 
> > The messages should not be lost because we take the cpucontrol
> > semaphore in kthread_start or kthread_destroy before sending 
> > a (start or destroy) message.
> 
> I see, ok. At that point though, having the message struct inside the task 
> struct could save the *to pointer and (because of the big lock above), using 
> barrier and proper order in setting *from and *info, the spin lock.

My original version used barriers.  But IMHO if you're using barriers
and your code isn't speed-critical, you don't have enough locks.

So I just threw a spinlock around the struct, and no more barrier
issues.

Cheers,
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2003-12-31  6:27       ` Rusty Russell
@ 2004-01-01  3:45         ` Davide Libenzi
  2004-01-02  7:09           ` Rusty Russell
  0 siblings, 1 reply; 47+ messages in thread
From: Davide Libenzi @ 2004-01-01  3:45 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Linus Torvalds, Andrew Morton, mingo, Linux Kernel Mailing List

On Wed, 31 Dec 2003, Rusty Russell wrote:

> But an alternate implementation would be to have a "kthread" kernel
> thread, which would actually be parent to the kthread threads.  This
> means it can allocate and clean up, since it catches *all* thread
> deaths, including "exit()".
> 
> What do you think?

Did you take a look at the stuff I sent you? I'll append here with a 
simple comment (this goes over you bits).

1)
Define a structure:

struct kt_message {
       spinlock_t lock;
       struct task_struct *from;
       void *info;
};

and add it inside the task struct. There's no need to cleanup this 
structure on exit.


2)
The global mutex, the global spinlock and the global "struct kt_message" go away. 


3) The "creator" of the kthread is not picked up with a message but is 
passed with the kernel data.


The reason for those bits would be that basically the communication is 
really 1<->1, so IMHO there's no need to converge into static global data 
for this. OTOH it does not really have scalability problems, so it is a 
matter of personal taste. Burn those bits if you like :-)



- Davide



diff -Nru linux-2.6-mm1/include/linux/init_task.h linux-2.6-mm1.mod/include/linux/init_task.h
--- linux-2.6-mm1/include/linux/init_task.h	2003-12-31 11:30:48.529434800 -0800
+++ linux-2.6-mm1.mod/include/linux/init_task.h	2003-12-31 12:07:58.929362472 -0800
@@ -109,6 +109,9 @@
 	.switch_lock	= SPIN_LOCK_UNLOCKED,				\
 	.journal_info	= NULL,						\
 	.io_wait	= NULL,						\
+	.ktm		= {						\
+		.lock		= SPIN_LOCK_UNLOCKED			\
+	},								\
 }
 
 
diff -Nru linux-2.6-mm1/include/linux/sched.h linux-2.6-mm1.mod/include/linux/sched.h
--- linux-2.6-mm1/include/linux/sched.h	2003-12-31 11:30:48.842387224 -0800
+++ linux-2.6-mm1.mod/include/linux/sched.h	2003-12-31 12:05:39.808512056 -0800
@@ -326,6 +326,11 @@
 	struct sigqueue *sigq;		/* signal queue entry. */
 };
 
+struct kt_message {
+	spinlock_t lock;
+	struct task_struct *from;
+	void *info;
+};
 
 struct io_context;			/* See blkdev.h */
 void exit_io_context(void);
@@ -471,6 +476,9 @@
  * to a stack based synchronous wait) if its doing sync IO.
  */
 	wait_queue_t *io_wait;
+
+/* Used to exchange messages by kthread. */
+	struct kt_message ktm;
 };
 
 static inline pid_t process_group(struct task_struct *tsk)
diff -Nru linux-2.6-mm1/kernel/kthread.c linux-2.6-mm1.mod/kernel/kthread.c
--- linux-2.6-mm1/kernel/kthread.c	2003-12-31 11:33:26.056487048 -0800
+++ linux-2.6-mm1.mod/kernel/kthread.c	2003-12-31 12:38:08.989191552 -0800
@@ -10,32 +10,14 @@
 #include <linux/err.h>
 #include <asm/semaphore.h>
 
-/* This makes sure only one kthread is being talked to at once. */
-static DECLARE_MUTEX(kthread_control);
-
-/* This coordinates communication between the kthread and routine
- * controlling it.  Strictly unneccessary, but without it it's barrier
- * hell. */
-static spinlock_t ktm_lock = SPIN_LOCK_UNLOCKED;
-
-/* All thread comms is command -> ack, so we keep it simple. */
-struct kt_message
-{
-	struct task_struct *from, *to;
-	void *info;
-};
-
-static struct kt_message ktm;
 
 static void ktm_send(struct task_struct *to, void *info)
 {
-	spin_lock(&ktm_lock);
-	ktm.to = to;
-	ktm.from = current;
-	ktm.info = info;
-	if (ktm.to)
-		wake_up_process(ktm.to);
-	spin_unlock(&ktm_lock);
+	spin_lock(&to->ktm.lock);
+	to->ktm.from = current;
+	to->ktm.info = info;
+	wake_up_process(to);
+	spin_unlock(&to->ktm.lock);
 }
 
 static struct kt_message ktm_receive(void)
@@ -43,15 +25,16 @@
 	struct kt_message m;
 
 	for (;;) {
-		spin_lock(&ktm_lock);
-		if (ktm.to == current)
+		spin_lock(&current->ktm.lock);
+		if (current->ktm.from)
 			break;
 		current->state = TASK_INTERRUPTIBLE;
-		spin_unlock(&ktm_lock);
+		spin_unlock(&current->ktm.lock);
 		schedule();
 	}
-	m = ktm;
-	spin_unlock(&ktm_lock);
+	m = current->ktm;
+	current->ktm.from = NULL;
+	spin_unlock(&current->ktm.lock);
 	return m;
 }
 
@@ -63,17 +46,25 @@
 	char *name;
 };
 
+struct kthread_create
+{
+	struct task_struct *creator;
+	struct task_struct *result;
+	struct kthread k;
+	struct completion done;
+};
+
 /* Check if we're being told to stop. */
 static int time_to_die(struct kt_message *m)
 {
         int ret = 0;
 
-        spin_lock(&ktm_lock);
-        if (ktm.to == current && ktm.info == NULL) {
-                *m = ktm;
+        spin_lock(&current->ktm.lock);
+        if (current->ktm.from && !current->ktm.info) {
+                *m = current->ktm;
                 ret = 1;
         }
-        spin_unlock(&ktm_lock);
+        spin_unlock(&current->ktm.lock);
         return ret;
 }
 
@@ -81,7 +72,8 @@
 static int kthread(void *data)
 {
 	/* Copy data: it's on keventd_init's stack */
-	struct kthread k = *(struct kthread *)data;
+	struct kthread_create *kc = data;
+	struct kthread k = kc->k;
 	struct kt_message m;
 	int ret = 0;
 	sigset_t blocked;
@@ -94,7 +86,7 @@
 	flush_signals(current);
 
 	/* Send to spawn_kthread, so it knows who we are. */
-	ktm_send(ktm.info, current);
+	ktm_send(kc->creator, current);
 
 	/* Receive from kthread_start or kthread_destroy */
 	m = ktm_receive();
@@ -124,24 +116,16 @@
 	return ret;
 }
 
-struct kthread_create
-{
-	struct task_struct *result;
-	struct kthread k;
-	struct completion done;
-};
-
 /* We are keventd().  We create a thread. */
 static void spawn_kthread(void *data)
 {
 	struct kthread_create *kc = data;
 	int ret;
 
-	/* Set up message so they know who their parent is. */
-	ktm_send(NULL, current);
+	kc->creator = current;
 
 	/* We want our own signal handler (we take no signals by default). */
-	ret = kernel_thread(kthread, &kc->k, CLONE_FS | CLONE_FILES | SIGCHLD);
+	ret = kernel_thread(kthread, kc, CLONE_FS | CLONE_FILES | SIGCHLD);
 	if (ret < 0)
 		kc->result = ERR_PTR(ret);
 	else {
@@ -174,7 +158,6 @@
 	kc.k.data = data;
 	kc.k.name = name;
 
-	down(&kthread_control);
 	/* If we're being called to start the first workqueue, we
 	 * can't use keventd. */
 	if (!keventd_up())
@@ -183,7 +166,6 @@
 		schedule_work(&work);
 		wait_for_completion(&kc.done);
 	}
-	up(&kthread_control);
 	return kc.result;
 }
 
@@ -199,10 +181,8 @@
 
 	get_task_struct(k);
 
-	down(&kthread_control);
 	ktm_send(k, k);
 	m = ktm_receive();
-	up(&kthread_control);
 
 	if (IS_ERR(m.info))
 		wait_for_death(k);
@@ -217,10 +197,8 @@
 
 	get_task_struct(k);
 
-	down(&kthread_control);
 	ktm_send(k, NULL);
 	m = ktm_receive();
-	up(&kthread_control);
 
 	wait_for_death(k);
 	put_task_struct(k);


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2003-12-31  6:34     ` Jeff Garzik
  2003-12-31  8:47       ` Benjamin Herrenschmidt
@ 2004-01-01 23:51       ` Rusty Russell
  1 sibling, 0 replies; 47+ messages in thread
From: Rusty Russell @ 2004-01-01 23:51 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: torvalds, akpm, mingo, linux-kernel

In message <3FF26DFF.3040909@pobox.com> you write:
> Rusty Russell wrote:
> > For #2, if you really can't wait for keventd, perhaps your own
> > workqueue is in order?
> 
> Way too wasteful, 

I feel the same, which is one reason I didn't use them here...

> and doing so is working around a fundamental failing 
> of workqueues:  keventd gives no guarantee that your scheduled work will 
> be executed this week, this month, or this year :)

Hmm, I think that if you had an all-dancing, self-balancing workqueue
mechanism, you wouldn't need to create your own workqueues anyway.

> So it would be nice to have thread pool semantics occasionally found in 
> userspace:  if thread pool is full when new work is queued, 
> _temporarily_ increase the pool size (or create a one-shot kthread). 
> Sure you have kthread creation overhead, but at least you have a 
> reasonable guarantee that your work won't wait 5-30 seconds or more 
> before being performed.

If we're wishlisting, I'd prefer something which also scaled down
below one thread per cpu until required.  Mainly because it just feels
wrong to have those threads sitting around doing mainly nothing.

Simple matter of coding, right?
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2004-01-01  3:45         ` Davide Libenzi
@ 2004-01-02  7:09           ` Rusty Russell
  2004-01-02 16:58             ` Davide Libenzi
  2004-03-29 15:38             ` Davide Libenzi
  0 siblings, 2 replies; 47+ messages in thread
From: Rusty Russell @ 2004-01-02  7:09 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Linus Torvalds, Andrew Morton, mingo, Linux Kernel Mailing List

In message <Pine.LNX.4.44.0312311935080.5831-100000@bigblue.dev.mdolabs.com> yo
u write:
> On Wed, 31 Dec 2003, Rusty Russell wrote:
> 
> > But an alternate implementation would be to have a "kthread" kernel
> > thread, which would actually be parent to the kthread threads.  This
> > means it can allocate and clean up, since it catches *all* thread
> > deaths, including "exit()".
> > 
> > What do you think?
> 
> Did you take a look at the stuff I sent you? I'll append here with a 
> simple comment (this goes over you bits).

Yes, but I think it's a really bad idea, as I said before.

Anyway, Here's a version which fixes the issues raised by Andrew by
doing *everything* in keventd, uses waitpid(), and uses signals for
communication (except for the case of init failing).

Passes initial testing here.

TODO: better documentation.

Comments, welcome.
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

Name: Use Signals For Kthread
Author: Rusty Russell
Status: Tested on 2.6.0-bk3

D: The hotplug CPU code introduces two major problems:
D: 
D: 1) Threads which previously never stopped (migration thread,
D:    ksoftirqd, keventd) have to be stopped cleanly as CPUs go offline.
D: 2) Threads which previously never had to be created now have
D:    to be created when a CPU goes online.
D: 
D: Unfortunately, stopping a thread is fairly baroque, involving memory
D: barriers, a completion and spinning until the task is actually dead.
D: 
D: There are also three problems in starting a thread:
D: 1) Doing it from a random process context risks environment contamination:
D:    better to do it from keventd to guarantee a clean environment, a-la
D:    call_usermodehelper.
D: 2) Getting the task struct without races is a hard: see kernel/sched.c
D:    migration_call(), kernel/workqueue.c create_workqueue_thread().
D: 3) There are races in starting a thread for a CPU which is not yet
D:    online: migration thread does a complex dance at the moment for
D:    a similar reason (there may be no migration thread to migrate us).
D: 
D: Place all this logic in some primitives to make life easier:
D: kthread_create(), kthread_start() and kthread_destroy().  These
D: primitives require no extra data-structures in the caller: they operate
D: on normal "struct task_struct"s.
D: 
D: Other changes:
D:   - Expose keventd_up(), as keventd and migration threads will use
D:     kthread to launch, and kthread normally uses workqueues and must
D:     recognize this case.
D: 
D: Changes since first version:
D: 1) Sleep UNINTERRUPTIBLE when waiting for reply (thanks Andrew Morton).
D: 2) Reparent threads and waitpid for them.
D: 3) Do ALL ops via keventd, so we don't disturb current process.
D: 
D: This version uses signals rather than its own communication
D: mechanism makes kthread code a little simpler, and means users can
D: loop inside corefn if they want, and return when signal_pending().

diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .29518-linux-2.6.0-bk3/include/linux/kthread.h .29518-linux-2.6.0-bk3.updated/include/linux/kthread.h
--- .29518-linux-2.6.0-bk3/include/linux/kthread.h	1970-01-01 10:00:00.000000000 +1000
+++ .29518-linux-2.6.0-bk3.updated/include/linux/kthread.h	2004-01-02 18:03:20.000000000 +1100
@@ -0,0 +1,38 @@
+#ifndef _LINUX_KTHREAD_H
+#define _LINUX_KTHREAD_H
+/* Simple abstraction for kernel thread usage: the initfn is called at
+ * the start, if that's successful (ie. returns 0), then the thread
+ * sleeps.
+ *
+ * Every time the thread is woken, it will run corefn, until that
+ * returns an error.  The thread must be ended by calling
+ * kthread_destroy().
+ */
+#include <linux/err.h>
+struct task_struct;
+
+/* Part I: create a kthread: if fork fails return ERR_PTR(-errno). */
+struct task_struct *kthread_create(int (*initfn)(void *data),
+				   int (*corefn)(void *data),
+				   void *data,
+				   const char namefmt[], ...);
+
+/* Part II: have thread call initfn(); return thread if successful,
+   otherwise ERR_PTR(-errno). */
+struct task_struct *kthread_start(struct task_struct *k);
+
+/* Convenient wrapper for both of the above. */
+#define kthread_run(initfn, corefn, data, namefmt, ...)			      \
+({									      \
+	struct task_struct *__k						      \
+		= kthread_create(initfn,corefn,data,namefmt, ## __VA_ARGS__); \
+	if (!IS_ERR(__k))						      \
+		__k = kthread_start(__k);				      \
+	__k;								      \
+})
+
+/* Stop the thread.  Return value is last return of corefn() (ie. zero
+ * if exited as normal).  Can be called before kthread_start(). */
+int kthread_destroy(struct task_struct *k);
+
+#endif /* _LINUX_KTHREAD_H */
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .29518-linux-2.6.0-bk3/include/linux/workqueue.h .29518-linux-2.6.0-bk3.updated/include/linux/workqueue.h
--- .29518-linux-2.6.0-bk3/include/linux/workqueue.h	2003-09-22 10:07:08.000000000 +1000
+++ .29518-linux-2.6.0-bk3.updated/include/linux/workqueue.h	2004-01-02 18:03:20.000000000 +1100
@@ -60,6 +60,7 @@ extern int FASTCALL(schedule_work(struct
 extern int FASTCALL(schedule_delayed_work(struct work_struct *work, unsigned long delay));
 extern void flush_scheduled_work(void);
 extern int current_is_keventd(void);
+extern int keventd_up(void);
 
 extern void init_workqueues(void);
 
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .29518-linux-2.6.0-bk3/kernel/Makefile .29518-linux-2.6.0-bk3.updated/kernel/Makefile
--- .29518-linux-2.6.0-bk3/kernel/Makefile	2003-10-09 18:03:02.000000000 +1000
+++ .29518-linux-2.6.0-bk3.updated/kernel/Makefile	2004-01-02 18:03:20.000000000 +1100
@@ -6,7 +6,8 @@ obj-y     = sched.o fork.o exec_domain.o
 	    exit.o itimer.o time.o softirq.o resource.o \
 	    sysctl.o capability.o ptrace.o timer.o user.o \
 	    signal.o sys.o kmod.o workqueue.o pid.o \
-	    rcupdate.o intermodule.o extable.o params.o posix-timers.o
+	    rcupdate.o intermodule.o extable.o params.o posix-timers.o \
+	    kthread.o
 
 obj-$(CONFIG_FUTEX) += futex.o
 obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .29518-linux-2.6.0-bk3/kernel/kthread.c .29518-linux-2.6.0-bk3.updated/kernel/kthread.c
--- .29518-linux-2.6.0-bk3/kernel/kthread.c	1970-01-01 10:00:00.000000000 +1000
+++ .29518-linux-2.6.0-bk3.updated/kernel/kthread.c	2004-01-02 18:03:20.000000000 +1100
@@ -0,0 +1,227 @@
+/* Kernel thread helper functions.
+ *   Copyright (C) 2003 IBM Corporation, Rusty Russell.
+ *
+ * Everything is done via keventd, so that we get a clean environment
+ * even if we're invoked from userspace (think modprobe, hotplug cpu,
+ * etc.).  Also, it allows us to wait for dying kthreads without side
+ * effects involved in adopting kthreads to random processes.
+ */
+#define __KERNEL_SYSCALLS__
+#include <linux/sched.h>
+#include <linux/kthread.h>
+#include <linux/completion.h>
+#include <linux/err.h>
+#include <linux/unistd.h>
+#include <asm/semaphore.h>
+
+/* This makes sure only one kthread is being talked to at once:
+ * protects global vars below. */
+static DECLARE_MUTEX(kthread_control);
+
+static int kthread_init_failed;
+static DECLARE_COMPLETION(kthread_ack);
+
+struct kthread_create_info
+{
+	int (*initfn)(void *data);
+	int (*corefn)(void *data);
+	void *data;
+	char *name;
+};
+
+/* Returns a POSITIVE error number. */
+static int kthread(void *data)
+{
+	/* Copy data: it's on keventd_init's stack */
+	struct kthread_create_info k = *(struct kthread_create_info *)data;
+	int ret = 0;
+	sigset_t blocked;
+
+	strcpy(current->comm, k.name);
+
+	/* Block and flush all signals. */
+	sigfillset(&blocked);
+	sigprocmask(SIG_BLOCK, &blocked, NULL);
+	flush_signals(current);
+
+	/* OK, tell user we're spawned, wait for kthread_start/destroy */
+	current->state = TASK_INTERRUPTIBLE;
+	complete(&kthread_ack);
+	schedule();
+
+	if (signal_pending(current))
+		return EINTR;
+
+	/* Run initfn, tell the caller the result. */
+	if (k.initfn)
+		ret = k.initfn(k.data);
+	kthread_init_failed = (ret < 0);
+	complete(&kthread_ack);
+	if (ret < 0)
+		return -ret;
+
+	while (!signal_pending(current)) {
+		/* If it fails, just wait until kthread_destroy. */
+		if (k.corefn && (ret = k.corefn(k.data)) < 0) {
+			k.corefn = NULL;
+			current->state = TASK_INTERRUPTIBLE;
+		}
+		schedule();
+	}
+
+	current->state = TASK_RUNNING;
+	return -ret;
+}
+
+enum kthread_op_type
+{
+	KTHREAD_CREATE,
+	KTHREAD_START,
+	KTHREAD_STOP,
+};
+
+struct kthread_op
+{
+	enum kthread_op_type type;
+
+	union {
+		/* KTHREAD_CREATE in */
+		struct kthread_start_info *start;
+		/* KTHREAD_START, KTHREAD_STOP in */
+		struct task_struct *target;
+		/* out */
+		struct task_struct *result;
+	} u;
+	struct completion done;
+};
+
+/* We are keventd: create a thread. */
+static void *create_kthread(struct kthread_op *op)
+{
+	int pid;
+
+	init_completion(&kthread_ack);
+	/* We want our own signal handler (we take no signals by default). */
+	pid = kernel_thread(kthread, op->u.start,CLONE_FS|CLONE_FILES|SIGCHLD);
+	if (pid < 0)
+		return ERR_PTR(pid);
+
+	wait_for_completion(&kthread_ack);
+	return find_task_by_pid(pid);
+}
+
+static void adopt_kthread(struct task_struct *k)
+{
+	write_lock_irq(&tasklist_lock);
+	REMOVE_LINKS(k);
+	k->parent = current;
+	k->real_parent = current;
+	SET_LINKS(k);
+	write_unlock_irq(&tasklist_lock);
+}
+
+/* We are keventd: start the thread. */
+static void *start_kthread(struct kthread_op *op)
+{
+	int status = 0;
+
+	init_completion(&kthread_ack);
+
+	/* Adopt it in case it's going to die, then tell it to start... */
+	adopt_kthread(op->u.target);
+	get_task_struct(op->u.target);
+	wake_up_process(op->u.target);
+	wait_for_completion(&kthread_ack);
+
+	if (kthread_init_failed)
+		waitpid(op->u.target->tgid, &status, __WALL);
+	put_task_struct(op->u.target);
+
+	return ERR_PTR((status >> 8) & 0xFF);
+}
+
+/* We are keventd: stop the thread. */
+static void *stop_kthread(struct kthread_op *op)
+{
+	int status;
+
+	adopt_kthread(op->u.target);
+
+	get_task_struct(op->u.target);
+	force_sig(SIGTERM, op->u.target);
+	waitpid(op->u.target->tgid, &status, __WALL);
+	put_task_struct(op->u.target);
+
+	return ERR_PTR((status >> 8) & 0xFF);
+}
+
+/* We are keventd: do what they told us */
+static void keventd_do_kthread_op(void *_op)
+{
+	struct kthread_op *op = _op;
+
+	down(&kthread_control);
+
+	switch (op->type) {
+	case KTHREAD_CREATE: op->u.result = create_kthread(op); break;
+	case KTHREAD_START: op->u.result = start_kthread(op); break;
+	case KTHREAD_STOP: op->u.result = stop_kthread(op); break;
+	default:
+		BUG();
+	}
+
+	up(&kthread_control);
+	complete(&op->done);
+}
+
+static struct task_struct *do_kthread_op(enum kthread_op_type type, void *info)
+{
+	struct kthread_op op;
+	DECLARE_WORK(work, keventd_do_kthread_op, &op);
+
+	op.type = type;
+	op.u.target = info;
+	init_completion(&op.done);
+	
+	/* If we're being called to start the first workqueue, we
+	 * can't use keventd. */
+	if (!keventd_up())
+		work.func(work.data);
+	else {
+		schedule_work(&work);
+		wait_for_completion(&op.done);
+	}
+	return op.u.result;
+}
+
+struct task_struct *kthread_create(int (*initfn)(void *data),
+				   int (*corefn)(void *data),
+				   void *data,
+				   const char namefmt[],
+				   ...)
+{
+	va_list args;
+	struct kthread_create_info start;
+	/* Or, as we like to say, 16. */
+	char name[sizeof(((struct task_struct *)0)->comm)];
+
+	va_start(args, namefmt);
+	vsnprintf(name, sizeof(name), namefmt, args);
+	va_end(args);
+
+	start.initfn = initfn;
+	start.corefn = corefn;
+	start.data = data;
+	start.name = name;
+	return do_kthread_op(KTHREAD_CREATE, &start);
+}
+
+struct task_struct *kthread_start(struct task_struct *k)
+{
+	return do_kthread_op(KTHREAD_START, k);
+}
+
+int kthread_destroy(struct task_struct *k)
+{
+	return PTR_ERR(do_kthread_op(KTHREAD_STOP, k));
+}
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .29518-linux-2.6.0-bk3/kernel/workqueue.c .29518-linux-2.6.0-bk3.updated/kernel/workqueue.c
--- .29518-linux-2.6.0-bk3/kernel/workqueue.c	2003-09-22 10:27:38.000000000 +1000
+++ .29518-linux-2.6.0-bk3.updated/kernel/workqueue.c	2004-01-02 18:03:20.000000000 +1100
@@ -359,6 +359,11 @@ void flush_scheduled_work(void)
 	flush_workqueue(keventd_wq);
 }
 
+int keventd_up(void)
+{
+	return keventd_wq != NULL;
+}
+
 int current_is_keventd(void)
 {
 	struct cpu_workqueue_struct *cwq;

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2004-01-02  7:09           ` Rusty Russell
@ 2004-01-02 16:58             ` Davide Libenzi
  2004-01-03  3:05               ` Rusty Russell
  2004-03-29 15:39               ` Rusty Russell
  2004-03-29 15:38             ` Davide Libenzi
  1 sibling, 2 replies; 47+ messages in thread
From: Davide Libenzi @ 2004-01-02 16:58 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Linus Torvalds, Andrew Morton, mingo, Linux Kernel Mailing List

On Fri, 2 Jan 2004, Rusty Russell wrote:

> In message <Pine.LNX.4.44.0312311935080.5831-100000@bigblue.dev.mdolabs.com> yo
> u write:
> > On Wed, 31 Dec 2003, Rusty Russell wrote:
> > 
> > > But an alternate implementation would be to have a "kthread" kernel
> > > thread, which would actually be parent to the kthread threads.  This
> > > means it can allocate and clean up, since it catches *all* thread
> > > deaths, including "exit()".
> > > 
> > > What do you think?
> > 
> > Did you take a look at the stuff I sent you? I'll append here with a 
> > simple comment (this goes over you bits).
> 
> Yes, but I think it's a really bad idea, as I said before.
> 
> Anyway, Here's a version which fixes the issues raised by Andrew by
> doing *everything* in keventd, uses waitpid(), and uses signals for
> communication (except for the case of init failing).

Rusty, you still have to use global static data when there is no need. I 
like this version better though ;)



- Davide



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2004-01-02 16:58             ` Davide Libenzi
@ 2004-01-03  3:05               ` Rusty Russell
  2004-01-03  3:43                 ` Davide Libenzi
  2004-03-29 15:39                 ` Davide Libenzi
  2004-03-29 15:39               ` Rusty Russell
  1 sibling, 2 replies; 47+ messages in thread
From: Rusty Russell @ 2004-01-03  3:05 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Linus Torvalds, Andrew Morton, mingo, Linux Kernel Mailing List

In message <Pine.LNX.4.44.0401020856150.2278-100000@bigblue.dev.mdolabs.com> you write:
> Rusty, you still have to use global static data when there is no need.

And you're still putting obscure crap in the task struct when there's
no need.  Honestly, I'd be ashamed to post such a patch.

> I like this version better though ;)

I think I should seek a second opinion though.
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2004-01-03  3:05               ` Rusty Russell
@ 2004-01-03  3:43                 ` Davide Libenzi
  2004-01-03 11:47                   ` Rusty Russell
                                     ` (3 more replies)
  2004-03-29 15:39                 ` Davide Libenzi
  1 sibling, 4 replies; 47+ messages in thread
From: Davide Libenzi @ 2004-01-03  3:43 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Linus Torvalds, Andrew Morton, mingo, Linux Kernel Mailing List

On Sat, 3 Jan 2004, Rusty Russell wrote:

> In message <Pine.LNX.4.44.0401020856150.2278-100000@bigblue.dev.mdolabs.com> you write:
> > Rusty, you still have to use global static data when there is no need.
> 
> And you're still putting obscure crap in the task struct when there's
> no need.  Honestly, I'd be ashamed to post such a patch.

Ashamed !? Take a look at your original patch and then define shame. You 
had a communication mechanism that whilst being a private 1<->1 
communication among two tasks, relied on a single global message 
strucure, lock and mutex. Honestly I do not like myself to add stuff 
inside a strcture for one-time use. Not because of adding 12 bytes to the 
struct, that are laughable. But because it is used by a small piece of 
code w/out a re-use ability for other things.



> > I like this version better though ;)
> 
> I think I should seek a second opinion though.

But of course, even a third one ;)



- Davide





^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2004-01-03  3:43                 ` Davide Libenzi
@ 2004-01-03 11:47                   ` Rusty Russell
  2004-01-04  4:23                     ` Davide Libenzi
  2004-03-29 15:42                     ` Davide Libenzi
  2004-01-03 19:00                   ` Davide Libenzi
                                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 47+ messages in thread
From: Rusty Russell @ 2004-01-03 11:47 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Linus Torvalds, Andrew Morton, mingo, Linux Kernel Mailing List

In message <Pine.LNX.4.44.0401021919240.825-100000@bigblue.dev.mdolabs.com> you write:
> On Sat, 3 Jan 2004, Rusty Russell wrote:
> 
> > In message <Pine.LNX.4.44.0401020856150.2278-100000@bigblue.dev.mdolabs.com
> you write:
> > > Rusty, you still have to use global static data when there is no need.
> > 
> > And you're still putting obscure crap in the task struct when there's
> > no need.  Honestly, I'd be ashamed to post such a patch.
> 
> Ashamed !? Take a look at your original patch and then define shame. You 
> had a communication mechanism that whilst being a private 1<->1 
> communication among two tasks, relied on a single global message 
> strucure, lock and mutex.

Still do.  It's *simple*, and I refuse to be ashamed of that.

My words were harsh, but I completely disagree with you.  I believe
you are wrong.  I would never have coded it the way you did.  I read
your code and I still think you are wrong, and find your code both
bloated and ugly.

It's not about space, it's about taste.  And placing random stuff in
an unrelated structure because you can't think of a better way to do
it is TASTELESS.  If it were the only way, it might be forgivable, but
it's not, and I far prefer a little localized messiness to global
messiness.

Now, on something we do agree: I dislike the global structure myself.
By all means try changing the code to use a pipe between child and
parent for the initfn result.  But I've told you that I will not
submit any solution which adds to a generic structure for a specific
problem.

I'm very, very sorry this has gotten a little heated: I generally
enjoy our discussions.  But I don't think I should have to say "no"
four times.

Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2004-01-03  3:43                 ` Davide Libenzi
  2004-01-03 11:47                   ` Rusty Russell
@ 2004-01-03 19:00                   ` Davide Libenzi
  2004-01-03 23:53                     ` Davide Libenzi
                                       ` (3 more replies)
  2004-03-29 15:40                   ` Davide Libenzi
  2004-03-29 15:41                   ` Rusty Russell
  3 siblings, 4 replies; 47+ messages in thread
From: Davide Libenzi @ 2004-01-03 19:00 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Linus Torvalds, Andrew Morton, mingo, Linux Kernel Mailing List

On Fri, 2 Jan 2004, Davide Libenzi wrote:

> On Sat, 3 Jan 2004, Rusty Russell wrote:
> 
> > In message <Pine.LNX.4.44.0401020856150.2278-100000@bigblue.dev.mdolabs.com> you write:
> > > Rusty, you still have to use global static data when there is no need.
> > 
> > And you're still putting obscure crap in the task struct when there's
> > no need.  Honestly, I'd be ashamed to post such a patch.
> 
> Ashamed !? Take a look at your original patch and then define shame. You 
> had a communication mechanism that whilst being a private 1<->1 
> communication among two tasks, relied on a single global message 
> strucure, lock and mutex. Honestly I do not like myself to add stuff 
> inside a strcture for one-time use. Not because of adding 12 bytes to the 
> struct, that are laughable. But because it is used by a small piece of 
> code w/out a re-use ability for other things.

Rusty, I took a better look at the patch and I think we can have 
per-kthread stuff w/out littering the task_struct and by making the thing 
more robust. We keep a global list_head protected by a global spinlock. We 
define a structure that contain all the per-kthread stuff we need 
(including a task_struct* to the kthread itself). When a kthread starts it 
will add itself to the list, and when it will die it will remove itself 
from the list. The start/stop functions will lookup the list (or hash, 
depending on how much stuff you want to drop in) with the target 
task_struct*, and if the lookup fails, it means the task already quit 
(another task already did kthread_stop() ??, natural death ????). This is 
too bad, but at least there won't be deadlock (or crash) beacause of this. 
This because currently we keep the kthread task_struct* lingering around 
w/out a method that willl inform us if the task goes away for some reason 
(so that we can avoid signaling it and waiting for some interaction). The 
list/hash will be able to tell us this. What do you think?




- Davide




^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2004-01-03 19:00                   ` Davide Libenzi
@ 2004-01-03 23:53                     ` Davide Libenzi
  2004-01-04  2:34                     ` Rusty Russell
                                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 47+ messages in thread
From: Davide Libenzi @ 2004-01-03 23:53 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Linus Torvalds, Andrew Morton, mingo, Linux Kernel Mailing List

On Sat, 3 Jan 2004, Davide Libenzi wrote:

> On Fri, 2 Jan 2004, Davide Libenzi wrote:
> 
> > On Sat, 3 Jan 2004, Rusty Russell wrote:
> > 
> > > In message <Pine.LNX.4.44.0401020856150.2278-100000@bigblue.dev.mdolabs.com> you write:
> > > > Rusty, you still have to use global static data when there is no need.
> > > 
> > > And you're still putting obscure crap in the task struct when there's
> > > no need.  Honestly, I'd be ashamed to post such a patch.
> > 
> > Ashamed !? Take a look at your original patch and then define shame. You 
> > had a communication mechanism that whilst being a private 1<->1 
> > communication among two tasks, relied on a single global message 
> > strucure, lock and mutex. Honestly I do not like myself to add stuff 
> > inside a strcture for one-time use. Not because of adding 12 bytes to the 
> > struct, that are laughable. But because it is used by a small piece of 
> > code w/out a re-use ability for other things.
> 
> Rusty, I took a better look at the patch and I think we can have 
> per-kthread stuff w/out littering the task_struct and by making the thing 
> more robust. We keep a global list_head protected by a global spinlock. We 
> define a structure that contain all the per-kthread stuff we need 
> (including a task_struct* to the kthread itself). When a kthread starts it 
> will add itself to the list, and when it will die it will remove itself 
> from the list. The start/stop functions will lookup the list (or hash, 
> depending on how much stuff you want to drop in) with the target 
> task_struct*, and if the lookup fails, it means the task already quit 
> (another task already did kthread_stop() ??, natural death ????). This is 
> too bad, but at least there won't be deadlock (or crash) beacause of this. 
> This because currently we keep the kthread task_struct* lingering around 
> w/out a method that willl inform us if the task goes away for some reason 
> (so that we can avoid signaling it and waiting for some interaction). The 
> list/hash will be able to tell us this. What do you think?

Rusty, I gave the patch a quick shot. There's no more single and 
global data structures but the list that stores all kthreads. Like usual, 
if you don't like it you can trash it. But pls leave shame apart. It did 
nothing to you :-)




- Davide




--- linux-2.5/include/linux/kthread.h._orig	2004-01-03 14:15:33.033286880 -0800
+++ linux-2.5/include/linux/kthread.h	2004-01-03 14:15:33.033286880 -0800
@@ -0,0 +1,81 @@
+#ifndef _LINUX_KTHREAD_H
+#define _LINUX_KTHREAD_H
+/* Simple interface for creating and stopping kernel threads without mess. */
+#include <linux/err.h>
+struct task_struct;
+
+/**
+ * kthread_create: create a kthread.
+ * @initfn: a function to run when kthread_start() is called.
+ * @corefn: a function to run after @initfn succeeds.
+ * @data: data ptr for @initfn and @corefn.
+ * @namefmt: printf-style name for the thread.
+ *
+ * Description: This helper function creates and names a kernel
+ * thread, and controls it for you.  See kthread_start() and
+ * kthread_run().
+ * 
+ * Returns a task_struct or ERR_PTR(-ENOMEM).
+ */
+struct task_struct *kthread_create(int (*initfn)(void *data),
+				   int (*corefn)(void *data),
+				   void *data,
+				   const char namefmt[], ...);
+
+/**
+ * kthread_start: start a thread created by kthread_create().
+ * @k: the task returned by kthread_create().
+ *
+ * Description: This makes the thread @k run the initfn() specified in
+ * kthread_create(), if non-NULL, and return the result.  If that
+ * result is less than zero, then the thread @k is terminated.
+ * Otherwise, the corefn() is run if it is not NULL.
+ *
+ * There are two ways to write corefn(): simple or complex.  For
+ * simple, you set current->state to TASK_INTERRUPTIBLE, check for
+ * work, then return 0 when it is done.  The kthread code will call
+ * schedule(), and call corefn() again when woken.  More complex
+ * corefn()s can loop themselves, but must return 0 when
+ * signal_pending(current) is true.
+ *
+ * corefn() can terminate itself in two ways: if someone will later
+ * call kthread_stop(), simply return a negative error number.
+ * Otherwise, if you call do_exit() directly, the thread will
+ * terminate immediately. */
+int kthread_start(struct task_struct *k);
+
+/**
+ * kthread_run: create and start a thread.
+ * @initfn: a function to run when kthread_start() is called.
+ * @corefn: a function to run after initfn succeeds.
+ * @data: data ptr for @initfn and @corefn.
+ * @namefmt: printf-style name for the thread.
+ *
+ * Description: Convenient wrapper for kthread_create() followed by
+ * kthread_start().  Returns the kthread, or ERR_PTR(). */
+#define kthread_run(initfn, corefn, data, namefmt, ...)			      \
+({									      \
+	struct task_struct *__k						      \
+		= kthread_create(initfn,corefn,data,namefmt, ## __VA_ARGS__); \
+	if (!IS_ERR(__k)) {						      \
+		int __err = kthread_start(__k);				      \
+		if (__err) __k = ERR_PTR(__err);			      \
+	}								      \
+	__k;								      \
+})
+
+/**
+ * kthread_stop: stop a thread created by kthread_create().
+ * @k: thread created by kthread_start.
+ *
+ * Sends a signal to @k, and waits for it to exit.  Your corefn() must
+ * not call do_exit() itself if you use this function!  This can also
+ * be called after kthread_create() instead of calling
+ * kthread_start(): the thread will exit without calling initfn() or
+ * corefn().
+ *
+ * Returns the last result of corefn(), or 0 if kthread_start() was
+ * never called. */
+int kthread_stop(struct task_struct *k);
+
+#endif /* _LINUX_KTHREAD_H */
--- linux-2.5/include/linux/workqueue.h._orig	2004-01-03 14:14:17.678742512 -0800
+++ linux-2.5/include/linux/workqueue.h	2004-01-03 14:15:33.042285512 -0800
@@ -60,6 +60,7 @@
 extern int FASTCALL(schedule_delayed_work(struct work_struct *work, unsigned long delay));
 extern void flush_scheduled_work(void);
 extern int current_is_keventd(void);
+extern int keventd_up(void);
 
 extern void init_workqueues(void);
 
--- linux-2.5/kernel/Makefile._orig	2004-01-03 14:14:18.205662408 -0800
+++ linux-2.5/kernel/Makefile	2004-01-03 14:15:33.092277912 -0800
@@ -6,7 +6,8 @@
 	    exit.o itimer.o time.o softirq.o resource.o \
 	    sysctl.o capability.o ptrace.o timer.o user.o \
 	    signal.o sys.o kmod.o workqueue.o pid.o \
-	    rcupdate.o intermodule.o extable.o params.o posix-timers.o
+	    rcupdate.o intermodule.o extable.o params.o posix-timers.o \
+	    kthread.o
 
 obj-$(CONFIG_FUTEX) += futex.o
 obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o
--- linux-2.5/kernel/kthread.c._orig	2004-01-03 14:15:33.100276696 -0800
+++ linux-2.5/kernel/kthread.c	2004-01-03 15:48:39.783971680 -0800
@@ -0,0 +1,255 @@
+/* Kernel thread helper functions.
+ *   Copyright (C) 2003 IBM Corporation, Rusty Russell.
+ *
+ * Everything is done via keventd, so that we get a clean environment
+ * even if we're invoked from userspace (think modprobe, hotplug cpu,
+ * etc.).  Also, it allows us to wait for dying kthreads without side
+ * effects involved in adopting kthreads to random processes.
+ */
+#define __KERNEL_SYSCALLS__
+#include <linux/sched.h>
+#include <linux/list.h>
+#include <linux/kthread.h>
+#include <linux/completion.h>
+#include <linux/err.h>
+#include <linux/unistd.h>
+#include <asm/semaphore.h>
+
+
+
+
+static LIST_HEAD(kt_list);
+static spinlock_t kt_lock = SPIN_LOCK_UNLOCKED;
+
+
+
+struct kthread_create_info
+{
+	int (*initfn)(void *data);
+	int (*corefn)(void *data);
+	void *data;
+	char *name;
+};
+
+struct kthread_create
+{
+	struct kthread_create_info info;
+	struct semaphore ack;
+	struct task_struct *tsk;
+};
+
+struct kthread_node
+{
+	struct list_head lnk;
+	struct task_struct *self;
+	struct semaphore sem;
+	struct semaphore ack;
+	int ret, done;
+};
+
+
+/*
+ * Try to lookup the kthread node associated with the task "k".
+ * If the task exist, acquire ktn->sem to avoid the task to exit.
+ */
+static struct kthread_node *kthread_lookup(struct task_struct *k) {
+	struct list_head *pos;
+	struct kthread_node *ktn = NULL;
+
+	spin_lock(&kt_lock);
+	list_for_each(pos, &kt_list) {
+		ktn = list_entry(pos, struct kthread_node, lnk);
+		if (ktn->self == k)
+			break;
+		ktn = NULL;
+	}
+	if (ktn && down_trylock(&ktn->sem))
+		ktn = NULL;
+	spin_unlock(&kt_lock);
+	return ktn;
+}
+
+
+/* Returns a POSITIVE error number. */
+static int kthread(void *data)
+{
+	/* Copy data: it's on keventd_init's stack */
+	struct kthread_create *ktc = data;
+	struct kthread_create_info k = ktc->info;
+	int ret = 0;
+	sigset_t blocked;
+	struct kthread_node ktn;
+
+	/* Initialize the kthread list node and drop itself inside the list */
+	ktn.done = 0;
+	ktn.ret = -1;
+	ktn.self = current;
+	init_MUTEX(&ktn.sem);
+	init_MUTEX_LOCKED(&ktn.ack);
+	spin_lock(&kt_lock);
+	list_add(&ktn.lnk, &kt_list);
+	spin_unlock(&kt_lock);
+
+	strcpy(current->comm, k.name);
+
+	/* Block and flush all signals. */
+	sigfillset(&blocked);
+	sigprocmask(SIG_BLOCK, &blocked, NULL);
+	flush_signals(current);
+
+	/*
+	 * OK, tell user we're spawned, wait for kthread_start/stop.
+	 * Note that after this call the "ktc" pointer will be no longer
+	 * valid.
+	 */
+	up(&ktc->ack);
+	current->state = TASK_INTERRUPTIBLE;
+	schedule();
+
+	ret = EINTR;
+	if (signal_pending(current))
+		goto out;
+
+	/* Run initfn, tell the caller the result. */
+	ret = 0;
+	if (k.initfn)
+		ret = k.initfn(k.data);
+	ktn.ret = ret;
+
+	/*
+	 * The caller is waiting for us to signal the completion
+	 * of the init function.
+	 */
+	up(&ktn.ack);
+
+	if (ret < 0)
+		goto out;
+
+	while (!signal_pending(current)) {
+		/* If it fails, just wait until kthread_stop. */
+		if (k.corefn && (ret = k.corefn(k.data)) < 0) {
+			k.corefn = NULL;
+			current->state = TASK_INTERRUPTIBLE;
+		}
+		schedule();
+	}
+
+	current->state = TASK_RUNNING;
+
+	ret = 0;
+out:
+	spin_lock(&kt_lock);
+	list_del(&ktn.lnk);
+	spin_unlock(&kt_lock);
+
+	ktn.done = 1;
+	ktn.ret = ret;
+
+	/*
+	 * The control code will hold the semaphore down to avoid
+	 * the task (and the associated node data) to vanish during
+	 * the handshake
+	 */
+	down(&ktn.sem);
+
+	return ret;
+}
+
+static void create_kthread(void *data)
+{
+	int pid;
+	struct kthread_create *ktc = data;
+
+	/* We want our own signal handler (we take no signals by default). */
+	pid = kernel_thread(kthread, ktc, CLONE_FS|CLONE_FILES|SIGCHLD);
+	if (pid < 0)
+		ktc->tsk = ERR_PTR(pid);
+	else {
+		down(&ktc->ack);
+		ktc->tsk = find_task_by_pid(pid);
+	}
+}
+
+struct task_struct *kthread_create(int (*initfn)(void *data),
+				   int (*corefn)(void *data),
+				   void *data,
+				   const char namefmt[],
+				   ...)
+{
+	va_list args;
+	struct kthread_create ktc;
+	/* Or, as we like to say, 16. */
+	char name[sizeof(((struct task_struct *)0)->comm)];
+	DECLARE_WORK(work, create_kthread, &ktc);
+
+	va_start(args, namefmt);
+	vsnprintf(name, sizeof(name), namefmt, args);
+	va_end(args);
+
+	ktc.info.initfn = initfn;
+	ktc.info.corefn = corefn;
+	ktc.info.data = data;
+	ktc.info.name = name;
+	init_MUTEX_LOCKED(&ktc.ack);
+
+	/* If we're being called to start the first workqueue, we
+	 * can't use keventd. */
+	if (!keventd_up())
+		work.func(work.data);
+	else {
+		schedule_work(&work);
+		down(&ktc.ack);
+	}
+	return ktc.tsk;
+}
+
+int kthread_start(struct task_struct *k)
+{
+	int ret;
+	struct kthread_node *ktn = kthread_lookup(k);
+
+	/* No corresponding kthread is found. Too bad ... */
+	if (!ktn)
+		return -ENOENT;
+
+	/*
+	 * Wake up the kthread task waiting for us to unlock so that
+	 * it can go run the init function.
+	 */
+	wake_up_process(ktn->self);
+
+	/* Wait for kthread ack after init function complete. */
+	down(&ktn->ack);
+
+	/* Get the init result. */
+	ret = ktn->ret;
+
+	/* now we can release the kthread node. */
+	up(&ktn->sem);
+
+	return ret;
+}
+
+int kthread_stop(struct task_struct *k)
+{
+	int ret;
+	struct kthread_node *ktn = kthread_lookup(k);
+
+	/* No corresponding kthread is found. Too bad ... */
+	if (!ktn)
+		return -ENOENT;
+
+	force_sig(SIGTERM, ktn->self);
+
+	while (!ktn->done)
+		yield();
+
+	/* Get the complete result. */
+	ret = ktn->ret;
+
+	/* now we can release the kthread node. */
+	up(&ktn->sem);
+
+	return ret;
+}
+
--- linux-2.5/kernel/workqueue.c._orig	2004-01-03 14:14:18.259654200 -0800
+++ linux-2.5/kernel/workqueue.c	2004-01-03 14:15:33.109275328 -0800
@@ -359,6 +359,11 @@
 	flush_workqueue(keventd_wq);
 }
 
+int keventd_up(void)
+{
+	return keventd_wq != NULL;
+}
+
 int current_is_keventd(void)
 {
 	struct cpu_workqueue_struct *cwq;


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2004-01-03 19:00                   ` Davide Libenzi
  2004-01-03 23:53                     ` Davide Libenzi
@ 2004-01-04  2:34                     ` Rusty Russell
  2004-01-04  4:42                       ` Davide Libenzi
  2004-03-29 15:42                       ` Davide Libenzi
  2004-03-29 15:41                     ` Davide Libenzi
  2004-03-29 15:42                     ` Rusty Russell
  3 siblings, 2 replies; 47+ messages in thread
From: Rusty Russell @ 2004-01-04  2:34 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Linus Torvalds, Andrew Morton, mingo, Linux Kernel Mailing List

In message <Pine.LNX.4.44.0401031021280.1678-100000@bigblue.dev.mdolabs.com> you write:
> Rusty, I took a better look at the patch and I think we can have 
> per-kthread stuff w/out littering the task_struct and by making the thing 
> more robust.

Except sharing data with a lock is perfectly robust.

> We keep a global list_head protected by a global spinlock. We 
> define a structure that contain all the per-kthread stuff we need 
> (including a task_struct* to the kthread itself). When a kthread starts it 
> will add itself to the list, and when it will die it will remove itself 
> from the list.

Yeah, I deliberately didn't implement this, because (1) it seems like
a lot of complexity when using a lock and letting them share a single
structure works fine and is even simpler, and (2) the thread can't
just "do_exit()".

You can get around (2) by having a permenant parent "kthread" thread
which is a parent to all the kthreads (it'll get a SIGCHLD when
someone does "do_exit()").  But the implementation was pretty ugly,
since it involved having a communications mechanism with the kthread
parent, which means you have the global ktm_message-like-thing for
this...

> The start/stop functions will lookup the list (or hash, 
> depending on how much stuff you want to drop in) with the target 
> task_struct*, and if the lookup fails, it means the task already quit 
> (another task already did kthread_stop() ??, natural death ????). This is 
> too bad, but at least there won't be deadlock (or crash) beacause of this. 

> This because currently we keep the kthread task_struct* lingering around 
> w/out a method that willl inform us if the task goes away for some reason 
> (so that we can avoid signaling it and waiting for some interaction). The 
> list/hash will be able to tell us this. What do you think?

I put the "corefn fails" option in there in case someone wants it, but
I'm not sure how it would be used in practice.  If it becomes
commonplace to have them handing around for a long time, then storing
the result somewhere rather than keeping the whole thread around makes
sense, but at the moment it seems like premature optimization.

Unfortunately, my fd idea is broken, because the process which does
the start may not be the same as the one which does the create.  A
shared queue really is the simplest solution.

Latest version below,
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

Name: Use Signals For Kthread
Author: Rusty Russell
Status: Tested on 2.6.0-bk3

D: The hotplug CPU code introduces two major problems:
D: 
D: 1) Threads which previously never stopped (migration thread,
D:    ksoftirqd, keventd) have to be stopped cleanly as CPUs go offline.
D: 2) Threads which previously never had to be created now have
D:    to be created when a CPU goes online.
D: 
D: Unfortunately, stopping a thread is fairly baroque, involving memory
D: barriers, a completion and spinning until the task is actually dead.
D: 
D: There are also three problems in starting a thread:
D: 1) Doing it from a random process context risks environment contamination:
D:    better to do it from keventd to guarantee a clean environment, a-la
D:    call_usermodehelper.
D: 2) Getting the task struct without races is a hard: see kernel/sched.c
D:    migration_call(), kernel/workqueue.c create_workqueue_thread().
D: 3) There are races in starting a thread for a CPU which is not yet
D:    online: migration thread does a complex dance at the moment for
D:    a similar reason (there may be no migration thread to migrate us).
D: 
D: Place all this logic in some primitives to make life easier:
D: kthread_create(), kthread_start() and kthread_destroy().  These
D: primitives require no extra data-structures in the caller: they operate
D: on normal "struct task_struct"s.
D: 
D: Other changes:
D:   - Expose keventd_up(), as keventd and migration threads will use
D:     kthread to launch, and kthread normally uses workqueues and must
D:     recognize this case.
D: 
D: Changes since first version:
D: 1) Sleep UNINTERRUPTIBLE when waiting for reply (thanks Andrew Morton).
D: 2) Reparent threads and waitpid for them.
D: 3) Do ALL ops via keventd, so we don't disturb current process.
D: 4) This version uses signals rather than its own communication
D:     mechanism makes kthread code a little simpler, and means users can
D:     loop inside corefn if they want, and return when signal_pending().
D: 5) kthread_destroy() renamed to kthread_stop().
D: 6) kthread_start() returns an int.
D: 7) Better documentation, in nanodoc form.

diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .8740-linux-2.6.1-rc1/include/linux/kthread.h .8740-linux-2.6.1-rc1.updated/include/linux/kthread.h
--- .8740-linux-2.6.1-rc1/include/linux/kthread.h	1970-01-01 10:00:00.000000000 +1000
+++ .8740-linux-2.6.1-rc1.updated/include/linux/kthread.h	2004-01-02 21:18:54.000000000 +1100
@@ -0,0 +1,81 @@
+#ifndef _LINUX_KTHREAD_H
+#define _LINUX_KTHREAD_H
+/* Simple interface for creating and stopping kernel threads without mess. */
+#include <linux/err.h>
+struct task_struct;
+
+/**
+ * kthread_create: create a kthread.
+ * @initfn: a function to run when kthread_start() is called.
+ * @corefn: a function to run after @initfn succeeds.
+ * @data: data ptr for @initfn and @corefn.
+ * @namefmt: printf-style name for the thread.
+ *
+ * Description: This helper function creates and names a kernel
+ * thread, and controls it for you.  See kthread_start() and
+ * kthread_run().
+ * 
+ * Returns a task_struct or ERR_PTR(-ENOMEM).
+ */
+struct task_struct *kthread_create(int (*initfn)(void *data),
+				   int (*corefn)(void *data),
+				   void *data,
+				   const char namefmt[], ...);
+
+/**
+ * kthread_start: start a thread created by kthread_create().
+ * @k: the task returned by kthread_create().
+ *
+ * Description: This makes the thread @k run the initfn() specified in
+ * kthread_create(), if non-NULL, and return the result.  If that
+ * result is less than zero, then the thread @k is terminated.
+ * Otherwise, the corefn() is run if it is not NULL.
+ *
+ * There are two ways to write corefn(): simple or complex.  For
+ * simple, you set current->state to TASK_INTERRUPTIBLE, check for
+ * work, then return 0 when it is done.  The kthread code will call
+ * schedule(), and call corefn() again when woken.  More complex
+ * corefn()s can loop themselves, but must return 0 when
+ * signal_pending(current) is true.
+ *
+ * corefn() can terminate itself in two ways: if someone will later
+ * call kthread_stop(), simply return a negative error number.
+ * Otherwise, if you call do_exit() directly, the thread will
+ * terminate immediately. */
+int kthread_start(struct task_struct *k);
+
+/**
+ * kthread_run: create and start a thread.
+ * @initfn: a function to run when kthread_start() is called.
+ * @corefn: a function to run after initfn succeeds.
+ * @data: data ptr for @initfn and @corefn.
+ * @namefmt: printf-style name for the thread.
+ *
+ * Description: Convenient wrapper for kthread_create() followed by
+ * kthread_start().  Returns the kthread, or ERR_PTR(). */
+#define kthread_run(initfn, corefn, data, namefmt, ...)			      \
+({									      \
+	struct task_struct *__k						      \
+		= kthread_create(initfn,corefn,data,namefmt, ## __VA_ARGS__); \
+	if (!IS_ERR(__k)) {						      \
+		int __err = kthread_start(__k);				      \
+		if (__err) __k = ERR_PTR(__err);			      \
+	}								      \
+	__k;								      \
+})
+
+/**
+ * kthread_stop: stop a thread created by kthread_create().
+ * @k: thread created by kthread_start.
+ *
+ * Sends a signal to @k, and waits for it to exit.  Your corefn() must
+ * not call do_exit() itself if you use this function!  This can also
+ * be called after kthread_create() instead of calling
+ * kthread_start(): the thread will exit without calling initfn() or
+ * corefn().
+ *
+ * Returns the last result of corefn(), or 0 if kthread_start() was
+ * never called. */
+int kthread_stop(struct task_struct *k);
+
+#endif /* _LINUX_KTHREAD_H */
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .8740-linux-2.6.1-rc1/include/linux/workqueue.h .8740-linux-2.6.1-rc1.updated/include/linux/workqueue.h
--- .8740-linux-2.6.1-rc1/include/linux/workqueue.h	2003-09-22 10:07:08.000000000 +1000
+++ .8740-linux-2.6.1-rc1.updated/include/linux/workqueue.h	2004-01-02 20:45:13.000000000 +1100
@@ -60,6 +60,7 @@ extern int FASTCALL(schedule_work(struct
 extern int FASTCALL(schedule_delayed_work(struct work_struct *work, unsigned long delay));
 extern void flush_scheduled_work(void);
 extern int current_is_keventd(void);
+extern int keventd_up(void);
 
 extern void init_workqueues(void);
 
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .8740-linux-2.6.1-rc1/kernel/Makefile .8740-linux-2.6.1-rc1.updated/kernel/Makefile
--- .8740-linux-2.6.1-rc1/kernel/Makefile	2003-10-09 18:03:02.000000000 +1000
+++ .8740-linux-2.6.1-rc1.updated/kernel/Makefile	2004-01-02 20:45:13.000000000 +1100
@@ -6,7 +6,8 @@ obj-y     = sched.o fork.o exec_domain.o
 	    exit.o itimer.o time.o softirq.o resource.o \
 	    sysctl.o capability.o ptrace.o timer.o user.o \
 	    signal.o sys.o kmod.o workqueue.o pid.o \
-	    rcupdate.o intermodule.o extable.o params.o posix-timers.o
+	    rcupdate.o intermodule.o extable.o params.o posix-timers.o \
+	    kthread.o
 
 obj-$(CONFIG_FUTEX) += futex.o
 obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .8740-linux-2.6.1-rc1/kernel/kthread.c .8740-linux-2.6.1-rc1.updated/kernel/kthread.c
--- .8740-linux-2.6.1-rc1/kernel/kthread.c	1970-01-01 10:00:00.000000000 +1000
+++ .8740-linux-2.6.1-rc1.updated/kernel/kthread.c	2004-01-02 21:07:50.000000000 +1100
@@ -0,0 +1,227 @@
+/* Kernel thread helper functions.
+ *   Copyright (C) 2003 IBM Corporation, Rusty Russell.
+ *
+ * Everything is done via keventd, so that we get a clean environment
+ * even if we're invoked from userspace (think modprobe, hotplug cpu,
+ * etc.).  Also, it allows us to wait for dying kthreads without side
+ * effects involved in adopting kthreads to random processes.
+ */
+#define __KERNEL_SYSCALLS__
+#include <linux/sched.h>
+#include <linux/kthread.h>
+#include <linux/completion.h>
+#include <linux/err.h>
+#include <linux/unistd.h>
+#include <asm/semaphore.h>
+
+/* This makes sure only one kthread is being talked to at once:
+ * protects global vars below. */
+static DECLARE_MUTEX(kthread_control);
+
+static int kthread_init_failed;
+static DECLARE_COMPLETION(kthread_ack);
+
+struct kthread_create_info
+{
+	int (*initfn)(void *data);
+	int (*corefn)(void *data);
+	void *data;
+	char *name;
+};
+
+/* Returns a POSITIVE error number. */
+static int kthread(void *data)
+{
+	/* Copy data: it's on keventd_init's stack */
+	struct kthread_create_info k = *(struct kthread_create_info *)data;
+	int ret = 0;
+	sigset_t blocked;
+
+	strcpy(current->comm, k.name);
+
+	/* Block and flush all signals. */
+	sigfillset(&blocked);
+	sigprocmask(SIG_BLOCK, &blocked, NULL);
+	flush_signals(current);
+
+	/* OK, tell user we're spawned, wait for kthread_start/stop */
+	current->state = TASK_INTERRUPTIBLE;
+	complete(&kthread_ack);
+	schedule();
+
+	if (signal_pending(current))
+		return EINTR;
+
+	/* Run initfn, tell the caller the result. */
+	if (k.initfn)
+		ret = k.initfn(k.data);
+	kthread_init_failed = (ret < 0);
+	complete(&kthread_ack);
+	if (ret < 0)
+		return -ret;
+
+	while (!signal_pending(current)) {
+		/* If it fails, just wait until kthread_stop. */
+		if (k.corefn && (ret = k.corefn(k.data)) < 0) {
+			k.corefn = NULL;
+			current->state = TASK_INTERRUPTIBLE;
+		}
+		schedule();
+	}
+
+	current->state = TASK_RUNNING;
+	return -ret;
+}
+
+enum kthread_op_type
+{
+	KTHREAD_CREATE,
+	KTHREAD_START,
+	KTHREAD_STOP,
+};
+
+struct kthread_op
+{
+	enum kthread_op_type type;
+
+	union {
+		/* KTHREAD_CREATE in */
+		struct kthread_start_info *start;
+		/* KTHREAD_START, KTHREAD_STOP in */
+		struct task_struct *target;
+		/* out */
+		struct task_struct *result;
+	} u;
+	struct completion done;
+};
+
+/* We are keventd: create a thread. */
+static void *create_kthread(struct kthread_op *op)
+{
+	int pid;
+
+	init_completion(&kthread_ack);
+	/* We want our own signal handler (we take no signals by default). */
+	pid = kernel_thread(kthread, op->u.start,CLONE_FS|CLONE_FILES|SIGCHLD);
+	if (pid < 0)
+		return ERR_PTR(pid);
+
+	wait_for_completion(&kthread_ack);
+	return find_task_by_pid(pid);
+}
+
+static void adopt_kthread(struct task_struct *k)
+{
+	write_lock_irq(&tasklist_lock);
+	REMOVE_LINKS(k);
+	k->parent = current;
+	k->real_parent = current;
+	SET_LINKS(k);
+	write_unlock_irq(&tasklist_lock);
+}
+
+/* We are keventd: start the thread. */
+static void *start_kthread(struct kthread_op *op)
+{
+	int status = 0;
+
+	init_completion(&kthread_ack);
+
+	/* Adopt it in case it's going to die, then tell it to start... */
+	adopt_kthread(op->u.target);
+	get_task_struct(op->u.target);
+	wake_up_process(op->u.target);
+	wait_for_completion(&kthread_ack);
+
+	if (kthread_init_failed)
+		waitpid(op->u.target->tgid, &status, __WALL);
+	put_task_struct(op->u.target);
+
+	return ERR_PTR((status >> 8) & 0xFF);
+}
+
+/* We are keventd: stop the thread. */
+static void *stop_kthread(struct kthread_op *op)
+{
+	int status;
+
+	adopt_kthread(op->u.target);
+
+	get_task_struct(op->u.target);
+	force_sig(SIGTERM, op->u.target);
+	waitpid(op->u.target->tgid, &status, __WALL);
+	put_task_struct(op->u.target);
+
+	return ERR_PTR((status >> 8) & 0xFF);
+}
+
+/* We are keventd: do what they told us */
+static void keventd_do_kthread_op(void *_op)
+{
+	struct kthread_op *op = _op;
+
+	down(&kthread_control);
+
+	switch (op->type) {
+	case KTHREAD_CREATE: op->u.result = create_kthread(op); break;
+	case KTHREAD_START: op->u.result = start_kthread(op); break;
+	case KTHREAD_STOP: op->u.result = stop_kthread(op); break;
+	default:
+		BUG();
+	}
+
+	up(&kthread_control);
+	complete(&op->done);
+}
+
+static struct task_struct *do_kthread_op(enum kthread_op_type type, void *info)
+{
+	struct kthread_op op;
+	DECLARE_WORK(work, keventd_do_kthread_op, &op);
+
+	op.type = type;
+	op.u.target = info;
+	init_completion(&op.done);
+	
+	/* If we're being called to start the first workqueue, we
+	 * can't use keventd. */
+	if (!keventd_up())
+		work.func(work.data);
+	else {
+		schedule_work(&work);
+		wait_for_completion(&op.done);
+	}
+	return op.u.result;
+}
+
+struct task_struct *kthread_create(int (*initfn)(void *data),
+				   int (*corefn)(void *data),
+				   void *data,
+				   const char namefmt[],
+				   ...)
+{
+	va_list args;
+	struct kthread_create_info start;
+	/* Or, as we like to say, 16. */
+	char name[sizeof(((struct task_struct *)0)->comm)];
+
+	va_start(args, namefmt);
+	vsnprintf(name, sizeof(name), namefmt, args);
+	va_end(args);
+
+	start.initfn = initfn;
+	start.corefn = corefn;
+	start.data = data;
+	start.name = name;
+	return do_kthread_op(KTHREAD_CREATE, &start);
+}
+
+int kthread_start(struct task_struct *k)
+{
+	return PTR_ERR(do_kthread_op(KTHREAD_START, k));
+}
+
+int kthread_stop(struct task_struct *k)
+{
+	return PTR_ERR(do_kthread_op(KTHREAD_STOP, k));
+}
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .8740-linux-2.6.1-rc1/kernel/workqueue.c .8740-linux-2.6.1-rc1.updated/kernel/workqueue.c
--- .8740-linux-2.6.1-rc1/kernel/workqueue.c	2003-09-22 10:27:38.000000000 +1000
+++ .8740-linux-2.6.1-rc1.updated/kernel/workqueue.c	2004-01-02 20:45:13.000000000 +1100
@@ -359,6 +359,11 @@ void flush_scheduled_work(void)
 	flush_workqueue(keventd_wq);
 }
 
+int keventd_up(void)
+{
+	return keventd_wq != NULL;
+}
+
 int current_is_keventd(void)
 {
 	struct cpu_workqueue_struct *cwq;

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2004-01-03 11:47                   ` Rusty Russell
@ 2004-01-04  4:23                     ` Davide Libenzi
  2004-03-29 15:42                     ` Davide Libenzi
  1 sibling, 0 replies; 47+ messages in thread
From: Davide Libenzi @ 2004-01-04  4:23 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Linus Torvalds, Andrew Morton, mingo, Linux Kernel Mailing List

On Sat, 3 Jan 2004, Rusty Russell wrote:

> Still do.  It's *simple*, and I refuse to be ashamed of that.
> 
> My words were harsh, but I completely disagree with you.  I believe
> you are wrong.  I would never have coded it the way you did.  I read
> your code and I still think you are wrong, and find your code both
> bloated and ugly.

Bloated ? This is the diffstat of my "ashamed" patch over your bits :-)

include/linux/init_task.h |    3 +
include/linux/sched.h     |    8 ++++
kernel/kthread.c          |   78 ++++++++++++++++------------------------------
3 files changed, 39 insertions(+), 50 deletions(-)



> Now, on something we do agree: I dislike the global structure myself.
> By all means try changing the code to use a pipe between child and
> parent for the initfn result.  But I've told you that I will not
> submit any solution which adds to a generic structure for a specific
> problem.
> 
> I'm very, very sorry this has gotten a little heated: I generally
> enjoy our discussions.  But I don't think I should have to say "no"
> four times.

It's ok Rusty, I enjoy the discussion in any case :-) Since I told you in 
a private email that I was convinced myself about adding stuff inside the 
struct, you could have avoided the "ashamed" thing. But it's fine, a 
little bit of sarcasm is the salt of life.




- Davide



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2004-01-04  2:34                     ` Rusty Russell
@ 2004-01-04  4:42                       ` Davide Libenzi
  2004-01-04  4:55                         ` Davide Libenzi
                                           ` (2 more replies)
  2004-03-29 15:42                       ` Davide Libenzi
  1 sibling, 3 replies; 47+ messages in thread
From: Davide Libenzi @ 2004-01-04  4:42 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Linus Torvalds, Andrew Morton, mingo, Linux Kernel Mailing List

On Sun, 4 Jan 2004, Rusty Russell wrote:

> In message <Pine.LNX.4.44.0401031021280.1678-100000@bigblue.dev.mdolabs.com> you write:
> > Rusty, I took a better look at the patch and I think we can have 
> > per-kthread stuff w/out littering the task_struct and by making the thing 
> > more robust.
> 
> Except sharing data with a lock is perfectly robust.
> 
> > We keep a global list_head protected by a global spinlock. We 
> > define a structure that contain all the per-kthread stuff we need 
> > (including a task_struct* to the kthread itself). When a kthread starts it 
> > will add itself to the list, and when it will die it will remove itself 
> > from the list.
> 
> Yeah, I deliberately didn't implement this, because (1) it seems like
> a lot of complexity when using a lock and letting them share a single
> structure works fine and is even simpler, and (2) the thread can't
> just "do_exit()".
> 
> You can get around (2) by having a permenant parent "kthread" thread
> which is a parent to all the kthreads (it'll get a SIGCHLD when
> someone does "do_exit()").  But the implementation was pretty ugly,
> since it involved having a communications mechanism with the kthread
> parent, which means you have the global ktm_message-like-thing for
> this...

You will lose in any case. What happens if the thread does do_exit() and 
you do kthread_stop() after that?
With the patch I posted to you, the kthread_stop() will simply miss the 
lookup and return -ENOENT.



- Davide



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2004-01-04  4:42                       ` Davide Libenzi
@ 2004-01-04  4:55                         ` Davide Libenzi
  2004-01-04  9:35                         ` Rusty Russell
  2004-03-29 15:42                         ` Davide Libenzi
  2 siblings, 0 replies; 47+ messages in thread
From: Davide Libenzi @ 2004-01-04  4:55 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Linus Torvalds, Andrew Morton, mingo, Linux Kernel Mailing List

On Sat, 3 Jan 2004, Davide Libenzi wrote:

> On Sun, 4 Jan 2004, Rusty Russell wrote:
> 
> > In message <Pine.LNX.4.44.0401031021280.1678-100000@bigblue.dev.mdolabs.com> you write:
> > > Rusty, I took a better look at the patch and I think we can have 
> > > per-kthread stuff w/out littering the task_struct and by making the thing 
> > > more robust.
> > 
> > Except sharing data with a lock is perfectly robust.
> > 
> > > We keep a global list_head protected by a global spinlock. We 
> > > define a structure that contain all the per-kthread stuff we need 
> > > (including a task_struct* to the kthread itself). When a kthread starts it 
> > > will add itself to the list, and when it will die it will remove itself 
> > > from the list.
> > 
> > Yeah, I deliberately didn't implement this, because (1) it seems like
> > a lot of complexity when using a lock and letting them share a single
> > structure works fine and is even simpler, and (2) the thread can't
> > just "do_exit()".
> > 
> > You can get around (2) by having a permenant parent "kthread" thread
> > which is a parent to all the kthreads (it'll get a SIGCHLD when
> > someone does "do_exit()").  But the implementation was pretty ugly,
> > since it involved having a communications mechanism with the kthread
> > parent, which means you have the global ktm_message-like-thing for
> > this...
> 
> You will lose in any case. What happens if the thread does do_exit() and 
> you do kthread_stop() after that?
> With the patch I posted to you, the kthread_stop() will simply miss the 
> lookup and return -ENOENT.

Nope, we are screwed in any case. Is it really important to give the 
ability to do do_exit() for kthreads? I mean, why a simple return would 
not work?



- Davide



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2004-01-04  4:42                       ` Davide Libenzi
  2004-01-04  4:55                         ` Davide Libenzi
@ 2004-01-04  9:35                         ` Rusty Russell
  2004-01-04 23:03                           ` Davide Libenzi
  2004-03-29 15:42                         ` Davide Libenzi
  2 siblings, 1 reply; 47+ messages in thread
From: Rusty Russell @ 2004-01-04  9:35 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: mingo, Linux Kernel Mailing List

In message <Pine.LNX.4.44.0401032039350.2022-100000@bigblue.dev.mdolabs.com> you write:
> > You can get around (2) by having a permenant parent "kthread" thread
> > which is a parent to all the kthreads (it'll get a SIGCHLD when
> > someone does "do_exit()").  But the implementation was pretty ugly,
> > since it involved having a communications mechanism with the kthread
> > parent, which means you have the global ktm_message-like-thing for
> > this...
> 
> You will lose in any case. What happens if the thread does do_exit() and 
> you do kthread_stop() after that?

That's illegal.  Either your thread exits, or you call kthread_stop().

> With the patch I posted to you, the kthread_stop() will simply miss the 
> lookup and return -ENOENT.

Or find some other random kthread which has reused the task struct and
kill that 8(

Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2004-01-04  9:35                         ` Rusty Russell
@ 2004-01-04 23:03                           ` Davide Libenzi
  2004-01-05  4:09                             ` Rusty Russell
  0 siblings, 1 reply; 47+ messages in thread
From: Davide Libenzi @ 2004-01-04 23:03 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Davide Libenzi, mingo, Linux Kernel Mailing List

On Sun, 4 Jan 2004, Rusty Russell wrote:

> In message <Pine.LNX.4.44.0401032039350.2022-100000@bigblue.dev.mdolabs.com> you write:
> > > You can get around (2) by having a permenant parent "kthread" thread
> > > which is a parent to all the kthreads (it'll get a SIGCHLD when
> > > someone does "do_exit()").  But the implementation was pretty ugly,
> > > since it involved having a communications mechanism with the kthread
> > > parent, which means you have the global ktm_message-like-thing for
> > > this...
> > 
> > You will lose in any case. What happens if the thread does do_exit() and 
> > you do kthread_stop() after that?
> 
> That's illegal.  Either your thread exits, or you call kthread_stop().
> 
> > With the patch I posted to you, the kthread_stop() will simply miss the 
> > lookup and return -ENOENT.
> 
> Or find some other random kthread which has reused the task struct and
> kill that 8(

I can see two options:

1) We do not allow do_exit() from kthreads

2) We give kthread_exit()

What do you think?



- Davide



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2004-01-04 23:03                           ` Davide Libenzi
@ 2004-01-05  4:09                             ` Rusty Russell
  2004-01-05  5:06                               ` Davide Libenzi
  0 siblings, 1 reply; 47+ messages in thread
From: Rusty Russell @ 2004-01-05  4:09 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: mingo, Linux Kernel Mailing List

In message <Pine.LNX.4.44.0401041501510.12666-100000@bigblue.dev.mdolabs.com> you write:
> I can see two options:
> 
> 1) We do not allow do_exit() from kthreads
> 
> 2) We give kthread_exit()
> 
> What do you think?

Well, I've now got a patch which works without any global
datastructures, but still allows the same semantics.  Includes some
fixes in the previous code, and test code.

Also includes a longstanding fix in workqueue.c: we never get SIGCHLD
if signal handler is SIG_IGN.

I'm not sure it's neater, though.  We don't have WIFEXITED() in the
kernel, so lots of manual shifting, probably should add that
somewhere.

Thoughts?
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.6.1-rc1-bk5/include/linux/kthread.h working-2.6.1-rc1-bk5-kthread-noglobal/include/linux/kthread.h
--- linux-2.6.1-rc1-bk5/include/linux/kthread.h	1970-01-01 10:00:00.000000000 +1000
+++ working-2.6.1-rc1-bk5-kthread-noglobal/include/linux/kthread.h	2004-01-05 14:42:02.000000000 +1100
@@ -0,0 +1,81 @@
+#ifndef _LINUX_KTHREAD_H
+#define _LINUX_KTHREAD_H
+/* Simple interface for creating and stopping kernel threads without mess. */
+#include <linux/err.h>
+struct task_struct;
+
+/**
+ * kthread_create: create a kthread.
+ * @initfn: a function to run when kthread_start() is called.
+ * @corefn: a function to run after @initfn succeeds.
+ * @data: data ptr for @initfn and @corefn.
+ * @namefmt: printf-style name for the thread.
+ *
+ * Description: This helper function creates and names a kernel
+ * thread, and controls it for you.  See kthread_start() and
+ * kthread_run().
+ * 
+ * Returns a task_struct or ERR_PTR(-ENOMEM).
+ */
+struct task_struct *kthread_create(int (*initfn)(void *data),
+				   int (*corefn)(void *data),
+				   void *data,
+				   const char namefmt[], ...);
+
+/**
+ * kthread_start: start a thread created by kthread_create().
+ * @k: the task returned by kthread_create().
+ *
+ * Description: This makes the thread @k run the initfn() specified in
+ * kthread_create(), if non-NULL, and return the result.  If that
+ * result is less than zero, then the thread @k is terminated.
+ * Otherwise, the corefn() is run if it is not NULL.
+ *
+ * There are two ways to write corefn(): simple or complex.  For
+ * simple, you set current->state to TASK_INTERRUPTIBLE, check for
+ * work, then return 0 when it is done.  The kthread code will call
+ * schedule(), and call corefn() again when woken.  More complex
+ * corefn()s can loop themselves, but must return 0 when
+ * signal_pending(current) is true.
+ *
+ * corefn() can terminate itself in two ways: if someone will later
+ * call kthread_stop(), simply return a negative error number.
+ * Otherwise, if you call do_exit() directly, the thread will
+ * terminate immediately. */
+int kthread_start(struct task_struct *k);
+
+/**
+ * kthread_run: create and start a thread.
+ * @initfn: a function to run when kthread_start() is called.
+ * @corefn: a function to run after initfn succeeds.
+ * @data: data ptr for @initfn and @corefn.
+ * @namefmt: printf-style name for the thread.
+ *
+ * Description: Convenient wrapper for kthread_create() followed by
+ * kthread_start().  Returns the kthread, or ERR_PTR(). */
+#define kthread_run(initfn, corefn, data, namefmt, ...)			      \
+({									      \
+	struct task_struct *__k						      \
+		= kthread_create(initfn,corefn,data,namefmt, ## __VA_ARGS__); \
+	if (!IS_ERR(__k)) {						      \
+		int __err = kthread_start(__k);				      \
+		if (__err) __k = ERR_PTR(__err);			      \
+	}								      \
+	__k;								      \
+})
+
+/**
+ * kthread_stop: stop a thread created by kthread_create().
+ * @k: thread created by kthread_start.
+ *
+ * Sends a signal to @k, and waits for it to exit.  Your corefn() must
+ * not call do_exit() itself if you use this function!  This can also
+ * be called after kthread_create() instead of calling
+ * kthread_start(): the thread will exit without calling initfn() or
+ * corefn().
+ *
+ * Returns the last result of corefn(), or 0 if kthread_start() was
+ * never called. */
+int kthread_stop(struct task_struct *k);
+
+#endif /* _LINUX_KTHREAD_H */
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.6.1-rc1-bk5/include/linux/workqueue.h working-2.6.1-rc1-bk5-kthread-noglobal/include/linux/workqueue.h
--- linux-2.6.1-rc1-bk5/include/linux/workqueue.h	2003-09-22 10:07:08.000000000 +1000
+++ working-2.6.1-rc1-bk5-kthread-noglobal/include/linux/workqueue.h	2004-01-05 14:42:02.000000000 +1100
@@ -60,6 +60,7 @@ extern int FASTCALL(schedule_work(struct
 extern int FASTCALL(schedule_delayed_work(struct work_struct *work, unsigned long delay));
 extern void flush_scheduled_work(void);
 extern int current_is_keventd(void);
+extern int keventd_up(void);
 
 extern void init_workqueues(void);
 
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.6.1-rc1-bk5/kernel/Makefile working-2.6.1-rc1-bk5-kthread-noglobal/kernel/Makefile
--- linux-2.6.1-rc1-bk5/kernel/Makefile	2003-10-09 18:03:02.000000000 +1000
+++ working-2.6.1-rc1-bk5-kthread-noglobal/kernel/Makefile	2004-01-05 14:42:02.000000000 +1100
@@ -6,7 +6,8 @@ obj-y     = sched.o fork.o exec_domain.o
 	    exit.o itimer.o time.o softirq.o resource.o \
 	    sysctl.o capability.o ptrace.o timer.o user.o \
 	    signal.o sys.o kmod.o workqueue.o pid.o \
-	    rcupdate.o intermodule.o extable.o params.o posix-timers.o
+	    rcupdate.o intermodule.o extable.o params.o posix-timers.o \
+	    kthread.o
 
 obj-$(CONFIG_FUTEX) += futex.o
 obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.6.1-rc1-bk5/kernel/kthread.c working-2.6.1-rc1-bk5-kthread-noglobal/kernel/kthread.c
--- linux-2.6.1-rc1-bk5/kernel/kthread.c	1970-01-01 10:00:00.000000000 +1000
+++ working-2.6.1-rc1-bk5-kthread-noglobal/kernel/kthread.c	2004-01-05 14:42:02.000000000 +1100
@@ -0,0 +1,289 @@
+/* Kernel thread helper functions.
+ *   Copyright (C) 2003 IBM Corporation, Rusty Russell.
+ *
+ * Everything is done via keventd, so that we get a clean environment
+ * even if we're invoked from userspace (think modprobe, hotplug cpu,
+ * etc.).  Also, it allows us to wait for dying kthreads without side
+ * effects involved in adopting kthreads to random processes.
+ */
+#define __KERNEL_SYSCALLS__
+#include <linux/sched.h>
+#include <linux/kthread.h>
+#include <linux/completion.h>
+#include <linux/err.h>
+#include <linux/unistd.h>
+#include <asm/semaphore.h>
+
+struct kthread_create_info
+{
+	int (*initfn)(void *data);
+	int (*corefn)(void *data);
+	void *data;
+	char *name;
+	struct completion started;
+};
+
+/* Returns a POSITIVE error number. */
+static int kthread(void *data)
+{
+	/* Copy data: it's on keventd_init's stack */
+	struct kthread_create_info k = *(struct kthread_create_info *)data;
+	int ret = 0;
+	sigset_t blocked;
+
+	strcpy(current->comm, k.name);
+
+	/* Block and flush all signals. */
+	sigfillset(&blocked);
+	sigprocmask(SIG_BLOCK, &blocked, NULL);
+	flush_signals(current);
+
+	/* OK, tell user we're spawned, wait for kthread_start/stop */
+	current->state = TASK_INTERRUPTIBLE;
+	complete(&((struct kthread_create_info *)data)->started);
+	schedule();
+
+	if (signal_pending(current))
+		return EINTR;
+
+	/* Run initfn: start_kthread is in wait(). */
+	if (k.initfn)
+		ret = k.initfn(k.data);
+	if (ret < 0)
+		return -ret << 8;
+
+	/* We succeeded.  Stop ourselves to break them out of wait(). */
+	current->exit_code = 1;
+	current->state = TASK_STOPPED;
+	notify_parent(current, SIGCHLD);
+	schedule();
+
+	while (!signal_pending(current)) {
+		/* If it fails, just wait until kthread_stop. */
+		if (k.corefn && (ret = k.corefn(k.data)) < 0) {
+			k.corefn = NULL;
+			current->state = TASK_INTERRUPTIBLE;
+		}
+		schedule();
+	}
+
+	current->state = TASK_RUNNING;
+	return -ret << 8;
+}
+
+enum kthread_op_type
+{
+	KTHREAD_CREATE,
+	KTHREAD_START,
+	KTHREAD_STOP,
+};
+
+struct kthread_op
+{
+	enum kthread_op_type type;
+
+	union {
+		/* KTHREAD_CREATE in */
+		struct kthread_create_info *create;
+		/* KTHREAD_START, KTHREAD_STOP in */
+		struct task_struct *target;
+		/* out */
+		struct task_struct *result;
+	} u;
+	struct completion done;
+};
+
+/* We are keventd: create a thread. */
+static void *create_kthread(struct kthread_op *op)
+{
+	int pid;
+
+	/* We want our own signal handler (we take no signals by default). */
+	pid = kernel_thread(kthread,op->u.create,CLONE_FS|CLONE_FILES|SIGCHLD);
+	if (pid < 0)
+		return ERR_PTR(pid);
+
+	wait_for_completion(&op->u.create->started);
+	return find_task_by_pid(pid);
+}
+
+static void adopt_kthread(struct task_struct *k)
+{
+	write_lock_irq(&tasklist_lock);
+	REMOVE_LINKS(k);
+	k->parent = current;
+	k->real_parent = current;
+	SET_LINKS(k);
+	write_unlock_irq(&tasklist_lock);
+}
+
+/* We are keventd: start the thread. */
+static void *start_kthread(struct kthread_op *op)
+{
+	int ret, status;
+
+	/* Adopt it so we can wait() for result. */
+	adopt_kthread(op->u.target);
+	wake_up_process(op->u.target);
+
+	ret = waitpid(op->u.target->tgid, &status, __WALL|WUNTRACED);
+	BUG_ON(ret < 0);
+	if ((status & 0xff) == 0x7f) {
+		/* STOPPED means initfn succeeded, tell it to continue. */
+		wake_up_process(op->u.target);
+		return NULL;
+	}
+
+	return ERR_PTR(-((status >> 8) & 0xFF));
+}
+
+/* We are keventd: stop the thread. */
+static void *stop_kthread(struct kthread_op *op)
+{
+	int status;
+	pid_t pid = op->u.target->tgid;
+
+	adopt_kthread(op->u.target);
+
+	force_sig(SIGTERM, op->u.target);
+	waitpid(pid, &status, __WALL);
+
+	return ERR_PTR(-((status >> 8) & 0xFF));
+}
+
+/* We are keventd: do what they told us */
+static void keventd_do_kthread_op(void *_op)
+{
+	struct kthread_op *op = _op;
+
+	switch (op->type) {
+	case KTHREAD_CREATE: op->u.result = create_kthread(op); break;
+	case KTHREAD_START: op->u.result = start_kthread(op); break;
+	case KTHREAD_STOP: op->u.result = stop_kthread(op); break;
+	default:
+		BUG();
+	}
+	complete(&op->done);
+}
+
+static struct task_struct *do_kthread_op(enum kthread_op_type type, void *info)
+{
+	struct kthread_op op;
+	DECLARE_WORK(work, keventd_do_kthread_op, &op);
+
+	op.type = type;
+	op.u.target = info;
+	init_completion(&op.done);
+
+	/* If we're being called to start the first workqueue, we
+	 * can't use keventd. */
+	if (!keventd_up())
+		work.func(work.data);
+	else {
+		schedule_work(&work);
+		wait_for_completion(&op.done);
+	}
+	return op.u.result;
+}
+
+struct task_struct *kthread_create(int (*initfn)(void *data),
+				   int (*corefn)(void *data),
+				   void *data,
+				   const char namefmt[],
+				   ...)
+{
+	va_list args;
+	struct kthread_create_info start;
+	/* Or, as we like to say, 16. */
+	char name[sizeof(((struct task_struct *)0)->comm)];
+
+	va_start(args, namefmt);
+	vsnprintf(name, sizeof(name), namefmt, args);
+	va_end(args);
+
+	start.initfn = initfn;
+	start.corefn = corefn;
+	start.data = data;
+	start.name = name;
+	init_completion(&start.started);
+	return do_kthread_op(KTHREAD_CREATE, &start);
+}
+
+int kthread_start(struct task_struct *k)
+{
+	return PTR_ERR(do_kthread_op(KTHREAD_START, k));
+}
+
+int kthread_stop(struct task_struct *k)
+{
+	return PTR_ERR(do_kthread_op(KTHREAD_STOP, k));
+}
+
+/* Test code */
+#include <linux/init.h>
+static int init(void *data)
+{
+	if (data == init) {
+		printk("init returning %i\n", -EINVAL);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static int core(void *data)
+{
+	if (data == core)
+		return -ENOENT;
+	if (data == do_exit)
+		do_exit(0);
+	current->state = TASK_INTERRUPTIBLE;
+	return 0;
+}
+
+static int test_kthreads(void)
+{
+	struct task_struct *k;
+	int ret;
+
+	printk("kthread: Initialization which fails.\n");
+	k = kthread_create(init, core, init, "kthreadtest");
+	BUG_ON(IS_ERR(k));
+	ret = kthread_start(k);
+	BUG_ON(ret != -EINVAL);
+
+	printk("kthread: stopped before initialization\n");
+	k = kthread_create(init, core, NULL, "kthreadtest");
+	BUG_ON(IS_ERR(k));
+	ret = kthread_stop(k);
+	BUG_ON(ret != 0);
+
+	printk("kthread: Corefn which fails.\n");
+	k = kthread_create(init, core, core, "kthreadtest");
+	BUG_ON(IS_ERR(k));
+	ret = kthread_start(k);
+	BUG_ON(ret != 0);
+	ret = kthread_stop(k);
+	BUG_ON(ret != -ENOENT);
+
+	printk("kthread: Corefn which succeeds.\n");
+	k = kthread_create(init, core, NULL, "kthreadtest");
+	BUG_ON(IS_ERR(k));
+	ret = kthread_start(k);
+	BUG_ON(ret != 0);
+	ret = kthread_stop(k);
+	BUG_ON(ret != 0);
+
+	printk("kthread: Corefn which exits.\n");
+	k = kthread_create(init, core, do_exit, "kthreadtest");
+	BUG_ON(IS_ERR(k));
+	get_task_struct(k);
+	ret = kthread_start(k);
+	BUG_ON(ret != 0);
+	BUG_ON(!(k->state & (TASK_ZOMBIE|TASK_DEAD)));
+	put_task_struct(k);
+
+	printk("All kthread tests passed...\n");
+	return 0;
+}
+
+module_init(test_kthreads);
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.6.1-rc1-bk5/kernel/workqueue.c working-2.6.1-rc1-bk5-kthread-noglobal/kernel/workqueue.c
--- linux-2.6.1-rc1-bk5/kernel/workqueue.c	2003-09-22 10:27:38.000000000 +1000
+++ working-2.6.1-rc1-bk5-kthread-noglobal/kernel/workqueue.c	2004-01-05 14:42:02.000000000 +1100
@@ -181,7 +181,7 @@ static int worker_thread(void *__startup
 	complete(&startup->done);
 
 	/* Install a handler so SIGCLD is delivered */
-	sa.sa.sa_handler = SIG_IGN;
+	sa.sa.sa_handler = SIG_DFL;
 	sa.sa.sa_flags = 0;
 	siginitset(&sa.sa.sa_mask, sigmask(SIGCHLD));
 	do_sigaction(SIGCHLD, &sa, (struct k_sigaction *)0);
@@ -359,6 +359,11 @@ void flush_scheduled_work(void)
 	flush_workqueue(keventd_wq);
 }
 
+int keventd_up(void)
+{
+	return keventd_wq != NULL;
+}
+
 int current_is_keventd(void)
 {
 	struct cpu_workqueue_struct *cwq;

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2004-01-05  4:09                             ` Rusty Russell
@ 2004-01-05  5:06                               ` Davide Libenzi
  2004-01-05  6:38                                 ` Rusty Russell
  0 siblings, 1 reply; 47+ messages in thread
From: Davide Libenzi @ 2004-01-05  5:06 UTC (permalink / raw)
  To: Rusty Russell; +Cc: mingo, Linux Kernel Mailing List

On Mon, 5 Jan 2004, Rusty Russell wrote:

> In message <Pine.LNX.4.44.0401041501510.12666-100000@bigblue.dev.mdolabs.com> you write:
> > I can see two options:
> > 
> > 1) We do not allow do_exit() from kthreads
> > 
> > 2) We give kthread_exit()
> > 
> > What do you think?
> 
> Well, I've now got a patch which works without any global
> datastructures, but still allows the same semantics.  Includes some
> fixes in the previous code, and test code.
> 
> Also includes a longstanding fix in workqueue.c: we never get SIGCHLD
> if signal handler is SIG_IGN.
> 
> I'm not sure it's neater, though.  We don't have WIFEXITED() in the
> kernel, so lots of manual shifting, probably should add that
> somewhere.
> 
> Thoughts?

Honestly I do not like playing with SIGCLD/waitpid for things that do not 
concern real task exits. But I think it can be avoided, and actually I 
don't know why I did not think about this before. We don't need to return 
a struct task_struct* for kthread_create(). We can have:

struct kthread_struct {
	struct task_struct *tsk;
	int ret;
	struct semaphore ack; /* Or completion if you like */
};

This structure is allocated in kthread_create() and propagated to the 
kthread() void* data. Then they (the creator and the kthread) will have a 
common ground to exchange acks w/out playing with SIGCLD/waitpid. Heh ?



- Davide



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2004-01-05  5:06                               ` Davide Libenzi
@ 2004-01-05  6:38                                 ` Rusty Russell
  2004-01-05  6:52                                   ` Davide Libenzi
  0 siblings, 1 reply; 47+ messages in thread
From: Rusty Russell @ 2004-01-05  6:38 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: mingo, Linux Kernel Mailing List

In message <Pine.LNX.4.44.0401042100510.15831-100000@bigblue.dev.mdolabs.com> you write:
> Honestly I do not like playing with SIGCLD/waitpid for things that do not 
> concern real task exits. 

Well, it's pretty well documented.

> But I think it can be avoided, and actually I 
> don't know why I did not think about this before. We don't need to return 
> a struct task_struct* for kthread_create(). We can have:
> 
> struct kthread_struct {

Nope.  That's EXACTLY the kind of burden on the caller I wanted to
avoid if at all possible.

I just have decide whether to submit the wait one or the shared queue
one to Andrew....

Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2004-01-05  6:38                                 ` Rusty Russell
@ 2004-01-05  6:52                                   ` Davide Libenzi
  2004-01-07  7:00                                     ` Rusty Russell
  0 siblings, 1 reply; 47+ messages in thread
From: Davide Libenzi @ 2004-01-05  6:52 UTC (permalink / raw)
  To: Rusty Russell; +Cc: mingo, Linux Kernel Mailing List

On Mon, 5 Jan 2004, Rusty Russell wrote:

> > But I think it can be avoided, and actually I 
> > don't know why I did not think about this before. We don't need to return 
> > a struct task_struct* for kthread_create(). We can have:
> > 
> > struct kthread_struct {
> 
> Nope.  That's EXACTLY the kind of burden on the caller I wanted to
> avoid if at all possible.

Which burden? The kthread is a resource and a "struct kthread" is an 
handle to the resource. You create the resource (kthread_create()), you 
control the resource (kthread_start()) and you free the resource 
(kthread_stop()). To me it's simple and clean and does not require hacks 
like taking owerships of tasks and using SIGCLD/waitpid to communicate. 
Anyway, that's your baby and you'll take your choice.



- Davide



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2004-01-05  6:52                                   ` Davide Libenzi
@ 2004-01-07  7:00                                     ` Rusty Russell
  2004-01-07  7:25                                       ` Davide Libenzi
  0 siblings, 1 reply; 47+ messages in thread
From: Rusty Russell @ 2004-01-07  7:00 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: mingo, Linux Kernel Mailing List

In message <Pine.LNX.4.44.0401042243510.16042-100000@bigblue.dev.mdolabs.com> you write:
> On Mon, 5 Jan 2004, Rusty Russell wrote:
> > Nope.  That's EXACTLY the kind of burden on the caller I wanted to
> > avoid if at all possible.
> 
> Which burden? The kthread is a resource and a "struct kthread" is an 
> handle to the resource. You create the resource (kthread_create()), you 
> control the resource (kthread_start()) and you free the resource 
> (kthread_stop()). To me it's simple and clean and does not require hacks 
> like taking owerships of tasks and using SIGCLD/waitpid to communicate. 
> Anyway, that's your baby and you'll take your choice.

Thinking more about this issue lead me to rewrite the code to be
simpler to use.  Main benefit is that the transition from existing
code is minimal.

Latest version of patch, and code which uses it.  It's actually quite
neat now.  Changes since first version:

1) kthread_start() deleted in favor of wake_up_process() directly.
2) New kthread_bind() for just-created threads, to replace original
   migration thread open-coded version.
3) Slight simplification: thread named only if spawned OK.

Thanks!
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

Name: Simplified Kthread
Author: Rusty Russell
Status: Booted on 2.6.1-rc2-bk1
Depends: Hotcpu-New-Kthread/workqueue_children.patch.gz

D: The hotplug CPU code introduces two major problems:
D: 
D: 1) Threads which previously never stopped (migration thread,
D:    ksoftirqd, keventd) have to be stopped cleanly as CPUs go offline.
D: 2) Threads which previously never had to be created now have
D:    to be created when a CPU goes online.
D: 
D: Unfortunately, stopping a thread is fairly baroque, involving memory
D: barriers, a completion and spinning until the task is actually dead
D: (for example, complete_and_exit() must be used if inside a module).
D: 
D: There are also three problems in starting a thread:
D: 1) Doing it from a random process context risks environment contamination:
D:    better to do it from keventd to guarantee a clean environment, a-la
D:    call_usermodehelper.
D: 2) Getting the task struct without races is a hard: see kernel/sched.c
D:    migration_call(), kernel/workqueue.c create_workqueue_thread().
D: 3) There are races in starting a thread for a CPU which is not yet
D:    online: migration thread does a complex dance at the moment for
D:    a similar reason (there may be no migration thread to migrate us).
D: 
D: Place all this logic in some primitives to make life easier:
D: kthread_create() and kthread_stop().  These primitives require no
D: extra data-structures in the caller: they operate on normal "struct
D: task_struct"s.
D: 
D: Other changes:
D:   - Expose keventd_up(), as keventd and migration threads will use
D:     kthread to launch, and kthread normally uses workqueues and must
D:     recognize this case.

diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .16124-linux-2.6.1-rc2-bk1/include/linux/kthread.h .16124-linux-2.6.1-rc2-bk1.updated/include/linux/kthread.h
--- .16124-linux-2.6.1-rc2-bk1/include/linux/kthread.h	1970-01-01 10:00:00.000000000 +1000
+++ .16124-linux-2.6.1-rc2-bk1.updated/include/linux/kthread.h	2004-01-07 17:56:59.000000000 +1100
@@ -0,0 +1,71 @@
+#ifndef _LINUX_KTHREAD_H
+#define _LINUX_KTHREAD_H
+/* Simple interface for creating and stopping kernel threads without mess. */
+#include <linux/err.h>
+#include <linux/sched.h>
+
+/**
+ * kthread_create: create a kthread.
+ * @threadfn: the function to run until signal_pending(current).
+ * @data: data ptr for @threadfn.
+ * @namefmt: printf-style name for the thread.
+ *
+ * Description: This helper function creates and names a kernel
+ * thread.  The thread will be stopped: use wake_up_process() to start
+ * it.  See also kthread_run(), kthread_create_on_cpu().
+ *
+ * When woken, the thread will run @threadfn() with @data as its
+ * argument. @threadfn can either call do_exit() directly if it is a
+ * standalone thread for which noone will call kthread_stop(), or
+ * return when 'signal_pending(current)' is true (which means
+ * kthread_stop() has been called).  The return value should be zero
+ * or a negative error number: it will be passed to kthread_stop().
+ * 
+ * Returns a task_struct or ERR_PTR(-ENOMEM).
+ */
+struct task_struct *kthread_create(int (*threadfn)(void *data),
+				   void *data,
+				   const char namefmt[], ...);
+
+/**
+ * kthread_run: create and wake a thread.
+ * @threadfn: the function to run until signal_pending(current).
+ * @data: data ptr for @threadfn.
+ * @namefmt: printf-style name for the thread.
+ *
+ * Description: Convenient wrapper for kthread_create() followed by
+ * wake_up_process().  Returns the kthread, or ERR_PTR(-ENOMEM). */
+#define kthread_run(threadfn, data, namefmt, ...)			   \
+({									   \
+	struct task_struct *__k						   \
+		= kthread_create(threadfn, data, namefmt, ## __VA_ARGS__); \
+	if (!IS_ERR(__k))						   \
+		wake_up_process(__k);					   \
+	__k;								   \
+})
+
+/**
+ * kthread_bind: bind a just-created kthread to a cpu.
+ * @k: thread created by kthread_create().
+ * @cpu: cpu (might not be online, must be possible) for @k to run on.
+ *
+ * Description: This function is equivalent to set_cpus_allowed(),
+ * except that @cpu doesn't need to be online, and the thread must be
+ * stopped (ie. just returned from kthread_create().
+ */
+void kthread_bind(struct task_struct *k, unsigned int cpu);
+
+/**
+ * kthread_stop: stop a thread created by kthread_create().
+ * @k: thread created by kthread_create().
+ *
+ * Sends a signal to @k, and waits for it to exit.  Your threadfn()
+ * must not call do_exit() itself if you use this function!  This can
+ * also be called after kthread_create() instead of calling
+ * wake_up_process(): the thread will exit without calling threadfn().
+ *
+ * Returns the result of threadfn(), or -EINTR if wake_up_process()
+ * was never called. */
+int kthread_stop(struct task_struct *k);
+
+#endif /* _LINUX_KTHREAD_H */
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .16124-linux-2.6.1-rc2-bk1/include/linux/workqueue.h .16124-linux-2.6.1-rc2-bk1.updated/include/linux/workqueue.h
--- .16124-linux-2.6.1-rc2-bk1/include/linux/workqueue.h	2003-09-22 10:07:08.000000000 +1000
+++ .16124-linux-2.6.1-rc2-bk1.updated/include/linux/workqueue.h	2004-01-07 17:56:59.000000000 +1100
@@ -60,6 +60,7 @@ extern int FASTCALL(schedule_work(struct
 extern int FASTCALL(schedule_delayed_work(struct work_struct *work, unsigned long delay));
 extern void flush_scheduled_work(void);
 extern int current_is_keventd(void);
+extern int keventd_up(void);
 
 extern void init_workqueues(void);
 
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .16124-linux-2.6.1-rc2-bk1/kernel/Makefile .16124-linux-2.6.1-rc2-bk1.updated/kernel/Makefile
--- .16124-linux-2.6.1-rc2-bk1/kernel/Makefile	2003-10-09 18:03:02.000000000 +1000
+++ .16124-linux-2.6.1-rc2-bk1.updated/kernel/Makefile	2004-01-07 17:56:59.000000000 +1100
@@ -6,7 +6,8 @@ obj-y     = sched.o fork.o exec_domain.o
 	    exit.o itimer.o time.o softirq.o resource.o \
 	    sysctl.o capability.o ptrace.o timer.o user.o \
 	    signal.o sys.o kmod.o workqueue.o pid.o \
-	    rcupdate.o intermodule.o extable.o params.o posix-timers.o
+	    rcupdate.o intermodule.o extable.o params.o posix-timers.o \
+	    kthread.o
 
 obj-$(CONFIG_FUTEX) += futex.o
 obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .16124-linux-2.6.1-rc2-bk1/kernel/kthread.c .16124-linux-2.6.1-rc2-bk1.updated/kernel/kthread.c
--- .16124-linux-2.6.1-rc2-bk1/kernel/kthread.c	1970-01-01 10:00:00.000000000 +1000
+++ .16124-linux-2.6.1-rc2-bk1.updated/kernel/kthread.c	2004-01-07 17:56:59.000000000 +1100
@@ -0,0 +1,170 @@
+/* Kernel thread helper functions.
+ *   Copyright (C) 2004 IBM Corporation, Rusty Russell.
+ *
+ * Everything is done via keventd, so that we get a clean environment
+ * even if we're invoked from userspace (think modprobe, hotplug cpu,
+ * etc.).  Also, it allows us to wait for dying kthreads without side
+ * effects involved in adopting kthreads to random processes.
+ */
+#define __KERNEL_SYSCALLS__
+#include <linux/sched.h>
+#include <linux/kthread.h>
+#include <linux/completion.h>
+#include <linux/err.h>
+#include <linux/unistd.h>
+#include <asm/semaphore.h>
+
+struct kthread_create_info
+{
+	/* Information passed to kthread() from keventd. */
+	int (*threadfn)(void *data);
+	void *data;
+	struct completion started;
+
+	/* Result passed back to kthread_create() from keventd. */
+	struct task_struct *result;
+	struct completion done;
+};
+
+/* Returns so that WEXITSTATUS(ret) == errno. */
+static int kthread(void *_create)
+{
+	struct kthread_create_info *create = _create;
+	int (*threadfn)(void *data);
+	void *data;
+	int ret = -EINTR;
+
+	/* Copy data: it's on keventd's stack */
+	threadfn = create->threadfn;
+	data = create->data;
+
+	/* OK, tell user we're spawned, wait for stop or wakeup */
+	current->state = TASK_INTERRUPTIBLE;
+	complete(&create->started);
+	schedule();
+
+	while (!signal_pending(current))
+		ret = threadfn(data);
+
+	return (-ret) << 8;
+}
+
+/* We are keventd: create a thread. */
+static void keventd_create_kthread(void *_create)
+{
+	struct kthread_create_info *create = _create;
+	int pid;
+
+	/* We want our own signal handler (we take no signals by default). */
+	pid = kernel_thread(kthread, create, CLONE_FS | CLONE_FILES | SIGCHLD);
+	if (pid < 0) {
+		create->result = ERR_PTR(pid);
+	} else {
+		wait_for_completion(&create->started);
+		create->result = find_task_by_pid(pid);
+		wait_task_inactive(create->result);
+	}
+	complete(&create->done);
+}
+
+struct kthread_stop_info
+{
+	struct task_struct *k;
+	int result;
+	struct completion done;
+};
+
+/* "to look upon me as her own dad -- in a very real, and legally
+   binding sense." - Michael Palin */
+static void adopt_kthread(struct task_struct *k)
+{
+	write_lock_irq(&tasklist_lock);
+	REMOVE_LINKS(k);
+	k->parent = current;
+	k->real_parent = current;
+	SET_LINKS(k);
+	write_unlock_irq(&tasklist_lock);
+}
+
+/* We are keventd: stop the thread. */
+static void keventd_stop_kthread(void *_stop)
+{
+	struct kthread_stop_info *stop = _stop;
+	int status;
+	sigset_t blocked;
+	struct k_sigaction sa;
+
+	/* Install a handler so SIGCHLD is actually delivered */
+	sa.sa.sa_handler = SIG_DFL;
+	sa.sa.sa_flags = 0;
+	siginitset(&sa.sa.sa_mask, sigmask(SIGCHLD));
+	do_sigaction(SIGCHLD, &sa, (struct k_sigaction *)0);
+	allow_signal(SIGCHLD);
+
+	adopt_kthread(stop->k);
+	/* All signals are blocked, hence the force. */
+	force_sig(SIGTERM, stop->k);
+	waitpid(stop->k->tgid, &status, __WALL);
+	stop->result = -((status >> 8) & 0xFF);
+	complete(&stop->done);
+
+	/* Back to normal: block and flush all signals */
+	sigfillset(&blocked);
+	sigprocmask(SIG_BLOCK, &blocked, NULL);
+	flush_signals(current);
+	sa.sa.sa_handler = SIG_IGN;
+	do_sigaction(SIGCHLD, &sa, (struct k_sigaction *)0);
+	while (waitpid(-1, &status, __WALL|WNOHANG) > 0);
+}
+
+struct task_struct *kthread_create(int (*threadfn)(void *data),
+				   void *data,
+				   const char namefmt[],
+				   ...)
+{
+	struct kthread_create_info create;
+	DECLARE_WORK(work, keventd_create_kthread, &create);
+
+	create.threadfn = threadfn;
+	create.data = data;
+	init_completion(&create.started);
+	init_completion(&create.done);
+
+	/* If we're being called to start the first workqueue, we
+	 * can't use keventd. */
+	if (!keventd_up())
+		work.func(work.data);
+	else {
+		schedule_work(&work);
+		wait_for_completion(&create.done);
+	}
+	if (!IS_ERR(create.result)) {
+		va_list args;
+		va_start(args, namefmt);
+		vsnprintf(create.result->comm, sizeof(create.result->comm),
+			  namefmt, args);
+		va_end(args);
+	}
+
+	return create.result;
+}
+
+void kthread_bind(struct task_struct *k, unsigned int cpu)
+{
+	BUG_ON(k->state != TASK_INTERRUPTIBLE);
+	k->thread_info->cpu = cpu;
+	k->cpus_allowed = cpumask_of_cpu(cpu);
+}
+
+int kthread_stop(struct task_struct *k)
+{
+	struct kthread_stop_info stop;
+	DECLARE_WORK(work, keventd_stop_kthread, &stop);
+
+	stop.k = k;
+	init_completion(&stop.done);
+
+	schedule_work(&work);
+	wait_for_completion(&stop.done);
+	return stop.result;
+}
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .16124-linux-2.6.1-rc2-bk1/kernel/workqueue.c .16124-linux-2.6.1-rc2-bk1.updated/kernel/workqueue.c
--- .16124-linux-2.6.1-rc2-bk1/kernel/workqueue.c	2004-01-07 17:56:58.000000000 +1100
+++ .16124-linux-2.6.1-rc2-bk1.updated/kernel/workqueue.c	2004-01-07 17:56:59.000000000 +1100
@@ -347,6 +347,11 @@ void flush_scheduled_work(void)
 	flush_workqueue(keventd_wq);
 }
 
+int keventd_up(void)
+{
+	return keventd_wq != NULL;
+}
+
 int current_is_keventd(void)
 {
 	struct cpu_workqueue_struct *cwq;

Name: Use Kthread For Core Kernel Threads
Author: Rusty Russell
Status: Booted on 2.6.1-rc2-bk1
Depends: Hotcpu-New-Kthread/kthread-simple.patch.gz

D: This simply changes over the migration threads, the workqueue
D: threads and the ksoftirqd threads to use kthread.

diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .9505-linux-2.6.1-rc2-bk1/kernel/sched.c .9505-linux-2.6.1-rc2-bk1.updated/kernel/sched.c
--- .9505-linux-2.6.1-rc2-bk1/kernel/sched.c	2004-01-06 18:01:01.000000000 +1100
+++ .9505-linux-2.6.1-rc2-bk1.updated/kernel/sched.c	2004-01-07 16:34:26.000000000 +1100
@@ -37,6 +37,7 @@
 #include <linux/rcupdate.h>
 #include <linux/cpu.h>
 #include <linux/percpu.h>
+#include <linux/kthread.h>
 
 #ifdef CONFIG_NUMA
 #define cpu_to_node_mask(cpu) node_to_cpumask(cpu_to_node(cpu))
@@ -2640,12 +2641,6 @@ static void move_task_away(struct task_s
 	local_irq_restore(flags);
 }
 
-typedef struct {
-	int cpu;
-	struct completion startup_done;
-	task_t *task;
-} migration_startup_t;
-
 /*
  * migration_thread - this is a highprio system thread that performs
  * thread migration by bumping thread off CPU then 'pushing' onto
@@ -2655,27 +2650,17 @@ static int migration_thread(void * data)
 {
 	/* Marking "param" __user is ok, since we do a set_fs(KERNEL_DS); */
 	struct sched_param __user param = { .sched_priority = MAX_RT_PRIO-1 };
-	migration_startup_t *startup = data;
-	int cpu = startup->cpu;
 	runqueue_t *rq;
+	int cpu = (long)data;
 	int ret;
 
-	startup->task = current;
-	complete(&startup->startup_done);
-	set_current_state(TASK_UNINTERRUPTIBLE);
-	schedule();
-
 	BUG_ON(smp_processor_id() != cpu);
-
-	daemonize("migration/%d", cpu);
-	set_fs(KERNEL_DS);
-
 	ret = setscheduler(0, SCHED_FIFO, &param);
 
 	rq = this_rq();
-	rq->migration_thread = current;
+	BUG_ON(rq->migration_thread != current);
 
-	for (;;) {
+	while (!signal_pending(current)) {
 		struct list_head *head;
 		migration_req_t *req;
 
@@ -2698,6 +2683,7 @@ static int migration_thread(void * data)
 			       any_online_cpu(req->task->cpus_allowed));
 		complete(&req->done);
 	}
+	return 0;
 }
 
 /*
@@ -2708,36 +2694,27 @@ static int migration_call(struct notifie
 			  unsigned long action,
 			  void *hcpu)
 {
-	long cpu = (long) hcpu;
-	migration_startup_t startup;
+	int cpu = (long)hcpu;
+	struct task_struct *p;
 
 	switch (action) {
 	case CPU_ONLINE:
-
-		printk("Starting migration thread for cpu %li\n", cpu);
-
-		startup.cpu = cpu;
-		startup.task = NULL;
-		init_completion(&startup.startup_done);
-
-		kernel_thread(migration_thread, &startup, CLONE_KERNEL);
-		wait_for_completion(&startup.startup_done);
-		wait_task_inactive(startup.task);
-
-		startup.task->thread_info->cpu = cpu;
-		startup.task->cpus_allowed = cpumask_of_cpu(cpu);
-
-		wake_up_process(startup.task);
-
-		while (!cpu_rq(cpu)->migration_thread)
-			yield();
-
+		p = kthread_create(migration_thread, hcpu, "migration/%d",cpu);
+		if (IS_ERR(p))
+			return NOTIFY_BAD;
+		kthread_bind(p, cpu);
+		cpu_rq(cpu)->migration_thread = p;
+		wake_up_process(p);
 		break;
 	}
 	return NOTIFY_OK;
 }
 
-static struct notifier_block migration_notifier = { &migration_call, NULL, 0 };
+/* Want this before the other threads, so they can use set_cpus_allowed. */
+static struct notifier_block __devinitdata migration_notifier = { 
+	.notifier_call = migration_call,
+	.priority = 10,
+};
 
 __init int migration_init(void)
 {
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .9505-linux-2.6.1-rc2-bk1/kernel/softirq.c .9505-linux-2.6.1-rc2-bk1.updated/kernel/softirq.c
--- .9505-linux-2.6.1-rc2-bk1/kernel/softirq.c	2003-10-09 18:03:02.000000000 +1000
+++ .9505-linux-2.6.1-rc2-bk1.updated/kernel/softirq.c	2004-01-07 16:36:34.000000000 +1100
@@ -14,6 +14,7 @@
 #include <linux/notifier.h>
 #include <linux/percpu.h>
 #include <linux/cpu.h>
+#include <linux/kthread.h>
 
 /*
    - No shared variables, all the data are CPU local.
@@ -337,20 +338,14 @@ static int ksoftirqd(void * __bind_cpu)
 {
 	int cpu = (int) (long) __bind_cpu;
 
-	daemonize("ksoftirqd/%d", cpu);
 	set_user_nice(current, 19);
 	current->flags |= PF_IOTHREAD;
 
-	/* Migrate to the right CPU */
-	set_cpus_allowed(current, cpumask_of_cpu(cpu));
 	BUG_ON(smp_processor_id() != cpu);
 
-	__set_current_state(TASK_INTERRUPTIBLE);
-	mb();
-
-	__get_cpu_var(ksoftirqd) = current;
+	set_current_state(TASK_INTERRUPTIBLE);
 
-	for (;;) {
+	while (!signal_pending(current)) {
 		if (!local_softirq_pending())
 			schedule();
 
@@ -363,6 +358,7 @@ static int ksoftirqd(void * __bind_cpu)
 
 		__set_current_state(TASK_INTERRUPTIBLE);
 	}
+	return 0;
 }
 
 static int __devinit cpu_callback(struct notifier_block *nfb,
@@ -370,15 +366,17 @@ static int __devinit cpu_callback(struct
 				  void *hcpu)
 {
 	int hotcpu = (unsigned long)hcpu;
+	struct task_struct *p;
 
 	if (action == CPU_ONLINE) {
-		if (kernel_thread(ksoftirqd, hcpu, CLONE_KERNEL) < 0) {
+		p = kthread_create(ksoftirqd, hcpu, "ksoftirqd/%d", hotcpu);
+		if (IS_ERR(p)) {
 			printk("ksoftirqd for %i failed\n", hotcpu);
 			return NOTIFY_BAD;
 		}
-
-		while (!per_cpu(ksoftirqd, hotcpu))
-			yield();
+		per_cpu(ksoftirqd, hotcpu) = p;
+		kthread_bind(p, hotcpu);
+		wake_up_process(p);
  	}
 	return NOTIFY_OK;
 }
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .9505-linux-2.6.1-rc2-bk1/kernel/workqueue.c .9505-linux-2.6.1-rc2-bk1.updated/kernel/workqueue.c
--- .9505-linux-2.6.1-rc2-bk1/kernel/workqueue.c	2004-01-07 16:33:53.000000000 +1100
+++ .9505-linux-2.6.1-rc2-bk1.updated/kernel/workqueue.c	2004-01-07 16:35:27.000000000 +1100
@@ -22,6 +22,7 @@
 #include <linux/completion.h>
 #include <linux/workqueue.h>
 #include <linux/slab.h>
+#include <linux/kthread.h>
 
 /*
  * The per-CPU workqueue.
@@ -45,7 +46,6 @@ struct cpu_workqueue_struct {
 
 	struct workqueue_struct *wq;
 	task_t *thread;
-	struct completion exit;
 
 } ____cacheline_aligned;
 
@@ -153,28 +152,23 @@ static inline void run_workqueue(struct 
 	spin_unlock_irqrestore(&cwq->lock, flags);
 }
 
-typedef struct startup_s {
-	struct cpu_workqueue_struct *cwq;
-	struct completion done;
-	const char *name;
-} startup_t;
-
-static int worker_thread(void *__startup)
+static int worker_thread(void *__cwq)
 {
-	startup_t *startup = __startup;
-	struct cpu_workqueue_struct *cwq = startup->cwq;
+	struct cpu_workqueue_struct *cwq = __cwq;
 	int cpu = cwq - cwq->wq->cpu_wq;
 	DECLARE_WAITQUEUE(wait, current);
 	struct k_sigaction sa;
+	sigset_t blocked;
 
-	daemonize("%s/%d", startup->name, cpu);
 	current->flags |= PF_IOTHREAD;
-	cwq->thread = current;
 
 	set_user_nice(current, -10);
-	set_cpus_allowed(current, cpumask_of_cpu(cpu));
+	BUG_ON(smp_processor_id() != cpu);
 
-	complete(&startup->done);
+	/* Block and flush all signals */
+	sigfillset(&blocked);
+	sigprocmask(SIG_BLOCK, &blocked, NULL);
+	flush_signals(current);
 
 	/* SIG_IGN makes children autoreap: see do_notify_parent(). */
 	sa.sa.sa_handler = SIG_IGN;
@@ -182,12 +176,10 @@ static int worker_thread(void *__startup
 	siginitset(&sa.sa.sa_mask, sigmask(SIGCHLD));
 	do_sigaction(SIGCHLD, &sa, (struct k_sigaction *)0);
 
-	for (;;) {
+	while (!signal_pending(current)) {
 		set_task_state(current, TASK_INTERRUPTIBLE);
 
 		add_wait_queue(&cwq->more_work, &wait);
-		if (!cwq->thread)
-			break;
 		if (list_empty(&cwq->worklist))
 			schedule();
 		else
@@ -197,9 +189,6 @@ static int worker_thread(void *__startup
 		if (!list_empty(&cwq->worklist))
 			run_workqueue(cwq);
 	}
-	remove_wait_queue(&cwq->more_work, &wait);
-	complete(&cwq->exit);
-
 	return 0;
 }
 
@@ -251,9 +240,8 @@ static int create_workqueue_thread(struc
 				   const char *name,
 				   int cpu)
 {
-	startup_t startup;
 	struct cpu_workqueue_struct *cwq = wq->cpu_wq + cpu;
-	int ret;
+	struct task_struct *p;
 
 	spin_lock_init(&cwq->lock);
 	cwq->wq = wq;
@@ -263,17 +251,13 @@ static int create_workqueue_thread(struc
 	INIT_LIST_HEAD(&cwq->worklist);
 	init_waitqueue_head(&cwq->more_work);
 	init_waitqueue_head(&cwq->work_done);
-	init_completion(&cwq->exit);
 
-	init_completion(&startup.done);
-	startup.cwq = cwq;
-	startup.name = name;
-	ret = kernel_thread(worker_thread, &startup, CLONE_FS | CLONE_FILES);
-	if (ret >= 0) {
-		wait_for_completion(&startup.done);
-		BUG_ON(!cwq->thread);
-	}
-	return ret;
+	p = kthread_create(worker_thread, cwq, "%s/%d", name, cpu);
+	if (IS_ERR(p))
+		return PTR_ERR(p);
+	cwq->thread = p;
+	kthread_bind(p, cpu);
+	return 0;
 }
 
 struct workqueue_struct *create_workqueue(const char *name)
@@ -292,6 +276,8 @@ struct workqueue_struct *create_workqueu
 			continue;
 		if (create_workqueue_thread(wq, name, cpu) < 0)
 			destroy = 1;
+		else
+			wake_up_process(wq->cpu_wq[cpu].thread);
 	}
 	/*
 	 * Was there any error during startup? If yes then clean up:
@@ -308,13 +294,8 @@ static void cleanup_workqueue_thread(str
 	struct cpu_workqueue_struct *cwq;
 
 	cwq = wq->cpu_wq + cpu;
-	if (cwq->thread) {
-		/* Tell thread to exit and wait for it. */
-		cwq->thread = NULL;
-		wake_up(&cwq->more_work);
-
-		wait_for_completion(&cwq->exit);
-	}
+	if (cwq->thread)
+		kthread_stop(cwq->thread);
 }
 
 void destroy_workqueue(struct workqueue_struct *wq)

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2004-01-07  7:00                                     ` Rusty Russell
@ 2004-01-07  7:25                                       ` Davide Libenzi
  2004-01-08  0:33                                         ` Rusty Russell
  0 siblings, 1 reply; 47+ messages in thread
From: Davide Libenzi @ 2004-01-07  7:25 UTC (permalink / raw)
  To: Rusty Russell; +Cc: mingo, Linux Kernel Mailing List

On Wed, 7 Jan 2004, Rusty Russell wrote:

> In message <Pine.LNX.4.44.0401042243510.16042-100000@bigblue.dev.mdolabs.com> you write:
> > On Mon, 5 Jan 2004, Rusty Russell wrote:
> > > Nope.  That's EXACTLY the kind of burden on the caller I wanted to
> > > avoid if at all possible.
> > 
> > Which burden? The kthread is a resource and a "struct kthread" is an 
> > handle to the resource. You create the resource (kthread_create()), you 
> > control the resource (kthread_start()) and you free the resource 
> > (kthread_stop()). To me it's simple and clean and does not require hacks 
> > like taking owerships of tasks and using SIGCLD/waitpid to communicate. 
> > Anyway, that's your baby and you'll take your choice.
> 
> Thinking more about this issue lead me to rewrite the code to be
> simpler to use.  Main benefit is that the transition from existing
> code is minimal.
> 
> Latest version of patch, and code which uses it.  It's actually quite
> neat now.  Changes since first version:
> 
> 1) kthread_start() deleted in favor of wake_up_process() directly.
> 2) New kthread_bind() for just-created threads, to replace original
>    migration thread open-coded version.
> 3) Slight simplification: thread named only if spawned OK.

Yes, I like this better. Without any doubt, the removal of sync'd start 
function simplified things a lot.



- Davide





^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2004-01-07  7:25                                       ` Davide Libenzi
@ 2004-01-08  0:33                                         ` Rusty Russell
  0 siblings, 0 replies; 47+ messages in thread
From: Rusty Russell @ 2004-01-08  0:33 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: mingo, Linux Kernel Mailing List

In message <Pine.LNX.4.44.0401062314070.1030-100000@bigblue.dev.mdolabs.com> you write:
> On Wed, 7 Jan 2004, Rusty Russell wrote:
> 
> > Latest version of patch, and code which uses it.  It's actually quite
> > neat now.  Changes since first version:
> 
> Yes, I like this better. Without any doubt, the removal of sync'd start 
> function simplified things a lot.

Yeah, I thought you would: makes our previous debates moot so we can
be friends again 8)

The reason I did the startfn thing originally is for the hotplug CPU
code: I wanted to make sure the thread gets onto the CPU while we're
in the notifier, so it doesn't go down again before the thread does
set_cpus_allowed().  Not a problem in practice, but it's nice to have
the BUG_ON(smp_processor_id() != cpu) in the code.

kthread_bind() solves this neatly, and we need it for migration_thread
anyway (which is the one thread where being on the right CPU effects
correctness, not just performance).

Cheers!
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2004-01-02  7:09           ` Rusty Russell
  2004-01-02 16:58             ` Davide Libenzi
@ 2004-03-29 15:38             ` Davide Libenzi
  1 sibling, 0 replies; 47+ messages in thread
From: Davide Libenzi @ 2004-03-29 15:38 UTC (permalink / raw)
  To: Administrator
  Cc: Linus Torvalds, Andrew Morton, mingo, Linux Kernel Mailing List

On Fri, 2 Jan 2004, Rusty Russell wrote:

> In message <Pine.LNX.4.44.0312311935080.5831-100000@bigblue.dev.mdolabs.com> yo
> u write:
> > On Wed, 31 Dec 2003, Rusty Russell wrote:
> > 
> > > But an alternate implementation would be to have a "kthread" kernel
> > > thread, which would actually be parent to the kthread threads.  This
> > > means it can allocate and clean up, since it catches *all* thread
> > > deaths, including "exit()".
> > > 
> > > What do you think?
> > 
> > Did you take a look at the stuff I sent you? I'll append here with a 
> > simple comment (this goes over you bits).
> 
> Yes, but I think it's a really bad idea, as I said before.
> 
> Anyway, Here's a version which fixes the issues raised by Andrew by
> doing *everything* in keventd, uses waitpid(), and uses signals for
> communication (except for the case of init failing).

Rusty, you still have to use global static data when there is no need. I 
like this version better though ;)



- Davide



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2004-01-02 16:58             ` Davide Libenzi
  2004-01-03  3:05               ` Rusty Russell
@ 2004-03-29 15:39               ` Rusty Russell
  1 sibling, 0 replies; 47+ messages in thread
From: Rusty Russell @ 2004-03-29 15:39 UTC (permalink / raw)
  To: Administrator
  Cc: Linus Torvalds, Andrew Morton, mingo, Linux Kernel Mailing List

In message <Pine.LNX.4.44.0401020856150.2278-100000@bigblue.dev.mdolabs.com> you write:
> Rusty, you still have to use global static data when there is no need.

And you're still putting obscure crap in the task struct when there's
no need.  Honestly, I'd be ashamed to post such a patch.

> I like this version better though ;)

I think I should seek a second opinion though.
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2004-01-03  3:05               ` Rusty Russell
  2004-01-03  3:43                 ` Davide Libenzi
@ 2004-03-29 15:39                 ` Davide Libenzi
  1 sibling, 0 replies; 47+ messages in thread
From: Davide Libenzi @ 2004-03-29 15:39 UTC (permalink / raw)
  To: Administrator
  Cc: Linus Torvalds, Andrew Morton, mingo, Linux Kernel Mailing List

On Sat, 3 Jan 2004, Rusty Russell wrote:

> In message <Pine.LNX.4.44.0401020856150.2278-100000@bigblue.dev.mdolabs.com> you write:
> > Rusty, you still have to use global static data when there is no need.
> 
> And you're still putting obscure crap in the task struct when there's
> no need.  Honestly, I'd be ashamed to post such a patch.

Ashamed !? Take a look at your original patch and then define shame. You 
had a communication mechanism that whilst being a private 1<->1 
communication among two tasks, relied on a single global message 
strucure, lock and mutex. Honestly I do not like myself to add stuff 
inside a strcture for one-time use. Not because of adding 12 bytes to the 
struct, that are laughable. But because it is used by a small piece of 
code w/out a re-use ability for other things.



> > I like this version better though ;)
> 
> I think I should seek a second opinion though.

But of course, even a third one ;)



- Davide





^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2004-01-03  3:43                 ` Davide Libenzi
  2004-01-03 11:47                   ` Rusty Russell
  2004-01-03 19:00                   ` Davide Libenzi
@ 2004-03-29 15:40                   ` Davide Libenzi
  2004-03-29 15:41                   ` Rusty Russell
  3 siblings, 0 replies; 47+ messages in thread
From: Davide Libenzi @ 2004-03-29 15:40 UTC (permalink / raw)
  To: Administrator
  Cc: Linus Torvalds, Andrew Morton, mingo, Linux Kernel Mailing List

On Fri, 2 Jan 2004, Davide Libenzi wrote:

> On Sat, 3 Jan 2004, Rusty Russell wrote:
> 
> > In message <Pine.LNX.4.44.0401020856150.2278-100000@bigblue.dev.mdolabs.com> you write:
> > > Rusty, you still have to use global static data when there is no need.
> > 
> > And you're still putting obscure crap in the task struct when there's
> > no need.  Honestly, I'd be ashamed to post such a patch.
> 
> Ashamed !? Take a look at your original patch and then define shame. You 
> had a communication mechanism that whilst being a private 1<->1 
> communication among two tasks, relied on a single global message 
> strucure, lock and mutex. Honestly I do not like myself to add stuff 
> inside a strcture for one-time use. Not because of adding 12 bytes to the 
> struct, that are laughable. But because it is used by a small piece of 
> code w/out a re-use ability for other things.

Rusty, I took a better look at the patch and I think we can have 
per-kthread stuff w/out littering the task_struct and by making the thing 
more robust. We keep a global list_head protected by a global spinlock. We 
define a structure that contain all the per-kthread stuff we need 
(including a task_struct* to the kthread itself). When a kthread starts it 
will add itself to the list, and when it will die it will remove itself 
from the list. The start/stop functions will lookup the list (or hash, 
depending on how much stuff you want to drop in) with the target 
task_struct*, and if the lookup fails, it means the task already quit 
(another task already did kthread_stop() ??, natural death ????). This is 
too bad, but at least there won't be deadlock (or crash) beacause of this. 
This because currently we keep the kthread task_struct* lingering around 
w/out a method that willl inform us if the task goes away for some reason 
(so that we can avoid signaling it and waiting for some interaction). The 
list/hash will be able to tell us this. What do you think?




- Davide




^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2004-01-03 19:00                   ` Davide Libenzi
  2004-01-03 23:53                     ` Davide Libenzi
  2004-01-04  2:34                     ` Rusty Russell
@ 2004-03-29 15:41                     ` Davide Libenzi
  2004-03-29 15:42                     ` Rusty Russell
  3 siblings, 0 replies; 47+ messages in thread
From: Davide Libenzi @ 2004-03-29 15:41 UTC (permalink / raw)
  To: Administrator
  Cc: Linus Torvalds, Andrew Morton, mingo, Linux Kernel Mailing List

On Sat, 3 Jan 2004, Davide Libenzi wrote:

> On Fri, 2 Jan 2004, Davide Libenzi wrote:
> 
> > On Sat, 3 Jan 2004, Rusty Russell wrote:
> > 
> > > In message <Pine.LNX.4.44.0401020856150.2278-100000@bigblue.dev.mdolabs.com> you write:
> > > > Rusty, you still have to use global static data when there is no need.
> > > 
> > > And you're still putting obscure crap in the task struct when there's
> > > no need.  Honestly, I'd be ashamed to post such a patch.
> > 
> > Ashamed !? Take a look at your original patch and then define shame. You 
> > had a communication mechanism that whilst being a private 1<->1 
> > communication among two tasks, relied on a single global message 
> > strucure, lock and mutex. Honestly I do not like myself to add stuff 
> > inside a strcture for one-time use. Not because of adding 12 bytes to the 
> > struct, that are laughable. But because it is used by a small piece of 
> > code w/out a re-use ability for other things.
> 
> Rusty, I took a better look at the patch and I think we can have 
> per-kthread stuff w/out littering the task_struct and by making the thing 
> more robust. We keep a global list_head protected by a global spinlock. We 
> define a structure that contain all the per-kthread stuff we need 
> (including a task_struct* to the kthread itself). When a kthread starts it 
> will add itself to the list, and when it will die it will remove itself 
> from the list. The start/stop functions will lookup the list (or hash, 
> depending on how much stuff you want to drop in) with the target 
> task_struct*, and if the lookup fails, it means the task already quit 
> (another task already did kthread_stop() ??, natural death ????). This is 
> too bad, but at least there won't be deadlock (or crash) beacause of this. 
> This because currently we keep the kthread task_struct* lingering around 
> w/out a method that willl inform us if the task goes away for some reason 
> (so that we can avoid signaling it and waiting for some interaction). The 
> list/hash will be able to tell us this. What do you think?

Rusty, I gave the patch a quick shot. There's no more single and 
global data structures but the list that stores all kthreads. Like usual, 
if you don't like it you can trash it. But pls leave shame apart. It did 
nothing to you :-)




- Davide




--- linux-2.5/include/linux/kthread.h._orig	2004-01-03 14:15:33.033286880 -0800
+++ linux-2.5/include/linux/kthread.h	2004-01-03 14:15:33.033286880 -0800
@@ -0,0 +1,81 @@
+#ifndef _LINUX_KTHREAD_H
+#define _LINUX_KTHREAD_H
+/* Simple interface for creating and stopping kernel threads without mess. */
+#include <linux/err.h>
+struct task_struct;
+
+/**
+ * kthread_create: create a kthread.
+ * @initfn: a function to run when kthread_start() is called.
+ * @corefn: a function to run after @initfn succeeds.
+ * @data: data ptr for @initfn and @corefn.
+ * @namefmt: printf-style name for the thread.
+ *
+ * Description: This helper function creates and names a kernel
+ * thread, and controls it for you.  See kthread_start() and
+ * kthread_run().
+ * 
+ * Returns a task_struct or ERR_PTR(-ENOMEM).
+ */
+struct task_struct *kthread_create(int (*initfn)(void *data),
+				   int (*corefn)(void *data),
+				   void *data,
+				   const char namefmt[], ...);
+
+/**
+ * kthread_start: start a thread created by kthread_create().
+ * @k: the task returned by kthread_create().
+ *
+ * Description: This makes the thread @k run the initfn() specified in
+ * kthread_create(), if non-NULL, and return the result.  If that
+ * result is less than zero, then the thread @k is terminated.
+ * Otherwise, the corefn() is run if it is not NULL.
+ *
+ * There are two ways to write corefn(): simple or complex.  For
+ * simple, you set current->state to TASK_INTERRUPTIBLE, check for
+ * work, then return 0 when it is done.  The kthread code will call
+ * schedule(), and call corefn() again when woken.  More complex
+ * corefn()s can loop themselves, but must return 0 when
+ * signal_pending(current) is true.
+ *
+ * corefn() can terminate itself in two ways: if someone will later
+ * call kthread_stop(), simply return a negative error number.
+ * Otherwise, if you call do_exit() directly, the thread will
+ * terminate immediately. */
+int kthread_start(struct task_struct *k);
+
+/**
+ * kthread_run: create and start a thread.
+ * @initfn: a function to run when kthread_start() is called.
+ * @corefn: a function to run after initfn succeeds.
+ * @data: data ptr for @initfn and @corefn.
+ * @namefmt: printf-style name for the thread.
+ *
+ * Description: Convenient wrapper for kthread_create() followed by
+ * kthread_start().  Returns the kthread, or ERR_PTR(). */
+#define kthread_run(initfn, corefn, data, namefmt, ...)			      \
+({									      \
+	struct task_struct *__k						      \
+		= kthread_create(initfn,corefn,data,namefmt, ## __VA_ARGS__); \
+	if (!IS_ERR(__k)) {						      \
+		int __err = kthread_start(__k);				      \
+		if (__err) __k = ERR_PTR(__err);			      \
+	}								      \
+	__k;								      \
+})
+
+/**
+ * kthread_stop: stop a thread created by kthread_create().
+ * @k: thread created by kthread_start.
+ *
+ * Sends a signal to @k, and waits for it to exit.  Your corefn() must
+ * not call do_exit() itself if you use this function!  This can also
+ * be called after kthread_create() instead of calling
+ * kthread_start(): the thread will exit without calling initfn() or
+ * corefn().
+ *
+ * Returns the last result of corefn(), or 0 if kthread_start() was
+ * never called. */
+int kthread_stop(struct task_struct *k);
+
+#endif /* _LINUX_KTHREAD_H */
--- linux-2.5/include/linux/workqueue.h._orig	2004-01-03 14:14:17.678742512 -0800
+++ linux-2.5/include/linux/workqueue.h	2004-01-03 14:15:33.042285512 -0800
@@ -60,6 +60,7 @@
 extern int FASTCALL(schedule_delayed_work(struct work_struct *work, unsigned long delay));
 extern void flush_scheduled_work(void);
 extern int current_is_keventd(void);
+extern int keventd_up(void);
 
 extern void init_workqueues(void);
 
--- linux-2.5/kernel/Makefile._orig	2004-01-03 14:14:18.205662408 -0800
+++ linux-2.5/kernel/Makefile	2004-01-03 14:15:33.092277912 -0800
@@ -6,7 +6,8 @@
 	    exit.o itimer.o time.o softirq.o resource.o \
 	    sysctl.o capability.o ptrace.o timer.o user.o \
 	    signal.o sys.o kmod.o workqueue.o pid.o \
-	    rcupdate.o intermodule.o extable.o params.o posix-timers.o
+	    rcupdate.o intermodule.o extable.o params.o posix-timers.o \
+	    kthread.o
 
 obj-$(CONFIG_FUTEX) += futex.o
 obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o
--- linux-2.5/kernel/kthread.c._orig	2004-01-03 14:15:33.100276696 -0800
+++ linux-2.5/kernel/kthread.c	2004-01-03 15:48:39.783971680 -0800
@@ -0,0 +1,255 @@
+/* Kernel thread helper functions.
+ *   Copyright (C) 2003 IBM Corporation, Rusty Russell.
+ *
+ * Everything is done via keventd, so that we get a clean environment
+ * even if we're invoked from userspace (think modprobe, hotplug cpu,
+ * etc.).  Also, it allows us to wait for dying kthreads without side
+ * effects involved in adopting kthreads to random processes.
+ */
+#define __KERNEL_SYSCALLS__
+#include <linux/sched.h>
+#include <linux/list.h>
+#include <linux/kthread.h>
+#include <linux/completion.h>
+#include <linux/err.h>
+#include <linux/unistd.h>
+#include <asm/semaphore.h>
+
+
+
+
+static LIST_HEAD(kt_list);
+static spinlock_t kt_lock = SPIN_LOCK_UNLOCKED;
+
+
+
+struct kthread_create_info
+{
+	int (*initfn)(void *data);
+	int (*corefn)(void *data);
+	void *data;
+	char *name;
+};
+
+struct kthread_create
+{
+	struct kthread_create_info info;
+	struct semaphore ack;
+	struct task_struct *tsk;
+};
+
+struct kthread_node
+{
+	struct list_head lnk;
+	struct task_struct *self;
+	struct semaphore sem;
+	struct semaphore ack;
+	int ret, done;
+};
+
+
+/*
+ * Try to lookup the kthread node associated with the task "k".
+ * If the task exist, acquire ktn->sem to avoid the task to exit.
+ */
+static struct kthread_node *kthread_lookup(struct task_struct *k) {
+	struct list_head *pos;
+	struct kthread_node *ktn = NULL;
+
+	spin_lock(&kt_lock);
+	list_for_each(pos, &kt_list) {
+		ktn = list_entry(pos, struct kthread_node, lnk);
+		if (ktn->self == k)
+			break;
+		ktn = NULL;
+	}
+	if (ktn && down_trylock(&ktn->sem))
+		ktn = NULL;
+	spin_unlock(&kt_lock);
+	return ktn;
+}
+
+
+/* Returns a POSITIVE error number. */
+static int kthread(void *data)
+{
+	/* Copy data: it's on keventd_init's stack */
+	struct kthread_create *ktc = data;
+	struct kthread_create_info k = ktc->info;
+	int ret = 0;
+	sigset_t blocked;
+	struct kthread_node ktn;
+
+	/* Initialize the kthread list node and drop itself inside the list */
+	ktn.done = 0;
+	ktn.ret = -1;
+	ktn.self = current;
+	init_MUTEX(&ktn.sem);
+	init_MUTEX_LOCKED(&ktn.ack);
+	spin_lock(&kt_lock);
+	list_add(&ktn.lnk, &kt_list);
+	spin_unlock(&kt_lock);
+
+	strcpy(current->comm, k.name);
+
+	/* Block and flush all signals. */
+	sigfillset(&blocked);
+	sigprocmask(SIG_BLOCK, &blocked, NULL);
+	flush_signals(current);
+
+	/*
+	 * OK, tell user we're spawned, wait for kthread_start/stop.
+	 * Note that after this call the "ktc" pointer will be no longer
+	 * valid.
+	 */
+	up(&ktc->ack);
+	current->state = TASK_INTERRUPTIBLE;
+	schedule();
+
+	ret = EINTR;
+	if (signal_pending(current))
+		goto out;
+
+	/* Run initfn, tell the caller the result. */
+	ret = 0;
+	if (k.initfn)
+		ret = k.initfn(k.data);
+	ktn.ret = ret;
+
+	/*
+	 * The caller is waiting for us to signal the completion
+	 * of the init function.
+	 */
+	up(&ktn.ack);
+
+	if (ret < 0)
+		goto out;
+
+	while (!signal_pending(current)) {
+		/* If it fails, just wait until kthread_stop. */
+		if (k.corefn && (ret = k.corefn(k.data)) < 0) {
+			k.corefn = NULL;
+			current->state = TASK_INTERRUPTIBLE;
+		}
+		schedule();
+	}
+
+	current->state = TASK_RUNNING;
+
+	ret = 0;
+out:
+	spin_lock(&kt_lock);
+	list_del(&ktn.lnk);
+	spin_unlock(&kt_lock);
+
+	ktn.done = 1;
+	ktn.ret = ret;
+
+	/*
+	 * The control code will hold the semaphore down to avoid
+	 * the task (and the associated node data) to vanish during
+	 * the handshake
+	 */
+	down(&ktn.sem);
+
+	return ret;
+}
+
+static void create_kthread(void *data)
+{
+	int pid;
+	struct kthread_create *ktc = data;
+
+	/* We want our own signal handler (we take no signals by default). */
+	pid = kernel_thread(kthread, ktc, CLONE_FS|CLONE_FILES|SIGCHLD);
+	if (pid < 0)
+		ktc->tsk = ERR_PTR(pid);
+	else {
+		down(&ktc->ack);
+		ktc->tsk = find_task_by_pid(pid);
+	}
+}
+
+struct task_struct *kthread_create(int (*initfn)(void *data),
+				   int (*corefn)(void *data),
+				   void *data,
+				   const char namefmt[],
+				   ...)
+{
+	va_list args;
+	struct kthread_create ktc;
+	/* Or, as we like to say, 16. */
+	char name[sizeof(((struct task_struct *)0)->comm)];
+	DECLARE_WORK(work, create_kthread, &ktc);
+
+	va_start(args, namefmt);
+	vsnprintf(name, sizeof(name), namefmt, args);
+	va_end(args);
+
+	ktc.info.initfn = initfn;
+	ktc.info.corefn = corefn;
+	ktc.info.data = data;
+	ktc.info.name = name;
+	init_MUTEX_LOCKED(&ktc.ack);
+
+	/* If we're being called to start the first workqueue, we
+	 * can't use keventd. */
+	if (!keventd_up())
+		work.func(work.data);
+	else {
+		schedule_work(&work);
+		down(&ktc.ack);
+	}
+	return ktc.tsk;
+}
+
+int kthread_start(struct task_struct *k)
+{
+	int ret;
+	struct kthread_node *ktn = kthread_lookup(k);
+
+	/* No corresponding kthread is found. Too bad ... */
+	if (!ktn)
+		return -ENOENT;
+
+	/*
+	 * Wake up the kthread task waiting for us to unlock so that
+	 * it can go run the init function.
+	 */
+	wake_up_process(ktn->self);
+
+	/* Wait for kthread ack after init function complete. */
+	down(&ktn->ack);
+
+	/* Get the init result. */
+	ret = ktn->ret;
+
+	/* now we can release the kthread node. */
+	up(&ktn->sem);
+
+	return ret;
+}
+
+int kthread_stop(struct task_struct *k)
+{
+	int ret;
+	struct kthread_node *ktn = kthread_lookup(k);
+
+	/* No corresponding kthread is found. Too bad ... */
+	if (!ktn)
+		return -ENOENT;
+
+	force_sig(SIGTERM, ktn->self);
+
+	while (!ktn->done)
+		yield();
+
+	/* Get the complete result. */
+	ret = ktn->ret;
+
+	/* now we can release the kthread node. */
+	up(&ktn->sem);
+
+	return ret;
+}
+
--- linux-2.5/kernel/workqueue.c._orig	2004-01-03 14:14:18.259654200 -0800
+++ linux-2.5/kernel/workqueue.c	2004-01-03 14:15:33.109275328 -0800
@@ -359,6 +359,11 @@
 	flush_workqueue(keventd_wq);
 }
 
+int keventd_up(void)
+{
+	return keventd_wq != NULL;
+}
+
 int current_is_keventd(void)
 {
 	struct cpu_workqueue_struct *cwq;


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2004-01-03  3:43                 ` Davide Libenzi
                                     ` (2 preceding siblings ...)
  2004-03-29 15:40                   ` Davide Libenzi
@ 2004-03-29 15:41                   ` Rusty Russell
  3 siblings, 0 replies; 47+ messages in thread
From: Rusty Russell @ 2004-03-29 15:41 UTC (permalink / raw)
  To: Administrator
  Cc: Linus Torvalds, Andrew Morton, mingo, Linux Kernel Mailing List

In message <Pine.LNX.4.44.0401021919240.825-100000@bigblue.dev.mdolabs.com> you write:
> On Sat, 3 Jan 2004, Rusty Russell wrote:
> 
> > In message <Pine.LNX.4.44.0401020856150.2278-100000@bigblue.dev.mdolabs.com
> you write:
> > > Rusty, you still have to use global static data when there is no need.
> > 
> > And you're still putting obscure crap in the task struct when there's
> > no need.  Honestly, I'd be ashamed to post such a patch.
> 
> Ashamed !? Take a look at your original patch and then define shame. You 
> had a communication mechanism that whilst being a private 1<->1 
> communication among two tasks, relied on a single global message 
> strucure, lock and mutex.

Still do.  It's *simple*, and I refuse to be ashamed of that.

My words were harsh, but I completely disagree with you.  I believe
you are wrong.  I would never have coded it the way you did.  I read
your code and I still think you are wrong, and find your code both
bloated and ugly.

It's not about space, it's about taste.  And placing random stuff in
an unrelated structure because you can't think of a better way to do
it is TASTELESS.  If it were the only way, it might be forgivable, but
it's not, and I far prefer a little localized messiness to global
messiness.

Now, on something we do agree: I dislike the global structure myself.
By all means try changing the code to use a pipe between child and
parent for the initfn result.  But I've told you that I will not
submit any solution which adds to a generic structure for a specific
problem.

I'm very, very sorry this has gotten a little heated: I generally
enjoy our discussions.  But I don't think I should have to say "no"
four times.

Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2004-01-04  2:34                     ` Rusty Russell
  2004-01-04  4:42                       ` Davide Libenzi
@ 2004-03-29 15:42                       ` Davide Libenzi
  1 sibling, 0 replies; 47+ messages in thread
From: Davide Libenzi @ 2004-03-29 15:42 UTC (permalink / raw)
  To: Administrator
  Cc: Linus Torvalds, Andrew Morton, mingo, Linux Kernel Mailing List

On Sun, 4 Jan 2004, Rusty Russell wrote:

> In message <Pine.LNX.4.44.0401031021280.1678-100000@bigblue.dev.mdolabs.com> you write:
> > Rusty, I took a better look at the patch and I think we can have 
> > per-kthread stuff w/out littering the task_struct and by making the thing 
> > more robust.
> 
> Except sharing data with a lock is perfectly robust.
> 
> > We keep a global list_head protected by a global spinlock. We 
> > define a structure that contain all the per-kthread stuff we need 
> > (including a task_struct* to the kthread itself). When a kthread starts it 
> > will add itself to the list, and when it will die it will remove itself 
> > from the list.
> 
> Yeah, I deliberately didn't implement this, because (1) it seems like
> a lot of complexity when using a lock and letting them share a single
> structure works fine and is even simpler, and (2) the thread can't
> just "do_exit()".
> 
> You can get around (2) by having a permenant parent "kthread" thread
> which is a parent to all the kthreads (it'll get a SIGCHLD when
> someone does "do_exit()").  But the implementation was pretty ugly,
> since it involved having a communications mechanism with the kthread
> parent, which means you have the global ktm_message-like-thing for
> this...

You will lose in any case. What happens if the thread does do_exit() and 
you do kthread_stop() after that?
With the patch I posted to you, the kthread_stop() will simply miss the 
lookup and return -ENOENT.



- Davide



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2004-01-04  4:42                       ` Davide Libenzi
  2004-01-04  4:55                         ` Davide Libenzi
  2004-01-04  9:35                         ` Rusty Russell
@ 2004-03-29 15:42                         ` Davide Libenzi
  2 siblings, 0 replies; 47+ messages in thread
From: Davide Libenzi @ 2004-03-29 15:42 UTC (permalink / raw)
  To: Administrator
  Cc: Linus Torvalds, Andrew Morton, mingo, Linux Kernel Mailing List

On Sat, 3 Jan 2004, Davide Libenzi wrote:

> On Sun, 4 Jan 2004, Rusty Russell wrote:
> 
> > In message <Pine.LNX.4.44.0401031021280.1678-100000@bigblue.dev.mdolabs.com> you write:
> > > Rusty, I took a better look at the patch and I think we can have 
> > > per-kthread stuff w/out littering the task_struct and by making the thing 
> > > more robust.
> > 
> > Except sharing data with a lock is perfectly robust.
> > 
> > > We keep a global list_head protected by a global spinlock. We 
> > > define a structure that contain all the per-kthread stuff we need 
> > > (including a task_struct* to the kthread itself). When a kthread starts it 
> > > will add itself to the list, and when it will die it will remove itself 
> > > from the list.
> > 
> > Yeah, I deliberately didn't implement this, because (1) it seems like
> > a lot of complexity when using a lock and letting them share a single
> > structure works fine and is even simpler, and (2) the thread can't
> > just "do_exit()".
> > 
> > You can get around (2) by having a permenant parent "kthread" thread
> > which is a parent to all the kthreads (it'll get a SIGCHLD when
> > someone does "do_exit()").  But the implementation was pretty ugly,
> > since it involved having a communications mechanism with the kthread
> > parent, which means you have the global ktm_message-like-thing for
> > this...
> 
> You will lose in any case. What happens if the thread does do_exit() and 
> you do kthread_stop() after that?
> With the patch I posted to you, the kthread_stop() will simply miss the 
> lookup and return -ENOENT.

Nope, we are screwed in any case. Is it really important to give the 
ability to do do_exit() for kthreads? I mean, why a simple return would 
not work?



- Davide



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2004-01-03 11:47                   ` Rusty Russell
  2004-01-04  4:23                     ` Davide Libenzi
@ 2004-03-29 15:42                     ` Davide Libenzi
  1 sibling, 0 replies; 47+ messages in thread
From: Davide Libenzi @ 2004-03-29 15:42 UTC (permalink / raw)
  To: Administrator
  Cc: Linus Torvalds, Andrew Morton, mingo, Linux Kernel Mailing List

On Sat, 3 Jan 2004, Rusty Russell wrote:

> Still do.  It's *simple*, and I refuse to be ashamed of that.
> 
> My words were harsh, but I completely disagree with you.  I believe
> you are wrong.  I would never have coded it the way you did.  I read
> your code and I still think you are wrong, and find your code both
> bloated and ugly.

Bloated ? This is the diffstat of my "ashamed" patch over your bits :-)

include/linux/init_task.h |    3 +
include/linux/sched.h     |    8 ++++
kernel/kthread.c          |   78 ++++++++++++++++------------------------------
3 files changed, 39 insertions(+), 50 deletions(-)



> Now, on something we do agree: I dislike the global structure myself.
> By all means try changing the code to use a pipe between child and
> parent for the initfn result.  But I've told you that I will not
> submit any solution which adds to a generic structure for a specific
> problem.
> 
> I'm very, very sorry this has gotten a little heated: I generally
> enjoy our discussions.  But I don't think I should have to say "no"
> four times.

It's ok Rusty, I enjoy the discussion in any case :-) Since I told you in 
a private email that I was convinced myself about adding stuff inside the 
struct, you could have avoided the "ashamed" thing. But it's fine, a 
little bit of sarcasm is the salt of life.




- Davide



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
  2004-01-03 19:00                   ` Davide Libenzi
                                       ` (2 preceding siblings ...)
  2004-03-29 15:41                     ` Davide Libenzi
@ 2004-03-29 15:42                     ` Rusty Russell
  3 siblings, 0 replies; 47+ messages in thread
From: Rusty Russell @ 2004-03-29 15:42 UTC (permalink / raw)
  To: Administrator
  Cc: Linus Torvalds, Andrew Morton, mingo, Linux Kernel Mailing List

In message <Pine.LNX.4.44.0401031021280.1678-100000@bigblue.dev.mdolabs.com> you write:
> Rusty, I took a better look at the patch and I think we can have 
> per-kthread stuff w/out littering the task_struct and by making the thing 
> more robust.

Except sharing data with a lock is perfectly robust.

> We keep a global list_head protected by a global spinlock. We 
> define a structure that contain all the per-kthread stuff we need 
> (including a task_struct* to the kthread itself). When a kthread starts it 
> will add itself to the list, and when it will die it will remove itself 
> from the list.

Yeah, I deliberately didn't implement this, because (1) it seems like
a lot of complexity when using a lock and letting them share a single
structure works fine and is even simpler, and (2) the thread can't
just "do_exit()".

You can get around (2) by having a permenant parent "kthread" thread
which is a parent to all the kthreads (it'll get a SIGCHLD when
someone does "do_exit()").  But the implementation was pretty ugly,
since it involved having a communications mechanism with the kthread
parent, which means you have the global ktm_message-like-thing for
this...

> The start/stop functions will lookup the list (or hash, 
> depending on how much stuff you want to drop in) with the target 
> task_struct*, and if the lookup fails, it means the task already quit 
> (another task already did kthread_stop() ??, natural death ????). This is 
> too bad, but at least there won't be deadlock (or crash) beacause of this. 

> This because currently we keep the kthread task_struct* lingering around 
> w/out a method that willl inform us if the task goes away for some reason 
> (so that we can avoid signaling it and waiting for some interaction). The 
> list/hash will be able to tell us this. What do you think?

I put the "corefn fails" option in there in case someone wants it, but
I'm not sure how it would be used in practice.  If it becomes
commonplace to have them handing around for a long time, then storing
the result somewhere rather than keeping the whole thread around makes
sense, but at the moment it seems like premature optimization.

Unfortunately, my fd idea is broken, because the process which does
the start may not be the same as the one which does the create.  A
shared queue really is the simplest solution.

Latest version below,
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

Name: Use Signals For Kthread
Author: Rusty Russell
Status: Tested on 2.6.0-bk3

D: The hotplug CPU code introduces two major problems:
D: 
D: 1) Threads which previously never stopped (migration thread,
D:    ksoftirqd, keventd) have to be stopped cleanly as CPUs go offline.
D: 2) Threads which previously never had to be created now have
D:    to be created when a CPU goes online.
D: 
D: Unfortunately, stopping a thread is fairly baroque, involving memory
D: barriers, a completion and spinning until the task is actually dead.
D: 
D: There are also three problems in starting a thread:
D: 1) Doing it from a random process context risks environment contamination:
D:    better to do it from keventd to guarantee a clean environment, a-la
D:    call_usermodehelper.
D: 2) Getting the task struct without races is a hard: see kernel/sched.c
D:    migration_call(), kernel/workqueue.c create_workqueue_thread().
D: 3) There are races in starting a thread for a CPU which is not yet
D:    online: migration thread does a complex dance at the moment for
D:    a similar reason (there may be no migration thread to migrate us).
D: 
D: Place all this logic in some primitives to make life easier:
D: kthread_create(), kthread_start() and kthread_destroy().  These
D: primitives require no extra data-structures in the caller: they operate
D: on normal "struct task_struct"s.
D: 
D: Other changes:
D:   - Expose keventd_up(), as keventd and migration threads will use
D:     kthread to launch, and kthread normally uses workqueues and must
D:     recognize this case.
D: 
D: Changes since first version:
D: 1) Sleep UNINTERRUPTIBLE when waiting for reply (thanks Andrew Morton).
D: 2) Reparent threads and waitpid for them.
D: 3) Do ALL ops via keventd, so we don't disturb current process.
D: 4) This version uses signals rather than its own communication
D:     mechanism makes kthread code a little simpler, and means users can
D:     loop inside corefn if they want, and return when signal_pending().
D: 5) kthread_destroy() renamed to kthread_stop().
D: 6) kthread_start() returns an int.
D: 7) Better documentation, in nanodoc form.

diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .8740-linux-2.6.1-rc1/include/linux/kthread.h .8740-linux-2.6.1-rc1.updated/include/linux/kthread.h
--- .8740-linux-2.6.1-rc1/include/linux/kthread.h	1970-01-01 10:00:00.000000000 +1000
+++ .8740-linux-2.6.1-rc1.updated/include/linux/kthread.h	2004-01-02 21:18:54.000000000 +1100
@@ -0,0 +1,81 @@
+#ifndef _LINUX_KTHREAD_H
+#define _LINUX_KTHREAD_H
+/* Simple interface for creating and stopping kernel threads without mess. */
+#include <linux/err.h>
+struct task_struct;
+
+/**
+ * kthread_create: create a kthread.
+ * @initfn: a function to run when kthread_start() is called.
+ * @corefn: a function to run after @initfn succeeds.
+ * @data: data ptr for @initfn and @corefn.
+ * @namefmt: printf-style name for the thread.
+ *
+ * Description: This helper function creates and names a kernel
+ * thread, and controls it for you.  See kthread_start() and
+ * kthread_run().
+ * 
+ * Returns a task_struct or ERR_PTR(-ENOMEM).
+ */
+struct task_struct *kthread_create(int (*initfn)(void *data),
+				   int (*corefn)(void *data),
+				   void *data,
+				   const char namefmt[], ...);
+
+/**
+ * kthread_start: start a thread created by kthread_create().
+ * @k: the task returned by kthread_create().
+ *
+ * Description: This makes the thread @k run the initfn() specified in
+ * kthread_create(), if non-NULL, and return the result.  If that
+ * result is less than zero, then the thread @k is terminated.
+ * Otherwise, the corefn() is run if it is not NULL.
+ *
+ * There are two ways to write corefn(): simple or complex.  For
+ * simple, you set current->state to TASK_INTERRUPTIBLE, check for
+ * work, then return 0 when it is done.  The kthread code will call
+ * schedule(), and call corefn() again when woken.  More complex
+ * corefn()s can loop themselves, but must return 0 when
+ * signal_pending(current) is true.
+ *
+ * corefn() can terminate itself in two ways: if someone will later
+ * call kthread_stop(), simply return a negative error number.
+ * Otherwise, if you call do_exit() directly, the thread will
+ * terminate immediately. */
+int kthread_start(struct task_struct *k);
+
+/**
+ * kthread_run: create and start a thread.
+ * @initfn: a function to run when kthread_start() is called.
+ * @corefn: a function to run after initfn succeeds.
+ * @data: data ptr for @initfn and @corefn.
+ * @namefmt: printf-style name for the thread.
+ *
+ * Description: Convenient wrapper for kthread_create() followed by
+ * kthread_start().  Returns the kthread, or ERR_PTR(). */
+#define kthread_run(initfn, corefn, data, namefmt, ...)			      \
+({									      \
+	struct task_struct *__k						      \
+		= kthread_create(initfn,corefn,data,namefmt, ## __VA_ARGS__); \
+	if (!IS_ERR(__k)) {						      \
+		int __err = kthread_start(__k);				      \
+		if (__err) __k = ERR_PTR(__err);			      \
+	}								      \
+	__k;								      \
+})
+
+/**
+ * kthread_stop: stop a thread created by kthread_create().
+ * @k: thread created by kthread_start.
+ *
+ * Sends a signal to @k, and waits for it to exit.  Your corefn() must
+ * not call do_exit() itself if you use this function!  This can also
+ * be called after kthread_create() instead of calling
+ * kthread_start(): the thread will exit without calling initfn() or
+ * corefn().
+ *
+ * Returns the last result of corefn(), or 0 if kthread_start() was
+ * never called. */
+int kthread_stop(struct task_struct *k);
+
+#endif /* _LINUX_KTHREAD_H */
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .8740-linux-2.6.1-rc1/include/linux/workqueue.h .8740-linux-2.6.1-rc1.updated/include/linux/workqueue.h
--- .8740-linux-2.6.1-rc1/include/linux/workqueue.h	2003-09-22 10:07:08.000000000 +1000
+++ .8740-linux-2.6.1-rc1.updated/include/linux/workqueue.h	2004-01-02 20:45:13.000000000 +1100
@@ -60,6 +60,7 @@ extern int FASTCALL(schedule_work(struct
 extern int FASTCALL(schedule_delayed_work(struct work_struct *work, unsigned long delay));
 extern void flush_scheduled_work(void);
 extern int current_is_keventd(void);
+extern int keventd_up(void);
 
 extern void init_workqueues(void);
 
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .8740-linux-2.6.1-rc1/kernel/Makefile .8740-linux-2.6.1-rc1.updated/kernel/Makefile
--- .8740-linux-2.6.1-rc1/kernel/Makefile	2003-10-09 18:03:02.000000000 +1000
+++ .8740-linux-2.6.1-rc1.updated/kernel/Makefile	2004-01-02 20:45:13.000000000 +1100
@@ -6,7 +6,8 @@ obj-y     = sched.o fork.o exec_domain.o
 	    exit.o itimer.o time.o softirq.o resource.o \
 	    sysctl.o capability.o ptrace.o timer.o user.o \
 	    signal.o sys.o kmod.o workqueue.o pid.o \
-	    rcupdate.o intermodule.o extable.o params.o posix-timers.o
+	    rcupdate.o intermodule.o extable.o params.o posix-timers.o \
+	    kthread.o
 
 obj-$(CONFIG_FUTEX) += futex.o
 obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .8740-linux-2.6.1-rc1/kernel/kthread.c .8740-linux-2.6.1-rc1.updated/kernel/kthread.c
--- .8740-linux-2.6.1-rc1/kernel/kthread.c	1970-01-01 10:00:00.000000000 +1000
+++ .8740-linux-2.6.1-rc1.updated/kernel/kthread.c	2004-01-02 21:07:50.000000000 +1100
@@ -0,0 +1,227 @@
+/* Kernel thread helper functions.
+ *   Copyright (C) 2003 IBM Corporation, Rusty Russell.
+ *
+ * Everything is done via keventd, so that we get a clean environment
+ * even if we're invoked from userspace (think modprobe, hotplug cpu,
+ * etc.).  Also, it allows us to wait for dying kthreads without side
+ * effects involved in adopting kthreads to random processes.
+ */
+#define __KERNEL_SYSCALLS__
+#include <linux/sched.h>
+#include <linux/kthread.h>
+#include <linux/completion.h>
+#include <linux/err.h>
+#include <linux/unistd.h>
+#include <asm/semaphore.h>
+
+/* This makes sure only one kthread is being talked to at once:
+ * protects global vars below. */
+static DECLARE_MUTEX(kthread_control);
+
+static int kthread_init_failed;
+static DECLARE_COMPLETION(kthread_ack);
+
+struct kthread_create_info
+{
+	int (*initfn)(void *data);
+	int (*corefn)(void *data);
+	void *data;
+	char *name;
+};
+
+/* Returns a POSITIVE error number. */
+static int kthread(void *data)
+{
+	/* Copy data: it's on keventd_init's stack */
+	struct kthread_create_info k = *(struct kthread_create_info *)data;
+	int ret = 0;
+	sigset_t blocked;
+
+	strcpy(current->comm, k.name);
+
+	/* Block and flush all signals. */
+	sigfillset(&blocked);
+	sigprocmask(SIG_BLOCK, &blocked, NULL);
+	flush_signals(current);
+
+	/* OK, tell user we're spawned, wait for kthread_start/stop */
+	current->state = TASK_INTERRUPTIBLE;
+	complete(&kthread_ack);
+	schedule();
+
+	if (signal_pending(current))
+		return EINTR;
+
+	/* Run initfn, tell the caller the result. */
+	if (k.initfn)
+		ret = k.initfn(k.data);
+	kthread_init_failed = (ret < 0);
+	complete(&kthread_ack);
+	if (ret < 0)
+		return -ret;
+
+	while (!signal_pending(current)) {
+		/* If it fails, just wait until kthread_stop. */
+		if (k.corefn && (ret = k.corefn(k.data)) < 0) {
+			k.corefn = NULL;
+			current->state = TASK_INTERRUPTIBLE;
+		}
+		schedule();
+	}
+
+	current->state = TASK_RUNNING;
+	return -ret;
+}
+
+enum kthread_op_type
+{
+	KTHREAD_CREATE,
+	KTHREAD_START,
+	KTHREAD_STOP,
+};
+
+struct kthread_op
+{
+	enum kthread_op_type type;
+
+	union {
+		/* KTHREAD_CREATE in */
+		struct kthread_start_info *start;
+		/* KTHREAD_START, KTHREAD_STOP in */
+		struct task_struct *target;
+		/* out */
+		struct task_struct *result;
+	} u;
+	struct completion done;
+};
+
+/* We are keventd: create a thread. */
+static void *create_kthread(struct kthread_op *op)
+{
+	int pid;
+
+	init_completion(&kthread_ack);
+	/* We want our own signal handler (we take no signals by default). */
+	pid = kernel_thread(kthread, op->u.start,CLONE_FS|CLONE_FILES|SIGCHLD);
+	if (pid < 0)
+		return ERR_PTR(pid);
+
+	wait_for_completion(&kthread_ack);
+	return find_task_by_pid(pid);
+}
+
+static void adopt_kthread(struct task_struct *k)
+{
+	write_lock_irq(&tasklist_lock);
+	REMOVE_LINKS(k);
+	k->parent = current;
+	k->real_parent = current;
+	SET_LINKS(k);
+	write_unlock_irq(&tasklist_lock);
+}
+
+/* We are keventd: start the thread. */
+static void *start_kthread(struct kthread_op *op)
+{
+	int status = 0;
+
+	init_completion(&kthread_ack);
+
+	/* Adopt it in case it's going to die, then tell it to start... */
+	adopt_kthread(op->u.target);
+	get_task_struct(op->u.target);
+	wake_up_process(op->u.target);
+	wait_for_completion(&kthread_ack);
+
+	if (kthread_init_failed)
+		waitpid(op->u.target->tgid, &status, __WALL);
+	put_task_struct(op->u.target);
+
+	return ERR_PTR((status >> 8) & 0xFF);
+}
+
+/* We are keventd: stop the thread. */
+static void *stop_kthread(struct kthread_op *op)
+{
+	int status;
+
+	adopt_kthread(op->u.target);
+
+	get_task_struct(op->u.target);
+	force_sig(SIGTERM, op->u.target);
+	waitpid(op->u.target->tgid, &status, __WALL);
+	put_task_struct(op->u.target);
+
+	return ERR_PTR((status >> 8) & 0xFF);
+}
+
+/* We are keventd: do what they told us */
+static void keventd_do_kthread_op(void *_op)
+{
+	struct kthread_op *op = _op;
+
+	down(&kthread_control);
+
+	switch (op->type) {
+	case KTHREAD_CREATE: op->u.result = create_kthread(op); break;
+	case KTHREAD_START: op->u.result = start_kthread(op); break;
+	case KTHREAD_STOP: op->u.result = stop_kthread(op); break;
+	default:
+		BUG();
+	}
+
+	up(&kthread_control);
+	complete(&op->done);
+}
+
+static struct task_struct *do_kthread_op(enum kthread_op_type type, void *info)
+{
+	struct kthread_op op;
+	DECLARE_WORK(work, keventd_do_kthread_op, &op);
+
+	op.type = type;
+	op.u.target = info;
+	init_completion(&op.done);
+	
+	/* If we're being called to start the first workqueue, we
+	 * can't use keventd. */
+	if (!keventd_up())
+		work.func(work.data);
+	else {
+		schedule_work(&work);
+		wait_for_completion(&op.done);
+	}
+	return op.u.result;
+}
+
+struct task_struct *kthread_create(int (*initfn)(void *data),
+				   int (*corefn)(void *data),
+				   void *data,
+				   const char namefmt[],
+				   ...)
+{
+	va_list args;
+	struct kthread_create_info start;
+	/* Or, as we like to say, 16. */
+	char name[sizeof(((struct task_struct *)0)->comm)];
+
+	va_start(args, namefmt);
+	vsnprintf(name, sizeof(name), namefmt, args);
+	va_end(args);
+
+	start.initfn = initfn;
+	start.corefn = corefn;
+	start.data = data;
+	start.name = name;
+	return do_kthread_op(KTHREAD_CREATE, &start);
+}
+
+int kthread_start(struct task_struct *k)
+{
+	return PTR_ERR(do_kthread_op(KTHREAD_START, k));
+}
+
+int kthread_stop(struct task_struct *k)
+{
+	return PTR_ERR(do_kthread_op(KTHREAD_STOP, k));
+}
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .8740-linux-2.6.1-rc1/kernel/workqueue.c .8740-linux-2.6.1-rc1.updated/kernel/workqueue.c
--- .8740-linux-2.6.1-rc1/kernel/workqueue.c	2003-09-22 10:27:38.000000000 +1000
+++ .8740-linux-2.6.1-rc1.updated/kernel/workqueue.c	2004-01-02 20:45:13.000000000 +1100
@@ -359,6 +359,11 @@ void flush_scheduled_work(void)
 	flush_workqueue(keventd_wq);
 }
 
+int keventd_up(void)
+{
+	return keventd_wq != NULL;
+}
+
 int current_is_keventd(void)
 {
 	struct cpu_workqueue_struct *cwq;

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/2] kthread_create
@ 2003-12-31 18:02 Albert Cahalan
  0 siblings, 0 replies; 47+ messages in thread
From: Albert Cahalan @ 2003-12-31 18:02 UTC (permalink / raw)
  To: linux-kernel mailing list; +Cc: rusty, jgarzik, benh, akpm

> +struct task_struct *kthread_create(int (*initfn)(void *data),
> +       int (*corefn)(void *data),
> +       void *data,
> +       const char namefmt[],
> +       ...)
> +{
> + va_list args;
> + struct kthread_create kc;
> + DECLARE_WORK(work, spawn_kthread, &kc);
> + /* Or, as we like to say, 16. */
> + char name[sizeof(((struct task_struct *)0)->comm)];
> +
> + va_start(args, namefmt);
> + vsnprintf(name, sizeof(name), namefmt, args);
> + va_end(args);
> +
> + init_completion(&kc.done);
> + kc.k.initfn = initfn;
> + kc.k.corefn = corefn;
> + kc.k.data = data;
> + kc.k.name = name;

Since processor ID info is available, there's no
need for per-CPU naming. (I do realize it may be
per-disk or per-filesystem though)

With the huge NUMA boxes, a simple "ps -ef" is
going to look insane. How about task groups for
them? Then the various threads wind up being
seen as just that, threads.

BTW, some (p->pid==0) tests may need to become
(p->tgid==0) tests instead.



^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2004-03-29 15:42 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-12-31  3:31 [PATCH 1/2] kthread_create Rusty Russell
2003-12-31  4:33 ` Jeff Garzik
2003-12-31  5:28   ` Rusty Russell
2003-12-31  6:34     ` Jeff Garzik
2003-12-31  8:47       ` Benjamin Herrenschmidt
2004-01-01 23:51       ` Rusty Russell
2003-12-31  4:49 ` Andrew Morton
2003-12-31  5:18   ` Rusty Russell
2003-12-31  5:06 ` Davide Libenzi
2003-12-31  5:34   ` Rusty Russell
2003-12-31  5:56     ` Davide Libenzi
2003-12-31  6:27       ` Rusty Russell
2004-01-01  3:45         ` Davide Libenzi
2004-01-02  7:09           ` Rusty Russell
2004-01-02 16:58             ` Davide Libenzi
2004-01-03  3:05               ` Rusty Russell
2004-01-03  3:43                 ` Davide Libenzi
2004-01-03 11:47                   ` Rusty Russell
2004-01-04  4:23                     ` Davide Libenzi
2004-03-29 15:42                     ` Davide Libenzi
2004-01-03 19:00                   ` Davide Libenzi
2004-01-03 23:53                     ` Davide Libenzi
2004-01-04  2:34                     ` Rusty Russell
2004-01-04  4:42                       ` Davide Libenzi
2004-01-04  4:55                         ` Davide Libenzi
2004-01-04  9:35                         ` Rusty Russell
2004-01-04 23:03                           ` Davide Libenzi
2004-01-05  4:09                             ` Rusty Russell
2004-01-05  5:06                               ` Davide Libenzi
2004-01-05  6:38                                 ` Rusty Russell
2004-01-05  6:52                                   ` Davide Libenzi
2004-01-07  7:00                                     ` Rusty Russell
2004-01-07  7:25                                       ` Davide Libenzi
2004-01-08  0:33                                         ` Rusty Russell
2004-03-29 15:42                         ` Davide Libenzi
2004-03-29 15:42                       ` Davide Libenzi
2004-03-29 15:41                     ` Davide Libenzi
2004-03-29 15:42                     ` Rusty Russell
2004-03-29 15:40                   ` Davide Libenzi
2004-03-29 15:41                   ` Rusty Russell
2004-03-29 15:39                 ` Davide Libenzi
2004-03-29 15:39               ` Rusty Russell
2004-03-29 15:38             ` Davide Libenzi
2003-12-31  6:31       ` Srivatsa Vaddagiri
2003-12-31  7:12         ` Davide Libenzi
2003-12-31 23:25           ` Rusty Russell
2003-12-31 18:02 Albert Cahalan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).