* [patch 00/12] Syslets, Threadlets, generic AIO support, v5
@ 2007-02-28 21:39 Ingo Molnar
From: Ingo Molnar @ 2007-02-28 21:39 UTC
  To: linux-kernel
  Cc: Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner


this is the v5 release of the syslet/threadlet subsystem:

   http://redhat.com/~mingo/syslet-patches/

this release took 4 days to get out, but there were a couple of key 
changes that needed some time to settle down:

 - ported the code from v2.6.20 to current -git (v2.6.21-rc2 should be
   fine as a base)

 - 64-bit support in the form of an x86_64 port. Jens has updated the
   FIO syslet code to work on 64-bit too. (kernel/async.c was pretty
   64-bit clean already; it needed only minimal changes for basic
   x86_64 support.)

 - compat support for 32-bit user-space on 64-bit kernels: 32-bit
   syslet and threadlet binaries work fine on 64-bit kernels.

 - various cleanups and simplifications

the v4->v5 delta is:

 17 files changed, 327 insertions(+), 271 deletions(-)

amongst the plans for v6 are cleanups/simplifications to the syslet
engine API; a number of suggestions have already been made for that.

the linecount increase in v5 is mostly due to the x86_64 port. The ABI 
had to change again - see the async-test userspace code for details.

the x86_64 patch is a bit monolithic at the moment; i'll split it up
further in v6.

As always, comments, suggestions, reports are welcome!

	Ingo
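
to illustrate the compat point: struct syslet_uatom (patch 02) packs all
pointers into u64 fields, so a 32-bit binary lays out its atoms exactly
the way a 64-bit binary does and the kernel can parse both. A minimal
user-space sketch of filling in an atom under this ABI (fill_uatom() is
an illustrative helper, not part of the patchset; u32/u64 are the types
exported by the header):

static void
fill_uatom(struct syslet_uatom *atom, u32 nr,
	   unsigned long *arg0, unsigned long *arg1, unsigned long *arg2,
	   long *ret_ptr, u32 flags, struct syslet_uatom *next)
{
	memset(atom, 0, sizeof(*atom));
	atom->nr = nr;
	atom->flags = flags;
	/* pointers go into u64 fields - same layout on 32-bit and 64-bit: */
	atom->arg_ptr[0] = (u64)(unsigned long)arg0;
	atom->arg_ptr[1] = (u64)(unsigned long)arg1;
	atom->arg_ptr[2] = (u64)(unsigned long)arg2;
	/* arg_ptr[3..5] stay 0: copy_uatom() stops at the first NULL */
	atom->ret_ptr = (u64)(unsigned long)ret_ptr;
	atom->next = (u64)(unsigned long)next;
}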


* [patch 01/12] syslets: add async.h include file, kernel-side API definitions
From: Ingo Molnar @ 2007-02-28 21:41 UTC
  To: linux-kernel
  Cc: Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

From: Ingo Molnar <mingo@elte.hu>

add include/linux/async.h which contains the kernel-side API
declarations.

it also provides NOP stubs for the !CONFIG_ASYNC_SUPPORT case.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 include/linux/async.h |   88 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 88 insertions(+)

Index: linux/include/linux/async.h
===================================================================
--- /dev/null
+++ linux/include/linux/async.h
@@ -0,0 +1,88 @@
+#ifndef _LINUX_ASYNC_H
+#define _LINUX_ASYNC_H
+
+#include <linux/completion.h>
+#include <linux/compiler.h>
+#include <linux/syslet.h>
+#include <asm/unistd.h>
+
+/*
+ * The syslet subsystem - asynchronous syscall execution support.
+ *
+ * Syslet-subsystem internal definitions:
+ */
+
+/*
+ * The kernel-side copy of a syslet atom - with arguments expanded:
+ */
+struct syslet_atom {
+	unsigned long				flags;
+	unsigned long				nr;
+	long __user				*ret_ptr;
+	struct syslet_uatom	__user		*next;
+	unsigned long				args[6];
+	syscall_fn_t				*call_table;
+	unsigned int				nr_syscalls;
+};
+
+/*
+ * The 'async head' is the thread which has user-space context (ptregs)
+ * 'below it' - this is the one that can return to user-space:
+ */
+struct async_head {
+	spinlock_t				lock;
+	struct task_struct			*user_task;
+
+	struct list_head			ready_async_threads;
+	struct list_head			busy_async_threads;
+
+	struct mutex				completion_lock;
+	long					events_left;
+	wait_queue_head_t			wait;
+
+	struct async_head_user	__user		*ahu;
+
+	unsigned long		__user		*new_stackp;
+	unsigned long				new_ip;
+	unsigned long				restore_stack;
+	unsigned long				restore_ip;
+	struct completion			start_done;
+	struct completion			exit_done;
+};
+
+/*
+ * The 'async thread' is either a newly created async thread or it is
+ * an 'ex-head' - it cannot return to user-space and only has kernel
+ * context.
+ */
+struct async_thread {
+	struct task_struct			*task;
+	unsigned long				user_stack;
+	unsigned long				user_ip;
+	struct async_head			*ah;
+
+	struct list_head			entry;
+
+	unsigned int				exit;
+};
+
+/*
+ * Generic kernel API definitions:
+ */
+#ifdef CONFIG_ASYNC_SUPPORT
+extern void async_init(struct task_struct *t);
+extern void async_exit(struct task_struct *t);
+extern void __async_schedule(struct task_struct *t);
+#else /* !CONFIG_ASYNC_SUPPORT */
+static inline void async_init(struct task_struct *t)
+{
+}
+static inline void async_exit(struct task_struct *t)
+{
+}
+static inline void __async_schedule(struct task_struct *t)
+{
+}
+#endif /* !CONFIG_ASYNC_SUPPORT */
+
+#endif
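
to make the head/async-thread roles above concrete, here is a sketch of
the intended lifecycle (thread names made up for illustration):

	T1 is the head: it runs user-space code and submits syslets;
	T1 blocks in a syscall: T1 turns into a busy async thread, and a
	    ready thread T2 takes over the user context as the new head;
	T1's syscall completes: T1 queues a completion event and goes
	    back into the ready pool (see cachemiss_loop() in patch 04).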


* [patch 02/12] syslets: add syslet.h include file, user API/ABI definitions
From: Ingo Molnar @ 2007-02-28 21:41 UTC
  To: linux-kernel
  Cc: Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

From: Ingo Molnar <mingo@elte.hu>

add include/linux/syslet.h which contains the user-space API/ABI
declarations. Add the new header to include/linux/Kbuild as well.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 include/linux/Kbuild   |    1 
 include/linux/syslet.h |  155 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 156 insertions(+)

Index: linux/include/linux/Kbuild
===================================================================
--- linux.orig/include/linux/Kbuild
+++ linux/include/linux/Kbuild
@@ -141,6 +141,7 @@ header-y += sockios.h
 header-y += som.h
 header-y += sound.h
 header-y += synclink.h
+header-y += syslet.h
 header-y += telephony.h
 header-y += termios.h
 header-y += ticable.h
Index: linux/include/linux/syslet.h
===================================================================
--- /dev/null
+++ linux/include/linux/syslet.h
@@ -0,0 +1,155 @@
+#ifndef _LINUX_SYSLET_H
+#define _LINUX_SYSLET_H
+/*
+ * The syslet subsystem - asynchronous syscall execution support.
+ *
+ * Started by Ingo Molnar:
+ *
+ *  Copyright (C) 2007 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
+ *
+ * User-space API/ABI definitions:
+ */
+
+#ifndef __user
+# define __user
+#endif
+
+/*
+ * This is the 'Syslet Atom' - the basic unit of execution
+ * within the syslet framework. An atom always represents
+ * a single system-call plus its arguments, and has condition
+ * flags attached to it that allow the construction of larger
+ * programs from these atoms. User-space variables can be used
+ * (for example a loop index) via the special sys_umem*() syscalls.
+ *
+ * Arguments are implemented via pointers to arguments. This not
+ * only increases the flexibility of syslet atoms (multiple syslets
+ * can share the same variable for example), but is also an
+ * optimization: copy_uatom() will only fetch syscall parameters
+ * up until the point it meets the first NULL pointer. 50% of all
+ * syscalls have 2 or fewer parameters (and 90% of all syscalls have
+ * 4 or fewer parameters).
+ *
+ * [ Note: since the argument array is at the end of the atom, and the
+ *   kernel will not touch any argument beyond the first NULL one, atoms
+ *   might be packed more tightly. (the only special case exception to
+ *   this rule would be SKIP_TO_NEXT_ON_STOP atoms, where the kernel will
+ *   jump a full syslet_uatom number of bytes.) ]
+ */
+struct syslet_uatom {
+	u32		flags;
+	u32		nr;
+	u64		ret_ptr;
+	u64		next;
+	u64		arg_ptr[6];
+	/*
+	 * User-space can put anything in here, kernel will not
+	 * touch it:
+	 */
+	u64		private;
+};
+
+/*
+ * Flags to modify/control syslet atom behavior:
+ */
+
+/*
+ * Immediately queue this syslet asynchronously - do not even
+ * attempt to execute it synchronously in the user context:
+ */
+#define SYSLET_ASYNC				0x00000001
+
+/*
+ * Never queue this syslet asynchronously - even if synchronous
+ * execution causes a context-switch:
+ */
+#define SYSLET_SYNC				0x00000002
+
+/*
+ * Do not queue the syslet in the completion ring when done.
+ *
+ * ( the default is that the final atom of a syslet is queued
+ *   in the completion ring. )
+ *
+ * Some syscalls generate implicit completion events of their
+ * own.
+ */
+#define SYSLET_NO_COMPLETE			0x00000004
+
+/*
+ * Execution control: conditions upon the return code
+ * of the just executed syslet atom. 'Stop' means syslet
+ * execution is stopped and the atom is put into the
+ * completion ring:
+ */
+#define SYSLET_STOP_ON_NONZERO			0x00000008
+#define SYSLET_STOP_ON_ZERO			0x00000010
+#define SYSLET_STOP_ON_NEGATIVE			0x00000020
+#define SYSLET_STOP_ON_NON_POSITIVE		0x00000040
+
+#define SYSLET_STOP_MASK				\
+	(	SYSLET_STOP_ON_NONZERO		|	\
+		SYSLET_STOP_ON_ZERO		|	\
+		SYSLET_STOP_ON_NEGATIVE		|	\
+		SYSLET_STOP_ON_NON_POSITIVE		)
+
+/*
+ * Special modifier to 'stop' handling: instead of stopping the
+ * execution of the syslet, the linearly next atom is executed.
+ * (Normal execution flows along atom->next, and execution stops
+ *  if atom->next is NULL or a stop condition becomes true.)
+ *
+ * This is what allows true branches of execution within syslets.
+ */
+#define SYSLET_SKIP_TO_NEXT_ON_STOP		0x00000080
+
+/*
+ * This is the (per-user-context) descriptor of the async completion
+ * ring. This gets passed in to sys_async_exec():
+ */
+struct async_head_user {
+	/*
+	 * Current completion ring index - managed by the kernel:
+	 */
+	u64		kernel_ring_idx;
+	/*
+	 * User-side ring index:
+	 */
+	u64		user_ring_idx;
+
+	/*
+	 * Pointers to completed async syslets (i.e. syslets that
+	 * generated a cachemiss and went async - in which case
+	 * sys_async_exec() returns NULL to the user context) are
+	 * queued in this ring.
+	 * Syslets that were executed synchronously (cached) are not
+	 * queued here.
+	 *
+	 * Note: the final atom that generated the exit condition is
+	 * queued here. Normally this would be the last atom of a syslet.
+	 */
+	u64		completion_ring_ptr;
+
+	/*
+	 * Ring size in bytes:
+	 */
+	u64		ring_size_bytes;
+
+	/*
+	 * The head task can become a cachemiss thread later on
+	 * too, if it blocks - so it too needs a separate thread
+	 * stack and start address:
+	 */
+	u64		head_stack;
+	u64		head_ip;
+
+	/*
+	 * Newly started async kernel threads will take their
+	 * user stack and user start address from here. User-space
+	 * code has to check for new_thread_stack going to NULL
+	 * and has to refill it with a new stack if that happens.
+	 */
+	u64		new_thread_stack;
+	u64		new_thread_ip;
+};
+
+#endif
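
to make the stop/skip semantics concrete, here is a sketch of the
branching they enable, reusing the illustrative fill_uatom() helper
sketched in the cover letter above (none of these helper names are part
of the patchset). atoms[0] runs sys_read(); a negative return value
triggers the stop condition, but SKIP_TO_NEXT_ON_STOP redirects
execution to the linearly next atom - atoms[1], the error branch -
instead of stopping. On success, execution follows atoms[0].next as
usual:

static void
build_branch(struct syslet_uatom atoms[2], struct syslet_uatom *on_success,
	     unsigned long *fd, unsigned long *buf_addr, unsigned long *count,
	     long *ret, unsigned long *err_fd, unsigned long *msg_addr,
	     unsigned long *msg_len)
{
	/* the two atoms must be adjacent in memory for the skip to work: */
	fill_uatom(&atoms[0], __NR_read, fd, buf_addr, count, ret,
		   SYSLET_STOP_ON_NEGATIVE | SYSLET_SKIP_TO_NEXT_ON_STOP,
		   on_success);
	/* error branch: write a message, then stop (next == NULL): */
	fill_uatom(&atoms[1], __NR_write, err_fd, msg_addr, msg_len,
		   NULL, 0, NULL);
}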


* [patch 03/12] syslets: generic kernel bits
From: Ingo Molnar @ 2007-02-28 21:41 UTC
  To: linux-kernel
  Cc: Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

From: Ingo Molnar <mingo@elte.hu>

add the generic kernel bits - these are present even if
!CONFIG_ASYNC_SUPPORT. Syscalls that modify per-task state (the
setuid/setgid family, setsid, setpgid, unshare, prctl, capset, etc.)
are made to return -ENOSYS when called from async context.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 fs/exec.c             |    4 ++++
 include/linux/sched.h |   23 ++++++++++++++++++++++-
 kernel/capability.c   |    3 +++
 kernel/exit.c         |    7 +++++++
 kernel/fork.c         |    5 +++++
 kernel/sched.c        |    9 +++++++++
 kernel/sys.c          |   36 ++++++++++++++++++++++++++++++++++++
 7 files changed, 86 insertions(+), 1 deletion(-)

Index: linux/fs/exec.c
===================================================================
--- linux.orig/fs/exec.c
+++ linux/fs/exec.c
@@ -1444,6 +1444,10 @@ static int coredump_wait(int exit_code)
 		tsk->vfork_done = NULL;
 		complete(vfork_done);
 	}
+	/*
+	 * Make sure we exit our async context before waiting:
+	 */
+	async_exit(tsk);
 
 	if (core_waiters)
 		wait_for_completion(&startup_done);
Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -83,12 +83,12 @@ struct sched_param {
 #include <linux/timer.h>
 #include <linux/hrtimer.h>
 #include <linux/task_io_accounting.h>
+#include <linux/async.h>
 
 #include <asm/processor.h>
 
 struct exec_domain;
 struct futex_pi_state;
-
 /*
  * List of flags we want to share for kernel threads,
  * if only because they are not used by them anyway.
@@ -997,6 +997,12 @@ struct task_struct {
 /* journalling filesystem info */
 	void *journal_info;
 
+/* async syscall support: */
+	struct async_thread *at, *async_ready;
+	struct async_head *ah;
+	struct async_thread __at;
+	struct async_head __ah;
+
 /* VM state */
 	struct reclaim_state *reclaim_state;
 
@@ -1055,6 +1061,21 @@ struct task_struct {
 #endif
 };
 
+/*
+ * Is an async syscall being executed currently?
+ */
+#ifdef CONFIG_ASYNC_SUPPORT
+static inline int async_syscall(struct task_struct *t)
+{
+	return t->async_ready != NULL;
+}
+#else /* !CONFIG_ASYNC_SUPPORT */
+static inline int async_syscall(struct task_struct *t)
+{
+	return 0;
+}
+#endif /* !CONFIG_ASYNC_SUPPORT */
+
 static inline pid_t process_group(struct task_struct *tsk)
 {
 	return tsk->signal->pgrp;
Index: linux/kernel/capability.c
===================================================================
--- linux.orig/kernel/capability.c
+++ linux/kernel/capability.c
@@ -178,6 +178,9 @@ asmlinkage long sys_capset(cap_user_head
      int ret;
      pid_t pid;
 
+     if (async_syscall(current))
+             return -ENOSYS;
+
      if (get_user(version, &header->version))
 	     return -EFAULT; 
 
Index: linux/kernel/exit.c
===================================================================
--- linux.orig/kernel/exit.c
+++ linux/kernel/exit.c
@@ -26,6 +26,7 @@
 #include <linux/ptrace.h>
 #include <linux/profile.h>
 #include <linux/mount.h>
+#include <linux/async.h>
 #include <linux/proc_fs.h>
 #include <linux/mempolicy.h>
 #include <linux/taskstats_kern.h>
@@ -890,6 +891,12 @@ fastcall NORET_TYPE void do_exit(long co
 		schedule();
 	}
 
+	/*
+	 * Note: async threads have to exit their context before the MM
+	 * exit (due to the coredumping wait):
+	 */
+	async_exit(tsk);
+
 	tsk->flags |= PF_EXITING;
 
 	if (unlikely(in_atomic()))
Index: linux/kernel/fork.c
===================================================================
--- linux.orig/kernel/fork.c
+++ linux/kernel/fork.c
@@ -22,6 +22,7 @@
 #include <linux/personality.h>
 #include <linux/mempolicy.h>
 #include <linux/sem.h>
+#include <linux/async.h>
 #include <linux/file.h>
 #include <linux/key.h>
 #include <linux/binfmts.h>
@@ -1056,6 +1057,7 @@ static struct task_struct *copy_process(
 
 	p->lock_depth = -1;		/* -1 = no lock */
 	do_posix_clock_monotonic_gettime(&p->start_time);
+	async_init(p);
 	p->security = NULL;
 	p->io_context = NULL;
 	p->io_wait = NULL;
@@ -1623,6 +1625,9 @@ asmlinkage long sys_unshare(unsigned lon
 	struct uts_namespace *uts, *new_uts = NULL;
 	struct ipc_namespace *ipc, *new_ipc = NULL;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	check_unshare_flags(&unshare_flags);
 
 	/* Return -EINVAL for all unsupported flags */
Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -38,6 +38,7 @@
 #include <linux/vmalloc.h>
 #include <linux/blkdev.h>
 #include <linux/delay.h>
+#include <linux/async.h>
 #include <linux/smp.h>
 #include <linux/threads.h>
 #include <linux/timer.h>
@@ -3455,6 +3456,14 @@ asmlinkage void __sched schedule(void)
 	}
 	profile_hit(SCHED_PROFILING, __builtin_return_address(0));
 
+	prev = current;
+	if (unlikely(prev->async_ready)) {
+		if (prev->state && !(preempt_count() & PREEMPT_ACTIVE) &&
+			(!(prev->state & TASK_INTERRUPTIBLE) ||
+				!signal_pending(prev)))
+			__async_schedule(prev);
+	}
+
 need_resched:
 	preempt_disable();
 	prev = current;
Index: linux/kernel/sys.c
===================================================================
--- linux.orig/kernel/sys.c
+++ linux/kernel/sys.c
@@ -941,6 +941,9 @@ asmlinkage long sys_setregid(gid_t rgid,
 	int new_egid = old_egid;
 	int retval;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	retval = security_task_setgid(rgid, egid, (gid_t)-1, LSM_SETID_RE);
 	if (retval)
 		return retval;
@@ -987,6 +990,9 @@ asmlinkage long sys_setgid(gid_t gid)
 	int old_egid = current->egid;
 	int retval;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	retval = security_task_setgid(gid, (gid_t)-1, (gid_t)-1, LSM_SETID_ID);
 	if (retval)
 		return retval;
@@ -1057,6 +1063,9 @@ asmlinkage long sys_setreuid(uid_t ruid,
 	int old_ruid, old_euid, old_suid, new_ruid, new_euid;
 	int retval;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	retval = security_task_setuid(ruid, euid, (uid_t)-1, LSM_SETID_RE);
 	if (retval)
 		return retval;
@@ -1120,6 +1129,9 @@ asmlinkage long sys_setuid(uid_t uid)
 	int old_ruid, old_suid, new_suid;
 	int retval;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	retval = security_task_setuid(uid, (uid_t)-1, (uid_t)-1, LSM_SETID_ID);
 	if (retval)
 		return retval;
@@ -1160,6 +1172,9 @@ asmlinkage long sys_setresuid(uid_t ruid
 	int old_suid = current->suid;
 	int retval;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	retval = security_task_setuid(ruid, euid, suid, LSM_SETID_RES);
 	if (retval)
 		return retval;
@@ -1214,6 +1229,9 @@ asmlinkage long sys_setresgid(gid_t rgid
 {
 	int retval;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	retval = security_task_setgid(rgid, egid, sgid, LSM_SETID_RES);
 	if (retval)
 		return retval;
@@ -1269,6 +1287,9 @@ asmlinkage long sys_setfsuid(uid_t uid)
 {
 	int old_fsuid;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	old_fsuid = current->fsuid;
 	if (security_task_setuid(uid, (uid_t)-1, (uid_t)-1, LSM_SETID_FS))
 		return old_fsuid;
@@ -1298,6 +1319,9 @@ asmlinkage long sys_setfsgid(gid_t gid)
 {
 	int old_fsgid;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	old_fsgid = current->fsgid;
 	if (security_task_setgid(gid, (gid_t)-1, (gid_t)-1, LSM_SETID_FS))
 		return old_fsgid;
@@ -1373,6 +1397,9 @@ asmlinkage long sys_setpgid(pid_t pid, p
 	struct task_struct *group_leader = current->group_leader;
 	int err = -EINVAL;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	if (!pid)
 		pid = group_leader->pid;
 	if (!pgid)
@@ -1496,6 +1523,9 @@ asmlinkage long sys_setsid(void)
 	pid_t session;
 	int err = -EPERM;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	write_lock_irq(&tasklist_lock);
 
 	/* Fail if I am already a session leader */
@@ -1739,6 +1769,9 @@ asmlinkage long sys_setgroups(int gidset
 	struct group_info *group_info;
 	int retval;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	if (!capable(CAP_SETGID))
 		return -EPERM;
 	if ((unsigned)gidsetsize > NGROUPS_MAX)
@@ -2080,6 +2113,9 @@ asmlinkage long sys_prctl(int option, un
 {
 	long error;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	error = security_task_prctl(option, arg2, arg3, arg4, arg5);
 	if (error)
 		return error;
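
the convention this patch establishes is that any syscall which mutates
per-task state guards itself with async_syscall() - async threads share
the task context, so such changes cannot safely run asynchronously. A
hypothetical new syscall (sys_setfoo() is made up for illustration)
would follow the same pattern:

asmlinkage long sys_setfoo(int foo)
{
	/*
	 * Task-state-changing syscalls must not run in async
	 * context:
	 */
	if (async_syscall(current))
		return -ENOSYS;

	/* ... modify per-task state here ... */
	return 0;
}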


* [patch 04/12] syslets: core code
From: Ingo Molnar @ 2007-02-28 21:41 UTC
  To: linux-kernel
  Cc: Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

From: Ingo Molnar <mingo@elte.hu>

the core syslet / async system call infrastructure code.

It is built only if CONFIG_ASYNC_SUPPORT is enabled.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 kernel/Makefile |    1 
 kernel/async.c  |  989 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 990 insertions(+)

Index: linux/kernel/Makefile
===================================================================
--- linux.orig/kernel/Makefile
+++ linux/kernel/Makefile
@@ -10,6 +10,7 @@ obj-y     = sched.o fork.o exec_domain.o
 	    kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o mutex.o \
 	    hrtimer.o rwsem.o latency.o nsproxy.o srcu.o
 
+obj-$(CONFIG_ASYNC_SUPPORT) += async.o
 obj-$(CONFIG_STACKTRACE) += stacktrace.o
 obj-y += time/
 obj-$(CONFIG_DEBUG_MUTEXES) += mutex-debug.o
Index: linux/kernel/async.c
===================================================================
--- /dev/null
+++ linux/kernel/async.c
@@ -0,0 +1,989 @@
+/*
+ * kernel/async.c
+ *
+ * The syslet and threadlet subsystem - asynchronous syscall and
+ * user-space code execution support.
+ *
+ * Started by Ingo Molnar:
+ *
+ *  Copyright (C) 2007 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
+ *
+ * This file is released under the GPLv2.
+ *
+ * This code implements asynchronous syscalls via 'syslets'.
+ *
+ * Syslets consist of a set of 'syslet atoms' which reside
+ * purely in user-space memory and have no kernel-space resource
+ * attached to them. These atoms can be linked to each other via
+ * pointers. Besides the fundamental ability to execute system
+ * calls, syslet atoms can also implement branches, loops and
+ * arithmetic.
+ *
+ * Thus syslets can be used to build small autonomous programs that
+ * the kernel can execute purely from kernel-space, without having
+ * to return to any user-space context. Syslets can be run by any
+ * unprivileged user-space application - they are executed safely
+ * by the kernel.
+ *
+ * "Threadlets" are the user-space equivalent of syslets: small
+ * functions of execution that user-space attempts/expects to execute
+ * without scheduling. If the threadlet nevertheless blocks, the kernel
+ * creates a real thread from it, and that thread is put aside sleeping.
+ * The 'head' context (the context that never blocks) returns to the
+ * original function that called the threadlet. Once the sleeping thread
+ * wakes up again (after it got whatever it was waiting for - IO, a
+ * timeout, etc.) the function continues executing asynchronously, as a
+ * thread.
+ * A user-space completion ring connects these asynchronous function calls
+ * back to the head context.
+ */
+#include <linux/syscalls.h>
+#include <linux/syslet.h>
+#include <linux/delay.h>
+#include <linux/async.h>
+#include <linux/sched.h>
+#include <linux/init.h>
+#include <linux/err.h>
+
+#include <asm/uaccess.h>
+#include <asm/unistd.h>
+
+/*
+ * An async 'cachemiss context' is either busy, or it is ready.
+ * If it is ready, the 'head' might switch its user-space context
+ * to that ready thread anytime - so that if the ex-head blocks,
+ * one ready thread can become the next head and can continue to
+ * execute user-space code.
+ */
+static void
+__mark_async_thread_ready(struct async_thread *at, struct async_head *ah)
+{
+	list_del(&at->entry);
+	list_add_tail(&at->entry, &ah->ready_async_threads);
+	if (list_empty(&ah->busy_async_threads))
+		wake_up(&ah->wait);
+}
+
+static void
+mark_async_thread_ready(struct async_thread *at, struct async_head *ah)
+{
+	spin_lock(&ah->lock);
+	__mark_async_thread_ready(at, ah);
+	spin_unlock(&ah->lock);
+}
+
+static void
+__mark_async_thread_busy(struct async_thread *at, struct async_head *ah)
+{
+	list_del(&at->entry);
+	list_add_tail(&at->entry, &ah->busy_async_threads);
+}
+
+static void
+mark_async_thread_busy(struct async_thread *at, struct async_head *ah)
+{
+	spin_lock(&ah->lock);
+	__mark_async_thread_busy(at, ah);
+	spin_unlock(&ah->lock);
+}
+
+static void
+__async_thread_init(struct task_struct *t, struct async_thread *at,
+		    struct async_head *ah)
+{
+	INIT_LIST_HEAD(&at->entry);
+	at->exit = 0;
+	at->task = t;
+	at->ah = ah;
+
+	t->at = at;
+}
+
+static void
+async_thread_init(struct task_struct *t, struct async_thread *at,
+		  struct async_head *ah)
+{
+	spin_lock(&ah->lock);
+	__async_thread_init(t, at, ah);
+	__mark_async_thread_ready(at, ah);
+	spin_unlock(&ah->lock);
+}
+
+static void
+async_thread_exit(struct async_thread *at, struct task_struct *t)
+{
+	struct async_head *ah = at->ah;
+
+	spin_lock(&ah->lock);
+	list_del_init(&at->entry);
+	if (at->exit)
+		complete(&ah->exit_done);
+	t->at = NULL;
+	at->task = NULL;
+	spin_unlock(&ah->lock);
+}
+
+static struct async_thread *
+pick_ready_cachemiss_thread(struct async_head *ah)
+{
+	struct list_head *head = &ah->ready_async_threads;
+
+	if (list_empty(head))
+		return NULL;
+
+	return list_entry(head->next, struct async_thread, entry);
+}
+
+void __async_schedule(struct task_struct *t)
+{
+	struct async_thread *new_async_thread;
+	struct async_thread *async_ready;
+	struct async_head *ah = t->ah;
+	struct task_struct *new_task;
+
+	WARN_ON(!ah);
+	spin_lock(&ah->lock);
+
+	new_async_thread = pick_ready_cachemiss_thread(ah);
+	if (!new_async_thread)
+		goto out_unlock;
+
+	async_ready = t->async_ready;
+	WARN_ON(!async_ready);
+	t->async_ready = NULL;
+
+	new_task = new_async_thread->task;
+
+	move_user_context(new_task, t);
+	if (ah->restore_stack) {
+		set_task_stack_reg(new_task, ah->restore_stack);
+		WARN_ON(!ah->restore_ip);
+		task_ip_reg(new_task) = ah->restore_ip;
+		/*
+		 * The return code 0 is needed to tell the
+		 * head user-context that the threadlet went async:
+		 */
+		task_ret_reg(new_task) = 0;
+	}
+
+	new_task->at = NULL;
+	t->ah = NULL;
+	new_task->ah = ah;
+	ah->user_task = new_task;
+
+	wake_up_process(new_task);
+
+	__async_thread_init(t, async_ready, ah);
+	__mark_async_thread_busy(t->at, ah);
+
+ out_unlock:
+	spin_unlock(&ah->lock);
+}
+
+static void async_schedule(struct task_struct *t)
+{
+	if (t->async_ready)
+		__async_schedule(t);
+}
+
+static long __exec_atom(struct task_struct *t, struct syslet_atom *atom)
+{
+	struct async_thread *async_ready_save;
+	long ret;
+
+	/*
+	 * If user-space expects the syscall to schedule then
+	 * (try to) switch user-space to another thread straight
+	 * away and execute the syscall asynchronously:
+	 */
+	if (unlikely(atom->flags & SYSLET_ASYNC))
+		async_schedule(t);
+	/*
+	 * Does user-space want synchronous execution for this atom?:
+	 */
+	async_ready_save = t->async_ready;
+	if (unlikely(atom->flags & SYSLET_SYNC))
+		t->async_ready = NULL;
+
+	if (unlikely(atom->nr >= atom->nr_syscalls))
+		return -ENOSYS;
+
+	ret = atom->call_table[atom->nr](atom->args[0], atom->args[1],
+					 atom->args[2], atom->args[3],
+					 atom->args[4], atom->args[5]);
+
+	if (atom->ret_ptr && put_user(ret, atom->ret_ptr))
+		return -EFAULT;
+
+	if (t->ah)
+		t->async_ready = async_ready_save;
+
+	return ret;
+}
+
+/*
+ * Arithmetic syscall: add a value to a user-space memory location.
+ *
+ * Generic C version - in case the architecture has not implemented it
+ * in assembly.
+ */
+asmlinkage __attribute__((weak)) long
+sys_umem_add(unsigned long __user *uptr, unsigned long inc)
+{
+	unsigned long val, new_val;
+
+	if (get_user(val, uptr))
+		return -EFAULT;
+	/*
+	 * inc == 0 means 'read memory value':
+	 */
+	if (!inc)
+		return val;
+
+	new_val = val + inc;
+	if (__put_user(new_val, uptr))
+		return -EFAULT;
+
+	return new_val;
+}
+
+/*
+ * Open-coded because this is a very hot codepath during syslet
+ * execution and every cycle counts ...
+ *
+ * [ NOTE: it's an explicit fastcall because optimized assembly code
+ *   might depend on this. There are some kernels that disable regparm,
+ *   so let's not break those if possible. ]
+ */
+fastcall __attribute__((weak)) long
+copy_uatom(struct syslet_atom *atom, struct syslet_uatom __user *uatom)
+{
+	unsigned long __user *arg_ptr;
+	long ret = 0;
+
+	if (!access_ok(VERIFY_READ, uatom, sizeof(*uatom)))
+		return -EFAULT;
+
+	ret = __get_user(atom->nr, &uatom->nr);
+	ret |= __get_user(atom->ret_ptr, (long __user **)&uatom->ret_ptr);
+	ret |= __get_user(atom->flags, (unsigned long __user *)&uatom->flags);
+	ret |= __get_user(atom->next,
+			(struct syslet_uatom  __user **)&uatom->next);
+
+	memset(atom->args, 0, sizeof(atom->args));
+
+	ret |= __get_user(arg_ptr, (unsigned long __user **)&uatom->arg_ptr[0]);
+	if (!arg_ptr)
+		return ret;
+	if (!access_ok(VERIFY_READ, arg_ptr, sizeof(*arg_ptr)))
+		return -EFAULT;
+	ret |= __get_user(atom->args[0], arg_ptr);
+
+	ret |= __get_user(arg_ptr, (unsigned long __user **)&uatom->arg_ptr[1]);
+	if (!arg_ptr)
+		return ret;
+	if (!access_ok(VERIFY_READ, arg_ptr, sizeof(*arg_ptr)))
+		return -EFAULT;
+	ret |= __get_user(atom->args[1], arg_ptr);
+
+	ret |= __get_user(arg_ptr, (unsigned long __user **)&uatom->arg_ptr[2]);
+	if (!arg_ptr)
+		return ret;
+	if (!access_ok(VERIFY_READ, arg_ptr, sizeof(*arg_ptr)))
+		return -EFAULT;
+	ret |= __get_user(atom->args[2], arg_ptr);
+
+	ret |= __get_user(arg_ptr, (unsigned long __user **)&uatom->arg_ptr[3]);
+	if (!arg_ptr)
+		return ret;
+	if (!access_ok(VERIFY_READ, arg_ptr, sizeof(*arg_ptr)))
+		return -EFAULT;
+	ret |= __get_user(atom->args[3], arg_ptr);
+
+	ret |= __get_user(arg_ptr, (unsigned long __user **)&uatom->arg_ptr[4]);
+	if (!arg_ptr)
+		return ret;
+	if (!access_ok(VERIFY_READ, arg_ptr, sizeof(*arg_ptr)))
+		return -EFAULT;
+	ret |= __get_user(atom->args[4], arg_ptr);
+
+	ret |= __get_user(arg_ptr, (unsigned long __user **)&uatom->arg_ptr[5]);
+	if (!arg_ptr)
+		return ret;
+	if (!access_ok(VERIFY_READ, arg_ptr, sizeof(*arg_ptr)))
+		return -EFAULT;
+	ret |= __get_user(atom->args[5], arg_ptr);
+
+	return ret;
+}
+
+/*
+ * Should the next atom run, depending on the return value of
+ * the current atom - or should we stop execution?
+ */
+static int run_next_atom(struct syslet_atom *atom, long ret)
+{
+	switch (atom->flags & SYSLET_STOP_MASK) {
+		case SYSLET_STOP_ON_NONZERO:
+			if (!ret)
+				return 1;
+			return 0;
+		case SYSLET_STOP_ON_ZERO:
+			if (ret)
+				return 1;
+			return 0;
+		case SYSLET_STOP_ON_NEGATIVE:
+			if (ret >= 0)
+				return 1;
+			return 0;
+		case SYSLET_STOP_ON_NON_POSITIVE:
+			if (ret > 0)
+				return 1;
+			return 0;
+	}
+	return 1;
+}
+
+static struct syslet_uatom __user *
+next_uatom(struct syslet_atom *atom, struct syslet_uatom *uatom, long ret)
+{
+	/*
+	 * If the stop condition is false then continue
+	 * to atom->next:
+	 */
+	if (run_next_atom(atom, ret))
+		return atom->next;
+	/*
+	 * Special-case: if the stop condition is true and the atom
+	 * has SKIP_TO_NEXT_ON_STOP set, then instead of
+	 * stopping we skip to the atom directly after this atom
+	 * (in linear address-space).
+	 *
+	 * This, combined with the atom->next pointer and the
+	 * stop condition flags is what allows true branches and
+	 * loops in syslets:
+	 */
+	if (atom->flags & SYSLET_SKIP_TO_NEXT_ON_STOP)
+		return uatom + 1;
+
+	return NULL;
+}
+
+/*
+ * If user-space requested a completion event then put the last
+ * executed uatom into the completion ring:
+ */
+static long
+completion_event(struct async_head *ah, struct task_struct *t,
+		 void __user *event, struct async_head_user __user *ahu)
+{
+	unsigned long ring_size_bytes, max_ring_idx, kernel_ring_idx;
+	struct syslet_uatom __user *slot_val = NULL;
+	u64 __user *completion_ring, *ring_slot;
+
+	WARN_ON(!t->at);
+	WARN_ON(t->ah);
+
+	if (!access_ok(VERIFY_WRITE, ahu, sizeof(*ahu)))
+		return -EFAULT;
+
+	if (__get_user(completion_ring,
+				(u64 __user **)&ahu->completion_ring_ptr))
+		return -EFAULT;
+	if (__get_user(ring_size_bytes,
+			(unsigned long __user *)&ahu->ring_size_bytes))
+		return -EFAULT;
+	if (!ring_size_bytes)
+		return -EINVAL;
+
+	max_ring_idx = ring_size_bytes / sizeof(u64);
+	if (ring_size_bytes != max_ring_idx * sizeof(u64))
+		return -EINVAL;
+	/*
+	 * We pre-check the ring pointer, so that in the fastpath
+	 * we can use __get_user():
+	 */
+	if (!access_ok(VERIFY_WRITE, completion_ring, ring_size_bytes))
+		return -EFAULT;
+
+	mutex_lock(&ah->completion_lock);
+	/*
+	 * Async threads can complete in parallel, so use the
+	 * head's completion lock to serialize:
+	 */
+	if (__get_user(kernel_ring_idx,
+			(unsigned long __user *)&ahu->kernel_ring_idx))
+		goto fault_unlock;
+	if (kernel_ring_idx >= max_ring_idx)
+		goto err_unlock;
+
+	ring_slot = completion_ring + kernel_ring_idx;
+	if (__get_user(slot_val, (struct syslet_uatom __user **)ring_slot))
+		goto fault_unlock;
+	/*
+	 * User-space submitted more work than what fits into the
+	 * completion ring - do not stomp over it silently and signal
+	 * the error condition:
+	 */
+	if (slot_val)
+		goto err_unlock;
+
+	slot_val = event;
+	if (__put_user(slot_val, (struct syslet_uatom __user **)ring_slot))
+		goto fault_unlock;
+	/*
+	 * Update the ring index:
+	 */
+	kernel_ring_idx++;
+	if (kernel_ring_idx == max_ring_idx)
+		kernel_ring_idx = 0;
+
+	if (__put_user(kernel_ring_idx, &ahu->kernel_ring_idx))
+		goto fault_unlock;
+
+	/*
+	 * See whether the async-head is waiting and needs a wakeup:
+	 */
+	if (ah->events_left) {
+		if (!--ah->events_left) {
+			/*
+			 * We first unlock the mutex - to reduce the size
+			 * of the critical section. We have a safe
+			 * reference to 'ah':
+			 */
+			mutex_unlock(&ah->completion_lock);
+			wake_up(&ah->wait);
+			goto out;
+		}
+	}
+
+	mutex_unlock(&ah->completion_lock);
+ out:
+	return 0;
+
+ fault_unlock:
+	mutex_unlock(&ah->completion_lock);
+
+	return -EFAULT;
+
+ err_unlock:
+	mutex_unlock(&ah->completion_lock);
+
+	return -EINVAL;
+}
+
+/*
+ * This is the main syslet atom execution loop. This fetches atoms
+ * and executes them until it runs out of atoms or until a
+ * stop condition becomes true:
+ */
+static struct syslet_uatom __user *
+exec_atom(struct async_head *ah, struct task_struct *t,
+	  struct syslet_uatom __user *uatom,
+	  struct async_head_user __user *ahu,
+	  syscall_fn_t *call_table,
+	  unsigned int nr_syscalls)
+{
+	struct syslet_uatom __user *last_uatom;
+	struct syslet_atom atom;
+	long ret;
+
+	atom.call_table = call_table;
+	atom.nr_syscalls = nr_syscalls;
+
+ run_next:
+	if (unlikely(copy_uatom(&atom, uatom)))
+		return ERR_PTR(-EFAULT);
+
+	last_uatom = uatom;
+	ret = __exec_atom(t, &atom);
+	if (unlikely(signal_pending(t)))
+		goto stop;
+	if (need_resched())
+		cond_resched();
+
+	uatom = next_uatom(&atom, uatom, ret);
+	if (uatom)
+		goto run_next;
+ stop:
+	/*
+	 * We do completion only in async context:
+	 */
+	if (t->at && !(atom.flags & SYSLET_NO_COMPLETE)) {
+		if (completion_event(ah, t, last_uatom, ahu))
+			return ERR_PTR(-EFAULT);
+	}
+
+	return last_uatom;
+}
+
+static long
+cachemiss_loop(struct async_thread *at, struct async_head *ah,
+	       struct task_struct *t)
+{
+	for (;;) {
+		mark_async_thread_busy(at, ah);
+		set_task_state(t, TASK_INTERRUPTIBLE);
+		if (unlikely(t->ah || at->exit || signal_pending(t)))
+			break;
+		mark_async_thread_ready(at, ah);
+		schedule();
+	}
+	t->state = TASK_RUNNING;
+
+	async_thread_exit(at, t);
+
+	if (at->exit)
+		do_exit(0);
+
+	if (!t->ah) {
+		/*
+		 * Cachemiss threads return to one given
+		 * user-space instruction address and stack
+		 * pointer:
+		 */
+		set_task_stack_reg(t, at->user_stack);
+		task_ip_reg(t) = at->user_ip;
+
+		return -1;
+	}
+	return 0;
+}
+
+/*
+ * This is what a newly created cachemiss thread executes for the
+ * first time: initialize, pick up the user stack/IP addresses from
+ * the head and then execute the cachemiss loop. If the cachemiss
+ * loop returns then we return back to user-space:
+ */
+static long cachemiss_thread(void *data)
+{
+	struct pt_regs *head_regs, *regs;
+	struct task_struct *t = current;
+	struct async_head *ah = data;
+	struct async_thread *at;
+	int ret;
+
+	at = &t->__at;
+	async_thread_init(t, at, ah);
+
+	/*
+	 * Clone the head thread's user-space ptregs over,
+	 * now that we are in kernel-space:
+	 */
+	head_regs = task_pt_regs(ah->user_task);
+	regs = task_pt_regs(t);
+
+	*regs = *head_regs;
+	ret = get_user(at->user_stack, ah->new_stackp);
+	WARN_ON(ret);
+	/*
+	 * Clear the stack pointer, signalling to user-space that
+	 * this thread stack has been used up:
+	 */
+	ret = put_user(0, ah->new_stackp);
+	WARN_ON(ret);
+
+	complete(&ah->start_done);
+
+	return cachemiss_loop(at, ah, t);
+}
+
+/**
+ * sys_async_thread - do work as an async cachemiss thread again
+ *
+ * @event: completion event
+ * @ahu: async head
+ *
+ * If an async thread has returned to user-space (due to, say,
+ * a signal) then it is a 'busy' thread during that period. It
+ * can again offer itself into the cachemiss pool by calling this
+ * syscall:
+ */
+asmlinkage long
+sys_async_thread(void __user *event, struct async_head_user __user *ahu)
+{
+	struct task_struct *t = current;
+	struct async_thread *at = t->at;
+	struct async_head *ah = t->__at.ah;
+
+	/*
+	 * Only async threads are allowed to do this:
+	 */
+	if (!ah || t->ah)
+		return -EINVAL;
+
+	/*
+	 * A threadlet might want to signal a completion event:
+	 */
+	if (event) {
+		/*
+		 * threadlet - make sure the stack is never used
+		 * again by this thread:
+		 */
+		set_task_stack_reg(t, 0x11111111);
+		task_ip_reg(t) = 0x22222222;
+
+		if (completion_event(ah, t, event, ahu))
+			return -EFAULT;
+	}
+	/*
+	 * If a cachemiss threadlet calls sys_async_thread()
+	 * then we first have to mark it ready:
+	 */
+	if (at) {
+		mark_async_thread_ready(at, ah);
+	} else {
+		at = &t->__at;
+		WARN_ON(!at->ah);
+
+		async_thread_init(t, at, ah);
+	}
+
+	return cachemiss_loop(at, at->ah, t);
+}
+
+/*
+ * Initialize the in-kernel async head, based on the user-space async
+ * head:
+ */
+static long
+async_head_init(struct task_struct *t, struct async_head_user __user *ahu)
+{
+	struct async_head *ah;
+
+	ah = &t->__ah;
+
+	spin_lock_init(&ah->lock);
+	INIT_LIST_HEAD(&ah->ready_async_threads);
+	INIT_LIST_HEAD(&ah->busy_async_threads);
+	init_waitqueue_head(&ah->wait);
+	mutex_init(&ah->completion_lock);
+	ah->events_left = 0;
+	ah->ahu = NULL;
+	ah->new_stackp = NULL;
+	ah->new_ip = 0;
+	ah->restore_stack = 0;
+	ah->restore_ip = 0;
+	ah->user_task = t;
+	t->ah = ah;
+
+	return 0;
+}
+
+/*
+ * If the head cache-misses then it will become a cachemiss
+ * thread after having finished its current syslet. If it
+ * returns to user-space after that point (to handle a signal
+ * for example) then it will need a thread stack of its own:
+ */
+static long init_head(struct async_head *ah, struct task_struct *t,
+		      struct async_head_user __user *ahu)
+{
+	unsigned long head_stack, head_ip;
+
+	if (get_user(head_stack, (unsigned long __user *)&ahu->head_stack))
+		return -EFAULT;
+	if (get_user(head_ip, (unsigned long __user *)&ahu->head_ip))
+		return -EFAULT;
+	t->__at.user_stack = head_stack;
+	t->__at.user_ip = head_ip;
+
+	return async_head_init(t, ahu);
+}
+
+/*
+ * Simple limit and pool management mechanism for now:
+ */
+static long
+refill_cachemiss_pool(struct async_head *ah, struct task_struct *t,
+		      struct async_head_user __user *ahu)
+{
+	unsigned long new_ip;
+	int pid, ret;
+
+	init_completion(&ah->start_done);
+	ah->new_stackp = (unsigned long __user *)&ahu->new_thread_stack;
+	ret = get_user(new_ip, (unsigned long __user *)&ahu->new_thread_ip);
+	WARN_ON(ret);
+	ah->new_ip = new_ip;
+
+	pid = create_async_thread(cachemiss_thread, (void *)ah,
+			   CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND |
+			   CLONE_THREAD | CLONE_SYSVSEM);
+	if (pid < 0)
+		return pid;
+
+	wait_for_completion(&ah->start_done);
+	ah->new_stackp = NULL;
+	ah->new_ip = 0;
+
+	return 0;
+}
+
+/**
+ * sys_async_exec - execute a syslet.
+ *
+ * returns the uatom that was last executed, if the kernel was able to
+ * execute the syslet synchronously, or NULL if the syslet became
+ * asynchronous. (in the latter case syslet completion will be notified
+ * via the completion ring)
+ *
+ * (Various errors might also be returned via the usual negative numbers.)
+ */
+static struct syslet_uatom __user *
+__sys_async_exec(struct syslet_uatom __user *uatom,
+		 struct async_head_user __user *ahu,
+		 syscall_fn_t *call_table,
+		 unsigned int nr_syscalls)
+{
+	struct syslet_uatom __user *ret;
+	struct task_struct *t = current;
+	struct async_head *ah = t->ah;
+	struct async_thread *at = &t->__at;
+
+	/*
+	 * Do not allow recursive calls of sys_async_exec():
+	 */
+	if (async_syscall(t))
+		return ERR_PTR(-ENOSYS);
+
+	if (!uatom || !ahu || !ahu->new_thread_stack)
+		return ERR_PTR(-EINVAL);
+
+	if (unlikely(!ah)) {
+		ret = (void *)init_head(ah, t, ahu);
+		if (ret)
+			return ret;
+		ah = t->ah;
+	}
+
+	if (unlikely(list_empty(&ah->ready_async_threads))) {
+		ret = (void *)refill_cachemiss_pool(ah, t, ahu);
+		if (ret)
+			return ret;
+	}
+
+	t->async_ready = at;
+	ah->ahu = ahu;
+
+	ret = exec_atom(ah, t, uatom, ahu, call_table, nr_syscalls);
+
+	/*
+	 * Are we still executing as head?
+	 */
+	if (t->ah) {
+		t->async_ready = NULL;
+
+		return ret;
+	}
+
+	/*
+	 * We got turned into a cachemiss thread,
+	 * enter the cachemiss loop:
+	 */
+	set_task_state(t, TASK_INTERRUPTIBLE);
+	mark_async_thread_ready(at, ah);
+
+	return ERR_PTR(cachemiss_loop(at, ah, t));
+}
+
+asmlinkage struct syslet_uatom __user *
+sys_async_exec(struct syslet_uatom __user *uatom,
+	       struct async_head_user __user *ahu)
+{
+	return __sys_async_exec(uatom, ahu, sys_call_table, NR_syscalls);
+}
+
+#ifdef CONFIG_COMPAT
+
+asmlinkage struct syslet_uatom __user *
+compat_sys_async_exec(struct syslet_uatom __user *uatom,
+		      struct async_head_user __user *ahu)
+{
+	return __sys_async_exec(uatom, ahu, compat_sys_call_table,
+				compat_NR_syscalls);
+}
+
+#endif
+
+/**
+ * sys_async_wait - wait for async completion events
+ *
+ * This syscall waits for @min_wait_events syslet completion events
+ * to finish or for all async processing to finish (whichever
+ * comes first).
+ */
+asmlinkage long
+sys_async_wait(unsigned long min_wait_events, unsigned long user_ring_idx,
+	       struct async_head_user __user *ahu)
+{
+	struct task_struct *t = current;
+	struct async_head *ah = t->ah;
+	unsigned long kernel_ring_idx;
+
+	/*
+	 * Do not allow async waiting:
+	 */
+	if (async_syscall(t))
+		return -ENOSYS;
+	if (!ah)
+		return -EINVAL;
+
+	mutex_lock(&ah->completion_lock);
+	if (get_user(kernel_ring_idx,
+			(unsigned long __user *)&ahu->kernel_ring_idx))
+		goto err_unlock;
+	/*
+	 * Account any completions that happened since user-space
+	 * checked the ring:
+ 	 */
+	ah->events_left = min_wait_events - (kernel_ring_idx - user_ring_idx);
+	mutex_unlock(&ah->completion_lock);
+
+	return wait_event_interruptible(ah->wait,
+		list_empty(&ah->busy_async_threads) || ah->events_left <= 0);
+
+ err_unlock:
+	mutex_unlock(&ah->completion_lock);
+	return -EFAULT;
+}
+
+asmlinkage long
+sys_threadlet_on(unsigned long restore_stack,
+		 unsigned long restore_ip,
+		 struct async_head_user __user *ahu)
+{
+	struct task_struct *t = current;
+	struct async_head *ah = t->ah;
+	struct async_thread *at = &t->__at;
+	long ret;
+
+	/*
+	 * Do not allow recursive calls of sys_threadlet_on():
+	 */
+	if (t->async_ready || t->at)
+		return -EINVAL;
+
+	if (unlikely(!ah)) {
+		ret = init_head(ah, t, ahu);
+		if (ret)
+			return ret;
+		ah = t->ah;
+	}
+
+	if (unlikely(list_empty(&ah->ready_async_threads))) {
+		ret = refill_cachemiss_pool(ah, t, ahu);
+		if (ret)
+			return ret;
+	}
+
+	t->async_ready = at;
+	ah->restore_stack = restore_stack;
+	ah->restore_ip = restore_ip;
+
+	ah->ahu = ahu;
+
+	return 0;
+}
+
+asmlinkage long sys_threadlet_off(void)
+{
+	struct task_struct *t = current;
+	struct async_head *ah = t->ah;
+
+	/*
+	 * Are we still executing as head?
+	 */
+	if (ah) {
+		t->async_ready = NULL;
+
+		return 1;
+	}
+
+	/*
+	 * We got turned into a cachemiss thread,
+	 * return to user-space, which can do
+	 * the notification, etc:
+	 */
+	return 0;
+}
+
+static void __notify_async_thread_exit(struct async_thread *at,
+				       struct async_head *ah)
+{
+	list_del_init(&at->entry);
+	at->exit = 1;
+	init_completion(&ah->exit_done);
+	wake_up_process(at->task);
+}
+
+static void stop_cachemiss_threads(struct async_head *ah)
+{
+	struct async_thread *at;
+
+repeat:
+	spin_lock(&ah->lock);
+	list_for_each_entry(at, &ah->ready_async_threads, entry) {
+
+		__notify_async_thread_exit(at, ah);
+		spin_unlock(&ah->lock);
+
+		wait_for_completion(&ah->exit_done);
+
+		goto repeat;
+	}
+
+	list_for_each_entry(at, &ah->busy_async_threads, entry) {
+
+		__notify_async_thread_exit(at, ah);
+		spin_unlock(&ah->lock);
+
+		wait_for_completion(&ah->exit_done);
+
+		goto repeat;
+	}
+	spin_unlock(&ah->lock);
+}
+
+static void async_head_exit(struct async_head *ah, struct task_struct *t)
+{
+	stop_cachemiss_threads(ah);
+	WARN_ON(!list_empty(&ah->ready_async_threads));
+	WARN_ON(!list_empty(&ah->busy_async_threads));
+	WARN_ON(spin_is_locked(&ah->lock));
+
+	t->ah = NULL;
+}
+
+/*
+ * fork()-time initialization:
+ */
+void async_init(struct task_struct *t)
+{
+	t->at		= NULL;
+	t->async_ready	= NULL;
+	t->ah		= NULL;
+	t->__at.ah	= NULL;
+	t->__at.user_stack = 0;
+}
+
+/*
+ * do_exit()-time cleanup:
+ */
+void async_exit(struct task_struct *t)
+{
+	struct async_thread *at = t->at;
+	struct async_head *ah = t->ah;
+
+	/*
+	 * If head does a sys_exit() then the final schedule() must
+	 * not be passed on to another cachemiss thread:
+	 */
+	t->async_ready = NULL;
+
+	if (unlikely(at))
+		async_thread_exit(at, t);
+
+	if (unlikely(ah))
+		async_head_exit(ah, t);
+}
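
the threadlet syscalls above combine into the following user-space call
sequence. This is a heavily simplified sketch - the real user-space
glue (in the async-test code) sets up the private stack and the restore
address in assembly, and the syscalls are assumed to be wrapped as
ordinary libc-style stubs here:

static long
run_threadlet(long (*fn)(void *), void *arg, void *completion_event,
	      unsigned long restore_stack, unsigned long restore_ip,
	      struct async_head_user *ahu)
{
	long ret;

	/*
	 * Tell the kernel where the head context resumes if fn()
	 * blocks and this context goes async:
	 */
	if (sys_threadlet_on(restore_stack, restore_ip, ahu))
		return -1;

	ret = fn(arg);			/* may or may not block */

	if (sys_threadlet_off())
		return ret;		/* ran 'cached', still the head */

	/*
	 * fn() blocked and we are now an async thread: queue the
	 * completion event and offer this thread back into the
	 * cachemiss pool:
	 */
	return sys_async_thread(completion_event, ahu);
}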


* [patch 05/12] syslets: core, documentation
From: Ingo Molnar @ 2007-02-28 21:41 UTC
  To: linux-kernel
  Cc: Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

From: Ingo Molnar <mingo@elte.hu>

Add Documentation/syslet-design.txt with a high-level description
of the syslet concepts.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 Documentation/syslet-design.txt |  137 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 137 insertions(+)

Index: linux/Documentation/syslet-design.txt
===================================================================
--- /dev/null
+++ linux/Documentation/syslet-design.txt
@@ -0,0 +1,137 @@
+Syslets / asynchronous system calls
+===================================
+
+started by Ingo Molnar <mingo@redhat.com>
+
+Goal:
+-----
+
+The goal of the syslet subsystem is to allow user-space to execute
+arbitrary system calls asynchronously. It does so by allowing user-space
+to execute "syslets" which are small scriptlets that the kernel can execute
+both securely and asynchronously without having to exit to user-space.
+
+the core syslet concepts are:
+
+The Syslet Atom:
+----------------
+
+The syslet atom is a small, fixed-size (80 bytes) piece of
+user-space memory, which is the basic unit of execution within the syslet
+framework. An atom represents a single system-call and its arguments.
+In addition it also has condition flags attached to it that allow the
+construction of larger programs (syslets) from these atoms.
+
+Arguments to the system call are implemented via pointers to arguments.
+This not only increases the flexibility of syslet atoms (multiple syslets
+can share the same variable for example), but is also an optimization:
+copy_uatom() will only fetch syscall parameters up until the point it
+meets the first NULL pointer. 50% of all syscalls have 2 or fewer
+parameters (and 90% of all syscalls have 4 or fewer parameters).
+
+ [ Note: since the argument array is at the end of the atom, and the
+   kernel will not touch any argument beyond the first NULL one, atoms
+   might be packed more tightly. (the only special case exception to
+   this rule would be SKIP_TO_NEXT_ON_STOP atoms, where the kernel will
+   jump a full syslet_uatom number of bytes.) ]
+
+The Syslet:
+-----------
+
+A syslet is a program, represented by a graph of syslet atoms. The
+syslet atoms are chained to each other either via the atom->next pointer,
+or via the SYSLET_SKIP_TO_NEXT_ON_STOP flag.
+
+Running Syslets:
+----------------
+
+Syslets can be run via the sys_async_exec() system call, which takes
+the first atom of the syslet as an argument. The kernel does not need
+to be told about the other atoms - it will fetch them on the fly as
+execution goes forward.
+
+A syslet might either be executed 'cached', or it might generate a
+'cachemiss'.
+
+'Cached' syslet execution means that the whole syslet was executed
+without blocking. The system-call returns the submitted atom's address
+in this case.
+
+If a syslet blocks while the kernel executes a system-call embedded in
+one of its atoms, the kernel will keep working on that syscall in
+parallel, but it immediately returns to user-space with a NULL pointer,
+so the submitting task can submit other syslets.
+
+Completion of asynchronous syslets:
+-----------------------------------
+
+Completion of asynchronous syslets is done via the 'completion ring',
+which is a ringbuffer of syslet atom pointers in user-space memory,
+provided by user-space as an argument to the sys_async_exec() syscall.
+The kernel fills in the ringbuffer starting at index 0, and user-space
+must clear out these pointers. Once the kernel reaches the end of
+the ring it wraps back to index 0. The kernel will not overwrite
+non-NULL pointers (but will return an error), and thus user-space has
+to make sure it completes all events it asked for.
+
+Waiting for completions:
+------------------------
+
+Syslet completions can be waited for via the sys_async_wait()
+system call - which takes the number of events it should wait for as
+a parameter. This system call will also return if the number of
+pending events goes down to zero.
+
+Sample Hello World syslet code:
+
+--------------------------->
+/*
+ * Set up a syslet atom:
+ */
+static void
+init_atom(struct syslet_uatom *atom, int nr,
+	  void *arg_ptr0, void *arg_ptr1, void *arg_ptr2,
+	  void *arg_ptr3, void *arg_ptr4, void *arg_ptr5,
+	  void *ret_ptr, unsigned long flags, struct syslet_uatom *next)
+{
+	atom->nr = nr;
+	atom->arg_ptr[0] = arg_ptr0;
+	atom->arg_ptr[1] = arg_ptr1;
+	atom->arg_ptr[2] = arg_ptr2;
+	atom->arg_ptr[3] = arg_ptr3;
+	atom->arg_ptr[4] = arg_ptr4;
+	atom->arg_ptr[5] = arg_ptr5;
+	atom->ret_ptr = ret_ptr;
+	atom->flags = flags;
+	atom->next = next;
+}
+
+int main(int argc, char *argv[])
+{
+	unsigned long int fd_out = 1; /* standard output */
+	char *buf = "Hello Syslet World!\n";
+	unsigned long size = strlen(buf);
+	struct syslet_uatom atom, *done;
+
+	async_head_init();
+
+	/*
+	 * Simple syslet consisting of a single atom:
+	 */
+	init_atom(&atom, __NR_sys_write, &fd_out, &buf, &size,
+		  NULL, NULL, NULL, NULL, SYSLET_ASYNC, NULL);
+	done = sys_async_exec(&atom);
+	if (!done) {
+		sys_async_wait(1);
+		if (completion_ring[curr_ring_idx] == &atom) {
+			completion_ring[curr_ring_idx] = NULL;
+			printf("completed an async syslet atom!\n");
+		}
+	} else {
+		printf("completed a cached syslet atom!\n");
+	}
+
+	async_head_exit();
+
+	return 0;
+}
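
the ring protocol described above - the kernel fills in slots, refuses
to overwrite non-NULL ones and wraps at the end; user-space processes
and clears them - translates into a consumer loop along these lines. A
sketch, reusing completion_ring[] and curr_ring_idx from the Hello
World example (max_ring_idx and handle_completion() are illustrative
names, not part of the patchset):

static void process_completions(void)
{
	struct syslet_uatom *done;

	for (;;) {
		done = completion_ring[curr_ring_idx];
		if (!done)
			break;			/* ring drained */

		handle_completion(done);	/* app-specific */

		/* clear the slot - the kernel will not overwrite
		   non-NULL entries: */
		completion_ring[curr_ring_idx] = NULL;

		if (++curr_ring_idx == max_ring_idx)
			curr_ring_idx = 0;	/* wrap to slot 0 */
	}
}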


* [patch 06/12] x86: split FPU state from task state
From: Ingo Molnar @ 2007-02-28 21:41 UTC
  To: linux-kernel
  Cc: Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

From: Arjan van de Ven <arjan@linux.intel.com>

Split the FPU save area from the task struct. This allows easy migration
of FPU context, and it's generally cleaner. It also allows the following
two (future) optimizations:

1) allocate the right size for the actual CPU rather than always 512 bytes
2) only allocate when the application actually uses the FPU, i.e. in the
   first lazy FPU trap. This could save memory for apps that do not use
   the FPU.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/i386/kernel/i387.c        |   96 ++++++++++++++++++++---------------------
 arch/i386/kernel/process.c     |   56 +++++++++++++++++++++++
 arch/i386/kernel/traps.c       |   10 ----
 include/asm-i386/i387.h        |    6 +-
 include/asm-i386/processor.h   |    6 ++
 include/asm-i386/thread_info.h |    6 ++
 kernel/fork.c                  |    7 ++
 7 files changed, 123 insertions(+), 64 deletions(-)

Index: linux/arch/i386/kernel/i387.c
===================================================================
--- linux.orig/arch/i386/kernel/i387.c
+++ linux/arch/i386/kernel/i387.c
@@ -31,9 +31,9 @@ void mxcsr_feature_mask_init(void)
 	unsigned long mask = 0;
 	clts();
 	if (cpu_has_fxsr) {
-		memset(&current->thread.i387.fxsave, 0, sizeof(struct i387_fxsave_struct));
-		asm volatile("fxsave %0" : : "m" (current->thread.i387.fxsave)); 
-		mask = current->thread.i387.fxsave.mxcsr_mask;
+		memset(&current->thread.i387->fxsave, 0, sizeof(struct i387_fxsave_struct));
+		asm volatile("fxsave %0" : : "m" (current->thread.i387->fxsave));
+		mask = current->thread.i387->fxsave.mxcsr_mask;
 		if (mask == 0) mask = 0x0000ffbf;
 	} 
 	mxcsr_feature_mask &= mask;
@@ -49,16 +49,16 @@ void mxcsr_feature_mask_init(void)
 void init_fpu(struct task_struct *tsk)
 {
 	if (cpu_has_fxsr) {
-		memset(&tsk->thread.i387.fxsave, 0, sizeof(struct i387_fxsave_struct));
-		tsk->thread.i387.fxsave.cwd = 0x37f;
+		memset(&tsk->thread.i387->fxsave, 0, sizeof(struct i387_fxsave_struct));
+		tsk->thread.i387->fxsave.cwd = 0x37f;
 		if (cpu_has_xmm)
-			tsk->thread.i387.fxsave.mxcsr = 0x1f80;
+			tsk->thread.i387->fxsave.mxcsr = 0x1f80;
 	} else {
-		memset(&tsk->thread.i387.fsave, 0, sizeof(struct i387_fsave_struct));
-		tsk->thread.i387.fsave.cwd = 0xffff037fu;
-		tsk->thread.i387.fsave.swd = 0xffff0000u;
-		tsk->thread.i387.fsave.twd = 0xffffffffu;
-		tsk->thread.i387.fsave.fos = 0xffff0000u;
+		memset(&tsk->thread.i387->fsave, 0, sizeof(struct i387_fsave_struct));
+		tsk->thread.i387->fsave.cwd = 0xffff037fu;
+		tsk->thread.i387->fsave.swd = 0xffff0000u;
+		tsk->thread.i387->fsave.twd = 0xffffffffu;
+		tsk->thread.i387->fsave.fos = 0xffff0000u;
 	}
 	/* only the device not available exception or ptrace can call init_fpu */
 	set_stopped_child_used_math(tsk);
@@ -152,18 +152,18 @@ static inline unsigned long twd_fxsr_to_
 unsigned short get_fpu_cwd( struct task_struct *tsk )
 {
 	if ( cpu_has_fxsr ) {
-		return tsk->thread.i387.fxsave.cwd;
+		return tsk->thread.i387->fxsave.cwd;
 	} else {
-		return (unsigned short)tsk->thread.i387.fsave.cwd;
+		return (unsigned short)tsk->thread.i387->fsave.cwd;
 	}
 }
 
 unsigned short get_fpu_swd( struct task_struct *tsk )
 {
 	if ( cpu_has_fxsr ) {
-		return tsk->thread.i387.fxsave.swd;
+		return tsk->thread.i387->fxsave.swd;
 	} else {
-		return (unsigned short)tsk->thread.i387.fsave.swd;
+		return (unsigned short)tsk->thread.i387->fsave.swd;
 	}
 }
 
@@ -171,9 +171,9 @@ unsigned short get_fpu_swd( struct task_
 unsigned short get_fpu_twd( struct task_struct *tsk )
 {
 	if ( cpu_has_fxsr ) {
-		return tsk->thread.i387.fxsave.twd;
+		return tsk->thread.i387->fxsave.twd;
 	} else {
-		return (unsigned short)tsk->thread.i387.fsave.twd;
+		return (unsigned short)tsk->thread.i387->fsave.twd;
 	}
 }
 #endif  /*  0  */
@@ -181,7 +181,7 @@ unsigned short get_fpu_twd( struct task_
 unsigned short get_fpu_mxcsr( struct task_struct *tsk )
 {
 	if ( cpu_has_xmm ) {
-		return tsk->thread.i387.fxsave.mxcsr;
+		return tsk->thread.i387->fxsave.mxcsr;
 	} else {
 		return 0x1f80;
 	}
@@ -192,27 +192,27 @@ unsigned short get_fpu_mxcsr( struct tas
 void set_fpu_cwd( struct task_struct *tsk, unsigned short cwd )
 {
 	if ( cpu_has_fxsr ) {
-		tsk->thread.i387.fxsave.cwd = cwd;
+		tsk->thread.i387->fxsave.cwd = cwd;
 	} else {
-		tsk->thread.i387.fsave.cwd = ((long)cwd | 0xffff0000u);
+		tsk->thread.i387->fsave.cwd = ((long)cwd | 0xffff0000u);
 	}
 }
 
 void set_fpu_swd( struct task_struct *tsk, unsigned short swd )
 {
 	if ( cpu_has_fxsr ) {
-		tsk->thread.i387.fxsave.swd = swd;
+		tsk->thread.i387->fxsave.swd = swd;
 	} else {
-		tsk->thread.i387.fsave.swd = ((long)swd | 0xffff0000u);
+		tsk->thread.i387->fsave.swd = ((long)swd | 0xffff0000u);
 	}
 }
 
 void set_fpu_twd( struct task_struct *tsk, unsigned short twd )
 {
 	if ( cpu_has_fxsr ) {
-		tsk->thread.i387.fxsave.twd = twd_i387_to_fxsr(twd);
+		tsk->thread.i387->fxsave.twd = twd_i387_to_fxsr(twd);
 	} else {
-		tsk->thread.i387.fsave.twd = ((long)twd | 0xffff0000u);
+		tsk->thread.i387->fsave.twd = ((long)twd | 0xffff0000u);
 	}
 }
 
@@ -298,8 +298,8 @@ static inline int save_i387_fsave( struc
 	struct task_struct *tsk = current;
 
 	unlazy_fpu( tsk );
-	tsk->thread.i387.fsave.status = tsk->thread.i387.fsave.swd;
-	if ( __copy_to_user( buf, &tsk->thread.i387.fsave,
+	tsk->thread.i387->fsave.status = tsk->thread.i387->fsave.swd;
+	if ( __copy_to_user( buf, &tsk->thread.i387->fsave,
 			     sizeof(struct i387_fsave_struct) ) )
 		return -1;
 	return 1;
@@ -312,15 +312,15 @@ static int save_i387_fxsave( struct _fps
 
 	unlazy_fpu( tsk );
 
-	if ( convert_fxsr_to_user( buf, &tsk->thread.i387.fxsave ) )
+	if ( convert_fxsr_to_user( buf, &tsk->thread.i387->fxsave ) )
 		return -1;
 
-	err |= __put_user( tsk->thread.i387.fxsave.swd, &buf->status );
+	err |= __put_user( tsk->thread.i387->fxsave.swd, &buf->status );
 	err |= __put_user( X86_FXSR_MAGIC, &buf->magic );
 	if ( err )
 		return -1;
 
-	if ( __copy_to_user( &buf->_fxsr_env[0], &tsk->thread.i387.fxsave,
+	if ( __copy_to_user( &buf->_fxsr_env[0], &tsk->thread.i387->fxsave,
 			     sizeof(struct i387_fxsave_struct) ) )
 		return -1;
 	return 1;
@@ -343,7 +343,7 @@ int save_i387( struct _fpstate __user *b
 			return save_i387_fsave( buf );
 		}
 	} else {
-		return save_i387_soft( &current->thread.i387.soft, buf );
+		return save_i387_soft( &current->thread.i387->soft, buf );
 	}
 }
 
@@ -351,7 +351,7 @@ static inline int restore_i387_fsave( st
 {
 	struct task_struct *tsk = current;
 	clear_fpu( tsk );
-	return __copy_from_user( &tsk->thread.i387.fsave, buf,
+	return __copy_from_user( &tsk->thread.i387->fsave, buf,
 				 sizeof(struct i387_fsave_struct) );
 }
 
@@ -360,11 +360,11 @@ static int restore_i387_fxsave( struct _
 	int err;
 	struct task_struct *tsk = current;
 	clear_fpu( tsk );
-	err = __copy_from_user( &tsk->thread.i387.fxsave, &buf->_fxsr_env[0],
+	err = __copy_from_user( &tsk->thread.i387->fxsave, &buf->_fxsr_env[0],
 				sizeof(struct i387_fxsave_struct) );
 	/* mxcsr reserved bits must be masked to zero for security reasons */
-	tsk->thread.i387.fxsave.mxcsr &= mxcsr_feature_mask;
-	return err ? 1 : convert_fxsr_from_user( &tsk->thread.i387.fxsave, buf );
+	tsk->thread.i387->fxsave.mxcsr &= mxcsr_feature_mask;
+	return err ? 1 : convert_fxsr_from_user( &tsk->thread.i387->fxsave, buf );
 }
 
 int restore_i387( struct _fpstate __user *buf )
@@ -378,7 +378,7 @@ int restore_i387( struct _fpstate __user
 			err = restore_i387_fsave( buf );
 		}
 	} else {
-		err = restore_i387_soft( &current->thread.i387.soft, buf );
+		err = restore_i387_soft( &current->thread.i387->soft, buf );
 	}
 	set_used_math();
 	return err;
@@ -391,7 +391,7 @@ int restore_i387( struct _fpstate __user
 static inline int get_fpregs_fsave( struct user_i387_struct __user *buf,
 				    struct task_struct *tsk )
 {
-	return __copy_to_user( buf, &tsk->thread.i387.fsave,
+	return __copy_to_user( buf, &tsk->thread.i387->fsave,
 			       sizeof(struct user_i387_struct) );
 }
 
@@ -399,7 +399,7 @@ static inline int get_fpregs_fxsave( str
 				     struct task_struct *tsk )
 {
 	return convert_fxsr_to_user( (struct _fpstate __user *)buf,
-				     &tsk->thread.i387.fxsave );
+				     &tsk->thread.i387->fxsave );
 }
 
 int get_fpregs( struct user_i387_struct __user *buf, struct task_struct *tsk )
@@ -411,7 +411,7 @@ int get_fpregs( struct user_i387_struct 
 			return get_fpregs_fsave( buf, tsk );
 		}
 	} else {
-		return save_i387_soft( &tsk->thread.i387.soft,
+		return save_i387_soft( &tsk->thread.i387->soft,
 				       (struct _fpstate __user *)buf );
 	}
 }
@@ -419,14 +419,14 @@ int get_fpregs( struct user_i387_struct 
 static inline int set_fpregs_fsave( struct task_struct *tsk,
 				    struct user_i387_struct __user *buf )
 {
-	return __copy_from_user( &tsk->thread.i387.fsave, buf,
+	return __copy_from_user( &tsk->thread.i387->fsave, buf,
 				 sizeof(struct user_i387_struct) );
 }
 
 static inline int set_fpregs_fxsave( struct task_struct *tsk,
 				     struct user_i387_struct __user *buf )
 {
-	return convert_fxsr_from_user( &tsk->thread.i387.fxsave,
+	return convert_fxsr_from_user( &tsk->thread.i387->fxsave,
 				       (struct _fpstate __user *)buf );
 }
 
@@ -439,7 +439,7 @@ int set_fpregs( struct task_struct *tsk,
 			return set_fpregs_fsave( tsk, buf );
 		}
 	} else {
-		return restore_i387_soft( &tsk->thread.i387.soft,
+		return restore_i387_soft( &tsk->thread.i387->soft,
 					  (struct _fpstate __user *)buf );
 	}
 }
@@ -447,7 +447,7 @@ int set_fpregs( struct task_struct *tsk,
 int get_fpxregs( struct user_fxsr_struct __user *buf, struct task_struct *tsk )
 {
 	if ( cpu_has_fxsr ) {
-		if (__copy_to_user( buf, &tsk->thread.i387.fxsave,
+		if (__copy_to_user( buf, &tsk->thread.i387->fxsave,
 				    sizeof(struct user_fxsr_struct) ))
 			return -EFAULT;
 		return 0;
@@ -461,11 +461,11 @@ int set_fpxregs( struct task_struct *tsk
 	int ret = 0;
 
 	if ( cpu_has_fxsr ) {
-		if (__copy_from_user( &tsk->thread.i387.fxsave, buf,
+		if (__copy_from_user( &tsk->thread.i387->fxsave, buf,
 				  sizeof(struct user_fxsr_struct) ))
 			ret = -EFAULT;
 		/* mxcsr reserved bits must be masked to zero for security reasons */
-		tsk->thread.i387.fxsave.mxcsr &= mxcsr_feature_mask;
+		tsk->thread.i387->fxsave.mxcsr &= mxcsr_feature_mask;
 	} else {
 		ret = -EIO;
 	}
@@ -479,7 +479,7 @@ int set_fpxregs( struct task_struct *tsk
 static inline void copy_fpu_fsave( struct task_struct *tsk,
 				   struct user_i387_struct *fpu )
 {
-	memcpy( fpu, &tsk->thread.i387.fsave,
+	memcpy( fpu, &tsk->thread.i387->fsave,
 		sizeof(struct user_i387_struct) );
 }
 
@@ -490,10 +490,10 @@ static inline void copy_fpu_fxsave( stru
 	unsigned short *from;
 	int i;
 
-	memcpy( fpu, &tsk->thread.i387.fxsave, 7 * sizeof(long) );
+	memcpy( fpu, &tsk->thread.i387->fxsave, 7 * sizeof(long) );
 
 	to = (unsigned short *)&fpu->st_space[0];
-	from = (unsigned short *)&tsk->thread.i387.fxsave.st_space[0];
+	from = (unsigned short *)&tsk->thread.i387->fxsave.st_space[0];
 	for ( i = 0 ; i < 8 ; i++, to += 5, from += 8 ) {
 		memcpy( to, from, 5 * sizeof(unsigned short) );
 	}
@@ -540,7 +540,7 @@ int dump_task_extended_fpu(struct task_s
 	if (fpvalid) {
 		if (tsk == current)
 		       unlazy_fpu(tsk);
-		memcpy(fpu, &tsk->thread.i387.fxsave, sizeof(*fpu));
+		memcpy(fpu, &tsk->thread.i387->fxsave, sizeof(*fpu));
 	}
 	return fpvalid;
 }
Index: linux/arch/i386/kernel/process.c
===================================================================
--- linux.orig/arch/i386/kernel/process.c
+++ linux/arch/i386/kernel/process.c
@@ -648,7 +648,7 @@ struct task_struct fastcall * __switch_t
 
 	/* we're going to use this soon, after a few expensive things */
 	if (next_p->fpu_counter > 5)
-		prefetch(&next->i387.fxsave);
+		prefetch(&next->i387->fxsave);
 
 	/*
 	 * Reload esp0.
@@ -927,3 +927,57 @@ unsigned long arch_align_stack(unsigned 
 		sp -= get_random_int() % 8192;
 	return sp & ~0xf;
 }
+
+
+
+struct kmem_cache *task_struct_cachep;
+struct kmem_cache *task_i387_cachep;
+
+struct task_struct * alloc_task_struct(void)
+{
+	struct task_struct *tsk;
+	tsk = kmem_cache_alloc(task_struct_cachep, GFP_KERNEL);
+	if (!tsk)
+		return NULL;
+	tsk->thread.i387 = kmem_cache_alloc(task_i387_cachep, GFP_KERNEL);
+	if (!tsk->thread.i387)
+		goto error;
+	WARN_ON((unsigned long)tsk->thread.i387 & 15);
+	return tsk;
+
+error:
+	kmem_cache_free(task_struct_cachep, tsk);
+	return NULL;
+}
+
+void memcpy_task_struct(struct task_struct *dst, struct task_struct *src)
+{
+	union i387_union *ptr;
+	ptr = dst->thread.i387;
+	*dst = *src;
+	dst->thread.i387 = ptr;
+	memcpy(dst->thread.i387, src->thread.i387, sizeof(union i387_union));
+}
+
+void free_task_struct(struct task_struct *tsk)
+{
+	kmem_cache_free(task_i387_cachep, tsk->thread.i387);
+	tsk->thread.i387 = NULL;
+	kmem_cache_free(task_struct_cachep, tsk);
+}
+
+
+void task_struct_slab_init(void)
+{
+	/* create a slab on which task_structs can be allocated */
+	task_struct_cachep =
+		kmem_cache_create("task_struct", sizeof(struct task_struct),
+			ARCH_MIN_TASKALIGN, SLAB_PANIC, NULL, NULL);
+	task_i387_cachep =
+		kmem_cache_create("task_i387", sizeof(union i387_union), 32,
+			SLAB_PANIC | SLAB_MUST_HWCACHE_ALIGN, NULL, NULL);
+}
+
+
+/* the very init task needs a statically allocated i387 area */
+union i387_union init_i387_context;
Index: linux/arch/i386/kernel/traps.c
===================================================================
--- linux.orig/arch/i386/kernel/traps.c
+++ linux/arch/i386/kernel/traps.c
@@ -1157,16 +1157,6 @@ void __init trap_init(void)
 	set_trap_gate(19,&simd_coprocessor_error);
 
 	if (cpu_has_fxsr) {
-		/*
-		 * Verify that the FXSAVE/FXRSTOR data will be 16-byte aligned.
-		 * Generates a compile-time "error: zero width for bit-field" if
-		 * the alignment is wrong.
-		 */
-		struct fxsrAlignAssert {
-			int _:!(offsetof(struct task_struct,
-					thread.i387.fxsave) & 15);
-		};
-
 		printk(KERN_INFO "Enabling fast FPU save and restore... ");
 		set_in_cr4(X86_CR4_OSFXSR);
 		printk("done.\n");
Index: linux/include/asm-i386/i387.h
===================================================================
--- linux.orig/include/asm-i386/i387.h
+++ linux/include/asm-i386/i387.h
@@ -34,7 +34,7 @@ extern void init_fpu(struct task_struct 
 		"nop ; frstor %1",		\
 		"fxrstor %1",			\
 		X86_FEATURE_FXSR,		\
-		"m" ((tsk)->thread.i387.fxsave))
+		"m" ((tsk)->thread.i387->fxsave))
 
 extern void kernel_fpu_begin(void);
 #define kernel_fpu_end() do { stts(); preempt_enable(); } while(0)
@@ -60,8 +60,8 @@ static inline void __save_init_fpu( stru
 		"fxsave %[fx]\n"
 		"bt $7,%[fsw] ; jnc 1f ; fnclex\n1:",
 		X86_FEATURE_FXSR,
-		[fx] "m" (tsk->thread.i387.fxsave),
-		[fsw] "m" (tsk->thread.i387.fxsave.swd) : "memory");
+		[fx] "m" (tsk->thread.i387->fxsave),
+		[fsw] "m" (tsk->thread.i387->fxsave.swd) : "memory");
 	/* AMD K7/K8 CPUs don't save/restore FDP/FIP/FOP unless an exception
 	   is pending.  Clear the x87 state here by setting it to fixed
    	   values. safe_address is a random variable that should be in L1 */
Index: linux/include/asm-i386/processor.h
===================================================================
--- linux.orig/include/asm-i386/processor.h
+++ linux/include/asm-i386/processor.h
@@ -407,7 +407,7 @@ struct thread_struct {
 /* fault info */
 	unsigned long	cr2, trap_no, error_code;
 /* floating point info */
-	union i387_union	i387;
+	union i387_union	*i387;
 /* virtual 86 mode info */
 	struct vm86_struct __user * vm86_info;
 	unsigned long		screen_bitmap;
@@ -420,11 +420,15 @@ struct thread_struct {
 	unsigned long	io_bitmap_max;
 };
 
+
+extern union i387_union init_i387_context;
+
 #define INIT_THREAD  {							\
 	.vm86_info = NULL,						\
 	.sysenter_cs = __KERNEL_CS,					\
 	.io_bitmap_ptr = NULL,						\
 	.fs = __KERNEL_PDA,						\
+	.i387 = &init_i387_context,					\
 }
 
 /*
Index: linux/include/asm-i386/thread_info.h
===================================================================
--- linux.orig/include/asm-i386/thread_info.h
+++ linux/include/asm-i386/thread_info.h
@@ -102,6 +102,12 @@ static inline struct thread_info *curren
 
 #define free_thread_info(info)	kfree(info)
 
+#define __HAVE_ARCH_TASK_STRUCT_ALLOCATOR
+extern struct task_struct * alloc_task_struct(void);
+extern void free_task_struct(struct task_struct *tsk);
+extern void memcpy_task_struct(struct task_struct *dst, struct task_struct *src);
+extern void task_struct_slab_init(void);
+
 #else /* !__ASSEMBLY__ */
 
 /* how to get the thread information struct from ASM */
Index: linux/kernel/fork.c
===================================================================
--- linux.orig/kernel/fork.c
+++ linux/kernel/fork.c
@@ -84,6 +84,8 @@ int nr_processes(void)
 #ifndef __HAVE_ARCH_TASK_STRUCT_ALLOCATOR
 # define alloc_task_struct()	kmem_cache_alloc(task_struct_cachep, GFP_KERNEL)
 # define free_task_struct(tsk)	kmem_cache_free(task_struct_cachep, (tsk))
+# define memcpy_task_struct(dst, src)	(*(dst) = *(src))
+
 static struct kmem_cache *task_struct_cachep;
 #endif
 
@@ -138,6 +140,8 @@ void __init fork_init(unsigned long memp
 	task_struct_cachep =
 		kmem_cache_create("task_struct", sizeof(struct task_struct),
 			ARCH_MIN_TASKALIGN, SLAB_PANIC, NULL, NULL);
+#else
+	task_struct_slab_init();
 #endif
 
 	/*
@@ -176,7 +180,8 @@ static struct task_struct *dup_task_stru
 		return NULL;
 	}
 
-	*tsk = *orig;
+	memcpy_task_struct(tsk, orig);
+
 	tsk->thread_info = ti;
 	setup_thread_stack(tsk, orig);
 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [patch 07/12] syslets: x86, add create_async_thread() method
  2007-02-28 21:39 [patch 00/12] Syslets, Threadlets, generic AIO support, v5 Ingo Molnar
                   ` (5 preceding siblings ...)
  2007-02-28 21:41 ` [patch 06/12] x86: split FPU state from task state Ingo Molnar
@ 2007-02-28 21:42 ` Ingo Molnar
  2007-02-28 21:42 ` [patch 08/12] syslets: x86, add move_user_context() method Ingo Molnar
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Ingo Molnar @ 2007-02-28 21:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

From: Ingo Molnar <mingo@elte.hu>

add the create_async_thread() way of creating kernel threads:
these threads first execute a kernel function, and when they
return from it they continue execution in user-space.

An architecture must implement this interface before it can turn
CONFIG_ASYNC_SUPPORT on.
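
A minimal usage sketch, not part of this series (do_kernel_work()
is a made-up placeholder):

	static long my_async_fn(void *arg)
	{
		/*
		 * Runs in kernel context first; the return value
		 * becomes the register return value seen once the
		 * thread continues in user-space:
		 */
		return do_kernel_work(arg);
	}

	static int start_async(void *arg)
	{
		return create_async_thread(my_async_fn, arg,
					   CLONE_VM | CLONE_FS | CLONE_FILES);
	}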

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 arch/i386/kernel/entry.S     |   25 +++++++++++++++++++++++++
 arch/i386/kernel/process.c   |   31 +++++++++++++++++++++++++++++++
 include/asm-i386/processor.h |   17 +++++++++++++++++
 include/asm-i386/unistd.h    |   10 ++++++++++
 4 files changed, 83 insertions(+)

Index: linux/arch/i386/kernel/entry.S
===================================================================
--- linux.orig/arch/i386/kernel/entry.S
+++ linux/arch/i386/kernel/entry.S
@@ -1034,6 +1034,31 @@ ENTRY(kernel_thread_helper)
 	CFI_ENDPROC
 ENDPROC(kernel_thread_helper)
 
+ENTRY(async_thread_helper)
+	CFI_STARTPROC
+	/*
+	 * Allocate space on the stack for pt-regs.
+	 * sizeof(struct pt_regs) == 64, and we've got 8 bytes on the
+	 * kernel stack already:
+	 */
+	subl $64-8, %esp
+	CFI_ADJUST_CFA_OFFSET 64-8
+	movl %edx,%eax
+	push %edx
+	CFI_ADJUST_CFA_OFFSET 4
+	call *%ebx
+	addl $4, %esp
+	CFI_ADJUST_CFA_OFFSET -4
+
+	movl %eax, PT_EAX(%esp)
+
+	GET_THREAD_INFO(%ebp)
+
+	jmp syscall_exit
+	CFI_ENDPROC
+ENDPROC(async_thread_helper)
+
+
 .section .rodata,"a"
 #include "syscall_table.S"
 
Index: linux/arch/i386/kernel/process.c
===================================================================
--- linux.orig/arch/i386/kernel/process.c
+++ linux/arch/i386/kernel/process.c
@@ -355,6 +355,37 @@ int kernel_thread(int (*fn)(void *), voi
 EXPORT_SYMBOL(kernel_thread);
 
 /*
+ * This gets run with %ebx containing the
+ * function to call, and %edx containing
+ * the "args".
+ */
+extern void async_thread_helper(void);
+
+/*
+ * Create an async thread
+ */
+int create_async_thread(long (*fn)(void *), void * arg, unsigned long flags)
+{
+	struct pt_regs regs;
+
+	memset(&regs, 0, sizeof(regs));
+
+	regs.ebx = (unsigned long) fn;
+	regs.edx = (unsigned long) arg;
+
+	regs.xds = __USER_DS;
+	regs.xes = __USER_DS;
+	regs.xfs = __KERNEL_PDA;
+	regs.orig_eax = -1;
+	regs.eip = (unsigned long) async_thread_helper;
+	regs.xcs = __KERNEL_CS | get_kernel_rpl();
+	regs.eflags = X86_EFLAGS_IF | X86_EFLAGS_SF | X86_EFLAGS_PF | 0x2;
+
+	/* Ok, create the new task.. */
+	return do_fork(flags, 0, &regs, 0, NULL, NULL);
+}
+
+/*
  * Free current thread data structures etc..
  */
 void exit_thread(void)
Index: linux/include/asm-i386/processor.h
===================================================================
--- linux.orig/include/asm-i386/processor.h
+++ linux/include/asm-i386/processor.h
@@ -472,6 +472,11 @@ extern void prepare_to_copy(struct task_
  */
 extern int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags);
 
+/*
+ * create an async thread:
+ */
+extern int create_async_thread(long (*fn)(void *), void * arg, unsigned long flags);
+
 extern unsigned long thread_saved_pc(struct task_struct *tsk);
 void show_trace(struct task_struct *task, struct pt_regs *regs, unsigned long *stack);
 
@@ -504,6 +509,18 @@ unsigned long get_wchan(struct task_stru
 #define KSTK_EIP(task) (task_pt_regs(task)->eip)
 #define KSTK_ESP(task) (task_pt_regs(task)->esp)
 
+/*
+ * Register access methods for async syscall support.
+ *
+ * Note, task_stack_reg() must not be an lvalue, hence this macro:
+ */
+#define task_stack_reg(t)			\
+		({ unsigned long __esp = task_pt_regs(t)->esp; __esp; })
+#define set_task_stack_reg(t, new_stack)	\
+		do { task_pt_regs(t)->esp = (new_stack); } while (0)
+#define task_ip_reg(t)				task_pt_regs(t)->eip
+#define task_ret_reg(t)				task_pt_regs(t)->eax
+
 
 struct microcode_header {
 	unsigned int hdrver;
Index: linux/include/asm-i386/unistd.h
===================================================================
--- linux.orig/include/asm-i386/unistd.h
+++ linux/include/asm-i386/unistd.h
@@ -1,6 +1,8 @@
 #ifndef _ASM_I386_UNISTD_H_
 #define _ASM_I386_UNISTD_H_
 
+#include <linux/linkage.h>
+
 /*
  * This file contains the system call numbers.
  */
@@ -330,6 +332,14 @@
 
 #define NR_syscalls 320
 
+#ifndef __ASSEMBLY__
+
+typedef asmlinkage long (*syscall_fn_t)(long, long, long, long, long, long);
+
+extern syscall_fn_t sys_call_table[NR_syscalls];
+
+#endif
+
 #define __ARCH_WANT_IPC_PARSE_VERSION
 #define __ARCH_WANT_OLD_READDIR
 #define __ARCH_WANT_OLD_STAT

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [patch 08/12] syslets: x86, add move_user_context() method
  2007-02-28 21:39 [patch 00/12] Syslets, Threadlets, generic AIO support, v5 Ingo Molnar
                   ` (6 preceding siblings ...)
  2007-02-28 21:42 ` [patch 07/12] syslets: x86, add create_async_thread() method Ingo Molnar
@ 2007-02-28 21:42 ` Ingo Molnar
  2007-02-28 21:42 ` [patch 09/12] syslets: x86, mark async unsafe syscalls Ingo Molnar
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Ingo Molnar @ 2007-02-28 21:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

From: Ingo Molnar <mingo@elte.hu>

add the move_user_context() method to move the user-space
context of one kernel thread to another kernel thread.
User-space might notice the changed TID, but execution,
stack and register contents (general purpose and FPU) are
still the same.

An architecture must implement this interface before it can turn
CONFIG_ASYNC_SUPPORT on.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 arch/i386/kernel/process.c |   21 +++++++++++++++++++++
 include/asm-i386/system.h  |    7 +++++++
 2 files changed, 28 insertions(+)

Index: linux/arch/i386/kernel/process.c
===================================================================
--- linux.orig/arch/i386/kernel/process.c
+++ linux/arch/i386/kernel/process.c
@@ -839,6 +839,27 @@ unsigned long get_wchan(struct task_stru
 }
 
 /*
+ * Move user-space context from one kernel thread to another.
+ * This includes registers and FPU state. Callers must make
+ * sure that neither task is running user context at the moment:
+ */
+void
+move_user_context(struct task_struct *new_task, struct task_struct *old_task)
+{
+	struct pt_regs *old_regs = task_pt_regs(old_task);
+	struct pt_regs *new_regs = task_pt_regs(new_task);
+	union i387_union *tmp;
+
+	*new_regs = *old_regs;
+	/*
+	 * Flip around the FPU state too:
+	 */
+	tmp = new_task->thread.i387;
+	new_task->thread.i387 = old_task->thread.i387;
+	old_task->thread.i387 = tmp;
+}
+
+/*
  * sys_alloc_thread_area: get a yet unused TLS descriptor index.
  */
 static int get_free_idx(void)
Index: linux/include/asm-i386/system.h
===================================================================
--- linux.orig/include/asm-i386/system.h
+++ linux/include/asm-i386/system.h
@@ -33,6 +33,13 @@ extern struct task_struct * FASTCALL(__s
 		      "2" (prev), "d" (next));				\
 } while (0)
 
+/*
+ * Move user-space context from one kernel thread to another.
+ * This includes registers and FPU state for now:
+ */
+extern void
+move_user_context(struct task_struct *new_task, struct task_struct *old_task);
+
 #define _set_base(addr,base) do { unsigned long __pr; \
 __asm__ __volatile__ ("movw %%dx,%1\n\t" \
 	"rorl $16,%%edx\n\t" \

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [patch 09/12] syslets: x86, mark async unsafe syscalls
  2007-02-28 21:39 [patch 00/12] Syslets, Threadlets, generic AIO support, v5 Ingo Molnar
                   ` (7 preceding siblings ...)
  2007-02-28 21:42 ` [patch 08/12] syslets: x86, add move_user_context() method Ingo Molnar
@ 2007-02-28 21:42 ` Ingo Molnar
  2007-02-28 21:42 ` [patch 10/12] syslets: x86: enable ASYNC_SUPPORT Ingo Molnar
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Ingo Molnar @ 2007-02-28 21:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

From: Ingo Molnar <mingo@elte.hu>

mark clone(), fork(), ioperm(), iopl(), modify_ldt() and the
vm86 calls as not available for async execution. They all need
an intact user context beneath them to work.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 arch/i386/kernel/ioport.c  |    6 ++++++
 arch/i386/kernel/ldt.c     |    3 +++
 arch/i386/kernel/process.c |    6 ++++++
 arch/i386/kernel/vm86.c    |    6 ++++++
 4 files changed, 21 insertions(+)

Index: linux/arch/i386/kernel/ioport.c
===================================================================
--- linux.orig/arch/i386/kernel/ioport.c
+++ linux/arch/i386/kernel/ioport.c
@@ -62,6 +62,9 @@ asmlinkage long sys_ioperm(unsigned long
 	struct tss_struct * tss;
 	unsigned long *bitmap;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	if ((from + num <= from) || (from + num > IO_BITMAP_BITS))
 		return -EINVAL;
 	if (turn_on && !capable(CAP_SYS_RAWIO))
@@ -139,6 +142,9 @@ asmlinkage long sys_iopl(unsigned long u
 	unsigned int old = (regs->eflags >> 12) & 3;
 	struct thread_struct *t = &current->thread;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	if (level > 3)
 		return -EINVAL;
 	/* Trying to gain more privileges? */
Index: linux/arch/i386/kernel/ldt.c
===================================================================
--- linux.orig/arch/i386/kernel/ldt.c
+++ linux/arch/i386/kernel/ldt.c
@@ -233,6 +233,9 @@ asmlinkage int sys_modify_ldt(int func, 
 {
 	int ret = -ENOSYS;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	switch (func) {
 	case 0:
 		ret = read_ldt(ptr, bytecount);
Index: linux/arch/i386/kernel/process.c
===================================================================
--- linux.orig/arch/i386/kernel/process.c
+++ linux/arch/i386/kernel/process.c
@@ -750,6 +750,9 @@ struct task_struct fastcall * __switch_t
 
 asmlinkage int sys_fork(struct pt_regs regs)
 {
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	return do_fork(SIGCHLD, regs.esp, &regs, 0, NULL, NULL);
 }
 
@@ -759,6 +762,9 @@ asmlinkage int sys_clone(struct pt_regs 
 	unsigned long newsp;
 	int __user *parent_tidptr, *child_tidptr;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	clone_flags = regs.ebx;
 	newsp = regs.ecx;
 	parent_tidptr = (int __user *)regs.edx;
Index: linux/arch/i386/kernel/vm86.c
===================================================================
--- linux.orig/arch/i386/kernel/vm86.c
+++ linux/arch/i386/kernel/vm86.c
@@ -209,6 +209,9 @@ asmlinkage int sys_vm86old(struct pt_reg
 	struct task_struct *tsk;
 	int tmp, ret = -EPERM;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	tsk = current;
 	if (tsk->thread.saved_esp0)
 		goto out;
@@ -239,6 +242,9 @@ asmlinkage int sys_vm86(struct pt_regs r
 	int tmp, ret;
 	struct vm86plus_struct __user *v86;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	tsk = current;
 	switch (regs.ebx) {
 		case VM86_REQUEST_IRQ:

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [patch 10/12] syslets: x86: enable ASYNC_SUPPORT
  2007-02-28 21:39 [patch 00/12] Syslets, Threadlets, generic AIO support, v5 Ingo Molnar
                   ` (8 preceding siblings ...)
  2007-02-28 21:42 ` [patch 09/12] syslets: x86, mark async unsafe syscalls Ingo Molnar
@ 2007-02-28 21:42 ` Ingo Molnar
  2007-02-28 21:42 ` [patch 11/12] syslets: x86, wire up the syslet system calls Ingo Molnar
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Ingo Molnar @ 2007-02-28 21:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

From: Ingo Molnar <mingo@elte.hu>

enable CONFIG_ASYNC_SUPPORT on x86.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 arch/i386/Kconfig |    4 ++++
 1 file changed, 4 insertions(+)

Index: linux/arch/i386/Kconfig
===================================================================
--- linux.orig/arch/i386/Kconfig
+++ linux/arch/i386/Kconfig
@@ -55,6 +55,10 @@ config ZONE_DMA
 	bool
 	default y
 
+config ASYNC_SUPPORT
+	bool
+	default y
+
 config SBUS
 	bool
 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [patch 11/12] syslets: x86, wire up the syslet system calls
  2007-02-28 21:39 [patch 00/12] Syslets, Threadlets, generic AIO support, v5 Ingo Molnar
                   ` (9 preceding siblings ...)
  2007-02-28 21:42 ` [patch 10/12] syslets: x86: enable ASYNC_SUPPORT Ingo Molnar
@ 2007-02-28 21:42 ` Ingo Molnar
  2007-02-28 21:42 ` [patch 12/12] syslets: x86_64: add syslet/threadlet support Ingo Molnar
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Ingo Molnar @ 2007-02-28 21:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

From: Ingo Molnar <mingo@elte.hu>

wire up the new syslet / async system calls and thus make them
available to user-space.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 arch/i386/kernel/syscall_table.S |    6 ++++++
 include/asm-i386/unistd.h        |    8 +++++++-
 2 files changed, 13 insertions(+), 1 deletion(-)

Index: linux/arch/i386/kernel/syscall_table.S
===================================================================
--- linux.orig/arch/i386/kernel/syscall_table.S
+++ linux/arch/i386/kernel/syscall_table.S
@@ -319,3 +319,9 @@ ENTRY(sys_call_table)
 	.long sys_move_pages
 	.long sys_getcpu
 	.long sys_epoll_pwait
+	.long sys_async_exec		/* 320 */
+	.long sys_async_wait
+	.long sys_umem_add
+	.long sys_async_thread
+	.long sys_threadlet_on
+	.long sys_threadlet_off		/* 325 */
Index: linux/include/asm-i386/unistd.h
===================================================================
--- linux.orig/include/asm-i386/unistd.h
+++ linux/include/asm-i386/unistd.h
@@ -327,10 +327,16 @@
 #define __NR_move_pages		317
 #define __NR_getcpu		318
 #define __NR_epoll_pwait	319
+#define __NR_async_exec		320
+#define __NR_async_wait		321
+#define __NR_umem_add		322
+#define __NR_async_thread	323
+#define __NR_threadlet_on	324
+#define __NR_threadlet_off	325
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 320
+#define NR_syscalls 326
 
 #ifndef __ASSEMBLY__
 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [patch 12/12] syslets: x86_64: add syslet/threadlet support
  2007-02-28 21:39 [patch 00/12] Syslets, Threadlets, generic AIO support, v5 Ingo Molnar
                   ` (10 preceding siblings ...)
  2007-02-28 21:42 ` [patch 11/12] syslets: x86, wire up the syslet system calls Ingo Molnar
@ 2007-02-28 21:42 ` Ingo Molnar
  2007-03-01  9:36 ` [patch 00/12] Syslets, Threadlets, generic AIO support, v5 Stephen Rothwell
  2007-03-07 20:10 ` Anton Blanchard
  13 siblings, 0 replies; 18+ messages in thread
From: Ingo Molnar @ 2007-02-28 21:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

From: Ingo Molnar <mingo@elte.hu>

add the arch specific bits of syslet/threadlet support to x86_64.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86_64/Kconfig            |    4 ++
 arch/x86_64/ia32/ia32entry.S   |   20 ++++++++++-
 arch/x86_64/kernel/entry.S     |   72 ++++++++++++++++++++++++++++++++++++++++-
 arch/x86_64/kernel/process.c   |   11 ++++++
 include/asm-x86_64/processor.h |   16 +++++++++
 include/asm-x86_64/system.h    |   12 ++++++
 include/asm-x86_64/unistd.h    |   29 +++++++++++++++-
 7 files changed, 160 insertions(+), 4 deletions(-)

Index: linux/arch/x86_64/Kconfig
===================================================================
--- linux.orig/arch/x86_64/Kconfig
+++ linux/arch/x86_64/Kconfig
@@ -36,6 +36,10 @@ config ZONE_DMA32
 	bool
 	default y
 
+config ASYNC_SUPPORT
+	bool
+	default y
+
 config LOCKDEP_SUPPORT
 	bool
 	default y
Index: linux/arch/x86_64/ia32/ia32entry.S
===================================================================
--- linux.orig/arch/x86_64/ia32/ia32entry.S
+++ linux/arch/x86_64/ia32/ia32entry.S
@@ -368,6 +368,14 @@ quiet_ni_syscall:
 	PTREGSCALL stub32_vfork, sys_vfork, %rdi
 	PTREGSCALL stub32_iopl, sys_iopl, %rsi
 	PTREGSCALL stub32_rt_sigsuspend, sys_rt_sigsuspend, %rdx
+	/*
+	 * sys_async_thread() and sys_async_exec() both take 2 parameters,
+	 * none of which is ptregs - but the syscalls rely on being able to
+	 * modify ptregs. So we put ptregs into the 3rd parameter - so it's
+	 * unused and it also does not mess up the first 2 parameters:
+	 */
+	PTREGSCALL stub32_compat_async_exec, compat_sys_async_exec, %rdx
+	PTREGSCALL stub32_compat_async_thread, sys_async_thread, %rdx
 
 ENTRY(ia32_ptregs_common)
 	popq %r11
@@ -394,6 +402,9 @@ END(ia32_ptregs_common)
 
 	.section .rodata,"a"
 	.align 8
+.globl compat_sys_call_table
+compat_sys_call_table:
+.globl ia32_sys_call_table
 ia32_sys_call_table:
 	.quad sys_restart_syscall
 	.quad sys_exit
@@ -714,9 +725,16 @@ ia32_sys_call_table:
 	.quad compat_sys_get_robust_list
 	.quad sys_splice
 	.quad sys_sync_file_range
-	.quad sys_tee
+	.quad sys_tee			/* 315 */
 	.quad compat_sys_vmsplice
 	.quad compat_sys_move_pages
 	.quad sys_getcpu
 	.quad sys_epoll_pwait
+	.quad stub32_compat_async_exec	/* 320 */
+	.quad sys_async_wait
+	.quad sys_umem_add
+	.quad stub32_compat_async_thread
+	.quad sys_threadlet_on
+	.quad sys_threadlet_off		/* 325 */
+.globl ia32_syscall_end
 ia32_syscall_end:		
Index: linux/arch/x86_64/kernel/entry.S
===================================================================
--- linux.orig/arch/x86_64/kernel/entry.S
+++ linux/arch/x86_64/kernel/entry.S
@@ -410,6 +410,14 @@ END(\label)
 	PTREGSCALL stub_rt_sigsuspend, sys_rt_sigsuspend, %rdx
 	PTREGSCALL stub_sigaltstack, sys_sigaltstack, %rdx
 	PTREGSCALL stub_iopl, sys_iopl, %rsi
+	/*
+	 * sys_async_thread() and sys_async_exec() both take 2 parameters,
+	 * none of which is ptregs - but the syscalls rely on being able to
+	 * modify ptregs. So we put ptregs into the 3rd parameter - so it's
+	 * unused and it also does not mess up the first 2 parameters:
+	 */
+	PTREGSCALL stub_async_thread, sys_async_thread, %rdx
+	PTREGSCALL stub_async_exec, sys_async_exec, %rdx
 
 ENTRY(ptregscall_common)
 	popq %r11
@@ -430,7 +438,7 @@ ENTRY(ptregscall_common)
 	ret
 	CFI_ENDPROC
 END(ptregscall_common)
-	
+
 ENTRY(stub_execve)
 	CFI_STARTPROC
 	popq %r11
@@ -990,6 +998,68 @@ child_rip:
 ENDPROC(child_rip)
 
 /*
+ * Create an async kernel thread.
+ *
+ * C extern interface:
+ *	extern long create_async_thread(long (*fn)(void *), void * arg, unsigned long flags)
+ *
+ * asm input arguments:
+ *	rdi: fn, rsi: arg, rdx: flags
+ */
+ENTRY(create_async_thread)
+	CFI_STARTPROC
+	FAKE_STACK_FRAME $async_child_rip
+	SAVE_ALL
+
+	# rdi: flags, rsi: usp, rdx: will be &pt_regs
+	movq %rdx,%rdi
+	movq $-1, %rsi
+	movq %rsp, %rdx
+
+	xorl %r8d,%r8d
+	xorl %r9d,%r9d
+
+	# clone now
+	call do_fork
+	movq %rax,RAX(%rsp)
+	xorl %edi,%edi
+
+	/*
+	 * It isn't worth checking for a reschedule here, so internally
+	 * to the x86_64 port you can rely on kernel_thread() not to
+	 * reschedule the child before returning; this avoids the need
+	 * for hacks, for example to fork off the per-CPU idle tasks.
+	 * [Hopefully no generic code relies on the reschedule -AK]
+	 */
+	RESTORE_ALL
+	UNFAKE_STACK_FRAME
+	ret
+	CFI_ENDPROC
+ENDPROC(create_async_thread)
+
+async_child_rip:
+	CFI_STARTPROC
+
+	movq %rdi, %rax
+	movq %rsi, %rdi
+	call *%rax
+
+	/*
+	 * Fix up the PDA - we might return with sysexit:
+	 */
+	RESTORE_TOP_OF_STACK %r11
+
+	/*
+	 * return to user-space:
+	 */
+	movq %rax, RAX(%rsp)
+	RESTORE_REST
+	jmp int_ret_from_sys_call
+
+	CFI_ENDPROC
+ENDPROC(async_child_rip)
+
+/*
  * execve(). This function needs to use IRET, not SYSRET, to set up all state properly.
  *
  * C extern interface:
Index: linux/arch/x86_64/kernel/process.c
===================================================================
--- linux.orig/arch/x86_64/kernel/process.c
+++ linux/arch/x86_64/kernel/process.c
@@ -418,6 +418,17 @@ void release_thread(struct task_struct *
 	}
 }
 
+/*
+ * Move user-space context from one kernel thread to another.
+ * Callers must make sure that neither task is running user context
+ * at the moment:
+ */
+void
+move_user_context(struct task_struct *new_task, struct task_struct *old_task)
+{
+	*task_pt_regs(new_task) = *task_pt_regs(old_task);
+}
+
 static inline void set_32bit_tls(struct task_struct *t, int tls, u32 addr)
 {
 	struct user_desc ud = { 
Index: linux/include/asm-x86_64/processor.h
===================================================================
--- linux.orig/include/asm-x86_64/processor.h
+++ linux/include/asm-x86_64/processor.h
@@ -322,6 +322,11 @@ extern void prepare_to_copy(struct task_
 extern long kernel_thread(int (*fn)(void *), void * arg, unsigned long flags);
 
 /*
+ * create an async thread:
+ */
+extern long create_async_thread(long (*fn)(void *), void * arg, unsigned long flags);
+
+/*
  * Return saved PC of a blocked thread.
  * What is this good for? it will be always the scheduler or ret_from_fork.
  */
@@ -332,6 +337,17 @@ extern unsigned long get_wchan(struct ta
 #define KSTK_EIP(tsk) (task_pt_regs(tsk)->rip)
 #define KSTK_ESP(tsk) -1 /* sorry. doesn't work for syscall. */
 
+/*
+ * Register access methods for async syscall support.
+ *
+ * Note, task_stack_reg() must not be an lvalue, hence this macro:
+ */
+#define task_stack_reg(t)			\
+		({ unsigned long __rsp = task_pt_regs(t)->rsp; __rsp; })
+#define set_task_stack_reg(t, new_stack)	\
+		do { task_pt_regs(t)->rsp = (new_stack); } while (0)
+#define task_ip_reg(t)				task_pt_regs(t)->rip
+#define task_ret_reg(t)			task_pt_regs(t)->rax
 
 struct microcode_header {
 	unsigned int hdrver;
Index: linux/include/asm-x86_64/system.h
===================================================================
--- linux.orig/include/asm-x86_64/system.h
+++ linux/include/asm-x86_64/system.h
@@ -20,6 +20,8 @@
 #define __EXTRA_CLOBBER  \
 	,"rcx","rbx","rdx","r8","r9","r10","r11","r12","r13","r14","r15"
 
+struct task_struct;
+
 /* Save restore flags to clear handle leaking NT */
 #define switch_to(prev,next,last) \
 	asm volatile(SAVE_CONTEXT						    \
@@ -42,7 +44,15 @@
 		       [thread_info] "i" (offsetof(struct task_struct, thread_info)), \
 		       [pda_pcurrent] "i" (offsetof(struct x8664_pda, pcurrent))   \
 		     : "memory", "cc" __EXTRA_CLOBBER)
-    
+
+
+/*
+ * Move user-space context from one kernel thread to another.
+ * This includes registers and FPU state for now:
+ */
+extern void
+move_user_context(struct task_struct *new_task, struct task_struct *old_task);
+
 extern void load_gs_index(unsigned); 
 
 /*
Index: linux/include/asm-x86_64/unistd.h
===================================================================
--- linux.orig/include/asm-x86_64/unistd.h
+++ linux/include/asm-x86_64/unistd.h
@@ -619,8 +619,21 @@ __SYSCALL(__NR_sync_file_range, sys_sync
 __SYSCALL(__NR_vmsplice, sys_vmsplice)
 #define __NR_move_pages		279
 __SYSCALL(__NR_move_pages, sys_move_pages)
+#define __NR_async_exec		280
+__SYSCALL(__NR_async_exec, stub_async_exec)
+#define __NR_async_wait		281
+__SYSCALL(__NR_async_wait, sys_async_wait)
+#define __NR_umem_add		282
+__SYSCALL(__NR_umem_add, sys_umem_add)
+#define __NR_async_thread	283
+__SYSCALL(__NR_async_thread, stub_async_thread)
+#define __NR_threadlet_on	284
+__SYSCALL(__NR_threadlet_on, sys_threadlet_on)
+#define __NR_threadlet_off	285
+__SYSCALL(__NR_threadlet_off, sys_threadlet_off)
 
-#define __NR_syscall_max __NR_move_pages
+#define __NR_syscall_max	__NR_threadlet_off
+#define NR_syscalls		(__NR_syscall_max + 1)
 
 #ifndef __NO_STUBS
 #define __ARCH_WANT_OLD_READDIR
@@ -654,6 +667,20 @@ __SYSCALL(__NR_move_pages, sys_move_page
 #include <linux/types.h>
 #include <asm/ptrace.h>
 
+typedef asmlinkage long (*syscall_fn_t)(long, long, long, long, long, long);
+
+extern syscall_fn_t sys_call_table[NR_syscalls];
+
+#ifdef CONFIG_COMPAT
+
+extern syscall_fn_t compat_sys_call_table[];
+extern syscall_fn_t ia32_syscall_end;
+extern syscall_fn_t ia32_sys_call_table;
+
+#define compat_NR_syscalls (&ia32_syscall_end - compat_sys_call_table)
+
+#endif
+
 asmlinkage long sys_iopl(unsigned int level, struct pt_regs *regs);
 asmlinkage long sys_ioperm(unsigned long from, unsigned long num, int turn_on);
 struct sigaction;

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 02/12] syslets: add syslet.h include file, user API/ABI definitions
  2007-02-28 21:41 ` [patch 02/12] syslets: add syslet.h include file, user API/ABI definitions Ingo Molnar
@ 2007-03-01  3:05   ` Kevin O'Connor
  2007-03-01  9:18     ` Ingo Molnar
  0 siblings, 1 reply; 18+ messages in thread
From: Kevin O'Connor @ 2007-03-01  3:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown

On Wed, Feb 28, 2007 at 10:41:17PM +0100, Ingo Molnar wrote:
> From: Ingo Molnar <mingo@elte.hu>
> 
> add include/linux/syslet.h which contains the user-space API/ABI
> declarations. Add the new header to include/linux/Kbuild as well.

Hi Ingo,

I'd like to propose a simpler userspace API for syslets.  I believe
this revised API is just as capable as yours (anything done purely in
kernel space with the existing API can also be done with this one).

An "atom" would look like:

struct syslet_uatom {
	u32		nr;
	u64		ret_ptr;
	u64		next;
	u64		arg_nr;
	u64		args[6];
};

The nr, ret_ptr, and next fields would be unchanged.  The args
array would directly store the arguments to the system call.  To
optimize the case where only a few arguments are necessary, an
explicit argument count would be set in the arg_nr field.

The above is very similar to what Linus and Davide described as a
"single submission" syslet interface - it differs only with the
addition of the next parameter.  As with your API, a null next field
would immediately stop the syslet.  So, a user wishing to run a single
system call asynchronously could use the above interface with the next
field set to null.

Of course, the above lacks the syscall return testing capabilities in
your atoms.  To obtain that capability, one could add a new syscall:

long sys_syslet_helper(long flags, long *ptr, long inc, u64 new_next)

The above is effectively a combination of sys_umem_add and the "flags"
field from the existing syslet_uatom.  The system call would only be
available from syslets.  It would add "inc" to the integer stored in
"ptr" and return the result.  The "flags" field could optionally
contain one of:
 SYSLET_BRANCH_ON_NONZERO
 SYSLET_BRANCH_ON_ZERO
 SYSLET_BRANCH_ON_NEGATIVE
 SYSLET_BRANCH_ON_NON_POSITIVE
If the flag were set and the return result of the syscall met the
specified condition, then the code would arrange for the calling
syslet to branch to "new_next" instead of the normal "next".
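
A user-space model of the proposed semantics (the body is only
illustrative; the flag names are the ones listed above):

	long syslet_helper_model(long flags, long *ptr, long inc,
				 u64 *next, u64 new_next)
	{
		long val = (*ptr += inc);
		int branch = 0;

		switch (flags) {
		case SYSLET_BRANCH_ON_NONZERO:		branch = (val != 0); break;
		case SYSLET_BRANCH_ON_ZERO:		branch = (val == 0); break;
		case SYSLET_BRANCH_ON_NEGATIVE:		branch = (val < 0); break;
		case SYSLET_BRANCH_ON_NON_POSITIVE:	branch = (val <= 0); break;
		}
		if (branch)
			*next = new_next;	/* branch instead of normal next */
		return val;
	}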

I would also change the event ring notification system.  Instead of
building that support into all syslets, one could introduce an "add to
head" syscall specifically for that purpose.  If done this way,
userspace could arrange for this new sys_addtoring call to always be
the last uatom executed in a syslet.  This would make the support
optional - those userspace applications that prefer to use a futex or
signal as an event system could arrange to have those system calls as
the last one in the chain instead.  With this change, the
sys_async_exec would simplify to:

long sys_async_exec(struct syslet_uatom *uatom);

As above, I believe this API has as much power as the existing system.
The general idea is to make the system call / atoms simpler and use
more atoms when building complex chains.

For example, the open & stat case could be done with a chain like the
following:

atom1: &atom3->args[1] = sys_open(...)
atom2: sys_syslet_helper(SYSLET_BRANCH_ON_NON_POSITIVE,
                         &atom3->args[1], 0, atom4)
atom3: sys_stat([arg1 filled above], ...)
atom4: sys_futex(...)   // alert parent of completion

It is also possible to use sys_syslet_helper to push a return value to
multiple syslet parameters (for example, propagating an fd from open
to multiple reads).  For example:

atom1: &atom3->args[1] = sys_open(...)
atom2: &atom4->args[1] = sys_syslet_helper(0, &atom3->args[1], 0, 0)
atom3: sys_read([arg1 filled in atom1], ...)
atom4: sys_read([arg1 filled in atom2], ...)
...

Although this is a bit ugly, I must wonder how many times one would
build chains complex enough to require it.

Cheers,
-Kevin

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 02/12] syslets: add syslet.h include file, user API/ABI definitions
  2007-03-01  3:05   ` Kevin O'Connor
@ 2007-03-01  9:18     ` Ingo Molnar
  0 siblings, 0 replies; 18+ messages in thread
From: Ingo Molnar @ 2007-03-01  9:18 UTC (permalink / raw)
  To: Kevin O'Connor
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown


* Kevin O'Connor <kevin@koconnor.net> wrote:

> I'd like to propose a simpler userspace API for syslets.  I believe 
> this revised API is just as capable as yours (anything done purely in 
> kernel space with the existing API can also be done with this one).
> 
> An "atom" would look like:
> 
> struct syslet_uatom {
> 	u32		nr;
> 	u64		ret_ptr;
> 	u64		next;
> 	u64		arg_nr;
> 	u64		args[6];
> };
> 
> The sys_nr, ret_ptr, and next fields would be unchanged.  The args 
> array would directly store the arguments to the system call.  To 
> optimize the case where only a few arguments are necessary, an 
> explicit argument count would be set in the arg_nr field.

i don't see the advantage of arg_nr - if the arguments are direct then
the best way is to just fetch them all, not to test against arg_nr.
Furthermore, regarding the indirect pointers, they are quite essential 
for some uses, see:

  http://lkml.org/lkml/2007/2/28/292

> Of course, the above lacks the syscall return testing capabilities in 
> your atoms.  To obtain that capability, one could add a new syscall:
> 
> long sys_syslet_helper(long flags, long *ptr, long inc, u64 new_next)

yes. But a 'flags' field is handy anyway, to signal things like 
NOCOMPLETE or SYSLET_SYNC/SYSLET_ASYNC (a flags field is always useful 
in such structures). So the condition testing comes 'for free' in 
essence.

but ... as you can see with sys_umem_add(), i like the addition of
helper syscalls - i just think that this particular one wouldn't be too
helpful.

> I would also change the event ring notification system.  Instead of 
> building that support into all syslets, one could introduce an "add to 
> head" syscall specifically for that purpose.  If done this way, 
> userspace could arrange for this new sys_addtoring call to always be 
> the last uatom executed in a syslet.  This would make the support 
> optional - those userspace applications that prefer to use a futex or 
> signal as an event system could arrange to have those system calls as 
> the last one in the chain instead. [...]

the problem is signals: a syslet has to return to user-space upon 
signals or upon a stop condition. So to notify about the precise place 
of stoppage, the notification ring is a 'built in' property.

(nevertheless, as i mentioned in a prior mail, i'll create separate
ring syscalls, because they are useful for other stuff too.)

> For example, the open & stat case could be done with a chain like the 
> following:
> 
> atom1: &atom3->args[1] = sys_open(...)
> atom2: sys_syslet_helper(SYSLET_BRANCH_ON_NON_POSITIVE,
>                          &atom3->args[1], 0, atom4)

i don't see a huge conceptual difference between having the syslet helper
in flags versus having it in a separate syscall - other than that yours
needs twice the number of atoms.

> It is also possible to use sys_syslet_helper to push a return value to 
> multiple syslet parameters (for example, propagating an fd from open 
> to multiple reads).  For example:
> 
> atom1: &atom3->args[1] = sys_open(...)
> atom2: &atom4->args[1] = sys_syslet_helper(0, &atom3->args[1], 0, 0)
> atom3: sys_read([arg1 filled in atom1], ...)
> atom4: sys_read([arg1 filled in atom2], ...)

try to do this in FIO. You'd have to create many extra atoms to push the
fd into the argument fields - instead of just sharing the variable.
Sharing is /good/. These 'simplifications' complicate the whole syslet
programming model to the point of being nearly unusable.

> Although this is a bit ugly, I must wonder how many times one would 
> build chains complex enough to require it.

take a look at FIO.

	Ingo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 00/12] Syslets, Threadlets, generic AIO support, v5
  2007-02-28 21:39 [patch 00/12] Syslets, Threadlets, generic AIO support, v5 Ingo Molnar
                   ` (11 preceding siblings ...)
  2007-02-28 21:42 ` [patch 12/12] syslets: x86_64: add syslet/threadlet support Ingo Molnar
@ 2007-03-01  9:36 ` Stephen Rothwell
  2007-03-07 20:10 ` Anton Blanchard
  13 siblings, 0 replies; 18+ messages in thread
From: Stephen Rothwell @ 2007-03-01  9:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Davide Libenzi, Jens Axboe,
	Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 313 bytes --]

On Wed, 28 Feb 2007 22:39:38 +0100 Ingo Molnar <mingo@elte.hu> wrote:
>
>  - 32-bit user-space on 64-bit kernel compat support. 32-bit syslet and
>    threadlet binaries work fine on 64-bit kernels.

Yay!  :-)

--
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 00/12] Syslets, Threadlets, generic AIO support, v5
  2007-02-28 21:39 [patch 00/12] Syslets, Threadlets, generic AIO support, v5 Ingo Molnar
                   ` (12 preceding siblings ...)
  2007-03-01  9:36 ` [patch 00/12] Syslets, Threadlets, generic AIO support, v5 Stephen Rothwell
@ 2007-03-07 20:10 ` Anton Blanchard
  2007-03-13  7:05   ` Milton Miller
  13 siblings, 1 reply; 18+ messages in thread
From: Anton Blanchard @ 2007-03-07 20:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Davide Libenzi, Jens Axboe,
	Thomas Gleixner


Hi Ingo,

> this is the v5 release of the syslet/threadlet subsystem:
> 
>    http://redhat.com/~mingo/syslet-patches/

Nice!

I tried to port this to ppc64 but found a few problems:

The 64-bit powerpc ABI has the concept of a TOC (r2) which is used for
per-function data. This means this won't work:

        if (ah->restore_stack) {
                set_task_stack_reg(new_task, ah->restore_stack);
                task_ip_reg(new_task) = ah->restore_ip;
                /*
                 * The return code 0 is needed to tell the
                 * head user-context that the threadlet went async:
                 */
                task_ret_reg(new_task) = 0;
        }

I think we would want to change restore_ip to restore_function, and then
create a per-arch helper, perhaps:

void set_user_context(struct task_struct *task, unsigned long stack,
		      unsigned long function, unsigned long retval);

ppc64 could then grab the ip and r2 values from the function descriptor.
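
A rough sketch of what that could look like on ppc64 (illustrative
only - the descriptor lives in user memory, so a real implementation
would have to fetch it with get_user()):

	/* ELF ABI function descriptor: */
	struct func_descr {
		unsigned long entry;	/* first instruction */
		unsigned long toc;	/* r2 value the function expects */
		unsigned long env;
	};

	void set_user_context(struct task_struct *task, unsigned long stack,
			      unsigned long function, unsigned long retval)
	{
		struct func_descr *fd = (struct func_descr *)function;
		struct pt_regs *regs = task_pt_regs(task);

		regs->gpr[1] = stack;		/* r1: stack pointer */
		regs->gpr[2] = fd->toc;		/* r2: TOC pointer */
		regs->gpr[3] = retval;		/* r3: return value */
		regs->nip    = fd->entry;
	}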

The other issue involves the syscall table:

asmlinkage struct syslet_uatom __user *
sys_async_exec(struct syslet_uatom __user *uatom,
               struct async_head_user __user *ahu)
{
        return __sys_async_exec(uatom, ahu, sys_call_table, NR_syscalls);
}

This exposes the layout of the syscall table. Unfortunately it won't work
on ppc64. In arch/powerpc/kernel/systbl.S:

#define COMPAT_SYS(func)        .llong  .sys_##func,.compat_sys_##func

Both syscall tables are overlaid.

Anton

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 00/12] Syslets, Threadlets, generic AIO support, v5
  2007-03-07 20:10 ` Anton Blanchard
@ 2007-03-13  7:05   ` Milton Miller
  0 siblings, 0 replies; 18+ messages in thread
From: Milton Miller @ 2007-03-13  7:05 UTC (permalink / raw)
  To: LKML, Ingo Molnar; +Cc: Anton Blanchard, Stephen Rothwell

Anton Blanchard wrote:
> Hi Ingo,
>
>> this is the v5 release of the syslet/threadlet subsystem:
>>
>>    http://redhat.com/~mingo/syslet-patches/
>
> Nice!
>

I too went and downloaded patches-v5 for review.

First off, one problem I noticed in sys_async_wait:

+	ah->events_left = min_wait_events - (kernel_ring_idx - user_ring_idx);

This completely misses the wraparound case of kernel_ring_idx <
user_ring_idx. I wonder if this is causing some of the benchmark 
problems?
(add max_ring_index if kernel < user).
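
A sketch of the wraparound-safe form (with max_ring_idx standing in
for whatever ring-size limit the ring code uses):

	u64 pending = kernel_ring_idx - user_ring_idx;

	if (kernel_ring_idx < user_ring_idx)
		pending += max_ring_idx;
	ah->events_left = min_wait_events - pending;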

> I tried to port this to ppc64 but found a few problems:
>
> The 64bit powerpc ABI has the concept of a TOC (r2) which is used for
> per function data. This means this wont work:
[deleted]
> I think we would want to change restore_ip to restore_function, and 
> then
> create a per arch helper, perhaps:
>
> void set_user_context(struct task_struct *task, unsigned long stack,
> 		      unsigned long function, unsigned long retval);
>
> ppc64 could then grab the ip and r2 values from the function 
> descriptor.
>
> The other issue involves the syscall table:
>
> asmlinkage struct syslet_uatom __user *
> sys_async_exec(struct syslet_uatom __user *uatom,
>                struct async_head_user __user *ahu)
> {
>         return __sys_async_exec(uatom, ahu, sys_call_table, 
> NR_syscalls);
> }
>
> This exposes the layout of the syscall table. Unfortunately it wont 
> work
> on ppc64. In arch/powerpc/kernel/systbl.S:
>
> #define COMPAT_SYS(func)        .llong  .sys_##func,.compat_sys_##func
>
> Both syscall tables are overlaid.
>
> Anton

In addition, the entries in the table are not function pointers, they
are the actual code targets.  So we need an arch helper to invoke the
system call.
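
Something like this (the name is hypothetical; on most architectures
it would just index the table, which is what the generic code
currently open-codes):

	static inline long
	arch_exec_syscall(syscall_fn_t *table, unsigned int nr,
			  long a1, long a2, long a3,
			  long a4, long a5, long a6)
	{
		return table[nr](a1, a2, a3, a4, a5, a6);
	}

ppc64 would instead implement this in assembly, branching to the code
target with the proper TOC set up.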

Here is another problem with your compat code.  Just telling user space
that everything is u64 and having the kernel retrieve pointers and ulongs
doesn't work; you have to actually copy in u64 values and truncate them
down.  Your current code is broken on all 32-bit big-endian kernels.
Actually, the check needs to be that the upper 32 bits are 0, or return
-EINVAL (see the sketch below).
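
A minimal sketch of that check (the helper name is hypothetical):

	static int get_syslet_arg(u64 __user *uptr, unsigned long *arg)
	{
		u64 val;

		if (copy_from_user(&val, uptr, sizeof(val)))
			return -EFAULT;
		if (val >> 32)		/* upper 32 bits must be zero */
			return -EINVAL;
		*arg = (unsigned long)val;
		return 0;
	}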

In addition, the compat syscall entry points assume that the arguments
have been truncated to compat_ulong values by the syscall entry path,
and that they only need to do sign extension (and/or pointer conversion
on s390 with its 31 bit pointers).  So all compat kernels are broken.

The two of these things together make me think we want two copy
functions.  At that point we may as well define the struct uatom in
terms of ulong and compat_ulong for the compat_uatom.  That would lead
to two copies of exec_uatom, but eliminate passing the syscall table
down as an argument.  The need_resched and signal check could become
part of the common next_uatom routine, although it would need to know
uatom+1 instead of doing the addition itself.

Other observations:

All the logic setting at and async_ready is a bit hard to follow.
After some analysis, t->at is only ever set to &t->__at and
async_ready is only set to the same at or NULL.  Both of these
should become flags, and at->task should be converted to
container_of.  Also, the name at is hard to grep / search for.

The stop flags are decoded with a switch/case but are not densely
encoded; rather, they are one-hot.  We either need to error out on
multiple stop bits being set, stop on each possible condition, or
encode them densely.

There is no check for flags being set that are not recognized.
If we ever add a flag for another stop condition, this would
lead to incorrect execution by the kernel.

There are some syscalls that can return -EFAULT but still have
force_syscall_noerror applied afterwards.  We should create a stop on
ERROR and clear the force_noerror flag between syscalls.  The umem_add
syscall should set force_noerror only if the put_user succeeds.
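
E.g. (sketch; set_force_noerror() is a hypothetical helper standing in
for whatever that flag-setting ends up looking like):

asmlinkage long sys_umem_add(unsigned long __user *uptr,
			     unsigned long inc)
{
	unsigned long val;

	if (get_user(val, uptr))
		return -EFAULT;
	val += inc;
	if (put_user(val, uptr))
		return -EFAULT;

	/* only a successful store may suppress the error stop: */
	set_force_noerror(current);
	return val;
}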

In copy_uatom, you call access_ok(VERIFY_READ, ...) on the entire
uatom.  This means that the full-sized struct has to lie within the
process address limit, which violates your assertion that userspace
doesn't need the whole structure.  If we instead add the requirement
that the space that would be occupied by the complete atom has to
exist, then we can copy the whole struct uatom with copy_from_user and
then copy the args with get_user.  User space can still pack them more
densely, and we can still stop copying on a NULL arg pointer.
Actually, calling access_ok and then __get_user can be more expensive
on some architectures: they have to verify both start and length for
access_ok, but only the start for get_user, because they have unmapped
guard areas between user space and kernel space.  This would also mean
that we don't check arg_ptr for NULL without first verifying that the
get_user actually worked.
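
A minimal sketch of that shape (field names follow the u64 ABI
discussed above; the flags/nr validation is elided):

static int copy_uatom(struct syslet_atom *atom,
		      struct syslet_uatom __user *uatom)
{
	struct syslet_uatom tmp;
	unsigned long __user *arg;
	int i;

	/* one copy, one range check, covering the full-sized atom: */
	if (copy_from_user(&tmp, uatom, sizeof(tmp)))
		return -EFAULT;

	atom->nr    = tmp.nr;
	atom->flags = tmp.flags;

	for (i = 0; i < 6; i++) {
		arg = (unsigned long __user *)(unsigned long)tmp.arg_ptr[i];
		/* the NULL terminator is only trusted now that the
		 * copy above has succeeded: */
		if (!arg)
			break;
		if (get_user(atom->args[i], arg))
			return -EFAULT;
	}
	return 0;
}
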
The gotos in exec_uatom are just a while loop with a break:
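
Roughly (sketch; exec_atom/should_stop/next_uatom stand in for the
existing helpers):

	while (uatom) {
		ret = exec_atom(uatom);
		if (should_stop(ret))
			break;
		uatom = next_uatom(uatom, ret);
	}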

sys_umem_add should be in lib/ under lib-y in the Makefile, so that an
architecture can supply its own definition and the generic one simply
isn't pulled in from the archive.  In fact, declaring the function
weak does not make it a weak syscall implementation on some
architectures.

Weak syscall aliases to sys_ni_syscall are needed for when async
support is not selected in Kconfig.
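
That is what cond_syscall() in kernel/sys_ni.c provides, e.g.
(assuming these are the final entry points):

/* kernel/sys_ni.c: weak aliases used when the syscalls are not built */
cond_syscall(sys_async_exec);
cond_syscall(sys_async_wait);
cond_syscall(sys_async_thread);
cond_syscall(sys_umem_add);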

The Documentation patch does not explain threadlets.  In fact, I
had to read the old patch announcement to find out what a threadlet
was and how the concept was different from a syslet.

Do we need 32 bits for the syscall number?  I wonder if we would be
better served by a 16-bit syscall number and 16 bits reserved for
future use.
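
E.g. (sketch):

/* split the current 32-bit number field: */
struct syslet_nr {
	u16	nr;		/* syscall number: 64k is plenty */
	u16	__reserved;	/* must be zero, reserved for the future */
};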

And how does userspace actually find out about an error during
completion in the syslet case?  It's actually executing in an
async_thread.  The only case where a return code is given to userspace
is sys_async_thread, and those are mostly invisible to the thread.

I'm not sure the API for threadlets is ready.  Why does user space
need to scan for the stack being consumed?  Can't it infer that from
the return code of the sync task?

<partial-thought> It appears you expect restore_ip and restore_stack
to implement a minimal longjmp back to userspace.  Instead of passing
these explicitly, why not treat async_on as a setjmp and return to
that point?  As Anton pointed out, a simple stack pointer and next
instruction pointer are not enough for all architectures.
</partial-thought>

Why must the sp reg macro not be an lvalue?  To enforce your
stack-clearing API?

There seem to be some copied comments in the x86-64 code (including
some space-vs-tab inconsistencies).

Is testing t->async_ready what we want to exclude system calls on?  I
thought that would catch callers who might go async on schedule()
instead of those already running async in the background.  I think we
want something more like t->at, or whatever flag that becomes, saying
that the thread is a background thread.  Or maybe a combination.

milton


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2007-03-13  7:05 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-02-28 21:39 [patch 00/12] Syslets, Threadlets, generic AIO support, v5 Ingo Molnar
2007-02-28 21:41 ` [patch 01/12] syslets: add async.h include file, kernel-side API definitions Ingo Molnar
2007-02-28 21:41 ` [patch 02/12] syslets: add syslet.h include file, user API/ABI definitions Ingo Molnar
2007-03-01  3:05   ` Kevin O'Connor
2007-03-01  9:18     ` Ingo Molnar
2007-02-28 21:41 ` [patch 03/12] syslets: generic kernel bits Ingo Molnar
2007-02-28 21:41 ` [patch 04/12] syslets: core code Ingo Molnar
2007-02-28 21:41 ` [patch 05/12] syslets: core, documentation Ingo Molnar
2007-02-28 21:41 ` [patch 06/12] x86: split FPU state from task state Ingo Molnar
2007-02-28 21:42 ` [patch 07/12] syslets: x86, add create_async_thread() method Ingo Molnar
2007-02-28 21:42 ` [patch 08/12] syslets: x86, add move_user_context() method Ingo Molnar
2007-02-28 21:42 ` [patch 09/12] syslets: x86, mark async unsafe syscalls Ingo Molnar
2007-02-28 21:42 ` [patch 10/12] syslets: x86: enable ASYNC_SUPPORT Ingo Molnar
2007-02-28 21:42 ` [patch 11/12] syslets: x86, wire up the syslet system calls Ingo Molnar
2007-02-28 21:42 ` [patch 12/12] syslets: x86_64: add syslet/threadlet support Ingo Molnar
2007-03-01  9:36 ` [patch 00/12] Syslets, Threadlets, generic AIO support, v5 Stephen Rothwell
2007-03-07 20:10 ` Anton Blanchard
2007-03-13  7:05   ` Milton Miller
