* syslets v7: back to basics
@ 2007-12-06 23:20 Zach Brown
  2007-12-06 23:20 ` [PATCH 1/6] indirect: use asmlinkage in i386 syscall table prototype Zach Brown
                   ` (3 more replies)
  0 siblings, 4 replies; 26+ messages in thread
From: Zach Brown @ 2007-12-06 23:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Ulrich Drepper, Arjan van de Ven,
	Andrew Morton, Alan Cox, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Davide Libenzi, Jens Axboe,
	Thomas Gleixner, Dan Williams, Jeff Moyer, Simon Holm Thogersen,
	suresh.b.siddha


The following patches are a substantial refactoring of the syslet code.  I'm
branding them as the v7 release of the syslet infrastructure, though they
represent a significant change in focus.

My current focus is to see the most fundamental functionality brought to
maturity.  To me, this means getting an ABI that applications use through
glibc on x86 and PPC64.  Only once that is ready should we distract ourselves
with additional complexity.

To that end, this patch series differs from v6 in significant ways:

 * syslets are initiated by providing syslet arguments to sys_indirect().

 * uatoms, threadlets, and the kaio changes are postponed until they can be
   justified and rebuilt on more complete infrastructure.  (I'm not saying
   these shouldn't or won't be pursued.  I'm saying that we should get the
   simplest piece working first.)

 * the code is clarified and commented, the patches are bisectable and pass
   checkpatch.

The use of sys_indirect() and the move away from 'atoms' simplified the ABI
considerably.  I've put a trivial example in a syslets-userspace git tree:

    git://git.kernel.org/pub/scm/linux/kernel/git/zab/syslets-userspace.git

This git repository will grow more tests and documentation over time.

The patches sent with this mail are based on the v6 indirect patches, which
aren't included.  The full syslets patch series, including the indirect
patches, is available in a few forms:

broken out patch series:
    http://www.kernel.org/pub/linux/kernel/people/zab/broken-out/syslets/

in a 'syslets' git branch off of current linux-2.6.git:
    git://git.kernel.org/pub/scm/linux/kernel/git/zab/linux-2.6.git syslets

git tree of the guilt .git/patches directory:
    git://git.kernel.org/pub/scm/linux/kernel/git/zab/guilt-series.git

The patches were barely tested on i386 and x86_64.

There are both implementation details and design problems left.  My hope is
that we can address these in the coming weeks.

 - Do we stop the user from initiating more syslets than fit in the ring?
 - Do we worry now about the hashed ring mutexes scaling poorly?  (They will.)
 - What are the semantics of ptrace()ing a syslet submission which blocks?
 - How should applications deal with waiting syslet tasks with stale data
   in their task_struct?  (syslet, setuid, syslet..)
 - Issuing a syslet is an implicit sys_clone(); will apps pass in clone flags?
 - Are the u32 ring index reads and writes atomic on all supported architectures?

Any feedback on these questions would be greatly appreciated.
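To make the hashed-mutex question concrete: kernel/syslet.c (patch 5/6) picks a mutex and a wait queue by hashing the ring's wait_group field into a fixed-size table. Independent rings can therefore end up serialized on one mutex. What follows is a rough userspace model built on a few assumptions. It uses a multiplicative hash in place of the kernel's jhash_1word() and only counts collisions. `bucket_of()` and `max_bucket_load()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Bucket count as in kernel/syslet.c: 1 << 8 (1 << 4 with CONFIG_BASE_SMALL). */
#define SYSLET_HASH_BITS 8
#define SYSLET_HASH_NR   (1u << SYSLET_HASH_BITS)

/*
 * Stand-in for jhash_1word(group, 0) & SYSLET_HASH_MASK.  jhash is
 * kernel-internal, so this sketch uses a multiplicative (Knuth) hash and
 * keeps the top SYSLET_HASH_BITS bits of the 32-bit product.
 */
static unsigned bucket_of(uint32_t wait_group)
{
	return (uint32_t)(wait_group * 2654435761u) >> (32 - SYSLET_HASH_BITS);
}

/*
 * Largest number of rings sharing one bucket (one mutex, one wait queue)
 * when each ring in groups[] uses its own wait_group value.
 */
static unsigned max_bucket_load(const uint32_t *groups, unsigned n)
{
	unsigned load[SYSLET_HASH_NR] = { 0 };
	unsigned i, max = 0;

	for (i = 0; i < n; i++) {
		unsigned b = bucket_of(groups[i]);

		assert(b < SYSLET_HASH_NR);
		if (++load[b] > max)
			max = load[b];
	}
	return max;
}
```

With 256 buckets, some bucket must hold at least four of any 1000 rings by the pigeonhole principle, no matter how good the hash is. Contention on the shared mutexes therefore grows with the number of active rings.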

I'm particularly interested in hearing from people who are trying to use
syslets in their applications.  This will involve awkward wrappers instead of
glibc calls for now, and your machine may explode, but hopefully the chance to
influence the design of syslets will make it worth the effort.

Finally, I carried the enormous cc: list for this mail over from previous
syslet releases.  If you want to be removed from or added to the list for
future syslet releases, please do let me know.

- z


* [PATCH 1/6] indirect: use asmlinkage in i386 syscall table prototype
  2007-12-06 23:20 syslets v7: back to basics Zach Brown
@ 2007-12-06 23:20 ` Zach Brown
  2007-12-06 23:20   ` [PATCH 2/6] syslet: asm-generic support to disable syslets Zach Brown
  2007-12-08 12:40   ` [PATCH 1/6] indirect: use asmlinkage in i386 syscall table prototype Simon Holm Thøgersen
  2007-12-08 12:52 ` [PATCH] Fix casting on architectures with 32-bit pointers/longs Simon Holm Thøgersen
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 26+ messages in thread
From: Zach Brown @ 2007-12-06 23:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Ulrich Drepper, Arjan van de Ven,
	Andrew Morton, Alan Cox, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Davide Libenzi, Jens Axboe,
	Thomas Gleixner, Dan Williams, Jeff Moyer, Simon Holm Thogersen,
	suresh.b.siddha

call_indirect() declared sys_call_table without asmlinkage, so it used the
wrong calling convention for the system call handlers and they would
receive mixed-up arguments.

Signed-off-by: Zach Brown <zach.brown@oracle.com>

diff --git a/include/asm-x86/indirect_32.h b/include/asm-x86/indirect_32.h
index a1b72ac..e3dea8e 100644
--- a/include/asm-x86/indirect_32.h
+++ b/include/asm-x86/indirect_32.h
@@ -15,8 +15,8 @@ struct indirect_registers {
 
 static inline long call_indirect(struct indirect_registers *regs)
 {
-  extern long (*sys_call_table[]) (__u32, __u32, __u32, __u32, __u32, __u32);
-
+	extern asmlinkage long (*sys_call_table[])(long, long, long,
+						   long, long, long);
   return sys_call_table[INDIRECT_SYSCALL(regs)](regs->ebx, regs->ecx,
 						regs->edx, regs->esi,
 						regs->edi, regs->ebp);
-- 
1.5.2.2



* [PATCH 2/6] syslet: asm-generic support to disable syslets
  2007-12-06 23:20 ` [PATCH 1/6] indirect: use asmlinkage in i386 syscall table prototype Zach Brown
@ 2007-12-06 23:20   ` Zach Brown
  2007-12-06 23:20     ` [PATCH 3/6] syslet: introduce abi structs Zach Brown
  2007-12-08 12:40   ` [PATCH 1/6] indirect: use asmlinkage in i386 syscall table prototype Simon Holm Thøgersen
  1 sibling, 1 reply; 26+ messages in thread
From: Zach Brown @ 2007-12-06 23:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Ulrich Drepper, Arjan van de Ven,
	Andrew Morton, Alan Cox, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Davide Libenzi, Jens Axboe,
	Thomas Gleixner, Dan Williams, Jeff Moyer, Simon Holm Thogersen,
	suresh.b.siddha

This provides an implementation of the architecture-dependent portion of
syslets which disables syslet operations.

This patch is an incomplete demonstration.  Every asm-*/syslet*.h file would
include these generic headers until its architecture provides an
implementation which enables syslet support.
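A single stub returning 0 is enough to disable syslets. The userspace sketch below shows why: syslet_pre_indirect() checks syslet_frame_valid() before anything else, so the BUG() stubs can never be reached. `toy_pre_indirect()` and `unreachable_arch_hook()` are hypothetical names, abort() stands in for BUG(), and the frame gets a dummy member because standard C does not allow an empty struct.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* asm-generic leaves struct syslet_frame empty; a dummy member keeps ISO C happy. */
struct syslet_frame {
	char unused;
};

/* Generic fallback: no frame is ever valid, so syslets stay disabled. */
static inline int syslet_frame_valid(const struct syslet_frame *frame)
{
	(void)frame;
	return 0;
}

/* Stands in for the BUG() stubs such as set_user_frame(). */
static void unreachable_arch_hook(void)
{
	abort();
}

/* Simplified from syslet_pre_indirect(): validation comes first, so on a
 * stub architecture the request fails before any arch hook runs. */
static int toy_pre_indirect(const struct syslet_frame *frame)
{
	if (!syslet_frame_valid(frame))
		return -EINVAL;
	unreachable_arch_hook();
	return 0;
}
```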

Signed-off-by: Zach Brown <zach.brown@oracle.com>

diff --git a/include/asm-generic/syslet-abi.h b/include/asm-generic/syslet-abi.h
new file mode 100644
index 0000000..5d19971
--- /dev/null
+++ b/include/asm-generic/syslet-abi.h
@@ -0,0 +1,11 @@
+#ifndef _ASM_GENERIC_SYSLET_ABI_H
+#define _ASM_GENERIC_SYSLET_ABI_H
+
+/*
+ * I'm assuming that a u64 ip and u64 esp won't be enough for all
+ * archs, so I just let each arch define its own.
+ */
+struct syslet_frame {
+};
+
+#endif
diff --git a/include/asm-generic/syslet.h b/include/asm-generic/syslet.h
new file mode 100644
index 0000000..de9a750
--- /dev/null
+++ b/include/asm-generic/syslet.h
@@ -0,0 +1,34 @@
+#ifndef _ASM_GENERIC_SYSLET_H
+#define _ASM_GENERIC_SYSLET_H
+
+/*
+ * This provider of the arch-specific syslet APIs is used when an architecture
+ * doesn't support syslets.
+ */
+
+/* this stops the other functions from ever being called */
+static inline int syslet_frame_valid(struct syslet_frame *frame)
+{
+	return 0;
+}
+
+static inline void set_user_frame(struct task_struct *task,
+				  struct syslet_frame *frame)
+{
+	BUG();
+}
+
+static inline void move_user_context(struct task_struct *dest,
+					struct task_struct *src)
+{
+	BUG();
+}
+
+static inline int create_syslet_thread(long (*fn)(void *),
+				       void *arg, unsigned long flags)
+{
+	BUG();
+	return 0;
+}
+
+#endif
diff --git a/include/asm-x86/syslet-abi.h b/include/asm-x86/syslet-abi.h
new file mode 100644
index 0000000..14a7182
--- /dev/null
+++ b/include/asm-x86/syslet-abi.h
@@ -0,0 +1 @@
+#include <asm-generic/syslet-abi.h>
diff --git a/include/asm-x86/syslet.h b/include/asm-x86/syslet.h
new file mode 100644
index 0000000..583d810
--- /dev/null
+++ b/include/asm-x86/syslet.h
@@ -0,0 +1 @@
+#include <asm-generic/syslet.h>
-- 
1.5.2.2



* [PATCH 3/6] syslet: introduce abi structs
  2007-12-06 23:20   ` [PATCH 2/6] syslet: asm-generic support to disable syslets Zach Brown
@ 2007-12-06 23:20     ` Zach Brown
  2007-12-06 23:20       ` [PATCH 4/6] syslets: add indirect args Zach Brown
  0 siblings, 1 reply; 26+ messages in thread
From: Zach Brown @ 2007-12-06 23:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Ulrich Drepper, Arjan van de Ven,
	Andrew Morton, Alan Cox, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Davide Libenzi, Jens Axboe,
	Thomas Gleixner, Dan Williams, Jeff Moyer, Simon Holm Thogersen,
	suresh.b.siddha

This patch adds the architecture-independent structures of the
syslet ABI.

Signed-off-by: Zach Brown <zach.brown@oracle.com>

diff --git a/include/linux/syslet-abi.h b/include/linux/syslet-abi.h
new file mode 100644
index 0000000..a8bc1a3
--- /dev/null
+++ b/include/linux/syslet-abi.h
@@ -0,0 +1,34 @@
+#ifndef _LINUX_SYSLET_ABI_H
+#define _LINUX_SYSLET_ABI_H
+
+#include <asm/syslet-abi.h> /* for struct syslet_frame */
+
+struct syslet_args {
+	u64 completion_ring_ptr;
+	u64 caller_data;
+	struct syslet_frame frame;
+};
+
+struct syslet_completion {
+	u64 status;
+	u64 caller_data;
+};
+
+/*
+ * The ring follows the "wrapping" convention as described by Andrew at:
+ * 	http://lkml.org/lkml/2007/4/11/276
+ * The head is updated by the kernel as completions are added and the
+ * tail is updated by userspace as completions are removed.
+ *
+ * The number of elements must be a power of two and the ring must be
+ * aligned to a u64.
+ */
+struct syslet_ring {
+	u32 kernel_head;
+	u32 user_tail;
+	u32 elements;
+	u32 wait_group;
+	struct syslet_completion comp[0];
+};
+
+#endif
-- 
1.5.2.2
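The wrapping convention in the syslet_ring comment means that both indices increase without bound and are masked only when indexing comp[]. That is why elements must be a power of two. What follows is a rough outline of a userspace consumer. It assumes a GCC or Clang toolchain for the __atomic builtins. `ring_alloc()` and `ring_drain()` are hypothetical helpers, not part of the proposed ABI, and the sketch leaves out how the ring pointer reaches the kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace mirror of include/linux/syslet-abi.h (flexible array instead of comp[0]). */
struct syslet_completion {
	uint64_t status;
	uint64_t caller_data;
};

struct syslet_ring {
	uint32_t kernel_head;	/* advanced by the kernel after storing a completion */
	uint32_t user_tail;	/* advanced by userspace after consuming one */
	uint32_t elements;	/* must be a power of two */
	uint32_t wait_group;	/* hashed by the kernel to pick a mutex/wait queue */
	struct syslet_completion comp[];
};

/* Hypothetical helper: allocate a zeroed ring, rejecting non-power-of-two sizes. */
static struct syslet_ring *ring_alloc(uint32_t elements, uint32_t wait_group)
{
	struct syslet_ring *ring;

	if (elements == 0 || (elements & (elements - 1)) != 0)
		return NULL;
	ring = calloc(1, sizeof(*ring) + (size_t)elements * sizeof(ring->comp[0]));
	if (ring) {
		ring->elements = elements;
		ring->wait_group = wait_group;
	}
	return ring;
}

/*
 * Copy up to max published completions into out[] and advance user_tail.
 * Head and tail wrap at 2^32; only (index & mask) selects a slot, so
 * head - tail is always the number of pending completions.
 */
static unsigned ring_drain(struct syslet_ring *ring,
			   struct syslet_completion *out, unsigned max)
{
	uint32_t mask = ring->elements - 1;
	uint32_t tail = ring->user_tail;
	/* Acquire load: the read-side barrier the kernel comment asks for,
	 * so slot reads are not reordered before the head read. */
	uint32_t head = __atomic_load_n(&ring->kernel_head, __ATOMIC_ACQUIRE);
	unsigned n = 0;

	assert((ring->elements & mask) == 0);	/* power of two */
	while (tail != head && n < max) {
		out[n++] = ring->comp[tail & mask];
		tail++;
	}
	__atomic_store_n(&ring->user_tail, tail, __ATOMIC_RELEASE);
	return n;
}
```

When ring_drain() returns 0, a consumer would presumably call sys_syslet_ring_wait(ring, user_tail) from patch 5/6. That call sleeps until kernel_head no longer equals the index passed in.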



* [PATCH 4/6] syslets: add indirect args
  2007-12-06 23:20     ` [PATCH 3/6] syslet: introduce abi structs Zach Brown
@ 2007-12-06 23:20       ` Zach Brown
  2007-12-06 23:20         ` [PATCH 5/6] syslets: add generic syslets infrastructure Zach Brown
  0 siblings, 1 reply; 26+ messages in thread
From: Zach Brown @ 2007-12-06 23:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Ulrich Drepper, Arjan van de Ven,
	Andrew Morton, Alan Cox, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Davide Libenzi, Jens Axboe,
	Thomas Gleixner, Dan Williams, Jeff Moyer, Simon Holm Thogersen,
	suresh.b.siddha

This adds the syslet indirect args to the indirect_params union.

This is broken, but it lets us simply demonstrate the rest of the syslet
universe around the indirect argument passing convention.

A caller could well want to issue, as a syslet, a syscall that itself takes
indirect arguments; with a single union those arguments would overlap the
syslet args.  Maybe we turn indirect_params into a struct that contains a
union for arguments which can never be used concurrently.  This needs wider
discussion.
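One way to picture the layout question uses stand-in types. In the union as patched, the syslet args overlap file_flags, so a syslet cannot also carry another call's indirect parameters. The sketch takes one reading of the "struct containing a union" idea and keeps the syslet args beside a union of per-call parameters. `indirect_params_alt` and the other stub types are hypothetical, and the frame contents (ip/sp) are made up because they are arch-specific.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the kernel types; the ip/sp frame is an assumption. */
struct frame_stub {
	uint64_t ip;
	uint64_t sp;
};

struct syslet_args_stub {
	uint64_t completion_ring_ptr;
	uint64_t caller_data;
	struct frame_stub frame;
};

/* As in this patch: syslet args share storage with the other parameter sets. */
union indirect_params_now {
	struct {
		int flags;
	} file_flags;
	struct syslet_args_stub syslet;
};

/* Hypothetical alternative: syslet args beside a union of per-call sets. */
struct indirect_params_alt {
	struct syslet_args_stub syslet;
	union {
		struct {
			int flags;
		} file_flags;
	} call;
};

/* In the alternative layout the per-call parameters cannot overlap. */
static_assert(offsetof(struct indirect_params_alt, call) >=
	      sizeof(struct syslet_args_stub),
	      "per-call parameters must not overlap syslet args");
```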

Signed-off-by: Zach Brown <zach.brown@oracle.com>

diff --git a/include/linux/indirect.h b/include/linux/indirect.h
index 97f9ac4..5d5abd7 100644
--- a/include/linux/indirect.h
+++ b/include/linux/indirect.h
@@ -3,6 +3,7 @@
 #define _LINUX_INDIRECT_H
 
 #include <asm/indirect.h>
+#include <linux/syslet-abi.h>
 
 
 /* IMPORTANT:
@@ -14,6 +15,7 @@ union indirect_params {
   struct {
     int flags;
   } file_flags;
+  struct syslet_args syslet;
 };
 
 #define INDIRECT_PARAM(set, name) current->indirect_params.set.name
-- 
1.5.2.2



* [PATCH 5/6] syslets: add generic syslets infrastructure
  2007-12-06 23:20       ` [PATCH 4/6] syslets: add indirect args Zach Brown
@ 2007-12-06 23:20         ` Zach Brown
  2007-12-06 23:20           ` [PATCH 6/6] syslets: add both 32bit and 64bit x86 syslet support Zach Brown
                             ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Zach Brown @ 2007-12-06 23:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Ulrich Drepper, Arjan van de Ven,
	Andrew Morton, Alan Cox, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Davide Libenzi, Jens Axboe,
	Thomas Gleixner, Dan Williams, Jeff Moyer, Simon Holm Thogersen,
	suresh.b.siddha

The indirect syslet arguments specify where to store the completion and what
function in userspace to return to once the syslet has been executed.  The
details of how we pass the indirect syslet arguments still need work.

We parse the indirect syslet arguments in sys_indirect() before we call
the given system call.  If they're valid, we make sure that there is a child
task waiting and then mark the task as ready to become a syslet.

When a task which has been marked as ready is about to block, the scheduler
calls into kernel/syslet.c.  A waiting child task is woken and returns to
userspace in its place.

When the system call finally finishes, back up in sys_indirect(), we store
its result in the userspace ring.  At that point the original task returns to
the frame that userspace provided in the indirect syslet args.
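The flow described above can be modelled as a single-threaded toy. Its only purpose is to show which path produces which result. A call that never sleeps returns its status directly. A call that sleeps yields -ESYSLETPENDING at the call site, and its status lands in the ring later. Everything here (`toy_task`, `toy_ring`, `toy_syslet_call()`) is hypothetical, and no real task switching happens.

```c
#include <assert.h>
#include <stdbool.h>

#define ESYSLETPENDING 132	/* value this patch adds to asm-generic/errno.h */
#define TOY_RING_SIZE 8		/* power of two, as the ring ABI requires */

/* Toy stand-ins for the task_struct bits and the userspace ring. */
struct toy_task {
	bool syslet_ready;	/* armed by syslet_pre_indirect() */
	bool child_waiting;	/* a syslet_thread() child is parked */
};

struct toy_ring {
	long status[TOY_RING_SIZE];
	unsigned head;		/* free-running, masked on use */
};

/*
 * Model one sys_indirect() call carrying syslet args.  `blocks` says whether
 * the system call sleeps; `status` is what it eventually returns.  The
 * result is what the sys_indirect() call site in userspace sees.
 */
static long toy_syslet_call(struct toy_task *t, struct toy_ring *ring,
			    bool blocks, long status)
{
	long call_site_ret = 0;

	/* syslet_pre_indirect(): ensure a child is waiting, then arm. */
	t->child_waiting = true;
	t->syslet_ready = true;

	/* schedule() -> syslet_schedule(): the child takes over the user
	 * context and returns -ESYSLETPENDING to the call site. */
	if (blocks && t->syslet_ready && t->child_waiting) {
		t->syslet_ready = false;
		t->child_waiting = false;
		call_site_ret = -ESYSLETPENDING;
	}

	/* syslet_post_indirect(): still armed means we never blocked. */
	if (t->syslet_ready) {
		t->syslet_ready = false;
		return status;
	}

	/* Blocked: store the status in the ring; the original task would then
	 * return to the userspace frame from the syslet args (not modelled). */
	ring->status[ring->head++ & (TOY_RING_SIZE - 1)] = status;
	return call_site_ret;
}
```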

This generic infrastructure relies on architecture-specific routines to create
a new child task, move userspace state from one kernel task to another, and
set up the userspace return frame in ptregs.  With the asm-generic stubs,
syslet requests just fail with -EINVAL until an architecture provides the
needed routines.

This is a simplification of Ingo's more involved syslet and threadlet
infrastructure which was built around 'uatoms'.  Enough code has changed that
it wasn't appropriate to bring the previous Signed-off-by lines forward.

Signed-off-by: Zach Brown <zach.brown@oracle.com>

diff --git a/fs/exec.c b/fs/exec.c
index 282240a..942262f 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -51,6 +51,7 @@
 #include <linux/tsacct_kern.h>
 #include <linux/cn_proc.h>
 #include <linux/audit.h>
+#include <linux/syslet.h>
 
 #include <asm/uaccess.h>
 #include <asm/mmu_context.h>
@@ -1614,6 +1615,8 @@ static int coredump_wait(int exit_code)
 		complete(vfork_done);
 	}
 
+	kill_syslet_tasks(tsk);
+
 	if (core_waiters)
 		wait_for_completion(&startup_done);
 fail:
diff --git a/include/asm-generic/errno.h b/include/asm-generic/errno.h
index e8852c0..26674c4 100644
--- a/include/asm-generic/errno.h
+++ b/include/asm-generic/errno.h
@@ -106,4 +106,7 @@
 #define	EOWNERDEAD	130	/* Owner died */
 #define	ENOTRECOVERABLE	131	/* State not recoverable */
 
+/* for syslets */
+#define	ESYSLETPENDING	132	/* syslet syscall now pending */
+
 #endif
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 61a4b83..a134966 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1182,6 +1182,13 @@ struct task_struct {
 
 	/* Additional system call parameters.  */
 	union indirect_params indirect_params;
+
+	/* task waiting to return to userspace if we block as a syslet */
+	spinlock_t		syslet_lock;
+	struct list_head	syslet_tasks;
+	unsigned		syslet_ready:1,
+				syslet_return:1,
+				syslet_exit:1;
 };
 
 /*
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index addb39f..1a44838 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -55,6 +55,7 @@ struct compat_timeval;
 struct robust_list_head;
 struct getcpu_cache;
 struct indirect_registers;
+struct syslet_ring;
 
 #include <linux/types.h>
 #include <linux/aio_abi.h>
@@ -615,6 +616,8 @@ asmlinkage long sys_fallocate(int fd, int mode, loff_t offset, loff_t len);
 asmlinkage long sys_indirect(struct indirect_registers __user *userregs,
 			     void __user *userparams, size_t paramslen,
 			     int flags);
+asmlinkage long sys_syslet_ring_wait(struct syslet_ring __user *ring,
+				     unsigned long user_idx);
 
 int kernel_execve(const char *filename, char *const argv[], char *const envp[]);
 
diff --git a/include/linux/syslet.h b/include/linux/syslet.h
new file mode 100644
index 0000000..734b98f
--- /dev/null
+++ b/include/linux/syslet.h
@@ -0,0 +1,18 @@
+#ifndef _LINUX_SYSLET_H
+#define _LINUX_SYSLET_H
+
+#include <linux/syslet-abi.h>
+#include <asm/syslet.h>
+
+void syslet_init(struct task_struct *tsk);
+void kill_syslet_tasks(struct task_struct *cur);
+void syslet_schedule(struct task_struct *cur);
+int syslet_pre_indirect(void);
+int syslet_post_indirect(int status);
+
+static inline int syslet_args_present(union indirect_params *params)
+{
+	return params->syslet.completion_ring_ptr != 0;
+}
+
+#endif
diff --git a/kernel/Makefile b/kernel/Makefile
index bcad37d..7a7dfbe 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -9,7 +9,7 @@ obj-y     = sched.o fork.o exec_domain.o panic.o printk.o profile.o \
 	    rcupdate.o extable.o params.o posix-timers.o \
 	    kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o mutex.o \
 	    hrtimer.o rwsem.o latency.o nsproxy.o srcu.o \
-	    utsname.o notifier.o
+	    utsname.o notifier.o syslet.o
 
 obj-$(CONFIG_SYSCTL) += sysctl_check.o
 obj-$(CONFIG_STACKTRACE) += stacktrace.o
diff --git a/kernel/exit.c b/kernel/exit.c
index 549c055..831b2f9 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -44,6 +44,7 @@
 #include <linux/resource.h>
 #include <linux/blkdev.h>
 #include <linux/task_io_accounting_ops.h>
+#include <linux/syslet.h>
 
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
@@ -949,6 +950,12 @@ fastcall NORET_TYPE void do_exit(long code)
 		schedule();
 	}
 
+	/*
+	 * syslet threads have to exit their context before the MM exit (due to
+	 * the coredumping wait).
+	 */
+	kill_syslet_tasks(tsk);
+
 	tsk->flags |= PF_EXITING;
 	/*
 	 * tsk->flags are checked in the futex code to protect against
diff --git a/kernel/fork.c b/kernel/fork.c
index 8ca1a14..4b1efb9 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -51,6 +51,7 @@
 #include <linux/random.h>
 #include <linux/tty.h>
 #include <linux/proc_fs.h>
+#include <linux/syslet.h>
 
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
@@ -1123,6 +1124,8 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	p->blocked_on = NULL; /* not blocked yet */
 #endif
 
+	syslet_init(p);
+
 	/* Perform scheduler related setup. Assign this task to a CPU. */
 	sched_fork(p, clone_flags);
 
diff --git a/kernel/indirect.c b/kernel/indirect.c
index bc60850..3bcfaff 100644
--- a/kernel/indirect.c
+++ b/kernel/indirect.c
@@ -3,6 +3,8 @@
 #include <linux/unistd.h>
 #include <asm/asm-offsets.h>
 
+/* XXX would we prefer to generalize this somehow? */
+#include <linux/syslet.h>
 
 asmlinkage long sys_indirect(struct indirect_registers __user *userregs,
 			     void __user *userparams, size_t paramslen,
@@ -17,6 +19,24 @@ asmlinkage long sys_indirect(struct indirect_registers __user *userregs,
 	if (copy_from_user(&regs, userregs, sizeof(regs)))
 		return -EFAULT;
 
+	if (paramslen > sizeof(union indirect_params))
+		return -EINVAL;
+
+	if (copy_from_user(&current->indirect_params, userparams, paramslen)) {
+		result = -EFAULT;
+		goto out;
+	}
+
+	/* We need to come up with a better way to allow and forbid syscalls */
+	if (unlikely(syslet_args_present(&current->indirect_params))) {
+		result = syslet_pre_indirect();
+		if (result == 0) {
+			result = call_indirect(&regs);
+			result = syslet_post_indirect(result);
+		}
+		goto out;
+	}
+
 	switch (INDIRECT_SYSCALL (&regs))
 	{
 #define INDSYSCALL(name) __NR_##name
@@ -24,16 +44,12 @@ asmlinkage long sys_indirect(struct indirect_registers __user *userregs,
 		break;
 
 	default:
-		return -EINVAL;
+		result = -EINVAL;
+		goto out;
 	}
 
-	if (paramslen > sizeof(union indirect_params))
-		return -EINVAL;
-
-	result = -EFAULT;
-	if (!copy_from_user(&current->indirect_params, userparams, paramslen))
-		result = call_indirect(&regs);
-
+	result = call_indirect(&regs);
+out:
 	memset(&current->indirect_params, '\0', paramslen);
 
 	return result;
diff --git a/kernel/sched.c b/kernel/sched.c
index 59ff6b1..27e799d 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -63,6 +63,7 @@
 #include <linux/reciprocal_div.h>
 #include <linux/unistd.h>
 #include <linux/pagemap.h>
+#include <linux/syslet.h>
 
 #include <asm/tlb.h>
 #include <asm/irq_regs.h>
@@ -3612,6 +3613,14 @@ asmlinkage void __sched schedule(void)
 	struct rq *rq;
 	int cpu;
 
+	prev = current;
+	if (unlikely(prev->syslet_ready)) {
+		if (prev->state && !(preempt_count() & PREEMPT_ACTIVE) &&
+			(!(prev->state & TASK_INTERRUPTIBLE) ||
+				!signal_pending(prev)))
+			syslet_schedule(prev);
+	}
+
 need_resched:
 	preempt_disable();
 	cpu = smp_processor_id();
diff --git a/kernel/syslet.c b/kernel/syslet.c
new file mode 100644
index 0000000..6fb2eb9
--- /dev/null
+++ b/kernel/syslet.c
@@ -0,0 +1,462 @@
+#include <linux/sched.h>
+#include <linux/wait.h>
+#include <linux/mutex.h>
+#include <linux/completion.h>
+#include <linux/err.h>
+#include <linux/jhash.h>
+#include <linux/list.h>
+#include <linux/syslet.h>
+
+#include <asm/uaccess.h>
+
+/*
+ * XXX todo:
+ *  - do we need all this '*cur = current' nonsense?
+ *  - try to prevent userspace from submitting too much.. lazy user ptr read?
+ *  - explain how to deal with waiting threads with stale data in current
+ *  - how does userspace tell that a syslet completion was lost?
+ *  	provide an -errno argument to the userspace return function?
+ */
+
+/*
+ * These structs are stored on the kernel stack of tasks which are waiting to
+ * return to userspace.  They are linked into their parent's list of syslet
+ * children stored in 'syslet_tasks' in the parent's task_struct.
+ */
+struct syslet_task_entry {
+	struct task_struct *task;
+	struct list_head item;
+};
+
+/*
+ * syslet_ring doesn't have any kernel-side storage.  Userspace allocates them
+ * in their address space and initializes their fields and then passes them to
+ * the kernel.
+ *
+ * These hashes provide the kernel-side storage for the wait queues which
+ * sys_syslet_ring_wait() uses and the mutex which completion uses to serialize
+ * the (possibly blocking) ordered writes of the completion and kernel head
+ * index into the ring.
+ *
+ * We choose the bucket that supports a given ring by hashing a u32 that
+ * userspace sets in the ring.
+ */
+#define SYSLET_HASH_BITS (CONFIG_BASE_SMALL ? 4 : 8)
+#define SYSLET_HASH_NR (1 << SYSLET_HASH_BITS)
+#define SYSLET_HASH_MASK (SYSLET_HASH_NR - 1)
+static wait_queue_head_t syslet_waitqs[SYSLET_HASH_NR];
+static struct mutex syslet_muts[SYSLET_HASH_NR];
+
+static wait_queue_head_t *ring_waitqueue(struct syslet_ring __user *ring)
+{
+	u32 group;
+
+	if (get_user(group, &ring->wait_group))
+		return ERR_PTR(-EFAULT);
+	else
+		return &syslet_waitqs[jhash_1word(group, 0) & SYSLET_HASH_MASK];
+}
+
+static struct mutex *ring_mutex(struct syslet_ring __user *ring)
+{
+	u32 group;
+
+	if (get_user(group, (u32 __user *)&ring->wait_group))
+		return ERR_PTR(-EFAULT);
+	else
+		return &syslet_muts[jhash_1word(group, 0) & SYSLET_HASH_MASK];
+}
+
+/*
+ * This is called for new tasks and for child tasks which might copy
+ * task_struct from their parent.  So we clear the syslet indirect args,
+ * too, just to be clear.
+ */
+void syslet_init(struct task_struct *tsk)
+{
+	memset(&tsk->indirect_params.syslet, 0, sizeof(struct syslet_args));
+	spin_lock_init(&tsk->syslet_lock);
+	INIT_LIST_HEAD(&tsk->syslet_tasks);
+	tsk->syslet_ready = 0;
+	tsk->syslet_return = 0;
+	tsk->syslet_exit = 0;
+}
+
+static struct task_struct *first_syslet_task(struct task_struct *parent)
+{
+	struct syslet_task_entry *entry;
+
+	assert_spin_locked(&parent->syslet_lock);
+
+	if (!list_empty(&parent->syslet_tasks)) {
+		entry = list_first_entry(&parent->syslet_tasks,
+					 struct syslet_task_entry, item);
+		return entry->task;
+	} else
+		return NULL;
+}
+
+/*
+ * XXX it's not great to wake up potentially lots of tasks under the lock
+ */
+/*
+ * We ask all the waiting syslet tasks to exit before we ourselves will
+ * exit.  The tasks remove themselves from the list and wake our process
+ * with the lock held to be sure that we're still around when they wake us.
+ */
+void kill_syslet_tasks(struct task_struct *cur)
+{
+	struct syslet_task_entry *entry;
+
+	spin_lock(&cur->syslet_lock);
+
+	list_for_each_entry(entry, &cur->syslet_tasks, item)  {
+		entry->task->syslet_exit = 1;
+		wake_up_process(entry->task);
+	}
+
+	while (!list_empty(&cur->syslet_tasks)) {
+		set_task_state(cur, TASK_INTERRUPTIBLE);
+		if (list_empty(&cur->syslet_tasks))
+			break;
+		spin_unlock(&cur->syslet_lock);
+		schedule();
+		spin_lock(&cur->syslet_lock);
+	}
+	spin_unlock(&cur->syslet_lock);
+
+	set_task_state(cur, TASK_RUNNING);
+}
+
+/*
+ * This task is cloned off of a syslet parent as the parent calls
+ * syslet_pre_indirect() from sys_indirect().  That parent waits for us to
+ * complete a completion struct on their stack.
+ *
+ * This task then waits until its parent tells it to return to user space on
+ * its behalf when the parent gets into schedule().
+ *
+ * The parent in schedule() will set this task's ptregs frame to return to the
+ * sys_indirect() call site in user space.  Our -ESYSLETPENDING return code is
+ * given to userspace to indicate that the status of their system call
+ * will be delivered to the ring.
+ */
+struct syslet_task_args {
+	struct completion *comp;
+	struct task_struct *parent;
+};
+static long syslet_thread(void *data)
+{
+	struct syslet_task_args args;
+	struct task_struct *cur = current;
+	struct syslet_task_entry entry = {
+		.task = cur,
+		.item = LIST_HEAD_INIT(entry.item),
+	};
+
+	args = *(struct syslet_task_args *)data;
+
+	spin_lock(&args.parent->syslet_lock);
+	list_add_tail(&entry.item, &args.parent->syslet_tasks);
+	spin_unlock(&args.parent->syslet_lock);
+
+	complete(args.comp);
+
+	/* wait until the scheduler tells us to return to user space */
+	for (;;) {
+		set_task_state(cur, TASK_INTERRUPTIBLE);
+		if (cur->syslet_return || cur->syslet_exit ||
+		    signal_pending(cur))
+			break;
+		schedule();
+	}
+	set_task_state(cur, TASK_RUNNING);
+
+	spin_lock(&args.parent->syslet_lock);
+	list_del(&entry.item);
+	/* our parent won't exit until it tests the list under the lock */
+	if (list_empty(&args.parent->syslet_tasks))
+		wake_up_process(args.parent);
+	spin_unlock(&args.parent->syslet_lock);
+
+	/* just exit if we weren't asked to return to userspace */
+	if (!cur->syslet_return)
+		do_exit(0);
+
+	/* inform userspace that their call will complete in the ring */
+	return -ESYSLETPENDING;
+}
+
+static int create_new_syslet_task(struct task_struct *cur)
+{
+	struct syslet_task_args args;
+	struct completion comp;
+	int ret;
+
+	init_completion(&comp);
+	args.comp = &comp;
+	args.parent = cur;
+
+	ret = create_syslet_thread(syslet_thread, &args,
+				   CLONE_VM | CLONE_FS | CLONE_FILES |
+				   CLONE_SIGHAND | CLONE_THREAD |
+				   CLONE_SYSVSEM);
+	if (ret >= 0) {
+		wait_for_completion(&comp);
+		ret = 0;
+	}
+
+	return ret;
+}
+
+/*
+ * This is called by sys_indirect() when it sees that syslet args have
+ * been provided.  We validate the arguments and make sure that there is
+ * a task waiting.  If everything works out we tell the scheduler that it
+ * can call syslet_schedule() by setting syslet_ready.
+ */
+int syslet_pre_indirect(void)
+{
+	struct task_struct *cur = current;
+	struct syslet_ring __user *ring;
+	u32 elements;
+	int ret;
+
+	/* Not sure if returning -EINVAL on unsupported archs is right */
+	if (!syslet_frame_valid(&cur->indirect_params.syslet.frame)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ring = (struct syslet_ring __user __force *)(unsigned long)
+		cur->indirect_params.syslet.completion_ring_ptr;
+	if (get_user(elements, &ring->elements)) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+	if (!is_power_of_2(elements)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/*
+	 * Racing to test this list outside the lock as the final task removes
+	 * itself is OK.  It should be very rare, and all it results in is
+	 * syslet_schedule() finding the list empty and letting the task block.
+	 */
+	if (list_empty(&cur->syslet_tasks)) {
+		ret = create_new_syslet_task(cur);
+		if (ret)
+			goto out;
+	} else
+		ret = 0;
+
+	cur->syslet_ready = 1;
+out:
+	return ret;
+}
+
+/*
+ * This is called by sys_indirect() after it has called the given system
+ * call handler.  If we didn't block then we just return the status of the
+ * system call to userspace.
+ *
+ * If we did block, however, then userspace got a -ESYSLETPENDING long ago.
+ * We need to deliver the status of the system call into the syslet ring
+ * and then return to the function in userspace which the caller specified
+ * in the frame in the syslet args.  schedule() already set that up
+ * when we blocked.  All we have to do is return to userspace.
+ *
+ * The return code from this function is lost.  It could become the
+ * argument to the userspace return function which would let us tell
+ * userspace when we fail to copy the status into the ring.
+ */
+int syslet_post_indirect(int status)
+{
+	struct syslet_ring __user *ring;
+	struct syslet_completion comp;
+	struct task_struct *cur = current;
+	struct syslet_args *args = &cur->indirect_params.syslet;
+	wait_queue_head_t *waitq;
+	struct mutex *mutex;
+	u32 kidx;
+	u32 mask;
+	int ret;
+
+	/* we didn't block, just return the status to userspace */
+	if (cur->syslet_ready) {
+		cur->syslet_ready = 0;
+		return status;
+	}
+
+	ring = (struct syslet_ring __force __user *)(unsigned long)
+		args->completion_ring_ptr;
+
+	comp.status = status;
+	comp.caller_data = args->caller_data;
+
+	mutex = ring_mutex(ring);
+	if (IS_ERR(mutex))
+		return PTR_ERR(mutex);
+
+	waitq = ring_waitqueue(ring);
+	if (IS_ERR(waitq))
+		return PTR_ERR(waitq);
+
+	if (get_user(mask, &ring->elements))
+		return -EFAULT;
+
+	if (!is_power_of_2(mask))
+		return -EINVAL;
+	mask--;
+
+	mutex_lock(mutex);
+
+	ret = -EFAULT;
+	if (get_user(kidx, (u32 __user *)&ring->kernel_head))
+		goto out;
+
+	if (copy_to_user(&ring->comp[kidx & mask], &comp, sizeof(comp)))
+		goto out;
+
+	/*
+	 * Make sure that the completion is stored before the index which
+	 * refers to it.  Notice that this means that userspace has to worry
+	 * about issuing a read memory barrier after it reads the index.
+	 */
+	smp_wmb();
+
+	kidx++;
+	if (put_user(kidx, &ring->kernel_head))
+		ret = -EFAULT;
+	else
+		ret = 0;
+out:
+	mutex_unlock(mutex);
+	if (ret == 0 && waitqueue_active(waitq))
+		wake_up(waitq);
+	return ret;
+}
+
+/*
+ * We're called by the scheduler when it sees that a task is about to block and
+ * has syslet_ready.  Our job is to hand userspace's state off to a waiting
+ * task and tell it to return to userspace.  That tells userspace that the
+ * system call that we're executing blocked and will complete in the future.
+ *
+ * The indirect syslet arguments specify the userspace instruction and stack
+ * that the child should return to.
+ */
+void syslet_schedule(struct task_struct *cur)
+{
+	struct task_struct *child = NULL;
+
+	spin_lock(&cur->syslet_lock);
+
+	child = first_syslet_task(cur);
+	if (child) {
+		move_user_context(child, cur);
+		set_user_frame(cur, &cur->indirect_params.syslet.frame);
+		cur->syslet_ready = 0;
+		child->syslet_return = 1;
+	}
+
+	spin_unlock(&cur->syslet_lock);
+
+	if (child)
+		wake_up_process(child);
+}
+
+/*
+ * Userspace calls this when the ring is empty.  We return to userspace
+ * when the kernel head and user tail indexes are no longer equal, meaning
+ * that the kernel has stored a new completion.
+ *
+ * The ring is stored entirely in user space.  We don't have a system call
+ * which initializes kernel state to go along with the ring.
+ *
+ * So we have to read the kernel head index from userspace.  In the common
+ * case this will not fault or block and will be a very fast simple
+ * pointer dereference.
+ *
+ * However, we need a way for the kernel completion path to wake us when
+ * there is a new event.  We hash a field of the ring into buckets of
+ * wait queues for this.
+ *
+ * This relies on aligned u32 reads and writes being atomic with regard
+ * to other reads and writes, which I sure hope is true on linux's
+ * architectures.  I'm crossing my fingers.
+ */
+asmlinkage long sys_syslet_ring_wait(struct syslet_ring __user *ring,
+				     unsigned long user_idx)
+{
+	wait_queue_head_t *waitq;
+	struct task_struct *cur = current;
+	DEFINE_WAIT(wait);
+	u32 kidx;
+	int ret;
+
+	/* XXX disallow async waiting */
+
+	waitq = ring_waitqueue(ring);
+	if (IS_ERR(waitq)) {
+		ret = PTR_ERR(waitq);
+		goto out;
+	}
+
+	/*
+	 * We have to be careful not to miss wake-ups by setting our
+	 * state before testing the condition.  Testing the condition
+	 * means copying the index from userspace, and a fault during that
+	 * copy can modify our state and so mask a wake-up that set it.
+	 *
+	 * So we very carefully copy the index.  We use the blocking copy
+	 * to fault the index in and detect bad pointers.  We only proceed
+	 * with the test and sleeping if the non-blocking copy succeeds.
+	 *
+	 * In the common case the non-blocking copy will succeed and this
+	 * will be very fast indeed.
+	 */
+	for (;;) {
+		prepare_to_wait(waitq, &wait, TASK_INTERRUPTIBLE);
+		ret = __copy_from_user_inatomic(&kidx, &ring->kernel_head,
+						sizeof(u32));
+		if (ret) {
+			set_task_state(cur, TASK_RUNNING);
+			ret = copy_from_user(&kidx, &ring->kernel_head,
+					     sizeof(u32));
+			if (ret) {
+				ret = -EFAULT;
+				break;
+			}
+			continue;
+		}
+
+		if (kidx != user_idx)
+			break;
+		if (signal_pending(cur)) {
+			ret = -ERESTARTSYS;
+			break;
+		}
+
+		schedule();
+	}
+
+	finish_wait(waitq, &wait);
+out:
+	return ret;
+}
+
+static int __init syslet_module_init(void)
+{
+	unsigned long i;
+
+	for (i = 0; i < SYSLET_HASH_NR; i++) {
+		init_waitqueue_head(&syslet_waitqs[i]);
+		mutex_init(&syslet_muts[i]);
+	}
+
+	return 0;
+}
+module_init(syslet_module_init);
-- 
1.5.2.2
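The comment above smp_wmb() in the completion path puts part of the ordering burden on userspace: it has to read kernel_head, issue a read barrier, and only then read the completion that the index covers. The sketch below shows that producer/consumer pairing with C11 atomics, where a release store stands in for smp_wmb() plus put_user() and an acquire load stands in for the read barrier. Only kernel_head and comp[] come from the patch; struct demo_ring, its user_tail field, struct demo_completion and the ring size are hypothetical stand-ins, since the full syslet_ring layout is not shown here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical completion entry; the real one is defined by the syslet ABI. */
struct demo_completion {
	uint64_t status;       /* e.g. the system call's return value */
	uint64_t caller_data;  /* cookie supplied at submission */
};

/* Must be a power of two, which the kernel checks with is_power_of_2(). */
#define DEMO_RING_NR 8u
static_assert((DEMO_RING_NR & (DEMO_RING_NR - 1)) == 0,
	      "ring size must be a power of two");

/* Hypothetical ring layout: kernel_head and comp[] mirror the patch. */
struct demo_ring {
	_Atomic uint32_t kernel_head;  /* only the producer advances this */
	uint32_t user_tail;            /* only the consumer advances this */
	struct demo_completion comp[DEMO_RING_NR];
};

/* Producer: store the entry, then publish the new index. The release store
 * plays the role of smp_wmb() followed by put_user(). */
static void demo_produce(struct demo_ring *ring, struct demo_completion c)
{
	uint32_t kidx = atomic_load_explicit(&ring->kernel_head,
					     memory_order_relaxed);

	ring->comp[kidx & (DEMO_RING_NR - 1)] = c;
	atomic_store_explicit(&ring->kernel_head, kidx + 1,
			      memory_order_release);
}

/* Consumer: the acquire load is the read barrier the kernel comment asks for,
 * so the entry read below sees the producer's store to that slot. Returns 1
 * and fills *out if an entry was available, 0 if the ring is empty. */
static int demo_consume(struct demo_ring *ring, struct demo_completion *out)
{
	uint32_t head = atomic_load_explicit(&ring->kernel_head,
					     memory_order_acquire);

	if (head == ring->user_tail)
		return 0;
	*out = ring->comp[ring->user_tail & (DEMO_RING_NR - 1)];
	ring->user_tail++;
	return 1;
}
```

When demo_consume() returns 0 the ring is empty, which is the point where a real consumer would call sys_syslet_ring_wait() with its tail index. The u32 indexes are allowed to wrap; masking with DEMO_RING_NR - 1 keeps the slot in range, as `kidx & mask` does in the kernel.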

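sys_syslet_ring_wait() avoids lost wake-ups by queueing itself and setting its task state before it tests kernel_head, so a completion that lands between the test and schedule() just leaves the task runnable. Userspace has no task state to set, so the sketch below swaps in a different but standard technique for the same race: test the condition under a mutex that the waker also takes, and sleep with pthread_cond_wait(). All names are hypothetical; demo_waitq.head plays the role of kernel_head.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one hashed wait-queue bucket plus the ring index
 * it guards; head plays the role of kernel_head. */
struct demo_waitq {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	uint32_t head;
};

/* Completion side: advance the index under the lock, then wake waiters. */
static void demo_publish(struct demo_waitq *wq)
{
	pthread_mutex_lock(&wq->lock);
	wq->head++;
	pthread_mutex_unlock(&wq->lock);
	pthread_cond_broadcast(&wq->cond);
}

/* Waiter: sleep until head differs from the index already consumed. Because
 * the test happens under the lock the publisher also takes, a publish cannot
 * slip in between the test and the sleep; the loop re-tests after every
 * wake-up, spurious or not. Returns the new head. */
static uint32_t demo_wait(struct demo_waitq *wq, uint32_t user_idx)
{
	uint32_t head;

	pthread_mutex_lock(&wq->lock);
	while ((head = wq->head) == user_idx)
		pthread_cond_wait(&wq->cond, &wq->lock);
	pthread_mutex_unlock(&wq->lock);
	return head;
}

/* Thread entry point so a publish can race with a waiter. */
static void *demo_publisher(void *arg)
{
	demo_publish(arg);
	return NULL;
}
```

As in the kernel loop, the condition is re-tested after every wake-up, so a wake-up meant for another ring that shares the bucket is harmless.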


* [PATCH 6/6] syslets: add both 32bit and 64bit x86 syslet support
  2007-12-06 23:20         ` [PATCH 5/6] syslets: add generic syslets infrastructure Zach Brown
@ 2007-12-06 23:20           ` Zach Brown
  2007-12-07 11:55           ` [PATCH 5/6] syslets: add generic syslets infrastructure Evgeniy Polyakov
  2008-01-09  2:03           ` Rusty Russell
  2 siblings, 0 replies; 26+ messages in thread
From: Zach Brown @ 2007-12-06 23:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Ulrich Drepper, Arjan van de Ven,
	Andrew Morton, Alan Cox, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Davide Libenzi, Jens Axboe,
	Thomas Gleixner, Dan Williams, Jeff Moyer, Simon Holm Thogersen,
	suresh.b.siddha

This adds the architecture-specific routines needed by syslets for x86.

The syslet thread creation routines create a new thread which executes
a kernel function and then returns to userspace instead of exiting.

move_user_context() and set_user_frame() let the scheduler modify a child
thread so that it returns to userspace at the same place that a blocking
system call would have when it finished.  This currently performs a very
expensive copy of the fpu state.  Intel is working on a more robust patch
which allocates the i387 state off of thread_struct.  When that is ready
this can just juggle pointers to transfer the fpu state.

The syslets infrastructure needs to work with ptregs for the task which
is in sys_indirect().  So we add a PTREGSCALL stub around sys_indirect()
in x86_64.

Finally, we wire up sys_syslet_ring_wait().

Signed-off-by: Zach Brown <zach.brown@oracle.com>

diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index dc7f938..66a121d 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -1025,6 +1025,30 @@ ENTRY(kernel_thread_helper)
 	CFI_ENDPROC
 ENDPROC(kernel_thread_helper)
 
+ENTRY(syslet_thread_helper)
+	CFI_STARTPROC
+	/*
+	 * Allocate space on the stack for pt-regs.
+	 * sizeof(struct pt_regs) == 64, and we've got 8 bytes on the
+	 * kernel stack already:
+	 */
+	subl $64-8, %esp
+	CFI_ADJUST_CFA_OFFSET 64-8
+	movl %edx,%eax
+	push %edx
+	CFI_ADJUST_CFA_OFFSET 4
+	call *%ebx
+	addl $4, %esp
+	CFI_ADJUST_CFA_OFFSET -4
+
+	movl %eax, PT_EAX(%esp)
+
+	GET_THREAD_INFO(%ebp)
+
+	jmp syscall_exit
+	CFI_ENDPROC
+ENDPROC(syslet_thread_helper)
+
 #ifdef CONFIG_XEN
 ENTRY(xen_hypervisor_callback)
 	CFI_STARTPROC
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 3a058bb..08e34f6 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -412,6 +412,7 @@ END(\label)
 	PTREGSCALL stub_rt_sigsuspend, sys_rt_sigsuspend, %rdx
 	PTREGSCALL stub_sigaltstack, sys_sigaltstack, %rdx
 	PTREGSCALL stub_iopl, sys_iopl, %rsi
+	PTREGSCALL stub_indirect, sys_indirect, %r8
 
 ENTRY(ptregscall_common)
 	popq %r11
@@ -994,6 +995,71 @@ child_rip:
 ENDPROC(child_rip)
 
 /*
+ * Create a syslet kernel thread.  This differs from a thread created with
+ * kernel_thread() in that it returns to userspace after it finishes executing
+ * its given kernel function.
+ *
+ * C extern interface:
+ *	extern int create_syslet_thread(long (*fn)(void *),
+ *					void *arg, unsigned long flags)
+ *
+ * asm input arguments:
+ *	rdi: fn, rsi: arg, rdx: flags
+ */
+ENTRY(create_syslet_thread)
+	CFI_STARTPROC
+	FAKE_STACK_FRAME $syslet_child_rip
+	SAVE_ALL
+
+	# rdi: flags, rsi: usp, rdx: will be &pt_regs
+	movq %rdx,%rdi
+	movq $-1, %rsi
+	movq %rsp, %rdx
+
+	xorl %r8d,%r8d
+	xorl %r9d,%r9d
+
+	# clone now
+	call do_fork
+	movq %rax,RAX(%rsp)
+	xorl %edi,%edi
+
+	/*
+	 * It isn't worth checking for a reschedule here, so internally to
+	 * the x86_64 port you can rely on kernel_thread() not to reschedule
+	 * the child before returning.  This avoids the need for hacks, for
+	 * example to fork off the per-CPU idle tasks.
+	 * [Hopefully no generic code relies on the reschedule -AK]
+	 */
+	RESTORE_ALL
+	UNFAKE_STACK_FRAME
+	ret
+	CFI_ENDPROC
+ENDPROC(create_syslet_thread)
+
+syslet_child_rip:
+	CFI_STARTPROC
+
+	movq %rdi, %rax
+	movq %rsi, %rdi
+	call *%rax
+
+	/*
+	 * Fix up the PDA - we might return with sysexit:
+	 */
+	RESTORE_TOP_OF_STACK %r11
+
+	/*
+	 * return to user-space:
+	 */
+	movq %rax, RAX(%rsp)
+	RESTORE_REST
+	jmp int_ret_from_sys_call
+
+	CFI_ENDPROC
+ENDPROC(syslet_child_rip)
+
+/*
  * execve(). This function needs to use IRET, not SYSRET, to set up all state properly.
  *
  * C extern interface:
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 7b89958..7bf2836 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -394,6 +394,39 @@ int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags)
 EXPORT_SYMBOL(kernel_thread);
 
 /*
+ * This gets run with %ebx containing the
+ * function to call, and %edx containing
+ * the "args".
+ */
+void syslet_thread_helper(void);
+
+/*
+ * Create a syslet kernel thread.  This differs from a thread created with
+ * kernel_thread() in that it returns to userspace after it finishes executing
+ * its given kernel function.
+ */
+int create_syslet_thread(long (*fn)(void *), void *arg, unsigned long flags)
+{
+	struct pt_regs regs;
+
+	memset(&regs, 0, sizeof(regs));
+
+	regs.ebx = (unsigned long)fn;
+	regs.edx = (unsigned long)arg;
+
+	regs.xds = __USER_DS;
+	regs.xes = __USER_DS;
+	regs.xfs = __KERNEL_PERCPU;
+	regs.orig_eax = -1;
+	regs.eip = (unsigned long)syslet_thread_helper;
+	regs.xcs = __KERNEL_CS | get_kernel_rpl();
+	regs.eflags = X86_EFLAGS_IF | X86_EFLAGS_SF | X86_EFLAGS_PF | 0x2;
+
+	/* Ok, create the new task.. */
+	return do_fork(flags, 0, &regs, 0, NULL, NULL);
+}
+
+/*
  * Free current thread data structures etc..
  */
 void exit_thread(void)
@@ -852,6 +885,32 @@ unsigned long get_wchan(struct task_struct *p)
 }
 
 /*
+ * This expensive hack will go away once thread->i387 is allocated instead of
+ * embedded in task_struct.  Intel is working on it.
+ */
+static union i387_union i387_tmp[NR_CPUS] __cacheline_aligned_in_smp;
+
+/*
+ * Move user-space context from one kernel thread to another.
+ * This includes registers and FPU state. Callers must make
+ * sure that neither task is running user context at the moment:
+ */
+void move_user_context(struct task_struct *dest, struct task_struct *src)
+{
+	struct pt_regs *old_regs = task_pt_regs(src);
+	struct pt_regs *new_regs = task_pt_regs(dest);
+	union i387_union *tmp;
+
+	*new_regs = *old_regs;
+
+	tmp = &i387_tmp[get_cpu()];
+	*tmp = dest->thread.i387;
+	dest->thread.i387 = src->thread.i387;
+	src->thread.i387 = *tmp;
+	put_cpu();
+}
+
+/*
  * sys_alloc_thread_area: get a yet unused TLS descriptor index.
  */
 static int get_free_idx(void)
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 6309b27..9fb050d 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -436,6 +436,16 @@ void release_thread(struct task_struct *dead_task)
 	}
 }
 
+/*
+ * Move user-space context from one kernel thread to another.
+ * Callers must make sure that neither task is running user context
+ * at the moment:
+ */
+void move_user_context(struct task_struct *dest, struct task_struct *src)
+{
+	*task_pt_regs(dest) = *task_pt_regs(src);
+}
+
 static inline void set_32bit_tls(struct task_struct *t, int tls, u32 addr)
 {
 	struct user_desc ud = { 
diff --git a/arch/x86/kernel/syscall_table_32.S b/arch/x86/kernel/syscall_table_32.S
index 92095b2..5a532cf 100644
--- a/arch/x86/kernel/syscall_table_32.S
+++ b/arch/x86/kernel/syscall_table_32.S
@@ -325,3 +325,4 @@ ENTRY(sys_call_table)
 	.long sys_eventfd
 	.long sys_fallocate
 	.long sys_indirect		/* 325 */
+	.long sys_syslet_ring_wait
diff --git a/include/asm-x86/syslet-abi.h b/include/asm-x86/syslet-abi.h
index 14a7182..06e7528 100644
--- a/include/asm-x86/syslet-abi.h
+++ b/include/asm-x86/syslet-abi.h
@@ -1 +1,9 @@
-#include <asm-generic/syslet-abi.h>
+#ifndef __ASM_X86_SYSLET_ABI_H
+#define __ASM_X86_SYSLET_ABI_H
+
+struct syslet_frame {
+	u64 ip;
+	u64 sp;
+};
+
+#endif
diff --git a/include/asm-x86/syslet.h b/include/asm-x86/syslet.h
index 583d810..75e4532 100644
--- a/include/asm-x86/syslet.h
+++ b/include/asm-x86/syslet.h
@@ -1 +1,32 @@
-#include <asm-generic/syslet.h>
+#ifndef __ASM_X86_SYSLET_H
+#define __ASM_X86_SYSLET_H
+
+#include "syslet-abi.h"
+
+/* Provided by kernel/entry_{32,64}.S and kernel/process_{32,64}.c */
+void move_user_context(struct task_struct *dest, struct task_struct *src);
+int create_syslet_thread(long (*fn)(void *),
+			 void *arg, unsigned long flags);
+
+static inline int syslet_frame_valid(struct syslet_frame *frame)
+{
+	return frame->ip && frame->sp;
+}
+
+#ifdef CONFIG_X86_32
+static inline void set_user_frame(struct task_struct *task,
+				  struct syslet_frame *frame)
+{
+	task_pt_regs(task)->eip = frame->ip;
+	task_pt_regs(task)->esp = frame->sp;
+}
+#else
+static inline void set_user_frame(struct task_struct *task,
+				  struct syslet_frame *frame)
+{
+	task_pt_regs(task)->rip = frame->ip;
+	task_pt_regs(task)->rsp = frame->sp;
+}
+#endif
+
+#endif
diff --git a/include/asm-x86/unistd_32.h b/include/asm-x86/unistd_32.h
index 8ee0b20..0857c4d 100644
--- a/include/asm-x86/unistd_32.h
+++ b/include/asm-x86/unistd_32.h
@@ -331,10 +331,11 @@
 #define __NR_eventfd		323
 #define __NR_fallocate		324
 #define __NR_indirect		325
+#define __NR_syslet_ring_wait	326
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 326
+#define NR_syscalls 327
 
 #define __ARCH_WANT_IPC_PARSE_VERSION
 #define __ARCH_WANT_OLD_READDIR
diff --git a/include/asm-x86/unistd_64.h b/include/asm-x86/unistd_64.h
index 66eab33..e01f5dc 100644
--- a/include/asm-x86/unistd_64.h
+++ b/include/asm-x86/unistd_64.h
@@ -636,7 +636,9 @@ __SYSCALL(__NR_eventfd, sys_eventfd)
 #define __NR_fallocate				285
 __SYSCALL(__NR_fallocate, sys_fallocate)
 #define __NR_indirect				286
-__SYSCALL(__NR_indirect, sys_indirect)
+__SYSCALL(__NR_indirect, stub_indirect)
+#define __NR_syslet_ring_wait			287
+__SYSCALL(__NR_syslet_ring_wait, sys_syslet_ring_wait)
 
 #ifndef __NO_STUBS
 #define __ARCH_WANT_OLD_READDIR
-- 
1.5.2.2

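The commit message notes that move_user_context() copies the whole FPU state through a per-CPU temporary on 32-bit, and that once the i387 state is allocated separately the code can just juggle pointers. The sketch below shows what that pointer swap could look like. struct demo_task, struct demo_fpu and struct demo_regs are hypothetical stand-ins for task_struct, union i387_union and struct pt_regs; 512 bytes is the size of the FXSAVE area.

```c
#include <assert.h>

/* Hypothetical stand-ins for union i387_union and struct pt_regs. */
struct demo_fpu {
	unsigned char state[512];   /* size of the FXSAVE area */
};

struct demo_regs {
	unsigned long ip, sp, ax;
};

/* Hypothetical task: the FPU state is reached through a pointer instead of
 * being embedded, which is the change the commit message anticipates. */
struct demo_task {
	struct demo_regs regs;
	struct demo_fpu *fpu;
};

/* Hand src's user context to dest. Neither task may be running user code.
 * The registers are copied; the FPU buffers are swapped by pointer, so no
 * 512-byte copies and no per-CPU temporary are needed. */
static void demo_move_user_context(struct demo_task *dest,
				   struct demo_task *src)
{
	struct demo_fpu *tmp = dest->fpu;

	assert(dest != src);
	dest->regs = src->regs;
	dest->fpu = src->fpu;
	src->fpu = tmp;
}
```

Swapping, rather than just assigning src's pointer to dest, keeps the behaviour of the existing 32-bit code: each task still owns exactly one FPU buffer, and src ends up with dest's old state.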


* Re: [PATCH 5/6] syslets: add generic syslets infrastructure
  2007-12-06 23:20         ` [PATCH 5/6] syslets: add generic syslets infrastructure Zach Brown
  2007-12-06 23:20           ` [PATCH 6/6] syslets: add both 32bit and 64bit x86 syslet support Zach Brown
@ 2007-12-07 11:55           ` Evgeniy Polyakov
  2007-12-07 18:24             ` Zach Brown
  2008-01-09  2:03           ` Rusty Russell
  2 siblings, 1 reply; 26+ messages in thread
From: Evgeniy Polyakov @ 2007-12-07 11:55 UTC (permalink / raw)
  To: Zach Brown
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Ulrich Drepper,
	Arjan van de Ven, Andrew Morton, Alan Cox, David S. Miller,
	Suparna Bhattacharya, Davide Libenzi, Jens Axboe,
	Thomas Gleixner, Dan Williams, Jeff Moyer, Simon Holm Thogersen,
	suresh.b.siddha

Hi Zach.

On Thu, Dec 06, 2007 at 03:20:18PM -0800, Zach Brown (zach.brown@oracle.com) wrote:
> +/*
> + * XXX todo:
> + *  - do we need all this '*cur = current' nonsense?
> + *  - try to prevent userspace from submitting too much.. lazy user ptr read?
> + *  - explain how to deal with waiting threads with stale data in current
> + *  - how does userspace tell that a syslet completion was lost?
> + *  	provide an -errno argument to the userspace return function?
> + */
> +
> +/*
> + * These structs are stored on the kernel stack of tasks which are waiting to
> + * return to userspace.  They are linked into their parent's list of syslet
> + * children stored in 'syslet_tasks' in the parent's task_struct.
> + */
> +struct syslet_task_entry {
> +	struct task_struct *task;
> +	struct list_head item;
> +};
> +
> +/*
> + * syslet_ring doesn't have any kernel-side storage.  Userspace allocates them
> + * in their address space and initializes their fields and then passes them to
> + * the kernel.
> + *
> + * These hashes provide the kernel-side storage for the wait queues which
> + * sys_syslet_ring_wait() uses and the mutex which completion uses to serialize
> + * the (possibly blocking) ordered writes of the completion and kernel head
> + * index into the ring.
> + *
> + * We choose the bucket that supports a given ring by hashing a u32 that
> + * userspace sets in the ring.
> + */
> +#define SYSLET_HASH_BITS (CONFIG_BASE_SMALL ? 4 : 8)
> +#define SYSLET_HASH_NR (1 << SYSLET_HASH_BITS)
> +#define SYSLET_HASH_MASK (SYSLET_HASH_NR - 1)
> +static wait_queue_head_t syslet_waitqs[SYSLET_HASH_NR];
> +static struct mutex syslet_muts[SYSLET_HASH_NR];

If you care about scalability, why use hash tables and not trees?

-- 
	Evgeniy Polyakov


* Re: [PATCH 5/6] syslets: add generic syslets infrastructure
  2007-12-07 11:55           ` [PATCH 5/6] syslets: add generic syslets infrastructure Evgeniy Polyakov
@ 2007-12-07 18:24             ` Zach Brown
  0 siblings, 0 replies; 26+ messages in thread
From: Zach Brown @ 2007-12-07 18:24 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Ulrich Drepper,
	Arjan van de Ven, Andrew Morton, Alan Cox, David S. Miller,
	Suparna Bhattacharya, Davide Libenzi, Jens Axboe,
	Thomas Gleixner, Dan Williams, Jeff Moyer, Simon Holm Thogersen,
	suresh.b.siddha


>> +/*
>> + * syslet_ring doesn't have any kernel-side storage.  Userspace allocates them
>> + * in their address space and initializes their fields and then passes them to
>> + * the kernel.
>> + *
>> + * These hashes provide the kernel-side storage for the wait queues which
>> + * sys_syslet_ring_wait() uses and the mutex which completion uses to serialize
>> + * the (possibly blocking) ordered writes of the completion and kernel head
>> + * index into the ring.
>> + *
>> + * We choose the bucket that supports a given ring by hashing a u32 that
>> + * userspace sets in the ring.
>> + */
>> +#define SYSLET_HASH_BITS (CONFIG_BASE_SMALL ? 4 : 8)
>> +#define SYSLET_HASH_NR (1 << SYSLET_HASH_BITS)
>> +#define SYSLET_HASH_MASK (SYSLET_HASH_NR - 1)
>> +static wait_queue_head_t syslet_waitqs[SYSLET_HASH_NR];
>> +static struct mutex syslet_muts[SYSLET_HASH_NR];
> 
> If you care about scalability, why use hash tables and not trees?

Well, this notion of letting tasks safely complete to any ring they can
address is just a possibility.  We might decide that it's not worth it.
This implementation was an easy example that borrows from the way
futexes do similar work.

I like it because you could have, say, different processes completing
into a ring in shared memory.

If we do allow this kind of flexible ring specification, it's not at all
clear that trees would be the best way to address the scalability
limits.  There are lots of possibilities, including locking the page
lock of the page which holds the head index.

- z

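The hashed wait queues and mutexes discussed above map each user-allocated ring onto one of SYSLET_HASH_NR buckets by hashing a u32 stored in the ring. ring_waitqueue() itself is not shown in this part of the thread, so the sketch below is only an assumption about how such a bucket could be picked: a multiplicative hash with the 32-bit golden-ratio constant that keeps the top bits, similar in spirit to the kernel's hash_long(). demo_bucket() and the DEMO_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_HASH_BITS 8                      /* the patch uses 4 with CONFIG_BASE_SMALL */
#define DEMO_HASH_NR   (1u << DEMO_HASH_BITS)

/* Multiplicative hash: multiply by 2^32 / golden ratio and keep the top
 * DEMO_HASH_BITS bits, which depend on every bit of the key. */
static unsigned int demo_bucket(uint32_t key)
{
	return (uint32_t)(key * UINT32_C(0x9e3779b9)) >> (32 - DEMO_HASH_BITS);
}
```

Because unrelated rings can land in the same bucket, a waiter can be woken for someone else's completion; that is one reason sys_syslet_ring_wait() re-reads kernel_head in a loop rather than trusting a single wake-up.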

* Re: [PATCH 1/6] indirect: use asmlinkage in i386 syscall table prototype
  2007-12-06 23:20 ` [PATCH 1/6] indirect: use asmlinkage in i386 syscall table prototype Zach Brown
  2007-12-06 23:20   ` [PATCH 2/6] syslet: asm-generic support to disable syslets Zach Brown
@ 2007-12-08 12:40   ` Simon Holm Thøgersen
  2007-12-08 21:22     ` Zach Brown
  1 sibling, 1 reply; 26+ messages in thread
From: Simon Holm Thøgersen @ 2007-12-08 12:40 UTC (permalink / raw)
  To: Zach Brown
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Ulrich Drepper,
	Arjan van de Ven, Andrew Morton, Alan Cox, Evgeniy Polyakov,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner, Dan Williams, Jeff Moyer,
	suresh.b.siddha


On Thu, 06 Dec 2007 at 15:20 -0800, Zach Brown wrote:
> call_indirect() was using the wrong calling convention for the system call
> handlers.  system call handlers would get mixed up arguments.
> 
> Signed-off-by: Zach Brown <zach.brown@oracle.com>
> 
> diff --git a/include/asm-x86/indirect_32.h b/include/asm-x86/indirect_32.h
> index a1b72ac..e3dea8e 100644
> --- a/include/asm-x86/indirect_32.h
> +++ b/include/asm-x86/indirect_32.h
> @@ -15,8 +15,8 @@ struct indirect_registers {
>  
>  static inline long call_indirect(struct indirect_registers *regs)
>  {
> -  extern long (*sys_call_table[]) (__u32, __u32, __u32, __u32, __u32, __u32);
> -
> +	extern asmlinkage long (*sys_call_table[])(long, long, long,
This should be something like the line below instead; otherwise gcc won't
parse asmlinkage as an attribute of the function signature.
	extern long (asmlinkage *sys_call_table[])(long, long, long,
I don't know whether this has changed in recent gcc versions, but this
works for me with 4.2.0.
> +						   long, long, long);
>    return sys_call_table[INDIRECT_SYSCALL(regs)](regs->ebx, regs->ecx,
>  						regs->edx, regs->esi,
>  						regs->edi, regs->ebp);


Simon Holm Thøgersen


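Simon's point is about where the attribute goes in a declaration of an array of function pointers. On i386 the kernel is built with -mregparm=3 and asmlinkage expands to __attribute__((regparm(0))), so calls through sys_call_table must use the stack-argument convention the handlers expect. The sketch below puts the attribute inside the declarator's parentheses, as Simon suggests, so that it describes the pointed-to function type. It defines asmlinkage as empty on other targets so it builds anywhere; the handlers and table are hypothetical.

```c
#include <assert.h>

/* asmlinkage as the i386 kernel defines it; a no-op elsewhere so this
 * sketch also builds on other targets. */
#if defined(__i386__)
#define asmlinkage __attribute__((regparm(0)))
#else
#define asmlinkage
#endif

/* Two hypothetical six-argument handlers. */
static asmlinkage long demo_sum(long a, long b, long c, long d, long e, long f)
{
	return a + b + c + d + e + f;
}

static asmlinkage long demo_first(long a, long b, long c, long d, long e,
				  long f)
{
	(void)b; (void)c; (void)d; (void)e; (void)f;
	return a;
}

/* The attribute sits inside the declarator, next to the '*', so it applies to
 * the function type the array elements point to. */
static long (asmlinkage *demo_call_table[])(long, long, long,
					     long, long, long) = {
	demo_sum,
	demo_first,
};

/* Analogue of call_indirect(): dispatch by number with six arguments. */
static long demo_call_indirect(unsigned int nr, const long args[6])
{
	assert(nr < sizeof(demo_call_table) / sizeof(demo_call_table[0]));
	return demo_call_table[nr](args[0], args[1], args[2],
				   args[3], args[4], args[5]);
}
```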

* [PATCH] Fix casting on architectures with 32-bit pointers/longs.
  2007-12-06 23:20 syslets v7: back to basics Zach Brown
  2007-12-06 23:20 ` [PATCH 1/6] indirect: use asmlinkage in i386 syscall table prototype Zach Brown
@ 2007-12-08 12:52 ` Simon Holm Thøgersen
  2007-12-10 19:46 ` syslets v7: back to basics Jens Axboe
  2007-12-10 21:30 ` Phillip Susi
  3 siblings, 0 replies; 26+ messages in thread
From: Simon Holm Thøgersen @ 2007-12-08 12:52 UTC (permalink / raw)
  To: Zach Brown
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Ulrich Drepper,
	Arjan van de Ven, Andrew Morton, Alan Cox, Evgeniy Polyakov,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner, Dan Williams, Jeff Moyer,
	suresh.b.siddha


On Thu, 06 Dec 2007 at 15:20 -0800, Zach Brown wrote:
> The following patches are a substantial refactoring of the syslet code.  I'm
> branding them as the v7 release of the syslet infrastructure, though they
> represent a significant change in focus.
> 
> My current focus is to see the most fundamental functionality brought to
> maturity.  To me, this means getting an ABI that is used by applications through
> glibc on x86 and PPC64.   Only once that is ready should we distract ourselves
> with advanced complexity.
> 
> To that end, this patch series differs from v6 in significant ways:
> 
>  * syslets are initiated by providing syslet arguments to sys_indirect().
> 
>  * uatoms, threadlets, and the kaio changes are postponed until they can be
>    justified and rebuilt on more complete infrastructure.  (I'm not saying
>    these shouldn't or won't be pursued.  I'm saying that we should get the
>    simplest piece working first.)
> 
>  * the code is clarified and commented, the patches are bisectable and pass
>    checkpatch.
> 
> The use of sys_indirect() and the move from 'atom's simplified the ABI
> considerably.  I've put a trivial example in a syslet-userspace git tree:
> 
>     git://git.kernel.org/pub/scm/linux/kernel/git/zab/syslets-userspace.git
> 

Signed-off-by: Simon Holm Thøgersen <odie@cs.aau.dk>
---

diff --git a/basic.c b/basic.c
index 418a1a3..5938d85 100644
--- a/basic.c
+++ b/basic.c
@@ -42,7 +42,7 @@ int main(int argc, char **argv)
 	params.syslet.frame.sp = (u64)(long)memalign(pagesize, pagesize);
 
 	memset(&params, 0, sizeof(params));
-	params.syslet.frame.ip = (u64)syslet_return_func;
+	params.syslet.frame.ip = (u64)(long)syslet_return_func;
 	params.syslet.frame.sp = (u64)(long)memalign(pagesize, pagesize);
 	params.syslet.ring_ptr = (u64)(long)ring;
 
@@ -55,7 +55,7 @@ int main(int argc, char **argv)
 			pid, my_pid);
 	}
 
-	params.syslet.frame.ip = (u64)syslet_return_func;
+	params.syslet.frame.ip = (u64)(long)syslet_return_func;
 	params.syslet.frame.sp = (u64)(long)memalign(pagesize, pagesize);
 	params.syslet.ring_ptr = (u64)(long)ring;
 	params.syslet.caller_data = CALLER_DATA;


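Simon's fix silences the pointer-to-integer size warning on 32-bit targets by casting through long before widening to u64. A variant worth considering goes through uintptr_t instead: on a 32-bit target, (u64)(long)p sign-extends any address at or above 2 GiB (a stack address such as 0xbf800000 becomes 0xffffffffbf800000), while (u64)(uintptr_t)p zero-extends. Whether the sign extension matters depends on how the kernel consumes the value, which this thread does not say. The helper names below are hypothetical, struct demo_syslet_frame mirrors the ip/sp fields of struct syslet_frame from the x86 patch, and the return function is assumed to take no arguments.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

/* Mirrors the ip/sp fields of struct syslet_frame (include/asm-x86/syslet-abi.h). */
struct demo_syslet_frame {
	u64 ip;
	u64 sp;
};

/* Widen an object pointer to u64; uintptr_t zero-extends on 32-bit targets. */
static u64 demo_ptr_to_u64(const void *p)
{
	return (u64)(uintptr_t)p;
}

/* Widen a function pointer. C leaves this conversion implementation-defined;
 * gcc keeps the address bits unchanged. */
static u64 demo_fn_to_u64(void (*fn)(void))
{
	return (u64)(uintptr_t)fn;
}

/* Fill a frame. Where sp should point within the allocated stack is left to
 * the caller, since the thread does not spell it out. */
static void demo_fill_frame(struct demo_syslet_frame *frame,
			    void (*ret)(void), void *sp)
{
	frame->ip = demo_fn_to_u64(ret);
	frame->sp = demo_ptr_to_u64(sp);
}
```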


* Re: [PATCH 1/6] indirect: use asmlinkage in i386 syscall table prototype
  2007-12-08 12:40   ` [PATCH 1/6] indirect: use asmlinkage in i386 syscall table prototype Simon Holm Thøgersen
@ 2007-12-08 21:22     ` Zach Brown
  0 siblings, 0 replies; 26+ messages in thread
From: Zach Brown @ 2007-12-08 21:22 UTC (permalink / raw)
  To: Simon Holm Thøgersen
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Ulrich Drepper,
	Arjan van de Ven, Andrew Morton, Alan Cox, Evgeniy Polyakov,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner, Dan Williams, Jeff Moyer,
	suresh.b.siddha


>> +	extern asmlinkage long (*sys_call_table[])(long, long, long,
> This should be something like below instead, otherwise gcc won't parse
> asmlinkage as being an attribute of the function signature.
> 	extern long (asmlinkage *sys_call_table[])(long, long, long,

Yeah, thanks for pointing these out.  As it happened, Jens beat you to
it :).

Both problems have been fixed in the git repositories for the guilt
series and userspace tools, respectively.  You can always fetch the most
recent versions from there.

- z


* Re: syslets v7: back to basics
  2007-12-06 23:20 syslets v7: back to basics Zach Brown
  2007-12-06 23:20 ` [PATCH 1/6] indirect: use asmlinkage in i386 syscall table prototype Zach Brown
  2007-12-08 12:52 ` [PATCH] Fix casting on architectures with 32-bit pointers/longs Simon Holm Thøgersen
@ 2007-12-10 19:46 ` Jens Axboe
  2007-12-10 21:30 ` Phillip Susi
  3 siblings, 0 replies; 26+ messages in thread
From: Jens Axboe @ 2007-12-10 19:46 UTC (permalink / raw)
  To: Zach Brown
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Ulrich Drepper,
	Arjan van de Ven, Andrew Morton, Alan Cox, Evgeniy Polyakov,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Thomas Gleixner, Dan Williams, Jeff Moyer, Simon Holm Thogersen,
	suresh.b.siddha

On Thu, Dec 06 2007, Zach Brown wrote:
> 
> The following patches are a substantial refactoring of the syslet code.  I'm
> branding them as the v7 release of the syslet infrastructure, though they
> represent a significant change in focus.

[snip]

In case anyone is interested in playing with this, I updated the fio
syslet engine to this interface. Not very well tested yet, but it seems
to work.

I have to say that I like the new interface a lot better, it's a lot
simpler to work with. I was able to cut about 33% of the code that
handles queueing IO and retrieving events (maybe even more, summed up
loosely). I'm still not a big fan of the indirect stuff, but that's
minor.

You can get fio by doing a

$ git clone git://git.kernel.dk/fio.git

or just grabbing the latest snapshot from

http://brick.kernel.dk/snaps/fio-git-latest.tar.gz

-- 
Jens Axboe



* Re: syslets v7: back to basics
  2007-12-06 23:20 syslets v7: back to basics Zach Brown
                   ` (2 preceding siblings ...)
  2007-12-10 19:46 ` syslets v7: back to basics Jens Axboe
@ 2007-12-10 21:30 ` Phillip Susi
  2007-12-10 22:15   ` Zach Brown
  3 siblings, 1 reply; 26+ messages in thread
From: Phillip Susi @ 2007-12-10 21:30 UTC (permalink / raw)
  To: Zach Brown
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Ulrich Drepper,
	Arjan van de Ven, Andrew Morton, Alan Cox, Evgeniy.Polyakov

Zach Brown wrote:
> The following patches are a substantial refactoring of the syslet code.  I'm
> branding them as the v7 release of the syslet infrastructure, though they
> represent a significant change in focus.
> 
> My current focus is to see the most fundamental functionality brought to
> maturity.  To me, this means getting an ABI that is used by applications through
> glibc on x86 and PPC64.   Only once that is ready should we distract ourselves
> with advanced complexity.

I pulled from your tree to look over the patches, and noticed that it 
looks like several commits were merged improperly.  It looks like they 
were auto merged or something from an email, and the commit message 
contains the email headers, rather than just the commit message in the 
body.  This leads to the shortlog showing entries that start with 
"Return-Path:".

I was hoping to find at least some initial information on the overall 
design in Documentation/ but don't see any.  Have you written any yet 
that I could take a look at elsewhere maybe?

Some of the things I was trying to figure out are whether each syslet
gets its own stack and schedules only at a few well-defined points, and,
if so, whether it would then be fair to characterize them as kernel-mode
fibers.


* Re: syslets v7: back to basics
  2007-12-10 21:30 ` Phillip Susi
@ 2007-12-10 22:15   ` Zach Brown
  0 siblings, 0 replies; 26+ messages in thread
From: Zach Brown @ 2007-12-10 22:15 UTC (permalink / raw)
  To: Phillip Susi
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Ulrich Drepper,
	Arjan van de Ven, Andrew Morton, Alan Cox, Evgeniy Polyakov


> I pulled from your tree to look over the patches, and noticed that it
> looks like several commits were merged improperly.  It looks like they
> were auto merged or something from an email, and the commit message
> contains the email headers, rather than just the commit message in the
> body.  This leads to the shortlog showing entries that start with
> "Return-Path:".

These are patches that guilt imported from email messages.  It didn't
strip the headers and I didn't care to.  I'll try to in the future, it
isn't a big deal.

> I was hoping to find at least some initial information on the overall
> design in Documentation/ but don't see any.  Have you written any yet
> that I could take a look at elsewhere maybe?

No, but it's coming.  I'd like to have some robust documentation so that
Ulrich can help me understand what more he'd need to support POSIX AIO
with syslets from glibc.

> Some of the things I was trying to figure out is does each syslet get
> its own stack,

Yes.  Each blocking operation has a thread that is performing the
operation synchronously.  The benefit is that the thread is only created
if the operation blocks.  If it doesn't block then it's a normal system
call invocation.  You don't have to manage threads and communicate the
arguments and results of system calls amongst threads for the case where
it never blocks.

> and schedule only at a few well defined points

No, every blocking point is considered a scheduling point.

> , and if
> so, would it then be fair to characterize them as kernel mode fibers?

I'm not sure what exactly you mean by kernel mode fibers (I can guess,
but I'd rather not).  From the answer to the last question, though,
I'm going to guess that it might not be the most apt characterization.

- z



* Re: [PATCH 5/6] syslets: add generic syslets infrastructure
  2007-12-06 23:20         ` [PATCH 5/6] syslets: add generic syslets infrastructure Zach Brown
  2007-12-06 23:20           ` [PATCH 6/6] syslets: add both 32bit and 64bit x86 syslet support Zach Brown
  2007-12-07 11:55           ` [PATCH 5/6] syslets: add generic syslets infrastructure Evgeniy Polyakov
@ 2008-01-09  2:03           ` Rusty Russell
  2008-01-09  3:00             ` Zach Brown
  2 siblings, 1 reply; 26+ messages in thread
From: Rusty Russell @ 2008-01-09  2:03 UTC (permalink / raw)
  To: Zach Brown
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Ulrich Drepper,
	Arjan van de Ven, Andrew Morton, Alan Cox, Evgeniy Polyakov,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner, Dan Williams, Jeff Moyer,
	Simon Holm Thogersen, suresh.b.siddha

On Friday 07 December 2007 10:20:18 Zach Brown wrote:
> The indirect syslet arguments specify where to store the completion and
> what function in userspace to return to once the syslet has been executed.
> The details of how we pass the indirect syslet arguments needs help.

Hi Zach,

    Firstly, why not just specify an address for the return value and be done 
with it?  This infrastructure seems overkill, and you can always extend later 
if required.

Secondly, you really should allow integration with an eventfd so you don't 
make the posix AIO mistake of providing a poll-incompatible interface.

Finally, and probably most alarmingly, AFAICT randomly changing TID will break 
all threaded programs, which means this won't be fitted into existing code 
bases, making it YA niche Linux-only API 8(

Cheers,
Rusty.


* Re: [PATCH 5/6] syslets: add generic syslets infrastructure
  2008-01-09  2:03           ` Rusty Russell
@ 2008-01-09  3:00             ` Zach Brown
  2008-01-09  3:48               ` Rusty Russell
  0 siblings, 1 reply; 26+ messages in thread
From: Zach Brown @ 2008-01-09  3:00 UTC (permalink / raw)
  To: Rusty Russell
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Ulrich Drepper,
	Arjan van de Ven, Andrew Morton, Alan Cox, Evgeniy Polyakov,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner, Dan Williams, Jeff Moyer,
	Simon Holm Thogersen, suresh.b.siddha


>     Firstly, why not just specify an address for the return value and be done 
> with it?  This infrastructure seems overkill, and you can always extend later 
> if required.

Sorry, which infrastructure?

Providing the function and stack to return to?  Sure, I could certainly
entertain the idea of not having syslet tasks return to userspace in the
first pass.  Ingo sure seemed excited by the idea.

Or do you mean the syscall return value ending up in the userspace
completion event ring?  That's mostly about being able to wait for
pending syslets to complete.

> Secondly, you really should allow integration with an eventfd so you don't 
> make the posix AIO mistake of providing a poll-incompatible interface.

Yeah, this seems straightforward enough that I haven't made it an
initial priority.  I'm sure it will be helpful for people who are stuck
integrating with entrenched software that wants to wait for pollable fds.

For more flexible software, though, it's compelling to now be able to
aggregate waiting for completion of the existing waiting syscalls (poll,
epoll_wait, futexes, whatever) by issuing them as concurrent syslets.

> Finally, and probably most alarmingly, AFAICT randomly changing TID will break 
> all threaded programs, which means this won't be fitted into existing code 
> bases, making it YA niche Linux-only API 8(

Yeah, this still needs to be investigated.  I haven't yet and I haven't
heard of anyone else trying their hand at it.

In the YANLOA mode apps would know that executing syslets is an implicit
clone() and would act accordingly.  "8(", indeed.

I wonder if there isn't an opportunity to add a clone() flag which
juggles the association between TIDs and task_structs.  I don't relish
the idea of investigating the life cycles of task_struct references that
derive from TIDs and seeing how those would race with a syslet blocking
and cloning, but, well, maybe that's what needs to be done.

This all isn't my area of expertise, though, sadly.  It would be swell
if someone wanted to look into it before I'm forced to learn yet another
weird corner of the kernel.

- z


* Re: [PATCH 5/6] syslets: add generic syslets infrastructure
  2008-01-09  3:00             ` Zach Brown
@ 2008-01-09  3:48               ` Rusty Russell
  2008-01-09 18:16                 ` Zach Brown
  0 siblings, 1 reply; 26+ messages in thread
From: Rusty Russell @ 2008-01-09  3:48 UTC (permalink / raw)
  To: Zach Brown
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Ulrich Drepper,
	Arjan van de Ven, Andrew Morton, Alan Cox, Evgeniy Polyakov,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner, Dan Williams, Jeff Moyer,
	Simon Holm Thogersen, suresh.b.siddha

On Wednesday 09 January 2008 14:00:04 Zach Brown wrote:
> >     Firstly, why not just specify an address for the return value and be
> > done with it?  This infrastructure seems overkill, and you can always
> > extend later if required.
>
> Sorry, which infrastructure?
>
> Providing the function and stack to return to?  Sure, I could certainly
> entertain the idea of not having syslet tasks return to userspace in the
> first pass.  Ingo sure seemed excited by the idea.
>
> Or do you mean the syscall return value ending up in the userspace
> completion event ring?  That's mostly about being able to wait for
> pending syslets to complete.

The latter.  A ring is optimal for processing a huge number of requests, but 
if you're really going to be firing off syslet threads all over the place 
you're not going to be optimal anyway.  And being able to point the return 
value to the stack or into some datastructure is way nicer to code (zero 
setup == easy to understand and easy to convert).
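A minimal sketch of the per-request alternative described above, where the caller supplies the address that receives the return value (for example a variable on its own stack). The names `struct async_ret`, `async_ret_complete()` and `async_ret_poll()` are hypothetical and are not part of the syslet patches. Both sides run in userspace here, whereas in the proposal the completer would be the kernel; how a waiter actually sleeps is deliberately left out, since that is the separate notification question.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical per-request completion slot.  The caller can place it
 * anywhere (on its stack, inside a per-file struct, ...); there is no
 * ring or other shared completion context to set up first. */
struct async_ret {
	_Atomic int done;	/* 0 while pending, 1 once ret is valid */
	long ret;		/* the syscall's return value */
};

/* Completer side: write the value, then publish it with a release
 * store so a waiter that observes done == 1 also observes ret. */
static void async_ret_complete(struct async_ret *r, long value)
{
	r->ret = value;
	atomic_store_explicit(&r->done, 1, memory_order_release);
}

/* Waiter side: nonblocking check.  Returns 1 and stores the result in
 * *value once the request has completed, 0 while it is still pending. */
static int async_ret_poll(struct async_ret *r, long *value)
{
	if (!atomic_load_explicit(&r->done, memory_order_acquire))
		return 0;
	*value = r->ret;
	return 1;
}
```

The setup cost is one zero-initialized struct per request, which is the "zero setup" property claimed above; the trade-off is that a waiter with many outstanding requests has no single place to look for whichever one finished first.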

For notification, see below.

> > Secondly, you really should allow integration with an eventfd so you
> > don't make the posix AIO mistake of providing a poll-incompatible
> > interface.
>
> Yeah, this seems straightforward enough that I haven't made it an
> initial priority.  I'm sure it will be helpful for people who are stuck
> integrating with entrenched software that wants to wait for pollable fds.

Unfortunately, waiting for someone to write a killer app which uses your new 
API is the road to disappointment.  The real target is convincing the handful 
of important apps (Samba, Apache, ...) to #ifdef around some small piece of 
code in order to get performance.  And a mere single design wart could mean 
that never happens.  Look at epoll, it's probably been the most successful 
and it's still damn niche.

> For more flexible software, though, it's compelling to now be able to
> aggregate waiting for completion of the existing waiting syscalls (poll,
> epoll_wait, futexes, whatever) by issuing them as concurrent syslets.

Is replacing epoll with syslets really going to win, even if you're writing 
apps from scratch?  Anyway a fast notification mechanism is a different 
problem than syslets, and should be separated.

> > Finally, and probably most alarmingly, AFAICT randomly changing TID will
> > break all threaded programs, which means this won't be fitted into
> > existing code bases, making it YA niche Linux-only API 8(
>
> I wonder if there isn't an opportunity to add a clone() flag which
> juggles the association between TIDs and task_structs.  I don't relish
> the idea of investigating the life cycles of task_struct references that
> derive from TIDs and seeing how those would race with a syslet blocking
> and cloning, but, well, maybe that's what needs to be done.

This must be solved, yet all avenues seem crawling with worms.  Redirecting 
find_task_by_pid() to find the original and converting all the places where 
we return tids to userspace?  Swapping tids when we clone?  Duplicate tids, 
with only the non-syslet one being returned from find_task_by_pid()?

> This all isn't my area of expertise, though, sadly.  It would be swell
> if someone wanted to look into it before I'm forced to learn yet another
> weird corner of the kernel.

Let's just tell Ingo it's impossible to solve :)

Rusty.


* Re: [PATCH 5/6] syslets: add generic syslets infrastructure
  2008-01-09  3:48               ` Rusty Russell
@ 2008-01-09 18:16                 ` Zach Brown
  2008-01-09 22:04                   ` Rusty Russell
  2008-01-10  5:41                   ` Jeff Garzik
  0 siblings, 2 replies; 26+ messages in thread
From: Zach Brown @ 2008-01-09 18:16 UTC (permalink / raw)
  To: Rusty Russell
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Ulrich Drepper,
	Arjan van de Ven, Andrew Morton, Alan Cox, Evgeniy Polyakov,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner, Dan Williams, Jeff Moyer,
	Simon Holm Thogersen, suresh.b.siddha


>> Or do you mean the syscall return value ending up in the userspace
>> completion event ring?  That's mostly about being able to wait for
>> pending syslets to complete.
> 
> The latter.  A ring is optimal for processing a huge number of requests, but 
> if you're really going to be firing off syslet threads all over the place 
> you're not going to be optimal anyway.  And being able to point the return 
> value to the stack or into some datastructure is way nicer to code (zero 
> setup == easy to understand and easy to convert).

One of Linus' rhetorical requirements for the syslets work is that we be
able to wait for the result without spending overhead building up state
in some completion context.  The userland rings in the current syslet
patches achieve that and don't seem outrageously complicated.
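For comparison, a minimal sketch of a single-producer/single-consumer completion ring of the general kind mentioned above. The layout (`struct completion_ring`, a 64-entry power-of-two array, free-running `head`/`tail` counters) is an assumption for illustration and is not the ABI from the syslet patches. In the patches the kernel would be the producer; here both sides are ordinary userspace functions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 64	/* must be a power of two */

/* One completion event: which request finished and what it returned. */
struct completion {
	uint64_t cookie;	/* caller-chosen tag identifying the request */
	int64_t ret;		/* syscall return value */
};

/* head and tail run freely and are masked on access, so head - tail is
 * always the number of queued events, even across wraparound. */
struct completion_ring {
	_Atomic uint32_t head;	/* written only by the producer */
	_Atomic uint32_t tail;	/* written only by the consumer */
	struct completion ev[RING_SIZE];
};

/* Producer side: returns 0 on success, -1 if the ring is full. */
static int ring_post(struct completion_ring *r, uint64_t cookie, int64_t ret)
{
	uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head - tail == RING_SIZE)
		return -1;
	r->ev[head & (RING_SIZE - 1)] = (struct completion){ cookie, ret };
	/* Publish the slot only after its contents have been written. */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return 0;
}

/* Consumer side: returns 1 and fills *out if an event was queued, else 0. */
static int ring_reap(struct completion_ring *r, struct completion *out)
{
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (head == tail)
		return 0;
	*out = r->ev[tail & (RING_SIZE - 1)];
	/* Hand the slot back to the producer. */
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return 1;
}
```

Reaping costs a couple of atomic loads and one store per event, with no per-request state prepared in advance; the price is a fixed-size ring that userspace must size and drain before the producer fills it.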

I have a hard time getting worked up about this particular piece of the
puzzle, though.  If people are excited about *only* providing a pollable
fd to collect the syslet completions, well, great, whatever.

> This must be solved, yet all avenues seem crawling with worms.

Yup.

- z


* Re: [PATCH 5/6] syslets: add generic syslets infrastructure
  2008-01-09 18:16                 ` Zach Brown
@ 2008-01-09 22:04                   ` Rusty Russell
  2008-01-09 22:58                     ` Linus Torvalds
  2008-01-09 23:15                     ` Davide Libenzi
  2008-01-10  5:41                   ` Jeff Garzik
  1 sibling, 2 replies; 26+ messages in thread
From: Rusty Russell @ 2008-01-09 22:04 UTC (permalink / raw)
  To: Zach Brown
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Ulrich Drepper,
	Arjan van de Ven, Andrew Morton, Alan Cox, Evgeniy Polyakov,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner, Dan Williams, Jeff Moyer,
	Simon Holm Thogersen, suresh.b.siddha

On Thursday 10 January 2008 05:16:57 Zach Brown wrote:
> > The latter.  A ring is optimal for processing a huge number of requests,
> > but if you're really going to be firing off syslet threads all over the
> > place you're not going to be optimal anyway.  And being able to point the
> > return value to the stack or into some datastructure is way nicer to code
> > (zero setup == easy to understand and easy to convert).
>
> One of Linus' rhetorical requirements for the syslets work is that we be
> able to wait for the result without spending overhead building up state
> in some completion context.  The userland rings in the current syslet
> patches achieve that and don't seem outrageously complicated.

I'd have to read his original statement, but eventfd doesn't build up state, 
so I think it qualifies.

YA incompatible userspace notification system just doesn't excite me though.

Cheers,
Rusty.


* Re: [PATCH 5/6] syslets: add generic syslets infrastructure
  2008-01-09 22:04                   ` Rusty Russell
@ 2008-01-09 22:58                     ` Linus Torvalds
  2008-01-09 23:05                       ` Linus Torvalds
                                         ` (2 more replies)
  2008-01-09 23:15                     ` Davide Libenzi
  1 sibling, 3 replies; 26+ messages in thread
From: Linus Torvalds @ 2008-01-09 22:58 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Zach Brown, linux-kernel, Ingo Molnar, Ulrich Drepper,
	Arjan van de Ven, Andrew Morton, Alan Cox, Evgeniy Polyakov,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner, Dan Williams, Jeff Moyer,
	Simon Holm Thogersen, suresh.b.siddha



On Thu, 10 Jan 2008, Rusty Russell wrote:
> 
> I'd have to read his original statement, but eventfd doesn't build up state, 
> so I think it qualifies.

How about you guys battle it out by giving an example program using the 
interface?

Here's a favourite really simple load of mine:

 - do the equivalent of "ls -lR" or "find /usr" as quickly as possible, 
   without playing "sort by the inode numbers" games (that don't work in 
   general, but are great for some filesystems)

   Do this on a directory that isn't newly created, but has had files 
   added and removed over time (so that the return order of "readdir()" 
   isn't dense and sorted in the inode tables already). The classic 
   example is "ls -l /usr/bin" or similar.

This is actually a fairly hard load, because there are two easily tested 
cases that are both equally important: hot-cache (no IO at all, just CPU), 
and cold-cache (all about trying to get concurrent IO on inode lookups 
going, to get the disk elevator working in the *absence* of any sorting).

And almost all of the operations are operations that right now have no 
asynchronous model except for full threads (ie neither filename nor inode 
lookup have any aio_xyz() things).

How can we do something like *that*? It's about as simple an IO test 
program as you can imagine, while I'd argue that it is still a reasonably 
"realistic" thing to do, and interesting for asynchronous operations.

How would you do something like this, striving to allow overlap of IO, and 
getting (hopefully) the block layer to create bigger request sizes?

To make it simple, let's make the *only* operation we care about being 
asynchronous be that "lstat()". And instead of printing out each file, 
just add up the sizes or something (ie make this approximate "du -sh"). 

But the important thing is that if things are cached, it should be the 
same speed as this trivial program, ie the async interfaces shouldn't slow 
things down. But if things require IO, hopefully we can get at least some 
reasonable number of concurrent IOs going, so that the program doesn't 
seek back and forth quite so much.

Try this with and without a "echo 3 > /proc/sys/vm/drop_caches" in 
between. Both are real and relevant usage cases, I think.

And *simplicity* of the end result really does matter. If it's not simple 
to use, people won't use it.

		Linus

---
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <dirent.h>
#include <string.h>
#include <stdlib.h>
#include <stdio.h>

static void find(const char *base, int baselen);

static void handle(const char *name, int namelen)
{
	struct stat st;

	if (lstat(name, &st) < 0)
		return;
	/* st_size is an off_t; cast it so the format matches on every ABI. */
	printf("%8lld %s\n", (long long)st.st_size, name);
	if (S_ISDIR(st.st_mode))
		find(name, namelen);
}

static void find(const char *base, int baselen)
{
	DIR *dir;
	char *name = malloc(baselen + 255 + 2);

	if (!name)
		return;
	memcpy(name, base, baselen);
	name[baselen] = '/';
	dir = opendir(base);
	if (dir) {
		struct dirent *de;
		while ((de = readdir(dir)) != NULL) {
			const char *p = de->d_name;
			int len = strlen(p);
			if (len > 255)
				continue;
			if (p[0] == '.') {
				if (len == 1)
					continue;
				if (len == 2 && p[1] == '.')
					continue;
			}
			memcpy(name + baselen + 1, de->d_name, len+1);
			handle(name, baselen + 1 + len);
		}
		closedir(dir);
	}
	free(name);
}

int main(int argc, char **argv)
{
	find(".",1);
	return 0;
}
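Since, as noted above, full threads are currently the only asynchronous model for `lstat()`, here is a rough sketch of that threaded baseline for the "add up the sizes" variant. It assumes the paths have already been collected (for example from one directory's `readdir()` pass) and stripes them across a fixed number of POSIX threads. The names `lstat_sum_parallel()` and `lstat_worker()` are hypothetical; unlike the program above it does not recurse or print, so it only illustrates the kind of overlap a syslet interface would have to match or beat. Build with `-pthread`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Shared by all workers: one result slot per path, so no two threads
 * ever write the same slot and no locking is needed. */
struct lstat_batch {
	char **paths;
	long long *sizes;	/* st_size, or -1 if lstat() failed */
	int n;
	int nthreads;
};

struct lstat_worker_arg {
	struct lstat_batch *batch;
	int id;
};

static void *lstat_worker(void *p)
{
	struct lstat_worker_arg *w = p;
	struct lstat_batch *b = w->batch;
	struct stat st;

	/* Static striping: worker "id" takes paths id, id + nthreads, ... */
	for (int i = w->id; i < b->n; i += b->nthreads)
		b->sizes[i] = lstat(b->paths[i], &st) == 0 ?
			(long long)st.st_size : -1;
	return NULL;
}

/* Sum st_size over every path that lstat() succeeds on, issuing the
 * lstat() calls from nthreads threads so that cold-cache lookups can
 * overlap.  Returns -1 if allocation or thread creation fails. */
static long long lstat_sum_parallel(char **paths, int n, int nthreads)
{
	struct lstat_batch b;
	pthread_t *tids;
	struct lstat_worker_arg *args;
	long long total = 0;
	int started = 0, failed = 0;

	if (nthreads < 1)
		nthreads = 1;
	b.paths = paths;
	b.n = n;
	b.nthreads = nthreads;
	b.sizes = calloc(n > 0 ? n : 1, sizeof(*b.sizes));
	tids = calloc(nthreads, sizeof(*tids));
	args = calloc(nthreads, sizeof(*args));
	if (!b.sizes || !tids || !args)
		failed = 1;

	for (int t = 0; !failed && t < nthreads; t++) {
		args[t].batch = &b;
		args[t].id = t;
		if (pthread_create(&tids[t], NULL, lstat_worker, &args[t]) != 0)
			failed = 1;
		else
			started++;
	}
	/* Always join whatever was started, even after a failure. */
	for (int t = 0; t < started; t++)
		pthread_join(tids[t], NULL);

	for (int i = 0; !failed && i < n; i++)
		if (b.sizes[i] >= 0)
			total += b.sizes[i];

	free(b.sizes);
	free(tids);
	free(args);
	return failed ? -1 : total;
}
```

Whether this raises the effective IO queue depth on a cold cache depends on the filesystem and elevator; the sketch makes no claim about the timings discussed in this thread.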


* Re: [PATCH 5/6] syslets: add generic syslets infrastructure
  2008-01-09 22:58                     ` Linus Torvalds
@ 2008-01-09 23:05                       ` Linus Torvalds
  2008-01-09 23:47                       ` Zach Brown
  2008-01-10  1:18                       ` Rusty Russell
  2 siblings, 0 replies; 26+ messages in thread
From: Linus Torvalds @ 2008-01-09 23:05 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Zach Brown, linux-kernel, Ingo Molnar, Ulrich Drepper,
	Arjan van de Ven, Andrew Morton, Alan Cox, Evgeniy Polyakov,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner, Dan Williams, Jeff Moyer,
	Simon Holm Thogersen, suresh.b.siddha



On Wed, 9 Jan 2008, Linus Torvalds wrote:
> 
> Try this with and without a "echo 3 > /proc/sys/vm/drop_caches" in 
> between. Both are real and relevant usage cases, I think.

Side note, for me the difference on my home directory for the cached vs 
uncached case is 5 seconds vs 5 minutes. I like the 5 sec, I'd like to 
improve on that 5 min.

Is this test a bit *too* simplistic? Probably. But I think that if we can 
come up with an interface that works ok for that test, it at least signals 
*something*.

			Linus


* Re: [PATCH 5/6] syslets: add generic syslets infrastructure
  2008-01-09 22:04                   ` Rusty Russell
  2008-01-09 22:58                     ` Linus Torvalds
@ 2008-01-09 23:15                     ` Davide Libenzi
  1 sibling, 0 replies; 26+ messages in thread
From: Davide Libenzi @ 2008-01-09 23:15 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Zach Brown, Linux Kernel Mailing List, Linus Torvalds,
	Ingo Molnar, Ulrich Drepper, Arjan van de Ven, Andrew Morton,
	Alan Cox, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner, Dan Williams,
	Jeff Moyer, Simon Holm Thogersen, suresh.b.siddha

On Thu, 10 Jan 2008, Rusty Russell wrote:

> On Thursday 10 January 2008 05:16:57 Zach Brown wrote:
> > > The latter.  A ring is optimal for processing a huge number of requests,
> > > but if you're really going to be firing off syslet threads all over the
> > > place you're not going to be optimal anyway.  And being able to point the
> > > return value to the stack or into some datastructure is way nicer to code
> > > (zero setup == easy to understand and easy to convert).
> >
> > One of Linus' rhetorical requirements for the syslets work is that we be
> > able to wait for the result without spending overhead building up state
> > in some completion context.  The userland rings in the current syslet
> > patches achieve that and don't seem outrageously complicated.
> 
> I'd have to read his original statement, but eventfd doesn't build up state, 
> so I think it qualifies.

I think you and Zach are talking about different issues ;)
Eventfd was born for two different reasons. First, to let userspace 
signal an event to a poll/select/epoll-based listener. This can also be 
done with pipes, but eventfd has a few advantages over pipes (3-4 times 
faster and a *much* smaller memory footprint). Second, as a generic way 
for kernel subsystems to signal completions to a poll/select/epoll 
userspace listener. That is what the new KAIO eventfd feature uses (the 
patch was about 5 lines, IIRC).
This allows KAIO events to be signaled to poll/select/epoll quite easily, 
using a simple extension of the AIO API.
What we originally discussed with Ingo, when the first syslet code came up, 
was the ability to do the reverse: host an epoll_wait() inside a syslet, 
and gather the completion using whatever mechanism the syslet code 
was/is going to use for it.
Given that 1) the eventfd completion patch was trivial and immediately 
available, and 2) the future of the whole syslet concept was/is still 
unclear, I believe it made (and makes) sense. If syslets make it into 
mainline, userspace will be able to select the event-completion "hosting" 
method that best suits its needs: either AIO completion hosted inside an 
epoll_wait() via an eventfd, or an epoll_wait() hosted inside a syslet.
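A minimal sketch of the eventfd semantics described above: each 8-byte write adds to a kernel-held counter and makes the descriptor readable for poll()/select()/epoll, and a read (without `EFD_SEMAPHORE`) returns the accumulated count and resets it to zero. Completions are simulated here by writes from the same thread, whereas in the KAIO case the kernel does the signalling. `signal_completion()` and `wait_completions()` are hypothetical helper names; `eventfd()`, `poll()`, `read()` and `write()` are the real Linux/glibc calls.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Completion side: add one to the eventfd counter, which wakes any
 * poll/select/epoll waiter on the descriptor. */
static int signal_completion(int efd)
{
	uint64_t one = 1;

	return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Listener side: wait up to timeout_ms for the eventfd to become
 * readable, then read (and thereby reset) the counter.  Returns the
 * number of completions signalled since the last read, or 0 on
 * timeout or error. */
static uint64_t wait_completions(int efd, int timeout_ms)
{
	struct pollfd pfd = { .fd = efd, .events = POLLIN };
	uint64_t count;

	if (poll(&pfd, 1, timeout_ms) != 1 || !(pfd.revents & POLLIN))
		return 0;
	if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
		return 0;
	return count;
}
```

Because several completions collapse into one counter value, the listener learns how many events arrived but not which ones; it still has to collect the individual results from wherever the completion mechanism stores them.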




- Davide




* Re: [PATCH 5/6] syslets: add generic syslets infrastructure
  2008-01-09 22:58                     ` Linus Torvalds
  2008-01-09 23:05                       ` Linus Torvalds
@ 2008-01-09 23:47                       ` Zach Brown
  2008-01-10  1:18                       ` Rusty Russell
  2 siblings, 0 replies; 26+ messages in thread
From: Zach Brown @ 2008-01-09 23:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rusty Russell, linux-kernel, Ingo Molnar, Ulrich Drepper,
	Arjan van de Ven, Andrew Morton, Alan Cox, Evgeniy Polyakov,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner, Dan Williams, Jeff Moyer,
	Simon Holm Thogersen, suresh.b.siddha

Linus Torvalds wrote:
> 
> On Thu, 10 Jan 2008, Rusty Russell wrote:
>> I'd have to read his original statement, but eventfd doesn't build up state, 
>> so I think it qualifies.
> 
> How about you guys battle it out by giving an example program using the 
> interface?
> 
> Here's a favourite really simple load of mine:
> 
>  - do the equivalent of "ls -lR" or "find /usr" as quickly as possible, 
>    without playing "sort by the inode numbers" games (that don't work in 
>    general, but are great for some filesystems)
> 
>    Do this on a directory that isn't newly created, but has had files 
>    added and removed over time (so that the return order of "readdir()" 
>    isn't dense and sorted in the inode tables already). The classic 
>    example is "ls -l /usr/bin" or similar.

Sure, that's straightforward enough.  We've all written little test
apps for variants of this load in the past, anyway.  (It was one of the
first things I did for fibrils, Ingo had a variant for syslets which
read small file data too, Chris has a syslet mode in his 'acp' util, etc.)

I was going to send out a patch series pretty soon which includes
cleanups (I think) of the sys_indirect() infrastructure.  I can throw
together this little test app along with that.

- z


* Re: [PATCH 5/6] syslets: add generic syslets infrastructure
  2008-01-09 22:58                     ` Linus Torvalds
  2008-01-09 23:05                       ` Linus Torvalds
  2008-01-09 23:47                       ` Zach Brown
@ 2008-01-10  1:18                       ` Rusty Russell
  2 siblings, 0 replies; 26+ messages in thread
From: Rusty Russell @ 2008-01-10  1:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Zach Brown, linux-kernel, Ingo Molnar, Ulrich Drepper,
	Arjan van de Ven, Andrew Morton, Alan Cox, Evgeniy Polyakov,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner, Dan Williams, Jeff Moyer,
	Simon Holm Thogersen, suresh.b.siddha

On Thursday 10 January 2008 09:58:10 Linus Torvalds wrote:
> On Thu, 10 Jan 2008, Rusty Russell wrote:
> > I'd have to read his original statement, but eventfd doesn't build up
> > state, so I think it qualifies.
>
> How about you guys battle it out by giving an example program using the
> interface?

Nice idea.

> And *simplicity* of the end result really does matter. If it's not simple
> to use, people won't use it.

Completely agreed, but async is always more complex than sync.  

e.g. your malloc()-and-overwrite trick here assumes it's serial, and so do the 
stack vars.  Even before anything's happened we've massively increased the 
number of mallocs :(  Maybe someone else can see a neater way?

Below is an async-ready version (I've assumed you still want each dir grouped 
together).  It's already slower (best 0m3.842s vs best 0m3.659s):

#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <dirent.h>
#include <string.h>
#include <stdlib.h>
#include <stdio.h>

static void find(const char *base, int baselen);

struct result
{
	struct result *next;
	struct stat st;
	unsigned int namelen;
	char name[];
};

static struct result *new_result(const char *base, int baselen,
				 const char *sub, int sublen)
{
	struct result *r;

	r = malloc(sizeof(*r) + baselen + sublen + 2);
	if (!r)
		return NULL;
	memcpy(r->name, base, baselen);
	r->name[baselen] = '/';
	memcpy(r->name + baselen + 1, sub, sublen+1);
	r->namelen = baselen + sublen + 1;
	
	return r;
}

static void process(struct result *r, struct result **dirs)
{
	printf("%8lld %s\n", (long long)r->st.st_size, r->name);
	if (S_ISDIR(r->st.st_mode)) {
		r->next = *dirs;
		*dirs = r;
	} else 
		free(r);
}

/* -1 = fail, 0 = success, 1 = in progress. */
static int handle_async(struct result *r)
{
	/* Ours is sync. */
	return lstat(r->name, &r->st);
}

static void process_pending(struct result *pending, struct result **dirs)
{
	/* Since we're sync, pending will be NULL.  Otherwise call
	 * process() on each pending entry as it completes. */
}

static void find(const char *base, int baselen)
{
	DIR *dir;
	struct result *r, *pending = NULL, *dirs = NULL;

	dir = opendir(base);
	if (dir) {
		struct dirent *de;
		while ((de = readdir(dir)) != NULL) {
			const char *p = de->d_name;
			int len = strlen(p);
			if (p[0] == '.') {
				if (len == 1)
					continue;
				if (len == 2 && p[1] == '.')
					continue;
			}
			r = new_result(base, baselen, p, len);
			if (!r)
				continue;
			switch (handle_async(r)) {
			case 0:
				process(r, &dirs);
				break;
			case -1:
				free(r);
				break;
			case 1:
				r->next = pending;
				pending = r;
			}
		}
		closedir(dir);
		process_pending(pending, &dirs);
		while (dirs) {
			find(dirs->name, dirs->namelen);
			r = dirs;
			dirs = dirs->next;
			free(r);
		}
	}
}

int main(int argc, char **argv)
{
	find(".",1);
	return 0;
}



* Re: [PATCH 5/6] syslets: add generic syslets infrastructure
  2008-01-09 18:16                 ` Zach Brown
  2008-01-09 22:04                   ` Rusty Russell
@ 2008-01-10  5:41                   ` Jeff Garzik
  1 sibling, 0 replies; 26+ messages in thread
From: Jeff Garzik @ 2008-01-10  5:41 UTC (permalink / raw)
  To: Zach Brown
  Cc: Rusty Russell, linux-kernel, Linus Torvalds, Ingo Molnar,
	Ulrich Drepper, Arjan van de Ven, Andrew Morton, Alan Cox,
	Evgeniy Polyakov, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner, Dan Williams,
	Jeff Moyer, Simon Holm Thogersen, suresh.b.siddha

So my radical, ultra-tired rant of the week...

Rather than adding sys_indirect and syslets as is,

* admit that this is beginning to look like a new ABI.  explore the 
freedoms that that avenue opens...

* (even more radical)  I wonder what a tiny, SANE register-based 
bytecode interface might look like.  Have a single page shared between 
kernel and userland, for each thread.  Userland fills that page with 
bytecode, for a virtual machine with 256 registers -- where 
instructions roughly equate to syscalls.

The common case -- a single syscall like open(2) -- would be a single 
byte bytecode, plus a couple VM register stores.  The result is stored 
in another VM register.

But this format enables more complex cases, where userland programs can 
pass strings of syscalls into the kernel, and let them execute until 
some exceptional condition occurs.  Results would be stored in VM 
registers (or userland addresses stored in VM registers...).
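To make the idea above concrete, a toy userspace interpreter for such a register-based "syscall bytecode". Everything here (`struct insn`, `OP_SETI`, `OP_SYSCALL`, `OP_HALT`, `vm_run()`) is hypothetical: it uses a fixed-width instruction rather than the single-byte encoding suggested above, dispatches through glibc's generic `syscall()` wrapper instead of running in the kernel on a shared page, and models the "exceptional condition" as the first syscall that returns -1.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

enum op { OP_SETI, OP_SYSCALL, OP_HALT };

/* Hypothetical fixed-width instruction.  uint8_t register indices can
 * never index outside the 256-entry register file. */
struct insn {
	uint8_t op;
	uint8_t dst;		/* destination register */
	uint8_t a, b, c;	/* argument registers for OP_SYSCALL */
	long imm;		/* OP_SETI value, or OP_SYSCALL number */
};

struct vm {
	long reg[256];
	int err;		/* errno of the failing syscall, else 0 */
};

/* Run until OP_HALT, the end of the program, or a failing syscall.
 * Returns the index of the instruction where execution stopped. */
static int vm_run(struct vm *vm, const struct insn *prog, int len)
{
	int pc;

	vm->err = 0;
	for (pc = 0; pc < len; pc++) {
		const struct insn *i = &prog[pc];

		switch (i->op) {
		case OP_SETI:
			vm->reg[i->dst] = i->imm;
			break;
		case OP_SYSCALL:
			/* Like glibc's wrapper: -1 means failure, see errno. */
			vm->reg[i->dst] = syscall(i->imm, vm->reg[i->a],
						  vm->reg[i->b], vm->reg[i->c]);
			if (vm->reg[i->dst] == -1) {
				vm->err = errno;
				return pc;
			}
			break;
		case OP_HALT:
		default:
			return pc;
		}
	}
	return pc;
}
```

A single open(2)-style call is one `OP_SYSCALL` plus a few `OP_SETI` register loads, and a longer program keeps executing until something fails, which is the batching property described above; what this sketch cannot show is the kernel-side parallelism.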

This sort of interface would
* be fast

* equate to the current syscall regime (easy to get existing APIs 
going... hopefully equivalent to glibc switching to a strange new 
SYSENTER variant)

* be flexible enough to support a simple implementation today

* be flexible enough to enable experiments into syscall parallelism (aka 
VM instruction parallelism <grin>)

* be flexible enough to enable experiments into syscall batching

One would probably want to add some simple logic opcodes in addition to 
opcodes for syscalls and such -- but this should not turn into Forth or 
Parrot or Java :)

Thus, this new ABI can easily and immediately support all existing 
syscalls, while enabling

Now to come up with a good programming API and model(s) to match this 
parallel, batched kernel<->userland interface...

	Jeff,	very tired and delirious, so feel free to laugh at this,
		but I've been pondering this for a while






Thread overview: 26+ messages
2007-12-06 23:20 syslets v7: back to basics Zach Brown
2007-12-06 23:20 ` [PATCH 1/6] indirect: use asmlinkage in i386 syscall table prototype Zach Brown
2007-12-06 23:20   ` [PATCH 2/6] syslet: asm-generic support to disable syslets Zach Brown
2007-12-06 23:20     ` [PATCH 3/6] syslet: introduce abi structs Zach Brown
2007-12-06 23:20       ` [PATCH 4/6] syslets: add indirect args Zach Brown
2007-12-06 23:20         ` [PATCH 5/6] syslets: add generic syslets infrastructure Zach Brown
2007-12-06 23:20           ` [PATCH 6/6] syslets: add both 32bit and 64bit x86 syslet support Zach Brown
2007-12-07 11:55           ` [PATCH 5/6] syslets: add generic syslets infrastructure Evgeniy Polyakov
2007-12-07 18:24             ` Zach Brown
2008-01-09  2:03           ` Rusty Russell
2008-01-09  3:00             ` Zach Brown
2008-01-09  3:48               ` Rusty Russell
2008-01-09 18:16                 ` Zach Brown
2008-01-09 22:04                   ` Rusty Russell
2008-01-09 22:58                     ` Linus Torvalds
2008-01-09 23:05                       ` Linus Torvalds
2008-01-09 23:47                       ` Zach Brown
2008-01-10  1:18                       ` Rusty Russell
2008-01-09 23:15                     ` Davide Libenzi
2008-01-10  5:41                   ` Jeff Garzik
2007-12-08 12:40   ` [PATCH 1/6] indirect: use asmlinkage in i386 syscall table prototype Simon Holm Thøgersen
2007-12-08 21:22     ` Zach Brown
2007-12-08 12:52 ` [PATCH] Fix casting on architectures with 32-bit pointers/longs Simon Holm Thøgersen
2007-12-10 19:46 ` syslets v7: back to basics Jens Axboe
2007-12-10 21:30 ` Phillip Susi
2007-12-10 22:15   ` Zach Brown
