* [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
From: Ingo Molnar @ 2007-02-21 21:13 UTC
  To: linux-kernel
  Cc: Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

This is the v3 release of the syslet/threadlet subsystem:

   http://redhat.com/~mingo/syslet-patches/

This release came a few days later than I originally wanted, because
I implemented a number of fundamental changes to the code. The biggest
highlights of v3 are:

 - "Threadlets": the introduction of the 'threadlet' execution concept.

 - syslets: support for multiple completion rings with no kernel-side
   footprint, the elimination of mlock() pinning, no
   async_register()/unregister() calls needed anymore, and more.

"Threadlets" are basically the user-space equivalent of syslets: small 
functions of execution that the kernel attempts to execute without 
scheduling. If the threadlet blocks, the kernel creates a real thread 
from it, and execution continues in that thread. The 'head' context (the 
context that never blocks) returns to the original function that called 
the threadlet. Threadlets are very easy to use:

long my_threadlet_fn(void *data)
{
	char *name = data;
	struct stat statbuf;
	char buf[4096];
	int fd;

	fd = open(name, O_RDONLY);
	if (fd < 0)
		goto out;

	fstat(fd, &statbuf);
	read(fd, buf, sizeof(buf));
	...

out:
	return threadlet_complete();
}


int main(void)
{
	long done;

	done = threadlet_exec(my_threadlet_fn, new_stack, &user_head);
	if (!done)
		reqs_queued++;
}

There is no limitation whatsoever on what a threadlet function can
look like: it can use arbitrary system calls, and all execution is
procedural. No 'registration' is needed to run threadlets either: the
kernel takes care of all the details, so user-space simply runs a
threadlet without any preparation.

Completion of async threadlets can be handled from user-space via any
of the existing APIs: in threadlet-test.c (see the async-test-v3.tar.gz
user-space examples at the URL above) I used, for example, a futex
between the head and the async threads for threadlet notification. But
select(), poll() or signals can be used too - whichever is most
convenient to the application writer.
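
For illustration, a futex-based notification scheme could look like the
sketch below. This is not the actual threadlet-test.c code - the shared
completion counter and the raw-syscall usage are assumptions made for
this example:

#include <unistd.h>
#include <sys/syscall.h>
#include <linux/futex.h>

/* illustrative shared completion counter - not part of any ABI: */
static volatile int threadlets_done;

/* async thread side, called at the end of a threadlet function: */
static void notify_completion(void)
{
	__sync_fetch_and_add(&threadlets_done, 1);
	syscall(SYS_futex, &threadlets_done, FUTEX_WAKE, 1, NULL, NULL, 0);
}

/* head side: block until at least 'wanted' threadlets have completed */
static void wait_for_completions(int wanted)
{
	int seen;

	/* FUTEX_WAIT returns immediately if the counter has moved on: */
	while ((seen = threadlets_done) < wanted)
		syscall(SYS_futex, &threadlets_done, FUTEX_WAIT,
			seen, NULL, NULL, 0);
}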

Threadlets can also be thought of as 'optional threads': they execute in 
the original context as long as they do not block, but once they block, 
they are moved off into their separate thread context - and the original 
context can continue execution.

Threadlets can also be thought of as 'on-demand parallelism': user-space 
does not have to worry about setting up, sizing and feeding a thread 
pool - the kernel will execute the workload in a single-threaded manner 
as long as it makes sense, but once the context blocks, a parallel 
context is created. So parallelism inside applications is utilized in a 
natural way. (The kernel is the best place to make this decision -
user-space has no idea what level of parallelism is best at any
given moment.)
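
For example, a server could simply start one threadlet per request - a
sketch along the lines of the example above, where new_stack() is an
assumed user-space helper that allocates a fresh threadlet stack:

	/*
	 * One threadlet per request: requests that do not block
	 * complete synchronously at zero thread cost, only the
	 * ones that block get promoted to a real thread:
	 */
	for (i = 0; i < nr_requests; i++) {
		done = threadlet_exec(my_threadlet_fn, new_stack(),
				      &user_head);
		if (!done)
			reqs_queued++;
	}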

I believe this threadlet concept is what user-space will want to use for 
programmable parallelism.

[ Note that right now there is a pair of system calls, sys_threadlet_on()
  and sys_threadlet_off(), that demarcate the beginning and the end of a
  threadlet function; these enter the kernel even in the 'cached' case.
  My plan is to do these two system calls via a vsyscall, without having
  to enter the kernel at all. That will reduce the NULL-overhead of
  cached threadlet execution to around 10 nsecs - making it essentially
  zero. ]

Threadlets share much of the scheduling infrastructure with syslets.

Syslets (small, kernel-side, scripted "syscall plugins") are still 
supported - they are (much...) harder to program than threadlets but 
they allow the highest performance. Core infrastructure libraries like 
glibc/libaio are expected to use syslets. Jens Axboe's FIO tool already 
includes support for v2 syslets, and the following patch updates FIO to 
the v3 API:

   http://redhat.com/~mingo/syslet-patches/fio-syslet-v3.patch

Furthermore, the syslet code and API has been significantly enhanced as 
well:

 - support for multiple completion rings has been added

 - there is no more mlock()ing of the completion ring(s)

 - sys_async_register()/unregister() have been removed as they are no
   longer needed. sys_async_exec() can be called straight away.

 - async completion rings use up no kernel-side resources at all (all
   the state is in user-space), so an arbitrary number of completion
   rings is supported.

In addition, lots of bugs were fixed and a good number of cleanups were
made. Due to these fundamental changes, the v3 code is ABI-incompatible
with v2.
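
To illustrate the new setup model: because all ring state lives in
user-space, a completion ring is now just an application-supplied
structure. A minimal sketch against the v3 ABI (sys_async_exec() stands
for a raw syscall wrapper here, and the head/new-thread stack setup is
omitted):

#include <string.h>

static struct syslet_uatom *my_ring[1024];
static struct async_head_user my_head;

static void my_ring_init(void)
{
	memset(&my_head, 0, sizeof(my_head));
	my_head.completion_ring = my_ring;
	my_head.ring_size_bytes = sizeof(my_ring);
	/*
	 * head_stack/head_eip and new_thread_stack/new_thread_eip
	 * must also be filled in before the first submission.
	 */
}

An application can set up any number of such rings and pass whichever
one it likes to sys_async_exec() - no registration step is involved.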

As always, comments, suggestions, reports are welcome.

	Ingo


* [patch 01/13] syslets: add async.h include file, kernel-side API definitions
From: Ingo Molnar @ 2007-02-21 21:14 UTC
  To: linux-kernel
  Cc: Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

From: Ingo Molnar <mingo@elte.hu>

Add include/linux/async.h, which contains the kernel-side API
declarations.

It also provides NOP stubs for the !CONFIG_ASYNC_SUPPORT case.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 include/linux/async.h |   92 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 92 insertions(+)

Index: linux/include/linux/async.h
===================================================================
--- /dev/null
+++ linux/include/linux/async.h
@@ -0,0 +1,92 @@
+#ifndef _LINUX_ASYNC_H
+#define _LINUX_ASYNC_H
+
+#include <linux/completion.h>
+#include <linux/compiler.h>
+
+/*
+ * The syslet subsystem - asynchronous syscall execution support.
+ *
+ * Generic kernel API definitions:
+ */
+
+struct syslet_uatom;
+struct async_thread;
+struct async_head;
+
+/*
+ * The syslet subsystem - asynchronous syscall execution support.
+ *
+ * Syslet-subsystem internal definitions:
+ */
+
+/*
+ * The kernel-side copy of a syslet atom - with arguments expanded:
+ */
+struct syslet_atom {
+	unsigned long				flags;
+	unsigned long				nr;
+	long __user				*ret_ptr;
+	struct syslet_uatom	__user		*next;
+	unsigned long				args[6];
+};
+
+/*
+ * The 'async head' is the thread which has user-space context (ptregs)
+ * 'below it' - this is the one that can return to user-space:
+ */
+struct async_head {
+	spinlock_t				lock;
+	struct task_struct			*user_task;
+
+	struct list_head			ready_async_threads;
+	struct list_head			busy_async_threads;
+
+	struct mutex				completion_lock;
+	long					events_left;
+	wait_queue_head_t			wait;
+
+	struct async_head_user	__user		*ahu;
+
+	unsigned long		__user		*new_stackp;
+	unsigned long				new_eip;
+	unsigned long				restore_stack;
+	unsigned long				restore_eip;
+	struct completion			start_done;
+	struct completion			exit_done;
+};
+
+/*
+ * The 'async thread' is either a newly created async thread or it is
+ * an 'ex-head' - it cannot return to user-space and only has kernel
+ * context.
+ */
+struct async_thread {
+	struct task_struct			*task;
+	struct syslet_uatom	__user		*work;
+	unsigned long				user_stack;
+	unsigned long				user_eip;
+	struct async_head			*ah;
+
+	struct list_head			entry;
+
+	unsigned int				exit;
+};
+
+#ifdef CONFIG_ASYNC_SUPPORT
+extern void async_init(struct task_struct *t);
+extern void async_exit(struct task_struct *t);
+extern void __async_schedule(struct task_struct *t);
+#else /* !CONFIG_ASYNC_SUPPORT */
+static inline void async_init(struct task_struct *t)
+{
+}
+static inline void async_exit(struct task_struct *t)
+{
+}
+static inline void __async_schedule(struct task_struct *t)
+{
+}
+#endif /* !CONFIG_ASYNC_SUPPORT */
+
+#endif


* [patch 02/13] syslets: add syslet.h include file, user API/ABI definitions
From: Ingo Molnar @ 2007-02-21 21:15 UTC
  To: linux-kernel
  Cc: Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

From: Ingo Molnar <mingo@elte.hu>

Add include/linux/syslet.h, which contains the user-space API/ABI
declarations. Add the new header to include/linux/Kbuild as well.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 include/linux/Kbuild   |    1 
 include/linux/syslet.h |  155 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 156 insertions(+)

Index: linux/include/linux/Kbuild
===================================================================
--- linux.orig/include/linux/Kbuild
+++ linux/include/linux/Kbuild
@@ -140,6 +140,7 @@ header-y += sockios.h
 header-y += som.h
 header-y += sound.h
 header-y += synclink.h
+header-y += syslet.h
 header-y += telephony.h
 header-y += termios.h
 header-y += ticable.h
Index: linux/include/linux/syslet.h
===================================================================
--- /dev/null
+++ linux/include/linux/syslet.h
@@ -0,0 +1,155 @@
+#ifndef _LINUX_SYSLET_H
+#define _LINUX_SYSLET_H
+/*
+ * The syslet subsystem - asynchronous syscall execution support.
+ *
+ * Started by Ingo Molnar:
+ *
+ *  Copyright (C) 2007 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
+ *
+ * User-space API/ABI definitions:
+ */
+
+#ifndef __user
+# define __user
+#endif
+
+/*
+ * This is the 'Syslet Atom' - the basic unit of execution
+ * within the syslet framework. A syslet always represents
+ * a single system-call plus its arguments, and has conditions
+ * attached to it that allow the construction of larger
+ * programs from these atoms. User-space variables can be used
+ * (for example a loop index) via the special sys_umem*() syscalls.
+ *
+ * Arguments are passed in via pointers to user-space variables. This not
+ * only increases the flexibility of syslet atoms (multiple syslets
+ * can share the same variable for example), but is also an
+ * optimization: copy_uatom() will only fetch syscall parameters
+ * up until the point it meets the first NULL pointer. 50% of all
+ * syscalls have 2 or less parameters (and 90% of all syscalls have
+ * 4 or less parameters).
+ *
+ * [ Note: since the argument array is at the end of the atom, and the
+ *   kernel will not touch any argument beyond the final NULL one, atoms
+ *   might be packed more tightly. (the only special case exception to
+ *   this rule would be SKIP_TO_NEXT_ON_STOP atoms, where the kernel will
+ *   jump a full syslet_uatom number of bytes.) ]
+ */
+struct syslet_uatom {
+	unsigned long				flags;
+	unsigned long				nr;
+	long __user				*ret_ptr;
+	struct syslet_uatom	__user		*next;
+	unsigned long		__user		*arg_ptr[6];
+	/*
+	 * User-space can put anything in here, kernel will not
+	 * touch it:
+	 */
+	void __user				*private;
+};
+
+/*
+ * Flags to modify/control syslet atom behavior:
+ */
+
+/*
+ * Immediately queue this syslet asynchronously - do not even
+ * attempt to execute it synchronously in the user context:
+ */
+#define SYSLET_ASYNC				0x00000001
+
+/*
+ * Never queue this syslet asynchronously - even if synchronous
+ * execution causes a context-switch:
+ */
+#define SYSLET_SYNC				0x00000002
+
+/*
+ * Do not queue the syslet in the completion ring when done.
+ *
+ * ( the default is that the final atom of a syslet is queued
+ *   in the completion ring. )
+ *
+ * Some syscalls generate implicit completion events of their
+ * own.
+ */
+#define SYSLET_NO_COMPLETE			0x00000004
+
+/*
+ * Execution control: conditions upon the return code
+ * of the just executed syslet atom. 'Stop' means syslet
+ * execution is stopped and the atom is put into the
+ * completion ring:
+ */
+#define SYSLET_STOP_ON_NONZERO			0x00000008
+#define SYSLET_STOP_ON_ZERO			0x00000010
+#define SYSLET_STOP_ON_NEGATIVE			0x00000020
+#define SYSLET_STOP_ON_NON_POSITIVE		0x00000040
+
+#define SYSLET_STOP_MASK				\
+	(	SYSLET_STOP_ON_NONZERO		|	\
+		SYSLET_STOP_ON_ZERO		|	\
+		SYSLET_STOP_ON_NEGATIVE		|	\
+		SYSLET_STOP_ON_NON_POSITIVE		)
+
+/*
+ * Special modifier to 'stop' handling: instead of stopping the
+ * execution of the syslet, the linearly next atom is executed.
+ * (Normal execution flows along atom->next, and execution stops
+ *  if atom->next is NULL or a stop condition becomes true.)
+ *
+ * This is what allows true branches of execution within syslets.
+ */
+#define SYSLET_SKIP_TO_NEXT_ON_STOP		0x00000080
+
+/*
+ * This is the (per-user-context) descriptor of the async completion
+ * ring. This gets passed in to sys_async_exec():
+ */
+struct async_head_user {
+	/*
+	 * Current completion ring index - managed by the kernel:
+	 */
+	unsigned long				kernel_ring_idx;
+	/*
+	 * User-side ring index:
+	 */
+	unsigned long				user_ring_idx;
+
+	/*
+	 * Pointers to completed async syslets (i.e. syslets that
+	 * generated a cachemiss and went async, with sys_async_exec()
+	 * returning -EASYNCSYSLET to the user context) are queued here.
+	 * Syslets that were executed synchronously (cached) are not
+	 * queued here.
+	 *
+	 * Note: the final atom that generated the exit condition is
+	 * queued here. Normally this would be the last atom of a syslet.
+	 */
+	struct syslet_uatom __user		**completion_ring;
+
+	/*
+	 * Ring size in bytes:
+	 */
+	unsigned long				ring_size_bytes;
+
+	/*
+	 * The head task can become a cachemiss thread later on
+	 * too, if it blocks - so it needs its separate thread
+	 * stack and start address too:
+	 */
+	unsigned long				head_stack;
+	unsigned long				head_eip;
+
+	/*
+	 * Newly started async kernel threads will take their
+	 * user stack and user start address from here. User-space
+	 * code has to check for new_thread_stack going to NULL
+	 * and has to refill it with a new stack if that happens.
+	 */
+	unsigned long				new_thread_stack;
+	unsigned long				new_thread_eip;
+};
+
+#endif
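
As a usage illustration of the above ABI (not part of the patch), two
atoms might be chained into a single syslet like this - the second atom
picks up the first one's result via a shared user-space variable;
sys_async_exec() stands for a raw syscall wrapper and my_head for a
previously set up async_head_user:

/* all syscall arguments live in user-space variables: */
static const char *path = "/tmp/testfile";
static unsigned long o_flags = O_RDONLY;
static long fd;				/* open() result, read() argument */
static char buf[4096];
static unsigned long buf_ptr = (unsigned long)buf;
static unsigned long count = sizeof(buf);
static long read_ret;

static struct syslet_uatom read_atom;	/* tentative definition */

static struct syslet_uatom open_atom = {
	.nr	 = __NR_open,
	.ret_ptr = &fd,
	/* stop (and complete) the syslet right here if open() fails: */
	.flags	 = SYSLET_STOP_ON_NEGATIVE,
	.next	 = &read_atom,
	/* the casts assume pointer-sized longs, as on the ABI above: */
	.arg_ptr = { (unsigned long *)&path, &o_flags },
};

static struct syslet_uatom read_atom = {
	.nr	 = __NR_read,
	.ret_ptr = &read_ret,
	.next	 = NULL,		/* end of the syslet */
	.arg_ptr = { (unsigned long *)&fd, &buf_ptr, &count },
};

	/* submit - a NULL return means the syslet went async: */
	done = sys_async_exec(&open_atom, &my_head);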


* [patch 03/13] syslets: generic kernel bits
From: Ingo Molnar @ 2007-02-21 21:15 UTC
  To: linux-kernel
  Cc: Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

From: Ingo Molnar <mingo@elte.hu>

Add the generic kernel bits - these are present even if CONFIG_ASYNC_SUPPORT is not enabled.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 fs/exec.c             |    4 ++++
 include/linux/sched.h |   23 ++++++++++++++++++++++-
 kernel/capability.c   |    3 +++
 kernel/exit.c         |    7 +++++++
 kernel/fork.c         |    5 +++++
 kernel/sched.c        |    9 +++++++++
 kernel/sys.c          |   36 ++++++++++++++++++++++++++++++++++++
 7 files changed, 86 insertions(+), 1 deletion(-)

Index: linux/fs/exec.c
===================================================================
--- linux.orig/fs/exec.c
+++ linux/fs/exec.c
@@ -1446,6 +1446,10 @@ static int coredump_wait(int exit_code)
 		tsk->vfork_done = NULL;
 		complete(vfork_done);
 	}
+	/*
+	 * Make sure we exit our async context before waiting:
+	 */
+	async_exit(tsk);
 
 	if (core_waiters)
 		wait_for_completion(&startup_done);
Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -83,12 +83,12 @@ struct sched_param {
 #include <linux/timer.h>
 #include <linux/hrtimer.h>
 #include <linux/task_io_accounting.h>
+#include <linux/async.h>
 
 #include <asm/processor.h>
 
 struct exec_domain;
 struct futex_pi_state;
-
 /*
  * List of flags we want to share for kernel threads,
  * if only because they are not used by them anyway.
@@ -997,6 +997,12 @@ struct task_struct {
 /* journalling filesystem info */
 	void *journal_info;
 
+/* async syscall support: */
+	struct async_thread *at, *async_ready;
+	struct async_head *ah;
+	struct async_thread __at;
+	struct async_head __ah;
+
 /* VM state */
 	struct reclaim_state *reclaim_state;
 
@@ -1053,6 +1059,21 @@ struct task_struct {
 #endif
 };
 
+/*
+ * Is an async syscall being executed currently?
+ */
+#ifdef CONFIG_ASYNC_SUPPORT
+static inline int async_syscall(struct task_struct *t)
+{
+	return t->async_ready != NULL;
+}
+#else /* !CONFIG_ASYNC_SUPPORT */
+static inline int async_syscall(struct task_struct *t)
+{
+	return 0;
+}
+#endif /* !CONFIG_ASYNC_SUPPORT */
+
 static inline pid_t process_group(struct task_struct *tsk)
 {
 	return tsk->signal->pgrp;
Index: linux/kernel/capability.c
===================================================================
--- linux.orig/kernel/capability.c
+++ linux/kernel/capability.c
@@ -176,6 +176,9 @@ asmlinkage long sys_capset(cap_user_head
      int ret;
      pid_t pid;
 
+     if (async_syscall(current))
+             return -ENOSYS;
+
      if (get_user(version, &header->version))
 	     return -EFAULT; 
 
Index: linux/kernel/exit.c
===================================================================
--- linux.orig/kernel/exit.c
+++ linux/kernel/exit.c
@@ -26,6 +26,7 @@
 #include <linux/ptrace.h>
 #include <linux/profile.h>
 #include <linux/mount.h>
+#include <linux/async.h>
 #include <linux/proc_fs.h>
 #include <linux/mempolicy.h>
 #include <linux/taskstats_kern.h>
@@ -889,6 +890,12 @@ fastcall NORET_TYPE void do_exit(long co
 		schedule();
 	}
 
+	/*
+	 * Note: async threads have to exit their context before the MM
+	 * exit (due to the coredumping wait):
+	 */
+	async_exit(tsk);
+
 	tsk->flags |= PF_EXITING;
 
 	if (unlikely(in_atomic()))
Index: linux/kernel/fork.c
===================================================================
--- linux.orig/kernel/fork.c
+++ linux/kernel/fork.c
@@ -22,6 +22,7 @@
 #include <linux/personality.h>
 #include <linux/mempolicy.h>
 #include <linux/sem.h>
+#include <linux/async.h>
 #include <linux/file.h>
 #include <linux/key.h>
 #include <linux/binfmts.h>
@@ -1054,6 +1055,7 @@ static struct task_struct *copy_process(
 
 	p->lock_depth = -1;		/* -1 = no lock */
 	do_posix_clock_monotonic_gettime(&p->start_time);
+	async_init(p);
 	p->security = NULL;
 	p->io_context = NULL;
 	p->io_wait = NULL;
@@ -1621,6 +1623,9 @@ asmlinkage long sys_unshare(unsigned lon
 	struct uts_namespace *uts, *new_uts = NULL;
 	struct ipc_namespace *ipc, *new_ipc = NULL;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	check_unshare_flags(&unshare_flags);
 
 	/* Return -EINVAL for all unsupported flags */
Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -38,6 +38,7 @@
 #include <linux/vmalloc.h>
 #include <linux/blkdev.h>
 #include <linux/delay.h>
+#include <linux/async.h>
 #include <linux/smp.h>
 #include <linux/threads.h>
 #include <linux/timer.h>
@@ -3436,6 +3437,14 @@ asmlinkage void __sched schedule(void)
 	}
 	profile_hit(SCHED_PROFILING, __builtin_return_address(0));
 
+	prev = current;
+	if (unlikely(prev->async_ready)) {
+		if (prev->state && !(preempt_count() & PREEMPT_ACTIVE) &&
+			(!(prev->state & TASK_INTERRUPTIBLE) ||
+				!signal_pending(prev)))
+			__async_schedule(prev);
+	}
+
 need_resched:
 	preempt_disable();
 	prev = current;
Index: linux/kernel/sys.c
===================================================================
--- linux.orig/kernel/sys.c
+++ linux/kernel/sys.c
@@ -933,6 +933,9 @@ asmlinkage long sys_setregid(gid_t rgid,
 	int new_egid = old_egid;
 	int retval;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	retval = security_task_setgid(rgid, egid, (gid_t)-1, LSM_SETID_RE);
 	if (retval)
 		return retval;
@@ -979,6 +982,9 @@ asmlinkage long sys_setgid(gid_t gid)
 	int old_egid = current->egid;
 	int retval;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	retval = security_task_setgid(gid, (gid_t)-1, (gid_t)-1, LSM_SETID_ID);
 	if (retval)
 		return retval;
@@ -1049,6 +1055,9 @@ asmlinkage long sys_setreuid(uid_t ruid,
 	int old_ruid, old_euid, old_suid, new_ruid, new_euid;
 	int retval;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	retval = security_task_setuid(ruid, euid, (uid_t)-1, LSM_SETID_RE);
 	if (retval)
 		return retval;
@@ -1112,6 +1121,9 @@ asmlinkage long sys_setuid(uid_t uid)
 	int old_ruid, old_suid, new_suid;
 	int retval;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	retval = security_task_setuid(uid, (uid_t)-1, (uid_t)-1, LSM_SETID_ID);
 	if (retval)
 		return retval;
@@ -1152,6 +1164,9 @@ asmlinkage long sys_setresuid(uid_t ruid
 	int old_suid = current->suid;
 	int retval;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	retval = security_task_setuid(ruid, euid, suid, LSM_SETID_RES);
 	if (retval)
 		return retval;
@@ -1206,6 +1221,9 @@ asmlinkage long sys_setresgid(gid_t rgid
 {
 	int retval;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	retval = security_task_setgid(rgid, egid, sgid, LSM_SETID_RES);
 	if (retval)
 		return retval;
@@ -1261,6 +1279,9 @@ asmlinkage long sys_setfsuid(uid_t uid)
 {
 	int old_fsuid;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	old_fsuid = current->fsuid;
 	if (security_task_setuid(uid, (uid_t)-1, (uid_t)-1, LSM_SETID_FS))
 		return old_fsuid;
@@ -1290,6 +1311,9 @@ asmlinkage long sys_setfsgid(gid_t gid)
 {
 	int old_fsgid;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	old_fsgid = current->fsgid;
 	if (security_task_setgid(gid, (gid_t)-1, (gid_t)-1, LSM_SETID_FS))
 		return old_fsgid;
@@ -1365,6 +1389,9 @@ asmlinkage long sys_setpgid(pid_t pid, p
 	struct task_struct *group_leader = current->group_leader;
 	int err = -EINVAL;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	if (!pid)
 		pid = group_leader->pid;
 	if (!pgid)
@@ -1488,6 +1515,9 @@ asmlinkage long sys_setsid(void)
 	pid_t session;
 	int err = -EPERM;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	write_lock_irq(&tasklist_lock);
 
 	/* Fail if I am already a session leader */
@@ -1732,6 +1762,9 @@ asmlinkage long sys_setgroups(int gidset
 	struct group_info *group_info;
 	int retval;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	if (!capable(CAP_SETGID))
 		return -EPERM;
 	if ((unsigned)gidsetsize > NGROUPS_MAX)
@@ -2073,6 +2106,9 @@ asmlinkage long sys_prctl(int option, un
 {
 	long error;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	error = security_task_prctl(option, arg2, arg3, arg4, arg5);
 	if (error)
 		return error;


* [patch 04/13] syslets: core code
From: Ingo Molnar @ 2007-02-21 21:15 UTC
  To: linux-kernel
  Cc: Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

From: Ingo Molnar <mingo@elte.hu>

Add the core syslet / async syscall infrastructure code.

It is built only if CONFIG_ASYNC_SUPPORT is enabled.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 kernel/Makefile |    1 
 kernel/async.c  |  958 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 959 insertions(+)

Index: linux/kernel/Makefile
===================================================================
--- linux.orig/kernel/Makefile
+++ linux/kernel/Makefile
@@ -10,6 +10,7 @@ obj-y     = sched.o fork.o exec_domain.o
 	    kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o mutex.o \
 	    hrtimer.o rwsem.o latency.o nsproxy.o srcu.o
 
+obj-$(CONFIG_ASYNC_SUPPORT) += async.o
 obj-$(CONFIG_STACKTRACE) += stacktrace.o
 obj-y += time/
 obj-$(CONFIG_DEBUG_MUTEXES) += mutex-debug.o
Index: linux/kernel/async.c
===================================================================
--- /dev/null
+++ linux/kernel/async.c
@@ -0,0 +1,958 @@
+/*
+ * kernel/async.c
+ *
+ * The syslet and threadlet subsystem - asynchronous syscall and user-space
+ * code execution support.
+ *
+ * Started by Ingo Molnar:
+ *
+ *  Copyright (C) 2007 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
+ *
+ * This file is released under the GPLv2.
+ *
+ * This code implements asynchronous syscalls via 'syslets'.
+ *
+ * Syslets consist of a set of 'syslet atoms' which are residing
+ * purely in user-space memory and have no kernel-space resource
+ * attached to them. These atoms can be linked to each other via
+ * pointers. Besides the fundamental ability to execute system
+ * calls, syslet atoms can also implement branches, loops and
+ * arithmetics.
+ *
+ * Thus syslets can be used to build small autonomous programs that
+ * the kernel can execute purely from kernel-space, without having
+ * to return to any user-space context. Syslets can be run by any
+ * unprivileged user-space application - they are executed safely
+ * by the kernel.
+ *
+ * "Threadlets" are the user-space equivalent of syslets: small
+ * functions of execution that the kernel attempts to execute
+ * without scheduling. If the threadlet blocks, the kernel creates
+ * a real thread from it, and execution continues in that thread.
+ * The 'head' context (the context that never blocks) returns to
+ * the original function that called the threadlet.
+ */
+#include <linux/syscalls.h>
+#include <linux/syslet.h>
+#include <linux/delay.h>
+#include <linux/async.h>
+#include <linux/sched.h>
+#include <linux/init.h>
+#include <linux/err.h>
+
+#include <asm/uaccess.h>
+#include <asm/unistd.h>
+
+typedef asmlinkage long (*syscall_fn_t)(long, long, long, long, long, long);
+
+extern syscall_fn_t sys_call_table[NR_syscalls];
+
+/*
+ * An async 'cachemiss context' is either busy, or it is ready.
+ * If it is ready, the 'head' might switch its user-space context
+ * to that ready thread anytime - so that if the ex-head blocks,
+ * one ready thread can become the next head and can continue to
+ * execute user-space code.
+ */
+static void
+__mark_async_thread_ready(struct async_thread *at, struct async_head *ah)
+{
+	list_del(&at->entry);
+	list_add_tail(&at->entry, &ah->ready_async_threads);
+	if (list_empty(&ah->busy_async_threads))
+		wake_up(&ah->wait);
+}
+
+static void
+mark_async_thread_ready(struct async_thread *at, struct async_head *ah)
+{
+	spin_lock(&ah->lock);
+	__mark_async_thread_ready(at, ah);
+	spin_unlock(&ah->lock);
+}
+
+static void
+__mark_async_thread_busy(struct async_thread *at, struct async_head *ah)
+{
+	list_del(&at->entry);
+	list_add_tail(&at->entry, &ah->busy_async_threads);
+}
+
+static void
+mark_async_thread_busy(struct async_thread *at, struct async_head *ah)
+{
+	spin_lock(&ah->lock);
+	__mark_async_thread_busy(at, ah);
+	spin_unlock(&ah->lock);
+}
+
+static void
+__async_thread_init(struct task_struct *t, struct async_thread *at,
+		    struct async_head *ah)
+{
+	INIT_LIST_HEAD(&at->entry);
+	at->exit = 0;
+	at->task = t;
+	at->ah = ah;
+	at->work = NULL;
+
+	t->at = at;
+}
+
+static void
+async_thread_init(struct task_struct *t, struct async_thread *at,
+		  struct async_head *ah)
+{
+	spin_lock(&ah->lock);
+	__async_thread_init(t, at, ah);
+	__mark_async_thread_ready(at, ah);
+	spin_unlock(&ah->lock);
+}
+
+static void
+async_thread_exit(struct async_thread *at, struct task_struct *t)
+{
+	struct async_head *ah = at->ah;
+
+	spin_lock(&ah->lock);
+	list_del_init(&at->entry);
+	if (at->exit)
+		complete(&ah->exit_done);
+	t->at = NULL;
+	at->task = NULL;
+	spin_unlock(&ah->lock);
+}
+
+static struct async_thread *
+pick_ready_cachemiss_thread(struct async_head *ah)
+{
+	struct list_head *head = &ah->ready_async_threads;
+
+	if (list_empty(head))
+		return NULL;
+
+	return list_entry(head->next, struct async_thread, entry);
+}
+
+void __async_schedule(struct task_struct *t)
+{
+	struct async_thread *new_async_thread;
+	struct async_thread *async_ready;
+	struct async_head *ah = t->ah;
+	struct task_struct *new_task;
+
+	WARN_ON(!ah);
+	spin_lock(&ah->lock);
+
+	new_async_thread = pick_ready_cachemiss_thread(ah);
+	if (!new_async_thread)
+		goto out_unlock;
+
+	async_ready = t->async_ready;
+	WARN_ON(!async_ready);
+	t->async_ready = NULL;
+
+	new_task = new_async_thread->task;
+
+	move_user_context(new_task, t);
+	if (ah->restore_stack) {
+		task_pt_regs(new_task)->esp = ah->restore_stack;
+		WARN_ON(!ah->restore_eip);
+		task_pt_regs(new_task)->eip = ah->restore_eip;
+		/*
+		 * The return code 0 is needed to tell the
+		 * head user-context that the threadlet went async:
+		 */
+		task_pt_regs(new_task)->eax = 0;
+	}
+
+	new_task->at = NULL;
+	t->ah = NULL;
+	new_task->ah = ah;
+	ah->user_task = new_task;
+
+	wake_up_process(new_task);
+
+	__async_thread_init(t, async_ready, ah);
+	__mark_async_thread_busy(t->at, ah);
+
+ out_unlock:
+	spin_unlock(&ah->lock);
+}
+
+static void async_schedule(struct task_struct *t)
+{
+	if (t->async_ready)
+		__async_schedule(t);
+}
+
+static long __exec_atom(struct task_struct *t, struct syslet_atom *atom)
+{
+	struct async_thread *async_ready_save;
+	long ret;
+
+	/*
+	 * If user-space expects the syscall to schedule then
+	 * (try to) switch user-space to another thread straight
+	 * away and execute the syscall asynchronously:
+	 */
+	if (unlikely(atom->flags & SYSLET_ASYNC))
+		async_schedule(t);
+	/*
+	 * Does user-space want synchronous execution for this atom?:
+	 */
+	async_ready_save = t->async_ready;
+	if (unlikely(atom->flags & SYSLET_SYNC))
+		t->async_ready = NULL;
+
+	if (unlikely(atom->nr >= NR_syscalls))
+		return -ENOSYS;
+
+	ret = sys_call_table[atom->nr](atom->args[0], atom->args[1],
+				       atom->args[2], atom->args[3],
+				       atom->args[4], atom->args[5]);
+
+	if (atom->ret_ptr && put_user(ret, atom->ret_ptr))
+		return -EFAULT;
+
+	if (t->ah)
+		t->async_ready = async_ready_save;
+
+	return ret;
+}
+
+/*
+ * Arithmetic syscall: add a value to a user-space memory location.
+ *
+ * Generic C version - in case the architecture has not implemented it
+ * in assembly.
+ */
+asmlinkage __attribute__((weak)) long
+sys_umem_add(unsigned long __user *uptr, unsigned long inc)
+{
+	unsigned long val, new_val;
+
+	if (get_user(val, uptr))
+		return -EFAULT;
+	/*
+	 * inc == 0 means 'read memory value':
+	 */
+	if (!inc)
+		return val;
+
+	new_val = val + inc;
+	if (__put_user(new_val, uptr))
+		return -EFAULT;
+
+	return new_val;
+}
+
+/*
+ * Open-coded because this is a very hot codepath during syslet
+ * execution and every cycle counts ...
+ *
+ * [ NOTE: it's an explicit fastcall because optimized assembly code
+ *   might depend on this. There are some kernels that disable regparm,
+ *   so let's not break those if possible. ]
+ */
+fastcall __attribute__((weak)) long
+copy_uatom(struct syslet_atom *atom, struct syslet_uatom __user *uatom)
+{
+	unsigned long __user *arg_ptr;
+	long ret = 0;
+
+	if (!access_ok(VERIFY_READ, uatom, sizeof(*uatom)))
+		return -EFAULT;
+
+	ret = __get_user(atom->nr, &uatom->nr);
+	ret |= __get_user(atom->ret_ptr, &uatom->ret_ptr);
+	ret |= __get_user(atom->flags, &uatom->flags);
+	ret |= __get_user(atom->next, &uatom->next);
+
+	memset(atom->args, 0, sizeof(atom->args));
+
+	ret |= __get_user(arg_ptr, &uatom->arg_ptr[0]);
+	if (!arg_ptr)
+		return ret;
+	if (!access_ok(VERIFY_READ, arg_ptr, sizeof(*arg_ptr)))
+		return -EFAULT;
+	ret |= __get_user(atom->args[0], arg_ptr);
+
+	ret |= __get_user(arg_ptr, &uatom->arg_ptr[1]);
+	if (!arg_ptr)
+		return ret;
+	if (!access_ok(VERIFY_READ, arg_ptr, sizeof(*arg_ptr)))
+		return -EFAULT;
+	ret |= __get_user(atom->args[1], arg_ptr);
+
+	ret |= __get_user(arg_ptr, &uatom->arg_ptr[2]);
+	if (!arg_ptr)
+		return ret;
+	if (!access_ok(VERIFY_READ, arg_ptr, sizeof(*arg_ptr)))
+		return -EFAULT;
+	ret |= __get_user(atom->args[2], arg_ptr);
+
+	ret |= __get_user(arg_ptr, &uatom->arg_ptr[3]);
+	if (!arg_ptr)
+		return ret;
+	if (!access_ok(VERIFY_READ, arg_ptr, sizeof(*arg_ptr)))
+		return -EFAULT;
+	ret |= __get_user(atom->args[3], arg_ptr);
+
+	ret |= __get_user(arg_ptr, &uatom->arg_ptr[4]);
+	if (!arg_ptr)
+		return ret;
+	if (!access_ok(VERIFY_READ, arg_ptr, sizeof(*arg_ptr)))
+		return -EFAULT;
+	ret |= __get_user(atom->args[4], arg_ptr);
+
+	ret |= __get_user(arg_ptr, &uatom->arg_ptr[5]);
+	if (!arg_ptr)
+		return ret;
+	if (!access_ok(VERIFY_READ, arg_ptr, sizeof(*arg_ptr)))
+		return -EFAULT;
+	ret |= __get_user(atom->args[5], arg_ptr);
+
+	return ret;
+}
+
+/*
+ * Should the next atom run, depending on the return value of
+ * the current atom - or should we stop execution?
+ */
+static int run_next_atom(struct syslet_atom *atom, long ret)
+{
+	switch (atom->flags & SYSLET_STOP_MASK) {
+		case SYSLET_STOP_ON_NONZERO:
+			if (!ret)
+				return 1;
+			return 0;
+		case SYSLET_STOP_ON_ZERO:
+			if (ret)
+				return 1;
+			return 0;
+		case SYSLET_STOP_ON_NEGATIVE:
+			if (ret >= 0)
+				return 1;
+			return 0;
+		case SYSLET_STOP_ON_NON_POSITIVE:
+			if (ret > 0)
+				return 1;
+			return 0;
+	}
+	return 1;
+}
+
+static struct syslet_uatom __user *
+next_uatom(struct syslet_atom *atom, struct syslet_uatom *uatom, long ret)
+{
+	/*
+	 * If the stop condition is false then continue
+	 * to atom->next:
+	 */
+	if (run_next_atom(atom, ret))
+		return atom->next;
+	/*
+	 * Special-case: if the stop condition is true and the atom
+	 * has SKIP_TO_NEXT_ON_STOP set, then instead of
+	 * stopping we skip to the atom directly after this atom
+	 * (in linear address-space).
+	 *
+	 * This, combined with the atom->next pointer and the
+	 * stop condition flags is what allows true branches and
+	 * loops in syslets:
+	 */
+	if (atom->flags & SYSLET_SKIP_TO_NEXT_ON_STOP)
+		return uatom + 1;
+
+	return NULL;
+}
+
+/*
+ * If user-space requested a completion event then put the last
+ * executed uatom into the completion ring:
+ */
+static long
+complete_uatom(struct async_head *ah, struct task_struct *t,
+	       struct syslet_atom *atom, struct syslet_uatom __user *uatom,
+	       struct async_head_user __user *ahu)
+{
+	unsigned long ring_size_bytes, max_ring_idx, kernel_ring_idx;
+	struct syslet_uatom __user **ring_slot, *slot_val = NULL;
+	struct syslet_uatom __user **completion_ring;
+
+	WARN_ON(!t->at);
+	WARN_ON(t->ah);
+
+	if (atom->flags & SYSLET_NO_COMPLETE)
+		return 0;
+
+	if (!access_ok(VERIFY_WRITE, ahu, sizeof(*ahu)))
+		return -EFAULT;
+
+	if (__get_user(completion_ring, &ahu->completion_ring))
+		return -EFAULT;
+	if (__get_user(ring_size_bytes, &ahu->ring_size_bytes))
+		return -EFAULT;
+	if (!ring_size_bytes)
+		return -EINVAL;
+
+	max_ring_idx = ring_size_bytes / sizeof(void *);
+	if (ring_size_bytes != max_ring_idx * sizeof(void *))
+		return -EINVAL;
+	/*
+	 * We pre-check the ring pointer, so that in the fastpath
+	 * we can use __get_user():
+	 */
+	if (!access_ok(VERIFY_WRITE, completion_ring, ring_size_bytes))
+		return -EFAULT;
+
+	mutex_lock(&ah->completion_lock);
+	/*
+	 * Asynchronous threads can complete in parallel, so use the
+	 * head-lock to serialize:
+	 */
+	if (__get_user(kernel_ring_idx, &ahu->kernel_ring_idx))
+		goto fault_unlock;
+	if (kernel_ring_idx >= max_ring_idx)
+		goto err_unlock;
+
+	ring_slot = completion_ring + kernel_ring_idx;
+	if (__get_user(slot_val, ring_slot))
+		goto fault_unlock;
+	/*
+	 * User-space submitted more work than what fits into the
+	 * completion ring - do not stomp over it silently and signal
+	 * the error condition:
+	 */
+	if (slot_val)
+		goto err_unlock;
+
+	slot_val = uatom;
+	if (__put_user(slot_val, ring_slot))
+		goto fault_unlock;
+	/*
+	 * Update the ring index:
+	 */
+	kernel_ring_idx++;
+	if (kernel_ring_idx == max_ring_idx)
+		kernel_ring_idx = 0;
+
+	if (__put_user(kernel_ring_idx, &ahu->kernel_ring_idx))
+		goto fault_unlock;
+
+	/*
+	 * See whether the async-head is waiting and needs a wakeup:
+	 */
+	if (ah->events_left) {
+		if (!--ah->events_left) {
+			/*
+			 * We first unlock the mutex - to reduce the size
+			 * of the critical section. We have a safe
+			 * reference to 'ah':
+			 */
+			mutex_unlock(&ah->completion_lock);
+			wake_up(&ah->wait);
+			goto out;
+		}
+	}
+
+	mutex_unlock(&ah->completion_lock);
+ out:
+	return 0;
+
+ fault_unlock:
+	mutex_unlock(&ah->completion_lock);
+
+	return -EFAULT;
+
+ err_unlock:
+	mutex_unlock(&ah->completion_lock);
+
+	return -EINVAL;
+}
+
+/*
+ * This is the main syslet atom execution loop. This fetches atoms
+ * and executes them until it runs out of atoms or until the
+ * exit condition becomes false:
+ */
+static struct syslet_uatom __user *
+exec_atom(struct async_head *ah, struct task_struct *t,
+	  struct syslet_uatom __user *uatom,
+	  struct async_head_user __user *ahu)
+{
+	struct syslet_uatom __user *last_uatom;
+	struct syslet_atom atom;
+	long ret;
+
+ run_next:
+	if (unlikely(copy_uatom(&atom, uatom)))
+		return ERR_PTR(-EFAULT);
+
+	last_uatom = uatom;
+	ret = __exec_atom(t, &atom);
+	if (unlikely(signal_pending(t) || need_resched()))
+		goto stop;
+
+	uatom = next_uatom(&atom, uatom, ret);
+	if (uatom)
+		goto run_next;
+ stop:
+	/*
+	 * We do completion only in async context:
+	 */
+	if (t->at && complete_uatom(ah, t, &atom, last_uatom, ahu))
+		return ERR_PTR(-EFAULT);
+
+	return last_uatom;
+}
+
+static void cachemiss_execute(struct async_thread *at, struct async_head *ah,
+			      struct task_struct *t)
+{
+	struct syslet_uatom __user *uatom;
+
+	uatom = at->work;
+	WARN_ON(!uatom);
+	at->work = NULL;
+	WARN_ON(1); /* need to pass the ahu too */
+
+	exec_atom(ah, t, uatom, NULL);
+}
+
+static struct syslet_uatom __user *
+cachemiss_loop(struct async_thread *at, struct async_head *ah,
+	       struct task_struct *t)
+{
+	for (;;) {
+		mark_async_thread_busy(at, ah);
+		set_task_state(t, TASK_INTERRUPTIBLE);
+		if (at->work)
+			cachemiss_execute(at, ah, t);
+		if (unlikely(t->ah || at->exit || signal_pending(t)))
+			break;
+		mark_async_thread_ready(at, ah);
+		schedule();
+	}
+	t->state = TASK_RUNNING;
+
+	async_thread_exit(at, t);
+
+	if (at->exit)
+		do_exit(0);
+
+	if (!t->ah) {
+		/*
+		 * Cachemiss threads return to one given
+		 * user-space instruction address and stack
+		 * pointer:
+		 */
+		task_pt_regs(t)->esp = at->user_stack;
+		task_pt_regs(t)->eip = at->user_eip;
+
+		return (void *)-1;
+	}
+	/*
+	 * Head context: return to user-space with NULL:
+	 */
+	return NULL;
+}
+
+/*
+ * This is what a newly created cachemiss thread executes for the
+ * first time: initialize, pick up the user stack/IP addresses from
+ * the head and then execute the cachemiss loop. If the cachemiss
+ * loop returns then we return back to user-space:
+ */
+static int cachemiss_thread(void *data)
+{
+	struct pt_regs *head_regs, *regs;
+	struct task_struct *t = current;
+	struct async_head *ah = data;
+	struct async_thread *at;
+	int ret;
+
+	at = &t->__at;
+	async_thread_init(t, at, ah);
+
+	/*
+	 * Clone the head thread's user-space ptregs over,
+	 * now that we are in kernel-space:
+	 */
+	head_regs = task_pt_regs(ah->user_task);
+	regs = task_pt_regs(t);
+
+	*regs = *head_regs;
+	ret = get_user(at->user_stack, ah->new_stackp);
+	WARN_ON(ret);
+	/*
+	 * Clear the stack pointer, signalling to user-space that
+	 * this thread stack has been used up:
+	 */
+	ret = put_user(0, ah->new_stackp);
+	WARN_ON(ret);
+
+	complete(&ah->start_done);
+
+	/*
+	 * Fixme: 64-bit kernel threads should return long
+	 */
+	return (int)cachemiss_loop(at, ah, t);
+}
+
+/**
+ * sys_async_thread - do work as an async cachemiss thread again
+ *
+ * If an async thread has returned back to user-space (due to say
+ * a signal) then it is a 'busy' thread during that period. It
+ * can again offer itself into the cachemiss pool by calling this
+ * syscall:
+ */
+asmlinkage long sys_async_thread(void)
+{
+	struct task_struct *t = current;
+	struct async_thread *at = t->at;
+	struct async_head *ah = t->__at.ah;
+
+	/*
+	 * Only async threads are allowed to do this:
+	 */
+	if (!ah || t->ah)
+		return -EINVAL;
+
+	/*
+	 * If a cachemiss threadlet calls sys_async_thread()
+	 * then we first have to mark it ready:
+	 */
+	if (at) {
+		mark_async_thread_ready(at, ah);
+	} else {
+		at = &t->__at;
+		WARN_ON(!at->ah);
+
+		async_thread_init(t, at, ah);
+	}
+
+	return (long)cachemiss_loop(at, at->ah, t);
+}
+
+/*
+ * Initialize the in-kernel async head, based on the user-space async
+ * head:
+ */
+static long
+async_head_init(struct task_struct *t, struct async_head_user __user *ahu)
+{
+	struct async_head *ah;
+
+	ah = &t->__ah;
+
+	spin_lock_init(&ah->lock);
+	INIT_LIST_HEAD(&ah->ready_async_threads);
+	INIT_LIST_HEAD(&ah->busy_async_threads);
+	init_waitqueue_head(&ah->wait);
+	mutex_init(&ah->completion_lock);
+	ah->events_left = 0;
+	ah->ahu = NULL;
+	ah->new_stackp = NULL;
+	ah->new_eip = 0;
+	ah->restore_stack = 0;
+	ah->restore_eip = 0;
+	ah->user_task = t;
+	t->ah = ah;
+
+	return 0;
+}
+
+/*
+ * If the head cache-misses then it will become a cachemiss
+ * thread after having finished its current syslet. If it
+ * returns to user-space after that point (to handle a signal
+ * for example) then it will need a thread stack of its own:
+ */
+static long init_head(struct async_head *ah, struct task_struct *t,
+		      struct async_head_user __user *ahu)
+{
+	unsigned long head_stack, head_eip;
+
+	if (get_user(head_stack, &ahu->head_stack))
+		return -EFAULT;
+	if (get_user(head_eip, &ahu->head_eip))
+		return -EFAULT;
+	t->__at.user_stack = head_stack;
+	t->__at.user_eip = head_eip;
+
+	return async_head_init(t, ahu);
+}
+
+/*
+ * Simple limit and pool management mechanism for now:
+ */
+static long
+refill_cachemiss_pool(struct async_head *ah, struct task_struct *t,
+		      struct async_head_user __user *ahu)
+{
+	unsigned long new_eip;
+	long pid, ret;
+
+	init_completion(&ah->start_done);
+	ah->new_stackp = &ahu->new_thread_stack;
+	ret = get_user(new_eip, &ahu->new_thread_eip);
+	WARN_ON(ret);
+	ah->new_eip = new_eip;
+
+	pid = create_async_thread(cachemiss_thread, (void *)ah,
+			   CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND |
+			   CLONE_THREAD | CLONE_SYSVSEM);
+	if (pid < 0)
+		return pid;
+
+	wait_for_completion(&ah->start_done);
+	ah->new_stackp = NULL;
+	ah->new_eip = 0;
+
+	return 0;
+}
+
+/**
+ * sys_async_exec - execute a syslet.
+ *
+ * returns the uatom that was last executed, if the kernel was able to
+ * execute the syslet synchronously, or NULL if the syslet became
+ * asynchronous. (in the latter case syslet completion will be notified
+ * via the completion ring)
+ *
+ * (Various errors might also be returned via the usual negative numbers.)
+ */
+asmlinkage struct syslet_uatom __user *
+sys_async_exec(struct syslet_uatom __user *uatom,
+	       struct async_head_user __user *ahu)
+{
+	struct syslet_uatom __user *ret;
+	struct task_struct *t = current;
+	struct async_head *ah = t->ah;
+	struct async_thread *at = &t->__at;
+
+	/*
+	 * Do not allow recursive calls of sys_async_exec():
+	 */
+	if (async_syscall(t))
+		return ERR_PTR(-ENOSYS);
+
+	if (unlikely(!ah)) {
+		ret = (void *)init_head(ah, t, ahu);
+		if (ret)
+			return ret;
+		ah = t->ah;
+	}
+
+	if (unlikely(list_empty(&ah->ready_async_threads))) {
+		ret = (void *)refill_cachemiss_pool(ah, t, ahu);
+		if (ret)
+			return ret;
+	}
+
+	t->async_ready = at;
+	ah->ahu = ahu;
+
+	ret = exec_atom(ah, t, uatom, ahu);
+
+	/*
+	 * Are we still executing as head?
+	 */
+	if (t->ah) {
+		t->async_ready = NULL;
+
+		return ret;
+	}
+
+	/*
+	 * We got turned into a cachemiss thread,
+	 * enter the cachemiss loop:
+	 */
+	set_task_state(t, TASK_INTERRUPTIBLE);
+	mark_async_thread_ready(at, ah);
+
+	return cachemiss_loop(at, ah, t);
+}
+
+/**
+ * sys_async_wait - wait for async completion events
+ *
+ * This syscall waits for @min_wait_events syslet completion events
+ * to finish or for all async processing to finish (whichever
+ * comes first).
+ */
+asmlinkage long
+sys_async_wait(unsigned long min_wait_events, unsigned long user_ring_idx,
+	       struct async_head_user __user *ahu)
+{
+	struct task_struct *t = current;
+	struct async_head *ah = t->ah;
+	unsigned long kernel_ring_idx;
+
+	/*
+	 * Do not allow async waiting:
+	 */
+	if (async_syscall(t))
+		return -ENOSYS;
+	if (!ah)
+		return -EINVAL;
+
+	mutex_lock(&ah->completion_lock);
+	if (get_user(kernel_ring_idx, &ahu->kernel_ring_idx))
+		goto err_unlock;
+	/*
+	 * Account any completions that happened since user-space
+	 * checked the ring:
+ 	 */
+	ah->events_left = min_wait_events - (kernel_ring_idx - user_ring_idx);
+	mutex_unlock(&ah->completion_lock);
+
+	return wait_event_interruptible(ah->wait,
+		list_empty(&ah->busy_async_threads) || ah->events_left <= 0);
+
+ err_unlock:
+	mutex_unlock(&ah->completion_lock);
+	return -EFAULT;
+}
+
+asmlinkage long
+sys_threadlet_on(unsigned long restore_stack,
+		 unsigned long restore_eip,
+		 struct async_head_user __user *ahu)
+{
+	struct task_struct *t = current;
+	struct async_head *ah = t->ah;
+	struct async_thread *at = &t->__at;
+	long ret;
+
+	/*
+	 * Do not allow recursive calls of sys_threadlet_on():
+	 */
+	if (t->async_ready || t->at)
+		return -EINVAL;
+
+	if (unlikely(!ah)) {
+		ret = init_head(ah, t, ahu);
+		if (ret)
+			return ret;
+		ah = t->ah;
+	}
+
+	if (unlikely(list_empty(&ah->ready_async_threads))) {
+		ret = refill_cachemiss_pool(ah, t, ahu);
+		if (ret)
+			return ret;
+	}
+
+	t->async_ready = at;
+	ah->restore_stack = restore_stack;
+	ah->restore_eip = restore_eip;
+
+	ah->ahu = ahu;
+
+	return 0;
+}
+
+asmlinkage long sys_threadlet_off(void)
+{
+	struct task_struct *t = current;
+	struct async_head *ah = t->ah;
+
+	/*
+	 * Are we still executing as head?
+	 */
+	if (ah) {
+		t->async_ready = NULL;
+
+		return 1;
+	}
+
+	/*
+	 * We got turned into a cachemiss thread,
+	 * return to user-space, which can do
+	 * the notification, etc:
+	 */
+	return 0;
+}
+
+static void __notify_async_thread_exit(struct async_thread *at,
+				       struct async_head *ah)
+{
+	list_del_init(&at->entry);
+	at->exit = 1;
+	init_completion(&ah->exit_done);
+	wake_up_process(at->task);
+}
+
+static void stop_cachemiss_threads(struct async_head *ah)
+{
+	struct async_thread *at;
+
+repeat:
+	spin_lock(&ah->lock);
+	list_for_each_entry(at, &ah->ready_async_threads, entry) {
+
+		__notify_async_thread_exit(at, ah);
+		spin_unlock(&ah->lock);
+
+		wait_for_completion(&ah->exit_done);
+
+		goto repeat;
+	}
+
+	list_for_each_entry(at, &ah->busy_async_threads, entry) {
+
+		__notify_async_thread_exit(at, ah);
+		spin_unlock(&ah->lock);
+
+		wait_for_completion(&ah->exit_done);
+
+		goto repeat;
+	}
+	spin_unlock(&ah->lock);
+}
+
+static void async_head_exit(struct async_head *ah, struct task_struct *t)
+{
+	stop_cachemiss_threads(ah);
+	WARN_ON(!list_empty(&ah->ready_async_threads));
+	WARN_ON(!list_empty(&ah->busy_async_threads));
+	WARN_ON(spin_is_locked(&ah->lock));
+
+	t->ah = NULL;
+}
+
+/*
+ * fork()-time initialization:
+ */
+void async_init(struct task_struct *t)
+{
+	t->at		= NULL;
+	t->async_ready	= NULL;
+	t->ah		= NULL;
+	t->__at.ah	= NULL;
+}
+
+/*
+ * do_exit()-time cleanup:
+ */
+void async_exit(struct task_struct *t)
+{
+	struct async_thread *at = t->at;
+	struct async_head *ah = t->ah;
+
+	/*
+	 * If head does a sys_exit() then the final schedule() must
+	 * not be passed on to another cachemiss thread:
+	 */
+	t->async_ready = NULL;
+
+	if (unlikely(at))
+		async_thread_exit(at, t);
+
+	if (unlikely(ah))
+		async_head_exit(ah, t);
+}


* [patch 05/13] syslets: core, documentation
From: Ingo Molnar @ 2007-02-21 21:15 UTC
  To: linux-kernel
  Cc: Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

From: Ingo Molnar <mingo@elte.hu>

Add Documentation/syslet-design.txt with a high-level description
of the syslet concepts.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 Documentation/syslet-design.txt |  137 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 137 insertions(+)

Index: linux/Documentation/syslet-design.txt
===================================================================
--- /dev/null
+++ linux/Documentation/syslet-design.txt
@@ -0,0 +1,137 @@
+Syslets / asynchronous system calls
+===================================
+
+started by Ingo Molnar <mingo@redhat.com>
+
+Goal:
+-----
+
+The goal of the syslet subsystem is to allow user-space to execute
+arbitrary system calls asynchronously. It does so by allowing user-space
+to execute "syslets" which are small scriptlets that the kernel can execute
+both securely and asynchronously without having to exit to user-space.
+
+The core syslet concepts are:
+
+The Syslet Atom:
+----------------
+
+The syslet atom is a small, fixed-size (44 bytes on 32-bit) piece of
+user-space memory, which is the basic unit of execution within the syslet
+framework. A syslet represents a single system-call and its arguments.
+In addition, it has condition flags attached to it that allow the
+construction of larger programs (syslets) from these atoms.
+
+Arguments to the system call are passed in via pointers to user-space variables.
+This not only increases the flexibility of syslet atoms (multiple syslets
+can share the same variable for example), but is also an optimization:
+copy_uatom() will only fetch syscall parameters up until the point it
+meets the first NULL pointer. 50% of all syscalls have 2 or less
+parameters (and 90% of all syscalls have 4 or less parameters).
+
+ [ Note: since the argument array is at the end of the atom, and the
+   kernel will not touch any argument beyond the final NULL one, atoms
+   might be packed more tightly. (the only special case exception to
+   this rule would be SKIP_TO_NEXT_ON_STOP atoms, where the kernel will
+   jump a full syslet_uatom number of bytes.) ]
+
+The Syslet:
+-----------
+
+A syslet is a program, represented by a graph of syslet atoms. The
+syslet atoms are chained to each other either via the atom->next pointer,
+or via the SYSLET_SKIP_TO_NEXT_ON_STOP flag.
+
+Running Syslets:
+----------------
+
+Syslets can be run via the sys_async_exec() system call, which takes
+the first atom of the syslet as an argument. The kernel does not need
+to be told about the other atoms - it will fetch them on the fly as
+execution goes forward.
+
+A syslet might either be executed 'cached', or it might generate a
+'cachemiss'.
+
+'Cached' syslet execution means that the whole syslet was executed
+without blocking. The system-call returns the submitted atom's address
+in this case.
+
+If a syslet blocks while the kernel executes a system-call embedded in
+one of its atoms, the kernel will keep working on that syscall in
+parallel, but it immediately returns to user-space with a NULL pointer,
+so the submitting task can submit other syslets.
+
+Completion of asynchronous syslets:
+-----------------------------------
+
+Completion of asynchronous syslets is done via the 'completion ring',
+which is a ringbuffer of syslet atom pointers in user-space memory,
+provided by user-space as an argument to the sys_async_exec() syscall.
+The kernel fills in the ringbuffer starting at index 0, and user-space
+must clear out these pointers. Once the kernel reaches the end of
+the ring it wraps back to index 0. The kernel will not overwrite
+non-NULL pointers (it returns an error instead), so user-space has to
+make sure it consumes all the events it asked for.
+
+Waiting for completions:
+------------------------
+
+Syslet completions can be waited for via the sys_async_wait()
+system call - which takes the number of events it should wait for as
+a parameter. This system call will also return if the number of
+pending events goes down to zero.
+
+Sample Hello World syslet code:
+
+--------------------------->
+/*
+ * Set up a syslet atom:
+ */
+static void
+init_atom(struct syslet_uatom *atom, int nr,
+	  void *arg_ptr0, void *arg_ptr1, void *arg_ptr2,
+	  void *arg_ptr3, void *arg_ptr4, void *arg_ptr5,
+	  void *ret_ptr, unsigned long flags, struct syslet_uatom *next)
+{
+	atom->nr = nr;
+	atom->arg_ptr[0] = arg_ptr0;
+	atom->arg_ptr[1] = arg_ptr1;
+	atom->arg_ptr[2] = arg_ptr2;
+	atom->arg_ptr[3] = arg_ptr3;
+	atom->arg_ptr[4] = arg_ptr4;
+	atom->arg_ptr[5] = arg_ptr5;
+	atom->ret_ptr = ret_ptr;
+	atom->flags = flags;
+	atom->next = next;
+}
+
+int main(int argc, char *argv[])
+{
+	unsigned long int fd_out = 1; /* standard output */
+	char *buf = "Hello Syslet World!\n";
+	unsigned long size = strlen(buf);
+	struct syslet_uatom atom, *done;
+
+	async_head_init();
+
+	/*
+	 * Simple syslet consisting of a single atom:
+	 */
+	init_atom(&atom, __NR_sys_write, &fd_out, &buf, &size,
+		  NULL, NULL, NULL, NULL, SYSLET_ASYNC, NULL);
+	done = sys_async_exec(&atom);
+	if (!done) {
+		sys_async_wait(1);
+		if (completion_ring[curr_ring_idx] == &atom) {
+			completion_ring[curr_ring_idx] = NULL;
+			printf("completed an async syslet atom!\n");
+		}
+	} else {
+		printf("completed an cached syslet atom!\n");
+	}
+
+	async_head_exit();
+
+	return 0;
+}
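
To complement the sample above, a completion-ring drain loop following
the protocol described in the "Completion of asynchronous syslets"
section might look like the sketch below (process_completion() and
RING_ENTRIES are assumed application-side names):

static void drain_completion_ring(struct async_head_user *ahu)
{
	while (ahu->completion_ring[ahu->user_ring_idx]) {
		struct syslet_uatom *done =
			ahu->completion_ring[ahu->user_ring_idx];

		process_completion(done);	/* application-specific */

		/* clear the slot - the kernel never overwrites non-NULL: */
		ahu->completion_ring[ahu->user_ring_idx] = NULL;

		/* wrap at the end of the ring, like the kernel does: */
		if (++ahu->user_ring_idx == RING_ENTRIES)
			ahu->user_ring_idx = 0;
	}
}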


* [patch 06/13] x86: split FPU state from task state
From: Ingo Molnar @ 2007-02-21 21:15 UTC
  To: linux-kernel
  Cc: Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

From: Arjan van de Ven <arjan@linux.intel.com>

Split the FPU save area from the task struct. This allows easy migration
of FPU context, and it's generally cleaner. It also allows the following
two (future) optimizations:

1) allocate the right size for the actual CPU rather than always 512 bytes
2) only allocate the area when the application actually uses the FPU,
   i.e. at the first lazy FPU trap. This could save memory for apps that
   do not use the FPU.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/i386/kernel/i387.c        |   96 ++++++++++++++++++++---------------------
 arch/i386/kernel/process.c     |   56 +++++++++++++++++++++++
 arch/i386/kernel/traps.c       |   10 ----
 include/asm-i386/i387.h        |    6 +-
 include/asm-i386/processor.h   |    6 ++
 include/asm-i386/thread_info.h |    6 ++
 kernel/fork.c                  |    7 ++
 7 files changed, 123 insertions(+), 64 deletions(-)

Index: linux/arch/i386/kernel/i387.c
===================================================================
--- linux.orig/arch/i386/kernel/i387.c
+++ linux/arch/i386/kernel/i387.c
@@ -31,9 +31,9 @@ void mxcsr_feature_mask_init(void)
 	unsigned long mask = 0;
 	clts();
 	if (cpu_has_fxsr) {
-		memset(&current->thread.i387.fxsave, 0, sizeof(struct i387_fxsave_struct));
-		asm volatile("fxsave %0" : : "m" (current->thread.i387.fxsave)); 
-		mask = current->thread.i387.fxsave.mxcsr_mask;
+		memset(&current->thread.i387->fxsave, 0, sizeof(struct i387_fxsave_struct));
+		asm volatile("fxsave %0" : : "m" (current->thread.i387->fxsave));
+		mask = current->thread.i387->fxsave.mxcsr_mask;
 		if (mask == 0) mask = 0x0000ffbf;
 	} 
 	mxcsr_feature_mask &= mask;
@@ -49,16 +49,16 @@ void mxcsr_feature_mask_init(void)
 void init_fpu(struct task_struct *tsk)
 {
 	if (cpu_has_fxsr) {
-		memset(&tsk->thread.i387.fxsave, 0, sizeof(struct i387_fxsave_struct));
-		tsk->thread.i387.fxsave.cwd = 0x37f;
+		memset(&tsk->thread.i387->fxsave, 0, sizeof(struct i387_fxsave_struct));
+		tsk->thread.i387->fxsave.cwd = 0x37f;
 		if (cpu_has_xmm)
-			tsk->thread.i387.fxsave.mxcsr = 0x1f80;
+			tsk->thread.i387->fxsave.mxcsr = 0x1f80;
 	} else {
-		memset(&tsk->thread.i387.fsave, 0, sizeof(struct i387_fsave_struct));
-		tsk->thread.i387.fsave.cwd = 0xffff037fu;
-		tsk->thread.i387.fsave.swd = 0xffff0000u;
-		tsk->thread.i387.fsave.twd = 0xffffffffu;
-		tsk->thread.i387.fsave.fos = 0xffff0000u;
+		memset(&tsk->thread.i387->fsave, 0, sizeof(struct i387_fsave_struct));
+		tsk->thread.i387->fsave.cwd = 0xffff037fu;
+		tsk->thread.i387->fsave.swd = 0xffff0000u;
+		tsk->thread.i387->fsave.twd = 0xffffffffu;
+		tsk->thread.i387->fsave.fos = 0xffff0000u;
 	}
 	/* only the device not available exception or ptrace can call init_fpu */
 	set_stopped_child_used_math(tsk);
@@ -152,18 +152,18 @@ static inline unsigned long twd_fxsr_to_
 unsigned short get_fpu_cwd( struct task_struct *tsk )
 {
 	if ( cpu_has_fxsr ) {
-		return tsk->thread.i387.fxsave.cwd;
+		return tsk->thread.i387->fxsave.cwd;
 	} else {
-		return (unsigned short)tsk->thread.i387.fsave.cwd;
+		return (unsigned short)tsk->thread.i387->fsave.cwd;
 	}
 }
 
 unsigned short get_fpu_swd( struct task_struct *tsk )
 {
 	if ( cpu_has_fxsr ) {
-		return tsk->thread.i387.fxsave.swd;
+		return tsk->thread.i387->fxsave.swd;
 	} else {
-		return (unsigned short)tsk->thread.i387.fsave.swd;
+		return (unsigned short)tsk->thread.i387->fsave.swd;
 	}
 }
 
@@ -171,9 +171,9 @@ unsigned short get_fpu_swd( struct task_
 unsigned short get_fpu_twd( struct task_struct *tsk )
 {
 	if ( cpu_has_fxsr ) {
-		return tsk->thread.i387.fxsave.twd;
+		return tsk->thread.i387->fxsave.twd;
 	} else {
-		return (unsigned short)tsk->thread.i387.fsave.twd;
+		return (unsigned short)tsk->thread.i387->fsave.twd;
 	}
 }
 #endif  /*  0  */
@@ -181,7 +181,7 @@ unsigned short get_fpu_twd( struct task_
 unsigned short get_fpu_mxcsr( struct task_struct *tsk )
 {
 	if ( cpu_has_xmm ) {
-		return tsk->thread.i387.fxsave.mxcsr;
+		return tsk->thread.i387->fxsave.mxcsr;
 	} else {
 		return 0x1f80;
 	}
@@ -192,27 +192,27 @@ unsigned short get_fpu_mxcsr( struct tas
 void set_fpu_cwd( struct task_struct *tsk, unsigned short cwd )
 {
 	if ( cpu_has_fxsr ) {
-		tsk->thread.i387.fxsave.cwd = cwd;
+		tsk->thread.i387->fxsave.cwd = cwd;
 	} else {
-		tsk->thread.i387.fsave.cwd = ((long)cwd | 0xffff0000u);
+		tsk->thread.i387->fsave.cwd = ((long)cwd | 0xffff0000u);
 	}
 }
 
 void set_fpu_swd( struct task_struct *tsk, unsigned short swd )
 {
 	if ( cpu_has_fxsr ) {
-		tsk->thread.i387.fxsave.swd = swd;
+		tsk->thread.i387->fxsave.swd = swd;
 	} else {
-		tsk->thread.i387.fsave.swd = ((long)swd | 0xffff0000u);
+		tsk->thread.i387->fsave.swd = ((long)swd | 0xffff0000u);
 	}
 }
 
 void set_fpu_twd( struct task_struct *tsk, unsigned short twd )
 {
 	if ( cpu_has_fxsr ) {
-		tsk->thread.i387.fxsave.twd = twd_i387_to_fxsr(twd);
+		tsk->thread.i387->fxsave.twd = twd_i387_to_fxsr(twd);
 	} else {
-		tsk->thread.i387.fsave.twd = ((long)twd | 0xffff0000u);
+		tsk->thread.i387->fsave.twd = ((long)twd | 0xffff0000u);
 	}
 }
 
@@ -298,8 +298,8 @@ static inline int save_i387_fsave( struc
 	struct task_struct *tsk = current;
 
 	unlazy_fpu( tsk );
-	tsk->thread.i387.fsave.status = tsk->thread.i387.fsave.swd;
-	if ( __copy_to_user( buf, &tsk->thread.i387.fsave,
+	tsk->thread.i387->fsave.status = tsk->thread.i387->fsave.swd;
+	if ( __copy_to_user( buf, &tsk->thread.i387->fsave,
 			     sizeof(struct i387_fsave_struct) ) )
 		return -1;
 	return 1;
@@ -312,15 +312,15 @@ static int save_i387_fxsave( struct _fps
 
 	unlazy_fpu( tsk );
 
-	if ( convert_fxsr_to_user( buf, &tsk->thread.i387.fxsave ) )
+	if ( convert_fxsr_to_user( buf, &tsk->thread.i387->fxsave ) )
 		return -1;
 
-	err |= __put_user( tsk->thread.i387.fxsave.swd, &buf->status );
+	err |= __put_user( tsk->thread.i387->fxsave.swd, &buf->status );
 	err |= __put_user( X86_FXSR_MAGIC, &buf->magic );
 	if ( err )
 		return -1;
 
-	if ( __copy_to_user( &buf->_fxsr_env[0], &tsk->thread.i387.fxsave,
+	if ( __copy_to_user( &buf->_fxsr_env[0], &tsk->thread.i387->fxsave,
 			     sizeof(struct i387_fxsave_struct) ) )
 		return -1;
 	return 1;
@@ -343,7 +343,7 @@ int save_i387( struct _fpstate __user *b
 			return save_i387_fsave( buf );
 		}
 	} else {
-		return save_i387_soft( &current->thread.i387.soft, buf );
+		return save_i387_soft( &current->thread.i387->soft, buf );
 	}
 }
 
@@ -351,7 +351,7 @@ static inline int restore_i387_fsave( st
 {
 	struct task_struct *tsk = current;
 	clear_fpu( tsk );
-	return __copy_from_user( &tsk->thread.i387.fsave, buf,
+	return __copy_from_user( &tsk->thread.i387->fsave, buf,
 				 sizeof(struct i387_fsave_struct) );
 }
 
@@ -360,11 +360,11 @@ static int restore_i387_fxsave( struct _
 	int err;
 	struct task_struct *tsk = current;
 	clear_fpu( tsk );
-	err = __copy_from_user( &tsk->thread.i387.fxsave, &buf->_fxsr_env[0],
+	err = __copy_from_user( &tsk->thread.i387->fxsave, &buf->_fxsr_env[0],
 				sizeof(struct i387_fxsave_struct) );
 	/* mxcsr reserved bits must be masked to zero for security reasons */
-	tsk->thread.i387.fxsave.mxcsr &= mxcsr_feature_mask;
-	return err ? 1 : convert_fxsr_from_user( &tsk->thread.i387.fxsave, buf );
+	tsk->thread.i387->fxsave.mxcsr &= mxcsr_feature_mask;
+	return err ? 1 : convert_fxsr_from_user( &tsk->thread.i387->fxsave, buf );
 }
 
 int restore_i387( struct _fpstate __user *buf )
@@ -378,7 +378,7 @@ int restore_i387( struct _fpstate __user
 			err = restore_i387_fsave( buf );
 		}
 	} else {
-		err = restore_i387_soft( &current->thread.i387.soft, buf );
+		err = restore_i387_soft( &current->thread.i387->soft, buf );
 	}
 	set_used_math();
 	return err;
@@ -391,7 +391,7 @@ int restore_i387( struct _fpstate __user
 static inline int get_fpregs_fsave( struct user_i387_struct __user *buf,
 				    struct task_struct *tsk )
 {
-	return __copy_to_user( buf, &tsk->thread.i387.fsave,
+	return __copy_to_user( buf, &tsk->thread.i387->fsave,
 			       sizeof(struct user_i387_struct) );
 }
 
@@ -399,7 +399,7 @@ static inline int get_fpregs_fxsave( str
 				     struct task_struct *tsk )
 {
 	return convert_fxsr_to_user( (struct _fpstate __user *)buf,
-				     &tsk->thread.i387.fxsave );
+				     &tsk->thread.i387->fxsave );
 }
 
 int get_fpregs( struct user_i387_struct __user *buf, struct task_struct *tsk )
@@ -411,7 +411,7 @@ int get_fpregs( struct user_i387_struct 
 			return get_fpregs_fsave( buf, tsk );
 		}
 	} else {
-		return save_i387_soft( &tsk->thread.i387.soft,
+		return save_i387_soft( &tsk->thread.i387->soft,
 				       (struct _fpstate __user *)buf );
 	}
 }
@@ -419,14 +419,14 @@ int get_fpregs( struct user_i387_struct 
 static inline int set_fpregs_fsave( struct task_struct *tsk,
 				    struct user_i387_struct __user *buf )
 {
-	return __copy_from_user( &tsk->thread.i387.fsave, buf,
+	return __copy_from_user( &tsk->thread.i387->fsave, buf,
 				 sizeof(struct user_i387_struct) );
 }
 
 static inline int set_fpregs_fxsave( struct task_struct *tsk,
 				     struct user_i387_struct __user *buf )
 {
-	return convert_fxsr_from_user( &tsk->thread.i387.fxsave,
+	return convert_fxsr_from_user( &tsk->thread.i387->fxsave,
 				       (struct _fpstate __user *)buf );
 }
 
@@ -439,7 +439,7 @@ int set_fpregs( struct task_struct *tsk,
 			return set_fpregs_fsave( tsk, buf );
 		}
 	} else {
-		return restore_i387_soft( &tsk->thread.i387.soft,
+		return restore_i387_soft( &tsk->thread.i387->soft,
 					  (struct _fpstate __user *)buf );
 	}
 }
@@ -447,7 +447,7 @@ int set_fpregs( struct task_struct *tsk,
 int get_fpxregs( struct user_fxsr_struct __user *buf, struct task_struct *tsk )
 {
 	if ( cpu_has_fxsr ) {
-		if (__copy_to_user( buf, &tsk->thread.i387.fxsave,
+		if (__copy_to_user( buf, &tsk->thread.i387->fxsave,
 				    sizeof(struct user_fxsr_struct) ))
 			return -EFAULT;
 		return 0;
@@ -461,11 +461,11 @@ int set_fpxregs( struct task_struct *tsk
 	int ret = 0;
 
 	if ( cpu_has_fxsr ) {
-		if (__copy_from_user( &tsk->thread.i387.fxsave, buf,
+		if (__copy_from_user( &tsk->thread.i387->fxsave, buf,
 				  sizeof(struct user_fxsr_struct) ))
 			ret = -EFAULT;
 		/* mxcsr reserved bits must be masked to zero for security reasons */
-		tsk->thread.i387.fxsave.mxcsr &= mxcsr_feature_mask;
+		tsk->thread.i387->fxsave.mxcsr &= mxcsr_feature_mask;
 	} else {
 		ret = -EIO;
 	}
@@ -479,7 +479,7 @@ int set_fpxregs( struct task_struct *tsk
 static inline void copy_fpu_fsave( struct task_struct *tsk,
 				   struct user_i387_struct *fpu )
 {
-	memcpy( fpu, &tsk->thread.i387.fsave,
+	memcpy( fpu, &tsk->thread.i387->fsave,
 		sizeof(struct user_i387_struct) );
 }
 
@@ -490,10 +490,10 @@ static inline void copy_fpu_fxsave( stru
 	unsigned short *from;
 	int i;
 
-	memcpy( fpu, &tsk->thread.i387.fxsave, 7 * sizeof(long) );
+	memcpy( fpu, &tsk->thread.i387->fxsave, 7 * sizeof(long) );
 
 	to = (unsigned short *)&fpu->st_space[0];
-	from = (unsigned short *)&tsk->thread.i387.fxsave.st_space[0];
+	from = (unsigned short *)&tsk->thread.i387->fxsave.st_space[0];
 	for ( i = 0 ; i < 8 ; i++, to += 5, from += 8 ) {
 		memcpy( to, from, 5 * sizeof(unsigned short) );
 	}
@@ -540,7 +540,7 @@ int dump_task_extended_fpu(struct task_s
 	if (fpvalid) {
 		if (tsk == current)
 		       unlazy_fpu(tsk);
-		memcpy(fpu, &tsk->thread.i387.fxsave, sizeof(*fpu));
+		memcpy(fpu, &tsk->thread.i387->fxsave, sizeof(*fpu));
 	}
 	return fpvalid;
 }
Index: linux/arch/i386/kernel/process.c
===================================================================
--- linux.orig/arch/i386/kernel/process.c
+++ linux/arch/i386/kernel/process.c
@@ -645,7 +645,7 @@ struct task_struct fastcall * __switch_t
 
 	/* we're going to use this soon, after a few expensive things */
 	if (next_p->fpu_counter > 5)
-		prefetch(&next->i387.fxsave);
+		prefetch(&next->i387->fxsave);
 
 	/*
 	 * Reload esp0.
@@ -908,3 +908,57 @@ unsigned long arch_align_stack(unsigned 
 		sp -= get_random_int() % 8192;
 	return sp & ~0xf;
 }
+
+
+
+struct kmem_cache *task_struct_cachep;
+struct kmem_cache *task_i387_cachep;
+
+struct task_struct * alloc_task_struct(void)
+{
+	struct task_struct *tsk;
+	tsk = kmem_cache_alloc(task_struct_cachep, GFP_KERNEL);
+	if (!tsk)
+		return NULL;
+	tsk->thread.i387 = kmem_cache_alloc(task_i387_cachep, GFP_KERNEL);
+	if (!tsk->thread.i387)
+		goto error;
+	WARN_ON((unsigned long)tsk->thread.i387 & 15);
+	return tsk;
+
+error:
+	kfree(tsk);
+	return NULL;
+}
+
+void memcpy_task_struct(struct task_struct *dst, struct task_struct *src)
+{
+	union i387_union *ptr;
+	ptr = dst->thread.i387;
+	*dst = *src;
+	dst->thread.i387 = ptr;
+	memcpy(dst->thread.i387, src->thread.i387, sizeof(union i387_union));
+}
+
+void free_task_struct(struct task_struct *tsk)
+{
+	kmem_cache_free(task_i387_cachep, tsk->thread.i387);
+	tsk->thread.i387=NULL;
+	kmem_cache_free(task_struct_cachep, tsk);
+}
+
+
+void task_struct_slab_init(void)
+{
+ 	/* create a slab on which task_structs can be allocated */
+        task_struct_cachep =
+        	kmem_cache_create("task_struct", sizeof(struct task_struct),
+        		ARCH_MIN_TASKALIGN, SLAB_PANIC, NULL, NULL);
+        task_i387_cachep =
+        	kmem_cache_create("task_i387", sizeof(union i387_union), 32,
+        	    SLAB_PANIC | SLAB_MUST_HWCACHE_ALIGN, NULL, NULL);
+}
+
+
+/* the very init task needs a statically allocated i387 area */
+union i387_union init_i387_context;
Index: linux/arch/i386/kernel/traps.c
===================================================================
--- linux.orig/arch/i386/kernel/traps.c
+++ linux/arch/i386/kernel/traps.c
@@ -1154,16 +1154,6 @@ void __init trap_init(void)
 	set_trap_gate(19,&simd_coprocessor_error);
 
 	if (cpu_has_fxsr) {
-		/*
-		 * Verify that the FXSAVE/FXRSTOR data will be 16-byte aligned.
-		 * Generates a compile-time "error: zero width for bit-field" if
-		 * the alignment is wrong.
-		 */
-		struct fxsrAlignAssert {
-			int _:!(offsetof(struct task_struct,
-					thread.i387.fxsave) & 15);
-		};
-
 		printk(KERN_INFO "Enabling fast FPU save and restore... ");
 		set_in_cr4(X86_CR4_OSFXSR);
 		printk("done.\n");
Index: linux/include/asm-i386/i387.h
===================================================================
--- linux.orig/include/asm-i386/i387.h
+++ linux/include/asm-i386/i387.h
@@ -34,7 +34,7 @@ extern void init_fpu(struct task_struct 
 		"nop ; frstor %1",		\
 		"fxrstor %1",			\
 		X86_FEATURE_FXSR,		\
-		"m" ((tsk)->thread.i387.fxsave))
+		"m" ((tsk)->thread.i387->fxsave))
 
 extern void kernel_fpu_begin(void);
 #define kernel_fpu_end() do { stts(); preempt_enable(); } while(0)
@@ -60,8 +60,8 @@ static inline void __save_init_fpu( stru
 		"fxsave %[fx]\n"
 		"bt $7,%[fsw] ; jnc 1f ; fnclex\n1:",
 		X86_FEATURE_FXSR,
-		[fx] "m" (tsk->thread.i387.fxsave),
-		[fsw] "m" (tsk->thread.i387.fxsave.swd) : "memory");
+		[fx] "m" (tsk->thread.i387->fxsave),
+		[fsw] "m" (tsk->thread.i387->fxsave.swd) : "memory");
 	/* AMD K7/K8 CPUs don't save/restore FDP/FIP/FOP unless an exception
 	   is pending.  Clear the x87 state here by setting it to fixed
    	   values. safe_address is a random variable that should be in L1 */
Index: linux/include/asm-i386/processor.h
===================================================================
--- linux.orig/include/asm-i386/processor.h
+++ linux/include/asm-i386/processor.h
@@ -407,7 +407,7 @@ struct thread_struct {
 /* fault info */
 	unsigned long	cr2, trap_no, error_code;
 /* floating point info */
-	union i387_union	i387;
+	union i387_union	*i387;
 /* virtual 86 mode info */
 	struct vm86_struct __user * vm86_info;
 	unsigned long		screen_bitmap;
@@ -420,11 +420,15 @@ struct thread_struct {
 	unsigned long	io_bitmap_max;
 };
 
+
+extern union i387_union init_i387_context;
+
 #define INIT_THREAD  {							\
 	.vm86_info = NULL,						\
 	.sysenter_cs = __KERNEL_CS,					\
 	.io_bitmap_ptr = NULL,						\
 	.gs = __KERNEL_PDA,						\
+	.i387 = &init_i387_context,					\
 }
 
 /*
Index: linux/include/asm-i386/thread_info.h
===================================================================
--- linux.orig/include/asm-i386/thread_info.h
+++ linux/include/asm-i386/thread_info.h
@@ -102,6 +102,12 @@ static inline struct thread_info *curren
 
 #define free_thread_info(info)	kfree(info)
 
+#define __HAVE_ARCH_TASK_STRUCT_ALLOCATOR
+extern struct task_struct * alloc_task_struct(void);
+extern void free_task_struct(struct task_struct *tsk);
+extern void memcpy_task_struct(struct task_struct *dst, struct task_struct *src);
+extern void task_struct_slab_init(void);
+
 #else /* !__ASSEMBLY__ */
 
 /* how to get the thread information struct from ASM */
Index: linux/kernel/fork.c
===================================================================
--- linux.orig/kernel/fork.c
+++ linux/kernel/fork.c
@@ -84,6 +84,8 @@ int nr_processes(void)
 #ifndef __HAVE_ARCH_TASK_STRUCT_ALLOCATOR
 # define alloc_task_struct()	kmem_cache_alloc(task_struct_cachep, GFP_KERNEL)
 # define free_task_struct(tsk)	kmem_cache_free(task_struct_cachep, (tsk))
+# define memcpy_task_struct(dst, src) *dst = *src;
+
 static struct kmem_cache *task_struct_cachep;
 #endif
 
@@ -138,6 +140,8 @@ void __init fork_init(unsigned long memp
 	task_struct_cachep =
 		kmem_cache_create("task_struct", sizeof(struct task_struct),
 			ARCH_MIN_TASKALIGN, SLAB_PANIC, NULL, NULL);
+#else
+	task_struct_slab_init();
 #endif
 
 	/*
@@ -176,7 +180,8 @@ static struct task_struct *dup_task_stru
 		return NULL;
 	}
 
-	*tsk = *orig;
+	memcpy_task_struct(tsk, orig);
+
 	tsk->thread_info = ti;
 	setup_thread_stack(tsk, orig);
 


* [patch 07/13] syslets: x86, add create_async_thread() method
  2007-02-21 21:13 [patch 00/13] Syslets, "Threadlets", generic AIO support, v3 Ingo Molnar
                   ` (5 preceding siblings ...)
  2007-02-21 21:15 ` [patch 06/13] x86: split FPU state from task state Ingo Molnar
@ 2007-02-21 21:15 ` Ingo Molnar
  2007-02-21 21:15 ` [patch 08/13] syslets: x86, add move_user_context() method Ingo Molnar
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-02-21 21:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

From: Ingo Molnar <mingo@elte.hu>

add the create_async_thread() way of creating kernel threads:
these threads first execute a kernel function, and when they
return from it they continue execution in user-space.

An architecture must implement this interface before it can turn
CONFIG_ASYNC_SUPPORT on.
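
A minimal kernel-side usage sketch (illustrative only - the function
name, argument and clone flags below are assumptions, not taken from
this patch):

	/* runs in kernel mode first; its return value is placed in
	 * PT_EAX by async_thread_helper, then the thread exits to
	 * user-space through syscall_exit: */
	static int my_async_fn(void *arg)
	{
		return 0;
	}

	pid = create_async_thread(my_async_fn, arg,
				  CLONE_VM | CLONE_FS | CLONE_FILES);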

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 arch/i386/kernel/entry.S     |   25 +++++++++++++++++++++++++
 arch/i386/kernel/process.c   |   31 +++++++++++++++++++++++++++++++
 include/asm-i386/processor.h |    5 +++++
 3 files changed, 61 insertions(+)

Index: linux/arch/i386/kernel/entry.S
===================================================================
--- linux.orig/arch/i386/kernel/entry.S
+++ linux/arch/i386/kernel/entry.S
@@ -996,6 +996,31 @@ ENTRY(kernel_thread_helper)
 	CFI_ENDPROC
 ENDPROC(kernel_thread_helper)
 
+ENTRY(async_thread_helper)
+	CFI_STARTPROC
+	/*
+	 * Allocate space on the stack for pt-regs.
+	 * sizeof(struct pt_regs) == 64, and we've got 8 bytes on the
+	 * kernel stack already:
+	 */
+	subl $64-8, %esp
+	CFI_ADJUST_CFA_OFFSET 64
+	movl %edx,%eax
+	push %edx
+	CFI_ADJUST_CFA_OFFSET 4
+	call *%ebx
+	addl $4, %esp
+	CFI_ADJUST_CFA_OFFSET -4
+
+	movl %eax, PT_EAX(%esp)
+
+	GET_THREAD_INFO(%ebp)
+
+	jmp syscall_exit
+	CFI_ENDPROC
+ENDPROC(async_thread_helper)
+
+
 .section .rodata,"a"
 #include "syscall_table.S"
 
Index: linux/arch/i386/kernel/process.c
===================================================================
--- linux.orig/arch/i386/kernel/process.c
+++ linux/arch/i386/kernel/process.c
@@ -352,6 +352,37 @@ int kernel_thread(int (*fn)(void *), voi
 EXPORT_SYMBOL(kernel_thread);
 
 /*
+ * This gets run with %ebx containing the
+ * function to call, and %edx containing
+ * the "args".
+ */
+extern void async_thread_helper(void);
+
+/*
+ * Create an async thread
+ */
+int create_async_thread(int (*fn)(void *), void * arg, unsigned long flags)
+{
+	struct pt_regs regs;
+
+	memset(&regs, 0, sizeof(regs));
+
+	regs.ebx = (unsigned long) fn;
+	regs.edx = (unsigned long) arg;
+
+	regs.xds = __USER_DS;
+	regs.xes = __USER_DS;
+	regs.xgs = __KERNEL_PDA;
+	regs.orig_eax = -1;
+	regs.eip = (unsigned long) async_thread_helper;
+	regs.xcs = __KERNEL_CS | get_kernel_rpl();
+	regs.eflags = X86_EFLAGS_IF | X86_EFLAGS_SF | X86_EFLAGS_PF | 0x2;
+
+	/* Ok, create the new task.. */
+	return do_fork(flags, 0, &regs, 0, NULL, NULL);
+}
+
+/*
  * Free current thread data structures etc..
  */
 void exit_thread(void)
Index: linux/include/asm-i386/processor.h
===================================================================
--- linux.orig/include/asm-i386/processor.h
+++ linux/include/asm-i386/processor.h
@@ -472,6 +472,11 @@ extern void prepare_to_copy(struct task_
  */
 extern int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags);
 
+/*
+ * create an async thread:
+ */
+extern int create_async_thread(int (*fn)(void *), void * arg, unsigned long flags);
+
 extern unsigned long thread_saved_pc(struct task_struct *tsk);
 void show_trace(struct task_struct *task, struct pt_regs *regs, unsigned long *stack);
 


* [patch 08/13] syslets: x86, add move_user_context() method
  2007-02-21 21:13 [patch 00/13] Syslets, "Threadlets", generic AIO support, v3 Ingo Molnar
                   ` (6 preceding siblings ...)
  2007-02-21 21:15 ` [patch 07/13] syslets: x86, add create_async_thread() method Ingo Molnar
@ 2007-02-21 21:15 ` Ingo Molnar
  2007-02-21 23:03   ` Davide Libenzi
  2007-02-21 21:15 ` [patch 09/13] syslets: x86, mark async unsafe syscalls Ingo Molnar
                   ` (10 subsequent siblings)
  18 siblings, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-21 21:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

From: Ingo Molnar <mingo@elte.hu>

add the move_user_context() method to move the user-space
context of one kernel thread to another kernel thread.
User-space might notice the changed TID, but execution,
stack and register contents (general purpose and FPU) are
still the same.

An architecture must implement this interface before it can turn
CONFIG_ASYNC_SUPPORT on.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 arch/i386/kernel/process.c |   21 +++++++++++++++++++++
 include/asm-i386/system.h  |    7 +++++++
 2 files changed, 28 insertions(+)

Index: linux/arch/i386/kernel/process.c
===================================================================
--- linux.orig/arch/i386/kernel/process.c
+++ linux/arch/i386/kernel/process.c
@@ -820,6 +820,27 @@ unsigned long get_wchan(struct task_stru
 }
 
 /*
+ * Move user-space context from one kernel thread to another.
+ * This includes registers and FPU state. Callers must make
+ * sure that neither task is running user context at the moment:
+ */
+void
+move_user_context(struct task_struct *new_task, struct task_struct *old_task)
+{
+	struct pt_regs *old_regs = task_pt_regs(old_task);
+	struct pt_regs *new_regs = task_pt_regs(new_task);
+	union i387_union *tmp;
+
+	*new_regs = *old_regs;
+	/*
+	 * Flip around the FPU state too:
+	 */
+	tmp = new_task->thread.i387;
+	new_task->thread.i387 = old_task->thread.i387;
+	old_task->thread.i387 = tmp;
+}
+
+/*
  * sys_alloc_thread_area: get a yet unused TLS descriptor index.
  */
 static int get_free_idx(void)
Index: linux/include/asm-i386/system.h
===================================================================
--- linux.orig/include/asm-i386/system.h
+++ linux/include/asm-i386/system.h
@@ -33,6 +33,13 @@ extern struct task_struct * FASTCALL(__s
 		      "2" (prev), "d" (next));				\
 } while (0)
 
+/*
+ * Move user-space context from one kernel thread to another.
+ * This includes registers and FPU state for now:
+ */
+extern void
+move_user_context(struct task_struct *new_task, struct task_struct *old_task);
+
 #define _set_base(addr,base) do { unsigned long __pr; \
 __asm__ __volatile__ ("movw %%dx,%1\n\t" \
 	"rorl $16,%%edx\n\t" \


* [patch 09/13] syslets: x86, mark async unsafe syscalls
  2007-02-21 21:13 [patch 00/13] Syslets, "Threadlets", generic AIO support, v3 Ingo Molnar
                   ` (7 preceding siblings ...)
  2007-02-21 21:15 ` [patch 08/13] syslets: x86, add move_user_context() method Ingo Molnar
@ 2007-02-21 21:15 ` Ingo Molnar
  2007-02-21 21:15 ` [patch 10/13] syslets: x86: enable ASYNC_SUPPORT Ingo Molnar
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-02-21 21:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

From: Ingo Molnar <mingo@elte.hu>

mark clone() and fork() as not available for async execution.
Both need an intact user context beneath them to work.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 arch/i386/kernel/ioport.c  |    6 ++++++
 arch/i386/kernel/ldt.c     |    3 +++
 arch/i386/kernel/process.c |    6 ++++++
 arch/i386/kernel/vm86.c    |    6 ++++++
 4 files changed, 21 insertions(+)

Index: linux/arch/i386/kernel/ioport.c
===================================================================
--- linux.orig/arch/i386/kernel/ioport.c
+++ linux/arch/i386/kernel/ioport.c
@@ -62,6 +62,9 @@ asmlinkage long sys_ioperm(unsigned long
 	struct tss_struct * tss;
 	unsigned long *bitmap;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	if ((from + num <= from) || (from + num > IO_BITMAP_BITS))
 		return -EINVAL;
 	if (turn_on && !capable(CAP_SYS_RAWIO))
@@ -139,6 +142,9 @@ asmlinkage long sys_iopl(unsigned long u
 	unsigned int old = (regs->eflags >> 12) & 3;
 	struct thread_struct *t = &current->thread;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	if (level > 3)
 		return -EINVAL;
 	/* Trying to gain more privileges? */
Index: linux/arch/i386/kernel/ldt.c
===================================================================
--- linux.orig/arch/i386/kernel/ldt.c
+++ linux/arch/i386/kernel/ldt.c
@@ -233,6 +233,9 @@ asmlinkage int sys_modify_ldt(int func, 
 {
 	int ret = -ENOSYS;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	switch (func) {
 	case 0:
 		ret = read_ldt(ptr, bytecount);
Index: linux/arch/i386/kernel/process.c
===================================================================
--- linux.orig/arch/i386/kernel/process.c
+++ linux/arch/i386/kernel/process.c
@@ -731,6 +731,9 @@ struct task_struct fastcall * __switch_t
 
 asmlinkage int sys_fork(struct pt_regs regs)
 {
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	return do_fork(SIGCHLD, regs.esp, &regs, 0, NULL, NULL);
 }
 
@@ -740,6 +743,9 @@ asmlinkage int sys_clone(struct pt_regs 
 	unsigned long newsp;
 	int __user *parent_tidptr, *child_tidptr;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	clone_flags = regs.ebx;
 	newsp = regs.ecx;
 	parent_tidptr = (int __user *)regs.edx;
Index: linux/arch/i386/kernel/vm86.c
===================================================================
--- linux.orig/arch/i386/kernel/vm86.c
+++ linux/arch/i386/kernel/vm86.c
@@ -208,6 +208,9 @@ asmlinkage int sys_vm86old(struct pt_reg
 	struct task_struct *tsk;
 	int tmp, ret = -EPERM;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	tsk = current;
 	if (tsk->thread.saved_esp0)
 		goto out;
@@ -238,6 +241,9 @@ asmlinkage int sys_vm86(struct pt_regs r
 	int tmp, ret;
 	struct vm86plus_struct __user *v86;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	tsk = current;
 	switch (regs.ebx) {
 		case VM86_REQUEST_IRQ:


* [patch 10/13] syslets: x86: enable ASYNC_SUPPORT
  2007-02-21 21:13 [patch 00/13] Syslets, "Threadlets", generic AIO support, v3 Ingo Molnar
                   ` (8 preceding siblings ...)
  2007-02-21 21:15 ` [patch 09/13] syslets: x86, mark async unsafe syscalls Ingo Molnar
@ 2007-02-21 21:15 ` Ingo Molnar
  2007-02-21 21:15 ` [patch 11/13] syslets: x86, wire up the syslet system calls Ingo Molnar
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-02-21 21:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

From: Ingo Molnar <mingo@elte.hu>

enable CONFIG_ASYNC_SUPPORT on x86.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 arch/i386/Kconfig |    4 ++++
 1 file changed, 4 insertions(+)

Index: linux/arch/i386/Kconfig
===================================================================
--- linux.orig/arch/i386/Kconfig
+++ linux/arch/i386/Kconfig
@@ -38,6 +38,10 @@ config MMU
 	bool
 	default y
 
+config ASYNC_SUPPORT
+	bool
+	default y
+
 config SBUS
 	bool
 


* [patch 11/13] syslets: x86, wire up the syslet system calls
  2007-02-21 21:13 [patch 00/13] Syslets, "Threadlets", generic AIO support, v3 Ingo Molnar
                   ` (9 preceding siblings ...)
  2007-02-21 21:15 ` [patch 10/13] syslets: x86: enable ASYNC_SUPPORT Ingo Molnar
@ 2007-02-21 21:15 ` Ingo Molnar
  2007-02-21 21:15 ` [patch 12/13] syslets: x86: optimized copy_uatom() Ingo Molnar
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-02-21 21:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

From: Ingo Molnar <mingo@elte.hu>

wire up the new syslet / async system calls and thus make them
available to user-space.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 arch/i386/kernel/syscall_table.S |    6 ++++++
 include/asm-i386/unistd.h        |    8 +++++++-
 2 files changed, 13 insertions(+), 1 deletion(-)

Index: linux/arch/i386/kernel/syscall_table.S
===================================================================
--- linux.orig/arch/i386/kernel/syscall_table.S
+++ linux/arch/i386/kernel/syscall_table.S
@@ -319,3 +319,9 @@ ENTRY(sys_call_table)
 	.long sys_move_pages
 	.long sys_getcpu
 	.long sys_epoll_pwait
+	.long sys_async_exec		/* 320 */
+	.long sys_async_wait
+	.long sys_umem_add
+	.long sys_async_thread
+	.long sys_threadlet_on
+	.long sys_threadlet_off		/* 325 */
Index: linux/include/asm-i386/unistd.h
===================================================================
--- linux.orig/include/asm-i386/unistd.h
+++ linux/include/asm-i386/unistd.h
@@ -325,10 +325,16 @@
 #define __NR_move_pages		317
 #define __NR_getcpu		318
 #define __NR_epoll_pwait	319
+#define __NR_async_exec		320
+#define __NR_async_wait		321
+#define __NR_umem_add		322
+#define __NR_async_thread	323
+#define __NR_threadlet_on	324
+#define __NR_threadlet_off	325
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 320
+#define NR_syscalls 326
 
 #define __ARCH_WANT_IPC_PARSE_VERSION
 #define __ARCH_WANT_OLD_READDIR


* [patch 12/13] syslets: x86: optimized copy_uatom()
  2007-02-21 21:13 [patch 00/13] Syslets, "Threadlets", generic AIO support, v3 Ingo Molnar
                   ` (10 preceding siblings ...)
  2007-02-21 21:15 ` [patch 11/13] syslets: x86, wire up the syslet system calls Ingo Molnar
@ 2007-02-21 21:15 ` Ingo Molnar
  2007-02-21 21:16 ` [patch 13/13] syslets: x86: optimized sys_umem_add() Ingo Molnar
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-02-21 21:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

From: Ingo Molnar <mingo@elte.hu>

provide an optimized assembly version of the copy_uatom() method.
This is about 3 times faster than the C version.
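
For reference, a hedged C sketch of the semantics the assembly below
implements (struct and field names follow the syslet patches; this is
an illustration, not the actual C original):

	long copy_uatom(struct syslet_atom *atom, struct syslet_uatom __user *uatom)
	{
		unsigned long __user *arg_ptr;
		long ret = 0;
		int i;

		if (!access_ok(VERIFY_READ, uatom, sizeof(*uatom)))
			return -EFAULT;

		ret |= __get_user(atom->flags, &uatom->flags);
		ret |= __get_user(atom->nr, &uatom->nr);
		ret |= __get_user(atom->ret_ptr, &uatom->ret_ptr);
		ret |= __get_user(atom->next, &uatom->next);

		for (i = 0; i < 6; i++) {
			ret |= __get_user(arg_ptr, &uatom->arg_ptr[i]);
			if (!arg_ptr)
				break;
			/* arg_ptr is itself a user pointer, check it too: */
			ret |= get_user(atom->args[i], arg_ptr);
		}
		/* a NULL arg pointer zeroes this and all remaining args: */
		for (; i < 6; i++)
			atom->args[i] = 0;

		return ret ? -EFAULT : 0;
	}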

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 arch/i386/lib/getuser.S |  115 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 115 insertions(+)

Index: linux/arch/i386/lib/getuser.S
===================================================================
--- linux.orig/arch/i386/lib/getuser.S
+++ linux/arch/i386/lib/getuser.S
@@ -10,6 +10,121 @@
  */
 #include <asm/thread_info.h>
 
+/*
+ * copy_uatom() - copy a syslet_atom from user-space into kernel-space:
+ */
+
+.text
+.align 4
+.globl copy_uatom
+copy_uatom:
+	#
+	# regparm(2) call, %eax: atom, %edx: uatom
+	#
+	movl %eax, %ecx                 # %ecx is atom, %edx remains uatom
+
+	cmpl $__PAGE_OFFSET-40, %edx	# access_ok(uatom, sizeof(*uatom))
+	jae bad_copy_uatom
+
+
+1:	movl (%edx), %eax		# atom->flags = uatom->flags
+	movl %eax, (%ecx)
+
+2:	movl 4(%edx), %eax		# atom->nr = uatom->nr
+	movl %eax, 4(%ecx)
+
+3:	movl 8(%edx), %eax		# atom->ret_ptr = uatom->ret_ptr
+	movl %eax, 8(%ecx)
+
+4:	movl 12(%edx), %eax		# atom->next = uatom->next
+	movl %eax, 12(%ecx)
+
+
+10:	movl 16(%edx), %eax		# atom->arg_ptr[0] = uatom->arg_ptr[0]
+	testl %eax, %eax
+	jz 20f				# NULL ptr - zero out remaining args
+	cmpl $__PAGE_OFFSET, %eax	# access_ok(arg_ptr)
+	jae bad_copy_uatom
+100:	movl (%eax), %eax
+	movl %eax, 16(%ecx)
+
+11:	movl 20(%edx), %eax		# atom->arg_ptr[1] = uatom->arg_ptr[1]
+	testl %eax, %eax
+	jz 21f				# NULL ptr - zero out remaining args
+	cmpl $__PAGE_OFFSET, %eax	# access_ok(arg_ptr)
+	jae bad_copy_uatom
+110:	movl (%eax), %eax
+	movl %eax, 20(%ecx)
+
+12:	movl 24(%edx), %eax		# atom->arg_ptr[2] = uatom->arg_ptr[2]
+	testl %eax, %eax
+	jz 22f				# NULL ptr - zero out remaining args
+	cmpl $__PAGE_OFFSET, %eax	# access_ok(arg_ptr)
+	jae bad_copy_uatom
+120:	movl (%eax), %eax
+	movl %eax, 24(%ecx)
+
+13:	movl 28(%edx), %eax		# atom->arg_ptr[3] = uatom->arg_ptr[3]
+	testl %eax, %eax
+	jz 23f				# NULL ptr - zero out remaining args
+	cmpl $__PAGE_OFFSET, %eax	# access_ok(arg_ptr)
+	jae bad_copy_uatom
+130:	movl (%eax), %eax
+	movl %eax, 28(%ecx)
+
+14:	movl 32(%edx), %eax		# atom->arg_ptr[4] = uatom->arg_ptr[4]
+	testl %eax, %eax
+	jz 24f				# NULL ptr - zero out remaining args
+	cmpl $__PAGE_OFFSET, %eax	# access_ok(arg_ptr)
+	jae bad_copy_uatom
+140:	movl (%eax), %eax
+	movl %eax, 32(%ecx)
+
+15:	movl 36(%edx), %eax		# atom->arg_ptr[5] = uatom->arg_ptr[5]
+	testl %eax, %eax
+	jz 25f				# NULL ptr - zero out remaining args
+	cmpl $__PAGE_OFFSET, %eax	# access_ok(arg_ptr)
+	jae bad_copy_uatom
+150:	movl (%eax), %eax
+	movl %eax, 36(%ecx)
+
+	xorl %eax, %eax			# return 0
+	ret
+
+	#
+	# Zero out arg values. Since ptr was NULL, %eax is zero:
+	#
+20:	movl %eax, 16(%ecx)		# atom->args[0] = 0
+21:	movl %eax, 20(%ecx)		# atom->args[1] = 0
+22:	movl %eax, 24(%ecx)		# atom->args[2] = 0
+23:	movl %eax, 28(%ecx)		# atom->args[3] = 0
+24:	movl %eax, 32(%ecx)		# atom->args[4] = 0
+25:	movl %eax, 36(%ecx)		# atom->args[5] = 0
+
+	ret				# return 0
+
+bad_copy_uatom:
+	movl $-14, %eax
+	ret
+
+.section __ex_table,"a"
+	.long 1b,   bad_copy_uatom
+	.long 2b,   bad_copy_uatom
+	.long 3b,   bad_copy_uatom
+	.long 4b,   bad_copy_uatom
+	.long 10b,  bad_copy_uatom
+	.long 100b, bad_copy_uatom
+	.long 11b,  bad_copy_uatom
+	.long 110b, bad_copy_uatom
+	.long 12b,  bad_copy_uatom
+	.long 120b, bad_copy_uatom
+	.long 13b,  bad_copy_uatom
+	.long 130b, bad_copy_uatom
+	.long 14b,  bad_copy_uatom
+	.long 140b, bad_copy_uatom
+	.long 15b,  bad_copy_uatom
+	.long 150b, bad_copy_uatom
+.previous
 
 /*
  * __get_user_X


* [patch 13/13] syslets: x86: optimized sys_umem_add()
  2007-02-21 21:13 [patch 00/13] Syslets, "Threadlets", generic AIO support, v3 Ingo Molnar
                   ` (11 preceding siblings ...)
  2007-02-21 21:15 ` [patch 12/13] syslets: x86: optimized copy_uatom() Ingo Molnar
@ 2007-02-21 21:16 ` Ingo Molnar
  2007-02-21 22:46 ` [patch 00/13] Syslets, "Threadlets", generic AIO support, v3 Michael K. Edwards
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-02-21 21:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

From: Ingo Molnar <mingo@elte.hu>

provide an optimized assembly version of sys_umem_add().
It is about 2 times faster than the C version.
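
For reference, a hedged C sketch of the semantics (like the assembly
below, this is not atomic against concurrent updates of *uptr):

	asmlinkage long sys_umem_add(unsigned long __user *uptr,
				     unsigned long inc)
	{
		unsigned long val;

		if (get_user(val, uptr))
			return -EFAULT;
		val += inc;
		if (put_user(val, uptr))
			return -EFAULT;

		/* return the new value, as the assembly does: */
		return val;
	}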

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 arch/i386/lib/getuser.S |   27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

Index: linux/arch/i386/lib/getuser.S
===================================================================
--- linux.orig/arch/i386/lib/getuser.S
+++ linux/arch/i386/lib/getuser.S
@@ -11,6 +11,33 @@
 #include <asm/thread_info.h>
 
 /*
+ * Optimized user-memory arithmetics system-call:
+ */
+
+.text
+.align 4
+.globl sys_umem_add
+sys_umem_add:
+	movl 0x4(%esp), %ecx			# uptr
+	movl 0x8(%esp), %edx			# inc
+
+	cmpl $__PAGE_OFFSET, %ecx		# access_ok(uptr)
+	jae bad_sys_umem_add
+
+1:	addl %edx, (%ecx)			# *uptr += inc
+2:	movl (%ecx), %eax			# return *uptr
+	ret
+
+bad_sys_umem_add:
+	movl $-14, %eax
+	ret
+
+.section __ex_table,"a"
+	.long 1b, bad_sys_umem_add
+	.long 2b, bad_sys_umem_add
+.previous
+
+/*
  * copy_uatom() - copy a syslet_atom from user-space into kernel-space:
  */
 


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-21 21:13 [patch 00/13] Syslets, "Threadlets", generic AIO support, v3 Ingo Molnar
                   ` (12 preceding siblings ...)
  2007-02-21 21:16 ` [patch 13/13] syslets: x86: optimized sys_umem_add() Ingo Molnar
@ 2007-02-21 22:46 ` Michael K. Edwards
  2007-02-21 22:57   ` Ingo Molnar
                     ` (3 more replies)
  2007-02-22 10:01 ` Suparna Bhattacharya
                   ` (4 subsequent siblings)
  18 siblings, 4 replies; 337+ messages in thread
From: Michael K. Edwards @ 2007-02-21 22:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Davide Libenzi, Jens Axboe,
	Thomas Gleixner

On 2/21/07, Ingo Molnar <mingo@elte.hu> wrote:
> I believe this threadlet concept is what user-space will want to use for
> programmable parallelism.

This is brilliant.  Now it needs just four more things:

1) Documentation of what you can and can't do safely from a threadlet,
given that it runs in an unknown thread context;

2) Facilities for manipulating pools of threadlets, so you can
throttle their concurrency, reprioritize them, and cancel them in
bulk, disposing safely of any dynamically allocated memory,
synchronization primitives, and so forth that they may be holding;

3) Reworked threadlet scheduling to allow tens of thousands of blocked
threadlets to be dispatched efficiently in a controlled, throttled,
non-cache-and-MMU-thrashing manner, immediately following the softirq
that unblocks the I/O they're waiting on; and

4) AIO vsyscalls whose semantics resemble those of IEEE 754 floating
point operations, with a clear distinction between a) pipeline state
vs. operands, b) results vs. side effects, and c) coding errors vs.
not-a-number results vs. exceptions that cost you a pipeline flush and
nonlocal branch.

When these four problems are solved (and possibly one or two more that
I'm not thinking of), you will have caught up with the state of the
art in massively parallel event-driven cooperative multitasking
frameworks.  This would be a really, really good thing for Linux and
its users.

Cheers,
- Michael


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-21 22:46 ` [patch 00/13] Syslets, "Threadlets", generic AIO support, v3 Michael K. Edwards
@ 2007-02-21 22:57   ` Ingo Molnar
  2007-02-22  0:53     ` Michael K. Edwards
  2007-02-21 23:03   ` [patch 00/13] Syslets, "Threadlets", generic AIO support, v3 Ingo Molnar
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-21 22:57 UTC (permalink / raw)
  To: Michael K. Edwards
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Davide Libenzi, Jens Axboe,
	Thomas Gleixner


* Michael K. Edwards <medwards.linux@gmail.com> wrote:

> 3) Reworked threadlet scheduling to allow tens of thousands of blocked 
> threadlets to be dispatched efficiently in a controlled, throttled, 
> non-cache-and-MMU-thrashing manner, immediately following the softirq 
> that unblocks the I/O they're waiting on; and

threadlets, when they don't block, are just regular user-space function 
calls - so no need to schedule or throttle them. [*]

threadlets, when they block, are regular kernel threads, so the regular 
O(1) scheduler takes care of them. If MMU thrashing is of any concern 
then syslets should be used to implement the most performance-critical 
events: under Linux a kernel thread that does not exit out to user-space 
does not do any TLB switching at all. (even if there are multiple 
processes active and their syslets intermix)

throttling of outstanding async contexts is most easily done by 
user-space - you can see an example in threadlet-test.c, but there's 
also fio/engines/syslet-rw.c. v2 had a kernel-space throttling mechanism 
as well, i'll probably reintroduce that in later versions.

	Ingo

[*] although certain more advanced scheduling tactics like the detection 
    of frequently executed threadlet functions and their pushing out to 
    separate contexts is possible too - but this is an optional add-on 
    and for later.



* Re: [patch 08/13] syslets: x86, add move_user_context() method
  2007-02-21 21:15 ` [patch 08/13] syslets: x86, add move_user_context() method Ingo Molnar
@ 2007-02-21 23:03   ` Davide Libenzi
  2007-02-21 23:20     ` Ingo Molnar
  0 siblings, 1 reply; 337+ messages in thread
From: Davide Libenzi @ 2007-02-21 23:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linux Kernel Mailing List, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Wed, 21 Feb 2007, Ingo Molnar wrote:

> From: Ingo Molnar <mingo@elte.hu>
> 
> add the move_user_context() method to move the user-space
> context of one kernel thread to another kernel thread.
> User-space might notice the changed TID, but execution,
> stack and register contents (general purpose and FPU) are
> still the same.

Also signal handling should/must be maintained, on top of the TID.
You don't want the user to be presented with different signal handling 
after a sys_async_exec call.




> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> ---
>  arch/i386/kernel/process.c |   21 +++++++++++++++++++++
>  include/asm-i386/system.h  |    7 +++++++
>  2 files changed, 28 insertions(+)
> 
> Index: linux/arch/i386/kernel/process.c
> ===================================================================
> --- linux.orig/arch/i386/kernel/process.c
> +++ linux/arch/i386/kernel/process.c
> @@ -820,6 +820,27 @@ unsigned long get_wchan(struct task_stru
>  }
>  
>  /*
> + * Move user-space context from one kernel thread to another.
> + * This includes registers and FPU state. Callers must make
> + * sure that neither task is running user context at the moment:
> + */
> +void
> +move_user_context(struct task_struct *new_task, struct task_struct *old_task)
> +{
> +	struct pt_regs *old_regs = task_pt_regs(old_task);
> +	struct pt_regs *new_regs = task_pt_regs(new_task);
> +	union i387_union *tmp;
> +
> +	*new_regs = *old_regs;
> +	/*
> +	 * Flip around the FPU state too:
> +	 */
> +	tmp = new_task->thread.i387;
> +	new_task->thread.i387 = old_task->thread.i387;
> +	old_task->thread.i387 = tmp;
> +}

This is not going to work in this case (already posted twice in other 
emails):

---
Given TS_USEDFPU set (NTSK == new_task, OTSK == old_task), before 
move_user_context():

CPU  => FPUc
NTSK => FPUn
OTSK => FPUo

After move_user_context():

CPU  => FPUc
NTSK => FPUo
OTSK => FPUn

After the incoming __unlazy_fpu() in __switch_to():

CPU  => FPUc
NTSK => FPUo
OTSK => FPUc

After the first fault in NTSK:

CPU  => FPUo
NTSK => FPUo
OTSK => FPUc

So NTSK loads a non up2date FPUo, instead of the FPUc that was the "dirty" 
context to migrate (since TS_USEDFPU was set). I think you need an early 
__unlazy_fpu() in that case, that would turn the above into:

Before move_user_context():

CPU  => FPUc
NTSK => FPUn
OTSK => FPUo

After an early __unlazy_fpu() before FPU member swap:

CPU  => FPUc
NTSK => FPUn
OTSK => FPUc

After move_user_context():

CPU  => FPUc
NTSK => FPUc
OTSK => FPUn

After the first fault in NTSK:

CPU  => FPUc
NTSK => FPUc
OTSK => FPUn

So, NTSK (the return-to-userspace task) will get the correct FPUc after 
a fault. But OTSK (now becoming the service thread) will load FPUn after 
a fault, which is not what is expected. You may need a copy in that case.
I think correct FPU context handling is not going to be as easy as 
swapping FPU pointers.
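
A sketch of the early-flush variant described above (illustrative only;
the extra copy for OTSK discussed in the last paragraph is left open):

	void
	move_user_context(struct task_struct *new_task, struct task_struct *old_task)
	{
		union i387_union *tmp;

		/* flush the live FPU state (FPUc) into old_task first: */
		__unlazy_fpu(old_task);

		*task_pt_regs(new_task) = *task_pt_regs(old_task);

		tmp = new_task->thread.i387;
		new_task->thread.i387 = old_task->thread.i387;
		old_task->thread.i387 = tmp;
	}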



- Davide




* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-21 22:46 ` [patch 00/13] Syslets, "Threadlets", generic AIO support, v3 Michael K. Edwards
  2007-02-21 22:57   ` Ingo Molnar
@ 2007-02-21 23:03   ` Ingo Molnar
  2007-02-21 23:24   ` Ingo Molnar
  2007-02-21 23:31   ` Ingo Molnar
  3 siblings, 0 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-02-21 23:03 UTC (permalink / raw)
  To: Michael K. Edwards
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Davide Libenzi, Jens Axboe,
	Thomas Gleixner


* Michael K. Edwards <medwards.linux@gmail.com> wrote:

> 1) Documentation of what you can and can't do safely from a threadlet, 
> given that it runs in an unknown thread context;

you can do just about anything from a threadlet, using bog standard 
procedural programming. (Certain system-calls are excluded at the moment 
out of caution - but i'll probably lift restrictions like sys_clone() 
use because sys_clone() can be done safely from a threadlet.)

The code must be thread-safe, because the kernel can move execution to a 
new thread anytime and then it will execute in parallel with the main 
thread. There's no other requirement.

Wrt. performance, one good model is to run request-alike functionality 
from a threadlet, to maximize parallelism.

	ingo


* Re: [patch 08/13] syslets: x86, add move_user_context() method
  2007-02-21 23:03   ` Davide Libenzi
@ 2007-02-21 23:20     ` Ingo Molnar
  2007-02-22  4:10       ` Davide Libenzi
  0 siblings, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-21 23:20 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Linux Kernel Mailing List, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner


* Davide Libenzi <davidel@xmailserver.org> wrote:

> On Wed, 21 Feb 2007, Ingo Molnar wrote:
> 
> > From: Ingo Molnar <mingo@elte.hu>
> > 
> > add the move_user_context() method to move the user-space
> > context of one kernel thread to another kernel thread.
> > User-space might notice the changed TID, but execution,
> > stack and register contents (general purpose and FPU) are
> > still the same.
> 
> Also signal handling should/must be maintained, on top of the TID. You 
> don't want the user to be presented with different signal handling 
> after a sys_async_exec call.

right now CLONE_SIGNAL and CLONE_SIGHAND are used for new async threads, 
so they should inherit and share all the signal settings.

one area that definitely needs more work is that the ptrace parent (if 
any) should probably follow the 'head' context. gdb at the moment copes 
surprisingly well, but some artifacts are visible every now and then.

> > +	*new_regs = *old_regs;
> > +	/*
> > +	 * Flip around the FPU state too:
> > +	 */
> > +	tmp = new_task->thread.i387;
> > +	new_task->thread.i387 = old_task->thread.i387;
> > +	old_task->thread.i387 = tmp;
> > +}
> 
> This is not going to work in this case (already posted twice in other 
> emails):

i'm really sorry - i still have a huge email backlog.

> So NTSK loads a non up2date FPUo, instead of the FPUc that was the 
> "dirty" context to migrate (since TS_USEDFPU was set). I think you 
> need an early __unlazy_fpu() in that case, that would turn the above 
> into:

yes. My plan is to avoid all these problems by having a 
special-purpose sched_yield_to(old_task, new_task) function.

this, besides being even faster than the default scheduler (because the 
runqueue balance does not change so no real scheduling decision has to 
be done - the true scheduling decisions happen later on at async-wakeup 
time), should also avoid all the FPU races: the FPU just gets flipped 
between old_task and new_task (and TS_USEDFPU needs to be moved as well, 
etc.). No intermediate task can come in between.

can you see a hole in this sched_yield_to() method as well?

	Ingo


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-21 22:46 ` [patch 00/13] Syslets, "Threadlets", generic AIO support, v3 Michael K. Edwards
  2007-02-21 22:57   ` Ingo Molnar
  2007-02-21 23:03   ` [patch 00/13] Syslets, "Threadlets", generic AIO support, v3 Ingo Molnar
@ 2007-02-21 23:24   ` Ingo Molnar
  2007-02-22  0:55     ` Michael K. Edwards
  2007-02-21 23:31   ` Ingo Molnar
  3 siblings, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-21 23:24 UTC (permalink / raw)
  To: Michael K. Edwards
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Davide Libenzi, Jens Axboe,
	Thomas Gleixner


* Michael K. Edwards <medwards.linux@gmail.com> wrote:

> 2) Facilities for manipulating pools of threadlets, so you can 
> throttle their concurrency, reprioritize them, and cancel them in 
> bulk, disposing safely of any dynamically allocated memory, 
> synchronization primitives, and so forth that they may be holding;

pthread_cancel() [if/once threadlets are integrated into pthreads] ought 
to do that. A threadlet, if it gets moved to an async context, is a 
full-blown thread.

	Ingo


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-21 22:46 ` [patch 00/13] Syslets, "Threadlets", generic AIO support, v3 Michael K. Edwards
                     ` (2 preceding siblings ...)
  2007-02-21 23:24   ` Ingo Molnar
@ 2007-02-21 23:31   ` Ingo Molnar
  2007-02-21 23:46     ` Ulrich Drepper
  2007-02-22  1:04     ` Michael K. Edwards
  3 siblings, 2 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-02-21 23:31 UTC (permalink / raw)
  To: Michael K. Edwards
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Davide Libenzi, Jens Axboe,
	Thomas Gleixner


* Michael K. Edwards <medwards.linux@gmail.com> wrote:

> 4) AIO vsyscalls whose semantics resemble those of IEEE 754 floating 
> point operations, with a clear distinction between a) pipeline state 
> vs. operands, b) results vs. side effects, and c) coding errors vs. 
> not-a-number results vs. exceptions that cost you a pipeline flush and 
> nonlocal branch.

threadlets (and syslets) are parallel contexts and they behave so - 
queuing and execution semantics are then on top of that, implemented 
either by glibc, or implemented by the application. There is no 
'pipeline' of requests imposed - the structure of pending requests is 
totally free-form. For example in threadlet-test.c i've in essence 
implemented a 'set of requests' with the submission site only interested 
in whether all requests are done or not - but any stricter (or even 
looser) semantics and ordering can be used too.

in terms of AIO, the best queueing model is i think what the kernel uses 
internally: freely ordered, with barrier support. (That is equivalent to 
a "queue of sets", where the queue are the barriers, and the sets are 
the requests within barriers. If there is no barrier pending then 
there's just one large freely-ordered set of requests.)
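
One way to picture that model (purely illustrative - these structures
do not exist in the patches):

	struct request_set {			/* requests within a set   */
		struct list_head requests;	/* are freely reordered    */
	};

	struct barrier_queue {			/* barriers separate sets; */
		struct list_head sets;		/* sets retire in order    */
	};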

	Ingo


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-21 23:31   ` Ingo Molnar
@ 2007-02-21 23:46     ` Ulrich Drepper
  2007-02-22  7:40       ` Ingo Molnar
  2007-02-22  1:04     ` Michael K. Edwards
  1 sibling, 1 reply; 337+ messages in thread
From: Ulrich Drepper @ 2007-02-21 23:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Michael K. Edwards, linux-kernel, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Davide Libenzi, Jens Axboe,
	Thomas Gleixner

Ingo Molnar wrote:
> in terms of AIO, the best queueing model is i think what the kernel uses 
> internally: freely ordered, with barrier support.

Speaking of AIO, how do you imagine lio_listio is implemented?  If there
is no asynchronous syscall it would mean creating a threadlet for each
request but this means either waiting or creating several/many threads.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖




* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-21 22:57   ` Ingo Molnar
@ 2007-02-22  0:53     ` Michael K. Edwards
  2007-02-22  1:33       ` Michael K. Edwards
  2007-02-22  6:51       ` Ingo Molnar
  0 siblings, 2 replies; 337+ messages in thread
From: Michael K. Edwards @ 2007-02-22  0:53 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Davide Libenzi, Jens Axboe,
	Thomas Gleixner

On 2/21/07, Ingo Molnar <mingo@elte.hu> wrote:
> threadlets, when they don't block, are just regular user-space function
> calls - so no need to schedule or throttle them. [*]

Right.  That's a great design feature.

> threadlets, when they block, are regular kernel threads, so the regular
> O(1) scheduler takes care of them. If MMU thrashing is of any concern
> then syslets should be used to implement the most performance-critical
> events: under Linux a kernel thread that does not exit out to user-space
> does not do any TLB switching at all. (even if there are multiple
> processes active and their syslets intermix)

As far as I am concerned syslets by themselves are a dead letter,
because you can't do any of the things that potential application
coders need to do with them.  As for threadlets, making them kernel
threads is not such a good design feature, O(1) scheduler or not.  You
take the full hit of kernel task creation, on the spot, for every
threadlet that blocks.  You don't fence off the threadlet from any of
the stuff that it ought to be fenced off from for thread-safety
reasons, so you don't have much choice but to create a new TLS arena
for it (which you need anyway for errno), so they have horrible MMU
and memory overhead.  You can't dispatch them inexpensively, while the
data delivered by a softirq is still hot in cache, to traverse 1-3
lines of userspace code and make the next syscall.  So they're just
not usable for any of the things that a real AIO application actually
does.

> throttling of outstanding async contexts is most easily done by
> user-space - you can see an example in threadlet-test.c, but there's
> also fio/engines/syslet-rw.c. v2 had a kernel-space throttling mechanism
> as well, i'll probably reintroduce that in later versions.

You're telling me that scheduling parallel I/O is the kernel's job but
throttling it is userspace's job?

> [*] although certain more advanced scheduling tactics like the detection
>     of frequently executed threadlet functions and their pushing out to
>     separate contexts is possible too - but this is an optional add-on
>     and for later.

You won't be able to do it later if you don't design for it now.
Don't reinvent the square wheel -- there's a model to follow that was
so successful that it has killed all alternate models in its sphere.
Namely, IEEE 754.  But please try not to make pipeline flushes suck as
much as they did on the i860.

Cheers,
- Michael


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-21 23:24   ` Ingo Molnar
@ 2007-02-22  0:55     ` Michael K. Edwards
  0 siblings, 0 replies; 337+ messages in thread
From: Michael K. Edwards @ 2007-02-22  0:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Davide Libenzi, Jens Axboe,
	Thomas Gleixner

On 2/21/07, Ingo Molnar <mingo@elte.hu> wrote:
> pthread_cancel() [if/once threadlets are integrated into pthreads] ought
> to do that. A threadlet, if it gets moved to an async context, is a
> full-blown thread.

The fact that you are proposing pthread_cancel as a model for how to
abort an unfinished threadlet suggests -- to me, and I would think to
anyone who has ever written code that had no choice but to call
pthread_cancel -- that you have not thought about this part of the
problem.

Cheers,
- Michael


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-21 23:31   ` Ingo Molnar
  2007-02-21 23:46     ` Ulrich Drepper
@ 2007-02-22  1:04     ` Michael K. Edwards
  2007-02-22  7:00       ` Ingo Molnar
  1 sibling, 1 reply; 337+ messages in thread
From: Michael K. Edwards @ 2007-02-22  1:04 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Davide Libenzi, Jens Axboe,
	Thomas Gleixner

On 2/21/07, Ingo Molnar <mingo@elte.hu> wrote:
> threadlets (and syslets) are parallel contexts and they behave as such -
> queuing and execution semantics are then layered on top of that, implemented
> either by glibc, or implemented by the application. There is no
> 'pipeline' of requests imposed - the structure of pending requests is
> totally free-form. For example in threadlet-test.c i've in essence
> implemented a 'set of requests' with the submission site only interested
> in whether all requests are done or not - but any stricter (or even
> looser) semantics and ordering can be used too.

In short, you have a dataflow model with infinite parallelism,
implemented using threads of control mapped willy-nilly onto the
underlying hardware.  This has not proven to be a successful model in
the past.

> in terms of AIO, the best queueing model is i think what the kernel uses
> internally: freely ordered, with barrier support. (That is equivalent to
> a "queue of sets", where the queue are the barriers, and the sets are
> the requests within barriers. If there is no barrier pending then
> there's just one large freely-ordered set of requests.)

That's a big part of why Linux scales poorly for workloads that
involve a large volume of in-flight I/O transactions.  Unless you
essentially lock one application thread to each CPU core, with a
complete understanding of its cache sharing and latency relationships
to all the other cores, and do your own userspace I/O scheduling and
dispatching state machine -- which is what all industrial-strength
databases and other sorts of transaction engines currently do -- you
get the same old best-effort context-thrashing scheduler we've had
since Solaris 2.0.

Let's do something genuinely better this time, OK?

Cheers,
- Michael


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22  0:53     ` Michael K. Edwards
@ 2007-02-22  1:33       ` Michael K. Edwards
  2007-02-22  6:51       ` Ingo Molnar
  1 sibling, 0 replies; 337+ messages in thread
From: Michael K. Edwards @ 2007-02-22  1:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Davide Libenzi, Jens Axboe,
	Thomas Gleixner

On 2/21/07, Michael K. Edwards <medwards.linux@gmail.com> wrote:
> You won't be able to do it later if you don't design for it now.
> Don't reinvent the square wheel -- there's a model to follow that was
> so successful that it has killed all alternate models in its sphere.
> Namely, IEEE 754.  But please try not to make pipeline flushes suck as
> much as they did on the i860.

To understand why I harp on IEEE 754 as a sane model for pipelined
AIO, you might consider reading (at least parts of):
    http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf

People who write industrial-strength floating point programs rely on
the IEEE floating point semantics to avoid having to check every
result of every arithmetic step to see whether it is a valid input to
the next step.  NaNs and +/-0 and overflows and all that jazz are
essential to efficient coding of things like matrix inversion, because
the only alternative is simply to fail.  But in-line indications of an
exceptional result aren't enough, because it may or may not have been
a coding error, and you may need fine-grained control over which
"failure conditions" are within the realm of the expected and which
are not.

Here's a quotable bit from Kahan and Darcy's polemic:
<quote>
To achieve Floating-Point Predictability:
     Limit programmers' choices to what is reasonable and necessary as
well as parsimonious, and
     Limit language implementors' choices so as always to honor the
programmer's choices.
To do so, language designers must understand floating-point well
enough to validate their determination of "what is reasonable and
necessary," or else must entrust that determination to someone else
with the necessary competency.
</quote>

Now I ain't that "someone else", when it comes to AIO pipelining.  But
they're out there.  Figure out how to create an AIO model that honors
the RDBMS programmer's choices efficiently on a NUMA box without
making him understand NUMA, and you really will have created something
for the ages.

Cheers,
- Michael


* Re: [patch 08/13] syslets: x86, add move_user_context() method
  2007-02-21 23:20     ` Ingo Molnar
@ 2007-02-22  4:10       ` Davide Libenzi
  0 siblings, 0 replies; 337+ messages in thread
From: Davide Libenzi @ 2007-02-22  4:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linux Kernel Mailing List, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Thu, 22 Feb 2007, Ingo Molnar wrote:

> 
> * Davide Libenzi <davidel@xmailserver.org> wrote:
> 
> > On Wed, 21 Feb 2007, Ingo Molnar wrote:
> > 
> > > From: Ingo Molnar <mingo@elte.hu>
> > > 
> > > add the move_user_context() method to move the user-space
> > > context of one kernel thread to another kernel thread.
> > > User-space might notice the changed TID, but execution,
> > > stack and register contents (general purpose and FPU) are
> > > still the same.
> > 
> > Also signal handling should/must be maintained, on top of TID. You 
> > don't want the user to be presented with a different signal handling 
> > after an sys_async_exec call.
> 
> right now CLONE_SIGNAL and CLONE_SIGHAND is used for new async threads, 
> so they should inherit and share all the signal settings.

Right. Sorry, I missed the signal cloning flags (I still have to go 
through the whole code).



> > So NTSK loads a non up2date FPUo, instead of the FPUc that was the 
> > "dirty" context to migrate (since TS_USEDFPU was set). I think you 
> > need an early __unlazy_fpu() in that case, that would turn the above 
> > into:
> 
> yes. My plan is to avoid all these problems by having a 
> special-purpose sched_yield_to(old_task, new_task) function.
> 
> this, besides being even faster than the default scheduler (because the 
> runqueue balance does not change so no real scheduling decision has to 
> be done - the true scheduling decisions happen later on at async-wakeup 
> time), should also avoid all the FPU races: the FPU just gets flipped 
> between old_task and new_task (and TS_USEDFPU needs to be moved as well, 
> etc.). No intermediate task can come in between.
> 
> can you see a hole in this sched_yield_to() method as well?

Not sure till I see the code, but in that case we really need a sync&copy. 
This is really fork-like for the FPU context, IMO. The current "dirty" 
FPU context should follow both OTSK and NTSK.
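
just to sketch what I mean by sync&copy (illustrative only, not a 
patch: fpu_fork_to() is a made-up name, and i386 details like 
__unlazy_fpu() and thread.i387 are assumed):

/* sync the live FPU state early, then let it follow both tasks */
static inline void fpu_fork_to(struct task_struct *otsk,
			       struct task_struct *ntsk)
{
	__unlazy_fpu(otsk);			/* sync the "dirty" context */
	ntsk->thread.i387 = otsk->thread.i387;	/* ...and copy it, fork-style */
}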



- Davide




* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22  0:53     ` Michael K. Edwards
  2007-02-22  1:33       ` Michael K. Edwards
@ 2007-02-22  6:51       ` Ingo Molnar
  2007-02-22  8:59         ` Michael K. Edwards
  1 sibling, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-22  6:51 UTC (permalink / raw)
  To: Michael K. Edwards
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Davide Libenzi, Jens Axboe,
	Thomas Gleixner


* Michael K. Edwards <medwards.linux@gmail.com> wrote:

> [...] As for threadlets, making them kernel threads is not such a good 
> design feature, O(1) scheduler or not.  You take the full hit of 
> kernel task creation, on the spot, for every threadlet that blocks. 
> [...]

this is very much not how they work. Threadlets share the same basic 
infrastructure with syslets and they do /not/ take a 'full hit of kernel 
thread creation' when they block. Please read the announcements, past 
discussions on lkml and the code about how it works.

about your other point:

> > threadlets, when they block, are regular kernel threads, so the 
> > regular O(1) scheduler takes care of them. If MMU thrashing is of any 
> > concern then syslets should be used to implement the most 
> > performance-critical events: under Linux a kernel thread that does 
> > not exit out to user-space does not do any TLB switching at all. 
> > (even if there are multiple processes active and their syslets 
> > intermix)
> 
> As far as I am concerned syslets by themselves are a dead letter, 
> because you can't do any of the things that potential application 
> coders need to do with them. [...]

syslets are not meant to be directly exposed to application coders. 
Syslets (as many of my previous mails stated) are meant as building 
blocks to higher-level AIO interfaces such as in glibc or libaio. Then 
user-space can build its state-machine based on syslet-implemented 
glibc/libaio. In that specific role they are a very fast and scalable 
mechanism.

	Ingo


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22  1:04     ` Michael K. Edwards
@ 2007-02-22  7:00       ` Ingo Molnar
  2007-02-22  9:29         ` Michael K. Edwards
  0 siblings, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-22  7:00 UTC (permalink / raw)
  To: Michael K. Edwards
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Davide Libenzi, Jens Axboe,
	Thomas Gleixner


* Michael K. Edwards <medwards.linux@gmail.com> wrote:

> [...]  Unless you essentially lock one application thread to each CPU 
> core, with a complete understanding of its cache sharing and latency 
> relationships to all the other cores, and do your own userspace I/O 
> scheduling and dispatching state machine -- which is what all 
> industrial-strength databases and other sorts of transaction engines 
> currently do -- you get the same old best-effort context-thrashing 
> scheduler we've had since Solaris 2.0.

the syslet/threadlet framework has been derived from Tux, which one can 
accuse of many things, but which i definitely cannot accuse of being 
slow. It has no relationship whatsoever to Solaris 2.0 or later.

Your other mail showed that you have very basic misunderstandings about 
how threadlets work, on which you based a string of firm but incorrect 
conclusions. In this discussion i'm mostly interested in specific 
feedback about syslets/threadlets - thankfully we are past the years of 
"unless Linux does generic technique X it will stay a hobby OS forever" 
type of time-wasting discussions.

	Ingo


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-21 23:46     ` Ulrich Drepper
@ 2007-02-22  7:40       ` Ingo Molnar
  2007-02-22 11:31         ` Evgeniy Polyakov
                           ` (2 more replies)
  0 siblings, 3 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-02-22  7:40 UTC (permalink / raw)
  To: Ulrich Drepper
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner


* Ulrich Drepper <drepper@redhat.com> wrote:

> Ingo Molnar wrote:
> > in terms of AIO, the best queueing model is i think what the kernel uses 
> > internally: freely ordered, with barrier support.
> 
> Speaking of AIO, how do you imagine lio_listio is implemented?  If 
> there is no asynchronous syscall, it would mean creating a threadlet 
> for each request - which means either waiting or creating 
> several/many threads.

my current thinking is that special-purpose (non-programmable, static) 
APIs like aio_*() and lio_*(), where every last cycle of performance 
matters, should be implemented using syslets - even if syslets are quite 
tricky to write (which they no doubt are - just compare the size 
of syslet-test.c to threadlet-test.c). So i'd move syslets into the same 
category as raw syscalls: pieces of the raw infrastructure between the 
kernel and glibc, not an exposed API to apps. [and even if we keep them 
in that category they still need quite a bit of API work, to clean up 
the 32/64-bit issues, etc.]

The size of the async thread pool can be kept in check either from 
user-space (by starting to queue up requests after a certain point of 
saturation without submitting them) or from kernel-space which involves 
waiting (the latter was present in v2 but i temporarily removed it from 
v3). "You have to wait" is the eventual final answer in every 
well-behaved queueing system anyway.
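
as a sketch, the user-space side of such throttling can be as simple as 
this (nr_pending, MAX_PENDING and queue_locally() are made up for 
illustration - they are not part of any API):

/* illustrative user-space throttling of threadlet submission */
#define MAX_PENDING	1024

static int nr_pending;

void submit_req(struct req *req)
{
	if (nr_pending >= MAX_PENDING) {
		queue_locally(req);	/* hold it back, submit it later */
		return;
	}
	if (!threadlet_exec(req_threadlet_fn, req->stack, &user_head))
		nr_pending++;		/* it blocked and went async */
}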

How things work out with a large number of outstanding threads in real 
apps is still an open question (until someone tries it) but i'm 
cautiously optimistic: in my own (FIO based) measurements syslets beat 
the native KAIO interfaces both in the cached and in the non-cached [== 
many threads] case. I did not expect the latter at all: the non-cached 
syslet codepath is not optimized at all yet, so i expected it to have 
(much) higher CPU overhead than KAIO.

This means that KAIO is in worse shape than i thought - there's just way 
too much context KAIO has to build up to submit parallel IO contexts. 
Many years of optimizations went into KAIO already, so it's probably at 
its outer edge of performance capabilities. Furthermore, what KAIO has 
to compete against in the syslet case are the synchronous syscalls 
turned async, and more than a decade of optimizations went into all the 
synchronous syscalls. Plus the 'threading overhead of syslets' really 
boils down to 'scheduling overhead' in the end - and we can do over a 
million context-switches a second, per CPU. What killed user-space 
thread-based AIO performance many moons ago wasn't really the threading 
concept itself or scheduling overhead, it was the (then) fragile 
threading implementation of Linux, combined with the resulting 
signal-based AIO code. Catching and handling a single signal is more 
expensive than a context-switch - and signals have legacies attached to 
them that make them hard to scale within the kernel. Plus with syslets 
the 'threading overhead' is optional, it only happens when it has to.

Plus there's the fundamental killer that KAIO is a /lot/ harder to 
implement (and to maintain) on the kernel side: it has to be implemented 
for every IO discipline, and even for the IO disciplines it supports at 
the moment, it is not truly asynchronous for things like metadata 
blocking or VFS blocking. To handle things like metadata blocking it has 
to resort to non-statemachine techniques like retries - which are bad 
for performance.

Syslets/threadlets on the other hand, once the core is implemented, have 
near zero ongoing maintenance cost (compared to KAIO pushed into every 
IO subsystem) and cover all IO disciplines and API variants immediately, 
and they are as perfectly asynchronous as it gets.

So all in all, i used to think that AIO state-machines have a long-term 
place within the kernel, but with syslets i think i've proven myself 
embarrassingly wrong =B-)

	Ingo


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22  6:51       ` Ingo Molnar
@ 2007-02-22  8:59         ` Michael K. Edwards
  2007-02-22 19:52           ` x86 hardware and transputers (Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3) Oleg Verych
  0 siblings, 1 reply; 337+ messages in thread
From: Michael K. Edwards @ 2007-02-22  8:59 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Davide Libenzi, Jens Axboe,
	Thomas Gleixner

On 2/21/07, Ingo Molnar <mingo@elte.hu> wrote:
> > [...] As for threadlets, making them kernel threads is not such a good
> > design feature, O(1) scheduler or not.  You take the full hit of
> > kernel task creation, on the spot, for every threadlet that blocks.
> > [...]
>
> this is very much not how they work. Threadlets share the same basic
> infrastructure with syslets and they do /not/ take a 'full hit of kernel
> thread creation' when they block. Please read the announcements, past
> discussions on lkml and the code about how it works.

Sorry, you're right, I jumped to a conclusion here without reading the
implementation.  I read too much into your statement that "threadlets,
when they block, are regular kernel threads".  So tell me, at what
stage is NPTL going to need a new TLS context for errno and all that?
Immediately when the threadlet first blocks, right?  At most you can
delay the bulk page copies with CoW MMU tricks (about which I cannot
begin to match your knowledge).  But you can't let any code run that
might touch errno -- or FPU state or anything else you need to
preserve for when you do carry on with the threadlet -- until you've
at least told the hardware to fault on write.

Yes, I will go back and read the code for myself.  This will take me
some time because I have only a hand-waving level of knowledge about
task_structs and pt_regs, and have largely avoided the dark corners of
the x86 architecture.  But I think my point still stands: allowing
code inside threadlets to use the usual C library wrappers around the
usual synchronous syscalls is going to mean that the threadlet context
is fairly heavyweight, both in memory and CPU/MMU state.  This means
that you can't pull it cheaply over to whatever CPU core happened to
process the device I/O that delivered the data it wanted.

If the cost of threadlet migration isn't negligible, then you can't
just write code that initiates a zillion threadlets in a loop -- not
if you want to be able to exploit NUMA or even SMP efficiently.  You
have to distribute the threadlet initiation among parallel threads on
all the CPUs -- at which point you are back where you started, with
the application explicitly partitioned among CPU-locked dispatch
threads.  Any programming team prepared to cope with that is probably
going to stick to the userspace state machine they probably already
have for managing delayed I/O responses.

> syslets are not meant to be directly exposed to application coders.
> Syslets (as many of my previous mails stated) are meant as building
> blocks to higher-level AIO interfaces such as in glibc or libaio. Then
> user-space can build its state-machine based on syslet-implemented
> glibc/libaio. In that specific role they are a very fast and scalable
> mechanism.

With all due respect -- and I know how successful you have been with
major new kernel features in the past -- I think that's a cop-out.
That's like saying that floating point is not meant to be directly
exposed to application coders.  Sure, the details of the floating
point _pipeline_ are essentially opaque; but I don't have to package
up a string of floating point operations into a "floatlet" and send it
off to the FPU.  From the point of view of the integer unit, I move
things into and out of floating point registers, and in between I
issue instructions to the FPU to do arithmetic on its registers.  If
something goes mildly wrong (say, underflow), and I've told the FPU to
carry on under those circumstances, it does.  If something goes bad
wrong, or if underflow happens and I've told the FPU that my code
won't do the right thing on underflow, I get an exception.  That's it.

If the FPU decides to pipeline things, that's not my problem; when I
get to the operation that wants to pull a result back over to the
integer unit, I stall until it's ready.  And in a cleverer, more
tightly interlocked processor that issues some things in parallel and
speculatively executes others, exceptions may not be deliverable until
long after the arithmetic op that produced them (you might not wind up
taking that branch).  Hence other state may have to be rewound so that
the exception is delivered with everything else in the state it would
have been in if the processor weren't so damn clever.

You don't have to invent all that pipelining and migration and
speculative execution stuff up front for AIO.  But if you don't stick
to what is "reasonable and necessary as well as parsimonious" when
defining what's allowed inside a threadlet, you won't leave
implementations any future flexibility.  And "want speed? use syslets
instead" is no answer, especially if you tell me that syslets wrapped
in glibc are only really useful for short-circuiting chained states in
a state machine.  I. Don't. Want. To. Write. State. Machines. Any.
More.

Cheers,
- Michael

Oh, and while I haven't written a kernel or an RDBMS, I have written
some fairly serious non-blocking I/O code (without resorting to
threads; one socket and thousands of independent userspace state
machines) and rewritten the plumbing of two fairly heavy-duty network
operations frameworks, one of them attached to a horrifically complex
GUI.  And briefly, long ago, I made Transputers dance with Occam and
galaxies spin with PVM.  This gives me exactly zero credentials here,
of course, but may suggest to you that when I say something that seems
naive, it's more likely that I got the Linux-specific lingo wrong.
That, or I'm _actively_ stupid.  :-)


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22  7:00       ` Ingo Molnar
@ 2007-02-22  9:29         ` Michael K. Edwards
  0 siblings, 0 replies; 337+ messages in thread
From: Michael K. Edwards @ 2007-02-22  9:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Davide Libenzi, Jens Axboe,
	Thomas Gleixner

On 2/21/07, Ingo Molnar <mingo@elte.hu> wrote:
> the syslet/threadlet framework has been derived from Tux, which one can
> accuse of many things, but which i definitely cannot accuse of being
> slow. It has no relationship whatsoever to Solaris 2.0 or later.

So how well does Tux fare on a NUMA box?  The Solaris 2.0 reference
was not about the origins of the code, it was about the era when SMP
first became relevant to the UNIX application programmer.  I remember
an emphasis on scheduler scalability then, too.  It took them quite a
while to figure out that having an efficient scheduler is of little
use if you are scheduling the wrong things on the wrong CPUs in the
wrong order and thereby thrashing continuously.  By that time we had
given up and gone back to message passing via the network stack, which
was the one kernel component that could figure out how to get state
from one CPU to another without taking all of its clothes off and
changing underwear in between.  Sound familiar?

> Your other mail showed that you have very basic misunderstandings about
> how threadlets work, on which you based a string of firm but incorrect
> conclusions. In this discussion i'm mostly interested in specific
> feedback about syslets/threadlets - thankfully we are past the years of
> "unless Linux does generic technique X it will stay a hobby OS forever"
> type of time-wasting discussions.

Well, maybe I'll let someone else weigh in about whether I understood
threadlets well enough to provide feedback worth reading.  As for the
"hobby OS forever" bit, that's an utter misrepresentation of my
comments and criticism.  Linux is now good enough for Oracle to have
more or less abandoned Sun for Linux.  That's as good as it needs to
be, as far as Oracle and IBM are concerned.  The question is now
whether it will ever get substantially _better_, so that you can do
something constructive with a NUMA box or a 64-core MIPS without
having the resources of an Oracle or a Google to build an
OS-atop-the-OS.

Cheers,
- Michael


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-21 21:13 [patch 00/13] Syslets, "Threadlets", generic AIO support, v3 Ingo Molnar
                   ` (13 preceding siblings ...)
  2007-02-21 22:46 ` [patch 00/13] Syslets, "Threadlets", generic AIO support, v3 Michael K. Edwards
@ 2007-02-22 10:01 ` Suparna Bhattacharya
  2007-02-22 11:20   ` Ingo Molnar
  2007-02-23 12:52 ` A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3) Jens Axboe
                   ` (3 subsequent siblings)
  18 siblings, 1 reply; 337+ messages in thread
From: Suparna Bhattacharya @ 2007-02-22 10:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

On Wed, Feb 21, 2007 at 10:13:55PM +0100, Ingo Molnar wrote:
> [...]
> Threadlets share much of the scheduling infrastructure with syslets.
> 
> Syslets (small, kernel-side, scripted "syscall plugins") are still 
> supported - they are (much...) harder to program than threadlets but 
> they allow the highest performance. Core infrastructure libraries like 
> glibc/libaio are expected to use syslets. Jens Axboe's FIO tool already 
> includes support for v2 syslets, and the following patch updates FIO to 

Ah, glad to see that - I was wondering if it was worthwhile to try adding
syslet support to aio-stress to be able to perform some comparisons.
Hopefully FIO should be able to generate a similar workload, but I haven't
tried it yet so I am not sure. Are you planning to upload some results
(so I can compare them with patterns I am familiar with)?

Regards
Suparna

> the v3 API:
> 
>    http://redhat.com/~mingo/syslet-patches/fio-syslet-v3.patch
> 
> [...]

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India



* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22 10:01 ` Suparna Bhattacharya
@ 2007-02-22 11:20   ` Ingo Molnar
  0 siblings, 0 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-02-22 11:20 UTC (permalink / raw)
  To: Suparna Bhattacharya
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller, Davide Libenzi,
	Jens Axboe, Thomas Gleixner


* Suparna Bhattacharya <suparna@in.ibm.com> wrote:

> > Threadlets share much of the scheduling infrastructure with syslets.
> > 
> > Syslets (small, kernel-side, scripted "syscall plugins") are still 
> > supported - they are (much...) harder to program than threadlets but 
> > they allow the highest performance. Core infrastructure libraries 
> > like glibc/libaio are expected to use syslets. Jens Axboe's FIO tool 
> > already includes support for v2 syslets, and the following patch 
> > updates FIO to
> 
> Ah, glad to see that - I was wondering if it was worthwhile to try 
> adding syslet support to aio-stress to be able to perform some 
> comparisons. [...]

i think it would definitely be worth it.

> [...] Hopefully FIO should be able to generate a similar workload, but 
> I haven't tried it yet so I am not sure. Are you planning to upload some 
> results (so I can compare them with patterns I am familiar with)?

i have not had time yet to do careful benchmarks. Right now my impression from 
quick testing is that libaio performance can be exceeded via syslets. So 
it would be very interesting if you could try this too, independently of 
me.

	Ingo


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22  7:40       ` Ingo Molnar
@ 2007-02-22 11:31         ` Evgeniy Polyakov
  2007-02-22 11:52           ` Arjan van de Ven
                             ` (2 more replies)
  2007-02-22 19:38         ` Davide Libenzi
  2007-02-22 21:23         ` Zach Brown
  2 siblings, 3 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-22 11:31 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ulrich Drepper, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

Hi Ingo, developers.

On Thu, Feb 22, 2007 at 08:40:44AM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> Syslets/threadlets on the other hand, once the core is implemented, have 
> near zero ongoing maintainance cost (compared to KAIO pushed into every 
> IO subsystem) and cover all IO disciplines and API variants immediately, 
> and they are as perfectly asynchronous as it gets.
> 
> So all in one, i used to think that AIO state-machines have a long-term 
> place within the kernel, but with syslets i think i've proven myself 
> embarrasingly wrong =B-)

Hmm...
Try a heavily loaded network web server built on top of
syslets/threadlets.

It is not a TUX anymore - you had 1024 threads, and all of them will be
consumed by tcp_sendmsg() for slow clients - rescheduling will kill a
machine.

My tests show that with 4k connections per second (8k concurrency) more
than 20k connections of 80k total block in tcp_sendmsg() over gigabit
lan between quite fast machines.

Or should threadlet/syslet AIO not be used for networking either?


-- 
	Evgeniy Polyakov


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22 11:31         ` Evgeniy Polyakov
@ 2007-02-22 11:52           ` Arjan van de Ven
  2007-02-22 12:39             ` Evgeniy Polyakov
  2007-02-22 12:59           ` Ingo Molnar
  2007-02-25 22:44           ` Linus Torvalds
  2 siblings, 1 reply; 337+ messages in thread
From: Arjan van de Ven @ 2007-02-22 11:52 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ingo Molnar, Ulrich Drepper, linux-kernel, Linus Torvalds,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner


> It is not a TUX anymore - you had 1024 threads, and all of them will be
> consumed by tcp_sendmsg() for slow clients - rescheduling will kill a
> machine.

I think it's time to make a split in what "context switch" or
"reschedule" means...

there are two types of context switch:

1) To a different process. This means teardown of the TLB, going to a
new MMU state, saving FPU state etc etc etc. This is obviously quite
expensive.

2) To a thread of the same process. No TLB flush, no new MMU state;
effectively all it does is get a new task struct on the kernel side,
and a new ESP/EIP pair on the userspace side. If there is FPU code
involved, that gets saved as well.

Number 1 is very expensive and that is what is really worrying normally;
number 2 is a LOT lighter weight, and while Linux is a bit heavy there,
it can be made lighter... there's no fundamental reason for it to be
really expensive.

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org



* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22 11:52           ` Arjan van de Ven
@ 2007-02-22 12:39             ` Evgeniy Polyakov
  2007-02-22 13:41               ` David Miller
  0 siblings, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-22 12:39 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Ingo Molnar, Ulrich Drepper, linux-kernel, Linus Torvalds,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

On Thu, Feb 22, 2007 at 12:52:39PM +0100, Arjan van de Ven (arjan@infradead.org) wrote:
> 
> > It is not a TUX anymore - you had 1024 threads, and all of them will be
> > consumed by tcp_sendmsg() for slow clients - rescheduling will kill a
> > machine.
> 
> I think it's time to make a split in what "context switch" or
> "reschedule" means...
> 
> there are two types of context switch:
> 
> 1) To a different process. This means teardown of the TLB, going to a
> new MMU state, saving FPU state etc etc etc. This is obviously quite
> expensive.
> 
> 2) To a thread of the same process. No TLB flush, no new MMU state;
> effectively all it does is get a new task struct on the kernel side,
> and a new ESP/EIP pair on the userspace side. If there is FPU code
> involved, that gets saved as well.
> 
> Number 1 is very expensive and that is what is really worrying normally;
> number 2 is a LOT lighter weight, and while Linux is a bit heavy there,
> it can be made lighter... there's no fundamental reason for it to be
> really expensive.

It does not matter - even with threads, the cost of having thousands of
threads is _too_ high. So, IMO, it is wrong to have to create 20k
threads for a simple web server which only sends one index page to 80k
connections at a rate of 4k connections per second.

Just keep that example in mind - more than 20k blocked out of 80k
connections over gigabit LAN, and that is likely an optimistic result -
when designing a new type of AIO.


-- 
	Evgeniy Polyakov


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22 11:31         ` Evgeniy Polyakov
  2007-02-22 11:52           ` Arjan van de Ven
@ 2007-02-22 12:59           ` Ingo Molnar
  2007-02-22 13:32             ` Evgeniy Polyakov
                               ` (2 more replies)
  2007-02-25 22:44           ` Linus Torvalds
  2 siblings, 3 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-02-22 12:59 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ulrich Drepper, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner


* Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> It is not a TUX anymore - you had 1024 threads, and all of them will 
> be consumed by tcp_sendmsg() for slow clients - rescheduling will kill 
> a machine.

maybe it will, maybe it won't. Let's try? There is no true difference 
between having a 'request structure' that represents the current state 
of the HTTP connection plus a statemachine that moves that request 
between various queues, and a 'kernel stack' that goes in and out of 
runnable state and carries its processing state in its stack - other 
than the amount of RAM they take. (the kernel stack is 4K at a minimum - 
so with a million outstanding requests they would use up 4 GB of RAM. 
With 20k outstanding requests it's 80 MB of RAM - that's acceptable.)

> My tests show that with 4k connections per second (8k concurrency) 
> more than 20k connections of 80k total block in tcp_sendmsg() over 
> gigabit lan between quite fast machines.

yeah. Note that you can have a million sleeping threads if you want, the 
scheduler won't care. What matters more is the amount of true concurrency 
that is present at any given time. But yes, i agree that overscheduling 
can be a problem.

btw., what is the measurement utility you are using with kevents ('ab' 
perhaps, with a high -c concurrency count?), and which webserver are you 
using? (lighttpd?)

	Ingo


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22 12:59           ` Ingo Molnar
@ 2007-02-22 13:32             ` Evgeniy Polyakov
  2007-02-22 19:46               ` Davide Libenzi
  2007-02-23 11:51               ` Ingo Molnar
  2007-02-22 14:17             ` Suparna Bhattacharya
  2007-02-22 21:24             ` Michael K. Edwards
  2 siblings, 2 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-22 13:32 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ulrich Drepper, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

On Thu, Feb 22, 2007 at 01:59:31PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> 
> > It is not a TUX anymore - you had 1024 threads, and all of them will 
> > be consumed by tcp_sendmsg() for slow clients - rescheduling will kill 
> > a machine.
> 
> maybe it will, maybe it won't. Let's try? There is no true difference 
> between having a 'request structure' that represents the current state 
> of the HTTP connection plus a statemachine that moves that request 
> between various queues, and a 'kernel stack' that goes in and out of 
> runnable state and carries its processing state in its stack - other 
> than the amount of RAM they take. (the kernel stack is 4K at a minimum - 
> so with a million outstanding requests they would use up 4 GB of RAM. 
> With 20k outstanding requests it's 80 MB of RAM - that's acceptable.)

I tried already :) - I just made allocations atomic in tcp_sendmsg() and
ended up with 1/4 of the sends blocking (I counted both allocation
failure and socket queue overflow). Those 20k blocked requests were
created in about 20 seconds, so roughly speaking we have 1k thread
creations/frees per second - do we want this?

> > My tests show that with 4k connections per second (8k concurrency) 
> > more than 20k connections of 80k total block in tcp_sendmsg() over 
> > gigabit lan between quite fast machines.
> 
> yeah. Note that you can have a million sleeping threads if you want, the 
> scheduler won't care. What matters more is the amount of true concurrency 
> that is present at any given time. But yes, i agree that overscheduling 
> can be a problem.

Before I started the M:N threading library implementation I checked the
threading performance of the current POSIX library - I created a simple
pool of threads and 'sent' a message between them using futex wait/wake
(sema_post/wait) one-by-one. The results are quite disappointing - given
that the number of sleeping threads was in the hundreds, kernel
rescheduling is about 10 times slower than the setjmp-based switching
(I think) used in Erlang.

The above example is not 100% correct, I understand, but the situation
with thread-like AIO is much worse - it is possible that several threads
will be ready simultaneously, so rescheduling between them will kill
performance.
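
(the test was a ping-pong roughly along these lines - a simplified
sketch using POSIX semaphores, not the exact code:)

/* simplified ping-pong sketch - each thread wakes up the next one */
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

#define NR_THREADS	200

static sem_t sem[NR_THREADS];

static void *worker(void *arg)
{
	long id = (long)arg;

	for (;;) {
		sem_wait(&sem[id]);			/* sleep until woken */
		sem_post(&sem[(id + 1) % NR_THREADS]);	/* wake the next */
	}
	return NULL;
}

int main(void)
{
	pthread_t tid;
	long i;

	for (i = 0; i < NR_THREADS; i++)
		sem_init(&sem[i], 0, 0);
	for (i = 0; i < NR_THREADS; i++)
		pthread_create(&tid, NULL, worker, (void *)i);
	sem_post(&sem[0]);	/* inject the first message */
	pause();
	return 0;
}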

> btw., what is the measurement utility you are using with kevents ('ab' 
> perhaps, with a high -c concurrency count?), and which webserver are you 
> using? (light-httpd?)

Yes, it is ab and lighttpd; before that it was httperf (unfair on high
load due to its poll/select usage) and my own web server
(evserver_kevent.c).


-- 
	Evgeniy Polyakov


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22 12:39             ` Evgeniy Polyakov
@ 2007-02-22 13:41               ` David Miller
  2007-02-22 14:31                 ` Ingo Molnar
  2007-02-22 14:53                 ` Avi Kivity
  0 siblings, 2 replies; 337+ messages in thread
From: David Miller @ 2007-02-22 13:41 UTC (permalink / raw)
  To: johnpol
  Cc: arjan, mingo, drepper, linux-kernel, torvalds, hch, akpm, alan,
	zach.brown, suparna, davidel, jens.axboe, tglx

From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Date: Thu, 22 Feb 2007 15:39:30 +0300

> It does not matter - even with threads, the cost of having thousands of
> threads is _too_ high. So, IMO, it is wrong to have to create 20k
> threads for a simple web server which only sends one index page to 80k
> connections at a rate of 4k connections per second.
> 
> Just keep that example in mind - more than 20k blocked out of 80k
> connections over gigabit LAN, and that is likely an optimistic result -
> when designing a new type of AIO.

I totally agree with Evgeniy on these points.

Using things like syslets and threadlets for networking I/O
is not a very good idea.  Blocking is more the norm than the
exception for networking I/O.



* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22 12:59           ` Ingo Molnar
  2007-02-22 13:32             ` Evgeniy Polyakov
@ 2007-02-22 14:17             ` Suparna Bhattacharya
  2007-02-22 14:36               ` Ingo Molnar
  2007-02-22 21:24             ` Michael K. Edwards
  2 siblings, 1 reply; 337+ messages in thread
From: Suparna Bhattacharya @ 2007-02-22 14:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Evgeniy Polyakov, Ulrich Drepper, linux-kernel, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, David S. Miller, Davide Libenzi, Jens Axboe,
	Thomas Gleixner

On Thu, Feb 22, 2007 at 01:59:31PM +0100, Ingo Molnar wrote:
> 
> * Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> 
> > It is not a TUX anymore - you had 1024 threads, and all of them will 
> > be consumed by tcp_sendmsg() for slow clients - rescheduling will kill 
> > a machine.
> 
> maybe it will, maybe it won't. Let's try? There is no true difference 
> between having a 'request structure' that represents the current state 
> of the HTTP connection plus a statemachine that moves that request 
> between various queues, and a 'kernel stack' that goes in and out of 
> runnable state and carries its processing state in its stack - other 
> than the amount of RAM they take. (the kernel stack is 4K at a minimum - 
> so with a million outstanding requests they would use up 4 GB of RAM. 
> With 20k outstanding requests it's 80 MB of RAM - that's acceptable.)

At what point are the cachemiss threads destroyed? In other words, how well
does this adapt to load variations? For example, would this 80MB of RAM 
continue to be locked down even during periods of lighter loads thereafter?

Regards
Suparna


-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India



* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22 13:41               ` David Miller
@ 2007-02-22 14:31                 ` Ingo Molnar
  2007-02-22 14:47                   ` David Miller
                                     ` (2 more replies)
  2007-02-22 14:53                 ` Avi Kivity
  1 sibling, 3 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-02-22 14:31 UTC (permalink / raw)
  To: David Miller
  Cc: johnpol, arjan, drepper, linux-kernel, torvalds, hch, akpm, alan,
	zach.brown, suparna, davidel, jens.axboe, tglx


* David Miller <davem@davemloft.net> wrote:

> If used for networking one could easily make this new interface create 
> an arbitrary number of threads by just opening up that many 
> connections to such a server and just sitting there not reading 
> anything from any of the client sockets.  And this happens 
> non-maliciously for slow clients, whether that is due to application 
> blockage or the characteristics of the network path.

there are two issues on which i'd like to disagree.

Firstly, i don't think you are fully applying the syslet/threadlet model. 
There is no reason why an 'idle' client would have to use up a full 
thread! It all depends on how you use syslets/threadlets, and how 
(frequently) you queue back requests from cachemiss threads back to the 
primary thread. It is only the simplest queueing model where there is 
one thread per request that is currently blocked. Syslets/threadlets do 
/not/ force request processing to be performed in the async context 
forever - the async thread could very much queue it back to the primary 
context. (That's in essence what Tux did.) So the same state-machine 
techniques can be applied on both the syslet and the threadlet model, 
but in much more natural (and thus lower overhead) points: /between/ 
system calls and not in the middle of them. There are a number of 
measures that can be used to keep the number of parallel threads down.
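
for example, a threadlet can hand the request straight back between two 
system calls (a sketch only - read_request() and queue_to_head() are 
made-up user-space helpers, not an API):

/* process one step, then give the request back to the head context */
long req_threadlet(void *data)
{
	struct req *req = data;

	read_request(req);	/* may block - only then do we go async */
	queue_to_head(req);	/* the head continues the state machine */
	return threadlet_complete();
}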

Secondly, even assuming lots of pending requests/async-threads and a 
naive queueing model, an open request will eat up resources on the 
server no matter what. So if your point is that "+4K of kernel stack 
pinned down per open, blocked request makes syslets and threadlets not a 
very good idea", then i'd like to disagree with that: while it wont be 
zero-cost (4K does cost you 400MB of RAM per 100,000 outstanding 
threads), it's often comparable to the other RAM costs that are already 
attached to an open connection.

(let me know if i misunderstood your point.)

	Ingo


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22 14:17             ` Suparna Bhattacharya
@ 2007-02-22 14:36               ` Ingo Molnar
  2007-02-23 14:23                 ` Suparna Bhattacharya
  0 siblings, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-22 14:36 UTC (permalink / raw)
  To: Suparna Bhattacharya
  Cc: Evgeniy Polyakov, Ulrich Drepper, linux-kernel, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, David S. Miller, Davide Libenzi, Jens Axboe,
	Thomas Gleixner


* Suparna Bhattacharya <suparna@in.ibm.com> wrote:

> > maybe it will, maybe it won't. Let's try? There is no true difference 
> > between having a 'request structure' that represents the current 
> > state of the HTTP connection plus a statemachine that moves that 
> > request between various queues, and a 'kernel stack' that goes in 
> > and out of runnable state and carries its processing state in its 
> > stack - other than the amount of RAM they take. (the kernel stack is 
> > 4K at a minimum - so with a million outstanding requests they would 
> > use up 4 GB of RAM. With 20k outstanding requests it's 80 MB of RAM 
> > - that's acceptable.)
> 
> At what point are the cachemiss threads destroyed? In other words, how 
> well does this adapt to load variations? For example, would this 80MB 
> of RAM continue to be locked down even during periods of lighter loads 
> thereafter?

you can destroy them at will from user-space too - just start a slow 
timer that zaps them if load goes down. I can add a 
sys_async_thread_exit(nr_threads) API to be able to drive this without 
knowing the TIDs of those threads, and/or i can add a kernel-internal 
mechanism to zap inactive threads. It would be rather easy and 
low-overhead - the v2 code already had a max_nr_threads tunable, i can 
reintroduce it. So the size of the pool of contexts does not have to be 
permanent at all.
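
e.g. a user-space reaper could look like this (a sketch only - 
sys_async_thread_exit() is just the API proposed above, and 
load_is_low(), nr_async_threads and min_pool are application-defined):

/* zap surplus async threads whenever load drops */
for (;;) {
	sleep(10);
	if (load_is_low() && nr_async_threads > min_pool)
		sys_async_thread_exit(nr_async_threads - min_pool);
}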

	Ingo


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22 14:31                 ` Ingo Molnar
@ 2007-02-22 14:47                   ` David Miller
  2007-02-22 15:02                     ` Evgeniy Polyakov
                                       ` (3 more replies)
  2007-02-22 14:59                   ` Ingo Molnar
  2007-02-22 21:42                   ` Michael K. Edwards
  2 siblings, 4 replies; 337+ messages in thread
From: David Miller @ 2007-02-22 14:47 UTC (permalink / raw)
  To: mingo
  Cc: johnpol, arjan, drepper, linux-kernel, torvalds, hch, akpm, alan,
	zach.brown, suparna, davidel, jens.axboe, tglx

From: Ingo Molnar <mingo@elte.hu>
Date: Thu, 22 Feb 2007 15:31:45 +0100

> Firstly, i don't think you are fully applying the syslet/threadlet model. 
> There is no reason why an 'idle' client would have to use up a full 
> thread! It all depends on how you use syslets/threadlets, and how 
> (frequently) you queue back requests from cachemiss threads back to the 
> primary thread. It is only the simplest queueing model where there is 
> one thread per request that is currently blocked. Syslets/threadlets do 
> /not/ force request processing to be performed in the async context 
> forever - the async thread could very much queue it back to the primary 
> context. (That's in essence what Tux did.) So the same state-machine 
> techniques can be applied on both the syslet and the threadlet model, 
> but in much more natural (and thus lower overhead) points: /between/ 
> system calls and not in the middle of them. There are a number of 
> measures that can be used to keep the number of parallel threads down.

Ok.

> Secondly, even assuming lots of pending requests/async-threads and a 
> naive queueing model, an open request will eat up resources on the 
> server no matter what. So if your point is that "+4K of kernel stack 
> pinned down per open, blocked request makes syslets and threadlets not a 
> very good idea", then i'd like to disagree with that: while it wont be 
> zero-cost (4K does cost you 400MB of RAM per 100,000 outstanding 
> threads), it's often comparable to the other RAM costs that are already 
> attached to an open connection.

The 400MB is extra, and it's in no way commensurate with the cost
of the TCP socket itself, even including the application-specific
state being used for that connection.

Even if it were _equal_, we would be doubling the memory
requirements for such a scenario.

This is why I dislike the threadlet model, when used in that way.

The pushback to the primary thread you speak of is just extra work in
my mind, for networking.  Better to just begin operations and sit in
the primary thread(s) waiting for events, and when they arrive push
the operations further along using non-blocking writes, reads, and
accept() calls.  There is no blocking context really needed for these
kinds of things, so a mechanism that tries to provide one is a waste.

As a side note, although Evgeniy likes M:N threading model ideas, they
are a minefield wrt. signal semantics.  The Solaris guys took several
years to get it right; just grep through the Solaris kernel patch
readme files over the years to get an idea of how bad it can be.  I
would therefore never advocate such an approach.

The more I think about it, a reasonable solution might actually be to
use threadlets for disk I/O and pure event based processing for
networking.  It is two different handling paths and non-unified,
but that might be the price for good performance :-)


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22 13:41               ` David Miller
  2007-02-22 14:31                 ` Ingo Molnar
@ 2007-02-22 14:53                 ` Avi Kivity
  1 sibling, 0 replies; 337+ messages in thread
From: Avi Kivity @ 2007-02-22 14:53 UTC (permalink / raw)
  To: David Miller
  Cc: johnpol, arjan, mingo, drepper, linux-kernel, torvalds, hch,
	akpm, alan, zach.brown, suparna, davidel, jens.axboe, tglx

David Miller wrote:
> From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
> Date: Thu, 22 Feb 2007 15:39:30 +0300
>
>   
>> It does not matter - even with threads, the cost of having thousands of
>> threads is _too_ expensive. So, IMO, it is wrong to have to create 
>> 20k threads for a simple web server which only sends one index page to
>> 80k connections at a rate of 4k connections per second.
>>
>> Just keep that example in mind - more than 20k blocks out of 80k 
>> connections over gigabit LAN, and that is likely an optimistic result - 
>> when designing a new type of AIO.
>>     
>
> I totally agree with Evgeniy on these points.
>
> Using things like syslets and threadlets for networking I/O
> is not a very good idea.  Blocking is more the norm than the
> exception for networking I/O.
>   

And for O_DIRECT, and for large storage systems which overwhelm caches.  
The optimize-for-the-nonblocking-case approach does not fit all 
workloads.  And of course we have to be able to mix mostly-nonblocking 
threadlets and mostly-blocking O_DIRECT and networking.


-- 
error compiling committee.c: too many arguments to function



* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22 14:31                 ` Ingo Molnar
  2007-02-22 14:47                   ` David Miller
@ 2007-02-22 14:59                   ` Ingo Molnar
  2007-02-22 21:42                   ` Michael K. Edwards
  2 siblings, 0 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-02-22 14:59 UTC (permalink / raw)
  To: David Miller
  Cc: johnpol, arjan, drepper, linux-kernel, torvalds, hch, akpm, alan,
	zach.brown, suparna, davidel, jens.axboe, tglx


* Ingo Molnar <mingo@elte.hu> wrote:

> Firstly, i dont think you are fully applying the syslet/threadlet 
> model. There is no reason why an 'idle' client would have to use up a 
> full thread! It all depends on how you use syslets/threadlets, and how 
> (frequently) you queue requests from cachemiss threads back to the 
> primary thread. Only in the simplest queueing model is there one 
> thread per request that is currently blocked. Syslets/threadlets do 
> /not/ force request processing to be performed in the async context 
> forever - the async thread could very much queue it back to the 
> primary context. (That's in essence what Tux did.) So the same 
> state-machine techniques can be applied on both the syslet and the 
> threadlet model, but at much more natural (and thus lower-overhead) 
> points: /between/ system calls and not in the middle of them. There 
> are a number of measures that can be used to keep the number of 
> parallel threads down.

i think the best model here is to use kevents or epoll to discover 
accept()-able or recv()-able keepalive sockets, and to do the main 
request loop via syslets/threadlets, with a 'queue back to the main 
context if we went async and if the request is done' feedback mechanism 
that keeps the size of the pool down.
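
for illustration, a rough sketch of that hybrid loop - threadlet_exec()
and threadlet_complete() as in the v3 examples; serve_http_request(),
stack_for(), reap_completions(), user_head and reqs_queued are made-up
helpers/state, and threadlet argument passing is glossed over:

#include <sys/epoll.h>

long handle_request(void *data)
{
	/* the whole request loop: may block on VFS, locks, memory
	 * pressure, filesystem etc. - if it does, only this request
	 * goes async */
	serve_http_request(data);	/* made-up application code */

	return threadlet_complete();
}

void head_loop(int epfd)
{
	struct epoll_event evs[64];

	for (;;) {
		int i, n = epoll_wait(epfd, evs, 64, -1);

		for (i = 0; i < n; i++)
			/* executes synchronously while nothing blocks
			 * (the v3 examples put request state on the
			 * new stack passed in here) */
			if (!threadlet_exec(handle_request,
					    stack_for(&evs[i]), &user_head))
				reqs_queued++;

		/* requests that went async queue back to this context
		 * when complete - that keeps the async pool small */
		reap_completions(&user_head);
	}
}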

	Ingo


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22 14:47                   ` David Miller
@ 2007-02-22 15:02                     ` Evgeniy Polyakov
  2007-02-22 15:15                     ` Ingo Molnar
                                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-22 15:02 UTC (permalink / raw)
  To: David Miller
  Cc: mingo, arjan, drepper, linux-kernel, torvalds, hch, akpm, alan,
	zach.brown, suparna, davidel, jens.axboe, tglx

On Thu, Feb 22, 2007 at 06:47:04AM -0800, David Miller (davem@davemloft.net) wrote:
> As a side note, although Evgeniy likes M:N threading model ideas, they
> are a minefield wrt. signal semantics.  The Solaris guys took several
> years to get it right; just grep through the Solaris kernel patch
> readme files over the years to get an idea of how bad it can be.  I
> would therefore never advocate such an approach.

I have fully synchronous kevent signal delivery for that purpose :)
Having all events synchronous allows trivial handling of them -
including signals.

> The more I think about it, a reasonable solution might actually be to
> use threadlets for disk I/O and pure event based processing for
> networking.  It is two different handling paths and non-unified,
> but that might be the price for good performance :-)

Hmm, yes, for such a scenario we need some kind of event delivery
mechanism which would allow waiting on different kinds of events.

In the above sentence I see some painfully familiar letters - 
letter k
letter e
letter v
letter e
letter n
letter t

Or, following the more modern trend - async_wait(epoll).

-- 
	Evgeniy Polyakov


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22 14:47                   ` David Miller
  2007-02-22 15:02                     ` Evgeniy Polyakov
@ 2007-02-22 15:15                     ` Ingo Molnar
  2007-02-22 17:17                       ` David Miller
  2007-02-22 20:13                     ` Davide Libenzi
  2007-02-22 21:30                     ` Zach Brown
  3 siblings, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-22 15:15 UTC (permalink / raw)
  To: David Miller
  Cc: johnpol, arjan, drepper, linux-kernel, torvalds, hch, akpm, alan,
	zach.brown, suparna, davidel, jens.axboe, tglx


* David Miller <davem@davemloft.net> wrote:

> The pushback to the primary thread you speak of is just extra work in 
> my mind, for networking.  Better to just begin operations and sit in 
> the primary thread(s) waiting for events, and when they arrive push 
> the operations further along using non-blocking writes, reads, and 
> accept() calls.  There is no blocking context really needed for these 
> kinds of things, so a mechanism that tries to provide one is a waste.

one question is, what is cheaper: to back out of a read and a write and 
to set up the event notification and then to return to the user context, 
or to just stay right in there with all the context already constructed 
and on the stack, and schedule away and then come back and queue back to 
the primary thread once the condition the thread is waiting for is done? 
The latter isnt all that unattractive in my mind, because it always makes 
forward progress, with minimal 'backout' costs.

furthermore, in a real webserver there's a whole lot of other stuff 
happening too: VFS blocking, mutex/lock blocking, memory-pressure 
blocking, filesystem blocking, etc., etc. Threadlets/syslets cover them 
/all/ and never hold up the primary context: as long as there are more 
requests to process, they will be processed. Plus, other important 
networked workloads like fileservers are typically on fast LANs, and 
those requests are very much a fire-and-forget matter most of the time.

in any case, this definitely needs to be measured.

	Ingo



* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22 15:15                     ` Ingo Molnar
@ 2007-02-22 17:17                       ` David Miller
  2007-02-23 11:12                         ` Ingo Molnar
  0 siblings, 1 reply; 337+ messages in thread
From: David Miller @ 2007-02-22 17:17 UTC (permalink / raw)
  To: mingo
  Cc: johnpol, arjan, drepper, linux-kernel, torvalds, hch, akpm, alan,
	zach.brown, suparna, davidel, jens.axboe, tglx

From: Ingo Molnar <mingo@elte.hu>
Date: Thu, 22 Feb 2007 16:15:09 +0100

> furthermore, in a real webserver there's a whole lot of other stuff 
> happening too: VFS blocking, mutex/lock blocking, memory-pressure 
> blocking, filesystem blocking, etc., etc. Threadlets/syslets cover them 
> /all/ and never hold up the primary context: as long as there are more 
> requests to process, they will be processed. Plus, other important 
> networked workloads like fileservers are typically on fast LANs, and 
> those requests are very much a fire-and-forget matter most of the time.

I expect clients of a fileserver to cause the server to block in
places such as tcp_sendmsg() as much if not more so than a webserver
:-)

But yes, it should all be tested, for sure.


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22  7:40       ` Ingo Molnar
  2007-02-22 11:31         ` Evgeniy Polyakov
@ 2007-02-22 19:38         ` Davide Libenzi
  2007-02-28  9:45           ` Ingo Molnar
  2007-02-22 21:23         ` Zach Brown
  2 siblings, 1 reply; 337+ messages in thread
From: Davide Libenzi @ 2007-02-22 19:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ulrich Drepper, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Thu, 22 Feb 2007, Ingo Molnar wrote:

> 
> * Ulrich Drepper <drepper@redhat.com> wrote:
> 
> > Ingo Molnar wrote:
> > > in terms of AIO, the best queueing model is i think what the kernel uses 
> > > internally: freely ordered, with barrier support.
> > 
> > Speaking of AIO, how do you imagine lio_listio is implemented?  If 
> > there is no asynchronous syscall it would mean creating a threadlet 
> > for each request but this means either waiting or creating 
> > several/many threads.
> 
> my current thinking is that special-purpose (non-programmable, static) 
> APIs like aio_*() and lio_*(), where every last cycle of performance 
> matters, should be implemented using syslets - even if syslets are 
> quite tricky to write (which they no doubt are - just compare the size 
> of syslet-test.c to threadlet-test.c). So i'd move syslets into the same 
> category as raw syscalls: pieces of the raw infrastructure between the 
> kernel and glibc, not an exposed API to apps. [and even if we keep them 
> in that category they still need quite a bit of API work, to clean up 
> the 32/64-bit issues, etc.]

Now that chains of syscalls can be way more easily handled with clets^wthreadlets,
why would we need the whole syslets crud in the kernel?
Why can't aio_* be implemented with *simple* (or parallel/unrelated) 
syscall submission, w/out the burden of a complex, limiting and heavy API 
(I won't list all the points against syslets, because I already did it 
enough times)? The compat layer alone is so bad it's not even funny.
Look at the code. Removing the syslets crud alone would prolly cut 40% of 
it. And we did not even touch the compat code yet.



- Davide



* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22 13:32             ` Evgeniy Polyakov
@ 2007-02-22 19:46               ` Davide Libenzi
  2007-02-23 12:15                 ` Evgeniy Polyakov
  2007-02-23 11:51               ` Ingo Molnar
  1 sibling, 1 reply; 337+ messages in thread
From: Davide Libenzi @ 2007-02-22 19:46 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ingo Molnar, Ulrich Drepper, Linux Kernel Mailing List,
	Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Thu, 22 Feb 2007, Evgeniy Polyakov wrote:

> > maybe it will, maybe it wont. Lets try? There is no true difference 
> > between having a 'request structure' that represents the current state 
> > of the HTTP connection plus a statemachine that moves that request 
> > between various queues, and a 'kernel stack' that goes in and out of 
> > runnable state and carries its processing state in its stack - other 
> > than the amount of RAM they take. (the kernel stack is 4K at a minimum - 
> > so with a million outstanding requests they would use up 4 GB of RAM. 
> > With 20k outstanding requests it's 80 MB of RAM - that's acceptable.)
> 
> I tried already :) - I just made allocations atomic in tcp_sendmsg() and
> ended up with 1/4 of the sends blocking (I counted both allocation
> failure and socket queue overflow). Those 20k blocked requests were
> created in about 20 seconds, so roughly speaking we have 1k of thread
> creation/freeing per second - do we want this?

A dynamic pool will smooth thread creation/freeing up by a lot.
And, on my box a *pthread* create/free takes ~10us; at 1000/s that's 10ms, 
i.e. 1%. Bad, but not so awful ;)
Look, I'm *definitely* not trying to advocate the use of async syscalls for 
network here, just pointing out that when we're talking about threads, 
Linux does a pretty good job.
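
Just to put numbers behind that claim, a quick-and-dirty microbenchmark
sketch (the iteration count is arbitrary; build with -lpthread -lrt):

#include <pthread.h>
#include <stdio.h>
#include <time.h>

static void *noop(void *arg) { return arg; }

int main(void)
{
	enum { N = 10000 };
	struct timespec t0, t1;
	pthread_t t;
	int i;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < N; i++) {
		/* one full create+join round trip per iteration */
		pthread_create(&t, NULL, noop, NULL);
		pthread_join(t, NULL);
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("%.1f us per create+join\n",
	       ((t1.tv_sec - t0.tv_sec) * 1e9 +
	        (t1.tv_nsec - t0.tv_nsec)) / 1e3 / N);
	return 0;
}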




- Davide




* x86 hardware and transputers (Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
  2007-02-22  8:59         ` Michael K. Edwards
@ 2007-02-22 19:52           ` Oleg Verych
  2007-02-22 20:45             ` Michael K. Edwards
  0 siblings, 1 reply; 337+ messages in thread
From: Oleg Verych @ 2007-02-22 19:52 UTC (permalink / raw)
  To: Michael K. Edwards; +Cc: linux-kernel

> From: "Michael K. Edwards"
> Newsgroups: gmane.linux.kernel
> Subject: Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
> Date: Thu, 22 Feb 2007 00:59:10 -0800

[stripping cc list]

[]
> On 2/21/07, Ingo Molnar <mingo@elte.hu> wrote:
>> > [...] As for threadlets, making them kernel threads is not such a good
>> > design feature, O(1) scheduler or not.  You take the full hit of
>> > kernel task creation, on the spot, for every threadlet that blocks.
>> > [...]
>>
>> this is very much not how they work. Threadlets share the same basic
>> infrastructure with syslets and they do /not/ take a 'full hit of kernel
>> thread creation' when they block. Please read the announcements, past
>> discussions on lkml and the code about how it works.
[]
> Yes, I will go back and read the code for myself.  This will take me
> some time because I have only a hand-waving level of knowledge about
> task_structs and pt_regs, and have largely avoided the dark corners of
> the x86 architecture.

This architecture was brought to us by windows on our screens. And it
took years (a decade?) for them to use all hardware features:

     	      (IA-32, i386) --> (MS Windows NT,9X)

Yet you must still do much system programming to use those features.

While

> But I think my point still stands: allowing code inside threadlets to
> use the usual C library wrappers around the usual synchronous syscalls
> is going to mean that the threadlet context is fairly heavyweight, both
> in memory and CPU/MMU state.  This means that you can't pull it cheaply
> over to whatever CPU core happened to process the device I/O that
> delivered the data it wanted.
[]
> Oh, and while I haven't written a kernel or an RDBMS, I have written
> some fairly serious non-blocking I/O code (without resorting to
> threads; one socket and thousands of independent userspace state
> machines) and rewritten the plumbing of two fairly heavy-duty network
> operations frameworks, one of them attached to a horrifically complex
> GUI.  And briefly, long ago, I made Transputers dance with Occam and
> galaxies spin with PVM.

transputers were (AFAIK) completely orthogonal to any of today's x86 CPU
architectures -- hardware parallelism, a special programming language and
techniques to match this hardware. And they were chosen to work on Mars in
the mid-90s, while the crowd wanted more stupid windows on cheap CPUs.

> This gives me exactly zero credentials here, of course, but may suggest
> to you that when I say something that seems naive, it's more likely
> that I got the Linux-specific lingo wrong. That, or I'm _actively_
> stupid.  :-)

Thus, i think, you are thinking mostly on the hardware level, while it's
a longstanding software problem, i.e. how to use x86 (:.

Regards.
--
-o--=O`C
 #oo'L O
<___=E M


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22 14:47                   ` David Miller
  2007-02-22 15:02                     ` Evgeniy Polyakov
  2007-02-22 15:15                     ` Ingo Molnar
@ 2007-02-22 20:13                     ` Davide Libenzi
  2007-02-22 21:30                     ` Zach Brown
  3 siblings, 0 replies; 337+ messages in thread
From: Davide Libenzi @ 2007-02-22 20:13 UTC (permalink / raw)
  To: David Miller
  Cc: Ingo Molnar, johnpol, Arjan Van de Ven, Ulrich Drepper,
	Linux Kernel Mailing List, Linus Torvalds, hch, akpm, Alan Cox,
	zach.brown, suparna, jens.axboe, tglx

On Thu, 22 Feb 2007, David Miller wrote:

> The more I think about it, a reasonable solution might actually be to
> use threadlets for disk I/O and pure event based processing for
> networking.  It is two different handling paths and non-unified,
> but that might be the price for good performance :-)

Well, it takes 20 lines of userspace C code to bring *unification* to the 
universe ;)
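
Something along these lines - threadlet API as in the v3 examples; the
pipe-based completion channel, struct req and do_disk_io() are all made
up for illustration:

#include <unistd.h>

struct req { int err; /* ... application state ... */ };

static int comp_pipe[2];	/* threadlet completions -> event loop */

long disk_io_threadlet(void *data)
{
	struct req *r = data;

	r->err = do_disk_io(r);		/* may block: threadlet goes async */

	/* completion arrives over the same fd interface the network
	 * sockets already use */
	write(comp_pipe[1], &r, sizeof(r));
	return threadlet_complete();
}

/* the head context creates the channel with pipe(comp_pipe) and adds
 * comp_pipe[0] to its epoll set, next to the sockets - one unified
 * event loop for disk and network. */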


- Davide




* Re: x86 hardware and transputers (Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
  2007-02-22 19:52           ` x86 hardware and transputers (Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3) Oleg Verych
@ 2007-02-22 20:45             ` Michael K. Edwards
  0 siblings, 0 replies; 337+ messages in thread
From: Michael K. Edwards @ 2007-02-22 20:45 UTC (permalink / raw)
  To: Oleg Verych; +Cc: linux-kernel

On 2/22/07, Oleg Verych <olecom@flower.upol.cz> wrote:
> > Yes, I will go back and read the code for myself.  This will take me
> > some time because I have only a hand-waving level of knowledge about
> > task_structs and pt_regs, and have largely avoided the dark corners of
> > the x86 architecture.
>
> This architecture was brought to us by windows on our screens. And it
> took years (a decade?) for them to use all hardware features:
>
>               (IA-32, i386) --> (MS Windows NT,9X)
>
> Yet you must still do much system programming to use those features.

Actually, this architecture was brought to us largely by WordPerfect,
VisiCalc, and IEEE 754.  Nobody really cares what bloody operating
system the thing is running; they cared then, and care now, about
being able to write reports and memos that print cleanly and to build
spreadsheets that calculate correctly.  Both of these things are made
much more practical by predictable floating point semantics, which
meant at first that you had to write your own floating point library.
The first (and for a long time the only) piece of hardware to put
_usable_ hardware floating point within reach of the desktop was the
Intel 80387.  Usable, not because it was more accurate than the soft
version (it wasn't, actually quite the reverse), but because it got
the exception semantics right.

The '387 is what made the PC architecture the _only_ choice for the
corporate desktop, pretty much immediately upon its release in 1987.
My first corporate job was porting an electric utility's in-house
revenue requirements application -- written in Fortran with assembly
display routines -- from a Prime mini to the PC.  I still have the
nice leather coffee coaster with the Prime logo on my desk.  The rest
of Prime is dead and forgotten, largely because of the '387.

> transputers were (AFAIK) completely orthogonal to any of today's x86 CPU
> architectures -- hardware parallelism, a special programming language and
> techniques to match this hardware. And they were chosen to work on Mars
> in the mid-90s, while the crowd wanted more stupid windows on cheap CPUs.

Y'know, what goes around comes around.  The HyperTransport CPU
interconnect from AMD that all the overclocker types are so excited
about is just the Transputer ports warmed over, with a modern cache
coherency protocol stacked on top.  Worked on one of those too, on the
Intel HyperCube -- it's not my fault that the i860 (predecessor of the
Itanic, in a way) lost its marbles so comprehensively on IRQ that you
couldn't do anything I/O intensive with it.

My first government job (at NASA) started with a crash course in Occam
and explicit parallelism, but it was so blindingly obvious that this
had no future outside its little niche that I looked around for other
stuff to do.  The adjacent console belonged to a Symbolics LISP
machine -- also clearly a box with no future, since the only
applications for it that still mattered (expert systems) were in the
last stages of being ported to a C-based expert system engine
developed down the hall (which is now open source, and which I have
had occasion to use for many purposes since).  I was a Mac weenie at
the time, so I polished my C skills working on the Mac port of CLIPS
and writing a genetic algorithm engine.  Had I stuck to the
Transputer, I would probably know a lot more about faking NUMA using a
cache coherency protocol than I do today.

> Thus, i think, you are thinking mostly on the hardware level, while it's
> a longstanding software problem, i.e. how to use x86 (:.

I don't think much about hardware or software.  I think about shipping
products in volume at positive gross margin, even when what's coming
out of my fingertips is source code and shell commands.  That's why
I've worked mostly on widgets with ARMs inside in recent years.  But
I'm kinda bored with that, and Niagara or Octeon may wind up cheap in
volume if somebody with extra fab capacity scoops up the wreckage of
Sun or Cavium, so I'm here harassing people more competent than I
about what it takes to make them programmable by mere mortals.

Cheers,
- Michael


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22  7:40       ` Ingo Molnar
  2007-02-22 11:31         ` Evgeniy Polyakov
  2007-02-22 19:38         ` Davide Libenzi
@ 2007-02-22 21:23         ` Zach Brown
  2007-02-22 21:32           ` Benjamin LaHaise
  2 siblings, 1 reply; 337+ messages in thread
From: Zach Brown @ 2007-02-22 21:23 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ulrich Drepper, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Evgeniy Polyakov,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

> Plus there's the fundamental killer that KAIO is a /lot/ harder to
> implement (and to maintain) on the kernel side: it has to be
> implemented for every IO discipline, and even for the IO disciplines
> it supports at the moment, it is not truly asynchronous for things
> like metadata blocking or VFS blocking. To handle things like
> metadata blocking it has to resort to non-statemachine techniques
> like retries - which are bad for performance.

Yes, yes, yes.

As one of the poor suckers who has been fixing bugs in fs/aio.c and  
fs/direct-io.c, I really want everyone to read Ingo's paragraph a few  
times.  Have it printed on a t-shirt.

Look at the number of man-years that have gone into fs/aio.c and
fs/direct-io.c.  After all that effort it *barely* supports
non-blocking O_DIRECT IO.

The maintenance overhead of those two files, above all else, is what  
pushed me to finally try that nutty fibril attempt.

> Syslets/threadlets on the other hand, once the core is implemented,
> have near zero ongoing maintenance cost (compared to KAIO pushed
> into every IO subsystem) and cover all IO disciplines and API
> variants immediately, and they are as perfectly asynchronous as it
> gets.

Amen.

As an experiment, I'm working on backing the sys_io_*() calls with  
syslets.  It's looking very promising so far.

> So all in all, i used to think that AIO state-machines have a
> long-term place within the kernel, but with syslets i think i've
> proven myself embarrassingly wrong =B-)

Welcome to the party :).

- z


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22 12:59           ` Ingo Molnar
  2007-02-22 13:32             ` Evgeniy Polyakov
  2007-02-22 14:17             ` Suparna Bhattacharya
@ 2007-02-22 21:24             ` Michael K. Edwards
  2007-02-23  0:30               ` Alan
  2007-02-23 12:17               ` Ingo Molnar
  2 siblings, 2 replies; 337+ messages in thread
From: Michael K. Edwards @ 2007-02-22 21:24 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Evgeniy Polyakov, Ulrich Drepper, linux-kernel, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

On 2/22/07, Ingo Molnar <mingo@elte.hu> wrote:
> > It is not a TUX anymore - you had 1024 threads, and all of them will
> > be consumed by tcp_sendmsg() for slow clients - rescheduling will kill
> > a machine.
>
> maybe it will, maybe it wont. Lets try? There is no true difference
> between having a 'request structure' that represents the current state
> of the HTTP connection plus a statemachine that moves that request
> between various queues, and a 'kernel stack' that goes in and out of
> runnable state and carries its processing state in its stack - other
> than the amount of RAM they take. (the kernel stack is 4K at a minimum -
> so with a million outstanding requests they would use up 4 GB of RAM.
> With 20k outstanding requests it's 80 MB of RAM - that's acceptable.)

This is a fundamental misconception.  The state machine doesn't have
to do anything but chase pointers through cache.  Done right, it
hardly even branches (although the branch misprediction penalty is a
lot less of a worry on current x86_64 than it was in the
mega-superscalar-out-of-order-speculative-execution days).  It's damn
near free -- but it's a pain in the butt to code, and it has to be
done either in-kernel or in per-CPU OS-atop-the-OS dispatch threads.

The scheduler, on the other hand, has to blow and reload all of the
hidden state associated with force-loading the PC and wherever your
architecture keeps its TLS (maybe not the whole TLB, but not nothing,
either).  The only way around this that I can think of is to make
threadlets promise that they will not touch anything thread-local, and
that when the FPU is handed to them in a specific, known state, they
leave it in that same state.  (Some of the flags can be
unspecified-but-don't-touch-me.)  Then you can schedule threadlets in
bursts with negligible transition cost from one to the next.

There is, however, a substantial setup cost for a burst, because you
have to put the FPU in that known state and lock out TLS access (this
is user code, after all).  If the wrong process is in foreground, you
also need to switch process context at the start of a burst; no
fandangos on other processes' core, please, and to be remotely useful
the threadlets need access to process-global data structures and
synchronization primitives anyway.  That's why you need for threadlets
to have a separate SCHED_THREADLET priority and at least a weak
ordering by PID.  At which point you are outside the feature set of
the O(1) scheduler as I understand it, and you might as well schedule
them from the next tasklet following the softirq dispatcher.

> > My tests show that with 4k connections per second (8k concurrency)
> > more than 20k connections of 80k total block in tcp_sendmsg() over
> > gigabit lan between quite fast machines.
>
> yeah. Note that you can have a million sleeping threads if you want, the
> scheduler wont care. What matters more is the amount of true concurrency
> that is present at any given time. But yes, i agree that overscheduling
> can be a problem.

What matters is that a burst of I/O responses be scheduled efficiently
without taking down the rest of the box.  That, and the ability to
cancel no-longer-interesting I/O requests in bulk, without leaking
memory and synchronization primitives all over the place.  If you
don't have that, this scheme is UNUSABLE for network I/O.

> btw., what is the measurement utility you are using with kevents ('ab'
> perhaps, with a high -c concurrency count?), and which webserver are you
> using? (light-httpd?)

Do me a favor.  Do some floating point math and a memcpy() in between
syscalls in the threadlet.  Actually fiddle with errno and the FPU
rounding flags.  Watch it slow to a crawl and/or break floating point
arithmetic horribly.  Understand why no one with half a brain uses
Java, or any other language which cuts FP corners for the sake of
cheap threads, for calculations that have to be correct.  (Note that
Kahan received the Turing award for contributions to IEEE 754.  If his
polemic is too thick, read
http://www-128.ibm.com/developerworks/java/library/j-jtp0114/.)

Cheers,
- Michael


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22 14:47                   ` David Miller
                                       ` (2 preceding siblings ...)
  2007-02-22 20:13                     ` Davide Libenzi
@ 2007-02-22 21:30                     ` Zach Brown
  3 siblings, 0 replies; 337+ messages in thread
From: Zach Brown @ 2007-02-22 21:30 UTC (permalink / raw)
  To: David Miller
  Cc: mingo, johnpol, arjan, drepper, linux-kernel, torvalds, hch,
	akpm, alan, suparna, davidel, jens.axboe, tglx

> The more I think about it, a reasonable solution might actually be to
> use threadlets for disk I/O and pure event based processing for
> networking.  It is two different handling paths and non-unified,
> but that might be the price for good performance :-)

I generally agree, with some comments.

If we come to the decision that there are some message rates that are  
better suited to delivery into a user-read ring (10gige rx to kevent,  
say) then it doesn't seem like it would be much of a stretch to add a  
facility where syslet completion could be funneled into that channel  
as well.

I also wonder if there isn't some opportunity to cut down the number
of syscalls per op in networking land.  Is it madness to think of a
call like recvmsgv() which could provide a vector of msghdrs?  It
might not make sense, but it might cut down on the per-op overhead
for loads that know they're going to be heavy enough to get a decent
amount of batching without fatally harming latency.  Maybe those
loads are rare...
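
Purely as illustration, one shape such a hypothetical call could take -
no such syscall exists, and every name below is made up:

#include <sys/socket.h>

struct mmsghdr {
	struct msghdr	msg_hdr;	/* as for recvmsg() */
	unsigned int	msg_len;	/* bytes received, filled in */
};

/* receive up to vlen messages in a single kernel entry; would return
 * the number of messages received, or -1 on error */
int recvmsgv(int sockfd, struct mmsghdr *msgvec,
	     unsigned int vlen, int flags);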

- z 


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22 21:23         ` Zach Brown
@ 2007-02-22 21:32           ` Benjamin LaHaise
  2007-02-22 21:44             ` Zach Brown
  0 siblings, 1 reply; 337+ messages in thread
From: Benjamin LaHaise @ 2007-02-22 21:32 UTC (permalink / raw)
  To: Zach Brown
  Cc: Ingo Molnar, Ulrich Drepper, linux-kernel, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Evgeniy Polyakov, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

On Thu, Feb 22, 2007 at 01:23:57PM -0800, Zach Brown wrote:
> As one of the poor suckers who has been fixing bugs in fs/aio.c and  
> fs/direct-io.c, I really want everyone to read Ingo's paragraph a few  
> times.  Have it printed on a t-shirt.

direct-io.c is evil.  Ridiculously.

> Amen.
> 
> As an experiment, I'm working on backing the sys_io_*() calls with  
> syslets.  It's looking very promising so far.

Great, I'd love to see the comparisons.

> > So all in all, i used to think that AIO state-machines have a
> > long-term place within the kernel, but with syslets i think i've
> > proven myself embarrassingly wrong =B-)
> 
> Welcome to the party :).

Well, there are always the 2.4 patches, which are properly state driven and 
reasonably simple.  Retry was born out of a need to come up with a mechanism 
that had less impact on the core kernel code, and yes, it seems to be a 
failure and in dire need of replacement.

One other implementation to consider is actually using kernel threads, 
to compare against how syslets perform.  Direct IO, for one, always 
blocks, so there shouldn't be much of a performance difference compared 
to syslets, with the bonus that no arch-specific code is needed.

		-ben
-- 
"Time is of no importance, Mr. President, only life is important."
Don't Email: <zyntrop@kvack.org>.


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22 14:31                 ` Ingo Molnar
  2007-02-22 14:47                   ` David Miller
  2007-02-22 14:59                   ` Ingo Molnar
@ 2007-02-22 21:42                   ` Michael K. Edwards
  2 siblings, 0 replies; 337+ messages in thread
From: Michael K. Edwards @ 2007-02-22 21:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: David Miller, johnpol, arjan, drepper, linux-kernel, torvalds,
	hch, akpm, alan, zach.brown, suparna, davidel, jens.axboe, tglx

On 2/22/07, Ingo Molnar <mingo@elte.hu> wrote:
> Secondly, even assuming lots of pending requests/async-threads and a
> naive queueing model, an open request will eat up resources on the
> server no matter what.

Another fundamental misconception.  Kernel AIO is not for servers.
One programmer in a hundred is working on a server codebase, and one
in a thousand dares to touch server plumbing.  Kernel AIO is for
clients, especially when mated to GUIs with an event delivery
mechanism.  Ask yourself why the one and only thing that Windows NT
has ever gotten right about networking is I/O completion ports.

Cheers,
- Michael


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22 21:32           ` Benjamin LaHaise
@ 2007-02-22 21:44             ` Zach Brown
  0 siblings, 0 replies; 337+ messages in thread
From: Zach Brown @ 2007-02-22 21:44 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Ingo Molnar, Ulrich Drepper, linux-kernel, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Evgeniy Polyakov, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

> direct-io.c is evil.  Ridiculously.

You will have a hard time finding someone to defend it, I predict :).

There is good news on that front, too.  Chris (Mason) is making  
progress on getting rid of the worst of the Magical Locking that  
makes buffered races with O_DIRECT ops so awful.

I'm not holding my breath for a page cache so fine-grained that it
could pin and reference 512B-granular user regions and build bios
from them, though that sure would be nice :).

>> As an experiment, I'm working on backing the sys_io_*() calls with
>> syslets.  It's looking very promising so far.
>
> Great, I'd love to see the comparisons.

I'm out for data.  If it sucks, well, we'll know just how much.  I'm  
pretty hopeful that it won't :).

> One other implementation to consider is actually using kernel threads
> compared to how syslets perform.  Direct IO for one always blocks, so
> there shouldn't be much of a performance difference compared to  
> syslets,
> with the bonus that no arch specific code is needed.

Yeah, I'm starting with raw kernel threads so we can get some numbers  
before moving to syslets.

One of the benefits syslets bring is the ability to start processing
the next pending request the moment the current request's processing
blocks.  In the concurrent O_DIRECT write case that avoids releasing
a ton of kernel threads which all just run to serialize on i_mutex
(potentially bouncing it around cache domains) as the O_DIRECT ops
are built and sent.

-z 


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22 21:24             ` Michael K. Edwards
@ 2007-02-23  0:30               ` Alan
  2007-02-23  2:47                 ` Michael K. Edwards
  2007-02-23 12:17               ` Ingo Molnar
  1 sibling, 1 reply; 337+ messages in thread
From: Alan @ 2007-02-23  0:30 UTC (permalink / raw)
  To: Michael K. Edwards
  Cc: Ingo Molnar, Evgeniy Polyakov, Ulrich Drepper, linux-kernel,
	Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

> to do anything but chase pointers through cache.  Done right, it
> hardly even branches (although the branch misprediction penalty is a
> lot less of a worry on current x86_64 than it was in the
> mega-superscalar-out-of-order-speculative-execution days).  It's damn

Actually it costs a lot more on at least one vendor's processor because
you stall very long pipelines.

> threadlets promise that they will not touch anything thread-local, and
> that when the FPU is handed to them in a specific, known state, they
> leave it in that same state.  (Some of the flags can be

We don't use the FPU in the kernel except in very weird cases where it
makes an enormous performance difference. The threadlets also have the
same page tables, so they have the same %cr3, so it's very cheap to
switch: basically a predicted jump and some register loads.

> Do me a favor.  Do some floating point math and a memcpy() in between
> syscalls in the threadlet.  Actually fiddle with errno and the FPU

We don't have an errno in the kernel because it's a stupid idea. Errno is
a user-space hack for compatibility with 1970s bad design. So it's not
relevant either.

Alan


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-23  0:30               ` Alan
@ 2007-02-23  2:47                 ` Michael K. Edwards
  2007-02-23  8:31                   ` Michael K. Edwards
                                     ` (2 more replies)
  0 siblings, 3 replies; 337+ messages in thread
From: Michael K. Edwards @ 2007-02-23  2:47 UTC (permalink / raw)
  To: Alan
  Cc: Ingo Molnar, Evgeniy Polyakov, Ulrich Drepper, linux-kernel,
	Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

On 2/22/07, Alan <alan@lxorguk.ukuu.org.uk> wrote:
> > to do anything but chase pointers through cache.  Done right, it
> > hardly even branches (although the branch misprediction penalty is a
> > lot less of a worry on current x86_64 than it was in the
> > mega-superscalar-out-of-order-speculative-execution days).  It's damn
>
> Actually it costs a lot more on at least one vendor's processor because
> you stall very long pipelines.

You're right; I overreached there.  I haven't measured branch
misprediction penalties in dog's years (I focus more on system latency
issues these days), so I'm just going on rumor.  If your CPU vendor is
still playing the tune-for-SpecINT-at-the-expense-of-real-code game
(*cough* Itanic *cough*), get another CPU vendor -- while you still
can.

> > threadlets promise that they will not touch anything thread-local, and
> > that when the FPU is handed to them in a specific, known state, they
> > leave it in that same state.  (Some of the flags can be
>
> We don't use the FPU in the kernel except in very weird cases where it
> makes an enormous performance difference. The threadlets also have the
> same page tables, so they have the same %cr3, so it's very cheap to
> switch: basically a predicted jump and some register loads.

Do you not understand that real user code touches FPU state at
unpredictable (to the kernel) junctures?  Maybe not in a database or a
web server, but in the GUIs and web-based monitoring applications that
are 99% of the potential customers for kernel AIO?  I have no idea
what a %cr3 is, but if you don't fence off thread-local stuff from the
threadlets you are just begging for end-user Heisenbugs and a place in
the dustheap of history next to Symbolics LISP.

> > Do me a favor.  Do some floating point math and a memcpy() in between
> > syscalls in the threadlet.  Actually fiddle with errno and the FPU
>
> We don't have an errno in the kernel because it's a stupid idea. Errno is
> a user-space hack for compatibility with 1970s bad design. So it's not
> relevant either.

Dude, it's thread-local, and the glibc wrapper around most synchronous
syscalls touches it.  If you don't instantiate a new TLS context (or
whatever the right lingo for that is) for every threadlet, you are
TOAST -- if you let the user call stuff out of <stdlib.h> (let alone
<stdio.h>) from within the threadlet.

Cheers,
- Michael


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-23  2:47                 ` Michael K. Edwards
@ 2007-02-23  8:31                   ` Michael K. Edwards
  2007-02-23 10:22                   ` Ingo Molnar
  2007-02-23 12:37                   ` Alan
  2 siblings, 0 replies; 337+ messages in thread
From: Michael K. Edwards @ 2007-02-23  8:31 UTC (permalink / raw)
  To: Alan
  Cc: Ingo Molnar, Evgeniy Polyakov, Ulrich Drepper, linux-kernel,
	Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

OK, having skimmed through Ingo's code once now, I can already see I
have some crow to eat.  But I still have some marginally less stupid
questions.

Cachemiss threads are created with CLONE_VM | CLONE_FS | CLONE_FILES |
CLONE_SIGHAND | CLONE_THREAD | CLONE_SYSVSEM.  Does that mean they
share thread-local storage with the userspace thread, have
thread-local storage of their own, or have no thread-local storage
until NPTL asks for it?

When the kernel zeroes the userspace stack pointer in
cachemiss_thread(), presumably the allocation of a new userspace stack
page is postponed until that thread needs to resume userspace
execution (after completion of the first I/O that missed cache).  When
do you copy the contents of the threadlet function's stack frame into
this new stack page?

Is there anything in a struct pt_regs that is expensive to restore
(perhaps because it flushes a pipeline or cache that wasn't already
flushed on syscall entry)?  Is there any reason why the FPU context
has to differ among threadlets that have blocked while executing the
same userspace function with different stacks?  If the TLS pointer
isn't in either of these, where is it, and why doesn't
move_user_context() swap it?

If you set out to cancel one of these threadlets, how are you going to
ensure that it isn't holding any locks?  Is there any reasonable way
to implement a userland finally { } block so that you can release
malloc'd memory and clean up application data structures?

If you want to migrate a threadlet to another CPU on syscall entry
and/or exit, what has to travel other than the userspace stack and the
struct pt_regs?  (I am assuming a quiesced FPU and thread(s) at the
destination with compatible FPU flags.)  Does it make sense for the
userspace stack page to have space reserved for a struct pt_regs
before the threadlet stack frame, so that the entire userspace
threadlet state migrates as one page?

I now see that an effort is already made to schedule threadlets in
bursts, grouped by PID, when several have unblocked since the last
timeslice.  What is the transition cost from one threadlet to another?
Can that transition cost be made lower by reducing the amount of
state that belongs to the individual threadlet vs. the pool of
cachemiss threads associated with that threadlet entrypoint?

Generally, is there a "contract" that could be made between the
threadlet application programmer and the implementation which would
allow, perhaps in future hardware, the kind of invisible pipelined
coprocessing for AIO that has been so successful for FP?

I apologize for having adopted a hostile tone in a couple of previous
messages in this thread; remind me in the future not to alternate
between thinking about code and about the FSF.  :-)  I do really like
a lot of things about the threadlet model, and would rather not see it
given up on for network I/O and NUMA systems.  So I'm going to
reiterate again -- more politely this time -- the need for a
data-structure-centric threadlet pool abstraction that supports
request throttling, reprioritization, bulk cancellation, and migration
of individual threadlets to the node nearest the relevant I/O port.

I'm still not sold on syslets as anything userspace-visible, but I
could imagine them enabling a sort of functional syntax for chaining
I/O operations, with most failures handled as inline "Not-a-Pointer"
values or as "AEIOU" (asynchronously executed I/O unit?) exceptions
instead of syscall-test-branch-syscall-test-branch.  Actually working
out the semantics and getting them adopted as an IEEE standard could
even win someone a Turing award.  :-)

Cheers,
- Michael


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-23  2:47                 ` Michael K. Edwards
  2007-02-23  8:31                   ` Michael K. Edwards
@ 2007-02-23 10:22                   ` Ingo Molnar
  2007-02-23 12:37                   ` Alan
  2 siblings, 0 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-02-23 10:22 UTC (permalink / raw)
  To: Michael K. Edwards
  Cc: Alan, Evgeniy Polyakov, Ulrich Drepper, linux-kernel,
	Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner


* Michael K. Edwards <medwards.linux@gmail.com> wrote:

> On 2/22/07, Alan <alan@lxorguk.ukuu.org.uk> wrote:

> > We don't use the FPU in the kernel except in very weird cases where 
> > it makes an enormous performance difference. The threadlets also 
> > have the same page tables, so they have the same %cr3, so it's very 
> > cheap to switch: basically a predicted jump and some register loads.
> 
> Do you not understand that real user code touches FPU state at 
> unpredictable (to the kernel) junctures?  Maybe not in a database or a 
> web server, but in the GUIs and web-based monitoring applications that 
> are 99% of the potential customers for kernel AIO?
> I have no idea what a %cr3 is, [...]

then please stop wasting Alan's time ...

	Ingo


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22 17:17                       ` David Miller
@ 2007-02-23 11:12                         ` Ingo Molnar
  0 siblings, 0 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-02-23 11:12 UTC (permalink / raw)
  To: David Miller
  Cc: johnpol, arjan, drepper, linux-kernel, torvalds, hch, akpm, alan,
	zach.brown, suparna, davidel, jens.axboe, tglx


* David Miller <davem@davemloft.net> wrote:

> > furthermore, in a real webserver there's a whole lot of other stuff 
> > happening too: VFS blocking, mutex/lock blocking, memory-pressure 
> > blocking, filesystem blocking, etc., etc. Threadlets/syslets cover 
> > them /all/ and never hold up the primary context: as long as there 
> > are more requests to process, they will be processed. Plus, other 
> > important networked workloads like fileservers are typically on 
> > fast LANs, and those requests are very much a fire-and-forget matter 
> > most of the time.
> 
> I expect clients of a fileserver to cause the server to block in 
> places such as tcp_sendmsg() as much if not more so than a webserver
> :-)

yeah, true. But ... i'd still like to mildly disagree with the 
characterisation that, because blocking is the norm in networking, this 
goes against the concept of syslets/threadlets.

Networking has a 'work cache' too, in an abstract sense, which works in 
favor of syslets: the socket buffers. If there is a reasonably sized 
output queue for sockets (not extremely small like 4k per socket, but 
something of at least 16k), then user-space can chunk up its workload 
along pretty reasonable lines without assuming too much, and turn one 
such 'chunk' into one atomic step done in the 'syslet/threadlet 
request'. In the rare cases where blocking happens in an unexpected way, 
the syslet/threadlet 'goes async', but that only holds up that 
particular request, and only for that chunk - not the main loop of 
processing - and doesnt force the request into an async thread forever.
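
for illustration, one such 'chunk' could look like this - threadlet API 
as in the v3 examples; the 16k chunk size, struct conn and the 
send_chunk() helper are made up:

#include <stddef.h>

#define CHUNK_SIZE (16*1024)	/* sized to the socket output queue */

struct conn { int fd; char *buf; size_t done; };

long send_one_chunk(void *data)
{
	struct conn *c = data;

	/* one atomic step: in the common case this fits into the
	 * output queue and completes without blocking; if not, only
	 * this request 'goes async', and only for this chunk */
	send_chunk(c->fd, c->buf + c->done, CHUNK_SIZE);	/* made-up helper around send() */
	c->done += CHUNK_SIZE;

	return threadlet_complete();
}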

the kevent model is very much about: /never ever/ block the main 
thread. If you ever block, performance goes down the drain.

the syslet performance model is: if you block less than say 10-20% of 
the time, you are basically as fast as the most extreme kevents-based 
model. Syslets/threadlets also avoid, in a natural way, the fundamental 
workflow problem that nonblocking designs have: 'how do we guarantee 
that the system progresses forward' - a problem which makes nonblocking 
code quite fragile.

another property is that the performance curve is a lot less sensitive 
to blocking in the syslet model - and real user-space servers will 
always have unexpected points of blockage - unless all user-space code 
is perfectly converted into state machines, which is not reasonable. So 
with syslets we are not forced to program everything as state-machines 
in user-space; such techniques are only needed to reduce the amount of 
context-switching and to reduce the RAM footprint - they wont change 
fundamental scalability.

plus there's the hidden advantage of having a 'constructed state' on the 
kernel stack: a thread that blocks in the middle of tcp_sendmsg() has 
quite some state: the socket has been looked up in the hash(es), all 
input parameters have been validated, the timer has been set, skbs have 
been allocated ahead, etc. Those things do add up, especially if you 
come back after a long wait and all that state is scattered around the 
memory map cache-cold - not nicely composed into a single block of 
memory on the stack (generating only a single cachemiss, in essence).

All in all, i'm cautiously optimistic that even a totally naive, 
blocks-itself-for-every-request syslet application would perform pretty 
close to a Tux/kevents type of nonblock+event-queueing based application 
- with a vastly higher generic utility benefit. So i'm not dismissing 
this possibility and i'm not writing off syslet performance just because 
they do context-switches :-)

	Ingo


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22 13:32             ` Evgeniy Polyakov
  2007-02-22 19:46               ` Davide Libenzi
@ 2007-02-23 11:51               ` Ingo Molnar
  2007-02-23 12:22                 ` Evgeniy Polyakov
  1 sibling, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-23 11:51 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ulrich Drepper, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner


* Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> [...] Those 20k blocked requests were created in about 20 seconds, so 
> roughly speaking we have 1k of thread creation/freeing per second - do 
> we want this?

i'm not sure why you mention thread creation and freeing. The 
syslet/threadlet code reuses already-created async threads, and that is 
visible all around in both the kernel-space and the user-space 
syslet/threadlet code.

While Linux creates+destroys threads pretty damn fast (in about 10-15 
usecs - which is roughly the cost of getting a single 1-byte packet 
through a TCP socket from one process to another, on localhost), we 
still dont want to create and destroy a thread per request.

	Ingo


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22 19:46               ` Davide Libenzi
@ 2007-02-23 12:15                 ` Evgeniy Polyakov
  2007-02-23 17:43                   ` Davide Libenzi
  0 siblings, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-23 12:15 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Ingo Molnar, Ulrich Drepper, Linux Kernel Mailing List,
	Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Thu, Feb 22, 2007 at 11:46:48AM -0800, Davide Libenzi (davidel@xmailserver.org) wrote:
> > I tried already :) - I just made allocations atomic in tcp_sendmsg() and
> > ended up with 1/4 of the sends blocking (I counted both allocation
> > failures and socket queue overflows). Those 20k blocked requests were
> > created in about 20 seconds, so roughly speaking we have 1k thread
> > creations/freeings per second - do we want this?
> 
> A dynamic pool will smooth thread creation/freeing up by a lot.
> And, on my box a *pthread* create/free takes ~10us; at 1000/s that is 10ms, or 1%. 
> Bad, but not so awful ;)
> Look, I'm *definitely* not trying to advocate the use of async syscalls for 
> network here, just pointing out that when we're talking about threads, 
> Linux does a pretty good job.
 
If we are going to create 1000 threads each second, then it is better to
preallocate them and queue work to that pool - like syslets do with
syscalls - rather than ultimately creating a new thread just because
creation is not that slow.
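
Something along these lines (a minimal pthread sketch of such a
preallocated pool - purely illustrative, error handling omitted):

#include <pthread.h>
#include <stdlib.h>

struct work {
	void (*fn)(void *);
	void *arg;
	struct work *next;
};

static struct work *queue_head;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t queue_cond = PTHREAD_COND_INITIALIZER;

static void *worker(void *unused)
{
	for (;;) {
		struct work *w;

		pthread_mutex_lock(&queue_lock);
		while (!queue_head)
			pthread_cond_wait(&queue_cond, &queue_lock);
		w = queue_head;
		queue_head = w->next;
		pthread_mutex_unlock(&queue_lock);

		w->fn(w->arg);	/* may block - fine, we are a pool thread */
		free(w);
	}
	return NULL;
}

void pool_init(int nr_threads)
{
	pthread_t tid;

	/* preallocate the workers once, up front */
	while (nr_threads--)
		pthread_create(&tid, NULL, worker, NULL);
}

void pool_queue(void (*fn)(void *), void *arg)
{
	struct work *w = malloc(sizeof(*w));

	w->fn = fn;
	w->arg = arg;
	pthread_mutex_lock(&queue_lock);
	w->next = queue_head;
	queue_head = w;
	pthread_cond_signal(&queue_cond);
	pthread_mutex_unlock(&queue_lock);
}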

All such micro-thread designs are especially good in the case when
1. switching is (very) _rare_
2. the programmer does not want to create a complex model to achieve
maximum performance

Disk (cached) IO definitely hits the first entry, and the second one is
there for advertisement and fast deployment, but overall usage of the
asynchronous IO model is not limited to the above scenario, so
micro-threads definitely hit their own niche - but they can not cover
all usage cases.
 
> 
> - Davide
> 

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22 21:24             ` Michael K. Edwards
  2007-02-23  0:30               ` Alan
@ 2007-02-23 12:17               ` Ingo Molnar
  2007-02-24 19:52                 ` Michael K. Edwards
  1 sibling, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-23 12:17 UTC (permalink / raw)
  To: Michael K. Edwards
  Cc: Evgeniy Polyakov, Ulrich Drepper, linux-kernel, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner


* Michael K. Edwards <medwards.linux@gmail.com> wrote:

> On 2/22/07, Ingo Molnar <mingo@elte.hu> wrote:

> > maybe it will, maybe it wont. Let's try? There is no true difference 
> > between having a 'request structure' that represents the current 
> > state of the HTTP connection plus a statemachine that moves that 
> > request between various queues, and a 'kernel stack' that goes in 
> > and out of runnable state and carries its processing state in its 
> > stack - other than the amount of RAM they take. (the kernel stack is 
> > 4K at a minimum - so with a million outstanding requests they would 
> > use up 4 GB of RAM. With 20k outstanding requests it's 80 MB of RAM 
> > - that's acceptable.)
> 
> This is a fundamental misconception. [...]

> The scheduler, on the other hand, has to blow and reload all of the 
> hidden state associated with force-loading the PC and wherever your 
> architecture keeps its TLS (maybe not the whole TLB, but not nothing, 
> either). [...]

please read up a bit more about how the Linux scheduler works. Maybe 
even read the code if in doubt? In any case, please direct kernel newbie 
questions to http://kernelnewbies.org/, not linux-kernel@vger.kernel.org.

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-23 11:51               ` Ingo Molnar
@ 2007-02-23 12:22                 ` Evgeniy Polyakov
  2007-02-23 12:41                   ` Evgeniy Polyakov
  2007-02-25 17:45                   ` Ingo Molnar
  0 siblings, 2 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-23 12:22 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ulrich Drepper, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

On Fri, Feb 23, 2007 at 12:51:52PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> > [...] Those 20k blocked requests were created in about 20 seconds, so 
> > roughly speaking we have 1k thread creations/freeings per second - do 
> > we want this?
> 
> i'm not sure why you mention thread creation and freeing. The 
> syslet/threadlet code reuses already created async threads, and that is 
> visible all around in both the kernel-space and in the user-space 
> syslet/threadlet code.
> 
> While Linux creates+destroys threads pretty damn fast (in about 10-15 
> usecs - which is roughly the cost of getting a single 1-byte packet 
> through a TCP socket from one process to another, on localhost), we 
> still don't want to create and destroy a thread per request.

I meant that we end up with one thread per IO - they were
preallocated, but that does not matter. And what about your idea of
switching userspace threads to cachemiss threads?

My main concern was about the situation when we end up with a truly
blocking context (like network), and this results in thousands of
threads doing the work - even with most of them sleeping, there is a
problem with memory overhead and context switching. That is a usable
situation, but when all of them are ready immediately - context switching 
will kill the machine even with the O(1) scheduler, which made the
situation a lot better than before but is not a cure for the problem.

> 	Ingo

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-23  2:47                 ` Michael K. Edwards
  2007-02-23  8:31                   ` Michael K. Edwards
  2007-02-23 10:22                   ` Ingo Molnar
@ 2007-02-23 12:37                   ` Alan
  2007-02-23 23:49                     ` Michael K. Edwards
  2 siblings, 1 reply; 337+ messages in thread
From: Alan @ 2007-02-23 12:37 UTC (permalink / raw)
  To: Michael K. Edwards
  Cc: Ingo Molnar, Evgeniy Polyakov, Ulrich Drepper, linux-kernel,
	Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

> Do you not understand that real user code touches FPU state at
> unpredictable (to the kernel) junctures?  Maybe not in a database or a

We don't care. We don't have to care. The kernel threadlets don't execute
in user space and don't do FP. 

> web server, but in the GUIs and web-based monitoring applications that
> are 99% of the potential customers for kernel AIO?  I have no idea
> what a %cr3 is, but if you don't fence off thread-local stuff from the

How about you go read the Intel architecture manuals - then you might
know more.

> > We don't have an errno in the kernel because its a stupid idea. Errno is
> > a user space hack for compatibility with 1970's bad design. So its not
> > relevant either.
> 
> Dude, it's thread-local, and the glibc wrapper around most synchronous

Last time I checked, glibc was in userspace, and the interface for kernel
AIO is a matter for the kernel, so errno is irrelevant - plus any
threadlets doing system calls will only be living in kernel space anyway.


^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-23 12:22                 ` Evgeniy Polyakov
@ 2007-02-23 12:41                   ` Evgeniy Polyakov
  2007-02-25 17:45                   ` Ingo Molnar
  1 sibling, 0 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-23 12:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ulrich Drepper, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

On Fri, Feb 23, 2007 at 03:22:25PM +0300, Evgeniy Polyakov (johnpol@2ka.mipt.ru) wrote:
> I meant that we end up with one thread per IO - they were
> preallocated, but that does not matter. And what about your idea of
> switching userspace threads to cachemiss threads?
> 
> My main concern was about the situation when we end up with a truly
> blocking context (like network), and this results in thousands of
> threads doing the work - even with most of them sleeping, there is a
> problem with memory overhead and context switching. That is a usable
> situation, but when all of them are ready immediately - context switching 
                                            simultaneously
> will kill the machine even with the O(1) scheduler, which made the
> situation a lot better than before but is not a cure for the problem.

A week of writing without a dictionary is starting to beat me.

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
  2007-02-21 21:13 [patch 00/13] Syslets, "Threadlets", generic AIO support, v3 Ingo Molnar
                   ` (14 preceding siblings ...)
  2007-02-22 10:01 ` Suparna Bhattacharya
@ 2007-02-23 12:52 ` Jens Axboe
  2007-02-23 13:55   ` Suparna Bhattacharya
  2007-02-23 22:31   ` Joel Becker
  2007-02-24  7:41 ` [patchset] Syslets/threadlets, generic AIO support, v4 Ingo Molnar
                   ` (2 subsequent siblings)
  18 siblings, 2 replies; 337+ messages in thread
From: Jens Axboe @ 2007-02-23 12:52 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Davide Libenzi, Thomas Gleixner

On Wed, Feb 21 2007, Ingo Molnar wrote:
> this is the v3 release of the syslet/threadlet subsystem:
> 
>    http://redhat.com/~mingo/syslet-patches/

[snip]

Ingo, some testing of the experimental syslet queueing stuff, in the
syslet-testing branch of fio.

Fio job file:

[global]
bs=8k
size=1g
direct=0
ioengine=syslet-rw
iodepth=32
rw=read

[file]
filename=/ramfs/testfile

The only changes between runs were the ioengine and iodepth settings, as
indicated in the table below.

Results:

Engine          Depth           Bw (MiB/sec)
--------------------------------------------
libaio            1             441
syslet            1             574
sync              1             589
libaio           32             613
syslet           32             681

Results are stable to within +/- 1MiB/sec. So you can see that syslets
are still a bit slower than sync for depth 1, but beat the pants off
libaio for equal depths. Note that this is buffered IO; I'll be out for
the weekend, but I'll hack up some direct IO testing next week to
compare "real" queueing.

Just a quick microbenchmark to gauge current overhead...

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
  2007-02-23 12:52 ` A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3) Jens Axboe
@ 2007-02-23 13:55   ` Suparna Bhattacharya
  2007-02-23 14:58     ` Ingo Molnar
  2007-02-23 22:31   ` Joel Becker
  1 sibling, 1 reply; 337+ messages in thread
From: Suparna Bhattacharya @ 2007-02-23 13:55 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ingo Molnar, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller, Davide Libenzi,
	Thomas Gleixner

On Fri, Feb 23, 2007 at 01:52:47PM +0100, Jens Axboe wrote:
> On Wed, Feb 21 2007, Ingo Molnar wrote:
> > this is the v3 release of the syslet/threadlet subsystem:
> > 
> >    http://redhat.com/~mingo/syslet-patches/
> 
> [snip]
> 
> Ingo, some testing of the experimental syslet queueing stuff, in the
> syslet-testing branch of fio.
> 
> Fio job file:
> 
> [global]
> bs=8k
> size=1g
> direct=0
> ioengine=syslet-rw
> iodepth=32
> rw=read
> 
> [file]
> filename=/ramfs/testfile
> 
> The only changes between runs were the ioengine and iodepth settings, as
> indicated in the table below.
> 
> Results:
> 
> Engine          Depth           Bw (MiB/sec)
> --------------------------------------------
> libaio            1             441
> syslet            1             574
> sync              1             589
> libaio           32             613
> syslet           32             681
> 
> Results are stable to within +/- 1MiB/sec. So you can see that syslets
> are still a bit slower than sync for depth 1, but beat the pants off
> libaio for equal depths. Note that this is buffered IO; I'll be out for
> the weekend, but I'll hack up some direct IO testing next week to
> compare "real" queueing.
> 
> Just a quick microbenchmark to gauge current overhead...

This is just ramfs, to gauge pure overheads - is that correct?

BTW, I'm not surprised at Ingo's initial results of syslet vs libaio
overheads for aio-stress/fio type streaming io runs, because these cases
do not involve large numbers of outstanding ios. The overhead of
thread creation with syslets is amortized across the entire run of io
submissions because of the reuse of already created async threads, while
in the libaio case there is setup and teardown of a kiocb per request.

What I have been concerned about instead in the past when considering
thread-based AIO implementations is the resource (memory) consumption impact
on overall system performance and adaptability to varying loads. It is nice
that we can avoid that for the cached cases, but for the general blocking
cases, it is still not clear to me whether we have addressed this well
enough yet. I used to think that even the kiocb was too heavyweight for its
purpose ... especially in terms of scaling to larger loads.

As a really crude (and not very realistic) example of the potential impact
of large numbers of outstanding IOs, I tried some quick direct IO comparisons
using fio:

[global]
ioengine=syslet-rw
buffered=0
rw=randread
bs=64k
size=1024m
iodepth=64

Engine		Depth		Bw (MiB/sec)

libaio		64		17.323
syslet		64		17.524
libaio		20000		15.226
syslet		20000		11.015


Regards
Suparna

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India


^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22 14:36               ` Ingo Molnar
@ 2007-02-23 14:23                 ` Suparna Bhattacharya
  0 siblings, 0 replies; 337+ messages in thread
From: Suparna Bhattacharya @ 2007-02-23 14:23 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Evgeniy Polyakov, Ulrich Drepper, linux-kernel, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, David S. Miller, Davide Libenzi, Jens Axboe,
	Thomas Gleixner

On Thu, Feb 22, 2007 at 03:36:58PM +0100, Ingo Molnar wrote:
> 
> * Suparna Bhattacharya <suparna@in.ibm.com> wrote:
> 
> > > maybe it will, maybe it wont. Let's try? There is no true difference 
> > > between having a 'request structure' that represents the current 
> > > state of the HTTP connection plus a statemachine that moves that 
> > > request between various queues, and a 'kernel stack' that goes in 
> > > and out of runnable state and carries its processing state in its 
> > > stack - other than the amount of RAM they take. (the kernel stack is 
> > > 4K at a minimum - so with a million outstanding requests they would 
> > > use up 4 GB of RAM. With 20k outstanding requests it's 80 MB of RAM 
> > > - that's acceptable.)
> > 
> > At what point are the cachemiss threads destroyed ? In other words how 
> > well does this adapt to load variations ? For example, would this 80MB 
> > of RAM continue to be locked down even during periods of lighter loads 
> > thereafter ?
> 
> you can destroy them at will from user-space too - just start a slow 
> timer that zaps them if load goes down. I can add a 
> sys_async_thread_exit(nr_threads) API to be able to drive this without 
> knowing the TIDs of those threads, and/or i can add a kernel-internal 
> mechanism to zap inactive threads. It would be rather easy and 
> low-overhead - the v2 code already had a max_nr_threads tunable, i can 
> reintroduce it. So the size of the pool of contexts does not have to be 
> permanent at all.
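
A rough user-space sketch of that zapping timer (sys_async_thread_exit()
here is only the API proposed in the quote above - it does not exist in
the patches):

#include <signal.h>
#include <sys/time.h>

#define POOL_TARGET	64

extern long sys_async_thread_exit(unsigned long nr_threads);	/* proposed */
extern unsigned long nr_pool_threads;	/* tracked by the application */

static void trim_pool(int sig)
{
	/* hypothetical: zap the excess contexts when load drops */
	if (nr_pool_threads > POOL_TARGET)
		sys_async_thread_exit(nr_pool_threads - POOL_TARGET);
}

static void arm_trim_timer(void)
{
	struct itimerval it = {
		.it_interval	= { .tv_sec = 10 },	/* slow timer */
		.it_value	= { .tv_sec = 10 },
	};

	signal(SIGALRM, trim_pool);
	setitimer(ITIMER_REAL, &it, NULL);
}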

If you can find a way to do this without an additional tunables burden on
the administrator, that would certainly help! IIRC, performance problems
linked to having too many or too few AIO kernel threads have been a
commonly reported issue elsewhere - it would be nice to avoid repeating
the crux of that mistake in Linux. To me, any need to manually tune the
number has always seemed to defeat the very benefit of adaptability to
varying loads that AIO intrinsically provides.

Regards
Suparna

> 
> 	Ingo

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India


^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
  2007-02-23 13:55   ` Suparna Bhattacharya
@ 2007-02-23 14:58     ` Ingo Molnar
  2007-02-23 15:15       ` Suparna Bhattacharya
  0 siblings, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-23 14:58 UTC (permalink / raw)
  To: Suparna Bhattacharya
  Cc: Jens Axboe, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller, Davide Libenzi,
	Thomas Gleixner


* Suparna Bhattacharya <suparna@in.ibm.com> wrote:

> As a really crude (and not very realistic) example of the potential 
> impact of large numbers of outstanding IOs, I tried some quick direct 
> IO comparisons using fio:
> 
> [global]
> ioengine=syslet-rw
> buffered=0
> rw=randread
> bs=64k
> size=1024m
> iodepth=64

could you please try those iodepth=20000 tests with the latest 
syslet-testing branch of fio as well? Jens wrote a new, smarter syslet 
plugin for FIO. You'll need the v3 syslet kernel plus:

 git-clone git://git.kernel.dk/data/git/fio.git
 cd fio
 git-checkout syslet-testing

my expectation is that it should behave better with iodepth=20000 
(although i haven't tried that yet).

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
  2007-02-23 14:58     ` Ingo Molnar
@ 2007-02-23 15:15       ` Suparna Bhattacharya
  2007-02-23 16:25         ` Jens Axboe
  2007-02-23 16:59         ` Ingo Molnar
  0 siblings, 2 replies; 337+ messages in thread
From: Suparna Bhattacharya @ 2007-02-23 15:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jens Axboe, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller, Davide Libenzi,
	Thomas Gleixner

On Fri, Feb 23, 2007 at 03:58:26PM +0100, Ingo Molnar wrote:
> 
> * Suparna Bhattacharya <suparna@in.ibm.com> wrote:
> 
> > As a really crude (and not very realistic) example of the potential 
> > impact of large numbers of outstanding IOs, I tried some quick direct 
> > IO comparisons using fio:
> > 
> > [global]
> > ioengine=syslet-rw
> > buffered=0
> > rw=randread
> > bs=64k
> > size=1024m
> > iodepth=64
> 
> could you please try those iodepth=20000 tests with the latest 
> syslet-testing branch of fio as well? Jens wrote a new, smarter syslet 
> plugin for FIO. You'll need the v3 syslet kernel plus:
> 
>  git-clone git://git.kernel.dk/data/git/fio.git
>  cd fio
>  git-checkout syslet-testing
> 
> my expectation is that it should behave better with iodepth=20000 
> (although i haven't tried that yet).

I picked up the fio snapshot from 22nd Feb (fio-git-20070222212513.tar.gz)
and used the v3 syslet patches from your web-site.

Do I still need to get something more recent ?

Regards
Suparna


> 
> 	Ingo

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India


^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
  2007-02-23 15:15       ` Suparna Bhattacharya
@ 2007-02-23 16:25         ` Jens Axboe
  2007-02-23 17:13           ` Suparna Bhattacharya
  2007-02-23 16:59         ` Ingo Molnar
  1 sibling, 1 reply; 337+ messages in thread
From: Jens Axboe @ 2007-02-23 16:25 UTC (permalink / raw)
  To: Suparna Bhattacharya
  Cc: Ingo Molnar, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller, Davide Libenzi,
	Thomas Gleixner

On Fri, Feb 23 2007, Suparna Bhattacharya wrote:
> On Fri, Feb 23, 2007 at 03:58:26PM +0100, Ingo Molnar wrote:
> > 
> > * Suparna Bhattacharya <suparna@in.ibm.com> wrote:
> > 
> > > As a really crude (and not very realistic) example of the potential 
> > > impact of large numbers of outstanding IOs, I tried some quick direct 
> > > IO comparisons using fio:
> > > 
> > > [global]
> > > ioengine=syslet-rw
> > > buffered=0
> > > rw=randread
> > > bs=64k
> > > size=1024m
> > > iodepth=64
> > 
> > could you please try those iodepth=20000 tests with the latest 
> > syslet-testing branch of fio as well? Jens wrote a new, smarter syslet 
> > plugin for FIO. You'll need the v3 syslet kernel plus:
> > 
> >  git-clone git://git.kernel.dk/data/git/fio.git
> >  cd fio
> >  git-checkout syslet-testing
> > 
> > my expectation is that it should behave better with iodepth=20000 
> > (although i haven't tried that yet).
> 
> I picked up the fio snapshot from 22nd Feb (fio-git-20070222212513.tar.gz)
> and used the v3 syslet patches from your web-site.
> 
> Do I still need to get something more recent ?

Yes, you need to test the syslet-testing branch that Ingo referenced.
Your test above is not totally fair right now, since you are doing
significantly fewer system calls with libaio. So to compare apples with
apples, try the syslet-testing branch. If you can't get it because of
firewall problems, check http://brick.kernel.dk/snaps/ for the latest
fio snapshot. If it has the syslet-testing branch, then it is
recent enough.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
  2007-02-23 15:15       ` Suparna Bhattacharya
  2007-02-23 16:25         ` Jens Axboe
@ 2007-02-23 16:59         ` Ingo Molnar
  1 sibling, 0 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-02-23 16:59 UTC (permalink / raw)
  To: Suparna Bhattacharya
  Cc: Jens Axboe, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller, Davide Libenzi,
	Thomas Gleixner


* Suparna Bhattacharya <suparna@in.ibm.com> wrote:

> > my expectation is that it should behave better with iodepth=20000 
> > (although i havent tried that yet).
> 
> I picked up the fio snapshot from 22nd Feb 
> (fio-git-20070222212513.tar.gz) and used the v3 syslet patches from 
> your web-site.
> 
> Do I still need to get something more recent ?

yeah, there's something more recent. Please do this:

  yum install git
  git-clone git://git.kernel.dk/data/git/fio.git
  cd fio
  git-checkout syslet-testing

this should give you the latest version of the v3-based FIO code. It's 
one generation newer than the one you tried - I mean, the snapshot you 
used is by now a /whole/ day old, so it's truly ancient stuff! ;-)

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
  2007-02-23 16:25         ` Jens Axboe
@ 2007-02-23 17:13           ` Suparna Bhattacharya
  2007-02-23 18:35             ` Jens Axboe
  2007-02-26 13:57             ` Jens Axboe
  0 siblings, 2 replies; 337+ messages in thread
From: Suparna Bhattacharya @ 2007-02-23 17:13 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ingo Molnar, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller, Davide Libenzi,
	Thomas Gleixner

On Fri, Feb 23, 2007 at 05:25:08PM +0100, Jens Axboe wrote:
> On Fri, Feb 23 2007, Suparna Bhattacharya wrote:
> > On Fri, Feb 23, 2007 at 03:58:26PM +0100, Ingo Molnar wrote:
> > > 
> > > * Suparna Bhattacharya <suparna@in.ibm.com> wrote:
> > > 
> > > > As a really crude (and not very realistic) example of the potential 
> > > > impact of large numbers of outstanding IOs, I tried some quick direct 
> > > > IO comparisons using fio:
> > > > 
> > > > [global]
> > > > ioengine=syslet-rw
> > > > buffered=0
> > > > rw=randread
> > > > bs=64k
> > > > size=1024m
> > > > iodepth=64
> > > 
> > > could you please try those iodepth=20000 tests with the latest 
> > > syslet-testing branch of fio as well? Jens wrote a new, smarter syslet 
> > > plugin for FIO. You'll need the v3 syslet kernel plus:
> > > 
> > >  git-clone git://git.kernel.dk/data/git/fio.git
> > >  cd fio
> > >  git-checkout syslet-testing
> > > 
> > > my expectation is that it should behave better with iodepth=20000 
> > > (although i haven't tried that yet).
> > 
> > I picked up the fio snapshot from 22nd Feb (fio-git-20070222212513.tar.gz)
> > and used the v3 syslet patches from your web-site.
> > 
> > Do I still need to get something more recent ?
> 
> Yes, you need to test the syslet-testing branch that Ingo referenced.
> Your test above is not totally fair right now, since you are doing
> significantly fewer system calls with libaio. So to compare apples with
> apples, try the syslet-testing branch. If you can't get it because of
> firewall problems, check http://brick.kernel.dk/snaps/ for the latest
> fio snapshot. If it has the syslet-testing branch, then it is
> recent enough.

I have a feeling this is getting to be a little more bleeding edge than
I had anticipated :), so I will just hold off until things crystallize
a bit.

Regards
Suparna

> 
> -- 
> Jens Axboe

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India


^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-23 12:15                 ` Evgeniy Polyakov
@ 2007-02-23 17:43                   ` Davide Libenzi
  2007-02-23 18:01                     ` Evgeniy Polyakov
  0 siblings, 1 reply; 337+ messages in thread
From: Davide Libenzi @ 2007-02-23 17:43 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ingo Molnar, Ulrich Drepper, Linux Kernel Mailing List,
	Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Fri, 23 Feb 2007, Evgeniy Polyakov wrote:

> On Thu, Feb 22, 2007 at 11:46:48AM -0800, Davide Libenzi (davidel@xmailserver.org) wrote:
> > 
> > A dynamic pool will smooth thread creation/freeing up by a lot.
> > And, on my box a *pthread* create/free takes ~10us; at 1000/s that is 10ms, or 1%. 
> > Bad, but not so awful ;)
> > Look, I'm *definitely* not trying to advocate the use of async syscalls for 
> > network here, just pointing out that when we're talking about threads, 
> > Linux does a pretty good job.
>  
> If we are going to create 1000 threads each second, then it is better to
> preallocate them and queue work to that pool - like syslets do with
> syscalls - rather than ultimately creating a new thread just because
> creation is not that slow.

We do create a pool indeed, as I said in the opening of my answer. The
numbers I posted were just to show that thread creation/destruction is
pretty fast, but that does not justify it as a design choice.



> All such micro-thread designs are especially good in the case when
> 1. switching is (very) _rare_
> 2. the programmer does not want to create a complex model to achieve
> maximum performance
> 
> Disk (cached) IO definitely hits the first entry, and the second one is
> there for advertisement and fast deployment, but overall usage of the
> asynchronous IO model is not limited to the above scenario, so
> micro-threads definitely hit their own niche - but they can not cover
> all usage cases.

You know, I read this a few times, but I still don't get what your point 
is here ;) Are you talking about micro-thread design in the kernel, as in 
kthreads usage for AIO, or about userspace?



- Davide



^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-23 17:43                   ` Davide Libenzi
@ 2007-02-23 18:01                     ` Evgeniy Polyakov
  2007-02-23 20:43                       ` Davide Libenzi
  0 siblings, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-23 18:01 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Ingo Molnar, Ulrich Drepper, Linux Kernel Mailing List,
	Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Fri, Feb 23, 2007 at 09:43:14AM -0800, Davide Libenzi (davidel@xmailserver.org) wrote:
> On Fri, 23 Feb 2007, Evgeniy Polyakov wrote:
> 
> > On Thu, Feb 22, 2007 at 11:46:48AM -0800, Davide Libenzi (davidel@xmailserver.org) wrote:
> > > 
> > > A dynamic pool will smooth thread creation/freeing up by a lot.
> > > And, on my box a *pthread* create/free takes ~10us; at 1000/s that is 10ms, or 1%. 
> > > Bad, but not so awful ;)
> > > Look, I'm *definitely* not trying to advocate the use of async syscalls for 
> > > network here, just pointing out that when we're talking about threads, 
> > > Linux does a pretty good job.
> >  
> > If we are going to create 1000 threads each second, then it is better to
> > preallocate them and queue work to that pool - like syslets do with
> > syscalls - rather than ultimately creating a new thread just because
> > creation is not that slow.
> 
> We do create a pool indeed, as I said in the opening of my answer. The
> numbers I posted were just to show that thread creation/destruction is
> pretty fast, but that does not justify it as a design choice.
 
I was not clear - I meant: why do we need to do that when we can run the
same code in userspace? And better still if we can have non-blocking
dataflows and a number of threads equal to the number of processors...
 
> > All such micro-thread designs are especially good in the case when
> > 1. switching is (very) _rare_
> > 2. the programmer does not want to create a complex model to achieve
> > maximum performance
> > 
> > Disk (cached) IO definitely hits the first entry, and the second one is
> > there for advertisement and fast deployment, but overall usage of the
> > asynchronous IO model is not limited to the above scenario, so
> > micro-threads definitely hit their own niche - but they can not cover
> > all usage cases.
> 
> You know, I read this a few times, but I still don't get what your point 
> is here ;) Are you talking about micro-thread design in the kernel, as in 
> kthreads usage for AIO, or about userspace?
 
I started a week of writing without a Russian-English dictionary, so
expect some trouble in communications with me :)

I said that about the kernel design - when we have thousands of threads
which do the work: if the number of context switches is small (i.e. when
operations mostly do not block), then it is ok (although 'ps' output
with that many threads could scare a grandma).
It is also ok to say 'hey, Linux has such an easy AIO model that
everyone should switch to it and stop caring about the problems
associated with highly concurrent multi-threaded programming',
but, in my opinion, neither of those two cases covers all (or even
most of) the usage cases.

To eat my hat (or force others to do the same) I'm preparing a tree for
threadlet testing - I plan to write a trivial web server
(accept/recv/send/sendfile in one threadlet function) and give it a try
soon.

> - Davide
> 

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
  2007-02-23 17:13           ` Suparna Bhattacharya
@ 2007-02-23 18:35             ` Jens Axboe
  2007-02-26 13:57             ` Jens Axboe
  1 sibling, 0 replies; 337+ messages in thread
From: Jens Axboe @ 2007-02-23 18:35 UTC (permalink / raw)
  To: Suparna Bhattacharya
  Cc: Ingo Molnar, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller, Davide Libenzi,
	Thomas Gleixner

On Fri, Feb 23 2007, Suparna Bhattacharya wrote:
> On Fri, Feb 23, 2007 at 05:25:08PM +0100, Jens Axboe wrote:
> > On Fri, Feb 23 2007, Suparna Bhattacharya wrote:
> > > On Fri, Feb 23, 2007 at 03:58:26PM +0100, Ingo Molnar wrote:
> > > > 
> > > > * Suparna Bhattacharya <suparna@in.ibm.com> wrote:
> > > > 
> > > > > As a really crude (and not very realistic) example of the potential 
> > > > > impact of large numbers of outstanding IOs, I tried some quick direct 
> > > > > IO comparisons using fio:
> > > > > 
> > > > > [global]
> > > > > ioengine=syslet-rw
> > > > > buffered=0
> > > > > rw=randread
> > > > > bs=64k
> > > > > size=1024m
> > > > > iodepth=64
> > > > 
> > > > could you please try those iodepth=20000 tests with the latest 
> > > > syslet-testing branch of fio as well? Jens wrote a new, smarter syslet 
> > > > plugin for FIO. You'll need the v3 syslet kernel plus:
> > > > 
> > > >  git-clone git://git.kernel.dk/data/git/fio.git
> > > >  cd fio
> > > >  git-checkout syslet-testing
> > > > 
> > > > my expectation is that it should behave better with iodepth=20000 
> > > > (although i haven't tried that yet).
> > > 
> > > I picked up the fio snapshot from 22nd Feb (fio-git-20070222212513.tar.gz)
> > > and used the v3 syslet patches from your web-site.
> > > 
> > > Do I still need to get something more recent ?
> > 
> > Yes, you need to test the syslet-testing branch that Ingo referenced.
> > Your test above is not totally fair right now, since you are doing
> > significantly fewer system calls with libaio. So to compare apples with
> > apples, try the syslet-testing branch. If you can't get it because of
> > firewall problems, check http://brick.kernel.dk/snaps/ for the latest
> > fio snapshot. If it has the syslet-testing branch, then it is
> > recent enough.
> 
> I have a feeling this is getting to be a little more bleeding edge than
> I had anticipated :), so I will just hold off until things crystallize
> a bit.

Fair enough, I'll try your test with a huge number of pending requests
and see how it fares.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-23 18:01                     ` Evgeniy Polyakov
@ 2007-02-23 20:43                       ` Davide Libenzi
  0 siblings, 0 replies; 337+ messages in thread
From: Davide Libenzi @ 2007-02-23 20:43 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ingo Molnar, Ulrich Drepper, Linux Kernel Mailing List,
	Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Fri, 23 Feb 2007, Evgeniy Polyakov wrote:

> I was not clear - I meant: why do we need to do that when we can run the
> same code in userspace? And better still if we can have non-blocking
> dataflows and a number of threads equal to the number of processors...

I've a userspace library that does exactly that (GUASI - GPL code avail if
you want, but no man page written yet). It uses a pool of threads and
queues requests. I've a bench that crawls through a directory and reads
files. The sync-versus-async performance sucks. You can't do the cachehit
optimization in userspace. With network stuff it could probably do better
(since network is more heavily skewed towards async), but still.




> I started a week of writing without a Russian-English dictionary, so
> expect some trouble in communications with me :)
> 
> I said that about the kernel design - when we have thousands of threads
> which do the work: if the number of context switches is small (i.e. when
> operations mostly do not block), then it is ok (although 'ps' output
> with that many threads could scare a grandma).
> It is also ok to say 'hey, Linux has such an easy AIO model that
> everyone should switch to it and stop caring about the problems
> associated with highly concurrent multi-threaded programming',
> but, in my opinion, neither of those two cases covers all (or even
> most of) the usage cases.
> 
> To eat my hat (or force others to do the same) I'm preparing a tree for
> threadlet testing - I plan to write a trivial web server
> (accept/recv/send/sendfile in one threadlet function) and give it a try
> soon.

Funny, I lazily started doing the same thing last weekend (then I had to
stop, since the real job kicked in ;). I wanted to compare a fully MT
trivial HTTP server:

http://www.xmailserver.org/thrhttp.c

with one that is event-driven (epoll) and coroutine based. The latter will
only be compared for memory-content delivery, since it has no async vfs
capabilities. They both support the special "/mem-XXXX" url, which allows
an HTTP loader to request a given content size.
I also have an epoll+coroutine HTTP loader (that works around httperf
limitations).
Then, I wanted to compare the above with one that is epoll+GUASI+coroutine
based (basically a userspace-only thingy).
I've the code for all the above.
Finally, with one that is epoll+syslet+coroutine based (no code for this
yet - but it should be an easy port from the GUASI one).
Keep in mind though, that a threadlet solution doing accept/recv/send/sendfile
is blazingly similar to a full MT solution.
I can only imagine the thunder and flames that Larry would throw at us
for using all those threads :D




- Davide



^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
  2007-02-23 12:52 ` A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3) Jens Axboe
  2007-02-23 13:55   ` Suparna Bhattacharya
@ 2007-02-23 22:31   ` Joel Becker
  2007-02-24 12:18     ` Jens Axboe
  1 sibling, 1 reply; 337+ messages in thread
From: Joel Becker @ 2007-02-23 22:31 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ingo Molnar, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Davide Libenzi, Thomas Gleixner

On Fri, Feb 23, 2007 at 01:52:47PM +0100, Jens Axboe wrote:
> Results:
> 
> Engine          Depth           Bw (MiB/sec)
> --------------------------------------------
> libaio            1             441
> syslet            1             574
> sync              1             589
> libaio           32             613
> syslet           32             681

	Can we get runs with large I/Os, large I/O depths, and most
importantly tons of processes?  I can absolutely believe that syslets
would compete well with one process on the system.  But with 1000
processes doing 1000s of blocking I/Os, I'd really be interested to see
how that plays out.

Joel

-- 

 Joel's Second Law:

	If a code change requires additional user setup, it is wrong.

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 04/13] syslets: core code
  2007-02-21 21:15 ` [patch 04/13] syslets: core code Ingo Molnar
@ 2007-02-23 23:08   ` Davide Libenzi
  2007-02-24  7:04     ` Ingo Molnar
  0 siblings, 1 reply; 337+ messages in thread
From: Davide Libenzi @ 2007-02-23 23:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linux Kernel Mailing List, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Wed, 21 Feb 2007, Ingo Molnar wrote:

> +asmlinkage long
> +sys_threadlet_on(unsigned long restore_stack,
> +		 unsigned long restore_eip,
> +		 struct async_head_user __user *ahu)
> +{
> +	struct task_struct *t = current;
> +	struct async_head *ah = t->ah;
> +	struct async_thread *at = &t->__at;
> +	long ret;
> +
> +	/*
> +	 * Do not allow recursive calls of sys_threadlet_on():
> +	 */
> +	if (t->async_ready || t->at)
> +		return -EINVAL;
> +
> +	if (unlikely(!ah)) {
> +		ret = init_head(ah, t, ahu);
> +		if (ret)
> +			return ret;
> +		ah = t->ah;
> +	}
> +
> +	if (unlikely(list_empty(&ah->ready_async_threads))) {
> +		ret = refill_cachemiss_pool(ah, t, ahu);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	t->async_ready = at;
> +	ah->restore_stack = restore_stack;
> +	ah->restore_eip = restore_eip;
> +
> +	ah->ahu = ahu;
> +
> +	return 0;
> +}
> +
> +asmlinkage long sys_threadlet_off(void)
> +{
> +	struct task_struct *t = current;
> +	struct async_head *ah = t->ah;
> +
> +	/*
> +	 * Are we still executing as head?
> +	 */
> +	if (ah) {
> +		t->async_ready = NULL;
> +
> +		return 1;
> +	}
> +
> +	/*
> +	 * We got turned into a cachemiss thread,
> +	 * return to user-space, which can do
> +	 * the notification, etc:
> +	 */
> +	return 0;
> +}

If we have a new syscall that does the exec, we can save the two on/off 
calls. Also, the complete_thread() thingy can be done automatically from 
inside the kernel upon function return, thereby making the threadlet 
function look like a normal thread function:

long thfn(void *data)
{
	...
	return error;
}

No?
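
Something like this hypothetical combined call (the name and signature
are invented here, just to illustrate the idea):

/*
 * Hypothetical: one syscall replacing the on/exec/off/complete
 * sequence - run fn(data) on the given stack in threadlet mode and
 * auto-complete the return value into the ahu ring if the function
 * blocks and finishes asynchronously.
 */
long sys_threadlet_exec(long (*fn)(void *), void *data,
			void *stack, struct async_head_user *ahu);

/* usage: */
done = sys_threadlet_exec(thfn, data, new_stack, &ahu);
if (!done)
	reqs_queued++;		/* completion arrives via the ring */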



- Davide



^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-23 12:37                   ` Alan
@ 2007-02-23 23:49                     ` Michael K. Edwards
  2007-02-24  1:08                       ` Alan
  0 siblings, 1 reply; 337+ messages in thread
From: Michael K. Edwards @ 2007-02-23 23:49 UTC (permalink / raw)
  To: Alan
  Cc: Ingo Molnar, Evgeniy Polyakov, Ulrich Drepper, linux-kernel,
	Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

On 2/23/07, Alan <alan@lxorguk.ukuu.org.uk> wrote:
> > Do you not understand that real user code touches FPU state at
> > unpredictable (to the kernel) junctures?  Maybe not in a database or a
>
> We don't care. We don't have to care. The kernel threadlets don't execute
> in user space and don't do FP.

Blocked threadlets go back out to userspace, as new threads, after the
first blocking syscall completes.  That's how Ingo described them in
plain English, that's how his threadlet example would have to work,
and that appears to be what his patches actually do.

> > web server, but in the GUIs and web-based monitoring applications that
> > are 99% of the potential customers for kernel AIO?  I have no idea
> > what a %cr3 is, but if you don't fence off thread-local stuff from the
>
> How about you go read the intel architecture manuals then you might know
> more.

Y'know, there's more to life than x86.  I'm no MMU expert, but I know
enough about ARM TLS and ptrace to have fixed ltrace -- not that that
took any special wizardry, just a need for it to work and some basic
forensic skill.  If you want me to go away completely or not follow up
henceforth on anything you write, say so, and I'll decide what to do
in response.  Otherwise, you might consider evaluating whether there's
a way to interpret my comments so that they reflect a perspective that
does not overlap 100% with yours rather than total idiocy.

> Last time I checked glibc was in userspace and the interface for kernel
> AIO is a matter for the kernel so errno is irrelevant, plus any
> threadlets doing system calls will only be living in kernel space anyway.

Ingo's original example code:

long my_threadlet_fn(void *data)
{
       char *name = data;
       int fd;

       fd = open(name, O_RDONLY);
       if (fd < 0)
               goto out;

       fstat(fd, &stat);
       read(fd, buf, count)
       ...

out:
       return threadlet_complete();
}

You're telling me that runs entirely in kernel space when open()
blocks, and doesn't touch errno if fstat() fails?  Now who hasn't read
the code?

Cheers,
- Michael

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-24  1:08                       ` Alan
@ 2007-02-24  0:51                         ` Michael K. Edwards
  2007-02-24  2:17                           ` Michael K. Edwards
  2007-02-24  3:25                           ` Michael K. Edwards
  0 siblings, 2 replies; 337+ messages in thread
From: Michael K. Edwards @ 2007-02-24  0:51 UTC (permalink / raw)
  To: Alan
  Cc: Ingo Molnar, Evgeniy Polyakov, Ulrich Drepper, linux-kernel,
	Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

Thanks for taking me at least minimally seriously, Alan.  Pretty
generous of you, all things considered.

On 2/23/07, Alan <alan@lxorguk.ukuu.org.uk> wrote:
> That example touches back into user space, but doesn't involve MMU changes
> or cache flushes, or tlb flushes, or floating point.

True -- on an architecture where a change of TLS does not
substantially affect the TLB and cache, which (AIUI) it does on most
or all ARMs.  (On a pre-EABI ARM, there is even a substantial
cache-related penalty for encoding the syscall number in the syscall
opcode, because you have to peek back at the text segment to see it,
which costs you a D-cache stall.)  Now put an sprintf with a %d in it
between a couple of the syscalls, and _your_ arch is hurting.  Deny
the userspace programmer the use of the FPU in threadlets, and they
become a lot less widely applicable -- and a lot flakier in a
non-wizard's hands, given that people often cheat around the small
number of x86 integer registers by using FP registers when copying
memory in bulk.

> errno is thread-specific if you use it, but errno is, as I said before,
> entirely a C library detail that you don't have to suffer if you don't
> want to. Avoiding that saves a segment register load - which isn't too
> costly, but isn't free.

On your arch, it's a segment register -- and another
who-knows-how-many pages to migrate along with the stack and pt_regs.
On ARM, it's a coprocessor register that is incorrectly emulated by
most JTAG emulators (so bye-bye JTAG-assisted debugging and
profiling), or possibly a register stolen from the general purpose
register set.  On some MIPSes I have known you probably can't
implement TLS safely without a cache flush.

If you tell people up front not to touch TLS in threadlets -- which
means not to use routines from <stdlib.h> and <stdio.h> -- then
implementors may have enough flexibility to make them perform well on
a wide range of architectures.  Alternately, if there are some things
that threadlet users will genuinely need TLS for, you can tell them
that all of the threadlets belonging to process X on CPU Y share a TLS
context, and therefore things like errno can't be trusted across a
syscall -- but then you had better make fairly sure that threadlets
aren't preempted by other threadlets in between syscalls.  Similar
arguments apply to FPU state.

IEEE 754.  Harp, harp.  :-)

Cheers,
- Michael

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-23 23:49                     ` Michael K. Edwards
@ 2007-02-24  1:08                       ` Alan
  2007-02-24  0:51                         ` Michael K. Edwards
  0 siblings, 1 reply; 337+ messages in thread
From: Alan @ 2007-02-24  1:08 UTC (permalink / raw)
  To: Michael K. Edwards
  Cc: Ingo Molnar, Evgeniy Polyakov, Ulrich Drepper, linux-kernel,
	Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

> long my_threadlet_fn(void *data)
> {
>        char *name = data;
>        int fd;
> 
>        fd = open(name, O_RDONLY);
>        if (fd < 0)
>                goto out;
> 
>        fstat(fd, &stat);
>        read(fd, buf, count)
>        ...
> 
> out:
>        return threadlet_complete();
> }
> 
> You're telling me that runs entirely in kernel space when open()
> blocks, and doesn't touch errno if fstat() fails?  Now who hasn't read
> the code?

That example touches back into user space, but doesn't involve MMU changes
or cache flushes, or tlb flushes, or floating point.

errno is thread-specific if you use it, but errno is, as I said before,
entirely a C library detail that you don't have to suffer if you don't
want to. Avoiding that saves a segment register load - which isn't too
costly, but isn't free.

Alan

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-24  0:51                         ` Michael K. Edwards
@ 2007-02-24  2:17                           ` Michael K. Edwards
  2007-02-24  3:25                           ` Michael K. Edwards
  1 sibling, 0 replies; 337+ messages in thread
From: Michael K. Edwards @ 2007-02-24  2:17 UTC (permalink / raw)
  To: Alan
  Cc: Ingo Molnar, Evgeniy Polyakov, Ulrich Drepper, linux-kernel,
	Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

I wrote:
> (On a pre-EABI ARM, there is even a substantial
> cache-related penalty for encoding the syscall number in the syscall
> opcode, because you have to peek back at the text segment to see it,
> which costs you a D-cache stall.)

Before you say it, I'm aware that this is not directly relevant to TLS
switch costs, except insofar as the "arch-dependent syscalls"
introduced for certain parts of ARM TLS handling carry the same
overhead as any other syscall.  My point is that the system impact of
seemingly benign operations is not always predictable even to the arch
experts, and therefore one should be "parsimonious" (to use Kahan's
word) in defining what semantics programmers may rely on in
performance-critical situations.

If you arrange things so that threadlets are scheduled as much as
possible in bursts that share the same processor context (process
context, location in program text, TLS arena, FPU state -- basically
everything other than stack and integer registers), you are giving
yourself and future designers the maximum opportunity for exploiting
hardware optimizations.  This would be a good thing if you want
threadlets to be performance-competitive with state machine designs.

If you still allow application programmers to _use_ shared processor
state, in the knowledge that it will be clobbered on threadlet switch,
then threadlets can use most of the coding style with which
programmers of event-driven frameworks are familiar.  This would be a
good thing if you want threadlets to get wider use than the innards of
three or four databases and web servers.

Cheers,
- Michael

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-24  0:51                         ` Michael K. Edwards
  2007-02-24  2:17                           ` Michael K. Edwards
@ 2007-02-24  3:25                           ` Michael K. Edwards
  1 sibling, 0 replies; 337+ messages in thread
From: Michael K. Edwards @ 2007-02-24  3:25 UTC (permalink / raw)
  To: Alan
  Cc: Ingo Molnar, Evgeniy Polyakov, Ulrich Drepper, linux-kernel,
	Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

On 2/23/07, Michael K. Edwards <medwards.linux@gmail.com> wrote:
> which costs you a D-cache stall.)  Now put an sprintf with a %d in it
> between a couple of the syscalls, and _your_ arch is hurting.  ...

er, that would be a %f.  :-)

Cheers,
- Michael

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 04/13] syslets: core code
  2007-02-23 23:08   ` Davide Libenzi
@ 2007-02-24  7:04     ` Ingo Molnar
  2007-02-24 21:10       ` Davide Libenzi
  0 siblings, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-24  7:04 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Linux Kernel Mailing List, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner


* Davide Libenzi <davidel@xmailserver.org> wrote:

> > +asmlinkage long
> > +sys_threadlet_on(unsigned long restore_stack,
> > +		 unsigned long restore_eip,
> > +		 struct async_head_user __user *ahu)

> > +asmlinkage long sys_threadlet_off(void)

> If we have a new syscall that does the exec, we can save the two 
> on/off calls.

the on/off calls are shaped in a way that makes them ultimately 
vsyscall-able - the kernel only needs to know about the fact that we are 
in a threadlet (so that the scheduler can do its special 
push-head-to-another-context thing) - and this can be signalled via a 
small user-space-side info structure as well, put into the TLS.
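
i.e. something like this (a rough sketch - the structure layout and
names are invented for illustration, not what the patches implement):

struct async_head_user;		/* from the syslet patches' async.h */

/*
 * Illustrative: a per-thread info block that vsyscall-style on/off
 * wrappers could flip without entering the kernel; the scheduler
 * would consult it when the task is about to block.
 */
struct threadlet_info {
	unsigned long		in_threadlet;	/* non-zero inside a threadlet */
	unsigned long		restore_stack;	/* head context to return to */
	unsigned long		restore_eip;
	struct async_head_user	*ahu;
};

static __thread struct threadlet_info tinfo;

static inline void threadlet_on(unsigned long stack, unsigned long eip,
				struct async_head_user *ahu)
{
	tinfo.restore_stack	= stack;
	tinfo.restore_eip	= eip;
	tinfo.ahu		= ahu;
	tinfo.in_threadlet	= 1;	/* no syscall needed */
}

static inline void threadlet_off(void)
{
	tinfo.in_threadlet	= 0;
}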

> [...] Also, the complete_thread() thingy can be done automatically 
> from inside the kernel upon function return, by hence making the 
> threadlet function look like a normal thread function:

yeah - and that's how it works in my current codebase already, 
threadlet_off() takes a 'completion event' pointer as well, and the ahu. 
I'll release v4 so that you can have a look.

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* [patchset] Syslets/threadlets, generic AIO support, v4
  2007-02-21 21:13 [patch 00/13] Syslets, "Threadlets", generic AIO support, v3 Ingo Molnar
                   ` (15 preceding siblings ...)
  2007-02-23 12:52 ` A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3) Jens Axboe
@ 2007-02-24  7:41 ` Ingo Molnar
  2007-02-24 18:34 ` [patch 00/13] Syslets, "Threadlets", generic AIO support, v3 Evgeniy Polyakov
  2007-02-25 14:33 ` Threadlet/syslet v4 bug report Evgeniy Polyakov
  18 siblings, 0 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-02-24  7:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner


this is the v4 release of the syslet/threadlet subsystem:

    http://redhat.com/~mingo/syslet-patches/

v4 is a smaller update than v3 (so i won't send out the full queue to 
lkml - see the broken-out queue in the patches-v4 directory at the URL 
above). Changes since v3:

- the threadlet API changed: the sys_async_threadlet() syscall now takes
  a 'completion event' pointer, and auto-completes it into the
  completion ring. I've updated the test-threadlet.c code to make use of
  it. So completion of threadlets and syslets is quite similar now -
  sharing even more infrastructure. (To get true pthread compatibility a
  sys_exit() driven CLEARTID completion method will be added too in the
  future).

- a small performance fix for syslet rescheduling. The syslet ABI has
  not changed.

- test-threadlet.c fixes a thread stack leak, and the other tests too
  have a number of small fixes and cleanups.

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
  2007-02-23 22:31   ` Joel Becker
@ 2007-02-24 12:18     ` Jens Axboe
  0 siblings, 0 replies; 337+ messages in thread
From: Jens Axboe @ 2007-02-24 12:18 UTC (permalink / raw)
  To: Ingo Molnar, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Davide Libenzi, Thomas Gleixner

On Fri, Feb 23 2007, Joel Becker wrote:
> On Fri, Feb 23, 2007 at 01:52:47PM +0100, Jens Axboe wrote:
> > Results:
> > 
> > Engine          Depth           Bw (MiB/sec)
> > --------------------------------------------
> > libaio            1             441
> > syslet            1             574
> > sync              1             589
> > libaio           32             613
> > syslet           32             681
> 
> 	Can we get runs with large I/Os, large I/O depths, and most
> importantly tons of processes?  I can absolutely believe that syslets
> would compete well with one process on the system.  But with 1000
> processes doing 1000s of blocking I/Os, I'd really be interested to see
> how that plays out.

Sure, I'll add this to the testing list for Monday.
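
Something along these lines should cover it - all names and values
below are illustrative only, and the engine name is whatever the local
fio build calls the syslet engine:

[global]
; large I/Os, large I/O depth
ioengine=syslet-rw
rw=randread
bs=1M
iodepth=512
direct=1
filename=/dev/sdX

[blockers]
; "tons of processes": fork 1000 jobs off this description
numjobs=1000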

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-21 21:13 [patch 00/13] Syslets, "Threadlets", generic AIO support, v3 Ingo Molnar
                   ` (16 preceding siblings ...)
  2007-02-24  7:41 ` [patchset] Syslets/threadlets, generic AIO support, v4 Ingo Molnar
@ 2007-02-24 18:34 ` Evgeniy Polyakov
  2007-02-25 17:23   ` Ingo Molnar
  2007-02-25 14:33 ` Threadlet/syslet v4 bug report Evgeniy Polyakov
  18 siblings, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-24 18:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

On Wed, Feb 21, 2007 at 10:13:55PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> this is the v3 release of the syslet/threadlet subsystem:
> 
>    http://redhat.com/~mingo/syslet-patches/

There is no %xgs.

--- ./arch/i386/kernel/process.c~	2007-02-24 22:56:14.000000000 +0300
+++ ./arch/i386/kernel/process.c	2007-02-24 22:53:19.000000000 +0300
@@ -426,7 +426,6 @@
 
 	regs.xds = __USER_DS;
 	regs.xes = __USER_DS;
-	regs.xgs = __KERNEL_PDA;
 	regs.orig_eax = -1;
 	regs.eip = (unsigned long) async_thread_helper;
 	regs.xcs = __KERNEL_CS | get_kernel_rpl();


-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-23 12:17               ` Ingo Molnar
@ 2007-02-24 19:52                 ` Michael K. Edwards
  2007-02-24 21:04                   ` Davide Libenzi
  0 siblings, 1 reply; 337+ messages in thread
From: Michael K. Edwards @ 2007-02-24 19:52 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Evgeniy Polyakov, Ulrich Drepper, linux-kernel, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

On 2/23/07, Ingo Molnar <mingo@elte.hu> wrote:
> > This is a fundamental misconception. [...]
>
> > The scheduler, on the other hand, has to blow and reload all of the
> > hidden state associated with force-loading the PC and wherever your
> > architecture keeps its TLS (maybe not the whole TLB, but not nothing,
> > either). [...]
>
> please read up a bit more about how the Linux scheduler works. Maybe
> even read the code if in doubt? In any case, please direct kernel newbie
> questions to http://kernelnewbies.org/, not linux-kernel@vger.kernel.org.

This is not the first kernel I've swum around in, and I've been
mucking with the Linux kernel since early 2.2 and coding assembly for
heavily pipelined processors on and off since 1990.  So I may be a
newbie to your lingo, and I may even be a loud-mouthed idiot, but I'm
not a wet-behind-the-ears undergrad, OK?

Now, I've addressed the non-free-ness of a TLS swap elsewhere; what
about function pointers in state machines (with or without flipping
"supervisor mode" bits)?  Just because loading the PC from a data
register is one opcode in the instruction stream does not mean that it
is not quite expensive in terms of blown pipeline state and I-cache
stalls.  Really fast state machines exploit PC-relative branches that
really smart CPUs can speculatively execute past (after a few
traversals) because there are a small number of branch targets
actually hit.  The instruction prefetch / scheduler unit actually
keeps a table of PC-relative jump instructions found in I-cache, with
a little histogram of destinations eventually branched to, and
speculatively executes down the top branch or two.  (Intel Pentiums
have a fairly primitive but effective variant of this; see
http://www.x86.org/articles/branch/branchprediction.htm.)

More general mechanisms are called "branch target buffers" and US
Patent 6609194 is a good hook into the literature.  A sufficiently
smart CPU designer may have figured out how to do something similar
with computed jumps (add pc, pc, foo), but odds are high that it cuts
out when you throw function pointers around.  Syscall dispatch is a
special and heavily optimized case, though -- so it's quite
conceivable that a well designed userland switch/case state machine
that makes syscalls will outperform an in-kernel state machine data
structure traversal.  If this doesn't happen to be true on today's
desktop, it may be on tomorrow's desktop or today's NUMA monstrosity
or embedded mega-multi-MIPS.
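
To make the two dispatch styles concrete, here's a generic toy example
-- nothing below is from the patches:

/* A switch()-based state machine: the compiler emits a PC-relative
 * jump table, which branch-target prediction handles well. */
enum state { ST_READ, ST_PARSE, ST_WRITE, ST_DONE };

static void run_switch(void)
{
	enum state s = ST_READ;

	while (s != ST_DONE) {
		switch (s) {
		case ST_READ:	s = ST_PARSE;	break;
		case ST_PARSE:	s = ST_WRITE;	break;
		case ST_WRITE:	s = ST_DONE;	break;
		default:	s = ST_DONE;	break;
		}
	}
}

/* The same machine driven by function pointers: every transition is
 * an indirect call through a data register -- the case that tends to
 * defeat simple branch-target prediction. */
typedef int (*state_fn)(void);

static int fn_read(void)  { return 1; }	/* -> parse */
static int fn_parse(void) { return 2; }	/* -> write */
static int fn_write(void) { return 3; }	/* -> done */

static state_fn dispatch[] = { fn_read, fn_parse, fn_write };

static void run_fnptr(void)
{
	int s = 0;

	while (s < 3)
		s = dispatch[s]();	/* indirect call */
}

int main(void)
{
	run_switch();
	run_fnptr();
	return 0;
}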

There can also be other reasons why tabulated PC-relative jumps and
immediate PC loads are faster than PC loads from data registers.
Take, for instance, the Transmeta Crusoe, which (AIUI) used a trick
similar to the FX!32 x86 emulation on Alpha/NT.  If you're going to
"translate" CISC to RISC on the fly, you're going to recognize
switch/case idioms (including tabulated PC-relative branches), and fix
up the translated branch table to contain offsets to the
RISC-translated branch targets.  So the state transitions are just as
cheap as if they had been compiled to RISC in the first place.  Do it
with function pointers, and the execution machine is going to have
to stall while it looks up the text location to see if it has it
translated in I-cache somewhere.  Guess what:  the PIV works the same
way (http://www.karbosguide.com/books/pcarchitecture/chapter12.htm).

Are you starting to get the picture that syslets -- clever as they
might have been on a VAX -- defeat many of the mechanisms that CPU and
compiler architects have negotiated over decades for accelerating real
code?  Especially now that we have hyper-threaded CPUs (parallel
instruction decode/issue units sharing almost all of their cache
hierarchy), you can almost treat the kernel as if it were microcode
for a syscall coprocessor.  If you try to migrate application code
across the syscall boundary, you may perform well on micro-benchmarks
but you're storing up trouble for the future.

If you don't think this kind of fallout is real, talk to whoever had
the bright idea of hijacking FPU registers to implement memcpy in
1996.  The PIII designers rolled over and added XMM so
micro-optimizers would get their dirty mitts off the FPU, which it
appears that Doug Ledford and Jim Blandy duly acted on in 1999.  Yes,
you still need to use FXSAVE/FXRSTOR when you want to mess with the
XMM stuff, but the CPU is smart enough to keep a shadow copy of all
the microstate that the flag states represent.  So if all you do
between FXSAVE and FXRSTOR is shlep bytes around with MOVAPS, the
FXRSTOR costs you little or nothing.  What hurts is an FXRSTOR from a
location that isn't the last location you FXSAVEd to, or an FXRSTOR
after actual FP arithmetic instructions have altered status flags.

The preceding may contain errors in detail -- I am neither a CPU
architect nor an x86 compiler writer nor even a serious kernel hacker.
But hopefully it's at least food for thought.  If not, you know where
the "ignore this prolix nitwit" key is to be found on your keyboard.

Cheers,
- Michael

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-24 19:52                 ` Michael K. Edwards
@ 2007-02-24 21:04                   ` Davide Libenzi
  2007-02-24 23:01                     ` Michael K. Edwards
  0 siblings, 1 reply; 337+ messages in thread
From: Davide Libenzi @ 2007-02-24 21:04 UTC (permalink / raw)
  To: Michael K. Edwards
  Cc: Ingo Molnar, Evgeniy Polyakov, Ulrich Drepper,
	Linux Kernel Mailing List, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Jens Axboe,
	Thomas Gleixner

On Sat, 24 Feb 2007, Michael K. Edwards wrote:

> The preceding may contain errors in detail -- I am neither a CPU
> architect nor an x86 compiler writer nor even a serious kernel hacker.

Ok, roger that. But why are you playing "Google & Preach" games with 
Ingo, who has eaten bread and CPUs for the last 15 years?



- Davide



^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 04/13] syslets: core code
  2007-02-24  7:04     ` Ingo Molnar
@ 2007-02-24 21:10       ` Davide Libenzi
  2007-02-24 22:08         ` Kyle Moffett
  0 siblings, 1 reply; 337+ messages in thread
From: Davide Libenzi @ 2007-02-24 21:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linux Kernel Mailing List, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Sat, 24 Feb 2007, Ingo Molnar wrote:

> * Davide Libenzi <davidel@xmailserver.org> wrote:
> 
> > > +asmlinkage long
> > > +sys_threadlet_on(unsigned long restore_stack,
> > > +		 unsigned long restore_eip,
> > > +		 struct async_head_user __user *ahu)
> 
> > > +asmlinkage long sys_threadlet_off(void)
> 
> > If we have a new syscall that does the exec, we can save the two 
> > on/off calls.
> 
> the on/off calls are shaped in a way that makes them ultimately 
> vsyscall-able - the kernel only needs to know about the fact that we are 
> in a threadlet (so that the scheduler can do its special 
> push-head-to-another-context thing) - and this can be signalled via a 
> small user-space-side info structure as well, put into the TLS.

IMO it's not a matter of speed. We'll have those two new syscalls that I 
don't see any other practical use for. IMO the best thing would be to hide 
it all inside sys_threadlet_exec (or whatever name).




> > [...] Also, the complete_thread() thingy can be done automatically 
> > from inside the kernel upon function return, hence making the 
> > threadlet function look like a normal thread function:
> 
> yeah - and that's how it works in my current codebase already: 
> threadlet_off() takes a 'completion event' pointer as well as the ahu. 
> I'll release v4 so that you can have a look.

Ok. Will look into it...



- Davide



^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 04/13] syslets: core code
  2007-02-24 21:10       ` Davide Libenzi
@ 2007-02-24 22:08         ` Kyle Moffett
  2007-02-24 22:25           ` Davide Libenzi
  0 siblings, 1 reply; 337+ messages in thread
From: Kyle Moffett @ 2007-02-24 22:08 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Ingo Molnar, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Feb 24, 2007, at 16:10:33, Davide Libenzi wrote:
> On Sat, 24 Feb 2007, Ingo Molnar wrote:
>> the on/off calls are shaped in a way that makes them ultimately  
>> vsyscall-able - the kernel only needs to know about the fact that  
>> we are in a threadlet (so that the scheduler can do its special  
>> push-head-to-another-context thing) - and this can be signalled  
>> via a small user-space-side info structure as well, put into the TLS.
>
> IMO it's not a matter of speed. We'll have those two new syscalls  
> that I don't see any other practical use for. IMO the best thing  
> would be to hide it all inside sys_threadlet_exec (or whatever name).

No, it absolutely is a matter of speed.  The reason to have those two  
implemented that way is so that they can be implemented as vsyscalls  
completely in userspace.  This means that on most modern platforms  
you can implement the "make a threadlet when I block" semantic  
without even touching kernel-mode.  The way it's set up all you'd  
have to do is save some parameters, set up a new callstack, and poke  
a "1" into a memory address in the TLS.  To stop, you effectively  
just poke a "0" into the spot in the TLS and either return or  
terminate the thread.
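
A sketch of that sequence -- every name below is invented for
illustration, and plain ucontext is used only to make the "new
callstack" step concrete; none of this is from the patchset:

#include <stddef.h>
#include <ucontext.h>

static __thread volatile unsigned long in_threadlet; /* word in the TLS */

static ucontext_t head_ctx, let_ctx;

static void run_threadlet(void (*fn)(void), void *stack, size_t size)
{
	getcontext(&let_ctx);
	let_ctx.uc_stack.ss_sp = stack;		/* the new callstack */
	let_ctx.uc_stack.ss_size = size;
	let_ctx.uc_link = &head_ctx;		/* come back when done */
	makecontext(&let_ctx, fn, 0);

	in_threadlet = 1;			/* poke "1" into the TLS */
	swapcontext(&head_ctx, &let_ctx);	/* run the function */
	in_threadlet = 0;			/* poke "0": stopped */
}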

Cheers,
Kyle Moffett

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 04/13] syslets: core code
  2007-02-24 22:08         ` Kyle Moffett
@ 2007-02-24 22:25           ` Davide Libenzi
  2007-02-25  7:59             ` Ingo Molnar
  0 siblings, 1 reply; 337+ messages in thread
From: Davide Libenzi @ 2007-02-24 22:25 UTC (permalink / raw)
  To: Kyle Moffett
  Cc: Ingo Molnar, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Sat, 24 Feb 2007, Kyle Moffett wrote:

> On Feb 24, 2007, at 16:10:33, Davide Libenzi wrote:
> > On Sat, 24 Feb 2007, Ingo Molnar wrote:
> > > the on/off calls are shaped in a way that makes them ultimately
> > > vsyscall-able - the kernel only needs to know about the fact that we are
> > > in a threadlet (so that the scheduler can do its special
> > > push-head-to-another-context thing) - and this can be signalled via a
> > > small user-space-side info structure as well, put into the TLS.
> > 
> > IMO it's not a matter of speed. We'll have those two new syscalls that I
> > don't see any other practical use for. IMO the best thing would be to hide
> > it all inside sys_threadlet_exec (or whatever name).
> 
> No, it absolutely is a matter of speed.  The reason to have those two
> implemented that way is so that they can be implemented as vsyscalls
> completely in userspace.  This means that on most modern platforms you can
> implement the "make a threadlet when I block" semantic without even touching
> kernel-mode.  The way it's set up all you'd have to do is save some
> parameters, set up a new callstack, and poke a "1" into a memory address in
> the TLS.  To stop, you effectively just poke a "0" into the spot in the TLS
> and either return or terminate the thread.

Right. I don't know why, but I got the impression Ingo's threadlet_exec 
example was just sketch code to be moved into a syscall. That's why I was 
talking about a sys_threadlet_exec. But yeah, it makes a lot of sense to 
turn threadlet_exec into a glibc thing, and play everything in userspace 
at that point.



- Davide



^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-24 21:04                   ` Davide Libenzi
@ 2007-02-24 23:01                     ` Michael K. Edwards
  0 siblings, 0 replies; 337+ messages in thread
From: Michael K. Edwards @ 2007-02-24 23:01 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Ingo Molnar, Evgeniy Polyakov, Ulrich Drepper,
	Linux Kernel Mailing List, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Jens Axboe,
	Thomas Gleixner

On 2/24/07, Davide Libenzi <davidel@xmailserver.org> wrote:
> Ok, roger that. But why are you playing "Google & Preach" games with
> Ingo, who has eaten bread and CPUs for the last 15 years?

Sure I used Google -- for clickable references so that lurkers can
tell I'm not making these things up as I go along.  Ingo and Alan have
obviously forgotten more about x86en than I will ever know, and I'm
carrying coals to Newcastle when I comment on pros and cons of XMM
memcpy.

But although the latest edition of the threadlet patches actually has
quite good internal documentation and makes most of its intent clear
even to a reader (me) who is unfamiliar with the code being patched,
it lacks "theory of operations".  How is an arch maintainer supposed
to adapt this interface to a completely different CPU, with different
stuff in pt_regs and different cost profiles for blown pipelines and
reloaded coprocessor state?  What are the hidden costs of this
particular style of M:N microthreading, and will they explode when
this model escapes out of the microbenchmarks and people who don't
know CPUs inside and out start using it?  What standard
thread-pool-management use cases are being glossed over at kernel
level and left to Ulrich (or implementors of JVMs and other bytecode
machines) to sort out?

At some level, I'm just along for the ride; nobody with any sense is
going to pay me to design this sort of thing, and the level of effort
involved in coding an alternate AIO implementation is not something I
can afford to expend on non-revenue-producing activities even if I did
have the skill.  Maybe half of my quibbles are sheer stupidity and
four out of five of the rest are things that Ingo has already taken
into account in v4 of his patch set.  But that would leave one quibble in
ten that has some substance, which might save some nasty rework down
the line.  Even if everything I ask about has a simple explanation,
and Alan and Ingo wasting time to spell it out for me would result in
nothing but an accelerated "theory of operation" document, would that
be a bad thing?

Now I know very little about x86_64 other than that 64-bit code not
only has double-size integer registers to work with, it has twice as
many of them.  So for all I know the transition to pure-64-bit 2-4
core x 2-4 thread/core systems, which is going to be 90% or more of
the revenue-generating Linux market over the next few years, makes all
of my concerns moot for Ingo's purposes.  After all, as long as Linux
stays good enough to keep Oracle from losing confidence and switching
to Darwin or something, the 100 or so people who earn invites to the
kernel summit have cushy jobs for life.

The rest of us would perhaps like for major proposed kernel overhauls
to be accompanied by some kind of analysis of their impact on arches
that live elsewhere in CPU parameter space.  That analysis might
suggest small design refinements that make Linux AIO scale well on the
class of processors I'm interested in, too.  And I personally would like
to see Ingo get that Turing award for designing AIO semantics that are
as big an advance over the past as IEEE 754 was over its predecessors.
He'd have to earn it, though.

Cheers,
- Michael

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 04/13] syslets: core code
  2007-02-24 22:25           ` Davide Libenzi
@ 2007-02-25  7:59             ` Ingo Molnar
  0 siblings, 0 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-02-25  7:59 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Kyle Moffett, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner


* Davide Libenzi <davidel@xmailserver.org> wrote:

> > No, it absolutely is a matter of speed.  The reason to have those 
> > two implemented that way is so that they can be implemented as 
> > vsyscalls completely in userspace.  This means that on most modern 
> > platforms you can implement the "make a threadlet when I block" 
> > semantic without even touching kernel-mode.  The way it's set up all 
> > you'd have to do is save some parameters, set up a new callstack, 
> > and poke a "1" into a memory address in the TLS.  To stop, you 
> > effectively just poke a "0" into the spot in the TLS and either 
> > return or terminate the thread.
> 
> Right. I don't know why, but I got the impression Ingo's threadlet_exec 
> example was just sketch code to be moved into a syscall. That's why I 
> was talking about a sys_threadlet_exec. But yeah, it makes a lot of 
> sense to turn threadlet_exec into a glibc thing, and play everything in 
> userspace at that point.

yeah, not having to do any extra entry into the kernel at all (in the 
cached case), and to make them in essence equivalent to a function call 
is my plan/hope for threadlets :-)

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Threadlet/syslet v4 bug report.
  2007-02-21 21:13 [patch 00/13] Syslets, "Threadlets", generic AIO support, v3 Ingo Molnar
                   ` (17 preceding siblings ...)
  2007-02-24 18:34 ` [patch 00/13] Syslets, "Threadlets", generic AIO support, v3 Evgeniy Polyakov
@ 2007-02-25 14:33 ` Evgeniy Polyakov
  2007-02-25 15:24   ` Evgeniy Polyakov
  18 siblings, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-25 14:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

Hi Ingo.

I tried to create a web server application for threadlets to show (I
will not write what I wanted to show, but you might guess :)

I started threadlet-test from async-test-v4 and got the following bug.

To compile v4 and v3 I need to apply a patch (sent in a previous
e-mail) which removes the %xgs assignment, since it is never used and
does not actually even exist in pt_regs.

It is a 2.6.21-rc1 tree (the patch failed in the Makefile (no need to
export your kernel extraversion) and in arch/i386/kernel/process.c (i387
initialization)); both are trivially fixable by hand.

The machine is a via epia, fc5 distro.

I'm currently compiling the tree with debug support, to decode where
the bug lives if it is reproduced.

[  894.597934] general protection fault: 0000 [#1]
[  894.597959] PREEMPT SMP
[  894.597975] Modules linked in: dm_mod nvram ehci_hcd uhci_hcd
snd_via82xx snd_ac97_codec ac97_bus snd_seq_dummy snd_seq_oss
snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_pcm snd_timer
snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd generic
soundcore via_rhine i2c_viapro i2c_core mii ext3 jbd ide_disk
[  894.598131] CPU:    0
[  894.598135] EIP:    0060:[<c012ab99>]    Not tainted VLI
[  894.598141] EFLAGS: 00010282   (2.6.21-rc1 #2)
[  894.598179] EIP is at cachemiss_thread+0xa/0xc6
[  894.598200] eax: cdfa8ef4   ebx: cdfa8ef4   ecx: cdfa8ef4   edx: cdfa8ef4
[  894.598222] esi: 00000000   edi: 00000000   ebp: 00000000   esp: ca6aff94
[  894.598245] ds: 007b   es: 007b   fs: 0000  gs: 0033  ss: 0068
[  894.598268] Process threadlet-test (pid: 12030, ti=ca6ae000 task=cdf9a790 task.ti=ca6ae000)
[  894.598290] Stack: cd28a080 cdfa8db8 c0114536 cdfa8c10 c012ab8f 00000000 00000000 c0103af8
[  894.598339]        cdfa8ef4 c012ab8f 00000000 cdfa8ef4 00000000 00000000 00000000 00000000
[  894.598385]        0000007b 0000007b 00000000 ffffffff c0103af0 00000060 00000286 00000000
[  894.598437] Call Trace:
[  894.598463]  [<c0114536>] schedule_tail+0x2d/0x80
[  894.598497]  [<c012ab8f>] cachemiss_thread+0x0/0xc6
[  894.598526]  [<c0103af8>] async_thread_helper+0x8/0x1c
[  894.598561]  [<c012ab8f>] cachemiss_thread+0x0/0xc6
[  894.598598]  [<c0103af0>] async_thread_helper+0x0/0x1c
[  894.598631]  =======================
[  894.598648] Code: b8 1f 00 00 89 42 38 8b 53 04 8b 46 0c 81 c2 b8 1f
00 00 89 42 2c 83 c8 ff 83 c4 10 5b 5e 5f 5d c3 57 89 c1 56 53 89 c3 83
ec 10 <64> 8b 35 08 00 00 00 8d be c4 02 00 00 89 f0 89 fa e8 ab f8 ff
[  894.598875] EIP: [<c012ab99>] cachemiss_thread+0xa/0xc6 SS:ESP 0068:ca6aff94


-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: Threadlet/syslet v4 bug report.
  2007-02-25 14:33 ` Threadlet/syslet v4 bug report Evgeniy Polyakov
@ 2007-02-25 15:24   ` Evgeniy Polyakov
  0 siblings, 0 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-25 15:24 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

On Sun, Feb 25, 2007 at 05:33:10PM +0300, Evgeniy Polyakov (johnpol@2ka.mipt.ru) wrote:
> Hi Ingo.
> 
> I tried to create a web server application for threadlets to show (I
> will not write what I wanted to show, but you might guess :)
> 
> I started threadlet-test from async-test-v4 and got the following bug.
> 
> To compile v4 and v3 I need to apply a patch (sent in a previous
> e-mail) which removes the %xgs assignment, since it is never used and
> does not actually even exist in pt_regs.

If I uncomment xgs and recompile, the kernel crashes early during init...

Btw, could you create a git tree to pull from?
That would make it much easier for others to track problems and to test.

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-24 18:34 ` [patch 00/13] Syslets, "Threadlets", generic AIO support, v3 Evgeniy Polyakov
@ 2007-02-25 17:23   ` Ingo Molnar
  2007-02-25 17:44     ` Evgeniy Polyakov
  0 siblings, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-25 17:23 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner


* Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> On Wed, Feb 21, 2007 at 10:13:55PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> > this is the v3 release of the syslet/threadlet subsystem:
> > 
> >    http://redhat.com/~mingo/syslet-patches/
> 
> There is no %xgs.
> 
> --- ./arch/i386/kernel/process.c~	2007-02-24 22:56:14.000000000 +0300
> +++ ./arch/i386/kernel/process.c	2007-02-24 22:53:19.000000000 +0300
> @@ -426,7 +426,6 @@
>  
>  	regs.xds = __USER_DS;
>  	regs.xes = __USER_DS;
> -	regs.xgs = __KERNEL_PDA;

hm, what tree are you using as a base? The syslet patches are against 
v2.6.20 at the moment. (the x86 PDA changes will probably interfere with 
it on v2.6.21-rc1-ish kernels) Note that otherwise the syslet/threadlet 
patches are for x86 only at the moment (as i mentioned in the 
announcement), and the generic code itself contains some occasional 
x86-ishms as well. (None of the concepts are x86-specific though - 
multi-stack architectures should work just as well as RISC-ish CPUs.)

if you create a threadlet based test-webserver, could you please do a 
comparable kevents implementation as well? I.e. the same HTTP parser (or 
non-parser, as is usually the case with prototypes ;). Best would be 
something that one could toggle between threadlet and kevent mode, 
using the same binary :-)

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-25 17:23   ` Ingo Molnar
@ 2007-02-25 17:44     ` Evgeniy Polyakov
  2007-02-25 17:54       ` Ingo Molnar
  0 siblings, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-25 17:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

On Sun, Feb 25, 2007 at 06:23:38PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> 
> > On Wed, Feb 21, 2007 at 10:13:55PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> > > this is the v3 release of the syslet/threadlet subsystem:
> > > 
> > >    http://redhat.com/~mingo/syslet-patches/
> > 
> > There is no %xgs.
> > 
> > --- ./arch/i386/kernel/process.c~	2007-02-24 22:56:14.000000000 +0300
> > +++ ./arch/i386/kernel/process.c	2007-02-24 22:53:19.000000000 +0300
> > @@ -426,7 +426,6 @@
> >  
> >  	regs.xds = __USER_DS;
> >  	regs.xes = __USER_DS;
> > -	regs.xgs = __KERNEL_PDA;
> 
> hm, what tree are you using as a base? The syslet patches are against 
> v2.6.20 at the moment. (the x86 PDA changes will probably interfere with 
> it on v2.6.21-rc1-ish kernels) Note that otherwise the syslet/threadlet 
> patches are for x86 only at the moment (as i mentioned in the 
> announcement), and the generic code itself contains some occasional 
> x86-ishms as well. (None of the concepts are x86-specific though - 
> multi-stack architectures should work just as well as RISC-ish CPUs.)

It is rc1 - and crashes.
I test on an i386 via epia (the only machine which runs x86 right now).

If there are no new patches, I will create a 2.6.20 test tree
tomorrow.

> if you create a threadlet based test-webserver, could you please do 
> a comparable kevents implementation as well? I.e. the same HTTP parser 
> (or non-parser, as is usually the case with prototypes ;). Best 
> would be something that one could toggle between threadlet and 
> kevent mode, using the same binary :-)

Ok, I will create such a monster tomorrow :)

I will use the same base for threadlet as for kevent/epoll - there is no
parser, just sendfile() of a static file which contains the http header
and the actual page.

threadlet1 {
	accept() 
	create threadlet2 {
		send data
	}
}

Is the above scheme correct for the threadlet scenario?
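
(In terms of the threadlet API it could look roughly like below - the
stack handling, the exact threadlet_exec() signature and the
listen_fd/file_fd/user_head globals are placeholders here, so this is
only a sketch:)

#include <unistd.h>
#include <sys/socket.h>
#include <sys/sendfile.h>

long send_threadlet(void *arg)
{
	int fd = (long)arg;

	/* "send data": a static file with the http header prepended */
	sendfile(fd, file_fd, NULL, file_size);
	close(fd);
	return threadlet_complete();
}

long accept_threadlet(void *arg)
{
	int fd;

	for (;;) {
		fd = accept(listen_fd, NULL, NULL);
		if (fd < 0)
			break;
		/* threadlet2: goes async only if the send path blocks */
		threadlet_exec(send_threadlet, (void *)(long)fd,
			       alloc_stack(), &user_head);
	}
	return threadlet_complete();
}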

But note that on my athlon64 3500 test machine kevent does about 7900
requests per second compared to 4000+ for epoll, so expect a challenge.
lighttpd is about the same 4000 requests per second though, since it
cannot be easily optimized for kevents.

> 	Ingo

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-23 12:22                 ` Evgeniy Polyakov
  2007-02-23 12:41                   ` Evgeniy Polyakov
@ 2007-02-25 17:45                   ` Ingo Molnar
  2007-02-25 18:09                     ` Evgeniy Polyakov
  1 sibling, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-25 17:45 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ulrich Drepper, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner


* Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> My main concern was only about the situation when we end up with a 
> truly blocking context (like network), and this results in having 
> thousands of threads doing the work - even having most of them 
> sleeping, there is a problem with memory overhead and context 
> switching. Although it is a usable situation, when all of them are 
> ready immediately context switching will kill a machine, even with the 
> O(1) scheduler, which made the situation damn better than before but is 
> not a cure for the problem.

yes. This is why in the original fibril discussion i concentrated so 
much on scheduling performance.

to me the picture is this: conceptually the scheduler runqueue is a 
queue of work. You get items queued upon certain events, and they can 
unqueue themselves. (there is also register context but that is already 
optimized to death by hardware) So whatever scheduling overhead we have, 
it's a pure software thing. It's because we have priorities attached. 
It's because we have some legacies. Etc., etc. - it's all stuff /we/ 
wanted to add, but nothing truly fundamental on top of the basic 'work 
queueing' model.

now look at kevents as the queueing model. It does not queue 'tasks', it 
lets user-space queue requests in essence, in various states. But it's 
still the same conceptual thing: a memory buffer with some state 
associated to it. Yes, it has no legacies, it has no priorities and 
other queueing concepts attached to it ... yet. If kevents got 
mainstream, it would get the same kind of pressure to grow 'more 
advanced' event queueing and event scheduling capabilities. 
Prioritization would be needed, etc.

So my fundamental claim is: a kernel thread /is/ our main request 
structure. We've got tons of really good system calls that queue these 
'requests' around the place and offer functionality around this concept. 
Plus there are 1.2+ billion lines of Linux userspace code that work 
well with this abstraction - while there are barely a few thousand lines 
of event-based user-space code.

I also say that you'll likely see kevents outperform threadlets. Maybe 
even significantly so under the right conditions. But i very much 
believe we want to get a similar kind of performance out of thread/task 
scheduling, and not introduce a parallel framework to do request 
scheduling the hard way ... just because our task concept and scheduling 
implementation got too fat. For the same reason i didn't really like 
fibrils: they are nice, and Zach's core idea i think nicely survived in 
the syslet/threadlet model too, but they are more limited than true 
threads. So doing that parallel infrastructure, which really just 
implements the same thing, and is only faster because it skips features, 
would just be hiding the problem with our primary abstraction. Ok?

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-25 17:44     ` Evgeniy Polyakov
@ 2007-02-25 17:54       ` Ingo Molnar
  2007-02-25 18:21         ` Evgeniy Polyakov
  0 siblings, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-25 17:54 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner


* Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> > hm, what tree are you using as a base? The syslet patches are 
> > against v2.6.20 at the moment. (the x86 PDA changes will probably 
> > interfere with it on v2.6.21-rc1-ish kernels) Note that otherwise 
> > the syslet/threadlet patches are for x86 only at the moment (as i 
> > mentioned in the announcement), and the generic code itself contains 
> > some occasional x86-ishms as well. (None of the concepts are 
> > x86-specific though - multi-stack architectures should work just as 
> > well as RISC-ish CPUs.)
> 
> It is rc1 - and crashes.

yeah. I'm not surprised. The PDA is not set up in create_async_thread() 
for example.

> > if you create a threadlet based test-webserver, could you please do 
> > a comparable kevents implementation as well? I.e. the same HTTP parser 
> > (or non-parser, as is usually the case with prototypes ;). Best 
> > would be something that one could toggle between threadlet and 
> > kevent mode, using the same binary :-)
> 
> Ok, I will create such a monster tomorrow :)
> 
> I will use the same base for threadlet as for kevent/epoll - there is 
> no parser, just sendfile() of a static file which contains the http 
> header and the actual page.
> 
> threadlet1 {
> 	accept() 
> 	create threadlet2 {
> 		send data
> 	}
> }
> 
> Is the above scheme correct for the threadlet scenario?

yep, this is a good first cut. Doing this after the listen() is useful:

        int one = 1;

        ret = setsockopt(listen_sock_fd, SOL_SOCKET, SO_REUSEADDR,
			 (char *)&one, sizeof(int));

and i'd suggest to do this after every accept()-ed socket:

        int flag = 1;

        setsockopt(sock_fd, IPPROTO_TCP, TCP_NODELAY,
                            (char *) &flag, sizeof(int));

Do you have any link where i could check the type of HTTP parsing and 
send transport you are (or will be) using? What type of http client are 
you using to measure, with precisely what options?

> But note that on my athlon64 3500 test machine kevent does about 7900 
> requests per second compared to 4000+ for epoll, so expect a challenge.

single-core CPU i suspect?

> lighttpd is about the same 4000 requests per second though, since it 
> cannot be easily optimized for kevents.

mean question: do you promise to post the results even if they are not 
unfavorable to threadlets? ;-)

if i want to test kevents on a v2.6.20 kernel base, do you have a URL 
for me to try?

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-25 17:45                   ` Ingo Molnar
@ 2007-02-25 18:09                     ` Evgeniy Polyakov
  2007-02-25 19:04                       ` Ingo Molnar
  0 siblings, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-25 18:09 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ulrich Drepper, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

On Sun, Feb 25, 2007 at 06:45:05PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> 
> > My main concern was only about the situation when we end up with a 
> > truly blocking context (like network), and this results in having 
> > thousands of threads doing the work - even having most of them 
> > sleeping, there is a problem with memory overhead and context 
> > switching. Although it is a usable situation, when all of them are 
> > ready immediately context switching will kill a machine, even with the 
> > O(1) scheduler, which made the situation damn better than before but is 
> > not a cure for the problem.
> 
> yes. This is why in the original fibril discussion i concentrated so 
> much on scheduling performance.
> 
> to me the picture is this: conceptually the scheduler runqueue is a 
> queue of work. You get items queued upon certain events, and they can 
> unqueue themselves. (there is also register context but that is already 
> optimized to death by hardware) So whatever scheduling overhead we have, 
> it's a pure software thing. It's because we have priorities attached. 
> It's because we have some legacies. Etc., etc. - it's all stuff /we/ 
> wanted to add, but nothing truly fundamental on top of the basic 'work 
> queueing' model.
> 
> now look at kevents as the queueing model. It does not queue 'tasks', it 
> lets user-space queue requests in essence, in various states. But it's 
> still the same conceptual thing: a memory buffer with some state 
> associated to it. Yes, it has no legacies, it has no priorities and 
> other queueing concepts attached to it ... yet. If kevents got 
> mainstream, it would get the same kind of pressure to grow 'more 
> advanced' event queueing and event scheduling capabilities. 
> Prioritization would be needed, etc.
> 
> So my fundamental claim is: a kernel thread /is/ our main request 
> structure. We've got tons of really good system calls that queue these 
> 'requests' around the place and offer functionality around this concept. 
> Plus there are 1.2+ billion lines of Linux userspace code that work 
> well with this abstraction - while there are barely a few thousand 
> lines of event-based user-space code.
> 
> I also say that you'll likely see kevents outperform threadlets. Maybe 
> even significantly so under the right conditions. But i very much 
> believe we want to get a similar kind of performance out of thread/task 
> scheduling, and not introduce a parallel framework to do request 
> scheduling the hard way ... just because our task concept and scheduling 
> implementation got too fat. For the same reason i didn't really like 
> fibrils: they are nice, and Zach's core idea i think nicely survived in 
> the syslet/threadlet model too, but they are more limited than true 
> threads. So doing that parallel infrastructure, which really just 
> implements the same thing, and is only faster because it skips features, 
> would just be hiding the problem with our primary abstraction. Ok?

Kevent is a _very_ small entity and there is _no_ cost of requeueing
(well, there is a list_add guarded by a lock) - after it is done, the
process can start real work. With rescheduling there are _too_ many
things to be done before we can start new work. We have to change
registers, change the address space, various tlb bits and so on - we
have to do it, since a task describes a very heavy entity - the whole
process. IO in turn is a very small subset of what a process is (and
can do), so there is no need to change the whole picture - it is enough
to have one process which does the work.

Threads are a bit smaller than processes, but still too heavy to have
one per IO - so we have pools - which decreases rescheduling overhead,
but limits parallelism.

I think it is _too_ heavy to have such a monster structure as a
task (thread/process), and the related overhead, just to do IO.

> 	Ingo

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-25 17:54       ` Ingo Molnar
@ 2007-02-25 18:21         ` Evgeniy Polyakov
  2007-02-25 18:22           ` Ingo Molnar
  2007-02-25 18:25           ` Evgeniy Polyakov
  0 siblings, 2 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-25 18:21 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

On Sun, Feb 25, 2007 at 06:54:37PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> 
> > > hm, what tree are you using as a base? The syslet patches are 
> > > against v2.6.20 at the moment. (the x86 PDA changes will probably 
> > > interfere with it on v2.6.21-rc1-ish kernels) Note that otherwise 
> > > the syslet/threadlet patches are for x86 only at the moment (as i 
> > > mentioned in the announcement), and the generic code itself contains 
> > > some occasional x86-ishms as well. (None of the concepts are 
> > > x86-specific though - multi-stack architectures should work just as 
> > > well as RISC-ish CPUs.)
> > 
> > It is rc1 - and crashes.
> 
> yeah. I'm not surprised. The PDA is not set up in create_async_thread() 
> for example.

Ok, I will roll back to vanilla 2.6.20 tomorrow.

> > > if you create a threadlet based test-webserver, could you please do 
> > > a comparable kevents implementation as well? I.e. the same HTTP parser 
> > > (or non-parser, as is usually the case with prototypes ;). Best 
> > > would be something that one could toggle between threadlet and 
> > > kevent mode, using the same binary :-)
> > 
> > Ok, I will create such a monster tomorrow :)
> > 
> > I will use the same base for threadlet as for kevent/epoll - there is 
> > no parser, just sendfile() of a static file which contains the http 
> > header and the actual page.
> > 
> > threadlet1 {
> > 	accept() 
> > 	create threadlet2 {
> > 		send data
> > 	}
> > }
> > 
> > Is the above scheme correct for the threadlet scenario?
> 
> yep, this is a good first cut. Doing this after the listen() is useful:
> 
>         int one = 1;
> 
>         ret = setsockopt(listen_sock_fd, SOL_SOCKET, SO_REUSEADDR,
> 			 (char *)&one, sizeof(int));
> 
> and i'd suggest to do this after every accept()-ed socket:
> 
>         int flag = 1;
> 
>         setsockopt(sock_fd, IPPROTO_TCP, TCP_NODELAY,
>                             (char *) &flag, sizeof(int));
> 
> Do you have any link where i could check the type of HTTP parsing and 
> send transport you are (or will be) using? What type of http client are 
> you using to measure, with precisely what options?

For example these ones (essentially the same, except that epoll and
kevent are used):
http://tservice.net.ru/~s0mbre/archive/kevent/evserver_kevent.c
http://tservice.net.ru/~s0mbre/archive/kevent/evserver_epoll.c

> > But note that on my athlon64 3500 test machine kevent does about 7900 
> > requests per second compared to 4000+ for epoll, so expect a challenge.
> 
> single-core CPU i suspect?

Yep.

> > lighttpd is about the same 4000 requests per second though, since it 
> > cannot be easily optimized for kevents.
> 
> mean question: do you promise to post the results even if they are not 
> unfavorable to threadlets? ;-)

If they are too good, I will start searching for bugs and tune my code
first, but eventually, of course, yes.
In my blog I will post them in 'real-time' even if kevent unbelievably
sucks.

> if i want to test kevents on a v2.6.20 kernel base, do you have an URL 
> for me to try?

I have a git tree (based on rc1, as requested by Andrew Morton) at:
http://tservice.net.ru/~s0mbre/archive/kevent/kevent.git/

Or patches at kevent homepage:
http://tservice.net.ru/~s0mbre/old/?section=projects&item=kevent

Direct link to the latest patchset:
http://tservice.net.ru/~s0mbre/archive/kevent/kevent-37/

(the order is insignificant as far as I recall, except 'compile-fix',
which must go last).

> 	Ingo

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-25 18:21         ` Evgeniy Polyakov
@ 2007-02-25 18:22           ` Ingo Molnar
  2007-02-25 18:37             ` Evgeniy Polyakov
  2007-02-25 18:25           ` Evgeniy Polyakov
  1 sibling, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-25 18:22 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner


* Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> > Do you have any link where i could check the type of HTTP parsing 
> > and send transport you are (or will be) using? What type of http 
> > client are you using to measure, with precisely what options?
> 
> For example these ones (essentially the same, except that epoll and
> kevent are used):
> http://tservice.net.ru/~s0mbre/archive/kevent/evserver_kevent.c
> http://tservice.net.ru/~s0mbre/archive/kevent/evserver_epoll.c

thx - i guess i should just run them without any options and they bind 
themselves to port 80? What 'ab' options do you typically use to 
measure them?

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-25 18:25           ` Evgeniy Polyakov
@ 2007-02-25 18:24             ` Ingo Molnar
  0 siblings, 0 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-02-25 18:24 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner


* Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> > > Do you have any link where i could check the type of HTTP parsing 
> > > and send transport you are (or will be) using? What type of http 
> > > client are you using to measure, with precisely what options?
> > 
> > For example these ones (essentially the same, except that epoll and
> > kevent are used):
> > http://tservice.net.ru/~s0mbre/archive/kevent/evserver_kevent.c
> > http://tservice.net.ru/~s0mbre/archive/kevent/evserver_epoll.c
> 
> The client is 'ab' with a high number of connections and concurrency. For 
> example, for the athlon64 3500 I used a concurrency of 8000 connections 
> and 80k total.

ok. So it's:

   ab -c 8000 -n 80000 http://yourserver/tmp/index.html

right? How large is index.html typically - the default 40960 bytes, and 
with a constructed HTTP reply header already included in the file?

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-25 18:21         ` Evgeniy Polyakov
  2007-02-25 18:22           ` Ingo Molnar
@ 2007-02-25 18:25           ` Evgeniy Polyakov
  2007-02-25 18:24             ` Ingo Molnar
  1 sibling, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-25 18:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

On Sun, Feb 25, 2007 at 09:21:35PM +0300, Evgeniy Polyakov (johnpol@2ka.mipt.ru) wrote:
> > Do you have any link where i could check the type of HTTP parsing and 
> > send transport you are (or will be) using? What type of http client are 
> > you using to measure, with precisely what options?
> 
> For example these ones (essentially the same, except that epoll and
> kevent are used):
> http://tservice.net.ru/~s0mbre/archive/kevent/evserver_kevent.c
> http://tservice.net.ru/~s0mbre/archive/kevent/evserver_epoll.c

The client is 'ab' with a high number of connections and concurrency.
For example, for the athlon64 3500 I used a concurrency of 8000 connections 
and 80k total.

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-25 18:37             ` Evgeniy Polyakov
@ 2007-02-25 18:34               ` Ingo Molnar
  2007-02-25 20:01                 ` Frederik Deweerdt
  2007-02-25 19:21               ` Ingo Molnar
  1 sibling, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-25 18:34 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner


* Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> > thx - i guess i should just run them without any options and they 
> > bind themselves to port 80? What 'ab' options do you typically use 
> > to measure them?
> 
> Yes, but they require /tmp/index.html to have an http header and the 
> actual data page. They do not parse the http request :)

ok. When i connect to the epoll server via "telnet myserver 80", and 
enter a 'request', i get back the content - but the socket connection is 
not closed. Every time i type enter i get new content back. Why is 
that so - the code seems to contain a close(fd).

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-25 18:22           ` Ingo Molnar
@ 2007-02-25 18:37             ` Evgeniy Polyakov
  2007-02-25 18:34               ` Ingo Molnar
  2007-02-25 19:21               ` Ingo Molnar
  0 siblings, 2 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-25 18:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

On Sun, Feb 25, 2007 at 07:22:30PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> > > Do you have any link where i could check the type of HTTP parsing 
> > > and send transport you are (or will be) using? What type of http 
> > > client are you using to measure, with precisely what options?
> > 
> > For example these ones (essentially the same, except that epoll and
> > kevent are used):
> > http://tservice.net.ru/~s0mbre/archive/kevent/evserver_kevent.c
> > http://tservice.net.ru/~s0mbre/archive/kevent/evserver_epoll.c
> 
> thx - i guess i should just run them without any options and they bind 
> themselves to port 80? What 'ab' options are you using typically to 
> measure them?

Yes, but they require /tmp/index.html to have an http header and the
actual data page. They do not parse the http request :)
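
(i.e. the file starts with a canned reply along these lines - all
values here are illustrative only:)

HTTP/1.0 200 OK
Content-Type: text/html
Content-Length: 4096
Connection: close

<html>... the actual page ...</html>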

For the athlon 3500 I used
ab -c8000 -n80000 $url

for the via epia, likely two/three times less.

> 	Ingo

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-25 18:09                     ` Evgeniy Polyakov
@ 2007-02-25 19:04                       ` Ingo Molnar
  2007-02-25 19:42                         ` Evgeniy Polyakov
  2007-02-25 23:14                         ` Michael K. Edwards
  0 siblings, 2 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-02-25 19:04 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ulrich Drepper, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner


* Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> Kevent is a _very_ small entity and there is _no_ cost of requeueing 
> (well, there is a list_add guarded by a lock) - after it is done, the 
> process can start real work. With rescheduling there are _too_ many 
> things to be done before we can start new work. [...]

actually, no. For example a wakeup too is fundamentally a list_add 
guarded by a lock. Take a look at try_to_wake_up(). The rest you see 
there is just extra frills that relate to things like 'load-balancing 
the requests over multiple CPUs [which i'm sure kevent users would 
request in the future too]'.

> [...] We have to change registers, change the address space, various 
> tlb bits and so on - we have to do it, since a task describes a very 
> heavy entity - the whole process. [...]

but ... 'threadlets' are called thread-lets because they are not full 
processes, they are threads. There's no TLB state in that case. There's 
indeed register state associated with them, and currently there can 
certainly be quite a bit of overhead in a context switch - but not in 
register saving. We do user-space register saving not in the scheduler 
but upon /every system call/. Fundamentally a kernel thread is just its 
EIP/ESP [on x86, similar on other architectures] - which can be 
saved/restored in near zero time. All the rest is something we added for 
good /work queueing/ reasons - and those same extras should either be 
eliminated if they turn out to be not so good reasons after all, or they 
will be wanted for kevents too eventually, once it matures as a work 
queueing solution.

> I think it is _too_ heavy to have such a monster structure as a 
> task (thread/process), and the related overhead, just to do IO.

i think you are really, really mistaken if you believe that the fact 
that whole tasks/threads or processes can be 'monster structures' 
somehow has any relevance to scheduling/task-queueing performance and 
scalability. It does not matter how large a task's address space is - 
scheduling only relates to the minimal context that is in the CPU. And 
most of that context we save upon /every system call entry/, and restore 
it upon every system call return. If it's so expensive to manipulate, 
why can the Linux kernel do a full system call in ~150 cycles? That's 
cheaper than the access latency to a single DRAM page.

for the same reason it has no relevance that the full kevent-based 
webserver is a 'monster structure' - still a single request's basic 
queueing operation is cheap. The same is true of tasks/threads.

Really, you don't even have to know or assume anything about the 
scheduler - let's just do some elementary math here:

the reqs/sec your sendfile+kevent based webserver can do is 7900 per 
sec. Let's assume you will write more great kevent code which optimizes 
it further and it goes up to 10,100 reqs per sec (~99 usecs per 
request), ok? Then also try how many reschedules/sec your Athlon64 
3500 box can do. My guess is: about a million per second (1 usec per 
reschedule), perhaps a bit more.

Now let's assume that a threadlet based server would have to 
context-switch for /every single/ request served. That's totally 
over-estimating it, even with lots of slow clients, but let's assume it, 
to judge the worst-case impact.

So if you had to schedule once per every request served, you'd have to 
add 1 usec to your ~99 usecs cost, making it 100 usecs. That would bring 
your 10,100 requests per sec down to 10,000 requests/sec, under a 
threadlet model of operation. Put differently: it will cost you only 1% 
in performance to schedule once for every request. Or let's assume the 
task is totally cache-cold and you'd have to add 4 usecs for its 
scheduling - that'd still only be 4%. So where is the fat?
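
(spelling the arithmetic out:)

    ~99 usecs/req:  1,000,000 /  99 ~ 10,100 reqs/sec
    100 usecs/req:  1,000,000 / 100 = 10,000 reqs/sec  (~1% slower)
    103 usecs/req:  1,000,000 / 103 ~  9,700 reqs/sec  (~4% slower)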

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-25 18:37             ` Evgeniy Polyakov
  2007-02-25 18:34               ` Ingo Molnar
@ 2007-02-25 19:21               ` Ingo Molnar
       [not found]                 ` <20070225194645.GB1353@2ka.mipt.ru>
  1 sibling, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-25 19:21 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner


* Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> > thx - i guess i should just run them without any options and they 
> > bind themselves to port 80? What 'ab' options are you using 
> > typically to measure them?
> 
> Yes, but they require /tmp/index.html to have http header and actual 
> data page. They do not parse http request :)
> 
> For athlon 3500 I used
> ab -c8000 -n80000 $url

how do the header portions of your /tmp/index.html data page look like?

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-25 19:04                       ` Ingo Molnar
@ 2007-02-25 19:42                         ` Evgeniy Polyakov
  2007-02-25 20:38                           ` Ingo Molnar
                                             ` (2 more replies)
  2007-02-25 23:14                         ` Michael K. Edwards
  1 sibling, 3 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-25 19:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ulrich Drepper, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

On Sun, Feb 25, 2007 at 08:04:15PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> 
> > Kevent is a _very_ small entity and there is _no_ cost of requeueing 
> > (well, there is list_add guarded by lock) - after it is done, process 
> > can start real work. With rescheduling there are _too_ many things to 
> > be done before we can start new work. [...]
> 
> actually, no. For example a wakeup too is fundamentally a list_add 
> guarded by a lock. Take a look at try_to_wake_up(). The rest you see 
> there is just extra frills that relate to things like 'load-balancing 
> the requests over multiple CPUs [which i'm sure kevent users would 
> request in the future too]'.

wake_up() as a call is pretty simple and fast, but its result is slow.
I did not run rescheduling tests with kernel threads, but POSIX threads
(which do look like kernel threads) have significant overhead there. In
the early development days of an M:N threading library I tested the
rescheduling performance of POSIX threads - I created a pool of threads
and 'sent' a message using futex wait/wake - and it was about 10 times
slower than a userspace threading library (I tested Erlang).
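
( a minimal sketch of that kind of futex ping-pong test - simplified,
  with no memory barriers, so take it as illustration only; each round
  trip below is two wakeups plus two reschedules: )

/* futex-pingpong.c - illustrative sketch */
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <linux/futex.h>

#define LOOPS 100000

static volatile int token;	/* 0: main's turn, 1: peer's turn */

static void fwait(int val)
{
	/* sleeps only while token == val - the kernel rechecks atomically */
	syscall(SYS_futex, &token, FUTEX_WAIT, val, NULL, NULL, 0);
}

static void fwake(void)
{
	syscall(SYS_futex, &token, FUTEX_WAKE, 1, NULL, NULL, 0);
}

static void *peer(void *arg)
{
	int i;

	for (i = 0; i < LOOPS; i++) {
		while (token != 1)
			fwait(0);	/* wait for the token */
		token = 0;		/* pass it back */
		fwake();
	}
	return NULL;
}

int main(void)
{
	struct timeval tv0, tv1;
	pthread_t t;
	long usecs;
	int i;

	pthread_create(&t, NULL, peer, NULL);
	gettimeofday(&tv0, NULL);
	for (i = 0; i < LOOPS; i++) {
		token = 1;		/* pass the token */
		fwake();
		while (token != 0)
			fwait(1);	/* wait until it comes back */
	}
	gettimeofday(&tv1, NULL);
	pthread_join(t, NULL);

	usecs = (tv1.tv_sec - tv0.tv_sec) * 1000000 +
		tv1.tv_usec - tv0.tv_usec;
	printf("%.2f usec per round trip\n", (double)usecs / LOOPS);
	return 0;
}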

> > [...] We have to change registers, change address space, various tlb 
> > bits and so on - we have to do it, since task describes very heavy 
> > entity - the whole process. [...]
> 
> but ... 'threadlets' are called thread-lets because they are not full 
> processes, they are threads. There's no TLB state in that case. There's 
> indeed register state associated with them, and currently there can 
> certainly be quite a bit of overhead in a context switch - but not in 
> register saving. We do user-space register saving not in the scheduler 
> but upon /every system call/. Fundamentally a kernel thread is just its 
> EIP/ESP [on x86, similar on other architectures] - which can be 
> saved/restored in near zero time. All the rest is something we added for 
> good /work queueing/ reasons - and those same extras should either be 
> eliminated if they turn out to be not so good reasons after all, or they 
> will be wanted for kevents too eventually, once it matures as a work 
> queueing solution.

If something decreases performance noticeably, that is a bad thing, but
it is a matter of taste. Anyway, kevents are very small and threads are
very big, and both are the way they are exactly on purpose - threads
serve for processing any generic code, kevents are used for event
waiting. IO is such an event: it does not require a lot of
infrastructure to handle, it only needs some simple bits, so it can be
optimized to be extremely fast; with a huge infrastructure behind each
IO (as in the case where each IO is a separate thread) that can not be
done effectively.

> > I think it is _too_ heavy to have such a monster structure like 
> > task(thread/process) and related overhead just to do an IO.
> 
> i think you are really, really mistaken if you believe that the fact 
> that whole tasks/threads or processes can be 'monster structures' 
> somehow has any relevance to scheduling/task-queueing performance and 
> scalability. It does not matter how large a task's address space is - 
> scheduling only relates to the minimal context that is in the CPU. And 
> most of that context we save upon /every system call entry/, and restore 
> it upon every system call return. If it's so expensive to manipulate, 
> why can the Linux kernel do a full system call in ~150 cycles? That's 
> cheaper than the access latency to a single DRAM page.

I meant not its size, but the whole infrastructure which surrounds a
task. If it is that lightweight, why don't we have a POSIX thread per
IO? One issue is that mmap/allocation of the stack is too slow (and it
is very slow indeed, which is why glibc and M:N threading libs cache
allocated stacks), another one is the kernel/userspace boundary
crossing, the next one is TLB flushes, then copies.

Why is userspace rescheduling on the order of tens of times faster than
kernel/user rescheduling?

> for the same reason it has no relevance that the full kevent-based 
> webserver is a 'monster structure' - a single request's basic queueing 
> operation is still cheap. The same is true of tasks/threads.

To move those tasks too many steps must be done, and although each one
can be quite fast, the whole process of rescheduling, in the case of
thousands of running threads, creates a per-task overhead big enough to
drop performance.

> Really, you don't even have to know or assume anything about the 
> scheduler - let's just do some elementary math here:
> 
> the reqs/sec your sendfile+kevent based webserver can do is 7900 per 
> sec. Let's assume you write further great kevent code which optimizes 
> it further and it goes up to 10,100 reqs per sec (100 usecs per 
> request), ok? Then also check how many reschedules/sec your Athlon64 
> 3500 box can do. My guess is: about a million per second (1 usec per 
> reschedule), perhaps a bit more.

Let's calculate: disk bandwidth is about a gigabyte per second (cached
case), so to transfer a 10k file we need about 10 usec - a single 1-usec
reschedule (if there is only one, if any) is already 10% of that.
The network is an order of magnitude slower (1gbit for example), but
there are many more blockings, so to transfer 10k we will have, let's
say, 5 blocks, i.e. 5 reschedulings - another 5% - so we have wasted 15%
of our time in rescheduling.

An event, in turn, is a 30-byte copy (plus of course its own overhead,
but it is still faster - faster simply because the ukevent size is
smaller than pt_regs :).
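
( for what it's worth, the register-frame side of that comparison is
  easy to eyeball from user-space - struct user_regs_struct mirrors the
  x86 pt_regs layout; the ukevent size is from the out-of-tree kevent
  patches, so it is only quoted here: )

/* size-check.c - illustrative only */
#include <stdio.h>
#include <sys/user.h>

int main(void)
{
	/* ~68 bytes on i386 - vs. the ~30-byte ukevent quoted above */
	printf("user_regs_struct (~pt_regs): %u bytes\n",
		(unsigned int)sizeof(struct user_regs_struct));
	return 0;
}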

Interesting discussion - it will be very fun if kevent loses badly :)

> Now let's assume that a threadlet based server would have to 
> context-switch for /every single/ request served. That's totally 
> over-estimating it, even with lots of slow clients, but let's assume 
> it anyway, to judge the worst-case impact.
> 
> So if you had to schedule once for every request served, you'd have to 
> add 1 usec to your 100 usecs cost, making it 101 usecs. That would 
> bring your 10,100 requests per sec down to 10,000 requests/sec under a 
> threadlet model of operation. Put differently: it will cost you only 
> 1% in performance to schedule once for every request. Or let's assume 
> the task is totally cache-cold and you'd have to add 4 usecs for its 
> scheduling - that'd still only be 4%. So where is the fat?

I need to move house or I will sleep on the street; otherwise I would
already have run a test and started eating a hat (present me a red one
like the one Alan Cox had), or watched you do it :)

Give me several hours.

> 	Ingo

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-25 18:34               ` Ingo Molnar
@ 2007-02-25 20:01                 ` Frederik Deweerdt
  0 siblings, 0 replies; 337+ messages in thread
From: Frederik Deweerdt @ 2007-02-25 20:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Evgeniy Polyakov, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

On Sun, Feb 25, 2007 at 07:34:38PM +0100, Ingo Molnar wrote:
> 
> * Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> 
> > > thx - i guess i should just run them without any options and they 
> > > bind themselves to port 80? What 'ab' options are you using 
> > > typically to measure them?
> > 
> > Yes, but they require /tmp/index.html to have http header and actual 
> > data page. They do not parse http request :)
> 
ok. When i connect to the epoll server via "telnet myserver 80", and 
enter a 'request', i get back the content - but the socket connection is 
not closed. Every time i hit enter i get the content back again. Why is 
that so - the code does seem to contain a close(fd).
> 
I'd say a close(s); is missing just before the return 0; in
evtest_callback_client()?

Regards,
Frederik

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-25 19:42                         ` Evgeniy Polyakov
@ 2007-02-25 20:38                           ` Ingo Molnar
  2007-02-26 12:39                           ` Ingo Molnar
  2007-02-26 19:47                           ` Davide Libenzi
  2 siblings, 0 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-02-25 20:38 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ulrich Drepper, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner


* Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> Interesting discussion, that will be very fun if kevent will lose 
> badly :)

with your keepalive test no way can it lose against 80,000 sync 
threadlets - it's pretty much the worst-case thing for threadlets while 
it's the best-case for kevents. Try a non-keepalive test perhaps?

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* threadlets as 'naive pool of threads', epoll, some measurements
       [not found]                     ` <Pine.LNX.4.64.0702251232350.6011@alien.or.mcafeemobile.com>
@ 2007-02-25 21:34                       ` Ingo Molnar
  2007-02-26 10:45                         ` Ingo Molnar
  2007-02-26  8:16                       ` [patch 00/13] Syslets, "Threadlets", generic AIO support, v3 Ingo Molnar
  1 sibling, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-25 21:34 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Evgeniy Polyakov, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner


* Davide Libenzi <davidel@xmailserver.org> wrote:

> > i don't understand - this confuses the client because there's no 
> > Content-Length field. Did you insert a Content-Length field 
> > manually? What i'm trying to figure out, are you relying on a 
> > keepalive client or not? I.e. is there a -k option to 'ab' as well, 
> > besides the ones you mentioned?
> 
> You don't need a Content-Length if you're closing the connection ;)

yeah. But there's apparently a bug in the code because it doesn't close 
's'.

> In any case, Evgeniy's test "servers" do not properly handle the 
> delivery of the content, where it is assumed that an open+sendfile is 
> a continuous non-blocking operation. [...]

i have tried the one Evgeniy provided in the URL:

  http://tservice.net.ru/~s0mbre/archive/kevent/evserver_epoll.c

and 'ab -k -c8000 -n80000' almost always aborts with:

  apr_socket_recv: Connection reset by peer (104)

in the few cases it finishes, i got the following epoll result, over 
gigabit ethernet, on an UP Athlon64 box:

   eserver_epoll:     7800 reqs/sec

the same test with the most naive implementation of the same server, 
using threadlets:

   eserver_threadlet: 5800 reqs/sec

Extrapolating from Evgeniy's numbers i'd expect kevents to do around 
14000 reqs/sec.

i've attached eserver_threadlet.c [compile it in the async-test/ 
directory of the syslet userspace testcode, and change MAX_PENDING to 
9000 in sys.h] - it uses the request handling function from 
eserver_epoll - no tuning or anything.

while keeping in mind that these are raw, preliminary numbers with 
little analysis of them, i'm surprised that 8000 async threads perform 
so well. Obviously this testcase and implementation just degrades 
threadlets down to a pool of 8000 user-space threads, so there's no win 
at all from context caching, no intelligent queueing back to the head 
context, etc.

	Ingo

---------{ eserver_threadlet.c }--------------->
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <assert.h>
#include <sys/wait.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>
#include <linux/futex.h>
#include <linux/unistd.h>

#include <sys/poll.h>
#include <sys/sendfile.h>
#include <sys/epoll.h>

#define DEBUG           0

#include "syslet.h"
#include "sys.h"
#include "threadlet.h"

struct request {
	struct request *next_free;
	/*
	 * The threadlet stack is part of the request structure
	 * and is thus reused as threadlets complete:
	 */
	unsigned long threadlet_stack;

	/*
	 * These are all the request-specific parameters:
	 */
	long sock;
};

//#define ulog(f, a...) fprintf(stderr, f, ##a)
#define ulog(f, a...)
#define ulog_err(f, a...) printf(f ": %s [%d].\n", ##a, strerror(errno), errno)

static long handle_request(void *__req)
{
	struct request *req = __req;
	int s = req->sock, err, fd;
	off_t offset;
	int count;
	char path[] = "/tmp/index.html";
	char buf[4096];
	struct timeval tm;

read_again:	
	count = 40960;
	offset = 0;

	err = recv(s, buf, sizeof(buf), 0);
	if (err < 0) {
		ulog_err("Failed to read data from s=%d", s);
		goto err_out_remove;
	}
	if (err == 0) {
		gettimeofday(&tm, NULL);
		ulog("%08lu:%06lu: Client exited: fd=%d.\n", tm.tv_sec, tm.tv_usec, s);
		goto err_out_remove;
	}

	fd = open(path, O_RDONLY);
	if (fd == -1) {
		ulog_err("Failed to open '%s'", path);
		err = -1;
		goto err_out_remove;
	}
#if 0
	do {
		err = read(fd, buf, sizeof(buf));
		if (err <= 0)
			break;
		err = send(s, buf, err, 0);
		if (err <= 0)
			break;
	} while (1);
#endif
	err = sendfile(s, fd, &offset, count);
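	/*
	 * Setting TCP_CORK to 0 un-corks the socket, flushing any
	 * partially-filled frame that sendfile() left pending:
	 */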
	{
		int on = 0;
		setsockopt(s, SOL_TCP, TCP_CORK, &on, sizeof(on));
	}

	close(fd);
	if (err < 0) {
		ulog_err("Failed send %d bytes: fd=%d.\n", count, s);
		goto err_out_remove;
	}

	gettimeofday(&tm, NULL);
	ulog("%08lu:%06lu: %d bytes has been sent to client fd=%d.\n", tm.tv_sec, tm.tv_usec, err, s);

	goto read_again;

err_out_remove:
	printf("blah\n");
	close(s);
	return complete_threadlet_fn(req, &async_head);
}

/*
 * Freelist to recycle requests:
 */
static struct request *freelist;

/*
 * Allocate a request and set up its syslet atoms:
 */
static struct request *alloc_req(void)
{
	struct request *req;

	/*
	 * Occasionally we have to refill the new-thread stack
	 * entry:
	 */
	if (!async_head.new_thread_stack) {
		async_head.new_thread_stack = thread_stack_alloc();
		pr("allocated new thread stack: %08lx\n",
			async_head.new_thread_stack);
	}

	if (freelist) {
		req = freelist;
		pr("reusing req %p, threadlet stack %08lx\n",
			req, req->threadlet_stack);
		freelist = freelist->next_free;
		req->next_free = NULL;
		return req;
	}

	req = calloc(1, sizeof(struct request));
	pr("allocated req %p\n", req);
	req->threadlet_stack = thread_stack_alloc();
	pr("allocated thread stack %08lx\n", req->threadlet_stack);

	return req;
}

/*
 * Check whether there are any completions queued for user-space
 * to finish up:
 */
static unsigned long complete(void)
{
	unsigned long completed = 0;
	struct request *req;

	for (;;) {
		req = (void *)completion_ring[async_head.user_ring_idx];
		if (!req)
			return completed;
		completed++;
		pr("completed req %p (threadlet stack %08lx)\n",
			req, req->threadlet_stack);

		req->next_free = freelist;
		freelist = req;

		/*
		 * Clear the completion pointer. To make sure the
		 * kernel never stomps upon still unhandled completions
		 * in the ring the kernel only writes to a NULL entry,
		 * so user-space has to clear it explicitly:
		 */
		completion_ring[async_head.user_ring_idx] = NULL;
		async_head.user_ring_idx++;
		if (async_head.user_ring_idx == MAX_PENDING)
			async_head.user_ring_idx = 0;
	}
}

static unsigned int pending_requests;

/*
 * Handle a request that has just been submitted (either it has
 * already been executed, or we have to account it as pending):
 */
static void handle_submitted_request(struct request *req, long done)
{
	unsigned int nr;

	if (done) {
		/*
		 * This is the cached case - free the request:
		 */
		pr("cache completed req %p (threadlet stack %08lx)\n",
			req, req->threadlet_stack);
		req->next_free = freelist;
		freelist = req;
		return;
	}
	/*
	 * 'cachemiss' case - the syslet is not finished
	 * yet. We will be notified about its completion
	 * via the completion ring:
	 */
	assert(pending_requests < MAX_PENDING-1);

	pending_requests++;
	pr("req %p is pending. %d reqs pending.\n", req, pending_requests);
	/*
	 * Attempt to complete requests - this is a fast
	 * check if there's no completions:
	 */
	nr = complete();
	pending_requests -= nr;

	/*
	 * If the ring is full then wait a bit:
	 */
	while (pending_requests == MAX_PENDING-1) {
		pr("sys_async_wait()");
		/*
		 * Wait for 4 events - to batch things a bit:
		 */
		sys_async_wait(4, async_head.user_ring_idx, &async_head);
		nr = complete();
		pending_requests -= nr;
		pr("after wait: completed %d requests - still pending: %d\n",
			nr, pending_requests);
	}
}

static void webserver(int l_sock)
{
	struct sockaddr addr;
	struct request *req;
	socklen_t addrlen = sizeof(addr);	/* accept() needs this initialized */
	long done;
	int sock;

	async_head_init();

	/*
	 * Main loop: accept a connection and hand each request
	 * over to a threadlet:
	 */
	for (;;) {
		req = alloc_req();
		if (!req)
			break;

		sock = accept(l_sock, &addr, &addrlen);

		if (sock < 0)
			break;

		req->sock = sock;
		done = threadlet_exec(handle_request, req,
				req->threadlet_stack, &async_head);

		handle_submitted_request(req, done);
	}
}

static int tcp_listen_socket(int *l_socket)
{
	int ret;
	int one = 1;
	struct sockaddr_in ad;

	/*
	 * Setup for server socket
	 */
	memset(&ad, 0, sizeof(ad));
	ad.sin_family = AF_INET;
	ad.sin_addr.s_addr = htonl(INADDR_ANY);
	ad.sin_port = htons(2222);

	*l_socket = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
	if (*l_socket < 0) {
		perror("socket (server)");
		return *l_socket;
	}

	ret = setsockopt(*l_socket, SOL_SOCKET, SO_REUSEADDR, (char *)&one,
			sizeof(int));
	if (ret < 0) {
		perror("setsockopt");
		return ret;
	}

	ret = bind(*l_socket, (struct sockaddr *)&ad, sizeof(ad));
	if (ret < 0) {
		perror("bind");
		return ret;
	}

	ret = listen(*l_socket, 1000);
	if (ret < 0) {
		perror("listen");
		return ret;
	}

	return 0;
}

int main(int argc, char *argv[])
{
	int ret;
	int l_sock;
	struct sched_param p = { .sched_priority = 0 };

	sched_setscheduler(getpid(), 3 /* SCHED_BATCH */, &p);

	ret = tcp_listen_socket(&l_sock);
	if (ret)
		exit(ret);
	printf("listening on port 2222\n");
	printf("using threadlets\n");

	webserver(l_sock);
	async_head_exit();

	exit(0);
}

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22 11:31         ` Evgeniy Polyakov
  2007-02-22 11:52           ` Arjan van de Ven
  2007-02-22 12:59           ` Ingo Molnar
@ 2007-02-25 22:44           ` Linus Torvalds
  2007-02-26 13:11             ` Ingo Molnar
  2007-02-26 17:28             ` Evgeniy Polyakov
  2 siblings, 2 replies; 337+ messages in thread
From: Linus Torvalds @ 2007-02-25 22:44 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ingo Molnar, Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner



On Thu, 22 Feb 2007, Evgeniy Polyakov wrote:
> 
> My tests show that with 4k connections per second (8k concurrency) more 
> than 20k connections of 80k total block in tcp_sendmsg() over gigabit 
> lan between quite fast machines.

Why do people *keep* taking this up as an issue?

Use select/poll/epoll/kevent/whatever for event mechanisms. STOP CLAIMING 
that you'd use threadlets/syslets/aio for that. It's been pointed out over 
and over and over again, and yet you continue to make the same mistake, 
Evgeniy.

So please read that sentence ten times, and then don't continue to make 
that same mistake. PLEASE.

Event mechanisms are *superior* for events. But they *suck* for things 
that aren't events, but are actual code execution with random places that 
can block. THE TWO THINGS ARE TOTALLY AND UTTERLY INDEPENDENT!

Examples of events:
 - packet arrives
 - timer happens

Examples of things that are *not* "events":
 - filesystem lookup.
 - page faults

So the basic point is: for events, you use an event-based thing. For code 
execution, you use a thread-based thing. It's really that simple.

And yes, the two different things can usually be translated (at a very 
high cost in complexity *and* performance) into each other, so people who 
look at it as purely a theoretical exercise may think that "events" and 
"code execution" are equivalent. That's a very very silly and stupid way 
of looking at things in real life, though.

Yes, you can turn things that are better seen as threaded execution into 
an event-based thing by turning it into a state machine. And usually that 
is a TOTAL DISASTER, and the end result is fragile and impossible to 
maintain.

And yes, you can often (more easily) turn an event-based mechanism into a 
thread-based one, and usually the end result is a TOTAL DISASTER because 
it doesn't scale very well, and while it may actually result in somewhat 
simpler code, the overhead of managing ten thousand outstanding threads is 
just too high, when you compare to managing just a list of ten thousand 
outstanding events.

And yes, people have made both of those mistakes. Java, for example, 
largely made the latter mistake ("we don't need anything like 'select', 
because we'll just use threads for everything" - what a totally moronic 
thing to do!)

So Evgeniy, threadlets/syslets/aio is *not* a replacement for event 
queues. It's a TOTALLY DIFFERENT MECHANISM, and one that is hugely 
superior to event queues for certain kinds of things. Anybody who thinks 
they want to do pathname and inode lookup as a series of events is likely 
a moron. It's really that simple.

In a complex server (say, a database), you'd use both. You'd probably use 
events for doing the things you *already* use events for (whether it be 
select/poll/epoll or whatever): probably things like the client network 
connection handling.

But you'd *in addition* use threadlets to be able to do the actual 
database IO in a threaded manner, so that you can scale the things that 
are not easily handled as events (usually because they have internal 
kernel state that the user cannot even see, and *must*not* see because of 
security issues).

So please. Stop this "kevents are better". The only thing you show by 
trying to go down that avenue is that you don't understand the 
*difference* between an event model and a thread model. They are both 
perfectly fine models and they ARE NOT THE SAME! They aren't even mutually 
incompatible - quite the reverse.

The thing people want to remove with threadlets is the internal overhead 
of maintaining special-purpose code like aio_read() inside the kernel, 
that doesn't even do all that people want it to do, and that really does 
need a fair amount of internal complexity that we could hopefully do with 
a more generic (and hopefully *simpler*) model.

		Linus

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-25 19:04                       ` Ingo Molnar
  2007-02-25 19:42                         ` Evgeniy Polyakov
@ 2007-02-25 23:14                         ` Michael K. Edwards
  1 sibling, 0 replies; 337+ messages in thread
From: Michael K. Edwards @ 2007-02-25 23:14 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Evgeniy Polyakov, Ulrich Drepper, linux-kernel, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

On 2/25/07, Ingo Molnar <mingo@elte.hu> wrote:
> Fundamentally a kernel thread is just its
> EIP/ESP [on x86, similar on other architectures] - which can be
> saved/restored in near zero time.

That's because the kernel address space is identical in every
process's MMU context, so the MMU doesn't have to be touched _at_all_.
 Also, the kernel very rarely touches FPU state, and even when it
does, the FXSAVE/FXRSTOR pair is highly optimized for the "save state
just long enough to move some memory around with XMM instructions"
case.  (I know you know this; this is for the benefit of less
experienced readers.)  If your threadlet model shares the FPU state
and TLS arena among all threadlets running on the same CPU, and
threadlets are scheduled in bursts belonging to the same process (and
preferably the same threadlet entrypoint), then you will get similarly
efficient userspace threadlet-to-threadlet transitions.  If not, not.

> scheduling only relates to the minimal context that is in the CPU. And
> most of that context we save upon /every system call entry/, and restore
> it upon every system call return. If it's so expensive to manipulate,
> why can the Linux kernel do a full system call in ~150 cycles? That's
> cheaper than the access latency to a single DRAM page.

That would be the magic of shadow register files.  When the software
does things that hardware expects it to do, everybody wins.  When the
software tries to get clever based on micro-benchmarks, everybody
loses.

Cheers,
- Michael

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
       [not found]                     ` <Pine.LNX.4.64.0702251232350.6011@alien.or.mcafeemobile.com>
  2007-02-25 21:34                       ` threadlets as 'naive pool of threads', epoll, some measurements Ingo Molnar
@ 2007-02-26  8:16                       ` Ingo Molnar
  2007-02-26  9:25                         ` Evgeniy Polyakov
  1 sibling, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-26  8:16 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Evgeniy Polyakov, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner


* Davide Libenzi <davidel@xmailserver.org> wrote:

> Also, the evtest_kevent_remove call is superfluous with epoll.

it's only used in the error path AFAICS.

but you are right about evserver_epoll/kevent.c incorrectly assuming 
that things won't block in evtest_callback_client(), which, after 
receiving the "there's stuff on the input socket" event, does:

	recvmsg(sock),
	fd = open();
	sendfile(sock, fd)
	close(fd);

while evserver_threadlet.c, even this naive implementation, does not 
assume that we won't block in that function.

> In any case, comparing epoll/kevent with 100K active sessions, against 
> threadlets, is not exactly a fair/appropriate test for it.

fully agreed.

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-26  8:16                       ` [patch 00/13] Syslets, "Threadlets", generic AIO support, v3 Ingo Molnar
@ 2007-02-26  9:25                         ` Evgeniy Polyakov
  2007-02-26  9:55                           ` Ingo Molnar
  0 siblings, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-26  9:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Davide Libenzi, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Mon, Feb 26, 2007 at 09:16:56AM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> > Also, the evtest_kevent_remove call is superfluous with epoll.
> 
> it's only used in the error path AFAICS.
> 
> but you are right about evserver_epoll/kevent.c incorrectly assuming 
> that things won't block in evtest_callback_client(), which, after 
> receiving the "there's stuff on the input socket" event, does:
> 
> 	recvmsg(sock),
> 	fd = open();
> 	sendfile(sock, fd)
> 	close(fd);
> 
> while evserver_threadlet.c, even this naive implementation, does not 
> assume that we won't block in that function.
> 
> > In any case, comparing epoll/kevent with 100K active sessions, against 
> > threadlets, is not exactly a fair/appropriate test for it.
> 
> fully agreed.

Hi.

I will highlight several items in this mail:
1. evserver_epoll.c is broken in the regard that it does not close the
socket - I tried to make it work with keepalive, but failed. So closing
the socket is a must, like in evserver_kevent.c.
2. keepalive is not supported - it is a hack server, after all.
3. this test does not assume whether the above snippet blocks or not -
it is the _typical_ case of a web server with one working thread (per
cpu) - every op can block, so there is no problem: a threadlet will
reschedule, an event-based server will block (bad for it).
4. the benchmark does not cover all possible cases - the initial goal of
those servers was to show how fast/slow _event_ generation/processing is
in the epoll/kevent case, not to create a real-life web server.
lighttpd for example can not be used as a good benchmark either, since
its architecture does not support some kevent extensions (which do not
exist in epoll), and, looking at the number of comments in the kevent
threads, I'm not motivated to change it at all.

So, drawing a line: evserver_* is a simple event driven server; it does
have disadvantages, but the same approach only favours the threadlet
model. Having millions or thousands of connections works against
threadlets, but we compare extreme cases - it is one of the possible
tests.

So...
I'm cooking up a git tree with kevents and threadlets, which I will test
on a VIA EPIA (1GHz, 256MB of RAM, 100mbit LAN) and an Intel(R) Core(TM)2
CPU 6600 @ 2.40GHz (2GB of RAM, 1gbit LAN) later today with
kevent/epoll/threadlet, if the wine does not suddenly run out.
I will use Ingo's evserver_threadlet server along with evserver_epoll
(with fixed closing) and evserver_kevent.c.
Eventually I can move all of them into one file.

The client is 'ab' on my desktop Core Duo at 3.7GHz. The machines are
connected over a gigabit D-Link DGS-1216T switch (which freezes on
slightly broken DHCP and TFTP packets, btw).

Stay tuned.

P.S. Linus, if you do not mind, I will postpone the scholastic
masturbation about event vs process context in IO. One sentence only:
page faults and filename lookups both wait - they wait until the new
page is ready or the inode has been read from disk; eventually they wake
up, and the thing which wakes them up _is_ an event.

> 	Ingo

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-26  9:25                         ` Evgeniy Polyakov
@ 2007-02-26  9:55                           ` Ingo Molnar
  2007-02-26 10:31                             ` Ingo Molnar
  2007-02-26 10:33                             ` Evgeniy Polyakov
  0 siblings, 2 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-02-26  9:55 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Davide Libenzi, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner


* Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> I will use Ingo's evserver_threadlet server along with evserver_epoll 
> (with fixed closing) and evserver_kevent.c.

please also try evserver_epoll_threadlet.c that i've attached below - it 
uses epoll as the main event mechanism but does threadlets for request 
handling.

This is a one step more intelligent threadlet queueing model than 
'thousands of threads' - although obviously epoll alone should do well 
too with this trivial workload.

	Ingo

---------------------------->
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/poll.h>
#include <sys/sendfile.h>
#include <sys/epoll.h>

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <assert.h>
#include <string.h>
#include <fcntl.h>
#include <time.h>
#include <ctype.h>
#include <netdb.h>

#define DEBUG           0

#include "syslet.h"
#include "sys.h"
#include "threadlet.h"

struct request {
	struct request *next_free;
	/*
	 * The threadlet stack is part of the request structure
	 * and is thus reused as threadlets complete:
	 */
	unsigned long threadlet_stack;

	/*
	 * These are all the request-specific parameters:
	 */
	long sock;
};

/*
 * Freelist to recycle requests:
 */
static struct request *freelist;

/*
 * Allocate a request and set up its syslet atoms:
 */
static struct request *alloc_req(void)
{
	struct request *req;

	/*
	 * Occasionally we have to refill the new-thread stack
	 * entry:
	 */
	if (!async_head.new_thread_stack) {
		async_head.new_thread_stack = thread_stack_alloc();
		pr("allocated new thread stack: %08lx\n",
			async_head.new_thread_stack);
	}

	if (freelist) {
		req = freelist;
		pr("reusing req %p, threadlet stack %08lx\n",
			req, req->threadlet_stack);
		freelist = freelist->next_free;
		req->next_free = NULL;
		return req;
	}

	req = calloc(1, sizeof(struct request));
	pr("allocated req %p\n", req);
	req->threadlet_stack = thread_stack_alloc();
	pr("allocated thread stack %08lx\n", req->threadlet_stack);

	return req;
}

/*
 * Check whether there are any completions queued for user-space
 * to finish up:
 */
static unsigned long complete(void)
{
	unsigned long completed = 0;
	struct request *req;

	for (;;) {
		req = (void *)completion_ring[async_head.user_ring_idx];
		if (!req)
			return completed;
		completed++;
		pr("completed req %p (threadlet stack %08lx)\n",
			req, req->threadlet_stack);

		req->next_free = freelist;
		freelist = req;

		/*
		 * Clear the completion pointer. To make sure the
		 * kernel never stomps upon still unhandled completions
		 * in the ring the kernel only writes to a NULL entry,
		 * so user-space has to clear it explicitly:
		 */
		completion_ring[async_head.user_ring_idx] = NULL;
		async_head.user_ring_idx++;
		if (async_head.user_ring_idx == MAX_PENDING)
			async_head.user_ring_idx = 0;
	}
}

static unsigned int pending_requests;

/*
 * Handle a request that has just been submitted (either it has
 * already been executed, or we have to account it as pending):
 */
static void handle_submitted_request(struct request *req, long done)
{
	unsigned int nr;

	if (done) {
		/*
		 * This is the cached case - free the request:
		 */
		pr("cache completed req %p (threadlet stack %08lx)\n",
			req, req->threadlet_stack);
		req->next_free = freelist;
		freelist = req;
		return;
	}
	/*
	 * 'cachemiss' case - the syslet is not finished
	 * yet. We will be notified about its completion
	 * via the completion ring:
	 */
	assert(pending_requests < MAX_PENDING-1);

	pending_requests++;
	pr("req %p is pending. %d reqs pending.\n", req, pending_requests);
	/*
	 * Attempt to complete requests - this is a fast
	 * check if there's no completions:
	 */
	nr = complete();
	pending_requests -= nr;

	/*
	 * If the ring is full then wait a bit:
	 */
	while (pending_requests == MAX_PENDING-1) {
		pr("sys_async_wait()");
		/*
		 * Wait for 4 events - to batch things a bit:
		 */
		sys_async_wait(4, async_head.user_ring_idx, &async_head);
		nr = complete();
		pending_requests -= nr;
		pr("after wait: completed %d requests - still pending: %d\n",
			nr, pending_requests);
	}
}

#include <linux/types.h>

//#define ulog(f, a...) fprintf(stderr, f, ##a)
#define ulog(f, a...)
#define ulog_err(f, a...) printf(f ": %s [%d].\n", ##a, strerror(errno), errno)


static int kevent_ctl_fd, main_server_s;

static void usage(char *p)
{
	ulog("Usage: %s -a addr -p port -f kevent_path -t timeout -w wait_num\n", p);
}

static int evtest_server_init(char *addr, unsigned short port)
{
	struct hostent *h;
	int s, on;
	struct sockaddr_in sa;

	if (!addr) {
		ulog("%s: Bind address cannot be NULL.\n", __func__);
		return -1;
	}

	h = gethostbyname(addr);
	if (!h) {
		ulog_err("%s: Failed to get address of %s.\n", __func__, addr);
		return -1;
	}

	s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	if (s == -1) {
		ulog_err("%s: Failed to create server socket", __func__);
		return -1;
	}
	fcntl(s, F_SETFL, O_NONBLOCK);

	memcpy(&(sa.sin_addr.s_addr), h->h_addr_list[0], 4);
	sa.sin_port = htons(port);
	sa.sin_family = AF_INET;

	on = 1;
	setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, 4);

	if (bind(s, (struct sockaddr *)&sa, sizeof(struct sockaddr_in)) == -1) {
		ulog_err("%s: Failed to bind to %s", __func__, addr);
		close(s);
		return -1;
	}

	if (listen(s, 30000) == -1) {
		ulog_err("%s: Failed to listen on %s", __func__, addr);
		close(s);
		return -1;
	}

	return s;
}

static int evtest_kevent_remove(int fd)
{
	int err;
	struct epoll_event event;

	event.events = EPOLLIN | EPOLLET;
	event.data.fd = fd;

	err = epoll_ctl(kevent_ctl_fd, EPOLL_CTL_DEL, fd, &event);
	if (err < 0) {
		ulog_err("Failed to perform control REMOVE operation");
		return err;
	}

	return err;
}

static int evtest_kevent_init(int fd)
{
	int err;
	struct timeval tm;
	struct epoll_event event;

	event.events = EPOLLIN | EPOLLET;
	event.data.fd = fd;

	err = epoll_ctl(kevent_ctl_fd, EPOLL_CTL_ADD, fd, &event);
	gettimeofday(&tm, NULL);
	ulog("%08lu:%06lu: fd=%3d, err=%1d.\n", tm.tv_sec, tm.tv_usec, fd, err);
	if (err < 0) {
		ulog_err("Failed to perform control ADD operation: fd=%d, events=%08x", fd, event.events);
		return err;
	}

	return err;
}

static long handle_request(void *__req)
{
	struct request *req = __req;
	int s = req->sock, err, fd;
	off_t offset;
	int count;
	char path[] = "/tmp/index.html";
	char buf[4096];
	struct timeval tm;

	count = 40960;
	offset = 0;

	err = recv(s, buf, sizeof(buf), 0);
	if (err < 0) {
		ulog_err("Failed to read data from s=%d", s);
		goto err_out_remove;
	}
	if (err == 0) {
		gettimeofday(&tm, NULL);
		ulog("%08lu:%06lu: Client exited: fd=%d.\n", tm.tv_sec, tm.tv_usec, s);
		goto err_out_remove;
	}

	fd = open(path, O_RDONLY);
	if (fd == -1) {
		ulog_err("Failed to open '%s'", path);
		err = -1;
		goto err_out_remove;
	}
#if 0
	do {
		err = read(fd, buf, sizeof(buf));
		if (err <= 0)
			break;
		err = send(s, buf, err, 0);
		if (err <= 0)
			break;
	} while (1);
#endif
	err = sendfile(s, fd, &offset, count);
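	/*
	 * Setting TCP_CORK to 0 un-corks the socket, flushing any
	 * partially-filled frame that sendfile() left pending:
	 */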
	{
		int on = 0;
		setsockopt(s, SOL_TCP, TCP_CORK, &on, sizeof(on));
	}

	close(fd);
	if (err < 0) {
		ulog_err("Failed send %d bytes: fd=%d.\n", count, s);
		goto err_out_remove;
	}

	gettimeofday(&tm, NULL);
	ulog("%08lu:%06lu: %d bytes has been sent to client fd=%d.\n", tm.tv_sec, tm.tv_usec, err, s);

	close(s);

	return complete_threadlet_fn(req, &async_head);

err_out_remove:
	evtest_kevent_remove(s);
	close(s);

	return complete_threadlet_fn(req, &async_head);
}

static int evtest_callback_client(int sock)
{
	struct request *req;
	long done;

	req = alloc_req();
	if (!req) {
		printf("no req\n");
		evtest_kevent_remove(sock);
		return -ENOMEM;
	}

	req->sock = sock;
	done = threadlet_exec(handle_request, req,
			req->threadlet_stack, &async_head);

	handle_submitted_request(req, done);

	return 0;
}

static int evtest_callback_main(int s)
{
	int cs, err;
	struct sockaddr_in csa;
	socklen_t addrlen = sizeof(struct sockaddr_in);
	struct timeval tm;

	memset(&csa, 0, sizeof(csa));

	if ((cs = accept(s, (struct sockaddr *)&csa, &addrlen)) == -1) {
		ulog_err("Failed to accept client");
		return -1;
	}
	fcntl(cs, F_SETFL, O_NONBLOCK);

	gettimeofday(&tm, NULL);

	ulog("%08lu:%06lu: Accepted connect from %s:%d.\n",
		tm.tv_sec, tm.tv_usec,
		inet_ntoa(csa.sin_addr), ntohs(csa.sin_port));

	err = evtest_kevent_init(cs);
	if (err < 0) {
		close(cs);
		return -1;
	}

	return 0;
}

static int evtest_kevent_wait(unsigned int timeout, unsigned int wait_num)
{
	int num, err;
	struct timeval tm;
	struct epoll_event event[256];
	int i;

	err = epoll_wait(kevent_ctl_fd, event, 256, -1);
	if (err < 0) {
		ulog_err("Failed to perform control operation");
		return err;
	}

	gettimeofday(&tm, NULL);

	num = err;
	ulog("%08lu.%06lu: Wait: num=%d.\n", tm.tv_sec, tm.tv_usec, num);
	for (i=0; i<num; ++i) {
		if (event[i].data.fd == main_server_s)
			err = evtest_callback_main(event[i].data.fd);
		else
			err = evtest_callback_client(event[i].data.fd);
	}

	return err;
}

int main(int argc, char *argv[])
{
	int ch, err;
	char *addr;
	unsigned short port;
	unsigned int timeout, wait_num;

	addr = "0.0.0.0";
	port = 8080;
	timeout = 1000;
	wait_num = 1;

	async_head_init();

	while ((ch = getopt(argc, argv, "f:n:t:a:p:h")) > 0) {
		switch (ch) {
			case 't':
				timeout = atoi(optarg);
				break;
			case 'n':
				wait_num = atoi(optarg);
				break;
			case 'a':
				addr = optarg;
				break;
			case 'p':
				port = atoi(optarg);
				break;
			case 'f':
				break;
			default:
				usage(argv[0]);
				return -1;
		}
	}

	kevent_ctl_fd = epoll_create(10);
	if (kevent_ctl_fd == -1) {
		ulog_err("Failed to epoll descriptor");
		return -1;
	}

	main_server_s = evtest_server_init(addr, port);
	if (main_server_s < 0)
		return main_server_s;

	err = evtest_kevent_init(main_server_s);
	if (err < 0)
		goto err_out_exit;

	while (1) {
		err = evtest_kevent_wait(timeout, wait_num);
	}

err_out_exit:
	close(kevent_ctl_fd);

	async_head_exit();

	return 0;
}

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-26  9:55                           ` Ingo Molnar
@ 2007-02-26 10:31                             ` Ingo Molnar
  2007-02-26 10:43                               ` Evgeniy Polyakov
  2007-02-26 20:02                               ` Davide Libenzi
  2007-02-26 10:33                             ` Evgeniy Polyakov
  1 sibling, 2 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-02-26 10:31 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Davide Libenzi, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner


* Ingo Molnar <mingo@elte.hu> wrote:

> please also try evserver_epoll_threadlet.c that i've attached below - 
> it uses epoll as the main event mechanism but does threadlets for 
> request handling.

find updated code below - your evserver_epoll.c spuriously missed event 
edges - so i changed it back to level-triggered. While that is not as 
fast as edge-triggered, it does not result in spurious hangs and 
workflow 'hiccups' during the test.
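
( side note: an edge-triggered epoll handler must also drain the socket
  until EAGAIN, since an edge is only reported on a state change - a
  minimal sketch of the drain loop an edge-triggered server would need,
  illustration only: )

/* drain until EAGAIN - required with EPOLLET, illustrative only */
#include <errno.h>
#include <sys/socket.h>

static int drain(int s, char *buf, size_t len)
{
	int err;

	for (;;) {
		err = recv(s, buf, len, 0);
		if (err > 0)
			continue;	/* process the data, keep reading */
		if (err == 0)
			return 0;	/* peer closed the connection */
		if (errno == EAGAIN)
			return 1;	/* drained - wait for the next edge */
		return -1;		/* real error */
	}
}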

Could this be the reason why in your testing kevents outperformed epoll?

Also, i have removed the set-nonblocking calls because they are not 
needed under threadlets.

[ to build this code, copy it into the async-test/ directory and build
  it there - or copy the *.h files from async-test/ directory into your
  build directory. ]

	Ingo

-------{ evserver_epoll_threadlet.c }-------------->
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/poll.h>
#include <sys/sendfile.h>
#include <sys/epoll.h>

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <assert.h>
#include <string.h>
#include <fcntl.h>
#include <time.h>
#include <ctype.h>
#include <netdb.h>

#define DEBUG           0

#include "syslet.h"
#include "sys.h"
#include "threadlet.h"

struct request {
	struct request *next_free;
	/*
	 * The threadlet stack is part of the request structure
	 * and is thus reused as threadlets complete:
	 */
	unsigned long threadlet_stack;

	/*
	 * These are all the request-specific parameters:
	 */
	long sock;
};

/*
 * Freelist to recycle requests:
 */
static struct request *freelist;

/*
 * Allocate a request and set up its syslet atoms:
 */
static struct request *alloc_req(void)
{
	struct request *req;

	/*
	 * Occasionally we have to refill the new-thread stack
	 * entry:
	 */
	if (!async_head.new_thread_stack) {
		async_head.new_thread_stack = thread_stack_alloc();
		pr("allocated new thread stack: %08lx\n",
			async_head.new_thread_stack);
	}

	if (freelist) {
		req = freelist;
		pr("reusing req %p, threadlet stack %08lx\n",
			req, req->threadlet_stack);
		freelist = freelist->next_free;
		req->next_free = NULL;
		return req;
	}

	req = calloc(1, sizeof(struct request));
	pr("allocated req %p\n", req);
	req->threadlet_stack = thread_stack_alloc();
	pr("allocated thread stack %08lx\n", req->threadlet_stack);

	return req;
}

/*
 * Check whether there are any completions queued for user-space
 * to finish up:
 */
static unsigned long complete(void)
{
	unsigned long completed = 0;
	struct request *req;

	for (;;) {
		req = (void *)completion_ring[async_head.user_ring_idx];
		if (!req)
			return completed;
		completed++;
		pr("completed req %p (threadlet stack %08lx)\n",
			req, req->threadlet_stack);

		req->next_free = freelist;
		freelist = req;

		/*
		 * Clear the completion pointer. To make sure the
		 * kernel never stomps upon still unhandled completions
		 * in the ring the kernel only writes to a NULL entry,
		 * so user-space has to clear it explicitly:
		 */
		completion_ring[async_head.user_ring_idx] = NULL;
		async_head.user_ring_idx++;
		if (async_head.user_ring_idx == MAX_PENDING)
			async_head.user_ring_idx = 0;
	}
}

static unsigned int pending_requests;

/*
 * Handle a request that has just been submitted (either it has
 * already been executed, or we have to account it as pending):
 */
static void handle_submitted_request(struct request *req, long done)
{
	unsigned int nr;

	if (done) {
		/*
		 * This is the cached case - free the request:
		 */
		pr("cache completed req %p (threadlet stack %08lx)\n",
			req, req->threadlet_stack);
		req->next_free = freelist;
		freelist = req;
		return;
	}
	/*
	 * 'cachemiss' case - the syslet is not finished
	 * yet. We will be notified about its completion
	 * via the completion ring:
	 */
	assert(pending_requests < MAX_PENDING-1);

	pending_requests++;
	pr("req %p is pending. %d reqs pending.\n", req, pending_requests);
	/*
	 * Attempt to complete requests - this is a fast
	 * check if there's no completions:
	 */
	nr = complete();
	pending_requests -= nr;

	/*
	 * If the ring is full then wait a bit:
	 */
	while (pending_requests == MAX_PENDING-1) {
		pr("sys_async_wait()");
		/*
		 * Wait for 4 events - to batch things a bit:
		 */
		sys_async_wait(4, async_head.user_ring_idx, &async_head);
		nr = complete();
		pending_requests -= nr;
		pr("after wait: completed %d requests - still pending: %d\n",
			nr, pending_requests);
	}
}

#include <linux/types.h>

//#define ulog(f, a...) fprintf(stderr, f, ##a)
#define ulog(f, a...)
#define ulog_err(f, a...) printf(f ": %s [%d].\n", ##a, strerror(errno), errno)


static int kevent_ctl_fd, main_server_s;

static void usage(char *p)
{
	ulog("Usage: %s -a addr -p port -f kevent_path -t timeout -w wait_num\n", p);
}

static int evtest_server_init(char *addr, unsigned short port)
{
	struct hostent *h;
	int s, on;
	struct sockaddr_in sa;

	if (!addr) {
		ulog("%s: Bind address cannot be NULL.\n", __func__);
		return -1;
	}

	h = gethostbyname(addr);
	if (!h) {
		ulog_err("%s: Failed to get address of %s.\n", __func__, addr);
		return -1;
	}

	s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	if (s == -1) {
		ulog_err("%s: Failed to create server socket", __func__);
		return -1;
	}
//	fcntl(s, F_SETFL, O_NONBLOCK);

	memcpy(&(sa.sin_addr.s_addr), h->h_addr_list[0], 4);
	sa.sin_port = htons(port);
	sa.sin_family = AF_INET;

	on = 1;
	setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, 4);

	if (bind(s, (struct sockaddr *)&sa, sizeof(struct sockaddr_in)) == -1) {
		ulog_err("%s: Failed to bind to %s", __func__, addr);
		close(s);
		return -1;
	}

	if (listen(s, 30000) == -1) {
		ulog_err("%s: Failed to listen on %s", __func__, addr);
		close(s);
		return -1;
	}

	return s;
}

#define EPOLL_EVENT_MASK (EPOLLIN | EPOLLERR | EPOLLPRI)

static int evtest_kevent_remove(int fd)
{
	int err;
	struct epoll_event event;

	event.events = EPOLL_EVENT_MASK;
	event.data.fd = fd;

	err = epoll_ctl(kevent_ctl_fd, EPOLL_CTL_DEL, fd, &event);
	if (err < 0) {
		ulog_err("Failed to perform control REMOVE operation");
		return err;
	}

	return err;
}

static int evtest_kevent_init(int fd)
{
	int err;
	struct timeval tm;
	struct epoll_event event;

	event.events = EPOLL_EVENT_MASK;
	event.data.fd = fd;

	err = epoll_ctl(kevent_ctl_fd, EPOLL_CTL_ADD, fd, &event);
	gettimeofday(&tm, NULL);
	ulog("%08lu:%06lu: fd=%3d, err=%1d.\n", tm.tv_sec, tm.tv_usec, fd, err);
	if (err < 0) {
		ulog_err("Failed to perform control ADD operation: fd=%d, events=%08x", fd, event.events);
		return err;
	}

	return err;
}

#define MAX_FILES 1000000

/*
 * Debug check:
 */
static struct request *fd_to_req[MAX_FILES];

static long handle_request(void *__req)
{
	struct request *req = __req;
	int s = req->sock, err, fd;
	off_t offset;
	int count;
	char path[] = "/tmp/index.html";
	char buf[4096];
	struct timeval tm;

	if (!fd_to_req[s])
		ulog_err("Bad: no request to fd?");

	count = 40960;
	offset = 0;

	err = recv(s, buf, sizeof(buf), 0);
	if (err < 0) {
		ulog_err("Failed to read data from s=%d", s);
		goto err_out_remove;
	}
	if (err == 0) {
		gettimeofday(&tm, NULL);
		ulog("%08lu:%06lu: Client exited: fd=%d.\n", tm.tv_sec, tm.tv_usec, s);
		goto err_out_remove;
	}

	fd = open(path, O_RDONLY);
	if (fd == -1) {
		ulog_err("Failed to open '%s'", path);
		err = -1;
		goto err_out_remove;
	}
#if 0
	do {
		err = read(fd, buf, sizeof(buf));
		if (err <= 0)
			break;
		err = send(s, buf, err, 0);
		if (err <= 0)
			break;
	} while (1);
#endif
	err = sendfile(s, fd, &offset, count);
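	/*
	 * Setting TCP_CORK to 0 un-corks the socket, flushing any
	 * partially-filled frame that sendfile() left pending:
	 */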
	{
		int on = 0;
		setsockopt(s, SOL_TCP, TCP_CORK, &on, sizeof(on));
	}

	close(fd);
	if (err < 0) {
		ulog_err("Failed send %d bytes: fd=%d.\n", count, s);
		goto err_out_remove;
	}

	gettimeofday(&tm, NULL);
	ulog("%08lu:%06lu: %d bytes has been sent to client fd=%d.\n", tm.tv_sec, tm.tv_usec, err, s);

	close(s);
	fd_to_req[s] = NULL;

	return complete_threadlet_fn(req, &async_head);

err_out_remove:
	evtest_kevent_remove(s);
	close(s);
	fd_to_req[s] = NULL;

	return complete_threadlet_fn(req, &async_head);
}

static int evtest_callback_client(int sock)
{
	struct request *req;
	long done;

	if (fd_to_req[sock]) {
		ulog_err("Bad: request overlap?");
		return 0;
	}

	req = alloc_req();
	if (!req) {
		ulog_err("Bad: no req\n");
		evtest_kevent_remove(sock);
		return -ENOMEM;
	}

	req->sock = sock;
	fd_to_req[sock] = req;
	done = threadlet_exec(handle_request, req,
			req->threadlet_stack, &async_head);

	handle_submitted_request(req, done);

	return 0;
}

static int evtest_callback_main(int s)
{
	int cs, err;
	struct sockaddr_in csa;
	socklen_t addrlen = sizeof(struct sockaddr_in);
	struct timeval tm;

	memset(&csa, 0, sizeof(csa));

	if ((cs = accept(s, (struct sockaddr *)&csa, &addrlen)) == -1) {
		ulog_err("Failed to accept client");
		return -1;
	}
//	fcntl(cs, F_SETFL, O_NONBLOCK);

	gettimeofday(&tm, NULL);

	ulog("%08lu:%06lu: Accepted connect from %s:%d.\n",
		tm.tv_sec, tm.tv_usec,
		inet_ntoa(csa.sin_addr), ntohs(csa.sin_port));

	err = evtest_kevent_init(cs);
	if (err < 0) {
		close(cs);
		return -1;
	}

	return 0;
}

static int evtest_kevent_wait(unsigned int timeout, unsigned int wait_num)
{
	int num, err;
	struct timeval tm;
	struct epoll_event event[256];
	int i;

	err = epoll_wait(kevent_ctl_fd, event, 256, -1);
	if (err < 0) {
		ulog_err("Failed to perform control operation");
		return err;
	}

	gettimeofday(&tm, NULL);

	num = err;
	ulog("%08lu.%06lu: Wait: num=%d.\n", tm.tv_sec, tm.tv_usec, num);
	for (i=0; i<num; ++i) {
		if (event[i].data.fd == main_server_s)
			err = evtest_callback_main(event[i].data.fd);
		else
			err = evtest_callback_client(event[i].data.fd);
	}

	return err;
}

int main(int argc, char *argv[])
{
	int ch, err;
	char *addr;
	unsigned short port;
	unsigned int timeout, wait_num;

	addr = "0.0.0.0";
	port = 8080;
	timeout = 1000;
	wait_num = 1;

	async_head_init();

	while ((ch = getopt(argc, argv, "f:n:t:a:p:h")) > 0) {
		switch (ch) {
			case 't':
				timeout = atoi(optarg);
				break;
			case 'n':
				wait_num = atoi(optarg);
				break;
			case 'a':
				addr = optarg;
				break;
			case 'p':
				port = atoi(optarg);
				break;
			case 'f':
				break;
			default:
				usage(argv[0]);
				return -1;
		}
	}

	kevent_ctl_fd = epoll_create(10);
	if (kevent_ctl_fd == -1) {
		ulog_err("Failed to epoll descriptor");
		return -1;
	}

	main_server_s = evtest_server_init(addr, port);
	if (main_server_s < 0)
		return main_server_s;

	err = evtest_kevent_init(main_server_s);
	if (err < 0)
		goto err_out_exit;

	while (1) {
		err = evtest_kevent_wait(timeout, wait_num);
	}

err_out_exit:
	close(kevent_ctl_fd);

	async_head_exit();

	return 0;
}

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-26  9:55                           ` Ingo Molnar
  2007-02-26 10:31                             ` Ingo Molnar
@ 2007-02-26 10:33                             ` Evgeniy Polyakov
  2007-02-26 10:35                               ` Ingo Molnar
  1 sibling, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-26 10:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Davide Libenzi, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Mon, Feb 26, 2007 at 10:55:47AM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> > I will use Ingo's evserver_threadlet server along with evserver_epoll 
> > (with fixed closing) and evserver_kevent.c.
> 
> please also try evserver_epoll_threadlet.c that i've attached below - it 
> uses epoll as the main event mechanism but does threadlets for request 
> handling.
> 
> This is a one step more intelligent threadlet queueing model than 
> 'thousands of threads' - although obviously epoll alone should do well 
> too with this trivial workload.

No problem.
If I complete the setup today before I go climbing (I need to do some
paid work too), I will post the results here and in my blog (without
political correctness).

Btw, 'evserver' in the name means 'event server', so you might think
about changing the name :)

Stay tuned.

> 	Ingo

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-26 10:33                             ` Evgeniy Polyakov
@ 2007-02-26 10:35                               ` Ingo Molnar
  2007-02-26 10:47                                 ` Evgeniy Polyakov
  0 siblings, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-26 10:35 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Davide Libenzi, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner


* Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> Btw, 'evserver' in the name means 'event server', so you might think 
> about changing the name :)

why should i change the name? The 'outer' loop, which feeds requests to 
threadlets, is an epoll based event loop. The inner loop, where all the 
application complexity resides, is a threadlet. This is the "more 
intelligent queueing" model i talked about in my reply to David 4 days 
ago:

  http://lkml.org/lkml/2007/2/22/180
  http://lkml.org/lkml/2007/2/22/191
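
A minimal sketch of that two-level structure - this reuses the helpers 
from the evserver_threadlet.c code posted elsewhere in this thread 
(alloc_req(), handle_request(), handle_submitted_request()), and assumes 
'epfd' is an already set up epoll descriptor with the client sockets 
registered, so treat it as an illustration, not as the actual server:

struct epoll_event events[128];
int i, nr;

for (;;) {
	/* outer loop: plain epoll event dispatch: */
	nr = epoll_wait(epfd, events, 128, -1);

	for (i = 0; i < nr; i++) {
		struct request *req = alloc_req();
		long done;

		req->sock = events[i].data.fd;
		/*
		 * inner loop: the whole request runs in a threadlet -
		 * synchronously for as long as it does not block, and
		 * moved off into its own thread context the moment it
		 * would block:
		 */
		done = threadlet_exec(handle_request, req,
				req->threadlet_stack, &async_head);
		handle_submitted_request(req, done);
	}
}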

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-26 10:31                             ` Ingo Molnar
@ 2007-02-26 10:43                               ` Evgeniy Polyakov
  2007-02-26 20:02                               ` Davide Libenzi
  1 sibling, 0 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-26 10:43 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Davide Libenzi, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Mon, Feb 26, 2007 at 11:31:17AM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > please also try evserver_epoll_threadlet.c that i've attached below - 
> > it uses epoll as the main event mechanism but does threadlets for 
> > request handling.
> 
> find updated code below - your evserver_epoll.c spuriously missed event 
> edges - so i changed it back to level-triggered. While that is not as 
> fast as edge-triggered, it does not result in spurious hangs and 
> workflow 'hiccups' during the test.

Hmm, exactly the same evserver_epoll.c you downloaded works OK for me,
although yes, it is buggy in the sense that it does not close the
socket when the data has been transferred.

> Could this be the reason why in your testing kevents outperformed epoll?

I will try to check. In theory, without _ET it should perform much worse,
but in practice its performance is essentially the same (the same applies 
to kevent without the KEVENT_REQ_ET flag - since the same socket is almost 
never used several times, it is essentially zero overhead to have or not 
have that flag set).

> Also, i have removed the set-nonblocking calls because they are not 
> needed under threadlets.
> 
> [ to build this code, copy it into the async-test/ directory and build
>   it there - or copy the *.h files from async-test/ directory into your
>   build directory. ]

Ok, right now I'm compiling kevent/threadlet tree on my test machines.

> 	Ingo

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: threadlets as 'naive pool of threads', epoll, some measurements
  2007-02-25 21:34                       ` threadlets as 'naive pool of threads', epoll, some measurements Ingo Molnar
@ 2007-02-26 10:45                         ` Ingo Molnar
  2007-02-26 11:48                           ` Ingo Molnar
  2007-02-26 21:22                           ` Davide Libenzi
  0 siblings, 2 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-02-26 10:45 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Evgeniy Polyakov, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner


update:

> i have tried the one Evgeniy provided in the URL:
> 
>   http://tservice.net.ru/~s0mbre/archive/kevent/evserver_epoll.c
> 
> and 'ab -k -c8000 -n80000' almost always aborts with:
> 
>   apr_socket_recv: Connection reset by peer (104)
> 
> in the few cases it finishes, i got the following epoll result, over 
> gigabit ethernet, on an UP Athlon64 box:
> 
>    evserver_epoll:     7800 reqs/sec

evserver_epoll.c had a number of bugs. The most serious one was the 
apparently buggy use of "EPOLLET" (edge-triggered events). Removing that 
and moving epoll to level-triggered (which is slower, but does not result 
in missed events) gives:

   evserver_epoll:       9400 reqs/sec
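
(for reference, the difference is a single flag at registration time - a 
sketch, with 'epfd' and 'fd' assumed to be set up already; note that with 
EPOLLET the handler must also drain the socket until EAGAIN, otherwise 
events can be lost:)

struct epoll_event ev;

ev.data.fd = fd;
/*
 * level-triggered (the default): the event is re-reported for as
 * long as the socket stays readable, so a partial read cannot
 * lose it:
 */
ev.events = EPOLLIN;
/*
 * edge-triggered would be: ev.events = EPOLLIN | EPOLLET; the
 * handler must then loop on recv() until it returns -1 with
 * errno == EAGAIN, or events are missed:
 */
epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);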

> the same with the most naive implementation of the same, using 
> threadlets:
> 
>    evserver_threadlet: 5800 reqs/sec

   evserver_epoll_threadlet:   9400 reqs/sec

as expected, the level of extra blocking triggered by this is low - even 
if the full request function runs without nonblock assumptions.

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-26 10:35                               ` Ingo Molnar
@ 2007-02-26 10:47                                 ` Evgeniy Polyakov
  2007-02-26 12:51                                   ` Ingo Molnar
  0 siblings, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-26 10:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Davide Libenzi, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Mon, Feb 26, 2007 at 11:35:17AM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> 
> > Btw, 'evserver' in the name means 'event server', so you might think 
> > about changing the name :)
> 
> why should i change the name? The 'outer' loop, which feeds requests to 
> threadlets, is an epoll based event loop. The inner loop, where all the 
> application complexity resides, is a threadlet. This is the "more 
> intelligent queueing" model i talked about in my reply to David 4 days 
> ago:
> 
>   http://lkml.org/lkml/2007/2/22/180
>   http://lkml.org/lkml/2007/2/22/191

:)
Ingo, of course it was a joke.

Even having the main dispatcher as an epoll/kevent loop, the _whole_
threadlet model is absolutely micro-thread in nature and not state
machine/event based. So it does not have events at all, especially with
the speculation about removing completion notifications - a
fire-and-forget model.

> 	Ingo

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: threadlets as 'naive pool of threads', epoll, some measurements
  2007-02-26 10:45                         ` Ingo Molnar
@ 2007-02-26 11:48                           ` Ingo Molnar
  2007-02-26 12:25                             ` Evgeniy Polyakov
  2007-02-26 21:22                           ` Davide Libenzi
  1 sibling, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-26 11:48 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Evgeniy Polyakov, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner


yet another performance update - with the fixed 'heaps of stupid 
threads' evserver_threadlet.c code attached below i got:

>    evserver_epoll:             9400 reqs/sec
>    evserver_epoll_threadlet:   9400 reqs/sec

     evserver_threadlet:         9000 reqs/sec

so the overhead, instead of the 10x slowdown Evgeniy predicted/feared, 
is 4% for this particular, very event-centric workload.

why? because Evgeniy still overlooks what i've mentioned so many times: 
that there is a lot of inherent 'caching' possible even in this 
particular '8000 clients' workload, which even the most stupid threadlet 
queueing model is able to take advantage of. The maximum level of 
parallelism that i've measured during this test was 161 threads.

	Ingo

-----------{ evserver_threadlet.c }--------------->
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/wait.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>
#include <linux/futex.h>
#include <linux/unistd.h>

#include <sys/poll.h>
#include <sys/sendfile.h>
#include <sys/epoll.h>

#define DEBUG           0

#include "syslet.h"
#include "sys.h"
#include "threadlet.h"

struct request {
	struct request *next_free;
	/*
	 * The threadlet stack is part of the request structure
	 * and is thus reused as threadlets complete:
	 */
	unsigned long threadlet_stack;

	/*
	 * These are all the request-specific parameters:
	 */
	long sock;
};

//#define ulog(f, a...) fprintf(stderr, f, ##a)
#define ulog(f, a...)
#define ulog_err(f, a...) printf(f ": %s [%d].\n", ##a, strerror(errno), errno)

static long handle_request(void *__req)
{
	struct request *req = __req;
	int s = req->sock, err, fd;
	off_t offset;
	int count;
	char path[] = "/tmp/index.html";
	char buf[4096];
	struct timeval tm;

	count = 40960;
	offset = 0;

	err = recv(s, buf, sizeof(buf), 0);
	if (err < 0) {
		ulog_err("Failed to read data from s=%d", s);
		goto err_out_remove;
	}
	if (err == 0) {
		gettimeofday(&tm, NULL);
		ulog("%08lu:%06lu: Client exited: fd=%d.\n", tm.tv_sec, tm.tv_usec, s);
		goto err_out_remove;
	}

	fd = open(path, O_RDONLY);
	if (fd == -1) {
		ulog_err("Failed to open '%s'", path);
		err = -1;
		goto err_out_remove;
	}
#if 0
	do {
		err = read(fd, buf, sizeof(buf));
		if (err <= 0)
			break;
		err = send(s, buf, err, 0);
		if (err <= 0)
			break;
	} while (1);
#endif
	err = sendfile(s, fd, &offset, count);
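	/*
	 * Clear TCP_CORK so that anything still queued by the
	 * sendfile() above is pushed out to the client immediately:
	 */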
	{
		int on = 0;
		setsockopt(s, SOL_TCP, TCP_CORK, &on, sizeof(on));
	}

	close(fd);
	if (err < 0) {
		ulog_err("Failed send %d bytes: fd=%d.\n", count, s);
		goto err_out_remove;
	}
	close(s);

	gettimeofday(&tm, NULL);
	ulog("%08lu:%06lu: %d bytes has been sent to client fd=%d.\n", tm.tv_sec, tm.tv_usec, err, s);

	return complete_threadlet_fn(req, &async_head);

err_out_remove:
	printf("blah\n");
	close(s);
	return complete_threadlet_fn(req, &async_head);
}

/*
 * Freelist to recycle requests:
 */
static struct request *freelist;

/*
 * Allocate a request and set up its syslet atoms:
 */
static struct request *alloc_req(void)
{
	struct request *req;

	/*
	 * Occasionally we have to refill the new-thread stack
	 * entry:
	 */
	if (!async_head.new_thread_stack) {
		async_head.new_thread_stack = thread_stack_alloc();
		pr("allocated new thread stack: %08lx\n",
			async_head.new_thread_stack);
	}

	if (freelist) {
		req = freelist;
		pr("reusing req %p, threadlet stack %08lx\n",
			req, req->threadlet_stack);
		freelist = freelist->next_free;
		req->next_free = NULL;
		return req;
	}

	req = calloc(1, sizeof(struct request));
	pr("allocated req %p\n", req);
	req->threadlet_stack = thread_stack_alloc();
	pr("allocated thread stack %08lx\n", req->threadlet_stack);

	return req;
}

/*
 * Check whether there are any completions queued for user-space
 * to finish up:
 */
static unsigned long complete(void)
{
	unsigned long completed = 0;
	struct request *req;

	for (;;) {
		req = (void *)completion_ring[async_head.user_ring_idx];
		if (!req)
			return completed;
		completed++;
		pr("completed req %p (threadlet stack %08lx)\n",
			req, req->threadlet_stack);

		req->next_free = freelist;
		freelist = req;

		/*
		 * Clear the completion pointer. To make sure the
		 * kernel never stomps upon still unhandled completions
		 * in the ring the kernel only writes to a NULL entry,
		 * so user-space has to clear it explicitly:
		 */
		completion_ring[async_head.user_ring_idx] = NULL;
		async_head.user_ring_idx++;
		if (async_head.user_ring_idx == MAX_PENDING)
			async_head.user_ring_idx = 0;
	}
}

static unsigned int pending_requests;

/*
 * Handle a request that has just been submitted (either it has
 * already been executed, or we have to account it as pending):
 */
static void handle_submitted_request(struct request *req, long done)
{
	unsigned int nr;

	if (done) {
		/*
		 * This is the cached case - free the request:
		 */
		pr("cache completed req %p (threadlet stack %08lx)\n",
			req, req->threadlet_stack);
		req->next_free = freelist;
		freelist = req;
		return;
	}
	/*
	 * 'cachemiss' case - the syslet is not finished
	 * yet. We will be notified about its completion
	 * via the completion ring:
	 */
	assert(pending_requests < MAX_PENDING-1);

	pending_requests++;
	pr("req %p is pending. %d reqs pending.\n", req, pending_requests);
	/*
	 * Attempt to complete requests - this is a fast
	 * check if there's no completions:
	 */
	nr = complete();
	pending_requests -= nr;

	/*
	 * If the ring is full then wait a bit:
	 */
	while (pending_requests == MAX_PENDING-1) {
		pr("sys_async_wait()");
		/*
		 * Wait for 4 events - to batch things a bit:
		 */
		sys_async_wait(4, async_head.user_ring_idx, &async_head);
		nr = complete();
		pending_requests -= nr;
		pr("after wait: completed %d requests - still pending: %d\n",
			nr, pending_requests);
	}
}

static void webserver(int l_sock)
{
	struct sockaddr addr;
	struct request *req;
	socklen_t addrlen;
	long done;
	int sock;

	async_head_init();

	/*
	 * Main accept loop: every accepted connection is handed to
	 * a threadlet, which executes synchronously as long as it
	 * does not block:
	 */
	for (;;) {
		req = alloc_req();
		if (!req)
			break;

		addrlen = sizeof(addr);
		sock = accept(l_sock, &addr, &addrlen);

		if (sock < 0)
			break;

		req->sock = sock;
		done = threadlet_exec(handle_request, req,
				req->threadlet_stack, &async_head);

		handle_submitted_request(req, done);
	}
}

static int tcp_listen_socket(int *l_socket)
{
	int ret;
	int one = 1;
	struct sockaddr_in ad;

	/*
	 * Setup for server socket
	 */
	memset(&ad, 0, sizeof(ad));
	ad.sin_family = AF_INET;
	ad.sin_addr.s_addr = htonl(INADDR_ANY);
	ad.sin_port = htons(2222);

	*l_socket = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
	if (*l_socket < 0) {
		perror("socket (server)");
		return *l_socket;
	}

	ret = setsockopt(*l_socket, SOL_SOCKET, SO_REUSEADDR, (char *)&one,
			sizeof(int));
	if (ret < 0) {
		perror("setsockopt");
		return ret;
	}

	ret = bind(*l_socket, (struct sockaddr *)&ad, sizeof(ad));
	if (ret < 0) {
		perror("bind");
		return ret;
	}

	ret = listen(*l_socket, 1000);
	if (ret < 0) {
		perror("listen");
		return ret;
	}

	return 0;
}

int main(int argc, char *argv[])
{
	int ret;
	int l_sock;
	struct sched_param p = { .sched_priority = 0 };

	sched_setscheduler(getpid(), 3 /* SCHED_BATCH */, &p);

	ret = tcp_listen_socket(&l_sock);
	if (ret)
		exit(ret);
	printf("listening on port 2222\n");
	printf("using threadlets\n");

	webserver(l_sock);
	async_head_exit();

	exit(0);
}

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: threadlets as 'naive pool of threads', epoll, some measurements
  2007-02-26 11:48                           ` Ingo Molnar
@ 2007-02-26 12:25                             ` Evgeniy Polyakov
  2007-02-26 12:50                               ` Ingo Molnar
  0 siblings, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-26 12:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Davide Libenzi, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Mon, Feb 26, 2007 at 12:48:58PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> 
> yet another performance update - with the fixed 'heaps of stupid 
> threads' evserver_threadlet.c code attached below i got:
> 
> >    evserver_epoll:             9400 reqs/sec
> >    evserver_epoll_threadlet:   9400 reqs/sec
> 
>      evserver_threadlet:         9000 reqs/sec
> 
> so the overhead, instead of the 10x slowdown Evgeniy predicted/feared, 
> is 4% for this particular, very event-centric workload.
> 
> why? because Evgeniy still overlooks what i've mentioned so many times: 
> that there is a lot of inherent 'caching' possible even in this 
> particular '8000 clients' workload, which even the most stupid threadlet 
> queueing model is able to take advantage of. The maximum level of 
> parallelism that i've measured during this test was 161 threads.

:)

I feared _ONLY_ the situation when thousands of threads are eating my brain
- so the case when 161 threads are running simultaneously is not that bad
compared to what a micro-design can do (at its best/worst) at all!

So, caching is good - threadlets do not spawn a new thread and kevent
returns immediately; but when things are not that shiny, threadlets
spawn a new thread, while kevent processes the next request or waits
for all completed ones.

I'm a bit stuck right now with my benchmarks - the Intel Core 2 machine
requires reinstallation (it is installed with the amd64 arch of Debian
testing, and the admins at my paid work have throttled my internet
connection down to miserable bytes per second - who said that a
hacked/social-engineered 1mb/sec could live forever? - so expect it
tomorrow); the VIA Epia one is under stress testing right now.

> 	Ingo

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-25 19:42                         ` Evgeniy Polyakov
  2007-02-25 20:38                           ` Ingo Molnar
@ 2007-02-26 12:39                           ` Ingo Molnar
  2007-02-26 14:05                             ` Evgeniy Polyakov
  2007-02-26 19:47                           ` Davide Libenzi
  2 siblings, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-26 12:39 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ulrich Drepper, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner


* Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> > > Kevent is a _very_ small entity and there is _no_ cost of 
> > > requeueing (well, there is list_add guarded by lock) - after it is 
> > > done, process can start real work. With rescheduling there are 
> > > _too_ many things to be done before we can start new work. [...]
> > 
> > actually, no. For example a wakeup too is fundamentally a list_add 
> > guarded by a lock. Take a look at try_to_wake_up(). The rest you see 
> > there is just extra frills that relate to things like 
> > 'load-balancing the requests over multiple CPUs [which i'm sure 
> > kevent users would request in the future too]'.
> 
> wake_up() as a call is pretty simple and fast, but its result - it is 
> slow. [...]

You are still very much wrong, and now you refuse to even /read/ what i 
wrote. Your only reply to my detailed analysis is: "it is slow, because 
it is slow and heavy". I told you how fast it is, i told you what 
happens on a context switch and why, i told you that you can measure it 
if you want.

> [...] I did not run rescheduling tests with kernel threads, but posix 
> threads (they do look like a kernel thread) have significant overhead 
> there.

You are wrong. Let me show you some more numbers. This is a 
hackbench_pth.c run:

  $ ./hackbench_pth 500
  Time: 14.371

this uses 20,000 real threads and during this test the runqueue length 
is extreme - up to over ten thousand threads. (hackbench_pth.c was 
posted to lkml recently.)

The same run with hackbench.c (20,000 forked processes):

  $ ./hackbench 500
  Time: 14.632

so the TLB overhead from using processes is 1.8%.

> [...] In the early development days of the M:N threading library I tested 
> rescheduling performance of the POSIX threads - I created pool of 
> threads and 'sent' a message using futex wait/wake - such performance 
> of the userspace threading library (I tested erlang) was 10 times 
> slower.

how much would it take for you to actually re-measure it and interpret 
the results you are seeing? You've apparently built a whole mental house 
of cards on the flawed proposition that tasks are 'super-heavy' and that 
context-switching them is 'slow'. You are unwilling to explain /how/ 
they are slow, and all the numbers i post are contrary to that 
proposition of yours.

your whole reasoning seems to be faith-based:

[...] Anyway, kevents are very small, threads are very big, [...]

How about following the scientific method instead?

> [...] and both are the way they are exactly on purpose - threads serve 
> for processing of any generic code, kevents are used for event waiting 
> - IO is such an event, it does not require a lot of infrastructure to 
> handle, it only needs some simple bits, so it can be optimized to be 
> extremely fast, with huge infrastructure behind each IO (like in case 
> when it is a separated thread) it can not be done effectively.

you are wrong, and i have pointed out to you in my previous replies 
why you are wrong. Your only coherent specific thought on this topic was 
your incorrect assumption that the scheduler somehow saves registers 
and that this makes it heavy. I pointed out to you in the mail you 
reply to that /every/ system call saves user registers. You've not 
even replied to that point of mine, you are ignoring it completely and 
you are still repeating your same old, incorrect argument. If it is 
heavy, /why/ do you think it is heavy? Where is that magic pixie-dust 
piece of scheduler code that miraculously turns the runqueue into a 
molasses-slow, heavy piece of thing?

Or put another way: your test-code does ~6 syscalls per event. So if 
what you said were true (which it isn't), a kevent based request would 
have to be just as slow as a thread based request ...

> > i think you are really, really mistaken if you believe that the fact 
> > that whole tasks/threads or processes can be 'monster structures', 
> > somehow has any relevance to scheduling/task-queueing performance 
> > and scalability. It does not matter how large a task's address space 
> > is - scheduling only relates to the minimal context that is in the 
> > CPU. And most of that context we save upon /every system call 
> > entry/, and restore it upon every system call return. If it's so 
> > expensive to manipulate, why can the Linux kernel do a full system 
> > call in ~150 cycles? That's cheaper than the access latency to a 
> > single DRAM page.
> 
> I meant not its size, but the whole infrastructure which surrounds a 
> task. [...]

/what/ infrastructure do you mean? sched.c? Most of that never runs in 
the scheduler hotpath.

> [...] If it is that lightweight, why don't we have posix thread per 
> IO? [...]

because it would be pretty stupid to do that?

But more importantly: because many people still believe 'the scheduler 
is slow and context-switching is evil'? The FIO AIO syslet code from 
Jens is an intelligent mix of queueing combined with async execution. I 
expect that model to prevail.

> [...] One question is that mmap/allocation of the stack is too slow 
> (and it is very slow indeed, that is why glibc and M:N threading lib 
> caches allocated stacks), another one is kernel/userspace boundary 
> crossing, next one are tlb flushes, then copies.

now you come up again with creation overhead, but nobody is talking about 
context creation overhead. (Btw., you are also wrong if you think that 
mmap() is all that slow - try measuring it one day.) We were talking 
about context /switching/.

> Why is userspace rescheduling on the order of tens of times faster than 
> kernel/user?

what on earth does this have to do with the topic of whether context 
switches are fast enough? Or if you like random info, just let me throw 
in a random piece of information as well:

user-space function calls are more than /two/ orders of magnitude faster 
than system calls. Still you are using 6 ... SIX system calls in the 
sample kevent request handling hotpath.
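
(a quick sketch to measure that claim yourself - the loop count is 
arbitrary, and syscall(SYS_getpid) is used to bypass any glibc caching 
of getpid():)

#include <stdio.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/syscall.h>

#define LOOPS 10000000L

/* a trivial user-space function, for comparison: */
static long __attribute__((noinline)) nop_fn(long x)
{
	return x + 1;
}

static double now(void)
{
	struct timeval tv;

	gettimeofday(&tv, NULL);
	return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(void)
{
	volatile long sum = 0;
	double t0, t1, t2;
	long i;

	t0 = now();
	for (i = 0; i < LOOPS; i++)
		sum += nop_fn(i);
	t1 = now();
	for (i = 0; i < LOOPS; i++)
		sum += syscall(SYS_getpid);	/* a real system call */
	t2 = now();

	printf("function call: %.1f ns, syscall: %.1f ns\n",
		(t1 - t0) / LOOPS * 1e9, (t2 - t1) / LOOPS * 1e9);

	return 0;
}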

> > for the same reason has it no relevance that the full kevent-based 
> > webserver is a 'monster structure' - still a single request's basic 
> > queueing operation is cheap. The same is true to tasks/threads.
> 
> To move those tasks too many steps must be done, and although each 
> one can be quite fast, the whole process of rescheduling in the case 
> of thousands of running threads creates too big an overhead per task, 
> dropping performance.

again, please come up with specifics! I certainly came up with enough 
specifics.

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: threadlets as 'naive pool of threads', epoll, some measurements
  2007-02-26 12:25                             ` Evgeniy Polyakov
@ 2007-02-26 12:50                               ` Ingo Molnar
  2007-02-26 14:32                                 ` Evgeniy Polyakov
  0 siblings, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-26 12:50 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Davide Libenzi, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner


* Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> > yet another performance update - with the fixed 'heaps of stupid 
> > threads' evserver_threadlet.c code attached below i got:
> > 
> > >    evserver_epoll:             9400 reqs/sec
> > >    evserver_epoll_threadlet:   9400 reqs/sec
> > 
> >      evserver_threadlet:         9000 reqs/sec
> > 
> > so the overhead, instead of the 10x slowdown Evgeniy 
> > predicted/feared, is 4% for this particular, very event-centric 
> > workload.
> > 
> > why? because Evgeniy still overlooks what i've mentioned so many 
> > times: that there is a lot of inherent 'caching' possible even in 
> > this particular '8000 clients' workload, which even the most stupid 
> > threadlet queueing model is able to take advantage of. The maximum 
> > level of parallelism that i've measured during this test was 161 
> > threads.
> 
> :)
> 
> I feared _ONLY_ the situation when thousands of threads are eating my 
> brain - so the case when 161 threads are running simultaneously is not 
> that bad compared to what a micro-design can do (at its best/worst) at 
> all!

even with ten thousand threads it is still pretty fast. Certainly not 
'10 times slower' as you claimed. And it takes only a single, trivial 
outer event loop to lift it up to the performance levels of a pure event 
based server.

conclusion: currently i don't see a compelling need for the kevents 
subsystem. epoll is a pretty nice API and it covers most of the event 
sources and nicely builds upon our existing poll() infrastructure.

furthermore, i very much contest your claim that a high-performance, 
highly scalable webserver needs a kevent+nonblock design. Even if i 
ignore all the obvious usability and maintenance-cost advantages of 
threadlets.

> So, caching is good - threadlets do not spawn a new thread and kevent 
> returns immediately; but when things are not that shiny, threadlets 
> spawn a new thread, while kevent processes the next request or waits 
> for all completed ones.

no. Please read the evserver_threadlet.c code. There's no kevent in 
there. There's no epoll() in there. All that you can see there is the 
natural behavior of pure threadlets. And it's not a workload /I/ picked 
for threadlets - it is a workload, filesize, parallelism level and 
request handling function /you/ picked for "event-servers".

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-26 10:47                                 ` Evgeniy Polyakov
@ 2007-02-26 12:51                                   ` Ingo Molnar
  2007-02-26 16:46                                     ` Evgeniy Polyakov
  0 siblings, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-26 12:51 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Davide Libenzi, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner


* Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> Even having the main dispatcher as an epoll/kevent loop, the _whole_ 
> threadlet model is absolutely micro-thread in nature and not state 
> machine/event based.

Evgeniy, i'm not sure how many different ways to tell this to you, but 
you are not listening, you are not learning and you are still not 
getting it at all.

The scheduler /IS/ a generic work/event queue. And it's pretty damn 
fast. No amount of badmouthing will change that basic fact. Not exactly 
as fast as a special-purpose queueing system (for all the reasons i 
outlined to you, and which you ignored), but it gets pretty damn close 
even for the web workload /you/ identified, and offers a user-space 
programming model that is about 1000 times more useful than 
state-machines.

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-25 22:44           ` Linus Torvalds
@ 2007-02-26 13:11             ` Ingo Molnar
  2007-02-26 17:37               ` Evgeniy Polyakov
  2007-02-26 17:28             ` Evgeniy Polyakov
  1 sibling, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-26 13:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Evgeniy Polyakov, Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> > My tests show that with 4k connections per second (8k concurrency) 
> > more than 20k connections of 80k total block in tcp_sendmsg() over 
> > gigabit lan between quite fast machines.
> 
> Why do people *keep* taking this up as an issue?
> 
> Use select/poll/epoll/kevent/whatever for event mechanisms. STOP 
> CLAIMING that you'd use threadlets/syslets/aio for that. It's been 
> pointed out over and over and over again, and yet you continue to make 
> the same mistake, Evgeniy.
> 
> So please read that sentence ten times, and then don't continue to 
> make that same mistake. PLEASE.
> 
> Event mechanisms are *superior* for events. But they *suck* for things 
> that aren't events, but are actual code execution with random places 
> that can block. THE TWO THINGS ARE TOTALLY AND UTTERLY INDEPENDENT!

Note that even for something tasks are supposed to suck at, and even if 
used in extremely stupid ways, they perform reasonably well in practice 
;-)

And i fully agree: specialization based on knowledge about the frequency 
of blocking will always be useful - if not /forced/ on the workflow 
architecture and if not overdone. On the other hand, fully event-driven 
servers based on 'nonblocking' calls, which Evgeniy is advocating and 
which the kevent model is forcing upon userspace, are pure madness.

We very much can and should use things like epoll for events that we 
expect to happen asynchronously 100% of the time - it just makes no 
sense for those events to take up 4-5K of RAM apiece, when they could 
also be only using up the 32 bytes that say a pending timer takes. I've 
posted the code for that: how to do an 'outer' epoll loop around an 
internal threadlet iterator. But those will always be very narrow event 
sources, and likely won't (and shouldn't) cover 'request-internal' 
processing.

but otherwise, there is no real difference between a task that is 
scheduled and a request that is queued, 'other' than the size of the 
request (the task takes 4-5K of RAM), and the register context (64-128 
bytes on most CPUs, the loading of which is optimized to death).

That difference can still be significant for certain workloads, so we 
certainly don't want to prohibit specialized event interfaces and force 
generic threads on everything. But for anything that isn't a raw and 
natural external event source (time, network, disk, user-generated) 
there shouldn't be much of an event queueing abstraction i believe (other 
than what we get 'for free' within epoll, from having poll()-able files) 
- and even for those event sources threadlets offer a pretty good run 
for the money.

one can always find the point and workload where say 40,000 threads 
start thrashing the L2 cache, but where 40,000 queued special requests 
are still fully in cache, and produce spectacular numbers.

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
  2007-02-23 17:13           ` Suparna Bhattacharya
  2007-02-23 18:35             ` Jens Axboe
@ 2007-02-26 13:57             ` Jens Axboe
  2007-02-26 14:13               ` Suparna Bhattacharya
  2007-02-26 21:40               ` Davide Libenzi
  1 sibling, 2 replies; 337+ messages in thread
From: Jens Axboe @ 2007-02-26 13:57 UTC (permalink / raw)
  To: Suparna Bhattacharya
  Cc: Ingo Molnar, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller, Davide Libenzi,
	Thomas Gleixner


Some more results, using a larger number of processes and io depths. A
repeat of the tests from friday, with added depth 20000 for syslet and
libaio:

Engine          Depth   Processes       Bw (MiB/sec)
----------------------------------------------------
libaio            1         1            602
syslet            1         1            759
sync              1         1            776
libaio           32         1            832
syslet           32         1            898
libaio        20000         1            581
syslet        20000         1            609

syslet still on top. Measuring O_DIRECT reads (of 4kb size) on ramfs
with 100 processes each with a depth of 200, reading a per-process
private file of 10mb (need to fit in my ram...) 10 times each. IOW,
doing 10,000MiB of IO in total:

Engine          Depth   Processes       Bw (MiB/sec)
----------------------------------------------------
libaio           200       100            1488
syslet           200       100            1714
sync             200       100            1843

Results are stable to within approx +/- 10MiB/sec. The syslet case
completes a whole second faster than libaio (~6 vs ~7 seconds). Testing
was done with fio HEAD eb7c8ae27bc301b77490b3586dd5ccab7c95880a, and it
uses the v4 patch series.
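
For reference, a fio job description along these lines should reproduce 
the 200/100 ramfs test above - this is a sketch, not the actual job file 
used, and the exact ioengine name for syslets depends on the fio version:

; 100 processes, iodepth 200, 4kb O_DIRECT reads,
; a private 10mb file per process, read 10 times each
[global]
ioengine=libaio
iodepth=200
direct=1
bs=4k
rw=read
size=10m
loops=10
directory=/ramfs

[test]
numjobs=100
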

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-26 12:39                           ` Ingo Molnar
@ 2007-02-26 14:05                             ` Evgeniy Polyakov
  2007-02-26 14:15                               ` Ingo Molnar
  0 siblings, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-26 14:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ulrich Drepper, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

On Mon, Feb 26, 2007 at 01:39:23PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> 
> > > > Kevent is a _very_ small entity and there is _no_ cost of 
> > > > requeueing (well, there is list_add guarded by lock) - after it is 
> > > > done, process can start real work. With rescheduling there are 
> > > > _too_ many things to be done before we can start new work. [...]
> > > 
> > > actually, no. For example a wakeup too is fundamentally a list_add 
> > > guarded by a lock. Take a look at try_to_wake_up(). The rest you see 
> > > there is just extra frills that relate to things like 
> > > 'load-balancing the requests over multiple CPUs [which i'm sure 
> > > kevent users would request in the future too]'.
> > 
> > wake_up() as a call is pretty simple and fast, but its result - it is 
> > slow. [...]
> 
> You are still very much wrong, and now you refuse to even /read/ what i 
> wrote. Your only reply to my detailed analysis is: "it is slow, because 
> it is slow and heavy". I told you how fast it is, i told you what 
> happens on a context switch and why, i told you that you can measure it 
> if you want.

Ingo, you likely will not believe it, but your mails are among the
several which I always read multiple times to get every bit of them :)

I clearly understand your point of view; it is absolutely clear
for me. But... I cannot agree with it.

Both because of the theoretical point of view and because of the
practical one concerning my measurements. It is not pure speculation,
as one might expect, but a real-life comparison of kernel/user scheduling
with a pure userspace one (as in my own M:N threading lib or a concurrent
programming language like Erlang). For me (and probably _only_ for me),
it is enough that some lib shows 10 times faster rescheduling to start
developing my own, so I pointed to it in the discussion.

> > [...] I did not run rescheduling tests with kernel threads, but posix 
> > threads (they do look like a kernel thread) have significant overhead 
> > there.
> 
> You are wrong. Let me show you some more numbers. This is a 
> hackbench_pth.c run:
> 
>   $ ./hackbench_pth 500
>   Time: 14.371
> 
> this uses 20,000 real threads and during this test the runqueue length 
> is extreme - up to over ten thousand threads. (hackbench_pth.c was 
> posted to lkml recently.)
> 
> The same run with hackbench.c (20,000 forked processes):
> 
>   $ ./hackbench 500
>   Time: 14.632
> 
> so the TLB overhead from using processes is 1.8%.
> 
> > [...] In the early development days of the M:N threading library I tested 
> > rescheduling performance of the POSIX threads - I created pool of 
> > threads and 'sent' a message using futex wait/wake - such performance 
> > of the userspace threading library (I tested erlang) was 10 times 
> > slower.
> 
> how much would it take for you to actually re-measure it and interpret 
> the results you are seeing? You've apparently built a whole mental house 
> of cards on the flawed proposition that tasks are 'super-heavy' and that 
> context-switching them is 'slow'. You are unwilling to explain /how/ 
> they are slow, and all the numbers i post are contrary to that 
> proposition of yours.
> 
> your whole reasoning seems to be faith-based:
> 
> [...] Anyway, kevents are very small, threads are very big, [...]
> 
> How about following the scientific method instead?

Those were only rhetorical words, as you have understood I bet; I meant
that the whole process of getting a readiness notification from kevent is
way faster than rescheduling a new process/thread to handle that IO.

The whole process of switching from one process to another can be as fast
as bloody hell, but all the other details just kill the thing.

I do not know what exact line ends up being the problem, but with thousands
of threads, rescheduling itself results in slower performance than
userspace rescheduling. Then I extrapolate it to our IO test case.

> > [...] and both are the way they are exactly on purpose - threads serve 
> > for processing of any generic code, kevents are used for event waiting 
> > - IO is such an event, it does not require a lot of infrastructure to 
> > handle, it only needs some simple bits, so it can be optimized to be 
> > extremely fast, with huge infrastructure behind each IO (like in case 
> > when it is a separated thread) it can not be done effectively.
> 
> you are wrong, and i have pointed out to you in my previous replies 
> why you are wrong. Your only coherent specific thought on this topic was 
> your incorrect assumption that the scheduler somehow saves registers 
> and that this makes it heavy. I pointed out to you in the mail you 
> reply to that /every/ system call saves user registers. You've not 
> even replied to that point of mine, you are ignoring it completely and 
> you are still repeating your same old, incorrect argument. If it is 
> heavy, /why/ do you think it is heavy? Where is that magic pixie-dust 
> piece of scheduler code that miraculously turns the runqueue into a 
> molasses-slow, heavy piece of thing?

I do not argue that I'm absolutely right.
I just point out that I tested some cases, and those tests ended up with
completely broken behaviour for the micro-thread design (even besides the
case of thousands of new thread creations/reuses per second, which itself
does not look perfect).

I do not even try to say that threadlets suck (although I do believe
they do in some cases, at least for now :) ), I just point out that
the rescheduling overhead happens to be too big when it comes to the
benchmarks I ran (to which you never replied, but that does not matter
after all :).

It can end up being a (handwaving) broken syscall wrapper implementation,
or anything else. Absolutely.

I never tried to say that the scheduler's code is broken - I just showed
my own tests, which resulted in a situation where many working threads
can end up with worse timings than some other case.

Registers/tlb/whatever is just speculation about the _possible_ root of
the problem. I did not investigate the problem enough - I just decided to
implement a different library. Shame on me for that, since I never showed
what exactly the root of the problem is, but for _me_ it is enough, so I'm
trying to share it with you and other developers.

> Or put another way: your test-code does ~6 syscalls per event. So if 
> what you said were true (which it isn't), a kevent based request would 
> have to be just as slow as a thread based request ...

I can neither confirm nor object to this sentence.

> > > i think you are really, really mistaken if you believe that the fact 
> > > that whole tasks/threads or processes can be 'monster structures', 
> > > somehow has any relevance to scheduling/task-queueing performance 
> > > and scalability. It does not matter how large a task's address space 
> > > is - scheduling only relates to the minimal context that is in the 
> > > CPU. And most of that context we save upon /every system call 
> > > entry/, and restore it upon every system call return. If it's so 
> > > expensive to manipulate, why can the Linux kernel do a full system 
> > > call in ~150 cycles? That's cheaper than the access latency to a 
> > > single DRAM page.
> > 
> > I meant not its size, but the whole infrastructure which surrounds a 
> > task. [...]
> 
> /what/ infrastructure do you mean? sched.c? Most of that never runs in 
> the scheduler hotpath.
>
> > [...] If it is that lightweight, why don't we have posix thread per 
> > IO? [...]
> 
> because it would be pretty stupid to do that?
> 
> But more importantly: because many people still believe 'the scheduler 
> is slow and context-switching is evil'? The FIO AIO syslet code from 
> Jens is an intelligent mix of queueing combined with async execution. I 
> expect that model to prevail.

Suparna showed its problems - although on an older version. 
Let's see other tests.

> > [...] One question is that mmap/allocation of the stack is too slow 
> > (and it is very slow indeed, that is why glibc and M:N threading lib 
> > caches allocated stacks), another one is kernel/userspace boundary 
> > crossing, next one are tlb flushes, then copies.
> 
> now you come up again with creation overhead, but nobody is talking about 
> context creation overhead. (Btw., you are also wrong if you think that 
> mmap() is all that slow - try measuring it one day.) We were talking 
> about context /switching/.

Ugh, Ingo, do not speak in absolutes...
I did measure it. And it is slow.
http://tservice.net.ru/~s0mbre/blog/2007/01/15#2007_01_15

> > Why is userspace rescheduling on the order of tens of times faster than 
> > kernel/user?
> 
> what on earth does this have to do with the topic of whether context 
> switches are fast enough? Or if you like random info, just let me throw 
> in a random piece of information as well:
> 
> user-space function calls are more than /two/ orders of magnitude faster 
> than system calls. Still you are using 6 ... SIX system calls in the 
> sample kevent request handling hotpath.

I can only laugh at that, Ingo :)
If you are ever in Moscow, I will buy you a beer, just drop me a mail.

What are we talking about, Ingo: kevent and IO in thread contexts, 
or userspace vs. kernelspace scheduling?

Kevent can be broken as hell, it can be a stupid application which does
not work at all - that is one of the possible theories.
Practice, however, shows that it is not true.

Anyway, if we are talking about kevents and micro-threads, that is one
point; if we are talking about the possible overhead of rescheduling -
that is another topic.

> > > for the same reason has it no relevance that the full kevent-based 
> > > webserver is a 'monster structure' - still a single request's basic 
> > > queueing operation is cheap. The same is true to tasks/threads.
> > 
> > To move those tasks too many steps must be done, and although each 
> > one can be quite fast, the whole process of rescheduling in the case 
> > of thousands of running threads creates too big an overhead per task, 
> > dropping performance.
> 
> again, please come up with specifics! I certainly came up with enough 
> specifics.

I thought I showed it several times already.
Anyway: http://tservice.net.ru/~s0mbre/blog/2006/11/09#2006_11_09

That is an initial step, which shows that rescheduling of threads
(I am NOT talking about problems in sched.c, Ingo - that would be somehow
stupid, although it could be right) has some overhead compared to userspace
rescheduling. If so, it can be eliminated or reduced.

Second (a COMPLETELY DIFFERENT STARTING POINT):
if rescheduling has some overhead, is it possible to reduce it using a
different model for IO? So I created kevent (as you likely do not know,
the original idea was a bit different - network AIO, but the result is
quite good).

> 	Ingo

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
  2007-02-26 13:57             ` Jens Axboe
@ 2007-02-26 14:13               ` Suparna Bhattacharya
  2007-02-26 14:18                 ` Ingo Molnar
  2007-02-26 14:45                 ` Jens Axboe
  2007-02-26 21:40               ` Davide Libenzi
  1 sibling, 2 replies; 337+ messages in thread
From: Suparna Bhattacharya @ 2007-02-26 14:13 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ingo Molnar, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller, Davide Libenzi,
	Thomas Gleixner

On Mon, Feb 26, 2007 at 02:57:36PM +0100, Jens Axboe wrote:
> 
> Some more results, using a larger number of processes and io depths. A
> repeat of the tests from friday, with added depth 20000 for syslet and
> libaio:
> 
> Engine          Depth   Processes       Bw (MiB/sec)
> ----------------------------------------------------
> libaio            1         1            602
> syslet            1         1            759
> sync              1         1            776
> libaio           32         1            832
> syslet           32         1            898
> libaio        20000         1            581
> syslet        20000         1            609
> 
> syslet still on top. Measuring O_DIRECT reads (of 4kb size) on ramfs
> with 100 processes each with a depth of 200, reading a per-process
> private file of 10mb (need to fit in my ram...) 10 times each. IOW,
> doing 10,000MiB of IO in total:

But why ramfs? Don't we want to exercise the case where O_DIRECT actually
blocks? Or am I missing something here?

Regards
Suparna

> 
> Engine          Depth   Processes       Bw (MiB/sec)
> ----------------------------------------------------
> libaio           200       100            1488
> syslet           200       100            1714
> 
> Results are stable to within approx +/- 10MiB/sec. The syslet case
> completes a whole second faster than libaio (~6 vs ~7 seconds). Testing
> was done with fio HEAD eb7c8ae27bc301b77490b3586dd5ccab7c95880a, and it
> uses the v4 patch series.
> 
> Engine          Depth   Processes       Bw (MiB/sec)
> ----------------------------------------------------
> libaio            200       100            1488
> syslet            200       100            1714
> sync              200       100            1843
> 
> -- 
> Jens Axboe

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India


^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-26 14:05                             ` Evgeniy Polyakov
@ 2007-02-26 14:15                               ` Ingo Molnar
  2007-02-26 16:55                                 ` Evgeniy Polyakov
  0 siblings, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-26 14:15 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ulrich Drepper, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner


* Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> > your whole reasoning seems to be faith-based:
> > 
> > [...] Anyway, kevents are very small, threads are very big, [...]
> > 
> > How about following the scientific method instead?
> 
> Those were only rhetorical words, as you have understood I bet; I meant 
> that the whole process of getting a readiness notification from kevent 
> is way faster than rescheduling a new process/thread to 
> handle that IO.
> 
> The whole process of switching from one process to another can be as 
> fast as bloody hell, but all the other details just kill the thing.

for our primary abstractions there /IS NO OTHER DETAIL/ but wakeup and 
context-switching! The "event notification" of a sys_read() /IS/ the 
wakeup and context-switching that we do - or the epoll/kevent enqueueing 
as an alternative.

yes, the two are still different in a number of ways, and yes, it's 
still stupid to do a pool of thousands of threads and thus we can always 
optimize queuing, RAM and cache footprint via specialization, but your 
whole foundation seems to be constructed around the false notion that 
queueing and scheduling a task by the scheduler is somehow magically 
expensive and different from queueing and scheduling other types of 
requests. Please reconsider that foundation and open up a bit more to a 
slightly different world view: scheduling is really just another, more 
generic (and thus certainly more expensive) type of 'request queueing', 
and user-space, most of the time, is much better off if it handles its 
'requests' and 'events' via tasks. (Especially if many of those 'events' 
turn out to be non-events at all, so to speak.)

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
  2007-02-26 14:13               ` Suparna Bhattacharya
@ 2007-02-26 14:18                 ` Ingo Molnar
  2007-02-26 14:45                 ` Jens Axboe
  1 sibling, 0 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-02-26 14:18 UTC (permalink / raw)
  To: Suparna Bhattacharya
  Cc: Jens Axboe, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller, Davide Libenzi,
	Thomas Gleixner


* Suparna Bhattacharya <suparna@in.ibm.com> wrote:

> > syslet still on top. Measuring O_DIRECT reads (of 4kb size) on ramfs 
> > with 100 processes each with a depth of 200, reading a per-process 
> > private file of 10mb (need to fit in my ram...) 10 times each. IOW, 
> > doing 10,000MiB of IO in total:
> 
> But why ramfs? Don't we want to exercise the case where O_DIRECT 
> actually blocks? Or am I missing something here?

ramfs is just the easiest way to measure the pure CPU overhead of a 
workload without real IO delays (and resulting idle time) getting in the 
way. It's certainly not the same thing, but still it's pretty useful 
most of the time. I used a different method, loopback block device, and 
got similar results. [ Real IO shows similar results as well, but is a 
lot more noisy and hence harder to interpret (and thus easier to get 
wrong). ]

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: threadlets as 'naive pool of threads', epoll, some measurements
  2007-02-26 12:50                               ` Ingo Molnar
@ 2007-02-26 14:32                                 ` Evgeniy Polyakov
  2007-02-26 20:23                                   ` Ingo Molnar
  0 siblings, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-26 14:32 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Davide Libenzi, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Mon, Feb 26, 2007 at 01:50:54PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> > I feared _ONLY_ the situation when thousands of threads are eating my 
> > brain - so the case when 161 threads are running simultaneously is not 
> > that bad compared to what a micro-design can do (at its best/worst) at 
> > all!
> 
> even with ten thousand threads it is still pretty fast. Certainly not 
> '10 times slower' as you claimed. And it takes only a single, trivial 
> outer event loop to lift it up to the performance levels of a pure event 
> based server.

I did not claim that it will be 10 times slower, I said that it will be
slower; my '10 times slower', which is actually '15% of the total time',
is a reply to your 'fast as sync' model - no need to repaint the picture :)

> conclusion: currently i dont see a compelling need for the kevents 
> subsystem. epoll is a pretty nice API and it covers most of the event 
> sources and nicely builds upon our existing poll() infrastructure.
>
> furthermore, i very much contest your claim that a high-performance, 
> highly scalable webserver needs a kevent+nonblock design. Even if i 
> ignore all the obvious usability and maintainance-cost advantages of 
> threadlets.

Ok, I see your point: you are insulting something you never tried to
understand; that is your right.

> > So, caching is good - threadlets do not spawn a new thread, kevent 
> > returns immediately, but in case of things are not that shine - 
> > threadlets spawnd a new thread, while kevent process next request or 
> > waits for all completed.
> 
> no. Please read the evserver_threadlet.c code. There's no kevent in 
> there. There's no epoll() in there. All that you can see there is the 
> natural behavior of pure threadlets. And it's not a workload /I/ picked 
> for threadlets - it is a workload, filesize, parallelism level and 
> request handling function /you/ picked for "event-servers".

I know that there are no kevents there - it would be really strange if
you tested it in your environment after all those empty kevent
releases.

Enough; you say the micro-thread design is superior - ok, that is your
point.

> 	Ingo

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
  2007-02-26 14:13               ` Suparna Bhattacharya
  2007-02-26 14:18                 ` Ingo Molnar
@ 2007-02-26 14:45                 ` Jens Axboe
  2007-02-27  4:33                   ` Suparna Bhattacharya
  1 sibling, 1 reply; 337+ messages in thread
From: Jens Axboe @ 2007-02-26 14:45 UTC (permalink / raw)
  To: Suparna Bhattacharya
  Cc: Ingo Molnar, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller, Davide Libenzi,
	Thomas Gleixner

On Mon, Feb 26 2007, Suparna Bhattacharya wrote:
> On Mon, Feb 26, 2007 at 02:57:36PM +0100, Jens Axboe wrote:
> > 
> > Some more results, using a larger number of processes and io depths. A
> > repeat of the tests from friday, with added depth 20000 for syslet and
> > libaio:
> > 
> > Engine          Depth   Processes       Bw (MiB/sec)
> > ----------------------------------------------------
> > libaio            1         1            602
> > syslet            1         1            759
> > sync              1         1            776
> > libaio           32         1            832
> > syslet           32         1            898
> > libaio        20000         1            581
> > syslet        20000         1            609
> > 
> > syslet still on top. Measuring O_DIRECT reads (of 4kb size) on ramfs
> > with 100 processes each with a depth of 200, reading a per-process
> > private file of 10mb (need to fit in my ram...) 10 times each. IOW,
> > doing 10,000MiB of IO in total:
> 
> But why ramfs? Don't we want to exercise the case where O_DIRECT actually
> blocks? Or am I missing something here?

Those were just overhead numbers for that test case; let's try something
like the job you described.

Test case is doing random reads from /dev/sdb, in chunks of 64kb:

Engine          Depth   Processes       Bw (KiB/sec)
----------------------------------------------------
libaio           200       100            2813
syslet           200       100            3944
libaio         20000         1            2793
syslet         20000         1            3854
sync (*)       20000         1            2866

deadline was used for IO scheduling, to minimize impact. Not sure why
syslet actually does so much better here; looking at vmstat the rate is
steady and all runs are basically 50/50 idle/wait. One difference is
that the submission itself takes a long time with libaio, since
io_submit() will block on request allocation. The generated IO pattern
from each process is the same for all runs. The drive is a lousy SATA
disk that doesn't even do queuing, FWIW.

[*] Just for comparison, the depth is obviously really 1 at the kernel
    side since it's sync.
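
(For reference, a fio job file approximating the 100-process rows of
this table might look like the sketch below. It assumes fio's standard
job-file syntax; 'syslet-rw' as the engine name for the syslet runs is
my assumption about the engine used at the time:)

; random 64k reads from /dev/sdb: 100 processes, iodepth 200 each
; swap ioengine to syslet-rw to reproduce the syslet rows
[global]
ioengine=libaio
direct=1
rw=randread
bs=64k
iodepth=200
numjobs=100

[sdb-randread]
filename=/dev/sdb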

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-26 12:51                                   ` Ingo Molnar
@ 2007-02-26 16:46                                     ` Evgeniy Polyakov
  2007-02-27  6:24                                       ` Ingo Molnar
  0 siblings, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-26 16:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Davide Libenzi, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Mon, Feb 26, 2007 at 01:51:23PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> 
> > Even with the main dispatcher being an epoll/kevent loop, the _whole_ 
> > threadlet model is absolutely micro-thread in nature and not state 
> > machine/event based.
> 
> Evgeniy, i'm not sure how many different ways to tell this to you, but 
> you are not listening, you are not learning and you are still not 
> getting it at all.
> 
> The scheduler /IS/ a generic work/event queue. And it's pretty damn 
> fast. No amount of badmouthing will change that basic fact. Not exactly 
> as fast as a special-purpose queueing system (for all the reasons i 
> outlined to you, and which you ignored), but it gets pretty damn close 
> even for the web workload /you/ identified, and offers a user-space 
> programming model that is about 1000 times more useful than 
> state-machines.

Meanwhile, on the practical side - VIA EPIA, kevent/epoll/threadlet:

client: ab -c500 -n5000 $url

kevent:		849.72
epoll:		538.16
threadlet:
 gcc ./evserver_epoll_threadlet.c -o ./evserver_epoll_threadlet
 In file included from ./evserver_epoll_threadlet.c:30:
 ./threadlet.h: In function ‘threadlet_exec’:
 ./threadlet.h:46: error: can't find a register in class ‘GENERAL_REGS’
 while reloading ‘asm’

That particular asm optimization fails to compile.

> 	Ingo

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-26 14:15                               ` Ingo Molnar
@ 2007-02-26 16:55                                 ` Evgeniy Polyakov
  2007-02-26 20:35                                   ` Ingo Molnar
  2007-02-27  2:18                                   ` Davide Libenzi
  0 siblings, 2 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-26 16:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ulrich Drepper, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

On Mon, Feb 26, 2007 at 03:15:18PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> > > your whole reasoning seems to be faith-based:
> > > 
> > > [...] Anyway, kevents are very small, threads are very big, [...]
> > > 
> > > How about following the scientific method instead?
> > 
> > Those were only rhetorical words, as you understood I bet; I meant 
> > that the whole process of getting a readiness notification from 
> > kevent is far faster than rescheduling a new process/thread to 
> > handle that IO.
> > 
> > The whole process of switching from one process to another can be as 
> > fast as bloody hell, but all the other details just kill the thing.
> 
> for our primary abstractions there /IS NO OTHER DETAIL/ but wakeup and 
> context-switching! The "event notification" of a sys_read() /IS/ the 
> wakeup and context-switching that we do - or the epoll/kevent enqueueing 
> as an alternative.
> 
> yes, the two are still different in a number of ways, and yes, it's 
> still stupid to do a pool of thousands of threads and thus we can always 
> optimize queuing, RAM and cache footprint via specialization, but your 
> whole foundation seems to be constructed around the false notion that 
> queueing and scheduling a task by the scheduler is somehow magically 
> expensive and different from queueing and scheduling other type of 
> requests. Please reconsider that foundation and open up a bit more to a 
> slightly different world view: scheduling is really just another, more 
> generic (and thus certainly more expensive) type of 'request queueing', 
> and user-space, most of the time, is much better off if it handles its 
> 'requests' and 'events' via tasks. (Especially if many of those 'events' 
> turn out to be non-events at all, so to speak.)

If kernel-space rescheduling is that fast, then please explain to me why
user-space rescheduling always beats kernel/user rescheduling?

And you showed that threadlets without a polling accept still do not
scale well - if it is the same fast queueing of events, then why doesn't
it work?

Actually it does not matter: if that bottleneck exists (the kernel/user
transition, register copying or something else), it can be eliminated in
a different model - kevent is that model. It does not require a lot of
state to be changed to get a notification and start working, so it
scales better.

It is very similar to epoll, but there are at least two significant
differences:
1. it can work with _any_ type of event with minimal overhead (not even
remotely comparable to the 'file' binding, required to be pollable, that
epoll needs).
2. its notifications do not go through a second loop, i.e. delivery is
O(1), not O(ready_num), and notifications happen directly from the
internals of the appropriate subsystem, without requiring a special
wakeup (although that can be done too).
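
(To illustrate the O(1) delivery being described, here is a minimal
single-producer/single-consumer ring sketch - all names are invented
for illustration, this is not the actual kevent ABI; the point is only
that posting and consuming a readiness notification are both
constant-time, with no rescan of an interest set:)

#define RING_SIZE 1024				/* power of two */

struct ring_event {
	int id;					/* which event source fired */
	int ret;				/* its readiness/result info */
};

struct event_ring {
	struct ring_event ev[RING_SIZE];
	unsigned int head;			/* producer index */
	unsigned int tail;			/* consumer index */
};

/* Producer side - conceptually the subsystem's wakeup path: O(1). */
static int ring_post(struct event_ring *r, int id, int ret)
{
	if (r->head - r->tail == RING_SIZE)
		return -1;			/* ring full */
	r->ev[r->head % RING_SIZE].id = id;
	r->ev[r->head % RING_SIZE].ret = ret;
	r->head++;
	return 0;
}

/* Consumer side - the dispatch loop: O(1) per event, no second
 * walk over a ready list. */
static int ring_get(struct event_ring *r, struct ring_event *out)
{
	if (r->tail == r->head)
		return 0;			/* empty */
	*out = r->ev[r->tail % RING_SIZE];
	r->tail++;
	return 1;
}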

> 	Ingo

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-25 22:44           ` Linus Torvalds
  2007-02-26 13:11             ` Ingo Molnar
@ 2007-02-26 17:28             ` Evgeniy Polyakov
  2007-02-26 17:57               ` Linus Torvalds
  1 sibling, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-26 17:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

On Sun, Feb 25, 2007 at 02:44:11PM -0800, Linus Torvalds (torvalds@linux-foundation.org) wrote:
> 
> 
> On Thu, 22 Feb 2007, Evgeniy Polyakov wrote:
> > 
> > My tests show that with 4k connections per second (8k concurrency) more 
> > than 20k connections of 80k total block in tcp_sendmsg() over gigabit 
> > lan between quite fast machines.
> 
> Why do people *keep* taking this up as an issue?

Let's warm our brains a little with this pseudo-technical word-throwing :)

> Use select/poll/epoll/kevent/whatever for event mechanisms. STOP CLAIMING 
> that you'd use threadlets/syslets/aio for that. It's been pointed out over 
> and over and over again, and yet you continue to make the same mistake, 
> Evgeniy.
>
> So please read that sentence ten times, and then don't continue to make 
> that same mistake. PLEASE.
> 
> Event mechanisms are *superior* for events. But they *suck* for things 
> that aren't events, but are actual code execution with random places that 
> can block. THE TWO THINGS ARE TOTALLY AND UTTERLY INDEPENDENT!
> 
> Examples of events:
>  - packet arrives
>  - timer happens
> 
> Examples of things that are *not* "events":
>  - filesystem lookup.
>  - page faults
> 
> So the basic point is: for events, you use an event-based thing. For code 
> execution, you use a thread-based thing. It's really that simple.

Linus, you made your point clearly - generic AIO should not be used for
the cases when it is supposed to block 90% of the time, only when it
almost never blocks, as in the case of buffered IO.
Otherwise, when the nature of the load means it almost always blocks,
each block is eventually removed by something which calls wake_up().
That something is an event - the very event we were supposed to wait for
- but instead we were rescheduled and waited there, so we just added an
additional layer of indirection: we were scheduled away, did some work,
then were woken up, instead of doing some work and getting the event.

I do not even dispute that the micro-threading model is way simpler to
program, but from the above example it is clear that it adds additional
overhead, which in turn can be high or noticeable.

> And yes, the two different things can usually be translated (at a very 
> high cost in complexity *and* performance) into each other, so people who 
> look at it as purely a theoretical exercise may think that "events" and 
> "code execution" are equivalent. That's a very very silly and stupid way 
> of looking at things in real life, though.
> 
> Yes, you can turn things that are better seen as threaded execution into 
> an event-based thing by turning it into a state machine. And usually that 
> is a TOTAL DISASTER, and the end result is fragile and impossible to 
> maintain.
> 
> And yes, you can often (more easily) turn an event-based mechanism into a 
> thread-based one, and usually the end result is a TOTAL DISASTER because 
> it doesn't scale very well, and while it may actually result in somewhat 
> simpler code, the overhead of managing ten thousand outstanding threads is 
> just too high, when you compare to managing just a list of ten thousand 
> outstanding events.
>
> And yes, people have done both of those mistakes. Java, for example, 
> largely did the latter mistake ("we don't need anything like 'select', 
> because we'll just use threads for everything" - what a totally moronic 
> thing to do!)

I can only say that I fully agree. Absolutely. No jokes.

> So Evgeniy, threadlets/syslets/aio is *not* a replacement for event 
> queues. It's a TOTALLY DIFFERENT MECHANISM, and one that is hugely 
> superior to event queues for certain kinds of things. Anybody who thinks 
> they want to do pathname and inode lookup as a series of events is likely 
> a moron. It's really that simple.

Hmmm... let me describe that process a bit:

Userspace wants to open a file, so it needs some file-related structures
(inode, dentry and others) in memory; they must be read from disk.
Eventually it will be reading some blocks from the disk
(for example ext3_lookup->ext3_find_entry->ext3_getblk/ll_rw_block) and
we will wait for them (wait_on_bit()) - we will wait for an event.

But I agree, it was a brain-fscking example; nevertheless, it can easily
be done using an event-driven model.

Reading from the disk is _exactly_ the same - the same waiting for
buffer_heads/pages - and (since it is bigger) it can easily be
transferred to an event-driven model.
Ugh, wait - not only _can_ it be transferred, it has already been done
in kevent AIO, and it shows faster speeds (though I only tested sending
the data over the net).

> In a complex server (say, a database), you'd use both. You'd probably use 
> events for doing the things you *already* use events for (whether it be 
> select/poll/epoll or whatever): probably things like the client network 
> connection handling.
> 
> But you'd *in*addition* use threadlets to be able to do the actual 
> database IO in a threaded manner, so that you can scale the things that 
> are not easily handled as events (usually because they have internal 
> kernel state that the user cannot even see, and *must*not* see because of 
> security issues).

An event is data readiness - no more, no less.
It has nothing to do with internal kernel structures - you just wait
until data is ready in the requested buffer (disk, net, whatever).
The internal mechanism that moves data to the destination point can use
an event-driven model too, but that is another question.

Eventually threads wait for the same events - but there is additional
overhead created to manage those objects called threads.
Ingo says that it is fast to manage them, but it cannot be faster than a
properly built event-driven abstraction because, as you noticed
yourself, the mutual transformations are complex.

Threadlets are simpler to program, but they add no gain compared to a
properly built single-threaded model (or thread-per-CPU model) with the
right event-processing mechanisms.

Waiting for any IO is waiting for an event; other tasks can be made into
events too, but I agree that it is simpler to use different models just
because they already exist.

> So please. Stop this "kevents are better". The only thing you show by 
> trying to go down that avenue is that you don't understand the 
> *difference* between an event model and a thread model. They are both 
> perfectly fine models and they ARE NOT THE SAME! They aren't even mutually 
> incompatible - quite the reverse.
> 
> The thing people want to remove with threadlets is the internal overhead 
> of maintaining special-purpose code like aio_read() inside the kernel, 
> that doesn't even do all that people want it to do, and that really does 
> need a fair amount of internal complexity that we could hopefully do with 
> a more generic (and hopefully *simpler*) model.

Let me rephrase that:

'the thing people want to remove with linked lists is the internal
overhead of maintaining special-purpose code like RB trees.'

It contains a similar degree of idiocy.

If additional code provides faster processing, it should be used,
instead of being feared as 'the internal overhead of maintaining
special-purpose code'.

> 		Linus

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-26 13:11             ` Ingo Molnar
@ 2007-02-26 17:37               ` Evgeniy Polyakov
  2007-02-26 18:19                 ` Arjan van de Ven
  0 siblings, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-26 17:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

On Mon, Feb 26, 2007 at 02:11:33PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > > My tests show that with 4k connections per second (8k concurrency) 
> > > more than 20k connections of 80k total block in tcp_sendmsg() over 
> > > gigabit lan between quite fast machines.
> > 
> > Why do people *keep* taking this up as an issue?
> > 
> > Use select/poll/epoll/kevent/whatever for event mechanisms. STOP 
> > CLAIMING that you'd use threadlets/syslets/aio for that. It's been 
> > pointed out over and over and over again, and yet you continue to make 
> > the same mistake, Evgeniy.
> > 
> > So please read that sentence ten times, and then don't continue to 
> > make that same mistake. PLEASE.
> > 
> > Event mechanisms are *superior* for events. But they *suck* for things 
> > that aren't events, but are actual code execution with random places 
> > that can block. THE TWO THINGS ARE TOTALLY AND UTTERLY INDEPENDENT!
> 
> Note that even for something tasks are supposed to suck at, and even if 
> used in extremely stupid ways, they perform reasonably well in practice 
> ;-)
> 
> And i fully agree: specialization based on knowledge about frequency of 
> blocking will always be useful - if not /forced/ on the workflow 
> architecture and if not overdone. On the other hand, fully event-driven 
> servers based on 'nonblocking' calls, which Evgeniy is advocating and 
> which the kevent model is forcing upon userspace, are pure madness.
> 
> We very much can and should use things like epoll for events that we 
> expect to happen asynchronously 100% of the time - it just makes no 
> sense for those events to take up 4-5K of RAM apiece, when they could 
> also be only using up the 32 bytes that say a pending timer takes. I've 
> posted the code for that, how to do an 'outer' epoll loop around an 
> internal threadlet iterator. But those will always be very narrow event 
> sources, and likely wont (and shouldnt) cover 'request-internal' 
> processing.
> 
> but otherwise, there is no real difference between a task that is 
> scheduled and a request that is queued, 'other' than the size of the 
> request (the task takes 4-5K of RAM), and the register context (64-128 
> bytes on most CPUs, the loading of which is optimized to death).
> 
> Which difference can still be significant for certain workloads, so we 
> certainly dont want to prohibit specialized event interfaces and force 
> generic threads on everything. But for anything that isnt a raw and 
> natural external event source (time, network, disk, user-generated) 
> there shouldnt be much of an event queueing abstraction i believe (other 
> than what we get 'for free' within epoll, from having poll()-able files) 
> - and even for those event sources threadlets offer a pretty good run 
> for the money.

I tend to agree.
Yes, some loads require an event-driven model, others can be done using
threads. The only reason kevent was created was to allow processing
_any_ type of event in exactly the same way in the same processing loop;
it was optimized so that the event registration structure is smaller
than a cache line.

What I cannot agree with is the claim that IO is inherently thread-based.

> one can always find the point and workload where say 40,000 threads 
> start thrashing the L2 cache, but where 40,000 queued special requests 
> are still fully in cache, and produce spectacular numbers.
> 
> 	Ingo

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-26 17:28             ` Evgeniy Polyakov
@ 2007-02-26 17:57               ` Linus Torvalds
  2007-02-26 18:32                 ` Evgeniy Polyakov
  2007-02-26 19:54                 ` Ingo Molnar
  0 siblings, 2 replies; 337+ messages in thread
From: Linus Torvalds @ 2007-02-26 17:57 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ingo Molnar, Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner



On Mon, 26 Feb 2007, Evgeniy Polyakov wrote:
> 
> Linus, you made your point clearly - generic AIO should not be used for
> the cases when it is supposed to block 90% of the time, only when it
> almost never blocks, as in the case of buffered IO.

I don't think it's quite that simple.

EVEN *IF* it were to block 100% of the time, it depends on other things 
than just "blockingness".

For example, let's look at something like

	fd = open(filename, O_RDONLY);
	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0) {
		close(fd);
		return -1;
	}
	.. do something ..

and please realize that EVEN IF YOU KNOW WITH 100% CERTAINTY that the 
above open (or fstat()) is going to block, because you know that your 
working set is bigger than the available memory for caching, YOU SIMPLY 
CANNOT SANELY WRITE THAT AS AN EVENT-BASED STATE MACHINE!

It's really that simple. Some things block "in the middle". The reason 
UNIX made non-blocking reads available for networking, but not for 
filesystem accesses is not because one blocks and the other doesn't. No, 
it's really much more fundamental than that!

When you do a "recvmsg()", there is a clear event-based model: you can 
return -EAGAIN if the data simply isn't there, because a network 
connection is a simple stream of data, and there is a clear event on "ok, 
data arrived" without any state what-so-ever.

The same is simply not true for "open a file descriptor". There is no sane 
way to turn the "filename lookup blocked" into an event model with a 
select- or kevent-based interface.

Similarly, even for a simple "read()" on a filesystem, there is no way to 
just say "block until data is available" like there is for a socket, 
because on a filesystem, the data may be available, BUT AT THE WRONG 
OFFSET. So while a socket or a pipe are both simple "streaming interfaces" 
as far as a "read()" is concerned, a file access is *not* a simple 
streaming interface.

Notice? For a read()/recvmsg() call on a socket or a pipe, there is no 
"position" involved. The "event" is clear: it's always the head of the 
streaming interface that is relevant, and the event is "is there room" or 
"is there data". It's an event-based thing.

But for a read() on a file, it's no longer a streaming interface, and 
there is no longer a simple "is there data" event. You'd have to make the 
event be a much more complex "is there data at position X through Y" kind 
of thing.

And "read()" on a filesystem is the _simple_ case. Sure, we could add 
support for those kinds of ranges, and have an event interface for that. 
But the "open a filename" is much more complicated, and doesn't even have 
a file descriptor available to it (since we're trying to _create_ one), so 
you'd have to do something even more complex to have the event "that 
filename can now be opened without blocking".

See? Even if you could make those kinds of events, it would be absolutely 
HORRIBLE to code for. And it would suck horribly performance-wise for most 
code too.

THAT is what I'm saying. There's a *difference* between event-based and 
thread-based programming. It makes no sense to try to turn one into the 
other. But it often makes sense to *combine* the two approaches.

> Userspace wants to open a file, so it needs some file-related
> structures (inode, dentry and others) in memory; they must be read
> from disk. Eventually it will be reading some blocks from the disk 
> (for example ext3_lookup->ext3_find_entry->ext3_getblk/ll_rw_block) and
> we will wait for them (wait_on_bit()) - we will wait for an event.
> 
> But I agree, it was a brain-fscking example; nevertheless, it can
> easily be done using an event-driven model.
> 
> Reading from the disk is _exactly_ the same - the same waiting for
> buffer_heads/pages - and (since it is bigger) it can easily be
> transferred to an event-driven model.
> Ugh, wait - not only _can_ it be transferred, it has already been done
> in kevent AIO, and it shows faster speeds (though I only tested
> sending the data over the net).

It would be absolutely horrible to program for. Try anything more complex 
than read/write (which is the simplest case, but even that is nasty).

Try imagining yourself in the shoes of a database server (or just about 
anything else). Imagine what kind of code you want to write. You probably 
do *not* want to have everything be one big event loop, and having to make 
different "states" for "I'm trying to open the file", "I opened the file, 
am now doing 'fstat()' to figure out how big it is", "I'm now reading the 
file and have read X bytes of the total Y bytes I want to read", "I took a 
page fault in the middle" etc etc.

I pretty much can *guarantee* you that you'll never see anybody do that. 
Page faults in user space are particularly hard to handle in a state 
machine, since they basically require saving the whole thread state, as 
they can happen on any random access. So yeah, you could do them as a 
state machine, but in reality it would just become a "user-level thread 
library" in the end, just to handle those.

In contrast, if you start using thread-like programming to begin with, you 
have none of those issues. Sure, some thread may block because you got a 
page fault, or because an inode needed to be brought into memory, but from 
a user-level programming interface standpoint, the thread library just 
takes care of the "state machine" on its own, so it's much simpler, and in 
the end more efficient.

And *THAT* is what I'm trying to say. Some simple obvious events are 
better handled and seen as "events" in user space. But other things are so 
intertwined, and have basically random state associated with them, that 
they are better seen as threads.

Yes, from a "turing machine" kind of viewpoint, the two are 100% logically 
equivalent. But "logical equivalence" does NOT translate into "can 
practically speaking be implemented".

			Linus

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-26 17:37               ` Evgeniy Polyakov
@ 2007-02-26 18:19                 ` Arjan van de Ven
  2007-02-26 18:38                   ` Evgeniy Polyakov
  0 siblings, 1 reply; 337+ messages in thread
From: Arjan van de Ven @ 2007-02-26 18:19 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ingo Molnar, Linus Torvalds, Ulrich Drepper, linux-kernel,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

On Mon, 2007-02-26 at 20:37 +0300, Evgeniy Polyakov wrote:

> I tend to agree.
> Yes, some loads require event driven model, other can be done using
> threads.

the event-driven model is really complex though. For event-driven code
to work well you basically can't tolerate blocking calls at all...
open() blocks on files. read() may block on metadata reads (say,
indirect block groups), not just on data reads, etc. etc. It gets really
hairy really quickly that way.

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org


^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-26 17:57               ` Linus Torvalds
@ 2007-02-26 18:32                 ` Evgeniy Polyakov
  2007-02-26 19:22                   ` Linus Torvalds
  2007-02-26 19:54                 ` Ingo Molnar
  1 sibling, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-26 18:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

On Mon, Feb 26, 2007 at 09:57:00AM -0800, Linus Torvalds (torvalds@linux-foundation.org) wrote:
> Similarly, even for a simple "read()" on a filesystem, there is no way to 
> just say "block until data is available" like there is for a socket, 
> because on a filesystem, the data may be available, BUT AT THE WRONG 
> OFFSET. So while a socket or a pipe are both simple "streaming interfaces" 
> as far as a "read()" is concerned, a file access is *not* a simple 
> streaming interface.
> 
> Notice? For a read()/recvmsg() call on a socket or a pipe, there is no 
> "position" involved. The "event" is clear: it's always the head of the 
> streaming interface that is relevant, and the event is "is there room" or 
> "is there data". It's an event-based thing.
> 
> But for a read() on a file, it's no longer a streaming interface, and 
> there is no longer a simple "is there data" event. You'd have to make the 
> event be a much more complex "is there data at position X through Y" kind 
> of thing.
> 
> And "read()" on a filesystem is the _simple_ case. Sure, we could add 
> support for those kinds of ranges, and have an event interface for that. 
> But the "open a filename" is much more complicated, and doesn't even have 
> a file descriptor available to it (since we're trying to _create_ one), so 
> you'd have to do something even more complex to have the event "that 
> filename can now be opened without blocking".
> 
> See? Even if you could make those kinds of events, it would be absolutely 
> HORRIBLE to code for. And it would suck horribly performance-wise for most 
> code too.

I think I see the main trouble in our discussion.

I never tried to say that every bit of read() should be non-blocking -
that would be even more stupid than you expect it to be.
But you and Ingo do not want to see what I'm trying to say, because it
is cozier to read one's own right ideas than other people's.

I want to say that read() consists of tons of events, but the programmer
needs only one - data is ready in the requested buffer. The programmer
might not even know what object is behind the provided file descriptor.
One only wants data in the buffer.

That is an event.

It is also possible to have other events inside the complex read()
machinery - for example, waiting for a page obtained via ->readpages().

> THAT is what I'm saying. There's a *difference* between event-based and 
> thread-based programming. It makes no sense to try to turn one into the 
> other. But it often makes sense to *combine* the two approaches.

Threads are simple just because there is an interface.
Change the interface, and no one will ever want to use them.

Provide a nice aio_read() (forget about POSIX) and everyone will use it.

I always said that combining such models is a must - I fully agree - but
it seems we still do not agree on where the boundary should be drawn.

> > Userspace wants to open a file, so it needs some file-related
> > structures (inode, dentry and others) in memory; they must be read
> > from disk. Eventually it will be reading some blocks from the disk 
> > (for example ext3_lookup->ext3_find_entry->ext3_getblk/ll_rw_block)
> > and we will wait for them (wait_on_bit()) - we will wait for an event.
> > 
> > But I agree, it was a brain-fscking example; nevertheless, it can
> > easily be done using an event-driven model.
> > 
> > Reading from the disk is _exactly_ the same - the same waiting for
> > buffer_heads/pages - and (since it is bigger) it can easily be
> > transferred to an event-driven model.
> > Ugh, wait - not only _can_ it be transferred, it has already been
> > done in kevent AIO, and it shows faster speeds (though I only tested
> > sending the data over the net).
> 
> It would be absolutely horrible to program for. Try anything more complex 
> than read/write (which is the simplest case, but even that is nasty).
> 
> Try imagining yourself in the shoes of a database server (or just about 
> anything else). Imagine what kind of code you want to write. You probably 
> do *not* want to have everything be one big event loop, and having to make 
> different "states" for "I'm trying to open the file", "I opened the file, 
> am now doing 'fstat()' to figure out how big it is", "I'm now reading the 
> file and have read X bytes of the total Y bytes I want to read", "I took a 
> page fault in the middle" etc etc.
> 
> I pretty much can *guarantee* you that you'll never see anybody do that. 
> Page faults in user space are particularly hard to handle in a state 
> machine, since they basically require saving the whole thread state, as 
> they can happen on any random access. So yeah, you could do them as a 
> state machine, but in reality it would just become a "user-level thread 
> library" in the end, just to handle those.
>
> In contrast, if you start using thread-like programming to begin with, you 
> have none of those issues. Sure, some thread may block because you got a 
> page fault, or because an inode needed to be brought into memory, but from 
> a user-level programming interface standpoint, the thread library just 
> takes care of the "state machine" on its own, so it's much simpler, and in 
> the end more efficient.
> 
> And *THAT* is what I'm trying to say. Some simple obvious events are 
> better handled and seen as "events" in user space. But other things are so 
> intertwined, and have basically random state associated with them, that 
> they are better seen as threads.

With threading you still need to do exactly the same, since stat()
cannot be done before open(). So you will maintain some state (actually
an ordering of operations which will not change).
The function holds that same state.

Then start execution - it can be done perfectly well using threads.

People avoid state machines only because they are inconvenient to
program and there is no appropriate interface.

But there are other operations.
Consider a database or whatever you like, which wants to read thousands
of blocks from disk; each access potentially blocks, and blocking on a
mutex is nothing compared to blocking while waiting for the storage to
be ready to provide data (network, disk, whatever). If the storage is
not optimized (or the cache is small, or something else) we end up with
blocked contexts which sleep - thousands of contexts.
And this number will not decrease.

So we end up with enormous overhead just because we do not have a simple
enough aio_read() and aio_wait().
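
(As a shape for such an interface: POSIX AIO already has roughly this
pair - aio_read() to submit and aio_suspend() to wait - so here is a
minimal sketch using it as a stand-in. The file name is a placeholder,
and the complaint above is precisely that this API is not simple
enough, not that the shape is wrong. Link with -lrt:)

#include <aio.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	static char buf[4096];
	struct aiocb cb;
	const struct aiocb *list[1] = { &cb };
	int fd = open("data.bin", O_RDONLY);	/* placeholder file */

	if (fd < 0)
		return 1;

	memset(&cb, 0, sizeof(cb));
	cb.aio_fildes = fd;
	cb.aio_buf = buf;
	cb.aio_nbytes = sizeof(buf);
	cb.aio_offset = 0;

	if (aio_read(&cb) < 0)		/* submit: returns immediately */
		return 1;
	aio_suspend(list, 1, NULL);	/* wait: the aio_wait() role */

	printf("read %zd bytes\n", aio_return(&cb));
	return 0;
}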

> Yes, from a "turing machine" kind of viewpoint, the two are 100% logically 
> equivalent. But "logical equivalence" does NOT translate into "can 
> practically speaking be implemented".

You have said that finally!
"can practically speaking be implemented".

I see your and Ingo's point - kevent is 'merde' (pardon my French; I am
even somehow glad Ingo (almost) said that - I now have plenty of time
for other interesting projects) and you want threads.
Ok, you have threads, but you need to wait on (some of) them.
So you will reinvent the wheel - some subsystem which waits for threads
(likely waiting on futexes), then for other events (probably).
Eventually it will be the same thing from a different point of view :)

Anyway, I _do_ hope we understood each other: both events and threads
can co-exist, although the boundary was not drawn.

My point is that many things can be done using events simply because
they are faster and smaller - and IO (any IO, without limits) is one of
the areas where events are unbeatable.
Threading in turn can be used as a higher-layer abstraction model - a
thread can wrap the whole transaction with all its related data flows -
but not a thread per IO.

> 			Linus

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-26 18:19                 ` Arjan van de Ven
@ 2007-02-26 18:38                   ` Evgeniy Polyakov
  2007-02-26 18:56                     ` Chris Friesen
  0 siblings, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-26 18:38 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Ingo Molnar, Linus Torvalds, Ulrich Drepper, linux-kernel,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

On Mon, Feb 26, 2007 at 10:19:03AM -0800, Arjan van de Ven (arjan@infradead.org) wrote:
> On Mon, 2007-02-26 at 20:37 +0300, Evgeniy Polyakov wrote:
> 
> > I tend to agree.
> > Yes, some loads require event driven model, other can be done using
> > threads.
> 
> event driven model is really complex though. For event driven to work
> well you basically can't tolerate blocking calls at all ...
> open() blocks on files. read() may block on metadata reads (say indirect
> blockgroups) not just on datareads etc etc. It gets really hairy really
> quick that way.

I never tried to say _everything_ must be driven by events.
IO must be driven that way; it is a must, IMO.
Some (indeed a lot of) things can easily be programmed with threads.
Threads are essentially for those tasks which cannot easily be
programmed with events.
But keep in mind that thread execution _is_ an event itself; its
completion is an event.

> -- 
> if you want to mail me at work (you don't), use arjan (at) linux.intel.com
> Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-26 18:38                   ` Evgeniy Polyakov
@ 2007-02-26 18:56                     ` Chris Friesen
  2007-02-26 19:20                       ` Evgeniy Polyakov
  0 siblings, 1 reply; 337+ messages in thread
From: Chris Friesen @ 2007-02-26 18:56 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Arjan van de Ven, Ingo Molnar, Linus Torvalds, Ulrich Drepper,
	linux-kernel, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

Evgeniy Polyakov wrote:

> I never ever tried to say _everything_ must be driven by events.
> IO must be driven, it is a must IMO.

Do you disagree with Linus' post about the difficulty of treating 
open(), fstat(), page faults, etc. as events?  Or do you not consider 
them to be IO?

Chris

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-26 18:56                     ` Chris Friesen
@ 2007-02-26 19:20                       ` Evgeniy Polyakov
  0 siblings, 0 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-26 19:20 UTC (permalink / raw)
  To: Chris Friesen
  Cc: Arjan van de Ven, Ingo Molnar, Linus Torvalds, Ulrich Drepper,
	linux-kernel, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

On Mon, Feb 26, 2007 at 12:56:33PM -0600, Chris Friesen (cfriesen@nortel.com) wrote:
> Evgeniy Polyakov wrote:
> 
> >I never ever tried to say _everything_ must be driven by events.
> >IO must be driven, it is a must IMO.
> 
> Do you disagree with Linus' post about the difficulty of treating 
> open(), fstat(), page faults, etc. as events?  Or do you not consider 
> them to be IO?

From a practical point of view - yes, some of those operations are
complex enough not to be attractive as an async usage model.

But I'm absolutely for the scenario where several operations are
performed asynchronously, like open+stat+fadvise+sendfile.

By IO I meant something which has an end result, and that result must be
enough to start async processing - data in the buffer, for example.
Async open I would combine with the actual data processing - that can
all be one event.

> Chris

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-26 18:32                 ` Evgeniy Polyakov
@ 2007-02-26 19:22                   ` Linus Torvalds
  2007-02-26 19:30                     ` Evgeniy Polyakov
  0 siblings, 1 reply; 337+ messages in thread
From: Linus Torvalds @ 2007-02-26 19:22 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ingo Molnar, Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner



On Mon, 26 Feb 2007, Evgeniy Polyakov wrote:
> 
> I want to say, that read() consists of tons of events, but programmer
> needs only one - data is ready in requested buffer. Programmer might
> not even know what is the object behind provided file descriptor.
> One only wans data in the buffer.

You're not following the discussion.

First off, I already *told* you that "read()" is the absolute simplest 
case, and yes, we could make it an event if you also passed in the "which 
range of the file do we care about" information (we could consider it 
"f_pos", which the kernel already knows about, but that doesn't handle 
pread()/pwrite(), so it's not very good for many cases).

But that's not THE ISSUE.

The issue is that it's a horrible interface from a user's standpoint.
It's a lot better to program certain things as a thread. Why do you
argue against that, when it is just obviously true?

There's a reason that people write code that is functional, rather than 
write code as a state machine.

We simply don't write code like

	for (;;) {
		switch (state) {
		case Open:
			fd = open(..);
			if (fd < 0)
				break;
			state = Stat;
			/* fall through */
		case Stat:
			if (fstat(fd, &stat) < 0)
				break;
			state = Read;
			/* fall through */
		case Read:
			count = read(fd, buf + pos, size - pos);
			if (count < 0)
				break;
			pos += count;
			if (!count || pos == size)
				state = Close;
			continue;
		case Close:
			if (close(fd) < 0)
				break;
			state = Done;
			return 0;
		}
		/* Returning 1 means wait in the event loop .. */
		if (errno == EAGAIN || errno == EINTR)
			return 1;
		/* Else we had a real error */
		state = Error;
		return -1;
	}
and instead we write it as

	fd = open(..)
	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0) {
		close(fd);
		return -1;
	}
	.. 

and if you cannot see the *reason* why people don't use event-based 
programming for everything, I don't see the point of continuing this 
discussion.

See? Stop blathering about how everything is an event. THAT'S NOT 
RELEVANT. I've told you a hundred times - they may be "logically 
equivalent", but that doesn't change ANYTHING. Event-based programming 
simply isn't suitable for 99% of all stuff, and for the 1% where it *is* 
suitable, it actually tends to be a very specific subset of the code that 
you actually use events for (ie accept and read/write on pure streams).

		Linus

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-26 19:22                   ` Linus Torvalds
@ 2007-02-26 19:30                     ` Evgeniy Polyakov
  2007-02-26 20:04                       ` Linus Torvalds
  0 siblings, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-26 19:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

On Mon, Feb 26, 2007 at 11:22:46AM -0800, Linus Torvalds (torvalds@linux-foundation.org) wrote:
> See? Stop blathering about how everything is an event. THAT'S NOT 
> RELEVANT. I've told you a hundred times - they may be "logically 
> equivalent", but that doesn't change ANYTHING. Event-based programming 
> simply isn't suitable for 99% of all stuff, and for the 1% where it *is* 
> suitable, it actually tends to be a very specific subset of the code that 
> you actually use events for (ie accept and read/write on pure streams).

Will you argue that people do things like this?

	num = epoll_wait();
	for (i = 0; i < num; ++i)
		process(event[i]);
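
(For concreteness, a compilable version of that dispatch loop - the
watched fd (stdin) and the "process" step (echo to stdout) are
placeholders of mine, not anything from this thread:)

#include <unistd.h>
#include <sys/epoll.h>

int main(void)
{
	struct epoll_event ev, events[64];
	int epfd = epoll_create(64);

	ev.events = EPOLLIN;
	ev.data.fd = STDIN_FILENO;
	epoll_ctl(epfd, EPOLL_CTL_ADD, STDIN_FILENO, &ev);

	for (;;) {
		int i, num = epoll_wait(epfd, events, 64, -1);

		for (i = 0; i < num; i++) {
			char buf[4096];
			ssize_t n = read(events[i].data.fd, buf, sizeof(buf));

			if (n <= 0)
				return 0;		/* EOF or error */
			write(STDOUT_FILENO, buf, n);	/* "process(event[i])" */
		}
	}
}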

Will you spawn a thread per IO?

Stop writing the same thing again and again - I perfectly understand
that not everything can easily be covered by events, but covering
everything with threads is an even more stupid idea.

High-performance IO requires the smallest possible overhead; dispatching
events from a per-CPU ring buffer or queue is that minimum - spawning a
thread per read is not.

> 		Linus

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-25 19:42                         ` Evgeniy Polyakov
  2007-02-25 20:38                           ` Ingo Molnar
  2007-02-26 12:39                           ` Ingo Molnar
@ 2007-02-26 19:47                           ` Davide Libenzi
  2 siblings, 0 replies; 337+ messages in thread
From: Davide Libenzi @ 2007-02-26 19:47 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ingo Molnar, Ulrich Drepper, Linux Kernel Mailing List,
	Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Sun, 25 Feb 2007, Evgeniy Polyakov wrote:

> Why is userspace rescheduling on the order of tens of times faster
> than kernel/user?

About 50 times on my Opteron 254, actually. That's libpcl's
(swapcontext-based) cobench against lat_ctx.



- Davide



^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-26 17:57               ` Linus Torvalds
  2007-02-26 18:32                 ` Evgeniy Polyakov
@ 2007-02-26 19:54                 ` Ingo Molnar
  2007-02-27 10:28                   ` Evgeniy Polyakov
  1 sibling, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-26 19:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Evgeniy Polyakov, Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> > Reading from the disk is _exactly_ the same - the same waiting for 
> > buffer_heads/pages, and (since it is bigger) it can be easily 
> > transferred to event driven model. Ugh, wait, it not only _can_ be 
> > transferred, it is already done in kevent AIO, and it shows faster 
> > speeds (though I only tested sending them over the net).
> 
> It would be absolutely horrible to program for. Try anything more 
> complex than read/write (which is the simplest case, but even that is 
> nasty).

note that even for something as 'simple and straightforward' as TCP 
sockets, the 25-50 lines of evserver code i worked on today had 3 
separate bugs and is known to be fundamentally incorrect; one of the 
bugs (the lost-event problem) showed up as a subtle epoll performance 
problem and took me more than an hour to track down. And that matches 
my Tux experience as well: event based models are horribly hard to debug 
BECAUSE there is /no procedural state associated with requests/. Hence 
there is no real /proof of progress/. Not much to use for debugging - 
except huge logs of execution, which, if one is unlucky (which i often 
was with Tux), would just make the problem go away.

Furthermore, with a 'retry' model, who guarantees that the retry wont be 
an infinite retry where none of the retries ever progresses the state of 
the system enough to return the data we are interested in? The moment we 
have to /retry/, depending on the 'depth' of how deep the retry kicked 
in, we've got to reach that 'depth' of code again and execute it.

plus, 'there is not much state' is not even completely true to begin 
with, even in the most simple, TCP socket case! There /is/ quite a bit 
of state constructed on the kernel stack: user parameters have been 
evaluated/converted, the socket has been looked up, its state has been 
validated, etc. With a 'retry' model - but even with a pure 'event 
queueing' model we redo all those things /both/ at request submission 
and at event generation time, again and again - while with a synchronous 
syscall you do it just once and upon event completion a piece of that 
data is already on the kernel stack. I'd much rather spend time and 
effort on simplifying the scheduler and reducing the cache footprint of 
the kernel thread context switch path, etc., to make it more useful even 
in the more extreme, highly parallel '100% context-switching' case, because i 
have first-hand experience about how fragile and inflexible event based 
servers are. I do think that event interfaces for raw, external physical 
events make sense in some circumstances, but for any more complex 
'derived' event type it's less and less clear whether we want a direct 
interface to it. For something like the VFS it's outright horrible to 
even think about.

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-26 10:31                             ` Ingo Molnar
  2007-02-26 10:43                               ` Evgeniy Polyakov
@ 2007-02-26 20:02                               ` Davide Libenzi
  1 sibling, 0 replies; 337+ messages in thread
From: Davide Libenzi @ 2007-02-26 20:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Evgeniy Polyakov, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Mon, 26 Feb 2007, Ingo Molnar wrote:

> 
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > please also try evserver_epoll_threadlet.c that i've attached below - 
> > it uses epoll as the main event mechanism but does threadlets for 
> > request handling.
> 
> find updated code below - your evserver_epoll.c spuriously missed event 
> edges - so i changed it back to level-triggered. While that is not as 
> fast as edge-triggered, it does not result in spurious hangs and 
> workflow 'hickups' during the test.
> 
> Could this be the reason why in your testing kevents outperformed epoll?

This is how I handle a read (write/accept/connect, same thing) inside 
coronet (a coroutine+epoll async library - http://www.xmailserver.org/coronet-lib.html).


static int conet_read_ll(struct sk_conn *conn, char *buf, int nbyte) {
        int n;

        /* Retry until the read succeeds or fails with a real error. */
        while ((n = read(conn->sfd, buf, nbyte)) < 0) {
                if (errno == EINTR)
                        continue;
                if (errno != EAGAIN && errno != EWOULDBLOCK)
                        return -1;
                /* Only touch the epoll interest set if EPOLLIN is not
                 * already in it - this is what saves most of the
                 * epoll_ctl() calls. */
                if (!(conn->events & EPOLLIN)) {
                        conn->events = EPOLLIN;
                        if (conet_mod_conn(conn, conn->events) < 0)
                                return -1;
                }
                /* Yield to other coroutines until this fd is ready. */
                if (conet_yield(conn) < 0)
                        return -1;
        }

        return n;
}

I use EPOLLET and you don't change the interest set until you actually
get an EAGAIN. *Many* read/write mode changes in practice will simply
happen without an epoll_ctl() being needed. The conet_mod_conn() function
does the actual epoll_ctl() and adds EPOLLET to the specified event set.
The conet_yield() function ends up calling libpcl's co_resume(), which is
basically 'switch to the next coroutine until the fd becomes ready' (it
maps directly to a swapcontext).
That cuts 50+% of the epoll_ctl(EPOLL_CTL_MOD) calls.
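
(conet_mod_conn() itself is not shown in the mail; a plausible sketch
follows, where the struct layout - in particular a 'kfd' field holding
the epoll descriptor - is my assumption, and the real coronet code may
differ:)

#include <sys/epoll.h>

struct sk_conn {
        int kfd;                /* epoll descriptor (assumed field) */
        int sfd;                /* the connection's socket */
        unsigned int events;    /* currently registered interest set */
};

static int conet_mod_conn(struct sk_conn *conn, unsigned int events) {
        struct epoll_event ev;

        /* Always re-arm edge-triggered, as described above. */
        ev.events = events | EPOLLET;
        ev.data.ptr = conn;
        return epoll_ctl(conn->kfd, EPOLL_CTL_MOD, conn->sfd, &ev);
}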




- Davide



^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-26 19:30                     ` Evgeniy Polyakov
@ 2007-02-26 20:04                       ` Linus Torvalds
  2007-02-27  8:09                         ` Evgeniy Polyakov
  0 siblings, 1 reply; 337+ messages in thread
From: Linus Torvalds @ 2007-02-26 20:04 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ingo Molnar, Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner



On Mon, 26 Feb 2007, Evgeniy Polyakov wrote:
> 
> Will you argue that people do things like
> num = epoll_wait()
> for (i=0; i<num; ++i) {
> 	process(event[i])?
> }

I have several times told you that I argue for a *combination* of 
event-based interfaces and thread-like code. And that the choice depends 
on which is more natural. Sometimes you might have just one or the other. 
Sometimes you have both. And sometimes you have neither (although, 
strictly speaking, normal single-threaded code is certainly "thread-like" 
- it's a serial execution, the same way threadlets are serial executions - 
it's just not running in parallel with anything else).

> Will you spawn thread per IO?

Depending on what the IO is, yes. 

Is that _really_ so hard to understand? There is no "yes" or "no". There's 
a "depends on what the problem is, and what the solution looks like".

Sometimes the best way to do parallelism is through explicit threads. 
Sometimes it is through whatever "threadlets" or other that gets out of 
this whole development discussion. Sometimes it's an event loop.

So get over it. The world is not a black and white, either-or kind of 
place. It's full of grayscales, and colors, and mixing things 
appropriately. And the choices are often done on whims and on whatever 
feels most comfortable to the person doing the choice. Not on some strict 
"you must always use things in an event-driven main loop" or "you must 
always use threads for parallelism".

The world is simply _richer_ than that kind of either-or thing.

		Linus

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: threadlets as 'naive pool of threads', epoll, some measurements
  2007-02-26 14:32                                 ` Evgeniy Polyakov
@ 2007-02-26 20:23                                   ` Ingo Molnar
  2007-02-27  8:16                                     ` Evgeniy Polyakov
  0 siblings, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-26 20:23 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Davide Libenzi, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner


* Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> > no. Please read the evserver_threadlet.c code. There's no kevent in 
> > there. There's no epoll() in there. All that you can see there is 
> > the natural behavior of pure threadlets. And it's not a workload /I/ 
> > picked for threadlets - it is a workload, filesize, parallelism 
> > level and request handling function /you/ picked for 
> > "event-servers".
> 
> I know that there are no kevents there; it would be really strange if 
> you actually tested kevent in your environment, after all those kevent 
> releases that went unanswered.

i havent got around to figuring out the last v2.6.20-based kevent 
release, and your git tree is v2.6.21-rc1 based. Do you have an easy URL 
for me to fetch the last v2.6.20 kevent release from?

> Enough - you say the micro-thread design is superior; ok, that is your 
> point.

note that threadlets are not 'micro-threads'. A threadlet is more of an 
'optional thread' (as i mentioned earlier): whenever it does anything 
that makes it distinct from a plain function call, it's converted into a 
separate thread by the kernel. Otherwise it behaves like a plain 
function call and returns.

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-26 16:55                                 ` Evgeniy Polyakov
@ 2007-02-26 20:35                                   ` Ingo Molnar
  2007-02-26 22:06                                     ` Bill Huey
                                                       ` (2 more replies)
  2007-02-27  2:18                                   ` Davide Libenzi
  1 sibling, 3 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-02-26 20:35 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ulrich Drepper, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner


* Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> If kernelspace rescheduling is that fast, then please explain me why 
> userspace one always beats kernel/userspace?

because 'user space scheduling' makes no sense? I explained my thinking 
about that in a past mail:

-------------------------->
One often repeated (because pretty much only) performance advantage of 
'light threads' is context-switch performance between user-space 
threads. But reality is, nobody /cares/ about being able to 
context-switch between "light user-space threads"! Why? Because there 
are only two reasons why such a high-performance context-switch would 
occur:

 1) there's contention between those two tasks. Wonderful: now two
    artificial threads are running on the /same/ CPU and they are even
    contending each other. Why not run a single context on a single CPU
    instead and only get contended if /another/ CPU runs a conflicting
    context?? While this makes for nice "pthread locking benchmarks",
    it is not really useful for anything real.

 2) there has been an IO event. The thing is, for IO events we enter the
    kernel no matter what - and we'll do so for the next 10 years at
    minimum. We want to abstract away the hardware, we want to do
    reliable resource accounting, we want to share hardware resources,
    we want to rate-limit, etc., etc. While in /theory/ you could handle
    IO purely from user-space, in practice you dont want to do that. And
    if we accept the premise that we'll enter the kernel anyway, there's
    zero performance difference between scheduling right there in the
    kernel, or returning back to user-space to schedule there. (in fact
    i submit that the former is faster). Or if we accept the theoretical
    possibility of 'perfect IO hardware' that implements /all/ the
    features that the kernel wants (in a secure and generic way, and
    mind you, such IO hardware does not exist yet), then /at most/ the
    performance advantage of user-space doing the scheduling is the
    overhead of a null syscall entry. Which is a whopping 100 nsecs on
    modern CPUs! That's roughly the latency of a /single/ DRAM access!
....
<-----------

(see http://lwn.net/Articles/219958/)
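
that '100 nsecs' figure is easy to sanity-check with a trivial loop, by 
the way - a rough sketch, not a rigorous benchmark (numbers obviously 
vary by CPU; link with -lrt on older glibc for clock_gettime):

#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
	enum { N = 1000000 };
	struct timespec t0, t1;
	long i;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < N; i++)
		/* null-ish syscall; syscall() avoids glibc's getpid() cache */
		syscall(SYS_getpid);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("%.1f nsecs per syscall\n",
	       ((t1.tv_sec - t0.tv_sec) * 1e9 +
	        (t1.tv_nsec - t0.tv_nsec)) / N);

	return 0;
}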

btw., the words that follow that section are quite interesting in 
retrospect:

| Furthermore, 'light thread' concepts can no way approach the 
| performance of #2 state-machines: if you /know/ what the structure of 
| your context is, and you can program it in a specialized state-machine 
| way, there's just so many shortcuts possible that it's not even funny.

[ oops! ;-) ]

i severely under-estimated the kind of performance one can reach even 
with pure procedural concepts. Btw., it was when i wrote this mail that i 
started thinking about "is it really true that the only way to get good 
performance is 100% event-based servers and nonblocking designs?", and 
started coding syslets and then threadlets.

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: threadlets as 'naive pool of threads', epoll, some measurements
  2007-02-26 10:45                         ` Ingo Molnar
  2007-02-26 11:48                           ` Ingo Molnar
@ 2007-02-26 21:22                           ` Davide Libenzi
  1 sibling, 0 replies; 337+ messages in thread
From: Davide Libenzi @ 2007-02-26 21:22 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Evgeniy Polyakov, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Mon, 26 Feb 2007, Ingo Molnar wrote:

> 
> update:
> 
> > i have tried the one Evgeniy provided in the URL:
> > 
> >   http://tservice.net.ru/~s0mbre/archive/kevent/evserver_epoll.c
> > 
> > and 'ab -k -c8000 -n80000' almost always aborts with:
> > 
> >   apr_socket_recv: Connection reset by peer (104)
> > 
> > in the few cases it finishes, i got the following epoll result, over 
> > gigabit ethernet, on an UP Athlon64 box:
> > 
> >    eserver_epoll:     7800 reqs/sec
> 
> eserver_epoll.c had a number of bugs. The most serious one was the 
> apparently buggy use of "EPOLLET" (edge-triggered events). Removing that 
> and moving epoll to level-triggered (which is slower but does not result 
> in missed events) gives:
> 
>    eserver_epoll:       9400 reqs/sec
> 
> > the same with the most naive implementation of the same, using 
> > threadlets:
> > 
> >    eserver_threadlet: 5800 reqs/sec
> 
>    eserver_epoll_threadlet:   9400 reqs/sec
> 
> as expected, the level of extra blocking triggered by this is low - even 
> if the full request function runs without nonblock assumptions.

That looks pretty good. I started (spare time allowing) laying down the 
basis for a more realistic test as far as webserver-like benchmarking goes.
I want to compare three solutions that use the same internal code (as 
far as request parsing and content delivery goes).

1) A fully threaded classical web "server". This does the trivial 
   accept+pthread_create and in there does open+fstat+sendhdrs+sendfile 
   (sketched right after this list).
   This is already done:

   http://www.xmailserver.org/thrhttp.c

   Do a `gcc -o thrhttp thrhttp.c -lpthread` and you're done.

2) A coronet (coroutine+epoll library) handling network 
   events/dispatch, plus GUASI (Generic Userspace Asynchronous Syscall
   Interface) to handle generic IO.

   libpcl:   http://www.xmailserver.org/libpcl.html
   coronet:  http://www.xmailserver.org/coronet-lib.html
   GUASI:    http://www.xmailserver.org/guasi-lib.html

   cghttpd:  http://www.xmailserver.org/cghttpd-home.html

   All these are configure+make+make_install installable.
   The cghttpd server has the same parsing/content-delivery features as 
   thrhttp, but it uses coroutine+epoll to handle network events, and 
   GUASI (an old pthread-based userspace implementation of async execution)
   to give open/fstat/sendfile async behaviour. It runs the 
   epoll_wait() (conet_events_wait() actually, but that maps directly to 
   epoll_wait()) hosted inside an async GUASI request, and the GUASI 
   dispatch loop handles the case with:

   if (cookie == conet_events_wait_cookie)
       handle_network_events();

3) Finally, an implementation very similar to cghttpd, but this time using 
   the *syslets* to handle async requests. That should be a pretty easy 
   change to cghttpd, by means of replacing the CGHTTPD_SYSCALL() macro 
   with proper syslet code. The big advantage of the syslets here is that 
   they do have the cachehit optimization, while GUASI always has to 
   trigger a queueing.
   Of course, since the only machine I have with enough RAM to keep many 
   thousands of sessions active is a dual Opteron, I'd need an 
   x86-64 version of the patch. That shouldn't be a big problem though.
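
A rough sketch of the accept+pthread_create pattern from point 1 (error 
handling and the actual request parsing omitted; the file name and the 
send_headers() helper are made up):

#include <fcntl.h>
#include <pthread.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <unistd.h>

static void *serve(void *arg)
{
	int conn = (int)(long)arg, fd;
	struct stat st;

	/* parse the request here, then deliver the file: */
	fd = open("index.html", O_RDONLY);
	if (fd >= 0) {
		fstat(fd, &st);
		/* send_headers(conn, &st); -- hypothetical helper */
		sendfile(conn, fd, NULL, st.st_size);
		close(fd);
	}
	close(conn);
	return NULL;
}

/* the accept loop: one pthread per connection */
static void accept_loop(int lsock)
{
	for (;;) {
		pthread_t t;
		int conn = accept(lsock, NULL, NULL);

		if (conn >= 0 && !pthread_create(&t, NULL, serve,
						 (void *)(long)conn))
			pthread_detach(t);
	}
}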


This hopefully may prove two points. First, a fully threaded solution does 
not scale well when dealing with thousands and thousands of sessions. 
Second, the cachehit syslet trick is The Man in the syslet code, and kicks 
userspace solution #2's ass.
In the meantime, I think Jens' tests are more meaningful, as far as field 
usage goes, than any network based test.




- Davide



^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 05/13] syslets: core, documentation
  2007-02-21 21:15 ` [patch 05/13] syslets: core, documentation Ingo Molnar
@ 2007-02-26 21:32   ` Randy Dunlap
  2007-02-27  6:20     ` Ingo Molnar
  0 siblings, 1 reply; 337+ messages in thread
From: Randy Dunlap @ 2007-02-26 21:32 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Davide Libenzi, Jens Axboe,
	Thomas Gleixner

On Wed, 21 Feb 2007 22:15:21 +0100 Ingo Molnar wrote:

> From: Ingo Molnar <mingo@elte.hu>
> 
> Add Documentation/syslet-design.txt with a high-level description
> of the syslet concepts.

Just a few comments & questions...

> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> ---
>  Documentation/syslet-design.txt |  137 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 137 insertions(+)
> 
> Index: linux/Documentation/syslet-design.txt
> ===================================================================
> --- /dev/null
> +++ linux/Documentation/syslet-design.txt
> @@ -0,0 +1,137 @@
> +Syslets / asynchronous system calls
> +===================================
> +
> +the core syslet concepts are:
> +
> +The Syslet Atom:
> +----------------
> +
> +The syslet atom is a small, fixed-size (44 bytes on 32-bit) piece of
> +user-space memory, which is the basic unit of execution within the syslet
> +framework. A syslet represents a single system-call and its arguments.
> +In addition it also has condition flags attached to it that allows the
> +construction of larger programs (syslets) from these atoms.
> +
> +Arguments to the system call are implemented via pointers to arguments.
> +This not only increases the flexibility of syslet atoms (multiple syslets
> +can share the same variable for example), but is also an optimization:
> +copy_uatom() will only fetch syscall parameters up until the point it
> +meets the first NULL pointer. 50% of all syscalls have 2 or less
> +parameters (and 90% of all syscalls have 4 or less parameters).
> +
> + [ Note: since the argument array is at the end of the atom, and the
> +   kernel will not touch any argument beyond the final NULL one, atoms

I would s/final/first/ since one could put many (unneeded) NULL pointers
in the argument array.

> +   might be packed more tightly. (the only special case exception to
> +   this rule would be SKIP_TO_NEXT_ON_STOP atoms, where the kernel will
> +   jump a full syslet_uatom number of bytes.) ]
...

> +Completion of asynchronous syslets:
> +-----------------------------------
> +
> +Completion of asynchronous syslets is done via the 'completion ring',
> +which is a ringbuffer of syslet atom pointers user user-space memory,

   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^^^^ ^^^^ ??

> +provided by user-space as an argument to the sys_async_exec() syscall.
> +The kernel fills in the ringbuffer starting at index 0, and user-space
> +must clear out these pointers. Once the kernel reaches the end of
> +the ring it wraps back to index 0. The kernel will not overwrite
> +non-NULL pointers (but will return an error), user-space has to
> +make sure it completes all events it asked for.

Last sentence is actually 2 sentences, so (e.g.) change the comma
to a semi-colon, xor begin the sentence with "Since".

How does the kernel return an error if the ring buffer is full?
Just a syscall negative error return code?


> +int main(int argc, char *argv[])
> +{
> +	unsigned long int fd_out = 1; /* standard output */
> +	char *buf = "Hello Syslet World!\n";
> +	unsigned long size = strlen(buf);
> +	struct syslet_uatom atom, *done;
> +
> +	async_head_init();
> +
> +	/*
> +	 * Simple syslet consisting of a single atom:
> +	 */
> +	init_atom(&atom, __NR_sys_write, &fd_out, &buf, &size,
> +		  NULL, NULL, NULL, NULL, SYSLET_ASYNC, NULL);

init_atom() (was) above.  sys_async_exec(), sys_async_wait() are
new syscalls.  What are async_head_init() and async_head_exit()?


> +	done = sys_async_exec(&atom);
> +	if (!done) {
> +		sys_async_wait(1);
> +		if (completion_ring[curr_ring_idx] == &atom) {
> +			completion_ring[curr_ring_idx] = NULL;
> +			printf("completed an async syslet atom!\n");
> +		}
> +	} else {
> +		printf("completed an cached syslet atom!\n");
> +	}
> +
> +	async_head_exit();
> +
> +	return 0;
> +}


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
  2007-02-26 13:57             ` Jens Axboe
  2007-02-26 14:13               ` Suparna Bhattacharya
@ 2007-02-26 21:40               ` Davide Libenzi
  1 sibling, 0 replies; 337+ messages in thread
From: Davide Libenzi @ 2007-02-26 21:40 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Suparna Bhattacharya, Ingo Molnar, Linux Kernel Mailing List,
	Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Ulrich Drepper, Zach Brown,
	Evgeniy Polyakov, David S. Miller, Thomas Gleixner

On Mon, 26 Feb 2007, Jens Axboe wrote:

> 
> Some more results, using a larger number of processes and io depths. A
> repeat of the tests from friday, with added depth 20000 for syslet and
> libaio:
> 
> Engine          Depth   Processes       Bw (MiB/sec)
> ----------------------------------------------------
> libaio            1         1            602
> syslet            1         1            759
> sync              1         1            776
> libaio           32         1            832
> syslet           32         1            898
> libaio        20000         1            581
> syslet        20000         1            609

That looks great! IMO there may be a slightly higher cost associated with 
the syslet thread switches, which we currently do not perform 100% 
cleanly, but the results look awesome nevertheless.



- Davide



^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-26 20:35                                   ` Ingo Molnar
@ 2007-02-26 22:06                                     ` Bill Huey
  2007-02-27 10:09                                     ` Evgeniy Polyakov
  2007-02-27 17:13                                     ` Pavel Machek
  2 siblings, 0 replies; 337+ messages in thread
From: Bill Huey @ 2007-02-26 22:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Evgeniy Polyakov, Ulrich Drepper, linux-kernel, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner, Bill Huey (hui)

On Mon, Feb 26, 2007 at 09:35:43PM +0100, Ingo Molnar wrote:
> * Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> 
> > If kernelspace rescheduling is that fast, then please explain to me why 
> > the userspace one always beats kernel/userspace?
> 
> because 'user space scheduling' makes no sense? I explained my thinking 
> about that in a past mail:
> 
> -------------------------->
> One often repeated (because pretty much only) performance advantage of 
> 'light threads' is context-switch performance between user-space 
> threads. But reality is, nobody /cares/ about being able to 
> context-switch between "light user-space threads"! Why? Because there 
> are only two reasons why such a high-performance context-switch would 
> occur:
... 
>  2) there has been an IO event. The thing is, for IO events we enter the
>     kernel no matter what - and we'll do so for the next 10 years at
>     minimum. We want to abstract away the hardware, we want to do
>     reliable resource accounting, we want to share hardware resources,
>     we want to rate-limit, etc., etc. While in /theory/ you could handle
>     IO purely from user-space, in practice you dont want to do that. And
>     if we accept the premise that we'll enter the kernel anyway, there's
>     zero performance difference between scheduling right there in the
>     kernel, or returning back to user-space to schedule there. (in fact
>     i submit that the former is faster). Or if we accept the theoretical
>     possibility of 'perfect IO hardware' that implements /all/ the
>     features that the kernel wants (in a secure and generic way, and
>     mind you, such IO hardware does not exist yet), then /at most/ the
>     performance advantage of user-space doing the scheduling is the
>     overhead of a null syscall entry. Which is a whopping 100 nsecs on
>     modern CPUs! That's roughly the latency of a /single/ DRAM access!

Ingo and Evgeniy,

I was trying to avoid getting into this discussion, but whatever. M:N
threading systems also require just about all of the threading semantics
that are inside the kernel to be available in userspace. Implementations
of the userspace scheduler side of things must be able to turn off
preemption to do per-CPU local storage, report blocking/preempting
(via an upcall or a mailbox) and do other scheduler-ish things in a
reliable way, so the complexity of a system like that ends up not being
worth it and is often monstrously large to implement and debug. That's why
Solaris 10 removed their scheduler activations framework and went with
1:1 like in Linux, since the scheduler activations model is so difficult
to control. The slowness of the futex stuff might be compounded by some
VM mapping issues that Bill Irwin and Peter Zijlstra have pointed out in
the past in that regard, if I understand correctly.

Bryan Cantrill of Solaris 10/dtrace fame can comment on that if you ask
him sometime.

For an exercise, think about all of the things you need either to migrate
a task or to do a cross-CPU wake of one. It goes to hell in complexity
really quickly. Erlang and other language-based concurrency systems get
their regularities by indirectly oversimplifying what threading is,
compared to what kernel folks are used to. Try doing a cross-CPU wake
quickly in a system like that - good luck. Now think about how to do an
IPI in userspace? Good luck.

That's all :)

bill


^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-26 16:55                                 ` Evgeniy Polyakov
  2007-02-26 20:35                                   ` Ingo Molnar
@ 2007-02-27  2:18                                   ` Davide Libenzi
  2007-02-27 10:13                                     ` Evgeniy Polyakov
  1 sibling, 1 reply; 337+ messages in thread
From: Davide Libenzi @ 2007-02-27  2:18 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ingo Molnar, Ulrich Drepper, Linux Kernel Mailing List,
	Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Mon, 26 Feb 2007, Evgeniy Polyakov wrote:

> 2. its notifications do not go through the second loop, i.e. it is O(1),
> not O(ready_num), and notifications happen directly from the internals of
> the appropriate subsystem, which does not require a special wakeup
> (although it can be done too).

Sorry if I do not read kevent code correctly, but in kevent_user_wait() 
there is a:

    while (num < max_nr && ((k = kevent_dequeue_ready(u)) != NULL)) {
        ...
    }

loop that makes it O(ready_num). From a mathematical standpoint, they're 
both O(ready_num), but epoll is doing three passes over the ready set.
I always thought that if the number of ready events is so big that the 
extra passes over the ready set become relevant, the "work" done by 
userspace for each fetched event would probably make the extra cost 
irrelevant.
But that can be fixed by a patch that will follow on lkml ...



- Davide



^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
  2007-02-26 14:45                 ` Jens Axboe
@ 2007-02-27  4:33                   ` Suparna Bhattacharya
  2007-02-27  9:42                     ` Jens Axboe
  2007-02-27 13:54                     ` Avi Kivity
  0 siblings, 2 replies; 337+ messages in thread
From: Suparna Bhattacharya @ 2007-02-27  4:33 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ingo Molnar, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller, Davide Libenzi,
	Thomas Gleixner

On Mon, Feb 26, 2007 at 03:45:48PM +0100, Jens Axboe wrote:
> On Mon, Feb 26 2007, Suparna Bhattacharya wrote:
> > On Mon, Feb 26, 2007 at 02:57:36PM +0100, Jens Axboe wrote:
> > > 
> > > Some more results, using a larger number of processes and io depths. A
> > > repeat of the tests from friday, with added depth 20000 for syslet and
> > > libaio:
> > > 
> > > Engine          Depth   Processes       Bw (MiB/sec)
> > > ----------------------------------------------------
> > > libaio            1         1            602
> > > syslet            1         1            759
> > > sync              1         1            776
> > > libaio           32         1            832
> > > syslet           32         1            898
> > > libaio        20000         1            581
> > > syslet        20000         1            609
> > > 
> > > syslet still on top. Measuring O_DIRECT reads (of 4kb size) on ramfs
> > > with 100 processes each with a depth of 200, reading a per-process
> > > private file of 10mb (need to fit in my ram...) 10 times each. IOW,
> > > doing 10,000MiB of IO in total:
> > 
> > But, why ramfs ? Don't we want to exercise the case where O_DIRECT actually
> > blocks ? Or am I missing something here ?
> 
> Just overhead numbers for that test case, lets try something like your
> described job.
> 
> Test case is doing random reads from /dev/sdb, in chunks of 64kb:
> 
> Engine          Depth   Processes       Bw (KiB/sec)
> ----------------------------------------------------
> libaio           200       100            2813
> syslet           200       100            3944
> libaio         20000         1            2793
> syslet         20000         1            3854
> sync (*)       20000         1            2866
> 
> deadline was used for IO scheduling, to minimize impact. Not sure why
> syslet actually does so much better here; looking at vmstat the rate is
> steady and all runs are basically 50/50 idle/wait. One difference is
> that the submission itself takes a long time on libaio, since the
> io_submit() will block on request allocation.  The generated IO pattern
> from each process is the same for all runs. The drive is a lousy sata
> that doesn't even do queuing, FWIW.


I tried the latest fio code with syslet v4, and my results are a little
different - I have yet to figure out why, or what to make of it.
I hope I have all the right pieces now.

This is an ext2 filesystem, SCSI AIC7xxx.

I used an iodepth_batch size of 8 to limit the number of ios in a single
io_submit (thanks for adding that parameter to fio !), like we did in
aio-stress.
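
For reference, a job file of that shape looks roughly like this (an
illustrative sketch only - the device path, size and engine lines are
assumptions, not the actual job file used here):

[global]
ioengine=libaio		; or the syslet engine for the syslet runs
direct=1
rw=randread
bs=64k
iodepth=64
iodepth_batch=8

[job1]
filename=/dev/sdb	; assumed device
size=10g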

Engine          Depth      Batch	Bw (KiB/sec)
----------------------------------------------------
libaio		64	   8		17,226
syslet		64	   8		17,620
libaio		20000	   8		18,552
syslet		20000	   8		14,935


Which is not bad, actually.

If I do not specify the iodepth_batch (i.e. default to depth), then the
difference becomes more pronounced at higher depths. However, I doubt
whether anyone would be using such high batch sizes in practice ...

Engine          Depth      Batch	Bw (KiB/sec)
----------------------------------------------------
libaio		64	   default	17,429
syslet		64	   default	16,155
libaio		20000	   default	15,494
syslet		20000	   default	7,971

Oftentimes it is the application tuning that makes all the difference,
so I am not really sure how much to read into these results.
That's always been the hard part of async io ...

Regards
Suparna

> 
> [*] Just for comparison, the depth is obviously really 1 at the kernel
>     side since it's sync.
> 
> -- 
> Jens Axboe

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India


^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 05/13] syslets: core, documentation
  2007-02-26 21:32   ` Randy Dunlap
@ 2007-02-27  6:20     ` Ingo Molnar
  0 siblings, 0 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-02-27  6:20 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Davide Libenzi, Jens Axboe,
	Thomas Gleixner


* Randy Dunlap <randy.dunlap@oracle.com> wrote:

> > + [ Note: since the argument array is at the end of the atom, and the
> > +   kernel will not touch any argument beyond the final NULL one, atoms
> 
> I would s/final/first/ since one could put many (unneeded) NULL 
> pointers in the argument array.

ok, fixed.

> > +Completion of asynchronous syslets is done via the 'completion ring',
> > +which is a ringbuffer of syslet atom pointers user user-space memory,
> 
>    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^^^^ ^^^^ ??

fixed - it's "in user-space memory".

> > +provided by user-space as an argument to the sys_async_exec() syscall.
> > +The kernel fills in the ringbuffer starting at index 0, and user-space
> > +must clear out these pointers. Once the kernel reaches the end of
> > +the ring it wraps back to index 0. The kernel will not overwrite
> > +non-NULL pointers (but will return an error), user-space has to
> > +make sure it completes all events it asked for.
> 
> Last sentence is actually 2 sentences, so (e.g.) change the comma
> to a semi-colon, xor begin the sentence with "Since".

i've changed it to ', and thus user-space has to make sure'.

> How does the kernel return an error if the ring buffer is full? Just a 
> syscall negative error return code?

correct - we return a -EFAULT in that case. Should probably do something 
more distinguished though?

> > +	/*
> > +	 * Simple syslet consisting of a single atom:
> > +	 */
> > +	init_atom(&atom, __NR_sys_write, &fd_out, &buf, &size,
> > +		  NULL, NULL, NULL, NULL, SYSLET_ASYNC, NULL);
> 
> init_atom() (was) above.  sys_async_exec(), sys_async_wait() are new 
> syscalls.  What are async_head_init() and async_head_exit()?

They used to initialize kernel-side stuff too, but now they only 
initialize the user-space-head structure: the ring and a 'head stack 
pointer'.

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-26 16:46                                     ` Evgeniy Polyakov
@ 2007-02-27  6:24                                       ` Ingo Molnar
  2007-02-27 10:41                                         ` Evgeniy Polyakov
  0 siblings, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-27  6:24 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Davide Libenzi, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner


* Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> On Mon, Feb 26, 2007 at 01:51:23PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> > 
> > * Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> > 
> > > Even having the main dispatcher as an epoll/kevent loop, the _whole_ 
> > > threadlet model is absolutely micro-thread in nature and not state 
> > > machine/event.
> > 
> > Evgeniy, i'm not sure how many different ways to tell this to you, but 
> > you are not listening, you are not learning and you are still not 
> > getting it at all.
> > 
> > The scheduler /IS/ a generic work/event queue. And it's pretty damn 
> > fast. No amount of badmouthing will change that basic fact. Not exactly 
> > as fast as a special-purpose queueing system (for all the reasons i 
> > outlined to you, and which you ignored), but it gets pretty damn close 
> > even for the web workload /you/ identified, and offers a user-space 
> > programming model that is about 1000 times more useful than 
> > state-machines.
> 
> Meanwhile, on the practical side:
> via epia, kevent/epoll/threadlet:
> 
> client: ab -c500 -n5000 $url
> 
> kevent:		849.72
> epoll:		538.16
> threadlet:
>  gcc ./evserver_epoll_threadlet.c -o ./evserver_epoll_threadlet
>  In file included from ./evserver_epoll_threadlet.c:30:
>  ./threadlet.h: In function ‘threadlet_exec’:
>  ./threadlet.h:46: error: can't find a register in class ‘GENERAL_REGS’
>  while reloading ‘asm’
> 
> That particular asm optimization fails to compile.

it's not really an asm optimization but an API glue. I'm using:

 gcc -O2 -g -Wall -o evserver_epoll_threadlet evserver_epoll_threadlet.c -fomit-frame-pointer

does that work for you?

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-26 20:04                       ` Linus Torvalds
@ 2007-02-27  8:09                         ` Evgeniy Polyakov
  0 siblings, 0 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-27  8:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

On Mon, Feb 26, 2007 at 12:04:51PM -0800, Linus Torvalds (torvalds@linux-foundation.org) wrote:
> 
> 
> On Mon, 26 Feb 2007, Evgeniy Polyakov wrote:
> > 
> > Will you argue that people do things like
> > num = epoll_wait()
> > for (i=0; i<num; ++i) {
> > 	process(event[i])?
> > }
> 
> I have several times told you that I argue for a *combination* of 
> event-based interfaces and thread-like code. And that the choice depends 
> on which is more natural. Sometimes you might have just one or the other. 
> Sometimes you have both. And sometimes you have neither (although, 
> strictly speaking, normal single-threaded code is certainly "thread-like" 
> - it's a serial execution, the same way threadlets are serial executions - 
> it's just not running in parallel with anything else).
> 
> > Will you spawn thread per IO?
> 
> Depending on what the IO is, yes. 
> 
> Is that _really_ so hard to understand? There is no "yes" or "no". There's 
> a "depends on what the problem is, and what the solution looks like".
> 
> Sometimes the best way to do parallelism is through explicit threads. 
> Sometimes it is through "threadlets" or whatever else comes out of 
> this whole development discussion. Sometimes it's an event loop.
> 
> So get over it. The world is not a black and white, either-or kind of 
> place. It's full of grayscales, and colors, and mixing things 
> appropriately. And the choices are often done on whims and on whatever 
> feels most comfortable to the person doing the choice. Not on some strict 
> "you must always use things in an event-driven main loop" or "you must 
> always use threads for parallelism".
> 
> The world is simply _richer_ than that kind of either-or thing.

It seems you like to repeat that white is white and is not black :)
Did you see what I wrote in my previous e-mail to you?

No, you did not.
I wrote that both models should co-exist, and that the boundary between the
two is not clear, but that for the described workloads IMO the event driven
model provides far higher performance.

That is it, Linus - no one wants to say that you said something stupid,
just read what others write to you.

Thanks.


> 		Linus

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: threadlets as 'naive pool of threads', epoll, some measurements
  2007-02-26 20:23                                   ` Ingo Molnar
@ 2007-02-27  8:16                                     ` Evgeniy Polyakov
  2007-02-27  8:27                                       ` Ingo Molnar
  0 siblings, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-27  8:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Davide Libenzi, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Mon, Feb 26, 2007 at 09:23:38PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> 
> > > no. Please read the evserver_threadlet.c code. There's no kevent in 
> > > there. There's no epoll() in there. All that you can see there is 
> > > the natural behavior of pure threadlets. And it's not a workload /I/ 
> > > picked for threadlets - it is a workload, filesize, parallelism 
> > > level and request handling function /you/ picked for 
> > > "event-servers".
> > 
> > I know that there are no kevents there; it would be really strange if 
> > you tested it in your environment after all those empty kevent 
> > releases.
> 
> i haven't got around to figuring out the last v2.6.20 based kevent release, 
> and your git tree is v2.6.21-rc1 based. Do you have some easy URL for me 
> to fetch the last v2.6.20 kevent release?

I use the kevent-36 release patches on top of the 2.6.20 tree.
There is some syscall number overlap with the threadlet patches, but
the rejects are trivial.

> > Enough, you say micro-thread design is superior - ok, that is your 
> > point.
> 
> note that threadlets are not 'micro-threads'. A threadlet is more of an 
> 'optional thread' (as i mentioned earlier): whenever it does anything 
> that makes it distinct from a plain function call, it's converted into a 
> separate thread by the kernel. Otherwise it behaves like a plain 
> function call and returns.

I know.
But for most situations the case when things do not block is rare,
so I called it a micro-thread, since it spawns a new thread (taken from
a preallocated pool) for parallel processing.

> 	Ingo

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: threadlets as 'naive pool of threads', epoll, some measurements
  2007-02-27  8:16                                     ` Evgeniy Polyakov
@ 2007-02-27  8:27                                       ` Ingo Molnar
  2007-02-27 10:37                                         ` Evgeniy Polyakov
  0 siblings, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-27  8:27 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Davide Libenzi, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner


* Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> > > Enough, you say micro-thread design is superior - ok, that is your 
> > > point.
> > 
> > note that threadlets are not 'micro-threads'. A threadlet is more of 
> > an 'optional thread' (as i mentioned earlier): whenever it does 
> > anything that makes it distinct from a plain function call, it's 
> > converted into a separate thread by the kernel. Otherwise it behaves 
> > like a plain function call and returns.
> 
> I know.
> But for most situations the case when things do not block is rare, 
> so I called it a micro-thread, since it spawns a new thread (taken from 
> a preallocated pool) for parallel processing.

ugh. Because 'it spawns a new thread from a preallocated pool' you are 
arbitrarily renaming threadlets to 'micro-threads'?? The kernel could be 
using a transparent thread pool for ordinary pthread recycling itself 
(and will possibly do so in the future) - that does not make them a 
micro-thread one iota. So please stop calling them micro-threads, 
threadlets are a distinctly separate concept ...

( And i guess you should know it perfectly well from my past mails in
  this thread that i dont like micro-thread concepts at all, so are you
  perhaps calling threadlets 'micro-threads' intentionally, just to 
  force a predictably negative reaction from me? Maybe i should start 
  renaming your code too and refer to kevents as 'kpoll'? That too makes
  absolutely zero sense. This is getting really silly. )

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
  2007-02-27  4:33                   ` Suparna Bhattacharya
@ 2007-02-27  9:42                     ` Jens Axboe
  2007-02-27 11:12                       ` Evgeniy Polyakov
  2007-02-27 12:39                       ` Suparna Bhattacharya
  2007-02-27 13:54                     ` Avi Kivity
  1 sibling, 2 replies; 337+ messages in thread
From: Jens Axboe @ 2007-02-27  9:42 UTC (permalink / raw)
  To: Suparna Bhattacharya
  Cc: Ingo Molnar, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller, Davide Libenzi,
	Thomas Gleixner

On Tue, Feb 27 2007, Suparna Bhattacharya wrote:
> On Mon, Feb 26, 2007 at 03:45:48PM +0100, Jens Axboe wrote:
> > On Mon, Feb 26 2007, Suparna Bhattacharya wrote:
> > > On Mon, Feb 26, 2007 at 02:57:36PM +0100, Jens Axboe wrote:
> > > > 
> > > > Some more results, using a larger number of processes and io depths. A
> > > > repeat of the tests from friday, with added depth 20000 for syslet and
> > > > libaio:
> > > > 
> > > > Engine          Depth   Processes       Bw (MiB/sec)
> > > > ----------------------------------------------------
> > > > libaio            1         1            602
> > > > syslet            1         1            759
> > > > sync              1         1            776
> > > > libaio           32         1            832
> > > > syslet           32         1            898
> > > > libaio        20000         1            581
> > > > syslet        20000         1            609
> > > > 
> > > > syslet still on top. Measuring O_DIRECT reads (of 4kb size) on ramfs
> > > > with 100 processes each with a depth of 200, reading a per-process
> > > > private file of 10mb (need to fit in my ram...) 10 times each. IOW,
> > > > doing 10,000MiB of IO in total:
> > > 
> > > But, why ramfs ? Don't we want to exercise the case where O_DIRECT actually
> > > blocks ? Or am I missing something here ?
> > 
> > Just overhead numbers for that test case, lets try something like your
> > described job.
> > 
> > Test case is doing random reads from /dev/sdb, in chunks of 64kb:
> > 
> > Engine          Depth   Processes       Bw (KiB/sec)
> > ----------------------------------------------------
> > libaio           200       100            2813
> > syslet           200       100            3944
> > libaio         20000         1            2793
> > syslet         20000         1            3854
> > sync (*)       20000         1            2866
> > 
> > deadline was used for IO scheduling, to minimize impact. Not sure why
> > syslet actually does so much better here; looking at vmstat the rate is
> > steady and all runs are basically 50/50 idle/wait. One difference is
> > that the submission itself takes a long time on libaio, since the
> > io_submit() will block on request allocation.  The generated IO pattern
> > from each process is the same for all runs. The drive is a lousy sata
> > that doesn't even do queuing, FWIW.
> 
> 
> I tried the latest fio code with syslet v4, and my results are a little
> different - I have yet to figure out why, or what to make of it.
> I hope I have all the right pieces now.
> 
> This is an ext2 filesystem, SCSI AIC7xxx.
> 
> I used an iodepth_batch size of 8 to limit the number of ios in a single
> io_submit (thanks for adding that parameter to fio !), like we did in
> aio-stress.
> 
> Engine          Depth      Batch	Bw (KiB/sec)
> ----------------------------------------------------
> libaio		64	   8		17,226
> syslet		64	   8		17,620
> libaio		20000	   8		18,552
> syslet		20000	   8		14,935
> 
> 
> Which is not bad, actually.

It's not bad for such a high depth/batch setting, but I still wonder why
our results are so different. I'll look around for an x86 box with some
TCQ/NCQ enabled storage attached for testing. Can you pass me your
command line or job file (whatever you use) so we are on the same page?

> If I do not specify the iodepth_batch (i.e. default to depth), then the
> difference becomes more pronounced at higher depths. However, I doubt
> whether anyone would be using such high batch sizes in practice ...
>
> Engine          Depth      Batch	Bw (KiB/sec)
> ----------------------------------------------------
> libaio		64	   default	17,429
> syslet		64	   default	16,155
> libaio		20000	   default	15,494
> syslet		20000	   default	7,971
>
If iodepth_batch isn't set, the syslet queued io will be serialized and
not take advantage of queueing. How does the job file perform with
ioengine=sync?

> Oftentimes it is the application tuning that makes all the difference,
> so I am not really sure how much to read into these results.
> That's always been the hard part of async io ...

Yes I agree, it's handy to get an overview though.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-26 20:35                                   ` Ingo Molnar
  2007-02-26 22:06                                     ` Bill Huey
@ 2007-02-27 10:09                                     ` Evgeniy Polyakov
  2007-02-27 17:13                                     ` Pavel Machek
  2 siblings, 0 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-27 10:09 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ulrich Drepper, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

On Mon, Feb 26, 2007 at 09:35:43PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> > If kernelspace rescheduling is that fast, then please explain to me why 
> > the userspace one always beats kernel/userspace?
> 
> because 'user space scheduling' makes no sense? I explained my thinking 
> about that in a past mail:
> 
> -------------------------->
> One often repeated (because pretty much only) performance advantage of 
> 'light threads' is context-switch performance between user-space 
> threads. But reality is, nobody /cares/ about being able to 
> context-switch between "light user-space threads"! Why? Because there 
> are only two reasons why such a high-performance context-switch would 
> occur:
> 
>  1) there's contention between those two tasks. Wonderful: now two
>     artificial threads are running on the /same/ CPU and they are even
>     contending each other. Why not run a single context on a single CPU
>     instead and only get contended if /another/ CPU runs a conflicting
>     context?? While this makes for nice "pthread locking benchmarks",
>     it is not really useful for anything real.

Obviously there must be several real threads, each of which can reschedule 
userspace threads, because that is fast and scalable.
So: one CPU - one real thread.

>  2) there has been an IO event. The thing is, for IO events we enter the
>     kernel no matter what - and we'll do so for the next 10 years at
>     minimum. We want to abstract away the hardware, we want to do
>     reliable resource accounting, we want to share hardware resources,
>     we want to rate-limit, etc., etc. While in /theory/ you could handle
>     IO purely from user-space, in practice you dont want to do that. And
>     if we accept the premise that we'll enter the kernel anyway, there's
>     zero performance difference between scheduling right there in the
>     kernel, or returning back to user-space to schedule there. (in fact
>     i submit that the former is faster). Or if we accept the theoretical
>     possibility of 'perfect IO hardware' that implements /all/ the
>     features that the kernel wants (in a secure and generic way, and
>     mind you, such IO hardware does not exist yet), then /at most/ the
>     performance advantage of user-space doing the scheduling is the
>     overhead of a null syscall entry. Which is a whopping 100 nsecs on
>     modern CPUs! That's roughly the latency of a /single/ DRAM access!
> ....

And here we start our discussion again from the beginning :)
We entered the kernel, of course, but then the kernel decides to move the
thread away and put another one in its place under the hardware sun - so
the kernel starts to move registers, change some states and so on.

While in the case of userspace threads we have already returned back to
userspace (at the cost of the overhead described above) and start doing
the new task - move registers, change some states and so on.

And somehow it happens that doing it in userspace (for example with
setjmp/longjmp) is faster than in the kernel. I do not know why - I never
investigated that case, but that is how it is.
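
For concreteness, the kind of userspace switch being talked about can be
written with the portable ucontext API - a toy sketch, not kevent or
threadlet code (and note that glibc's swapcontext itself still issues a
sigprocmask syscall to save the signal mask, which dedicated userspace
threading libraries avoid):

#include <stdio.h>
#include <ucontext.h>

static ucontext_t uctx_main, uctx_worker;

static void worker(void)
{
	printf("in worker\n");
	/* cooperative switch back to main - no kernel scheduler involved */
	swapcontext(&uctx_worker, &uctx_main);
}

int main(void)
{
	static char stack[16384];

	getcontext(&uctx_worker);
	uctx_worker.uc_stack.ss_sp = stack;
	uctx_worker.uc_stack.ss_size = sizeof(stack);
	uctx_worker.uc_link = &uctx_main;
	makecontext(&uctx_worker, worker, 0);

	swapcontext(&uctx_main, &uctx_worker);	/* run the 'thread' */
	printf("back in main\n");
	return 0;
}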

> <-----------
> 
> (see http://lwn.net/Articles/219958/)
> 
> btw., the words that follow that section are quite interesting in 
> retrospect:
> 
> | Furthermore, 'light thread' concepts can no way approach the 
> | performance of #2 state-machines: if you /know/ what the structure of 
> | your context is, and you can program it in a specialized state-machine 
> | way, there's just so many shortcuts possible that it's not even funny.
> 
> [ oops! ;-) ]
> 
> i severely under-estimated the kind of performance one can reach even 
> with pure procedural concepts. Btw., it was when i wrote this mail that i 
> started thinking about "is it really true that the only way to get good 
> performance is 100% event-based servers and nonblocking designs?", and 
> started coding syslets and then threadlets.

:-)

Threads happen to be really fast and easy to program, but the maximum
performance still cannot be achieved with them, simply because event
handling is faster - read one cacheline from the ring or queue, or
reschedule threads?

Anyway, what are we talking about, Ingo?
I understand your point, and I also understand that you are not going to
change it (yes, that's it, and I have to admit that I'm guilty of the same
- I doubt I will ever think that thousands of threads doing IO is a win),
so we can close the discussion at this point.

My main point in participating in it was the kevent introduction - although
I created kevent AIO as a state machine, I never wanted to promote
kevent as AIO - kevent is an event processing mechanism; one of its usage
models is AIO, other ones are signals, file events, timers, whatever...
You have drawn a line - kevent is not needed - that is your point.
I wanted to hear definitive words half a year ago, but the community kept
silent. Eventually things are done.

Thanks.

> 	Ingo

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-27  2:18                                   ` Davide Libenzi
@ 2007-02-27 10:13                                     ` Evgeniy Polyakov
  2007-02-27 16:01                                       ` Davide Libenzi
  0 siblings, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-27 10:13 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Ingo Molnar, Ulrich Drepper, Linux Kernel Mailing List,
	Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Mon, Feb 26, 2007 at 06:18:51PM -0800, Davide Libenzi (davidel@xmailserver.org) wrote:
> On Mon, 26 Feb 2007, Evgeniy Polyakov wrote:
> 
> > 2. its notifications do not go through the second loop, i.e. it is O(1),
> > not O(ready_num), and notifications happens directly from internals of
> > the appropriate subsystem, which does not require special wakeup
> > (although it can be done too).
> 
> Sorry if I do not read kevent code correctly, but in kevent_user_wait() 
> there is a:
> 
>     while (num < max_nr && ((k = kevent_dequeue_ready(u)) != NULL)) {
>         ...
>     }
> 
> loop, that make it O(ready_num). From a mathematical standpoint, they're 
> both O(ready_num), but epoll is doing three passes over the ready set.
> I always though that if the number of ready events is so big that the more 
> passes over the ready set becomes relevant, probably the "work" done by 
> userspace for each fetched event would make the extra cost irrelevant.
> But that can be fixed by a patch that will follow on lkml ...
 
No, kevent_dequeue_ready() copies data to userspace, that is it.
So it looks roughly like the following:

storage is ready: -> kevent_requeue() - ends up adding the event to the end
of the queue (list add under a spinlock)

kevent_wait() -> copy first, second, ...

The kevent poll model (as well as epoll) requires an _additional_ check in 
userspace context before the event is copied, so we end up checking the 
full ready queue again - that is what I pointed to as O(ready_num); the O() 
implies the price of copying to userspace, list_add and so on.
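
In (simplified, sketched) code - not the actual kevent source, and the
lock/list field names are guesses:

/* subsystem side: storage became ready - an O(1) enqueue */
spin_lock(&u->ready_lock);
list_add_tail(&k->ready_entry, &u->ready_list);
spin_unlock(&u->ready_lock);

/* kevent_wait() side: dequeue and copy straight to userspace */
while (num < max_nr && (k = kevent_dequeue_ready(u)) != NULL) {
	if (copy_to_user(&ring[num], &k->event, sizeof(k->event)))
		break;
	num++;
}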
 
> - Davide
> 

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-26 19:54                 ` Ingo Molnar
@ 2007-02-27 10:28                   ` Evgeniy Polyakov
  2007-02-27 11:52                     ` Theodore Tso
  0 siblings, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-27 10:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

On Mon, Feb 26, 2007 at 08:54:16PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > > Reading from the disk is _exactly_ the same - the same waiting for 
> > > buffer_heads/pages, and (since it is bigger) it can be easily 
> > > transferred to event driven model. Ugh, wait, it not only _can_ be 
> > > transferred, it is already done in kevent AIO, and it shows faster 
> > > speeds (though I only tested sending them over the net).
> > 
> > It would be absolutely horrible to program for. Try anything more 
> > complex than read/write (which is the simplest case, but even that is 
> > nasty).
> 
> note that even for something as 'simple and straightforward' as TCP 
> sockets, the 25-50 lines of evserver code i worked on today had 3 
> separate bugs, is known to be fundamentally incorrect and one of the 
> bugs (the lost event problem) showed up as a subtle epoll performance 
> problem and it took me more than an hour to track down. And that matches 
> my Tux experience as well: event based models are horribly hard to debug 
> BECAUSE there is /no procedural state associated with requests/. Hence 
> there is no real /proof of progress/. Not much to use for debugging - 
> except huge logs of execution, which, if one is unlucky (which i often 
> was with Tux) would just make the problem go away.

I'm glad you found a bug in my epoll processing code (which does not exist
in the kevent part though) - it took me more than a year after the kevent
introduction to get someone to look into it.

Obviously there are bugs; that is simply how things work.
And debugging state machine code has exactly the same complexity as
debugging multi-threaded code - if not less...

> Furthermore, with a 'retry' model, who guarantees that the retry wont be 
> an infinite retry where none of the retries ever progresses the state of 
> the system enough to return the data we are interested in? The moment we 
> have to /retry/, depending on the 'depth' of how deep the retry kicked 
> in, we've got to reach that 'depth' of code again and execute it.
> 
> plus, 'there is not much state' is not even completely true to begin 
> with, even in the most simple, TCP socket case! There /is/ quite a bit 
> of state constructed on the kernel stack: user parameters have been 
> evaluated/converted, the socket has been looked up, its state has been 
> validated, etc. With a 'retry' model - but even with a pure 'event 
> queueing' model we redo all those things /both/ at request submission 
> and at event generation time, again and again - while with a synchronous 
> syscall you do it just once and upon event completion a piece of that 
> data is already on the kernel stack. I'd much rather spend time and 
> effort on simplifying the scheduler and reducing the cache footprint of 
> the kernel thread context switch path, etc., to make it more useful even 
> in the more extreme, highly parallel '100% context-switching' case, because i 
> have first-hand experience about how fragile and inflexible event based 
> servers are. I do think that event interfaces for raw, external physical 
> events make sense in some circumstances, but for any more complex 
> 'derived' event type it's less and less clear whether we want a direct 
> interface to it. For something like the VFS it's outright horrible to 
> even think about.

Ingo, you paint the picture too black.
No one will create a purely event based model for socket or VFS processing
- an event is the completion of a request - data in the buffer, data was
sent and so on - and it is also possible to add events into the middle of
the processing, especially if the operation can be logically separated -
like sendfile.

> 	Ingo

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: threadlets as 'naive pool of threads', epoll, some measurements
  2007-02-27  8:27                                       ` Ingo Molnar
@ 2007-02-27 10:37                                         ` Evgeniy Polyakov
  2007-02-27 12:15                                           ` Ingo Molnar
  0 siblings, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-27 10:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Davide Libenzi, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Tue, Feb 27, 2007 at 09:27:57AM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> 
> > > > Enough, you say micro-thread design is superior - ok, that is your 
> > > > point.
> > > 
> > > note that threadlets are not 'micro-threads'. A threadlet is more of 
> > > an 'optional thread' (as i mentioned it earlier): whenever it does 
> > > anything that makes it distinct from a plain function call, it's 
> > > converted into a separate thread by the kernel. Otherwise it behaves 
> > > like a plain function call and returns.
> > 
> > I know.
> > But it is rare case for the most situations, when things do not block, 
> > so I called it micro-thread, since it spawns a new thread (get from 
> > preallocated pool) for parallel processing.
> 
> ugh. Because 'it spawns a new thread from a preallocated pool' you are 
> arbitrarily renaming threadlets to 'micro-threads'?? The kernel could be 
> using a transparent thread pool for ordinary pthread recycling itself 
> (and will possibly do so in the future) - that does not make them a 
> micro-thread one iota. So please stop calling them micro-threads, 
> threadlets are a distinctly separate concept ...
> 
> ( And i guess you should know it perfectly well from my past mails in
>   this thread that i dont like micro-thread concepts at all, so are you
>   perhaps calling threadlets 'micro-threads' intentionally, just to 
>   force a predictably negative reaction from me? Maybe i should start 
>   renaming your code too and refer to kevents as 'kpoll'? That too makes
>   absolutely zero sense. This is getting really silly. )

I have already thought about renaming kevent AIO, since it uses the kaio
name, which you frequently reference too, but you definitely were not
thinking about kevent.

And out of curiosity, how masochistic would I look if I intentionally
wanted to provoke a negative reaction from you :)

As far as you can recall, in all syslet related threads I was always for
them, and definitely against micro-threads, but when we come to the land
of IO processing using an event driven model - here I cannot agree with
you.

So, ok: no micro-thread name.

> 	Ingo

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-27  6:24                                       ` Ingo Molnar
@ 2007-02-27 10:41                                         ` Evgeniy Polyakov
  2007-02-27 10:49                                           ` Ingo Molnar
  0 siblings, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-27 10:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Davide Libenzi, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Tue, Feb 27, 2007 at 07:24:19AM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> 
> > On Mon, Feb 26, 2007 at 01:51:23PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> > > 
> > > * Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> > > 
> > > > Even having main dispatcher as epoll/kevent loop, the _whole_ 
> > > > threadlet model is absolutely micro-thread in nature and not state 
> > > > machine/event.
> > > 
> > > Evgeniy, i'm not sure how many different ways to tell this to you, but 
> > > you are not listening, you are not learning and you are still not 
> > > getting it at all.
> > > 
> > > The scheduler /IS/ a generic work/event queue. And it's pretty damn 
> > > fast. No amount of badmouthing will change that basic fact. Not exactly 
> > > as fast as a special-purpose queueing system (for all the reasons i 
> > > outlined to you, and which you ignored), but it gets pretty damn close 
> > > even for the web workload /you/ identified, and offers a user-space 
> > > programming model that is about 1000 times more useful than 
> > > state-machines.
> > 
> > Meanwhile, on the practical side:
> > via epia, kevent/epoll/threadlet:
> > 
> > client: ab -c500 -n5000 $url
> > 
> > kevent:		849.72
> > epoll:		538.16
> > threadlet:
> >  gcc ./evserver_epoll_threadlet.c -o ./evserver_epoll_threadlet
> >  In file included from ./evserver_epoll_threadlet.c:30:
> >  ./threadlet.h: In function ‘threadlet_exec’:
> >  ./threadlet.h:46: error: can't find a register in class ‘GENERAL_REGS’
> >  while reloading ‘asm’
> > 
> > That particular asm optimization fails to compile.
> 
> it's not really an asm optimization but an API glue. I'm using:
> 
>  gcc -O2 -g -Wall -o evserver_epoll_threadlet evserver_epoll_threadlet.c -fomit-frame-pointer
> 
> does that work for you?

Yes, -fomit-frame-pointer did the trick.

On average, threadlet runs as fast as epoll.
Simply because there is _no_ rescheduling in that case.

I added a printk into __async_schedule() and started
ab -c7000 -n20000 http://192.168.4.80/
against the slow VIA EPIA, and got only one of them (i.e. a single async
reschedule). The server binary is the latest
../async-test-v4/evserver_epoll_threadlet.
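The instrumentation itself was trivial - a sketch, assuming
__async_schedule() as it appears in the v4 syslet patches:

	/* fires each time a threadlet actually blocks and its context
	   is handed off to a real thread */
	printk(KERN_DEBUG "__async_schedule: pid %d\n", current->pid);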


Btw, I have to admit that I have a totally broken kevent tree there - it
does not work on that machine under higher loads, so I'm investigating
that problem now.

> 	Ingo

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-27 10:41                                         ` Evgeniy Polyakov
@ 2007-02-27 10:49                                           ` Ingo Molnar
  0 siblings, 0 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-02-27 10:49 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Davide Libenzi, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner


* Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> > does that work for you?
> 
> Yes, -fomit-frame-pointer did the trick.
> 
> On average, threadlet runs as fast as epoll.

yeah.

> Simply because there is _no_ rescheduling in that case.

in my test it was 'little', not 'no'. But yes, that's exactly my point: 
we can remove the nonblock hackeries from event loops and just 
concentrate on making it schedule in less than 10-20% of the cases. Even 
a relatively high 10-20% rescheduling rate is hardly measurable with 
threadlets, while it gives a 10%-20% regression (and possibly bad 
latencies) for the pure epoll/kevent server.

and such a mixed technique is simply not possible with ordinary 
user-space threads, because there it's an all-or-nothing affair: either 
you go fully to threads (at which point we are again back to a fully 
threaded design, now also saddled with event loop overhead), or you try 
to do user-space threads, which Just Make Little Sense (tm).

so threadlets remove the biggest headache from event loops: they don't 
have to be '100% nonblocking' anymore. No O_NONBLOCK overhead, no 
complex state machines - just handle the most common event type via an 
outer event loop and keep the other 99% of server complexity in plain 
procedural programming. 1% of state-machine code is perfectly 
acceptable.
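
concretely, such a mixed server would look something like this (a sketch
only - threadlet_exec() is assumed here to take a data pointer besides
the stack and completion head, and alloc_stack()/async_head are left to
the reader):

#include <sys/epoll.h>
#include <unistd.h>

long handle_request(void *data)
{
	int fd = (long)data;
	char buf[4096];
	ssize_t len;

	/* plain blocking code - no O_NONBLOCK, no saved continuation
	   state: if anything here blocks, the kernel moves this context
	   into a real thread and the event loop below keeps running */
	len = read(fd, buf, sizeof(buf));
	if (len > 0)
		write(fd, buf, len);
	close(fd);

	return threadlet_complete();
}

void event_loop(int epfd)
{
	struct epoll_event evs[128];
	int i, n;

	for (;;) {
		n = epoll_wait(epfd, evs, 128, -1);
		/* only 'fd became readable' is handled as an event; the
		   request itself runs as plain procedural code */
		for (i = 0; i < n; i++)
			threadlet_exec(handle_request,
				       (void *)(long)evs[i].data.fd,
				       alloc_stack(), &async_head);
	}
}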

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
  2007-02-27  9:42                     ` Jens Axboe
@ 2007-02-27 11:12                       ` Evgeniy Polyakov
  2007-02-27 11:29                         ` Jens Axboe
  2007-02-27 12:39                       ` Suparna Bhattacharya
  1 sibling, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-27 11:12 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Suparna Bhattacharya, Ingo Molnar, linux-kernel, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, David S. Miller, Davide Libenzi,
	Thomas Gleixner

My two coins:
# cat job
[global]
bs=8k
size=1g
direct=0
ioengine=sync
iodepth=32
rw=read

[file]
filename=/home/user/test

sync: 
READ: io=1,024MiB, aggrb=39,329KiB/s, minb=39,329KiB/s,
maxb=39,329KiB/s, mint=27301msec, maxt=27301msec

libaio:
READ: io=1,024MiB, aggrb=39,435KiB/s, minb=39,435KiB/s,
maxb=39,435KiB/s, mint=27228msec, maxt=27228msec

syslet-rw:
READ: io=1,024MiB, aggrb=29,567KiB/s, minb=29,567KiB/s,
maxb=29,567KiB/s, mint=36315msec, maxt=36315msec

During the syslet-rw test about 9500 async schedulings happened.
I used fio-git-20070226150114.tar.gz.

P.S. Jens, fio_latest.tar.gz has wrong permissions, it can not be
opened.

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
  2007-02-27 11:12                       ` Evgeniy Polyakov
@ 2007-02-27 11:29                         ` Jens Axboe
  2007-02-27 12:19                           ` Evgeniy Polyakov
  0 siblings, 1 reply; 337+ messages in thread
From: Jens Axboe @ 2007-02-27 11:29 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Suparna Bhattacharya, Ingo Molnar, linux-kernel, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, David S. Miller, Davide Libenzi,
	Thomas Gleixner

On Tue, Feb 27 2007, Evgeniy Polyakov wrote:
> My two coins:
> # cat job
> [global]
> bs=8k
> size=1g
> direct=0
> ioengine=sync
> iodepth=32
> rw=read
> 
> [file]
> filename=/home/user/test
> 
> sync: 
> READ: io=1,024MiB, aggrb=39,329KiB/s, minb=39,329KiB/s,
> maxb=39,329KiB/s, mint=27301msec, maxt=27301msec
> 
> libaio:
> READ: io=1,024MiB, aggrb=39,435KiB/s, minb=39,435KiB/s,
> maxb=39,435KiB/s, mint=27228msec, maxt=27228msec
> 
> syslet-rw:
> READ: io=1,024MiB, aggrb=29,567KiB/s, minb=29,567KiB/s,
> maxb=29,567KiB/s, mint=36315msec, maxt=36315msec
> 
> During the syslet-rw test about 9500 async schedulings happened.
> I used fio-git-20070226150114.tar.gz.

That looks pretty pathetic :-). What IO scheduler did you use? syslets
will confuse CFQ currently, so you want to compare using e.g. deadline
or AS. That is one of the downsides of this approach.
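
For reference, switching the scheduler is just a sysfs write - e.g.,
assuming the test file lives on sdb:

  # cat /sys/block/sdb/queue/scheduler
  noop anticipatory deadline [cfq]
  # echo deadline > /sys/block/sdb/queue/scheduler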

I'll try your test as soon as this bisect series is done.

> P.S. Jens, fio_latest.tar.gz has wrong permissions, it can not be
> opened.

Oh thanks, indeed. It was disabled symlinks that broke it. Fixed now.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-27 10:28                   ` Evgeniy Polyakov
@ 2007-02-27 11:52                     ` Theodore Tso
  2007-02-27 12:11                       ` Evgeniy Polyakov
                                         ` (2 more replies)
  0 siblings, 3 replies; 337+ messages in thread
From: Theodore Tso @ 2007-02-27 11:52 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ingo Molnar, Linus Torvalds, Ulrich Drepper, linux-kernel,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

On Tue, Feb 27, 2007 at 01:28:32PM +0300, Evgeniy Polyakov wrote:
> Obviously there are bugs, it is simply how things work.
> And debugging state machine code has exactly the same complexity as
> debugging multi-threading code - if not less...

Evgeniy,

I think what you are not hearing, and what everyone else is saying
(INCLUDING Linus), is that for most programmers, state machines are
much, much harder to program, understand, and debug compared to
multi-threaded code.  You may disagree (were you a MacOS 9 programmer
in another life?), and it may not even be true for you if you happen
to be one of those folks more at home with Scheme continuations, for
example.  But it is true that for most kernel programmers, threaded
programming is much easier to understand, and we need to engineer the
kernel for what will be maintainable for the majority of the kernel
development community.

Regards,

						- Ted

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-27 11:52                     ` Theodore Tso
@ 2007-02-27 12:11                       ` Evgeniy Polyakov
  2007-02-27 12:13                         ` Ingo Molnar
  2007-02-28 16:14                         ` Pavel Machek
  2007-02-27 12:34                       ` Ingo Molnar
  2007-02-28  3:03                       ` Michael K. Edwards
  2 siblings, 2 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-27 12:11 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Ingo Molnar, Linus Torvalds, Ulrich Drepper, linux-kernel,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

On Tue, Feb 27, 2007 at 06:52:22AM -0500, Theodore Tso (tytso@mit.edu) wrote:
> On Tue, Feb 27, 2007 at 01:28:32PM +0300, Evgeniy Polyakov wrote:
> > Obviously there are bugs, it is simply how things work.
> > And debugging state machine code has exactly the same complexity as
> > debugging multi-threading code - if not less...
> 
> Evgeniy,

Hi Ted.

> I think what you are not hearing, and what everyone else is saying
> (INCLUDING Linus), is that for most programmers, state machines are
> much, much harder to program, understand, and debug compared to
> multi-threaded code.  You may disagree (were you a MacOS 9 programmer
> in another life?), and it may not even be true for you if you happen
> to be one of those folks more at home with Scheme continuations, for
> example.  But it is true that for most kernel programmers, threaded
> programming is much easier to understand, and we need to engineer the
> kernel for what will be maintainable for the majority of the kernel
> development community.

I understand that - and I totally agree.
But when more complex, more bug-prone code results in higher performance,
it must be used. We have linked lists and binary trees - the latter are
quite complex structures, but they give higher performance in search
operations (O(log n) instead of O(n)), so we use them.

The same applies to state machines - yes, in some cases they are hard to
program, but when things are already implemented and wrapped into a nice
(non-POSIX) aio_read(), there is absolutely no usage complexity.

Even when it is up to the programmer to program a state machine based on
the generated events, such higher-layer state machines are not complex.

Let's take the simple case of (aio_)read() from a file descriptor. If the
page is in the cache, no readpage() method is called, so we do not need
to create any events - we just copy the data. If there is no page, or the
page is not uptodate, we allocate a bio and do not wait until the buffers
are read - we return to userspace and start another read. When the bio
completes and its end_io callback is called, we mark the pages as
uptodate, copy the data to userspace, and mark the event bound to the
above (aio_)read() as completed.
(That is how kevent AIO works, btw.)
The userspace programmer just calls
cookie = aio_read();
aio_wait(cookie);
or something like that.

It is simple and straightforward, especially if the data read must then
be used somewhere else - in that case the processing thread needs to
cooperate with the main one, which is easy in the event model, since
there is one place where events of _all_ types are gathered.
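
Spelled out with arguments, it is just (a sketch - these are the kevent
userspace wrappers with assumed signatures, not the POSIX aio calls):

	/* never blocks: either copies cached pages right away, or
	   queues a bio and returns a cookie for the in-flight request */
	cookie = aio_read(fd, buf, count, offset);

	do_other_work();

	/* collect the completion event for that particular request */
	err = aio_wait(cookie);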

> Regards,
> 
> 						- Ted

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-27 12:11                       ` Evgeniy Polyakov
@ 2007-02-27 12:13                         ` Ingo Molnar
  2007-02-27 12:40                           ` Evgeniy Polyakov
  2007-02-28 16:14                         ` Pavel Machek
  1 sibling, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-27 12:13 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Theodore Tso, Linus Torvalds, Ulrich Drepper, linux-kernel,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner


* Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> > [...]  But it is true that for most kernel programmers, threaded 
> > programming is much easier to understand, and we need to engineer 
> > the kernel for what will be maintainable for the majority of the 
> > kernel development community.
> 
> I understand that - and I totally agree.

why did you then write, just one mail ago, the exact opposite:

> And debugging state machine code has exactly the same complexity as 
> debugging multi-threading code - if not less...

the kernel /IS/ multi-threaded code.

which statement of yours is also patently, absurdly untrue. 
Multithreaded code is harder to debug than process based code, but it is 
still a breeze compared to complex state-machines...

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: threadlets as 'naive pool of threads', epoll, some measurements
  2007-02-27 10:37                                         ` Evgeniy Polyakov
@ 2007-02-27 12:15                                           ` Ingo Molnar
  2007-02-27 12:22                                             ` Evgeniy Polyakov
  0 siblings, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-27 12:15 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Davide Libenzi, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner


* Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> So, ok, no micro-thread name.

thanks!

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
  2007-02-27 11:29                         ` Jens Axboe
@ 2007-02-27 12:19                           ` Evgeniy Polyakov
  2007-02-27 18:45                             ` Jens Axboe
  0 siblings, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-27 12:19 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Suparna Bhattacharya, Ingo Molnar, linux-kernel, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, David S. Miller, Davide Libenzi,
	Thomas Gleixner

On Tue, Feb 27, 2007 at 12:29:08PM +0100, Jens Axboe (jens.axboe@oracle.com) wrote:
> On Tue, Feb 27 2007, Evgeniy Polyakov wrote:
> > My two coins:
> > # cat job
> > [global]
> > bs=8k
> > size=1g
> > direct=0
> > ioengine=sync
> > iodepth=32
> > rw=read
> > 
> > [file]
> > filename=/home/user/test
> > 
> > sync: 
> > READ: io=1,024MiB, aggrb=39,329KiB/s, minb=39,329KiB/s,
> > maxb=39,329KiB/s, mint=27301msec, maxt=27301msec
> > 
> > libaio:
> > READ: io=1,024MiB, aggrb=39,435KiB/s, minb=39,435KiB/s,
> > maxb=39,435KiB/s, mint=27228msec, maxt=27228msec
> > 
> > syslet-rw:
> > READ: io=1,024MiB, aggrb=29,567KiB/s, minb=29,567KiB/s,
> > maxb=29,567KiB/s, mint=36315msec, maxt=36315msec
> > 
> > During the syslet-rw test about 9500 async schedulings happened.
> > I used fio-git-20070226150114.tar.gz.
> 
> That looks pretty pathetic :-). What IO scheduler did you use? syslets
> will confuse CFQ currently, so you want to compare with using eg
> deadline or as. That is one of the downsides of this approach.

Deadline shows this:

sync:
READ: io=1,024MiB, aggrb=38,212KiB/s, minb=38,212KiB/s,
maxb=38,212KiB/s, mint=28099msec, maxt=28099msec

libaio:
READ: io=1,024MiB, aggrb=37,933KiB/s, minb=37,933KiB/s,
maxb=37,933KiB/s, mint=28306msec, maxt=28306msec

syslet-rw:
READ: io=1,024MiB, aggrb=34,759KiB/s, minb=34,759KiB/s,
maxb=34,759KiB/s, mint=30891msec, maxt=30891msec

There were about 10k async schedulings.

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: threadlets as 'naive pool of threads', epoll, some measurements
  2007-02-27 12:15                                           ` Ingo Molnar
@ 2007-02-27 12:22                                             ` Evgeniy Polyakov
  0 siblings, 0 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-27 12:22 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Davide Libenzi, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Tue, Feb 27, 2007 at 01:15:42PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> > So, ok, no micro-thread name.
> 
> thanks!

:) no problem!

> 	Ingo

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-27 11:52                     ` Theodore Tso
  2007-02-27 12:11                       ` Evgeniy Polyakov
@ 2007-02-27 12:34                       ` Ingo Molnar
  2007-02-27 13:14                         ` Evgeniy Polyakov
  2007-02-27 13:32                         ` Avi Kivity
  2007-02-28  3:03                       ` Michael K. Edwards
  2 siblings, 2 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-02-27 12:34 UTC (permalink / raw)
  To: Theodore Tso, Evgeniy Polyakov, Linus Torvalds, Ulrich Drepper,
	linux-kernel, Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	Alan Cox, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner


* Theodore Tso <tytso@mit.edu> wrote:

> I think what you are not hearing, and what everyone else is saying 
> (INCLUDING Linus), is that for most programmers, state machines are 
> much, much harder to program, understand, and debug compared to 
> multi-threaded code. [...]

btw., another crucial thing that i think Evgeniy is missing is that 
threadlets /enable/ event loops to be used in practice! Right now the 
epoll/kevent programming model requires a total 100% avoidance of all 
context-switching in the 'main' event handler context while handling a 
request. If just 1% of all requests happen to block it might cause a 
/complete/ breakdown of an event loop's performance - it can easily 
cause a 10x drop in performance or worse!

So context-switching has to be avoided in 100% of the code that runs 
while handling requests, file descriptors have to be set to nonblocking 
(causing extra system calls), and all the syscalls that might return 
incomplete with either -EAGAIN or with a short read/write have to be 
converted into a state machine. (or in the alternative, user-space 
threading has to be used, which opens up another hornet's nest)

/That/ is the main inhibiting factor of the measured use of event loops 
within Linux! It has zero integration capabilities with 'usual' coding 
techniques - driving the costs of its application up in the sky, and 
pushing event based servers into niches.
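
to make it concrete: 'converted into a state machine' means that every
blocking call site grows glue like this (a sketch - 'conn', its states
and its fields are made up for illustration):

	/* extra syscalls, for every fd */
	fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK);

	switch (conn->state) {
	case SENDING_REPLY:
		len = write(fd, conn->buf + conn->off, conn->len - conn->off);
		if (len < 0 && errno == EAGAIN)
			return;		/* re-arm epoll, resume here later */
		conn->off += len;
		if (conn->off < conn->len)
			return;		/* short write: stay in this state */
		conn->state = READING_REQUEST;
		/* ... and so on for every other syscall that can block */
	}

with threadlets all of this collapses back into one plain, blocking
write().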

With threadlets the picture changes dramatically: all we have to 
concentrate on to get the performance of "100% event based servers" is 
to handle 'most' rescheduling events in the event loop. A 10-20% context 
switching ratio does not hurt at all. (it causes ~1% of throughput 
loss.)

Furthermore, even if a particular configuration or module of the server 
(say Apache) happens to trigger a high rate of scheduling, the 
performance breakdown model of threadlets is /vastly/ superior to event 
based servers. The measurements so far have shown that the absolute 
worst-case threading server performance is at around 60% of that of 
non-context-switching servers - and even that level is reached 
gradually, leaving time for action for the server owner. While with 
fully event based servers there are mostly only two modes of 
performance: 100% performance and near-0% performance: total breakdown.

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
  2007-02-27  9:42                     ` Jens Axboe
  2007-02-27 11:12                       ` Evgeniy Polyakov
@ 2007-02-27 12:39                       ` Suparna Bhattacharya
  2007-02-28  8:31                         ` Jens Axboe
  1 sibling, 1 reply; 337+ messages in thread
From: Suparna Bhattacharya @ 2007-02-27 12:39 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ingo Molnar, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller, Davide Libenzi,
	Thomas Gleixner

On Tue, Feb 27, 2007 at 10:42:11AM +0100, Jens Axboe wrote:
> On Tue, Feb 27 2007, Suparna Bhattacharya wrote:
> > On Mon, Feb 26, 2007 at 03:45:48PM +0100, Jens Axboe wrote:
> > > On Mon, Feb 26 2007, Suparna Bhattacharya wrote:
> > > > On Mon, Feb 26, 2007 at 02:57:36PM +0100, Jens Axboe wrote:
> > > > > 
> > > > > Some more results, using a larger number of processes and io depths. A
> > > > > repeat of the tests from friday, with added depth 20000 for syslet and
> > > > > libaio:
> > > > > 
> > > > > Engine          Depth   Processes       Bw (MiB/sec)
> > > > > ----------------------------------------------------
> > > > > libaio            1         1            602
> > > > > syslet            1         1            759
> > > > > sync              1         1            776
> > > > > libaio           32         1            832
> > > > > syslet           32         1            898
> > > > > libaio        20000         1            581
> > > > > syslet        20000         1            609
> > > > > 
> > > > > syslet still on top. Measuring O_DIRECT reads (of 4kb size) on ramfs
> > > > > with 100 processes each with a depth of 200, reading a per-process
> > > > > private file of 10mb (need to fit in my ram...) 10 times each. IOW,
> > > > > doing 10,000MiB of IO in total:
> > > > 
> > > > But, why ramfs ? Don't we want to exercise the case where O_DIRECT actually
> > > > blocks ? Or am I missing something here ?
> > > 
> > > Just overhead numbers for that test case, lets try something like your
> > > described job.
> > > 
> > > Test case is doing random reads from /dev/sdb, in chunks of 64kb:
> > > 
> > > Engine          Depth   Processes       Bw (KiB/sec)
> > > ----------------------------------------------------
> > > libaio           200       100            2813
> > > syslet           200       100            3944
> > > libaio         20000         1            2793
> > > syslet         20000         1            3854
> > > sync (*)       20000         1            2866
> > > 
> > > deadline was used for IO scheduling, to minimize impact. Not sure why
> > > syslet actually does so much better here, looking at vmstat the rate is
> > > steady and all runs are basically 50/50 idle/wait. One difference is
> > > that the submission itself takes a long time on libaio, since the
> > > io_submit() will block on request allocation.  The generated IO pattern
> > > from each process is the same for all runs. The drive is a lousy sata
> > > that doesn't even do queuing, FWIW.
> > 
> > 
> > I tried the latest fio code with syslet v4, and my results are a little
> > different - have yet to figure out why or what to make of it.
> > I hope I have all the right pieces now.
> > 
> > This is an ext2 filesystem, SCSI AIC7xxx.
> > 
> > I used an iodepth_batch size of 8 to limit the number of ios in a single
> > io_submit (thanks for adding that parameter to fio !), like we did in
> > aio-stress.
> > 
> > Engine          Depth      Batch	Bw (KiB/sec)
> > ----------------------------------------------------
> > libaio		64	   8		17,226
> > syslet		64	   8		17,620
> > libaio		20000	   8		18,552
> > syslet		20000	   8		14,935
> > 
> > 
> > Which is not bad, actually.
> 
> It's not bad for such a high depth/batch setting, but I still wonder why
> our results are so different. I'll look around for an x86 box with some
> TCQ/NCQ enabled storage attached for testing. Can you pass me your
> command line or job file (whatever you use) so we are on the same page?

Sure - I used variations of the following job file (e.g. engine=syslet-rw,
iodepth=20000).

Also the io scheduler on my system is set to Anticipatory by default.
FWIW it is a 4 way SMP (PIII, 700MHz)

; aio-stress -l -O -o3 <1GB file>
[global]
ioengine=libaio
buffered=0
rw=randread
bs=64k
size=1024m
directory=/kdump/suparna

[testfile2]
iodepth=64
iodepth_batch=8
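
The variations were just the obvious edits - e.g. the syslet run at the
higher depth changed these lines (assumed, since only the engine and the
depth differed between runs):

ioengine=syslet-rw
iodepth=20000
iodepth_batch=8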

> 
> > If I do not specify the iodepth_batch (i.e. default to depth), then the
> > difference becomes more pronounced at higher depths. However, I doubt
> > whether anyone would be using such high batch sizes in practice ...
> >
> > Engine          Depth      Batch	Bw (KiB/sec)
> > ----------------------------------------------------
> > libaio		64	   default	17,429
> > syslet		64	   default	16,155
> > libaio		20000	   default	15,494
> > syslet		20000	   default	7,971
> >
> If iodepth_batch isn't set, the syslet queued io will be serialized and

I see, so then this particular setting is not very meaningful

> not take advantage of queueing. How does the job file perform with
> ioengine=sync?

Just tried it now : 9,027KiB/s

> 
> > Often times it is the application tuning that makes all the difference,
> > so am not really sure how much to read into these results.
> > That's always been the hard part of async io ...
> 
> Yes I agree, it's handy to get an overview though.

True, at least some of this helps us gain a little more understanding
about the boundaries and how to tune it to be most effective.

Regards
Suparna

> 
> -- 
> Jens Axboe

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India


^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-27 12:13                         ` Ingo Molnar
@ 2007-02-27 12:40                           ` Evgeniy Polyakov
  0 siblings, 0 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-27 12:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Theodore Tso, Linus Torvalds, Ulrich Drepper, linux-kernel,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

On Tue, Feb 27, 2007 at 01:13:28PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> 
> > > [...]  But it is true that for most kernel programmers, threaded 
> > > programming is much easier to understand, and we need to engineer 
> > > the kernel for what will be maintainable for the majority of the 
> > > kernel development community.
> > 
> > I understand that - and I totally agree.
> 
> why did you then write, just one mail ago, the exact opposite:
> 
> > And debugging state machine code has exactly the same complexity as 
> > debugging multi-threading code - if not less...

Because the thread machinery is much more complex than the event one -
just compare the amount of code: kernel/sched.c alone is about the same
size as the whole of kevent :)

> the kernel /IS/ multi-threaded code.
> 
> which statement of yours is also patently, absurdly untrue. 
> Multithreaded code is harder to debug than process based code, but it is 
> still a breeze compared to complex state-machines...

It seems we are talking about different levels.

The model I propose to use in userspace - very simple events, mostly
about completion of a request - is simple to use and simple to debug. It
can be slightly harder to debug than the simplest threading model (one
thread per logical entity, never interacting with the others), though.

From the userspace point of view it is about the same complexity to check
why an event is not marked as ready as it is to check why some thread
never got scheduled...
And that is without taking into account the synchronization needed to run
multithreaded code without problems.

> 	Ingo

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-27 12:34                       ` Ingo Molnar
@ 2007-02-27 13:14                         ` Evgeniy Polyakov
  2007-02-27 13:32                         ` Avi Kivity
  1 sibling, 0 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-27 13:14 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Theodore Tso, Linus Torvalds, Ulrich Drepper, linux-kernel,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

On Tue, Feb 27, 2007 at 01:34:21PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> based servers. The measurements so far have shown that the absolute 
> worst-case threading server performance is at around 60% of that of 
> non-context-switching servers - and even that level is reached 
> gradually, leaving time for action for the server owner. While with 
> fully event based servers there are mostly only two modes of 
> performance: 100% performance and near-0% performance: total breakdown.

Let's live in peace! :)
I always agreed that they should be used together - event-based rings of
IO requests, where, if a request happens to block (which should be
avoided as much as possible), it continues on behalf of a sleeping
thread.

> 	Ingo

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-27 12:34                       ` Ingo Molnar
  2007-02-27 13:14                         ` Evgeniy Polyakov
@ 2007-02-27 13:32                         ` Avi Kivity
  1 sibling, 0 replies; 337+ messages in thread
From: Avi Kivity @ 2007-02-27 13:32 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Theodore Tso, Evgeniy Polyakov, Linus Torvalds, Ulrich Drepper,
	linux-kernel, Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	Alan Cox, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

Ingo Molnar wrote:
> * Theodore Tso <tytso@mit.edu> wrote:
>
>   
>> I think what you are not hearing, and what everyone else is saying 
>> (INCLUDING Linus), is that for most programmers, state machines are 
>> much, much harder to program, understand, and debug compared to 
>> multi-threaded code. [...]
>>     
>
> btw., another crucial thing that i think Evgeniy is missing is that 
> threadlets /enable/ event loops to be used in practice! Right now the 
> epoll/kevent programming model requires a total 100% avoidance of all 
> context-switching in the 'main' event handler context while handling a 
> request. If just 1% of all requests happen to block it might cause a 
> /complete/ breakdown of an event loop's performance - it can easily 
> cause a 10x drop in performance or worse!
>
> So context-switching has to be avoided in 100% of the code that runs 
> while handling requests, file descriptors have to be set to nonblocking 
> (causing extra system calls), and all the syscalls that might return 
> incomplete with either -EAGAIN or with a short read/write have to be 
> converted into a state machine. (or in the alternative, user-space 
> threading has to be used, which opens up another hornet's nest)
>
> /That/ is the main inhibiting factor of the measured use of event loops 
> within Linux! It has zero integration capabilities with 'usual' coding 
> techniques - driving the costs of its application up in the sky, and 
> pushing event based servers into niches.
>
>   

Having written such a niche event based server, I can 100% confirm what 
Ingo is saying here.  We had a single process drive I/O to the kernel 
through an event model (based on kernel aio extended with IO_CMD_POLL), 
and user level threads managed by a custom scheduler that managed I/O, 
timeouts, and thread scheduling.

We once considered dropping from a user-level thread model to a state 
machine model, but the effort was astronomical and we wouldn't see the 
rewards until it was all done, so naturally we didn't do it.

> With threadlets the picture changes dramatically: all we have to 
> concentrate on to get the performance of "100% event based servers" is 
> to handle 'most' rescheduling events in the event loop. A 10-20% context 
> switching ratio does not hurt at all. (it causes ~1% of throughput 
> loss.)
>
> Furthermore, even if a particular configuration or module of the server 
> (say Apache) happens to trigger a high rate of scheduling, the 
> performance breakdown model of threadlets is /vastly/ superior to event 
> based servers. The measurements so far have shown that the absolute 
> worst-case threading server performance is at around 60% of that of 
> non-context-switching servers - and even that level is reached 
> gradually, leaving time for action for the server owner. While with 
> fully event based servers there are mostly only two modes of 
> performance: 100% performance and near-0% performance: total breakdown.
>   

Yes.  Threadlets as the default aio solution (easy to use, acceptable 
performance even in worst cases), with specialized solutions where 
applicable (epoll for networking, aio for O_DIRECT disk) look like a 
good mix of performance and sanity.



-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
  2007-02-27  4:33                   ` Suparna Bhattacharya
  2007-02-27  9:42                     ` Jens Axboe
@ 2007-02-27 13:54                     ` Avi Kivity
  2007-02-27 15:25                       ` Ingo Molnar
  1 sibling, 1 reply; 337+ messages in thread
From: Avi Kivity @ 2007-02-27 13:54 UTC (permalink / raw)
  To: suparna
  Cc: Jens Axboe, Ingo Molnar, linux-kernel, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, Evgeniy Polyakov, David S. Miller,
	Davide Libenzi, Thomas Gleixner

Suparna Bhattacharya wrote:
> I tried the latest fio code with syslet v4, and my results are a little
> different - have yet to figure out why or what to make of it.
> I hope I have all the right pieces now.
>
> This is an ext2 filesystem, SCSI AIC7xxx.
>
> I used an iodepth_batch size of 8 to limit the number of ios in a single
> io_submit (thanks for adding that parameter to fio !), like we did in
> aio-stress.
>
> Engine          Depth      Batch	Bw (KiB/sec)
> ----------------------------------------------------
> libaio		64	   8		17,226
> syslet		64	   8		17,620
> libaio		20000	   8		18,552
> syslet		20000	   8		14,935
>
>
> Which is not bad, actually.
>
> If I do not specify the iodepth_batch (i.e. default to depth), then the
> difference becomes more pronounced at higher depths. However, I doubt
> whether anyone would be using such high batch sizes in practice ...
>
> Engine          Depth      Batch	Bw (KiB/sec)
> ----------------------------------------------------
> libaio		64	   default	17,429
> syslet		64	   default	16,155
> libaio		20000	   default	15,494
> syslet		20000	   default	7,971
>   

But what about cpu usage?  At these low levels, the cpu is probably 
underutilized.  It would be interesting to measure cpu time per I/O 
request (or, alternatively, use an I/O subsystem that can saturate the 
processors).


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
  2007-02-27 13:54                     ` Avi Kivity
@ 2007-02-27 15:25                       ` Ingo Molnar
  2007-02-27 16:15                         ` Avi Kivity
  0 siblings, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-27 15:25 UTC (permalink / raw)
  To: Avi Kivity
  Cc: suparna, Jens Axboe, linux-kernel, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, Evgeniy Polyakov, David S. Miller,
	Davide Libenzi, Thomas Gleixner


* Avi Kivity <avi@argo.co.il> wrote:

> But what about cpu usage?  At these low levels, the cpu is probably 
> underutilized.  It would be interesting to measure cpu time per I/O 
> request (or, alternatively, use an I/O subsystem that can saturate the 
> processors).

yeah - that's what testing on ramdisk (Jens') or on a loopback block 
device (mine) approximates to a certain degree.

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-27 10:13                                     ` Evgeniy Polyakov
@ 2007-02-27 16:01                                       ` Davide Libenzi
  2007-02-27 16:21                                         ` Evgeniy Polyakov
  0 siblings, 1 reply; 337+ messages in thread
From: Davide Libenzi @ 2007-02-27 16:01 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ingo Molnar, Ulrich Drepper, Linux Kernel Mailing List,
	Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Tue, 27 Feb 2007, Evgeniy Polyakov wrote:

> On Mon, Feb 26, 2007 at 06:18:51PM -0800, Davide Libenzi (davidel@xmailserver.org) wrote:
> > On Mon, 26 Feb 2007, Evgeniy Polyakov wrote:
> > 
> > > 2. its notifications do not go through the second loop, i.e. it is O(1),
> > > not O(ready_num), and notifications happens directly from internals of
> > > the appropriate subsystem, which does not require special wakeup
> > > (although it can be done too).
> > 
> > Sorry if I do not read kevent code correctly, but in kevent_user_wait() 
> > there is a:
> > 
> >     while (num < max_nr && ((k = kevent_dequeue_ready(u)) != NULL)) {
> >         ...
> >     }
> > 
> > loop, that makes it O(ready_num). From a mathematical standpoint, they're 
> > both O(ready_num), but epoll is doing three passes over the ready set.
> > I always thought that if the number of ready events is so big that the 
> > extra passes over the ready set become relevant, the "work" done by 
> > userspace for each fetched event would probably make the extra cost 
> > irrelevant.
> > But that can be fixed by a patch that will follow on lkml ...
>  
> No, kevent_dequeue_ready() copies data to userspace, that is it.
> So it looks roughly following:

In all the books where I studied, the algorithms below would be classified 
as O(num_ready) ones:

[sys_kevent_wait]
+	for (i=0; i<num; ++i) {
+		k = kevent_dequeue_ready_ring(u);
+		if (!k)
+			break;
+		kevent_complete_ready(k);
+
+		if (k->event.ret_flags & KEVENT_RET_COPY_FAILED)
+			break;
+		kevent_stat_ring(u);
+		copied++;
+	}

[kevent_user_wait]
+	while (num < max_nr && ((k = kevent_dequeue_ready(u)) != NULL)) {
+		if (copy_to_user(buf + num*sizeof(struct ukevent),
+					&k->event, sizeof(struct ukevent))) {
+			if (num == 0)
+				num = -EFAULT;
+			break;
+		}
+		kevent_complete_ready(k);
+		++num;
+		kevent_stat_wait(u);
+	}

It does not matter whether inside the loop you invert a 20Kx20K matrix or
copy a single byte - both are O(num_ready).



- Davide



^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
  2007-02-27 15:25                       ` Ingo Molnar
@ 2007-02-27 16:15                         ` Avi Kivity
  2007-02-27 16:16                           ` Ingo Molnar
  2007-02-27 18:49                           ` Jens Axboe
  0 siblings, 2 replies; 337+ messages in thread
From: Avi Kivity @ 2007-02-27 16:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: suparna, Jens Axboe, linux-kernel, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, Evgeniy Polyakov, David S. Miller,
	Davide Libenzi, Thomas Gleixner

Ingo Molnar wrote:
> * Avi Kivity <avi@argo.co.il> wrote:
>
>   
>> But what about cpu usage?  At these low levels, the cpu is probably 
>> underutilized.  It would be interesting to measure cpu time per I/O 
>> request (or, alternatively, use an I/O subsystem that can saturate the 
>> processors).
>>     
>
> yeah - that's what testing on ramdisk (Jens') or on a loopback block 
> device (mine) approximates to a certain degree.
>
>   

Ramdisks or fully cached loopback return immediately, so cache thrashing 
effects don't show up.

Maybe a device mapper delay target or nbd + O_DIRECT can insert delays 
to make the workload more disk-like.
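
With dm-delay that would be something like (the target is brand new, so
the table syntax here is an assumption on my part - "start length delay
<dev> <offset> <delay_ms>"):

  # echo "0 $(blockdev --getsz /dev/sdb) delay /dev/sdb 0 100" | \
	dmsetup create sdb-slow

which would add ~100ms to every request.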




-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
  2007-02-27 16:15                         ` Avi Kivity
@ 2007-02-27 16:16                           ` Ingo Molnar
  2007-02-27 16:26                             ` Avi Kivity
  2007-02-27 18:49                           ` Jens Axboe
  1 sibling, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-27 16:16 UTC (permalink / raw)
  To: Avi Kivity
  Cc: suparna, Jens Axboe, linux-kernel, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, Evgeniy Polyakov, David S. Miller,
	Davide Libenzi, Thomas Gleixner


* Avi Kivity <avi@argo.co.il> wrote:

> > yeah - that's what testing on ramdisk (Jens') or on a loopback block 
> > device (mine) approximates to a certain degree.
> 
> Ramdisks or fully cached loopback return immediately, so cache 
> thrashing effects don't show up.

even fully cached loopback schedules the loopback kernel thread - but i 
agree that it's inaccurate: hence the 'approximates to a certain 
degree'.

> Maybe a device mapper delay target or nbd + O_DIRECT can insert delays 
> to make the workload more disk-like.

yeah. I'll hack a small timeout into loopback requests i think. But then 
real disk-platter effects are left out ... so it all comes down to 
eventually having to try it on real disks too :)

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-27 16:01                                       ` Davide Libenzi
@ 2007-02-27 16:21                                         ` Evgeniy Polyakov
  2007-02-27 16:58                                           ` Eric Dumazet
  2007-02-27 19:20                                           ` Davide Libenzi
  0 siblings, 2 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-27 16:21 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Ingo Molnar, Ulrich Drepper, Linux Kernel Mailing List,
	Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Tue, Feb 27, 2007 at 08:01:05AM -0800, Davide Libenzi (davidel@xmailserver.org) wrote:
> It does not matter if inside the loop you invert a 20Kx20K matrix or you 
> copy a byte, they both are O(num_ready).

I probably selected the wrong words to describe it; here is in detail how
kevent differs from epoll.

The polling case needs to perform an additional check before an event can
be copied to userspace, and that check must be done for each event being
copied. Kevent does not need that (it needs it only for poll emulation) -
if an event is ready, then it is ready.

sys_poll() creates a wait queue where callbacks for the different events
are stored; when a driver calls wake_up(), the appropriate event is added
to the ready list and wake_up() is called on that wait queue, which in
turn calls ->poll for each event and transfers it to userspace if it is
ready.

Kevent works slightly differently - it does not perform an additional
readiness check (although it can, and it does so for poll notifications);
if an event is marked as ready, the thread parked in the waiting syscall
is awakened and the event is copied to userspace.
Also, the waiting syscall is awakened through one queue - the event is
added and wake_up() is called - while in epoll() there are two queues.

> - Davide
> 

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
  2007-02-27 16:16                           ` Ingo Molnar
@ 2007-02-27 16:26                             ` Avi Kivity
  0 siblings, 0 replies; 337+ messages in thread
From: Avi Kivity @ 2007-02-27 16:26 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: suparna, Jens Axboe, linux-kernel, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, Evgeniy Polyakov, David S. Miller,
	Davide Libenzi, Thomas Gleixner

Ingo Molnar wrote:
>   
>> Maybe a device mapper delay target or nbd + O_DIRECT can insert delays 
>> to make the workload more disk-like.
>>     
>
> yeah. I'll hack a small timeout into loopback requests i think. But then 
> real disk-platter effects are left out ... so it all comes down to 
> eventually having to try it on real disks too :)
>   

Having a random component in the timeout may increase realism.

hundred-disk boxes are noisy, though the blinkenlights are nice :)


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-27 16:21                                         ` Evgeniy Polyakov
@ 2007-02-27 16:58                                           ` Eric Dumazet
  2007-02-27 17:06                                             ` Evgeniy Polyakov
  2007-02-27 19:20                                           ` Davide Libenzi
  1 sibling, 1 reply; 337+ messages in thread
From: Eric Dumazet @ 2007-02-27 16:58 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Davide Libenzi, Ingo Molnar, Ulrich Drepper,
	Linux Kernel Mailing List, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Jens Axboe,
	Thomas Gleixner

On Tuesday 27 February 2007 17:21, Evgeniy Polyakov wrote:
> I probably selected the wrong words to describe it; here is in detail how
> kevent differs from epoll.
>
> The polling case needs to perform an additional check before an event can
> be copied to userspace, and that check must be done for each event being
> copied. Kevent does not need that (it needs it only for poll emulation) -
> if an event is ready, then it is ready.
>
> sys_poll() creates a wait queue where callbacks for the different events
> are stored; when a driver calls wake_up(), the appropriate event is added
> to the ready list and wake_up() is called on that wait queue, which in
> turn calls ->poll for each event and transfers it to userspace if it is
> ready.
>
> Kevent works slightly differently - it does not perform an additional
> readiness check (although it can, and it does so for poll notifications);
> if an event is marked as ready, the thread parked in the waiting syscall
> is awakened and the event is copied to userspace.
> Also, the waiting syscall is awakened through one queue - the event is
> added and wake_up() is called - while in epoll() there are two queues.

Thank you Evgeniy for this comparison. poll()/select()/epoll() are tricky 
indeed.

I believe one advantage of epoll is that it uses the standard mechanisms
(mandated for poll()/select()), while kevent adds some glue and a
kevent_storage to some structures (struct inode, struct file, ...), thus
adding some extra code and extra storage to hot paths. Yes, there might
be a gain IF most users of these paths want kevent. But the other users
pay the price (larger kernel code and data) - one that you cannot easily
bench.

Using epoll or not has nearly zero cost over the standard kernel (only
struct file has some extra storage)


^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-27 16:58                                           ` Eric Dumazet
@ 2007-02-27 17:06                                             ` Evgeniy Polyakov
  0 siblings, 0 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-27 17:06 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Davide Libenzi, Ingo Molnar, Ulrich Drepper,
	Linux Kernel Mailing List, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Jens Axboe,
	Thomas Gleixner

On Tue, Feb 27, 2007 at 05:58:14PM +0100, Eric Dumazet (dada1@cosmosbay.com) wrote:
> I believe one advantage of epoll is it uses standard mechanism (mandated for 
> poll()/select() ), while kevent adds some glue and kevent_storage in some 
> structures (struct inode, struct file, ...), thus adding some extra code and 
> extra storage in hot paths. Yes there might be a gain IF most users of these 
> path want kevent. But other users pay the price (larger kernel code and 
> data), that you cannot easily bench.
> 
> Using or not epoll has nearly zero cost over standard kernel (only struct file 
> has some extra storage)

Well, that's the price - anything which wants to support events needs
storage for them. kevent_storage is a list_head plus a spinlock and a
pointer to itself (with all current users that pointer can be removed and
the accesses converted to container_of()) - it is exactly like the epoll
storage; both were created with the smallest possible overhead in mind.

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-26 20:35                                   ` Ingo Molnar
  2007-02-26 22:06                                     ` Bill Huey
  2007-02-27 10:09                                     ` Evgeniy Polyakov
@ 2007-02-27 17:13                                     ` Pavel Machek
  2 siblings, 0 replies; 337+ messages in thread
From: Pavel Machek @ 2007-02-27 17:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Evgeniy Polyakov, Ulrich Drepper, linux-kernel, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

Hi!

> > If kernelspace rescheduling is that fast, then please explain me why 
> > userspace one always beats kernel/userspace?
> 
> because 'user space scheduling' makes no sense? I explained my thinking 
> about that in a past mail:
...
>  2) there has been an IO event. The thing is, for IO events we enter the
>     kernel no matter what - and we'll do so for the next 10 years at

..actually, at some point 3D acceleration was done by accessing hw
directly from userspace. OTOH I think we are moving away from that
model, so it is probably irrelevant here.
							Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
  2007-02-27 12:19                           ` Evgeniy Polyakov
@ 2007-02-27 18:45                             ` Jens Axboe
  2007-02-27 19:08                               ` Evgeniy Polyakov
  0 siblings, 1 reply; 337+ messages in thread
From: Jens Axboe @ 2007-02-27 18:45 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Suparna Bhattacharya, Ingo Molnar, linux-kernel, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, David S. Miller, Davide Libenzi,
	Thomas Gleixner

On Tue, Feb 27 2007, Evgeniy Polyakov wrote:
> On Tue, Feb 27, 2007 at 12:29:08PM +0100, Jens Axboe (jens.axboe@oracle.com) wrote:
> > On Tue, Feb 27 2007, Evgeniy Polyakov wrote:
> > > My two coins:
> > > # cat job
> > > [global]
> > > bs=8k
> > > size=1g
> > > direct=0
> > > ioengine=sync
> > > iodepth=32
> > > rw=read
> > > 
> > > [file]
> > > filename=/home/user/test
> > > 
> > > sync: 
> > > READ: io=1,024MiB, aggrb=39,329KiB/s, minb=39,329KiB/s,
> > > maxb=39,329KiB/s, mint=27301msec, maxt=27301msec
> > > 
> > > libaio:
> > > READ: io=1,024MiB, aggrb=39,435KiB/s, minb=39,435KiB/s,
> > > maxb=39,435KiB/s, mint=27228msec, maxt=27228msec
> > > 
> > > syslet-rw:
> > > READ: io=1,024MiB, aggrb=29,567KiB/s, minb=29,567KiB/s,
> > > maxb=29,567KiB/s, mint=36315msec, maxt=36315msec
> > > 
> > > During the syslet-rw test about 9500 async schedulings happened.
> > > I used fio-git-20070226150114.tar.gz.
> > 
> > That looks pretty pathetic :-). What IO scheduler did you use? syslets
> > will confuse CFQ currently, so you want to compare with using eg
> > deadline or as. That is one of the downsides of this approach.
> 
> Deadline shows this:
> 
> sync:
> READ: io=1,024MiB, aggrb=38,212KiB/s, minb=38,212KiB/s,
> maxb=38,212KiB/s, mint=28099msec, maxt=28099msec
> 
> libaio:
> READ: io=1,024MiB, aggrb=37,933KiB/s, minb=37,933KiB/s,
> maxb=37,933KiB/s, mint=28306msec, maxt=28306msec
> 
> syslet-rw:
> READ: io=1,024MiB, aggrb=34,759KiB/s, minb=34,759KiB/s,
> maxb=34,759KiB/s, mint=30891msec, maxt=30891msec
> 
> There were about 10k async schedulings.

I think the issue here is pretty simple - when fio gets a queue-full-like
condition (it reaches the depth you set, 32), it commits the requests and
starts queuing again. Since that'll likely block, it'll get issued by
another process. So you suddenly have a nice sequence of reads from one
process (pending, only one is actually committed since it's serialized),
and then a read further down the line that goes behind those you already
committed. The result is seeky, where it should have been sequential.

Do you get expected results if you set iodepth_low=1? That'll make fio
drain the queue before building it up again, should get you a sequential
access pattern with syslets.
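
In job file terms that is just one extra line in your [global] section
(iodepth_low is the low-water mark: how far the queue must drain before
fio starts filling it again - 1 means down to a single request in
flight):

iodepth=32
iodepth_low=1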

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets",  generic AIO support, v3)
  2007-02-27 16:15                         ` Avi Kivity
  2007-02-27 16:16                           ` Ingo Molnar
@ 2007-02-27 18:49                           ` Jens Axboe
  1 sibling, 0 replies; 337+ messages in thread
From: Jens Axboe @ 2007-02-27 18:49 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Ingo Molnar, suparna, linux-kernel, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, Evgeniy Polyakov, David S. Miller,
	Davide Libenzi, Thomas Gleixner

On Tue, Feb 27 2007, Avi Kivity wrote:
> Ingo Molnar wrote:
> >* Avi Kivity <avi@argo.co.il> wrote:
> >
> >  
> >>But what about cpu usage?  At these low levels, the cpu is probably 
> >>underutilized.  It would be interesting to measure cpu time per I/O 
> >>request (or, alternatively, use an I/O subsystem that can saturate the 
> >>processors).
> >>    
> >
> >yeah - that's what testing on ramdisk (Jens') or on a loopback block 
> >device (mine) approximates to a certain degree.
> >
> >  
> 
> Ramdisks or fully cached loopback return immediately, so cache thrashing 
> effects don't show up.
> 
> Maybe a device mapper delay target or nbd + O_DIRECT can insert delays 
> to make the workload more disk-like.

Take a look at scsi-debug, it can do at least some of that.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
  2007-02-27 18:45                             ` Jens Axboe
@ 2007-02-27 19:08                               ` Evgeniy Polyakov
  2007-02-27 19:25                                 ` Jens Axboe
  0 siblings, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-27 19:08 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Suparna Bhattacharya, Ingo Molnar, linux-kernel, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, David S. Miller, Davide Libenzi,
	Thomas Gleixner

On Tue, Feb 27, 2007 at 07:45:41PM +0100, Jens Axboe (jens.axboe@oracle.com) wrote:
> > Deadline shows this:
> > 
> > sync:
> > READ: io=1,024MiB, aggrb=38,212KiB/s, minb=38,212KiB/s,
> > maxb=38,212KiB/s, mint=28099msec, maxt=28099msec
> > 
> > libaio:
> > READ: io=1,024MiB, aggrb=37,933KiB/s, minb=37,933KiB/s,
> > maxb=37,933KiB/s, mint=28306msec, maxt=28306msec
> > 
> > syslet-rw:
> > READ: io=1,024MiB, aggrb=34,759KiB/s, minb=34,759KiB/s,
> > maxb=34,759KiB/s, mint=30891msec, maxt=30891msec
> > 
> > There were about 10k async schedulings.
> 
> I think the issue here is pretty simple - when fio gets a queue full
> like condition (it reaches the depth you set, 32), it commits them and
> starts queuing again. Since that'll likely block, it'll get issued by
> another process. So you suddenly have a nice sequence of reads from one
> process (pending, only one is actually committed since it's serialized),
> and then a read further down the line that goes behind those you already
> committed. Then result is seeky, where it should have been sequential.
> 
> Do you get expected results if you set iodepth_low=1? That'll make fio
> drain the queue before building it up again, which should get you a
> sequential access pattern with syslets.

With such a change, results should be better - not only because the
seeking is removed for a sequential read, but also because the number of
working threads decreases over time - until the queue is filled again.

So, syslet-rw has increased to 37mb/sec, compared to 39 for sync and 38
for libaio; the latter two did not change.

With an iodepth of 10k, I get the same performance for
libaio and syslets - about 36mb/sec; it does not depend on iodepth_low
being set to 1 or left at the default (full).

So syslets have a small problem at low iodepth - performance
is about 34mb/sec and then increases to 36 as the iodepth
grows, while libaio decreases from 38 down to 36 mb/sec.

iodepth_low=1 helps syslets reach 37mb/sec with iodepth=32; with 3200
and 10k it does not play any role.

> -- 
> Jens Axboe

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-27 16:21                                         ` Evgeniy Polyakov
  2007-02-27 16:58                                           ` Eric Dumazet
@ 2007-02-27 19:20                                           ` Davide Libenzi
  1 sibling, 0 replies; 337+ messages in thread
From: Davide Libenzi @ 2007-02-27 19:20 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ingo Molnar, Ulrich Drepper, Linux Kernel Mailing List,
	Linus Torvalds, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Tue, 27 Feb 2007, Evgeniy Polyakov wrote:

> I probably selected the wrong words to describe it; here is in detail how
> kevent differs from epoll.
> 
> The polling case needs to perform an additional check before an event can
> be copied to userspace, and that check must be done for each event being
> copied. Kevent does not need that (it needs it only for poll emulation) -
> if an event is ready, then it is ready.

That could be changed too. The "void *key" doesn't need to be NULL. Wake-ups 
to f_op->poll() waiters can use it to send ready events directly, 
avoiding an extra f_op->poll() to fetch them.
The infrastructure is already there; it just needs a big patch to do it 
everywhere ;)
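
A minimal sketch of what the consuming side could look like - the
collector_item structure and its fields are made up for illustration;
only the wait-queue callback signature and init_waitqueue_func_entry()
are the existing kernel interfaces:

/* hypothetical per-fd item registered on a file's poll waitqueue */
struct collector_item {
	wait_queue_t	wait;		/* hooked into the file's poll waitqueue */
	unsigned long	ready_mask;	/* ready bits delivered via "key" */
	int		needs_poll;	/* set when a waker passed no mask */
};

static int collector_wake(wait_queue_t *curr, unsigned mode, int sync,
			  void *key)
{
	struct collector_item *item =
		container_of(curr, struct collector_item, wait);

	if (key)
		/* the waker encoded the ready event bits in the key pointer */
		item->ready_mask |= (unsigned long) key;
	else
		/* legacy wake_up() with NULL key: f_op->poll() still needed */
		item->needs_poll = 1;

	return 1;
}

/* registration: init_waitqueue_func_entry(&item->wait, collector_wake); */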



> Kevent works slightly differently - it does not perform an additional
> readiness check (although it can, and it does in poll notifications); if an
> event is marked as ready, the thread parked in the waiting syscall is
> awakened and the event is copied to userspace.
> Also, the waiting syscall is awakened through one queue - the event is
> added and wake_up() is called - while in epoll() there are two queues.

The really ancient version of epoll (called /dev/epoll at that time) was 
doing a very similar thing. It was adding custom plugs all over the places 
where we wanted to get events from, and was collecting them w/out 
resorting to an extra f_op->poll(). Event masks went straight through an 
event buffer.
The reason why the current design of epoll was chosen was because:

o It did not require custom plugs all over the places
o It worked with the current kernel abstractions as-is (through f_op->poll)




- Davide



^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
  2007-02-27 19:08                               ` Evgeniy Polyakov
@ 2007-02-27 19:25                                 ` Jens Axboe
  0 siblings, 0 replies; 337+ messages in thread
From: Jens Axboe @ 2007-02-27 19:25 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Suparna Bhattacharya, Ingo Molnar, linux-kernel, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, David S. Miller, Davide Libenzi,
	Thomas Gleixner

On Tue, Feb 27 2007, Evgeniy Polyakov wrote:
> On Tue, Feb 27, 2007 at 07:45:41PM +0100, Jens Axboe (jens.axboe@oracle.com) wrote:
> > > Deadline shows this:
> > > 
> > > sync:
> > > READ: io=1,024MiB, aggrb=38,212KiB/s, minb=38,212KiB/s,
> > > maxb=38,212KiB/s, mint=28099msec, maxt=28099msec
> > > 
> > > libaio:
> > > READ: io=1,024MiB, aggrb=37,933KiB/s, minb=37,933KiB/s,
> > > maxb=37,933KiB/s, mint=28306msec, maxt=28306msec
> > > 
> > > syslet-rw:
> > > READ: io=1,024MiB, aggrb=34,759KiB/s, minb=34,759KiB/s,
> > > maxb=34,759KiB/s, mint=30891msec, maxt=30891msec
> > > 
> > > There were about 10k async schedulings.
> > 
> > I think the issue here is pretty simple - when fio gets a queue full
> > like condition (it reaches the depth you set, 32), it commits them and
> > starts queuing again. Since that'll likely block, it'll get issued by
> > another process. So you suddenly have a nice sequence of reads from one
> > process (pending, only one is actually committed since it's serialized),
> > and then a read further down the line that goes behind those you already
> > committed. The result is seeky, where it should have been sequential.
> > 
> > Do you get expected results if you set iodepth_low=1? That'll make fio
> > drain the queue before building it up again, which should get you a
> > sequential access pattern with syslets.
> 
> With such a change, results should be better - not only because the
> seeking is removed for a sequential read, but also because the number of
> working threads decreases over time - until the queue is filled again.

Yep, although it probably doesn't matter for such a low bandwidth test
anyway.

> So, syslet-rw has increased to 37mb/sec, compared to 39 for sync and 38
> for libaio; the latter two did not change.

I wonder why all three aren't doing 39mb/sec flat here, it's a pretty
trivial case...

> With an iodepth of 10k, I get the same performance for
> libaio and syslets - about 36mb/sec; it does not depend on iodepth_low
> being set to 1 or left at the default (full).

Yep, the larger the iodepth, the less costly a seek on new queue buildup
gets. So that is as expected.

> So syslets have a small problem at low iodepth - performance
> is about 34mb/sec and then increases to 36 as the iodepth
> grows, while libaio decreases from 38 down to 36 mb/sec.

Using your job file and fio HEAD (forces iodepth_low=1 for syslet if
iodepth_low isn't specified), I get:

Engine          Depth   Bw (kb/sec)
-----------------------------------
syslet              1      37163
syslet             32      37197
syslet          10000      36577
libaio              1      37144
libaio             32      37159
libaio          10000      36463
sync                1      37154

Results are highly stable. Note that this test case isn't totally fair,
since libaio isn't really async when you do buffered file IO.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-27 11:52                     ` Theodore Tso
  2007-02-27 12:11                       ` Evgeniy Polyakov
  2007-02-27 12:34                       ` Ingo Molnar
@ 2007-02-28  3:03                       ` Michael K. Edwards
  2007-02-28  8:02                         ` Evgeniy Polyakov
  2007-02-28 16:38                         ` [patch 00/13] Syslets, "Threadlets", generic AIO support, v3 Phillip Susi
  2 siblings, 2 replies; 337+ messages in thread
From: Michael K. Edwards @ 2007-02-28  3:03 UTC (permalink / raw)
  To: Theodore Tso, Evgeniy Polyakov, Ingo Molnar, Linus Torvalds,
	Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

On 2/27/07, Theodore Tso <tytso@mit.edu> wrote:
> I think what you are not hearing, and what everyone else is saying
> (INCLUDING Linus), is that for most programmers, state machines are
> much, much harder to program, understand, and debug compared to
> multi-threaded code.  You may disagree (were you a MacOS 9 programmer
> in another life?), and it may not even be true for you if you happen
> to be one of those folks more at home with Scheme continuations, for
> example.  But it is true that for most kernel programmers, threaded
> programming is much easier to understand, and we need to engineer the
> kernel for what will be maintainable for the majority of the kernel
> development community.

State machines are much harder to write without going through a real
on-paper design phase first.  But multi-threaded code is much harder
for a team of average working coders to write correctly, judging from
the numerous train wrecks that I've been called in to salvage over the
last ten years or so.

The typical 50-250KLoC multi-threaded C/C++/Java application, even if
it's been shipping to customers for several years, is littered with
locking constructs yet routinely corrupts shared data structures.
Change the number of threads in a pool, adjust the thread priorities,
or move a couple of lines of code around, and you're very likely to
get an unexplained deadlock.  God help you if you try to use a
debugger on it -- hundreds of latent race conditions will crop up that
didn't happen to trigger before because the thread didn't get
preempted there.

The only programming languages that I have seen widely used in US
industry (so Lisps and self-consciously functional languages are out)
in which mere mortals write passable multi-threaded applications are
Visual Basic and Python.  That's partly because programmers in these
languages are not in the habit of throwing pointers around; but if
that were all there was to it, Java programmers would be a lot more
successful than they are at actually writing threaded programs rather
than nibbling cautiously around the edges with EJB.  It also helps a
lot that strings are immutable; but again, Java shares this property.
No, the big difference is that VB and Python dicts and arrays are
thread-safed by the language runtime, and Java collections are not.
So while there may be all sorts of pointless and dangerous
mis-locking, it's "protecting" somewhat higher-level data structures.

What does this have to do with the kernel?  Well, if you're going to
create Yet Another Micro^WLightweight-Threading Construct for AIO, it
would be mighty nice not to be slinging bare pointers around until the
IO is actually complete and the kernel isn't going to be touching the
data buffer any more.  It would also be mighty nice to have a
thread-safe "request pool" data structure on which actions like bulk
cancellation and iteration over a subset can operate.  (The iterator
returned by, say, a three-sided query on a RCU priority queue may
contain _stale_ information, but never _inconsistent_ information.)

I recognize that this is more object-oriented snake oil than kernel
programmers usually tolerate, but it would really help AIO-threaded
programs suck less.  It is also very much in the Unix tradition --
what are file descriptors and fd_sets if not object-oriented design?
And if following the socket model was good enough for epoll and
netlink, why not for threadlet pools?

In the best of all possible worlds, AIO would look just like the good
old socket-bind-listen-accept model, except that I/O is transacted on
the "listen" socket as long as it can be serviced from cache, and
accept() only gets a new connection when a delayed I/O arrives.  The
object hiding inside the fd returned by socket() would be the
"threadlet pool", and the object hiding inside each fd returned by
accept() would be a threadlet.  Only this time you do it right and
make errno(fd) be a vsyscall that returns a threadlet-local error
state, and you assign reasonable semantics to operations on an fd that
has already encountered an exception.  Much like IEEE 754, actually.
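
A purely hypothetical userspace sketch of that model, to make it
concrete - none of aio_socket(), submit_read(), aio_accept() or
errno_of() exist anywhere, they are invented for the illustration:

	int pool = aio_socket();	/* the "threadlet pool" object */

	/* transact on the "listen" fd; completes inline when cached */
	submit_read(pool, file_fd, buf, count);

	for (;;) {
		/* a new fd appears only when a delayed I/O completes */
		int t = aio_accept(pool);
		if (t < 0)
			break;
		if (errno_of(t))	/* threadlet-local error state */
			handle_error(t);
		else
			consume_result(t);
		close(t);
	}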

Anyway, like I said, good threaded code is quite rare.  On the other
hand, I have seen plenty of reasonably successful event-loop
programming in C and C++, mostly in MacOS and Windows and PalmOS GUIs
where the OS deals with event queues and event handler registration.
It's not the world's most CPU-efficient strategy because of all those
function pointers and virtual methods, but those costs are dwarfed by
the GUI bounding boxes and repaints and things anyway.  More to the
point, writing an event-loop framework for other people to use
involves extensive APIs that are stable in the tiniest details and
extensively documented.  Not, perhaps, Plan A for the Linux kernel
community.  :-)

Happily, a largely event-driven userspace framework can easily be
stacked on top of a threaded kernel -- as long as they're the right
kind of threads.  The right kind of threads do not proliferate malloc
arenas by allowing preemption in mid-malloc.  (They may need to
malloc(), and that may be a preemption point relative to _real_
threads, but you shouldn't switch or cancel threadlets there.)  The
right kind of threads do not leak locking primitives when cancelled,
because they don't have to take a lock in order to update the right
kind of data structure.  The right kind of threads can use floating
point safely as long as they don't expect FPU state to be preserved
across a syscall.

The right kind of threads, in short, work like coroutines or
MacOS/PalmOS "event handlers", with the added convenience of being
able to write them as if they were normal sequential code, with normal
access to a persistent stack frame and to process globals.  And if you
do them right, they're cheap to migrate, easy and harmless to throttle
and cancel in bulk, and easy to punt out to an "I/O coprocessor" in
the future.  The key is to move data into and out of the "I/O
registers" at well-defined points and not to break the encapsulation
in between.  Delay the extraction of results from the "I/O registers"
as long as possible, and the hypothetical AIO coprocessor can go chew
on them in parallel with the main "integer" code flow, which only
stalls when it can't go any further without the I/O result.

If you've got some time to kill, you can even analyze an existing,
well-documented flavor of I/O strings (I like James Antill's Vstr
library myself) and define a set of "AIO opcodes" that manipulate AIO
fds and AIO strings as primitive types, just like the FPU manipulates
floating-point numbers as primitive types.  Pick a CPU architecture
with a good range of unused trap opcodes (PPC, for instance) and
actually move the I/O strings into kernel space, mapping the AIO
operations and the I/O string API onto the free trap opcodes (really
no different from syscalls, except it's easier to inspect the assembly
that the compiler produces and see what's going on).  For extra
credit, implement most of the AIO opcodes in Verilog, bolt them onto
the PPC core inside a Virtex4 FX FPGA, refine the semantics to permit
efficient pipelining, and collect your Turing award.  :-)

Cheers,
- Michael

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-28  3:03                       ` Michael K. Edwards
@ 2007-02-28  8:02                         ` Evgeniy Polyakov
  2007-02-28 17:01                           ` Michael K. Edwards
  2007-02-28 16:38                         ` [patch 00/13] Syslets, "Threadlets", generic AIO support, v3 Phillip Susi
  1 sibling, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-02-28  8:02 UTC (permalink / raw)
  To: Michael K. Edwards
  Cc: Theodore Tso, Ingo Molnar, Linus Torvalds, Ulrich Drepper,
	linux-kernel, Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	Alan Cox, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

On Tue, Feb 27, 2007 at 07:03:21PM -0800, Michael K. Edwards (medwards.linux@gmail.com) wrote:
> 
> State machines are much harder to write without going through a real
> on-paper design phase first.  But multi-threaded code is much harder
> for a team of average working coders to write correctly, judging from
> the numerous train wrecks that I've been called in to salvage over the
> last ten years or so.

130 lines skipped...

I have only one question - weren't you too lazy to write all that? :)

> Cheers,
> - Michael

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
  2007-02-27 12:39                       ` Suparna Bhattacharya
@ 2007-02-28  8:31                         ` Jens Axboe
  2007-02-28  8:38                           ` Ingo Molnar
  0 siblings, 1 reply; 337+ messages in thread
From: Jens Axboe @ 2007-02-28  8:31 UTC (permalink / raw)
  To: Suparna Bhattacharya
  Cc: Ingo Molnar, linux-kernel, Linus Torvalds, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Ulrich Drepper,
	Zach Brown, Evgeniy Polyakov, David S. Miller, Davide Libenzi,
	Thomas Gleixner

On Tue, Feb 27 2007, Suparna Bhattacharya wrote:
> > It's not bad for such a high depth/batch setting, but I still wonder why
> > the results are so different. I'll look around for an x86 box with some
> > TCQ/NCQ enabled storage attached for testing. Can you pass me your
> > command line or job file (whatever you use) so we are on the same page?
> 
> Sure - I used variations of the following job file (e.g. engine=syslet-rw,
> iodepth=20000).
> 
> Also the io scheduler on my system is set to Anticipatory by default.
> FWIW it is a 4 way SMP (PIII, 700MHz)
> 
> ; aio-stress -l -O -o3 <1GB file>
> [global]
> ioengine=libaio
> buffered=0
> rw=randread
> bs=64k
> size=1024m
> directory=/kdump/suparna
> 
> [testfile2]
> iodepth=64
> iodepth_batch=8

Ok, now that I can run this on more than x86, I gave it a spin on a box
with a little more potent storage. This is a core 2 quad; the disks are a
7200rpm SATA drive (with NCQ) and a 15krpm SCSI disk. The IO scheduler is
deadline.

SATA disk:

Engine          Depth      Batch	Bw (KiB/sec)
----------------------------------------------------
libaio		64	   8		17,486
syslet		64	   8		17,357
libaio		20000	   8		17,625
syslet		20000	   8		16,526
sync            1          1             7,529


SCSI disk:

Engine          Depth      Batch	Bw (KiB/sec)
----------------------------------------------------
libaio		64	   8		20,723
syslet		64	   8		20,742
libaio		20000	   8		21,125
syslet		20000	   8		19,610
sync            1          1            16,659


> > > Engine          Depth      Batch	Bw (KiB/sec)
> > > ----------------------------------------------------
> > > libaio		64	   default	17,429
> > > syslet		64	   default	16,155
> > > libaio		20000	   default	15,494
> > > syslet		20000	   default	7,971
> > >
> > If iodepth_batch isn't set, the syslet queued io will be serialized and
> 
> I see, so then this particular setting is not very meaningful

Not if you want to take advantage of hw queuing, as in this random
workload. fio being a test tool, it's important to be able to control as
many aspects of what happens as possible. That means you can also do
things that you do not want to do in real life; having a pending list of
20000 serialized requests is indeed one of them. It also means you
pretty much have to know what you are doing when testing little details
like this.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
  2007-02-28  8:31                         ` Jens Axboe
@ 2007-02-28  8:38                           ` Ingo Molnar
  2007-02-28  9:07                             ` Jens Axboe
  0 siblings, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-28  8:38 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Suparna Bhattacharya, linux-kernel, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, Evgeniy Polyakov, David S. Miller,
	Davide Libenzi, Thomas Gleixner


* Jens Axboe <jens.axboe@oracle.com> wrote:

> Engine                Depth      Batch	Bw (KiB/sec)
> libaio		20000	   8		21,125
> syslet		20000	   8		19,610

i'd like to do something about this, to get more in line with libaio - 
if nothing else then for the bragging rights ;-) It seems to me that a 
drop of ~7% in throughput cannot be explained by any CPU overhead; it 
must be some sort of queueing + IO scheduling effect - right?

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
  2007-02-28  8:38                           ` Ingo Molnar
@ 2007-02-28  9:07                             ` Jens Axboe
  0 siblings, 0 replies; 337+ messages in thread
From: Jens Axboe @ 2007-02-28  9:07 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Suparna Bhattacharya, linux-kernel, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Ulrich Drepper, Zach Brown, Evgeniy Polyakov, David S. Miller,
	Davide Libenzi, Thomas Gleixner

On Wed, Feb 28 2007, Ingo Molnar wrote:
> 
> * Jens Axboe <jens.axboe@oracle.com> wrote:
> 
> > Engine                Depth      Batch	Bw (KiB/sec)
> > libaio		20000	   8		21,125
> > syslet		20000	   8		19,610
> 
> i'd like to do something about this, to get more in line with libaio - 
> if nothing else then for the bragging rights ;-) It seems to me that a 
> drop of ~7% in throughput cannot be explained by any CPU overhead; it 
> must be some sort of queueing + IO scheduling effect - right?

syslet shows a slightly higher overhead, but nothing that will account
for any bandwidth change in this test. The box is obviously mostly idle
when running this test, it's not very CPU consuming. The IO pattern
issued is not the same, since libaio would commit IO [0..7], then
[8..15] and so on, where syslet would expose [0,8,16,24,32,40,48,56] and
then [1,9,17,25,33,41,49,57] etc. If iodepth_batch is set to 1 you'd
get a closer match wrt the io pattern, but at a higher cost (increased
system calls, and 8 times as many pending async threads). That gets it
to 20,253KiB/s here, with ~1000x as many context switches.
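
A two-line sketch reproducing those two orderings, for 64 queued blocks
and a batch of 8 (just illustrating the numbers quoted above):

	int i, block_libaio[64], block_syslet[64];

	/* libaio commits in submission order, batched [0..7], [8..15], ... */
	for (i = 0; i < 64; i++)
		block_libaio[i] = i;

	/* syslet: 8 async contexts each end up walking every 8th block */
	for (i = 0; i < 64; i++)
		block_syslet[i] = (i % 8) * 8 + i / 8;	/* 0,8,...,56, then 1,9,... */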

So in short, it's harder to compare with real storage, as access
patterns don't translate very easily.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-22 19:38         ` Davide Libenzi
@ 2007-02-28  9:45           ` Ingo Molnar
  2007-02-28 16:17             ` Davide Libenzi
  0 siblings, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-28  9:45 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Ulrich Drepper, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner


* Davide Libenzi <davidel@xmailserver.org> wrote:

> > my current thinking is that special-purpose (non-programmable, 
> > static) APIs like aio_*() and lio_*(), where every last cycle of 
> > performance matters, should be implemented using syslets - even if 
> > it is quite tricky to write syslets (which they no doubt are - just 
> > compare the size of syslet-test.c to threadlet-test.c). So i'd move 
> > syslets into the same category as raw syscalls: pieces of the raw 
> > infrastructure between the kernel and glibc, not an exposed API to 
> > apps. [and even if we keep them in that category they still need 
> > quite a bit of API work, to clean up the 32/64-bit issues, etc.]
> 
> Why can't aio_* be implemented with *simple* (or parallel/unrelated) 
> syscall submit w/out the burden of a complex, limiting and heavy API

there are so many variants of what people think 'asynchronous IO' should 
look like - i'd not like to limit them. I agree that once a particular 
syslet script becomes really popular, it might (and should) in fact be 
pushed into a separate system call.

But i also agree that a one-shot-syscall sys_async() syscall could be 
done too - for those uses where only a single system call is needed and 
where the fetching of a single uatom would be small but nevertheless 
unnecessary overhead. A one-shot async syscall needs to get /8/ 
parameters (the syscall nr is the seventh parameter and the return code 
of the nested syscall is the eighth). So at least two parameters will 
have to be passed in indirectly and validated, and 32/64-bit compat 
conversions added, etc. anyway!

The copy_uatom() assembly code i did is really fast so i doubt there 
would be much measurable performance difference between the two 
solutions. Plus, putting the uatom into user memory allows the caching 
of uatoms - further diluting the advantage of passing in the values per 
register. The whole difference should be on the order of 10 cycles, so 
this really isnt a high prio item in my view.

> Now that chains of syscalls can be way more easily handled with 
> clets^wthreadlets, why would we need the whole syslets crud inside?

no, threadlets dont really solve the basic issues of people wanting to 
'combine' syscalls, to avoid the syscall entry overhead (even if that is 
small), and to rely on kthread->kthread context switching, which is even 
faster than uthread->uthread context-switching, etc. Furthermore, syslets 
dont really cause any new problems. They are almost totally orthogonal, 
isolated, and cause no wide infrastructure needs.

as long as syslets remain a syscall-level API, for the measured use of 
the likes of glibc and libaio (and not exposed in a programmable manner 
to user-space), i see no big problem with them at all. They can also be 
used without them having any classic pthread user-state (without linking 
to libpthread). Think of it like the raw use of clone(): possible and 
useful in some cases, but not something that a typical application would 
do. This is a 'raw syscall plugins' thing, to be used by those 
user-space entities that use raw syscalls: infrastructure libraries. Raw 
syscalls themselves are tied to the platform, are not easily used in 
some cases, thus almost no application uses them directly, but uses the 
generic functions glibc exposes.

in the long run, sys_syslet_exec(), were it not to establish itself as a 
widely used interface, could be implemented purely from user-space too 
(say from the VDSO, at much worse performance, but the kernel would stay 
backwards compatible with the syscall), so there's almost no risk here. 
You dont like it => dont use it. Meanwhile, i'll happily take any 
suggestion to make the syslet API more digestable.

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-27 12:11                       ` Evgeniy Polyakov
  2007-02-27 12:13                         ` Ingo Molnar
@ 2007-02-28 16:14                         ` Pavel Machek
  2007-03-01  8:18                           ` Evgeniy Polyakov
  1 sibling, 1 reply; 337+ messages in thread
From: Pavel Machek @ 2007-02-28 16:14 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Theodore Tso, Ingo Molnar, Linus Torvalds, Ulrich Drepper,
	linux-kernel, Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	Alan Cox, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

Hi!

> > I think what you are not hearing, and what everyone else is saying
> > (INCLUDING Linus), is that for most programmers, state machines are
> > much, much harder to program, understand, and debug compared to
> > multi-threaded code.  You may disagree (were you a MacOS 9 programmer
> > in another life?), and it may not even be true for you if you happen
> > to be one of those folks more at home with Scheme continuations, for
> > example.  But it is true that for most kernel programmers, threaded
> > programming is much easier to understand, and we need to engineer the
> > kernel for what will be maintainable for the majority of the kernel
> > development community.
> 
> I understand that - and I totally agree.
> But when more complex, more bug-prone code results in higher performance,
> it must be used. We have linked lists and binary trees - the latter

No-o. The kernel is not designed like that.

Often, more complex and slightly faster code exists, and we simply use
the slower variant, because it is fast enough.

A 10% gain in speed is NOT worth a major complexity increase.
							Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-28  9:45           ` Ingo Molnar
@ 2007-02-28 16:17             ` Davide Libenzi
  2007-02-28 16:42               ` Linus Torvalds
  2007-02-28 20:21               ` Ingo Molnar
  0 siblings, 2 replies; 337+ messages in thread
From: Davide Libenzi @ 2007-02-28 16:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ulrich Drepper, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Wed, 28 Feb 2007, Ingo Molnar wrote:

> 
> * Davide Libenzi <davidel@xmailserver.org> wrote:
> 
> > Why can't aio_* be implemented with *simple* (or parallel/unrelated) 
> > syscall submit w/out the burden of a complex, limiting and heavy API
> 
> there are so many variants of what people think 'asynchronous IO' should 
> look like - i'd not like to limit them. I agree that once a particular 
> syslet script becomes really popular, it might (and should) in fact be 
> pushed into a separate system call.
> 
> But i also agree that a one-shot-syscall sys_async() syscall could be 
> done too - for those uses where only a single system call is needed and 
> where the fetching of a single uatom would be small but nevertheless 
> unnecessary overhead. A one-shot async syscall needs to get /8/ 
> parameters (the syscall nr is the seventh parameter and the return code 
> of the nested syscall is the eighth). So at least two parameters will 
> have to be passed in indirectly and validated, and 32/64-bit compat 
> conversions added, etc. anyway!

At this point, given how threadlets can be easily/effectively dispatched 
from userspace, I'd argue against the presence of either single/parallel or 
syslet submission altogether. Threadlets allow you to code chains *way* more 
naturally than syslets, and since they basically are like function calls 
in the fast path, they can be used even for single/parallel submissions. 
No compat code required (ok, besides the trivial async_wait).
My point is, the syslet infrastructure is expensive for the kernel in 
terms of compat and of the extra code added to handle the cond/jumps/etc. It 
is also non-trivial to use from userspace. Are the performance advantages 
big enough to justify its existence? I doubt that the price of a 
sysenter is a lot bigger than an atom decoding, but I'm looking forward to 
being proven wrong by real life performance numbers ;)



- Davide



^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-28  3:03                       ` Michael K. Edwards
  2007-02-28  8:02                         ` Evgeniy Polyakov
@ 2007-02-28 16:38                         ` Phillip Susi
  1 sibling, 0 replies; 337+ messages in thread
From: Phillip Susi @ 2007-02-28 16:38 UTC (permalink / raw)
  To: Michael K. Edwards
  Cc: Theodore Tso, Evgeniy Polyakov, Ingo Molnar, Linus Torvalds,
	Ulrich Drepper, linux-kernel, Arjan.van.de.Ven

Michael K. Edwards wrote:
> State machines are much harder to write without going through a real
> on-paper design phase first.  But multi-threaded code is much harder
> for a team of average working coders to write correctly, judging from
> the numerous train wrecks that I've been called in to salvage over the
> last ten years or so.

I have to agree; state machines are harder to design and read, but 
multithreaded programs are harder to write and debug _correctly_.

Another way of putting it is that the threadlet approach is easier to 
do, but harder to do _right_.


^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-28 16:17             ` Davide Libenzi
@ 2007-02-28 16:42               ` Linus Torvalds
  2007-02-28 17:26                 ` Ingo Molnar
                                   ` (2 more replies)
  2007-02-28 20:21               ` Ingo Molnar
  1 sibling, 3 replies; 337+ messages in thread
From: Linus Torvalds @ 2007-02-28 16:42 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Ingo Molnar, Ulrich Drepper, Linux Kernel Mailing List,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner



On Wed, 28 Feb 2007, Davide Libenzi wrote:
> 
> At this point, given how threadlets can be easily/effectively dispatched 
> from userspace, I'd argue against the presence of either single/parallel or 
> syslet submission altogether. Threadlets allow you to code chains *way* more 
> naturally than syslets, and since they basically are like function calls 
> in the fast path, they can be used even for single/parallel submissions. 

Well, I agree, except for one thing:
 - user space execution is *inherently* more expensive.

Why? Stack. Stack. Stack.

If you support threadlets with user space code, it means that you need a 
separate user-space stack for each threadlet. That's a potentially *big* 
cost to bear, both from a setup standpoint and from simply a memory 
allocation standpoint.

Quite frankly, I think threadlets are a great idea, but I think the lack 
of user-level footprint is *also* a great idea, and you should support 
both.

In short - the only thing I *don't* think is a great idea are those linked 
lists of atoms. I still think it's a pretty horrible interface, and I 
still don't think it really buys us very much. The only way it would buy 
us a lot is to change the linked lists dynamically (ie add new events at 
the end while old events are still executing), but quite frankly, that 
just makes the whole interface *even*worse* and just makes me have 
debugging nightmares (I'm also not even convinced it really would help 
us: we might avoid some costs of adding new events, but it would only 
avoid them for serial execution, and if the whole point of this is to 
execute things in parallel, that's a stupid thing to do).

So I would repeat my call for getting rid of the atoms, and instead just 
do a "single submission" at a time. Do the linking by running a threadlet 
that has user space code (and the stack overhead), which is MUCH more 
flexible. And do nonlinked single system calls without *either* atoms *or* 
a user-space stack footprint.

Please? 

What am I missing?

		Linus

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-28  8:02                         ` Evgeniy Polyakov
@ 2007-02-28 17:01                           ` Michael K. Edwards
  2007-03-05  2:16                             ` Discussing LKML community [OT from the Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3] Oleg Verych
  0 siblings, 1 reply; 337+ messages in thread
From: Michael K. Edwards @ 2007-02-28 17:01 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Theodore Tso, Ingo Molnar, Linus Torvalds, Ulrich Drepper,
	linux-kernel, Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	Alan Cox, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

On 2/28/07, Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> 130 lines skipped...

Yeah, I edited it down a lot before sending it.  :-)

> I have only one question - weren't you too lazy to write all that? :)

I'm pretty lazy all right.  But occasionally an interesting problem
(and revamping AIO is very interesting) makes me think, and what
little thinking I do is always accompanied by writing.  Once I've
thought something through to the point that I think I understand the
problem, I've even been known to attempt a solution.  Not always,
though; more often, I find a new interesting problem, or else I am
forcibly reminded that I should be spending my little store of insight
on revenue-producing activity.

In this instance, there didn't seem to be any harm in sending my
thoughts to LKML as I wrote them, on the off chance that Ingo or
Davide would get some value out of them in this design cycle (which
any code I eventually get around to producing will miss).  So far,
I've gotten some rather dismissive pushback from Ingo and Alan (who
seem to have no interest outside x86 and less understanding than I
would have thought of what real userspace code looks like), a "why
preach to people who know more than you do" from Davide, a brief aside
on the dominance of x86 from Oleg, and one off-list "keep up the good
work".  Not a very rich harvest from (IMHO) pretty good seeds.

In short, so far the "Linux kernel community" is upholding its
reputation for insularity, arrogance, coding without prior design,
lack of interest in userspace problems, and inability to learn from
the mistakes of others.  (None of these characterizations depends on
there being any real insight in anything I have written.)  Linus
himself has a very different reputation -- plenty of arrogance all
right, but genuine brilliance and hard work, and sincere (if cranky)
efforts to explain the "theory of operations" underlying central
design choices.  So far he hasn't commented directly on anything I
have had to say; it will be interesting to see whether he tells me to
stop annoying the pros and to go away until I have some code to
contribute.

Happy hacking,
- Michael

P. S.  I do think "threadlets" are brilliant, though, and reading
Ingo's patches gave me a much better idea of what would be involved in
prototyping Asynchronously Executed I/O Unit opcodes.

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-28 16:42               ` Linus Torvalds
@ 2007-02-28 17:26                 ` Ingo Molnar
  2007-02-28 18:22                 ` Davide Libenzi
  2007-02-28 23:12                 ` Ingo Molnar
  2 siblings, 0 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-02-28 17:26 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Davide Libenzi, Ulrich Drepper, Linux Kernel Mailing List,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> [...] The only way it would buy us a lot is to change the linked lists 
> dynamically (ie add new events at the end while old events are still 
> executing), [...]

that's quite close to what Jens' FIO plugin for syslets 
(engines/syslet-rw.c) does currently: it builds lists of syslets as IO 
gets submitted, batches them up for some time and then sends them off. 
It is a natural next step to do this for in-flight syslets as well.

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-28 16:42               ` Linus Torvalds
  2007-02-28 17:26                 ` Ingo Molnar
@ 2007-02-28 18:22                 ` Davide Libenzi
  2007-02-28 18:42                   ` Linus Torvalds
  2007-02-28 23:12                 ` Ingo Molnar
  2 siblings, 1 reply; 337+ messages in thread
From: Davide Libenzi @ 2007-02-28 18:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Ulrich Drepper, Linux Kernel Mailing List,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Wed, 28 Feb 2007, Linus Torvalds wrote:

> On Wed, 28 Feb 2007, Davide Libenzi wrote:
> > 
> > At this point, given how threadlets can be easily/effectively dispatched 
> > from userspace, I'd argue against the presence of either single/parallel or 
> > syslet submission altogether. Threadlets allow you to code chains *way* more 
> > naturally than syslets, and since they basically are like function calls 
> > in the fast path, they can be used even for single/parallel submissions. 
> 
> Well, I agree, except for one thing:
>  - user space execution is *inherently* more expensive.
> 
> Why? Stack. Stack. Stack.
> 
> If you support threadlets with user space code, it means that you need a 
> separate user-space stack for each threadlet. That's a potentially *big* 
> cost to bear, both from a setup standpoint and from simply a memory 
> allocation standpoint.

Right, point taken.



> In short - the only thing I *don't* think is a great idea are those linked 
> lists of atoms. I still think it's a pretty horrible interface, and I 
> still don't think it really buys us very much. The only way it would buy 
> us a lot is to change the linked lists dynamically (ie add new events at 
> the end while old events are still executing), but quite frankly, that 
> just makes the whole interface *even*worse* and just makes me have 
> debugging nightmares (I'm also not even convinced it really would help 
> us: we might avoid some costs of adding new events, but it would only 
> avoid them for serial execution, and if the whole point of this is to 
> execute things in parallel, that's a stupid thing to do).
> 
> So I would repeat my call for getting rid of the atoms, and instead just 
> do a "single submission" at a time. Do the linking by running a threadlet 
> that has user space code (and the stack overhead), which is MUCH more 
> flexible. And do nonlinked single system calls without *either* atoms *or* 
> a user-space stack footprint.

Here we very much agree. The way I'd like it:

struct async_syscall {
	unsigned long nr_sysc;
	unsigned long params[8];
	long result;
};

int async_exec(struct async_syscall *a, int n);

or:

int async_exec(struct async_syscall **a, int n);

At this point I'm ok even with the userspace ring buffer, returning 
back pointers to "struct async_syscall".




- Davide



^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-28 18:22                 ` Davide Libenzi
@ 2007-02-28 18:42                   ` Linus Torvalds
  2007-02-28 18:50                     ` Davide Libenzi
  0 siblings, 1 reply; 337+ messages in thread
From: Linus Torvalds @ 2007-02-28 18:42 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Ingo Molnar, Ulrich Drepper, Linux Kernel Mailing List,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner



On Wed, 28 Feb 2007, Davide Libenzi wrote:
> 
> Here we very much agree. The way I'd like it:
> 
> struct async_syscall {
> 	unsigned long nr_sysc;
> 	unsigned long params[8];
> 	long result;
> };

No, the "result" needs to go somewhere else. The caller may be totally 
uninterested in keeping the system call number or parameters around until 
the operation completes, but if you put them in the same structure with 
the result, you obviously cannot sanely get rid of them.

I also don't much like read-write interfaces (which the above would be: 
the kernel would read most of the structure, and then write one member of 
the structure). 

It's entirely possible, for example, that the operation we submit is some 
legacy "aio_read()", which has some other structure layout than the new 
one (but one field will be the result code).

		Linus

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-28 18:42                   ` Linus Torvalds
@ 2007-02-28 18:50                     ` Davide Libenzi
  2007-02-28 19:03                       ` Chris Friesen
  0 siblings, 1 reply; 337+ messages in thread
From: Davide Libenzi @ 2007-02-28 18:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Ulrich Drepper, Linux Kernel Mailing List,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Wed, 28 Feb 2007, Linus Torvalds wrote:

> On Wed, 28 Feb 2007, Davide Libenzi wrote:
> > 
> > Here we very much agree. The way I'd like it:
> > 
> > struct async_syscall {
> > 	unsigned long nr_sysc;
> > 	unsigned long params[8];
> > 	long result;
> > };
> 
> No, the "result" needs to go somewhere else. The caller may be totally 
> uninterested in keeping the system call number or parameters around until 
> the operation completes, but if you put them in the same structure with 
> the result, you obviously cannot sanely get rid of them.
> 
> I also don't much like read-write interfaces (which the above would be: 
> the kernel would read most of the structure, and then write one member of 
> the structure). 
> 
> It's entirely possible, for example, that the operation we submit is some 
> legacy "aio_read()", which has some other structure layout than the new 
> one (but one field will be the result code).

Ok, makes sense. Something like this then?

struct async_syscall {
	unsigned long nr_sysc;
	unsigned long params[8];
	long *result;
};

And what would async_wait() return back? Pointers to "struct async_syscall"
or pointers to "result"?



- Davide



^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-28 18:50                     ` Davide Libenzi
@ 2007-02-28 19:03                       ` Chris Friesen
  2007-02-28 19:42                         ` Davide Libenzi
  0 siblings, 1 reply; 337+ messages in thread
From: Chris Friesen @ 2007-02-28 19:03 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Linus Torvalds, Ingo Molnar, Ulrich Drepper,
	Linux Kernel Mailing List, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Zach Brown, Evgeniy Polyakov,
	David S. Miller, Suparna Bhattacharya, Jens Axboe,
	Thomas Gleixner

Davide Libenzi wrote:

> struct async_syscall {
> 	unsigned long nr_sysc;
> 	unsigned long params[8];
> 	long *result;
> };
> 
> And what would async_wait() return back? Pointers to "struct async_syscall"
> or pointers to "result"?

Either one has downsides.  Pointer to struct async_syscall requires that 
the caller keep the struct around.  Pointer to result requires that the 
caller always reserve a location for the result.

Does the kernel care about the (possibly rare) case of callers that 
don't want to pay attention to result?  If so, what about adding some 
kind of caller-specified handle to struct async_syscall, and having 
async_wait() return the handle?  In the case where the caller does care 
about the result, the handle could just be the address of result.

Chris




^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-28 19:03                       ` Chris Friesen
@ 2007-02-28 19:42                         ` Davide Libenzi
  2007-03-01  8:38                           ` Evgeniy Polyakov
  0 siblings, 1 reply; 337+ messages in thread
From: Davide Libenzi @ 2007-02-28 19:42 UTC (permalink / raw)
  To: Chris Friesen
  Cc: Linus Torvalds, Ingo Molnar, Ulrich Drepper,
	Linux Kernel Mailing List, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Zach Brown, Evgeniy Polyakov,
	David S. Miller, Suparna Bhattacharya, Jens Axboe,
	Thomas Gleixner

On Wed, 28 Feb 2007, Chris Friesen wrote:

> Davide Libenzi wrote:
> 
> > struct async_syscall {
> > 	unsigned long nr_sysc;
> > 	unsigned long params[8];
> > 	long *result;
> > };
> > 
> > And what would async_wait() return back? Pointers to "struct async_syscall"
> > or pointers to "result"?
> 
> Either one has downsides.  Pointer to struct async_syscall requires that the
> caller keep the struct around.  Pointer to result requires that the caller
> always reserve a location for the result.
> 
> Does the kernel care about the (possibly rare) case of callers that don't want
> to pay attention to result?  If so, what about adding some kind of
> caller-specified handle to struct async_syscall, and having async_wait()
> return the handle?  In the case where the caller does care about the result,
> the handle could just be the address of result.

Something like this (with async_wait() returning asynid's)?

struct async_syscall {
	long *result;
	unsigned long asynid;
	unsigned long nr_sysc;
	unsigned long params[8];
};
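
The flow would then look something like this - illustrative only, since
the async_exec()/async_wait() signatures are still up in the air, and
use_result() is just a placeholder:

long res;
struct async_syscall req = {
	.result  = &res,
	.asynid  = 42,			/* caller-chosen handle */
	.nr_sysc = __NR_read,
	.params  = { fd, (unsigned long) buf, count },
};

async_exec(&req, 1);			/* submit and return immediately */

unsigned long done[16];
int i, n = async_wait(done, 16);	/* collect completed asynid's */

for (i = 0; i < n; i++)
	if (done[i] == 42)
		use_result(res);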



- Davide



^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-28 16:17             ` Davide Libenzi
  2007-02-28 16:42               ` Linus Torvalds
@ 2007-02-28 20:21               ` Ingo Molnar
  2007-02-28 21:09                 ` Davide Libenzi
  1 sibling, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-28 20:21 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Ulrich Drepper, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner


* Davide Libenzi <davidel@xmailserver.org> wrote:

> My point is, the syslet infrastructure is expensive for the kernel in 
> terms of compat, [...]

it is not. Today i've implemented 64-bit syslets on x86_64 and 
32-bit-on-64-bit compat syslets. Both the 64-bit and the 32-bit syslet 
(and threadlet) binaries work just fine on a 64-bit kernel, and they 
share 99% of the infrastructure. There's only a single #ifdef 
CONFIG_COMPAT in kernel/async.c:

#ifdef CONFIG_COMPAT

asmlinkage struct syslet_uatom __user *
compat_sys_async_exec(struct syslet_uatom __user *uatom,
                      struct async_head_user __user *ahu)
{
        return __sys_async_exec(uatom, ahu, &compat_sys_call_table,
                                compat_NR_syscalls);
}

#endif

Even mixed-mode syslets should work (although i havent specifically 
tested them), where the head switches between 64-bit and 32-bit mode and 
submits syslets from both 64-bit and from 32-bit mode, and at the same 
time there might be both 64-bit and 32-bit syslets 'in flight'.

But i'm happy to change the syslet API in any sane way, and did so based 
on feedback from Jens who is actually using them.

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-28 20:21               ` Ingo Molnar
@ 2007-02-28 21:09                 ` Davide Libenzi
  2007-02-28 21:23                   ` Ingo Molnar
  0 siblings, 1 reply; 337+ messages in thread
From: Davide Libenzi @ 2007-02-28 21:09 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ulrich Drepper, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Wed, 28 Feb 2007, Ingo Molnar wrote:

> 
> * Davide Libenzi <davidel@xmailserver.org> wrote:
> 
> > My point is, the syslet infrastructure is expensive for the kernel in 
> > terms of compat, [...]
> 
> it is not. Today i've implemented 64-bit syslets on x86_64 and 
> 32-bit-on-64-bit compat syslets. Both the 64-bit and the 32-bit syslet 
> (and threadlet) binaries work just fine on a 64-bit kernel, and they 
> share 99% of the infrastructure. There's only a single #ifdef 
> CONFIG_COMPAT in kernel/async.c:
> 
> #ifdef CONFIG_COMPAT
> 
> asmlinkage struct syslet_uatom __user *
> compat_sys_async_exec(struct syslet_uatom __user *uatom,
>                       struct async_head_user __user *ahu)
> {
>         return __sys_async_exec(uatom, ahu, &compat_sys_call_table,
>                                 compat_NR_syscalls);
> }
> 
> #endif

Did you hide all the complexity of the userspace atom decoding inside 
another function? :)
How much code would go away if we picked a simple/parallel 
sys_async_exec engine? The atom decoding, the special userspace variable 
access for loops, the jumps/cond/... VM engine.



> Even mixed-mode syslets should work (although i havent specifically 
> tested them), where the head switches between 64-bit and 32-bit mode and 
> submits syslets from both 64-bit and from 32-bit mode, and at the same 
> time there might be both 64-bit and 32-bit syslets 'in flight'.
> 
> But i'm happy to change the syslet API in any sane way, and did so based 
> on feedback from Jens who is actually using them.

Wouldn't you agree on a simple/parallel execution engine like the one 
Linus and I are proposing (plus threadlets, of course)?



- Davide



^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-28 21:09                 ` Davide Libenzi
@ 2007-02-28 21:23                   ` Ingo Molnar
  2007-02-28 21:46                     ` Davide Libenzi
  0 siblings, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-28 21:23 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Ulrich Drepper, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner


* Davide Libenzi <davidel@xmailserver.org> wrote:

> On Wed, 28 Feb 2007, Ingo Molnar wrote:
> 
> > 
> > * Davide Libenzi <davidel@xmailserver.org> wrote:
> > 
> > > My point is, the syslet infrastructure is expensive for the kernel in 
> > > terms of compat, [...]
> > 
> > it is not. Today i've implemented 64-bit syslets on x86_64 and 
> > 32-bit-on-64-bit compat syslets. Both the 64-bit and the 32-bit syslet 
> > (and threadlet) binaries work just fine on a 64-bit kernel, and they 
> > share 99% of the infrastructure. There's only a single #ifdef 
> > CONFIG_COMPAT in kernel/async.c:
> > 
> > #ifdef CONFIG_COMPAT
> > 
> > asmlinkage struct syslet_uatom __user *
> > compat_sys_async_exec(struct syslet_uatom __user *uatom,
> >                       struct async_head_user __user *ahu)
> > {
> >         return __sys_async_exec(uatom, ahu, &compat_sys_call_table,
> >                                 compat_NR_syscalls);
> > }
> > 
> > #endif
> 
> Did you hide all the complexity of the userspace atom decoding inside 
> another function? :)

no, i made the 64-bit and 32-bit structures layout-compatible. This 
makes the 32-bit structure as large as the 64-bit ones, but that's not a 
big issue, compared to the simplifications it brings.

> > But i'm happy to change the syslet API in any sane way, and did so 
> > based on feedback from Jens who is actually using them.
> 
> Wouldn't you agree on a simple/parallel execution engine [...]

the thing is, there's almost zero overhead from having those basic 
things like conditions and the ->next link, and they make it so much 
more capable. As usual my biggest problem is that you are not trying to 
use syslets at all - you are only trying to get rid of them ;-) My 
purpose with syslets is to enable a syslet to do almost anything that 
user-space could do too, as simply as possible. Syslets could even 
allocate user-space memory and then use it (i dont think we actually 
want to do that though). That doesnt mean arbitrarily complex code 
/should/ be done via syslets, or that it wont be significantly slower 
than what user-space can do, but i'd not like to artificially dumb the 
engine down. I'm totally willing to simplify/shrink the vectoring of 
arguments and just about anything else, but your proposals so far (such 
as your return-value-embedded-in-atom suggestion) all kill important 
aspects of the engine.

All the existing syslet features were purpose-driven: i actually coded 
up a sample syslet, trying to do something that makes sense, and added 
these features based on that. The engine core takes up maybe 50 lines of 
code.

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-28 21:23                   ` Ingo Molnar
@ 2007-02-28 21:46                     ` Davide Libenzi
  2007-02-28 22:22                       ` Ingo Molnar
  0 siblings, 1 reply; 337+ messages in thread
From: Davide Libenzi @ 2007-02-28 21:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ulrich Drepper, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Wed, 28 Feb 2007, Ingo Molnar wrote:

> * Davide Libenzi <davidel@xmailserver.org> wrote:
> 
> > Did you hide all the complexity of the userspace atom decoding inside 
> > another function? :)
> 
> no, i made the 64-bit and 32-bit structures layout-compatible. This 
> makes the 32-bit structure as large as the 64-bit ones, but that's not a 
> big issue, compared to the simplifications it brings.

Do you have a new version to review?



> > > But i'm happy to change the syslet API in any sane way, and did so 
> > > based on feedback from Jens who is actually using them.
> > 
> > Wouldn't you agree on a simple/parallel execution engine [...]
> 
> the thing is, there's almost zero overhead from having those basic 
> things like conditions and the ->next link, and they make it so much 
> more capable. As usual my biggest problem is that you are not trying to 
> use syslets at all - you are only trying to get rid of them ;-) My 
> purpose with syslets is to enable a syslet to do almost anything that 
> user-space could do too, as simply as possible. Syslets could even 
> allocate user-space memory and then use it (i don't think we actually 
> want to do that though). That doesn't mean arbitrarily complex code 
> /should/ be done via syslets, or that it won't be significantly slower 
> than what user-space can do, but i'd not like to artificially dumb the 
> engine down. I'm totally willing to simplify/shrink the vectoring of 
> arguments and just about anything else, but your proposals so far (such 
> as your return-value-embedded-in-atom suggestion) all kill important 
> aspects of the engine.

Ok, we're past the error code in the atom, as Linus pointed out ;)
How about this, with async_wait returning asynid's back to a userspace 
ring buffer?

struct syslet_uatom {
        long *result;
        unsigned long asynid;
        unsigned long nr_sysc;
        unsigned long params[8];
};
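
purely as illustration (async_wait() and the ring semantics are only a 
proposal here, so every name is hypothetical), the consumer side could 
then look like:

	unsigned long ids[64];
	int i, n;

	/* block until something completes; the kernel deposits up to
	 * 64 asynid's into the caller-supplied buffer/ring */
	n = async_wait(ids, 64);
	for (i = 0; i < n; i++)
		handle_completion(ids[i]);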

My problem with the syslets in their current form is, do we have a real 
use for them that justifies the extra complexity inside the kernel? Or 
can we, with a simple/parallel async submission, coupled with 
threadlets, cover a pretty broad range of real-life use cases?



- Davide



^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-28 21:46                     ` Davide Libenzi
@ 2007-02-28 22:22                       ` Ingo Molnar
  2007-02-28 22:47                         ` Davide Libenzi
  0 siblings, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-02-28 22:22 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Ulrich Drepper, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner


* Davide Libenzi <davidel@xmailserver.org> wrote:

> On Wed, 28 Feb 2007, Ingo Molnar wrote:
> 
> > * Davide Libenzi <davidel@xmailserver.org> wrote:
> > 
> > > Did you hide all the complexity of the userspace atom decoding inside 
> > > another function? :)
> > 
> > no, i made the 64-bit and 32-bit structures layout-compatible. This 
> > makes the 32-bit structure as large as the 64-bit ones, but that's not a 
> > big issue, compared to the simplifications it brings.
> 
> Do you have a new version to review?

yep, i've just released -v5.

> How about this, with async_wait returning asynid's back to a userspace 
> ring buffer?
> 
> struct syslet_uatom {
>         long *result;
>         unsigned long asynid;
>         unsigned long nr_sysc;
>         unsigned long params[8];
> };

we talked about the parameters at length: if they are pointers the 
layout is significantly more flexible and more capable. It's a pretty 
similar argument to the return-pointer thing. For example take a look at 
how the IO syslet atoms in Jens' FIO engine share the same fd. Even if 
there's 20000 of them. And they are fully cacheable in constructed 
state. The same goes for the webserving examples i've got in the 
async-test userspace sample code. I can pick up a cached request and 
only update req->fd, i don't have to reinit the atoms at all. It stays 
nicely in the cache, is not re-dirtied, etc.

furthermore, having the parameters as pointers is also an optimization: 
look at the copy_uatom() x86 assembly code i did - it can do a simple 
jump out of the parameter fetching code. I actually tried /both/ of 
these variants in assembly (as i mentioned in a previous reply, in 
the v1 thread) and the speed difference between a pointer and 
non-pointer variant was negligible. (even with 6 parameters filled in)
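
in rough C terms (the real copy_uatom() is hand-written x86 assembly, 
so this only sketches its logic): a NULL argument slot means 'no more 
parameters', so the fetch loop can jump out early:

	unsigned long args[6];
	int i;

	for (i = 0; i < 6; i++) {
		unsigned long __user *p = atom->arg_ptr[i];

		if (!p)
			break;			/* the early jump-out */
		if (get_user(args[i], p))
			return -EFAULT;
	}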

but yes ... another two small changes and your layout will be 
awfully similar to the current uatom layout =B-)

> My problem with the syslets in their current form is, do we have a 
> real use for them that justifies the extra complexity inside the kernel?

i call bullshit. really. I have just gone out and wasted some time 
cutting & pasting all the syslet engine code: it is 153 lines total, 
plus 51 lines of comments. The total patchset in comparison is:

 35 files changed, 1890 insertions(+), 71 deletions(-)

(and this over-estimates it because if this got removed then we'd still 
have to add an async execution syscall.) And the code is pretty compact 
and self-contained. Threadlets share much of the infrastructure with 
syslets: for example the completion ring code is _100%_ shared, the 
async execution code is 98% shared.

You are free to not like it though, and i'm willing to change any aspect 
of the API to make it more intuitive and more useful, but calling it 
'complexity' at this point is just handwaving. And believe it or not, a 
good number of people actually find syslets pretty cool.

> Or can we, with a simple/parallel async submission, coupled with 
> threadlets, cover a pretty broad range of real-life use cases?

sure, if we debate its virtualization driven market penetration via self 
promoting technologies that also drive customer satisfaction, then we'll 
be able to increase shareholder value by improving the user experience 
and we'll also succeed in turning this vision into a supply/demand 
marketplace. Or not?

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-28 22:22                       ` Ingo Molnar
@ 2007-02-28 22:47                         ` Davide Libenzi
  0 siblings, 0 replies; 337+ messages in thread
From: Davide Libenzi @ 2007-02-28 22:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ulrich Drepper, Linux Kernel Mailing List, Linus Torvalds,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Wed, 28 Feb 2007, Ingo Molnar wrote:

> > Or can we, with a simple/parallel async submission, coupled with 
> > threadlets, cover a pretty broad range of real-life use cases?
> 
> sure, if we debate its virtualization driven market penetration via self 
> promoting technologies that also drive customer satisfaction, then we'll 
> be able to increase shareholder value by improving the user experience 
> and we'll also succeed in turning this vision into a supply/demand 
> marketplace. Or not?

Okay then, I guess it's good to go as is :)



- Davide



^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-28 16:42               ` Linus Torvalds
  2007-02-28 17:26                 ` Ingo Molnar
  2007-02-28 18:22                 ` Davide Libenzi
@ 2007-02-28 23:12                 ` Ingo Molnar
  2007-03-01  1:33                   ` Andrea Arcangeli
  2007-03-01 21:27                   ` Linus Torvalds
  2 siblings, 2 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-02-28 23:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Davide Libenzi, Ulrich Drepper, Linux Kernel Mailing List,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> So I would repeat my call for getting rid of the atoms, and instead 
> just do a "single submission" at a time. Do the linking by running a 
> threadlet that has user space code (and the stack overhead), which is 
> MUCH more flexible. And do nonlinked single system calls without 
> *either* atoms *or* a user-space stack footprint.

I agree that threadlets are much more flexible - and they might in fact 
win in the long run due to that.

i'll add a one-shot syscall API in v6 and then we'll be able to see them 
side by side. (wanted to do that in v5 but it got delayed by x86_64 
issues, x86_64's entry code is certainly ... tricky wrt. ptregs saving)

wrt. one-shot syscalls, the user-space stack footprint would still 
probably be there, because even async contexts that only do single-shot 
processing need to drop out of kernel mode to handle signals. We could 
probably hack the signal routing code to never deliver to such threads 
(but bounce it over to the head context, which is always available) but 
i think that would be a bit messy. (i don't exclude it though)

I think syslets might also act as a prototyping platform for new system 
calls. If any particular syslet atom string comes up more frequently 
(and we could even automate the profiling of that within the kernel), 
then it's a good candidate for a standalone syscall. Currently we don't 
have such information in any structured way: the connection between 
streams of syscalls done by applications is totally opaque to the 
kernel.

Also, i genuinely believe that to be competitive (performance-wise) with 
fully in-kernel queueing solutions, we need syslets - the syslet NULL 
overhead is 20 cycles (this includes copying, engine overhead, etc.), 
the syscall NULL overhead is 280-300 cycles. It could probably be made 
more capable by providing more special system calls like sys_upcall() to 
execute a user-space function. (that way a syslet could still execute 
user-space code without having to exit out of kernel mode too 
frequently) Or perhaps a sys_x86_bytecode() call, that would execute a 
pre-verified, kernel-stored sequence of simplified x86 bytecode, using 
the kernel stack.

My fear is that if we force all these things over to one-shot syscalls 
or threadlets then this will become another second-tier mechanism. By 
providing syslets we give the message: "sure, come on and play within 
the kernel if you want to, but it's not easy".

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-28 23:12                 ` Ingo Molnar
@ 2007-03-01  1:33                   ` Andrea Arcangeli
  2007-03-01  9:15                     ` Evgeniy Polyakov
  2007-03-01 21:27                   ` Linus Torvalds
  1 sibling, 1 reply; 337+ messages in thread
From: Andrea Arcangeli @ 2007-03-01  1:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Davide Libenzi, Ulrich Drepper,
	Linux Kernel Mailing List, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Zach Brown, Evgeniy Polyakov,
	David S. Miller, Suparna Bhattacharya, Jens Axboe,
	Thomas Gleixner

On Thu, Mar 01, 2007 at 12:12:28AM +0100, Ingo Molnar wrote:
> more capable by providing more special system calls like sys_upcall() to 
> execute a user-space function. (that way a syslet could still execute 
> user-space code without having to exit out of kernel mode too 
> frequently) Or perhaps a sys_x86_bytecode() call, that would execute a 
> pre-verified, kernel-stored sequence of simplified x86 bytecode, using 
> the kernel stack.

Which means the userspace code would then run with kernel privilege
level somehow (after security verifier, whatever). You remember I
think it's a plain crazy idea...

I don't want to argue about syslets, threadlets, whatever async or
syscall-merging mechanism here, I'm just focusing on the idea of yours
of running userland code in kernel space somehow (I hoped you given up
on it by now). Fixing the greatest syslets limitation is going to open
a can of worms as far as security is concerned.

The fact that userland code must not run with kernel privilege level
is the reason why syslets aren't very useful (but again: focusing on
the syslets-vs-async-syscalls debate isn't my interest).

Frankly I think this idea of running userland code with kernel
privileges fits in the same category as porting Linux to segmentation
to avoid the cost of pagetables to gain a bit of performance
despite losing in many other areas. Nobody in real life will want to
make that trade, for such an incredibly small performance
improvement.

For things that can be frequently combined, it's much simpler and
cheaper to create a "merged" syscall (i.e. sys_spawn =
sys_fork+sys_exec) than to invent a way to upload userland-generated
bytecode to kernel space to do that.
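
to make the shape concrete (sys_spawn does not exist - this is only
what such a merged entry point could look like, with posix_spawn()
being the closest existing userspace analogue):

	asmlinkage long sys_spawn(const char __user *path,
				  char __user * __user *argv,
				  char __user * __user *envp);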

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-28 16:14                         ` Pavel Machek
@ 2007-03-01  8:18                           ` Evgeniy Polyakov
  2007-03-01  9:26                             ` Pavel Machek
  2007-03-01 19:24                             ` Johann Borck
  0 siblings, 2 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-03-01  8:18 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Theodore Tso, Ingo Molnar, Linus Torvalds, Ulrich Drepper,
	linux-kernel, Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	Alan Cox, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

On Wed, Feb 28, 2007 at 04:14:14PM +0000, Pavel Machek (pavel@ucw.cz) wrote:
> Hi!
> 
> > > I think what you are not hearing, and what everyone else is saying
> > > (INCLUDING Linus), is that for most programmers, state machines are
> > > much, much harder to program, understand, and debug compared to
> > > multi-threaded code.  You may disagree (were you a MacOS 9 programmer
> > > in another life?), and it may not even be true for you if you happen
> > > to be one of those folks more at home with Scheme continuations, for
> > > example.  But it is true that for most kernel programmers, threaded
> > > programming is much easier to understand, and we need to engineer the
> > > kernel for what will be maintainable for the majority of the kernel
> > > development community.
> > 
> > I understand that - and I totally agree.
> > But when more complex, more bug-prone code results in higher performance
> > - that must be used. We have linked lists and binary trees - the latter
> 
> No-o. Kernel is not designed like that.
> 
> Often, more complex and slightly faster code exists, and we simply use
> slower variant, because it is fast enough.
> 
> 10% gain in speed is NOT worth major complexity increase.

Should I create a patch to remove the rb-tree implementation?
That practice is stupid IMO.

> 							Pavel
> -- 
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-28 19:42                         ` Davide Libenzi
@ 2007-03-01  8:38                           ` Evgeniy Polyakov
  2007-03-01  9:28                             ` Evgeniy Polyakov
  0 siblings, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-03-01  8:38 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Chris Friesen, Linus Torvalds, Ingo Molnar, Ulrich Drepper,
	Linux Kernel Mailing List, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Wed, Feb 28, 2007 at 11:42:24AM -0800, Davide Libenzi (davidel@xmailserver.org) wrote:
> On Wed, 28 Feb 2007, Chris Friesen wrote:
> 
> > Davide Libenzi wrote:
> > 
> > > struct async_syscall {
> > > 	unsigned long nr_sysc;
> > > 	unsigned long params[8];
> > > 	long *result;
> > > };
> > > 
> > > And what would async_wait() return back? Pointers to "struct async_syscall"
> > > or pointers to "result"?
> > 
> > Either one has downsides.  Pointer to struct async_syscall requires that the
> > caller keep the struct around.  Pointer to result requires that the caller
> > always reserve a location for the result.
> > 
> > Does the kernel care about the (possibly rare) case of callers that don't want
> > to pay attention to result?  If so, what about adding some kind of
> > caller-specified handle to struct async_syscall, and having async_wait()
> > return the handle?  In the case where the caller does care about the result,
> > the handle could just be the address of result.
> 
> Something like this (with async_wait() returning asynid's)?
> 
> struct async_syscall {
> 	long *result;
> 	unsigned long asynid;
> 	unsigned long nr_sysc;
> 	unsigned long params[8];
> };

Having the result pointer as NULL might imply that the result is of no
interest and thus can be discarded, and the async syscall event will
not be returned through async_wait().
More flexible would be having a request flags field in the structure.
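
i.e. something like this (field and flag names hypothetical):

	struct async_syscall {
		long *result;		/* NULL: caller does not care */
		unsigned long asynid;
		unsigned long flags;	/* e.g. ASYNC_SILENT: complete
					 * silently, do not report the
					 * event through async_wait() */
		unsigned long nr_sysc;
		unsigned long params[8];
	};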
 
> - Davide
> 

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01  1:33                   ` Andrea Arcangeli
@ 2007-03-01  9:15                     ` Evgeniy Polyakov
  0 siblings, 0 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-03-01  9:15 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Ingo Molnar, Linus Torvalds, Davide Libenzi, Ulrich Drepper,
	Linux Kernel Mailing List, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Thu, Mar 01, 2007 at 02:33:01AM +0100, Andrea Arcangeli (andrea@suse.de) wrote:
> On Thu, Mar 01, 2007 at 12:12:28AM +0100, Ingo Molnar wrote:
> > more capable by providing more special system calls like sys_upcall() to 
> > execute a user-space function. (that way a syslet could still execute 
> > user-space code without having to exit out of kernel mode too 
> > frequently) Or perhaps a sys_x86_bytecode() call, that would execute a 
> > pre-verified, kernel-stored sequence of simplified x86 bytecode, using 
> > the kernel stack.
> 
> Which means the userspace code would then run with kernel privilege
> level somehow (after security verifier, whatever). You remember I
> think it's a plain crazy idea...

Syslets/threadlets do not execute userspace code in the kernel - they
behave similarly to threads. sys_upcall() would be a wrapper around the
quite complex threadlet machinery.

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01  8:18                           ` Evgeniy Polyakov
@ 2007-03-01  9:26                             ` Pavel Machek
  2007-03-01  9:47                               ` Evgeniy Polyakov
  2007-03-01 19:24                             ` Johann Borck
  1 sibling, 1 reply; 337+ messages in thread
From: Pavel Machek @ 2007-03-01  9:26 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Theodore Tso, Ingo Molnar, Linus Torvalds, Ulrich Drepper,
	linux-kernel, Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	Alan Cox, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

Hi!

> > > I understand that - and I totally agree.
> > > But when more complex, more bug-prone code results in higher performance
> > > - that must be used. We have linked lists and binary trees - the latter
> > 
> > No-o. Kernel is not designed like that.
> > 
> > Often, more complex and slightly faster code exists, and we simply use
> > slower variant, because it is fast enough.
> > 
> > 10% gain in speed is NOT worth major complexity increase.
> 
> Should I create a patch to remove rb-tree implementation?

If you can replace them with something simpler, and no worse than 10%
slower in the worst case, then go ahead. (We actually tried to do that
at some point, only to realize that efence stresses the vm subsystem in
a very unexpected/unfriendly way).

> That practice is stupid IMO.

Too bad. Now you can start a Linux fork called Eugenix.

(But really, Linux is not "maximum performance at any cost". Linux is
"how fast can we get that while keeping it maintainable?").

That is why, while arguing syslets vs. kevents, you need to argue
not "kevents are faster because they avoid context switch overhead",
but "kevents are _so much_ faster that it is worth the added
complexity". And Ingo seems to be showing you they are not _so much_
faster.

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01  8:38                           ` Evgeniy Polyakov
@ 2007-03-01  9:28                             ` Evgeniy Polyakov
  0 siblings, 0 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-03-01  9:28 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Chris Friesen, Linus Torvalds, Ingo Molnar, Ulrich Drepper,
	Linux Kernel Mailing List, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Thu, Mar 01, 2007 at 11:38:06AM +0300, Evgeniy Polyakov (johnpol@2ka.mipt.ru) wrote:
> > struct async_syscall {
> > 	long *result;
> > 	unsigned long asynid;
> > 	unsigned long nr_sysc;
> > 	unsigned long params[8];
> > };
> 
> Having the result pointer as NULL might imply that the result is of no
> interest and thus can be discarded, and the async syscall event will
> not be returned through async_wait().
> More flexible would be having a request flags field in the structure.

Ugh, that is already implemented in v5.

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01  9:26                             ` Pavel Machek
@ 2007-03-01  9:47                               ` Evgeniy Polyakov
  2007-03-01  9:54                                 ` Ingo Molnar
  2007-03-01 10:11                                 ` Pavel Machek
  0 siblings, 2 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-03-01  9:47 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Theodore Tso, Ingo Molnar, Linus Torvalds, Ulrich Drepper,
	linux-kernel, Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	Alan Cox, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

On Thu, Mar 01, 2007 at 10:26:34AM +0100, Pavel Machek (pavel@ucw.cz) wrote:
> > > 10% gain in speed is NOT worth major complexity increase.
> > 
> > Should I create a patch to remove rb-tree implementation?
> 
> If you can replace them with something simpler, and no worse than 10%
> slower in worst case, then go ahead. (We actually tried to do that at
> some point, only to realize that efence stresses vm subsystem in very
> unexpected/unfriendly way).

Agh, only 10% in the worst case.
I think you cannot even imagine what tricks the network stack uses to
get at least an additional 1% out of the box.
Using such logic you could just abandon any further development, since
it works as-is right now.

> > That practice is stupid IMO.
> 
> Too bad. Now you can start Linux fork called Eugenix.
> 
> (But really, Linux is not "maximum performance at any cost". Linux is
> "how fast can we get that while keeping it maintainable?").

Should I read it as: we do not understand what it is and thus we do
not want it?

> That is why, while arguing syslets vs. kevents, you need to argue
> not "kevents are faster because they avoid context switch overhead",
> but "kevents are _so much_ faster that it is worth the added
> complexity". And Ingo seems to be showing you they are not _so much_
> faster.

Threadlets behave much worse without an event-driven model, and events
can behave worse without backing threads - they are mutually
compensating.

I posted kevent/epoll benchmarks and related design issues too many 
times both with handmade applications (which might be broken as hell)
and popular open-source servers to repeat them again.

> 									Pavel
> -- 
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01  9:47                               ` Evgeniy Polyakov
@ 2007-03-01  9:54                                 ` Ingo Molnar
  2007-03-01 10:59                                   ` Evgeniy Polyakov
  2007-03-01 19:19                                   ` Davide Libenzi
  2007-03-01 10:11                                 ` Pavel Machek
  1 sibling, 2 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-03-01  9:54 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Pavel Machek, Theodore Tso, Linus Torvalds, Ulrich Drepper,
	linux-kernel, Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	Alan Cox, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner


* Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> I posted kevent/epoll benchmarks and related design issues too many 
> times both with handmade applications (which might be broken as hell) 
> and popular open-source servers to repeat them again.

numbers are crucial here - and given the epoll bugs in the evserver code 
that we found, do you have updated evserver benchmark results that 
compare epoll to kevent? I'm wondering why epoll has half the speed of 
kevent in those measurements - i suspect some possible benchmarking bug. 
The queueing model of epoll and kevent is roughly comparable, both do 
only a constant number of steps to serve one particular request, 
regardless of how many pending connections/requests there are. What is 
the CPU utilization of the server system during an epoll test, and what 
is the CPU utilization during a kevent test? 100% utilized in both 
cases?

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01  9:47                               ` Evgeniy Polyakov
  2007-03-01  9:54                                 ` Ingo Molnar
@ 2007-03-01 10:11                                 ` Pavel Machek
  2007-03-01 10:19                                   ` Ingo Molnar
  2007-03-01 11:18                                   ` Evgeniy Polyakov
  1 sibling, 2 replies; 337+ messages in thread
From: Pavel Machek @ 2007-03-01 10:11 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Theodore Tso, Ingo Molnar, Linus Torvalds, Ulrich Drepper,
	linux-kernel, Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	Alan Cox, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

Hi!

> > > > 10% gain in speed is NOT worth major complexity increase.
> > > 
> > > Should I create a patch to remove rb-tree implementation?
> > 
> > If you can replace them with something simpler, and no worse than 10%
> > slower in worst case, then go ahead. (We actually tried to do that at
> > some point, only to realize that efence stresses vm subsystem in very
> > unexpected/unfriendly way).
> 
> Agh, only 10% in the worst case.
> I think you cannot even imagine what tricks the network stack uses to
> get at least an additional 1% out of the box.

Yep? Feel free to rewrite networking in assembly on Eugenix. That
should get you a 1% improvement. If you reserve a few registers to be
used only by the kernel (not allowed for userspace), you can speed up
networking by 5%, too. Ouch, and you could turn off the MMU, that is a
sure way to get a few more percent improvement in your networking case.

> Using such logic you could just abandon any further development,
> since it works as-is right now.

Stop trying to pervert my logic.

> > > That practice is stupid IMO.
> > 
> > Too bad. Now you can start Linux fork called Eugenix.
> > 
> > (But really, Linux is not "maximum performance at any cost". Linux is
> > "how fast can we get that while keeping it maintainable?").
> 
> Should I read it like: we do not understand what it is and thus we do
> not want it?

Actually, yes, that's a concern. If your code is so crappy that we
can't understand it, guess what, it is not going to be merged. Notice
that someone will have to maintain your code if you get hit by a bus.

If your code is so complex that it is almost impossible to use from
userspace, that is a good enough reason for it not to be merged. "But
it is 3% faster if..." is not a good-enough argument.

> > That is why, while arguing syslets vs. kevents, you need to argue
> > not "kevents are faster because they avoid context switch overhead",
> > but "kevents are _so much_ faster that it is worth the added
> > complexity". And Ingo seems to be showing you they are not _so much_
> > faster.
> 
> Threadlets behave much worse without event driven model, events can
> behave worse without backed threads, they are mutually compensating.

I think Ingo demonstrated unoptimized threadlets to be within 5% of
the speed of kevent. Demonstrate that kevents are twice as fast as
syslets on a reasonable test case, and I guess we'll listen...
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 10:11                                 ` Pavel Machek
@ 2007-03-01 10:19                                   ` Ingo Molnar
  2007-03-01 11:18                                   ` Evgeniy Polyakov
  1 sibling, 0 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-03-01 10:19 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Evgeniy Polyakov, Theodore Tso, Linus Torvalds, Ulrich Drepper,
	linux-kernel, Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	Alan Cox, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner


* Pavel Machek <pavel@ucw.cz> wrote:

> > Threadlets behave much worse without event driven model, events can 
> > behave worse without backed threads, they are mutually compensating.
> 
> I think Ingo demonstrated unoptimized threadlets to be within 5% of 
> the speed of kevent. [...]

that was epoll, not kevent, but yeah. To me the biggest question is, how 
much improvement does kevent bring relative to epoll? Epoll is a pretty 
good event queueing API, and it covers all the event sources today. It 
is also O(1) throughout, so i don't really understand how it could only 
achieve half the speed of kevent in the evserver_epoll/kevent.c 
benchmark. We need to understand that first.

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01  9:54                                 ` Ingo Molnar
@ 2007-03-01 10:59                                   ` Evgeniy Polyakov
  2007-03-01 11:00                                     ` Ingo Molnar
                                                       ` (3 more replies)
  2007-03-01 19:19                                   ` Davide Libenzi
  1 sibling, 4 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-03-01 10:59 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pavel Machek, Theodore Tso, Linus Torvalds, Ulrich Drepper,
	linux-kernel, Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	Alan Cox, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

On Thu, Mar 01, 2007 at 10:54:02AM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> 
> > I posted kevent/epoll benchmarks and related design issues too many 
> > times both with handmade applications (which might be broken as hell) 
> > and popular open-source servers to repeat them again.
> 
> numbers are crucial here - and given the epoll bugs in the evserver code 
> that we found, do you have updated evserver benchmark results that 
> compare epoll to kevent? I'm wondering why epoll has half the speed of 
> kevent in those measurements - i suspect some possible benchmarking bug. 
> The queueing model of epoll and kevent is roughly comparable, both do 
> only a constant number of steps to serve one particular request, 
> regardless of how many pending connections/requests there are. What is 
> the CPU utilization of the server system during an epoll test, and what 
> is the CPU utilization during a kevent test? 100% utilized in both 
> cases?

Yes, it is about 98-100% in both cases.
I've just re-run tests on my amd64 test machine without debug options:

epoll		4794.23
kevent		6468.95

here are full client 'ab' outputs for epoll and kevent servers (epoll 
does not contain EPOLLET as you requested, but it does not look like 
it changes performance in my case).

epoll ab output:
# ab -c8000 -n80000 http://192.168.0.48/
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.0.48 (be patient)
Completed 8000 requests
Completed 16000 requests
Completed 24000 requests
Completed 32000 requests
Completed 40000 requests
Completed 48000 requests
Completed 56000 requests
Completed 64000 requests
Completed 72000 requests
Finished 80000 requests


Server Software:        Apache/1.3.27
Server Hostname:        192.168.0.48
Server Port:            80

Document Path:          /
Document Length:        3521 bytes

Concurrency Level:      8000
Time taken for tests:   16.686737 seconds
Complete requests:      80000
Failed requests:        0
Write errors:           0
Total transferred:      309760000 bytes
HTML transferred:       281680000 bytes
Requests per second:    4794.23 [#/sec] (mean)
Time per request:       1668.674 [ms] (mean)
Time per request:       0.209 [ms] (mean, across all concurrent requests)
Transfer rate:          18128.17 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      159  779 110.1    799     921
Processing:   468  866  77.4    869     988
Waiting:       63  426 212.3    425     921
Total:       1145 1646 115.6   1660    1873

Percentage of the requests served within a certain time (ms)
50%   1660
66%   1661
75%   1662
80%   1663
90%   1806
95%   1830
98%   1833
99%   1834
100%   1873 (longest request)

kevent ab output:
# ab -c8000 -n80000 http://192.168.0.48/
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.0.48 (be patient)
Completed 8000 requests
Completed 16000 requests
Completed 24000 requests
Completed 32000 requests
Completed 40000 requests
Completed 48000 requests
Completed 56000 requests
Completed 64000 requests
Completed 72000 requests
Finished 80000 requests


Server Software:        Apache/1.3.27
Server Hostname:        192.168.0.48
Server Port:            80

Document Path:          /
Document Length:        3521 bytes

Concurrency Level:      8000
Time taken for tests:   12.366775 seconds
Complete requests:      80000
Failed requests:        0
Write errors:           0
Total transferred:      317047104 bytes
HTML transferred:       288306522 bytes
Requests per second:    6468.95 [#/sec] (mean)
Time per request:       1236.677 [ms] (mean)
Time per request:       0.155 [ms] (mean, across all concurrent requests)
Transfer rate:          25036.12 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      130  364 871.1    275    9347
Processing:   178  298  42.5    296     580
Waiting:       31  202  65.8    210     369
Total:        411  663 887.0    572    9722

Percentage of the requests served within a certain time (ms)
50%    572
66%    573
75%    618
80%    640
90%    684
95%    709
98%    721
99%   3455
100%   9722 (longest request)

Notice how the percentage of requests served within a certain time
differs between kevent and epoll. And this server does not include the
ready-on-submission kevent optimization.

> 	Ingo

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 10:59                                   ` Evgeniy Polyakov
@ 2007-03-01 11:00                                     ` Ingo Molnar
  2007-03-01 11:16                                       ` Evgeniy Polyakov
  2007-03-01 11:14                                     ` Eric Dumazet
                                                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-03-01 11:00 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Pavel Machek, Theodore Tso, Linus Torvalds, Ulrich Drepper,
	linux-kernel, Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	Alan Cox, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner


* Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> I've just re-run tests on my amd64 test machine without debug options:
> 
> epoll        4794.23
> kevent       6468.95

could you please post the two URLs for the exact evserver code used for 
these measurements? (even if you did so already in the past - best to 
have them always together with the numbers) Thanks!

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 10:59                                   ` Evgeniy Polyakov
  2007-03-01 11:00                                     ` Ingo Molnar
@ 2007-03-01 11:14                                     ` Eric Dumazet
  2007-03-01 11:20                                       ` Evgeniy Polyakov
  2007-03-01 12:34                                     ` Ingo Molnar
  2007-03-01 16:56                                     ` David Lang
  3 siblings, 1 reply; 337+ messages in thread
From: Eric Dumazet @ 2007-03-01 11:14 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ingo Molnar, Pavel Machek, Theodore Tso, Linus Torvalds,
	Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

On Thursday 01 March 2007 11:59, Evgeniy Polyakov wrote:

> Yes, it is about 98-100% in both cases.
> I've just re-run tests on my amd64 test machine without debug options:
>
> epoll		4794.23
> kevent		6468.95
>

It would be valuable if you could post oprofile results (CPU_CLK_UNHALTED) for 
both tests.

Thank you

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 11:00                                     ` Ingo Molnar
@ 2007-03-01 11:16                                       ` Evgeniy Polyakov
  2007-03-01 11:27                                         ` Ingo Molnar
  2007-03-01 11:41                                         ` Ingo Molnar
  0 siblings, 2 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-03-01 11:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pavel Machek, Theodore Tso, Linus Torvalds, Ulrich Drepper,
	linux-kernel, Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	Alan Cox, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

On Thu, Mar 01, 2007 at 12:00:22PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> 
> > I've just re-run tests on my amd64 test machine without debug options:
> > 
> > epoll        4794.23
> > kevent       6468.95
> 
> could you please post the two URLs for the exact evserver code used for 
> these measurements? (even if you did so already in the past - best to 
> have them always together with the numbers) Thanks!

I've uploaded them to:

http://tservice.net.ru/~s0mbre/archive/kevent/evserver_epoll.c
http://tservice.net.ru/~s0mbre/archive/kevent/evserver_kevent.c

I also changed the client socket to nonblocking mode, with the same
result in the epoll server. If you find it broken, please send me a
corrected version to test too.

> 	Ingo

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 10:11                                 ` Pavel Machek
  2007-03-01 10:19                                   ` Ingo Molnar
@ 2007-03-01 11:18                                   ` Evgeniy Polyakov
  2007-03-02 10:27                                     ` Pavel Machek
  1 sibling, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-03-01 11:18 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Theodore Tso, Ingo Molnar, Linus Torvalds, Ulrich Drepper,
	linux-kernel, Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	Alan Cox, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

On Thu, Mar 01, 2007 at 11:11:02AM +0100, Pavel Machek (pavel@ucw.cz) wrote:
> > > > > 10% gain in speed is NOT worth major complexity increase.
> > > > 
> > > > Should I create a patch to remove rb-tree implementation?
> > > 
> > > If you can replace them with something simpler, and no worse than 10%
> > > slower in worst case, then go ahead. (We actually tried to do that at
> > > some point, only to realize that efence stresses vm subsystem in very
> > > unexpected/unfriendly way).
> > 
> > Agh, only 10% in the worst case.
> > I think you cannot even imagine what tricks the network stack uses to
> > get at least an additional 1% out of the box.
> 
> Yep? Feel free to rewrite networking in assembly on Eugenix. That
> should get you a 1% improvement. If you reserve a few registers to be
> used only by the kernel (not allowed for userspace), you can speed up
> networking by 5%, too. Ouch, and you could turn off the MMU, that is a
> sure way to get a few more percent improvement in your networking case.

It is not _my_ networking, but the one you use every day in every Linux
box. Notice what tricks are used to remove a single byte from sk_buff.
It is called optimization, and if it gains us even a single plus it must
be implemented. Not all people have a magical fear of new things.

> > Using such logic you could just abandon any further development,
> > since it works as-is right now.
> 
> Stop trying to pervert my logic.

Ugh? :)
I am just putting into simple words your 'we do not need something if
it adds 10% but is complex to understand'.

> > > > That practice is stupid IMO.
> > > 
> > > Too bad. Now you can start Linux fork called Eugenix.
> > > 
> > > (But really, Linux is not "maximum performance at any cost". Linux is
> > > "how fast can we get that while keeping it maintainable?").
> > 
> > Should I read it like: we do not understand what it is and thus we do
> > not want it?
> 
> Actually, yes, that's a concern. If your code is so crappy that we
> can't understand it, guess what, it is not going to be merged. Notice
> that someone will have to maintain your code if you get hit by bus.
> 
> If your code is so complex that it is almost impossible to use from
> userspace, that is good enough reason not to be merged. "But it is 3%
> faster if..." is not a good-enough argument.

Is it enough for you?

epoll           4794.23 req/sec
kevent          6468.95 req/sec

And we are not even talking about other kevent features, like the
ability to deliver essentially any event through its queue or shared
ring (and some of its ideas are slowly being implemented in the
syslet/threadlet code, btw).

Even if kevent were only as fast as epoll, it allows working with any
kind of event (signals, timers, aio completion, io events and any
other you like) with one queue/ring, which removes races and does
_simplify_ development, since there is no need to create different
models to handle different events.

> > > That is why, while arguing syslets vs. kevents, you need to argue
> > > not "kevents are faster because they avoid context switch overhead",
> > > but "kevents are _so much_ faster that it is worth the added
> > > complexity". And Ingo seems to be showing you they are not _so much_
> > > faster.
> > 
> > Threadlets behave much worse without event driven model, events can
> > behave worse without backed threads, they are mutually compensating.
> 
> I think Ingo demonstrated unoptimized threadlets to be within 5% of
> the speed of kevent. Demonstrate that kevents are twice as fast as
> syslets on a reasonable test case, and I guess we'll listen...

That was compared to epoll, not kevent.

But I repeat again - kevent is not only epoll, it can do a lot of other
things which do improve performance and simplify development - have you
seen the terrible hacks in libevent to handle signals without a race in
the polling loop? None of that is needed anymore - one event loop, one
event structure, a completely unified interface for all operations.
Some kevent features are slowly being implemented in the
syslet/threadlet async code too, and it looks like I see where things
will end up :), but likely I do not care about the new 'kevent', I just
wanted that said half a year ago, when I started resending it again,
but Ingo has already said his definitive word :)

> 								Pavel
> -- 
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 11:14                                     ` Eric Dumazet
@ 2007-03-01 11:20                                       ` Evgeniy Polyakov
  2007-03-01 11:28                                         ` Eric Dumazet
  0 siblings, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-03-01 11:20 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ingo Molnar, Pavel Machek, Theodore Tso, Linus Torvalds,
	Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

On Thu, Mar 01, 2007 at 12:14:44PM +0100, Eric Dumazet (dada1@cosmosbay.com) wrote:
> On Thursday 01 March 2007 11:59, Evgeniy Polyakov wrote:
> 
> > Yes, it is about 98-100% in both cases.
> > I've just re-run tests on my amd64 test machine without debug options:
> >
> > epoll		4794.23
> > kevent		6468.95
> >
> 
> It would be valuable if you could post oprofile results (CPU_CLK_UNHALTED) for 
> both tests.

I can't - oprofile does not work on this x86_64 machine:

#opcontrol --setup --vmlinux=/home/s0mbre/aWork/git/linux-2.6.kevent/vmlinux
# opcontrol --start
Using default event: CPU_CLK_UNHALTED:100000:0:1:1
/usr/bin/opcontrol: line 994: /dev/oprofile/0/enabled: No such file or directory
/usr/bin/opcontrol: line 994: /dev/oprofile/0/event: No such file or directory
/usr/bin/opcontrol: line 994: /dev/oprofile/0/count: No such file or directory
/usr/bin/opcontrol: line 994: /dev/oprofile/0/kernel: No such file or directory
/usr/bin/opcontrol: line 994: /dev/oprofile/0/user: No such file or directory
/usr/bin/opcontrol: line 994: /dev/oprofile/0/unit_mask: No such file or directory

# ls -l /dev/oprofile/
total 0
drwxr-xr-x 1 root root 0 2007-03-01 09:41 1
drwxr-xr-x 1 root root 0 2007-03-01 09:41 2
drwxr-xr-x 1 root root 0 2007-03-01 09:41 3
-rw-r--r-- 1 root root 0 2007-03-01 09:41 backtrace_depth
-rw-r--r-- 1 root root 0 2007-03-01 09:41 buffer
-rw-r--r-- 1 root root 0 2007-03-01 09:41 buffer_size
-rw-r--r-- 1 root root 0 2007-03-01 09:41 buffer_watershed
-rw-r--r-- 1 root root 0 2007-03-01 09:41 cpu_buffer_size
-rw-r--r-- 1 root root 0 2007-03-01 09:41 cpu_type
-rw-rw-rw- 1 root root 0 2007-03-01 09:41 dump
-rw-r--r-- 1 root root 0 2007-03-01 09:41 enable
-rw-r--r-- 1 root root 0 2007-03-01 09:41 pointer_size
drwxr-xr-x 1 root root 0 2007-03-01 09:41 stats

> Thank you

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 11:16                                       ` Evgeniy Polyakov
@ 2007-03-01 11:27                                         ` Ingo Molnar
  2007-03-01 11:36                                           ` Evgeniy Polyakov
  2007-03-01 11:41                                         ` Ingo Molnar
  1 sibling, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-03-01 11:27 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Pavel Machek, Theodore Tso, Linus Torvalds, Ulrich Drepper,
	linux-kernel, Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	Alan Cox, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner


* Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> I've uploaded them to:
> 
> http://tservice.net.ru/~s0mbre/archive/kevent/evserver_epoll.c
> http://tservice.net.ru/~s0mbre/archive/kevent/evserver_kevent.c

thanks.

> I also changed the client socket to nonblocking mode, with the same 
> result in the epoll server. [...]

what does this mean exactly? Did you change this line in 
evserver_epoll.c:

        //fcntl(cs, F_SETFL, O_NONBLOCK);

to:

        fcntl(cs, F_SETFL, O_NONBLOCK);

and the result was the same?

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 11:20                                       ` Evgeniy Polyakov
@ 2007-03-01 11:28                                         ` Eric Dumazet
  2007-03-01 11:47                                           ` Evgeniy Polyakov
  2007-03-01 12:19                                           ` Evgeniy Polyakov
  0 siblings, 2 replies; 337+ messages in thread
From: Eric Dumazet @ 2007-03-01 11:28 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ingo Molnar, Pavel Machek, Theodore Tso, Linus Torvalds,
	Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

On Thursday 01 March 2007 12:20, Evgeniy Polyakov wrote:
> On Thu, Mar 01, 2007 at 12:14:44PM +0100, Eric Dumazet (dada1@cosmosbay.com) wrote:
> > On Thursday 01 March 2007 11:59, Evgeniy Polyakov wrote:
> > > Yes, it is about 98-100% in both cases.
> > > I've just re-run tests on my amd64 test machine without debug options:
> > >
> > > epoll		4794.23
> > > kevent		6468.95
> >
> > It would be valuable if you could post oprofile results
> > (CPU_CLK_UNHALTED) for both tests.
>
> I can't - oprofile does not work on this x86_64 machine:
>

Yes, this is a known problem, but you can make it work, as I did.

Please :)

I used the CVS version of oprofile plus a patch you can find in the mailing 
list archives. Don't remember exactly, since I hit this some months ago.

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 11:27                                         ` Ingo Molnar
@ 2007-03-01 11:36                                           ` Evgeniy Polyakov
  0 siblings, 0 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-03-01 11:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pavel Machek, Theodore Tso, Linus Torvalds, Ulrich Drepper,
	linux-kernel, Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	Alan Cox, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

On Thu, Mar 01, 2007 at 12:27:00PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> 
> > I've uploaded them to:
> > 
> > http://tservice.net.ru/~s0mbre/archive/kevent/evserver_epoll.c
> > http://tservice.net.ru/~s0mbre/archive/kevent/evserver_kevent.c
> 
> thanks.
> 
> > I also changed the client socket to nonblocking mode, with the same 
> > result in the epoll server. [...]
> 
> what does this mean exactly? Did you change this line in 
> evserver_epoll.c:
> 
>         //fcntl(cs, F_SETFL, O_NONBLOCK);
> 
> to:
> 
>         fcntl(cs, F_SETFL, O_NONBLOCK);
> 
> and the result was the same?

Yep.

> 	Ingo

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 11:16                                       ` Evgeniy Polyakov
  2007-03-01 11:27                                         ` Ingo Molnar
@ 2007-03-01 11:41                                         ` Ingo Molnar
  2007-03-01 11:47                                           ` Ingo Molnar
  2007-03-01 12:01                                           ` Evgeniy Polyakov
  1 sibling, 2 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-03-01 11:41 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Pavel Machek, Theodore Tso, Linus Torvalds, Ulrich Drepper,
	linux-kernel, Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	Alan Cox, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner


* Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> I also changed the client socket to nonblocking mode, with the same 
> result in the epoll server. If you find it broken, please send me a 
> corrected version to test too.

this line in evserver_kevent.c looks a bit fishy:

        err = recv(s, buf, 100, 0);

because on the evserver_epoll.c side the following is done:

        err = recv(s, buf, 4096, 0);

now, for 'ab', the request size is 76 bytes, so it should fit fine 
functionality-wise. But, the TCP stack might decide differently about 
whether to return with a partial packet depending on how much data is 
requested. I don't know whether it actually makes a difference in the 
TCP flow decisions, and whether it makes a performance difference in 
your test, but the safest would be to use 4096 in both cases.

in general, please make sure the exact same system calls are done in the 
client function. (except of course for the event queueing syscalls 
themselves)

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 11:28                                         ` Eric Dumazet
@ 2007-03-01 11:47                                           ` Evgeniy Polyakov
  2007-03-01 13:12                                             ` Eric Dumazet
  2007-03-01 12:19                                           ` Evgeniy Polyakov
  1 sibling, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-03-01 11:47 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ingo Molnar, Pavel Machek, Theodore Tso, Linus Torvalds,
	Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

On Thu, Mar 01, 2007 at 12:28:00PM +0100, Eric Dumazet (dada1@cosmosbay.com) wrote:
> On Thursday 01 March 2007 12:20, Evgeniy Polyakov wrote:
> > On Thu, Mar 01, 2007 at 12:14:44PM +0100, Eric Dumazet (dada1@cosmosbay.com) wrote:
> > > On Thursday 01 March 2007 11:59, Evgeniy Polyakov wrote:
> > > > Yes, it is about 98-100% in both cases.
> > > > I've just re-run tests on my amd64 test machine without debug options:
> > > >
> > > > epoll		4794.23
> > > > kevent		6468.95
> > >
> > > It would be valuable if you could post oprofile results
> > > (CPU_CLK_UNHALTED) for both tests.
> >
> > I can't - oprofile does not work on this x86_64 machine:
> >
> 
> Yes, this is a known problem, but you can make it work, as I did.
> 
> Please :)

I can not resist :)

> I used the CVS version of oprofile plus a patch you can find in the mailing 
> list archives. Don't remember exactly, since I hit this some months ago.

Could you provide at least a remote way to find it?

I only found the same problem at 
http://lkml.org/lkml/2006/10/27/3

but without any hints on how to solve the problem.

I will try CVS oprofile, if it works I will provide details of course.

My tree is based on rc1 and has this as its latest commit:
commit b5bf28cde894b3bb3bd25c13a7647020562f9ea0
Author: Linus Torvalds <torvalds@woody.linux-foundation.org>
Date:   Wed Feb 21 11:21:44 2007 -0800

There are no commits after that date with the word 'oprofile' in
git-whatchanged, at least.

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 11:41                                         ` Ingo Molnar
@ 2007-03-01 11:47                                           ` Ingo Molnar
  2007-03-01 12:10                                             ` Evgeniy Polyakov
  2007-03-01 12:01                                           ` Evgeniy Polyakov
  1 sibling, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-03-01 11:47 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Pavel Machek, Theodore Tso, Linus Torvalds, Ulrich Drepper,
	linux-kernel, Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	Alan Cox, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner


* Ingo Molnar <mingo@elte.hu> wrote:

> > I also changed the client socket to nonblocking mode, with the same 
> > result in the epoll server. If you find it broken, please send me a 
> > corrected version to test too.
> 
> this line in evserver_kevent.c looks a bit fishy:

this one in evserver_kevent.c looks problematic too:

        shutdown(s, SHUT_RDWR);
        close(s);

as evserver_epoll.c only does:

        close(s);

again, there might be TCP control flow differences due to this. [ Or the 
removal of this shutdown() call might be a small speedup for the kevent 
case ;) ]

Also, the order of fd and socket close() is different in the two cases. 
It shouldn't make any difference - but that too just makes the results 
harder to trust. Would it be so hard to introduce a single 
handle_web_request() function that is exactly the same in the two tests? 
All the queueing details (which are of course different in the epoll and 
the kevent case) should be in the client function, which calls 
handle_web_request().

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 11:41                                         ` Ingo Molnar
  2007-03-01 11:47                                           ` Ingo Molnar
@ 2007-03-01 12:01                                           ` Evgeniy Polyakov
  1 sibling, 0 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-03-01 12:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pavel Machek, Theodore Tso, Linus Torvalds, Ulrich Drepper,
	linux-kernel, Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	Alan Cox, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

On Thu, Mar 01, 2007 at 12:41:37PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> 
> > I also changed the client socket to nonblocking mode, with the same 
> > result in the epoll server. If you find it broken, please send me a 
> > corrected version to test too.
> 
> this line in evserver_kevent.c looks a bit fishy:
> 
>         err = recv(s, buf, 100, 0);
> 
> because on the evserver_epoll.c side the following is done:
> 
>         err = recv(s, buf, 4096, 0);
> 
> now, for 'ab', the request size is 76 bytes, so it should fit fine 
> functionality-wise. But the TCP stack might decide differently about 
> returning a partial packet, depending on how much data is requested. I 
> don't know whether it actually makes a difference in the TCP flow 
> decisions, and whether it makes a performance difference in your test, 
> but the safest would be to use 4096 in both cases.

Well, that would be quite strange - as far as I know the Linux network
stack (for which kevent was originally created, to support network AIO),
there should not be any difference.

Anyway, I've rerun the test with the same values:

# ab -c8000 -n80000 http://192.168.0.48/
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.0.48 (be patient)
Completed 8000 requests
Completed 16000 requests
Completed 24000 requests
Completed 32000 requests
Completed 40000 requests
Completed 48000 requests
Completed 56000 requests
Completed 64000 requests
Completed 72000 requests
Finished 80000 requests


Server Software:        Apache/1.3.27
Server Hostname:        192.168.0.48
Server Port:            80

Document Path:          /
Document Length:        3521 bytes

Concurrency Level:      8000
Time taken for tests:   18.398381 seconds
Complete requests:      80000
Failed requests:        0
Write errors:           0
Total transferred:      338738048 bytes
HTML transferred:       308031164 bytes
Requests per second:    4348.21 [#/sec] (mean)
Time per request:       1839.838 [ms] (mean)
Time per request:       0.230 [ms] (mean, across all concurrent requests)
Transfer rate:          17979.73 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      148  795 196.9    808    3599
Processing:   824  882  39.7    878     986
Waiting:       59  426 212.6    423     914
Total:       1073 1678 200.8   1673    4579

Percentage of the requests served within a certain time (ms)
50%   1673
66%   1674
75%   1678
80%   1686
90%   1852
95%   1861
98%   1864
99%   1865
100%   4579 (longest request)

Essentially the same result (within the limits of some measurement noise).

> in general, please make sure the exact same system calls are done in the 
> client function. (except of course for the event queueing syscalls 
> themselves)

Yes, that should be done of course.
I even plan to build the same binary for both, but I also plan to turn
on a kevent optimization (mainly readiness-on-submit: when the requested
event (recv/send/anything) is ready immediately, kevent can return that
event from the submission syscall itself, without the additional
overhead of reading it from the ring or queue).

> 	Ingo

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 11:47                                           ` Ingo Molnar
@ 2007-03-01 12:10                                             ` Evgeniy Polyakov
  2007-03-01 12:43                                               ` Ingo Molnar
  0 siblings, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-03-01 12:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pavel Machek, Theodore Tso, Linus Torvalds, Ulrich Drepper,
	linux-kernel, Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	Alan Cox, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

On Thu, Mar 01, 2007 at 12:47:35PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > > I also changed the client socket to nonblocking mode, with the same 
> > > result in the epoll server. If you find it broken, please send me a 
> > > corrected version to test too.
> > 
> > this line in evserver_kevent.c looks a bit fishy:
> 
> this one in evserver_kevent.c looks problematic too:
> 
>         shutdown(s, SHUT_RDWR);
>         close(s);
> 
> as evserver_epoll.c only does:
> 
>         close(s);
> 
> again, there might be TCP control flow differences due to this. [ Or the 
> removal of this shutdown() call might be a small speedup for the kevent 
> case ;) ]

:)

> Also, the order of fd and socket close() is different in the two cases. 
> It shouldn't make any difference - but that too just makes the results 
> harder to trust. Would it be so hard to introduce a single 
> handle_web_request() function that is exactly the same in the two tests? 
> All the queueing details (which are of course different in the epoll and 
> the kevent case) should be in the client function, which calls 
> handle_web_request().

I've removed shutdown - things are the same.

Sometimes kevent performance drops to lower numbers, and its graph of
times needed to handle events has high plateaus (with and without
shutdown - it was always there), like this:

Percentage of the requests served within a certain time (ms)
50%    128
66%    486
75%    505
80%    507
90%    732
95%   3087	// something is wrong at this point
98%   9058
99%   9072
100%  15562 (longest request)

it is possible that there are some other bugs in the server though,
which prevent sockets from being quickly closed and thus increase its
processing time - I do not know the root cause of that for sure.

I separated the epoll and kevent servers, since originally the kevent 
server included additional kevent features, but then new ones were added 
and I slowly moved it closer to the epoll case.

The current version of the server was a pre-test one for the lighttpd
patches, so it should essentially match the epoll one except for minor
details.

> 	Ingo

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 11:28                                         ` Eric Dumazet
  2007-03-01 11:47                                           ` Evgeniy Polyakov
@ 2007-03-01 12:19                                           ` Evgeniy Polyakov
  1 sibling, 0 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-03-01 12:19 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ingo Molnar, Pavel Machek, Theodore Tso, Linus Torvalds,
	Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

On Thu, Mar 01, 2007 at 12:28:00PM +0100, Eric Dumazet (dada1@cosmosbay.com) wrote:
> I used the CVS version of oprofile plus a patch you can find in the mailing 
> list archives. Don't remember exactly where, since I hit this some months ago

Ugh, I started - but CVS compilation requires about 40 MB of additional
libs (according to Debian testing dependencies on my very light
installation), so with my miserable 1-1.6 KB/sec, do not expect it today :)

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 10:59                                   ` Evgeniy Polyakov
  2007-03-01 11:00                                     ` Ingo Molnar
  2007-03-01 11:14                                     ` Eric Dumazet
@ 2007-03-01 12:34                                     ` Ingo Molnar
  2007-03-01 13:26                                       ` Evgeniy Polyakov
  2007-03-01 16:56                                     ` David Lang
  3 siblings, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-03-01 12:34 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Pavel Machek, Theodore Tso, Linus Torvalds, Ulrich Drepper,
	linux-kernel, Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	Alan Cox, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner


* Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> Document Length:        3521 bytes

> Concurrency Level:      8000
> Time taken for tests:   16.686737 seconds
> Complete requests:      80000
> Failed requests:        0
> Write errors:           0
> Total transferred:      309760000 bytes
> HTML transferred:       281680000 bytes
> Requests per second:    4794.23 [#/sec] (mean)

> Concurrency Level:      8000
> Time taken for tests:   12.366775 seconds
> Complete requests:      80000
> Failed requests:        0
> Write errors:           0
> Total transferred:      317047104 bytes
> HTML transferred:       288306522 bytes
> Requests per second:    6468.95 [#/sec] (mean)

i'm wondering - how can the 'Total transferred' and 'HTML transferred' 
numbers be different between the two runs?

Since document length is 3521, and the number of requests is 80000, the 
correct 'HTML transferred' is 281680000 - which is the epoll result. The 
kevent result shows more bytes transferred, which suggests that the 
kevent loop is probably incorrect somewhere.
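
The expected figure is simple arithmetic; a minimal standalone check 
(not part of either test program) makes the claim concrete:

	#include <stdio.h>

	int main(void)
	{
		long requests = 80000;	/* ab -n80000 */
		long doc_len  = 3521;	/* 'Document Length' reported by ab */

		/* every completed request should carry exactly one document copy */
		printf("expected HTML transferred: %ld\n", requests * doc_len);
		return 0;	/* prints 281680000 - matching the epoll run */
	}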

this might be some benign thing, but the /first/ thing you /have to/ do 
before claiming that 'kevent is 25% faster than epoll' is to make sure 
the results are totally reliable.

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 12:10                                             ` Evgeniy Polyakov
@ 2007-03-01 12:43                                               ` Ingo Molnar
  2007-03-01 13:01                                                 ` Evgeniy Polyakov
  0 siblings, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-03-01 12:43 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Pavel Machek, Theodore Tso, Linus Torvalds, Ulrich Drepper,
	linux-kernel, Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	Alan Cox, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner


* Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> I separated the epoll and kevent servers, since originally the kevent 
> server included additional kevent features, but then new ones were added 
> and I slowly moved it closer to the epoll case.

i don't care whether they are separate or not - but you have not replied 
to the request that there be a handle_web_request() function in /both/ 
files, which is precisely the same function. I didn't ask you to merge 
the two files - i only asked for the two web handling functions to be 
one and the same function.

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 12:43                                               ` Ingo Molnar
@ 2007-03-01 13:01                                                 ` Evgeniy Polyakov
  2007-03-01 13:11                                                   ` Ingo Molnar
  0 siblings, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-03-01 13:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pavel Machek, Theodore Tso, Linus Torvalds, Ulrich Drepper,
	linux-kernel, Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	Alan Cox, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

On Thu, Mar 01, 2007 at 01:43:36PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> 
> > I separated the epoll and kevent servers, since originally the kevent 
> > server included additional kevent features, but then new ones were added 
> > and I slowly moved it closer to the epoll case.
> 
> i don't care whether they are separate or not - but you have not replied 
> to the request that there be a handle_web_request() function in /both/ 
> files, which is precisely the same function. I didn't ask you to merge 
> the two files - i only asked for the two web handling functions to be 
> one and the same function.

They are not the same in general - if a kevent is ready immediately, it 
is not removed from the kevent tree; but the current kevent server always 
takes the not-immediate path for the lighttpd tests - so the functions 
are the same:
open()
sendfile()
cork_off
close(fd)
close(s)
remove_event_from_the_kernel

with the same parameters.

> 	Ingo

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 13:01                                                 ` Evgeniy Polyakov
@ 2007-03-01 13:11                                                   ` Ingo Molnar
  2007-03-01 13:30                                                     ` Evgeniy Polyakov
  0 siblings, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-03-01 13:11 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Pavel Machek, Theodore Tso, Linus Torvalds, Ulrich Drepper,
	linux-kernel, Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	Alan Cox, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner


* Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> > i don't care whether they are separate or not - but you have not 
> > replied to the request that there be a handle_web_request() function 
> > in /both/ files, which is precisely the same function. I didn't ask 
> > you to merge the two files - i only asked for the two web handling 
> > functions to be one and the same function.
> 
> They are not the same in general - if a kevent is ready immediately, it 
> is not removed from the kevent tree; but the current kevent server always 
> takes the not-immediate path for the lighttpd tests - so the functions 
> are the same:
> open()
> sendfile()
> cork_off
> close(fd)
> close(s)
> remove_event_from_the_kernel
> 
> with the same parameters.

you /STILL/ don't understand. I'm only talking about evserver_epoll.c and 
evserver_kevent.c. Not about lighttpd. Not about historic reasons. I 
simply suggested a common-sense change:

| | Would it be so hard to introduce a single handle_web_request() 
| | function that is exactly the same in the two tests? All the queueing 
| | details (which are of course different in the epoll and the kevent 
| | case) should be in the client function, which calls 
| | handle_web_request().

i.e. put remove_event_from_the_kernel() (kweb_kevent_remove() and 
evtest_remove()) into a SEPARATE client function, which calls the 
/common/ handle_web_request(sock) function. You can do the 
immediate-removal in that separate, kevent-specific client function - 
but the socket function, handle_web_request(sock), should be /perfectly 
identical/ in the two files.

I.e.:

static inline int handle_web_request(int s)
{
	int err, fd, on = 0;
	off_t offset = 0;
	int count = 40960;
	char path[] = "/tmp/index.html";
	char buf[4096];
		
	err = recv(s, buf, sizeof(buf), 0); /* same request size in both servers */
	if (err <= 0)
		return err;

	fd = open(path, O_RDONLY);
	if (fd == -1)
		return fd;

	err = sendfile(s, fd, &offset, count);
	if (err < 0) {
		ulog_err("Failed send %d bytes: fd=%d.\n", count, s);
		close(fd);
		return err;
	}

	setsockopt(s, SOL_TCP, TCP_CORK, &on, sizeof(on)); /* on == 0: uncork, flush the response */
	close(fd);
	close(s); /* No keepalive */

	return 0;
}

And in evserver_epoll.c do this:

static int evtest_callback_client(int s)
{
	int err = handle_web_request(s);
	if (err)
		evtest_remove(s);
	return err;
}

and in evserver_kevent.c do this:

static int kweb_callback_client(struct ukevent *e, int im)
{
	int err = handle_web_request(e->id.raw[0]);
	if (err || !im)
		kweb_kevent_remove(e);
	return err;
}

ok?

Btw., am i correct that in this particular 'ab' test, the 'immediately' 
flag is always zero, i.e. kweb_kevent_remove() is always called?

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 11:47                                           ` Evgeniy Polyakov
@ 2007-03-01 13:12                                             ` Eric Dumazet
  2007-03-01 14:43                                               ` Evgeniy Polyakov
  0 siblings, 1 reply; 337+ messages in thread
From: Eric Dumazet @ 2007-03-01 13:12 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ingo Molnar, Pavel Machek, Theodore Tso, Linus Torvalds,
	Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 456 bytes --]

On Thursday 01 March 2007 12:47, Evgeniy Polyakov wrote:
>
> Could you provide at least a rough pointer to find it?
>

Sure :)

> I only found the same problem at
> http://lkml.org/lkml/2006/10/27/3
>
> but without any hints on how to solve it.
>
> I will try CVS oprofile; if it works, I will provide details of course.
>

# cat CVS/Root
CVS/Root::pserver:anonymous@oprofile.cvs.sourceforge.net:/cvsroot/oprofile

# cvs diff >/tmp/oprofile.diff

Hope it helps

[-- Attachment #2: oprofile.diff --]
[-- Type: text/x-diff, Size: 1663 bytes --]

Index: libop/op_alloc_counter.c
===================================================================
RCS file: /cvsroot/oprofile/oprofile/libop/op_alloc_counter.c,v
retrieving revision 1.8
diff -r1.8 op_alloc_counter.c
14a15,16
> #include <ctype.h>
> #include <dirent.h>
133c135
< 			return 0;
---
> 			continue;
145a148,183
> /* determine which directories are counter directories
>  */
> static int perfcounterdir(const struct dirent * entry)
> {
> 	return (isdigit(entry->d_name[0]));
> }
> 
> 
> /**
>  * @param mask pointer where to place bit mask of unavailable counters
>  *
>  * return >= 0 number of counters that are available
>  *        < 0  could not determine number of counters
>  *
>  */
> static int op_get_counter_mask(u32 * mask)
> {
> 	struct dirent **counterlist;
> 	int count, i;
> 	/* assume nothing is available */
> 	u32 available=0;
> 
> 	count = scandir("/dev/oprofile", &counterlist, perfcounterdir,
> 			alphasort);
> 	if (count < 0)
> 		/* unable to determine bit mask */
> 		return -1;
> 	/* convert to bit map (0 where counter exists) */
> 	for (i=0; i<count; ++i) {
> 		available |= 1 << atoi(counterlist[i]->d_name);
> 		free(counterlist[i]);
> 	}
> 	*mask=~available;
> 	free(counterlist);
> 	return count;
> }
152a191
> 	u32 unavailable_counters = 0;
154c193,195
< 	nr_counters = op_get_nr_counters(cpu_type);
---
> 	nr_counters = op_get_counter_mask(&unavailable_counters);
> 	if (nr_counters < 0) 
> 		nr_counters = op_get_nr_counters(cpu_type);
162c203,204
< 	if (!allocate_counter(ctr_arc, nr_events, 0, 0, counter_map)) {
---
> 	if (!allocate_counter(ctr_arc, nr_events, 0, unavailable_counters,
> 			      counter_map)) {

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 12:34                                     ` Ingo Molnar
@ 2007-03-01 13:26                                       ` Evgeniy Polyakov
  2007-03-01 13:32                                         ` Ingo Molnar
  0 siblings, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-03-01 13:26 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pavel Machek, Theodore Tso, Linus Torvalds, Ulrich Drepper,
	linux-kernel, Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	Alan Cox, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

On Thu, Mar 01, 2007 at 01:34:23PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> 
> > Document Length:        3521 bytes
> 
> > Concurrency Level:      8000
> > Time taken for tests:   16.686737 seconds
> > Complete requests:      80000
> > Failed requests:        0
> > Write errors:           0
> > Total transferred:      309760000 bytes
> > HTML transferred:       281680000 bytes
> > Requests per second:    4794.23 [#/sec] (mean)
> 
> > Concurrency Level:      8000
> > Time taken for tests:   12.366775 seconds
> > Complete requests:      80000
> > Failed requests:        0
> > Write errors:           0
> > Total transferred:      317047104 bytes
> > HTML transferred:       288306522 bytes
> > Requests per second:    6468.95 [#/sec] (mean)
> 
> i'm wondering - how can the 'Total transferred' and 'HTML transferred' 
> numbers be different between the two runs?
>
> Since document length is 3521, and the number of requests is 80000, the 
> correct 'HTML transferred' is 281680000 - which is the epoll result. The 
> kevent result shows more bytes transferred, which suggests that the 
> kevent loop is probably incorrect somewhere.
> 
> this might be some benign thing, but the /first/ thing you /have to/ do 
> before claiming that 'kevent is 25% faster than epoll' is to make sure 
> the results are totally reliable.

Kevent sent an additional 525 pages ((311792800-309760000)/3872) - that is 
why the number for kevent is higher - it uses an edge-triggered handler 
(which you asked to remove from epoll), which can produce false positives; 
for an exact result in that case, ret_data (where the poll flags were 
stored) must be checked first. 'ab' does not count the additional data as 
new requests and does not include it in 'requests per second'.
Even if it could do so, an additional 500 requests cannot produce a 35%
higher rate.

For example, the lighttpd results are the same for kevent and epoll, and
the 'Total transferred' and 'HTML transferred' numbers change between
runs for both epoll and kevent.

> 	Ingo

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 13:11                                                   ` Ingo Molnar
@ 2007-03-01 13:30                                                     ` Evgeniy Polyakov
  2007-03-01 14:19                                                       ` Eric Dumazet
  0 siblings, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-03-01 13:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pavel Machek, Theodore Tso, Linus Torvalds, Ulrich Drepper,
	linux-kernel, Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	Alan Cox, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

On Thu, Mar 01, 2007 at 02:11:18PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> ok?

I understood you a couple of mails ago.
No problem, I can put the processing into the same function called from
the different servers :)

> Btw., am i correct that in this particular 'ab' test, the 'immediately' 
> flag is always zero, i.e. kweb_kevent_remove() is always called?

Yes.

> 	Ingo

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 13:26                                       ` Evgeniy Polyakov
@ 2007-03-01 13:32                                         ` Ingo Molnar
  2007-03-01 14:24                                           ` Evgeniy Polyakov
  0 siblings, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-03-01 13:32 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Pavel Machek, Theodore Tso, Linus Torvalds, Ulrich Drepper,
	linux-kernel, Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	Alan Cox, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner


* Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> [...] that is why the number for kevent is higher - it uses an edge-triggered 
> handler (which you asked to remove from epoll), [...]

no - i did not 'ask' it to be removed from epoll, i only pointed out 
that with edge-triggered the results were highly unreliable here and 
that with level-triggered it worked better. Just to make sure: if you 
put back edge-triggered into evserver_epoll.c, do you get the same 
numbers, and is CPU utilization still the same 98-100%?
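
For reference, the difference under discussion is a single flag at epoll 
registration time; a minimal sketch (the epfd and socket setup are 
assumed):

	#include <sys/epoll.h>

	/* register a socket either level-triggered (the default) or edge-triggered */
	static int add_socket(int epfd, int s, int edge_triggered)
	{
		struct epoll_event ev = { .events = EPOLLIN, .data.fd = s };

		if (edge_triggered)
			ev.events |= EPOLLET;	/* report only readiness transitions */

		return epoll_ctl(epfd, EPOLL_CTL_ADD, s, &ev);
	}

With EPOLLET the handler must keep reading until recv() returns EAGAIN, 
otherwise a socket can stall with unread data - one way edge-triggered 
runs can become unreliable.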

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 14:19                                                       ` Eric Dumazet
@ 2007-03-01 14:16                                                         ` Ingo Molnar
  2007-03-01 14:31                                                           ` Eric Dumazet
  2007-03-01 14:54                                                           ` Evgeniy Polyakov
  0 siblings, 2 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-03-01 14:16 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Evgeniy Polyakov, Pavel Machek, Theodore Tso, Linus Torvalds,
	Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner


* Eric Dumazet <dada1@cosmosbay.com> wrote:

> I can tell you that the problem (at least on my machine) comes from :
> 
> gettimeofday(&tm, NULL);
> 
> in evserver_epoll.c

yeah, that's another difference - especially if it's something like an 
Athlon64 and gettimeofday falls back to pm-timer, that could explain the 
performance difference. That's why i repeatedly asked Evgeniy to use the 
/very same/ client function for both the epoll and the kevent test and 
redo the measurements. The numbers are still highly suspect - and we are 
already down from the prior claim of kevent being almost twice as fast 
to a 25% difference.

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 13:30                                                     ` Evgeniy Polyakov
@ 2007-03-01 14:19                                                       ` Eric Dumazet
  2007-03-01 14:16                                                         ` Ingo Molnar
  0 siblings, 1 reply; 337+ messages in thread
From: Eric Dumazet @ 2007-03-01 14:19 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ingo Molnar, Pavel Machek, Theodore Tso, Linus Torvalds,
	Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

On Thursday 01 March 2007 14:30, Evgeniy Polyakov wrote:
> On Thu, Mar 01, 2007 at 02:11:18PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> > ok?
>
> I understood you a couple of mails ago.
> No problem, I can put the processing into the same function called from
> the different servers :)
>
> > Btw., am i correct that in this particular 'ab' test, the 'immediately'
> > flag is always zero, i.e. kweb_kevent_remove() is always called?
>
> Yes.
>
> > 	Ingo

I can tell you that the problem (at least on my machine) comes from :

gettimeofday(&tm, NULL);

in evserver_epoll.c
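
If that call is indeed the hot spot, one common mitigation (a sketch 
only; handle_event() is a hypothetical stand-in for the per-socket 
handler) is to sample the clock once per epoll_wait() batch instead of 
once per request:

	#include <sys/time.h>
	#include <sys/epoll.h>

	static struct timeval cached_tm;	/* shared by every event in a batch */

	static void handle_event(struct epoll_event *ev)
	{
		/* hypothetical per-socket handler; it would read cached_tm
		 * instead of calling gettimeofday() itself */
		(void)ev;
	}

	static void event_loop(int epfd)
	{
		struct epoll_event events[256];
		int i, n;

		for (;;) {
			n = epoll_wait(epfd, events, 256, -1);
			if (n <= 0)
				continue;

			gettimeofday(&cached_tm, NULL);	/* one call per batch */
			for (i = 0; i < n; i++)
				handle_event(&events[i]);
		}
	}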



^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 13:32                                         ` Ingo Molnar
@ 2007-03-01 14:24                                           ` Evgeniy Polyakov
  0 siblings, 0 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-03-01 14:24 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pavel Machek, Theodore Tso, Linus Torvalds, Ulrich Drepper,
	linux-kernel, Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	Alan Cox, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

On Thu, Mar 01, 2007 at 02:32:42PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> 
> > [...] that is why the number for kevent is higher - it uses an edge-triggered 
> > handler (which you asked to remove from epoll), [...]
> 
> no - i did not 'ask' it to be removed from epoll, i only pointed out 
> that with edge-triggered the results were highly unreliable here and 
> that with level-triggered it worked better. Just to make sure: if you 
> put back edge-triggered into evserver_epoll.c, do you get the same 
> numbers, and is CPU utilization still the same 98-100%?

No.
_Now_ it is about 1500-2000 req/sec with 10-20% CPU utilization. 
The 'Total transferred' and 'HTML transferred' numbers do not equal
80000 multiplied by the size of the page.

Those are strange tests actually - I managed to get 9000 requests per
second from the epoll server (only once!) and 8900 from kevent (only
twice); sometimes they both drop down to 2300-2700 req/s.

> 	Ingo

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 14:31                                                           ` Eric Dumazet
@ 2007-03-01 14:27                                                             ` Ingo Molnar
  0 siblings, 0 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-03-01 14:27 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Evgeniy Polyakov, Pavel Machek, Theodore Tso, Linus Torvalds,
	Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner


* Eric Dumazet <dada1@cosmosbay.com> wrote:

> On my machines (again ...), ab is the slow thing... not the 'server'

Evgeniy said that both in the epoll and the kevent case the server side 
CPU was 98%-100% busy - so inefficiencies on the client side do not 
matter that much - the server is saturated. It's that "kevent is 25% 
faster than epoll" claim that i'm probing mainly.

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 14:16                                                         ` Ingo Molnar
@ 2007-03-01 14:31                                                           ` Eric Dumazet
  2007-03-01 14:27                                                             ` Ingo Molnar
  2007-03-01 14:54                                                           ` Evgeniy Polyakov
  1 sibling, 1 reply; 337+ messages in thread
From: Eric Dumazet @ 2007-03-01 14:31 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Evgeniy Polyakov, Pavel Machek, Theodore Tso, Linus Torvalds,
	Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

On Thursday 01 March 2007 15:16, Ingo Molnar wrote:
> * Eric Dumazet <dada1@cosmosbay.com> wrote:
> > I can tell you that the problem (at least on my machine) comes from :
> >
> > gettimeofday(&tm, NULL);
> >
> > in evserver_epoll.c
>
> yeah, that's another difference - especially if it's something like an
> Athlon64 and gettimeofday falls back to pm-timer, that could explain the
> performance difference. That's why i repeatedly asked Evgeniy to use the
> /very same/ client function for both the epoll and the kevent test and
> redo the measurements. The numbers are still highly suspect - and we are
> already down from the prior claim of kevent being almost twice as fast
> to a 25% difference.

Also, ab is quite lame... Maybe we could use an epoll-based 'stresser'.

On my machines (again ...), ab is the slow thing... not the 'server'

Some small differences in behavior could have a big impact on ab (and you 
could think there is a problem on the remote side).


^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 13:12                                             ` Eric Dumazet
@ 2007-03-01 14:43                                               ` Evgeniy Polyakov
  2007-03-01 14:47                                                 ` Ingo Molnar
  2007-03-01 14:57                                                 ` Evgeniy Polyakov
  0 siblings, 2 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-03-01 14:43 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ingo Molnar, Pavel Machek, Theodore Tso, Linus Torvalds,
	Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 890 bytes --]

On Thu, Mar 01, 2007 at 02:12:50PM +0100, Eric Dumazet (dada1@cosmosbay.com) wrote:
> On Thursday 01 March 2007 12:47, Evgeniy Polyakov wrote:
> >
> > Could you provide at least remote way to find it?
> >
> 
> Sure :)
> 
> > I only found the same problem at
> > http://lkml.org/lkml/2006/10/27/3
> >
> > but without any hits to solve the problem.
> >
> > I will try CVS oprofile, if it works I will provide details of course.
> >
> 
> # cat CVS/Root
> CVS/Root::pserver:anonymous@oprofile.cvs.sourceforge.net:/cvsroot/oprofile
> 
> # cvs diff >/tmp/oprofile.diff
> 
> Hope it helps

One of the hunks failed, since it was in CVS already.
After setting up some mirrors, I've installed all the bits needed for
oprofile.
Attached are the kevent and epoll profiles.

In those tests I got epoll performance of about 4400 req/s; kevent was 
about 5300.

The epoll server does not contain the gettimeofday() call.

-- 
	Evgeniy Polyakov

[-- Attachment #2: profile.epoll --]
[-- Type: text/plain, Size: 14192 bytes --]

CPU: AMD64 processors, speed 2210.08 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        symbol name
195750   67.3097  cpu_idle
14111     4.8521  enter_idle
4979      1.7121  IRQ0x51_interrupt
4765      1.6385  tcp_v4_rcv
3316      1.1402  tcp_ack
3138      1.0790  kmem_cache_free
2619      0.9006  kfree
2323      0.7988  memset_c
1747      0.6007  schedule
1682      0.5784  csum_partial_copy_generic
1646      0.5660  ip_output
1563      0.5374  tcp_rcv_state_process
1506      0.5178  dev_queue_xmit
1412      0.4855  handle_IRQ_event
1266      0.4353  ip_rcv
1026      0.3528  ip_queue_xmit
1004      0.3452  __do_softirq
1001      0.3442  ip_local_deliver
906       0.3115  csum_partial
902       0.3102  ip_route_input
889       0.3057  __d_lookup
880       0.3026  kmem_cache_alloc
847       0.2912  tcp_v4_do_rcv
841       0.2892  netif_receive_skb
830       0.2854  tcp_sendmsg
819       0.2816  system_call
788       0.2710  kfree_skbmem
780       0.2682  tcp_transmit_skb
742       0.2551  preempt_schedule
731       0.2514  __tcp_push_pending_frames
699       0.2404  __link_path_walk
687       0.2362  pfifo_fast_dequeue
672       0.2311  local_bh_enable
657       0.2259  __alloc_skb
650       0.2235  net_rx_action
627       0.2156  pfifo_fast_enqueue
583       0.2005  sock_wfree
571       0.1963  get_unused_fd
547       0.1881  tcp_parse_options
546       0.1877  copy_user_generic_string
533       0.1833  _atomic_dec_and_lock
529       0.1819  number
524       0.1802  ret_from_intr
509       0.1750  skb_clone
507       0.1743  fget
507       0.1743  tcp_rcv_established
498       0.1712  __kfree_skb
492       0.1692  tcp_poll
481       0.1654  rt_hash_code
471       0.1620  dput
454       0.1561  sock_def_readable
422       0.1451  mask_and_ack_8259A
421       0.1448  sysret_check
419       0.1441  __fput
413       0.1420  exit_idle
396       0.1362  ip_append_data
374       0.1286  sock_poll
371       0.1276  tcp_data_queue
359       0.1234  __tcp_select_window
356       0.1224  tcp_recvmsg
348       0.1197  lock_timer_base
340       0.1169  cache_alloc_refill
338       0.1162  thread_return
319       0.1097  sys_close
318       0.1093  do_path_lookup
318       0.1093  ret_from_sys_call
311       0.1069  vsnprintf
307       0.1056  eth_type_trans
303       0.1042  find_next_zero_bit
302       0.1038  __mod_timer
298       0.1025  d_alloc
296       0.1018  rb_erase
293       0.1007  call_softirq
290       0.0997  __dentry_open
276       0.0949  cache_grow
274       0.0942  __ip_route_output_key
273       0.0939  try_to_wake_up
258       0.0887  dentry_iput
258       0.0887  sk_stream_mem_schedule
257       0.0884  do_lookup
244       0.0839  strncpy_from_user
234       0.0805  do_generic_mapping_read
231       0.0794  memcmp
229       0.0787  tcp_current_mss
228       0.0784  tcp_rtt_estimator
214       0.0736  restore_args
205       0.0705  sys_recvfrom
204       0.0701  fput
203       0.0698  tcp_send_fin
200       0.0688  release_sock
193       0.0664  memcpy_c
191       0.0657  common_interrupt
189       0.0650  fget_light
185       0.0636  skb_checksum
182       0.0626  generic_drop_inode
180       0.0619  do_sys_open
174       0.0598  get_page_from_freelist
168       0.0578  link_path_walk
165       0.0567  schedule_timeout
163       0.0560  del_timer
162       0.0557  rb_insert_color
160       0.0550  percpu_counter_mod
159       0.0547  __up_read
155       0.0533  expand_files
154       0.0530  sys_fcntl
150       0.0516  tcp_v4_send_check
146       0.0502  fd_install
145       0.0499  bictcp_cong_avoid
143       0.0492  call_rcu
141       0.0485  __down_read
141       0.0485  sock_close
140       0.0481  copy_page_c
138       0.0475  __skb_checksum_complete
138       0.0475  lookup_mnt
137       0.0471  getname
132       0.0454  generic_permission
131       0.0450  find_get_page
130       0.0447  __do_page_cache_readahead
130       0.0447  update_send_head
127       0.0437  get_empty_filp
126       0.0433  __path_lookup_intent_open
124       0.0426  mod_timer
122       0.0420  half_md4_transform
121       0.0416  page_fault
121       0.0416  tcp_sync_mss
120       0.0413  __wake_up
120       0.0413  current_fs_time
119       0.0409  remove_wait_queue
118       0.0406  groups_search
116       0.0399  __handle_mm_fault
115       0.0395  tcp_send_ack
114       0.0392  get_task_mm
109       0.0375  tcp_snd_test
107       0.0368  new_inode
106       0.0364  sock_release
106       0.0364  tcp_init_tso_segs
104       0.0358  inotify_dentry_parent_queue_event
104       0.0358  try_to_del_timer_sync
102       0.0351  add_wait_queue
99        0.0340  cond_resched
97        0.0334  __follow_mount
96        0.0330  __put_unused_fd
95        0.0327  open_namei
94        0.0323  file_move
93        0.0320  clear_inode
93        0.0320  permission
92        0.0316  rw_verify_area
91        0.0313  may_open
91        0.0313  memmove
86        0.0296  __up_write
86        0.0296  dnotify_flush
86        0.0296  tcp_select_initial_window
84        0.0289  sock_common_recvmsg
83        0.0285  __wake_up_bit
83        0.0285  page_cache_readahead
82        0.0282  IRQ0x20_interrupt
81        0.0279  put_page
77        0.0265  inet_sendmsg
76        0.0261  skb_copy_datagram_iovec
76        0.0261  tcp_init_cwnd
75        0.0258  sock_recvmsg
75        0.0258  tcp_event_data_recv
74        0.0254  inet_sk_rebuild_header
74        0.0254  locks_remove_posix
73        0.0251  d_instantiate
68        0.0234  sock_sendmsg
66        0.0227  __rb_rotate_right
65        0.0224  __down_write_nested
65        0.0224  retint_kernel
64        0.0220  alloc_inode
64        0.0220  clear_page_c
64        0.0220  init_timer
63        0.0217  mntput_no_expire
63        0.0217  sk_reset_timer
63        0.0217  tcp_setsockopt
61        0.0210  tcp_check_space
61        0.0210  unmap_vmas
60        0.0206  filp_close
60        0.0206  tcp_v4_tw_remember_stamp
59        0.0203  tcp_rcv_space_adjust
56        0.0193  inotify_inode_queue_event
54        0.0186  __delay
54        0.0186  touch_atime
53        0.0182  sk_stream_rfree
53        0.0182  sys_open
53        0.0182  tcp_cwnd_validate
50        0.0172  copy_to_user
50        0.0172  file_kill
49        0.0168  generic_file_open
46        0.0158  tcp_cong_avoid
44        0.0151  do_filp_open
44        0.0151  inet_sock_destruct
43        0.0148  __rb_rotate_left
43        0.0148  __tcp_ack_snd_check
43        0.0148  inode_init_once
43        0.0148  sprintf
42        0.0144  exit_intr
42        0.0144  sock_fasync
42        0.0144  wake_up_inode
41        0.0141  memset
39        0.0134  __alloc_pages
39        0.0134  free_hot_cold_page
39        0.0134  vfs_permission
38        0.0131  bit_waitqueue
38        0.0131  do_page_fault
38        0.0131  locks_remove_flock
38        0.0131  tcp_unhash
36        0.0124  del_timer_sync
36        0.0124  iput
36        0.0124  iret_label
35        0.0120  file_free_rcu
35        0.0120  file_ra_state_init
35        0.0120  tcp_v4_destroy_sock
34        0.0117  sk_stop_timer
33        0.0113  __lookup_mnt
33        0.0113  inet_getname
33        0.0113  zone_watermark_ok
30        0.0103  copy_page_range
29        0.0100  blockable_page_cache_readahead
29        0.0100  do_wp_page
29        0.0100  hrtimer_run_queues
29        0.0100  sk_alloc
26        0.0089  _spin_lock_bh
26        0.0089  in_group_p
25        0.0086  __put_user_8
25        0.0086  __wake_up_locked
25        0.0086  find_vma
25        0.0086  init_once
24        0.0083  __lock_text_start
23        0.0079  destroy_inode
23        0.0079  tcp_slow_start
22        0.0076  mark_page_accessed
21        0.0072  flush_tlb_page
20        0.0069  _read_lock_irqsave
20        0.0069  memcpy_toiovec
18        0.0062  apic_timer_interrupt
18        0.0062  retint_restore_args
18        0.0062  vm_normal_page
14        0.0048  __down_write
14        0.0048  __tcp_checksum_complete_user
14        0.0048  invalidate_inode_buffers
13        0.0045  mutex_lock
11        0.0038  __get_user_4
11        0.0038  error_exit
11        0.0038  wake_up_bit
10        0.0034  __block_write_full_page
10        0.0034  __down_read_trylock
10        0.0034  copy_from_user
9         0.0031  copy_process
9         0.0031  timespec_trunc
7         0.0024  dup_fd
7         0.0024  filemap_nopage
7         0.0024  lru_cache_add_active
7         0.0024  memcpy
7         0.0024  nameidata_to_filp
7         0.0024  unlink_file_vma
6         0.0021  __find_get_block
6         0.0021  __mutex_init
6         0.0021  find_get_pages_tag
6         0.0021  inode_has_buffers
6         0.0021  mmput
6         0.0021  page_remove_rmap
6         0.0021  sys_mprotect
5         0.0017  __mark_inode_dirty
5         0.0017  do_mmap_pgoff
5         0.0017  error_sti
5         0.0017  free_hot_page
5         0.0017  load_elf_binary
5         0.0017  page_waitqueue
5         0.0017  radix_tree_tag_clear
5         0.0017  rcu_start_batch
5         0.0017  retint_swapgs
4         0.0014  _spin_lock_irqsave
4         0.0014  copy_strings
4         0.0014  free_pages
4         0.0014  free_pgd_range
4         0.0014  free_pgtables
4         0.0014  page_add_file_rmap
4         0.0014  run_local_timers
4         0.0014  sys_rt_sigprocmask
4         0.0014  unlock_page
4         0.0014  vma_link
3         0.0010  __getblk
3         0.0010  __pagevec_lru_add_active
3         0.0010  __strnlen_user
3         0.0010  _write_lock_bh
3         0.0010  anon_vma_prepare
3         0.0010  anon_vma_unlink
3         0.0010  bio_alloc_bioset
3         0.0010  dnotify_parent
3         0.0010  do_exit
3         0.0010  do_wait
3         0.0010  exit_mmap
3         0.0010  kthread_should_stop
3         0.0010  prio_tree_insert
3         0.0010  remove_vma
3         0.0010  retint_check
3         0.0010  vma_prio_tree_add
2        6.9e-04  __block_prepare_write
2        6.9e-04  __bread
2        6.9e-04  __end_that_request_first
2        6.9e-04  __find_get_block_slow
2        6.9e-04  __free_pages
2        6.9e-04  __vm_enough_memory
2        6.9e-04  alloc_page_vma
2        6.9e-04  arch_unmap_area
2        6.9e-04  bio_init
2        6.9e-04  blk_recount_segments
2        6.9e-04  block_write_full_page
2        6.9e-04  clear_page_dirty_for_io
2        6.9e-04  do_group_exit
2        6.9e-04  do_munmap
2        6.9e-04  do_notify_resume
2        6.9e-04  find_vma_prepare
2        6.9e-04  find_vma_prev
2        6.9e-04  generic_fillattr
2        6.9e-04  get_index
2        6.9e-04  get_unmapped_area
2        6.9e-04  get_vma_policy
2        6.9e-04  prio_tree_remove
2        6.9e-04  prio_tree_replace
2        6.9e-04  proc_get_inode
2        6.9e-04  proc_lookup
2        6.9e-04  release_pages
2        6.9e-04  show_vfsmnt
2        6.9e-04  strchr
2        6.9e-04  sys_dup2
2        6.9e-04  sys_faccessat
2        6.9e-04  sys_mmap
2        6.9e-04  test_set_page_writeback
2        6.9e-04  unmap_region
2        6.9e-04  vm_acct_memory
2        6.9e-04  vm_stat_account
2        6.9e-04  vma_adjust
2        6.9e-04  vma_merge
2        6.9e-04  vma_prio_tree_remove
2        6.9e-04  zonelist_policy
1        3.4e-04  __brelse
1        3.4e-04  __clear_user
1        3.4e-04  __d_path
1        3.4e-04  __lookup_hash
1        3.4e-04  __make_request
1        3.4e-04  __page_set_anon_rmap
1        3.4e-04  __pagevec_lru_add
1        3.4e-04  __pmd_alloc
1        3.4e-04  __pollwait
1        3.4e-04  __pte_alloc
1        3.4e-04  __pud_alloc
1        3.4e-04  __put_user_4
1        3.4e-04  add_to_page_cache
1        3.4e-04  alloc_pages_current
1        3.4e-04  blk_rq_map_sg
1        3.4e-04  can_share_swap_page
1        3.4e-04  can_vma_merge_after
1        3.4e-04  copy_semundo
1        3.4e-04  cp_new_stat
1        3.4e-04  cpuset_update_task_memory_state
1        3.4e-04  d_delete
1        3.4e-04  d_path
1        3.4e-04  dcache_readdir
1        3.4e-04  default_llseek
1        3.4e-04  do_arch_prctl
1        3.4e-04  do_brk
1        3.4e-04  do_fsync
1        3.4e-04  do_mpage_readpage
1        3.4e-04  do_sigaction
1        3.4e-04  do_sigaltstack
1        3.4e-04  dupfd
1        3.4e-04  elf_map
1        3.4e-04  end_page_writeback
1        3.4e-04  error_entry
1        3.4e-04  exit_sem
1        3.4e-04  fasync_helper
1        3.4e-04  file_permission
1        3.4e-04  file_read_actor
1        3.4e-04  flush_signal_handlers
1        3.4e-04  flush_thread
1        3.4e-04  generic_delete_inode
1        3.4e-04  generic_file_aio_write
1        3.4e-04  generic_file_buffered_write
1        3.4e-04  generic_file_mmap
1        3.4e-04  get_request_wait
1        3.4e-04  get_signal_to_deliver
1        3.4e-04  get_stack
1        3.4e-04  hrtimer_init
1        3.4e-04  kill_fasync
1        3.4e-04  kmem_flagcheck
1        3.4e-04  meminfo_read_proc
1        3.4e-04  mempool_free
1        3.4e-04  memscan
1        3.4e-04  mutex_unlock
1        3.4e-04  notify_change
1        3.4e-04  nr_blockdev_pages
1        3.4e-04  page_add_new_anon_rmap
1        3.4e-04  path_release
1        3.4e-04  pipe_ioctl
1        3.4e-04  pipe_iov_copy_from_user
1        3.4e-04  poll_freewait
1        3.4e-04  proc_file_read
1        3.4e-04  proc_pident_lookup
1        3.4e-04  ptregscall_common
1        3.4e-04  rb_next
1        3.4e-04  recalc_bh_state
1        3.4e-04  release_task
1        3.4e-04  retint_careful
1        3.4e-04  schedule_tail
1        3.4e-04  seq_escape
1        3.4e-04  set_brk
1        3.4e-04  set_normalized_timespec
1        3.4e-04  si_swapinfo
1        3.4e-04  split_vma
1        3.4e-04  stub_execve
1        3.4e-04  sync_sb_inodes
1        3.4e-04  sys_brk
1        3.4e-04  sys_lseek
1        3.4e-04  sys_munmap
1        3.4e-04  sys_newfstat
1        3.4e-04  sys_newstat
1        3.4e-04  sys_select
1        3.4e-04  sys_write
1        3.4e-04  truncate_complete_page
1        3.4e-04  tty_ioctl
1        3.4e-04  udp_rcv
1        3.4e-04  unlock_buffer
1        3.4e-04  vfs_readdir
1        3.4e-04  vma_prio_tree_insert
1        3.4e-04  worker_thread

[-- Attachment #3: profile.kevent --]
[-- Type: text/plain, Size: 12581 bytes --]

CPU: AMD64 processors, speed 2210.08 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        symbol name
124605   59.0887  cpu_idle
13754     6.5223  enter_idle
4677      2.2179  tcp_v4_rcv
3888      1.8437  IRQ0x51_interrupt
3115      1.4772  tcp_ack
2571      1.2192  kmem_cache_free
2151      1.0200  kfree
2121      1.0058  memset_c
1632      0.7739  csum_partial_copy_generic
1611      0.7639  ip_output
1488      0.7056  schedule
1389      0.6587  dev_queue_xmit
1352      0.6411  tcp_rcv_state_process
1272      0.6032  ip_rcv
1184      0.5615  handle_IRQ_event
951       0.4510  ip_queue_xmit
916       0.4344  __do_softirq
900       0.4268  ip_route_input
888       0.4211  csum_partial
870       0.4126  tcp_v4_do_rcv
856       0.4059  ip_local_deliver
851       0.4036  system_call
829       0.3931  netif_receive_skb
812       0.3851  tcp_sendmsg
812       0.3851  tcp_transmit_skb
805       0.3817  __alloc_skb
683       0.3239  kmem_cache_alloc
683       0.3239  local_bh_enable
678       0.3215  __tcp_push_pending_frames
673       0.3191  fget
665       0.3153  __d_lookup
653       0.3097  pfifo_fast_enqueue
639       0.3030  copy_user_generic_string
607       0.2878  preempt_schedule
606       0.2874  net_rx_action
593       0.2812  pfifo_fast_dequeue
589       0.2793  kfree_skbmem
552       0.2618  __fput
550       0.2608  __link_path_walk
528       0.2504  skb_clone
523       0.2480  tcp_parse_options
515       0.2442  ret_from_intr
512       0.2428  number
496       0.2352  _atomic_dec_and_lock
493       0.2338  __kfree_skb
477       0.2262  rt_hash_code
476       0.2257  sock_wfree
459       0.2177  tcp_poll
449       0.2129  tcp_rcv_established
415       0.1968  dput
398       0.1887  sysret_check
366       0.1736  exit_idle
364       0.1726  __mod_timer
352       0.1669  tcp_recvmsg
352       0.1669  thread_return
346       0.1641  sys_close
341       0.1617  get_unused_fd
340       0.1612  rb_erase
337       0.1598  __tcp_select_window
331       0.1570  lock_timer_base
316       0.1498  mask_and_ack_8259A
299       0.1418  ip_append_data
290       0.1375  cache_alloc_refill
286       0.1356  d_alloc
269       0.1276  ret_from_sys_call
268       0.1271  do_path_lookup
267       0.1266  call_softirq
265       0.1257  eth_type_trans
262       0.1242  tcp_current_mss
262       0.1242  tcp_data_queue
262       0.1242  vsnprintf
261       0.1238  __ip_route_output_key
255       0.1209  sk_stream_mem_schedule
249       0.1181  cache_grow
247       0.1171  tcp_rtt_estimator
239       0.1133  __dentry_open
221       0.1048  sys_fcntl
220       0.1043  find_next_zero_bit
216       0.1024  dentry_iput
208       0.0986  try_to_wake_up
197       0.0934  call_rcu
196       0.0929  strncpy_from_user
195       0.0925  do_lookup
191       0.0906  sock_recvmsg
189       0.0896  tcp_send_fin
188       0.0892  sys_recvfrom
182       0.0863  sock_def_readable
178       0.0844  restore_args
176       0.0835  common_interrupt
174       0.0825  release_sock
171       0.0811  do_generic_mapping_read
167       0.0792  schedule_timeout
163       0.0773  generic_drop_inode
163       0.0773  get_page_from_freelist
163       0.0773  percpu_counter_mod
160       0.0759  skb_checksum
159       0.0754  del_timer
155       0.0735  do_sys_open
153       0.0726  memcpy_c
149       0.0707  memcmp
145       0.0688  __skb_checksum_complete
144       0.0683  link_path_walk
144       0.0683  tcp_v4_send_check
141       0.0669  fd_install
140       0.0664  get_empty_filp
139       0.0659  fget_light
131       0.0621  current_fs_time
126       0.0598  mod_timer
125       0.0593  bictcp_cong_avoid
123       0.0583  tcp_init_tso_segs
121       0.0574  update_send_head
120       0.0569  __put_unused_fd
119       0.0564  __do_page_cache_readahead
119       0.0564  alloc_inode
118       0.0560  rb_insert_color
115       0.0545  prepare_to_wait
112       0.0531  locks_remove_posix
112       0.0531  new_inode
111       0.0526  half_md4_transform
109       0.0517  fput
108       0.0512  tcp_sync_mss
105       0.0498  get_task_mm
104       0.0493  clear_inode
102       0.0484  find_get_page
102       0.0484  tcp_select_initial_window
101       0.0479  lookup_mnt
101       0.0479  tcp_send_ack
99        0.0469  sock_close
98        0.0465  try_to_del_timer_sync
97        0.0460  page_cache_readahead
97        0.0460  tcp_snd_test
94        0.0446  generic_permission
94        0.0446  getname
92        0.0436  may_open
91        0.0432  tcp_event_data_recv
90        0.0427  open_namei
89        0.0422  IRQ0x20_interrupt
88        0.0417  inotify_dentry_parent_queue_event
87        0.0413  put_page
85        0.0403  copy_page_c
84        0.0398  d_instantiate
81        0.0384  finish_wait
80        0.0379  __wake_up
79        0.0375  expand_files
77        0.0365  groups_search
75        0.0356  inet_sendmsg
75        0.0356  skb_copy_datagram_iovec
73        0.0346  file_free_rcu
72        0.0341  __path_lookup_intent_open
72        0.0341  permission
71        0.0337  __follow_mount
71        0.0337  memmove
70        0.0332  dnotify_flush
69        0.0327  cond_resched
69        0.0327  igrab
68        0.0322  sk_reset_timer
66        0.0313  sockfd_lookup
65        0.0308  tcp_setsockopt
64        0.0303  sock_sendmsg
62        0.0294  __wake_up_bit
62        0.0294  file_move
62        0.0294  inotify_inode_queue_event
62        0.0294  retint_kernel
62        0.0294  tcp_v4_tw_remember_stamp
60        0.0285  sprintf
56        0.0266  copy_to_user
56        0.0266  sock_common_recvmsg
56        0.0266  tcp_check_space
56        0.0266  tcp_cwnd_validate
55        0.0261  file_kill
55        0.0261  inet_sk_rebuild_header
55        0.0261  tcp_init_cwnd
54        0.0256  __handle_mm_fault
54        0.0256  init_timer
52        0.0247  filp_close
52        0.0247  mutex_unlock
52        0.0247  rw_verify_area
51        0.0242  sock_release
48        0.0228  __tcp_ack_snd_check
48        0.0228  sys_open
47        0.0223  inode_init_once
47        0.0223  locks_remove_flock
47        0.0223  mntput_no_expire
47        0.0223  touch_atime
46        0.0218  sk_stream_rfree
45        0.0213  page_fault
45        0.0213  unmap_vmas
45        0.0213  wake_up_inode
44        0.0209  clear_page_c
44        0.0209  memset
42        0.0199  __delay
42        0.0199  tcp_cong_avoid
41        0.0194  __rb_rotate_left
39        0.0185  do_filp_open
39        0.0185  memcpy_toiovec
39        0.0185  sock_fasync
38        0.0180  __put_user_8
38        0.0180  exit_intr
35        0.0166  zone_watermark_ok
34        0.0161  iput
33        0.0156  iret_label
32        0.0152  free_hot_cold_page
32        0.0152  tcp_unhash
31        0.0147  __alloc_pages
31        0.0147  sk_alloc
30        0.0142  __lookup_mnt
30        0.0142  mutex_lock
29        0.0138  generic_file_open
28        0.0133  _spin_lock_bh
28        0.0133  bit_waitqueue
28        0.0133  tcp_rcv_space_adjust
27        0.0128  inet_sock_destruct
26        0.0123  sk_stop_timer
26        0.0123  tcp_slow_start
24        0.0114  hrtimer_run_queues
24        0.0114  inet_getname
22        0.0104  file_ra_state_init
22        0.0104  vfs_permission
21        0.0100  blockable_page_cache_readahead
21        0.0100  tcp_v4_destroy_sock
20        0.0095  init_once
19        0.0090  do_wp_page
18        0.0085  destroy_inode
16        0.0076  apic_timer_interrupt
16        0.0076  copy_from_user
16        0.0076  del_timer_sync
15        0.0071  in_group_p
15        0.0071  invalidate_inode_buffers
14        0.0066  do_page_fault
13        0.0062  mark_page_accessed
13        0.0062  retint_restore_args
12        0.0057  __get_user_4
11        0.0052  copy_page_range
11        0.0052  find_vma
11        0.0052  wake_up_bit
10        0.0047  __down_read
9         0.0043  rcu_start_batch
8         0.0038  __tcp_checksum_complete_user
8         0.0038  __up_read
8         0.0038  flush_tlb_page
8         0.0038  free_pages
8         0.0038  timespec_trunc
6         0.0028  __find_get_block
6         0.0028  __rb_rotate_right
6         0.0028  error_sti
6         0.0028  kmem_flagcheck
6         0.0028  retint_swapgs
5         0.0024  __down_read_trylock
5         0.0024  _spin_lock_irqsave
5         0.0024  inode_has_buffers
5         0.0024  retint_check
4         0.0019  __mutex_init
4         0.0019  nameidata_to_filp
4         0.0019  prio_tree_insert
4         0.0019  proc_lookup
4         0.0019  run_workqueue
3         0.0014  __getblk
3         0.0014  __iget
3         0.0014  __pagevec_lru_add_active
3         0.0014  __strnlen_user
3         0.0014  _write_lock_bh
3         0.0014  copy_process
3         0.0014  error_exit
3         0.0014  filemap_nopage
3         0.0014  mmput
3         0.0014  sys_mprotect
3         0.0014  unlink_file_vma
3         0.0014  vma_adjust
2        9.5e-04  __clear_user
2        9.5e-04  __end_that_request_first
2        9.5e-04  __pte_alloc
2        9.5e-04  __set_page_dirty_nobuffers
2        9.5e-04  _read_lock_bh
2        9.5e-04  _read_lock_irqsave
2        9.5e-04  add_to_page_cache
2        9.5e-04  anon_vma_prepare
2        9.5e-04  anon_vma_unlink
2        9.5e-04  d_lookup
2        9.5e-04  do_exit
2        9.5e-04  do_mmap_pgoff
2        9.5e-04  do_munmap
2        9.5e-04  do_wait
2        9.5e-04  dup_fd
2        9.5e-04  flush_signal_handlers
2        9.5e-04  free_hot_page
2        9.5e-04  generic_file_aio_read
2        9.5e-04  generic_fillattr
2        9.5e-04  generic_make_request
2        9.5e-04  kthread_should_stop
2        9.5e-04  load_elf_binary
2        9.5e-04  lru_add_drain
2        9.5e-04  lru_cache_add_active
2        9.5e-04  memcpy
2        9.5e-04  mempool_free
2        9.5e-04  mm_init
2        9.5e-04  page_remove_rmap
2        9.5e-04  put_unused_fd
2        9.5e-04  radix_tree_tag_clear
2        9.5e-04  rb_next
2        9.5e-04  retint_with_reschedule
2        9.5e-04  run_local_timers
2        9.5e-04  strchr
2        9.5e-04  vfs_ioctl
2        9.5e-04  vma_link
1        4.7e-04  __add_entropy_words
1        4.7e-04  __anon_vma_link
1        4.7e-04  __brelse
1        4.7e-04  __d_path
1        4.7e-04  __down_write
1        4.7e-04  __find_get_block_slow
1        4.7e-04  __free_pages
1        4.7e-04  __generic_file_aio_write_nolock
1        4.7e-04  __lookup_hash
1        4.7e-04  __make_request
1        4.7e-04  __mark_inode_dirty
1        4.7e-04  __page_set_anon_rmap
1        4.7e-04  __pagevec_lru_add
1        4.7e-04  __pagevec_release
1        4.7e-04  __remove_shared_vm_struct
1        4.7e-04  __up_write
1        4.7e-04  __user_walk_fd
1        4.7e-04  __vm_enough_memory
1        4.7e-04  __writeback_single_inode
1        4.7e-04  alloc_buffer_head
1        4.7e-04  alloc_page_buffers
1        4.7e-04  alloc_pages_current
1        4.7e-04  bio_alloc_bioset
1        4.7e-04  blk_done_softirq
1        4.7e-04  blk_recount_segments
1        4.7e-04  blk_rq_map_sg
1        4.7e-04  can_vma_merge_before
1        4.7e-04  cfq_set_request
1        4.7e-04  copy_strings
1        4.7e-04  cpuset_update_task_memory_state
1        4.7e-04  create_empty_buffers
1        4.7e-04  deny_write_access
1        4.7e-04  do_arch_prctl
1        4.7e-04  do_mpage_readpage
1        4.7e-04  do_select
1        4.7e-04  do_sigaction
1        4.7e-04  exit_mm
1        4.7e-04  exit_mmap
1        4.7e-04  find_vma_prepare
1        4.7e-04  find_vma_prev
1        4.7e-04  free_pgtables
1        4.7e-04  get_locked_pte
1        4.7e-04  get_request_wait
1        4.7e-04  get_signal_to_deliver
1        4.7e-04  get_unmapped_area
1        4.7e-04  init_once
1        4.7e-04  init_request_from_bio
1        4.7e-04  inode_setattr
1        4.7e-04  ll_rw_block
1        4.7e-04  make_ahead_window
1        4.7e-04  mapping_tagged
1        4.7e-04  mempool_alloc
1        4.7e-04  n_tty_write_wakeup
1        4.7e-04  notify_change
1        4.7e-04  page_add_file_rmap
1        4.7e-04  pipe_iov_copy_from_user
1        4.7e-04  pipe_write
1        4.7e-04  preempt_schedule_irq
1        4.7e-04  proc_file_read
1        4.7e-04  remove_vma
1        4.7e-04  sched_balance_self
1        4.7e-04  sched_fork
1        4.7e-04  search_binary_handler
1        4.7e-04  sigprocmask
1        4.7e-04  stub_execve
1        4.7e-04  sync_dirty_buffer
1        4.7e-04  sys_execve
1        4.7e-04  sys_ftruncate
1        4.7e-04  tty_ldisc_ref_wait
1        4.7e-04  vfs_write
1        4.7e-04  writeback_inodes


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 14:43                                               ` Evgeniy Polyakov
@ 2007-03-01 14:47                                                 ` Ingo Molnar
  2007-03-01 15:23                                                   ` Evgeniy Polyakov
  2007-03-01 14:57                                                 ` Evgeniy Polyakov
  1 sibling, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-03-01 14:47 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Eric Dumazet, Pavel Machek, Theodore Tso, Linus Torvalds,
	Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner


* Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> CPU: AMD64 processors, speed 2210.08 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
> samples  %        symbol name
> 195750   67.3097  cpu_idle
> 14111     4.8521  enter_idle
> 4979      1.7121  IRQ0x51_interrupt
> 4765      1.6385  tcp_v4_rcv

pretty much the only meaningful way to measure this is to:

- start a really long 'ab' testrun. Something like "ab -c 8000 -t 600".
- let the system get into 'steady state': i.e. CPU load at 100%
- reset the oprofile counters, then start an oprofile run for 60 
  seconds.
- stop the oprofile run.
- stop the test.

this way there won't be that many 'cpu_idle' entries in your profiles, 
and the profiles between the two event delivery mechanisms will be 
directly comparable.
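
something like this, i mean - a rough sketch of the sequence, assuming 
the stock opcontrol front-end (the daemon/event setup is whatever your 
installation already uses):

   # ab -c 8000 -t 600 http://192.168.0.48/ &
     ... wait until the CPU load stabilizes at 100% ...
   # opcontrol --reset
   # opcontrol --start
   # sleep 60
   # opcontrol --stop
   # opreport -l > profile.steady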

> In those tests I got epoll perf about 4400 req/s, kevent was about 
> 5300.

So we are now up to epoll being 83% of kevent's performance - while the 
noise in the numbers seen today alone is around 100% ... Could you 
update the files at the two URLs that you posted before, with the code 
that you used for the above numbers:

   http://tservice.net.ru/~s0mbre/archive/kevent/evserver_epoll.c
   http://tservice.net.ru/~s0mbre/archive/kevent/evserver_kevent.c

thanks,

	Ingo


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 14:16                                                         ` Ingo Molnar
  2007-03-01 14:31                                                           ` Eric Dumazet
@ 2007-03-01 14:54                                                           ` Evgeniy Polyakov
  2007-03-01 15:09                                                             ` Ingo Molnar
  2007-03-01 19:31                                                             ` Davide Libenzi
  1 sibling, 2 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-03-01 14:54 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Eric Dumazet, Pavel Machek, Theodore Tso, Linus Torvalds,
	Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

On Thu, Mar 01, 2007 at 03:16:37PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Eric Dumazet <dada1@cosmosbay.com> wrote:
> 
> > I can tell you that the problem (at least on my machine) comes from :
> > 
> > gettimeofday(&tm, NULL);
> > 
> > in evserver_epoll.c
> 
> yeah, that's another difference - especially if it's something like an 
> Athlon64 and gettimeofday falls back to pm-timer, that could explain the 
> performance difference. That's why i repeatedly asked Evgeniy to use the 
> /very same/ client function for both the epoll and the kevent test and 
> redo the measurements. The numbers are still highly suspect - and we are 
> already down from the prior claim of kevent being almost twice as fast 
> to a 25% difference.

There is no gettimeofday() in the running code anymore, and btw, it
was not placed in the common server processing code.

Ingo, do you really think I would send mails with faked benchmarks? :))

Btw, there was never an almost-twofold performance increase - epoll in
my tests always showed 4-5 thousand requests per second, kevent - up to
7 thousand.

This is starting to look like ghost hunting, Ingo - you already said
that you do not see any need for kevent; have you changed your opinion
on that?

> 	Ingo

-- 
	Evgeniy Polyakov


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 14:43                                               ` Evgeniy Polyakov
  2007-03-01 14:47                                                 ` Ingo Molnar
@ 2007-03-01 14:57                                                 ` Evgeniy Polyakov
  1 sibling, 0 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-03-01 14:57 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ingo Molnar, Pavel Machek, Theodore Tso, Linus Torvalds,
	Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 1130 bytes --]

On Thu, Mar 01, 2007 at 05:43:50PM +0300, Evgeniy Polyakov (johnpol@2ka.mipt.ru) wrote:
> On Thu, Mar 01, 2007 at 02:12:50PM +0100, Eric Dumazet (dada1@cosmosbay.com) wrote:
> > On Thursday 01 March 2007 12:47, Evgeniy Polyakov wrote:
> > >
> > > Could you provide at least remote way to find it?
> > >
> > 
> > Sure :)
> > 
> > > I only found the same problem at
> > > http://lkml.org/lkml/2006/10/27/3
> > >
> > > but without any hints on how to solve the problem.
> > >
> > > I will try CVS oprofile, if it works I will provide details of course.
> > >
> > 
> > # cat CVS/Root
> > CVS/Root::pserver:anonymous@oprofile.cvs.sourceforge.net:/cvsroot/oprofile
> > 
> > # cvs diff >/tmp/oprofile.diff
> > 
> > Hope it helps
> 
> One of the hunks failed, since it was in CVS already.
> After setting up some mirrors, I've installed all bits needed for
> oprofile.
> Attached kevent and epoll profiles.
> 
> In those tests I got epoll perf about 4400 req/s, kevent was about 5300.

Attached is a kevent profile with 6100 req/sec.
They all look exactly the same to me - there are no kevent or epoll
functions in the profiles at all.

-- 
	Evgeniy Polyakov

[-- Attachment #2: profile.kevent --]
[-- Type: text/plain, Size: 12427 bytes --]

CPU: AMD64 processors, speed 2210.08 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        symbol name
103425   55.0868  cpu_idle
8214      4.3750  enter_idle
4712      2.5097  tcp_v4_rcv
3805      2.0266  IRQ0x51_interrupt
3154      1.6799  tcp_ack
2777      1.4791  kmem_cache_free
2286      1.2176  kfree
2155      1.1478  memset_c
1747      0.9305  csum_partial_copy_generic
1710      0.9108  ip_output
1620      0.8629  dev_queue_xmit
1551      0.8261  handle_IRQ_event
1391      0.7409  schedule
1373      0.7313  tcp_rcv_state_process
1337      0.7121  ip_rcv
1100      0.5859  ip_queue_xmit
965       0.5140  ip_route_input
939       0.5001  tcp_sendmsg
935       0.4980  __do_softirq
923       0.4916  ip_local_deliver
916       0.4879  csum_partial
905       0.4820  system_call
889       0.4735  tcp_transmit_skb
884       0.4708  tcp_v4_do_rcv
812       0.4325  netif_receive_skb
778       0.4144  __d_lookup
760       0.4048  __alloc_skb
747       0.3979  local_bh_enable
737       0.3925  __tcp_push_pending_frames
702       0.3739  kfree_skbmem
698       0.3718  pfifo_fast_enqueue
678       0.3611  kmem_cache_alloc
651       0.3467  fget
640       0.3409  pfifo_fast_dequeue
637       0.3393  net_rx_action
629       0.3350  __link_path_walk
602       0.3206  preempt_schedule
599       0.3190  __fput
594       0.3164  sock_wfree
589       0.3137  copy_user_generic_string
579       0.3084  ret_from_intr
559       0.2977  _atomic_dec_and_lock
552       0.2940  __kfree_skb
549       0.2924  skb_clone
514       0.2738  number
494       0.2631  rt_hash_code
473       0.2519  dput
466       0.2482  tcp_parse_options
446       0.2376  tcp_rcv_established
433       0.2306  tcp_recvmsg
431       0.2296  tcp_poll
417       0.2221  get_unused_fd
417       0.2221  sysret_check
377       0.2008  rb_erase
364       0.1939  __tcp_select_window
363       0.1933  lock_timer_base
347       0.1848  __mod_timer
329       0.1752  ip_append_data
326       0.1736  exit_idle
325       0.1731  ret_from_sys_call
317       0.1688  d_alloc
302       0.1609  do_path_lookup
295       0.1571  __ip_route_output_key
290       0.1545  eth_type_trans
285       0.1518  sys_close
283       0.1507  cache_alloc_refill
282       0.1502  mask_and_ack_8259A
275       0.1465  thread_return
267       0.1422  call_softirq
265       0.1411  tcp_rtt_estimator
260       0.1385  tcp_data_queue
258       0.1374  __dentry_open
258       0.1374  vsnprintf
255       0.1358  dentry_iput
255       0.1358  tcp_current_mss
250       0.1332  sk_stream_mem_schedule
239       0.1273  find_next_zero_bit
233       0.1241  cache_grow
233       0.1241  tcp_send_fin
222       0.1182  try_to_wake_up
219       0.1166  sock_recvmsg
216       0.1150  do_generic_mapping_read
211       0.1124  sys_fcntl
209       0.1113  get_empty_filp
207       0.1103  call_rcu
206       0.1097  strncpy_from_user
195       0.1039  sock_def_readable
190       0.1012  generic_drop_inode
190       0.1012  restore_args
184       0.0980  get_page_from_freelist
182       0.0969  sys_recvfrom
176       0.0937  do_lookup
174       0.0927  common_interrupt
171       0.0911  fget_light
167       0.0889  new_inode
167       0.0889  percpu_counter_mod
166       0.0884  link_path_walk
166       0.0884  skb_checksum
160       0.0852  fput
160       0.0852  release_sock
159       0.0847  memcpy_c
158       0.0842  memcmp
157       0.0836  __skb_checksum_complete
157       0.0836  tcp_init_tso_segs
148       0.0788  half_md4_transform
144       0.0767  tcp_v4_send_check
142       0.0756  del_timer
139       0.0740  current_fs_time
135       0.0719  update_send_head
129       0.0687  do_sys_open
126       0.0671  rb_insert_color
125       0.0666  bictcp_cong_avoid
124       0.0660  __put_unused_fd
123       0.0655  schedule_timeout
121       0.0644  clear_inode
118       0.0628  sock_close
116       0.0618  __do_page_cache_readahead
115       0.0613  alloc_inode
115       0.0613  lookup_mnt
114       0.0607  tcp_snd_test
113       0.0602  mod_timer
112       0.0597  generic_permission
109       0.0581  tcp_select_initial_window
101       0.0538  locks_remove_posix
98        0.0522  fd_install
97        0.0517  find_get_page
97        0.0517  sk_reset_timer
94        0.0501  try_to_del_timer_sync
93        0.0495  __follow_mount
92        0.0490  igrab
91        0.0485  page_cache_readahead
90        0.0479  dnotify_flush
90        0.0479  prepare_to_wait
90        0.0479  put_page
89        0.0474  expand_files
89        0.0474  getname
88        0.0469  inotify_dentry_parent_queue_event
88        0.0469  tcp_sync_mss
87        0.0463  __path_lookup_intent_open
86        0.0458  file_free_rcu
85        0.0453  may_open
85        0.0453  skb_copy_datagram_iovec
84        0.0447  IRQ0x20_interrupt
82        0.0437  tcp_cwnd_validate
81        0.0431  copy_page_c
81        0.0431  d_instantiate
81        0.0431  groups_search
80        0.0426  permission
79        0.0421  __handle_mm_fault
79        0.0421  file_kill
79        0.0421  get_task_mm
76        0.0405  rw_verify_area
74        0.0394  copy_to_user
73        0.0389  __wake_up_bit
72        0.0383  __wake_up
72        0.0383  cond_resched
72        0.0383  mntput_no_expire
69        0.0368  memmove
69        0.0368  sock_sendmsg
69        0.0368  tcp_setsockopt
68        0.0362  open_namei
68        0.0362  retint_kernel
67        0.0357  wake_up_inode
66        0.0352  inet_sendmsg
66        0.0352  tcp_event_data_recv
65        0.0346  generic_file_open
64        0.0341  touch_atime
63        0.0336  sock_release
63        0.0336  tcp_send_ack
62        0.0330  file_move
62        0.0330  filp_close
57        0.0304  mutex_unlock
55        0.0293  inet_sk_rebuild_header
55        0.0293  page_fault
55        0.0293  sockfd_lookup
54        0.0288  memset
54        0.0288  sk_stream_rfree
52        0.0277  __tcp_ack_snd_check
52        0.0277  inode_init_once
52        0.0277  sock_common_recvmsg
52        0.0277  tcp_check_space
51        0.0272  sys_open
49        0.0261  iret_label
47        0.0250  locks_remove_flock
46        0.0245  __rb_rotate_left
46        0.0245  tcp_v4_tw_remember_stamp
46        0.0245  unmap_vmas
45        0.0240  finish_wait
44        0.0234  inet_sock_destruct
44        0.0234  sprintf
43        0.0229  tcp_cong_avoid
42        0.0224  inotify_inode_queue_event
41        0.0218  __alloc_pages
41        0.0218  __lookup_mnt
41        0.0218  _spin_lock_bh
41        0.0218  tcp_init_cwnd
38        0.0202  clear_page_c
38        0.0202  tcp_unhash
37        0.0197  bit_waitqueue
37        0.0197  memcpy_toiovec
36        0.0192  iput
35        0.0186  do_filp_open
35        0.0186  init_timer
35        0.0186  sock_fasync
31        0.0165  __delay
31        0.0165  exit_intr
31        0.0165  vfs_permission
30        0.0160  sk_alloc
29        0.0154  copy_from_user
28        0.0149  free_hot_cold_page
27        0.0144  __put_user_8
27        0.0144  del_timer_sync
27        0.0144  hrtimer_run_queues
26        0.0138  init_once
26        0.0138  sk_stop_timer
25        0.0133  tcp_rcv_space_adjust
25        0.0133  tcp_v4_destroy_sock
23        0.0123  copy_page_range
23        0.0123  find_vma
22        0.0117  blockable_page_cache_readahead
22        0.0117  invalidate_inode_buffers
21        0.0112  do_page_fault
21        0.0112  do_wp_page
20        0.0107  in_group_p
20        0.0107  inet_getname
20        0.0107  mutex_lock
20        0.0107  tcp_slow_start
20        0.0107  zone_watermark_ok
18        0.0096  file_ra_state_init
17        0.0091  mark_page_accessed
16        0.0085  __find_get_block
15        0.0080  rt_run_flush
14        0.0075  __down_read
14        0.0075  __up_read
14        0.0075  apic_timer_interrupt
11        0.0059  destroy_inode
11        0.0059  flush_tlb_page
11        0.0059  vm_normal_page
10        0.0053  error_exit
10        0.0053  memcpy
9         0.0048  __get_user_4
9         0.0048  retint_restore_args
9         0.0048  retint_swapgs
8         0.0043  __rb_rotate_right
8         0.0043  run_local_timers
7         0.0037  error_sti
7         0.0037  inode_has_buffers
7         0.0037  nameidata_to_filp
7         0.0037  timespec_trunc
6         0.0032  __tcp_checksum_complete_user
6         0.0032  _spin_lock_irqsave
6         0.0032  _write_lock_bh
6         0.0032  filemap_nopage
6         0.0032  wake_up_bit
5         0.0027  __getblk
5         0.0027  __iget
5         0.0027  _read_lock_irqsave
5         0.0027  find_vma_prepare
5         0.0027  mmput
4         0.0021  __free_pages
4         0.0021  __mutex_init
4         0.0021  do_mmap_pgoff
4         0.0021  lru_cache_add_active
3         0.0016  __down_read_trylock
3         0.0016  __find_get_block_slow
3         0.0016  __make_request
3         0.0016  __mark_inode_dirty
3         0.0016  __remove_shared_vm_struct
3         0.0016  copy_strings
3         0.0016  do_notify_resume
3         0.0016  free_hot_page
3         0.0016  generic_file_aio_read
3         0.0016  generic_make_request
3         0.0016  load_elf_binary
3         0.0016  page_waitqueue
3         0.0016  prio_tree_insert
3         0.0016  prio_tree_replace
3         0.0016  rcu_start_batch
3         0.0016  unlock_page
3         0.0016  vma_link
3         0.0016  vma_merge
3         0.0016  zonelist_policy
2         0.0011  __block_prepare_write
2         0.0011  __pagevec_lru_add_active
2         0.0011  __put_user_4
2         0.0011  __vma_link_rb
2         0.0011  _read_lock_bh
2         0.0011  alloc_pages_current
2         0.0011  anon_vma_unlink
2         0.0011  copy_process
2         0.0011  d_lookup
2         0.0011  dentry_unhash
2         0.0011  do_mpage_readpage
2         0.0011  do_sigaction
2         0.0011  error_entry
2         0.0011  flush_old_exec
2         0.0011  kmem_flagcheck
2         0.0011  mempool_free
2         0.0011  page_add_file_rmap
2         0.0011  run_workqueue
2         0.0011  sched_balance_self
2         0.0011  set_personality_64bit
2         0.0011  strnlen_user
2         0.0011  sys_mprotect
2         0.0011  sys_rt_sigaction
2         0.0011  vfs_lstat_fd
2         0.0011  worker_thread
1        5.3e-04  __brelse
1        5.3e-04  __d_path
1        5.3e-04  __down_write_nested
1        5.3e-04  __lookup_hash
1        5.3e-04  __page_set_anon_rmap
1        5.3e-04  __strnlen_user
1        5.3e-04  __up_write
1        5.3e-04  __vm_enough_memory
1        5.3e-04  add_timer_randomness
1        5.3e-04  anon_vma_link
1        5.3e-04  bio_alloc_bioset
1        5.3e-04  blk_recount_segments
1        5.3e-04  can_vma_merge_after
1        5.3e-04  cap_bprm_apply_creds
1        5.3e-04  cond_resched_softirq
1        5.3e-04  copy_from_read_buf
1        5.3e-04  cpuset_fork
1        5.3e-04  cpuset_update_task_memory_state
1        5.3e-04  dnotify_parent
1        5.3e-04  do_exit
1        5.3e-04  do_munmap
1        5.3e-04  do_select
1        5.3e-04  do_wait
1        5.3e-04  end_bio_bh_io_sync
1        5.3e-04  exit_sem
1        5.3e-04  file_read_actor
1        5.3e-04  filldir64
1        5.3e-04  find_or_create_page
1        5.3e-04  find_vma_prev
1        5.3e-04  flush_thread
1        5.3e-04  free_pgd_range
1        5.3e-04  free_pgtables
1        5.3e-04  generic_fillattr
1        5.3e-04  get_unmapped_area
1        5.3e-04  inode_sub_bytes
1        5.3e-04  kthread_should_stop
1        5.3e-04  mm_init
1        5.3e-04  path_release
1        5.3e-04  pipe_release
1        5.3e-04  prio_tree_remove
1        5.3e-04  profile_munmap
1        5.3e-04  remove_wait_queue
1        5.3e-04  retint_check
1        5.3e-04  seq_puts
1        5.3e-04  strchr
1        5.3e-04  sys_brk
1        5.3e-04  sys_execve
1        5.3e-04  sys_mmap
1        5.3e-04  sys_munmap
1        5.3e-04  sys_read
1        5.3e-04  sys_rmdir
1        5.3e-04  sys_rt_sigprocmask
1        5.3e-04  truncate_complete_page
1        5.3e-04  tty_ldisc_deref
1        5.3e-04  tty_ldisc_try
1        5.3e-04  tty_termios_baud_rate
1        5.3e-04  tty_write
1        5.3e-04  udp_rcv
1        5.3e-04  vfs_rmdir
1        5.3e-04  vfs_stat_fd
1        5.3e-04  vfs_write
1        5.3e-04  vm_acct_memory
1        5.3e-04  vm_stat_account
1        5.3e-04  vma_adjust
1        5.3e-04  vma_prio_tree_remove
1        5.3e-04  wake_up_new_task


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 14:54                                                           ` Evgeniy Polyakov
@ 2007-03-01 15:09                                                             ` Ingo Molnar
  2007-03-01 15:36                                                               ` Evgeniy Polyakov
  2007-03-01 19:31                                                             ` Davide Libenzi
  1 sibling, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-03-01 15:09 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Eric Dumazet, Pavel Machek, Theodore Tso, Linus Torvalds,
	Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner


* Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> > > I can tell you that the problem (at least on my machine) comes from :
> > > 
> > > gettimeofday(&tm, NULL);
> > > 
> > > in evserver_epoll.c
> > 
> > yeah, that's another difference - especially if it's something like 
> > an Athlon64 and gettimeofday falls back to pm-timer, that could 
> > explain the performance difference. That's why i repeatedly asked 
> > Evgeniy to use the /very same/ client function for both the epoll 
> > and the kevent test and redo the measurements. The numbers are still 
> > highly suspect - and we are already down from the prior claim of 
> > kevent being almost twice as fast to a 25% difference.
> 
> There is no gettimeofday() in the running code anymore, and btw, it 
> was not placed in the common server processing code.
> 
> Ingo, do you really think I would send mails with faked benchmarks? :))

no, i'd not be in this discussion anymore if i thought that. But i do 
think that your benchmark results are extremely sloppy, which makes the 
conclusions you draw from them essentially useless.

you were hurling quite colorful and strong assertions into this 
discussion, backed up by these numbers, so you should expect at least 
some minimal amount of scrutiny of those numbers.

> > [...] The numbers are still highly suspect - and we are already down 
> > from the prior claim of kevent being almost twice as fast to a 25% 
> > difference.
>
> Btw, there was never an almost-twofold performance increase - epoll in 
> my tests always showed 4-5 thousand requests per second, kevent - up to 
> 7 thousand.

i'm referring to your claim in this mail of yours from 4 days ago for 
example:

  http://lkml.org/lkml/2007/2/25/116

 "But note, that on my athlon64 3500 test machine kevent is about 7900
  requests per second compared to 4000+ epoll, so expect a challenge."

no matter how i look at it, 7900 is nearly 2 times 4000 - which is 
"almost twice".

	Ingo


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 14:47                                                 ` Ingo Molnar
@ 2007-03-01 15:23                                                   ` Evgeniy Polyakov
  2007-03-01 15:32                                                     ` Eric Dumazet
  0 siblings, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-03-01 15:23 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Eric Dumazet, Pavel Machek, Theodore Tso, Linus Torvalds,
	Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 2362 bytes --]

On Thu, Mar 01, 2007 at 03:47:17PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> 
> > CPU: AMD64 processors, speed 2210.08 MHz (estimated)
> > Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
> > samples  %        symbol name
> > 195750   67.3097  cpu_idle
> > 14111     4.8521  enter_idle
> > 4979      1.7121  IRQ0x51_interrupt
> > 4765      1.6385  tcp_v4_rcv
> 
> pretty much the only meaningful way to measure this is to:
> 
> - start a really long 'ab' testrun. Something like "ab -c 8000 -t 600".
> - let the system get into 'steady state': i.e. CPU load at 100%
> - reset the oprofile counters, then start an oprofile run for 60 
>   seconds.
> - stop the oprofile run.
> - stop the test.
> 
> this way there won't be that many 'cpu_idle' entries in your profiles, 
> and the profiles between the two event delivery mechanisms will be 
> directly comparable.

They are there because ab runs only 50k requests.
If I change it to something noticeably more than 50/80k, ab crashes:
# ab -c8000 -t 600 -n800000000 http://192.168.0.48/
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.0.48 (be patient)
Segmentation fault

Are there any other tools suitable for such loads?
I only tested httperf (which is worse, since it uses poll/select) and
'ab'.

Btw, the host machine runs at 100% too, so it is possible that the
client side is broken (too).

> > In those tests I got epoll perf about 4400 req/s, kevent was about 
> > 5300.
> 
> So we are now up to epoll being 83% of kevent's performance - while the 
> noise in the numbers seen today alone is around 100% ... Could you 
> update the files at the two URLs that you posted before, with the code 
> that you used for the above numbers:

And a couple of moments ago I resent a profile with 6100 r/s; the one
attached now shows 6300.

>    http://tservice.net.ru/~s0mbre/archive/kevent/evserver_epoll.c
>    http://tservice.net.ru/~s0mbre/archive/kevent/evserver_kevent.c

Plus http://tservice.net.ru/~s0mbre/archive/kevent/evserver_common.c
which contains the common request handling function.

> thanks,
> 
> 	Ingo

-- 
	Evgeniy Polyakov

[-- Attachment #2: profile.kevent --]
[-- Type: text/plain, Size: 13546 bytes --]

CPU: AMD64 processors, speed 2210.08 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        symbol name
168753   65.1189  cpu_idle
12451     4.8046  enter_idle
4814      1.8576  tcp_v4_rcv
3980      1.5358  IRQ0x51_interrupt
3142      1.2124  tcp_ack
2738      1.0565  kmem_cache_free
2346      0.9053  kfree
2341      0.9034  memset_c
1927      0.7436  csum_partial_copy_generic
1723      0.6649  ip_route_input
1650      0.6367  dev_queue_xmit
1452      0.5603  ip_output
1416      0.5464  handle_IRQ_event
1335      0.5152  ip_rcv
1326      0.5117  tcp_rcv_state_process
1069      0.4125  schedule
960       0.3704  __do_softirq
943       0.3639  tcp_sendmsg
915       0.3531  ip_queue_xmit
907       0.3500  tcp_v4_do_rcv
897       0.3461  fget
894       0.3450  system_call
890       0.3434  csum_partial
877       0.3384  tcp_transmit_skb
845       0.3261  netif_receive_skb
822       0.3172  ip_local_deliver
812       0.3133  kmem_cache_alloc
788       0.3041  local_bh_enable
773       0.2983  __alloc_skb
771       0.2975  kfree_skbmem
764       0.2948  __d_lookup
757       0.2921  __tcp_push_pending_frames
734       0.2832  pfifo_fast_enqueue
720       0.2778  copy_user_generic_string
627       0.2419  net_rx_action
603       0.2327  pfifo_fast_dequeue
586       0.2261  ret_from_intr
562       0.2169  __link_path_walk
561       0.2165  sock_wfree
549       0.2118  __fput
547       0.2111  __kfree_skb
543       0.2095  get_unused_fd
534       0.2061  number
527       0.2034  sysret_check
516       0.1991  preempt_schedule
508       0.1960  skb_clone
496       0.1914  tcp_parse_options
487       0.1879  _atomic_dec_and_lock
470       0.1814  tcp_poll
469       0.1810  __ip_route_output_key
466       0.1798  rt_hash_code
464       0.1790  tcp_recvmsg
421       0.1625  dput
420       0.1621  tcp_rcv_established
412       0.1590  __tcp_select_window
407       0.1571  exit_idle
394       0.1520  rb_erase
381       0.1470  sys_close
375       0.1447  __mod_timer
365       0.1408  d_alloc
363       0.1401  mask_and_ack_8259A
335       0.1293  lock_timer_base
315       0.1216  cache_alloc_refill
307       0.1185  ret_from_sys_call
300       0.1158  do_path_lookup
299       0.1154  eth_type_trans
298       0.1150  find_next_zero_bit
294       0.1134  tcp_data_queue
286       0.1104  dentry_iput
285       0.1100  ip_append_data
263       0.1015  thread_return
257       0.0992  __dentry_open
255       0.0984  sock_recvmsg
255       0.0984  tcp_rtt_estimator
252       0.0972  sys_fcntl
250       0.0965  tcp_current_mss
248       0.0957  sk_stream_mem_schedule
240       0.0926  call_softirq
233       0.0899  sys_recvfrom
229       0.0884  cache_grow
221       0.0853  vsnprintf
215       0.0830  tcp_send_fin
214       0.0826  do_generic_mapping_read
213       0.0822  call_rcu
213       0.0822  common_interrupt
203       0.0783  do_lookup
196       0.0756  inotify_dentry_parent_queue_event
191       0.0737  memcpy_c
188       0.0725  filp_close
180       0.0695  release_sock
180       0.0695  sock_def_readable
178       0.0687  get_page_from_freelist
177       0.0683  do_sys_open
174       0.0671  restore_args
172       0.0664  strncpy_from_user
167       0.0644  fget_light
162       0.0625  clear_inode
161       0.0621  link_path_walk
159       0.0614  generic_drop_inode
157       0.0606  get_empty_filp
156       0.0602  __skb_checksum_complete
153       0.0590  del_timer
152       0.0587  update_send_head
151       0.0583  percpu_counter_mod
150       0.0579  current_fs_time
150       0.0579  schedule_timeout
150       0.0579  skb_checksum
149       0.0575  fd_install
145       0.0560  sock_close
143       0.0552  try_to_wake_up
142       0.0548  generic_permission
135       0.0521  __put_unused_fd
133       0.0513  new_inode
132       0.0509  half_md4_transform
131       0.0506  alloc_inode
130       0.0502  bictcp_cong_avoid
130       0.0502  memcmp
127       0.0490  tcp_init_tso_segs
126       0.0486  tcp_sync_mss
125       0.0482  __do_page_cache_readahead
125       0.0482  find_get_page
123       0.0475  lookup_mnt
117       0.0451  rb_insert_color
114       0.0440  tcp_v4_send_check
112       0.0432  mod_timer
109       0.0421  page_cache_readahead
101       0.0390  __path_lookup_intent_open
100       0.0386  __wake_up_bit
100       0.0386  may_open
100       0.0386  tcp_snd_test
97        0.0374  tcp_check_space
96        0.0370  expand_files
96        0.0370  skb_copy_datagram_iovec
95        0.0367  getname
95        0.0367  igrab
94        0.0363  open_namei
93        0.0359  groups_search
92        0.0355  dnotify_flush
91        0.0351  locks_remove_posix
91        0.0351  memmove
90        0.0347  sk_reset_timer
89        0.0343  tcp_send_ack
88        0.0340  copy_page_c
88        0.0340  tcp_select_initial_window
87        0.0336  sock_common_recvmsg
85        0.0328  sock_release
83        0.0320  IRQ0x20_interrupt
83        0.0320  file_free_rcu
80        0.0309  rw_verify_area
79        0.0305  d_instantiate
79        0.0305  permission
79        0.0305  put_page
77        0.0297  cond_resched
77        0.0297  get_task_mm
77        0.0297  touch_atime
75        0.0289  __follow_mount
75        0.0289  inotify_inode_queue_event
74        0.0286  file_move
73        0.0282  copy_to_user
71        0.0274  tcp_v4_tw_remember_stamp
69        0.0266  wake_up_inode
65        0.0251  prepare_to_wait
65        0.0251  sockfd_lookup
65        0.0251  tcp_event_data_recv
64        0.0247  file_kill
64        0.0247  fput
62        0.0239  __handle_mm_fault
62        0.0239  tcp_setsockopt
61        0.0235  sock_sendmsg
60        0.0232  __wake_up
59        0.0228  page_fault
57        0.0220  locks_remove_flock
54        0.0208  sk_stream_rfree
54        0.0208  sprintf
53        0.0205  inet_sendmsg
53        0.0205  retint_kernel
51        0.0197  iret_label
51        0.0197  tcp_cwnd_validate
49        0.0189  tcp_rcv_space_adjust
48        0.0185  inode_init_once
48        0.0185  mutex_unlock
45        0.0174  finish_wait
45        0.0174  mntput_no_expire
44        0.0170  __delay
44        0.0170  __tcp_ack_snd_check
43        0.0166  inet_sock_destruct
42        0.0162  try_to_del_timer_sync
41        0.0158  free_hot_cold_page
41        0.0158  memset
40        0.0154  __rb_rotate_left
40        0.0154  init_once
40        0.0154  sys_open
40        0.0154  tcp_unhash
39        0.0150  generic_file_open
39        0.0150  tcp_cong_avoid
38        0.0147  __lookup_mnt
38        0.0147  bit_waitqueue
36        0.0139  clear_page_c
36        0.0139  iput
33        0.0127  in_group_p
33        0.0127  inet_sk_rebuild_header
33        0.0127  sock_fasync
33        0.0127  tcp_init_cwnd
32        0.0123  memcpy_toiovec
32        0.0123  sk_stop_timer
32        0.0123  unmap_vmas
31        0.0120  blockable_page_cache_readahead
30        0.0116  _spin_lock_bh
30        0.0116  inet_getname
29        0.0112  __put_user_8
28        0.0108  copy_from_user
27        0.0104  do_filp_open
26        0.0100  tcp_v4_destroy_sock
25        0.0096  __alloc_pages
24        0.0093  apic_timer_interrupt
24        0.0093  do_page_fault
24        0.0093  file_ra_state_init
24        0.0093  hrtimer_run_queues
24        0.0093  vfs_permission
23        0.0089  tcp_slow_start
23        0.0089  zone_watermark_ok
22        0.0085  mutex_lock
21        0.0081  destroy_inode
21        0.0081  init_timer
21        0.0081  invalidate_inode_buffers
21        0.0081  sk_alloc
19        0.0073  exit_intr
16        0.0062  copy_page_range
15        0.0058  find_vma
14        0.0054  retint_swapgs
14        0.0054  wake_up_bit
13        0.0050  __down_read
12        0.0046  do_wp_page
12        0.0046  mark_page_accessed
11        0.0042  __get_user_4
11        0.0042  __tcp_checksum_complete_user
10        0.0039  vm_normal_page
9         0.0035  __up_read
8         0.0031  rcu_start_batch
8         0.0031  retint_restore_args
8         0.0031  timespec_trunc
7         0.0027  __find_get_block
7         0.0027  error_exit
7         0.0027  flush_tlb_page
6         0.0023  _write_lock_bh
6         0.0023  copy_process
6         0.0023  free_hot_page
6         0.0023  inode_has_buffers
6         0.0023  kmem_flagcheck
6         0.0023  retint_check
5         0.0019  __rb_rotate_right
5         0.0019  del_timer_sync
5         0.0019  nameidata_to_filp
4         0.0015  __down_read_trylock
4         0.0015  __getblk
4         0.0015  __mutex_init
4         0.0015  __set_page_dirty_nobuffers
4         0.0015  _read_lock_irqsave
4         0.0015  do_mmap_pgoff
4         0.0015  error_sti
4         0.0015  free_pgd_range
4         0.0015  load_elf_binary
4         0.0015  mmput
3         0.0012  __iget
3         0.0012  _spin_lock_irqsave
3         0.0012  bio_endio
3         0.0012  copy_strings
3         0.0012  cpuset_update_task_memory_state
3         0.0012  d_lookup
3         0.0012  exit_itimers
3         0.0012  filemap_nopage
3         0.0012  find_vma_prepare
3         0.0012  free_pages
3         0.0012  generic_fillattr
3         0.0012  generic_make_request
3         0.0012  memcpy
3         0.0012  prio_tree_insert
3         0.0012  put_unused_fd
3         0.0012  run_local_timers
3         0.0012  unmap_region
3         0.0012  vma_prio_tree_remove
2        7.7e-04  __clear_user
2        7.7e-04  __find_get_block_slow
2        7.7e-04  __strnlen_user
2        7.7e-04  __vm_enough_memory
2        7.7e-04  add_to_page_cache
2        7.7e-04  add_wait_queue
2        7.7e-04  alloc_pages_current
2        7.7e-04  anon_vma_prepare
2        7.7e-04  dnotify_parent
2        7.7e-04  do_exit
2        7.7e-04  do_munmap
2        7.7e-04  do_select
2        7.7e-04  dup_fd
2        7.7e-04  exit_mmap
2        7.7e-04  find_mergeable_anon_vma
2        7.7e-04  free_pgtables
2        7.7e-04  lru_cache_add_active
2        7.7e-04  mempool_free
2        7.7e-04  page_add_file_rmap
2        7.7e-04  page_remove_rmap
2        7.7e-04  page_waitqueue
2        7.7e-04  path_release
2        7.7e-04  prio_tree_replace
2        7.7e-04  pty_chars_in_buffer
2        7.7e-04  remove_vma
2        7.7e-04  retint_with_reschedule
2        7.7e-04  run_workqueue
2        7.7e-04  seq_puts
2        7.7e-04  split_vma
2        7.7e-04  sys_mmap
2        7.7e-04  sys_mprotect
2        7.7e-04  sys_rt_sigprocmask
2        7.7e-04  truncate_inode_pages_range
2        7.7e-04  vfs_lstat_fd
2        7.7e-04  vm_acct_memory
2        7.7e-04  vma_adjust
2        7.7e-04  vma_link
1        3.9e-04  __block_prepare_write
1        3.9e-04  __bread
1        3.9e-04  __d_path
1        3.9e-04  __down_write
1        3.9e-04  __down_write_nested
1        3.9e-04  __end_that_request_first
1        3.9e-04  __free_pages
1        3.9e-04  __generic_file_aio_write_nolock
1        3.9e-04  __make_request
1        3.9e-04  __page_set_anon_rmap
1        3.9e-04  __pagevec_lru_add_active
1        3.9e-04  __put_user_4
1        3.9e-04  __remove_shared_vm_struct
1        3.9e-04  __up_write
1        3.9e-04  __vma_link_rb
1        3.9e-04  _read_lock_bh
1        3.9e-04  activate_page
1        3.9e-04  anon_vma_unlink
1        3.9e-04  block_write_full_page
1        3.9e-04  cap_bprm_apply_creds
1        3.9e-04  cap_vm_enough_memory
1        3.9e-04  clear_page_dirty_for_io
1        3.9e-04  cond_resched_lock
1        3.9e-04  cp_new_stat
1        3.9e-04  create_empty_buffers
1        3.9e-04  dentry_unhash
1        3.9e-04  do_brk
1        3.9e-04  do_mpage_readpage
1        3.9e-04  do_mremap
1        3.9e-04  do_sigaction
1        3.9e-04  do_wait
1        3.9e-04  drop_buffers
1        3.9e-04  eligible_child
1        3.9e-04  exit_sem
1        3.9e-04  file_read_actor
1        3.9e-04  filldir64
1        3.9e-04  flush_signal_handlers
1        3.9e-04  generic_block_bmap
1        3.9e-04  generic_file_llseek
1        3.9e-04  generic_file_mmap
1        3.9e-04  get_index
1        3.9e-04  get_signal_to_deliver
1        3.9e-04  get_stack
1        3.9e-04  get_unmapped_area
1        3.9e-04  get_vma_policy
1        3.9e-04  init_request_from_bio
1        3.9e-04  inode_setattr
1        3.9e-04  is_bad_inode
1        3.9e-04  kref_put
1        3.9e-04  lru_add_drain
1        3.9e-04  may_delete
1        3.9e-04  mm_release
1        3.9e-04  n_tty_ioctl
1        3.9e-04  nr_blockdev_pages
1        3.9e-04  page_add_new_anon_rmap
1        3.9e-04  pipe_poll
1        3.9e-04  preempt_schedule_irq
1        3.9e-04  proc_lookup
1        3.9e-04  ptregscall_common
1        3.9e-04  put_files_struct
1        3.9e-04  radix_tree_tag_clear
1        3.9e-04  rb_prev
1        3.9e-04  recalc_bh_state
1        3.9e-04  release_pages
1        3.9e-04  sched_exec
1        3.9e-04  sched_fork
1        3.9e-04  seq_escape
1        3.9e-04  set_close_on_exec
1        3.9e-04  set_fs_pwd
1        3.9e-04  set_personality_64bit
1        3.9e-04  shrink_dcache_parent
1        3.9e-04  sock_map_fd
1        3.9e-04  strchr
1        3.9e-04  submit_bio
1        3.9e-04  sys_dup2
1        3.9e-04  sys_faccessat
1        3.9e-04  sys_munmap
1        3.9e-04  sys_rt_sigaction
1        3.9e-04  tty_write
1        3.9e-04  unlink_file_vma
1        3.9e-04  unlock_page
1        3.9e-04  vfs_read
1        3.9e-04  vfs_readdir
1        3.9e-04  vma_merge
1        3.9e-04  vma_prio_tree_insert
1        3.9e-04  wake_up_new_task
1        3.9e-04  writeback_inodes
1        3.9e-04  zonelist_policy


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 15:23                                                   ` Evgeniy Polyakov
@ 2007-03-01 15:32                                                     ` Eric Dumazet
  2007-03-01 15:41                                                       ` Eric Dumazet
                                                                         ` (2 more replies)
  0 siblings, 3 replies; 337+ messages in thread
From: Eric Dumazet @ 2007-03-01 15:32 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ingo Molnar, Pavel Machek, Theodore Tso, Linus Torvalds,
	Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

On Thursday 01 March 2007 16:23, Evgeniy Polyakov wrote:
> They are there because ab runs only 50k requests.
> If I change it to something noticeably more than 50/80k, ab crashes:
> # ab -c8000 -t 600 -n800000000 http://192.168.0.48/
> This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
> Copyright 2006 The Apache Software Foundation, http://www.apache.org/
>
> Benchmarking 192.168.0.48 (be patient)
> Segmentation fault
>
> Are there any other tools suitable for such loads?
> I only tested httperf (which is worse, since it uses poll/select) and
> 'ab'.
>
> Btw, the host machine runs at 100% too, so it is possible that the
> client side is broken (too).

I have similar problems here, the ab test just doesn't complete...

I am still investigating with strace and tcpdump.

In the meantime you could just rewrite it (based on epoll please :) ), 
since it should be quite easy to do (the reverse of evserver_epoll).
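
Something along these lines, perhaps - a very rough sketch of that
"reverse of evserver_epoll" idea (the names, the plain HTTP/1.0 GET and
the one-connection-at-a-time simplification are mine, not taken from
the real evserver code; error handling is mostly omitted):

#include <sys/epoll.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

static const char req[] = "GET / HTTP/1.0\r\n\r\n";

/* start one non-blocking connect and ask epoll to tell us when the
 * socket becomes writable, i.e. when the connect has finished */
static void start_connect(int epfd, struct sockaddr_in *sa)
{
	struct epoll_event ev;
	int s = socket(AF_INET, SOCK_STREAM, 0);

	fcntl(s, F_SETFL, fcntl(s, F_GETFL) | O_NONBLOCK);
	connect(s, (struct sockaddr *)sa, sizeof(*sa)); /* EINPROGRESS */

	ev.events = EPOLLOUT;
	ev.data.fd = s;
	epoll_ctl(epfd, EPOLL_CTL_ADD, s, &ev);
}

int main(void)
{
	struct sockaddr_in sa;
	struct epoll_event ev, events[64];
	char buf[4096];
	int epfd = epoll_create(64);
	int i, n;

	memset(&sa, 0, sizeof(sa));
	sa.sin_family = AF_INET;
	sa.sin_port = htons(80);
	inet_pton(AF_INET, "192.168.0.48", &sa.sin_addr);

	start_connect(epfd, &sa);

	for (;;) {
		n = epoll_wait(epfd, events, 64, -1);
		for (i = 0; i < n; i++) {
			int fd = events[i].data.fd;

			if (events[i].events & EPOLLOUT) {
				/* connected: send the request, then
				 * switch to waiting for the reply */
				write(fd, req, sizeof(req) - 1);
				ev.events = EPOLLIN;
				ev.data.fd = fd;
				epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
			} else {
				ssize_t r = read(fd, buf, sizeof(buf));

				/* server closed (or real error): one
				 * request done, issue the next one */
				if (r == 0 || (r < 0 && errno != EAGAIN)) {
					close(fd);
					start_connect(epfd, &sa);
				}
			}
		}
	}
}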



* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 15:09                                                             ` Ingo Molnar
@ 2007-03-01 15:36                                                               ` Evgeniy Polyakov
  2007-03-02 10:57                                                                 ` Ingo Molnar
  0 siblings, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-03-01 15:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Eric Dumazet, Pavel Machek, Theodore Tso, Linus Torvalds,
	Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

On Thu, Mar 01, 2007 at 04:09:42PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> 
> > > > I can tell you that the problem (at least on my machine) comes from :
> > > > 
> > > > gettimeofday(&tm, NULL);
> > > > 
> > > > in evserver_epoll.c
> > > 
> > > yeah, that's another difference - especially if it's something like 
> > > an Athlon64 and gettimeofday falls back to pm-timer, that could 
> > > explain the performance difference. That's why i repeatedly asked 
> > > Evgeniy to use the /very same/ client function for both the epoll 
> > > and the kevent test and redo the measurements. The numbers are still 
> > > highly suspect - and we are already down from the prior claim of 
> > > kevent being almost twice as fast to a 25% difference.
> > 
> > There is no gettimeofday() in the running code anymore, and btw, it 
> > was not placed in the common server processing code.
> > 
> > Ingo, do you really think I would send mails with faked benchmarks? :))
> 
> no, i'd not be in this discussion anymore if i thought that. But i do 
> think that your benchmark results are extremely sloppy, which makes the 
> conclusions you draw from them essentially useless.
>
> you were hurling quite colorful and strong assertions into this 
> discussion, backed up by these numbers, so you should expect at least 
> some minimal amount of scrutiny of those numbers.

This discussion was about event-driven vs. thread-driven IO models, and
the threadlet only behaves like an event-driven one because in my tests
there was exactly one threadlet rescheduling per several thousand
clients.

Kevent is just a logical interpolation of the performance of the
event-driven model.

My assumptions were based not on kevent performance, but on the fact
that event delivery is much faster and simpler than thread handling.

Ugh, I'm starting that stupid talk again - let's just jump to the end:
I agree that in real-life high-performance systems both models must be
used.

Peace? :)

> > > [...] The numbers are still highly suspect - and we are already down 
> > > from the prior claim of kevent being almost twice as fast to a 25% 
> > > difference.
> >
> > Btw, there was never an almost-twofold performance increase - epoll in 
> > my tests always showed 4-5 thousand requests per second, kevent - up to 
> > 7 thousand.
> 
> i'm referring to your claim in this mail of yours from 4 days ago for 
> example:
> 
>   http://lkml.org/lkml/2007/2/25/116
> 
>  "But note, that on my athlon64 3500 test machine kevent is about 7900
>   requests per second compared to 4000+ epoll, so expect a challenge."
> 
> no matter how i look at it, 7900 is nearly 2 times 4000 - which is 
> "almost twice".

After your changes, epoll increased to 5k.
I can easily reproduce a 6300/4300 split, but cannot get more than 7k
for kevent (with oprofile/idle=poll at least).

I've completed an 800k run:
kevent 4800
epoll 4450

with tons of overflows in 'ab' (its byte counters are signed 32-bit, so
the ~3 GB transferred here wraps negative):

Write errors:           0
Total transferred:      -1197367296 bytes
HTML transferred:       -1478167296 bytes
Requests per second:    4440.67 [#/sec] (mean)
Time per request:       1801.529 [ms] (mean)
Time per request:       0.225 [ms] (mean, across all concurrent requests)
Transfer rate:          -6490.62 [Kbytes/sec] received

Any other benchmarks to try?

> 	Ingo

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 15:32                                                     ` Eric Dumazet
@ 2007-03-01 15:41                                                       ` Eric Dumazet
  2007-03-01 15:51                                                         ` Evgeniy Polyakov
  2007-03-01 15:47                                                       ` Evgeniy Polyakov
  2007-03-01 19:47                                                       ` Davide Libenzi
  2 siblings, 1 reply; 337+ messages in thread
From: Eric Dumazet @ 2007-03-01 15:41 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ingo Molnar, Pavel Machek, Theodore Tso, Linus Torvalds,
	Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

On Thursday 01 March 2007 16:32, Eric Dumazet wrote:
> On Thursday 01 March 2007 16:23, Evgeniy Polyakov wrote:
> > They are there because ab runs only 50k requests.
> > If I change it to something noticeably more than 50/80k, ab crashes:
> > # ab -c8000 -t 600 -n800000000 http://192.168.0.48/
> > This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
> > Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
> > Copyright 2006 The Apache Software Foundation, http://www.apache.org/
> >
> > Benchmarking 192.168.0.48 (be patient)
> > Segmentation fault
> >
> > Are there any other tools suitable for such loads?
> > I only tested httperf (which is worse, since it uses poll/select) and
> > 'ab'.
> >
> > Btw, the host machine runs at 100% too, so it is possible that the
> > client side is broken (too).
>
> I have similar problems here, the ab test just doesn't complete...
>
> I am still investigating with strace and tcpdump.

OK... I found it.

I had to loop on accept():

        for (i = 0; i < num; ++i) {
                if (event[i].data.fd == main_server_s) {
                        do {
                                err = evtest_callback_main(event[i].data.fd);
                        } while (err != -1);
                } else {
                        err = evtest_callback_client(event[i].data.fd);
                }
        }

Or else we can miss an event forever...
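
(For completeness, a hypothetical sketch of how such a callback can
terminate the loop - the real evtest_callback_main() body is not shown
in this thread, so the non-blocking accept() and the "epfd" epoll
instance below are my assumptions:)

static int evtest_callback_main(int s)
{
	struct epoll_event ev;
	int fd = accept(s, NULL, NULL);

	if (fd == -1)
		return -1;	/* EAGAIN: listen backlog fully drained */

	/* the accepted socket does not inherit O_NONBLOCK, so set it */
	fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK);

	ev.events = EPOLLIN;
	ev.data.fd = fd;
	epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);	/* watch the new client */

	return fd;
}

With an edge-triggered registration a whole burst of connections can be
coalesced into a single event, so whatever is left in the backlog after
a single accept() is not reported again until the next connection
arrives - which would explain the "miss an event forever" failure mode.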


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 15:32                                                     ` Eric Dumazet
  2007-03-01 15:41                                                       ` Eric Dumazet
@ 2007-03-01 15:47                                                       ` Evgeniy Polyakov
  2007-03-01 19:47                                                       ` Davide Libenzi
  2 siblings, 0 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-03-01 15:47 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ingo Molnar, Pavel Machek, Theodore Tso, Linus Torvalds,
	Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

On Thu, Mar 01, 2007 at 04:32:37PM +0100, Eric Dumazet (dada1@cosmosbay.com) wrote:
> On Thursday 01 March 2007 16:23, Evgeniy Polyakov wrote:
> > They are there because ab runs only 50k requests.
> > If I change it to something noticeably more than 50/80k, ab crashes:
> > # ab -c8000 -t 600 -n800000000 http://192.168.0.48/
> > This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
> > Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
> > Copyright 2006 The Apache Software Foundation, http://www.apache.org/
> >
> > Benchmarking 192.168.0.48 (be patient)
> > Segmentation fault
> >
> > Are there any other tools suitable for such loads?
> > I only tested httperf (which is worse, since it uses poll/select) and
> > 'ab'.
> >
> > Btw, the host machine runs at 100% too, so it is possible that the
> > client side is broken (too).
> 
> I have similar problems here, the ab test just doesn't complete...
> 
> I am still investigating with strace and tcpdump.
> 
> In the meantime you could just rewrite it (based on epoll please :) ), 
> since it should be quite easy to do (the reverse of evserver_epoll).

Rewriting 'ab' with pure epoll instead of the APR lib is like treating
dandruff with a guillotine.

I will try to cook up something of my own - a simple client (based on
epoll) - tomorrow or over the weekend; right now I need to work for
money :)

-- 
	Evgeniy Polyakov


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 15:41                                                       ` Eric Dumazet
@ 2007-03-01 15:51                                                         ` Evgeniy Polyakov
  0 siblings, 0 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-03-01 15:51 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ingo Molnar, Pavel Machek, Theodore Tso, Linus Torvalds,
	Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

On Thu, Mar 01, 2007 at 04:41:27PM +0100, Eric Dumazet (dada1@cosmosbay.com) wrote:
> On Thursday 01 March 2007 16:32, Eric Dumazet wrote:
> > On Thursday 01 March 2007 16:23, Evgeniy Polyakov wrote:
> > > They are there because ab runs only 50k requests.
> > > If I change it to something noticeably more than 50/80k, ab crashes:
> > > # ab -c8000 -t 600 -n800000000 http://192.168.0.48/
> > > This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
> > > Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
> > > Copyright 2006 The Apache Software Foundation, http://www.apache.org/
> > >
> > > Benchmarking 192.168.0.48 (be patient)
> > > Segmentation fault
> > >
> > > Are there any other tools suitable for such loads?
> > > I have only tested httperf (which is worse, since it uses poll/select) and
> > > 'ab'.
> > >
> > > Btw, the host machine runs at 100% too, so it is possible that the client
> > > side is broken (too).
> >
> > I have similar problems here, the ab test just doesn't complete...
> >
> > I am still investigating with strace and tcpdump.
> 
> OK... I found it.
> 
> I had to loop on accept():
> 
>         for (i = 0; i < num; ++i) {
>                 if (event[i].data.fd == main_server_s) {
>                         do {
>                                 err = evtest_callback_main(event[i].data.fd);
>                         } while (err != -1);
>                 } else {
>                         err = evtest_callback_client(event[i].data.fd);
>                 }
>         }
> 
> Or else we can miss an event forever...

The same here - I would just have to enable debugging to track it down.
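
For reference, here is a minimal sketch of the pattern Eric describes -
draining accept() until EAGAIN so that a single edge-triggered readiness
notification is never lost. The listening socket is assumed non-blocking,
and the two helpers are illustrative, not taken from evserver_epoll:

        for (;;) {
                int cfd = accept(listen_fd, NULL, NULL);

                if (cfd < 0) {
                        if (errno == EAGAIN || errno == EWOULDBLOCK)
                                break;          /* backlog fully drained */
                        if (errno == EINTR)
                                continue;       /* interrupted, retry */
                        break;                  /* real error */
                }
                set_nonblocking(cfd);           /* assumed helper */
                add_to_event_loop(cfd);         /* assumed helper */
        }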

# ab -c8000 -n80000 http://192.168.0.48/
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.0.48 (be patient)
Completed 8000 requests
Completed 16000 requests
Completed 24000 requests
Completed 32000 requests
Completed 40000 requests
Completed 48000 requests
Completed 56000 requests
Completed 64000 requests
Completed 72000 requests
Finished 80000 requests


Server Software:        Apache/1.3.27
Server Hostname:        192.168.0.48
Server Port:            80

Document Path:          /
Document Length:        3521 bytes

Concurrency Level:      8000
Time taken for tests:   18.250921 seconds
Complete requests:      80000
Failed requests:        0
Write errors:           0
Total transferred:      315691904 bytes
HTML transferred:       287074172 bytes
Requests per second:    4383.34 [#/sec] (mean)
Time per request:       1825.092 [ms] (mean)
Time per request:       0.228 [ms] (mean, across all concurrent requests)
Transfer rate:          16891.86 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      137  884 481.1    920    3602
Processing:   567  888 163.6    985     997
Waiting:       47  455 238.2    439     921
Total:        765 1772 566.6   1911    4556

Percentage of the requests served within a certain time (ms)
50%   1911
66%   1911
75%   1912
80%   1913
90%   1913
95%   1914
98%   4438
99%   4497
100%   4556 (longest request)
kano:~#


-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 10:59                                   ` Evgeniy Polyakov
                                                       ` (2 preceding siblings ...)
  2007-03-01 12:34                                     ` Ingo Molnar
@ 2007-03-01 16:56                                     ` David Lang
  2007-03-01 17:56                                       ` Evgeniy Polyakov
  3 siblings, 1 reply; 337+ messages in thread
From: David Lang @ 2007-03-01 16:56 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ingo Molnar, Pavel Machek, Theodore Tso, Linus Torvalds,
	Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

The ab numbers below do not seem that impressive to me, especially for such
stripped-down server processes.

Here are some numbers from a set of test boxes I've got in my lab. I've been
using them to test firewalls, and I've been getting throughput similar to what
is listed below when going through a proxy that does a full fork for each
connection, and then makes a new connection to the webserver on the other side!
The first few sets of numbers are going directly from test client to test
server; the final set is going through the proxy.

Client and server are dual Opteron 252 boxes with 8GB of RAM, running Debian
in 64-bit mode.

This is with apache2 MPM as the destination (relatively untuned except for
tinkering with the child count settings). This should be about as bad as you
can get for a server.

loadtest2:/proc/sys# ab -c 8000 -n 80000 http://208.2.188.5:80/4k
This is ApacheBench, Version 2.0.41-dev <$Revision: 1.141 $> apache-2.0
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-2002 The Apache Software Foundation, http://www.apache.org/

Benchmarking 208.2.188.5 (be patient)
Completed 8000 requests
Completed 16000 requests
Completed 24000 requests
Completed 32000 requests
Completed 40000 requests
Completed 48000 requests
Completed 56000 requests
Completed 64000 requests
Completed 72000 requests
Finished 80000 requests


Server Software:        Apache/1.3.33
Server Hostname:        208.2.188.5
Server Port:            80

Document Path:          /4k
Document Length:        4352 bytes

Concurrency Level:      8000
Time taken for tests:   10.992838 seconds
Complete requests:      80000
Failed requests:        0
Write errors:           0
Total transferred:      386192835 bytes
HTML transferred:       362612992 bytes
Requests per second:    7277.47 [#/sec] (mean)
Time per request:       1099.284 [ms] (mean)
Time per request:       0.137 [ms] (mean, across all concurrent requests)
Transfer rate:          34307.88 [Kbytes/sec] received

Connection Times (ms)
               min  mean[+/-sd] median   max
Connect:        8  497 1398.3     71    9072
Processing:    17  236 346.9    103    2995
Waiting:        8   91 131.6     65    1692
Total:         26  734 1435.5    187    9786

Percentage of the requests served within a certain time (ms)
   50%    187
   66%    288
   75%    564
   80%    754
   90%   3085
   95%   3163
   98%   4316
   99%   9186
  100%   9786 (longest request)
loadtest2:/proc/sys# ab -c 8000 -n 80000 http://208.2.188.5:80/8k
This is ApacheBench, Version 2.0.41-dev <$Revision: 1.141 $> apache-2.0
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-2002 The Apache Software Foundation, http://www.apache.org/

Benchmarking 208.2.188.5 (be patient)
Completed 8000 requests
Completed 16000 requests
Completed 24000 requests
Completed 32000 requests
Completed 40000 requests
Completed 48000 requests
Completed 56000 requests
Completed 64000 requests
Completed 72000 requests
Finished 80000 requests


Server Software:        Apache/1.3.33
Server Hostname:        208.2.188.5
Server Port:            80

Document Path:          /8k
Document Length:        8704 bytes

Concurrency Level:      8000
Time taken for tests:   11.355031 seconds
Complete requests:      80000
Failed requests:        0
Write errors:           0
Total transferred:      736949141 bytes
HTML transferred:       713733802 bytes
Requests per second:    7045.34 [#/sec] (mean)
Time per request:       1135.503 [ms] (mean)
Time per request:       0.142 [ms] (mean, across all concurrent requests)
Transfer rate:          63379.48 [Kbytes/sec] received

Connection Times (ms)
               min  mean[+/-sd] median   max
Connect:       36  495 1297.1     76    9056
Processing:    81  317 529.5    161    3448
Waiting:       25   89  75.1     76    1610
Total:        124  812 1401.5    250   11011

Percentage of the requests served within a certain time (ms)
   50%    250
   66%    304
   75%    497
   80%    705
   90%   3171
   95%   3251
   98%   3455
   99%   9160
  100%  11011 (longest request)

For both of these tests the server had its CPU maxed out (<5% idle).

Switching to thttpd instead of apache, I get:

loadtest2:/proc/sys# ab -c 8000 -n 80000 http://208.2.188.5:81/4k
This is ApacheBench, Version 2.0.41-dev <$Revision: 1.141 $> apache-2.0
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-2002 The Apache Software Foundation, http://www.apache.org/

Benchmarking 208.2.188.5 (be patient)
Completed 8000 requests
Completed 16000 requests
Completed 24000 requests
Completed 32000 requests
Completed 40000 requests
Completed 48000 requests
Completed 56000 requests
Completed 64000 requests
Completed 72000 requests
Finished 80000 requests


Server Software:        thttpd/2.23beta1
Server Hostname:        208.2.188.5
Server Port:            81

Document Path:          /4k
Document Length:        4352 bytes

Concurrency Level:      8000
Time taken for tests:   9.944605 seconds
Complete requests:      80000
Failed requests:        0
Write errors:           0
Total transferred:      372748950 bytes
HTML transferred:       352729600 bytes
Requests per second:    8044.56 [#/sec] (mean)
Time per request:       994.461 [ms] (mean)
Time per request:       0.124 [ms] (mean, across all concurrent requests)
Transfer rate:          36603.97 [Kbytes/sec] received

Connection Times (ms)
               min  mean[+/-sd] median   max
Connect:       50  324 1274.4     70    9124
Processing:    68   98  33.3     90     781
Waiting:       22   69  26.9     63     729
Total:        125  423 1291.9    161    9324

Percentage of the requests served within a certain time (ms)
   50%    161
   66%    175
   75%    188
   80%    203
   90%    246
   95%    307
   98%   3243
   99%   9272
  100%   9324 (longest request)
loadtest2:/proc/sys# ab -c 8000 -n 80000 http://208.2.188.5:81/8k
This is ApacheBench, Version 2.0.41-dev <$Revision: 1.141 $> apache-2.0
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-2002 The Apache Software Foundation, http://www.apache.org/

Benchmarking 208.2.188.5 (be patient)
Completed 8000 requests
Completed 16000 requests
Completed 24000 requests
Completed 32000 requests
Completed 40000 requests
Completed 48000 requests
Completed 56000 requests
Completed 64000 requests
Completed 72000 requests
Finished 80000 requests


Server Software:        thttpd/2.23beta1
Server Hostname:        208.2.188.5
Server Port:            81

Document Path:          /8k
Document Length:        8704 bytes

Concurrency Level:      8000
Time taken for tests:   13.502031 seconds
Complete requests:      80000
Failed requests:        0
Write errors:           0
Total transferred:      729161888 bytes
HTML transferred:       709030153 bytes
Requests per second:    5925.03 [#/sec] (mean)
Time per request:       1350.203 [ms] (mean)
Time per request:       0.169 [ms] (mean, across all concurrent requests)
Transfer rate:          52738.14 [Kbytes/sec] received

Connection Times (ms)
               min  mean[+/-sd] median   max
Connect:       46  338 1189.7    106    9145
Processing:    79  197  52.8    195     670
Waiting:       39   92  28.4     94     577
Total:        140  536 1208.3    293    9424

Percentage of the requests served within a certain time (ms)
   50%    293
   66%    350
   75%    355
   80%    369
   90%    388
   95%   3293
   98%   3388
   99%   9392
  100%   9424 (longest request)

For these tests the CPU is ~45% idle on the server box.

Now I go through a box in the middle running a proxy that forks for every
request (so you have two separate TCP connections, plus a fork for each
request, plus two writes to syslog). The proxy box is a dual Opteron 252 with
4GB of RAM running 32-bit Debian:

loadtest2:/proc/sys# ab -c 8000 -n 80000 http://192.168.254.2:8080/8k
This is ApacheBench, Version 2.0.41-dev <$Revision: 1.141 $> apache-2.0
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-2002 The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.254.2 (be patient)
Completed 8000 requests
Completed 16000 requests
Completed 24000 requests
Completed 32000 requests
Completed 40000 requests
Completed 48000 requests
Completed 56000 requests
Completed 64000 requests
Completed 72000 requests
Finished 80000 requests


Server Software:        Apache/1.3.33
Server Hostname:        192.168.254.2
Server Port:            8080

Document Path:          /8k
Document Length:        8704 bytes

Concurrency Level:      8000
Time taken for tests:   21.101321 seconds
Complete requests:      80000
Failed requests:        0
Write errors:           0
Total transferred:      721092650 bytes
HTML transferred:       698383315 bytes
Requests per second:    3791.23 [#/sec] (mean)
Time per request:       2110.132 [ms] (mean)
Time per request:       0.264 [ms] (mean, across all concurrent requests)
Transfer rate:          33371.94 [Kbytes/sec] received

Connection Times (ms)
               min  mean[+/-sd] median   max
Connect:        9  621 1652.3     20    9036
Processing:    28   81 195.5     50    6652
Waiting:        9   51 195.5     19    6620
Total:         38  703 1683.2     70   12291

Percentage of the requests served within a certain time (ms)
   50%     70
   66%     80
   75%     83
   80%    101
   90%   3075
   95%   3088
   98%   9073
   99%   9087
  100%  12291 (longest request)

David Lang





On Thu, 1 Mar 2007, Evgeniy Polyakov wrote:

> On Thu, Mar 01, 2007 at 10:54:02AM +0100, Ingo Molnar (mingo@elte.hu) wrote:
>>
>> * Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
>>
>>> I have posted kevent/epoll benchmarks and the related design issues too many
>>> times, both with handmade applications (which might be broken as hell)
>>> and with popular open-source servers, to repeat them again.
>>
>> numbers are crucial here - and given the epoll bugs in the evserver code 
>> that we found, do you have updated evserver benchmark results that
>> compare epoll to kevent? I'm wondering why epoll has half the speed of
>> kevent in those measurements - i suspect some possible benchmarking bug.
>> The queueing model of epoll and kevent is roughly comparable, both do
>> only a constant number of steps to serve one particular request,
>> regardless of how many pending connections/requests there are. What is
>> the CPU utilization of the server system during an epoll test, and what
>> is the CPU utilization during a kevent test? 100% utilized in both
>> cases?
>
> Yes, it is about 98-100% in both cases.
> I've just re-run tests on my amd64 test machine without debug options:
>
> epoll		4794.23
> kevent		6468.95
>
> here are the full client 'ab' outputs for the epoll and kevent servers (epoll
> does not use EPOLLET, as you requested, but that does not look like
> it changes performance in my case).
>
> epoll ab output:
> # ab -c8000 -n80000 http://192.168.0.48/
> This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
> Copyright 2006 The Apache Software Foundation, http://www.apache.org/
>
> Benchmarking 192.168.0.48 (be patient)
> Completed 8000 requests
> Completed 16000 requests
> Completed 24000 requests
> Completed 32000 requests
> Completed 40000 requests
> Completed 48000 requests
> Completed 56000 requests
> Completed 64000 requests
> Completed 72000 requests
> Finished 80000 requests
>
>
> Server Software:        Apache/1.3.27
> Server Hostname:        192.168.0.48
> Server Port:            80
>
> Document Path:          /
> Document Length:        3521 bytes
>
> Concurrency Level:      8000
> Time taken for tests:   16.686737 seconds
> Complete requests:      80000
> Failed requests:        0
> Write errors:           0
> Total transferred:      309760000 bytes
> HTML transferred:       281680000 bytes
> Requests per second:    4794.23 [#/sec] (mean)
> Time per request:       1668.674 [ms] (mean)
> Time per request:       0.209 [ms] (mean, across all concurrent requests)
> Transfer rate:          18128.17 [Kbytes/sec] received
>
> Connection Times (ms)
>              min  mean[+/-sd] median   max
> Connect:      159  779 110.1    799     921
> Processing:   468  866  77.4    869     988
> Waiting:       63  426 212.3    425     921
> Total:       1145 1646 115.6   1660    1873
>
> Percentage of the requests served within a certain time (ms)
> 50%   1660
> 66%   1661
> 75%   1662
> 80%   1663
> 90%   1806
> 95%   1830
> 98%   1833
> 99%   1834
> 100%   1873 (longest request)
>
> kevent ab output:
> # ab -c8000 -n80000 http://192.168.0.48/
> This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
> Copyright 2006 The Apache Software Foundation, http://www.apache.org/
>
> Benchmarking 192.168.0.48 (be patient)
> Completed 8000 requests
> Completed 16000 requests
> Completed 24000 requests
> Completed 32000 requests
> Completed 40000 requests
> Completed 48000 requests
> Completed 56000 requests
> Completed 64000 requests
> Completed 72000 requests
> Finished 80000 requests
>
>
> Server Software:        Apache/1.3.27
> Server Hostname:        192.168.0.48
> Server Port:            80
>
> Document Path:          /
> Document Length:        3521 bytes
>
> Concurrency Level:      8000
> Time taken for tests:   12.366775 seconds
> Complete requests:      80000
> Failed requests:        0
> Write errors:           0
> Total transferred:      317047104 bytes
> HTML transferred:       288306522 bytes
> Requests per second:    6468.95 [#/sec] (mean)
> Time per request:       1236.677 [ms] (mean)
> Time per request:       0.155 [ms] (mean, across all concurrent requests)
> Transfer rate:          25036.12 [Kbytes/sec] received
>
> Connection Times (ms)
>              min  mean[+/-sd] median   max
> Connect:      130  364 871.1    275    9347
> Processing:   178  298  42.5    296     580
> Waiting:       31  202  65.8    210     369
> Total:        411  663 887.0    572    9722
>
> Percentage of the requests served within a certain time (ms)
> 50%    572
> 66%    573
> 75%    618
> 80%    640
> 90%    684
> 95%    709
> 98%    721
> 99%   3455
> 100%   9722 (longest request)
>
> Notice how the percentage of requests served within a certain time
> differs for kevent and epoll. And this server does not include the
> ready-on-submission kevent optimization.
>
>> 	Ingo
>
>

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 16:56                                     ` David Lang
@ 2007-03-01 17:56                                       ` Evgeniy Polyakov
  2007-03-01 18:41                                         ` David Lang
  0 siblings, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-03-01 17:56 UTC (permalink / raw)
  To: David Lang
  Cc: Ingo Molnar, Pavel Machek, Theodore Tso, Linus Torvalds,
	Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

On Thu, Mar 01, 2007 at 08:56:28AM -0800, David Lang (david.lang@digitalinsight.com) wrote:
> The ab numbers below do not seem that impressive to me, especially for such
> stripped-down server processes.
...
> Client and server are dual Opteron 252 boxes with 8GB of RAM, running Debian
> in 64-bit mode.

Scale your hardware setup down by a factor of 2-4, leave only one apache
process, and try to get the same numbers - we are not talking about how to
create a perfect web server; instead we are trying to focus on possible
problems in the epoll/kevent event-driven logic.

Vanilla (epoll) lighttpd shows 4000-5000 requests per second in my setup (no logs).
Default mpm-apache2 with a bunch of threads - about 8k req/s.
Default thttpd (logging disabled) - about 2k req/s.

Btw, all your tests are network-bound; try to decrease the
HTML page size to get the actual event processing speed out of those machines.

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 17:56                                       ` Evgeniy Polyakov
@ 2007-03-01 18:41                                         ` David Lang
  0 siblings, 0 replies; 337+ messages in thread
From: David Lang @ 2007-03-01 18:41 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ingo Molnar, Pavel Machek, Theodore Tso, Linus Torvalds,
	Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

On Thu, 1 Mar 2007, Evgeniy Polyakov wrote:

> On Thu, Mar 01, 2007 at 08:56:28AM -0800, David Lang (david.lang@digitalinsight.com) wrote:
>> The ab numbers below do not seem that impressive to me, especially for such
>> stripped-down server processes.
> ...
>> Client and server are dual Opteron 252 boxes with 8GB of RAM, running Debian
>> in 64-bit mode.
>
> Scale your hardware setup down by a factor of 2-4, leave only one apache
> process, and try to get the same numbers - we are not talking about how to
> create a perfect web server; instead we are trying to focus on possible
> problems in the epoll/kevent event-driven logic.

For apache I agree that the target box was maxed out, so if you only had a
single core on your AMD64 box that would be about half. However, thttpd is
only using ~1 of the CPUs (OS overhead is using just a smidge of the second),
and overall the box is 45-48% idle.

If the amount of RAM is an issue then you are swapping in your tests (or at
least throwing out cache that you need) and so would not be testing what you
think you are.

> Vanilla (epoll) lighttpd shows 4000-5000 requests per second in my setup (no logs).
> Default mpm-apache2 with bunch of threads - about 8k req/s.
> Default thttpd (disabled logging) - about 2k req/s
>
> Btw, all your tests are network bound, try to decrease
> html page size to get actual event processing speed out of that machines.

Running the same test retrieving a ~128-byte file, the server never gets below
51% idle (so it's only using one CPU):

Server Software:        thttpd/2.23beta1
Server Hostname:        208.2.188.5
Server Port:            81

Document Path:          /128b
Document Length:        136 bytes

Concurrency Level:      8000
Time taken for tests:   9.372902 seconds
Complete requests:      80000
Failed requests:        0
Write errors:           0
Total transferred:      30762842 bytes
HTML transferred:       10952216 bytes
Requests per second:    8535.24 [#/sec] (mean)
Time per request:       937.290 [ms] (mean)
Time per request:       0.117 [ms] (mean, across all concurrent requests)
Transfer rate:          3205.09 [Kbytes/sec] received

Connection Times (ms)
               min  mean[+/-sd] median   max
Connect:       36  287 1125.6     73    9109
Processing:    49   89  19.8     87     339
Waiting:       17   62  16.4     62     292
Total:         92  376 1137.4    159    9262

Percentage of the requests served within a certain time (ms)
   50%    159
   66%    164
   75%    165
   80%    165
   90%    203
   95%    260
   98%   3233
   99%   9201
  100%   9262 (longest request)

Note that this is showing the slowdown from the large concurrency level; if I
reduce the concurrency level to 500 I get:

Document Path:          /128b
Document Length:        136 bytes

Concurrency Level:      500
Time taken for tests:   4.215025 seconds
Complete requests:      80000
Failed requests:        0
Write errors:           0
Total transferred:      30565348 bytes
HTML transferred:       10881904 bytes
Requests per second:    18979.72 [#/sec] (mean)
Time per request:       26.344 [ms] (mean)
Time per request:       0.053 [ms] (mean, across all concurrent requests)
Transfer rate:          7081.33 [Kbytes/sec] received

Connection Times (ms)
               min  mean[+/-sd] median   max
Connect:        0   15 206.3      1    3006
Processing:     2    7   6.4      6     224
Waiting:        1    6   6.4      5     224
Total:          3   22 208.4      6    3229

Percentage of the requests served within a certain time (ms)
   50%      6
   66%      8
   75%     10
   80%     12
   90%     16
   95%     17
   98%     21
   99%     24
  100%   3229 (longest request)
loadtest2:/proc/sys#

Again with >50% idle on the server box.

Also, ab appears to only use a single CPU, so the fact that there are two on
the client box should not make a difference.

I will reboot these boxes into a UP kernel if you think that this still makes
a significant difference. Based on what I'm seeing I don't think it will
matter much (except for the apache test).

David Lang

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01  9:54                                 ` Ingo Molnar
  2007-03-01 10:59                                   ` Evgeniy Polyakov
@ 2007-03-01 19:19                                   ` Davide Libenzi
  1 sibling, 0 replies; 337+ messages in thread
From: Davide Libenzi @ 2007-03-01 19:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Evgeniy Polyakov, Pavel Machek, Theodore Tso, Linus Torvalds,
	Ulrich Drepper, Linux Kernel Mailing List, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Jens Axboe,
	Thomas Gleixner

On Thu, 1 Mar 2007, Ingo Molnar wrote:

> 
> * Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> 
> > I have posted kevent/epoll benchmarks and the related design issues too many 
> > times, both with handmade applications (which might be broken as hell) 
> > and with popular open-source servers, to repeat them again.
> 
> numbers are crucial here - and given the epoll bugs in the evserver code 
> that we found, do you have updated evserver benchmark results that 
> compare epoll to kevent? I'm wondering why epoll has half the speed of 
> kevent in those measurements - i suspect some possible benchmarking bug. 
> The queueing model of epoll and kevent is roughly comparable, both do 
> only a constant number of steps to serve one particular request, 
> regardless of how many pending connections/requests there are. What is 
> the CPU utilization of the server system during an epoll test, and what 
> is the CPU utilization during a kevent test? 100% utilized in both 
> cases?

With 8K concurrent (live) connections, we may also want to try with the v3 
version of the epoll-event-loops-diet patch ;)



- Davide



^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01  8:18                           ` Evgeniy Polyakov
  2007-03-01  9:26                             ` Pavel Machek
@ 2007-03-01 19:24                             ` Johann Borck
  2007-03-01 19:37                               ` David Lang
  1 sibling, 1 reply; 337+ messages in thread
From: Johann Borck @ 2007-03-01 19:24 UTC (permalink / raw)
  To: linux-kernel

On Thu, Mar 01, 2007 at 04:41:27PM +0100, Eric Dumazet wrote:
>> 
>> I had to loop on accept():
>> 
>>         for (i = 0; i < num; ++i) {
>>                 if (event[i].data.fd == main_server_s) {
>>                         do {
>>                                 err = evtest_callback_main(event[i].data.fd);
>>                         } while (err != -1);
>>                 } else {
>>                         err = evtest_callback_client(event[i].data.fd);
>>                 }
>>         }
>> 
>> Or else we can miss an event forever...
On Thu, 1 Mar 2007, Evgeniy Polyakov wrote:
>
> The same here - I would just have to enable debugging to track it down.
>


I reported this a while ago and suggested reporting the number of
pending accepts along with the event, to save that last syscall.
I created an ab replacement based on kevent, and at least with my
machines, which are comparable to each other, the load on the client
dropped from 100% to 2% or so. ab just doesn't give meaningful
results (if the client is not way more powerful). With that new client
I get very similar results for epoll and kevent, from 1000 through to
26000 concurrent requests. The results were posted on the
kevent homepage in October; I just checked with the new version, and
there's no significant difference.

this is the benchmark with kevent-based client:
http://tservice.net.ru/~s0mbre/blog/2006/10/11#2006_10_11
Btw, each result is an average over 1,000,000 requests.

and just for comparison, this is on the same machines using ab:
http://tservice.net.ru/~s0mbre/blog/2006/10/08#2006_10_08


^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 14:54                                                           ` Evgeniy Polyakov
  2007-03-01 15:09                                                             ` Ingo Molnar
@ 2007-03-01 19:31                                                             ` Davide Libenzi
  2007-03-02  8:10                                                               ` Evgeniy Polyakov
  1 sibling, 1 reply; 337+ messages in thread
From: Davide Libenzi @ 2007-03-01 19:31 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ingo Molnar, Eric Dumazet, Pavel Machek, Theodore Tso,
	Linus Torvalds, Ulrich Drepper, Linux Kernel Mailing List,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, David S. Miller, Suparna Bhattacharya, Jens Axboe,
	Thomas Gleixner

On Thu, 1 Mar 2007, Evgeniy Polyakov wrote:

> Ingo, do you really think I will send mails with faked benchmarks? :))

I don't think he ever implied that. He was only suggesting that when you
post benchmarks, and even more when you make claims based on benchmarks,
you need to be extra careful about what you measure. Otherwise the
external view that you give to others does not look good.
Kevent can really be faster than epoll, but if you post broken benchmarks
(that can be: unreliable HTTP loaders, broken server implementations,
etc.) and make claims based on that, the only effect that you have is to
lose your point.



- Davide



^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 19:24                             ` Johann Borck
@ 2007-03-01 19:37                               ` David Lang
  2007-03-01 20:34                                 ` Johann Borck
  0 siblings, 1 reply; 337+ messages in thread
From: David Lang @ 2007-03-01 19:37 UTC (permalink / raw)
  To: Johann Borck; +Cc: linux-kernel


On Thu, 1 Mar 2007, Johann Borck wrote:

> I reported this a while ago and suggested reporting the number of pending
> accepts along with the event, to save that last syscall.
> I created an ab replacement based on kevent, and at least with my machines,
> which are comparable to each other, the load on the client dropped from 100%
> to 2% or so. ab just doesn't give meaningful results (if the client is
> not way more powerful). With that new client I get very similar results for
> epoll and kevent, from 1000 through to 26000 concurrent requests. The results
> were posted on the kevent homepage in October; I just checked with the new
> version, and there's no significant difference.
>
> this is the benchmark with kevent-based client:
> http://tservice.net.ru/~s0mbre/blog/2006/10/11#2006_10_11
> btw, each result is average over 1,000,000 requests
>
> and just for comparison, this is on the same machines using ab:
> http://tservice.net.ru/~s0mbre/blog/2006/10/08#2006_10_08

Is this client available? And what patches need to be added to the kernel to
use it?

David Lang

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 15:32                                                     ` Eric Dumazet
  2007-03-01 15:41                                                       ` Eric Dumazet
  2007-03-01 15:47                                                       ` Evgeniy Polyakov
@ 2007-03-01 19:47                                                       ` Davide Libenzi
  2 siblings, 0 replies; 337+ messages in thread
From: Davide Libenzi @ 2007-03-01 19:47 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Evgeniy Polyakov, Ingo Molnar, Pavel Machek, Theodore Tso,
	Linus Torvalds, Ulrich Drepper, Linux Kernel Mailing List,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, David S. Miller, Suparna Bhattacharya, Jens Axboe,
	Thomas Gleixner



Oh boy, wasn't this thread supposed to focus on syslets/threadlets ... :)



On Thu, 1 Mar 2007, Eric Dumazet wrote:

> On Thursday 01 March 2007 16:23, Evgeniy Polyakov wrote:
> > They are there, since ab runs only 50k requests.
> > If I change it to something noticeably more than 50/80k, ab crashes:
> > # ab -c8000 -t 600 -n800000000 http://192.168.0.48/
> > This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
> > Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
> > Copyright 2006 The Apache Software Foundation, http://www.apache.org/
> >
> > Benchmarking 192.168.0.48 (be patient)
> > Segmentation fault
> >
> > Are there any other tools suitable for such loads?
> > I have only tested httperf (which is worse, since it uses poll/select) and
> > 'ab'.
> >
> > Btw, the host machine runs at 100% too, so it is possible that the client
> > side is broken (too).
> 
> I have similar problems here, the ab test just doesn't complete...
> 
> I am still investigating with strace and tcpdump.
> 
> In the meantime you could just rewrite it (based on epoll please :) ), since 
> it should be quite easy to do this (reverse of evserver_epoll)

I have a simple one based on coroutines and epoll. You need libpcl and
coronet. Debian has a package named libpcl1-dev for libpcl, otherwise:

http://www.xmailserver.org/libpcl.html

and 'configure --prefix=/usr && sudo make install'.
Coronet is here:

http://www.xmailserver.org/coronet-lib.html

Here, just 'configure && make'.
Inside the "test" directory there is a simple loader named cnhttpload:

  cnhttpload -s HOST -n NCON [-p PORT (80)] [-r NREQS (1)] [-S STKSIZE (8192)]
             [-M MAXCONNS] [-t TMUPD (1000)] [-a NACTIVE] [-T TMSAMP (200)]
             [-h] URL ...

HOST      = Target host
PORT      = Target host port
NCON      = Number of connections to the server
NACTIVE   = Number of active (live) connections
STKSIZE   = Stack size for coroutines
NREQS     = Number of requests done for each connection (better be 1 if 
            your server does not support keep-alive)
MAXCONNS  = Maximum number of total connections done to the server. If not 
            set, the test will continue forever (well, till a ^C)
TMUPD     = Millisec time of stats update
TMSAMP    = Millisec internal average-update time
URL       = Target doc (not http:// or host, just doc path)

So for the particular test my inbox was flooded with :), you'd use:

cnhttpload -s HOST -n 80000 -a 8000 -S 4096




- Davide



^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 19:37                               ` David Lang
@ 2007-03-01 20:34                                 ` Johann Borck
  0 siblings, 0 replies; 337+ messages in thread
From: Johann Borck @ 2007-03-01 20:34 UTC (permalink / raw)
  To: David Lang; +Cc: linux-kernel

David Lang wrote:
>
> On Thu, 1 Mar 2007, Johann Borck wrote:
>
>> I reported this a while ago and suggested reporting the number of
>> pending accepts along with the event, to save that last syscall.
>> I created an ab replacement based on kevent, and at least with my
>> machines, which are comparable to each other, the load on the client
>> dropped from 100% to 2% or so. ab just doesn't give meaningful
>> results (if the client is not way more powerful). With that new
>> client I get very similar results for epoll and kevent, from 1000
>> through to 26000 concurrent requests. The results were posted on the
>> kevent homepage in October; I just checked with the new version, and
>> there's no significant difference.
>>
>> this is the benchmark with kevent-based client:
>> http://tservice.net.ru/~s0mbre/blog/2006/10/11#2006_10_11
>> btw, each result is average over 1,000,000 requests
>>
>> and just for comparison, this is on the same machines using ab:
>> http://tservice.net.ru/~s0mbre/blog/2006/10/08#2006_10_08
>
> Is this client available? And what patches need to be added to the 
> kernel to use it?
>
It's based on an older version of kevent, so I'll have to adapt it a bit
for use with the recent patch; nothing other than kevent is necessary. I'll
post a link when it's cleaned up, if you want.

Johann
> David Lang
>


^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-02-28 23:12                 ` Ingo Molnar
  2007-03-01  1:33                   ` Andrea Arcangeli
@ 2007-03-01 21:27                   ` Linus Torvalds
  1 sibling, 0 replies; 337+ messages in thread
From: Linus Torvalds @ 2007-03-01 21:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Davide Libenzi, Ulrich Drepper, Linux Kernel Mailing List,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, Evgeniy Polyakov, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner



On Thu, 1 Mar 2007, Ingo Molnar wrote:
> 
> wrt. one-shot syscalls, the user-space stack footprint would still 
> probably be there, because even async contexts that only do single-shot 
> processing need to drop out of kernel mode to handle signals.

Why?

The easiest thing to do with signals is to just not pick them up. If the 
signal was to that *particular* threadlet (ie a "cancel"), then we just 
want to kill the threadlet. And if the signal was to the thread group, 
there is no reason why the threadlet should pick it up.

In neither case is there *any* reason to handle the signal in the 
threadlet, afaik.

And having to have a stack allocation for each threadlet certainly means 
that you complicate things a lot. Suddenly you have allocations that can't 
just go away. Again, I'm pointing to the problems I already pointed out 
with the allocations of the atom structures - quite often you do *not* 
want to keep track of anything specific for completion time, and that 
means that you MUST NOT have to de-allocate anything either.

Again, think aio_read(). With the *exact* current binary interface. 
PLEASE. If you cannot emulate that with threadlets, then threadlets are 
*pointless*. One of the major reasons for the whole exercise was to get 
rid of the special code in fs/aio.c.
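
For concreteness, here is a purely illustrative sketch of such an emulation:
one IOCB_CMD_PREAD request serviced by a threadlet-style function. The
struct iocb field names are from linux/aio_abi.h; threadlet_complete() is
the call described earlier in this thread; report_completion() is an assumed
stand-in for posting the result to the existing io_getevents() ring:

        long iocb_pread_fn(void *data)
        {
                struct iocb *iocb = data;
                ssize_t ret;

                ret = pread(iocb->aio_fildes,
                            (void *)(unsigned long)iocb->aio_buf,
                            iocb->aio_nbytes, iocb->aio_offset);
                report_completion(iocb, ret);   /* assumed helper */
                return threadlet_complete();
        }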

So I repeat: if you cannot do that, and remain binary compatible, don't 
even bother.

		Linus

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 19:31                                                             ` Davide Libenzi
@ 2007-03-02  8:10                                                               ` Evgeniy Polyakov
  2007-03-02 17:13                                                                 ` Davide Libenzi
  0 siblings, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-03-02  8:10 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Ingo Molnar, Eric Dumazet, Pavel Machek, Theodore Tso,
	Linus Torvalds, Ulrich Drepper, Linux Kernel Mailing List,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, David S. Miller, Suparna Bhattacharya, Jens Axboe,
	Thomas Gleixner

On Thu, Mar 01, 2007 at 11:31:14AM -0800, Davide Libenzi (davidel@xmailserver.org) wrote:
> On Thu, 1 Mar 2007, Evgeniy Polyakov wrote:
> 
> > Ingo, do you really think I will send mails with faked benchmarks? :))
> 
> I don't think he ever implied that. He was only suggesting that when you 
> post benchmarks, and even more when you make claims based on benchmarks, 
> you need to be extra carefull about what you measure. Otherwise the 
> external view that you give to others does not look good.
> Kevent can be really faster than epoll, but if you post broken benchmarks 
> (that can be, unrealiable HTTP loaders, broken server implemenations, 
> etc..) and make claims based on that, the only effect that you have is to 
> lose your point.
 
We seem to have moved far away from the original topic - I never built any
argument on top of kevent _performance_ - kevent is a logical
extrapolation of epoll. I only showed that an event-driven model can be
fast and that it outperforms the threadlet one - after we changed topic we
were unable to actually test threadlets in a networking environment, since
the only test I ran showed that threadlets do not reschedule at all, and
Ingo's tests showed a small number of reschedulings.

So, I only said that kevent is superior to epoll because (and this is
the _main_ issue) of its ability to handle essentially any kind of
event with very small overhead (the same as epoll has in struct file -
a list and a spinlock) and without the significant price of binding a
struct file to each event.

I did not want and do not want to hurt anyone (even Ingo, although he is
against kevent :), but in my opinion the thread has moved from a nice
discussion about threads and events, with jokes and fun, into quite angry
word-throwing, and that is not good - let's make it fun again.
I'm not a native English speaker (and do not use a dictionary), so it is
quite possible that some of my phrases were not exactly nice, but that was
unintentional (at least mostly) :)

Peace?

> - Davide
> 

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 11:18                                   ` Evgeniy Polyakov
@ 2007-03-02 10:27                                     ` Pavel Machek
  2007-03-02 10:37                                       ` Evgeniy Polyakov
  0 siblings, 1 reply; 337+ messages in thread
From: Pavel Machek @ 2007-03-02 10:27 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Theodore Tso, Ingo Molnar, Linus Torvalds, Ulrich Drepper,
	linux-kernel, Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	Alan Cox, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

Hi!

> > > > If you can replace them with something simpler, and no worse than 10%
> > > > slower in worst case, then go ahead. (We actually tried to do that at
> > > > some point, only to realize that efence stresses vm subsystem in very
> > > > unexpected/unfriendly way).
> > > 
> > > Agh, only 10% in the worst case.
> > > I think you cannot even imagine what tricks the network code uses to get
> > > at least an additional 1% out of the box.
> > 
> > Yep? Feel free to rewrite networking in assembly on Eugenix. That
> > should get you a 1% improvement. If you reserve a few registers to be only
> > used by the kernel (not allowed for userspace), you can speed up networking
> > by 5%, too. Ouch, and you could turn off the MMU, that is a sure way to get
> > a few more percent improvement in your networking case.
> 
> It is not _my_ networking, but the one you use every day in every Linux
> box. Notice which tricks are used to remove a single byte from
> sk_buff.

Ok, so the tricks were worth it in the sk_buff case.

> It is called optimization, and if it gives us even a single plus it must be
> implemented. Not all people have a magical fear of new things.

But that does not mean "every optimization must be
implemented". Only optimizations that are "worth it" are... 

> > > Using such logic you can just abandon any further development, since it
> > > works as-is right now.
> > 
> > Stop trying to pervert my logic.
> 
> Ugh? :)
> I am just saying in simple words your 'we do not need something if it adds
> 10% but is complex to understand'.

Yes... but that does not mean "stop development". You are still free
to clean up the code _while_ making it faster.

> > If your code is so complex that it is almost impossible to use from
> > userspace, that is good enough reason not to be merged. "But it is 3%
> > faster if..." is not a good-enough argument.
> 
> Is it enough for you?
> 
> epoll           4794.23 req/sec
> kevent          6468.95 req/sec

Maybe. It is not up to me to decide. But "it is faster" is _not_ the
only merge criterion.
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-02 10:27                                     ` Pavel Machek
@ 2007-03-02 10:37                                       ` Evgeniy Polyakov
  2007-03-02 10:56                                         ` Ingo Molnar
  0 siblings, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-03-02 10:37 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Theodore Tso, Ingo Molnar, Linus Torvalds, Ulrich Drepper,
	linux-kernel, Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	Alan Cox, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

On Fri, Mar 02, 2007 at 11:27:14AM +0100, Pavel Machek (pavel@ucw.cz) wrote:
> Maybe. It is not up to me to decide. But "it is faster" is _not_ the
> only merge criterion.

Of course not!
Even if kevent only has the same speed, it still allows handling _any_ kind
of event without any major surgery - a very tiny structure of a lock and a
list head - and you can process your own kernel events in userspace, along
with timers, signals, IO events, private userspace events and others, without
races and without inventing different hacks for different types -
_this_ is the main point.
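
For illustration, the "very tiny structure" in question is roughly of this
shape in the kevent patches (a simplified sketch, not the exact definition
from the patchset):

        /* per-object event storage embedded into kernel structures;
         * compare with the list+spinlock that epoll keeps in struct file */
        struct kevent_storage {
                void                    *origin; /* object it is embedded in */
                spinlock_t              lock;    /* protects the list below */
                struct list_head        list;    /* kevents bound to this object */
        };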

> 									Pavel
> -- 
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-02 10:37                                       ` Evgeniy Polyakov
@ 2007-03-02 10:56                                         ` Ingo Molnar
  2007-03-02 11:08                                           ` Evgeniy Polyakov
  0 siblings, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-03-02 10:56 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Pavel Machek, Theodore Tso, Linus Torvalds, Ulrich Drepper,
	linux-kernel, Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	Alan Cox, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner


* Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> Even if kevent only has the same speed, it still allows handling _any_
> kind of event without any major surgery - a very tiny structure of a
> lock and a list head - and you can process your own kernel events in
> userspace, along with timers, signals, IO events, private userspace events
> and others, without races and without inventing different hacks for
> different types - _this_ is the main point.

did it ever occur to you to ... extend epoll? To speed it up? To add a 
new wait syscall to it? Instead of introducing a whole new parallel 
framework?

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-01 15:36                                                               ` Evgeniy Polyakov
@ 2007-03-02 10:57                                                                 ` Ingo Molnar
  2007-03-02 11:48                                                                   ` Evgeniy Polyakov
  2007-03-02 17:32                                                                   ` Davide Libenzi
  0 siblings, 2 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-03-02 10:57 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Eric Dumazet, Pavel Machek, Theodore Tso, Linus Torvalds,
	Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner


* Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> > > > [...] The numbers are still highly suspect - and we are already 
> > > > down from the prior claim of kevent being almost twice as fast 
> > > > to a 25% difference.
> > >
> > > > Btw, there never was an almost-twice performance increase - epoll in
> > > > my tests always showed 4-5 thousand requests per second, kevent -
> > > > up to 7 thousand.
> > 
> > i'm referring to your claim in this mail of yours from 4 days ago 
> > for example:
> > 
> >   http://lkml.org/lkml/2007/2/25/116
> > 
> >  "But note, that on my athlon64 3500 test machine kevent is about 7900
> >   requests per second compared to 4000+ epoll, so expect a challenge."
> > 
> > no matter how i look at it, but 7900 is 1.9 times 4000 - which is 
> > "almost twice".
> 
> After your changes epoll increased to 5k.

Can we please stop this pointless episode of benchmarketing, where every 
mail of yours shows different results and you even deny having said 
something which you clearly said just a few days ago? At this point i 
simply cannot trust the numbers you are posting, nor is the discussion 
style you are following productive in any way in my opinion.

(you are never ever wrong, and if you are proven wrong on topic A you 
claim it is an irrelevant topic (without even admitting you were wrong 
about it) and you point to topic B claiming it's the /real/ topic you 
talked about all along. And along the way you are slandering other 
projects like epoll and threadlets, distorting the discussion. This kind 
of keep-the-ball-moving discussion style is effective in politics but 
IMO it's a waste of time when developing a kernel.)

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-02 10:56                                         ` Ingo Molnar
@ 2007-03-02 11:08                                           ` Evgeniy Polyakov
  2007-03-02 17:28                                             ` Davide Libenzi
  0 siblings, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-03-02 11:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pavel Machek, Theodore Tso, Linus Torvalds, Ulrich Drepper,
	linux-kernel, Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	Alan Cox, Zach Brown, David S. Miller, Suparna Bhattacharya,
	Davide Libenzi, Jens Axboe, Thomas Gleixner

On Fri, Mar 02, 2007 at 11:56:18AM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> 
> > Even if kevent only has the same speed, it still allows handling _any_
> > kind of event without any major surgery - a very tiny structure of a
> > lock and a list head - and you can process your own kernel events in
> > userspace, along with timers, signals, IO events, private userspace events
> > and others, without races and without inventing different hacks for
> > different types - _this_ is the main point.
> 
> did it ever occur to you to ... extend epoll? To speed it up? To add a 
> new wait syscall to it? Instead of introducing a whole new parallel 
> framework?

Yes, I thought about extending it more than a year ago, before I started
kevent, but epoll() is fundamentally based on the file structure and its
file_operations poll method, so it is quite impossible to work
with sockets to implement network AIO. Eventually it would have gathered a
lot of other subsystems - do we really want to have per-process signalfs,
timerfs and so on? - since each simple structure must be bound to a file,
which becomes too costly.
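
The dependency being referred to is visible in the kernel interface itself:
epoll can only watch objects backed by a struct file, because readiness is
queried through the file's poll method (abridged sketch of the 2.6-era hook):

        /* from struct file_operations in include/linux/fs.h */
        unsigned int (*poll) (struct file *, struct poll_table_struct *);

        /* epoll, select and poll all obtain readiness roughly as: */
        revents = file->f_op->poll(file, &pt);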

> 	Ingo

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-02 10:57                                                                 ` Ingo Molnar
@ 2007-03-02 11:48                                                                   ` Evgeniy Polyakov
  2007-03-02 17:32                                                                   ` Davide Libenzi
  1 sibling, 0 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-03-02 11:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Eric Dumazet, Pavel Machek, Theodore Tso, Linus Torvalds,
	Ulrich Drepper, linux-kernel, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Davide Libenzi,
	Jens Axboe, Thomas Gleixner

On Fri, Mar 02, 2007 at 11:57:13AM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> 
> > > > > [...] The numbers are still highly suspect - and we are already 
> > > > > down from the prior claim of kevent being almost twice as fast 
> > > > > to a 25% difference.
> > > >
> > > > Btw, there never was an almost-twice performance increase - epoll in
> > > > my tests always showed 4-5 thousand requests per second, kevent -
> > > > up to 7 thousand.
> > > 
> > > i'm referring to your claim in this mail of yours from 4 days ago 
> > > for example:
> > > 
> > >   http://lkml.org/lkml/2007/2/25/116
> > > 
> > >  "But note, that on my athlon64 3500 test machine kevent is about 7900
> > >   requests per second compared to 4000+ epoll, so expect a challenge."
> > > 
> > > no matter how i look at it, but 7900 is 1.9 times 4000 - which is 
> > > "almost twice".
> > 
> > After your changes epoll increased to 5k.
> 
> Can we please stop this pointless episode of benchmarketing, where every 
> mail of yours shows different results and you even deny having said 
> something which you clearly said just a few days ago? At this point i 
> simply cannot trust the numbers you are posting, nor is the discussion 
> style you are following productive in any way in my opinion.

I just show what I see in tests - I do not perform a deep analysis of
them, since I do not see why it should be done - it is not fake, it is
not fantasy - it is real behaviour observed on my test machine, and if it
suddenly changes I will report it.
Btw, I showed cases where epoll behaved better than kevent and
performance was an unbeatable 9k requests per second - I do not know why
it happened - maybe some cache-related issues, all the other processes
sleeping at once, increased radiation or a strong wind blowing away my bad
aura - it is not reproducible on demand either.

> (you are never ever wrong, and if you are proven wrong on topic A you 
> claim it is an irrelevant topic (without even admitting you were wrong 
> about it) and you point to topic B claiming it's the /real/ topic you 
> talked about all along. And along the way you are slandering other 
> projects like epoll and threadlets, distorting the discussion. This kind 
> of keep-the-ball-moving discussion style is effective in politics but 
> IMO it's a waste of time when developing a kernel.)

Heh - that is why I'm not subscribed to lkml@ - it too frequently ends
up in politics :)

What are we talking about - are we trying to insult each other over something
that was supposed to be said as part of a theoretical mental
exercise? I can only laugh at that :)

Ingo, I never ever tried to show that something is broken - that is a
fantasy based on the bare words, not on the real intention.

I never said epoll is broken. Absolutely.

I never said threadlet is broken. Absolutely.

I just showed that it is not (in my opinion) the right decision to use
threadlets for an IO-based model instead of an event-driven one - this is not
based on kevent performance (I _never_ stated that as a main factor - kevent
was only an example of an event-driven model; you confused it with kevent
AIO, which is a different beast), but instead on experience with nptl
threads and linuxthreads, and the related rescheduling overhead compared to
the userspace one.

I showed kevent as a possible usage scenario - since it does support its own
AIO. And you started to fight against it in every detail, since you
think kevent is not a good way to handle the AIO model - well, that can be
perfectly correct. I showed kevent AIO (please do not think that kevent
and kevent AIO are the same - the latter is just one of the possible
users I implemented; it only uses kevent to deliver completion events to
userspace) as a possible AIO implementation, but not _kevent_ itself.

But somehow we ended up with words being attributed to me that I never said
and ideas I never based my assumptions on... I do not really think you even
remotely wanted to make anything personal out of what we had
discussed.

We even concluded that a perfect IO model should use both approaches to
really scale - both threadlets with their on-demand-only rescheduling, and an
event-driven ring.
You stated your opinion on kevents - well, I cannot agree with it, but
it is your right not to like something.

Let's not continue the bad practice of kicking each other just because there
were some problematic roots which no one even remembers correctly - let's
not make the mistake of turning trivial bits into something personal
- and if you are in Russia or around any time soon I will happily buy you
a beer or whatever you prefer :)

So, let's just draw a line:
kevent was shown to people, and its performance, although flaky, is a
bit faster than epoll's. Threadlets bound to any event-driven ring show
no performance degradation in a network-driven setup with a small number
of reschedulings, with all the advantages of simpler programming.
So, repeating myself, both models (not kevent and threadlets
specifically, but event-driven and thread-based) should be used to
achieve maximum performance.

So, I will make yet another request for peace and fun and interesting
technical discussion. Peace?

Anyway, that was an interesting discussion - thanks a lot to all
participants :)

> Ingo

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-02  8:10                                                               ` Evgeniy Polyakov
@ 2007-03-02 17:13                                                                 ` Davide Libenzi
  2007-03-02 19:13                                                                   ` Davide Libenzi
  2007-03-03 10:06                                                                   ` Evgeniy Polyakov
  0 siblings, 2 replies; 337+ messages in thread
From: Davide Libenzi @ 2007-03-02 17:13 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ingo Molnar, Eric Dumazet, Pavel Machek, Theodore Tso,
	Linus Torvalds, Ulrich Drepper, Linux Kernel Mailing List,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, David S. Miller, Suparna Bhattacharya, Jens Axboe,
	Thomas Gleixner

On Fri, 2 Mar 2007, Evgeniy Polyakov wrote:

> On Thu, Mar 01, 2007 at 11:31:14AM -0800, Davide Libenzi (davidel@xmailserver.org) wrote:
> > On Thu, 1 Mar 2007, Evgeniy Polyakov wrote:
> > 
> > > Ingo, do you really think I will send mails with faked benchmarks? :))
> > 
> > I don't think he ever implied that. He was only suggesting that when you 
> > post benchmarks, and even more when you make claims based on benchmarks, 
> > you need to be extra careful about what you measure. Otherwise the 
> > external view that you give to others does not look good.
> > Kevent can be really faster than epoll, but if you post broken benchmarks 
> > (that can be: unreliable HTTP loaders, broken server implementations, 
> > etc..) and make claims based on that, the only effect that you have is to 
> > lose your point.
>  
> So, I only said that kevent is superior to epoll because (and
> it is the _main_ issue) of its ability to handle essentially any kind of
> event with very small overhead (the same as epoll has in struct file -
> a list and a spinlock) and without the significant price of binding
> struct file to the event.

You have to excuse me if my memory is bad, but IIRC the whole discussion 
and long benchmark feast was born with you throwing a benchmark at Ingo 
(with kevent showing a 1.9x performance boost WRT epoll), not with you 
making any other point.
As far as epoll not being able to handle other events: said who? Of 
course, with zero modifications, you can handle zero additional events. 
With modifications, you can handle other events. But let's talk about those 
other events. The *only* kind of event that ppl (and being the epoll 
maintainer I tend to receive those requests) missed in epoll was AIO 
events. That's the *only* thing that was missed by real life application 
developers. And if something like threadlets/syslets proves effective, 
the gap is closed WRT that requirement.
Epoll already handles the whole class of pollable devices inside the 
kernel, and if you exclude block AIO, that's a pretty wide class already. 
The *existing* f_op->poll subsystem can be used to deliver events at the 
poll-head wakeup time (by using the "key" member of the poll callback), so 
that you don't even need the extra f_op->poll call to fetch events.
And if you really feel raw about the single O(nready) loop that epoll 
currently does, a new epoll_wait2 (or whatever) API could be used to 
deliver the event directly into a userspace buffer [1], directly from the 
poll callback, w/out extra delivery loops (IRQ/event->epoll_callback->event_buffer).


[1] From the epoll callback, we cannot sleep, so it's gonna be either an 
    mlocked userspace buffer, or some kernel pages mapped to userspace.
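
A sketch of what such a key-based wakeup callback could look like
(invented names - this only illustrates the idea above, it is not the
current epoll code):

static int myev_wake_callback(wait_queue_t *wait, unsigned mode,
			      int sync, void *key)
{
	/* "key", when non-NULL, carries the POLL* mask passed by the
	 * wakeup site, so no extra f_op->poll() call is needed */
	unsigned long events = (unsigned long)key;
	struct myev_item *item = container_of(wait, struct myev_item, wait);

	if (key && !(events & item->interest_mask))
		return 0;		/* not an event we care about */

	myev_queue_ready(item, events);	/* hypothetical event queueing */
	return 1;
}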


- Davide



^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-02 11:08                                           ` Evgeniy Polyakov
@ 2007-03-02 17:28                                             ` Davide Libenzi
  2007-03-03 10:27                                               ` Evgeniy Polyakov
  0 siblings, 1 reply; 337+ messages in thread
From: Davide Libenzi @ 2007-03-02 17:28 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ingo Molnar, Pavel Machek, Theodore Tso, Linus Torvalds,
	Ulrich Drepper, Linux Kernel Mailing List, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Jens Axboe,
	Thomas Gleixner

On Fri, 2 Mar 2007, Evgeniy Polyakov wrote:

> do we really want to have per process signalfs, timerfs and so on - each 
> simple structure must be bound to a file, which becomes too costly.

I may be old school, but if you ask me, and if you *really* want those 
events, yes. Reason? Unix's everything-is-a-file rule, and being able to 
use them with *existing* POSIX poll/select. Remember, not every app 
requires huge scalability efforts, so working with simpler and familiar 
APIs is always welcome.
The *only* thing that was not practical to have as an fd was block requests. 
But maybe threadlets/syslets will handle those just fine, and close the gap.
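
To illustrate the point with a sketch ('timer_open_fd' and
'handle_tick' are invented names for a hypothetical timer-as-a-file
interface - everything else is plain existing POSIX):

#include <poll.h>

void wait_for_tick(void)
{
	/* hypothetical: timer-as-a-file, 500ms period */
	int tfd = timer_open_fd(500);
	struct pollfd pfd = { .fd = tfd, .events = POLLIN };

	/* plain old POSIX poll() - no new wait API needed */
	if (poll(&pfd, 1, -1) > 0 && (pfd.revents & POLLIN))
		handle_tick(tfd);
}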



- Davide



^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-02 10:57                                                                 ` Ingo Molnar
  2007-03-02 11:48                                                                   ` Evgeniy Polyakov
@ 2007-03-02 17:32                                                                   ` Davide Libenzi
  2007-03-02 19:39                                                                     ` Ingo Molnar
  1 sibling, 1 reply; 337+ messages in thread
From: Davide Libenzi @ 2007-03-02 17:32 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Evgeniy Polyakov, Eric Dumazet, Pavel Machek, Theodore Tso,
	Linus Torvalds, Ulrich Drepper, Linux Kernel Mailing List,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, David S. Miller, Suparna Bhattacharya, Jens Axboe,
	Thomas Gleixner

On Fri, 2 Mar 2007, Ingo Molnar wrote:

> > After your changes epoll increased to 5k.
> 
> Can we please stop this pointless episode of benchmarketing, where every 
> mail of yours shows different results and you even deny having said 
> something which you clearly said just a few days ago? At this point i 
> simply cannot trust the numbers you are posting, nor is the discussion 
> style you are following productive in any way in my opinion.

Agreed. Can we focus on the topic here? We're still missing proper FPU 
context switch in the move_user_context(). In v6?


- Davide



^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-02 17:13                                                                 ` Davide Libenzi
@ 2007-03-02 19:13                                                                   ` Davide Libenzi
  2007-03-03 10:06                                                                   ` Evgeniy Polyakov
  1 sibling, 0 replies; 337+ messages in thread
From: Davide Libenzi @ 2007-03-02 19:13 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ingo Molnar, Eric Dumazet, Pavel Machek, Theodore Tso,
	Linus Torvalds, Ulrich Drepper, Linux Kernel Mailing List,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, David S. Miller, Suparna Bhattacharya, Jens Axboe,
	Thomas Gleixner

On Fri, 2 Mar 2007, Davide Libenzi wrote:

> And if you really feel raw about the single O(nready) loop that epoll 
> currently does, a new epoll_wait2 (or whatever) API could be used to 
> deliver the event directly into a userspace buffer [1], directly from the 
> poll callback, w/out extra delivery loops (IRQ/event->epoll_callback->event_buffer).

And if you ever wonder where the "epoll" name came from, it came from the 
old /dev/epoll. The epoll predecessor, /dev/epoll, was adding plugs 
wherever events were needed and was delivering those events in O(1), 
*directly* into a user visible (mmap'd) buffer, in a zero-copy fashion.
The old /dev/epoll was faster than the current epoll, but the latter was 
chosen because, despite being slightly slower, it had support for every 
pollable device, *without* adding more plugs into the existing code.
Performance and code maintenance are not to be taken disjointly whenever 
you evaluate a solution. That's the reason I got excited about this new 
generic AIO solution.
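
The zero-copy idea, sketched from the userspace side (the layout here
is invented for illustration, it is not the actual /dev/epoll ABI):

/* the kernel writes events into a ring that userspace mmap()s and
 * drains without any copy or extra syscall */
struct ring_hdr {
	volatile unsigned int head;	/* advanced by the kernel */
	volatile unsigned int tail;	/* advanced by userspace  */
};

void drain_ring(struct ring_hdr *hdr, struct my_event *slot,
		unsigned int nslots)
{
	while (hdr->tail != hdr->head) {
		consume_event(&slot[hdr->tail % nslots]);
		hdr->tail++;
	}
}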



- Davide



^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-02 17:32                                                                   ` Davide Libenzi
@ 2007-03-02 19:39                                                                     ` Ingo Molnar
  2007-03-02 20:18                                                                       ` Davide Libenzi
  0 siblings, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-03-02 19:39 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: linux-kernel, Arjan van de Ven, Linus Torvalds


* Davide Libenzi <davidel@xmailserver.org> wrote:

> [...] We're still missing proper FPU context switch in the 
> move_user_context(). [...]

yeah - i'm starting to be of the opinion that the FPU context should 
stay with the threadlet, exclusively. I.e. when calling a threadlet, the 
'outer loop' (the event loop) should not leak FPU context into the 
threadlet and then expect it to be replicated from whatever random point 
the threadlet ended up sleeping at. It would be possible, but it just 
makes no sense. What makes most sense is to just keep the FPU context 
with the threadlet, and to let the 'new head' use an initial (unused) 
FPU context. And it's in fact the threadlet that will most likely have 
an active FPU context across a system call, not the outer loop. In other 
words: no special FPU support needed at all for threadlets (i.e. no 
flipping needed even) - this behavior just naturally happens in the 
current implementation. Hm?

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-02 19:39                                                                     ` Ingo Molnar
@ 2007-03-02 20:18                                                                       ` Davide Libenzi
  2007-03-02 20:29                                                                         ` Ingo Molnar
  0 siblings, 1 reply; 337+ messages in thread
From: Davide Libenzi @ 2007-03-02 20:18 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Linux Kernel Mailing List, Arjan van de Ven, Linus Torvalds

On Fri, 2 Mar 2007, Ingo Molnar wrote:

> 
> * Davide Libenzi <davidel@xmailserver.org> wrote:
> 
> > [...] We're still missing proper FPU context switch in the 
> > move_user_context(). [...]
> 
> yeah - i'm starting to be of the opinion that the FPU context should 
> stay with the threadlet, exclusively. I.e. when calling a threadlet, the 
> 'outer loop' (the event loop) should not leak FPU context into the 
> threadlet and then expect it to be replicated from whatever random point 
> the threadlet ended up sleeping at. It would be possible, but it just 
> makes no sense. What makes most sense is to just keep the FPU context 
> with the threadlet, and to let the 'new head' use an initial (unused) 
> FPU context. And it's in fact the threadlet that will most likely have 
> an active FPU context across a system call, not the outer loop. In other 
> words: no special FPU support needed at all for threadlets (i.e. no 
> flipping needed even) - this behavior just naturally happens in the 
> current implementation. Hm?

I think that the "dirty" FPU context must, at least, follow the new head. 
That's what the userspace sees, and you don't want an async_exec to 
re-emerge with a different FPU context.
I think it should also follow the async thread (old, going-to-sleep, 
thread), since a threadlet might have that dirtied, and as a consequence 
it'll want to find it back when it's re-scheduled.
So, IMO, if the USEDFPU bit is set, we need to sync the dirty FPU context 
with an early unlazy_fpu(), *and* copy the sync'd FPU context to the new head.
This should really be a fork of the dirty FPU context IMO, and should only 
happen if the USEDFPU bit is set.
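
In pseudo-kernel code the proposal would look roughly like this (a
sketch, not a patch - 'prev' is the blocking task, 'new_head' the task
picked from the pool, i386-style names assumed):

if (task_thread_info(prev)->status & TS_USEDFPU) {
	unlazy_fpu(prev);	/* sync the live FPU regs into prev */
	/* fork the sync'd FPU context over to the new head */
	new_head->thread.i387 = prev->thread.i387;
}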



- Davide



^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-02 20:18                                                                       ` Davide Libenzi
@ 2007-03-02 20:29                                                                         ` Ingo Molnar
  2007-03-02 20:53                                                                           ` Davide Libenzi
  0 siblings, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-03-02 20:29 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Linux Kernel Mailing List, Arjan van de Ven, Linus Torvalds


* Davide Libenzi <davidel@xmailserver.org> wrote:

> I think that the "dirty" FPU context must, at least, follow the new 
> head. That's what the userspace sees, and you don't want an async_exec 
> to re-emerge with a different FPU context.

well. I think there's some confusion about terminology, so please let me 
describe everything in detail. This is how execution goes:

  outer loop() {
      call_threadlet();
  }

this all runs in the 'head' context. call_threadlet() always switches to 
the 'threadlet stack'. The 'outer context' runs in the 'head stack'. If, 
while executing the threadlet function, we block, then the 
threadlet-thread gets to keep the task (the threadlet stack and also the 
FPU), and blocks - and we pick a 'new head' from the thread pool and 
continue executing in that context - right after the call_threadlet() 
function, in the 'old' head's stack. I.e. it's as if we returned 
immediately from call_threadlet(), with a return code that signals that 
the 'threadlet went async'.

now, the FPU state as it was when the threadlet blocked is totally 
meaningless to the 'new head' - that FPU state is from the middle of the 
threadlet execution.

and here is where thinking about threadlets as a function call and not 
as an asynchronous context helps a lot: the classic gcc convention for 
FPU use & function calls should apply: gcc does not call an external 
function with an in-use FPU stack/register, it always neatly unuses it, 
as no FPU register is callee-saved, all are caller-saved.

> So, IMO, if the USEDFPU bit is set, we need to sync the dirty FPU 
> context with an early unlazy_fpu(), *and* copy the sync'd FPU context 
> to the new head. This should really be a fork of the dirty FPU context 
> IMO, and should only happen if the USEDFPU bit is set.

why? The only effect this will have is a slowdown :) The FPU context 
from the middle of the threadlet function is totally meaningless to the 
'new head'. It might be anything. (although in practice system calls are 
almost never called with a truly in-use FPU.)

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-02 20:29                                                                         ` Ingo Molnar
@ 2007-03-02 20:53                                                                           ` Davide Libenzi
  2007-03-02 21:21                                                                             ` Michael K. Edwards
  2007-03-02 21:43                                                                             ` Nicholas Miell
  0 siblings, 2 replies; 337+ messages in thread
From: Davide Libenzi @ 2007-03-02 20:53 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Linux Kernel Mailing List, Arjan van de Ven, Linus Torvalds

On Fri, 2 Mar 2007, Ingo Molnar wrote:

> 
> * Davide Libenzi <davidel@xmailserver.org> wrote:
> 
> > I think that the "dirty" FPU context must, at least, follow the new 
> > head. That's what the userspace sees, and you don't want an async_exec 
> > to re-emerge with a different FPU context.
> 
> well. I think there's some confusion about terminology, so please let me 
> describe everything in detail. This is how execution goes:
> 
>   outer loop() {
>       call_threadlet();
>   }
> 
> this all runs in the 'head' context. call_threadlet() always switches to 
> the 'threadlet stack'. The 'outer context' runs in the 'head stack'. If, 
> while executing the threadlet function, we block, then the 
> threadlet-thread gets to keep the task (the threadlet stack and also the 
> FPU), and blocks - and we pick a 'new head' from the thread pool and 
> continue executing in that context - right after the call_threadlet() 
> function, in the 'old' head's stack. I.e. it's as if we returned 
> immediately from call_threadlet(), with a return code that signals that 
> the 'threadlet went async'.
> 
> now, the FPU state as it was when the threadlet blocked is totally 
> meaningless to the 'new head' - that FPU state is from the middle of the 
> threadlet execution.

For threadlets, it might be. Now think about a task wanting to dispatch N 
parallel AIO requests as N independent syslets.
Think about this task having USEDFPU set, so the FPU context is dirty.
When it returns from async_exec, with one of the requests having become 
sleepy, it needs to have the same FPU context it had when it entered, 
otherwise it probably won't be happy.
For the same reason a schedule() must preserve/sync the "prev" FPU 
context, to be reloaded at the next FPU fault.




> > So, IMO, if the USEDFPU bit is set, we need to sync the dirty FPU 
> > context with an early unlazy_fpu(), *and* copy the sync'd FPU context 
> > to the new head. This should really be a fork of the dirty FPU context 
> > IMO, and should only happen if the USEDFPU bit is set.
> 
> why? The only effect this will have is a slowdown :) The FPU context 
> from the middle of the threadlet function is totally meaningless to the 
> 'new head'. It might be anything. (although in practice system calls are 
> almost never called with a truly in-use FPU.)

See above ;)



- Davide



^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-02 20:53                                                                           ` Davide Libenzi
@ 2007-03-02 21:21                                                                             ` Michael K. Edwards
  2007-03-02 21:43                                                                             ` Nicholas Miell
  1 sibling, 0 replies; 337+ messages in thread
From: Michael K. Edwards @ 2007-03-02 21:21 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Ingo Molnar, Linux Kernel Mailing List, Arjan van de Ven, Linus Torvalds

On 3/2/07, Davide Libenzi <davidel@xmailserver.org> wrote:
> For threadlets, it might be. Now think about a task wanting to dispatch N
> parallel AIO requests as N independent syslets.
> Think about this task having USEDFPU set, so the FPU context is dirty.
> When it returns from async_exec, with one of the requests having become
> sleepy, it needs to have the same FPU context it had when it entered,
> otherwise it probably won't be happy.
> For the same reason a schedule() must preserve/sync the "prev" FPU
> context, to be reloaded at the next FPU fault.

And if you actually think this through, I think you will arrive at (a
subset of) the conclusions I did a week ago: to keep the threadlets
lightweight enough to schedule and migrate cheaply, they can't be
allowed to "own" their own FPU and TLS context.  They have to be
allowed to _use_ the FPU (or they're useless) and to _use_ TLS (or
they can't use any glibc wrapper around a syscall, since they
practically all set the thread-local errno).  But they have to
"quiesce" the FPU and stash any thread-local state they want to keep
on their stack before entering the next syscall, or else it'll get
clobbered.
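
A sketch of that discipline as a hypothetical threadlet body ('struct
req' and 'log_error' are invented for illustration):

long my_request_fn(void *data)
{
	struct req *req = data;
	int saved_errno = 0;

	if (write(req->fd, req->buf, req->len) < 0)
		saved_errno = errno;	/* stash TLS errno on our stack now */

	fsync(req->fd);	/* may block: we may resume in another thread */

	if (saved_errno)
		log_error(req, saved_errno);
	return threadlet_complete();
}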

Keep thinking, especially about FPU flags, and you'll see why
threadlets spawned from the _same_ threadlet entrypoint should all run
in the same pool of threads, one per CPU, while threadlets from
_different_ entrypoints should never run in the same thread (FPU/TLS
context).  You'll see why threadlets in the same pool shouldn't be
permitted to preempt one another except at syscalls that block, and
the cost of preempting the real thread associated with one threadlet
pool with another real thread associated with a different threadlet
pool is the same as any other thread switch.  At which point,
threadlet pools are themselves first-class objects (to use the snake
oil phrase), and might as well be enhanced to a data structure that
has efficient operations for reprioritization, bulk cancellation, and
all that jazz.

Did I mention that there is actually quite a bit of prior art in this
area, which makes a much better guide to the design of round wheels
than micro-benchmarks do?

Cheers,
- Michael

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-02 20:53                                                                           ` Davide Libenzi
  2007-03-02 21:21                                                                             ` Michael K. Edwards
@ 2007-03-02 21:43                                                                             ` Nicholas Miell
  2007-03-03  0:52                                                                               ` Davide Libenzi
  1 sibling, 1 reply; 337+ messages in thread
From: Nicholas Miell @ 2007-03-02 21:43 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Ingo Molnar, Linux Kernel Mailing List, Arjan van de Ven, Linus Torvalds

On Fri, 2007-03-02 at 12:53 -0800, Davide Libenzi wrote:
> On Fri, 2 Mar 2007, Ingo Molnar wrote:
> 
> > 
> > * Davide Libenzi <davidel@xmailserver.org> wrote:
> > 
> > > I think that the "dirty" FPU context must, at least, follow the new 
> > > head. That's what the userspace sees, and you don't want an async_exec 
> > > to re-emerge with a different FPU context.
> > 
> > well. I think there's some confusion about terminology, so please let me 
> > describe everything in detail. This is how execution goes:
> > 
> >   outer loop() {
> >       call_threadlet();
> >   }
> > 
> > this all runs in the 'head' context. call_threadlet() always switches to 
> > the 'threadlet stack'. The 'outer context' runs in the 'head stack'. If, 
> > while executing the threadlet function, we block, then the 
> > threadlet-thread gets to keep the task (the threadlet stack and also the 
> > FPU), and blocks - and we pick a 'new head' from the thread pool and 
> > continue executing in that context - right after the call_threadlet() 
> > function, in the 'old' head's stack. I.e. it's as if we returned 
> > immediately from call_threadlet(), with a return code that signals that 
> > the 'threadlet went async'.
> > 
> > now, the FPU state as it was when the threadlet blocked is totally 
> > meaningless to the 'new head' - that FPU state is from the middle of the 
> > threadlet execution.
> 
> For threadlets, it might be. Now think about a task wanting to dispatch N 
> parallel AIO requests as N independent syslets.
> Think about this task having USEDFPU set, so the FPU context is dirty.
> When it returns from async_exec, with one of the requests having become 
> sleepy, it needs to have the same FPU context it had when it entered, 
> otherwise it probably won't be happy.
> For the same reason a schedule() must preserve/sync the "prev" FPU 
> context, to be reloaded at the next FPU fault.

The point Ingo was making is that the x86 ABI already requires the FPU
context to be saved before *all* function calls.

Unfortunately, this isn't true of other ABIs -- looking over the psABI
specs I have lying around, AMD64, PPC64, and MIPS require at least part
of the FPU state to be preserved across function calls, and I'm sure
this is also true of others.

Then there are the other nasty details of new thread creation --
thankfully, the contents of the TLS aren't inherited from the parent
thread, but they still need to be initialized; not to mention all the
other details involved in pthread creation and destruction.

I don't see any way around the pthread issues other than making a libc
upcall on return from the first system call that blocked.

-- 
Nicholas Miell <nmiell@comcast.net>


^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-02 21:43                                                                             ` Nicholas Miell
@ 2007-03-03  0:52                                                                               ` Davide Libenzi
  2007-03-03  1:36                                                                                 ` Nicholas Miell
  0 siblings, 1 reply; 337+ messages in thread
From: Davide Libenzi @ 2007-03-03  0:52 UTC (permalink / raw)
  To: Nicholas Miell
  Cc: Ingo Molnar, Linux Kernel Mailing List, Arjan van de Ven, Linus Torvalds

On Fri, 2 Mar 2007, Nicholas Miell wrote:

> The point Ingo was making is that the x86 ABI already requires the FPU
> context to be saved before *all* function calls.

I've not seen that among Ingo's points, but yeah some status is caller 
saved. But, aren't things like status word and control bits callee saved? 
If that's the case, it might require proper handling.



- Davide



^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-03  0:52                                                                               ` Davide Libenzi
@ 2007-03-03  1:36                                                                                 ` Nicholas Miell
  2007-03-03  1:48                                                                                   ` Benjamin LaHaise
  2007-03-03  2:19                                                                                   ` Davide Libenzi
  0 siblings, 2 replies; 337+ messages in thread
From: Nicholas Miell @ 2007-03-03  1:36 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Ingo Molnar, Linux Kernel Mailing List, Arjan van de Ven, Linus Torvalds

On Fri, 2007-03-02 at 16:52 -0800, Davide Libenzi wrote:
> On Fri, 2 Mar 2007, Nicholas Miell wrote:
> 
> > The point Ingo was making is that the x86 ABI already requires the FPU
> > context to be saved before *all* function calls.
> 
> I've not seen that among Ingo's points, but yeah some status is caller 
> saved. But, aren't things like status word and control bits callee saved? 
> If that's the case, it might require proper handling.
> 

Ingo mentioned it in one of the parts you cut out of your reply:

> and here is where thinking about threadlets as a function call and not 
> as an asynchronous context helps a lot: the classic gcc convention for 
> FPU use & function calls should apply: gcc does not call an external 
> function with an in-use FPU stack/register, it always neatly unuses it, 
> as no FPU register is callee-saved, all are caller-saved.

The i386 psABI is ancient (i.e. it predates SSE, so no mention of the
XMM or MXCSR registers) and a bit vague (no mention at all of the FP
status word), but I'm fairly certain that Ingo is right.


-- 
Nicholas Miell <nmiell@comcast.net>


^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-03  1:36                                                                                 ` Nicholas Miell
@ 2007-03-03  1:48                                                                                   ` Benjamin LaHaise
  2007-03-03  2:19                                                                                   ` Davide Libenzi
  1 sibling, 0 replies; 337+ messages in thread
From: Benjamin LaHaise @ 2007-03-03  1:48 UTC (permalink / raw)
  To: Nicholas Miell
  Cc: Davide Libenzi, Ingo Molnar, Linux Kernel Mailing List,
	Arjan van de Ven, Linus Torvalds

On Fri, Mar 02, 2007 at 05:36:01PM -0800, Nicholas Miell wrote:
> > as an asynchronous context helps a lot: the classic gcc convention for 
> > FPU use & function calls should apply: gcc does not call an external 
> > function with an in-use FPU stack/register, it always neatly unuses it, 
> > as no FPU register is callee-saved, all are caller-saved.
> 
> The i386 psABI is ancient (i.e. it predates SSE, so no mention of the
> XMM or MXCSR registers) and a bit vague (no mention at all of the FP
> status word), but I'm fairly certain that Ingo is right.

The FPU status word *must* be saved, as the rounding behaviour and error mode 
bits are assumed to be preserved.  Iow, yes, there is state which is required.
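
For instance (plain C99 userspace code - code like this legitimately
expects the rounding mode it set to survive later calls and context
switches):

#include <fenv.h>
#include <stdio.h>

int main(void)
{
	volatile double a = 1.0, b = 1e-20;

	/* in round-to-nearest a + b == 1.0; rounding upward it must
	 * yield the next double above 1.0 - which only holds if the
	 * FPU control state is preserved underneath us */
	fesetround(FE_UPWARD);
	printf("%.17g\n", a + b);
	return 0;
}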

		-ben
-- 
"Time is of no importance, Mr. President, only life is important."
Don't Email: <zyntrop@kvack.org>.

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-03  1:36                                                                                 ` Nicholas Miell
  2007-03-03  1:48                                                                                   ` Benjamin LaHaise
@ 2007-03-03  2:19                                                                                   ` Davide Libenzi
  2007-03-03  7:19                                                                                     ` Ingo Molnar
  1 sibling, 1 reply; 337+ messages in thread
From: Davide Libenzi @ 2007-03-03  2:19 UTC (permalink / raw)
  To: Nicholas Miell
  Cc: Ingo Molnar, Linux Kernel Mailing List, Arjan van de Ven, Linus Torvalds

On Fri, 2 Mar 2007, Nicholas Miell wrote:

> On Fri, 2007-03-02 at 16:52 -0800, Davide Libenzi wrote:
> > On Fri, 2 Mar 2007, Nicholas Miell wrote:
> > 
> > > The point Ingo was making is that the x86 ABI already requires the FPU
> > > context to be saved before *all* function calls.
> > 
> > I've not seen that among Ingo's points, but yeah some status is caller 
> > saved. But, aren't things like status word and control bits callee saved? 
> > If that's the case, it might require proper handling.
> > 
> 
> Ingo mentioned it in one of the parts you cut out of your reply:
> 
> > and here is where thinking about threadlets as a function call and not 
> > as an asynchronous context helps a lot: the classic gcc convention for 
> > FPU use & function calls should apply: gcc does not call an external 
> > function with an in-use FPU stack/register, it always neatly unuses it, 
> > as no FPU register is callee-saved, all are caller-saved.
> 
> The i386 psABI is ancient (i.e. it predates SSE, so no mention of the
> XMM or MXCSR registers) and a bit vague (no mention at all of the FP
> status word), but I'm fairly certain that Ingo is right.

I'm not sure if that's the case. I'd be happy if it was, but I'm afraid 
it's not. Status word and control bits should not be changed from 
underneath userspace AFAIK. The ABI I remember tells me that those are 
callee saved. A quick gcc asm test tells me that too.
And assuming that's the case, why don't we have a smarter unlazy_fpu() 
then, one that avoids the FPU context sync if we're scheduled while inside 
a syscall (this is no different from an enter inside sys_async_exec - 
userspace should have taken care of it)?
IMO a syscall enter should not assume that userspace took care of saving 
the whole FPU context.



- Davide



^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-03  2:19                                                                                   ` Davide Libenzi
@ 2007-03-03  7:19                                                                                     ` Ingo Molnar
  2007-03-03  7:20                                                                                       ` Ingo Molnar
  2007-03-03  9:02                                                                                       ` Davide Libenzi
  0 siblings, 2 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-03-03  7:19 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Nicholas Miell, Linux Kernel Mailing List, Arjan van de Ven,
	Linus Torvalds


* Davide Libenzi <davidel@xmailserver.org> wrote:

> [...] Status word and control bits should not be changed from 
> underneath userspace AFAIK. [...]

Note that the control bits do not just magically change during normal 
FPU use. It's a bit like sys_setsid()/iopl/etc., it makes little sense 
to change those per-thread anyway. This is a non-issue anyway - what is 
important is that the big bulk of 512 (or more) bytes of FPU state /are/ 
callee-saved (both on 32-bit and on 64-bit), hence there's no need to 
unlazy anything or to do expensive FPU state saves or other FPU juggling 
around threadlet (or even syslet) use.

	Ingo

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-03  7:19                                                                                     ` Ingo Molnar
@ 2007-03-03  7:20                                                                                       ` Ingo Molnar
  2007-03-03  9:02                                                                                       ` Davide Libenzi
  1 sibling, 0 replies; 337+ messages in thread
From: Ingo Molnar @ 2007-03-03  7:20 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Nicholas Miell, Linux Kernel Mailing List, Arjan van de Ven,
	Linus Torvalds


* Ingo Molnar <mingo@elte.hu> wrote:

> Note that the control bits do not just magically change during normal 
> FPU use. It's a bit like sys_setsid()/iopl/etc., it makes little sense 
> to change those per-thread anyway. This is a non-issue anyway - what is 
> important is that the big bulk of 512 (or more) bytes of FPU state /are/ 
> callee-saved (both on 32-bit and on 64-bit), hence there's no need to 
     ^---- caller-saved
> unlazy anything or to do expensive FPU state saves or other FPU juggling 
> around threadlet (or even syslet) use.


^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-03  7:19                                                                                     ` Ingo Molnar
  2007-03-03  7:20                                                                                       ` Ingo Molnar
@ 2007-03-03  9:02                                                                                       ` Davide Libenzi
  1 sibling, 0 replies; 337+ messages in thread
From: Davide Libenzi @ 2007-03-03  9:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Nicholas Miell, Linux Kernel Mailing List, Arjan van de Ven,
	Linus Torvalds

On Sat, 3 Mar 2007, Ingo Molnar wrote:

> * Davide Libenzi <davidel@xmailserver.org> wrote:
> 
> > [...] Status word and control bits should not be changed from 
> > underneath userspace AFAIK. [...]
> 
> Note that the control bits do not just magically change during normal 
> FPU use. It's a bit like sys_setsid()/iopl/etc., it makes little sense 
> to change those per-thread anyway. This is a non-issue anyway - what is 
> important is that the big bulk of 512 (or more) bytes of FPU state /are/ 
> callee-saved (both on 32-bit and on 64-bit), hence there's no need to 
> unlazy anything or to do expensive FPU state saves or other FPU juggling 
> around threadlet (or even syslet) use.

Well, the unlazy/sync happens in any case later when we switch (given 
TS_USEDFPU set). We'd avoid a copy of it if the above conditions are true. 
Wouldn't it make sense to eventually carry over only the status word and 
the control bits?
Also, if the caller saves the whole context, and if we're scheduled while 
inside a system call (not a totally infrequent case), can't we implement a 
smarter unlazy_fpu that avoids the fxsave during schedule-out and the 
frstor after schedule-in (and does not do stts in this case, so the newly 
scheduled task doesn't get a fault at all)? If the above conditions are 
true (no need to copy the context for the new head in async_exec), this 
should be possible too.
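
In pseudo-kernel code the idea would read roughly like this (only a
sketch of the words above - 'blocked_in_syscall()' is an invented
helper, not an existing kernel function):

/* schedule-out: if the whole FPU context is caller-saved by the ABI
 * and we blocked inside a syscall, drop the lazy FPU state instead of
 * fxsave-ing it, and skip stts, so the task gets no FPU fault and no
 * frstor when it is scheduled back in */
if ((task_thread_info(prev)->status & TS_USEDFPU) &&
    blocked_in_syscall(prev))
	task_thread_info(prev)->status &= ~TS_USEDFPU;
else
	unlazy_fpu(prev);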


- Davide



^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-02 17:13                                                                 ` Davide Libenzi
  2007-03-02 19:13                                                                   ` Davide Libenzi
@ 2007-03-03 10:06                                                                   ` Evgeniy Polyakov
  2007-03-03 18:46                                                                     ` Davide Libenzi
  1 sibling, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-03-03 10:06 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Ingo Molnar, Eric Dumazet, Pavel Machek, Theodore Tso,
	Linus Torvalds, Ulrich Drepper, Linux Kernel Mailing List,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, David S. Miller, Suparna Bhattacharya, Jens Axboe,
	Thomas Gleixner

On Fri, Mar 02, 2007 at 09:13:40AM -0800, Davide Libenzi (davidel@xmailserver.org) wrote:
> On Fri, 2 Mar 2007, Evgeniy Polyakov wrote:
> 
> > On Thu, Mar 01, 2007 at 11:31:14AM -0800, Davide Libenzi (davidel@xmailserver.org) wrote:
> > > On Thu, 1 Mar 2007, Evgeniy Polyakov wrote:
> > > 
> > > > Ingo, do you really think I will send mails with faked benchmarks? :))
> > > 
> > > I don't think he ever implied that. He was only suggesting that when you 
> > > post benchmarks, and even more when you make claims based on benchmarks, 
> > > you need to be extra careful about what you measure. Otherwise the 
> > > external view that you give to others does not look good.
> > > Kevent can be really faster than epoll, but if you post broken benchmarks 
> > > (that can be: unreliable HTTP loaders, broken server implementations, 
> > > etc..) and make claims based on that, the only effect that you have is to 
> > > lose your point.
> >  
> > So, I only said that kevent is superior to epoll because (and
> > it is the _main_ issue) of its ability to handle essentially any kind of
> > event with very small overhead (the same as epoll has in struct file -
> > a list and a spinlock) and without the significant price of binding
> > struct file to the event.
> 
> You have to excuse me if my memory is bad, but IIRC the whole discussion 
> and long benchmark feast was born with you throwing a benchmark at Ingo 
> (with kevent showing a 1.9x performance boost WRT epoll), not with you 
> making any other point.

So, how does it sound?
"Threadlets are bad for IO because kevent is 2 times faster than epoll?"

I said threadlets are bad for IO (and we agreed that both approaches
should be used for maximum performance) because of rescheduling overhead -
tasks are quite heavy structures to move around - even a pt_regs copy
takes more than an event structure - but not because something in one
galaxy might work faster than something else in another galaxy.
It would be stupid even to think that.

> As far as epoll not being able to handle other events. Said who? Of 
> course, with zero modifications, you can handle zero additional events. 
> With modifications, you can handle other events. But lets talk about those 
> other events. The *only* kind of event that ppl (and being the epoll 
> maintainer I tend to receive those requests) missed in epoll, was AIO 
> events, That's the *only* thing that was missed by real life application 
> developers. And if something like threadlets/syslets will prove effective, 
> the gap is closed WRT that requirement.
> Epoll handle already the whole class of pollable devices inside the 
> kernel, and if you exclude block AIO, that's a pretty wide class already. 
> The *existing* f_op->poll subsystem can be used to deliver events at the 
> poll-head wakeup time (by using the "key" member of the poll callback), so 
> that you don't even need the extra f_op->poll call to fetch events.
> And if you really feel raw about the single O(nready) loop that epoll 
> currently does, a new epoll_wait2 (or whatever) API could be used to 
> deliver the event directly into a userspace buffer [1], directly from the 
> poll callback, w/out extra delivery loops (IRQ/event->epoll_callback->event_buffer).

Signals, futexes, timers and userspace events are what I was asked to add 
to kevent; so far only futexes are missing, because I was asked to freeze
development so that other hackers could review the project.

> 
> [1] From the epoll callback, we cannot sleep, so it's gonna be either an 
>     mlocked userspace buffer, or some kernel pages mapped to userspace.

Callbacks never sleep - they add the event to a list just like the current
implementation (maybe some lock must be changed from a mutex to a spinlock,
I do not remember); the main problem is the binding to the file structure,
which is heavy.

> - Davide
> 

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-02 17:28                                             ` Davide Libenzi
@ 2007-03-03 10:27                                               ` Evgeniy Polyakov
  0 siblings, 0 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-03-03 10:27 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Ingo Molnar, Pavel Machek, Theodore Tso, Linus Torvalds,
	Ulrich Drepper, Linux Kernel Mailing List, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Jens Axboe,
	Thomas Gleixner

On Fri, Mar 02, 2007 at 09:28:10AM -0800, Davide Libenzi (davidel@xmailserver.org) wrote:
> On Fri, 2 Mar 2007, Evgeniy Polyakov wrote:
> 
> > do we really want to have per process signalfs, timerfs and so on - each 
> > simple structure must be bound to a file, which becomes too costly.
> 
> I may be old school, but if you ask me, and if you *really* want those 
> events, yes. Reason? Unix's everything-is-a-file rule, and being able to 
> use them with *existing* POSIX poll/select. Remember, not every app 
> requires huge scalability efforts, so working with simpler and familiar 
> APIs is always welcome.
> The *only* thing that was not practical to have as fd, was block requests. 
> But maybe threadlets/syslets will handle those just fine, and close the gap.
 
That means that we bind a very small object like a timer or a signal to the
whole file structure - yes, as I stated, it is doable, but do we really
have to create a file each time create_timer() or signal() is called?
Signals as a filesystem are limited in the sense that we need to
create additional structures to keep the signal number<->private data
relations.
I designed kevent to be as small as possible, so the first thing I dropped
was the file-binding idea. I do not say it is wrong or that epoll (and
threadlets) are broken (fsck, I hope people do understand that), but as is
it cannot handle that scenario, so it must be extended and/or a lot of
other stuff written to be compatible with the epoll design. Kevent has a
different design (which allows working with the old one though - there is
a patch implementing epoll on top of kevent).
 
> - Davide
> 

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-03 10:06                                                                   ` Evgeniy Polyakov
@ 2007-03-03 18:46                                                                     ` Davide Libenzi
  2007-03-03 20:31                                                                       ` Evgeniy Polyakov
  0 siblings, 1 reply; 337+ messages in thread
From: Davide Libenzi @ 2007-03-03 18:46 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ingo Molnar, Eric Dumazet, Pavel Machek, Theodore Tso,
	Linus Torvalds, Ulrich Drepper, Linux Kernel Mailing List,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, David S. Miller, Suparna Bhattacharya, Jens Axboe,
	Thomas Gleixner

On Sat, 3 Mar 2007, Evgeniy Polyakov wrote:

> > You have to excuse me if my memory is bad, but IIRC the whole discussion 
> > and long benchmark feast was born with you throwing a benchmark at Ingo 
> > (with kevent showing a 1.9x performance boost WRT epoll), not with you 
> > making any other point.
> 
> So, how does it sound?
> "Threadlets are bad for IO because kevent is 2 times faster than epoll?"
> 
> I said threadlets are bad for IO (and we agreed that both approaches
> should be used for maximum performance) because of rescheduling overhead -
> tasks are quite heavy structures to move around - even a pt_regs copy
> takes more than an event structure - but not because something in one
> galaxy might work faster than something else in another galaxy.
> It would be stupid even to think that.

Evgeniy, other folks on this thread read what you said, so let's not drag 
this out.



> > And if you really feel raw about the single O(nready) loop that epoll
> > currently does, a new epoll_wait2 (or whatever) API could be used to
> > deliver the event directly into a userspace buffer [1], directly from the
> > poll callback, w/out extra delivery loops (IRQ/event->epoll_callback->event_buffer).
>
> > [1] From the epoll callback, we cannot sleep, so it's gonna be either an 
> >     mlocked userspace buffer, or some kernel pages mapped to userspace.
> 
> Callbacks never sleep - they add the event to a list just like the current
> implementation (maybe some lock must be changed from a mutex to a spinlock,
> I do not remember); the main problem is the binding to the file structure,
> which is heavy.

I was referring to dropping an event directly to a userspace buffer, from 
the poll callback. If pages are not there, you might sleep, and you can't 
since the wakeup function holds a spinlock on the waitqueue head while 
looping through the waiters to issue the wakeup. Also, you don't know from 
where the poll wakeup is called.
File binding heavy? The first, and by *far* the biggest, source of events 
inside an event collector, for someone who cares about scalability, is 
sockets. And those are already files. Second would be AIO, and those (if 
the performance figures agree) can be hosted inside syslets/threadlets.
Then you fall into the no-care category, where the extra 100 bytes do not 
make a case against the ability to use it with the existing POSIX 
infrastructure (poll/select).
BTW, Linus made a signalfd code sketch some time ago, to deliver signals to an 
fd. Code remained there and nobody cared. Question: Was it because
1) it had file bindings or 2) because nobody really cared to deliver 
signals to an event collector?
And *if* later requirements come, you don't need to change the API by 
adding an XXEVENT_SIGNAL_ADD or XXEVENT_TIMER_ADD, or creating a new 
XXEVENT-only submission structure. You create an API that automatically 
makes that new abstraction work with POSIX poll/select, and you get epoll 
support for free. Without even changing a bit in the epoll API.
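
A sketch of what "for free" means here (invented driver names - the
f_op->poll mechanism itself is the existing one):

/* any new event source that implements f_op->poll automatically works
 * with select(), poll() and epoll - no new event API needed */
static unsigned int myevent_poll(struct file *file, poll_table *pt)
{
	struct myevent *ev = file->private_data;

	poll_wait(file, &ev->waitq, pt);
	return ev->pending ? (POLLIN | POLLRDNORM) : 0;
}

static const struct file_operations myevent_fops = {
	.poll	= myevent_poll,
	/* .open, .read, .release as usual */
};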



- Davide



^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-03 18:46                                                                     ` Davide Libenzi
@ 2007-03-03 20:31                                                                       ` Evgeniy Polyakov
  2007-03-03 21:57                                                                         ` Davide Libenzi
  0 siblings, 1 reply; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-03-03 20:31 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Ingo Molnar, Eric Dumazet, Pavel Machek, Theodore Tso,
	Linus Torvalds, Ulrich Drepper, Linux Kernel Mailing List,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, David S. Miller, Suparna Bhattacharya, Jens Axboe,
	Thomas Gleixner

On Sat, Mar 03, 2007 at 10:46:59AM -0800, Davide Libenzi (davidel@xmailserver.org) wrote:
> On Sat, 3 Mar 2007, Evgeniy Polyakov wrote:
> 
> > > You've to excuse me if my memory is bad, but IIRC the whole discussion 
> > > and loong benchmark feast born with you throwing a benchmark at Ingo 
> > > (with kevent showing a 1.9x performance boost WRT epoll), not with you 
> > > making any other point.
> > 
> > So, how does it sound?
> > "Threadlets are bad for IO because kevent is 2 times faster than epoll?"
> > 
> > I said threadlets are bad for IO (and we agreed that both approaches
> > should be used for maximum performance) because of rescheduling overhead -
> > tasks are quite heavy structures to move around - even a pt_regs copy
> > takes more than an event structure - but not because something in one
> > galaxy might work faster than something else in another galaxy.
> > It would be stupid even to think that.
> 
> Evgeny, other folks on this thread read what you said, so let's not drag 
> this over.
 
Sure, I was wrong to start this again, but try to get my position - I
really tired from trying to prove that I'm not a camel just because we
had some misunderstanding at the start.

I do think that threadlets are relly cool solution and are indeed very
good approach for majority of the parallel processing, but my point is
still that it is not a perfect solution for all tasks.

Just to draw a line: kevent example is extrapolation of what can be
achieved with event-driven model, but that does not mean that it must be
_only_ used for AIO model - threadlets _and_ event driven model (yes, I
accepted Ingo's point about its declining) is the best solution.
 
> > > And if you really feel raw about the single O(nready) loop that epoll
> > > currently does, a new epoll_wait2 (or whatever) API could be used to
> > > deliver the event directly into a userspace buffer [1], directly from the
> > > poll callback, w/out extra delivery loops (IRQ/event->epoll_callback->event_buffer).
> >
> > > [1] From the epoll callback, we cannot sleep, so it's gonna be either an 
> > >     mlocked userspace buffer, or some kernel pages mapped to userspace.
> > 
> > Callbacks never sleep - they add the event to a list just like the current
> > implementation (maybe some lock must be changed from a mutex to a spinlock,
> > I do not remember); the main problem is the binding to the file structure,
> > which is heavy.
> 
> I was referring to dropping an event directly to a userspace buffer, from 
> the poll callback. If pages are not there, you might sleep, and you can't 
> since the wakeup function holds a spinlock on the waitqueue head while 
> looping through the waiters to issue the wakeup. Also, you don't know from 
> where the poll wakeup is called.

Ugh, no, that is a very limited solution - memory must either be pinned
(which leads to DoS and a limited ring buffer), or the callback must sleep.
Actually, either way there _must_ exist a queue - if the ring buffer is full
an event is not allowed to be dropped - it must be stored in some other
place, for example in a queue from which entries will be read (copied)
into the ring buffer as it gains free entries (that is how it is
implemented in kevent at least).
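
In code, that scheme reads roughly like this (invented names, not the
actual kevent implementation - 'struct event' is assumed to carry a
'link' list node):

#define RING_SIZE	4096

struct evqueue {
	struct event		ring[RING_SIZE];
	unsigned int		head, tail;	/* ring indices */
	struct list_head	overflow;	/* never drop   */
};

/* called from the event callback: never sleeps, never drops */
void post_event(struct evqueue *q, struct event *ev)
{
	if (q->head - q->tail < RING_SIZE)
		q->ring[q->head++ % RING_SIZE] = *ev;
	else
		list_add_tail(&ev->link, &q->overflow);
}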

> File binding heavy? The first, and by *far* biggest, source of events 
> inside an event collector, of someone that cares about scalability, are 
> sockets. And those are already files. Second would be AIO, and those (if 
> performance figures agrees) can be hosted inside syslets/threadlets.
> Then you fall into the no-care category, where the extra 100 bytes do not 
> make a case against the ability of using it with an existing POSIX 
> infrastructure (poll/select).

Well, sockets are indeed files, and sockets are already perfectly
handled by epoll - but there are other users of a potential interface - and
it must be designed to scale very well in _any_ situation.
Even if right now we do not have problems with some types of events, we
must scale with any new ones.

> BTW, Linus made a signalfd code sketch some time ago, to deliver signals to an 
> fd. Code remained there and nobody cared. Question: Was it because
> 1) it had file bindings or 2) because nobody really cared to deliver 
> signals to an event collector?
> And *if* later requirements come, you don't need to change the API by 
> adding an XXEVENT_SIGNAL_ADD or XXEVENT_TIMER_ADD, or creating a new 
> XXEVENT-only submission structure. You create an API that automatically 
> makes that new abstraction work with POSIX poll/select, and you get epoll 
> support for free. Without even changing a bit in the epoll API.

Well, we get epoll support for free, but we need to create tons of other
interfaces and infrastructure for kernel users, and we need to change 
userspace anyway.
But epoll support requires quite heavy bindings to the file
structure, so why wouldn't we want to design a new interface (since we need
to change userspace anyway) that can scale and be very
memory-optimized from the beginning?

> - Davide
> 

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 337+ messages in thread

* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-03 20:31                                                                       ` Evgeniy Polyakov
@ 2007-03-03 21:57                                                                         ` Davide Libenzi
  2007-03-03 22:10                                                                           ` Davide Libenzi
  2007-03-04 16:23                                                                           ` Kirk Kuchov
  0 siblings, 2 replies; 337+ messages in thread
From: Davide Libenzi @ 2007-03-03 21:57 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ingo Molnar, Eric Dumazet, Pavel Machek, Theodore Tso,
	Linus Torvalds, Ulrich Drepper, Linux Kernel Mailing List,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, David S. Miller, Suparna Bhattacharya, Jens Axboe,
	Thomas Gleixner

On Sat, 3 Mar 2007, Evgeniy Polyakov wrote:

> > I was referring to dropping an event directly to a userspace buffer, from 
> > the poll callback. If pages are not there, you might sleep, and you can't 
> > since the wakeup function holds a spinlock on the waitqueue head while 
> > looping through the waiters to issue the wakeup. Also, you don't know from 
> > where the poll wakeup is called.
> 
> Ugh, no, that is a very limited solution - memory must either be pinned
> (which leads to DoS and a limited ring buffer), or the callback must sleep.
> Actually, either way there _must_ exist a queue - if the ring buffer is full
> an event is not allowed to be dropped - it must be stored in some other
> place, for example in a queue from which entries will be read (copied)
> into the ring buffer as it gains free entries (that is how it is
> implemented in kevent at least).

I was not advocating for that, if you read carefully. The fact that epoll 
does not do that should be a clear hint. The old /dev/epoll IIRC was only 
10% faster than the current epoll under a *heavy* event frequency 
micro-bench like pipetest (and that version of epoll did not have the 
single-pass-over-the-ready-set optimization). And /dev/epoll was 
delivering events *directly* on userspace visible (mmap'd) memory in a 
zero-copy fashion.




> > BTW, Linus made a signalfd code sketch some time ago, to deliver signals to an 
> > fd. Code remained there and nobody cared. Question: Was it because
> > 1) it had file bindings or 2) because nobody really cared to deliver 
> > signals to an event collector?
> > And *if* later requirements come, you don't need to change the API by 
> > adding an XXEVENT_SIGNAL_ADD or XXEVENT_TIMER_ADD, or creating a new 
> > XXEVENT-only submission structure. You create an API that automatically 
> > makes that new abstraction work with POSIX poll/select, and you get epoll 
> > support for free. Without even changing a bit in the epoll API.
> 
> Well, we get epoll support for free, but we need to create tons of other
> interfaces and infrastructure for kernel users, and we need to change 
> userspace anyway.

Those *other* (tons?!?) interfaces can be created *when* the need comes 
(see Linus' signalfd example [1] to show how urgent that was). *When* 
the need comes, they will work with existing POSIX interfaces, without 
requiring yet another event interface of your own. Those other interfaces 
could also be more easily adopted by other Unix cousins, precisely 
because they rely on existing POSIX interfaces. One of the reasons 
behind the Unix file abstraction is that you do *not* have to 
plan and bloat interfaces in advance. As long as your new abstraction 
behaves in a file fashion, it can be automatically used with existing 
interfaces. And you create them *when* the need comes.
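
(To make the "works with existing interfaces" point concrete, a sketch 
assuming only a signalfd()-style call as in Linus' code [1] - the 
syscall and its signature are the assumptions here; the wait side is 
plain POSIX poll():)

#include <poll.h>
#include <signal.h>
#include <unistd.h>

extern int signalfd(int ufd, const sigset_t *mask, int flags); /* assumed */

int wait_signal_or_socket(int sock)
{
	sigset_t mask;
	int sfd;

	sigemptyset(&mask);
	sigaddset(&mask, SIGINT);
	sigprocmask(SIG_BLOCK, &mask, NULL);	/* route SIGINT to the fd */

	sfd = signalfd(-1, &mask, 0);		/* assumed interface */

	struct pollfd pfd[2] = {
		{ .fd = sfd,  .events = POLLIN },
		{ .fd = sock, .events = POLLIN },
	};

	if (poll(pfd, 2, -1) < 0)
		return -1;
	if (pfd[0].revents & POLLIN) {
		char buf[128];	/* a siginfo-like record, per the sketch */
		read(sfd, buf, sizeof(buf));
		return 1;	/* signal delivered through a plain fd */
	}
	return 0;		/* socket activity */
}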




[1] That was like 100 lines of code or so. See here:

    http://tinyurl.com/3yuna5



- Davide




* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-03 21:57                                                                         ` Davide Libenzi
@ 2007-03-03 22:10                                                                           ` Davide Libenzi
  2007-03-04 16:23                                                                           ` Kirk Kuchov
  1 sibling, 0 replies; 337+ messages in thread
From: Davide Libenzi @ 2007-03-03 22:10 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Ingo Molnar, Eric Dumazet, Pavel Machek, Theodore Tso,
	Linus Torvalds, Ulrich Drepper, Linux Kernel Mailing List,
	Arjan van de Ven, Christoph Hellwig, Andrew Morton, Alan Cox,
	Zach Brown, David S. Miller, Suparna Bhattacharya, Jens Axboe,
	Thomas Gleixner

On Sat, 3 Mar 2007, Davide Libenzi wrote:

> Those *other* (tons?!?) interfaces can be created *when* the need comes 
> (see Linus' signalfd example [1] to show how urgent that was). *When* 
> the need comes, they will work with existing POSIX interfaces, without 
> requiring yet another event interface of your own. Those other interfaces 
> could also be more easily adopted by other Unix cousins, precisely 
> because they rely on existing POSIX interfaces. One of the reasons 
> behind the Unix file abstraction is that you do *not* have to 
> plan and bloat interfaces in advance. As long as your new abstraction 
> behaves in a file fashion, it can be automatically used with existing 
> interfaces. And you create them *when* the need comes.

Now, if you don't mind, my spare time is really limited and I prefer to 
spend it looking at the stuff this thread is actually about.
Especially since the whole epoll/kevent discussion depends heavily on 
whether syslets/threadlets will or will not turn out to be a viable 
method for generic AIO. Savvy?



- Davide




* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-03 21:57                                                                         ` Davide Libenzi
  2007-03-03 22:10                                                                           ` Davide Libenzi
@ 2007-03-04 16:23                                                                           ` Kirk Kuchov
  2007-03-04 17:46                                                                             ` Kyle Moffett
  2007-03-04 21:17                                                                             ` Davide Libenzi
  1 sibling, 2 replies; 337+ messages in thread
From: Kirk Kuchov @ 2007-03-04 16:23 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Evgeniy Polyakov, Ingo Molnar, Eric Dumazet, Pavel Machek,
	Theodore Tso, Linus Torvalds, Ulrich Drepper,
	Linux Kernel Mailing List, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On 3/3/07, Davide Libenzi <davidel@xmailserver.org> wrote:
> <snip>
>
> Those *other* (tons?!?) interfaces can be created *when* the need comes
> (see Linus' signalfd example [1] to show how urgent that was). *When*
> the need comes, they will work with existing POSIX interfaces, without
> requiring yet another event interface of your own. Those other interfaces
> could also be more easily adopted by other Unix cousins, precisely
> because they rely on existing POSIX interfaces.

Please stop with this crap, this chicken or the egg argument of yours is utter
BULLSHIT! Just because Linux doesn't have a decent kernel event
notification mechanism does not mean that users don't need one. Nobody
cared about Linus's signalfd because it wasn't mainline.

Look at any of the event notification libraries out there; it makes me
sick how much kludge they have to go through on Linux to get anywhere
near the functionality of kqueue.

Solaris has had the Event Ports mechanism since 2003. FreeBSD, NetBSD,
OpenBSD and Mac OS X have supported kqueue since around 2000. Windows
has had event notification for ages now. These _facilities_ are all
widely used, given the platforms' popularity.

So here we are in 2007. epoll() works with files, pipes, sockets,
inotify and anything pollable (file descriptors), but not with AIO,
timers, signals or user-defined events. Can we please get those
working with epoll? Something as simple as:

struct epoll_event ev;

ev.events = EV_TIMER | EPOLLONESHOT;
ev.data.u64 = 1000; /* timeout */

epoll_ctl(epfd, EPOLL_CTL_ADD, 0 /* ignored */, &ev);

or

struct sigevent ev;

ev.sigev_notify = SIGEV_EPOLL;
ev.sigev_signo = epfd;
ev.sigev_value = &ev;

timer_create(CLOCK_MONOTONIC, &ev, &timerid);

AIO:

struct sigevent ev;
int fd = io_setup(..); /* oh boy, I wish... but it works */

ev.events = EV_AIO | EPOLLONESHOT;
/* event.data.ptr returns pointer to the iocb */
epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);

or

struct iocb iocb;

iocb.aio_fildes = fileno(stdin);
iocb.aio_lio_opcode = IO_CMD_PREAD;
iocb.c.notify = IO_NOTIFY_EPOLL; /* __pad3/4 */
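
(And the consumer side would be nothing more than the usual epoll_wait 
loop - EV_TIMER below is the hypothetical flag proposed above, not an 
existing epoll bit, so it is defined here just to make the sketch 
self-contained:)

#include <sys/epoll.h>
#include <stdio.h>

#define EV_TIMER (1u << 30)	/* hypothetical, per the proposal above */

void event_loop(int epfd)
{
	struct epoll_event events[64];
	int i, n;

	for (;;) {
		n = epoll_wait(epfd, events, 64, -1);
		for (i = 0; i < n; i++) {
			if (events[i].events & EV_TIMER)
				printf("timer cookie %llu fired\n",
				       (unsigned long long)events[i].data.u64);
			else if (events[i].events & EPOLLIN)
				printf("fd %d readable\n", events[i].data.fd);
		}
	}
}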

Would this be acceptable? Can we finally move on?

--
Kirk Kuchov


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-04 16:23                                                                           ` Kirk Kuchov
@ 2007-03-04 17:46                                                                             ` Kyle Moffett
  2007-03-05  5:23                                                                               ` Michael K. Edwards
  2007-03-04 21:17                                                                             ` Davide Libenzi
  1 sibling, 1 reply; 337+ messages in thread
From: Kyle Moffett @ 2007-03-04 17:46 UTC (permalink / raw)
  To: Kirk Kuchov
  Cc: Davide Libenzi, Evgeniy Polyakov, Ingo Molnar, Eric Dumazet,
	Pavel Machek, Theodore Tso, Linus Torvalds, Ulrich Drepper,
	Linux Kernel Mailing List, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Mar 04, 2007, at 11:23:37, Kirk Kuchov wrote:
> So here we are in 2007. epoll() works with files, pipes, sockets,
> inotify and anything pollable (file descriptors), but not with AIO,
> timers, signals or user-defined events. Can we please get those
> working with epoll? Something as simple as:
>
> [code snipped]
>
> Would this be acceptable? Can we finally move on?

Well, even this far into 2.6, Linus' patch from 2003 still (mostly)
applies; the maintenance cost for this kind of code is virtually
zilch.  If it matters that much to you, clean it up and make it apply;
add an alarmfd() syscall (another 100 lines of code at most?), make
"read" return an architecture-independent siginfo-like structure, and
submit it for inclusion.  Adding epoll() support for random objects is
as simple as a 75-line object-filesystem and a 25-line syscall to
return an FD to a new inode.  Have fun!  Go wild!  Something this
trivially simple could probably spend a week in -mm and go to Linus
for 2.6.22.
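
(A sketch of what using such a hypothetical alarmfd() could look like - 
the syscall, its signature and the record layout are all assumptions 
following the suggestion above:)

#include <sys/epoll.h>
#include <stdint.h>
#include <unistd.h>

extern int alarmfd(uint64_t expiry_ms);	/* assumed syscall wrapper */

struct alarm_info {			/* assumed siginfo-like record */
	uint64_t expirations;
	uint64_t cookie;
};

int wait_one_alarm(int epfd)
{
	struct epoll_event ev, out;
	struct alarm_info info;
	int afd = alarmfd(1000);	/* fire in ~1 second */

	ev.events = EPOLLIN;
	ev.data.fd = afd;
	epoll_ctl(epfd, EPOLL_CTL_ADD, afd, &ev);

	if (epoll_wait(epfd, &out, 1, -1) == 1) {
		/* the "read" returns the architecture-independent record */
		read(out.data.fd, &info, sizeof(info));
		close(out.data.fd);
		return 0;
	}
	return -1;
}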

Cheers,
Kyle Moffett



* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-04 16:23                                                                           ` Kirk Kuchov
  2007-03-04 17:46                                                                             ` Kyle Moffett
@ 2007-03-04 21:17                                                                             ` Davide Libenzi
  2007-03-04 22:49                                                                               ` Kirk Kuchov
  1 sibling, 1 reply; 337+ messages in thread
From: Davide Libenzi @ 2007-03-04 21:17 UTC (permalink / raw)
  To: Kirk Kuchov
  Cc: Evgeniy Polyakov, Ingo Molnar, Eric Dumazet, Pavel Machek,
	Theodore Tso, Linus Torvalds, Ulrich Drepper,
	Linux Kernel Mailing List, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Sun, 4 Mar 2007, Kirk Kuchov wrote:

> On 3/3/07, Davide Libenzi <davidel@xmailserver.org> wrote:
> > <snip>
> > 
> > Those *other* (tons?!?) interfaces can be created *when* the need comes
> > (see Linus' signalfd example [1] to show how urgent that was). *When*
> > the need comes, they will work with existing POSIX interfaces, without
> > requiring yet another event interface of your own. Those other interfaces
> > could also be more easily adopted by other Unix cousins, precisely
> > because they rely on existing POSIX interfaces.
> 
> Please stop with this crap, this chicken or the egg argument of yours is utter
> BULLSHIT!

Wow, wow, fella! You _definitely_ cannot afford rudeness here.
You started bad, and you ended even worse, by listing some APIs that 
will work only with epoll. As I said already, and as listed in the 
thread I posted the link to, something like:

int signalfd(...);  // Linus' initial interface would be perfectly fine
int timerfd(...);   // Open ...
int eventfd(...);   // [1]

will work *even* with standard POSIX select/poll. 95% or more of the 
software out there does not have scalability issues, and select/poll are 
more portable and easier to use for simple stuff. On top of that, as I 
already said, they are *confined* interfaces that could be more easily 
adopted by other Unixes (if they are 100-200 lines on Linux, don't 
expect them to be a lot more on other Unixes) [2]. We *already* have the 
infrastructure inside Linux to deliver events (the f_op->poll 
subsystem), so how about we use that instead of yet another way? [3]
As for why common abstractions like files are a good thing, think about 
why having "/dev/null" is cleaner than having a special plug DEVNULL_FD 
fd value to be plugged everywhere, or why I can use find/grep/cat/echo/... 
to look at/edit my configuration inside /proc, instead of using a 
frigging registry editor. But here the list could be almost endless.
And please don't start the "they don't scale" or "they need heavy file 
binding" tossfeast. They scale as well as the interface that will 
receive them (poll, select, epoll). Heavy file binding what? 100 or so 
bytes for the struct file? How many signal/timer fds are you gonna have? 
Like 100K? A really moot argument when weighed against the benefit of 
being compatible with existing POSIX interfaces and being more Unix 
friendly.
As for the AIO stuff, if threadlets/syslets prove effective, you can 
host an epoll_wait over a syslet/threadlet. Or, if the 3 lines of 
userspace code needed to do that fall inside your definition of 
"kludge", we can even find a way to bridge the two.
Now, how about we focus on the topic of this thread?
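
(Re the epoll_wait-over-a-threadlet point above, a sketch using the 
threadlet primitives proposed in this thread - the names and signatures 
below are the proposal's, declared here as assumptions, not a shipped 
API:)

#include <sys/epoll.h>

extern long threadlet_exec(long (*fn)(void *), void *stack, void *head); /* assumed */
extern long threadlet_complete(void);	/* assumed */

static int epfd;

long epoll_threadlet_fn(void *data)
{
	struct epoll_event events[64];
	int i, n;

	/* If epoll_wait() blocks, the kernel moves this context into a
	 * real thread and the caller's context continues immediately. */
	n = epoll_wait(epfd, events, 64, -1);
	for (i = 0; i < n; i++)
		;	/* dispatch events[i] here */

	return threadlet_complete();
}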




[1] This could be an idea. People already use pipes for this, but pipes 
    have some memory overhead inside the kernel (plus they use two fds) 
    that could, if really felt necessary, be avoided.

[2] This is how these kinds of interfaces should be designed: modular,
    re-usable, file-based interfaces whose acceptance is not tied to 
    slurping in a whole new interface with tens of sub, interface-only, 
    objects. And from this POV, epoll is the friendliest.

[3] Notice the similarity between threadlets/syslets and epoll? They 
    enable pretty darn good scalability, with *existing* infrastructure, 
    and w/out special ad-hoc code to be plugged everywhere. This 
    translates directly into easier-to-maintain code.



- Davide




* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-04 21:17                                                                             ` Davide Libenzi
@ 2007-03-04 22:49                                                                               ` Kirk Kuchov
  2007-03-04 22:57                                                                                 ` Davide Libenzi
                                                                                                   ` (2 more replies)
  0 siblings, 3 replies; 337+ messages in thread
From: Kirk Kuchov @ 2007-03-04 22:49 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Evgeniy Polyakov, Ingo Molnar, Eric Dumazet, Pavel Machek,
	Theodore Tso, Linus Torvalds, Ulrich Drepper,
	Linux Kernel Mailing List, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On 3/4/07, Davide Libenzi <davidel@xmailserver.org> wrote:
> On Sun, 4 Mar 2007, Kirk Kuchov wrote:
>
> > On 3/3/07, Davide Libenzi <davidel@xmailserver.org> wrote:
> > > <snip>
> > >
> > > Those *other* (tons?!?) interfaces can be created *when* the need comes
> > > (see Linus' signalfd example [1] to show how urgent that was). *When*
> > > the need comes, they will work with existing POSIX interfaces, without
> > > requiring yet another event interface of your own. Those other interfaces
> > > could also be more easily adopted by other Unix cousins, precisely
> > > because they rely on existing POSIX interfaces.
> >
> > Please stop with this crap, this chicken or the egg argument of yours is utter
> > BULLSHIT!
>
> Wow, wow, fella! You _definitely_ cannot afford rudeness here.

I don't give a shit.

> You started bad, and you ended even worse, by listing some APIs that
> will work only with epoll. As I said already, and as listed in the
> thread I posted the link to, something like:
>
> int signalfd(...);  // Linus' initial interface would be perfectly fine
> int timerfd(...);   // Open ...
> int eventfd(...);   // [1]
>
> will work *even* with standard POSIX select/poll. 95% or more of the
> software out there does not have scalability issues, and select/poll are
> more portable and easier to use for simple stuff. On top of that, as I
> already said, they are *confined* interfaces that could be more easily
> adopted by other Unixes (if they are 100-200 lines on Linux, don't
> expect them to be a lot more on other Unixes) [2]. We *already* have the
> infrastructure inside Linux to deliver events (the f_op->poll
> subsystem), so how about we use that instead of yet another way? [3]

Man, you're so full of shit your eyes are brown. NOBODY cares about
select/poll or whether the interfaces are going to be adopted by other
Unixes. This issue was already solved by them YEARS ago.

What I (and a ton of other users) want is a SIMPLE and generic way to
receive events from MULTIPLE sources. I don't care about kernel-level
portability, easiness or whatever; the Linux kernel developers are good
at not knowing what their users want.

> As for why common abstractions like files are a good thing, think about
> why having "/dev/null" is cleaner than having a special plug DEVNULL_FD
> fd value to be plugged everywhere,

This is a stupid comparison. By your logic we should also have /dev/stdin,
/dev/stdout and /dev/stderr.

> or why I can use find/grep/cat/echo/...
> to look at/edit my configuration inside /proc, instead of using a
> frigging registry editor.

Yet another stupid comparison - /proc is a MESS! Almost as bad as
the registry. Linux now has three pieces of crap for
configuration/information: /proc, sysfs and sysctl. Nobody knows
exactly what should go into each one of those. Crap design at its
best.

> But here the list could be almost endless.
> And please don't start the "they don't scale" or "they need heavy file
> binding" tossfeast. They scale as well as the interface that will
> receive them (poll, select, epoll). Heavy file binding what? 100 or so
> bytes for the struct file? How many signal/timer fds are you gonna have?
> Like 100K? A really moot argument when weighed against the benefit of
> being compatible with existing POSIX interfaces and being more Unix
> friendly.

So why the HELL don't we have those yet? Why haven't you designed
epoll with those in mind? Why don't you back your claims with patches?
(I'm not a kernel developer.)

> As for the AIO stuff, if threadlets/syslets prove effective, you can
> host an epoll_wait over a syslet/threadlet. Or, if the 3 lines of
> userspace code needed to do that fall inside your definition of
> "kludge", we can even find a way to bridge the two.

I don't care about threadlets in this context, I just want to wait for
EVENTS from MULTIPLE sources WITHOUT mixing signals and other crap.
Your arrogance is amusing; stop pushing narrow-minded beliefs down the
throats of all Linux users. Kqueue, event ports,
WaitForMultipleObjects, epoll with multiple sources - that's what users
want, not yet another syscall/whatever hack.

> Now, how about we focus on the topic of this thread?
>
> [1] This could be an idea. People already use pipes for this, but pipes
>     have some memory overhead inside the kernel (plus they use two fds)
>     that could, if really felt necessary, be avoided.

Yet another hack!! 64 KiB of space just to push some user events
around. Great idea!

>
> [2] This is how these kinds of interfaces should be designed: modular,
>     re-usable, file-based interfaces whose acceptance is not tied to
>     slurping in a whole new interface with tens of sub, interface-only,
>     objects. And from this POV, epoll is the friendliest.

Who said I want yet another interface? I just fucking want to receive
events from MULTIPLE sources through epoll. With or without an fd! My
anger and frustration is that we can't get past this SIMPLE need!

> [3] Notice the similarity between threadlets/syslets and epoll? They
>     enable pretty darn good scalability, with *existing* infrastructure,
>     and w/out special ad-hoc code to be plugged everywhere. This
>     translates directly into easier-to-maintain code.

Easier for kernel developers, of course. <sarcasm>Who cares? Fuck the
users.</sarcasm>

--
Kirk Kuchov


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-04 22:49                                                                               ` Kirk Kuchov
@ 2007-03-04 22:57                                                                                 ` Davide Libenzi
  2007-03-05  4:46                                                                                 ` Magnus Naeslund(k)
  2007-03-06  8:19                                                                                 ` Pavel Machek
  2 siblings, 0 replies; 337+ messages in thread
From: Davide Libenzi @ 2007-03-04 22:57 UTC (permalink / raw)
  To: Kirk Kuchov
  Cc: Evgeniy Polyakov, Ingo Molnar, Eric Dumazet, Pavel Machek,
	Theodore Tso, Linus Torvalds, Ulrich Drepper,
	Linux Kernel Mailing List, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Sun, 4 Mar 2007, Kirk Kuchov wrote:

> I don't give a shit.

Here's another good use of /dev/null:

*PLONK*



- Davide




* Discussing LKML community [OT from the Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3]
  2007-02-28 17:01                           ` Michael K. Edwards
@ 2007-03-05  2:16                             ` Oleg Verych
  0 siblings, 0 replies; 337+ messages in thread
From: Oleg Verych @ 2007-03-05  2:16 UTC (permalink / raw)
  To: Michael K. Edwards; +Cc: LKML

> From: "Michael K. Edwards" <medwards.linux@gmail.com>
> Newsgroups: gmane.linux.kernel
> Subject: Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
> Date: Wed, 28 Feb 2007 09:01:07 -0800

Michael,

[]
> In this instance, there didn't seem to be any harm in sending my
> thoughts to LKML as I wrote them, on the off chance that Ingo or
> Davide would get some value out of them in this design cycle (which
> any code I eventually get around to producing will miss).  So far,
> I've gotten some rather dismissive pushback from Ingo and Alan (who
> seem to have no interest outside x86 and less understanding than I
> would have thought of what real userspace code looks like), a "why
> preach to people who know more than you do" from Davide,

this may be sad, unless you've spent the time and effort to make a Patch,
i.e. read the source, understood why it's written the way it is, why it's
being used that way now, and why it has to be updated in a new cycle of
kernel development.

> a brief aside on the dominance of x86 from Oleg,

I didn't have a chance, and probably will not have one, to communicate
with people like you and learn from your wisdom personally. That's why
I replied to you after you mentioned transputers. And I got a rather
different opinion than I expected. That shows my test-tube existence,
little experience, etc. As the discussion was about CPUs, it was
technical, thus on-topic for LKML.

> and one off-list "keep up the good work".  Not a very rich harvest from
> (IMHO) pretty good seeds.

The off-list message was my share of views about things that were
off-topic, plus some clarification about the LKML thing, and it wasn't
on-topic for LKML.

I'm pretty sure that there are libraries of books written on every
single bit of the things Linux currently *implements* in asm/C.

(1) Thus, `return -ENOPATCH', man, regardless of what you are saying on
LKML. That's why the prominent people you've lumped me in with (: replied
in go-to-kernelnewbies style.

> In short, so far the "Linux kernel community" is upholding its
> reputation for insularity, arrogance, coding without prior design,
> lack of interest in userspace problems, and inability to learn from
> the mistakes of others.  (None of these characterizations depends on
> there being any real insight in anything I have written.)

You, as a person who has the right to be personally wrong, may think that
way. But do not forget, as I wrote to you off-list and in (1), that this
is a development community, sometimes a development-of-development one,
etc: educated, enthusiastic, wise, Open Source, poor on time (and money :).

> Happy hacking,
> - Michael

And you too. LKML *can* (and sometimes may) show how useful this hacking is.

> P. S.  I do think "threadlets" are brilliant, though, and reading
> Ingo's patches gave me a much better idea of what would be involved in
> prototyping Asynchronously Executed I/O Unit opcodes.

You are discussing the on-topic thing in the P.S., and this is IMHO the
wrong approach.

Also, note that I've changed the subject and stripped the Cc list; and
please note that I may be a young and naive boy barking up the wrong tree.

Kind regards.
--
-o--=O`C  /. .\
 #oo'L O      o
<___=E M    ^-- (Wuuf)


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-04 22:49                                                                               ` Kirk Kuchov
  2007-03-04 22:57                                                                                 ` Davide Libenzi
@ 2007-03-05  4:46                                                                                 ` Magnus Naeslund(k)
  2007-03-06  8:19                                                                                 ` Pavel Machek
  2 siblings, 0 replies; 337+ messages in thread
From: Magnus Naeslund(k) @ 2007-03-05  4:46 UTC (permalink / raw)
  To: Kirk Kuchov; +Cc: linux-kernel

Kirk Kuchov wrote:
[snip]
> 
> This is a stupid comparison. By your logic we should also have /dev/stdin,
> /dev/stdout and /dev/stderr.
> 

Well, as a matter of fact (on my system):

# ls -l /dev/std*
lrwxrwxrwx  1 root root 4 Feb  1  2006 /dev/stderr -> fd/2
lrwxrwxrwx  1 root root 4 Feb  1  2006 /dev/stdin -> fd/0
lrwxrwxrwx  1 root root 4 Feb  1  2006 /dev/stdout -> fd/1

Please don't bother to respond to this mail, I just saw that you 
apparently needed the info.

Magnus

P.S.: *PLONK*


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-04 17:46                                                                             ` Kyle Moffett
@ 2007-03-05  5:23                                                                               ` Michael K. Edwards
  0 siblings, 0 replies; 337+ messages in thread
From: Michael K. Edwards @ 2007-03-05  5:23 UTC (permalink / raw)
  To: Kyle Moffett
  Cc: Kirk Kuchov, Davide Libenzi, Evgeniy Polyakov, Ingo Molnar,
	Eric Dumazet, Pavel Machek, Theodore Tso, Linus Torvalds,
	Ulrich Drepper, Linux Kernel Mailing List, Arjan van de Ven,
	Christoph Hellwig, Andrew Morton, Alan Cox, Zach Brown,
	David S. Miller, Suparna Bhattacharya, Jens Axboe,
	Thomas Gleixner

On 3/4/07, Kyle Moffett <mrmacman_g4@mac.com> wrote:
> Well, even this far into 2.6, Linus' patch from 2003 still (mostly)
> applies; the maintenance cost for this kind of code is virtually
> zilch.  If it matters that much to you, clean it up and make it apply;
> add an alarmfd() syscall (another 100 lines of code at most?), make
> "read" return an architecture-independent siginfo-like structure, and
> submit it for inclusion.  Adding epoll() support for random objects is
> as simple as a 75-line object-filesystem and a 25-line syscall to
> return an FD to a new inode.  Have fun!  Go wild!  Something this
> trivially simple could probably spend a week in -mm and go to Linus
> for 2.6.22.

Or, if you want to do slightly more work and produce something a great
deal more useful, you could implement additional netlink address
families for additional "event" sources.  The socket - setsockopt -
bind - sendmsg/recvmsg sequence is a well understood and well
documented UNIX paradigm for multiplexing non-blocking I/O to many
destinations over one socket.  Everyone who has read Stevens is
familiar with the basic UDP and "fd open server" techniques, and if
you look at Linux's IP_PKTINFO and NETLINK_W1 (bravo, Evgeniy!) you'll
see how easily they could be extended to file AIO and other kinds of
event sources.

For file AIO, you might have the application open one AIO socket per
mount point, open files indirectly via the SCM_RIGHTS mechanism, and
submit/retire read/write requests via sendmsg/recvmsg with ancillary
data consisting of an lseek64 tuple and a user-provided cookie.
Although the process still has to have one fd open per actual open
file (because trying to authenticate file accesses without opening fds
is madness), the only fds it has to manipulate directly are those
representing entire pools of outstanding requests.  This is usually a
small enough set that select() will do just fine, if you're careful
with fd allocation.  (You can simply punt indirectly opened fds up to
a high numerical range, where they can't be accessed directly from
userspace but still make fine cookies for use in lseek64 tuples within
cmsg headers).
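
(The fd-passing half of this is existing, standard API - a minimal 
sketch of handing an fd across an AF_UNIX socket with SCM_RIGHTS; only 
the AIO-submission framing around it would be new:)

#include <sys/socket.h>
#include <sys/uio.h>
#include <string.h>

int send_fd(int sock, int fd_to_pass)
{
	char dummy = 'x';
	struct iovec iov = { &dummy, 1 };
	char cbuf[CMSG_SPACE(sizeof(int))];
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = cbuf;
	msg.msg_controllen = sizeof(cbuf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;		/* pass an open fd */
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd_to_pass, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}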

The same basic approach will work for timers, signals, and just about
any other event source.  Userspace is of course still stuck doing its
own state machines / thread scheduling / however you choose to think
of it.  But all the important activity goes through socketcall(), and
the data and control parameters are all packaged up into a struct
msghdr instead of the bare buffer pointers of read/write.  So if
someone else does come along later and design an ultralight threading
mechanism that isn't a total botch, the actual data paths won't need
much rework; the exception handling will just get a lot simpler.

Cheers,
- Michael


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-04 22:49                                                                               ` Kirk Kuchov
  2007-03-04 22:57                                                                                 ` Davide Libenzi
  2007-03-05  4:46                                                                                 ` Magnus Naeslund(k)
@ 2007-03-06  8:19                                                                                 ` Pavel Machek
  2007-03-07 12:02                                                                                   ` Kirk Kuchov
  2 siblings, 1 reply; 337+ messages in thread
From: Pavel Machek @ 2007-03-06  8:19 UTC (permalink / raw)
  To: Kirk Kuchov
  Cc: Davide Libenzi, Evgeniy Polyakov, Ingo Molnar, Eric Dumazet,
	Theodore Tso, Linus Torvalds, Ulrich Drepper,
	Linux Kernel Mailing List, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

> >As for why common abstractions like files are a good thing, think about
> >why having "/dev/null" is cleaner than having a special plug DEVNULL_FD
> >fd value to be plugged everywhere,
> 
> This is a stupid comparison. By your logic we should also have /dev/stdin,
> /dev/stdout and /dev/stderr.

Bzzt, wrong. We have them.

pavel@amd:~$ ls -al /dev/std*
lrwxrwxrwx 1 root root 4 Nov 12  2003 /dev/stderr -> fd/2
lrwxrwxrwx 1 root root 4 Nov 12  2003 /dev/stdin -> fd/0
lrwxrwxrwx 1 root root 4 Nov 12  2003 /dev/stdout -> fd/1
pavel@amd:~$ ls -al /proc/self/fd
total 0
dr-x------ 2 pavel users  0 Mar  6 09:18 .
dr-xr-xr-x 4 pavel users  0 Mar  6 09:18 ..
lrwx------ 1 pavel users 64 Mar  6 09:18 0 -> /dev/ttyp2
lrwx------ 1 pavel users 64 Mar  6 09:18 1 -> /dev/ttyp2
lrwx------ 1 pavel users 64 Mar  6 09:18 2 -> /dev/ttyp2
lr-x------ 1 pavel users 64 Mar  6 09:18 3 -> /proc/2299/fd
pavel@amd:~$

> >But here the list could be almost endless.
> >And please don't start the "they don't scale" or "they need heavy file
> >binding" tossfeast. They scale as well as the interface that will
> >receive them (poll, select, epoll). Heavy file binding what? 100 or so
> >bytes for the struct file? How many signal/timer fds are you gonna have?
> >Like 100K? A really moot argument when weighed against the benefit of
> >being compatible with existing POSIX interfaces and being more Unix
> >friendly.
> 
> So why the HELL don't we have those yet? Why haven't you designed
> epoll with those in mind? Why don't you back your claims with patches?
> (I'm not a kernel developer.)

Either stop flaming kernel developers or become one. It is that
simple.

								Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-06  8:19                                                                                 ` Pavel Machek
@ 2007-03-07 12:02                                                                                   ` Kirk Kuchov
  2007-03-07 17:26                                                                                     ` Linus Torvalds
  2007-03-07 17:39                                                                                     ` Ingo Molnar
  0 siblings, 2 replies; 337+ messages in thread
From: Kirk Kuchov @ 2007-03-07 12:02 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Davide Libenzi, Evgeniy Polyakov, Ingo Molnar, Eric Dumazet,
	Theodore Tso, Linus Torvalds, Ulrich Drepper,
	Linux Kernel Mailing List, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On 3/6/07, Pavel Machek <pavel@ucw.cz> wrote:
> > >As for why common abstractions like files are a good thing, think about
> > >why having "/dev/null" is cleaner than having a special plug DEVNULL_FD
> > >fd value to be plugged everywhere,
> >
> > This is a stupid comparison. By your logic we should also have /dev/stdin,
> > /dev/stdout and /dev/stderr.
>
> Bzzt, wrong. We have them.
>
> pavel@amd:~$ ls -al /dev/std*
> lrwxrwxrwx 1 root root 4 Nov 12  2003 /dev/stderr -> fd/2
> lrwxrwxrwx 1 root root 4 Nov 12  2003 /dev/stdin -> fd/0
> lrwxrwxrwx 1 root root 4 Nov 12  2003 /dev/stdout -> fd/1
> pavel@amd:~$ ls -al /proc/self/fd
> total 0
> dr-x------ 2 pavel users  0 Mar  6 09:18 .
> dr-xr-xr-x 4 pavel users  0 Mar  6 09:18 ..
> lrwx------ 1 pavel users 64 Mar  6 09:18 0 -> /dev/ttyp2
> lrwx------ 1 pavel users 64 Mar  6 09:18 1 -> /dev/ttyp2
> lrwx------ 1 pavel users 64 Mar  6 09:18 2 -> /dev/ttyp2
> lr-x------ 1 pavel users 64 Mar  6 09:18 3 -> /proc/2299/fd
> pavel@amd:~$

I don't believe I'm wasting my time explaining this. They don't exist
as /dev/null, they are just fucking _LINKS_. I could even "ln -s
/proc/self/fd/0 sucker". A real /dev/stdout can/could even exist, but
that's not the point!

It remains a stupid comparison because /dev/stdin/stderr/whatever
"must" be plugged - else how could a process write to stdout/stderr
when it couldn't open them? The way things are is not because it's
cleaner to have them as files, but because it's the only sane way.
/dev/null is not a must-have; it's mainly used for redirection
purposes. A sys_nullify(fileno(stdout)) would rule out almost any use
of /dev/null.
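
(For contrast, the existing file-based idiom that such a sys_nullify() 
would replace is only a few lines with the current, real interfaces:)

#include <fcntl.h>
#include <unistd.h>

int nullify_stdout(void)
{
	int fd = open("/dev/null", O_WRONLY);

	if (fd < 0)
		return -1;
	dup2(fd, STDOUT_FILENO);	/* stdout now goes to the bit bucket */
	close(fd);
	return 0;
}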

> > >But here the list could be almost endless.
> > >And please don't start the "they don't scale" or "they need heavy file
> > >binding" tossfeast. They scale as well as the interface that will
> > >receive them (poll, select, epoll). Heavy file binding what? 100 or so
> > >bytes for the struct file? How many signal/timer fds are you gonna have?
> > >Like 100K? A really moot argument when weighed against the benefit of
> > >being compatible with existing POSIX interfaces and being more Unix
> > >friendly.
> >
> > So why the HELL don't we have those yet? Why haven't you designed
> > epoll with those in mind? Why don't you back your claims with patches?
> > (I'm not a kernel developer.)
>
> Either stop flaming kernel developers or become one. It is  that
> simple.
>

If I were to become a kernel developer I would stick with FreeBSD. At
least they have had kqueue for about seven years now.

--
Kirk Kuchov


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-07 12:02                                                                                   ` Kirk Kuchov
@ 2007-03-07 17:26                                                                                     ` Linus Torvalds
  2007-03-07 17:39                                                                                     ` Ingo Molnar
  1 sibling, 0 replies; 337+ messages in thread
From: Linus Torvalds @ 2007-03-07 17:26 UTC (permalink / raw)
  To: Kirk Kuchov
  Cc: Pavel Machek, Davide Libenzi, Evgeniy Polyakov, Ingo Molnar,
	Eric Dumazet, Theodore Tso, Ulrich Drepper,
	Linux Kernel Mailing List, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner



On Wed, 7 Mar 2007, Kirk Kuchov wrote:
> 
> I don't believe I'm wasting my time explaining this. They don't exist
> as /dev/null, they are just fucking _LINKS_. I could even "ln -s
> /proc/self/fd/0 sucker". A real /dev/stdout can/could even exist, but
> that's not the point!

Actually, one large reason for /proc/self/ existing is exactly /dev/stdin 
and friends.

And yes, /proc/self looks like a link too, but that doesn't change the 
fact that it's a very special file. No different from /dev/null or 
friends.

		Linus


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-07 12:02                                                                                   ` Kirk Kuchov
  2007-03-07 17:26                                                                                     ` Linus Torvalds
@ 2007-03-07 17:39                                                                                     ` Ingo Molnar
  2007-03-07 18:21                                                                                       ` Kirk Kuchov
  1 sibling, 1 reply; 337+ messages in thread
From: Ingo Molnar @ 2007-03-07 17:39 UTC (permalink / raw)
  To: Kirk Kuchov
  Cc: Pavel Machek, Davide Libenzi, Evgeniy Polyakov, Eric Dumazet,
	Theodore Tso, Linus Torvalds, Ulrich Drepper,
	Linux Kernel Mailing List, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner


* Kirk Kuchov <kirk.kuchov@gmail.com> wrote:

> I don't believe I'm wasting my time explaining this. They don't exist 
> as /dev/null, they are just fucking _LINKS_.
[...]
> > Either stop flaming kernel developers or become one. It is that 
> > simple.
> 
> If I were to become a kernel developer I would stick with FreeBSD. 
> [...]

Hey, really, this is an excellent idea: what a boon you could become to 
FreeBSD, again! How much they must be longing for your insightful 
feedback, how much they must be missing your charming style and tactful 
approach! I bet they'll want to print your mails out, frame them and 
hang them over their fireplace, to remember the good old days on cold 
snowy winter days, with warmth in their hearts! Please?

	Ingo


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-07 17:39                                                                                     ` Ingo Molnar
@ 2007-03-07 18:21                                                                                       ` Kirk Kuchov
  2007-03-07 18:24                                                                                         ` Jens Axboe
  2007-03-07 18:32                                                                                         ` Evgeniy Polyakov
  0 siblings, 2 replies; 337+ messages in thread
From: Kirk Kuchov @ 2007-03-07 18:21 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pavel Machek, Davide Libenzi, Evgeniy Polyakov, Eric Dumazet,
	Theodore Tso, Linus Torvalds, Ulrich Drepper,
	Linux Kernel Mailing List, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On 3/7/07, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Kirk Kuchov <kirk.kuchov@gmail.com> wrote:
>
> > I don't believe I'm wasting my time explaining this. They don't exist
> > as /dev/null, they are just fucking _LINKS_.
> [...]
> > > Either stop flaming kernel developers or become one. It is that
> > > simple.
> >
> > If I were to become a kernel developer I would stick with FreeBSD.
> > [...]
>
> Hey, really, this is an excellent idea: what a boon you could become to
> FreeBSD, again! How much they must be longing for your insightful
> feedback, how much they must be missing your charming style and tactful
> approach! I bet they'll want to print your mails out, frame them and
> hang them over their fireplace, to remember the good old days on cold
> snowy winter days, with warmth in their hearts! Please?
>

http://www.totallytom.com/thecureforgayness.html

--
Kirk Kuchov


* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-07 18:21                                                                                       ` Kirk Kuchov
@ 2007-03-07 18:24                                                                                         ` Jens Axboe
  2007-03-07 18:32                                                                                         ` Evgeniy Polyakov
  1 sibling, 0 replies; 337+ messages in thread
From: Jens Axboe @ 2007-03-07 18:24 UTC (permalink / raw)
  To: Kirk Kuchov
  Cc: Ingo Molnar, Pavel Machek, Davide Libenzi, Evgeniy Polyakov,
	Eric Dumazet, Theodore Tso, Linus Torvalds, Ulrich Drepper,
	Linux Kernel Mailing List, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Thomas Gleixner

On Wed, Mar 07 2007, Kirk Kuchov wrote:
> On 3/7/07, Ingo Molnar <mingo@elte.hu> wrote:
> >
> >* Kirk Kuchov <kirk.kuchov@gmail.com> wrote:
> >
> >> I don't believe I'm wasting my time explaining this. They don't exist
> >> as /dev/null, they are just fucking _LINKS_.
> >[...]
> >> > Either stop flaming kernel developers or become one. It is that
> >> > simple.
> >>
> >> If I were to become a kernel developer I would stick with FreeBSD.
> >> [...]
> >
> >Hey, really, this is an excellent idea: what a boon you could become to
> >FreeBSD, again! How much they must be longing for your insightful
> >feedback, how much they must be missing your charming style and tactful
> >approach! I bet they'll want to print your mails out, frame them and
> >hang them over their fireplace, to remember the good old days on cold
> >snowy winter days, with warmth in their hearts! Please?
> >
> 
> http://www.totallytom.com/thecureforgayness.html

Dude, get a life. But more importantly, go waste somebody else's time
instead of lkml's.

-- 
Jens Axboe, updating killfile



* Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
  2007-03-07 18:21                                                                                       ` Kirk Kuchov
  2007-03-07 18:24                                                                                         ` Jens Axboe
@ 2007-03-07 18:32                                                                                         ` Evgeniy Polyakov
  1 sibling, 0 replies; 337+ messages in thread
From: Evgeniy Polyakov @ 2007-03-07 18:32 UTC (permalink / raw)
  To: Kirk Kuchov
  Cc: Ingo Molnar, Pavel Machek, Davide Libenzi, Eric Dumazet,
	Theodore Tso, Linus Torvalds, Ulrich Drepper,
	Linux Kernel Mailing List, Arjan van de Ven, Christoph Hellwig,
	Andrew Morton, Alan Cox, Zach Brown, David S. Miller,
	Suparna Bhattacharya, Jens Axboe, Thomas Gleixner

On Wed, Mar 07, 2007 at 03:21:19PM -0300, Kirk Kuchov (kirk.kuchov@gmail.com) wrote:
> On 3/7/07, Ingo Molnar <mingo@elte.hu> wrote:
> >
> >* Kirk Kuchov <kirk.kuchov@gmail.com> wrote:
> >
> >> I don't believe I'm wasting my time explaining this. They don't exist
> >> as /dev/null, they are just fucking _LINKS_.
> >[...]
> >> > Either stop flaming kernel developers or become one. It is that
> >> > simple.
> >>
> >> If I were to become a kernel developer I would stick with FreeBSD.
> >> [...]
> >
> >Hey, really, this is an excellent idea: what a boon you could become to
> >FreeBSD, again! How much they must be longing for your insightful
> >feedback, how much they must be missing your charming style and tactful
> >approach! I bet they'll want to print your mails out, frame them and
> >hang them over their fireplace, to remember the good old days on cold
> >snowy winter days, with warmth in their hearts! Please?
> >
> 
> http://www.totallytom.com/thecureforgayness.html

Fonts are a bit bad in my browser :)

Kirk, I understand your frustration - yes, Linux is not the perfect
place for startup ideas, and yes, it lacks some features that modern
(or old) systems have supported for years, but things change with time.

I posted a patch which allows polling for signals; it can be trivially
adapted to support timers and essentially any other events.
Kevent did that too, but some things are just too radical for immediate
inclusion, especially when the majority of users do not require the
additional functionality.

People do work, and a lot of them do really good work, so there is no
need for rude talk about how bad things are. Things change - even I
support that, although the way kevent was ignored should put me in the
first line with you :)

Be good, and be cool.

> --
> Kirk Kuchov

-- 
	Evgeniy Polyakov


2007-03-02 10:37                                       ` Evgeniy Polyakov
2007-03-02 10:56                                         ` Ingo Molnar
2007-03-02 11:08                                           ` Evgeniy Polyakov
2007-03-02 17:28                                             ` Davide Libenzi
2007-03-03 10:27                                               ` Evgeniy Polyakov
2007-03-01 19:24                             ` Johann Borck
2007-03-01 19:37                               ` David Lang
2007-03-01 20:34                                 ` Johann Borck
2007-02-27 12:34                       ` Ingo Molnar
2007-02-27 13:14                         ` Evgeniy Polyakov
2007-02-27 13:32                         ` Avi Kivity
2007-02-28  3:03                       ` Michael K. Edwards
2007-02-28  8:02                         ` Evgeniy Polyakov
2007-02-28 17:01                           ` Michael K. Edwards
2007-03-05  2:16                             ` Discussing LKML community [OT from the Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3] Oleg Verych
2007-02-28 16:38                         ` [patch 00/13] Syslets, "Threadlets", generic AIO support, v3 Phillip Susi
2007-02-22 19:38         ` Davide Libenzi
2007-02-28  9:45           ` Ingo Molnar
2007-02-28 16:17             ` Davide Libenzi
2007-02-28 16:42               ` Linus Torvalds
2007-02-28 17:26                 ` Ingo Molnar
2007-02-28 18:22                 ` Davide Libenzi
2007-02-28 18:42                   ` Linus Torvalds
2007-02-28 18:50                     ` Davide Libenzi
2007-02-28 19:03                       ` Chris Friesen
2007-02-28 19:42                         ` Davide Libenzi
2007-03-01  8:38                           ` Evgeniy Polyakov
2007-03-01  9:28                             ` Evgeniy Polyakov
2007-02-28 23:12                 ` Ingo Molnar
2007-03-01  1:33                   ` Andrea Arcangeli
2007-03-01  9:15                     ` Evgeniy Polyakov
2007-03-01 21:27                   ` Linus Torvalds
2007-02-28 20:21               ` Ingo Molnar
2007-02-28 21:09                 ` Davide Libenzi
2007-02-28 21:23                   ` Ingo Molnar
2007-02-28 21:46                     ` Davide Libenzi
2007-02-28 22:22                       ` Ingo Molnar
2007-02-28 22:47                         ` Davide Libenzi
2007-02-22 21:23         ` Zach Brown
2007-02-22 21:32           ` Benjamin LaHaise
2007-02-22 21:44             ` Zach Brown
2007-02-22  1:04     ` Michael K. Edwards
2007-02-22  7:00       ` Ingo Molnar
2007-02-22  9:29         ` Michael K. Edwards
2007-02-22 10:01 ` Suparna Bhattacharya
2007-02-22 11:20   ` Ingo Molnar
2007-02-23 12:52 ` A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3) Jens Axboe
2007-02-23 13:55   ` Suparna Bhattacharya
2007-02-23 14:58     ` Ingo Molnar
2007-02-23 15:15       ` Suparna Bhattacharya
2007-02-23 16:25         ` Jens Axboe
2007-02-23 17:13           ` Suparna Bhattacharya
2007-02-23 18:35             ` Jens Axboe
2007-02-26 13:57             ` Jens Axboe
2007-02-26 14:13               ` Suparna Bhattacharya
2007-02-26 14:18                 ` Ingo Molnar
2007-02-26 14:45                 ` Jens Axboe
2007-02-27  4:33                   ` Suparna Bhattacharya
2007-02-27  9:42                     ` Jens Axboe
2007-02-27 11:12                       ` Evgeniy Polyakov
2007-02-27 11:29                         ` Jens Axboe
2007-02-27 12:19                           ` Evgeniy Polyakov
2007-02-27 18:45                             ` Jens Axboe
2007-02-27 19:08                               ` Evgeniy Polyakov
2007-02-27 19:25                                 ` Jens Axboe
2007-02-27 12:39                       ` Suparna Bhattacharya
2007-02-28  8:31                         ` Jens Axboe
2007-02-28  8:38                           ` Ingo Molnar
2007-02-28  9:07                             ` Jens Axboe
2007-02-27 13:54                     ` Avi Kivity
2007-02-27 15:25                       ` Ingo Molnar
2007-02-27 16:15                         ` Avi Kivity
2007-02-27 16:16                           ` Ingo Molnar
2007-02-27 16:26                             ` Avi Kivity
2007-02-27 18:49                           ` Jens Axboe
2007-02-26 21:40               ` Davide Libenzi
2007-02-23 16:59         ` Ingo Molnar
2007-02-23 22:31   ` Joel Becker
2007-02-24 12:18     ` Jens Axboe
2007-02-24  7:41 ` [patchset] Syslets/threadlets, generic AIO support, v4 Ingo Molnar
2007-02-24 18:34 ` [patch 00/13] Syslets, "Threadlets", generic AIO support, v3 Evgeniy Polyakov
2007-02-25 17:23   ` Ingo Molnar
2007-02-25 17:44     ` Evgeniy Polyakov
2007-02-25 17:54       ` Ingo Molnar
2007-02-25 18:21         ` Evgeniy Polyakov
2007-02-25 18:22           ` Ingo Molnar
2007-02-25 18:37             ` Evgeniy Polyakov
2007-02-25 18:34               ` Ingo Molnar
2007-02-25 20:01                 ` Frederik Deweerdt
2007-02-25 19:21               ` Ingo Molnar
     [not found]                 ` <20070225194645.GB1353@2ka.mipt.ru>
     [not found]                   ` <20070225195308.GC15681@elte.hu>
     [not found]                     ` <Pine.LNX.4.64.0702251232350.6011@alien.or.mcafeemobile.com>
2007-02-25 21:34                       ` threadlets as 'naive pool of threads', epoll, some measurements Ingo Molnar
2007-02-26 10:45                         ` Ingo Molnar
2007-02-26 11:48                           ` Ingo Molnar
2007-02-26 12:25                             ` Evgeniy Polyakov
2007-02-26 12:50                               ` Ingo Molnar
2007-02-26 14:32                                 ` Evgeniy Polyakov
2007-02-26 20:23                                   ` Ingo Molnar
2007-02-27  8:16                                     ` Evgeniy Polyakov
2007-02-27  8:27                                       ` Ingo Molnar
2007-02-27 10:37                                         ` Evgeniy Polyakov
2007-02-27 12:15                                           ` Ingo Molnar
2007-02-27 12:22                                             ` Evgeniy Polyakov
2007-02-26 21:22                           ` Davide Libenzi
2007-02-26  8:16                       ` [patch 00/13] Syslets, "Threadlets", generic AIO support, v3 Ingo Molnar
2007-02-26  9:25                         ` Evgeniy Polyakov
2007-02-26  9:55                           ` Ingo Molnar
2007-02-26 10:31                             ` Ingo Molnar
2007-02-26 10:43                               ` Evgeniy Polyakov
2007-02-26 20:02                               ` Davide Libenzi
2007-02-26 10:33                             ` Evgeniy Polyakov
2007-02-26 10:35                               ` Ingo Molnar
2007-02-26 10:47                                 ` Evgeniy Polyakov
2007-02-26 12:51                                   ` Ingo Molnar
2007-02-26 16:46                                     ` Evgeniy Polyakov
2007-02-27  6:24                                       ` Ingo Molnar
2007-02-27 10:41                                         ` Evgeniy Polyakov
2007-02-27 10:49                                           ` Ingo Molnar
2007-02-25 18:25           ` Evgeniy Polyakov
2007-02-25 18:24             ` Ingo Molnar
2007-02-25 14:33 ` Threadlet/syslet v4 bug report Evgeniy Polyakov
2007-02-25 15:24   ` Evgeniy Polyakov
