linux-kernel.vger.kernel.org archive mirror
From: Rusty Russell <rusty@rustcorp.com.au>
To: Zach Brown <zach.brown@oracle.com>
Cc: linux-kernel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Ingo Molnar <mingo@elte.hu>, Ulrich Drepper <drepper@redhat.com>,
	Arjan van de Ven <arjan@infradead.org>,
	Andrew Morton <akpm@zip.com.au>,
	Alan Cox <alan@lxorguk.ukuu.org.uk>,
	Evgeniy Polyakov <johnpol@2ka.mipt.ru>,
	"David S. Miller" <davem@davemloft.net>,
	Suparna Bhattacharya <suparna@in.ibm.com>,
	Davide Libenzi <davidel@xmailserver.org>,
	Jens Axboe <jens.axboe@oracle.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Dan Williams <dan.j.williams@gmail.com>,
	Jeff Moyer <jmoyer@redhat.com>,
	Simon Holm Thogersen <odie@cs.aau.dk>,
	suresh.b.siddha@intel.com
Subject: Re: [PATCH 5/6] syslets: add generic syslets infrastructure
Date: Wed, 9 Jan 2008 14:48:44 +1100
Message-ID: <200801091448.46241.rusty@rustcorp.com.au>
In-Reply-To: <478438B4.6010101@oracle.com>

On Wednesday 09 January 2008 14:00:04 Zach Brown wrote:
> >     Firstly, why not just specify an address for the return value and be
> > done with it?  This infrastructure seems overkill, and you can always
> > extend later if required.
>
> Sorry, which infrastructure?
>
> Providing the function and stack to return to?  Sure, I could certainly
> entertain the idea of not having syslet tasks return to userspace in the
> first pass.  Ingo sure seemed excited by the idea.
>
> Or do you mean the syscall return value ending up in the userspace
> completion event ring?  That's mostly about being able to wait for
> pending syslets to complete.

The latter.  A ring is optimal for processing a huge number of requests, but 
if you're really going to be firing off syslet threads all over the place 
you're not going to be optimal anyway.  And being able to point the return 
value to the stack or into some data structure is way nicer to code (zero 
setup == easy to understand and easy to convert).
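
To make the trade-off concrete, here is a hedged, userspace-only sketch of the two ways a finished syslet could hand back its syscall return value: posting it to a completion ring that the application later reaps, or storing it directly at an address the caller supplied. All names (struct completion_ring, ring_post, ring_reap, complete_to_address) are hypothetical and are not the ABI from the patch series; real kernel/user sharing and memory barriers are not modelled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical completion ring: the "kernel" side posts (cookie, return
 * value) pairs and the application reaps them in order.  Single producer,
 * single consumer, no barriers: this only shows the bookkeeping. */
#define RING_SIZE 8 /* must be a power of two */

struct completion {
    uint64_t cookie; /* identifies which request finished */
    long ret;        /* the syscall's return value */
};

struct completion_ring {
    struct completion slots[RING_SIZE];
    uint32_t head; /* next slot the producer writes */
    uint32_t tail; /* next slot the consumer reads */
};

/* Producer side: returns 0 on success, -1 if the ring is full. */
static int ring_post(struct completion_ring *r, uint64_t cookie, long ret)
{
    if (r->head - r->tail == RING_SIZE)
        return -1;
    r->slots[r->head & (RING_SIZE - 1)] = (struct completion){ cookie, ret };
    r->head++;
    return 0;
}

/* Consumer side: returns 1 and fills *out if a completion was pending. */
static int ring_reap(struct completion_ring *r, struct completion *out)
{
    if (r->tail == r->head)
        return 0;
    *out = r->slots[r->tail & (RING_SIZE - 1)];
    r->tail++;
    return 1;
}

/* Alternative: the caller supplied an address (on its stack or inside its
 * own data structure) and the result is simply stored there. */
static void complete_to_address(long *ret_ptr, long ret)
{
    *ret_ptr = ret;
}
```

With the ring, the application has to allocate and size it, then match cookies back to requests; with a return address, something like `long n; complete_to_address(&n, ret);` leaves the result exactly where the submitting code expects it, which is the "zero setup" point.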

For notification, see below.

> > Secondly, you really should allow integration with an eventfd so you
> > don't make the posix AIO mistake of providing a poll-incompatible
> > interface.
>
> Yeah, this seems straightforward enough that I haven't made it an
> initial priority.  I'm sure it will be helpful for people who are stuck
> integrating with entrenched software that wants to wait for pollable fds.

Unfortunately, waiting for someone to write a killer app which uses your new 
API is the road to disappointment.  The real target is convincing the handful 
of important apps (Samba, Apache, ...) to #ifdef around some small piece of 
code in order to get performance.  And a single design wart could mean 
that never happens.  Look at epoll: it's probably been the most successful, 
and it's still damn niche.
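
The poll-compatibility point can be illustrated with the existing eventfd(2) interface.  In this hedged sketch, whatever completes an asynchronous request (a syslet, in the proposal; here just a call to the hypothetical helper signal_completion) bumps an eventfd counter, and the application waits on that fd with ordinary poll(2), so it can sit in the same pollfd array as sockets and pipes.  How syslets would actually be bound to an eventfd was not specified in this thread.

```c
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical completion hook: add one to the eventfd counter. */
static int signal_completion(int efd)
{
    uint64_t one = 1;
    return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Wait up to timeout_ms for completions using plain poll(2).
 * Returns the number of completions signalled since the last read (the
 * eventfd counter, which read(2) resets to zero), 0 on timeout, or -1 on
 * error.  Other fds could be added to the same pollfd array. */
static long wait_for_completions(int efd, int timeout_ms)
{
    struct pollfd pfd = { .fd = efd, .events = POLLIN };
    int n = poll(&pfd, 1, timeout_ms);
    if (n <= 0)
        return n; /* 0 = timeout, -1 = error */

    uint64_t count;
    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return -1;
    return (long)count;
}
```

Because the counter accumulates, several completions that arrive between two polls are reported by a single read, which keeps the wakeup cost independent of how many requests finished.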

> For more flexible software, though, it's compelling to now be able to
> aggregate waiting for completion of the existing waiting syscalls (poll,
> epoll_wait, futexes, whatever) by issuing them as concurrent syslets.

Is replacing epoll with syslets really going to win, even if you're writing 
apps from scratch?  Anyway, a fast notification mechanism is a different 
problem from syslets, and should be handled separately.

> > Finally, and probably most alarmingly, AFAICT randomly changing TID will
> > break all threaded programs, which means this won't be fitted into
> > existing code bases, making it YA niche Linux-only API 8(
>
> I wonder if there isn't an opportunity to add a clone() flag which
> juggles the association between TIDs and task_structs.  I don't relish
> the idea of investigating the life cycles of task_struct references that
> derive from TIDs and seeing how those would race with a syslet blocking
> and cloning, but, well, maybe that's what needs to be done.

This must be solved, yet all avenues seem to be crawling with worms.  Redirecting 
find_task_by_pid() to find the original and converting all the places where 
we return tids to userspace?  Swapping tids when we clone?  Duplicate tids, 
with only the non-syslet one being returned from find_task_by_pid()?
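
Why a changing TID is so damaging: a lot of threaded code caches the kernel TID and compares against it later, for example to record a lock owner.  The hedged sketch below uses a hypothetical lock type (not glibc's) that records the owner via syscall(SYS_gettid) and checks ownership later.  When the same code continues in a different task, simulated here with fork(2) as a stand-in since the syslet patches are not assumed to be available, the check fails even though the program believes it is still "the same thread".

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical lock that remembers its owner by kernel TID. */
struct tid_lock {
    pid_t owner; /* 0 when unlocked */
};

static pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

static void tid_lock_acquire(struct tid_lock *l)
{
    l->owner = current_tid(); /* no contention handling: illustration only */
}

static int tid_lock_held_by_caller(const struct tid_lock *l)
{
    return l->owner == current_tid();
}

/* Run the ownership check in a different task.  A forked child stands in
 * for "execution continued after a syslet blocked and the TID changed".
 * Returns the child's verdict (1 if it still thinks it holds the lock),
 * or -1 on error. */
static int check_in_other_task(const struct tid_lock *l)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(tid_lock_held_by_caller(l));

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Any of the options above (redirecting lookups, swapping TIDs, or duplicate TIDs) would have to keep checks like tid_lock_held_by_caller returning the same answer before and after the syscall blocks.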

> This all isn't my area of expertise, though, sadly.  It would be swell
> if someone wanted to look into it before I'm forced to learn yet another
> weird corner of the kernel.

Let's just tell Ingo it's impossible to solve :)

Rusty.

Thread overview: 26+ messages
2007-12-06 23:20 syslets v7: back to basics Zach Brown
2007-12-06 23:20 ` [PATCH 1/6] indirect: use asmlinkage in i386 syscall table prototype Zach Brown
2007-12-06 23:20   ` [PATCH 2/6] syslet: asm-generic support to disable syslets Zach Brown
2007-12-06 23:20     ` [PATCH 3/6] syslet: introduce abi structs Zach Brown
2007-12-06 23:20       ` [PATCH 4/6] syslets: add indirect args Zach Brown
2007-12-06 23:20         ` [PATCH 5/6] syslets: add generic syslets infrastructure Zach Brown
2007-12-06 23:20           ` [PATCH 6/6] syslets: add both 32bit and 64bit x86 syslet support Zach Brown
2007-12-07 11:55           ` [PATCH 5/6] syslets: add generic syslets infrastructure Evgeniy Polyakov
2007-12-07 18:24             ` Zach Brown
2008-01-09  2:03           ` Rusty Russell
2008-01-09  3:00             ` Zach Brown
2008-01-09  3:48               ` Rusty Russell [this message]
2008-01-09 18:16                 ` Zach Brown
2008-01-09 22:04                   ` Rusty Russell
2008-01-09 22:58                     ` Linus Torvalds
2008-01-09 23:05                       ` Linus Torvalds
2008-01-09 23:47                       ` Zach Brown
2008-01-10  1:18                       ` Rusty Russell
2008-01-09 23:15                     ` Davide Libenzi
2008-01-10  5:41                   ` Jeff Garzik
2007-12-08 12:40   ` [PATCH 1/6] indirect: use asmlinkage in i386 syscall table prototype Simon Holm Thøgersen
2007-12-08 21:22     ` Zach Brown
2007-12-08 12:52 ` [PATCH] Fix casting on architectures with 32-bit pointers/longs Simon Holm Thøgersen
2007-12-10 19:46 ` syslets v7: back to basics Jens Axboe
2007-12-10 21:30 ` Phillip Susi
2007-12-10 22:15   ` Zach Brown
