From: "Kent Overstreet" <kent.overstreet@gmail.com>
To: "Linus Torvalds" <torvalds@linux-foundation.org>
Cc: "Davide Libenzi" <davidel@xmailserver.org>,
"Zach Brown" <zach.brown@oracle.com>,
"Ingo Molnar" <mingo@elte.hu>,
"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
linux-aio@kvack.org, "Suparna Bhattacharya" <suparna@in.ibm.com>,
"Benjamin LaHaise" <bcrl@kvack.org>
Subject: Re: [PATCH 2 of 4] Introduce i386 fibril scheduling
Date: Tue, 6 Feb 2007 13:45:50 -0900
Message-ID: <6f703f960702061445q23dd9d48q7afec75d2400ef62@mail.gmail.com>
In-Reply-To: <Pine.LNX.4.64.0702061238300.8424@woody.linux-foundation.org>

On 2/6/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Mon, 5 Feb 2007, Kent Overstreet wrote:
> >
> > struct asys_ret {
> >     int ret;
> >     struct thread *p;
> > };
> >
> > struct asys_ret r;
> > r.p = me;
> >
> > async_read(fd, buf, nbytes, &r);
>
> That's horrible. It means that "r" cannot have automatic linkage (since
> the stack will be *gone* by the time we need to fill in "ret"), so now you
> need to track *two* pointers: "me" and "&r".

You'd only allocate r on the stack if that stack is going to be around
later, i.e. if you're using user threads. Otherwise, you just allocate
it in some struct containing your aiocb or whatever.
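
A minimal sketch of the second case, with the result slot embedded in
a heap-allocated per-request struct. Only struct asys_ret comes from
the message above; struct request, request_new(), request_of() and
async_read_stub() are hypothetical names for illustration, and the stub
fills in a fake result immediately instead of completing asynchronously.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct thread;                  /* opaque user-thread handle (hypothetical) */

struct asys_ret {
    int ret;                    /* filled in when the call completes */
    struct thread *p;           /* who to wake up */
};

/* Hypothetical per-request state for an app without user threads.
 * The result slot lives here, not on a stack that may be gone later. */
struct request {
    int fd;
    char buf[64];
    struct asys_ret r;
};

/* Stand-in for the proposed async_read(); it only records a fake result. */
static void async_read_stub(int fd, void *buf, size_t nbytes,
                            struct asys_ret *r)
{
    (void)fd;
    (void)buf;
    r->ret = (int)nbytes;       /* pretend the whole read succeeded */
}

/* Allocate a request whose result slot outlives the submitting function. */
static struct request *request_new(int fd, struct thread *me)
{
    struct request *req;

    assert(fd >= 0);
    req = calloc(1, sizeof(*req));
    if (!req)
        return NULL;
    req->fd = fd;
    req->r.p = me;
    async_read_stub(fd, req->buf, sizeof(req->buf), &req->r);
    return req;
}

/* Recover the enclosing request from a completed result pointer. */
static struct request *request_of(struct asys_ret *r)
{
    return (struct request *)((char *)r - offsetof(struct request, r));
}
```

With this layout the completion path only needs the one pointer it was
handed, since request_of() gets back to everything else. That is an
assumption about how an application might structure it, not something
the proposed interface requires.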
> And for user space, it means that we pass the _one_ thing around that we
> need for both identifying the async operation to the kernel (the "cookie")
> for wait or cancel, and the place where we expect the return value to be
> found (which in turn can _easily_ represent a whole "struct aiocb *",
> since the return value obviously has to be embedded in there anyway).
>
> Linus

The "struct aiocb" isn't something you have to or necessarily want to
keep around. It's the way the current aio interface works (which I've
coded to), but I don't really see the point. All it really contains is
the syscall arguments, but once the syscall's in progress there's no
reason the kernel has to refer back to it; similarly for userspace,
it's just another struct that userspace has to keep track of and free
at some later time.

In fact, dropping it once the call is submitted is the only sane way
to have a ring for submitted system calls, as otherwise elements of
the ring get freed in essentially random order.
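
To illustrate the submission-ring point, here is a rough sketch of a
ring whose entries are copied in and out by value, so a slot is free
again as soon as it has been consumed, whatever order the calls later
complete in. The names struct asys_call, struct sub_ring, ring_push()
and ring_pop() are hypothetical, and the sketch is single-threaded with
no memory barriers, so it shows the data layout only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8             /* power of two, so masking works */

/* Hypothetical submission entry: the syscall number, its arguments
 * and a cookie to hand back on completion. */
struct asys_call {
    long nr;
    long args[6];
    void *cookie;
};

struct sub_ring {
    struct asys_call slot[RING_SIZE];
    unsigned head;              /* next slot the consumer reads */
    unsigned tail;              /* next slot the producer writes */
};

/* Producer side: copy the call into the ring by value. */
static bool ring_push(struct sub_ring *r, const struct asys_call *c)
{
    assert(r->tail - r->head <= RING_SIZE);
    if (r->tail - r->head == RING_SIZE)
        return false;           /* ring full */
    r->slot[r->tail++ & (RING_SIZE - 1)] = *c;
    return true;
}

/* Consumer side (the kernel, in a real design): copy the entry out.
 * After this the slot is reusable, no matter when the call completes. */
static bool ring_pop(struct sub_ring *r, struct asys_call *out)
{
    if (r->head == r->tail)
        return false;           /* ring empty */
    *out = r->slot[r->head++ & (RING_SIZE - 1)];
    return true;
}
```

Because slots are always released in FIFO order, the producer never has
to track which individual entries are still in flight.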

I don't see the point in having a ring for completed events, since a
completion is at most two pointers; that's quite a bit less data being
sent back than for submissions.

-----

The trouble with differentiating between calls that block and calls
that don't is that you completely lose the ability to batch syscalls
together, which is potentially a major win of an asynchronous
interface.

An app can have a bunch of cheap, fast user space threads servicing
whatever; as they run, they can push their system calls onto a global
stack. When no more threads can run, the app does one giant
asys_submit (something similar to io_submit), then the io_getevents
equivalent, and runs the user threads whose syscalls completed.
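
The batching loop described above could look roughly like the
following. This is a sketch under stated assumptions: struct uthread,
queue_call(), submit_and_reap() and flush_batch() are hypothetical, and
submit_and_reap() is a userspace stand-in that pretends every queued
call completed at once and echoes its argument back as the result. No
real asys_submit or io_getevents call is made.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 16

/* Hypothetical user-thread descriptor. */
struct uthread {
    int runnable;               /* 1 if it can be scheduled again */
    int result;                 /* return value of its last syscall */
};

/* One queued call plus the thread waiting on it. */
struct queued_call {
    long nr;
    long arg;
    struct uthread *waiter;
};

struct completion {
    struct uthread *waiter;
    int ret;
};

static struct queued_call pending[MAX_PENDING];
static size_t npending;

/* Called by a user thread instead of trapping into the kernel. */
static int queue_call(struct uthread *t, long nr, long arg)
{
    assert(t != NULL);
    if (npending == MAX_PENDING)
        return -1;
    pending[npending++] = (struct queued_call){ nr, arg, t };
    t->runnable = 0;            /* blocked until the batch completes */
    return 0;
}

/* Stand-in for the submit call plus the getevents equivalent: takes
 * the whole batch in one go and reports every call as done. */
static size_t submit_and_reap(struct completion *done)
{
    size_t n = npending;

    for (size_t i = 0; i < n; i++)
        done[i] = (struct completion){ pending[i].waiter,
                                       (int)pending[i].arg };
    npending = 0;
    return n;
}

/* Scheduler step for when no user thread is runnable: one submit for
 * everything queued, then wake each thread whose call completed. */
static size_t flush_batch(void)
{
    struct completion done[MAX_PENDING];
    size_t n = submit_and_reap(done);

    for (size_t i = 0; i < n; i++) {
        done[i].waiter->result = done[i].ret;
        done[i].waiter->runnable = 1;
    }
    return n;
}
```

The point of the shape is that the number of kernel entries scales with
the number of scheduler idle points, not with the number of syscalls.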

This doesn't mean you can't synchronously run the syscalls that
wouldn't block, or that you have to allocate a fibril for every
syscall - but for servers that care more about throughput than
latency, this is potentially a big win, in cache effects if nothing
else.

(And this doesn't prevent you from having a different syscall that
submits an asynchronous syscall, but runs it right away if it can do
so without blocking.)