From: Davide Libenzi <davidel@xmailserver.org>
To: bert hubert <bert.hubert@netherlabs.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Zach Brown <zach.brown@oracle.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
linux-aio@kvack.org, Suparna Bhattacharya <suparna@in.ibm.com>,
Benjamin LaHaise <bcrl@kvack.org>, Ingo Molnar <mingo@elte.hu>
Subject: Re: [PATCH 0 of 4] Generic AIO by scheduling stacks
Date: Sat, 10 Feb 2007 10:19:15 -0800 (PST)
Message-ID: <Pine.LNX.4.64.0702100921180.11630@alien.or.mcafeemobile.com>
In-Reply-To: <20070210104712.GA20878@outpost.ds9a.nl>
On Sat, 10 Feb 2007, bert hubert wrote:
> On Fri, Feb 09, 2007 at 02:33:01PM -0800, Linus Torvalds wrote:
>
> > - IF the system call blocks, we call the architecture-specific
> > "schedule_async()" function before we even get any scheduler locks, and
> > it can just do a fork() at that time, and let the *child* return to the
> > original user space. The process that already started doing the system
> > call will just continue to do the system call.
>
> Ah - cool. The average time we have to wait is probably far greater than the
> fork overhead, microseconds versus milliseconds.
>
> However, and there probably is a good reason for this, why isn't it possible
> to do it the other way around, and have the *child* do the work and the
> original return to userspace?
If the parent is going to schedule(), someone above has already dropped
the parent's task_struct inside a wait queue, so the *parent* will be the
wakeup target [1].
Linus' take on generic AIO is a neat one, but IMO continuous fork/exits
are going to be expensive. Even if the task is going to sleep, that does
not mean that the parent (well, in Linus' case, the child actually) does
not have more stuff to feed to async(). IMO the frequency of AIO
submission and retrieval can get pretty high (and hence the frequency of
fork/exit), and there may be a price to pay for it in the end.
IMO one solution, following the non-fibril way, may be:
- Keep a pool of per-process threads (a per-process pool already has stuff
  like "files" correctly set up, just for example - no need to teach
  everywhere around the kernel about the "async" special case)
- When a schedule happens on the submission thread, we grab a thread
  (task_struct really) from the available pool
- We set up the return IP of the submission (now going to sleep) thread to
  an async_complete (or whatever name) stub. This will drop a result in a
  queue, and wake the async_wait (or whatever name) wait queue head
- We may want to swap at least the PID (signals, ...?) between the two, so
  even if we're re-emerging with a new task_struct, the TID will be the same
- We make the "returning" thread come back to userspace through some
  special helper ala ret_from_fork (ret_from_async?)
- We also want to keep a record (hash?) of userspace cookies and the
  threads currently servicing them, so that we can implement cancel
  (send signal)
Open issues:
- What if the pool becomes empty because all threads are stuck under
  schedule?
  o Grow the pool (and delay-shrink at quieter times)?
o Make the caller really sleep?
o Fall back in queue-request mode?
- Look at the Devil hiding in the details and showing up many times during
the process
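The grow/delay-shrink option can be sketched with plain counters - no real
threads, and the half-idle shrink threshold and the hard cap are made-up
numbers, purely to illustrate the policy:

```c
#include <assert.h>

/* Counter-only model of the service-thread pool. */
struct pool {
	int idle;	/* threads parked, ready to take a submission */
	int total;	/* threads created so far */
	int max;	/* hard cap before we fall back to queueing */
};

/* Returns 1 if a service thread was obtained (growing the pool when it
 * is empty), 0 if the pool is capped and the caller must fall back to
 * queue-request mode (or really sleep). */
static int pool_get(struct pool *p)
{
	if (p->idle == 0) {
		if (p->total >= p->max)
			return 0;	/* capped: queue-request fallback */
		p->total++;		/* grow: spawn one more thread */
		p->idle++;
	}
	p->idle--;
	return 1;
}

/* Release a service thread; delay-shrink only when the pool looks
 * quiet, i.e. more than half of it is sitting idle. */
static void pool_put(struct pool *p)
{
	p->idle++;
	if (p->idle > p->total / 2 && p->total > 1) {
		p->idle--;
		p->total--;	/* retire one thread */
	}
}
```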
Yup, I can see Zach having a lot of fun with it ;)
[1] Well, you could add a list_head to the task_struct, and teach the
add-to-waitqueue to drop a reference to all the wait queue entries
hosting the task_struct. Then walk&fix (likely only one entry) when
you swap the submission thread context (thread_info, per_call stuff, ...)
over a service thread task_struct. At that point you can re-emerge
with the same task_struct. Pretty nasty though.
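The walk&fix in [1] amounts to giving each task a back-reference list of
the wait queue entries hosting it, and repointing them on the swap. A
userspace sketch with purely illustrative structures (not the kernel's
wait queue types):

```c
#include <assert.h>
#include <stddef.h>

struct task;

/* A wait queue entry that remembers which task a wakeup would hit. */
struct wait_entry {
	struct task *owner;
	struct wait_entry *next;	/* link in owner->entries */
};

/* The list_head added to the task_struct in [1], modeled as a
 * singly-linked list of hosting entries. */
struct task {
	struct wait_entry *entries;
};

/* add-to-waitqueue, taught to drop a back-reference to the entry. */
static void wq_add(struct task *t, struct wait_entry *e)
{
	e->owner = t;
	e->next = t->entries;
	t->entries = e;
}

/* walk&fix: on swapping the submission context over a service thread's
 * task_struct, repoint every hosting entry (likely just one) so wakeups
 * target the task we re-emerge with. */
static void task_swap_fix(struct task *from, struct task *to)
{
	struct wait_entry *e;

	for (e = from->entries; e; e = e->next)
		e->owner = to;
	to->entries = from->entries;
	from->entries = NULL;
}
```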
- Davide