All of lore.kernel.org
 help / color / mirror / Atom feed
From: Suparna Bhattacharya <suparna@in.ibm.com>
To: Zach Brown <zach.brown@oracle.com>
Cc: Andi Kleen <ak@suse.de>,
	linux-kernel@vger.kernel.org, linux-aio@kvack.org,
	Benjamin LaHaise <bcrl@kvack.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH 4 of 4] Introduce aio system call submission and completion system calls
Date: Thu, 1 Feb 2007 16:43:07 +0530	[thread overview]
Message-ID: <20070201111307.GA24723@in.ibm.com> (raw)
In-Reply-To: <FA06EB13-4BF6-4E3A-B223-6494079F541F@oracle.com>

On Wed, Jan 31, 2007 at 11:23:39AM -0800, Zach Brown wrote:
> On Jan 31, 2007, at 9:21 AM, Andi Kleen wrote:
> 
> >On Wednesday 31 January 2007 18:15, Zach Brown wrote:
> >>
> >>On Jan 31, 2007, at 12:58 AM, Andi Kleen wrote:
> >>
> >>>Do you have any numbers how this compares cycle wise to just doing
> >>>clone+syscall+exit in user space?
> >>
> >>Not yet, no.  Release early, release often, and all that.  I'll throw
> >>something together.
> >
> >So what was the motivation for doing this then?
> 
> Most fundamentally?  Providing AIO system call functionality at a  
> much lower maintenance cost.  The hope is that the cost of adopting  
> these fibril things will be lower than the cost of having to touch a  
> code path that wants AIO support.
> 
> I simply don't believe that it's cheap to update code paths to  
> support non-blocking state machines.  As just one example of a  
> looming cost, consider the retry-based buffered fs AIO patches that  
> exist today.  Their requirement to maintain these precisely balanced  
> code paths that know to only return -EIOCBRETRY once they're at a  
> point where retries won't access current-> seems.. unsustainable to  
> me.  This stems from the retries being handled off in the aio kernel  
> threads which have their own task_struct.  fs/aio.c goes to the  
> trouble of migrating ->mm from the submitting task_struct, but  
> nothing else.  Continually adjusting this finely balanced  
> relationship between paths that return -EIOCBRETY and the fields of  
> task_struct that fs/aio.c knows to share with the submitting context  
> seems unacceptably fragile.

Wooo ...hold on ... I think this is swinging out of perspective :)

I have said some of this before, but let me try again.

As you already discovered when going down the fibril path, there are
two kinds of accesses to current-> state, (1) common state
for a given call chain (e.g. journal info etc), and (2) for 
various validations against the caller's process (uid, ulimit etc). 

(1) is not an issue when it comes to execution in background threads
(the VFS already uses background writeback for example).

As for (2), such checks need to happen upfront at the time of IO submission,
so again are not an issue.

This is aside from access to the caller's address space, a familiar
concept which the AIO threads use. If there is any state that
relates to address space access, then it belongs in the ->mm struct, rather
than in current (and we should fix that if we find anything which isn't
already there).

I don't see any other reason why IO paths should be assuming that they are
running in the original caller's context, midway through doing the IO. If
that were the case background writeouts and readaheads could be fragile as
well (or ptrace). The reason it isn't is because of this conceptual division of
responsibility.

Sure, having explicit separation of submission checks as an interface
would likely make this clearer and cleaner, but I'm just
pointing out that usage of current-> state isn't and shouldn't be arbitrary
when it comes to filesystem IO paths. We should be concerned in any case
if that starts happening.

Of course, this is fundamentally connected to the way filesystem IO is
designed to work, and may not necessarily apply to all syscal

When you want to make any and every syscall asynchronous, then indeed
the challenge is magnified and that is where it could get scary. But that
isn't the problem the current AIO code is trying to tackle.

> 
> Even with those buffered IO patches we still only get non-blocking  
> behaviour at a few specific blocking points in the buffered IO path.   
> It's nothing like the guarantee of non-blocking submission returns  
> that the fibril-based submission guarantees.

This one is a better reason, and why I have thought of fibrils (or the
equivalent alternative of enhancing kernel theads to become even lighter)
as an interesting fallback option to implement AIO for cases which we
haven't (maybe some of which are too complicated) gotton around to
supporting natively. Especially if it could be coupled with some clever
tricks to keep stack space to be minimal (I have wondered whether any of
the ideas from similar user-level efforts like Cappricio, or laio would help).

> 
> >  It's only point
> >is to have smaller startup costs for AIO than clone+fork without
> >fixing the VFS code to be a state machine, right?
> 
> Smaller startup costs and fewer behavioural differences.  Did that  
> message to Nick about ioprio and io_context resonate with you at all?
> 
> >I'm personally unclear if it's really less work to teach a lot of
> >code in the kernel about a new thread abstraction than changing VFS.
> 
> Why are we limiting the scope of moving to a state machine just to  
> the VFS?  If you look no further than some hypothetical AIO iscsi/aoe/ 
> nbd/whatever target you obviously include networking.  Probably splice 
> () if you're aggressive :).
> 
> Let's be clear.  I would be thrilled if AIO was implemented by native  
> non-blocking handler implementations.  I don't think it will happen.   
> Not because we don't think it sounds great on paper, but because it's  
> a hugely complex endeavor that would take development and maintenance  
> effort away from the task of keeping basic functionality working.
> 
> So the hope with fibrils is that we lower the barrier to getting AIO  
> syscall support across the board at an acceptable cost.
> 
> It doesn't *stop* us from migrating very important paths (storage,  
> networking) to wildly optimized AIO implementations.  But it also  
> doesn't force AIO support to wait for that.
> 
> >Your patches don't look that complicated yet but you openly
> >admitted you waved away many of the more tricky issues (like
> >signals etc.) and I bet there are yet-unknown side effects
> >of this too that will need more changes.
> 
> To quibble, "waved away" implies that they've been dismissed.  That's  
> not right.  It's a work in progress, so yes, there will be more  
> fiddly details discovered and addressed over time.  The hope is that  
> when it's said and done it'll still be worth merging.  If at some  
> point it gets to be too much, well, at least we'll have this work to  
> reference as a decisive attempt.
> 
> >I'm not sure the fibrils thing will be that much faster than
> >a possibly somewhat fast pathed for this case clone+syscall+exit.
> 
> I'll try and get some numbers for you sooner rather than later.
> 
> Thanks for being diligent, this is exactly the kind of hard look I  
> want this work to get.

BTW, I like the way you are approaching this with a cautiously
critical eye cognizant of lurking details/issues, despite the obvious
(and justified) excitement/eureka feeling.  AIO _is_ hard !

Regards
Suparna

> 
> - z
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-aio' in
> the body to majordomo@kvack.org.  For more info on Linux AIO,
> see: http://www.kvack.org/aio/
> Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India


  reply	other threads:[~2007-02-01 11:07 UTC|newest]

Thread overview: 151+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-01-30 20:39 [PATCH 0 of 4] Generic AIO by scheduling stacks Zach Brown
2007-01-30 20:39 ` [PATCH 1 of 4] Introduce per_call_chain() Zach Brown
2007-01-30 20:39 ` [PATCH 2 of 4] Introduce i386 fibril scheduling Zach Brown
2007-02-01  8:36   ` Ingo Molnar
2007-02-01 13:02     ` Ingo Molnar
2007-02-01 13:19       ` Christoph Hellwig
2007-02-01 13:52         ` Ingo Molnar
2007-02-01 17:13           ` Mark Lord
2007-02-01 18:02             ` Ingo Molnar
2007-02-02 13:23         ` Andi Kleen
2007-02-01 21:52       ` Zach Brown
2007-02-01 22:23         ` Benjamin LaHaise
2007-02-01 22:37           ` Zach Brown
2007-02-02 13:22       ` Andi Kleen
2007-02-01 20:07     ` Linus Torvalds
2007-02-02 10:49       ` Ingo Molnar
2007-02-02 15:56         ` Linus Torvalds
2007-02-02 19:59           ` Alan
2007-02-02 20:14             ` Linus Torvalds
2007-02-02 20:58               ` Davide Libenzi
2007-02-02 21:09                 ` Linus Torvalds
2007-02-02 21:30               ` Alan
2007-02-02 21:30                 ` Linus Torvalds
2007-02-02 22:42                   ` Ingo Molnar
2007-02-02 23:01                     ` Linus Torvalds
2007-02-02 23:17                       ` Linus Torvalds
2007-02-03  0:04                         ` Alan
2007-02-03  0:23                         ` bert hubert
2007-02-02 22:48                   ` Alan
2007-02-05 16:44             ` Zach Brown
2007-02-02 22:21           ` Ingo Molnar
2007-02-02 22:49             ` Linus Torvalds
2007-02-02 23:55               ` Ingo Molnar
2007-02-03  0:56                 ` Linus Torvalds
2007-02-03  7:15                   ` Suparna Bhattacharya
2007-02-03  8:23                   ` Ingo Molnar
2007-02-03  9:25                     ` Matt Mackall
2007-02-03 10:03                       ` Ingo Molnar
2007-02-05 17:44                     ` Zach Brown
2007-02-05 19:26                       ` Davide Libenzi
2007-02-05 19:41                         ` Zach Brown
2007-02-05 20:10                           ` Davide Libenzi
2007-02-05 20:21                             ` Zach Brown
2007-02-05 20:42                               ` Linus Torvalds
2007-02-05 20:39                             ` Linus Torvalds
2007-02-05 21:09                               ` Davide Libenzi
2007-02-05 21:31                                 ` Kent Overstreet
2007-02-06 20:25                                   ` Davide Libenzi
2007-02-06 20:46                                   ` Linus Torvalds
2007-02-06 21:16                                     ` David Miller
2007-02-06 21:28                                       ` Linus Torvalds
2007-02-06 21:31                                         ` David Miller
2007-02-06 21:46                                           ` Eric Dumazet
2007-02-06 21:50                                           ` Linus Torvalds
2007-02-06 22:28                                             ` Zach Brown
2007-02-06 22:45                                     ` Kent Overstreet
2007-02-06 23:04                                       ` Linus Torvalds
2007-02-07  1:22                                         ` Kent Overstreet
2007-02-06 23:23                                       ` Davide Libenzi
2007-02-06 23:39                                         ` Joel Becker
2007-02-06 23:56                                           ` Davide Libenzi
2007-02-07  0:06                                             ` Joel Becker
2007-02-07  0:23                                               ` Davide Libenzi
2007-02-07  0:44                                                 ` Joel Becker
2007-02-07  1:15                                                   ` Davide Libenzi
2007-02-07  1:24                                                     ` Kent Overstreet
2007-02-07  1:30                                                     ` Joel Becker
2007-02-07  6:16                                                   ` Michael K. Edwards
2007-02-07  9:17                                                     ` Michael K. Edwards
2007-02-07  9:37                                                       ` Michael K. Edwards
2007-02-06  0:32                                 ` Davide Libenzi
2007-02-05 21:21                               ` Zach Brown
2007-02-02 23:37             ` Davide Libenzi
2007-02-03  0:02               ` Davide Libenzi
2007-02-05 17:12               ` Zach Brown
2007-02-05 18:24                 ` Davide Libenzi
2007-02-05 21:44                   ` David Miller
2007-02-06  0:15                     ` Davide Libenzi
2007-02-05 21:36               ` bert hubert
2007-02-05 21:57                 ` Linus Torvalds
2007-02-05 22:07                   ` bert hubert
2007-02-05 22:15                     ` Zach Brown
2007-02-05 22:34                   ` Davide Libenzi
2007-02-06  0:27                   ` Scot McKinley
2007-02-06  0:48                     ` David Miller
2007-02-06  0:48                     ` Joel Becker
2007-02-05 17:02             ` Zach Brown
2007-02-05 18:52               ` Davide Libenzi
2007-02-05 19:20                 ` Zach Brown
2007-02-05 19:38                   ` Davide Libenzi
2007-02-04  5:12   ` Davide Libenzi
2007-02-05 17:54     ` Zach Brown
2007-01-30 20:39 ` [PATCH 3 of 4] Teach paths to wake a specific void * target instead of a whole task_struct Zach Brown
2007-01-30 20:39 ` [PATCH 4 of 4] Introduce aio system call submission and completion system calls Zach Brown
2007-01-31  8:58   ` Andi Kleen
2007-01-31 17:15     ` Zach Brown
2007-01-31 17:21       ` Andi Kleen
2007-01-31 19:23         ` Zach Brown
2007-02-01 11:13           ` Suparna Bhattacharya [this message]
2007-02-01 19:50             ` Trond Myklebust
2007-02-02  7:19               ` Suparna Bhattacharya
2007-02-02  7:45                 ` Andi Kleen
2007-02-01 22:18             ` Zach Brown
2007-02-02  3:35               ` Suparna Bhattacharya
2007-02-01 20:26   ` bert hubert
2007-02-01 21:29     ` Zach Brown
2007-02-02  7:12       ` bert hubert
2007-02-04  5:12   ` Davide Libenzi
2007-01-30 21:58 ` [PATCH 0 of 4] Generic AIO by scheduling stacks Linus Torvalds
2007-01-30 22:23   ` Linus Torvalds
2007-01-30 22:53     ` Zach Brown
2007-01-30 22:40   ` Zach Brown
2007-01-30 22:53     ` Linus Torvalds
2007-01-30 23:45       ` Zach Brown
2007-01-31  2:07         ` Benjamin Herrenschmidt
2007-01-31  2:04 ` Benjamin Herrenschmidt
2007-01-31  2:46   ` Linus Torvalds
2007-01-31  3:02     ` Linus Torvalds
2007-01-31 10:50       ` Xavier Bestel
2007-01-31 19:28         ` Zach Brown
2007-01-31 17:59       ` Zach Brown
2007-01-31  5:16     ` Benjamin Herrenschmidt
2007-01-31  5:36     ` Nick Piggin
2007-01-31  5:51       ` Nick Piggin
2007-01-31  6:06       ` Linus Torvalds
2007-01-31  8:43         ` Ingo Molnar
2007-01-31 20:13         ` Joel Becker
2007-01-31 18:20       ` Zach Brown
2007-01-31 17:47     ` Zach Brown
2007-01-31 17:38   ` Zach Brown
2007-01-31 17:51     ` Benjamin LaHaise
2007-01-31 19:25       ` Zach Brown
2007-01-31 20:05         ` Benjamin LaHaise
2007-01-31 20:41           ` Zach Brown
2007-02-04  5:13 ` Davide Libenzi
2007-02-04 20:00   ` Davide Libenzi
2007-02-09 22:33 ` Linus Torvalds
2007-02-09 23:11   ` Davide Libenzi
2007-02-09 23:35     ` Linus Torvalds
2007-02-10 18:45       ` Davide Libenzi
2007-02-10 19:01         ` Linus Torvalds
2007-02-10 19:35           ` Linus Torvalds
2007-02-10 20:59           ` Davide Libenzi
2007-02-10  0:04   ` Eric Dumazet
2007-02-10  0:12     ` Linus Torvalds
2007-02-10  0:34       ` Alan
2007-02-10 10:47   ` bert hubert
2007-02-10 18:19     ` Davide Libenzi
2007-02-11  0:56   ` David Miller
2007-02-11  2:49     ` Linus Torvalds
2007-02-14 16:42       ` James Antill

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070201111307.GA24723@in.ibm.com \
    --to=suparna@in.ibm.com \
    --cc=ak@suse.de \
    --cc=bcrl@kvack.org \
    --cc=linux-aio@kvack.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=zach.brown@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.