* [ATTEND] [LSF TOPIC] What to do about O_DIRECT?
From: Josef Bacik @ 2013-01-18 22:10 UTC
To: lsf-pc; +Cc: linux-fsdevel

Hello,

I'd like to attend this year's LSF. I've been doing various file system work
for the last 6 years, most of that with my head down in btrfs.

I'd like to talk about what to do about O_DIRECT. Nobody really owns it and
nobody really _wants_ to own it, and we've all been tacking on our own file
system optimizations and workarounds to make the generic stuff work. I'm to
the point now where I'm just going to do all the work ourselves inside of
btrfs since we need to have different waiting rules. So the question is do we
want to just rm -f fs/direct-io.c and let everybody do their own thing, or is
there some way we can tease out the actual generic stuff that everybody is
going to need to do and adapt everybody to use that? And then there's the
question of what are the things we want to do in the generic code, do we want
to just do the get pages thing, do we want to still have stuff to build and
submit the bios? What about how AIO interacts with it? And best of all can we
convince Zach to do all of it for us! Thanks,

Josef

* Re: [ATTEND] [LSF TOPIC] What to do about O_DIRECT?
From: Zach Brown @ 2013-01-18 22:49 UTC
To: Josef Bacik; +Cc: lsf-pc, linux-fsdevel

> about how AIO interacts with it? And best of all can we convince Zach to do all
> of it for us! Thanks,

While I'll concede that you *can*, I look forward to finding out if you
will :).

- z

* Re: [ATTEND] [LSF TOPIC] What to do about O_DIRECT?
From: Theodore Ts'o @ 2013-01-18 23:01 UTC
To: Josef Bacik; +Cc: lsf-pc, linux-fsdevel

.... and can we get rid of this horrible hack where we have this
bastardized use of a struct buffer_head which is allocated on the
stack, which has nothing really to do with a buffer head, and is all
about the fact that no one wants to change the function signature for
get_block_t?

- Ted
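
For readers unfamiliar with the hack Ted refers to, it can be modelled in userspace roughly as below. This is a minimal sketch, not kernel code: every `toy_` name is a hypothetical stand-in, only three buffer_head-like fields are modelled, and the block layout is invented for illustration. The point is that the "buffer head" is just an in/out parameter block living on the caller's stack.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for struct buffer_head: only the fields used as mapping
 * parameters are modelled (assumption: simplified for illustration). */
struct toy_buffer_head {
    uint64_t b_blocknr;   /* out: first disk block of the mapping */
    size_t   b_size;      /* in: bytes wanted; out: bytes mapped  */
    unsigned b_state;     /* out: flag bits such as TOY_BH_MAPPED */
};
#define TOY_BH_MAPPED 0x1u

/* Same general shape as a get_block_t callback; the inode is opaque here. */
typedef int toy_get_block_t(void *inode, uint64_t iblock,
                            struct toy_buffer_head *bh_result, int create);

/* Hypothetical filesystem: file block N lives at disk block 1000 + N, and
 * at most 8 blocks of 4096 bytes are mapped per call. */
static int toy_fs_get_block(void *inode, uint64_t iblock,
                            struct toy_buffer_head *bh, int create)
{
    const size_t max = 8 * 4096;

    (void)inode;
    (void)create;
    bh->b_blocknr = 1000 + iblock;
    if (bh->b_size > max)
        bh->b_size = max;
    bh->b_state |= TOY_BH_MAPPED;
    return 0;
}

/* The pattern under discussion: a buffer head that never describes a real
 * buffer, zeroed on the stack and used purely to pass a mapping back. */
static int toy_map_range(toy_get_block_t *get_block, void *inode,
                         uint64_t iblock, size_t want,
                         uint64_t *disk_block, size_t *mapped)
{
    struct toy_buffer_head map_bh;
    int ret;

    memset(&map_bh, 0, sizeof(map_bh));
    map_bh.b_size = want;

    ret = get_block(inode, iblock, &map_bh, 0);
    if (ret)
        return ret;

    /* The callback may shrink the request but must never grow it. */
    assert(map_bh.b_size <= want);
    *disk_block = map_bh.b_blocknr;
    *mapped = map_bh.b_size;
    return 0;
}
```

A dedicated mapping structure carrying only the block number, length and flags would express the same contract without borrowing the buffer head type; that is the signature change the message says nobody wants to make.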

* Re: [ATTEND] [LSF TOPIC] What to do about O_DIRECT?
From: Dave Chinner @ 2013-01-20 22:35 UTC
To: Theodore Ts'o; +Cc: Josef Bacik, lsf-pc, linux-fsdevel

On Fri, Jan 18, 2013 at 06:01:28PM -0500, Theodore Ts'o wrote:
> .... and can we get rid of this horrible hack where we have this
> bastardized use of a struct buffer_head which is allocated on the
> stack, which has nothing really to do with a buffer head, and is all
> about the fact that no one wants to change the function signature for
> get_block_t?

I have patches to do that, but they are on the back burner right now
because it causes some kind of weird corruption in the pwritev case
that kvm uses to issue IO (i.e. large iovecs of 4k segments).

IMO, the direct IO code is that complex and convoluted now that it is
close to impossible to modify without introducing some weird, subtle
and almost impossible to debug issue. The code has been optimised to
the point of being unmaintainable, and I have seriously considered
just reimplementing the bits XFS needs just for XFS several times in
the past year....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

* Re: [ATTEND] [LSF TOPIC] What to do about O_DIRECT?
From: Josef Bacik @ 2013-01-21 14:35 UTC
To: Dave Chinner; +Cc: Theodore Ts'o, Josef Bacik, lsf-pc, linux-fsdevel

On Sun, Jan 20, 2013 at 03:35:21PM -0700, Dave Chinner wrote:
> On Fri, Jan 18, 2013 at 06:01:28PM -0500, Theodore Ts'o wrote:
> > .... and can we get rid of this horrible hack where we have this
> > bastardized use of a struct buffer_head which is allocated on the
> > stack, which has nothing really to do with a buffer head, and is all
> > about the fact that no one wants to change the function signature for
> > get_block_t?
>
> I have patches to do that, but they are on the back burner right now
> because it causes some kind of weird corruption in the pwritev case
> that kvm uses to issue IO (i.e. large iovecs of 4k segments).
>
> IMO, the direct IO code is that complex and convoluted now that it is
> close to impossible to modify without introducing some weird, subtle
> and almost impossible to debug issue. The code has been optimised to
> the point of being unmaintainable, and I have seriously considered
> just reimplementing the bits XFS needs just for XFS several times in
> the past year....
>

Yeah this is the point that I'm at currently, and it's even worse for Btrfs
since we have to short circuit most of the work that the generic stuff does
already since we build bios ourselves. So maybe it would be best that we trim
the generic stuff down to its most basic functionality and leave it in place
for things like ext2/3, and then for the more advanced file systems we just
all do our own things. If we can all take a good look at what we would need to
replace maybe we can come up with some generic helpers. Thanks,

Josef

* Re: [ATTEND] [LSF TOPIC] What to do about O_DIRECT?
From: Jan Kara @ 2013-01-22 14:03 UTC
To: Josef Bacik; +Cc: lsf-pc, linux-fsdevel

Hello,

On Fri 18-01-13 17:10:07, Josef Bacik wrote:
> I'd like to talk about what to do about O_DIRECT. Nobody really owns it
> and nobody really _wants_ to own it, and we've all been tacking on our
> own file system optimizations and workarounds to make the generic stuff
> work. I'm to the point now where I'm just going to do all the work
> ourselves inside of btrfs since we need to have different waiting rules.
> So the question is do we want to just rm -f fs/direct-io.c and let
> everybody do their own thing,

I don't think we really can. Just grep for its uses. There are like 15
filesystems using it. That would be a huge amount of duplication.

> or is there some way we can tease out the
> actual generic stuff that everybody is going to need to do and adapt
> everybody to use that? And then there's the question of what are the
> things we want to do in the generic code, do we want to just do the get
> pages thing, do we want to still have stuff to build and submit the bios?
> What about how AIO interacts with it?

I'm not sure what issues you are exactly facing but I can understand
blockdev_direct_IO() isn't doing what btrfs would need. And I also agree
with others that the code is rather complex and hard to maintain. E.g. the
get_block_t insanity of using buffer_head has been nagging me for a long time.
The handling of unaligned DIO which all filesystems just serialize (at least
for writes) because it causes data corruption. But these are mostly smaller
gradual improvements.

IMHO the devil is in "show me the code that is flexible enough to work
for most, fast, and simpler than what we have". So I think we can speak
about what btrfs (or xfs or whoever else) would need and how we could
change (or whether it's worth to change) the generic code to accommodate
its needs. Hum?

Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR

* Re: [ATTEND] [LSF TOPIC] What to do about O_DIRECT?
From: Dave Chinner @ 2013-01-30 23:16 UTC
To: Jan Kara; +Cc: Josef Bacik, lsf-pc, linux-fsdevel

On Tue, Jan 22, 2013 at 03:03:37PM +0100, Jan Kara wrote:
> Hello,
>
> On Fri 18-01-13 17:10:07, Josef Bacik wrote:
> > I'd like to talk about what to do about O_DIRECT. Nobody really owns it
> > and nobody really _wants_ to own it, and we've all been tacking on our
> > own file system optimizations and workarounds to make the generic stuff
> > work. I'm to the point now where I'm just going to do all the work
> > ourselves inside of btrfs since we need to have different waiting rules.
> > So the question is do we want to just rm -f fs/direct-io.c and let
> > everybody do their own thing,
> I don't think we really can. Just grep for its uses. There are like 15
> filesystems using it. That would be a huge amount of duplication.
>
> > or is there some way we can tease out the
> > actual generic stuff that everybody is going to need to do and adapt
> > everybody to use that? And then there's the question of what are the
> > things we want to do in the generic code, do we want to just do the get
> > pages thing, do we want to still have stuff to build and submit the bios?
> > What about how AIO interacts with it?
> I'm not sure what issues you are exactly facing but I can understand
> blockdev_direct_IO() isn't doing what btrfs would need. And I also agree
> with others that the code is rather complex and hard to maintain. E.g. the
> get_block_t insanity of using buffer_head has been nagging me for a long time.
> The handling of unaligned DIO which all filesystems just serialize (at least
> for writes) because it causes data corruption. But these are mostly smaller
> gradual improvements.

The problem I find with small gradual improvements is that every
time I try to do one I end up with some weird subtle problem that
I've been unable to debug. It's happened several times in the past
year, and each time I've given up on trying to make gradual
improvements because of this....

That fits my definition of unmaintainable code almost perfectly.

> IMHO the devil is in "show me the code that is flexible enough to work
> for most, fast, and simpler than what we have". So I think we can speak
> about what btrfs (or xfs or whoever else) would need and how we could
> change (or whether it's worth to change) the generic code to accommodate
> its needs. Hum?

I'd say XFS needs very little outside help - AFAICT it still has >90%
of the infrastructure it needs to do direct IO itself....

Cheers,

Dave
--
Dave Chinner
david@fromorbit.com

* Re: [ATTEND] [LSF TOPIC] What to do about O_DIRECT?
From: Jan Kara @ 2013-01-31 22:41 UTC
To: Dave Chinner; +Cc: Jan Kara, Josef Bacik, lsf-pc, linux-fsdevel

On Thu 31-01-13 10:16:00, Dave Chinner wrote:
> On Tue, Jan 22, 2013 at 03:03:37PM +0100, Jan Kara wrote:
> > Hello,
> >
> > On Fri 18-01-13 17:10:07, Josef Bacik wrote:
> > > I'd like to talk about what to do about O_DIRECT. Nobody really owns it
> > > and nobody really _wants_ to own it, and we've all been tacking on our
> > > own file system optimizations and workarounds to make the generic stuff
> > > work. I'm to the point now where I'm just going to do all the work
> > > ourselves inside of btrfs since we need to have different waiting rules.
> > > So the question is do we want to just rm -f fs/direct-io.c and let
> > > everybody do their own thing,
> > I don't think we really can. Just grep for its uses. There are like 15
> > filesystems using it. That would be a huge amount of duplication.
> >
> > > or is there some way we can tease out the
> > > actual generic stuff that everybody is going to need to do and adapt
> > > everybody to use that? And then there's the question of what are the
> > > things we want to do in the generic code, do we want to just do the get
> > > pages thing, do we want to still have stuff to build and submit the bios?
> > > What about how AIO interacts with it?
> > I'm not sure what issues you are exactly facing but I can understand
> > blockdev_direct_IO() isn't doing what btrfs would need. And I also agree
> > with others that the code is rather complex and hard to maintain. E.g. the
> > get_block_t insanity of using buffer_head has been nagging me for a long time.
> > The handling of unaligned DIO which all filesystems just serialize (at least
> > for writes) because it causes data corruption. But these are mostly smaller
> > gradual improvements.
>
> The problem I find with small gradual improvements is that every
> time I try to do one I end up with some weird subtle problem that
> I've been unable to debug. It's happened several times in the past
> year, and each time I've given up on trying to make gradual
> improvements because of this....
>
> That fits my definition of unmaintainable code almost perfectly.
>
> > IMHO the devil is in "show me the code that is flexible enough to work
> > for most, fast, and simpler than what we have". So I think we can speak
> > about what btrfs (or xfs or whoever else) would need and how we could
> > change (or whether it's worth to change) the generic code to accommodate
> > its needs. Hum?
>
> I'd say XFS needs very little outside help - AFAICT it still has >90%
> of the infrastructure it needs to do direct IO itself....

I'm not sure it's really about infrastructure. When I look into
fs/direct-io.c, it is theoretically a trivial thing - just a loop with get
pages, map blocks, submit bio, repeat. But then there are the details of
blocksize < pagesize or even DIO not aligned to blocksize, holes in files,
throttling so that we don't have too many bios in flight... and suddenly
we have the beast it is now. And every filesystem will have to deal with
these special cases so doing it in the generic code looks like a good
thing to me (I can imagine those nasty subtle bugs when each filesystem has
to handle all the cases on its own - and I believe XFS may get it right
pretty quickly but the world isn't just XFS)...

Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
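
The "theoretically trivial" loop described above, plus two of the complications (holes and a cap on bio size), can be sketched as a small userspace model. Nothing here is taken from fs/direct-io.c: the `toy_` names, the fixed 512-byte block size, the file layout and the counters are all assumptions made for illustration, and no real I/O is issued.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BLOCK_SIZE 512u

/* One mapping result: a run of file blocks that is either backed by
 * consecutive disk blocks or is a hole. */
struct toy_extent {
    uint64_t disk_block;  /* valid only when hole == 0 */
    uint64_t nr_blocks;   /* run length, always >= 1 */
    int hole;
};

/* Hypothetical layout: file blocks 0..3 sit at disk block 100 onwards,
 * blocks 4..5 are a hole, everything after that sits at disk block 200+. */
static struct toy_extent toy_map_blocks(uint64_t file_block, uint64_t want)
{
    struct toy_extent e = { 0, 0, 0 };
    uint64_t run_end;

    if (file_block < 4) {
        e.disk_block = 100 + file_block;
        run_end = 4;
    } else if (file_block < 6) {
        e.hole = 1;
        run_end = 6;
    } else {
        e.disk_block = 200 + (file_block - 6);
        run_end = file_block + want;
    }
    e.nr_blocks = run_end - file_block;
    if (e.nr_blocks > want)
        e.nr_blocks = want;
    return e;
}

struct toy_stats {
    unsigned bios;              /* "bios" that would have been submitted */
    unsigned holes;             /* hole runs that a read would zero-fill */
    uint64_t blocks_submitted;  /* total blocks covered by those bios */
};

/* Read-side loop: map a run, split it into bios of bounded size, repeat.
 * Returns 0 on success, -1 for unaligned requests or a zero bio limit. */
static int toy_dio_read(uint64_t offset, uint64_t len,
                        unsigned max_blocks_per_bio, struct toy_stats *st)
{
    uint64_t block, left;

    if (max_blocks_per_bio == 0)
        return -1;
    if (offset % TOY_BLOCK_SIZE || len % TOY_BLOCK_SIZE)
        return -1;  /* this sketch simply rejects unaligned DIO */

    block = offset / TOY_BLOCK_SIZE;
    left = len / TOY_BLOCK_SIZE;
    while (left > 0) {
        struct toy_extent e = toy_map_blocks(block, left);
        uint64_t done = 0;

        assert(e.nr_blocks >= 1 && e.nr_blocks <= left);
        if (e.hole) {
            st->holes++;
        } else {
            while (done < e.nr_blocks) {
                uint64_t n = e.nr_blocks - done;

                if (n > max_blocks_per_bio)
                    n = max_blocks_per_bio;
                st->bios++;
                st->blocks_submitted += n;
                done += n;
            }
        }
        block += e.nr_blocks;
        left -= e.nr_blocks;
    }
    return 0;
}
```

Even this toy already needs three special cases before it touches page pinning, sub-page block sizes or in-flight throttling, which is roughly the growth pattern the message describes.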

* Re: [ATTEND] [LSF TOPIC] What to do about O_DIRECT?
From: Dave Chinner @ 2013-02-05 21:51 UTC
To: Jan Kara; +Cc: Josef Bacik, lsf-pc, linux-fsdevel

On Thu, Jan 31, 2013 at 11:41:09PM +0100, Jan Kara wrote:
> On Thu 31-01-13 10:16:00, Dave Chinner wrote:
> > On Tue, Jan 22, 2013 at 03:03:37PM +0100, Jan Kara wrote:
> > > Hello,
> > >
> > > On Fri 18-01-13 17:10:07, Josef Bacik wrote:
> > > > I'd like to talk about what to do about O_DIRECT. Nobody really owns it
> > > > and nobody really _wants_ to own it, and we've all been tacking on our
> > > > own file system optimizations and workarounds to make the generic stuff
> > > > work. I'm to the point now where I'm just going to do all the work
> > > > ourselves inside of btrfs since we need to have different waiting rules.
> > > > So the question is do we want to just rm -f fs/direct-io.c and let
> > > > everybody do their own thing,
> > > I don't think we really can. Just grep for its uses. There are like 15
> > > filesystems using it. That would be a huge amount of duplication.
> > >
> > > > or is there some way we can tease out the
> > > > actual generic stuff that everybody is going to need to do and adapt
> > > > everybody to use that? And then there's the question of what are the
> > > > things we want to do in the generic code, do we want to just do the get
> > > > pages thing, do we want to still have stuff to build and submit the bios?
> > > > What about how AIO interacts with it?
> > > I'm not sure what issues you are exactly facing but I can understand
> > > blockdev_direct_IO() isn't doing what btrfs would need. And I also agree
> > > with others that the code is rather complex and hard to maintain. E.g. the
> > > get_block_t insanity of using buffer_head has been nagging me for a long time.
> > > The handling of unaligned DIO which all filesystems just serialize (at least
> > > for writes) because it causes data corruption. But these are mostly smaller
> > > gradual improvements.
> >
> > The problem I find with small gradual improvements is that every
> > time I try to do one I end up with some weird subtle problem that
> > I've been unable to debug. It's happened several times in the past
> > year, and each time I've given up on trying to make gradual
> > improvements because of this....
> >
> > That fits my definition of unmaintainable code almost perfectly.
> >
> > > IMHO the devil is in "show me the code that is flexible enough to work
> > > for most, fast, and simpler than what we have". So I think we can speak
> > > about what btrfs (or xfs or whoever else) would need and how we could
> > > change (or whether it's worth to change) the generic code to accommodate
> > > its needs. Hum?
> >
> > I'd say XFS needs very little outside help - AFAICT it still has >90%
> > of the infrastructure it needs to do direct IO itself....
> I'm not sure it's really about infrastructure. When I look into
> fs/direct-io.c, it is theoretically a trivial thing - just a loop with get
> pages, map blocks, submit bio, repeat. But then there are the details of
> blocksize < pagesize or even DIO not aligned to blocksize, holes in files,
> throttling so that we don't have too many bios in flight...

Sure, that's relatively simple, but once you optimise it repeatedly
to the point where the order of single instructions is important
even simple code becomes a tangled, ugly mess.

> and suddenly
> we have the beast it is now. And every filesystem will have to deal with
> these special cases so doing it in the generic code looks like a good
> thing to me (I can imagine those nasty subtle bugs when each filesystem has
> to handle all the cases on its own - and I believe XFS may get it right
> pretty quickly but the world isn't just XFS)...

The advantage of using shared code is that it eases the burden of
maintenance and enhancement on individual filesystems. Both Josef
and I are putting forward the argument that the shared direct IO
code provides neither of those advantages any more due to current
complexity and fragility that has resulted from the monolithic
"everything for everyone" approach we currently have.

What I'm trying to say is that maybe there's a better way of
providing generic direct IO support. Perhaps we are better served by
having smaller generic helpers similar to the buffered IO path to
allow filesystems to do the simple stuff as optimally as possible
without all the overhead they don't need. One-size-fits-all has
never worked in the filesystems game, yet we seem to be stuck on
that approach here even when it appears to be collapsing under its
own weight.... :/

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

* Re: [ATTEND] [LSF TOPIC] What to do about O_DIRECT?
From: Joel Becker @ 2013-02-06 0:40 UTC
To: Dave Chinner; +Cc: Jan Kara, Josef Bacik, lsf-pc, linux-fsdevel

On Wed, Feb 06, 2013 at 08:51:12AM +1100, Dave Chinner wrote:
> The advantage of using shared code is that it eases the burden of
> maintenance and enhancement on individual filesystems. Both Josef
> and I are putting forward the argument that the shared direct IO
> code provides neither of those advantages any more due to current
> complexity and fragility that has resulted from the monolithic
> "everything for everyone" approach we currently have.
>
> What I'm trying to say is that maybe there's a better way of
> providing generic direct IO support. Perhaps we are better served by
> having smaller generic helpers similar to the buffered IO path to
> allow filesystems to do the simple stuff as optimally as possible
> without all the overhead they don't need. One-size-fits-all has
> never worked in the filesystems game, yet we seem to be stuck on
> that approach here even when it appears to be collapsing under its
> own weight.... :/

I vote for trying the helper approach. I think dropping generic
code altogether would be a disaster. The corner cases of O_DIRECT are
legion; everyone has behavioral assumptions based on historical
implementations, etc. Remember how badly some high-performance software
handles O_DIRECT alignments larger than 512B.

We have a long history of successfully inverting the generic
code with helpers and (if necessary, I'm not saying it is) operations
structures.

I don't think I'd try to shove down generic_aio_read/write. Let
them handle the check for O_DIRECT and the fallback to buffered I/O.

Joel

--
"And yet I fight,
 And yet I fight this battle all alone.
 No one to cry to;
 No place to call home."

http://www.jlbec.org/
jlbec@evilplan.org
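
The division of labour suggested here (generic code checks for O_DIRECT and handles the buffered fallback, the filesystem only does the direct part) can be modelled as follows. This is a hedged userspace sketch, not the kernel's generic read/write path: the `toy_` names, the hook signature and the 8192-byte cut-off in the example filesystem are all invented, and whether misaligned requests are rejected or handled some other way is an assumption of this model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-filesystem hook: attempts direct I/O and returns the
 * number of bytes it completed (possibly fewer than asked), or a negative
 * value on error. */
typedef long toy_direct_write_fn(uint64_t offset, size_t len);

struct toy_write_result {
    size_t direct_bytes;    /* bytes the filesystem wrote directly */
    size_t buffered_bytes;  /* bytes left for the buffered path */
};

/* Buffer address, file offset and length must all be multiples of align,
 * which must be a power of two (512 and 4096 are the usual candidates). */
static int toy_is_aligned(uint64_t buf_addr, uint64_t offset, size_t len,
                          size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((buf_addr | offset | (uint64_t)len) & (align - 1)) == 0;
}

/* Generic entry point: owns the O_DIRECT check and the buffered fallback,
 * so the filesystem hook never has to know about either. */
static int toy_generic_write(int o_direct, toy_direct_write_fn *direct,
                             uint64_t buf_addr, uint64_t offset, size_t len,
                             size_t align, struct toy_write_result *res)
{
    long done;

    res->direct_bytes = 0;
    res->buffered_bytes = 0;

    if (!o_direct) {
        res->buffered_bytes = len;
        return 0;
    }
    if (!toy_is_aligned(buf_addr, offset, len, align))
        return -1;  /* assumption: this model rejects misaligned DIO */

    done = direct(offset, len);
    if (done < 0)
        return -1;
    assert((size_t)done <= len);

    res->direct_bytes = (size_t)done;
    res->buffered_bytes = len - (size_t)done;  /* remainder falls back */
    return 0;
}

/* Example hook: pretends direct I/O is only possible below byte 8192,
 * e.g. because the rest of the file is unallocated. */
static long toy_fs_direct_write(uint64_t offset, size_t len)
{
    const uint64_t limit = 8192;
    uint64_t room;

    if (offset >= limit)
        return 0;
    room = limit - offset;
    return (long)(len < room ? len : room);
}
```

The alignment parameter is passed in rather than hard-coded because, as the message notes, software that assumes 512 bytes tends to break when the required alignment is larger.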

* Re: [ATTEND] [LSF TOPIC] What to do about O_DIRECT?
From: Kent Overstreet @ 2013-02-06 4:32 UTC
To: Dave Chinner, Jan Kara, Josef Bacik, lsf-pc, linux-fsdevel

On Tue, Feb 5, 2013 at 4:40 PM, Joel Becker <jlbec@evilplan.org> wrote:
> I vote for trying the helper approach. I think dropping generic
> code altogether would be a disaster. The corner cases of O_DIRECT are
> legion; everyone has behavioral assumptions based on historical
> implementations, etc. Remember how badly some high-performance software
> handles O_DIRECT alignments larger than 512B.
> We have a long history of successfully inverting the generic
> code with helpers and (if necessary, I'm not saying it is) operations
> structures.
> I don't think I'd try to shove down generic_aio_read/write. Let
> them handle the check for O_DIRECT and the fallback to buffered I/O.

Tackling the dio code has been high on my list, if I ever get time -
in the profiles I've been doing, reading from a raw block device, the
dio code is the biggest single source of overhead by a good margin and
there's plenty of obvious inefficiency.

The helper approach was more or less what I had in mind - also,
splitting out the block device path to start with, and once
that's cleaned up using that to guide how the helpers and such should
be structured.

The other thing that really frustrates me about the dio code is the
perverse flow of control between the generic code and the fs code, via
the getblk callbacks. IMO, it'd be much cleaner to aim for something
more like submitting a bio to the block layer - just set up a bio,
point it at inode:offset and pass it off to the fs code.

If I can finish off my immutable bvec/efficient bio splitting code and
get it in it should make that approach work out nicely. But currently
my brain is stuffed full of other stuff :/
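
The inverted flow of control proposed above (hand the filesystem one file-relative request and let it do the splitting, instead of the generic code calling back for each mapping) might look roughly like this in a userspace model. None of this is the actual bio API: the `toy_` structures, the fixed four-sector extents and the sector placement formula are hypothetical and chosen only so the splitting is easy to follow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SECTOR        512u
#define TOY_EXTENT_BYTES  (4u * TOY_SECTOR)  /* hypothetical fixed extents */

/* File-relative request, the analogue of "a bio pointed at inode:offset". */
struct toy_file_bio {
    uint64_t inode;   /* which file */
    uint64_t offset;  /* byte offset within the file */
    uint64_t len;     /* length in bytes */
};

/* Device-relative request that the filesystem emits after mapping. */
struct toy_dev_bio {
    uint64_t sector;
    uint64_t nr_sectors;
};

/* Hypothetical filesystem entry point: receives the whole request once and
 * splits it along its own extent boundaries. Returns the number of device
 * bios written to out, or -1 if the request is unaligned or out is full. */
static int toy_fs_submit(const struct toy_file_bio *fb,
                         struct toy_dev_bio *out, size_t out_cap)
{
    uint64_t pos = fb->offset;
    uint64_t end = fb->offset + fb->len;
    size_t n = 0;

    if (fb->offset % TOY_SECTOR || fb->len % TOY_SECTOR)
        return -1;

    while (pos < end) {
        uint64_t extent = pos / TOY_EXTENT_BYTES;
        uint64_t extent_end = (extent + 1) * TOY_EXTENT_BYTES;
        uint64_t chunk_end = end < extent_end ? end : extent_end;

        if (n == out_cap)
            return -1;

        /* Toy placement: extent k of inode i starts at sector i*1000 + k*16. */
        out[n].sector = fb->inode * 1000 + extent * 16 +
                        (pos % TOY_EXTENT_BYTES) / TOY_SECTOR;
        out[n].nr_sectors = (chunk_end - pos) / TOY_SECTOR;
        n++;
        pos = chunk_end;
    }
    return (int)n;
}
```

In this shape the generic layer never sees a mapping callback at all; it only builds the file-relative request and, in a fuller version, would collect completions for the device bios.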

* Re: [ATTEND] [LSF TOPIC] What to do about O_DIRECT?
From: Jan Kara @ 2013-02-06 17:36 UTC
To: Dave Chinner; +Cc: Jan Kara, Josef Bacik, lsf-pc, linux-fsdevel

On Wed 06-02-13 08:51:12, Dave Chinner wrote:
> The advantage of using shared code is that it eases the burden of
> maintenance and enhancement on individual filesystems. Both Josef
> and I are putting forward the argument that the shared direct IO
> code provides neither of those advantages any more due to current
> complexity and fragility that has resulted from the monolithic
> "everything for everyone" approach we currently have.
>
> What I'm trying to say is that maybe there's a better way of
> providing generic direct IO support. Perhaps we are better served by
> having smaller generic helpers similar to the buffered IO path to
> allow filesystems to do the simple stuff as optimally as possible
> without all the overhead they don't need. One-size-fits-all has
> never worked in the filesystems game, yet we seem to be stuck on
> that approach here even when it appears to be collapsing under its
> own weight.... :/

Yeah, the approach of providing smaller generic helpers could make the
code more readable so I guess it's worth a try.

Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR