From: Dave Chinner <david@fromorbit.com>
To: Theodore Ts'o <tytso@mit.edu>, Eric Sandeen <sandeen@redhat.com>,
	Ext4 Developers List <linux-ext4@vger.kernel.org>
Subject: Re: [PATCH 0/5 v2] add extent status tree caching
Date: Mon, 22 Jul 2013 20:02:55 +1000
Message-ID: <20130722100255.GF11674@dastard>
In-Reply-To: <20130722021742.GA24195@gmail.com>

On Mon, Jul 22, 2013 at 10:17:42AM +0800, Zheng Liu wrote:
> On Mon, Jul 22, 2013 at 11:38:31AM +1000, Dave Chinner wrote:
> > On Fri, Jul 19, 2013 at 12:19:30PM -0400, Theodore Ts'o wrote:
> > > On Fri, Jul 19, 2013 at 01:33:09PM +1000, Dave Chinner wrote:
> > > > An ioctl is kinda silly for this. Just use O_NONBLOCK when calling
> > > > open() and do the prefetch right in the open call. The open() can
> > > > block, anyway, and what you are trying to do is non-blocking IO with
> > > > AIO, so it seems like we've already got a sensible, generic
> > > > interface for triggering this sort of prefetch operation.
> > > 
> > > O_NONBLOCK (either set via open or fcntl) is a possibility, since it's
> > > carefully defined to be unspecified for regular files by SUSv3.  It is
> > > quite different from the existing semantics for O_NONBLOCK, though.
> > > Currently, for all file types where O_NONBLOCK is not ignored, open(2)
> > > is guaranteed itself not to block.  If we use O_NONBLOCK for regular
> > > files to mean that any necessary metadata blocks required for AIO to
> > > be "A" will be cached, then it will make open(2) much more likely to
> > > block.  Also, for all file types where O_NONBLOCK is not ignored,
> > > read(2) will not block but instead return -1 and set errno to EAGAIN.
> > > This would also be a change.
> > > 
> > > If we tried to get this new semantics for O_NONBLOCK to be accepted by
> > > the Austin Group for standardization in the future, would they accept
> > > > it, or would they say, "this makes me vomit"?  I have a suspicion
> > > > their reaction might be closer to the latter....
> > > 
> > > If we want a VFS-level API, in my opinion an fadvise() flag would be a
> > > better choice.
> > 
> > Sure. Make it an fadvise() flag - just don't add ioctls for things
> > that are generically useful.
> > 
> > On second thoughts - you're trying to get the extent map read in. We
> > already have an interface for querying extent maps - fiemap.
> > FIEMAP_FLAG_PREFETCH along with the range of the file you want the
> > extent map prefetched for?
> 
> I don't think fiemap is a good interface.  The application uses
> fiemap(2) to retrieve extent mapping. 

fiemap is used to query information about extent maps. What it
returns is entirely dependent on the input parameters that are
passed to it. Indeed, from Documentation/filesystems/fiemap.txt:

"If fm_extent_count is zero, then the fm_extents[] array is ignored
(no extents will be returned), and the fm_mapped_extents count will
hold the number of extents needed in fm_extents[] to hold the file's
current mapping."

Think about that for a minute. What does the filesystem do with such
an fiemap query when the extent map is not cached?  That's right,
*fiemap reads the extent map from disk into the cache* and then
returns the number of extents in the range.

All I have suggested is adding a flag to make this an *explicit
operation* rather than a side effect of a "count extents" query. I
fail to see any justification for a whole new interface when we
already have a perfectly functional one that already provides the
functionality that is required...
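
For illustration, the "count extents" query described above might look
roughly like the following from userspace. This is a minimal sketch, not
code from the patch series: the helper name count_extents() is
hypothetical, and whether the call also leaves the extent map cached in
the kernel depends on the filesystem. It uses only the existing
FS_IOC_FIEMAP ioctl; the FIEMAP_FLAG_PREFETCH flag suggested earlier in
this thread is a proposal and is deliberately not used here.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>      /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>  /* struct fiemap, FIEMAP_MAX_OFFSET */

/*
 * Hypothetical helper: ask the filesystem how many extents map the
 * whole file, without copying any extent records back to userspace.
 * Returns the extent count, or -1 on any error.
 */
long count_extents(const char *path)
{
    struct fiemap fm;
    int fd, rc;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    memset(&fm, 0, sizeof(fm));
    fm.fm_start = 0;                  /* from the start of the file */
    fm.fm_length = FIEMAP_MAX_OFFSET; /* ... to the end of the file */
    fm.fm_flags = 0;
    fm.fm_extent_count = 0;           /* count only: fm_extents[] is ignored */

    rc = ioctl(fd, FS_IOC_FIEMAP, &fm);
    close(fd);
    if (rc < 0)
        return -1;

    /* With fm_extent_count == 0 this holds the number of extents needed. */
    return (long)fm.fm_mapped_extents;
}
```

To answer this, the filesystem has to walk the file's extent map, which
is the side effect the paragraph above relies on. Filesystems that do
not implement fiemap fail the ioctl, and the sketch then returns -1.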

> That means that the app could use
> these mappings in userspace.  But now we want to cache these mappings in
> kernel space.

If the filesystem is not caching the extents read in during fiemap
operations then perhaps you should look into fixing that deficiency.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
