From: Zheng Liu <gnehzuil.liu@gmail.com>
To: Eric Sandeen <sandeen@redhat.com>
Cc: Theodore Ts'o <tytso@mit.edu>,
	Ext4 Developers List <linux-ext4@vger.kernel.org>
Subject: Re: [PATCH 0/5 v2] add extent status tree caching
Date: Fri, 19 Jul 2013 07:54:51 +0800
Message-ID: <20130718235451.GA29997@gmail.com>
In-Reply-To: <51E8356C.9030603@redhat.com>

Hi Eric,

On Thu, Jul 18, 2013 at 01:35:24PM -0500, Eric Sandeen wrote:
> On 7/16/13 10:17 AM, Theodore Ts'o wrote:
> > In addition to fixing a few bugs and addressing review comments, we now
> > add a new ioctl, EXT4_IOC_PRECACHE_EXTENTS, which forces all of the
> > extents in an inode to be cached in the extents status tree, and marks
> > them to be preferentially protected when under memory pressure.  
> > 
> > This is critically important when using AIO to a preallocated file,
> > since if we need to read in blocks from the extent tree, the
> > io_submit(2) system call becomes synchronous, which is rather rude to
> > applications which were expecting the AIO to be "A".
> > 
> > As a bonus, using the extent status tree to store the logical to
> > physical block mapping is usually more compact than having to keep one
> > or more extent tree blocks in the buffer cache.
> > 
> > (Should we do this all the time, instead of when the application
> > explicitly requests it?  Maybe; there could be cases with very large,
> > fragmented files accessed by an application such as "file", which only
> > needs to look at a small subset of the file, where this could result in
> > unnecessary work and memory allocation.  OTOH, 95%+ of the time this
> > would probably be a win...)
> 
> I'd say yes, we should - maybe not in all cases but if you need it for
> AIO, try to make it "all the time" at least for that AIO?
> 
> We keep telling application writers not to assume certain things about
> various filesystems, or to write applications that treat ext4 differently
> than ext3 differently than xfs etc...

Yes, I agree with you.  Ted and I have discussed the problem of setting
'data=writeback' by default in ext4.  Although most application writers
have realized that they need to call fsync explicitly to flush all dirty
pages, there are still some legacy applications that depend on the
'data=ordered' mode to flush them.
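
To make the fsync point concrete, here is a minimal sketch of what
"explicitly call fsync" looks like in an application.  The helper name
write_durably() is made up for illustration; it is not taken from any
real program.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to path and flush them to stable
 * storage before returning, instead of relying on the mount's data=
 * journaling mode.  Returns 0 on success, -1 on error. */
static int write_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, retry the write */
            close(fd);
            return -1;
        }
        p += n;                     /* handle short writes */
        len -= (size_t)n;
    }

    /* The explicit flush: dirty pages reach the disk here. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

A legacy application that skips the fsync() step is the kind that would
be affected by a change of the default mode.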

> 
> This goes the other way.
> 
> In the end who (besides google?) is really going to call this IOCTL?
> 
> I wondered if only doing this when files are opened O_DIRECT might make
> sense, but Jeff Moyer pointed out that giant databases probably don't
> want to read in their entire block mapping tree - OTOH, they probably use
> preallocation if they're smart, and maybe it's not that bad.

I asked a colleague who is a MySQL contributor whether MySQL tries to
preallocate its files.  As far as I know, MySQL does not do so at
present.  I don't have the source code of Oracle or DB2, but I guess
these giant databases might use preallocation.
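
For reference, preallocation from user space could look roughly like the
sketch below.  It uses posix_fallocate(3); the helper name
open_preallocated() is only an illustration, and I am not claiming that
any particular database does it this way.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open (or create) path and reserve len bytes of
 * disk space for it up front.  Returns an open fd, or -1 on failure. */
static int open_preallocated(const char *path, off_t len)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    /* posix_fallocate() returns an error number directly and does not
     * set errno, so copy it over for the caller. */
    int err = posix_fallocate(fd, 0, len);
    if (err != 0) {
        close(fd);
        errno = err;
        return -1;
    }
    return fd;
}
```

After a call like this the blocks are already allocated, so later writes
do not have to allocate them.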

> 
> Or what about tying this into POSIX_FADV_WILLNEED?  Hohum, that gets
> into force_page_cache_readahead().  We need POSIX_FADV_WILLNEED_META...

Yes, a _WILLNEED_METADATA flag makes sense to me if other file systems
also want to support it.  But, as Ted said, adding it as an ioctl for now
might be a good choice because it won't impact other file systems.
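
From the application side, calling the ioctl might look like the sketch
below.  Two assumptions: the request number _IO('f', 18) is the value I
believe mainline kernel headers use for EXT4_IOC_PRECACHE_EXTENTS, and it
may not match this v2 series; and precache_extents() is a made-up helper
name.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Assumption: request number as defined in mainline kernel headers;
 * the patch series discussed here may use a different value. */
#ifndef EXT4_IOC_PRECACHE_EXTENTS
#define EXT4_IOC_PRECACHE_EXTENTS _IO('f', 18)
#endif

/* Hypothetical helper: ask ext4 to load all extents of path into the
 * extent status tree.  Returns 0 on success, -1 on error with errno
 * preserved from the failing call. */
static int precache_extents(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    int rc = ioctl(fd, EXT4_IOC_PRECACHE_EXTENTS);
    int saved = errno;      /* close() may overwrite errno */
    close(fd);
    errno = saved;
    return rc;
}
```

On a file system that does not implement the ioctl the call is expected
to fail, so an application should treat the failure as "no precaching"
and carry on.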

                                                - Zheng
