linux-kernel.vger.kernel.org archive mirror
From: "Valerie Henson" <val.henson@gmail.com>
To: "David Chinner" <dgc@sgi.com>
Cc: linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
	linux-kernel@vger.kernel.org, "Theodore Ts'o" <tytso@mit.edu>,
	"Andreas Dilger" <adilger@clusterfs.com>,
	"Ric Wheeler" <ric@emc.com>
Subject: Re: [RFC] Parallelize IO for e2fsck
Date: Thu, 17 Jan 2008 17:43:37 -0800
Message-ID: <70b6f0bf0801171743g610c4a96qf42b268ccc777db4@mail.gmail.com>
In-Reply-To: <20080118011542.GQ155259@sgi.com>

On Jan 17, 2008 5:15 PM, David Chinner <dgc@sgi.com> wrote:
> On Wed, Jan 16, 2008 at 01:30:43PM -0800, Valerie Henson wrote:
> > Hi y'all,
> >
> > This is a request for comments on the rewrite of the e2fsck IO
> > parallelization patches I sent out a few months ago.  The mechanism is
> > totally different.  Previously IO was parallelized by issuing IOs from
> > multiple threads; now a single thread issues fadvise(WILLNEED) and
> > then uses read() to complete the IO.
>
> Interesting.
>
> We ultimately rejected a similar patch to xfs_repair (pre-populating
> the kernel block device cache) mainly because of low memory
> performance issues and it doesn't really enable you to do anything
> particularly smart with optimising I/O patterns for larger, high
> performance RAID arrays.
>
> The low memory problems were particularly bad; the readahead
> thrashing caused a slowdown of 2-3x compared to the baseline and
> often it was due to the repair process requiring all of memory
> to cache stuff it would need later. IIRC, multi-terabyte ext3
> filesystems have similar memory usage problems to XFS, so there's
> a good chance that this patch will see the same sorts of issues.

That was one of my first concerns - how to avoid overflowing memory?
Whenever I screw it up on e2fsck, it does go, oh, 2 times slower due
to the minor detail of every single block being read from disk twice.
:)

I have a partial solution that sort of blindly manages the buffer
cache.  First, the user passes e2fsck a parameter saying how much
memory is available as buffer cache.  The readahead thread reads
things in and immediately throws them away so they are only in buffer
cache (no double-caching).  Then readahead and e2fsck work together so
that readahead only reads in new blocks when the main thread is done
with earlier blocks.  The already-used blocks get kicked out of buffer
cache to make room for the new ones.
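
To make the windowing idea above concrete, here is a rough sketch in C.
It is not the code from the patch: the struct ra_window type and the
ra_fill()/ra_release() helpers are hypothetical names invented for
illustration, and it assumes the "read in" and "kick out" steps can be
expressed as posix_fadvise() hints (POSIX_FADV_WILLNEED and
POSIX_FADV_DONTNEED) on a single file descriptor.  The kernel treats
both as advice only, so this bounds what readahead asks for, not what
actually stays cached.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical bookkeeping for a budgeted readahead window on one fd. */
struct ra_window {
    int   fd;      /* block device or filesystem image */
    off_t budget;  /* bytes of cache the user said we may occupy */
    off_t ahead;   /* readahead has been requested up to this offset */
    off_t done;    /* the main thread has finished up to this offset */
    off_t end;     /* end of the region being scanned */
};

/* Request readahead, one chunk at a time, until the outstanding
 * window [done, ahead) would exceed the budget.  Returns 0 or an
 * error number (posix_fadvise reports errors via its return value). */
int ra_fill(struct ra_window *w, off_t chunk)
{
    assert(chunk > 0 && chunk <= w->budget);
    while (w->ahead < w->end && w->ahead - w->done + chunk <= w->budget) {
        off_t left = w->end - w->ahead;
        off_t len = left < chunk ? left : chunk;
        int err = posix_fadvise(w->fd, w->ahead, len, POSIX_FADV_WILLNEED);
        if (err)
            return err;
        w->ahead += len;
    }
    return 0;
}

/* The main thread is done with everything below 'upto': hint that
 * those pages can be dropped, then use the freed room to read ahead. */
int ra_release(struct ra_window *w, off_t upto, off_t chunk)
{
    if (upto > w->ahead)
        upto = w->ahead;
    if (upto > w->done) {
        int err = posix_fadvise(w->fd, w->done, upto - w->done,
                                POSIX_FADV_DONTNEED);
        if (err)
            return err;
        w->done = upto;
    }
    return ra_fill(w, chunk);
}
```

In this sketch the main thread would call ra_fill() once at startup and
ra_release() each time it finishes a run of blocks, so new readahead is
only requested as earlier blocks are given back.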

What would be nice is to take into account the current total memory
usage of the whole fsck process and factor that in.  I don't think it
would be hard to add to the existing cache management framework.
Thoughts?
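
One possible shape for that accounting, as a sketch only: read the
process's resident set size from /proc/self/statm (Linux-specific) and
subtract it from the user-supplied memory figure to get the cache
budget.  The function names process_rss_bytes() and cache_budget() and
the floor_bytes parameter are made up for illustration; nothing like
them is claimed to exist in e2fsck.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Resident set size of the calling process in bytes, or -1 if
 * /proc/self/statm cannot be read (e.g. not on Linux). */
long long process_rss_bytes(void)
{
    long long size_pages, resident_pages;
    FILE *f = fopen("/proc/self/statm", "r");
    int n;

    if (!f)
        return -1;
    /* The first two fields are total program size and resident set
     * size, both counted in pages. */
    n = fscanf(f, "%lld %lld", &size_pages, &resident_pages);
    fclose(f);
    if (n != 2)
        return -1;
    return resident_pages * (long long)sysconf(_SC_PAGESIZE);
}

/* Cache budget = memory the user allowed minus what fsck itself is
 * using, never dropping below floor_bytes so readahead can still
 * make some progress. */
long long cache_budget(long long total_bytes, long long rss_bytes,
                       long long floor_bytes)
{
    long long left;

    assert(floor_bytes >= 0);
    left = total_bytes - rss_bytes;
    return left > floor_bytes ? left : floor_bytes;
}
```

The budget could be recomputed each time the main thread releases
blocks, so the readahead window shrinks as fsck's own data structures
grow.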

> Promising results, though....

Thanks!  It's solving a rather simpler problem than XFS check/repair. :)

-VAL
