linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andreas Dilger <adilger@sun.com>
To: Valerie Henson <val@vahconsulting.com>
Cc: linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
	linux-kernel@vger.kernel.org, "Theodore Ts'o" <tytso@mit.edu>,
	Andreas Dilger <adilger@clusterfs.com>, Ric Wheeler <ric@emc.com>
Subject: Re: [RFC] Parallelize IO for e2fsck
Date: Mon, 21 Jan 2008 16:00:41 -0700	[thread overview]
Message-ID: <20080121230041.GL3180@webber.adilger.int> (raw)
In-Reply-To: <70b6f0bf0801161330y46ec555m5d4994a1eea7d045@mail.gmail.com>

On Jan 16, 2008  13:30 -0800, Valerie Henson wrote:
> I have a partial solution that sort of blindly manages the buffer
> cache.  First, the user passes e2fsck a parameter saying how much
> memory is available as buffer cache.  The readahead thread reads
> things in and immediately throws them away so they are only in buffer
> cache (no double-caching).  Then readahead and e2fsck work together so
> that readahead only reads in new blocks when the main thread is done
> with earlier blocks.  The already-used blocks get kicked out of buffer
> cache to make room for the new ones.
>
> What would be nice is to take into account the current total memory
> usage of the whole fsck process and factor that in.  I don't think it
> would be hard to add to the existing cache management framework.
> Thoughts?

I discussed this with Ted at one point also.  This is a generic problem,
not just for readahead, because "fsck" can run multiple e2fsck in parallel
and in case of many large filesystems on a single node this can cause
memory usage problems also.

What I was proposing is that "fsck.{fstype}" be modified to return an
estimated minimum amount of memory needed, and some "desired" amount of
memory (i.e. readahead) to fsck the filesystem, using some parameter like
"fsck.{fstype} --report-memory-needed /dev/XXX".  If this does not
return the output in the expected format, or returns an error then fsck
will assume some amount of memory based on the device size and continue
as it does today.

If the fsck.{fstype} does understand this parameter, then fsck makes a
decision based on devices, parallelism, total RAM (less some amount to
avoid thrashing), then it can call the individual fsck commands with
"--maximum-memory MMM /dev/XXX" so each knows how much cache it can
allocate.  This parameter can also be specified by the user if running
e2fsck directly.

I haven't looked through your patch yet, but I hope to get to it soon.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


  parent reply	other threads:[~2008-01-21 23:02 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <70b6f0bf0801161322k2740a8dch6a0d6e6e112cd2d0@mail.gmail.com>
2008-01-16 21:30 ` [RFC] Parallelize IO for e2fsck Valerie Henson
2008-01-18  1:15   ` David Chinner
2008-01-18  1:43     ` Valerie Henson
2008-01-21 23:00   ` Andreas Dilger [this message]
2008-01-22  3:38     ` David Chinner
2008-01-22  4:17       ` Valdis.Kletnieks
2008-01-22  7:00         ` Andreas Dilger
2008-01-22 13:05           ` Alan Cox
2008-01-22 14:40           ` Theodore Tso
2008-01-22 14:57             ` Arnaldo Carvalho de Melo
2008-01-28 19:30             ` Pavel Machek
2008-01-28 19:56               ` Theodore Tso
2008-01-28 20:01                 ` Pavel Machek
2008-02-03 13:51                   ` KOSAKI Motohiro
2008-01-29  8:29                 ` david
2008-01-22  7:05       ` Andreas Dilger
2008-01-22  8:16         ` David Chinner
2008-01-22 17:42       ` Bryan Henderson
     [not found] <9Mo9w-7Ws-25@gated-at.bofh.it>
     [not found] ` <9Mo9w-7Ws-23@gated-at.bofh.it>
     [not found]   ` <9OdWm-7uN-25@gated-at.bofh.it>
     [not found]     ` <9Oi9A-5EJ-3@gated-at.bofh.it>
     [not found]       ` <9OiMg-6IC-1@gated-at.bofh.it>
     [not found]         ` <9OlqL-2xG-3@gated-at.bofh.it>
     [not found]           ` <9Orda-3ub-45@gated-at.bofh.it>
2008-01-24 17:32             ` Bodo Eggert
2008-01-24 22:07               ` Andreas Dilger
2008-01-24 23:08               ` Adrian Bunk
2008-01-24 23:40                 ` Theodore Tso
2008-01-25  0:25                   ` Zan Lynx
2008-01-25 11:09                     ` Andreas Dilger
2008-01-26  0:55                       ` Zan Lynx
2008-01-26 11:56                         ` KOSAKI Motohiro
2008-01-26 12:32                   ` KOSAKI Motohiro
     [not found] <alpine.LSU.0.999.0801252338460.26260@be1.lrz>
2008-01-26  1:55 ` Bryan Henderson
2008-01-26 13:21   ` Theodore Tso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080121230041.GL3180@webber.adilger.int \
    --to=adilger@sun.com \
    --cc=adilger@clusterfs.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ric@emc.com \
    --cc=tytso@mit.edu \
    --cc=val@vahconsulting.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).