From: tytso@mit.edu
To: Andreas Dilger <andreas.dilger@lustre.org>
Cc: Eric Sandeen <sandeen@redhat.com>,
	ext4 development <linux-ext4@vger.kernel.org>
Subject: Re: [PATCH 2/2] ext4: journal superblock modifications in ext4_statfs()
Date: Thu, 19 Nov 2009 14:08:46 -0500
Message-ID: <20091119190846.GB2099@thunk.org>
In-Reply-To: <F64B29C1-A90E-42F5-80CF-5704283D9A1B@sun.com>

On Mon, Nov 16, 2009 at 03:38:16PM -0800, Andreas Dilger wrote:
> 
> The problem is that if you do "e2fsck -fn" it will still report this
> as an error in the filesystem, even though "e2fsck -fp" will
> silently fix it.  I just repeated this test and still see errors,
> even 30 minutes after a file was modified, even after multiple
> syncs.

Sure, but running e2fsck -fn on a mounted file system can always
show spurious problems.  In fact, in your demonstration:

> [adilger@webber ~]$ sync; sleep 10; sync
> [adilger@webber ~]$ e2fsck -fn /dev/dm-0
> e2fsck 1.41.6.sun1 (30-May-2009)
> Warning!  /dev/dm-0 is mounted.
> Warning: skipping journal recovery because doing a read-only
> filesystem check.
	...
> Pass 1: Checking inodes, blocks, and sizes
> Deleted inode 884739 has zero dtime.  Fix? no
	...
> Pass 5: Checking group summary information
> Block bitmap differences:  -1784645
> Fix? no
> 
> Inode bitmap differences:  -884739
> Fix? no

.... neither of these errors would be fixed by the hack of updating
the summary free block and inode counts.

If the concern is what happens when someone runs e2fsck -fn on a
mounted file system, I have a very hard time getting excited about
that....

> The other thing that comes to mind is that we don't recover the journal
> for a read-only e2fsck, but we DO recover it on a read-only mount
> seems inconsistent.  It wouldn't be hard to have e2fsck -n read the
> journal and persistently cache the journal blocks in its internal
> cache (i.e. flag them so they can't be discarded from cache) before
> it runs the rest of the e2fsck.

Eventually it would be nice if we did the same thing in both kernel
and userspace when doing a read-only mount/check: build a redirection
table that maps specific physical blocks to the block in the journal,
and whenever the system tries to access a specific physical block, we
look up the proper block to use instead in the redirection table.
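
A rough sketch of the redirection-table idea, in Python only for
brevity.  The Transaction class, build_redirection_table() and
read_block() are hypothetical names for illustration, not jbd2 or
e2fsprogs interfaces.  The model assumes a much simplified journal: it
ignores revoke records, checksums and block escaping, and it treats
the device as a dictionary mapping block numbers to contents.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    """Hypothetical, simplified stand-in for one journal transaction."""
    tid: int
    committed: bool
    # Each tag is (final on-disk block number, journal block holding the new copy).
    tags: list = field(default_factory=list)


def build_redirection_table(transactions):
    """Map final block numbers to the journal block holding the latest copy.

    Transactions are walked in tid order and the walk stops at the first
    transaction without a commit record, as a journal replay would.
    Assumption: revoke records are not modelled here.
    """
    table = {}
    for txn in sorted(transactions, key=lambda t: t.tid):
        if not txn.committed:
            break
        for final_block, journal_block in txn.tags:
            # A later transaction overrides an earlier copy of the same block.
            table[final_block] = journal_block
    return table


def read_block(device, table, blocknr):
    """Read a block, substituting the journal copy when one exists."""
    return device[table.get(blocknr, blocknr)]
```

Nothing is written to the device here; the journal stays untouched and
only reads are redirected, which is the property a read-only check or
mount would want.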

The one tricky bit about doing this in the kernel is that we would
still have to replay the journal in the case of the read-only root.
Why?  Because otherwise older e2fsck's would get confused and replay
the journal, and that would lead to some potentially serious
confusion.  Even if we fix this in future versions of e2fsck, we still
need to be careful dealing with remounting a r/o filesystem to be
read/write, especially in the journal=data mode.

The simple way of handling journaled data blocks is to hack the
bmap() function to use the redirection table, but the problem with
doing that is the journal block will be left in the buffer heads in
the page cache.  If the file system is remounted r/w without first
flushing these buffer heads, future attempts to modify these pages in
the page cache could result in a random block in the journal
getting corrupted by an update, instead of updating the proper final
location on disk for that data block.
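
The hazard can be shown with a toy model, again in Python and again
purely illustrative.  ToyCache, Buffer and remount_rw() are made-up
names, not kernel structures; the "cache" is a dictionary of buffers
that each remember which physical block they were mapped to at read
time, which is the only property of buffer heads this sketch assumes.

```python
class Buffer:
    """Toy buffer: remembers the physical block it was mapped to."""

    def __init__(self, mapped_block, data):
        self.mapped_block = mapped_block
        self.data = data


class ToyCache:
    """Toy cache whose block mapping goes through a redirection table."""

    def __init__(self, device, table):
        self.device = device      # block number -> contents
        self.table = table        # final block -> journal block
        self.buffers = {}         # logical block -> Buffer

    def read(self, blocknr):
        if blocknr not in self.buffers:
            # Redirected mapping: the buffer ends up pointing at the journal.
            phys = self.table.get(blocknr, blocknr)
            self.buffers[blocknr] = Buffer(phys, self.device[phys])
        return self.buffers[blocknr]

    def write(self, blocknr, data):
        buf = self.read(blocknr)
        buf.data = data
        # Writeback goes wherever the buffer was mapped when it was read.
        self.device[buf.mapped_block] = data

    def remount_rw(self, flush):
        # Replay: copy journal copies to their final locations, drop the table.
        for final_block, journal_block in self.table.items():
            self.device[final_block] = self.device[journal_block]
        self.table = {}
        if flush:
            # Dropping stale buffers forces a fresh mapping on the next read.
            self.buffers.clear()
```

With flush=False, a buffer read before the remount keeps its mapping to
the journal block, so a later write lands in the journal area and the
final location keeps the old contents.  With flush=True the next write
goes to the final location.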

If there is someone who has at least some basic experience in kernel
coding and wants an entry-level project for getting involved with ext4,
this would be an ideal, self-contained thing to try doing.  I'd
suggest implementing it in userspace first, using the userspace/kernel
API framework that allows e2fsck/recovery.c to be roughly kept in sync
with fs/jbd[2]/recovery.c, and avoiding the hair of r/o roots by
always replaying the journal in the case of the root file system.
Anyone interested?  If so, let me know...

    			    	   		       - Ted

