From: Dave Chinner <david@fromorbit.com>
To: Jan Kara <jack@suse.cz>
Cc: "J. Bruce Fields" <bfields@fieldses.org>,
Jeff Layton <jlayton@redhat.com>,
Christoph Hellwig <hch@infradead.org>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-nfs@vger.kernel.org, linux-ext4@vger.kernel.org,
linux-btrfs@vger.kernel.org, linux-xfs@vger.kernel.org
Subject: Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization
Date: Tue, 4 Apr 2017 22:34:14 +1000 [thread overview]
Message-ID: <20170404123414.GA23007@dastard> (raw)
In-Reply-To: <20170403140055.GF15168@quack2.suse.cz>
On Mon, Apr 03, 2017 at 04:00:55PM +0200, Jan Kara wrote:
> On Sun 02-04-17 09:05:26, Dave Chinner wrote:
> > On Thu, Mar 30, 2017 at 12:12:31PM -0400, J. Bruce Fields wrote:
> > > On Thu, Mar 30, 2017 at 07:11:48AM -0400, Jeff Layton wrote:
> > > > On Thu, 2017-03-30 at 08:47 +0200, Jan Kara wrote:
> > > > > Because if above is acceptable we could make reported i_version to be a sum
> > > > > of "superblock crash counter" and "inode i_version". We increment
> > > > > "superblock crash counter" whenever we detect unclean filesystem shutdown.
> > > > > That way after a crash we are guaranteed each inode will report new
> > > > > i_version (the sum would probably have to look like "superblock crash
> > > > > counter" * 65536 + "inode i_version" so that we avoid reusing possible
> > > > > i_version numbers we gave away but did not write to disk but still...).
> > > > > Thoughts?
> > >
> > > How hard is this for filesystems to support? Do they need an on-disk
> > > format change to keep track of the crash counter?
> >
> > Yes. We'll need version counter in the superblock, and we'll need to
> > know what the increment semantics are.
> >
> > The big question is how do we know there was a crash? The only thing
> > a journalling filesystem knows at mount time is whether it is clean
> > or requires recovery. Filesystems can require recovery for many
> > reasons that don't involve a crash (e.g. root fs is never unmounted
> > cleanly, so always requires recovery). Further, some filesystems may
> > not even know there was a crash at mount time because their
> > architecture always leaves a consistent filesystem on disk (e.g. COW
> > filesystems)....
>
> What filesystems can or cannot easily do obviously differs. Ext4 has a
> recovery flag set in superblock on RW mount/remount and cleared on
> umount/RO remount.
Even this doesn't help. A recent bug that was reported to the XFS
list - turns out that systemd can't remount-ro the root
filesystem sucessfully on shutdown because there are open write fds
on the root filesystem when it attempts the remount. So it just
reboots without a remount-ro. This uncovered a bug in grub in
that it (still!) thinks sync(1) is sufficient to get all the
metadata that points to a kernel image onto disk in places it can
read. XFS, like ext4, leaves it in the journal and so the system then fails to
boot because systemd didn't remount-ro the root fs and hence the
journal was never flushed before reboot and so grub can't find the
kernel and so everything fails....
> This flag being set on mount would imply incrementing the crash
> counter. It should be pretty easy for each filesystem to implement
> such flag and the counter but I agree it requires an on-disk
> format change.
Yup, anything we want that is persistent and consistent across
filesystems will need on-disk format changes. Hence we need a solid
specification first, not to mention tests to validate correct
behaviour across all filesystems in xfstests...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2017-04-04 12:34 UTC|newest]
Thread overview: 87+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-12-21 17:03 [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 01/30] lustre: don't set f_version in ll_readdir Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 02/30] ecryptfs: remove unnecessary i_version bump Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 03/30] ceph: remove the bump of i_version Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 04/30] f2fs: don't bother setting i_version Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 05/30] hpfs: don't bother with the i_version counter Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 06/30] jfs: remove initialization of " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 07/30] nilfs2: remove inode->i_version initialization Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 08/30] orangefs: remove initialization of i_version Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 09/30] reiserfs: remove unneeded i_version bump Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 10/30] ntfs: remove i_version handling Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 11/30] fs: new API for handling i_version Jeff Layton
2017-03-03 22:36 ` J. Bruce Fields
2017-03-04 0:09 ` Jeff Layton
2017-03-03 23:55 ` NeilBrown
2017-03-04 1:58 ` Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 12/30] fat: convert to new i_version API Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 13/30] affs: " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 14/30] afs: " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 15/30] btrfs: " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 16/30] exofs: switch " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 17/30] ext2: convert " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 18/30] ext4: " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 19/30] nfs: " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 20/30] nfsd: " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 21/30] ocfs2: " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 22/30] ufs: use " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 23/30] xfs: convert to " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 24/30] IMA: switch IMA over " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 25/30] fs: add a "force" parameter to inode_inc_iversion Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 26/30] fs: only set S_VERSION when updating times if it has been queried Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 27/30] xfs: avoid setting XFS_ILOG_CORE if i_version doesn't need incrementing Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 28/30] btrfs: only dirty the inode in btrfs_update_time if something was changed Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 29/30] fs: track whether the i_version has been queried with an i_state flag Jeff Layton
2017-03-04 0:03 ` NeilBrown
2017-03-04 0:43 ` Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 30/30] fs: convert i_version counter over to an atomic64_t Jeff Layton
2016-12-22 8:38 ` Amir Goldstein
2016-12-22 13:27 ` Jeff Layton
2017-03-04 0:00 ` NeilBrown
2016-12-22 8:45 ` [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization Christoph Hellwig
2016-12-22 14:42 ` Jeff Layton
2017-03-20 21:43 ` J. Bruce Fields
2017-03-21 13:45 ` Christoph Hellwig
2017-03-21 16:30 ` J. Bruce Fields
2017-03-21 17:23 ` Jeff Layton
2017-03-21 17:37 ` J. Bruce Fields
2017-03-21 17:51 ` J. Bruce Fields
2017-03-21 18:30 ` J. Bruce Fields
2017-03-21 18:46 ` Jeff Layton
2017-03-21 19:13 ` J. Bruce Fields
2017-03-21 21:54 ` Jeff Layton
2017-03-29 11:15 ` Jan Kara
2017-03-29 17:54 ` Jeff Layton
2017-03-29 23:41 ` Dave Chinner
2017-03-30 11:24 ` Jeff Layton
2017-04-04 18:38 ` J. Bruce Fields
2017-03-30 6:47 ` Jan Kara
2017-03-30 11:11 ` Jeff Layton
2017-03-30 16:12 ` J. Bruce Fields
2017-03-30 18:35 ` Jeff Layton
2017-03-30 21:11 ` Boaz Harrosh
2017-04-04 18:31 ` J. Bruce Fields
2017-04-05 1:43 ` NeilBrown
2017-04-05 8:05 ` Jan Kara
2017-04-05 18:14 ` J. Bruce Fields
2017-05-11 18:59 ` J. Bruce Fields
2017-05-11 22:22 ` NeilBrown
2017-05-12 16:21 ` J. Bruce Fields
2017-10-30 13:21 ` Jeff Layton
2017-05-12 8:27 ` Jan Kara
2017-05-12 15:56 ` J. Bruce Fields
2017-05-12 11:01 ` Jeff Layton
2017-05-12 15:57 ` J. Bruce Fields
2017-04-06 1:12 ` NeilBrown
2017-04-06 7:22 ` Jan Kara
2017-04-05 17:26 ` J. Bruce Fields
2017-04-01 23:05 ` Dave Chinner
2017-04-03 14:00 ` Jan Kara
2017-04-04 12:34 ` Dave Chinner [this message]
2017-04-04 17:53 ` J. Bruce Fields
2017-04-05 1:26 ` NeilBrown
2017-03-21 21:45 ` Dave Chinner
2017-03-22 19:53 ` Jeff Layton
2017-03-03 23:00 ` J. Bruce Fields
2017-03-04 0:53 ` Jeff Layton
2017-03-08 17:29 ` J. Bruce Fields
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170404123414.GA23007@dastard \
--to=david@fromorbit.com \
--cc=bfields@fieldses.org \
--cc=hch@infradead.org \
--cc=jack@suse.cz \
--cc=jlayton@redhat.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).