From: Jan Kara <jack@suse.cz>
To: NeilBrown <neil@brown.name>
Cc: Jan Kara <jack@suse.cz>, "J. Bruce Fields" <bfields@fieldses.org>,
    Jeff Layton <jlayton@redhat.com>, Christoph Hellwig <hch@infradead.org>,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-nfs@vger.kernel.org, linux-ext4@vger.kernel.org,
    linux-btrfs@vger.kernel.org, linux-xfs@vger.kernel.org
Subject: Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization
Date: Thu, 6 Apr 2017 09:22:07 +0200
Message-ID: <20170406072207.GA25500@quack2.suse.cz>
In-Reply-To: <87k26ygx0d.fsf@notabene.neil.brown.name>

On Thu 06-04-17 11:12:02, NeilBrown wrote:
> On Wed, Apr 05 2017, Jan Kara wrote:
> >> If you want to ensure read-only files can remain cached over a crash,
> >> then you would have to mark a file in some way on stable storage
> >> *before* allowing any change.
> >> e.g. you could use the lsb.  Odd i_versions might have been changed
> >> recently and crash-count*large-number needs to be added.
> >> Even i_versions have not been changed recently and nothing need be
> >> added.
> >>
> >> If you want to change a file with an even i_version, you subtract
> >>   crash-count*large-number
> >> from the i_version, then set lsb.  This is written to stable storage
> >> before the change.
> >>
> >> If a file has not been changed for a while, you can add
> >>   crash-count*large-number
> >> and clear lsb.
> >>
> >> The lsb of the i_version would be for internal use only.  It would not
> >> be visible outside the filesystem.
> >>
> >> It feels a bit clunky, but I think it would work and is the best
> >> combination of Jan's idea and your requirement.
> >> The biggest cost would be switching to 'odd' before any changes, and the
> >> unknown is when does it make sense to switch to 'even'.
> >
> > Well, there is also a problem that you would need to somehow remember with
> > which 'crash count' the i_version has been previously reported as that is
> > not stored on disk with my scheme. So I don't think we can easily use your
> > scheme.
>
> I don't think there is a problem here.... maybe I didn't explain
> properly or something.
>
> I'm assuming there is a crash-count that is stored once per filesystem.
> This might be a disk-format change, or maybe the "Last checked" time
> could be used with ext4 (that is a bit horrible though).
>
> Every on-disk i_version has a flag to choose between:
>   - use this number as it is, but update it on-disk before any change
>   - add multiple of current crash-count to this number before use.
>     If you crash during an update, the i_version is thus automatically
>     increased.
>
> To change from the first option to the second option you subtract the
> multiple of the current crash-count (which might make the stored
> i_version negative), and flip the bit.
> To change from the second option to the first, you add the multiple
> of the current crash-count, and flip the bit.
> In each case, the externally visible i_version does not change.
> Nothing needs to be stored except the per-inode i_version and the per-fs
> crash_count.

Right, I didn't realize you would subtract crash counter when flipping the
bit and then add it back when flipping again. That would work.

> > So the options we have are:
> >
> > 1) Keep i_version as is, make clients also check for i_ctime.
> >    Pro: No on-disk format changes.
> >    Cons: After a crash, i_version can go backwards (but when file changes
> >    i_version, i_ctime pair should be still different) or not, data can be
> >    old or not.
>
> I like to think of this approach as using the i_version as an extension
> to the i_ctime.
> i_ctime doesn't necessarily change on every file modification, either
> because it is not a modification that is meant to change i_ctime, or
> because i_ctime doesn't have the resolution to show a very small change
> in time, or because the clock that is used to update i_ctime doesn't
> have much resolution.
> So when a change happens, if the stored c_time changes, set i_version to
> zero, otherwise increment i_version.
> Then the externally visible i-version is a combination of the stored
> c_time and the stored i_version.
> If you only used 1-second ctime resolution for versioning purposes, you
> could provide a 64bit i_version as 34 bits of ctime and 30 bits of
> changes-in-one-second.
> It is important that the resolution of ctime used is less than the
> fastest possible restart after a crash.
>
> I don't think that i_version going backwards should be a problem, as
> long as an old version means exactly the same old data.  Presumably
> journalling would ensure that the data and ctime/version are updated
> atomically.

So as Dave and I wrote earlier in this thread, journalling does not ensure
data vs ctime/version consistency (well, except for ext4 in data=journal
mode but people rarely run that due to performance implications). So you
can get old data and new version as well as new data and old version after
a crash. The only thing filesystems guarantee is that you will not see
uninitialized blocks and that fsync makes both data & ctime/version
persistent. But as Bruce wrote for NFS open-to-close semantics this may be
actually good enough.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
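
The flag-plus-crash-count scheme discussed in the message above can be sketched in a few lines. This is only an illustrative model, not kernel code: the names `Inode`, `STRIDE`, `visible`, `to_offset_mode` and `to_plain_mode` are hypothetical, `STRIDE` stands in for the unspecified "large number", and "on disk" versus "in memory" is simulated with two Python objects.

```python
from dataclasses import dataclass, replace

# Hypothetical "large number": assumed to exceed the number of changes
# that can be lost in memory between two writes to stable storage.
STRIDE = 1 << 32


@dataclass
class Inode:
    stored: int = 0           # raw per-inode counter; may go negative
    add_offset: bool = False  # the per-inode flag (the "lsb" in the thread)


def visible(inode: Inode, crash_count: int) -> int:
    """Externally visible i_version for a given per-filesystem crash count."""
    return inode.stored + (crash_count * STRIDE if inode.add_offset else 0)


def to_offset_mode(inode: Inode, crash_count: int) -> None:
    """First option -> second option: subtract the multiple, flip the flag.
    In the scheme this must reach stable storage before any change."""
    if not inode.add_offset:
        inode.stored -= crash_count * STRIDE
        inode.add_offset = True


def to_plain_mode(inode: Inode, crash_count: int) -> None:
    """Second option -> first option: add the multiple back, flip the flag."""
    if inode.add_offset:
        inode.stored += crash_count * STRIDE
        inode.add_offset = False


# Walk-through: a file at version 10 on a filesystem that has crashed twice.
crash_count = 2
on_disk = Inode(stored=10)
before = visible(on_disk, crash_count)

to_offset_mode(on_disk, crash_count)       # persisted before the first change
unchanged = visible(on_disk, crash_count)  # flipping does not alter the value

in_memory = replace(on_disk)               # cached copy of the inode
in_memory.stored += 3                      # three changes, never written back
last_seen = visible(in_memory, crash_count)

crash_count += 1                           # crash: cached copy is lost
after_crash = visible(on_disk, crash_count)
```

In this model the version seen after the crash is `10 + STRIDE`, which is larger than anything a client could have observed before the crash, even though the three in-memory increments never reached disk. That is the "automatically increased" property from the description; it only holds under the stated assumption that fewer than `STRIDE` changes are lost.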
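
The "34 bits of ctime and 30 bits of changes-in-one-second" layout quoted above can be illustrated as plain bit packing. This is a minimal sketch under assumptions: the helper names `pack_version`, `unpack_version` and `record_change` are made up for illustration, ctime is taken as whole seconds, and counter overflow within one second is simply rejected rather than handled.

```python
CTIME_BITS = 34
COUNT_BITS = 30
COUNT_MASK = (1 << COUNT_BITS) - 1


def pack_version(ctime_sec: int, changes: int) -> int:
    """Combine a 1-second-resolution ctime and a per-second change counter
    into one 64-bit value, ctime in the high 34 bits."""
    if not 0 <= ctime_sec < (1 << CTIME_BITS):
        raise ValueError("ctime does not fit in 34 bits")
    if not 0 <= changes <= COUNT_MASK:
        raise ValueError("change counter does not fit in 30 bits")
    return (ctime_sec << COUNT_BITS) | changes


def unpack_version(version: int) -> tuple[int, int]:
    """Split a packed value back into (ctime_sec, changes)."""
    return version >> COUNT_BITS, version & COUNT_MASK


def record_change(ctime_sec: int, changes: int, now_sec: int) -> tuple[int, int]:
    """Rule from the quoted text: if the stored ctime changes, reset the
    counter to zero; otherwise increment the counter."""
    if now_sec != ctime_sec:
        return now_sec, 0
    return ctime_sec, changes + 1
```

Because the ctime occupies the high bits, a change in a later second always compares greater than any number of changes in an earlier second, so resetting the counter to zero never makes the combined value go backwards as long as the clock itself does not.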