From: Jeff Layton <jlayton@kernel.org>
To: Theodore Ts'o <tytso@mit.edu>, NeilBrown <neilb@suse.de>
Cc: Trond Myklebust <trondmy@hammerspace.com>,
"bfields@fieldses.org" <bfields@fieldses.org>,
"zohar@linux.ibm.com" <zohar@linux.ibm.com>,
"djwong@kernel.org" <djwong@kernel.org>,
"brauner@kernel.org" <brauner@kernel.org>,
"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>,
"linux-api@vger.kernel.org" <linux-api@vger.kernel.org>,
"david@fromorbit.com" <david@fromorbit.com>,
"fweimer@redhat.com" <fweimer@redhat.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"chuck.lever@oracle.com" <chuck.lever@oracle.com>,
"linux-man@vger.kernel.org" <linux-man@vger.kernel.org>,
"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
"jack@suse.cz" <jack@suse.cz>,
"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
"xiubli@redhat.com" <xiubli@redhat.com>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"adilger.kernel@dilger.ca" <adilger.kernel@dilger.ca>,
"lczerner@redhat.com" <lczerner@redhat.com>,
"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>,
"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: [man-pages RFC PATCH v4] statx, inode: document the new STATX_INO_VERSION field
Date: Fri, 16 Sep 2022 11:11:34 -0400
Message-ID: <24005713ad25370d64ab5bd0db0b2e4fcb902c1c.camel@kernel.org>
In-Reply-To: <7027d1c2923053fe763e9218d10ce8634b56e81d.camel@kernel.org>
On Fri, 2022-09-16 at 07:36 -0400, Jeff Layton wrote:
> On Fri, 2022-09-16 at 02:54 -0400, Theodore Ts'o wrote:
> > On Fri, Sep 16, 2022 at 08:23:55AM +1000, NeilBrown wrote:
> > > > > If the answer is that 'all values change', then why store the crash
> > > > > counter in the inode at all? Why not just add it as an offset when
> > > > > you're generating the user-visible change attribute?
> > > > >
> > > > > i.e. statx.change_attr = inode->i_version + (crash counter * offset)
> >
> > I had suggested just hashing the crash counter with the file system's
> > on-disk i_version number, which is essentially what you suggested.
> >
> > > > Yes, if we plan to ensure that all the change attrs change after a
> > > > crash, we can do that.
> > > >
> > > > So what would make sense for an offset? Maybe 2**12? One would hope that
> > > > there wouldn't be more than 4k increments before one of them made it to
> > > > disk. OTOH, maybe that can happen with teeny-tiny writes.
> > >
> > > Leave it up to the filesystem to decide. The VFS and/or NFSD should
> > > not have any part in calculating the i_version. It should be entirely
> > > in the filesystem - though support code could be provided if common
> > > patterns exist across filesystems.
> >
> > Oh, *heck* no. This parameter is for the NFS implementation to
> > decide, because it's NFS's caching algorithms which are at stake here.
> >
> > As a file system maintainer, I had offered to make an on-disk
> > "crash counter" which would get updated when the journal had gotten
> > replayed, in addition to the on-disk i_version number. This will be
> > available for the Linux implementation of NFSD to use, but that's up
> > to *you* to decide how you want to use them.
> >
> > I was perfectly happy with hashing the crash counter and the i_version
> > because I had assumed that not *that* much stuff was going to be
> > cached, and so invalidating all of the caches in the unusual case
> > where there was a crash was acceptable. After all, it's a !@#?!@
> > cache. Caches sometimes get invalidated. "That is the order of
> > things." (as Ramata'Klan once said in "Rocks and Shoals")
> >
> > But if people expect that multiple TBs of data are going to be stored;
> > that cache invalidation is unacceptable; and that an itsy-weeny chance
> > of false-negative failures which might cause data corruption might be
> > an acceptable tradeoff, hey, that's for the system that is providing
> > caching semantics to determine.
> >
> > PLEASE don't put this tradeoff on the file system authors; I would
> > much prefer to leave it in the hands of the system that is trying
> > to do the caching.
> >
>
> Yeah, if we were designing this from scratch, I might agree with leaving
> more up to the filesystem, but the existing users all have pretty much
> the same needs. I'm going to plan to try to keep most of this in the
> common infrastructure defined in iversion.h.
>
> Ted, for the ext4 crash counter, what word size were you thinking of? I
> doubt we'll be able to use much more than 32 bits, so a larger integer is
> probably not worthwhile. There are several holes in struct super_block
> (at least on x86_64), so adding this field to the generic structure
> needn't grow it.
That said, now that I've taken a swipe at implementing this, I need more
information than just the crash counter. We need to multiply the crash
counter by a reasonable estimate of the maximum number of individual
writes that could occur between an i_version being incremented and that
value making it to the backing store.
IOW, given a write that bumps the i_version to X, how many more write
calls could race in before X makes it to the platter? I took a SWAG and
said 4k in an earlier email, but I don't really have a way to know, and
that could vary wildly with different filesystems and storage.
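(To make the arithmetic concrete, using my 4k guess: if the on-disk
i_version can lag the last value handed out by at most 4096 increments,
then after a crash the raw counter could re-issue up to 4096 values that
clients have already seen. Bumping the presented value by
crash_counter * 4096 skips past all of them, and the offset grows by
another 4096 with each subsequent crash.)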
What I'd like to see is this in struct super_block:

	u32	s_version_offset;

...and then individual filesystems can calculate:

	crash_counter * max_number_of_writes

and put the correct value in there at mount time.
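To make that concrete, here's a rough sketch of what I'm picturing. The
MAX_RACING_WRITES constant and the _shifted helper name are just
placeholders for illustration, not a proposal for the final API:

	/* new field in struct super_block, zero by default */
	u32	s_version_offset;

	/* filled in by the filesystem at mount time, e.g. after it
	 * has read back its on-disk crash counter: */
	sb->s_version_offset = crash_counter * MAX_RACING_WRITES;

	/* common code (say, in include/linux/iversion.h) could then
	 * fold the offset into the value presented via statx/NFSD: */
	static inline u64 inode_query_iversion_shifted(struct inode *inode)
	{
		return inode_query_iversion(inode) +
		       inode->i_sb->s_version_offset;
	}

That keeps the policy in common infrastructure while leaving the
multiplier estimate up to each filesystem.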
--
Jeff Layton <jlayton@kernel.org>