From: bfields@fieldses.org (J. Bruce Fields)
To: Jeff Layton <jlayton@kernel.org>
Cc: Theodore Ts'o <tytso@mit.edu>, Jan Kara <jack@suse.cz>,
	NeilBrown <neilb@suse.de>,
	adilger.kernel@dilger.ca, djwong@kernel.org, david@fromorbit.com,
	trondmy@hammerspace.com, viro@zeniv.linux.org.uk,
	zohar@linux.ibm.com, xiubli@redhat.com, chuck.lever@oracle.com,
	lczerner@redhat.com, brauner@kernel.org, fweimer@redhat.com,
	linux-man@vger.kernel.org, linux-api@vger.kernel.org,
	linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, ceph-devel@vger.kernel.org,
	linux-ext4@vger.kernel.org, linux-nfs@vger.kernel.org,
	linux-xfs@vger.kernel.org
Subject: Re: [man-pages RFC PATCH v4] statx, inode: document the new STATX_INO_VERSION field
Date: Sat, 10 Sep 2022 10:56:00 -0400
Message-ID: <20220910145600.GA347@fieldses.org>
In-Reply-To: <125df688dbebaf06478b0911e76e228e910b04b3.camel@kernel.org>

On Fri, Sep 09, 2022 at 12:36:29PM -0400, Jeff Layton wrote:
> On Fri, 2022-09-09 at 11:45 -0400, J. Bruce Fields wrote:
> > On Thu, Sep 08, 2022 at 03:07:58PM -0400, Jeff Layton wrote:
> > > On Thu, 2022-09-08 at 14:22 -0400, J. Bruce Fields wrote:
> > > > On Thu, Sep 08, 2022 at 01:40:11PM -0400, Jeff Layton wrote:
> > > > > Yeah, ok. That does make some sense. So we would mix this into the
> > > > > i_version instead of the ctime when it was available. Preferably, we'd
> > > > > mix that in when we store the i_version rather than adding it afterward.
> > > > > 
> > > > > Ted, how would we access this? Maybe we could just add a new (generic)
> > > > > super_block field for this that ext4 (and other filesystems) could
> > > > > populate at mount time?
> > > > 
> > > > Couldn't the filesystem just return an ino_version that already includes
> > > > it?
> > > > 
> > > 
> > > Yes. That's simple if we want to just fold it in during getattr. If we
> > > want to fold that into the values stored on disk, then I'm a little less
> > > clear on how that will work.
> > > 
> > > Maybe I need a concrete example of how that will work:
> > > 
> > > Suppose we have an i_version value X with the previous crash counter
> > > already factored in that makes it to disk. We hand out a newer version
> > > X+1 to a client, but that value never makes it to disk.
> > > 
> > > The machine crashes and comes back up, and we get a query for i_version
> > > and it comes back as X. Fine, it's an old version. Now there is a write.
> > > What do we do to ensure that the new value doesn't collide with X+1? 
> > 
> > I was assuming we could partition i_version's 64 bits somehow: e.g., top
> > 16 bits store the crash counter.  You increment the i_version by: 1)
> > replacing the top bits by the new crash counter, if it has changed, and
> > 2) incrementing.
> > 
> > Do the numbers work out?  2^16 mounts after unclean shutdowns sounds
> > like a lot for one filesystem, as does 2^48 changes to a single file,
> > but people do weird things.  Maybe there's a better partitioning, or
> > some more flexible way of maintaining an i_version that still allows you
> > to identify whether a given i_version preceded a crash.
> > 
> 
> We consume one bit to keep track of the "seen" flag, so it would be a
> 16+47 split. I assume that we'd also reset the version counter to 0 when
> the crash counter changes? Maybe that doesn't matter as long as we don't
> overflow into the crash counter.
> 
> I'm not sure we can get away with 16 bits for the crash counter, as
> it'll leave us subject to the version counter wrapping after a long
> uptime.
> 
> If you increment a counter every nanosecond, how long until that counter
> wraps? With 63 bits, that's 292 years (and change). With 16+47 bits,
> that's less than two days. An 8+55 split would give us ~416 days which
> seems a bit more reasonable?

Though now it's starting to seem a little limiting to allow only 2^8
mounts after unclean shutdowns.
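
For reference, the wrap-time figures quoted above can be checked with a
small C sketch.  The name wrap_days is made up for illustration; it
assumes one increment per nanosecond, as in the quoted message, and uses
integer division, so results round down to whole days.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: whole days until a counter with `bits` usable bits
 * wraps, assuming it is incremented once per nanosecond.
 */
uint64_t wrap_days(unsigned bits)
{
	assert(bits < 64);	/* shifting a 64-bit value by 64+ is undefined */

	uint64_t seconds = (UINT64_C(1) << bits) / UINT64_C(1000000000);

	return seconds / 86400;	/* 86400 seconds per day */
}
```

With this, wrap_days(47) is 1 (so under two days), wrap_days(55) is 416,
and wrap_days(63) / 365 is 292, which matches the numbers in the quote.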

Another way to think of it might be: multiply that 8-bit crash counter
by 2^48, and think of it as a 64-bit value that we believe (based on
practical limits on how many times you can modify a single file) is
guaranteed to be larger than any i_version that we gave out before the
most recent crash.
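
As a rough illustration of that reading, the crash counter can be turned
into a 64-bit floor with a shift.  The names version_floor and
VERSION_SHIFT are invented here, and the shift of 48 is simply the 2^48
multiplier from the paragraph above; this is a sketch, not a proposal
for the actual layout.

```c
#include <stdint.h>

#define VERSION_SHIFT 48	/* assumed: multiplier of 2^48 from the text */

/*
 * Hypothetical helper: the smallest i_version we would hand out after
 * the crash counter has reached `crash_counter`.
 */
uint64_t version_floor(uint8_t crash_counter)
{
	return (uint64_t)crash_counter << VERSION_SHIFT;
}
```

Each increment of the crash counter moves the floor up by 2^48, so a file
would need more than 2^48 changes between crashes to catch up with the
next floor.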

Our goal is to ensure that after a crash, any *new* i_versions that we
give out or write to disk are larger than any that have previously been
given out.  We can do that by ensuring that they're equal to at least
that old maximum.

So think of the 64-bit value we're storing in the superblock as a
ceiling on i_version values across all the filesystem's inodes.  Call it
s_version_max or something.  We also need to know what the maximum was
before the most recent crash.  Call that s_version_max_old.

Then we could get correct behavior if we generated i_versions with
something like:

	i_version++;
	if (i_version < s_version_max_old)
		i_version = s_version_max_old;
	if (i_version > s_version_max)
		s_version_max = i_version + 1;

But that last step makes this ludicrously expensive, because for this to
be safe across crashes we need to update that value on disk as well, and
we need to do that frequently.
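
To make the crash scenario concrete, here is a self-contained C sketch
of the update rule above.  The struct, the function names, and the
sb_writes counter are all hypothetical; recover_after_crash() assumes
that the ceiling found on disk after an unclean shutdown simply becomes
s_version_max_old.  One deliberate difference: the sketch bumps the
ceiling on ">=" rather than ">", which as far as I can tell keeps the
ceiling strictly above every value handed out.  Treat that as my reading
rather than part of the proposal.

```c
#include <stdint.h>

struct sb_versions {
	uint64_t s_version_max;		/* ceiling on all i_versions */
	uint64_t s_version_max_old;	/* ceiling before the last crash */
	unsigned long sb_writes;	/* counts ceiling updates (illustrative) */
};

/* Return the next i_version for an inode whose current value is i_version. */
uint64_t next_i_version(struct sb_versions *sb, uint64_t i_version)
{
	i_version++;
	if (i_version < sb->s_version_max_old)
		i_version = sb->s_version_max_old;
	if (i_version >= sb->s_version_max) {
		/* In a real filesystem this would have to reach disk. */
		sb->s_version_max = i_version + 1;
		sb->sb_writes++;
	}
	return i_version;
}

/* Assumed recovery step: the persisted ceiling becomes the old maximum. */
void recover_after_crash(struct sb_versions *sb)
{
	sb->s_version_max_old = sb->s_version_max;
}
```

Starting from an on-disk value X = 10, the first call hands out 11 and
raises the ceiling to 12.  If the machine then crashes with X still on
disk, the next call after recover_after_crash() returns 12, not 11, so
the new value cannot collide with the one a client already saw.  Note
that sb_writes goes up on nearly every call, which is the cost described
above.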

Fortunately, s_version_max doesn't have to be a tight bound at all.  We
can easily just initialize it to, say, 2^40, and only bump it by 2^40 at
a time.  And we can recognize when we're running up against it way ahead
of time, so we only need to say "here's an updated value, could you
please make sure it gets to disk sometime in the next twenty minutes?"
(Numbers made up.)
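
A loose ceiling along those lines could be sketched as follows.  All
names are hypothetical, the 2^40 chunk comes from the paragraph above,
and the 2^39 early-warning margin is an arbitrary choice for the
example.  The flush_requested flag stands in for "please get this to
disk soon"; how that request would actually be scheduled is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define VERSION_CHUNK	(UINT64_C(1) << 40)	/* bump size from the text */
#define VERSION_MARGIN	(UINT64_C(1) << 39)	/* assumed early-warning margin */

struct sb_lazy {
	uint64_t s_version_max;		/* loose ceiling on all i_versions */
	uint64_t s_version_max_old;	/* ceiling before the last crash */
	bool flush_requested;		/* ceiling changed, needs writing out */
};

uint64_t lazy_next_i_version(struct sb_lazy *sb, uint64_t i_version)
{
	i_version++;
	if (i_version < sb->s_version_max_old)
		i_version = sb->s_version_max_old;

	/*
	 * Raise the ceiling long before i_version reaches it, so the
	 * superblock write can happen lazily.  Overflow near 2^64 is
	 * ignored in this sketch.
	 */
	while (i_version + VERSION_MARGIN > sb->s_version_max) {
		sb->s_version_max += VERSION_CHUNK;
		sb->flush_requested = true;
	}
	return i_version;
}
```

With the ceiling initialized to VERSION_CHUNK, ordinary increments never
touch the superblock; only an inode that gets within the margin of the
ceiling triggers a bump and a deferred write.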

Sorry, that was way too many words.  But I think something like that
could work, and make it very difficult to hit any hard limits, and
actually not be too complicated??  Unless I missed something.

--b.
