linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Jeff Layton <jlayton@kernel.org>
Cc: Amir Goldstein <amir73il@gmail.com>,
	Trond Myklebust <trondmy@hammerspace.com>,
	"djwong@kernel.org" <djwong@kernel.org>,
	"zohar@linux.ibm.com" <zohar@linux.ibm.com>,
	"brauner@kernel.org" <brauner@kernel.org>,
	"xiubli@redhat.com" <xiubli@redhat.com>,
	"neilb@suse.de" <neilb@suse.de>,
	"linux-api@vger.kernel.org" <linux-api@vger.kernel.org>,
	"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>,
	"dwysocha@redhat.com" <dwysocha@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"chuck.lever@oracle.com" <chuck.lever@oracle.com>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"tytso@mit.edu" <tytso@mit.edu>,
	"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
	"jack@suse.cz" <jack@suse.cz>,
	"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"lczerner@redhat.com" <lczerner@redhat.com>,
	"adilger.kernel@dilger.ca" <adilger.kernel@dilger.ca>,
	"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re: [PATCH v3 4/7] xfs: don't bump the i_version on an atime update in xfs_vn_update_time
Date: Mon, 29 Aug 2022 15:48:48 +1000	[thread overview]
Message-ID: <20220829054848.GR3600936@dread.disaster.area> (raw)
In-Reply-To: <732164ffb95468992035a6f597dc26e3ce39316d.camel@kernel.org>

On Sun, Aug 28, 2022 at 10:37:37AM -0400, Jeff Layton wrote:
> On Sun, 2022-08-28 at 16:25 +0300, Amir Goldstein wrote:
> > On Sat, Aug 27, 2022 at 7:10 PM Jeff Layton <jlayton@kernel.org> wrote:
> > > Yeah, thinking about it some more, simply changing the block allocation
> > > is not something that should affect the ctime, so we probably don't want
> > > to bump i_version on it. It's an implicit change, IOW, not an explicit
> > > one.
> > > 
> > > The fact that xfs might do that is unfortunate, but it's not the end of
> > > the world and it still would conform to the proposed definition for
> > > i_version. In practice, this sort of allocation change should come soon
> > > after the file was written, so one would hope that any damage due to the
> > > false i_version bump would be minimized.
> > > 
> > 
> > That was exactly my point.
> > 
> > > It would be nice to teach it not to do that however. Maybe we can insert
> > > the NOIVER flag at a strategic place to avoid it?

No, absolutely not.

I've already explained this: The XFS *disk format specification*
says that di_changecount is bumped for every change that is made to
the inode.

Applications that are written from this specification expect the on
disk format for a XFS given filesystem feature to remain the same
until it is either deprecated and removed or we add feature flags to
indicate it has different behaviour.  We can't just change the
behaviour at a whim.

And that's ignoring the fact that randomly spewing NOIVER
into transactions that modify inode metadata is a nasty hack - it
is not desirable from a design or documentation POV, nor is it
maintainable.

> > Why would that be nice to avoid?
> > You did not specify any use case where incrementing i_version
> > on block mapping change matters in practice.
> > On the contrary, you said that NFS client writer sends COMMIT on close,
> > which should stabilize i_version for the next readers.
> > 
> > Given that we already have an xfs implementation that does increment
> > i_version on block mapping changes and it would be a pain to change
> > that or add a new user options, I don't see the point in discussing it further
> > unless there is a good incentive for avoiding i_version updates in those cases.
> > 
> 
> Because the change to the block allocation doesn't represent an
> "explicit" change to the inode. We will have bumped the ctime on the
> original write (in update_time), but the follow-on changes that occur
> due to that write needn't be counted as they aren't visible to the
> client.
> 
> It's possible for a client to issue a read between the write and the
> flush and get the interim value for i_version. Then, once the write
> happens and the i_version gets bumped again, the client invalidates its
> cache even though it needn't do so.
> 
> The race window ought to be relatively small, and this wouldn't result
> in incorrect behavior that you'd notice (other than loss of
> performance), but it's not ideal. We're doing more on-the-wire reads
> than are necessary in this case.
> 
> It would be nice to have it not do that. If we end up taking this patch
> to make it elide the i_version bumps on atime updates, we may be able to
> set the the NOIVER flag in other cases as well, and avoid some of these
> extra bumps.


<sigh>

Please don't make me repeat myself for the third time.

Once we have decided on a solid, unchanging definition for the
*statx user API variable*, we'll implement a new on-disk field that
provides this information.  We will document it in the on-disk
specification as "this is how di_iversion behaves" so that it is
clear to everyone parsing the on-disk format or writing their own
XFS driver how to implement it and when to expect it to
change.

Then we can add a filesystem and inode feature flags that say "inode
has new iversion" and we use that to populate the kernel iversion
instead of di_changecount. We keep di_changecount exactly the way it
is now for the applications and use cases we already have for that
specific behaviour. If the kernel and/or filesystem don't support
the new di_iversion field, then we'll use di_changecount as it
currently exists for the kernel iversion code.

Keep in mind that we've been doing dynamic inode format updates in
XFS for a couple of decades - users don't even have to be aware that
they need to perform format upgrades because often they just happen
whenever an inode is accessed. IOWs, just because we have to change
the on-disk format to support this new iversion definition, it
doesn't mean users have to reformat filesystems before the new
feature can be used.

Hence, over time, as distros update kernels, the XFS iversion
behaviour will change automagically as we update inodes in existing
filesystems as they are accessed to add and then use the new
di_iversion field for the VFS change attribute field instead of the
di_changecount field...

-Dave.
-- 
Dave Chinner
david@fromorbit.com

  parent reply	other threads:[~2022-08-29  5:48 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-26 21:46 [PATCH v3 0/7] vfs: clean up i_version behavior and expose it via statx Jeff Layton
2022-08-26 21:46 ` [PATCH v3 1/7] iversion: update comments with info about atime updates Jeff Layton
2022-08-29  7:56   ` Dave Chinner
2022-08-29 10:39     ` Jeff Layton
2022-08-29 22:58       ` NeilBrown
2022-08-30 11:40         ` Jeff Layton
2022-08-30 13:24           ` J. Bruce Fields
2022-08-30 13:50             ` Jeff Layton
2022-08-30 14:44               ` J. Bruce Fields
2022-08-30 14:58                 ` Trond Myklebust
2022-08-30 15:17                   ` J. Bruce Fields
2022-08-30 15:43                     ` Trond Myklebust
2022-08-30 17:02                       ` Jeff Layton
2022-08-30 17:47                         ` Trond Myklebust
2022-08-30 17:53                           ` Jeff Layton
2022-08-30 18:25                             ` Trond Myklebust
2022-08-30 19:11                               ` Jeff Layton
2022-08-30 18:32                         ` J. Bruce Fields
2022-08-30 19:30                           ` Jeff Layton
2022-08-30 19:46                             ` J. Bruce Fields
2022-08-30 19:57                               ` Jeff Layton
2022-08-30 20:08                                 ` J. Bruce Fields
2022-08-30  1:04       ` Dave Chinner
2022-08-30 12:38         ` Jeff Layton
2022-08-26 21:46 ` [PATCH v3 2/7] ext4: fix i_version handling in ext4 Jeff Layton
2022-08-26 21:46 ` [PATCH v3 3/7] ext4: unconditionally enable the i_version counter Jeff Layton
2022-08-29 14:51   ` Jan Kara
2022-08-26 21:47 ` [PATCH v3 4/7] xfs: don't bump the i_version on an atime update in xfs_vn_update_time Jeff Layton
2022-08-27  7:26   ` Amir Goldstein
2022-08-27  8:01     ` Amir Goldstein
2022-08-27 13:14       ` Jeff Layton
2022-08-27 15:46         ` Darrick J. Wong
2022-08-27 16:03           ` Trond Myklebust
2022-08-27 16:10             ` Jeff Layton
2022-08-27 17:06               ` Trond Myklebust
2022-08-28 13:25               ` Amir Goldstein
2022-08-28 14:37                 ` Jeff Layton
2022-08-28 16:53                   ` Amir Goldstein
2022-08-29  5:48                   ` Dave Chinner [this message]
2022-08-29 10:33                     ` Jeff Layton
2022-08-30  0:08                       ` Dave Chinner
2022-08-30 11:20                         ` Jeff Layton
2022-08-28 17:30           ` Amir Goldstein
2022-08-26 21:47 ` [PATCH v3 5/7] vfs: report an inode version in statx for IS_I_VERSION inodes Jeff Layton
2022-08-26 21:47 ` [PATCH v3 6/7] nfs: report the inode version in statx if requested Jeff Layton
2022-08-26 21:47 ` [PATCH v3 7/7] ceph: fill in the change attribute in statx requests Jeff Layton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220829054848.GR3600936@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=amir73il@gmail.com \
    --cc=brauner@kernel.org \
    --cc=ceph-devel@vger.kernel.org \
    --cc=chuck.lever@oracle.com \
    --cc=djwong@kernel.org \
    --cc=dwysocha@redhat.com \
    --cc=jack@suse.cz \
    --cc=jlayton@kernel.org \
    --cc=lczerner@redhat.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=neilb@suse.de \
    --cc=trondmy@hammerspace.com \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    --cc=xiubli@redhat.com \
    --cc=zohar@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).