linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff Layton <jlayton@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Amir Goldstein <amir73il@gmail.com>,
	Trond Myklebust <trondmy@hammerspace.com>,
	"djwong@kernel.org" <djwong@kernel.org>,
	"zohar@linux.ibm.com" <zohar@linux.ibm.com>,
	"brauner@kernel.org" <brauner@kernel.org>,
	"xiubli@redhat.com" <xiubli@redhat.com>,
	"neilb@suse.de" <neilb@suse.de>,
	"linux-api@vger.kernel.org" <linux-api@vger.kernel.org>,
	"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>,
	"dwysocha@redhat.com" <dwysocha@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"chuck.lever@oracle.com" <chuck.lever@oracle.com>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"tytso@mit.edu" <tytso@mit.edu>,
	"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
	"jack@suse.cz" <jack@suse.cz>,
	"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"lczerner@redhat.com" <lczerner@redhat.com>,
	"adilger.kernel@dilger.ca" <adilger.kernel@dilger.ca>,
	"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re: [PATCH v3 4/7] xfs: don't bump the i_version on an atime update in xfs_vn_update_time
Date: Tue, 30 Aug 2022 07:20:50 -0400	[thread overview]
Message-ID: <fe13642a39160cc7dd15f9212e1afb69e955c0be.camel@kernel.org> (raw)
In-Reply-To: <20220830000851.GV3600936@dread.disaster.area>

On Tue, 2022-08-30 at 10:08 +1000, Dave Chinner wrote:
> On Mon, Aug 29, 2022 at 06:33:48AM -0400, Jeff Layton wrote:
> > On Mon, 2022-08-29 at 15:48 +1000, Dave Chinner wrote:
> > > > 
> > > > The race window ought to be relatively small, and this wouldn't result
> > > > in incorrect behavior that you'd notice (other than loss of
> > > > performance), but it's not ideal. We're doing more on-the-wire reads
> > > > than are necessary in this case.
> > > > 
> > > > It would be nice to have it not do that. If we end up taking this patch
> > > > to make it elide the i_version bumps on atime updates, we may be able to
> > > > set the the NOIVER flag in other cases as well, and avoid some of these
> > > > extra bumps.
> > > 
> > > 
> > > <sigh>
> > > 
> > > Please don't make me repeat myself for the third time.
> > > 
> > > Once we have decided on a solid, unchanging definition for the
> > > *statx user API variable*, we'll implement a new on-disk field that
> > > provides this information.  We will document it in the on-disk
> > > specification as "this is how di_iversion behaves" so that it is
> > > clear to everyone parsing the on-disk format or writing their own
> > > XFS driver how to implement it and when to expect it to
> > > change.
> > > 
> > > Then we can add a filesystem and inode feature flags that say "inode
> > > has new iversion" and we use that to populate the kernel iversion
> > > instead of di_changecount. We keep di_changecount exactly the way it
> > > is now for the applications and use cases we already have for that
> > > specific behaviour. If the kernel and/or filesystem don't support
> > > the new di_iversion field, then we'll use di_changecount as it
> > > currently exists for the kernel iversion code.
> > > 
> > 
> > Aside from NFS and IMA, what applications are dependent on the current
> > definition and how do they rely on i_version today?
> 
> I've answered this multiple times already: the di_changecount
> behaviour is defined in the on-disk specification and hence we
> *cannot change the behaviour* without changing the on-disk format
> specification.
> 
> Apart from the forensics aspect of the change counter (which nobody
> but us XFS developers seem to understand just how damn important
> this is), there are *many* third party applications that parse the
> XFS on-disk format directly. This:
> 
> https://codesearch.debian.net/search?q=XFS_SB_VERSION_DIRV2&literal=1
> 
> Shows grub2, libparted, syslinux, partclone and fsarchiver as
> knowing about XFS on-disk superblock flags that tell them what
> format the directory structure is in. That alone is enough to
> indicate they parse on-disk inodes directly, and hence may expect
> di_changecount to have specific meaning and use it to detect
> unexpected changes to files/directories they care about.
> 
> If I go looking for XFS_SB_MAGIC, I find things like libblkid,
> klibc, qemu, Xen, testdisk, gpart, and virtualbox all parse the
> on-disk superblocks directly from the block device, too. They also
> rely directly on XFS developers ensuring there are no silent
> incomaptible changes to the on disk format.
> 
> I also know of many other utilities that people and companies have
> written that parse the on disk format directly from userspace. The
> functions these perform include low level storage management tools,
> copying and managing disk images (e.g. offline configuration for
> cluster deployments), data recovery tools that scrape all the data
> out of broken filesystems, etc.
> 
> These applications are reliant on the guarantee we provide that the
> on-disk format will not silently change and that behaviour/structure
> can always easily be discovered by feature flags in the superblock
> and/or inodes.
> 
> IOWs, just because there aren't obvious "traditional" application on
> top of the kernel filesystem that consumes the in-memory kernel
> iversion field, it does not mean that the defined behaviour of the
> on-disk di_changecount field is not used or relied on by other tools
> that work directly on the on-disk format.
> 
> You might be right that NFS doesn't care about this, but the point
> remains that NFS does not control the XFS on-disk format, nor does
> the fact that what NFS wants from the change attribute has changed
> over time override the fact that maintaining XFS on-disk format
> compatibility is the responsibility of XFS developers. We're willing
> to change the on-disk format to support whatever the new definition
> of the statx change attribute ends up being, and that should be the
> end of the discussion.
> 

Thanks for spelling this out in more detail.

> > > Keep in mind that we've been doing dynamic inode format updates in
> > > XFS for a couple of decades - users don't even have to be aware that
> > > they need to perform format upgrades because often they just happen
> > > whenever an inode is accessed. IOWs, just because we have to change
> > > the on-disk format to support this new iversion definition, it
> > > doesn't mean users have to reformat filesystems before the new
> > > feature can be used.
> > > 
> > > Hence, over time, as distros update kernels, the XFS iversion
> > > behaviour will change automagically as we update inodes in existing
> > > filesystems as they are accessed to add and then use the new
> > > di_iversion field for the VFS change attribute field instead of the
> > > di_changecount field...
> > > 
> > 
> > If you want to create a whole new on-disk field for this, then that's
> > your prerogative, but before you do that, I'd like to better understand
> > why and how the constraints on this field changed.
> > 
> > The original log message from the commit that added a change counter
> > (below) stated that you were adding it for network filesystems like NFS.
> > When did this change and why?
> 
> It never changed. I'll repeat what I've already explained twice
> before:
> 
> https://lore.kernel.org/linux-xfs/20220818030048.GE3600936@dread.disaster.area/
> https://lore.kernel.org/linux-xfs/20220818033731.GF3600936@dread.disaster.area/
> 
> tl; dr: NFS requirements were just one of *many* we had at the time
> for an atomic persistent change counter.
> 
> The fact is that NFS users are just going to have to put up with
> random cache invalidations on XFS for a while longer. Nobody noticed
> this and/or cared about this enough to raise it as an issue for the
> past decade, so waiting another few months for upstream XFS to
> change to a different on-disk format for the NFS/statx change
> attribute isn't a big deal.
> 

Fair enough. I'll plan to drop this patch from the series for now, with
the expectation that you guys will add a new i_version counter that
better conforms to what NFS and IMA need.

Thanks,
-- 
Jeff Layton <jlayton@kernel.org>

  reply	other threads:[~2022-08-30 11:21 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-26 21:46 [PATCH v3 0/7] vfs: clean up i_version behavior and expose it via statx Jeff Layton
2022-08-26 21:46 ` [PATCH v3 1/7] iversion: update comments with info about atime updates Jeff Layton
2022-08-29  7:56   ` Dave Chinner
2022-08-29 10:39     ` Jeff Layton
2022-08-29 22:58       ` NeilBrown
2022-08-30 11:40         ` Jeff Layton
2022-08-30 13:24           ` J. Bruce Fields
2022-08-30 13:50             ` Jeff Layton
2022-08-30 14:44               ` J. Bruce Fields
2022-08-30 14:58                 ` Trond Myklebust
2022-08-30 15:17                   ` J. Bruce Fields
2022-08-30 15:43                     ` Trond Myklebust
2022-08-30 17:02                       ` Jeff Layton
2022-08-30 17:47                         ` Trond Myklebust
2022-08-30 17:53                           ` Jeff Layton
2022-08-30 18:25                             ` Trond Myklebust
2022-08-30 19:11                               ` Jeff Layton
2022-08-30 18:32                         ` J. Bruce Fields
2022-08-30 19:30                           ` Jeff Layton
2022-08-30 19:46                             ` J. Bruce Fields
2022-08-30 19:57                               ` Jeff Layton
2022-08-30 20:08                                 ` J. Bruce Fields
2022-08-30  1:04       ` Dave Chinner
2022-08-30 12:38         ` Jeff Layton
2022-08-26 21:46 ` [PATCH v3 2/7] ext4: fix i_version handling in ext4 Jeff Layton
2022-08-26 21:46 ` [PATCH v3 3/7] ext4: unconditionally enable the i_version counter Jeff Layton
2022-08-29 14:51   ` Jan Kara
2022-08-26 21:47 ` [PATCH v3 4/7] xfs: don't bump the i_version on an atime update in xfs_vn_update_time Jeff Layton
2022-08-27  7:26   ` Amir Goldstein
2022-08-27  8:01     ` Amir Goldstein
2022-08-27 13:14       ` Jeff Layton
2022-08-27 15:46         ` Darrick J. Wong
2022-08-27 16:03           ` Trond Myklebust
2022-08-27 16:10             ` Jeff Layton
2022-08-27 17:06               ` Trond Myklebust
2022-08-28 13:25               ` Amir Goldstein
2022-08-28 14:37                 ` Jeff Layton
2022-08-28 16:53                   ` Amir Goldstein
2022-08-29  5:48                   ` Dave Chinner
2022-08-29 10:33                     ` Jeff Layton
2022-08-30  0:08                       ` Dave Chinner
2022-08-30 11:20                         ` Jeff Layton [this message]
2022-08-28 17:30           ` Amir Goldstein
2022-08-26 21:47 ` [PATCH v3 5/7] vfs: report an inode version in statx for IS_I_VERSION inodes Jeff Layton
2022-08-26 21:47 ` [PATCH v3 6/7] nfs: report the inode version in statx if requested Jeff Layton
2022-08-26 21:47 ` [PATCH v3 7/7] ceph: fill in the change attribute in statx requests Jeff Layton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fe13642a39160cc7dd15f9212e1afb69e955c0be.camel@kernel.org \
    --to=jlayton@kernel.org \
    --cc=adilger.kernel@dilger.ca \
    --cc=amir73il@gmail.com \
    --cc=brauner@kernel.org \
    --cc=ceph-devel@vger.kernel.org \
    --cc=chuck.lever@oracle.com \
    --cc=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=dwysocha@redhat.com \
    --cc=jack@suse.cz \
    --cc=lczerner@redhat.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=neilb@suse.de \
    --cc=trondmy@hammerspace.com \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    --cc=xiubli@redhat.com \
    --cc=zohar@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).