linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff Layton <jlayton@kernel.org>
To: Trond Myklebust <trondmy@hammerspace.com>,
	"djwong@kernel.org" <djwong@kernel.org>
Cc: "zohar@linux.ibm.com" <zohar@linux.ibm.com>,
	"brauner@kernel.org" <brauner@kernel.org>,
	"xiubli@redhat.com" <xiubli@redhat.com>,
	"neilb@suse.de" <neilb@suse.de>,
	"linux-api@vger.kernel.org" <linux-api@vger.kernel.org>,
	"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>,
	"dwysocha@redhat.com" <dwysocha@redhat.com>,
	"david@fromorbit.com" <david@fromorbit.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"chuck.lever@oracle.com" <chuck.lever@oracle.com>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"tytso@mit.edu" <tytso@mit.edu>,
	"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
	"jack@suse.cz" <jack@suse.cz>,
	"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
	"amir73il@gmail.com" <amir73il@gmail.com>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"lczerner@redhat.com" <lczerner@redhat.com>,
	"adilger.kernel@dilger.ca" <adilger.kernel@dilger.ca>,
	"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re: [PATCH v3 4/7] xfs: don't bump the i_version on an atime update in xfs_vn_update_time
Date: Sat, 27 Aug 2022 12:10:06 -0400	[thread overview]
Message-ID: <b13812a68310e49cc6fb649c2b1c25287712a8af.camel@kernel.org> (raw)
In-Reply-To: <079df2134120f847e8237675a8cc227d6354a153.camel@hammerspace.com>

On Sat, 2022-08-27 at 16:03 +0000, Trond Myklebust wrote:
> On Sat, 2022-08-27 at 08:46 -0700, Darrick J. Wong wrote:
> > On Sat, Aug 27, 2022 at 09:14:30AM -0400, Jeff Layton wrote:
> > > On Sat, 2022-08-27 at 11:01 +0300, Amir Goldstein wrote:
> > > > On Sat, Aug 27, 2022 at 10:26 AM Amir Goldstein
> > > > <amir73il@gmail.com> wrote:
> > > > > 
> > > > > On Sat, Aug 27, 2022 at 12:49 AM Jeff Layton
> > > > > <jlayton@kernel.org> wrote:
> > > > > > 
> > > > > > xfs will update the i_version when updating only the atime
> > > > > > value, which
> > > > > > is not desirable for any of the current consumers of
> > > > > > i_version. Doing so
> > > > > > leads to unnecessary cache invalidations on NFS and extra
> > > > > > measurement
> > > > > > activity in IMA.
> > > > > > 
> > > > > > Add a new XFS_ILOG_NOIVER flag, and use that to indicate that
> > > > > > the
> > > > > > transaction should not update the i_version. Set that value
> > > > > > in
> > > > > > xfs_vn_update_time if we're only updating the atime.
> > > > > > 
> > > > > > Cc: Dave Chinner <david@fromorbit.com>
> > > > > > Cc: NeilBrown <neilb@suse.de>
> > > > > > Cc: Trond Myklebust <trondmy@hammerspace.com>
> > > > > > Cc: David Wysochanski <dwysocha@redhat.com>
> > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > > > ---
> > > > > >  fs/xfs/libxfs/xfs_log_format.h  |  2 +-
> > > > > >  fs/xfs/libxfs/xfs_trans_inode.c |  2 +-
> > > > > >  fs/xfs/xfs_iops.c               | 11 +++++++++--
> > > > > >  3 files changed, 11 insertions(+), 4 deletions(-)
> > > > > > 
> > > > > > Dave has NACK'ed this patch, but I'm sending it as a way to
> > > > > > illustrate
> > > > > > the problem. I still think this approach should at least fix
> > > > > > the worst
> > > > > > problems with atime updates being counted. We can look to
> > > > > > carve out
> > > > > > other "spurious" i_version updates as we identify them.
> > > > > > 
> > > > > 
> > > > > AFAIK, "spurious" is only inode blocks map changes due to
> > > > > writeback
> > > > > of dirty pages. Anybody know about other cases?
> > > > > 
> > > > > Regarding inode blocks map changes, first of all, I don't think
> > > > > that there is
> > > > > any practical loss from invalidating NFS client cache on dirty
> > > > > data writeback,
> > > > > because NFS server should be serving cold data most of the
> > > > > time.
> > > > > If there are a few unneeded cache invalidations they would only
> > > > > be temporary.
> > > > > 
> > > > 
> > > > Unless there is an issue with a writer NFS client that
> > > > invalidates its
> > > > own attribute
> > > > caches on server data writeback?
> > > > 
> > > 
> > > The client just looks at the file attributes (of which i_version is
> > > but
> > > one), and if certain attributes have changed (mtime, ctime,
> > > i_version,
> > > etc...) then it invalidates its cache.
> > > 
> > > In the case of blocks map changes, could that mean a difference in
> > > the
> > > observable sparse regions of the file? If so, then a READ_PLUS
> > > before
> > > the change and a READ_PLUS after could give different results.
> > > Since
> > > that difference is observable by the client, I'd think we'd want to
> > > bump
> > > i_version for that anyway.
> > 
> > How /is/ READ_PLUS supposed to detect sparse regions, anyway?  I know
> > that's been the subject of recent debate.  At least as far as XFS is
> > concerned, a file range can go from hole -> delayed allocation
> > reservation -> unwritten extent -> (actual writeback) -> written
> > extent.
> > The dance became rather more complex when we added COW.  If any of
> > that
> > will make a difference for READ_PLUS, then yes, I think you'd want
> > file
> > writeback activities to bump iversion to cause client invalidations,
> > like (I think) Dave said.
> > 
> > The fs/iomap/ implementation of SEEK_DATA/SEEK_HOLE reports data for
> > written and delalloc extents; and an unwritten extent will report
> > data
> > for any pagecache it finds.
> > 
> 
> READ_PLUS should never return anything different than a read() system
> call would return for any given area. The way it reports sparse regions
> vs. data regions is purely an RPC formatting convenience.
> 
> The only point to note about NFS READ and READ_PLUS is that because the
> client is forced to send multiple RPC calls if the user is trying to
> read a region that is larger than the 'rsize' value, it is possible
> that these READ/READ_PLUS calls may be processed out of order, and so
> the result may end up looking different than if you had executed a
> read() call for the full region directly on the server.
> However each individual READ / READ_PLUS reply should look as if the
> user had called read() on that rsize-sized section of the file.
> > > 

Yeah, thinking about it some more, simply changing the block allocation
is not something that should affect the ctime, so we probably don't want
to bump i_version on it. It's an implicit change, IOW, not an explicit
one.

The fact that xfs might do that is unfortunate, but it's not the end of
the world and it still would conform to the proposed definition for
i_version. In practice, this sort of allocation change should come soon
after the file was written, so one would hope that any damage due to the
false i_version bump would be minimized.

It would be nice to teach it not to do that however. Maybe we can insert
the NOIVER flag at a strategic place to avoid it?
-- 
Jeff Layton <jlayton@kernel.org>

  reply	other threads:[~2022-08-27 16:10 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-26 21:46 [PATCH v3 0/7] vfs: clean up i_version behavior and expose it via statx Jeff Layton
2022-08-26 21:46 ` [PATCH v3 1/7] iversion: update comments with info about atime updates Jeff Layton
2022-08-29  7:56   ` Dave Chinner
2022-08-29 10:39     ` Jeff Layton
2022-08-29 22:58       ` NeilBrown
2022-08-30 11:40         ` Jeff Layton
2022-08-30 13:24           ` J. Bruce Fields
2022-08-30 13:50             ` Jeff Layton
2022-08-30 14:44               ` J. Bruce Fields
2022-08-30 14:58                 ` Trond Myklebust
2022-08-30 15:17                   ` J. Bruce Fields
2022-08-30 15:43                     ` Trond Myklebust
2022-08-30 17:02                       ` Jeff Layton
2022-08-30 17:47                         ` Trond Myklebust
2022-08-30 17:53                           ` Jeff Layton
2022-08-30 18:25                             ` Trond Myklebust
2022-08-30 19:11                               ` Jeff Layton
2022-08-30 18:32                         ` J. Bruce Fields
2022-08-30 19:30                           ` Jeff Layton
2022-08-30 19:46                             ` J. Bruce Fields
2022-08-30 19:57                               ` Jeff Layton
2022-08-30 20:08                                 ` J. Bruce Fields
2022-08-30  1:04       ` Dave Chinner
2022-08-30 12:38         ` Jeff Layton
2022-08-26 21:46 ` [PATCH v3 2/7] ext4: fix i_version handling in ext4 Jeff Layton
2022-08-26 21:46 ` [PATCH v3 3/7] ext4: unconditionally enable the i_version counter Jeff Layton
2022-08-29 14:51   ` Jan Kara
2022-08-26 21:47 ` [PATCH v3 4/7] xfs: don't bump the i_version on an atime update in xfs_vn_update_time Jeff Layton
2022-08-27  7:26   ` Amir Goldstein
2022-08-27  8:01     ` Amir Goldstein
2022-08-27 13:14       ` Jeff Layton
2022-08-27 15:46         ` Darrick J. Wong
2022-08-27 16:03           ` Trond Myklebust
2022-08-27 16:10             ` Jeff Layton [this message]
2022-08-27 17:06               ` Trond Myklebust
2022-08-28 13:25               ` Amir Goldstein
2022-08-28 14:37                 ` Jeff Layton
2022-08-28 16:53                   ` Amir Goldstein
2022-08-29  5:48                   ` Dave Chinner
2022-08-29 10:33                     ` Jeff Layton
2022-08-30  0:08                       ` Dave Chinner
2022-08-30 11:20                         ` Jeff Layton
2022-08-28 17:30           ` Amir Goldstein
2022-08-26 21:47 ` [PATCH v3 5/7] vfs: report an inode version in statx for IS_I_VERSION inodes Jeff Layton
2022-08-26 21:47 ` [PATCH v3 6/7] nfs: report the inode version in statx if requested Jeff Layton
2022-08-26 21:47 ` [PATCH v3 7/7] ceph: fill in the change attribute in statx requests Jeff Layton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b13812a68310e49cc6fb649c2b1c25287712a8af.camel@kernel.org \
    --to=jlayton@kernel.org \
    --cc=adilger.kernel@dilger.ca \
    --cc=amir73il@gmail.com \
    --cc=brauner@kernel.org \
    --cc=ceph-devel@vger.kernel.org \
    --cc=chuck.lever@oracle.com \
    --cc=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=dwysocha@redhat.com \
    --cc=jack@suse.cz \
    --cc=lczerner@redhat.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=neilb@suse.de \
    --cc=trondmy@hammerspace.com \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    --cc=xiubli@redhat.com \
    --cc=zohar@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).