linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Amir Goldstein <amir73il@gmail.com>
To: Jeff Layton <jlayton@kernel.org>
Cc: Trond Myklebust <trondmy@hammerspace.com>,
	"djwong@kernel.org" <djwong@kernel.org>,
	"zohar@linux.ibm.com" <zohar@linux.ibm.com>,
	"brauner@kernel.org" <brauner@kernel.org>,
	"xiubli@redhat.com" <xiubli@redhat.com>,
	"neilb@suse.de" <neilb@suse.de>,
	"linux-api@vger.kernel.org" <linux-api@vger.kernel.org>,
	"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>,
	"dwysocha@redhat.com" <dwysocha@redhat.com>,
	"david@fromorbit.com" <david@fromorbit.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"chuck.lever@oracle.com" <chuck.lever@oracle.com>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"tytso@mit.edu" <tytso@mit.edu>,
	"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
	"jack@suse.cz" <jack@suse.cz>,
	"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"lczerner@redhat.com" <lczerner@redhat.com>,
	"adilger.kernel@dilger.ca" <adilger.kernel@dilger.ca>,
	"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re: [PATCH v3 4/7] xfs: don't bump the i_version on an atime update in xfs_vn_update_time
Date: Sun, 28 Aug 2022 16:25:28 +0300	[thread overview]
Message-ID: <CAOQ4uxgThXDEO3mxR_PtPgcPsF7ueqFUxHO3F3KE9sVqi8sLJQ@mail.gmail.com> (raw)
In-Reply-To: <b13812a68310e49cc6fb649c2b1c25287712a8af.camel@kernel.org>

On Sat, Aug 27, 2022 at 7:10 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> On Sat, 2022-08-27 at 16:03 +0000, Trond Myklebust wrote:
> > On Sat, 2022-08-27 at 08:46 -0700, Darrick J. Wong wrote:
> > > On Sat, Aug 27, 2022 at 09:14:30AM -0400, Jeff Layton wrote:
> > > > On Sat, 2022-08-27 at 11:01 +0300, Amir Goldstein wrote:
> > > > > On Sat, Aug 27, 2022 at 10:26 AM Amir Goldstein
> > > > > <amir73il@gmail.com> wrote:
> > > > > >
> > > > > > On Sat, Aug 27, 2022 at 12:49 AM Jeff Layton
> > > > > > <jlayton@kernel.org> wrote:
> > > > > > >
> > > > > > > xfs will update the i_version when updating only the atime
> > > > > > > value, which
> > > > > > > is not desirable for any of the current consumers of
> > > > > > > i_version. Doing so
> > > > > > > leads to unnecessary cache invalidations on NFS and extra
> > > > > > > measurement
> > > > > > > activity in IMA.
> > > > > > >
> > > > > > > Add a new XFS_ILOG_NOIVER flag, and use that to indicate that
> > > > > > > the
> > > > > > > transaction should not update the i_version. Set that value
> > > > > > > in
> > > > > > > xfs_vn_update_time if we're only updating the atime.
> > > > > > >
> > > > > > > Cc: Dave Chinner <david@fromorbit.com>
> > > > > > > Cc: NeilBrown <neilb@suse.de>
> > > > > > > Cc: Trond Myklebust <trondmy@hammerspace.com>
> > > > > > > Cc: David Wysochanski <dwysocha@redhat.com>
> > > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > > > > ---
> > > > > > >  fs/xfs/libxfs/xfs_log_format.h  |  2 +-
> > > > > > >  fs/xfs/libxfs/xfs_trans_inode.c |  2 +-
> > > > > > >  fs/xfs/xfs_iops.c               | 11 +++++++++--
> > > > > > >  3 files changed, 11 insertions(+), 4 deletions(-)
> > > > > > >
> > > > > > > Dave has NACK'ed this patch, but I'm sending it as a way to
> > > > > > > illustrate
> > > > > > > the problem. I still think this approach should at least fix
> > > > > > > the worst
> > > > > > > problems with atime updates being counted. We can look to
> > > > > > > carve out
> > > > > > > other "spurious" i_version updates as we identify them.
> > > > > > >
> > > > > >
> > > > > > AFAIK, "spurious" is only inode blocks map changes due to
> > > > > > writeback
> > > > > > of dirty pages. Anybody know about other cases?
> > > > > >
> > > > > > Regarding inode blocks map changes, first of all, I don't think
> > > > > > that there is
> > > > > > any practical loss from invalidating NFS client cache on dirty
> > > > > > data writeback,
> > > > > > because NFS server should be serving cold data most of the
> > > > > > time.
> > > > > > If there are a few unneeded cache invalidations they would only
> > > > > > be temporary.
> > > > > >
> > > > >
> > > > > Unless there is an issue with a writer NFS client that
> > > > > invalidates its
> > > > > own attribute
> > > > > caches on server data writeback?
> > > > >
> > > >
> > > > The client just looks at the file attributes (of which i_version is
> > > > but
> > > > one), and if certain attributes have changed (mtime, ctime,
> > > > i_version,
> > > > etc...) then it invalidates its cache.
> > > >
> > > > In the case of blocks map changes, could that mean a difference in
> > > > the
> > > > observable sparse regions of the file? If so, then a READ_PLUS
> > > > before
> > > > the change and a READ_PLUS after could give different results.
> > > > Since
> > > > that difference is observable by the client, I'd think we'd want to
> > > > bump
> > > > i_version for that anyway.
> > >
> > > How /is/ READ_PLUS supposed to detect sparse regions, anyway?  I know
> > > that's been the subject of recent debate.  At least as far as XFS is
> > > concerned, a file range can go from hole -> delayed allocation
> > > reservation -> unwritten extent -> (actual writeback) -> written
> > > extent.
> > > The dance became rather more complex when we added COW.  If any of
> > > that
> > > will make a difference for READ_PLUS, then yes, I think you'd want
> > > file
> > > writeback activities to bump iversion to cause client invalidations,
> > > like (I think) Dave said.
> > >
> > > The fs/iomap/ implementation of SEEK_DATA/SEEK_HOLE reports data for
> > > written and delalloc extents; and an unwritten extent will report
> > > data
> > > for any pagecache it finds.
> > >
> >
> > READ_PLUS should never return anything different than a read() system
> > call would return for any given area. The way it reports sparse regions
> > vs. data regions is purely an RPC formatting convenience.
> >
> > The only point to note about NFS READ and READ_PLUS is that because the
> > client is forced to send multiple RPC calls if the user is trying to
> > read a region that is larger than the 'rsize' value, it is possible
> > that these READ/READ_PLUS calls may be processed out of order, and so
> > the result may end up looking different than if you had executed a
> > read() call for the full region directly on the server.
> > However each individual READ / READ_PLUS reply should look as if the
> > user had called read() on that rsize-sized section of the file.
> > > >
>
> Yeah, thinking about it some more, simply changing the block allocation
> is not something that should affect the ctime, so we probably don't want
> to bump i_version on it. It's an implicit change, IOW, not an explicit
> one.
>
> The fact that xfs might do that is unfortunate, but it's not the end of
> the world and it still would conform to the proposed definition for
> i_version. In practice, this sort of allocation change should come soon
> after the file was written, so one would hope that any damage due to the
> false i_version bump would be minimized.
>

That was exactly my point.

> It would be nice to teach it not to do that however. Maybe we can insert
> the NOIVER flag at a strategic place to avoid it?

Why would that be nice to avoid?
You did not specify any use case where incrementing i_version
on block mapping change matters in practice.
On the contrary, you said that NFS client writer sends COMMIT on close,
which should stabilize i_version for the next readers.

Given that we already have an xfs implementation that does increment
i_version on block mapping changes and it would be a pain to change
that or add a new user options, I don't see the point in discussing it further
unless there is a good incentive for avoiding i_version updates in those cases.

Thanks,
Amir.

  parent reply	other threads:[~2022-08-28 13:25 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-26 21:46 [PATCH v3 0/7] vfs: clean up i_version behavior and expose it via statx Jeff Layton
2022-08-26 21:46 ` [PATCH v3 1/7] iversion: update comments with info about atime updates Jeff Layton
2022-08-29  7:56   ` Dave Chinner
2022-08-29 10:39     ` Jeff Layton
2022-08-29 22:58       ` NeilBrown
2022-08-30 11:40         ` Jeff Layton
2022-08-30 13:24           ` J. Bruce Fields
2022-08-30 13:50             ` Jeff Layton
2022-08-30 14:44               ` J. Bruce Fields
2022-08-30 14:58                 ` Trond Myklebust
2022-08-30 15:17                   ` J. Bruce Fields
2022-08-30 15:43                     ` Trond Myklebust
2022-08-30 17:02                       ` Jeff Layton
2022-08-30 17:47                         ` Trond Myklebust
2022-08-30 17:53                           ` Jeff Layton
2022-08-30 18:25                             ` Trond Myklebust
2022-08-30 19:11                               ` Jeff Layton
2022-08-30 18:32                         ` J. Bruce Fields
2022-08-30 19:30                           ` Jeff Layton
2022-08-30 19:46                             ` J. Bruce Fields
2022-08-30 19:57                               ` Jeff Layton
2022-08-30 20:08                                 ` J. Bruce Fields
2022-08-30  1:04       ` Dave Chinner
2022-08-30 12:38         ` Jeff Layton
2022-08-26 21:46 ` [PATCH v3 2/7] ext4: fix i_version handling in ext4 Jeff Layton
2022-08-26 21:46 ` [PATCH v3 3/7] ext4: unconditionally enable the i_version counter Jeff Layton
2022-08-29 14:51   ` Jan Kara
2022-08-26 21:47 ` [PATCH v3 4/7] xfs: don't bump the i_version on an atime update in xfs_vn_update_time Jeff Layton
2022-08-27  7:26   ` Amir Goldstein
2022-08-27  8:01     ` Amir Goldstein
2022-08-27 13:14       ` Jeff Layton
2022-08-27 15:46         ` Darrick J. Wong
2022-08-27 16:03           ` Trond Myklebust
2022-08-27 16:10             ` Jeff Layton
2022-08-27 17:06               ` Trond Myklebust
2022-08-28 13:25               ` Amir Goldstein [this message]
2022-08-28 14:37                 ` Jeff Layton
2022-08-28 16:53                   ` Amir Goldstein
2022-08-29  5:48                   ` Dave Chinner
2022-08-29 10:33                     ` Jeff Layton
2022-08-30  0:08                       ` Dave Chinner
2022-08-30 11:20                         ` Jeff Layton
2022-08-28 17:30           ` Amir Goldstein
2022-08-26 21:47 ` [PATCH v3 5/7] vfs: report an inode version in statx for IS_I_VERSION inodes Jeff Layton
2022-08-26 21:47 ` [PATCH v3 6/7] nfs: report the inode version in statx if requested Jeff Layton
2022-08-26 21:47 ` [PATCH v3 7/7] ceph: fill in the change attribute in statx requests Jeff Layton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAOQ4uxgThXDEO3mxR_PtPgcPsF7ueqFUxHO3F3KE9sVqi8sLJQ@mail.gmail.com \
    --to=amir73il@gmail.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=brauner@kernel.org \
    --cc=ceph-devel@vger.kernel.org \
    --cc=chuck.lever@oracle.com \
    --cc=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=dwysocha@redhat.com \
    --cc=jack@suse.cz \
    --cc=jlayton@kernel.org \
    --cc=lczerner@redhat.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=neilb@suse.de \
    --cc=trondmy@hammerspace.com \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    --cc=xiubli@redhat.com \
    --cc=zohar@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).