linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff Layton <jlayton@kernel.org>
To: Dave Chinner <david@fromorbit.com>,
	Linus Torvalds <torvalds@linux-foundation.org>
Cc: Amir Goldstein <amir73il@gmail.com>,
	Kent Overstreet <kent.overstreet@linux.dev>,
	Christian Brauner <brauner@kernel.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	John Stultz <jstultz@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Stephen Boyd <sboyd@kernel.org>,
	Chandan Babu R <chandan.babu@oracle.com>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Theodore Ts'o <tytso@mit.edu>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>,
	David Sterba <dsterba@suse.com>, Hugh Dickins <hughd@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Jan Kara <jack@suse.de>, David Howells <dhowells@redhat.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-xfs@vger.kernel.org, linux-ext4@vger.kernel.org,
	linux-btrfs@vger.kernel.org, linux-mm@kvack.org,
	linux-nfs@vger.kernel.org
Subject: Re: [PATCH RFC 2/9] timekeeping: new interfaces for multigrain timestamp handing
Date: Tue, 31 Oct 2023 07:26:37 -0400	[thread overview]
Message-ID: <b4c04efdde3bc7d107d0bdc68e100a94942aca3c.camel@kernel.org> (raw)
In-Reply-To: <ZUBbj8XsA6uW8ZDK@dread.disaster.area>

On Tue, 2023-10-31 at 12:42 +1100, Dave Chinner wrote:
> On Mon, Oct 30, 2023 at 01:11:56PM -1000, Linus Torvalds wrote:
> > On Mon, 30 Oct 2023 at 12:37, Dave Chinner <david@fromorbit.com> wrote:
> > > 
> > > If XFS can ignore relatime or lazytime persistent updates for given
> > > situations, then *we don't need to make periodic on-disk updates of
> > > atime*. This makes the whole problem of "persistent atime update bumps
> > > i_version" go away because then we *aren't making persistent atime
> > > updates* except when some other persistent modification that bumps
> > > [cm]time occurs.
> > 
> > Well, I think this should be split into two independent questions:
> > 
> >  (a) are relatime or lazytime atime updates persistent if nothing else changes?
> 
> They only become persistent after 24 hours or, in the case of
> relatime, immediately persistent if mtime < atime (i.e. read after a
> modification). Those are the only times that the VFS triggers
> persistent writeback of atime, and it's the latter case (mtime <
> atime) that is the specific trigger that exposed the problem with
> atime bumping i_version in the first place.
> 
> >  (b) do atime updates _ever_ update i_version *regardless* of relatime
> > or lazytime?
> > 
> > and honestly, I think the best answer to (b) would be that "no,
> > i_version should simply not change for atime updates". And I think
> > that answer is what it is because no user of i_version seems to want
> > it.
> 
> As I keep repeating: Repeatedly stating that "atime should not bump
> i_version" does not address the questions I'm asking *at all*.
> 
> > Now, the reason it's a single question for you is that apparently for
> > XFS, the only thing that matters is "inode was written to disk" and
> > that "di_changecount" value is thus related to the persistence of
> > atime updates, but splitting di_changecount out to be a separate thing
> > from i_version seems to be on the table, so I think those two things
> > really could be independent issues.
> 
> Wrong way around - we'd have to split i_version out from
> di_changecount. It's i_version that has changed semantics, not
> di_changecount, and di_changecount behaviour must remain unchanged.
> 

I have to take issue with your characterization of this. The
requirements for NFS's change counter have not changed. Clearly there
was a breakdown in communications when it was first implemented in Linux
that caused atime updates to get counted in the i_version value, but
that was never intentional and never by design.

I'm simply trying to correct this historical mistake.

> What I really don't want to do is implement a new i_version field in
> the XFS on-disk format. What this redefinition of i_version
> semantics has made clear is that i_version is *user defined
> metadata*, not internal filesystem metadata that is defined by the
> filesystem on-disk format.
> 
> User defined persistent metadata *belongs in xattrs*, not in the
> core filesystem on-disk formats. If the VFS wants to define and
> manage i_version behaviour, smeantics and persistence independently
> of the filesystems that manage the persistent storage (as it clearly
> does!) then we should treat it just like any other VFS defined inode
> metadata (e.g. per inode objects like security constraints, ACLs,
> fsverity digests, fscrypt keys, etc). i.e. it should be in a named
> xattr, not directly implemented in the filesystem on-disk format
> deinfitions.
> 
> Then the application can change the meaning of the metadata whenever
> and however it likes. Then filesystem developers just don't need
> to care about it at all because the VFS specific persistent metadata
> is not part of the on-disk format we need to maintain
> cross-platform forwards and backwards compatibility for.
> 
> > > But I don't want to do this unconditionally - for systems not
> > > running anything that samples i_version we want relatime/lazytime
> > > to behave as they are supposed to and do periodic persistent updates
> > > as per normal. Principle of least surprise and all that jazz.
> > 
> > Well - see above: I think in a perfect world, we'd simply never change
> > i_version at all for any atime updates, and relatime/lazytime simply
> > wouldn't be an issue at all wrt i_version.
> 
> Right, that's what I'd like, especially as the new definition of
> i_version - "only change when [cm]time changes" - means that the VFS
> i_version is really now just a glorified timestamp.
> 
> > Wouldn't _that_ be the trule "least surprising" behavior? Considering
> > that nobody wants i_version to change for what are otherwise pure
> > reads (that's kind of the *definition* of atime, after all).
> 
> So, if you don't like the idea of us ignoring relatime/lazytime
> conditionally, are we allowed to simply ignore them *all the time*
> and do all timestamp updates in the manner that causes users the
> least amount of pain?
> 
> I mean, relatime only exists because atime updates cause users pain.
> lazytime only exists because relatime doesn't address the pain that
> timestamp updates cause mixed read/write or pure O_DSYNC overwrite
> workloads pain. noatime is a pain because it loses all atime
> updates.
> 
> There is no "one size is right for everyone", so why not just let
> filesystems do what is most efficient from an internal IO and
> persistence POV whilst still maintaining the majority of "expected"
> behaviours?
> 
> Keep in mind, though, that this is all moot if we can get rid of
> i_version entirely....
> 
> > Now, the annoyance here is that *both* (a) and (b) then have that
> > impact of "i_version no longer tracks di_changecount".
> 
> .... and what is annoying is that that the new i_version just a
> glorified ctime change counter. What we should be fixing is ctime -
> integrating this change counting into ctime would allow us to make
> i_version go away entirely. i.e. We don't need a persistent ctime
> change counter if the ctime has sufficient resolution or persistent
> encoding that it does not need an external persistent change
> counter.
> 
> That was reasoning behind the multi-grain timestamps. While the mgts
> implementation was flawed, the reasoning behind it certainly isn't.
> We should be trying to get rid of i_version by integrating it into
> ctime updates, not arguing how atime vs i_version should work.
> 
> > So I don't think the issue here is "i_version" per se. I think in a
> > vacuum, the best option of i_version is pretty obvious.  But if you
> > want i_version to track di_changecount, *then* you end up with that
> > situation where the persistence of atime matters, and i_version needs
> > to update whenever a (persistent) atime update happens.
> 
> Yet I don't want i_version to track di_changecount.
> 
> I want to *stop supporting i_version altogether* in XFS.
> 
> I want i_version as filesystem internal metadata to die completely.
> 
> I don't want to change the on disk format to add a new i_version
> field because we'll be straight back in this same siutation when the
> next i_version bug is found and semantics get changed yet again.
> 
> Hence if we can encode the necessary change attributes into ctime,
> we can drop VFS i_version support altogether.  Then the "atime bumps
> i_version" problem also goes away because then we *don't use
> i_version*.
> 
> But if we can't get the VFS to do this with ctime, at least we have
> the abstractions available to us (i.e. timestamp granularity and
> statx change cookie) to allow XFS to implement this sort of
> ctime-with-integrated-change-counter internally to the filesystem
> and be able to drop i_version support.... 
> 
> [....]

> > This really is all *entirely* an artifact of that "bi_changecount" vs
> > "i_version" being tied together. You did seem to imply that you'd be
> > ok with having "bi_changecount" be split from i_version, ie from an
> > earlier email in this thread:
> > 
> >  "Now that NFS is using a proper abstraction (i.e. vfs_statx()) to get
> >   the change cookie, we really don't need to expose di_changecount in
> >   i_version at all - we could simply copy an internal di_changecount
> >   value into the statx cookie field in xfs_vn_getattr() and there
> >   would be almost no change of behaviour from the perspective of NFS
> >   and IMA at all"
> 
> .... which is what I was talking about here.
> 
> i.e. I was not talking about splitting i_version from di_changecount
> - I was talking about being able to stop supporting the VFS
> i_version counter entirely and still having NFS and IMA work
> correctly.
> 
> Continually bring the argument back to "atime vs i_version" misses
> the bigger issues around this new i_version definition and
> implementation, and that the real solution should be to fix ctime
> updates to make i_version at the VFS level go away forever.
> 

Ok, so your proposal is to keep using coarse grained timestamps, but use
the "insignificant" bits in them to store a change counter?

That sounds complex and fraught with peril, but I'm willing to listen to
some specifics about how that would work.
-- 
Jeff Layton <jlayton@kernel.org>

  parent reply	other threads:[~2023-10-31 11:26 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-18 17:41 [PATCH RFC 0/9] fs: multigrain timestamps (redux) Jeff Layton
2023-10-18 17:41 ` [PATCH RFC 1/9] fs: switch timespec64 fields in inode to discrete integers Jeff Layton
2023-10-18 17:41 ` [PATCH RFC 2/9] timekeeping: new interfaces for multigrain timestamp handing Jeff Layton
2023-10-18 19:18   ` Linus Torvalds
2023-10-18 20:47     ` Jeff Layton
2023-10-18 21:31       ` Linus Torvalds
2023-10-18 21:52         ` Jeff Layton
2023-10-19  9:29           ` Christian Brauner
2023-10-19 11:28             ` Jeff Layton
2023-10-19 22:02               ` Dave Chinner
2023-10-20 12:12                 ` Jeff Layton
2023-10-20 20:06                   ` Linus Torvalds
2023-10-20 20:20                     ` Linus Torvalds
2023-10-20 21:05                     ` Jeff Layton
2023-10-22 22:17                   ` Dave Chinner
2023-10-23 14:45                     ` Jeff Layton
2023-10-23 23:26                       ` Dave Chinner
2023-10-24  0:18                         ` Linus Torvalds
2023-10-24  3:40                           ` Dave Chinner
2023-10-24  4:10                             ` Linus Torvalds
2023-10-24  7:08                             ` Amir Goldstein
2023-10-24 18:40                               ` Jeff Layton
2023-10-25  8:05                                 ` Dave Chinner
2023-10-25 10:41                                   ` Amir Goldstein
2023-10-25 12:25                                   ` Jeff Layton
2023-10-26  2:20                                     ` Dave Chinner
2023-10-26  5:42                                       ` Amir Goldstein
2023-10-27 10:35                                       ` Jeff Layton
2023-10-30 22:37                                         ` Dave Chinner
2023-10-30 23:11                                           ` Linus Torvalds
2023-10-31  1:42                                             ` Dave Chinner
2023-10-31  7:03                                               ` Amir Goldstein
2023-10-31 10:30                                                 ` Christian Brauner
2023-10-31 11:29                                                 ` Jeff Layton
2023-10-31 21:57                                                   ` Dave Chinner
2023-10-31 23:02                                                     ` Darrick J. Wong
2023-10-31 23:47                                                       ` Dave Chinner
2023-11-01 10:16                                                     ` Jan Kara
2023-11-01 11:38                                                       ` Amir Goldstein
2023-11-02 10:17                                                         ` Jeff Layton
2023-11-01 20:10                                                       ` Linus Torvalds
2023-11-01 21:34                                                         ` Trond Myklebust
2023-11-01 22:23                                                           ` Linus Torvalds
2023-11-01 22:45                                                             ` Trond Myklebust
2023-11-01 23:29                                                           ` Dave Chinner
2023-11-02 10:29                                                             ` Jeff Layton
2023-11-02 10:15                                                         ` Jeff Layton
2023-10-31 23:12                                                 ` Darrick J. Wong
2023-11-01  8:08                                                   ` Amir Goldstein
2023-10-31 11:26                                               ` Jeff Layton [this message]
2023-10-31 19:43                                                 ` John Stoffel
2023-10-31 11:04                                           ` Jeff Layton
2023-10-31 12:22                                             ` Jan Kara
2023-10-31 12:55                                               ` Jeff Layton
2023-10-30 23:34                                         ` ronnie sahlberg
2023-10-24 14:24                             ` Jeff Layton
2023-10-24 19:06                           ` Jeff Layton
2023-10-24 19:40                             ` Linus Torvalds
2023-10-24 20:19                               ` Jeff Layton
2023-10-31 10:26               ` Christian Brauner
2023-10-31 13:55                 ` Jeff Layton
2023-10-19 22:00   ` Thomas Gleixner
2023-10-19 22:41     ` Jeff Layton
2023-10-18 17:41 ` [PATCH RFC 3/9] timekeeping: add new debugfs file to count multigrain timestamps Jeff Layton
2023-10-18 17:41 ` [PATCH RFC 4/9] fs: add infrastructure for " Jeff Layton
2023-10-18 17:41 ` [PATCH RFC 5/9] fs: have setattr_copy handle multigrain timestamps appropriately Jeff Layton
2023-10-18 17:41 ` [PATCH RFC 6/9] xfs: switch to multigrain timestamps Jeff Layton
2023-10-18 17:41 ` [PATCH RFC 7/9] ext4: " Jeff Layton
2023-10-18 17:41 ` [PATCH RFC 8/9] btrfs: convert " Jeff Layton
2023-10-18 17:41 ` [PATCH RFC 9/9] tmpfs: add support for " Jeff Layton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b4c04efdde3bc7d107d0bdc68e100a94942aca3c.camel@kernel.org \
    --to=jlayton@kernel.org \
    --cc=adilger.kernel@dilger.ca \
    --cc=akpm@linux-foundation.org \
    --cc=amir73il@gmail.com \
    --cc=brauner@kernel.org \
    --cc=chandan.babu@oracle.com \
    --cc=clm@fb.com \
    --cc=david@fromorbit.com \
    --cc=dhowells@redhat.com \
    --cc=djwong@kernel.org \
    --cc=dsterba@suse.com \
    --cc=hughd@google.com \
    --cc=jack@suse.de \
    --cc=josef@toxicpanda.com \
    --cc=jstultz@google.com \
    --cc=kent.overstreet@linux.dev \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=sboyd@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).