From: Jeff Layton <jlayton@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>,
	John Stultz <jstultz@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Stephen Boyd <sboyd@kernel.org>,
	Chandan Babu R <chandan.babu@oracle.com>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Dave Chinner <david@fromorbit.com>, Theodore Ts'o <tytso@mit.edu>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>,
	David Sterba <dsterba@suse.com>, Hugh Dickins <hughd@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Amir Goldstein <amir73il@gmail.com>, Jan Kara <jack@suse.de>,
	David Howells <dhowells@redhat.com>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-xfs@vger.kernel.org, linux-ext4@vger.kernel.org,
	linux-btrfs@vger.kernel.org, linux-mm@kvack.org,
	linux-nfs@vger.kernel.org, Jeff Layton <jlayton@kernel.org>
Subject: [PATCH RFC 0/9] fs: multigrain timestamps (redux)
Date: Wed, 18 Oct 2023 13:41:07 -0400
Message-ID: <20231018-mgtime-v1-0-4a7a97b1f482@kernel.org>

The VFS always uses coarse-grained timestamps when updating the
ctime and mtime after a change. This has the benefit of allowing
filesystems to optimize away a lot of metadata updates, down to around 1
per jiffy, even when a file is under heavy writes.

Unfortunately, this coarseness has always been an issue when we're
exporting via NFSv3, which relies on timestamps to validate caches. A
lot of changes can happen in a jiffy, so timestamps aren't sufficient to
help the client decide to invalidate the cache.

Even with NFSv4, a lot of exported filesystems don't properly support a
change attribute and are subject to the same problems with timestamp
granularity. Other applications (e.g. backup applications) have similar
issues with timestamps.

If we were to always use fine-grained timestamps, that would improve the
situation, but that becomes rather expensive, as the underlying
filesystem would have to log a lot more metadata updates.

What we need is a way to only use fine-grained timestamps when they are
being actively queried. The idea is to use an unused bit in the ctime's
tv_nsec field to mark when the mtime or ctime has been queried via
getattr. Once that has been marked, the next m/ctime update will use a
fine-grained timestamp.
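
A minimal userspace sketch of the flag-bit idea, not code from the
series. A nanoseconds value never exceeds 999,999,999, which is below
2^30, so bit 30 of a 32-bit nsec field is free to carry the "queried"
mark. All names here (CTIME_QUERIED, struct sketch_inode, the sketch_*
helpers) are hypothetical, and a real implementation would need atomic
updates rather than the plain stores shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag: nanoseconds fit in 30 bits, so bit 30 is unused. */
#define CTIME_QUERIED (UINT32_C(1) << 30)
#define NSEC_MAX      UINT32_C(999999999)

struct sketch_inode {
	int64_t  ctime_sec;
	uint32_t ctime_nsec;	/* low 30 bits: nsec; bit 30: queried flag */
};

/* getattr path: report the nsec value and remember that it was seen. */
static inline uint32_t sketch_getattr_nsec(struct sketch_inode *inode)
{
	inode->ctime_nsec |= CTIME_QUERIED;
	return inode->ctime_nsec & ~CTIME_QUERIED;
}

/* Update path asks this to decide whether a fine-grained stamp is due. */
static inline bool sketch_ctime_was_queried(const struct sketch_inode *inode)
{
	return (inode->ctime_nsec & CTIME_QUERIED) != 0;
}

/* Storing a fresh timestamp implicitly clears the queried flag. */
static inline void sketch_set_ctime(struct sketch_inode *inode,
				    int64_t sec, uint32_t nsec)
{
	assert(nsec <= NSEC_MAX);
	inode->ctime_sec = sec;
	inode->ctime_nsec = nsec;
}
```

The flag never leaks to callers: sketch_getattr_nsec() masks it off
before returning, and the next sketch_set_ctime() resets it.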

The original merge of multigrain timestamps for v6.6 had to be reverted,
as a file with a coarse-grained timestamp could incorrectly appear to be
modified before a file with a fine-grained timestamp, when that wasn't
the case.

This revision solves that problem by making it so that when a
fine-grained timespec64 is handed out, that value becomes the floor
for further coarse-grained timespec64 fetches. This requires new
timekeeper interfaces with a potential downside: when a file is
stamped with a fine-grained timestamp, it has to (briefly) take the
global timekeeper spinlock.
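
To illustrate the floor, here is a single-threaded sketch that tracks
times as plain nanosecond counts. It is an illustration under stated
assumptions, not the timekeeper interface added by the series: the
struct and function names are hypothetical, the precise clock reading is
passed in by the caller, and locking is omitted entirely.

```c
#include <assert.h>
#include <stdint.h>

struct sketch_timekeeper {
	int64_t coarse_ns;	/* value as of the last tick */
	int64_t floor_ns;	/* latest fine-grained value handed out */
};

/*
 * Fine-grained fetch: hand out the precise time and raise the floor.
 * This is the step that would need the timekeeper lock in real code.
 */
static inline int64_t sketch_get_fine(struct sketch_timekeeper *tk,
				      int64_t precise_now_ns)
{
	assert(precise_now_ns >= tk->coarse_ns);
	if (precise_now_ns > tk->floor_ns)
		tk->floor_ns = precise_now_ns;
	return tk->floor_ns;
}

/* Coarse-grained fetch: never return anything earlier than the floor. */
static inline int64_t sketch_get_coarse(const struct sketch_timekeeper *tk)
{
	return tk->coarse_ns > tk->floor_ns ? tk->coarse_ns : tk->floor_ns;
}

/* Timer tick: advance the coarse clock. */
static inline void sketch_tick(struct sketch_timekeeper *tk, int64_t tick_ns)
{
	assert(tick_ns >= tk->coarse_ns);
	tk->coarse_ns = tick_ns;
}
```

With the coarse clock at 1000 and a fine-grained stamp of 1500 handed
out, a later coarse fetch returns 1500 rather than 1000, so the second
file cannot appear older than the first.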

Because of that, this set takes greater pains to avoid issuing new
fine-grained timestamps when possible. A fine-grained timestamp is now
only required if the current mtime or ctime has been fetched for a
getattr, and the next coarse-grained tick has not happened yet. For any
other case, a coarse-grained timestamp is fine, and that is done using
the seqcount.
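
One way to read that rule as a predicate, again only a sketch: the
function name and the nanosecond representation are assumptions, and
"the next tick has not happened yet" is interpreted as "the coarse clock
would not produce a value later than the stamp already stored".

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true only when a coarse-grained stamp could not be told apart
 * from the one a getattr caller has already seen.
 */
static inline bool sketch_needs_fine(bool queried,
				     int64_t current_stamp_ns,
				     int64_t coarse_now_ns)
{
	/* Nobody looked: a coarse stamp is always good enough. */
	if (!queried)
		return false;
	/* The coarse clock has moved on: a coarse stamp is distinct. */
	if (coarse_now_ns > current_stamp_ns)
		return false;
	return true;
}
```

Only the last branch would lead to the spinlock; the other two cases
stay on the cheap coarse-grained path.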

In order to get some hard numbers about how often the lock would be
taken, I've added a couple of percpu counters and a debugfs file for
tracking both types of multigrain timekeeper fetches.

With this, I did a kdevops fstests run on xfs (CRC mode). I ran "make
fstests-baseline", immediately grabbed the counter values, and
calculated the percentage:

$ time make fstests-baseline
real    324m17.337s
user    27m23.213s
sys     2m40.313s

fine            3059498
coarse          383848171
pct fine        .79075661

Next I did a kdevops fstests run with NFS, with one server serving 3
clients (v4.2, v4.0 and v3). Again, I timed "make fstests-baseline" and
then grabbed the multigrain counters from the NFS server:

$ time make fstests-baseline
real    181m57.585s
user    16m8.266s
sys     1m45.864s

fine            8137657
coarse          44726007
pct fine        15.393668
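
For reference, the "pct fine" figures in both runs are consistent with
fine / (fine + coarse) * 100. The helper below is a hypothetical
illustration of that arithmetic, not the tool that produced the numbers
above.

```c
#include <assert.h>
#include <stdint.h>

/* Share of all multigrain fetches that were fine-grained, in percent. */
static inline double sketch_pct_fine(uint64_t fine, uint64_t coarse)
{
	assert(fine + coarse > 0);
	return 100.0 * (double)fine / (double)(fine + coarse);
}
```

Feeding it the xfs counters (3059498, 383848171) gives roughly 0.79,
and the NFS counters (8137657, 44726007) give roughly 15.39.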

We can't run as many tests on nfs as xfs, so the run is shorter. nfsd is
a very getattr-heavy workload, and the clients aggressively coalesce
writes, so this is probably something of a pessimal case for number of
fine-grained timestamps over time.

At this point I'm mainly wondering whether (briefly) taking the
timekeeper spinlock in this codepath is unreasonable. It does very
little work under it, so I'm hoping the impact would be unmeasurable for
most workloads.

Side Q: what's the best tool for measuring spinlock contention? It'd be
interesting to see how often (and how long) we end up spinning on this
lock under different workloads.

Note that some of the patches in the series are virtually identical to
the ones before. I stripped the prior Reviewed-by/Acked-by tags though
since the underlying infrastructure has changed a bit.

Comments and suggestions welcome.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
Jeff Layton (9):
      fs: switch timespec64 fields in inode to discrete integers
      timekeeping: new interfaces for multigrain timestamp handing
      timekeeping: add new debugfs file to count multigrain timestamps
      fs: add infrastructure for multigrain timestamps
      fs: have setattr_copy handle multigrain timestamps appropriately
      xfs: switch to multigrain timestamps
      ext4: switch to multigrain timestamps
      btrfs: convert to multigrain timestamps
      tmpfs: add support for multigrain timestamps

 fs/attr.c                           |  52 ++++++++++++++--
 fs/btrfs/file.c                     |  25 ++------
 fs/btrfs/super.c                    |   5 +-
 fs/ext4/super.c                     |   2 +-
 fs/inode.c                          |  70 ++++++++++++++++++++-
 fs/stat.c                           |  41 ++++++++++++-
 fs/xfs/libxfs/xfs_trans_inode.c     |   6 +-
 fs/xfs/xfs_iops.c                   |  10 +--
 fs/xfs/xfs_super.c                  |   2 +-
 include/linux/fs.h                  |  85 ++++++++++++++++++--------
 include/linux/timekeeper_internal.h |   2 +
 include/linux/timekeeping.h         |   4 ++
 kernel/time/timekeeping.c           | 117 ++++++++++++++++++++++++++++++++++++
 mm/shmem.c                          |   2 +-
 14 files changed, 352 insertions(+), 71 deletions(-)
---
base-commit: 12cd44023651666bd44baa36a5c999698890debb
change-id: 20231016-mgtime-fe3ea75c6f59

Best regards,
-- 
Jeff Layton <jlayton@kernel.org>
