* [PATCH 00/19] fs: rework and optimize i_version handling in filesystems
@ 2017-12-13 14:19 Jeff Layton
From: Jeff Layton @ 2017-12-13 14:19 UTC
  To: linux-fsdevel; +Cc: linux-kernel, hch, neilb, bfields, amir73il, jack, viro

From: Jeff Layton <jlayton@redhat.com>

About a year ago, I sent a pile of patches that overhauled how the
inode->i_version field is handled in filesystems. This is a follow-up
to that initial series.

tl;dr: I think we can greatly reduce the cost of the inode->i_version
counter, by exploiting the fact that we don't need to increment it
if no one is looking at it. We can also clean up the code to prepare
to eventually expose this value via statx().

The inode->i_version field is supposed to be a value that changes
whenever there is any data or metadata change to the inode. Some
filesystems use it internally to detect directory changes during
readdir. knfsd will use it if the filesystem has MS_I_VERSION
set. IMA will also use it to optimize away some remeasurement if
it's available.

Only btrfs, ext4, and xfs implement it for data changes. Because of
this, these filesystems must log the inode to disk whenever the
i_version counter changes. That has a non-zero performance impact,
especially on write-heavy workloads, because we end up dirtying the
inode metadata on every write, not just when the times change. [1]

It turns out though that none of these users of i_version require that
i_version change on every change to the file. The only real requirement
is that it be different if _something_ changed since the last time we
queried for it.

If we keep track of when something queries the value, we can skip both
the counter bump and the on-disk update whenever no one has queried
i_version since it was last incremented and nothing else in the inode
needs updating.
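
To make the idea concrete, here is a minimal userspace sketch of a
"bump only if queried" counter. It is not the code from this series:
the toy_* names, the choice to keep the queried flag in the low bit of
the counter word, and the use of C11 atomics are all assumptions made
for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: bit 0 is the "queried" flag, bits 63..1 hold the counter. */
#define TOY_QUERIED   ((uint64_t)1)
#define TOY_INCREMENT ((uint64_t)2)

/* Hypothetical stand-in for the part of an inode that holds i_version. */
struct toy_inode {
	_Atomic uint64_t version;
};

/*
 * Bump the counter only if the caller forces it (e.g. the inode is being
 * logged anyway) or if someone has queried the value since the last bump.
 * Returns true if the counter changed, i.e. the inode needs logging.
 */
static bool toy_maybe_inc_version(struct toy_inode *inode, bool force)
{
	uint64_t cur = atomic_load(&inode->version);
	uint64_t next;

	do {
		if (!force && !(cur & TOY_QUERIED))
			return false;	/* nobody looked: skip the bump */
		/* Clear the flag and advance the counter in one step. */
		next = (cur & ~TOY_QUERIED) + TOY_INCREMENT;
	} while (!atomic_compare_exchange_weak(&inode->version, &cur, next));

	return true;
}

/*
 * Return the current counter value and record that it has been seen,
 * so that the next change is guaranteed to produce a different value.
 */
static uint64_t toy_query_version(struct toy_inode *inode)
{
	uint64_t cur = atomic_load(&inode->version);

	/* On CAS failure, cur is reloaded and the flag is re-checked. */
	while (!(cur & TOY_QUERIED) &&
	       !atomic_compare_exchange_weak(&inode->version, &cur,
					     cur | TOY_QUERIED))
		;

	return cur >> 1;
}
```

In this sketch a writer calls toy_maybe_inc_version(inode, false) on
each change and only dirties the inode when it returns true. Any number
of writes between two queries then cost at most one counter bump.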

This patchset changes the code to only bump the i_version counter when
it's strictly necessary, or when we're updating the inode metadata
anyway (e.g. when times change).

It takes the approach of converting the existing accessors of i_version
to use a new API, while leaving the underlying implementation mostly the
same.  The last patch then converts the existing implementation to keep
track of whether the value has been queried since it was last
incremented and uses that to avoid incrementing the counter when it can.
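
As a rough illustration of why the conversion comes first, the sketch
below hides a version field behind small accessors and uses them in a
readdir-style check. Everything here is hypothetical (the toy_* names
and structs are not the helpers added by this series); the point is
only that once callers stop touching the field directly, the
representation behind the accessors can change without touching them
again.

```c
#include <stdint.h>

/* Hypothetical directory object carrying a change counter. */
struct toy_dir {
	uint64_t version;
};

/* Hypothetical open-file state caching the version seen at last readdir. */
struct toy_file {
	uint64_t cached_version;
	long pos;
};

/* Accessors: callers use these instead of reading d->version directly. */
static inline void toy_set_version(struct toy_dir *d, uint64_t v)
{
	d->version = v;
}

static inline void toy_inc_version(struct toy_dir *d)
{
	d->version++;
}

static inline uint64_t toy_read_version(const struct toy_dir *d)
{
	return d->version;
}

static inline int toy_version_changed(const struct toy_dir *d, uint64_t old)
{
	return toy_read_version(d) != old;
}

/*
 * readdir-style caller: returns 1 if the directory changed since the
 * cached version was taken (so the position must be revalidated) and
 * refreshes the cache; returns 0 otherwise.
 */
static int toy_readdir_needs_revalidate(struct toy_dir *d, struct toy_file *f)
{
	if (!toy_version_changed(d, f->cached_version))
		return 0;
	f->cached_version = toy_read_version(d);
	return 1;
}
```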

With this, we reduce inode metadata updates across all 3 filesystems
down to roughly the frequency of the timestamp granularity, particularly
when it's not being queried (by far the common case).

The pessimal workload for the current code is 1-byte writes, and this
patchset helps that case significantly. Of course, that's not what we'd
consider a real-world workload.

A tiobench-example.fio workload also shows some modest performance
gains, and I've gotten mails from the kernel test robot that show some
significant performance gains on some microbenchmarks (case-msync-mt in
the vm-scalability testsuite to be specific), with an earlier version of
this set.

With larger writes, the gains with this patchset mostly evaporate,
but it does not seem to cause performance to regress anywhere, AFAICT.

I'm happy to run other workloads if anyone can suggest them.

At this point, the patchset works and does what it's expected to do in
my own testing. It seems like it's at least a modest performance win
across all 3 major disk-based filesystems. It may also encourage other
filesystems to implement i_version, since it reduces the cost.

[1]: On ext4 it must be turned on with the i_version mount option,
     mostly due to fears of incurring this impact, AFAICT.

Jeff Layton (19):
  fs: new API for handling inode->i_version
  fs: don't take the i_lock in inode_inc_iversion
  fat: convert to new i_version API
  affs: convert to new i_version API
  afs: convert to new i_version API
  btrfs: convert to new i_version API
  exofs: switch to new i_version API
  ext2: convert to new i_version API
  ext4: convert to new i_version API
  nfs: convert to new i_version API
  nfsd: convert to new i_version API
  ocfs2: convert to new i_version API
  ufs: use new i_version API
  xfs: convert to new i_version API
  IMA: switch IMA over to new i_version API
  fs: only set S_VERSION when updating times if necessary
  xfs: avoid setting XFS_ILOG_CORE if i_version doesn't need
    incrementing
  btrfs: only dirty the inode in btrfs_update_time if something was
    changed
  fs: handle inode->i_version more efficiently

 fs/affs/amigaffs.c                |   4 +-
 fs/affs/dir.c                     |   4 +-
 fs/affs/super.c                   |   2 +-
 fs/afs/fsclient.c                 |   2 +-
 fs/afs/inode.c                    |   4 +-
 fs/btrfs/delayed-inode.c          |   6 +-
 fs/btrfs/inode.c                  |  11 +-
 fs/btrfs/tree-log.c               |   3 +-
 fs/exofs/dir.c                    |   8 +-
 fs/exofs/super.c                  |   2 +-
 fs/ext2/dir.c                     |   8 +-
 fs/ext2/super.c                   |   4 +-
 fs/ext4/dir.c                     |   8 +-
 fs/ext4/inline.c                  |   6 +-
 fs/ext4/inode.c                   |  12 +-
 fs/ext4/ioctl.c                   |   2 +-
 fs/ext4/namei.c                   |   4 +-
 fs/ext4/super.c                   |   2 +-
 fs/ext4/xattr.c                   |   4 +-
 fs/fat/dir.c                      |   2 +-
 fs/fat/inode.c                    |   8 +-
 fs/fat/namei_msdos.c              |   6 +-
 fs/fat/namei_vfat.c               |  20 +--
 fs/inode.c                        |   9 +-
 fs/nfs/delegation.c               |   2 +-
 fs/nfs/fscache-index.c            |   4 +-
 fs/nfs/inode.c                    |  16 +--
 fs/nfs/nfs4proc.c                 |   9 +-
 fs/nfs/nfstrace.h                 |   4 +-
 fs/nfs/write.c                    |   7 +-
 fs/nfsd/nfsfh.h                   |   2 +-
 fs/ocfs2/dir.c                    |  14 +--
 fs/ocfs2/inode.c                  |   2 +-
 fs/ocfs2/namei.c                  |   2 +-
 fs/ocfs2/quota_global.c           |   2 +-
 fs/ufs/dir.c                      |   8 +-
 fs/ufs/inode.c                    |   2 +-
 fs/ufs/super.c                    |   2 +-
 fs/xfs/libxfs/xfs_inode_buf.c     |   5 +-
 fs/xfs/xfs_icache.c               |   4 +-
 fs/xfs/xfs_inode.c                |   2 +-
 fs/xfs/xfs_inode_item.c           |   2 +-
 fs/xfs/xfs_trans_inode.c          |  14 ++-
 include/linux/fs.h                | 250 ++++++++++++++++++++++++++++++++++++--
 security/integrity/ima/ima_api.c  |   2 +-
 security/integrity/ima/ima_main.c |   2 +-
 46 files changed, 371 insertions(+), 127 deletions(-)

-- 
2.14.3


