linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/13] lazytime fixes and cleanups
@ 2021-01-05  0:54 Eric Biggers
  2021-01-05  0:54 ` [PATCH 01/13] fs: avoid double-writing inodes on lazytime expiration Eric Biggers
                   ` (12 more replies)
  0 siblings, 13 replies; 37+ messages in thread
From: Eric Biggers @ 2021-01-05  0:54 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-ext4, linux-f2fs-devel, linux-xfs, Theodore Ts'o,
	Christoph Hellwig

Hello,

This patchset fixes the lazytime bug which I reported last year
(https://lore.kernel.org/r/20200306004555.GB225345@gmail.com).  This bug
causes inodes with dirty timestamps to remain dirty after a sync, which
causes the inodes to be unnecessarily written again, and also causes the
FS_IOC_REMOVE_ENCRYPTION_KEY ioctl to not work properly.  This bug is
causing xfstest generic/580 to fail when lazytime is enabled.

Ted and Christoph proposed fixes for this.  Ted's fix
(https://lore.kernel.org/r/20200307020043.60118-1-tytso@mit.edu) changed
the call to mark_inode_dirty_sync(inode) in __writeback_single_inode()
to ->dirty_inode(inode, I_DIRTY_TIME_EXPIRED).  However this would have
broken XFS, which wants an I_DIRTY_SYNC notification.  Also, people
preferred a larger rework involving adding a ->lazytime_expired method.

Christoph's fix
(https://lore.kernel.org/r/20200325122825.1086872-3-hch@lst.de)
introduced ->lazytime_expired, but it wasn't correct because it didn't
consider cases in which timestamps are force-expired.

To resolve this, I propose that we first fix the bug by making
__writeback_single_inode() do an I_DIRTY_SYNC notification if the
timestamps expired (patch #1).

Then, the remaining patches introduce ->lazytime_expired and make XFS
use it.  They also clean up various things, such as improving comments.

Also, it turns out that lazytime on XFS is broken because it doesn't
actually write timestamps to disk after a sync() or after 24 hours.
This is fixed by the patch to switch XFS to use ->lazytime_expired.
I've written an xfstest which reproduces this bug.

This patchset applies to v5.11-rc2.

Eric Biggers (13):
  fs: avoid double-writing inodes on lazytime expiration
  gfs2: don't worry about I_DIRTY_TIME in gfs2_fsync()
  fs: only specify I_DIRTY_TIME when needed in generic_update_time()
  fat: only specify I_DIRTY_TIME when needed in fat_update_time()
  fs: don't call ->dirty_inode for lazytime timestamp updates
  fs: pass only I_DIRTY_INODE flags to ->dirty_inode
  fs: correctly document the inode dirty flags
  ext4: simplify i_state checks in __ext4_update_other_inode_time()
  fs: drop redundant checks from __writeback_single_inode()
  fs: clean up __mark_inode_dirty() a bit
  fs: add a lazytime_expired method
  xfs: remove a stale comment from xfs_file_aio_write_checks()
  xfs: implement lazytime_expired

 Documentation/filesystems/locking.rst |  2 +
 Documentation/filesystems/vfs.rst     | 15 ++++-
 fs/ext4/inode.c                       | 20 ++----
 fs/f2fs/super.c                       |  3 -
 fs/fat/misc.c                         | 21 +++---
 fs/fs-writeback.c                     | 94 +++++++++++++++++++--------
 fs/gfs2/file.c                        |  4 +-
 fs/gfs2/super.c                       |  2 -
 fs/inode.c                            | 40 ++++++------
 fs/sync.c                             |  2 +-
 fs/xfs/xfs_file.c                     |  6 --
 fs/xfs/xfs_super.c                    | 12 +---
 include/linux/fs.h                    | 25 +++++--
 13 files changed, 143 insertions(+), 103 deletions(-)


base-commit: e71ba9452f0b5b2e8dc8aa5445198cd9214a6a62
-- 
2.30.0


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 01/13] fs: avoid double-writing inodes on lazytime expiration
  2021-01-05  0:54 [PATCH 00/13] lazytime fixes and cleanups Eric Biggers
@ 2021-01-05  0:54 ` Eric Biggers
  2021-01-07 14:47   ` Jan Kara
  2021-01-05  0:54 ` [PATCH 02/13] gfs2: don't worry about I_DIRTY_TIME in gfs2_fsync() Eric Biggers
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 37+ messages in thread
From: Eric Biggers @ 2021-01-05  0:54 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-ext4, linux-f2fs-devel, linux-xfs, Theodore Ts'o,
	Christoph Hellwig, stable

From: Eric Biggers <ebiggers@google.com>

When lazytime is enabled and an inode with dirty timestamps is being
expired, either due to dirtytime_expire_interval being exceeded or due
to a sync or syncfs system call, we need to inform the filesystem that
the inode is dirty so that the inode's timestamps can be copied out to
the on-disk data structures.  That's because if the filesystem supports
lazytime, it will have ignored the ->dirty_inode(inode, I_DIRTY_TIME)
notification when the timestamp was modified in memory.

Currently this is accomplished by calling mark_inode_dirty_sync() from
__writeback_single_inode().  However, this has the unfortunate side
effect of also putting the inode the writeback list.  That's not
appropriate in this case, since the inode is already being written.

That causes the inode to remain dirty after a sync.  Normally that's
just wasteful, as it causes the inode to be written twice.  But when
fscrypt is used this bug also partially breaks the
FS_IOC_REMOVE_ENCRYPTION_KEY ioctl, as the ioctl reports that files are
still in-use when they aren't.  For more details, see the original
report at https://lore.kernel.org/r/20200306004555.GB225345@gmail.com

Fix this by calling ->dirty_inode(inode, I_DIRTY_SYNC) directly instead
of mark_inode_dirty_sync().

This fixes xfstest generic/580 when lazytime is enabled.

A later patch will introduce a ->lazytime_expired method to cleanly
separate out the lazytime expiration case, in particular for XFS which
uses the VFS-level dirtiness tracking only for lazytime.  But that's
separate from fixing this bug.  Also, note that XFS will incorrectly
ignore the I_DIRTY_SYNC notification from __writeback_single_inode()
both before and after this patch, as I_DIRTY_TIME was already cleared in
i_state.  Later patches will fix this separate bug.

Fixes: 0ae45f63d4ef ("vfs: add support for a lazytime mount option")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 fs/fs-writeback.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index acfb55834af23..081e335cdee47 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1509,11 +1509,22 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
 
 	spin_unlock(&inode->i_lock);
 
-	if (dirty & I_DIRTY_TIME)
-		mark_inode_dirty_sync(inode);
 	/* Don't write the inode if only I_DIRTY_PAGES was set */
 	if (dirty & ~I_DIRTY_PAGES) {
-		int err = write_inode(inode, wbc);
+		int err;
+
+		/*
+		 * If the inode is being written due to a lazytime timestamp
+		 * expiration, then the filesystem needs to be notified about it
+		 * so that e.g. the filesystem can update on-disk fields and
+		 * journal the timestamp update.  Just calling write_inode()
+		 * isn't enough.  Don't call mark_inode_dirty_sync(), as that
+		 * would put the inode back on the dirty list.
+		 */
+		if ((dirty & I_DIRTY_TIME) && inode->i_sb->s_op->dirty_inode)
+			inode->i_sb->s_op->dirty_inode(inode, I_DIRTY_SYNC);
+
+		err = write_inode(inode, wbc);
 		if (ret == 0)
 			ret = err;
 	}

base-commit: e71ba9452f0b5b2e8dc8aa5445198cd9214a6a62
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 02/13] gfs2: don't worry about I_DIRTY_TIME in gfs2_fsync()
  2021-01-05  0:54 [PATCH 00/13] lazytime fixes and cleanups Eric Biggers
  2021-01-05  0:54 ` [PATCH 01/13] fs: avoid double-writing inodes on lazytime expiration Eric Biggers
@ 2021-01-05  0:54 ` Eric Biggers
  2021-01-08  8:54   ` Christoph Hellwig
  2021-01-05  0:54 ` [PATCH 03/13] fs: only specify I_DIRTY_TIME when needed in generic_update_time() Eric Biggers
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 37+ messages in thread
From: Eric Biggers @ 2021-01-05  0:54 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-ext4, linux-f2fs-devel, linux-xfs, Theodore Ts'o,
	Christoph Hellwig

From: Eric Biggers <ebiggers@google.com>

The I_DIRTY_TIME flag is primary used within the VFS, and there's no
reason for ->fsync() implementations to do anything with it.  This is
because when !datasync, the VFS will expire dirty timestamps before
calling ->fsync().  (See vfs_fsync_range().)  This turns I_DIRTY_TIME
into I_DIRTY_SYNC.

Therefore, change gfs2_fsync() to not check for I_DIRTY_TIME.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 fs/gfs2/file.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index b39b339feddc9..7fe2497755a37 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -749,7 +749,7 @@ static int gfs2_fsync(struct file *file, loff_t start, loff_t end,
 {
 	struct address_space *mapping = file->f_mapping;
 	struct inode *inode = mapping->host;
-	int sync_state = inode->i_state & I_DIRTY_ALL;
+	int sync_state = inode->i_state & I_DIRTY;
 	struct gfs2_inode *ip = GFS2_I(inode);
 	int ret = 0, ret1 = 0;
 
@@ -762,7 +762,7 @@ static int gfs2_fsync(struct file *file, loff_t start, loff_t end,
 	if (!gfs2_is_jdata(ip))
 		sync_state &= ~I_DIRTY_PAGES;
 	if (datasync)
-		sync_state &= ~(I_DIRTY_SYNC | I_DIRTY_TIME);
+		sync_state &= ~I_DIRTY_SYNC;
 
 	if (sync_state) {
 		ret = sync_inode_metadata(inode, 1);
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 03/13] fs: only specify I_DIRTY_TIME when needed in generic_update_time()
  2021-01-05  0:54 [PATCH 00/13] lazytime fixes and cleanups Eric Biggers
  2021-01-05  0:54 ` [PATCH 01/13] fs: avoid double-writing inodes on lazytime expiration Eric Biggers
  2021-01-05  0:54 ` [PATCH 02/13] gfs2: don't worry about I_DIRTY_TIME in gfs2_fsync() Eric Biggers
@ 2021-01-05  0:54 ` Eric Biggers
  2021-01-08  8:57   ` Christoph Hellwig
  2021-01-05  0:54 ` [PATCH 04/13] fat: only specify I_DIRTY_TIME when needed in fat_update_time() Eric Biggers
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 37+ messages in thread
From: Eric Biggers @ 2021-01-05  0:54 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-ext4, linux-f2fs-devel, linux-xfs, Theodore Ts'o,
	Christoph Hellwig

From: Eric Biggers <ebiggers@google.com>

generic_update_time() always passes I_DIRTY_TIME to
__mark_inode_dirty(), which doesn't really make sense because (a)
generic_update_time() might be asked to do only an i_version update, not
also a timestamps update; and (b) I_DIRTY_TIME is only supposed to be
set in i_state if the filesystem has lazytime enabled, so using it
unconditionally in generic_update_time() is inconsistent.

As a result there is a weird edge case where if only an i_version update
was requested (not also a timestamps update) but it is no longer needed
(i.e. inode_maybe_inc_iversion() returns false), then I_DIRTY_TIME will
be set in i_state even if the filesystem isn't mounted with lazytime.

Fix this by only passing I_DIRTY_TIME to __mark_inode_dirty() if the
timestamps were updated and the filesystem has lazytime enabled.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 fs/inode.c | 38 ++++++++++++++++++++------------------
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 6442d97d9a4ab..d0fa43d8e9d5c 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1743,24 +1743,26 @@ static int relatime_need_update(struct vfsmount *mnt, struct inode *inode,
 
 int generic_update_time(struct inode *inode, struct timespec64 *time, int flags)
 {
-	int iflags = I_DIRTY_TIME;
-	bool dirty = false;
-
-	if (flags & S_ATIME)
-		inode->i_atime = *time;
-	if (flags & S_VERSION)
-		dirty = inode_maybe_inc_iversion(inode, false);
-	if (flags & S_CTIME)
-		inode->i_ctime = *time;
-	if (flags & S_MTIME)
-		inode->i_mtime = *time;
-	if ((flags & (S_ATIME | S_CTIME | S_MTIME)) &&
-	    !(inode->i_sb->s_flags & SB_LAZYTIME))
-		dirty = true;
-
-	if (dirty)
-		iflags |= I_DIRTY_SYNC;
-	__mark_inode_dirty(inode, iflags);
+	int dirty_flags = 0;
+
+	if (flags & (S_ATIME | S_CTIME | S_MTIME)) {
+		if (flags & S_ATIME)
+			inode->i_atime = *time;
+		if (flags & S_CTIME)
+			inode->i_ctime = *time;
+		if (flags & S_MTIME)
+			inode->i_mtime = *time;
+
+		if (inode->i_sb->s_flags & SB_LAZYTIME)
+			dirty_flags |= I_DIRTY_TIME;
+		else
+			dirty_flags |= I_DIRTY_SYNC;
+	}
+
+	if ((flags & S_VERSION) && inode_maybe_inc_iversion(inode, false))
+		dirty_flags |= I_DIRTY_SYNC;
+
+	__mark_inode_dirty(inode, dirty_flags);
 	return 0;
 }
 EXPORT_SYMBOL(generic_update_time);
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 04/13] fat: only specify I_DIRTY_TIME when needed in fat_update_time()
  2021-01-05  0:54 [PATCH 00/13] lazytime fixes and cleanups Eric Biggers
                   ` (2 preceding siblings ...)
  2021-01-05  0:54 ` [PATCH 03/13] fs: only specify I_DIRTY_TIME when needed in generic_update_time() Eric Biggers
@ 2021-01-05  0:54 ` Eric Biggers
  2021-01-07 13:13   ` Jan Kara
  2021-01-05  0:54 ` [PATCH 05/13] fs: don't call ->dirty_inode for lazytime timestamp updates Eric Biggers
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 37+ messages in thread
From: Eric Biggers @ 2021-01-05  0:54 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-ext4, linux-f2fs-devel, linux-xfs, Theodore Ts'o,
	Christoph Hellwig

From: Eric Biggers <ebiggers@google.com>

As was done for generic_update_time(), only pass I_DIRTY_TIME to
__mark_inode_dirty() when the inode's timestamps were actually updated
and lazytime is enabled.  This avoids a weird edge case where
I_DIRTY_TIME could be set in i_state when lazytime isn't enabled.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 fs/fat/misc.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/fs/fat/misc.c b/fs/fat/misc.c
index f1b2a1fc2a6a4..33e1e0c9fd634 100644
--- a/fs/fat/misc.c
+++ b/fs/fat/misc.c
@@ -329,21 +329,22 @@ EXPORT_SYMBOL_GPL(fat_truncate_time);
 
 int fat_update_time(struct inode *inode, struct timespec64 *now, int flags)
 {
-	int iflags = I_DIRTY_TIME;
-	bool dirty = false;
+	int dirty_flags = 0;
 
 	if (inode->i_ino == MSDOS_ROOT_INO)
 		return 0;
 
-	fat_truncate_time(inode, now, flags);
-	if (flags & S_VERSION)
-		dirty = inode_maybe_inc_iversion(inode, false);
-	if ((flags & (S_ATIME | S_CTIME | S_MTIME)) &&
-	    !(inode->i_sb->s_flags & SB_LAZYTIME))
-		dirty = true;
+	if (flags & (S_ATIME | S_CTIME | S_MTIME)) {
+		fat_truncate_time(inode, now, flags);
+		if (inode->i_sb->s_flags & SB_LAZYTIME)
+			dirty_flags |= I_DIRTY_TIME;
+		else
+			dirty_flags |= I_DIRTY_SYNC;
+	}
+
+	if ((flags & S_VERSION) && inode_maybe_inc_iversion(inode, false))
+		dirty_flags |= I_DIRTY_SYNC;
 
-	if (dirty)
-		iflags |= I_DIRTY_SYNC;
 	__mark_inode_dirty(inode, iflags);
 	return 0;
 }
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 05/13] fs: don't call ->dirty_inode for lazytime timestamp updates
  2021-01-05  0:54 [PATCH 00/13] lazytime fixes and cleanups Eric Biggers
                   ` (3 preceding siblings ...)
  2021-01-05  0:54 ` [PATCH 04/13] fat: only specify I_DIRTY_TIME when needed in fat_update_time() Eric Biggers
@ 2021-01-05  0:54 ` Eric Biggers
  2021-01-07 13:17   ` Jan Kara
  2021-01-08  9:02   ` Christoph Hellwig
  2021-01-05  0:54 ` [PATCH 06/13] fs: pass only I_DIRTY_INODE flags to ->dirty_inode Eric Biggers
                   ` (7 subsequent siblings)
  12 siblings, 2 replies; 37+ messages in thread
From: Eric Biggers @ 2021-01-05  0:54 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-ext4, linux-f2fs-devel, linux-xfs, Theodore Ts'o,
	Christoph Hellwig

From: Eric Biggers <ebiggers@google.com>

There is no need to call ->dirty_inode for lazytime timestamp updates
(i.e. for __mark_inode_dirty(I_DIRTY_TIME)), since by the definition of
lazytime, filesystems must ignore these updates.  Filesystems only need
to care about the updated timestamps when they expire.

Therefore, only call ->dirty_inode when I_DIRTY_INODE is set.

Based on a patch from Christoph Hellwig:
https://lore.kernel.org/r/20200325122825.1086872-4-hch@lst.de

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 fs/ext4/inode.c   | 12 +-----------
 fs/f2fs/super.c   |  3 ---
 fs/fs-writeback.c |  6 +++---
 fs/gfs2/super.c   |  2 --
 4 files changed, 4 insertions(+), 19 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 27946882d4ce4..4cc6c7834312f 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5933,26 +5933,16 @@ int __ext4_mark_inode_dirty(handle_t *handle, struct inode *inode,
  * If the inode is marked synchronous, we don't honour that here - doing
  * so would cause a commit on atime updates, which we don't bother doing.
  * We handle synchronous inodes at the highest possible level.
- *
- * If only the I_DIRTY_TIME flag is set, we can skip everything.  If
- * I_DIRTY_TIME and I_DIRTY_SYNC is set, the only inode fields we need
- * to copy into the on-disk inode structure are the timestamp files.
  */
 void ext4_dirty_inode(struct inode *inode, int flags)
 {
 	handle_t *handle;
 
-	if (flags == I_DIRTY_TIME)
-		return;
 	handle = ext4_journal_start(inode, EXT4_HT_INODE, 2);
 	if (IS_ERR(handle))
-		goto out;
-
+		return;
 	ext4_mark_inode_dirty(handle, inode);
-
 	ext4_journal_stop(handle);
-out:
-	return;
 }
 
 int ext4_change_inode_journal_flag(struct inode *inode, int val)
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index b4a07fe62d1a5..cc98dc49f4a26 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1196,9 +1196,6 @@ static void f2fs_dirty_inode(struct inode *inode, int flags)
 			inode->i_ino == F2FS_META_INO(sbi))
 		return;
 
-	if (flags == I_DIRTY_TIME)
-		return;
-
 	if (is_inode_flag_set(inode, FI_AUTO_RECOVER))
 		clear_inode_flag(inode, FI_AUTO_RECOVER);
 
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 081e335cdee47..e3347fd6eb13a 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2264,16 +2264,16 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 	 * Don't do this for I_DIRTY_PAGES - that doesn't actually
 	 * dirty the inode itself
 	 */
-	if (flags & (I_DIRTY_INODE | I_DIRTY_TIME)) {
+	if (flags & I_DIRTY_INODE) {
 		trace_writeback_dirty_inode_start(inode, flags);
 
 		if (sb->s_op->dirty_inode)
 			sb->s_op->dirty_inode(inode, flags);
 
 		trace_writeback_dirty_inode(inode, flags);
-	}
-	if (flags & I_DIRTY_INODE)
+
 		flags &= ~I_DIRTY_TIME;
+	}
 	dirtytime = flags & I_DIRTY_TIME;
 
 	/*
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 2f56acc41c049..042b94288ff11 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -562,8 +562,6 @@ static void gfs2_dirty_inode(struct inode *inode, int flags)
 	int need_endtrans = 0;
 	int ret;
 
-	if (!(flags & I_DIRTY_INODE))
-		return;
 	if (unlikely(gfs2_withdrawn(sdp)))
 		return;
 	if (!gfs2_glock_is_locked_by_me(ip->i_gl)) {
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 06/13] fs: pass only I_DIRTY_INODE flags to ->dirty_inode
  2021-01-05  0:54 [PATCH 00/13] lazytime fixes and cleanups Eric Biggers
                   ` (4 preceding siblings ...)
  2021-01-05  0:54 ` [PATCH 05/13] fs: don't call ->dirty_inode for lazytime timestamp updates Eric Biggers
@ 2021-01-05  0:54 ` Eric Biggers
  2021-01-08  9:02   ` Christoph Hellwig
  2021-01-05  0:54 ` [PATCH 07/13] fs: correctly document the inode dirty flags Eric Biggers
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 37+ messages in thread
From: Eric Biggers @ 2021-01-05  0:54 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-ext4, linux-f2fs-devel, linux-xfs, Theodore Ts'o,
	Christoph Hellwig

From: Eric Biggers <ebiggers@google.com>

->dirty_inode is now only called when I_DIRTY_INODE (I_DIRTY_SYNC and/or
I_DIRTY_DATASYNC) is set.  However it may still be passed other dirty
flags at the same time, provided that these other flags happened to be
passed to __mark_inode_dirty() at the same time as I_DIRTY_INODE.

This doesn't make sense because there is no reason for filesystems to
care about these extra flags.  Nor are filesystems notified about all
updates to these other flags.

Therefore, mask the flags before passing them to ->dirty_inode.

Also properly document ->dirty_inode in vfs.rst.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 Documentation/filesystems/vfs.rst | 5 ++++-
 fs/fs-writeback.c                 | 2 +-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index ca52c82e5bb54..287b80948a40b 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -270,7 +270,10 @@ or bottom half).
 	->alloc_inode.
 
 ``dirty_inode``
-	this method is called by the VFS to mark an inode dirty.
+	this method is called by the VFS when an inode is marked dirty.
+	This is specifically for the inode itself being marked dirty,
+	not its data.  If the update needs to be persisted by fdatasync(),
+	then I_DIRTY_DATASYNC will be set in the flags argument.
 
 ``write_inode``
 	this method is called when the VFS needs to write an inode to
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index e3347fd6eb13a..f20daf4f5e19b 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2268,7 +2268,7 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 		trace_writeback_dirty_inode_start(inode, flags);
 
 		if (sb->s_op->dirty_inode)
-			sb->s_op->dirty_inode(inode, flags);
+			sb->s_op->dirty_inode(inode, flags & I_DIRTY_INODE);
 
 		trace_writeback_dirty_inode(inode, flags);
 
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 07/13] fs: correctly document the inode dirty flags
  2021-01-05  0:54 [PATCH 00/13] lazytime fixes and cleanups Eric Biggers
                   ` (5 preceding siblings ...)
  2021-01-05  0:54 ` [PATCH 06/13] fs: pass only I_DIRTY_INODE flags to ->dirty_inode Eric Biggers
@ 2021-01-05  0:54 ` Eric Biggers
  2021-01-08  9:03   ` Christoph Hellwig
  2021-01-05  0:54 ` [PATCH 08/13] ext4: simplify i_state checks in __ext4_update_other_inode_time() Eric Biggers
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 37+ messages in thread
From: Eric Biggers @ 2021-01-05  0:54 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-ext4, linux-f2fs-devel, linux-xfs, Theodore Ts'o,
	Christoph Hellwig

From: Eric Biggers <ebiggers@google.com>

The documentation for I_DIRTY_SYNC and I_DIRTY_DATASYNC is a bit
misleading, and I_DIRTY_TIME isn't documented at all.  Fix this.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 include/linux/fs.h | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index fd47deea7c176..45a0303b2aeb6 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2084,8 +2084,8 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
 /*
  * Inode state bits.  Protected by inode->i_lock
  *
- * Three bits determine the dirty state of the inode, I_DIRTY_SYNC,
- * I_DIRTY_DATASYNC and I_DIRTY_PAGES.
+ * Four bits determine the dirty state of the inode: I_DIRTY_SYNC,
+ * I_DIRTY_DATASYNC, I_DIRTY_PAGES, and I_DIRTY_TIME.
  *
  * Four bits define the lifetime of an inode.  Initially, inodes are I_NEW,
  * until that flag is cleared.  I_WILL_FREE, I_FREEING and I_CLEAR are set at
@@ -2094,12 +2094,20 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
  * Two bits are used for locking and completion notification, I_NEW and I_SYNC.
  *
  * I_DIRTY_SYNC		Inode is dirty, but doesn't have to be written on
- *			fdatasync().  i_atime is the usual cause.
- * I_DIRTY_DATASYNC	Data-related inode changes pending. We keep track of
+ *			fdatasync() (unless I_DIRTY_DATASYNC is also set).
+ *			Timestamp updates are the usual cause.
+ * I_DIRTY_DATASYNC	Data-related inode changes pending.  We keep track of
  *			these changes separately from I_DIRTY_SYNC so that we
  *			don't have to write inode on fdatasync() when only
- *			mtime has changed in it.
+ *			e.g. the timestamps have changed.
  * I_DIRTY_PAGES	Inode has dirty pages.  Inode itself may be clean.
+ * I_DIRTY_TIME		The inode itself only has dirty timestamps, and the
+ *			lazytime mount option is enabled.  We keep track of this
+ *			separately from I_DIRTY_SYNC in order to implement
+ *			lazytime.  This gets cleared if I_DIRTY_INODE
+ *			(I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set.  I.e.
+ *			either I_DIRTY_TIME *or* I_DIRTY_INODE can be set in
+ *			i_state, but not both.  I_DIRTY_PAGES may still be set.
  * I_NEW		Serves as both a mutex and completion notification.
  *			New inodes set I_NEW.  If two processes both create
  *			the same inode, one of them will release its inode and
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 08/13] ext4: simplify i_state checks in __ext4_update_other_inode_time()
  2021-01-05  0:54 [PATCH 00/13] lazytime fixes and cleanups Eric Biggers
                   ` (6 preceding siblings ...)
  2021-01-05  0:54 ` [PATCH 07/13] fs: correctly document the inode dirty flags Eric Biggers
@ 2021-01-05  0:54 ` Eric Biggers
  2021-01-07 13:24   ` Jan Kara
  2021-01-05  0:54 ` [PATCH 09/13] fs: drop redundant checks from __writeback_single_inode() Eric Biggers
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 37+ messages in thread
From: Eric Biggers @ 2021-01-05  0:54 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-ext4, linux-f2fs-devel, linux-xfs, Theodore Ts'o,
	Christoph Hellwig

From: Eric Biggers <ebiggers@google.com>

Since I_DIRTY_TIME and I_DIRTY_INODE are mutually exclusive in i_state,
there's no need to check for I_DIRTY_TIME && !I_DIRTY_INODE.  Just check
for I_DIRTY_TIME.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 fs/ext4/inode.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 4cc6c7834312f..9e34541715968 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4962,14 +4962,12 @@ static void __ext4_update_other_inode_time(struct super_block *sb,
 		return;
 
 	if ((inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW |
-			       I_DIRTY_INODE)) ||
-	    ((inode->i_state & I_DIRTY_TIME) == 0))
+			       I_DIRTY_TIME)) != I_DIRTY_TIME)
 		return;
 
 	spin_lock(&inode->i_lock);
-	if (((inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW |
-				I_DIRTY_INODE)) == 0) &&
-	    (inode->i_state & I_DIRTY_TIME)) {
+	if ((inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW |
+			       I_DIRTY_TIME)) != I_DIRTY_TIME) {
 		struct ext4_inode_info	*ei = EXT4_I(inode);
 
 		inode->i_state &= ~I_DIRTY_TIME;
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 09/13] fs: drop redundant checks from __writeback_single_inode()
  2021-01-05  0:54 [PATCH 00/13] lazytime fixes and cleanups Eric Biggers
                   ` (7 preceding siblings ...)
  2021-01-05  0:54 ` [PATCH 08/13] ext4: simplify i_state checks in __ext4_update_other_inode_time() Eric Biggers
@ 2021-01-05  0:54 ` Eric Biggers
  2021-01-08  9:12   ` Christoph Hellwig
  2021-01-05  0:54 ` [PATCH 10/13] fs: clean up __mark_inode_dirty() a bit Eric Biggers
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 37+ messages in thread
From: Eric Biggers @ 2021-01-05  0:54 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-ext4, linux-f2fs-devel, linux-xfs, Theodore Ts'o,
	Christoph Hellwig

From: Eric Biggers <ebiggers@google.com>

I_DIRTY_TIME and I_DIRTY_INODE are mutually exclusive in i_state.  So
after seeing that I_DIRTY_TIME is set, there's no point in checking for
I_DIRTY_INODE, as it must be clear.

Separately, wbc->for_sync implies wbc->sync_mode == WB_SYNC_ALL.
So there's no need to check for both.  Just check for WB_SYNC_ALL.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 fs/fs-writeback.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index f20daf4f5e19b..3f5a589399afe 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1482,8 +1482,7 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
 
 	dirty = inode->i_state & I_DIRTY;
 	if ((inode->i_state & I_DIRTY_TIME) &&
-	    ((dirty & I_DIRTY_INODE) ||
-	     wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync ||
+	    (wbc->sync_mode == WB_SYNC_ALL ||
 	     time_after(jiffies, inode->dirtied_time_when +
 			dirtytime_expire_interval * HZ))) {
 		dirty |= I_DIRTY_TIME;
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 10/13] fs: clean up __mark_inode_dirty() a bit
  2021-01-05  0:54 [PATCH 00/13] lazytime fixes and cleanups Eric Biggers
                   ` (8 preceding siblings ...)
  2021-01-05  0:54 ` [PATCH 09/13] fs: drop redundant checks from __writeback_single_inode() Eric Biggers
@ 2021-01-05  0:54 ` Eric Biggers
  2021-01-08  9:10   ` Christoph Hellwig
  2021-01-05  0:54 ` [PATCH 11/13] fs: add a lazytime_expired method Eric Biggers
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 37+ messages in thread
From: Eric Biggers @ 2021-01-05  0:54 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-ext4, linux-f2fs-devel, linux-xfs, Theodore Ts'o,
	Christoph Hellwig

From: Eric Biggers <ebiggers@google.com>

Improve some comments, and don't bother checking for the I_DIRTY_TIME
flag in the case where we just cleared it.

Also, warn if I_DIRTY_TIME and I_DIRTY_PAGES are passed to
__mark_inode_dirty() at the same time, as this case isn't handled.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 fs/fs-writeback.c | 49 +++++++++++++++++++++++++++++------------------
 1 file changed, 30 insertions(+), 19 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 3f5a589399afe..ed76112bd067b 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2227,23 +2227,24 @@ static noinline void block_dump___mark_inode_dirty(struct inode *inode)
 }
 
 /**
- * __mark_inode_dirty -	internal function
+ * __mark_inode_dirty -	internal function to mark an inode dirty
  *
  * @inode: inode to mark
- * @flags: what kind of dirty (i.e. I_DIRTY_SYNC)
+ * @flags: what kind of dirty, e.g. I_DIRTY_SYNC.  This can be a combination of
+ *	   multiple I_DIRTY_* flags, except that I_DIRTY_TIME can't be combined
+ *	   with I_DIRTY_PAGES.
  *
- * Mark an inode as dirty. Callers should use mark_inode_dirty or
- * mark_inode_dirty_sync.
+ * Mark an inode as dirty.  We notify the filesystem, then update the inode's
+ * dirty flags.  Then, if needed we add the inode to the appropriate dirty list.
  *
- * Put the inode on the super block's dirty list.
+ * Most callers should use mark_inode_dirty() or mark_inode_dirty_sync()
+ * instead of calling this directly.
  *
- * CAREFUL! We mark it dirty unconditionally, but move it onto the
- * dirty list only if it is hashed or if it refers to a blockdev.
- * If it was not hashed, it will never be added to the dirty list
- * even if it is later hashed, as it will have been marked dirty already.
+ * CAREFUL!  We only add the inode to the dirty list if it is hashed or if it
+ * refers to a blockdev.  Unhashed inodes will never be added to the dirty list
+ * even if they are later hashed, as they will have been marked dirty already.
  *
- * In short, make sure you hash any inodes _before_ you start marking
- * them dirty.
+ * In short, ensure you hash any inodes _before_ you start marking them dirty.
  *
  * Note that for blockdevs, inode->dirtied_when represents the dirtying time of
  * the block-special inode (/dev/hda1) itself.  And the ->dirtied_when field of
@@ -2255,25 +2256,34 @@ static noinline void block_dump___mark_inode_dirty(struct inode *inode)
 void __mark_inode_dirty(struct inode *inode, int flags)
 {
 	struct super_block *sb = inode->i_sb;
-	int dirtytime;
+	int dirtytime = 0;
 
 	trace_writeback_mark_inode_dirty(inode, flags);
 
-	/*
-	 * Don't do this for I_DIRTY_PAGES - that doesn't actually
-	 * dirty the inode itself
-	 */
 	if (flags & I_DIRTY_INODE) {
+		/*
+		 * Notify the filesystem about the inode being dirtied, so that
+		 * (if needed) it can update on-disk fields and journal the
+		 * inode.  This is only needed when the inode itself is being
+		 * dirtied now.  I.e. it's only needed for I_DIRTY_INODE, not
+		 * for just I_DIRTY_PAGES or I_DIRTY_TIME.
+		 */
 		trace_writeback_dirty_inode_start(inode, flags);
-
 		if (sb->s_op->dirty_inode)
 			sb->s_op->dirty_inode(inode, flags & I_DIRTY_INODE);
-
 		trace_writeback_dirty_inode(inode, flags);
 
+		/* I_DIRTY_INODE supersedes I_DIRTY_TIME. */
 		flags &= ~I_DIRTY_TIME;
+	} else {
+		/*
+		 * Else it's either I_DIRTY_PAGES, I_DIRTY_TIME, or nothing.
+		 * (We don't support setting both I_DIRTY_PAGES and I_DIRTY_TIME
+		 * in one call to __mark_inode_dirty().)
+		 */
+		dirtytime = flags & I_DIRTY_TIME;
+		WARN_ON_ONCE(dirtytime && (flags != I_DIRTY_TIME));
 	}
-	dirtytime = flags & I_DIRTY_TIME;
 
 	/*
 	 * Paired with smp_mb() in __writeback_single_inode() for the
@@ -2296,6 +2306,7 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 
 		inode_attach_wb(inode, NULL);
 
+		/* I_DIRTY_INODE supersedes I_DIRTY_TIME. */
 		if (flags & I_DIRTY_INODE)
 			inode->i_state &= ~I_DIRTY_TIME;
 		inode->i_state |= flags;
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 11/13] fs: add a lazytime_expired method
  2021-01-05  0:54 [PATCH 00/13] lazytime fixes and cleanups Eric Biggers
                   ` (9 preceding siblings ...)
  2021-01-05  0:54 ` [PATCH 10/13] fs: clean up __mark_inode_dirty() a bit Eric Biggers
@ 2021-01-05  0:54 ` Eric Biggers
  2021-01-07 14:02   ` Jan Kara
  2021-01-05  0:54 ` [PATCH 12/13] xfs: remove a stale comment from xfs_file_aio_write_checks() Eric Biggers
  2021-01-05  0:54 ` [PATCH 13/13] xfs: implement lazytime_expired Eric Biggers
  12 siblings, 1 reply; 37+ messages in thread
From: Eric Biggers @ 2021-01-05  0:54 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-ext4, linux-f2fs-devel, linux-xfs, Theodore Ts'o,
	Christoph Hellwig

From: Eric Biggers <ebiggers@google.com>

Add a lazytime_expired method to 'struct super_operations'.  Filesystems
can implement this to be notified when an inode's lazytime timestamps
have expired and need to be written to disk.

This avoids any potential ambiguity with
->dirty_inode(inode, I_DIRTY_SYNC), which can also mean a generic
dirtying of the inode, not just a lazytime timestamp expiration.
In particular, this will be useful for XFS.

If not implemented, then ->dirty_inode(inode, I_DIRTY_SYNC) continues to
be called.

Note that there are three cases where we have to make sure to call
lazytime_expired():

- __writeback_single_inode(): inode is being written now
- vfs_fsync_range(): inode is going to be synced
- iput(): inode is going to be evicted

In the latter two cases, the inode still needs to be put on the
writeback list.  So, we can't just replace the calls to
mark_inode_dirty_sync() with lazytime_expired().  Instead, add a new
flag I_DIRTY_TIME_EXPIRED which can be passed to __mark_inode_dirty().
It's like I_DIRTY_SYNC, except it causes the filesystem to be notified
of a lazytime expiration rather than a generic I_DIRTY_SYNC.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 Documentation/filesystems/locking.rst |  2 ++
 Documentation/filesystems/vfs.rst     | 10 ++++++++++
 fs/fs-writeback.c                     | 27 ++++++++++++++++++++++-----
 fs/inode.c                            |  2 +-
 fs/sync.c                             |  2 +-
 include/linux/fs.h                    |  7 ++++++-
 6 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index c0f2c7586531b..53088e2a93b69 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -150,6 +150,7 @@ prototypes::
 	void (*free_inode)(struct inode *);
 	void (*destroy_inode)(struct inode *);
 	void (*dirty_inode) (struct inode *, int flags);
+	void (*lazytime_expired) (struct inode *);
 	int (*write_inode) (struct inode *, struct writeback_control *wbc);
 	int (*drop_inode) (struct inode *);
 	void (*evict_inode) (struct inode *);
@@ -175,6 +176,7 @@ alloc_inode:
 free_inode:				called from RCU callback
 destroy_inode:
 dirty_inode:
+lazytime_expired:
 write_inode:
 drop_inode:				!!!inode->i_lock!!!
 evict_inode:
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 287b80948a40b..02531b1421d01 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -231,6 +231,7 @@ filesystem.  As of kernel 2.6.22, the following members are defined:
 		void (*destroy_inode)(struct inode *);
 
 		void (*dirty_inode) (struct inode *, int flags);
+		void (*lazytime_expired) (struct inode *);
 		int (*write_inode) (struct inode *, int);
 		void (*drop_inode) (struct inode *);
 		void (*delete_inode) (struct inode *);
@@ -275,6 +276,15 @@ or bottom half).
 	not its data.  If the update needs to be persisted by fdatasync(),
 	then I_DIRTY_DATASYNC will be set in the flags argument.
 
+``lazytime_expired``
+	when the lazytime mount option is enabled, this method is
+	called when an inode's in-memory updated timestamps have
+	expired and thus need to be written to disk.  This happens
+	when the timestamps have been in memory for too long, when the
+	inode is going to be evicted, or when userspace triggers a
+	sync.  If this method is not implemented, then
+	->dirty_inode(inode, I_DIRTY_SYNC) is called instead.
+
 ``write_inode``
 	this method is called when the VFS needs to write an inode to
 	disc.  The second parameter indicates whether the write should
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index ed76112bd067b..f187fc3f854e4 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1441,6 +1441,14 @@ static void requeue_inode(struct inode *inode, struct bdi_writeback *wb,
 	}
 }
 
+static void lazytime_expired(struct inode *inode)
+{
+	if (inode->i_sb->s_op->lazytime_expired)
+		inode->i_sb->s_op->lazytime_expired(inode);
+	else if (inode->i_sb->s_op->dirty_inode)
+		inode->i_sb->s_op->dirty_inode(inode, I_DIRTY_SYNC);
+}
+
 /*
  * Write out an inode and its dirty pages. Do not update the writeback list
  * linkage. That is left to the caller. The caller is also responsible for
@@ -1520,8 +1528,8 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
 		 * isn't enough.  Don't call mark_inode_dirty_sync(), as that
 		 * would put the inode back on the dirty list.
 		 */
-		if ((dirty & I_DIRTY_TIME) && inode->i_sb->s_op->dirty_inode)
-			inode->i_sb->s_op->dirty_inode(inode, I_DIRTY_SYNC);
+		if (dirty & I_DIRTY_TIME)
+			lazytime_expired(inode);
 
 		err = write_inode(inode, wbc);
 		if (ret == 0)
@@ -2231,8 +2239,9 @@ static noinline void block_dump___mark_inode_dirty(struct inode *inode)
  *
  * @inode: inode to mark
  * @flags: what kind of dirty, e.g. I_DIRTY_SYNC.  This can be a combination of
- *	   multiple I_DIRTY_* flags, except that I_DIRTY_TIME can't be combined
- *	   with I_DIRTY_PAGES.
+ *	   multiple I_DIRTY_* flags, except that:
+ *	   - I_DIRTY_TIME can't be combined with I_DIRTY_PAGES
+ *	   - I_DIRTY_TIME_EXPIRED must be used by itself
  *
  * Mark an inode as dirty.  We notify the filesystem, then update the inode's
  * dirty flags.  Then, if needed we add the inode to the appropriate dirty list.
@@ -2260,7 +2269,15 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 
 	trace_writeback_mark_inode_dirty(inode, flags);
 
-	if (flags & I_DIRTY_INODE) {
+	if (flags & I_DIRTY_TIME_EXPIRED) {
+		/*
+		 * Notify the filesystem about a lazytime timestamp expiration.
+		 * Afterwards, this case just turns into I_DIRTY_SYNC.
+		 */
+		WARN_ON_ONCE(flags & ~I_DIRTY_TIME_EXPIRED);
+		lazytime_expired(inode);
+		flags = I_DIRTY_SYNC;
+	} else if (flags & I_DIRTY_INODE) {
 		/*
 		 * Notify the filesystem about the inode being dirtied, so that
 		 * (if needed) it can update on-disk fields and journal the
diff --git a/fs/inode.c b/fs/inode.c
index d0fa43d8e9d5c..039b201a4743a 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1673,7 +1673,7 @@ void iput(struct inode *inode)
 			atomic_inc(&inode->i_count);
 			spin_unlock(&inode->i_lock);
 			trace_writeback_lazytime_iput(inode);
-			mark_inode_dirty_sync(inode);
+			__mark_inode_dirty(inode, I_DIRTY_TIME_EXPIRED);
 			goto retry;
 		}
 		iput_final(inode);
diff --git a/fs/sync.c b/fs/sync.c
index 1373a610dc784..363071a3528e3 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -196,7 +196,7 @@ int vfs_fsync_range(struct file *file, loff_t start, loff_t end, int datasync)
 	if (!file->f_op->fsync)
 		return -EINVAL;
 	if (!datasync && (inode->i_state & I_DIRTY_TIME))
-		mark_inode_dirty_sync(inode);
+		__mark_inode_dirty(inode, I_DIRTY_TIME_EXPIRED);
 	return file->f_op->fsync(file, start, end, datasync);
 }
 EXPORT_SYMBOL(vfs_fsync_range);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 45a0303b2aeb6..8c5f5c5e62be4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1935,7 +1935,8 @@ struct super_operations {
 	void (*destroy_inode)(struct inode *);
 	void (*free_inode)(struct inode *);
 
-   	void (*dirty_inode) (struct inode *, int flags);
+	void (*dirty_inode) (struct inode *, int flags);
+	void (*lazytime_expired)(struct inode *);
 	int (*write_inode) (struct inode *, struct writeback_control *wbc);
 	int (*drop_inode) (struct inode *);
 	void (*evict_inode) (struct inode *);
@@ -2108,6 +2109,9 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
  *			(I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set.  I.e.
  *			either I_DIRTY_TIME *or* I_DIRTY_INODE can be set in
  *			i_state, but not both.  I_DIRTY_PAGES may still be set.
+ * I_DIRTY_TIME_EXPIRED Passed to __mark_inode_dirty() to indicate the intent to
+ *			expire the inode's timestamps.  Not stored in i_state.
+ *
  * I_NEW		Serves as both a mutex and completion notification.
  *			New inodes set I_NEW.  If two processes both create
  *			the same inode, one of them will release its inode and
@@ -2173,6 +2177,7 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
 #define I_DIO_WAKEUP		(1 << __I_DIO_WAKEUP)
 #define I_LINKABLE		(1 << 10)
 #define I_DIRTY_TIME		(1 << 11)
+#define I_DIRTY_TIME_EXPIRED	(1 << 12)
 #define I_WB_SWITCH		(1 << 13)
 #define I_OVL_INUSE		(1 << 14)
 #define I_CREATING		(1 << 15)
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 12/13] xfs: remove a stale comment from xfs_file_aio_write_checks()
  2021-01-05  0:54 [PATCH 00/13] lazytime fixes and cleanups Eric Biggers
                   ` (10 preceding siblings ...)
  2021-01-05  0:54 ` [PATCH 11/13] fs: add a lazytime_expired method Eric Biggers
@ 2021-01-05  0:54 ` Eric Biggers
  2021-01-08  9:15   ` Christoph Hellwig
  2021-01-05  0:54 ` [PATCH 13/13] xfs: implement lazytime_expired Eric Biggers
  12 siblings, 1 reply; 37+ messages in thread
From: Eric Biggers @ 2021-01-05  0:54 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-ext4, linux-f2fs-devel, linux-xfs, Theodore Ts'o,
	Christoph Hellwig

From: Eric Biggers <ebiggers@google.com>

The comment in xfs_file_aio_write_checks() about calling file_modified()
after dropping the ilock doesn't make sense, because the code that
unconditionally acquires and drops the ilock was removed by
commit 467f78992a07 ("xfs: reduce ilock hold times in
xfs_file_aio_write_checks").

Remove this outdated comment.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 fs/xfs/xfs_file.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 5b0f93f738372..4927c6653f15d 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -389,12 +389,6 @@ xfs_file_aio_write_checks(
 	} else
 		spin_unlock(&ip->i_flags_lock);
 
-	/*
-	 * Updating the timestamps will grab the ilock again from
-	 * xfs_fs_dirty_inode, so we have to call it after dropping the
-	 * lock above.  Eventually we should look into a way to avoid
-	 * the pointless lock roundtrip.
-	 */
 	return file_modified(file);
 }
 
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 13/13] xfs: implement lazytime_expired
  2021-01-05  0:54 [PATCH 00/13] lazytime fixes and cleanups Eric Biggers
                   ` (11 preceding siblings ...)
  2021-01-05  0:54 ` [PATCH 12/13] xfs: remove a stale comment from xfs_file_aio_write_checks() Eric Biggers
@ 2021-01-05  0:54 ` Eric Biggers
  12 siblings, 0 replies; 37+ messages in thread
From: Eric Biggers @ 2021-01-05  0:54 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-ext4, linux-f2fs-devel, linux-xfs, Theodore Ts'o,
	Christoph Hellwig

From: Eric Biggers <ebiggers@google.com>

Implement the new ->lazytime_expired method to get notified of lazytime
timestamp expirations, instead of relying on ->dirty_inode(inode,
I_DIRTY_SYNC) which is potentially ambiguous.

This fixes a bug where XFS didn't write lazytime timestamps to disk upon
a sync(), or after 24 hours (dirtytime_expire_interval * 2).  This is
because it only wrote the timestamps if I_DIRTY_TIME was set in i_state.
But actually when an inode's timestamps expire without the inode being
marked I_DIRTY_SYNC first, then ->dirty_inode isn't called until
__writeback_single_inode() has already cleared I_DIRTY_TIME in i_state.

The new ->lazytime_expired method is unambiguous, so it removes any need
to check for I_DIRTY_TIME, which avoids this bug.

I've written an xfstest which reproduces this bug.

Fixes: c3b1b13190ae ("xfs: implement the lazytime mount option")
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 fs/xfs/xfs_super.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 813be879a5e51..0b7623907b264 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -666,19 +666,13 @@ xfs_fs_destroy_inode(
 }
 
 static void
-xfs_fs_dirty_inode(
-	struct inode			*inode,
-	int				flag)
+xfs_fs_lazytime_expired(
+	struct inode			*inode)
 {
 	struct xfs_inode		*ip = XFS_I(inode);
 	struct xfs_mount		*mp = ip->i_mount;
 	struct xfs_trans		*tp;
 
-	if (!(inode->i_sb->s_flags & SB_LAZYTIME))
-		return;
-	if (flag != I_DIRTY_SYNC || !(inode->i_state & I_DIRTY_TIME))
-		return;
-
 	if (xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp))
 		return;
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
@@ -1108,7 +1102,7 @@ xfs_fs_free_cached_objects(
 static const struct super_operations xfs_super_operations = {
 	.alloc_inode		= xfs_fs_alloc_inode,
 	.destroy_inode		= xfs_fs_destroy_inode,
-	.dirty_inode		= xfs_fs_dirty_inode,
+	.lazytime_expired	= xfs_fs_lazytime_expired,
 	.drop_inode		= xfs_fs_drop_inode,
 	.put_super		= xfs_fs_put_super,
 	.sync_fs		= xfs_fs_sync_fs,
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH 04/13] fat: only specify I_DIRTY_TIME when needed in fat_update_time()
  2021-01-05  0:54 ` [PATCH 04/13] fat: only specify I_DIRTY_TIME when needed in fat_update_time() Eric Biggers
@ 2021-01-07 13:13   ` Jan Kara
  2021-01-07 19:10     ` Eric Biggers
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Kara @ 2021-01-07 13:13 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-fsdevel, linux-ext4, linux-f2fs-devel, linux-xfs,
	Theodore Ts'o, Christoph Hellwig

On Mon 04-01-21 16:54:43, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> As was done for generic_update_time(), only pass I_DIRTY_TIME to
> __mark_inode_dirty() when the inode's timestamps were actually updated
> and lazytime is enabled.  This avoids a weird edge case where
> I_DIRTY_TIME could be set in i_state when lazytime isn't enabled.
> 
> Signed-off-by: Eric Biggers <ebiggers@google.com>

...
> +	if ((flags & S_VERSION) && inode_maybe_inc_iversion(inode, false))
> +		dirty_flags |= I_DIRTY_SYNC;
>  
> -	if (dirty)
> -		iflags |= I_DIRTY_SYNC;
>  	__mark_inode_dirty(inode, iflags);
				  ^^^ dirty_flags here?

Otherwise the change looks good to me.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 05/13] fs: don't call ->dirty_inode for lazytime timestamp updates
  2021-01-05  0:54 ` [PATCH 05/13] fs: don't call ->dirty_inode for lazytime timestamp updates Eric Biggers
@ 2021-01-07 13:17   ` Jan Kara
  2021-01-07 13:18     ` Jan Kara
  2021-01-08  9:02   ` Christoph Hellwig
  1 sibling, 1 reply; 37+ messages in thread
From: Jan Kara @ 2021-01-07 13:17 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-fsdevel, linux-ext4, linux-f2fs-devel, linux-xfs,
	Theodore Ts'o, Christoph Hellwig

On Mon 04-01-21 16:54:44, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> There is no need to call ->dirty_inode for lazytime timestamp updates
> (i.e. for __mark_inode_dirty(I_DIRTY_TIME)), since by the definition of
> lazytime, filesystems must ignore these updates.  Filesystems only need
> to care about the updated timestamps when they expire.
> 
> Therefore, only call ->dirty_inode when I_DIRTY_INODE is set.
> 
> Based on a patch from Christoph Hellwig:
> https://lore.kernel.org/r/20200325122825.1086872-4-hch@lst.de
> 
> Signed-off-by: Eric Biggers <ebiggers@google.com>

...

> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 081e335cdee47..e3347fd6eb13a 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -2264,16 +2264,16 @@ void __mark_inode_dirty(struct inode *inode, int flags)
>  	 * Don't do this for I_DIRTY_PAGES - that doesn't actually
>  	 * dirty the inode itself
>  	 */
> -	if (flags & (I_DIRTY_INODE | I_DIRTY_TIME)) {
> +	if (flags & I_DIRTY_INODE) {
>  		trace_writeback_dirty_inode_start(inode, flags);
>  
>  		if (sb->s_op->dirty_inode)
>  			sb->s_op->dirty_inode(inode, flags);

OK, but shouldn't we pass just (flags & I_DIRTY_INODE) to ->dirty_inode().
Just to make it clear what the filesystem is supposed to consume in
'flags'...

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 05/13] fs: don't call ->dirty_inode for lazytime timestamp updates
  2021-01-07 13:17   ` Jan Kara
@ 2021-01-07 13:18     ` Jan Kara
  0 siblings, 0 replies; 37+ messages in thread
From: Jan Kara @ 2021-01-07 13:18 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-fsdevel, linux-ext4, linux-f2fs-devel, linux-xfs,
	Theodore Ts'o, Christoph Hellwig

On Thu 07-01-21 14:17:53, Jan Kara wrote:
> On Mon 04-01-21 16:54:44, Eric Biggers wrote:
> > From: Eric Biggers <ebiggers@google.com>
> > 
> > There is no need to call ->dirty_inode for lazytime timestamp updates
> > (i.e. for __mark_inode_dirty(I_DIRTY_TIME)), since by the definition of
> > lazytime, filesystems must ignore these updates.  Filesystems only need
> > to care about the updated timestamps when they expire.
> > 
> > Therefore, only call ->dirty_inode when I_DIRTY_INODE is set.
> > 
> > Based on a patch from Christoph Hellwig:
> > https://lore.kernel.org/r/20200325122825.1086872-4-hch@lst.de
> > 
> > Signed-off-by: Eric Biggers <ebiggers@google.com>
> 
> ...
> 
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index 081e335cdee47..e3347fd6eb13a 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -2264,16 +2264,16 @@ void __mark_inode_dirty(struct inode *inode, int flags)
> >  	 * Don't do this for I_DIRTY_PAGES - that doesn't actually
> >  	 * dirty the inode itself
> >  	 */
> > -	if (flags & (I_DIRTY_INODE | I_DIRTY_TIME)) {
> > +	if (flags & I_DIRTY_INODE) {
> >  		trace_writeback_dirty_inode_start(inode, flags);
> >  
> >  		if (sb->s_op->dirty_inode)
> >  			sb->s_op->dirty_inode(inode, flags);
> 
> OK, but shouldn't we pass just (flags & I_DIRTY_INODE) to ->dirty_inode().
> Just to make it clear what the filesystem is supposed to consume in
> 'flags'...

Aha, you just did that in the following patch ;) So taking back my comment.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 08/13] ext4: simplify i_state checks in __ext4_update_other_inode_time()
  2021-01-05  0:54 ` [PATCH 08/13] ext4: simplify i_state checks in __ext4_update_other_inode_time() Eric Biggers
@ 2021-01-07 13:24   ` Jan Kara
  2021-01-07 19:06     ` Eric Biggers
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Kara @ 2021-01-07 13:24 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-fsdevel, linux-ext4, linux-f2fs-devel, linux-xfs,
	Theodore Ts'o, Christoph Hellwig

On Mon 04-01-21 16:54:47, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> Since I_DIRTY_TIME and I_DIRTY_INODE are mutually exclusive in i_state,
> there's no need to check for I_DIRTY_TIME && !I_DIRTY_INODE.  Just check
> for I_DIRTY_TIME.
> 
> Signed-off-by: Eric Biggers <ebiggers@google.com>
> ---
>  fs/ext4/inode.c | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 4cc6c7834312f..9e34541715968 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4962,14 +4962,12 @@ static void __ext4_update_other_inode_time(struct super_block *sb,
>  		return;
>  
>  	if ((inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW |
> -			       I_DIRTY_INODE)) ||
> -	    ((inode->i_state & I_DIRTY_TIME) == 0))
> +			       I_DIRTY_TIME)) != I_DIRTY_TIME)
>  		return;

This is OK.

>  	spin_lock(&inode->i_lock);
> -	if (((inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW |
> -				I_DIRTY_INODE)) == 0) &&
> -	    (inode->i_state & I_DIRTY_TIME)) {
> +	if ((inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW |
> +			       I_DIRTY_TIME)) != I_DIRTY_TIME) {

But this condition is negated AFAICT. We should have == I_DIRTY_TIME here
AFAICT.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 11/13] fs: add a lazytime_expired method
  2021-01-05  0:54 ` [PATCH 11/13] fs: add a lazytime_expired method Eric Biggers
@ 2021-01-07 14:02   ` Jan Kara
  2021-01-07 22:05     ` Eric Biggers
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Kara @ 2021-01-07 14:02 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-fsdevel, linux-ext4, linux-f2fs-devel, linux-xfs,
	Theodore Ts'o, Christoph Hellwig

On Mon 04-01-21 16:54:50, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> Add a lazytime_expired method to 'struct super_operations'.  Filesystems
> can implement this to be notified when an inode's lazytime timestamps
> have expired and need to be written to disk.
> 
> This avoids any potential ambiguity with
> ->dirty_inode(inode, I_DIRTY_SYNC), which can also mean a generic
> dirtying of the inode, not just a lazytime timestamp expiration.
> In particular, this will be useful for XFS.
> 
> If not implemented, then ->dirty_inode(inode, I_DIRTY_SYNC) continues to
> be called.
> 
> Note that there are three cases where we have to make sure to call
> lazytime_expired():
> 
> - __writeback_single_inode(): inode is being written now
> - vfs_fsync_range(): inode is going to be synced
> - iput(): inode is going to be evicted
> 
> In the latter two cases, the inode still needs to be put on the
> writeback list.  So, we can't just replace the calls to
> mark_inode_dirty_sync() with lazytime_expired().  Instead, add a new
> flag I_DIRTY_TIME_EXPIRED which can be passed to __mark_inode_dirty().
> It's like I_DIRTY_SYNC, except it causes the filesystem to be notified
> of a lazytime expiration rather than a generic I_DIRTY_SYNC.
> 
> Signed-off-by: Eric Biggers <ebiggers@google.com>

Hum, seeing this patch I kind of wonder: Why don't we dirty the inode after
expiring the lazytime timestamps with I_DIRTY_SYNC | I_DIRTY_TIME_EXPIRED
and propagate I_DIRTY_TIME_EXPIRED even to ->dirty_inode() where XFS can
catch it and act? Functionally it would be the same but we'd save a bunch
of generic code and ->lazytime_expired helper used just by a single
filesystem...

								Honza

> ---
>  Documentation/filesystems/locking.rst |  2 ++
>  Documentation/filesystems/vfs.rst     | 10 ++++++++++
>  fs/fs-writeback.c                     | 27 ++++++++++++++++++++++-----
>  fs/inode.c                            |  2 +-
>  fs/sync.c                             |  2 +-
>  include/linux/fs.h                    |  7 ++++++-
>  6 files changed, 42 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
> index c0f2c7586531b..53088e2a93b69 100644
> --- a/Documentation/filesystems/locking.rst
> +++ b/Documentation/filesystems/locking.rst
> @@ -150,6 +150,7 @@ prototypes::
>  	void (*free_inode)(struct inode *);
>  	void (*destroy_inode)(struct inode *);
>  	void (*dirty_inode) (struct inode *, int flags);
> +	void (*lazytime_expired) (struct inode *);
>  	int (*write_inode) (struct inode *, struct writeback_control *wbc);
>  	int (*drop_inode) (struct inode *);
>  	void (*evict_inode) (struct inode *);
> @@ -175,6 +176,7 @@ alloc_inode:
>  free_inode:				called from RCU callback
>  destroy_inode:
>  dirty_inode:
> +lazytime_expired:
>  write_inode:
>  drop_inode:				!!!inode->i_lock!!!
>  evict_inode:
> diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
> index 287b80948a40b..02531b1421d01 100644
> --- a/Documentation/filesystems/vfs.rst
> +++ b/Documentation/filesystems/vfs.rst
> @@ -231,6 +231,7 @@ filesystem.  As of kernel 2.6.22, the following members are defined:
>  		void (*destroy_inode)(struct inode *);
>  
>  		void (*dirty_inode) (struct inode *, int flags);
> +		void (*lazytime_expired) (struct inode *);
>  		int (*write_inode) (struct inode *, int);
>  		void (*drop_inode) (struct inode *);
>  		void (*delete_inode) (struct inode *);
> @@ -275,6 +276,15 @@ or bottom half).
>  	not its data.  If the update needs to be persisted by fdatasync(),
>  	then I_DIRTY_DATASYNC will be set in the flags argument.
>  
> +``lazytime_expired``
> +	when the lazytime mount option is enabled, this method is
> +	called when an inode's in-memory updated timestamps have
> +	expired and thus need to be written to disk.  This happens
> +	when the timestamps have been in memory for too long, when the
> +	inode is going to be evicted, or when userspace triggers a
> +	sync.  If this method is not implemented, then
> +	->dirty_inode(inode, I_DIRTY_SYNC) is called instead.
> +
>  ``write_inode``
>  	this method is called when the VFS needs to write an inode to
>  	disc.  The second parameter indicates whether the write should
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index ed76112bd067b..f187fc3f854e4 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -1441,6 +1441,14 @@ static void requeue_inode(struct inode *inode, struct bdi_writeback *wb,
>  	}
>  }
>  
> +static void lazytime_expired(struct inode *inode)
> +{
> +	if (inode->i_sb->s_op->lazytime_expired)
> +		inode->i_sb->s_op->lazytime_expired(inode);
> +	else if (inode->i_sb->s_op->dirty_inode)
> +		inode->i_sb->s_op->dirty_inode(inode, I_DIRTY_SYNC);
> +}
> +
>  /*
>   * Write out an inode and its dirty pages. Do not update the writeback list
>   * linkage. That is left to the caller. The caller is also responsible for
> @@ -1520,8 +1528,8 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
>  		 * isn't enough.  Don't call mark_inode_dirty_sync(), as that
>  		 * would put the inode back on the dirty list.
>  		 */
> -		if ((dirty & I_DIRTY_TIME) && inode->i_sb->s_op->dirty_inode)
> -			inode->i_sb->s_op->dirty_inode(inode, I_DIRTY_SYNC);
> +		if (dirty & I_DIRTY_TIME)
> +			lazytime_expired(inode);
>  
>  		err = write_inode(inode, wbc);
>  		if (ret == 0)
> @@ -2231,8 +2239,9 @@ static noinline void block_dump___mark_inode_dirty(struct inode *inode)
>   *
>   * @inode: inode to mark
>   * @flags: what kind of dirty, e.g. I_DIRTY_SYNC.  This can be a combination of
> - *	   multiple I_DIRTY_* flags, except that I_DIRTY_TIME can't be combined
> - *	   with I_DIRTY_PAGES.
> + *	   multiple I_DIRTY_* flags, except that:
> + *	   - I_DIRTY_TIME can't be combined with I_DIRTY_PAGES
> + *	   - I_DIRTY_TIME_EXPIRED must be used by itself
>   *
>   * Mark an inode as dirty.  We notify the filesystem, then update the inode's
>   * dirty flags.  Then, if needed we add the inode to the appropriate dirty list.
> @@ -2260,7 +2269,15 @@ void __mark_inode_dirty(struct inode *inode, int flags)
>  
>  	trace_writeback_mark_inode_dirty(inode, flags);
>  
> -	if (flags & I_DIRTY_INODE) {
> +	if (flags & I_DIRTY_TIME_EXPIRED) {
> +		/*
> +		 * Notify the filesystem about a lazytime timestamp expiration.
> +		 * Afterwards, this case just turns into I_DIRTY_SYNC.
> +		 */
> +		WARN_ON_ONCE(flags & ~I_DIRTY_TIME_EXPIRED);
> +		lazytime_expired(inode);
> +		flags = I_DIRTY_SYNC;
> +	} else if (flags & I_DIRTY_INODE) {
>  		/*
>  		 * Notify the filesystem about the inode being dirtied, so that
>  		 * (if needed) it can update on-disk fields and journal the
> diff --git a/fs/inode.c b/fs/inode.c
> index d0fa43d8e9d5c..039b201a4743a 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1673,7 +1673,7 @@ void iput(struct inode *inode)
>  			atomic_inc(&inode->i_count);
>  			spin_unlock(&inode->i_lock);
>  			trace_writeback_lazytime_iput(inode);
> -			mark_inode_dirty_sync(inode);
> +			__mark_inode_dirty(inode, I_DIRTY_TIME_EXPIRED);
>  			goto retry;
>  		}
>  		iput_final(inode);
> diff --git a/fs/sync.c b/fs/sync.c
> index 1373a610dc784..363071a3528e3 100644
> --- a/fs/sync.c
> +++ b/fs/sync.c
> @@ -196,7 +196,7 @@ int vfs_fsync_range(struct file *file, loff_t start, loff_t end, int datasync)
>  	if (!file->f_op->fsync)
>  		return -EINVAL;
>  	if (!datasync && (inode->i_state & I_DIRTY_TIME))
> -		mark_inode_dirty_sync(inode);
> +		__mark_inode_dirty(inode, I_DIRTY_TIME_EXPIRED);
>  	return file->f_op->fsync(file, start, end, datasync);
>  }
>  EXPORT_SYMBOL(vfs_fsync_range);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 45a0303b2aeb6..8c5f5c5e62be4 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1935,7 +1935,8 @@ struct super_operations {
>  	void (*destroy_inode)(struct inode *);
>  	void (*free_inode)(struct inode *);
>  
> -   	void (*dirty_inode) (struct inode *, int flags);
> +	void (*dirty_inode) (struct inode *, int flags);
> +	void (*lazytime_expired)(struct inode *);
>  	int (*write_inode) (struct inode *, struct writeback_control *wbc);
>  	int (*drop_inode) (struct inode *);
>  	void (*evict_inode) (struct inode *);
> @@ -2108,6 +2109,9 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
>   *			(I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set.  I.e.
>   *			either I_DIRTY_TIME *or* I_DIRTY_INODE can be set in
>   *			i_state, but not both.  I_DIRTY_PAGES may still be set.
> + * I_DIRTY_TIME_EXPIRED Passed to __mark_inode_dirty() to indicate the intent to
> + *			expire the inode's timestamps.  Not stored in i_state.
> + *
>   * I_NEW		Serves as both a mutex and completion notification.
>   *			New inodes set I_NEW.  If two processes both create
>   *			the same inode, one of them will release its inode and
> @@ -2173,6 +2177,7 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
>  #define I_DIO_WAKEUP		(1 << __I_DIO_WAKEUP)
>  #define I_LINKABLE		(1 << 10)
>  #define I_DIRTY_TIME		(1 << 11)
> +#define I_DIRTY_TIME_EXPIRED	(1 << 12)
>  #define I_WB_SWITCH		(1 << 13)
>  #define I_OVL_INUSE		(1 << 14)
>  #define I_CREATING		(1 << 15)
> -- 
> 2.30.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 01/13] fs: avoid double-writing inodes on lazytime expiration
  2021-01-05  0:54 ` [PATCH 01/13] fs: avoid double-writing inodes on lazytime expiration Eric Biggers
@ 2021-01-07 14:47   ` Jan Kara
  2021-01-07 14:58     ` Matthew Wilcox
                       ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Jan Kara @ 2021-01-07 14:47 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-fsdevel, linux-ext4, linux-f2fs-devel, linux-xfs,
	Theodore Ts'o, Christoph Hellwig, stable

[-- Attachment #1: Type: text/plain, Size: 4396 bytes --]

On Mon 04-01-21 16:54:40, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> When lazytime is enabled and an inode with dirty timestamps is being
> expired, either due to dirtytime_expire_interval being exceeded or due
> to a sync or syncfs system call, we need to inform the filesystem that
> the inode is dirty so that the inode's timestamps can be copied out to
> the on-disk data structures.  That's because if the filesystem supports
> lazytime, it will have ignored the ->dirty_inode(inode, I_DIRTY_TIME)
> notification when the timestamp was modified in memory.
> 
> Currently this is accomplished by calling mark_inode_dirty_sync() from
> __writeback_single_inode().  However, this has the unfortunate side
> effect of also putting the inode the writeback list.  That's not
> appropriate in this case, since the inode is already being written.
> 
> That causes the inode to remain dirty after a sync.  Normally that's
> just wasteful, as it causes the inode to be written twice.  But when
> fscrypt is used this bug also partially breaks the
> FS_IOC_REMOVE_ENCRYPTION_KEY ioctl, as the ioctl reports that files are
> still in-use when they aren't.  For more details, see the original
> report at https://lore.kernel.org/r/20200306004555.GB225345@gmail.com
> 
> Fix this by calling ->dirty_inode(inode, I_DIRTY_SYNC) directly instead
> of mark_inode_dirty_sync().
> 
> This fixes xfstest generic/580 when lazytime is enabled.
> 
> A later patch will introduce a ->lazytime_expired method to cleanly
> separate out the lazytime expiration case, in particular for XFS which
> uses the VFS-level dirtiness tracking only for lazytime.  But that's
> separate from fixing this bug.  Also, note that XFS will incorrectly
> ignore the I_DIRTY_SYNC notification from __writeback_single_inode()
> both before and after this patch, as I_DIRTY_TIME was already cleared in
> i_state.  Later patches will fix this separate bug.
> 
> Fixes: 0ae45f63d4ef ("vfs: add support for a lazytime mount option")
> Cc: stable@vger.kernel.org
> Signed-off-by: Eric Biggers <ebiggers@google.com>

Good catch! It could also cause issues with filesystem freezing which kind
of assumes that the filesystem will be clean after sync_filesystem()
(otherwise writeback threads can get stalled on frozen filesystem while
holding some locks and generally the system behavior becomes kind of
awkward).

> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index acfb55834af23..081e335cdee47 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -1509,11 +1509,22 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
>  
>  	spin_unlock(&inode->i_lock);
>  
> -	if (dirty & I_DIRTY_TIME)
> -		mark_inode_dirty_sync(inode);
>  	/* Don't write the inode if only I_DIRTY_PAGES was set */
>  	if (dirty & ~I_DIRTY_PAGES) {
> -		int err = write_inode(inode, wbc);
> +		int err;
> +
> +		/*
> +		 * If the inode is being written due to a lazytime timestamp
> +		 * expiration, then the filesystem needs to be notified about it
> +		 * so that e.g. the filesystem can update on-disk fields and
> +		 * journal the timestamp update.  Just calling write_inode()
> +		 * isn't enough.  Don't call mark_inode_dirty_sync(), as that
> +		 * would put the inode back on the dirty list.
> +		 */
> +		if ((dirty & I_DIRTY_TIME) && inode->i_sb->s_op->dirty_inode)
> +			inode->i_sb->s_op->dirty_inode(inode, I_DIRTY_SYNC);
> +
> +		err = write_inode(inode, wbc);
>  		if (ret == 0)
>  			ret = err;
>  	}

I have to say I dislike this special call of ->dirty_inode(). It works but
it makes me wonder, didn't we forget about something or won't we forget in
the future? Because it's very easy to miss this special case...

I think attached patch (compile-tested only) should actually fix the
problem as well without this special ->dirty_inode() call. It basically
only moves the mark_inode_dirty_sync() before inode->i_state clearing.
Because conceptually mark_inode_dirty_sync() is IMO the right function to
call. It will take care of clearing I_DIRTY_TIME flag (because we are
setting I_DIRTY_SYNC), it will also not touch inode->i_io_list if the inode
is queued for sync (I_SYNC_QUEUED is set in that case). The only problem
with calling it was that it was called *after* clearing dirty bits from
i_state... What do you think?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

[-- Attachment #2: 0001-fs-Make-sure-inode-is-clean-after-__writeback_single.patch --]
[-- Type: text/x-patch, Size: 1913 bytes --]

From 80ccc6a78d1c0532f600b98884f7a64e58333485 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Thu, 7 Jan 2021 15:36:05 +0100
Subject: [PATCH] fs: Make sure inode is clean after __writeback_single_inode()

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index acfb55834af2..b9356f470fae 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1473,22 +1473,25 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
 			ret = err;
 	}
 
+	/*
+	 * If inode has dirty timestamps and we need to write them, call
+	 * mark_inode_dirty_sync() to notify filesystem about it.
+	 */
+	if (inode->i_state & I_DIRTY_TIME &&
+	    (wbc->for_sync || wbc->sync_mode == WB_SYNC_ALL ||
+	     time_after(jiffies, inode->dirtied_time_when +
+			dirtytime_expire_interval * HZ))) {
+		trace_writeback_lazytime(inode);
+		mark_inode_dirty_sync(inode);
+	}
+
 	/*
 	 * Some filesystems may redirty the inode during the writeback
 	 * due to delalloc, clear dirty metadata flags right before
 	 * write_inode()
 	 */
 	spin_lock(&inode->i_lock);
-
 	dirty = inode->i_state & I_DIRTY;
-	if ((inode->i_state & I_DIRTY_TIME) &&
-	    ((dirty & I_DIRTY_INODE) ||
-	     wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync ||
-	     time_after(jiffies, inode->dirtied_time_when +
-			dirtytime_expire_interval * HZ))) {
-		dirty |= I_DIRTY_TIME;
-		trace_writeback_lazytime(inode);
-	}
 	inode->i_state &= ~dirty;
 
 	/*
@@ -1509,8 +1512,6 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
 
 	spin_unlock(&inode->i_lock);
 
-	if (dirty & I_DIRTY_TIME)
-		mark_inode_dirty_sync(inode);
 	/* Don't write the inode if only I_DIRTY_PAGES was set */
 	if (dirty & ~I_DIRTY_PAGES) {
 		int err = write_inode(inode, wbc);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH 01/13] fs: avoid double-writing inodes on lazytime expiration
  2021-01-07 14:47   ` Jan Kara
@ 2021-01-07 14:58     ` Matthew Wilcox
  2021-01-07 21:46     ` Eric Biggers
  2021-01-08  9:01     ` Christoph Hellwig
  2 siblings, 0 replies; 37+ messages in thread
From: Matthew Wilcox @ 2021-01-07 14:58 UTC (permalink / raw)
  To: Jan Kara
  Cc: Eric Biggers, linux-fsdevel, linux-ext4, linux-f2fs-devel,
	linux-xfs, Theodore Ts'o, Christoph Hellwig, stable

On Thu, Jan 07, 2021 at 03:47:09PM +0100, Jan Kara wrote:
> I think attached patch (compile-tested only) should actually fix the
> problem as well without this special ->dirty_inode() call. It basically
> only moves the mark_inode_dirty_sync() before inode->i_state clearing.
> Because conceptually mark_inode_dirty_sync() is IMO the right function to
> call. It will take care of clearing I_DIRTY_TIME flag (because we are
> setting I_DIRTY_SYNC), it will also not touch inode->i_io_list if the inode
> is queued for sync (I_SYNC_QUEUED is set in that case). The only problem
> with calling it was that it was called *after* clearing dirty bits from
> i_state... What do you think?

I like this patch far more, fwiw.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 08/13] ext4: simplify i_state checks in __ext4_update_other_inode_time()
  2021-01-07 13:24   ` Jan Kara
@ 2021-01-07 19:06     ` Eric Biggers
  0 siblings, 0 replies; 37+ messages in thread
From: Eric Biggers @ 2021-01-07 19:06 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, linux-ext4, linux-f2fs-devel, linux-xfs,
	Theodore Ts'o, Christoph Hellwig

On Thu, Jan 07, 2021 at 02:24:12PM +0100, Jan Kara wrote:
> On Mon 04-01-21 16:54:47, Eric Biggers wrote:
> > From: Eric Biggers <ebiggers@google.com>
> > 
> > Since I_DIRTY_TIME and I_DIRTY_INODE are mutually exclusive in i_state,
> > there's no need to check for I_DIRTY_TIME && !I_DIRTY_INODE.  Just check
> > for I_DIRTY_TIME.
> > 
> > Signed-off-by: Eric Biggers <ebiggers@google.com>
> > ---
> >  fs/ext4/inode.c | 8 +++-----
> >  1 file changed, 3 insertions(+), 5 deletions(-)
> > 
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index 4cc6c7834312f..9e34541715968 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -4962,14 +4962,12 @@ static void __ext4_update_other_inode_time(struct super_block *sb,
> >  		return;
> >  
> >  	if ((inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW |
> > -			       I_DIRTY_INODE)) ||
> > -	    ((inode->i_state & I_DIRTY_TIME) == 0))
> > +			       I_DIRTY_TIME)) != I_DIRTY_TIME)
> >  		return;
> 
> This is OK.
> 
> >  	spin_lock(&inode->i_lock);
> > -	if (((inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW |
> > -				I_DIRTY_INODE)) == 0) &&
> > -	    (inode->i_state & I_DIRTY_TIME)) {
> > +	if ((inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW |
> > +			       I_DIRTY_TIME)) != I_DIRTY_TIME) {
> 
> But this condition is negated AFAICT. We should have == I_DIRTY_TIME here
> AFAICT.

Indeed, I'll fix that.  Thanks for catching this!

- Eric

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 04/13] fat: only specify I_DIRTY_TIME when needed in fat_update_time()
  2021-01-07 13:13   ` Jan Kara
@ 2021-01-07 19:10     ` Eric Biggers
  0 siblings, 0 replies; 37+ messages in thread
From: Eric Biggers @ 2021-01-07 19:10 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, linux-ext4, linux-f2fs-devel, linux-xfs,
	Theodore Ts'o, Christoph Hellwig

On Thu, Jan 07, 2021 at 02:13:28PM +0100, Jan Kara wrote:
> On Mon 04-01-21 16:54:43, Eric Biggers wrote:
> > From: Eric Biggers <ebiggers@google.com>
> > 
> > As was done for generic_update_time(), only pass I_DIRTY_TIME to
> > __mark_inode_dirty() when the inode's timestamps were actually updated
> > and lazytime is enabled.  This avoids a weird edge case where
> > I_DIRTY_TIME could be set in i_state when lazytime isn't enabled.
> > 
> > Signed-off-by: Eric Biggers <ebiggers@google.com>
> 
> ...
> > +	if ((flags & S_VERSION) && inode_maybe_inc_iversion(inode, false))
> > +		dirty_flags |= I_DIRTY_SYNC;
> >  
> > -	if (dirty)
> > -		iflags |= I_DIRTY_SYNC;
> >  	__mark_inode_dirty(inode, iflags);
> 				  ^^^ dirty_flags here?
> 
> Otherwise the change looks good to me.

Yeah, I'll fix that.  I accidentally didn't have CONFIG_FAT_FS enabled.

- Eric

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 01/13] fs: avoid double-writing inodes on lazytime expiration
  2021-01-07 14:47   ` Jan Kara
  2021-01-07 14:58     ` Matthew Wilcox
@ 2021-01-07 21:46     ` Eric Biggers
  2021-01-08  8:54       ` Christoph Hellwig
  2021-01-08  9:01     ` Christoph Hellwig
  2 siblings, 1 reply; 37+ messages in thread
From: Eric Biggers @ 2021-01-07 21:46 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, linux-ext4, linux-f2fs-devel, linux-xfs,
	Theodore Ts'o, Christoph Hellwig, stable

On Thu, Jan 07, 2021 at 03:47:09PM +0100, Jan Kara wrote:
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index acfb55834af23..081e335cdee47 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -1509,11 +1509,22 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
> >  
> >  	spin_unlock(&inode->i_lock);
> >  
> > -	if (dirty & I_DIRTY_TIME)
> > -		mark_inode_dirty_sync(inode);
> >  	/* Don't write the inode if only I_DIRTY_PAGES was set */
> >  	if (dirty & ~I_DIRTY_PAGES) {
> > -		int err = write_inode(inode, wbc);
> > +		int err;
> > +
> > +		/*
> > +		 * If the inode is being written due to a lazytime timestamp
> > +		 * expiration, then the filesystem needs to be notified about it
> > +		 * so that e.g. the filesystem can update on-disk fields and
> > +		 * journal the timestamp update.  Just calling write_inode()
> > +		 * isn't enough.  Don't call mark_inode_dirty_sync(), as that
> > +		 * would put the inode back on the dirty list.
> > +		 */
> > +		if ((dirty & I_DIRTY_TIME) && inode->i_sb->s_op->dirty_inode)
> > +			inode->i_sb->s_op->dirty_inode(inode, I_DIRTY_SYNC);
> > +
> > +		err = write_inode(inode, wbc);
> >  		if (ret == 0)
> >  			ret = err;
> >  	}
> 
> I have to say I dislike this special call of ->dirty_inode(). It works but
> it makes me wonder, didn't we forget about something or won't we forget in
> the future? Because it's very easy to miss this special case...
> 
> I think attached patch (compile-tested only) should actually fix the
> problem as well without this special ->dirty_inode() call. It basically
> only moves the mark_inode_dirty_sync() before inode->i_state clearing.
> Because conceptually mark_inode_dirty_sync() is IMO the right function to
> call. It will take care of clearing I_DIRTY_TIME flag (because we are
> setting I_DIRTY_SYNC), it will also not touch inode->i_io_list if the inode
> is queued for sync (I_SYNC_QUEUED is set in that case). The only problem
> with calling it was that it was called *after* clearing dirty bits from
> i_state... What do you think?
> 
> 								Honza
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

> From 80ccc6a78d1c0532f600b98884f7a64e58333485 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Thu, 7 Jan 2021 15:36:05 +0100
> Subject: [PATCH] fs: Make sure inode is clean after __writeback_single_inode()
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/fs-writeback.c | 23 ++++++++++++-----------
>  1 file changed, 12 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index acfb55834af2..b9356f470fae 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -1473,22 +1473,25 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
>  			ret = err;
>  	}
>  
> +	/*
> +	 * If inode has dirty timestamps and we need to write them, call
> +	 * mark_inode_dirty_sync() to notify filesystem about it.
> +	 */
> +	if (inode->i_state & I_DIRTY_TIME &&
> +	    (wbc->for_sync || wbc->sync_mode == WB_SYNC_ALL ||
> +	     time_after(jiffies, inode->dirtied_time_when +
> +			dirtytime_expire_interval * HZ))) {
> +		trace_writeback_lazytime(inode);
> +		mark_inode_dirty_sync(inode);
> +	}
> +
>  	/*
>  	 * Some filesystems may redirty the inode during the writeback
>  	 * due to delalloc, clear dirty metadata flags right before
>  	 * write_inode()
>  	 */
>  	spin_lock(&inode->i_lock);
> -
>  	dirty = inode->i_state & I_DIRTY;
> -	if ((inode->i_state & I_DIRTY_TIME) &&
> -	    ((dirty & I_DIRTY_INODE) ||
> -	     wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync ||
> -	     time_after(jiffies, inode->dirtied_time_when +
> -			dirtytime_expire_interval * HZ))) {
> -		dirty |= I_DIRTY_TIME;
> -		trace_writeback_lazytime(inode);
> -	}
>  	inode->i_state &= ~dirty;
>  
>  	/*
> @@ -1509,8 +1512,6 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
>  
>  	spin_unlock(&inode->i_lock);
>  
> -	if (dirty & I_DIRTY_TIME)
> -		mark_inode_dirty_sync(inode);
>  	/* Don't write the inode if only I_DIRTY_PAGES was set */
>  	if (dirty & ~I_DIRTY_PAGES) {
>  		int err = write_inode(inode, wbc);

It looks like that's going to work, and it fixes the XFS bug too.

Note that if __writeback_single_inode() is called from writeback_single_inode()
(rather than writeback_sb_inodes()), then the inode might not be queued for
sync, in which case mark_inode_dirty_sync() will move it to a writeback list.

That's okay because afterwards, writeback_single_inode() will delete the inode
from any writeback list if it's been fully cleaned, right?  So clean inodes
won't get left on a writeback list.

It's confusing because there are comments in writeback_single_inode() and above
__writeback_single_inode() that say that the inode must not be moved between
writeback lists.  I take it that those comments are outdated, as they predate
I_SYNC_QUEUED being introduced by commit 5afced3bf281 ("writeback: Avoid
skipping inode writeback")?

- Eric

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 11/13] fs: add a lazytime_expired method
  2021-01-07 14:02   ` Jan Kara
@ 2021-01-07 22:05     ` Eric Biggers
  2021-01-08  9:14       ` Christoph Hellwig
  0 siblings, 1 reply; 37+ messages in thread
From: Eric Biggers @ 2021-01-07 22:05 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, linux-ext4, linux-f2fs-devel, linux-xfs,
	Theodore Ts'o, Christoph Hellwig

On Thu, Jan 07, 2021 at 03:02:28PM +0100, Jan Kara wrote:
> On Mon 04-01-21 16:54:50, Eric Biggers wrote:
> > From: Eric Biggers <ebiggers@google.com>
> > 
> > Add a lazytime_expired method to 'struct super_operations'.  Filesystems
> > can implement this to be notified when an inode's lazytime timestamps
> > have expired and need to be written to disk.
> > 
> > This avoids any potential ambiguity with
> > ->dirty_inode(inode, I_DIRTY_SYNC), which can also mean a generic
> > dirtying of the inode, not just a lazytime timestamp expiration.
> > In particular, this will be useful for XFS.
> > 
> > If not implemented, then ->dirty_inode(inode, I_DIRTY_SYNC) continues to
> > be called.
> > 
> > Note that there are three cases where we have to make sure to call
> > lazytime_expired():
> > 
> > - __writeback_single_inode(): inode is being written now
> > - vfs_fsync_range(): inode is going to be synced
> > - iput(): inode is going to be evicted
> > 
> > In the latter two cases, the inode still needs to be put on the
> > writeback list.  So, we can't just replace the calls to
> > mark_inode_dirty_sync() with lazytime_expired().  Instead, add a new
> > flag I_DIRTY_TIME_EXPIRED which can be passed to __mark_inode_dirty().
> > It's like I_DIRTY_SYNC, except it causes the filesystem to be notified
> > of a lazytime expiration rather than a generic I_DIRTY_SYNC.
> > 
> > Signed-off-by: Eric Biggers <ebiggers@google.com>
> 
> Hum, seeing this patch I kind of wonder: Why don't we dirty the inode after
> expiring the lazytime timestamps with I_DIRTY_SYNC | I_DIRTY_TIME_EXPIRED
> and propagate I_DIRTY_TIME_EXPIRED even to ->dirty_inode() where XFS can
> catch it and act? Functionally it would be the same but we'd save a bunch
> of generic code and ->lazytime_expired helper used just by a single
> filesystem...
> 

Yes, that would be equivalent to what this patch does.

Either way, note that if we also use your suggestion for patch #1, then that
already fixes the XFS bug, since i_state will start containing I_DIRTY_TIME when
->dirty_inode(I_DIRTY_SYNC) is called.  So xfs_fs_dirty_inode() will start
working as intended.

That makes introducing ->lazytime_expired (or equivalently I_DIRTY_TIME_EXPIRED)
kind of useless since it wouldn't actually fix anything.

So I'm tempted to just drop it.

The XFS developers might have a different opinion though, as they were the ones
who requested it originally:

	https://lore.kernel.org/r/20200312143445.GA19160@infradead.org
	https://lore.kernel.org/r/20200325092057.GA25483@infradead.org
	https://lore.kernel.org/r/20200325154759.GY29339@magnolia
	https://lore.kernel.org/r/20200312223913.GL10776@dread.disaster.area

Any thoughts from anyone about whether we should still introduce a separate
notification for lazytime expiration, vs. just using ->dirty_inode(I_DIRTY_SYNC)
with I_DIRTY_TIME in i_state?

- Eric

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 01/13] fs: avoid double-writing inodes on lazytime expiration
  2021-01-07 21:46     ` Eric Biggers
@ 2021-01-08  8:54       ` Christoph Hellwig
  0 siblings, 0 replies; 37+ messages in thread
From: Christoph Hellwig @ 2021-01-08  8:54 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Jan Kara, linux-fsdevel, linux-ext4, linux-f2fs-devel, linux-xfs,
	Theodore Ts'o, Christoph Hellwig, stable

On Thu, Jan 07, 2021 at 01:46:37PM -0800, Eric Biggers wrote:
> It looks like that's going to work, and it fixes the XFS bug too.
> 
> Note that if __writeback_single_inode() is called from writeback_single_inode()
> (rather than writeback_sb_inodes()), then the inode might not be queued for
> sync, in which case mark_inode_dirty_sync() will move it to a writeback list.
> 
> That's okay because afterwards, writeback_single_inode() will delete the inode
> from any writeback list if it's been fully cleaned, right?  So clean inodes
> won't get left on a writeback list.
> 
> It's confusing because there are comments in writeback_single_inode() and above
> __writeback_single_inode() that say that the inode must not be moved between
> writeback lists.  I take it that those comments are outdated, as they predate
> I_SYNC_QUEUED being introduced by commit 5afced3bf281 ("writeback: Avoid
> skipping inode writeback")?

Yes.  I think we need to update the comment as well.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 02/13] gfs2: don't worry about I_DIRTY_TIME in gfs2_fsync()
  2021-01-05  0:54 ` [PATCH 02/13] gfs2: don't worry about I_DIRTY_TIME in gfs2_fsync() Eric Biggers
@ 2021-01-08  8:54   ` Christoph Hellwig
  0 siblings, 0 replies; 37+ messages in thread
From: Christoph Hellwig @ 2021-01-08  8:54 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-fsdevel, linux-ext4, linux-f2fs-devel, linux-xfs,
	Theodore Ts'o, Christoph Hellwig

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 03/13] fs: only specify I_DIRTY_TIME when needed in generic_update_time()
  2021-01-05  0:54 ` [PATCH 03/13] fs: only specify I_DIRTY_TIME when needed in generic_update_time() Eric Biggers
@ 2021-01-08  8:57   ` Christoph Hellwig
  0 siblings, 0 replies; 37+ messages in thread
From: Christoph Hellwig @ 2021-01-08  8:57 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-fsdevel, linux-ext4, linux-f2fs-devel, linux-xfs,
	Theodore Ts'o, Christoph Hellwig

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 01/13] fs: avoid double-writing inodes on lazytime expiration
  2021-01-07 14:47   ` Jan Kara
  2021-01-07 14:58     ` Matthew Wilcox
  2021-01-07 21:46     ` Eric Biggers
@ 2021-01-08  9:01     ` Christoph Hellwig
  2021-01-09 17:11       ` Eric Biggers
  2 siblings, 1 reply; 37+ messages in thread
From: Christoph Hellwig @ 2021-01-08  9:01 UTC (permalink / raw)
  To: Jan Kara
  Cc: Eric Biggers, linux-fsdevel, linux-ext4, linux-f2fs-devel,
	linux-xfs, Theodore Ts'o, Christoph Hellwig, stable

> +	/*
> +	 * If inode has dirty timestamps and we need to write them, call
> +	 * mark_inode_dirty_sync() to notify filesystem about it.
> +	 */
> +	if (inode->i_state & I_DIRTY_TIME &&
> +	    (wbc->for_sync || wbc->sync_mode == WB_SYNC_ALL ||
> +	     time_after(jiffies, inode->dirtied_time_when +
> +			dirtytime_expire_interval * HZ))) {

If we're touching this area, it would be nice to split this condition
into a readable helper ala:

static inline bool inode_needs_timestamp_sync(struct writeback_control *wbc,
		struct inode *inode)
{
	if (!(inode->i_state & I_DIRTY_TIME))
		return false;
	if (wbc->for_sync || wbc->sync_mode == WB_SYNC_ALL)
		return true;
	return time_after(jiffies, inode->dirtied_time_when +
			  dirtytime_expire_interval * HZ);
}

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 05/13] fs: don't call ->dirty_inode for lazytime timestamp updates
  2021-01-05  0:54 ` [PATCH 05/13] fs: don't call ->dirty_inode for lazytime timestamp updates Eric Biggers
  2021-01-07 13:17   ` Jan Kara
@ 2021-01-08  9:02   ` Christoph Hellwig
  1 sibling, 0 replies; 37+ messages in thread
From: Christoph Hellwig @ 2021-01-08  9:02 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-fsdevel, linux-ext4, linux-f2fs-devel, linux-xfs,
	Theodore Ts'o, Christoph Hellwig

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 06/13] fs: pass only I_DIRTY_INODE flags to ->dirty_inode
  2021-01-05  0:54 ` [PATCH 06/13] fs: pass only I_DIRTY_INODE flags to ->dirty_inode Eric Biggers
@ 2021-01-08  9:02   ` Christoph Hellwig
  0 siblings, 0 replies; 37+ messages in thread
From: Christoph Hellwig @ 2021-01-08  9:02 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-fsdevel, linux-ext4, linux-f2fs-devel, linux-xfs,
	Theodore Ts'o, Christoph Hellwig

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 07/13] fs: correctly document the inode dirty flags
  2021-01-05  0:54 ` [PATCH 07/13] fs: correctly document the inode dirty flags Eric Biggers
@ 2021-01-08  9:03   ` Christoph Hellwig
  0 siblings, 0 replies; 37+ messages in thread
From: Christoph Hellwig @ 2021-01-08  9:03 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-fsdevel, linux-ext4, linux-f2fs-devel, linux-xfs,
	Theodore Ts'o, Christoph Hellwig

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 10/13] fs: clean up __mark_inode_dirty() a bit
  2021-01-05  0:54 ` [PATCH 10/13] fs: clean up __mark_inode_dirty() a bit Eric Biggers
@ 2021-01-08  9:10   ` Christoph Hellwig
  0 siblings, 0 replies; 37+ messages in thread
From: Christoph Hellwig @ 2021-01-08  9:10 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-fsdevel, linux-ext4, linux-f2fs-devel, linux-xfs,
	Theodore Ts'o, Christoph Hellwig

On Mon, Jan 04, 2021 at 04:54:49PM -0800, Eric Biggers wrote:
> +	} else {
> +		/*
> +		 * Else it's either I_DIRTY_PAGES, I_DIRTY_TIME, or nothing.
> +		 * (We don't support setting both I_DIRTY_PAGES and I_DIRTY_TIME
> +		 * in one call to __mark_inode_dirty().)
> +		 */
> +		dirtytime = flags & I_DIRTY_TIME;
> +		WARN_ON_ONCE(dirtytime && (flags != I_DIRTY_TIME));

No need for the inner braces here.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 09/13] fs: drop redundant checks from __writeback_single_inode()
  2021-01-05  0:54 ` [PATCH 09/13] fs: drop redundant checks from __writeback_single_inode() Eric Biggers
@ 2021-01-08  9:12   ` Christoph Hellwig
  0 siblings, 0 replies; 37+ messages in thread
From: Christoph Hellwig @ 2021-01-08  9:12 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-fsdevel, linux-ext4, linux-f2fs-devel, linux-xfs,
	Theodore Ts'o, Christoph Hellwig

Looks good, but will change a bit with Jan's patch as the base.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 11/13] fs: add a lazytime_expired method
  2021-01-07 22:05     ` Eric Biggers
@ 2021-01-08  9:14       ` Christoph Hellwig
  0 siblings, 0 replies; 37+ messages in thread
From: Christoph Hellwig @ 2021-01-08  9:14 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Jan Kara, linux-fsdevel, linux-ext4, linux-f2fs-devel, linux-xfs,
	Theodore Ts'o, Christoph Hellwig

On Thu, Jan 07, 2021 at 02:05:57PM -0800, Eric Biggers wrote:
> The XFS developers might have a different opinion though, as they were the ones
> who requested it originally:
> 
> 	https://lore.kernel.org/r/20200312143445.GA19160@infradead.org
> 	https://lore.kernel.org/r/20200325092057.GA25483@infradead.org
> 	https://lore.kernel.org/r/20200325154759.GY29339@magnolia
> 	https://lore.kernel.org/r/20200312223913.GL10776@dread.disaster.area
> 
> Any thoughts from anyone about whether we should still introduce a separate
> notification for lazytime expiration, vs. just using ->dirty_inode(I_DIRTY_SYNC)
> with I_DIRTY_TIME in i_state?

I still find the way ->dirty_inode is used very confusing, but with this
series and Jan's first patch I think we have a good enough state for now
and don't need to add a method just for XFS.  I still think it might make
sense to eventually revisit how file systems are notified about dirtying.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 12/13] xfs: remove a stale comment from xfs_file_aio_write_checks()
  2021-01-05  0:54 ` [PATCH 12/13] xfs: remove a stale comment from xfs_file_aio_write_checks() Eric Biggers
@ 2021-01-08  9:15   ` Christoph Hellwig
  0 siblings, 0 replies; 37+ messages in thread
From: Christoph Hellwig @ 2021-01-08  9:15 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-fsdevel, linux-ext4, linux-f2fs-devel, linux-xfs,
	Theodore Ts'o, Christoph Hellwig

On Mon, Jan 04, 2021 at 04:54:51PM -0800, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> The comment in xfs_file_aio_write_checks() about calling file_modified()
> after dropping the ilock doesn't make sense, because the code that
> unconditionally acquires and drops the ilock was removed by
> commit 467f78992a07 ("xfs: reduce ilock hold times in
> xfs_file_aio_write_checks").
> 
> Remove this outdated comment.

Yes, this looks good, I actually have the removal included in a WIP
patch as well, but splitting it out like this look good to me:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 01/13] fs: avoid double-writing inodes on lazytime expiration
  2021-01-08  9:01     ` Christoph Hellwig
@ 2021-01-09 17:11       ` Eric Biggers
  0 siblings, 0 replies; 37+ messages in thread
From: Eric Biggers @ 2021-01-09 17:11 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jan Kara, linux-fsdevel, linux-ext4, linux-f2fs-devel, linux-xfs,
	Theodore Ts'o, stable

On Fri, Jan 08, 2021 at 10:01:33AM +0100, Christoph Hellwig wrote:
> > +	/*
> > +	 * If inode has dirty timestamps and we need to write them, call
> > +	 * mark_inode_dirty_sync() to notify filesystem about it.
> > +	 */
> > +	if (inode->i_state & I_DIRTY_TIME &&
> > +	    (wbc->for_sync || wbc->sync_mode == WB_SYNC_ALL ||
> > +	     time_after(jiffies, inode->dirtied_time_when +
> > +			dirtytime_expire_interval * HZ))) {
> 
> If we're touching this area, it would be nice to split this condition
> into a readable helper ala:
> 
> static inline bool inode_needs_timestamp_sync(struct writeback_control *wbc,
> 		struct inode *inode)
> {
> 	if (!(inode->i_state & I_DIRTY_TIME))
> 		return false;
> 	if (wbc->for_sync || wbc->sync_mode == WB_SYNC_ALL)
> 		return true;
> 	return time_after(jiffies, inode->dirtied_time_when +
> 			  dirtytime_expire_interval * HZ);
> }

I didn't end up doing this since it would only be called once, and IMO it's more
readable to keep it inlined next to the comment that explains what's going on.
Especially considering the later patch that drops the check for wbc->for_sync.

-  Eric

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2021-01-09 17:12 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-05  0:54 [PATCH 00/13] lazytime fixes and cleanups Eric Biggers
2021-01-05  0:54 ` [PATCH 01/13] fs: avoid double-writing inodes on lazytime expiration Eric Biggers
2021-01-07 14:47   ` Jan Kara
2021-01-07 14:58     ` Matthew Wilcox
2021-01-07 21:46     ` Eric Biggers
2021-01-08  8:54       ` Christoph Hellwig
2021-01-08  9:01     ` Christoph Hellwig
2021-01-09 17:11       ` Eric Biggers
2021-01-05  0:54 ` [PATCH 02/13] gfs2: don't worry about I_DIRTY_TIME in gfs2_fsync() Eric Biggers
2021-01-08  8:54   ` Christoph Hellwig
2021-01-05  0:54 ` [PATCH 03/13] fs: only specify I_DIRTY_TIME when needed in generic_update_time() Eric Biggers
2021-01-08  8:57   ` Christoph Hellwig
2021-01-05  0:54 ` [PATCH 04/13] fat: only specify I_DIRTY_TIME when needed in fat_update_time() Eric Biggers
2021-01-07 13:13   ` Jan Kara
2021-01-07 19:10     ` Eric Biggers
2021-01-05  0:54 ` [PATCH 05/13] fs: don't call ->dirty_inode for lazytime timestamp updates Eric Biggers
2021-01-07 13:17   ` Jan Kara
2021-01-07 13:18     ` Jan Kara
2021-01-08  9:02   ` Christoph Hellwig
2021-01-05  0:54 ` [PATCH 06/13] fs: pass only I_DIRTY_INODE flags to ->dirty_inode Eric Biggers
2021-01-08  9:02   ` Christoph Hellwig
2021-01-05  0:54 ` [PATCH 07/13] fs: correctly document the inode dirty flags Eric Biggers
2021-01-08  9:03   ` Christoph Hellwig
2021-01-05  0:54 ` [PATCH 08/13] ext4: simplify i_state checks in __ext4_update_other_inode_time() Eric Biggers
2021-01-07 13:24   ` Jan Kara
2021-01-07 19:06     ` Eric Biggers
2021-01-05  0:54 ` [PATCH 09/13] fs: drop redundant checks from __writeback_single_inode() Eric Biggers
2021-01-08  9:12   ` Christoph Hellwig
2021-01-05  0:54 ` [PATCH 10/13] fs: clean up __mark_inode_dirty() a bit Eric Biggers
2021-01-08  9:10   ` Christoph Hellwig
2021-01-05  0:54 ` [PATCH 11/13] fs: add a lazytime_expired method Eric Biggers
2021-01-07 14:02   ` Jan Kara
2021-01-07 22:05     ` Eric Biggers
2021-01-08  9:14       ` Christoph Hellwig
2021-01-05  0:54 ` [PATCH 12/13] xfs: remove a stale comment from xfs_file_aio_write_checks() Eric Biggers
2021-01-08  9:15   ` Christoph Hellwig
2021-01-05  0:54 ` [PATCH 13/13] xfs: implement lazytime_expired Eric Biggers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).