* lazytime causing inodes to remain dirty after sync? @ 2020-03-06 0:45 Eric Biggers 2020-03-07 2:00 ` [PATCH] writeback: avoid double-writing the inode on a lazytime expiration Theodore Ts'o 0 siblings, 1 reply; 9+ messages in thread From: Eric Biggers @ 2020-03-06 0:45 UTC (permalink / raw) To: linux-fsdevel; +Cc: linux-ext4, linux-f2fs-devel While testing my patch "fscrypt: don't evict dirty inodes after removing key" (https://lkml.kernel.org/r/20200305084138.653498-1-ebiggers@kernel.org), I've run into an issue where even after the filesystem is sync'ed and no files are in-use, inodes can remain dirty if the filesystem is mounted with -o lazytime. Thus, my patch causes some inodes to not be evicted when they should be. (lazytime is the default on f2fs, but ext4 supports it too.) This is caused by the following code in __writeback_single_inode() that redirties the inode if its access time is dirty: if (dirty & I_DIRTY_TIME) mark_inode_dirty_sync(inode); /* Don't write the inode if only I_DIRTY_PAGES was set */ if (dirty & ~I_DIRTY_PAGES) { int err = write_inode(inode, wbc); if (ret == 0) ret = err; } trace_writeback_single_inode(inode, wbc, nr_to_write); return ret; Here's a reproducer in the kvm-xfstests test appliance which demonstrates the problem using sync(), without fscrypt involved at all: sysctl vm.dirty_expire_centisecs=500 umount /vdc mkfs.ext4 -F /dev/vdc mount /vdc -o lazytime echo contents > /vdc/file sync ino=$(stat -c %i /vdc/file) echo 1 | tee /sys/kernel/debug/tracing/events/writeback/writeback_{single_inode_start,mark_inode_dirty,lazytime}/enable echo "ino == $ino" | tee /sys/kernel/debug/tracing/events/writeback/writeback_{single_inode_start,mark_inode_dirty,lazytime}/filter echo > /sys/kernel/debug/tracing/trace cat /vdc/file > /dev/null sync cat /sys/kernel/debug/tracing/trace_pipe The tracing shows that the inode for /vdc/file is written during the sync at 7.28s. But then, still during the sync, it's immediately re-dirtied. It then gets written again later in the background, after the sync. cat-286 [001] ...1 7.279433: writeback_mark_inode_dirty: bdi 254:32: ino=12 state= flags=I_DIRTY_TIME kworker/u8:0-8 [003] ...1 7.282647: writeback_single_inode_start: bdi 254:32: ino=12 state=I_SYNC|I_DIRTY_TIME|I_DIRTY_TIME_EXPIRED dirtied_when=4294879420 age=0 index=1 to_write=9223372036854775807 wrote=0 cgroup_ino=1 kworker/u8:0-8 [003] ...2 7.282660: writeback_lazytime: dev 254,32 ino 12 dirtied 4294879420 state I_SYNC|I_DIRTY_TIME|I_DIRTY_TIME_EXPIRED mode 0100644 kworker/u8:0-8 [003] ...1 7.283204: writeback_mark_inode_dirty: bdi 254:32: ino=12 state=I_SYNC flags=I_DIRTY_SYNC kworker/u8:0-8 [003] ...1 12.412079: writeback_single_inode_start: bdi 254:32: ino=12 state=I_DIRTY_SYNC|I_SYNC dirtied_when=4294879421 age=5 index=1 to_write=13312 wrote=0 cgroup_ino=1 Is this behavior intentional at all? It seems like a bug; it seems the inode should be written just once, during the sync. - Eric ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] writeback: avoid double-writing the inode on a lazytime expiration 2020-03-06 0:45 lazytime causing inodes to remain dirty after sync? Eric Biggers @ 2020-03-07 2:00 ` Theodore Ts'o 2020-03-11 3:20 ` Eric Biggers 0 siblings, 1 reply; 9+ messages in thread From: Theodore Ts'o @ 2020-03-07 2:00 UTC (permalink / raw) To: Linux Filesystem Development List Cc: Ext4 Developers List, linux-f2fs-devel, Theodore Ts'o, Eric Biggers In the case that an inode has dirty timestamp for longer than the lazytime expiration timeout (or if all such inodes are being flushed out due to a sync or syncfs system call), we need to inform the file system that the inode is dirty so that the inode's timestamps can be copied out to the on-disk data structures. That's because if the file system supports lazytime, it will have ignored the dirty_inode(inode, I_DIRTY_TIME) notification when the timestamp was modified in memory.q Previously, this was accomplished by calling mark_inode_dirty_sync(), but that has the unfortunate side effect of also putting the inode the writeback list, and that's not necessary in this case, since we will immediately call write_inode() afterwards. Eric Biggers noticed that this was causing problems for fscrypt after the key was removed[1]. [1] https://lore.kernel.org/r/20200306004555.GB225345@gmail.com Reported-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> --- fs/fs-writeback.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index 76ac9c7d32ec..32101349ba97 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -1504,8 +1504,9 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc) spin_unlock(&inode->i_lock); - if (dirty & I_DIRTY_TIME) - mark_inode_dirty_sync(inode); + /* This was a lazytime expiration; we need to tell the file system */ + if (dirty & I_DIRTY_TIME_EXPIRED && inode->i_sb->s_op->dirty_inode) + inode->i_sb->s_op->dirty_inode(inode, I_DIRTY_TIME_EXPIRED); /* Don't write the inode if only I_DIRTY_PAGES was set */ if (dirty & ~I_DIRTY_PAGES) { int err = write_inode(inode, wbc); -- 2.24.1 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] writeback: avoid double-writing the inode on a lazytime expiration 2020-03-07 2:00 ` [PATCH] writeback: avoid double-writing the inode on a lazytime expiration Theodore Ts'o @ 2020-03-11 3:20 ` Eric Biggers 2020-03-11 12:57 ` Theodore Y. Ts'o 2020-03-11 23:54 ` Dave Chinner 0 siblings, 2 replies; 9+ messages in thread From: Eric Biggers @ 2020-03-11 3:20 UTC (permalink / raw) To: Theodore Ts'o Cc: Linux Filesystem Development List, Ext4 Developers List, linux-f2fs-devel On Fri, Mar 06, 2020 at 09:00:43PM -0500, Theodore Ts'o wrote: > In the case that an inode has dirty timestamp for longer than the > lazytime expiration timeout (or if all such inodes are being flushed > out due to a sync or syncfs system call), we need to inform the file > system that the inode is dirty so that the inode's timestamps can be > copied out to the on-disk data structures. That's because if the file > system supports lazytime, it will have ignored the dirty_inode(inode, > I_DIRTY_TIME) notification when the timestamp was modified in memory.q > > Previously, this was accomplished by calling mark_inode_dirty_sync(), > but that has the unfortunate side effect of also putting the inode the > writeback list, and that's not necessary in this case, since we will > immediately call write_inode() afterwards. > > Eric Biggers noticed that this was causing problems for fscrypt after > the key was removed[1]. > > [1] https://lore.kernel.org/r/20200306004555.GB225345@gmail.com > > Reported-by: Eric Biggers <ebiggers@kernel.org> > Signed-off-by: Theodore Ts'o <tytso@mit.edu> > --- > fs/fs-writeback.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c > index 76ac9c7d32ec..32101349ba97 100644 > --- a/fs/fs-writeback.c > +++ b/fs/fs-writeback.c > @@ -1504,8 +1504,9 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc) > > spin_unlock(&inode->i_lock); > > - if (dirty & I_DIRTY_TIME) > - mark_inode_dirty_sync(inode); > + /* This was a lazytime expiration; we need to tell the file system */ > + if (dirty & I_DIRTY_TIME_EXPIRED && inode->i_sb->s_op->dirty_inode) > + inode->i_sb->s_op->dirty_inode(inode, I_DIRTY_TIME_EXPIRED); > /* Don't write the inode if only I_DIRTY_PAGES was set */ > if (dirty & ~I_DIRTY_PAGES) { > int err = write_inode(inode, wbc); > -- Thanks Ted! This fixes the fscrypt test failure. However, are you sure this works correctly on all filesystems? I'm not sure about XFS. XFS only implements ->dirty_inode(), not ->write_inode(), and in its ->dirty_inode() it does: static void xfs_fs_dirty_inode( struct inode *inode, int flag) { struct xfs_inode *ip = XFS_I(inode); struct xfs_mount *mp = ip->i_mount; struct xfs_trans *tp; if (!(inode->i_sb->s_flags & SB_LAZYTIME)) return; if (flag != I_DIRTY_SYNC || !(inode->i_state & I_DIRTY_TIME)) return; if (xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp)) return; xfs_ilock(ip, XFS_ILOCK_EXCL); xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL); xfs_trans_log_inode(tp, ip, XFS_ILOG_TIMESTAMP); xfs_trans_commit(tp); } So flag=I_DIRTY_TIME_EXPIRED will be a no-op. Maybe you should be using I_DIRTY_SYNC instead? Or perhaps XFS should be checking for either I_DIRTY_TIME_EXPIRED or I_DIRTY_SYNC? - Eric ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] writeback: avoid double-writing the inode on a lazytime expiration 2020-03-11 3:20 ` Eric Biggers @ 2020-03-11 12:57 ` Theodore Y. Ts'o 2020-03-12 0:07 ` Dave Chinner 2020-03-11 23:54 ` Dave Chinner 1 sibling, 1 reply; 9+ messages in thread From: Theodore Y. Ts'o @ 2020-03-11 12:57 UTC (permalink / raw) To: Eric Biggers Cc: Linux Filesystem Development List, Ext4 Developers List, linux-f2fs-devel On Tue, Mar 10, 2020 at 08:20:09PM -0700, Eric Biggers wrote: > Thanks Ted! This fixes the fscrypt test failure. > > However, are you sure this works correctly on all filesystems? I'm not sure > about XFS. XFS only implements ->dirty_inode(), not ->write_inode(), and in its > ->dirty_inode() it does: ... > if (flag != I_DIRTY_SYNC || !(inode->i_state & I_DIRTY_TIME)) > return; That's true, but when the timestamps were originally modified, dirty_inode() will be called with flag == I_DIRTY_TIME, which will *not* be a no-op; which is to say, XFS will force the timestamps to be updated on disk when the timestamps are first dirtied, because it doesn't support I_DIRTY_TIME. So I think we're fine. - Ted ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] writeback: avoid double-writing the inode on a lazytime expiration 2020-03-11 12:57 ` Theodore Y. Ts'o @ 2020-03-12 0:07 ` Dave Chinner 2020-03-12 14:34 ` Christoph Hellwig 0 siblings, 1 reply; 9+ messages in thread From: Dave Chinner @ 2020-03-12 0:07 UTC (permalink / raw) To: Theodore Y. Ts'o Cc: Eric Biggers, Linux Filesystem Development List, Ext4 Developers List, linux-f2fs-devel, linux-xfs On Wed, Mar 11, 2020 at 08:57:49AM -0400, Theodore Y. Ts'o wrote: > On Tue, Mar 10, 2020 at 08:20:09PM -0700, Eric Biggers wrote: > > Thanks Ted! This fixes the fscrypt test failure. > > > > However, are you sure this works correctly on all filesystems? I'm not sure > > about XFS. XFS only implements ->dirty_inode(), not ->write_inode(), and in its > > ->dirty_inode() it does: > ... > > if (flag != I_DIRTY_SYNC || !(inode->i_state & I_DIRTY_TIME)) > > return; > > That's true, but when the timestamps were originally modified, > dirty_inode() will be called with flag == I_DIRTY_TIME, which will > *not* be a no-op; which is to say, XFS will force the timestamps to be > updated on disk when the timestamps are first dirtied, because it > doesn't support I_DIRTY_TIME. We log the initial timestamp change, and then ignore timestamp updates until the dirty time expires and the inode is set I_DIRTY_SYNC via __mark_inode_dirty_sync(). IOWs, on expiry, we have time stamps that may be 24 hours out of date in memory, and they still need to be flushed to the journal. However, your change does not mark the inode dirtying on expiry anymore, so... > So I think we're fine. ... we're not fine. This breaks XFS and any other filesystem that relies on a I_DIRTY_SYNC notification to handle dirty time expiry correctly. Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] writeback: avoid double-writing the inode on a lazytime expiration 2020-03-12 0:07 ` Dave Chinner @ 2020-03-12 14:34 ` Christoph Hellwig 2020-03-12 22:39 ` Dave Chinner 2020-03-20 2:46 ` Theodore Y. Ts'o 0 siblings, 2 replies; 9+ messages in thread From: Christoph Hellwig @ 2020-03-12 14:34 UTC (permalink / raw) To: Dave Chinner Cc: Theodore Y. Ts'o, Eric Biggers, Linux Filesystem Development List, Ext4 Developers List, linux-f2fs-devel, linux-xfs On Thu, Mar 12, 2020 at 11:07:17AM +1100, Dave Chinner wrote: > > That's true, but when the timestamps were originally modified, > > dirty_inode() will be called with flag == I_DIRTY_TIME, which will > > *not* be a no-op; which is to say, XFS will force the timestamps to be > > updated on disk when the timestamps are first dirtied, because it > > doesn't support I_DIRTY_TIME. > > We log the initial timestamp change, and then ignore timestamp > updates until the dirty time expires and the inode is set > I_DIRTY_SYNC via __mark_inode_dirty_sync(). IOWs, on expiry, we have > time stamps that may be 24 hours out of date in memory, and they > still need to be flushed to the journal. > > However, your change does not mark the inode dirtying on expiry > anymore, so... > > > So I think we're fine. > > ... we're not fine. This breaks XFS and any other filesystem that > relies on a I_DIRTY_SYNC notification to handle dirty time expiry > correctly. I haven't seen the original mail this replies to, but if we could get the lazytime expirty by some other means (e.g. an explicit callback), XFS could opt out of all the VFS inode tracking again, which would simplify a few things. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] writeback: avoid double-writing the inode on a lazytime expiration 2020-03-12 14:34 ` Christoph Hellwig @ 2020-03-12 22:39 ` Dave Chinner 2020-03-20 2:46 ` Theodore Y. Ts'o 1 sibling, 0 replies; 9+ messages in thread From: Dave Chinner @ 2020-03-12 22:39 UTC (permalink / raw) To: Christoph Hellwig Cc: Theodore Y. Ts'o, Eric Biggers, Linux Filesystem Development List, Ext4 Developers List, linux-f2fs-devel, linux-xfs On Thu, Mar 12, 2020 at 07:34:45AM -0700, Christoph Hellwig wrote: > On Thu, Mar 12, 2020 at 11:07:17AM +1100, Dave Chinner wrote: > > > That's true, but when the timestamps were originally modified, > > > dirty_inode() will be called with flag == I_DIRTY_TIME, which will > > > *not* be a no-op; which is to say, XFS will force the timestamps to be > > > updated on disk when the timestamps are first dirtied, because it > > > doesn't support I_DIRTY_TIME. > > > > We log the initial timestamp change, and then ignore timestamp > > updates until the dirty time expires and the inode is set > > I_DIRTY_SYNC via __mark_inode_dirty_sync(). IOWs, on expiry, we have > > time stamps that may be 24 hours out of date in memory, and they > > still need to be flushed to the journal. > > > > However, your change does not mark the inode dirtying on expiry > > anymore, so... > > > > > So I think we're fine. > > > > ... we're not fine. This breaks XFS and any other filesystem that > > relies on a I_DIRTY_SYNC notification to handle dirty time expiry > > correctly. > > I haven't seen the original mail this replies to, The original problem was calling mark_inode_dirty_sync() on expiry during inode writeback was causing the inode to be put back on the dirty inode list and so ext4 was flushing it twice - once on expiry and once 5 seconds later on the next background writeback pass. This is a problem that XFS does not have because it does not implement ->write_inode... > but if we could > get the lazytime expirty by some other means (e.g. an explicit > callback), XFS could opt out of all the VFS inode tracking again, > which would simplify a few things. Yes, that would definitely make things simpler for XFS, and it would also solve the problem that the generic lazytime expiry code has.... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] writeback: avoid double-writing the inode on a lazytime expiration 2020-03-12 14:34 ` Christoph Hellwig 2020-03-12 22:39 ` Dave Chinner @ 2020-03-20 2:46 ` Theodore Y. Ts'o 1 sibling, 0 replies; 9+ messages in thread From: Theodore Y. Ts'o @ 2020-03-20 2:46 UTC (permalink / raw) To: Christoph Hellwig Cc: Dave Chinner, Eric Biggers, Linux Filesystem Development List, Ext4 Developers List, linux-f2fs-devel, linux-xfs On Thu, Mar 12, 2020 at 07:34:45AM -0700, Christoph Hellwig wrote: > I haven't seen the original mail this replies to, but if we could > get the lazytime expirty by some other means (e.g. an explicit > callback), XFS could opt out of all the VFS inode tracking again, > which would simplify a few things. Part of my thinking of calling inode->i_sb->s_op->dirty_inode(inode, I_DIRTY_TIME_EXPIRED); So that it would be an explicit callback to XFS. So why don't I break this as two patches --- one which uses I_DIRTY_SYNC, as before, and a second one which changes calls dirty_inode() with I_DIRTY_TIME_EXPIRED, and with a change to XFS so that it recognizes I_DIRTY_TIME_EXPIRED as if it were I_DIRTY_SYNC. If this would then allow XFS to simplify how it handles VFS tracking, you could do that in a separate patch. Does that work? I'll send out the two patches, and if you can review/ack the second patch, that would be great. - Ted ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] writeback: avoid double-writing the inode on a lazytime expiration 2020-03-11 3:20 ` Eric Biggers 2020-03-11 12:57 ` Theodore Y. Ts'o @ 2020-03-11 23:54 ` Dave Chinner 1 sibling, 0 replies; 9+ messages in thread From: Dave Chinner @ 2020-03-11 23:54 UTC (permalink / raw) To: Eric Biggers Cc: Theodore Ts'o, Linux Filesystem Development List, Ext4 Developers List, linux-f2fs-devel On Tue, Mar 10, 2020 at 08:20:09PM -0700, Eric Biggers wrote: > On Fri, Mar 06, 2020 at 09:00:43PM -0500, Theodore Ts'o wrote: > > In the case that an inode has dirty timestamp for longer than the > > lazytime expiration timeout (or if all such inodes are being flushed > > out due to a sync or syncfs system call), we need to inform the file > > system that the inode is dirty so that the inode's timestamps can be > > copied out to the on-disk data structures. That's because if the file > > system supports lazytime, it will have ignored the dirty_inode(inode, > > I_DIRTY_TIME) notification when the timestamp was modified in memory.q > > > > Previously, this was accomplished by calling mark_inode_dirty_sync(), > > but that has the unfortunate side effect of also putting the inode the > > writeback list, and that's not necessary in this case, since we will > > immediately call write_inode() afterwards. > > > > Eric Biggers noticed that this was causing problems for fscrypt after > > the key was removed[1]. > > > > [1] https://lore.kernel.org/r/20200306004555.GB225345@gmail.com > > > > Reported-by: Eric Biggers <ebiggers@kernel.org> > > Signed-off-by: Theodore Ts'o <tytso@mit.edu> > > --- > > fs/fs-writeback.c | 5 +++-- > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c > > index 76ac9c7d32ec..32101349ba97 100644 > > --- a/fs/fs-writeback.c > > +++ b/fs/fs-writeback.c > > @@ -1504,8 +1504,9 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc) > > > > spin_unlock(&inode->i_lock); > > > > - if (dirty & I_DIRTY_TIME) > > - mark_inode_dirty_sync(inode); > > + /* This was a lazytime expiration; we need to tell the file system */ > > + if (dirty & I_DIRTY_TIME_EXPIRED && inode->i_sb->s_op->dirty_inode) > > + inode->i_sb->s_op->dirty_inode(inode, I_DIRTY_TIME_EXPIRED); > > /* Don't write the inode if only I_DIRTY_PAGES was set */ > > if (dirty & ~I_DIRTY_PAGES) { > > int err = write_inode(inode, wbc); > > -- > > Thanks Ted! This fixes the fscrypt test failure. > > However, are you sure this works correctly on all filesystems? I'm not sure > about XFS. XFS only implements ->dirty_inode(), not ->write_inode(), and in its > ->dirty_inode() it does: > > static void > xfs_fs_dirty_inode( > struct inode *inode, > int flag) > { > struct xfs_inode *ip = XFS_I(inode); > struct xfs_mount *mp = ip->i_mount; > struct xfs_trans *tp; > > if (!(inode->i_sb->s_flags & SB_LAZYTIME)) > return; > if (flag != I_DIRTY_SYNC || !(inode->i_state & I_DIRTY_TIME)) > return; > > if (xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp)) > return; > xfs_ilock(ip, XFS_ILOCK_EXCL); > xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL); > xfs_trans_log_inode(tp, ip, XFS_ILOG_TIMESTAMP); > xfs_trans_commit(tp); > } > > > So flag=I_DIRTY_TIME_EXPIRED will be a no-op. > > Maybe you should be using I_DIRTY_SYNC instead? Or perhaps XFS should be > checking for either I_DIRTY_TIME_EXPIRED or I_DIRTY_SYNC? Right, XFS does not use the VFS inode writeback code at all - we track all metadata changes internally via journalling. The VFS uses I_DIRTY_SYNC to indicate and inode is metadata dirty and a writeback candidate. Hence if we need to mark an inode dirty for integrity purposes for _any reason_, then I_DIRTY_SYNC is the correct flag to be passing to ->dirty_inode. Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2020-03-20 2:47 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-03-06 0:45 lazytime causing inodes to remain dirty after sync? Eric Biggers 2020-03-07 2:00 ` [PATCH] writeback: avoid double-writing the inode on a lazytime expiration Theodore Ts'o 2020-03-11 3:20 ` Eric Biggers 2020-03-11 12:57 ` Theodore Y. Ts'o 2020-03-12 0:07 ` Dave Chinner 2020-03-12 14:34 ` Christoph Hellwig 2020-03-12 22:39 ` Dave Chinner 2020-03-20 2:46 ` Theodore Y. Ts'o 2020-03-11 23:54 ` Dave Chinner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).