Linux-ext4 Archive on lore.kernel.org
 help / color / Atom feed
* lazytime causing inodes to remain dirty after sync?
@ 2020-03-06  0:45 Eric Biggers
  2020-03-07  2:00 ` [PATCH] writeback: avoid double-writing the inode on a lazytime expiration Theodore Ts'o
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Biggers @ 2020-03-06  0:45 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-ext4, linux-f2fs-devel

While testing my patch "fscrypt: don't evict dirty inodes after removing key"
(https://lkml.kernel.org/r/20200305084138.653498-1-ebiggers@kernel.org), I've
run into an issue where even after the filesystem is sync'ed and no files are
in-use, inodes can remain dirty if the filesystem is mounted with -o lazytime.
Thus, my patch causes some inodes to not be evicted when they should be.

(lazytime is the default on f2fs, but ext4 supports it too.)

This is caused by the following code in __writeback_single_inode() that
redirties the inode if its access time is dirty:

	if (dirty & I_DIRTY_TIME)
		mark_inode_dirty_sync(inode);
	/* Don't write the inode if only I_DIRTY_PAGES was set */
	if (dirty & ~I_DIRTY_PAGES) {
		int err = write_inode(inode, wbc);
		if (ret == 0)
			ret = err;
	}
	trace_writeback_single_inode(inode, wbc, nr_to_write);
	return ret;

Here's a reproducer in the kvm-xfstests test appliance which demonstrates the
problem using sync(), without fscrypt involved at all:

	sysctl vm.dirty_expire_centisecs=500
	umount /vdc
	mkfs.ext4 -F /dev/vdc
	mount /vdc -o lazytime
	echo contents > /vdc/file
	sync
	ino=$(stat -c %i /vdc/file)
	echo 1 | tee /sys/kernel/debug/tracing/events/writeback/writeback_{single_inode_start,mark_inode_dirty,lazytime}/enable
	echo "ino == $ino" | tee /sys/kernel/debug/tracing/events/writeback/writeback_{single_inode_start,mark_inode_dirty,lazytime}/filter
	echo > /sys/kernel/debug/tracing/trace
	cat /vdc/file > /dev/null
	sync
	cat /sys/kernel/debug/tracing/trace_pipe

The tracing shows that the inode for /vdc/file is written during the sync at
7.28s.  But then, still during the sync, it's immediately re-dirtied.  It then
gets written again later in the background, after the sync.

             cat-286   [001] ...1     7.279433: writeback_mark_inode_dirty: bdi 254:32: ino=12 state= flags=I_DIRTY_TIME
    kworker/u8:0-8     [003] ...1     7.282647: writeback_single_inode_start: bdi 254:32: ino=12 state=I_SYNC|I_DIRTY_TIME|I_DIRTY_TIME_EXPIRED dirtied_when=4294879420 age=0 index=1 to_write=9223372036854775807 wrote=0 cgroup_ino=1
    kworker/u8:0-8     [003] ...2     7.282660: writeback_lazytime: dev 254,32 ino 12 dirtied 4294879420 state I_SYNC|I_DIRTY_TIME|I_DIRTY_TIME_EXPIRED mode 0100644
    kworker/u8:0-8     [003] ...1     7.283204: writeback_mark_inode_dirty: bdi 254:32: ino=12 state=I_SYNC flags=I_DIRTY_SYNC
    kworker/u8:0-8     [003] ...1    12.412079: writeback_single_inode_start: bdi 254:32: ino=12 state=I_DIRTY_SYNC|I_SYNC dirtied_when=4294879421 age=5 index=1 to_write=13312 wrote=0 cgroup_ino=1

Is this behavior intentional at all?  It seems like a bug; it seems the inode
should be written just once, during the sync.  

- Eric

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] writeback: avoid double-writing the inode on a lazytime expiration
  2020-03-06  0:45 lazytime causing inodes to remain dirty after sync? Eric Biggers
@ 2020-03-07  2:00 ` Theodore Ts'o
  2020-03-11  3:20   ` Eric Biggers
  0 siblings, 1 reply; 17+ messages in thread
From: Theodore Ts'o @ 2020-03-07  2:00 UTC (permalink / raw)
  To: Linux Filesystem Development List
  Cc: Ext4 Developers List, linux-f2fs-devel, Theodore Ts'o, Eric Biggers

In the case that an inode has dirty timestamp for longer than the
lazytime expiration timeout (or if all such inodes are being flushed
out due to a sync or syncfs system call), we need to inform the file
system that the inode is dirty so that the inode's timestamps can be
copied out to the on-disk data structures.  That's because if the file
system supports lazytime, it will have ignored the dirty_inode(inode,
I_DIRTY_TIME) notification when the timestamp was modified in memory.q

Previously, this was accomplished by calling mark_inode_dirty_sync(),
but that has the unfortunate side effect of also putting the inode the
writeback list, and that's not necessary in this case, since we will
immediately call write_inode() afterwards.

Eric Biggers noticed that this was causing problems for fscrypt after
the key was removed[1].

[1] https://lore.kernel.org/r/20200306004555.GB225345@gmail.com

Reported-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
---
 fs/fs-writeback.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 76ac9c7d32ec..32101349ba97 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1504,8 +1504,9 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
 
 	spin_unlock(&inode->i_lock);
 
-	if (dirty & I_DIRTY_TIME)
-		mark_inode_dirty_sync(inode);
+	/* This was a lazytime expiration; we need to tell the file system */
+	if (dirty & I_DIRTY_TIME_EXPIRED && inode->i_sb->s_op->dirty_inode)
+		inode->i_sb->s_op->dirty_inode(inode, I_DIRTY_TIME_EXPIRED);
 	/* Don't write the inode if only I_DIRTY_PAGES was set */
 	if (dirty & ~I_DIRTY_PAGES) {
 		int err = write_inode(inode, wbc);
-- 
2.24.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] writeback: avoid double-writing the inode on a lazytime expiration
  2020-03-07  2:00 ` [PATCH] writeback: avoid double-writing the inode on a lazytime expiration Theodore Ts'o
@ 2020-03-11  3:20   ` Eric Biggers
  2020-03-11 12:57     ` Theodore Y. Ts'o
  2020-03-11 23:54     ` [PATCH] " Dave Chinner
  0 siblings, 2 replies; 17+ messages in thread
From: Eric Biggers @ 2020-03-11  3:20 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Linux Filesystem Development List, Ext4 Developers List,
	linux-f2fs-devel

On Fri, Mar 06, 2020 at 09:00:43PM -0500, Theodore Ts'o wrote:
> In the case that an inode has dirty timestamp for longer than the
> lazytime expiration timeout (or if all such inodes are being flushed
> out due to a sync or syncfs system call), we need to inform the file
> system that the inode is dirty so that the inode's timestamps can be
> copied out to the on-disk data structures.  That's because if the file
> system supports lazytime, it will have ignored the dirty_inode(inode,
> I_DIRTY_TIME) notification when the timestamp was modified in memory.q
> 
> Previously, this was accomplished by calling mark_inode_dirty_sync(),
> but that has the unfortunate side effect of also putting the inode the
> writeback list, and that's not necessary in this case, since we will
> immediately call write_inode() afterwards.
> 
> Eric Biggers noticed that this was causing problems for fscrypt after
> the key was removed[1].
> 
> [1] https://lore.kernel.org/r/20200306004555.GB225345@gmail.com
> 
> Reported-by: Eric Biggers <ebiggers@kernel.org>
> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> ---
>  fs/fs-writeback.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 76ac9c7d32ec..32101349ba97 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -1504,8 +1504,9 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
>  
>  	spin_unlock(&inode->i_lock);
>  
> -	if (dirty & I_DIRTY_TIME)
> -		mark_inode_dirty_sync(inode);
> +	/* This was a lazytime expiration; we need to tell the file system */
> +	if (dirty & I_DIRTY_TIME_EXPIRED && inode->i_sb->s_op->dirty_inode)
> +		inode->i_sb->s_op->dirty_inode(inode, I_DIRTY_TIME_EXPIRED);
>  	/* Don't write the inode if only I_DIRTY_PAGES was set */
>  	if (dirty & ~I_DIRTY_PAGES) {
>  		int err = write_inode(inode, wbc);
> -- 

Thanks Ted!  This fixes the fscrypt test failure.

However, are you sure this works correctly on all filesystems?  I'm not sure
about XFS.  XFS only implements ->dirty_inode(), not ->write_inode(), and in its
->dirty_inode() it does:

	static void
	xfs_fs_dirty_inode(
		struct inode                    *inode,
		int                             flag)
	{
		struct xfs_inode                *ip = XFS_I(inode);
		struct xfs_mount                *mp = ip->i_mount;
		struct xfs_trans                *tp;

		if (!(inode->i_sb->s_flags & SB_LAZYTIME))
			return;
		if (flag != I_DIRTY_SYNC || !(inode->i_state & I_DIRTY_TIME))
			return;

		if (xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp))
			return;
		xfs_ilock(ip, XFS_ILOCK_EXCL);
		xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
		xfs_trans_log_inode(tp, ip, XFS_ILOG_TIMESTAMP);
		xfs_trans_commit(tp);
	}


So flag=I_DIRTY_TIME_EXPIRED will be a no-op.

Maybe you should be using I_DIRTY_SYNC instead?  Or perhaps XFS should be
checking for either I_DIRTY_TIME_EXPIRED or I_DIRTY_SYNC?

- Eric

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] writeback: avoid double-writing the inode on a lazytime expiration
  2020-03-11  3:20   ` Eric Biggers
@ 2020-03-11 12:57     ` Theodore Y. Ts'o
  2020-03-12  0:07       ` Dave Chinner
  2020-03-11 23:54     ` [PATCH] " Dave Chinner
  1 sibling, 1 reply; 17+ messages in thread
From: Theodore Y. Ts'o @ 2020-03-11 12:57 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Linux Filesystem Development List, Ext4 Developers List,
	linux-f2fs-devel

On Tue, Mar 10, 2020 at 08:20:09PM -0700, Eric Biggers wrote:
> Thanks Ted!  This fixes the fscrypt test failure.
> 
> However, are you sure this works correctly on all filesystems?  I'm not sure
> about XFS.  XFS only implements ->dirty_inode(), not ->write_inode(), and in its
> ->dirty_inode() it does:
  ...
> 		if (flag != I_DIRTY_SYNC || !(inode->i_state & I_DIRTY_TIME))
> 			return;

That's true, but when the timestamps were originally modified,
dirty_inode() will be called with flag == I_DIRTY_TIME, which will
*not* be a no-op; which is to say, XFS will force the timestamps to be
updated on disk when the timestamps are first dirtied, because it
doesn't support I_DIRTY_TIME.

So I think we're fine.

					- Ted

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] writeback: avoid double-writing the inode on a lazytime expiration
  2020-03-11  3:20   ` Eric Biggers
  2020-03-11 12:57     ` Theodore Y. Ts'o
@ 2020-03-11 23:54     ` " Dave Chinner
  1 sibling, 0 replies; 17+ messages in thread
From: Dave Chinner @ 2020-03-11 23:54 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Theodore Ts'o, Linux Filesystem Development List,
	Ext4 Developers List, linux-f2fs-devel

On Tue, Mar 10, 2020 at 08:20:09PM -0700, Eric Biggers wrote:
> On Fri, Mar 06, 2020 at 09:00:43PM -0500, Theodore Ts'o wrote:
> > In the case that an inode has dirty timestamp for longer than the
> > lazytime expiration timeout (or if all such inodes are being flushed
> > out due to a sync or syncfs system call), we need to inform the file
> > system that the inode is dirty so that the inode's timestamps can be
> > copied out to the on-disk data structures.  That's because if the file
> > system supports lazytime, it will have ignored the dirty_inode(inode,
> > I_DIRTY_TIME) notification when the timestamp was modified in memory.q
> > 
> > Previously, this was accomplished by calling mark_inode_dirty_sync(),
> > but that has the unfortunate side effect of also putting the inode the
> > writeback list, and that's not necessary in this case, since we will
> > immediately call write_inode() afterwards.
> > 
> > Eric Biggers noticed that this was causing problems for fscrypt after
> > the key was removed[1].
> > 
> > [1] https://lore.kernel.org/r/20200306004555.GB225345@gmail.com
> > 
> > Reported-by: Eric Biggers <ebiggers@kernel.org>
> > Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> > ---
> >  fs/fs-writeback.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index 76ac9c7d32ec..32101349ba97 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -1504,8 +1504,9 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
> >  
> >  	spin_unlock(&inode->i_lock);
> >  
> > -	if (dirty & I_DIRTY_TIME)
> > -		mark_inode_dirty_sync(inode);
> > +	/* This was a lazytime expiration; we need to tell the file system */
> > +	if (dirty & I_DIRTY_TIME_EXPIRED && inode->i_sb->s_op->dirty_inode)
> > +		inode->i_sb->s_op->dirty_inode(inode, I_DIRTY_TIME_EXPIRED);
> >  	/* Don't write the inode if only I_DIRTY_PAGES was set */
> >  	if (dirty & ~I_DIRTY_PAGES) {
> >  		int err = write_inode(inode, wbc);
> > -- 
> 
> Thanks Ted!  This fixes the fscrypt test failure.
> 
> However, are you sure this works correctly on all filesystems?  I'm not sure
> about XFS.  XFS only implements ->dirty_inode(), not ->write_inode(), and in its
> ->dirty_inode() it does:
> 
> 	static void
> 	xfs_fs_dirty_inode(
> 		struct inode                    *inode,
> 		int                             flag)
> 	{
> 		struct xfs_inode                *ip = XFS_I(inode);
> 		struct xfs_mount                *mp = ip->i_mount;
> 		struct xfs_trans                *tp;
> 
> 		if (!(inode->i_sb->s_flags & SB_LAZYTIME))
> 			return;
> 		if (flag != I_DIRTY_SYNC || !(inode->i_state & I_DIRTY_TIME))
> 			return;
> 
> 		if (xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp))
> 			return;
> 		xfs_ilock(ip, XFS_ILOCK_EXCL);
> 		xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
> 		xfs_trans_log_inode(tp, ip, XFS_ILOG_TIMESTAMP);
> 		xfs_trans_commit(tp);
> 	}
> 
> 
> So flag=I_DIRTY_TIME_EXPIRED will be a no-op.
> 
> Maybe you should be using I_DIRTY_SYNC instead?  Or perhaps XFS should be
> checking for either I_DIRTY_TIME_EXPIRED or I_DIRTY_SYNC?

Right, XFS does not use the VFS inode writeback code at all - we
track all metadata changes internally via journalling. The VFS uses
I_DIRTY_SYNC to indicate and inode is metadata dirty and a writeback
candidate. Hence if we need to mark an inode dirty for integrity
purposes for _any reason_, then I_DIRTY_SYNC is the correct flag to
be passing to ->dirty_inode.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] writeback: avoid double-writing the inode on a lazytime expiration
  2020-03-11 12:57     ` Theodore Y. Ts'o
@ 2020-03-12  0:07       ` Dave Chinner
  2020-03-12 14:34         ` Christoph Hellwig
  0 siblings, 1 reply; 17+ messages in thread
From: Dave Chinner @ 2020-03-12  0:07 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Eric Biggers, Linux Filesystem Development List,
	Ext4 Developers List, linux-f2fs-devel, linux-xfs

On Wed, Mar 11, 2020 at 08:57:49AM -0400, Theodore Y. Ts'o wrote:
> On Tue, Mar 10, 2020 at 08:20:09PM -0700, Eric Biggers wrote:
> > Thanks Ted!  This fixes the fscrypt test failure.
> > 
> > However, are you sure this works correctly on all filesystems?  I'm not sure
> > about XFS.  XFS only implements ->dirty_inode(), not ->write_inode(), and in its
> > ->dirty_inode() it does:
>   ...
> > 		if (flag != I_DIRTY_SYNC || !(inode->i_state & I_DIRTY_TIME))
> > 			return;
> 
> That's true, but when the timestamps were originally modified,
> dirty_inode() will be called with flag == I_DIRTY_TIME, which will
> *not* be a no-op; which is to say, XFS will force the timestamps to be
> updated on disk when the timestamps are first dirtied, because it
> doesn't support I_DIRTY_TIME.

We log the initial timestamp change, and then ignore timestamp
updates until the dirty time expires and the inode is set
I_DIRTY_SYNC via __mark_inode_dirty_sync(). IOWs, on expiry, we have
time stamps that may be 24 hours out of date in memory, and they
still need to be flushed to the journal.

However, your change does not mark the inode dirtying on expiry
anymore, so...

> So I think we're fine.

... we're not fine. This breaks XFS and any other filesystem that
relies on a I_DIRTY_SYNC notification to handle dirty time expiry
correctly.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] writeback: avoid double-writing the inode on a lazytime expiration
  2020-03-12  0:07       ` Dave Chinner
@ 2020-03-12 14:34         ` Christoph Hellwig
  2020-03-12 22:39           ` Dave Chinner
  2020-03-20  2:46           ` Theodore Y. Ts'o
  0 siblings, 2 replies; 17+ messages in thread
From: Christoph Hellwig @ 2020-03-12 14:34 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Theodore Y. Ts'o, Eric Biggers,
	Linux Filesystem Development List, Ext4 Developers List,
	linux-f2fs-devel, linux-xfs

On Thu, Mar 12, 2020 at 11:07:17AM +1100, Dave Chinner wrote:
> > That's true, but when the timestamps were originally modified,
> > dirty_inode() will be called with flag == I_DIRTY_TIME, which will
> > *not* be a no-op; which is to say, XFS will force the timestamps to be
> > updated on disk when the timestamps are first dirtied, because it
> > doesn't support I_DIRTY_TIME.
> 
> We log the initial timestamp change, and then ignore timestamp
> updates until the dirty time expires and the inode is set
> I_DIRTY_SYNC via __mark_inode_dirty_sync(). IOWs, on expiry, we have
> time stamps that may be 24 hours out of date in memory, and they
> still need to be flushed to the journal.
> 
> However, your change does not mark the inode dirtying on expiry
> anymore, so...
> 
> > So I think we're fine.
> 
> ... we're not fine. This breaks XFS and any other filesystem that
> relies on a I_DIRTY_SYNC notification to handle dirty time expiry
> correctly.

I haven't seen the original mail this replies to, but if we could
get the lazytime expirty by some other means (e.g. an explicit
callback), XFS could opt out of all the VFS inode tracking again,
which would simplify a few things.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] writeback: avoid double-writing the inode on a lazytime expiration
  2020-03-12 14:34         ` Christoph Hellwig
@ 2020-03-12 22:39           ` Dave Chinner
  2020-03-20  2:46           ` Theodore Y. Ts'o
  1 sibling, 0 replies; 17+ messages in thread
From: Dave Chinner @ 2020-03-12 22:39 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Theodore Y. Ts'o, Eric Biggers,
	Linux Filesystem Development List, Ext4 Developers List,
	linux-f2fs-devel, linux-xfs

On Thu, Mar 12, 2020 at 07:34:45AM -0700, Christoph Hellwig wrote:
> On Thu, Mar 12, 2020 at 11:07:17AM +1100, Dave Chinner wrote:
> > > That's true, but when the timestamps were originally modified,
> > > dirty_inode() will be called with flag == I_DIRTY_TIME, which will
> > > *not* be a no-op; which is to say, XFS will force the timestamps to be
> > > updated on disk when the timestamps are first dirtied, because it
> > > doesn't support I_DIRTY_TIME.
> > 
> > We log the initial timestamp change, and then ignore timestamp
> > updates until the dirty time expires and the inode is set
> > I_DIRTY_SYNC via __mark_inode_dirty_sync(). IOWs, on expiry, we have
> > time stamps that may be 24 hours out of date in memory, and they
> > still need to be flushed to the journal.
> > 
> > However, your change does not mark the inode dirtying on expiry
> > anymore, so...
> > 
> > > So I think we're fine.
> > 
> > ... we're not fine. This breaks XFS and any other filesystem that
> > relies on a I_DIRTY_SYNC notification to handle dirty time expiry
> > correctly.
> 
> I haven't seen the original mail this replies to,

The original problem was calling mark_inode_dirty_sync() on expiry
during inode writeback was causing the inode to be put back on the
dirty inode list and so ext4 was flushing it twice - once on expiry
and once 5 seconds later on the next background writeback pass.

This is a problem that XFS does not have because it does not
implement ->write_inode...

> but if we could
> get the lazytime expirty by some other means (e.g. an explicit
> callback), XFS could opt out of all the VFS inode tracking again,
> which would simplify a few things.

Yes, that would definitely make things simpler for XFS, and it would
also solve the problem that the generic lazytime expiry code has....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] writeback: avoid double-writing the inode on a lazytime expiration
  2020-03-12 14:34         ` Christoph Hellwig
  2020-03-12 22:39           ` Dave Chinner
@ 2020-03-20  2:46           ` Theodore Y. Ts'o
  2020-03-20  2:52             ` [PATCH 1/2] " Theodore Ts'o
  1 sibling, 1 reply; 17+ messages in thread
From: Theodore Y. Ts'o @ 2020-03-20  2:46 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dave Chinner, Eric Biggers, Linux Filesystem Development List,
	Ext4 Developers List, linux-f2fs-devel, linux-xfs

On Thu, Mar 12, 2020 at 07:34:45AM -0700, Christoph Hellwig wrote:
> I haven't seen the original mail this replies to, but if we could
> get the lazytime expirty by some other means (e.g. an explicit
> callback), XFS could opt out of all the VFS inode tracking again,
> which would simplify a few things.

Part of my thinking of calling 

       inode->i_sb->s_op->dirty_inode(inode, I_DIRTY_TIME_EXPIRED);

So that it would be an explicit callback to XFS.  So why don't I break
this as two patches --- one which uses I_DIRTY_SYNC, as before, and a
second one which changes calls dirty_inode() with
I_DIRTY_TIME_EXPIRED, and with a change to XFS so that it recognizes
I_DIRTY_TIME_EXPIRED as if it were I_DIRTY_SYNC.  If this would then
allow XFS to simplify how it handles VFS tracking, you could do that
in a separate patch.

Does that work?  I'll send out the two patches, and if you can
review/ack the second patch, that would be great.

	       	      	     	  	- Ted

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/2] writeback: avoid double-writing the inode on a lazytime expiration
  2020-03-20  2:46           ` Theodore Y. Ts'o
@ 2020-03-20  2:52             ` " Theodore Ts'o
  2020-03-20  2:52               ` [PATCH 2/2] writeback, xfs: call dirty_inode() with I_DIRTY_TIME_EXPIRED when appropriate Theodore Ts'o
  2020-03-25  9:20               ` [PATCH 1/2] writeback: avoid double-writing the inode on a lazytime expiration Christoph Hellwig
  0 siblings, 2 replies; 17+ messages in thread
From: Theodore Ts'o @ 2020-03-20  2:52 UTC (permalink / raw)
  To: Ext4 Developers List, linux-f2fs-devel, linux-xfs
  Cc: Theodore Ts'o, Eric Biggers

In the case that an inode has dirty timestamp for longer than the
lazytime expiration timeout (or if all such inodes are being flushed
out due to a sync or syncfs system call), we need to inform the file
system that the inode is dirty so that the inode's timestamps can be
copied out to the on-disk data structures.  That's because if the file
system supports lazytime, it will have ignored the dirty_inode(inode,
I_DIRTY_TIME) notification when the timestamp was modified in memory.q

Previously, this was accomplished by calling mark_inode_dirty_sync(),
but that has the unfortunate side effect of also putting the inode the
writeback list, and that's not necessary in this case, since we will
immediately call write_inode() afterwards.

Eric Biggers noticed that this was causing problems for fscrypt after
the key was removed[1].

[1] https://lore.kernel.org/r/20200306004555.GB225345@gmail.com

Reported-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
---
 fs/fs-writeback.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 76ac9c7d32ec..867454997c9d 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1504,8 +1504,9 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
 
 	spin_unlock(&inode->i_lock);
 
-	if (dirty & I_DIRTY_TIME)
-		mark_inode_dirty_sync(inode);
+	/* This was a lazytime expiration; we need to tell the file system */
+	if (dirty & I_DIRTY_TIME_EXPIRED && inode->i_sb->s_op->dirty_inode)
+		inode->i_sb->s_op->dirty_inode(inode, I_DIRTY_SYNC);
 	/* Don't write the inode if only I_DIRTY_PAGES was set */
 	if (dirty & ~I_DIRTY_PAGES) {
 		int err = write_inode(inode, wbc);
-- 
2.24.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 2/2] writeback, xfs: call dirty_inode() with I_DIRTY_TIME_EXPIRED when appropriate
  2020-03-20  2:52             ` [PATCH 1/2] " Theodore Ts'o
@ 2020-03-20  2:52               ` Theodore Ts'o
  2020-03-23 17:58                 ` Theodore Y. Ts'o
  2020-03-25  9:20               ` [PATCH 1/2] writeback: avoid double-writing the inode on a lazytime expiration Christoph Hellwig
  1 sibling, 1 reply; 17+ messages in thread
From: Theodore Ts'o @ 2020-03-20  2:52 UTC (permalink / raw)
  To: Ext4 Developers List, linux-f2fs-devel, linux-xfs; +Cc: Theodore Ts'o

Use the flag I_DIRTY_TIME_EXPIRED passed to dirty_inode() to signal to
the file system that it is time to flush the inode's timestamps to
stable storage.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
---
 fs/fs-writeback.c  | 2 +-
 fs/xfs/xfs_super.c | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 867454997c9d..32101349ba97 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1506,7 +1506,7 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
 
 	/* This was a lazytime expiration; we need to tell the file system */
 	if (dirty & I_DIRTY_TIME_EXPIRED && inode->i_sb->s_op->dirty_inode)
-		inode->i_sb->s_op->dirty_inode(inode, I_DIRTY_SYNC);
+		inode->i_sb->s_op->dirty_inode(inode, I_DIRTY_TIME_EXPIRED);
 	/* Don't write the inode if only I_DIRTY_PAGES was set */
 	if (dirty & ~I_DIRTY_PAGES) {
 		int err = write_inode(inode, wbc);
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 2094386af8ac..f27b9b205f81 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -622,7 +622,8 @@ xfs_fs_dirty_inode(
 
 	if (!(inode->i_sb->s_flags & SB_LAZYTIME))
 		return;
-	if (flag != I_DIRTY_SYNC || !(inode->i_state & I_DIRTY_TIME))
+	if ((flag != I_DIRTY_SYNC && flag != I_DIRTY_TIME_EXPIRED) ||
+	    !(inode->i_state & I_DIRTY_TIME))
 		return;
 
 	if (xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp))
-- 
2.24.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] writeback, xfs: call dirty_inode() with I_DIRTY_TIME_EXPIRED when appropriate
  2020-03-20  2:52               ` [PATCH 2/2] writeback, xfs: call dirty_inode() with I_DIRTY_TIME_EXPIRED when appropriate Theodore Ts'o
@ 2020-03-23 17:58                 ` Theodore Y. Ts'o
  2020-03-24  8:37                   ` Christoph Hellwig
  0 siblings, 1 reply; 17+ messages in thread
From: Theodore Y. Ts'o @ 2020-03-23 17:58 UTC (permalink / raw)
  To: Dave Chinner, Christoph Hellwig, Ext4 Developers List,
	linux-f2fs-devel, linux-xfs

Christoph, Dave --- does this give you the notification that you were
looking such that XFS could get the notification desired that it was
the timestamps need to be written back?

    	       	       	  	  - Ted

On Thu, Mar 19, 2020 at 10:52:55PM -0400, Theodore Ts'o wrote:
> Use the flag I_DIRTY_TIME_EXPIRED passed to dirty_inode() to signal to
> the file system that it is time to flush the inode's timestamps to
> stable storage.
> 
> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> ---
>  fs/fs-writeback.c  | 2 +-
>  fs/xfs/xfs_super.c | 3 ++-
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 867454997c9d..32101349ba97 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -1506,7 +1506,7 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
>  
>  	/* This was a lazytime expiration; we need to tell the file system */
>  	if (dirty & I_DIRTY_TIME_EXPIRED && inode->i_sb->s_op->dirty_inode)
> -		inode->i_sb->s_op->dirty_inode(inode, I_DIRTY_SYNC);
> +		inode->i_sb->s_op->dirty_inode(inode, I_DIRTY_TIME_EXPIRED);
>  	/* Don't write the inode if only I_DIRTY_PAGES was set */
>  	if (dirty & ~I_DIRTY_PAGES) {
>  		int err = write_inode(inode, wbc);
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 2094386af8ac..f27b9b205f81 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -622,7 +622,8 @@ xfs_fs_dirty_inode(
>  
>  	if (!(inode->i_sb->s_flags & SB_LAZYTIME))
>  		return;
> -	if (flag != I_DIRTY_SYNC || !(inode->i_state & I_DIRTY_TIME))
> +	if ((flag != I_DIRTY_SYNC && flag != I_DIRTY_TIME_EXPIRED) ||
> +	    !(inode->i_state & I_DIRTY_TIME))
>  		return;
>  
>  	if (xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp))
> -- 
> 2.24.1
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] writeback, xfs: call dirty_inode() with I_DIRTY_TIME_EXPIRED when appropriate
  2020-03-23 17:58                 ` Theodore Y. Ts'o
@ 2020-03-24  8:37                   ` Christoph Hellwig
  2020-03-24 18:43                     ` Theodore Y. Ts'o
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2020-03-24  8:37 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Dave Chinner, Christoph Hellwig, Ext4 Developers List,
	linux-f2fs-devel, linux-xfs

On Mon, Mar 23, 2020 at 01:58:38PM -0400, Theodore Y. Ts'o wrote:
> Christoph, Dave --- does this give you the notification that you were
> looking such that XFS could get the notification desired that it was
> the timestamps need to be written back?

I need to look at it in more detail as it seems convoluted.  Also the
order seems like you regress XFS in patch 1 and then fix it in patch 2?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] writeback, xfs: call dirty_inode() with I_DIRTY_TIME_EXPIRED when appropriate
  2020-03-24  8:37                   ` Christoph Hellwig
@ 2020-03-24 18:43                     ` Theodore Y. Ts'o
  0 siblings, 0 replies; 17+ messages in thread
From: Theodore Y. Ts'o @ 2020-03-24 18:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dave Chinner, Ext4 Developers List, linux-f2fs-devel, linux-xfs

On Tue, Mar 24, 2020 at 01:37:59AM -0700, Christoph Hellwig wrote:
> On Mon, Mar 23, 2020 at 01:58:38PM -0400, Theodore Y. Ts'o wrote:
> > Christoph, Dave --- does this give you the notification that you were
> > looking such that XFS could get the notification desired that it was
> > the timestamps need to be written back?
> 
> I need to look at it in more detail as it seems convoluted.  Also the
> order seems like you regress XFS in patch 1 and then fix it in patch 2?

In patch one we send I_DIRTY_SYNC as we had been doing as before.  So
I don't believe that patch #1 would regress XFS; can you confirm?

My thinking was to move ahead with patch 1 so that it fixed the bug
which Eric Biffers had reported for f2fs, but only to move forward
with patch #2 if it would be useful for XFS.

Cheers,

     	      	    	     	    - Ted

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] writeback: avoid double-writing the inode on a lazytime expiration
  2020-03-20  2:52             ` [PATCH 1/2] " Theodore Ts'o
  2020-03-20  2:52               ` [PATCH 2/2] writeback, xfs: call dirty_inode() with I_DIRTY_TIME_EXPIRED when appropriate Theodore Ts'o
@ 2020-03-25  9:20               ` Christoph Hellwig
  2020-03-25 15:21                 ` Theodore Y. Ts'o
  1 sibling, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2020-03-25  9:20 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Ext4 Developers List, linux-f2fs-devel, linux-xfs, Eric Biggers,
	Richard Weinberger

>  	spin_unlock(&inode->i_lock);
>  
> -	if (dirty & I_DIRTY_TIME)
> -		mark_inode_dirty_sync(inode);
> +	/* This was a lazytime expiration; we need to tell the file system */
> +	if (dirty & I_DIRTY_TIME_EXPIRED && inode->i_sb->s_op->dirty_inode)
> +		inode->i_sb->s_op->dirty_inode(inode, I_DIRTY_SYNC);

I think this needs a very clear comment explaining why we don't go
through __mark_inode_dirty.

But as said before I'd rather have a new lazytime_expired operation that
makes it very clear what is happening.  We currenly have 4 file systems
(ext4, f2fs, ubifs and xfs) that support lazytime, so this won't really
be a major churn.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] writeback: avoid double-writing the inode on a lazytime expiration
  2020-03-25  9:20               ` [PATCH 1/2] writeback: avoid double-writing the inode on a lazytime expiration Christoph Hellwig
@ 2020-03-25 15:21                 ` Theodore Y. Ts'o
  2020-03-25 15:47                   ` Darrick J. Wong
  0 siblings, 1 reply; 17+ messages in thread
From: Theodore Y. Ts'o @ 2020-03-25 15:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ext4 Developers List, linux-f2fs-devel, linux-xfs, Eric Biggers,
	Richard Weinberger

On Wed, Mar 25, 2020 at 02:20:57AM -0700, Christoph Hellwig wrote:
> >  	spin_unlock(&inode->i_lock);
> >  
> > -	if (dirty & I_DIRTY_TIME)
> > -		mark_inode_dirty_sync(inode);
> > +	/* This was a lazytime expiration; we need to tell the file system */
> > +	if (dirty & I_DIRTY_TIME_EXPIRED && inode->i_sb->s_op->dirty_inode)
> > +		inode->i_sb->s_op->dirty_inode(inode, I_DIRTY_SYNC);
> 
> I think this needs a very clear comment explaining why we don't go
> through __mark_inode_dirty.

I can take the explanation which is in the git commit description and
move it into the comment.

> But as said before I'd rather have a new lazytime_expired operation that
> makes it very clear what is happening.  We currenly have 4 file systems
> (ext4, f2fs, ubifs and xfs) that support lazytime, so this won't really
> be a major churn.

Again, I believe patch #2 does what you want; if it doesn't can you
explain why passing I_DIRTY_TIME_EXPIRED to s_op->dirty_inode() isn't
"a new lazytime expired operation that makes very clear what is
happening"?

I separated out patch #1 and patch #2 because patch #1 preserves
current behavior, and patch #2 modifies XFS code, which I don't want
to push Linus without an XFS reviewed-by.

N.b.  None of the other file systems required a change for patch #2,
so if you want, we can have the XFS tree carry patch #2, and/or
combine that with whatever other simplifying changes that you want.
Or I can combine patch #1 and patch #2, with an XFS Reviewed-by, and
send it through the ext4 tree.

What's your pleasure?

					- Ted


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] writeback: avoid double-writing the inode on a lazytime expiration
  2020-03-25 15:21                 ` Theodore Y. Ts'o
@ 2020-03-25 15:47                   ` Darrick J. Wong
  0 siblings, 0 replies; 17+ messages in thread
From: Darrick J. Wong @ 2020-03-25 15:47 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Christoph Hellwig, Ext4 Developers List, linux-f2fs-devel,
	linux-xfs, Eric Biggers, Richard Weinberger

On Wed, Mar 25, 2020 at 11:21:13AM -0400, Theodore Y. Ts'o wrote:
> On Wed, Mar 25, 2020 at 02:20:57AM -0700, Christoph Hellwig wrote:
> > >  	spin_unlock(&inode->i_lock);
> > >  
> > > -	if (dirty & I_DIRTY_TIME)
> > > -		mark_inode_dirty_sync(inode);
> > > +	/* This was a lazytime expiration; we need to tell the file system */
> > > +	if (dirty & I_DIRTY_TIME_EXPIRED && inode->i_sb->s_op->dirty_inode)
> > > +		inode->i_sb->s_op->dirty_inode(inode, I_DIRTY_SYNC);
> > 
> > I think this needs a very clear comment explaining why we don't go
> > through __mark_inode_dirty.
> 
> I can take the explanation which is in the git commit description and
> move it into the comment.
> 
> > But as said before I'd rather have a new lazytime_expired operation that
> > makes it very clear what is happening.  We currenly have 4 file systems
> > (ext4, f2fs, ubifs and xfs) that support lazytime, so this won't really
> > be a major churn.
> 
> Again, I believe patch #2 does what you want; if it doesn't can you
> explain why passing I_DIRTY_TIME_EXPIRED to s_op->dirty_inode() isn't
> "a new lazytime expired operation that makes very clear what is
> happening"?
> 
> I separated out patch #1 and patch #2 because patch #1 preserves
> current behavior, and patch #2 modifies XFS code, which I don't want
> to push Linus without an XFS reviewed-by.
> 
> N.b.  None of the other file systems required a change for patch #2,
> so if you want, we can have the XFS tree carry patch #2, and/or
> combine that with whatever other simplifying changes that you want.
> Or I can combine patch #1 and patch #2, with an XFS Reviewed-by, and
> send it through the ext4 tree.
> 
> What's your pleasure?

TBH while I'm pretty sure this does actually maintain more or less the
same behavior on xfs, I prefer Christoph's explicit ->lazytime_expired
approach[1] over squinting at bitflag manipulations.

(It also took me a while to realize that this patch duo even existed, as
it was kinda buried in its parent thread...)

--D

[1] https://lore.kernel.org/linux-fsdevel/20200325122825.1086872-1-hch@lst.de/T/#t

> 
> 					- Ted
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, back to index

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-06  0:45 lazytime causing inodes to remain dirty after sync? Eric Biggers
2020-03-07  2:00 ` [PATCH] writeback: avoid double-writing the inode on a lazytime expiration Theodore Ts'o
2020-03-11  3:20   ` Eric Biggers
2020-03-11 12:57     ` Theodore Y. Ts'o
2020-03-12  0:07       ` Dave Chinner
2020-03-12 14:34         ` Christoph Hellwig
2020-03-12 22:39           ` Dave Chinner
2020-03-20  2:46           ` Theodore Y. Ts'o
2020-03-20  2:52             ` [PATCH 1/2] " Theodore Ts'o
2020-03-20  2:52               ` [PATCH 2/2] writeback, xfs: call dirty_inode() with I_DIRTY_TIME_EXPIRED when appropriate Theodore Ts'o
2020-03-23 17:58                 ` Theodore Y. Ts'o
2020-03-24  8:37                   ` Christoph Hellwig
2020-03-24 18:43                     ` Theodore Y. Ts'o
2020-03-25  9:20               ` [PATCH 1/2] writeback: avoid double-writing the inode on a lazytime expiration Christoph Hellwig
2020-03-25 15:21                 ` Theodore Y. Ts'o
2020-03-25 15:47                   ` Darrick J. Wong
2020-03-11 23:54     ` [PATCH] " Dave Chinner

Linux-ext4 Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-ext4/0 linux-ext4/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-ext4 linux-ext4/ https://lore.kernel.org/linux-ext4 \
		linux-ext4@vger.kernel.org
	public-inbox-index linux-ext4

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-ext4


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git