Linux-ext4 Archive on lore.kernel.org
 help / color / Atom feed
From: Jan Kara <jack@suse.cz>
To: Eric Biggers <ebiggers@kernel.org>
Cc: linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
	linux-f2fs-devel@lists.sourceforge.net,
	linux-xfs@vger.kernel.org, Theodore Ts'o <tytso@mit.edu>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH 11/13] fs: add a lazytime_expired method
Date: Thu, 7 Jan 2021 15:02:28 +0100
Message-ID: <20210107140228.GF12990@quack2.suse.cz> (raw)
In-Reply-To: <20210105005452.92521-12-ebiggers@kernel.org>

On Mon 04-01-21 16:54:50, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> Add a lazytime_expired method to 'struct super_operations'.  Filesystems
> can implement this to be notified when an inode's lazytime timestamps
> have expired and need to be written to disk.
> 
> This avoids any potential ambiguity with
> ->dirty_inode(inode, I_DIRTY_SYNC), which can also mean a generic
> dirtying of the inode, not just a lazytime timestamp expiration.
> In particular, this will be useful for XFS.
> 
> If not implemented, then ->dirty_inode(inode, I_DIRTY_SYNC) continues to
> be called.
> 
> Note that there are three cases where we have to make sure to call
> lazytime_expired():
> 
> - __writeback_single_inode(): inode is being written now
> - vfs_fsync_range(): inode is going to be synced
> - iput(): inode is going to be evicted
> 
> In the latter two cases, the inode still needs to be put on the
> writeback list.  So, we can't just replace the calls to
> mark_inode_dirty_sync() with lazytime_expired().  Instead, add a new
> flag I_DIRTY_TIME_EXPIRED which can be passed to __mark_inode_dirty().
> It's like I_DIRTY_SYNC, except it causes the filesystem to be notified
> of a lazytime expiration rather than a generic I_DIRTY_SYNC.
> 
> Signed-off-by: Eric Biggers <ebiggers@google.com>

Hum, seeing this patch I kind of wonder: Why don't we dirty the inode after
expiring the lazytime timestamps with I_DIRTY_SYNC | I_DIRTY_TIME_EXPIRED
and propagate I_DIRTY_TIME_EXPIRED even to ->dirty_inode() where XFS can
catch it and act? Functionally it would be the same but we'd save a bunch
of generic code and ->lazytime_expired helper used just by a single
filesystem...

								Honza

> ---
>  Documentation/filesystems/locking.rst |  2 ++
>  Documentation/filesystems/vfs.rst     | 10 ++++++++++
>  fs/fs-writeback.c                     | 27 ++++++++++++++++++++++-----
>  fs/inode.c                            |  2 +-
>  fs/sync.c                             |  2 +-
>  include/linux/fs.h                    |  7 ++++++-
>  6 files changed, 42 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
> index c0f2c7586531b..53088e2a93b69 100644
> --- a/Documentation/filesystems/locking.rst
> +++ b/Documentation/filesystems/locking.rst
> @@ -150,6 +150,7 @@ prototypes::
>  	void (*free_inode)(struct inode *);
>  	void (*destroy_inode)(struct inode *);
>  	void (*dirty_inode) (struct inode *, int flags);
> +	void (*lazytime_expired) (struct inode *);
>  	int (*write_inode) (struct inode *, struct writeback_control *wbc);
>  	int (*drop_inode) (struct inode *);
>  	void (*evict_inode) (struct inode *);
> @@ -175,6 +176,7 @@ alloc_inode:
>  free_inode:				called from RCU callback
>  destroy_inode:
>  dirty_inode:
> +lazytime_expired:
>  write_inode:
>  drop_inode:				!!!inode->i_lock!!!
>  evict_inode:
> diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
> index 287b80948a40b..02531b1421d01 100644
> --- a/Documentation/filesystems/vfs.rst
> +++ b/Documentation/filesystems/vfs.rst
> @@ -231,6 +231,7 @@ filesystem.  As of kernel 2.6.22, the following members are defined:
>  		void (*destroy_inode)(struct inode *);
>  
>  		void (*dirty_inode) (struct inode *, int flags);
> +		void (*lazytime_expired) (struct inode *);
>  		int (*write_inode) (struct inode *, int);
>  		void (*drop_inode) (struct inode *);
>  		void (*delete_inode) (struct inode *);
> @@ -275,6 +276,15 @@ or bottom half).
>  	not its data.  If the update needs to be persisted by fdatasync(),
>  	then I_DIRTY_DATASYNC will be set in the flags argument.
>  
> +``lazytime_expired``
> +	when the lazytime mount option is enabled, this method is
> +	called when an inode's in-memory updated timestamps have
> +	expired and thus need to be written to disk.  This happens
> +	when the timestamps have been in memory for too long, when the
> +	inode is going to be evicted, or when userspace triggers a
> +	sync.  If this method is not implemented, then
> +	->dirty_inode(inode, I_DIRTY_SYNC) is called instead.
> +
>  ``write_inode``
>  	this method is called when the VFS needs to write an inode to
>  	disc.  The second parameter indicates whether the write should
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index ed76112bd067b..f187fc3f854e4 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -1441,6 +1441,14 @@ static void requeue_inode(struct inode *inode, struct bdi_writeback *wb,
>  	}
>  }
>  
> +static void lazytime_expired(struct inode *inode)
> +{
> +	if (inode->i_sb->s_op->lazytime_expired)
> +		inode->i_sb->s_op->lazytime_expired(inode);
> +	else if (inode->i_sb->s_op->dirty_inode)
> +		inode->i_sb->s_op->dirty_inode(inode, I_DIRTY_SYNC);
> +}
> +
>  /*
>   * Write out an inode and its dirty pages. Do not update the writeback list
>   * linkage. That is left to the caller. The caller is also responsible for
> @@ -1520,8 +1528,8 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
>  		 * isn't enough.  Don't call mark_inode_dirty_sync(), as that
>  		 * would put the inode back on the dirty list.
>  		 */
> -		if ((dirty & I_DIRTY_TIME) && inode->i_sb->s_op->dirty_inode)
> -			inode->i_sb->s_op->dirty_inode(inode, I_DIRTY_SYNC);
> +		if (dirty & I_DIRTY_TIME)
> +			lazytime_expired(inode);
>  
>  		err = write_inode(inode, wbc);
>  		if (ret == 0)
> @@ -2231,8 +2239,9 @@ static noinline void block_dump___mark_inode_dirty(struct inode *inode)
>   *
>   * @inode: inode to mark
>   * @flags: what kind of dirty, e.g. I_DIRTY_SYNC.  This can be a combination of
> - *	   multiple I_DIRTY_* flags, except that I_DIRTY_TIME can't be combined
> - *	   with I_DIRTY_PAGES.
> + *	   multiple I_DIRTY_* flags, except that:
> + *	   - I_DIRTY_TIME can't be combined with I_DIRTY_PAGES
> + *	   - I_DIRTY_TIME_EXPIRED must be used by itself
>   *
>   * Mark an inode as dirty.  We notify the filesystem, then update the inode's
>   * dirty flags.  Then, if needed we add the inode to the appropriate dirty list.
> @@ -2260,7 +2269,15 @@ void __mark_inode_dirty(struct inode *inode, int flags)
>  
>  	trace_writeback_mark_inode_dirty(inode, flags);
>  
> -	if (flags & I_DIRTY_INODE) {
> +	if (flags & I_DIRTY_TIME_EXPIRED) {
> +		/*
> +		 * Notify the filesystem about a lazytime timestamp expiration.
> +		 * Afterwards, this case just turns into I_DIRTY_SYNC.
> +		 */
> +		WARN_ON_ONCE(flags & ~I_DIRTY_TIME_EXPIRED);
> +		lazytime_expired(inode);
> +		flags = I_DIRTY_SYNC;
> +	} else if (flags & I_DIRTY_INODE) {
>  		/*
>  		 * Notify the filesystem about the inode being dirtied, so that
>  		 * (if needed) it can update on-disk fields and journal the
> diff --git a/fs/inode.c b/fs/inode.c
> index d0fa43d8e9d5c..039b201a4743a 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1673,7 +1673,7 @@ void iput(struct inode *inode)
>  			atomic_inc(&inode->i_count);
>  			spin_unlock(&inode->i_lock);
>  			trace_writeback_lazytime_iput(inode);
> -			mark_inode_dirty_sync(inode);
> +			__mark_inode_dirty(inode, I_DIRTY_TIME_EXPIRED);
>  			goto retry;
>  		}
>  		iput_final(inode);
> diff --git a/fs/sync.c b/fs/sync.c
> index 1373a610dc784..363071a3528e3 100644
> --- a/fs/sync.c
> +++ b/fs/sync.c
> @@ -196,7 +196,7 @@ int vfs_fsync_range(struct file *file, loff_t start, loff_t end, int datasync)
>  	if (!file->f_op->fsync)
>  		return -EINVAL;
>  	if (!datasync && (inode->i_state & I_DIRTY_TIME))
> -		mark_inode_dirty_sync(inode);
> +		__mark_inode_dirty(inode, I_DIRTY_TIME_EXPIRED);
>  	return file->f_op->fsync(file, start, end, datasync);
>  }
>  EXPORT_SYMBOL(vfs_fsync_range);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 45a0303b2aeb6..8c5f5c5e62be4 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1935,7 +1935,8 @@ struct super_operations {
>  	void (*destroy_inode)(struct inode *);
>  	void (*free_inode)(struct inode *);
>  
> -   	void (*dirty_inode) (struct inode *, int flags);
> +	void (*dirty_inode) (struct inode *, int flags);
> +	void (*lazytime_expired)(struct inode *);
>  	int (*write_inode) (struct inode *, struct writeback_control *wbc);
>  	int (*drop_inode) (struct inode *);
>  	void (*evict_inode) (struct inode *);
> @@ -2108,6 +2109,9 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
>   *			(I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set.  I.e.
>   *			either I_DIRTY_TIME *or* I_DIRTY_INODE can be set in
>   *			i_state, but not both.  I_DIRTY_PAGES may still be set.
> + * I_DIRTY_TIME_EXPIRED Passed to __mark_inode_dirty() to indicate the intent to
> + *			expire the inode's timestamps.  Not stored in i_state.
> + *
>   * I_NEW		Serves as both a mutex and completion notification.
>   *			New inodes set I_NEW.  If two processes both create
>   *			the same inode, one of them will release its inode and
> @@ -2173,6 +2177,7 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
>  #define I_DIO_WAKEUP		(1 << __I_DIO_WAKEUP)
>  #define I_LINKABLE		(1 << 10)
>  #define I_DIRTY_TIME		(1 << 11)
> +#define I_DIRTY_TIME_EXPIRED	(1 << 12)
>  #define I_WB_SWITCH		(1 << 13)
>  #define I_OVL_INUSE		(1 << 14)
>  #define I_CREATING		(1 << 15)
> -- 
> 2.30.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  reply index

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-05  0:54 [PATCH 00/13] lazytime fixes and cleanups Eric Biggers
2021-01-05  0:54 ` [PATCH 01/13] fs: avoid double-writing inodes on lazytime expiration Eric Biggers
2021-01-07 14:47   ` Jan Kara
2021-01-07 14:58     ` Matthew Wilcox
2021-01-07 21:46     ` Eric Biggers
2021-01-08  8:54       ` Christoph Hellwig
2021-01-08  9:01     ` Christoph Hellwig
2021-01-09 17:11       ` Eric Biggers
2021-01-05  0:54 ` [PATCH 02/13] gfs2: don't worry about I_DIRTY_TIME in gfs2_fsync() Eric Biggers
2021-01-08  8:54   ` Christoph Hellwig
2021-01-05  0:54 ` [PATCH 03/13] fs: only specify I_DIRTY_TIME when needed in generic_update_time() Eric Biggers
2021-01-08  8:57   ` Christoph Hellwig
2021-01-05  0:54 ` [PATCH 04/13] fat: only specify I_DIRTY_TIME when needed in fat_update_time() Eric Biggers
2021-01-07 13:13   ` Jan Kara
2021-01-07 19:10     ` Eric Biggers
2021-01-05  0:54 ` [PATCH 05/13] fs: don't call ->dirty_inode for lazytime timestamp updates Eric Biggers
2021-01-07 13:17   ` Jan Kara
2021-01-07 13:18     ` Jan Kara
2021-01-08  9:02   ` Christoph Hellwig
2021-01-05  0:54 ` [PATCH 06/13] fs: pass only I_DIRTY_INODE flags to ->dirty_inode Eric Biggers
2021-01-08  9:02   ` Christoph Hellwig
2021-01-05  0:54 ` [PATCH 07/13] fs: correctly document the inode dirty flags Eric Biggers
2021-01-08  9:03   ` Christoph Hellwig
2021-01-05  0:54 ` [PATCH 08/13] ext4: simplify i_state checks in __ext4_update_other_inode_time() Eric Biggers
2021-01-07 13:24   ` Jan Kara
2021-01-07 19:06     ` Eric Biggers
2021-01-05  0:54 ` [PATCH 09/13] fs: drop redundant checks from __writeback_single_inode() Eric Biggers
2021-01-08  9:12   ` Christoph Hellwig
2021-01-05  0:54 ` [PATCH 10/13] fs: clean up __mark_inode_dirty() a bit Eric Biggers
2021-01-08  9:10   ` Christoph Hellwig
2021-01-05  0:54 ` [PATCH 11/13] fs: add a lazytime_expired method Eric Biggers
2021-01-07 14:02   ` Jan Kara [this message]
2021-01-07 22:05     ` Eric Biggers
2021-01-08  9:14       ` Christoph Hellwig
2021-01-05  0:54 ` [PATCH 12/13] xfs: remove a stale comment from xfs_file_aio_write_checks() Eric Biggers
2021-01-08  9:15   ` Christoph Hellwig
2021-01-05  0:54 ` [PATCH 13/13] xfs: implement lazytime_expired Eric Biggers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210107140228.GF12990@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=ebiggers@kernel.org \
    --cc=hch@lst.de \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-f2fs-devel@lists.sourceforge.net \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-ext4 Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-ext4/0 linux-ext4/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-ext4 linux-ext4/ https://lore.kernel.org/linux-ext4 \
		linux-ext4@vger.kernel.org
	public-inbox-index linux-ext4

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-ext4


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git