linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4 v2] writeback: Lazytime handling fix and cleanups
@ 2020-06-11  8:11 Jan Kara
  2020-06-11  8:11 ` [PATCH 1/4] writeback: Protect inode->i_io_list with inode->i_lock Jan Kara
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Jan Kara @ 2020-06-11  8:11 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Martijn Coenen, Jan Kara

Hello,

this patch series fixes an issue with handling of lazy inode timestamp
writeback which could result in missed background writeback of an inode
or missed update of inode timestamps during sync(2). It also somewhat
simplifies the writeback code handling of lazy inode timestamp updates.

Changes since v1:
* Split off locking change for i_io_list manipulation to a separate patch
* Renamed older_than_this to dirtied_before
* Renamed __redirty_tail() to redirty_tail_locked()
* Other minor style fixes

								Honza

Previous versions:
Link: http://lore.kernel.org/r/20200601091202.31302-1-jack@suse.cz

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/4] writeback: Protect inode->i_io_list with inode->i_lock
  2020-06-11  8:11 [PATCH 0/4 v2] writeback: Lazytime handling fix and cleanups Jan Kara
@ 2020-06-11  8:11 ` Jan Kara
  2020-06-12  7:45   ` Martijn Coenen
  2020-06-15  6:27   ` Christoph Hellwig
  2020-06-11  8:11 ` [PATCH 2/4] writeback: Avoid skipping inode writeback Jan Kara
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 14+ messages in thread
From: Jan Kara @ 2020-06-11  8:11 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Martijn Coenen, Jan Kara

Currently, operations on inode->i_io_list are protected by
wb->list_lock. In the following patches we'll need to maintain
consistency between inode->i_state and inode->i_io_list so change the
code so that inode->i_lock protects also all inode's i_io_list handling.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index a605c3dddabc..ff0b18331590 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -144,6 +144,7 @@ static void inode_io_list_del_locked(struct inode *inode,
 				     struct bdi_writeback *wb)
 {
 	assert_spin_locked(&wb->list_lock);
+	assert_spin_locked(&inode->i_lock);
 
 	list_del_init(&inode->i_io_list);
 	wb_io_lists_depopulated(wb);
@@ -1122,7 +1123,9 @@ void inode_io_list_del(struct inode *inode)
 	struct bdi_writeback *wb;
 
 	wb = inode_to_wb_and_lock_list(inode);
+	spin_lock(&inode->i_lock);
 	inode_io_list_del_locked(inode, wb);
+	spin_unlock(&inode->i_lock);
 	spin_unlock(&wb->list_lock);
 }
 EXPORT_SYMBOL(inode_io_list_del);
@@ -1172,8 +1175,10 @@ void sb_clear_inode_writeback(struct inode *inode)
  * the case then the inode must have been redirtied while it was being written
  * out and we don't reset its dirtied_when.
  */
-static void redirty_tail(struct inode *inode, struct bdi_writeback *wb)
+static void redirty_tail_locked(struct inode *inode, struct bdi_writeback *wb)
 {
+	assert_spin_locked(&inode->i_lock);
+
 	if (!list_empty(&wb->b_dirty)) {
 		struct inode *tail;
 
@@ -1184,6 +1189,13 @@ static void redirty_tail(struct inode *inode, struct bdi_writeback *wb)
 	inode_io_list_move_locked(inode, wb, &wb->b_dirty);
 }
 
+static void redirty_tail(struct inode *inode, struct bdi_writeback *wb)
+{
+	spin_lock(&inode->i_lock);
+	redirty_tail_locked(inode, wb);
+	spin_unlock(&inode->i_lock);
+}
+
 /*
  * requeue inode for re-scanning after bdi->b_io list is exhausted.
  */
@@ -1394,7 +1406,7 @@ static void requeue_inode(struct inode *inode, struct bdi_writeback *wb,
 		 * writeback is not making progress due to locked
 		 * buffers. Skip this inode for now.
 		 */
-		redirty_tail(inode, wb);
+		redirty_tail_locked(inode, wb);
 		return;
 	}
 
@@ -1414,7 +1426,7 @@ static void requeue_inode(struct inode *inode, struct bdi_writeback *wb,
 			 * retrying writeback of the dirty page/inode
 			 * that cannot be performed immediately.
 			 */
-			redirty_tail(inode, wb);
+			redirty_tail_locked(inode, wb);
 		}
 	} else if (inode->i_state & I_DIRTY) {
 		/*
@@ -1422,7 +1434,7 @@ static void requeue_inode(struct inode *inode, struct bdi_writeback *wb,
 		 * such as delayed allocation during submission or metadata
 		 * updates after data IO completion.
 		 */
-		redirty_tail(inode, wb);
+		redirty_tail_locked(inode, wb);
 	} else if (inode->i_state & I_DIRTY_TIME) {
 		inode->dirtied_when = jiffies;
 		inode_io_list_move_locked(inode, wb, &wb->b_dirty_time);
@@ -1669,8 +1681,8 @@ static long writeback_sb_inodes(struct super_block *sb,
 		 */
 		spin_lock(&inode->i_lock);
 		if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
+			redirty_tail_locked(inode, wb);
 			spin_unlock(&inode->i_lock);
-			redirty_tail(inode, wb);
 			continue;
 		}
 		if ((inode->i_state & I_SYNC) && wbc.sync_mode != WB_SYNC_ALL) {
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/4] writeback: Avoid skipping inode writeback
  2020-06-11  8:11 [PATCH 0/4 v2] writeback: Lazytime handling fix and cleanups Jan Kara
  2020-06-11  8:11 ` [PATCH 1/4] writeback: Protect inode->i_io_list with inode->i_lock Jan Kara
@ 2020-06-11  8:11 ` Jan Kara
  2020-06-12  8:03   ` Martijn Coenen
  2020-06-15  6:27   ` Christoph Hellwig
  2020-06-11  8:11 ` [PATCH 3/4] writeback: Fix sync livelock due to b_dirty_time processing Jan Kara
  2020-06-11  8:11 ` [PATCH 4/4] writeback: Drop I_DIRTY_TIME_EXPIRE Jan Kara
  3 siblings, 2 replies; 14+ messages in thread
From: Jan Kara @ 2020-06-11  8:11 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Martijn Coenen, Jan Kara, stable

Inode's i_io_list list head is used to attach inode to several different
lists - wb->{b_dirty, b_dirty_time, b_io, b_more_io}. When flush worker
prepares a list of inodes to writeback e.g. for sync(2), it moves inodes
to b_io list. Thus it is critical for sync(2) data integrity guarantees
that inode is not requeued to any other writeback list when inode is
queued for processing by flush worker. That's the reason why
writeback_single_inode() does not touch i_io_list (unless the inode is
completely clean) and why __mark_inode_dirty() does not touch i_io_list
if I_SYNC flag is set.

However there are two flaws in the current logic:

1) When inode has only I_DIRTY_TIME set but it is already queued in b_io
list due to sync(2), concurrent __mark_inode_dirty(inode, I_DIRTY_SYNC)
can still move inode back to b_dirty list resulting in skipping
writeback of inode time stamps during sync(2).

2) When inode is on b_dirty_time list and writeback_single_inode() races
with __mark_inode_dirty() like:

writeback_single_inode()		__mark_inode_dirty(inode, I_DIRTY_PAGES)
  inode->i_state |= I_SYNC
  __writeback_single_inode()
					  inode->i_state |= I_DIRTY_PAGES;
					  if (inode->i_state & I_SYNC)
					    bail
  if (!(inode->i_state & I_DIRTY_ALL))
  - not true so nothing done

We end up with I_DIRTY_PAGES inode on b_dirty_time list and thus
standard background writeback will not writeback this inode leading to
possible dirty throttling stalls etc. (thanks to Martijn Coenen for this
analysis).

Fix these problems by tracking whether inode is queued in b_io or
b_more_io lists in a new I_SYNC_QUEUED flag. When this flag is set, we
know flush worker has queued inode and we should not touch i_io_list.
On the other hand we also know that once flush worker is done with the
inode it will requeue the inode to appropriate dirty list. When
I_SYNC_QUEUED is not set, __mark_inode_dirty() can (and must) move inode
to appropriate dirty list.

Reported-by: Martijn Coenen <maco@android.com>
Fixes: 0ae45f63d4ef ("vfs: add support for a lazytime mount option")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c  | 17 ++++++++++++-----
 include/linux/fs.h |  8 ++++++--
 2 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index ff0b18331590..f470c10641c5 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -146,6 +146,7 @@ static void inode_io_list_del_locked(struct inode *inode,
 	assert_spin_locked(&wb->list_lock);
 	assert_spin_locked(&inode->i_lock);
 
+	inode->i_state &= ~I_SYNC_QUEUED;
 	list_del_init(&inode->i_io_list);
 	wb_io_lists_depopulated(wb);
 }
@@ -1187,6 +1188,7 @@ static void redirty_tail_locked(struct inode *inode, struct bdi_writeback *wb)
 			inode->dirtied_when = jiffies;
 	}
 	inode_io_list_move_locked(inode, wb, &wb->b_dirty);
+	inode->i_state &= ~I_SYNC_QUEUED;
 }
 
 static void redirty_tail(struct inode *inode, struct bdi_writeback *wb)
@@ -1262,8 +1264,11 @@ static int move_expired_inodes(struct list_head *delaying_queue,
 			break;
 		list_move(&inode->i_io_list, &tmp);
 		moved++;
+		spin_lock(&inode->i_lock);
 		if (flags & EXPIRE_DIRTY_ATIME)
-			set_bit(__I_DIRTY_TIME_EXPIRED, &inode->i_state);
+			inode->i_state |= I_DIRTY_TIME_EXPIRED;
+		inode->i_state |= I_SYNC_QUEUED;
+		spin_unlock(&inode->i_lock);
 		if (sb_is_blkdev_sb(inode->i_sb))
 			continue;
 		if (sb && sb != inode->i_sb)
@@ -1438,6 +1443,7 @@ static void requeue_inode(struct inode *inode, struct bdi_writeback *wb,
 	} else if (inode->i_state & I_DIRTY_TIME) {
 		inode->dirtied_when = jiffies;
 		inode_io_list_move_locked(inode, wb, &wb->b_dirty_time);
+		inode->i_state &= ~I_SYNC_QUEUED;
 	} else {
 		/* The inode is clean. Remove from writeback lists. */
 		inode_io_list_del_locked(inode, wb);
@@ -2301,11 +2307,12 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 		inode->i_state |= flags;
 
 		/*
-		 * If the inode is being synced, just update its dirty state.
-		 * The unlocker will place the inode on the appropriate
-		 * superblock list, based upon its state.
+		 * If the inode is queued for writeback by flush worker, just
+		 * update its dirty state. Once the flush worker is done with
+		 * the inode it will place it on the appropriate superblock
+		 * list, based upon its state.
 		 */
-		if (inode->i_state & I_SYNC)
+		if (inode->i_state & I_SYNC_QUEUED)
 			goto out_unlock_inode;
 
 		/*
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 19ef6c88c152..48556efcdcf0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2157,6 +2157,10 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
  *
  * I_DONTCACHE		Evict inode as soon as it is not used anymore.
  *
+ * I_SYNC_QUEUED	Inode is queued in b_io or b_more_io writeback lists.
+ *			Used to detect that mark_inode_dirty() should not move
+ * 			inode between dirty lists.
+ *
  * Q: What is the difference between I_WILL_FREE and I_FREEING?
  */
 #define I_DIRTY_SYNC		(1 << 0)
@@ -2174,12 +2178,12 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
 #define I_DIO_WAKEUP		(1 << __I_DIO_WAKEUP)
 #define I_LINKABLE		(1 << 10)
 #define I_DIRTY_TIME		(1 << 11)
-#define __I_DIRTY_TIME_EXPIRED	12
-#define I_DIRTY_TIME_EXPIRED	(1 << __I_DIRTY_TIME_EXPIRED)
+#define I_DIRTY_TIME_EXPIRED	(1 << 12)
 #define I_WB_SWITCH		(1 << 13)
 #define I_OVL_INUSE		(1 << 14)
 #define I_CREATING		(1 << 15)
 #define I_DONTCACHE		(1 << 16)
+#define I_SYNC_QUEUED		(1 << 17)
 
 #define I_DIRTY_INODE (I_DIRTY_SYNC | I_DIRTY_DATASYNC)
 #define I_DIRTY (I_DIRTY_INODE | I_DIRTY_PAGES)
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/4] writeback: Fix sync livelock due to b_dirty_time processing
  2020-06-11  8:11 [PATCH 0/4 v2] writeback: Lazytime handling fix and cleanups Jan Kara
  2020-06-11  8:11 ` [PATCH 1/4] writeback: Protect inode->i_io_list with inode->i_lock Jan Kara
  2020-06-11  8:11 ` [PATCH 2/4] writeback: Avoid skipping inode writeback Jan Kara
@ 2020-06-11  8:11 ` Jan Kara
  2020-06-15  6:27   ` Christoph Hellwig
  2020-06-11  8:11 ` [PATCH 4/4] writeback: Drop I_DIRTY_TIME_EXPIRE Jan Kara
  3 siblings, 1 reply; 14+ messages in thread
From: Jan Kara @ 2020-06-11  8:11 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Martijn Coenen, Jan Kara, stable

When we are processing writeback for sync(2), move_expired_inodes()
didn't set any inode expiry value (older_than_this). This can result in
writeback never completing if there's steady stream of inodes added to
b_dirty_time list as writeback rechecks dirty lists after each writeback
round whether there's more work to be done. Fix the problem by using
sync(2) start time is inode expiry value when processing b_dirty_time
list similarly as for ordinarily dirtied inodes. This requires some
refactoring of older_than_this handling which simplifies the code
noticeably as a bonus.

Fixes: 0ae45f63d4ef ("vfs: add support for a lazytime mount option")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c                | 44 ++++++++++++++++------------------------
 include/trace/events/writeback.h | 13 ++++++------
 2 files changed, 23 insertions(+), 34 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index f470c10641c5..ae17d64a3e18 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -42,7 +42,6 @@
 struct wb_writeback_work {
 	long nr_pages;
 	struct super_block *sb;
-	unsigned long *older_than_this;
 	enum writeback_sync_modes sync_mode;
 	unsigned int tagged_writepages:1;
 	unsigned int for_kupdate:1;
@@ -1234,16 +1233,13 @@ static bool inode_dirtied_after(struct inode *inode, unsigned long t)
 #define EXPIRE_DIRTY_ATIME 0x0001
 
 /*
- * Move expired (dirtied before work->older_than_this) dirty inodes from
+ * Move expired (dirtied before dirtied_before) dirty inodes from
  * @delaying_queue to @dispatch_queue.
  */
 static int move_expired_inodes(struct list_head *delaying_queue,
 			       struct list_head *dispatch_queue,
-			       int flags,
-			       struct wb_writeback_work *work)
+			       int flags, unsigned long dirtied_before)
 {
-	unsigned long *older_than_this = NULL;
-	unsigned long expire_time;
 	LIST_HEAD(tmp);
 	struct list_head *pos, *node;
 	struct super_block *sb = NULL;
@@ -1251,16 +1247,9 @@ static int move_expired_inodes(struct list_head *delaying_queue,
 	int do_sb_sort = 0;
 	int moved = 0;
 
-	if ((flags & EXPIRE_DIRTY_ATIME) == 0)
-		older_than_this = work->older_than_this;
-	else if (!work->for_sync) {
-		expire_time = jiffies - (dirtytime_expire_interval * HZ);
-		older_than_this = &expire_time;
-	}
 	while (!list_empty(delaying_queue)) {
 		inode = wb_inode(delaying_queue->prev);
-		if (older_than_this &&
-		    inode_dirtied_after(inode, *older_than_this))
+		if (inode_dirtied_after(inode, dirtied_before))
 			break;
 		list_move(&inode->i_io_list, &tmp);
 		moved++;
@@ -1306,18 +1295,22 @@ static int move_expired_inodes(struct list_head *delaying_queue,
  *                                           |
  *                                           +--> dequeue for IO
  */
-static void queue_io(struct bdi_writeback *wb, struct wb_writeback_work *work)
+static void queue_io(struct bdi_writeback *wb, struct wb_writeback_work *work,
+		     unsigned long dirtied_before)
 {
 	int moved;
+	unsigned long time_expire_jif = dirtied_before;
 
 	assert_spin_locked(&wb->list_lock);
 	list_splice_init(&wb->b_more_io, &wb->b_io);
-	moved = move_expired_inodes(&wb->b_dirty, &wb->b_io, 0, work);
+	moved = move_expired_inodes(&wb->b_dirty, &wb->b_io, 0, dirtied_before);
+	if (!work->for_sync)
+		time_expire_jif = jiffies - dirtytime_expire_interval * HZ;
 	moved += move_expired_inodes(&wb->b_dirty_time, &wb->b_io,
-				     EXPIRE_DIRTY_ATIME, work);
+				     EXPIRE_DIRTY_ATIME, time_expire_jif);
 	if (moved)
 		wb_io_lists_populated(wb);
-	trace_writeback_queue_io(wb, work, moved);
+	trace_writeback_queue_io(wb, work, dirtied_before, moved);
 }
 
 static int write_inode(struct inode *inode, struct writeback_control *wbc)
@@ -1829,7 +1822,7 @@ static long writeback_inodes_wb(struct bdi_writeback *wb, long nr_pages,
 	blk_start_plug(&plug);
 	spin_lock(&wb->list_lock);
 	if (list_empty(&wb->b_io))
-		queue_io(wb, &work);
+		queue_io(wb, &work, jiffies);
 	__writeback_inodes_wb(wb, &work);
 	spin_unlock(&wb->list_lock);
 	blk_finish_plug(&plug);
@@ -1849,7 +1842,7 @@ static long writeback_inodes_wb(struct bdi_writeback *wb, long nr_pages,
  * takes longer than a dirty_writeback_interval interval, then leave a
  * one-second gap.
  *
- * older_than_this takes precedence over nr_to_write.  So we'll only write back
+ * dirtied_before takes precedence over nr_to_write.  So we'll only write back
  * all dirty pages if they are all attached to "old" mappings.
  */
 static long wb_writeback(struct bdi_writeback *wb,
@@ -1857,14 +1850,11 @@ static long wb_writeback(struct bdi_writeback *wb,
 {
 	unsigned long wb_start = jiffies;
 	long nr_pages = work->nr_pages;
-	unsigned long oldest_jif;
+	unsigned long dirtied_before = jiffies;
 	struct inode *inode;
 	long progress;
 	struct blk_plug plug;
 
-	oldest_jif = jiffies;
-	work->older_than_this = &oldest_jif;
-
 	blk_start_plug(&plug);
 	spin_lock(&wb->list_lock);
 	for (;;) {
@@ -1898,14 +1888,14 @@ static long wb_writeback(struct bdi_writeback *wb,
 		 * safe.
 		 */
 		if (work->for_kupdate) {
-			oldest_jif = jiffies -
+			dirtied_before = jiffies -
 				msecs_to_jiffies(dirty_expire_interval * 10);
 		} else if (work->for_background)
-			oldest_jif = jiffies;
+			dirtied_before = jiffies;
 
 		trace_writeback_start(wb, work);
 		if (list_empty(&wb->b_io))
-			queue_io(wb, work);
+			queue_io(wb, work, dirtied_before);
 		if (work->sb)
 			progress = writeback_sb_inodes(work->sb, wb, work);
 		else
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index 10f5d1fa7347..7565dcd59697 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -498,8 +498,9 @@ DEFINE_WBC_EVENT(wbc_writepage);
 TRACE_EVENT(writeback_queue_io,
 	TP_PROTO(struct bdi_writeback *wb,
 		 struct wb_writeback_work *work,
+		 unsigned long dirtied_before,
 		 int moved),
-	TP_ARGS(wb, work, moved),
+	TP_ARGS(wb, work, dirtied_before, moved),
 	TP_STRUCT__entry(
 		__array(char,		name, 32)
 		__field(unsigned long,	older)
@@ -509,19 +510,17 @@ TRACE_EVENT(writeback_queue_io,
 		__field(ino_t,		cgroup_ino)
 	),
 	TP_fast_assign(
-		unsigned long *older_than_this = work->older_than_this;
 		strscpy_pad(__entry->name, bdi_dev_name(wb->bdi), 32);
-		__entry->older	= older_than_this ?  *older_than_this : 0;
-		__entry->age	= older_than_this ?
-				  (jiffies - *older_than_this) * 1000 / HZ : -1;
+		__entry->older	= dirtied_before;
+		__entry->age	= (jiffies - dirtied_before) * 1000 / HZ;
 		__entry->moved	= moved;
 		__entry->reason	= work->reason;
 		__entry->cgroup_ino	= __trace_wb_assign_cgroup(wb);
 	),
 	TP_printk("bdi %s: older=%lu age=%ld enqueue=%d reason=%s cgroup_ino=%lu",
 		__entry->name,
-		__entry->older,	/* older_than_this in jiffies */
-		__entry->age,	/* older_than_this in relative milliseconds */
+		__entry->older,	/* dirtied_before in jiffies */
+		__entry->age,	/* dirtied_before in relative milliseconds */
 		__entry->moved,
 		__print_symbolic(__entry->reason, WB_WORK_REASON),
 		(unsigned long)__entry->cgroup_ino
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/4] writeback: Drop I_DIRTY_TIME_EXPIRE
  2020-06-11  8:11 [PATCH 0/4 v2] writeback: Lazytime handling fix and cleanups Jan Kara
                   ` (2 preceding siblings ...)
  2020-06-11  8:11 ` [PATCH 3/4] writeback: Fix sync livelock due to b_dirty_time processing Jan Kara
@ 2020-06-11  8:11 ` Jan Kara
  2020-09-02 17:20   ` Darrick J. Wong
  3 siblings, 1 reply; 14+ messages in thread
From: Jan Kara @ 2020-06-11  8:11 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Martijn Coenen, Jan Kara

The only use of I_DIRTY_TIME_EXPIRE is to detect in
__writeback_single_inode() that inode got there because flush worker
decided it's time to writeback the dirty inode time stamps (either
because we are syncing or because of age). However we can detect this
directly in __writeback_single_inode() and there's no need for the
strange propagation with I_DIRTY_TIME_EXPIRE flag.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c                  |  2 +-
 fs/fs-writeback.c                | 28 +++++++++++-----------------
 fs/xfs/libxfs/xfs_trans_inode.c  |  4 ++--
 include/linux/fs.h               |  1 -
 include/trace/events/writeback.h |  1 -
 5 files changed, 14 insertions(+), 22 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 40ec5c7ef0d3..4db497f02ffb 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4887,7 +4887,7 @@ static void __ext4_update_other_inode_time(struct super_block *sb,
 	    (inode->i_state & I_DIRTY_TIME)) {
 		struct ext4_inode_info	*ei = EXT4_I(inode);
 
-		inode->i_state &= ~(I_DIRTY_TIME | I_DIRTY_TIME_EXPIRED);
+		inode->i_state &= ~I_DIRTY_TIME;
 		spin_unlock(&inode->i_lock);
 
 		spin_lock(&ei->i_raw_lock);
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index ae17d64a3e18..149227160ff0 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1238,7 +1238,7 @@ static bool inode_dirtied_after(struct inode *inode, unsigned long t)
  */
 static int move_expired_inodes(struct list_head *delaying_queue,
 			       struct list_head *dispatch_queue,
-			       int flags, unsigned long dirtied_before)
+			       unsigned long dirtied_before)
 {
 	LIST_HEAD(tmp);
 	struct list_head *pos, *node;
@@ -1254,8 +1254,6 @@ static int move_expired_inodes(struct list_head *delaying_queue,
 		list_move(&inode->i_io_list, &tmp);
 		moved++;
 		spin_lock(&inode->i_lock);
-		if (flags & EXPIRE_DIRTY_ATIME)
-			inode->i_state |= I_DIRTY_TIME_EXPIRED;
 		inode->i_state |= I_SYNC_QUEUED;
 		spin_unlock(&inode->i_lock);
 		if (sb_is_blkdev_sb(inode->i_sb))
@@ -1303,11 +1301,11 @@ static void queue_io(struct bdi_writeback *wb, struct wb_writeback_work *work,
 
 	assert_spin_locked(&wb->list_lock);
 	list_splice_init(&wb->b_more_io, &wb->b_io);
-	moved = move_expired_inodes(&wb->b_dirty, &wb->b_io, 0, dirtied_before);
+	moved = move_expired_inodes(&wb->b_dirty, &wb->b_io, dirtied_before);
 	if (!work->for_sync)
 		time_expire_jif = jiffies - dirtytime_expire_interval * HZ;
 	moved += move_expired_inodes(&wb->b_dirty_time, &wb->b_io,
-				     EXPIRE_DIRTY_ATIME, time_expire_jif);
+				     time_expire_jif);
 	if (moved)
 		wb_io_lists_populated(wb);
 	trace_writeback_queue_io(wb, work, dirtied_before, moved);
@@ -1483,18 +1481,14 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
 	spin_lock(&inode->i_lock);
 
 	dirty = inode->i_state & I_DIRTY;
-	if (inode->i_state & I_DIRTY_TIME) {
-		if ((dirty & I_DIRTY_INODE) ||
-		    wbc->sync_mode == WB_SYNC_ALL ||
-		    unlikely(inode->i_state & I_DIRTY_TIME_EXPIRED) ||
-		    unlikely(time_after(jiffies,
-					(inode->dirtied_time_when +
-					 dirtytime_expire_interval * HZ)))) {
-			dirty |= I_DIRTY_TIME | I_DIRTY_TIME_EXPIRED;
-			trace_writeback_lazytime(inode);
-		}
-	} else
-		inode->i_state &= ~I_DIRTY_TIME_EXPIRED;
+	if ((inode->i_state & I_DIRTY_TIME) &&
+	    ((dirty & I_DIRTY_INODE) ||
+	     wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync ||
+	     time_after(jiffies, inode->dirtied_time_when +
+			dirtytime_expire_interval * HZ))) {
+		dirty |= I_DIRTY_TIME;
+		trace_writeback_lazytime(inode);
+	}
 	inode->i_state &= ~dirty;
 
 	/*
diff --git a/fs/xfs/libxfs/xfs_trans_inode.c b/fs/xfs/libxfs/xfs_trans_inode.c
index b5dfb6654842..1b4df6636944 100644
--- a/fs/xfs/libxfs/xfs_trans_inode.c
+++ b/fs/xfs/libxfs/xfs_trans_inode.c
@@ -96,9 +96,9 @@ xfs_trans_log_inode(
 	 * to log the timestamps, or will clear already cleared fields in the
 	 * worst case.
 	 */
-	if (inode->i_state & (I_DIRTY_TIME | I_DIRTY_TIME_EXPIRED)) {
+	if (inode->i_state & I_DIRTY_TIME) {
 		spin_lock(&inode->i_lock);
-		inode->i_state &= ~(I_DIRTY_TIME | I_DIRTY_TIME_EXPIRED);
+		inode->i_state &= ~I_DIRTY_TIME;
 		spin_unlock(&inode->i_lock);
 	}
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 48556efcdcf0..45eadf5bea5d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2178,7 +2178,6 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
 #define I_DIO_WAKEUP		(1 << __I_DIO_WAKEUP)
 #define I_LINKABLE		(1 << 10)
 #define I_DIRTY_TIME		(1 << 11)
-#define I_DIRTY_TIME_EXPIRED	(1 << 12)
 #define I_WB_SWITCH		(1 << 13)
 #define I_OVL_INUSE		(1 << 14)
 #define I_CREATING		(1 << 15)
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index 7565dcd59697..e7cbccc7c14c 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -20,7 +20,6 @@
 		{I_CLEAR,		"I_CLEAR"},		\
 		{I_SYNC,		"I_SYNC"},		\
 		{I_DIRTY_TIME,		"I_DIRTY_TIME"},	\
-		{I_DIRTY_TIME_EXPIRED,	"I_DIRTY_TIME_EXPIRED"}, \
 		{I_REFERENCED,		"I_REFERENCED"}		\
 	)
 
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] writeback: Protect inode->i_io_list with inode->i_lock
  2020-06-11  8:11 ` [PATCH 1/4] writeback: Protect inode->i_io_list with inode->i_lock Jan Kara
@ 2020-06-12  7:45   ` Martijn Coenen
  2020-06-15  6:27   ` Christoph Hellwig
  1 sibling, 0 replies; 14+ messages in thread
From: Martijn Coenen @ 2020-06-12  7:45 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Christoph Hellwig

Hi Jan,

On Thu, Jun 11, 2020 at 10:12 AM Jan Kara <jack@suse.cz> wrote:
>
> Currently, operations on inode->i_io_list are protected by
> wb->list_lock. In the following patches we'll need to maintain
> consistency between inode->i_state and inode->i_io_list so change the
> code so that inode->i_lock protects also all inode's i_io_list handling.
>
> Signed-off-by: Jan Kara <jack@suse.cz>

LGTM.

Reviewed-by: Martijn Coenen <maco@android.com>

> ---
>  fs/fs-writeback.c | 22 +++++++++++++++++-----
>  1 file changed, 17 insertions(+), 5 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index a605c3dddabc..ff0b18331590 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -144,6 +144,7 @@ static void inode_io_list_del_locked(struct inode *inode,
>                                      struct bdi_writeback *wb)
>  {
>         assert_spin_locked(&wb->list_lock);
> +       assert_spin_locked(&inode->i_lock);
>
>         list_del_init(&inode->i_io_list);
>         wb_io_lists_depopulated(wb);
> @@ -1122,7 +1123,9 @@ void inode_io_list_del(struct inode *inode)
>         struct bdi_writeback *wb;
>
>         wb = inode_to_wb_and_lock_list(inode);
> +       spin_lock(&inode->i_lock);
>         inode_io_list_del_locked(inode, wb);
> +       spin_unlock(&inode->i_lock);
>         spin_unlock(&wb->list_lock);
>  }
>  EXPORT_SYMBOL(inode_io_list_del);
> @@ -1172,8 +1175,10 @@ void sb_clear_inode_writeback(struct inode *inode)
>   * the case then the inode must have been redirtied while it was being written
>   * out and we don't reset its dirtied_when.
>   */
> -static void redirty_tail(struct inode *inode, struct bdi_writeback *wb)
> +static void redirty_tail_locked(struct inode *inode, struct bdi_writeback *wb)
>  {
> +       assert_spin_locked(&inode->i_lock);
> +
>         if (!list_empty(&wb->b_dirty)) {
>                 struct inode *tail;
>
> @@ -1184,6 +1189,13 @@ static void redirty_tail(struct inode *inode, struct bdi_writeback *wb)
>         inode_io_list_move_locked(inode, wb, &wb->b_dirty);
>  }
>
> +static void redirty_tail(struct inode *inode, struct bdi_writeback *wb)
> +{
> +       spin_lock(&inode->i_lock);
> +       redirty_tail_locked(inode, wb);
> +       spin_unlock(&inode->i_lock);
> +}
> +
>  /*
>   * requeue inode for re-scanning after bdi->b_io list is exhausted.
>   */
> @@ -1394,7 +1406,7 @@ static void requeue_inode(struct inode *inode, struct bdi_writeback *wb,
>                  * writeback is not making progress due to locked
>                  * buffers. Skip this inode for now.
>                  */
> -               redirty_tail(inode, wb);
> +               redirty_tail_locked(inode, wb);
>                 return;
>         }
>
> @@ -1414,7 +1426,7 @@ static void requeue_inode(struct inode *inode, struct bdi_writeback *wb,
>                          * retrying writeback of the dirty page/inode
>                          * that cannot be performed immediately.
>                          */
> -                       redirty_tail(inode, wb);
> +                       redirty_tail_locked(inode, wb);
>                 }
>         } else if (inode->i_state & I_DIRTY) {
>                 /*
> @@ -1422,7 +1434,7 @@ static void requeue_inode(struct inode *inode, struct bdi_writeback *wb,
>                  * such as delayed allocation during submission or metadata
>                  * updates after data IO completion.
>                  */
> -               redirty_tail(inode, wb);
> +               redirty_tail_locked(inode, wb);
>         } else if (inode->i_state & I_DIRTY_TIME) {
>                 inode->dirtied_when = jiffies;
>                 inode_io_list_move_locked(inode, wb, &wb->b_dirty_time);
> @@ -1669,8 +1681,8 @@ static long writeback_sb_inodes(struct super_block *sb,
>                  */
>                 spin_lock(&inode->i_lock);
>                 if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
> +                       redirty_tail_locked(inode, wb);
>                         spin_unlock(&inode->i_lock);
> -                       redirty_tail(inode, wb);
>                         continue;
>                 }
>                 if ((inode->i_state & I_SYNC) && wbc.sync_mode != WB_SYNC_ALL) {
> --
> 2.16.4
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/4] writeback: Avoid skipping inode writeback
  2020-06-11  8:11 ` [PATCH 2/4] writeback: Avoid skipping inode writeback Jan Kara
@ 2020-06-12  8:03   ` Martijn Coenen
  2020-06-15  7:15     ` Jan Kara
  2020-06-15  6:27   ` Christoph Hellwig
  1 sibling, 1 reply; 14+ messages in thread
From: Martijn Coenen @ 2020-06-12  8:03 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Christoph Hellwig, stable

Hi Jan,

On Thu, Jun 11, 2020 at 10:12 AM Jan Kara <jack@suse.cz> wrote:
> Reported-by: Martijn Coenen <maco@android.com>
> Fixes: 0ae45f63d4ef ("vfs: add support for a lazytime mount option")
> CC: stable@vger.kernel.org
> Signed-off-by: Jan Kara <jack@suse.cz>

Thanks again for the fix. We've been running this (well, v1) for over
2 weeks across at least ~1000 Android devices with different kernel
versions, and I can confirm we haven't run into the issue this intends
to fix, or any other issues for that matter. The patch LGTM as well.

Reviewed-by: Martijn Coenen <maco@android.com>
Tested-by: Martijn Coenen <maco@android.com>

> ---
>  fs/fs-writeback.c  | 17 ++++++++++++-----
>  include/linux/fs.h |  8 ++++++--
>  2 files changed, 18 insertions(+), 7 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index ff0b18331590..f470c10641c5 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -146,6 +146,7 @@ static void inode_io_list_del_locked(struct inode *inode,
>         assert_spin_locked(&wb->list_lock);
>         assert_spin_locked(&inode->i_lock);
>
> +       inode->i_state &= ~I_SYNC_QUEUED;
>         list_del_init(&inode->i_io_list);
>         wb_io_lists_depopulated(wb);
>  }
> @@ -1187,6 +1188,7 @@ static void redirty_tail_locked(struct inode *inode, struct bdi_writeback *wb)
>                         inode->dirtied_when = jiffies;
>         }
>         inode_io_list_move_locked(inode, wb, &wb->b_dirty);
> +       inode->i_state &= ~I_SYNC_QUEUED;
>  }
>
>  static void redirty_tail(struct inode *inode, struct bdi_writeback *wb)
> @@ -1262,8 +1264,11 @@ static int move_expired_inodes(struct list_head *delaying_queue,
>                         break;
>                 list_move(&inode->i_io_list, &tmp);
>                 moved++;
> +               spin_lock(&inode->i_lock);
>                 if (flags & EXPIRE_DIRTY_ATIME)
> -                       set_bit(__I_DIRTY_TIME_EXPIRED, &inode->i_state);
> +                       inode->i_state |= I_DIRTY_TIME_EXPIRED;
> +               inode->i_state |= I_SYNC_QUEUED;
> +               spin_unlock(&inode->i_lock);
>                 if (sb_is_blkdev_sb(inode->i_sb))
>                         continue;
>                 if (sb && sb != inode->i_sb)
> @@ -1438,6 +1443,7 @@ static void requeue_inode(struct inode *inode, struct bdi_writeback *wb,
>         } else if (inode->i_state & I_DIRTY_TIME) {
>                 inode->dirtied_when = jiffies;
>                 inode_io_list_move_locked(inode, wb, &wb->b_dirty_time);
> +               inode->i_state &= ~I_SYNC_QUEUED;
>         } else {
>                 /* The inode is clean. Remove from writeback lists. */
>                 inode_io_list_del_locked(inode, wb);
> @@ -2301,11 +2307,12 @@ void __mark_inode_dirty(struct inode *inode, int flags)
>                 inode->i_state |= flags;
>
>                 /*
> -                * If the inode is being synced, just update its dirty state.
> -                * The unlocker will place the inode on the appropriate
> -                * superblock list, based upon its state.
> +                * If the inode is queued for writeback by flush worker, just
> +                * update its dirty state. Once the flush worker is done with
> +                * the inode it will place it on the appropriate superblock
> +                * list, based upon its state.
>                  */
> -               if (inode->i_state & I_SYNC)
> +               if (inode->i_state & I_SYNC_QUEUED)
>                         goto out_unlock_inode;
>
>                 /*
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 19ef6c88c152..48556efcdcf0 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2157,6 +2157,10 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
>   *
>   * I_DONTCACHE         Evict inode as soon as it is not used anymore.
>   *
> + * I_SYNC_QUEUED       Inode is queued in b_io or b_more_io writeback lists.
> + *                     Used to detect that mark_inode_dirty() should not move
> + *                     inode between dirty lists.
> + *
>   * Q: What is the difference between I_WILL_FREE and I_FREEING?
>   */
>  #define I_DIRTY_SYNC           (1 << 0)
> @@ -2174,12 +2178,12 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
>  #define I_DIO_WAKEUP           (1 << __I_DIO_WAKEUP)
>  #define I_LINKABLE             (1 << 10)
>  #define I_DIRTY_TIME           (1 << 11)
> -#define __I_DIRTY_TIME_EXPIRED 12
> -#define I_DIRTY_TIME_EXPIRED   (1 << __I_DIRTY_TIME_EXPIRED)
> +#define I_DIRTY_TIME_EXPIRED   (1 << 12)
>  #define I_WB_SWITCH            (1 << 13)
>  #define I_OVL_INUSE            (1 << 14)
>  #define I_CREATING             (1 << 15)
>  #define I_DONTCACHE            (1 << 16)
> +#define I_SYNC_QUEUED          (1 << 17)
>
>  #define I_DIRTY_INODE (I_DIRTY_SYNC | I_DIRTY_DATASYNC)
>  #define I_DIRTY (I_DIRTY_INODE | I_DIRTY_PAGES)
> --
> 2.16.4
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] writeback: Protect inode->i_io_list with inode->i_lock
  2020-06-11  8:11 ` [PATCH 1/4] writeback: Protect inode->i_io_list with inode->i_lock Jan Kara
  2020-06-12  7:45   ` Martijn Coenen
@ 2020-06-15  6:27   ` Christoph Hellwig
  1 sibling, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2020-06-15  6:27 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Christoph Hellwig, Martijn Coenen

On Thu, Jun 11, 2020 at 10:11:52AM +0200, Jan Kara wrote:
> Currently, operations on inode->i_io_list are protected by
> wb->list_lock. In the following patches we'll need to maintain
> consistency between inode->i_state and inode->i_io_list so change the
> code so that inode->i_lock protects also all inode's i_io_list handling.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/4] writeback: Avoid skipping inode writeback
  2020-06-11  8:11 ` [PATCH 2/4] writeback: Avoid skipping inode writeback Jan Kara
  2020-06-12  8:03   ` Martijn Coenen
@ 2020-06-15  6:27   ` Christoph Hellwig
  2020-06-15  7:15     ` Jan Kara
  1 sibling, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2020-06-15  6:27 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Christoph Hellwig, Martijn Coenen, stable

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/4] writeback: Fix sync livelock due to b_dirty_time processing
  2020-06-11  8:11 ` [PATCH 3/4] writeback: Fix sync livelock due to b_dirty_time processing Jan Kara
@ 2020-06-15  6:27   ` Christoph Hellwig
  0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2020-06-15  6:27 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Christoph Hellwig, Martijn Coenen, stable

On Thu, Jun 11, 2020 at 10:11:54AM +0200, Jan Kara wrote:
> When we are processing writeback for sync(2), move_expired_inodes()
> didn't set any inode expiry value (older_than_this). This can result in
> writeback never completing if there's steady stream of inodes added to
> b_dirty_time list as writeback rechecks dirty lists after each writeback
> round whether there's more work to be done. Fix the problem by using
> sync(2) start time is inode expiry value when processing b_dirty_time
> list similarly as for ordinarily dirtied inodes. This requires some
> refactoring of older_than_this handling which simplifies the code
> noticeably as a bonus.

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/4] writeback: Avoid skipping inode writeback
  2020-06-12  8:03   ` Martijn Coenen
@ 2020-06-15  7:15     ` Jan Kara
  0 siblings, 0 replies; 14+ messages in thread
From: Jan Kara @ 2020-06-15  7:15 UTC (permalink / raw)
  To: Martijn Coenen; +Cc: Jan Kara, linux-fsdevel, Christoph Hellwig, stable

On Fri 12-06-20 10:03:18, Martijn Coenen wrote:
> Hi Jan,
> 
> On Thu, Jun 11, 2020 at 10:12 AM Jan Kara <jack@suse.cz> wrote:
> > Reported-by: Martijn Coenen <maco@android.com>
> > Fixes: 0ae45f63d4ef ("vfs: add support for a lazytime mount option")
> > CC: stable@vger.kernel.org
> > Signed-off-by: Jan Kara <jack@suse.cz>
> 
> Thanks again for the fix. We've been running this (well, v1) for over
> 2 weeks across at least ~1000 Android devices with different kernel
> versions, and I can confirm we haven't run into the issue this intends
> to fix, or any other issues for that matter. The patch LGTM as well.
> 
> Reviewed-by: Martijn Coenen <maco@android.com>
> Tested-by: Martijn Coenen <maco@android.com>

Thanks for testing and review! I'll queue the patches in my tree and push
them to Linus later this week.

								Honza


> 
> > ---
> >  fs/fs-writeback.c  | 17 ++++++++++++-----
> >  include/linux/fs.h |  8 ++++++--
> >  2 files changed, 18 insertions(+), 7 deletions(-)
> >
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index ff0b18331590..f470c10641c5 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -146,6 +146,7 @@ static void inode_io_list_del_locked(struct inode *inode,
> >         assert_spin_locked(&wb->list_lock);
> >         assert_spin_locked(&inode->i_lock);
> >
> > +       inode->i_state &= ~I_SYNC_QUEUED;
> >         list_del_init(&inode->i_io_list);
> >         wb_io_lists_depopulated(wb);
> >  }
> > @@ -1187,6 +1188,7 @@ static void redirty_tail_locked(struct inode *inode, struct bdi_writeback *wb)
> >                         inode->dirtied_when = jiffies;
> >         }
> >         inode_io_list_move_locked(inode, wb, &wb->b_dirty);
> > +       inode->i_state &= ~I_SYNC_QUEUED;
> >  }
> >
> >  static void redirty_tail(struct inode *inode, struct bdi_writeback *wb)
> > @@ -1262,8 +1264,11 @@ static int move_expired_inodes(struct list_head *delaying_queue,
> >                         break;
> >                 list_move(&inode->i_io_list, &tmp);
> >                 moved++;
> > +               spin_lock(&inode->i_lock);
> >                 if (flags & EXPIRE_DIRTY_ATIME)
> > -                       set_bit(__I_DIRTY_TIME_EXPIRED, &inode->i_state);
> > +                       inode->i_state |= I_DIRTY_TIME_EXPIRED;
> > +               inode->i_state |= I_SYNC_QUEUED;
> > +               spin_unlock(&inode->i_lock);
> >                 if (sb_is_blkdev_sb(inode->i_sb))
> >                         continue;
> >                 if (sb && sb != inode->i_sb)
> > @@ -1438,6 +1443,7 @@ static void requeue_inode(struct inode *inode, struct bdi_writeback *wb,
> >         } else if (inode->i_state & I_DIRTY_TIME) {
> >                 inode->dirtied_when = jiffies;
> >                 inode_io_list_move_locked(inode, wb, &wb->b_dirty_time);
> > +               inode->i_state &= ~I_SYNC_QUEUED;
> >         } else {
> >                 /* The inode is clean. Remove from writeback lists. */
> >                 inode_io_list_del_locked(inode, wb);
> > @@ -2301,11 +2307,12 @@ void __mark_inode_dirty(struct inode *inode, int flags)
> >                 inode->i_state |= flags;
> >
> >                 /*
> > -                * If the inode is being synced, just update its dirty state.
> > -                * The unlocker will place the inode on the appropriate
> > -                * superblock list, based upon its state.
> > +                * If the inode is queued for writeback by flush worker, just
> > +                * update its dirty state. Once the flush worker is done with
> > +                * the inode it will place it on the appropriate superblock
> > +                * list, based upon its state.
> >                  */
> > -               if (inode->i_state & I_SYNC)
> > +               if (inode->i_state & I_SYNC_QUEUED)
> >                         goto out_unlock_inode;
> >
> >                 /*
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 19ef6c88c152..48556efcdcf0 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -2157,6 +2157,10 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
> >   *
> >   * I_DONTCACHE         Evict inode as soon as it is not used anymore.
> >   *
> > + * I_SYNC_QUEUED       Inode is queued in b_io or b_more_io writeback lists.
> > + *                     Used to detect that mark_inode_dirty() should not move
> > + *                     inode between dirty lists.
> > + *
> >   * Q: What is the difference between I_WILL_FREE and I_FREEING?
> >   */
> >  #define I_DIRTY_SYNC           (1 << 0)
> > @@ -2174,12 +2178,12 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
> >  #define I_DIO_WAKEUP           (1 << __I_DIO_WAKEUP)
> >  #define I_LINKABLE             (1 << 10)
> >  #define I_DIRTY_TIME           (1 << 11)
> > -#define __I_DIRTY_TIME_EXPIRED 12
> > -#define I_DIRTY_TIME_EXPIRED   (1 << __I_DIRTY_TIME_EXPIRED)
> > +#define I_DIRTY_TIME_EXPIRED   (1 << 12)
> >  #define I_WB_SWITCH            (1 << 13)
> >  #define I_OVL_INUSE            (1 << 14)
> >  #define I_CREATING             (1 << 15)
> >  #define I_DONTCACHE            (1 << 16)
> > +#define I_SYNC_QUEUED          (1 << 17)
> >
> >  #define I_DIRTY_INODE (I_DIRTY_SYNC | I_DIRTY_DATASYNC)
> >  #define I_DIRTY (I_DIRTY_INODE | I_DIRTY_PAGES)
> > --
> > 2.16.4
> >
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/4] writeback: Avoid skipping inode writeback
  2020-06-15  6:27   ` Christoph Hellwig
@ 2020-06-15  7:15     ` Jan Kara
  0 siblings, 0 replies; 14+ messages in thread
From: Jan Kara @ 2020-06-15  7:15 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jan Kara, linux-fsdevel, Martijn Coenen, stable

On Sun 14-06-20 23:27:33, Christoph Hellwig wrote:
> Looks good,
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks for review!

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] writeback: Drop I_DIRTY_TIME_EXPIRE
  2020-06-11  8:11 ` [PATCH 4/4] writeback: Drop I_DIRTY_TIME_EXPIRE Jan Kara
@ 2020-09-02 17:20   ` Darrick J. Wong
  2020-09-03 10:10     ` Jan Kara
  0 siblings, 1 reply; 14+ messages in thread
From: Darrick J. Wong @ 2020-09-02 17:20 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Christoph Hellwig, Martijn Coenen, xfs, Eric Sandeen

[add linux-xfs and xfsprogs maintainer to cc]

On Thu, Jun 11, 2020 at 10:11:55AM +0200, Jan Kara wrote:
> The only use of I_DIRTY_TIME_EXPIRE is to detect in
> __writeback_single_inode() that inode got there because flush worker
> decided it's time to writeback the dirty inode time stamps (either
> because we are syncing or because of age). However we can detect this
> directly in __writeback_single_inode() and there's no need for the
> strange propagation with I_DIRTY_TIME_EXPIRE flag.
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/ext4/inode.c                  |  2 +-
>  fs/fs-writeback.c                | 28 +++++++++++-----------------
>  fs/xfs/libxfs/xfs_trans_inode.c  |  4 ++--

Urrk, so I only just noticed this when I rebased my development tree
onto 5.9-rc3.  If you're going to change things in fs/xfs/, please cc
the xfs list to keep us in the loop.  Changes to fs/xfs/libxfs/ have to
be ported to userspace.

--D

>  include/linux/fs.h               |  1 -
>  include/trace/events/writeback.h |  1 -
>  5 files changed, 14 insertions(+), 22 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 40ec5c7ef0d3..4db497f02ffb 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4887,7 +4887,7 @@ static void __ext4_update_other_inode_time(struct super_block *sb,
>  	    (inode->i_state & I_DIRTY_TIME)) {
>  		struct ext4_inode_info	*ei = EXT4_I(inode);
>  
> -		inode->i_state &= ~(I_DIRTY_TIME | I_DIRTY_TIME_EXPIRED);
> +		inode->i_state &= ~I_DIRTY_TIME;
>  		spin_unlock(&inode->i_lock);
>  
>  		spin_lock(&ei->i_raw_lock);
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index ae17d64a3e18..149227160ff0 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -1238,7 +1238,7 @@ static bool inode_dirtied_after(struct inode *inode, unsigned long t)
>   */
>  static int move_expired_inodes(struct list_head *delaying_queue,
>  			       struct list_head *dispatch_queue,
> -			       int flags, unsigned long dirtied_before)
> +			       unsigned long dirtied_before)
>  {
>  	LIST_HEAD(tmp);
>  	struct list_head *pos, *node;
> @@ -1254,8 +1254,6 @@ static int move_expired_inodes(struct list_head *delaying_queue,
>  		list_move(&inode->i_io_list, &tmp);
>  		moved++;
>  		spin_lock(&inode->i_lock);
> -		if (flags & EXPIRE_DIRTY_ATIME)
> -			inode->i_state |= I_DIRTY_TIME_EXPIRED;
>  		inode->i_state |= I_SYNC_QUEUED;
>  		spin_unlock(&inode->i_lock);
>  		if (sb_is_blkdev_sb(inode->i_sb))
> @@ -1303,11 +1301,11 @@ static void queue_io(struct bdi_writeback *wb, struct wb_writeback_work *work,
>  
>  	assert_spin_locked(&wb->list_lock);
>  	list_splice_init(&wb->b_more_io, &wb->b_io);
> -	moved = move_expired_inodes(&wb->b_dirty, &wb->b_io, 0, dirtied_before);
> +	moved = move_expired_inodes(&wb->b_dirty, &wb->b_io, dirtied_before);
>  	if (!work->for_sync)
>  		time_expire_jif = jiffies - dirtytime_expire_interval * HZ;
>  	moved += move_expired_inodes(&wb->b_dirty_time, &wb->b_io,
> -				     EXPIRE_DIRTY_ATIME, time_expire_jif);
> +				     time_expire_jif);
>  	if (moved)
>  		wb_io_lists_populated(wb);
>  	trace_writeback_queue_io(wb, work, dirtied_before, moved);
> @@ -1483,18 +1481,14 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
>  	spin_lock(&inode->i_lock);
>  
>  	dirty = inode->i_state & I_DIRTY;
> -	if (inode->i_state & I_DIRTY_TIME) {
> -		if ((dirty & I_DIRTY_INODE) ||
> -		    wbc->sync_mode == WB_SYNC_ALL ||
> -		    unlikely(inode->i_state & I_DIRTY_TIME_EXPIRED) ||
> -		    unlikely(time_after(jiffies,
> -					(inode->dirtied_time_when +
> -					 dirtytime_expire_interval * HZ)))) {
> -			dirty |= I_DIRTY_TIME | I_DIRTY_TIME_EXPIRED;
> -			trace_writeback_lazytime(inode);
> -		}
> -	} else
> -		inode->i_state &= ~I_DIRTY_TIME_EXPIRED;
> +	if ((inode->i_state & I_DIRTY_TIME) &&
> +	    ((dirty & I_DIRTY_INODE) ||
> +	     wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync ||
> +	     time_after(jiffies, inode->dirtied_time_when +
> +			dirtytime_expire_interval * HZ))) {
> +		dirty |= I_DIRTY_TIME;
> +		trace_writeback_lazytime(inode);
> +	}
>  	inode->i_state &= ~dirty;
>  
>  	/*
> diff --git a/fs/xfs/libxfs/xfs_trans_inode.c b/fs/xfs/libxfs/xfs_trans_inode.c
> index b5dfb6654842..1b4df6636944 100644
> --- a/fs/xfs/libxfs/xfs_trans_inode.c
> +++ b/fs/xfs/libxfs/xfs_trans_inode.c
> @@ -96,9 +96,9 @@ xfs_trans_log_inode(
>  	 * to log the timestamps, or will clear already cleared fields in the
>  	 * worst case.
>  	 */
> -	if (inode->i_state & (I_DIRTY_TIME | I_DIRTY_TIME_EXPIRED)) {
> +	if (inode->i_state & I_DIRTY_TIME) {
>  		spin_lock(&inode->i_lock);
> -		inode->i_state &= ~(I_DIRTY_TIME | I_DIRTY_TIME_EXPIRED);
> +		inode->i_state &= ~I_DIRTY_TIME;
>  		spin_unlock(&inode->i_lock);
>  	}
>  
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 48556efcdcf0..45eadf5bea5d 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2178,7 +2178,6 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
>  #define I_DIO_WAKEUP		(1 << __I_DIO_WAKEUP)
>  #define I_LINKABLE		(1 << 10)
>  #define I_DIRTY_TIME		(1 << 11)
> -#define I_DIRTY_TIME_EXPIRED	(1 << 12)
>  #define I_WB_SWITCH		(1 << 13)
>  #define I_OVL_INUSE		(1 << 14)
>  #define I_CREATING		(1 << 15)
> diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
> index 7565dcd59697..e7cbccc7c14c 100644
> --- a/include/trace/events/writeback.h
> +++ b/include/trace/events/writeback.h
> @@ -20,7 +20,6 @@
>  		{I_CLEAR,		"I_CLEAR"},		\
>  		{I_SYNC,		"I_SYNC"},		\
>  		{I_DIRTY_TIME,		"I_DIRTY_TIME"},	\
> -		{I_DIRTY_TIME_EXPIRED,	"I_DIRTY_TIME_EXPIRED"}, \
>  		{I_REFERENCED,		"I_REFERENCED"}		\
>  	)
>  
> -- 
> 2.16.4
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] writeback: Drop I_DIRTY_TIME_EXPIRE
  2020-09-02 17:20   ` Darrick J. Wong
@ 2020-09-03 10:10     ` Jan Kara
  0 siblings, 0 replies; 14+ messages in thread
From: Jan Kara @ 2020-09-03 10:10 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Jan Kara, linux-fsdevel, Christoph Hellwig, Martijn Coenen, xfs,
	Eric Sandeen

On Wed 02-09-20 10:20:48, Darrick J. Wong wrote:
> [add linux-xfs and xfsprogs maintainer to cc]
> 
> On Thu, Jun 11, 2020 at 10:11:55AM +0200, Jan Kara wrote:
> > The only use of I_DIRTY_TIME_EXPIRE is to detect in
> > __writeback_single_inode() that inode got there because flush worker
> > decided it's time to writeback the dirty inode time stamps (either
> > because we are syncing or because of age). However we can detect this
> > directly in __writeback_single_inode() and there's no need for the
> > strange propagation with I_DIRTY_TIME_EXPIRE flag.
> > 
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  fs/ext4/inode.c                  |  2 +-
> >  fs/fs-writeback.c                | 28 +++++++++++-----------------
> >  fs/xfs/libxfs/xfs_trans_inode.c  |  4 ++--
> 
> Urrk, so I only just noticed this when I rebased my development tree
> onto 5.9-rc3.  If you're going to change things in fs/xfs/, please cc
> the xfs list to keep us in the loop.  Changes to fs/xfs/libxfs/ have to
> be ported to userspace.

OK, will do next time. I was just dropping a generic flag XFS didn't use in
any particular way so it didn't occur to me XFS people would be interested...

								Honza

> >  include/linux/fs.h               |  1 -
> >  include/trace/events/writeback.h |  1 -
> >  5 files changed, 14 insertions(+), 22 deletions(-)
> > 
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index 40ec5c7ef0d3..4db497f02ffb 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -4887,7 +4887,7 @@ static void __ext4_update_other_inode_time(struct super_block *sb,
> >  	    (inode->i_state & I_DIRTY_TIME)) {
> >  		struct ext4_inode_info	*ei = EXT4_I(inode);
> >  
> > -		inode->i_state &= ~(I_DIRTY_TIME | I_DIRTY_TIME_EXPIRED);
> > +		inode->i_state &= ~I_DIRTY_TIME;
> >  		spin_unlock(&inode->i_lock);
> >  
> >  		spin_lock(&ei->i_raw_lock);
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index ae17d64a3e18..149227160ff0 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -1238,7 +1238,7 @@ static bool inode_dirtied_after(struct inode *inode, unsigned long t)
> >   */
> >  static int move_expired_inodes(struct list_head *delaying_queue,
> >  			       struct list_head *dispatch_queue,
> > -			       int flags, unsigned long dirtied_before)
> > +			       unsigned long dirtied_before)
> >  {
> >  	LIST_HEAD(tmp);
> >  	struct list_head *pos, *node;
> > @@ -1254,8 +1254,6 @@ static int move_expired_inodes(struct list_head *delaying_queue,
> >  		list_move(&inode->i_io_list, &tmp);
> >  		moved++;
> >  		spin_lock(&inode->i_lock);
> > -		if (flags & EXPIRE_DIRTY_ATIME)
> > -			inode->i_state |= I_DIRTY_TIME_EXPIRED;
> >  		inode->i_state |= I_SYNC_QUEUED;
> >  		spin_unlock(&inode->i_lock);
> >  		if (sb_is_blkdev_sb(inode->i_sb))
> > @@ -1303,11 +1301,11 @@ static void queue_io(struct bdi_writeback *wb, struct wb_writeback_work *work,
> >  
> >  	assert_spin_locked(&wb->list_lock);
> >  	list_splice_init(&wb->b_more_io, &wb->b_io);
> > -	moved = move_expired_inodes(&wb->b_dirty, &wb->b_io, 0, dirtied_before);
> > +	moved = move_expired_inodes(&wb->b_dirty, &wb->b_io, dirtied_before);
> >  	if (!work->for_sync)
> >  		time_expire_jif = jiffies - dirtytime_expire_interval * HZ;
> >  	moved += move_expired_inodes(&wb->b_dirty_time, &wb->b_io,
> > -				     EXPIRE_DIRTY_ATIME, time_expire_jif);
> > +				     time_expire_jif);
> >  	if (moved)
> >  		wb_io_lists_populated(wb);
> >  	trace_writeback_queue_io(wb, work, dirtied_before, moved);
> > @@ -1483,18 +1481,14 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
> >  	spin_lock(&inode->i_lock);
> >  
> >  	dirty = inode->i_state & I_DIRTY;
> > -	if (inode->i_state & I_DIRTY_TIME) {
> > -		if ((dirty & I_DIRTY_INODE) ||
> > -		    wbc->sync_mode == WB_SYNC_ALL ||
> > -		    unlikely(inode->i_state & I_DIRTY_TIME_EXPIRED) ||
> > -		    unlikely(time_after(jiffies,
> > -					(inode->dirtied_time_when +
> > -					 dirtytime_expire_interval * HZ)))) {
> > -			dirty |= I_DIRTY_TIME | I_DIRTY_TIME_EXPIRED;
> > -			trace_writeback_lazytime(inode);
> > -		}
> > -	} else
> > -		inode->i_state &= ~I_DIRTY_TIME_EXPIRED;
> > +	if ((inode->i_state & I_DIRTY_TIME) &&
> > +	    ((dirty & I_DIRTY_INODE) ||
> > +	     wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync ||
> > +	     time_after(jiffies, inode->dirtied_time_when +
> > +			dirtytime_expire_interval * HZ))) {
> > +		dirty |= I_DIRTY_TIME;
> > +		trace_writeback_lazytime(inode);
> > +	}
> >  	inode->i_state &= ~dirty;
> >  
> >  	/*
> > diff --git a/fs/xfs/libxfs/xfs_trans_inode.c b/fs/xfs/libxfs/xfs_trans_inode.c
> > index b5dfb6654842..1b4df6636944 100644
> > --- a/fs/xfs/libxfs/xfs_trans_inode.c
> > +++ b/fs/xfs/libxfs/xfs_trans_inode.c
> > @@ -96,9 +96,9 @@ xfs_trans_log_inode(
> >  	 * to log the timestamps, or will clear already cleared fields in the
> >  	 * worst case.
> >  	 */
> > -	if (inode->i_state & (I_DIRTY_TIME | I_DIRTY_TIME_EXPIRED)) {
> > +	if (inode->i_state & I_DIRTY_TIME) {
> >  		spin_lock(&inode->i_lock);
> > -		inode->i_state &= ~(I_DIRTY_TIME | I_DIRTY_TIME_EXPIRED);
> > +		inode->i_state &= ~I_DIRTY_TIME;
> >  		spin_unlock(&inode->i_lock);
> >  	}
> >  
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 48556efcdcf0..45eadf5bea5d 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -2178,7 +2178,6 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
> >  #define I_DIO_WAKEUP		(1 << __I_DIO_WAKEUP)
> >  #define I_LINKABLE		(1 << 10)
> >  #define I_DIRTY_TIME		(1 << 11)
> > -#define I_DIRTY_TIME_EXPIRED	(1 << 12)
> >  #define I_WB_SWITCH		(1 << 13)
> >  #define I_OVL_INUSE		(1 << 14)
> >  #define I_CREATING		(1 << 15)
> > diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
> > index 7565dcd59697..e7cbccc7c14c 100644
> > --- a/include/trace/events/writeback.h
> > +++ b/include/trace/events/writeback.h
> > @@ -20,7 +20,6 @@
> >  		{I_CLEAR,		"I_CLEAR"},		\
> >  		{I_SYNC,		"I_SYNC"},		\
> >  		{I_DIRTY_TIME,		"I_DIRTY_TIME"},	\
> > -		{I_DIRTY_TIME_EXPIRED,	"I_DIRTY_TIME_EXPIRED"}, \
> >  		{I_REFERENCED,		"I_REFERENCED"}		\
> >  	)
> >  
> > -- 
> > 2.16.4
> > 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-09-03 10:10 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-11  8:11 [PATCH 0/4 v2] writeback: Lazytime handling fix and cleanups Jan Kara
2020-06-11  8:11 ` [PATCH 1/4] writeback: Protect inode->i_io_list with inode->i_lock Jan Kara
2020-06-12  7:45   ` Martijn Coenen
2020-06-15  6:27   ` Christoph Hellwig
2020-06-11  8:11 ` [PATCH 2/4] writeback: Avoid skipping inode writeback Jan Kara
2020-06-12  8:03   ` Martijn Coenen
2020-06-15  7:15     ` Jan Kara
2020-06-15  6:27   ` Christoph Hellwig
2020-06-15  7:15     ` Jan Kara
2020-06-11  8:11 ` [PATCH 3/4] writeback: Fix sync livelock due to b_dirty_time processing Jan Kara
2020-06-15  6:27   ` Christoph Hellwig
2020-06-11  8:11 ` [PATCH 4/4] writeback: Drop I_DIRTY_TIME_EXPIRE Jan Kara
2020-09-02 17:20   ` Darrick J. Wong
2020-09-03 10:10     ` Jan Kara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).