linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/7] Fixes and cleanups to fs-writeback
@ 2024-02-08 17:20 Kemeng Shi
  2024-02-08 17:20 ` [PATCH 1/7] fs/writeback: avoid to writeback non-expired inode in kupdate writeback Kemeng Shi
                   ` (6 more replies)
  0 siblings, 7 replies; 25+ messages in thread
From: Kemeng Shi @ 2024-02-08 17:20 UTC (permalink / raw)
  To: viro, brauner, jack; +Cc: linux-fsdevel, linux-kernel

This series contains some random fixes and cleanups to fs-writeback.
More details can be found in respective patches. Thanks!

Kemeng Shi (7):
  fs/writeback: avoid to writeback non-expired inode in kupdate
    writeback
  fs/writeback: bail out if there is no more inodes for IO and queued
    once
  fs/writeback: remove unused parameter wb of finish_writeback_work
  fs/writeback: remove unneeded check in writeback_single_inode
  fs/writeback: only calculate dirtied_before when b_io is empty
  fs/writeback: correct comment of __wakeup_flusher_threads_bdi
  fs/writeback: remove unnecessary return in writeback_inodes_sb

 fs/fs-writeback.c | 66 +++++++++++++++++++++++++++++++----------------
 1 file changed, 44 insertions(+), 22 deletions(-)

-- 
2.30.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/7] fs/writeback: avoid to writeback non-expired inode in kupdate writeback
  2024-02-08 17:20 [PATCH 0/7] Fixes and cleanups to fs-writeback Kemeng Shi
@ 2024-02-08 17:20 ` Kemeng Shi
  2024-02-08 18:29   ` Tim Chen
  2024-02-23 13:42   ` Jan Kara
  2024-02-08 17:20 ` [PATCH 2/7] fs/writeback: bail out if there is no more inodes for IO and queued once Kemeng Shi
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 25+ messages in thread
From: Kemeng Shi @ 2024-02-08 17:20 UTC (permalink / raw)
  To: viro, brauner, jack; +Cc: linux-fsdevel, linux-kernel

In kupdate writeback, only expired inode (have been dirty for longer than
dirty_expire_interval) is supposed to be written back. However, kupdate
writeback will writeback non-expired inode left in b_io or b_more_io from
last wb_writeback. As a result, writeback will keep being triggered
unexpected when we keep dirtying pages even dirty memory is under
threshold and inode is not expired. To be more specific:
Assume dirty background threshold is > 1G and dirty_expire_centisecs is
> 60s. When we running fio -size=1G -invalidate=0 -ioengine=libaio
--time_based -runtime=60... (keep dirtying), the writeback will keep
being triggered as following:
wb_workfn
  wb_do_writeback
    wb_check_background_flush
      /*
       * Wb dirty background threshold starts at 0 if device was idle and
       * grows up when bandwidth of wb is updated. So a background
       * writeback is triggered.
       */
      wb_over_bg_thresh
      /*
       * Dirtied inode will be written back and added to b_more_io list
       * after slice used up (because we keep dirtying the inode).
       */
      wb_writeback

Writeback is triggered per dirty_writeback_centisecs as following:
wb_workfn
  wb_do_writeback
    wb_check_old_data_flush
      /*
       * Write back inode left in b_io and b_more_io from last wb_writeback
       * even the inode is non-expired and it will be added to b_more_io
       * again as slice will be used up (because we keep dirtying the
       * inode)
       */
      wb_writeback

Fix this by moving non-expired inode in io list from last wb_writeback to
dirty list in kudpate writeback.

Test as following:
/* make it more easier to observe the issue */
echo 300000 > /proc/sys/vm/dirty_expire_centisecs
echo 100 > /proc/sys/vm/dirty_writeback_centisecs
/* create a idle device */
mkfs.ext4 -F /dev/vdb
mount /dev/vdb /bdi1/
/* run buffer write with fio */
fio -name test -filename=/bdi1/file -size=800M -ioengine=libaio -bs=4K \
-iodepth=1 -rw=write -direct=0 --time_based -runtime=60 -invalidate=0

Result before fix (run three tests):
1360MB/s
1329MB/s
1455MB/s

Result after fix (run three tests);
790MB/s
1820MB/s
1804MB/s

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/fs-writeback.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 5ab1aaf805f7..a9a918972719 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2046,6 +2046,23 @@ static long writeback_inodes_wb(struct bdi_writeback *wb, long nr_pages,
 	return nr_pages - work.nr_pages;
 }
 
+static void filter_expired_io(struct bdi_writeback *wb)
+{
+	struct inode *inode, *tmp;
+	unsigned long expired_jiffies = jiffies -
+		msecs_to_jiffies(dirty_expire_interval * 10);
+
+	spin_lock(&wb->list_lock);
+	list_for_each_entry_safe(inode, tmp, &wb->b_io, i_io_list)
+		if (inode_dirtied_after(inode, expired_jiffies))
+			redirty_tail(inode, wb);
+
+	list_for_each_entry_safe(inode, tmp, &wb->b_more_io, i_io_list)
+		if (inode_dirtied_after(inode, expired_jiffies))
+			redirty_tail(inode, wb);
+	spin_unlock(&wb->list_lock);
+}
+
 /*
  * Explicit flushing or periodic writeback of "old" data.
  *
@@ -2070,6 +2087,9 @@ static long wb_writeback(struct bdi_writeback *wb,
 	long progress;
 	struct blk_plug plug;
 
+	if (work->for_kupdate)
+		filter_expired_io(wb);
+
 	blk_start_plug(&plug);
 	for (;;) {
 		/*
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/7] fs/writeback: bail out if there is no more inodes for IO and queued once
  2024-02-08 17:20 [PATCH 0/7] Fixes and cleanups to fs-writeback Kemeng Shi
  2024-02-08 17:20 ` [PATCH 1/7] fs/writeback: avoid to writeback non-expired inode in kupdate writeback Kemeng Shi
@ 2024-02-08 17:20 ` Kemeng Shi
  2024-02-08 19:21   ` Tim Chen
  2024-02-23 13:49   ` Jan Kara
  2024-02-08 17:20 ` [PATCH 3/7] fs/writeback: remove unused parameter wb of finish_writeback_work Kemeng Shi
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 25+ messages in thread
From: Kemeng Shi @ 2024-02-08 17:20 UTC (permalink / raw)
  To: viro, brauner, jack; +Cc: linux-fsdevel, linux-kernel

For case there is no more inodes for IO in io list from last wb_writeback,
We may bail out early even there is inode in dirty list should be written
back. Only bail out when we queued once to avoid missing dirtied inode.

This is from code reading...

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/fs-writeback.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index a9a918972719..edb0cff51673 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2086,6 +2086,7 @@ static long wb_writeback(struct bdi_writeback *wb,
 	struct inode *inode;
 	long progress;
 	struct blk_plug plug;
+	bool queued = false;
 
 	if (work->for_kupdate)
 		filter_expired_io(wb);
@@ -2131,8 +2132,10 @@ static long wb_writeback(struct bdi_writeback *wb,
 			dirtied_before = jiffies;
 
 		trace_writeback_start(wb, work);
-		if (list_empty(&wb->b_io))
+		if (list_empty(&wb->b_io)) {
 			queue_io(wb, work, dirtied_before);
+			queued = true;
+		}
 		if (work->sb)
 			progress = writeback_sb_inodes(work->sb, wb, work);
 		else
@@ -2155,7 +2158,7 @@ static long wb_writeback(struct bdi_writeback *wb,
 		/*
 		 * No more inodes for IO, bail
 		 */
-		if (list_empty(&wb->b_more_io)) {
+		if (list_empty(&wb->b_more_io) && queued) {
 			spin_unlock(&wb->list_lock);
 			break;
 		}
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 3/7] fs/writeback: remove unused parameter wb of finish_writeback_work
  2024-02-08 17:20 [PATCH 0/7] Fixes and cleanups to fs-writeback Kemeng Shi
  2024-02-08 17:20 ` [PATCH 1/7] fs/writeback: avoid to writeback non-expired inode in kupdate writeback Kemeng Shi
  2024-02-08 17:20 ` [PATCH 2/7] fs/writeback: bail out if there is no more inodes for IO and queued once Kemeng Shi
@ 2024-02-08 17:20 ` Kemeng Shi
  2024-02-08 19:26   ` Tim Chen
  2024-02-23 13:49   ` Jan Kara
  2024-02-08 17:20 ` [PATCH 4/7] fs/writeback: remove unneeded check in writeback_single_inode Kemeng Shi
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 25+ messages in thread
From: Kemeng Shi @ 2024-02-08 17:20 UTC (permalink / raw)
  To: viro, brauner, jack; +Cc: linux-fsdevel, linux-kernel

Remove unused parameter wb of finish_writeback_work.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/fs-writeback.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index edb0cff51673..2619f74ced70 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -166,8 +166,7 @@ static void wb_wakeup_delayed(struct bdi_writeback *wb)
 	spin_unlock_irq(&wb->work_lock);
 }
 
-static void finish_writeback_work(struct bdi_writeback *wb,
-				  struct wb_writeback_work *work)
+static void finish_writeback_work(struct wb_writeback_work *work)
 {
 	struct wb_completion *done = work->done;
 
@@ -196,7 +195,7 @@ static void wb_queue_work(struct bdi_writeback *wb,
 		list_add_tail(&work->list, &wb->work_list);
 		mod_delayed_work(bdi_wq, &wb->dwork, 0);
 	} else
-		finish_writeback_work(wb, work);
+		finish_writeback_work(work);
 
 	spin_unlock_irq(&wb->work_lock);
 }
@@ -2285,7 +2284,7 @@ static long wb_do_writeback(struct bdi_writeback *wb)
 	while ((work = get_next_work_item(wb)) != NULL) {
 		trace_writeback_exec(wb, work);
 		wrote += wb_writeback(wb, work);
-		finish_writeback_work(wb, work);
+		finish_writeback_work(work);
 	}
 
 	/*
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 4/7] fs/writeback: remove unneeded check in writeback_single_inode
  2024-02-08 17:20 [PATCH 0/7] Fixes and cleanups to fs-writeback Kemeng Shi
                   ` (2 preceding siblings ...)
  2024-02-08 17:20 ` [PATCH 3/7] fs/writeback: remove unused parameter wb of finish_writeback_work Kemeng Shi
@ 2024-02-08 17:20 ` Kemeng Shi
  2024-02-08 19:34   ` Tim Chen
  2024-02-10  0:46   ` Dave Chinner
  2024-02-08 17:20 ` [PATCH 5/7] fs/writeback: only calculate dirtied_before when b_io is empty Kemeng Shi
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 25+ messages in thread
From: Kemeng Shi @ 2024-02-08 17:20 UTC (permalink / raw)
  To: viro, brauner, jack; +Cc: linux-fsdevel, linux-kernel

I_DIRTY_ALL consists of I_DIRTY_TIME and I_DIRTY, so I_DIRTY_TIME must
be set when any bit of I_DIRTY_ALL is set but I_DIRTY is not set.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/fs-writeback.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 2619f74ced70..b61bf2075931 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1788,7 +1788,7 @@ static int writeback_single_inode(struct inode *inode,
 		else if (!(inode->i_state & I_SYNC_QUEUED)) {
 			if ((inode->i_state & I_DIRTY))
 				redirty_tail_locked(inode, wb);
-			else if (inode->i_state & I_DIRTY_TIME) {
+			else {
 				inode->dirtied_when = jiffies;
 				inode_io_list_move_locked(inode,
 							  wb,
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 5/7] fs/writeback: only calculate dirtied_before when b_io is empty
  2024-02-08 17:20 [PATCH 0/7] Fixes and cleanups to fs-writeback Kemeng Shi
                   ` (3 preceding siblings ...)
  2024-02-08 17:20 ` [PATCH 4/7] fs/writeback: remove unneeded check in writeback_single_inode Kemeng Shi
@ 2024-02-08 17:20 ` Kemeng Shi
  2024-02-23 13:58   ` Jan Kara
  2024-02-08 17:20 ` [PATCH 6/7] fs/writeback: correct comment of __wakeup_flusher_threads_bdi Kemeng Shi
  2024-02-08 17:20 ` [PATCH 7/7] fs/writeback: remove unnecessary return in writeback_inodes_sb Kemeng Shi
  6 siblings, 1 reply; 25+ messages in thread
From: Kemeng Shi @ 2024-02-08 17:20 UTC (permalink / raw)
  To: viro, brauner, jack; +Cc: linux-fsdevel, linux-kernel

The dirtied_before is only used when b_io is not empty, so only calculate
when b_io is not empty.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/fs-writeback.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index b61bf2075931..e8868e814e0a 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2118,20 +2118,21 @@ static long wb_writeback(struct bdi_writeback *wb,
 
 		spin_lock(&wb->list_lock);
 
-		/*
-		 * Kupdate and background works are special and we want to
-		 * include all inodes that need writing. Livelock avoidance is
-		 * handled by these works yielding to any other work so we are
-		 * safe.
-		 */
-		if (work->for_kupdate) {
-			dirtied_before = jiffies -
-				msecs_to_jiffies(dirty_expire_interval * 10);
-		} else if (work->for_background)
-			dirtied_before = jiffies;
-
 		trace_writeback_start(wb, work);
 		if (list_empty(&wb->b_io)) {
+			/*
+			 * Kupdate and background works are special and we want to
+			 * include all inodes that need writing. Livelock avoidance is
+			 * handled by these works yielding to any other work so we are
+			 * safe.
+			 */
+			if (work->for_kupdate) {
+				dirtied_before = jiffies -
+					msecs_to_jiffies(dirty_expire_interval *
+							 10);
+			} else if (work->for_background)
+				dirtied_before = jiffies;
+
 			queue_io(wb, work, dirtied_before);
 			queued = true;
 		}
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 6/7] fs/writeback: correct comment of __wakeup_flusher_threads_bdi
  2024-02-08 17:20 [PATCH 0/7] Fixes and cleanups to fs-writeback Kemeng Shi
                   ` (4 preceding siblings ...)
  2024-02-08 17:20 ` [PATCH 5/7] fs/writeback: only calculate dirtied_before when b_io is empty Kemeng Shi
@ 2024-02-08 17:20 ` Kemeng Shi
  2024-02-23 13:59   ` Jan Kara
  2024-02-08 17:20 ` [PATCH 7/7] fs/writeback: remove unnecessary return in writeback_inodes_sb Kemeng Shi
  6 siblings, 1 reply; 25+ messages in thread
From: Kemeng Shi @ 2024-02-08 17:20 UTC (permalink / raw)
  To: viro, brauner, jack; +Cc: linux-fsdevel, linux-kernel

Commit e8e8a0c6c9bfc ("writeback: move nr_pages == 0 logic to one
location") removed parameter nr_pages of __wakeup_flusher_threads_bdi
and we try to writeback all dirty pages in __wakeup_flusher_threads_bdi
now. Just correct stale comment.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/fs-writeback.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index e8868e814e0a..816505d74b2f 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2345,8 +2345,7 @@ void wb_workfn(struct work_struct *work)
 }
 
 /*
- * Start writeback of `nr_pages' pages on this bdi. If `nr_pages' is zero,
- * write back the whole world.
+ * Start writeback of all dirty pages on this bdi.
  */
 static void __wakeup_flusher_threads_bdi(struct backing_dev_info *bdi,
 					 enum wb_reason reason)
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 7/7] fs/writeback: remove unnecessary return in writeback_inodes_sb
  2024-02-08 17:20 [PATCH 0/7] Fixes and cleanups to fs-writeback Kemeng Shi
                   ` (5 preceding siblings ...)
  2024-02-08 17:20 ` [PATCH 6/7] fs/writeback: correct comment of __wakeup_flusher_threads_bdi Kemeng Shi
@ 2024-02-08 17:20 ` Kemeng Shi
  2024-02-23 13:59   ` Jan Kara
  6 siblings, 1 reply; 25+ messages in thread
From: Kemeng Shi @ 2024-02-08 17:20 UTC (permalink / raw)
  To: viro, brauner, jack; +Cc: linux-fsdevel, linux-kernel

writeback_inodes_sb doesn't have return value, just remove unnecessary
return in it.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/fs-writeback.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 816505d74b2f..eb62196777dd 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2748,7 +2748,7 @@ EXPORT_SYMBOL(writeback_inodes_sb_nr);
  */
 void writeback_inodes_sb(struct super_block *sb, enum wb_reason reason)
 {
-	return writeback_inodes_sb_nr(sb, get_nr_dirty_pages(), reason);
+	writeback_inodes_sb_nr(sb, get_nr_dirty_pages(), reason);
 }
 EXPORT_SYMBOL(writeback_inodes_sb);
 
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/7] fs/writeback: avoid to writeback non-expired inode in kupdate writeback
  2024-02-08 17:20 ` [PATCH 1/7] fs/writeback: avoid to writeback non-expired inode in kupdate writeback Kemeng Shi
@ 2024-02-08 18:29   ` Tim Chen
  2024-02-18  2:01     ` Kemeng Shi
  2024-02-23 13:42   ` Jan Kara
  1 sibling, 1 reply; 25+ messages in thread
From: Tim Chen @ 2024-02-08 18:29 UTC (permalink / raw)
  To: Kemeng Shi, viro, brauner, jack; +Cc: linux-fsdevel, linux-kernel

On Fri, 2024-02-09 at 01:20 +0800, Kemeng Shi wrote:
> 
>  
> +static void filter_expired_io(struct bdi_writeback *wb)
> +{
> +	struct inode *inode, *tmp;
> +	unsigned long expired_jiffies = jiffies -
> +		msecs_to_jiffies(dirty_expire_interval * 10);

We have kupdate trigger time hard coded with a factor of 10 to expire interval here.
The kupdate trigger time "mssecs_to_jiffies(dirty_expire_interval * 10)" is
also used in wb_writeback().  It will be better to have a macro or #define
to encapsulate the trigger time so if for any reason we need
to tune the trigger time, we just need to change it at one place.

Tim

> +
> +	spin_lock(&wb->list_lock);
> +	list_for_each_entry_safe(inode, tmp, &wb->b_io, i_io_list)
> +		if (inode_dirtied_after(inode, expired_jiffies))
> +			redirty_tail(inode, wb);
> +
> +	list_for_each_entry_safe(inode, tmp, &wb->b_more_io, i_io_list)
> +		if (inode_dirtied_after(inode, expired_jiffies))
> +			redirty_tail(inode, wb);
> +	spin_unlock(&wb->list_lock);
> +}
> +
>  /*
>   * Explicit flushing or periodic writeback of "old" data.
>   *
> @@ -2070,6 +2087,9 @@ static long wb_writeback(struct bdi_writeback *wb,
>  	long progress;
>  	struct blk_plug plug;
>  
> +	if (work->for_kupdate)
> +		filter_expired_io(wb);
> +
>  	blk_start_plug(&plug);
>  	for (;;) {
>  		/*


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/7] fs/writeback: bail out if there is no more inodes for IO and queued once
  2024-02-08 17:20 ` [PATCH 2/7] fs/writeback: bail out if there is no more inodes for IO and queued once Kemeng Shi
@ 2024-02-08 19:21   ` Tim Chen
  2024-02-18  2:11     ` Kemeng Shi
  2024-02-23 13:49   ` Jan Kara
  1 sibling, 1 reply; 25+ messages in thread
From: Tim Chen @ 2024-02-08 19:21 UTC (permalink / raw)
  To: Kemeng Shi, viro, brauner, jack; +Cc: linux-fsdevel, linux-kernel

On Fri, 2024-02-09 at 01:20 +0800, Kemeng Shi wrote:
> For case there is no more inodes for IO in io list from last wb_writeback,
> We may bail out early even there is inode in dirty list should be written
> back. Only bail out when we queued once to avoid missing dirtied inode.
> 
> This is from code reading...
> 
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
>  fs/fs-writeback.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index a9a918972719..edb0cff51673 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -2086,6 +2086,7 @@ static long wb_writeback(struct bdi_writeback *wb,
>  	struct inode *inode;
>  	long progress;
>  	struct blk_plug plug;
> +	bool queued = false;
>  
>  	if (work->for_kupdate)
>  		filter_expired_io(wb);
> @@ -2131,8 +2132,10 @@ static long wb_writeback(struct bdi_writeback *wb,
>  			dirtied_before = jiffies;
>  
>  		trace_writeback_start(wb, work);
> -		if (list_empty(&wb->b_io))
> +		if (list_empty(&wb->b_io)) {
>  			queue_io(wb, work, dirtied_before);
> +			queued = true;
> +		}
>  		if (work->sb)
>  			progress = writeback_sb_inodes(work->sb, wb, work);
>  		else
> @@ -2155,7 +2158,7 @@ static long wb_writeback(struct bdi_writeback *wb,
>  		/*
>  		 * No more inodes for IO, bail
>  		 */
> -		if (list_empty(&wb->b_more_io)) {
> +		if (list_empty(&wb->b_more_io) && queued) {

Wonder if we can simply do
		if (list_empty(&wb->b_more_io) && list_empty(&wb->b_io)) {

if the intention is to not bail if there are still inodes to be be flushed.

Tim

>  			spin_unlock(&wb->list_lock);
>  			break;
>  		}


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/7] fs/writeback: remove unused parameter wb of finish_writeback_work
  2024-02-08 17:20 ` [PATCH 3/7] fs/writeback: remove unused parameter wb of finish_writeback_work Kemeng Shi
@ 2024-02-08 19:26   ` Tim Chen
  2024-02-23 13:49   ` Jan Kara
  1 sibling, 0 replies; 25+ messages in thread
From: Tim Chen @ 2024-02-08 19:26 UTC (permalink / raw)
  To: Kemeng Shi, viro, brauner, jack; +Cc: linux-fsdevel, linux-kernel

On Fri, 2024-02-09 at 01:20 +0800, Kemeng Shi wrote:
> Remove unused parameter wb of finish_writeback_work.
> 
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>

Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>

> ---
>  fs/fs-writeback.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index edb0cff51673..2619f74ced70 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -166,8 +166,7 @@ static void wb_wakeup_delayed(struct bdi_writeback *wb)
>  	spin_unlock_irq(&wb->work_lock);
>  }
>  
> -static void finish_writeback_work(struct bdi_writeback *wb,
> -				  struct wb_writeback_work *work)
> +static void finish_writeback_work(struct wb_writeback_work *work)
>  {
>  	struct wb_completion *done = work->done;
>  
> @@ -196,7 +195,7 @@ static void wb_queue_work(struct bdi_writeback *wb,
>  		list_add_tail(&work->list, &wb->work_list);
>  		mod_delayed_work(bdi_wq, &wb->dwork, 0);
>  	} else
> -		finish_writeback_work(wb, work);
> +		finish_writeback_work(work);
>  
>  	spin_unlock_irq(&wb->work_lock);
>  }
> @@ -2285,7 +2284,7 @@ static long wb_do_writeback(struct bdi_writeback *wb)
>  	while ((work = get_next_work_item(wb)) != NULL) {
>  		trace_writeback_exec(wb, work);
>  		wrote += wb_writeback(wb, work);
> -		finish_writeback_work(wb, work);
> +		finish_writeback_work(work);
>  	}
>  
>  	/*


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/7] fs/writeback: remove unneeded check in writeback_single_inode
  2024-02-08 17:20 ` [PATCH 4/7] fs/writeback: remove unneeded check in writeback_single_inode Kemeng Shi
@ 2024-02-08 19:34   ` Tim Chen
  2024-02-10  0:46   ` Dave Chinner
  1 sibling, 0 replies; 25+ messages in thread
From: Tim Chen @ 2024-02-08 19:34 UTC (permalink / raw)
  To: Kemeng Shi, viro, brauner, jack; +Cc: linux-fsdevel, linux-kernel

On Fri, 2024-02-09 at 01:20 +0800, Kemeng Shi wrote:
> I_DIRTY_ALL consists of I_DIRTY_TIME and I_DIRTY, so I_DIRTY_TIME must
> be set when any bit of I_DIRTY_ALL is set but I_DIRTY is not set.

/s/any bit of/some bit in/

> 
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>

Reviewed by: Tim Chen <tim.c.chen@linux.intel.com>

> ---
>  fs/fs-writeback.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 2619f74ced70..b61bf2075931 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -1788,7 +1788,7 @@ static int writeback_single_inode(struct inode *inode,
>  		else if (!(inode->i_state & I_SYNC_QUEUED)) {
>  			if ((inode->i_state & I_DIRTY))
>  				redirty_tail_locked(inode, wb);
> -			else if (inode->i_state & I_DIRTY_TIME) {
> +			else {
>  				inode->dirtied_when = jiffies;
>  				inode_io_list_move_locked(inode,
>  							  wb,


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/7] fs/writeback: remove unneeded check in writeback_single_inode
  2024-02-08 17:20 ` [PATCH 4/7] fs/writeback: remove unneeded check in writeback_single_inode Kemeng Shi
  2024-02-08 19:34   ` Tim Chen
@ 2024-02-10  0:46   ` Dave Chinner
  2024-02-18  2:37     ` Kemeng Shi
  1 sibling, 1 reply; 25+ messages in thread
From: Dave Chinner @ 2024-02-10  0:46 UTC (permalink / raw)
  To: Kemeng Shi; +Cc: viro, brauner, jack, linux-fsdevel, linux-kernel

On Fri, Feb 09, 2024 at 01:20:21AM +0800, Kemeng Shi wrote:
> I_DIRTY_ALL consists of I_DIRTY_TIME and I_DIRTY, so I_DIRTY_TIME must
> be set when any bit of I_DIRTY_ALL is set but I_DIRTY is not set.
> 
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
>  fs/fs-writeback.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 2619f74ced70..b61bf2075931 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -1788,7 +1788,7 @@ static int writeback_single_inode(struct inode *inode,
>  		else if (!(inode->i_state & I_SYNC_QUEUED)) {
>  			if ((inode->i_state & I_DIRTY))
>  				redirty_tail_locked(inode, wb);
> -			else if (inode->i_state & I_DIRTY_TIME) {
> +			else {
>  				inode->dirtied_when = jiffies;
>  				inode_io_list_move_locked(inode,
>  							  wb,

NAK.

The code is correct and the behaviour that is intended it obvious
from the code as it stands.

It is -incorrect- to move any inode that does not have I_DIRTY_TIME
to the wb->b_dirty_time list. By removing the I_DIRTY_TIME guard
from this code, you are leaving a nasty, undocumented logic trap in
the code that somebody is guaranteed to trip over into in the
future. That's making the code worse, not better....

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/7] fs/writeback: avoid to writeback non-expired inode in kupdate writeback
  2024-02-08 18:29   ` Tim Chen
@ 2024-02-18  2:01     ` Kemeng Shi
  2024-02-28  1:46       ` Kemeng Shi
  0 siblings, 1 reply; 25+ messages in thread
From: Kemeng Shi @ 2024-02-18  2:01 UTC (permalink / raw)
  To: Tim Chen, viro, brauner, jack; +Cc: linux-fsdevel, linux-kernel



on 2/9/2024 2:29 AM, Tim Chen wrote:
> On Fri, 2024-02-09 at 01:20 +0800, Kemeng Shi wrote:
>>
>>  
>> +static void filter_expired_io(struct bdi_writeback *wb)
>> +{
>> +	struct inode *inode, *tmp;
>> +	unsigned long expired_jiffies = jiffies -
>> +		msecs_to_jiffies(dirty_expire_interval * 10);
> 
> We have kupdate trigger time hard coded with a factor of 10 to expire interval here.
> The kupdate trigger time "mssecs_to_jiffies(dirty_expire_interval * 10)" is
> also used in wb_writeback().  It will be better to have a macro or #define
> to encapsulate the trigger time so if for any reason we need
> to tune the trigger time, we just need to change it at one place.
Hi Tim. Sorry for the late reply, I was on vacation these days.
I agree it's better to have a macro and I will add it in next version.
Thanks!
> 
> Tim
> 
>> +
>> +	spin_lock(&wb->list_lock);
>> +	list_for_each_entry_safe(inode, tmp, &wb->b_io, i_io_list)
>> +		if (inode_dirtied_after(inode, expired_jiffies))
>> +			redirty_tail(inode, wb);
>> +
>> +	list_for_each_entry_safe(inode, tmp, &wb->b_more_io, i_io_list)
>> +		if (inode_dirtied_after(inode, expired_jiffies))
>> +			redirty_tail(inode, wb);
>> +	spin_unlock(&wb->list_lock);
>> +}
>> +
>>  /*
>>   * Explicit flushing or periodic writeback of "old" data.
>>   *
>> @@ -2070,6 +2087,9 @@ static long wb_writeback(struct bdi_writeback *wb,
>>  	long progress;
>>  	struct blk_plug plug;
>>  
>> +	if (work->for_kupdate)
>> +		filter_expired_io(wb);
>> +
>>  	blk_start_plug(&plug);
>>  	for (;;) {
>>  		/*
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/7] fs/writeback: bail out if there is no more inodes for IO and queued once
  2024-02-08 19:21   ` Tim Chen
@ 2024-02-18  2:11     ` Kemeng Shi
  0 siblings, 0 replies; 25+ messages in thread
From: Kemeng Shi @ 2024-02-18  2:11 UTC (permalink / raw)
  To: Tim Chen, viro, brauner, jack; +Cc: linux-fsdevel, linux-kernel



on 2/9/2024 3:21 AM, Tim Chen wrote:
> On Fri, 2024-02-09 at 01:20 +0800, Kemeng Shi wrote:
>> For case there is no more inodes for IO in io list from last wb_writeback,
>> We may bail out early even there is inode in dirty list should be written
>> back. Only bail out when we queued once to avoid missing dirtied inode.
>>
>> This is from code reading...
>>
>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>> ---
>>  fs/fs-writeback.c | 7 +++++--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
>> index a9a918972719..edb0cff51673 100644
>> --- a/fs/fs-writeback.c
>> +++ b/fs/fs-writeback.c
>> @@ -2086,6 +2086,7 @@ static long wb_writeback(struct bdi_writeback *wb,
>>  	struct inode *inode;
>>  	long progress;
>>  	struct blk_plug plug;
>> +	bool queued = false;
>>  
>>  	if (work->for_kupdate)
>>  		filter_expired_io(wb);
>> @@ -2131,8 +2132,10 @@ static long wb_writeback(struct bdi_writeback *wb,
>>  			dirtied_before = jiffies;
>>  
>>  		trace_writeback_start(wb, work);
>> -		if (list_empty(&wb->b_io))
>> +		if (list_empty(&wb->b_io)) {
>>  			queue_io(wb, work, dirtied_before);
>> +			queued = true;
>> +		}
>>  		if (work->sb)
>>  			progress = writeback_sb_inodes(work->sb, wb, work);
>>  		else
>> @@ -2155,7 +2158,7 @@ static long wb_writeback(struct bdi_writeback *wb,
>>  		/*
>>  		 * No more inodes for IO, bail
>>  		 */
>> -		if (list_empty(&wb->b_more_io)) {
>> +		if (list_empty(&wb->b_more_io) && queued) {
> 
> Wonder if we can simply do
> 		if (list_empty(&wb->b_more_io) && list_empty(&wb->b_io)) {
> 
> if the intention is to not bail if there are still inodes to be be flushed.
I suppose not as there may be inodes in wb->b_dirty should be flushed.
For case that there is a inode in wb->b_io which is not flushed in last
wb_writeback and there are a lot of inodes in wb->dirty, the next background
flush is supposed to make dirty pages under threshold however only the inode
in wb->b_io is flushed.
> 
> Tim
> 
>>  			spin_unlock(&wb->list_lock);
>>  			break;
>>  		}
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/7] fs/writeback: remove unneeded check in writeback_single_inode
  2024-02-10  0:46   ` Dave Chinner
@ 2024-02-18  2:37     ` Kemeng Shi
  0 siblings, 0 replies; 25+ messages in thread
From: Kemeng Shi @ 2024-02-18  2:37 UTC (permalink / raw)
  To: Dave Chinner; +Cc: viro, brauner, jack, linux-fsdevel, linux-kernel



on 2/10/2024 8:46 AM, Dave Chinner wrote:
> On Fri, Feb 09, 2024 at 01:20:21AM +0800, Kemeng Shi wrote:
>> I_DIRTY_ALL consists of I_DIRTY_TIME and I_DIRTY, so I_DIRTY_TIME must
>> be set when any bit of I_DIRTY_ALL is set but I_DIRTY is not set.
>>
>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>> ---
>>  fs/fs-writeback.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
>> index 2619f74ced70..b61bf2075931 100644
>> --- a/fs/fs-writeback.c
>> +++ b/fs/fs-writeback.c
>> @@ -1788,7 +1788,7 @@ static int writeback_single_inode(struct inode *inode,
>>  		else if (!(inode->i_state & I_SYNC_QUEUED)) {
>>  			if ((inode->i_state & I_DIRTY))
>>  				redirty_tail_locked(inode, wb);
>> -			else if (inode->i_state & I_DIRTY_TIME) {
>> +			else {
>>  				inode->dirtied_when = jiffies;
>>  				inode_io_list_move_locked(inode,
>>  							  wb,
> 
> NAK.
> 
> The code is correct and the behaviour that is intended it obvious
> from the code as it stands.
> 
> It is -incorrect- to move any inode that does not have I_DIRTY_TIME
> to the wb->b_dirty_time list. By removing the I_DIRTY_TIME guard
> from this code, you are leaving a nasty, undocumented logic trap in
> the code that somebody is guaranteed to trip over into in the
> future. That's making the code worse, not better....
Sure, I will remove this one in next version. Thanks for the
explanation.
> 
> -Dave.
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/7] fs/writeback: avoid to writeback non-expired inode in kupdate writeback
  2024-02-08 17:20 ` [PATCH 1/7] fs/writeback: avoid to writeback non-expired inode in kupdate writeback Kemeng Shi
  2024-02-08 18:29   ` Tim Chen
@ 2024-02-23 13:42   ` Jan Kara
  2024-02-26 11:47     ` Kemeng Shi
  1 sibling, 1 reply; 25+ messages in thread
From: Jan Kara @ 2024-02-23 13:42 UTC (permalink / raw)
  To: Kemeng Shi; +Cc: viro, brauner, jack, linux-fsdevel, linux-kernel

On Fri 09-02-24 01:20:18, Kemeng Shi wrote:
> In kupdate writeback, only expired inode (have been dirty for longer than
> dirty_expire_interval) is supposed to be written back. However, kupdate
> writeback will writeback non-expired inode left in b_io or b_more_io from
> last wb_writeback. As a result, writeback will keep being triggered
> unexpected when we keep dirtying pages even dirty memory is under
> threshold and inode is not expired. To be more specific:
> Assume dirty background threshold is > 1G and dirty_expire_centisecs is
> > 60s. When we running fio -size=1G -invalidate=0 -ioengine=libaio
> --time_based -runtime=60... (keep dirtying), the writeback will keep
> being triggered as following:
> wb_workfn
>   wb_do_writeback
>     wb_check_background_flush
>       /*
>        * Wb dirty background threshold starts at 0 if device was idle and
>        * grows up when bandwidth of wb is updated. So a background
>        * writeback is triggered.
>        */
>       wb_over_bg_thresh
>       /*
>        * Dirtied inode will be written back and added to b_more_io list
>        * after slice used up (because we keep dirtying the inode).
>        */
>       wb_writeback
> 
> Writeback is triggered per dirty_writeback_centisecs as following:
> wb_workfn
>   wb_do_writeback
>     wb_check_old_data_flush
>       /*
>        * Write back inode left in b_io and b_more_io from last wb_writeback
>        * even the inode is non-expired and it will be added to b_more_io
>        * again as slice will be used up (because we keep dirtying the
>        * inode)
>        */
>       wb_writeback
> 
> Fix this by moving non-expired inode in io list from last wb_writeback to
> dirty list in kudpate writeback.
> 
> Test as following:
> /* make it more easier to observe the issue */
> echo 300000 > /proc/sys/vm/dirty_expire_centisecs
> echo 100 > /proc/sys/vm/dirty_writeback_centisecs
> /* create a idle device */
> mkfs.ext4 -F /dev/vdb
> mount /dev/vdb /bdi1/
> /* run buffer write with fio */
> fio -name test -filename=/bdi1/file -size=800M -ioengine=libaio -bs=4K \
> -iodepth=1 -rw=write -direct=0 --time_based -runtime=60 -invalidate=0
> 
> Result before fix (run three tests):
> 1360MB/s
> 1329MB/s
> 1455MB/s
> 
> Result after fix (run three tests);
> 790MB/s
> 1820MB/s
> 1804MB/s
> 
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>

OK, I don't find this a particularly troubling problem but I agree it might
be nice to fix. But filtering the lists in wb_writeback() like this seems
kind of wrong - the queueing is managed in queue_io() and I'd prefer to
keep it that way. What if we just modified requeue_inode() to not
requeue_io() inodes in case we are doing kupdate style writeback and inode
isn't expired?

Sure we will still possibly writeback unexpired inodes once before calling
redirty_tail_locked() on them but that shouldn't really be noticeable?

								Honza
> ---
>  fs/fs-writeback.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 5ab1aaf805f7..a9a918972719 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -2046,6 +2046,23 @@ static long writeback_inodes_wb(struct bdi_writeback *wb, long nr_pages,
>  	return nr_pages - work.nr_pages;
>  }
>  
> +static void filter_expired_io(struct bdi_writeback *wb)
> +{
> +	struct inode *inode, *tmp;
> +	unsigned long expired_jiffies = jiffies -
> +		msecs_to_jiffies(dirty_expire_interval * 10);
> +
> +	spin_lock(&wb->list_lock);
> +	list_for_each_entry_safe(inode, tmp, &wb->b_io, i_io_list)
> +		if (inode_dirtied_after(inode, expired_jiffies))
> +			redirty_tail(inode, wb);
> +
> +	list_for_each_entry_safe(inode, tmp, &wb->b_more_io, i_io_list)
> +		if (inode_dirtied_after(inode, expired_jiffies))
> +			redirty_tail(inode, wb);
> +	spin_unlock(&wb->list_lock);
> +}
> +
>  /*
>   * Explicit flushing or periodic writeback of "old" data.
>   *
> @@ -2070,6 +2087,9 @@ static long wb_writeback(struct bdi_writeback *wb,
>  	long progress;
>  	struct blk_plug plug;
>  
> +	if (work->for_kupdate)
> +		filter_expired_io(wb);
> +
>  	blk_start_plug(&plug);
>  	for (;;) {
>  		/*
> -- 
> 2.30.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/7] fs/writeback: bail out if there is no more inodes for IO and queued once
  2024-02-08 17:20 ` [PATCH 2/7] fs/writeback: bail out if there is no more inodes for IO and queued once Kemeng Shi
  2024-02-08 19:21   ` Tim Chen
@ 2024-02-23 13:49   ` Jan Kara
  1 sibling, 0 replies; 25+ messages in thread
From: Jan Kara @ 2024-02-23 13:49 UTC (permalink / raw)
  To: Kemeng Shi; +Cc: viro, brauner, jack, linux-fsdevel, linux-kernel

On Fri 09-02-24 01:20:19, Kemeng Shi wrote:
> For case there is no more inodes for IO in io list from last wb_writeback,
> We may bail out early even there is inode in dirty list should be written
> back. Only bail out when we queued once to avoid missing dirtied inode.
> 
> This is from code reading...
> 
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>

Makes sense. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/fs-writeback.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index a9a918972719..edb0cff51673 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -2086,6 +2086,7 @@ static long wb_writeback(struct bdi_writeback *wb,
>  	struct inode *inode;
>  	long progress;
>  	struct blk_plug plug;
> +	bool queued = false;
>  
>  	if (work->for_kupdate)
>  		filter_expired_io(wb);
> @@ -2131,8 +2132,10 @@ static long wb_writeback(struct bdi_writeback *wb,
>  			dirtied_before = jiffies;
>  
>  		trace_writeback_start(wb, work);
> -		if (list_empty(&wb->b_io))
> +		if (list_empty(&wb->b_io)) {
>  			queue_io(wb, work, dirtied_before);
> +			queued = true;
> +		}
>  		if (work->sb)
>  			progress = writeback_sb_inodes(work->sb, wb, work);
>  		else
> @@ -2155,7 +2158,7 @@ static long wb_writeback(struct bdi_writeback *wb,
>  		/*
>  		 * No more inodes for IO, bail
>  		 */
> -		if (list_empty(&wb->b_more_io)) {
> +		if (list_empty(&wb->b_more_io) && queued) {
>  			spin_unlock(&wb->list_lock);
>  			break;
>  		}
> -- 
> 2.30.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/7] fs/writeback: remove unused parameter wb of finish_writeback_work
  2024-02-08 17:20 ` [PATCH 3/7] fs/writeback: remove unused parameter wb of finish_writeback_work Kemeng Shi
  2024-02-08 19:26   ` Tim Chen
@ 2024-02-23 13:49   ` Jan Kara
  1 sibling, 0 replies; 25+ messages in thread
From: Jan Kara @ 2024-02-23 13:49 UTC (permalink / raw)
  To: Kemeng Shi; +Cc: viro, brauner, jack, linux-fsdevel, linux-kernel

On Fri 09-02-24 01:20:20, Kemeng Shi wrote:
> Remove unused parameter wb of finish_writeback_work.
> 
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>

Sure. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/fs-writeback.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index edb0cff51673..2619f74ced70 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -166,8 +166,7 @@ static void wb_wakeup_delayed(struct bdi_writeback *wb)
>  	spin_unlock_irq(&wb->work_lock);
>  }
>  
> -static void finish_writeback_work(struct bdi_writeback *wb,
> -				  struct wb_writeback_work *work)
> +static void finish_writeback_work(struct wb_writeback_work *work)
>  {
>  	struct wb_completion *done = work->done;
>  
> @@ -196,7 +195,7 @@ static void wb_queue_work(struct bdi_writeback *wb,
>  		list_add_tail(&work->list, &wb->work_list);
>  		mod_delayed_work(bdi_wq, &wb->dwork, 0);
>  	} else
> -		finish_writeback_work(wb, work);
> +		finish_writeback_work(work);
>  
>  	spin_unlock_irq(&wb->work_lock);
>  }
> @@ -2285,7 +2284,7 @@ static long wb_do_writeback(struct bdi_writeback *wb)
>  	while ((work = get_next_work_item(wb)) != NULL) {
>  		trace_writeback_exec(wb, work);
>  		wrote += wb_writeback(wb, work);
> -		finish_writeback_work(wb, work);
> +		finish_writeback_work(work);
>  	}
>  
>  	/*
> -- 
> 2.30.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/7] fs/writeback: only calculate dirtied_before when b_io is empty
  2024-02-08 17:20 ` [PATCH 5/7] fs/writeback: only calculate dirtied_before when b_io is empty Kemeng Shi
@ 2024-02-23 13:58   ` Jan Kara
  2024-02-26 11:50     ` Kemeng Shi
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Kara @ 2024-02-23 13:58 UTC (permalink / raw)
  To: Kemeng Shi; +Cc: viro, brauner, jack, linux-fsdevel, linux-kernel

On Fri 09-02-24 01:20:22, Kemeng Shi wrote:
> The dirtied_before is only used when b_io is not empty, so only calculate
> when b_io is not empty.
> 
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
>  fs/fs-writeback.c | 25 +++++++++++++------------
>  1 file changed, 13 insertions(+), 12 deletions(-)

OK, but please wrap the comment at 80 columns as well.

								Honza

> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index b61bf2075931..e8868e814e0a 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -2118,20 +2118,21 @@ static long wb_writeback(struct bdi_writeback *wb,
>  
>  		spin_lock(&wb->list_lock);
>  
> -		/*
> -		 * Kupdate and background works are special and we want to
> -		 * include all inodes that need writing. Livelock avoidance is
> -		 * handled by these works yielding to any other work so we are
> -		 * safe.
> -		 */
> -		if (work->for_kupdate) {
> -			dirtied_before = jiffies -
> -				msecs_to_jiffies(dirty_expire_interval * 10);
> -		} else if (work->for_background)
> -			dirtied_before = jiffies;
> -
>  		trace_writeback_start(wb, work);
>  		if (list_empty(&wb->b_io)) {
> +			/*
> +			 * Kupdate and background works are special and we want to
> +			 * include all inodes that need writing. Livelock avoidance is
> +			 * handled by these works yielding to any other work so we are
> +			 * safe.
> +			 */
> +			if (work->for_kupdate) {
> +				dirtied_before = jiffies -
> +					msecs_to_jiffies(dirty_expire_interval *
> +							 10);
> +			} else if (work->for_background)
> +				dirtied_before = jiffies;
> +
>  			queue_io(wb, work, dirtied_before);
>  			queued = true;
>  		}
> -- 
> 2.30.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 6/7] fs/writeback: correct comment of __wakeup_flusher_threads_bdi
  2024-02-08 17:20 ` [PATCH 6/7] fs/writeback: correct comment of __wakeup_flusher_threads_bdi Kemeng Shi
@ 2024-02-23 13:59   ` Jan Kara
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Kara @ 2024-02-23 13:59 UTC (permalink / raw)
  To: Kemeng Shi; +Cc: viro, brauner, jack, linux-fsdevel, linux-kernel

On Fri 09-02-24 01:20:23, Kemeng Shi wrote:
> Commit e8e8a0c6c9bfc ("writeback: move nr_pages == 0 logic to one
> location") removed parameter nr_pages of __wakeup_flusher_threads_bdi
> and we try to writeback all dirty pages in __wakeup_flusher_threads_bdi
> now. Just correct stale comment.
> 
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>

Nice. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/fs-writeback.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index e8868e814e0a..816505d74b2f 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -2345,8 +2345,7 @@ void wb_workfn(struct work_struct *work)
>  }
>  
>  /*
> - * Start writeback of `nr_pages' pages on this bdi. If `nr_pages' is zero,
> - * write back the whole world.
> + * Start writeback of all dirty pages on this bdi.
>   */
>  static void __wakeup_flusher_threads_bdi(struct backing_dev_info *bdi,
>  					 enum wb_reason reason)
> -- 
> 2.30.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 7/7] fs/writeback: remove unnecessary return in writeback_inodes_sb
  2024-02-08 17:20 ` [PATCH 7/7] fs/writeback: remove unnecessary return in writeback_inodes_sb Kemeng Shi
@ 2024-02-23 13:59   ` Jan Kara
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Kara @ 2024-02-23 13:59 UTC (permalink / raw)
  To: Kemeng Shi; +Cc: viro, brauner, jack, linux-fsdevel, linux-kernel

On Fri 09-02-24 01:20:24, Kemeng Shi wrote:
> writeback_inodes_sb doesn't have return value, just remove unnecessary
> return in it.
> 
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/fs-writeback.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 816505d74b2f..eb62196777dd 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -2748,7 +2748,7 @@ EXPORT_SYMBOL(writeback_inodes_sb_nr);
>   */
>  void writeback_inodes_sb(struct super_block *sb, enum wb_reason reason)
>  {
> -	return writeback_inodes_sb_nr(sb, get_nr_dirty_pages(), reason);
> +	writeback_inodes_sb_nr(sb, get_nr_dirty_pages(), reason);
>  }
>  EXPORT_SYMBOL(writeback_inodes_sb);
>  
> -- 
> 2.30.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/7] fs/writeback: avoid to writeback non-expired inode in kupdate writeback
  2024-02-23 13:42   ` Jan Kara
@ 2024-02-26 11:47     ` Kemeng Shi
  0 siblings, 0 replies; 25+ messages in thread
From: Kemeng Shi @ 2024-02-26 11:47 UTC (permalink / raw)
  To: Jan Kara; +Cc: viro, brauner, linux-fsdevel, linux-kernel



on 2/23/2024 9:42 PM, Jan Kara wrote:
> On Fri 09-02-24 01:20:18, Kemeng Shi wrote:
>> In kupdate writeback, only expired inode (have been dirty for longer than
>> dirty_expire_interval) is supposed to be written back. However, kupdate
>> writeback will writeback non-expired inode left in b_io or b_more_io from
>> last wb_writeback. As a result, writeback will keep being triggered
>> unexpected when we keep dirtying pages even dirty memory is under
>> threshold and inode is not expired. To be more specific:
>> Assume dirty background threshold is > 1G and dirty_expire_centisecs is
>>> 60s. When we running fio -size=1G -invalidate=0 -ioengine=libaio
>> --time_based -runtime=60... (keep dirtying), the writeback will keep
>> being triggered as following:
>> wb_workfn
>>   wb_do_writeback
>>     wb_check_background_flush
>>       /*
>>        * Wb dirty background threshold starts at 0 if device was idle and
>>        * grows up when bandwidth of wb is updated. So a background
>>        * writeback is triggered.
>>        */
>>       wb_over_bg_thresh
>>       /*
>>        * Dirtied inode will be written back and added to b_more_io list
>>        * after slice used up (because we keep dirtying the inode).
>>        */
>>       wb_writeback
>>
>> Writeback is triggered per dirty_writeback_centisecs as following:
>> wb_workfn
>>   wb_do_writeback
>>     wb_check_old_data_flush
>>       /*
>>        * Write back inode left in b_io and b_more_io from last wb_writeback
>>        * even the inode is non-expired and it will be added to b_more_io
>>        * again as slice will be used up (because we keep dirtying the
>>        * inode)
>>        */
>>       wb_writeback
>>
>> Fix this by moving non-expired inode in io list from last wb_writeback to
>> dirty list in kudpate writeback.
>>
>> Test as following:
>> /* make it more easier to observe the issue */
>> echo 300000 > /proc/sys/vm/dirty_expire_centisecs
>> echo 100 > /proc/sys/vm/dirty_writeback_centisecs
>> /* create a idle device */
>> mkfs.ext4 -F /dev/vdb
>> mount /dev/vdb /bdi1/
>> /* run buffer write with fio */
>> fio -name test -filename=/bdi1/file -size=800M -ioengine=libaio -bs=4K \
>> -iodepth=1 -rw=write -direct=0 --time_based -runtime=60 -invalidate=0
>>
>> Result before fix (run three tests):
>> 1360MB/s
>> 1329MB/s
>> 1455MB/s
>>
>> Result after fix (run three tests);
>> 790MB/s
>> 1820MB/s
>> 1804MB/s
>>
>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> 
> OK, I don't find this a particularly troubling problem but I agree it might
> be nice to fix. But filtering the lists in wb_writeback() like this seems
> kind of wrong - the queueing is managed in queue_io() and I'd prefer to
> keep it that way. What if we just modified requeue_inode() to not
> requeue_io() inodes in case we are doing kupdate style writeback and inode
> isn't expired?
Sure, this could solve the reported problem and is acceptable to me. Thanks
for the advise. I will try it in next version.
> 
> Sure we will still possibly writeback unexpired inodes once before calling
> redirty_tail_locked() on them but that shouldn't really be noticeable?
> 
> 								Honza
>> ---
>>  fs/fs-writeback.c | 20 ++++++++++++++++++++
>>  1 file changed, 20 insertions(+)
>>
>> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
>> index 5ab1aaf805f7..a9a918972719 100644
>> --- a/fs/fs-writeback.c
>> +++ b/fs/fs-writeback.c
>> @@ -2046,6 +2046,23 @@ static long writeback_inodes_wb(struct bdi_writeback *wb, long nr_pages,
>>  	return nr_pages - work.nr_pages;
>>  }
>>  
>> +static void filter_expired_io(struct bdi_writeback *wb)
>> +{
>> +	struct inode *inode, *tmp;
>> +	unsigned long expired_jiffies = jiffies -
>> +		msecs_to_jiffies(dirty_expire_interval * 10);
>> +
>> +	spin_lock(&wb->list_lock);
>> +	list_for_each_entry_safe(inode, tmp, &wb->b_io, i_io_list)
>> +		if (inode_dirtied_after(inode, expired_jiffies))
>> +			redirty_tail(inode, wb);
>> +
>> +	list_for_each_entry_safe(inode, tmp, &wb->b_more_io, i_io_list)
>> +		if (inode_dirtied_after(inode, expired_jiffies))
>> +			redirty_tail(inode, wb);
>> +	spin_unlock(&wb->list_lock);
>> +}
>> +
>>  /*
>>   * Explicit flushing or periodic writeback of "old" data.
>>   *
>> @@ -2070,6 +2087,9 @@ static long wb_writeback(struct bdi_writeback *wb,
>>  	long progress;
>>  	struct blk_plug plug;
>>  
>> +	if (work->for_kupdate)
>> +		filter_expired_io(wb);
>> +
>>  	blk_start_plug(&plug);
>>  	for (;;) {
>>  		/*
>> -- 
>> 2.30.0
>>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/7] fs/writeback: only calculate dirtied_before when b_io is empty
  2024-02-23 13:58   ` Jan Kara
@ 2024-02-26 11:50     ` Kemeng Shi
  0 siblings, 0 replies; 25+ messages in thread
From: Kemeng Shi @ 2024-02-26 11:50 UTC (permalink / raw)
  To: Jan Kara; +Cc: viro, brauner, linux-fsdevel, linux-kernel



on 2/23/2024 9:58 PM, Jan Kara wrote:
> On Fri 09-02-24 01:20:22, Kemeng Shi wrote:
>> The dirtied_before is only used when b_io is not empty, so only calculate
>> when b_io is not empty.
>>
>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>> ---
>>  fs/fs-writeback.c | 25 +++++++++++++------------
>>  1 file changed, 13 insertions(+), 12 deletions(-)
> 
> OK, but please wrap the comment at 80 columns as well.
Sorry for missing this as I rely too much on checkpatch.pl to report this
while the script didn't catch it. Will fix it in next version. Thanks for
review.
> 
> 								Honza
> 
>>
>> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
>> index b61bf2075931..e8868e814e0a 100644
>> --- a/fs/fs-writeback.c
>> +++ b/fs/fs-writeback.c
>> @@ -2118,20 +2118,21 @@ static long wb_writeback(struct bdi_writeback *wb,
>>  
>>  		spin_lock(&wb->list_lock);
>>  
>> -		/*
>> -		 * Kupdate and background works are special and we want to
>> -		 * include all inodes that need writing. Livelock avoidance is
>> -		 * handled by these works yielding to any other work so we are
>> -		 * safe.
>> -		 */
>> -		if (work->for_kupdate) {
>> -			dirtied_before = jiffies -
>> -				msecs_to_jiffies(dirty_expire_interval * 10);
>> -		} else if (work->for_background)
>> -			dirtied_before = jiffies;
>> -
>>  		trace_writeback_start(wb, work);
>>  		if (list_empty(&wb->b_io)) {
>> +			/*
>> +			 * Kupdate and background works are special and we want to
>> +			 * include all inodes that need writing. Livelock avoidance is
>> +			 * handled by these works yielding to any other work so we are
>> +			 * safe.
>> +			 */
>> +			if (work->for_kupdate) {
>> +				dirtied_before = jiffies -
>> +					msecs_to_jiffies(dirty_expire_interval *
>> +							 10);
>> +			} else if (work->for_background)
>> +				dirtied_before = jiffies;
>> +
>>  			queue_io(wb, work, dirtied_before);
>>  			queued = true;
>>  		}
>> -- 
>> 2.30.0
>>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/7] fs/writeback: avoid to writeback non-expired inode in kupdate writeback
  2024-02-18  2:01     ` Kemeng Shi
@ 2024-02-28  1:46       ` Kemeng Shi
  0 siblings, 0 replies; 25+ messages in thread
From: Kemeng Shi @ 2024-02-28  1:46 UTC (permalink / raw)
  To: Tim Chen, viro, brauner, jack; +Cc: linux-fsdevel, linux-kernel



on 2/18/2024 10:01 AM, Kemeng Shi wrote:
> 
> 
> on 2/9/2024 2:29 AM, Tim Chen wrote:
>> On Fri, 2024-02-09 at 01:20 +0800, Kemeng Shi wrote:
>>>
>>>  
>>> +static void filter_expired_io(struct bdi_writeback *wb)
>>> +{
>>> +	struct inode *inode, *tmp;
>>> +	unsigned long expired_jiffies = jiffies -
>>> +		msecs_to_jiffies(dirty_expire_interval * 10);
>>
>> We have kupdate trigger time hard coded with a factor of 10 to expire interval here.
>> The kupdate trigger time "mssecs_to_jiffies(dirty_expire_interval * 10)" is
>> also used in wb_writeback().  It will be better to have a macro or #define
>> to encapsulate the trigger time so if for any reason we need
>> to tune the trigger time, we just need to change it at one place.
> Hi Tim. Sorry for the late reply, I was on vacation these days.
> I agree it's better to have a macro and I will add it in next version.
> Thanks!
Hi Tim,
After a deep look, I plan to set dirty_expire_interval in jiffies within sysctl
handler. Then we could use dirty_expire_interval directly instead of
"mssecs_to_jiffies(dirty_expire_interval * 10)" and macro is not needed.
Similar, dirty_writeback_interval and dirtytime_expire_interval could be set in
jiffies to remove repeat convertion from centisecs to jiffies. I will submit a
new series to do this if no one is against this.
Thanks!
>>
>> Tim
>>
>>> +
>>> +	spin_lock(&wb->list_lock);
>>> +	list_for_each_entry_safe(inode, tmp, &wb->b_io, i_io_list)
>>> +		if (inode_dirtied_after(inode, expired_jiffies))
>>> +			redirty_tail(inode, wb);
>>> +
>>> +	list_for_each_entry_safe(inode, tmp, &wb->b_more_io, i_io_list)
>>> +		if (inode_dirtied_after(inode, expired_jiffies))
>>> +			redirty_tail(inode, wb);
>>> +	spin_unlock(&wb->list_lock);
>>> +}
>>> +
>>>  /*
>>>   * Explicit flushing or periodic writeback of "old" data.
>>>   *
>>> @@ -2070,6 +2087,9 @@ static long wb_writeback(struct bdi_writeback *wb,
>>>  	long progress;
>>>  	struct blk_plug plug;
>>>  
>>> +	if (work->for_kupdate)
>>> +		filter_expired_io(wb);
>>> +
>>>  	blk_start_plug(&plug);
>>>  	for (;;) {
>>>  		/*
>>


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2024-02-28  1:46 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-08 17:20 [PATCH 0/7] Fixes and cleanups to fs-writeback Kemeng Shi
2024-02-08 17:20 ` [PATCH 1/7] fs/writeback: avoid to writeback non-expired inode in kupdate writeback Kemeng Shi
2024-02-08 18:29   ` Tim Chen
2024-02-18  2:01     ` Kemeng Shi
2024-02-28  1:46       ` Kemeng Shi
2024-02-23 13:42   ` Jan Kara
2024-02-26 11:47     ` Kemeng Shi
2024-02-08 17:20 ` [PATCH 2/7] fs/writeback: bail out if there is no more inodes for IO and queued once Kemeng Shi
2024-02-08 19:21   ` Tim Chen
2024-02-18  2:11     ` Kemeng Shi
2024-02-23 13:49   ` Jan Kara
2024-02-08 17:20 ` [PATCH 3/7] fs/writeback: remove unused parameter wb of finish_writeback_work Kemeng Shi
2024-02-08 19:26   ` Tim Chen
2024-02-23 13:49   ` Jan Kara
2024-02-08 17:20 ` [PATCH 4/7] fs/writeback: remove unneeded check in writeback_single_inode Kemeng Shi
2024-02-08 19:34   ` Tim Chen
2024-02-10  0:46   ` Dave Chinner
2024-02-18  2:37     ` Kemeng Shi
2024-02-08 17:20 ` [PATCH 5/7] fs/writeback: only calculate dirtied_before when b_io is empty Kemeng Shi
2024-02-23 13:58   ` Jan Kara
2024-02-26 11:50     ` Kemeng Shi
2024-02-08 17:20 ` [PATCH 6/7] fs/writeback: correct comment of __wakeup_flusher_threads_bdi Kemeng Shi
2024-02-23 13:59   ` Jan Kara
2024-02-08 17:20 ` [PATCH 7/7] fs/writeback: remove unnecessary return in writeback_inodes_sb Kemeng Shi
2024-02-23 13:59   ` Jan Kara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).