* [PATCH 0/7] writeback: avoid touching dirtied_when on blocked inodes
@ 2011-10-20 15:22 Wu Fengguang
  2011-10-20 15:22 ` [PATCH 1/7] writeback: introduce queue b_more_io_wait Wu Fengguang
                   ` (8 more replies)
  0 siblings, 9 replies; 22+ messages in thread
From: Wu Fengguang @ 2011-10-20 15:22 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Jan Kara, Dave Chinner, Christoph Hellwig, Andrew Morton,
	Wu Fengguang, LKML

Hi,

This series tries to preserve dirtied_when for blocked inodes by converting
some redirty_tail() calls to requeue_io_wait(). Possibly blocked inodes
will be moved to b_more_io_wait. The b_more_io_wait inodes will be retried
as diligently as b_more_io inodes, except that when the latter goes empty,
b_more_io_wait will be retried by the kupdate work at increasing intervals,
up to dirty_writeback_interval.
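The retry cadence sketched above can be modeled in plain C. This is a hypothetical toy model, not kernel code: it only computes the doubling sleep schedule a b_more_io_wait inode would see once b_more_io goes empty, with MAX_PAUSE standing in for dirty_writeback_interval (500 jiffies, i.e. 5s at HZ=100).

```c
#include <assert.h>

/* Toy model: the kupdate retry sleep starts at 1 jiffy and doubles after
 * each pass that makes no progress, capped at MAX_PAUSE (standing in for
 * dirty_writeback_interval expressed in jiffies). */
enum { MAX_PAUSE = 500 };

static unsigned int next_pause(unsigned int pause)
{
	pause <<= 1;
	return pause < MAX_PAUSE ? pause : MAX_PAUSE;
}

/* How many idle retries happen before the interval is fully backed off,
 * starting from a 1-jiffy pause. */
static int retries_until_capped(void)
{
	unsigned int pause = 1;
	int n = 0;

	while (pause < MAX_PAUSE) {
		pause = next_pause(pause);
		n++;
	}
	return n;
}
```

With these numbers the schedule is 1, 2, 4, ..., 256, 500 jiffies, so a blocked inode is retried promptly at first and ever more lazily afterwards.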

Christoph Hellwig (1):
      writeback: avoid redirtying when ->write_inode failed to clear I_DIRTY

Jan Kara (2):
      writeback: update wb->last_active on written pages/inodes
      writeback: Retry kupdate work early if we need to retry some inode writeback

Wu Fengguang (4):
      writeback: introduce queue b_more_io_wait
      writeback: requeue_io_wait() on pages_skipped inode
      writeback: requeue_io_wait() on blocked inode
      writeback: requeue_io_wait() when failed to grab superblock

 fs/fs-writeback.c           |   62 ++++++++++++++++++++++++++++++++++---------
 include/linux/backing-dev.h |    8 +++--
 mm/backing-dev.c            |   10 +++++-
 3 files changed, 62 insertions(+), 18 deletions(-)

Thanks,
Fengguang


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/7] writeback: introduce queue b_more_io_wait
  2011-10-20 15:22 [PATCH 0/7] writeback: avoid touching dirtied_when on blocked inodes Wu Fengguang
@ 2011-10-20 15:22 ` Wu Fengguang
  2011-10-20 23:23   ` Jan Kara
  2011-10-20 15:22 ` [PATCH 2/7] writeback: avoid redirtying when ->write_inode failed to clear I_DIRTY Wu Fengguang
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Wu Fengguang @ 2011-10-20 15:22 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Jan Kara, Dave Chinner, Christoph Hellwig, Wu Fengguang,
	Andrew Morton, LKML

[-- Attachment #1: writeback-more_io_wait.patch --]
[-- Type: text/plain, Size: 5373 bytes --]

The problem is that redirty_tail() may update dirtied_when, resulting in
up to a 30s delay. If redirty_tail() is called often enough, some inodes
may even be delayed forever.

So introduce the b_more_io_wait queue to park inodes that for some
reason cannot be synced immediately. These inodes will be sent back to
b_io at the next b_io refill time, but will not be busy-retried the way
b_more_io inodes are: when the redirtied inodes are all in
b_more_io_wait, wb_writeback() will see an empty b_more_io and hence
break out of the loop.

This would be the new data flow after converting all redirty_tail()
calls to requeue_io_wait():

b_dirty --> b_io --> b_more_io/b_more_io_wait --+
             ^                                  |
             |                                  |
             +----------------------------------+
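As a rough illustration of the flow above, here is a toy C model that tracks only how many inodes sit on each queue (hypothetical code for illustration; the names mirror the kernel's lists, but none of this is kernel code):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a bdi_writeback's queues, reduced to per-queue counts. */
struct toy_wb {
	int b_dirty, b_io, b_more_io, b_more_io_wait;
};

/* Mirrors queue_io(): at refill time both retry queues are spliced back
 * into b_io, along with expired dirty inodes (all treated as expired
 * here for simplicity). */
static void toy_queue_io(struct toy_wb *wb)
{
	wb->b_io += wb->b_more_io;
	wb->b_more_io = 0;
	wb->b_io += wb->b_more_io_wait;	/* the new splice added by this patch */
	wb->b_more_io_wait = 0;
	wb->b_io += wb->b_dirty;
	wb->b_dirty = 0;
}

/* Mirrors the loop-exit rule in wb_writeback(): only a non-empty
 * b_more_io keeps the loop spinning; inodes parked on b_more_io_wait
 * wait quietly for the next refill. */
static bool toy_keep_looping(const struct toy_wb *wb)
{
	return wb->b_more_io != 0;
}
```

The key property is visible in the second helper: moving an inode to b_more_io_wait instead of b_more_io lets the current writeback pass terminate while still guaranteeing the inode is revisited at the next refill.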

Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/fs-writeback.c           |   16 ++++++++++++++++
 include/linux/backing-dev.h |    8 +++++---
 mm/backing-dev.c            |   10 ++++++++--
 3 files changed, 29 insertions(+), 5 deletions(-)

--- linux-next.orig/fs/fs-writeback.c	2011-10-20 23:13:48.000000000 +0800
+++ linux-next/fs/fs-writeback.c	2011-10-20 23:19:12.000000000 +0800
@@ -234,6 +234,21 @@ static void requeue_io(struct inode *ino
 	list_move(&inode->i_wb_list, &wb->b_more_io);
 }
 
+/*
+ * The inode should be retried in an opportunistic way.
+ *
+ * The only difference between b_more_io and b_more_io_wait is:
+ * wb_writeback() won't quit as long as b_more_io is not empty.  When
 + * wb_writeback() quits on an empty b_more_io but a non-empty b_more_io_wait,
 + * the kupdate work will wake up more frequently to retry the inodes in
 + * b_more_io_wait.
+ */
+static void requeue_io_wait(struct inode *inode, struct bdi_writeback *wb)
+{
+	assert_spin_locked(&wb->list_lock);
+	list_move(&inode->i_wb_list, &wb->b_more_io_wait);
+}
+
 static void inode_sync_complete(struct inode *inode)
 {
 	/*
@@ -321,6 +336,7 @@ static void queue_io(struct bdi_writebac
 	int moved;
 	assert_spin_locked(&wb->list_lock);
 	list_splice_init(&wb->b_more_io, &wb->b_io);
+	list_splice_init(&wb->b_more_io_wait, &wb->b_io);
 	moved = move_expired_inodes(&wb->b_dirty, &wb->b_io, work);
 	trace_writeback_queue_io(wb, work, moved);
 }
--- linux-next.orig/include/linux/backing-dev.h	2011-10-20 23:13:48.000000000 +0800
+++ linux-next/include/linux/backing-dev.h	2011-10-20 23:13:50.000000000 +0800
@@ -59,6 +59,7 @@ struct bdi_writeback {
 	struct list_head b_dirty;	/* dirty inodes */
 	struct list_head b_io;		/* parked for writeback */
 	struct list_head b_more_io;	/* parked for more writeback */
+	struct list_head b_more_io_wait;/* opportunistic retry io */
 	spinlock_t list_lock;		/* protects the b_* lists */
 };
 
@@ -133,9 +134,10 @@ extern struct list_head bdi_pending_list
 
 static inline int wb_has_dirty_io(struct bdi_writeback *wb)
 {
-	return !list_empty(&wb->b_dirty) ||
-	       !list_empty(&wb->b_io) ||
-	       !list_empty(&wb->b_more_io);
+	return !list_empty(&wb->b_dirty)	||
+	       !list_empty(&wb->b_io)		||
+	       !list_empty(&wb->b_more_io)	||
+	       !list_empty(&wb->b_more_io_wait);
 }
 
 static inline void __add_bdi_stat(struct backing_dev_info *bdi,
--- linux-next.orig/mm/backing-dev.c	2011-10-20 23:13:48.000000000 +0800
+++ linux-next/mm/backing-dev.c	2011-10-20 23:13:50.000000000 +0800
@@ -74,10 +74,10 @@ static int bdi_debug_stats_show(struct s
 	unsigned long background_thresh;
 	unsigned long dirty_thresh;
 	unsigned long bdi_thresh;
-	unsigned long nr_dirty, nr_io, nr_more_io;
+	unsigned long nr_dirty, nr_io, nr_more_io, nr_more_io_wait;
 	struct inode *inode;
 
-	nr_dirty = nr_io = nr_more_io = 0;
+	nr_dirty = nr_io = nr_more_io = nr_more_io_wait = 0;
 	spin_lock(&wb->list_lock);
 	list_for_each_entry(inode, &wb->b_dirty, i_wb_list)
 		nr_dirty++;
@@ -85,6 +85,8 @@ static int bdi_debug_stats_show(struct s
 		nr_io++;
 	list_for_each_entry(inode, &wb->b_more_io, i_wb_list)
 		nr_more_io++;
+	list_for_each_entry(inode, &wb->b_more_io_wait, i_wb_list)
+		nr_more_io_wait++;
 	spin_unlock(&wb->list_lock);
 
 	global_dirty_limits(&background_thresh, &dirty_thresh);
@@ -103,6 +105,7 @@ static int bdi_debug_stats_show(struct s
 		   "b_dirty:            %10lu\n"
 		   "b_io:               %10lu\n"
 		   "b_more_io:          %10lu\n"
+		   "b_more_io_wait:     %10lu\n"
 		   "bdi_list:           %10u\n"
 		   "state:              %10lx\n",
 		   (unsigned long) K(bdi_stat(bdi, BDI_WRITEBACK)),
@@ -116,6 +119,7 @@ static int bdi_debug_stats_show(struct s
 		   nr_dirty,
 		   nr_io,
 		   nr_more_io,
+		   nr_more_io_wait,
 		   !list_empty(&bdi->bdi_list), bdi->state);
 #undef K
 
@@ -651,6 +655,7 @@ static void bdi_wb_init(struct bdi_write
 	INIT_LIST_HEAD(&wb->b_dirty);
 	INIT_LIST_HEAD(&wb->b_io);
 	INIT_LIST_HEAD(&wb->b_more_io);
+	INIT_LIST_HEAD(&wb->b_more_io_wait);
 	spin_lock_init(&wb->list_lock);
 	setup_timer(&wb->wakeup_timer, wakeup_timer_fn, (unsigned long)bdi);
 }
@@ -718,6 +723,7 @@ void bdi_destroy(struct backing_dev_info
 		list_splice(&bdi->wb.b_dirty, &dst->b_dirty);
 		list_splice(&bdi->wb.b_io, &dst->b_io);
 		list_splice(&bdi->wb.b_more_io, &dst->b_more_io);
+		list_splice(&bdi->wb.b_more_io_wait, &dst->b_more_io_wait);
 		spin_unlock(&bdi->wb.list_lock);
 		spin_unlock(&dst->list_lock);
 	}




* [PATCH 2/7] writeback: avoid redirtying when ->write_inode failed to clear I_DIRTY
  2011-10-20 15:22 [PATCH 0/7] writeback: avoid touching dirtied_when on blocked inodes Wu Fengguang
  2011-10-20 15:22 ` [PATCH 1/7] writeback: introduce queue b_more_io_wait Wu Fengguang
@ 2011-10-20 15:22 ` Wu Fengguang
  2011-10-20 23:24   ` Jan Kara
  2011-10-20 15:22 ` [PATCH 3/7] writeback: update wb->last_active on written pages/inodes Wu Fengguang
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Wu Fengguang @ 2011-10-20 15:22 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Jan Kara, Dave Chinner, Christoph Hellwig, Christoph Hellwig,
	Wu Fengguang, Andrew Morton, LKML

[-- Attachment #1: mutt-wfg-hp-1000-22575-612225d7a6295ddc6 --]
[-- Type: text/plain, Size: 2017 bytes --]

From: Christoph Hellwig <hch@infradead.org>

Right now ->write_inode has no way to safely return an EAGAIN without explicitly
redirtying the inode, as we would lose the dirty state otherwise.  Most
filesystems get this wrong, but XFS makes heavy use of it to avoid blocking
the flusher thread when ->write_inode hits contended inode locks.  A
contended ilock is something XFS can hit very easily when extending files, as
the data I/O completion handler takes the lock to update the size, and the
->write_inode call can race with it fairly easily if writing enough data
in one go so that the completion for the first write comes in just before
we call ->write_inode.

Change the handling of this case to use requeue_io_wait() for a quick retry
instead of redirty_tail(), which keeps pushing dirtied_when forward and thus
keeps delaying the writeout more and more with every failed attempt to get
the lock.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/fs-writeback.c |   12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

--- linux-next.orig/fs/fs-writeback.c	2011-10-08 13:30:25.000000000 +0800
+++ linux-next/fs/fs-writeback.c	2011-10-08 13:30:41.000000000 +0800
@@ -488,8 +488,18 @@ writeback_single_inode(struct inode *ino
 			 * operations, such as delayed allocation during
 			 * submission or metadata updates after data IO
 			 * completion.
+			 *
+			 * For the latter case it is very important to give
+			 * the inode another turn on b_more_io instead of
+			 * redirtying it.  Constantly moving dirtied_when
+			 * forward will prevent us from ever writing out
+			 * the metadata dirtied in the I/O completion handler.
+			 *
 +			 * For files on XFS that are constantly appended to,
 +			 * calling redirty_tail() means they will never get
 +			 * their updated i_size written out.
 			 */
-			redirty_tail(inode, wb);
+			requeue_io_wait(inode, wb);
 		} else {
 			/*
 			 * The inode is clean.  At this point we either have




* [PATCH 3/7] writeback: update wb->last_active on written pages/inodes
  2011-10-20 15:22 [PATCH 0/7] writeback: avoid touching dirtied_when on blocked inodes Wu Fengguang
  2011-10-20 15:22 ` [PATCH 1/7] writeback: introduce queue b_more_io_wait Wu Fengguang
  2011-10-20 15:22 ` [PATCH 2/7] writeback: avoid redirtying when ->write_inode failed to clear I_DIRTY Wu Fengguang
@ 2011-10-20 15:22 ` Wu Fengguang
  2011-10-20 15:22 ` [PATCH 4/7] writeback: Retry kupdate work early if we need to retry some inode writeback Wu Fengguang
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Wu Fengguang @ 2011-10-20 15:22 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Jan Kara, Dave Chinner, Christoph Hellwig, Wu Fengguang,
	Andrew Morton, LKML

[-- Attachment #1: writeback-progress --]
[-- Type: text/plain, Size: 1726 bytes --]

From: Jan Kara <jack@suse.cz>

The writeback last_active value should be updated not only on written
pages, but also on cleaned inodes.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/fs-writeback.c |   13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

--- linux-next.orig/fs/fs-writeback.c	2011-10-20 21:39:48.000000000 +0800
+++ linux-next/fs/fs-writeback.c	2011-10-20 22:36:59.000000000 +0800
@@ -744,6 +744,7 @@ static long wb_writeback(struct bdi_writ
 	unsigned long oldest_jif;
 	struct inode *inode;
 	long progress;
+	long total_progress = 0;
 
 	oldest_jif = jiffies;
 	work->older_than_this = &oldest_jif;
@@ -787,6 +788,7 @@ static long wb_writeback(struct bdi_writ
 		else
 			progress = __writeback_inodes_wb(wb, work);
 		trace_writeback_written(wb->bdi, work);
+		total_progress += progress;
 
 		wb_update_bandwidth(wb, wb_start);
 
@@ -820,7 +822,8 @@ static long wb_writeback(struct bdi_writ
 	}
 	spin_unlock(&wb->list_lock);
 
-	return nr_pages - work->nr_pages;
+	trace_writeback_pages_written(nr_pages - work->nr_pages);
+	return total_progress;
 }
 
 /*
@@ -954,7 +957,7 @@ int bdi_writeback_thread(void *data)
 {
 	struct bdi_writeback *wb = data;
 	struct backing_dev_info *bdi = wb->bdi;
-	long pages_written;
+	long progress;
 
 	current->flags |= PF_SWAPWRITE;
 	set_freezable();
@@ -974,11 +977,9 @@ int bdi_writeback_thread(void *data)
 		 */
 		del_timer(&wb->wakeup_timer);
 
-		pages_written = wb_do_writeback(wb, 0);
-
-		trace_writeback_pages_written(pages_written);
+		progress = wb_do_writeback(wb, 0);
 
-		if (pages_written)
+		if (progress)
 			wb->last_active = jiffies;
 
 		set_current_state(TASK_INTERRUPTIBLE);




* [PATCH 4/7] writeback: Retry kupdate work early if we need to retry some inode writeback
  2011-10-20 15:22 [PATCH 0/7] writeback: avoid touching dirtied_when on blocked inodes Wu Fengguang
                   ` (2 preceding siblings ...)
  2011-10-20 15:22 ` [PATCH 3/7] writeback: update wb->last_active on written pages/inodes Wu Fengguang
@ 2011-10-20 15:22 ` Wu Fengguang
  2011-10-20 15:22 ` [PATCH 5/7] writeback: requeue_io_wait() on pages_skipped inode Wu Fengguang
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Wu Fengguang @ 2011-10-20 15:22 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Jan Kara, Dave Chinner, Christoph Hellwig, Wu Fengguang,
	Andrew Morton, LKML

[-- Attachment #1: 0001-writeback-Retry-kupdate-work-early-if-we-need-to-ret.patch --]
[-- Type: text/plain, Size: 1980 bytes --]

From: Jan Kara <jack@suse.cz>

In case we could not do any writeback for some inodes, trigger the next
kupdate work early so that writeback of these inodes is not delayed for
the whole dirty_writeback_interval.
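The sleep selection this ends up implementing in bdi_writeback_thread() can be sketched as a standalone helper (a hypothetical simplification, not the kernel code; MAX_PAUSE stands in for msecs_to_jiffies(dirty_writeback_interval * 10)):

```c
#include <assert.h>
#include <stdbool.h>

enum { MAX_PAUSE = 500 };	/* stand-in for dirty_writeback_interval in jiffies */

/* Toy model of the flusher thread's sleep choice after one pass:
 *  - inodes parked on b_more_io_wait: short, exponentially growing nap;
 *  - other dirty IO pending:          sleep the full interval;
 *  - nothing to do:                   return 0, meaning wait for a wakeup.
 * *pause carries the current backoff and doubles on each short nap;
 * the caller resets it to 1 whenever a pass made progress. */
static unsigned int pick_timeout(bool more_io_wait_pending,
				 bool has_dirty_io,
				 unsigned int *pause)
{
	if (more_io_wait_pending && *pause < MAX_PAUSE) {
		unsigned int timeout = *pause;

		*pause <<= 1;
		return timeout;
	}
	if (has_dirty_io)
		return MAX_PAUSE;
	return 0;
}
```

This keeps retries for b_more_io_wait inodes cheap and quick at first, while degenerating to the normal kupdate interval if the blocking condition persists.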

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/fs-writeback.c |   17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

--- linux-next.orig/fs/fs-writeback.c	2011-10-20 22:36:59.000000000 +0800
+++ linux-next/fs/fs-writeback.c	2011-10-20 22:42:24.000000000 +0800
@@ -886,7 +886,7 @@ static long wb_check_old_data_flush(stru
 
 	expired = wb->last_old_flush +
 			msecs_to_jiffies(dirty_writeback_interval * 10);
-	if (time_before(jiffies, expired))
+	if (time_before(jiffies, expired) && list_empty(&wb->b_more_io_wait))
 		return 0;
 
 	wb->last_old_flush = jiffies;
@@ -958,6 +958,10 @@ int bdi_writeback_thread(void *data)
 	struct bdi_writeback *wb = data;
 	struct backing_dev_info *bdi = wb->bdi;
 	long progress;
+	unsigned int pause = 1;
+	unsigned int max_pause = dirty_writeback_interval ?
+			msecs_to_jiffies(dirty_writeback_interval * 10) :
+			HZ;
 
 	current->flags |= PF_SWAPWRITE;
 	set_freezable();
@@ -979,8 +983,10 @@ int bdi_writeback_thread(void *data)
 
 		progress = wb_do_writeback(wb, 0);
 
-		if (progress)
+		if (progress) {
 			wb->last_active = jiffies;
+			pause = 1;
+		}
 
 		set_current_state(TASK_INTERRUPTIBLE);
 		if (!list_empty(&bdi->work_list) || kthread_should_stop()) {
@@ -988,8 +994,11 @@ int bdi_writeback_thread(void *data)
 			continue;
 		}
 
-		if (wb_has_dirty_io(wb) && dirty_writeback_interval)
-			schedule_timeout(msecs_to_jiffies(dirty_writeback_interval * 10));
+		if (!list_empty(&wb->b_more_io_wait) && pause < max_pause) {
+			schedule_timeout(pause);
+			pause <<= 1;
+		} else if (wb_has_dirty_io(wb) && dirty_writeback_interval)
+			schedule_timeout(max_pause);
 		else {
 			/*
 			 * We have nothing to do, so can go sleep without any




* [PATCH 5/7] writeback: requeue_io_wait() on pages_skipped inode
  2011-10-20 15:22 [PATCH 0/7] writeback: avoid touching dirtied_when on blocked inodes Wu Fengguang
                   ` (3 preceding siblings ...)
  2011-10-20 15:22 ` [PATCH 4/7] writeback: Retry kupdate work early if we need to retry some inode writeback Wu Fengguang
@ 2011-10-20 15:22 ` Wu Fengguang
  2011-10-20 23:25   ` Jan Kara
  2011-10-20 15:22 ` [PATCH 6/7] writeback: requeue_io_wait() on blocked inode Wu Fengguang
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Wu Fengguang @ 2011-10-20 15:22 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Jan Kara, Dave Chinner, Christoph Hellwig, Michael Rubin,
	Peter Zijlstra, Wu Fengguang, Andrew Morton, LKML

[-- Attachment #1: writeback-more_io_wait-d.patch --]
[-- Type: text/plain, Size: 889 bytes --]

Use requeue_io_wait() if some pages were skipped due to locked buffers.

Cc: Dave Chinner <david@fromorbit.com>
Cc: Michael Rubin <mrubin@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/fs-writeback.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-next.orig/fs/fs-writeback.c	2011-10-20 22:42:24.000000000 +0800
+++ linux-next/fs/fs-writeback.c	2011-10-20 22:42:25.000000000 +0800
@@ -618,11 +618,11 @@ static long writeback_sb_inodes(struct s
 		if (wbc.pages_skipped) {
 			/*
 			 * writeback is not making progress due to locked
 			 * buffers.  Skip this inode for now.
 			 */
-			redirty_tail(inode, wb);
+			requeue_io_wait(inode, wb);
 		}
 		spin_unlock(&inode->i_lock);
 		spin_unlock(&wb->list_lock);
 		iput(inode);
 		cond_resched();




* [PATCH 6/7] writeback: requeue_io_wait() on blocked inode
  2011-10-20 15:22 [PATCH 0/7] writeback: avoid touching dirtied_when on blocked inodes Wu Fengguang
                   ` (4 preceding siblings ...)
  2011-10-20 15:22 ` [PATCH 5/7] writeback: requeue_io_wait() on pages_skipped inode Wu Fengguang
@ 2011-10-20 15:22 ` Wu Fengguang
  2011-10-20 23:31   ` Jan Kara
  2011-10-20 15:22 ` [PATCH 7/7] writeback: requeue_io_wait() when failed to grab superblock Wu Fengguang
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Wu Fengguang @ 2011-10-20 15:22 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Jan Kara, Dave Chinner, Christoph Hellwig, Wu Fengguang,
	Andrew Morton, LKML

[-- Attachment #1: writeback-remove-redirty-blocked.patch --]
[-- Type: text/plain, Size: 1349 bytes --]

Use requeue_io_wait() if inode is somehow blocked.

CC: Jan Kara <jack@suse.cz>
CC: Dave Chinner <david@fromorbit.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/fs-writeback.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-next.orig/fs/fs-writeback.c	2011-10-20 22:42:25.000000000 +0800
+++ linux-next/fs/fs-writeback.c	2011-10-20 22:43:42.000000000 +0800
@@ -471,25 +471,25 @@ writeback_single_inode(struct inode *ino
 				/*
 				 * slice used up: queue for next turn
 				 */
 				requeue_io(inode, wb);
 			} else {
 				/*
 				 * Writeback blocked by something other than
 				 * congestion. Delay the inode for some time to
 				 * avoid spinning on the CPU (100% iowait)
 				 * retrying writeback of the dirty page/inode
 				 * that cannot be performed immediately.
 				 */
-				redirty_tail(inode, wb);
+				requeue_io_wait(inode, wb);
 			}
 		} else if (inode->i_state & I_DIRTY) {
 			/*
 			 * Filesystems can dirty the inode during writeback
 			 * operations, such as delayed allocation during
 			 * submission or metadata updates after data IO
 			 * completion.
 			 *
 			 * For the latter case it is very important to give
 			 * the inode another turn on b_more_io instead of
 			 * redirtying it.  Constantly moving dirtied_when
 			 * forward will prevent us from ever writing out




* [PATCH 7/7] writeback: requeue_io_wait() when failed to grab superblock
  2011-10-20 15:22 [PATCH 0/7] writeback: avoid touching dirtied_when on blocked inodes Wu Fengguang
                   ` (5 preceding siblings ...)
  2011-10-20 15:22 ` [PATCH 6/7] writeback: requeue_io_wait() on blocked inode Wu Fengguang
@ 2011-10-20 15:22 ` Wu Fengguang
  2011-10-20 23:25   ` Jan Kara
  2011-10-20 23:21 ` [PATCH 0/7] writeback: avoid touching dirtied_when on blocked inodes Jan Kara
  2011-10-22  4:46 ` Wu Fengguang
  8 siblings, 1 reply; 22+ messages in thread
From: Wu Fengguang @ 2011-10-20 15:22 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Jan Kara, Dave Chinner, Christoph Hellwig, Wu Fengguang,
	Andrew Morton, LKML

[-- Attachment #1: writeback-more_io_wait-grab.patch --]
[-- Type: text/plain, Size: 1065 bytes --]

This is a blocking condition not really related to the inode itself, but
we still need to move the inode to b_more_io_wait to prevent possible
busy looping.

CC: Jan Kara <jack@suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/fs-writeback.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-next.orig/fs/fs-writeback.c	2011-10-20 22:43:42.000000000 +0800
+++ linux-next/fs/fs-writeback.c	2011-10-20 22:46:53.000000000 +0800
@@ -652,17 +652,17 @@ static long __writeback_inodes_wb(struct
 		struct super_block *sb = inode->i_sb;
 
 		if (!grab_super_passive(sb)) {
 			/*
 			 * grab_super_passive() may fail consistently due to
 			 * s_umount being grabbed by someone else. Don't use
 			 * requeue_io() to avoid busy retrying the inode/sb.
 			 */
-			redirty_tail(inode, wb);
+			requeue_io_wait(inode, wb);
 			continue;
 		}
 		wrote += writeback_sb_inodes(sb, wb, work);
 		drop_super(sb);
 
 		/* refer to the same tests at the end of writeback_sb_inodes */
 		if (wrote) {
 			if (time_is_before_jiffies(start_time + HZ / 10UL))




* Re: [PATCH 0/7] writeback: avoid touching dirtied_when on blocked inodes
  2011-10-20 15:22 [PATCH 0/7] writeback: avoid touching dirtied_when on blocked inodes Wu Fengguang
                   ` (6 preceding siblings ...)
  2011-10-20 15:22 ` [PATCH 7/7] writeback: requeue_io_wait() when failed to grab superblock Wu Fengguang
@ 2011-10-20 23:21 ` Jan Kara
  2011-10-21 10:40   ` Wu Fengguang
  2011-10-22  4:46 ` Wu Fengguang
  8 siblings, 1 reply; 22+ messages in thread
From: Jan Kara @ 2011-10-20 23:21 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: linux-fsdevel, Jan Kara, Dave Chinner, Christoph Hellwig,
	Andrew Morton, LKML

[-- Attachment #1: Type: text/plain, Size: 1192 bytes --]

  Hello,

On Thu 20-10-11 23:22:40, Wu Fengguang wrote:
> This series tries to preserve dirtied_when for blocked inodes by converting
> some redirty_tail() calls to requeue_io_wait(). Possibly blocked inodes
> will be moved to b_more_io_wait. The b_more_io_wait inodes will be retried
> as diligently as b_more_io inodes, except that when the latter goes empty,
> b_more_io_wait will be retried by the kupdate work at increasing intervals,
> up to dirty_writeback_interval.
> 
> Christoph Hellwig (1):
>       writeback: avoid redirtying when ->write_inode failed to clear I_DIRTY
> 
> Jan Kara (2):
>       writeback: update wb->last_active on written pages/inodes
>       writeback: Retry kupdate work early if we need to retry some inode writeback
> 
> Wu Fengguang (4):
>       writeback: introduce queue b_more_io_wait
>       writeback: requeue_io_wait() on pages_skipped inode
>       writeback: requeue_io_wait() on blocked inode
>       writeback: requeue_io_wait() when failed to grab superblock
  How about adding the attached patch to the series? With it applied we
would have all busyloop prevention done in the same way.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

[-- Attachment #2: 0001-writeback-requeue_io_wait-when-I_SYNC-is-set.patch --]
[-- Type: text/x-patch, Size: 2299 bytes --]

>From 15f3c6f15891121ff8ca06e214f75c4bac9f2f80 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Fri, 21 Oct 2011 01:15:43 +0200
Subject: [PATCH] writeback: requeue_io_wait() when I_SYNC is set

If writeback skips an inode because it is already under writeback (the I_SYNC
flag is set), we can now use requeue_io_wait() instead of requeue_io() and
remove the special busyloop prevention in wb_writeback().

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c |   21 +++++++++------------
 1 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 89b9b70..4e5cae0 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -14,6 +14,7 @@
  */
 
 #include <linux/kernel.h>
+#include <linux/ratelimit.h>
 #include <linux/module.h>
 #include <linux/spinlock.h>
 #include <linux/slab.h>
@@ -391,7 +392,7 @@ writeback_single_inode(struct inode *inode, struct bdi_writeback *wb,
 		 * completed a full scan of b_io.
 		 */
 		if (wbc->sync_mode != WB_SYNC_ALL) {
-			requeue_io(inode, wb);
+			requeue_io_wait(inode, wb);
 			trace_writeback_single_inode_requeue(inode, wbc,
 							     nr_to_write);
 			return 0;
@@ -725,7 +726,6 @@ static long wb_writeback(struct bdi_writeback *wb,
 	unsigned long wb_start = jiffies;
 	long nr_pages = work->nr_pages;
 	unsigned long oldest_jif;
-	struct inode *inode;
 	long progress;
 	long total_progress = 0;
 
@@ -791,17 +791,14 @@ static long wb_writeback(struct bdi_writeback *wb,
 		if (list_empty(&wb->b_more_io))
 			break;
 		/*
-		 * Nothing written. Wait for some inode to
-		 * become available for writeback. Otherwise
-		 * we'll just busyloop.
+		 * Nothing written but some inodes were moved to b_more_io.
+		 * This should not happen, since we could easily busyloop.
 		 */
-		if (!list_empty(&wb->b_more_io))  {
-			trace_writeback_wait(wb->bdi, work);
-			inode = wb_inode(wb->b_more_io.prev);
-			spin_lock(&inode->i_lock);
-			inode_wait_for_writeback(inode, wb);
-			spin_unlock(&inode->i_lock);
-		}
+		pr_warn_ratelimited("mm: Possible busyloop in data writeback "
+			"(bdi %s nr_pages %ld sync_mode %d kupdate %d "
+			"background %d)\n",
+			wb->bdi->name, work->nr_pages, work->sync_mode,
+			work->for_kupdate, work->for_background);
 	}
 	spin_unlock(&wb->list_lock);
 
-- 
1.7.1



* Re: [PATCH 1/7] writeback: introduce queue b_more_io_wait
  2011-10-20 15:22 ` [PATCH 1/7] writeback: introduce queue b_more_io_wait Wu Fengguang
@ 2011-10-20 23:23   ` Jan Kara
  0 siblings, 0 replies; 22+ messages in thread
From: Jan Kara @ 2011-10-20 23:23 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: linux-fsdevel, Jan Kara, Dave Chinner, Christoph Hellwig,
	Andrew Morton, LKML

On Thu 20-10-11 23:22:41, Wu Fengguang wrote:
> The problem is that redirty_tail() may update dirtied_when, resulting in
> up to a 30s delay. If redirty_tail() is called often enough, some inodes
> may even be delayed forever.
> 
> So introduce the b_more_io_wait queue to park inodes that for some
> reason cannot be synced immediately. These inodes will be sent back to
> b_io at the next b_io refill time, but will not be busy-retried the way
> b_more_io inodes are: when the redirtied inodes are all in
> b_more_io_wait, wb_writeback() will see an empty b_more_io and hence
> break out of the loop.
> 
> This would be the new data flow after converting all redirty_tail()
> calls to requeue_io_wait():
> 
> b_dirty --> b_io --> b_more_io/b_more_io_wait --+
>              ^                                  |
>              |                                  |
>              +----------------------------------+
> 
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
  You can add:
Acked-by: Jan Kara <jack@suse.cz>

								Honza

-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/7] writeback: avoid redirtying when ->write_inode failed to clear I_DIRTY
  2011-10-20 15:22 ` [PATCH 2/7] writeback: avoid redirtying when ->write_inode failed to clear I_DIRTY Wu Fengguang
@ 2011-10-20 23:24   ` Jan Kara
  0 siblings, 0 replies; 22+ messages in thread
From: Jan Kara @ 2011-10-20 23:24 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: linux-fsdevel, Jan Kara, Dave Chinner, Christoph Hellwig,
	Christoph Hellwig, Andrew Morton, LKML

On Thu 20-10-11 23:22:42, Wu Fengguang wrote:
> From: Christoph Hellwig <hch@infradead.org>
> 
> Right now ->write_inode has no way to safely return an EAGAIN without explicitly
> redirtying the inode, as we would lose the dirty state otherwise.  Most
> filesystems get this wrong, but XFS makes heavy use of it to avoid blocking
> the flusher thread when ->write_inode hits contended inode locks.  A
> contended ilock is something XFS can hit very easily when extending files, as
> the data I/O completion handler takes the lock to update the size, and the
> ->write_inode call can race with it fairly easily when writing enough data
> in one go that the completion for the first write comes in just before
> we call ->write_inode.
> 
> Change the handling of this case to use requeue_io_wait for a quick retry instead
> of redirty_tail, which keeps pushing dirtied_when forward and thus keeps
> delaying the writeout more and more with every failed attempt to get the lock.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
  You can add:
Acked-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/fs-writeback.c |   12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> --- linux-next.orig/fs/fs-writeback.c	2011-10-08 13:30:25.000000000 +0800
> +++ linux-next/fs/fs-writeback.c	2011-10-08 13:30:41.000000000 +0800
> @@ -488,8 +488,18 @@ writeback_single_inode(struct inode *ino
>  			 * operations, such as delayed allocation during
>  			 * submission or metadata updates after data IO
>  			 * completion.
> +			 *
> +			 * For the latter case it is very important to give
> +			 * the inode another turn on b_more_io instead of
> +			 * redirtying it.  Constantly moving dirtied_when
> +			 * forward will prevent us from ever writing out
> +			 * the metadata dirtied in the I/O completion handler.
> +			 *
> +			 * For files on XFS that constantly get appended to,
> +			 * calling redirty_tail means they will never get
> +			 * their updated i_size written out.
>  			 */
> -			redirty_tail(inode, wb);
> +			requeue_io_wait(inode, wb);
>  		} else {
>  			/*
>  			 * The inode is clean.  At this point we either have
> 
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/7] writeback: requeue_io_wait() on pages_skipped inode
  2011-10-20 15:22 ` [PATCH 5/7] writeback: requeue_io_wait() on pages_skipped inode Wu Fengguang
@ 2011-10-20 23:25   ` Jan Kara
  0 siblings, 0 replies; 22+ messages in thread
From: Jan Kara @ 2011-10-20 23:25 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: linux-fsdevel, Jan Kara, Dave Chinner, Christoph Hellwig,
	Michael Rubin, Peter Zijlstra, Andrew Morton, LKML

On Thu 20-10-11 23:22:45, Wu Fengguang wrote:
> Use requeue_io_wait() if some pages were skipped due to locked buffers.
> 
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Michael Rubin <mrubin@google.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
  Two signed-off-by?

Otherwise
  Acked-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/fs-writeback.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- linux-next.orig/fs/fs-writeback.c	2011-10-20 22:42:24.000000000 +0800
> +++ linux-next/fs/fs-writeback.c	2011-10-20 22:42:25.000000000 +0800
> @@ -618,11 +618,11 @@ static long writeback_sb_inodes(struct s
>  		if (wbc.pages_skipped) {
>  			/*
>  			 * writeback is not making progress due to locked
>  			 * buffers.  Skip this inode for now.
>  			 */
> -			redirty_tail(inode, wb);
> +			requeue_io_wait(inode, wb);
>  		}
>  		spin_unlock(&inode->i_lock);
>  		spin_unlock(&wb->list_lock);
>  		iput(inode);
>  		cond_resched();
> 
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 7/7] writeback: requeue_io_wait() when failed to grab superblock
  2011-10-20 15:22 ` [PATCH 7/7] writeback: requeue_io_wait() when failed to grab superblock Wu Fengguang
@ 2011-10-20 23:25   ` Jan Kara
  0 siblings, 0 replies; 22+ messages in thread
From: Jan Kara @ 2011-10-20 23:25 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: linux-fsdevel, Jan Kara, Dave Chinner, Christoph Hellwig,
	Andrew Morton, LKML

On Thu 20-10-11 23:22:47, Wu Fengguang wrote:
> It's some blocking condition that's not really related to the inode, but
> we still need to move it to b_more_io_wait to prevent possible busy looping.
> 
> CC: Jan Kara <jack@suse.cz>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
  Acked-by: Jan Kara <jack@suse.cz>

								Honza
> ---
>  fs/fs-writeback.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- linux-next.orig/fs/fs-writeback.c	2011-10-20 22:43:42.000000000 +0800
> +++ linux-next/fs/fs-writeback.c	2011-10-20 22:46:53.000000000 +0800
> @@ -652,17 +652,17 @@ static long __writeback_inodes_wb(struct
>  		struct super_block *sb = inode->i_sb;
>  
>  		if (!grab_super_passive(sb)) {
>  			/*
>  			 * grab_super_passive() may fail consistently due to
>  			 * s_umount being grabbed by someone else. Don't use
>  			 * requeue_io() to avoid busy retrying the inode/sb.
>  			 */
> -			redirty_tail(inode, wb);
> +			requeue_io_wait(inode, wb);
>  			continue;
>  		}
>  		wrote += writeback_sb_inodes(sb, wb, work);
>  		drop_super(sb);
>  
>  		/* refer to the same tests at the end of writeback_sb_inodes */
>  		if (wrote) {
>  			if (time_is_before_jiffies(start_time + HZ / 10UL))
> 
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/7] writeback: requeue_io_wait() on blocked inode
  2011-10-20 15:22 ` [PATCH 6/7] writeback: requeue_io_wait() on blocked inode Wu Fengguang
@ 2011-10-20 23:31   ` Jan Kara
  0 siblings, 0 replies; 22+ messages in thread
From: Jan Kara @ 2011-10-20 23:31 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: linux-fsdevel, Jan Kara, Dave Chinner, Christoph Hellwig,
	Andrew Morton, LKML

On Thu 20-10-11 23:22:46, Wu Fengguang wrote:
> Use requeue_io_wait() if inode is somehow blocked.
> 
> CC: Jan Kara <jack@suse.cz>
> CC: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
  Acked-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/fs-writeback.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- linux-next.orig/fs/fs-writeback.c	2011-10-20 22:42:25.000000000 +0800
> +++ linux-next/fs/fs-writeback.c	2011-10-20 22:43:42.000000000 +0800
> @@ -471,25 +471,25 @@ writeback_single_inode(struct inode *ino
>  				/*
>  				 * slice used up: queue for next turn
>  				 */
>  				requeue_io(inode, wb);
>  			} else {
>  				/*
>  				 * Writeback blocked by something other than
>  				 * congestion. Delay the inode for some time to
>  				 * avoid spinning on the CPU (100% iowait)
>  				 * retrying writeback of the dirty page/inode
>  				 * that cannot be performed immediately.
>  				 */
> -				redirty_tail(inode, wb);
> +				requeue_io_wait(inode, wb);
>  			}
>  		} else if (inode->i_state & I_DIRTY) {
>  			/*
>  			 * Filesystems can dirty the inode during writeback
>  			 * operations, such as delayed allocation during
>  			 * submission or metadata updates after data IO
>  			 * completion.
>  			 *
>  			 * For the latter case it is very important to give
>  			 * the inode another turn on b_more_io instead of
>  			 * redirtying it.  Constantly moving dirtied_when
>  			 * forward will prevent us from ever writing out
> 
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/7] writeback: avoid touching dirtied_when on blocked inodes
  2011-10-20 23:21 ` [PATCH 0/7] writeback: avoid touching dirtied_when on blocked inodes Jan Kara
@ 2011-10-21 10:40   ` Wu Fengguang
  2011-10-21 19:54     ` Jan Kara
  0 siblings, 1 reply; 22+ messages in thread
From: Wu Fengguang @ 2011-10-21 10:40 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Dave Chinner, Christoph Hellwig, Andrew Morton, LKML

Hi Jan,

>   How about adding the attached patch to the series? With it applied we
> would have all busyloop prevention done in the same way.

Sure. It looks good.

> +		pr_warn_ratelimited("mm: Possible busyloop in data writeback "
> +			"(bdi %s nr_pages %ld sync_mode %d kupdate %d "
> +			"background %d)\n",
> +			wb->bdi->name, work->nr_pages, work->sync_mode,
> +			work->for_kupdate, work->for_background);

I'll change the last two fields to the newly introduced writeback "reason":

                pr_warn_ratelimited("mm: Possible busyloop in data writeback "
                        "(bdi %s nr_pages %ld sync_mode %d reason %s)\n",
                        wb->bdi->name, work->nr_pages, work->sync_mode,
                        wb_reason_name[work->reason]);

btw, with the I_SYNC case converted, it's actually no longer necessary
to keep a standalone b_more_io_wait. It may still be better to keep
the list and the above error check, both for catching possible errors
and for the flexibility of adding policies like "don't retry possibly
blocked inodes for N seconds as long as there are other inodes to work with".

The below diff only intends to show the _possibility_ to remove
b_more_io_wait:

--- linux-next.orig/fs/fs-writeback.c	2011-10-21 18:25:25.000000000 +0800
+++ linux-next/fs/fs-writeback.c	2011-10-21 18:27:41.000000000 +0800
@@ -235,20 +235,7 @@ static void requeue_io(struct inode *ino
 	list_move(&inode->i_wb_list, &wb->b_more_io);
 }
 
-/*
- * The inode should be retried in an opportunistic way.
- *
- * The only difference between b_more_io and b_more_io_wait is:
- * wb_writeback() won't quit as long as b_more_io is not empty.  When
- * wb_writeback() quit on empty b_more_io and non-empty b_more_io_wait,
- * the kupdate work will wakeup more frequently to retry the inodes in
- * b_more_io_wait.
- */
-static void requeue_io_wait(struct inode *inode, struct bdi_writeback *wb)
-{
-	assert_spin_locked(&wb->list_lock);
-	list_move(&inode->i_wb_list, &wb->b_more_io_wait);
-}
+#define requeue_io_wait(inode, wb) requeue_io(inode, wb)
 
 static void inode_sync_complete(struct inode *inode)
 {
@@ -798,21 +785,8 @@ static long wb_writeback(struct bdi_writ
 		 * mean the overall work is done. So we keep looping as long
 		 * as made some progress on cleaning pages or inodes.
 		 */
-		if (progress)
-			continue;
-		/*
-		 * No more inodes for IO, bail
-		 */
-		if (list_empty(&wb->b_more_io))
+		if (!progress)
 			break;
-		/*
-		 * Nothing written but some inodes were moved to b_more_io.
-		 * This should not happen as we can easily busyloop.
-		 */
-		pr_warn_ratelimited("mm: Possible busyloop in data writeback "
-			"(bdi %s nr_pages %ld sync_mode %d reason %s)\n",
-			wb->bdi->name, work->nr_pages, work->sync_mode,
-			wb_reason_name[work->reason]);
 	}
 	spin_unlock(&wb->list_lock);

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/7] writeback: avoid touching dirtied_when on blocked inodes
  2011-10-21 10:40   ` Wu Fengguang
@ 2011-10-21 19:54     ` Jan Kara
  2011-10-22  3:11       ` Wu Fengguang
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Kara @ 2011-10-21 19:54 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Jan Kara, linux-fsdevel, Dave Chinner, Christoph Hellwig,
	Andrew Morton, LKML

On Fri 21-10-11 18:40:49, Wu Fengguang wrote:
> Hi Jan,
> 
> >   How about adding the attached patch to the series? With it applied we
> > would have all busyloop prevention done in the same way.
> 
> Sure. It looks good.
  Thanks.

> > +		pr_warn_ratelimited("mm: Possible busyloop in data writeback "
> > +			"(bdi %s nr_pages %ld sync_mode %d kupdate %d "
> > +			"background %d)\n",
> > +			wb->bdi->name, work->nr_pages, work->sync_mode,
> > +			work->for_kupdate, work->for_background);
> 
> I'll change the last two fields to the newly introduced writeback "reason":
> 
>                 pr_warn_ratelimited("mm: Possible busyloop in data writeback "
>                         "(bdi %s nr_pages %ld sync_mode %d reason %s)\n",
>                         wb->bdi->name, work->nr_pages, work->sync_mode,
>                         wb_reason_name[work->reason]);
  Makes sense. Thanks.

> btw, with the I_SYNC case converted, it's actually no longer necessary
> to keep a standalone b_more_io_wait. It may still be better to keep
> the list and the above error check, both for catching possible errors
> and for the flexibility of adding policies like "don't retry possibly
> blocked inodes for N seconds as long as there are other inodes to work with".
> 
> The below diff only intends to show the _possibility_ to remove
> b_more_io_wait:
  Good observation. So should we introduce b_more_io_wait in the end? We
could always introduce it when the need for some more complicated policy
comes...

								Honza

> 
> --- linux-next.orig/fs/fs-writeback.c	2011-10-21 18:25:25.000000000 +0800
> +++ linux-next/fs/fs-writeback.c	2011-10-21 18:27:41.000000000 +0800
> @@ -235,20 +235,7 @@ static void requeue_io(struct inode *ino
>  	list_move(&inode->i_wb_list, &wb->b_more_io);
>  }
>  
> -/*
> - * The inode should be retried in an opportunistic way.
> - *
> - * The only difference between b_more_io and b_more_io_wait is:
> - * wb_writeback() won't quit as long as b_more_io is not empty.  When
> - * wb_writeback() quit on empty b_more_io and non-empty b_more_io_wait,
> - * the kupdate work will wakeup more frequently to retry the inodes in
> - * b_more_io_wait.
> - */
> -static void requeue_io_wait(struct inode *inode, struct bdi_writeback *wb)
> -{
> -	assert_spin_locked(&wb->list_lock);
> -	list_move(&inode->i_wb_list, &wb->b_more_io_wait);
> -}
> +#define requeue_io_wait(inode, wb) requeue_io(inode, wb)
>  
>  static void inode_sync_complete(struct inode *inode)
>  {
> @@ -798,21 +785,8 @@ static long wb_writeback(struct bdi_writ
>  		 * mean the overall work is done. So we keep looping as long
>  		 * as made some progress on cleaning pages or inodes.
>  		 */
> -		if (progress)
> -			continue;
> -		/*
> -		 * No more inodes for IO, bail
> -		 */
> -		if (list_empty(&wb->b_more_io))
> +		if (!progress)
>  			break;
> -		/*
> -		 * Nothing written but some inodes were moved to b_more_io.
> -		 * This should not happen as we can easily busyloop.
> -		 */
> -		pr_warn_ratelimited("mm: Possible busyloop in data writeback "
> -			"(bdi %s nr_pages %ld sync_mode %d reason %s)\n",
> -			wb->bdi->name, work->nr_pages, work->sync_mode,
> -			wb_reason_name[work->reason]);
>  	}
>  	spin_unlock(&wb->list_lock);
> 
> Thanks,
> Fengguang

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/7] writeback: avoid touching dirtied_when on blocked inodes
  2011-10-21 19:54     ` Jan Kara
@ 2011-10-22  3:11       ` Wu Fengguang
  2011-10-22  5:38         ` Wu Fengguang
  0 siblings, 1 reply; 22+ messages in thread
From: Wu Fengguang @ 2011-10-22  3:11 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Dave Chinner, Christoph Hellwig, Andrew Morton, LKML

> > btw, with the I_SYNC case converted, it's actually no longer necessary
> > to keep a standalone b_more_io_wait. It may still be better to keep
> > the list and the above error check, both for catching possible errors
> > and for the flexibility of adding policies like "don't retry possibly
> > blocked inodes for N seconds as long as there are other inodes to work with".
> > 
> > The below diff only intends to show the _possibility_ to remove
> > b_more_io_wait:
>   Good observation. So should we introduce b_more_io_wait in the end? We
> could always introduce it when the need for some more complicated policy
> comes...
> 

I have no problem removing it if you liked it more. Anyway, let me
test the idea out first (just kicked off the tests).

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/7] writeback: avoid touching dirtied_when on blocked inodes
  2011-10-20 15:22 [PATCH 0/7] writeback: avoid touching dirtied_when on blocked inodes Wu Fengguang
                   ` (7 preceding siblings ...)
  2011-10-20 23:21 ` [PATCH 0/7] writeback: avoid touching dirtied_when on blocked inodes Jan Kara
@ 2011-10-22  4:46 ` Wu Fengguang
  8 siblings, 0 replies; 22+ messages in thread
From: Wu Fengguang @ 2011-10-22  4:46 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Jan Kara, Dave Chinner, Christoph Hellwig, Andrew Morton, LKML

Hi,

I got some test results for this patchset: basically it's good up through
patch 4. Furthermore, I got minor regressions when applying several
combinations of patches 5-7 and Jan's patch for the I_SYNC case.

[PATCH 1/7] writeback: introduce queue b_more_io_wait
[PATCH 2/7] writeback: avoid redirtying when ->write_inode failed to clear I_DIRTY
[PATCH 3/7] writeback: update wb->last_active on written pages/inodes
[PATCH 4/7] writeback: Retry kupdate work early if we need to retry some inode writeback

[PATCH 5/7] writeback: requeue_io_wait() on pages_skipped inode
[PATCH 6/7] writeback: requeue_io_wait() on blocked inode
[PATCH 7/7] writeback: requeue_io_wait() when failed to grab superblock

Based on the below results, I'd like to push only patches 1-4 to
linux-next and hold back the other patches for more investigation.

Overall throughput increases +1.2% if applying only patches 1-4:

3.1.0-rc9-ioless-full-next-20111014+  3.1.0-rc9-ioless-full-more_io_wait-5-next-20111014+  
------------------------  ------------------------  
                   56.47        +3.1%        58.22  thresh=100M/btrfs-10dd-4k-8p-4096M-100M:10-X
                   56.28        +1.6%        57.20  thresh=100M/btrfs-1dd-4k-8p-4096M-100M:10-X
                   56.11        +2.6%        57.56  thresh=100M/btrfs-2dd-4k-8p-4096M-100M:10-X
                   37.86        +3.5%        39.20  thresh=100M/ext3-10dd-4k-8p-4096M-100M:10-X
                   45.91        +2.2%        46.92  thresh=100M/ext3-1dd-4k-8p-4096M-100M:10-X
                   41.87        +2.7%        42.98  thresh=100M/ext3-2dd-4k-8p-4096M-100M:10-X
                   45.68        +1.7%        46.43  thresh=100M/ext4-10dd-4k-8p-4096M-100M:10-X
                   55.74        +1.2%        56.41  thresh=100M/ext4-1dd-4k-8p-4096M-100M:10-X
                   53.32        +0.7%        53.71  thresh=100M/ext4-2dd-4k-8p-4096M-100M:10-X
                   46.20        -2.6%        44.99  thresh=100M/xfs-10dd-4k-8p-4096M-100M:10-X
                   55.72        +3.9%        57.90  thresh=100M/xfs-1dd-4k-8p-4096M-100M:10-X
                   54.01        +0.4%        54.21  thresh=100M/xfs-2dd-4k-8p-4096M-100M:10-X
                   55.08        +2.0%        56.17  thresh=1G/btrfs-100dd-4k-8p-4096M-1024M:10-X
                   55.49        +1.3%        56.24  thresh=1G/btrfs-10dd-4k-8p-4096M-1024M:10-X
                   55.38        +1.7%        56.30  thresh=1G/btrfs-1dd-4k-8p-4096M-1024M:10-X
                   36.70        +0.7%        36.96  thresh=1G/ext3-100dd-4k-8p-4096M-1024M:10-X
                   40.64        +0.3%        40.77  thresh=1G/ext3-10dd-4k-8p-4096M-1024M:10-X
                   48.65        +1.6%        49.45  thresh=1G/ext3-1dd-4k-8p-4096M-1024M:10-X
                   49.84        -2.6%        48.53  thresh=1G/ext4-100dd-4k-8p-4096M-1024M:10-X
                   56.03        -1.2%        55.34  thresh=1G/ext4-10dd-4k-8p-4096M-1024M:10-X
                   57.42        +0.2%        57.52  thresh=1G/ext4-1dd-4k-8p-4096M-1024M:10-X
                   45.74        +0.4%        45.92  thresh=1G/xfs-100dd-4k-8p-4096M-1024M:10-X
                   54.19        -0.7%        53.79  thresh=1G/xfs-10dd-4k-8p-4096M-1024M:10-X
                   55.93        -0.4%        55.72  thresh=1G/xfs-1dd-4k-8p-4096M-1024M:10-X
                   35.78        -0.0%        35.77  thresh=8M/ext4-10dd-4k-8p-4096M-8M:10-X
                   55.29        +2.1%        56.43  thresh=8M/ext4-1dd-4k-8p-4096M-8M:10-X
                   51.12        +1.6%        51.94  thresh=8M/ext4-2dd-4k-8p-4096M-8M:10-X
                   31.21        +0.9%        31.48  thresh=8M/xfs-10dd-4k-8p-4096M-8M:10-X
                   54.10        +3.9%        56.22  thresh=8M/xfs-1dd-4k-8p-4096M-8M:10-X
                   46.97        +3.8%        48.75  thresh=8M/xfs-2dd-4k-8p-4096M-8M:10-X
                 1490.73        +1.2%      1509.03  TOTAL write_bw

Overall throughput after the patchset increases +1.6% if replacing
patch 5 with another patch to remove redirty_tail() on pages_skipped:

3.1.0-rc9-ioless-full-more_io_wait-next-20111014+  3.1.0-rc9-ioless-full-more_io_wait-7-next-20111014+  
------------------------  ------------------------  
                   46.22        +0.8%        46.58  thresh=100M/ext3-1dd-4k-8p-4096M-100M:10-X
                   42.19        +1.2%        42.68  thresh=100M/ext3-2dd-4k-8p-4096M-100M:10-X
                   45.50        +1.2%        46.06  thresh=100M/ext4-10dd-4k-8p-4096M-100M:10-X
                   54.51        +0.9%        54.97  thresh=100M/ext4-1dd-4k-8p-4096M-100M:10-X
                   53.19        +1.2%        53.83  thresh=100M/ext4-2dd-4k-8p-4096M-100M:10-X
                   43.98        +0.3%        44.11  thresh=100M/xfs-10dd-4k-8p-4096M-100M:10-X
                   55.76        +2.7%        57.25  thresh=100M/xfs-1dd-4k-8p-4096M-100M:10-X
                   52.94        +2.5%        54.29  thresh=100M/xfs-2dd-4k-8p-4096M-100M:10-X
                   54.52        +1.6%        55.36  thresh=1G/btrfs-100dd-4k-8p-4096M-1024M:10-X
                   54.94        +0.8%        55.40  thresh=1G/btrfs-10dd-4k-8p-4096M-1024M:10-X
                   53.91        +1.9%        54.96  thresh=1G/btrfs-1dd-4k-8p-4096M-1024M:10-X
                   36.15        +1.0%        36.53  thresh=1G/ext3-100dd-4k-8p-4096M-1024M:10-X
                   38.25        +6.7%        40.80  thresh=1G/ext3-10dd-4k-8p-4096M-1024M:10-X
                   45.30        +8.7%        49.24  thresh=1G/ext3-1dd-4k-8p-4096M-1024M:10-X
                   48.23        +1.5%        48.93  thresh=1G/ext4-100dd-4k-8p-4096M-1024M:10-X
                   54.21        +0.4%        54.41  thresh=1G/ext4-10dd-4k-8p-4096M-1024M:10-X
                   56.07        +1.4%        56.86  thresh=1G/ext4-1dd-4k-8p-4096M-1024M:10-X
                   45.12        -5.8%        42.52  thresh=1G/xfs-100dd-4k-8p-4096M-1024M:10-X
                   53.94        +1.3%        54.67  thresh=1G/xfs-10dd-4k-8p-4096M-1024M:10-X
                   55.66        +2.7%        57.15  thresh=1G/xfs-1dd-4k-8p-4096M-1024M:10-X
                  990.58        +1.6%      1006.59  TOTAL write_bw

But in general the performance is slightly worse when doing anything
more than patches 1-4.

patches 1-4 VS patches 1-4 plus requeue_io_wait() on I_SYNC

3.1.0-rc9-ioless-full-more_io_wait-5-next-20111014+  3.1.0-rc9-ioless-full-more_io_wait-9-next-20111014+  
------------------------  ------------------------  
                   58.22        -3.2%        56.38  thresh=100M/btrfs-10dd-4k-8p-4096M-100M:10-X
                   57.20        -1.1%        56.56  thresh=100M/btrfs-1dd-4k-8p-4096M-100M:10-X
                   57.56        -2.3%        56.26  thresh=100M/btrfs-2dd-4k-8p-4096M-100M:10-X
                   39.20        -0.8%        38.87  thresh=100M/ext3-10dd-4k-8p-4096M-100M:10-X
                   46.92        +0.1%        46.98  thresh=100M/ext3-1dd-4k-8p-4096M-100M:10-X
                   42.98        -0.7%        42.70  thresh=100M/ext3-2dd-4k-8p-4096M-100M:10-X
                   46.43        -0.6%        46.14  thresh=100M/ext4-10dd-4k-8p-4096M-100M:10-X
                   56.41        +0.1%        56.47  thresh=100M/ext4-1dd-4k-8p-4096M-100M:10-X
                   53.71        +1.6%        54.58  thresh=100M/ext4-2dd-4k-8p-4096M-100M:10-X
                   44.99        +0.4%        45.18  thresh=100M/xfs-10dd-4k-8p-4096M-100M:10-X
                   57.90        -0.4%        57.66  thresh=100M/xfs-1dd-4k-8p-4096M-100M:10-X
                   54.21        +1.2%        54.85  thresh=100M/xfs-2dd-4k-8p-4096M-100M:10-X
                   56.17        -0.6%        55.81  thresh=1G/btrfs-100dd-4k-8p-4096M-1024M:10-X
                   56.24        -1.8%        55.20  thresh=1G/btrfs-10dd-4k-8p-4096M-1024M:10-X
                   56.30        -2.1%        55.13  thresh=1G/btrfs-1dd-4k-8p-4096M-1024M:10-X
                   36.96        -2.0%        36.23  thresh=1G/ext3-100dd-4k-8p-4096M-1024M:10-X
                   40.77        -0.2%        40.68  thresh=1G/ext3-10dd-4k-8p-4096M-1024M:10-X
                   49.45        -0.6%        49.14  thresh=1G/ext3-1dd-4k-8p-4096M-1024M:10-X
                   48.53        -1.6%        47.77  thresh=1G/ext4-100dd-4k-8p-4096M-1024M:10-X
                   55.34        -0.8%        54.89  thresh=1G/ext4-10dd-4k-8p-4096M-1024M:10-X
                   57.52        -1.5%        56.67  thresh=1G/ext4-1dd-4k-8p-4096M-1024M:10-X
                   45.92        -7.2%        42.63  thresh=1G/xfs-100dd-4k-8p-4096M-1024M:10-X
                   53.79        -1.0%        53.25  thresh=1G/xfs-10dd-4k-8p-4096M-1024M:10-X
                   55.72        +2.4%        57.04  thresh=1G/xfs-1dd-4k-8p-4096M-1024M:10-X
                   56.22        -2.2%        54.96  thresh=8M/xfs-1dd-4k-8p-4096M-8M:10-X
                 1284.66        -1.0%      1272.05  TOTAL write_bw

patches 1-4 VS patches 1-4,7 and remove redirty_tail() on pages_skipped

3.1.0-rc9-ioless-full-more_io_wait-5-next-20111014+  3.1.0-rc9-ioless-full-more_io_wait-8-next-20111014+
------------------------  ------------------------
                   58.22        -2.9%        56.55  thresh=100M/btrfs-10dd-4k-8p-4096M-100M:10-X
                   57.20        -2.0%        56.03  thresh=100M/btrfs-1dd-4k-8p-4096M-100M:10-X
                   57.56        -2.1%        56.38  thresh=100M/btrfs-2dd-4k-8p-4096M-100M:10-X
                   39.20        -0.4%        39.06  thresh=100M/ext3-10dd-4k-8p-4096M-100M:10-X
                   46.92        -0.5%        46.68  thresh=100M/ext3-1dd-4k-8p-4096M-100M:10-X
                   42.98        -1.2%        42.45  thresh=100M/ext3-2dd-4k-8p-4096M-100M:10-X
                   46.43        -1.2%        45.89  thresh=100M/ext4-10dd-4k-8p-4096M-100M:10-X
                   56.41        -2.5%        54.98  thresh=100M/ext4-1dd-4k-8p-4096M-100M:10-X
                   53.71        -1.4%        52.96  thresh=100M/ext4-2dd-4k-8p-4096M-100M:10-X
                   44.99        -1.4%        44.36  thresh=100M/xfs-10dd-4k-8p-4096M-100M:10-X
                   57.90        -0.8%        57.42  thresh=100M/xfs-1dd-4k-8p-4096M-100M:10-X
                   54.21        -0.7%        53.84  thresh=100M/xfs-2dd-4k-8p-4096M-100M:10-X
                   56.17        -1.1%        55.57  thresh=1G/btrfs-100dd-4k-8p-4096M-1024M:10-X
                   56.24        -1.1%        55.61  thresh=1G/btrfs-10dd-4k-8p-4096M-1024M:10-X
                   56.30        -1.5%        55.47  thresh=1G/btrfs-1dd-4k-8p-4096M-1024M:10-X
                   36.96        -2.0%        36.23  thresh=1G/ext3-100dd-4k-8p-4096M-1024M:10-X
                   40.77        -0.8%        40.45  thresh=1G/ext3-10dd-4k-8p-4096M-1024M:10-X
                   49.45        -1.5%        48.71  thresh=1G/ext3-1dd-4k-8p-4096M-1024M:10-X
                   48.53        -2.8%        47.17  thresh=1G/ext4-100dd-4k-8p-4096M-1024M:10-X
                   55.34        -2.3%        54.05  thresh=1G/ext4-10dd-4k-8p-4096M-1024M:10-X
                   57.52        -3.6%        55.47  thresh=1G/ext4-1dd-4k-8p-4096M-1024M:10-X
                   45.92        +3.0%        47.30  thresh=1G/xfs-100dd-4k-8p-4096M-1024M:10-X
                   53.79        -2.3%        52.54  thresh=1G/xfs-10dd-4k-8p-4096M-1024M:10-X
                   55.72        +2.3%        57.02  thresh=1G/xfs-1dd-4k-8p-4096M-1024M:10-X
                   35.77        -0.7%        35.54  thresh=8M/ext4-10dd-4k-8p-4096M-8M:10-X
                   56.43        -2.8%        54.86  thresh=8M/ext4-1dd-4k-8p-4096M-8M:10-X
                   51.94        -2.7%        50.52  thresh=8M/ext4-2dd-4k-8p-4096M-8M:10-X
                   31.48        -0.7%        31.25  thresh=8M/xfs-10dd-4k-8p-4096M-8M:10-X
                   56.22        -2.2%        55.00  thresh=8M/xfs-1dd-4k-8p-4096M-8M:10-X
                   48.75        -2.1%        47.73  thresh=8M/xfs-2dd-4k-8p-4096M-8M:10-X
                 1509.03        -1.5%      1487.08  TOTAL write_bw

patches 1-4 VS patches 1-4,6 and remove redirty_tail() on pages_skipped

3.1.0-rc9-ioless-full-more_io_wait-5-next-20111014+  3.1.0-rc9-ioless-full-more_io_wait-6-next-20111014+  
------------------------  ------------------------  
                   57.90        -0.7%        57.48  thresh=100M/xfs-1dd-4k-8p-4096M-100M:10-X
                   54.21        -0.4%        54.01  thresh=100M/xfs-2dd-4k-8p-4096M-100M:10-X
                   56.17        -1.1%        55.57  thresh=1G/btrfs-100dd-4k-8p-4096M-1024M:10-X
                   56.24        -1.5%        55.37  thresh=1G/btrfs-10dd-4k-8p-4096M-1024M:10-X
                   56.30        -1.6%        55.38  thresh=1G/btrfs-1dd-4k-8p-4096M-1024M:10-X
                   36.96        -1.8%        36.28  thresh=1G/ext3-100dd-4k-8p-4096M-1024M:10-X
                   40.77        -1.0%        40.37  thresh=1G/ext3-10dd-4k-8p-4096M-1024M:10-X
                   49.45        -0.5%        49.20  thresh=1G/ext3-1dd-4k-8p-4096M-1024M:10-X
                   48.53        +0.3%        48.68  thresh=1G/ext4-100dd-4k-8p-4096M-1024M:10-X
                   55.34        -1.8%        54.34  thresh=1G/ext4-10dd-4k-8p-4096M-1024M:10-X
                   57.52        -0.9%        57.03  thresh=1G/ext4-1dd-4k-8p-4096M-1024M:10-X
                   45.92        -0.5%        45.68  thresh=1G/xfs-100dd-4k-8p-4096M-1024M:10-X
                   53.79        +3.2%        55.52  thresh=1G/xfs-10dd-4k-8p-4096M-1024M:10-X
                   55.72        +3.2%        57.51  thresh=1G/xfs-1dd-4k-8p-4096M-1024M:10-X
                  724.81        -0.3%       722.43  TOTAL write_bw

patches 1-4 VS patches 1-4,6,7 and remove redirty_tail() on pages_skipped

3.1.0-rc9-ioless-full-more_io_wait-5-next-20111014+  3.1.0-rc9-ioless-full-more_io_wait-7-next-20111014+  
------------------------  ------------------------  
                   46.92        -0.7%        46.58  thresh=100M/ext3-1dd-4k-8p-4096M-100M:10-X
                   42.98        -0.7%        42.68  thresh=100M/ext3-2dd-4k-8p-4096M-100M:10-X
                   46.43        -0.8%        46.06  thresh=100M/ext4-10dd-4k-8p-4096M-100M:10-X
                   56.41        -2.5%        54.97  thresh=100M/ext4-1dd-4k-8p-4096M-100M:10-X
                   53.71        +0.2%        53.83  thresh=100M/ext4-2dd-4k-8p-4096M-100M:10-X
                   44.99        -2.0%        44.11  thresh=100M/xfs-10dd-4k-8p-4096M-100M:10-X
                   57.90        -1.1%        57.25  thresh=100M/xfs-1dd-4k-8p-4096M-100M:10-X
                   54.21        +0.1%        54.29  thresh=100M/xfs-2dd-4k-8p-4096M-100M:10-X
                   56.17        -1.4%        55.36  thresh=1G/btrfs-100dd-4k-8p-4096M-1024M:10-X
                   56.24        -1.5%        55.40  thresh=1G/btrfs-10dd-4k-8p-4096M-1024M:10-X
                   56.30        -2.4%        54.96  thresh=1G/btrfs-1dd-4k-8p-4096M-1024M:10-X
                   36.96        -1.2%        36.53  thresh=1G/ext3-100dd-4k-8p-4096M-1024M:10-X
                   40.77        +0.1%        40.80  thresh=1G/ext3-10dd-4k-8p-4096M-1024M:10-X
                   49.45        -0.4%        49.24  thresh=1G/ext3-1dd-4k-8p-4096M-1024M:10-X
                   48.53        +0.8%        48.93  thresh=1G/ext4-100dd-4k-8p-4096M-1024M:10-X
                   55.34        -1.7%        54.41  thresh=1G/ext4-10dd-4k-8p-4096M-1024M:10-X
                   57.52        -1.2%        56.86  thresh=1G/ext4-1dd-4k-8p-4096M-1024M:10-X
                   45.92        -7.4%        42.52  thresh=1G/xfs-100dd-4k-8p-4096M-1024M:10-X
                   53.79        +1.6%        54.67  thresh=1G/xfs-10dd-4k-8p-4096M-1024M:10-X
                   55.72        +2.6%        57.15  thresh=1G/xfs-1dd-4k-8p-4096M-1024M:10-X
                 1016.26        -1.0%      1006.59  TOTAL write_bw

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/7] writeback: avoid touching dirtied_when on blocked inodes
  2011-10-22  3:11       ` Wu Fengguang
@ 2011-10-22  5:38         ` Wu Fengguang
  2011-10-22  6:59           ` Wu Fengguang
  2011-10-22  7:46           ` Wu Fengguang
  0 siblings, 2 replies; 22+ messages in thread
From: Wu Fengguang @ 2011-10-22  5:38 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Dave Chinner, Christoph Hellwig, Andrew Morton, LKML

On Sat, Oct 22, 2011 at 11:11:35AM +0800, Wu Fengguang wrote:
> > > btw, with the I_SYNC case converted, it's actually no longer necessary
> > > to keep a standalone b_more_io_wait. It should still be better to keep
> > > the list and the above error check for catching possible errors and
> > > the flexibility of adding policies like "don't retry possible blocked
> > > inodes in N seconds as long as there are other inodes to work with".
> > > 
> > > The below diff only intends to show the _possibility_ to remove
> > > b_more_io_wait:
> >   Good observation. So should we introduce b_more_io_wait in the end? We
> > could always introduce it when the need for some more complicated policy
> > comes...
> > 
> 
> I have no problem removing it if you liked it more. Anyway, let me
> test the idea out first (just kicked off the tests).

With b_more_io_wait removed, performance drops slightly
compared to the full more_io_wait patchset.

3.1.0-rc9-ioless-full-more_io_wait-next-20111014+  3.1.0-rc9-ioless-full-more_io_wait-x-next-20111014+  
------------------------  ------------------------  
                   45.30        +6.3%        48.14  thresh=1G/ext3-1dd-4k-8p-4096M-1024M:10-X
                   48.23        -2.0%        47.27  thresh=1G/ext4-100dd-4k-8p-4096M-1024M:10-X
                   54.21        -2.6%        52.80  thresh=1G/ext4-10dd-4k-8p-4096M-1024M:10-X
                   56.07        -0.3%        55.91  thresh=1G/ext4-1dd-4k-8p-4096M-1024M:10-X
                   45.12        -5.8%        42.49  thresh=1G/xfs-100dd-4k-8p-4096M-1024M:10-X
                   53.94        -1.2%        53.27  thresh=1G/xfs-10dd-4k-8p-4096M-1024M:10-X
                   55.66        -0.1%        55.63  thresh=1G/xfs-1dd-4k-8p-4096M-1024M:10-X
                  358.53        -0.8%       355.51  TOTAL write_bw

I'll try to reduce the changes and retest.

In general it looks better to first root-cause the "decreasing pages
written by writeback_single_inode() over time" problem before looking
into further steps.
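For context, the queue ordering being debated above can be modelled in a few
lines of user-space C. This is a toy sketch, not the kernel code: plain ints
stand in for inodes, and the queue names merely mirror the series.

```c
#include <assert.h>

/* Toy model of the b_more_io / b_more_io_wait split.  A possibly blocked
 * inode is parked on more_io_wait and only retried once more_io drains,
 * so its dirtied_when timestamp never has to be refreshed via
 * redirty_tail(). */

#define QMAX 16

struct queue { int ino[QMAX]; int len; };

void enqueue(struct queue *q, int ino)
{
	assert(q->len < QMAX);
	q->ino[q->len++] = ino;
}

int dequeue(struct queue *q)
{
	int ino = q->ino[0];

	for (int i = 1; i < q->len; i++)
		q->ino[i - 1] = q->ino[i];
	q->len--;
	return ino;
}

/* Pick the next inode to write back: diligent more_io entries first;
 * parked (possibly blocked) inodes only when nothing else is pending. */
int next_inode(struct queue *more_io, struct queue *more_io_wait)
{
	if (more_io->len)
		return dequeue(more_io);
	if (more_io_wait->len)
		return dequeue(more_io_wait);
	return -1;		/* both queues empty */
}
```

In the real series the kupdate work additionally retries b_more_io_wait on
increasing intervals up to dirty_writeback_interval; the sketch only captures
the relative ordering of the two queues.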

Thanks,
Fengguang


* Re: [PATCH 0/7] writeback: avoid touching dirtied_when on blocked inodes
  2011-10-22  5:38         ` Wu Fengguang
@ 2011-10-22  6:59           ` Wu Fengguang
  2011-10-22  7:07             ` Wu Fengguang
  2011-10-22  7:46           ` Wu Fengguang
  1 sibling, 1 reply; 22+ messages in thread
From: Wu Fengguang @ 2011-10-22  6:59 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Dave Chinner, Christoph Hellwig, Andrew Morton,
	LKML, Tang, Feng

[-- Attachment #1: Type: text/plain, Size: 551 bytes --]

> In general it looks better to first root-cause the "decreasing pages
> written by writeback_single_inode() over time" problem before looking
> into further steps.

The problem shows up in many of the regression cases. For example, in
the thresh=1G/ext4-100dd case with patches 1-4 applied, nr_writeback
occasionally drops low (bdi_dirty_state-8:0.png) and the nr_wrote
counts from writeback_single_inode() repeatedly decay to 0 over
periods of dozens of seconds (writeback_single_inode.png).

Thanks,
Fengguang

[-- Attachment #2: bdi_dirty_state-8:0.png --]
[-- Type: image/png, Size: 42728 bytes --]

[-- Attachment #3: balance_dirty_pages-pages.png --]
[-- Type: image/png, Size: 85151 bytes --]

[-- Attachment #4: writeback_single_inode.png --]
[-- Type: image/png, Size: 52548 bytes --]

[-- Attachment #5: iostat-misc.png --]
[-- Type: image/png, Size: 102612 bytes --]


* Re: [PATCH 0/7] writeback: avoid touching dirtied_when on blocked inodes
  2011-10-22  6:59           ` Wu Fengguang
@ 2011-10-22  7:07             ` Wu Fengguang
  0 siblings, 0 replies; 22+ messages in thread
From: Wu Fengguang @ 2011-10-22  7:07 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Dave Chinner, Christoph Hellwig, Andrew Morton,
	LKML, Tang, Feng

[-- Attachment #1: Type: text/plain, Size: 937 bytes --]

On Sat, Oct 22, 2011 at 02:59:26PM +0800, Wu Fengguang wrote:
> > In general it looks better to first root-cause the "decreasing pages
> > written by writeback_single_inode() over time" problem before looking
> > into further steps.
> 
> The problem shows up in many of the regression cases. For example, in
> the thresh=1G/ext4-100dd case with patches 1-4 applied, nr_writeback
> occasionally drops low (bdi_dirty_state-8:0.png) and the nr_wrote
> counts from writeback_single_inode() repeatedly decay to 0 over
> periods of dozens of seconds (writeback_single_inode.png).

Another example is the thresh=1G/xfs-100dd case with patches 1-4 and
Jan's requeue_io_wait-on-I_SYNC patch. The writeback_single_inode.png
graph shows the pages written steadily decreasing over a period of
some 400 seconds, accompanied by drops in nr_writeback and IO size,
which hurts write bandwidth considerably.

Thanks,
Fengguang

[-- Attachment #2: bdi_dirty_state-8:0.png --]
[-- Type: image/png, Size: 54418 bytes --]

[-- Attachment #3: writeback_single_inode.png --]
[-- Type: image/png, Size: 46406 bytes --]

[-- Attachment #4: iostat-misc.png --]
[-- Type: image/png, Size: 43585 bytes --]

[-- Attachment #5: iostat-bw.png --]
[-- Type: image/png, Size: 65248 bytes --]


* Re: [PATCH 0/7] writeback: avoid touching dirtied_when on blocked inodes
  2011-10-22  5:38         ` Wu Fengguang
  2011-10-22  6:59           ` Wu Fengguang
@ 2011-10-22  7:46           ` Wu Fengguang
  1 sibling, 0 replies; 22+ messages in thread
From: Wu Fengguang @ 2011-10-22  7:46 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Dave Chinner, Christoph Hellwig, Andrew Morton, LKML

On Sat, Oct 22, 2011 at 01:38:51PM +0800, Wu Fengguang wrote:
> On Sat, Oct 22, 2011 at 11:11:35AM +0800, Wu Fengguang wrote:
> > > > btw, with the I_SYNC case converted, it's actually no longer necessary
> > > > to keep a standalone b_more_io_wait. It should still be better to keep
> > > > the list and the above error check for catching possible errors and
> > > > the flexibility of adding policies like "don't retry possible blocked
> > > > inodes in N seconds as long as there are other inodes to work with".
> > > > 
> > > > The below diff only intends to show the _possibility_ to remove
> > > > b_more_io_wait:
> > >   Good observation. So should we introduce b_more_io_wait in the end? We
> > > could always introduce it when the need for some more complicated policy
> > > comes...
> > > 
> > 
> > I have no problem removing it if you liked it more. Anyway, let me
> > test the idea out first (just kicked off the tests).
> 
> With b_more_io_wait removed, performance drops slightly
> compared to the full more_io_wait patchset.
> 
> 3.1.0-rc9-ioless-full-more_io_wait-next-20111014+  3.1.0-rc9-ioless-full-more_io_wait-x-next-20111014+  
> ------------------------  ------------------------  
>                    45.30        +6.3%        48.14  thresh=1G/ext3-1dd-4k-8p-4096M-1024M:10-X
>                    48.23        -2.0%        47.27  thresh=1G/ext4-100dd-4k-8p-4096M-1024M:10-X
>                    54.21        -2.6%        52.80  thresh=1G/ext4-10dd-4k-8p-4096M-1024M:10-X
>                    56.07        -0.3%        55.91  thresh=1G/ext4-1dd-4k-8p-4096M-1024M:10-X
>                    45.12        -5.8%        42.49  thresh=1G/xfs-100dd-4k-8p-4096M-1024M:10-X
>                    53.94        -1.2%        53.27  thresh=1G/xfs-10dd-4k-8p-4096M-1024M:10-X
>                    55.66        -0.1%        55.63  thresh=1G/xfs-1dd-4k-8p-4096M-1024M:10-X
>                   358.53        -0.8%       355.51  TOTAL write_bw
> 
> I'll try to reduce the changes and retest.

Unfortunately, the reduced combination of patches 1-4 + the I_SYNC
change + removing requeue_more_io_wait still performs noticeably worse:

3.1.0-rc9-ioless-full-next-20111014+  3.1.0-rc9-ioless-full-more_io_wait-x2-next-20111014+  
------------------------  ------------------------  
                   49.84        -7.9%        45.91  thresh=1G/ext4-100dd-4k-8p-4096M-1024M:10-X
                   56.03        -7.2%        52.01  thresh=1G/ext4-10dd-4k-8p-4096M-1024M:10-X
                   57.42        -1.7%        56.45  thresh=1G/ext4-1dd-4k-8p-4096M-1024M:10-X
                   45.74        -2.8%        44.48  thresh=1G/xfs-100dd-4k-8p-4096M-1024M:10-X
                   54.19        -4.8%        51.57  thresh=1G/xfs-10dd-4k-8p-4096M-1024M:10-X
                   55.93        -2.2%        54.70  thresh=1G/xfs-1dd-4k-8p-4096M-1024M:10-X
                  319.14        -4.4%       305.12  TOTAL write_bw

Thanks,
Fengguang



Thread overview: 22+ messages
2011-10-20 15:22 [PATCH 0/7] writeback: avoid touching dirtied_when on blocked inodes Wu Fengguang
2011-10-20 15:22 ` [PATCH 1/7] writeback: introduce queue b_more_io_wait Wu Fengguang
2011-10-20 23:23   ` Jan Kara
2011-10-20 15:22 ` [PATCH 2/7] writeback: avoid redirtying when ->write_inode failed to clear I_DIRTY Wu Fengguang
2011-10-20 23:24   ` Jan Kara
2011-10-20 15:22 ` [PATCH 3/7] writeback: update wb->last_active on written pages/inodes Wu Fengguang
2011-10-20 15:22 ` [PATCH 4/7] writeback: Retry kupdate work early if we need to retry some inode writeback Wu Fengguang
2011-10-20 15:22 ` [PATCH 5/7] writeback: requeue_io_wait() on pages_skipped inode Wu Fengguang
2011-10-20 23:25   ` Jan Kara
2011-10-20 15:22 ` [PATCH 6/7] writeback: requeue_io_wait() on blocked inode Wu Fengguang
2011-10-20 23:31   ` Jan Kara
2011-10-20 15:22 ` [PATCH 7/7] writeback: requeue_io_wait() when failed to grab superblock Wu Fengguang
2011-10-20 23:25   ` Jan Kara
2011-10-20 23:21 ` [PATCH 0/7] writeback: avoid touching dirtied_when on blocked inodes Jan Kara
2011-10-21 10:40   ` Wu Fengguang
2011-10-21 19:54     ` Jan Kara
2011-10-22  3:11       ` Wu Fengguang
2011-10-22  5:38         ` Wu Fengguang
2011-10-22  6:59           ` Wu Fengguang
2011-10-22  7:07             ` Wu Fengguang
2011-10-22  7:46           ` Wu Fengguang
2011-10-22  4:46 ` Wu Fengguang
