* [RFC PATCH 00/14] Per-sb tracking of dirty inodes
@ 2014-07-31 22:00 Jan Kara
  2014-07-31 22:00 ` [PATCH 01/14] writeback: Get rid of superblock pinning Jan Kara
                   ` (16 more replies)
  0 siblings, 17 replies; 25+ messages in thread
From: Jan Kara @ 2014-07-31 22:00 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: OGAWA Hirofumi, Wu Fengguang, Jan Kara

  Hello,

  here is my attempt to implement per superblock tracking of dirty inodes.
I have two motivations for this:
  1) I've been trying to get rid of overwriting an inode's dirty time stamp
     during writeback, and having to filter dirty inodes by superblock makes
     this significantly harder. For similar reasons, improving the
     scalability of inode dirty tracking is more complicated than it has
     to be.
  2) Filesystems like Tux3 (but to some extent also XFS) would like to
     influence the order in which inodes are written back. Currently this
     isn't possible. Tracking dirty inodes per superblock makes it easy to
     later implement a filesystem callback for writing back inodes, and
     possibly also to allow filesystems to implement their own dirty
     tracking if they desire.

  The patches pass an xfstests run and also some sync livelock avoidance
tests I have, with 4 filesystems on 2 disks, so they should be reasonably
sound. Before I go and base more work on this I'd like to hear some feedback
on whether people find this sane and workable.

After this patch set it is trivial to provide a per-sb callback for writeback
(at the level of writeback_inodes()). It is also fairly easy to allow a
filesystem to completely override dirty tracking (that only needs some
restructuring of mark_inode_dirty()). I can write these as proof-of-concept
patches for the Tux3 guys once the general approach in this patch set is
acked. Or if there are some in-tree users (XFS? btrfs?) I can include them in
the patch set.

Any comments welcome!

								Honza 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 01/14] writeback: Get rid of superblock pinning
  2014-07-31 22:00 [RFC PATCH 00/14] Per-sb tracking of dirty inodes Jan Kara
@ 2014-07-31 22:00 ` Jan Kara
  2014-07-31 22:00 ` [PATCH 02/14] writeback: Remove writeback_inodes_wb() Jan Kara
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Jan Kara @ 2014-07-31 22:00 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: OGAWA Hirofumi, Wu Fengguang, Jan Kara

Currently the flusher thread pins the superblock (via grab_super_passive())
it is working on. However, this is unnecessary after commit
169ebd90131b "writeback: Avoid iput() from flusher thread". Before this
commit we had to block umount so that it wouldn't complain about busy inodes
because of the elevated i_count the flusher thread used. After this commit
we can let umount run and it will block in evict_inodes() waiting for the
flusher thread to be done with the inode (thus the flusher thread is also
safe against the inode going away from under it). Removing the superblock
pinning allows us to simplify the code quite a bit. Among other things,
there's no need to sort the b_io list in move_expired_inodes() anymore.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c                | 105 ++++++---------------------------------
 include/trace/events/writeback.h |   2 +-
 2 files changed, 17 insertions(+), 90 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index be568b7311d6..f85ee6795a28 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -255,11 +255,7 @@ static int move_expired_inodes(struct list_head *delaying_queue,
 			       struct list_head *dispatch_queue,
 			       struct wb_writeback_work *work)
 {
-	LIST_HEAD(tmp);
-	struct list_head *pos, *node;
-	struct super_block *sb = NULL;
 	struct inode *inode;
-	int do_sb_sort = 0;
 	int moved = 0;
 
 	while (!list_empty(delaying_queue)) {
@@ -267,31 +263,9 @@ static int move_expired_inodes(struct list_head *delaying_queue,
 		if (work->older_than_this &&
 		    inode_dirtied_after(inode, *work->older_than_this))
 			break;
-		list_move(&inode->i_wb_list, &tmp);
+		list_move(&inode->i_wb_list, dispatch_queue);
 		moved++;
-		if (sb_is_blkdev_sb(inode->i_sb))
-			continue;
-		if (sb && sb != inode->i_sb)
-			do_sb_sort = 1;
-		sb = inode->i_sb;
-	}
-
-	/* just one sb in list, splice to dispatch_queue and we're done */
-	if (!do_sb_sort) {
-		list_splice(&tmp, dispatch_queue);
-		goto out;
-	}
-
-	/* Move inodes from one superblock together */
-	while (!list_empty(&tmp)) {
-		sb = wb_inode(tmp.prev)->i_sb;
-		list_for_each_prev_safe(pos, node, &tmp) {
-			inode = wb_inode(pos);
-			if (inode->i_sb == sb)
-				list_move(&inode->i_wb_list, dispatch_queue);
-		}
 	}
-out:
 	return moved;
 }
 
@@ -500,7 +474,7 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
  *
  * This function is designed to be called for writing back one inode which
  * we go e.g. from filesystem. Flusher thread uses __writeback_single_inode()
- * and does more profound writeback list handling in writeback_sb_inodes().
+ * and does more profound writeback list handling in writeback_inodes().
  */
 static int
 writeback_single_inode(struct inode *inode, struct bdi_writeback *wb,
@@ -570,8 +544,8 @@ static long writeback_chunk_size(struct backing_dev_info *bdi,
 	 * The intended call sequence for WB_SYNC_ALL writeback is:
 	 *
 	 *      wb_writeback()
-	 *          writeback_sb_inodes()       <== called only once
-	 *              write_cache_pages()     <== called once for each inode
+	 *          writeback_inodes()       <== called only once
+	 *              write_cache_pages()  <== called once for each inode
 	 *                   (quickly) tag currently dirty pages
 	 *                   (maybe slowly) sync all tagged pages
 	 */
@@ -589,13 +563,12 @@ static long writeback_chunk_size(struct backing_dev_info *bdi,
 }
 
 /*
- * Write a portion of b_io inodes which belong to @sb.
+ * Write inodes in b_io list belonging to @work->sb (if set).
  *
  * Return the number of pages and/or inodes written.
  */
-static long writeback_sb_inodes(struct super_block *sb,
-				struct bdi_writeback *wb,
-				struct wb_writeback_work *work)
+static long writeback_inodes(struct bdi_writeback *wb,
+			     struct wb_writeback_work *work)
 {
 	struct writeback_control wbc = {
 		.sync_mode		= work->sync_mode,
@@ -614,23 +587,14 @@ static long writeback_sb_inodes(struct super_block *sb,
 	while (!list_empty(&wb->b_io)) {
 		struct inode *inode = wb_inode(wb->b_io.prev);
 
-		if (inode->i_sb != sb) {
-			if (work->sb) {
-				/*
-				 * We only want to write back data for this
-				 * superblock, move all inodes not belonging
-				 * to it back onto the dirty list.
-				 */
-				redirty_tail(inode, wb);
-				continue;
-			}
-
+		if (work->sb && inode->i_sb != work->sb) {
 			/*
-			 * The inode belongs to a different superblock.
-			 * Bounce back to the caller to unpin this and
-			 * pin the next superblock.
+			 * We only want to write back data for this
+			 * superblock, move all inodes not belonging
+			 * to it back onto the dirty list.
 			 */
-			break;
+			redirty_tail(inode, wb);
+			continue;
 		}
 
 		/*
@@ -656,7 +620,7 @@ static long writeback_sb_inodes(struct super_block *sb,
 			 */
 			spin_unlock(&inode->i_lock);
 			requeue_io(inode, wb);
-			trace_writeback_sb_inodes_requeue(inode);
+			trace_writeback_inodes_requeue(inode);
 			continue;
 		}
 		spin_unlock(&wb->list_lock);
@@ -710,40 +674,6 @@ static long writeback_sb_inodes(struct super_block *sb,
 	return wrote;
 }
 
-static long __writeback_inodes_wb(struct bdi_writeback *wb,
-				  struct wb_writeback_work *work)
-{
-	unsigned long start_time = jiffies;
-	long wrote = 0;
-
-	while (!list_empty(&wb->b_io)) {
-		struct inode *inode = wb_inode(wb->b_io.prev);
-		struct super_block *sb = inode->i_sb;
-
-		if (!grab_super_passive(sb)) {
-			/*
-			 * grab_super_passive() may fail consistently due to
-			 * s_umount being grabbed by someone else. Don't use
-			 * requeue_io() to avoid busy retrying the inode/sb.
-			 */
-			redirty_tail(inode, wb);
-			continue;
-		}
-		wrote += writeback_sb_inodes(sb, wb, work);
-		drop_super(sb);
-
-		/* refer to the same tests at the end of writeback_sb_inodes */
-		if (wrote) {
-			if (time_is_before_jiffies(start_time + HZ / 10UL))
-				break;
-			if (work->nr_pages <= 0)
-				break;
-		}
-	}
-	/* Leave any unwritten inodes on b_io */
-	return wrote;
-}
-
 static long writeback_inodes_wb(struct bdi_writeback *wb, long nr_pages,
 				enum wb_reason reason)
 {
@@ -757,7 +687,7 @@ static long writeback_inodes_wb(struct bdi_writeback *wb, long nr_pages,
 	spin_lock(&wb->list_lock);
 	if (list_empty(&wb->b_io))
 		queue_io(wb, &work);
-	__writeback_inodes_wb(wb, &work);
+	writeback_inodes(wb, &work);
 	spin_unlock(&wb->list_lock);
 
 	return nr_pages - work.nr_pages;
@@ -857,10 +787,7 @@ static long wb_writeback(struct bdi_writeback *wb,
 		trace_writeback_start(wb->bdi, work);
 		if (list_empty(&wb->b_io))
 			queue_io(wb, work);
-		if (work->sb)
-			progress = writeback_sb_inodes(work->sb, wb, work);
-		else
-			progress = __writeback_inodes_wb(wb, work);
+		progress = writeback_inodes(wb, work);
 		trace_writeback_written(wb->bdi, work);
 
 		wb_update_bandwidth(wb, wb_start);
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index cee02d65ab3f..9bf6f2da32d2 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -477,7 +477,7 @@ TRACE_EVENT(balance_dirty_pages,
 	  )
 );
 
-TRACE_EVENT(writeback_sb_inodes_requeue,
+TRACE_EVENT(writeback_inodes_requeue,
 
 	TP_PROTO(struct inode *inode),
 	TP_ARGS(inode),
-- 
1.8.1.4



* [PATCH 02/14] writeback: Remove writeback_inodes_wb()
  2014-07-31 22:00 [RFC PATCH 00/14] Per-sb tracking of dirty inodes Jan Kara
  2014-07-31 22:00 ` [PATCH 01/14] writeback: Get rid of superblock pinning Jan Kara
@ 2014-07-31 22:00 ` Jan Kara
  2014-07-31 22:00 ` [PATCH 03/14] writeback: Remove useless argument of writeback_single_inode() Jan Kara
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Jan Kara @ 2014-07-31 22:00 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: OGAWA Hirofumi, Wu Fengguang, Jan Kara

writeback_inodes_wb() has only a single user, and that is emergency
writeback when we cannot create a worker thread for writeback work. So
inline the function directly into the caller and call wb_writeback()
instead of writeback_inodes(), since that is what all other places call
when they want some work to be done. This will reduce code duplication
when we transition to per-sb writeback.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c | 29 ++++++++---------------------
 1 file changed, 8 insertions(+), 21 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index f85ee6795a28..d631ddaa642b 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -674,25 +674,6 @@ static long writeback_inodes(struct bdi_writeback *wb,
 	return wrote;
 }
 
-static long writeback_inodes_wb(struct bdi_writeback *wb, long nr_pages,
-				enum wb_reason reason)
-{
-	struct wb_writeback_work work = {
-		.nr_pages	= nr_pages,
-		.sync_mode	= WB_SYNC_NONE,
-		.range_cyclic	= 1,
-		.reason		= reason,
-	};
-
-	spin_lock(&wb->list_lock);
-	if (list_empty(&wb->b_io))
-		queue_io(wb, &work);
-	writeback_inodes(wb, &work);
-	spin_unlock(&wb->list_lock);
-
-	return nr_pages - work.nr_pages;
-}
-
 static bool over_bground_thresh(struct backing_dev_info *bdi)
 {
 	unsigned long background_thresh, dirty_thresh;
@@ -976,8 +957,14 @@ void bdi_writeback_workfn(struct work_struct *work)
 		 * the emergency worker.  Don't hog it.  Hopefully, 1024 is
 		 * enough for efficient IO.
 		 */
-		pages_written = writeback_inodes_wb(&bdi->wb, 1024,
-						    WB_REASON_FORKER_THREAD);
+		struct wb_writeback_work work = {
+			.nr_pages	= 1024,
+			.sync_mode	= WB_SYNC_NONE,
+			.range_cyclic	= 1,
+			.reason		= WB_REASON_FORKER_THREAD,
+		};
+
+		pages_written = wb_writeback(wb, &work);
 		trace_writeback_pages_written(pages_written);
 	}
 
-- 
1.8.1.4



* [PATCH 03/14] writeback: Remove useless argument of writeback_single_inode()
  2014-07-31 22:00 [RFC PATCH 00/14] Per-sb tracking of dirty inodes Jan Kara
  2014-07-31 22:00 ` [PATCH 01/14] writeback: Get rid of superblock pinning Jan Kara
  2014-07-31 22:00 ` [PATCH 02/14] writeback: Remove writeback_inodes_wb() Jan Kara
@ 2014-07-31 22:00 ` Jan Kara
  2014-07-31 22:00 ` [PATCH 04/14] writeback: Don't put inodes which cannot be written to b_more_io Jan Kara
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Jan Kara @ 2014-07-31 22:00 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: OGAWA Hirofumi, Wu Fengguang, Jan Kara

All callers of writeback_single_inode() compute the wb argument of that
function as inode_to_bdi(inode)->wb. So just look up the proper wb inside
the function.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index d631ddaa642b..dbace1c09b7f 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -477,10 +477,10 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
  * and does more profound writeback list handling in writeback_inodes().
  */
 static int
-writeback_single_inode(struct inode *inode, struct bdi_writeback *wb,
-		       struct writeback_control *wbc)
+writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
 {
 	int ret = 0;
+	struct bdi_writeback *wb = &inode_to_bdi(inode)->wb;
 
 	spin_lock(&inode->i_lock);
 	if (!atomic_read(&inode->i_count))
@@ -1320,7 +1320,6 @@ EXPORT_SYMBOL(sync_inodes_sb);
  */
 int write_inode_now(struct inode *inode, int sync)
 {
-	struct bdi_writeback *wb = &inode_to_bdi(inode)->wb;
 	struct writeback_control wbc = {
 		.nr_to_write = LONG_MAX,
 		.sync_mode = sync ? WB_SYNC_ALL : WB_SYNC_NONE,
@@ -1332,7 +1331,7 @@ int write_inode_now(struct inode *inode, int sync)
 		wbc.nr_to_write = 0;
 
 	might_sleep();
-	return writeback_single_inode(inode, wb, &wbc);
+	return writeback_single_inode(inode, &wbc);
 }
 EXPORT_SYMBOL(write_inode_now);
 
@@ -1349,7 +1348,7 @@ EXPORT_SYMBOL(write_inode_now);
  */
 int sync_inode(struct inode *inode, struct writeback_control *wbc)
 {
-	return writeback_single_inode(inode, &inode_to_bdi(inode)->wb, wbc);
+	return writeback_single_inode(inode, wbc);
 }
 EXPORT_SYMBOL(sync_inode);
 
-- 
1.8.1.4



* [PATCH 04/14] writeback: Don't put inodes which cannot be written to b_more_io
  2014-07-31 22:00 [RFC PATCH 00/14] Per-sb tracking of dirty inodes Jan Kara
                   ` (2 preceding siblings ...)
  2014-07-31 22:00 ` [PATCH 03/14] writeback: Remove useless argument of writeback_single_inode() Jan Kara
@ 2014-07-31 22:00 ` Jan Kara
  2014-07-31 22:00 ` [PATCH 05/14] writeback: Move dwork and last_old_flush into backing_dev_info Jan Kara
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Jan Kara @ 2014-07-31 22:00 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: OGAWA Hirofumi, Wu Fengguang, Jan Kara

Currently we are somewhat inconsistent in which inodes we put on b_more_io.
We put there inodes which have exhausted their writeback chunk, but we also
put there inodes which could not be written because someone else was
already writing them (the inode had I_SYNC set). OTOH if an inode couldn't
be written for other reasons (the filesystem indicates this with
pages_skipped or by simply not writing anything) we redirty_tail() the
inode.

So make things more consistent by putting on the b_more_io list only inodes
that have exhausted their writeback chunk, and calling redirty_tail() in
every case where an inode couldn't be written for some other reason. This
also makes the busyloop prevention in wb_writeback() unnecessary, since
unwriteable inodes will be moved back to the b_dirty list and thus won't
confuse us on the b_io or b_more_io lists.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c                | 26 ++++----------------------
 include/trace/events/writeback.h |  1 -
 2 files changed, 4 insertions(+), 23 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index dbace1c09b7f..6ee9ee52e3de 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -611,15 +611,12 @@ static long writeback_inodes(struct bdi_writeback *wb,
 		if ((inode->i_state & I_SYNC) && wbc.sync_mode != WB_SYNC_ALL) {
 			/*
 			 * If this inode is locked for writeback and we are not
-			 * doing writeback-for-data-integrity, move it to
-			 * b_more_io so that writeback can proceed with the
-			 * other inodes on s_io.
-			 *
-			 * We'll have another go at writing back this inode
-			 * when we completed a full scan of b_io.
+			 * doing writeback-for-data-integrity, move it back to
+			 * the dirty list so that writeback can proceed with the
+			 * other inodes on b_io.
 			 */
 			spin_unlock(&inode->i_lock);
-			requeue_io(inode, wb);
+			redirty_tail(inode, wb);
 			trace_writeback_inodes_requeue(inode);
 			continue;
 		}
@@ -722,7 +719,6 @@ static long wb_writeback(struct bdi_writeback *wb,
 	unsigned long wb_start = jiffies;
 	long nr_pages = work->nr_pages;
 	unsigned long oldest_jif;
-	struct inode *inode;
 	long progress;
 
 	oldest_jif = jiffies;
@@ -788,20 +784,6 @@ static long wb_writeback(struct bdi_writeback *wb,
 		 */
 		if (list_empty(&wb->b_more_io))
 			break;
-		/*
-		 * Nothing written. Wait for some inode to
-		 * become available for writeback. Otherwise
-		 * we'll just busyloop.
-		 */
-		if (!list_empty(&wb->b_more_io))  {
-			trace_writeback_wait(wb->bdi, work);
-			inode = wb_inode(wb->b_more_io.prev);
-			spin_lock(&inode->i_lock);
-			spin_unlock(&wb->list_lock);
-			/* This function drops i_lock... */
-			inode_sleep_on_writeback(inode);
-			spin_lock(&wb->list_lock);
-		}
 	}
 	spin_unlock(&wb->list_lock);
 
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index 9bf6f2da32d2..102e2ad9f90f 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -188,7 +188,6 @@ DEFINE_WRITEBACK_WORK_EVENT(writeback_queue);
 DEFINE_WRITEBACK_WORK_EVENT(writeback_exec);
 DEFINE_WRITEBACK_WORK_EVENT(writeback_start);
 DEFINE_WRITEBACK_WORK_EVENT(writeback_written);
-DEFINE_WRITEBACK_WORK_EVENT(writeback_wait);
 
 TRACE_EVENT(writeback_pages_written,
 	TP_PROTO(long pages_written),
-- 
1.8.1.4



* [PATCH 05/14] writeback: Move dwork and last_old_flush into backing_dev_info
  2014-07-31 22:00 [RFC PATCH 00/14] Per-sb tracking of dirty inodes Jan Kara
                   ` (3 preceding siblings ...)
  2014-07-31 22:00 ` [PATCH 04/14] writeback: Don't put inodes which cannot be written to b_more_io Jan Kara
@ 2014-07-31 22:00 ` Jan Kara
  2014-07-31 22:00 ` [PATCH 06/14] writeback: Switch locking of bandwidth fields to wb_lock Jan Kara
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Jan Kara @ 2014-07-31 22:00 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: OGAWA Hirofumi, Wu Fengguang, Jan Kara

Move dwork and last_old_flush from the bdi_writeback structure directly into
backing_dev_info. The separation between backing_dev_info and bdi_writeback
is inconsistent as it stands, so let's keep only dirty tracking state in
struct bdi_writeback. Also remove the unused nr field while we are changing
the structure.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c           | 16 ++++++++--------
 include/linux/backing-dev.h |  8 ++++----
 mm/backing-dev.c            | 14 +++++++-------
 3 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 6ee9ee52e3de..47d106ae4879 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -95,7 +95,7 @@ static void bdi_wakeup_thread(struct backing_dev_info *bdi)
 {
 	spin_lock_bh(&bdi->wb_lock);
 	if (test_bit(BDI_registered, &bdi->state))
-		mod_delayed_work(bdi_wq, &bdi->wb.dwork, 0);
+		mod_delayed_work(bdi_wq, &bdi->dwork, 0);
 	spin_unlock_bh(&bdi->wb_lock);
 }
 
@@ -111,7 +111,7 @@ static void bdi_queue_work(struct backing_dev_info *bdi,
 		goto out_unlock;
 	}
 	list_add_tail(&work->list, &bdi->work_list);
-	mod_delayed_work(bdi_wq, &bdi->wb.dwork, 0);
+	mod_delayed_work(bdi_wq, &bdi->dwork, 0);
 out_unlock:
 	spin_unlock_bh(&bdi->wb_lock);
 }
@@ -848,12 +848,12 @@ static long wb_check_old_data_flush(struct bdi_writeback *wb)
 	if (!dirty_writeback_interval)
 		return 0;
 
-	expired = wb->last_old_flush +
+	expired = wb->bdi->last_old_flush +
 			msecs_to_jiffies(dirty_writeback_interval * 10);
 	if (time_before(jiffies, expired))
 		return 0;
 
-	wb->last_old_flush = jiffies;
+	wb->bdi->last_old_flush = jiffies;
 	nr_pages = get_nr_dirty_pages();
 
 	if (nr_pages) {
@@ -913,9 +913,9 @@ static long wb_do_writeback(struct bdi_writeback *wb)
  */
 void bdi_writeback_workfn(struct work_struct *work)
 {
-	struct bdi_writeback *wb = container_of(to_delayed_work(work),
-						struct bdi_writeback, dwork);
-	struct backing_dev_info *bdi = wb->bdi;
+	struct backing_dev_info *bdi = container_of(to_delayed_work(work),
+						struct backing_dev_info, dwork);
+	struct bdi_writeback *wb = &bdi->wb;
 	long pages_written;
 
 	set_worker_desc("flush-%s", dev_name(bdi->dev));
@@ -951,7 +951,7 @@ void bdi_writeback_workfn(struct work_struct *work)
 	}
 
 	if (!list_empty(&bdi->work_list))
-		mod_delayed_work(bdi_wq, &wb->dwork, 0);
+		mod_delayed_work(bdi_wq, &bdi->dwork, 0);
 	else if (wb_has_dirty_io(wb) && dirty_writeback_interval)
 		bdi_wakeup_thread_delayed(bdi);
 
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index e488e9459a93..420750f5ed10 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -50,11 +50,7 @@ enum bdi_stat_item {
 
 struct bdi_writeback {
 	struct backing_dev_info *bdi;	/* our parent bdi */
-	unsigned int nr;
 
-	unsigned long last_old_flush;	/* last old data flush */
-
-	struct delayed_work dwork;	/* work item used for writeback */
 	struct list_head b_dirty;	/* dirty inodes */
 	struct list_head b_io;		/* parked for writeback */
 	struct list_head b_more_io;	/* parked for more writeback */
@@ -94,6 +90,10 @@ struct backing_dev_info {
 	unsigned int min_ratio;
 	unsigned int max_ratio, max_prop_frac;
 
+	unsigned long last_old_flush;	/* last old data flush */
+
+	struct delayed_work dwork;	/* work item used for writeback */
+
 	struct bdi_writeback wb;  /* default writeback info for this bdi */
 	spinlock_t wb_lock;	  /* protects work_list & wb.dwork scheduling */
 
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 1706cbbdf5f0..c44ba43d580d 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -299,7 +299,7 @@ void bdi_wakeup_thread_delayed(struct backing_dev_info *bdi)
 	timeout = msecs_to_jiffies(dirty_writeback_interval * 10);
 	spin_lock_bh(&bdi->wb_lock);
 	if (test_bit(BDI_registered, &bdi->state))
-		queue_delayed_work(bdi_wq, &bdi->wb.dwork, timeout);
+		queue_delayed_work(bdi_wq, &bdi->dwork, timeout);
 	spin_unlock_bh(&bdi->wb_lock);
 }
 
@@ -373,8 +373,8 @@ static void bdi_wb_shutdown(struct backing_dev_info *bdi)
 	 * @bdi->bdi_list is empty telling bdi_Writeback_workfn() that @bdi
 	 * is dying and its work_list needs to be drained no matter what.
 	 */
-	mod_delayed_work(bdi_wq, &bdi->wb.dwork, 0);
-	flush_delayed_work(&bdi->wb.dwork);
+	mod_delayed_work(bdi_wq, &bdi->dwork, 0);
+	flush_delayed_work(&bdi->dwork);
 	WARN_ON(!list_empty(&bdi->work_list));
 
 	/*
@@ -382,7 +382,7 @@ static void bdi_wb_shutdown(struct backing_dev_info *bdi)
 	 * unflushed dirty IO after work_list is drained.  Do it anyway
 	 * just in case.
 	 */
-	cancel_delayed_work_sync(&bdi->wb.dwork);
+	cancel_delayed_work_sync(&bdi->dwork);
 }
 
 /*
@@ -426,12 +426,10 @@ static void bdi_wb_init(struct bdi_writeback *wb, struct backing_dev_info *bdi)
 	memset(wb, 0, sizeof(*wb));
 
 	wb->bdi = bdi;
-	wb->last_old_flush = jiffies;
 	INIT_LIST_HEAD(&wb->b_dirty);
 	INIT_LIST_HEAD(&wb->b_io);
 	INIT_LIST_HEAD(&wb->b_more_io);
 	spin_lock_init(&wb->list_lock);
-	INIT_DELAYED_WORK(&wb->dwork, bdi_writeback_workfn);
 }
 
 /*
@@ -452,6 +450,8 @@ int bdi_init(struct backing_dev_info *bdi)
 	INIT_LIST_HEAD(&bdi->bdi_list);
 	INIT_LIST_HEAD(&bdi->work_list);
 
+	bdi->last_old_flush = jiffies;
+	INIT_DELAYED_WORK(&bdi->dwork, bdi_writeback_workfn);
 	bdi_wb_init(&bdi->wb, bdi);
 
 	for (i = 0; i < NR_BDI_STAT_ITEMS; i++) {
@@ -508,7 +508,7 @@ void bdi_destroy(struct backing_dev_info *bdi)
 	 * could still be pending because bdi_prune_sb() can race with the
 	 * bdi_wakeup_thread_delayed() calls from __mark_inode_dirty().
 	 */
-	cancel_delayed_work_sync(&bdi->wb.dwork);
+	cancel_delayed_work_sync(&bdi->dwork);
 
 	for (i = 0; i < NR_BDI_STAT_ITEMS; i++)
 		percpu_counter_destroy(&bdi->bdi_stat[i]);
-- 
1.8.1.4



* [PATCH 06/14] writeback: Switch locking of bandwidth fields to wb_lock
  2014-07-31 22:00 [RFC PATCH 00/14] Per-sb tracking of dirty inodes Jan Kara
                   ` (4 preceding siblings ...)
  2014-07-31 22:00 ` [PATCH 05/14] writeback: Move dwork and last_old_flush into backing_dev_info Jan Kara
@ 2014-07-31 22:00 ` Jan Kara
  2014-07-31 22:00 ` [PATCH 07/14] writeback: Provide a function to get bdi from bdi_writeback Jan Kara
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Jan Kara @ 2014-07-31 22:00 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: OGAWA Hirofumi, Wu Fengguang, Jan Kara

Currently updates of the bandwidth and ratelimit fields in backing_dev_info
are protected by bdi->wb.list_lock. Since that will transition to a per-sb
lock, make those fields protected by bdi->wb_lock instead.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c           | 6 +++++-
 include/linux/backing-dev.h | 3 ++-
 mm/page-writeback.c         | 4 ++--
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 47d106ae4879..4bf1db730b40 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -695,7 +695,11 @@ static bool over_bground_thresh(struct backing_dev_info *bdi)
 static void wb_update_bandwidth(struct bdi_writeback *wb,
 				unsigned long start_time)
 {
-	__bdi_update_bandwidth(wb->bdi, 0, 0, 0, 0, 0, start_time);
+	struct backing_dev_info *bdi = wb->bdi;
+ 
+	spin_lock_bh(&bdi->wb_lock);
+	__bdi_update_bandwidth(bdi, 0, 0, 0, 0, 0, start_time);
+	spin_unlock_bh(&bdi->wb_lock);
 }
 
 /*
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 420750f5ed10..87096947af68 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -95,7 +95,8 @@ struct backing_dev_info {
 	struct delayed_work dwork;	/* work item used for writeback */
 
 	struct bdi_writeback wb;  /* default writeback info for this bdi */
-	spinlock_t wb_lock;	  /* protects work_list & wb.dwork scheduling */
+	spinlock_t wb_lock;	  /* protects work_list & wb.dwork scheduling,
+				     updates of bandwidth & ratelimit */
 
 	struct list_head work_list;
 
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 518e2c3f4c75..a0b4776f2bd1 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1164,10 +1164,10 @@ static void bdi_update_bandwidth(struct backing_dev_info *bdi,
 {
 	if (time_is_after_eq_jiffies(bdi->bw_time_stamp + BANDWIDTH_INTERVAL))
 		return;
-	spin_lock(&bdi->wb.list_lock);
+	spin_lock_bh(&bdi->wb_lock);
 	__bdi_update_bandwidth(bdi, thresh, bg_thresh, dirty,
 			       bdi_thresh, bdi_dirty, start_time);
-	spin_unlock(&bdi->wb.list_lock);
+	spin_unlock_bh(&bdi->wb_lock);
 }
 
 /*
-- 
1.8.1.4



* [PATCH 07/14] writeback: Provide a function to get bdi from bdi_writeback
  2014-07-31 22:00 [RFC PATCH 00/14] Per-sb tracking of dirty inodes Jan Kara
                   ` (5 preceding siblings ...)
  2014-07-31 22:00 ` [PATCH 06/14] writeback: Switch locking of bandwidth fields to wb_lock Jan Kara
@ 2014-07-31 22:00 ` Jan Kara
  2014-07-31 22:00 ` [PATCH 08/14] writeback: Schedule future writeback if bdi (not wb) has dirty inodes Jan Kara
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Jan Kara @ 2014-07-31 22:00 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: OGAWA Hirofumi, Wu Fengguang, Jan Kara

When we switch to per-sb dirty tracking it won't be so trivial to get the
bdi from a bdi_writeback, so provide a helper function for that.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c                | 25 +++++++++++++------------
 include/linux/backing-dev.h      |  5 +++++
 include/trace/events/writeback.h |  2 +-
 3 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 4bf1db730b40..4247e1f7cb03 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -637,7 +637,7 @@ static long writeback_inodes(struct bdi_writeback *wb,
 		inode->i_state |= I_SYNC;
 		spin_unlock(&inode->i_lock);
 
-		write_chunk = writeback_chunk_size(wb->bdi, work);
+		write_chunk = writeback_chunk_size(wb_bdi(wb), work);
 		wbc.nr_to_write = write_chunk;
 		wbc.pages_skipped = 0;
 
@@ -695,7 +695,7 @@ static bool over_bground_thresh(struct backing_dev_info *bdi)
 static void wb_update_bandwidth(struct bdi_writeback *wb,
 				unsigned long start_time)
 {
-	struct backing_dev_info *bdi = wb->bdi;
+	struct backing_dev_info *bdi = wb_bdi(wb);
  
 	spin_lock_bh(&bdi->wb_lock);
 	__bdi_update_bandwidth(bdi, 0, 0, 0, 0, 0, start_time);
@@ -724,6 +724,7 @@ static long wb_writeback(struct bdi_writeback *wb,
 	long nr_pages = work->nr_pages;
 	unsigned long oldest_jif;
 	long progress;
+	struct backing_dev_info *bdi = wb_bdi(wb);
 
 	oldest_jif = jiffies;
 	work->older_than_this = &oldest_jif;
@@ -743,14 +744,14 @@ static long wb_writeback(struct bdi_writeback *wb,
 		 * after the other works are all done.
 		 */
 		if ((work->for_background || work->for_kupdate) &&
-		    !list_empty(&wb->bdi->work_list))
+		    !list_empty(&bdi->work_list))
 			break;
 
 		/*
 		 * For background writeout, stop when we are below the
 		 * background dirty threshold
 		 */
-		if (work->for_background && !over_bground_thresh(wb->bdi))
+		if (work->for_background && !over_bground_thresh(bdi))
 			break;
 
 		/*
@@ -765,11 +766,11 @@ static long wb_writeback(struct bdi_writeback *wb,
 		} else if (work->for_background)
 			oldest_jif = jiffies;
 
-		trace_writeback_start(wb->bdi, work);
+		trace_writeback_start(bdi, work);
 		if (list_empty(&wb->b_io))
 			queue_io(wb, work);
 		progress = writeback_inodes(wb, work);
-		trace_writeback_written(wb->bdi, work);
+		trace_writeback_written(bdi, work);
 
 		wb_update_bandwidth(wb, wb_start);
 
@@ -825,7 +826,7 @@ static unsigned long get_nr_dirty_pages(void)
 
 static long wb_check_background_flush(struct bdi_writeback *wb)
 {
-	if (over_bground_thresh(wb->bdi)) {
+	if (over_bground_thresh(wb_bdi(wb))) {
 
 		struct wb_writeback_work work = {
 			.nr_pages	= LONG_MAX,
@@ -852,12 +853,12 @@ static long wb_check_old_data_flush(struct bdi_writeback *wb)
 	if (!dirty_writeback_interval)
 		return 0;
 
-	expired = wb->bdi->last_old_flush +
+	expired = wb_bdi(wb)->last_old_flush +
 			msecs_to_jiffies(dirty_writeback_interval * 10);
 	if (time_before(jiffies, expired))
 		return 0;
 
-	wb->bdi->last_old_flush = jiffies;
+	wb_bdi(wb)->last_old_flush = jiffies;
 	nr_pages = get_nr_dirty_pages();
 
 	if (nr_pages) {
@@ -880,11 +881,11 @@ static long wb_check_old_data_flush(struct bdi_writeback *wb)
  */
 static long wb_do_writeback(struct bdi_writeback *wb)
 {
-	struct backing_dev_info *bdi = wb->bdi;
+	struct backing_dev_info *bdi = wb_bdi(wb);
 	struct wb_writeback_work *work;
 	long wrote = 0;
 
-	set_bit(BDI_writeback_running, &wb->bdi->state);
+	set_bit(BDI_writeback_running, &bdi->state);
 	while ((work = get_next_work_item(bdi)) != NULL) {
 
 		trace_writeback_exec(bdi, work);
@@ -906,7 +907,7 @@ static long wb_do_writeback(struct bdi_writeback *wb)
 	 */
 	wrote += wb_check_old_data_flush(wb);
 	wrote += wb_check_background_flush(wb);
-	clear_bit(BDI_writeback_running, &wb->bdi->state);
+	clear_bit(BDI_writeback_running, &bdi->state);
 
 	return wrote;
 }
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 87096947af68..5bfe30d6f01f 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -110,6 +110,11 @@ struct backing_dev_info {
 #endif
 };
 
+static inline struct backing_dev_info *wb_bdi(struct bdi_writeback *wb)
+{
+	return wb->bdi;
+}
+
 int __must_check bdi_init(struct backing_dev_info *bdi);
 void bdi_destroy(struct backing_dev_info *bdi);
 
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index 102e2ad9f90f..2f04fdeebc0c 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -288,7 +288,7 @@ TRACE_EVENT(writeback_queue_io,
 	),
 	TP_fast_assign(
 		unsigned long *older_than_this = work->older_than_this;
-		strncpy(__entry->name, dev_name(wb->bdi->dev), 32);
+		strncpy(__entry->name, dev_name(wb_bdi(wb)->dev), 32);
 		__entry->older	= older_than_this ?  *older_than_this : 0;
 		__entry->age	= older_than_this ?
 				  (jiffies - *older_than_this) * 1000 / HZ : -1;
-- 
1.8.1.4



* [PATCH 08/14] writeback: Schedule future writeback if bdi (not wb) has dirty inodes
From: Jan Kara @ 2014-07-31 22:00 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: OGAWA Hirofumi, Wu Fengguang, Jan Kara

bdi_writeback_workfn() uses wb_has_dirty_io() to determine whether it
should schedule writeback work in the future. With per-sb dirty tracking
we are going to have several dirty lists per bdi, so we have to use
bdi_has_dirty_io(), which properly iterates all of the lists.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 4247e1f7cb03..c8806c81e37c 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -957,7 +957,7 @@ void bdi_writeback_workfn(struct work_struct *work)
 
 	if (!list_empty(&bdi->work_list))
 		mod_delayed_work(bdi_wq, &bdi->dwork, 0);
-	else if (wb_has_dirty_io(wb) && dirty_writeback_interval)
+	else if (bdi_has_dirty_io(bdi) && dirty_writeback_interval)
 		bdi_wakeup_thread_delayed(bdi);
 
 	current->flags &= ~PF_SWAPWRITE;
-- 
1.8.1.4



* [PATCH 09/14] writeback: Switch some function arguments from bdi_writeback to bdi
From: Jan Kara @ 2014-07-31 22:00 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: OGAWA Hirofumi, Wu Fengguang, Jan Kara

Some functions take struct bdi_writeback as an argument. However, they
need to touch the corresponding bdi anyway, so just pass the bdi directly
instead, which somewhat simplifies the code. Also rename the functions to
better match their new arguments.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c | 48 ++++++++++++++++++++++--------------------------
 1 file changed, 22 insertions(+), 26 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index c8806c81e37c..b7d05c0aad14 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -35,7 +35,7 @@
 #define MIN_WRITEBACK_PAGES	(4096UL >> (PAGE_CACHE_SHIFT - 10))
 
 /*
- * Passed into wb_writeback(), essentially a subset of writeback_control
+ * Passed into bdi_writeback(), essentially a subset of writeback_control
  */
 struct wb_writeback_work {
 	long nr_pages;
@@ -543,7 +543,7 @@ static long writeback_chunk_size(struct backing_dev_info *bdi,
 	 *
 	 * The intended call sequence for WB_SYNC_ALL writeback is:
 	 *
-	 *      wb_writeback()
+	 *      bdi_writeback()
 	 *          writeback_inodes()       <== called only once
 	 *              write_cache_pages()  <== called once for each inode
 	 *                   (quickly) tag currently dirty pages
@@ -658,7 +658,7 @@ static long writeback_inodes(struct bdi_writeback *wb,
 		spin_unlock(&inode->i_lock);
 		cond_resched_lock(&wb->list_lock);
 		/*
-		 * bail out to wb_writeback() often enough to check
+		 * bail out to bdi_writeback() often enough to check
 		 * background threshold and other termination conditions.
 		 */
 		if (wrote) {
@@ -692,11 +692,9 @@ static bool over_bground_thresh(struct backing_dev_info *bdi)
  * Called under wb->list_lock. If there are multiple wb per bdi,
  * only the flusher working on the first wb should do it.
  */
-static void wb_update_bandwidth(struct bdi_writeback *wb,
-				unsigned long start_time)
+static void update_bandwidth(struct backing_dev_info *bdi,
+			     unsigned long start_time)
 {
-	struct backing_dev_info *bdi = wb_bdi(wb);
- 
 	spin_lock_bh(&bdi->wb_lock);
 	__bdi_update_bandwidth(bdi, 0, 0, 0, 0, 0, start_time);
 	spin_unlock_bh(&bdi->wb_lock);
@@ -717,14 +715,14 @@ static void wb_update_bandwidth(struct bdi_writeback *wb,
  * older_than_this takes precedence over nr_to_write.  So we'll only write back
  * all dirty pages if they are all attached to "old" mappings.
  */
-static long wb_writeback(struct bdi_writeback *wb,
-			 struct wb_writeback_work *work)
+static long bdi_writeback(struct backing_dev_info *bdi,
+			  struct wb_writeback_work *work)
 {
 	unsigned long wb_start = jiffies;
 	long nr_pages = work->nr_pages;
 	unsigned long oldest_jif;
 	long progress;
-	struct backing_dev_info *bdi = wb_bdi(wb);
+	struct bdi_writeback *wb = &bdi->wb;
 
 	oldest_jif = jiffies;
 	work->older_than_this = &oldest_jif;
@@ -772,7 +770,7 @@ static long wb_writeback(struct bdi_writeback *wb,
 		progress = writeback_inodes(wb, work);
 		trace_writeback_written(bdi, work);
 
-		wb_update_bandwidth(wb, wb_start);
+		update_bandwidth(bdi, wb_start);
 
 		/*
 		 * Did we write something? Try for more
@@ -824,9 +822,9 @@ static unsigned long get_nr_dirty_pages(void)
 		get_nr_dirty_inodes();
 }
 
-static long wb_check_background_flush(struct bdi_writeback *wb)
+static long wb_check_background_flush(struct backing_dev_info *bdi)
 {
-	if (over_bground_thresh(wb_bdi(wb))) {
+	if (over_bground_thresh(bdi)) {
 
 		struct wb_writeback_work work = {
 			.nr_pages	= LONG_MAX,
@@ -836,13 +834,13 @@ static long wb_check_background_flush(struct bdi_writeback *wb)
 			.reason		= WB_REASON_BACKGROUND,
 		};
 
-		return wb_writeback(wb, &work);
+		return bdi_writeback(bdi, &work);
 	}
 
 	return 0;
 }
 
-static long wb_check_old_data_flush(struct bdi_writeback *wb)
+static long wb_check_old_data_flush(struct backing_dev_info *bdi)
 {
 	unsigned long expired;
 	long nr_pages;
@@ -853,12 +851,12 @@ static long wb_check_old_data_flush(struct bdi_writeback *wb)
 	if (!dirty_writeback_interval)
 		return 0;
 
-	expired = wb_bdi(wb)->last_old_flush +
+	expired = bdi->last_old_flush +
 			msecs_to_jiffies(dirty_writeback_interval * 10);
 	if (time_before(jiffies, expired))
 		return 0;
 
-	wb_bdi(wb)->last_old_flush = jiffies;
+	bdi->last_old_flush = jiffies;
 	nr_pages = get_nr_dirty_pages();
 
 	if (nr_pages) {
@@ -870,7 +868,7 @@ static long wb_check_old_data_flush(struct bdi_writeback *wb)
 			.reason		= WB_REASON_PERIODIC,
 		};
 
-		return wb_writeback(wb, &work);
+		return bdi_writeback(bdi, &work);
 	}
 
 	return 0;
@@ -879,9 +877,8 @@ static long wb_check_old_data_flush(struct bdi_writeback *wb)
 /*
  * Retrieve work items and do the writeback they describe
  */
-static long wb_do_writeback(struct bdi_writeback *wb)
+static long bdi_process_work_items(struct backing_dev_info *bdi)
 {
-	struct backing_dev_info *bdi = wb_bdi(wb);
 	struct wb_writeback_work *work;
 	long wrote = 0;
 
@@ -890,7 +887,7 @@ static long wb_do_writeback(struct bdi_writeback *wb)
 
 		trace_writeback_exec(bdi, work);
 
-		wrote += wb_writeback(wb, work);
+		wrote += bdi_writeback(bdi, work);
 
 		/*
 		 * Notify the caller of completion if this is a synchronous
@@ -905,8 +902,8 @@ static long wb_do_writeback(struct bdi_writeback *wb)
 	/*
 	 * Check for periodic writeback, kupdated() style
 	 */
-	wrote += wb_check_old_data_flush(wb);
-	wrote += wb_check_background_flush(wb);
+	wrote += wb_check_old_data_flush(bdi);
+	wrote += wb_check_background_flush(bdi);
 	clear_bit(BDI_writeback_running, &bdi->state);
 
 	return wrote;
@@ -920,7 +917,6 @@ void bdi_writeback_workfn(struct work_struct *work)
 {
 	struct backing_dev_info *bdi = container_of(to_delayed_work(work),
 						struct backing_dev_info, dwork);
-	struct bdi_writeback *wb = &bdi->wb;
 	long pages_written;
 
 	set_worker_desc("flush-%s", dev_name(bdi->dev));
@@ -935,7 +931,7 @@ void bdi_writeback_workfn(struct work_struct *work)
 		 * rescuer as work_list needs to be drained.
 		 */
 		do {
-			pages_written = wb_do_writeback(wb);
+			pages_written = bdi_process_work_items(bdi);
 			trace_writeback_pages_written(pages_written);
 		} while (!list_empty(&bdi->work_list));
 	} else {
@@ -951,7 +947,7 @@ void bdi_writeback_workfn(struct work_struct *work)
 			.reason		= WB_REASON_FORKER_THREAD,
 		};
 
-		pages_written = wb_writeback(wb, &work);
+		pages_written = bdi_writeback(bdi, &work);
 		trace_writeback_pages_written(pages_written);
 	}
 
-- 
1.8.1.4



* [PATCH 10/14] writeback: Move rechecking of work list into bdi_process_work_items()
From: Jan Kara @ 2014-07-31 22:00 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: OGAWA Hirofumi, Wu Fengguang, Jan Kara

It is more logical that bdi->work_list is empty when
bdi_process_work_items() returns (modulo tiny races, since we no longer
hold wb_lock, but that is not changed by this patch). So move the retry
loop from bdi_writeback_workfn() into bdi_process_work_items().

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index b7d05c0aad14..149f8b35bab2 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -883,6 +883,7 @@ static long bdi_process_work_items(struct backing_dev_info *bdi)
 	long wrote = 0;
 
 	set_bit(BDI_writeback_running, &bdi->state);
+restart:
 	while ((work = get_next_work_item(bdi)) != NULL) {
 
 		trace_writeback_exec(bdi, work);
@@ -904,6 +905,10 @@ static long bdi_process_work_items(struct backing_dev_info *bdi)
 	 */
 	wrote += wb_check_old_data_flush(bdi);
 	wrote += wb_check_background_flush(bdi);
+
+	/* New work may have been queued while we did background writeback */
+	if (!list_empty(&bdi->work_list))
+		goto restart;
 	clear_bit(BDI_writeback_running, &bdi->state);
 
 	return wrote;
@@ -930,10 +935,7 @@ void bdi_writeback_workfn(struct work_struct *work)
 		 * if @bdi is shutting down even when we're running off the
 		 * rescuer as work_list needs to be drained.
 		 */
-		do {
-			pages_written = bdi_process_work_items(bdi);
-			trace_writeback_pages_written(pages_written);
-		} while (!list_empty(&bdi->work_list));
+		pages_written = bdi_process_work_items(bdi);
 	} else {
 		/*
 		 * bdi_wq can't get enough workers and we're running off
@@ -948,8 +950,8 @@ void bdi_writeback_workfn(struct work_struct *work)
 		};
 
 		pages_written = bdi_writeback(bdi, &work);
-		trace_writeback_pages_written(pages_written);
 	}
+	trace_writeback_pages_written(pages_written);
 
 	if (!list_empty(&bdi->work_list))
 		mod_delayed_work(bdi_wq, &bdi->dwork, 0);
-- 
1.8.1.4



* [PATCH 11/14] writeback: Shorten list_lock hold times in bdi_writeback()
From: Jan Kara @ 2014-07-31 22:00 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: OGAWA Hirofumi, Wu Fengguang, Jan Kara

We unnecessarily hold wb->list_lock while checking various conditions in
bdi_writeback(). Take list_lock only around the place where we really
need it. This will also make the transition to per-sb dirty tracking
easier.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 149f8b35bab2..a3f37b128446 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -727,7 +727,6 @@ static long bdi_writeback(struct backing_dev_info *bdi,
 	oldest_jif = jiffies;
 	work->older_than_this = &oldest_jif;
 
-	spin_lock(&wb->list_lock);
 	for (;;) {
 		/*
 		 * Stop writeback when nr_pages has been consumed
@@ -765,9 +764,11 @@ static long bdi_writeback(struct backing_dev_info *bdi,
 			oldest_jif = jiffies;
 
 		trace_writeback_start(bdi, work);
+		spin_lock(&wb->list_lock);
 		if (list_empty(&wb->b_io))
 			queue_io(wb, work);
 		progress = writeback_inodes(wb, work);
+		spin_unlock(&wb->list_lock);
 		trace_writeback_written(bdi, work);
 
 		update_bandwidth(bdi, wb_start);
@@ -788,7 +789,6 @@ static long bdi_writeback(struct backing_dev_info *bdi,
 		if (list_empty(&wb->b_more_io))
 			break;
 	}
-	spin_unlock(&wb->list_lock);
 
 	return nr_pages - work->nr_pages;
 }
-- 
1.8.1.4



* [PATCH 12/14] writeback: Move refill of b_io list into writeback_inodes()
From: Jan Kara @ 2014-07-31 22:00 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: OGAWA Hirofumi, Wu Fengguang, Jan Kara

The only place where we refill b_io is just before calling
writeback_inodes() (and that function has only one call site), so move
the refill into writeback_inodes() itself. This allows for easier
separation when transitioning to per-sb dirty tracking.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index a3f37b128446..6caf55858dcb 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -563,7 +563,8 @@ static long writeback_chunk_size(struct backing_dev_info *bdi,
 }
 
 /*
- * Write inodes in b_io list belonging to @work->sb (if set).
+ * Refill b_io list if needed and start writing inodes on that list belonging
+ * to @work->sb (if set).
  *
  * Return the number of pages and/or inodes written.
  */
@@ -584,6 +585,9 @@ static long writeback_inodes(struct bdi_writeback *wb,
 	long write_chunk;
 	long wrote = 0;  /* count both pages and inodes */
 
+	spin_lock(&wb->list_lock);
+	if (list_empty(&wb->b_io))
+		queue_io(wb, work);
 	while (!list_empty(&wb->b_io)) {
 		struct inode *inode = wb_inode(wb->b_io.prev);
 
@@ -668,6 +672,8 @@ static long writeback_inodes(struct bdi_writeback *wb,
 				break;
 		}
 	}
+	spin_unlock(&wb->list_lock);
+
 	return wrote;
 }
 
@@ -764,15 +770,10 @@ static long bdi_writeback(struct backing_dev_info *bdi,
 			oldest_jif = jiffies;
 
 		trace_writeback_start(bdi, work);
-		spin_lock(&wb->list_lock);
-		if (list_empty(&wb->b_io))
-			queue_io(wb, work);
 		progress = writeback_inodes(wb, work);
-		spin_unlock(&wb->list_lock);
 		trace_writeback_written(bdi, work);
 
 		update_bandwidth(bdi, wb_start);
-
 		/*
 		 * Did we write something? Try for more
 		 *
-- 
1.8.1.4



* [PATCH 13/14] writeback: Comment update
From: Jan Kara @ 2014-07-31 22:00 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: OGAWA Hirofumi, Wu Fengguang, Jan Kara

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c | 52 ++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 36 insertions(+), 16 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 6caf55858dcb..f9d8aa7f1ff7 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -695,8 +695,7 @@ static bool over_bground_thresh(struct backing_dev_info *bdi)
 }
 
 /*
- * Called under wb->list_lock. If there are multiple wb per bdi,
- * only the flusher working on the first wb should do it.
+ * Update writeback bandwidth estimate
  */
 static void update_bandwidth(struct backing_dev_info *bdi,
 			     unsigned long start_time)
@@ -707,19 +706,22 @@ static void update_bandwidth(struct backing_dev_info *bdi,
 }
 
 /*
- * Explicit flushing or periodic writeback of "old" data.
+ * Handle flushing according to passed work item
  *
- * Define "old": the first time one of an inode's pages is dirtied, we mark the
- * dirtying-time in the inode's address_space.  So this periodic writeback code
- * just walks the superblock inode list, writing back any inodes which are
- * older than a specific point in time.
+ * There are two 'background' type of writeback work - flushing of old data
+ * (have work->for_kupdate set) and flushing to get below dirty limit (have
+ * work->for_background set). These types of work get interrupted when someone
+ * asks for more specific writeback by queueing work item. They get restarted
+ * when the more specific writeback terminates (if still necessary).
  *
- * Try to run once per dirty_writeback_interval.  But if a writeback event
- * takes longer than a dirty_writeback_interval interval, then leave a
- * one-second gap.
+ * For writeback which can be easily livelocked (!WB_SYNC_ALL &&
+ * !work->tagged_writepages) we limit how many pages should be written for
+ * one inode (see writeback_chunk_size()). When these pages are written we
+ * continue with next inode in the dirty list to avoid spending too much time
+ * on one inode while starving other inodes from writeback.
  *
- * older_than_this takes precedence over nr_to_write.  So we'll only write back
- * all dirty pages if they are all attached to "old" mappings.
+ * Note that older_than_this takes precedence over nr_to_write. So we'll only
+ * write back all dirty pages if they are all attached to "old" enough inodes.
  */
 static long bdi_writeback(struct backing_dev_info *bdi,
 			  struct wb_writeback_work *work)
@@ -823,6 +825,14 @@ static unsigned long get_nr_dirty_pages(void)
 		get_nr_dirty_inodes();
 }
 
+/*
+ * Handle dirty data flushing when we are over background dirty limits
+ *
+ * In this type of writeback we writeback all dirty inodes (starting with
+ * the first dirtied inode) until we get below background dirty limits
+ * or until someone requests specific type of writeback from the flusher]
+ * (by queueing work item).
+ */
 static long wb_check_background_flush(struct backing_dev_info *bdi)
 {
 	if (over_bground_thresh(bdi)) {
@@ -841,6 +851,17 @@ static long wb_check_background_flush(struct backing_dev_info *bdi)
 	return 0;
 }
 
+/*
+ * Handle periodic writeback of "old" data.
+ *
+ * Define "old": the first time one of an inode's pages is dirtied, we mark the
+ * dirtying-time in inode->i_dirtied_when.  So this periodic writeback just
+ * walks the dirty inode list, writing back any inodes which are older than a
+ * specific point in time.
+ * 
+ * Note that this type of writeback can be interrupted if anyone requests
+ * specific writeback by queueing work item.
+ */
 static long wb_check_old_data_flush(struct backing_dev_info *bdi)
 {
 	unsigned long expired;
@@ -901,9 +922,7 @@ restart:
 			kfree(work);
 	}
 
-	/*
-	 * Check for periodic writeback, kupdated() style
-	 */
+	/* Check for background and kupdate-style writeback */
 	wrote += wb_check_old_data_flush(bdi);
 	wrote += wb_check_background_flush(bdi);
 
@@ -917,7 +936,8 @@ restart:
 
 /*
  * Handle writeback of dirty data for the device backed by this bdi. Also
- * reschedules periodically and does kupdated style flushing.
+ * reschedules periodically (once per dirty_writeback_interval) and does
+ * kupdated style flushing.
  */
 void bdi_writeback_workfn(struct work_struct *work)
 {
-- 
1.8.1.4



* [PATCH 14/14] writeback: Per-sb dirty tracking
@ 2014-07-31 22:00 ` Jan Kara
  2014-08-01  5:14   ` Daniel Phillips
  2014-08-05 23:44   ` Dave Chinner
  2014-08-01  5:32 ` [RFC PATCH 00/14] Per-sb tracking of dirty inodes Daniel Phillips
From: Jan Kara @ 2014-07-31 22:00 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: OGAWA Hirofumi, Wu Fengguang, Jan Kara

Switch the inode dirty tracking lists to be per superblock instead of
per bdi. This is a major step towards filesystems being able to do their
own dirty tracking and selection of inodes for writeback if they so
desire (e.g. because they journal or COW data and need to write back
inodes & pages in a specific order unknown to the generic writeback
code).

Per-superblock dirty lists also make selecting inodes for writeback
somewhat simpler because we no longer have to search for inodes from a
particular superblock for some kinds of writeback (OTOH we pay for this
by having to iterate through superblocks for bdi-wide writeback), and
this simplification will allow for an easier switch to a better scaling
data structure for dirty inodes.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/block_dev.c              |  11 +++--
 fs/fs-writeback.c           | 114 +++++++++++++++++++++++++++-----------------
 fs/inode.c                  |   8 ++--
 fs/super.c                  |   9 ++++
 include/linux/backing-dev.h |  23 +++++----
 include/linux/fs.h          |  18 +++++++
 mm/backing-dev.c            |  92 ++++++++++++++++++++++-------------
 mm/filemap.c                |   2 +-
 8 files changed, 179 insertions(+), 98 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 6d7274619bf9..01310d2c40a3 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -62,17 +62,18 @@ static void bdev_inode_switch_bdi(struct inode *inode,
 
 	if (unlikely(dst == old))		/* deadlock avoidance */
 		return;
-	bdi_lock_two(&old->wb, &dst->wb);
+	bdi_lock_two(&old->wb_queue, &dst->wb_queue);
 	spin_lock(&inode->i_lock);
 	inode->i_data.backing_dev_info = dst;
 	if (inode->i_state & I_DIRTY) {
-		if (bdi_cap_writeback_dirty(dst) && !wb_has_dirty_io(&dst->wb))
+		if (bdi_cap_writeback_dirty(dst) &&
+		    !wb_has_dirty_io(&dst->wb_queue))
 			wakeup_bdi = true;
-		list_move(&inode->i_wb_list, &dst->wb.b_dirty);
+		list_move(&inode->i_wb_list, &dst->wb_queue.b_dirty);
 	}
 	spin_unlock(&inode->i_lock);
-	spin_unlock(&old->wb.list_lock);
-	spin_unlock(&dst->wb.list_lock);
+	spin_unlock(&old->wb_queue.list_lock);
+	spin_unlock(&dst->wb_queue.list_lock);
 
 	if (wakeup_bdi)
 		bdi_wakeup_thread_delayed(dst);
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index f9d8aa7f1ff7..e80d1b9ac355 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -76,6 +76,15 @@ static inline struct backing_dev_info *inode_to_bdi(struct inode *inode)
 	return sb->s_bdi;
 }
 
+static inline struct bdi_writeback *inode_to_wb(struct inode *inode)
+{
+	struct super_block *sb = inode->i_sb;
+
+	if (sb_is_blkdev_sb(sb))
+		return &inode->i_mapping->backing_dev_info->wb_queue;
+	return &sb->s_dirty_inodes;
+}
+
 static inline struct inode *wb_inode(struct list_head *head)
 {
 	return list_entry(head, struct inode, i_wb_list);
@@ -184,11 +193,11 @@ void bdi_start_background_writeback(struct backing_dev_info *bdi)
  */
 void inode_wb_list_del(struct inode *inode)
 {
-	struct backing_dev_info *bdi = inode_to_bdi(inode);
+	struct bdi_writeback *wb_queue = inode_to_wb(inode);
 
-	spin_lock(&bdi->wb.list_lock);
+	spin_lock(&wb_queue->list_lock);
 	list_del_init(&inode->i_wb_list);
-	spin_unlock(&bdi->wb.list_lock);
+	spin_unlock(&wb_queue->list_lock);
 }
 
 /*
@@ -480,7 +489,7 @@ static int
 writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
 {
 	int ret = 0;
-	struct bdi_writeback *wb = &inode_to_bdi(inode)->wb;
+	struct bdi_writeback *wb_queue = inode_to_wb(inode);
 
 	spin_lock(&inode->i_lock);
 	if (!atomic_read(&inode->i_count))
@@ -516,7 +525,7 @@ writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
 
 	ret = __writeback_single_inode(inode, wbc);
 
-	spin_lock(&wb->list_lock);
+	spin_lock(&wb_queue->list_lock);
 	spin_lock(&inode->i_lock);
 	/*
 	 * If inode is clean, remove it from writeback lists. Otherwise don't
@@ -524,7 +533,7 @@ writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
 	 */
 	if (!(inode->i_state & I_DIRTY))
 		list_del_init(&inode->i_wb_list);
-	spin_unlock(&wb->list_lock);
+	spin_unlock(&wb_queue->list_lock);
 	inode_sync_complete(inode);
 out:
 	spin_unlock(&inode->i_lock);
@@ -563,8 +572,7 @@ static long writeback_chunk_size(struct backing_dev_info *bdi,
 }
 
 /*
- * Refill b_io list if needed and start writing inodes on that list belonging
- * to @work->sb (if set).
+ * Refill b_io list if needed and start writing inodes on that list
  *
  * Return the number of pages and/or inodes written.
  */
@@ -591,16 +599,6 @@ static long writeback_inodes(struct bdi_writeback *wb,
 	while (!list_empty(&wb->b_io)) {
 		struct inode *inode = wb_inode(wb->b_io.prev);
 
-		if (work->sb && inode->i_sb != work->sb) {
-			/*
-			 * We only want to write back data for this
-			 * superblock, move all inodes not belonging
-			 * to it back onto the dirty list.
-			 */
-			redirty_tail(inode, wb);
-			continue;
-		}
-
 		/*
 		 * Don't bother with new inodes or inodes being freed, first
 		 * kind does not need periodic writeout yet, and for the latter
@@ -672,6 +670,15 @@ static long writeback_inodes(struct bdi_writeback *wb,
 				break;
 		}
 	}
+
+	/*
+	 * In case we made no progress in current IO batch and there are no
+	 * inodes postponed for further writeback, set WB_STATE_STALLED
+	 * so that flusher doesn't busyloop in case no dirty inodes can be
+	 * written.
+	 */
+	if (!wrote && list_empty(&wb->b_more_io))
+		wb->state |= WB_STATE_STALLED;
 	spin_unlock(&wb->list_lock);
 
 	return wrote;
@@ -729,8 +736,8 @@ static long bdi_writeback(struct backing_dev_info *bdi,
 	unsigned long wb_start = jiffies;
 	long nr_pages = work->nr_pages;
 	unsigned long oldest_jif;
-	long progress;
-	struct bdi_writeback *wb = &bdi->wb;
+	long progress = 1;
+	struct bdi_writeback *wb;
 
 	oldest_jif = jiffies;
 	work->older_than_this = &oldest_jif;
@@ -771,26 +778,47 @@ static long bdi_writeback(struct backing_dev_info *bdi,
 		} else if (work->for_background)
 			oldest_jif = jiffies;
 
+		/*
+		 * If we made some progress, clear stalled state to retry other
+		 * writeback queues as well.
+		 */
+		if (progress) {
+			spin_lock_bh(&bdi->wb_lock);
+			list_for_each_entry(wb, &bdi->wq_list, bdi_list) {
+				wb->state &= ~WB_STATE_STALLED;
+			}
+			spin_unlock_bh(&bdi->wb_lock);
+		}
+
+		if (work->sb) {
+			wb = &work->sb->s_dirty_inodes;
+			if (wb->state & WB_STATE_STALLED)
+				wb = NULL;
+		} else {
+			spin_lock_bh(&bdi->wb_lock);
+			list_for_each_entry(wb, &bdi->wq_list, bdi_list) {
+				if (!(wb->state & WB_STATE_STALLED) &&
+				    wb_has_dirty_io(wb)) {
+					/*
+					 * Make us start with the following
+					 * writeback queue next time
+					 */
+					list_move(&bdi->wq_list, &wb->bdi_list);
+					goto got_wb;
+				}
+			}
+			wb = NULL;
+got_wb:
+			spin_unlock_bh(&bdi->wb_lock);
+
+		}
+		/* No more dirty inodes. Stop writeback. */
+		if (!wb)
+			break;
 		trace_writeback_start(bdi, work);
 		progress = writeback_inodes(wb, work);
 		trace_writeback_written(bdi, work);
-
 		update_bandwidth(bdi, wb_start);
-		/*
-		 * Did we write something? Try for more
-		 *
-		 * Dirty inodes are moved to b_io for writeback in batches.
-		 * The completion of the current batch does not necessarily
-		 * mean the overall work is done. So we keep looping as long
-		 * as made some progress on cleaning pages or inodes.
-		 */
-		if (progress)
-			continue;
-		/*
-		 * No more inodes for IO, bail
-		 */
-		if (list_empty(&wb->b_more_io))
-			break;
 	}
 
 	return nr_pages - work->nr_pages;
@@ -1051,7 +1079,6 @@ static noinline void block_dump___mark_inode_dirty(struct inode *inode)
 void __mark_inode_dirty(struct inode *inode, int flags)
 {
 	struct super_block *sb = inode->i_sb;
-	struct backing_dev_info *bdi = NULL;
 
 	/*
 	 * Don't do this for I_DIRTY_PAGES - that doesn't actually
@@ -1110,27 +1137,28 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 		 */
 		if (!was_dirty) {
 			bool wakeup_bdi = false;
-			bdi = inode_to_bdi(inode);
+			struct bdi_writeback *wb_queue = inode_to_wb(inode);
+			struct backing_dev_info *bdi = inode_to_bdi(inode);
 
 			spin_unlock(&inode->i_lock);
-			spin_lock(&bdi->wb.list_lock);
+			spin_lock(&wb_queue->list_lock);
 			if (bdi_cap_writeback_dirty(bdi)) {
 				WARN(!test_bit(BDI_registered, &bdi->state),
 				     "bdi-%s not registered\n", bdi->name);
 
 				/*
 				 * If this is the first dirty inode for this
-				 * bdi, we have to wake-up the corresponding
+				 * sb, we will wake-up the corresponding
 				 * bdi thread to make sure background
 				 * write-back happens later.
 				 */
-				if (!wb_has_dirty_io(&bdi->wb))
+				if (!wb_has_dirty_io(wb_queue))
 					wakeup_bdi = true;
 			}
 
 			inode->dirtied_when = jiffies;
-			list_move(&inode->i_wb_list, &bdi->wb.b_dirty);
-			spin_unlock(&bdi->wb.list_lock);
+			list_move(&inode->i_wb_list, &wb_queue->b_dirty);
+			spin_unlock(&wb_queue->list_lock);
 
 			if (wakeup_bdi)
 				bdi_wakeup_thread_delayed(bdi);
diff --git a/fs/inode.c b/fs/inode.c
index 6eecb7ff0b9a..a9d40e57f73d 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -29,8 +29,10 @@
  *   inode->i_sb->s_inode_lru, inode->i_lru
  * inode_sb_list_lock protects:
  *   sb->s_inodes, inode->i_sb_list
- * bdi->wb.list_lock protects:
- *   bdi->wb.b_{dirty,io,more_io}, inode->i_wb_list
+ * sb->s_dirty_inodes.list_lock protects:
+ *   sb->s_dirty_inodes.b_{dirty,io,more_io}, inode->i_wb_list
+ * Block device inodes are an exception and their i_wb_list is protected by
+ *   bdi->wb_queue.list_lock
  * inode_hash_lock protects:
  *   inode_hashtable, inode->i_hash
  *
@@ -40,7 +42,7 @@
  *   inode->i_lock
  *     Inode LRU list locks
  *
- * bdi->wb.list_lock
+ * sb->s_dirty_inodes.list_lock
  *   inode->i_lock
  *
  * inode_hash_lock
diff --git a/fs/super.c b/fs/super.c
index d20d5b11dedf..9e4867da6c5d 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -188,6 +188,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags)
 	INIT_HLIST_NODE(&s->s_instances);
 	INIT_HLIST_BL_HEAD(&s->s_anon);
 	INIT_LIST_HEAD(&s->s_inodes);
+	bdi_writeback_queue_init(&s->s_dirty_inodes, s);
 
 	if (list_lru_init(&s->s_dentry_lru))
 		goto fail;
@@ -995,6 +996,7 @@ struct dentry *mount_bdev(struct file_system_type *fs_type,
 			goto error;
 		}
 
+		bdi_writeback_queue_register(&s->s_dirty_inodes);
 		s->s_flags |= MS_ACTIVE;
 		bdev->bd_super = s;
 	}
@@ -1015,6 +1017,13 @@ void kill_block_super(struct super_block *sb)
 	struct block_device *bdev = sb->s_bdev;
 	fmode_t mode = sb->s_mode;
 
+	/*
+	 * Unregister superblock from periodic writeback. There may be
+	 * writeback still running for it but we call sync_filesystem() later
+	 * and that will execute only after any background writeback is stopped.
+	 * This guarantees flusher won't touch sb that's going away.
+	 */
+	bdi_writeback_queue_unregister(&sb->s_dirty_inodes);
 	bdev->bd_super = NULL;
 	generic_shutdown_super(sb);
 	sync_blockdev(bdev);
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 5bfe30d6f01f..ff3e2a3eb326 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -48,15 +48,6 @@ enum bdi_stat_item {
 
 #define BDI_STAT_BATCH (8*(1+ilog2(nr_cpu_ids)))
 
-struct bdi_writeback {
-	struct backing_dev_info *bdi;	/* our parent bdi */
-
-	struct list_head b_dirty;	/* dirty inodes */
-	struct list_head b_io;		/* parked for writeback */
-	struct list_head b_more_io;	/* parked for more writeback */
-	spinlock_t list_lock;		/* protects the b_* lists */
-};
-
 struct backing_dev_info {
 	struct list_head bdi_list;
 	unsigned long ra_pages;	/* max readahead in PAGE_CACHE_SIZE units */
@@ -94,10 +85,13 @@ struct backing_dev_info {
 
 	struct delayed_work dwork;	/* work item used for writeback */
 
-	struct bdi_writeback wb;  /* default writeback info for this bdi */
 	spinlock_t wb_lock;	  /* protects work_list & wb.dwork scheduling,
-				     updates of bandwidth & ratelimit */
+				     updates of bandwidth & ratelimit, sb_list */
 
+	struct bdi_writeback wb_queue;  /* default writeback queue for this bdi.
+					   Used for block device inodes of this
+					   device. */
+	struct list_head wq_list;	/* list of writeback queues on this bdi */
 	struct list_head work_list;
 
 	struct device *dev;
@@ -112,7 +106,9 @@ struct backing_dev_info {
 
 static inline struct backing_dev_info *wb_bdi(struct bdi_writeback *wb)
 {
-	return wb->bdi;
+	if (!wb->sb)
+		return container_of(wb, struct backing_dev_info, wb_queue);
+	return wb->sb->s_bdi;
 }
 
 int __must_check bdi_init(struct backing_dev_info *bdi);
@@ -124,6 +120,9 @@ int bdi_register(struct backing_dev_info *bdi, struct device *parent,
 int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev);
 void bdi_unregister(struct backing_dev_info *bdi);
 int __must_check bdi_setup_and_register(struct backing_dev_info *, char *, unsigned int);
+void bdi_writeback_queue_init(struct bdi_writeback *wb, struct super_block *sb);
+void bdi_writeback_queue_register(struct bdi_writeback *wb_queue);
+void bdi_writeback_queue_unregister(struct bdi_writeback *wb_queue);
 void bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages,
 			enum wb_reason reason);
 void bdi_start_background_writeback(struct backing_dev_info *bdi);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e11d60cc867b..894fb42438ab 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1172,6 +1172,22 @@ struct sb_writers {
 #endif
 };
 
+#define WB_STATE_STALLED 0x01	/* Writeback for this queue is stalled */
+
+struct bdi_writeback {
+	struct super_block *sb;		/* our parent superblock,
+					   NULL for default bdi queue */
+	struct list_head bdi_list;	/* List of writeback queues on bdi */
+
+	struct list_head b_dirty;	/* dirty inodes */
+	struct list_head b_io;		/* parked for writeback */
+	struct list_head b_more_io;	/* parked for more writeback */
+	spinlock_t list_lock;		/* protects the b_* lists */
+	int state;			/* state of writeback in this queue,
+					 * manipulated only from flusher ->
+					 * no locking */
+};
+
 struct super_block {
 	struct list_head	s_list;		/* Keep this first */
 	dev_t			s_dev;		/* search index; _not_ kdev_t */
@@ -1203,6 +1219,8 @@ struct super_block {
 	struct hlist_node	s_instances;
 	struct quota_info	s_dquot;	/* Diskquota specific options */
 
+	struct bdi_writeback	s_dirty_inodes;	/* Tracking of dirty inodes */
+
 	struct sb_writers	s_writers;
 
 	char s_id[32];				/* Informational name */
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index c44ba43d580d..10ab9c34e155 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -65,22 +65,9 @@ static void bdi_debug_init(void)
 static int bdi_debug_stats_show(struct seq_file *m, void *v)
 {
 	struct backing_dev_info *bdi = m->private;
-	struct bdi_writeback *wb = &bdi->wb;
 	unsigned long background_thresh;
 	unsigned long dirty_thresh;
 	unsigned long bdi_thresh;
-	unsigned long nr_dirty, nr_io, nr_more_io;
-	struct inode *inode;
-
-	nr_dirty = nr_io = nr_more_io = 0;
-	spin_lock(&wb->list_lock);
-	list_for_each_entry(inode, &wb->b_dirty, i_wb_list)
-		nr_dirty++;
-	list_for_each_entry(inode, &wb->b_io, i_wb_list)
-		nr_io++;
-	list_for_each_entry(inode, &wb->b_more_io, i_wb_list)
-		nr_more_io++;
-	spin_unlock(&wb->list_lock);
 
 	global_dirty_limits(&background_thresh, &dirty_thresh);
 	bdi_thresh = bdi_dirty_limit(bdi, dirty_thresh);
@@ -95,9 +82,6 @@ static int bdi_debug_stats_show(struct seq_file *m, void *v)
 		   "BdiDirtied:         %10lu kB\n"
 		   "BdiWritten:         %10lu kB\n"
 		   "BdiWriteBandwidth:  %10lu kBps\n"
-		   "b_dirty:            %10lu\n"
-		   "b_io:               %10lu\n"
-		   "b_more_io:          %10lu\n"
 		   "bdi_list:           %10u\n"
 		   "state:              %10lx\n",
 		   (unsigned long) K(bdi_stat(bdi, BDI_WRITEBACK)),
@@ -108,9 +92,6 @@ static int bdi_debug_stats_show(struct seq_file *m, void *v)
 		   (unsigned long) K(bdi_stat(bdi, BDI_DIRTIED)),
 		   (unsigned long) K(bdi_stat(bdi, BDI_WRITTEN)),
 		   (unsigned long) K(bdi->write_bandwidth),
-		   nr_dirty,
-		   nr_io,
-		   nr_more_io,
 		   !list_empty(&bdi->bdi_list), bdi->state);
 #undef K
 
@@ -275,7 +256,17 @@ subsys_initcall(default_bdi_init);
 
 int bdi_has_dirty_io(struct backing_dev_info *bdi)
 {
-	return wb_has_dirty_io(&bdi->wb);
+	struct bdi_writeback *wb_queue;
+
+	spin_lock_bh(&bdi->wb_lock);
+	list_for_each_entry(wb_queue, &bdi->wq_list, bdi_list) {
+		if (wb_has_dirty_io(wb_queue)) {
+			spin_unlock_bh(&bdi->wb_lock);
+			return 1;
+		}
+	}
+	spin_unlock_bh(&bdi->wb_lock);
+	return 0;
 }
 
 /*
@@ -421,15 +412,43 @@ void bdi_unregister(struct backing_dev_info *bdi)
 }
 EXPORT_SYMBOL(bdi_unregister);
 
-static void bdi_wb_init(struct bdi_writeback *wb, struct backing_dev_info *bdi)
+void bdi_writeback_queue_init(struct bdi_writeback *wb, struct super_block *sb)
 {
-	memset(wb, 0, sizeof(*wb));
-
-	wb->bdi = bdi;
+	wb->sb = sb;
+	INIT_LIST_HEAD(&wb->bdi_list);
 	INIT_LIST_HEAD(&wb->b_dirty);
 	INIT_LIST_HEAD(&wb->b_io);
 	INIT_LIST_HEAD(&wb->b_more_io);
 	spin_lock_init(&wb->list_lock);
+	wb->state = 0;
+}
+
+/*
+ * Register writeback queue with BDI so that background writeback is run for
+ * it.
+ */
+void bdi_writeback_queue_register(struct bdi_writeback *wb_queue)
+{
+	struct backing_dev_info *bdi = wb_bdi(wb_queue);
+
+	spin_lock_bh(&bdi->wb_lock);
+	list_add(&wb_queue->bdi_list, &bdi->wq_list);
+	spin_unlock_bh(&bdi->wb_lock);
+}
+
+/*
+ * Unregister writeback queue from BDI. No further background writeback will be
+ * started against this superblock. However note that there may be writeback
+ * still running for the sb.
+ */
+void bdi_writeback_queue_unregister(struct bdi_writeback *wb_queue)
+{
+	struct backing_dev_info *bdi = wb_bdi(wb_queue);
+
+	/* Make sure flusher cannot find the superblock any longer */
+	spin_lock_bh(&bdi->wb_lock);
+	list_del_init(&wb_queue->bdi_list);
+	spin_unlock_bh(&bdi->wb_lock);
 }
 
 /*
@@ -449,10 +468,12 @@ int bdi_init(struct backing_dev_info *bdi)
 	spin_lock_init(&bdi->wb_lock);
 	INIT_LIST_HEAD(&bdi->bdi_list);
 	INIT_LIST_HEAD(&bdi->work_list);
+	INIT_LIST_HEAD(&bdi->wq_list);
 
 	bdi->last_old_flush = jiffies;
 	INIT_DELAYED_WORK(&bdi->dwork, bdi_writeback_workfn);
-	bdi_wb_init(&bdi->wb, bdi);
+	bdi_writeback_queue_init(&bdi->wb_queue, NULL);
+	bdi_writeback_queue_register(&bdi->wb_queue);
 
 	for (i = 0; i < NR_BDI_STAT_ITEMS; i++) {
 		err = percpu_counter_init(&bdi->bdi_stat[i], 0);
@@ -486,18 +507,21 @@ void bdi_destroy(struct backing_dev_info *bdi)
 {
 	int i;
 
+	/* bdi disappearing under fs. Bad, bad, bad! */
+	BUG_ON(!list_is_singular(&bdi->wq_list));
 	/*
 	 * Splice our entries to the default_backing_dev_info, if this
-	 * bdi disappears
+	 * bdi disappears. We can still hold some device inodes in dirty lists
 	 */
-	if (bdi_has_dirty_io(bdi)) {
-		struct bdi_writeback *dst = &default_backing_dev_info.wb;
-
-		bdi_lock_two(&bdi->wb, dst);
-		list_splice(&bdi->wb.b_dirty, &dst->b_dirty);
-		list_splice(&bdi->wb.b_io, &dst->b_io);
-		list_splice(&bdi->wb.b_more_io, &dst->b_more_io);
-		spin_unlock(&bdi->wb.list_lock);
+	if (wb_has_dirty_io(&bdi->wb_queue)) {
+		struct bdi_writeback *dst = &default_backing_dev_info.wb_queue;
+		struct bdi_writeback *src = &bdi->wb_queue;
+
+		bdi_lock_two(src, dst);
+		list_splice(&src->b_dirty, &dst->b_dirty);
+		list_splice(&src->b_io, &dst->b_io);
+		list_splice(&src->b_more_io, &dst->b_more_io);
+		spin_unlock(&src->list_lock);
 		spin_unlock(&dst->list_lock);
 	}
 
diff --git a/mm/filemap.c b/mm/filemap.c
index dafb06f70a09..cbc3c647a190 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -80,7 +80,7 @@
  *  ->i_mutex			(generic_perform_write)
  *    ->mmap_sem		(fault_in_pages_readable->do_page_fault)
  *
- *  bdi->wb.list_lock
+ *  bdi->wb_queue.list_lock / sb->s_dirty_inodes.list_lock
  *    sb_lock			(fs/fs-writeback.c)
  *    ->mapping->tree_lock	(__sync_single_inode)
  *
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 14/14] writeback: Per-sb dirty tracking
  2014-07-31 22:00 ` [PATCH 14/14] writeback: Per-sb dirty tracking Jan Kara
@ 2014-08-01  5:14   ` Daniel Phillips
  2014-08-05 23:44   ` Dave Chinner
  1 sibling, 0 replies; 25+ messages in thread
From: Daniel Phillips @ 2014-08-01  5:14 UTC (permalink / raw)
  To: Jan Kara, linux-fsdevel; +Cc: OGAWA Hirofumi, Wu Fengguang

On 07/31/2014 03:00 PM, Jan Kara wrote:
> Switch inode dirty tracking lists to be per superblock instead of per
> bdi. This is a major step towards filesystems being able to do their
> own dirty tracking...

I'll say :)

> ...and selection of inodes for writeback if they desire
> so (e.g. because they journal or COW data and need to writeback inodes
> & pages in a specific order unknown to generic writeback code).

Well, I don't see an actual API here. I suppose that is
intentional. Shall we just roll one, or do you have some
suggestions? Obviously, this groundwork is just what Tux3
wants, and if I got the sense of the previous discussion
correctly, XFS could be able to benefit from it too, not
to mention others.

So... is this patch set something we should be testing
right now, or is it just for comment, or already on its
way upstream, or other? Did you get preliminary performance
data?

Regards,

Daniel

> Per superblock dirty lists also make selecting inodes for writeback
> somewhat simpler because we don't have to search for inodes from a
> particular superblock for some kinds of writeback (OTOH we pay for this
> by having to iterate through superblocks for all-bdi type of writeback)
> and this simplification will allow for an easier switch to a better
> scaling data structure for dirty inodes.



* Re: [RFC PATCH 00/14] Per-sb tracking of dirty inodes
  2014-07-31 22:00 [RFC PATCH 00/14] Per-sb tracking of dirty inodes Jan Kara
                   ` (13 preceding siblings ...)
  2014-07-31 22:00 ` [PATCH 14/14] writeback: Per-sb dirty tracking Jan Kara
@ 2014-08-01  5:32 ` Daniel Phillips
  2014-08-05  5:22 ` Dave Chinner
  2014-08-05  8:20 ` Dave Chinner
  16 siblings, 0 replies; 25+ messages in thread
From: Daniel Phillips @ 2014-08-01  5:32 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, OGAWA Hirofumi, Wu Fengguang

On Thursday, July 31, 2014 3:00:39 PM PDT, Jan Kara wrote:
>   Hello,
>
>   here is my attempt to implement per superblock tracking of dirty inodes.
> I have two motivations for this:
>   1) I've tried to get rid of overwriting of inode's dirty time stamp during
>      writeback and filtering of dirty inodes by superblock makes this
>      significantly harder. For similar reasons also improving scalability
>      of inode dirty tracking is more complicated than it has to be.
>   2) Filesystems like Tux3 (but to some extent also XFS) would like to
>      influence order in which inodes are written back. Currently this isn't
>      possible...

...without major hackery...

> Tracking dirty inodes per superblock makes it easy to later
>      implement filesystem callback for writing back inodes and also possibly
>      allow filesystems to implement their own dirty tracking if they desire.
>
>   The patches pass xfstests run and also some sync livelock avoidance tests

Which answers the question I posted before finding your RFC.

> I have with 4 filesystems on 2 disks so they should be reasonably sound.
> Before I go and base more work on this I'd like to hear some feedback about
> whether people find this sane and workable.

To us (Tux3 team) that means, apply the patches and roll up a
trial API. I don't immediately see anything insane, but then I
haven't spent the required time on it yet either.

> After this patch set it is trivial to provide a per-sb callback 
> for writeback (at level of writeback_inodes()).

Right.

> It is also fairly easy to allow filesystem to completely override dirty
> tracking (only needs some restructuring of mark_inode_dirty()). I can
> write these as a proof-of-concept patches for Tux3 guys once the general
> approach in this patch set is acked. Or if there are some in-tree users
> (XFS?, btrfs?) I can include them in the patch set.
>
> Any comments welcome!

You're a gentleman and a scholar. Please stay tuned for detailed
feedback.

Regards,

Daniel


* Re: [RFC PATCH 00/14] Per-sb tracking of dirty inodes
  2014-07-31 22:00 [RFC PATCH 00/14] Per-sb tracking of dirty inodes Jan Kara
                   ` (14 preceding siblings ...)
  2014-08-01  5:32 ` [RFC PATCH 00/14] Per-sb tracking of dirty inodes Daniel Phillips
@ 2014-08-05  5:22 ` Dave Chinner
  2014-08-05 10:31   ` Jan Kara
  2014-08-05  8:20 ` Dave Chinner
  16 siblings, 1 reply; 25+ messages in thread
From: Dave Chinner @ 2014-08-05  5:22 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, OGAWA Hirofumi, Wu Fengguang

On Fri, Aug 01, 2014 at 12:00:39AM +0200, Jan Kara wrote:
>   Hello,
> 
>   here is my attempt to implement per superblock tracking of dirty inodes.
> I have two motivations for this:
>   1) I've tried to get rid of overwriting of inode's dirty time stamp during
>      writeback and filtering of dirty inodes by superblock makes this
>      significantly harder. For similar reasons also improving scalability
>      of inode dirty tracking is more complicated than it has to be.
>   2) Filesystems like Tux3 (but to some extent also XFS) would like to
>      influence order in which inodes are written back. Currently this isn't
>      possible. Tracking dirty inodes per superblock makes it easy to later
>      implement filesystem callback for writing back inodes and also possibly
>      allow filesystems to implement their own dirty tracking if they desire.
> 
>   The patches pass xfstests run and also some sync livelock avoidance tests
> I have with 4 filesystems on 2 disks so they should be reasonably sound.
> Before I go and base more work on this I'd like to hear some feedback about
> whether people find this sane and workable.
> 
> After this patch set it is trivial to provide a per-sb callback for writeback
> (at level of writeback_inodes()). It is also fairly easy to allow filesystem to
> completely override dirty tracking (only needs some restructuring of
> mark_inode_dirty()). I can write these as a proof-of-concept patches for Tux3
> guys once the general approach in this patch set is acked. Or if there are
> some in-tree users (XFS?, btrfs?) I can include them in the patch set.
> 
> Any comments welcome!

My initial performance tests haven't shown any regressions, but
those same tests show that we still need to add plugging to
writeback_inodes(). Patch with numbers below. I haven't done any
sanity testing yet - I'll do that over the next few days...

FWIW, the patch set doesn't solve the sync lock contention problems -
populate all of memory with millions of inodes on a mounted
filesystem, then run xfs/297 on a different filesystem. The system
will trigger major contention in sync_inodes_sb() and
inode_sb_list_add() on the inode_sb_list_lock because xfs/297 will
cause lots of concurrent sync() calls to occur. The system will
perform really badly on anything filesystem related while this
contention occurs. Normally xfs/297 runs in 36s on the machine I
just ran this test on, with the extra cached inodes it's been
running for 15 minutes burning 8-9 CPU cores and there's no end in
sight....

I guess I should dig out my patchset to fix that and port it on top
of this one....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

writeback: plug writeback at a high level

From: Dave Chinner <dchinner@redhat.com>

tl;dr: 3 lines of code, 86% better fsmark throughput consuming 13%
less CPU and 43% lower runtime.

Doing writeback on lots of little files causes terrible IOPS storms
because of the per-mapping writeback plugging we do. This
essentially causes immediate dispatch of IO for each mapping,
regardless of the context in which writeback is occurring.

IOWs, running a concurrent write-lots-of-small-4k-files workload using fsmark
on XFS results in a huge number of IOPS being issued for data
writes.  Metadata writes are sorted and plugged at a high level by
XFS, so aggregate nicely into large IOs.

However, data writeback IOs are dispatched in individual 4k IOs -
even when the blocks of two consecutively written files are
adjacent - because the underlying block device is fast enough not to
congest on such IO. This behaviour is not SSD related - anything
with hardware caches is going to see the same benefits as the IO
rates are limited only by how fast adjacent IOs can be sent to the
hardware caches for aggregation.

Hence the speed of the physical device is irrelevant to this common
writeback workload (happens every time you untar a tarball!) -
performance is limited by the overhead of dispatching individual
IOs from a single writeback thread.

Test VM: 16p, 16GB RAM, 2xSSD in RAID0, 500TB sparse XFS filesystem,
metadata CRCs enabled.

Test:

$ ./fs_mark  -D  10000  -S0  -n  10000  -s  4096  -L  120  -d
/mnt/scratch/0  -d  /mnt/scratch/1  -d  /mnt/scratch/2  -d
/mnt/scratch/3  -d  /mnt/scratch/4  -d  /mnt/scratch/5  -d
/mnt/scratch/6  -d  /mnt/scratch/7

Result:
		wall	sys	create rate	Physical write IO
		time	CPU	(avg files/s)	 IOPS	Bandwidth
		-----	-----	-------------	------	---------
unpatched	5m54s	15m32s	32,500+/-2200	28,000	150MB/s
patched		3m19s	13m28s	52,900+/-1800	 1,500	280MB/s
improvement	-43.8%	-13.3%	  +62.7%	-94.6%	+86.6%

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/fs-writeback.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index e80d1b9..2e80e80 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -592,6 +592,9 @@ static long writeback_inodes(struct bdi_writeback *wb,
 	unsigned long start_time = jiffies;
 	long write_chunk;
 	long wrote = 0;  /* count both pages and inodes */
+	struct blk_plug plug;
+
+	blk_start_plug(&plug);
 
 	spin_lock(&wb->list_lock);
 	if (list_empty(&wb->b_io))
@@ -681,6 +684,8 @@ static long writeback_inodes(struct bdi_writeback *wb,
 		wb->state |= WB_STATE_STALLED;
 	spin_unlock(&wb->list_lock);
 
+	blk_finish_plug(&plug);
+
 	return wrote;
 }
 


* Re: [RFC PATCH 00/14] Per-sb tracking of dirty inodes
  2014-07-31 22:00 [RFC PATCH 00/14] Per-sb tracking of dirty inodes Jan Kara
                   ` (15 preceding siblings ...)
  2014-08-05  5:22 ` Dave Chinner
@ 2014-08-05  8:20 ` Dave Chinner
  16 siblings, 0 replies; 25+ messages in thread
From: Dave Chinner @ 2014-08-05  8:20 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, OGAWA Hirofumi, Wu Fengguang

On Fri, Aug 01, 2014 at 12:00:39AM +0200, Jan Kara wrote:
>   Hello,
> 
>   here is my attempt to implement per superblock tracking of dirty inodes.
> I have two motivations for this:
>   1) I've tried to get rid of overwriting of inode's dirty time stamp during
>      writeback and filtering of dirty inodes by superblock makes this
>      significantly harder. For similar reasons also improving scalability
>      of inode dirty tracking is more complicated than it has to be.
>   2) Filesystems like Tux3 (but to some extent also XFS) would like to
>      influence order in which inodes are written back. Currently this isn't
>      possible. Tracking dirty inodes per superblock makes it easy to later
>      implement filesystem callback for writing back inodes and also possibly
>      allow filesystems to implement their own dirty tracking if they desire.
> 
>   The patches pass xfstests run and also some sync livelock avoidance tests
> I have with 4 filesystems on 2 disks so they should be reasonably sound.
> Before I go and base more work on this I'd like to hear some feedback about
> whether people find this sane and workable.
> 
> After this patch set it is trivial to provide a per-sb callback for writeback
> (at level of writeback_inodes()). It is also fairly easy to allow filesystem to
> completely override dirty tracking (only needs some restructuring of
> mark_inode_dirty()). I can write these as a proof-of-concept patches for Tux3
> guys once the general approach in this patch set is acked. Or if there are
> some in-tree users (XFS?, btrfs?) I can include them in the patch set.
> 
> Any comments welcome!

Hi Jan,

This fails within seconds via generic/013 on a debug XFS. There is an
inode being dirtied, and it is not getting written back before
unmount evicts the inode from the cache. Hence a CONFIG_XFS_DEBUG=y
kernel assert fails like so:

[  227.620732] XFS: Assertion failed: XFS_FORCED_SHUTDOWN(ip->i_mount) || ip->i_delayed_blks == 0, file: fs/xfs/xfs_super.c, line: 963
[  227.622506] ------------[ cut here ]------------
[  227.623212] kernel BUG at fs/xfs/xfs_message.c:107!
[  227.623947] invalid opcode: 0000 [#1] SMP
[  227.624606] Modules linked in:
[  227.624724] CPU: 0 PID: 4878 Comm: umount Not tainted 3.16.0-dgc+ #371
[  227.624724] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[  227.624724] task: ffff880035973160 ti: ffff880031c14000 task.ti: ffff880031c14000
[  227.624724] RIP: 0010:[<ffffffff814d1d32>]  [<ffffffff814d1d32>] assfail+0x22/0x30
[  227.624724] RSP: 0018:ffff880031c17d88  EFLAGS: 00010282
[  227.624724] RAX: 0000000000000077 RBX: 0000000000000005 RCX: 000000000000e6e4
[  227.624724] RDX: 000000000000e4e4 RSI: 0000000000000046 RDI: 0000000000000246
[  227.624724] RBP: ffff880031c17d88 R08: 000000000000000a R09: 00000000000001e2
[  227.624724] R10: 0000000000000000 R11: ffff880031c17a3e R12: ffff880032774000
[  227.624724] R13: ffff880032774200 R14: 0000000000000005 R15: ffff880032774040
[  227.624724] FS:  00007f2b0424c840(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[  227.624724] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  227.624724] CR2: 0000000000415048 CR3: 000000003c061000 CR4: 00000000000006f0
[  227.624724] Stack:
[  227.624724]  ffff880031c17df0 ffffffff814d5128 8000000000173600 0000043c95800034
[  227.624724]  0000000000000b9b 000000000021e4ac 0000000000000034 ffff880000000001
[  227.624724]  ffff880032774200 ffff880032774288 ffffffff81d7a6e0 ffff880031c17e58
[  227.624724] Call Trace:
[  227.624724]  [<ffffffff814d5128>] xfs_fs_destroy_inode+0x198/0x1f0
[  227.624724]  [<ffffffff811c0c88>] destroy_inode+0x38/0x60
[  227.624724]  [<ffffffff811c0dc3>] evict+0x113/0x180
[  227.624724]  [<ffffffff811c0e69>] dispose_list+0x39/0x50
[  227.624724]  [<ffffffff811c1bcc>] evict_inodes+0x11c/0x130
[  227.624724]  [<ffffffff811a9118>] generic_shutdown_super+0x48/0xf0
[  227.624724]  [<ffffffff811a94ec>] kill_block_super+0x3c/0x90
[  227.624724]  [<ffffffff811a9819>] deactivate_locked_super+0x49/0x60
[  227.624724]  [<ffffffff811a9dc6>] deactivate_super+0x46/0x60
[  227.624724]  [<ffffffff811c53b6>] mntput_no_expire+0xd6/0x170
[  227.624724]  [<ffffffff811c692e>] SyS_umount+0x8e/0x100
[  227.624724]  [<ffffffff81ccbaa9>] system_call_fastpath+0x16/0x1b
[  227.624724] Code: 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 f1 41 89 d0 48 89 e5 48 89 fa 48 c7 c6 e8 62 11 82 31 ff 31 c0 e8 ce fb ff ff <0f> 0b 66 66 66
[  227.624724] RIP  [<ffffffff814d1d32>] assfail+0x22/0x30
[  227.624724]  RSP <ffff880031c17d88>
[  227.658382] ---[ end trace 3836149aa028dbf6 ]---

i.e. there are still delayed allocation blocks attached to the
inode. Tracing writeback indicates the inode is definitely dirtying
the page cache for every buffer and page dirtied, but there is no
data writeback occurring on that inode between the time it is last
dirtied and unmount evicting the inode.

I'll look into it some more, but it's happening from multiple
different "last dirtied" locations in XFS (buffered writes, sub-page
zeroing in FALLOC_FL_ZERO_RANGE, EOF zeroing from truncate extending
the file, etc) so it doesn't appear to me to be an XFS bug. Hence
you might find it faster than I will. ;)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [RFC PATCH 00/14] Per-sb tracking of dirty inodes
  2014-08-05  5:22 ` Dave Chinner
@ 2014-08-05 10:31   ` Jan Kara
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Kara @ 2014-08-05 10:31 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Jan Kara, linux-fsdevel, OGAWA Hirofumi, Wu Fengguang

On Tue 05-08-14 15:22:17, Dave Chinner wrote:
> On Fri, Aug 01, 2014 at 12:00:39AM +0200, Jan Kara wrote:
> >   Hello,
> > 
> >   here is my attempt to implement per superblock tracking of dirty inodes.
> > I have two motivations for this:
> >   1) I've tried to get rid of overwriting of inode's dirty time stamp during
> >      writeback and filtering of dirty inodes by superblock makes this
> >      significantly harder. For similar reasons also improving scalability
> >      of inode dirty tracking is more complicated than it has to be.
> >   2) Filesystems like Tux3 (but to some extent also XFS) would like to
> >      influence order in which inodes are written back. Currently this isn't
> >      possible. Tracking dirty inodes per superblock makes it easy to later
> >      implement filesystem callback for writing back inodes and also possibly
> >      allow filesystems to implement their own dirty tracking if they desire.
> > 
> >   The patches pass xfstests run and also some sync livelock avoidance tests
> > I have with 4 filesystems on 2 disks so they should be reasonably sound.
> > Before I go and base more work on this I'd like to hear some feedback about
> > whether people find this sane and workable.
> > 
> > After this patch set it is trivial to provide a per-sb callback for writeback
> > (at level of writeback_inodes()). It is also fairly easy to allow filesystem to
> > completely override dirty tracking (only needs some restructuring of
> > mark_inode_dirty()). I can write these as a proof-of-concept patches for Tux3
> > guys once the general approach in this patch set is acked. Or if there are
> > some in-tree users (XFS?, btrfs?) I can include them in the patch set.
> > 
> > Any comments welcome!
> 
> My initial performance tests haven't shown any regressions, but
> those same tests show that we still need to add plugging to
> writeback_inodes(). Patch with numbers below. I haven't done any
> sanity testing yet - I'll do that over the next few days...
  Thanks for the tests! I was concentrating on the no-regression part first,
with possible performance improvements added on top of that. I have added your
plugging patch to the series. Thanks for that.

> FWIW, the patch set doesn't solve the sync lock contention problems -
> populate all of memory with millions of inodes on a mounted
> filesystem, then run xfs/297 on a different filesystem. The system
> will trigger major contention in sync_inodes_sb() and
> inode_sb_list_add() on the inode_sb_list_lock because xfs/297 will
> cause lots of concurrent sync() calls to occur. The system will
> perform really badly on anything filesystem related while this
> contention occurs. Normally xfs/297 runs in 36s on the machine I
> just ran this test on, with the extra cached inodes it's been
> running for 15 minutes burning 8-9 CPU cores and there's no end in
> sight....
  Yes, I didn't mean to address this yet. When I was last looking into this
problem, the redirty_tail() logic was really making handling of dirty &
under-writeback inodes difficult (I didn't want to add another list_head to
struct inode for a completely separate under-writeback list). So I deferred
this until redirty_tail() gets sorted out. But maybe I should revisit this
with the per-sb dirty tracking unless you beat me to it ;).

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: [PATCH 14/14] writeback: Per-sb dirty tracking
  2014-07-31 22:00 ` [PATCH 14/14] writeback: Per-sb dirty tracking Jan Kara
  2014-08-01  5:14   ` Daniel Phillips
@ 2014-08-05 23:44   ` Dave Chinner
  2014-08-06  8:46     ` Jan Kara
  1 sibling, 1 reply; 25+ messages in thread
From: Dave Chinner @ 2014-08-05 23:44 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, OGAWA Hirofumi, Wu Fengguang

On Fri, Aug 01, 2014 at 12:00:53AM +0200, Jan Kara wrote:
> Switch inode dirty tracking lists to be per superblock instead of per
> bdi. This is a major step towards filesystems being able to do their
> own dirty tracking and selection of inodes for writeback if they desire
> so (e.g. because they journal or COW data and need to writeback inodes
> & pages in a specific order unknown to generic writeback code).
> 
> Per superblock dirty lists also make selecting inodes for writeback
> somewhat simpler because we don't have to search for inodes from a
> particular superblock for some kinds of writeback (OTOH we pay for this
> by having to iterate through superblocks for all-bdi type of writeback)
> and this simplification will allow for an easier switch to a better
> scaling data structure for dirty inodes.

I think the WB_STATE_STALLED code is buggy w.r.t. unmount.

> @@ -672,6 +670,15 @@ static long writeback_inodes(struct bdi_writeback *wb,
>  				break;
>  		}
>  	}
> +
> +	/*
> +	 * In case we made no progress in current IO batch and there are no
> +	 * inodes postponed for further writeback, set WB_STATE_STALLED
> +	 * so that flusher doesn't busyloop in case no dirty inodes can be
> +	 * written.
> +	 */
> +	if (!wrote && list_empty(&wb->b_more_io))
> +		wb->state |= WB_STATE_STALLED;
>  	spin_unlock(&wb->list_lock);

Last background writeback ends with WB_STATE_STALLED.

> @@ -771,26 +778,47 @@ static long bdi_writeback(struct backing_dev_info *bdi,
>  		} else if (work->for_background)
>  			oldest_jif = jiffies;
>  
> +		/*
> +		 * If we made some progress, clear stalled state to retry other
> +		 * writeback queues as well.
> +		 */
> +		if (progress) {
> +			spin_lock_bh(&bdi->wb_lock);
> +			list_for_each_entry(wb, &bdi->wq_list, bdi_list) {
> +				wb->state &= ~WB_STATE_STALLED;
> +			}
> +			spin_unlock_bh(&bdi->wb_lock);
> +		}

First time through we clear the stalled state by walking
&bdi->wq_list, but....

> +
> +		if (work->sb) {
> +			wb = &work->sb->s_dirty_inodes;
> +			if (wb->state & WB_STATE_STALLED)
> +				wb = NULL;

if the sb state is stalled we don't do writeback, and ....

> @@ -1015,6 +1017,13 @@ void kill_block_super(struct super_block *sb)
>  	struct block_device *bdev = sb->s_bdev;
>  	fmode_t mode = sb->s_mode;
>  
> +	/*
> +	 * Unregister superblock from periodic writeback. There may be
> +	 * writeback still running for it but we call sync_filesystem() later
> +	 * and that will execute only after any background writeback is stopped.
> +	 * This guarantees flusher won't touch sb that's going away.
> +	 */
> +	bdi_writeback_queue_unregister(&sb->s_dirty_inodes);
>  	bdev->bd_super = NULL;
>  	generic_shutdown_super(sb);

We unregister the writeback queue from the BDI before unmount runs
sync_filesystem() from generic_shutdown_super(sb), and ....

> +/*
> + * Unregister writeback queue from BDI. No further background writeback will be
> + * started against this superblock. However note that there may be writeback
> + * still running for the sb.
> + */
> +void bdi_writeback_queue_unregister(struct bdi_writeback *wb_queue)
> +{
> +	struct backing_dev_info *bdi = wb_bdi(wb_queue);
> +
> +	/* Make sure flusher cannot find the superblock any longer */
> +	spin_lock_bh(&bdi->wb_lock);
> +	list_del_init(&wb_queue->bdi_list);
> +	spin_unlock_bh(&bdi->wb_lock);
>  }

Unregistering the BDI removes it from the BDI list and hence
bdi_writeback will never clear the WB_STATE_STALLED bit on
superblocks trying to do writeback in unmount.

I'm not sure I really like this code very much - it seems to be
much more complex than it needs to be because writeback is still
managed on a per-bdi basis and the sb iteration is pretty clunky.
If we are moving to per-sb inode tracking, we should also move all
the writeback management to per-sb as well.

IMO, there's no good reason for keeping flusher threads per-bdi and
then having to iterate per-sb just to do background/periodic
writeback, and then have special cases for sb specific writeback
that avoids the bdi per-sb looping. i.e. per-sb flush work executed
by a bdi flusher thread makes a lot more sense than per-bdi
flush work that iterates superblocks.

So for the moment, I think this patch makes things worse rather than
better. I'd much prefer to see a single series that moves from per-bdi
tracking/writeback to per-sb tracking/writeback than to split the
tracking/writeback changes and then have to support a weird,
temporary, intermediate code base like this...

Ignoring that, the hack below makes this patch work for me.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

writeback: clear WB_STATE_STALLED for sb specific writeback

From: Dave Chinner <dchinner@redhat.com>

During unmount, the superblock has been removed from the bdi
writeback list, and so never has the WB_STATE_STALLED flag cleared
before writeback is attempted. Hence it never does writeback because
it sees this flag. Fix this by unconditionally clearing the flag if
work->sb is set rather than iterating the bdi....

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/fs-writeback.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index e80d1b9..6d9cd0c 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -780,12 +780,20 @@ static long bdi_writeback(struct backing_dev_info *bdi,
 
 		/*
 		 * If we made some progress, clear stalled state to retry other
-		 * writeback queues as well.
+		 * writeback queues as well. Note that unmount can remove the
+		 * wbqueue from the bdi before we get here, in which case we'll
+		 * be flushing a specific superblock and hence we have to
+		 * specifically clear the superblock stalled state.
 		 */
 		if (progress) {
 			spin_lock_bh(&bdi->wb_lock);
-			list_for_each_entry(wb, &bdi->wq_list, bdi_list) {
+			if (work->sb) {
+				wb = &work->sb->s_dirty_inodes;
 				wb->state &= ~WB_STATE_STALLED;
+			} else {
+				list_for_each_entry(wb, &bdi->wq_list, bdi_list) {
+					wb->state &= ~WB_STATE_STALLED;
+				}
 			}
 			spin_unlock_bh(&bdi->wb_lock);
 		}

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 14/14] writeback: Per-sb dirty tracking
  2014-08-05 23:44   ` Dave Chinner
@ 2014-08-06  8:46     ` Jan Kara
  2014-08-06 21:13       ` Dave Chinner
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Kara @ 2014-08-06  8:46 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Jan Kara, linux-fsdevel, OGAWA Hirofumi, Wu Fengguang

On Wed 06-08-14 09:44:16, Dave Chinner wrote:
> On Fri, Aug 01, 2014 at 12:00:53AM +0200, Jan Kara wrote:
> > Switch inode dirty tracking lists to be per superblock instead of per
> > bdi. This is a major step towards filesystems being able to do their
> > own dirty tracking and selection of inodes for writeback if they desire
> > so (e.g. because they journal or COW data and need to writeback inodes
> > & pages in a specific order unknown to generic writeback code).
> > 
> > Per superblock dirty lists also make selecting inodes for writeback
> > somewhat simpler because we don't have to search for inodes from a
> > particular superblock for some kinds of writeback (OTOH we pay for this
> > by having to iterate through superblocks for all-bdi type of writeback)
> > and this simplification will allow for an easier switch to a better
> > scaling data structure for dirty inodes.
> 
> I think the WB_STATE_STALLED code is buggy w.r.t. unmount.
> 
> > @@ -672,6 +670,15 @@ static long writeback_inodes(struct bdi_writeback *wb,
> >  				break;
> >  		}
> >  	}
> > +
> > +	/*
> > +	 * In case we made no progress in current IO batch and there are no
> > +	 * inodes postponed for further writeback, set WB_STATE_STALLED
> > +	 * so that flusher doesn't busyloop in case no dirty inodes can be
> > +	 * written.
> > +	 */
> > +	if (!wrote && list_empty(&wb->b_more_io))
> > +		wb->state |= WB_STATE_STALLED;
> >  	spin_unlock(&wb->list_lock);
> 
> Last background writeback ends with WB_STATE_STALLED.
> 
> > @@ -771,26 +778,47 @@ static long bdi_writeback(struct backing_dev_info *bdi,
> >  		} else if (work->for_background)
> >  			oldest_jif = jiffies;
> >  
> > +		/*
> > +		 * If we made some progress, clear stalled state to retry other
> > +		 * writeback queues as well.
> > +		 */
> > +		if (progress) {
> > +			spin_lock_bh(&bdi->wb_lock);
> > +			list_for_each_entry(wb, &bdi->wq_list, bdi_list) {
> > +				wb->state &= ~WB_STATE_STALLED;
> > +			}
> > +			spin_unlock_bh(&bdi->wb_lock);
> > +		}
> 
> First time through we clear the stalled state by walking
> &bdi->wq_list, but....
> 
> > +
> > +		if (work->sb) {
> > +			wb = &work->sb->s_dirty_inodes;
> > +			if (wb->state & WB_STATE_STALLED)
> > +				wb = NULL;
> 
> if the sb state is stalled we don't do writeback, and ....
> 
> > @@ -1015,6 +1017,13 @@ void kill_block_super(struct super_block *sb)
> >  	struct block_device *bdev = sb->s_bdev;
> >  	fmode_t mode = sb->s_mode;
> >  
> > +	/*
> > +	 * Unregister superblock from periodic writeback. There may be
> > +	 * writeback still running for it but we call sync_filesystem() later
> > +	 * and that will execute only after any background writeback is stopped.
> > +	 * This guarantees flusher won't touch sb that's going away.
> > +	 */
> > +	bdi_writeback_queue_unregister(&sb->s_dirty_inodes);
> >  	bdev->bd_super = NULL;
> >  	generic_shutdown_super(sb);
> 
> We unregister the writeback queue from the BDI before unmount runs
> sync_filesystem() from generic_shutdown_super(sb), and ....
> 
> > +/*
> > + * Unregister writeback queue from BDI. No further background writeback will be
> > + * started against this superblock. However note that there may be writeback
> > + * still running for the sb.
> > + */
> > +void bdi_writeback_queue_unregister(struct bdi_writeback *wb_queue)
> > +{
> > +	struct backing_dev_info *bdi = wb_bdi(wb_queue);
> > +
> > +	/* Make sure flusher cannot find the superblock any longer */
> > +	spin_lock_bh(&bdi->wb_lock);
> > +	list_del_init(&wb_queue->bdi_list);
> > +	spin_unlock_bh(&bdi->wb_lock);
> >  }
> 
> Unregistering the BDI removes it from the BDI list and hence
> bdi_writeback will never clear the WB_STATE_STALLED bit on
> superblocks trying to do writeback in unmount.
  Ah, well spotted!
 
> I'm not sure I really like this code very much - it seems to be
> muchmore complex than it needs to be because writeback is still
> managed on a per-bdi basis and the sb iteration is pretty clunky.
> If we are moving to per-sb inode tracking, we should also move all
> the writeback management to per-sb as well.
> 
> IMO, there's no good reason for keeping flusher threads per-bdi and
> then having to iterate per-sb just to do background/periodic
> writeback, and then have special cases for sb specific writeback
> that avoids the bdi per-sb looping. i.e. per-sb flush work executed
> by a bdi flusher thread makes a lot more sense than per-bdi
> flush work that iterates superblocks.
> 
> So for the moment, I think this patch makes things worse rather than
> better. I'd much prefer to see a single series that moves from per-bdi
> tracking/writeback to per-sb tracking/writeback than to split the
> tracking/writeback changes and then have to support a weird,
> temporary, intermediate code base like this...
  So when writing this series I was thinking about both possibilities -
i.e., keeping per-bdi threads and changing to per-sb threads. In the end
I've decided to start with keeping per-bdi threads and seeing how things
work out.

Regarding per-sb threads I have two unresolved questions:

1) How to handle block device inodes - when filesystem is mounted on the
block device, it would be natural to writeback such inode together with
other filesystem's inodes. When filesystem isn't mounted on the block
device we have no superblock to attach such inode to so we would have to
have some flusher for the virtual blkdev superblock that will deal with
specifics of block device superblock.

2) How to deal with multiple superblocks per device - and here I'm
convinced we should not regress writeback performance of the case where
disk is partitioned into several partitions. And I think this will require
some kind of synchronization between per-sb threads on the same device.

So overall I'm not convinced that per-sb threads will end up being simpler
than per-bdi threads. But we can try...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 14/14] writeback: Per-sb dirty tracking
  2014-08-06  8:46     ` Jan Kara
@ 2014-08-06 21:13       ` Dave Chinner
  2014-08-08 10:46         ` Jan Kara
  0 siblings, 1 reply; 25+ messages in thread
From: Dave Chinner @ 2014-08-06 21:13 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, OGAWA Hirofumi, Wu Fengguang

On Wed, Aug 06, 2014 at 10:46:39AM +0200, Jan Kara wrote:
> On Wed 06-08-14 09:44:16, Dave Chinner wrote:
> > On Fri, Aug 01, 2014 at 12:00:53AM +0200, Jan Kara wrote:
> > > Switch inode dirty tracking lists to be per superblock instead of per
> > > bdi. This is a major step towards filesystems being able to do their
> > > own dirty tracking and selection of inodes for writeback if they desire
> > > so (e.g. because they journal or COW data and need to writeback inodes
> > > & pages in a specific order unknown to generic writeback code).
> > > 
> > > Per superblock dirty lists also make selecting inodes for writeback
> > > somewhat simpler because we don't have to search for inodes from a
> > > particular superblock for some kinds of writeback (OTOH we pay for this
> > > by having to iterate through superblocks for all-bdi type of writeback)
> > > and this simplification will allow for an easier switch to a better
> > > scaling data structure for dirty inodes.
....
> > I'm not sure I really like this code very much - it seems to be
> > much more complex than it needs to be because writeback is still
> > managed on a per-bdi basis and the sb iteration is pretty clunky.
> > If we are moving to per-sb inode tracking, we should also move all
> > the writeback management to per-sb as well.
> > 
> > IMO, there's no good reason for keeping flusher threads per-bdi and
> > then having to iterate per-sb just to do background/periodic
> > writeback, and then have special cases for sb specific writeback
> > that avoids the bdi per-sb looping. i.e. per-sb flush work executed
> > by a bdi flusher thread makes a lot more sense than per-bdi
> > flush work that iterates superblocks.
> > 
> > So for the moment, I think this patch makes things worse rather than
> > better. I'd much prefer to see a single series that moves from per-bdi
> > tracking/writeback to per-sb tracking/writeback than to split the
> > tracking/writeback changes and then have to support a weird,
> > temporary, intermediate code base like this...
>   So when writing this series I was thinking about both possibilities -
> i.e., keeping per-bdi threads and changing to per-sb threads. In the end
> I've decided to start with keeping per-bdi threads and seeing how things
> work out.

> Regarding per-sb threads I have two unresolved questions:
> 
> 1) How to handle block device inodes - when filesystem is mounted on the
> block device, it would be natural to writeback such inode together with
> other filesystem's inodes. When filesystem isn't mounted on the block
> device we have no superblock to attach such inode to so we would have to
> have some flusher for the virtual blkdev superblock that will deal with
> specifics of block device superblock.

You mean for the blockdev_superblock? That's probably a good idea;
it would isolate the special-case blockdev inode writeback in its
own little function. i.e. this is crying out for a superblock
specific writeback method (just like tux3 is asking for) that simply
does:

	list_for_each_entry(inode, &blockdev_superblock->s_inodes, i_sb_list) {
		if (work->sync == WB_SYNC_ALL)
			__sync_blockdev(I_BDEV(inode), 1);
		else
			__sync_blockdev(I_BDEV(inode), 0);
	}

For data integrity operations, the higher layers call
sync_blockdev() appropriately so we don't actually have to care
about blockdev inodes during per-sb writeback at all.

With that, we get rid of another longstanding, ugly wart in the
writeback code.

> 2) How to deal with multiple superblocks per device - and here I'm
> convinced we should not regress writeback performance of the case where
> disk is partitioned into several partitions. And I think this will require
> some kind of synchronization between per-sb threads on the same device.

Right, that's why I suggested this:

> > i.e. per-sb flush work executed
> > by a bdi flusher thread makes a lot more sense than per-bdi
> > flush work that iterates superblocks.

What I mean is that every work that is executed on a BDI has
work->sb set. Functions like wb_check_background_flush and
wb_check_old_data_flush no longer take just the bdi, they also
take the sb they are going to operate on so it can be stuffed into
the work structure that is passed to bdi_writeback(). At this point,
bdi_writeback() only needs to operate on a single sb.

We still have all the same "periodic/background writeback is aborted
if new work is queued" logic so that data integrity operations
preempt background work, so there is no difference in behaviour
there.  i.e. the bdi still manages the interactions between all the
superblocks, just from a slightly different perspective: we
explicitly schedule writeback by superblock rather than by bdi.

IOWs, I'm not suggesting per-sb flusher threads, just scheduling of
the work to the existing bdi flusher thread slightly differently.

I suspect that the above two things will also make the sb-specific
writeback callout much more obvious and, in general, make the
writeback logic easier to read as there's no extraneous juggling of
special cases needed. The sb callout will handle the blockdev inode
writeback case, and the change of work scheduling gets rid of the
need to iterate the sbs on the bdi....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 14/14] writeback: Per-sb dirty tracking
  2014-08-06 21:13       ` Dave Chinner
@ 2014-08-08 10:46         ` Jan Kara
  2014-08-10 23:16           ` Dave Chinner
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Kara @ 2014-08-08 10:46 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Jan Kara, linux-fsdevel, OGAWA Hirofumi, Wu Fengguang

On Thu 07-08-14 07:13:05, Dave Chinner wrote:
> On Wed, Aug 06, 2014 at 10:46:39AM +0200, Jan Kara wrote:
> > On Wed 06-08-14 09:44:16, Dave Chinner wrote:
> > > On Fri, Aug 01, 2014 at 12:00:53AM +0200, Jan Kara wrote:
> > > > Switch inode dirty tracking lists to be per superblock instead of per
> > > > bdi. This is a major step towards filesystems being able to do their
> > > > own dirty tracking and selection of inodes for writeback if they desire
> > > > so (e.g. because they journal or COW data and need to writeback inodes
> > > > & pages in a specific order unknown to generic writeback code).
> > > > 
> > > > Per superblock dirty lists also make selecting inodes for writeback
> > > > somewhat simpler because we don't have to search for inodes from a
> > > > particular superblock for some kinds of writeback (OTOH we pay for this
> > > > by having to iterate through superblocks for all-bdi type of writeback)
> > > > and this simplification will allow for an easier switch to a better
> > > > scaling data structure for dirty inodes.
> ....
> > > I'm not sure I really like this code very much - it seems to be
> > > muchmore complex than it needs to be because writeback is still
> > > managed on a per-bdi basis and the sb iteration is pretty clunky.
> > > If we are moving to per-sb inode tracking, we should also move all
> > > the writeback management to per-sb as well.
> > > 
> > > IMO, there's no good reason for keeping flusher threads per-bdi and
> > > then having to iterate per-sb just to do background/periodic
> > > writeback, and then have special cases for sb specific writeback
> > > that avoids the bdi per-sb looping. i.e. per-sb flush work executed
> > > by a bdi flusher thread makes a lot more sense than per-bdi
> > > flush work that iterates superblocks.
> > > 
> > > So for the moment, I think this patch makes things worse rather than
> > > better. I'd much prefer to see a single series that moves from per-bdi
> > > tracking/writeback to per-sb tracking/writeback than to split the
> > > tracking/writeback changes and then have to support a weird,
> > > temporary, intermediate code base like this...
> >   So when writing this series I was thinking about both possibilities -
> > i.e., keeping per-bdi threads and changing to per-sb threads. In the end
> > I've decided to start with keeping per-bdi threads and seeing how things
> > work out.
> 
> > Regarding per-sb threads I have two unresolved questions:
> > 
> > 1) How to handle block device inodes - when filesystem is mounted on the
> > block device, it would be natural to writeback such inode together with
> > other filesystem's inodes. When filesystem isn't mounted on the block
> > device we have no superblock to attach such inode to so we would have to
> > have some flusher for the virtual blkdev superblock that will deal with
> > specifics of block device superblock.
> 
> You mean for the blockdev_superblock? That's probably a good idea;
> it would isolate the special-case blockdev inode writeback in its
> own little function. i.e. this is crying out for a superblock
> specific writeback method (just like tux3 is asking for) that simply
> does:
> 
> 	list_for_each_entry(inode, &blockdev_superblock->s_inodes, i_sb_list) {
> 		if (work->sync == WB_SYNC_ALL)
> 			__sync_blockdev(I_BDEV(inode), 1);
> 		else
> 			__sync_blockdev(I_BDEV(inode), 0);
> 	}
  Sadly, it's not that easy - you have to pass in struct writeback_control
so that writing obeys nr_to_write - otherwise you can livelock doing
writeback, and preemption of background work won't work either. So you have
to do something along the lines of what writeback_inodes() does...

Also I don't really see how to get rid of specialcasing block devices.
Normal superblocks are attached to a bdi - bdi flusher iterates all
superblocks on that bdi to do background writeback of the bdi. However
block device superblock is special - inodes of that superblock belong to
different bdis so you cannot just attach that superblock to any particular
bdi. That's why I created a special "writeback queue" for each bdi embedded
in struct backing_dev_info and use that for tracking writeback state of
block device inodes belonging to the bdi.

> For data integrity operations, the higher layers call
> sync_blockdev() appropriately so we don't actually have to care
> about blockdev inodes during per-sb writeback at all.
  Agreed.

> With that, we get rid of another longstanding, ugly wart in the
> writeback code.
> 
> > 2) How to deal with multiple superblocks per device - and here I'm
> > convinced we should not regress writeback performance of the case where
> > disk is partitioned into several partitions. And I think this will require
> > some kind of synchronization between per-sb threads on the same device.
> 
> Right, that's why I suggested this:
> 
> > > i.e. per-sb flush work executed
> > > by a bdi flusher thread makes a lot more sense than per-bdi
> > > flush work that iterates superblocks.
  Aha, sorry, I misunderstood this sentence.

> What I mean is that every work that is executed on a BDI has
> work->sb set. Functions like wb_check_background_flush and
> > wb_check_old_data_flush no longer take just the bdi, they also
> > take the sb they are going to operate on so it can be stuffed into
> the work structure that is passed to bdi_writeback(). At this point,
> bdi_writeback() only needs to operate on a single sb.
> 
> We still have all the same "periodic/background writeback is aborted
> if new work is queued" logic so that data integrity operations
> preempt background work, so there is no difference in behaviour
> there.  i.e. the bdi still manages the interactions between all the
> superblocks, just from a slightly different perspective: we
> explicitly schedule writeback by superblock rather than by bdi.
  OK, this is an interesting idea. However suppose you have two active
filesystems on a bdi. Now you have to somehow switch between these two
filesystems - I don't think you can switch to the second filesystem after
you clean the first one because that may starve the second filesystem of
background writeback for a long time and in the extreme case it may never
happen. But I suppose if we set nr_to_write sensibly we could avoid such
problems... I'll try this and see how it looks in the code.

> IOWs, I'm not suggesting per-sb flusher threads, just scheduling of
> the work to the existing bdi flusher thread slightly differently.
> 
> I suspect that the above two things will also make the sb-specific
> writeback callout much more obvious and, in general, make the
> writeback logic easier to read as there's no extraneous juggling of
> > special cases needed. The sb callout will handle the blockdev inode
> writeback case, and the change of work scheduling gets rid of the
> need to iterate the sbs on the bdi....

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 14/14] writeback: Per-sb dirty tracking
  2014-08-08 10:46         ` Jan Kara
@ 2014-08-10 23:16           ` Dave Chinner
  0 siblings, 0 replies; 25+ messages in thread
From: Dave Chinner @ 2014-08-10 23:16 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, OGAWA Hirofumi, Wu Fengguang

On Fri, Aug 08, 2014 at 12:46:19PM +0200, Jan Kara wrote:
> On Thu 07-08-14 07:13:05, Dave Chinner wrote:
> > On Wed, Aug 06, 2014 at 10:46:39AM +0200, Jan Kara wrote:
> > > On Wed 06-08-14 09:44:16, Dave Chinner wrote:
> > > > On Fri, Aug 01, 2014 at 12:00:53AM +0200, Jan Kara wrote:
> > > > > Switch inode dirty tracking lists to be per superblock instead of per
> > > > > bdi. This is a major step towards filesystems being able to do their
> > > > > own dirty tracking and selection of inodes for writeback if they desire
> > > > > so (e.g. because they journal or COW data and need to writeback inodes
> > > > > & pages in a specific order unknown to generic writeback code).
> > > > > 
> > > > > Per superblock dirty lists also make selecting inodes for writeback
> > > > > somewhat simpler because we don't have to search for inodes from a
> > > > > particular superblock for some kinds of writeback (OTOH we pay for this
> > > > > by having to iterate through superblocks for all-bdi type of writeback)
> > > > > and this simplification will allow for an easier switch to a better
> > > > > scaling data structure for dirty inodes.
> > ....
> > > > I'm not sure I really like this code very much - it seems to be
> > > > much more complex than it needs to be because writeback is still
> > > > managed on a per-bdi basis and the sb iteration is pretty clunky.
> > > > If we are moving to per-sb inode tracking, we should also move all
> > > > the writeback management to per-sb as well.
> > > > 
> > > > IMO, there's no good reason for keeping flusher threads per-bdi and
> > > > then having to iterate per-sb just to do background/periodic
> > > > writeback, and then have special cases for sb specific writeback
> > > > that avoids the bdi per-sb looping. i.e. per-sb flush work executed
> > > > by a bdi flusher thread makes a lot more sense than per-bdi
> > > > flush work that iterates superblocks.
> > > > 
> > > > So for the moment, I think this patch makes things worse rather than
> > > > better. I'd much prefer to see a single series that moves from per-bdi
> > > > tracking/writeback to per-sb tracking/writeback than to split the
> > > > tracking/writeback changes and then have to support a weird,
> > > > temporary, intermediate code base like this...
> > >   So when writing this series I was thinking about both possibilities -
> > > i.e., keeping per-bdi threads and changing to per-sb threads. In the end
> > > I've decided to start with keeping per-bdi threads and seeing how things
> > > work out.
> > 
> > > Regarding per-sb threads I have two unresolved questions:
> > > 
> > > 1) How to handle block device inodes - when filesystem is mounted on the
> > > block device, it would be natural to writeback such inode together with
> > > other filesystem's inodes. When filesystem isn't mounted on the block
> > > device we have no superblock to attach such inode to so we would have to
> > > have some flusher for the virtual blkdev superblock that will deal with
> > > specifics of block device superblock.
> > 
> > You mean for the blockdev_superblock? That's probably a good idea;
> > it would isolate the special-case blockdev inode writeback in its
> > own little function. i.e. this is crying out for a superblock
> > specific writeback method (just like tux3 is asking for) that simply
> > does:
> > 
> > 	list_for_each_entry(inode, &blockdev_superblock->s_inodes, i_sb_list) {
> > 		if (work->sync == WB_SYNC_ALL)
> > 			__sync_blockdev(I_BDEV(inode), 1);
> > 		else
> > 			__sync_blockdev(I_BDEV(inode), 0);
> > 	}
>   Sadly, it's not that easy - you have to pass in struct writeback_control
> so that writing obeys nr_to_write - otherwise you can livelock doing
> writeback, and preemption of background work won't work either. So you have
> to do something along the lines of what writeback_inodes() does...

Sure, the above is just an illustration of the concept, not a
complete solution. Indeed, the only real work __sync_blockdev() ends
up doing is do_writepages/filemap_fdatawait. We don't really care
what the inode dirty flags are - if there's nothing dirty then
do_writepages won't issue IO, but filemap_fdatawait() will still
wait on any IO in progress. Hence a real loop looks more like
iterate_bdevs() and ends up looking somewhat like:

	list_for_each_entry(inode, &blockdev_superblock->s_inodes, i_sb_list) {
		/* get inode references, drop list locks */

		if (bdi_wb_need_preempt(bdi, work))
			break;

		if (!bdi_wb_need_background(bdi, work))
			break;

		do_writepages(inode->i_mapping, wbc);
		if (wbc->sync == WB_SYNC_ALL)
			filemap_fdatawait(inode->i_mapping);

		/* drop references, lock lists */
	}

I can't see why it needs to be any more complex - syncing a blockdev
is actually really damn simple. That simplicity is actually hidden
by all the other stuff in the inode writeback path needed for
filesystem inodes. Again, this is just pseudo code, so treat it as
such....

IMO, the point of moving to per-sb writeback implementations is to
be able to make code that should be simple actually be simple.
Eveything is tangled up at the moment, so it's really hard to see
what should be simple and what actually needs to be complex.

> Also I don't really see how to get rid of specialcasing block devices.
> Normal superblocks are attached to a bdi - bdi flusher iterates all
> superblocks on that bdi to do background writeback of the bdi. However
> block device superblock is special - inodes of that superblock belong to
> different bdis so you cannot just attach that superblock to any particular
> bdi.

Then attach a pseudo BDI to the pseudo superblock list. We only need
a BDI to provide a flusher thread and scheduling of the flush
work....

> That's why I created a special "writeback queue" for each bdi embedded
> in struct backing_dev_info and use that for tracking writeback state of
> block device inodes belonging to the bdi.

AFAIC, that's a solution required by not abstracting the code in the
best manner.

> > For data integrity operations, the higher layers call
> > sync_blockdev() appropriately so we don't actually have to care
> > about blockdev inodes during per-sb writeback at all.
>   Agreed.
> 
> > With that, we get rid of another longstanding, ugly wart in the
> > writeback code.
> > 
> > > 2) How to deal with multiple superblocks per device - and here I'm
> > > convinced we should not regress writeback performance of the case where
> > > disk is partitioned into several partitions. And I think this will require
> > > some kind of synchronization between per-sb threads on the same device.
> > 
> > Right, that's why I suggested this:
> > 
> > > > i.e. per-sb flush work executed
> > > > by a bdi flusher thread makes a lot more sense than per-bdi
> > > > flush work that iterates superblocks.
>   Aha, sorry, I misunderstood this sentence.
> 
> > What I mean is that every work that is executed on a BDI has
> > work->sb set. Functions like wb_check_old_data_flush and
> > wb_check_background_flush no longer take just the bdi, they also
> > take the sb they are going to operate on so it can be stuffed into
> > the work structure that is passed to bdi_writeback(). At this point,
> > bdi_writeback() only needs to operate on a single sb.
> > 
> > We still have all the same "periodic/background writeback is aborted
> > if new work is queued" logic so that data integrity operations
> > preempt background work, so there is no difference in behaviour
> > there.  i.e. the bdi still manages the interactions between all the
> > superblocks, just from a slightly different perspective: we
> > explicitly schedule writeback by superblock rather than by bdi.
>   OK, this is an interesting idea. However suppose you have two active
> filesystems on a bdi. Now you have to somehow switch between these two
> filesystems - I don't think you can switch to the second filesystem after
> you clean the first one because that may starve the second filesystem of
> background writeback for a long time and in the extreme case it may never
> happen. But I suppose if we set nr_to_write sensibly we could avoid such
> problems... I'll try this and see how it looks in the code.

Yeah, nr_to_write management should solve this - we already have a
bandwidth estimation for the device. Making the per-sb writeback
slice be (BW / nr_sb_on_bdi) would give roughly fair behaviour
by only allowing a single background writeback execution its fair
share of the BW. If other SBs are idle, we immediately run the
current SB again with a new share....

FWIW, we already abort background writeback on a multi-sb BDI when a
work->sb work item comes in (e.g. from sync()/syncfs()). Hence we
can already starve background/periodic writeback on certain sbs on
such a BDI simply by constantly dirtying a single sb and calling
syncfs() in a loop. Hence cross-sb background writeback starvation
is already an issue. I suspect that moving to per-sb scheduling will
allow us to notice pre-emption more easily, and if starvation is
occurring allow background writeback to ignore pre-emption for a
nr_to_write slice periodically...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com



Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
2014-07-31 22:00 [RFC PATCH 00/14] Per-sb tracking of dirty inodes Jan Kara
2014-07-31 22:00 ` [PATCH 01/14] writeback: Get rid of superblock pinning Jan Kara
2014-07-31 22:00 ` [PATCH 02/14] writeback: Remove writeback_inodes_wb() Jan Kara
2014-07-31 22:00 ` [PATCH 03/14] writeback: Remove useless argument of writeback_single_inode() Jan Kara
2014-07-31 22:00 ` [PATCH 04/14] writeback: Don't put inodes which cannot be written to b_more_io Jan Kara
2014-07-31 22:00 ` [PATCH 05/14] writeback: Move dwork and last_old_flush into backing_dev_info Jan Kara
2014-07-31 22:00 ` [PATCH 06/14] writeback: Switch locking of bandwidth fields to wb_lock Jan Kara
2014-07-31 22:00 ` [PATCH 07/14] writeback: Provide a function to get bdi from bdi_writeback Jan Kara
2014-07-31 22:00 ` [PATCH 08/14] writeback: Schedule future writeback if bdi (not wb) has dirty inodes Jan Kara
2014-07-31 22:00 ` [PATCH 09/14] writeback: Switch some function arguments from bdi_writeback to bdi Jan Kara
2014-07-31 22:00 ` [PATCH 10/14] writeback: Move rechecking of work list into bdi_process_work_items() Jan Kara
2014-07-31 22:00 ` [PATCH 11/14] writeback: Shorten list_lock hold times in bdi_writeback() Jan Kara
2014-07-31 22:00 ` [PATCH 12/14] writeback: Move refill of b_io list into writeback_inodes() Jan Kara
2014-07-31 22:00 ` [PATCH 13/14] writeback: Comment update Jan Kara
2014-07-31 22:00 ` [PATCH 14/14] writeback: Per-sb dirty tracking Jan Kara
2014-08-01  5:14   ` Daniel Phillips
2014-08-05 23:44   ` Dave Chinner
2014-08-06  8:46     ` Jan Kara
2014-08-06 21:13       ` Dave Chinner
2014-08-08 10:46         ` Jan Kara
2014-08-10 23:16           ` Dave Chinner
2014-08-01  5:32 ` [RFC PATCH 00/14] Per-sb tracking of dirty inodes Daniel Phillips
2014-08-05  5:22 ` Dave Chinner
2014-08-05 10:31   ` Jan Kara
2014-08-05  8:20 ` Dave Chinner
