* [PATCHv4 00/14] kill unnecessary bdi wakeups + cleanups
@ 2010-07-23 15:05 Artem Bityutskiy
  2010-07-23 15:05 ` [PATCHv4 01/14] writeback: harmonize writeback threads naming Artem Bityutskiy
                   ` (13 more replies)
  0 siblings, 14 replies; 23+ messages in thread
From: Artem Bityutskiy @ 2010-07-23 15:05 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, linux-kernel

Hi,

here is v4 of the patch series which cleans up the bdi threads and substantially
reduces the number of unnecessary kernel wake-ups, which is very important for
battery-powered devices.

Changes since v3:
Tested more internally and found problems in the bdi shutdown path. Namely, I
forgot to move the code which wakes up processes waiting on the BDI_pending bit
to the forker thread. I also spotted a few other things. Only patch N10 was changed.

1. Removed "Reviewed-by: Christoph Hellwig <hch@lst.de>" tag from patch 10
   "writeback: move bdi threads exiting logic to the forker thread" in this
   patch-set, because it was changed substantialy, and needs new review.
2. Move 'clear_bit()' and 'wake_up_bit()' bit to bdi forker
3. Add 'smp_wmb()' and 'smp_rmb()' in bdi forker and 'bdi_wb_shutdown()'
4. In 'bdi_wb_shutdown()' first remove the bdi from 'bdi_list', and then wait
   on the 'BDI_pending' bit, not vise-versa.

Changes since v2:
Basically, added patches 12, 13, 14. Other patches are almost identical to v2,
but I fixed a few warnings about mixing tabs and spaces. Also, I tweaked the
patches as described in item 6 below.
1. Delay bdi thread wake-up as suggested by Nick Piggin, see patch N12
2. Do not invoke init_timer unnecessarily - spotted and fixed an imperfection,
   see patch N13
3. Introduce tracepoints to the code path which wakes up bdi threads for
   periodic write-back, suggested by Dave Chinner, see patch N14
4. Fix a few checkpatch.pl warnings
5. Add more "Reviewed-by" tags
6. Amend patches so that we check the optimistic case first
   (http://lkml.org/lkml/2010/7/22/112)

Changes since v1:
Basically, address all requests from Christoph except two (see below).
1.  Drop "[PATCH 01/16] writeback: do not self-wakeup"
2.  Add all "Reviewed-by"
3.  Rebase to the latest "linux-2.6-block / for-2.6.36"
4.  Re-order patches so that the independent ones would go first and could
    be picked independently
5.  Do not remove comment about "temporary measure" in the forker thread.
6.  Drop "[PATCH 01/13] writeback: remove redundant list initialization"
    because one of later patches will kill whole function, so this small
    patch is pointless
7.  Merge "[PATCH 03/13] writeback: clean-up the warning about non-registered
    bdi" with the patch which adds bdi threads wake-ups to
    '__mark_inode_dirty()'.
8.  Do not remove bdis from the bdi_list
9.  Drop "[PATCH 09/13] writeback: add to bdi_list in the forker thread"
    because we do not remove bdis from the bdi_list anymore
10. Use fewer local variables, dropping those which are not strictly needed

The following requests from Christoph were *not* addressed:

1. Restructure the loop in the bdi forker, because we have to drop the
   spinlock before forking a thread; see my answer here:
   http://lkml.org/lkml/2010/7/20/92
2. Get rid of 'BDI_pending' and use a per-bdi mutex. We cannot easily
   use a per-bdi mutex, because we would have to take it while holding
   the 'bdi_lock' spinlock. We could turn 'bdi_lock' into a mutex, though,
   and avoid dropping it before the task is created. This would eliminate
   the need for the 'BDI_pending' flag. I can do this change, if needed.


THE PROBLEM
~~~~~~~~~~~

Each block device has a corresponding "flusher" thread, which is usually seen
as "flusher-x:y" in your 'ps' output. Flusher threads are responsible for
background write-back and are used in various kernel code paths like memory
reclamation, as well as for the periodic background write-out.

The flusher threads wake up every 5 seconds and check whether they have to
write anything back or not. On idle systems with good dynamic power management
this means that they force the system to wake up from deep sleep, find out
that there is nothing to do, and waste power. This hurts small battery-powered
devices, e.g., Linux-based phones.

Idle bdi thread wake-ups do not last forever: the threads kill themselves if
nothing useful has been done for 5 minutes.

However, there is also the bdi forker thread, seen as 'bdi-default' in your
'ps' output. This thread wakes up every 5 seconds as well and checks whether it
has to fork a bdi flusher thread, in case there is dirty data on the bdi but
the bdi thread was killed. This thread never kills itself and disturbs the
system all the time. Again, this is bad for battery-powered devices.
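
For reference, both threads currently follow roughly this wake-up pattern
(a simplified sketch, not the exact kernel code; 'do_periodic_writeback()' is
a placeholder):

	for (;;) {
		do_periodic_writeback();	/* placeholder for the real work */

		set_current_state(TASK_INTERRUPTIBLE);
		/*
		 * Wake up every dirty_writeback_interval centisecs (5 secs by
		 * default), even if there is nothing to write back at all.
		 */
		schedule_timeout(msecs_to_jiffies(dirty_writeback_interval * 10));
		try_to_freeze();
	}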

THE SOLUTION
~~~~~~~~~~~~

This patch-set makes the bdi threads and the forker thread wake up only if
there is work to do; otherwise they just sleep. The main idea is to wake up the
needed thread when dirty data is added to the bdi.

To implement this:

1. I address various race conditions in the current bdi code.
2. I move the killing logic from bdi threads to the forker thread, so that we
   have one central place where we make decisions about killing inactive
   bdi threads. The reason I do this is that otherwise it is difficult to
   kill inactive threads - they never wake up, so they would never kill
   themselves. There are other technical reasons, too.
3. I add a small piece of code to '__mark_inode_dirty()' which wakes up the
   bdi thread when dirty inodes arrive (see the sketch below).
4. There are also clean-up patches and nicification patches which I found to be
   good for better code readability.
5. Some patches are just preparations which make the following real patches
   simpler and easier to review.
6. Some patches are just simplifications of current code.

With this patch-set bdi threads wake up considerably less often.
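
As a sketch of item 3 above, the wake-up helper this series ends up adding in
patch 11 looks roughly like this - wake the bdi thread if it exists, otherwise
kick the forker thread so that it re-creates the bdi thread:

	static void wakeup_bdi_thread(struct backing_dev_info *bdi)
	{
		spin_lock(&bdi->wb_lock);
		if (bdi->wb.task)
			wake_up_process(bdi->wb.task);
		else
			/*
			 * The bdi thread was killed for inactivity, so ask
			 * the forker thread to create and run it again.
			 */
			wake_up_process(default_backing_dev_info.wb.task);
		spin_unlock(&bdi->wb_lock);
	}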

v1 can be found here:
http://thread.gmane.org/gmane.linux.file-systems/43306
v2 can be found here:
http://thread.gmane.org/gmane.linux.kernel/1012439
v3 can be found here:
http://thread.gmane.org/gmane.linux.kernel/1013009

Artem.


* [PATCHv4 01/14] writeback: harmonize writeback threads naming
  2010-07-23 15:05 [PATCHv4 00/14] kill unnecessary bdi wakeups + cleanups Artem Bityutskiy
@ 2010-07-23 15:05 ` Artem Bityutskiy
  2010-07-23 15:05 ` [PATCHv4 02/14] writeback: fix possible race when creating bdi threads Artem Bityutskiy
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Artem Bityutskiy @ 2010-07-23 15:05 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, linux-kernel

From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

The write-back code mixes the words "thread" and "task" for the same thing.
This is not a big deal, but it is still an inconsistency.

hch: a convention I tend to use and I've seen in various places
is to always use _task for the storage of the task_struct pointer,
and thread everywhere else.  This especially helps with having
foo_thread for the actual thread and foo_task for a global
variable keeping the task_struct pointer

This patch renames:
* 'bdi_add_default_flusher_task()' -> 'bdi_add_default_flusher_thread()'
* 'bdi_forker_task()'              -> 'bdi_forker_thread()'

because bdi threads run 'bdi_writeback_thread()', so these names are more
consistent.

This patch also amends comments and makes them refer to the forker and bdi
threads as "threads", not "tasks".

Also, while at it, make the 'bdi_add_default_flusher_thread()' declaration use
'static void' instead of 'void static' and make checkpatch.pl happy.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/fs-writeback.c           |    2 +-
 include/linux/backing-dev.h |    2 +-
 mm/backing-dev.c            |   26 +++++++++++++-------------
 3 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index bf10cbf..5e6b7fc 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -840,7 +840,7 @@ int bdi_writeback_thread(void *data)
 
 			/*
 			 * Longest period of inactivity that we tolerate. If we
-			 * see dirty data again later, the task will get
+			 * see dirty data again later, the thread will get
 			 * recreated automatically.
 			 */
 			max_idle = max(5UL * 60 * HZ, wait_jiffies);
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index e536f3a..f0936f5 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -50,7 +50,7 @@ struct bdi_writeback {
 
 	unsigned long last_old_flush;		/* last old data flush */
 
-	struct task_struct	*task;		/* writeback task */
+	struct task_struct	*task;		/* writeback thread */
 	struct list_head	b_dirty;	/* dirty inodes */
 	struct list_head	b_io;		/* parked for writeback */
 	struct list_head	b_more_io;	/* parked for more writeback */
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index ac78a33..4e9ed2a 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -50,7 +50,7 @@ static struct timer_list sync_supers_timer;
 static int bdi_sync_supers(void *);
 static void sync_supers_timer_fn(unsigned long);
 
-static void bdi_add_default_flusher_task(struct backing_dev_info *bdi);
+static void bdi_add_default_flusher_thread(struct backing_dev_info *bdi);
 
 #ifdef CONFIG_DEBUG_FS
 #include <linux/debugfs.h>
@@ -279,10 +279,10 @@ static void bdi_flush_io(struct backing_dev_info *bdi)
 }
 
 /*
- * kupdated() used to do this. We cannot do it from the bdi_forker_task()
+ * kupdated() used to do this. We cannot do it from the bdi_forker_thread()
  * or we risk deadlocking on ->s_umount. The longer term solution would be
  * to implement sync_supers_bdi() or similar and simply do it from the
- * bdi writeback tasks individually.
+ * bdi writeback thread individually.
  */
 static int bdi_sync_supers(void *unused)
 {
@@ -318,7 +318,7 @@ static void sync_supers_timer_fn(unsigned long unused)
 	bdi_arm_supers_timer();
 }
 
-static int bdi_forker_task(void *ptr)
+static int bdi_forker_thread(void *ptr)
 {
 	struct bdi_writeback *me = ptr;
 
@@ -354,7 +354,7 @@ static int bdi_forker_task(void *ptr)
 			    !bdi_has_dirty_io(bdi))
 				continue;
 
-			bdi_add_default_flusher_task(bdi);
+			bdi_add_default_flusher_thread(bdi);
 		}
 
 		set_current_state(TASK_INTERRUPTIBLE);
@@ -376,7 +376,7 @@ static int bdi_forker_task(void *ptr)
 
 		/*
 		 * This is our real job - check for pending entries in
-		 * bdi_pending_list, and create the tasks that got added
+		 * bdi_pending_list, and create the threads that got added
 		 */
 		bdi = list_entry(bdi_pending_list.next, struct backing_dev_info,
 				 bdi_list);
@@ -387,7 +387,7 @@ static int bdi_forker_task(void *ptr)
 		wb->task = kthread_run(bdi_writeback_thread, wb, "flush-%s",
 					dev_name(bdi->dev));
 		/*
-		 * If task creation fails, then readd the bdi to
+		 * If thread creation fails, then readd the bdi to
 		 * the pending list and force writeout of the bdi
 		 * from this forker thread. That will free some memory
 		 * and we can try again.
@@ -430,10 +430,10 @@ static void bdi_add_to_pending(struct rcu_head *head)
 }
 
 /*
- * Add the default flusher task that gets created for any bdi
+ * Add the default flusher thread that gets created for any bdi
  * that has dirty data pending writeout
  */
-void static bdi_add_default_flusher_task(struct backing_dev_info *bdi)
+static void bdi_add_default_flusher_thread(struct backing_dev_info *bdi)
 {
 	if (!bdi_cap_writeback_dirty(bdi))
 		return;
@@ -445,10 +445,10 @@ void static bdi_add_default_flusher_task(struct backing_dev_info *bdi)
 	}
 
 	/*
-	 * Check with the helper whether to proceed adding a task. Will only
+	 * Check with the helper whether to proceed adding a thread. Will only
 	 * abort if we two or more simultanous calls to
-	 * bdi_add_default_flusher_task() occured, further additions will block
-	 * waiting for previous additions to finish.
+	 * bdi_add_default_flusher_thread() occured, further additions will
+	 * block waiting for previous additions to finish.
 	 */
 	if (!test_and_set_bit(BDI_pending, &bdi->state)) {
 		list_del_rcu(&bdi->bdi_list);
@@ -506,7 +506,7 @@ int bdi_register(struct backing_dev_info *bdi, struct device *parent,
 	if (bdi_cap_flush_forker(bdi)) {
 		struct bdi_writeback *wb = &bdi->wb;
 
-		wb->task = kthread_run(bdi_forker_task, wb, "bdi-%s",
+		wb->task = kthread_run(bdi_forker_thread, wb, "bdi-%s",
 						dev_name(dev));
 		if (IS_ERR(wb->task)) {
 			wb->task = NULL;
-- 
1.7.1.1



* [PATCHv4 02/14] writeback: fix possible race when creating bdi threads
  2010-07-23 15:05 [PATCHv4 00/14] kill unnecessary bdi wakeups + cleanups Artem Bityutskiy
  2010-07-23 15:05 ` [PATCHv4 01/14] writeback: harmonize writeback threads naming Artem Bityutskiy
@ 2010-07-23 15:05 ` Artem Bityutskiy
  2010-07-23 15:05 ` [PATCHv4 03/14] writeback: do not lose wake-ups in the forker thread - 1 Artem Bityutskiy
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Artem Bityutskiy @ 2010-07-23 15:05 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, linux-kernel

From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

This patch fixes a very unlikely race condition on the bdi forker thread error
path: when bdi thread creation fails, 'bdi->wb.task' may contain the error code
for a short period of time. If at the same time someone submits work to this
bdi, we can end up with an oops in 'bdi_queue_work()' while executing
'wake_up_process(wb->task)'.

This patch fixes the issue by introducing a temporary variable 'task' and
storing the possible error code there, so that 'wb->task' never takes
erroneous values.

Note, this race is very unlikely and I never hit it, so it is theoretical, but
nevertheless worth fixing.

This patch also merges 2 comments which were previously separate.
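
The shape of the fix, condensed (the diff below is authoritative and also
re-adds the bdi to the pending list on failure):

	struct task_struct *task;

	task = kthread_run(bdi_writeback_thread, &bdi->wb, "flush-%s",
			   dev_name(bdi->dev));
	if (IS_ERR(task))
		bdi_flush_io(bdi);	/* creation failed, wb.task stays NULL */
	else
		bdi->wb.task = task;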

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 mm/backing-dev.c |   28 +++++++++++-----------------
 1 files changed, 11 insertions(+), 17 deletions(-)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 4e9ed2a..327e36d 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -331,8 +331,8 @@ static int bdi_forker_thread(void *ptr)
 	set_user_nice(current, 0);
 
 	for (;;) {
+		struct task_struct *task;
 		struct backing_dev_info *bdi, *tmp;
-		struct bdi_writeback *wb;
 
 		/*
 		 * Temporary measure, we want to make sure we don't see
@@ -383,29 +383,23 @@ static int bdi_forker_thread(void *ptr)
 		list_del_init(&bdi->bdi_list);
 		spin_unlock_bh(&bdi_lock);
 
-		wb = &bdi->wb;
-		wb->task = kthread_run(bdi_writeback_thread, wb, "flush-%s",
-					dev_name(bdi->dev));
-		/*
-		 * If thread creation fails, then readd the bdi to
-		 * the pending list and force writeout of the bdi
-		 * from this forker thread. That will free some memory
-		 * and we can try again.
-		 */
-		if (IS_ERR(wb->task)) {
-			wb->task = NULL;
-
+		task = kthread_run(bdi_writeback_thread, &bdi->wb, "flush-%s",
+				   dev_name(bdi->dev));
+		if (IS_ERR(task)) {
 			/*
-			 * Add this 'bdi' to the back, so we get
-			 * a chance to flush other bdi's to free
-			 * memory.
+			 * If thread creation fails, then readd the bdi back to
+			 * the list and force writeout of the bdi from this
+			 * forker thread. That will free some memory and we can
+			 * try again. Add it to the tail so we get a chance to
+			 * flush other bdi's to free memory.
 			 */
 			spin_lock_bh(&bdi_lock);
 			list_add_tail(&bdi->bdi_list, &bdi_pending_list);
 			spin_unlock_bh(&bdi_lock);
 
 			bdi_flush_io(bdi);
-		}
+		} else
+			bdi->wb.task = task;
 	}
 
 	return 0;
-- 
1.7.1.1



* [PATCHv4 03/14] writeback: do not lose wake-ups in the forker thread - 1
  2010-07-23 15:05 [PATCHv4 00/14] kill unnecessary bdi wakeups + cleanups Artem Bityutskiy
  2010-07-23 15:05 ` [PATCHv4 01/14] writeback: harmonize writeback threads naming Artem Bityutskiy
  2010-07-23 15:05 ` [PATCHv4 02/14] writeback: fix possible race when creating bdi threads Artem Bityutskiy
@ 2010-07-23 15:05 ` Artem Bityutskiy
  2010-07-23 15:05 ` [PATCHv4 04/14] writeback: do not lose wake-ups in the forker thread - 2 Artem Bityutskiy
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Artem Bityutskiy @ 2010-07-23 15:05 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, linux-kernel

From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

Currently the forker thread can lose wake-ups, which may lead to unnecessary
delays in processing bdi works. E.g., consider the following scenario.

1. 'bdi_forker_thread()' walks the 'bdi_list', finds out there is nothing to
   do, and is about to finish the loop.
2. A bdi thread decides to exit because it was inactive for a long time.
3. 'bdi_queue_work()' adds a work to the bdi whose thread just exited, so it
   wakes up the forker thread.
4. But 'bdi_forker_thread()' executes 'set_current_state(TASK_INTERRUPTIBLE)'
   and goes to sleep. We lose a wake-up.

Losing the wake-up is not fatal, but it means that the bdi work processing
will be delayed by up to 5 seconds. This race is theoretical - I have never
hit it - but it is worth fixing.

The fix is to execute 'set_current_state(TASK_INTERRUPTIBLE)' _before_ walking
'bdi_list', not after.
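
For clarity, this is the classic pattern the fix follows (simplified;
'have_work()' and 'process_work()' are placeholders for the real checks):

	set_current_state(TASK_INTERRUPTIBLE);

	if (have_work()) {
		/* A wake-up which arrived in between just leaves us runnable. */
		__set_current_state(TASK_RUNNING);
		process_work();
	} else {
		/* Safe: a wake-up after the state change is not lost. */
		schedule();
	}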

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 mm/backing-dev.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 327e36d..b1dc2d4 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -342,6 +342,7 @@ static int bdi_forker_thread(void *ptr)
 			wb_do_writeback(me, 0);
 
 		spin_lock_bh(&bdi_lock);
+		set_current_state(TASK_INTERRUPTIBLE);
 
 		/*
 		 * Check if any existing bdi's have dirty data without
@@ -357,8 +358,6 @@ static int bdi_forker_thread(void *ptr)
 			bdi_add_default_flusher_thread(bdi);
 		}
 
-		set_current_state(TASK_INTERRUPTIBLE);
-
 		if (list_empty(&bdi_pending_list)) {
 			unsigned long wait;
 
-- 
1.7.1.1



* [PATCHv4 04/14] writeback: do not lose wake-ups in the forker thread - 2
  2010-07-23 15:05 [PATCHv4 00/14] kill unnecessary bdi wakeups + cleanups Artem Bityutskiy
                   ` (2 preceding siblings ...)
  2010-07-23 15:05 ` [PATCHv4 03/14] writeback: do not lose wake-ups in the forker thread - 1 Artem Bityutskiy
@ 2010-07-23 15:05 ` Artem Bityutskiy
  2010-07-23 15:05 ` [PATCHv4 05/14] writeback: do not lose wake-ups in bdi threads Artem Bityutskiy
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Artem Bityutskiy @ 2010-07-23 15:05 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, linux-kernel

From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

Currently, if someone submits jobs for the default bdi, we can lose wake-up
events. E.g., this can happen if 'bdi_queue_work()' is called when
'bdi_forker_thread()' is executing code after 'wb_do_writeback(me, 0)', but
before 'set_current_state(TASK_INTERRUPTIBLE)'.

This situation is unlikely, and the result is not very severe - we'll just
delay the execution of the work, but this is still not very nice.

This patch fixes the issue by checking whether the default bdi has work pending
before the forker thread goes to sleep.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 mm/backing-dev.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index b1dc2d4..72e6eb9 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -358,6 +358,10 @@ static int bdi_forker_thread(void *ptr)
 			bdi_add_default_flusher_thread(bdi);
 		}
 
+		/* Keep working if default bdi still has things to do */
+		if (!list_empty(&me->bdi->work_list))
+			__set_current_state(TASK_RUNNING);
+
 		if (list_empty(&bdi_pending_list)) {
 			unsigned long wait;
 
-- 
1.7.1.1



* [PATCHv4 05/14] writeback: do not lose wake-ups in bdi threads
  2010-07-23 15:05 [PATCHv4 00/14] kill unnecessary bdi wakeups + cleanups Artem Bityutskiy
                   ` (3 preceding siblings ...)
  2010-07-23 15:05 ` [PATCHv4 04/14] writeback: do not lose wake-ups in the forker thread - 2 Artem Bityutskiy
@ 2010-07-23 15:05 ` Artem Bityutskiy
  2010-07-23 15:05 ` [PATCHv4 06/14] writeback: simplify bdi code a little Artem Bityutskiy
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Artem Bityutskiy @ 2010-07-23 15:05 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, linux-kernel

From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

Currently, bdi threads ('bdi_writeback_thread()') can lose wake-ups. For
example, this happens if 'bdi_queue_work()' is executed after the bdi thread
has finished 'wb_do_writeback()' but before it calls
'schedule_timeout_interruptible()'.

To fix this issue, we have to check whether we have work to process after we
have changed the task state to 'TASK_INTERRUPTIBLE'.

This patch also cleans up the handling of the cases when
'dirty_writeback_interval' is zero and when it is non-zero.

Additionally, this patch removes an unneeded 'list_empty_careful()' call.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/fs-writeback.c |   17 +++++++++--------
 1 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 5e6b7fc..8cf53ba 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -848,17 +848,18 @@ int bdi_writeback_thread(void *data)
 				break;
 		}
 
-		if (dirty_writeback_interval) {
-			wait_jiffies = msecs_to_jiffies(dirty_writeback_interval * 10);
-			schedule_timeout_interruptible(wait_jiffies);
-		} else {
-			set_current_state(TASK_INTERRUPTIBLE);
-			if (list_empty_careful(&wb->bdi->work_list) &&
-			    !kthread_should_stop())
-				schedule();
+		set_current_state(TASK_INTERRUPTIBLE);
+		if (!list_empty(&bdi->work_list)) {
 			__set_current_state(TASK_RUNNING);
+			continue;
 		}
 
+		if (dirty_writeback_interval) {
+			wait_jiffies = msecs_to_jiffies(dirty_writeback_interval * 10);
+			schedule_timeout(wait_jiffies);
+		} else
+			schedule();
+
 		try_to_freeze();
 	}
 
-- 
1.7.1.1



* [PATCHv4 06/14] writeback: simplify bdi code a little
  2010-07-23 15:05 [PATCHv4 00/14] kill unnecessary bdi wakeups + cleanups Artem Bityutskiy
                   ` (4 preceding siblings ...)
  2010-07-23 15:05 ` [PATCHv4 05/14] writeback: do not lose wake-ups in bdi threads Artem Bityutskiy
@ 2010-07-23 15:05 ` Artem Bityutskiy
  2010-07-23 15:05 ` [PATCHv4 07/14] writeback: do not remove bdi from bdi_list Artem Bityutskiy
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Artem Bityutskiy @ 2010-07-23 15:05 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, linux-kernel

From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

This patch simplifies the bdi code a little by removing the 'pending_list',
which is redundant. Indeed, currently the forker thread ('bdi_forker_thread()')
works like this:

1. In a loop, fetch all bdi's which have work but no writeback thread and
   move them to the 'pending_list'.
2. If the list is empty, sleep for 5 secs.
3. Otherwise, take one bdi from the list, fork the writeback thread for this
   bdi, and repeat the loop.

IOW, it first moves everything to the 'pending_list', then processes only one
element, and so on. This patch simplifies the algorithm, which is now as
follows.

1. Find the first bdi which has work and remove it from the global list of
   bdi's (bdi_list).
2. If there is no such bdi, sleep for 5 secs.
3. Fork the writeback thread for this bdi and repeat the loop.

IOW, now we find the first bdi to process, process it, and so on. This is
simpler and involves fewer lists.
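
In condensed pseudo-C the new loop is (the BDI_pending bit, the work-list
checks and the error handling are omitted here - see the diff below for the
real code):

	for (;;) {
		bool fork = false;
		struct backing_dev_info *bdi;

		spin_lock_bh(&bdi_lock);
		set_current_state(TASK_INTERRUPTIBLE);
		list_for_each_entry(bdi, &bdi_list, bdi_list) {
			if (bdi->wb.task || !bdi_has_dirty_io(bdi))
				continue;
			/* Step 1: pick the first bdi which needs a thread. */
			list_del_rcu(&bdi->bdi_list);
			fork = true;
			break;
		}
		spin_unlock_bh(&bdi_lock);

		if (!fork) {
			/* Step 2: no such bdi, sleep for ~5 secs. */
			schedule_timeout(msecs_to_jiffies(dirty_writeback_interval * 10));
			continue;
		}

		/* Step 3: fork the writeback thread for this bdi. */
		__set_current_state(TASK_RUNNING);
		kthread_run(bdi_writeback_thread, &bdi->wb, "flush-%s",
			    dev_name(bdi->dev));
	}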

The bonus now is that we can get rid of a couple of functions, as well as
remove complications which involve 'call_rcu()' and 'bdi->rcu_head'.

This patch also makes sure we use 'list_add_tail_rcu()', instead of plain
'list_add_tail()', but this piece of code is going to be removed in the next
patch anyway.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/backing-dev.h |    1 -
 mm/backing-dev.c            |   82 +++++++++---------------------------------
 2 files changed, 18 insertions(+), 65 deletions(-)

diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index f0936f5..95ecb2b 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -58,7 +58,6 @@ struct bdi_writeback {
 
 struct backing_dev_info {
 	struct list_head bdi_list;
-	struct rcu_head rcu_head;
 	unsigned long ra_pages;	/* max readahead in PAGE_CACHE_SIZE units */
 	unsigned long state;	/* Always use atomic bitops on this */
 	unsigned int capabilities; /* Device capabilities */
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 72e6eb9..dbc6681 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -50,8 +50,6 @@ static struct timer_list sync_supers_timer;
 static int bdi_sync_supers(void *);
 static void sync_supers_timer_fn(unsigned long);
 
-static void bdi_add_default_flusher_thread(struct backing_dev_info *bdi);
-
 #ifdef CONFIG_DEBUG_FS
 #include <linux/debugfs.h>
 #include <linux/seq_file.h>
@@ -331,6 +329,7 @@ static int bdi_forker_thread(void *ptr)
 	set_user_nice(current, 0);
 
 	for (;;) {
+		bool fork = false;
 		struct task_struct *task;
 		struct backing_dev_info *bdi, *tmp;
 
@@ -349,23 +348,30 @@ static int bdi_forker_thread(void *ptr)
 		 * a thread registered. If so, set that up.
 		 */
 		list_for_each_entry_safe(bdi, tmp, &bdi_list, bdi_list) {
+			if (!bdi_cap_writeback_dirty(bdi))
+				continue;
 			if (bdi->wb.task)
 				continue;
 			if (list_empty(&bdi->work_list) &&
 			    !bdi_has_dirty_io(bdi))
 				continue;
 
-			bdi_add_default_flusher_thread(bdi);
+			WARN(!test_bit(BDI_registered, &bdi->state),
+			     "bdi %p/%s is not registered!\n", bdi, bdi->name);
+
+			list_del_rcu(&bdi->bdi_list);
+			fork = true;
+			break;
 		}
+		spin_unlock_bh(&bdi_lock);
 
 		/* Keep working if default bdi still has things to do */
 		if (!list_empty(&me->bdi->work_list))
 			__set_current_state(TASK_RUNNING);
 
-		if (list_empty(&bdi_pending_list)) {
+		if (!fork) {
 			unsigned long wait;
 
-			spin_unlock_bh(&bdi_lock);
 			wait = msecs_to_jiffies(dirty_writeback_interval * 10);
 			if (wait)
 				schedule_timeout(wait);
@@ -378,13 +384,13 @@ static int bdi_forker_thread(void *ptr)
 		__set_current_state(TASK_RUNNING);
 
 		/*
-		 * This is our real job - check for pending entries in
-		 * bdi_pending_list, and create the threads that got added
+		 * Set the pending bit - if someone will try to unregister this
+		 * bdi - it'll wait on this bit.
 		 */
-		bdi = list_entry(bdi_pending_list.next, struct backing_dev_info,
-				 bdi_list);
-		list_del_init(&bdi->bdi_list);
-		spin_unlock_bh(&bdi_lock);
+		set_bit(BDI_pending, &bdi->state);
+
+		/* Make sure no one uses the picked bdi */
+		synchronize_rcu();
 
 		task = kthread_run(bdi_writeback_thread, &bdi->wb, "flush-%s",
 				   dev_name(bdi->dev));
@@ -397,7 +403,7 @@ static int bdi_forker_thread(void *ptr)
 			 * flush other bdi's to free memory.
 			 */
 			spin_lock_bh(&bdi_lock);
-			list_add_tail(&bdi->bdi_list, &bdi_pending_list);
+			list_add_tail_rcu(&bdi->bdi_list, &bdi_list);
 			spin_unlock_bh(&bdi_lock);
 
 			bdi_flush_io(bdi);
@@ -408,57 +414,6 @@ static int bdi_forker_thread(void *ptr)
 	return 0;
 }
 
-static void bdi_add_to_pending(struct rcu_head *head)
-{
-	struct backing_dev_info *bdi;
-
-	bdi = container_of(head, struct backing_dev_info, rcu_head);
-	INIT_LIST_HEAD(&bdi->bdi_list);
-
-	spin_lock(&bdi_lock);
-	list_add_tail(&bdi->bdi_list, &bdi_pending_list);
-	spin_unlock(&bdi_lock);
-
-	/*
-	 * We are now on the pending list, wake up bdi_forker_task()
-	 * to finish the job and add us back to the active bdi_list
-	 */
-	wake_up_process(default_backing_dev_info.wb.task);
-}
-
-/*
- * Add the default flusher thread that gets created for any bdi
- * that has dirty data pending writeout
- */
-static void bdi_add_default_flusher_thread(struct backing_dev_info *bdi)
-{
-	if (!bdi_cap_writeback_dirty(bdi))
-		return;
-
-	if (WARN_ON(!test_bit(BDI_registered, &bdi->state))) {
-		printk(KERN_ERR "bdi %p/%s is not registered!\n",
-							bdi, bdi->name);
-		return;
-	}
-
-	/*
-	 * Check with the helper whether to proceed adding a thread. Will only
-	 * abort if we two or more simultanous calls to
-	 * bdi_add_default_flusher_thread() occured, further additions will
-	 * block waiting for previous additions to finish.
-	 */
-	if (!test_and_set_bit(BDI_pending, &bdi->state)) {
-		list_del_rcu(&bdi->bdi_list);
-
-		/*
-		 * We must wait for the current RCU period to end before
-		 * moving to the pending list. So schedule that operation
-		 * from an RCU callback.
-		 */
-		call_rcu(&bdi->rcu_head, bdi_add_to_pending);
-	}
-}
-
 /*
  * Remove bdi from bdi_list, and ensure that it is no longer visible
  */
@@ -599,7 +554,6 @@ int bdi_init(struct backing_dev_info *bdi)
 	bdi->max_ratio = 100;
 	bdi->max_prop_frac = PROP_FRAC_BASE;
 	spin_lock_init(&bdi->wb_lock);
-	INIT_RCU_HEAD(&bdi->rcu_head);
 	INIT_LIST_HEAD(&bdi->bdi_list);
 	INIT_LIST_HEAD(&bdi->work_list);
 
-- 
1.7.1.1



* [PATCHv4 07/14] writeback: do not remove bdi from bdi_list
  2010-07-23 15:05 [PATCHv4 00/14] kill unnecessary bdi wakeups + cleanups Artem Bityutskiy
                   ` (5 preceding siblings ...)
  2010-07-23 15:05 ` [PATCHv4 06/14] writeback: simplify bdi code a little Artem Bityutskiy
@ 2010-07-23 15:05 ` Artem Bityutskiy
  2010-07-23 15:05 ` [PATCHv4 08/14] writeback: move last_active to bdi Artem Bityutskiy
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Artem Bityutskiy @ 2010-07-23 15:05 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, linux-kernel

From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

The forker thread removes bdis from 'bdi_list' before forking the bdi thread.
But this is wrong for at least 2 reasons.

Reason #1: if we temporarily remove a bdi from the list, we may miss work which
           would otherwise be given to us.

Reason #2: this is racy; indeed, 'bdi_wb_shutdown()' expects that bdis are
           always in the 'bdi_list' (see 'bdi_remove_from_list()'), and when
           it races with the forker thread, it can shut down the bdi thread
           at the same time as the forker creates it.

This patch makes sure the forker thread never removes bdis from 'bdi_list'
(which was suggested by Christoph Hellwig).

In order to make sure that we do not race with 'bdi_wb_shutdown()', we have to
hold the 'bdi_lock' while walking the 'bdi_list' and setting the 'BDI_pending'
flag.

NOTE! The error path is interesting. Currently, when we fail to create a bdi
thread, we move the bdi to the tail of 'bdi_list'. But if we never remove the
bdi from the list, we cannot move it to the tail either, because then we can
mess up the RCU readers which walk the list. And also, we'll have the race
described above in "Reason #2".

But I do not think that adding to the tail is that important, so I just do not
do it.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/fs-writeback.c |    7 -------
 mm/backing-dev.c  |   31 ++++++++++---------------------
 2 files changed, 10 insertions(+), 28 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 8cf53ba..eaef5c9 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -804,13 +804,6 @@ int bdi_writeback_thread(void *data)
 	unsigned long wait_jiffies = -1UL;
 	long pages_written;
 
-	/*
-	 * Add us to the active bdi_list
-	 */
-	spin_lock_bh(&bdi_lock);
-	list_add_rcu(&bdi->bdi_list, &bdi_list);
-	spin_unlock_bh(&bdi_lock);
-
 	current->flags |= PF_FLUSHER | PF_SWAPWRITE;
 	set_freezable();
 
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index dbc6681..672c17b 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -331,7 +331,7 @@ static int bdi_forker_thread(void *ptr)
 	for (;;) {
 		bool fork = false;
 		struct task_struct *task;
-		struct backing_dev_info *bdi, *tmp;
+		struct backing_dev_info *bdi;
 
 		/*
 		 * Temporary measure, we want to make sure we don't see
@@ -347,7 +347,7 @@ static int bdi_forker_thread(void *ptr)
 		 * Check if any existing bdi's have dirty data without
 		 * a thread registered. If so, set that up.
 		 */
-		list_for_each_entry_safe(bdi, tmp, &bdi_list, bdi_list) {
+		list_for_each_entry(bdi, &bdi_list, bdi_list) {
 			if (!bdi_cap_writeback_dirty(bdi))
 				continue;
 			if (bdi->wb.task)
@@ -359,8 +359,13 @@ static int bdi_forker_thread(void *ptr)
 			WARN(!test_bit(BDI_registered, &bdi->state),
 			     "bdi %p/%s is not registered!\n", bdi, bdi->name);
 
-			list_del_rcu(&bdi->bdi_list);
 			fork = true;
+
+			/*
+			 * Set the pending bit - if someone will try to
+			 * unregister this bdi - it'll wait on this bit.
+			 */
+			set_bit(BDI_pending, &bdi->state);
 			break;
 		}
 		spin_unlock_bh(&bdi_lock);
@@ -383,29 +388,13 @@ static int bdi_forker_thread(void *ptr)
 
 		__set_current_state(TASK_RUNNING);
 
-		/*
-		 * Set the pending bit - if someone will try to unregister this
-		 * bdi - it'll wait on this bit.
-		 */
-		set_bit(BDI_pending, &bdi->state);
-
-		/* Make sure no one uses the picked bdi */
-		synchronize_rcu();
-
 		task = kthread_run(bdi_writeback_thread, &bdi->wb, "flush-%s",
 				   dev_name(bdi->dev));
 		if (IS_ERR(task)) {
 			/*
-			 * If thread creation fails, then readd the bdi back to
-			 * the list and force writeout of the bdi from this
-			 * forker thread. That will free some memory and we can
-			 * try again. Add it to the tail so we get a chance to
-			 * flush other bdi's to free memory.
+			 * If thread creation fails, force writeout of the bdi
+			 * from the thread.
 			 */
-			spin_lock_bh(&bdi_lock);
-			list_add_tail_rcu(&bdi->bdi_list, &bdi_list);
-			spin_unlock_bh(&bdi_lock);
-
 			bdi_flush_io(bdi);
 		} else
 			bdi->wb.task = task;
-- 
1.7.1.1



* [PATCHv4 08/14] writeback: move last_active to bdi
  2010-07-23 15:05 [PATCHv4 00/14] kill unnecessary bdi wakeups + cleanups Artem Bityutskiy
                   ` (6 preceding siblings ...)
  2010-07-23 15:05 ` [PATCHv4 07/14] writeback: do not remove bdi from bdi_list Artem Bityutskiy
@ 2010-07-23 15:05 ` Artem Bityutskiy
  2010-07-23 15:05 ` [PATCHv4 09/14] writeback: restructure bdi forker loop a little Artem Bityutskiy
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Artem Bityutskiy @ 2010-07-23 15:05 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, linux-kernel

From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

Currently bdi threads use a local variable 'last_active' which stores the last
time the bdi thread did some useful work. Move this local variable to 'struct
bdi_writeback'. This is just a preparation for the following patches, which
will make the forker thread decide when bdi threads should be killed.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/fs-writeback.c           |    6 +++---
 include/linux/backing-dev.h |   13 +++++++------
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index eaef5c9..53e1028 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -800,12 +800,12 @@ int bdi_writeback_thread(void *data)
 {
 	struct bdi_writeback *wb = data;
 	struct backing_dev_info *bdi = wb->bdi;
-	unsigned long last_active = jiffies;
 	unsigned long wait_jiffies = -1UL;
 	long pages_written;
 
 	current->flags |= PF_FLUSHER | PF_SWAPWRITE;
 	set_freezable();
+	wb->last_active = jiffies;
 
 	/*
 	 * Our parent may run at a different priority, just set us to normal
@@ -827,7 +827,7 @@ int bdi_writeback_thread(void *data)
 		trace_writeback_pages_written(pages_written);
 
 		if (pages_written)
-			last_active = jiffies;
+			wb->last_active = jiffies;
 		else if (wait_jiffies != -1UL) {
 			unsigned long max_idle;
 
@@ -837,7 +837,7 @@ int bdi_writeback_thread(void *data)
 			 * recreated automatically.
 			 */
 			max_idle = max(5UL * 60 * HZ, wait_jiffies);
-			if (time_after(jiffies, max_idle + last_active))
+			if (time_after(jiffies, max_idle + wb->last_active))
 				break;
 		}
 
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 95ecb2b..71b6223 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -45,15 +45,16 @@ enum bdi_stat_item {
 #define BDI_STAT_BATCH (8*(1+ilog2(nr_cpu_ids)))
 
 struct bdi_writeback {
-	struct backing_dev_info *bdi;		/* our parent bdi */
+	struct backing_dev_info *bdi;	/* our parent bdi */
 	unsigned int nr;
 
-	unsigned long last_old_flush;		/* last old data flush */
+	unsigned long last_old_flush;	/* last old data flush */
+	unsigned long last_active;	/* last time bdi thread was active */
 
-	struct task_struct	*task;		/* writeback thread */
-	struct list_head	b_dirty;	/* dirty inodes */
-	struct list_head	b_io;		/* parked for writeback */
-	struct list_head	b_more_io;	/* parked for more writeback */
+	struct task_struct *task;	/* writeback thread */
+	struct list_head b_dirty;	/* dirty inodes */
+	struct list_head b_io;		/* parked for writeback */
+	struct list_head b_more_io;	/* parked for more writeback */
 };
 
 struct backing_dev_info {
-- 
1.7.1.1



* [PATCHv4 09/14] writeback: restructure bdi forker loop a little
  2010-07-23 15:05 [PATCHv4 00/14] kill unnecessary bdi wakeups + cleanups Artem Bityutskiy
                   ` (7 preceding siblings ...)
  2010-07-23 15:05 ` [PATCHv4 08/14] writeback: move last_active to bdi Artem Bityutskiy
@ 2010-07-23 15:05 ` Artem Bityutskiy
  2010-07-23 15:05 ` [PATCHv4 10/14] writeback: move bdi threads exiting logic to the forker thread Artem Bityutskiy
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Artem Bityutskiy @ 2010-07-23 15:05 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, linux-kernel

From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

This patch re-structures the loop which walks the bdi list a little. This is just a
micro-step towards the coming change where the forker thread will kill the bdi
threads. It should simplify reviewing the following changes, which would
otherwise be larger.

This patch also adds the 'bdi_cap_flush_forker(bdi)' condition check to the
loop. The reason for this is that the forker thread can start _before_ the
'BDI_registered' flag is set (see 'bdi_register()'), so the WARN() statement
will fire for the default bdi. I observed this warning at boot-up.

This patch also amends comments a little.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 mm/backing-dev.c |   28 +++++++++++++++++-----------
 1 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 672c17b..b788b8e 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -348,25 +348,31 @@ static int bdi_forker_thread(void *ptr)
 		 * a thread registered. If so, set that up.
 		 */
 		list_for_each_entry(bdi, &bdi_list, bdi_list) {
-			if (!bdi_cap_writeback_dirty(bdi))
-				continue;
-			if (bdi->wb.task)
-				continue;
-			if (list_empty(&bdi->work_list) &&
-			    !bdi_has_dirty_io(bdi))
+			bool have_dirty_io;
+
+			if (!bdi_cap_writeback_dirty(bdi) ||
+			     bdi_cap_flush_forker(bdi))
 				continue;
 
 			WARN(!test_bit(BDI_registered, &bdi->state),
 			     "bdi %p/%s is not registered!\n", bdi, bdi->name);
 
-			fork = true;
+			have_dirty_io = !list_empty(&bdi->work_list) ||
+					wb_has_dirty_io(&bdi->wb);
 
 			/*
-			 * Set the pending bit - if someone will try to
-			 * unregister this bdi - it'll wait on this bit.
+			 * If the bdi has work to do, but the thread does not
+			 * exist - create it.
 			 */
-			set_bit(BDI_pending, &bdi->state);
-			break;
+			if (!bdi->wb.task && have_dirty_io) {
+				/*
+				 * Set the pending bit - if someone will try to
+				 * unregister this bdi - it'll wait on this bit.
+				 */
+				set_bit(BDI_pending, &bdi->state);
+				fork = true;
+				break;
+			}
 		}
 		spin_unlock_bh(&bdi_lock);
 
-- 
1.7.1.1



* [PATCHv4 10/14] writeback: move bdi threads exiting logic to the forker thread
  2010-07-23 15:05 [PATCHv4 00/14] kill unnecessary bdi wakeups + cleanups Artem Bityutskiy
                   ` (8 preceding siblings ...)
  2010-07-23 15:05 ` [PATCHv4 09/14] writeback: restructure bdi forker loop a little Artem Bityutskiy
@ 2010-07-23 15:05 ` Artem Bityutskiy
  2010-07-23 16:23   ` Christoph Hellwig
  2010-07-23 15:05 ` [PATCHv4 11/14] writeback: prevent unnecessary bdi threads wakeups Artem Bityutskiy
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 23+ messages in thread
From: Artem Bityutskiy @ 2010-07-23 15:05 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, linux-kernel

From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

Currently, bdi threads can decide to exit if there were no useful activities
for 5 minutes. However, this causes nasty races: we can easily oops in
'bdi_queue_work()' if the bdi thread decides to exit while we are waking it up.

And even if we do not oops, if the bdi thread exits immediately after we wake
it up, we lose the wake-up event and get an unnecessary delay (up to 5 secs)
in the bdi work processing.

This patch makes the forker thread the central place which not only creates
bdi threads, but also kills them if they have been inactive long enough. This
is better design-wise.

Another reason why this change was done is to prepare for the further changes
which will prevent the bdi threads from waking up every 5 sec and wasting
power. Indeed, when the task does not wake up periodically anymore, it won't be
able to exit either.

This patch also moves the 'wake_up_bit()' call from the bdi thread to the
forker thread. So now the forker thread sets the BDI_pending bit, then forks
the task or kills it, then clears the bit and wakes up the waiting process.

The only process which may wait on the bit is 'bdi_wb_shutdown()'. This
function was changed as well - now it first removes the bdi from the
'bdi_list', then waits on the 'BDI_pending' bit. Once it wakes up, it is
guaranteed that the forker thread won't race with it, because the bdi is no
longer visible. Note, the forker thread sets the 'BDI_pending' bit under the
'bdi_lock' spinlock, which is essential.

Also, I had to add memory barriers to make sure that 'bdi_wb_shutdown()' does
see the 'bdi->wb.task' change.
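
The barrier pairing, in isolation (simplified; the forker side is taken from
the diff below, while the final 'kthread_stop()' call in 'bdi_wb_shutdown()'
sits outside the quoted hunks and is shown here as an assumption):

	/* Forker thread, after creating the bdi thread: */
	bdi->wb.task = task;
	smp_wmb();			/* publish wb.task before clearing the bit */
	clear_bit(BDI_pending, &bdi->state);
	smp_mb__after_clear_bit();
	wake_up_bit(&bdi->state, BDI_pending);

	/* bdi_wb_shutdown(), after the bdi was removed from bdi_list: */
	wait_on_bit(&bdi->state, BDI_pending, bdi_sched_wait,
		    TASK_UNINTERRUPTIBLE);
	smp_rmb();			/* pairs with the smp_wmb() above */
	if (bdi->wb.task)
		kthread_stop(bdi->wb.task);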

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
 fs/fs-writeback.c |   54 +++++++-------------------------
 mm/backing-dev.c  |   87 ++++++++++++++++++++++++++++++++++++++++++-----------
 2 files changed, 81 insertions(+), 60 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 53e1028..b9e5ba0 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -78,21 +78,17 @@ static void bdi_queue_work(struct backing_dev_info *bdi,
 
 	spin_lock(&bdi->wb_lock);
 	list_add_tail(&work->list, &bdi->work_list);
-	spin_unlock(&bdi->wb_lock);
-
-	/*
-	 * If the default thread isn't there, make sure we add it. When
-	 * it gets created and wakes up, we'll run this work.
-	 */
-	if (unlikely(!bdi->wb.task)) {
+	if (bdi->wb.task) {
+		wake_up_process(bdi->wb.task);
+	} else {
+		/*
+		 * The bdi thread isn't there, wake up the forker thread which
+		 * will create and run it.
+		 */
 		trace_writeback_nothread(bdi, work);
 		wake_up_process(default_backing_dev_info.wb.task);
-	} else {
-		struct bdi_writeback *wb = &bdi->wb;
-
-		if (wb->task)
-			wake_up_process(wb->task);
 	}
+	spin_unlock(&bdi->wb_lock);
 }
 
 static void
@@ -800,7 +796,6 @@ int bdi_writeback_thread(void *data)
 {
 	struct bdi_writeback *wb = data;
 	struct backing_dev_info *bdi = wb->bdi;
-	unsigned long wait_jiffies = -1UL;
 	long pages_written;
 
 	current->flags |= PF_FLUSHER | PF_SWAPWRITE;
@@ -812,13 +807,6 @@ int bdi_writeback_thread(void *data)
 	 */
 	set_user_nice(current, 0);
 
-	/*
-	 * Clear pending bit and wakeup anybody waiting to tear us down
-	 */
-	clear_bit(BDI_pending, &bdi->state);
-	smp_mb__after_clear_bit();
-	wake_up_bit(&bdi->state, BDI_pending);
-
 	trace_writeback_thread_start(bdi);
 
 	while (!kthread_should_stop()) {
@@ -828,18 +816,6 @@ int bdi_writeback_thread(void *data)
 
 		if (pages_written)
 			wb->last_active = jiffies;
-		else if (wait_jiffies != -1UL) {
-			unsigned long max_idle;
-
-			/*
-			 * Longest period of inactivity that we tolerate. If we
-			 * see dirty data again later, the thread will get
-			 * recreated automatically.
-			 */
-			max_idle = max(5UL * 60 * HZ, wait_jiffies);
-			if (time_after(jiffies, max_idle + wb->last_active))
-				break;
-		}
 
 		set_current_state(TASK_INTERRUPTIBLE);
 		if (!list_empty(&bdi->work_list)) {
@@ -847,21 +823,15 @@ int bdi_writeback_thread(void *data)
 			continue;
 		}
 
-		if (dirty_writeback_interval) {
-			wait_jiffies = msecs_to_jiffies(dirty_writeback_interval * 10);
-			schedule_timeout(wait_jiffies);
-		} else
+		if (dirty_writeback_interval)
+			schedule_timeout(msecs_to_jiffies(dirty_writeback_interval * 10));
+		else
 			schedule();
 
 		try_to_freeze();
 	}
 
-	wb->task = NULL;
-
-	/*
-	 * Flush any work that raced with us exiting. No new work
-	 * will be added, since this bdi isn't discoverable anymore.
-	 */
+	/* Flush any work that raced with us exiting */
 	if (!list_empty(&bdi->work_list))
 		wb_do_writeback(wb, 1);
 
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index b788b8e..92e812f 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -316,6 +316,18 @@ static void sync_supers_timer_fn(unsigned long unused)
 	bdi_arm_supers_timer();
 }
 
+/*
+ * Calculate the longest interval (jiffies) bdi threads are allowed to be
+ * inactive.
+ */
+static unsigned long bdi_longest_inactive(void)
+{
+	unsigned long interval;
+
+	interval = msecs_to_jiffies(dirty_writeback_interval * 10);
+	return max(5UL * 60 * HZ, interval);
+}
+
 static int bdi_forker_thread(void *ptr)
 {
 	struct bdi_writeback *me = ptr;
@@ -329,8 +341,8 @@ static int bdi_forker_thread(void *ptr)
 	set_user_nice(current, 0);
 
 	for (;;) {
-		bool fork = false;
-		struct task_struct *task;
+		bool fork = false, kill = false;
+		struct task_struct *task = NULL;
 		struct backing_dev_info *bdi;
 
 		/*
@@ -343,10 +355,6 @@ static int bdi_forker_thread(void *ptr)
 		spin_lock_bh(&bdi_lock);
 		set_current_state(TASK_INTERRUPTIBLE);
 
-		/*
-		 * Check if any existing bdi's have dirty data without
-		 * a thread registered. If so, set that up.
-		 */
 		list_for_each_entry(bdi, &bdi_list, bdi_list) {
 			bool have_dirty_io;
 
@@ -373,6 +381,25 @@ static int bdi_forker_thread(void *ptr)
 				fork = true;
 				break;
 			}
+
+			spin_lock(&bdi->wb_lock);
+			/*
+			 * If there is no work to do and the bdi thread was
+			 * inactive long enough - kill it. The wb_lock is taken
+			 * to make sure no-one adds more work to this bdi and
+			 * wakes the bdi thread up.
+			 */
+			if (bdi->wb.task && !have_dirty_io &&
+			    time_after(jiffies, bdi->wb.last_active +
+						bdi_longest_inactive())) {
+				task = bdi->wb.task;
+				bdi->wb.task = NULL;
+				spin_unlock(&bdi->wb_lock);
+				set_bit(BDI_pending, &bdi->state);
+				kill = true;
+				break;
+			}
+			spin_unlock(&bdi->wb_lock);
 		}
 		spin_unlock_bh(&bdi_lock);
 
@@ -380,7 +407,7 @@ static int bdi_forker_thread(void *ptr)
 		if (!list_empty(&me->bdi->work_list))
 			__set_current_state(TASK_RUNNING);
 
-		if (!fork) {
+		if (!fork && !kill) {
 			unsigned long wait;
 
 			wait = msecs_to_jiffies(dirty_writeback_interval * 10);
@@ -394,16 +421,34 @@ static int bdi_forker_thread(void *ptr)
 
 		__set_current_state(TASK_RUNNING);
 
-		task = kthread_run(bdi_writeback_thread, &bdi->wb, "flush-%s",
-				   dev_name(bdi->dev));
-		if (IS_ERR(task)) {
-			/*
-			 * If thread creation fails, force writeout of the bdi
-			 * from the thread.
-			 */
-			bdi_flush_io(bdi);
+		if (fork) {
+			task = kthread_run(bdi_writeback_thread, &bdi->wb,
+					   "flush-%s", dev_name(bdi->dev));
+			if (IS_ERR(task)) {
+				/*
+				 * If thread creation fails, force writeout of
+				 * the bdi from the thread.
+				 */
+				bdi_flush_io(bdi);
+			} else {
+				bdi->wb.task = task;
+				/*
+				 * Make sure 'bdi->wb.task' assignment is seen
+				 * to other CPUs before the follwoing
+				 * 'BDI_pending' bit change. See the paired
+				 * read memory barrier in 'bdi_wb_shutdown()'.
+				 */
+				smp_wmb();
+			}
 		} else
-			bdi->wb.task = task;
+			kthread_stop(task);
+
+		/*
+		 * Clear pending bit and wakeup anybody waiting to tear us down.
+		 */
+		clear_bit(BDI_pending, &bdi->state);
+		smp_mb__after_clear_bit();
+		wake_up_bit(&bdi->state, BDI_pending);
 	}
 
 	return 0;
@@ -487,15 +532,21 @@ static void bdi_wb_shutdown(struct backing_dev_info *bdi)
 		return;
 
 	/*
+	 * Make sure nobody finds us on the bdi_list anymore
+	 */
+	bdi_remove_from_list(bdi);
+
+	/*
 	 * If setup is pending, wait for that to complete first
 	 */
 	wait_on_bit(&bdi->state, BDI_pending, bdi_sched_wait,
 			TASK_UNINTERRUPTIBLE);
 
 	/*
-	 * Make sure nobody finds us on the bdi_list anymore
+	 * Make sure we the 'bdi->wb.task' assigment made by the forker thread
+	 * is visible to us.
 	 */
-	bdi_remove_from_list(bdi);
+	smp_rmb();
 
 	/*
 	 * Finally, kill the kernel thread. We don't need to be RCU
-- 
1.7.1.1



* [PATCHv4 11/14] writeback: prevent unnecessary bdi threads wakeups
  2010-07-23 15:05 [PATCHv4 00/14] kill unnecessary bdi wakeups + cleanups Artem Bityutskiy
                   ` (9 preceding siblings ...)
  2010-07-23 15:05 ` [PATCHv4 10/14] writeback: move bdi threads exiting logic to the forker thread Artem Bityutskiy
@ 2010-07-23 15:05 ` Artem Bityutskiy
  2010-07-23 15:05 ` [PATCHv4 12/14] writeback: optimize periodic bdi thread wakeups Artem Bityutskiy
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Artem Bityutskiy @ 2010-07-23 15:05 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, linux-kernel

From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

Finally, we can get rid of unnecessary wake-ups in bdi threads, which are very
bad for battery-driven devices.

There are two types of activities bdi threads do:
1. process bdi works from the 'bdi->work_list'
2. periodic write-back

So there are 2 sources of wake-up events for bdi threads:

1. 'bdi_queue_work()' - submits bdi works
2. '__mark_inode_dirty()' - adds dirty I/O to bdi's

The former already has bdi wake-up code. The latter does not, and this patch
adds it.

'__mark_inode_dirty()' is a hot-path function, but this patch adds another
'spin_lock(&bdi->wb_lock)' there. However, it is taken only in the rare case
when the bdi has no dirty inodes. So adding this spinlock should be fine and
should not affect performance.
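
Condensed, the '__mark_inode_dirty()' side of this looks as follows (see the
diff below for the real code):

	/* Under inode_lock, when the inode becomes dirty for the first time: */
	if (bdi_cap_writeback_dirty(bdi) && !wb_has_dirty_io(&bdi->wb))
		wakeup_bdi = true;	/* first dirty inode on this bdi */

	/* After inode_lock has been dropped: */
	if (wakeup_bdi)
		wakeup_bdi_thread(bdi);	/* takes bdi->wb_lock, rare path */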

This patch makes sure bdi threads and the forker thread do not wake up if there
is nothing to do. The forker thread will nevertheless wake up at least every
5 minutes to check whether it has to kill a bdi thread. This could also be
optimized, but it is not worth it.

This patch also tidies up the warning about an unregistered bdi, and turns it
from an ugly crocodile into a simple 'WARN()' statement.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/fs-writeback.c |   60 +++++++++++++++++++++++++++++++++++++++++++---------
 mm/backing-dev.c  |   16 ++++++++++---
 2 files changed, 61 insertions(+), 15 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index b9e5ba0..6814502 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -823,10 +823,16 @@ int bdi_writeback_thread(void *data)
 			continue;
 		}
 
-		if (dirty_writeback_interval)
+		if (wb_has_dirty_io(wb) && dirty_writeback_interval)
 			schedule_timeout(msecs_to_jiffies(dirty_writeback_interval * 10));
-		else
+		else {
+			/*
+			 * We have nothing to do, so can go sleep without any
+			 * timeout and save power. When a work is queued or
+			 * something is made dirty - we will be woken up.
+			 */
 			schedule();
+		}
 
 		try_to_freeze();
 	}
@@ -862,6 +868,26 @@ void wakeup_flusher_threads(long nr_pages)
 	rcu_read_unlock();
 }
 
+/*
+ * This function is used when the first inode for this bdi is marked dirty. It
+ * wakes-up the corresponding bdi thread which should then take care of the
+ * periodic background write-out of dirty inodes.
+ */
+static void wakeup_bdi_thread(struct backing_dev_info *bdi)
+{
+	spin_lock(&bdi->wb_lock);
+	if (bdi->wb.task)
+		wake_up_process(bdi->wb.task);
+	else
+		/*
+		 * When bdi tasks are inactive for long time, they are killed.
+		 * In this case we have to wake-up the forker thread which
+		 * should create and run the bdi thread.
+		 */
+		wake_up_process(default_backing_dev_info.wb.task);
+	spin_unlock(&bdi->wb_lock);
+}
+
 static noinline void block_dump___mark_inode_dirty(struct inode *inode)
 {
 	if (inode->i_ino || strcmp(inode->i_sb->s_id, "bdev")) {
@@ -914,6 +940,8 @@ static noinline void block_dump___mark_inode_dirty(struct inode *inode)
 void __mark_inode_dirty(struct inode *inode, int flags)
 {
 	struct super_block *sb = inode->i_sb;
+	struct backing_dev_info *bdi = NULL;
+	bool wakeup_bdi = false;
 
 	/*
 	 * Don't do this for I_DIRTY_PAGES - that doesn't actually
@@ -967,22 +995,32 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 		 * reposition it (that would break b_dirty time-ordering).
 		 */
 		if (!was_dirty) {
-			struct bdi_writeback *wb = &inode_to_bdi(inode)->wb;
-			struct backing_dev_info *bdi = wb->bdi;
-
-			if (bdi_cap_writeback_dirty(bdi) &&
-			    !test_bit(BDI_registered, &bdi->state)) {
-				WARN_ON(1);
-				printk(KERN_ERR "bdi-%s not registered\n",
-								bdi->name);
+			bdi = inode_to_bdi(inode);
+
+			if (bdi_cap_writeback_dirty(bdi)) {
+				WARN(!test_bit(BDI_registered, &bdi->state),
+				     "bdi-%s not registered\n", bdi->name);
+
+				/*
+				 * If this is the first dirty inode for this
+				 * bdi, we have to wake-up the corresponding
+				 * bdi thread to make sure background
+				 * write-back happens later.
+				 */
+				if (!wb_has_dirty_io(&bdi->wb))
+					wakeup_bdi = true;
 			}
 
 			inode->dirtied_when = jiffies;
-			list_move(&inode->i_list, &wb->b_dirty);
+			list_move(&inode->i_list, &bdi->wb.b_dirty);
 		}
 	}
 out:
 	spin_unlock(&inode_lock);
+
+	if (wakeup_bdi)
+		wakeup_bdi_thread(bdi);
+
 }
 EXPORT_SYMBOL(__mark_inode_dirty);
 
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 92e812f..b31c4f3 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -411,10 +411,18 @@ static int bdi_forker_thread(void *ptr)
 			unsigned long wait;
 
 			wait = msecs_to_jiffies(dirty_writeback_interval * 10);
-			if (wait)
-				schedule_timeout(wait);
-			else
-				schedule();
+			if (!wb_has_dirty_io(me) || !wait) {
+				/*
+				 * There are no dirty data. The only thing we
+				 * should now care is checking for inactive bdi
+				 * threads and killing them. Thus, let's sleep
+				 * for longer time to avoid unnecessary
+				 * wake-ups, save energy and be friendly for
+				 * battery-driven devices.
+				 */
+				wait = bdi_longest_inactive();
+			}
+			schedule_timeout(wait);
 			try_to_freeze();
 			continue;
 		}
-- 
1.7.1.1



* [PATCHv4 12/14] writeback: optimize periodic bdi thread wakeups
  2010-07-23 15:05 [PATCHv4 00/14] kill unnecessary bdi wakeups + cleanups Artem Bityutskiy
                   ` (10 preceding siblings ...)
  2010-07-23 15:05 ` [PATCHv4 11/14] writeback: prevent unnecessary bdi threads wakeups Artem Bityutskiy
@ 2010-07-23 15:05 ` Artem Bityutskiy
  2010-07-23 16:28   ` Christoph Hellwig
  2010-07-23 15:05 ` [PATCHv4 13/14] writeback: remove unnecessary init_timer call Artem Bityutskiy
  2010-07-23 15:05 ` [PATCHv4 14/14] writeback: add new tracepoints Artem Bityutskiy
  13 siblings, 1 reply; 23+ messages in thread
From: Artem Bityutskiy @ 2010-07-23 15:05 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, linux-kernel

From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

When the first inode for a bdi is marked dirty, we wake up the bdi thread which
should take care of the periodic background write-out. However, the write-out
will actually start only 'dirty_writeback_interval' centisecs later, so we can
delay the wake-up.

This change was requested by Nick Piggin who pointed out that if we delay the
wake-up, we weed out two unnecessary context switches, which matters because
'__mark_inode_dirty()' is a hot-path function.

This patch introduces a new function - 'bdi_wakeup_thread_delayed()', which
sets up a timer to wake up the bdi thread and returns. So the wake-up is
delayed.

We also delete the timer in bdi threads just before writing-back.

This patch also moves the 'bdi_wb_init()' function down in the file to avoid
a forward-declaration of the new wakeup timer callback.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
 fs/fs-writeback.c           |   28 +++++--------------
 include/linux/backing-dev.h |    2 +
 mm/backing-dev.c            |   64 +++++++++++++++++++++++++++++++++++--------
 3 files changed, 61 insertions(+), 33 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 6814502..b837cae 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -810,6 +810,12 @@ int bdi_writeback_thread(void *data)
 	trace_writeback_thread_start(bdi);
 
 	while (!kthread_should_stop()) {
+		/*
+		 * Remove our own delayed wake-up timer, since we are already
+		 * awake and we will take care of the periodic write-back.
+		 */
+		del_timer(&wb->wakeup_timer);
+
 		pages_written = wb_do_writeback(wb, 0);
 
 		trace_writeback_pages_written(pages_written);
@@ -868,26 +874,6 @@ void wakeup_flusher_threads(long nr_pages)
 	rcu_read_unlock();
 }
 
-/*
- * This function is used when the first inode for this bdi is marked dirty. It
- * wakes-up the corresponding bdi thread which should then take care of the
- * periodic background write-out of dirty inodes.
- */
-static void wakeup_bdi_thread(struct backing_dev_info *bdi)
-{
-	spin_lock(&bdi->wb_lock);
-	if (bdi->wb.task)
-		wake_up_process(bdi->wb.task);
-	else
-		/*
-		 * When bdi tasks are inactive for long time, they are killed.
-		 * In this case we have to wake-up the forker thread which
-		 * should create and run the bdi thread.
-		 */
-		wake_up_process(default_backing_dev_info.wb.task);
-	spin_unlock(&bdi->wb_lock);
-}
-
 static noinline void block_dump___mark_inode_dirty(struct inode *inode)
 {
 	if (inode->i_ino || strcmp(inode->i_sb->s_id, "bdev")) {
@@ -1019,7 +1005,7 @@ out:
 	spin_unlock(&inode_lock);
 
 	if (wakeup_bdi)
-		wakeup_bdi_thread(bdi);
+		bdi_wakeup_thread_delayed(bdi);
 
 }
 EXPORT_SYMBOL(__mark_inode_dirty);
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 71b6223..7628219 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -52,6 +52,7 @@ struct bdi_writeback {
 	unsigned long last_active;	/* last time bdi thread was active */
 
 	struct task_struct *task;	/* writeback thread */
+	struct timer_list wakeup_timer; /* used for delayed bdi thread wakeup */
 	struct list_head b_dirty;	/* dirty inodes */
 	struct list_head b_io;		/* parked for writeback */
 	struct list_head b_more_io;	/* parked for more writeback */
@@ -105,6 +106,7 @@ void bdi_start_background_writeback(struct backing_dev_info *bdi);
 int bdi_writeback_thread(void *data);
 int bdi_has_dirty_io(struct backing_dev_info *bdi);
 void bdi_arm_supers_timer(void);
+void bdi_wakeup_thread_delayed(struct backing_dev_info *bdi);
 
 extern spinlock_t bdi_lock;
 extern struct list_head bdi_list;
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index b31c4f3..cedd9b1 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -248,17 +248,6 @@ static int __init default_bdi_init(void)
 }
 subsys_initcall(default_bdi_init);
 
-static void bdi_wb_init(struct bdi_writeback *wb, struct backing_dev_info *bdi)
-{
-	memset(wb, 0, sizeof(*wb));
-
-	wb->bdi = bdi;
-	wb->last_old_flush = jiffies;
-	INIT_LIST_HEAD(&wb->b_dirty);
-	INIT_LIST_HEAD(&wb->b_io);
-	INIT_LIST_HEAD(&wb->b_more_io);
-}
-
 int bdi_has_dirty_io(struct backing_dev_info *bdi)
 {
 	return wb_has_dirty_io(&bdi->wb);
@@ -316,6 +305,43 @@ static void sync_supers_timer_fn(unsigned long unused)
 	bdi_arm_supers_timer();
 }
 
+static void wakeup_timer_fn(unsigned long data)
+{
+	struct backing_dev_info *bdi = (struct backing_dev_info *)data;
+
+	spin_lock(&bdi->wb_lock);
+	if (bdi->wb.task) {
+		wake_up_process(bdi->wb.task);
+	} else {
+		/*
+		 * When bdi tasks are inactive for long time, they are killed.
+		 * In this case we have to wake-up the forker thread which
+		 * should create and run the bdi thread.
+		 */
+		wake_up_process(default_backing_dev_info.wb.task);
+	}
+	spin_unlock(&bdi->wb_lock);
+}
+
+/*
+ * This function is used when the first inode for this bdi is marked dirty. It
+ * wakes-up the corresponding bdi thread which should then take care of the
+ * periodic background write-out of dirty inodes. Since the write-out would
+ * starts only 'dirty_writeback_interval' centisecs from now anyway, we just
+ * set up a timer which wakes the bdi thread up later.
+ *
+ * Note, we wouldn't bother setting up the timer, but this function is on the
+ * fast-path (used by '__mark_inode_dirty()'), so we save a few context switches
+ * by delaying the wake-up.
+ */
+void bdi_wakeup_thread_delayed(struct backing_dev_info *bdi)
+{
+	unsigned long timeout;
+
+	timeout = msecs_to_jiffies(dirty_writeback_interval * 10);
+	mod_timer(&bdi->wb.wakeup_timer, jiffies + timeout);
+}
+
 /*
  * Calculate the longest interval (jiffies) bdi threads are allowed to be
  * inactive.
@@ -349,8 +375,10 @@ static int bdi_forker_thread(void *ptr)
 		 * Temporary measure, we want to make sure we don't see
 		 * dirty data on the default backing_dev_info
 		 */
-		if (wb_has_dirty_io(me) || !list_empty(&me->bdi->work_list))
+		if (wb_has_dirty_io(me) || !list_empty(&me->bdi->work_list)) {
+			del_timer(&me->wakeup_timer);
 			wb_do_writeback(me, 0);
+		}
 
 		spin_lock_bh(&bdi_lock);
 		set_current_state(TASK_INTERRUPTIBLE);
@@ -598,6 +626,18 @@ void bdi_unregister(struct backing_dev_info *bdi)
 }
 EXPORT_SYMBOL(bdi_unregister);
 
+static void bdi_wb_init(struct bdi_writeback *wb, struct backing_dev_info *bdi)
+{
+	memset(wb, 0, sizeof(*wb));
+
+	wb->bdi = bdi;
+	wb->last_old_flush = jiffies;
+	INIT_LIST_HEAD(&wb->b_dirty);
+	INIT_LIST_HEAD(&wb->b_io);
+	INIT_LIST_HEAD(&wb->b_more_io);
+	setup_timer(&wb->wakeup_timer, wakeup_timer_fn, (unsigned long)bdi);
+}
+
 int bdi_init(struct backing_dev_info *bdi)
 {
 	int i, err;
-- 
1.7.1.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCHv4 13/14] writeback: remove unnecessary init_timer call
  2010-07-23 15:05 [PATCHv4 00/14] kill unnecessary bdi wakeups + cleanups Artem Bityutskiy
                   ` (11 preceding siblings ...)
  2010-07-23 15:05 ` [PATCHv4 12/14] writeback: optimize periodic bdi thread wakeups Artem Bityutskiy
@ 2010-07-23 15:05 ` Artem Bityutskiy
  2010-07-23 16:28   ` Christoph Hellwig
  2010-07-23 15:05 ` [PATCHv4 14/14] writeback: add new tracepoints Artem Bityutskiy
  13 siblings, 1 reply; 23+ messages in thread
From: Artem Bityutskiy @ 2010-07-23 15:05 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, linux-kernel

From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

The 'setup_timer()' function also calls 'init_timer()', so the extra
'init_timer()' call is not needed. Indeed, 'setup_timer()' is basically
'init_timer()' plus callback function and data pointers initialization.
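
For reference, a rough sketch of that equivalence (this is not the exact
kernel definition, only the semantics of the classic timer API):

	static inline void setup_timer(struct timer_list *timer,
				       void (*function)(unsigned long),
				       unsigned long data)
	{
		init_timer(timer);
		/* all that is left is filling in the callback and its argument */
		timer->function = function;
		timer->data = data;
	}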

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
 mm/backing-dev.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index cedd9b1..ce105fc 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -236,7 +236,6 @@ static int __init default_bdi_init(void)
 	sync_supers_tsk = kthread_run(bdi_sync_supers, NULL, "sync_supers");
 	BUG_ON(IS_ERR(sync_supers_tsk));
 
-	init_timer(&sync_supers_timer);
 	setup_timer(&sync_supers_timer, sync_supers_timer_fn, 0);
 	bdi_arm_supers_timer();
 
-- 
1.7.1.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCHv4 14/14] writeback: add new tracepoints
  2010-07-23 15:05 [PATCHv4 00/14] kill unnecessary bdi wakeups + cleanups Artem Bityutskiy
                   ` (12 preceding siblings ...)
  2010-07-23 15:05 ` [PATCHv4 13/14] writeback: remove unnecessary init_timer call Artem Bityutskiy
@ 2010-07-23 15:05 ` Artem Bityutskiy
  2010-07-23 16:29   ` Christoph Hellwig
  13 siblings, 1 reply; 23+ messages in thread
From: Artem Bityutskiy @ 2010-07-23 15:05 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, linux-kernel

From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

Add 2 new trace points to the periodic write-back wake up case, just like we do
in the 'bdi_queue_work()' function. Namely, introduce:

1. trace_writeback_wakeup(bdi)
2. trace_writeback_wakeup_nothread(bdi)

The first event is triggered every time we wake up a bdi thread to start
periodic background write-out. The second event is triggered only when the bdi
thread does not exist and should be created by the forker thread.

This patch was suggested by Dave Chinner <david@fromorbit.com>

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
 include/trace/events/writeback.h |    2 ++
 mm/backing-dev.c                 |    3 +++
 2 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index 84ab72d..eb07d48 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -81,6 +81,8 @@ DEFINE_EVENT(writeback_class, name, \
 	TP_ARGS(bdi))
 
 DEFINE_WRITEBACK_EVENT(writeback_nowork);
+DEFINE_WRITEBACK_EVENT(writeback_wakeup);
+DEFINE_WRITEBACK_EVENT(writeback_wakeup_nothread);
 DEFINE_WRITEBACK_EVENT(writeback_bdi_register);
 DEFINE_WRITEBACK_EVENT(writeback_bdi_unregister);
 DEFINE_WRITEBACK_EVENT(writeback_thread_start);
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index ce105fc..22fc506 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -308,6 +308,8 @@ static void wakeup_timer_fn(unsigned long data)
 {
 	struct backing_dev_info *bdi = (struct backing_dev_info *)data;
 
+	trace_writeback_wakeup(bdi);
+
 	spin_lock(&bdi->wb_lock);
 	if (bdi->wb.task) {
 		wake_up_process(bdi->wb.task);
@@ -317,6 +319,7 @@ static void wakeup_timer_fn(unsigned long data)
 		 * In this case we have to wake-up the forker thread which
 		 * should create and run the bdi thread.
 		 */
+		trace_writeback_wakeup_nothread(bdi);
 		wake_up_process(default_backing_dev_info.wb.task);
 	}
 	spin_unlock(&bdi->wb_lock);
-- 
1.7.1.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCHv4 10/14] writeback: move bdi threads exiting logic to the forker thread
  2010-07-23 15:05 ` [PATCHv4 10/14] writeback: move bdi threads exiting logic to the forker thread Artem Bityutskiy
@ 2010-07-23 16:23   ` Christoph Hellwig
  2010-07-24  5:55       ` Artem Bityutskiy
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2010-07-23 16:23 UTC (permalink / raw)
  To: Artem Bityutskiy; +Cc: Jens Axboe, linux-fsdevel, linux-kernel

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

Maybe bdi_forker_thread can be cleaned up a bit more by introducing a

enum {
	NO_ACTION,
	START_THREAD,
	KILL_THREAD,
} action;

and then do a switch based on it later?
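
Something along these lines, perhaps - only a sketch, with the variable
names and the surrounding bdi_list walk / locking assumed rather than
taken from the actual code:

	enum {
		NO_ACTION,	/* nothing to do for this bdi */
		START_THREAD,	/* dirty data, but no bdi thread yet */
		KILL_THREAD,	/* bdi thread has been idle for too long */
	} action = NO_ACTION;
	struct task_struct *task = NULL;

	/* ... walk bdi_list under bdi_lock, set 'action' and 'task' ... */

	switch (action) {
	case START_THREAD:
		task = kthread_run(bdi_writeback_thread, &bdi->wb,
				   "flush-%s", dev_name(bdi->dev));
		break;
	case KILL_THREAD:
		kthread_stop(task);
		break;
	case NO_ACTION:
		break;
	}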


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCHv4 12/14] writeback: optimize periodic bdi thread wakeups
  2010-07-23 15:05 ` [PATCHv4 12/14] writeback: optimize periodic bdi thread wakeups Artem Bityutskiy
@ 2010-07-23 16:28   ` Christoph Hellwig
  2010-07-24  6:00     ` Artem Bityutskiy
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2010-07-23 16:28 UTC (permalink / raw)
  To: Artem Bityutskiy; +Cc: Jens Axboe, linux-fsdevel, linux-kernel

I haven't reviewed this in detail, but what ensures the timer is
synchronously removed when the forker goes away?  I don't see a
del_timer_sync call anywhere.  For now it might be easier to just
skip this patch and leave it for later.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCHv4 13/14] writeback: remove unnecessary init_timer call
  2010-07-23 15:05 ` [PATCHv4 13/14] writeback: remove unnecessary init_timer call Artem Bityutskiy
@ 2010-07-23 16:28   ` Christoph Hellwig
  0 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2010-07-23 16:28 UTC (permalink / raw)
  To: Artem Bityutskiy; +Cc: Jens Axboe, linux-fsdevel, linux-kernel

On Fri, Jul 23, 2010 at 06:05:53PM +0300, Artem Bityutskiy wrote:
> From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
> 
> The 'setup_timer()' function also calls 'init_timer()', so the extra
> 'init_timer()' call is not needed. Indeed, 'setup_timer()' is basically
> 'init_timer()' plus callback function and data pointers initialization.

Looks good,


Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCHv4 14/14] writeback: add new tracepoints
  2010-07-23 15:05 ` [PATCHv4 14/14] writeback: add new tracepoints Artem Bityutskiy
@ 2010-07-23 16:29   ` Christoph Hellwig
  2010-07-24  6:01     ` Artem Bityutskiy
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2010-07-23 16:29 UTC (permalink / raw)
  To: Artem Bityutskiy; +Cc: Jens Axboe, linux-fsdevel, linux-kernel

On Fri, Jul 23, 2010 at 06:05:54PM +0300, Artem Bityutskiy wrote:
> From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
> 
> Add 2 new trace points to the periodic write-back wake up case, just like we do
> in the 'bdi_queue_work()' function. Namely, introduce:
> 
> 1. trace_writeback_wakeup(bdi)
> 2. trace_writeback_wakeup_nothread(bdi)
> 
> The first event is triggered every time we wake up a bdi thread to start
> periodic background write-out. The second event is triggered only when the bdi
> thread does not exist and should be created by the forker thread.
> 
> This patch was suggested by Dave Chinner <david@fromorbit.com>

As mentioned before, doing the wakeup just for the case where we
really wake up the flusher thread is much better.  It's not 100%
clear for bdi_queue_work as we queue the work in either case, but
I'd prefer to fix that one up as well (not in your series anyway)


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCHv4 10/14] writeback: move bdi threads exiting logic to the forker thread
  2010-07-23 16:23   ` Christoph Hellwig
@ 2010-07-24  5:55       ` Artem Bityutskiy
  0 siblings, 0 replies; 23+ messages in thread
From: Artem Bityutskiy @ 2010-07-24  5:55 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-fsdevel, linux-kernel

On Fri, 2010-07-23 at 12:23 -0400, Christoph Hellwig wrote:
> Looks good,
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 
> Maybe bdi_forker_thread can be cleaned up a bit more by introducing a
> 
> enum {
> 	NO_ACTION,
> 	START_THREAD,
> 	KILL_THREAD,
> } action;
> 
> and then do a switch based on it later?

Just did this. Maybe it became a bit nicer, not sure. Matter of taste.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)


^ permalink raw reply	[flat|nested] 23+ messages in thread


* Re: [PATCHv4 12/14] writeback: optimize periodic bdi thread wakeups
  2010-07-23 16:28   ` Christoph Hellwig
@ 2010-07-24  6:00     ` Artem Bityutskiy
  0 siblings, 0 replies; 23+ messages in thread
From: Artem Bityutskiy @ 2010-07-24  6:00 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-fsdevel, linux-kernel

On Fri, 2010-07-23 at 12:28 -0400, Christoph Hellwig wrote:
> I haven't reviewed this in detail, but what ensures the timer is
> synchronously removed when the forker goes away?

Good point, thanks.

>   I don't see a
> del_timer_sync call anywhere.  For now it might be easier to just
> skip this patch and leave it for later.

Well, my tests showed that with this patch the flushers wake up
considerably less often. So I'll try to come up with a better patch.
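
Probably the right fix is to delete the timer synchronously somewhere on
the bdi shutdown path - just a sketch, not tested, and the exact place
(e.g. 'bdi_unregister()' or 'bdi_wb_shutdown()') is still to be decided:

	/* make sure the delayed wake-up timer cannot fire any more */
	del_timer_sync(&bdi->wb.wakeup_timer);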

I will set up better testing. I will hack things so that the background
dirty writeout timeout is something like 1-3 jiffies and the bdi thread
inactive timeout is something like 3-5 jiffies. Then I will write a
script which forks many tasks, each of which creates a loop-back device,
mounts it, does some I/O, unmounts it, removes the loop-back device, and
so on. If run for a long time, this should give good stress to the code
paths I'm working on. I have a 2-way 4-core (8 cores total) amd64 test
box to test on.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCHv4 14/14] writeback: add new tracepoints
  2010-07-23 16:29   ` Christoph Hellwig
@ 2010-07-24  6:01     ` Artem Bityutskiy
  0 siblings, 0 replies; 23+ messages in thread
From: Artem Bityutskiy @ 2010-07-24  6:01 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-fsdevel, linux-kernel

On Fri, 2010-07-23 at 12:29 -0400, Christoph Hellwig wrote:
> On Fri, Jul 23, 2010 at 06:05:54PM +0300, Artem Bityutskiy wrote:
> > From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
> > 
> > Add 2 new trace points to the periodic write-back wake up case, just like we do
> > in the 'bdi_queue_work()' function. Namely, introduce:
> > 
> > 1. trace_writeback_wakeup(bdi)
> > 2. trace_writeback_wakeup_nothread(bdi)
> > 
> > The first event is triggered every time we wake up a bdi thread to start
> > periodic background write-out. The second event is triggered only when the bdi
> > thread does not exist and should be created by the forker thread.
> > 
> > This patch was suggested by Dave Chinner <david@fromorbit.com>
> 
> As mentioned before doing the wakeup just for the case where we
> really wake up the flusher thead is much better.  It's not 100%
> clear for bdi_queue_work as we queue the work in either case, but
> I'd prefer to fix that one up as well (not in your series anyway)

OK, I'll do it your way. Many thanks for review!

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2010-07-24  6:08 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-07-23 15:05 [PATCHv4 00/14] kill unnecessary bdi wakeups + cleanups Artem Bityutskiy
2010-07-23 15:05 ` [PATCHv4 01/14] writeback: harmonize writeback threads naming Artem Bityutskiy
2010-07-23 15:05 ` [PATCHv4 02/14] writeback: fix possible race when creating bdi threads Artem Bityutskiy
2010-07-23 15:05 ` [PATCHv4 03/14] writeback: do not lose wake-ups in the forker thread - 1 Artem Bityutskiy
2010-07-23 15:05 ` [PATCHv4 04/14] writeback: do not lose wake-ups in the forker thread - 2 Artem Bityutskiy
2010-07-23 15:05 ` [PATCHv4 05/14] writeback: do not lose wake-ups in bdi threads Artem Bityutskiy
2010-07-23 15:05 ` [PATCHv4 06/14] writeback: simplify bdi code a little Artem Bityutskiy
2010-07-23 15:05 ` [PATCHv4 07/14] writeback: do not remove bdi from bdi_list Artem Bityutskiy
2010-07-23 15:05 ` [PATCHv4 08/14] writeback: move last_active to bdi Artem Bityutskiy
2010-07-23 15:05 ` [PATCHv4 09/14] writeback: restructure bdi forker loop a little Artem Bityutskiy
2010-07-23 15:05 ` [PATCHv4 10/14] writeback: move bdi threads exiting logic to the forker thread Artem Bityutskiy
2010-07-23 16:23   ` Christoph Hellwig
2010-07-24  5:55     ` Artem Bityutskiy
2010-07-24  5:55       ` Artem Bityutskiy
2010-07-23 15:05 ` [PATCHv4 11/14] writeback: prevent unnecessary bdi threads wakeups Artem Bityutskiy
2010-07-23 15:05 ` [PATCHv4 12/14] writeback: optimize periodic bdi thread wakeups Artem Bityutskiy
2010-07-23 16:28   ` Christoph Hellwig
2010-07-24  6:00     ` Artem Bityutskiy
2010-07-23 15:05 ` [PATCHv4 13/14] writeback: remove unnecessary init_timer call Artem Bityutskiy
2010-07-23 16:28   ` Christoph Hellwig
2010-07-23 15:05 ` [PATCHv4 14/14] writeback: add new tracepoints Artem Bityutskiy
2010-07-23 16:29   ` Christoph Hellwig
2010-07-24  6:01     ` Artem Bityutskiy
