* [PATCH 1/4] writeback: add bdi_dirty_limit() kernel-doc
       [not found] <20110413085937.981293444@intel.com>
@ 2011-04-13  8:59 ` Wu Fengguang
  2011-04-13 21:47   ` Jan Kara
  2011-04-13  8:59 ` [PATCH 2/4] writeback: avoid duplicate balance_dirty_pages_ratelimited() calls Wu Fengguang
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 29+ messages in thread
From: Wu Fengguang @ 2011-04-13  8:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Peter Zijlstra, Wu Fengguang, Jan Kara, Dave Chinner,
	Hugh Dickins, Rik van Riel, LKML, Linux Memory Management List,
	linux-fsdevel

[-- Attachment #1: writeback-task_dirty_limit-comment.patch --]
[-- Type: text/plain, Size: 1109 bytes --]

Clarify the bdi_dirty_limit() comment.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/page-writeback.c |   11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

--- linux-next.orig/mm/page-writeback.c	2011-03-03 14:38:12.000000000 +0800
+++ linux-next/mm/page-writeback.c	2011-03-03 14:40:52.000000000 +0800
@@ -437,10 +437,17 @@ void global_dirty_limits(unsigned long *
 	*pdirty = dirty;
 }
 
-/*
+/**
  * bdi_dirty_limit - @bdi's share of dirty throttling threshold
+ * @bdi: the backing_dev_info to query
+ * @dirty: global dirty limit in pages
+ *
+ * Returns @bdi's dirty limit in pages. The term "dirty" in the context of
+ * dirty balancing includes all PG_dirty, PG_writeback and NFS unstable pages.
+ * And the "limit" in the name is not seriously taken as hard limit in
+ * balance_dirty_pages().
  *
- * Allocate high/low dirty limits to fast/slow devices, in order to prevent
+ * It allocates high/low dirty limits to fast/slow devices, in order to prevent
  * - starving fast devices
  * - piling up dirty pages (that will take long time to sync) on slow devices
  *
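
For reference, bdi_dirty_limit() itself looks roughly like this at this point
(paraphrased and slightly simplified, so details may differ from the exact
tree); this is the code the new kernel-doc describes:

	unsigned long bdi_dirty_limit(struct backing_dev_info *bdi, unsigned long dirty)
	{
		u64 bdi_dirty;
		long numerator, denominator;

		/* this bdi's fraction of the recently completed writeback */
		bdi_writeout_fraction(bdi, &numerator, &denominator);

		/* give the bdi that fraction of the global dirty limit ... */
		bdi_dirty = (dirty * (100 - bdi_min_ratio)) / 100;
		bdi_dirty *= numerator;
		do_div(bdi_dirty, denominator);

		/* ... adjusted by the per-bdi min/max ratios */
		bdi_dirty += (dirty * bdi->min_ratio) / 100;
		if (bdi_dirty > (dirty * bdi->max_ratio) / 100)
			bdi_dirty = dirty * bdi->max_ratio / 100;

		return bdi_dirty;
	}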




* [PATCH 2/4] writeback: avoid duplicate balance_dirty_pages_ratelimited() calls
       [not found] <20110413085937.981293444@intel.com>
  2011-04-13  8:59 ` [PATCH 1/4] writeback: add bdi_dirty_limit() kernel-doc Wu Fengguang
@ 2011-04-13  8:59 ` Wu Fengguang
  2011-04-13 21:53   ` Jan Kara
  2011-04-13  8:59 ` [PATCH 3/4] writeback: skip balance_dirty_pages() for in-memory fs Wu Fengguang
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 29+ messages in thread
From: Wu Fengguang @ 2011-04-13  8:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Peter Zijlstra, Wu Fengguang, Jan Kara, Dave Chinner,
	Hugh Dickins, Rik van Riel, LKML, Linux Memory Management List,
	linux-fsdevel

[-- Attachment #1: writeback-fix-duplicate-bdp-calls.patch --]
[-- Type: text/plain, Size: 1130 bytes --]

When dd writes in 512-byte chunks, balance_dirty_pages_ratelimited() can be
called 8 times for the same page, even though the page is only dirtied once.

Fix it with a (slightly racy) PageDirty() test.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/filemap.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

--- linux-next.orig/mm/filemap.c	2011-04-13 16:46:01.000000000 +0800
+++ linux-next/mm/filemap.c	2011-04-13 16:47:26.000000000 +0800
@@ -2313,6 +2313,7 @@ static ssize_t generic_perform_write(str
 	long status = 0;
 	ssize_t written = 0;
 	unsigned int flags = 0;
+	unsigned int dirty;
 
 	/*
 	 * Copies from kernel address space cannot fail (NFSD is a big user).
@@ -2361,6 +2362,7 @@ again:
 		pagefault_enable();
 		flush_dcache_page(page);
 
+		dirty = PageDirty(page);
 		mark_page_accessed(page);
 		status = a_ops->write_end(file, mapping, pos, bytes, copied,
 						page, fsdata);
@@ -2387,7 +2389,8 @@ again:
 		pos += copied;
 		written += copied;
 
-		balance_dirty_pages_ratelimited(mapping);
+		if (!dirty)
+			balance_dirty_pages_ratelimited(mapping);
 
 	} while (iov_iter_count(i));
 




* [PATCH 3/4] writeback: skip balance_dirty_pages() for in-memory fs
       [not found] <20110413085937.981293444@intel.com>
  2011-04-13  8:59 ` [PATCH 1/4] writeback: add bdi_dirty_limit() kernel-doc Wu Fengguang
  2011-04-13  8:59 ` [PATCH 2/4] writeback: avoid duplicate balance_dirty_pages_ratelimited() calls Wu Fengguang
@ 2011-04-13  8:59 ` Wu Fengguang
  2011-04-13 21:54   ` Jan Kara
  2011-04-13  8:59 ` [PATCH 4/4] writeback: reduce per-bdi dirty threshold ramp up time Wu Fengguang
  2011-04-13 10:15 ` [PATCH 0/4] trivial writeback fixes Peter Zijlstra
  4 siblings, 1 reply; 29+ messages in thread
From: Wu Fengguang @ 2011-04-13  8:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Peter Zijlstra, Hugh Dickins, Rik van Riel, Wu Fengguang,
	Jan Kara, Dave Chinner, LKML, Linux Memory Management List,
	linux-fsdevel

[-- Attachment #1: writeback-trace-global-dirty-states-fix.patch --]
[-- Type: text/plain, Size: 2960 bytes --]

This avoids unnecessary checks and dirty throttling on tmpfs/ramfs.

It can also prevent

[  388.126563] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050

in the balance_dirty_pages tracepoint, which will call

	dev_name(mapping->backing_dev_info->dev)

but shmem_backing_dev_info.dev is NULL.

Summary notes about the tmpfs/ramfs behavior changes:

For 2.6.36 and older kernels, tmpfs writes will sleep inside
balance_dirty_pages() as long as we are over the (dirty+background)/2
global throttle threshold.  This is because both the bdi dirty pages and
the bdi threshold will be 0 for tmpfs/ramfs, hence this test will always
evaluate to TRUE:

                dirty_exceeded =
                        (bdi_nr_reclaimable + bdi_nr_writeback >= bdi_thresh)
                        || (nr_reclaimable + nr_writeback >= dirty_thresh);

For 2.6.37, someone complained that the current logic does not allow
users to set vm.dirty_ratio=0.  So commit 4cbec4c8b9 changed the test to

                dirty_exceeded =
                        (bdi_nr_reclaimable + bdi_nr_writeback > bdi_thresh)
                        || (nr_reclaimable + nr_writeback > dirty_thresh);

So 2.6.37 will behave differently for tmpfs/ramfs: it will never get
throttled unless the global dirty threshold is exceeded (which is very
unlikely to happen; but once it does, it will block many tasks).

I'd say that the 2.6.36 behavior is very bad for tmpfs/ramfs. It means
that on a busy writing server, tmpfs write()s may get livelocked! The
"inadvertent" throttling can hardly help any workload because of its
"either no throttling, or get throttled to death" property.

So based on 2.6.37, this patch won't bring more noticeable changes.

CC: Hugh Dickins <hughd@google.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/page-writeback.c |   10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

--- linux-next.orig/mm/page-writeback.c	2011-03-03 14:43:37.000000000 +0800
+++ linux-next/mm/page-writeback.c	2011-03-03 14:43:51.000000000 +0800
@@ -244,13 +244,8 @@ void task_dirty_inc(struct task_struct *
 static void bdi_writeout_fraction(struct backing_dev_info *bdi,
 		long *numerator, long *denominator)
 {
-	if (bdi_cap_writeback_dirty(bdi)) {
-		prop_fraction_percpu(&vm_completions, &bdi->completions,
+	prop_fraction_percpu(&vm_completions, &bdi->completions,
 				numerator, denominator);
-	} else {
-		*numerator = 0;
-		*denominator = 1;
-	}
 }
 
 static inline void task_dirties_fraction(struct task_struct *tsk,
@@ -495,6 +490,9 @@ static void balance_dirty_pages(struct a
 	bool dirty_exceeded = false;
 	struct backing_dev_info *bdi = mapping->backing_dev_info;
 
+	if (!bdi_cap_account_dirty(bdi))
+		return;
+
 	for (;;) {
 		struct writeback_control wbc = {
 			.sync_mode	= WB_SYNC_NONE,
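
For reference, the reason the new check short-circuits for tmpfs/ramfs is
that their backing_dev_info declares itself exempt from dirty accounting
(and writeback) via its capability flags, and its ->dev is never set, which
is what made the tracepoint oops.  Roughly (paraphrased from mm/shmem.c and
include/linux/backing-dev.h of this era, shown only as context):

	static struct backing_dev_info shmem_backing_dev_info __read_mostly = {
		.ra_pages	= 0,	/* no readahead */
		.capabilities	= BDI_CAP_NO_ACCT_AND_WRITEBACK | BDI_CAP_SWAP_BACKED,
	};

	static inline bool bdi_cap_account_dirty(struct backing_dev_info *bdi)
	{
		return !(bdi->capabilities & BDI_CAP_NO_ACCT_DIRTY);
	}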




* [PATCH 4/4] writeback: reduce per-bdi dirty threshold ramp up time
       [not found] <20110413085937.981293444@intel.com>
                   ` (2 preceding siblings ...)
  2011-04-13  8:59 ` [PATCH 3/4] writeback: skip balance_dirty_pages() for in-memory fs Wu Fengguang
@ 2011-04-13  8:59 ` Wu Fengguang
  2011-04-13 22:04   ` Jan Kara
  2011-04-13 10:15 ` [PATCH 0/4] trivial writeback fixes Peter Zijlstra
  4 siblings, 1 reply; 29+ messages in thread
From: Wu Fengguang @ 2011-04-13  8:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Peter Zijlstra, Richard Kennedy, Wu Fengguang, Jan Kara,
	Hugh Dickins, Rik van Riel, Dave Chinner, LKML,
	Linux Memory Management List, linux-fsdevel

[-- Attachment #1: writeback-speedup-per-bdi-threshold-ramp-up.patch --]
[-- Type: text/plain, Size: 794 bytes --]

Reduce the dampening for the control system, yielding faster
convergence. The change is a bit conservative, as smaller values may
lead to noticeable bdi threshold fluctuations in low-memory JBOD setups.

CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/page-writeback.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-next.orig/mm/page-writeback.c	2011-03-02 14:52:19.000000000 +0800
+++ linux-next/mm/page-writeback.c	2011-03-02 15:00:17.000000000 +0800
@@ -145,7 +145,7 @@ static int calc_period_shift(void)
 	else
 		dirty_total = (vm_dirty_ratio * determine_dirtyable_memory()) /
 				100;
-	return 2 + ilog2(dirty_total - 1);
+	return ilog2(dirty_total - 1);
 }
 
 /*
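
To put the magic numbers in perspective: the shift returned here sets the
averaging window of the per-bdi writeout proportions to roughly 2^shift
completed pages, so dropping the "2 +" shrinks that window by a factor of 4.
A rough userspace sketch with assumed numbers (a ~4GB box at the default 20%
dirty ratio, i.e. about 200k dirtyable pages; only the 4x ratio matters):

	#include <stdio.h>

	/* stand-in for the kernel's ilog2() */
	static int ilog2_ul(unsigned long v)
	{
		return 8 * sizeof(unsigned long) - 1 - __builtin_clzl(v);
	}

	int main(void)
	{
		unsigned long dirty_total = 200000;	/* ~4GB * 20% / 4KB, assumed */
		int shift_old = 2 + ilog2_ul(dirty_total - 1);
		int shift_new = ilog2_ul(dirty_total - 1);	/* this patch */

		printf("old: shift %d, window %lu pages (~%lu MB of writeback)\n",
		       shift_old, 1UL << shift_old, (1UL << shift_old) >> 8);
		printf("new: shift %d, window %lu pages (~%lu MB of writeback)\n",
		       shift_new, 1UL << shift_new, (1UL << shift_new) >> 8);
		return 0;
	}

That is roughly 2GB of writeback before the change versus 512MB after it:
a 4x smaller window, consistent with the "4-times faster" convergence
discussed below (full adaptation takes a few such windows).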




* Re: [PATCH 0/4] trivial writeback fixes
       [not found] <20110413085937.981293444@intel.com>
                   ` (3 preceding siblings ...)
  2011-04-13  8:59 ` [PATCH 4/4] writeback: reduce per-bdi dirty threshold ramp up time Wu Fengguang
@ 2011-04-13 10:15 ` Peter Zijlstra
  4 siblings, 0 replies; 29+ messages in thread
From: Peter Zijlstra @ 2011-04-13 10:15 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, Jan Kara, Dave Chinner, Hugh Dickins,
	Rik van Riel, LKML, Linux Memory Management List, linux-fsdevel

On Wed, 2011-04-13 at 16:59 +0800, Wu Fengguang wrote:
> Andrew,
> 
> Here are four trivial writeback fix patches that
> should work well for the patches from both Jan and me.

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>



* Re: [PATCH 1/4] writeback: add bdi_dirty_limit() kernel-doc
  2011-04-13  8:59 ` [PATCH 1/4] writeback: add bdi_dirty_limit() kernel-doc Wu Fengguang
@ 2011-04-13 21:47   ` Jan Kara
  0 siblings, 0 replies; 29+ messages in thread
From: Jan Kara @ 2011-04-13 21:47 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, Peter Zijlstra, Jan Kara, Dave Chinner,
	Hugh Dickins, Rik van Riel, LKML, Linux Memory Management List,
	linux-fsdevel

On Wed 13-04-11 16:59:38, Wu Fengguang wrote:
> Clarify the bdi_dirty_limit() comment.
> 
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
  Looks fine.

  Acked-by: Jan Kara <jack@suse.cz>

							Honza
> ---
>  mm/page-writeback.c |   11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> --- linux-next.orig/mm/page-writeback.c	2011-03-03 14:38:12.000000000 +0800
> +++ linux-next/mm/page-writeback.c	2011-03-03 14:40:52.000000000 +0800
> @@ -437,10 +437,17 @@ void global_dirty_limits(unsigned long *
>  	*pdirty = dirty;
>  }
>  
> -/*
> +/**
>   * bdi_dirty_limit - @bdi's share of dirty throttling threshold
> + * @bdi: the backing_dev_info to query
> + * @dirty: global dirty limit in pages
> + *
> + * Returns @bdi's dirty limit in pages. The term "dirty" in the context of
> + * dirty balancing includes all PG_dirty, PG_writeback and NFS unstable pages.
> + * And the "limit" in the name is not seriously taken as hard limit in
> + * balance_dirty_pages().
>   *
> - * Allocate high/low dirty limits to fast/slow devices, in order to prevent
> + * It allocates high/low dirty limits to fast/slow devices, in order to prevent
>   * - starving fast devices
>   * - piling up dirty pages (that will take long time to sync) on slow devices
>   *
> 
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: [PATCH 2/4] writeback: avoid duplicate balance_dirty_pages_ratelimited() calls
  2011-04-13  8:59 ` [PATCH 2/4] writeback: avoid duplicate balance_dirty_pages_ratelimited() calls Wu Fengguang
@ 2011-04-13 21:53   ` Jan Kara
  2011-04-14  0:30     ` Wu Fengguang
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Kara @ 2011-04-13 21:53 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, Peter Zijlstra, Jan Kara, Dave Chinner,
	Hugh Dickins, Rik van Riel, LKML, Linux Memory Management List,
	linux-fsdevel

On Wed 13-04-11 16:59:39, Wu Fengguang wrote:
> When dd in 512bytes, balance_dirty_pages_ratelimited() could be called 8
> times for the same page, but obviously the page is only dirtied once.
> 
> Fix it with a (slightly racy) PageDirty() test.
> 
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> ---
>  mm/filemap.c |    5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> --- linux-next.orig/mm/filemap.c	2011-04-13 16:46:01.000000000 +0800
> +++ linux-next/mm/filemap.c	2011-04-13 16:47:26.000000000 +0800
> @@ -2313,6 +2313,7 @@ static ssize_t generic_perform_write(str
>  	long status = 0;
>  	ssize_t written = 0;
>  	unsigned int flags = 0;
> +	unsigned int dirty;
>  
>  	/*
>  	 * Copies from kernel address space cannot fail (NFSD is a big user).
> @@ -2361,6 +2362,7 @@ again:
>  		pagefault_enable();
>  		flush_dcache_page(page);
>  
> +		dirty = PageDirty(page);
  This isn't completely right as we sometimes dirty the page in
->write_begin() (see e.g. block_write_begin() when we allocate blocks under
an already uptodate page) and in such cases we would not call
balance_dirty_pages(). So I'm not sure we can really do this
optimization (although it's sad)...
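
  The path in question is roughly this part of __block_write_begin() in
fs/buffer.c (paraphrased): when a new block gets allocated under an already
uptodate page, the buffer, and with it the page, is dirtied right there,
before ->write_end() runs, so the PageDirty() test above would already see
the page dirty and skip the throttling even though this write did dirty it:

	if (buffer_new(bh)) {
		unmap_underlying_metadata(bh->b_bdev, bh->b_blocknr);
		if (PageUptodate(page)) {
			clear_buffer_new(bh);
			set_buffer_uptodate(bh);
			mark_buffer_dirty(bh);	/* dirties the page early */
			continue;
		}
		/* otherwise the new buffer is zeroed or read in ... */
	}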

>  		mark_page_accessed(page);
>  		status = a_ops->write_end(file, mapping, pos, bytes, copied,
>  						page, fsdata);
> @@ -2387,7 +2389,8 @@ again:
>  		pos += copied;
>  		written += copied;
>  
> -		balance_dirty_pages_ratelimited(mapping);
> +		if (!dirty)
> +			balance_dirty_pages_ratelimited(mapping);
>  
>  	} while (iov_iter_count(i));

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: [PATCH 3/4] writeback: skip balance_dirty_pages() for in-memory fs
  2011-04-13  8:59 ` [PATCH 3/4] writeback: skip balance_dirty_pages() for in-memory fs Wu Fengguang
@ 2011-04-13 21:54   ` Jan Kara
  0 siblings, 0 replies; 29+ messages in thread
From: Jan Kara @ 2011-04-13 21:54 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, Peter Zijlstra, Hugh Dickins, Rik van Riel,
	Jan Kara, Dave Chinner, LKML, Linux Memory Management List,
	linux-fsdevel

On Wed 13-04-11 16:59:40, Wu Fengguang wrote:
> This avoids unnecessary checks and dirty throttling on tmpfs/ramfs.
> 
> It can also prevent
> 
> [  388.126563] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
> 
> in the balance_dirty_pages tracepoint, which will call
> 
> 	dev_name(mapping->backing_dev_info->dev)
> 
> but shmem_backing_dev_info.dev is NULL.
> 
> Summary notes about the tmpfs/ramfs behavior changes:
> 
> As for 2.6.36 and older kernels, the tmpfs writes will sleep inside
> balance_dirty_pages() as long as we are over the (dirty+background)/2
> global throttle threshold.  This is because both the dirty pages and
> threshold will be 0 for tmpfs/ramfs. Hence this test will always
> evaluate to TRUE:
> 
>                 dirty_exceeded =
>                         (bdi_nr_reclaimable + bdi_nr_writeback >= bdi_thresh)
>                         || (nr_reclaimable + nr_writeback >= dirty_thresh);
> 
> For 2.6.37, someone complained that the current logic does not allow the
> users to set vm.dirty_ratio=0.  So commit 4cbec4c8b9 changed the test to
> 
>                 dirty_exceeded =
>                         (bdi_nr_reclaimable + bdi_nr_writeback > bdi_thresh)
>                         || (nr_reclaimable + nr_writeback > dirty_thresh);
> 
> So 2.6.37 will behave differently for tmpfs/ramfs: it will never get
> throttled unless the global dirty threshold is exceeded (which is very
> unlikely to happen; once happen, will block many tasks).
> 
> I'd say that the 2.6.36 behavior is very bad for tmpfs/ramfs. It means
> for a busy writing server, tmpfs write()s may get livelocked! The
> "inadvertent" throttling can hardly bring help to any workload because
> of its "either no throttling, or get throttled to death" property.
> 
> So based on 2.6.37, this patch won't bring more noticeable changes.
> 
> CC: Hugh Dickins <hughd@google.com>
> CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Acked-by: Rik van Riel <riel@redhat.com>
> Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
  Looks good.
Acked-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  mm/page-writeback.c |   10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)
> 
> --- linux-next.orig/mm/page-writeback.c	2011-03-03 14:43:37.000000000 +0800
> +++ linux-next/mm/page-writeback.c	2011-03-03 14:43:51.000000000 +0800
> @@ -244,13 +244,8 @@ void task_dirty_inc(struct task_struct *
>  static void bdi_writeout_fraction(struct backing_dev_info *bdi,
>  		long *numerator, long *denominator)
>  {
> -	if (bdi_cap_writeback_dirty(bdi)) {
> -		prop_fraction_percpu(&vm_completions, &bdi->completions,
> +	prop_fraction_percpu(&vm_completions, &bdi->completions,
>  				numerator, denominator);
> -	} else {
> -		*numerator = 0;
> -		*denominator = 1;
> -	}
>  }
>  
>  static inline void task_dirties_fraction(struct task_struct *tsk,
> @@ -495,6 +490,9 @@ static void balance_dirty_pages(struct a
>  	bool dirty_exceeded = false;
>  	struct backing_dev_info *bdi = mapping->backing_dev_info;
>  
> +	if (!bdi_cap_account_dirty(bdi))
> +		return;
> +
>  	for (;;) {
>  		struct writeback_control wbc = {
>  			.sync_mode	= WB_SYNC_NONE,
> 
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: [PATCH 4/4] writeback: reduce per-bdi dirty threshold ramp up time
  2011-04-13  8:59 ` [PATCH 4/4] writeback: reduce per-bdi dirty threshold ramp up time Wu Fengguang
@ 2011-04-13 22:04   ` Jan Kara
  2011-04-13 23:31     ` Wu Fengguang
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Kara @ 2011-04-13 22:04 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, Peter Zijlstra, Richard Kennedy, Jan Kara,
	Hugh Dickins, Rik van Riel, Dave Chinner, LKML,
	Linux Memory Management List, linux-fsdevel

On Wed 13-04-11 16:59:41, Wu Fengguang wrote:
> Reduce the dampening for the control system, yielding faster
> convergence. The change is a bit conservative, as smaller values may
> lead to noticeable bdi threshold fluctuates in low memory JBOD setup.
> 
> CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
> CC: Richard Kennedy <richard@rsk.demon.co.uk>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
  Well, I have nothing against this change as such, but what I don't like is
that it just exchanges a magical +2 for a similarly magical +0. It's clear that
this will lead to more rapid updates of the proportions of each bdi's share of
writeback and each thread's share of dirtying, but why +0? Why not +1 or -1? So
I'd prefer to get some understanding of why we need to change the
proportion period and why 4-times faster is just the right amount of faster
:) If I remember right you had some numbers for this, didn't you?

								Honza
> ---
>  mm/page-writeback.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- linux-next.orig/mm/page-writeback.c	2011-03-02 14:52:19.000000000 +0800
> +++ linux-next/mm/page-writeback.c	2011-03-02 15:00:17.000000000 +0800
> @@ -145,7 +145,7 @@ static int calc_period_shift(void)
>  	else
>  		dirty_total = (vm_dirty_ratio * determine_dirtyable_memory()) /
>  				100;
> -	return 2 + ilog2(dirty_total - 1);
> +	return ilog2(dirty_total - 1);
>  }
>  
>  /*
> 
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: [PATCH 4/4] writeback: reduce per-bdi dirty threshold ramp up time
  2011-04-13 22:04   ` Jan Kara
@ 2011-04-13 23:31     ` Wu Fengguang
  2011-04-13 23:52       ` Dave Chinner
  0 siblings, 1 reply; 29+ messages in thread
From: Wu Fengguang @ 2011-04-13 23:31 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, Peter Zijlstra, Richard Kennedy, Hugh Dickins,
	Rik van Riel, Dave Chinner, LKML, Linux Memory Management List,
	linux-fsdevel

On Thu, Apr 14, 2011 at 06:04:44AM +0800, Jan Kara wrote:
> On Wed 13-04-11 16:59:41, Wu Fengguang wrote:
> > Reduce the dampening for the control system, yielding faster
> > convergence. The change is a bit conservative, as smaller values may
> > lead to noticeable bdi threshold fluctuates in low memory JBOD setup.
> > 
> > CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > CC: Richard Kennedy <richard@rsk.demon.co.uk>
> > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
>   Well, I have nothing against this change as such but what I don't like is
> that it just changes magical +2 for similarly magical +0. It's clear that

The patch tends to make the ramp-up time a bit more reasonable for
common desktops: from 100s to 25s (see below).

> this will lead to more rapid updates of proportions of bdi's share of
> writeback and thread's share of dirtying but why +0? Why not +1 or -1? So

Yes, it will especially be a problem on _small memory_ JBOD setups.
Richard actually requested a much more radical change (decrease by
6), but that looks like too much.

My team has a 12-disk JBOD with only 6G of memory. That is pretty
small for a server, but it's a real setup and serves well as the
reference minimal setup that Linux should be able to run well on.

It will surely create more fluctuations, but they are still acceptable in my
tests. For example,

http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/10HDD-JBOD-6G/xfs-128dd-1M-16p-5904M-20%25-2.6.38-rc6-dt6+-2011-02-23-19-46/balance_dirty_pages-pages.png

> I'd prefer to get some understanding of why do we need to update the
> proportion period and why 4-times faster is just the right amount of faster
> :) If I remember right you had some numbers for this, didn't you?

Even better, I have a graph :)

http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/vanilla/4G/xfs-1dd-1M-8p-3911M-20%25-2.6.38-rc7+-2011-03-07-21-55/balance_dirty_pages-pages.png

It shows that with 1 dd on a 4G box, it took more than 100s to
ramp up. The patch will reduce that to 25 seconds for a typical desktop.
The disk has 50MB/s throughput; with a modern HDD or SSD it will
converge even faster.

Thanks,
Fengguang

> > ---
> >  mm/page-writeback.c |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > --- linux-next.orig/mm/page-writeback.c	2011-03-02 14:52:19.000000000 +0800
> > +++ linux-next/mm/page-writeback.c	2011-03-02 15:00:17.000000000 +0800
> > @@ -145,7 +145,7 @@ static int calc_period_shift(void)
> >  	else
> >  		dirty_total = (vm_dirty_ratio * determine_dirtyable_memory()) /
> >  				100;
> > -	return 2 + ilog2(dirty_total - 1);
> > +	return ilog2(dirty_total - 1);
> >  }
> >  
> >  /*
> > 
> > 
> -- 
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR


* Re: [PATCH 4/4] writeback: reduce per-bdi dirty threshold ramp up time
  2011-04-13 23:31     ` Wu Fengguang
@ 2011-04-13 23:52       ` Dave Chinner
  2011-04-14  0:23         ` Wu Fengguang
  0 siblings, 1 reply; 29+ messages in thread
From: Dave Chinner @ 2011-04-13 23:52 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Jan Kara, Andrew Morton, Peter Zijlstra, Richard Kennedy,
	Hugh Dickins, Rik van Riel, LKML, Linux Memory Management List,
	linux-fsdevel

On Thu, Apr 14, 2011 at 07:31:22AM +0800, Wu Fengguang wrote:
> On Thu, Apr 14, 2011 at 06:04:44AM +0800, Jan Kara wrote:
> > On Wed 13-04-11 16:59:41, Wu Fengguang wrote:
> > > Reduce the dampening for the control system, yielding faster
> > > convergence. The change is a bit conservative, as smaller values may
> > > lead to noticeable bdi threshold fluctuates in low memory JBOD setup.
> > > 
> > > CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > > CC: Richard Kennedy <richard@rsk.demon.co.uk>
> > > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> >   Well, I have nothing against this change as such but what I don't like is
> > that it just changes magical +2 for similarly magical +0. It's clear that
> 
> The patch tends to make the rampup time a bit more reasonable for
> common desktops. From 100s to 25s (see below).
> 
> > this will lead to more rapid updates of proportions of bdi's share of
> > writeback and thread's share of dirtying but why +0? Why not +1 or -1? So
> 
> Yes, it will especially be a problem on _small memory_ JBOD setups.
> Richard actually has requested for a much radical change (decrease by
> 6) but that looks too much.
> 
> My team has a 12-disk JBOD with only 6G memory. The memory is pretty
> small as a server, but it's a real setup and serves well as the
> reference minimal setup that Linux should be able to run well on.

FWIW, linux runs on a lot of low power NAS boxes with jbod and/or
raid setups that have <= 1GB of RAM (many of them run XFS), so even
your setup could be considered large by a significant fraction of
the storage world. Hence you need to be careful of optimising for
what you think is a "normal" server, because there simply isn't such
a thing....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 4/4] writeback: reduce per-bdi dirty threshold ramp up time
  2011-04-13 23:52       ` Dave Chinner
@ 2011-04-14  0:23         ` Wu Fengguang
  2011-04-14 10:36           ` Richard Kennedy
       [not found]           ` <20110414151424.GA367@localhost>
  0 siblings, 2 replies; 29+ messages in thread
From: Wu Fengguang @ 2011-04-14  0:23 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Jan Kara, Andrew Morton, Peter Zijlstra, Richard Kennedy,
	Hugh Dickins, Rik van Riel, LKML, Linux Memory Management List,
	linux-fsdevel

On Thu, Apr 14, 2011 at 07:52:11AM +0800, Dave Chinner wrote:
> On Thu, Apr 14, 2011 at 07:31:22AM +0800, Wu Fengguang wrote:
> > On Thu, Apr 14, 2011 at 06:04:44AM +0800, Jan Kara wrote:
> > > On Wed 13-04-11 16:59:41, Wu Fengguang wrote:
> > > > Reduce the dampening for the control system, yielding faster
> > > > convergence. The change is a bit conservative, as smaller values may
> > > > lead to noticeable bdi threshold fluctuates in low memory JBOD setup.
> > > > 
> > > > CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > > > CC: Richard Kennedy <richard@rsk.demon.co.uk>
> > > > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > >   Well, I have nothing against this change as such but what I don't like is
> > > that it just changes magical +2 for similarly magical +0. It's clear that
> > 
> > The patch tends to make the rampup time a bit more reasonable for
> > common desktops. From 100s to 25s (see below).
> > 
> > > this will lead to more rapid updates of proportions of bdi's share of
> > > writeback and thread's share of dirtying but why +0? Why not +1 or -1? So
> > 
> > Yes, it will especially be a problem on _small memory_ JBOD setups.
> > Richard actually has requested for a much radical change (decrease by
> > 6) but that looks too much.
> > 
> > My team has a 12-disk JBOD with only 6G memory. The memory is pretty
> > small as a server, but it's a real setup and serves well as the
> > reference minimal setup that Linux should be able to run well on.
> 
> FWIW, linux runs on a lot of low power NAS boxes with jbod and/or
> raid setups that have <= 1GB of RAM (many of them run XFS), so even
> your setup could be considered large by a significant fraction of
> the storage world. Hence you need to be careful of optimising for
> what you think is a "normal" server, because there simply isn't such
> a thing....

Good point! This patch is likely to hurt a loaded 1GB 4-disk NAS box...
I'll test the setup.

I did test low memory setups -- but only on simple 1-disk cases.

For example, when dirty thresh is lowered to 7MB, the dirty pages are
fluctuating like mad within the controlled scope:

http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/512M-2%25/xfs-4dd-1M-8p-435M-2%25-2.6.38-rc5-dt6+-2011-02-22-14-34/balance_dirty_pages-pages.png

But still, it achieves 100% disk utilization

http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/512M-2%25/xfs-4dd-1M-8p-435M-2%25-2.6.38-rc5-dt6+-2011-02-22-14-34/iostat-util.png

and good IO throughput:

http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/512M-2%25/xfs-4dd-1M-8p-435M-2%25-2.6.38-rc5-dt6+-2011-02-22-14-34/balance_dirty_pages-bandwidth.png

And even better, less than 120ms writeback latencies:

http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/512M-2%25/xfs-4dd-1M-8p-435M-2%25-2.6.38-rc5-dt6+-2011-02-22-14-34/balance_dirty_pages-pause.png

Thanks,
Fengguang



* Re: [PATCH 2/4] writeback: avoid duplicate balance_dirty_pages_ratelimited() calls
  2011-04-13 21:53   ` Jan Kara
@ 2011-04-14  0:30     ` Wu Fengguang
  2011-04-14 10:20       ` Jan Kara
  0 siblings, 1 reply; 29+ messages in thread
From: Wu Fengguang @ 2011-04-14  0:30 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, Peter Zijlstra, Dave Chinner, Hugh Dickins,
	Rik van Riel, LKML, Linux Memory Management List, linux-fsdevel

On Thu, Apr 14, 2011 at 05:53:07AM +0800, Jan Kara wrote:
> On Wed 13-04-11 16:59:39, Wu Fengguang wrote:
> > When dd in 512bytes, balance_dirty_pages_ratelimited() could be called 8
> > times for the same page, but obviously the page is only dirtied once.
> > 
> > Fix it with a (slightly racy) PageDirty() test.
> > 
> > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > ---
> >  mm/filemap.c |    5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > --- linux-next.orig/mm/filemap.c	2011-04-13 16:46:01.000000000 +0800
> > +++ linux-next/mm/filemap.c	2011-04-13 16:47:26.000000000 +0800
> > @@ -2313,6 +2313,7 @@ static ssize_t generic_perform_write(str
> >  	long status = 0;
> >  	ssize_t written = 0;
> >  	unsigned int flags = 0;
> > +	unsigned int dirty;
> >  
> >  	/*
> >  	 * Copies from kernel address space cannot fail (NFSD is a big user).
> > @@ -2361,6 +2362,7 @@ again:
> >  		pagefault_enable();
> >  		flush_dcache_page(page);
> >  
> > +		dirty = PageDirty(page);
>   This isn't completely right as we sometimes dirty the page in
> ->write_begin() (see e.g. block_write_begin() when we allocate blocks under
> an already uptodate page) and in such cases we would not call
> balance_dirty_pages(). So I'm not sure we can really do this
> optimization (although it's sad)...

Good catch, thanks! I evaluated three possible options; the last one
looks most promising (though it is a radical change).

- do radix_tree_tag_get() before calling ->write_begin()
  simple but heavy weight

- add balance_dirty_pages_ratelimited() in __block_write_begin()
  does not seem easy either

- accurately account the dirtied pages in account_page_dirtied() rather than
  in balance_dirty_pages_ratelimited_nr(). This diff on top of my patchset
  illustrates the idea, but will need to sort out cases like direct IO ...

--- linux-next.orig/mm/page-writeback.c	2011-04-14 07:50:09.000000000 +0800
+++ linux-next/mm/page-writeback.c	2011-04-14 07:52:35.000000000 +0800
@@ -1295,8 +1295,6 @@ void balance_dirty_pages_ratelimited_nr(
 	if (!bdi_cap_account_dirty(bdi))
 		return;
 
-	current->nr_dirtied += nr_pages_dirtied;
-
 	if (dirty_exceeded_recently(bdi, MAX_PAUSE)) {
 		unsigned long max = current->nr_dirtied +
 						(128 >> (PAGE_SHIFT - 10));
@@ -1752,6 +1750,7 @@ void account_page_dirtied(struct page *p
 		__inc_bdi_stat(mapping->backing_dev_info, BDI_DIRTIED);
 		task_dirty_inc(current);
 		task_io_account_write(PAGE_CACHE_SIZE);
+		current->nr_dirtied++;
 	}
 }
 EXPORT_SYMBOL(account_page_dirtied);

> >  		mark_page_accessed(page);
> >  		status = a_ops->write_end(file, mapping, pos, bytes, copied,
> >  						page, fsdata);
> > @@ -2387,7 +2389,8 @@ again:
> >  		pos += copied;
> >  		written += copied;
> >  
> > -		balance_dirty_pages_ratelimited(mapping);
> > +		if (!dirty)
> > +			balance_dirty_pages_ratelimited(mapping);
> >  
> >  	} while (iov_iter_count(i));
> 
> 								Honza
> -- 
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR


* Re: [PATCH 2/4] writeback: avoid duplicate balance_dirty_pages_ratelimited() calls
  2011-04-14  0:30     ` Wu Fengguang
@ 2011-04-14 10:20       ` Jan Kara
  0 siblings, 0 replies; 29+ messages in thread
From: Jan Kara @ 2011-04-14 10:20 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Jan Kara, Andrew Morton, Peter Zijlstra, Dave Chinner,
	Hugh Dickins, Rik van Riel, LKML, Linux Memory Management List,
	linux-fsdevel

On Thu 14-04-11 08:30:45, Wu Fengguang wrote:
> On Thu, Apr 14, 2011 at 05:53:07AM +0800, Jan Kara wrote:
> > On Wed 13-04-11 16:59:39, Wu Fengguang wrote:
> > > When dd in 512bytes, balance_dirty_pages_ratelimited() could be called 8
> > > times for the same page, but obviously the page is only dirtied once.
> > > 
> > > Fix it with a (slightly racy) PageDirty() test.
> > > 
> > > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > > ---
> > >  mm/filemap.c |    5 ++++-
> > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > 
> > > --- linux-next.orig/mm/filemap.c	2011-04-13 16:46:01.000000000 +0800
> > > +++ linux-next/mm/filemap.c	2011-04-13 16:47:26.000000000 +0800
> > > @@ -2313,6 +2313,7 @@ static ssize_t generic_perform_write(str
> > >  	long status = 0;
> > >  	ssize_t written = 0;
> > >  	unsigned int flags = 0;
> > > +	unsigned int dirty;
> > >  
> > >  	/*
> > >  	 * Copies from kernel address space cannot fail (NFSD is a big user).
> > > @@ -2361,6 +2362,7 @@ again:
> > >  		pagefault_enable();
> > >  		flush_dcache_page(page);
> > >  
> > > +		dirty = PageDirty(page);
> >   This isn't completely right as we sometimes dirty the page in
> > ->write_begin() (see e.g. block_write_begin() when we allocate blocks under
> > an already uptodate page) and in such cases we would not call
> > balance_dirty_pages(). So I'm not sure we can really do this
> > optimization (although it's sad)...
> 
> Good catch, thanks! I evaluated three possible options, the last one
> looks most promising (however is a radical change).
> 
> - do radix_tree_tag_get() before calling ->write_begin()
>   simple but heavy weight
  Yes, moreover you cannot really do the check until you have the page
locked for write because otherwise someone could come and write the page
before ->write_begin starts working with it.

> - add balance_dirty_pages_ratelimited() in __block_write_begin()
>   seems not easy, too
  Yes, you would call balance_dirty_pages_ratelimited() with the page lock
held, which is not a good thing to do.

> - accurately account the dirtied pages in account_page_dirtied() rather than
>   in balance_dirty_pages_ratelimited_nr(). This diff on top of my patchset
>   illustrates the idea, but will need to sort out cases like direct IO ...
> 
> --- linux-next.orig/mm/page-writeback.c	2011-04-14 07:50:09.000000000 +0800
> +++ linux-next/mm/page-writeback.c	2011-04-14 07:52:35.000000000 +0800
> @@ -1295,8 +1295,6 @@ void balance_dirty_pages_ratelimited_nr(
>  	if (!bdi_cap_account_dirty(bdi))
>  		return;
>  
> -	current->nr_dirtied += nr_pages_dirtied;
> -
>  	if (dirty_exceeded_recently(bdi, MAX_PAUSE)) {
>  		unsigned long max = current->nr_dirtied +
>  						(128 >> (PAGE_SHIFT - 10));
> @@ -1752,6 +1750,7 @@ void account_page_dirtied(struct page *p
>  		__inc_bdi_stat(mapping->backing_dev_info, BDI_DIRTIED);
>  		task_dirty_inc(current);
>  		task_io_account_write(PAGE_CACHE_SIZE);
> +		current->nr_dirtied++;
>  	}
>  }
  I see. We could do the ratelimit accounting in account_page_dirtied() and
only check the limits in balance_dirty_pages(). The only downside of this I
can see is that we would do one-by-one increments instead of a single addition
when several pages are dirtied (ocfs2, btrfs, and the splice interface take
advantage of this). But that should not be a huge issue and it's probably
worth the better ratelimit accounting.
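
A hypothetical sketch of that division of labour (names and details assumed
here, not an existing patch): account_page_dirtied() bumps current->nr_dirtied
as in your diff above, and the ratelimited entry point merely tests the
accumulated count against the per-task ratelimit:

	void balance_dirty_pages_ratelimited(struct address_space *mapping)
	{
		struct backing_dev_info *bdi = mapping->backing_dev_info;

		if (!bdi_cap_account_dirty(bdi))
			return;

		/* nr_dirtied is now maintained where pages actually get dirtied */
		if (current->nr_dirtied >= ratelimit_pages) {
			balance_dirty_pages(mapping, current->nr_dirtied);
			current->nr_dirtied = 0;
		}
	}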

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: [PATCH 4/4] writeback: reduce per-bdi dirty threshold ramp up time
  2011-04-14  0:23         ` Wu Fengguang
@ 2011-04-14 10:36           ` Richard Kennedy
  2011-04-14 13:49             ` Wu Fengguang
       [not found]           ` <20110414151424.GA367@localhost>
  1 sibling, 1 reply; 29+ messages in thread
From: Richard Kennedy @ 2011-04-14 10:36 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Dave Chinner, Jan Kara, Andrew Morton, Peter Zijlstra,
	Hugh Dickins, Rik van Riel, LKML, Linux Memory Management List,
	linux-fsdevel

On Thu, 2011-04-14 at 08:23 +0800, Wu Fengguang wrote:
> On Thu, Apr 14, 2011 at 07:52:11AM +0800, Dave Chinner wrote:
> > On Thu, Apr 14, 2011 at 07:31:22AM +0800, Wu Fengguang wrote:
> > > On Thu, Apr 14, 2011 at 06:04:44AM +0800, Jan Kara wrote:
> > > > On Wed 13-04-11 16:59:41, Wu Fengguang wrote:
> > > > > Reduce the dampening for the control system, yielding faster
> > > > > convergence. The change is a bit conservative, as smaller values may
> > > > > lead to noticeable bdi threshold fluctuates in low memory JBOD setup.
> > > > > 
> > > > > CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > > > > CC: Richard Kennedy <richard@rsk.demon.co.uk>
> > > > > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > > >   Well, I have nothing against this change as such but what I don't like is
> > > > that it just changes magical +2 for similarly magical +0. It's clear that
> > > 
> > > The patch tends to make the rampup time a bit more reasonable for
> > > common desktops. From 100s to 25s (see below).
> > > 
> > > > this will lead to more rapid updates of proportions of bdi's share of
> > > > writeback and thread's share of dirtying but why +0? Why not +1 or -1? So
> > > 
> > > Yes, it will especially be a problem on _small memory_ JBOD setups.
> > > Richard actually has requested for a much radical change (decrease by
> > > 6) but that looks too much.
> > > 
> > > My team has a 12-disk JBOD with only 6G memory. The memory is pretty
> > > small as a server, but it's a real setup and serves well as the
> > > reference minimal setup that Linux should be able to run well on.
> > 
> > FWIW, linux runs on a lot of low power NAS boxes with jbod and/or
> > raid setups that have <= 1GB of RAM (many of them run XFS), so even
> > your setup could be considered large by a significant fraction of
> > the storage world. Hence you need to be careful of optimising for
> > what you think is a "normal" server, because there simply isn't such
> > a thing....
> 
> Good point! This patch is likely to hurt a loaded 1GB 4-disk NAS box...
> I'll test the setup.
> 
> I did test low memory setups -- but only on simple 1-disk cases.
> 
> For example, when dirty thresh is lowered to 7MB, the dirty pages are
> fluctuating like mad within the controlled scope:
> 
> http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/512M-2%25/xfs-4dd-1M-8p-435M-2%25-2.6.38-rc5-dt6+-2011-02-22-14-34/balance_dirty_pages-pages.png
> 
> But still, it achieves 100% disk utilization
> 
> http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/512M-2%25/xfs-4dd-1M-8p-435M-2%25-2.6.38-rc5-dt6+-2011-02-22-14-34/iostat-util.png
> 
> and good IO throughput:
> 
> http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/512M-2%25/xfs-4dd-1M-8p-435M-2%25-2.6.38-rc5-dt6+-2011-02-22-14-34/balance_dirty_pages-bandwidth.png
> 
> And even better, less than 120ms writeback latencies:
> 
> http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/512M-2%25/xfs-4dd-1M-8p-435M-2%25-2.6.38-rc5-dt6+-2011-02-22-14-34/balance_dirty_pages-pause.png
> 
> Thanks,
> Fengguang
> 

I'm only testing on a desktop with 2 drives. I use a simple test that
writes 2GB to sda and then 2GB to sdb while recording the threshold values.
On 2.6.39-rc3, after the 2nd write starts it takes approx 90 seconds for
sda's threshold value to drop from its maximum to its minimum and sdb's to
rise from min to max. So this seems much too slow for normal desktop
workloads.

I haven't tested with this patch on 2.6.39-rc3 yet, but I'm just about
to set that up. 

I know it's difficult to pick one magic number to fit every case, but I
don't see any easy way to make this more adaptive. We could make this
calculation take account of more things, but I don't know what.


Nice graphs :) BTW, do you know what's causing that 10-second (1/10 Hz)
fluctuation in write bandwidth? And does this change affect that in any
way?

regards
Richard




* Re: [PATCH 4/4] writeback: reduce per-bdi dirty threshold ramp up time
  2011-04-14 10:36           ` Richard Kennedy
@ 2011-04-14 13:49             ` Wu Fengguang
  2011-04-14 14:08               ` Wu Fengguang
  0 siblings, 1 reply; 29+ messages in thread
From: Wu Fengguang @ 2011-04-14 13:49 UTC (permalink / raw)
  To: Richard Kennedy
  Cc: Dave Chinner, Jan Kara, Andrew Morton, Peter Zijlstra,
	Hugh Dickins, Rik van Riel, LKML, Linux Memory Management List,
	linux-fsdevel

On Thu, Apr 14, 2011 at 06:36:22PM +0800, Richard Kennedy wrote:
> On Thu, 2011-04-14 at 08:23 +0800, Wu Fengguang wrote:
> > On Thu, Apr 14, 2011 at 07:52:11AM +0800, Dave Chinner wrote:
> > > On Thu, Apr 14, 2011 at 07:31:22AM +0800, Wu Fengguang wrote:
> > > > On Thu, Apr 14, 2011 at 06:04:44AM +0800, Jan Kara wrote:
> > > > > On Wed 13-04-11 16:59:41, Wu Fengguang wrote:
> > > > > > Reduce the dampening for the control system, yielding faster
> > > > > > convergence. The change is a bit conservative, as smaller values may
> > > > > > lead to noticeable bdi threshold fluctuates in low memory JBOD setup.
> > > > > > 
> > > > > > CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > > > > > CC: Richard Kennedy <richard@rsk.demon.co.uk>
> > > > > > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > > > >   Well, I have nothing against this change as such but what I don't like is
> > > > > that it just changes magical +2 for similarly magical +0. It's clear that
> > > > 
> > > > The patch tends to make the rampup time a bit more reasonable for
> > > > common desktops. From 100s to 25s (see below).
> > > > 
> > > > > this will lead to more rapid updates of proportions of bdi's share of
> > > > > writeback and thread's share of dirtying but why +0? Why not +1 or -1? So
> > > > 
> > > > Yes, it will especially be a problem on _small memory_ JBOD setups.
> > > > Richard actually has requested for a much radical change (decrease by
> > > > 6) but that looks too much.
> > > > 
> > > > My team has a 12-disk JBOD with only 6G memory. The memory is pretty
> > > > small as a server, but it's a real setup and serves well as the
> > > > reference minimal setup that Linux should be able to run well on.
> > > 
> > > FWIW, linux runs on a lot of low power NAS boxes with jbod and/or
> > > raid setups that have <= 1GB of RAM (many of them run XFS), so even
> > > your setup could be considered large by a significant fraction of
> > > the storage world. Hence you need to be careful of optimising for
> > > what you think is a "normal" server, because there simply isn't such
> > > a thing....
> > 
> > Good point! This patch is likely to hurt a loaded 1GB 4-disk NAS box...
> > I'll test the setup.
> > 
> > I did test low memory setups -- but only on simple 1-disk cases.
> > 
> > For example, when dirty thresh is lowered to 7MB, the dirty pages are
> > fluctuating like mad within the controlled scope:
> > 
> > http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/512M-2%25/xfs-4dd-1M-8p-435M-2%25-2.6.38-rc5-dt6+-2011-02-22-14-34/balance_dirty_pages-pages.png
> > 
> > But still, it achieves 100% disk utilization
> > 
> > http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/512M-2%25/xfs-4dd-1M-8p-435M-2%25-2.6.38-rc5-dt6+-2011-02-22-14-34/iostat-util.png
> > 
> > and good IO throughput:
> > 
> > http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/512M-2%25/xfs-4dd-1M-8p-435M-2%25-2.6.38-rc5-dt6+-2011-02-22-14-34/balance_dirty_pages-bandwidth.png
> > 
> > And even better, less than 120ms writeback latencies:
> > 
> > http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/512M-2%25/xfs-4dd-1M-8p-435M-2%25-2.6.38-rc5-dt6+-2011-02-22-14-34/balance_dirty_pages-pause.png
> > 
> > Thanks,
> > Fengguang
> > 
> 
> I'm only testing on a desktop with 2 drives. I use a simple test to
> write 2gb to sda then 2gb to sdb while recording the threshold values.
> On 2.6.39-rc3, after the 2nd write starts it take approx 90 seconds for
> sda's threshold value to drop from its maximum to minimum and sdb's to
> rise from min to max. So this seems much too slow for normal desktop
> workloads. 

Yes.

> I haven't tested with this patch on 2.6.39-rc3 yet, but I'm just about
> to set that up. 

It will surely help, but the problem now is the low-memory NAS servers...

Fortunately my patchset makes the dirty pages ramp up much faster
than the per-bdi threshold does, and it is also less sensitive to the
fluctuations of per-bdi thresholds in JBOD setups.

In fact my main concern in the low-memory NAS setup is how to prevent
the disks from going idle from time to time due to the bdi dirty pages
running low. The fluctuation of per-bdi thresholds in this case is no longer
relevant for me. I ended up adding a rule to throttle the task less when
the bdi is running low on dirty pages. I find that the vanilla kernel
also has this problem.

> I know it's difficult to pick one magic number to fit every case, but I
> don't see any easy way to make this more adaptive. We could make this
> calculation take account of more things, but I don't know what.
> 
> 
> Nice graphs :) BTW do you know what's causing that 10 second (1/10 Hz)
> fluctuation in write bandwidth? and does this change effect that in any
> way?   

In fact each filesystem fluctuates in its own unique way. For example,

ext4, 4 dd
http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/512M-2%25/ext4-4dd-1M-8p-435M-2%25-2.6.38-rc5-dt6+-2011-02-22-14-49/balance_dirty_pages-bandwidth.png

btrfs, 4 dd
http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/512M-2%25/btrfs-4dd-1M-8p-435M-2%25-2.6.38-rc5-dt6+-2011-02-22-15-03/balance_dirty_pages-bandwidth.png

btrfs, 1 dd
http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/512M-2%25/btrfs-1dd-1M-8p-435M-2%25-2.6.38-rc5-dt6+-2011-02-22-14-56/balance_dirty_pages-bandwidth.png

I'm not sure about the exact root cause, but it's more or less related
to the fluctuations of IO completion events. For example, the
"written" curve is not a strictly straight line:

http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/512M-2%25/btrfs-1dd-1M-8p-435M-2%25-2.6.38-rc5-dt6+-2011-02-22-14-56/global_dirtied_written.png

Thanks,
Fengguang


* Re: [PATCH 4/4] writeback: reduce per-bdi dirty threshold ramp up time
  2011-04-14 13:49             ` Wu Fengguang
@ 2011-04-14 14:08               ` Wu Fengguang
  0 siblings, 0 replies; 29+ messages in thread
From: Wu Fengguang @ 2011-04-14 14:08 UTC (permalink / raw)
  To: Richard Kennedy
  Cc: Dave Chinner, Jan Kara, Andrew Morton, Peter Zijlstra,
	Hugh Dickins, Rik van Riel, LKML, Linux Memory Management List,
	linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 930 bytes --]

> > I'm only testing on a desktop with 2 drives. I use a simple test to
> > write 2gb to sda then 2gb to sdb while recording the threshold values.
> > On 2.6.39-rc3, after the 2nd write starts it take approx 90 seconds for
> > sda's threshold value to drop from its maximum to minimum and sdb's to
> > rise from min to max. So this seems much too slow for normal desktop
> > workloads. 
> 
> Yes.
> 
> > I haven't tested with this patch on 2.6.39-rc3 yet, but I'm just about
> > to set that up. 
> 
> It will sure help, but the problem is now the low-memory NAS servers..
> 
> Fortunately my patchset could make the dirty pages ramp up much more
> fast than the ramp up speed of the per-bdi threshold, and is also less
> sensitive to the fluctuations of per-bdi thresholds in JBOD setup.

Look at the attached graph. You cannot notice an obvious "rampup"
stage in the number of dirty pages (red line) at all :)

Thanks,
Fengguang

[-- Attachment #2: balance_dirty_pages-pages.png --]
[-- Type: image/png, Size: 81675 bytes --]


* Re: [PATCH 4/4] writeback: reduce per-bdi dirty threshold ramp up time
       [not found]           ` <20110414151424.GA367@localhost>
@ 2011-04-14 15:56             ` Wu Fengguang
  2011-04-14 18:16             ` Jan Kara
  1 sibling, 0 replies; 29+ messages in thread
From: Wu Fengguang @ 2011-04-14 15:56 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Jan Kara, Andrew Morton, Peter Zijlstra, Richard Kennedy,
	Hugh Dickins, Rik van Riel, LKML, Linux Memory Management List,
	linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 3275 bytes --]

On Thu, Apr 14, 2011 at 11:14:24PM +0800, Wu Fengguang wrote:
> On Thu, Apr 14, 2011 at 08:23:02AM +0800, Wu Fengguang wrote:
> > On Thu, Apr 14, 2011 at 07:52:11AM +0800, Dave Chinner wrote:
> > > On Thu, Apr 14, 2011 at 07:31:22AM +0800, Wu Fengguang wrote:
> > > > On Thu, Apr 14, 2011 at 06:04:44AM +0800, Jan Kara wrote:
> > > > > On Wed 13-04-11 16:59:41, Wu Fengguang wrote:
> > > > > > Reduce the dampening for the control system, yielding faster
> > > > > > convergence. The change is a bit conservative, as smaller values may
> > > > > > lead to noticeable bdi threshold fluctuates in low memory JBOD setup.
> > > > > > 
> > > > > > CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > > > > > CC: Richard Kennedy <richard@rsk.demon.co.uk>
> > > > > > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > > > >   Well, I have nothing against this change as such but what I don't like is
> > > > > that it just changes magical +2 for similarly magical +0. It's clear that
> > > > 
> > > > The patch tends to make the rampup time a bit more reasonable for
> > > > common desktops. From 100s to 25s (see below).
> > > > 
> > > > > this will lead to more rapid updates of proportions of bdi's share of
> > > > > writeback and thread's share of dirtying but why +0? Why not +1 or -1? So
> > > > 
> > > > Yes, it will especially be a problem on _small memory_ JBOD setups.
> > > > Richard actually has requested for a much radical change (decrease by
> > > > 6) but that looks too much.
> > > > 
> > > > My team has a 12-disk JBOD with only 6G memory. The memory is pretty
> > > > small as a server, but it's a real setup and serves well as the
> > > > reference minimal setup that Linux should be able to run well on.
> > > 
> > > FWIW, linux runs on a lot of low power NAS boxes with jbod and/or
> > > raid setups that have <= 1GB of RAM (many of them run XFS), so even
> > > your setup could be considered large by a significant fraction of
> > > the storage world. Hence you need to be careful of optimising for
> > > what you think is a "normal" server, because there simply isn't such
> > > a thing....
> > 
> > Good point! This patch is likely to hurt a loaded 1GB 4-disk NAS box...
> > I'll test the setup.
> 
> Just did a comparison of the IO-less patches' performance with and
> without this patch. I hardly notice any differences besides some more
> bdi goal fluctuations in the attached graphs. The write throughput is
> a bit large with this patch (80MB/s vs 76MB/s), however the delta is
> within the even larger stddev range (20MB/s).
> 
> The basic conclusion is, my IO-less patchset is very insensible to the
> bdi threshold fluctuations. In this kind of low memory case, just take
> care to stop the bdi pages from dropping too low and you get good
> performance. (well, the disks are still not 100% utilized at times...)

> Fluctuations in disk throughput and dirty rate and virtually
> everything are unavoidable due to the low memory situation.

Yeah, the fluctuations in the dirty rate are worse than in memory-abundant
situations, but they are still a lot better than what the vanilla kernel can
provide.

The attached graphs were collected with this patch. They show <=20ms
pause times and progress that is not perfectly straight, but nowhere bumpy.

Thanks,
Fengguang

[-- Attachment #2: balance_dirty_pages-task-bw.png --]
[-- Type: image/png, Size: 39729 bytes --]

[-- Attachment #3: balance_dirty_pages-pause.png --]
[-- Type: image/png, Size: 50274 bytes --]


* Re: [PATCH 4/4] writeback: reduce per-bdi dirty threshold ramp up time
       [not found]           ` <20110414151424.GA367@localhost>
  2011-04-14 15:56             ` Wu Fengguang
@ 2011-04-14 18:16             ` Jan Kara
  2011-04-15  3:43               ` Wu Fengguang
  1 sibling, 1 reply; 29+ messages in thread
From: Jan Kara @ 2011-04-14 18:16 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Dave Chinner, Jan Kara, Andrew Morton, Peter Zijlstra,
	Richard Kennedy, Hugh Dickins, Rik van Riel, LKML,
	Linux Memory Management List, linux-fsdevel

On Thu 14-04-11 23:14:25, Wu Fengguang wrote:
> On Thu, Apr 14, 2011 at 08:23:02AM +0800, Wu Fengguang wrote:
> > On Thu, Apr 14, 2011 at 07:52:11AM +0800, Dave Chinner wrote:
> > > On Thu, Apr 14, 2011 at 07:31:22AM +0800, Wu Fengguang wrote:
> > > > On Thu, Apr 14, 2011 at 06:04:44AM +0800, Jan Kara wrote:
> > > > > On Wed 13-04-11 16:59:41, Wu Fengguang wrote:
> > > > > > Reduce the dampening for the control system, yielding faster
> > > > > > convergence. The change is a bit conservative, as smaller values may
> > > > > > lead to noticeable bdi threshold fluctuates in low memory JBOD setup.
> > > > > > 
> > > > > > CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > > > > > CC: Richard Kennedy <richard@rsk.demon.co.uk>
> > > > > > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > > > >   Well, I have nothing against this change as such but what I don't like is
> > > > > that it just changes magical +2 for similarly magical +0. It's clear that
> > > > 
> > > > The patch tends to make the rampup time a bit more reasonable for
> > > > common desktops. From 100s to 25s (see below).
> > > > 
> > > > > this will lead to more rapid updates of proportions of bdi's share of
> > > > > writeback and thread's share of dirtying but why +0? Why not +1 or -1? So
> > > > 
> > > > Yes, it will especially be a problem on _small memory_ JBOD setups.
> > > > Richard actually has requested for a much radical change (decrease by
> > > > 6) but that looks too much.
> > > > 
> > > > My team has a 12-disk JBOD with only 6G memory. The memory is pretty
> > > > small as a server, but it's a real setup and serves well as the
> > > > reference minimal setup that Linux should be able to run well on.
> > > 
> > > FWIW, linux runs on a lot of low power NAS boxes with jbod and/or
> > > raid setups that have <= 1GB of RAM (many of them run XFS), so even
> > > your setup could be considered large by a significant fraction of
> > > the storage world. Hence you need to be careful of optimising for
> > > what you think is a "normal" server, because there simply isn't such
> > > a thing....
> > 
> > Good point! This patch is likely to hurt a loaded 1GB 4-disk NAS box...
> > I'll test the setup.
> 
> Just did a comparison of the IO-less patches' performance with and
> without this patch. I hardly notice any differences besides some more
> bdi goal fluctuations in the attached graphs. The write throughput is
> a bit large with this patch (80MB/s vs 76MB/s), however the delta is
> within the even larger stddev range (20MB/s).
  Thanks for the test, but I cannot tell from the numbers you provided
how much the per-bdi thresholds fluctuated in this low-memory NAS case.
You can gather the current bdi threshold from /sys/kernel/debug/bdi/<dev>/stats
so it shouldn't be hard to get the numbers...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 29+ messages in thread
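
The +2 debated above is, presumably, the shift that calc_period_shift() in
mm/page-writeback.c hands to the floating-proportions code; the per-bdi and
per-task proportions are aged over a period that grows as 2^shift, so "+2"
versus "+0" is roughly a 4x difference in ramp-up time, which matches the
100s -> 25s figure quoted in the thread. Below is a rough userspace model of
that relationship only (not the kernel code); the 6G memory size and
vm_dirty_ratio=20 are illustrative assumptions.

/* Rough model: period ~ 2^shift events, so an extra +2 means ~4x longer ramp-up. */
#include <stdio.h>

static int ilog2_ul(unsigned long v)
{
        int r = -1;

        while (v) {
                v >>= 1;
                r++;
        }
        return r;
}

int main(void)
{
        unsigned long dirtyable = (6UL << 30) >> 12;            /* ~6G of dirtyable memory, in 4k pages */
        unsigned long dirty_total = dirtyable * 20 / 100;       /* vm_dirty_ratio = 20 */
        int base = ilog2_ul(dirty_total - 1);

        printf("period with +2: %lu events\n", 1UL << (base + 2));
        printf("period with +0: %lu events\n", 1UL << base);
        return 0;
}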

* Re: [PATCH 4/4] writeback: reduce per-bdi dirty threshold ramp up time
  2011-04-14 18:16             ` Jan Kara
@ 2011-04-15  3:43               ` Wu Fengguang
       [not found]                 ` <20110415143711.GA17181@localhost>
  0 siblings, 1 reply; 29+ messages in thread
From: Wu Fengguang @ 2011-04-15  3:43 UTC (permalink / raw)
  To: Jan Kara
  Cc: Dave Chinner, Andrew Morton, Peter Zijlstra, Richard Kennedy,
	Hugh Dickins, Rik van Riel, LKML, Linux Memory Management List,
	linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 4056 bytes --]

On Fri, Apr 15, 2011 at 02:16:09AM +0800, Jan Kara wrote:
> On Thu 14-04-11 23:14:25, Wu Fengguang wrote:
> > On Thu, Apr 14, 2011 at 08:23:02AM +0800, Wu Fengguang wrote:
> > > On Thu, Apr 14, 2011 at 07:52:11AM +0800, Dave Chinner wrote:
> > > > On Thu, Apr 14, 2011 at 07:31:22AM +0800, Wu Fengguang wrote:
> > > > > On Thu, Apr 14, 2011 at 06:04:44AM +0800, Jan Kara wrote:
> > > > > > On Wed 13-04-11 16:59:41, Wu Fengguang wrote:
> > > > > > > Reduce the dampening for the control system, yielding faster
> > > > > > > convergence. The change is a bit conservative, as smaller values may
> > > > > > > lead to noticeable bdi threshold fluctuations in low-memory JBOD setups.
> > > > > > > 
> > > > > > > CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > > > > > > CC: Richard Kennedy <richard@rsk.demon.co.uk>
> > > > > > > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > > > > >   Well, I have nothing against this change as such, but what I don't like is
> > > > > > that it just swaps the magical +2 for a similarly magical +0. It's clear that
> > > > > 
> > > > > The patch tends to make the ramp-up time a bit more reasonable for
> > > > > common desktops: from 100s to 25s (see below).
> > > > > 
> > > > > > this will lead to more rapid updates of the proportions of each bdi's share of
> > > > > > writeback and each thread's share of dirtying, but why +0? Why not +1 or -1? So
> > > > > 
> > > > > Yes, it will especially be a problem on _small memory_ JBOD setups.
> > > > > Richard actually requested a much more radical change (a decrease by
> > > > > 6), but that looks like too much.
> > > > > 
> > > > > My team has a 12-disk JBOD with only 6G memory. The memory is pretty
> > > > > small for a server, but it's a real setup and serves well as the
> > > > > reference minimal setup that Linux should be able to run well on.
> > > > 
> > > > FWIW, linux runs on a lot of low power NAS boxes with jbod and/or
> > > > raid setups that have <= 1GB of RAM (many of them run XFS), so even
> > > > your setup could be considered large by a significant fraction of
> > > > the storage world. Hence you need to be careful of optimising for
> > > > what you think is a "normal" server, because there simply isn't such
> > > > a thing....
> > > 
> > > Good point! This patch is likely to hurt a loaded 1GB 4-disk NAS box...
> > > I'll test the setup.
> > 
> > Just did a comparison of the IO-less patches' performance with and
> > without this patch. I hardly noticed any differences besides some more
> > bdi goal fluctuations in the attached graphs. The write throughput is
> > a bit higher with this patch (80MB/s vs 76MB/s); however, the delta is
> > within the even larger stddev range (20MB/s).
>   Thanks for the test but I cannot find out from the numbers you provided
> how much the per-bdi thresholds fluctuated in this low-memory NAS case.
> You can gather the current bdi threshold from /sys/kernel/debug/bdi/<dev>/stats,
> so it shouldn't be hard to get the numbers...

Hi Jan, attached are your results w/o this patch. The "bdi goal" (gray
line) is calculated as (bdi_thresh - bdi_thresh/8) and is fluctuating
all over the place... and the average wkB/s is only 49MB/s.
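
The arithmetic behind that gray curve, as a minimal sketch: bdi_frac below is
just an illustrative stand-in for the completion proportion that
bdi_dirty_limit() uses to split the global threshold among devices; the
numbers are made up.

#include <stdio.h>

int main(void)
{
        unsigned long dirty_thresh = 180000;    /* global dirty limit in pages, illustrative */
        double bdi_frac = 0.25;                 /* this bdi's recent share of completions (assumed) */
        unsigned long bdi_thresh = (unsigned long)(dirty_thresh * bdi_frac);
        unsigned long bdi_goal = bdi_thresh - bdi_thresh / 8;  /* the plotted "bdi goal" */

        printf("bdi_thresh=%lu pages, bdi_goal=%lu pages\n", bdi_thresh, bdi_goal);
        return 0;
}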

Thanks,
Fengguang
---

wfg ~/bee% cat xfs-1dd-1M-16p-5907M-3:2-2.6.39-rc3-jan-bdp+-2011-04-15.11:11/iostat-avg 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
sum          2.460      0.000     71.080    767.240      0.000   1859.220 
avg          0.091      0.000      2.633     28.416      0.000     68.860 
stddev       0.064      0.000      0.659      7.903      0.000      7.792 


Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sum          0.000     58.100      0.000   2926.980      0.000 1331730.590  18278.540    962.290   4850.450     97.470   1315.600 
avg          0.000      2.152      0.000    108.407      0.000  49323.355    676.983     35.640    179.646      3.610     48.726 
stddev       0.000      5.336      0.000    104.398      0.000  47602.790    400.410     40.696    169.289      2.212     45.870 


[-- Attachment #2: balance_dirty_pages-pages.png --]
[-- Type: image/png, Size: 111238 bytes --]

[-- Attachment #3: balance_dirty_pages-task-bw.png --]
[-- Type: image/png, Size: 36656 bytes --]

[-- Attachment #4: balance_dirty_pages-pause.png --]
[-- Type: image/png, Size: 28377 bytes --]

[-- Attachment #5: iostat --]
[-- Type: text/plain, Size: 65182 bytes --]

Linux 2.6.39-rc3-jan-bdp+ (lkp-ne02) 	04/15/11 	_x86_64_	(16 CPU)

04/15/11 11:11:04
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.03    0.00    0.53    0.23    0.00   99.20

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               2.24     0.00    0.66    0.00     2.62     0.00     7.94     0.00    5.75   2.56   0.17
sdb               2.73     0.00    0.66    0.00     2.61     0.00     7.92     0.00    3.68   2.15   0.14
sdc               5.62     5.17    1.23    2.13     4.40   484.78   290.60     0.01    2.67   2.36   0.79
sdd               5.62     5.17    1.23    2.13     4.40   484.78   290.60     0.01    2.61   2.35   0.79
sdf               5.62     5.17    1.23    2.13     4.40   484.78   290.60     0.01    2.66   2.27   0.76
sdg               2.73     0.00    0.66    0.00     2.61     0.00     7.92     0.00    0.57   0.43   0.03
sdh               2.73     0.00    0.66    0.00     2.61     0.00     7.92     0.00    1.38   0.70   0.05
sdi               2.73     0.00    0.66    0.00     2.61     0.00     7.92     0.00    2.28   1.02   0.07
sdl               2.73     0.00    0.66    0.00     2.61     0.00     7.92     0.00    5.56   2.77   0.18
sdk               2.73     0.00    0.66    0.00     2.61     0.00     7.92     0.00    3.41   2.44   0.16
sdj               2.73     0.00    0.66    0.00     2.61     0.00     7.92     0.00    4.07   2.22   0.15
sdm               2.73     0.00    0.66    0.00     2.61     0.00     7.92     0.00    4.21   2.41   0.16
sde               5.62     5.17    1.23    2.13     4.40   484.78   290.60     0.01    2.54   2.25   0.76

04/15/11 11:11:05
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.25    0.00    2.56    6.81    0.00   90.38

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.00    6.00   55.00    13.50 25112.00   823.79     7.38   68.49   5.07  30.90
sdd               0.00     0.00    6.00   81.00    13.50 36900.00   848.59    10.04  115.44   4.95  43.10
sdf               0.00     0.00    6.00   24.00    13.50 11156.00   744.63     1.73   48.03   6.73  20.20
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00    6.00   33.00    13.50 15544.00   797.82     1.95   49.92   5.87  22.90

04/15/11 11:11:06
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.07    0.00    1.60   11.19    0.00   87.14

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00  148.00     0.00 67648.00   914.16    20.07  129.17   5.26  77.90
sdd               0.00     0.00    0.00  110.00     0.00 50224.00   913.16    13.82  110.78   5.74  63.10
sdf               0.00    10.00    0.00   49.00     0.00 22092.00   901.71     3.19   70.94   6.29  30.80
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00   63.00     0.00 28836.00   915.43     6.02   68.86   7.43  46.80

04/15/11 11:11:07
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.10    0.00    3.93   34.50    0.00   61.47

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00  184.00     0.00 83540.00   908.04    29.32  181.99   4.65  85.50
sdd               0.00     0.00    0.00   83.00     0.00 37924.00   913.83     7.49   92.65   5.66  47.00
sdf               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00   48.00     0.00 22036.00   918.17     3.98   78.25   5.62  27.00

04/15/11 11:11:08
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.12    0.00    2.44   19.95    0.00   77.49

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00   81.00     0.00 36900.00   911.11    10.88  134.35   6.35  51.40
sdd               0.00     0.00    0.00  185.00     0.00 84052.00   908.67    25.92  139.65   5.06  93.60
sdf               0.00     0.00    0.00   81.00     0.00 36900.00   911.11     8.00   98.75   6.26  50.70
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00  112.00     0.00 50740.00   906.07    10.08  106.99   5.58  62.50

04/15/11 11:11:09
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.06    0.00    3.09   23.78    0.00   73.07

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00  118.00     0.00 53812.00   912.07    71.68  327.02   6.63  78.20
sdd               0.00     0.00    0.00  146.00     0.00 66624.00   912.66    51.58  247.23   6.08  88.70
sdf               0.00     0.00    0.00   88.00     0.00 40100.00   911.36    13.76  156.39   6.25  55.00
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00  102.00     0.00 46636.00   914.43    20.23  198.29   6.41  65.40

04/15/11 11:11:10
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.14    0.00    3.57   29.16    0.00   67.12

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00  232.00     0.00 105864.00   912.62    42.51  314.77   4.31 100.10
sdd               0.00     0.00    0.00  164.00     0.00 74740.00   911.46    15.92  200.82   4.43  72.70
sdf               0.00     0.00    0.00   54.00     0.00 24600.00   911.11     3.42   63.37   5.15  27.80
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00   18.00     0.00  8200.00   911.11     0.58   32.17   5.56  10.00

04/15/11 11:11:11
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.06    0.00    2.91   23.69    0.00   73.34

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00  237.00     0.00 108136.00   912.54    79.45  271.22   4.22 100.00
sdd               0.00     0.00    0.00  149.00     0.00 68160.00   914.90    12.46   81.21   4.45  66.30
sdf               0.00     0.00    0.00    9.00     0.00  4100.00   911.11     0.37   41.56   6.22   5.60
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

04/15/11 11:11:12
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.09    0.00    2.81   33.77    0.00   63.33

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00  215.00     0.00 97888.00   910.59   110.38  467.82   4.65 100.00
sdd               0.00     0.00    0.00   49.00     0.00 22040.00   899.59     1.98   47.69   6.65  32.60
sdf               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

04/15/11 11:11:13
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.06    0.00    2.20   23.63    0.00   74.11

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00  221.00     0.00 100452.00   909.07   102.73  479.98   4.52 100.00
sdd               0.00     0.00    0.00   97.00     0.00 44584.00   919.26     6.89   68.51   5.34  51.80
sdf               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

04/15/11 11:11:14
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.10    0.00    1.77   17.34    0.00   80.80

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     1.00    0.00  222.00     0.00 101764.00   916.79    88.72  442.20   4.50  99.90
sdd               0.00    11.00    0.00  183.00     0.00 82888.00   905.88    30.25  166.62   4.45  81.40
sdf               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

04/15/11 11:11:15
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.09    0.00    3.28   32.44    0.00   64.19

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00  241.00     0.00 109676.00   910.17    86.70  352.49   4.15 100.10
sdd               0.00     0.00    0.00  171.00     0.00 77900.00   911.11    14.94   76.73   4.18  71.50
sdf               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

04/15/11 11:11:16
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.06    0.00    2.98   21.90    0.00   75.06

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00  183.00     0.00 83028.00   907.41    37.92  297.45   4.42  80.90
sdd               0.00     0.00    0.00  180.00     0.00 82000.00   911.11    28.46  129.28   4.77  85.80
sdf               0.00     0.00    0.00   94.00     0.00 43436.00   924.17     8.69   82.37   5.51  51.80
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     1.00    0.00   96.00     0.00 44184.00   920.50    22.37  230.18   5.56  53.40

04/15/11 11:11:17
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.10    0.00    2.52   18.68    0.00   78.70

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00   39.00     0.00 17936.00   919.79     1.75   41.49   5.69  22.20
sdd               0.00     0.00    0.00  209.00     0.00 95324.00   912.19    58.31  264.49   4.78 100.00
sdf               0.00     0.00    0.00  128.00     0.00 57916.00   904.94    11.57   97.80   5.04  64.50
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00    24.00    0.00  114.00     0.00 51764.00   908.14    11.22   97.05   5.01  57.10

04/15/11 11:11:18
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.08    0.00    3.66   29.37    0.00   66.88

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00    6.00     0.00  2564.00   854.67     0.12   42.67   5.67   3.40
sdd               0.00     0.00    0.00  210.00     0.00 95836.00   912.72    78.49  344.57   4.77 100.10
sdf               0.00     0.00    0.00   63.00     0.00 28700.00   911.11     4.01   63.70   4.90  30.90
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00    86.00    0.00  161.00     0.00 73284.00   910.36    16.79  101.65   4.94  79.60

04/15/11 11:11:19
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.13    0.00    1.64   15.79    0.00   82.44

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdd               0.00     6.00    0.00  213.00     0.00 96864.00   909.52    95.20  391.27   4.69 100.00
sdf               0.00     8.00    0.00   64.00     0.00 29204.00   912.62     5.95   87.69   6.64  42.50
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00    10.00    0.00   82.00     0.00 36968.00   901.66     6.81   93.56   5.48  44.90

04/15/11 11:11:20
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.07    0.00    2.67   24.78    0.00   72.48

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdd               0.00     0.00    0.00  223.00     0.00 101648.00   911.64   107.04  492.83   4.49 100.10
sdf               0.00    10.00    0.00   82.00     0.00 37292.00   909.56     5.41   70.00   5.28  43.30
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00   35.00     0.00 16396.00   936.91     3.22   70.43   6.37  22.30

04/15/11 11:11:21
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.22    0.00    2.69   26.87    0.00   70.23

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdd               0.00     0.00    0.00  220.00     0.00 100448.00   913.16   108.26  468.24   4.54  99.80
sdf               0.00     6.00    0.00   30.00     0.00 12908.00   860.53     0.99   32.93   5.37  16.10
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00   56.00     0.00 25452.00   909.00     2.77   59.91   5.64  31.60

04/15/11 11:11:22
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.15    0.00    2.86   29.01    0.00   67.97

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00    3.00     0.00  1536.00  1024.00     1.98   12.33   7.33   2.20
sdd               0.00     0.00    0.00  227.00     0.00 103016.00   907.63   105.63  475.52   4.41 100.10
sdf               0.00     0.00    0.00   72.00     0.00 32800.00   911.11     5.64   78.36   5.10  36.70
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00   33.00     0.00 14864.00   900.85     1.27   43.52   4.97  16.40

04/15/11 11:11:23
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.21    0.00    3.78   25.11    0.00   70.90

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00    23.00    0.00  240.00     0.00 109452.00   912.10    80.12  319.24   4.16  99.90
sdd               0.00     0.00    0.00  237.00     0.00 108136.00   912.54    44.57  304.46   4.22  99.90
sdf               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00   36.00     0.00 16400.00   911.11     1.64   45.61   4.58  16.50

04/15/11 11:11:24
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.04    0.00    2.42   16.49    0.00   81.04

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00  182.00     0.00 82604.00   907.74    48.70  293.42   5.20  94.70
sdd               0.00     0.00    0.00  195.00     0.00 88664.00   909.37    78.54  206.17   5.13 100.00
sdf               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00   36.00     0.00 16400.00   911.11     2.43   67.47   6.25  22.50

04/15/11 11:11:25
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.16    0.00    3.87   28.72    0.00   67.24

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00  181.00     0.00 82512.00   911.73    29.07  140.47   5.17  93.50
sdd               0.00     0.00    0.00  193.00     0.00 88528.00   917.39    79.66  541.11   5.19 100.10
sdf               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

04/15/11 11:11:26
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.14    0.00    3.66   24.21    0.00   72.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     4.00    0.00  205.00     0.00 93328.00   910.52    52.15  238.70   4.88 100.00
sdd               0.00     0.00    0.00  208.00     0.00 94812.00   911.65    52.81  295.45   4.81 100.00
sdf               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

04/15/11 11:11:27
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.15    0.00    2.10   18.14    0.00   79.61

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00  187.00     0.00 85584.00   915.34    46.50  233.48   5.35 100.00
sdd               0.00     0.00    0.00  187.00     0.00 85076.00   909.90    63.27  336.57   5.35 100.00
sdf               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

04/15/11 11:11:28
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.13    0.00    3.05   22.20    0.00   74.62

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00    17.00    0.00  235.00     0.00 106928.00   910.03    87.12  321.71   4.26 100.00
sdd               0.00     0.00    0.00   93.00     0.00 42028.00   903.83    10.28  172.73   4.75  44.20
sdf               0.00     0.00    0.00   98.00     0.00 44624.00   910.69    10.06  102.61   4.42  43.30
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

04/15/11 11:11:29
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.10    0.00    2.41   18.55    0.00   78.95

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00  240.00     0.00 109164.00   909.70    80.55  427.16   4.03  96.80
sdd               0.00     0.00    0.00   57.00     0.00 26136.00   917.05    32.22  122.04   4.33  24.70
sdf               0.00     3.00    0.00   45.00     0.00 20592.00   915.20     4.08   90.78   6.02  27.10
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00   53.00     0.00 24112.00   909.89    10.19  192.17   5.53  29.30

04/15/11 11:11:30
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.20    0.00    2.51   29.36    0.00   67.93

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     3.00    0.00    3.00     0.00    15.00    10.00     0.00    0.33   0.33   0.10
sdd               0.00    11.00    0.00  240.00     0.00 109392.00   911.60   108.14  454.32   4.17 100.00
sdf               0.00     0.00    0.00   45.00     0.00 20500.00   911.11     1.51   33.60   4.42  19.90
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

04/15/11 11:11:31
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.07    0.00    2.23   28.36    0.00   69.34

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdd               0.00     0.00    0.00  222.00     0.00 101578.00   915.12   109.82  480.98   4.51 100.10
sdf               0.00     0.00    0.00   36.00     0.00 16400.00   911.11     1.72   47.67   5.92  21.30
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

04/15/11 11:11:32
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.16    0.00    2.65   34.53    0.00   62.67

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdd               0.00     9.00    0.00  239.00     0.00 108112.50   904.71   108.93  457.56   4.18  99.90
sdf               0.00     0.00    0.00   63.00     0.00 28700.00   911.11     2.77   43.98   4.32  27.20
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00    1.00     0.00     2.00     4.00     0.00    0.00   0.00   0.00

04/15/11 11:11:33
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.06    0.00    2.06   25.69    0.00   72.19

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdd               0.00     0.00    0.00  238.00     0.00 108648.00   913.01   109.50  465.11   4.20 100.00
sdf               0.00     0.00    0.00   56.00     0.00 25002.00   892.93     3.52   62.89   4.16  23.30
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     3.00    0.00    2.00     0.00    13.00    13.00     0.00    0.50   0.50   0.10

04/15/11 11:11:34
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.10    0.00    1.84   21.94    0.00   76.12

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00    9.00     0.00  4100.00   911.11     6.45   28.00   5.33   4.80
sdd               0.00     9.00    0.00  228.00     0.00 104036.00   912.60   112.86  484.62   4.39 100.10
sdf               0.00     3.00    0.00   29.00     0.00 12313.00   849.17     0.77   26.41   4.55  13.20
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

04/15/11 11:11:35
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.05    0.00    3.55   20.32    0.00   76.08

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     6.00    0.00  234.00     0.00 106736.00   912.27    79.91  337.08   4.27  99.90
sdd               0.00     0.00    0.00  239.00     0.00 109000.00   912.13    47.34  304.64   4.18  99.90
sdf               0.00     0.00    0.00   36.00     0.00 16400.00   911.11     1.99   55.31   4.83  17.40
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

04/15/11 11:11:43
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.03    0.00    2.89   37.10    0.00   59.98

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     3.77    0.00  233.09     0.00 106175.67   911.03   101.11  425.91   4.29 100.05
sdd               0.00     0.00    0.00   94.65     0.00 43082.73   910.39     6.19   68.82   4.15  39.32
sdf               0.00     0.36    0.00   22.87     0.00 10475.91   916.09     1.13   48.94   4.48  10.24
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.85    0.00   28.95     0.00 13097.32   904.71     1.40   48.43   4.48  12.98

04/15/11 11:12:07
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.01    0.00    1.68   31.49    0.00   66.82

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.04    0.00  232.51     0.00 105830.84   910.35   134.49  578.47   4.30  99.98
sdd               0.00     0.09    0.00   15.93     0.00  7128.99   894.96     4.24  266.26   4.93   7.85
sdf               0.00     0.00    0.00    5.91     0.00  2579.74   873.00     0.50   85.49   5.51   3.25
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.30    0.00    9.08     0.00  4049.38   892.01     1.06  116.65   5.13   4.66

04/15/11 11:12:08
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.07    0.00    2.68   38.97    0.00   58.28

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00  244.00     0.00 111212.00   911.57   105.33  434.26   4.10 100.10
sdd               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdf               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

04/15/11 11:13:10
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.01    0.00    2.09   40.16    0.00   57.74

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.13    0.00  106.56     0.00 48465.11   909.64    33.63  318.86   4.38  46.63
sdd               0.00     1.23    0.00  192.93     0.00 87854.15   910.74    87.40  452.15   4.36  84.10
sdf               0.00     0.08    0.00    5.29     0.00  2353.40   889.44     0.34   65.00   6.58   3.48
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.38    0.00   20.72     0.00  9346.82   902.27     1.86   89.81   4.93  10.22

04/15/11 11:13:35
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.01    0.00    1.83   40.09    0.00   58.07

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.48    0.00   25.42     0.00 11497.43   904.68     2.94  113.47   4.65  11.81
sdd               0.00     0.04    0.00  233.65     0.00 106353.64   910.36   126.72  542.92   4.28 100.02
sdf               0.00     0.28    0.00    7.04     0.00  3172.63   901.24     0.43   59.04   5.74   4.04
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.36    0.00    6.13     0.00  2650.02   865.21     0.29   48.47   4.80   2.94

04/15/11 11:13:48
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.03    0.00    1.90   42.73    0.00   55.34

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.08    0.00   18.15     0.00  8206.06   904.46     3.66  207.89   5.13   9.30
sdd               0.00     0.00    0.00  234.07     0.00 106601.62   910.84   134.79  567.23   4.27  99.99
sdf               0.00     0.00    0.00    4.20     0.00  1835.82   874.26     0.35   91.26   6.38   2.68
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00    5.39     0.00  2364.54   877.66     0.58  108.31   6.87   3.70

04/15/11 11:13:54
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.01    0.00    2.18   42.40    0.00   55.41

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.60    0.00   63.25     0.00 28720.48   908.11     8.86  127.74   4.78  30.23
sdd               0.00     0.00    0.00  234.49     0.00 106746.39   910.46   105.36  468.43   4.26  99.92
sdf               0.00     0.00    0.00   10.99     0.00  4947.59   900.05     0.66   60.44   7.16   7.88
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.15    0.00   14.31     0.00  6559.04   916.88     1.01   70.37   6.26   8.96

04/15/11 11:13:55
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.05    0.00    3.35   21.45    0.00   75.15

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00  240.00     0.00 109164.00   909.70    57.85  207.77   4.17 100.00
sdd               0.00     0.00    0.00  168.00     0.00 76364.00   909.10    15.61  107.88   4.27  71.70
sdf               0.00     0.00    0.00   61.00     0.00 27980.00   917.38     6.27   70.77   8.07  49.20
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

04/15/11 11:14:01
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.02    0.00    2.83   33.41    0.00   63.74

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/4] writeback: reduce per-bdi dirty threshold ramp up time
       [not found]                 ` <20110415143711.GA17181@localhost>
@ 2011-04-15 22:13                   ` Jan Kara
  2011-04-16  6:05                     ` Wu Fengguang
  2011-04-16  8:33                     ` Peter Zijlstra
  0 siblings, 2 replies; 29+ messages in thread
From: Jan Kara @ 2011-04-15 22:13 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Jan Kara, Dave Chinner, Andrew Morton, Peter Zijlstra,
	Richard Kennedy, Hugh Dickins, Rik van Riel, LKML,
	Linux Memory Management List, linux-fsdevel

On Fri 15-04-11 22:37:11, Wu Fengguang wrote:
> On Fri, Apr 15, 2011 at 11:43:00AM +0800, Wu Fengguang wrote:
> > On Fri, Apr 15, 2011 at 02:16:09AM +0800, Jan Kara wrote:
> > > On Thu 14-04-11 23:14:25, Wu Fengguang wrote:
> > > > On Thu, Apr 14, 2011 at 08:23:02AM +0800, Wu Fengguang wrote:
> > > > > On Thu, Apr 14, 2011 at 07:52:11AM +0800, Dave Chinner wrote:
> > > > > > On Thu, Apr 14, 2011 at 07:31:22AM +0800, Wu Fengguang wrote:
> > > > > > > On Thu, Apr 14, 2011 at 06:04:44AM +0800, Jan Kara wrote:
> > > > > > > > On Wed 13-04-11 16:59:41, Wu Fengguang wrote:
> > > > > > > > > Reduce the dampening for the control system, yielding faster
> > > > > > > > > convergence. The change is a bit conservative, as smaller values may
> > > > > > > > > lead to noticeable bdi threshold fluctuates in low memory JBOD setup.
> > > > > > > > > 
> > > > > > > > > CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > > > > > > > > CC: Richard Kennedy <richard@rsk.demon.co.uk>
> > > > > > > > > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > > > > > > >   Well, I have nothing against this change as such but what I don't like is
> > > > > > > > that it just changes magical +2 for similarly magical +0. It's clear that
> > > > > > > 
> > > > > > > The patch tends to make the rampup time a bit more reasonable for
> > > > > > > common desktops. From 100s to 25s (see below).
> > > > > > > 
> > > > > > > > this will lead to more rapid updates of proportions of bdi's share of
> > > > > > > > writeback and thread's share of dirtying but why +0? Why not +1 or -1? So
> > > > > > > 
> > > > > > > Yes, it will especially be a problem on _small memory_ JBOD setups.
> > > > > > > Richard actually has requested for a much radical change (decrease by
> > > > > > > 6) but that looks too much.
> > > > > > > 
> > > > > > > My team has a 12-disk JBOD with only 6G memory. The memory is pretty
> > > > > > > small as a server, but it's a real setup and serves well as the
> > > > > > > reference minimal setup that Linux should be able to run well on.
> > > > > > 
> > > > > > FWIW, linux runs on a lot of low power NAS boxes with jbod and/or
> > > > > > raid setups that have <= 1GB of RAM (many of them run XFS), so even
> > > > > > your setup could be considered large by a significant fraction of
> > > > > > the storage world. Hence you need to be careful of optimising for
> > > > > > what you think is a "normal" server, because there simply isn't such
> > > > > > a thing....
> > > > > 
> > > > > Good point! This patch is likely to hurt a loaded 1GB 4-disk NAS box...
> > > > > I'll test the setup.
> > > > 
> > > > Just did a comparison of the IO-less patches' performance with and
> > > > without this patch. I hardly notice any differences besides some more
> > > > bdi goal fluctuations in the attached graphs. The write throughput is
> > > > a bit large with this patch (80MB/s vs 76MB/s), however the delta is
> > > > within the even larger stddev range (20MB/s).
> > >   Thanks for the test but I cannot find out from the numbers you provided
> > > how much did the per-bdi thresholds fluctuate in this low memory NAS case?
> > > You can gather current bdi threshold from /sys/kernel/debug/bdi/<dev>/stats
> > > so it shouldn't be hard to get the numbers...
> > 
> > Hi Jan, attached are your results w/o this patch. The "bdi goal" (gray
> > line) is calculated as (bdi_thresh - bdi_thresh/8) and is fluctuating
> > all over the place.. and average wkB/s is only 49MB/s..
> 
> I got the numbers for vanilla kernel: XFS can do 57MB/s and 63MB/s in
> the two runs.  There are large fluctuations in the attached graphs, too.
  Hmm, so the graphs from previous email are with longer "proportion
period (without patch we discuss here)" and graphs from this email are
with it?

> To sum it up, for a 1GB mem, 4-disk JBOD setup, running 1 dd per
> disk:
> 
> vanilla: 57MB/s, 63MB/s
> Jan:     49MB/s, 103MB/s
> Wu:      76MB/s, 80MB/s
> 
> The balance_dirty_pages-task-bw-jan.png and
> balance_dirty_pages-pages-jan.png shows very unfair allocation of
> dirty pages and throughput among the disks...
  Fengguang, can we please stay on topic? It's good to know that throughput
fluctuates so much with my patches (although not that surprising seeing the
fluctuations of bdi limits) but for the sake of this patch throughput
numbers with different balance_dirty_pages() implementations do not seem
that interesting.  What is interesting (at least to me) is how this
particular patch changes fluctuations of bdi thresholds (fractions) in
vanilla kernel. In the graphs, I can see only bdi goal - that is the
per-bdi threshold we have in balance_dirty_pages() am I right? And it is
there for only a single device, right?

Anyway either with or without the patch, bdi thresholds are jumping rather
wildly if I'm interpreting the graphs right. Hmm, which is not that surprising
given that in ideal case we should have about 0.5s worth of writeback for
each disk in the page cache. So with your patch the period for proportion
estimation is also just about 0.5s worth of page writeback which is
understandably susceptible to fluctuations. Thinking about it, the original
period of 4*"dirty limit" on your machine is about 2.5 GB which is about
50s worth of writeback on that machine so it is in match with your
observation that it takes ~100s for bdi threshold to climb up.
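
As a rough check of that arithmetic (assuming ~50 MB/s of aggregate writeback, which
is what those numbers imply): 2.5 GB / 50 MB/s ~= 50 s per period, and since the
decaying average needs on the order of two periods before a newly active bdi claims
most of the proportion (1 - 2^-2 = 75% after two halvings), the observed ~100 s
ramp-up fits.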

So what is a takeaway from this for me is that scaling the period
with the dirty limit is not the right thing. If you'd have 4-times more
memory, your choice of "dirty limit" as the period would be as bad as
current 4*"dirty limit". What would seem like a better choice of period
to me would be to have the period in an order of a few seconds worth of
writeback. That would allow the bdi limit to scale up reasonably fast when
new bdi starts to be used and still not make it fluctuate that much
(hopefully).

Looking at math in lib/proportions.c, nothing really fundamental requires
that each period has the same length. So it shouldn't be hard to actually
create proportions calculator that would have timer triggered periods -
simply whenever the timer fires, we would declare a new period. The only
things which would be broken by this are (t represents global counter of
events):
a) counting of periods as t/period_len - we would have to maintain global
period counter but that's trivial
b) trick that we don't do t=t/2 for each new period but rather use
period_len/2+(t % (period_len/2)) when calculating fractions - again we
would have to bite the bullet and divide the global counter when we declare
new period but again it's not a big deal in our case.
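
To make the idea concrete, here is a minimal userspace sketch of such a
timer-triggered scheme (the names are made up; unlike lib/proportions.c it naively
loops over all elements at each period instead of catching up lazily, so it is
illustrative only):

#include <stdio.h>

#define NR_BDI 2

static unsigned long events[NR_BDI];	/* per-bdi completions, decayed */
static unsigned long total;		/* global completions, decayed */
static unsigned long period_count;	/* a) explicit global period counter */

/* Timer callback: declare a new period of whatever length just elapsed. */
static void new_period(void)
{
	int i;

	/* b) bite the bullet and halve the global counter at the boundary... */
	total /= 2;
	/* ...and the per-element counters (a real implementation would do this
	 * lazily per element by comparing a stamp against period_count). */
	for (i = 0; i < NR_BDI; i++)
		events[i] /= 2;
	period_count++;
}

static void account_completion(int bdi)
{
	events[bdi]++;
	total++;
}

/* p(j) = l(j)/g from the proposal above */
static double fraction(int bdi)
{
	return total ? (double)events[bdi] / total : 0.0;
}

int main(void)
{
	int t, i;

	/* bdi 0 completes 3x as many writes as bdi 1; the "timer" fires every 100 events */
	for (t = 0; t < 8; t++) {
		for (i = 0; i < 75; i++)
			account_completion(0);
		for (i = 0; i < 25; i++)
			account_completion(1);
		new_period();
		printf("period %lu: p(0)=%.2f p(1)=%.2f\n",
		       period_count, fraction(0), fraction(1));
	}
	return 0;
}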

Peter what do you think about this? Do you (or anyone else) think it makes
sense?

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/4] writeback: reduce per-bdi dirty threshold ramp up time
  2011-04-15 22:13                   ` Jan Kara
@ 2011-04-16  6:05                     ` Wu Fengguang
  2011-04-16  8:33                     ` Peter Zijlstra
  1 sibling, 0 replies; 29+ messages in thread
From: Wu Fengguang @ 2011-04-16  6:05 UTC (permalink / raw)
  To: Jan Kara
  Cc: Dave Chinner, Andrew Morton, Peter Zijlstra, Richard Kennedy,
	Hugh Dickins, Rik van Riel, LKML, Linux Memory Management List,
	linux-fsdevel

On Sat, Apr 16, 2011 at 06:13:14AM +0800, Jan Kara wrote:
> On Fri 15-04-11 22:37:11, Wu Fengguang wrote:
> > On Fri, Apr 15, 2011 at 11:43:00AM +0800, Wu Fengguang wrote:
> > > On Fri, Apr 15, 2011 at 02:16:09AM +0800, Jan Kara wrote:
> > > > On Thu 14-04-11 23:14:25, Wu Fengguang wrote:
> > > > > On Thu, Apr 14, 2011 at 08:23:02AM +0800, Wu Fengguang wrote:
> > > > > > On Thu, Apr 14, 2011 at 07:52:11AM +0800, Dave Chinner wrote:
> > > > > > > On Thu, Apr 14, 2011 at 07:31:22AM +0800, Wu Fengguang wrote:
> > > > > > > > On Thu, Apr 14, 2011 at 06:04:44AM +0800, Jan Kara wrote:
> > > > > > > > > On Wed 13-04-11 16:59:41, Wu Fengguang wrote:
> > > > > > > > > > Reduce the dampening for the control system, yielding faster
> > > > > > > > > > convergence. The change is a bit conservative, as smaller values may
> > > > > > > > > > lead to noticeable bdi threshold fluctuates in low memory JBOD setup.
> > > > > > > > > > 
> > > > > > > > > > CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > > > > > > > > > CC: Richard Kennedy <richard@rsk.demon.co.uk>
> > > > > > > > > > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > > > > > > > >   Well, I have nothing against this change as such but what I don't like is
> > > > > > > > > that it just changes magical +2 for similarly magical +0. It's clear that
> > > > > > > > 
> > > > > > > > The patch tends to make the rampup time a bit more reasonable for
> > > > > > > > common desktops. From 100s to 25s (see below).
> > > > > > > > 
> > > > > > > > > this will lead to more rapid updates of proportions of bdi's share of
> > > > > > > > > writeback and thread's share of dirtying but why +0? Why not +1 or -1? So
> > > > > > > > 
> > > > > > > > Yes, it will especially be a problem on _small memory_ JBOD setups.
> > > > > > > > Richard actually has requested for a much radical change (decrease by
> > > > > > > > 6) but that looks too much.
> > > > > > > > 
> > > > > > > > My team has a 12-disk JBOD with only 6G memory. The memory is pretty
> > > > > > > > small as a server, but it's a real setup and serves well as the
> > > > > > > > reference minimal setup that Linux should be able to run well on.
> > > > > > > 
> > > > > > > FWIW, linux runs on a lot of low power NAS boxes with jbod and/or
> > > > > > > raid setups that have <= 1GB of RAM (many of them run XFS), so even
> > > > > > > your setup could be considered large by a significant fraction of
> > > > > > > the storage world. Hence you need to be careful of optimising for
> > > > > > > what you think is a "normal" server, because there simply isn't such
> > > > > > > a thing....
> > > > > > 
> > > > > > Good point! This patch is likely to hurt a loaded 1GB 4-disk NAS box...
> > > > > > I'll test the setup.
> > > > > 
> > > > > Just did a comparison of the IO-less patches' performance with and
> > > > > without this patch. I hardly notice any differences besides some more
> > > > > bdi goal fluctuations in the attached graphs. The write throughput is
> > > > > a bit large with this patch (80MB/s vs 76MB/s), however the delta is
> > > > > within the even larger stddev range (20MB/s).
> > > >   Thanks for the test but I cannot find out from the numbers you provided
> > > > how much did the per-bdi thresholds fluctuate in this low memory NAS case?
> > > > You can gather current bdi threshold from /sys/kernel/debug/bdi/<dev>/stats
> > > > so it shouldn't be hard to get the numbers...
> > > 
> > > Hi Jan, attached are your results w/o this patch. The "bdi goal" (gray
> > > line) is calculated as (bdi_thresh - bdi_thresh/8) and is fluctuating
> > > all over the place.. and average wkB/s is only 49MB/s..
> > 
> > I got the numbers for vanilla kernel: XFS can do 57MB/s and 63MB/s in
> > the two runs.  There are large fluctuations in the attached graphs, too.
>   Hmm, so the graphs from previous email are with longer "proportion
> period (without patch we discuss here)" and graphs from this email are
> with it?

All graphs for vanilla and your IO-less kernels are collected without
this patch.

I only showed in the previous email how my IO-less kernel works with and
without this patch; the conclusion is that it's not sensitive to it and
works fine in both cases.

> > To sum it up, for a 1GB mem, 4-disk JBOD setup, running 1 dd per
> > disk:
> > 
> > vanilla: 57MB/s, 63MB/s
> > Jan:     49MB/s, 103MB/s
> > Wu:      76MB/s, 80MB/s
> > 
> > The balance_dirty_pages-task-bw-jan.png and
> > balance_dirty_pages-pages-jan.png shows very unfair allocation of
> > dirty pages and throughput among the disks...
>   Fengguang, can we please stay on topic? It's good to know that throughput
> fluctuates so much with my patches (although not that surprising seeing the
> fluctuations of bdi limits) but for the sake of this patch throughput
> numbers with different balance_dirty_pages() implementations do not seem
> that interesting.  What is interesting (at least to me) is how this
> particular patch changes fluctuations of bdi thresholds (fractions) in
> vanilla kernel. In the graphs, I can see only bdi goal - that is the
> per-bdi threshold we have in balance_dirty_pages() am I right? And it is
> there for only a single device, right?

bdi_goal = bdi_thresh * 7/8, so they are close. By looking at the bdi
goal curve, you get a good idea of how bdi_thresh fluctuates over time.

balance_dirty_pages-pages-jan.png looks very much like the single-device
situation, because the bdi goal is so high! But that's exactly the
problem: the first bdi consumes most of the dirty pages quota and runs at
full speed, while the other bdis run mostly idle.  You can confirm
the imbalance in balance_dirty_pages-task-bw-jan.png and iostat.

Looks similar to the problem described here:

https://lkml.org/lkml/2010/12/5/6

> Anyway either with or without the patch, bdi thresholds are jumping rather
> wildly if I'm interpreting the graphs right. Hmm, which is not that surprising
> given that in ideal case we should have about 0.5s worth of writeback for
> each disk in the page cache. So with your patch the period for proportion
> estimation is also just about 0.5s worth of page writeback which is
> understandably susceptible to fluctuations. Thinking about it, the original
> period of 4*"dirty limit" on your machine is about 2.5 GB which is about
> 50s worth of writeback on that machine so it is in match with your
> observation that it takes ~100s for bdi threshold to climb up.
> 
> So what is a takeaway from this for me is that scaling the period
> with the dirty limit is not the right thing. If you'd have 4-times more
> memory, your choice of "dirty limit" as the period would be as bad as
> current 4*"dirty limit". What would seem like a better choice of period
> to me would be to have the period in an order of a few seconds worth of
> writeback. That would allow the bdi limit to scale up reasonably fast when
> new bdi starts to be used and still not make it fluctuate that much
> (hopefully).

Yes, it would be good to make it more bandwidth- and time-based.  I'll be
glad if you can improve the algorithm :)

Thanks,
Fengguang

> Looking at math in lib/proportions.c, nothing really fundamental requires
> that each period has the same length. So it shouldn't be hard to actually
> create proportions calculator that would have timer triggered periods -
> simply whenever the timer fires, we would declare a new period. The only
> things which would be broken by this are (t represents global counter of
> events):
> a) counting of periods as t/period_len - we would have to maintain global
> period counter but that's trivial
> b) trick that we don't do t=t/2 for each new period but rather use
> period_len/2+(t % (period_len/2)) when calculating fractions - again we
> would have to bite the bullet and divide the global counter when we declare
> new period but again it's not a big deal in our case.
> 
> Peter what do you think about this? Do you (or anyone else) think it makes
> sense?
> 
> 								Honza
> -- 
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/4] writeback: reduce per-bdi dirty threshold ramp up time
  2011-04-15 22:13                   ` Jan Kara
  2011-04-16  6:05                     ` Wu Fengguang
@ 2011-04-16  8:33                     ` Peter Zijlstra
  2011-04-16 14:21                       ` Wu Fengguang
  2011-04-18 14:59                       ` Jan Kara
  1 sibling, 2 replies; 29+ messages in thread
From: Peter Zijlstra @ 2011-04-16  8:33 UTC (permalink / raw)
  To: Jan Kara
  Cc: Wu Fengguang, Dave Chinner, Andrew Morton, Richard Kennedy,
	Hugh Dickins, Rik van Riel, LKML, Linux Memory Management List,
	linux-fsdevel

On Sat, 2011-04-16 at 00:13 +0200, Jan Kara wrote:
> 
> So what is a takeaway from this for me is that scaling the period
> with the dirty limit is not the right thing. If you'd have 4-times more
> memory, your choice of "dirty limit" as the period would be as bad as
> current 4*"dirty limit". What would seem like a better choice of period
> to me would be to have the period in an order of a few seconds worth of
> writeback. That would allow the bdi limit to scale up reasonably fast when
> new bdi starts to be used and still not make it fluctuate that much
> (hopefully).

No best would be to scale the period with the writeout bandwidth, but
lacking that the dirty limit had to do. Since we're counting pages, and
bandwidth is pages/second we'll end up with a time measure, exactly the
thing you wanted.
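
Spelling out the units (with c standing for an arbitrary constant of a few
seconds): period [pages] = c [s] * bandwidth [pages/s], so
period / bandwidth = c seconds worth of writeback, independent of how much
memory the machine has.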

> Looking at math in lib/proportions.c, nothing really fundamental requires
> that each period has the same length. So it shouldn't be hard to actually
> create proportions calculator that would have timer triggered periods -
> simply whenever the timer fires, we would declare a new period. The only
> things which would be broken by this are (t represents global counter of
> events):
> a) counting of periods as t/period_len - we would have to maintain global
> period counter but that's trivial
> b) trick that we don't do t=t/2 for each new period but rather use
> period_len/2+(t % (period_len/2)) when calculating fractions - again we
> would have to bite the bullet and divide the global counter when we declare
> new period but again it's not a big deal in our case.
> 
> Peter what do you think about this? Do you (or anyone else) think it makes
> sense? 

But if you don't have a fixed sized period, then how do you catch up on
fractions that haven't been updated for several periods? You cannot go
remember all the individual period lengths.

The whole trick to the proportion stuff is that its all O(1) regardless
of the number of contestants. There isn't a single loop that iterates
over all BDIs or tasks to update their cycle, that wouldn't have scaled.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/4] writeback: reduce per-bdi dirty threshold ramp up time
  2011-04-16  8:33                     ` Peter Zijlstra
@ 2011-04-16 14:21                       ` Wu Fengguang
  2011-04-17  2:11                         ` Wu Fengguang
  2011-04-18 14:59                       ` Jan Kara
  1 sibling, 1 reply; 29+ messages in thread
From: Wu Fengguang @ 2011-04-16 14:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jan Kara, Dave Chinner, Andrew Morton, Richard Kennedy,
	Hugh Dickins, Rik van Riel, LKML, Linux Memory Management List,
	linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 3533 bytes --]

On Sat, Apr 16, 2011 at 04:33:29PM +0800, Peter Zijlstra wrote:
> On Sat, 2011-04-16 at 00:13 +0200, Jan Kara wrote:
> > 
> > So what is a takeaway from this for me is that scaling the period
> > with the dirty limit is not the right thing. If you'd have 4-times more
> > memory, your choice of "dirty limit" as the period would be as bad as
> > current 4*"dirty limit". What would seem like a better choice of period
> > to me would be to have the period in an order of a few seconds worth of
> > writeback. That would allow the bdi limit to scale up reasonably fast when
> > new bdi starts to be used and still not make it fluctuate that much
> > (hopefully).
> 
> No best would be to scale the period with the writeout bandwidth, but
> lacking that the dirty limit had to do. Since we're counting pages, and
> bandwidth is pages/second we'll end up with a time measure, exactly the
> thing you wanted.

I owe you the patch :) Here is a tested one for doing the bandwidth
based scaling. It's based on the attached global writeout bandwidth
estimation.

I tried updating the shift on both rising and falling bandwidth; however,
that leads to a reset of the accumulated proportion values. So here the
shift will only be increased and never decreased.
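
For a rough feel of what the calc_period_shift() change below does (assuming
avg_write_bandwidth is kept in pages/s and sits at, say, ~25600, i.e. roughly
100 MB/s of 4 KiB pages): shift = 2 + ilog2(25600) = 16, which puts the
proportion period at 2^16 = 65536 page completions, about 256 MB or a couple
of seconds worth of writeback at that rate -- in the "few seconds" range
discussed above, and no longer scaled by memory size.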

Thanks,
Fengguang
---
Subject: writeback: scale dirty proportions period with writeout bandwidth
Date: Sat Apr 16 18:38:41 CST 2011

CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/page-writeback.c |   23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

--- linux-next.orig/mm/page-writeback.c	2011-04-16 21:02:24.000000000 +0800
+++ linux-next/mm/page-writeback.c	2011-04-16 21:04:08.000000000 +0800
@@ -121,20 +121,13 @@ static struct prop_descriptor vm_complet
 static struct prop_descriptor vm_dirties;
 
 /*
- * couple the period to the dirty_ratio:
+ * couple the period to global write throughput:
  *
- *   period/2 ~ roundup_pow_of_two(dirty limit)
+ *   period/2 ~ roundup_pow_of_two(write IO throughput)
  */
 static int calc_period_shift(void)
 {
-	unsigned long dirty_total;
-
-	if (vm_dirty_bytes)
-		dirty_total = vm_dirty_bytes / PAGE_SIZE;
-	else
-		dirty_total = (vm_dirty_ratio * determine_dirtyable_memory()) /
-				100;
-	return 2 + ilog2(dirty_total - 1);
+	return 2 + ilog2(default_backing_dev_info.avg_write_bandwidth);
 }
 
 /*
@@ -143,6 +136,13 @@ static int calc_period_shift(void)
 static void update_completion_period(void)
 {
 	int shift = calc_period_shift();
+
+	if (shift > PROP_MAX_SHIFT)
+		shift = PROP_MAX_SHIFT;
+
+	if (shift <= vm_completions.pg[0].shift)
+		return;
+
 	prop_change_shift(&vm_completions, shift);
 	prop_change_shift(&vm_dirties, shift);
 }
@@ -180,7 +180,6 @@ int dirty_ratio_handler(struct ctl_table
 
 	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
 	if (ret == 0 && write && vm_dirty_ratio != old_ratio) {
-		update_completion_period();
 		vm_dirty_bytes = 0;
 	}
 	return ret;
@@ -196,7 +195,6 @@ int dirty_bytes_handler(struct ctl_table
 
 	ret = proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
 	if (ret == 0 && write && vm_dirty_bytes != old_bytes) {
-		update_completion_period();
 		vm_dirty_ratio = 0;
 	}
 	return ret;
@@ -1026,6 +1024,7 @@ void bdi_update_bandwidth(struct backing
 						global_page_state(NR_WRITTEN));
 		gbdi->bw_time_stamp = now;
 		gbdi->written_stamp = global_page_state(NR_WRITTEN);
+		update_completion_period();
 	}
 	if (thresh) {
 		bdi_update_dirty_ratelimit(bdi, thresh, dirty,

[-- Attachment #2: writeback-global-write-bandwidth.patch --]
[-- Type: text/x-diff, Size: 1386 bytes --]

Subject: writeback: global writeback throughput
Date: Sat Apr 16 18:25:51 CST 2011


Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/page-writeback.c |   13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

--- linux-next.orig/mm/page-writeback.c	2011-04-16 18:26:54.000000000 +0800
+++ linux-next/mm/page-writeback.c	2011-04-16 20:13:21.000000000 +0800
@@ -994,6 +994,7 @@ void bdi_update_bandwidth(struct backing
 	unsigned long elapsed;
 	unsigned long dirtied;
 	unsigned long written;
+	struct backing_dev_info *gbdi = &default_backing_dev_info;
 
 	if (!spin_trylock(&dirty_lock))
 		return;
@@ -1016,11 +1017,15 @@ void bdi_update_bandwidth(struct backing
 	if (elapsed <= MAX_PAUSE)
 		goto unlock;
 
-	if (thresh &&
-	    now - default_backing_dev_info.bw_time_stamp >= MAX_PAUSE) {
+	if (thresh && now - gbdi->bw_time_stamp >= MAX_PAUSE) {
 		update_dirty_limit(thresh, dirty);
-		bdi_update_dirty_smooth(&default_backing_dev_info, dirty);
-		default_backing_dev_info.bw_time_stamp = now;
+		bdi_update_dirty_smooth(gbdi, dirty);
+		if (now - gbdi->bw_time_stamp < HZ + MAX_PAUSE)
+			__bdi_update_write_bandwidth(gbdi,
+						now - gbdi->bw_time_stamp,
+						global_page_state(NR_WRITTEN));
+		gbdi->bw_time_stamp = now;
+		gbdi->written_stamp = global_page_state(NR_WRITTEN);
 	}
 	if (thresh) {
 		bdi_update_dirty_ratelimit(bdi, thresh, dirty,

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/4] writeback: reduce per-bdi dirty threshold ramp up time
  2011-04-16 14:21                       ` Wu Fengguang
@ 2011-04-17  2:11                         ` Wu Fengguang
  0 siblings, 0 replies; 29+ messages in thread
From: Wu Fengguang @ 2011-04-17  2:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jan Kara, Dave Chinner, Andrew Morton, Richard Kennedy,
	Hugh Dickins, Rik van Riel, LKML, Linux Memory Management List,
	linux-fsdevel

On Sat, Apr 16, 2011 at 10:21:14PM +0800, Wu Fengguang wrote:
> On Sat, Apr 16, 2011 at 04:33:29PM +0800, Peter Zijlstra wrote:
> > On Sat, 2011-04-16 at 00:13 +0200, Jan Kara wrote:
> > > 
> > > So what is a takeaway from this for me is that scaling the period
> > > with the dirty limit is not the right thing. If you'd have 4-times more
> > > memory, your choice of "dirty limit" as the period would be as bad as
> > > current 4*"dirty limit". What would seem like a better choice of period
> > > to me would be to have the period in an order of a few seconds worth of
> > > writeback. That would allow the bdi limit to scale up reasonably fast when
> > > new bdi starts to be used and still not make it fluctuate that much
> > > (hopefully).
> > 
> > No best would be to scale the period with the writeout bandwidth, but
> > lacking that the dirty limit had to do. Since we're counting pages, and
> > bandwidth is pages/second we'll end up with a time measure, exactly the
> > thing you wanted.
> 
> I owe you the patch :) Here is a tested one for doing the bandwidth
> based scaling. It's based on the attached global writeout bandwidth
> estimation.
> 
> I tried updating the shift on both rising and falling bandwidth; however,
> that leads to a reset of the accumulated proportion values. So here the
> shift will only be increased and never decreased.

I cannot reproduce the issue now.  It may be that the bandwidth
estimation went wrong and produced tiny values at times in an earlier
patch, thus "resetting" the proportional values.

I'll carry the version below in future tests. In theory we could do
coarser tracking with

        if (abs(shift - vm_completions.pg[0].shift) <= 1)
                return;

But let's do it more diligently for now.

Thanks,
Fengguang
---
@@ -143,6 +136,13 @@ static int calc_period_shift(void)
 static void update_completion_period(void)
 {
 	int shift = calc_period_shift();
+
+	if (shift > PROP_MAX_SHIFT)
+		shift = PROP_MAX_SHIFT;
+
+	if (shift == vm_completions.pg[0].shift)
+		return;
+
 	prop_change_shift(&vm_completions, shift);
 	prop_change_shift(&vm_dirties, shift);
 }


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/4] writeback: reduce per-bdi dirty threshold ramp up time
  2011-04-16  8:33                     ` Peter Zijlstra
  2011-04-16 14:21                       ` Wu Fengguang
@ 2011-04-18 14:59                       ` Jan Kara
  2011-05-24 12:24                         ` Peter Zijlstra
  1 sibling, 1 reply; 29+ messages in thread
From: Jan Kara @ 2011-04-18 14:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jan Kara, Wu Fengguang, Dave Chinner, Andrew Morton,
	Richard Kennedy, Hugh Dickins, Rik van Riel, LKML,
	Linux Memory Management List, linux-fsdevel

On Sat 16-04-11 10:33:29, Peter Zijlstra wrote:
> On Sat, 2011-04-16 at 00:13 +0200, Jan Kara wrote:
> > 
> > So what is a takeaway from this for me is that scaling the period
> > with the dirty limit is not the right thing. If you'd have 4-times more
> > memory, your choice of "dirty limit" as the period would be as bad as
> > current 4*"dirty limit". What would seem like a better choice of period
> > to me would be to have the period in an order of a few seconds worth of
> > writeback. That would allow the bdi limit to scale up reasonably fast when
> > new bdi starts to be used and still not make it fluctuate that much
> > (hopefully).
> 
> No best would be to scale the period with the writeout bandwidth, but
> lacking that the dirty limit had to do. Since we're counting pages, and
> bandwidth is pages/second we'll end up with a time measure, exactly the
> thing you wanted.
  Yes, I was thinking about this as well. We could measure the throughput
but essentially it's a changing entity (dependent on the type of load and
possibly other things like network load for NFS, or other machines
accessing your NAS). So I'm not sure one constant value will work (esp.
because you have to measure it and you never know at which state you did
the measurement). And when you have changing values, you have to solve the
same problem as with time based periods - that's how I came to them.

> > Looking at math in lib/proportions.c, nothing really fundamental requires
> > that each period has the same length. So it shouldn't be hard to actually
> > create proportions calculator that would have timer triggered periods -
> > simply whenever the timer fires, we would declare a new period. The only
> > things which would be broken by this are (t represents global counter of
> > events):
> > a) counting of periods as t/period_len - we would have to maintain global
> > period counter but that's trivial
> > b) trick that we don't do t=t/2 for each new period but rather use
> > period_len/2+(t % (period_len/2)) when calculating fractions - again we
> > would have to bite the bullet and divide the global counter when we declare
> > new period but again it's not a big deal in our case.
> > 
> > Peter what do you think about this? Do you (or anyone else) think it makes
> > sense? 
> 
> But if you don't have a fixed sized period, then how do you catch up on
> fractions that haven't been updated for several periods? You cannot go
> remember all the individual period lengths.
  OK, I wrote the expressions down and the way I want to do it would get
different fractions than your original formula:

  Your formula is:
p(j)=\sum_i x_i(j)/(t_i*2^{i+1})
  where $i$ sums from 0 to \infty, x_i(j) is the number of events of type
$j$ in period $i$, $t_i$ is the total number of events in period $i$.

  I want to compute
l(j)=\sum_i x_i(j)/2^{i+1}
g=\sum_i t_i/2^{i+1}
  and
p(j)=l(j)/g

  Clearly, all these values can be computed in O(1). Now for t_i = t for every
i, the results of both formulas are the same (which is what made me make my
mistake). But when t_i differ, the results are different. I'd say that the
new formula also provides a meaningful notion of writeback share although
it's hard to quantify how far the computations will be in practice...
  
> The whole trick to the proportion stuff is that its all O(1) regardless
> of the number of contestants. There isn't a single loop that iterates
> over all BDIs or tasks to update their cycle, that wouldn't have scaled.
  Sure, I understand.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/4] writeback: reduce per-bdi dirty threshold ramp up time
  2011-04-18 14:59                       ` Jan Kara
@ 2011-05-24 12:24                         ` Peter Zijlstra
  2011-05-24 12:41                           ` Peter Zijlstra
  2011-06-09 23:58                           ` Jan Kara
  0 siblings, 2 replies; 29+ messages in thread
From: Peter Zijlstra @ 2011-05-24 12:24 UTC (permalink / raw)
  To: Jan Kara
  Cc: Wu Fengguang, Dave Chinner, Andrew Morton, Richard Kennedy,
	Hugh Dickins, Rik van Riel, LKML, Linux Memory Management List,
	linux-fsdevel

Sorry for the delay, life got interesting and then it slipped my mind.

On Mon, 2011-04-18 at 16:59 +0200, Jan Kara wrote:
>   Your formula is:
> p(j)=\sum_i x_i(j)/(t_i*2^{i+1})
>   where $i$ sums from 0 to \infty, x_i(j) is the number of events of type
> $j$ in period $i$, $t_i$ is the total number of events in period $i$.

Actually:

 p_j = \Sum_{i=0} (d/dt_i) * x_j / 2^(i+1)

[ discrete differential ]

Where x_j is the total number of events for the j-th element of the set
and t_i is the i-th last period.

Also, the 1/2^(i+1) factor ensures recent history counts heavier while
still maintaining a normalized distribution.

Furthermore, by measuring time in the same measure as the events we get:

 t = \Sum_i x_i

which yields that:

 p_j = x_j * {\Sum_i (d/dt_i)} * {\Sum 2^(-i-1)}
     = x_j * (1/t) * 1
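
(The last factor is just the geometric series \Sum_{i>=0} 2^-(i+1) = 1/2 + 1/4 + ... = 1.)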

Thus

 \Sum_j p_j = \Sum_j x_j / (\Sum_i x_i) = 1

>   I want to compute
> l(j)=\sum_i x_i(j)/2^{i+1}
> g=\sum_i t_i/2^{i+1}
>   and
> p(j)=l(j)/g

Which gives me:

 p_j = x_j * \Sum_i 1/t_i
     = x_j / t

Again, if we then measure t in the same events as x, such that:

 t = \Sum_i x_i

we again get:

 \Sum_j p_j = \Sum_j x_j / \Sum_i x_i = 1

However, if you start measuring t differently that breaks, and the
result is no longer normalized and thus not suitable as a proportion.

Furthermore, while x_j/t is an average, it does not have decaying
history, resulting in past behaviour always affecting current results.
The decaying history thing will ensure that past behaviour will slowly
be 'forgotten' so that when the media is used differently (seeky to
non-seeky workload transition) the slow writeout speed will be forgotten
and we'll end up at the high writeout speed corresponding to less seeks.
Your average will end up hovering in the middle of the slow and fast
modes.

>   Clearly, all these values can be computed in O(1).

True, but you get to keep x and t counts over all history, which could
lead to overflow scenarios (although switching to u64 should mitigate
that problem in our lifetime).

>  Now for t_i = t for every
> i, the results of both formulas are the same (which is what made me make my
> mistake).

I'm not actually seeing how the averages will be the same; as explained,
yours seems to never forget history.

>  But when t_i differ, the results are different.

From what I can tell, when you stop measuring t in the same events as x
everything comes down because then the sum of proportions isn't
normalized.

>  I'd say that the
> new formula also provides a meaningful notion of writeback share although
> it's hard to quantify how far the computations will be in practice...

s/far/fair/ ?


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/4] writeback: reduce per-bdi dirty threshold ramp up time
  2011-05-24 12:24                         ` Peter Zijlstra
@ 2011-05-24 12:41                           ` Peter Zijlstra
  2011-06-09 23:58                           ` Jan Kara
  1 sibling, 0 replies; 29+ messages in thread
From: Peter Zijlstra @ 2011-05-24 12:41 UTC (permalink / raw)
  To: Jan Kara
  Cc: Wu Fengguang, Dave Chinner, Andrew Morton, Richard Kennedy,
	Hugh Dickins, Rik van Riel, LKML, Linux Memory Management List,
	linux-fsdevel

On Tue, 2011-05-24 at 14:24 +0200, Peter Zijlstra wrote:
> Again, if we then measure t in the same events as x, such that:
> 
>  t = \Sum_i x_i

> However, if you start measuring t differently that breaks, and the
> result is no longer normalized and thus not suitable as a proportion.

Ah, I made a mistake there: your proposal would keep the above relation
true, but the discrete periods t_i wouldn't be uniform.

So disregard the non normalized criticism.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/4] writeback: reduce per-bdi dirty threshold ramp up time
  2011-05-24 12:24                         ` Peter Zijlstra
  2011-05-24 12:41                           ` Peter Zijlstra
@ 2011-06-09 23:58                           ` Jan Kara
  1 sibling, 0 replies; 29+ messages in thread
From: Jan Kara @ 2011-06-09 23:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jan Kara, Wu Fengguang, Dave Chinner, Andrew Morton,
	Richard Kennedy, Hugh Dickins, Rik van Riel, LKML,
	Linux Memory Management List, linux-fsdevel

On Tue 24-05-11 14:24:29, Peter Zijlstra wrote:
> Sorry for the delay, life got interesting and then it slipped my mind.
  And I missed your reply, so sorry for my delay as well :).

> On Mon, 2011-04-18 at 16:59 +0200, Jan Kara wrote:
> >   Your formula is:
> > p(j)=\sum_i x_i(j)/(t_i*2^{i+1})
> >   where $i$ sums from 0 to \infty, x_i(j) is the number of events of type
> > $j$ in period $i$, $t_i$ is the total number of events in period $i$.
> 
> Actually:
> 
>  p_j = \Sum_{i=0} (d/dt_i) * x_j / 2^(i+1)
> 
> [ discrete differential ]
> 
> Where x_j is the total number of events for the j-th element of the set
> and t_i is the i-th last period.
> 
> Also, the 1/2^(i+1) factor ensures recent history counts heavier while
> still maintaining a normalized distribution.
> 
> Furthermore, by measuring time in the same measure as the events we get:
> 
>  t = \Sum_i x_i
> 
> which yields that:
> 
>  p_j = x_j * {\Sum_i (d/dt_i)} * {\Sum 2^(-i-1)}
>      = x_j * (1/t) * 1
> 
> Thus
> 
>  \Sum_j p_j = \Sum_j x_j / (\Sum_i x_i) = 1
  Yup, I understand this.

> >   I want to compute
> > l(j)=\sum_i x_i(j)/2^{i+1}
> > g=\sum_i t_i/2^{i+1}
> >   and
> > p(j)=l(j)/g
> 
> Which gives me:
> 
>  p_j = x_j * \Sum_i 1/t_i
>      = x_j / t
  It cannot really be simplified like this - 2^{i+1} parts do not cancel
out in p(j). Let's write the formula in an iterative manner so that it
becomes clearer. The first step almost looks like the 2^{i+1} members can
cancel out (note that I use x_1 and t_1 instead of x_0 and t_0 so that I don't
have to renumber when going for the next step):
l'(j) = x_1/2 + l(j)/2
g' = t_1/2 + g/2
thus
p'(j) = l'(j) / g'
      = (x_1 + l(j))/2 / ((t_1 + g)/2)
      = (x_1 + l(j)) / (t_1+g)

But if you properly expand to the next step you'll get:
l''(j) = x_0/2 + l'(j)/2
       = x_0/2 + x_1/4 + l(j)/4
g'' = t_0/2 + g'/2
    = t_0/2 + t_1/4 + g/4
thus we only get:
p''(j) = l''(j)/g''
       = (x_0/2 + x_1/4 + l(j)/4) / (t_0/2 + t_1/4 + g/4)
       = (x_0 + x_1/2 + l(j)/2) / (t_0 + t_1/2 + g/2)

Hmm, I guess I should have written the formulas as

l(j) = \sum_i x_i(j)/2^i
g = \sum_i t_i/2^i

It is equivalent and less confusing for the iterative expression where
we get directly:

l'(j)=x_0+l(j)/2
g'=t_0+g/2

which directly shows what's going on.
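
To put made-up numbers on it: take t_0=100, x_0(j)=50 (recent period) and
t_1=10, x_1(j)=1 (older period), with no history before that. The decayed
formula gives p(j) = (50 + 1/2) / (100 + 10/2) = 50.5/105 ~= 0.48, leaning
towards the recent 50/100 share, whereas the plain average x(j)/t = 51/110
~= 0.46. So the weighting factors visibly do not cancel once the t_i differ.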

> Again, if we then measure t in the same events as x, such that:
> 
>  t = \Sum_i x_i
> 
> we again get:
> 
>  \Sum_j p_j = \Sum_j x_j / \Sum_i x_i = 1
> 
> However, if you start measuring t differently that breaks, and the
> result is no longer normalized and thus not suitable as a proportion.
  The normalization works with my formula as you noted in your next email
(I just expand it here for other readers):
\Sum_j p_j = \Sum_j l(j)/g
           = 1/g * \Sum_j \Sum_i x_i(j)/2^(i+1)
	   = 1/g * \Sum_i (1/2^(i+1) * \Sum_j x_i(j))
(*)        = 1/g * \Sum_i t_i/2^(i+1)
           = 1

(*) Here we use that t_i = \Sum_j x_i(j) because that's the definition of
t_i.

Note that exactly same equality holds when 2^(i+1) is replaced with 2^i in
g and l(j).

> Furthermore, while x_j/t is an average, it does not have decaying
> history, resulting in past behaviour always affecting current results.
> The decaying history thing will ensure that past behaviour will slowly
> be 'forgotten' so that when the media is used differently (seeky to
> non-seeky workload transition) the slow writeout speed will be forgotten
> and we'll end up at the high writeout speed corresponding to less seeks.
> Your average will end up hovering in the middle of the slow and fast
> modes.
  So this is the most disputable point of my formulas, I believe :). You are
right that if, for example, nothing happens during a time slice (i.e. t_0 =
0, x_0(j)=0), the proportions don't change (well, after some time rounding
starts to have effect but let's ignore that for now). Generally, if
previously t_i was big and then became small (system bandwidth lowered;
e.g. t_5=10000, t_4=10, t_3=20,...,), it will take roughly log_2(maximum
t_i/current t_i) time slices for the contribution of terms with t_i big 
to become comparable with the contribution of later terms with t_i small.
After this number of time slices, proportions will catch up with the change.
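
(With the example numbers, maximum t_i / current t_i = 10000/10, so
log_2(1000) ~= 10 time slices.)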

On the other hand, when t_i was small for some time and then becomes big,
the proportions will effectively reflect the current state. So when someone
starts writing to a device on an otherwise quiet system, the device
immediately gets a fraction close to 1.

I'm not sure how big a problem the above behavior is, or what the desirable
behavior would actually be...

> >   Clearly, all these values can be computed in O(1).
> 
> True, but you get to keep x and t counts over all history, which could
> lead to overflow scenarios (although switching to u64 should mitigate
> that problem in our lifetime).
  I think even 32-bit numbers might be fine. The numbers we need to keep are
on the order of the total maximum bandwidth of the system. If you plug maxbw
in instead of all x_i(j) and t_i, you'll get that l(j)=maxbw (or 2*maxbw if we
use 2^i in the formula) and similarly for g. So the math will work in
32 bits for a bandwidth on the order of TB per slice (where I expect a slice
to be something between 0.1 and 10 s). Reasonable given today's HW, although
probably we'll have to go to 64 bits soon, you are right.
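
(Rough check: a 32-bit counter holds ~4*10^9 page completions, i.e. on the
order of 16 TB worth of 4 KiB pages per slice before overflow, so the
TB-per-slice headroom above looks about right.)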

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2011-06-09 23:58 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20110413085937.981293444@intel.com>
2011-04-13  8:59 ` [PATCH 1/4] writeback: add bdi_dirty_limit() kernel-doc Wu Fengguang
2011-04-13 21:47   ` Jan Kara
2011-04-13  8:59 ` [PATCH 2/4] writeback: avoid duplicate balance_dirty_pages_ratelimited() calls Wu Fengguang
2011-04-13 21:53   ` Jan Kara
2011-04-14  0:30     ` Wu Fengguang
2011-04-14 10:20       ` Jan Kara
2011-04-13  8:59 ` [PATCH 3/4] writeback: skip balance_dirty_pages() for in-memory fs Wu Fengguang
2011-04-13 21:54   ` Jan Kara
2011-04-13  8:59 ` [PATCH 4/4] writeback: reduce per-bdi dirty threshold ramp up time Wu Fengguang
2011-04-13 22:04   ` Jan Kara
2011-04-13 23:31     ` Wu Fengguang
2011-04-13 23:52       ` Dave Chinner
2011-04-14  0:23         ` Wu Fengguang
2011-04-14 10:36           ` Richard Kennedy
2011-04-14 13:49             ` Wu Fengguang
2011-04-14 14:08               ` Wu Fengguang
     [not found]           ` <20110414151424.GA367@localhost>
2011-04-14 15:56             ` Wu Fengguang
2011-04-14 18:16             ` Jan Kara
2011-04-15  3:43               ` Wu Fengguang
     [not found]                 ` <20110415143711.GA17181@localhost>
2011-04-15 22:13                   ` Jan Kara
2011-04-16  6:05                     ` Wu Fengguang
2011-04-16  8:33                     ` Peter Zijlstra
2011-04-16 14:21                       ` Wu Fengguang
2011-04-17  2:11                         ` Wu Fengguang
2011-04-18 14:59                       ` Jan Kara
2011-05-24 12:24                         ` Peter Zijlstra
2011-05-24 12:41                           ` Peter Zijlstra
2011-06-09 23:58                           ` Jan Kara
2011-04-13 10:15 ` [PATCH 0/4] trivial writeback fixes Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).