All of lore.kernel.org
 help / color / mirror / Atom feed
From: Wu Fengguang <fengguang.wu@intel.com>
To: <linux-fsdevel@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Wu Fengguang <fengguang.wu@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Andrea Righi <arighi@develer.com>
Cc: linux-mm <linux-mm@kvack.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Subject: [PATCH 3/5] writeback: dirty rate control
Date: Sat, 06 Aug 2011 16:44:50 +0800	[thread overview]
Message-ID: <20110806094526.878435971@intel.com> (raw)
In-Reply-To: 20110806084447.388624428@intel.com

[-- Attachment #1: dirty-ratelimit --]
[-- Type: text/plain, Size: 6415 bytes --]

It's all about bdi->dirty_ratelimit, which aims to be (write_bw / N)
when there are N dd tasks.

On write() syscall, use bdi->dirty_ratelimit
============================================

    balance_dirty_pages(pages_dirtied)
    {
        pos_bw = bdi->dirty_ratelimit * bdi_position_ratio();
        pause = pages_dirtied / pos_bw;
        sleep(pause);
    }

On every 200ms, update bdi->dirty_ratelimit
===========================================

    bdi_update_dirty_ratelimit()
    {
        bw = bdi->dirty_ratelimit;
        ref_bw = bw * bdi_position_ratio() * write_bw / dirty_bw;
        if (dirty pages unbalanced)
             bdi->dirty_ratelimit = (bw * 3 + ref_bw) / 4;
    }

Estimation of balanced bdi->dirty_ratelimit
===========================================

When started N dd, throttle each dd at

         task_ratelimit = pos_bw (any non-zero initial value is OK)

After 200ms, we got

         dirty_bw = # of pages dirtied by app / 200ms
         write_bw = # of pages written to disk / 200ms

For aggressive dirtiers, the equality holds

         dirty_bw == N * task_ratelimit
                  == N * pos_bw                      	(1)

The balanced throttle bandwidth can be estimated by

         ref_bw = pos_bw * write_bw / dirty_bw       	(2)

>From (1) and (2), we get equality

         ref_bw == write_bw / N                      	(3)

If the N dd's are all throttled at ref_bw, the dirty/writeback rates
will match. So ref_bw is the balanced dirty rate.

In practice, the ref_bw calculated by (2) may fluctuate and have
estimation errors. So the bdi->dirty_ratelimit update policy is to
follow it only when both pos_bw and ref_bw point to the same direction
(indicating not only the dirty position has deviated from the global/bdi
setpoints, but also it's still departing away).

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 include/linux/backing-dev.h |    7 +++
 mm/backing-dev.c            |    1 
 mm/page-writeback.c         |   69 +++++++++++++++++++++++++++++++++-
 3 files changed, 75 insertions(+), 2 deletions(-)

--- linux-next.orig/include/linux/backing-dev.h	2011-08-05 18:05:36.000000000 +0800
+++ linux-next/include/linux/backing-dev.h	2011-08-05 18:05:36.000000000 +0800
@@ -75,10 +75,17 @@ struct backing_dev_info {
 	struct percpu_counter bdi_stat[NR_BDI_STAT_ITEMS];
 
 	unsigned long bw_time_stamp;	/* last time write bw is updated */
+	unsigned long dirtied_stamp;
 	unsigned long written_stamp;	/* pages written at bw_time_stamp */
 	unsigned long write_bandwidth;	/* the estimated write bandwidth */
 	unsigned long avg_write_bandwidth; /* further smoothed write bw */
 
+	/*
+	 * The base throttle bandwidth, re-calculated on every 200ms.
+	 * All the bdi tasks' dirty rate will be curbed under it.
+	 */
+	unsigned long dirty_ratelimit;
+
 	struct prop_local_percpu completions;
 	int dirty_exceeded;
 
--- linux-next.orig/mm/backing-dev.c	2011-08-05 18:05:36.000000000 +0800
+++ linux-next/mm/backing-dev.c	2011-08-05 18:05:36.000000000 +0800
@@ -674,6 +674,7 @@ int bdi_init(struct backing_dev_info *bd
 	bdi->bw_time_stamp = jiffies;
 	bdi->written_stamp = 0;
 
+	bdi->dirty_ratelimit = INIT_BW;
 	bdi->write_bandwidth = INIT_BW;
 	bdi->avg_write_bandwidth = INIT_BW;
 
--- linux-next.orig/mm/page-writeback.c	2011-08-05 18:05:36.000000000 +0800
+++ linux-next/mm/page-writeback.c	2011-08-06 09:08:35.000000000 +0800
@@ -736,6 +736,66 @@ static void global_update_bandwidth(unsi
 	spin_unlock(&dirty_lock);
 }
 
+/*
+ * Maintain bdi->dirty_ratelimit, the base throttle bandwidth.
+ *
+ * Normal bdi tasks will be curbed at or below it in long term.
+ * Obviously it should be around (write_bw / N) when there are N dd tasks.
+ */
+static void bdi_update_dirty_ratelimit(struct backing_dev_info *bdi,
+				       unsigned long thresh,
+				       unsigned long dirty,
+				       unsigned long bdi_thresh,
+				       unsigned long bdi_dirty,
+				       unsigned long dirtied,
+				       unsigned long elapsed)
+{
+	unsigned long bw = bdi->dirty_ratelimit;
+	unsigned long dirty_bw;
+	unsigned long pos_bw;
+	unsigned long ref_bw;
+	unsigned long long pos_ratio;
+
+	/*
+	 * The dirty rate will match the writeback rate in long term, except
+	 * when dirty pages are truncated by userspace or re-dirtied by FS.
+	 */
+	dirty_bw = (dirtied - bdi->dirtied_stamp) * HZ / elapsed;
+
+	pos_ratio = bdi_position_ratio(bdi, thresh, dirty,
+				       bdi_thresh, bdi_dirty);
+	/*
+	 * pos_bw reflects each dd's dirty rate enforced for the past 200ms.
+	 */
+	pos_bw = bw * pos_ratio >> BANDWIDTH_CALC_SHIFT;
+	pos_bw++;  /* this avoids bdi->dirty_ratelimit get stuck in 0 */
+
+	/*
+	 * ref_bw = pos_bw * write_bw / dirty_bw
+	 *
+	 * It's a linear estimation of the "balanced" throttle bandwidth.
+	 */
+	pos_ratio *= bdi->avg_write_bandwidth;
+	do_div(pos_ratio, dirty_bw | 1);
+	ref_bw = bw * pos_ratio >> BANDWIDTH_CALC_SHIFT;
+
+	/*
+	 * dirty_ratelimit will follow ref_bw/pos_bw conservatively iff they
+	 * are on the same side of dirty_ratelimit. Which not only makes it
+	 * more stable, but also is essential for preventing it being driven
+	 * away by possible systematic errors in ref_bw.
+	 */
+	if (pos_bw < bw) {
+		if (ref_bw < bw)
+			bw = max(ref_bw, pos_bw);
+	} else {
+		if (ref_bw > bw)
+			bw = min(ref_bw, pos_bw);
+	}
+
+	bdi->dirty_ratelimit = bw;
+}
+
 void __bdi_update_bandwidth(struct backing_dev_info *bdi,
 			    unsigned long thresh,
 			    unsigned long dirty,
@@ -745,6 +805,7 @@ void __bdi_update_bandwidth(struct backi
 {
 	unsigned long now = jiffies;
 	unsigned long elapsed = now - bdi->bw_time_stamp;
+	unsigned long dirtied;
 	unsigned long written;
 
 	/*
@@ -753,6 +814,7 @@ void __bdi_update_bandwidth(struct backi
 	if (elapsed < BANDWIDTH_INTERVAL)
 		return;
 
+	dirtied = percpu_counter_read(&bdi->bdi_stat[BDI_DIRTIED]);
 	written = percpu_counter_read(&bdi->bdi_stat[BDI_WRITTEN]);
 
 	/*
@@ -762,12 +824,15 @@ void __bdi_update_bandwidth(struct backi
 	if (elapsed > HZ && time_before(bdi->bw_time_stamp, start_time))
 		goto snapshot;
 
-	if (thresh)
+	if (thresh) {
 		global_update_bandwidth(thresh, dirty, now);
-
+		bdi_update_dirty_ratelimit(bdi, thresh, dirty, bdi_thresh,
+					   bdi_dirty, dirtied, elapsed);
+	}
 	bdi_update_write_bandwidth(bdi, elapsed, written);
 
 snapshot:
+	bdi->dirtied_stamp = dirtied;
 	bdi->written_stamp = written;
 	bdi->bw_time_stamp = now;
 }



WARNING: multiple messages have this Message-ID (diff)
From: Wu Fengguang <fengguang.wu@intel.com>
To: <linux-fsdevel@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Wu Fengguang <fengguang.wu@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Andrea Righi <arighi@develer.com>
Cc: linux-mm <linux-mm@kvack.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Subject: [PATCH 3/5] writeback: dirty rate control
Date: Sat, 06 Aug 2011 16:44:50 +0800	[thread overview]
Message-ID: <20110806094526.878435971@intel.com> (raw)
In-Reply-To: 20110806084447.388624428@intel.com

[-- Attachment #1: dirty-ratelimit --]
[-- Type: text/plain, Size: 6718 bytes --]

It's all about bdi->dirty_ratelimit, which aims to be (write_bw / N)
when there are N dd tasks.

On write() syscall, use bdi->dirty_ratelimit
============================================

    balance_dirty_pages(pages_dirtied)
    {
        pos_bw = bdi->dirty_ratelimit * bdi_position_ratio();
        pause = pages_dirtied / pos_bw;
        sleep(pause);
    }

On every 200ms, update bdi->dirty_ratelimit
===========================================

    bdi_update_dirty_ratelimit()
    {
        bw = bdi->dirty_ratelimit;
        ref_bw = bw * bdi_position_ratio() * write_bw / dirty_bw;
        if (dirty pages unbalanced)
             bdi->dirty_ratelimit = (bw * 3 + ref_bw) / 4;
    }

Estimation of balanced bdi->dirty_ratelimit
===========================================

When started N dd, throttle each dd at

         task_ratelimit = pos_bw (any non-zero initial value is OK)

After 200ms, we got

         dirty_bw = # of pages dirtied by app / 200ms
         write_bw = # of pages written to disk / 200ms

For aggressive dirtiers, the equality holds

         dirty_bw == N * task_ratelimit
                  == N * pos_bw                      	(1)

The balanced throttle bandwidth can be estimated by

         ref_bw = pos_bw * write_bw / dirty_bw       	(2)

>From (1) and (2), we get equality

         ref_bw == write_bw / N                      	(3)

If the N dd's are all throttled at ref_bw, the dirty/writeback rates
will match. So ref_bw is the balanced dirty rate.

In practice, the ref_bw calculated by (2) may fluctuate and have
estimation errors. So the bdi->dirty_ratelimit update policy is to
follow it only when both pos_bw and ref_bw point to the same direction
(indicating not only the dirty position has deviated from the global/bdi
setpoints, but also it's still departing away).

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 include/linux/backing-dev.h |    7 +++
 mm/backing-dev.c            |    1 
 mm/page-writeback.c         |   69 +++++++++++++++++++++++++++++++++-
 3 files changed, 75 insertions(+), 2 deletions(-)

--- linux-next.orig/include/linux/backing-dev.h	2011-08-05 18:05:36.000000000 +0800
+++ linux-next/include/linux/backing-dev.h	2011-08-05 18:05:36.000000000 +0800
@@ -75,10 +75,17 @@ struct backing_dev_info {
 	struct percpu_counter bdi_stat[NR_BDI_STAT_ITEMS];
 
 	unsigned long bw_time_stamp;	/* last time write bw is updated */
+	unsigned long dirtied_stamp;
 	unsigned long written_stamp;	/* pages written at bw_time_stamp */
 	unsigned long write_bandwidth;	/* the estimated write bandwidth */
 	unsigned long avg_write_bandwidth; /* further smoothed write bw */
 
+	/*
+	 * The base throttle bandwidth, re-calculated on every 200ms.
+	 * All the bdi tasks' dirty rate will be curbed under it.
+	 */
+	unsigned long dirty_ratelimit;
+
 	struct prop_local_percpu completions;
 	int dirty_exceeded;
 
--- linux-next.orig/mm/backing-dev.c	2011-08-05 18:05:36.000000000 +0800
+++ linux-next/mm/backing-dev.c	2011-08-05 18:05:36.000000000 +0800
@@ -674,6 +674,7 @@ int bdi_init(struct backing_dev_info *bd
 	bdi->bw_time_stamp = jiffies;
 	bdi->written_stamp = 0;
 
+	bdi->dirty_ratelimit = INIT_BW;
 	bdi->write_bandwidth = INIT_BW;
 	bdi->avg_write_bandwidth = INIT_BW;
 
--- linux-next.orig/mm/page-writeback.c	2011-08-05 18:05:36.000000000 +0800
+++ linux-next/mm/page-writeback.c	2011-08-06 09:08:35.000000000 +0800
@@ -736,6 +736,66 @@ static void global_update_bandwidth(unsi
 	spin_unlock(&dirty_lock);
 }
 
+/*
+ * Maintain bdi->dirty_ratelimit, the base throttle bandwidth.
+ *
+ * Normal bdi tasks will be curbed at or below it in long term.
+ * Obviously it should be around (write_bw / N) when there are N dd tasks.
+ */
+static void bdi_update_dirty_ratelimit(struct backing_dev_info *bdi,
+				       unsigned long thresh,
+				       unsigned long dirty,
+				       unsigned long bdi_thresh,
+				       unsigned long bdi_dirty,
+				       unsigned long dirtied,
+				       unsigned long elapsed)
+{
+	unsigned long bw = bdi->dirty_ratelimit;
+	unsigned long dirty_bw;
+	unsigned long pos_bw;
+	unsigned long ref_bw;
+	unsigned long long pos_ratio;
+
+	/*
+	 * The dirty rate will match the writeback rate in long term, except
+	 * when dirty pages are truncated by userspace or re-dirtied by FS.
+	 */
+	dirty_bw = (dirtied - bdi->dirtied_stamp) * HZ / elapsed;
+
+	pos_ratio = bdi_position_ratio(bdi, thresh, dirty,
+				       bdi_thresh, bdi_dirty);
+	/*
+	 * pos_bw reflects each dd's dirty rate enforced for the past 200ms.
+	 */
+	pos_bw = bw * pos_ratio >> BANDWIDTH_CALC_SHIFT;
+	pos_bw++;  /* this avoids bdi->dirty_ratelimit get stuck in 0 */
+
+	/*
+	 * ref_bw = pos_bw * write_bw / dirty_bw
+	 *
+	 * It's a linear estimation of the "balanced" throttle bandwidth.
+	 */
+	pos_ratio *= bdi->avg_write_bandwidth;
+	do_div(pos_ratio, dirty_bw | 1);
+	ref_bw = bw * pos_ratio >> BANDWIDTH_CALC_SHIFT;
+
+	/*
+	 * dirty_ratelimit will follow ref_bw/pos_bw conservatively iff they
+	 * are on the same side of dirty_ratelimit. Which not only makes it
+	 * more stable, but also is essential for preventing it being driven
+	 * away by possible systematic errors in ref_bw.
+	 */
+	if (pos_bw < bw) {
+		if (ref_bw < bw)
+			bw = max(ref_bw, pos_bw);
+	} else {
+		if (ref_bw > bw)
+			bw = min(ref_bw, pos_bw);
+	}
+
+	bdi->dirty_ratelimit = bw;
+}
+
 void __bdi_update_bandwidth(struct backing_dev_info *bdi,
 			    unsigned long thresh,
 			    unsigned long dirty,
@@ -745,6 +805,7 @@ void __bdi_update_bandwidth(struct backi
 {
 	unsigned long now = jiffies;
 	unsigned long elapsed = now - bdi->bw_time_stamp;
+	unsigned long dirtied;
 	unsigned long written;
 
 	/*
@@ -753,6 +814,7 @@ void __bdi_update_bandwidth(struct backi
 	if (elapsed < BANDWIDTH_INTERVAL)
 		return;
 
+	dirtied = percpu_counter_read(&bdi->bdi_stat[BDI_DIRTIED]);
 	written = percpu_counter_read(&bdi->bdi_stat[BDI_WRITTEN]);
 
 	/*
@@ -762,12 +824,15 @@ void __bdi_update_bandwidth(struct backi
 	if (elapsed > HZ && time_before(bdi->bw_time_stamp, start_time))
 		goto snapshot;
 
-	if (thresh)
+	if (thresh) {
 		global_update_bandwidth(thresh, dirty, now);
-
+		bdi_update_dirty_ratelimit(bdi, thresh, dirty, bdi_thresh,
+					   bdi_dirty, dirtied, elapsed);
+	}
 	bdi_update_write_bandwidth(bdi, elapsed, written);
 
 snapshot:
+	bdi->dirtied_stamp = dirtied;
 	bdi->written_stamp = written;
 	bdi->bw_time_stamp = now;
 }


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Wu Fengguang <fengguang.wu@intel.com>
To: linux-fsdevel@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Wu Fengguang <fengguang.wu@intel.com>, Jan Kara <jack@suse.cz>,
	Christoph Hellwig <hch@lst.de>,
	Dave Chinner <david@fromorbit.com>,
	Greg Thelen <gthelen@google.com>,
	Minchan Kim <minchan.kim@gmail.com>,
	Vivek Goyal <vgoyal@redhat.com>,
	Andrea Righi <arighi@develer.com>, linux-mm <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: [PATCH 3/5] writeback: dirty rate control
Date: Sat, 06 Aug 2011 16:44:50 +0800	[thread overview]
Message-ID: <20110806094526.878435971@intel.com> (raw)
In-Reply-To: 20110806084447.388624428@intel.com

[-- Attachment #1: dirty-ratelimit --]
[-- Type: text/plain, Size: 6718 bytes --]

It's all about bdi->dirty_ratelimit, which aims to be (write_bw / N)
when there are N dd tasks.

On write() syscall, use bdi->dirty_ratelimit
============================================

    balance_dirty_pages(pages_dirtied)
    {
        pos_bw = bdi->dirty_ratelimit * bdi_position_ratio();
        pause = pages_dirtied / pos_bw;
        sleep(pause);
    }

On every 200ms, update bdi->dirty_ratelimit
===========================================

    bdi_update_dirty_ratelimit()
    {
        bw = bdi->dirty_ratelimit;
        ref_bw = bw * bdi_position_ratio() * write_bw / dirty_bw;
        if (dirty pages unbalanced)
             bdi->dirty_ratelimit = (bw * 3 + ref_bw) / 4;
    }

Estimation of balanced bdi->dirty_ratelimit
===========================================

When started N dd, throttle each dd at

         task_ratelimit = pos_bw (any non-zero initial value is OK)

After 200ms, we got

         dirty_bw = # of pages dirtied by app / 200ms
         write_bw = # of pages written to disk / 200ms

For aggressive dirtiers, the equality holds

         dirty_bw == N * task_ratelimit
                  == N * pos_bw                      	(1)

The balanced throttle bandwidth can be estimated by

         ref_bw = pos_bw * write_bw / dirty_bw       	(2)

>From (1) and (2), we get equality

         ref_bw == write_bw / N                      	(3)

If the N dd's are all throttled at ref_bw, the dirty/writeback rates
will match. So ref_bw is the balanced dirty rate.

In practice, the ref_bw calculated by (2) may fluctuate and have
estimation errors. So the bdi->dirty_ratelimit update policy is to
follow it only when both pos_bw and ref_bw point to the same direction
(indicating not only the dirty position has deviated from the global/bdi
setpoints, but also it's still departing away).

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 include/linux/backing-dev.h |    7 +++
 mm/backing-dev.c            |    1 
 mm/page-writeback.c         |   69 +++++++++++++++++++++++++++++++++-
 3 files changed, 75 insertions(+), 2 deletions(-)

--- linux-next.orig/include/linux/backing-dev.h	2011-08-05 18:05:36.000000000 +0800
+++ linux-next/include/linux/backing-dev.h	2011-08-05 18:05:36.000000000 +0800
@@ -75,10 +75,17 @@ struct backing_dev_info {
 	struct percpu_counter bdi_stat[NR_BDI_STAT_ITEMS];
 
 	unsigned long bw_time_stamp;	/* last time write bw is updated */
+	unsigned long dirtied_stamp;
 	unsigned long written_stamp;	/* pages written at bw_time_stamp */
 	unsigned long write_bandwidth;	/* the estimated write bandwidth */
 	unsigned long avg_write_bandwidth; /* further smoothed write bw */
 
+	/*
+	 * The base throttle bandwidth, re-calculated on every 200ms.
+	 * All the bdi tasks' dirty rate will be curbed under it.
+	 */
+	unsigned long dirty_ratelimit;
+
 	struct prop_local_percpu completions;
 	int dirty_exceeded;
 
--- linux-next.orig/mm/backing-dev.c	2011-08-05 18:05:36.000000000 +0800
+++ linux-next/mm/backing-dev.c	2011-08-05 18:05:36.000000000 +0800
@@ -674,6 +674,7 @@ int bdi_init(struct backing_dev_info *bd
 	bdi->bw_time_stamp = jiffies;
 	bdi->written_stamp = 0;
 
+	bdi->dirty_ratelimit = INIT_BW;
 	bdi->write_bandwidth = INIT_BW;
 	bdi->avg_write_bandwidth = INIT_BW;
 
--- linux-next.orig/mm/page-writeback.c	2011-08-05 18:05:36.000000000 +0800
+++ linux-next/mm/page-writeback.c	2011-08-06 09:08:35.000000000 +0800
@@ -736,6 +736,66 @@ static void global_update_bandwidth(unsi
 	spin_unlock(&dirty_lock);
 }
 
+/*
+ * Maintain bdi->dirty_ratelimit, the base throttle bandwidth.
+ *
+ * Normal bdi tasks will be curbed at or below it in long term.
+ * Obviously it should be around (write_bw / N) when there are N dd tasks.
+ */
+static void bdi_update_dirty_ratelimit(struct backing_dev_info *bdi,
+				       unsigned long thresh,
+				       unsigned long dirty,
+				       unsigned long bdi_thresh,
+				       unsigned long bdi_dirty,
+				       unsigned long dirtied,
+				       unsigned long elapsed)
+{
+	unsigned long bw = bdi->dirty_ratelimit;
+	unsigned long dirty_bw;
+	unsigned long pos_bw;
+	unsigned long ref_bw;
+	unsigned long long pos_ratio;
+
+	/*
+	 * The dirty rate will match the writeback rate in long term, except
+	 * when dirty pages are truncated by userspace or re-dirtied by FS.
+	 */
+	dirty_bw = (dirtied - bdi->dirtied_stamp) * HZ / elapsed;
+
+	pos_ratio = bdi_position_ratio(bdi, thresh, dirty,
+				       bdi_thresh, bdi_dirty);
+	/*
+	 * pos_bw reflects each dd's dirty rate enforced for the past 200ms.
+	 */
+	pos_bw = bw * pos_ratio >> BANDWIDTH_CALC_SHIFT;
+	pos_bw++;  /* this avoids bdi->dirty_ratelimit get stuck in 0 */
+
+	/*
+	 * ref_bw = pos_bw * write_bw / dirty_bw
+	 *
+	 * It's a linear estimation of the "balanced" throttle bandwidth.
+	 */
+	pos_ratio *= bdi->avg_write_bandwidth;
+	do_div(pos_ratio, dirty_bw | 1);
+	ref_bw = bw * pos_ratio >> BANDWIDTH_CALC_SHIFT;
+
+	/*
+	 * dirty_ratelimit will follow ref_bw/pos_bw conservatively iff they
+	 * are on the same side of dirty_ratelimit. Which not only makes it
+	 * more stable, but also is essential for preventing it being driven
+	 * away by possible systematic errors in ref_bw.
+	 */
+	if (pos_bw < bw) {
+		if (ref_bw < bw)
+			bw = max(ref_bw, pos_bw);
+	} else {
+		if (ref_bw > bw)
+			bw = min(ref_bw, pos_bw);
+	}
+
+	bdi->dirty_ratelimit = bw;
+}
+
 void __bdi_update_bandwidth(struct backing_dev_info *bdi,
 			    unsigned long thresh,
 			    unsigned long dirty,
@@ -745,6 +805,7 @@ void __bdi_update_bandwidth(struct backi
 {
 	unsigned long now = jiffies;
 	unsigned long elapsed = now - bdi->bw_time_stamp;
+	unsigned long dirtied;
 	unsigned long written;
 
 	/*
@@ -753,6 +814,7 @@ void __bdi_update_bandwidth(struct backi
 	if (elapsed < BANDWIDTH_INTERVAL)
 		return;
 
+	dirtied = percpu_counter_read(&bdi->bdi_stat[BDI_DIRTIED]);
 	written = percpu_counter_read(&bdi->bdi_stat[BDI_WRITTEN]);
 
 	/*
@@ -762,12 +824,15 @@ void __bdi_update_bandwidth(struct backi
 	if (elapsed > HZ && time_before(bdi->bw_time_stamp, start_time))
 		goto snapshot;
 
-	if (thresh)
+	if (thresh) {
 		global_update_bandwidth(thresh, dirty, now);
-
+		bdi_update_dirty_ratelimit(bdi, thresh, dirty, bdi_thresh,
+					   bdi_dirty, dirtied, elapsed);
+	}
 	bdi_update_write_bandwidth(bdi, elapsed, written);
 
 snapshot:
+	bdi->dirtied_stamp = dirtied;
 	bdi->written_stamp = written;
 	bdi->bw_time_stamp = now;
 }


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2011-08-06 12:21 UTC|newest]

Thread overview: 283+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-08-06  8:44 [PATCH 0/5] IO-less dirty throttling v8 Wu Fengguang
2011-08-06  8:44 ` Wu Fengguang
2011-08-06  8:44 ` Wu Fengguang
2011-08-06  8:44 ` [PATCH 1/5] writeback: account per-bdi accumulated dirtied pages Wu Fengguang
2011-08-06  8:44   ` Wu Fengguang
2011-08-06  8:44   ` Wu Fengguang
2011-08-06  8:44 ` [PATCH 2/5] writeback: dirty position control Wu Fengguang
2011-08-06  8:44   ` Wu Fengguang
2011-08-06  8:44   ` Wu Fengguang
2011-08-08 13:46   ` Peter Zijlstra
2011-08-08 13:46     ` Peter Zijlstra
2011-08-08 13:46     ` Peter Zijlstra
2011-08-08 14:11     ` Wu Fengguang
2011-08-08 14:11       ` Wu Fengguang
2011-08-08 14:31       ` Peter Zijlstra
2011-08-08 14:31         ` Peter Zijlstra
2011-08-08 14:31         ` Peter Zijlstra
2011-08-08 22:47         ` Wu Fengguang
2011-08-08 22:47           ` Wu Fengguang
2011-08-09  9:31           ` Peter Zijlstra
2011-08-09  9:31             ` Peter Zijlstra
2011-08-09  9:31             ` Peter Zijlstra
2011-08-10 12:28             ` Wu Fengguang
2011-08-10 12:28               ` Wu Fengguang
2011-08-08 14:41       ` Peter Zijlstra
2011-08-08 14:41         ` Peter Zijlstra
2011-08-08 14:41         ` Peter Zijlstra
2011-08-08 23:05         ` Wu Fengguang
2011-08-08 23:05           ` Wu Fengguang
2011-08-09 10:32           ` Peter Zijlstra
2011-08-09 10:32             ` Peter Zijlstra
2011-08-09 10:32             ` Peter Zijlstra
2011-08-09 17:20           ` Peter Zijlstra
2011-08-09 17:20             ` Peter Zijlstra
2011-08-09 17:20             ` Peter Zijlstra
2011-08-10 22:34             ` Jan Kara
2011-08-10 22:34               ` Jan Kara
2011-08-11  2:29               ` Wu Fengguang
2011-08-11  2:29                 ` Wu Fengguang
2011-08-11 11:14                 ` Jan Kara
2011-08-11 11:14                   ` Jan Kara
2011-08-16  8:35                   ` Wu Fengguang
2011-08-16  8:35                     ` Wu Fengguang
2011-08-12 13:19             ` Wu Fengguang
2011-08-12 13:19               ` Wu Fengguang
2011-08-10 21:40           ` Vivek Goyal
2011-08-10 21:40             ` Vivek Goyal
2011-08-16  8:55             ` Wu Fengguang
2011-08-16  8:55               ` Wu Fengguang
2011-08-11 22:56           ` Peter Zijlstra
2011-08-11 22:56             ` Peter Zijlstra
2011-08-11 22:56             ` Peter Zijlstra
2011-08-12  2:43             ` Wu Fengguang
2011-08-12  2:43               ` Wu Fengguang
2011-08-12  3:18               ` Wu Fengguang
2011-08-12  5:45               ` Wu Fengguang
2011-08-12  5:45                 ` Wu Fengguang
2011-08-12  9:45                 ` Peter Zijlstra
2011-08-12  9:45                   ` Peter Zijlstra
2011-08-12  9:45                   ` Peter Zijlstra
2011-08-12 11:07                   ` Wu Fengguang
2011-08-12 11:07                     ` Wu Fengguang
2011-08-12 12:17                     ` Peter Zijlstra
2011-08-12 12:17                       ` Peter Zijlstra
2011-08-12 12:17                       ` Peter Zijlstra
2011-08-12  9:47               ` Peter Zijlstra
2011-08-12  9:47                 ` Peter Zijlstra
2011-08-12  9:47                 ` Peter Zijlstra
2011-08-12 11:11                 ` Wu Fengguang
2011-08-12 11:11                   ` Wu Fengguang
2011-08-12 12:54           ` Peter Zijlstra
2011-08-12 12:54             ` Peter Zijlstra
2011-08-12 12:54             ` Peter Zijlstra
2011-08-12 12:59             ` Wu Fengguang
2011-08-12 12:59               ` Wu Fengguang
2011-08-12 13:08               ` Peter Zijlstra
2011-08-12 13:08                 ` Peter Zijlstra
2011-08-12 13:08                 ` Peter Zijlstra
2011-08-12 13:04           ` Peter Zijlstra
2011-08-12 13:04             ` Peter Zijlstra
2011-08-12 13:04             ` Peter Zijlstra
2011-08-12 14:20             ` Wu Fengguang
2011-08-12 14:20               ` Wu Fengguang
2011-08-22 15:38               ` Peter Zijlstra
2011-08-22 15:38                 ` Peter Zijlstra
2011-08-22 15:38                 ` Peter Zijlstra
2011-08-23  3:40                 ` Wu Fengguang
2011-08-23  3:40                   ` Wu Fengguang
2011-08-23 10:01                   ` Peter Zijlstra
2011-08-23 10:01                     ` Peter Zijlstra
2011-08-23 10:01                     ` Peter Zijlstra
2011-08-23 14:15                     ` Wu Fengguang
2011-08-23 14:15                       ` Wu Fengguang
2011-08-23 17:47                       ` Vivek Goyal
2011-08-23 17:47                         ` Vivek Goyal
2011-08-24  0:12                         ` Wu Fengguang
2011-08-24  0:12                           ` Wu Fengguang
2011-08-24 16:12                           ` Peter Zijlstra
2011-08-24 16:12                             ` Peter Zijlstra
2011-08-26  0:18                             ` Wu Fengguang
2011-08-26  0:18                               ` Wu Fengguang
2011-08-26  9:04                               ` Peter Zijlstra
2011-08-26  9:04                                 ` Peter Zijlstra
2011-08-26 10:04                                 ` Wu Fengguang
2011-08-26 10:04                                   ` Wu Fengguang
2011-08-26 10:42                                   ` Peter Zijlstra
2011-08-26 10:42                                     ` Peter Zijlstra
2011-08-26 10:52                                     ` Wu Fengguang
2011-08-26 10:52                                       ` Wu Fengguang
2011-08-26 11:26                                   ` Wu Fengguang
2011-08-26 12:11                                     ` Peter Zijlstra
2011-08-26 12:11                                       ` Peter Zijlstra
2011-08-26 12:20                                       ` Wu Fengguang
2011-08-26 12:20                                         ` Wu Fengguang
2011-08-26 13:13                                         ` Wu Fengguang
2011-08-26 13:18                                           ` Peter Zijlstra
2011-08-26 13:18                                             ` Peter Zijlstra
2011-08-26 13:24                                             ` Wu Fengguang
2011-08-26 13:24                                               ` Wu Fengguang
2011-08-24 18:00                           ` Vivek Goyal
2011-08-24 18:00                             ` Vivek Goyal
2011-08-25  3:19                             ` Wu Fengguang
2011-08-25  3:19                               ` Wu Fengguang
2011-08-25 22:20                               ` Vivek Goyal
2011-08-25 22:20                                 ` Vivek Goyal
2011-08-26  1:56                                 ` Wu Fengguang
2011-08-26  1:56                                   ` Wu Fengguang
2011-08-26  8:56                                   ` Peter Zijlstra
2011-08-26  8:56                                     ` Peter Zijlstra
2011-08-26  9:53                                     ` Wu Fengguang
2011-08-26  9:53                                       ` Wu Fengguang
2011-08-29 13:12                             ` Peter Zijlstra
2011-08-29 13:12                               ` Peter Zijlstra
2011-08-29 13:37                               ` Wu Fengguang
2011-08-29 13:37                                 ` Wu Fengguang
2011-09-02 12:16                                 ` Peter Zijlstra
2011-09-02 12:16                                   ` Peter Zijlstra
2011-09-06 12:40                                 ` Peter Zijlstra
2011-09-06 12:40                                   ` Peter Zijlstra
2011-08-24 15:57                       ` Peter Zijlstra
2011-08-24 15:57                         ` Peter Zijlstra
2011-08-24 15:57                         ` Peter Zijlstra
2011-08-25  5:30                         ` Wu Fengguang
2011-08-25  5:30                           ` Wu Fengguang
2011-08-23 14:36                     ` Vivek Goyal
2011-08-23 14:36                       ` Vivek Goyal
2011-08-09  2:08   ` Vivek Goyal
2011-08-09  2:08     ` Vivek Goyal
2011-08-16  8:59     ` Wu Fengguang
2011-08-16  8:59       ` Wu Fengguang
2011-08-06  8:44 ` Wu Fengguang [this message]
2011-08-06  8:44   ` [PATCH 3/5] writeback: dirty rate control Wu Fengguang
2011-08-06  8:44   ` Wu Fengguang
2011-08-09 14:54   ` Vivek Goyal
2011-08-09 14:54     ` Vivek Goyal
2011-08-11  3:42     ` Wu Fengguang
2011-08-11  3:42       ` Wu Fengguang
2011-08-09 14:57   ` Peter Zijlstra
2011-08-09 14:57     ` Peter Zijlstra
2011-08-09 14:57     ` Peter Zijlstra
2011-08-10 11:07     ` Wu Fengguang
2011-08-10 11:07       ` Wu Fengguang
2011-08-10 16:17       ` Peter Zijlstra
2011-08-10 16:17         ` Peter Zijlstra
2011-08-10 16:17         ` Peter Zijlstra
2011-08-15 14:08         ` Wu Fengguang
2011-08-15 14:08           ` Wu Fengguang
2011-08-09 15:50   ` Vivek Goyal
2011-08-09 15:50     ` Vivek Goyal
2011-08-09 16:16     ` Peter Zijlstra
2011-08-09 16:16       ` Peter Zijlstra
2011-08-09 16:16       ` Peter Zijlstra
2011-08-09 16:19       ` Peter Zijlstra
2011-08-09 16:19         ` Peter Zijlstra
2011-08-09 16:19         ` Peter Zijlstra
2011-08-10 14:07         ` Wu Fengguang
2011-08-10 14:07           ` Wu Fengguang
2011-08-10 14:00       ` Wu Fengguang
2011-08-10 14:00         ` Wu Fengguang
2011-08-10 17:10         ` Peter Zijlstra
2011-08-10 17:10           ` Peter Zijlstra
2011-08-15 14:11           ` Wu Fengguang
2011-08-15 14:11             ` Wu Fengguang
2011-08-09 16:56   ` Peter Zijlstra
2011-08-09 16:56     ` Peter Zijlstra
2011-08-09 16:56     ` Peter Zijlstra
2011-08-10 14:10     ` Wu Fengguang
2011-08-09 17:02   ` Peter Zijlstra
2011-08-09 17:02     ` Peter Zijlstra
2011-08-09 17:02     ` Peter Zijlstra
2011-08-10 14:15     ` Wu Fengguang
2011-08-10 14:15       ` Wu Fengguang
2011-08-06  8:44 ` [PATCH 4/5] writeback: per task dirty rate limit Wu Fengguang
2011-08-06  8:44   ` Wu Fengguang
2011-08-06  8:44   ` Wu Fengguang
2011-08-06 14:35   ` Andrea Righi
2011-08-06 14:35     ` Andrea Righi
2011-08-07  6:19     ` Wu Fengguang
2011-08-07  6:19       ` Wu Fengguang
2011-08-08 13:47   ` Peter Zijlstra
2011-08-08 13:47     ` Peter Zijlstra
2011-08-08 13:47     ` Peter Zijlstra
2011-08-08 14:21     ` Wu Fengguang
2011-08-08 14:21       ` Wu Fengguang
2011-08-08 23:32       ` Wu Fengguang
2011-08-08 23:32         ` Wu Fengguang
2011-08-08 14:23     ` Wu Fengguang
2011-08-08 14:23       ` Wu Fengguang
2011-08-08 14:26       ` Peter Zijlstra
2011-08-08 14:26         ` Peter Zijlstra
2011-08-08 14:26         ` Peter Zijlstra
2011-08-08 22:38         ` Wu Fengguang
2011-08-08 22:38           ` Wu Fengguang
2011-08-13 16:28       ` Andrea Righi
2011-08-13 16:28         ` Andrea Righi
2011-08-15 14:21         ` Wu Fengguang
2011-08-15 14:26           ` Andrea Righi
2011-08-15 14:26             ` Andrea Righi
2011-08-09 17:46   ` Vivek Goyal
2011-08-09 17:46     ` Vivek Goyal
2011-08-10  3:29     ` Wu Fengguang
2011-08-10  3:29       ` Wu Fengguang
2011-08-10 18:18       ` Vivek Goyal
2011-08-10 18:18         ` Vivek Goyal
2011-08-11  0:55         ` Wu Fengguang
2011-08-11  0:55           ` Wu Fengguang
2011-08-09 18:35   ` Peter Zijlstra
2011-08-09 18:35     ` Peter Zijlstra
2011-08-09 18:35     ` Peter Zijlstra
2011-08-10  3:40     ` Wu Fengguang
2011-08-10  3:40       ` Wu Fengguang
2011-08-10 10:25       ` Peter Zijlstra
2011-08-10 10:25         ` Peter Zijlstra
2011-08-10 10:25         ` Peter Zijlstra
2011-08-10 11:13         ` Wu Fengguang
2011-08-10 11:13           ` Wu Fengguang
2011-08-06  8:44 ` [PATCH 5/5] writeback: IO-less balance_dirty_pages() Wu Fengguang
2011-08-06  8:44   ` Wu Fengguang
2011-08-06  8:44   ` Wu Fengguang
2011-08-06 14:48   ` Andrea Righi
2011-08-06 14:48     ` Andrea Righi
2011-08-06 14:48     ` Andrea Righi
2011-08-07  6:44     ` Wu Fengguang
2011-08-07  6:44       ` Wu Fengguang
2011-08-07  6:44       ` Wu Fengguang
2011-08-06 16:46   ` Andrea Righi
2011-08-06 16:46     ` Andrea Righi
2011-08-07  7:18     ` Wu Fengguang
2011-08-07  9:50       ` Andrea Righi
2011-08-07  9:50         ` Andrea Righi
2011-08-09 18:15   ` Vivek Goyal
2011-08-09 18:15     ` Vivek Goyal
2011-08-09 18:41     ` Peter Zijlstra
2011-08-09 18:41       ` Peter Zijlstra
2011-08-09 18:41       ` Peter Zijlstra
2011-08-10  3:22       ` Wu Fengguang
2011-08-10  3:22         ` Wu Fengguang
2011-08-10  3:26     ` Wu Fengguang
2011-08-10  3:26       ` Wu Fengguang
2011-08-09 19:16   ` Vivek Goyal
2011-08-09 19:16     ` Vivek Goyal
2011-08-10  4:33     ` Wu Fengguang
2011-08-09  2:01 ` [PATCH 0/5] IO-less dirty throttling v8 Vivek Goyal
2011-08-09  2:01   ` Vivek Goyal
2011-08-09  5:55   ` Dave Chinner
2011-08-09  5:55     ` Dave Chinner
2011-08-09 14:04     ` Vivek Goyal
2011-08-09 14:04       ` Vivek Goyal
2011-08-10  7:41       ` Greg Thelen
2011-08-10  7:41         ` Greg Thelen
2011-08-10  7:41         ` Greg Thelen
2011-08-10 18:40         ` Vivek Goyal
2011-08-10 18:40           ` Vivek Goyal
2011-08-10 18:40           ` Vivek Goyal
2011-08-11  3:21   ` Wu Fengguang
2011-08-11  3:21     ` Wu Fengguang
2011-08-11 20:42     ` Vivek Goyal
2011-08-11 20:42       ` Vivek Goyal
2011-08-11 21:00       ` Vivek Goyal
2011-08-11 21:00         ` Vivek Goyal
2011-08-16  2:20 [PATCH 0/5] IO-less dirty throttling v9 Wu Fengguang
2011-08-16  2:20 ` [PATCH 3/5] writeback: dirty rate control Wu Fengguang
2011-08-16  2:20   ` Wu Fengguang
2011-08-16  2:20   ` Wu Fengguang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110806094526.878435971@intel.com \
    --to=fengguang.wu@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-fsdevel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.