linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Wu Fengguang <fengguang.wu@intel.com>
To: <linux-fsdevel@vger.kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Wu Fengguang <fengguang.wu@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Andrea Righi <arighi@develer.com>
Cc: linux-mm <linux-mm@kvack.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Subject: [PATCH 07/10] writeback: dirty ratelimit - think time compensation
Date: Fri, 26 Aug 2011 19:38:20 +0800	[thread overview]
Message-ID: <20110826114619.531760091@intel.com> (raw)
In-Reply-To: 20110826113813.895522398@intel.com

[-- Attachment #1: think-time-compensation --]
[-- Type: text/plain, Size: 5222 bytes --]

Compensate the task's think time when computing the final pause time,
so that ->dirty_ratelimit can be executed accurately.

        think time := time spend outside of balance_dirty_pages()

In the rare case that the task slept longer than the 200ms period time
(result in negative pause time), the sleep time will be compensated in
the following periods, too, if it's less than 1 second.

Accumulated errors are carefully avoided as long as the max pause area
is not hitted.

Pseudo code:

        period = pages_dirtied / task_ratelimit;
        think = jiffies - dirty_paused_when;
        pause = period - think;

1) normal case: period > think

        pause = period - think
        dirty_paused_when = jiffies + pause
        nr_dirtied = 0

                             period time
              |===============================>|
                  think time      pause time
              |===============>|==============>|
        ------|----------------|---------------|------------------------
        dirty_paused_when   jiffies


2) no pause case: period <= think

        don't pause; reduce future pause time by:
        dirty_paused_when += period
        nr_dirtied = 0

                           period time
              |===============================>|
                                  think time
              |===================================================>|
        ------|--------------------------------+-------------------|----
        dirty_paused_when                                       jiffies

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 include/linux/sched.h |    1 +
 kernel/fork.c         |    1 +
 mm/page-writeback.c   |   34 +++++++++++++++++++++++++++++++---
 3 files changed, 33 insertions(+), 3 deletions(-)

--- linux-next.orig/include/linux/sched.h	2011-08-16 08:50:41.000000000 +0800
+++ linux-next/include/linux/sched.h	2011-08-16 08:54:13.000000000 +0800
@@ -1531,6 +1531,7 @@ struct task_struct {
 	 */
 	int nr_dirtied;
 	int nr_dirtied_pause;
+	unsigned long dirty_paused_when; /* start of a write-and-pause period */
 
 #ifdef CONFIG_LATENCYTOP
 	int latency_record_count;
--- linux-next.orig/mm/page-writeback.c	2011-08-16 08:54:08.000000000 +0800
+++ linux-next/mm/page-writeback.c	2011-08-16 08:54:13.000000000 +0800
@@ -913,6 +913,7 @@ static void balance_dirty_pages(struct a
 	unsigned long background_thresh;
 	unsigned long dirty_thresh;
 	unsigned long bdi_thresh;
+	long period;
 	long pause = 0;
 	bool dirty_exceeded = false;
 	unsigned long task_ratelimit;
@@ -922,6 +923,8 @@ static void balance_dirty_pages(struct a
 	unsigned long start_time = jiffies;
 
 	for (;;) {
+		unsigned long now = jiffies;
+
 		/*
 		 * Unstable writes are a feature of certain networked
 		 * filesystems (i.e. NFS) in which data may have been
@@ -940,8 +943,11 @@ static void balance_dirty_pages(struct a
 		 * when the bdi limits are ramping up.
 		 */
 		if (nr_dirty <= dirty_freerun_ceiling(dirty_thresh,
-						      background_thresh))
+						      background_thresh)) {
+			current->dirty_paused_when = now;
+			current->nr_dirtied = 0;
 			break;
+		}
 
 		bdi_thresh = bdi_dirty_limit(bdi, dirty_thresh);
 
@@ -979,18 +985,41 @@ static void balance_dirty_pages(struct a
 					       background_thresh, nr_dirty,
 					       bdi_thresh, bdi_dirty);
 		if (unlikely(pos_ratio == 0)) {
+			period = MAX_PAUSE;
 			pause = MAX_PAUSE;
 			goto pause;
 		}
 		task_ratelimit = (u64)base_rate *
 					pos_ratio >> RATELIMIT_CALC_SHIFT;
-		pause = (HZ * pages_dirtied) / (task_ratelimit | 1);
+		period = (HZ * pages_dirtied) / (task_ratelimit | 1);
+		pause = current->dirty_paused_when + period - now;
+		/*
+		 * For less than 1s think time (ext3/4 may block the dirtier
+		 * for up to 800ms from time to time on 1-HDD; so does xfs,
+		 * however at much less frequency), try to compensate it in
+		 * future periods by updating the virtual time; otherwise just
+		 * do a reset, as it may be a light dirtier.
+		 */
+		if (unlikely(pause <= 0)) {
+			if (pause < -HZ) {
+				current->dirty_paused_when = now;
+				current->nr_dirtied = 0;
+			} else if (period) {
+				current->dirty_paused_when += period;
+				current->nr_dirtied = 0;
+			}
+			pause = 1; /* avoid resetting nr_dirtied_pause below */
+			break;
+		}
 		pause = min(pause, (long)MAX_PAUSE);
 
 pause:
 		__set_current_state(TASK_UNINTERRUPTIBLE);
 		io_schedule_timeout(pause);
 
+		current->dirty_paused_when = now + pause;
+		current->nr_dirtied = 0;
+
 		dirty_thresh = hard_dirty_limit(dirty_thresh);
 		/*
 		 * max-pause area. If dirty exceeded but still within this
@@ -1017,7 +1046,6 @@ pause:
 	if (!dirty_exceeded && bdi->dirty_exceeded)
 		bdi->dirty_exceeded = 0;
 
-	current->nr_dirtied = 0;
 	current->nr_dirtied_pause = dirty_poll_interval(nr_dirty, dirty_thresh);
 
 	if (writeback_in_progress(bdi))
--- linux-next.orig/kernel/fork.c	2011-08-16 08:50:41.000000000 +0800
+++ linux-next/kernel/fork.c	2011-08-16 08:54:13.000000000 +0800
@@ -1303,6 +1303,7 @@ static struct task_struct *copy_process(
 
 	p->nr_dirtied = 0;
 	p->nr_dirtied_pause = 128 >> (PAGE_SHIFT - 10);
+	p->dirty_paused_when = 0;
 
 	/*
 	 * Ok, make it visible to the rest of the system.



  parent reply	other threads:[~2011-08-26 11:48 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-08-26 11:38 [PATCH 00/10] IO-less dirty throttling v10 Wu Fengguang
2011-08-26 11:38 ` [PATCH 01/10] writeback: account per-bdi accumulated dirtied pages Wu Fengguang
2011-08-26 11:38 ` [PATCH 02/10] writeback: dirty position control Wu Fengguang
2011-08-26 11:38 ` [PATCH 03/10] writeback: dirty rate control Wu Fengguang
2011-08-26 11:38 ` [PATCH 04/10] writeback: stabilize bdi->dirty_ratelimit Wu Fengguang
2011-08-26 11:38 ` [PATCH 05/10] writeback: per task dirty rate limit Wu Fengguang
2011-08-26 12:51   ` Peter Zijlstra
2011-08-26 12:59     ` Wu Fengguang
2011-08-26 11:38 ` [PATCH 06/10] writeback: IO-less balance_dirty_pages() Wu Fengguang
2011-08-26 11:38 ` Wu Fengguang [this message]
2011-08-26 12:05   ` [PATCH 07/10] writeback: dirty ratelimit - think time compensation Wu Fengguang
2011-08-26 11:38 ` [PATCH 08/10] writeback: trace balance_dirty_pages Wu Fengguang
2011-08-26 12:07   ` Wu Fengguang
2011-08-26 11:38 ` [PATCH 09/10] writeback: trace dirty_ratelimit Wu Fengguang
2011-08-26 11:38 ` [PATCH 10/10] trace task_io Wu Fengguang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110826114619.531760091@intel.com \
    --to=fengguang.wu@intel.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=linux-fsdevel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).