From: Wu Fengguang <fengguang.wu@intel.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>, Hugh Dickins <hughd@google.com>,
	Rik van Riel <riel@redhat.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Wu Fengguang <fengguang.wu@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: <linux-fsdevel@vger.kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Subject: [PATCH 11/15] writeback: skip balance_dirty_pages() for in-memory fs
Date: Wed, 08 Jun 2011 05:32:47 +0800
Message-ID: <20110607213854.764050631@intel.com>
In-Reply-To: 20110607213236.634026193@intel.com

[-- Attachment #1: writeback-trace-global-dirty-states-fix.patch --]
[-- Type: text/plain, Size: 2707 bytes --]

This avoids unnecessary checks and dirty throttling on tmpfs/ramfs.

Notes about the tmpfs/ramfs behavior changes:

For 2.6.36 and older kernels, tmpfs writes will sleep inside
balance_dirty_pages() as long as we are over the (dirty+background)/2
global throttle threshold.  This is because both the per-bdi dirty page
count and the per-bdi threshold are 0 for tmpfs/ramfs, so the first
comparison (0 >= 0) holds and this test always evaluates to TRUE:

                dirty_exceeded =
                        (bdi_nr_reclaimable + bdi_nr_writeback >= bdi_thresh)
                        || (nr_reclaimable + nr_writeback >= dirty_thresh);

For 2.6.37, a user complained that the logic did not allow setting
vm.dirty_ratio=0, so commit 4cbec4c8b9 changed the test to:

                dirty_exceeded =
                        (bdi_nr_reclaimable + bdi_nr_writeback > bdi_thresh)
                        || (nr_reclaimable + nr_writeback > dirty_thresh);

So 2.6.37 behaves differently for tmpfs/ramfs: it will never get
throttled unless the global dirty threshold is exceeded (which is very
unlikely; when it does happen, many tasks will be blocked).

I'd say the 2.6.36 behavior is very bad for tmpfs/ramfs: on a busy
writing server, tmpfs write()s may get livelocked! The "inadvertent"
throttling can hardly help any workload because of its "either no
throttling, or throttled to death" property.

So relative to 2.6.37, this patch should not bring any further
noticeable changes.

CC: Hugh Dickins <hughd@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/page-writeback.c |   10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

--- linux-next.orig/mm/page-writeback.c	2011-05-24 11:17:23.000000000 +0800
+++ linux-next/mm/page-writeback.c	2011-05-24 11:17:24.000000000 +0800
@@ -244,13 +244,8 @@ void task_dirty_inc(struct task_struct *
 static void bdi_writeout_fraction(struct backing_dev_info *bdi,
 		long *numerator, long *denominator)
 {
-	if (bdi_cap_writeback_dirty(bdi)) {
-		prop_fraction_percpu(&vm_completions, &bdi->completions,
+	prop_fraction_percpu(&vm_completions, &bdi->completions,
 				numerator, denominator);
-	} else {
-		*numerator = 0;
-		*denominator = 1;
-	}
 }
 
 static inline void task_dirties_fraction(struct task_struct *tsk,
@@ -495,6 +490,9 @@ static void balance_dirty_pages(struct a
 	bool dirty_exceeded = false;
 	struct backing_dev_info *bdi = mapping->backing_dev_info;
 
+	if (!bdi_cap_account_dirty(bdi))
+		return;
+
 	for (;;) {
 		struct writeback_control wbc = {
 			.sync_mode	= WB_SYNC_NONE,


