From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754747Ab1GMObc (ORCPT ); Wed, 13 Jul 2011 10:31:32 -0400
Received: from cantor2.suse.de ([195.135.220.15]:54797 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1754320Ab1GMObb (ORCPT ); Wed, 13 Jul 2011 10:31:31 -0400
From: Mel Gorman
To: Linux-MM
Cc: LKML, XFS, Dave Chinner, Christoph Hellwig, Johannes Weiner,
	Wu Fengguang, Jan Kara, Rik van Riel, Minchan Kim, Mel Gorman
Subject: [PATCH 1/5] mm: vmscan: Do not writeback filesystem pages in direct reclaim
Date: Wed, 13 Jul 2011 15:31:23 +0100
Message-Id: <1310567487-15367-2-git-send-email-mgorman@suse.de>
X-Mailer: git-send-email 1.7.3.4
In-Reply-To: <1310567487-15367-1-git-send-email-mgorman@suse.de>
References: <1310567487-15367-1-git-send-email-mgorman@suse.de>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

From: Mel Gorman

When kswapd is failing to keep zones above the min watermark, a process
will enter direct reclaim in the same manner kswapd does. If a dirty
page is encountered during the scan, this page is written to backing
storage using mapping->writepage.

This causes two problems. First, it can result in very deep call
stacks, particularly if the target storage or filesystem are complex.
Some filesystems ignore write requests from direct reclaim as a result.
The second is that a single-page flush is inefficient in terms of IO.
While there is an expectation that the elevator will merge requests,
this does not always happen. Quoting Christoph Hellwig;

	The elevator has a relatively small window it can operate on,
	and can never fix up a bad large scale writeback pattern.

This patch prevents direct reclaim writing back filesystem pages by
checking if current is kswapd. Anonymous pages are still written to
swap as there is not the equivalent of a flusher thread for anonymous
pages. If the dirty pages cannot be written back, they are placed
back on the LRU lists.

Signed-off-by: Mel Gorman
---
 include/linux/mmzone.h |    1 +
 mm/vmscan.c            |    9 +++++++++
 mm/vmstat.c            |    1 +
 3 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 9f7c3eb..b70a0c0 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -100,6 +100,7 @@ enum zone_stat_item {
 	NR_UNSTABLE_NFS,	/* NFS unstable pages */
 	NR_BOUNCE,
 	NR_VMSCAN_WRITE,
+	NR_VMSCAN_WRITE_SKIP,
 	NR_WRITEBACK_TEMP,	/* Writeback using temporary buffers */
 	NR_ISOLATED_ANON,	/* Temporary isolated pages from anon lru */
 	NR_ISOLATED_FILE,	/* Temporary isolated pages from file lru */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4f49535..2d3e5b6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -825,6 +825,15 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		if (PageDirty(page)) {
 			nr_dirty++;
 
+			/*
+			 * Only kswapd can writeback filesystem pages to
+			 * avoid risk of stack overflow
+			 */
+			if (page_is_file_cache(page) && !current_is_kswapd()) {
+				inc_zone_page_state(page, NR_VMSCAN_WRITE_SKIP);
+				goto keep_locked;
+			}
+
 			if (references == PAGEREF_RECLAIM_CLEAN)
 				goto keep_locked;
 			if (!may_enter_fs)
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 20c18b7..fd109f3 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -702,6 +702,7 @@ const char * const vmstat_text[] = {
 	"nr_unstable",
 	"nr_bounce",
 	"nr_vmscan_write",
+	"nr_vmscan_write_skip",
 	"nr_writeback_temp",
 	"nr_isolated_anon",
 	"nr_isolated_file",
-- 
1.7.3.4
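
Illustrative note, not part of the patch: the new check in
shrink_page_list() can be modelled in userspace roughly as below. This is
a minimal sketch under stated assumptions. struct model_page,
enum model_action, model_write_skip and model_dirty_check() are
hypothetical names invented for illustration; they only mirror the roles
of PageDirty(), page_is_file_cache(), current_is_kswapd() and the
NR_VMSCAN_WRITE_SKIP counter, and none of them is kernel code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the page state the reclaim loop inspects. */
struct model_page {
	bool dirty;		/* models PageDirty(page) */
	bool file_backed;	/* models page_is_file_cache(page) */
};

/* Hypothetical outcome of the new check. */
enum model_action {
	MODEL_PROCEED,		/* fall through to the pre-existing checks */
	MODEL_KEEP_SKIP,	/* keep the page on the LRU, count a skip */
};

/* Hypothetical counter standing in for NR_VMSCAN_WRITE_SKIP. */
static unsigned long model_write_skip;

/*
 * Models only the hunk added to mm/vmscan.c: a dirty file-backed page
 * seen by a reclaimer that is not kswapd is not written back; it is
 * kept and the skip counter is incremented. Every other case proceeds
 * to the checks that existed before the patch.
 */
static enum model_action model_dirty_check(const struct model_page *page,
					   bool is_kswapd)
{
	if (page->dirty && page->file_backed && !is_kswapd) {
		model_write_skip++;
		return MODEL_KEEP_SKIP;
	}
	return MODEL_PROCEED;
}
```

In this model, calling model_dirty_check() with is_kswapd set to false
corresponds to direct reclaim. Dirty anonymous pages and any page seen by
kswapd return MODEL_PROCEED, which matches the description above that
anonymous pages are still written to swap.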