From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752928Ab1GNBpS (ORCPT ); Wed, 13 Jul 2011 21:45:18 -0400
Received: from fgwmail5.fujitsu.co.jp ([192.51.44.35]:38826 "EHLO
	fgwmail5.fujitsu.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751548Ab1GNBpQ (ORCPT ); Wed, 13 Jul 2011 21:45:16 -0400
X-SecurityPolicyCheck-FJ: OK by FujitsuOutboundMailChecker v1.3.1
Date: Thu, 14 Jul 2011 10:38:01 +0900
From: KAMEZAWA Hiroyuki
To: Mel Gorman
Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig, Johannes Weiner,
	Wu Fengguang, Jan Kara, Rik van Riel, Minchan Kim
Subject: Re: [PATCH 1/5] mm: vmscan: Do not writeback filesystem pages in
	direct reclaim
Message-Id: <20110714103801.83e10fdb.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <1310567487-15367-2-git-send-email-mgorman@suse.de>
References: <1310567487-15367-1-git-send-email-mgorman@suse.de>
	<1310567487-15367-2-git-send-email-mgorman@suse.de>
Organization: FUJITSU Co. LTD.
X-Mailer: Sylpheed 3.1.1 (GTK+ 2.10.14; i686-pc-mingw32)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 13 Jul 2011 15:31:23 +0100
Mel Gorman wrote:

> From: Mel Gorman
>
> When kswapd is failing to keep zones above the min watermark, a process
> will enter direct reclaim in the same manner kswapd does. If a dirty
> page is encountered during the scan, this page is written to backing
> storage using mapping->writepage.
>
> This causes two problems. First, it can result in very deep call
> stacks, particularly if the target storage or filesystem are complex.
> Some filesystems ignore write requests from direct reclaim as a result.
> The second is that a single-page flush is inefficient in terms of IO.
> While there is an expectation that the elevator will merge requests,
> this does not always happen. Quoting Christoph Hellwig;
>
> 	The elevator has a relatively small window it can operate on,
> 	and can never fix up a bad large scale writeback pattern.
>
> This patch prevents direct reclaim writing back filesystem pages by
> checking if current is kswapd. Anonymous pages are still written to
> swap as there is not the equivalent of a flusher thread for anonymous
> pages. If the dirty pages cannot be written back, they are placed
> back on the LRU lists.
>
> Signed-off-by: Mel Gorman

Hm.

> ---
>  include/linux/mmzone.h |    1 +
>  mm/vmscan.c            |    9 +++++++++
>  mm/vmstat.c            |    1 +
>  3 files changed, 11 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 9f7c3eb..b70a0c0 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -100,6 +100,7 @@ enum zone_stat_item {
>  	NR_UNSTABLE_NFS,	/* NFS unstable pages */
>  	NR_BOUNCE,
>  	NR_VMSCAN_WRITE,
> +	NR_VMSCAN_WRITE_SKIP,
>  	NR_WRITEBACK_TEMP,	/* Writeback using temporary buffers */
>  	NR_ISOLATED_ANON,	/* Temporary isolated pages from anon lru */
>  	NR_ISOLATED_FILE,	/* Temporary isolated pages from file lru */
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 4f49535..2d3e5b6 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -825,6 +825,15 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>  		if (PageDirty(page)) {
>  			nr_dirty++;
>
> +			/*
> +			 * Only kswapd can writeback filesystem pages to
> +			 * avoid risk of stack overflow
> +			 */
> +			if (page_is_file_cache(page) && !current_is_kswapd()) {
> +				inc_zone_page_state(page, NR_VMSCAN_WRITE_SKIP);
> +				goto keep_locked;
> +			}
> +

This will cause tons of memcg OOM kills, because memcg reclaim has no help
from kswapd (for now).

Could you make this

	if (scanning_global_lru(sc) && page_is_file_cache(page) && !current_is_kswapd())
		...

Then... sorry, please keep the file system hook for a while.
I'll do the memcg dirty_ratio work myself if Greg does not post a new
version by next month. After that, we can remove scanning_global_lru(sc),
I think.

Thanks,
-Kame
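
For anyone reading this thread later, the condition suggested above can be modeled outside the kernel. The following is only a userspace sketch under stated assumptions: struct reclaim_ctx, struct page_model, should_skip_writeback() and self_check() are hypothetical names, and the boolean fields merely stand in for scanning_global_lru(sc), current_is_kswapd(), PageDirty(page) and page_is_file_cache(page). It is not the vmscan.c code, just the truth table of the proposed check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the proposed check; not kernel code. */
struct reclaim_ctx {
	bool global_lru; /* stands in for scanning_global_lru(sc) */
	bool is_kswapd;  /* stands in for current_is_kswapd() */
};

struct page_model {
	bool dirty;       /* stands in for PageDirty(page) */
	bool file_backed; /* stands in for page_is_file_cache(page) */
};

/*
 * Returns true when reclaim would skip ->writepage and keep the page on
 * the LRU. Writeback is skipped only for dirty file pages seen by global
 * direct reclaim; memcg reclaim (global_lru == false) still writes, since
 * it has no kswapd to fall back on.
 */
static bool should_skip_writeback(const struct reclaim_ctx *rc,
				  const struct page_model *pg)
{
	return pg->dirty && rc->global_lru && pg->file_backed && !rc->is_kswapd;
}

/* Walks the cases discussed in the thread. */
static void self_check(void)
{
	struct page_model file_pg = { .dirty = true, .file_backed = true };
	struct page_model anon_pg = { .dirty = true, .file_backed = false };
	struct reclaim_ctx global_direct = { .global_lru = true, .is_kswapd = false };
	struct reclaim_ctx global_kswapd = { .global_lru = true, .is_kswapd = true };
	struct reclaim_ctx memcg_direct = { .global_lru = false, .is_kswapd = false };

	assert(should_skip_writeback(&global_direct, &file_pg));  /* skipped */
	assert(!should_skip_writeback(&global_kswapd, &file_pg)); /* kswapd writes */
	assert(!should_skip_writeback(&memcg_direct, &file_pg));  /* memcg writes */
	assert(!should_skip_writeback(&global_direct, &anon_pg)); /* anon goes to swap */
}
```

Calling self_check() from a small main() exercises the four cases. The only difference from the check in the posted patch is the extra global_lru term, which is the part that could be dropped again once memcg dirty_ratio support exists.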