From: Mel Gorman <mel@csn.ul.ie>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	Linux Kernel List <linux-kernel@vger.kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Minchan Kim <minchan.kim@gmail.com>,
	Wu Fengguang <fengguang.wu@intel.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Subject: Re: [PATCH 8/8] writeback: Do not sleep on the congestion queue if there are no congested BDIs or if significant congestion is not being encountered in the current zone
Date: Mon, 20 Sep 2010 10:52:39 +0100
Message-ID: <20100920095239.GE1998@csn.ul.ie>
In-Reply-To: <20100916152810.cb074e9f.akpm@linux-foundation.org>

On Thu, Sep 16, 2010 at 03:28:10PM -0700, Andrew Morton wrote:
> On Wed, 15 Sep 2010 13:27:51 +0100
> Mel Gorman <mel@csn.ul.ie> wrote:
>
> > If wait_iff_congested() is called with no BDI congested, the function simply
> > calls cond_resched(). In the event there is significant writeback happening
> > in the zone that is being reclaimed, this can be a poor decision as reclaim
> > would succeed once writeback was completed. Without any backoff logic,
> > younger clean pages can be reclaimed resulting in more reclaim overall and
> > poor performance.
>
> This is because cond_resched() is a no-op,

Can be a no-op surely. There is an expectation that it will sometimes
schedule.

> and we skip around the
> under-writeback pages and go off and look further along the LRU for
> younger clean pages, yes?
>

Yes.

> > This patch tracks how many pages backed by a congested BDI were found during
> > scanning. If all the dirty pages encountered on a list isolated from the
> > LRU belong to a congested BDI, the zone is marked congested until the zone
> > reaches the high watermark.
>
> High watermark, or low watermark?
>

High watermark. The check is made by kswapd.

> The terms are rather ambiguous so let's avoid them. Maybe "full"
> watermark and "empty"?
>

Unfortunately they are ambiguous to me. I know what the high watermark
is but not what the full or empty watermarks are.

> >
> > ...
> >
> > @@ -706,6 +726,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> > 			goto keep;
> >
> > 		VM_BUG_ON(PageActive(page));
> > +		VM_BUG_ON(page_zone(page) != zone);
>
> ?
>

It should not be the case that pages from multiple zones exist on the list
passed to shrink_page_list(). Let's say someone broke that assumption in the
future, which one should be marked congested? No way to know, so let's catch
the bug if the assumption is ever broken.

> > 		sc->nr_scanned++;
> >
> >
> > ...
> >
> > @@ -903,6 +928,15 @@ keep_lumpy:
> > 		VM_BUG_ON(PageLRU(page) || PageUnevictable(page));
> > 	}
> >
> > +	/*
> > +	 * Tag a zone as congested if all the dirty pages encountered were
> > +	 * backed by a congested BDI. In this case, reclaimers should just
> > +	 * back off and wait for congestion to clear because further reclaim
> > +	 * will encounter the same problem
> > +	 */
> > +	if (nr_dirty == nr_congested)
> > +		zone_set_flag(zone, ZONE_CONGESTED);
>
> The implicit "100%" there is a magic number. hrm.
>

It is but any other value for that number would be very specific to a
workload or a machine. A sysctl would have to be maintained and I couldn't
convince myself that anyone could do something sensible with the value.

Rather than introducing a new tunable for this, I was toying with the idea
over the weekend of tracking the scanned/reclaimed ratio within the scan
control - possibly on a per-zone basis but more likely globally. When this
ratio drops below a given threshold, start increasing the time it backs off
for up to a maximum of HZ/10. There are a lot of details to iron out but it's
possibly a better long-term direction than adding a tunable for this implicit
magic number because it would be adaptive to what is happening for the
current workload.
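
For illustration only, here is a rough user-space sketch of the adaptive
backoff idea in the paragraph above. It is not code from the patch series.
struct reclaim_backoff, backoff_account(), backoff_update(), the 1-in-8
efficiency threshold and the doubling policy are all hypothetical, HZ is
fixed at 1000 for the sake of the example, and "ratio" is read here as pages
reclaimed relative to pages scanned. The only detail taken from the mail is
the HZ/10 cap.

```c
/* Assumption: a fixed tick rate; in the kernel HZ is a build-time constant. */
#define HZ 1000
/* Upper bound on the backoff, as suggested in the mail. */
#define MAX_BACKOFF (HZ / 10)
/* Hypothetical threshold: back off when fewer than 1 in 8 scanned pages
 * were reclaimed. */
#define EFFICIENCY_SHIFT 3

/* Hypothetical state; the mail suggests keeping it in the scan control. */
struct reclaim_backoff {
	unsigned long nr_scanned;	/* pages scanned since last update */
	unsigned long nr_reclaimed;	/* pages reclaimed since last update */
	unsigned long timeout;		/* current backoff in jiffies, 0 = none */
};

/* Record the result of one scan batch. */
static void backoff_account(struct reclaim_backoff *rb,
			    unsigned long scanned, unsigned long reclaimed)
{
	rb->nr_scanned += scanned;
	rb->nr_reclaimed += reclaimed;
}

/*
 * Recompute the backoff from the accumulated counters and reset them.
 * Inefficient reclaim doubles the timeout (starting at one jiffy) up to
 * MAX_BACKOFF; efficient reclaim clears it. Returns the new timeout.
 */
static unsigned long backoff_update(struct reclaim_backoff *rb)
{
	if (rb->nr_scanned == 0)
		return rb->timeout;	/* nothing new to judge */

	if (rb->nr_reclaimed < (rb->nr_scanned >> EFFICIENCY_SHIFT)) {
		rb->timeout = rb->timeout ? rb->timeout * 2 : 1;
		if (rb->timeout > MAX_BACKOFF)
			rb->timeout = MAX_BACKOFF;
	} else {
		rb->timeout = 0;
	}

	rb->nr_scanned = 0;
	rb->nr_reclaimed = 0;
	return rb->timeout;
}
```

In this sketch a reclaimer would call backoff_account() after each scan batch
and sleep for the number of jiffies returned by backoff_update(). Doubling
keeps the first waits short while the cap bounds the worst-case stall, but
both choices are placeholders for the details the mail says still need to be
ironed out.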

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab