From: Wu Fengguang <fengguang.wu@intel.com>
To: Mel Gorman <mel@csn.ul.ie>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	Linux Kernel List <linux-kernel@vger.kernel.org>,
	Rik van Riel <riel@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Minchan Kim <minchan.kim@gmail.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Dave Chinner <david@fromorbit.com>,
	Chris Mason <chris.mason@oracle.com>,
	Christoph Hellwig <hch@lst.de>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 09/10] vmscan: Do not writeback filesystem pages in direct reclaim
Date: Mon, 13 Sep 2010 22:33:01 +0800
Message-ID: <20100913143301.GB14158@localhost>
In-Reply-To: <20100913135510.GH23508@csn.ul.ie>

> > >		/* Check if we should synchronously wait for writeback */
> > >		if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
> >
> > It is possible to OOM if the LRU list is small and/or the storage is slow, so
> > that the flusher cannot clean enough pages before the LRU is fully scanned.
> >
>
> To go OOM, nr_reclaimed would have to be 0 and for that, the entire list
> would have to be dirty or unreclaimable. If that situation happens, is
> the dirty throttling not also broken?

My worry is that even if the dirty throttling limit were instantly set
to 0, it could still take a long time to knock down the number of dirty
pages. Think of 500MB of dirty pages waiting to be flushed to a slow
USB stick.

> > So we may need to do waits on dirty/writeback pages in *order-0*
> > direct reclaims, when priority goes rather low (such as < 3).
> >
>
> In case this is really necessary, the necessary stalling could be done by
> removing the check for lumpy reclaim in should_reclaim_stall(). What do
> you think of the following replacement?
I merely want to provide a guarantee, so it may be enough to add this:

	if (nr_freed == nr_taken)
		return false;

+	if (!priority)
+		return true;

This ensures the last full LRU scan will do the necessary waits to
prevent the OOM.

> /*
>  * Returns true if the caller should wait to clean dirty/writeback pages.
>  *
>  * If we are direct reclaiming for contiguous pages and we do not reclaim
>  * everything in the list, try again and wait for writeback IO to complete.
>  * This will stall high-order allocations noticeably. Only do that when really
>  * need to free the pages under high memory pressure.
>  *
>  * Alternatively, if priority is getting high, it may be because there are
>  * too many dirty pages on the LRU. Rather than returning nr_reclaimed == 0
>  * and potentially causing an OOM, we stall on writeback.
>  */
> static inline bool should_reclaim_stall(unsigned long nr_taken,
>					unsigned long nr_freed,
>					int priority,
>					struct scan_control *sc)
> {
>	int stall_priority;
>
>	/* kswapd should not stall on sync IO */
>	if (current_is_kswapd())
>		return false;
>
>	/* If we cannot writeback, there is no point stalling */
>	if (!sc->may_writepage)
>		return false;
>
>	/* If we have reclaimed everything on the isolated list, no stall */
>	if (nr_freed == nr_taken)
>		return false;
>
>	/*
>	 * For high-order allocations, there are two stall thresholds.
>	 * High-cost allocations stall immediately whereas lower
>	 * order allocations such as stacks require the scanning
>	 * priority to be much higher before stalling.
>	 */
>	if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
>		stall_priority = DEF_PRIORITY;
>	else
>		stall_priority = DEF_PRIORITY / 3;
>
>	return priority <= stall_priority;
> }

> > > +	int dirty_retry = MAX_SWAP_CLEAN_WAIT;
> > >	set_lumpy_reclaim_mode(priority, sc, true);
> > > -	nr_reclaimed += shrink_page_list(&page_list, sc);
> > > +
> > > +	while (nr_reclaimed < nr_taken && nr_dirty && dirty_retry--) {
> > > +		struct page *page, *tmp;
> > > +
> > > +		/* Take off the clean pages marked for activation */
> > > +		list_for_each_entry_safe(page, tmp, &page_list, lru) {
> > > +			if (PageDirty(page) || PageWriteback(page))
> > > +				continue;
> > > +
> > > +			list_del(&page->lru);
> > > +			list_add(&page->lru, &putback_list);
> > > +		}
> >
> > nitpick: I guess the above loop is optional code to avoid the overheads
> > of shrink_page_list() repeatedly going through some unfreeable pages?
>
> Pretty much, if they are to be activated, there is no point trying to reclaim
> them again. It's unnecessary overhead. A strong motivation for this
> series is to reduce overheads in the reclaim paths and unnecessary
> retrying of unfreeable pages.

We do so many waits in this loop that users will be upset by the iowait
stalls far more than by the CPU overheads. The best option is always to
avoid entering this loop in the first place; if we succeed at that, these
lines of optimization will be nothing but mind destroyers for newbie
developers.

> > Considering this is the slow code path, I'd prefer to keep the code
> > simple than to do such optimizations.
> >
> > > +		wakeup_flusher_threads(laptop_mode ? 0 : nr_dirty);
> >
> > how about
> >	if (!laptop_mode)
> >		wakeup_flusher_threads(nr_dirty);
>
> It's not the same thing. wakeup_flusher_threads(0) in laptop_mode is to
> clean all pages if some need dirtying. laptop_mode cleans all pages to
> minimise disk spinups.

Ah.. sure, that's fine.
I wonder if the flusher could be smarter and automatically extend the
number of pages to write in laptop mode. That could simplify some
callers.

Thanks,
Fengguang