From: Ying Han <yinghan@google.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Suleiman Souhlal <ssouhlal@freebsd.org>,
	Dave Chinner <david@fromorbit.com>, Mel Gorman <mel@csn.ul.ie>,
	Chris Mason <chris.mason@oracle.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, suleiman@google.com
Subject: Re: [PATCH 1/4] vmscan: delegate pageout io to flusher thread if current is kswapd
Date: Mon, 19 Apr 2010 19:56:27 -0700
Message-ID: <u2i604427e01004191956zc0a40b06t6689041af6156b78@mail.gmail.com>
In-Reply-To: <20100415103053.GA5336@cmpxchg.org>

On Thu, Apr 15, 2010 at 3:30 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> On Thu, Apr 15, 2010 at 05:26:27PM +0900, KOSAKI Motohiro wrote:
>> Cc to Johannes
>>
>> > >
>> > > On Apr 14, 2010, at 9:11 PM, KOSAKI Motohiro wrote:
>> > >
>> > > > Now, vmscan pageout() is one of IO throuput degression source.
>> > > > Some IO workload makes very much order-0 allocation and reclaim
>> > > > and pageout's 4K IOs are making annoying lots seeks.
>> > > >
>> > > > At least, kswapd can avoid such pageout() because kswapd don't
>> > > > need to consider OOM-Killer situation. that's no risk.
>> > > >
>> > > > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> > >
>> > > What's your opinion on trying to cluster the writes done by pageout,
>> > > instead of not doing any paging out in kswapd?
>> > > Something along these lines:
>> >
>> > Interesting.
>> > So, I'd like to review your patch carefully. can you please give me one
>> > day? :)
>>
>> Hannes, if my remember is correct, you tried similar swap-cluster IO
>> long time ago. now I can't remember why we didn't merged such patch.
>> Do you remember anything?
>
> Oh, quite vividly in fact :)  For a lot of swap loads the LRU order
> diverged heavily from swap slot order and readaround was a waste of
> time.
>
> Of course, the patch looked good, too, but it did not match reality
> that well.
>
> I guess 'how about this patch?' won't get us as far as 'how about
> those numbers/graphs of several real-life workloads?  oh and here
> is the patch...'.

Hannes,

We recently ran into this problem while running some experiments on an
ext4 filesystem. We hit a scenario where a process writing a large file,
or just opening a large file, with a limited memory allocation (using
containers) got OOM-killed. The memory assigned to the container is
reasonably large, and the OOM cannot be reproduced on ext2 with the
same configuration.

Later we figured this might be due to ext4's delayed block allocation.
Vmscan sends a single page to ext4->writepage(); ext4 then punts if the
block is delayed-allocated (DA) and re-dirties the page. The flusher
thread, on the other hand, uses ext4->writepages(), which does include
the block allocation.

We looked at the OOM log under ext4: all pages within the container were
on the inactive list and either Dirty or Writeback. Also, the zones were
all marked "all_unreclaimable", which indicates the reclaim path has
scanned the LRU quite a lot of times without making progress. If delayed
block allocation is what prevents pageout() from flushing dirty pages
and thereby triggers the OOMs, should we signal the fs to force
writeout of dirty pages under memory pressure?

--Ying

>
>> > > Cluster writes to disk due to memory pressure.
>> > >
>> > > Write out logically adjacent pages to the one we're paging out
>> > > so that we may get better IOs in these situations:
>> > > These pages are likely to be contiguous on disk to the one we're
>> > > writing out, so they should get merged into a single disk IO.
>> > >
>> > > Signed-off-by: Suleiman Souhlal <suleiman@google.com>
>
> For random IO, LRU order will have nothing to do with mapping/disk order.
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>
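
The writepage()/writepages() asymmetry described in the message above can
be sketched as a toy model. This is a minimal Python simulation, not kernel
or ext4 code: the Page class, the delayed_alloc flag, and the functions
writepage, writepages and reclaim_scan are hypothetical names chosen for
illustration, and the assumed behavior (the single-page path re-dirties a
delayed-allocation page without writing it, while the multi-page path
allocates blocks first) is only what the message reports.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical stand-in for a dirty page-cache page."""
    index: int
    dirty: bool = True
    delayed_alloc: bool = True  # assumption: no on-disk block allocated yet


def writepage(page: Page) -> bool:
    """Toy single-page path (what reclaim calls in this model).

    Assumed behavior: it punts on delayed-allocation pages, leaving them
    dirty, and reports that nothing was written.
    """
    if page.delayed_alloc:
        page.dirty = True  # re-dirty; no IO is issued
        return False
    page.dirty = False
    return True


def writepages(pages: list[Page]) -> int:
    """Toy multi-page path (what the flusher calls in this model).

    Assumed behavior: it allocates blocks and then writes every dirty page.
    Returns the number of pages written.
    """
    written = 0
    for page in pages:
        if page.dirty:
            page.delayed_alloc = False  # block allocation happens here
            page.dirty = False
            written += 1
    return written


def reclaim_scan(pages: list[Page], passes: int) -> int:
    """Scan the list `passes` times using only writepage().

    Returns how many pages were cleaned across all passes.
    """
    cleaned = 0
    for _ in range(passes):
        for page in pages:
            if page.dirty and writepage(page):
                cleaned += 1
    return cleaned


pages = [Page(i) for i in range(8)]
reclaimed = reclaim_scan(pages, passes=3)  # repeated scans, no progress
flushed = writepages(pages)                # one flusher pass cleans all
print(f"cleaned by reclaim: {reclaimed}, cleaned by flusher: {flushed}")
```

In this model, repeated reclaim scans clean nothing while a single
writepages() pass cleans every page, which matches the symptom in the
report: a list full of dirty pages and a reclaim path that scans without
making progress.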