From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
To: Chris Mason <chris.mason@oracle.com>, Mel Gorman <mel@csn.ul.ie>,
Dave Chinner <david@fromorbit.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-fsdevel@vger.kernel.org
Cc: kosaki.motohiro@jp.fujitsu.com
Subject: Re: [PATCH] mm: disallow direct reclaim page writeback
Date: Wed, 14 Apr 2010 15:52:21 +0900 (JST) [thread overview]
Message-ID: <20100414155211.D14D.A69D9226@jp.fujitsu.com> (raw)
In-Reply-To: <20100413202021.GZ13327@think>
> On Tue, Apr 13, 2010 at 08:34:29PM +0100, Mel Gorman wrote:
> > > This problem is not a filesystem recursion problem which is, as I
> > > understand it, what GFP_NOFS is used to prevent. It's _any_ kernel
> > > code that uses signficant stack before trying to allocate memory
> > > that is the problem. e.g a select() system call:
> > >
> > > Depth Size Location (47 entries)
> > > ----- ---- --------
> > > 0) 7568 16 mempool_alloc_slab+0x16/0x20
> > > 1) 7552 144 mempool_alloc+0x65/0x140
> > > 2) 7408 96 get_request+0x124/0x370
> > > 3) 7312 144 get_request_wait+0x29/0x1b0
> > > 4) 7168 96 __make_request+0x9b/0x490
> > > 5) 7072 208 generic_make_request+0x3df/0x4d0
> > > 6) 6864 80 submit_bio+0x7c/0x100
> > > 7) 6784 96 _xfs_buf_ioapply+0x128/0x2c0 [xfs]
> > > ....
> > > 32) 3184 64 xfs_vm_writepage+0xab/0x160 [xfs]
> > > 33) 3120 384 shrink_page_list+0x65e/0x840
> > > 34) 2736 528 shrink_zone+0x63f/0xe10
> > > 35) 2208 112 do_try_to_free_pages+0xc2/0x3c0
> > > 36) 2096 128 try_to_free_pages+0x77/0x80
> > > 37) 1968 240 __alloc_pages_nodemask+0x3e4/0x710
> > > 38) 1728 48 alloc_pages_current+0x8c/0xe0
> > > 39) 1680 16 __get_free_pages+0xe/0x50
> > > 40) 1664 48 __pollwait+0xca/0x110
> > > 41) 1616 32 unix_poll+0x28/0xc0
> > > 42) 1584 16 sock_poll+0x1d/0x20
> > > 43) 1568 912 do_select+0x3d6/0x700
> > > 44) 656 416 core_sys_select+0x18c/0x2c0
> > > 45) 240 112 sys_select+0x4f/0x110
> > > 46) 128 128 system_call_fastpath+0x16/0x1b
> > >
> > > There's 1.6k of stack used before memory allocation is called, 3.1k
> > > used there before ->writepage is entered, XFS used 3.5k, and
> > > if the mempool needed to allocate a page it would have blown the
> > > stack. If there was any significant storage subsystem (add dm, md
> > > and/or scsi of some kind), it would have blown the stack.
> > >
> > > Basically, there is not enough stack space available to allow direct
> > > reclaim to enter ->writepage _anywhere_ according to the stack usage
> > > profiles we are seeing here....
> > >
> >
> > I'm not denying the evidence but how has it been gotten away with for years
> > then? Prevention of writeback isn't the answer without figuring out how
> > direct reclaimers can queue pages for IO and in the case of lumpy reclaim
> > doing sync IO, then waiting on those pages.
>
> So, I've been reading along, nodding my head to Dave's side of things
> because seeks are evil and direct reclaim makes seeks. I'd really loev
> for direct reclaim to somehow trigger writepages on large chunks instead
> of doing page by page spatters of IO to the drive.
>
> But, somewhere along the line I overlooked the part of Dave's stack trace
> that said:
>
> 43) 1568 912 do_select+0x3d6/0x700
>
> Huh, 912 bytes...for select, really? From poll.h:
>
> /* ~832 bytes of stack space used max in sys_select/sys_poll before allocating
> additional memory. */
> #define MAX_STACK_ALLOC 832
> #define FRONTEND_STACK_ALLOC 256
> #define SELECT_STACK_ALLOC FRONTEND_STACK_ALLOC
> #define POLL_STACK_ALLOC FRONTEND_STACK_ALLOC
> #define WQUEUES_STACK_ALLOC (MAX_STACK_ALLOC - FRONTEND_STACK_ALLOC)
> #define N_INLINE_POLL_ENTRIES (WQUEUES_STACK_ALLOC / sizeof(struct poll_table_entry))
>
> So, select is intentionally trying to use that much stack. It should be using
> GFP_NOFS if it really wants to suck down that much stack...if only the
> kernel had some sort of way to dynamically allocate ram, it could try
> that too.
Yeah, Of cource much. I would propse to revert 70674f95c0.
But I doubt GFP_NOFS solve our issue.
next prev parent reply other threads:[~2010-04-14 6:52 UTC|newest]
Thread overview: 115+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-04-13 0:17 [PATCH] mm: disallow direct reclaim page writeback Dave Chinner
2010-04-13 8:31 ` KOSAKI Motohiro
2010-04-13 10:29 ` Dave Chinner
2010-04-13 11:39 ` KOSAKI Motohiro
2010-04-13 14:36 ` Dave Chinner
2010-04-14 3:12 ` Dave Chinner
2010-04-14 6:52 ` KOSAKI Motohiro
2010-04-15 1:56 ` Dave Chinner
2010-04-14 6:52 ` KOSAKI Motohiro
2010-04-14 7:36 ` Dave Chinner
2010-04-13 9:58 ` Mel Gorman
2010-04-13 11:19 ` Dave Chinner
2010-04-13 19:34 ` Mel Gorman
2010-04-13 20:20 ` Chris Mason
2010-04-14 1:40 ` Dave Chinner
2010-04-14 4:59 ` KAMEZAWA Hiroyuki
2010-04-14 5:41 ` Dave Chinner
2010-04-14 5:54 ` KOSAKI Motohiro
2010-04-14 6:13 ` Minchan Kim
2010-04-14 7:19 ` Minchan Kim
2010-04-14 9:42 ` KAMEZAWA Hiroyuki
2010-04-14 10:01 ` Minchan Kim
2010-04-14 10:07 ` Mel Gorman
2010-04-14 10:16 ` Minchan Kim
2010-04-14 7:06 ` Dave Chinner
2010-04-14 6:52 ` KOSAKI Motohiro
2010-04-14 7:28 ` Dave Chinner
2010-04-14 8:51 ` Mel Gorman
2010-04-15 1:34 ` Dave Chinner
2010-04-15 4:09 ` KOSAKI Motohiro
2010-04-15 4:11 ` [PATCH 1/4] vmscan: delegate pageout io to flusher thread if current is kswapd KOSAKI Motohiro
2010-04-15 8:05 ` Suleiman Souhlal
2010-04-15 8:17 ` KOSAKI Motohiro
2010-04-15 8:26 ` KOSAKI Motohiro
2010-04-15 10:30 ` Johannes Weiner
2010-04-15 17:24 ` Suleiman Souhlal
2010-04-20 2:56 ` Ying Han
2010-04-15 9:32 ` Dave Chinner
2010-04-15 9:41 ` KOSAKI Motohiro
2010-04-15 17:27 ` Suleiman Souhlal
2010-04-15 23:33 ` Dave Chinner
2010-04-15 23:41 ` Suleiman Souhlal
2010-04-16 9:50 ` Alan Cox
2010-04-17 3:06 ` Dave Chinner
2010-04-15 8:18 ` KOSAKI Motohiro
2010-04-15 10:31 ` Mel Gorman
2010-04-15 11:26 ` KOSAKI Motohiro
2010-04-15 4:13 ` [PATCH 2/4] vmscan: kill prev_priority completely KOSAKI Motohiro
2010-04-15 4:14 ` [PATCH 3/4] vmscan: move priority variable into scan_control KOSAKI Motohiro
2010-04-15 4:15 ` [PATCH 4/4] vmscan: delegate page cleaning io to flusher thread if VM pressure is low KOSAKI Motohiro
2010-04-15 4:35 ` [PATCH] mm: disallow direct reclaim page writeback KOSAKI Motohiro
2010-04-15 6:32 ` Dave Chinner
2010-04-15 6:44 ` KOSAKI Motohiro
2010-04-15 6:58 ` Dave Chinner
2010-04-15 6:20 ` Dave Chinner
2010-04-15 6:35 ` KOSAKI Motohiro
2010-04-15 8:54 ` Dave Chinner
2010-04-15 10:21 ` KOSAKI Motohiro
2010-04-15 10:23 ` [PATCH 1/4] vmscan: simplify shrink_inactive_list() KOSAKI Motohiro
2010-04-15 13:15 ` Mel Gorman
2010-04-15 15:01 ` Andi Kleen
2010-04-15 15:44 ` Mel Gorman
2010-04-15 16:54 ` Andi Kleen
2010-04-15 23:40 ` Dave Chinner
2010-04-16 7:13 ` Andi Kleen
2010-04-16 14:57 ` Mel Gorman
2010-04-17 2:37 ` Dave Chinner
2010-04-16 14:55 ` Mel Gorman
2010-04-15 18:22 ` Valdis.Kletnieks
2010-04-16 9:39 ` Mel Gorman
2010-04-15 10:24 ` [PATCH 2/4] [cleanup] mm: introduce free_pages_prepare KOSAKI Motohiro
2010-04-15 13:33 ` Mel Gorman
2010-04-15 10:24 ` [PATCH 3/4] mm: introduce free_pages_bulk KOSAKI Motohiro
2010-04-15 13:46 ` Mel Gorman
2010-04-15 10:26 ` [PATCH 4/4] vmscan: replace the pagevec in shrink_inactive_list() with list KOSAKI Motohiro
2010-04-15 10:28 ` [PATCH] mm: disallow direct reclaim page writeback Mel Gorman
2010-04-15 13:42 ` Chris Mason
2010-04-15 17:50 ` tytso
2010-04-16 15:05 ` Mel Gorman
2010-04-19 15:15 ` Mel Gorman
2010-04-16 4:14 ` Dave Chinner
2010-04-16 15:14 ` Mel Gorman
2010-04-18 0:32 ` Andrew Morton
2010-04-18 19:05 ` Christoph Hellwig
2010-04-18 16:31 ` Andrew Morton
2010-04-18 19:35 ` Christoph Hellwig
2010-04-18 19:11 ` Sorin Faibish
2010-04-18 19:10 ` Sorin Faibish
2010-04-18 21:30 ` James Bottomley
2010-04-18 23:34 ` Sorin Faibish
2010-04-19 3:08 ` tytso
2010-04-19 0:35 ` Dave Chinner
2010-04-19 0:49 ` Arjan van de Ven
2010-04-19 1:08 ` Dave Chinner
2010-04-19 4:32 ` Arjan van de Ven
2010-04-19 15:20 ` Mel Gorman
2010-04-23 1:06 ` Dave Chinner
2010-04-23 10:50 ` Mel Gorman
2010-04-15 14:57 ` Andi Kleen
2010-04-15 2:37 ` Johannes Weiner
2010-04-15 2:43 ` KOSAKI Motohiro
2010-04-16 23:56 ` Johannes Weiner
2010-04-14 6:52 ` KOSAKI Motohiro [this message]
2010-04-14 10:06 ` Andi Kleen
2010-04-14 11:20 ` Chris Mason
2010-04-14 12:15 ` Andi Kleen
2010-04-14 12:32 ` Alan Cox
2010-04-14 12:34 ` Andi Kleen
2010-04-14 13:23 ` Mel Gorman
2010-04-14 14:07 ` Chris Mason
2010-04-14 0:24 ` Minchan Kim
2010-04-14 4:44 ` Dave Chinner
2010-04-14 7:54 ` Minchan Kim
2010-04-16 1:13 ` KAMEZAWA Hiroyuki
2010-04-16 4:18 ` KAMEZAWA Hiroyuki
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100414155211.D14D.A69D9226@jp.fujitsu.com \
--to=kosaki.motohiro@jp.fujitsu.com \
--cc=chris.mason@oracle.com \
--cc=david@fromorbit.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mel@csn.ul.ie \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).