linux-kernel.vger.kernel.org archive mirror
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
To: Dave Chinner <david@fromorbit.com>
Cc: kosaki.motohiro@jp.fujitsu.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	Chris Mason <chris.mason@oracle.com>
Subject: Re: [PATCH] mm: disallow direct reclaim page writeback
Date: Wed, 14 Apr 2010 15:52:10 +0900 (JST)	[thread overview]
Message-ID: <20100414155201.D14A.A69D9226@jp.fujitsu.com> (raw)
In-Reply-To: <20100413143659.GA2493@dastard>

> On Tue, Apr 13, 2010 at 08:39:29PM +0900, KOSAKI Motohiro wrote:
> > Hi
> > 
> > > > Pros:
> > > > 	1) prevent XFS stack overflow
> > > > 	2) improve io workload performance
> > > > 
> > > > Cons:
> > > > 	3) TOTALLY kill lumpy reclaim (i.e. high order allocation)
> > > > 
> > > > So, if we only need to consider IO workloads there is no
> > > > downside, but we can't consider only those.
> > > > 
> > > > I think (1) is an XFS issue; XFS should take care of it itself.
> > > 
> > > The filesystem is irrelevant, IMO.
> > > 
> > > The traces from the reporter showed that we've got close to a 2k
> > > stack footprint for memory allocation to direct reclaim and then we
> > > can put the entire writeback path on top of that. This is roughly
> > > 3.5k for XFS, and then depending on the storage subsystem
> > > configuration and transport can be another 2k of stack needed below
> > > XFS.
> > > 
> > > IOWs, if we completely ignore the filesystem stack usage, there's
> > > still up to 4k of stack needed in the direct reclaim path. Given
> > > that one of the stack traces supplied show direct reclaim being
> > > entered with over 3k of stack already used, pretty much any
> > > filesystem is capable of blowing an 8k stack.
> > > 
> > > So, this is not an XFS issue, even though XFS is the first to
> > > uncover it. Don't shoot the messenger....
> > 
> > Thanks for the explanation. I hadn't noticed that direct reclaim
> > consumes 2k of stack. I'll investigate it and try to slim it down.
> > But XFS's 3.5k stack consumption is too large as well; please put
> > it on a diet too.
> 
> It hasn't grown in the last 2 years since the last major diet, when
> all the fat was trimmed from it in the last round of the i386 4k
> stack vs XFS saga. It seems that everything else around XFS has
> grown in that time, and now we are blowing stacks again....

I have a dumb question: if XFS hasn't bloated its stack usage, how did
3.5k of stack usage work fine on a 4k-stack kernel? It seems impossible.

Please don't think I'm blaming you. I don't know what the "4k stack vs
XFS saga" is; I merely want to understand what you said.


> > > Hence I think that direct reclaim should be deferring to the
> > > background flusher threads for cleaning memory and not trying to be
> > > doing it itself.
> > 
> > Well, you seems continue to discuss io workload. I don't disagree
> > such point. 
> > 
> > example, If only order-0 reclaim skip pageout(), we will get the above
> > benefit too.
> 
> But it won't prevent stack blowups...
> 
> > > > but we never kill pageout() completely because we can't
> > > > assume users don't run high order allocation workload.
> > > 
> > > I think that lumpy reclaim will still work just fine.
> > > 
> > > Lumpy reclaim appears to be using IO as a method of slowing
> > > down the reclaim cycle - the congestion_wait() call will still
> > > function as it does now if the background flusher threads are active
> > > and causing congestion. I don't see why lumpy reclaim specifically
> > > needs to be issuing IO to make it work - if the congestion_wait() is
> > > not waiting long enough then wait longer - don't issue IO to extend
> > > the wait time.
> > 
> > Lumpy reclaim is for allocating high-order pages. It reclaims not
> > only the page at the LRU head, but also its PFN neighbourhood. The
> > PFN neighbours are often new pages that are still dirty, so we
> > force pageout to clean and discard them.
> 
> Ok, I see that now - I missed the second call to __isolate_lru_pages()
> in isolate_lru_pages().

No problem. It's one of the VM's messes; most developers don't know about it :-)



> > When a high-order allocation occurs, we not only need to free a
> > sufficient amount of memory, we also need to free a large enough
> > contiguous memory block.
> 
> Agreed, that was why I was kind of surprised not to find it was
> doing that. But, as you have pointed out, that was my mistake.
> 
> > If we need to consider _only_ io throughput, waiting flusher thread
> > might faster perhaps, but actually we also need to consider reclaim
> > latency. I'm worry about such point too.
> 
> True, but without knowing how to test and measure such things I can't
> really comment...

Agreed. I know that building a good VM measurement benchmark is very
difficult, but it is probably necessary....
I'm sorry, I can't give you a good, convenient benchmark right now.

> 
> > > Of course, the code is a maze of twisty passages, so I probably
> > > missed something important. Hopefully someone can tell me what. ;)
> > > 
> > > FWIW, the biggest problem here is that I have absolutely no clue on
> > > how to test what the impact on lumpy reclaim really is. Does anyone
> > > have a relatively simple test that can be run to determine what the
> > > impact is?
> > 
> > So, can you please run two workloads concurrently?
> >  - Normal IO workload (fio, iozone, etc..)
> >  - echo $NUM > /proc/sys/vm/nr_hugepages
> 
> What do I measure/observe/record that is meaningful?
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <dont@kvack.org>





Thread overview: 115+ messages
2010-04-13  0:17 [PATCH] mm: disallow direct reclaim page writeback Dave Chinner
2010-04-13  8:31 ` KOSAKI Motohiro
2010-04-13 10:29   ` Dave Chinner
2010-04-13 11:39     ` KOSAKI Motohiro
2010-04-13 14:36       ` Dave Chinner
2010-04-14  3:12         ` Dave Chinner
2010-04-14  6:52           ` KOSAKI Motohiro
2010-04-15  1:56             ` Dave Chinner
2010-04-14  6:52         ` KOSAKI Motohiro [this message]
2010-04-14  7:36           ` Dave Chinner
2010-04-13  9:58 ` Mel Gorman
2010-04-13 11:19   ` Dave Chinner
2010-04-13 19:34     ` Mel Gorman
2010-04-13 20:20       ` Chris Mason
2010-04-14  1:40         ` Dave Chinner
2010-04-14  4:59           ` KAMEZAWA Hiroyuki
2010-04-14  5:41             ` Dave Chinner
2010-04-14  5:54               ` KOSAKI Motohiro
2010-04-14  6:13                 ` Minchan Kim
2010-04-14  7:19                   ` Minchan Kim
2010-04-14  9:42                     ` KAMEZAWA Hiroyuki
2010-04-14 10:01                       ` Minchan Kim
2010-04-14 10:07                         ` Mel Gorman
2010-04-14 10:16                           ` Minchan Kim
2010-04-14  7:06                 ` Dave Chinner
2010-04-14  6:52           ` KOSAKI Motohiro
2010-04-14  7:28             ` Dave Chinner
2010-04-14  8:51               ` Mel Gorman
2010-04-15  1:34                 ` Dave Chinner
2010-04-15  4:09                   ` KOSAKI Motohiro
2010-04-15  4:11                     ` [PATCH 1/4] vmscan: delegate pageout io to flusher thread if current is kswapd KOSAKI Motohiro
2010-04-15  8:05                       ` Suleiman Souhlal
2010-04-15  8:17                         ` KOSAKI Motohiro
2010-04-15  8:26                           ` KOSAKI Motohiro
2010-04-15 10:30                             ` Johannes Weiner
2010-04-15 17:24                               ` Suleiman Souhlal
2010-04-20  2:56                               ` Ying Han
2010-04-15  9:32                         ` Dave Chinner
2010-04-15  9:41                           ` KOSAKI Motohiro
2010-04-15 17:27                           ` Suleiman Souhlal
2010-04-15 23:33                             ` Dave Chinner
2010-04-15 23:41                               ` Suleiman Souhlal
2010-04-16  9:50                               ` Alan Cox
2010-04-17  3:06                                 ` Dave Chinner
2010-04-15  8:18                       ` KOSAKI Motohiro
2010-04-15 10:31                       ` Mel Gorman
2010-04-15 11:26                         ` KOSAKI Motohiro
2010-04-15  4:13                     ` [PATCH 2/4] vmscan: kill prev_priority completely KOSAKI Motohiro
2010-04-15  4:14                     ` [PATCH 3/4] vmscan: move priority variable into scan_control KOSAKI Motohiro
2010-04-15  4:15                     ` [PATCH 4/4] vmscan: delegate page cleaning io to flusher thread if VM pressure is low KOSAKI Motohiro
2010-04-15  4:35                     ` [PATCH] mm: disallow direct reclaim page writeback KOSAKI Motohiro
2010-04-15  6:32                       ` Dave Chinner
2010-04-15  6:44                         ` KOSAKI Motohiro
2010-04-15  6:58                           ` Dave Chinner
2010-04-15  6:20                     ` Dave Chinner
2010-04-15  6:35                       ` KOSAKI Motohiro
2010-04-15  8:54                         ` Dave Chinner
2010-04-15 10:21                           ` KOSAKI Motohiro
2010-04-15 10:23                             ` [PATCH 1/4] vmscan: simplify shrink_inactive_list() KOSAKI Motohiro
2010-04-15 13:15                               ` Mel Gorman
2010-04-15 15:01                                 ` Andi Kleen
2010-04-15 15:44                                   ` Mel Gorman
2010-04-15 16:54                                     ` Andi Kleen
2010-04-15 23:40                                       ` Dave Chinner
2010-04-16  7:13                                         ` Andi Kleen
2010-04-16 14:57                                         ` Mel Gorman
2010-04-17  2:37                                           ` Dave Chinner
2010-04-16 14:55                                       ` Mel Gorman
2010-04-15 18:22                                 ` Valdis.Kletnieks
2010-04-16  9:39                                   ` Mel Gorman
2010-04-15 10:24                             ` [PATCH 2/4] [cleanup] mm: introduce free_pages_prepare KOSAKI Motohiro
2010-04-15 13:33                               ` Mel Gorman
2010-04-15 10:24                             ` [PATCH 3/4] mm: introduce free_pages_bulk KOSAKI Motohiro
2010-04-15 13:46                               ` Mel Gorman
2010-04-15 10:26                             ` [PATCH 4/4] vmscan: replace the pagevec in shrink_inactive_list() with list KOSAKI Motohiro
2010-04-15 10:28                   ` [PATCH] mm: disallow direct reclaim page writeback Mel Gorman
2010-04-15 13:42                     ` Chris Mason
2010-04-15 17:50                       ` tytso
2010-04-16 15:05                       ` Mel Gorman
2010-04-19 15:15                         ` Mel Gorman
2010-04-16  4:14                     ` Dave Chinner
2010-04-16 15:14                       ` Mel Gorman
2010-04-18  0:32                         ` Andrew Morton
2010-04-18 19:05                           ` Christoph Hellwig
2010-04-18 16:31                             ` Andrew Morton
2010-04-18 19:35                               ` Christoph Hellwig
2010-04-18 19:11                             ` Sorin Faibish
2010-04-18 19:10                           ` Sorin Faibish
2010-04-18 21:30                             ` James Bottomley
2010-04-18 23:34                               ` Sorin Faibish
2010-04-19  3:08                               ` tytso
2010-04-19  0:35                           ` Dave Chinner
2010-04-19  0:49                             ` Arjan van de Ven
2010-04-19  1:08                               ` Dave Chinner
2010-04-19  4:32                                 ` Arjan van de Ven
2010-04-19 15:20                         ` Mel Gorman
2010-04-23  1:06                           ` Dave Chinner
2010-04-23 10:50                             ` Mel Gorman
2010-04-15 14:57                   ` Andi Kleen
2010-04-15  2:37                 ` Johannes Weiner
2010-04-15  2:43                   ` KOSAKI Motohiro
2010-04-16 23:56                     ` Johannes Weiner
2010-04-14  6:52         ` KOSAKI Motohiro
2010-04-14 10:06         ` Andi Kleen
2010-04-14 11:20           ` Chris Mason
2010-04-14 12:15             ` Andi Kleen
2010-04-14 12:32               ` Alan Cox
2010-04-14 12:34                 ` Andi Kleen
2010-04-14 13:23             ` Mel Gorman
2010-04-14 14:07               ` Chris Mason
2010-04-14  0:24 ` Minchan Kim
2010-04-14  4:44   ` Dave Chinner
2010-04-14  7:54     ` Minchan Kim
2010-04-16  1:13 ` KAMEZAWA Hiroyuki
2010-04-16  4:18   ` KAMEZAWA Hiroyuki
