From: Dave Chinner <david@fromorbit.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Chris Mason <chris.mason@oracle.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] mm: disallow direct reclaim page writeback
Date: Mon, 19 Apr 2010 10:35:56 +1000
Message-ID: <20100419003556.GC2520@dastard>
In-Reply-To: <20100417203239.dda79e88.akpm@linux-foundation.org>

On Sat, Apr 17, 2010 at 08:32:39PM -0400, Andrew Morton wrote:
> 
> There are two issues here: stack utilisation and poor IO patterns in
> direct reclaim.  They are different.
> 
> The poor IO patterns thing is a regression.  Some time several years
> ago (around 2.6.16, perhaps), page reclaim started to do a LOT more
> dirty-page writeback than it used to.  AFAIK nobody attempted to work
> out why, nor attempted to try to fix it.

I think that part of the problem is that at roughly the same time
writeback started on a long downhill slide as well, and we've
really only fixed that in the last couple of kernel releases. Also,
it tends to take more than just writing a few large files to invoke
the LRU-based writeback code, so it is generally not invoked in
filesystem "performance" testing. Hence my bet is that the effects
of LRU-based writeback are rarely noticed in common testing.

IOWs, low memory testing is not something a lot of people do. Add to
that the fact that most fs people, including me, have been treating
the VM as a black box that a bunch of other people have been taking
care of and hence really just been hoping it does the right thing,
and we've got a recipe for an unnoticed descent into a Bad Place.

[snip]

> Any attempt to implement writearound in pageout will need to find a way
> to safely pin that address_space.  One way is to take a temporary ref
> on mapping->host, but IIRC that introduced nasties with inode_lock. 
> Certainly it'll put more load on that worrisomely-singleton lock.

A problem already solved in the background flusher threads....

> Regarding simply not doing any writeout in direct reclaim (Dave's
> initial proposal): the problem is that pageout() will clean a page in
> the target zone.  Normal writeout won't do that, so we could get into a
> situation where vast amounts of writeout is happening, but none of it
> is cleaning pages in the zone which we're trying to allocate from. 
> It's quite possibly livelockable, too.

That's true, but seeing as we can't safely do writeback from
reclaim, we need some method of telling the background threads to
write a certain region of an inode. Perhaps some extension of a
struct writeback_control?
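
As a rough illustration of what such a request might carry, here is a
userspace sketch, not kernel code. struct wb_range_request,
wb_request_around() and SKETCH_PAGE_SIZE are hypothetical names made up
for this example; the only thing borrowed from the kernel is the idea of
an inclusive byte range, which struct writeback_control already has in
its range_start/range_end fields. The range is an aligned cluster of
pages around the page reclaim wants cleaned, so the target page is
always part of what gets written.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumed page size, illustration only */

/*
 * Hypothetical request handed from reclaim to a flusher thread.
 * Not a kernel structure; the byte range is inclusive at both ends.
 */
struct wb_range_request {
	uint64_t ino;		/* inode owning the dirty page */
	uint64_t range_start;	/* first byte to write */
	uint64_t range_end;	/* last byte to write */
};

/*
 * Build a request covering the aligned cluster of cluster_pages pages
 * that contains page_index, so the target page is always inside it.
 */
static struct wb_range_request
wb_request_around(uint64_t ino, uint64_t page_index, uint64_t cluster_pages)
{
	struct wb_range_request req;
	uint64_t first_page;

	assert(cluster_pages > 0);

	/* Round the target page down to the start of its cluster. */
	first_page = page_index - (page_index % cluster_pages);

	req.ino = ino;
	req.range_start = first_page * SKETCH_PAGE_SIZE;
	req.range_end = (first_page + cluster_pages) * SKETCH_PAGE_SIZE - 1;
	return req;
}
```

How such a request would be queued to a flusher thread, and how reclaim
would wait for the target page to come clean, is left open here, as it
is in the discussion above.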

> Doing writearound (if we can get it going) will solve that adequately
> (assuming that the target page gets reliably written), but it won't
> help the stack usage problem.
> 
> 
> To solve the IO-pattern thing I really do think we should first work
> out ytf we started doing much more IO off the LRU.  What caused it?  Is
> it really unavoidable?

/me wonders who has the time and expertise to do that archeology

> To solve the stack-usage thing: dunno, really.  One could envisage code
> which skips pageout() if we're using more than X amount of stack, but

Which, if we have to set it as low as 1.5k of stack used, may as
well just skip pageout()....

> that sucks.  Another possibility might be to hand the target page over
> to another thread (I suppose kswapd will do) and then synchronise with
> that thread - get_page()+wait_on_page_locked() is one way.  The helper
> thread could of course do writearound.

I'm fundamentally opposed to pushing IO to another place in the VM
when it could be just as easily handed to the flusher threads.
Also, consider that there's only one kswapd thread in a given
context (e.g. per CPU), but we can scale the number of flusher
threads as need be....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


