From: tytso@mit.edu
To: Chris Mason <chris.mason@oracle.com>, Mel Gorman <mel@csn.ul.ie>,
	Dave Chinner <david@fromorbit.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	linux-kernel@vger.kernel.org, l
Subject: Re: [PATCH] mm: disallow direct reclaim page writeback
Date: Thu, 15 Apr 2010 13:50:29 -0400
Message-ID: <20100415175029.GF19959__33265.7380997718$1271353945$gmane$org@thunk.org>
In-Reply-To: <20100415134217.GB3794@think>

On Thu, Apr 15, 2010 at 09:42:17AM -0400, Chris Mason wrote:
> I'd like to add one more:
> 
> 5. Don't dive into filesystem locks during reclaim.
> 
> This is different from splicing code paths together, but
> the filesystem writepage code has become the center of our attempts at
> doing big fat contiguous writes on disk.  We push off work as late as we
> can until just before the pages go down to disk.
> 
> I'll pick on ext4 and btrfs for a minute, just to broaden the scope
> outside of XFS.  Writepage comes along and the filesystem needs to
> actually find blocks on disk for all the dirty pages it has promised to
> write.
> 
> So, we start a transaction, we take various allocator locks, modify
> different metadata, log changed blocks, take a break (logging is hard
> work you know, need_resched() has triggered by now), stuff it
> all into the file's metadata, log that, and finally return.
> 
> Each of the steps above can block for a long time.  Ext4 solves
> this by not doing them.  ext4_writepage only writes pages that
> are already fully allocated on disk.
> 
> Btrfs is much more efficient at not doing them, it just returns right
> away for PF_MEMALLOC.

This is a real problem, BTW.  One of the problems we've been fighting
inside Google is that because ext4_writepage() refuses to write pages
that are subject to delayed allocation, it can cause the OOM killer to
get invoked.

I had thought this was because of some evil games we're playing for
container support that make zones small, but just last night at the
LF Collaboration Summit reception, I ran into a technologist from a
major financial industry customer who reported to me that when they
tried using ext4, they ran into the exact same problem.  They were
running Oracle, which was pinning down 3 gigs of memory, and when
they tried writing a very big file using ext4, writepage() was not
able to reclaim enough pages, so the kernel fell back to invoking the
OOM killer, and things got ugly in a hurry...
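
As a toy illustration of that failure mode, here is a userspace model
in Python.  It is not ext4 or VM code: Page, toy_writepage() and
toy_direct_reclaim() are names made up for this sketch, and the only
behaviour it assumes is the one described above, namely that writepage
called from reclaim skips pages whose blocks are still subject to
delayed allocation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    dirty: bool
    delalloc: bool = False  # True if no blocks are allocated on disk yet


def toy_writepage(page: Page, in_reclaim: bool) -> bool:
    """Return True if the page ends up clean (hypothetical model only)."""
    if not page.dirty:
        return True
    if in_reclaim and page.delalloc:
        # Assumed behaviour: no block allocation from the reclaim path,
        # so the page stays dirty and cannot be freed.
        return False
    page.dirty = False
    page.delalloc = False
    return True


def toy_direct_reclaim(pages: List[Page], want: int) -> int:
    """Try to free `want` pages; return how many were actually freed."""
    freed = 0
    for page in pages:
        if freed == want:
            break
        if toy_writepage(page, in_reclaim=True):
            freed += 1
    return freed
```

If every dirty page in the list is delalloc, toy_direct_reclaim()
returns 0 no matter how many pages it scans, which is the
"no progress, so fall back to the OOM killer" situation in miniature.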

One of the things I was proposing internally to try as a long-term
we-gotta-fix-writeback effort is that we need some kind of signal so
that we can do the lumpy reclaim (a) in a separate process, to avoid a
lock inversion problem and the gee-it's-going-to-take-a-long-time
problem which Chris mentioned, and (b) in a way that clusters I/O so
that we're not dribbling out writes to the disk in small, seeky, 4k
writes, which is really a disaster from a performance standpoint.
Maybe the VM guys don't care about this, but this sort of thing tends
to get us filesystem guys all up in a lather, not just because of the
really sucky performance, but also because it tends to mean that the
system can thrash itself to death in low memory situations.
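
To make point (b) concrete, a minimal sketch of what clustering means,
again as a userspace Python toy and not a proposal for the actual
interface: given the page indices of dirty pages in one file, merge
neighbours into (start, length) runs so each run can go out as one
larger write.  The function name and the max_pages cap are assumptions
made for the illustration.

```python
from typing import Iterable, List, Tuple


def cluster_dirty_pages(indices: Iterable[int],
                        max_pages: int = 256) -> List[Tuple[int, int]]:
    """Merge dirty page indices into (start, length) contiguous runs."""
    runs: List[Tuple[int, int]] = []
    for idx in sorted(set(indices)):
        if runs:
            start, length = runs[-1]
            # Extend the current run if this page is the next one in
            # the file and the run has not hit the (assumed) size cap.
            if idx == start + length and length < max_pages:
                runs[-1] = (start, length + 1)
                continue
        runs.append((idx, 1))
    return runs
```

For example, cluster_dirty_pages([7, 3, 4, 5, 100, 6]) gives
[(3, 5), (100, 1)]: with 4k pages, five separate 4k writes become one
20k write plus one stray 4k write.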

    	       	      	     	      	 - Ted

