All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andi Kleen <andi@firstfloor.org>
To: Chris Mason <chris.mason@oracle.com>
Cc: Mel Gorman <mel@csn.ul.ie>, Dave Chinner <david@fromorbit.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] mm: disallow direct reclaim page writeback
Date: Wed, 14 Apr 2010 14:15:16 +0200	[thread overview]
Message-ID: <8739yy9qnf.fsf@basil.nowhere.org> (raw)
In-Reply-To: <20100414112015.GO13327@think> (Chris Mason's message of "Wed, 14 Apr 2010 07:20:15 -0400")

Chris Mason <chris.mason@oracle.com> writes:
>> 
>> Basically if you cannot tolerate 1K (or more likely more) of stack
>> used before your fs is called you're toast in lots of other situations
>> anyways.
>
> Well, on a 4K stack kernel, 832 bytes is a very large percentage for
> just one function.

To be honest I think 4K stack simply has to go. I tend to call
it "russian roulette" mode. 

It was just a old workaround for a very old buggy VM that couldn't free 8K
pages and the VM is a lot better at that now. And the general trend is
to more complex code everywhere, so 4K stacks become more and more hazardous.

It was a bad idea back then and is still a bad idea, getting
worse and worse with each MLOC being added to the kernel each year.

We don't have any good ways to verify that obscure paths through
the more and more subsystems won't exceed it (in fact I'm pretty
sure there are plenty of problems in exotic configurations)

And even if you can make a specific load work there's basically
no safety net.

The only part of the 4K stack code that's good is the separate
interrupt stack, but that one should be just combined with a sane 8K 
process stack.

But yes on a 4K kernel you probably don't want to do any direct reclaim. 
Maybe for GFP_NOFS everywhere except user allocations when it's set? 
Or simply drop it?

> But they don't realize their function can dive down into ecryptfs then
> the filesystem then maybe loop and then perhaps raid6 on top of a
> network block device.

Those stackings need to use separate threads anyways. A lot of them
do in fact. Block avoided this problem by iterating instead of
recursing.  Those that still recurse on the same stack simply
need to be fixed.

> Yeah, but since the call chain does eventually go into the allocator,
> this function needs to be more stack friendly.

For common fast paths it doesn't go into the allocator.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

WARNING: multiple messages have this Message-ID (diff)
From: Andi Kleen <andi@firstfloor.org>
To: Chris Mason <chris.mason@oracle.com>
Cc: Mel Gorman <mel@csn.ul.ie>,  Dave Chinner <david@fromorbit.com>,
	linux-kernel@vger.kernel.org,  linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] mm: disallow direct reclaim page writeback
Date: Wed, 14 Apr 2010 14:15:16 +0200	[thread overview]
Message-ID: <8739yy9qnf.fsf@basil.nowhere.org> (raw)
In-Reply-To: <20100414112015.GO13327@think> (Chris Mason's message of "Wed, 14 Apr 2010 07:20:15 -0400")

Chris Mason <chris.mason@oracle.com> writes:
>> 
>> Basically if you cannot tolerate 1K (or more likely more) of stack
>> used before your fs is called you're toast in lots of other situations
>> anyways.
>
> Well, on a 4K stack kernel, 832 bytes is a very large percentage for
> just one function.

To be honest I think 4K stack simply has to go. I tend to call
it "russian roulette" mode. 

It was just a old workaround for a very old buggy VM that couldn't free 8K
pages and the VM is a lot better at that now. And the general trend is
to more complex code everywhere, so 4K stacks become more and more hazardous.

It was a bad idea back then and is still a bad idea, getting
worse and worse with each MLOC being added to the kernel each year.

We don't have any good ways to verify that obscure paths through
the more and more subsystems won't exceed it (in fact I'm pretty
sure there are plenty of problems in exotic configurations)

And even if you can make a specific load work there's basically
no safety net.

The only part of the 4K stack code that's good is the separate
interrupt stack, but that one should be just combined with a sane 8K 
process stack.

But yes on a 4K kernel you probably don't want to do any direct reclaim. 
Maybe for GFP_NOFS everywhere except user allocations when it's set? 
Or simply drop it?

> But they don't realize their function can dive down into ecryptfs then
> the filesystem then maybe loop and then perhaps raid6 on top of a
> network block device.

Those stackings need to use separate threads anyways. A lot of them
do in fact. Block avoided this problem by iterating instead of
recursing.  Those that still recurse on the same stack simply
need to be fixed.

> Yeah, but since the call chain does eventually go into the allocator,
> this function needs to be more stack friendly.

For common fast paths it doesn't go into the allocator.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Andi Kleen <andi@firstfloor.org>
To: Chris Mason <chris.mason@oracle.com>
Cc: Mel Gorman <mel@csn.ul.ie>, Dave Chinner <david@fromorbit.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] mm: disallow direct reclaim page writeback
Date: Wed, 14 Apr 2010 14:15:16 +0200	[thread overview]
Message-ID: <8739yy9qnf.fsf@basil.nowhere.org> (raw)
In-Reply-To: <20100414112015.GO13327@think> (Chris Mason's message of "Wed, 14 Apr 2010 07:20:15 -0400")

Chris Mason <chris.mason@oracle.com> writes:
>> 
>> Basically if you cannot tolerate 1K (or more likely more) of stack
>> used before your fs is called you're toast in lots of other situations
>> anyways.
>
> Well, on a 4K stack kernel, 832 bytes is a very large percentage for
> just one function.

To be honest I think 4K stack simply has to go. I tend to call
it "russian roulette" mode. 

It was just a old workaround for a very old buggy VM that couldn't free 8K
pages and the VM is a lot better at that now. And the general trend is
to more complex code everywhere, so 4K stacks become more and more hazardous.

It was a bad idea back then and is still a bad idea, getting
worse and worse with each MLOC being added to the kernel each year.

We don't have any good ways to verify that obscure paths through
the more and more subsystems won't exceed it (in fact I'm pretty
sure there are plenty of problems in exotic configurations)

And even if you can make a specific load work there's basically
no safety net.

The only part of the 4K stack code that's good is the separate
interrupt stack, but that one should be just combined with a sane 8K 
process stack.

But yes on a 4K kernel you probably don't want to do any direct reclaim. 
Maybe for GFP_NOFS everywhere except user allocations when it's set? 
Or simply drop it?

> But they don't realize their function can dive down into ecryptfs then
> the filesystem then maybe loop and then perhaps raid6 on top of a
> network block device.

Those stackings need to use separate threads anyways. A lot of them
do in fact. Block avoided this problem by iterating instead of
recursing.  Those that still recurse on the same stack simply
need to be fixed.

> Yeah, but since the call chain does eventually go into the allocator,
> this function needs to be more stack friendly.

For common fast paths it doesn't go into the allocator.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2010-04-14 12:15 UTC|newest]

Thread overview: 248+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-04-13  0:17 [PATCH] mm: disallow direct reclaim page writeback Dave Chinner
2010-04-13  0:17 ` Dave Chinner
2010-04-13  8:31 ` KOSAKI Motohiro
2010-04-13  8:31   ` KOSAKI Motohiro
2010-04-13 10:29   ` Dave Chinner
2010-04-13 10:29     ` Dave Chinner
2010-04-13 11:39     ` KOSAKI Motohiro
2010-04-13 11:39       ` KOSAKI Motohiro
2010-04-13 14:36       ` Dave Chinner
2010-04-13 14:36         ` Dave Chinner
2010-04-14  3:12         ` Dave Chinner
2010-04-14  3:12           ` Dave Chinner
2010-04-14  6:52           ` KOSAKI Motohiro
2010-04-14  6:52             ` KOSAKI Motohiro
2010-04-15  1:56             ` Dave Chinner
2010-04-15  1:56               ` Dave Chinner
2010-04-14  6:52         ` KOSAKI Motohiro
2010-04-14  6:52           ` KOSAKI Motohiro
2010-04-14  7:36           ` Dave Chinner
2010-04-14  7:36             ` Dave Chinner
2010-04-13  9:58 ` Mel Gorman
2010-04-13  9:58   ` Mel Gorman
2010-04-13 11:19   ` Dave Chinner
2010-04-13 11:19     ` Dave Chinner
2010-04-13 19:34     ` Mel Gorman
2010-04-13 19:34       ` Mel Gorman
2010-04-13 20:20       ` Chris Mason
2010-04-13 20:20         ` Chris Mason
2010-04-14  1:40         ` Dave Chinner
2010-04-14  1:40           ` Dave Chinner
2010-04-14  4:59           ` KAMEZAWA Hiroyuki
2010-04-14  4:59             ` KAMEZAWA Hiroyuki
2010-04-14  5:41             ` Dave Chinner
2010-04-14  5:41               ` Dave Chinner
2010-04-14  5:54               ` KOSAKI Motohiro
2010-04-14  5:54                 ` KOSAKI Motohiro
2010-04-14  6:13                 ` Minchan Kim
2010-04-14  7:19                   ` Minchan Kim
2010-04-14  7:19                     ` Minchan Kim
2010-04-14  9:42                     ` KAMEZAWA Hiroyuki
2010-04-14  9:42                       ` KAMEZAWA Hiroyuki
2010-04-14  9:42                       ` KAMEZAWA Hiroyuki
2010-04-14 10:01                       ` Minchan Kim
2010-04-14 10:01                         ` Minchan Kim
2010-04-14 10:07                         ` Mel Gorman
2010-04-14 10:07                           ` Mel Gorman
2010-04-14 10:07                           ` Mel Gorman
2010-04-14 10:16                           ` Minchan Kim
2010-04-14 10:16                             ` Minchan Kim
2010-04-14  7:06                 ` Dave Chinner
2010-04-14  7:06                   ` Dave Chinner
2010-04-14  6:52           ` KOSAKI Motohiro
2010-04-14  6:52             ` KOSAKI Motohiro
2010-04-14  7:28             ` Dave Chinner
2010-04-14  7:28               ` Dave Chinner
2010-04-14  8:51               ` Mel Gorman
2010-04-14  8:51                 ` Mel Gorman
2010-04-15  1:34                 ` Dave Chinner
2010-04-15  1:34                   ` Dave Chinner
2010-04-15  1:34                   ` Dave Chinner
2010-04-15  4:09                   ` KOSAKI Motohiro
2010-04-15  4:09                     ` KOSAKI Motohiro
2010-04-15  4:11                     ` [PATCH 1/4] vmscan: delegate pageout io to flusher thread if current is kswapd KOSAKI Motohiro
2010-04-15  4:11                       ` KOSAKI Motohiro
2010-04-15  4:11                       ` KOSAKI Motohiro
2010-04-15  8:05                       ` Suleiman Souhlal
2010-04-15  8:05                         ` Suleiman Souhlal
2010-04-15  8:17                         ` KOSAKI Motohiro
2010-04-15  8:17                           ` KOSAKI Motohiro
2010-04-15  8:26                           ` KOSAKI Motohiro
2010-04-15  8:26                             ` KOSAKI Motohiro
2010-04-15 10:30                             ` Johannes Weiner
2010-04-15 10:30                               ` Johannes Weiner
2010-04-15 17:24                               ` Suleiman Souhlal
2010-04-15 17:24                                 ` Suleiman Souhlal
2010-04-20  2:56                               ` Ying Han
2010-04-20  2:56                                 ` Ying Han
2010-04-15  9:32                         ` Dave Chinner
2010-04-15  9:32                           ` Dave Chinner
2010-04-15  9:41                           ` KOSAKI Motohiro
2010-04-15  9:41                             ` KOSAKI Motohiro
2010-04-15 17:27                           ` Suleiman Souhlal
2010-04-15 17:27                             ` Suleiman Souhlal
2010-04-15 23:33                             ` Dave Chinner
2010-04-15 23:33                               ` Dave Chinner
2010-04-15 23:41                               ` Suleiman Souhlal
2010-04-15 23:41                                 ` Suleiman Souhlal
2010-04-16  9:50                               ` Alan Cox
2010-04-16  9:50                                 ` Alan Cox
2010-04-17  3:06                                 ` Dave Chinner
2010-04-17  3:06                                   ` Dave Chinner
2010-04-15  8:18                       ` KOSAKI Motohiro
2010-04-15  8:18                         ` KOSAKI Motohiro
2010-04-15  8:18                         ` KOSAKI Motohiro
2010-04-15 10:31                       ` Mel Gorman
2010-04-15 10:31                         ` Mel Gorman
2010-04-15 11:26                         ` KOSAKI Motohiro
2010-04-15 11:26                           ` KOSAKI Motohiro
2010-04-15  4:13                     ` [PATCH 2/4] vmscan: kill prev_priority completely KOSAKI Motohiro
2010-04-15  4:13                       ` KOSAKI Motohiro
2010-04-15  4:13                       ` KOSAKI Motohiro
2010-04-15  4:14                     ` [PATCH 3/4] vmscan: move priority variable into scan_control KOSAKI Motohiro
2010-04-15  4:14                       ` KOSAKI Motohiro
2010-04-15  4:14                       ` KOSAKI Motohiro
2010-04-15  4:15                     ` [PATCH 4/4] vmscan: delegate page cleaning io to flusher thread if VM pressure is low KOSAKI Motohiro
2010-04-15  4:15                       ` KOSAKI Motohiro
2010-04-15  4:15                       ` KOSAKI Motohiro
2010-04-15  4:35                     ` [PATCH] mm: disallow direct reclaim page writeback KOSAKI Motohiro
2010-04-15  4:35                       ` KOSAKI Motohiro
2010-04-15  6:32                       ` Dave Chinner
2010-04-15  6:32                         ` Dave Chinner
2010-04-15  6:44                         ` KOSAKI Motohiro
2010-04-15  6:44                           ` KOSAKI Motohiro
2010-04-15  6:58                           ` Dave Chinner
2010-04-15  6:58                             ` Dave Chinner
2010-04-15  6:20                     ` Dave Chinner
2010-04-15  6:20                       ` Dave Chinner
2010-04-15  6:35                       ` KOSAKI Motohiro
2010-04-15  6:35                         ` KOSAKI Motohiro
2010-04-15  8:54                         ` Dave Chinner
2010-04-15  8:54                           ` Dave Chinner
2010-04-15 10:21                           ` KOSAKI Motohiro
2010-04-15 10:21                             ` KOSAKI Motohiro
2010-04-15 10:23                             ` [PATCH 1/4] vmscan: simplify shrink_inactive_list() KOSAKI Motohiro
2010-04-15 10:23                               ` KOSAKI Motohiro
2010-04-15 13:15                               ` Mel Gorman
2010-04-15 13:15                                 ` Mel Gorman
2010-04-15 15:01                                 ` Andi Kleen
2010-04-15 15:01                                   ` Andi Kleen
2010-04-15 15:01                                   ` Andi Kleen
2010-04-15 15:44                                   ` Mel Gorman
2010-04-15 15:44                                     ` Mel Gorman
2010-04-15 16:54                                     ` Andi Kleen
2010-04-15 16:54                                       ` Andi Kleen
2010-04-15 23:40                                       ` Dave Chinner
2010-04-15 23:40                                         ` Dave Chinner
2010-04-16  7:13                                         ` Andi Kleen
2010-04-16  7:13                                           ` Andi Kleen
2010-04-16 14:57                                         ` Mel Gorman
2010-04-16 14:57                                           ` Mel Gorman
2010-04-17  2:37                                           ` Dave Chinner
2010-04-17  2:37                                             ` Dave Chinner
2010-04-16 14:55                                       ` Mel Gorman
2010-04-16 14:55                                         ` Mel Gorman
2010-04-15 18:22                                 ` Valdis.Kletnieks
2010-04-16  9:39                                   ` Mel Gorman
2010-04-16  9:39                                     ` Mel Gorman
2010-04-15 10:24                             ` [PATCH 2/4] [cleanup] mm: introduce free_pages_prepare KOSAKI Motohiro
2010-04-15 10:24                               ` KOSAKI Motohiro
2010-04-15 10:24                               ` KOSAKI Motohiro
2010-04-15 13:33                               ` Mel Gorman
2010-04-15 13:33                                 ` Mel Gorman
2010-04-15 10:24                             ` [PATCH 3/4] mm: introduce free_pages_bulk KOSAKI Motohiro
2010-04-15 10:24                               ` KOSAKI Motohiro
2010-04-15 10:24                               ` KOSAKI Motohiro
2010-04-15 13:46                               ` Mel Gorman
2010-04-15 13:46                                 ` Mel Gorman
2010-04-15 10:26                             ` [PATCH 4/4] vmscan: replace the pagevec in shrink_inactive_list() with list KOSAKI Motohiro
2010-04-15 10:26                               ` KOSAKI Motohiro
2010-04-15 10:28                   ` [PATCH] mm: disallow direct reclaim page writeback Mel Gorman
2010-04-15 10:28                     ` Mel Gorman
2010-04-15 13:42                     ` Chris Mason
2010-04-15 13:42                       ` Chris Mason
2010-04-15 17:50                       ` tytso
2010-04-15 17:50                       ` tytso
2010-04-15 17:50                         ` tytso
2010-04-16 15:05                       ` Mel Gorman
2010-04-16 15:05                         ` Mel Gorman
2010-04-19 15:15                         ` Mel Gorman
2010-04-19 15:15                         ` Mel Gorman
2010-04-19 15:15                           ` Mel Gorman
2010-04-19 17:38                           ` Chris Mason
2010-04-16 15:05                       ` Mel Gorman
2010-04-16  4:14                     ` Dave Chinner
2010-04-16  4:14                       ` Dave Chinner
2010-04-16 15:14                       ` Mel Gorman
2010-04-16 15:14                         ` Mel Gorman
2010-04-18  0:32                         ` Andrew Morton
2010-04-18  0:32                           ` Andrew Morton
2010-04-18 19:05                           ` Christoph Hellwig
2010-04-18 19:05                             ` Christoph Hellwig
2010-04-18 16:31                             ` Andrew Morton
2010-04-18 16:31                               ` Andrew Morton
2010-04-18 19:35                               ` Christoph Hellwig
2010-04-18 19:35                                 ` Christoph Hellwig
2010-04-18 19:11                             ` Sorin Faibish
2010-04-18 19:11                               ` Sorin Faibish
2010-04-18 19:11                               ` Sorin Faibish
2010-04-18 19:10                           ` Sorin Faibish
2010-04-18 19:10                             ` Sorin Faibish
2010-04-18 19:10                             ` Sorin Faibish
2010-04-18 21:30                             ` James Bottomley
2010-04-18 21:30                               ` James Bottomley
2010-04-18 23:34                               ` Sorin Faibish
2010-04-18 23:34                                 ` Sorin Faibish
2010-04-18 23:34                                 ` Sorin Faibish
2010-04-19  3:08                               ` tytso
2010-04-19  3:08                                 ` tytso
2010-04-19  0:35                           ` Dave Chinner
2010-04-19  0:35                             ` Dave Chinner
2010-04-19  0:49                             ` Arjan van de Ven
2010-04-19  0:49                               ` Arjan van de Ven
2010-04-19  1:08                               ` Dave Chinner
2010-04-19  1:08                                 ` Dave Chinner
2010-04-19  4:32                                 ` Arjan van de Ven
2010-04-19  4:32                                   ` Arjan van de Ven
2010-04-19 15:20                         ` Mel Gorman
2010-04-19 15:20                           ` Mel Gorman
2010-04-23  1:06                           ` Dave Chinner
2010-04-23  1:06                             ` Dave Chinner
2010-04-23 10:50                             ` Mel Gorman
2010-04-23 10:50                               ` Mel Gorman
2010-04-15 14:57                   ` Andi Kleen
2010-04-15 14:57                     ` Andi Kleen
2010-04-15  2:37                 ` Johannes Weiner
2010-04-15  2:37                   ` Johannes Weiner
2010-04-15  2:43                   ` KOSAKI Motohiro
2010-04-15  2:43                     ` KOSAKI Motohiro
2010-04-16 23:56                     ` Johannes Weiner
2010-04-16 23:56                       ` Johannes Weiner
2010-04-14  6:52         ` KOSAKI Motohiro
2010-04-14  6:52           ` KOSAKI Motohiro
2010-04-14 10:06         ` Andi Kleen
2010-04-14 10:06           ` Andi Kleen
2010-04-14 10:06           ` Andi Kleen
2010-04-14 11:20           ` Chris Mason
2010-04-14 11:20             ` Chris Mason
2010-04-14 12:15             ` Andi Kleen [this message]
2010-04-14 12:15               ` Andi Kleen
2010-04-14 12:15               ` Andi Kleen
2010-04-14 12:32               ` Alan Cox
2010-04-14 12:32                 ` Alan Cox
2010-04-14 12:34                 ` Andi Kleen
2010-04-14 12:34                   ` Andi Kleen
2010-04-14 13:23             ` Mel Gorman
2010-04-14 13:23               ` Mel Gorman
2010-04-14 14:07               ` Chris Mason
2010-04-14 14:07                 ` Chris Mason
2010-04-14  0:24 ` Minchan Kim
2010-04-14  0:24   ` Minchan Kim
2010-04-14  4:44   ` Dave Chinner
2010-04-14  4:44     ` Dave Chinner
2010-04-14  7:54     ` Minchan Kim
2010-04-14  7:54       ` Minchan Kim
2010-04-16  1:13 ` KAMEZAWA Hiroyuki
2010-04-16  1:13   ` KAMEZAWA Hiroyuki
2010-04-16  4:18   ` KAMEZAWA Hiroyuki
2010-04-16  4:18     ` KAMEZAWA Hiroyuki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8739yy9qnf.fsf@basil.nowhere.org \
    --to=andi@firstfloor.org \
    --cc=chris.mason@oracle.com \
    --cc=david@fromorbit.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mel@csn.ul.ie \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.