linux-kernel.vger.kernel.org archive mirror
From: "azurIt" <azurit@pobox.sk>
To: "Mel Gorman" <mel@csn.ul.ie>,
	"Andrew Morton" <akpm@linux-foundation.org>
Cc: "Eric Dumazet" <eric.dumazet@gmail.com>,
	"Changli Gao" <xiaosuo@gmail.com>,
	"Américo Wang" <xiyou.wangcong@gmail.com>,
	"Jiri Slaby" <jslaby@suse.cz>, <linux-kernel@vger.kernel.org>,
	<linux-mm@kvack.org>, <linux-fsdevel@vger.kernel.org>,
	"Jiri Slaby" <jirislaby@gmail.com>
Subject: Re: Regression from 2.6.36
Date: Fri, 15 Apr 2011 11:59:03 +0200
Message-ID: <20110415115903.315DEAA1@pobox.sk>
In-Reply-To: <20110414102501.GE11871@csn.ul.ie>


This new patch also works fine and fixes the problem.

Mel, I cannot run your script:
# perl watch-highorder-latency.pl
Failed to open /sys/kernel/debug/tracing/set_ftrace_filter for writing at watch-highorder-latency.pl line 17.

# ls -ld /sys/kernel/debug/
ls: cannot access /sys/kernel/debug/: No such file or directory


azur

______________________________________________________________
> From: "Mel Gorman" <mel@csn.ul.ie>
> To: Andrew Morton <akpm@linux-foundation.org>
> Date: 14.04.2011 12:25
> Subject: Re: Regression from 2.6.36
>
> CC: "Eric Dumazet" <eric.dumazet@gmail.com>, "Changli Gao" <xiaosuo@gmail.com>, "Américo Wang" <xiyou.wangcong@gmail.com>, "Jiri Slaby" <jslaby@suse.cz>, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, "Jiri Slaby" <jirislaby@gmail.com>
>On Wed, Apr 13, 2011 at 02:16:00PM -0700, Andrew Morton wrote:
>> On Wed, 13 Apr 2011 04:37:36 +0200
>> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> 
>> > On Tuesday, 12 April 2011 at 18:31 -0700, Andrew Morton wrote:
>> > > On Wed, 13 Apr 2011 09:23:11 +0800 Changli Gao <xiaosuo@gmail.com> wrote:
>> > > 
>> > > > On Wed, Apr 13, 2011 at 6:49 AM, Andrew Morton
>> > > > <akpm@linux-foundation.org> wrote:
>> > > > >
>> > > > > It's somewhat unclear (to me) what caused this regression.
>> > > > >
>> > > > > Is it because the kernel is now doing large kmalloc()s for the fdtable,
>> > > > > and this makes the page allocator go nuts trying to satisfy high-order
>> > > > > page allocation requests?
>> > > > >
>> > > > > Is it because the kernel now will usually free the fdtable
>> > > > > synchronously within the rcu callback, rather than deferring this to a
>> > > > > workqueue?
>> > > > >
>> > > > > The latter seems unlikely, so I'm thinking this was a case of
>> > > > > high-order-allocations-considered-harmful?
>> > > > >
>> > > > 
>> > > > Maybe, but I am not sure. Maybe my patch causes too much internal
>> > > > fragmentation. For example, when asking for 5 pages, we get 8 pages and
>> > > > 3 pages are wasted, so memory thrashing eventually happens.
>> > > 
>> > > That theory sounds less likely, but could be tested by using
>> > > alloc_pages_exact().
>> > > 
>> > 
>> > Very unlikely, since fdtable sizes are powers of two, unless you hit
>> > sysctl_nr_open and it was changed (default value being 2^20)
>> > 
>> 
>> So am I correct in believing that this regression is due to the
>> high-order allocations putting excess stress onto page reclaim?
>> 
>
>This is very plausible but it would be nice to get confirmation on
>what the size of the fdtable was to be sure. If it's big enough for
>high-order allocations and it's a fork-heavy workload with memory
>mostly in use, the fork() latencies could be getting very high. In
>addition, each fork is potentially kicking kswapd awake (to rebalance
>the zone for higher orders). I do not see CONFIG_COMPACTION enabled,
>which means that if I'm right that kswapd is awake and fork() is
>entering direct reclaim, then we are also doing lumpy reclaim, which
>can stall pretty severely.
>
>> If so, then how large _are_ these allocations?  This perhaps can be
>> determined from /proc/slabinfo.  They must be pretty huge, because slub
>> likes to do excessively-large allocations and the system handles that
>> reasonably well.
>> 
>
>I'd be interested in finding out the value of /proc/sys/fs/file-max and
>the output of ulimit -n (max open files) for the main server. This
>should help us determine what the size of the fdtable is.
>
>> I suppose that a suitable fix would be
>> 
>> 
>> From: Andrew Morton <akpm@linux-foundation.org>
>> 
>> Azurit reports large increases in system time after 2.6.36 when running
>> Apache.  It was bisected down to a892e2d7dcdfa6c76e6 ("vfs: use kmalloc()
>> to allocate fdmem if possible").
>> 
>> That patch caused the vfs to use kmalloc() for very large allocations and
>> this is causing excessive work (and presumably excessive reclaim) within
>> the page allocator.
>> 
>> Fix it by falling back to vmalloc() earlier - when the allocation attempt
>> would have been considered "costly" by reclaim.
>> 
>> Reported-by: azurIt <azurit@pobox.sk>
>> Cc: Changli Gao <xiaosuo@gmail.com>
>> Cc: Americo Wang <xiyou.wangcong@gmail.com>
>> Cc: Jiri Slaby <jslaby@suse.cz>
>> Cc: Eric Dumazet <eric.dumazet@gmail.com>
>> Cc: Mel Gorman <mel@csn.ul.ie>
>> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>> ---
>> 
>>  fs/file.c |   17 ++++++++++-------
>>  1 file changed, 10 insertions(+), 7 deletions(-)
>> 
>> diff -puN fs/file.c~a fs/file.c
>> --- a/fs/file.c~a
>> +++ a/fs/file.c
>> @@ -39,14 +39,17 @@ int sysctl_nr_open_max = 1024 * 1024; /*
>>   */
>>  static DEFINE_PER_CPU(struct fdtable_defer, fdtable_defer_list);
>>  
>> -static inline void *alloc_fdmem(unsigned int size)
>> +static void *alloc_fdmem(unsigned int size)
>>  {
>> -	void *data;
>> -
>> -	data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN);
>> -	if (data != NULL)
>> -		return data;
>> -
>> +	/*
>> +	 * Very large allocations can stress page reclaim, so fall back to
>> +	 * vmalloc() if the allocation size will be considered "large" by the VM.
>> +	 */
>> +	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
>
>The reporter will need to retest to confirm this is really OK. The patch
>that was reported to help avoided high-order allocations entirely. If
>fork-heavy workloads are really entering direct reclaim and increasing
>fork latency enough to ruin performance, then this patch will also
>suffer. How much it helps depends on how big the fdtable is.
>
>> +		void *data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN);
>> +		if (data != NULL)
>> +			return data;
>> +	}
>>  	return vmalloc(size);
>>  }
>>  
>
>I'm attaching a primitive perl script that reports high-order allocation
>latencies. It'd be interesting to see what its output looks like,
>particularly when the server is in trouble, if the bug reporter has the
>time.
>
>-- 
>Mel Gorman
>SUSE Labs
>
>


