* NFS Still broken in 2.6.x?
@ 2006-02-23 20:35 Bryan Fink
  2006-02-23 22:47 ` Trond Myklebust
  0 siblings, 1 reply; 9+ messages in thread
From: Bryan Fink @ 2006-02-23 20:35 UTC
  To: linux-kernel

Hi All.  I'm running into a bit of trouble with NFS on 2.6.  I see that
at least Trond thought, mid-January, that "The readahead algorithm has
been broken in 2.6.x for at least the past 6 months." (
http://www.ussg.iu.edu/hypermail/linux/kernel/0601.2/0559.html) Anyone
know if that has been fixed?

Basically, the problem I'm having is that downloads from an NFS server
using kernel 2.6 are no more than half as fast as the same from a
server using kernel 2.4.  Write speed (uploading) seems to be about the
same, but reading is slow.

I'm using tcp as my protocol, at the suggestion of many posts, but
flipping over to udp doesn't seem to make any difference.  I'm using
version 3, although I did switch to 2 just to check (it's no better,
usually slower).  My read size is 32768 and my write size is 8192.
Decreasing the read size only slows down the transfers.  Increasing
write size has no effect.

As for hardware, both machines are dual AMD Opterons, 100Mbps ethernet,
and the NFS is serving space on a RAID array.  The 2.4 (2.4.21 to be
exact) kernel is running under SuSE 9.0, and the 2.6 (2.6.15 to be
exact) kernel is running under SuSE 10.0.  I saw the same speed drop
when attempting to upgrade to SuSE 9.3.  I stayed with 9.0 in hopes
that the problem would be fixed in the future.

Anyone have any ideas?

-Bryan




* Re: NFS Still broken in 2.6.x?
  2006-02-23 20:35 NFS Still broken in 2.6.x? Bryan Fink
@ 2006-02-23 22:47 ` Trond Myklebust
  2006-02-24 12:14   ` Andrew Morton
  0 siblings, 1 reply; 9+ messages in thread
From: Trond Myklebust @ 2006-02-23 22:47 UTC
  To: Bryan Fink; +Cc: linux-kernel

On Thu, 2006-02-23 at 15:35 -0500, Bryan Fink wrote:
> Hi All.  I'm running into a bit of trouble with NFS on 2.6.  I see that
> at least Trond thought, mid-January, that "The readahead algorithm has
> been broken in 2.6.x for at least the past 6 months." (
> http://www.ussg.iu.edu/hypermail/linux/kernel/0601.2/0559.html) Anyone
> know if that has been fixed?

No, it hasn't been fixed. ...and no, this is not a problem that only
affects NFS: it just happens to give a more noticeable performance
impact due to the larger latency of NFS over a 100Mbps link.

I will get round to this, but the general opacity of the current
readahead code has been a bit of a put-off in the face of other NFS
problems.

Cheers,
  Trond



* Re: NFS Still broken in 2.6.x?
  2006-02-23 22:47 ` Trond Myklebust
@ 2006-02-24 12:14   ` Andrew Morton
  2006-02-24 13:36     ` Trond Myklebust
  2006-02-24 16:18     ` Bryan Fink
  0 siblings, 2 replies; 9+ messages in thread
From: Andrew Morton @ 2006-02-24 12:14 UTC
  To: Trond Myklebust; +Cc: bfink, linux-kernel

Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
>
> On Thu, 2006-02-23 at 15:35 -0500, Bryan Fink wrote:
>  > Hi All.  I'm running into a bit of trouble with NFS on 2.6.  I see that
>  > at least Trond thought, mid-January, that "The readahead algorithm has
>  > been broken in 2.6.x for at least the past 6 months." (
>  > http://www.ussg.iu.edu/hypermail/linux/kernel/0601.2/0559.html) Anyone
>  > know if that has been fixed?
> 
>  No it hasn't been fixed. ...and no, this is not a problem that only
>  affects NFS: it just happens to give a more noticeable performance
>  impact due to the larger latency of NFS over a 100Mbps link.

IIRC, the last time we went round this loop, Ram and I were unable to reproduce it.

Does anyone have a testcase?


* Re: NFS Still broken in 2.6.x?
  2006-02-24 12:14   ` Andrew Morton
@ 2006-02-24 13:36     ` Trond Myklebust
  2006-02-24 14:22       ` Bryan Fink
  2006-02-24 16:18     ` Bryan Fink
  1 sibling, 1 reply; 9+ messages in thread
From: Trond Myklebust @ 2006-02-24 13:36 UTC
  To: Andrew Morton; +Cc: bfink, linux-kernel

On Fri, 2006-02-24 at 04:14 -0800, Andrew Morton wrote:
> Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> >
> > On Thu, 2006-02-23 at 15:35 -0500, Bryan Fink wrote:
> >  > Hi All.  I'm running into a bit of trouble with NFS on 2.6.  I see that
> >  > at least Trond thought, mid-January, that "The readahead algorithm has
> >  > been broken in 2.6.x for at least the past 6 months." (
> >  > http://www.ussg.iu.edu/hypermail/linux/kernel/0601.2/0559.html) Anyone
> >  > know if that has been fixed?
> > 
> >  No it hasn't been fixed. ...and no, this is not a problem that only
> >  affects NFS: it just happens to give a more noticeable performance
> >  impact due to the larger latency of NFS over a 100Mbps link.
> 
> iirc, last time we went round this loop Ram and I were unable to reproduce it.
> 
> Does anyone have a testcase?

Yes, a dead simple one:

Run iozone in sequential read mode over a TCP mount with rsize=32k.

Monitor the traffic using tcpdump. Pretty soon you will see the size of
the NFS read requests drop from 32k to 4k, which indicates that there is
no readahead at all.

Cheers,
  Trond



* Re: NFS Still broken in 2.6.x?
  2006-02-24 13:36     ` Trond Myklebust
@ 2006-02-24 14:22       ` Bryan Fink
  0 siblings, 0 replies; 9+ messages in thread
From: Bryan Fink @ 2006-02-24 14:22 UTC
  To: Trond Myklebust; +Cc: Andrew Morton, linux-kernel

Trond Myklebust wrote:

>On Fri, 2006-02-24 at 04:14 -0800, Andrew Morton wrote:
>
>>Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
>>
>>>On Thu, 2006-02-23 at 15:35 -0500, Bryan Fink wrote:
>>> > Hi All.  I'm running into a bit of trouble with NFS on 2.6.  I see that
>>> > at least Trond thought, mid-January, that "The readahead algorithm has
>>> > been broken in 2.6.x for at least the past 6 months." (
>>> > http://www.ussg.iu.edu/hypermail/linux/kernel/0601.2/0559.html) Anyone
>>> > know if that has been fixed?
>>>
>>> No it hasn't been fixed. ...and no, this is not a problem that only
>>> affects NFS: it just happens to give a more noticeable performance
>>> impact due to the larger latency of NFS over a 100Mbps link.
>>
>>iirc, last time we went round this loop Ram and I were unable to reproduce it.
>>
>>Does anyone have a testcase?
>
>Yes. A dead simple one
>
>run iozone in sequential read mode on a tcp link w/ rsize == 32k

I'm sure Trond's testcase is much more useful, but for reference, I 
thought I'd add that I've been doing my testing with a simple "dd 
if=/nfsmount/file of=/dev/null bs=32k".  /nfsmount/file is usually 2.5-3 
GB, which makes the difference between NFS servers long enough that I 
feel safe throwing a "time" in front of the whole command.  That is, the 
difference is nowhere near millisecond resolution (it's nearer a 
minute), so I like to start the test and then walk away to do other things.

Interesting that it's not an NFS-only bug.  I assumed it was, because when I 
logged into each server and ran "dd if=file of=/dev/null bs=32k" locally, 
both servers gave roughly the same speed.  Sorry I left this bit out of my 
first email.  I assume this example only illustrates how opaque the code 
around this problem truly is.

Thanks very much for the help.

-Bryan



* Re: NFS Still broken in 2.6.x?
  2006-02-24 12:14   ` Andrew Morton
  2006-02-24 13:36     ` Trond Myklebust
@ 2006-02-24 16:18     ` Bryan Fink
  2006-02-24 22:32       ` Grant Coady
  1 sibling, 1 reply; 9+ messages in thread
From: Bryan Fink @ 2006-02-24 16:18 UTC
  To: Andrew Morton; +Cc: Trond Myklebust, linux-kernel

Andrew Morton wrote:

>Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
>
>>On Thu, 2006-02-23 at 15:35 -0500, Bryan Fink wrote:
>> > Hi All.  I'm running into a bit of trouble with NFS on 2.6.  I see that
>> > at least Trond thought, mid-January, that "The readahead algorithm has
>> > been broken in 2.6.x for at least the past 6 months." (
>> > http://www.ussg.iu.edu/hypermail/linux/kernel/0601.2/0559.html) Anyone
>> > know if that has been fixed?
>>
>> No it hasn't been fixed. ...and no, this is not a problem that only
>> affects NFS: it just happens to give a more noticeable performance
>> impact due to the larger latency of NFS over a 100Mbps link.
>
>iirc, last time we went round this loop Ram and I were unable to reproduce it.
>
>Does anyone have a testcase?

Hi again. I just found some new, very interesting information. Until 
just a few minutes ago, I hadn't realized that one could change the I/O 
scheduler at runtime. Looking into it, my system was using "cfq", and I 
have three other options, "noop", "anticipatory", and "deadline". I've 
now run tests using all three of the other schedulers, and they all 
bring performance back up to the level I had with kernel 2.4. So, either 
NFS is incompatible with cfq, or cfq has some issues that show very 
vividly when used with NFS (or, I suppose, I just have my system tuned 
wrong for use with cfq).

Hope this helps the bug hunt. Special thanks to Asfand Yar Qazi for 
writing to the list this morning asking how to change schedulers at 
runtime 
(http://www.ussg.iu.edu/hypermail/linux/kernel/0602.3/0135.html). Off to 
find out exactly what the best scheduler is for my needs.

-Bryan



* Re: NFS Still broken in 2.6.x?
  2006-02-24 16:18     ` Bryan Fink
@ 2006-02-24 22:32       ` Grant Coady
  0 siblings, 0 replies; 9+ messages in thread
From: Grant Coady @ 2006-02-24 22:32 UTC
  To: Bryan Fink; +Cc: Andrew Morton, Trond Myklebust, linux-kernel

On Fri, 24 Feb 2006 11:18:44 -0500, Bryan Fink <bfink@eventmonitor.com> wrote:

>Hi again. I just found some new, very interesting information. Until 
>just a few minutes ago, I hadn't realized that one could change the I/O 
>scheduler at runtime. Looking into it, my system was using "cfq", and I 
>have three other options, "noop", "anticipatory", and "deadline". I've 
>now run tests using all three of the other schedulers, and they all 
>bring performance back up to the level I had with kernel 2.4. So, either 
>NFS is incompatible with cfq, or cfq has some issues that show very 
>vividly when used with NFS (or, I suppose, I just have my system tuned 
>wrong for use with cfq).

I've run NFS for ages: all the Linux boxen here mount a shared export from 
the local network's controller box to get source + patches.

I only have 'deadline' installed on 2.6 kernels, and haven't seen any problems 
with NFS here (apart from back when I had data corruption due to a faulty 
memory stick).

Grant.


* Re: NFS Still broken in 2.6.x?
  2006-02-24 15:22 Oleg Nesterov
@ 2006-02-24 16:12 ` Oleg Nesterov
  0 siblings, 0 replies; 9+ messages in thread
From: Oleg Nesterov @ 2006-02-24 16:12 UTC
  To: Andrew Morton, Bryan Fink, linux-kernel, Ram Pai, Steven Pratt,
	Trond Myklebust

Oleg Nesterov wrote:
> 
> AFAICS, this problem was resolved long ago.
> 
> The patch below should fix this problem. Does it?

Forgot to mention: this patch was tested.

Steven Pratt wrote:
>
> This is the patch I think we should apply.  Running tiobench with 4k 
> request size, 4GB working set, 256 threads and a 2MB max_readahead (to 
> help induce thrashing) on a 1GB 8way machine, throughput of sequential 
> IO increased from 50MB/sec to 92MB/sec on a 5disk raid0 array.  Tests 
> with smaller max_readaheads and smaller thread counts were all within 
> the noise range of the benchmark, which is to be expected.

Oleg.


* Re: NFS Still broken in 2.6.x?
@ 2006-02-24 15:22 Oleg Nesterov
  2006-02-24 16:12 ` Oleg Nesterov
  0 siblings, 1 reply; 9+ messages in thread
From: Oleg Nesterov @ 2006-02-24 15:22 UTC
  To: Andrew Morton, Bryan Fink
  Cc: linux-kernel, Ram Pai, Steven Pratt, Trond Myklebust

Andrew Morton wrote:
>
> Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> >
> > On Thu, 2006-02-23 at 15:35 -0500, Bryan Fink wrote:
> >  > Hi All.  I'm running into a bit of trouble with NFS on 2.6.  I see that
> >  > at least Trond thought, mid-January, that "The readahead algorithm has
> >  > been broken in 2.6.x for at least the past 6 months." (
> >  > http://www.ussg.iu.edu/hypermail/linux/kernel/0601.2/0559.html) Anyone
> >  > know if that has been fixed?
> > 
> >  No it hasn't been fixed. ...and no, this is not a problem that only
> >  affects NFS: it just happens to give a more noticeable performance
> >  impact due to the larger latency of NFS over a 100Mbps link.
> 
> iirc, last time we went round this loop Ram and I were unable to reproduce it.
> 
> Does anyone have a testcase?

AFAICS, this problem was resolved long ago.

The patch below should fix this problem. Does it?

Andrew, I'll resend it with a proper changelog and comments on Sunday;
currently I can't even do a compile test. I verified that this patch still
applies cleanly.

------------------------------------------------------------------------------
Message-ID: <42F2433F.CF3B797A@tv-sign.ru>
Date: Thu, 04 Aug 2005 20:33:03 +0400
From: Oleg Nesterov <oleg@tv-sign.ru>
To: Ram Pai <linuxram@us.ibm.com>,
 	Trond Myklebust <Trond.Myklebust@netapp.com>,
 	Linus Torvalds <torvalds@osdl.org>,
 	Steven Pratt <slpratt@austin.ibm.com>, Andrew Morton <akpm@osdl.org>
Subject: Re: Readahead algorithm problems again...
References: <1122690994.8240.21.camel@lade.trondhjem.org> ...

Oleg Nesterov wrote:
> 
> What do you think about this patch?

Ohh... Sorry, I attached the wrong one.

--- 2.6.13-rc4/mm/readahead.c~	Thu Apr  7 12:59:41 2005
+++ 2.6.13-rc4/mm/readahead.c	Thu Aug  4 20:25:14 2005
@@ -57,8 +57,8 @@ static inline void ra_off(struct file_ra
 	ra->start = 0;
 	ra->flags = 0;
 	ra->size = 0;
+	ra->ahead_size += ra->ahead_start;
 	ra->ahead_start = 0;
-	ra->ahead_size = 0;
 	return;
 }
 
@@ -423,8 +423,8 @@ static int make_ahead_window(struct addr
 		 * congestion.  The ahead window will any way be closed
 		 * in case we failed due to excessive page cache hits.
 		 */
+		ra->ahead_size += ra->ahead_start;
 		ra->ahead_start = 0;
-		ra->ahead_size = 0;
 	}
 
 	return ret;
@@ -507,7 +507,7 @@ page_cache_readahead(struct address_spac
 
 	if (ra->ahead_start == 0) {	 /* no ahead window yet */
 		if (!make_ahead_window(mapping, filp, ra, 0))
-			goto out;
+			goto recheck;
 	}
 	/*
 	 * Already have an ahead window, check if we crossed into it.
@@ -520,6 +520,9 @@ page_cache_readahead(struct address_spac
 		ra->start = ra->ahead_start;
 		ra->size = ra->ahead_size;
 		make_ahead_window(mapping, filp, ra, 0);
+recheck:
+		ra->prev_page = min(ra->prev_page,
+			ra->ahead_start + ra->ahead_size - 1);
 	}
 
 out:

------------------------------------------------------------------------------

There is another one, from Steven Pratt:

------------------------------------------------------------------------------
Message-ID: <42FCF481.5050507@austin.ibm.com>
Date: Fri, 12 Aug 2005 14:12:01 -0500
From: Steven Pratt <slpratt@austin.ibm.com>
To: Andrew Morton <akpm@osdl.org>
Cc: Ram Pai <linuxram@us.ibm.com>, oleg@tv-sign.ru,
	Trond.Myklebust@netapp.com, torvalds@osdl.org
Subject: Re: Readahead algorithm problems again...
References: <1122690994.8240.21.camel@lade.trondhjem.org> ...
In-Reply-To: <20050812112141.1868a1af.akpm@osdl.org>

The current get_init_ra_size() is not optimal across different IO 
sizes and max_readahead values.  Here is a quick summary of the sizes 
computed under the current design and under the attached patch.  All of 
these assume the 1st IO is at offset 0, or is the 1st detected sequential IO.

32k max, 4k request
old     new
-----------
 8k      8k
16k     16k
32k     32k

128k max, 4k request
old     new
-----------
 32k     16k
 64k     32k
128k     64k
128k    128k

128k max, 32k request
old     new
-----------
 32k     64k   <-----
 64k    128k
128k    128k

512k max, 4k request
old     new
-----------
  4k     32k   <----
 16k     64k
 64k    128k
128k    256k
512k    512k


Steve

--- linux-2.6.12/mm/readahead.org.c	2005-08-01 08:52:12.000000000 -0500
+++ linux-2.6.12/mm/readahead.c	2005-08-10 10:16:52.000000000 -0500
@@ -72,10 +72,10 @@ static unsigned long get_init_ra_size(un
 {
 	unsigned long newsize = roundup_pow_of_two(size);
 
-	if (newsize <= max / 64)
-		newsize = newsize * newsize;
+	if (newsize <= max / 32)
+		newsize = newsize * 4;
 	else if (newsize <= max / 4)
-		newsize = max / 4;
+		newsize = newsize * 2;
 	else
 		newsize = max;
 	return newsize;

------------------------------------------------------------------------------

Oleg.


Thread overview: 9+ messages
2006-02-23 20:35 NFS Still broken in 2.6.x? Bryan Fink
2006-02-23 22:47 ` Trond Myklebust
2006-02-24 12:14   ` Andrew Morton
2006-02-24 13:36     ` Trond Myklebust
2006-02-24 14:22       ` Bryan Fink
2006-02-24 16:18     ` Bryan Fink
2006-02-24 22:32       ` Grant Coady
2006-02-24 15:22 Oleg Nesterov
2006-02-24 16:12 ` Oleg Nesterov
