* NFS Still broken in 2.6.x?
@ 2006-02-23 20:35 Bryan Fink
2006-02-23 22:47 ` Trond Myklebust
0 siblings, 1 reply; 9+ messages in thread
From: Bryan Fink @ 2006-02-23 20:35 UTC (permalink / raw)
To: linux-kernel
Hi All. I'm running into a bit of trouble with NFS on 2.6. I see that
at least Trond thought, mid-January, that "The readahead algorithm has
been broken in 2.6.x for at least the past 6 months." (
http://www.ussg.iu.edu/hypermail/linux/kernel/0601.2/0559.html) Anyone
know if that has been fixed?
Basically, the problem I'm having is that downloads from an NFS server
using kernel 2.6 are no more than half as fast as the same from a
server using kernel 2.4. Write speed (uploading) seems to be about the
same, but reading is slow.
I'm using tcp as my protocol, at the suggestion of many posts, but
flipping over to udp doesn't seem to make any difference. I'm using
version 3, although I did switch to 2 just to check (it's no better,
usually slower). My read size is 32768 and my write size is 8192.
Decreasing the read size only slows down the transfers. Increasing
write size has no effect.
As for hardware, both machines are dual AMD Opterons, 100Mbps ethernet,
and the NFS is serving space on a RAID array. The 2.4 (2.4.21 to be
exact) kernel is running under SuSE 9.0, and the 2.6 (2.6.15 to be
exact) kernel is running under SuSE 10.0. I saw the same speed drop
when attempting to upgrade to SuSE 9.3. I stayed with 9.0 in hopes
that the problem would be fixed in the future.
Anyone have any ideas?
-Bryan
* Re: NFS Still broken in 2.6.x?
2006-02-23 20:35 NFS Still broken in 2.6.x? Bryan Fink
@ 2006-02-23 22:47 ` Trond Myklebust
2006-02-24 12:14 ` Andrew Morton
0 siblings, 1 reply; 9+ messages in thread
From: Trond Myklebust @ 2006-02-23 22:47 UTC (permalink / raw)
To: Bryan Fink; +Cc: linux-kernel
On Thu, 2006-02-23 at 15:35 -0500, Bryan Fink wrote:
> Hi All. I'm running into a bit of trouble with NFS on 2.6. I see that
> at least Trond thought, mid-January, that "The readahead algorithm has
> been broken in 2.6.x for at least the past 6 months." (
> http://www.ussg.iu.edu/hypermail/linux/kernel/0601.2/0559.html) Anyone
> know if that has been fixed?
No it hasn't been fixed. ...and no, this is not a problem that only
affects NFS: it just happens to give a more noticeable performance
impact due to the larger latency of NFS over a 100Mbps link.
I will get round to this, but the general opacity of the current
readahead code has been a bit of a put-off in the face of other NFS
problems.
Cheers,
Trond
* Re: NFS Still broken in 2.6.x?
2006-02-23 22:47 ` Trond Myklebust
@ 2006-02-24 12:14 ` Andrew Morton
2006-02-24 13:36 ` Trond Myklebust
2006-02-24 16:18 ` Bryan Fink
0 siblings, 2 replies; 9+ messages in thread
From: Andrew Morton @ 2006-02-24 12:14 UTC (permalink / raw)
To: Trond Myklebust; +Cc: bfink, linux-kernel
Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
>
> On Thu, 2006-02-23 at 15:35 -0500, Bryan Fink wrote:
> > Hi All. I'm running into a bit of trouble with NFS on 2.6. I see that
> > at least Trond thought, mid-January, that "The readahead algorithm has
> > been broken in 2.6.x for at least the past 6 months." (
> > http://www.ussg.iu.edu/hypermail/linux/kernel/0601.2/0559.html) Anyone
> > know if that has been fixed?
>
> No it hasn't been fixed. ...and no, this is not a problem that only
> affects NFS: it just happens to give a more noticeable performance
> impact due to the larger latency of NFS over a 100Mbps link.
iirc, last time we went round this loop Ram and I were unable to reproduce it.
Does anyone have a testcase?
* Re: NFS Still broken in 2.6.x?
2006-02-24 12:14 ` Andrew Morton
@ 2006-02-24 13:36 ` Trond Myklebust
2006-02-24 14:22 ` Bryan Fink
2006-02-24 16:18 ` Bryan Fink
1 sibling, 1 reply; 9+ messages in thread
From: Trond Myklebust @ 2006-02-24 13:36 UTC (permalink / raw)
To: Andrew Morton; +Cc: bfink, linux-kernel
On Fri, 2006-02-24 at 04:14 -0800, Andrew Morton wrote:
> Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> >
> > On Thu, 2006-02-23 at 15:35 -0500, Bryan Fink wrote:
> > > Hi All. I'm running into a bit of trouble with NFS on 2.6. I see that
> > > at least Trond thought, mid-January, that "The readahead algorithm has
> > > been broken in 2.6.x for at least the past 6 months." (
> > > http://www.ussg.iu.edu/hypermail/linux/kernel/0601.2/0559.html) Anyone
> > > know if that has been fixed?
> >
> > No it hasn't been fixed. ...and no, this is not a problem that only
> > affects NFS: it just happens to give a more noticeable performance
> > impact due to the larger latency of NFS over a 100Mbps link.
>
> iirc, last time we went round this loop Ram and I were unable to reproduce it.
>
> Does anyone have a testcase?
Yes. A dead simple one:
run iozone in sequential read mode on a tcp link w/ rsize == 32k
Monitor the traffic using tcpdump. Pretty soon you will see the size of
the NFS read requests drop from 32k to 4k, which indicates that there is
no readahead at all.
Cheers,
Trond
* Re: NFS Still broken in 2.6.x?
2006-02-24 13:36 ` Trond Myklebust
@ 2006-02-24 14:22 ` Bryan Fink
0 siblings, 0 replies; 9+ messages in thread
From: Bryan Fink @ 2006-02-24 14:22 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Andrew Morton, linux-kernel
Trond Myklebust wrote:
>On Fri, 2006-02-24 at 04:14 -0800, Andrew Morton wrote:
>
>
>>Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
>>
>>
>>>On Thu, 2006-02-23 at 15:35 -0500, Bryan Fink wrote:
>>> > Hi All. I'm running into a bit of trouble with NFS on 2.6. I see that
>>> > at least Trond thought, mid-January, that "The readahead algorithm has
>>> > been broken in 2.6.x for at least the past 6 months." (
>>> > http://www.ussg.iu.edu/hypermail/linux/kernel/0601.2/0559.html) Anyone
>>> > know if that has been fixed?
>>>
>>> No it hasn't been fixed. ...and no, this is not a problem that only
>>> affects NFS: it just happens to give a more noticeable performance
>>> impact due to the larger latency of NFS over a 100Mbps link.
>>>
>>>
>>iirc, last time we went round this loop Ram and I were unable to reproduce it.
>>
>>Does anyone have a testcase?
>>
>>
>
>Yes. A dead simple one
>
>run iozone in sequential read mode on a tcp link w/ rsize == 32k
>
>
I'm sure Trond's testcase is much more useful, but for reference, I
thought I'd add that I've been doing my testing with a simple "dd
if=/nfsmount/file of=/dev/null bs=32k". /nfsmount/file is usually 2.5-3
GB, which makes the difference between NFS servers long enough that I
feel safe throwing a "time" in front of the whole command. That is, the
difference is nowhere near millisecond resolution (it's nearer a
minute), so I like to start the test and then walk away to do other things.
Interesting that it's not an NFS-only bug. I assumed it was when I
logged into each server so I could run "dd if=file of=/dev/null bs=32k"
locally. When I did that, both servers gave roughly the same speed.
Sorry I left this bit out of my first email. I assume this example only
illustrates how opaque the code around this problem truly is.
Thanks very much for the help.
-Bryan
* Re: NFS Still broken in 2.6.x?
2006-02-24 12:14 ` Andrew Morton
2006-02-24 13:36 ` Trond Myklebust
@ 2006-02-24 16:18 ` Bryan Fink
2006-02-24 22:32 ` Grant Coady
1 sibling, 1 reply; 9+ messages in thread
From: Bryan Fink @ 2006-02-24 16:18 UTC (permalink / raw)
To: Andrew Morton; +Cc: Trond Myklebust, linux-kernel
Andrew Morton wrote:
>Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
>
>
>>On Thu, 2006-02-23 at 15:35 -0500, Bryan Fink wrote:
>> > Hi All. I'm running into a bit of trouble with NFS on 2.6. I see that
>> > at least Trond thought, mid-January, that "The readahead algorithm has
>> > been broken in 2.6.x for at least the past 6 months." (
>> > http://www.ussg.iu.edu/hypermail/linux/kernel/0601.2/0559.html) Anyone
>> > know if that has been fixed?
>>
>> No it hasn't been fixed. ...and no, this is not a problem that only
>> affects NFS: it just happens to give a more noticeable performance
>> impact due to the larger latency of NFS over a 100Mbps link.
>>
>>
>
>iirc, last time we went round this loop Ram and I were unable to reproduce it.
>
>Does anyone have a testcase?
>
>
Hi again. I just found some new, very interesting information. Until
just a few minutes ago, I hadn't realized that one could change the I/O
scheduler at runtime. Looking into it, my system was using "cfq", and I
have three other options, "noop", "anticipatory", and "deadline". I've
now run tests using all three of the other schedulers, and they all
bring performance back up to the level I had with kernel 2.4. So, either
NFS is incompatible with cfq, or cfq has some issues that show very
vividly when used with NFS (or, I suppose, I just have my system tuned
wrong for use with cfq).
Hope this helps the bug hunt. Special thanks to Asfand Yar Qazi for
writing to the list this morning asking how to change schedulers at
runtime
(http://www.ussg.iu.edu/hypermail/linux/kernel/0602.3/0135.html). Off to
find out exactly what the best scheduler is for my needs.
-Bryan
* Re: NFS Still broken in 2.6.x?
2006-02-24 16:18 ` Bryan Fink
@ 2006-02-24 22:32 ` Grant Coady
0 siblings, 0 replies; 9+ messages in thread
From: Grant Coady @ 2006-02-24 22:32 UTC (permalink / raw)
To: Bryan Fink; +Cc: Andrew Morton, Trond Myklebust, linux-kernel
On Fri, 24 Feb 2006 11:18:44 -0500, Bryan Fink <bfink@eventmonitor.com> wrote:
>Hi again. I just found some new, very interesting information. Until
>just a few minutes ago, I hadn't realized that one could change the I/O
>scheduler at runtime. Looking into it, my system was using "cfq", and I
>have three other options, "noop", "anticipatory", and "deadline". I've
>now run tests using all three of the other schedulers, and they all
>bring performance back up to the level I had with kernel 2.4. So, either
>NFS is incompatible with cfq, or cfq has some issues that show very
>vividly when used with NFS (or, I suppose, I just have my system tuned
>wrong for use with cfq).
I have run NFS for ages -- all linux boxen here mount a shared export from
the localnet controller box to get source + patches.
Only have 'deadline' installed on 2.6 kernels -- not seen any problems
with NFS here (apart from back when I had data corruption due to a faulty
memory stick).
Grant.
* Re: NFS Still broken in 2.6.x?
2006-02-24 15:22 Oleg Nesterov
@ 2006-02-24 16:12 ` Oleg Nesterov
0 siblings, 0 replies; 9+ messages in thread
From: Oleg Nesterov @ 2006-02-24 16:12 UTC (permalink / raw)
To: Andrew Morton, Bryan Fink, linux-kernel, Ram Pai, Steven Pratt,
Trond Myklebust
Oleg Nesterov wrote:
>
> Afaics, this problem was resolved a long time ago.
>
> The patch below should fix this problem. Does it?
Forgot to mention, this patch was tested:
Steven Pratt wrote:
>
> This is the patch I think we should apply. Running tiobench with 4k
> request size, 4GB working set, 256 threads and a 2MB max_readahead (to
> help induce thrashing) on a 1GB 8way machine, throughput of sequential
> IO increased from 50MB/sec to 92MB/sec on a 5disk raid0 array. Tests
> with smaller max_readaheads and smaller thread counts were all within
> the noise range of the benchmark, which is to be expected.
Oleg.
* Re: NFS Still broken in 2.6.x?
@ 2006-02-24 15:22 Oleg Nesterov
2006-02-24 16:12 ` Oleg Nesterov
0 siblings, 1 reply; 9+ messages in thread
From: Oleg Nesterov @ 2006-02-24 15:22 UTC (permalink / raw)
To: Andrew Morton, Bryan Fink
Cc: linux-kernel, Ram Pai, Steven Pratt, Trond Myklebust
Andrew Morton wrote:
>
> Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> >
> > On Thu, 2006-02-23 at 15:35 -0500, Bryan Fink wrote:
> > > Hi All. I'm running into a bit of trouble with NFS on 2.6. I see that
> > > at least Trond thought, mid-January, that "The readahead algorithm has
> > > been broken in 2.6.x for at least the past 6 months." (
> > > http://www.ussg.iu.edu/hypermail/linux/kernel/0601.2/0559.html) Anyone
> > > know if that has been fixed?
> >
> > No it hasn't been fixed. ...and no, this is not a problem that only
> > affects NFS: it just happens to give a more noticeable performance
> > impact due to the larger latency of NFS over a 100Mbps link.
>
> iirc, last time we went round this loop Ram and I were unable to reproduce it.
>
> Does anyone have a testcase?
Afaics, this problem was resolved a long time ago.
The patch below should fix this problem. Does it?
Andrew, I'll resend it with a proper changelog and comments on Sunday;
currently I can't even do a compile test. I verified that this patch still
applies cleanly.
------------------------------------------------------------------------------
>From - Thu Aug 4 20:33:03 2005
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000
Message-ID: <42F2433F.CF3B797A@tv-sign.ru>
Date: Thu, 04 Aug 2005 20:33:03 +0400
From: Oleg Nesterov <oleg@tv-sign.ru>
X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.2.20 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: Ram Pai <linuxram@us.ibm.com>,
Trond Myklebust <Trond.Myklebust@netapp.com>,
Linus Torvalds <torvalds@osdl.org>,
Steven Pratt <slpratt@austin.ibm.com>, Andrew Morton <akpm@osdl.org>
Subject: Re: Readahead algorithm problems again...
References: <1122690994.8240.21.camel@lade.trondhjem.org> ...
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Content-Length: 1255
Lines: 47
Oleg Nesterov wrote:
>
> What do you think about this patch?
Ohh... Sorry, I attached the wrong one.
--- 2.6.13-rc4/mm/readahead.c~ Thu Apr 7 12:59:41 2005
+++ 2.6.13-rc4/mm/readahead.c Thu Aug 4 20:25:14 2005
@@ -57,8 +57,8 @@ static inline void ra_off(struct file_ra
ra->start = 0;
ra->flags = 0;
ra->size = 0;
+ ra->ahead_size += ra->ahead_start;
ra->ahead_start = 0;
- ra->ahead_size = 0;
return;
}
@@ -423,8 +423,8 @@ static int make_ahead_window(struct addr
* congestion. The ahead window will any way be closed
* in case we failed due to excessive page cache hits.
*/
+ ra->ahead_size += ra->ahead_start;
ra->ahead_start = 0;
- ra->ahead_size = 0;
}
return ret;
@@ -507,7 +507,7 @@ page_cache_readahead(struct address_spac
if (ra->ahead_start == 0) { /* no ahead window yet */
if (!make_ahead_window(mapping, filp, ra, 0))
- goto out;
+ goto recheck;
}
/*
* Already have an ahead window, check if we crossed into it.
@@ -520,6 +520,9 @@ page_cache_readahead(struct address_spac
ra->start = ra->ahead_start;
ra->size = ra->ahead_size;
make_ahead_window(mapping, filp, ra, 0);
+recheck:
+ ra->prev_page = min(ra->prev_page,
+ ra->ahead_start + ra->ahead_size - 1);
}
out:
------------------------------------------------------------------------------
There is another one, from Steven Pratt:
------------------------------------------------------------------------------
>From - Sat Aug 13 11:49:43 2005
Message-ID: <42FCF481.5050507@austin.ibm.com>
Date: Fri, 12 Aug 2005 14:12:01 -0500
From: Steven Pratt <slpratt@austin.ibm.com>
User-Agent: Mozilla Thunderbird 1.0.2 (X11/20050317)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Andrew Morton <akpm@osdl.org>
Cc: Ram Pai <linuxram@us.ibm.com>, oleg@tv-sign.ru,
Trond.Myklebust@netapp.com, torvalds@osdl.org
Subject: Re: Readahead algorithm problems again...
References: <1122690994.8240.21.camel@lade.trondhjem.org> ...
In-Reply-To: <20050812112141.1868a1af.akpm@osdl.org>
The current get_init_ra_size is not optimal across different IO
sizes and max_readahead values. Here is a quick summary of sizes
computed under current design and under the attached patch. All of
these assume 1st IO at offset 0, or 1st detected sequential IO.
32k max, 4k request
    old     new
    -----------------
    8k      8k
    16k     16k
    32k     32k
128k max, 4k request
    old     new
    -----------------
    32k     16k
    64k     32k
    128k    64k
    128k    128k
128k max, 32k request
    old     new
    -----------------
    32k     64k    <-----
    64k     128k
    128k    128k
512k max, 4k request
    old     new
    -----------------
    4k      32k    <----
    16k     64k
    64k     128k
    128k    256k
    512k    512k
Steve
--- linux-2.6.12/mm/readahead.org.c 2005-08-01 08:52:12.000000000 -0500
+++ linux-2.6.12/mm/readahead.c 2005-08-10 10:16:52.000000000 -0500
@@ -72,10 +72,10 @@ static unsigned long get_init_ra_size(un
{
unsigned long newsize = roundup_pow_of_two(size);
- if (newsize <= max / 64)
- newsize = newsize * newsize;
+ if (newsize <= max / 32)
+ newsize = newsize * 4;
else if (newsize <= max / 4)
- newsize = max / 4;
+ newsize = newsize * 2;
else
newsize = max;
return newsize;
------------------------------------------------------------------------------
Oleg.