* Horrible NFS Client Performance During Heavy Server IO
From: David Rees @ 2009-03-13 20:36 UTC
  To: linux-nfs

I've been trying to troubleshoot and tune around a problem I have been
experiencing where, if the NFS server is under heavy write load (the
disks are saturated), client-side NFS performance drops to nearly zero.
As soon as the load is lifted and there is no longer significant IO wait
time on the server, the clients become responsive again.

Server setup:
Fedora 9 kernel 2.6.27.15-78.2.23.fc9.x86_64
NFS tuning - none
Disk system - 230GB SATA RAID1 array on a basic AACRAID adapter
Dual XEONs, 8GB RAM
Network GigE

Client setup:
Fedora 10 kernel 2.6.27.19-170.2.35.fc10.x86_64
NFS tuning - none
Network GigE

Things I have tried:

I have tried switching the disk scheduler from cfq to deadline - no difference.
I have tried adjusting the rsize/wsize mount options on the client - no difference.
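
For reference, changes of this sort are typically done along these
lines (sda, the server name and the client mount point below are
placeholders, not my real values):

# show the available schedulers; the active one is in brackets
cat /sys/block/sda/queue/scheduler
# switch the array's device to deadline at runtime
echo deadline > /sys/block/sda/queue/scheduler

# rsize/wsize are easiest to change with a clean unmount/mount
umount /mnt/export
mount -o rsize=32768,wsize=32768 server:/opt/export /mnt/export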

Steps to reproduce:

1. Write a big file to the same partition that is exported on the server:
dd if=/dev/zero of=/opt/export/bigfile bs=1M count=5000 conv=fdatasync
2. Write a small file to the same partition from the client:
dd if=/dev/zero of=/opt/export/smallfile bs=16k count=8 conv=fdatasync

I am seeing slightly less than 2kBps (yes, 1000-2000 bytes per second)
from the client while this is happening.

It really looks like the nfs daemons on the server just get no
priority when compared to the local process.

Any ideas?  This is something I have noticed for quite some time.
Only thing I can think of trying is to upgrade the disk array so that
the disks are no longer a bottleneck.

-Dave

* Re: Horrible NFS Client Performance During Heavy Server IO
From: Trond Myklebust @ 2009-03-13 21:10 UTC
  To: David Rees; +Cc: linux-nfs

On Fri, 2009-03-13 at 13:36 -0700, David Rees wrote:
> I've been trying to troubleshoot/tune around a problem I have been
> experiencing where if the NFS server is under heavy write load (the
> disks are saturated), client side NFS performance drops to nearly 0.
> As soon as the load is lifted so that there is no significant IO wait
> time on the server, the clients start acting responsively again.
> 
> Server setup:
> Fedora 9 kernel 2.6.27.15-78.2.23.fc9.x86_64
> NFS tuning - none
> Disk system - 230GB SATA RAID1 array on a basic AACRAID adapter
> Dual XEONs, 8GB RAM
> Network GigE
> 
> Client setup:
> Fedora 10 kernel 2.6.27.19-170.2.35.fc10.x86_64
> NFS tuning - none
> Network GigE
> 
> Things I have tried:
> 
> I have tried playing with the disk scheduler, switching to deadline
> from cfq = no difference.
> I have tried playing with rsize/wsize settings on the client = no difference.
> 
> Steps to reproduce:
> 
> 1. Write a big file to the same partition that is exported on the server:
> dd if=/dev/zero of=/opt/export/bigfile bs=1M count=5000 conv=fdatasync
> 2. Write a small file to the same partition from the client:
> dd if=/dev/zero of=/opt/export/smallfile bs=16k count=8 conv=fdatasync

You don't need conv=fdatasync when writing to NFS. The close-to-open
cache consistency automatically guarantees fdatasync on close().

> I am seeing slightly less than 2kBps (yes, 1000-2000 bytes per second)
> from the client while this is happening.

UDP transport, or TCP? If the former, then definitely switch to the
latter, since you're probably pounding the server with unnecessary RPC
retries while it is busy with the I/O.

For the same reason, also make sure that -otimeo=600 (the default for
TCP).
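
A quick way to double-check both on the client (the exact output format
varies a little between nfs-utils versions):

nfsstat -m
# or
grep ' nfs ' /proc/mounts

and look for proto=tcp and timeo=600 among the mount options.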

Trond


* Re: Horrible NFS Client Performance During Heavy Server IO
From: David Rees @ 2009-03-13 21:32 UTC
  To: Trond Myklebust; +Cc: linux-nfs

On Fri, Mar 13, 2009 at 2:10 PM, Trond Myklebust
<trond.myklebust@fys.uio.no> wrote:
> On Fri, 2009-03-13 at 13:36 -0700, David Rees wrote:
>> Steps to reproduce:
>>
>> 1. Write a big file to the same partition that is exported on the server:
>> dd if=/dev/zero of=/opt/export/bigfile bs=1M count=5000 conv=fdatasync
>> 2. Write a small file to the same partition from the client:
>> dd if=/dev/zero of=/opt/export/smallfile bs=16k count=8 conv=fdatasync
>
> You don't need conv=fdatasync when writing to NFS. The close-to-open
> cache consistency automatically guarantees fdatasync on close().

Yeah, I didn't think I did, but I added it for good measure since I
needed it to simulate load directly on the server.  It doesn't seem to
affect the numbers significantly either way.

>> I am seeing slightly less than 2kBps (yes, 1000-2000 bytes per second)
>> from the client while this is happening.
>
> UDP transport, or TCP? If the former, then definitely switch to the
> latter, since you're probably pounding the server with unnecessary RPC
> retries while it is busy with the I/O.
>
> For the same reason, also make sure that -otimeo=600 (the default for
> TCP).

Yes, both set that way by default.  From /proc/mounts on the client:

rw,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,nointr,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.2.1.13,mountvers=3,mountproto=tcp,addr=10.2.1.13

-Dave

* Re: Horrible NFS Client Performance During Heavy Server IO
From: Trond Myklebust @ 2009-03-13 21:40 UTC
  To: David Rees; +Cc: linux-nfs

On Fri, 2009-03-13 at 14:32 -0700, David Rees wrote:
> On Fri, Mar 13, 2009 at 2:10 PM, Trond Myklebust
> <trond.myklebust@fys.uio.no> wrote:
> > On Fri, 2009-03-13 at 13:36 -0700, David Rees wrote:
> >> Steps to reproduce:
> >>
> >> 1. Write a big file to the same partition that is exported on the server:
> >> dd if=/dev/zero of=/opt/export/bigfile bs=1M count=5000 conv=fdatasync
> >> 2. Write a small file to the same partition from the client:
> >> dd if=/dev/zero of=/opt/export/smallfile bs=16k count=8 conv=fdatasync
> >
> > You don't need conv=fdatasync when writing to NFS. The close-to-open
> > cache consistency automatically guarantees fdatasync on close().
> 
> Yeah, I didn't think I did, but I added it for good measure since I
> needed it to simulate load directly on the server.  It doesn't seem to
> affect the numbers significantly either way.
> 
> >> I am seeing slightly less than 2kBps (yes, 1000-2000 bytes per second)
> >> from the client while this is happening.
> >
> > UDP transport, or TCP? If the former, then definitely switch to the
> > latter, since you're probably pounding the server with unnecessary RPC
> > retries while it is busy with the I/O.
> >
> > For the same reason, also make sure that -otimeo=600 (the default for
> > TCP).
> 
> Yes, both set that way by default.  From /proc/mounts on the client:
> 
> rw,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,nointr,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.2.1.13,mountvers=3,mountproto=tcp,addr=10.2.1.13

That all looks correct.

Just a few more random thoughts:

Do you perhaps see a lot of sockets connected to port 2049 that are
not in the TCP_ESTABLISHED state when you do 'netstat -nt' on the
server?
What is the ratio of 'write' to 'commit' operations when you look at the
'nfsstat -s' output (also on the server)?
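
Something along these lines should do it - the grep filter is just a
convenience:

netstat -nt | grep ':2049'    # sockets stuck outside ESTABLISHED are the suspects
nfsstat -s -3                 # compare the write and commit counters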

Trond


* Re: Horrible NFS Client Performance During Heavy Server IO
From: David Rees @ 2009-03-13 21:59 UTC
  To: Trond Myklebust; +Cc: linux-nfs

On Fri, Mar 13, 2009 at 2:40 PM, Trond Myklebust
<trond.myklebust@fys.uio.no> wrote:
> Do you perhaps see a lot of sockets connected to port 2049 and that are
> not in the TCP_ESTABLISHED state when you do 'netstat -nt' on the
> server?

Only one TCP socket from the client to the server (no other clients
need to be active when running the test case).

> What is the ratio of 'write' to 'commit' operations when you look at the
> 'nfsstat -s' output (also on the server)?

It appears to be about 3-4 to 1 writes to commits.  By far the
heaviest activity is GETATTR calls, at 60-75% of all operations.

# nfsstat -s
Server rpc stats:
calls      badcalls   badauth    badclnt    xdrcall
21044536   0          0          0          0

Server nfs v3:
null         getattr      setattr      lookup       access       readlink
30        0% 12977249 61% 267095    1% 1201529   5% 3030825  14% 4816      0%
read         write        create       mkdir        symlink      mknod
722160    3% 1369511   6% 242484    1% 2852      0% 103       0% 1         0%
remove       rmdir        rename       link         readdir      readdirplus
218097    1% 4354      0% 189125    0% 284       0% 1569      0% 139267    0%
fsstat       fsinfo       pathconf     commit
3930      0% 554       0% 272       0% 610240    2%

And the activity around the time I am reproducing the slowdown:

Server nfs v3:
null         getattr      setattr      lookup       access       readlink
0         0% 3503     75% 7         0% 31        0% 1027     22% 0         0%
read         write        create       mkdir        symlink      mknod
9         0% 50        1% 6         0% 0         0% 0         0% 0         0%
remove       rmdir        rename       link         readdir      readdirplus
2         0% 0         0% 2         0% 0         0% 0         0% 0         0%
fsstat       fsinfo       pathconf     commit
0         0% 0         0% 0         0% 13        0%

-Dave

* Re: Horrible NFS Client Performance During Heavy Server IO
From: Trond Myklebust @ 2009-03-13 22:29 UTC
  To: David Rees; +Cc: linux-nfs

On Fri, 2009-03-13 at 14:59 -0700, David Rees wrote:
> And the activity around the time I am reproducing the slowdown:
> 
> Server nfs v3:
> null         getattr      setattr      lookup       access       readlink
> 0         0% 3503     75% 7         0% 31        0% 1027     22% 0         0%
> read         write        create       mkdir        symlink      mknod
> 9         0% 50        1% 6         0% 0         0% 0         0% 0         0%
> remove       rmdir        rename       link         readdir      readdirplus
> 2         0% 0         0% 2         0% 0         0% 0         0% 0         0%
> fsstat       fsinfo       pathconf     commit
> 0         0% 0         0% 0         0% 13        0%

Is this the result of only doing 2 'dd' copies? That's a lot of GETATTR
calls for that kind of workload...

Trond


* Re: Horrible NFS Client Performance During Heavy Server IO
From: David Rees @ 2009-03-13 23:05 UTC
  To: Trond Myklebust; +Cc: linux-nfs

On Fri, Mar 13, 2009 at 3:29 PM, Trond Myklebust
<trond.myklebust@fys.uio.no> wrote:
> On Fri, 2009-03-13 at 14:59 -0700, David Rees wrote:
>> And the activity around the time I am reproducing the slowdown:
>>
>> Server nfs v3:
>> null         getattr      setattr      lookup       access       readlink
>> 0         0% 3503     75% 7         0% 31        0% 1027     22% 0         0%
>> read         write        create       mkdir        symlink      mknod
>> 9         0% 50        1% 6         0% 0         0% 0         0% 0         0%
>> remove       rmdir        rename       link         readdir      readdirplus
>> 2         0% 0         0% 2         0% 0         0% 0         0% 0         0%
>> fsstat       fsinfo       pathconf     commit
>> 0         0% 0         0% 0         0% 13        0%
>
> Is this the result of only doing 2 'dd' copies? That's a lot of GETATTR
> calls for that kind of workload...

No - the client that I have been reproducing this from is also my
desktop, the NFS server hosts my home directory, and the desktop was
active during the test.

That's where the extreme slowdown in NFS performance affects me the
most.  When the heavy IO on the server is going on (even just a
single process writing as fast as it can), my applications (Firefox,
Thunderbird, Gnome Terminals, just about anything that accesses the
NFS mount) will basically lock up and go totally unresponsive while
they wait for the NFS server to respond.  They will sit unresponsive
for minutes at a time and are unusable until the heavy IO stops on the
server.

I do software development from this machine and I have timed one of my
project builds with and without the heavy IO on the NFS server - a
build that normally takes about 20 seconds will take 5 minutes to
complete (it does read/write a lot of small files).

-Dave

* Re: Horrible NFS Client Performance During Heavy Server IO
From: Greg Banks @ 2009-03-17  4:43 UTC
  To: Trond Myklebust; +Cc: David Rees, linux-nfs

Trond Myklebust wrote:
> On Fri, 2009-03-13 at 13:36 -0700, David Rees wrote:
>   
>>     
>
> You don't need conv=fdatasync when writing to NFS. The close-to-open
> cache consistency automatically guarantees fdatasync on close().
>   
Yes...but at least some versions of dd will print a performance number
at the end, where the time delta used in the calculation includes the
fdatasync() time but not the close() time.  So conv=fdatasync is a
useful trick to get accurate performance numbers from dd.  Personally I
don't use that metric from dd because it's a simple overall average and
I like to watch the time behaviour of the transfer rate.
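
To illustrate the difference (the path is just a placeholder for a file
on an NFS mount):

# dd's reported rate includes the fdatasync() time but not the close()
dd if=/dev/zero of=/mnt/export/testfile bs=1M count=100 conv=fdatasync

# timing the whole run externally also captures the flush done by close()
time dd if=/dev/zero of=/mnt/export/testfile bs=1M count=100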

-- 
Greg Banks, P.Engineer, SGI Australian Software Group.
the brightly coloured sporks of revolution.
I don't speak for SGI.


* Re: Horrible NFS Client Performance During Heavy Server IO
From: Greg Banks @ 2009-03-17  4:54 UTC
  To: David Rees; +Cc: Trond Myklebust, linux-nfs

David Rees wrote:
> On Fri, Mar 13, 2009 at 3:29 PM, Trond Myklebust
> <trond.myklebust@fys.uio.no> wrote:
>   
>> On Fri, 2009-03-13 at 14:59 -0700, David Rees wrote:
>>     
>>> And the activity around the time I am reproducing the slowdown:
>>>
>>> Server nfs v3:
>>> null         getattr      setattr      lookup       access       readlink
>>> 0         0% 3503     75% 7         0% 31        0% 1027     22% 0         0%
>>> read         write        create       mkdir        symlink      mknod
>>> 9         0% 50        1% 6         0% 0         0% 0         0% 0         0%
>>> remove       rmdir        rename       link         readdir      readdirplus
>>> 2         0% 0         0% 2         0% 0         0% 0         0% 0         0%
>>> fsstat       fsinfo       pathconf     commit
>>> 0         0% 0         0% 0         0% 13        0%
>>>       
>> Is this the result of only doing 2 'dd' copies? That's a lot of GETATTR
>> calls for that kind of workload...
>>     
>
> No - the client that I have been duplicating this from is also my
> desktop and the NFS server hosts my home directory and it was active
> during the test.
>
> That's where the extreme slowdown in NFS performance affects me the
> most.  When the heavy IO on the server is going on (even just a
> single process writing as fast as it can), my applications (Firefox,
> Thunderbird, Gnome Terminals, just about anything that accesses the
> NFS mount) will basically lock up and go totally unresponsive while
> they wait for the NFS server to respond.  They will sit unresponsive
> for minutes at a time and are unusable until the heavy IO stops on the
> server.
>
> I do software development from this machine and I have timed one of my
> project builds with and without the heavy IO on the NFS server - a
> build that normally takes about 20 seconds will take 5 minutes to
> complete (it does read/write a lot of small files).
>   
David, could you try your test case with this command running on the server?

tethereal -i eth0 -f 'port 2049' -z rpc,rtt,100003,3 -w /dev/null

and ^C it when you're done.  You should see a table by RPC call with
minimum, maximum and average "roundtrip times" (actually, as they're
measured on the server, they should be server response times).  This
should tell you which calls are slow and whether it's all those calls or
only a few outliers.

Another experiment worth trying is a local filesystem load fairness test
on the server (no NFS client involved).  This will tell you whether the
nfsd's IO priority is an issue, or if you're just seeing an IO
scheduling issue.

1.  echo 3 > /proc/sys/vm/drop_caches
2.  time find /your/home/some/dir -ls > /dev/null (choose a directory
tree with a few hundred files)
3.  echo 3 > /proc/sys/vm/drop_caches
4.  start your large writes
5.  time find /your/home/some/dir -ls > /dev/null

-- 
Greg Banks, P.Engineer, SGI Australian Software Group.
the brightly coloured sporks of revolution.
I don't speak for SGI.


* Re: Horrible NFS Client Performance During Heavy Server IO
From: David Rees @ 2009-03-17  6:28 UTC
  To: Greg Banks; +Cc: Trond Myklebust, linux-nfs

On Mon, Mar 16, 2009 at 9:54 PM, Greg Banks <gnb@sgi.com> wrote:
> David, could you try your test case with this command running on the server?
>
> tethereal -i eth0 -f 'port 2049' -z rpc,rtt,100003,3 -w /dev/null
>
> and ^C it when you've done.  You should see a table by RPC call with
> minimum maximum and average "roundtrip times" (actually as they're
> measured on the server, they should be server response times).  This
> should tell you which calls are slow and whether it's all those calls or
> only a few outliers.

OK, filtering out the empty lines, I ran the test (note that I
modified it slightly - details later) with and without the server-side
large write running:

Without:

Procedure        Calls   Min RTT   Max RTT   Avg RTT
GETATTR              3   0.00006   0.00133   0.00048
SETATTR              1   0.03878   0.03878   0.03878
ACCESS               9   0.00006   0.00232   0.00049
COMMIT               1   1.13381   1.13381   1.13381

With:
Procedure        Calls   Min RTT   Max RTT   Avg RTT
GETATTR              1   0.00016   0.00016   0.00016
SETATTR              1  30.14662  30.14662  30.14662
ACCESS               8   0.00005   0.00544   0.00154
COMMIT               1   0.34472   0.34472   0.34472

> Another experiment worth trying is a local filesystem load fairness test
> on the server (no NFS client involved).  This will tell you whether the
> nfsd's IO priority is an issue, or if you're just seeing an IO
> scheduling issue.
>
> 1.  echo 3 > /proc/sys/vm/drop_caches
> 2.  time find /your/home/some/dir -ls > /dev/null (choose a directory
> tree with a few hundred files)
> 3.  echo 3 > /proc/sys/vm/drop_caches
> 4.  start your large writes
> 5.  time find /your/home/some/dir -ls > /dev/null

This test actually returned similar response times with or without the
big server write running, which led me to think that this isn't even
an NFS problem - it actually appears to be a VM or filesystem (ext3)
problem.

Here's the test case again (with the modifications mentioned above):

1. On the server, write out a large file:
dd if=/dev/zero of=bigfile bs=1M count=5000

2. On the client, write out a small file:
dd if=/dev/zero of=smallfile bs=4k count=1

Now, I just ran #2 both on an NFS client and on the local server -
response time was the same, about 45 seconds.

So it appears that my server configuration (older dual Xeon 3GHz, 8GB
RAM, SATA RAID1) lets too much dirty data accumulate, creating a huge
backlog of data to be written to the journal and leaving small,
latency-sensitive writes to complete very slowly.  I seem to remember
reading about this problem before - oh yeah:

http://lwn.net/Articles/216853/
on top of this newer bug:
http://bugzilla.kernel.org/show_bug.cgi?id=12309
http://lkml.org/lkml/2009/1/16/487

I'm off to go hack dirty_ratio and dirty_background_ratio for now,
until this bug gets fixed and ext4 fully stabilizes. :-)
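
For the record, a rough sketch of that tweak - the values are just a
conservative starting point, not a recommendation:

sysctl -w vm.dirty_background_ratio=1  # start background writeback much sooner
sysctl -w vm.dirty_ratio=5             # cap dirty memory so sync writers aren't stuck behind a huge backlog

(or the equivalent entries in /etc/sysctl.conf to make them persistent)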

Thanks for your help, guys.

-Dave
