* RE: render farm NFS server is having a hard time staying up.
@ 2004-10-19 15:44 Lever, Charles
  2004-10-19 16:59 ` James Pearson
  0 siblings, 1 reply; 3+ messages in thread
From: Lever, Charles @ 2004-10-19 15:44 UTC (permalink / raw)
  To: Greg Whynott; +Cc: Linux NFS Mailing List

> 24 nfsd's fire off at startup.

if you try more than 24, does that help or hurt the situation?
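
a rough sketch of bumping the count, assuming a Red Hat-style init
script that reads RPCNFSDCOUNT from /etc/sysconfig/nfs (the path can
differ by distro):

# persist a larger thread count, then restart nfs
echo 'RPCNFSDCOUNT=64' >> /etc/sysconfig/nfs
/etc/init.d/nfs restart
# or change it on the fly, no restart needed:
rpc.nfsd 64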

> I have added this as part of the system startup:
> echo 262144 > /proc/sys/net/core/rmem_default
> echo 262144 > /proc/sys/net/core/rmem_max
> /etc/init.d/nfs start
> echo 65536 > /proc/sys/net/core/rmem_default
> echo 65536 > /proc/sys/net/core/rmem_max

i don't understand why you reset these values after you've started nfs.
262144 should be safe to use all the time, and for both rmem and wmem.
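
if you want the settings to stick across reboots, something like this
in /etc/sysctl.conf (a sketch; tune the value to taste):

# larger default/max socket buffers for both receive and send
net.core.rmem_default = 262144
net.core.rmem_max = 262144
net.core.wmem_default = 262144
net.core.wmem_max = 262144

then run "sysctl -p" to apply without a reboot.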

you are probably using NFS over UDP.  "netstat -s" on either the client
or server will show you IP fragment reassembly stats that could be one
source of your problem.
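
for example, something along these lines on both machines (GNU grep
assumed for the -A flag):

# look for "fragments dropped after timeout" / "packet reassembles failed"
netstat -s | grep -i -A 4 fragment
# the raw counters are also on the Ip: lines here (ReasmFails et al.)
grep '^Ip:' /proc/net/snmp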



* Re: render farm NFS server is having a hard time staying up.
  2004-10-19 15:44 render farm NFS server is having a hard time staying up Lever, Charles
@ 2004-10-19 16:59 ` James Pearson
  0 siblings, 0 replies; 3+ messages in thread
From: James Pearson @ 2004-10-19 16:59 UTC (permalink / raw)
  To: Lever, Charles; +Cc: Linux NFS Mailing List

Lever, Charles wrote:
>> 24 nfsd's fire off at startup.
>
> if you try more than 24, does that help or hurt the situation?
>
>> I have added this as part of the system startup:
>> echo 262144 > /proc/sys/net/core/rmem_default
>> echo 262144 > /proc/sys/net/core/rmem_max
>> /etc/init.d/nfs start
>> echo 65536 > /proc/sys/net/core/rmem_default
>> echo 65536 > /proc/sys/net/core/rmem_max
>
> i don't understand why you reset these values after you've started nfs.
> 262144 should be safe to use all the time, and for both rmem and wmem.

Probably because it says so in the NFS-HOWTO ... presumably the idea is
that the nfsd threads inherit the large buffers when they start, while
everything else then goes back to the smaller default.

see:

http://nfs.sourceforge.net/nfs-howto/performance.html#MEMLIMITS

James Pearson



* render farm NFS server is having a hard time staying up.
@ 2004-10-19 15:35 Greg Whynott
  0 siblings, 0 replies; 3+ messages in thread
From: Greg Whynott @ 2004-10-19 15:35 UTC (permalink / raw)
  To: Linux NFS Mailing List

Hello Folks,

    I'm looking for any information that may help me resolve an NFS
server issue we are seeing.  We are seeing roughly 1-3% corruption on
files written to the array over NFS when under load.  Sometimes we'll
see I/O errors, other times this error in the dmesg output: "nfs:
server murdock not responding, timed out", and other times the result
is a bad file.

Here are the details of the environment:

~200-300 dual-CPU render nodes (depending on time of day),
all connected to gigabit network ports.

The NFS server is a dual 2.8 GHz P4 with 4 GB of memory.

Auto-negotiation is off on the switch ports; they are locked to
1000/full-duplex/flow-control.

The render nodes mount the file server(s) with automount using these options:
-rw,insecure,hard,rsize=8192,wsize=8192,intr,timeo=600

RedHat 9 is running on the servers:
2.4.20-8 with bigmem support.
export options:
rw,no_root_squash,insecure,sync,no_subtree_check
24 nfsd's fire off at startup.

contents of /proc/net/rpc/nfsd:
[root@barney root]# cat /proc/net/rpc/nfsd
rc 6738 70516059 9738836
fh 500 79366229 10104583 667218 0
io 196640402 2028579561
th 24 387656 14064.970 2016.480 615.180 93.980 239.450 152.980 143.640 144.910 2.240 831.600
ra 48 47883 0 0 0 0 74 0 0 0 0 121
net 80270754 80270754 0 0
rpc 80261633 9121 0 9121 0
proc2 18 22 6763 918 0 1406 1 0 0 163637 142 0 0 0 0 1 0 0 11
proc3 22 4 2462879 570357 1141041 5515254 650 48078 69567752 142094 6308 3 0 3 0 71582 0 6417 0 4474 4477 0 547359
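
(As I understand the "th" line, the first number is the thread count
and the second is how many times all threads were in use at once; a
quick way to watch it:)

# print thread count and the all-threads-busy counter from nfsd stats
awk '/^th/ {print "threads:", $2, "all busy:", $3}' /proc/net/rpc/nfsd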


RedHat 7.3 is running on the render nodes:
2.4.18-7

The disk arrays connected to the server are Sun T4s in a 6320 array via
dual 2Gb FC (active/active): 6 trays of 14 disks, hardware RAID 5
horizontally, RAID 0 vertically.  The switches report few errors
(counters reset 7 days ago):

  Port name is BARNEY
  MTU 1518 bytes, encapsulation ethernet
  300 second input rate: 23597672 bits/sec, 2266 packets/sec, 2.39% utilization
  300 second output rate: 7404080 bits/sec, 2025 packets/sec, 0.76% utilization
  595831889 packets input, 589820579851 bytes, 0 no buffer
  Received 63119 broadcasts, 0 multicasts, 595768764 unicasts
  9 input errors, 6 CRC, 0 frame, 0 ignored
  3 runts, 0 giants, DMA received 595831869 packets
  765643165 packets output, 620030207291 bytes, 0 underruns
  Transmitted 57746415 broadcasts, 551424 multicasts, 707345326 unicasts
  0 output errors, 0 collisions, DMA transmitted 765643165 packets



I have added this as part of the system startup:
echo 262144 > /proc/sys/net/core/rmem_default
echo 262144 > /proc/sys/net/core/rmem_max
/etc/init.d/nfs start
echo 65536 > /proc/sys/net/core/rmem_default
echo 65536 > /proc/sys/net/core/rmem_max


This is a render farm where images are rendered and then written out to
the array when complete.  At the same time there are people reading
files from the same array.  I suspect we are giving our NFS server a
DoS of sorts; my hope is that we can set things up in such a way that
if a file starts to write to the array, it will finish and not write
out bogus data.  If the server is too busy it should reject further
connections rather than handle them incorrectly.  Pipe dream?

Thanks very much for your time.  If you'd like further info please let
me know; I must run off to a meeting.

greg
