From: "Mikkelborg, Kjetil"
Subject: RE: NFS client write performance issue ... thoughts?
Date: Mon, 12 Jan 2004 13:45:19 +0100
To: "Paul Smith"
Message-ID: <75587E33AC778145AACCE1601EEABF420D49B2@kda-beexc-02.kda.kongsberg.com>
List-Id: Discussion of NFS under Linux development, interoperability, and testing.

-----Original Message-----
From: Paul Smith [mailto:pausmith@nortelnetworks.com]
Sent: 8 January 2004 18:47
To: nfs@lists.sourceforge.net
Subject: Re: [NFS] NFS client write performance issue ... thoughts?

%% Trond Myklebust writes:

tm> All you are basically showing here is that our write caching sucks
tm> badly.  There's nothing there to pinpoint merging vs. not merging
tm> requests as the culprit.

Good point.  I think that was "intuited" from other info, but I'll have
to check.

tm> Three things will affect those numbers and cloud the issue:

tm> 1) Linux 2.4.x has a hard limit of 256 outstanding read+write
tm> nfs_page structs per mountpoint, in order to deal with the fact that
tm> the VM does not have the necessary support to notify us when we are
tm> low on memory.  (This limit has been removed in 2.6.x.)

OK.

tm> 2) Linux immediately puts the write on the wire once there are more
tm> than wsize bytes to write out.  This explains why bumping wsize
tm> results in fewer writes.

OK.
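For what it's worth, point 2 can be sketched with some back-of-the-envelope arithmetic (the chunk size, total, and wsize here are made-up illustration numbers, not anything measured on Paul's setup):

```python
import os
import tempfile

# Hypothetical workload: 256 application-level writes of 4 KiB each.
# On an NFS client these land in the page cache first; with wsize=32768
# the client can flush them as 32 KiB WRITE RPCs, which is why bumping
# wsize reduces the number of requests on the wire.
CHUNK = 4096       # application write size
NCHUNKS = 256      # 1 MiB total
WSIZE = 32768      # assumed wsize mount option

fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        for _ in range(NCHUNKS):
            f.write(b"x" * CHUNK)      # buffered: no RPC needed yet
        f.flush()
        os.fsync(f.fileno())           # force dirty pages to the server

    total = CHUNK * NCHUNKS
    best_case_rpcs = total // WSIZE    # fully coalesced WRITE RPCs
    print(total, best_case_rpcs)       # 1048576 32
finally:
    os.remove(path)
```

On a real mount you could compare the WRITE counts reported by `nfsstat -c` before and after the run, once with a small wsize and once with a large one.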
tm> 3) There are accounting errors in Linux 2.4.18 that cause
tm> retransmitted requests to be added to the total number of
tm> transmitted ones.  That explains why switching to TCP improves
tm> matters.

Do you know when those accounting errors were fixed?

ClearCase implements its own virtual filesystem type, and so is heavily
tied to specific kernels (the kernel module is not open source, of
course :( ).  We can basically move to any kernel that has been released
as part of an official Red Hat release (say, 2.4.20-8 from RH9 would
work), but no other kernels can be used: the ClearCase kernel module
checks the sizes of various kernel structures and won't load if they're
not what it thinks they should be--and since it's a filesystem, it cares
deeply about structures that have tended to change a lot.  It won't even
work with vanilla kernel.org kernels of the same version.

Actually, it does not look like ClearCase checks for an exact kernel
version; it just depends on Red Hat-specific changes in the kernel (I
have no clue which).  Taking a 2.4.20-XX Red Hat kernel and building it
from the SRPM actually works.  Furthermore, since you have the kernel in
source form when building from the SRPM, you can add as many patches as
you want, as long as those patches don't touch the same structures the
ClearCase MVFS relies on.  I managed to do some heavy modification of an
RH9 kernel SRPM, patched it up to the level I needed, added support for
diskless boot, and used it on Fedora--and still got ClearCase to work (I
had to tweak /etc/issue, since ClearCase actually checks for the Red
Hat (version) string).

tm> Note: Try doing this with mmap(), and you will get very different
tm> numbers, since mmap() can cache the entire database in memory, and
tm> only flush it out when you msync() (or when memory pressure forces
tm> it to do so).

OK...
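Trond's mmap() pattern looks roughly like this sketch (the 4 KiB file size and the payload are placeholders for a real database): updates go to the mapped pages in memory, and nothing needs to hit the wire until msync(), which `mmap.flush()` issues.

```python
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    os.ftruncate(fd, 4096)            # size the toy "database" file
    with mmap.mmap(fd, 4096) as m:
        m[0:5] = b"hello"             # update cached pages; no write-back yet
        m.flush()                     # msync(): now the data is written out
    os.close(fd)
    with open(path, "rb") as f:
        data = f.read(5)
    print(data)                       # b'hello'
finally:
    os.remove(path)
```

The key point is that every in-place update between flushes is absorbed by the mapping, whereas repeated write() calls each become dirty data the client must eventually push out on its own schedule.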
except since we don't have the source, we can't switch to mmap() without
doing something very hacky, like introducing some kind of shim shared
library to remap some read/write calls to mmap().  Ouch.  Also, I think
that ClearCase _does_ force a sync fairly regularly to be sure the
database is consistent.

tm> One further criticism: there are no READ requests on the Sun
tm> machine.  That suggests that it had the database entirely in cache
tm> when you started your test.

Good point.

Thanks Trond!

--
-------------------------------------------------------------------------------
 Paul D. Smith                                 HASMAT--HA Software Mthds & Tools
 "Please remain calm...I may be mad, but I am a professional." --Mad Scientist
-------------------------------------------------------------------------------
   These are my opinions---Nortel Networks takes no responsibility for them.

-------------------------------------------------------
This SF.net email is sponsored by: Perforce Software.
Perforce is the Fast Software Configuration Management System offering
advanced branching capabilities and atomic changes on 50+ platforms.
Free Eval! http://www.perforce.com/perforce/loadprog.html
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs