linux-kernel.vger.kernel.org archive mirror
* Spurious NFS ESTALE errors w/NFSv3 server, non-v3 client
@ 2001-09-06 18:11 Caleb Epstein
  2001-09-07 11:42 ` [NFS] " Neil Brown
  0 siblings, 1 reply; 6+ messages in thread
From: Caleb Epstein @ 2001-09-06 18:11 UTC
  To: linux-kernel; +Cc: nfs


	I believe this is new behavior in the latest (post-2.4.7, I
	think) kernel NFS software:

	I have two machines, both running kernel 2.4.9, each of which
	acts as both an NFS client and server to the other.  I am using
	the kernel NFS daemon and am exporting ext2fs filesystems on a
	local switched LAN.

	One box, called tela, was configured with NFSv3 enabled for
	both the client and server code.  The other box, hagrid, was
	not configured with any NFSv3 support enabled.  I simply neglected
	to enable it in the configuration; it was not for any
	particular reason.

	When I did large file reads on hagrid (the v2 client), I
	would get spurious ESTALE errors on files which are totally
	static and haven't been touched in months.  Basically the
	filesystem contains a lot of audio files, and I was running
	md5 checksums on them from hagrid, while they were hosted on
	tela.

	When I checked the configuration on the client, and realized
	that NFSv3 was not enabled, I enabled it and rebuilt the
	kernel.  After a reboot, the errors disappeared and I can
	successfully read many gigabytes of data without a hiccup.

	Is this one of those "if it hurts then don't do that" kind of
	things, or is it the expected behavior?  I think I've had the
	two machines configured like this for several kernel
	revisions (2.4.0 onwards) and only noticed this behavior since
	I switched my server to 2.4.9 from 2.4.7.  It *may* have
	happened before and I didn't notice it, but I think this was
	introduced some time in 2.4.8 or later.
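
	A minimal sketch of the kind of read workload described above:
	checksum every file and record which ones fail with ESTALE
	instead of aborting.  The function names and the 8192-byte
	chunk size are illustrative assumptions, not the commands that
	were actually run.

```python
import errno
import hashlib
from typing import Dict, Iterable, List, Tuple


def md5_of_file(path: str, chunk_size: int = 8192) -> str:
    """Read the file in chunks and return its MD5 digest as hex."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops at the empty read that marks EOF
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_all(paths: Iterable[str]) -> Tuple[Dict[str, str], List[str]]:
    """Checksum each path; collect paths that failed with ESTALE."""
    checksums: Dict[str, str] = {}
    stale: List[str] = []
    for path in paths:
        try:
            checksums[path] = md5_of_file(path)
        except OSError as exc:
            if exc.errno == errno.ESTALE:
                stale.append(path)  # stale NFS file handle: note it and go on
            else:
                raise  # any other I/O error is not what we are looking for
    return checksums, stale
```

	Running the same list of paths twice and comparing the stale
	lists would show whether the failures hit the same files each
	time or move around.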

-- 
cae at bklyn dot org | Caleb Epstein | bklyn . org | Brooklyn Dust Bunny Mfg.


* Re: [NFS] Spurious NFS ESTALE errors w/NFSv3 server, non-v3 client
  2001-09-06 18:11 Spurious NFS ESTALE errors w/NFSv3 server, non-v3 client Caleb Epstein
@ 2001-09-07 11:42 ` Neil Brown
  2001-09-07 13:27   ` Caleb Epstein
  2001-09-08  2:45   ` mount source code Julius R Sirait
  0 siblings, 2 replies; 6+ messages in thread
From: Neil Brown @ 2001-09-07 11:42 UTC
  To: Caleb Epstein; +Cc: linux-kernel, nfs

On Thursday September 6, cae@bklyn.org wrote:
> 
> 	I believe this is new behavior in the latest (post-2.4.7, I
> 	think) kernel NFS software:
> 
> 	I have two machines, both running kernel 2.4.9, each of which
> 	acts as both an NFS client and server to the other.  I am using
> 	the kernel NFS daemon and am exporting ext2fs filesystems on a
> 	local switched LAN.
> 
> 	One box, called tela, was configured with NFSv3 enabled for
> 	both the client and server code.  The other box, hagrid, was
> 	not configured with any NFSv3 support enabled.  I simply neglected
> 	to enable it in the configuration; it was not for any
> 	particular reason.
> 
> 	When I did large file reads on hagrid (the v2 client), I
> 	would get spurious ESTALE errors on files which are totally
> 	static and haven't been touched in months.  Basically the
> 	filesystem contains a lot of audio files, and I was running
> 	md5 checksums on them from hagrid, while they were hosted on
> 	tela.

NFSv2 has a limit of 2 gigabytes per file.  Are the files that you are
reading close to, or exceeding, this size?

However, I wouldn't expect an ESTALE for that reason. 

Can you run "tcpdump -s 1024", find the response that contains the
error, and send the dozen or so lines around it?

NeilBrown
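
A minimal sketch of pulling "the dozen or so lines around" the error
out of a saved capture.  It assumes the tcpdump text output has been
written to a file, and that the error shows up as the string "Stale
NFS file handle" (as it does in the trace posted later in this
thread).  The function name and defaults are hypothetical.

```python
from collections import deque
from typing import Iterable, List

# Text that tcpdump prints for the stale-handle error in this thread's trace
STALE_MARKER = "Stale NFS file handle"


def lines_around_error(lines: Iterable[str],
                       before: int = 12,
                       after: int = 2) -> List[str]:
    """Return the first line containing STALE_MARKER plus its context.

    Up to `before` preceding lines and up to `after` following lines are
    included.  An empty list means the marker never appeared.
    """
    context = deque(maxlen=before)  # rolling window of preceding lines
    stream = iter(lines)
    for raw in stream:
        line = raw.rstrip("\n")
        if STALE_MARKER in line:
            result = list(context) + [line]
            # zip() stops after `after` items or at the end of the input
            for _, extra in zip(range(after), stream):
                result.append(extra.rstrip("\n"))
            return result
        context.append(line)
    return []
```

With a capture saved as text, `lines_around_error(open("tcpdump.log"))`
would return the excerpt to paste into a reply; "tcpdump.log" is a
placeholder file name.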


* Re: [NFS] Spurious NFS ESTALE errors w/NFSv3 server, non-v3 client
  2001-09-07 11:42 ` [NFS] " Neil Brown
@ 2001-09-07 13:27   ` Caleb Epstein
  2001-09-10  6:49     ` Neil Brown
  2001-09-08  2:45   ` mount source code Julius R Sirait
  1 sibling, 1 reply; 6+ messages in thread
From: Caleb Epstein @ 2001-09-07 13:27 UTC
  To: Neil Brown; +Cc: linux-kernel, nfs

On Fri, Sep 07, 2001 at 09:42:16PM +1000, Neil Brown wrote:

> NFSv2 has a limit of 2 gigabytes per file.  Are the files that you are
> reading close to, or exceeding, this size?

	Not that large, no.  They're on the order of tens of
	megabytes, maybe 150 MBytes max.

> However, I wouldn't expect an ESTALE for that reason.  Can you run
> "tcpdump -s 1024", find the response that contains the error, and
> send the dozen or so lines around it?

	See below for the end of the tcpdump log, which concludes with
	the ESTALE.  I've got the entire log saved, which is about 1.4
	MB gzipped, available as http://bklyn.org/~cae/tcpdump.log.gz

	For this test, server was linux 2.4.9 w/nfsv3 enabled, client
	was 2.4.7 with no nfsv3.  Filesystem mounted on the client as:

tela:/shn on /shn/tela type nfs (rw,rsize=8192,wsize=8192,soft,addr=192.168.1.2,addr=192.168.1.2)

	Let me know if I can provide any add'l info that might help.

09:24:16.191079 tela.bklyn.org.nfs > hagrid.bklyn.org.2225077128: reply ok 96 write (DF)
09:24:16.191107 hagrid.bklyn.org > tela.bklyn.org: (frag 33381:1332@2960)
09:24:16.191116 hagrid.bklyn.org > tela.bklyn.org: (frag 33381:1480@1480+)
09:24:16.191131 hagrid.bklyn.org.2275408776 > tela.bklyn.org.nfs: 1472 write fh Unknown/1 4096 (4096) bytes @ 94208 (94208) (frag 33381:1480@0+)
09:24:16.204605 tela.bklyn.org.nfs > hagrid.bklyn.org.2241854344: reply ok 96 write (DF)
09:24:16.204632 hagrid.bklyn.org > tela.bklyn.org: (frag 33382:1332@2960)
09:24:16.204640 hagrid.bklyn.org > tela.bklyn.org: (frag 33382:1480@1480+)
09:24:16.204656 hagrid.bklyn.org.2292185992 > tela.bklyn.org.nfs: 1472 write fh Unknown/1 4096 (4096) bytes @ 102400 (102400) (frag 33382:1480@0+)
09:24:16.212418 tela.bklyn.org.nfs > hagrid.bklyn.org.2258631560: reply ok 96 write (DF)
09:24:16.212443 hagrid.bklyn.org > tela.bklyn.org: (frag 33383:988@7400)
09:24:16.212451 hagrid.bklyn.org > tela.bklyn.org: (frag 33383:1480@5920+)
09:24:16.212463 hagrid.bklyn.org > tela.bklyn.org: (frag 33383:1480@4440+)
09:24:16.212480 hagrid.bklyn.org > tela.bklyn.org: (frag 33383:1480@2960+)
09:24:16.212485 hagrid.bklyn.org > tela.bklyn.org: (frag 33383:1480@1480+)
09:24:16.212492 hagrid.bklyn.org.2308963208 > tela.bklyn.org.nfs: 1472 write fh Unknown/1 8192 (8192) bytes @ 110592 (110592) (frag 33383:1480@0+)
09:24:16.220518 tela.bklyn.org.nfs > hagrid.bklyn.org.2275408776: reply ok 96 write (DF)
09:24:16.220542 hagrid.bklyn.org > tela.bklyn.org: (frag 33384:988@7400)
09:24:16.220550 hagrid.bklyn.org > tela.bklyn.org: (frag 33384:1480@5920+)
09:24:16.220562 hagrid.bklyn.org > tela.bklyn.org: (frag 33384:1480@4440+)
09:24:16.220580 hagrid.bklyn.org > tela.bklyn.org: (frag 33384:1480@2960+)
09:24:16.220587 hagrid.bklyn.org > tela.bklyn.org: (frag 33384:1480@1480+)
09:24:16.220594 hagrid.bklyn.org.2325740424 > tela.bklyn.org.nfs: 1472 write fh Unknown/1 8192 (8192) bytes @ 118784 (118784) (frag 33384:1480@0+)
09:24:16.228087 tela.bklyn.org.nfs > hagrid.bklyn.org.2292185992: reply ok 96 write (DF)
09:24:16.235465 tela.bklyn.org.nfs > hagrid.bklyn.org.2308963208: reply ok 96 write (DF)
09:24:16.246145 tela.bklyn.org.nfs > hagrid.bklyn.org.2325740424: reply ok 96 write (DF)
09:24:16.246286 hagrid.bklyn.org.2342517640 > tela.bklyn.org.nfs: 184 read fh Unknown/1 4096 bytes @ 39305216 (DF)
09:24:16.246631 tela.bklyn.org.nfs > hagrid.bklyn.org.2342517640: reply ok 28 read ERROR: Stale NFS file handle (DF)
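
	The fragment offsets in the trace above can be reproduced with
	a little arithmetic.  This is a sketch under two assumptions:
	a 1500-byte Ethernet MTU with a 20-byte IP header (so 1480
	bytes of IP payload per fragment), and a fixed 196 bytes of
	UDP plus RPC/NFS header overhead per write call.  The 196 is
	read off the trace itself (7400 + 988 - 8192), not taken from
	a specification.

```python
from typing import List, Tuple

# Assumed: 1500-byte MTU minus a 20-byte IP header without options
IP_PAYLOAD_PER_FRAGMENT = 1480

# Derived from the trace (7400 + 988 - 8192); presumably the UDP header
# plus the RPC and NFS write-call headers
WRITE_CALL_OVERHEAD = 196


def fragment_layout(datagram_len: int,
                    per_fragment: int = IP_PAYLOAD_PER_FRAGMENT
                    ) -> List[Tuple[int, int]]:
    """Split an IP payload into (length, offset) pairs, one per fragment."""
    layout = []
    offset = 0
    while offset < datagram_len:
        length = min(per_fragment, datagram_len - offset)
        layout.append((length, offset))
        offset += length
    return layout


# An 8192-byte write: matches frags 1480@0 ... 1480@5920, 988@7400
print(fragment_layout(8192 + WRITE_CALL_OVERHEAD))
# A 4096-byte write: matches frags 1480@0, 1480@1480, 1332@2960
print(fragment_layout(4096 + WRITE_CALL_OVERHEAD))
```

	tcpdump prints the fragments of each datagram in reverse order
	here, and the "1472" on the first fragment is its 1480 bytes
	of IP payload minus the 8-byte UDP header.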

-- 
cae at bklyn dot org | Caleb Epstein | bklyn . org | Brooklyn Dust Bunny Mfg.


* mount source code
  2001-09-07 11:42 ` [NFS] " Neil Brown
  2001-09-07 13:27   ` Caleb Epstein
@ 2001-09-08  2:45   ` Julius R Sirait
  2001-09-08 11:30     ` [NFS] " Trond Myklebust
  1 sibling, 1 reply; 6+ messages in thread
From: Julius R Sirait @ 2001-09-08  2:45 UTC
  To: nfs; +Cc: linux-kernel

Hello,

Where can I find the mount source code?  Specifically, the NFS mount
source code, if it is available.


Thanks,

Julius




* Re: [NFS] mount source code
  2001-09-08  2:45   ` mount source code Julius R Sirait
@ 2001-09-08 11:30     ` Trond Myklebust
  0 siblings, 0 replies; 6+ messages in thread
From: Trond Myklebust @ 2001-09-08 11:30 UTC
  To: Julius R Sirait; +Cc: nfs, linux-kernel

>>>>> " " == Julius R Sirait <Julius> writes:

     > Hello, Where can I find the mount source code?  Specifically,
     > the NFS mount source code, if it is available.

It's part of the util-linux suite. See

   ftp://ftp.win.tue.nl/pub/linux-local/utils/util-linux

or on one of the kernel.org mirrors as

  ftp://ftp.*.kernel.org/pub/linux/utils/util-linux

Cheers,
  Trond


* Re: [NFS] Spurious NFS ESTALE errors w/NFSv3 server, non-v3 client
  2001-09-07 13:27   ` Caleb Epstein
@ 2001-09-10  6:49     ` Neil Brown
  0 siblings, 0 replies; 6+ messages in thread
From: Neil Brown @ 2001-09-10  6:49 UTC
  To: Caleb Epstein; +Cc: linux-kernel, nfs

On Friday September 7, cae@bklyn.org wrote:
> On Fri, Sep 07, 2001 at 09:42:16PM +1000, Neil Brown wrote:
> 
> > NFSv2 has a limit of 2 gigabytes per file.  Are the files that you are
> > reading close to, or exceeding, this size?
> 
> 	Not that large, no.  They're on the order of tens of
> 	megabytes, maybe 150 MBytes max.
> 
> > However, I wouldn't expect an ESTALE for that reason.  Can you run
> > "tcpdump -s 1024", find the response that contains the error, and
> > send the dozen or so lines around it?
> 
> 	See below for the end of the tcpdump log, which concludes with
> 	the ESTALE.  I've got the entire log saved, which is about 1.4
> 	MB gzipped, available as http://bklyn.org/~cae/tcpdump.log.gz
> 
> 	For this test, server was linux 2.4.9 w/nfsv3 enabled, client
> 	was 2.4.7 with no nfsv3.  Filesystem mounted on the client as:
> 
> tela:/shn on /shn/tela type nfs (rw,rsize=8192,wsize=8192,soft,addr=192.168.1.2,addr=192.168.1.2)
> 
> 	Let me know if I can provide any add'l info that might help.

Well.....

It looks like you are reading through some large file, then you write
to some other file, and when you try to read the first file again, it
isn't there for some reason....

Is there any chance that the file that you are reading from is being
renamed or removed while it is being read?
Can you try exporting with "no_subtree_check" and see if that makes a
difference?
Could you 
    echo 2 > /proc/sys/sunrpc/nfsd_debug 

and get it to fail again, and then show me the last hundred lines or
so of the kernel log?

NeilBrown

