linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* 2.6 nfsd troubles - stale filehandles
@ 2003-11-26 22:43 Fredrik Tolf
  2003-11-26 23:42 ` Neil Brown
  0 siblings, 1 reply; 6+ messages in thread
From: Fredrik Tolf @ 2003-11-26 22:43 UTC (permalink / raw)
  To: linux-kernel

Hi!

I'm running my NFSv3 server at home on a 2.6 kernel, and it seems to
have some issues, to say the least. The clients sporadically get stale
handle errors, and I don't really know how to debug it.

The server is currently running 2.6.0-test10, but it has been running
test9, test6 and test4 as well, and they have all had the same
problems. The clients run different kernels, 2.4.22, 2.4.20, RH9
standard kernel and 2.6.0-test8-mm1. The filesystem on the kernel is
ReiserFS, but I had it on XFS before, and it had the same problems, so
it doesn't seem to be related to that.

In essence, I have no idea whatsoever what causes it, and the major
problem is that I don't really know how to find out. The NFS code is
really large, and I haven't had the time to get really into it.

I'll describe the symptoms. Everything works as it should normally,
but every now and then (the frequency, too, is very irregular -
sometimes it doesn't happen for days while sometimes it happens all
the time), some filehandles become stale. For example, a client's
working directory can become stale, and nothing except leaving the
directory and going back seems to fix it. Note that when it happens,
the directory can still be listed through an absolute path to it,
although sometimes that fails too. A "staled" cwd never recovers.

I have noted especially strange behavior when reading files that
suddenly go stale. This happens a lot to my mailboxes, so I have been
able to gather pretty thorough symptoms of it. When a mailbox
(ordinary UNIX style spool file) goes stale, I run 'wc' repeatedly on
it to fix it ("while ! wc $file; do :; done"), and when I do, I see
that the number of lines that wc can count "grows" for each iteration,
until the entire file can be counted and the problem is solved,
temporarily. Sometimes (rarely) during this process the line count
resets. In that manner, it almost seems as if the file is being
"transferred" from the server, but I guess that conclusion is too
weird to be correct.

Another strange thing is that some directories _never_ are affected by
this. I use this NFS export to share my home directories, and /home
and the home directories themselves have never once become stale. It
almost seems as if files are more easily "staled" the lower down in
the directory hierarchy they are, but that too seems too weird to be
possible, and not all files are like that.

Sometimes the "staling frequency" can be lowered by rebooting the
clients, and sometimes it can be lowered by rebooting the server.

Sometimes, files and directories can be staled even though no process
is using them. I don't know if maybe the client forgets to release a
filehandle to it for any reason. It happens all too often to my
mozilla pref directory (/home/fredrik/.mozilla/default/58nhp2e8.slt).
When it happens, it is extremely hard to access, be it by cwd or
absolute path. Stupid as I am, I haven't tried if any other client can
access it when it happens, but rebooting the client tends to help
temporarily. While it is that way, commands do work sporadically on
the directory and the files therein, but the majority of the requests
on that directory fail with ESTALE.

When a cwd recently became staled, I enabled debug output on the
client and server, and although there doesn't seem to be anything
special about it, I'll include it just to be sure. To generate the
output, I ran ls once in the staled cwd.

Server:
nfsd_dispatch: vers 3 proc 1 
nfsd: GETATTR(3)  36: 06000001 0000fe00 00000002 00023957 00000002 00000000 
nfsd: fh_verify(36: 06000001 0000fe00 00000002 00023957 00000002 00000000) 

Client:
NFS: 56712 flushd starting 
NFS: 56712 flushd back to sleep 
NFS: revalidating (7/145751) 
NFS call  getattr 
NFS reply getattr 
NFS: refresh_inode(7/145751 ct=1 info=0x6) 
NFS: (7/145751) revalidation complete 
NFS: revalidating (7/146244) 

145751 (0x23957) is the i-number of the home directory
(/home/hannes). fe00 is the device number of the LVM2 volume the
filesystem resides on. 146244 is the i-number of the staled directory
(/home/hannes/dc). This client is running 2.4.22.

During this debug test, only two packets were transferred between the
client and server. One was the GETATTR request from the client on
/home/hannes, and the other was a successful response from the server.

I staled the directory essentially by being inside it and running
"while:; do ls; done". It worked a hundred times or so and then became
staled. There was only one request that failed with ERR_STALE, and it
was a GETATTR request on /home/hannes/dc, just like any other (ie. the
request didn't differ from any of the preceding hundred requests).
Note that this method doesn't always work; sometimes the command goes
on forever, as it should.

That is all the info that I can think of right now. If you're still
reading, I thank you for your endurance. If you need anything more,
tell me and I'll find it out.

Lastly, is this error known? Have I been reporting this unnecessarily?
I would have imagined that this problem hadn't gone unnoticed, but I
didn't find anything in the archives. Could it be that NFS bugs are
handled elsewhere? In that case, please tell me where.

Thank you for your time!

Fredrik Tolf


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: 2.6 nfsd troubles - stale filehandles
  2003-11-26 22:43 2.6 nfsd troubles - stale filehandles Fredrik Tolf
@ 2003-11-26 23:42 ` Neil Brown
  2003-11-27 13:49   ` Fredrik Tolf
  0 siblings, 1 reply; 6+ messages in thread
From: Neil Brown @ 2003-11-26 23:42 UTC (permalink / raw)
  To: Fredrik Tolf; +Cc: linux-kernel

On Wednesday November 26, fredrik@dolda2000.com wrote:
> Hi!
> 
> I'm running my NFSv3 server at home on a 2.6 kernel, and it seems to
> have some issues, to say the least. The clients sporadically get stale
> handle errors, and I don't really know how to debug it.

I'll see if I can help.

I suspect that if you add the "no_subtree_check" export option the
problem will go away.  If you could confirm that, and then set it back
to "subtree_check" so we can keep hunting, that would be good.

Next, some better tracing.
The Linux NFS client will never re-try a filehandle that it thinks is
stale, so the tracing you did doesn't actually show any access of the stale
filehandle. 
So you need to have tracing on when the filehandle goes stale.

If you could:

  echo 2 >  /proc/sys/sunrpc/nfsd_debug 

and then try to create a stale file/directory, then the trace produced
by that could well be helpful.

Finally, when you have create a stale filehandle and got a good trace,
could you send it to me and include an
   ls -l
for the bad file/directory and every parent up to the export point.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: 2.6 nfsd troubles - stale filehandles
  2003-11-26 23:42 ` Neil Brown
@ 2003-11-27 13:49   ` Fredrik Tolf
  2003-12-03  0:26     ` Fredrik Tolf
  0 siblings, 1 reply; 6+ messages in thread
From: Fredrik Tolf @ 2003-11-27 13:49 UTC (permalink / raw)
  To: Neil Brown; +Cc: Fredrik Tolf, linux-kernel

Neil Brown writes:
 > On Wednesday November 26, fredrik@dolda2000.com wrote:
 > > Hi!
 > > 
 > > I'm running my NFSv3 server at home on a 2.6 kernel, and it seems to
 > > have some issues, to say the least. The clients sporadically get stale
 > > handle errors, and I don't really know how to debug it.
 > 
 > I'll see if I can help.
 > 
 > I suspect that if you add the "no_subtree_check" export option the
 > problem will go away.  If you could confirm that, and then set it back
 > to "subtree_check" so we can keep hunting, that would be good.

That actually does seem to have done the job. I thought subtree_check
only affected exports that aren't entire filesystems, but I guess it
does something to the filehandles anyway. Thank you, at least now I
have something to fall back upon if no other solution presents itself.

 > Next, some better tracing.
 > The Linux NFS client will never re-try a filehandle that it thinks is
 > stale, so the tracing you did doesn't actually show any access of the stale
 > filehandle. 

I see... I thought it would try to get a new filehandle to the same
file somehow.

 > So you need to have tracing on when the filehandle goes stale.
 > 
 > If you could:
 > 
 >   echo 2 >  /proc/sys/sunrpc/nfsd_debug 
 > 
 > and then try to create a stale file/directory, then the trace produced
 > by that could well be helpful.
 > 
 > Finally, when you have create a stale filehandle and got a good trace,
 > could you send it to me and include an
 >    ls -l
 > for the bad file/directory and every parent up to the export point.

I'll do my best, but I don't know how long it will take me. It is
extremely hard to predict when it will happen, so tracing the actual
fault won't be easy.

I'll post again when (and if) I manage to get a good trace.

Fredrik Tolf


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: 2.6 nfsd troubles - stale filehandles
  2003-11-27 13:49   ` Fredrik Tolf
@ 2003-12-03  0:26     ` Fredrik Tolf
  2003-12-04  3:14       ` Neil Brown
  0 siblings, 1 reply; 6+ messages in thread
From: Fredrik Tolf @ 2003-12-03  0:26 UTC (permalink / raw)
  To: Fredrik Tolf; +Cc: Neil Brown, linux-kernel

[-- Attachment #1: message body text --]
[-- Type: text/plain, Size: 4299 bytes --]

Fredrik Tolf writes:
 > Neil Brown writes:
 >  > On Wednesday November 26, fredrik@dolda2000.com wrote:
 >  > > Hi!
 >  > > 
 >  > > I'm running my NFSv3 server at home on a 2.6 kernel, and it seems to
 >  > > have some issues, to say the least. The clients sporadically get stale
 >  > > handle errors, and I don't really know how to debug it.
 >  > 
 >  > I'll see if I can help.
 >  > 
 >  > I suspect that if you add the "no_subtree_check" export option the
 >  > problem will go away.  If you could confirm that, and then set it back
 >  > to "subtree_check" so we can keep hunting, that would be good.
 > 
 > That actually does seem to have done the job. I thought subtree_check
 > only affected exports that aren't entire filesystems, but I guess it
 > does something to the filehandles anyway. Thank you, at least now I
 > have something to fall back upon if no other solution presents itself.
 > 
 >  > Next, some better tracing.
 >  > The Linux NFS client will never re-try a filehandle that it thinks is
 >  > stale, so the tracing you did doesn't actually show any access of the stale
 >  > filehandle. 
 > 
 > I see... I thought it would try to get a new filehandle to the same
 > file somehow.
 > 
 >  > So you need to have tracing on when the filehandle goes stale.
 >  > 
 >  > If you could:
 >  > 
 >  >   echo 2 >  /proc/sys/sunrpc/nfsd_debug 
 >  > 
 >  > and then try to create a stale file/directory, then the trace produced
 >  > by that could well be helpful.
 >  > 
 >  > Finally, when you have create a stale filehandle and got a good trace,
 >  > could you send it to me and include an
 >  >    ls -l
 >  > for the bad file/directory and every parent up to the export point.
 > 
 > I'll do my best, but I don't know how long it will take me. It is
 > extremely hard to predict when it will happen, so tracing the actual
 > fault won't be easy.
 > 
 > I'll post again when (and if) I manage to get a good trace.

So, I managed to do it now. The strange thing is, I had created an
extra share of the home dirs for this test (I had "mount --bind"'ed it
on /var/lib/nfs/nfstest), and only set subtree_check back on the test
export, in an attempt to keep the normal parts of the system reliable
while testing, but just doing that made the real export behave bad as
well. It seems to run amok as soon as I have subtree_checking on only
one export.

Anyway, the traces got rather long, and therefore I have only made the
entire logs available as attachments to this mail. I hope that's
OK. Anyway, here follows what seems to be the interesting parts.

I used the same directory to make it fail again. From the filesystem's
point of view, it is /hannes/dc, inode 146244 (23b44 hex). On the
failed attempt (once again, I ran ls with the failing dir as the cwd),
the server says this:

nfsd_dispatch: vers 3 proc 1 
nfsd: GETATTR(3)  36: 06000001 0000fe00 00000002 00023b44 00023957 00000000 
nfsd: fh_verify(36: 06000001 0000fe00 00000002 00023b44 00023957 00000000) 
nfsd_acceptable failed at c669a5c0 dc 

The working attempts all looked a bit different, so I will have to
refer you to the log excerpt, if you don't mind. In case you wonder
about the inodes, 145751 (23957 hex) is /hannes, 146273 (23b61 hex) is
a regular file inside /hannes/dc, and 2 is (surprisingly) the root dir
of that filesystem. Again, 146244 (23b44 hex) is the failing
directory, /hannes/dc.

On the client, the failed call (an attemp is rather long on the
client) looks like this:

NFS: revalidating (0:d/146244)
NFS call  getattr
NFS reply getattr
nfs_revalidate_inode: (0:d/146244) getattr failed, error=-116

Some more info: I was root while causing this error, and the dir arch
looks like this (from this filesystem's point of view, it is really my
home dirs):

rwxr-xr-x  root   root  /
rwx--x---+ hannes users /hannes
rwxr-xr-x  hannes users /hannes/dc

I have an ACL which allows x perm for apache on /hannes, but that is
not the cause of the fault; it was like this before I had even applied
the ACL patches. The real mode on /hannes is 700; the 010 is added by
the ACL code. In short, getfacl returns the following when querying
/hannes:

# owner: hannes
# group: users
user::rwx
user:apache:--x
group::---
mask::--x
other::---

That is all I have found out; I hope it's helpful.

Fredrik Tolf


[-- Attachment #2: Client log --]
[-- Type: text/plain, Size: 10053 bytes --]

Dec  3 00:41:22 pc7 kernel: NFS: dentry_delete(vfolders/applications.vfolder-info, 0)
Dec  3 00:41:22 pc7 kernel: NFS: dentry_delete(vfolders/applications, 0)
Dec  3 00:41:22 pc7 last message repeated 2 times
Dec  3 00:41:28 pc7 kernel: NFS: dentry_delete(vfolders/applications.vfolder-info, 0)
Dec  3 00:41:28 pc7 kernel: NFS: dentry_delete(vfolders/applications, 0)
Dec  3 00:41:28 pc7 last message repeated 2 times
Dec  3 00:41:28 pc7 kernel: NFS call  access
Dec  3 00:41:28 pc7 kernel: NFS: refresh_inode(0:d/2 ct=1 info=0x6)
Dec  3 00:41:28 pc7 kernel: NFS reply access
Dec  3 00:41:28 pc7 kernel: NFS: revalidating (0:d/145751)
Dec  3 00:41:28 pc7 kernel: NFS call  getattr
Dec  3 00:41:28 pc7 kernel: NFS reply getattr
Dec  3 00:41:28 pc7 kernel: NFS: refresh_inode(0:d/145751 ct=1 info=0x6)
Dec  3 00:41:28 pc7 kernel: NFS: (0:d/145751) revalidation complete
Dec  3 00:41:28 pc7 kernel: NFS call  access
Dec  3 00:41:28 pc7 kernel: NFS: refresh_inode(0:d/145751 ct=1 info=0x6)
Dec  3 00:41:28 pc7 kernel: NFS reply access
Dec  3 00:41:28 pc7 kernel: NFS: revalidating (0:d/146244)
Dec  3 00:41:28 pc7 kernel: NFS call  getattr
Dec  3 00:41:28 pc7 kernel: NFS reply getattr
Dec  3 00:41:28 pc7 kernel: NFS: refresh_inode(0:d/146244 ct=1 info=0x6)
Dec  3 00:41:28 pc7 kernel: NFS: (0:d/146244) revalidation complete
Dec  3 00:41:28 pc7 kernel: NFS call  access
Dec  3 00:41:28 pc7 kernel: NFS: refresh_inode(0:d/146244 ct=1 info=0x6)
Dec  3 00:41:28 pc7 kernel: NFS reply access
Dec  3 00:41:30 pc7 kernel: NFS: revalidating (0:d/146244)
Dec  3 00:41:30 pc7 kernel: NFS call  getattr
Dec  3 00:41:30 pc7 kernel: NFS reply getattr
Dec  3 00:41:30 pc7 kernel: NFS: refresh_inode(0:d/146244 ct=1 info=0x6)
Dec  3 00:41:30 pc7 kernel: NFS: (0:d/146244) revalidation complete
Dec  3 00:41:30 pc7 kernel: NFS call  access
Dec  3 00:41:30 pc7 kernel: NFS: refresh_inode(0:d/146244 ct=1 info=0x6)
Dec  3 00:41:30 pc7 kernel: NFS reply access
Dec  3 00:41:30 pc7 kernel: NFS: readdir_search_pagecache() searching for cookie 0
Dec  3 00:41:30 pc7 kernel: NFS: find_dirent_page() searching directory page 0
Dec  3 00:41:30 pc7 kernel: NFS: found cookie 2
Dec  3 00:41:30 pc7 kernel: NFS: find_dirent() returns 0
Dec  3 00:41:30 pc7 kernel: NFS: find_dirent_page() returns 0
Dec  3 00:41:30 pc7 kernel: NFS: readdir_search_pagecache() returned 0
Dec  3 00:41:31 pc7 kernel: NFS: nfs_do_filldir() filling starting @ cookie 0
Dec  3 00:41:31 pc7 kernel: NFS: nfs_do_filldir() filling ended @ cookie 1387241729; returning = 0
Dec  3 00:41:31 pc7 kernel: NFS call  access
Dec  3 00:41:31 pc7 kernel: NFS: refresh_inode(0:d/146244 ct=1 info=0x6)
Dec  3 00:41:31 pc7 kernel: NFS reply access
Dec  3 00:41:31 pc7 kernel: NFS: revalidating (0:d/146273)
Dec  3 00:41:31 pc7 kernel: NFS call  getattr
Dec  3 00:41:31 pc7 kernel: NFS reply getattr
Dec  3 00:41:31 pc7 kernel: NFS: refresh_inode(0:d/146273 ct=1 info=0x6)
Dec  3 00:41:31 pc7 kernel: NFS: (0:d/146273) revalidation complete
Dec  3 00:41:31 pc7 kernel: NFS: dentry_delete(dc/filter.log, 0)
Dec  3 00:41:31 pc7 kernel: NFS: readdir_search_pagecache() searching for cookie 1387241729
Dec  3 00:41:31 pc7 kernel: NFS: find_dirent_page() searching directory page 0
Dec  3 00:41:31 pc7 kernel: NFS: found cookie 2
Dec  3 00:41:31 pc7 kernel: NFS: found cookie 2711168
Dec  3 00:41:31 pc7 kernel: NFS: found cookie 26116736
Dec  3 00:41:31 pc7 kernel: NFS: found cookie 945084544
Dec  3 00:41:31 pc7 kernel: NFS: found cookie 1207804544
Dec  3 00:41:31 pc7 kernel: NFS: found cookie 1387241728
Dec  3 00:41:31 pc7 kernel: NFS: found cookie 1387241729
Dec  3 00:41:31 pc7 kernel: NFS: find_dirent() returns -523
Dec  3 00:41:31 pc7 kernel: NFS: find_dirent_page() returns -523
Dec  3 00:41:31 pc7 kernel: NFS: readdir_search_pagecache() returned -523
Dec  3 00:41:33 pc7 kernel: NFS: revalidating (0:d/146244)
Dec  3 00:41:33 pc7 kernel: NFS call  getattr
Dec  3 00:41:33 pc7 kernel: NFS reply getattr
Dec  3 00:41:33 pc7 kernel: NFS: refresh_inode(0:d/146244 ct=1 info=0x6)
Dec  3 00:41:33 pc7 kernel: NFS: (0:d/146244) revalidation complete
Dec  3 00:41:33 pc7 kernel: NFS call  access
Dec  3 00:41:33 pc7 kernel: NFS: refresh_inode(0:d/146244 ct=1 info=0x6)
Dec  3 00:41:33 pc7 kernel: NFS reply access
Dec  3 00:41:33 pc7 kernel: NFS: readdir_search_pagecache() searching for cookie 0
Dec  3 00:41:33 pc7 kernel: NFS: find_dirent_page() searching directory page 0
Dec  3 00:41:33 pc7 kernel: NFS: found cookie 2
Dec  3 00:41:33 pc7 kernel: NFS: find_dirent() returns 0
Dec  3 00:41:33 pc7 kernel: NFS: find_dirent_page() returns 0
Dec  3 00:41:33 pc7 kernel: NFS: readdir_search_pagecache() returned 0
Dec  3 00:41:33 pc7 kernel: NFS: nfs_do_filldir() filling starting @ cookie 0
Dec  3 00:41:33 pc7 kernel: NFS: nfs_do_filldir() filling ended @ cookie 1387241729; returning = 0
Dec  3 00:41:33 pc7 kernel: NFS call  access
Dec  3 00:41:33 pc7 kernel: NFS: refresh_inode(0:d/146244 ct=1 info=0x6)
Dec  3 00:41:33 pc7 kernel: NFS reply access
Dec  3 00:41:33 pc7 kernel: NFS: dentry_delete(dc/filter.log, 0)
Dec  3 00:41:34 pc7 kernel: NFS: readdir_search_pagecache() searching for cookie 1387241729
Dec  3 00:41:34 pc7 kernel: NFS: find_dirent_page() searching directory page 0
Dec  3 00:41:34 pc7 kernel: NFS: found cookie 2
Dec  3 00:41:34 pc7 kernel: NFS: found cookie 2711168
Dec  3 00:41:34 pc7 kernel: NFS: found cookie 26116736
Dec  3 00:41:34 pc7 kernel: NFS: found cookie 945084544
Dec  3 00:41:34 pc7 kernel: NFS: found cookie 1207804544
Dec  3 00:41:34 pc7 kernel: NFS: found cookie 1387241728
Dec  3 00:41:34 pc7 kernel: NFS: found cookie 1387241729
Dec  3 00:41:34 pc7 kernel: NFS: find_dirent() returns -523
Dec  3 00:41:34 pc7 kernel: NFS: find_dirent_page() returns -523
Dec  3 00:41:34 pc7 kernel: NFS: readdir_search_pagecache() returned -523
Dec  3 00:41:34 pc7 kernel: NFS call  access
Dec  3 00:41:34 pc7 kernel: NFS: refresh_inode(0:d/2 ct=1 info=0x6)
Dec  3 00:41:34 pc7 kernel: NFS reply access
Dec  3 00:41:34 pc7 kernel: NFS: revalidating (0:d/5216)
Dec  3 00:41:34 pc7 kernel: NFS call  getattr
Dec  3 00:41:34 pc7 kernel: NFS reply getattr
Dec  3 00:41:34 pc7 kernel: NFS: refresh_inode(0:d/5216 ct=1 info=0x6)
Dec  3 00:41:34 pc7 kernel: NFS: (0:d/5216) revalidation complete
Dec  3 00:41:34 pc7 kernel: NFS: dentry_delete(vfolders/applications.vfolder-info, 0)
Dec  3 00:41:34 pc7 kernel: NFS: dentry_delete(vfolders/applications, 0)
Dec  3 00:41:34 pc7 last message repeated 2 times
Dec  3 00:41:36 pc7 kernel: NFS: revalidating (0:d/146244)
Dec  3 00:41:36 pc7 kernel: NFS call  getattr
Dec  3 00:41:36 pc7 kernel: NFS reply getattr
Dec  3 00:41:36 pc7 kernel: nfs_revalidate_inode: (0:d/146244) getattr failed, error=-116
Dec  3 00:41:39 pc7 kernel: NFS: dentry_delete(hannes/dc, 0)
Dec  3 00:41:40 pc7 kernel: NFS: revalidating (0:d/8333)
Dec  3 00:41:40 pc7 kernel: NFS call  getattr
Dec  3 00:41:40 pc7 kernel: NFS reply getattr
Dec  3 00:41:40 pc7 kernel: NFS: refresh_inode(0:d/8333 ct=1 info=0x6)
Dec  3 00:41:40 pc7 kernel: NFS: (0:d/8333) revalidation complete
Dec  3 00:41:40 pc7 kernel: NFS: dentry_delete(vfolders/applications.vfolder-info, 0)
Dec  3 00:41:40 pc7 kernel: NFS: dentry_delete(vfolders/applications, 0)
Dec  3 00:41:40 pc7 last message repeated 2 times
Dec  3 00:41:42 pc7 kernel: NFS call  access
Dec  3 00:41:42 pc7 kernel: NFS: refresh_inode(0:d/2 ct=1 info=0x6)
Dec  3 00:41:42 pc7 kernel: NFS reply access
Dec  3 00:41:42 pc7 kernel: NFS call  access
Dec  3 00:41:42 pc7 kernel: NFS: refresh_inode(0:d/5216 ct=1 info=0x6)
Dec  3 00:41:42 pc7 kernel: NFS reply access
Dec  3 00:41:42 pc7 kernel: NFS call  lookup .emacs.d
Dec  3 00:41:42 pc7 kernel: NFS reply lookup: 0
Dec  3 00:41:42 pc7 kernel: NFS: refresh_inode(0:d/5216 ct=1 info=0x6)
Dec  3 00:41:42 pc7 kernel: NFS: refresh_inode(0:d/120801 ct=1 info=0x6)
Dec  3 00:41:42 pc7 kernel: NFS call  access
Dec  3 00:41:42 pc7 kernel: NFS: refresh_inode(0:d/120801 ct=1 info=0x6)
Dec  3 00:41:42 pc7 kernel: NFS reply access
Dec  3 00:41:42 pc7 kernel: NFS: revalidating (0:d/120802)
Dec  3 00:41:42 pc7 kernel: NFS call  getattr
Dec  3 00:41:42 pc7 kernel: NFS reply getattr
Dec  3 00:41:42 pc7 kernel: NFS: refresh_inode(0:d/120802 ct=1 info=0x6)
Dec  3 00:41:42 pc7 kernel: NFS: (0:d/120802) revalidation complete
Dec  3 00:41:42 pc7 kernel: NFS call  access
Dec  3 00:41:42 pc7 kernel: NFS: refresh_inode(0:d/120802 ct=1 info=0x6)
Dec  3 00:41:42 pc7 kernel: NFS reply access
Dec  3 00:41:42 pc7 kernel: NFS: revalidating (0:d/9802)
Dec  3 00:41:42 pc7 kernel: NFS call  getattr
Dec  3 00:41:42 pc7 kernel: NFS reply getattr
Dec  3 00:41:42 pc7 kernel: NFS: refresh_inode(0:d/9802 ct=1 info=0x6)
Dec  3 00:41:42 pc7 kernel: NFS: (0:d/9802) revalidation complete
Dec  3 00:41:42 pc7 kernel: NFS call  access
Dec  3 00:41:42 pc7 kernel: NFS: refresh_inode(0:d/9802 ct=1 info=0x6)
Dec  3 00:41:42 pc7 kernel: NFS reply access
Dec  3 00:41:42 pc7 kernel: NFS call  setattr
Dec  3 00:41:42 pc7 kernel: NFS reply setattr
Dec  3 00:41:42 pc7 kernel: NFS: refresh_inode(0:d/9802 ct=1 info=0x7)
Dec  3 00:41:42 pc7 kernel: nfs: write(auto-save-list/.saves-6006-pc7.dolda2000.com~(9802), 1197@0)
Dec  3 00:41:42 pc7 kernel: NFS:      nfs_updatepage(auto-save-list/.saves-6006-pc7.dolda2000.com~ 1197@0)
Dec  3 00:41:42 pc7 kernel: NFS:      nfs_updatepage returns 0 (isize 1197)
Dec  3 00:41:42 pc7 kernel: nfs: flush(0:d/9802)
Dec  3 00:41:42 pc7 kernel: NFS: 37826 initiated write call (req 0:d/9802, 1197 bytes @ offset 0)
Dec  3 00:41:42 pc7 kernel: NFS: refresh_inode(0:d/9802 ct=2 info=0x7)
Dec  3 00:41:42 pc7 kernel: NFS: 37826 nfs_writeback_done (status 1197)
Dec  3 00:41:42 pc7 kernel: NFS: write (0:d/9802 1197@0) marked for commit
Dec  3 00:41:42 pc7 kernel: NFS: 37827 initiated commit call
Dec  3 00:41:43 pc7 kernel: NFS: refresh_inode(0:d/9802 ct=2 info=0x7)
Dec  3 00:41:43 pc7 kernel: NFS: 37827 nfs_commit_done (status 0)
Dec  3 00:41:43 pc7 kernel: NFS: commit (0:d/9802 1197@0) OK
Dec  3 00:41:43 pc7 kernel: NFS: dentry_delete(auto-save-list/.saves-6006-pc7.dolda2000.com~, 0)

[-- Attachment #3: Server log --]
[-- Type: text/plain, Size: 3829 bytes --]

Dec  3 00:36:17 pc5 kernel: nfsd_dispatch: vers 3 proc 4 
Dec  3 00:36:17 pc5 kernel: nfsd: ACCESS(3)   12: 00000001 0000fe00 00000002 00000000 00000000 00000000 0x2 
Dec  3 00:36:17 pc5 kernel: nfsd: fh_verify(12: 00000001 0000fe00 00000002 00000000 00000000 00000000) 
Dec  3 00:36:17 pc5 kernel: nfsd_dispatch: vers 3 proc 1 
Dec  3 00:36:17 pc5 kernel: nfsd: GETATTR(3)  24: 03000001 0000fe00 00000002 00023957 00000002 00000000 
Dec  3 00:36:17 pc5 kernel: nfsd: fh_verify(24: 03000001 0000fe00 00000002 00023957 00000002 00000000) 
Dec  3 00:36:17 pc5 kernel: nfsd_dispatch: vers 3 proc 4 
Dec  3 00:36:17 pc5 kernel: nfsd: ACCESS(3)   24: 03000001 0000fe00 00000002 00023957 00000002 00000000 0x2 
Dec  3 00:36:17 pc5 kernel: nfsd: fh_verify(24: 03000001 0000fe00 00000002 00023957 00000002 00000000) 
Dec  3 00:36:18 pc5 kernel: nfsd_dispatch: vers 3 proc 1 
Dec  3 00:36:18 pc5 kernel: nfsd: GETATTR(3)  36: 06000001 0000fe00 00000002 00023b44 00023957 00000000 
Dec  3 00:36:18 pc5 kernel: nfsd: fh_verify(36: 06000001 0000fe00 00000002 00023b44 00023957 00000000) 
Dec  3 00:36:18 pc5 kernel: nfsd_dispatch: vers 3 proc 4 
Dec  3 00:36:18 pc5 kernel: nfsd: ACCESS(3)   36: 06000001 0000fe00 00000002 00023b44 00023957 00000000 0x2 
Dec  3 00:36:18 pc5 kernel: nfsd: fh_verify(36: 06000001 0000fe00 00000002 00023b44 00023957 00000000) 
Dec  3 00:36:21 pc5 kernel: nfsd_dispatch: vers 3 proc 1 
Dec  3 00:36:21 pc5 kernel: nfsd: GETATTR(3)  36: 06000001 0000fe00 00000002 00023b44 00023957 00000000 
Dec  3 00:36:21 pc5 kernel: nfsd: fh_verify(36: 06000001 0000fe00 00000002 00023b44 00023957 00000000) 
Dec  3 00:36:21 pc5 kernel: nfsd_dispatch: vers 3 proc 4 
Dec  3 00:36:21 pc5 kernel: nfsd: ACCESS(3)   36: 06000001 0000fe00 00000002 00023b44 00023957 00000000 0x1 
Dec  3 00:36:21 pc5 kernel: nfsd: fh_verify(36: 06000001 0000fe00 00000002 00023b44 00023957 00000000) 
Dec  3 00:36:21 pc5 kernel: nfsd_dispatch: vers 3 proc 4 
Dec  3 00:36:21 pc5 kernel: nfsd: ACCESS(3)   36: 06000001 0000fe00 00000002 00023b44 00023957 00000000 0x2 
Dec  3 00:36:21 pc5 kernel: nfsd: fh_verify(36: 06000001 0000fe00 00000002 00023b44 00023957 00000000) 
Dec  3 00:36:21 pc5 kernel: nfsd_dispatch: vers 3 proc 1 
Dec  3 00:36:21 pc5 kernel: nfsd: GETATTR(3)  36: 06000001 0000fe00 00000002 00023b61 00023b44 00000000 
Dec  3 00:36:21 pc5 kernel: nfsd: fh_verify(36: 06000001 0000fe00 00000002 00023b61 00023b44 00000000) 
Dec  3 00:36:21 pc5 kernel: nfsd_dispatch: vers 3 proc 4 
Dec  3 00:36:21 pc5 kernel: nfsd: ACCESS(3)   12: 00000001 0000fe00 00000002 00000000 00000000 00000000 0x2 
Dec  3 00:36:22 pc5 kernel: nfsd: fh_verify(12: 00000001 0000fe00 00000002 00000000 00000000 00000000) 
Dec  3 00:36:22 pc5 kernel: nfsd_dispatch: vers 3 proc 1 
Dec  3 00:36:22 pc5 kernel: nfsd: GETATTR(3)  36: 06000001 0000fe00 00000002 0000208d 00001460 00000000 
Dec  3 00:36:22 pc5 kernel: nfsd: fh_verify(36: 06000001 0000fe00 00000002 0000208d 00001460 00000000) 
Dec  3 00:36:23 pc5 kernel: nfsd_dispatch: vers 3 proc 1 
Dec  3 00:36:23 pc5 kernel: nfsd: GETATTR(3)  36: 06000001 0000fe00 00000002 00023b44 00023957 00000000 
Dec  3 00:36:23 pc5 kernel: nfsd: fh_verify(36: 06000001 0000fe00 00000002 00023b44 00023957 00000000) 
Dec  3 00:36:23 pc5 kernel: nfsd_acceptable failed at c669a5c0 dc 
Dec  3 00:36:24 pc5 kernel: nfsd_dispatch: vers 3 proc 1 
Dec  3 00:36:24 pc5 kernel: nfsd: GETATTR(3)  24: 03000001 0000fe00 00000002 00023957 00000002 00000000 
Dec  3 00:36:24 pc5 kernel: nfsd: fh_verify(24: 03000001 0000fe00 00000002 00023957 00000002 00000000) 
Dec  3 00:36:29 pc5 kernel: nfsd_dispatch: vers 3 proc 1 
Dec  3 00:36:29 pc5 kernel: nfsd: GETATTR(3)  24: 03000001 0000fe00 00000002 00023957 00000002 00000000 
Dec  3 00:36:29 pc5 kernel: nfsd: fh_verify(24: 03000001 0000fe00 00000002 00023957 00000002 00000000) 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: 2.6 nfsd troubles - stale filehandles
  2003-12-03  0:26     ` Fredrik Tolf
@ 2003-12-04  3:14       ` Neil Brown
  2003-12-04  3:31         ` Fredrik Tolf
  0 siblings, 1 reply; 6+ messages in thread
From: Neil Brown @ 2003-12-04  3:14 UTC (permalink / raw)
  To: Fredrik Tolf; +Cc: linux-kernel

On Wednesday December 3, fredrik@dolda2000.com wrote:
> 
> So, I managed to do it now. The strange thing is, I had created an
> extra share of the home dirs for this test (I had "mount --bind"'ed it
> on /var/lib/nfs/nfstest), and only set subtree_check back on the test
> export, in an attempt to keep the normal parts of the system reliable
> while testing, but just doing that made the real export behave bad as
> well. It seems to run amok as soon as I have subtree_checking on only
> one export.

"mount --bind" certainly has a good chance of confusing nfsd.
If you --bind mount the root of the filesystem somewhere else and
export that, then the filehandles generated will be exactly the same
and nfsd cannot know whether a request is indented for one mountpoint
or the other.
When using --bind, it is best to give an 'fsid=' option in
/etc/exports so that nfsd can use that to differentiate the mount
points.

> 
> nfsd_dispatch: vers 3 proc 1 
> nfsd: GETATTR(3)  36: 06000001 0000fe00 00000002 00023b44 00023957 00000000 
> nfsd: fh_verify(36: 06000001 0000fe00 00000002 00023b44 00023957 00000000) 
> nfsd_acceptable failed at c669a5c0 dc 

This strongly suggests that nfsd thought that the user making the
request didn't have 'x' access to the parent of 'dc'. i.e. to /hannes.

> 
> Some more info: I was root while causing this error, and the dir arch
> looks like this (from this filesystem's point of view, it is really my
> home dirs):
> 
> rwxr-xr-x  root   root  /
> rwx--x---+ hannes users /hannes
> rwxr-xr-x  hannes users /hannes/dc

And if you are not exporting with no_root_squash, then the user does
not have 'x' access to hannes.

So if you haven't exported with 'no_root_squash', then this completely
makes sense. The nfs client is allowing root access (based on cached
data that some other local users recently accessed) but the server is
not allowing root access. 
Arguably you should be getting "permission denied" rather than
"stale", but you certainly shouldn't expect it to work.

If, on the other hand, you have specified no_root_squash, then this is
still very strange.

What export options are you using?

NeilBrown


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: 2.6 nfsd troubles - stale filehandles
  2003-12-04  3:14       ` Neil Brown
@ 2003-12-04  3:31         ` Fredrik Tolf
  0 siblings, 0 replies; 6+ messages in thread
From: Fredrik Tolf @ 2003-12-04  3:31 UTC (permalink / raw)
  To: Neil Brown; +Cc: Fredrik Tolf, linux-kernel

Neil Brown writes:
 > On Wednesday December 3, fredrik@dolda2000.com wrote:
 > > 
 > > So, I managed to do it now. The strange thing is, I had created an
 > > extra share of the home dirs for this test (I had "mount --bind"'ed it
 > > on /var/lib/nfs/nfstest), and only set subtree_check back on the test
 > > export, in an attempt to keep the normal parts of the system reliable
 > > while testing, but just doing that made the real export behave bad as
 > > well. It seems to run amok as soon as I have subtree_checking on only
 > > one export.
 > 
 > "mount --bind" certainly has a good chance of confusing nfsd.
 > If you --bind mount the root of the filesystem somewhere else and
 > export that, then the filehandles generated will be exactly the same
 > and nfsd cannot know whether a request is indented for one mountpoint
 > or the other.
 > When using --bind, it is best to give an 'fsid=' option in
 > /etc/exports so that nfsd can use that to differentiate the mount
 > points.

Oh, I see. Now that you mention it, it does quite much make sense, I
have to admit.

 > > 
 > > nfsd_dispatch: vers 3 proc 1 
 > > nfsd: GETATTR(3)  36: 06000001 0000fe00 00000002 00023b44 00023957 00000000 
 > > nfsd: fh_verify(36: 06000001 0000fe00 00000002 00023b44 00023957 00000000) 
 > > nfsd_acceptable failed at c669a5c0 dc 
 > 
 > This strongly suggests that nfsd thought that the user making the
 > request didn't have 'x' access to the parent of 'dc'. i.e. to /hannes.

That's what I thought too, but I could not figure out why.

 > > 
 > > Some more info: I was root while causing this error, and the dir arch
 > > looks like this (from this filesystem's point of view, it is really my
 > > home dirs):
 > > 
 > > rwxr-xr-x  root   root  /
 > > rwx--x---+ hannes users /hannes
 > > rwxr-xr-x  hannes users /hannes/dc
 > 
 > And if you are not exporting with no_root_squash, then the user does
 > not have 'x' access to hannes.
 > 
 > So if you haven't exported with 'no_root_squash', then this completely
 > makes sense. The nfs client is allowing root access (based on cached
 > data that some other local users recently accessed) but the server is
 > not allowing root access. 
 > Arguably you should be getting "permission denied" rather than
 > "stale", but you certainly shouldn't expect it to work.
 >
 > If, on the other hand, you have specified no_root_squash, then this is
 > still very strange.
 > 
 > What export options are you using?

I'm sorry for not clarifying that immediately. The entire export line
goes like this:

/home 192.168.0.0/16(sync,rw,no_root_squash,subtree_check)

So as you can see, I am indeed using no_root_squash, which is exactly
why I thought it was so strange. I did take a look at nfsd_acceptable,
and I just couldn't understand it (I understood the code, but I
couldn't understand how it could fail).

FYI, I'm currently using ReiserFS with ACL patches on this filesystem,
but before that I was using ReiserFS without ACL, and before that I
was using XFS, and these errors have been there all the time. At first
I thought the errors were because of XFS, and therefore I switched to
ReiserFS, but that didn't stop it from happening.

Fredrik Tolf


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2003-12-04  3:31 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-11-26 22:43 2.6 nfsd troubles - stale filehandles Fredrik Tolf
2003-11-26 23:42 ` Neil Brown
2003-11-27 13:49   ` Fredrik Tolf
2003-12-03  0:26     ` Fredrik Tolf
2003-12-04  3:14       ` Neil Brown
2003-12-04  3:31         ` Fredrik Tolf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).