linux-kernel.vger.kernel.org archive mirror
* nfs_statfs: statfs error = 116
@ 2003-11-13 14:15 martin.knoblauch 
  2003-11-13 14:39 ` Richard B. Johnson
  0 siblings, 1 reply; 11+ messages in thread
From: martin.knoblauch @ 2003-11-13 14:15 UTC
  To: linux-kernel

Hi,

  sorry if OT, but what is the above message trying to tell me? Where can I 
find a translation of the numbers? We are seeing 116 very frequently, 
512 and 5 on occasion.


  We have a bunch of Linux clients (Dual P4, RH7.3, 2.4.20-18.7smp 
errata kernel) hanging off two Sun NFS Servers (Solaris 8) in a 
Veritas/VCS HA configuration. All of the clients show the 116 messages, 
while some of them show the 512 in addition. Those with the 512s seem to 
"hang" for some periods of time.

  The mounts are "vers=3,proto=tcp,hard,intr,bg". Some of them are mounted
at boot time, quite a few via "amd".

  Any ideas are welcome.

Thanks
Martin




* Re: nfs_statfs: statfs error = 116
  2003-11-13 14:15 nfs_statfs: statfs error = 116 martin.knoblauch 
@ 2003-11-13 14:39 ` Richard B. Johnson
  2003-11-13 14:52   ` Martin.Knoblauch
  2003-11-13 15:27   ` Trond Myklebust
  0 siblings, 2 replies; 11+ messages in thread
From: Richard B. Johnson @ 2003-11-13 14:39 UTC
  To: martin.knoblauch ; +Cc: Linux kernel

On Thu, 13 Nov 2003, martin.knoblauch  wrote:

> Hi,
>
>   sorry if OT, but what is the above message trying to tell me? Where can I
> find a translation of the numbers? We are seeing 116 very frequently,
> 512 and 5 on occasion.
>

ESTALE is "errno" 116
EIO  is "errno" 5
ERESTARTSYS is "errno" 512

You can find these in /usr/include/asm/errno.h (not good to
directly include in a program).

The program reporting these errors should have included:

<errno.h>
<string.h>

Then used...
	strerror(errno);
or
	perror("");
etc.
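
As an illustration of that translation (not code from the thread), a minimal C sketch for a Linux/glibc userland. ESTALE and EIO come from <errno.h>; ERESTARTSYS is kernel-internal and not exported to user space, so KERNEL_ERESTARTSYS below is a hypothetical local constant holding the kernel's value, 512. The helper names errno_name() and describe_errno() are made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical local constant: ERESTARTSYS (512) is only defined in the
 * kernel's private errno header, not in user-space <errno.h>. */
#define KERNEL_ERESTARTSYS 512

/* Map the numbers printed in "nfs_statfs: statfs error = N" to names. */
const char *errno_name(int err)
{
    switch (err) {
    case ESTALE:             return "ESTALE";
    case EIO:                return "EIO";
    case KERNEL_ERESTARTSYS: return "ERESTARTSYS (kernel-internal)";
    default:                 return "unknown";
    }
}

/* Format "NAME: libc message" into buf. strerror() has no text of its
 * own for 512 and describes it as an unknown error. */
void describe_errno(int err, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "%s: %s", errno_name(err), strerror(err));
}
```

The numeric values are architecture-specific (116 is the x86 value of ESTALE), so the switch uses the symbolic constants; only 512 has to be spelled out, because user space has no name for it.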


Errno 512 should never be seen by a user-mode program, as the
header file, /usr/include/linux/errno.h, states...

ESTALE happens when a mounted file-system is on a server that
went down or re-booted. The file-handles are then "stale".

EIO is a general catch-all for an I/O error.

ERESTARTSYS is the error returned by a server that has
re-booted that is supposed to tell the client-side software
to get a new file-handle because of an attempt to access with
a stale file-handle. When getting this error, the client
should have reopened the file(s) to obtain a new handle.


Cheers,
Dick Johnson
Penguin : Linux version 2.4.22 on an i686 machine (797.90 BogoMips).
            Note 96.31% of all statistics are fiction.




* Re: nfs_statfs: statfs error = 116
  2003-11-13 14:39 ` Richard B. Johnson
@ 2003-11-13 14:52   ` Martin.Knoblauch
  2003-11-13 20:26     ` Jesse Pollard
  2003-11-13 15:27   ` Trond Myklebust
  1 sibling, 1 reply; 11+ messages in thread
From: Martin.Knoblauch @ 2003-11-13 14:52 UTC
  To: root; +Cc: Linux kernel

"Richard B. Johnson" <root@chaos.analogic.com> wrote on 11/13/2003 03:39:53 PM:

> On Thu, 13 Nov 2003, martin.knoblauch  wrote:
>
> > Hi,
> >
> >   sorry if OT, but what is the above message trying to tell me? Where can I
> > find a translation of the numbers? We are seeing 116 very frequently,
> > 512 and 5 on occasion.
> >
>
> ESTALE is "errno" 116
> EIO  is "errno" 5
> ERESTARTSYS is "errno" 512
>
> You can find these in /usr/include/asm/errno.h (not good to
> directly include in a program).
>
> The program reporting these errors should have included:
>
> <errno.h>
> <string.h>
>

 The messages actually come out of the kernel NFS code (inode.c). I should
have mentioned "dmesg" :-)

> Then used...
>    strerror(errno);
> or
>    perror("");
> etc.
>
>
> Errno 512 should never be seen by a user-mode program, as the
> header file, /usr/include/linux/errno.h, states...
>

 This worries me a bit :-)

> ESTALE happens when a mounted file-system is on a server that
> went down or re-booted. The file-handles are then "stale".
>

 I am "almost" sure that there were no reboot or failover events at the
time of most of the stale messages. But I would not swear to it.

> EIO is a general catch-all for an I/O error.
>
> ERESTARTSYS is the error returned by a server that has
> re-booted that is supposed to tell the client-side software
> to get a new file-handle because of an attempt to access with
> a stale file-handle. When getting this error, the client
> should have reopened the file(s) to obtain a new handle.
>

 Definitely no server reboot or HA Failover at the time of the messages.

Thanks
Martin



* Re: nfs_statfs: statfs error = 116
  2003-11-13 14:39 ` Richard B. Johnson
  2003-11-13 14:52   ` Martin.Knoblauch
@ 2003-11-13 15:27   ` Trond Myklebust
  2003-11-13 16:00     ` Richard B. Johnson
  1 sibling, 1 reply; 11+ messages in thread
From: Trond Myklebust @ 2003-11-13 15:27 UTC
  To: root; +Cc: martin.knoblauch , Linux kernel

>>>>> " " == Richard B Johnson <root@chaos.analogic.com> writes:

     > ESTALE happens when a mounted file-system is on a server that
     > went down or re-booted. The file-handles are then "stale".

Sort of. It means that the server is unable to find the file that
corresponds to the filehandle that the client sent it. If the server
strictly follows the NFS specs, then this is only supposed to happen
if somebody else has deleted the file (and this is why designing a
scheme for generating filehandles is such a difficult job).

Some broken servers do, however, "lose" the file in other interesting
and unpredictable ways.
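
A common way for a server to build filehandles is to combine an inode number with a generation number that changes whenever the inode is reused. The sketch below is a toy model of that idea, not the actual scheme of any particular server; struct fh, toy_create(), toy_unlink() and fh_lookup() are hypothetical names. A handle whose file has been removed, or whose slot now holds a different file, resolves to ESTALE instead of silently naming the wrong file.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NSLOTS 8

/* Hypothetical filehandle: an inode slot plus the generation ("incarnation")
 * of that slot at the time the handle was issued. */
struct fh {
    int slot;
    unsigned generation;
};

/* Toy server-side inode table (zero-initialised: all slots free). */
struct inode_slot {
    bool in_use;
    unsigned generation;
};

static struct inode_slot table[NSLOTS];

/* Create a file in the first free slot and bump its generation, so handles
 * issued for a previous occupant of the slot can never match again.
 * Returns 0 on success, -ENOSPC if the table is full. */
int toy_create(struct fh *out)
{
    for (int i = 0; i < NSLOTS; i++) {
        if (!table[i].in_use) {
            table[i].in_use = true;
            table[i].generation++;
            out->slot = i;
            out->generation = table[i].generation;
            return 0;
        }
    }
    return -ENOSPC;
}

/* Removing the file frees its slot; outstanding handles become stale. */
void toy_unlink(const struct fh *h)
{
    assert(h->slot >= 0 && h->slot < NSLOTS);
    table[h->slot].in_use = false;
}

/* Resolve a client's handle: 0 if it still names a live file, -ESTALE if
 * the file is gone or the slot now holds a different file. */
int fh_lookup(const struct fh *h)
{
    if (h->slot < 0 || h->slot >= NSLOTS)
        return -ESTALE;
    const struct inode_slot *s = &table[h->slot];
    if (!s->in_use || s->generation != h->generation)
        return -ESTALE;
    return 0;
}
```

The generation check is what keeps an old handle from resolving to a new file that happens to reuse the same slot, which is one reason a filehandle scheme is hard to get right.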

     > ERESTARTSYS is the error returned by a server that has
     > re-booted that is supposed to tell the client-side software to
     > get a new file-handle because of an attempt to access with a
     > stale file-handle. When getting this error, the client should
     > have reopened the file(s) to obtain a new handle.

ERESTARTSYS actually just means that a signal was received while
inside a system call. If this results in an interruption of that
syscall, the kernel is supposed to translate ERESTARTSYS into the user
error EINTR.

Userland should therefore never have to handle ERESTARTSYS errors.
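
Seen from user space, then, a signal-interrupted call either gets restarted or fails with EINTR; 512 never appears. A common idiom for the EINTR case is to retry the call. A minimal sketch for read(2), where read_retry() is a hypothetical helper name:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Read up to len bytes, retrying while the call is interrupted by a
 * signal before transferring any data (-1 with errno == EINTR).
 * Any other result, including other errors, is returned unchanged. */
ssize_t read_retry(int fd, void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    ssize_t n;
    do {
        n = read(fd, buf, len);
    } while (n == -1 && errno == EINTR);
    return n;
}
```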

Cheers,
  Trond


* Re: nfs_statfs: statfs error = 116
  2003-11-13 15:27   ` Trond Myklebust
@ 2003-11-13 16:00     ` Richard B. Johnson
  2003-11-13 17:03       ` Trond Myklebust
  0 siblings, 1 reply; 11+ messages in thread
From: Richard B. Johnson @ 2003-11-13 16:00 UTC
  To: Trond Myklebust; +Cc: martin.knoblauch , Linux kernel

On Thu, 13 Nov 2003, Trond Myklebust wrote:

> >>>>> " " == Richard B Johnson <root@chaos.analogic.com> writes:
>
>      > ESTALE happens when a mounted file-system is on a server that
>      > went down or re-booted. The file-handles are then "stale".
>
> Sort of. It means that the server is unable to find the file that
> corresponds to the filehandle that the client sent it. If the server
> strictly follows the NFS specs, then this is only supposed to happen
> if somebody else has deleted the file (and this is why designing a
> scheme for generating filehandles is such a difficult job).
>
> Some broken servers do, however, "lose" the file in other interesting
> and unpredictable ways.
>
>      > ERESTARTSYS is the error returned by a server that has
>      > re-booted that is supposed to tell the client-side software to
>      > get a new file-handle because of an attempt to access with a
>      > stale file-handle. When getting this error, the client should
>      > have reopened the file(s) to obtain a new handle.
>
> ERESTARTSYS actually just means that a signal was received while
> inside a system call. If this results in an interruption of that
> syscall, the kernel is supposed to translate ERESTARTSYS into the user
> error EINTR.
>
> Userland should therefore never have to handle ERESTARTSYS errors.
>

Hmmm, maybe I'm getting confused by all the winning-lottery messages,
but it's in the syscall specifications for connect() and
even fcntl(). http://www.infran.ru/Techinfo/syscalls/syscalls_43.html

Also, maybe Linux now claims exclusive ownership and keeps it internal,
but some networking software, nfsd and pcnfsd, might not know about that.
I've seen ERESTARTSYS returned from a DOS (actually FAT) file-handle use
after a server has crashed and come back on-line.

Moot point, though: the reported errors were kernel-internal, reported via
syslog, which I did not know when I responded.


Cheers,
Dick Johnson
Penguin : Linux version 2.4.22 on an i686 machine (797.90 BogoMips).
            Note 96.31% of all statistics are fiction.




* Re: nfs_statfs: statfs error = 116
  2003-11-13 16:00     ` Richard B. Johnson
@ 2003-11-13 17:03       ` Trond Myklebust
  0 siblings, 0 replies; 11+ messages in thread
From: Trond Myklebust @ 2003-11-13 17:03 UTC
  To: root; +Cc: martin.knoblauch , Linux kernel

>>>>> " " == Richard B Johnson <root@chaos.analogic.com> writes:

    >> ERESTARTSYS actually just means that a signal was received
    >> while inside a system call. If this results in an interruption
    >> of that syscall, the kernel is supposed to translate
    >> ERESTARTSYS into the user error EINTR.

     > Hmmm, Maybe I'm getting confused by all the winning-lottery
     > messages, but it's in the syscall specifications for connect()
     > and even
     > fcntl(). http://www.infran.ru/Techinfo/syscalls/syscalls_43.html

AFAICS that documentation was written in 1994, and refers to Linux
v1.0. We've come a long way since then...

Today's Linux userland is supposed to try to comply with the Single
Unix Specification (see http://www.unix-systems.org/version3/)
whenever possible. ERESTARTSYS is missing altogether from the SUSv3
definitions in <errno.h> (and hence does not appear as a valid return
value for any SUSv3-compliant functions).

Note: the Linux manpages do list ERESTARTSYS as still being returned
by the accept() and syslog() system calls. In both those cases,
however, they point out that your libc is supposed to intercept it
before it gets to the user.
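
Which of the two outcomes a program gets (transparent restart or EINTR) is chosen when it installs a signal handler. A minimal sketch using the POSIX sigaction(2) API, with hypothetical names install_handler() and on_signal(): with SA_RESTART set, the kernel restarts many (not all) interrupted system calls instead of failing them with EINTR; without it, they fail with EINTR.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>

static volatile sig_atomic_t got_signal;

/* Async-signal-safe handler: only records that the signal arrived. */
static void on_signal(int sig)
{
    (void)sig;
    got_signal = 1;
}

/* Install on_signal() for sig. If restart is true, request SA_RESTART so
 * that many interrupted system calls resume instead of returning EINTR.
 * Returns 0 on success, -1 on error with errno set by sigaction(). */
int install_handler(int sig, bool restart)
{
    assert(sig > 0);
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = restart ? SA_RESTART : 0;
    return sigaction(sig, &sa, NULL);
}
```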

     > Also, maybe Linux now claims exclusive ownership and keeps it
     > internal, but some networking software, nfsd and pcnfsd, might
     > not know about that.  I've seen ERESTARTSYS returned from a DOS
     > (actually FAT) file-handle use after a server has crashed and
     > come back on-line.

Linux used to be buggy/non-compliant w.r.t. NFS exporting of FAT
filesystems. I'm not sure if that has been fixed yet.

Cheers,
  Trond


* Re: nfs_statfs: statfs error = 116
  2003-11-13 14:52   ` Martin.Knoblauch
@ 2003-11-13 20:26     ` Jesse Pollard
  2003-11-13 20:34       ` Trond Myklebust
  0 siblings, 1 reply; 11+ messages in thread
From: Jesse Pollard @ 2003-11-13 20:26 UTC
  To: Martin.Knoblauch, root; +Cc: Linux kernel

On Thursday 13 November 2003 08:52, Martin.Knoblauch@mscsoftware.com wrote:
> "Richard B. Johnson" <root@chaos.analogic.com> wrote on 11/13/2003 03:39:53 PM:
> > On Thu, 13 Nov 2003, martin.knoblauch  wrote:
[snip]
> > ESTALE happens when a mounted file-system is on a server that
> > went down or re-booted. The file-handles are then "stale".
>
>  I am "almost" sure that there were no reboot or failover events at the
> time of most of the stale messages. But I would not swear to it.

ESTALE should occur whenever the client loses connection to the server,
or thinks it has lost connection. It isn't directly related to the server
other than the fact that a server reboot will also cause it to happen.

This should be a transient failure that recovers once communication is
re-established through the timeouts/retries associated with NFS.

At worst, it can require a remount of the NFS volume.


* Re: nfs_statfs: statfs error = 116
  2003-11-13 20:26     ` Jesse Pollard
@ 2003-11-13 20:34       ` Trond Myklebust
  2003-11-14  8:43         ` Martin.Knoblauch
  0 siblings, 1 reply; 11+ messages in thread
From: Trond Myklebust @ 2003-11-13 20:34 UTC
  To: Jesse Pollard; +Cc: Martin.Knoblauch, root, Linux kernel

>>>>> " " == Jesse Pollard <jesse@cats-chateau.net> writes:

     > ESTALE should occur whenever the client loses connection to
     > the server, or thinks it has lost connection.

No it should not.

Cheers,
  Trond


* Re: nfs_statfs: statfs error = 116
  2003-11-13 20:34       ` Trond Myklebust
@ 2003-11-14  8:43         ` Martin.Knoblauch
  2003-11-14 13:49           ` Trond Myklebust
  0 siblings, 1 reply; 11+ messages in thread
From: Martin.Knoblauch @ 2003-11-14  8:43 UTC
  To: Trond Myklebust; +Cc: Jesse Pollard, Linux kernel, root

Trond Myklebust <trond.myklebust@fys.uio.no> wrote on 11/13/2003 09:34:55 PM:

> >>>>> " " == Jesse Pollard <jesse@cats-chateau.net> writes:
>
>      > ESTALE should occur whenever the client loses connection to
>      > the server, or thinks it has lost connection.
>
> No it should not.
>
> Cheers,
>   Trond
Hi Trond,

 just by accident I found one way a user-space application can get
ESTALE in our setup (Linux client RH-2.4.20-18.7smp, Solaris 2.8
Server). I accidentally ran iozone on two clients with the same output
file residing on the NFS server. Pure luser error, but it
produced ESTALE pretty much reproducibly.

B^HCheers
Martin
--
Martin Knoblauch
Senior System Architect
MSC.software GmbH
Am Moosfeld 13
D-81829 Muenchen, Germany

e-mail: martin.knoblauch@mscsoftware.com
http://www.mscsoftware.com
Phone/Fax: +49-89-431987-189 / -7189
Mobile: +49-174-3069245




* Re: nfs_statfs: statfs error = 116
  2003-11-14  8:43         ` Martin.Knoblauch
@ 2003-11-14 13:49           ` Trond Myklebust
  2003-11-14 14:22             ` Martin.Knoblauch
  0 siblings, 1 reply; 11+ messages in thread
From: Trond Myklebust @ 2003-11-14 13:49 UTC
  To: Martin.Knoblauch; +Cc: Linux kernel

>>>>> " " == Martin Knoblauch <Martin.Knoblauch@mscsoftware.com> writes:

     > I accidentally ran iozone on two clients with the output file
     > being the same and residing on the NFS Server. Pure luser
     > error, but it produced ESTALE pretty much reproducibly.

Sure. This is a prime example of where ESTALE *is* appropriate. One
NFS client is deleting a file on the server while the other is still
using it.

In the NFSv2/v3 protocols, the assumption is that filehandles are
valid for the entire lifetime of the file on the server. IOW only
"unlink()" can cause a valid filehandle to become stale. This is
mainly because there is no notion of open()/close(), so the server
would never be capable of determining when your client has stopped
using the filehandle.

If your 2 processes were running on the same machine, you would have
seen the kernel temporarily rename your file to .nfsXXXXXX in order to
work around the above problem. Delete that file, and you will generate
ESTALE reproducibly too....
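
The semantics that rename protects are the ordinary POSIX ones: a file that is still open stays readable after its last name is removed. A small sketch that checks this on whatever filesystem holds the given path (readable_after_unlink() is a hypothetical helper). On a local filesystem the kernel keeps the inode alive; on an NFSv2/v3 client the same behaviour is emulated by the .nfsXXXXXX rename, which only helps when the opener and the remover share one client.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Create path, write msg, unlink the name while the descriptor is still
 * open, then read the data back through that descriptor.
 * Returns 1 if the data is still readable after unlink(), 0 otherwise. */
int readable_after_unlink(const char *path, const char *msg)
{
    size_t len = strlen(msg);
    char buf[256];
    assert(len < sizeof buf);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return 0;
    if (write(fd, msg, len) != (ssize_t)len || unlink(path) == -1) {
        close(fd);
        return 0;
    }
    /* The name is gone; rewind and read via the open descriptor. */
    int ok = lseek(fd, 0, SEEK_SET) == 0
          && read(fd, buf, len) == (ssize_t)len
          && memcmp(buf, msg, len) == 0;
    close(fd);
    return ok;
}
```

With the writer and the remover on two different NFS clients, as in the iozone case, neither client knows about the other's open file, so the server really removes it and later access fails with ESTALE.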

Cheers,
  Trond


* Re: nfs_statfs: statfs error = 116
  2003-11-14 13:49           ` Trond Myklebust
@ 2003-11-14 14:22             ` Martin.Knoblauch
  0 siblings, 0 replies; 11+ messages in thread
From: Martin.Knoblauch @ 2003-11-14 14:22 UTC
  To: trond.myklebust; +Cc: Linux kernel

Trond Myklebust <trond.myklebust@fys.uio.no> wrote on 11/14/2003 02:49:31 PM:

> >>>>> " " == Martin Knoblauch <Martin.Knoblauch@mscsoftware.com> writes:
>
>      > I accidentally ran iozone on two clients with the output file
>      > being the same and residing on the NFS Server. Pure luser
>      > error, but it produced ESTALE pretty much reproducibly.
>
> Sure. This is a prime example of where ESTALE *is* appropriate. One
> NFS client is deleting a file on the server while the other is still
> using it.
>
> In the NFSv2/v3 protocols, the assumption is that filehandles are
> valid for the entire lifetime of the file on the server. IOW only
> "unlink()" can cause a valid filehandle to become stale. This is
> mainly because there is no notion of open()/close(), so the server
> would never be capable of determining when your client has stopped
> using the filehandle.
>
> If your 2 processes were running on the same machine, you would have
> seen the kernel temporarily rename your file to .nfsXXXXXX in order to
> work around the above problem. Delete that file, and you will generate
> ESTALE reproducibly too....
>
> Cheers,
>   Trond
Trond,

 cool. Great explanation. Always good when you can get those who know
to talk :-)

Cheers
Martin




Thread overview: 11+ messages
2003-11-13 14:15 nfs_statfs: statfs error = 116 martin.knoblauch 
2003-11-13 14:39 ` Richard B. Johnson
2003-11-13 14:52   ` Martin.Knoblauch
2003-11-13 20:26     ` Jesse Pollard
2003-11-13 20:34       ` Trond Myklebust
2003-11-14  8:43         ` Martin.Knoblauch
2003-11-14 13:49           ` Trond Myklebust
2003-11-14 14:22             ` Martin.Knoblauch
2003-11-13 15:27   ` Trond Myklebust
2003-11-13 16:00     ` Richard B. Johnson
2003-11-13 17:03       ` Trond Myklebust
