* "exportfs -a" -> stale NFS filehandle
@ 2007-11-14 23:19 ` Kaz Kylheku
  0 siblings, 0 replies; 13+ messages in thread
From: Kaz Kylheku @ 2007-11-14 23:19 UTC (permalink / raw)
  To: linux-mips

Hi all,

I have an NFS problem on a multi-node MIPS system running kernel
2.6.17.7. NFS utils is 1.1.0. ABI is n32.

One node (call it primary) exports a directory which is mounted by
several others (the secondaries) as their root filesystem.

If I run "exportfs -a" on the primary, the secondary nodes lose their
root filesystem and so everything stops working.

I turned on all NFS debugging on a secondary node (sysctl -w
sunrpc.nfs_debug=65535). What is happening is that NFS operations
suddenly start returning error -151 (stale NFS filehandle).
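
For reference, 151 is the value of ESTALE in the MIPS errno numbering; i686 and most other Linux architectures use the generic value 116, so the same condition shows up there as -116. A minimal sketch of decoding such a return code follows. The table and the function name are illustrative only, not taken from any kernel or nfs-utils source.

```python
# ESTALE values from the kernel's errno headers. MIPS has its own errno
# numbering; "generic" stands for the asm-generic value used by i686 and
# most other architectures. The dict and function names are hypothetical.
ESTALE_BY_ARCH = {
    "mips": 151,
    "generic": 116,
}

def is_stale_handle(ret, arch="mips"):
    """Return True if a negative kernel return code means ESTALE on arch."""
    return ret < 0 and -ret == ESTALE_BY_ARCH[arch]

print(is_stale_handle(-151))             # True
print(is_stale_handle(-151, "generic"))  # False
```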

I don't see exportfs causing this problem on other systems. If I run
"exportfs -a" on a big NFS server (Fedora Core 5, i686) which has lots
of diskless clients, nothing bad happens. (And some of those diskless
clients are MIPS systems just like this one!)

I'm pretty sure that exportfs -a shouldn't screw up the existing mounted
clients.

Could there be some ABI problem that corrupts the effect of the
re-exporting operation on the server?

(This issue reproduces every time. Something which reproduces rarely is a
kernel crash on a secondary node, inside the nfsd process, also
apparently in response to the "exportfs -a". I don't yet have enough
information about that one, such as a call trace, etc. That one I can
drill into, if I have a program counter and call stack.)

* Re: "exportfs -a" -> stale NFS filehandle
  2007-11-14 23:19 ` Kaz Kylheku
  (?)
@ 2007-11-15  0:48 ` Ralf Baechle
  2007-11-15 18:38     ` Kaz Kylheku
  -1 siblings, 1 reply; 13+ messages in thread
From: Ralf Baechle @ 2007-11-15  0:48 UTC (permalink / raw)
  To: Kaz Kylheku; +Cc: linux-mips

On Wed, Nov 14, 2007 at 03:19:43PM -0800, Kaz Kylheku wrote:

> I have an NFS problem on a multi-node MIPS system running kernel
> 2.6.17.7. NFS utils is 1.1.0. ABI is n32.
> 
> One node (call it primary) exports a directory which is mounted by
> several others (the secondaries) as their root filesystem.
> 
> If I run "exportfs -a" on the primary, the secondary nodes lose their
> root filesystem and so everything stops working.
> 
> I turned on all NFS debugging on a secondary node (sysctl -w
> sunrpc.nfs_debug=65535). What is happening is that NFS operations
> suddenly start returning error -151 (stale NFS filehandle).
> 
> I don't see exportfs causing this problem on other systems. If I run
> "exportfs -a" on a big NFS server (Fedora Core 5, i686) which has lots
> of diskless clients, nothing bad happens. (And some of those diskless
> clients are MIPS systems just like this one!)
> 
> I'm pretty sure that exportfs -a shouldn't screw up the existing mounted
> clients.
> 
> Could there be some ABI problem that corrupts the effect of the
> re-exporting operation on the server?

Can you test below patch?

  Ralf

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

diff --git a/arch/mips/kernel/scall64-n32.S b/arch/mips/kernel/scall64-n32.S
index 118be24..01993ec 100644
--- a/arch/mips/kernel/scall64-n32.S
+++ b/arch/mips/kernel/scall64-n32.S
@@ -293,7 +293,7 @@ EXPORT(sysn32_call_table)
 	PTR	sys_ni_syscall			/* 6170, was get_kernel_syms */
 	PTR	sys_ni_syscall			/* was query_module */
 	PTR	sys_quotactl
-	PTR	sys_nfsservctl
+	PTR	compat_sys_nfsservctl
 	PTR	sys_ni_syscall			/* res. for getpmsg */
 	PTR	sys_ni_syscall			/* 6175  for putpmsg */
 	PTR	sys_ni_syscall			/* res. for afs_syscall */

* RE: "exportfs -a" -> stale NFS filehandle
@ 2007-11-15 18:38     ` Kaz Kylheku
  0 siblings, 0 replies; 13+ messages in thread
From: Kaz Kylheku @ 2007-11-15 18:38 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: linux-mips

Ralf Baechle wrote:
> Can you test below patch?
> 
>   Ralf

[ snip ]

> -	PTR	sys_nfsservctl
> +	PTR	compat_sys_nfsservctl

That's damn funny!

I checked for replies this morning, but your e-mail went to my inbox
rather than my linux-mips folder, so I didn't see it.

I made that change just moments ago.

As I'm compiling it, a coworker says, ``Kaz, did you see Ralph's
reply?''. :)

Nope!

* RE: "exportfs -a" -> stale NFS filehandle
@ 2007-11-15 19:26       ` Kaz Kylheku
  0 siblings, 0 replies; 13+ messages in thread
From: Kaz Kylheku @ 2007-11-15 19:26 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: linux-mips

linux-mips-bounce@linux-mips.org wrote:
> Ralf Baechle wrote:
>> Can you test below patch?
>> 
>>   Ralf
> 
> [ snip ]
> 
>> -	PTR	sys_nfsservctl
>> +	PTR	compat_sys_nfsservctl
> 
> That's damn funny!

... but it doesn't work. Now the slave systems won't even boot at all.

  Looking up port of RPC 100003/2 on 127.3.0.1
  Root-NFS: Unable to get nfsd port number from server, using default
  Looking up port of RPC 100005/1 on 127.3.0.1
  Root-NFS: Server returned error -13 while mounting /cf2

Ah, but the reason for /that/ is that I have an n32 patch against
nfsutils in user space already, which has to be backed out.

After backing out the nfsutils patch, the diskless node does boot.

However, the original "exportfs -a" problem comes back!

So this problem is not resolved simply by using the correct compat
routine; it's deeper.

Sigh.

* Re: "exportfs -a" -> stale NFS filehandle
  2007-11-15 19:26       ` Kaz Kylheku
  (?)
@ 2007-11-15 19:45       ` Ralf Baechle
  2007-11-15 20:15           ` Kaz Kylheku
  -1 siblings, 1 reply; 13+ messages in thread
From: Ralf Baechle @ 2007-11-15 19:45 UTC (permalink / raw)
  To: Kaz Kylheku; +Cc: linux-mips

On Thu, Nov 15, 2007 at 11:26:06AM -0800, Kaz Kylheku wrote:

> After backing out the nfsutils patch, the diskless node does boot.
> 
> However, the original "exportfs -a" problem comes back!
> 
> So this problem is not resolved simply by using the correct compat
> routine; it's deeper.
> 
> Sigh.

Thanks for testing anyway!

  Ralf

* RE: "exportfs -a" -> stale NFS filehandle
@ 2007-11-15 20:15           ` Kaz Kylheku
  0 siblings, 0 replies; 13+ messages in thread
From: Kaz Kylheku @ 2007-11-15 20:15 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: linux-mips

Ralf Baechle wrote:
> On Thu, Nov 15, 2007 at 11:26:06AM -0800, Kaz Kylheku wrote:
> 
>> After backing out the nfsutils patch, the diskless node does boot.
>> 
>> However, the original "exportfs -a" problem comes back!
>> 
>> So this problem is not resolved simply by using the correct compat
>> routine; it's deeper. 
>> 
>> Sigh.
> 
> Thanks for testing anyway!

I'm continuing to dig into the problem.

The export logic doesn't even go through nfsctl() anyway, which is why I
originally hadn't even suspected that syscall.

The nfsexport() function in nfsutils first tries opening
"/proc/net/rpc/nfsd.fh/channel". If that works, it uses that, via a
text-based protocol. Only if that interface doesn't exist does it fall
back on the nfsctl(NFSCTL_EXPORT, ...) interface.
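
The selection logic described above can be sketched roughly as follows. This is only an illustration of the "try the /proc channel first, otherwise fall back to the syscall" decision, not the nfsutils code; the function name and the returned labels are made up for the example.

```python
import os

# Hypothetical helper: choose_export_interface and the "channel"/"nfsctl"
# labels are illustrative names, not identifiers from nfsutils.
def choose_export_interface(channel_path):
    """Prefer the /proc cache channel; fall back to the syscall otherwise."""
    try:
        fd = os.open(channel_path, os.O_WRONLY)
    except OSError:
        # Channel file missing or not writable: legacy nfsctl() path.
        return "nfsctl"
    os.close(fd)
    # Channel exists: exports go through the text-based protocol.
    return "channel"

# Channel path as given in this thread; the result depends on the machine.
print(choose_export_interface("/proc/net/rpc/nfsd.fh/channel"))
```

On a kernel that provides the channel file, the nfsctl() path is never reached, which is consistent with the syscall table change having no effect on the export problem.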

* Re: "exportfs -a" -> stale NFS filehandle
  2007-11-15 20:15           ` Kaz Kylheku
  (?)
@ 2007-11-15 23:02           ` Ralf Baechle
  -1 siblings, 0 replies; 13+ messages in thread
From: Ralf Baechle @ 2007-11-15 23:02 UTC (permalink / raw)
  To: Kaz Kylheku; +Cc: linux-mips

On Thu, Nov 15, 2007 at 12:15:39PM -0800, Kaz Kylheku wrote:

> Ralf Baechle wrote:
> > On Thu, Nov 15, 2007 at 11:26:06AM -0800, Kaz Kylheku wrote:
> > 
> >> After backing out the nfsutils patch, the diskless node does boot.
> >> 
> >> However, the original "exportfs -a" problem comes back!
> >> 
> >> So this problem is not resolved simply by using the correct compat
> >> routine; it's deeper. 
> >> 
> >> Sigh.
> > 
> > Thanks for testing anyway!
> 
> I'm continuing to dig into the problem.
> 
> The export logic doesn't even go through nfsctl() anyway, which is why I
> originally hadn't even suspected that syscall.
> 
> The nfsexport() function in nfsutils first tries opening
> "/proc/net/rpc/nfsd.fh/channel". If that works, it uses that, via a
> text-based protocol. Only if that interface doesn't exist does it fall
> back on the nfsctl(NFSCTL_EXPORT, ...) interface.

After checking that the latest glibc still isn't trying to compensate for the
N32 nfsservctl issue in userland, I've applied the patch I sent you earlier.

  Ralf

* RE: "exportfs -a" -> stale NFS filehandle
@ 2007-11-19 22:26             ` Kaz Kylheku
  0 siblings, 0 replies; 13+ messages in thread
From: Kaz Kylheku @ 2007-11-19 22:26 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: linux-mips

Last week, I wrote:
> Ralf Baechle wrote:
>> On Thu, Nov 15, 2007 at 11:26:06AM -0800, Kaz Kylheku wrote:
>> 
>>> After backing out the nfsutils patch, the diskless node does boot.
>>> 
>>> However, the original "exportfs -a" problem comes back!
>>> 
>>> So this problem is not resolved simply by using the correct compat
>>> routine; it's deeper. 
>>> 
>>> Sigh.
>> 
>> Thanks for testing anyway!
> 
> I'm continuing to dig into the problem.
> 
> The export logic doesn't even go through nfsctl() anyway,
> which is why I
> originally hadn't even suspected that syscall.
> 
> The nfsexport() function in nfsutils first tries opening
> "/proc/net/rpc/nfsd.fh/channel". If that works, it uses that, via a
> text-based protocol. Only if that interface doesn't exist does it fall
> back on the nfsctl(NFSCTL_EXPORT, ...) interface.

Basically, the export table is being mismanaged. Simply restarting NFS
(service nfs restart) will cause this problem to appear.

When the system is first booted up and NFS is started in runlevel 3 by
the nfs init script, the exportfs command correctly populates the export
table based on the /etc/exports file.

However, after that, further management of the export table fails. Doing
an "exportfs -a" clears it out. You can see the table in
/proc/net/rpc/nfsd.export/content. Before the operation, the table has
valid entries. After the operation, it simply clears out and stays
empty. 
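
One way to capture that before/after state is to snapshot the content file around the operation. The sketch below is a minimal illustration: the function names are hypothetical, and it assumes that lines starting with "#" in the content file are headers rather than export entries.

```python
# Hypothetical helpers for comparing two snapshots of the export table.
# Assumption: lines beginning with "#" are header/comment lines.
def read_export_entries(content_path):
    """Return the non-empty, non-comment lines of an export table dump."""
    with open(content_path) as f:
        lines = [ln.strip() for ln in f]
    return [ln for ln in lines if ln and not ln.startswith("#")]

def table_was_cleared(before, after):
    """True if the table had entries before and has none afterwards."""
    return bool(before) and not after
```

Calling read_export_entries("/proc/net/rpc/nfsd.export/content") once before and once after "exportfs -a", then passing both lists to table_was_cleared, would flag the behaviour described here.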

This is in spite of the fact that the exportfs command seems to be doing
exactly what it did the first time when NFS was successfully started
(i.e. it's a kernel problem; user space is doing the same thing that
worked before).

I verified that by turning on various additional tracing with sysctl
(sunrpc.nfsd_debug), and I added some extra traces to the function that
adds exports (svc_export_parse) to view the messages that are coming
down the nfsd.fh/channel pipe in /proc.

So the summary is that this problem appears to be some kind of
corruption of the RPC cache for exports.

I did see the kernel crash with an alignment exception once while
reproducing the problem, but haven't been able to reproduce that crash.

end of thread, other threads:[~2007-11-19 22:27 UTC | newest]

Thread overview: 13+ messages
2007-11-14 23:19 "exportfs -a" -> stale NFS filehandle Kaz Kylheku
2007-11-14 23:19 ` Kaz Kylheku
2007-11-15  0:48 ` Ralf Baechle
2007-11-15 18:38   ` Kaz Kylheku
2007-11-15 18:38     ` Kaz Kylheku
2007-11-15 19:26     ` Kaz Kylheku
2007-11-15 19:26       ` Kaz Kylheku
2007-11-15 19:45       ` Ralf Baechle
2007-11-15 20:15         ` Kaz Kylheku
2007-11-15 20:15           ` Kaz Kylheku
2007-11-15 23:02           ` Ralf Baechle
2007-11-19 22:26           ` Kaz Kylheku
2007-11-19 22:26             ` Kaz Kylheku
