All of lore.kernel.org
 help / color / mirror / Atom feed
* what on earth is going on here? paths above mountpoints turn into "(unreachable)"
@ 2015-02-03  0:25 Nix
  2015-02-03 19:53 ` J. Bruce Fields
  0 siblings, 1 reply; 30+ messages in thread
From: Nix @ 2015-02-03  0:25 UTC (permalink / raw)
  To: NFS list

This is with client and server both running NFSv3 on 3.18.4 (an upgrade
to .5 is on the cards soon). (This is *not* the same client I've been
reporting the panic on, though the server is the same, and this client
too has been seeing the panic. Not that that's relevant, since that's a
bug on shutdown, and this isn't. Indeed this may not be a bug at all,
but it's new behaviour with 3.18.x, and it breaks programs, and it's
weird.)

The server says (for this client):

/usr/archive -fsid=25,root_squash,async,subtree_check,crossmnt mutilate(rw,root_squash,insecure)
/usr/archive/series -fsid=29,root_squash,async,subtree_check mutilate(rw,root_squash,insecure)

The client says:

package.srvr.nix:/usr/archive /usr/archive nfs defaults,rw

(i.e. relying on the crossmnt, though I have just changed this to
explicitly mount both mount points, while preserving the crossmnt on the
parent for the sake of other clients that can't mount both because
they're using libnfs and need to follow symlinks from other places under
/usr/archive/ into /usr/archive/series, and the software is too stupid
to understand that there might be more than one mount point involved.)

I'm seeing this bizarre output after a long delay (memory pressure not
required: vapoursynth, which was running here, is using a couple of gig
out of 16GiB, and the machine has 12GiB in buffers/cache):

FileNotFoundError: [Errno 2] No such file or directory: '(unreachable)/Orphan-Black/1'
Failed to retrieve output node. Invalid index specified?
pipe:: Invalid data found when processing input
nix@mutilate 45 .../Orphan-Black/1% pwd -P
/usr/archive/series/Orphan-Black/1
# Well, doesn't this cwd look weird.
nix@mutilate 46 ...//1% ls -l /proc/self/cwd
lrwxrwxrwx 1 nix users 0 Feb  2 23:28 /proc/self/cwd -> /Orphan-Black/1
nix@mutilate 49 .../Orphan-Black/1% ls -id .
624194 .
# Try going out of the directory and back into it again.
nix@mutilate 50 .../Orphan-Black/1% cd -
/tmp
nix@mutilate 51 /tmp% cd -
/usr/archive/series/Orphan-Black/1
# Same inode, but now the cwd is valid!
nix@mutilate 52 .../Orphan-Black/1% ls -id .
624194 .
nix@mutilate 54 .../Orphan-Black/1% ls -l /proc/self/cwd
lrwxrwxrwx 1 nix users 0 Feb  2 23:28 /proc/self/cwd -> /usr/archive/series/Orphan-Black/1

So something about the mountpoint is expiring away, as if it had been
umount -l'ed. No automounter of any kind is running, and (obviously) it
*wasn't* umount -l'ed.

I guess by tomorrow I'll know if this is crossmnt-related, at least... I
know crossmnt is considered bad and evil: is this sort of thing why?

The filesystem being NFS-exported is on a USB-attached disk that spins
down when not in use, but I'd not expect that to do anything other than
cause delays on first access (and indeed even if I turn off spindown
this still happens, even though the filesystem is perfectly accessible,
from the server and indeed from the client: it's just... disconnected,
which seems to greatly annoy a lot of programs.)

If it happens again I'll see if I can arrange to have two processes, one
with a cwd in the full path, one with a cwd in the truncated one. That
at least will tell us whether this is some sort of expiry thing attached
to one mountpoint, and going back into it is de-expiring it for all
users under that mountpoint, or whether we really are seeing a new mount
here, somehow (how, without mount(2) ever being called?!).

-- 
NULL && (void)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: what on earth is going on here? paths above mountpoints turn into "(unreachable)"
  2015-02-03  0:25 what on earth is going on here? paths above mountpoints turn into "(unreachable)" Nix
@ 2015-02-03 19:53 ` J. Bruce Fields
  2015-02-03 19:57   ` Nix
  0 siblings, 1 reply; 30+ messages in thread
From: J. Bruce Fields @ 2015-02-03 19:53 UTC (permalink / raw)
  To: Nix; +Cc: NFS list

On Tue, Feb 03, 2015 at 12:25:18AM +0000, Nix wrote:
> This is with client and server both running NFSv3 on 3.18.4 (an upgrade
> to .5 is on the cards soon). (This is *not* the same client I've been
> reporting the panic on, though the server is the same, and this client
> too has been seeing the panic. Not that that's relevant, since that's a
> bug on shutdown, and this isn't. Indeed this may not be a bug at all,
> but it's new behaviour with 3.18.x, and it breaks programs, and it's
> weird.)
> 
> The server says (for this client):
> 
> /usr/archive -fsid=25,root_squash,async,subtree_check,crossmnt mutilate(rw,root_squash,insecure)
> /usr/archive/series -fsid=29,root_squash,async,subtree_check mutilate(rw,root_squash,insecure)
> 
> The client says:
> 
> package.srvr.nix:/usr/archive /usr/archive nfs defaults,rw
> 
> (i.e. relying on the crossmnt, though I have just changed this to
> explicitly mount both mount points, while preserving the crossmnt on the
> parent for the sake of other clients that can't mount both because
> they're using libnfs and need to follow symlinks from other places under
> /usr/archive/ into /usr/archive/series, and the software is too stupid
> to understand that there might be more than one mount point involved.)
> 
> I'm seeing this bizarre output after a long delay (memory pressure not
> required: vapoursynth, which was running here, is using a couple of gig
> out of 16GiB, and the machine has 12GiB in buffers/cache):
> 
> FileNotFoundError: [Errno 2] No such file or directory: '(unreachable)/Orphan-Black/1'

Haven't really read this carefully, just noticed the ENOENT.  There was
49a068f82a "rpc: fix xdr_truncate_encode to handle buffer ending on page
boundary" recently, fixing a problem introduced in 3.16 that could I
think cause an enoent if Orphan-Black/ was a large-ish directory.

Actually, that fix is in 3.18.3, never mind....

--b.

> Failed to retrieve output node. Invalid index specified?
> pipe:: Invalid data found when processing input
> nix@mutilate 45 .../Orphan-Black/1% pwd -P
> /usr/archive/series/Orphan-Black/1
> # Well, doesn't this cwd look weird.
> nix@mutilate 46 ...//1% ls -l /proc/self/cwd
> lrwxrwxrwx 1 nix users 0 Feb  2 23:28 /proc/self/cwd -> /Orphan-Black/1
> nix@mutilate 49 .../Orphan-Black/1% ls -id .
> 624194 .
> # Try going out of the directory and back into it again.
> nix@mutilate 50 .../Orphan-Black/1% cd -
> /tmp
> nix@mutilate 51 /tmp% cd -
> /usr/archive/series/Orphan-Black/1
> # Same inode, but now the cwd is valid!
> nix@mutilate 52 .../Orphan-Black/1% ls -id .
> 624194 .
> nix@mutilate 54 .../Orphan-Black/1% ls -l /proc/self/cwd
> lrwxrwxrwx 1 nix users 0 Feb  2 23:28 /proc/self/cwd -> /usr/archive/series/Orphan-Black/1
> 
> So something about the mountpoint is expiring away, as if it had been
> umount -l'ed. No automounter of any kind is running, and (obviously) it
> *wasn't* umount -l'ed.
> 
> I guess by tomorrow I'll know if this is crossmnt-related, at least... I
> know crossmnt is considered bad and evil: is this sort of thing why?
> 
> The filesystem being NFS-exported is on a USB-attached disk that spins
> down when not in use, but I'd not expect that to do anything other than
> cause delays on first access (and indeed even if I turn off spindown
> this still happens, even though the filesystem is perfectly accessible,
> from the server and indeed from the client: it's just... disconnected,
> which seems to greatly annoy a lot of programs.)
> 
> If it happens again I'll see if I can arrange to have two processes, one
> with a cwd in the full path, one with a cwd in the truncated one. That
> at least will tell us whether this is some sort of expiry thing attached
> to one mountpoint, and going back into it is de-expiring it for all
> users under that mountpoint, or whether we really are seeing a new mount
> here, somehow (how, without mount(2) ever being called?!).
> 
> -- 
> NULL && (void)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: what on earth is going on here? paths above mountpoints turn into "(unreachable)"
  2015-02-03 19:53 ` J. Bruce Fields
@ 2015-02-03 19:57   ` Nix
  2015-02-04 23:28     ` Nix
  0 siblings, 1 reply; 30+ messages in thread
From: Nix @ 2015-02-03 19:57 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: NFS list

On 3 Feb 2015, J. Bruce Fields spake thusly:

> On Tue, Feb 03, 2015 at 12:25:18AM +0000, Nix wrote:
>> I'm seeing this bizarre output after a long delay (memory pressure not
>> required: vapoursynth, which was running here, is using a couple of gig
>> out of 16GiB, and the machine has 12GiB in buffers/cache):
>> 
>> FileNotFoundError: [Errno 2] No such file or directory: '(unreachable)/Orphan-Black/1'
>
> Haven't really read this carefully, just noticed the ENOENT.  There was
> 49a068f82a "rpc: fix xdr_truncate_encode to handle buffer ending on page
> boundary" recently, fixing a problem introduced in 3.16 that could I
> think cause an enoent if Orphan-Black/ was a large-ish directory.

It's got three files in it. Orphan-Black/1 has fifteen. Way under, say,
a page.

The problem hasn't recurred since I mounted /usr/archive and
/usr/archive/series explicitly rather than relying on nohide, so I'd
guess the problem lies there: it is, after all, evil.

> Actually, that fix is in 3.18.3, never mind....

Guess it can't be that then :)

-- 
NULL && (void)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: what on earth is going on here? paths above mountpoints turn into "(unreachable)"
  2015-02-03 19:57   ` Nix
@ 2015-02-04 23:28     ` Nix
  2015-02-05  0:26       ` NeilBrown
  0 siblings, 1 reply; 30+ messages in thread
From: Nix @ 2015-02-04 23:28 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: NFS list

On 3 Feb 2015, nix@esperi.org.uk outgrape:

> On 3 Feb 2015, J. Bruce Fields spake thusly:
>
>> On Tue, Feb 03, 2015 at 12:25:18AM +0000, Nix wrote:
>>> I'm seeing this bizarre output after a long delay (memory pressure not
>>> required: vapoursynth, which was running here, is using a couple of gig
>>> out of 16GiB, and the machine has 12GiB in buffers/cache):
>>> 
>>> FileNotFoundError: [Errno 2] No such file or directory: '(unreachable)/Orphan-Black/1'
>>
>> Haven't really read this carefully, just noticed the ENOENT.  There was
>> 49a068f82a "rpc: fix xdr_truncate_encode to handle buffer ending on page
>> boundary" recently, fixing a problem introduced in 3.16 that could I
>> think cause an enoent if Orphan-Black/ was a large-ish directory.
>
> It's got three files in it. Orphan-Black/1 has fifteen. Way under, say,
> a page.
>
> The problem hasn't recurred since I mounted /usr/archive and
> /usr/archive/series explicitly rather than relying on nohide, so I'd
> guess the problem lies there: it is, after all, evil.

It doesn't. It still recurs.

If I cd out of the tree entirely (say, to /tmp) then back in, the
mountpoint often reconnects for all its users: if I do things in there
(e.g. an ls) it always seems to. If I cd within the disconnected subtree
via a relative path, it doesn't.

So running this often helps:

while sleep 300; do cd /usr/archive/series/Orphan-Black/2; ls > /dev/null; cd /tmp; done

but not always -- if the thing vanishes <300s before the last problem,
we're still in for it.

-- 
NULL && (void)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: what on earth is going on here? paths above mountpoints turn into "(unreachable)"
  2015-02-04 23:28     ` Nix
@ 2015-02-05  0:26       ` NeilBrown
  2015-02-10 17:48         ` Nix
  0 siblings, 1 reply; 30+ messages in thread
From: NeilBrown @ 2015-02-05  0:26 UTC (permalink / raw)
  To: Nix; +Cc: J. Bruce Fields, NFS list

[-- Attachment #1: Type: text/plain, Size: 2417 bytes --]

On Wed, 04 Feb 2015 23:28:17 +0000 Nix <nix@esperi.org.uk> wrote:

> On 3 Feb 2015, nix@esperi.org.uk outgrape:
> 
> > On 3 Feb 2015, J. Bruce Fields spake thusly:
> >
> >> On Tue, Feb 03, 2015 at 12:25:18AM +0000, Nix wrote:
> >>> I'm seeing this bizarre output after a long delay (memory pressure not
> >>> required: vapoursynth, which was running here, is using a couple of gig
> >>> out of 16GiB, and the machine has 12GiB in buffers/cache):
> >>> 
> >>> FileNotFoundError: [Errno 2] No such file or directory: '(unreachable)/Orphan-Black/1'
> >>
> >> Haven't really read this carefully, just noticed the ENOENT.  There was
> >> 49a068f82a "rpc: fix xdr_truncate_encode to handle buffer ending on page
> >> boundary" recently, fixing a problem introduced in 3.16 that could I
> >> think cause an enoent if Orphan-Black/ was a large-ish directory.
> >
> > It's got three files in it. Orphan-Black/1 has fifteen. Way under, say,
> > a page.
> >
> > The problem hasn't recurred since I mounted /usr/archive and
> > /usr/archive/series explicitly rather than relying on nohide, so I'd
> > guess the problem lies there: it is, after all, evil.
> 
> It doesn't. It still recurs.

Is /usr/archive still exported to mutilate with crossmnt?
If it is, can you change to not do that (it is quite possible to have
different export options for different clients).

I think that if crossmnt is enabled on the server, then explicitly
mounting /usr/archive/series will have the same net effect as not doing so
(though I'm not 100% certain).

Also, can you try changing
   /proc/sys/fs/nfs/nfs_mountpoint_timeout

It defaults to 500 (seconds - time for light from Sun to reach Earth).
If you make it smaller and the problem gets worse, or make it much bigger
and the problem goes away, that would be interesting.
If it makes no difference, that also would be interesting.

NeilBrown

> 
> If I cd out of the tree entirely (say, to /tmp) then back in, the
> mountpoint often reconnects for all its users: if I do things in there
> (e.g. an ls) it always seems to. If I cd within the disconnected subtree
> via a relative path, it doesn't.
> 
> So running this often helps:
> 
> while sleep 300; do cd /usr/archive/series/Orphan-Black/2; ls > /dev/null; cd /tmp; done
> 
> but not always -- if the thing vanishes <300s before the last problem,
> we're still in for it.
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: what on earth is going on here? paths above mountpoints turn into "(unreachable)"
  2015-02-05  0:26       ` NeilBrown
@ 2015-02-10 17:48         ` Nix
  2015-02-10 18:32           ` J. Bruce Fields
  2015-02-11  3:07           ` NeilBrown
  0 siblings, 2 replies; 30+ messages in thread
From: Nix @ 2015-02-10 17:48 UTC (permalink / raw)
  To: NeilBrown; +Cc: J. Bruce Fields, NFS list

On 5 Feb 2015, NeilBrown spake thusly:

> On Wed, 04 Feb 2015 23:28:17 +0000 Nix <nix@esperi.org.uk> wrote:
>> It doesn't. It still recurs.
>
> Is /usr/archive still exported to mutilate with crossmnt?
> If it is, can you change to not do that (it is quite possible to have
> different export options for different clients).

OK. Adjusted.

> I think that if crossmnt is enabled on the server, then explicitly
> mounting /usr/archive/series will have the same net effect as not doing so
> (though I'm not 100% certain).
>
> Also, can you try changing
>    /proc/sys/fs/nfs/nfs_mountpoint_timeout
>
> It defaults to 500 (seconds - time for light from Sun to reach Earth).
> If you make it smaller and the problem gets worse, or make it much bigger
> and the problem goes away, that would be interesting.
> If it makes no difference, that also would be interesting.

Seems to make no difference, which is distinctly surprising. If
anything, it happens more often at the default value than at either the
high or low values. It's very erratic: it happened ten times in one day,
then three days passed and it didn't happen at all... system under
very similar load the whole time.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: what on earth is going on here? paths above mountpoints turn into "(unreachable)"
  2015-02-10 17:48         ` Nix
@ 2015-02-10 18:32           ` J. Bruce Fields
  2015-02-11 23:07             ` Nix
  2015-02-14 13:17             ` Nix
  2015-02-11  3:07           ` NeilBrown
  1 sibling, 2 replies; 30+ messages in thread
From: J. Bruce Fields @ 2015-02-10 18:32 UTC (permalink / raw)
  To: Nix; +Cc: NeilBrown, NFS list

On Tue, Feb 10, 2015 at 05:48:48PM +0000, Nix wrote:
> On 5 Feb 2015, NeilBrown spake thusly:
> 
> > On Wed, 04 Feb 2015 23:28:17 +0000 Nix <nix@esperi.org.uk> wrote:
> >> It doesn't. It still recurs.
> >
> > Is /usr/archive still exported to mutilate with crossmnt?
> > If it is, can you change to not do that (it is quite possible to have
> > different export options for different clients).
> 
> OK. Adjusted.
> 
> > I think that if crossmnt is enabled on the server, then explicitly
> > mounting /usr/archive/series will have the same net effect as not doing so
> > (though I'm not 100% certain).
> >
> > Also, can you try changing
> >    /proc/sys/fs/nfs/nfs_mountpoint_timeout
> >
> > It defaults to 500 (seconds - time for light from Sun to reach Earth).
> > If you make it smaller and the problem gets worse, or make it much bigger
> > and the problem goes away, that would be interesting.
> > If it makes no difference, that also would be interesting.
> 
> Seems to make no difference, which is distinctly surprising. If
> anything, it happens more often at the default value than at either the
> high or low values. It's very erratic: it happened ten times in one day,
> then three days passed and it didn't happen at all... system under
> very similar load the whole time.
> 
> >From other prompts, what I'm seeing now -- but wasn't then, before I
> took the crossmnt out -- is an epidemic of spontaneous unmounting: i.e.,
> /usr/archive/series suddenly vanishes until remounted.
> 
> I might just reboot all systems involved in this mess and hope it goes
> away. I have no *clue* what's going on, I've never seen it before, maybe
> it'll stop if I no longer believe in it.

It might be interesting to see output from

	rpc.debug -m rpc -s cache
	cat /proc/net/rpc/nfsd.export/content
	cat /proc/net/rpc/nfsd.fh/content

especially after the problem manifests.

Also, /usr/archive/series is a separate filesystem from /usr/archive,
right?  (The output of "mount" run on the server might also be useful.)

The reason crossmnt is considered "bad and evil" is that nfsv2 and v3
clients don't necessarily expect mountpoints within exports, and may be
get confused when (for example), they discover to files with the same
inode number that appear to be on the same filesystem.

I'm  not actually sure what the current linux client does--I think it
may be smart enough to use the fsid to avoid at least some of those
problems.  But NFSv4 clients are the only ones that should really be
counted on to get this right.

--b.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: what on earth is going on here? paths above mountpoints turn into "(unreachable)"
  2015-02-10 17:48         ` Nix
  2015-02-10 18:32           ` J. Bruce Fields
@ 2015-02-11  3:07           ` NeilBrown
  2015-02-11 23:11             ` Nix
  1 sibling, 1 reply; 30+ messages in thread
From: NeilBrown @ 2015-02-11  3:07 UTC (permalink / raw)
  To: Nix; +Cc: J. Bruce Fields, NFS list

[-- Attachment #1: Type: text/plain, Size: 2723 bytes --]

On Tue, 10 Feb 2015 17:48:48 +0000 Nix <nix@esperi.org.uk> wrote:

> On 5 Feb 2015, NeilBrown spake thusly:
> 
> > On Wed, 04 Feb 2015 23:28:17 +0000 Nix <nix@esperi.org.uk> wrote:
> >> It doesn't. It still recurs.
> >
> > Is /usr/archive still exported to mutilate with crossmnt?
> > If it is, can you change to not do that (it is quite possible to have
> > different export options for different clients).
> 
> OK. Adjusted.
> 
> > I think that if crossmnt is enabled on the server, then explicitly
> > mounting /usr/archive/series will have the same net effect as not doing so
> > (though I'm not 100% certain).
> >
> > Also, can you try changing
> >    /proc/sys/fs/nfs/nfs_mountpoint_timeout
> >
> > It defaults to 500 (seconds - time for light from Sun to reach Earth).
> > If you make it smaller and the problem gets worse, or make it much bigger
> > and the problem goes away, that would be interesting.
> > If it makes no difference, that also would be interesting.
> 
> Seems to make no difference, which is distinctly surprising. If
> anything, it happens more often at the default value than at either the
> high or low values. It's very erratic: it happened ten times in one day,
> then three days passed and it didn't happen at all... system under
> very similar load the whole time.
> 
> >From other prompts, what I'm seeing now -- but wasn't then, before I
> took the crossmnt out -- is an epidemic of spontaneous unmounting: i.e.,
> /usr/archive/series suddenly vanishes until remounted.
> 
> I might just reboot all systems involved in this mess and hope it goes
> away. I have no *clue* what's going on, I've never seen it before, maybe
> it'll stop if I no longer believe in it.
> 

This all sounds remarkably similar to a problem that a customer reported
recently.
In that case the server was a NetApp and v4 was in use and the server seemed
to suggest that it was using volatile file handles.
If a filehandle for a mounted-on directory changes, then (I think) a new
inode will be allocated and the mountpoint will effectively disappear
(though I think it should remain in /proc/mounts).

However your have a Linux server and v3, so if it is the same problem, then I
completely mis-diagnosed it.

I wonder if something is going wrong in nfs_prime_dcache().  The code looks
right, but it is a little complex...

You could rule that out by disabling READDIRPLUS by using the nordirplus
mount option.  If that makes the proble go away, it would be very
interesting...

A more intrusive debugging approach would be to get d_drop() to scream if the
dentry being dropped had DCACHE_MOUNTED set.

Are you able to try either of those?

NeilBrown

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: what on earth is going on here? paths above mountpoints turn into "(unreachable)"
  2015-02-10 18:32           ` J. Bruce Fields
@ 2015-02-11 23:07             ` Nix
  2015-02-11 23:18               ` NeilBrown
  2015-02-12 15:38               ` J. Bruce Fields
  2015-02-14 13:17             ` Nix
  1 sibling, 2 replies; 30+ messages in thread
From: Nix @ 2015-02-11 23:07 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: NeilBrown, NFS list

On 10 Feb 2015, J. Bruce Fields said:

> On Tue, Feb 10, 2015 at 05:48:48PM +0000, Nix wrote:
>> On 5 Feb 2015, NeilBrown spake thusly:
>> 
>> > On Wed, 04 Feb 2015 23:28:17 +0000 Nix <nix@esperi.org.uk> wrote:
>> >> It doesn't. It still recurs.
>> >
>> > Is /usr/archive still exported to mutilate with crossmnt?
>> > If it is, can you change to not do that (it is quite possible to have
>> > different export options for different clients).
>> 
>> OK. Adjusted.
>> 
>> > I think that if crossmnt is enabled on the server, then explicitly
>> > mounting /usr/archive/series will have the same net effect as not doing so
>> > (though I'm not 100% certain).
>> >
>> > Also, can you try changing
>> >    /proc/sys/fs/nfs/nfs_mountpoint_timeout
>> >
>> > It defaults to 500 (seconds - time for light from Sun to reach Earth).
>> > If you make it smaller and the problem gets worse, or make it much bigger
>> > and the problem goes away, that would be interesting.
>> > If it makes no difference, that also would be interesting.
>> 
>> Seems to make no difference, which is distinctly surprising. If
>> anything, it happens more often at the default value than at either the
>> high or low values. It's very erratic: it happened ten times in one day,
>> then three days passed and it didn't happen at all... system under
>> very similar load the whole time.
>> 
>> >From other prompts, what I'm seeing now -- but wasn't then, before I
>> took the crossmnt out -- is an epidemic of spontaneous unmounting: i.e.,
>> /usr/archive/series suddenly vanishes until remounted.
>> 
>> I might just reboot all systems involved in this mess and hope it goes
>> away. I have no *clue* what's going on, I've never seen it before, maybe
>> it'll stop if I no longer believe in it.
>
> It might be interesting to see output from
>
> 	rpc.debug -m rpc -s cache
> 	cat /proc/net/rpc/nfsd.export/content
> 	cat /proc/net/rpc/nfsd.fh/content
>
> especially after the problem manifests.

It's manifested right now, as a matter of fact.

# cat /proc/net/rpc/nfsd.export/content
#path domain(flags)
/usr/src        mutilate.wkstn.nix(rw,no_root_squash,async,wdelay,no_subtree_check,fsid=16,uuid=333950aa:8e3f440a:bc94d0cc:4adae198,sec=1)
/usr/share/texlive      mutilate.wkstn.nix(rw,no_root_squash,async,wdelay,fsid=7,uuid=5cccc224:a92440ee:b4450447:3898c2ec,sec=1)
/home/.spindle.srvr.nix mutilate.wkstn.nix(rw,no_root_squash,async,wdelay,no_subtree_check,fsid=1,uuid=95bd22c2:253c456f:8e36b6cf:b9ecd4ef,sec=1)
/usr/archive/series     *.srvr.nix,xios.srvr.nix(ro,insecure,root_squash,async,wdelay,no_subtree_check,fsid=29,uuid=543a1ca9:d17246ca:b6c53092:5896549d,sec=1)
/usr/lib/X11/fonts      mutilate.wkstn.nix(ro,root_squash,async,wdelay,fsid=12,uuid=5cccc224:a92440ee:b4450447:3898c2ec,sec=1)
/home/.spindle.srvr.nix *.srvr.nix,fold.srvr.nix(rw,root_squash,async,wdelay,no_subtree_check,fsid=1,uuid=95bd22c2:253c456f:8e36b6cf:b9ecd4ef,sec=1)
/usr/archive    mutilate.wkstn.nix(rw,insecure,root_squash,async,wdelay,fsid=25,uuid=d20e3edd:06a54a9b:85dcfa19:62975969,sec=1)

# note: no /usr/archive/series, though I mounted it on mutilate and did
# not unmount it: however, it no longer appears in /proc/mounts on
# mutilate and appears as an empty directory under /usr/archive.
# However, it *does* appear here:

# cat /proc/net/rpc/nfsd.fh/content
#domain fsidtype fsid [path]
*.srvr.nix,xios.srvr.nix 1 0x0000001d /usr/archive/series
mutilate.wkstn.nix 1 0x0000000f /etc/shai-hulud
mutilate.wkstn.nix 1 0x0000000b /pkg/non-free
mutilate.wkstn.nix 1 0x00000016 /usr/share/emacs/site-lisp
mutilate.wkstn.nix 1 0x00000012 /usr/share/httpd/htdocs/munin
mutilate.wkstn.nix 1 0x00000013 /usr/share/clamav
mutilate.wkstn.nix 1 0x0000000a /usr/share/nethack
mutilate.wkstn.nix 1 0x00000009 /usr/share/xplanet
mutilate.wkstn.nix 1 0x00000008 /usr/share/xemacs
mutilate.wkstn.nix 1 0x00000015 /usr/share/flightgear
mutilate.wkstn.nix 1 0x00000005 /usr/doc
mutilate.wkstn.nix 1 0x00000006 /usr/info
mutilate.wkstn.nix 1 0x00000011 /var/state/munin
mutilate.wkstn.nix 1 0x0000000e /var/log.real
mutilate.wkstn.nix 1 0x00000007 /usr/share/texlive
mutilate.wkstn.nix 1 0x00000010 /usr/src
mutilate.wkstn.nix 1 0x0000000c /usr/lib/X11/fonts
mutilate.wkstn.nix 1 0x00000019 /usr/archive
mutilate.wkstn.nix 1 0x0000001d /usr/archive/series
mutilate.wkstn.nix 1 0x00000001 /home/.spindle.srvr.nix
*.srvr.nix,fold.srvr.nix 1 0x00000001 /home/.spindle.srvr.nix

When this happens, I get an (unreachable) and broken symlink under /proc
(not really surprising as the mountpoint has gone) -- but in this
situation, cd'ing out and back in does not fix it, only a remount does.
I'm not surprised by *those* symptoms at all.

> Also, /usr/archive/series is a separate filesystem from /usr/archive,
> right?  (The output of "mount" run on the server might also be useful.)

They are separate server filesystems:

/dev/mapper/main-archive /usr/archive ext4 rw,nosuid,nodev,relatime,nobarrier,commit=30,data=ordered 0 0
/dev/sdc1 /usr/archive/series ext4 rw,nosuid,nodev,relatime,commit=30,data=ordered 0 0
/dev/mapper/main-winbackup /usr/archive/winbackup ext4 rw,nosuid,nodev,relatime,nobarrier,commit=30,data=ordered 0 0

> The reason crossmnt is considered "bad and evil" is that nfsv2 and v3
> clients don't necessarily expect mountpoints within exports, and may be
> get confused when (for example), they discover to files with the same
> inode number that appear to be on the same filesystem.

That I expected. NFS mounts within NFS mounts are presumably fine (I
hope so, I've been using them extensively for decades).

> I'm  not actually sure what the current linux client does--I think it
> may be smart enough to use the fsid to avoid at least some of those
> problems.  But NFSv4 clients are the only ones that should really be
> counted on to get this right.

I wish I could get NFSv4 to work. It's just screamed about a lack of
adequate authentication every time I've tried it, and my network is so
NFS-dependent that significant experimentation is difficult (getting
anything wrong tends to cause my entire desktop to deadlock in seconds).
I suppose I should set up some VMs and play in there :)

-- 
NULL && (void)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: what on earth is going on here? paths above mountpoints turn into "(unreachable)"
  2015-02-11  3:07           ` NeilBrown
@ 2015-02-11 23:11             ` Nix
  0 siblings, 0 replies; 30+ messages in thread
From: Nix @ 2015-02-11 23:11 UTC (permalink / raw)
  To: NeilBrown; +Cc: J. Bruce Fields, NFS list

On 11 Feb 2015, NeilBrown spake thusly:
> I wonder if something is going wrong in nfs_prime_dcache().  The code looks
> right, but it is a little complex...
>
> You could rule that out by disabling READDIRPLUS by using the nordirplus
> mount option.  If that makes the proble go away, it would be very
> interesting...
>
> A more intrusive debugging approach would be to get d_drop() to scream if the
> dentry being dropped had DCACHE_MOUNTED set.
>
> Are you able to try either of those?

I'll test both (the former first). The problem's still happening several
times a day, so I should be able to get back to you fairly fast...

-- 
NULL && (void)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: what on earth is going on here? paths above mountpoints turn into "(unreachable)"
  2015-02-11 23:07             ` Nix
@ 2015-02-11 23:18               ` NeilBrown
  2015-02-12  1:50                 ` Nix
  2015-02-12 15:38               ` J. Bruce Fields
  1 sibling, 1 reply; 30+ messages in thread
From: NeilBrown @ 2015-02-11 23:18 UTC (permalink / raw)
  To: Nix; +Cc: J. Bruce Fields, NFS list

[-- Attachment #1: Type: text/plain, Size: 785 bytes --]


> > It might be interesting to see output from
> >
> > 	rpc.debug -m rpc -s cache
> > 	cat /proc/net/rpc/nfsd.export/content
> > 	cat /proc/net/rpc/nfsd.fh/content
> >
> > especially after the problem manifests.
> 
> It's manifested right now, as a matter of fact.
> 
> # cat /proc/net/rpc/nfsd.export/content
> #path domain(flags)
> /usr/src        mutilate.wkstn.nix(rw,no_root_squash,async,wdelay,no_subtree_check,fsid=16,uuid=333950aa:8e3f440a:bc94d0cc:4adae198,sec=1)
> /usr/share/texlive      mutilate.wkstn.nix(rw,no_root_squash,async,wdelay,fsid=7,uuid=5cccc224:a92440ee:b4450447:3898c2ec,sec=1)

You didn't run the "rpcdebug" command first (there is no '.').
That causes the 'content' files to contain extra information which might be
useful.

NeilBrown

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: what on earth is going on here? paths above mountpoints turn into "(unreachable)"
  2015-02-11 23:18               ` NeilBrown
@ 2015-02-12  1:50                 ` Nix
  0 siblings, 0 replies; 30+ messages in thread
From: Nix @ 2015-02-12  1:50 UTC (permalink / raw)
  To: NeilBrown; +Cc: J. Bruce Fields, NFS list

On 11 Feb 2015, NeilBrown said:

>
>> > It might be interesting to see output from
>> >
>> > 	rpc.debug -m rpc -s cache
>> > 	cat /proc/net/rpc/nfsd.export/content
>> > 	cat /proc/net/rpc/nfsd.fh/content
>> >
>> > especially after the problem manifests.
>> 
>> It's manifested right now, as a matter of fact.
>> 
>> # cat /proc/net/rpc/nfsd.export/content
>> #path domain(flags)
>> /usr/src        mutilate.wkstn.nix(rw,no_root_squash,async,wdelay,no_subtree_check,fsid=16,uuid=333950aa:8e3f440a:bc94d0cc:4adae198,sec=1)
>> /usr/share/texlive      mutilate.wkstn.nix(rw,no_root_squash,async,wdelay,fsid=7,uuid=5cccc224:a92440ee:b4450447:3898c2ec,sec=1)
>
> You didn't run the "rpcdebug" command first (there is no '.').
> That causes the 'content' files to contain extra information which might be
> useful.

Ah. That's because of this:

/proc/sys/sunrpc/rpc_debug: No such file or directory

Looks like I'll need to recompile with CONFIG_SUNRPC_DEBUG to do that. I
have a stable kernel upgrade coming up anyway: I'll get that done and
see if the problem recurs after the usual round-the-houses reboot...
with my luck, it'll go away :/

-- 
NULL && (void)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: what on earth is going on here? paths above mountpoints turn into "(unreachable)"
  2015-02-11 23:07             ` Nix
  2015-02-11 23:18               ` NeilBrown
@ 2015-02-12 15:38               ` J. Bruce Fields
  1 sibling, 0 replies; 30+ messages in thread
From: J. Bruce Fields @ 2015-02-12 15:38 UTC (permalink / raw)
  To: Nix; +Cc: NeilBrown, NFS list

On Wed, Feb 11, 2015 at 11:07:42PM +0000, Nix wrote:
> On 10 Feb 2015, J. Bruce Fields said:
> 
> > On Tue, Feb 10, 2015 at 05:48:48PM +0000, Nix wrote:
> >> On 5 Feb 2015, NeilBrown spake thusly:
> >> 
> >> > On Wed, 04 Feb 2015 23:28:17 +0000 Nix <nix@esperi.org.uk> wrote:
> >> >> It doesn't. It still recurs.
> >> >
> >> > Is /usr/archive still exported to mutilate with crossmnt?
> >> > If it is, can you change to not do that (it is quite possible to have
> >> > different export options for different clients).
> >> 
> >> OK. Adjusted.
> >> 
> >> > I think that if crossmnt is enabled on the server, then explicitly
> >> > mounting /usr/archive/series will have the same net effect as not doing so
> >> > (though I'm not 100% certain).
> >> >
> >> > Also, can you try changing
> >> >    /proc/sys/fs/nfs/nfs_mountpoint_timeout
> >> >
> >> > It defaults to 500 (seconds - time for light from Sun to reach Earth).
> >> > If you make it smaller and the problem gets worse, or make it much bigger
> >> > and the problem goes away, that would be interesting.
> >> > If it makes no difference, that also would be interesting.
> >> 
> >> Seems to make no difference, which is distinctly surprising. If
> >> anything, it happens more often at the default value than at either the
> >> high or low values. It's very erratic: it happened ten times in one day,
> >> then three days passed and it didn't happen at all... system under
> >> very similar load the whole time.
> >> 
> >> >From other prompts, what I'm seeing now -- but wasn't then, before I
> >> took the crossmnt out -- is an epidemic of spontaneous unmounting: i.e.,
> >> /usr/archive/series suddenly vanishes until remounted.
> >> 
> >> I might just reboot all systems involved in this mess and hope it goes
> >> away. I have no *clue* what's going on, I've never seen it before, maybe
> >> it'll stop if I no longer believe in it.
> >
> > It might be interesting to see output from
> >
> > 	rpc.debug -m rpc -s cache
> > 	cat /proc/net/rpc/nfsd.export/content
> > 	cat /proc/net/rpc/nfsd.fh/content
> >
> > especially after the problem manifests.
> 
> It's manifested right now, as a matter of fact.

Thanks.  Unfortunately nothing there really shouts wrong to me there.

--b.

> 
> # cat /proc/net/rpc/nfsd.export/content
> #path domain(flags)
> /usr/src        mutilate.wkstn.nix(rw,no_root_squash,async,wdelay,no_subtree_check,fsid=16,uuid=333950aa:8e3f440a:bc94d0cc:4adae198,sec=1)
> /usr/share/texlive      mutilate.wkstn.nix(rw,no_root_squash,async,wdelay,fsid=7,uuid=5cccc224:a92440ee:b4450447:3898c2ec,sec=1)
> /home/.spindle.srvr.nix mutilate.wkstn.nix(rw,no_root_squash,async,wdelay,no_subtree_check,fsid=1,uuid=95bd22c2:253c456f:8e36b6cf:b9ecd4ef,sec=1)
> /usr/archive/series     *.srvr.nix,xios.srvr.nix(ro,insecure,root_squash,async,wdelay,no_subtree_check,fsid=29,uuid=543a1ca9:d17246ca:b6c53092:5896549d,sec=1)
> /usr/lib/X11/fonts      mutilate.wkstn.nix(ro,root_squash,async,wdelay,fsid=12,uuid=5cccc224:a92440ee:b4450447:3898c2ec,sec=1)
> /home/.spindle.srvr.nix *.srvr.nix,fold.srvr.nix(rw,root_squash,async,wdelay,no_subtree_check,fsid=1,uuid=95bd22c2:253c456f:8e36b6cf:b9ecd4ef,sec=1)
> /usr/archive    mutilate.wkstn.nix(rw,insecure,root_squash,async,wdelay,fsid=25,uuid=d20e3edd:06a54a9b:85dcfa19:62975969,sec=1)
> 
> # note: no /usr/archive/series, though I mounted it on mutilate and did
> # not unmount it: however, it no longer appears in /proc/mounts on
> # mutilate and appears as an empty directory under /usr/archive.
> # However, it *does* appear here:
> 
> # cat /proc/net/rpc/nfsd.fh/content
> #domain fsidtype fsid [path]
> *.srvr.nix,xios.srvr.nix 1 0x0000001d /usr/archive/series
> mutilate.wkstn.nix 1 0x0000000f /etc/shai-hulud
> mutilate.wkstn.nix 1 0x0000000b /pkg/non-free
> mutilate.wkstn.nix 1 0x00000016 /usr/share/emacs/site-lisp
> mutilate.wkstn.nix 1 0x00000012 /usr/share/httpd/htdocs/munin
> mutilate.wkstn.nix 1 0x00000013 /usr/share/clamav
> mutilate.wkstn.nix 1 0x0000000a /usr/share/nethack
> mutilate.wkstn.nix 1 0x00000009 /usr/share/xplanet
> mutilate.wkstn.nix 1 0x00000008 /usr/share/xemacs
> mutilate.wkstn.nix 1 0x00000015 /usr/share/flightgear
> mutilate.wkstn.nix 1 0x00000005 /usr/doc
> mutilate.wkstn.nix 1 0x00000006 /usr/info
> mutilate.wkstn.nix 1 0x00000011 /var/state/munin
> mutilate.wkstn.nix 1 0x0000000e /var/log.real
> mutilate.wkstn.nix 1 0x00000007 /usr/share/texlive
> mutilate.wkstn.nix 1 0x00000010 /usr/src
> mutilate.wkstn.nix 1 0x0000000c /usr/lib/X11/fonts
> mutilate.wkstn.nix 1 0x00000019 /usr/archive
> mutilate.wkstn.nix 1 0x0000001d /usr/archive/series
> mutilate.wkstn.nix 1 0x00000001 /home/.spindle.srvr.nix
> *.srvr.nix,fold.srvr.nix 1 0x00000001 /home/.spindle.srvr.nix
> 
> When this happens, I get an (unreachable) and broken symlink under /proc
> (not really surprising as the mountpoint has gone) -- but in this
> situation, cd'ing out and back in does not fix it, only a remount does.
> I'm not surprised by *those* symptoms at all.
> 
> > Also, /usr/archive/series is a separate filesystem from /usr/archive,
> > right?  (The output of "mount" run on the server might also be useful.)
> 
> They are separate server filesystems:
> 
> /dev/mapper/main-archive /usr/archive ext4 rw,nosuid,nodev,relatime,nobarrier,commit=30,data=ordered 0 0
> /dev/sdc1 /usr/archive/series ext4 rw,nosuid,nodev,relatime,commit=30,data=ordered 0 0
> /dev/mapper/main-winbackup /usr/archive/winbackup ext4 rw,nosuid,nodev,relatime,nobarrier,commit=30,data=ordered 0 0
> 
> > The reason crossmnt is considered "bad and evil" is that nfsv2 and v3
> > clients don't necessarily expect mountpoints within exports, and may be
> > get confused when (for example), they discover to files with the same
> > inode number that appear to be on the same filesystem.
> 
> That I expected. NFS mounts within NFS mounts are presumably fine (I
> hope so, I've been using them extensively for decades).
> 
> > I'm  not actually sure what the current linux client does--I think it
> > may be smart enough to use the fsid to avoid at least some of those
> > problems.  But NFSv4 clients are the only ones that should really be
> > counted on to get this right.
> 
> I wish I could get NFSv4 to work. It's just screamed about a lack of
> adequate authentication every time I've tried it, and my network is so
> NFS-dependent that significant experimentation is difficult (getting
> anything wrong tends to cause my entire desktop to deadlock in seconds).
> I suppose I should set up some VMs and play in there :)
> 
> -- 
> NULL && (void)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: what on earth is going on here? paths above mountpoints turn into "(unreachable)"
  2015-02-10 18:32           ` J. Bruce Fields
  2015-02-11 23:07             ` Nix
@ 2015-02-14 13:17             ` Nix
  2015-02-16  2:46               ` NeilBrown
  2015-02-16 15:43               ` J. Bruce Fields
  1 sibling, 2 replies; 30+ messages in thread
From: Nix @ 2015-02-14 13:17 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: NeilBrown, NFS list

On 10 Feb 2015, J. Bruce Fields outgrape:

> It might be interesting to see output from
>
> 	rpc.debug -m rpc -s cache
> 	cat /proc/net/rpc/nfsd.export/content
> 	cat /proc/net/rpc/nfsd.fh/content
>
> especially after the problem manifests.

So the mount has vanished again. I couldn't make it happen with
nordirplus in the mount options, so that might provide you with a clue.

Obviously, on the client, no useful output was seen (I learned to avoid
NFS cross-exports long ago, so the client doesn't normally NFS-export
anything).

On the server:

# cat /proc/net/rpc/nfsd.export/content
#path domain(flags)
# expiry=1423920264 refcnt=1 flags=1
/usr/src        mutilate.wkstn.nix(rw,no_root_squash,async,wdelay,no_subtree_check,fsid=16,uuid=333950aa:8e3f440a:bc94d0cc:4adae198,sec=1)
# expiry=1423920746 refcnt=1 flags=1
/home/.spindle.srvr.nix mutilate.wkstn.nix(rw,no_root_squash,async,wdelay,no_subtree_check,fsid=1,uuid=95bd22c2:253c456f:8e36b6cf:b9ecd4ef,sec=1)
# expiry=1423920907 refcnt=1 flags=1
/usr/lib/X11/fonts      mutilate.wkstn.nix(ro,root_squash,async,wdelay,fsid=12,uuid=5cccc224:a92440ee:b4450447:3898c2ec,sec=1)
# expiry=1423920721 refcnt=1 flags=1
/home/.spindle.srvr.nix *.srvr.nix,fold.srvr.nix(rw,root_squash,async,wdelay,no_subtree_check,fsid=1,uuid=95bd22c2:253c456f:8e36b6cf:b9ecd4ef,sec=1)

# cat /proc/net/rpc/nfsd.fh/content
#domain fsidtype fsid [path]
# expiry=2147483647 refcnt=1 flags=1
*.vm.nix,192.168.20.0/24,owork.vm.nix 1 0x00000010 /usr/src
# expiry=2147483647 refcnt=1 flags=1
*.vm.nix,192.168.20.0/24,owork.vm.nix 1 0x00000016 /home/oranix/o
# expiry=2147483647 refcnt=1 flags=1
mutilate.wkstn.nix 1 0x00000018 /home/.spindle.srvr.nix/nix/Graphics/Private
# expiry=2147483647 refcnt=1 flags=1
mutilate.wkstn.nix 1 0x00000003 /home/.spindle.srvr.nix/nix/Graphics/Photos
# expiry=2147483647 refcnt=1 flags=1
mutilate.wkstn.nix 1 0x00000002 /home/.spindle.srvr.nix/nix/Mail/nnmh/spambox-verified
# expiry=2147483647 refcnt=1 flags=1
mutilate.wkstn.nix 1 0x00000013 /usr/share/clamav
# expiry=2147483647 refcnt=1 flags=1
mutilate.wkstn.nix 1 0x00000012 /usr/share/httpd/htdocs/munin
# expiry=2147483647 refcnt=1 flags=1
mutilate.wkstn.nix 1 0x00000011 /var/state/munin
# expiry=2147483647 refcnt=1 flags=1
mutilate.wkstn.nix 1 0x0000001d /usr/archive/series
# expiry=2147483647 refcnt=1 flags=1
mutilate.wkstn.nix 1 0x00000010 /usr/src
# expiry=2147483647 refcnt=1 flags=1
mutilate.wkstn.nix 1 0x0000000b /pkg/non-free
# expiry=2147483647 refcnt=1 flags=1
mutilate.wkstn.nix 1 0x00000015 /usr/share/flightgear
# expiry=2147483647 refcnt=1 flags=1
mutilate.wkstn.nix 1 0x00000007 /usr/share/texlive
# expiry=2147483647 refcnt=1 flags=1
mutilate.wkstn.nix 1 0x0000000a /usr/share/nethack
# expiry=2147483647 refcnt=1 flags=1
mutilate.wkstn.nix 1 0x00000009 /usr/share/xplanet
# expiry=2147483647 refcnt=1 flags=1
mutilate.wkstn.nix 1 0x00000016 /usr/share/emacs/site-lisp
# expiry=2147483647 refcnt=1 flags=1
mutilate.wkstn.nix 1 0x00000008 /usr/share/xemacs
# expiry=2147483647 refcnt=1 flags=1
mutilate.wkstn.nix 1 0x00000005 /usr/doc
# expiry=2147483647 refcnt=1 flags=1
mutilate.wkstn.nix 1 0x00000006 /usr/info
# expiry=2147483647 refcnt=1 flags=1
mutilate.wkstn.nix 1 0x0000000f /etc/shai-hulud
# expiry=2147483647 refcnt=1 flags=1
mutilate.wkstn.nix 1 0x0000000e /var/log.real
# expiry=2147483647 refcnt=1 flags=1
mutilate.wkstn.nix 1 0x0000000c /usr/lib/X11/fonts
# expiry=2147483647 refcnt=1 flags=1
mutilate.wkstn.nix 1 0x00000019 /usr/archive
# expiry=2147483647 refcnt=1 flags=1
mutilate.wkstn.nix 1 0x00000001 /home/.spindle.srvr.nix
# expiry=2147483647 refcnt=1 flags=1
*.vm.nix,192.168.20.0/24,linux-o.vm.nix 1 0x00000010 /usr/src
# expiry=2147483647 refcnt=1 flags=1
*.vm.nix,192.168.20.0/24,linux-o.vm.nix 1 0x00000016 /home/oranix/o
# expiry=2147483647 refcnt=1 flags=1
*.srvr.nix,fold.srvr.nix 1 0x00000001 /home/.spindle.srvr.nix

I remounted it, and nfsd.export/content gained a few lines:

# expiry=1423921406 refcnt=1 flags=1
/usr/archive/series     mutilate.wkstn.nix(rw,insecure,root_squash,async,wdelay,no_subtree_check,fsid=29,uuid=543a1ca9:d17246ca:b6c53092:5896549d,sec=1)
# expiry=1423921383 refcnt=1 flags=1
/usr/archive    mutilate.wkstn.nix(rw,insecure,root_squash,async,wdelay,fsid=25,uuid=d20e3edd:06a54a9b:85dcfa19:62975969,sec=1)

nfsd.fh/content is unchanged.

To me, this all looks completely normal: unused mounts *do* expire away.
The problem is that they're not coming back as they should (I guess
they're coming back with a different inode number?)

This is all with nfs-utils 1.3.0, btw.

-- 
NULL && (void)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: what on earth is going on here? paths above mountpoints turn into "(unreachable)"
  2015-02-14 13:17             ` Nix
@ 2015-02-16  2:46               ` NeilBrown
  2015-02-16  3:57                 ` NeilBrown
  2015-02-16  4:28                 ` Trond Myklebust
  2015-02-16 15:43               ` J. Bruce Fields
  1 sibling, 2 replies; 30+ messages in thread
From: NeilBrown @ 2015-02-16  2:46 UTC (permalink / raw)
  To: Nix; +Cc: J. Bruce Fields, NFS list, Trond Myklebust

[-- Attachment #1: Type: text/plain, Size: 1819 bytes --]

On Sat, 14 Feb 2015 13:17:00 +0000 Nix <nix@esperi.org.uk> wrote:

> On 10 Feb 2015, J. Bruce Fields outgrape:
> 
> > It might be interesting to see output from
> >
> > 	rpc.debug -m rpc -s cache
> > 	cat /proc/net/rpc/nfsd.export/content
> > 	cat /proc/net/rpc/nfsd.fh/content
> >
> > especially after the problem manifests.
> 
> So the mount has vanished again. I couldn't make it happen with
> nordirplus in the mount options, so that might provide you with a clue.

Yup.  It does.

There is definitely something wrong in nfs_prime_dcache.  I cannot quite
trace through from cause to effect, but maybe I don't need to.

Can you try the following patch and see if that makes the problem disappear?

When you perform a READDIRPLUS request on a directory that contains
mountpoints, the the Linux NFS server doesn't return a file-handle for
those names which are mountpoints (because doing so is a bit tricky).

nfs3_decode_dirent notices and decodes as a filehandle with zero length.

The "nfs_same_file()" check in nfs_prime_dcache() determines that isn't
the same as the filehandle it has, and tries to invalidate it and make a new
one.

The invalidation should fail (probably does).
The creating of a new one ... might succeed.  Beyond that, it all gets a bit
hazy.

Anyway, please try:

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 9b0c55cb2a2e..a460669dc395 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -541,7 +541,7 @@ int nfs_readdir_page_filler(nfs_readdir_descriptor_t *desc, struct nfs_entry *en
 
 		count++;
 
-		if (desc->plus != 0)
+		if (desc->plus != 0 && entry->fh.size)
 			nfs_prime_dcache(desc->file->f_path.dentry, entry);
 
 		status = nfs_readdir_add_to_array(entry, page);


which you might have to apply by hand.

Thanks,
NeilBrown

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: what on earth is going on here? paths above mountpoints turn into "(unreachable)"
  2015-02-16  2:46               ` NeilBrown
@ 2015-02-16  3:57                 ` NeilBrown
  2015-02-17 17:32                   ` Nix
  2015-02-20 17:26                   ` Nix
  2015-02-16  4:28                 ` Trond Myklebust
  1 sibling, 2 replies; 30+ messages in thread
From: NeilBrown @ 2015-02-16  3:57 UTC (permalink / raw)
  To: Nix; +Cc: J. Bruce Fields, NFS list, Trond Myklebust

[-- Attachment #1: Type: text/plain, Size: 702 bytes --]

On Mon, 16 Feb 2015 13:46:28 +1100 NeilBrown <neilb@suse.de> wrote:

> 
> Anyway, please try:
> 
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index 9b0c55cb2a2e..a460669dc395 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -541,7 +541,7 @@ int nfs_readdir_page_filler(nfs_readdir_descriptor_t *desc, struct nfs_entry *en
>  
>  		count++;
>  
> -		if (desc->plus != 0)
> +		if (desc->plus != 0 && entry->fh.size)
>  			nfs_prime_dcache(desc->file->f_path.dentry, entry);
>  
>  		status = nfs_readdir_add_to_array(entry, page);
> 
> 
> which you might have to apply by hand.
> 
> Thanks,
> NeilBrown


Make that "entry->fh->size", not "entry->fh.size" :-(

NeilBrown

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: what on earth is going on here? paths above mountpoints turn into "(unreachable)"
  2015-02-16  2:46               ` NeilBrown
  2015-02-16  3:57                 ` NeilBrown
@ 2015-02-16  4:28                 ` Trond Myklebust
  2015-02-16  4:54                   ` NeilBrown
  1 sibling, 1 reply; 30+ messages in thread
From: Trond Myklebust @ 2015-02-16  4:28 UTC (permalink / raw)
  To: NeilBrown; +Cc: Nix, J. Bruce Fields, NFS list

On Sun, Feb 15, 2015 at 9:46 PM, NeilBrown <neilb@suse.de> wrote:
> On Sat, 14 Feb 2015 13:17:00 +0000 Nix <nix@esperi.org.uk> wrote:
>
>> On 10 Feb 2015, J. Bruce Fields outgrape:
>>
>> > It might be interesting to see output from
>> >
>> >     rpc.debug -m rpc -s cache
>> >     cat /proc/net/rpc/nfsd.export/content
>> >     cat /proc/net/rpc/nfsd.fh/content
>> >
>> > especially after the problem manifests.
>>
>> So the mount has vanished again. I couldn't make it happen with
>> nordirplus in the mount options, so that might provide you with a clue.
>
> Yup.  It does.
>
> There is definitely something wrong in nfs_prime_dcache.  I cannot quite
> trace through from cause to effect, but maybe I don't need to.
>
> Can you try the following patch and see if that makes the problem disappear?
>
> When you perform a READDIRPLUS request on a directory that contains
> mountpoints, the the Linux NFS server doesn't return a file-handle for
> those names which are mountpoints (because doing so is a bit tricky).
>
> nfs3_decode_dirent notices and decodes as a filehandle with zero length.
>
> The "nfs_same_file()" check in nfs_prime_dcache() determines that isn't
> the same as the filehandle it has, and tries to invalidate it and make a new
> one.
>
> The invalidation should fail (probably does).
> The creating of a new one ... might succeed.  Beyond that, it all gets a bit
> hazy.
>
> Anyway, please try:
>
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index 9b0c55cb2a2e..a460669dc395 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -541,7 +541,7 @@ int nfs_readdir_page_filler(nfs_readdir_descriptor_t *desc, struct nfs_entry *en
>
>                 count++;
>
> -               if (desc->plus != 0)
> +               if (desc->plus != 0 && entry->fh.size)
>                         nfs_prime_dcache(desc->file->f_path.dentry, entry);
>
>                 status = nfs_readdir_add_to_array(entry, page);
>
>
> which you might have to apply by hand.

Doesn't that check ultimately belong in nfs_fget()? It would seem to
apply to all filehandles, irrespective of provenance.

Cheers
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: what on earth is going on here? paths above mountpoints turn into "(unreachable)"
  2015-02-16  4:28                 ` Trond Myklebust
@ 2015-02-16  4:54                   ` NeilBrown
  2015-02-22 22:13                     ` Trond Myklebust
  0 siblings, 1 reply; 30+ messages in thread
From: NeilBrown @ 2015-02-16  4:54 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Nix, J. Bruce Fields, NFS list

[-- Attachment #1: Type: text/plain, Size: 2868 bytes --]

On Sun, 15 Feb 2015 23:28:12 -0500 Trond Myklebust
<trond.myklebust@primarydata.com> wrote:

> On Sun, Feb 15, 2015 at 9:46 PM, NeilBrown <neilb@suse.de> wrote:
> > On Sat, 14 Feb 2015 13:17:00 +0000 Nix <nix@esperi.org.uk> wrote:
> >
> >> On 10 Feb 2015, J. Bruce Fields outgrape:
> >>
> >> > It might be interesting to see output from
> >> >
> >> >     rpc.debug -m rpc -s cache
> >> >     cat /proc/net/rpc/nfsd.export/content
> >> >     cat /proc/net/rpc/nfsd.fh/content
> >> >
> >> > especially after the problem manifests.
> >>
> >> So the mount has vanished again. I couldn't make it happen with
> >> nordirplus in the mount options, so that might provide you with a clue.
> >
> > Yup.  It does.
> >
> > There is definitely something wrong in nfs_prime_dcache.  I cannot quite
> > trace through from cause to effect, but maybe I don't need to.
> >
> > Can you try the following patch and see if that makes the problem disappear?
> >
> > When you perform a READDIRPLUS request on a directory that contains
> > mountpoints, the the Linux NFS server doesn't return a file-handle for
> > those names which are mountpoints (because doing so is a bit tricky).
> >
> > nfs3_decode_dirent notices and decodes as a filehandle with zero length.
> >
> > The "nfs_same_file()" check in nfs_prime_dcache() determines that isn't
> > the same as the filehandle it has, and tries to invalidate it and make a new
> > one.
> >
> > The invalidation should fail (probably does).
> > The creating of a new one ... might succeed.  Beyond that, it all gets a bit
> > hazy.
> >
> > Anyway, please try:
> >
> > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> > index 9b0c55cb2a2e..a460669dc395 100644
> > --- a/fs/nfs/dir.c
> > +++ b/fs/nfs/dir.c
> > @@ -541,7 +541,7 @@ int nfs_readdir_page_filler(nfs_readdir_descriptor_t *desc, struct nfs_entry *en
> >
> >                 count++;
> >
> > -               if (desc->plus != 0)
> > +               if (desc->plus != 0 && entry->fh.size)
> >                         nfs_prime_dcache(desc->file->f_path.dentry, entry);
> >
> >                 status = nfs_readdir_add_to_array(entry, page);
> >
> >
> > which you might have to apply by hand.
> 
> Doesn't that check ultimately belong in nfs_fget()? It would seem to
> apply to all filehandles, irrespective of provenance.
> 

Maybe.  Though I think it also needs to be before nfs_prime_dcache() tries to
valid the dentry it found.
e.g.

 if (dentry != NULL) {
    if (entry->fh->size == 0)
       goto out;
    else if (nfs_same_file(..)) {
	....
    else {
        d_invalidate();
        ...
    }
  }

??

I'd really like to understand what is actually happening though.
d_invalidate() shouldn't effect an unmount.

Maybe the dentry that gets mounted on is the one with the all-zero fh...

NeilBrown


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: what on earth is going on here? paths above mountpoints turn into "(unreachable)"
  2015-02-14 13:17             ` Nix
  2015-02-16  2:46               ` NeilBrown
@ 2015-02-16 15:43               ` J. Bruce Fields
  1 sibling, 0 replies; 30+ messages in thread
From: J. Bruce Fields @ 2015-02-16 15:43 UTC (permalink / raw)
  To: Nix; +Cc: NeilBrown, NFS list

On Sat, Feb 14, 2015 at 01:17:00PM +0000, Nix wrote:
> On 10 Feb 2015, J. Bruce Fields outgrape:
> 
> > It might be interesting to see output from
> >
> > 	rpc.debug -m rpc -s cache
> > 	cat /proc/net/rpc/nfsd.export/content
> > 	cat /proc/net/rpc/nfsd.fh/content
> >
> > especially after the problem manifests.
> 
> So the mount has vanished again. I couldn't make it happen with
> nordirplus in the mount options, so that might provide you with a clue.
> 
> Obviously, on the client, no useful output was seen (I learned to avoid
> NFS cross-exports long ago, so the client doesn't normally NFS-export
> anything).

Nothing looks obviously wrong to me here.

I'm not sure why you have nfsd.fh but but not nfsd.export entries for
your two exports.  Might be worth reviewing the code to work out in
which cases that happens.

Anyway Neil's probably right to suspect the client here....

--b.

> 
> On the server:
> 
> # cat /proc/net/rpc/nfsd.export/content
> #path domain(flags)
> # expiry=1423920264 refcnt=1 flags=1
> /usr/src        mutilate.wkstn.nix(rw,no_root_squash,async,wdelay,no_subtree_check,fsid=16,uuid=333950aa:8e3f440a:bc94d0cc:4adae198,sec=1)
> # expiry=1423920746 refcnt=1 flags=1
> /home/.spindle.srvr.nix mutilate.wkstn.nix(rw,no_root_squash,async,wdelay,no_subtree_check,fsid=1,uuid=95bd22c2:253c456f:8e36b6cf:b9ecd4ef,sec=1)
> # expiry=1423920907 refcnt=1 flags=1
> /usr/lib/X11/fonts      mutilate.wkstn.nix(ro,root_squash,async,wdelay,fsid=12,uuid=5cccc224:a92440ee:b4450447:3898c2ec,sec=1)
> # expiry=1423920721 refcnt=1 flags=1
> /home/.spindle.srvr.nix *.srvr.nix,fold.srvr.nix(rw,root_squash,async,wdelay,no_subtree_check,fsid=1,uuid=95bd22c2:253c456f:8e36b6cf:b9ecd4ef,sec=1)
> 
> # cat /proc/net/rpc/nfsd.fh/content
> #domain fsidtype fsid [path]
> # expiry=2147483647 refcnt=1 flags=1
> *.vm.nix,192.168.20.0/24,owork.vm.nix 1 0x00000010 /usr/src
> # expiry=2147483647 refcnt=1 flags=1
> *.vm.nix,192.168.20.0/24,owork.vm.nix 1 0x00000016 /home/oranix/o
> # expiry=2147483647 refcnt=1 flags=1
> mutilate.wkstn.nix 1 0x00000018 /home/.spindle.srvr.nix/nix/Graphics/Private
> # expiry=2147483647 refcnt=1 flags=1
> mutilate.wkstn.nix 1 0x00000003 /home/.spindle.srvr.nix/nix/Graphics/Photos
> # expiry=2147483647 refcnt=1 flags=1
> mutilate.wkstn.nix 1 0x00000002 /home/.spindle.srvr.nix/nix/Mail/nnmh/spambox-verified
> # expiry=2147483647 refcnt=1 flags=1
> mutilate.wkstn.nix 1 0x00000013 /usr/share/clamav
> # expiry=2147483647 refcnt=1 flags=1
> mutilate.wkstn.nix 1 0x00000012 /usr/share/httpd/htdocs/munin
> # expiry=2147483647 refcnt=1 flags=1
> mutilate.wkstn.nix 1 0x00000011 /var/state/munin
> # expiry=2147483647 refcnt=1 flags=1
> mutilate.wkstn.nix 1 0x0000001d /usr/archive/series
> # expiry=2147483647 refcnt=1 flags=1
> mutilate.wkstn.nix 1 0x00000010 /usr/src
> # expiry=2147483647 refcnt=1 flags=1
> mutilate.wkstn.nix 1 0x0000000b /pkg/non-free
> # expiry=2147483647 refcnt=1 flags=1
> mutilate.wkstn.nix 1 0x00000015 /usr/share/flightgear
> # expiry=2147483647 refcnt=1 flags=1
> mutilate.wkstn.nix 1 0x00000007 /usr/share/texlive
> # expiry=2147483647 refcnt=1 flags=1
> mutilate.wkstn.nix 1 0x0000000a /usr/share/nethack
> # expiry=2147483647 refcnt=1 flags=1
> mutilate.wkstn.nix 1 0x00000009 /usr/share/xplanet
> # expiry=2147483647 refcnt=1 flags=1
> mutilate.wkstn.nix 1 0x00000016 /usr/share/emacs/site-lisp
> # expiry=2147483647 refcnt=1 flags=1
> mutilate.wkstn.nix 1 0x00000008 /usr/share/xemacs
> # expiry=2147483647 refcnt=1 flags=1
> mutilate.wkstn.nix 1 0x00000005 /usr/doc
> # expiry=2147483647 refcnt=1 flags=1
> mutilate.wkstn.nix 1 0x00000006 /usr/info
> # expiry=2147483647 refcnt=1 flags=1
> mutilate.wkstn.nix 1 0x0000000f /etc/shai-hulud
> # expiry=2147483647 refcnt=1 flags=1
> mutilate.wkstn.nix 1 0x0000000e /var/log.real
> # expiry=2147483647 refcnt=1 flags=1
> mutilate.wkstn.nix 1 0x0000000c /usr/lib/X11/fonts
> # expiry=2147483647 refcnt=1 flags=1
> mutilate.wkstn.nix 1 0x00000019 /usr/archive
> # expiry=2147483647 refcnt=1 flags=1
> mutilate.wkstn.nix 1 0x00000001 /home/.spindle.srvr.nix
> # expiry=2147483647 refcnt=1 flags=1
> *.vm.nix,192.168.20.0/24,linux-o.vm.nix 1 0x00000010 /usr/src
> # expiry=2147483647 refcnt=1 flags=1
> *.vm.nix,192.168.20.0/24,linux-o.vm.nix 1 0x00000016 /home/oranix/o
> # expiry=2147483647 refcnt=1 flags=1
> *.srvr.nix,fold.srvr.nix 1 0x00000001 /home/.spindle.srvr.nix
> 
> I remounted it, and nfsd.export/content gained a few lines:
> 
> # expiry=1423921406 refcnt=1 flags=1
> /usr/archive/series     mutilate.wkstn.nix(rw,insecure,root_squash,async,wdelay,no_subtree_check,fsid=29,uuid=543a1ca9:d17246ca:b6c53092:5896549d,sec=1)
> # expiry=1423921383 refcnt=1 flags=1
> /usr/archive    mutilate.wkstn.nix(rw,insecure,root_squash,async,wdelay,fsid=25,uuid=d20e3edd:06a54a9b:85dcfa19:62975969,sec=1)
> 
> nfsd.fh/content is unchanged.
> 
> To me, this all looks completely normal: unused mounts *do* expire away.
> The problem is that they're not coming back as they should (I guess
> they're coming back with a different inode number?)
> 
> This is all with nfs-utils 1.3.0, btw.
> 
> -- 
> NULL && (void)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: what on earth is going on here? paths above mountpoints turn into "(unreachable)"
  2015-02-16  3:57                 ` NeilBrown
@ 2015-02-17 17:32                   ` Nix
  2015-02-20 17:26                   ` Nix
  1 sibling, 0 replies; 30+ messages in thread
From: Nix @ 2015-02-17 17:32 UTC (permalink / raw)
  To: NeilBrown; +Cc: J. Bruce Fields, NFS list, Trond Myklebust

On 16 Feb 2015, NeilBrown verbalised:
> Make that "entry->fh->size", not "entry->fh.size" :-(

Testing underway. It'll be a few days before I can be confident that the
problem's gone (or much less time if it hasn't!).

-- 
NULL && (void)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: what on earth is going on here? paths above mountpoints turn into "(unreachable)"
  2015-02-16  3:57                 ` NeilBrown
  2015-02-17 17:32                   ` Nix
@ 2015-02-20 17:26                   ` Nix
  2015-02-20 21:03                     ` NeilBrown
  1 sibling, 1 reply; 30+ messages in thread
From: Nix @ 2015-02-20 17:26 UTC (permalink / raw)
  To: NeilBrown; +Cc: J. Bruce Fields, NFS list, Trond Myklebust

On 16 Feb 2015, NeilBrown uttered the following:

> On Mon, 16 Feb 2015 13:46:28 +1100 NeilBrown <neilb@suse.de> wrote:
>
>> Anyway, please try:
>> 
>> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
>> index 9b0c55cb2a2e..a460669dc395 100644
>> --- a/fs/nfs/dir.c
>> +++ b/fs/nfs/dir.c
>> @@ -541,7 +541,7 @@ int nfs_readdir_page_filler(nfs_readdir_descriptor_t *desc, struct nfs_entry *en
>>  
>>  		count++;
>>  
>> -		if (desc->plus != 0)
>> +		if (desc->plus != 0 && entry->fh.size)
>>  			nfs_prime_dcache(desc->file->f_path.dentry, entry);
>>  
>>  		status = nfs_readdir_add_to_array(entry, page);
>> 
>> 
>> which you might have to apply by hand.
>
> Make that "entry->fh->size", not "entry->fh.size" :-(

That looks to fix it (though my first day of testing was wasted because
I forgot nordirplus was in the mount options :/ ).

-- 
NULL && (void)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: what on earth is going on here? paths above mountpoints turn into "(unreachable)"
  2015-02-20 17:26                   ` Nix
@ 2015-02-20 21:03                     ` NeilBrown
  0 siblings, 0 replies; 30+ messages in thread
From: NeilBrown @ 2015-02-20 21:03 UTC (permalink / raw)
  To: Nix; +Cc: J. Bruce Fields, NFS list, Trond Myklebust

[-- Attachment #1: Type: text/plain, Size: 1033 bytes --]

On Fri, 20 Feb 2015 17:26:12 +0000 Nix <nix@esperi.org.uk> wrote:

> On 16 Feb 2015, NeilBrown uttered the following:
> 
> > On Mon, 16 Feb 2015 13:46:28 +1100 NeilBrown <neilb@suse.de> wrote:
> >
> >> Anyway, please try:
> >> 
> >> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> >> index 9b0c55cb2a2e..a460669dc395 100644
> >> --- a/fs/nfs/dir.c
> >> +++ b/fs/nfs/dir.c
> >> @@ -541,7 +541,7 @@ int nfs_readdir_page_filler(nfs_readdir_descriptor_t *desc, struct nfs_entry *en
> >>  
> >>  		count++;
> >>  
> >> -		if (desc->plus != 0)
> >> +		if (desc->plus != 0 && entry->fh.size)
> >>  			nfs_prime_dcache(desc->file->f_path.dentry, entry);
> >>  
> >>  		status = nfs_readdir_add_to_array(entry, page);
> >> 
> >> 
> >> which you might have to apply by hand.
> >
> > Make that "entry->fh->size", not "entry->fh.size" :-(
> 
> That looks to fix it (though my first day of testing was wasted because
> I forgot nordirplus was in the mount options :/ ).
> 

Excellent.  Thanks for testing.

NeilBrown

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: what on earth is going on here? paths above mountpoints turn into "(unreachable)"
  2015-02-16  4:54                   ` NeilBrown
@ 2015-02-22 22:13                     ` Trond Myklebust
  2015-02-22 22:47                       ` NeilBrown
  0 siblings, 1 reply; 30+ messages in thread
From: Trond Myklebust @ 2015-02-22 22:13 UTC (permalink / raw)
  To: NeilBrown; +Cc: Nix, J. Bruce Fields, NFS list

On Mon, 2015-02-16 at 15:54 +1100, NeilBrown wrote:
> On Sun, 15 Feb 2015 23:28:12 -0500 Trond Myklebust
> <trond.myklebust@primarydata.com> wrote:
> 
> > On Sun, Feb 15, 2015 at 9:46 PM, NeilBrown <neilb@suse.de> wrote:
> > > On Sat, 14 Feb 2015 13:17:00 +0000 Nix <nix@esperi.org.uk> wrote:
> > >
> > >> On 10 Feb 2015, J. Bruce Fields outgrape:
> > >>
> > >> > It might be interesting to see output from
> > >> >
> > >> >     rpc.debug -m rpc -s cache
> > >> >     cat /proc/net/rpc/nfsd.export/content
> > >> >     cat /proc/net/rpc/nfsd.fh/content
> > >> >
> > >> > especially after the problem manifests.
> > >>
> > >> So the mount has vanished again. I couldn't make it happen with
> > >> nordirplus in the mount options, so that might provide you with a clue.
> > >
> > > Yup.  It does.
> > >
> > > There is definitely something wrong in nfs_prime_dcache.  I cannot quite
> > > trace through from cause to effect, but maybe I don't need to.
> > >
> > > Can you try the following patch and see if that makes the problem disappear?
> > >
> > > When you perform a READDIRPLUS request on a directory that contains
> > > mountpoints, the the Linux NFS server doesn't return a file-handle for
> > > those names which are mountpoints (because doing so is a bit tricky).
> > >
> > > nfs3_decode_dirent notices and decodes as a filehandle with zero length.
> > >
> > > The "nfs_same_file()" check in nfs_prime_dcache() determines that isn't
> > > the same as the filehandle it has, and tries to invalidate it and make a new
> > > one.
> > >
> > > The invalidation should fail (probably does).
> > > The creating of a new one ... might succeed.  Beyond that, it all gets a bit
> > > hazy.
> > >
> > > Anyway, please try:
> > >
> > > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> > > index 9b0c55cb2a2e..a460669dc395 100644
> > > --- a/fs/nfs/dir.c
> > > +++ b/fs/nfs/dir.c
> > > @@ -541,7 +541,7 @@ int nfs_readdir_page_filler(nfs_readdir_descriptor_t *desc, struct nfs_entry *en
> > >
> > >                 count++;
> > >
> > > -               if (desc->plus != 0)
> > > +               if (desc->plus != 0 && entry->fh.size)
> > >                         nfs_prime_dcache(desc->file->f_path.dentry, entry);
> > >
> > >                 status = nfs_readdir_add_to_array(entry, page);
> > >
> > >
> > > which you might have to apply by hand.
> > 
> > Doesn't that check ultimately belong in nfs_fget()? It would seem to
> > apply to all filehandles, irrespective of provenance.
> > 
> 
> Maybe.  Though I think it also needs to be before nfs_prime_dcache() tries to
> valid the dentry it found.
> e.g.
> 
>  if (dentry != NULL) {
>     if (entry->fh->size == 0)
>        goto out;
>     else if (nfs_same_file(..)) {
> 	....
>     else {
>         d_invalidate();
>         ...
>     }
>   }
> 
> ??
> 
> I'd really like to understand what is actually happening though.
> d_invalidate() shouldn't effect an unmount.
> 
> Maybe the dentry that gets mounted on is the one with the all-zero fh...

Commit 8ed936b5671bf (v3.18+) changes d_invalidate() to unmount the
subtree on a directory being invalidated.

I disagree that the problem here is the zero length filehandle. It is
rather that we need to accommodate situations where the server is
setting us up for a submount or a NFSv4 referral.
In that situation, it is perfectly OK for nfs_prime_dcache() to create
an entry for the mounted-on file. It's just not OK for it to invalidate
the dentry if the submount was already performed.

So how about the following alternative patch?

8<----------------------------------------------------------------
>From 1c8194f2147c10fc7a142eda4f6d7f35ae1f7d4f Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust@primarydata.com>
Date: Sun, 22 Feb 2015 16:35:36 -0500
Subject: [PATCH] NFS: Don't invalidate a submounted dentry in
 nfs_prime_dcache()

If we're traversing a directory which contains a submounted filesystem,
or one that has a referral, the NFS server that is processing the READDIR
request will often return information for the underlying (mounted-on)
directory. It may, or may not, also return filehandle information.

If this happens, and the lookup in nfs_prime_dcache() returns the
dentry for the submounted directory, the filehandle comparison will
fail, and we call d_invalidate(). Post-commit 8ed936b5671bf
("vfs: Lazily remove mounts on unlinked files and directories."), this
means the entire subtree is unmounted.

The following minimal patch addresses this problem by punting on
the invalidation if there is a submount.

Cudos to Neil Brown <neilb@suse.de> for having tracked down this
issue (see link).

Reported-by: Nix <nix@esperi.org.uk>
Link: http://lkml.kernel.org/r/87iofju9ht.fsf@spindle.srvr.nix
Fixes: d39ab9de3b80 ("NFS: re-add readdir plus")
Cc: stable@vger.kernel.org # 2.6.27+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/dir.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 43e29e3e3697..c35ff07b7345 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -485,10 +485,10 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry)
 			if (!status)
 				nfs_setsecurity(dentry->d_inode, entry->fattr, entry->label);
 			goto out;
-		} else {
-			d_invalidate(dentry);
-			dput(dentry);
-		}
+		} else if (IS_ROOT(dentry))
+			goto out;
+		d_invalidate(dentry);
+		dput(dentry);
 	}
 
 	dentry = d_alloc(parent, &filename);
-- 
2.1.0


-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com





^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: what on earth is going on here? paths above mountpoints turn into "(unreachable)"
  2015-02-22 22:13                     ` Trond Myklebust
@ 2015-02-22 22:47                       ` NeilBrown
  2015-02-23  2:05                         ` Trond Myklebust
  0 siblings, 1 reply; 30+ messages in thread
From: NeilBrown @ 2015-02-22 22:47 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Nix, J. Bruce Fields, NFS list

[-- Attachment #1: Type: text/plain, Size: 7137 bytes --]

On Sun, 22 Feb 2015 17:13:31 -0500 Trond Myklebust
<trond.myklebust@primarydata.com> wrote:

> On Mon, 2015-02-16 at 15:54 +1100, NeilBrown wrote:
> > On Sun, 15 Feb 2015 23:28:12 -0500 Trond Myklebust
> > <trond.myklebust@primarydata.com> wrote:
> > 
> > > On Sun, Feb 15, 2015 at 9:46 PM, NeilBrown <neilb@suse.de> wrote:
> > > > On Sat, 14 Feb 2015 13:17:00 +0000 Nix <nix@esperi.org.uk> wrote:
> > > >
> > > >> On 10 Feb 2015, J. Bruce Fields outgrape:
> > > >>
> > > >> > It might be interesting to see output from
> > > >> >
> > > >> >     rpc.debug -m rpc -s cache
> > > >> >     cat /proc/net/rpc/nfsd.export/content
> > > >> >     cat /proc/net/rpc/nfsd.fh/content
> > > >> >
> > > >> > especially after the problem manifests.
> > > >>
> > > >> So the mount has vanished again. I couldn't make it happen with
> > > >> nordirplus in the mount options, so that might provide you with a clue.
> > > >
> > > > Yup.  It does.
> > > >
> > > > There is definitely something wrong in nfs_prime_dcache.  I cannot quite
> > > > trace through from cause to effect, but maybe I don't need to.
> > > >
> > > > Can you try the following patch and see if that makes the problem disappear?
> > > >
> > > > When you perform a READDIRPLUS request on a directory that contains
> > > > mountpoints, the the Linux NFS server doesn't return a file-handle for
> > > > those names which are mountpoints (because doing so is a bit tricky).
> > > >
> > > > nfs3_decode_dirent notices and decodes as a filehandle with zero length.
> > > >
> > > > The "nfs_same_file()" check in nfs_prime_dcache() determines that isn't
> > > > the same as the filehandle it has, and tries to invalidate it and make a new
> > > > one.
> > > >
> > > > The invalidation should fail (probably does).
> > > > The creating of a new one ... might succeed.  Beyond that, it all gets a bit
> > > > hazy.
> > > >
> > > > Anyway, please try:
> > > >
> > > > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> > > > index 9b0c55cb2a2e..a460669dc395 100644
> > > > --- a/fs/nfs/dir.c
> > > > +++ b/fs/nfs/dir.c
> > > > @@ -541,7 +541,7 @@ int nfs_readdir_page_filler(nfs_readdir_descriptor_t *desc, struct nfs_entry *en
> > > >
> > > >                 count++;
> > > >
> > > > -               if (desc->plus != 0)
> > > > +               if (desc->plus != 0 && entry->fh.size)
> > > >                         nfs_prime_dcache(desc->file->f_path.dentry, entry);
> > > >
> > > >                 status = nfs_readdir_add_to_array(entry, page);
> > > >
> > > >
> > > > which you might have to apply by hand.
> > > 
> > > Doesn't that check ultimately belong in nfs_fget()? It would seem to
> > > apply to all filehandles, irrespective of provenance.
> > > 
> > 
> > Maybe.  Though I think it also needs to be before nfs_prime_dcache() tries to
> > valid the dentry it found.
> > e.g.
> > 
> >  if (dentry != NULL) {
> >     if (entry->fh->size == 0)
> >        goto out;
> >     else if (nfs_same_file(..)) {
> > 	....
> >     else {
> >         d_invalidate();
> >         ...
> >     }
> >   }
> > 
> > ??
> > 
> > I'd really like to understand what is actually happening though.
> > d_invalidate() shouldn't effect an unmount.
> > 
> > Maybe the dentry that gets mounted on is the one with the all-zero fh...
> 
> Commit 8ed936b5671bf (v3.18+) changes d_invalidate() to unmount the
> subtree on a directory being invalidated.
> 
> I disagree that the problem here is the zero length filehandle. It is
> rather that we need to accommodate situations where the server is
> setting us up for a submount or a NFSv4 referral.

I don't understand how you can view the treatment of a non-existent
filehandle as though it were a real filehandle as anything other than a bug.

I certainly agree that there may be other issues with this code.  It is
unlikely to handle volatile filehandles well, and as you say, referrals may
well be an issue too.

But surely if the server didn't return a valid filehandle, then it is wrong
to pretend that "all-zeros" is a valid filehandle.  That is what the current
code does.


> In that situation, it is perfectly OK for nfs_prime_dcache() to create
> an entry for the mounted-on file. It's just not OK for it to invalidate
> the dentry if the submount was already performed.
> 
> So how about the following alternative patch?
> 
> 8<----------------------------------------------------------------
> >From 1c8194f2147c10fc7a142eda4f6d7f35ae1f7d4f Mon Sep 17 00:00:00 2001
> From: Trond Myklebust <trond.myklebust@primarydata.com>
> Date: Sun, 22 Feb 2015 16:35:36 -0500
> Subject: [PATCH] NFS: Don't invalidate a submounted dentry in
>  nfs_prime_dcache()
> 
> If we're traversing a directory which contains a submounted filesystem,
> or one that has a referral, the NFS server that is processing the READDIR
> request will often return information for the underlying (mounted-on)
> directory. It may, or may not, also return filehandle information.
> 
> If this happens, and the lookup in nfs_prime_dcache() returns the
> dentry for the submounted directory, the filehandle comparison will
> fail, and we call d_invalidate(). Post-commit 8ed936b5671bf
> ("vfs: Lazily remove mounts on unlinked files and directories."), this
> means the entire subtree is unmounted.
> 
> The following minimal patch addresses this problem by punting on
> the invalidation if there is a submount.
> 
> Cudos to Neil Brown <neilb@suse.de> for having tracked down this
> issue (see link).
> 
> Reported-by: Nix <nix@esperi.org.uk>
> Link: http://lkml.kernel.org/r/87iofju9ht.fsf@spindle.srvr.nix
> Fixes: d39ab9de3b80 ("NFS: re-add readdir plus")
> Cc: stable@vger.kernel.org # 2.6.27+
> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
> ---
>  fs/nfs/dir.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index 43e29e3e3697..c35ff07b7345 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -485,10 +485,10 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry)
>  			if (!status)
>  				nfs_setsecurity(dentry->d_inode, entry->fattr, entry->label);
>  			goto out;
> -		} else {
> -			d_invalidate(dentry);
> -			dput(dentry);
> -		}
> +		} else if (IS_ROOT(dentry))
> +			goto out;
> +		d_invalidate(dentry);
> +		dput(dentry);

The 'dentry' in this case was obtained via d_lookup() which doesn't follow
mount points.  So there is no chance that IS_ROOT(dentry).
d_mountpoint(dentry) might be a more interesting test.

However d_invalidate will unmount any subtree further down.
So if I have /a/b/c/d all via NFS, and  'd' is a mountpoint, then if the NFS
server returns a new filehandle for 'b', 'd' will get unmounted.  Neither
'IS_ROOT' nor 'd_mountpoint' will guard against that.

I'm not really sure what we do want here.  The old behaviour of d_invalidate,
where it failed if anything was mounted, seemed like a reasonable sort of
behaviour.  But we don't have that available any more.

NeilBrown

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: what on earth is going on here? paths above mountpoints turn into "(unreachable)"
  2015-02-22 22:47                       ` NeilBrown
@ 2015-02-23  2:05                         ` Trond Myklebust
  2015-02-23  2:33                           ` Trond Myklebust
  2015-02-23  3:05                           ` NeilBrown
  0 siblings, 2 replies; 30+ messages in thread
From: Trond Myklebust @ 2015-02-23  2:05 UTC (permalink / raw)
  To: NeilBrown; +Cc: Nix, J. Bruce Fields, NFS list

On Sun, Feb 22, 2015 at 5:47 PM, NeilBrown <neilb@suse.de> wrote:
> On Sun, 22 Feb 2015 17:13:31 -0500 Trond Myklebust
> <trond.myklebust@primarydata.com> wrote:
>
>> On Mon, 2015-02-16 at 15:54 +1100, NeilBrown wrote:
>> > On Sun, 15 Feb 2015 23:28:12 -0500 Trond Myklebust
>> > <trond.myklebust@primarydata.com> wrote:
>> >
>> > > On Sun, Feb 15, 2015 at 9:46 PM, NeilBrown <neilb@suse.de> wrote:
>> > > > On Sat, 14 Feb 2015 13:17:00 +0000 Nix <nix@esperi.org.uk> wrote:
>> > > >
>> > > >> On 10 Feb 2015, J. Bruce Fields outgrape:
>> > > >>
>> > > >> > It might be interesting to see output from
>> > > >> >
>> > > >> >     rpc.debug -m rpc -s cache
>> > > >> >     cat /proc/net/rpc/nfsd.export/content
>> > > >> >     cat /proc/net/rpc/nfsd.fh/content
>> > > >> >
>> > > >> > especially after the problem manifests.
>> > > >>
>> > > >> So the mount has vanished again. I couldn't make it happen with
>> > > >> nordirplus in the mount options, so that might provide you with a clue.
>> > > >
>> > > > Yup.  It does.
>> > > >
>> > > > There is definitely something wrong in nfs_prime_dcache.  I cannot quite
>> > > > trace through from cause to effect, but maybe I don't need to.
>> > > >
>> > > > Can you try the following patch and see if that makes the problem disappear?
>> > > >
>> > > > When you perform a READDIRPLUS request on a directory that contains
>> > > > mountpoints, the the Linux NFS server doesn't return a file-handle for
>> > > > those names which are mountpoints (because doing so is a bit tricky).
>> > > >
>> > > > nfs3_decode_dirent notices and decodes as a filehandle with zero length.
>> > > >
>> > > > The "nfs_same_file()" check in nfs_prime_dcache() determines that isn't
>> > > > the same as the filehandle it has, and tries to invalidate it and make a new
>> > > > one.
>> > > >
>> > > > The invalidation should fail (probably does).
>> > > > The creating of a new one ... might succeed.  Beyond that, it all gets a bit
>> > > > hazy.
>> > > >
>> > > > Anyway, please try:
>> > > >
>> > > > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
>> > > > index 9b0c55cb2a2e..a460669dc395 100644
>> > > > --- a/fs/nfs/dir.c
>> > > > +++ b/fs/nfs/dir.c
>> > > > @@ -541,7 +541,7 @@ int nfs_readdir_page_filler(nfs_readdir_descriptor_t *desc, struct nfs_entry *en
>> > > >
>> > > >                 count++;
>> > > >
>> > > > -               if (desc->plus != 0)
>> > > > +               if (desc->plus != 0 && entry->fh.size)
>> > > >                         nfs_prime_dcache(desc->file->f_path.dentry, entry);
>> > > >
>> > > >                 status = nfs_readdir_add_to_array(entry, page);
>> > > >
>> > > >
>> > > > which you might have to apply by hand.
>> > >
>> > > Doesn't that check ultimately belong in nfs_fget()? It would seem to
>> > > apply to all filehandles, irrespective of provenance.
>> > >
>> >
>> > Maybe.  Though I think it also needs to be before nfs_prime_dcache() tries to
>> > valid the dentry it found.
>> > e.g.
>> >
>> >  if (dentry != NULL) {
>> >     if (entry->fh->size == 0)
>> >        goto out;
>> >     else if (nfs_same_file(..)) {
>> >     ....
>> >     else {
>> >         d_invalidate();
>> >         ...
>> >     }
>> >   }
>> >
>> > ??
>> >
>> > I'd really like to understand what is actually happening though.
>> > d_invalidate() shouldn't effect an unmount.
>> >
>> > Maybe the dentry that gets mounted on is the one with the all-zero fh...
>>
>> Commit 8ed936b5671bf (v3.18+) changes d_invalidate() to unmount the
>> subtree on a directory being invalidated.
>>
>> I disagree that the problem here is the zero length filehandle. It is
>> rather that we need to accommodate situations where the server is
>> setting us up for a submount or a NFSv4 referral.
>
> I don't understand how you can view the treatment of a non-existent
> filehandle as though it were a real filehandle as anything other than a bug.

I see it as a case of "I can't return a filehandle, because you're not
supposed to ever see this inode".

IOW: it is literally the case that the client is supposed to create a
proxy inode because this is supposed to be a mountpoint.

> I certainly agree that there may be other issues with this code.  It is
> unlikely to handle volatile filehandles well, and as you say, referrals may
> well be an issue too.
>
> But surely if the server didn't return a valid filehandle, then it is wrong
> to pretend that "all-zeros" is a valid filehandle.  That is what the current
> code does.

As long as we have a valid mounted-on-fileid or a valid fileid, then
we can still discriminate. That is also what the current code does.
The only really broken case is if the server returns no filehandle or
fileid. AFAICS we should be handling that case correctly too in
nfs_refresh_inode().

>> In that situation, it is perfectly OK for nfs_prime_dcache() to create
>> an entry for the mounted-on file. It's just not OK for it to invalidate
>> the dentry if the submount was already performed.
>>
>> So how about the following alternative patch?
>>
>> 8<----------------------------------------------------------------
>> >From 1c8194f2147c10fc7a142eda4f6d7f35ae1f7d4f Mon Sep 17 00:00:00 2001
>> From: Trond Myklebust <trond.myklebust@primarydata.com>
>> Date: Sun, 22 Feb 2015 16:35:36 -0500
>> Subject: [PATCH] NFS: Don't invalidate a submounted dentry in
>>  nfs_prime_dcache()
>>
>> If we're traversing a directory which contains a submounted filesystem,
>> or one that has a referral, the NFS server that is processing the READDIR
>> request will often return information for the underlying (mounted-on)
>> directory. It may, or may not, also return filehandle information.
>>
>> If this happens, and the lookup in nfs_prime_dcache() returns the
>> dentry for the submounted directory, the filehandle comparison will
>> fail, and we call d_invalidate(). Post-commit 8ed936b5671bf
>> ("vfs: Lazily remove mounts on unlinked files and directories."), this
>> means the entire subtree is unmounted.
>>
>> The following minimal patch addresses this problem by punting on
>> the invalidation if there is a submount.
>>
>> Cudos to Neil Brown <neilb@suse.de> for having tracked down this
>> issue (see link).
>>
>> Reported-by: Nix <nix@esperi.org.uk>
>> Link: http://lkml.kernel.org/r/87iofju9ht.fsf@spindle.srvr.nix
>> Fixes: d39ab9de3b80 ("NFS: re-add readdir plus")
>> Cc: stable@vger.kernel.org # 2.6.27+
>> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
>> ---
>>  fs/nfs/dir.c | 8 ++++----
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
>> index 43e29e3e3697..c35ff07b7345 100644
>> --- a/fs/nfs/dir.c
>> +++ b/fs/nfs/dir.c
>> @@ -485,10 +485,10 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry)
>>                       if (!status)
>>                               nfs_setsecurity(dentry->d_inode, entry->fattr, entry->label);
>>                       goto out;
>> -             } else {
>> -                     d_invalidate(dentry);
>> -                     dput(dentry);
>> -             }
>> +             } else if (IS_ROOT(dentry))
>> +                     goto out;
>> +             d_invalidate(dentry);
>> +             dput(dentry);
>
> The 'dentry' in this case was obtained via d_lookup() which doesn't follow
> mount points.  So there is no chance that IS_ROOT(dentry).
> d_mountpoint(dentry) might be a more interesting test.
>
> However d_invalidate will unmount any subtree further down.
> So if I have /a/b/c/d all via NFS, and  'd' is a mountpoint, then if the NFS
> server returns a new filehandle for 'b', 'd' will get unmounted.  Neither
> 'IS_ROOT' nor 'd_mountpoint' will guard against that.
>
> I'm not really sure what we do want here.  The old behaviour of d_invalidate,
> where it failed if anything was mounted, seemed like a reasonable sort of
> behaviour.  But we don't have that available any more.

If the mounted-on-fileid has changed, then we _should_ invalidate.
-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: what on earth is going on here? paths above mountpoints turn into "(unreachable)"
  2015-02-23  2:05                         ` Trond Myklebust
@ 2015-02-23  2:33                           ` Trond Myklebust
  2015-02-23  3:05                           ` NeilBrown
  1 sibling, 0 replies; 30+ messages in thread
From: Trond Myklebust @ 2015-02-23  2:33 UTC (permalink / raw)
  To: NeilBrown; +Cc: Nix, J. Bruce Fields, NFS list

On Sun, 2015-02-22 at 21:05 -0500, Trond Myklebust wrote:
> On Sun, Feb 22, 2015 at 5:47 PM, NeilBrown <neilb@suse.de> wrote:
> > On Sun, 22 Feb 2015 17:13:31 -0500 Trond Myklebust
> > <trond.myklebust@primarydata.com> wrote:
> >
> >> On Mon, 2015-02-16 at 15:54 +1100, NeilBrown wrote:
> >> > On Sun, 15 Feb 2015 23:28:12 -0500 Trond Myklebust
> >> > <trond.myklebust@primarydata.com> wrote:
> >> >
> >> > > On Sun, Feb 15, 2015 at 9:46 PM, NeilBrown <neilb@suse.de> wrote:
> >> > > > On Sat, 14 Feb 2015 13:17:00 +0000 Nix <nix@esperi.org.uk> wrote:
> >> > > >
> >> > > >> On 10 Feb 2015, J. Bruce Fields outgrape:
> >> > > >>
> >> > > >> > It might be interesting to see output from
> >> > > >> >
> >> > > >> >     rpc.debug -m rpc -s cache
> >> > > >> >     cat /proc/net/rpc/nfsd.export/content
> >> > > >> >     cat /proc/net/rpc/nfsd.fh/content
> >> > > >> >
> >> > > >> > especially after the problem manifests.
> >> > > >>
> >> > > >> So the mount has vanished again. I couldn't make it happen with
> >> > > >> nordirplus in the mount options, so that might provide you with a clue.
> >> > > >
> >> > > > Yup.  It does.
> >> > > >
> >> > > > There is definitely something wrong in nfs_prime_dcache.  I cannot quite
> >> > > > trace through from cause to effect, but maybe I don't need to.
> >> > > >
> >> > > > Can you try the following patch and see if that makes the problem disappear?
> >> > > >
> >> > > > When you perform a READDIRPLUS request on a directory that contains
> >> > > > mountpoints, the the Linux NFS server doesn't return a file-handle for
> >> > > > those names which are mountpoints (because doing so is a bit tricky).
> >> > > >
> >> > > > nfs3_decode_dirent notices and decodes as a filehandle with zero length.
> >> > > >
> >> > > > The "nfs_same_file()" check in nfs_prime_dcache() determines that isn't
> >> > > > the same as the filehandle it has, and tries to invalidate it and make a new
> >> > > > one.
> >> > > >
> >> > > > The invalidation should fail (probably does).
> >> > > > The creating of a new one ... might succeed.  Beyond that, it all gets a bit
> >> > > > hazy.
> >> > > >
> >> > > > Anyway, please try:
> >> > > >
> >> > > > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> >> > > > index 9b0c55cb2a2e..a460669dc395 100644
> >> > > > --- a/fs/nfs/dir.c
> >> > > > +++ b/fs/nfs/dir.c
> >> > > > @@ -541,7 +541,7 @@ int nfs_readdir_page_filler(nfs_readdir_descriptor_t *desc, struct nfs_entry *en
> >> > > >
> >> > > >                 count++;
> >> > > >
> >> > > > -               if (desc->plus != 0)
> >> > > > +               if (desc->plus != 0 && entry->fh.size)
> >> > > >                         nfs_prime_dcache(desc->file->f_path.dentry, entry);
> >> > > >
> >> > > >                 status = nfs_readdir_add_to_array(entry, page);
> >> > > >
> >> > > >
> >> > > > which you might have to apply by hand.
> >> > >
> >> > > Doesn't that check ultimately belong in nfs_fget()? It would seem to
> >> > > apply to all filehandles, irrespective of provenance.
> >> > >
> >> >
> >> > Maybe.  Though I think it also needs to be before nfs_prime_dcache() tries to
> >> > valid the dentry it found.
> >> > e.g.
> >> >
> >> >  if (dentry != NULL) {
> >> >     if (entry->fh->size == 0)
> >> >        goto out;
> >> >     else if (nfs_same_file(..)) {
> >> >     ....
> >> >     else {
> >> >         d_invalidate();
> >> >         ...
> >> >     }
> >> >   }
> >> >
> >> > ??
> >> >
> >> > I'd really like to understand what is actually happening though.
> >> > d_invalidate() shouldn't effect an unmount.
> >> >
> >> > Maybe the dentry that gets mounted on is the one with the all-zero fh...
> >>
> >> Commit 8ed936b5671bf (v3.18+) changes d_invalidate() to unmount the
> >> subtree on a directory being invalidated.
> >>
> >> I disagree that the problem here is the zero length filehandle. It is
> >> rather that we need to accommodate situations where the server is
> >> setting us up for a submount or a NFSv4 referral.
> >
> > I don't understand how you can view the treatment of a non-existent
> > filehandle as though it were a real filehandle as anything other than a bug.
> 
> I see it as a case of "I can't return a filehandle, because you're not
> supposed to ever see this inode".
> 
> IOW: it is literally the case that the client is supposed to create a
> proxy inode because this is supposed to be a mountpoint.
> 
> > I certainly agree that there may be other issues with this code.  It is
> > unlikely to handle volatile filehandles well, and as you say, referrals may
> > well be an issue too.
> >
> > But surely if the server didn't return a valid filehandle, then it is wrong
> > to pretend that "all-zeros" is a valid filehandle.  That is what the current
> > code does.
> 
> As long as we have a valid mounted-on-fileid or a valid fileid, then
> we can still discriminate. That is also what the current code does.
> The only really broken case is if the server returns no filehandle or
> fileid. AFAICS we should be handling that case correctly too in
> nfs_refresh_inode().
> 
> >> In that situation, it is perfectly OK for nfs_prime_dcache() to create
> >> an entry for the mounted-on file. It's just not OK for it to invalidate
> >> the dentry if the submount was already performed.
> >>
> >> So how about the following alternative patch?
> >>
> >> 8<----------------------------------------------------------------
> >> >From 1c8194f2147c10fc7a142eda4f6d7f35ae1f7d4f Mon Sep 17 00:00:00 2001
> >> From: Trond Myklebust <trond.myklebust@primarydata.com>
> >> Date: Sun, 22 Feb 2015 16:35:36 -0500
> >> Subject: [PATCH] NFS: Don't invalidate a submounted dentry in
> >>  nfs_prime_dcache()
> >>
> >> If we're traversing a directory which contains a submounted filesystem,
> >> or one that has a referral, the NFS server that is processing the READDIR
> >> request will often return information for the underlying (mounted-on)
> >> directory. It may, or may not, also return filehandle information.
> >>
> >> If this happens, and the lookup in nfs_prime_dcache() returns the
> >> dentry for the submounted directory, the filehandle comparison will
> >> fail, and we call d_invalidate(). Post-commit 8ed936b5671bf
> >> ("vfs: Lazily remove mounts on unlinked files and directories."), this
> >> means the entire subtree is unmounted.
> >>
> >> The following minimal patch addresses this problem by punting on
> >> the invalidation if there is a submount.
> >>
> >> Cudos to Neil Brown <neilb@suse.de> for having tracked down this
> >> issue (see link).
> >>
> >> Reported-by: Nix <nix@esperi.org.uk>
> >> Link: http://lkml.kernel.org/r/87iofju9ht.fsf@spindle.srvr.nix
> >> Fixes: d39ab9de3b80 ("NFS: re-add readdir plus")
> >> Cc: stable@vger.kernel.org # 2.6.27+
> >> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
> >> ---
> >>  fs/nfs/dir.c | 8 ++++----
> >>  1 file changed, 4 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> >> index 43e29e3e3697..c35ff07b7345 100644
> >> --- a/fs/nfs/dir.c
> >> +++ b/fs/nfs/dir.c
> >> @@ -485,10 +485,10 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry)
> >>                       if (!status)
> >>                               nfs_setsecurity(dentry->d_inode, entry->fattr, entry->label);
> >>                       goto out;
> >> -             } else {
> >> -                     d_invalidate(dentry);
> >> -                     dput(dentry);
> >> -             }
> >> +             } else if (IS_ROOT(dentry))
> >> +                     goto out;
> >> +             d_invalidate(dentry);
> >> +             dput(dentry);
> >
> > The 'dentry' in this case was obtained via d_lookup() which doesn't follow
> > mount points.  So there is no chance that IS_ROOT(dentry).
> > d_mountpoint(dentry) might be a more interesting test.
> >
> > However d_invalidate will unmount any subtree further down.
> > So if I have /a/b/c/d all via NFS, and  'd' is a mountpoint, then if the NFS
> > server returns a new filehandle for 'b', 'd' will get unmounted.  Neither
> > 'IS_ROOT' nor 'd_mountpoint' will guard against that.
> >
> > I'm not really sure what we do want here.  The old behaviour of d_invalidate,
> > where it failed if anything was mounted, seemed like a reasonable sort of
> > behaviour.  But we don't have that available any more.
> 
> If the mounted-on-fileid has changed, then we _should_ invalidate.

IOW: the following should be overkill. If it isn't working, then I'd
definitely want to see some wireshark traces before accepting any
patches.

8<-------------------------------------------------------
>From 1a5eaa9ece7924bd605377445583e2e345e79c74 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust@primarydata.com>
Date: Sun, 22 Feb 2015 16:35:36 -0500
Subject: [PATCH] NFS: Don't invalidate a submounted dentry in
 nfs_prime_dcache()

If we're traversing a directory which contains a submounted filesystem,
or one that has a referral, the NFS server that is processing the READDIR
request will often return information for the underlying (mounted-on)
directory. It may, or may not, also return filehandle information.

If this happens, and the lookup in nfs_prime_dcache() returns the
dentry for the submounted directory, the filehandle comparison will
fail, and we call d_invalidate(). Post-commit 8ed936b5671bf
("vfs: Lazily remove mounts on unlinked files and directories."), this
means the entire subtree is unmounted.

The following minimal patch addresses this problem by punting on
the invalidation if there is a submount.

Cudos to Neil Brown <neilb@suse.de> for having tracked down this
issue (see link).

Reported-by: Nix <nix@esperi.org.uk>
Link: http://lkml.kernel.org/r/87iofju9ht.fsf@spindle.srvr.nix
Fixes: d39ab9de3b80 ("NFS: re-add readdir plus")
Cc: stable@vger.kernel.org # 2.6.27+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/dir.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 9b0c55cb2a2e..1e80dc1716b1 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -485,10 +485,14 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry)
 			if (!status)
 				nfs_setsecurity(dentry->d_inode, entry->fattr, entry->label);
 			goto out;
-		} else {
-			d_invalidate(dentry);
-			dput(dentry);
 		}
+		/* Is this a submount? If so, ignore. */
+		if (IS_ROOT(dentry) ||
+		    !nfs_fsid_equal(&NFS_SB(dentry->d_sb)->fsid,
+					&entry->fattr->fsid))
+			goto out;
+		d_invalidate(dentry);
+		dput(dentry);
 	}
 
 	dentry = d_alloc(parent, &filename);
-- 
2.1.0




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: what on earth is going on here? paths above mountpoints turn into "(unreachable)"
  2015-02-23  2:05                         ` Trond Myklebust
  2015-02-23  2:33                           ` Trond Myklebust
@ 2015-02-23  3:05                           ` NeilBrown
  2015-02-23  3:33                             ` Trond Myklebust
  1 sibling, 1 reply; 30+ messages in thread
From: NeilBrown @ 2015-02-23  3:05 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Nix, J. Bruce Fields, NFS list

[-- Attachment #1: Type: text/plain, Size: 10362 bytes --]

On Sun, 22 Feb 2015 21:05:12 -0500 Trond Myklebust
<trond.myklebust@primarydata.com> wrote:

> On Sun, Feb 22, 2015 at 5:47 PM, NeilBrown <neilb@suse.de> wrote:
> > On Sun, 22 Feb 2015 17:13:31 -0500 Trond Myklebust
> > <trond.myklebust@primarydata.com> wrote:
> >
> >> On Mon, 2015-02-16 at 15:54 +1100, NeilBrown wrote:
> >> > On Sun, 15 Feb 2015 23:28:12 -0500 Trond Myklebust
> >> > <trond.myklebust@primarydata.com> wrote:
> >> >
> >> > > On Sun, Feb 15, 2015 at 9:46 PM, NeilBrown <neilb@suse.de> wrote:
> >> > > > On Sat, 14 Feb 2015 13:17:00 +0000 Nix <nix@esperi.org.uk> wrote:
> >> > > >
> >> > > >> On 10 Feb 2015, J. Bruce Fields outgrape:
> >> > > >>
> >> > > >> > It might be interesting to see output from
> >> > > >> >
> >> > > >> >     rpc.debug -m rpc -s cache
> >> > > >> >     cat /proc/net/rpc/nfsd.export/content
> >> > > >> >     cat /proc/net/rpc/nfsd.fh/content
> >> > > >> >
> >> > > >> > especially after the problem manifests.
> >> > > >>
> >> > > >> So the mount has vanished again. I couldn't make it happen with
> >> > > >> nordirplus in the mount options, so that might provide you with a clue.
> >> > > >
> >> > > > Yup.  It does.
> >> > > >
> >> > > > There is definitely something wrong in nfs_prime_dcache.  I cannot quite
> >> > > > trace through from cause to effect, but maybe I don't need to.
> >> > > >
> >> > > > Can you try the following patch and see if that makes the problem disappear?
> >> > > >
> >> > > > When you perform a READDIRPLUS request on a directory that contains
> >> > > > mountpoints, the the Linux NFS server doesn't return a file-handle for
> >> > > > those names which are mountpoints (because doing so is a bit tricky).
> >> > > >
> >> > > > nfs3_decode_dirent notices and decodes as a filehandle with zero length.
> >> > > >
> >> > > > The "nfs_same_file()" check in nfs_prime_dcache() determines that isn't
> >> > > > the same as the filehandle it has, and tries to invalidate it and make a new
> >> > > > one.
> >> > > >
> >> > > > The invalidation should fail (probably does).
> >> > > > The creating of a new one ... might succeed.  Beyond that, it all gets a bit
> >> > > > hazy.
> >> > > >
> >> > > > Anyway, please try:
> >> > > >
> >> > > > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> >> > > > index 9b0c55cb2a2e..a460669dc395 100644
> >> > > > --- a/fs/nfs/dir.c
> >> > > > +++ b/fs/nfs/dir.c
> >> > > > @@ -541,7 +541,7 @@ int nfs_readdir_page_filler(nfs_readdir_descriptor_t *desc, struct nfs_entry *en
> >> > > >
> >> > > >                 count++;
> >> > > >
> >> > > > -               if (desc->plus != 0)
> >> > > > +               if (desc->plus != 0 && entry->fh.size)
> >> > > >                         nfs_prime_dcache(desc->file->f_path.dentry, entry);
> >> > > >
> >> > > >                 status = nfs_readdir_add_to_array(entry, page);
> >> > > >
> >> > > >
> >> > > > which you might have to apply by hand.
> >> > >
> >> > > Doesn't that check ultimately belong in nfs_fget()? It would seem to
> >> > > apply to all filehandles, irrespective of provenance.
> >> > >
> >> >
> >> > Maybe.  Though I think it also needs to be before nfs_prime_dcache() tries to
> >> > valid the dentry it found.
> >> > e.g.
> >> >
> >> >  if (dentry != NULL) {
> >> >     if (entry->fh->size == 0)
> >> >        goto out;
> >> >     else if (nfs_same_file(..)) {
> >> >     ....
> >> >     else {
> >> >         d_invalidate();
> >> >         ...
> >> >     }
> >> >   }
> >> >
> >> > ??
> >> >
> >> > I'd really like to understand what is actually happening though.
> >> > d_invalidate() shouldn't effect an unmount.
> >> >
> >> > Maybe the dentry that gets mounted on is the one with the all-zero fh...
> >>
> >> Commit 8ed936b5671bf (v3.18+) changes d_invalidate() to unmount the
> >> subtree on a directory being invalidated.
> >>
> >> I disagree that the problem here is the zero length filehandle. It is
> >> rather that we need to accommodate situations where the server is
> >> setting us up for a submount or a NFSv4 referral.
> >
> > I don't understand how you can view the treatment of a non-existent
> > filehandle as though it were a real filehandle as anything other than a bug.
> 
> I see it as a case of "I can't return a filehandle, because you're not
> supposed to ever see this inode".

According to rfc1813, READDIRPLUS returns the filehandles in a "post_of_fh3"
structure which can optionally contain a filehandle.
The text says:
   One of the principles of this revision of the NFS protocol is to
   return the real value from the indicated operation and not an
   error from an incidental operation. The post_op_fh3 structure
   was designed to allow the server to recover from errors
   encountered while constructing a file handle.

which suggests that the absence of a filehandle could possibly be interpreted
as an error having occurred, but it doesn't allow the client to guess
what that error might have been.
It certainly doesn't allow the client to deduce "you're not supposed to ever
see this inode".


> 
> IOW: it is literally the case that the client is supposed to create a
> proxy inode because this is supposed to be a mountpoint.

This may be valid in the specific case that we are talking to a Linux NFSv3
server (of a certain vintage).  It isn't generally valid.


> 
> > I certainly agree that there may be other issues with this code.  It is
> > unlikely to handle volatile filehandles well, and as you say, referrals may
> > well be an issue too.
> >
> > But surely if the server didn't return a valid filehandle, then it is wrong
> > to pretend that "all-zeros" is a valid filehandle.  That is what the current
> > code does.
> 
> As long as we have a valid mounted-on-fileid or a valid fileid, then
> we can still discriminate. That is also what the current code does.
> The only really broken case is if the server returns no filehandle or
> fileid. AFAICS we should be handling that case correctly too in
> nfs_refresh_inode().

When nfs_same_file() returns 'true', I agree that nfs_refresh_inode() does
the correct thing.
When nfs_same_file() returns 'false', (e.g. the server returns no
filehandle), then we don't even get to nfs_refresh_inode().

When readdirplus returns the expected filehandle and/or  fileid, we should
clearly refresh the cached attributed.  When it returns clearly different
information it is reasonable to discard the cached information.
When it explicitly returns no information - there is nothing that can be
assumed.


> 
> >> In that situation, it is perfectly OK for nfs_prime_dcache() to create
> >> an entry for the mounted-on file. It's just not OK for it to invalidate
> >> the dentry if the submount was already performed.
> >>
> >> So how about the following alternative patch?
> >>
> >> 8<----------------------------------------------------------------
> >> >From 1c8194f2147c10fc7a142eda4f6d7f35ae1f7d4f Mon Sep 17 00:00:00 2001
> >> From: Trond Myklebust <trond.myklebust@primarydata.com>
> >> Date: Sun, 22 Feb 2015 16:35:36 -0500
> >> Subject: [PATCH] NFS: Don't invalidate a submounted dentry in
> >>  nfs_prime_dcache()
> >>
> >> If we're traversing a directory which contains a submounted filesystem,
> >> or one that has a referral, the NFS server that is processing the READDIR
> >> request will often return information for the underlying (mounted-on)
> >> directory. It may, or may not, also return filehandle information.
> >>
> >> If this happens, and the lookup in nfs_prime_dcache() returns the
> >> dentry for the submounted directory, the filehandle comparison will
> >> fail, and we call d_invalidate(). Post-commit 8ed936b5671bf
> >> ("vfs: Lazily remove mounts on unlinked files and directories."), this
> >> means the entire subtree is unmounted.
> >>
> >> The following minimal patch addresses this problem by punting on
> >> the invalidation if there is a submount.
> >>
> >> Cudos to Neil Brown <neilb@suse.de> for having tracked down this
> >> issue (see link).
> >>
> >> Reported-by: Nix <nix@esperi.org.uk>
> >> Link: http://lkml.kernel.org/r/87iofju9ht.fsf@spindle.srvr.nix
> >> Fixes: d39ab9de3b80 ("NFS: re-add readdir plus")
> >> Cc: stable@vger.kernel.org # 2.6.27+
> >> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
> >> ---
> >>  fs/nfs/dir.c | 8 ++++----
> >>  1 file changed, 4 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> >> index 43e29e3e3697..c35ff07b7345 100644
> >> --- a/fs/nfs/dir.c
> >> +++ b/fs/nfs/dir.c
> >> @@ -485,10 +485,10 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry)
> >>                       if (!status)
> >>                               nfs_setsecurity(dentry->d_inode, entry->fattr, entry->label);
> >>                       goto out;
> >> -             } else {
> >> -                     d_invalidate(dentry);
> >> -                     dput(dentry);
> >> -             }
> >> +             } else if (IS_ROOT(dentry))
> >> +                     goto out;
> >> +             d_invalidate(dentry);
> >> +             dput(dentry);
> >
> > The 'dentry' in this case was obtained via d_lookup() which doesn't follow
> > mount points.  So there is no chance that IS_ROOT(dentry).
> > d_mountpoint(dentry) might be a more interesting test.
> >
> > However d_invalidate will unmount any subtree further down.
> > So if I have /a/b/c/d all via NFS, and  'd' is a mountpoint, then if the NFS
> > server returns a new filehandle for 'b', 'd' will get unmounted.  Neither
> > 'IS_ROOT' nor 'd_mountpoint' will guard against that.
> >
> > I'm not really sure what we do want here.  The old behaviour of d_invalidate,
> > where it failed if anything was mounted, seemed like a reasonable sort of
> > behaviour.  But we don't have that available any more.
> 
> If the mounted-on-fileid has changed, then we _should_ invalidate.

I can't argue with that.
However as "nfs_same_file()" doesn't check the fileid, I'm not sure how
relevant it is.
Maybe nfs_same_file() should compare the fileid - providing the
fileattributes are included in the READDIRPLUS reply.  comparing the
entry->ino fileid might not work reliably.

Thanks,
NeilBrown

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: what on earth is going on here? paths above mountpoints turn into "(unreachable)"
  2015-02-23  3:05                           ` NeilBrown
@ 2015-02-23  3:33                             ` Trond Myklebust
  2015-02-23  4:49                               ` NeilBrown
  0 siblings, 1 reply; 30+ messages in thread
From: Trond Myklebust @ 2015-02-23  3:33 UTC (permalink / raw)
  To: NeilBrown; +Cc: Nix, J. Bruce Fields, NFS list

On Sun, Feb 22, 2015 at 10:05 PM, NeilBrown <neilb@suse.de> wrote:
> On Sun, 22 Feb 2015 21:05:12 -0500 Trond Myklebust
> <trond.myklebust@primarydata.com> wrote:
>
>> On Sun, Feb 22, 2015 at 5:47 PM, NeilBrown <neilb@suse.de> wrote:
>> > On Sun, 22 Feb 2015 17:13:31 -0500 Trond Myklebust
>> > <trond.myklebust@primarydata.com> wrote:
>> >
>> >> On Mon, 2015-02-16 at 15:54 +1100, NeilBrown wrote:
>> >> > On Sun, 15 Feb 2015 23:28:12 -0500 Trond Myklebust
>> >> > <trond.myklebust@primarydata.com> wrote:
>> >> >
>> >> > > On Sun, Feb 15, 2015 at 9:46 PM, NeilBrown <neilb@suse.de> wrote:
>> >> > > > On Sat, 14 Feb 2015 13:17:00 +0000 Nix <nix@esperi.org.uk> wrote:
>> >> > > >
>> >> > > >> On 10 Feb 2015, J. Bruce Fields outgrape:
>> >> > > >>
>> >> > > >> > It might be interesting to see output from
>> >> > > >> >
>> >> > > >> >     rpc.debug -m rpc -s cache
>> >> > > >> >     cat /proc/net/rpc/nfsd.export/content
>> >> > > >> >     cat /proc/net/rpc/nfsd.fh/content
>> >> > > >> >
>> >> > > >> > especially after the problem manifests.
>> >> > > >>
>> >> > > >> So the mount has vanished again. I couldn't make it happen with
>> >> > > >> nordirplus in the mount options, so that might provide you with a clue.
>> >> > > >
>> >> > > > Yup.  It does.
>> >> > > >
>> >> > > > There is definitely something wrong in nfs_prime_dcache.  I cannot quite
>> >> > > > trace through from cause to effect, but maybe I don't need to.
>> >> > > >
>> >> > > > Can you try the following patch and see if that makes the problem disappear?
>> >> > > >
>> >> > > > When you perform a READDIRPLUS request on a directory that contains
>> >> > > > mountpoints, the the Linux NFS server doesn't return a file-handle for
>> >> > > > those names which are mountpoints (because doing so is a bit tricky).
>> >> > > >
>> >> > > > nfs3_decode_dirent notices and decodes as a filehandle with zero length.
>> >> > > >
>> >> > > > The "nfs_same_file()" check in nfs_prime_dcache() determines that isn't
>> >> > > > the same as the filehandle it has, and tries to invalidate it and make a new
>> >> > > > one.
>> >> > > >
>> >> > > > The invalidation should fail (probably does).
>> >> > > > The creating of a new one ... might succeed.  Beyond that, it all gets a bit
>> >> > > > hazy.
>> >> > > >
>> >> > > > Anyway, please try:
>> >> > > >
>> >> > > > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
>> >> > > > index 9b0c55cb2a2e..a460669dc395 100644
>> >> > > > --- a/fs/nfs/dir.c
>> >> > > > +++ b/fs/nfs/dir.c
>> >> > > > @@ -541,7 +541,7 @@ int nfs_readdir_page_filler(nfs_readdir_descriptor_t *desc, struct nfs_entry *en
>> >> > > >
>> >> > > >                 count++;
>> >> > > >
>> >> > > > -               if (desc->plus != 0)
>> >> > > > +               if (desc->plus != 0 && entry->fh.size)
>> >> > > >                         nfs_prime_dcache(desc->file->f_path.dentry, entry);
>> >> > > >
>> >> > > >                 status = nfs_readdir_add_to_array(entry, page);
>> >> > > >
>> >> > > >
>> >> > > > which you might have to apply by hand.
>> >> > >
>> >> > > Doesn't that check ultimately belong in nfs_fget()? It would seem to
>> >> > > apply to all filehandles, irrespective of provenance.
>> >> > >
>> >> >
>> >> > Maybe.  Though I think it also needs to be before nfs_prime_dcache() tries to
>> >> > valid the dentry it found.
>> >> > e.g.
>> >> >
>> >> >  if (dentry != NULL) {
>> >> >     if (entry->fh->size == 0)
>> >> >        goto out;
>> >> >     else if (nfs_same_file(..)) {
>> >> >     ....
>> >> >     else {
>> >> >         d_invalidate();
>> >> >         ...
>> >> >     }
>> >> >   }
>> >> >
>> >> > ??
>> >> >
>> >> > I'd really like to understand what is actually happening though.
>> >> > d_invalidate() shouldn't effect an unmount.
>> >> >
>> >> > Maybe the dentry that gets mounted on is the one with the all-zero fh...
>> >>
>> >> Commit 8ed936b5671bf (v3.18+) changes d_invalidate() to unmount the
>> >> subtree on a directory being invalidated.
>> >>
>> >> I disagree that the problem here is the zero length filehandle. It is
>> >> rather that we need to accommodate situations where the server is
>> >> setting us up for a submount or a NFSv4 referral.
>> >
>> > I don't understand how you can view the treatment of a non-existent
>> > filehandle as though it were a real filehandle as anything other than a bug.
>>
>> I see it as a case of "I can't return a filehandle, because you're not
>> supposed to ever see this inode".
>
> According to rfc1813, READDIRPLUS returns the filehandles in a "post_of_fh3"
> structure which can optionally contain a filehandle.
> The text says:
>    One of the principles of this revision of the NFS protocol is to
>    return the real value from the indicated operation and not an
>    error from an incidental operation. The post_op_fh3 structure
>    was designed to allow the server to recover from errors
>    encountered while constructing a file handle.
>
> which suggests that the absence of a filehandle could possibly be interpreted
> as an error having occurred, but it doesn't allow the client to guess
> what that error might have been.
> It certainly doesn't allow the client to deduce "you're not supposed to ever
> see this inode".

NFSv3 had no concept of submounts so, quite frankly, it should not be
considered authoritative in this case.

>> IOW: it is literally the case that the client is supposed to create a
>> proxy inode because this is supposed to be a mountpoint.
>
> This may be valid in the specific case that we are talking to a Linux NFSv3
> server (of a certain vintage).  It isn't generally valid.
>
>
>>
>> > I certainly agree that there may be other issues with this code.  It is
>> > unlikely to handle volatile filehandles well, and as you say, referrals may
>> > well be an issue too.
>> >
>> > But surely if the server didn't return a valid filehandle, then it is wrong
>> > to pretend that "all-zeros" is a valid filehandle.  That is what the current
>> > code does.
>>
>> As long as we have a valid mounted-on-fileid or a valid fileid, then
>> we can still discriminate. That is also what the current code does.
>> The only really broken case is if the server returns no filehandle or
>> fileid. AFAICS we should be handling that case correctly too in
>> nfs_refresh_inode().
>
> When nfs_same_file() returns 'true', I agree that nfs_refresh_inode() does
> the correct thing.
> When nfs_same_file() returns 'false', (e.g. the server returns no
> filehandle), then we don't even get to nfs_refresh_inode().
>
> When readdirplus returns the expected filehandle and/or  fileid, we should
> clearly refresh the cached attributed.  When it returns clearly different
> information it is reasonable to discard the cached information.
> When it explicitly returns no information - there is nothing that can be
> assumed.

Your statement assumes that fh->size == 0 implies the server returned
no information. I strongly disagree.
No information => fh->size == 0, but the reverse is not the case, as
you indeed admit in your changelog.

That said, we're talking about the Linux knfsd server here, which
_always_ returns a filehandle unless there request races with an
unlink or the entry is a mountpoint.

>> >> In that situation, it is perfectly OK for nfs_prime_dcache() to create
>> >> an entry for the mounted-on file. It's just not OK for it to invalidate
>> >> the dentry if the submount was already performed.
>> >>
>> >> So how about the following alternative patch?
>> >>
>> >> 8<----------------------------------------------------------------
>> >> >From 1c8194f2147c10fc7a142eda4f6d7f35ae1f7d4f Mon Sep 17 00:00:00 2001
>> >> From: Trond Myklebust <trond.myklebust@primarydata.com>
>> >> Date: Sun, 22 Feb 2015 16:35:36 -0500
>> >> Subject: [PATCH] NFS: Don't invalidate a submounted dentry in
>> >>  nfs_prime_dcache()
>> >>
>> >> If we're traversing a directory which contains a submounted filesystem,
>> >> or one that has a referral, the NFS server that is processing the READDIR
>> >> request will often return information for the underlying (mounted-on)
>> >> directory. It may, or may not, also return filehandle information.
>> >>
>> >> If this happens, and the lookup in nfs_prime_dcache() returns the
>> >> dentry for the submounted directory, the filehandle comparison will
>> >> fail, and we call d_invalidate(). Post-commit 8ed936b5671bf
>> >> ("vfs: Lazily remove mounts on unlinked files and directories."), this
>> >> means the entire subtree is unmounted.
>> >>
>> >> The following minimal patch addresses this problem by punting on
>> >> the invalidation if there is a submount.
>> >>
>> >> Cudos to Neil Brown <neilb@suse.de> for having tracked down this
>> >> issue (see link).
>> >>
>> >> Reported-by: Nix <nix@esperi.org.uk>
>> >> Link: http://lkml.kernel.org/r/87iofju9ht.fsf@spindle.srvr.nix
>> >> Fixes: d39ab9de3b80 ("NFS: re-add readdir plus")
>> >> Cc: stable@vger.kernel.org # 2.6.27+
>> >> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
>> >> ---
>> >>  fs/nfs/dir.c | 8 ++++----
>> >>  1 file changed, 4 insertions(+), 4 deletions(-)
>> >>
>> >> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
>> >> index 43e29e3e3697..c35ff07b7345 100644
>> >> --- a/fs/nfs/dir.c
>> >> +++ b/fs/nfs/dir.c
>> >> @@ -485,10 +485,10 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry)
>> >>                       if (!status)
>> >>                               nfs_setsecurity(dentry->d_inode, entry->fattr, entry->label);
>> >>                       goto out;
>> >> -             } else {
>> >> -                     d_invalidate(dentry);
>> >> -                     dput(dentry);
>> >> -             }
>> >> +             } else if (IS_ROOT(dentry))
>> >> +                     goto out;
>> >> +             d_invalidate(dentry);
>> >> +             dput(dentry);
>> >
>> > The 'dentry' in this case was obtained via d_lookup() which doesn't follow
>> > mount points.  So there is no chance that IS_ROOT(dentry).
>> > d_mountpoint(dentry) might be a more interesting test.
>> >
>> > However d_invalidate will unmount any subtree further down.
>> > So if I have /a/b/c/d all via NFS, and  'd' is a mountpoint, then if the NFS
>> > server returns a new filehandle for 'b', 'd' will get unmounted.  Neither
>> > 'IS_ROOT' nor 'd_mountpoint' will guard against that.
>> >
>> > I'm not really sure what we do want here.  The old behaviour of d_invalidate,
>> > where it failed if anything was mounted, seemed like a reasonable sort of
>> > behaviour.  But we don't have that available any more.
>>
>> If the mounted-on-fileid has changed, then we _should_ invalidate.
>
> I can't argue with that.
> However as "nfs_same_file()" doesn't check the fileid, I'm not sure how
> relevant it is.
> Maybe nfs_same_file() should compare the fileid - providing the
> fileattributes are included in the READDIRPLUS reply.  comparing the
> entry->ino fileid might not work reliably.
>
> Thanks,
> NeilBrown



-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: what on earth is going on here? paths above mountpoints turn into "(unreachable)"
  2015-02-23  3:33                             ` Trond Myklebust
@ 2015-02-23  4:49                               ` NeilBrown
  2015-02-23 13:55                                 ` Trond Myklebust
  0 siblings, 1 reply; 30+ messages in thread
From: NeilBrown @ 2015-02-23  4:49 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Nix, J. Bruce Fields, NFS list

[-- Attachment #1: Type: text/plain, Size: 4953 bytes --]

On Sun, 22 Feb 2015 22:33:28 -0500 Trond Myklebust
<trond.myklebust@primarydata.com> wrote:

> > According to rfc1813, READDIRPLUS returns the filehandles in a "post_of_fh3"
> > structure which can optionally contain a filehandle.
> > The text says:
> >    One of the principles of this revision of the NFS protocol is to
> >    return the real value from the indicated operation and not an
> >    error from an incidental operation. The post_op_fh3 structure
> >    was designed to allow the server to recover from errors
> >    encountered while constructing a file handle.
> >
> > which suggests that the absence of a filehandle could possibly be interpreted
> > as an error having occurred, but it doesn't allow the client to guess
> > what that error might have been.
> > It certainly doesn't allow the client to deduce "you're not supposed to ever
> > see this inode".
> 
> NFSv3 had no concept of submounts so, quite frankly, it should not be
> considered authoritative in this case.

The presence or otherwise of submounts is tangential to the bug.
The spec implies that nothing can be deduced from the absence of a filehandle
in a READDIRPLUS reply, but the code appears to deduce something.  This is a
bug.  submounts happen to trigger it in the one case we have clear evidence
for.


> 
> >> IOW: it is literally the case that the client is supposed to create a
> >> proxy inode because this is supposed to be a mountpoint.
> >
> > This may be valid in the specific case that we are talking to a Linux NFSv3
> > server (of a certain vintage).  It isn't generally valid.
> >
> >
> >>
> >> > I certainly agree that there may be other issues with this code.  It is
> >> > unlikely to handle volatile filehandles well, and as you say, referrals may
> >> > well be an issue too.
> >> >
> >> > But surely if the server didn't return a valid filehandle, then it is wrong
> >> > to pretend that "all-zeros" is a valid filehandle.  That is what the current
> >> > code does.
> >>
> >> As long as we have a valid mounted-on-fileid or a valid fileid, then
> >> we can still discriminate. That is also what the current code does.
> >> The only really broken case is if the server returns no filehandle or
> >> fileid. AFAICS we should be handling that case correctly too in
> >> nfs_refresh_inode().
> >
> > When nfs_same_file() returns 'true', I agree that nfs_refresh_inode() does
> > the correct thing.
> > When nfs_same_file() returns 'false', (e.g. the server returns no
> > filehandle), then we don't even get to nfs_refresh_inode().
> >
> > When readdirplus returns the expected filehandle and/or  fileid, we should
> > clearly refresh the cached attributed.  When it returns clearly different
> > information it is reasonable to discard the cached information.
> > When it explicitly returns no information - there is nothing that can be
> > assumed.
> 
> Your statement assumes that fh->size == 0 implies the server returned
> no information. I strongly disagree.

I accept that a server might genuinely return a filehandle with a size of
zero for some one object (maybe a root directory or something).
In that case, this code:

		if (*p != xdr_zero) {
			error = decode_nfs_fh3(xdr, entry->fh);
			if (unlikely(error)) {
				if (error == -E2BIG)
					goto out_truncated;
				return error;
			}
		} else
			zero_nfs_fh3(entry->fh);

in nfs3_decode_dirent is wrong.
'zero_nfs_fh3' should not be used as that makes the non-existent filehandle
look like something that could be a genuine filehandle.
Rather something like
   entry->fh->size == NFS_NO_FH_SIZE;
should be used, where NFS_NO_FH_SIZE is (NFS_MAXFHSIZE+1) or similar.

Or maybe entry->fh should  be set to NULL if no filehandle is present.


> No information => fh->size == 0, but the reverse is not the case, as
> you indeed admit in your changelog.

I didn't not mean to admit that.  I admitted that the server could genuinely
return a filehandle that was different to the one cached by the client.  I did
not mean that the server could genuinely return a filehandle with size of 0
(though I now agree that is possible as the spec doesn't forbid it).

On glancing through RFC1813 I noticed:

   Clients should use file handle
   comparisons only to improve performance, not for correct
   behavior.

This suggests that triggering an unmount when filehandles don't match is not
correct.  Flushing the cache is OK (that is a performance issue).  Removing
mountpoints is not.  So the change to d_invalidate() in 3.18 really isn't
very good for NFS.


> 
> That said, we're talking about the Linux knfsd server here, which
> _always_ returns a filehandle unless there request races with an
> unlink or the entry is a mountpoint.

Surely the implementation should, where possible, be written to the spec, not
to some particular server implementation.

Thanks,
NeilBrown

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: what on earth is going on here? paths above mountpoints turn into "(unreachable)"
  2015-02-23  4:49                               ` NeilBrown
@ 2015-02-23 13:55                                 ` Trond Myklebust
  0 siblings, 0 replies; 30+ messages in thread
From: Trond Myklebust @ 2015-02-23 13:55 UTC (permalink / raw)
  To: NeilBrown; +Cc: Nix, J. Bruce Fields, NFS list

On Sun, Feb 22, 2015 at 11:49 PM, NeilBrown <neilb@suse.de> wrote:
> On Sun, 22 Feb 2015 22:33:28 -0500 Trond Myklebust
> <trond.myklebust@primarydata.com> wrote:
>
>> > According to rfc1813, READDIRPLUS returns the filehandles in a "post_of_fh3"
>> > structure which can optionally contain a filehandle.
>> > The text says:
>> >    One of the principles of this revision of the NFS protocol is to
>> >    return the real value from the indicated operation and not an
>> >    error from an incidental operation. The post_op_fh3 structure
>> >    was designed to allow the server to recover from errors
>> >    encountered while constructing a file handle.
>> >
>> > which suggests that the absence of a filehandle could possibly be interpreted
>> > as an error having occurred, but it doesn't allow the client to guess
>> > what that error might have been.
>> > It certainly doesn't allow the client to deduce "you're not supposed to ever
>> > see this inode".
>>
>> NFSv3 had no concept of submounts so, quite frankly, it should not be
>> considered authoritative in this case.
>
> The presence or otherwise of submounts is tangential to the bug.
> The spec implies that nothing can be deduced from the absence of a filehandle
> in a READDIRPLUS reply, but the code appears to deduce something.  This is a
> bug.  submounts happen to trigger it in the one case we have clear evidence
> for.
>
>
>>
>> >> IOW: it is literally the case that the client is supposed to create a
>> >> proxy inode because this is supposed to be a mountpoint.
>> >
>> > This may be valid in the specific case that we are talking to a Linux NFSv3
>> > server (of a certain vintage).  It isn't generally valid.
>> >
>> >
>> >>
>> >> > I certainly agree that there may be other issues with this code.  It is
>> >> > unlikely to handle volatile filehandles well, and as you say, referrals may
>> >> > well be an issue too.
>> >> >
>> >> > But surely if the server didn't return a valid filehandle, then it is wrong
>> >> > to pretend that "all-zeros" is a valid filehandle.  That is what the current
>> >> > code does.
>> >>
>> >> As long as we have a valid mounted-on-fileid or a valid fileid, then
>> >> we can still discriminate. That is also what the current code does.
>> >> The only really broken case is if the server returns no filehandle or
>> >> fileid. AFAICS we should be handling that case correctly too in
>> >> nfs_refresh_inode().
>> >
>> > When nfs_same_file() returns 'true', I agree that nfs_refresh_inode() does
>> > the correct thing.
>> > When nfs_same_file() returns 'false', (e.g. the server returns no
>> > filehandle), then we don't even get to nfs_refresh_inode().
>> >
>> > When readdirplus returns the expected filehandle and/or  fileid, we should
>> > clearly refresh the cached attributed.  When it returns clearly different
>> > information it is reasonable to discard the cached information.
>> > When it explicitly returns no information - there is nothing that can be
>> > assumed.
>>
>> Your statement assumes that fh->size == 0 implies the server returned
>> no information. I strongly disagree.
>
> I accept that a server might genuinely return a filehandle with a size of
> zero for some one object (maybe a root directory or something).
> In that case, this code:
>
>                 if (*p != xdr_zero) {
>                         error = decode_nfs_fh3(xdr, entry->fh);
>                         if (unlikely(error)) {
>                                 if (error == -E2BIG)
>                                         goto out_truncated;
>                                 return error;
>                         }
>                 } else
>                         zero_nfs_fh3(entry->fh);
>
> in nfs3_decode_dirent is wrong.
> 'zero_nfs_fh3' should not be used as that makes the non-existent filehandle
> look like something that could be a genuine filehandle.
> Rather something like
>    entry->fh->size == NFS_NO_FH_SIZE;
> should be used, where NFS_NO_FH_SIZE is (NFS_MAXFHSIZE+1) or similar.
>
> Or maybe entry->fh should  be set to NULL if no filehandle is present.
>
>
>> No information => fh->size == 0, but the reverse is not the case, as
>> you indeed admit in your changelog.
>
> I didn't not mean to admit that.  I admitted that the server could genuinely
> return a filehandle that was different to the one cached by the client.  I did
> not mean that the server could genuinely return a filehandle with size of 0
> (though I now agree that is possible as the spec doesn't forbid it).
>
> On glancing through RFC1813 I noticed:
>
>    Clients should use file handle
>    comparisons only to improve performance, not for correct
>    behavior.
>
> This suggests that triggering an unmount when filehandles don't match is not
> correct.  Flushing the cache is OK (that is a performance issue).  Removing
> mountpoints is not.  So the change to d_invalidate() in 3.18 really isn't
> very good for NFS.
>

It really doesn't matter what the NFSv3 spec says here. We store the
filehandle in the inode, and if that filehandle changes then we drop
the inode (and hence the dentry). This has always been the case and is
not particular to the readdir code. We do the exact same thing in
nfs_lookup_revalidate().
The reason why we ignore the letter of the NFSv3 spec here is reality:
NetApp chose to ignore the spec admonition about keeping fileids
unique for the case where they create a snapshot. That basically means
the filehandle is the only thing we can rely on to know when a file is
the same and can be gathered into the same inode.

The NFSv4 spec, which adds protocol support for crossing submounts on
the server tried to clarify things in the section entitled "Data
Caching and File Identity", and we do try to stick to that
clarification: the check for mountpoints is equivalent here to the
fsid check.

>>
>> That said, we're talking about the Linux knfsd server here, which
>> _always_ returns a filehandle unless there request races with an
>> unlink or the entry is a mountpoint.
>
> Surely the implementation should, where possible, be written to the spec, not
> to some particular server implementation.

It is. The question is where the regression comes from.

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2015-02-23 13:55 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-03  0:25 what on earth is going on here? paths above mountpoints turn into "(unreachable)" Nix
2015-02-03 19:53 ` J. Bruce Fields
2015-02-03 19:57   ` Nix
2015-02-04 23:28     ` Nix
2015-02-05  0:26       ` NeilBrown
2015-02-10 17:48         ` Nix
2015-02-10 18:32           ` J. Bruce Fields
2015-02-11 23:07             ` Nix
2015-02-11 23:18               ` NeilBrown
2015-02-12  1:50                 ` Nix
2015-02-12 15:38               ` J. Bruce Fields
2015-02-14 13:17             ` Nix
2015-02-16  2:46               ` NeilBrown
2015-02-16  3:57                 ` NeilBrown
2015-02-17 17:32                   ` Nix
2015-02-20 17:26                   ` Nix
2015-02-20 21:03                     ` NeilBrown
2015-02-16  4:28                 ` Trond Myklebust
2015-02-16  4:54                   ` NeilBrown
2015-02-22 22:13                     ` Trond Myklebust
2015-02-22 22:47                       ` NeilBrown
2015-02-23  2:05                         ` Trond Myklebust
2015-02-23  2:33                           ` Trond Myklebust
2015-02-23  3:05                           ` NeilBrown
2015-02-23  3:33                             ` Trond Myklebust
2015-02-23  4:49                               ` NeilBrown
2015-02-23 13:55                                 ` Trond Myklebust
2015-02-16 15:43               ` J. Bruce Fields
2015-02-11  3:07           ` NeilBrown
2015-02-11 23:11             ` Nix

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.