All of lore.kernel.org
* Delays on "first" access to a NFS mount
@ 2007-03-07 10:23 Simon Peter
  2007-03-07 12:38 ` Talpey, Thomas
  0 siblings, 1 reply; 45+ messages in thread
From: Simon Peter @ 2007-03-07 10:23 UTC (permalink / raw)
  To: nfs

[-- Attachment #1: Type: text/plain, Size: 521 bytes --]

Hi,

I get a good 10 second delay anytime I am accessing my NFS mounts from
a client for the first time (or after a long time not accessing them --
I suppose whenever the cache is cleared or something similar).

I usually did not bother, even though it is very annoying, but this
time I collected a network protocol capture, which is attached. Notice
the big delay between packet #6 and #8, while #7 should show that it is
not a network issue.

I would be very glad if somebody could explain these delays.

Thanks,
Simon

[-- Attachment #2: capture.txt --]
[-- Type: text/plain, Size: 9564 bytes --]

No.     Time        Source                Destination           Protocol Info
      3 2.814252    192.168.110.10        192.168.110.1         TCP      675 > nfs [SYN] Seq=0 Len=0 MSS=1460 TSV=1145045 TSER=0 WS=7

Frame 3 (74 bytes on wire, 74 bytes captured)
Ethernet II, Src: AsustekC_0a:06:28 (00:0c:6e:0a:06:28), Dst: ZonetTec_8c:11:73 (00:50:22:8c:11:73)
Internet Protocol, Src: 192.168.110.10 (192.168.110.10), Dst: 192.168.110.1 (192.168.110.1)
Transmission Control Protocol, Src Port: 675 (675), Dst Port: nfs (2049), Seq: 0, Len: 0

No.     Time        Source                Destination           Protocol Info
      4 2.814382    192.168.110.1         192.168.110.10        TCP      nfs > 675 [SYN, ACK] Seq=0 Ack=1 Win=92672 Len=0 MSS=1460 TSV=474034265 TSER=1145045 WS=4

Frame 4 (74 bytes on wire, 74 bytes captured)
Ethernet II, Src: ZonetTec_8c:11:73 (00:50:22:8c:11:73), Dst: AsustekC_0a:06:28 (00:0c:6e:0a:06:28)
Internet Protocol, Src: 192.168.110.1 (192.168.110.1), Dst: 192.168.110.10 (192.168.110.10)
Transmission Control Protocol, Src Port: nfs (2049), Dst Port: 675 (675), Seq: 0, Ack: 1, Len: 0

No.     Time        Source                Destination           Protocol Info
      5 2.814410    192.168.110.10        192.168.110.1         TCP      675 > nfs [ACK] Seq=1 Ack=1 Win=5888 Len=0 TSV=1145045 TSER=474034265

Frame 5 (66 bytes on wire, 66 bytes captured)
Ethernet II, Src: AsustekC_0a:06:28 (00:0c:6e:0a:06:28), Dst: ZonetTec_8c:11:73 (00:50:22:8c:11:73)
Internet Protocol, Src: 192.168.110.10 (192.168.110.10), Dst: 192.168.110.1 (192.168.110.1)
Transmission Control Protocol, Src Port: 675 (675), Dst Port: nfs (2049), Seq: 1, Ack: 1, Len: 0

No.     Time        Source                Destination           Protocol Info
      6 2.822284    192.168.110.10        192.168.110.1         NFS      V3 GETATTR Call (Reply In 8), FH:0x43fe0000

Frame 6 (210 bytes on wire, 210 bytes captured)
Ethernet II, Src: AsustekC_0a:06:28 (00:0c:6e:0a:06:28), Dst: ZonetTec_8c:11:73 (00:50:22:8c:11:73)
Internet Protocol, Src: 192.168.110.10 (192.168.110.10), Dst: 192.168.110.1 (192.168.110.1)
Transmission Control Protocol, Src Port: 675 (675), Dst Port: nfs (2049), Seq: 1, Ack: 1, Len: 144
Remote Procedure Call, Type:Call XID:0x8e4655b0
Network File System, GETATTR Call FH:0x43fe0000
    [Program Version: 3]
    [V3 Procedure: GETATTR (1)]
    object

No.     Time        Source                Destination           Protocol Info
      7 2.822400    192.168.110.1         192.168.110.10        TCP      nfs > 675 [ACK] Seq=1 Ack=145 Win=6864 Len=0 TSV=474034267 TSER=1145047

Frame 7 (66 bytes on wire, 66 bytes captured)
Ethernet II, Src: ZonetTec_8c:11:73 (00:50:22:8c:11:73), Dst: AsustekC_0a:06:28 (00:0c:6e:0a:06:28)
Internet Protocol, Src: 192.168.110.1 (192.168.110.1), Dst: 192.168.110.10 (192.168.110.10)
Transmission Control Protocol, Src Port: nfs (2049), Dst Port: 675 (675), Seq: 1, Ack: 145, Len: 0

No.     Time        Source                Destination           Protocol Info
      8 13.749967   192.168.110.1         192.168.110.10        NFS      V3 GETATTR Reply (Call In 6)  Directory mode:2775 uid:102 gid:1000

Frame 8 (182 bytes on wire, 182 bytes captured)
Ethernet II, Src: ZonetTec_8c:11:73 (00:50:22:8c:11:73), Dst: AsustekC_0a:06:28 (00:0c:6e:0a:06:28)
Internet Protocol, Src: 192.168.110.1 (192.168.110.1), Dst: 192.168.110.10 (192.168.110.10)
Transmission Control Protocol, Src Port: nfs (2049), Dst Port: 675 (675), Seq: 1, Ack: 145, Len: 116
Remote Procedure Call, Type:Reply XID:0x8e4655b0
Network File System, GETATTR Reply  Directory mode:2775 uid:102 gid:1000
    [Program Version: 3]
    [V3 Procedure: GETATTR (1)]
    Status: NFS3_OK (0)
    obj_attributes  Directory mode:2775 uid:102 gid:1000
        Type: Directory (2)
        mode: 042775
        nlink: 2
        uid: 102
        gid: 1000
        size: 4096
        used: 4096
        rdev: 0,0
        fsid: 0x000000000000fe00
        fileid: 15650
        atime: Mar  7, 2007 10:31:07.000000000
        mtime: Mar  7, 2007 10:31:07.000000000
        ctime: Mar  7, 2007 10:31:07.000000000

No.     Time        Source                Destination           Protocol Info
      9 13.750012   192.168.110.10        192.168.110.1         TCP      675 > nfs [ACK] Seq=145 Ack=117 Win=5888 Len=0 TSV=1147787 TSER=474037003

Frame 9 (66 bytes on wire, 66 bytes captured)
Ethernet II, Src: AsustekC_0a:06:28 (00:0c:6e:0a:06:28), Dst: ZonetTec_8c:11:73 (00:50:22:8c:11:73)
Internet Protocol, Src: 192.168.110.10 (192.168.110.10), Dst: 192.168.110.1 (192.168.110.1)
Transmission Control Protocol, Src Port: 675 (675), Dst Port: nfs (2049), Seq: 145, Ack: 117, Len: 0

No.     Time        Source                Destination           Protocol Info
     12 14.695027   192.168.110.10        192.168.110.1         NFS      V3 ACCESS Call (Reply In 14), FH:0x43fe0000

Frame 12 (214 bytes on wire, 214 bytes captured)
Ethernet II, Src: AsustekC_0a:06:28 (00:0c:6e:0a:06:28), Dst: ZonetTec_8c:11:73 (00:50:22:8c:11:73)
Internet Protocol, Src: 192.168.110.10 (192.168.110.10), Dst: 192.168.110.1 (192.168.110.1)
Transmission Control Protocol, Src Port: 675 (675), Dst Port: nfs (2049), Seq: 145, Ack: 117, Len: 148
Remote Procedure Call, Type:Call XID:0x8f4655b0
Network File System, ACCESS Call FH:0x43fe0000
    [Program Version: 3]
    [V3 Procedure: ACCESS (4)]
    object
    access: 0x1f

No.     Time        Source                Destination           Protocol Info
     13 14.695157   192.168.110.1         192.168.110.10        TCP      nfs > 675 [ACK] Seq=117 Ack=293 Win=7936 Len=0 TSV=474037239 TSER=1148027

Frame 13 (66 bytes on wire, 66 bytes captured)
Ethernet II, Src: ZonetTec_8c:11:73 (00:50:22:8c:11:73), Dst: AsustekC_0a:06:28 (00:0c:6e:0a:06:28)
Internet Protocol, Src: 192.168.110.1 (192.168.110.1), Dst: 192.168.110.10 (192.168.110.10)
Transmission Control Protocol, Src Port: nfs (2049), Dst Port: 675 (675), Seq: 117, Ack: 293, Len: 0

No.     Time        Source                Destination           Protocol Info
     14 14.695260   192.168.110.1         192.168.110.10        NFS      V3 ACCESS Reply (Call In 12)

Frame 14 (190 bytes on wire, 190 bytes captured)
Ethernet II, Src: ZonetTec_8c:11:73 (00:50:22:8c:11:73), Dst: AsustekC_0a:06:28 (00:0c:6e:0a:06:28)
Internet Protocol, Src: 192.168.110.1 (192.168.110.1), Dst: 192.168.110.10 (192.168.110.10)
Transmission Control Protocol, Src Port: nfs (2049), Dst Port: 675 (675), Seq: 117, Ack: 293, Len: 124
Remote Procedure Call, Type:Reply XID:0x8f4655b0
Network File System, ACCESS Reply
    [Program Version: 3]
    [V3 Procedure: ACCESS (4)]
    Status: NFS3_OK (0)
    obj_attributes  Directory mode:2775 uid:102 gid:1000
    access: 0x1f

No.     Time        Source                Destination           Protocol Info
     15 14.695269   192.168.110.10        192.168.110.1         TCP      675 > nfs [ACK] Seq=293 Ack=241 Win=5888 Len=0 TSV=1148027 TSER=474037239

Frame 15 (66 bytes on wire, 66 bytes captured)
Ethernet II, Src: AsustekC_0a:06:28 (00:0c:6e:0a:06:28), Dst: ZonetTec_8c:11:73 (00:50:22:8c:11:73)
Internet Protocol, Src: 192.168.110.10 (192.168.110.10), Dst: 192.168.110.1 (192.168.110.1)
Transmission Control Protocol, Src Port: 675 (675), Dst Port: nfs (2049), Seq: 293, Ack: 241, Len: 0

No.     Time        Source                Destination           Protocol Info
     16 14.695344   192.168.110.10        192.168.110.1         NFS      V3 READDIRPLUS Call (Reply In 17), FH:0x43fe0000

Frame 16 (234 bytes on wire, 234 bytes captured)
Ethernet II, Src: AsustekC_0a:06:28 (00:0c:6e:0a:06:28), Dst: ZonetTec_8c:11:73 (00:50:22:8c:11:73)
Internet Protocol, Src: 192.168.110.10 (192.168.110.10), Dst: 192.168.110.1 (192.168.110.1)
Transmission Control Protocol, Src Port: 675 (675), Dst Port: nfs (2049), Seq: 293, Ack: 241, Len: 168
Remote Procedure Call, Type:Call XID:0x904655b0
Network File System, READDIRPLUS Call FH:0x43fe0000
    [Program Version: 3]
    [V3 Procedure: READDIRPLUS (17)]
    dir
    cookie: 0
    Verifier: Opaque Data
    dircount: 512
    maxcount: 4096

No.     Time        Source                Destination           Protocol Info
     17 14.695604   192.168.110.1         192.168.110.10        NFS      V3 READDIRPLUS Reply (Call In 16) . ..

Frame 17 (482 bytes on wire, 482 bytes captured)
Ethernet II, Src: ZonetTec_8c:11:73 (00:50:22:8c:11:73), Dst: AsustekC_0a:06:28 (00:0c:6e:0a:06:28)
Internet Protocol, Src: 192.168.110.1 (192.168.110.1), Dst: 192.168.110.10 (192.168.110.10)
Transmission Control Protocol, Src Port: nfs (2049), Dst Port: 675 (675), Seq: 241, Ack: 461, Len: 416
Remote Procedure Call, Type:Reply XID:0x904655b0
Network File System, READDIRPLUS Reply
    [Program Version: 3]
    [V3 Procedure: READDIRPLUS (17)]
    Status: NFS3_OK (0)
    dir_attributes  Directory mode:2775 uid:102 gid:1000
    Verifier: Opaque Data
    Value Follows: Yes
    Entry: name .
    Value Follows: Yes
    Entry: name ..
    Value Follows: No
    EOF: 1

No.     Time        Source                Destination           Protocol Info
     18 14.733282   192.168.110.10        192.168.110.1         TCP      675 > nfs [ACK] Seq=461 Ack=657 Win=6912 Len=0 TSV=1148037 TSER=474037239

Frame 18 (66 bytes on wire, 66 bytes captured)
Ethernet II, Src: AsustekC_0a:06:28 (00:0c:6e:0a:06:28), Dst: ZonetTec_8c:11:73 (00:50:22:8c:11:73)
Internet Protocol, Src: 192.168.110.10 (192.168.110.10), Dst: 192.168.110.1 (192.168.110.1)
Transmission Control Protocol, Src Port: 675 (675), Dst Port: nfs (2049), Seq: 461, Ack: 657, Len: 0

[-- Attachment #3: Type: text/plain, Size: 345 bytes --]

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV

[-- Attachment #4: Type: text/plain, Size: 140 bytes --]

_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Delays on "first" access to a NFS mount
  2007-03-07 10:23 Delays on "first" access to a NFS mount Simon Peter
@ 2007-03-07 12:38 ` Talpey, Thomas
  2007-03-07 13:22   ` Simon Peter
  2007-03-07 15:06   ` Simon Peter
  0 siblings, 2 replies; 45+ messages in thread
From: Talpey, Thomas @ 2007-03-07 12:38 UTC (permalink / raw)
  To: Simon Peter; +Cc: nfs

The delay seems to be on the server side. What is the server running?
Does it have any nameservice issues? Delays on the first response can
often be due to server exports checking, which usually requires reverse
name lookups.

Tom.

At 05:23 AM 3/7/2007, Simon Peter wrote:
>Hi,
>
>I get a good 10 second delay anytime I am accessing my NFS mounts from
>a client for the first time (or after a long time not accessing them --
>I suppose whenever the cache is cleared or something similar).
>
>I usually did not bother, even though it is very annoying, but this
>time I collected a network protocol capture, which is attached. Notice
>the big delay between packet #6 and #8, while #7 should show that it is
>not a network issue.
>
>I would be very glad if somebody could explain these delays.
>
>Thanks,
>Simon




* Re: Delays on "first" access to a NFS mount
  2007-03-07 12:38 ` Talpey, Thomas
@ 2007-03-07 13:22   ` Simon Peter
  2007-03-07 15:06   ` Simon Peter
  1 sibling, 0 replies; 45+ messages in thread
From: Simon Peter @ 2007-03-07 13:22 UTC (permalink / raw)
  To: Talpey, Thomas; +Cc: nfs

The server runs Debian unstable:
Linux server 2.6.18-3-k7 #1 SMP Sun Dec 10 20:17:39 UTC 2006 i686
GNU/Linux

And I use classic NFSv3, exporting only to that subnet with options
(rw,sync). The client uses solely the intr option.

There shouldn't be any nameservice issues since the server is also
running the nameserver and is the authority for the subnet that I am
mounting from. I tried canonical and reverse lookups of the client's
name/IP from that server, all without delay. How would I check for
nameservice issues?

The server has some disks with exported directories that spin down after
some idle time, but the disk of that particular mount point that I am
using is always online. Maybe the server somehow checks all exports all
the time and not just the particular requested one and thus spins up
all the other disks?

Simon

> The delay seems to be on the server side. What is the server running?
> Does it have any nameservice issues? Delays on the first response can
> often be due to server exports checking, which usually requires
> reverse name lookups.
> 
> Tom.
> 
> At 05:23 AM 3/7/2007, Simon Peter wrote:
> >Hi,
> >
> >I get a good 10 second delay anytime I am accessing my NFS mounts
> >from a client for the first time (or after a long time not accessing
> >them -- I suppose whenever the cache is cleared or something
> >similar).
> >
> >I usually did not bother, even though it is very annoying, but this
> >time I collected a network protocol capture, which is attached.
> >Notice the big delay between packet #6 and #8, while #7 should show
> >that it is not a network issue.
> >
> >I would be very glad if somebody could explain these delays.
> >
> >Thanks,
> >Simon




* Re: Delays on "first" access to a NFS mount
  2007-03-07 12:38 ` Talpey, Thomas
  2007-03-07 13:22   ` Simon Peter
@ 2007-03-07 15:06   ` Simon Peter
  2007-03-07 15:10     ` Simon Peter
  2007-03-07 15:42     ` J. Bruce Fields
  1 sibling, 2 replies; 45+ messages in thread
From: Simon Peter @ 2007-03-07 15:06 UTC (permalink / raw)
  To: Talpey, Thomas; +Cc: nfs

Hi again,

just verified that the server indeed spins up all disks before
answering the request. I thus suspect it is somehow checking all exports
whenever any one export is mounted. Is this correct behaviour?

Simon

> The delay seems to be on the server side. What is the server running?
> Does it have any nameservice issues? Delays on the first response can
> often be due to server exports checking, which usually requires
> reverse name lookups.
> 
> Tom.
> 
> At 05:23 AM 3/7/2007, Simon Peter wrote:
> >Hi,
> >
> >I get a good 10 second delay anytime I am accessing my NFS mounts
> >from a client for the first time (or after a long time not accessing
> >them -- I suppose whenever the cache is cleared or something
> >similar).
> >
> >I usually did not bother, even though it is very annoying, but this
> >time I collected a network protocol capture, which is attached.
> >Notice the big delay between packet #6 and #8, while #7 should show
> >that it is not a network issue.
> >
> >I would be very glad if somebody could explain these delays.
> >
> >Thanks,
> >Simon




* Re: Delays on "first" access to a NFS mount
  2007-03-07 15:06   ` Simon Peter
@ 2007-03-07 15:10     ` Simon Peter
  2007-03-07 15:42     ` J. Bruce Fields
  1 sibling, 0 replies; 45+ messages in thread
From: Simon Peter @ 2007-03-07 15:10 UTC (permalink / raw)
  To: Thomas.Talpey; +Cc: nfs

That should have read "accessed", not "mounted". The export is already
mounted. :)

Simon

> Hi again,
> 
> just verified that the server indeed spins up all disks before
> answering the request. I thus suspect it is somehow checking all
> exports whenever any one export is mounted. Is this correct behaviour?
> 
> Simon
> 
> > The delay seems to be on the server side. What is the server
> > running? Does it have any nameservice issues? Delays on the first
> > response can often be due to server exports checking, which usually
> > requires reverse name lookups.
> > 
> > Tom.
> > 
> > At 05:23 AM 3/7/2007, Simon Peter wrote:
> > >Hi,
> > >
> > >I get a good 10 second delay anytime I am accessing my NFS mounts
> > >from a client for the first time (or after a long time not
> > >accessing them -- I suppose whenever the cache is cleared or
> > >something similar).
> > >
> > >I usually did not bother, even though it is very annoying, but this
> > >time I collected a network protocol capture, which is attached.
> > >Notice the big delay between packet #6 and #8, while #7 should show
> > >that it is not a network issue.
> > >
> > >I would be very glad if somebody could explain these delays.
> > >
> > >Thanks,
> > >Simon
> 




* Re: Delays on "first" access to a NFS mount
  2007-03-07 15:06   ` Simon Peter
  2007-03-07 15:10     ` Simon Peter
@ 2007-03-07 15:42     ` J. Bruce Fields
  2007-03-07 18:44       ` Simon Peter
  1 sibling, 1 reply; 45+ messages in thread
From: J. Bruce Fields @ 2007-03-07 15:42 UTC (permalink / raw)
  To: Simon Peter; +Cc: nfs, Talpey, Thomas

On Wed, Mar 07, 2007 at 04:06:33PM +0100, Simon Peter wrote:
> just verified that the server indeed spins up all disks before
> answering the request. I thus suspect it is somehow checking all exports
> whenever any one export is mounted. Is this correct behaviour?

Hm.  If you have the nfs-utils source, you can see there's a loop in

	nfs-utils/utils/mountd/cache.c:nfsd_fh()

that stats the root of each export, in two places; the first it looks
like you shouldn't hit if you don't have the mountpoint export option
set:

	 if (exp->m_export.e_mountpoint &&
	     !is_mountpoint(exp->m_export.e_mountpoint[0]?
	                    exp->m_export.e_mountpoint:
	                    exp->m_export.e_path))
		dev_missing ++;

The second is to figure out which filesystem the filehandle that you passed in
that getattr is for:

         if (stat(exp->m_export.e_path, &stb) != 0)
                 continue;
         if (fsidtype == 1 &&
             ((exp->m_export.e_flags & NFSEXP_FSID) == 0 ||
              exp->m_export.e_fsid != fsidnum))
                 continue;
         if (fsidtype != 1) {
                 if (stb.st_ino != inode)
                         continue;
                 if (major != major(stb.st_dev) ||
                     minor != minor(stb.st_dev))
                         continue;
         }
	/* It's a match !! */

You could stick a printf() in there somewhere or something to check whether
this is really where it's waiting.

Could we cache the stat information in the export and then double-check it if
necessary when there's a match?  Or is there some way we could get the kernel
to keep that cached for us?

It seems reasonable to want to export filesystems from a bunch of disks without
necessarily keeping them all spun up all the time.

--b.



* Re: Delays on "first" access to a NFS mount
  2007-03-07 15:42     ` J. Bruce Fields
@ 2007-03-07 18:44       ` Simon Peter
  2007-03-07 20:29         ` J. Bruce Fields
  2007-03-07 20:31         ` Talpey, Thomas
  0 siblings, 2 replies; 45+ messages in thread
From: Simon Peter @ 2007-03-07 18:44 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: nfs, Talpey, Thomas

> > just verified that the server indeed spins up all disks before
> > answering the request. I thus suspect it is somehow checking all
> > exports whenever any one export is accessed. Is this correct
> > behaviour?
> Hm.  If you have the nfs-utils source, you can see there's a loop in
> 	nfs-utils/utils/mountd/cache.c:nfsd_fh()
> that stats the root of each export, in two places; the first it looks
> like you shouldn't hit if you don't have the mountpoint export option
> set:

Correct. This one is never hit in my case.

> The second is to figure out which filesystem the filehandle that you
> passed in that getattr is for:
>          if (stat(exp->m_export.e_path, &stb) != 0)
>                  continue;

This is where the wait for the respective disk to spin up occurs.

> Could we cache the stat information in the export and then
> double-check it if necessary when there's a match?  Or is there some
> way we could get the kernel to keep that cached for us?

I could certainly cook up a patch for mountd to cache that information
on its own. I don't have too much clue about how the kernel does its
caching, though. If it's useful to do that directly in mountd, I could

get my hands on it.

> It seems reasonable to want to export filesystems from a bunch of
> disks without necessarily keeping them all spun up all the time.

This is at least what I would like to see...

Thanks,
Simon



* Re: Delays on "first" access to a NFS mount
  2007-03-07 18:44       ` Simon Peter
@ 2007-03-07 20:29         ` J. Bruce Fields
  2007-03-07 21:46           ` Simon Peter
                             ` (3 more replies)
  2007-03-07 20:31         ` Talpey, Thomas
  1 sibling, 4 replies; 45+ messages in thread
From: J. Bruce Fields @ 2007-03-07 20:29 UTC (permalink / raw)
  To: Simon Peter; +Cc: nfs, Talpey, Thomas

On Wed, Mar 07, 2007 at 07:44:18PM +0100, Simon Peter wrote:
> > > just verified that the server indeed spins up all disks before
> > > answering the request. I thus suspect it is somehow checking all
> > > exports whenever any one export is accessed. Is this correct
> > > behaviour?
> > Hm.  If you have the nfs-utils source, you can see there's a loop in
> > 	nfs-utils/utils/mountd/cache.c:nfsd_fh()
> > that stats the root of each export, in two places; the first it looks
> > like you shouldn't hit if you don't have the mountpoint export option
> > set:
> 
> Correct. This one is never hit in my case.
> 
> > The second is to figure out which filesystem the filehandle that you
> > passed in that getattr is for:
> >          if (stat(exp->m_export.e_path, &stb) != 0)
> >                  continue;
> 
> This is where the wait for the respective disk to spin up occurs.

OK, cool, so we understand the problem.

> > Could we cache the stat information in the export and then
> > double-check it if necessary when there's a match?  Or is there some
> > way we could get the kernel to keep that cached for us?
> 
> I could certainly cook up a patch for mountd to cache that information
> on its own.  I don't have too much clue about how the kernel does its
> caching, though. If it's useful to do that directly in mountd, I
> could get my hands on it.

There's two caches involved:

	- the filesystem caches attributes so that subsequent stats of
	  the exported directory can be answered without having to go to
	  disk.  I guess it's not surprising that that wouldn't be cached
	  anymore if you hadn't touched the filesystem in a long time.
	  Though there's one point I'm unclear on: are the directories
	  you're exporting mountpoints?  That's the normal
	  configuration, and in that case I would've thought the inode
	  for that directory would be pinned in memory so the stat
	  wouldn't have to go to disk.  I'm probably missing something.

	- the filehandle->export mapping that this function tells the
	  kernel about is cached by nfsd for a half-hour.  That time is
	  set a little later in nfsd_fh:

	          qword_printint(f, time(0)+30*60);

	  I don't think there would be any harm to just changing that
	  time(0)+30*60 to 0x7FFFFFFF (== never expire)--nfs-utils
	  should be invalidating that cache explicitly whenever it's
	  needed.  Maybe that should be the default.

--b.



* Re: Delays on "first" access to a NFS mount
  2007-03-07 18:44       ` Simon Peter
  2007-03-07 20:29         ` J. Bruce Fields
@ 2007-03-07 20:31         ` Talpey, Thomas
  2007-03-07 20:50           ` J. Bruce Fields
  1 sibling, 1 reply; 45+ messages in thread
From: Talpey, Thomas @ 2007-03-07 20:31 UTC (permalink / raw)
  To: Simon Peter; +Cc: J. Bruce Fields, nfs

At 01:44 PM 3/7/2007, Simon Peter wrote:
>> Could we cache the stat information in the export and then
>> double-check it if necessary when there's a match?  Or is there some
>> way we could get the kernel to keep that cached for us?
>
>I could certainly cook up a patch for mountd to cache that information
>on its own. I don't have too much clue about how the kernel does its
>caching, though. If it's useful to do that directly in mountd, I could
>get my hands on it.

This sounds like a job for inotify. The mountd could stat the export root
and use inotify_add_watch(2) to keep an eye on it to see if the stat
contents changed. Since the export already has a reference, it doesn't
seem offhand like it would change things much, operationally. Of course,
making mountd depend on an optional facility might be an issue, but it
could always fall back to the current behavior.

You probably don't want to sign up for enhancing the in-kernel export
cache. :-) Let's just say it's a bit mysterious, especially its interaction
with mountd.

Tom.




* Re: Delays on "first" access to a NFS mount
  2007-03-07 20:31         ` Talpey, Thomas
@ 2007-03-07 20:50           ` J. Bruce Fields
  2007-03-07 21:07             ` Talpey, Thomas
                               ` (2 more replies)
  0 siblings, 3 replies; 45+ messages in thread
From: J. Bruce Fields @ 2007-03-07 20:50 UTC (permalink / raw)
  To: Talpey, Thomas; +Cc: nfs, Simon Peter

On Wed, Mar 07, 2007 at 03:31:49PM -0500, Talpey, Thomas wrote:
> At 01:44 PM 3/7/2007, Simon Peter wrote:
> >> Could we cache the stat information in the export and then
> >> double-check it if necessary when there's a match?  Or is there some
> >> way we could get the kernel to keep that cached for us?
> >
> >I could certainly cook up a patch for mountd to cache that information
> >on its own. I don't have too much clue about how the kernel does its
> >caching, though. If it's useful to do that directly in mountd, I could
> >get my hands on it.
> 
> This sounds like a job for inotify. The mountd could stat the export root
> and use inotify_add_watch(2) to keep an eye on it to see if the stat
> contents changed.

Hm.  Would it be enough just to hold an open file descriptor for the
directory?  Is it safe to assume that for any filesystem (uh, any disk
filesystem anyway) that if you have something open then stat() on it
won't have to go to the disk?

> You probably don't want to sign up for enhancing the in-kernel export
> cache. :-) Let's just say it's a bit mysterious, especially its interaction
> with mountd.

I sympathize, though this is actually one of the few mysteries of our
nfs implementation that I feel like I understand, at least on alternate
Thursdays....  What's bugging me a lot these days is I don't understand
well enough why it's the way it is and what might need to be done to
make it better.

--b.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Delays on "first" access to a NFS mount
  2007-03-07 20:50           ` J. Bruce Fields
@ 2007-03-07 21:07             ` Talpey, Thomas
  2007-03-07 21:17               ` J. Bruce Fields
  2007-03-07 21:40             ` Simon Peter
  2007-03-07 22:12             ` Neil Brown
  2 siblings, 1 reply; 45+ messages in thread
From: Talpey, Thomas @ 2007-03-07 21:07 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: nfs, Simon Peter

At 03:50 PM 3/7/2007, J. Bruce Fields wrote:
>> You probably don't want to sign up for enhancing the in-kernel export
>> cache. :-) Let's just say it's a bit mysterious, especially its interaction
>> with mountd.
>
>I sympathize, though this is actually one of the few mysteries of our
>nfs implementation that I feel like I understand, at least on alternate
>Thursdays....  What's bugging me a lot these days is I don't understand
>well enough why it's the way it is and what might need to be done to
>make it better.

I was actually trying to talk Simon out of trying, in case that wasn't
obvious.

But I'm really glad it bugs you! Keep thinking that way, maybe you can
untangle it someday. :-)

While you're thinking about it, what's the actual timeout on a given
in-kernel export cache entry? There's a 120-second deadline on an
unresolved cache miss being populated, but when, exactly, does an
existing (resolved) entry go stale? I admit to having tried to figure it
out once and wound up going in circles.

Tom.



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Delays on "first" access to a NFS mount
  2007-03-07 21:07             ` Talpey, Thomas
@ 2007-03-07 21:17               ` J. Bruce Fields
  2007-03-07 21:23                 ` Talpey, Thomas
  0 siblings, 1 reply; 45+ messages in thread
From: J. Bruce Fields @ 2007-03-07 21:17 UTC (permalink / raw)
  To: Talpey, Thomas; +Cc: nfs, Simon Peter

On Wed, Mar 07, 2007 at 04:07:39PM -0500, Talpey, Thomas wrote:
> I was actually trying to talk Simon out of trying, in case that wasn't
> obvious.

Probably good advice.

> But I'm really glad it bugs you! Keep thinking that way, maybe you can
> untangle it someday. :-)
> 
> While you're thinking about it, what's the actual timeout on a given
> in-kernel export cache entry? There's a 120-second deadline on an
> unresolved cache miss being populated, but when, exactly, does an
> existing (resolved) entry go stale? I admit to having tried to figure it
> out once and wound up going in circles.

There's an expiry time that's passed down with each cache entry.  In
this particular case it's 30 minutes.  There's also a "flush" file you
can write to to ask that the whole cache be flushed.  I don't remember
how this works in detail, though.

--b.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Delays on "first" access to a NFS mount
  2007-03-07 21:17               ` J. Bruce Fields
@ 2007-03-07 21:23                 ` Talpey, Thomas
  2007-03-07 21:54                   ` J. Bruce Fields
  2007-03-07 22:15                   ` Neil Brown
  0 siblings, 2 replies; 45+ messages in thread
From: Talpey, Thomas @ 2007-03-07 21:23 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: nfs, Simon Peter

At 04:17 PM 3/7/2007, J. Bruce Fields wrote:
>There's an expiry time that's passed down with each cache entry.  In
>this particular case it's 30 minutes.  There's also a "flush" file you
>can write to to ask that the whole cache be flushed.  I don't remember
>how this works in detail, though.

Aha - so the time comes from mountd. There's some sort of refresh
timer that the kernel triggers though. So it's not a deadline of this
time (I think). Or is it.

The "flush" file lives in /proc/net/rpc/nfsd.export, and you write an
integer value to it. I *think* it then flushes any entries which are
more than that many seconds old.

The whole thing makes my head hurt and I try not to look at it.

Tom.



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Delays on "first" access to a NFS mount
  2007-03-07 20:50           ` J. Bruce Fields
  2007-03-07 21:07             ` Talpey, Thomas
@ 2007-03-07 21:40             ` Simon Peter
  2007-03-07 22:17               ` Neil Brown
  2007-03-07 22:12             ` Neil Brown
  2 siblings, 1 reply; 45+ messages in thread
From: Simon Peter @ 2007-03-07 21:40 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: nfs, Talpey, Thomas

> > This sounds like a job for inotify. The mountd could stat the
> > export root and use inotify_add_watch(2) to keep an eye on it to
> > see if the stat contents changed.
> Hm.  Would it be enough just to hold an open file descriptor for the
> directory?  Is it safe to assume that for any filesystem (uh, any disk
> filesystem anyway) that if you have something open then stat() on it
> won't have to go to the disk?

I think what Tom had in mind was to stat all directories once, remember
their values, have inotify keep an eye on 'em and whenever they change,
update the remembered values. This way, disk access would only have to
be done whenever something changes, which is when the disk is spun up
anyway.

Simon


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Delays on "first" access to a NFS mount
  2007-03-07 20:29         ` J. Bruce Fields
@ 2007-03-07 21:46           ` Simon Peter
  2007-03-07 22:05             ` J. Bruce Fields
  2007-03-07 22:09           ` Neil Brown
                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 45+ messages in thread
From: Simon Peter @ 2007-03-07 21:46 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: nfs, Talpey, Thomas

> 	  Though there's one point I'm unclear on: are the directories
> 	  you're exporting mountpoints?  That's the normal

Not all of my exported directories are mountpoints of the underlying
VFS of the server. Some are, though.

> 	- the filehandle->export mapping that this function tells the
> 	  kernel about is cached by nfsd for a half-hour.  That time
> 	  is set a little later in nfsd_fh:
> 	  I don't think there would be any harm to just changing that
> 	  time(0)+30*60 to 0x7FFFFFFF (== never expire)--nfs-utils
> 	  should be invalidating that cache explicitly whenever it's
> 	  needed.  Maybe that should be the default.

I could try that for now.

Are you sure these are invalidated automatically, especially through
nfs-utils? If the kernel cache never expires, it should consequently
never ask for it, so nfs-utils would not be involved. Am I missing
something?

Simon


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Delays on "first" access to a NFS mount
  2007-03-07 21:23                 ` Talpey, Thomas
@ 2007-03-07 21:54                   ` J. Bruce Fields
  2007-03-07 22:37                     ` Neil Brown
  2007-03-08 13:27                     ` Olaf Kirch
  2007-03-07 22:15                   ` Neil Brown
  1 sibling, 2 replies; 45+ messages in thread
From: J. Bruce Fields @ 2007-03-07 21:54 UTC (permalink / raw)
  To: Talpey, Thomas; +Cc: nfs, Simon Peter

On Wed, Mar 07, 2007 at 04:23:23PM -0500, Talpey, Thomas wrote:
> At 04:17 PM 3/7/2007, J. Bruce Fields wrote:
> >There's an expiry time that's passed down with each cache entry.  In
> >this particular case it's 30 minutes.  There's also a "flush" file you
> >can write to to ask that the whole cache be flushed.  I don't remember
> >how this works in detail, though.
> 
> Aha - so the time comes from mountd. There's some sort of refresh
> timer that the kernel triggers though. So it's not a deadline of this
> time (I think). Or is it.

The kernel sweeps through the cache every now and then and cleans out
expired entries.  I think it also takes note of the earliest future
expiry it runs across in the process, and uses that to decide when to
check next.  This is all in linux/net/sunrpc/cache.c:cache_clean().

> The "flush" file lives in /proc/net/rpc/nfsd.export, and you write an
> integer value to it. I *think* it then flushes any entries which are
> more than that many seconds old.

Right.

> The whole thing makes my head hurt and I try not to look at it.

Well, non-head-hurty ideas always welcomed.  I've got two export-related
problems to fix:

	- Our current NFSv4 pseudofs fsid=0 hack is a pain to administer
	  and results in inconsistent paths across different NFS
	  versions.

	- The trick of using the pseudoflavor as a client name (so doing

		/export	 gss/krb5(rw)
	
	  instead of

		/export *(sec=krb5,rw)

	  ), is inconsistent with what other os's do, and makes it
	  impossible to specify restrictions based both on flavor and on
	  ip network/dns name/netgroup.

While I'm at it Trond and Christoph and others seem to be asking whether
we can't make some more fundamental changes, such as:

	- Maintaining a static in-kernel exports table instead of
	  loading it on demand from mountd, and

	- divorcing the exports namespace completely from any local
	  process namespace, to the extent that you could even just say
	  "I want to export /dev/sda7 as /usr/local/bin" without first 
	  mounting /dev/sda7 someplace.

But I really need a better idea of the requirements on the exports
system.  And some other examples to look at wouldn't hurt either.  (Take
pity on me, Linux is all I know....)

--b.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Delays on "first" access to a NFS mount
  2007-03-07 21:46           ` Simon Peter
@ 2007-03-07 22:05             ` J. Bruce Fields
  2007-03-07 23:19               ` Simon Peter
  0 siblings, 1 reply; 45+ messages in thread
From: J. Bruce Fields @ 2007-03-07 22:05 UTC (permalink / raw)
  To: Simon Peter; +Cc: nfs, Talpey, Thomas

On Wed, Mar 07, 2007 at 10:46:24PM +0100, Simon Peter wrote:
> > 	  Though there's one point I'm unclear on: are the directories
> > 	  you're exporting mountpoints?  That's the normal
> 
> Not all of my exported directories are mountpoints of the underlying
> VFS of the server.

I'd be curious why.  There's some hard-to-solve security problems
there--people can guess filehandles of unexported files and access them
directly without lookups.  So some day I'd love to actually forbid (or
at least strongly discourage) what you're doing....  But clearly we'd
first need to understand why people do that and make sure there are
adequate alternatives.

> Some are, though.

Are the spinning-up delays happening only on those drives that have
exported directories that aren't mountpoints?

> > 	- the filehandle->export mapping that this function tells the
> > 	  kernel about is cached by nfsd for a half-hour.  That time
> > 	  is set a little later in nfsd_fh:
> > 	  I don't think there would be any harm to just changing that
> > 	  time(0)+30*60 to 0x7FFFFFFF (== never expire)--nfs-utils
> > 	  should be invalidating that cache explicitly whenever it's
> > 	  needed.  Maybe that should be the default.
> 
> I could try that for now.
> 
> Are you sure these are invalidated automatically, especially through
> nfs-utils? If the kernel cache never expires, it should consequently
> never ask for it, so nfs-utils would not be involved. Am I missing
> something?

There's also a mechanism by which nfs-utils can ask for the whole cache
to be flushed immediately on its own.  So re-running exportfs to change
the exports, for example, should result in the cache being flushed.  I
haven't checked whether that's done in all the places it should be, but
it probably is.

--b.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Delays on "first" access to a NFS mount
  2007-03-07 20:29         ` J. Bruce Fields
  2007-03-07 21:46           ` Simon Peter
@ 2007-03-07 22:09           ` Neil Brown
  2007-03-08 15:49           ` Simon Peter
  2007-03-09 13:02           ` Simon Peter
  3 siblings, 0 replies; 45+ messages in thread
From: Neil Brown @ 2007-03-07 22:09 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Talpey, Thomas, nfs, Simon Peter

On Wednesday March 7, bfields@fieldses.org wrote:
> 
> 	- the filehandle->export mapping that this function tells the
> 	  kernel about is cached by nfsd for a half-hour.  That time is
> 	  set a little later in nfsd_fh:
> 
> 	          qword_printint(f, time(0)+30*60);
> 
> 	  I don't think there would be any harm to just changing that
> 	  time(0)+30*60 to 0x7FFFFFFF (== never expire)--nfs-utils
> 	  should be invalidating that cache explicitly whenever it's
> 	  needed.  Maybe that should be the default.

I picked 30 minutes because it seemed like a good number at the time
and there were plenty of other more important things to think about.
On reflection, I agree that never-expire is appropriate for the
fsid->exportpoint cache (which is relevant here) and the
client+exportpoint -> export options cache.
The IP->clientname cache should have an expiry time based on the TTL
from the DNS (assuming that the DNS was used to do part of the
mapping), but that information is not available at all easily... so
30 minutes is probably as good as anything else.

Patches welcome.... :-)

NeilBrown


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Delays on "first" access to a NFS mount
  2007-03-07 20:50           ` J. Bruce Fields
  2007-03-07 21:07             ` Talpey, Thomas
  2007-03-07 21:40             ` Simon Peter
@ 2007-03-07 22:12             ` Neil Brown
  2007-03-07 22:23               ` J. Bruce Fields
  2 siblings, 1 reply; 45+ messages in thread
From: Neil Brown @ 2007-03-07 22:12 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: nfs, Talpey, Thomas, Simon Peter

On Wednesday March 7, bfields@fieldses.org wrote:
> 
> Hm.  Would it be enough just to hold an open file descriptor for the
> directory?  Is it safe to assume that for any filesystem (uh, any disk
> filesystem anyway) that if you have something open then stat() on it
> won't have to go to the disk?

Trouble with holding an open file descriptor is that sometimes people
want to unmount their export filesystems, and an open file descriptor
will stop that.
Currently you can "exportfs -f" and unmount/remount (as long as no
client is actively using it).

NeilBrown


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Delays on "first" access to a NFS mount
  2007-03-07 21:23                 ` Talpey, Thomas
  2007-03-07 21:54                   ` J. Bruce Fields
@ 2007-03-07 22:15                   ` Neil Brown
  1 sibling, 0 replies; 45+ messages in thread
From: Neil Brown @ 2007-03-07 22:15 UTC (permalink / raw)
  To: Talpey, Thomas; +Cc: J. Bruce Fields, nfs, Simon Peter

On Wednesday March 7, Thomas.Talpey@netapp.com wrote:
> 
> The "flush" file lives in /proc/net/rpc/nfsd.export, and you write an
> integer value to it. I *think* it then flushes any entries which are
> more than that many seconds old.

You write a timestamp in seconds-since-epoch and any entries older
than that time are treated as expired.  We typically write out the
mtime of the etab file.  Or '1' to force a full flush.
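
That rule can be modeled in a few lines of C (a toy model of the semantics described above, not the kernel implementation): an entry dies either by passing its own expiry time, or because a flush timestamp postdates its last refresh.

```c
#include <time.h>

/* Toy model: is a cache entry still usable? */
static int toy_entry_valid(time_t last_refresh, time_t expiry,
                           time_t flush_time, time_t now)
{
    if (last_refresh < flush_time)
        return 0;    /* flushed: refreshed before the flush timestamp */
    if (expiry <= now)
        return 0;    /* past its normal expiry */
    return 1;
}
```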

Arguably that sort of fine control isn't needed, but it seemed like a
good idea at the time (it causes problems if your system clock goes
backwards).

NeilBrown


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Delays on "first" access to a NFS mount
  2007-03-07 21:40             ` Simon Peter
@ 2007-03-07 22:17               ` Neil Brown
  2007-03-07 22:36                 ` Talpey, Thomas
  0 siblings, 1 reply; 45+ messages in thread
From: Neil Brown @ 2007-03-07 22:17 UTC (permalink / raw)
  To: Simon Peter; +Cc: J. Bruce Fields, Talpey,  Thomas, nfs

On Wednesday March 7, simon.peter@gmx.de wrote:
> 
> I think what Tom had in mind was to stat all directories once, remember
> their values, have inotify keep an eye on 'em and whenever they change,
> update the remembered values. This way, disk access would only have to
> be done whenever something changes, which is when the disk is spun up
> anyway.

There is certainly some sense in that approach.  I don't think
inotify is needed though.  The only part of the stat information we
are interested in is major/minor/inode numbers, and they don't change.

We could possibly stat everything that seems to be interesting once
and store the details.
When a request comes in, match against the records to find a path,
then double-check the path still matches.  If it does, good.  If not,
take the long way around again.

NeilBrown


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Delays on "first" access to a NFS mount
  2007-03-07 22:12             ` Neil Brown
@ 2007-03-07 22:23               ` J. Bruce Fields
  0 siblings, 0 replies; 45+ messages in thread
From: J. Bruce Fields @ 2007-03-07 22:23 UTC (permalink / raw)
  To: Neil Brown; +Cc: nfs, Talpey, Thomas, Simon Peter

On Thu, Mar 08, 2007 at 09:12:09AM +1100, Neil Brown wrote:
> Trouble with holding an open file descriptor is that sometimes people
> want to unmount their export filesystems, and an open file descriptor
> will stop that.

Yeah, OK.  I was thinking of this as an advantage rather than a
disadvantage, to be perfectly honest.  But no doubt they have some good
reason....

> Currently you can "exportfs -f" and unmount/remount (as long as no
> client is actively using it.

Right, OK.

--b.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Delays on "first" access to a NFS mount
  2007-03-07 22:17               ` Neil Brown
@ 2007-03-07 22:36                 ` Talpey, Thomas
  2007-03-07 22:48                   ` Neil Brown
  0 siblings, 1 reply; 45+ messages in thread
From: Talpey, Thomas @ 2007-03-07 22:36 UTC (permalink / raw)
  To: Neil Brown; +Cc: nfs

At 05:17 PM 3/7/2007, Neil Brown wrote:
>On Wednesday March 7, simon.peter@gmx.de wrote:
>> 
>> I think what Tom had in mind was to stat all directories once, remember
>> their values, have inotify keep an eye on 'em and whenever they change,
>> update the remembered values. This way, disk access would only have to
>> be done whenever something changes, which is when the disk is spun up
>> anyway.
>
>There is certainly some sense in that approach.  I don't think
>inotify is needed though.  The only part of the stat information we
>are interested in is major/minor/inode numbers, and they don't change.

Actually I thought it would be important to also look at the permission
bits and/or any acls, in case access were revoked, e.g. mode 0. The
export needs to be invalidated, basically as mountd would have found
if exporting from scratch. Wouldn't it?

Come to think of it, maybe not. If the export goes bad then the
client gets ESTALE, that's different from revoking access or never
exporting. In that case I think mountd just needs to be sure the
export point doesn't go away due to unmount, or is covered by a
new one.

Tom.



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Delays on "first" access to a NFS mount
  2007-03-07 21:54                   ` J. Bruce Fields
@ 2007-03-07 22:37                     ` Neil Brown
  2007-03-07 23:06                       ` J. Bruce Fields
  2007-03-07 23:24                       ` J. Bruce Fields
  2007-03-08 13:27                     ` Olaf Kirch
  1 sibling, 2 replies; 45+ messages in thread
From: Neil Brown @ 2007-03-07 22:37 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: nfs, Talpey, Thomas, Simon Peter

On Wednesday March 7, bfields@fieldses.org wrote:
> 
> Well, non-head-hurty ideas always welcomed.  I've got two export-related
> problems to fix:
> 
> 	- Our current NFSv4 pseudofs fsid=0 hack is a pain to administer
> 	  and results in inconsistent paths across different NFS
> 	  versions.

You've got to put that v4 pseudo root somewhere...
It just needs cleverness in nfs-utils to auto-bind-mount things into
the pseudoroot...  but I guess people cannot magically unmount things
then.

How about this.  We add an export option "follow-symlinks" so that
when nfsd is asked to stat a symlink it does a 'stat' instead of an
'lstat' (effectively).
Then we get mountd to make a tmpfs in /var/lib/nfs/pseudoroot which
contains directories and symlinks to the various export points names
in etab.  This tmpfs is exported as fsid=0,follow-symlinks.

Problem solved?

Of course if different clients get to see different exports, then we
might need multiple tmpfs's in /var/lib/nfs/pseudoroot/$CLIENT/ ....

> 
> 	- The trick of using the pseudoflavor as a client name (so doing
> 
> 		/export	 gss/krb5(rw)
> 	
> 	  instead of
> 
> 		/export *(sec=krb5,rw)
> 
> 	  ), is inconsistent with what other os's do, and makes it
> 	  impossible to specify restrictions based both on flavor and on
> 	  ip network/dns name/netgroup.

Yeeesssss.  If you are using crypto-security, then the
source-ip-address (which is a terribly weak form of security) should
be irrelevant.  But I think you've convinced me that some people have
valid cases for the combined test.  Grumble Grumble ;-)

> 
> While I'm at it Trond and Christoph and others seem to be asking whether
> we can't make some more fundamental changes, such as:
> 
> 	- Maintaining a static in-kernel exports table instead of
> 	  loading it on demand from mountd, and
> 

Don't like that idea at all.  Demand-loading is a good thing.

> 	- divorcing the exports namespace completely from any local
> 	  process namespace, to the extent that you could even just say
> 	  "I want to export /dev/sda7 as /usr/local/bin" without first 
> 	  mounting /dev/sda7 someplace.

I like that even less.  Much much less.  Way way way less.  Yuck.

Having a private name-space for nfsd and co might be OK, but that is
the closest I could come to the above suggestion, and even then I'm not
convinced.  I think private name spaces are a very powerful tool that
should be used very very carefully.  There is plenty of room for
confusion of the poor sysadmin if you start doing too much with
private name spaces.

If you built a system where every daemon has a private namespace, then
having one for nfsd would be ok, but it really should be a system wide
approach to management, not an approach only used by nfsd.

NeilBrown


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Delays on "first" access to a NFS mount
  2007-03-07 22:36                 ` Talpey, Thomas
@ 2007-03-07 22:48                   ` Neil Brown
  2007-03-07 22:56                     ` Talpey, Thomas
  0 siblings, 1 reply; 45+ messages in thread
From: Neil Brown @ 2007-03-07 22:48 UTC (permalink / raw)
  To: Talpey, Thomas; +Cc: nfs

On Wednesday March 7, Thomas.Talpey@netapp.com wrote:
> At 05:17 PM 3/7/2007, Neil Brown wrote:
> >On Wednesday March 7, simon.peter@gmx.de wrote:
> >> 
> >> I think what Tom had in mind was to stat all directories once, remember
> >> their values, have inotify keep an eye on 'em and whenever they change,
> >> update the remembered values. This way, disk access would only have to
> >> be done whenever something changes, which is when the disk is spun up
> >> anyway.
> >
> >There is certainly some sense in that approach.  I don't think
> >inotify is needed though.  The only part of the stat information we
> >are interested in is major/minor/inode numbers, and they don't change.
> 
> Actually I thought it would be important to also look at the permission
> bits and/or any acls, in case access were revoked, e.g. mode 0. The
> export needs to be invalidated, basically as mountd would have found
> if exporting from scratch. Wouldn't it?

No.  The mode/acl on anything isn't interesting to mountd.  If the
client has access to a file, it gets access, if not: not.  That is all
handled by nfsd.
And remember that if you "chmod 0" a directory, that doesn't remove
access from people with files in the directory already open.


> 
> Come to think of it, maybe not. If the export goes bad then the
> client gets ESTALE, that's different from revoking access or never
> exporting. In that case I think mountd just needs to be sure the
> export point doesn't go away due to unmount, or is covered by a
> new one.

Mounting on top of an export point is an odd case that is not
explicitly handled.  Currently if you do that, the clients won't
notice until after 32 minutes of inactivity.
I'm not sure it is a case that is worth any effort to deal with
'sensibly'.

NeilBrown


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Delays on "first" access to a NFS mount
  2007-03-07 22:48                   ` Neil Brown
@ 2007-03-07 22:56                     ` Talpey, Thomas
  0 siblings, 0 replies; 45+ messages in thread
From: Talpey, Thomas @ 2007-03-07 22:56 UTC (permalink / raw)
  To: Neil Brown; +Cc: nfs

At 05:48 PM 3/7/2007, Neil Brown wrote:
>No.  The mode/acl on anything isn't interesting to mountd.  If the
>client has access to a file, it gets access, if not: not.  That is all
>handled by nfsd.
>And remember that if you "chmod 0" a directory, that doesn't remove
>access from people with files in the directory already open.

Well, if mountd isn't running as root and the export point is mode 0,
then it can't be exported because the daemon can't stat it, right?

Corner case, I guess, maybe mountd can't be non-root. Agreed on
the revoke-when-open, of course.
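(One nuance worth noting: stat(2) doesn't check the permission bits of the final path component itself, only search permission on the directories leading to it, so even a non-root mountd can stat a mode-0 export point; it just can't look inside it. A quick sketch:)

```python
import os
import stat
import tempfile

# stat() needs search (x) permission on the parents of a path,
# but no permission bits at all on the final component.
parent = tempfile.mkdtemp()
export = os.path.join(parent, "export")
os.mkdir(export)
os.chmod(export, 0)                          # mode 0000

mode = stat.S_IMODE(os.stat(export).st_mode)  # still succeeds
print(oct(mode))                              # prints "0o0"

os.rmdir(export)
os.rmdir(parent)
```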

Tom.



* Re: Delays on "first" access to a NFS mount
  2007-03-07 22:37                     ` Neil Brown
@ 2007-03-07 23:06                       ` J. Bruce Fields
  2007-03-07 23:39                         ` Neil Brown
  2007-03-16 21:47                         ` Christoph Hellwig
  2007-03-07 23:24                       ` J. Bruce Fields
  1 sibling, 2 replies; 45+ messages in thread
From: J. Bruce Fields @ 2007-03-07 23:06 UTC (permalink / raw)
  To: Neil Brown; +Cc: Christoph Hellwig, nfs, Talpey,  Thomas, Simon Peter

On Thu, Mar 08, 2007 at 09:37:18AM +1100, Neil Brown wrote:
> On Wednesday March 7, bfields@fieldses.org wrote:
> > While I'm at it Trond and Christoph and others seem to be asking whether
> > we can't make some more fundamental changes, such as:
> > 
> > 	- Maintaining a static in-kernel exports table instead of
> > 	  loading it on demand from mountd, and
> > 
> 
> Don't like that idea at all.  Demand-loading is a good thing.

Depending on why you need it, this may be inconsistent with the nfsv4
pseudofilesystem construction; we need to be able to service readdir on
the pseudofilesystem root, for example, which means knowing at least the
list of paths.

So could you remind me what the use cases are here?  Who is it that
requires demand loading, and why?

I'll promise to write it all down someplace and then hopefully we won't
have to re-ask the same questions too many times....

> > 	- divorcing the exports namespace completely from any local
> > 	  process namespace, to the extent that you could even just say
> > 	  "I want to export /dev/sda7 as /usr/local/bin" without first 
> > 	  mounting /dev/sda7 someplace.
> 
> I like that even less.  Much much less.  Way way way less.  Yuck.
> 
> Having a private name-space for nfsd and co might be OK, but that is
> the closest I could come to the above suggestion, and even then I'm not
> convinced.  I think private name spaces are a very powerful tool that
> should be used very very carefully.  There is plenty of room for
> confusion of the poor sysadmin if you start doing too much with
> private name spaces.
> 
> If you built a system where every daemon has a private namespace, then
> having one for nfsd would be ok, but it really should be a system wide
> approach to management, not an approach only used by nfsd.

This is Christoph's suggestion, so I'm cc:'ing him.

I can believe that it would be convenient for sysadmins to be able to
export filesystems at paths other than the paths they're locally mounted
at.  And once you're going to allow that, it seems cleanest just to let
the exports tree be a purely in-kernel thing.  But this also doesn't
seem like a hard requirement to me.

--b.


* Re: Delays on "first" access to a NFS mount
  2007-03-07 22:05             ` J. Bruce Fields
@ 2007-03-07 23:19               ` Simon Peter
  0 siblings, 0 replies; 45+ messages in thread
From: Simon Peter @ 2007-03-07 23:19 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: nfs, Talpey, Thomas

> > Not all of my exported directories are mountpoints of the underlying
> > VFS of the server.
> I'd be curious why.  There's some hard-to-solve security problems
> there--people can guess filehandles of unexported files and access
> them directly without lookups.  So some day I'd love to actually
> forbid (or at least strongly discourage) what you're doing....  But
> clearly we'd first need to understand why people do that and make
> sure there are adequate alternatives.

Well, I've actually done it for security (not knowing what you just
said about it). There are some directories on those disks that I don't
want people to poke around in, so I don't export the whole filesystem
of a disk. For some other directories, I have different access
constraints.

For example, there's one subdirectory that I export to two subnets and
one that is only exported to one of them. I do that because I follow a
default-deny security philosophy: all access is denied at first, and I
grant access only to those people who can actually make use of the
resource. Since one of those directories is only useful to the users
of that one subnet, I only export it to that one.

> > Some are, though.
> Are the spinning-up delays happening only on those drives that have
> exported directories that aren't mountpoints?

I just noticed that I was wrong. No exports are on mountpoints. I'm
sorry.

> > Are you sure these are invalidated automatically, especially through
> > nfs-utils? If the kernel cache never expires, it should consequently
> > never ask for it, so nfs-utils would not be involved. Am I missing
> > something?
> There's also a mechanism by which nfs-utils can ask for the whole
> cache to be flushed immediately on its own.  So re-running exportfs
> to change the exports, for example, should result in the cache being
> flushed.  I haven't checked whether that's done in all the places it
> should be, but it probably is.

Okay. So if we really only need major, minor and inode information,
like Neil said, then that would work. Because otherwise the data on
disk could change without the kernel noticing.
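For reference, that explicit-flush mechanism is just a timestamp written into each cache's flush file under /proc/net/rpc; entries older than the written time are discarded. A hedged sketch (the cache names are assumed from the nfsd caches of this era, and the function is a no-op where they don't exist):

```python
import os
import time

def flush_nfsd_caches(rpc_dir="/proc/net/rpc"):
    """Ask the kernel to drop export-related cache entries by writing
    the current time to each cache's 'flush' file, roughly what
    exportfs does after the exports table changes."""
    now = str(int(time.time()))
    flushed = []
    for cache in ("auth.unix.ip", "nfsd.export", "nfsd.fh"):
        path = os.path.join(rpc_dir, cache, "flush")
        if os.path.exists(path):      # only present while nfsd is loaded
            with open(path, "w") as f:
                f.write(now)
            flushed.append(cache)
    return flushed

print(flush_nfsd_caches())   # [] unless the nfsd caches are present
```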

Simon


* Re: Delays on "first" access to a NFS mount
  2007-03-07 22:37                     ` Neil Brown
  2007-03-07 23:06                       ` J. Bruce Fields
@ 2007-03-07 23:24                       ` J. Bruce Fields
  2007-03-07 23:51                         ` Neil Brown
  1 sibling, 1 reply; 45+ messages in thread
From: J. Bruce Fields @ 2007-03-07 23:24 UTC (permalink / raw)
  To: Neil Brown; +Cc: Talpey, Thomas, nfs, Simon Peter

On Thu, Mar 08, 2007 at 09:37:18AM +1100, Neil Brown wrote:
> On Wednesday March 7, bfields@fieldses.org wrote:
> > 
> > Well, non-head-hurty ideas always welcomed.  I've got two export-related
> > problems to fix:
> > 
> > 	- Our current NFSv4 pseudofs fsid=0 hack is a pain to administer
> > 	  and results in inconsistent paths across different NFS
> > 	  versions.
> 
> You've got to put that v4 pseudo root somewhere...
> It just needs cleverness in nfs-utils to auto-bind-mount things into
> the pseudoroot...

I've got make-mountd-clever patches here; if I can get them working in
the next couple days then I'll pass them along so people can see what
they're doing.

> but I guess people cannot magically unmount things then.
>
> How about this.  We add an export option "follow-symlinks" so that
> when nfsd is asked to stat a symlink it does a 'stat' instead of an
> 'lstat' (effectively).
> Then we get mountd to make a tmpfs in /var/lib/nfs/pseudoroot which
> contains directories and symlinks to the various export points names
> in etab.  This tmpfs is exported as fsid=0,follow-symlinks.
> 
> Problem solved?

Maybe.  Getting those symlink/mountpoints right sounds tricky.

How different is this from the in-kernel automounting that the NFS
client is using for fsid traversal, for example?

> Of course if different clients get to see different exports, then we
> might need multiple tmpfs's in /var/lib/nfs/pseudoroot/$CLIENT/ ....

Ugh.  Is this something a lot of people do?

--b.


* Re: Delays on "first" access to a NFS mount
  2007-03-07 23:06                       ` J. Bruce Fields
@ 2007-03-07 23:39                         ` Neil Brown
  2007-03-08  5:14                           ` J. Bruce Fields
  2007-03-16 21:47                         ` Christoph Hellwig
  1 sibling, 1 reply; 45+ messages in thread
From: Neil Brown @ 2007-03-07 23:39 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Christoph Hellwig, nfs, Talpey,  Thomas, Simon Peter

On Wednesday March 7, bfields@fieldses.org wrote:
> 
> So could you remind me what the use cases are here?  Who is it that
> requires demand loading, and why?

Partly it is the principle that demand-based configuration is more
flexible.  Witness the various efforts to replace rc.d scripts with
something event/demand based.

The IP->clientname table must be demand loaded because you obviously
cannot know all needed IP addresses in advance. (The rmtab experience
proves that)

The clientname+path->export-options table must be demand loaded
because - depending a bit on how you choose client names and how
complicated /etc/exports is - you either don't know all client names
in advance, or computing them all is complex and wasteful.

The fsid->path table could possibly be made 'static', but I think
demand-loading is still best.  There are multiple possible fsids for
some filesystems, and telling the kernel about all of them when only
one will be used seems wasteful.  And the filesystems may not all be
available when you try to create the static table.  You could update
the table at every mount, but with demand-loading, you don't have to.

Imagine having hundreds of filesystems on some sort of library (a CD
library?) where each can be identified by a UUID which gets stored in
the fsid in the filehandle.
Imagine a simple extension to mountd so that a call-out were made when
an unknown filehandle arrived.  This callout could mount the required
filesystem and export it.  Maybe the library only allows 3 filesystems
to be mounted at a time, so it would unmount the least-recently-used
one.

How are you going to handle that system except with demand-loading of
the fsid->path table?
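The shape of the mechanism being defended here can be sketched in a few lines (hypothetical names; the real channel is an upcall through the cache files under /proc/net/rpc, not a Python callback):

```python
import time

class DemandCache:
    """Toy model of a kernel demand-loaded table: a lookup miss
    triggers an "upcall" to a userspace resolver (standing in for
    mountd), whose answer is cached with an expiry time."""

    def __init__(self, resolver, ttl=30 * 60):
        self.resolver = resolver
        self.ttl = ttl
        self.entries = {}            # key -> (value, expiry)
        self.upcalls = 0

    def lookup(self, key):
        hit = self.entries.get(key)
        if hit is not None and hit[1] > time.time():
            return hit[0]
        self.upcalls += 1            # miss: ask userspace
        value = self.resolver(key)
        self.entries[key] = (value, time.time() + self.ttl)
        return value

    def flush(self):
        self.entries.clear()         # what exportfs triggers on changes

# fsid -> path, resolved only for fsids that actually arrive in filehandles
fsid_to_path = DemandCache(lambda fsid: "/exports/vol%d" % fsid)
print(fsid_to_path.lookup(7))        # prints "/exports/vol7"
print(fsid_to_path.upcalls)          # prints "1"
```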

> 
> I'll promise to write it all down someplace and then hopefully we won't
> have to re-ask the same questions too many times....

Sounds like a fine idea.
I have often wanted to write a 'Linux commentary' that explains all
the hows and whys of things.  I even started some bits once (to help
me understand the VFS layer).  But Linux changes so fast that any
entry in such a commentary would be out-of-date before it was
written....

NeilBrown


* Re: Delays on "first" access to a NFS mount
  2007-03-07 23:24                       ` J. Bruce Fields
@ 2007-03-07 23:51                         ` Neil Brown
  2007-03-08  4:36                           ` J. Bruce Fields
  0 siblings, 1 reply; 45+ messages in thread
From: Neil Brown @ 2007-03-07 23:51 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: nfs, Talpey, Thomas, Simon Peter

On Wednesday March 7, bfields@fieldses.org wrote:
> On Thu, Mar 08, 2007 at 09:37:18AM +1100, Neil Brown wrote:
> > On Wednesday March 7, bfields@fieldses.org wrote:
> > > 
> > > Well, non-head-hurty ideas always welcomed.  I've got two export-related
> > > problems to fix:
> > > 
> > > 	- Our current NFSv4 pseudofs fsid=0 hack is a pain to administer
> > > 	  and results in inconsistent paths across different NFS
> > > 	  versions.
> > 
> > You've got to put that v4 pseudo root somewhere...
> > It just needs cleverness in nfs-utils to auto-bind-mount things into
> > the pseudoroot...
> 
> I've got make-mountd-clever patches here; if I can get them working in
> the next couple days then I'll pass them along so people can see what
> they're doing.

Cool.

> 
> > but I guess people cannot magically unmount things then.
> >
> > How about this.  We add an export option "follow-symlinks" so that
> > when nfsd is asked to stat a symlink it does a 'stat' instead of an
> > 'lstat' (effectively).
> > Then we get mountd to make a tmpfs in /var/lib/nfs/pseudoroot which
> > contains directories and symlinks to the various export points named
> > in etab.  This tmpfs is exported as fsid=0,follow-symlinks.
> > 
> > Problem solved?
> 
> Maybe.  Getting those symlink/mountpoints right sounds tricky.

Is it?
 mount -t tmpfs tmpfs /var/lib/nfs/pseudoroot
 # first field of each export line is the path; ignore the options
 grep '^/' /etc/exports | while read a b
 do
   d=`dirname "$a"`
   mkdir -p "/var/lib/nfs/pseudoroot$d"
   ln -s "$a" "/var/lib/nfs/pseudoroot$a"
 done
(untested, and probably has some corner cases but the essence is
 there).


> 
> How different is this from the in-kernel automounting that the NFS
> client is using for fsid traversal, for example?
> 
> > Of course if different clients get to see different exports, then we
> > might need multiple tmpfs's in /var/lib/nfs/pseudoroot/$CLIENT/ ....
> 
> Ugh.  Is this something a lot of people do?

Well, if you export a different root filesystem to each diskless
client, or if you export /home to some places and /backup to others,
then you already have the potential for a different pseudo filesystem
for each client.  My little hacky shell script above essentially
merges them all which might be OK, or might not.

Suppose you wanted to allow every diskless client to see its root as
'/'?  Is that a dumb thing to do, or just a difficult thing to do?

I was just looking at the RFC again and saw this in section 7.3:

   Based on the construction of the server's name space, it is possible
   that multiple pseudo filesystems may exist.  For example,

   /a         pseudo filesystem
   /a/b       real filesystem
   /a/b/c     pseudo filesystem
   /a/b/c/d   real filesystem

   Each of the pseudo filesystems are considered separate entities and
   therefore will have a unique fsid.

That adds a whole new dimension of complexity..... do we really want
to go there?

NeilBrown


* Re: Delays on "first" access to a NFS mount
  2007-03-07 23:51                         ` Neil Brown
@ 2007-03-08  4:36                           ` J. Bruce Fields
  0 siblings, 0 replies; 45+ messages in thread
From: J. Bruce Fields @ 2007-03-08  4:36 UTC (permalink / raw)
  To: Neil Brown; +Cc: nfs, Talpey, Thomas, Simon Peter

On Thu, Mar 08, 2007 at 10:51:04AM +1100, Neil Brown wrote:
> Well, if you export a different root filesystem to each diskless
> client, or if you export /home to some places and /backup to others,
> then you already have the potential for a different pseudo filesystem
> for each client.  My little hacky shell script above essentially
> merges them all which might be OK, or might not.

I think we want a single export tree, parts of which clients may or may
not have permission to see, rather than try to support per-client
namespaces.  If nothing else, secinfo becomes a bit weird otherwise (do
you tell the client that it can use auth_unix security for this
filesystem, if actually using auth_unix might result in the client
seeing a *different* filesystem?).

> I was just looking at the RFC again and saw this in section 7.3:
> 
>    Based on the construction of the server's name space, it is possible
>    that multiple pseudo filesystems may exist.  For example,
> 
>    /a         pseudo filesystem
>    /a/b       real filesystem
>    /a/b/c     pseudo filesystem
>    /a/b/c/d   real filesystem
> 
>    Each of the pseudo filesystems are considered separate entities and
>    therefore will have a unique fsid.

Hah.

> That adds a whole new dimension of complexity..... do we really want
> to go there?

We certainly aren't required to support every weird configuration that
the RFC allows.

--b.


* Re: Delays on "first" access to a NFS mount
  2007-03-07 23:39                         ` Neil Brown
@ 2007-03-08  5:14                           ` J. Bruce Fields
  2007-03-08  5:42                             ` Neil Brown
  2007-03-08 13:43                             ` Olaf Kirch
  0 siblings, 2 replies; 45+ messages in thread
From: J. Bruce Fields @ 2007-03-08  5:14 UTC (permalink / raw)
  To: Neil Brown; +Cc: Christoph Hellwig, nfs, Talpey,  Thomas, Simon Peter

On Thu, Mar 08, 2007 at 10:39:23AM +1100, Neil Brown wrote:
> Imagine having hundreds of filesystems on some sort of library (a CD
> library?) where each can be identified by a UUID which gets stored in
> the fsid in the filehandle.
> Imagine a simple extension to mountd so that a call-out were made when
> an unknown filehandle arrived.  This callout could mount the required
> filesystem and export it.  Maybe the library only allows 3 filesystems
> to be mounted at a time, so it would unmount the least-recently-used
> one.

Maybe.  Is this practical?  Do we know of any cases of users doing this?
Do you block forever if you try to access 4 filesystems at once?  I
dunno....

> I have often wanted to write a 'Linux commentary' that explains all
> the hows and whys of things.  I even started some bits once (to help
> me understand the VFS layer).  But Linux changes so fast that any
> entry in such a commentary would be out-of-date before it was
> written....

And there's a lot to document.  I mean, look:

	http://www.oreilly.com/catalog/understandlni/

Just over a thousand pages, just covering the kernel networking code
(and only some of it at that).  Maybe they just lost the forest for the
trees, but still, yipes.

--b.


* Re: Delays on "first" access to a NFS mount
  2007-03-08  5:14                           ` J. Bruce Fields
@ 2007-03-08  5:42                             ` Neil Brown
  2007-03-08 13:43                             ` Olaf Kirch
  1 sibling, 0 replies; 45+ messages in thread
From: Neil Brown @ 2007-03-08  5:42 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Christoph Hellwig, nfs, Talpey,  Thomas, Simon Peter

On Thursday March 8, bfields@fieldses.org wrote:
> On Thu, Mar 08, 2007 at 10:39:23AM +1100, Neil Brown wrote:
> > Imagine having hundreds of filesystems on some sort of library (a CD
> > library?) where each can be identified by a UUID which gets stored in
> > the fsid in the filehandle.
> > Imagine a simple extension to mountd so that a call-out were made when
> > an unknown filehandle arrived.  This callout could mount the required
> > filesystem and export it.  Maybe the library only allows 3 filesystems
> > to be mounted at a time, so it would unmount the least-recently-used
> > one.
> 
> Maybe.  Is this practical?  Do we know of any cases of users doing this?

Someone once mentioned doing something vaguely like this... I don't
know the exact details.  It was mentioned in the context of the fsid
being too small to do a really good job.

I wonder if you can export autofs to get them mounted in the first
place..... 

> Do you block forever if you try to access 4 filesystems at once?  I
> dunno....

round-robin?  Sending lots of EINPROGRESS for one filesystem while the
others have a turn.


> 
> > I have often wanted to write a 'Linux commentary' that explains all
> > the hows and whys of things.  I even started some bits once (to help
> > me understand the VFS layer).  But Linux changes so fast that any
> > entry in such a commentary would be out-of-date before it was
> > written....
> 
> And there's a lot to document.  I mean, look:
> 
> 	http://www.oreilly.com/catalog/understandlni/
> 
> Just over a thousand pages, just covering the kernel networking code
> (and only some of it at that).  Maybe they just lost the forest for the
> trees, but still, yipes.

Is there really a market for that sort of book?  Maybe we should write
"Understanding Linux NFS" in our spare time .... oh wait, we don't
have any spare time.  Maybe in a couple of decades when I retire:-)

NeilBrown


* Re: Delays on "first" access to a NFS mount
  2007-03-07 21:54                   ` J. Bruce Fields
  2007-03-07 22:37                     ` Neil Brown
@ 2007-03-08 13:27                     ` Olaf Kirch
  2007-03-08 21:46                       ` J. Bruce Fields
  1 sibling, 1 reply; 45+ messages in thread
From: Olaf Kirch @ 2007-03-08 13:27 UTC (permalink / raw)
  To: nfs

On Wednesday 07 March 2007 22:54, J. Bruce Fields wrote:
> 	- Maintaining a static in-kernel exports table instead of
> 	  loading it on demand from mountd, and

Well, the original implementation did just that, and people kept
forgetting to re-run exportfs after changing the exports table,
and whatnot. Lots of gross inconsistencies. The addition of a
dynamic exports table was considered a sliced bread kind of
innovation... so it does feel like time warp when we talk about
a static export table now.

> 	- divorcing the exports namespace completely from any local
> 	  process namespace, to the extent that you could even just say
> 	  "I want to export /dev/sda7 as /usr/local/bin" without first
> 	  mounting /dev/sda7 someplace.

Is that really a desirable goal? From an admin's point of view,
file names are usually more "natural" than using fs uuids or
retro stuff such as device file names (the udev people would
actually hit you with their "device numbers are smoke and mirrors"
bat now).

Users actually want things like "I export /mnt and then clients
can see the contents of the CD mounted on /mnt/cdrom"

Olaf
-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
okir@lst.de |    / | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax


* Re: Delays on "first" access to a NFS mount
  2007-03-08  5:14                           ` J. Bruce Fields
  2007-03-08  5:42                             ` Neil Brown
@ 2007-03-08 13:43                             ` Olaf Kirch
  2007-03-08 21:27                               ` J. Bruce Fields
  1 sibling, 1 reply; 45+ messages in thread
From: Olaf Kirch @ 2007-03-08 13:43 UTC (permalink / raw)
  To: nfs
  Cc: J. Bruce Fields, Neil Brown, Simon Peter, Talpey,  Thomas,
	Christoph Hellwig

On Thursday 08 March 2007 06:14, J. Bruce Fields wrote:
> Maybe.  Is this practical?  Do we know of any cases of users doing this?
> Do you block forever if you try to access 4 filesystems at once?  I
> dunno....

IIRC SGI had a storage appliance a while back that included a tape robot,
but it was hiding the details somewhere deep inside XFS. I remember seeing
patches involving nfsd and dmapi (I can see you cringe, Christoph :-)

Note that in real-life scenarios, we're sometimes talking about literally
thousands of exported file systems. My previous employer has a customer with
such a setup, using NetApp filers. We had some trouble getting the Linux
client to survive in this environment, as it ran out of privileged ports
way too quickly. Absurd as it may sound, this kind of setup seems to be
the trend.

Now think about handling a system with several thousand exported
file systems on the server side - if you need to look at each file system
before nfsd is ready to service requests, we're talking about a considerable
delay in boot time. In the worst case we're talking about several thousand
*disks* that need to be spun up, and fuses going pop-pop-pop.

Short summary - if you want to scale beyond small work group servers,
you need something that scales well. Demand loading the exports
table does.

Olaf
-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
okir@lst.de |    / | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax


* Re: Delays on "first" access to a NFS mount
  2007-03-07 20:29         ` J. Bruce Fields
  2007-03-07 21:46           ` Simon Peter
  2007-03-07 22:09           ` Neil Brown
@ 2007-03-08 15:49           ` Simon Peter
  2007-03-09 13:02           ` Simon Peter
  3 siblings, 0 replies; 45+ messages in thread
From: Simon Peter @ 2007-03-08 15:49 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: nfs, Talpey, Thomas

> 	- the filehandle->export mapping that this function tells the
> 	  kernel about is cached by nfsd for a half-hour.  That time
> 	  is set a little later in nfsd_fh:
> 	          qword_printint(f, time(0)+30*60);
> 	  I don't think there would be any harm to just changing that
> 	  time(0)+30*60 to 0x7FFFFFFF (== never expire)--nfs-utils
> 	  should be invalidating that cache explicitly whenever it's
> 	  needed.  Maybe that should be the default.

I've done so and it seems to work! Been using the changed version the
whole day.

Simon


* Re: Delays on "first" access to a NFS mount
  2007-03-08 13:43                             ` Olaf Kirch
@ 2007-03-08 21:27                               ` J. Bruce Fields
  2007-03-09 15:02                                 ` Olaf Kirch
  0 siblings, 1 reply; 45+ messages in thread
From: J. Bruce Fields @ 2007-03-08 21:27 UTC (permalink / raw)
  To: Olaf Kirch
  Cc: Neil Brown, Talpey, Thomas, nfs, Christoph Hellwig, Simon Peter

On Thu, Mar 08, 2007 at 02:43:16PM +0100, Olaf Kirch wrote:
> Now think about handling a system with several thousand exported
> file systems on the server side - if you need to look at each file system
> before nfsd is ready to service requests, we're talking of a considerable
> delay in boot time. In the worst case we're talking about several thousand
> *disks* that need to be spun up, and fuses going pop-pop-pop.
>
> Short summary - if you want to scale beyond small work group servers,
> you need something that scales well. Demand loading the exports
> table does.

There's some confusion here--the reason this was happening is that
we map filehandles to exports by stat()ing the root of every
exported filesystem.  That may be an obstacle to handling large numbers
of exports, but it's not really related to the demand-loading question.

So why does demand-loading scale better?  Is the worry just the kernel
memory required to store the export table for thousands of mostly
inactive exports?

So you need the mountpoints for the exported filesystems, the export
options, and the name of the client(s).  If that adds up to a few K,
thousands would add up to a few Megs.

Are there other problems?  (E.g. does the VFS handle thousands of mounts
well?)

--b.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Delays on "first" access to a NFS mount
  2007-03-08 13:27                     ` Olaf Kirch
@ 2007-03-08 21:46                       ` J. Bruce Fields
  0 siblings, 0 replies; 45+ messages in thread
From: J. Bruce Fields @ 2007-03-08 21:46 UTC (permalink / raw)
  To: Olaf Kirch; +Cc: nfs

On Thu, Mar 08, 2007 at 02:27:08PM +0100, Olaf Kirch wrote:
> On Wednesday 07 March 2007 22:54, J. Bruce Fields wrote:
> > 	- Maintaining a static in-kernel exports table instead of
> > 	  loading it on demand from mountd, and
> 
> Well, the original implementation did just that, and people kept
> forgetting to re-run exportfs after changing the exports table,

What exactly changed?  It's not the case today that you can expect
modifications to /etc/exports to be noticed automatically.

--b.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Delays on "first" access to a NFS mount
  2007-03-07 20:29         ` J. Bruce Fields
                             ` (2 preceding siblings ...)
  2007-03-08 15:49           ` Simon Peter
@ 2007-03-09 13:02           ` Simon Peter
  2007-03-09 14:59             ` J. Bruce Fields
  3 siblings, 1 reply; 45+ messages in thread
From: Simon Peter @ 2007-03-09 13:02 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: nfs, Talpey, Thomas

> 	- the filehandle->export mapping that this function tells the
> 	  kernel about is cached by nfsd for a half-hour.  That time
> is set a little later in nfsd_fh:
> 	          qword_printint(f, time(0)+30*60);
> 	  I don't think there would be any harm to just changing that
> 	  time(0)+30*60 to 0x7FFFFFFF (== never expire)--nfs-utils
> 	  should be invalidating that cache explicitly whenever it's
> 	  needed.  Maybe that should be the default.

Is there any way that we could see this change incorporated into
nfs-utils? I certainly would like to have it.

Simon

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Delays on "first" access to a NFS mount
  2007-03-09 13:02           ` Simon Peter
@ 2007-03-09 14:59             ` J. Bruce Fields
  0 siblings, 0 replies; 45+ messages in thread
From: J. Bruce Fields @ 2007-03-09 14:59 UTC (permalink / raw)
  To: Simon Peter; +Cc: Talpey, Thomas, nfs

On Fri, Mar 09, 2007 at 02:02:37PM +0100, Simon Peter wrote:
> > 	- the filehandle->export mapping that this function tells the
> > 	  kernel about is cached by nfsd for a half-hour.  That time
> > is set a little later in nfsd_fh:
> > 	          qword_printint(f, time(0)+30*60);
> > 	  I don't think there would be any harm to just changing that
> > 	  time(0)+30*60 to 0x7FFFFFFF (== never expire)--nfs-utils
> > 	  should be invalidating that cache explicitly whenever it's
> > 	  needed.  Maybe that should be the default.
> 
> Is there any way that we could see this change incorporated into
> nfs-utils? I certainly would like to have it.

Make a diff showing that change (and nothing else), put an explanation
of the change and why it's correct at the top, and mail it to Neil,
cc'd to this list?...

--b.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Delays on "first" access to a NFS mount
  2007-03-08 21:27                               ` J. Bruce Fields
@ 2007-03-09 15:02                                 ` Olaf Kirch
  0 siblings, 0 replies; 45+ messages in thread
From: Olaf Kirch @ 2007-03-09 15:02 UTC (permalink / raw)
  To: nfs
  Cc: J. Bruce Fields, Neil Brown, Simon Peter, Talpey,  Thomas,
	Christoph Hellwig

On Thursday 08 March 2007 22:27, J. Bruce Fields wrote:
> There's some confusion here--the reason that this was happening was that
> we're mapping filehandles to exports by stat()ing the root of every
> exported filesystem.  That may be an obstacle to handling large numbers
> of exports, but it's not really related to the demand-loading question.

Sorry if I was expressing myself poorly. What I was driving at was that
it makes sense to not mount all file systems prior to starting the
NFS server.

> So why does demand-loading scale better?  Is the worry just the kernel
> memory required to store the export table for thousands of mostly
> inactive exports?

It means you can start serving files without having to wait for all
file systems to be mounted (and having their journals replayed, etc.).
All you need is a way for mountd to figure out whether a file system
is there already (so we can push the rootfh into the kernel) or not
(so nfsd can return EJUKEBOX or defer the request).

Olaf
-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
okir@lst.de |    / | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Delays on "first" access to a NFS mount
  2007-03-07 23:06                       ` J. Bruce Fields
  2007-03-07 23:39                         ` Neil Brown
@ 2007-03-16 21:47                         ` Christoph Hellwig
  2007-03-16 21:54                           ` J. Bruce Fields
  1 sibling, 1 reply; 45+ messages in thread
From: Christoph Hellwig @ 2007-03-16 21:47 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Neil Brown, Christoph Hellwig, Talpey,  Thomas, nfs, Simon Peter

First I'd like to apologize for jumping in a little late, but I have
been very busy at work the last few weeks.

I'll try to put my responses to most of the concerns here into one
lengthy mail, because there are a lot of different things being talked
about that are all intertwined.

There's a lot about export handling that can be static vs. on-demand,
kernel vs. userspace, etc.


 - with NFSv4 there's a clear pseudo-filesystem structure that really
   should be represented using the kernel mount tables, or rather as
   a namespace reusing the infrastructure we have
 - for NFSv2/3 that's not as important, but since we have to have
   this namespace magic for NFSv4 anyway, it would be nice to reuse
   it for NFSv3 as far as possible, e.g. serving older NFS clients
   from the same namespace as NFSv4 clients, even if only the
   actually defined mountpoints are visible, without the pseudo root,
   instead of the full hierarchy
 - once we're talking about mounting filesystems into specific places
   we really should reuse all the existing excellent kernel code that
   deals with this.  Given that for NFSv4 we most likely want to
   represent a different tree than the normal local filesystem view,
   that means a separate namespace.  Note that we should have a way
   for the administrator to easily see and modify this namespace, to
   avoid causing too much trouble.
 - using namespace and the kernel mount code means we need to have
   at least enough information on where these mount points are in
   kernel space.
 - it does however not mean that we need to set up all these mount
   points at boot time!  We have a nice scheme that creates mounts
   via a special follow_link operation, which is for example used in
   the NFS client for submounts, or we could have some sort of
   automounter that does userspace upcalls.

Does this give a better view of where I want to go?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Delays on "first" access to a NFS mount
  2007-03-16 21:47                         ` Christoph Hellwig
@ 2007-03-16 21:54                           ` J. Bruce Fields
  2007-03-16 21:57                             ` Christoph Hellwig
  0 siblings, 1 reply; 45+ messages in thread
From: J. Bruce Fields @ 2007-03-16 21:54 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Neil Brown, Talpey, Thomas, nfs, Simon Peter

On Fri, Mar 16, 2007 at 09:47:38PM +0000, Christoph Hellwig wrote:
>  - using namespace and the kernel mount code means we need to have
>    at least enough information on where these mount points are in
>    kernel space.
>  - it does however not mean that we need to set up all these mount
>    points at boot time!  We have a nice scheme to creates mounts
>    when a special follow_link operation that is for example used
>    in the nfs client for submount, or we could have some sort
>    of automounter that does userspace upcalls.

Note that clients may access filesystems by traversing mountpoints, but
they may also just jump in with a filehandle they got on a previous
boot.  So we still need to be able to look up exports by filehandle as
well as by path.

--b.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Delays on "first" access to a NFS mount
  2007-03-16 21:54                           ` J. Bruce Fields
@ 2007-03-16 21:57                             ` Christoph Hellwig
  0 siblings, 0 replies; 45+ messages in thread
From: Christoph Hellwig @ 2007-03-16 21:57 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Christoph Hellwig, Neil Brown, Talpey,  Thomas, nfs, Simon Peter

On Fri, Mar 16, 2007 at 05:54:29PM -0400, J. Bruce Fields wrote:
> On Fri, Mar 16, 2007 at 09:47:38PM +0000, Christoph Hellwig wrote:
> >  - using namespace and the kernel mount code means we need to have
> >    at least enough information on where these mount points are in
> >    kernel space.
> >  - it does however not mean that we need to set up all these mount
> >    points at boot time!  We have a nice scheme to creates mounts
> >    when a special follow_link operation that is for example used
> >    in the nfs client for submount, or we could have some sort
> >    of automounter that does userspace upcalls.
> 
> Note that clients may access filesystems by traversing mountpoints, but
> they may also just jump in with a filehandle they got on a previous
> boot.  So we still need to be able to look up exports by filehandle as
> well as by path.

Yeah, but an explicit kern_mount is easy.  For old dev_t-based
filehandles anyway; for uuid- or fsid-based ones we'd need an upcall
for the uuid/fsid to dev_t mapping.

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2007-03-16 21:57 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-03-07 10:23 Delays on "first" access to a NFS mount Simon Peter
2007-03-07 12:38 ` Talpey, Thomas
2007-03-07 13:22   ` Simon Peter
2007-03-07 15:06   ` Simon Peter
2007-03-07 15:10     ` Simon Peter
2007-03-07 15:42     ` J. Bruce Fields
2007-03-07 18:44       ` Simon Peter
2007-03-07 20:29         ` J. Bruce Fields
2007-03-07 21:46           ` Simon Peter
2007-03-07 22:05             ` J. Bruce Fields
2007-03-07 23:19               ` Simon Peter
2007-03-07 22:09           ` Neil Brown
2007-03-08 15:49           ` Simon Peter
2007-03-09 13:02           ` Simon Peter
2007-03-09 14:59             ` J. Bruce Fields
2007-03-07 20:31         ` Talpey, Thomas
2007-03-07 20:50           ` J. Bruce Fields
2007-03-07 21:07             ` Talpey, Thomas
2007-03-07 21:17               ` J. Bruce Fields
2007-03-07 21:23                 ` Talpey, Thomas
2007-03-07 21:54                   ` J. Bruce Fields
2007-03-07 22:37                     ` Neil Brown
2007-03-07 23:06                       ` J. Bruce Fields
2007-03-07 23:39                         ` Neil Brown
2007-03-08  5:14                           ` J. Bruce Fields
2007-03-08  5:42                             ` Neil Brown
2007-03-08 13:43                             ` Olaf Kirch
2007-03-08 21:27                               ` J. Bruce Fields
2007-03-09 15:02                                 ` Olaf Kirch
2007-03-16 21:47                         ` Christoph Hellwig
2007-03-16 21:54                           ` J. Bruce Fields
2007-03-16 21:57                             ` Christoph Hellwig
2007-03-07 23:24                       ` J. Bruce Fields
2007-03-07 23:51                         ` Neil Brown
2007-03-08  4:36                           ` J. Bruce Fields
2007-03-08 13:27                     ` Olaf Kirch
2007-03-08 21:46                       ` J. Bruce Fields
2007-03-07 22:15                   ` Neil Brown
2007-03-07 21:40             ` Simon Peter
2007-03-07 22:17               ` Neil Brown
2007-03-07 22:36                 ` Talpey, Thomas
2007-03-07 22:48                   ` Neil Brown
2007-03-07 22:56                     ` Talpey, Thomas
2007-03-07 22:12             ` Neil Brown
2007-03-07 22:23               ` J. Bruce Fields
