linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: NFS Client patch
       [not found] <Pine.LNX.3.96L.1010709131315.16113O-200000@happyplace.pdl.cmu.edu.suse.lists.linux.kernel>
@ 2001-07-09 18:33 ` Andi Kleen
  2001-07-10 13:33   ` Chris Wedgwood
  2001-07-17 22:02   ` Hans Reiser
  0 siblings, 2 replies; 36+ messages in thread
From: Andi Kleen @ 2001-07-09 18:33 UTC (permalink / raw)
  To: Craig Soules; +Cc: linux-kernel

Craig Soules <soules@happyplace.pdl.cmu.edu> writes:
 
> Our system does automatic directory compaction through the use of a tree
> structure, and so the cookie needs to be invalidated.  Also, any other
> file system whicih does immediate directory compaction would require this.

Actually all the file systems who do that on Linux (JFS, XFS, reiserfs) 
have fixed the issue properly server side, by adding a layer that generates
stable cookies. You should too.

-Andi


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: NFS Client patch
  2001-07-09 18:33 ` NFS Client patch Andi Kleen
@ 2001-07-10 13:33   ` Chris Wedgwood
  2001-07-10 13:41     ` Andi Kleen
  2001-07-17 22:02   ` Hans Reiser
  1 sibling, 1 reply; 36+ messages in thread
From: Chris Wedgwood @ 2001-07-10 13:33 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Craig Soules, linux-kernel

On Mon, Jul 09, 2001 at 08:33:31PM +0200, Andi Kleen wrote:

    Actually all the file systems who do that on Linux (JFS, XFS,
    reiserfs) have fixed the issue properly server side, by adding a
    layer that generates stable cookies. You should too.

I've always thought that was a stupid fix. Why not have the clients be
smarted and make them responsible for getting a new cookie if the old
one is hosed?

For linux, with the dcache, I'm not even sure that this would be all
the hard. Persumable Solaris could (does?) do the same?



  --cw

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: NFS Client patch
  2001-07-10 13:33   ` Chris Wedgwood
@ 2001-07-10 13:41     ` Andi Kleen
  2001-07-10 16:48       ` Craig Soules
  0 siblings, 1 reply; 36+ messages in thread
From: Andi Kleen @ 2001-07-10 13:41 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Andi Kleen, Craig Soules, linux-kernel

On Wed, Jul 11, 2001 at 01:33:11AM +1200, Chris Wedgwood wrote:
> On Mon, Jul 09, 2001 at 08:33:31PM +0200, Andi Kleen wrote:
> 
>     Actually all the file systems who do that on Linux (JFS, XFS,
>     reiserfs) have fixed the issue properly server side, by adding a
>     layer that generates stable cookies. You should too.
> 
> I've always thought that was a stupid fix. Why not have the clients be
> smarted and make them responsible for getting a new cookie if the old
> one is hosed?

Because to get that new cookie you would need another cookie; otherwise
you could violate the readdir guarantee that it'll never return files
twice.

> For linux, with the dcache, I'm not even sure that this would be all
> the hard. Persumable Solaris could (does?) do the same?

dcache is not populated on readdir for good reasons (it would 
require reading the inodes and tie a of lot of memory) and you would need
to lock all the dcache entries belong to a directory while a nfs readdir;
tieing up even more memory. Also a readdir() is not bounded in time.

BTW; the cookie issue is not an NFS only problem. It occurs on local
IO as well. Just consider rm -rf - reading directories and in parallel
deleting them (the original poster's file system would have surely
gotten that wrong). Another tricky case is telldir().  


-Andi

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: NFS Client patch
  2001-07-10 13:41     ` Andi Kleen
@ 2001-07-10 16:48       ` Craig Soules
  2001-07-10 17:06         ` Andi Kleen
  0 siblings, 1 reply; 36+ messages in thread
From: Craig Soules @ 2001-07-10 16:48 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Chris Wedgwood, linux-kernel

On Tue, 10 Jul 2001, Andi Kleen wrote:
> Because to get that new cookie you would need another cookie; otherwise
> you could violate the readdir guarantee that it'll never return files
> twice.

I cannot locate any such guarantee in the NFS spec... are you refering to
another spec which applies?

> BTW; the cookie issue is not an NFS only problem. It occurs on local
> IO as well. Just consider rm -rf - reading directories and in parallel
> deleting them (the original poster's file system would have surely
> gotten that wrong). Another tricky case is telldir().  

I don't believe that the behavior in this case is deterministic.  If you
have multiple people accessing a single file, reading and writing to it,
there is no guarantee as to what the behavior is.  The client should be
able to handle any errors it creates for itself while doing this kind of
parallel operation.

Craig


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: NFS Client patch
  2001-07-10 16:48       ` Craig Soules
@ 2001-07-10 17:06         ` Andi Kleen
  2001-07-10 18:04           ` Chris Wedgwood
  0 siblings, 1 reply; 36+ messages in thread
From: Andi Kleen @ 2001-07-10 17:06 UTC (permalink / raw)
  To: Craig Soules; +Cc: Andi Kleen, Chris Wedgwood, linux-kernel

On Tue, Jul 10, 2001 at 12:48:20PM -0400, Craig Soules wrote:
> On Tue, 10 Jul 2001, Andi Kleen wrote:
> > Because to get that new cookie you would need another cookie; otherwise
> > you could violate the readdir guarantee that it'll never return files
> > twice.
> 
> I cannot locate any such guarantee in the NFS spec... are you refering to
> another spec which applies?

It's the unix semantics of readdir(); e.g. specified in Single Unix:

``   The type DIR, which is defined in the header <dirent.h>, represents
     a directory stream, which is an ordered sequence of all the
     directory entries in a particular directory. Directory entries
     represent files; files may be removed from a directory or added to
     a directory asynchronously to the operation of readdir(). ''

An ordered sequence does not include cycles.


> 
> > BTW; the cookie issue is not an NFS only problem. It occurs on local
> > IO as well. Just consider rm -rf - reading directories and in parallel
> > deleting them (the original poster's file system would have surely
> > gotten that wrong). Another tricky case is telldir().  
> 
> I don't believe that the behavior in this case is deterministic.  If you
> have multiple people accessing a single file, reading and writing to it,
> there is no guarantee as to what the behavior is.  The client should be
> able to handle any errors it creates for itself while doing this kind of
> parallel operation.

What happens with new entries added is unspecified; but old entries removed
in parallel should never cause a violation of the rule above.

A simple index into a rebalancing btree unfortunately doesn't fulfil this;
but there are ways to add additional layers to fix it.

The easiest test for it is rm -rf. 


-Andi

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: NFS Client patch
  2001-07-10 17:06         ` Andi Kleen
@ 2001-07-10 18:04           ` Chris Wedgwood
  2001-07-12 20:57             ` Alan Cox
  0 siblings, 1 reply; 36+ messages in thread
From: Chris Wedgwood @ 2001-07-10 18:04 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Craig Soules, linux-kernel

On Tue, Jul 10, 2001 at 07:06:02PM +0200, Andi Kleen wrote:

    It's the unix semantics of readdir(); e.g. specified in Single Unix:
    
    ``   The type DIR, which is defined in the header <dirent.h>, represents
         a directory stream, which is an ordered sequence of all the
         directory entries in a particular directory. Directory entries
         represent files; files may be removed from a directory or added to
         a directory asynchronously to the operation of readdir(). ''
    
    An ordered sequence does not include cycles.

*Who* says NFS has to be a *unix* like filesystem?



  --cw

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: NFS Client patch
  2001-07-10 18:04           ` Chris Wedgwood
@ 2001-07-12 20:57             ` Alan Cox
  2001-07-13 11:26               ` Chris Wedgwood
  0 siblings, 1 reply; 36+ messages in thread
From: Alan Cox @ 2001-07-12 20:57 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Andi Kleen, Craig Soules, linux-kernel

>     An ordered sequence does not include cycles.
> 
> *Who* says NFS has to be a *unix* like filesystem?

NFS is most emphatically not a posix compliant FS, at the best of times.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: NFS Client patch
  2001-07-12 20:57             ` Alan Cox
@ 2001-07-13 11:26               ` Chris Wedgwood
  0 siblings, 0 replies; 36+ messages in thread
From: Chris Wedgwood @ 2001-07-13 11:26 UTC (permalink / raw)
  To: Alan Cox; +Cc: Andi Kleen, Craig Soules, linux-kernel

On Thu, Jul 12, 2001 at 09:57:55PM +0100, Alan Cox wrote:

    NFS is most emphatically not a posix compliant FS, at the best of
    times.

Thats my point... why are we requiring file-systems and knfsd do all
sorts of psycho-acoustic-dildonics to make it look like one?  Why not
just accept its not very posix like and that good-enough, is, well,
good-enough?



  --cw

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: NFS Client patch
  2001-07-09 18:33 ` NFS Client patch Andi Kleen
  2001-07-10 13:33   ` Chris Wedgwood
@ 2001-07-17 22:02   ` Hans Reiser
  2001-07-17 22:14     ` Craig Soules
  2001-07-18 13:57     ` Chris Mason
  1 sibling, 2 replies; 36+ messages in thread
From: Hans Reiser @ 2001-07-17 22:02 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Craig Soules, linux-kernel

Andi Kleen wrote:
> 
> Craig Soules <soules@happyplace.pdl.cmu.edu> writes:
> 
> > Our system does automatic directory compaction through the use of a tree
> > structure, and so the cookie needs to be invalidated.  Also, any other
> > file system whicih does immediate directory compaction would require this.
> 
> Actually all the file systems who do that on Linux (JFS, XFS, reiserfs)
> have fixed the issue properly server side, by adding a layer that generates
> stable cookies. You should too.
> 
> -Andi
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

I take issue with the word "properly".  We have bastardized our FS design to do it.  NFS should not
be allowed to impose stable cookie maintenance on filesystems, it violates layering.  Simply
returning the last returned filename is so simple to code, much simpler than what we have to do to
cope with cookies.  Linux should fix the protocol for NFS, not ask Craig to screw over his FS
design.  Not that I think that will happen.....

Hans

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: NFS Client patch
  2001-07-17 22:02   ` Hans Reiser
@ 2001-07-17 22:14     ` Craig Soules
  2001-07-17 22:21       ` Hans Reiser
  2001-07-18 13:57     ` Chris Mason
  1 sibling, 1 reply; 36+ messages in thread
From: Craig Soules @ 2001-07-17 22:14 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Andi Kleen, linux-kernel

On Wed, 18 Jul 2001, Hans Reiser wrote:
> I take issue with the word "properly".  We have bastardized our FS design to do it.  NFS should not
> be allowed to impose stable cookie maintenance on filesystems, it violates layering.  Simply
> returning the last returned filename is so simple to code, much simpler than what we have to do to
> cope with cookies.  Linux should fix the protocol for NFS, not ask Craig to screw over his FS
> design.  Not that I think that will happen.....

Unfortunately to comply with NFSv2, the cookie cannot be larger than
32-bits.  I believe this oversight has been correct in later NFS versions.

I do agree that forcing the underlying fs to "fix" itself for NFS is the
wrong solution. I can understand their desire to follow unix semantics
(although I don't entirely agree with them), so until I think up a more
palatable solution for the linux community, I will just keep my patches to
myself :)

Craig


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: NFS Client patch
  2001-07-17 22:14     ` Craig Soules
@ 2001-07-17 22:21       ` Hans Reiser
  2001-07-18 13:30         ` Daniel Phillips
  2001-07-18 14:00         ` Jan Harkes
  0 siblings, 2 replies; 36+ messages in thread
From: Hans Reiser @ 2001-07-17 22:21 UTC (permalink / raw)
  To: Craig Soules; +Cc: Andi Kleen, linux-kernel

Craig Soules wrote:
> 
> On Wed, 18 Jul 2001, Hans Reiser wrote:
> > I take issue with the word "properly".  We have bastardized our FS design to do it.  NFS should not
> > be allowed to impose stable cookie maintenance on filesystems, it violates layering.  Simply
> > returning the last returned filename is so simple to code, much simpler than what we have to do to
> > cope with cookies.  Linux should fix the protocol for NFS, not ask Craig to screw over his FS
> > design.  Not that I think that will happen.....
> 
> Unfortunately to comply with NFSv2, the cookie cannot be larger than
> 32-bits.  I believe this oversight has been correct in later NFS versions.
> 
> I do agree that forcing the underlying fs to "fix" itself for NFS is the
> wrong solution. I can understand their desire to follow unix semantics
> (although I don't entirely agree with them), so until I think up a more
> palatable solution for the linux community, I will just keep my patches to
> myself :)
> 
> Craig

64 bits as in NFS v4 is still not large enough to hold a filename.  For practical reasons, ReiserFS
does what is needed to work with NFS, but what is needed bad design features, and any FS designer
who doesn't feel the need to get along with NFS should not have acceptance of bad design be made a
criterion for the acceptance of his patches.  Just let NFS not work for Craig's FS, what is the
problem with that?

Hans

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: NFS Client patch
  2001-07-17 22:21       ` Hans Reiser
@ 2001-07-18 13:30         ` Daniel Phillips
  2001-07-18 14:46           ` Hans Reiser
  2001-07-18 14:00         ` Jan Harkes
  1 sibling, 1 reply; 36+ messages in thread
From: Daniel Phillips @ 2001-07-18 13:30 UTC (permalink / raw)
  To: Hans Reiser, Craig Soules; +Cc: Andi Kleen, linux-kernel

On Wednesday 18 July 2001 00:21, Hans Reiser wrote:
> Craig Soules wrote:
> > On Wed, 18 Jul 2001, Hans Reiser wrote:
> > > I take issue with the word "properly".  We have bastardized our
> > > FS design to do it.  NFS should not be allowed to impose stable
> > > cookie maintenance on filesystems, it violates layering.  Simply
> > > returning the last returned filename is so simple to code, much
> > > simpler than what we have to do to cope with cookies.  Linux
> > > should fix the protocol for NFS, not ask Craig to screw over his
> > > FS design.  Not that I think that will happen.....
> >
> > Unfortunately to comply with NFSv2, the cookie cannot be larger
> > than 32-bits.  I believe this oversight has been correct in later
> > NFS versions.
> >
> > I do agree that forcing the underlying fs to "fix" itself for NFS
> > is the wrong solution. I can understand their desire to follow unix
> > semantics (although I don't entirely agree with them), so until I
> > think up a more palatable solution for the linux community, I will
> > just keep my patches to myself :)
> >
> > Craig
>
> 64 bits as in NFS v4 is still not large enough to hold a filename. 
> For practical reasons, ReiserFS does what is needed to work with NFS,
> but what is needed bad design features, and any FS designer who
> doesn't feel the need to get along with NFS should not have
> acceptance of bad design be made a criterion for the acceptance of
> his patches.  Just let NFS not work for Craig's FS, what is the
> problem with that?

I was planning to add coalesce-on-delete to my ext2 directory index
patch at some point, now I see I'll step right into this NFS
doo-d^H^H^H^H^H problem.  What to do?  Obviously it's not an option
to have NFS not work for ext2.  Just leaving the directory 
uncoalesced fixes the problem in some sense and doesn't hurt things
all that much.  Ext2 has been running that way for years.

Can I automagically know that a directory is mounted via NFS and
disable the coalescing?  Or maybe I need a -o coalesce=on/off, with
"off" as the default.  Ugh.

As you point out, this sucks.

--
Daniel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: NFS Client patch
  2001-07-17 22:02   ` Hans Reiser
  2001-07-17 22:14     ` Craig Soules
@ 2001-07-18 13:57     ` Chris Mason
  2001-07-19 11:35       ` Trond Myklebust
  1 sibling, 1 reply; 36+ messages in thread
From: Chris Mason @ 2001-07-18 13:57 UTC (permalink / raw)
  To: Hans Reiser, Andi Kleen; +Cc: Craig Soules, linux-kernel



On Wednesday, July 18, 2001 02:02:33 AM +0400 Hans Reiser <reiser@namesys.com> wrote:

> Andi Kleen wrote:
>> 
>> Craig Soules <soules@happyplace.pdl.cmu.edu> writes:
>> 
>> > Our system does automatic directory compaction through the use of a tree
>> > structure, and so the cookie needs to be invalidated.  Also, any other
>> > file system whicih does immediate directory compaction would require this.
>> 
>> Actually all the file systems who do that on Linux (JFS, XFS, reiserfs)
>> have fixed the issue properly server side, by adding a layer that generates
>> stable cookies. You should too.
>> 

> I take issue with the word "properly".  We have bastardized our FS design to do it.  NFS should not
> be allowed to impose stable cookie maintenance on filesystems, it violates layering.  Simply
> returning the last returned filename is so simple to code, much simpler than what we have to do to
> cope with cookies.  Linux should fix the protocol for NFS, not ask Craig to screw over his FS
> design.  Not that I think that will happen.....
> 

Well, returning the last filename won't do much for filesystems that don't
have any directory indexes, but that's besides the point.  Could nfsv4 be
better than it is?  probably.  Can we change older NFS protocols to have
a linux specific hack that makes them more filesystem (or at least reiserfs)
friendly?  probably.

NFS is what it is, good and bad.  People use it because it fits their setup
better than the others, so the filesystems need to cater to it, exactly
because it has such a large cross platform user base.  Linux specific mods
take away from that cross platform base.

-chris


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: NFS Client patch
  2001-07-17 22:21       ` Hans Reiser
  2001-07-18 13:30         ` Daniel Phillips
@ 2001-07-18 14:00         ` Jan Harkes
  2001-07-18 14:46           ` Hans Reiser
  1 sibling, 1 reply; 36+ messages in thread
From: Jan Harkes @ 2001-07-18 14:00 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Craig Soules, Andi Kleen, linux-kernel

On Wed, Jul 18, 2001 at 02:21:46AM +0400, Hans Reiser wrote:
> Craig Soules wrote:
> > Unfortunately to comply with NFSv2, the cookie cannot be larger than
> > 32-bits.  I believe this oversight has been correct in later NFS versions.
> > 
> > I do agree that forcing the underlying fs to "fix" itself for NFS is the
> > wrong solution. I can understand their desire to follow unix semantics
> > (although I don't entirely agree with them), so until I think up a more
> > palatable solution for the linux community, I will just keep my patches to
> > myself :)
> > 
> > Craig
> 
> 64 bits as in NFS v4 is still not large enough to hold a filename.
> For practical reasons, ReiserFS does what is needed to work with NFS,
> but what is needed bad design features, and any FS designer who
> doesn't feel the need to get along with NFS should not have acceptance
> of bad design be made a criterion for the acceptance of his patches.
> Just let NFS not work for Craig's FS, what is the problem with that?

Those 64-bits could be used for a simple hash to identify the filename.

In any case, what happens if the file was renamed or removed between the
2 readdir calls. A cookie identifying a name that was returned last, or
should be read next is just as volatile as a cookie that contains an
offset into the directory.

Jan


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: NFS Client patch
  2001-07-18 14:00         ` Jan Harkes
@ 2001-07-18 14:46           ` Hans Reiser
  2001-07-19 18:24             ` Pavel Machek
  0 siblings, 1 reply; 36+ messages in thread
From: Hans Reiser @ 2001-07-18 14:46 UTC (permalink / raw)
  To: Jan Harkes; +Cc: Craig Soules, Andi Kleen, linux-kernel

Jan Harkes wrote:
> 
> On Wed, Jul 18, 2001 at 02:21:46AM +0400, Hans Reiser wrote:
> > Craig Soules wrote:
> > > Unfortunately to comply with NFSv2, the cookie cannot be larger than
> > > 32-bits.  I believe this oversight has been correct in later NFS versions.
> > >
> > > I do agree that forcing the underlying fs to "fix" itself for NFS is the
> > > wrong solution. I can understand their desire to follow unix semantics
> > > (although I don't entirely agree with them), so until I think up a more
> > > palatable solution for the linux community, I will just keep my patches to
> > > myself :)
> > >
> > > Craig
> >
> > 64 bits as in NFS v4 is still not large enough to hold a filename.
> > For practical reasons, ReiserFS does what is needed to work with NFS,
> > but what is needed bad design features, and any FS designer who
> > doesn't feel the need to get along with NFS should not have acceptance
> > of bad design be made a criterion for the acceptance of his patches.
> > Just let NFS not work for Craig's FS, what is the problem with that?
> 
> Those 64-bits could be used for a simple hash to identify the filename.

That is what reiserfs does today.  It sucks.  The performance is worse for many to most apps because
files are not in lexicographic order, and stem compression is impossible.

> 
> In any case, what happens if the file was renamed or removed between the
> 2 readdir calls. A cookie identifying a name that was returned last, or
> should be read next is just as volatile as a cookie that contains an
> offset into the directory.

No, if the file was removed, it still tells you where to start your search.  A missing filename is
just as good a marker as a present one.

Hans

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: NFS Client patch
  2001-07-18 13:30         ` Daniel Phillips
@ 2001-07-18 14:46           ` Hans Reiser
  0 siblings, 0 replies; 36+ messages in thread
From: Hans Reiser @ 2001-07-18 14:46 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Craig Soules, Andi Kleen, linux-kernel

Daniel Phillips wrote:
> 
> On Wednesday 18 July 2001 00:21, Hans Reiser wrote:
> > Craig Soules wrote:
> > > On Wed, 18 Jul 2001, Hans Reiser wrote:
> > > > I take issue with the word "properly".  We have bastardized our
> > > > FS design to do it.  NFS should not be allowed to impose stable
> > > > cookie maintenance on filesystems, it violates layering.  Simply
> > > > returning the last returned filename is so simple to code, much
> > > > simpler than what we have to do to cope with cookies.  Linux
> > > > should fix the protocol for NFS, not ask Craig to screw over his
> > > > FS design.  Not that I think that will happen.....
> > >
> > > Unfortunately to comply with NFSv2, the cookie cannot be larger
> > > than 32-bits.  I believe this oversight has been correct in later
> > > NFS versions.
> > >
> > > I do agree that forcing the underlying fs to "fix" itself for NFS
> > > is the wrong solution. I can understand their desire to follow unix
> > > semantics (although I don't entirely agree with them), so until I
> > > think up a more palatable solution for the linux community, I will
> > > just keep my patches to myself :)
> > >
> > > Craig
> >
> > 64 bits as in NFS v4 is still not large enough to hold a filename.
> > For practical reasons, ReiserFS does what is needed to work with NFS,
> > but what is needed bad design features, and any FS designer who
> > doesn't feel the need to get along with NFS should not have
> > acceptance of bad design be made a criterion for the acceptance of
> > his patches.  Just let NFS not work for Craig's FS, what is the
> > problem with that?
> 
> I was planning to add coalesce-on-delete to my ext2 directory index
> patch at some point, now I see I'll step right into this NFS
> doo-d^H^H^H^H^H problem.  What to do?  Obviously it's not an option
> to have NFS not work for ext2.  Just leaving the directory
> uncoalesced fixes the problem in some sense and doesn't hurt things
> all that much.  Ext2 has been running that way for years.
> 
> Can I automagically know that a directory is mounted via NFS and
> disable the coalescing?  Or maybe I need a -o coalesce=on/off, with
> "off" as the default.  Ugh.
> 
> As you point out, this sucks.
> 
> --
> Daniel

The NFS v4 committee clowns had this pointed out to them, and failed to fix it.  I personally think
we should just fix Linux NFS to use filenames as cookies if both client and server are
"v4.cookie_monster" compliant.  That is not to say that I actually have a person free to do that
work, but it sure gets tempting at times.....

When we implement more advanced directories for ReiserFS (which we deferred doing until after
Reiser4), we will probably have to code up a v4.cookie_monster implementation.

Hans

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: NFS Client patch
  2001-07-18 13:57     ` Chris Mason
@ 2001-07-19 11:35       ` Trond Myklebust
  2001-07-19 18:02         ` Hans Reiser
  2001-07-20  8:50         ` Trond Myklebust
  0 siblings, 2 replies; 36+ messages in thread
From: Trond Myklebust @ 2001-07-19 11:35 UTC (permalink / raw)
  To: Chris Mason; +Cc: Hans Reiser, Andi Kleen, Craig Soules, linux-kernel

>>>>> " " == Chris Mason <mason@suse.com> writes:

     > Well, returning the last filename won't do much for filesystems
     > that don't have any directory indexes, but that's besides the
     > point.  Could nfsv4 be better than it is?  probably.  Can we
     > change older NFS protocols to have a linux specific hack that
     > makes them more filesystem (or at least reiserfs) friendly?
     > probably.

NFSv2 and v3 have a fixed format for readdir calls. There's bugger all
you can do to change this without making the resulting protocol
incompatible with NFS.

If you don't want Reiserfs to be NFS compatible, then fine, but I
personally don't want to see hacks to the NFS v2/v3 code that rely on
'hidden knowledge' of the filesystem on the server.

Cheers,
  Trond

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: NFS Client patch
  2001-07-19 11:35       ` Trond Myklebust
@ 2001-07-19 18:02         ` Hans Reiser
  2001-07-20  8:50         ` Trond Myklebust
  1 sibling, 0 replies; 36+ messages in thread
From: Hans Reiser @ 2001-07-19 18:02 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Chris Mason, Andi Kleen, Craig Soules, linux-kernel

Trond Myklebust wrote:
> 
> >>>>> " " == Chris Mason <mason@suse.com> writes:
> 
>      > Well, returning the last filename won't do much for filesystems
>      > that don't have any directory indexes, but that's besides the
>      > point.  Could nfsv4 be better than it is?  probably.  Can we
>      > change older NFS protocols to have a linux specific hack that
>      > makes them more filesystem (or at least reiserfs) friendly?
>      > probably.
> 
> NFSv2 and v3 have a fixed format for readdir calls. There's bugger all
> you can do to change this without making the resulting protocol
> incompatible with NFS.
> 
> If you don't want Reiserfs to be NFS compatible, then fine, but I
> personally don't want to see hacks to the NFS v2/v3 code that rely on
> 'hidden knowledge' of the filesystem on the server.
> 
> Cheers,
>   Trond
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

The current code does rely on hidden knowledge of the filesytem on the server, and refuses to
operate with any FS that does not describe a position in a directory as an offset or hash that fits
into 32 or 64 bits.

But be calm, I am not planning on fixing this myself anytime in the next year, we have an ugly and
hideous hack deployed in ReiserFS that works, for now I am just saying the folks who designed NFS
did a bad job and resolutely continue doing a bad job, and if someone wanted to fix it, they could
fix cookies to use filenames instead of byte offsets for those filesytems able to better use
filenames than byte offsets to describe a position within a directory, and for those clients and
servers who are both smart enough to understand filenames instead of cookies (able to understand the
cookie monster protocol).

Hans

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: NFS Client patch
  2001-07-18 14:46           ` Hans Reiser
@ 2001-07-19 18:24             ` Pavel Machek
  2001-07-22 15:15               ` Rob Landley
  0 siblings, 1 reply; 36+ messages in thread
From: Pavel Machek @ 2001-07-19 18:24 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Jan Harkes, Craig Soules, Andi Kleen, linux-kernel

Hi!

> > In any case, what happens if the file was renamed or removed between the
> > 2 readdir calls. A cookie identifying a name that was returned last, or
> > should be read next is just as volatile as a cookie that contains an
> > offset into the directory.
> 
> No, if the file was removed, it still tells you where to start your search.  A missing filename is
> just as good a marker as a present one.

And if new file is created with same name?
								Pavel

-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: NFS Client patch
  2001-07-19 11:35       ` Trond Myklebust
  2001-07-19 18:02         ` Hans Reiser
@ 2001-07-20  8:50         ` Trond Myklebust
  2001-07-20 11:30           ` Hans Reiser
  2001-07-20 14:07           ` Chris Mason
  1 sibling, 2 replies; 36+ messages in thread
From: Trond Myklebust @ 2001-07-20  8:50 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Chris Mason, Andi Kleen, Craig Soules, linux-kernel

>>>>> " " == Hans Reiser <reiser@namesys.com> writes:

     > The current code does rely on hidden knowledge of the filesytem
     > on the server, and refuses to operate with any FS that does not
     > describe a position in a directory as an offset or hash that
     > fits into 32 or 64 bits.

I'm not saying that ReiserFS is wrong to question the correctness of
this. I'm just saying that NFSv2 and v3 are fixed protocols, and that
it's too late to do anything about them. I read Chris mail as a
suggestion of creating yet another NQNFS, and this would IMHO be a
mistake. Better to concentrate on NFSv4 which is meant to be
extendible.

     > But be calm, I am not planning on fixing this myself anytime in
     > the next year, we have an ugly and hideous hack deployed in
     > ReiserFS that works, for now I am just saying the folks who
     > designed NFS did a bad job and resolutely continue doing a bad
     > job, and if someone wanted to fix it, they could fix cookies to
     > use filenames instead of byte offsets for those filesytems able
     > to better use filenames than byte offsets to describe a
     > position within a directory, and for those clients and servers
     > who are both smart enough to understand filenames instead of
     > cookies (able to understand the cookie monster protocol).

This is something which I believe you raised in the NFSv4 group, and
which could indeed be a candidate for an NFSv4 extension. After all,
this is in essence a recognition of the method most NFS clients
implement for recovering from an EBADCOOKIE error. Why was the idea
dropped?

(Note: As I said, under Linux we're currently hampered when
considering the above alternatives by the fact that glibc requires the
ability to lseek() on directories. This is a bug that they could
easily fix, and it affects not only your suggestion, but also all the
other suggestions in which one implements non-permanent cookies)

Cheers,
   Trond

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: NFS Client patch
  2001-07-20  8:50         ` Trond Myklebust
@ 2001-07-20 11:30           ` Hans Reiser
  2001-07-20 14:07           ` Chris Mason
  1 sibling, 0 replies; 36+ messages in thread
From: Hans Reiser @ 2001-07-20 11:30 UTC (permalink / raw)
  To: trond.myklebust; +Cc: Chris Mason, Andi Kleen, Craig Soules, linux-kernel

Trond Myklebust wrote:
> 
> >>>>> " " == Hans Reiser <reiser@namesys.com> writes:
> 
>      > The current code does rely on hidden knowledge of the filesytem
>      > on the server, and refuses to operate with any FS that does not
>      > describe a position in a directory as an offset or hash that
>      > fits into 32 or 64 bits.
> 
> I'm not saying that ReiserFS is wrong to question the correctness of
> this. I'm just saying that NFSv2 and v3 are fixed protocols, and that
> it's too late to do anything about them. I read Chris mail as a
> suggestion of creating yet another NQNFS, and this would IMHO be a
> mistake. Better to concentrate on NFSv4 which is meant to be
> extendible.
> 
>      > But be calm, I am not planning on fixing this myself anytime in
>      > the next year, we have an ugly and hideous hack deployed in
>      > ReiserFS that works, for now I am just saying the folks who
>      > designed NFS did a bad job and resolutely continue doing a bad
>      > job, and if someone wanted to fix it, they could fix cookies to
>      > use filenames instead of byte offsets for those filesytems able
>      > to better use filenames than byte offsets to describe a
>      > position within a directory, and for those clients and servers
>      > who are both smart enough to understand filenames instead of
>      > cookies (able to understand the cookie monster protocol).
> 
> This is something which I believe you raised in the NFSv4 group, and
> which could indeed be a candidate for an NFSv4 extension. After all,
> this is in essence a recognition of the method most NFS clients
> implement for recovering from an EBADCOOKIE error. Why was the idea
> dropped?

Lack of desire to do anything, near as I could tell.

> 
> (Note: As I said, under Linux we're currently hampered when
> considering the above alternatives by the fact that glibc requires the
> ability to lseek() on directories. This is a bug that they could
> easily fix, and it affects not only your suggestion, but also all the
> other suggestions in which one implements non-permanent cookies)

I would be quite happy if you (or anyone) could fix it, sometime in the next 3 years.  

> 
> Cheers,
>    Trond

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: NFS Client patch
  2001-07-20  8:50         ` Trond Myklebust
  2001-07-20 11:30           ` Hans Reiser
@ 2001-07-20 14:07           ` Chris Mason
  1 sibling, 0 replies; 36+ messages in thread
From: Chris Mason @ 2001-07-20 14:07 UTC (permalink / raw)
  To: trond.myklebust, Hans Reiser; +Cc: Andi Kleen, Craig Soules, linux-kernel



On Friday, July 20, 2001 10:50:57 AM +0200 Trond Myklebust <trond.myklebust@fys.uio.no> wrote:

>>>>>> " " == Hans Reiser <reiser@namesys.com> writes:
> 
>      > The current code does rely on hidden knowledge of the filesytem
>      > on the server, and refuses to operate with any FS that does not
>      > describe a position in a directory as an offset or hash that
>      > fits into 32 or 64 bits.
> 
> I'm not saying that ReiserFS is wrong to question the correctness of
> this. I'm just saying that NFSv2 and v3 are fixed protocols, and that
> it's too late to do anything about them. I read Chris mail as a
> suggestion of creating yet another NQNFS, and this would IMHO be a
> mistake. Better to concentrate on NFSv4 which is meant to be
> extendible.

Ah, then I was unclear...I think that while we certainly could
make linux (or reiserfs) specific changes to NFSvOld, it would be
a really bad idea.  In my mind, the biggest strength behind NFS
is its cross platform support, and maintaining some extension
would only be slightly more fun than daily visits to the dentist ;-)

I also think it is easy to call NFSv4 poorly designed, but much
harder to design it to exploit the strengths of every FS on every
unix flavor.  Shrug, there are tradeoffs everywhere.  

I don't plan on supporting NFSv4 because it is the best network 
filesystem ever made, but because it is in our best interest to 
be compatible with those kinds of industry standards.

-chris


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: NFS Client patch
  2001-07-19 18:24             ` Pavel Machek
@ 2001-07-22 15:15               ` Rob Landley
  2001-07-23  2:02                 ` Horst von Brand
  0 siblings, 1 reply; 36+ messages in thread
From: Rob Landley @ 2001-07-22 15:15 UTC (permalink / raw)
  To: Pavel Machek, Hans Reiser
  Cc: Jan Harkes, Craig Soules, Andi Kleen, linux-kernel

On Thursday 19 July 2001 14:24, Pavel Machek wrote:
> Hi!
>
> > > In any case, what happens if the file was renamed or removed between
> > > the 2 readdir calls. A cookie identifying a name that was returned
> > > last, or should be read next is just as volatile as a cookie that
> > > contains an offset into the directory.
> >
> > No, if the file was removed, it still tells you where to start your
> > search.  A missing filename is just as good a marker as a present one.
>
> And if new file is created with same name?
> 								Pavel

The same thing that happens as if a new file was inserted BEFORE your cursor, 
in the part of the directory you've already looked at.  You ignore it.

The "filename cookie" indicates the LAST file we looked at.  We've already 
seen it.  Therefore, whether it's the same file or not, we don't care.  We 
just want the next file AFTER that one.

We're doing fairly arbitrary, unlocked reads across volatile data.  No 
algorithm is going to behave perfectly here.  We just want a behavior that is 
consistent, guaranteed to complete, and doesn't violate any obvious 
constraints about how filesystems should behave (like producing duplicate 
entries, which returning two different sets of data with the same filename 
would do).

Makes sense to me, anyway...

Rob

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: NFS Client patch
  2001-07-22 15:15               ` Rob Landley
@ 2001-07-23  2:02                 ` Horst von Brand
  2001-07-23  9:57                   ` Rob Landley
  0 siblings, 1 reply; 36+ messages in thread
From: Horst von Brand @ 2001-07-23  2:02 UTC (permalink / raw)
  To: landley; +Cc: linux-kernel

Rob Landley <landley@webofficenow.com> said:
> On Thursday 19 July 2001 14:24, Pavel Machek wrote:

[...]

> > > No, if the file was removed, it still tells you where to start your
> > > search.  A missing filename is just as good a marker as a present one.

> > And if new file is created with same name?

> The same thing that happens as if a new file was inserted BEFORE your
> cursor, in the part of the directory you've already looked at.  You
> ignore it.

Who says that if I've got files A, B, C, D, and delete B, and create a new
B, whatever underlying directory structure there is will place it where the
old B was? It might reuse holes before A...
-- 
Horst von Brand                             vonbrand@sleipnir.valparaiso.cl
Casilla 9G, Vin~a del Mar, Chile                               +56 32 672616

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: NFS Client patch
  2001-07-23  2:02                 ` Horst von Brand
@ 2001-07-23  9:57                   ` Rob Landley
  0 siblings, 0 replies; 36+ messages in thread
From: Rob Landley @ 2001-07-23  9:57 UTC (permalink / raw)
  To: Horst von Brand, landley; +Cc: linux-kernel

On Sunday 22 July 2001 22:02, Horst von Brand wrote:
> Rob Landley <landley@webofficenow.com> said:
> > On Thursday 19 July 2001 14:24, Pavel Machek wrote:
>
> [...]
>
> > > > No, if the file was removed, it still tells you where to start your
> > > > search.  A missing filename is just as good a marker as a present
> > > > one.
> > >
> > > And if new file is created with same name?
> >
> > The same thing that happens as if a new file was inserted BEFORE your
> > cursor, in the part of the directory you've already looked at.  You
> > ignore it.
>
> Who says that if I've got files A, B, C, D, and delete B, and create a new
> B, whatever underlying directory structure there is will place it where the
> old B was? It might reuse holes before A...

I suppose the assumption was that the directory entries are returned in 
alphabetically sorted order, even if the underlying filesystem doesn't do 
that.  Maybe this is a waste of effort on the server's part (and 
generating/maintaining other sorts of cookies aren't?), but it also seems 
fairly easy to make it work.  (I could easily be missing something obvious...)

Rob

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: NFS Client patch
  2001-07-10  8:22     ` Trond Myklebust
  2001-07-10 13:38       ` Chris Wedgwood
@ 2001-07-11  8:14       ` Trond Myklebust
  1 sibling, 0 replies; 36+ messages in thread
From: Trond Myklebust @ 2001-07-11  8:14 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Craig Soules, jrs, linux-kernel

>>>>> " " == Chris Wedgwood <cw@f00f.org> writes:

     > On Tue, Jul 10, 2001 at 10:22:16AM +0200, Trond Myklebust
     > wrote:
     >     Imagine if somebody gives you a 1Gb directory. Would it or
     >     would it not piss you off if your file pointer got reset to
     >     0 every time somebody created a file?
    
     >     The current semantics are scalable. Anything which resets
     >     the file pointer upon change of a file/directory/whatever
     >     isn't...

     > Anyone using a 1GB directory deserves for it not to scale.  I
     > think this is a very poor example.

It's an extreme case, but it illustrates something the kernel should
be able to cope with.

     > No that I disagree with you, the largest directories I have on
     > my system here are 2.6MB (freedb, lots of hashed flat-files in
     > one directory), here I do agree that you should not have to
     > reset the counter everytime.

Right: this is what most people expect. The reason why the readdir
code went through several quite different incarnations in the 2.3.x
series was that duplicate directory entries were not acceptable to
people.

readdir() is not an atomic operation. You can't lock a directory while
doing a series of readdir calls either on local filesystems nor over
NFS. As such, the idea of volatile cookies doesn't really make sense,
nor is it supported in rfc1094 (NFSv2):

   Each "entry" contains a "fileid" which consists of a unique number
   to identify the file within a filesystem, the "name" of the file,
   and a "cookie" which is an opaque pointer to the next entry in the
   directory.  The cookie is used in the next READDIR call to get more
   entries starting at a given point in the directory.

Nothing there states that the cookie can be invalidated, nor is there
even an error to tell you that this is the case.


In rfc1813 (NFSv3), they recognized that NFSv2 couldn't cope with
stale cookies (yes: this fact is explicitly written down on pages 77
and 78), and hence they introduced the cookie verifier and the
NFS3ERR_BAD_COOKIE error, that can be used by the server to declare a
cookie as being stale. In this case, some extra recovery action might
make sense. I can see 3 possible solutions:

  1) The behaviour in this case is undefined. Leave it up to the
     user to reopen the directory, reset the file pointer, or whatever.
     This is what we do now.

  2) implement some extra caching info to allow an improved recovery
     of the last file position. This would likely have to involve
     storing the fileid + filename of the last entry somewhere in the
     struct file.

     This scheme means that lseek() breaks, and can undermine
     glibc. The latter has a lousy getdents algorithm in which it
     reads a number n of entries into a temporary buffer, the copy <=
     n entries to the user (because their struct dirent is larger than
     the kernel struct dirent), and then use lseek() to jump back.

  3) Implement something like Craig suggests whereby you reset the
     file pointer.

     This gives unexpected results as far as the user is concerned as
     it causes duplicate entries to pop up without any warning. It too
     breaks lseek(). It's a policy decision on behalf of the user.

If someone can persuade the glibc people to implement a sane algorithm
for getdents() that precalculates the upper limit on how much padding
is needed, and drops the use of lseek(), then (2) might possibly be
worth doing for 2.5.x (I believe I heard that Solaris does something
along these lines).

If not, given a choice between (1) and (3), I choose (1).



Finally, any scheme that assumes cookie staleness in all cases where
the directory mtime changes will not be passed on to Linus.

Cheers,
  Trond

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: NFS Client patch
  2001-07-09 21:46     ` J. Richard Sladkey
@ 2001-07-10 15:06       ` Craig Soules
  0 siblings, 0 replies; 36+ messages in thread
From: Craig Soules @ 2001-07-10 15:06 UTC (permalink / raw)
  To: J. Richard Sladkey; +Cc: Trond Myklebust, linux-kernel

On Mon, 9 Jul 2001, J. Richard Sladkey wrote:
> This interpretation isn't useful.  If a second client modifies the
> directory while the first client is reading a directory, the first
> client has no way of knowing that its cookie is now invalid, yet it
> clearly will be invalid if the server's cookies are invalid after
> any directory modifying operation.

I believe that the behavior in this case is still undetermanistic.  More 
generically, if one person is writing to a file and another is reading, it
is unclear what that person will get at all.

Craig


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: NFS Client patch
  2001-07-10  8:22     ` Trond Myklebust
@ 2001-07-10 13:38       ` Chris Wedgwood
  2001-07-11  8:14       ` Trond Myklebust
  1 sibling, 0 replies; 36+ messages in thread
From: Chris Wedgwood @ 2001-07-10 13:38 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Craig Soules, jrs, linux-kernel

On Tue, Jul 10, 2001 at 10:22:16AM +0200, Trond Myklebust wrote:

    Imagine if somebody gives you a 1Gb directory. Would it or would
    it not piss you off if your file pointer got reset to 0 every time
    somebody created a file?
    
    The current semantics are scalable. Anything which resets the file
    pointer upon change of a file/directory/whatever isn't...

Anyone using a 1GB directory deserves for it not to scale.  I think
this is a very poor example.

No that I disagree with you, the largest directories I have on my
system here are 2.6MB (freedb, lots of hashed flat-files in one
directory), here I do agree that you should not have to reset the
counter everytime.





  --cw

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: NFS Client patch
  2001-07-09 20:05   ` Trond Myklebust
  2001-07-09 22:09     ` Craig Soules
@ 2001-07-10  8:22     ` Trond Myklebust
  2001-07-10 13:38       ` Chris Wedgwood
  2001-07-11  8:14       ` Trond Myklebust
  1 sibling, 2 replies; 36+ messages in thread
From: Trond Myklebust @ 2001-07-10  8:22 UTC (permalink / raw)
  To: Craig Soules; +Cc: jrs, linux-kernel

>>>>> " " == Craig Soules <soules@happyplace.pdl.cmu.edu> writes:

     > On Mon, 9 Jul 2001, Trond Myklebust wrote:
    >> If the client discovers that the cache is invalid, it clears
    >> it, and refills the cache. We then start off at the next cookie
    >> after the last read cookie. Test it on an ordinary filesystem
    >> and you'll see the exact same behaviour. The act of creating or
    >> deleting files is *not* supposed invalidate the readdir offset.

     > I would say that assuming that the readdir cookie is an offset
     > is a break in the spec.  In fact, there are a few things in the
     > spec which I'd like to point out.  First of all, "All of the
     > procedures in the NFS protocol are assumed to be synchronous."
     > Which means that you should not even be making asynchronous
     > remove calls.  Second, the server is meant to be "as stateless
     > as possible."  I would argue that this means that you should
     > not make assumptions about the cookie's state if another
     > operation is interposed between two readdir() operations.  As
     > an aside, by adding a translation layer to the cookies (as
     > suggested by an earlier post) would break this, as the server
     > would have to store that state in the event of a server crash,
     > thus breaking the spec.

Imagine if somebody gives you a 1Gb directory. Would it or would it
not piss you off if your file pointer got reset to 0 every time
somebody created a file?

The current semantics are scalable. Anything which resets the file
pointer upon change of a file/directory/whatever isn't...

Cheers,
  Trond

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: NFS Client patch
  2001-07-09 20:05   ` Trond Myklebust
@ 2001-07-09 22:09     ` Craig Soules
  2001-07-10  8:22     ` Trond Myklebust
  1 sibling, 0 replies; 36+ messages in thread
From: Craig Soules @ 2001-07-09 22:09 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: jrs, linux-kernel

On Mon, 9 Jul 2001, Trond Myklebust wrote:
> If the client discovers that the cache is invalid, it clears it, and
> refills the cache. We then start off at the next cookie after the last
> read cookie. Test it on an ordinary filesystem and you'll see the
> exact same behaviour. The act of creating or deleting files is *not*
> supposed invalidate the readdir offset.

I would say that assuming that the readdir cookie is an offset is a break
in the spec.  In fact, there are a few things in the spec which I'd like
to point out.  First of all, "All of the procedures in the NFS protocol
are assumed to be synchronous."  Which means that you should not even be
making asynchronous remove calls.  Second, the server is meant to be
"as stateless as possible."  I would argue that this means that you should
not make assumptions about the cookie's state if another operation is
interposed between two readdir() operations.  As an aside, by adding a
translation layer to the cookies (as suggested by an earlier post) would
break this, as the server would have to store that state in the event of a
server crash, thus breaking the spec.

> You are confusing the act of detecting whether or not the cache is
> invalid with that of recovering after a cache invalidation. In the
> former case we do have room for improvement: see for instance

It may be that my solution is imperfect for the problem I would like
fixed, however, I do not believe that your current recovery mechanism is
correct.

Craig


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: NFS Client patch
  2001-07-09 19:45   ` Craig Soules
  2001-07-09 19:53     ` Charles Cazabon
@ 2001-07-09 21:46     ` J. Richard Sladkey
  2001-07-10 15:06       ` Craig Soules
  1 sibling, 1 reply; 36+ messages in thread
From: J. Richard Sladkey @ 2001-07-09 21:46 UTC (permalink / raw)
  To: Craig Soules, Trond Myklebust; +Cc: linux-kernel

> Ok, perhaps I mis-spoke slightly.  What the spec does state is that the
> cookie is opaque.  This has generally been interpreted to mean that you
> should not trust it to be stable after a change to that directory.

This interpretation isn't useful.  If a second client modifies the
directory while the first client is reading a directory, the first
client has no way of knowing that its cookie is now invalid, yet it
clearly will be invalid if the server's cookies are invalid after
any directory modifying operation.

The solution is that the server must cope with directory modifying
operations and still keep its cookies valid.  It can do this by
cooperating with the filesystem to update the "true" index associated
with the "opaque" index stored in any outstanding cookies.  Or it can
simply have a more sophisticated cookie, such as passing the inode number
of the first unread directory entry in the cookie and the offset of the
entry.  If they are the same continue as usual.  If they are different,
re-read the directory from the beginning, searching for that specific
inode.  Or any other scheme.  Note that this information is still
completely opaque to the client.

Other operating systems may not show the problem if they pre-read
much larger blocks of directory entries.  However, you should be able
to provoke the problem out of any OS by creating a sufficiently
large directory.  In any case, the two-client argument clearly shows
that cookies should be so fragile that any directory operation
makes them invalid.  This makes your server more complicated, but it
seems like the correct behavior.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: NFS Client patch
  2001-07-09 18:59 ` Trond Myklebust
  2001-07-09 19:45   ` Craig Soules
@ 2001-07-09 20:05   ` Trond Myklebust
  2001-07-09 22:09     ` Craig Soules
  2001-07-10  8:22     ` Trond Myklebust
  1 sibling, 2 replies; 36+ messages in thread
From: Trond Myklebust @ 2001-07-09 20:05 UTC (permalink / raw)
  To: Craig Soules; +Cc: Trond Myklebust, jrs, linux-kernel

>>>>> " " == Craig Soules <soules@happyplace.pdl.cmu.edu> writes:

    >> Your patch will automatically lead to duplicate entries in
    >> readdir() on most if not all servers whenever the attributes on
    >> the inode have been refreshed (whether or not the cache has
    >> been invalidated). That's a bug...

     > If I were to do a create during a readdir() operation which
     > inserted itself in the directory before the place it left off,
     > that entry would be left out of the listing.  That is also a
     > bug, wouldn't you think?

No: it's POSIX

If the client discovers that the cache is invalid, it clears it, and
refills the cache. We then start off at the next cookie after the last
read cookie. Test it on an ordinary filesystem and you'll see the
exact same behaviour. The act of creating or deleting files is *not*
supposed invalidate the readdir offset.

You are confusing the act of detecting whether or not the cache is
invalid with that of recovering after a cache invalidation. In the
former case we do have room for improvement: see for instance

  http://www.fys.uio.no/~trondmy/src/2.4.6/linux-2.4.6-cto.dif

which strengthens the attribute checking on open().

Cheers,
  Trond

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: NFS Client patch
  2001-07-09 19:45   ` Craig Soules
@ 2001-07-09 19:53     ` Charles Cazabon
  2001-07-09 21:46     ` J. Richard Sladkey
  1 sibling, 0 replies; 36+ messages in thread
From: Charles Cazabon @ 2001-07-09 19:53 UTC (permalink / raw)
  To: linux-kernel

Craig Soules <soules@happyplace.pdl.cmu.edu> wrote:
> 
> Ok, perhaps I mis-spoke slightly.  What the spec does state is that the
> cookie is opaque.  This has generally been interpreted to mean that you
> should not trust it to be stable after a change to that directory.

I thought the generally accepted meaning of "opaque" in this context was
"don't expect to be able to infer anything relevant from this data".  This has
nothing to do with how long a particular cookie is valid.

Charles
-- 
-----------------------------------------------------------------------
Charles Cazabon                            <linux@discworld.dyndns.org>
GPL'ed software available at:  http://www.qcc.sk.ca/~charlesc/software/
-----------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: NFS Client patch
  2001-07-09 18:59 ` Trond Myklebust
@ 2001-07-09 19:45   ` Craig Soules
  2001-07-09 19:53     ` Charles Cazabon
  2001-07-09 21:46     ` J. Richard Sladkey
  2001-07-09 20:05   ` Trond Myklebust
  1 sibling, 2 replies; 36+ messages in thread
From: Craig Soules @ 2001-07-09 19:45 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: jrs, linux-kernel

On Mon, 9 Jul 2001, Trond Myklebust wrote:
> The NFSv2 spec says no such thing. It simply says that you set the
> cookie to zero when you want to start at the beginning of the
> directory. This is only needed when we want to reread the directory
> into the page cache.

Ok, perhaps I mis-spoke slightly.  What the spec does state is that the
cookie is opaque.  This has generally been interpreted to mean that you
should not trust it to be stable after a change to that directory.

> Your patch will automatically lead to duplicate entries in readdir()
> on most if not all servers whenever the attributes on the inode have
> been refreshed (whether or not the cache has been invalidated). That's
> a bug...

If I were to do a create during a readdir() operation which inserted
itself in the directory before the place it left off, that entry would be
left out of the listing.  That is also a bug, wouldn't you think?

I'd also like to point out that every other operating system which I have
tested this with has resulted in the correct behavior (NetBSD, FreeBSD,
Digital Unix, ...)

As for the refresh vs. invalidate problem, I would be happy to add
another time stamp to the in-core nfs inode exclusively for directory
invalidation.

Craig


^ permalink raw reply	[flat|nested] 36+ messages in thread

* NFS Client patch
  2001-07-09 17:28 Craig Soules
@ 2001-07-09 18:59 ` Trond Myklebust
  2001-07-09 19:45   ` Craig Soules
  2001-07-09 20:05   ` Trond Myklebust
  0 siblings, 2 replies; 36+ messages in thread
From: Trond Myklebust @ 2001-07-09 18:59 UTC (permalink / raw)
  To: Craig Soules; +Cc: jrs, linux-kernel

>>>>> " " == Craig Soules <soules@happyplace.pdl.cmu.edu> writes:

     > Hello, I hope that I am sending this to the appropriate people.
     > I have been working on a project known as Self-Securing Storage
     > here at Carnegie Mellon University.  We have developed our
     > storage server to act as an NFSv2 server, and have been using
     > the Linux NFSv2 client to do our benchmarking. I have run
     > across a small problem with the 2.4 implementation of the Linux
     > NFSv2 client.

     > The problem is in the readdir() operation.  The current cookie
     > for a given readdir operation is being stored in the file
     > descriptor.  The problem is that it is not being reset to 0 if
     > a change has been made to the directory as is indicated in the
     > NFSv2 spec.  This problem is often seen when doing an operation
     > such as rm -rf to a large directory tree due to the
     > asynchronous remove operation that has been implemented.

The NFSv2 spec says no such thing. It simply says that you set the
cookie to zero when you want to start at the beginning of the
directory. This is only needed when we want to reread the directory
into the page cache.

Your patch will automatically lead to duplicate entries in readdir()
on most if not all servers whenever the attributes on the inode have
been refreshed (whether or not the cache has been invalidated). That's
a bug...

Cheers,
   Trond

^ permalink raw reply	[flat|nested] 36+ messages in thread

* NFS Client patch
@ 2001-07-09 17:28 Craig Soules
  2001-07-09 18:59 ` Trond Myklebust
  0 siblings, 1 reply; 36+ messages in thread
From: Craig Soules @ 2001-07-09 17:28 UTC (permalink / raw)
  To: trond.myklebust, jrs; +Cc: linux-kernel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1574 bytes --]

Hello,

I hope that I am sending this to the appropriate people.  I have been
working on a project known as Self-Securing Storage here at Carnegie
Mellon University.  We have developed our storage server to act as an
NFSv2 server, and have been using the Linux NFSv2 client to do our
benchmarking. I have run across a small problem with the 2.4
implementation of the Linux NFSv2 client.

The problem is in the readdir() operation.  The current cookie for a given
readdir operation is being stored in the file descriptor.  The problem is
that it is not being reset to 0 if a change has been made to the
directory as is indicated in the NFSv2 spec.  This problem is often
seen when doing an operation such as rm -rf to a large directory tree due
to the asynchronous remove operation that has been implemented.

This has not traditionally been a problem for Linux because in ext2 the
cookie is the offset into the directory of the next entry.  If a file
is deleted in that directory during the readdir() operation, it has no
effect since ext2 does lazy directory compaction.

Our system does automatic directory compaction through the use of a tree
structure, and so the cookie needs to be invalidated.  Also, any other
file system whicih does immediate directory compaction would require this.

I have attached my proposed modifications to the end of this file.  They
have been tested under 2.4.5.

Please let me know if there is anything more I need to do, or if you have
any questions.  I would really like to get this proper behavior into the
kernel.

Thanks,
Craig Soules

[-- Attachment #2: Type: TEXT/PLAIN, Size: 2570 bytes --]

--- linux/include/linux/nfs_fs.h	Fri May 25 21:02:11 2001
+++ /usr/src/linux/include/linux/nfs_fs.h	Sun Jul  8 14:40:57 2001
@@ -160,13 +160,18 @@
 static __inline__ struct rpc_cred *
 nfs_file_cred(struct file *file)
 {
-	struct rpc_cred *cred = (struct rpc_cred *)(file->private_data);
+	struct nfs_file_private *priv =
+		(struct nfs_file_private *)(file->private_data);
+	struct rpc_cred *cred = priv->cred;
 #ifdef RPC_DEBUG
 	if (cred && cred->cr_magic != RPCAUTH_CRED_MAGIC)
 		BUG();
 #endif
 	return cred;
 }
+
+#define NFS_FILE_LASTMOD(filep) \
+	((struct nfs_file_private *)((filep)->private_data))->lastmod
 
 /*
  * linux/fs/nfs/dir.c
--- linux/include/linux/nfs_fs_i.h	Mon Feb 19 20:13:00 2001
+++ /usr/src/linux/include/linux/nfs_fs_i.h	Sun Jul  8 14:18:16 2001
@@ -98,4 +98,12 @@
  */
 #define NFS_LCK_GRANTED		0x0001		/* lock has been granted */
 
+/*
+ * NFS private data in the file descriptor
+ */
+struct nfs_file_private {
+	struct rpc_cred *cred;
+	unsigned long lastmod;
+};
+
 #endif
--- linux/fs/nfs/dir.c	Sat May 19 21:02:45 2001
+++ /usr/src/linux/fs/nfs/dir.c	Sun Jul  8 14:35:55 2001
@@ -384,6 +384,11 @@
 	memset(&my_entry, 0, sizeof(my_entry));
 
 	desc->file = filp;
+
+	if(NFS_FILE_LASTMOD(filp) < NFS_ATTRTIMEO_UPDATE(inode)) {
+		filp->f_pos = 0;
+		NFS_FILE_LASTMOD(filp) = NFS_ATTRTIMEO_UPDATE(inode);
+	}
 	desc->target = filp->f_pos;
 	desc->entry = &my_entry;
 	desc->decode = NFS_PROTO(inode)->decode_dirent;
--- linux/fs/nfs/inode.c	Sat May 19 21:14:38 2001
+++ /usr/src/linux/fs/nfs/inode.c	Sun Jul  8 14:36:30 2001
@@ -799,13 +799,14 @@
  */
 int nfs_open(struct inode *inode, struct file *filp)
 {
+	struct nfs_file_private *priv;
 	struct rpc_auth *auth;
-	struct rpc_cred *cred;
 
 	lock_kernel();
+	priv = kmalloc(sizeof(struct nfs_file_private), GFP_KERNEL);
 	auth = NFS_CLIENT(inode)->cl_auth;
-	cred = rpcauth_lookupcred(auth, 0);
-	filp->private_data = cred;
+	priv->cred = rpcauth_lookupcred(auth, 0);
+	filp->private_data = priv;
 	unlock_kernel();
 	return 0;
 }
@@ -814,12 +815,17 @@
 {
 	struct rpc_auth *auth;
 	struct rpc_cred *cred;
+	struct nfs_file_private *priv;
 
 	lock_kernel();
-	auth = NFS_CLIENT(inode)->cl_auth;
-	cred = nfs_file_cred(filp);
-	if (cred)
-		rpcauth_releasecred(auth, cred);
+	priv = filp->private_data;
+	if(priv) {
+		auth = NFS_CLIENT(inode)->cl_auth;
+		cred = nfs_file_cred(filp);
+		if (cred)
+			rpcauth_releasecred(auth, cred);
+		kfree(priv);
+	}
 	unlock_kernel();
 	return 0;
 }

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2001-07-23 18:59 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <Pine.LNX.3.96L.1010709131315.16113O-200000@happyplace.pdl.cmu.edu.suse.lists.linux.kernel>
2001-07-09 18:33 ` NFS Client patch Andi Kleen
2001-07-10 13:33   ` Chris Wedgwood
2001-07-10 13:41     ` Andi Kleen
2001-07-10 16:48       ` Craig Soules
2001-07-10 17:06         ` Andi Kleen
2001-07-10 18:04           ` Chris Wedgwood
2001-07-12 20:57             ` Alan Cox
2001-07-13 11:26               ` Chris Wedgwood
2001-07-17 22:02   ` Hans Reiser
2001-07-17 22:14     ` Craig Soules
2001-07-17 22:21       ` Hans Reiser
2001-07-18 13:30         ` Daniel Phillips
2001-07-18 14:46           ` Hans Reiser
2001-07-18 14:00         ` Jan Harkes
2001-07-18 14:46           ` Hans Reiser
2001-07-19 18:24             ` Pavel Machek
2001-07-22 15:15               ` Rob Landley
2001-07-23  2:02                 ` Horst von Brand
2001-07-23  9:57                   ` Rob Landley
2001-07-18 13:57     ` Chris Mason
2001-07-19 11:35       ` Trond Myklebust
2001-07-19 18:02         ` Hans Reiser
2001-07-20  8:50         ` Trond Myklebust
2001-07-20 11:30           ` Hans Reiser
2001-07-20 14:07           ` Chris Mason
2001-07-09 17:28 Craig Soules
2001-07-09 18:59 ` Trond Myklebust
2001-07-09 19:45   ` Craig Soules
2001-07-09 19:53     ` Charles Cazabon
2001-07-09 21:46     ` J. Richard Sladkey
2001-07-10 15:06       ` Craig Soules
2001-07-09 20:05   ` Trond Myklebust
2001-07-09 22:09     ` Craig Soules
2001-07-10  8:22     ` Trond Myklebust
2001-07-10 13:38       ` Chris Wedgwood
2001-07-11  8:14       ` Trond Myklebust

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).