* Re: [nfsv4] RFC 7530: Filehandle of opened file after the REMOVE
       [not found]                 ` <20161229074830.GA3002@lst.de>
@ 2016-12-29 20:54                   ` Bruce James Fields
  2016-12-30  8:35                     ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: Bruce James Fields @ 2016-12-29 20:54 UTC
  To: Christoph Hellwig; +Cc: Trond Myklebust, linux-nfs

On Thu, Dec 29, 2016 at 08:48:30AM +0100, Christoph Hellwig wrote:
> On Wed, Dec 28, 2016 at 09:47:03PM -0500, Bruce James Fields wrote:
> > I never seriously worked on it, but for a while I was in the habit of
> > running it by people.  Christoph Hellwig thought it was doable (I think
> > he suggested some sort of callback from the filesystem during the
> > garbage collection, possibly because he had in mind some other
> > application for that--but my memory may be wrong).  Chris Mason didn't
> > like the idea at all.  He asked what we expect to happen on fsck, or if
> > the filesystem gets mounted without nfs getting started, or... some
> > other scenarios I forget.
> 
> The way open but unlinked files are handled by modern transactional
> file systems is that the file system keeps a list of those inodes
> (in XFS this is the unlinked inode list in the allocation group header;
> other file systems use different terminology and slightly different
> techniques, e.g. in ext4 the list is global for the whole file system).
> 
> After an unclean shutdown, when file system recovery is run, we perform
> the deferred delete for all the inodes on the unlinked inode list.
> At that point the file system could in theory inform NFSD about that
> fact.  But at least as far as the current Linux kernel is concerned
> (sorry for delving into implementation details, but I guess this is still
> easier to understand than an abstract discussion), at the point where the
> file system performs recovery NFSD has not been started, or at least doesn't
> know about the file system yet.  We could still persist that information
> somewhere, or use a flag to delay the deletion of unlinked inodes until
> NFSD runs.
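A minimal C sketch of the recovery step just described, under stated assumptions: the unlinked list is modelled as an in-memory singly linked list, and `struct ul_inode`, `struct ul_list`, `ul_add()` and `recover_unlinked()` are hypothetical names, not XFS or ext4 code. The `defer` argument stands in for the suggested flag that postpones deletion until NFSD runs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical in-memory model of an on-disk unlinked inode list. */
struct ul_inode {
	uint64_t ino;           /* inode number */
	uint32_t gen;           /* inode generation */
	struct ul_inode *next;
};

struct ul_list {
	struct ul_inode *head;
	size_t count;
};

/* Put an inode on the list, as unlinking an open file would. */
static int ul_add(struct ul_list *l, uint64_t ino, uint32_t gen)
{
	struct ul_inode *n = malloc(sizeof(*n));

	if (!n)
		return -1;
	n->ino = ino;
	n->gen = gen;
	n->next = l->head;
	l->head = n;
	l->count++;
	return 0;
}

/*
 * Recovery after an unclean shutdown: normally every inode left on the
 * list gets its deferred delete.  With 'defer' set (the hypothetical
 * flag), the list is left intact for NFSD.  Returns the number freed.
 */
static size_t recover_unlinked(struct ul_list *l, bool defer)
{
	size_t freed = 0;

	if (defer)
		return 0;
	while (l->head) {
		struct ul_inode *victim = l->head;

		l->head = victim->next;
		free(victim);   /* stands in for truncating and freeing the inode */
		freed++;
	}
	l->count = 0;
	return freed;
}
```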

Veering even further into implementation details (and changing cc: to
linux-nfs instead of nfsv4@ietf, hope that's OK):

I assume this would need userspace updates too, so fsck would know not
to free the unlinked files, and so administrators could see what was
going on and maybe free them manually if need be.

It may seem like overkill, but we have (mostly complete) support for
running multiple nfsd's in containers, which can be started and stopped
independently.  And we may want to allow a single filesystem to be
exported by more than one such nfsd. I think we can still manage that
with a single unlinked inode list, though--we'd just need logic in nfsd
to delay freeing as long as any nfsd is restarting.
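One way the "delay freeing as long as any nfsd is restarting" rule might look, as a hypothetical sketch: a per-filesystem count of exporting nfsd instances that are currently restarting, consulted before the shared unlinked list is reclaimed. `struct export_state` and the three helpers are invented for illustration, and real code would need locking or atomics, which are omitted here.

```c
#include <stdbool.h>

/*
 * Hypothetical per-filesystem state shared by every nfsd instance
 * (e.g. one per container) that exports this filesystem.
 */
struct export_state {
	int restarting;   /* nfsd instances currently restarting */
};

static void nfsd_begin_restart(struct export_state *s)
{
	s->restarting++;
}

static void nfsd_end_restart(struct export_state *s)
{
	if (s->restarting > 0)
		s->restarting--;
}

/*
 * The single shared unlinked list may be reclaimed only when no
 * exporting nfsd is still restarting (and so might still need it).
 */
static bool may_free_unlinked(const struct export_state *s)
{
	return s->restarting == 0;
}
```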

--b.

> > We could do the same silly rename tricks on the server side.  Something
> > like: create a directory with an unlikely name in the root of the
> > export, rename files there on REMOVE.  Possible problems:
> 
> Personally I'd love to see sillyrename die.  It's a major pain for
> getting sensible semantics out of NFS.
> 
> > 	- you'll never be able to completely hide that directory.  But
> > 	  maybe we could get some sort of filesystem support for a
> > 	  hidden directory.
> 
> 
> The unlinked inode list is almost a directory, except that it doesn't
> have names for the entries, you can only find inodes on it by the inode
> number and generation (aka NFS file handle).
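To illustrate the "almost a directory" point, here is a hypothetical lookup that finds an entry on the unlinked list purely by inode number and generation, the same pair an NFS file handle carries. The types and `ul_lookup()` are assumptions for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical unlinked-list entry: no name, only identity. */
struct ul_entry {
	uint64_t ino;
	uint32_t gen;
	struct ul_entry *next;
};

/*
 * Look up an entry by inode number plus generation, as a file handle
 * would identify it.  If the inode number matches but the generation
 * does not, the handle refers to a different incarnation of the inode,
 * so it is treated as stale and NULL is returned.
 */
static struct ul_entry *ul_lookup(struct ul_entry *head, uint64_t ino,
				  uint32_t gen)
{
	for (struct ul_entry *e = head; e; e = e->next)
		if (e->ino == ino && e->gen == gen)
			return e;
	return NULL;
}
```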


* Re: [nfsv4] RFC 7530: Filehandle of opened file after the REMOVE
  2016-12-29 20:54                   ` [nfsv4] RFC 7530: Filehandle of opened file after the REMOVE Bruce James Fields
@ 2016-12-30  8:35                     ` Christoph Hellwig
  2017-01-01 13:58                       ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2016-12-30  8:35 UTC
  To: Bruce James Fields; +Cc: Trond Myklebust, linux-nfs

On Thu, Dec 29, 2016 at 03:54:26PM -0500, Bruce James Fields wrote:
> I assume this would need userspace updates too, so fsck would know not
> to free the unlinked files, and so administrators could see what was
> going on and maybe free them manually if need be.

At least for XFS that's not a worry - if the file system is so toast
that you have to run xfs_repair, getting the NFS exports back is the
least of your worries.

Note that open but unlinked files already happen all the time during
normal operation; the only interesting aspect NFS would add is that
we might have to reclaim them later.
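The ordinary open-but-unlinked case needs nothing NFS-specific and can be seen from userspace with standard POSIX calls. This sketch (the /tmp path template is an arbitrary choice) removes a file's only name while it is open, checks that `fstat()` reports a link count of 0, and shows the descriptor can still write and read the data until `close()` drops the last reference.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 0 if all the open-but-unlinked checks pass, -1 otherwise. */
static int demo_unlinked_open(void)
{
	char path[] = "/tmp/unlinked-XXXXXX";
	const char msg[] = "still here";
	char buf[sizeof(msg)];
	struct stat st;
	int ret = -1;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	if (unlink(path) != 0)          /* remove the only name */
		goto out;
	if (access(path, F_OK) == 0)    /* the name must be gone */
		goto out;
	if (fstat(fd, &st) != 0 || st.st_nlink != 0)
		goto out;
	/* The inode is still live: I/O through the descriptor works. */
	if (write(fd, msg, sizeof(msg)) != (ssize_t)sizeof(msg))
		goto out;
	if (lseek(fd, 0, SEEK_SET) != 0)
		goto out;
	if (read(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
		goto out;
	if (memcmp(buf, msg, sizeof(msg)) == 0)
		ret = 0;
out:
	close(fd);                      /* last reference: inode can be freed */
	return ret;
}
```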

> It may seem like overkill, but we have (mostly complete) support for
> running multiple nfsd's in containers, which can be started and stopped
> independently.  And we may want to allow a single filesystem to be
> exported by more than one such nfsd. I think we can still manage that
> with a single unlinked inode list, though--we'd just need logic in nfsd
> to delay freeing as long as any nfsd is restarting.

I'd really want to avoid that logic in the fs.  What I can trivially
offer you from the fs is to:

 1) offer a mount option not to reclaim the unlinked inode list
 2) offer an interface (e.g. in super_operations or export_ops)
    to reclaim the unlinked inodes for a fs mounted with that option

and NFSD would have to call the second one.  But I'd like to keep the
decision of exactly when it is called out of the fs.
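A hypothetical sketch of that split: the fs honours a mount option that skips reclaim at mount time and exposes a reclaim hook, while NFSD alone decides when to call it. `struct unlinked_reclaim_ops`, `reclaim_unlinked`, `noreclaim_unlinked`, `fs_mount()` and `nfsd_grace_done()` are invented names, not taken from the kernel's `super_operations` or `export_operations`.

```c
#include <stdbool.h>
#include <stddef.h>

struct fs_sb;

/* Hypothetical hook in the spirit of a super_operations/export_ops method. */
struct unlinked_reclaim_ops {
	size_t (*reclaim_unlinked)(struct fs_sb *sb);
};

struct fs_sb {
	bool noreclaim_unlinked;   /* hypothetical mount option (item 1) */
	size_t pending_unlinked;   /* inodes left over from before the crash */
	const struct unlinked_reclaim_ops *ops;
};

static size_t fs_reclaim_unlinked(struct fs_sb *sb)
{
	size_t n = sb->pending_unlinked;

	sb->pending_unlinked = 0;  /* stands in for freeing each inode */
	return n;
}

static const struct unlinked_reclaim_ops fs_ops = {
	.reclaim_unlinked = fs_reclaim_unlinked,
};

/* Mount: reclaim right away unless the option says not to. */
static void fs_mount(struct fs_sb *sb, bool noreclaim, size_t leftover)
{
	sb->noreclaim_unlinked = noreclaim;
	sb->pending_unlinked = leftover;
	sb->ops = &fs_ops;
	if (!noreclaim)
		sb->ops->reclaim_unlinked(sb);
}

/* NFSD side (item 2): NFSD picks the moment, e.g. when grace ends. */
static size_t nfsd_grace_done(struct fs_sb *sb)
{
	if (!sb->noreclaim_unlinked)
		return 0;
	return sb->ops->reclaim_unlinked(sb);
}
```

The fs only knows "reclaim now" versus "wait to be told"; all timing policy stays in NFSD, which is the separation asked for above.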

Note that the filesystem is still in a perfectly fine (although a little
odd) state before we reclaim the unlinked inodes - from the on-disk
POV it is no different from a currently active but unlinked file;
the only difference is that we won't ever get a final unlink for it,
and only a manual reclaim will free it.

The only somewhat hard thing about implementing this on the fs side
is to come up with a scheme to distinguish between old unlinked inodes
left over from before a crash, and new ones created after it that are
active.  But that's something a simple inode flag should be able to
solve.
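A sketch of the inode-flag idea, assuming a hypothetical `UL_PRE_CRASH` flag: at mount time every inode already on the unlinked list is tagged as a leftover, inodes unlinked after the mount stay untagged, and the later reclaim frees only the tagged ones. All names here are illustrative.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define UL_PRE_CRASH 0x1u   /* hypothetical "left over from before" flag */

struct ul_ino {
	uint64_t ino;
	unsigned int flags;
	struct ul_ino *next;
};

static struct ul_ino *ul_push(struct ul_ino **headp, uint64_t ino)
{
	struct ul_ino *i = calloc(1, sizeof(*i));

	if (i) {
		i->ino = ino;
		i->next = *headp;
		*headp = i;
	}
	return i;
}

/* At mount/recovery: tag everything already on the list. */
static void ul_mark_leftovers(struct ul_ino *head)
{
	for (struct ul_ino *i = head; i; i = i->next)
		i->flags |= UL_PRE_CRASH;
}

/*
 * Later reclaim: free only tagged inodes; active ones unlinked after
 * the mount stay on the list.  Returns the number freed.
 */
static size_t ul_reclaim_leftovers(struct ul_ino **headp)
{
	size_t freed = 0;

	while (*headp) {
		struct ul_ino *i = *headp;

		if (i->flags & UL_PRE_CRASH) {
			*headp = i->next;
			free(i);
			freed++;
		} else {
			headp = &i->next;
		}
	}
	return freed;
}
```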


* Re: [nfsv4] RFC 7530: Filehandle of opened file after the REMOVE
  2016-12-30  8:35                     ` Christoph Hellwig
@ 2017-01-01 13:58                       ` Christoph Hellwig
  2017-01-01 22:10                         ` Bruce James Fields
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2017-01-01 13:58 UTC
  To: Bruce James Fields; +Cc: Trond Myklebust, linux-nfs

Btw, thinking about this a bit more, the simplest thing possible would be a
mount option to delay reclaiming unlinked inodes for N seconds, set to the
NFS grace period plus a reasonable slack for starting NFSD after mounting
the fs.  This would be fairly easy to implement in the fs, does not
require tight coupling between the fs and NFSD, and will eventually reclaim
the unlinked inodes even if the file system happens not to be exported at
all.
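The arithmetic behind the proposed delay is simple: N = grace period + slack, and reclaim happens once that much time has passed since mount, whether or not NFSD ever started. This sketch assumes a 90-second grace period (a common nfsd default) and an arbitrary 30-second slack; `reclaim_deadline()` and `should_reclaim_unlinked()` are hypothetical helpers.

```c
#include <stdbool.h>
#include <time.h>

/* Deadline for the hypothetical "delay unlinked reclaim" mount option. */
static time_t reclaim_deadline(time_t mount_time, unsigned int grace_secs,
			       unsigned int slack_secs)
{
	return mount_time + (time_t)grace_secs + (time_t)slack_secs;
}

/* The fs reclaims on its own once the deadline passes, exported or not. */
static bool should_reclaim_unlinked(time_t now, time_t deadline)
{
	return now >= deadline;
}
```

For a mount at t = 1000 s, the deadline would be 1000 + 90 + 30 = 1120 s.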

I could implement this quickly if you want to play around with the NFSD
side.


* Re: [nfsv4] RFC 7530: Filehandle of opened file after the REMOVE
  2017-01-01 13:58                       ` Christoph Hellwig
@ 2017-01-01 22:10                         ` Bruce James Fields
  2017-01-02  8:40                           ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: Bruce James Fields @ 2017-01-01 22:10 UTC
  To: Christoph Hellwig; +Cc: Trond Myklebust, linux-nfs

On Sun, Jan 01, 2017 at 02:58:17PM +0100, Christoph Hellwig wrote:
> Btw, thinking about this a bit more, the simplest thing possible would be a
> mount option to delay reclaiming unlinked inodes for N seconds, set to the
> NFS grace period plus a reasonable slack for starting NFSD after mounting
> the fs.  This would be fairly easy to implement in the fs, does not
> require tight coupling between the fs and NFSD, and will eventually reclaim
> the unlinked inodes even if the file system happens not to be exported at
> all.
> 
> I could implement this quickly if you want to play around with the NFSD
> side.

Sure, that'd be interesting.

How do we handle clean shutdown, though?  At a minimum a server admin
needs to be able to e.g. take down the server for an OS upgrade.

--b.


* Re: [nfsv4] RFC 7530: Filehandle of opened file after the REMOVE
  2017-01-01 22:10                         ` Bruce James Fields
@ 2017-01-02  8:40                           ` Christoph Hellwig
  2017-01-02 15:27                             ` Bruce James Fields
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2017-01-02  8:40 UTC
  To: Bruce James Fields; +Cc: Trond Myklebust, linux-nfs

On Sun, Jan 01, 2017 at 05:10:25PM -0500, Bruce James Fields wrote:
> How do we handle clean shutdown, though?  At a minimum a server admin
> needs to be able to e.g. take down the server for an OS upgrade.

That's going to be a nightmare to implement unfortunately.


* Re: [nfsv4] RFC 7530: Filehandle of opened file after the REMOVE
  2017-01-02  8:40                           ` Christoph Hellwig
@ 2017-01-02 15:27                             ` Bruce James Fields
  2017-01-04 17:42                               ` Bruce James Fields
  0 siblings, 1 reply; 9+ messages in thread
From: Bruce James Fields @ 2017-01-02 15:27 UTC
  To: Christoph Hellwig; +Cc: Trond Myklebust, linux-nfs

On Mon, Jan 02, 2017 at 09:40:05AM +0100, Christoph Hellwig wrote:
> On Sun, Jan 01, 2017 at 05:10:25PM -0500, Bruce James Fields wrote:
> > How do we handle clean shutdown, though?  At a minimum a server admin
> > needs to be able to e.g. take down the server for an OS upgrade.
> 
> That's going to be a nightmare to implement unfortunately.

Ugh.  I think it's a requirement; without it:

	- if we set the flag that allows the client to turn off
	  sillyrename, then users will see a regression (ESTALE after
	  clean shutdowns in situations we previously guaranteed safe).

	- if we don't set that flag, we don't get to turn off client
	  sillyrename.  The only improvement is that we avoid ESTALE
	  after crashes on files unlinked by a different client than
	  the one that held them open.  I don't think that's very
	  interesting on its own.
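The two cases above can be written down as a small decision function. The flag is not named in the thread; it is presumably an OPEN result flag along the lines of NFSv4.1's OPEN4_RESULT_PRESERVE_UNLINKED. The enum and `classify_unlink()` are hypothetical and only encode the reasoning in the bullets.

```c
#include <stdbool.h>

enum unlink_outcome {
	CLIENT_KEEPS_SILLYRENAME,   /* flag not set: little is gained */
	ESTALE_REGRESSION,          /* flag set, but files lost on clean shutdown */
	SILLYRENAME_UNNEEDED,       /* flag set and honoured across reboots */
};

/*
 * preserve_flag: server tells clients it keeps removed-but-open files.
 * survives_clean_shutdown: server can actually keep them across an
 * orderly reboot (e.g. an OS upgrade).
 */
static enum unlink_outcome classify_unlink(bool preserve_flag,
					   bool survives_clean_shutdown)
{
	if (!preserve_flag)
		return CLIENT_KEEPS_SILLYRENAME;
	if (!survives_clean_shutdown)
		return ESTALE_REGRESSION;
	return SILLYRENAME_UNNEEDED;
}
```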

--b.


* Re: [nfsv4] RFC 7530: Filehandle of opened file after the REMOVE
  2017-01-02 15:27                             ` Bruce James Fields
@ 2017-01-04 17:42                               ` Bruce James Fields
  2017-01-05  5:51                                 ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: Bruce James Fields @ 2017-01-04 17:42 UTC
  To: Christoph Hellwig; +Cc: Trond Myklebust, linux-nfs

On Mon, Jan 02, 2017 at 10:27:41AM -0500, Bruce James Fields wrote:
> On Mon, Jan 02, 2017 at 09:40:05AM +0100, Christoph Hellwig wrote:
> > On Sun, Jan 01, 2017 at 05:10:25PM -0500, Bruce James Fields wrote:
> > > How do we handle clean shutdown, though?  At a minimum a server admin
> > > needs to be able to e.g. take down the server for an OS upgrade.
> > 
> > That's going to be a nightmare to implement unfortunately.
> 
> Ugh.  I think it's a requirement; without it:
> 
> 	- if we set the flag that allows the client to turn off
> 	  sillyrename, then users will see a regression (ESTALE after
> 	  clean shutdowns in situations we previously guaranteed safe).
> 
> 	- if we don't set that flag, we don't get to turn off client
> 	  sillyrename.  The only improvement is that we avoid ESTALE
> 	  after crashes on files unlinked by a different client than
> 	  the one that held them open.  I don't think that's very
> 	  interesting on its own.

Dumb question: don't local filesystems have the ability to do some sort
of emergency conversion to read-only on detecting corruption?  Does that
prevent any open-file cleanup?  If not that, is there some other
mechanism nfsd could use to crash the filesystem on shutdown if
appropriate (so if it's holding opens on a filesystem and if the
filesystem was mounted with the new option)?

Possibly better would be if we could keep a separate list of
unlinked-but-still-held-by-nfsd files that was managed differently than
the existing list.
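A rough sketch of the separate-list idea, with every name hypothetical: an inode still held open by nfsd is moved from the ordinary unlinked list to a second list, and mount-time recovery frees only the ordinary list, leaving the nfsd-held entries for nfsd to release. Whether a real filesystem could keep such a second on-disk list cheaply is exactly the open question here.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct held_ino {
	uint64_t ino;
	struct held_ino *next;
};

/* Hypothetical: two lists instead of one. */
struct fs_lists {
	struct held_ino *unlinked;   /* ordinary open-but-unlinked inodes */
	struct held_ino *nfsd_held;  /* unlinked but still opened via nfsd */
};

static int push_unlinked(struct fs_lists *fs, uint64_t ino)
{
	struct held_ino *i = malloc(sizeof(*i));

	if (!i)
		return -1;
	i->ino = ino;
	i->next = fs->unlinked;
	fs->unlinked = i;
	return 0;
}

/* Move an inode to the nfsd-held list; -1 if it is not on the list. */
static int nfsd_hold(struct fs_lists *fs, uint64_t ino)
{
	for (struct held_ino **pp = &fs->unlinked; *pp; pp = &(*pp)->next) {
		if ((*pp)->ino == ino) {
			struct held_ino *i = *pp;

			*pp = i->next;
			i->next = fs->nfsd_held;
			fs->nfsd_held = i;
			return 0;
		}
	}
	return -1;
}

/* Mount-time recovery frees only the ordinary list. */
static size_t recover_ordinary(struct fs_lists *fs)
{
	size_t n = 0;

	while (fs->unlinked) {
		struct held_ino *i = fs->unlinked;

		fs->unlinked = i->next;
		free(i);
		n++;
	}
	return n;
}
```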

But, I don't have the local filesystem knowledge to know where the
nightmares are here.

--b.


* Re: [nfsv4] RFC 7530: Filehandle of opened file after the REMOVE
  2017-01-04 17:42                               ` Bruce James Fields
@ 2017-01-05  5:51                                 ` Christoph Hellwig
  2017-01-06 21:13                                   ` Bruce James Fields
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2017-01-05  5:51 UTC
  To: Bruce James Fields; +Cc: Christoph Hellwig, Trond Myklebust, linux-nfs

On Wed, Jan 04, 2017 at 12:42:45PM -0500, Bruce James Fields wrote:
> Dumb question: don't local filesystems have the ability to do some sort
> of emergency conversion to read-only on detecting corruption?

Yes.

> Does that
> prevent any open-file cleanup?

Yes, at least before the reboot.

> If not that, is there some other
> mechanism nfsd could use to crash the filesystem on shutdown if
> appropriate (so if it's holding opens on a filesystem and if the
> filesystem was mounted with the new option)?
> 
> Possibly better would be if we could keep a separate list of
> unlinked-but-still-held-by-nfsd files that was managed differently than
> the existing list.
> 
> But, I don't have the local filesystem knowledge to know where the
> nightmares are here.

Maybe I shouldn't have called it a nightmare, but it's significantly
more effort.  We'll need a way for NFSD to mark a file as not being
allowed to be cleaned up before the final iput, mostly for the reboot
case.
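One reading of that marking, as a hypothetical sketch: NFSD sets a pin flag on an inode, and the final-put path (modelled here by `sketch_iput()`, not the kernel's `iput()`) leaves a pinned, unlinked inode on the unlinked list instead of freeing it, so it survives a clean unmount and reboot. The flag, struct and enum names are invented for illustration.

```c
#include <stdbool.h>

#define NFSD_PIN_UNLINKED 0x1u   /* hypothetical "NFSD still needs it" flag */

enum iput_result { IPUT_NO_ACTION, IPUT_FREED, IPUT_KEPT_ON_LIST };

struct pin_inode {
	unsigned int nlink;      /* 0 once removed from every directory */
	unsigned int refcount;   /* in-memory references */
	unsigned int flags;
	bool on_unlinked_list;
};

static void nfsd_pin(struct pin_inode *ip)
{
	ip->flags |= NFSD_PIN_UNLINKED;
}

/* Drop one reference and decide what the final put does. */
static enum iput_result sketch_iput(struct pin_inode *ip)
{
	if (ip->refcount > 0)
		ip->refcount--;
	if (ip->refcount > 0 || ip->nlink > 0)
		return IPUT_NO_ACTION;      /* still open, or still has a name */
	if (ip->flags & NFSD_PIN_UNLINKED)
		return IPUT_KEPT_ON_LIST;   /* left for a later reclaim */
	ip->on_unlinked_list = false;   /* stands in for freeing the inode */
	return IPUT_FREED;
}
```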

I'll try to come up with a prototype later this month, but it might not
be pretty.


* Re: [nfsv4] RFC 7530: Filehandle of opened file after the REMOVE
  2017-01-05  5:51                                 ` Christoph Hellwig
@ 2017-01-06 21:13                                   ` Bruce James Fields
  0 siblings, 0 replies; 9+ messages in thread
From: Bruce James Fields @ 2017-01-06 21:13 UTC
  To: Christoph Hellwig; +Cc: Trond Myklebust, linux-nfs

On Thu, Jan 05, 2017 at 06:51:30AM +0100, Christoph Hellwig wrote:
> On Wed, Jan 04, 2017 at 12:42:45PM -0500, Bruce James Fields wrote:
> > Dumb question: don't local filesystems have the ability to do some sort
> > of emergency conversion to read-only on detecting corruption?
> 
> Yes.
> 
> > Does that
> > prevent any open-file cleanup?
> 
> Yes, at least before the reboot.
> 
> > If not that, is there some other
> > mechanism nfsd could use to crash the filesystem on shutdown if
> > appropriate (so if it's holding opens on a filesystem and if the
> > filesystem was mounted with the new option)?
> > 
> > Possibly better would be if we could keep a separate list of
> > unlinked-but-still-held-by-nfsd files that was managed differently than
> > the existing list.
> > 
> > But, I don't have the local filesystem knowledge to know where the
> > nightmares are here.
> 
> Maybe I shouldn't have called it a nightmare, but it's significantly
> more effort.  We'll need a way for NFSD to mark a file as not being
> allowed to be cleaned up before the final iput, mostly for the reboot
> case.
> 
> I'll try to come up with a prototype later this month, but it might not
> be pretty.

OK, thanks, I'll look forward to seeing how it works, pretty or not.

--b.


end of thread, other threads:[~2017-01-06 21:13 UTC | newest]

Thread overview: 9+ messages
     [not found] <CAABAsM5L0xdKodxk1dRSugLyROzn2JzgDkq6kdHE0LuGcfh++A@mail.gmail.com>
     [not found] ` <20161213181734.Horde.EqgB09El8rupnkesIQaBwJ3@mail.telka.sk>
     [not found]   ` <CADaq8jcq2C0o8EWXoGjxDn58sV_J+-SP-=rj934Se-DV69b-pw@mail.gmail.com>
     [not found]     ` <20161214112112.Horde.aPh8AjT6iWRl37CULwihyV7@mail.telka.sk>
     [not found]       ` <CAABAsM7v6y0bsb0jKzfvobkUjniTLhM3uv8FYjo07HcLD2004w@mail.gmail.com>
     [not found]         ` <20161227144414.GA32002@fieldses.org>
     [not found]           ` <CADaq8jck14SKL6Ua9QxbqPyX1=1aaA7+76wv-__EWFvh7ZcEJA@mail.gmail.com>
     [not found]             ` <C496AE44-0F27-4B66-A1F6-A76AEAFD7A90@gmail.com>
     [not found]               ` <20161229024703.GA21325@fieldses.org>
     [not found]                 ` <20161229074830.GA3002@lst.de>
2016-12-29 20:54                   ` [nfsv4] RFC 7530: Filehandle of opened file after the REMOVE Bruce James Fields
2016-12-30  8:35                     ` Christoph Hellwig
2017-01-01 13:58                       ` Christoph Hellwig
2017-01-01 22:10                         ` Bruce James Fields
2017-01-02  8:40                           ` Christoph Hellwig
2017-01-02 15:27                             ` Bruce James Fields
2017-01-04 17:42                               ` Bruce James Fields
2017-01-05  5:51                                 ` Christoph Hellwig
2017-01-06 21:13                                   ` Bruce James Fields
