linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Frank Filz" <ffilzlnx@mindspring.com>
To: "'Trond Myklebust'" <trondmy@hammerspace.com>, <bfields@fieldses.org>
Cc: <linux-cachefs@redhat.com>, <linux-nfs@vger.kernel.org>,
	<daire@dneg.com>
Subject: RE: Adventures in NFS re-exporting
Date: Thu, 3 Dec 2020 15:34:57 -0800	[thread overview]
Message-ID: <01ae01d6c9cc$e9a40440$bcec0cc0$@mindspring.com> (raw)
In-Reply-To: <72362b839eb2ecc992f41a0e7783545b13e8ecbc.camel@hammerspace.com>

> > > -----Original Message-----
> > > From: Trond Myklebust [mailto:trondmy@hammerspace.com]
> > > Sent: Thursday, December 3, 2020 2:14 PM
> > > To: bfields@fieldses.org
> > > Cc: linux-cachefs@redhat.com; ffilzlnx@mindspring.com; linux-
> > > nfs@vger.kernel.org; daire@dneg.com
> > > Subject: Re: Adventures in NFS re-exporting
> > >
> > > On Thu, 2020-12-03 at 17:04 -0500, bfields@fieldses.org wrote:
> > > > On Thu, Dec 03, 2020 at 09:57:41PM +0000, Trond Myklebust wrote:
> > > > > On Thu, 2020-12-03 at 13:45 -0800, Frank Filz wrote:
> > > > > > > On Thu, 2020-12-03 at 16:13 -0500, bfields@fieldses.org
> > > > > > > wrote:
> > > > > > > > On Thu, Dec 03, 2020 at 08:27:39PM +0000, Trond Myklebust
> > > > > > > > wrote:
> > > > > > > > > On Thu, 2020-12-03 at 13:51 -0500, bfields wrote:
> > > > > > > > > > I've been scratching my head over how to handle reboot
> > > > > > > > > > of a
> > > > > > > > > > re-
> > > > > > > > > > exporting server.  I think one way to fix it might be
> > > > > > > > > > just to allow the re- export server to pass along
> > > > > > > > > > reclaims to the original server as it receives them
> > > > > > > > > > from its own clients.  It might require some protocol
> > > > > > > > > > tweaks, I'm not sure.  I'll try to get my thoughts in
> > > > > > > > > > order and propose something.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > It's more complicated than that. If the re-exporting
> > > > > > > > > server reboots, but the original server does not, then
> > > > > > > > > unless that
> > > > > > > > > re- exporting server persisted its lease and a full set
> > > > > > > > > of stateids somewhere, it will not be able to atomically
> > > > > > > > > reclaim delegation and lock state on the server on
> > > > > > > > > behalf of its clients.
> > > > > > > >
> > > > > > > > By sending reclaims to the original server, I mean
> > > > > > > > literally sending new open and lock requests with the
> > > > > > > > RECLAIM bit set, which would get brand new stateids.
> > > > > > > >
> > > > > > > > So, the original server would invalidate the existing
> > > > > > > > client's previous clientid and stateids--just as it
> > > > > > > > normally would on reboot--but it would optionally remember
> > > > > > > > the underlying locks held by the client and allow
> > > > > > > > compatible lock reclaims.
> > > > > > > >
> > > > > > > > Rough attempt:
> > > > > > > >
> > > > > > > >
> > > > > > > > https://wiki.linux-nfs.org/wiki/index.php/Reboot_recovery_
> > > > > > > > for_
> > > > > > > > re-expor
> > > > > > > > t_servers
> > > > > > > >
> > > > > > > > Think it would fly?
> > > > > > >
> > > > > > > So this would be a variant of courtesy locks that can be
> > > > > > > reclaimed by the client using the reboot reclaim variant of
> > > > > > > OPEN/LOCK outside the grace period? The purpose being to
> > > > > > > allow reclaim without forcing the client to persist the
> > > > > > > original stateid?
> > > > > > >
> > > > > > > Hmm... That's doable, but how about the following
> > > > > > > alternative:
> > > > > > > Add
> > > > > > > a function
> > > > > > > that allows the client to request the full list of stateids
> > > > > > > that the server holds on its behalf?
> > > > > > >
> > > > > > > I've been wanting such a function for quite a while anyway
> > > > > > > in order to allow the client to detect state leaks (either
> > > > > > > due to soft timeouts, or due to reordered close/open
> > > > > > > operations).
> > > > > >
> > > > > > Oh, that sounds interesting. So basically the re-export server
> > > > > > would re-populate it's state from the original server rather
> > > > > > than relying on it's clients doing reclaims? Hmm, but how does
> > > > > > the re-export server rebuild its stateids? I guess it could
> > > > > > make the clients repopulate them with the same "give me a dump
> > > > > > of all my state", using the state details to match up with the
> > > > > > old state and replacing stateids. Or did you have something
> > > > > > different in mind?
> > > > > >
> > > > >
> > > > > I was thinking that the re-export server could just use that
> > > > > list of stateids to figure out which locks can be reclaimed
> > > > > atomically, and which ones have been irredeemably lost. The
> > > > > assumption is that if you have a lock stateid or a delegation,
> > > > > then that means the clients can reclaim all the locks that were
> > > > > represented by that stateid.
> > > >
> > > > I'm confused about how the re-export server uses that list.  Are
> > > > you assuming it persisted its own list across its own
> > > > crash/reboot?
> > > > I
> > > > guess that's what I was trying to avoid having to do.
> > > >
> > > No. The server just uses the stateids as part of a check for 'do I
> > > hold state for this file on this server?'. If the answer is 'yes'
> > > and the lock owners are sane, then we should be able to assume the
> > > full set of locks that lock owner held on that file are still valid.
> > >
> > > BTW: if the lock owner is also returned by the server, then since
> > > the lock owner is an opaque value, it could, for instance, be used
> > > by the client to cache info on the server about which uid/gid owns
> > > these locks.
> >
> > Let me see if I'm understanding your idea right...
> >
> > Re-export server reboots within the extended lease period it's been
> > given by the original server. I'm assuming it uses the same clientid?
> 
> Yes. It would have to use the same clientid.
> 
> > But would probably open new sessions. It requests the list of
> > stateids. Hmm, how to make the owner information useful, nfs-ganesha
> > doesn't pass on the actual client's owner but rather just passes the
> > address of its record for that client owner. Maybe it will have to do
> > something a bit different for this degree of re-export support...
> >
> > Now the re-export server knows which original client lock owners are
> > allowed to reclaim state. So it just acquires locks using the original
> > stateid as the client reclaims (what happens if the client doesn't
> > reclaim a lock? I suppose the re-export server could unlock all
> > regions not explicitly locked once reclaim is complete). Since the
> > re-export server is acquiring new locks using the original stateid it
> > will just overlay the original lock with the new lock and write locks
> > don't conflict since they are being acquired by the same lock owner.
> > Actually the original server could even balk at a "reclaim" in this
> > way that wasn't originally held... And the original server could
> > "refresh" the locks, and discard any that aren't refreshed at the end
> > of reclaim. That part assumes the original server is apprised that
> > what is actually happening is a reclaim.
> >
> > The re-export server can destroy any stateids that it doesn't receive
> > reclaims for.
> 
> Right. That's in essence what I'm suggesting. There are corner cases to be
> considered: e.g. "what happens if the re-export server crashes after unlocking
> on the server, but before passing the LOCKU reply on the the client", however I
> think it should be possible to figure out strategies for those cases.

That's no different than a regular NFS server crashes before responding to an unlock. The client likely doesn't reclaim locks it was attempting to drop at server crash time. So then one place we would definitely have abandoned locks on the original server IF the unlock never made it to the original server. But we're already talking strategies to clean up abandoned locks.

I won't be surprised if we find a more tricky corner case, but my gut feel is every corner case will have a relatively simple solution.

Another consideration is how to handle the size of the state list... Ideally we would have some way to break it up that is less clunky than readdir (at least the state list can be assumed to be static during the course of the fetching of it, even for a regular client just interested in it, it could pause state activity until the list is retrieved).

Frank

Frank


  reply	other threads:[~2020-12-03 23:35 UTC|newest]

Thread overview: 129+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-07 17:31 Adventures in NFS re-exporting Daire Byrne
2020-09-08  9:40 ` Mkrtchyan, Tigran
2020-09-08 11:06   ` Daire Byrne
2020-09-15 17:21 ` J. Bruce Fields
2020-09-15 19:59   ` Trond Myklebust
2020-09-16 16:01     ` Daire Byrne
2020-10-19 16:19       ` Daire Byrne
2020-10-19 17:53         ` [PATCH 0/2] Add NFSv3 emulation of the lookupp operation trondmy
2020-10-19 17:53           ` [PATCH 1/2] NFSv3: Refactor nfs3_proc_lookup() to split out the dentry trondmy
2020-10-19 17:53             ` [PATCH 2/2] NFSv3: Add emulation of the lookupp() operation trondmy
2020-10-19 20:05         ` [PATCH v2 0/2] Add NFSv3 emulation of the lookupp operation trondmy
2020-10-19 20:05           ` [PATCH v2 1/2] NFSv3: Refactor nfs3_proc_lookup() to split out the dentry trondmy
2020-10-19 20:05             ` [PATCH v2 2/2] NFSv3: Add emulation of the lookupp() operation trondmy
2020-10-20 18:37         ` [PATCH v3 0/3] Add NFSv3 emulation of the lookupp operation trondmy
2020-10-20 18:37           ` [PATCH v3 1/3] NFSv3: Refactor nfs3_proc_lookup() to split out the dentry trondmy
2020-10-20 18:37             ` [PATCH v3 2/3] NFSv3: Add emulation of the lookupp() operation trondmy
2020-10-20 18:37               ` [PATCH v3 3/3] NFSv4: Observe the NFS_MOUNT_SOFTREVAL flag in _nfs4_proc_lookupp trondmy
2020-10-21  9:33         ` Adventures in NFS re-exporting Daire Byrne
2020-11-09 16:02           ` bfields
2020-11-12 13:01             ` Daire Byrne
2020-11-12 13:57               ` bfields
2020-11-12 18:33                 ` Daire Byrne
2020-11-12 20:55                   ` bfields
2020-11-12 23:05                     ` Daire Byrne
2020-11-13 14:50                       ` bfields
2020-11-13 22:26                         ` bfields
2020-11-14 12:57                           ` Daire Byrne
2020-11-16 15:18                             ` bfields
2020-11-16 15:53                             ` bfields
2020-11-16 19:21                               ` Daire Byrne
2020-11-16 15:29                           ` Jeff Layton
2020-11-16 15:56                             ` bfields
2020-11-16 16:03                               ` Jeff Layton
2020-11-16 16:14                                 ` bfields
2020-11-16 16:38                                   ` Jeff Layton
2020-11-16 19:03                                     ` bfields
2020-11-16 20:03                                       ` Jeff Layton
2020-11-17  3:16                                         ` bfields
2020-11-17  3:18                                           ` [PATCH 1/4] nfsd: move fill_{pre,post}_wcc to nfsfh.c J. Bruce Fields
2020-11-17  3:18                                             ` [PATCH 2/4] nfsd: pre/post attr is using wrong change attribute J. Bruce Fields
2020-11-17 12:34                                               ` Jeff Layton
2020-11-17 15:26                                                 ` J. Bruce Fields
2020-11-17 15:34                                                   ` Jeff Layton
2020-11-20 22:38                                                     ` J. Bruce Fields
2020-11-20 22:39                                                       ` [PATCH 1/8] nfsd: only call inode_query_iversion in the I_VERSION case J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 2/8] nfsd: simplify nfsd4_change_info J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 3/8] nfsd: minor nfsd4_change_attribute cleanup J. Bruce Fields
2020-11-21  0:34                                                           ` Jeff Layton
2020-11-20 22:39                                                         ` [PATCH 4/8] nfsd4: don't query change attribute in v2/v3 case J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 5/8] nfs: use change attribute for NFS re-exports J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 6/8] nfsd: move change attribute generation to filesystem J. Bruce Fields
2020-11-21  0:58                                                           ` Jeff Layton
2020-11-21  1:01                                                             ` J. Bruce Fields
2020-11-21 13:00                                                           ` Jeff Layton
2020-11-20 22:39                                                         ` [PATCH 7/8] nfsd: skip some unnecessary stats in the v4 case J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 8/8] Revert "nfsd4: support change_attr_type attribute" J. Bruce Fields
2020-11-20 22:44                                                       ` [PATCH 2/4] nfsd: pre/post attr is using wrong change attribute J. Bruce Fields
2020-11-21  1:03                                                         ` Jeff Layton
2020-11-21 21:44                                                           ` Daire Byrne
2020-11-22  0:02                                                             ` bfields
2020-11-22  1:55                                                               ` Daire Byrne
2020-11-22  3:03                                                                 ` bfields
2020-11-23 20:07                                                                   ` Daire Byrne
2020-11-17 15:25                                               ` J. Bruce Fields
2020-11-17  3:18                                             ` [PATCH 3/4] nfs: don't mangle i_version on NFS J. Bruce Fields
2020-11-17 12:27                                               ` Jeff Layton
2020-11-17 14:14                                                 ` J. Bruce Fields
2020-11-17  3:18                                             ` [PATCH 4/4] nfs: support i_version in the NFSv4 case J. Bruce Fields
2020-11-17 12:34                                               ` Jeff Layton
2020-11-24 20:35               ` Adventures in NFS re-exporting Daire Byrne
2020-11-24 21:15                 ` bfields
2020-11-24 22:15                   ` Frank Filz
2020-11-25 14:47                     ` 'bfields'
2020-11-25 16:25                       ` Frank Filz
2020-11-25 19:03                         ` 'bfields'
2020-11-26  0:04                           ` Frank Filz
2020-11-25 17:14                   ` Daire Byrne
2020-11-25 19:31                     ` bfields
2020-12-03 12:20                     ` Daire Byrne
2020-12-03 18:51                       ` bfields
2020-12-03 20:27                         ` Trond Myklebust
2020-12-03 21:13                           ` bfields
2020-12-03 21:32                             ` Frank Filz
2020-12-03 21:34                             ` Trond Myklebust
2020-12-03 21:45                               ` Frank Filz
2020-12-03 21:57                                 ` Trond Myklebust
2020-12-03 22:04                                   ` bfields
2020-12-03 22:14                                     ` Trond Myklebust
2020-12-03 22:39                                       ` Frank Filz
2020-12-03 22:50                                         ` Trond Myklebust
2020-12-03 23:34                                           ` Frank Filz [this message]
2020-12-03 22:44                                       ` bfields
2020-12-03 21:54                               ` bfields
2020-12-03 22:45                               ` bfields
2020-12-03 22:53                                 ` Trond Myklebust
2020-12-03 23:16                                   ` bfields
2020-12-03 23:28                                     ` Frank Filz
2020-12-04  1:02                                     ` Trond Myklebust
2020-12-04  1:41                                       ` bfields
2020-12-04  2:27                                         ` Trond Myklebust
2020-09-17 16:01   ` Daire Byrne
2020-09-17 19:09     ` bfields
2020-09-17 20:23       ` Frank van der Linden
2020-09-17 21:57         ` bfields
2020-09-19 11:08           ` Daire Byrne
2020-09-22 16:43         ` Chuck Lever
2020-09-23 20:25           ` Daire Byrne
2020-09-23 21:01             ` Frank van der Linden
2020-09-26  9:00               ` Daire Byrne
2020-09-28 15:49                 ` Frank van der Linden
2020-09-28 16:08                   ` Chuck Lever
2020-09-28 17:42                     ` Frank van der Linden
2020-09-22 12:31 ` Daire Byrne
2020-09-22 13:52   ` Trond Myklebust
2020-09-23 12:40     ` J. Bruce Fields
2020-09-23 13:09       ` Trond Myklebust
2020-09-23 17:07         ` bfields
2020-09-30 19:30   ` [Linux-cachefs] " Jeff Layton
2020-10-01  0:09     ` Daire Byrne
2020-10-01 10:36       ` Jeff Layton
2020-10-01 12:38         ` Trond Myklebust
2020-10-01 16:39           ` Jeff Layton
2020-10-05 12:54         ` Daire Byrne
2020-10-13  9:59           ` Daire Byrne
2020-10-01 18:41     ` J. Bruce Fields
2020-10-01 19:24       ` Trond Myklebust
2020-10-01 19:26         ` bfields
2020-10-01 19:29           ` Trond Myklebust
2020-10-01 19:51             ` bfields

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='01ae01d6c9cc$e9a40440$bcec0cc0$@mindspring.com' \
    --to=ffilzlnx@mindspring.com \
    --cc=bfields@fieldses.org \
    --cc=daire@dneg.com \
    --cc=linux-cachefs@redhat.com \
    --cc=linux-nfs@vger.kernel.org \
    --cc=trondmy@hammerspace.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).