All of lore.kernel.org
 help / color / mirror / Atom feed
From: Trond Myklebust <trondmy@hammerspace.com>
To: "bfields@fieldses.org" <bfields@fieldses.org>,
	"ffilzlnx@mindspring.com" <ffilzlnx@mindspring.com>
Cc: "linux-cachefs@redhat.com" <linux-cachefs@redhat.com>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"daire@dneg.com" <daire@dneg.com>
Subject: Re: Adventures in NFS re-exporting
Date: Thu, 3 Dec 2020 22:50:46 +0000	[thread overview]
Message-ID: <72362b839eb2ecc992f41a0e7783545b13e8ecbc.camel@hammerspace.com> (raw)
In-Reply-To: <01a001d6c9c5$37eb34f0$a7c19ed0$@mindspring.com>

On Thu, 2020-12-03 at 14:39 -0800, Frank Filz wrote:
> 
> 
> > -----Original Message-----
> > From: Trond Myklebust [mailto:trondmy@hammerspace.com]
> > Sent: Thursday, December 3, 2020 2:14 PM
> > To: bfields@fieldses.org
> > Cc: linux-cachefs@redhat.com; ffilzlnx@mindspring.com; linux-
> > nfs@vger.kernel.org; daire@dneg.com
> > Subject: Re: Adventures in NFS re-exporting
> > 
> > On Thu, 2020-12-03 at 17:04 -0500, bfields@fieldses.org wrote:
> > > On Thu, Dec 03, 2020 at 09:57:41PM +0000, Trond Myklebust wrote:
> > > > On Thu, 2020-12-03 at 13:45 -0800, Frank Filz wrote:
> > > > > > On Thu, 2020-12-03 at 16:13 -0500,
> > > > > > bfields@fieldses.org wrote:
> > > > > > > On Thu, Dec 03, 2020 at 08:27:39PM +0000, Trond Myklebust
> > > > > > > wrote:
> > > > > > > > On Thu, 2020-12-03 at 13:51 -0500, bfields wrote:
> > > > > > > > > I've been scratching my head over how to handle
> > > > > > > > > reboot of
> > > > > > > > > a
> > > > > > > > > re-
> > > > > > > > > exporting server.  I think one way to fix it might be
> > > > > > > > > just
> > > > > > > > > to allow the re- export server to pass along reclaims
> > > > > > > > > to
> > > > > > > > > the original server as it receives them from its own
> > > > > > > > > clients.  It might require some protocol tweaks, I'm
> > > > > > > > > not
> > > > > > > > > sure.  I'll try to get my thoughts in order and
> > > > > > > > > propose
> > > > > > > > > something.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > It's more complicated than that. If the re-exporting
> > > > > > > > server
> > > > > > > > reboots, but the original server does not, then unless
> > > > > > > > that
> > > > > > > > re- exporting server persisted its lease and a full set
> > > > > > > > of
> > > > > > > > stateids somewhere, it will not be able to atomically
> > > > > > > > reclaim delegation and lock state on the server on
> > > > > > > > behalf of
> > > > > > > > its clients.
> > > > > > > 
> > > > > > > By sending reclaims to the original server, I mean
> > > > > > > literally
> > > > > > > sending new open and lock requests with the RECLAIM bit
> > > > > > > set,
> > > > > > > which would get brand new stateids.
> > > > > > > 
> > > > > > > So, the original server would invalidate the existing
> > > > > > > client's
> > > > > > > previous clientid and stateids--just as it normally would
> > > > > > > on
> > > > > > > reboot--but it would optionally remember the underlying
> > > > > > > locks
> > > > > > > held by the client and allow compatible lock reclaims.
> > > > > > > 
> > > > > > > Rough attempt:
> > > > > > > 
> > > > > > > 
> > > > > > > https://wiki.linux-nfs.org/wiki/index.php/Reboot_recovery_for_
> > > > > > > re-expor
> > > > > > > t_servers
> > > > > > > 
> > > > > > > Think it would fly?
> > > > > > 
> > > > > > So this would be a variant of courtesy locks that can be
> > > > > > reclaimed by the client using the reboot reclaim variant of
> > > > > > OPEN/LOCK outside the grace period? The purpose being to
> > > > > > allow
> > > > > > reclaim without forcing the client to persist the original
> > > > > > stateid?
> > > > > > 
> > > > > > Hmm... That's doable, but how about the following
> > > > > > alternative:
> > > > > > Add
> > > > > > a function
> > > > > > that allows the client to request the full list of stateids
> > > > > > that
> > > > > > the server holds on its behalf?
> > > > > > 
> > > > > > I've been wanting such a function for quite a while anyway
> > > > > > in
> > > > > > order to allow the client to detect state leaks (either due
> > > > > > to
> > > > > > soft timeouts, or due to reordered close/open operations).
> > > > > 
> > > > > Oh, that sounds interesting. So basically the re-export
> > > > > server
> > > > > would re-populate it's state from the original server rather
> > > > > than
> > > > > relying on it's clients doing reclaims? Hmm, but how does the
> > > > > re-export server rebuild its stateids? I guess it could make
> > > > > the
> > > > > clients repopulate them with the same "give me a dump of all
> > > > > my
> > > > > state", using the state details to match up with the old
> > > > > state and
> > > > > replacing stateids. Or did you have something different in
> > > > > mind?
> > > > > 
> > > > 
> > > > I was thinking that the re-export server could just use that
> > > > list of
> > > > stateids to figure out which locks can be reclaimed atomically,
> > > > and
> > > > which ones have been irredeemably lost. The assumption is that
> > > > if
> > > > you have a lock stateid or a delegation, then that means the
> > > > clients
> > > > can reclaim all the locks that were represented by that
> > > > stateid.
> > > 
> > > I'm confused about how the re-export server uses that list.  Are
> > > you
> > > assuming it persisted its own list across its own crash/reboot? 
> > > I
> > > guess that's what I was trying to avoid having to do.
> > > 
> > No. The server just uses the stateids as part of a check for 'do I
> > hold state for
> > this file on this server?'. If the answer is 'yes' and the lock
> > owners are sane, then
> > we should be able to assume the full set of locks that lock owner
> > held on that
> > file are still valid.
> > 
> > BTW: if the lock owner is also returned by the server, then since
> > the lock owner
> > is an opaque value, it could, for instance, be used by the client
> > to cache info on
> > the server about which uid/gid owns these locks.
> 
> Let me see if I'm understanding your idea right...
> 
> Re-export server reboots within the extended lease period it's been
> given by the original server. I'm assuming it uses the same clientid?

Yes. It would have to use the same clientid.

> But would probably open new sessions. It requests the list of
> stateids. Hmm, how to make the owner information useful, nfs-ganesha
> doesn't pass on the actual client's owner but rather just passes the
> address of its record for that client owner. Maybe it will have to do
> something a bit different for this degree of re-export support...
> 
> Now the re-export server knows which original client lock owners are
> allowed to reclaim state. So it just acquires locks using the
> original stateid as the client reclaims (what happens if the client
> doesn't reclaim a lock? I suppose the re-export server could unlock
> all regions not explicitly locked once reclaim is complete). Since
> the re-export server is acquiring new locks using the original
> stateid it will just overlay the original lock with the new lock and
> write locks don't conflict since they are being acquired by the same
> lock owner. Actually the original server could even balk at a
> "reclaim" in this way that wasn't originally held... And the original
> server could "refresh" the locks, and discard any that aren't
> refreshed at the end of reclaim. That part assumes the original
> server is apprised that what is actually happening is a reclaim.
> 
> The re-export server can destroy any stateids that it doesn't receive
> reclaims for.

Right. That's in essence what I'm suggesting. There are corner cases to
be considered: e.g. "what happens if the re-export server crashes after
unlocking on the server, but before passing the LOCKU reply on the the
client", however I think it should be possible to figure out strategies
for those cases.

> 
> Hmm, I think if the re-export server is implemented as an HA cluster,
> it should establish a clientid on the original server for each
> virtual IP (assuming that's the unit of HA)  that exists. Then when
> virtual IPs are moved, the re-export server just goes through the
> above reclaim process for that clientid.
> 

Yes, we could do something like that.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



  reply	other threads:[~2020-12-03 22:51 UTC|newest]

Thread overview: 129+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-07 17:31 Adventures in NFS re-exporting Daire Byrne
2020-09-08  9:40 ` Mkrtchyan, Tigran
2020-09-08 11:06   ` Daire Byrne
2020-09-15 17:21 ` J. Bruce Fields
2020-09-15 19:59   ` Trond Myklebust
2020-09-16 16:01     ` Daire Byrne
2020-10-19 16:19       ` Daire Byrne
2020-10-19 17:53         ` [PATCH 0/2] Add NFSv3 emulation of the lookupp operation trondmy
2020-10-19 17:53           ` [PATCH 1/2] NFSv3: Refactor nfs3_proc_lookup() to split out the dentry trondmy
2020-10-19 17:53             ` [PATCH 2/2] NFSv3: Add emulation of the lookupp() operation trondmy
2020-10-19 20:05         ` [PATCH v2 0/2] Add NFSv3 emulation of the lookupp operation trondmy
2020-10-19 20:05           ` [PATCH v2 1/2] NFSv3: Refactor nfs3_proc_lookup() to split out the dentry trondmy
2020-10-19 20:05             ` [PATCH v2 2/2] NFSv3: Add emulation of the lookupp() operation trondmy
2020-10-20 18:37         ` [PATCH v3 0/3] Add NFSv3 emulation of the lookupp operation trondmy
2020-10-20 18:37           ` [PATCH v3 1/3] NFSv3: Refactor nfs3_proc_lookup() to split out the dentry trondmy
2020-10-20 18:37             ` [PATCH v3 2/3] NFSv3: Add emulation of the lookupp() operation trondmy
2020-10-20 18:37               ` [PATCH v3 3/3] NFSv4: Observe the NFS_MOUNT_SOFTREVAL flag in _nfs4_proc_lookupp trondmy
2020-10-21  9:33         ` Adventures in NFS re-exporting Daire Byrne
2020-11-09 16:02           ` bfields
2020-11-12 13:01             ` Daire Byrne
2020-11-12 13:57               ` bfields
2020-11-12 18:33                 ` Daire Byrne
2020-11-12 20:55                   ` bfields
2020-11-12 23:05                     ` Daire Byrne
2020-11-13 14:50                       ` bfields
2020-11-13 22:26                         ` bfields
2020-11-14 12:57                           ` Daire Byrne
2020-11-16 15:18                             ` bfields
2020-11-16 15:53                             ` bfields
2020-11-16 19:21                               ` Daire Byrne
2020-11-16 15:29                           ` Jeff Layton
2020-11-16 15:56                             ` bfields
2020-11-16 16:03                               ` Jeff Layton
2020-11-16 16:14                                 ` bfields
2020-11-16 16:38                                   ` Jeff Layton
2020-11-16 19:03                                     ` bfields
2020-11-16 20:03                                       ` Jeff Layton
2020-11-17  3:16                                         ` bfields
2020-11-17  3:18                                           ` [PATCH 1/4] nfsd: move fill_{pre,post}_wcc to nfsfh.c J. Bruce Fields
2020-11-17  3:18                                             ` [PATCH 2/4] nfsd: pre/post attr is using wrong change attribute J. Bruce Fields
2020-11-17 12:34                                               ` Jeff Layton
2020-11-17 15:26                                                 ` J. Bruce Fields
2020-11-17 15:34                                                   ` Jeff Layton
2020-11-20 22:38                                                     ` J. Bruce Fields
2020-11-20 22:39                                                       ` [PATCH 1/8] nfsd: only call inode_query_iversion in the I_VERSION case J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 2/8] nfsd: simplify nfsd4_change_info J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 3/8] nfsd: minor nfsd4_change_attribute cleanup J. Bruce Fields
2020-11-21  0:34                                                           ` Jeff Layton
2020-11-20 22:39                                                         ` [PATCH 4/8] nfsd4: don't query change attribute in v2/v3 case J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 5/8] nfs: use change attribute for NFS re-exports J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 6/8] nfsd: move change attribute generation to filesystem J. Bruce Fields
2020-11-21  0:58                                                           ` Jeff Layton
2020-11-21  1:01                                                             ` J. Bruce Fields
2020-11-21 13:00                                                           ` Jeff Layton
2020-11-20 22:39                                                         ` [PATCH 7/8] nfsd: skip some unnecessary stats in the v4 case J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 8/8] Revert "nfsd4: support change_attr_type attribute" J. Bruce Fields
2020-11-20 22:44                                                       ` [PATCH 2/4] nfsd: pre/post attr is using wrong change attribute J. Bruce Fields
2020-11-21  1:03                                                         ` Jeff Layton
2020-11-21 21:44                                                           ` Daire Byrne
2020-11-22  0:02                                                             ` bfields
2020-11-22  1:55                                                               ` Daire Byrne
2020-11-22  3:03                                                                 ` bfields
2020-11-23 20:07                                                                   ` Daire Byrne
2020-11-17 15:25                                               ` J. Bruce Fields
2020-11-17  3:18                                             ` [PATCH 3/4] nfs: don't mangle i_version on NFS J. Bruce Fields
2020-11-17 12:27                                               ` Jeff Layton
2020-11-17 14:14                                                 ` J. Bruce Fields
2020-11-17  3:18                                             ` [PATCH 4/4] nfs: support i_version in the NFSv4 case J. Bruce Fields
2020-11-17 12:34                                               ` Jeff Layton
2020-11-24 20:35               ` Adventures in NFS re-exporting Daire Byrne
2020-11-24 21:15                 ` bfields
2020-11-24 22:15                   ` Frank Filz
2020-11-25 14:47                     ` 'bfields'
2020-11-25 16:25                       ` Frank Filz
2020-11-25 19:03                         ` 'bfields'
2020-11-26  0:04                           ` Frank Filz
2020-11-25 17:14                   ` Daire Byrne
2020-11-25 19:31                     ` bfields
2020-12-03 12:20                     ` Daire Byrne
2020-12-03 18:51                       ` bfields
2020-12-03 20:27                         ` Trond Myklebust
2020-12-03 21:13                           ` bfields
2020-12-03 21:32                             ` Frank Filz
2020-12-03 21:34                             ` Trond Myklebust
2020-12-03 21:45                               ` Frank Filz
2020-12-03 21:57                                 ` Trond Myklebust
2020-12-03 22:04                                   ` bfields
2020-12-03 22:14                                     ` Trond Myklebust
2020-12-03 22:39                                       ` Frank Filz
2020-12-03 22:50                                         ` Trond Myklebust [this message]
2020-12-03 23:34                                           ` Frank Filz
2020-12-03 22:44                                       ` bfields
2020-12-03 21:54                               ` bfields
2020-12-03 22:45                               ` bfields
2020-12-03 22:53                                 ` Trond Myklebust
2020-12-03 23:16                                   ` bfields
2020-12-03 23:28                                     ` Frank Filz
2020-12-04  1:02                                     ` Trond Myklebust
2020-12-04  1:41                                       ` bfields
2020-12-04  2:27                                         ` Trond Myklebust
2020-09-17 16:01   ` Daire Byrne
2020-09-17 19:09     ` bfields
2020-09-17 20:23       ` Frank van der Linden
2020-09-17 21:57         ` bfields
2020-09-19 11:08           ` Daire Byrne
2020-09-22 16:43         ` Chuck Lever
2020-09-23 20:25           ` Daire Byrne
2020-09-23 21:01             ` Frank van der Linden
2020-09-26  9:00               ` Daire Byrne
2020-09-28 15:49                 ` Frank van der Linden
2020-09-28 16:08                   ` Chuck Lever
2020-09-28 17:42                     ` Frank van der Linden
2020-09-22 12:31 ` Daire Byrne
2020-09-22 13:52   ` Trond Myklebust
2020-09-23 12:40     ` J. Bruce Fields
2020-09-23 13:09       ` Trond Myklebust
2020-09-23 17:07         ` bfields
2020-09-30 19:30   ` [Linux-cachefs] " Jeff Layton
2020-10-01  0:09     ` Daire Byrne
2020-10-01 10:36       ` Jeff Layton
2020-10-01 12:38         ` Trond Myklebust
2020-10-01 16:39           ` Jeff Layton
2020-10-05 12:54         ` Daire Byrne
2020-10-13  9:59           ` Daire Byrne
2020-10-01 18:41     ` J. Bruce Fields
2020-10-01 19:24       ` Trond Myklebust
2020-10-01 19:26         ` bfields
2020-10-01 19:29           ` Trond Myklebust
2020-10-01 19:51             ` bfields

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=72362b839eb2ecc992f41a0e7783545b13e8ecbc.camel@hammerspace.com \
    --to=trondmy@hammerspace.com \
    --cc=bfields@fieldses.org \
    --cc=daire@dneg.com \
    --cc=ffilzlnx@mindspring.com \
    --cc=linux-cachefs@redhat.com \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.