All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/4] nfsd: allow NFSv4 state to be revoked.
@ 2022-01-27  4:58 NeilBrown
  2022-01-27  4:58 ` [PATCH 1/4] nfsd: prepare for supporting admin-revocation of state NeilBrown
                   ` (5 more replies)
  0 siblings, 6 replies; 22+ messages in thread
From: NeilBrown @ 2022-01-27  4:58 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs

If a filesystem is exported to a client with NFSv4 and that client holds
a file open, the filesystem cannot be unmounted without either stopping the
NFS server completely, or blocking all access from that client
(unexporting all filesystems) and waiting for the lease timeout.

For NFSv3 - and particularly NLM - it is possible to revoke all state by
writing the path to the filesystem into /proc/fs/nfsd/unlock_filesystem.

This series extends this functionality to NFSv4.  With this, to unmount
an exported filesystem is it sufficient to disable export of that
filesystem, and then write the path to unlock_filesystem.

I've cursed mainly on NFSv4.1 and later for this.  I haven't tested
yet with NFSv4.0 which has different mechanisms for state management.

If this series is seen as a general acceptable approach, I'll look into
the NFSv4.0 aspects properly and make sure it works there.

Thanks,
NeilBrown


---

NeilBrown (4):
      nfsd: prepare for supporting admin-revocation of state
      nfsd: allow open state ids to be revoked and then freed
      nfsd: allow lock state ids to be revoked and then freed
      nfsd: allow delegation state ids to be revoked and then freed


 fs/nfsd/nfs4state.c | 105 ++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 96 insertions(+), 9 deletions(-)

--
Signature


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/4] nfsd: prepare for supporting admin-revocation of state
  2022-01-27  4:58 [RFC PATCH 0/4] nfsd: allow NFSv4 state to be revoked NeilBrown
@ 2022-01-27  4:58 ` NeilBrown
  2022-01-27 12:59     ` kernel test robot
  2022-01-27 14:51     ` kernel test robot
  2022-01-27  4:58 ` [PATCH 3/4] nfsd: allow lock state ids to be revoked and then freed NeilBrown
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 22+ messages in thread
From: NeilBrown @ 2022-01-27  4:58 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs

The NFSv4 protocol allows state to be revoked by the admin and has error
codes which allow this to be communicated to the client.

This patch
 - introduces 3 new state-id types for revoked open, lock, and
   delegation state.  This requires the bitmask to be 'short',
   not 'char'
 - reports NFS4ERR_ADMIN_REVOKED when these are accessed
 - introduces a per-client counter of these states and returns
   SEQ4_STATUS_ADMIN_STATE_REVOKED when the counter is not zero
 - introduces stub code to find all interesting states for a given
   superblock so they can be revoked via the 'unlock_filesystem'
   file in /proc/fs/nfsd/
   No actual states are handled yet.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfsd/nfs4layouts.c |    2 +
 fs/nfsd/nfs4state.c   |   79 +++++++++++++++++++++++++++++++++++++++++++++++--
 fs/nfsd/nfsctl.c      |    1 +
 fs/nfsd/nfsd.h        |    1 +
 fs/nfsd/state.h       |   26 ++++++++++------
 5 files changed, 95 insertions(+), 14 deletions(-)

diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c
index 6d1b5bb051c5..211cbefa3d80 100644
--- a/fs/nfsd/nfs4layouts.c
+++ b/fs/nfsd/nfs4layouts.c
@@ -269,7 +269,7 @@ nfsd4_preprocess_layout_stateid(struct svc_rqst *rqstp,
 {
 	struct nfs4_layout_stateid *ls;
 	struct nfs4_stid *stid;
-	unsigned char typemask = NFS4_LAYOUT_STID;
+	unsigned short typemask = NFS4_LAYOUT_STID;
 	__be32 status;
 
 	if (create)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 72900b89cf84..ad02b4ebe1b6 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1575,6 +1575,55 @@ static void release_openowner(struct nfs4_openowner *oo)
 	nfs4_put_stateowner(&oo->oo_owner);
 }
 
+static struct nfs4_stid *find_one_sb_stid(struct nfs4_client *clp,
+					  struct super_block *sb,
+					  unsigned short sc_types)
+{
+	unsigned long id = 0;
+	struct nfs4_stid *stid;
+
+	spin_lock(&clp->cl_lock);
+	do {
+		id += 1;
+		stid = idr_get_next_ul(&clp->cl_stateids, &id);
+	} while (stid &&
+		 !((stid->sc_type & sc_types) &&
+		   stid->sc_file->fi_inode->i_sb == sb));
+	if (stid)
+		refcount_inc(&stid->sc_count);
+	spin_unlock(&clp->cl_lock);
+	return stid;
+}
+
+void nfsd4_revoke_states(struct net *net, struct super_block *sb)
+{
+	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+	unsigned int idhashval;
+	unsigned short sc_types;
+
+	sc_types = 0;
+
+	spin_lock(&nn->client_lock);
+	for (idhashval = 0; idhashval < CLIENT_HASH_MASK; idhashval++) {
+		struct list_head *head = &nn->conf_id_hashtbl[idhashval];
+		struct nfs4_client *clp;
+	retry:
+		list_for_each_entry(clp, head, cl_idhash) {
+			struct nfs4_stid *stid = find_one_sb_stid(clp, sb,
+								  sc_types);
+			if (stid) {
+				spin_unlock(&nn->client_lock);
+				switch (stid->sc_type) {
+				}
+				nfs4_put_stid(stid);
+				spin_lock(&nn->client_lock);
+				goto retry;
+			}
+		}
+	}
+	spin_unlock(&nn->client_lock);
+}
+
 static inline int
 hash_sessionid(struct nfs4_sessionid *sessionid)
 {
@@ -2349,7 +2398,8 @@ find_stateid_locked(struct nfs4_client *cl, stateid_t *t)
 }
 
 static struct nfs4_stid *
-find_stateid_by_type(struct nfs4_client *cl, stateid_t *t, char typemask)
+find_stateid_by_type(struct nfs4_client *cl, stateid_t *t,
+		     unsigned short typemask)
 {
 	struct nfs4_stid *s;
 
@@ -2426,6 +2476,8 @@ static int client_info_show(struct seq_file *m, void *v)
 	}
 	seq_printf(m, "callback state: %s\n", cb_state2str(clp->cl_cb_state));
 	seq_printf(m, "callback address: %pISpc\n", &clp->cl_cb_conn.cb_addr);
+	seq_printf(m, "admin-revoked states: %d\n",
+		   atomic_read(&clp->cl_admin_revoked));
 	drop_client(clp);
 
 	return 0;
@@ -3907,6 +3959,8 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	}
 	if (!list_empty(&clp->cl_revoked))
 		seq->status_flags |= SEQ4_STATUS_RECALLABLE_STATE_REVOKED;
+	if (atomic_read(&clp->cl_admin_revoked))
+		seq->status_flags |= SEQ4_STATUS_ADMIN_STATE_REVOKED;
 out_no_session:
 	if (conn)
 		free_conn(conn);
@@ -4361,6 +4415,11 @@ nfsd4_verify_open_stid(struct nfs4_stid *s)
 		break;
 	case NFS4_REVOKED_DELEG_STID:
 		ret = nfserr_deleg_revoked;
+		break;
+	case NFS4_ADMIN_REVOKED_STID:
+	case NFS4_ADMIN_REVOKED_DELEG_STID:
+		ret = nfserr_admin_revoked;
+		break;
 	}
 	return ret;
 }
@@ -4893,6 +4952,11 @@ nfs4_check_deleg(struct nfs4_client *cl, struct nfsd4_open *open,
 			status = nfserr_deleg_revoked;
 		goto out;
 	}
+	if (deleg->dl_stid.sc_type == NFS4_ADMIN_REVOKED_DELEG_STID) {
+		nfs4_put_stid(&deleg->dl_stid);
+		status = nfserr_admin_revoked;
+		goto out;
+	}
 	flags = share_access_to_flags(open->op_share_access);
 	status = nfs4_check_delegmode(deleg, flags);
 	if (status) {
@@ -5840,6 +5904,11 @@ static __be32 nfsd4_validate_stateid(struct nfs4_client *cl, stateid_t *stateid)
 	case NFS4_REVOKED_DELEG_STID:
 		status = nfserr_deleg_revoked;
 		break;
+	case NFS4_ADMIN_REVOKED_STID:
+	case NFS4_ADMIN_REVOKED_LOCK_STID:
+	case NFS4_ADMIN_REVOKED_DELEG_STID:
+		status = nfserr_admin_revoked;
+		break;
 	case NFS4_OPEN_STID:
 	case NFS4_LOCK_STID:
 		status = nfsd4_check_openowner_confirmed(openlockstateid(s));
@@ -5858,7 +5927,7 @@ static __be32 nfsd4_validate_stateid(struct nfs4_client *cl, stateid_t *stateid)
 
 __be32
 nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate,
-		     stateid_t *stateid, unsigned char typemask,
+		     stateid_t *stateid, unsigned short typemask,
 		     struct nfs4_stid **s, struct nfsd_net *nn)
 {
 	__be32 status;
@@ -5893,6 +5962,10 @@ nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate,
 			return nfserr_deleg_revoked;
 		return nfserr_bad_stateid;
 	}
+	if (((*s)->sc_type == NFS4_ADMIN_REVOKED_DELEG_STID) && !return_revoked) {
+		nfs4_put_stid(*s);
+		return nfserr_admin_revoked;
+	}
 	return nfs_ok;
 }
 
@@ -6235,7 +6308,7 @@ static __be32 nfs4_seqid_op_checks(struct nfsd4_compound_state *cstate, stateid_
  */
 static __be32
 nfs4_preprocess_seqid_op(struct nfsd4_compound_state *cstate, u32 seqid,
-			 stateid_t *stateid, char typemask,
+			 stateid_t *stateid, unsigned short typemask,
 			 struct nfs4_ol_stateid **stpp,
 			 struct nfsd_net *nn)
 {
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index b9f27fbcd768..2576867bdd71 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -323,6 +323,7 @@ static ssize_t write_unlock_fs(struct file *file, char *buf, size_t size)
 	 * 3.  Is that directory the root of an exported file system?
 	 */
 	error = nlmsvc_unlock_all_by_sb(path.dentry->d_sb);
+	nfsd4_revoke_states(netns(file), path.dentry->d_sb);
 
 	path_put(&path);
 	return error;
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index 3e5008b475ff..f9fdd4078956 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -249,6 +249,7 @@ void		nfsd_lockd_shutdown(void);
 #define	nfserr_no_grace		cpu_to_be32(NFSERR_NO_GRACE)
 #define	nfserr_reclaim_bad	cpu_to_be32(NFSERR_RECLAIM_BAD)
 #define	nfserr_badname		cpu_to_be32(NFSERR_BADNAME)
+#define	nfserr_admin_revoked	cpu_to_be32(NFS4ERR_ADMIN_REVOKED)
 #define	nfserr_cb_path_down	cpu_to_be32(NFSERR_CB_PATH_DOWN)
 #define	nfserr_locked		cpu_to_be32(NFSERR_LOCKED)
 #define	nfserr_wrongsec		cpu_to_be32(NFSERR_WRONGSEC)
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 95457cfd37fc..8429bfc06216 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -88,17 +88,20 @@ struct nfsd4_callback_ops {
  */
 struct nfs4_stid {
 	refcount_t		sc_count;
-#define NFS4_OPEN_STID 1
-#define NFS4_LOCK_STID 2
-#define NFS4_DELEG_STID 4
+	struct list_head	sc_cp_list;
+	unsigned short		sc_type;
+#define NFS4_OPEN_STID			BIT(0)
+#define NFS4_LOCK_STID			BIT(1)
+#define NFS4_DELEG_STID			BIT(2)
 /* For an open stateid kept around *only* to process close replays: */
-#define NFS4_CLOSED_STID 8
+#define NFS4_CLOSED_STID		BIT(3)
 /* For a deleg stateid kept around only to process free_stateid's: */
-#define NFS4_REVOKED_DELEG_STID 16
-#define NFS4_CLOSED_DELEG_STID 32
-#define NFS4_LAYOUT_STID 64
-	struct list_head	sc_cp_list;
-	unsigned char		sc_type;
+#define NFS4_REVOKED_DELEG_STID		BIT(4)
+#define NFS4_CLOSED_DELEG_STID		BIT(5)
+#define NFS4_LAYOUT_STID		BIT(6)
+#define NFS4_ADMIN_REVOKED_STID		BIT(7)
+#define NFS4_ADMIN_REVOKED_LOCK_STID	BIT(8)
+#define NFS4_ADMIN_REVOKED_DELEG_STID	BIT(9)
 	stateid_t		sc_stateid;
 	spinlock_t		sc_lock;
 	struct nfs4_client	*sc_client;
@@ -330,6 +333,7 @@ struct nfs4_client {
 	clientid_t		cl_clientid;	/* generated by server */
 	nfs4_verifier		cl_confirm;	/* generated by server */
 	u32			cl_minorversion;
+	atomic_t		cl_admin_revoked; /* count of admin-revoked states */
 	/* NFSv4.1 client implementation id: */
 	struct xdr_netobj	cl_nii_domain;
 	struct xdr_netobj	cl_nii_name;
@@ -645,7 +649,7 @@ extern __be32 nfs4_preprocess_stateid_op(struct svc_rqst *rqstp,
 		stateid_t *stateid, int flags, struct nfsd_file **filp,
 		struct nfs4_stid **cstid);
 __be32 nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate,
-		     stateid_t *stateid, unsigned char typemask,
+		     stateid_t *stateid, unsigned short typemask,
 		     struct nfs4_stid **s, struct nfsd_net *nn);
 struct nfs4_stid *nfs4_alloc_stid(struct nfs4_client *cl, struct kmem_cache *slab,
 				  void (*sc_free)(struct nfs4_stid *));
@@ -691,6 +695,8 @@ static inline void get_nfs4_file(struct nfs4_file *fi)
 }
 struct nfsd_file *find_any_file(struct nfs4_file *f);
 
+void nfsd4_revoke_states(struct net *net, struct super_block *sb);
+
 /* grace period management */
 void nfsd4_end_grace(struct nfsd_net *nn);
 



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/4] nfsd: allow open state ids to be revoked and then freed
  2022-01-27  4:58 [RFC PATCH 0/4] nfsd: allow NFSv4 state to be revoked NeilBrown
                   ` (2 preceding siblings ...)
  2022-01-27  4:58 ` [PATCH 4/4] nfsd: allow delegation " NeilBrown
@ 2022-01-27  4:58 ` NeilBrown
  2022-01-27 17:00 ` [RFC PATCH 0/4] nfsd: allow NFSv4 state to be revoked Chuck Lever III
  2022-01-27 20:06 ` J. Bruce Fields
  5 siblings, 0 replies; 22+ messages in thread
From: NeilBrown @ 2022-01-27  4:58 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs

Revoking state through 'unlock_filesystem' now revokes any open states
found.  When the stateids are then freed by the client, the revoked
stateids will be cleaned up correctly.

Possibly the related lock states should be revoked too, but a
subsequent patch will do that for all lock state on the superblock.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfsd/nfs4state.c |   37 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index ad02b4ebe1b6..6854546ebd08 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1113,6 +1113,9 @@ nfs4_put_stid(struct nfs4_stid *s)
 		return;
 	}
 	idr_remove(&clp->cl_stateids, s->sc_stateid.si_opaque.so_id);
+	if (s->sc_type == NFS4_ADMIN_REVOKED_STID)
+		atomic_dec(&clp->cl_admin_revoked);
+	s->sc_type = 0;
 	nfs4_free_cpntf_statelist(clp->net, s);
 	spin_unlock(&clp->cl_lock);
 	s->sc_free(s);
@@ -1165,6 +1168,9 @@ static void destroy_unhashed_deleg(struct nfs4_delegation *dp)
 
 void nfs4_unhash_stid(struct nfs4_stid *s)
 {
+	struct nfs4_client *clp = s->sc_client;
+	if (s->sc_type == NFS4_ADMIN_REVOKED_STID)
+		atomic_dec(&clp->cl_admin_revoked);
 	s->sc_type = 0;
 }
 
@@ -1431,6 +1437,9 @@ static void put_ol_stateid_locked(struct nfs4_ol_stateid *stp,
 	}
 
 	idr_remove(&clp->cl_stateids, s->sc_stateid.si_opaque.so_id);
+	if (s->sc_type == NFS4_ADMIN_REVOKED_STID)
+		atomic_dec(&clp->cl_admin_revoked);
+	s->sc_type = 0;
 	list_add(&stp->st_locks, reaplist);
 }
 
@@ -1601,7 +1610,7 @@ void nfsd4_revoke_states(struct net *net, struct super_block *sb)
 	unsigned int idhashval;
 	unsigned short sc_types;
 
-	sc_types = 0;
+	sc_types = NFS4_OPEN_STID;
 
 	spin_lock(&nn->client_lock);
 	for (idhashval = 0; idhashval < CLIENT_HASH_MASK; idhashval++) {
@@ -1612,8 +1621,23 @@ void nfsd4_revoke_states(struct net *net, struct super_block *sb)
 			struct nfs4_stid *stid = find_one_sb_stid(clp, sb,
 								  sc_types);
 			if (stid) {
+				struct nfs4_ol_stateid *stp;
+
 				spin_unlock(&nn->client_lock);
 				switch (stid->sc_type) {
+				case NFS4_OPEN_STID:
+					stp = openlockstateid(stid);
+					mutex_lock_nested(&stp->st_mutex,
+							  OPEN_STATEID_MUTEX);
+					if (stid->sc_type == NFS4_OPEN_STID) {
+						release_all_access(stp);
+						stid->sc_type =
+							NFS4_ADMIN_REVOKED_STID;
+						atomic_inc(&clp->cl_admin_revoked);
+						/* FIXME revoke the locks */
+					}
+					mutex_unlock(&stp->st_mutex);
+					break;
 				}
 				nfs4_put_stid(stid);
 				spin_lock(&nn->client_lock);
@@ -6235,6 +6259,8 @@ nfsd4_free_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	stateid_t *stateid = &free_stateid->fr_stateid;
 	struct nfs4_stid *s;
 	struct nfs4_delegation *dp;
+	struct nfs4_ol_stateid *stp;
+	LIST_HEAD(reaplist);
 	struct nfs4_client *cl = cstate->clp;
 	__be32 ret = nfserr_bad_stateid;
 
@@ -6259,6 +6285,15 @@ nfsd4_free_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 		spin_unlock(&cl->cl_lock);
 		ret = nfsd4_free_lock_stateid(stateid, s);
 		goto out;
+	case NFS4_ADMIN_REVOKED_STID:
+		stp = openlockstateid(s);
+		spin_unlock(&s->sc_lock);
+		if (unhash_open_stateid(stp, &reaplist))
+			put_ol_stateid_locked(stp, &reaplist);
+		spin_unlock(&cl->cl_lock);
+		free_ol_stateid_reaplist(&reaplist);
+		ret = nfs_ok;
+		goto out;
 	case NFS4_REVOKED_DELEG_STID:
 		spin_unlock(&s->sc_lock);
 		dp = delegstateid(s);



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 3/4] nfsd: allow lock state ids to be revoked and then freed
  2022-01-27  4:58 [RFC PATCH 0/4] nfsd: allow NFSv4 state to be revoked NeilBrown
  2022-01-27  4:58 ` [PATCH 1/4] nfsd: prepare for supporting admin-revocation of state NeilBrown
@ 2022-01-27  4:58 ` NeilBrown
  2022-01-27  4:58 ` [PATCH 4/4] nfsd: allow delegation " NeilBrown
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: NeilBrown @ 2022-01-27  4:58 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs

Revoking state through 'unlock_filesystem' now revokes any lock states
found.  When the stateids are then freed by the client, the revoked
stateids will be cleaned up correctly.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfsd/nfs4state.c |   43 +++++++++++++++++++++++++++++++++++++++----
 1 file changed, 39 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 6854546ebd08..b1dc8ed1d356 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1113,7 +1113,8 @@ nfs4_put_stid(struct nfs4_stid *s)
 		return;
 	}
 	idr_remove(&clp->cl_stateids, s->sc_stateid.si_opaque.so_id);
-	if (s->sc_type == NFS4_ADMIN_REVOKED_STID)
+	if (s->sc_type == NFS4_ADMIN_REVOKED_STID ||
+	    s->sc_type == NFS4_ADMIN_REVOKED_LOCK_STID)
 		atomic_dec(&clp->cl_admin_revoked);
 	s->sc_type = 0;
 	nfs4_free_cpntf_statelist(clp->net, s);
@@ -1169,7 +1170,8 @@ static void destroy_unhashed_deleg(struct nfs4_delegation *dp)
 void nfs4_unhash_stid(struct nfs4_stid *s)
 {
 	struct nfs4_client *clp = s->sc_client;
-	if (s->sc_type == NFS4_ADMIN_REVOKED_STID)
+	if (s->sc_type == NFS4_ADMIN_REVOKED_STID ||
+	    s->sc_type == NFS4_ADMIN_REVOKED_LOCK_STID)
 		atomic_dec(&clp->cl_admin_revoked);
 	s->sc_type = 0;
 }
@@ -1437,7 +1439,8 @@ static void put_ol_stateid_locked(struct nfs4_ol_stateid *stp,
 	}
 
 	idr_remove(&clp->cl_stateids, s->sc_stateid.si_opaque.so_id);
-	if (s->sc_type == NFS4_ADMIN_REVOKED_STID)
+	if (s->sc_type == NFS4_ADMIN_REVOKED_STID ||
+	    s->sc_type == NFS4_ADMIN_REVOKED_LOCK_STID)
 		atomic_dec(&clp->cl_admin_revoked);
 	s->sc_type = 0;
 	list_add(&stp->st_locks, reaplist);
@@ -1610,7 +1613,7 @@ void nfsd4_revoke_states(struct net *net, struct super_block *sb)
 	unsigned int idhashval;
 	unsigned short sc_types;
 
-	sc_types = NFS4_OPEN_STID;
+	sc_types = NFS4_OPEN_STID | NFS4_LOCK_STID;
 
 	spin_lock(&nn->client_lock);
 	for (idhashval = 0; idhashval < CLIENT_HASH_MASK; idhashval++) {
@@ -1638,6 +1641,28 @@ void nfsd4_revoke_states(struct net *net, struct super_block *sb)
 					}
 					mutex_unlock(&stp->st_mutex);
 					break;
+				case NFS4_LOCK_STID:
+					stp = openlockstateid(stid);
+					mutex_lock_nested(&stp->st_mutex,
+							  LOCK_STATEID_MUTEX);
+					if (stid->sc_type == NFS4_LOCK_STID) {
+						struct nfs4_lockowner *lo =
+							lockowner(stp->st_stateowner);
+						struct nfsd_file *nf;
+						nf = find_any_file(stp->st_stid.sc_file);
+						if (nf) {
+							get_file(nf->nf_file);
+							filp_close(nf->nf_file,
+								   (fl_owner_t)lo);
+							nfsd_file_put(nf);
+						}
+						release_all_access(stp);
+						stid->sc_type =
+							NFS4_ADMIN_REVOKED_LOCK_STID;
+						atomic_inc(&clp->cl_admin_revoked);
+					}
+					mutex_unlock(&stp->st_mutex);
+					break;
 				}
 				nfs4_put_stid(stid);
 				spin_lock(&nn->client_lock);
@@ -6261,6 +6286,7 @@ nfsd4_free_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	struct nfs4_delegation *dp;
 	struct nfs4_ol_stateid *stp;
 	LIST_HEAD(reaplist);
+	bool unhashed;
 	struct nfs4_client *cl = cstate->clp;
 	__be32 ret = nfserr_bad_stateid;
 
@@ -6294,6 +6320,15 @@ nfsd4_free_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 		free_ol_stateid_reaplist(&reaplist);
 		ret = nfs_ok;
 		goto out;
+	case NFS4_ADMIN_REVOKED_LOCK_STID:
+		stp = openlockstateid(s);
+		spin_unlock(&s->sc_lock);
+		unhashed = unhash_lock_stateid(stp);
+		spin_unlock(&cl->cl_lock);
+		if (unhashed)
+			nfs4_put_stid(s);
+		ret = nfs_ok;
+		goto out;
 	case NFS4_REVOKED_DELEG_STID:
 		spin_unlock(&s->sc_lock);
 		dp = delegstateid(s);



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 4/4] nfsd: allow delegation state ids to be revoked and then freed
  2022-01-27  4:58 [RFC PATCH 0/4] nfsd: allow NFSv4 state to be revoked NeilBrown
  2022-01-27  4:58 ` [PATCH 1/4] nfsd: prepare for supporting admin-revocation of state NeilBrown
  2022-01-27  4:58 ` [PATCH 3/4] nfsd: allow lock state ids to be revoked and then freed NeilBrown
@ 2022-01-27  4:58 ` NeilBrown
  2022-01-27  4:58 ` [PATCH 2/4] nfsd: allow open " NeilBrown
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: NeilBrown @ 2022-01-27  4:58 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs

Revoking state through 'unlock_filesystem' now revokes any delegation
states found.  When the stateids are then freed by the client, the
revoked stateids will be cleaned up correctly.

As there is already support for revoking delegations, we build on that
for admin-revoking.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfsd/nfs4state.c |   35 ++++++++++++++++++++++++++---------
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index b1dc8ed1d356..30177554a77b 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1270,14 +1270,15 @@ static void destroy_delegation(struct nfs4_delegation *dp)
 		destroy_unhashed_deleg(dp);
 }
 
-static void revoke_delegation(struct nfs4_delegation *dp)
+static void revoke_delegation(struct nfs4_delegation *dp,
+			      unsigned short sc_type)
 {
 	struct nfs4_client *clp = dp->dl_stid.sc_client;
 
 	WARN_ON(!list_empty(&dp->dl_recall_lru));
 
-	if (clp->cl_minorversion) {
-		dp->dl_stid.sc_type = NFS4_REVOKED_DELEG_STID;
+	if (clp->cl_minorversion || sc_type == NFS4_ADMIN_REVOKED_DELEG_STID) {
+		dp->dl_stid.sc_type = sc_type;
 		refcount_inc(&dp->dl_stid.sc_count);
 		spin_lock(&clp->cl_lock);
 		list_add(&dp->dl_recall_lru, &clp->cl_revoked);
@@ -1613,7 +1614,7 @@ void nfsd4_revoke_states(struct net *net, struct super_block *sb)
 	unsigned int idhashval;
 	unsigned short sc_types;
 
-	sc_types = NFS4_OPEN_STID | NFS4_LOCK_STID;
+	sc_types = NFS4_OPEN_STID | NFS4_LOCK_STID | NFS4_DELEG_STID;
 
 	spin_lock(&nn->client_lock);
 	for (idhashval = 0; idhashval < CLIENT_HASH_MASK; idhashval++) {
@@ -1625,6 +1626,7 @@ void nfsd4_revoke_states(struct net *net, struct super_block *sb)
 								  sc_types);
 			if (stid) {
 				struct nfs4_ol_stateid *stp;
+				struct nfs4_delegation *dp;
 
 				spin_unlock(&nn->client_lock);
 				switch (stid->sc_type) {
@@ -1663,6 +1665,18 @@ void nfsd4_revoke_states(struct net *net, struct super_block *sb)
 					}
 					mutex_unlock(&stp->st_mutex);
 					break;
+				case NFS4_DELEG_STID:
+					dp = delegstateid(stid);
+					spin_lock(&state_lock);
+					if (!unhash_delegation_locked(dp))
+						dp = NULL;
+					spin_unlock(&state_lock);
+					if (dp) {
+						list_del_init(&dp->dl_recall_lru);
+						revoke_delegation(
+							dp, NFS4_ADMIN_REVOKED_DELEG_STID);
+					}
+					break;
 				}
 				nfs4_put_stid(stid);
 				spin_lock(&nn->client_lock);
@@ -4742,8 +4756,9 @@ static int nfsd4_cb_recall_done(struct nfsd4_callback *cb,
 	struct nfs4_delegation *dp = cb_to_delegation(cb);
 
 	if (dp->dl_stid.sc_type == NFS4_CLOSED_DELEG_STID ||
-	    dp->dl_stid.sc_type == NFS4_REVOKED_DELEG_STID)
-	        return 1;
+	    dp->dl_stid.sc_type == NFS4_REVOKED_DELEG_STID ||
+	    dp->dl_stid.sc_type == NFS4_ADMIN_REVOKED_DELEG_STID)
+		return 1;
 
 	switch (task->tk_status) {
 	case 0:
@@ -5770,7 +5785,7 @@ nfs4_laundromat(struct nfsd_net *nn)
 		dp = list_first_entry(&reaplist, struct nfs4_delegation,
 					dl_recall_lru);
 		list_del_init(&dp->dl_recall_lru);
-		revoke_delegation(dp);
+		revoke_delegation(dp, NFS4_REVOKED_DELEG_STID);
 	}
 
 	spin_lock(&nn->client_lock);
@@ -5988,8 +6003,9 @@ nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate,
 	 */
 	if (typemask & NFS4_REVOKED_DELEG_STID)
 		return_revoked = true;
-	else if (typemask & NFS4_DELEG_STID)
-		typemask |= NFS4_REVOKED_DELEG_STID;
+	if (typemask & NFS4_DELEG_STID)
+		typemask |= NFS4_REVOKED_DELEG_STID |
+			NFS4_ADMIN_REVOKED_DELEG_STID;
 
 	if (ZERO_STATEID(stateid) || ONE_STATEID(stateid) ||
 		CLOSE_STATEID(stateid))
@@ -6330,6 +6346,7 @@ nfsd4_free_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 		ret = nfs_ok;
 		goto out;
 	case NFS4_REVOKED_DELEG_STID:
+	case NFS4_ADMIN_REVOKED_DELEG_STID:
 		spin_unlock(&s->sc_lock);
 		dp = delegstateid(s);
 		list_del_init(&dp->dl_recall_lru);



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/4] nfsd: prepare for supporting admin-revocation of state
  2022-01-27  4:58 ` [PATCH 1/4] nfsd: prepare for supporting admin-revocation of state NeilBrown
@ 2022-01-27 12:59     ` kernel test robot
  2022-01-27 14:51     ` kernel test robot
  1 sibling, 0 replies; 22+ messages in thread
From: kernel test robot @ 2022-01-27 12:59 UTC (permalink / raw)
  To: NeilBrown, Chuck Lever; +Cc: kbuild-all, linux-nfs

Hi NeilBrown,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.17-rc1 next-20220127]
[cannot apply to cel-2.6/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/NeilBrown/nfsd-allow-NFSv4-state-to-be-revoked/20220127-130132
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 0280e3c58f92b2fe0e8fbbdf8d386449168de4a8
config: parisc-defconfig (https://download.01.org/0day-ci/archive/20220127/202201272005.vw3QlCL5-lkp@intel.com/config)
compiler: hppa-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/36a49da0c1204b769399e692f540bf2a6332f89a
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review NeilBrown/nfsd-allow-NFSv4-state-to-be-revoked/20220127-130132
        git checkout 36a49da0c1204b769399e692f540bf2a6332f89a
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=parisc SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>, old ones prefixed by <<):

>> ERROR: modpost: "nfsd4_revoke_states" [fs/nfsd/nfsd.ko] undefined!

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/4] nfsd: prepare for supporting admin-revocation of state
@ 2022-01-27 12:59     ` kernel test robot
  0 siblings, 0 replies; 22+ messages in thread
From: kernel test robot @ 2022-01-27 12:59 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 1848 bytes --]

Hi NeilBrown,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.17-rc1 next-20220127]
[cannot apply to cel-2.6/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/NeilBrown/nfsd-allow-NFSv4-state-to-be-revoked/20220127-130132
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 0280e3c58f92b2fe0e8fbbdf8d386449168de4a8
config: parisc-defconfig (https://download.01.org/0day-ci/archive/20220127/202201272005.vw3QlCL5-lkp(a)intel.com/config)
compiler: hppa-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/36a49da0c1204b769399e692f540bf2a6332f89a
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review NeilBrown/nfsd-allow-NFSv4-state-to-be-revoked/20220127-130132
        git checkout 36a49da0c1204b769399e692f540bf2a6332f89a
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=parisc SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>, old ones prefixed by <<):

>> ERROR: modpost: "nfsd4_revoke_states" [fs/nfsd/nfsd.ko] undefined!

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/4] nfsd: prepare for supporting admin-revocation of state
  2022-01-27  4:58 ` [PATCH 1/4] nfsd: prepare for supporting admin-revocation of state NeilBrown
@ 2022-01-27 14:51     ` kernel test robot
  2022-01-27 14:51     ` kernel test robot
  1 sibling, 0 replies; 22+ messages in thread
From: kernel test robot @ 2022-01-27 14:51 UTC (permalink / raw)
  To: NeilBrown, Chuck Lever; +Cc: kbuild-all, linux-nfs

Hi NeilBrown,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.17-rc1 next-20220127]
[cannot apply to cel-2.6/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/NeilBrown/nfsd-allow-NFSv4-state-to-be-revoked/20220127-130132
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 0280e3c58f92b2fe0e8fbbdf8d386449168de4a8
config: s390-randconfig-r004-20220126 (https://download.01.org/0day-ci/archive/20220127/202201272253.LNgeJGaJ-lkp@intel.com/config)
compiler: s390-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/36a49da0c1204b769399e692f540bf2a6332f89a
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review NeilBrown/nfsd-allow-NFSv4-state-to-be-revoked/20220127-130132
        git checkout 36a49da0c1204b769399e692f540bf2a6332f89a
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=s390 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   s390-linux-ld: fs/nfsd/nfsctl.o: in function `write_unlock_fs':
>> nfsctl.c:(.text+0x302): undefined reference to `nfsd4_revoke_states'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/4] nfsd: prepare for supporting admin-revocation of state
@ 2022-01-27 14:51     ` kernel test robot
  0 siblings, 0 replies; 22+ messages in thread
From: kernel test robot @ 2022-01-27 14:51 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 1904 bytes --]

Hi NeilBrown,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.17-rc1 next-20220127]
[cannot apply to cel-2.6/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/NeilBrown/nfsd-allow-NFSv4-state-to-be-revoked/20220127-130132
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 0280e3c58f92b2fe0e8fbbdf8d386449168de4a8
config: s390-randconfig-r004-20220126 (https://download.01.org/0day-ci/archive/20220127/202201272253.LNgeJGaJ-lkp(a)intel.com/config)
compiler: s390-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/36a49da0c1204b769399e692f540bf2a6332f89a
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review NeilBrown/nfsd-allow-NFSv4-state-to-be-revoked/20220127-130132
        git checkout 36a49da0c1204b769399e692f540bf2a6332f89a
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=s390 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   s390-linux-ld: fs/nfsd/nfsctl.o: in function `write_unlock_fs':
>> nfsctl.c:(.text+0x302): undefined reference to `nfsd4_revoke_states'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/4] nfsd: allow NFSv4 state to be revoked.
  2022-01-27  4:58 [RFC PATCH 0/4] nfsd: allow NFSv4 state to be revoked NeilBrown
                   ` (3 preceding siblings ...)
  2022-01-27  4:58 ` [PATCH 2/4] nfsd: allow open " NeilBrown
@ 2022-01-27 17:00 ` Chuck Lever III
  2022-01-27 22:41   ` NeilBrown
  2022-01-27 20:06 ` J. Bruce Fields
  5 siblings, 1 reply; 22+ messages in thread
From: Chuck Lever III @ 2022-01-27 17:00 UTC (permalink / raw)
  To: Neil Brown; +Cc: Linux NFS Mailing List

Hi Neil-

> On Jan 26, 2022, at 11:58 PM, NeilBrown <neilb@suse.de> wrote:
> 
> If a filesystem is exported to a client with NFSv4 and that client holds
> a file open, the filesystem cannot be unmounted without either stopping the
> NFS server completely, or blocking all access from that client
> (unexporting all filesystems) and waiting for the lease timeout.
> 
> For NFSv3 - and particularly NLM - it is possible to revoke all state by
> writing the path to the filesystem into /proc/fs/nfsd/unlock_filesystem.
> 
> This series extends this functionality to NFSv4.  With this, to unmount
> an exported filesystem is it sufficient to disable export of that
> filesystem, and then write the path to unlock_filesystem.
> 
> I've cursed mainly on NFSv4.1 and later for this.  I haven't tested
> yet with NFSv4.0 which has different mechanisms for state management.
> 
> If this series is seen as a general acceptable approach, I'll look into
> the NFSv4.0 aspects properly and make sure it works there.

I've browsed this series and need to think about:
- whether we want to enable administrative state revocation and
- whether NFSv4.0 can support that reasonably

In particular, are there security consequences for revoking
state? What would applications see, and would that depend on
which minor version is in use? Are there data corruption risks
if this facility were to be misused?

Also, Dai's courteous server work is something that potentially
conflicts with some of this, and I'd like to see that go in
first.

Do you have specific user requests for this feature, and if so,
what are the particular usage scenarios?


> Thanks,
> NeilBrown
> 
> 
> ---
> 
> NeilBrown (4):
>      nfsd: prepare for supporting admin-revocation of state
>      nfsd: allow open state ids to be revoked and then freed
>      nfsd: allow lock state ids to be revoked and then freed
>      nfsd: allow delegation state ids to be revoked and then freed
> 
> 
> fs/nfsd/nfs4state.c | 105 ++++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 96 insertions(+), 9 deletions(-)
> 
> --
> Signature

You should fill this in. :-)

--
Chuck Lever




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/4] nfsd: allow NFSv4 state to be revoked.
  2022-01-27  4:58 [RFC PATCH 0/4] nfsd: allow NFSv4 state to be revoked NeilBrown
                   ` (4 preceding siblings ...)
  2022-01-27 17:00 ` [RFC PATCH 0/4] nfsd: allow NFSv4 state to be revoked Chuck Lever III
@ 2022-01-27 20:06 ` J. Bruce Fields
  5 siblings, 0 replies; 22+ messages in thread
From: J. Bruce Fields @ 2022-01-27 20:06 UTC (permalink / raw)
  To: NeilBrown; +Cc: Chuck Lever, linux-nfs

On Thu, Jan 27, 2022 at 03:58:10PM +1100, NeilBrown wrote:
> If a filesystem is exported to a client with NFSv4 and that client holds
> a file open, the filesystem cannot be unmounted without either stopping the
> NFS server completely, or blocking all access from that client
> (unexporting all filesystems) and waiting for the lease timeout.
> 
> For NFSv3 - and particularly NLM - it is possible to revoke all state by
> writing the path to the filesystem into /proc/fs/nfsd/unlock_filesystem.
> 
> This series extends this functionality to NFSv4.  With this, to unmount
> an exported filesystem is it sufficient to disable export of that
> filesystem, and then write the path to unlock_filesystem.

It's always been weird that /proc/fs/nfsd/unlock_filesystem was v3-only,
so thanks for looking into extending it to v4.

You can accomplish the same by stopping the server, unexporting, then
restarting, but then applications may see grace-period-length delays.
So in a way this is just an optimization for what's probably a rare
operation.  Probably worth it, but I'd still be curious if there's any
specific motivating cases you can share.

I guess the procedure would be to unexport and then write to
/proc/fs/nfsd/unlock_filesystem?  An option to exportfs to do both might
be handy.

> I've cursed mainly on NFSv4.1

It does inspire strong feelings sometimes.

--b.

> and later for this.  I haven't tested
> yet with NFSv4.0 which has different mechanisms for state management.
> 
> If this series is seen as a general acceptable approach, I'll look into
> the NFSv4.0 aspects properly and make sure it works there.
> 
> Thanks,
> NeilBrown
> 
> 
> ---
> 
> NeilBrown (4):
>       nfsd: prepare for supporting admin-revocation of state
>       nfsd: allow open state ids to be revoked and then freed
>       nfsd: allow lock state ids to be revoked and then freed
>       nfsd: allow delegation state ids to be revoked and then freed
> 
> 
>  fs/nfsd/nfs4state.c | 105 ++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 96 insertions(+), 9 deletions(-)
> 
> --
> Signature

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/4] nfsd: allow NFSv4 state to be revoked.
  2022-01-27 17:00 ` [RFC PATCH 0/4] nfsd: allow NFSv4 state to be revoked Chuck Lever III
@ 2022-01-27 22:41   ` NeilBrown
  2022-01-27 23:15     ` dai.ngo
                       ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: NeilBrown @ 2022-01-27 22:41 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Linux NFS Mailing List

On Fri, 28 Jan 2022, Chuck Lever III wrote:
> Hi Neil-
> 
> > On Jan 26, 2022, at 11:58 PM, NeilBrown <neilb@suse.de> wrote:
> > 
> > If a filesystem is exported to a client with NFSv4 and that client holds
> > a file open, the filesystem cannot be unmounted without either stopping the
> > NFS server completely, or blocking all access from that client
> > (unexporting all filesystems) and waiting for the lease timeout.
> > 
> > For NFSv3 - and particularly NLM - it is possible to revoke all state by
> > writing the path to the filesystem into /proc/fs/nfsd/unlock_filesystem.
> > 
> > This series extends this functionality to NFSv4.  With this, to unmount
> > an exported filesystem is it sufficient to disable export of that
> > filesystem, and then write the path to unlock_filesystem.
> > 
> > I've cursed mainly on NFSv4.1 and later for this.  I haven't tested
> > yet with NFSv4.0 which has different mechanisms for state management.
> > 
> > If this series is seen as a general acceptable approach, I'll look into
> > the NFSv4.0 aspects properly and make sure it works there.
> 
> I've browsed this series and need to think about:
> - whether we want to enable administrative state revocation and
> - whether NFSv4.0 can support that reasonably
> 
> In particular, are there security consequences for revoking
> state? What would applications see, and would that depend on
> which minor version is in use? Are there data corruption risks
> if this facility were to be misused?

The expectation is that this would only be used after unexporting the
filesystem.  In that case, the client wouldn't notice any difference
from the act of writing to unlock_filesystem, as the entire filesystem
would already be inaccessible.

If we did unlock_filesystem a filesystem that was still exported, the
client would see similar behaviour to a network partition that was of
longer duration than the lease time.   Locks would be lost.

> 
> Also, Dai's courteous server work is something that potentially
> conflicts with some of this, and I'd like to see that go in
> first.

I'm perfectly happy to wait for the courteous server work to land before
pursuing this.

> 
> Do you have specific user requests for this feature, and if so,
> what are the particular usage scenarios?

It's complicated....

The customer has an HA config with multiple filesystem resource which
they want to be able to migrate independently.  I don't think we really
support that, but they seem to want to see if they can make it work (and
it should be noted that I talk to an L2 support technician who talks to
the customer representative, so I might be getting the full story).

Customer reported that even after unexporting a filesystem, they cannot
then unmount it.  Whether or not we think that independent filesystem
resources is supportable, I do think that the customer should have a
clear path for unmounting a filesystem without interfering with service
provided from other filesystems.  Stopping nfsd would interfere with
that service by forcing a grace-period on all filesystems.
The RFC explicitly supports admin-revocation of state, and that would
address this specific need, so it seemed completely appropriate to
provide it.

As an aside ...  I'd like to be able to suggest that the customer use
network namespaces for the different filesystem resources.  Each could
be in its own namespace and managed independently.  However I don't
think we have good admin infrastructure for that do we?

I'd like to be able to say "set up these 2 or 3 config files and run 
systemctl start nfs-server@foo and the 'foo' network namespace will be
created, configured, and have an nfs server running".
Do we have anything approaching that?  Even a HOWTO ??

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/4] nfsd: allow NFSv4 state to be revoked.
  2022-01-27 22:41   ` NeilBrown
@ 2022-01-27 23:15     ` dai.ngo
  2022-01-28  0:07     ` Chuck Lever III
  2022-01-28  2:51     ` J. Bruce Fields
  2 siblings, 0 replies; 22+ messages in thread
From: dai.ngo @ 2022-01-27 23:15 UTC (permalink / raw)
  To: NeilBrown, Chuck Lever III; +Cc: Linux NFS Mailing List


On 1/27/22 2:41 PM, NeilBrown wrote:
> On Fri, 28 Jan 2022, Chuck Lever III wrote:
>> Hi Neil-
>>
>>> On Jan 26, 2022, at 11:58 PM, NeilBrown <neilb@suse.de> wrote:
>>>
>>> If a filesystem is exported to a client with NFSv4 and that client holds
>>> a file open, the filesystem cannot be unmounted without either stopping the
>>> NFS server completely, or blocking all access from that client
>>> (unexporting all filesystems) and waiting for the lease timeout.
>>>
>>> For NFSv3 - and particularly NLM - it is possible to revoke all state by
>>> writing the path to the filesystem into /proc/fs/nfsd/unlock_filesystem.
>>>
>>> This series extends this functionality to NFSv4.  With this, to unmount
>>> an exported filesystem is it sufficient to disable export of that
>>> filesystem, and then write the path to unlock_filesystem.
>>>
>>> I've cursed mainly on NFSv4.1 and later for this.  I haven't tested
>>> yet with NFSv4.0 which has different mechanisms for state management.
>>>
>>> If this series is seen as a general acceptable approach, I'll look into
>>> the NFSv4.0 aspects properly and make sure it works there.
>> I've browsed this series and need to think about:
>> - whether we want to enable administrative state revocation and
>> - whether NFSv4.0 can support that reasonably
>>
>> In particular, are there security consequences for revoking
>> state? What would applications see, and would that depend on
>> which minor version is in use? Are there data corruption risks
>> if this facility were to be misused?
> The expectation is that this would only be used after unexporting the
> filesystem.  In that case, the client wouldn't notice any difference
> from the act of writing to unlock_filesystem, as the entire filesystem
> would already be inaccessible.
>
> If we did unlock_filesystem a filesystem that was still exported, the
> client would see similar behaviour to a network partition that was of
> longer duration than the lease time.   Locks would be lost.
>
>> Also, Dai's courteous server work is something that potentially
>> conflicts with some of this, and I'd like to see that go in
>> first.
> I'm perfectly happy to wait for the courteous server work to land before
> pursuing this.

Thank you Chuck and Neil, I'm chasing a couple intermittent share
reservation related problems with pynfs test. I hope to have them
resolved and submit the v10 patch by end of this week.

-Dai

>
>> Do you have specific user requests for this feature, and if so,
>> what are the particular usage scenarios?
> It's complicated....
>
> The customer has an HA config with multiple filesystem resource which
> they want to be able to migrate independently.  I don't think we really
> support that, but they seem to want to see if they can make it work (and
> it should be noted that I talk to an L2 support technician who talks to
> the customer representative, so I might be getting the full story).
>
> Customer reported that even after unexporting a filesystem, they cannot
> then unmount it.  Whether or not we think that independent filesystem
> resources is supportable, I do think that the customer should have a
> clear path for unmounting a filesystem without interfering with service
> provided from other filesystems.  Stopping nfsd would interfere with
> that service by forcing a grace-period on all filesystems.
> The RFC explicitly supports admin-revocation of state, and that would
> address this specific need, so it seemed completely appropriate to
> provide it.
>
> As an aside ...  I'd like to be able to suggest that the customer use
> network namespaces for the different filesystem resources.  Each could
> be in its own namespace and managed independently.  However I don't
> think we have good admin infrastructure for that do we?
>
> I'd like to be able to say "set up these 2 or 3 config files and run
> systemctl start nfs-server@foo and the 'foo' network namespace will be
> created, configured, and have an nfs server running".
> Do we have anything approaching that?  Even a HOWTO ??
>
> Thanks,
> NeilBrown

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/4] nfsd: allow NFSv4 state to be revoked.
  2022-01-27 22:41   ` NeilBrown
  2022-01-27 23:15     ` dai.ngo
@ 2022-01-28  0:07     ` Chuck Lever III
  2022-01-28  4:24       ` NeilBrown
  2022-01-28  2:51     ` J. Bruce Fields
  2 siblings, 1 reply; 22+ messages in thread
From: Chuck Lever III @ 2022-01-28  0:07 UTC (permalink / raw)
  To: Neil Brown; +Cc: Linux NFS Mailing List



> On Jan 27, 2022, at 5:41 PM, NeilBrown <neilb@suse.de> wrote:
> 
> On Fri, 28 Jan 2022, Chuck Lever III wrote:
>> Hi Neil-
>> 
>>> On Jan 26, 2022, at 11:58 PM, NeilBrown <neilb@suse.de> wrote:
>>> 
>>> If a filesystem is exported to a client with NFSv4 and that client holds
>>> a file open, the filesystem cannot be unmounted without either stopping the
>>> NFS server completely, or blocking all access from that client
>>> (unexporting all filesystems) and waiting for the lease timeout.
>>> 
>>> For NFSv3 - and particularly NLM - it is possible to revoke all state by
>>> writing the path to the filesystem into /proc/fs/nfsd/unlock_filesystem.
>>> 
>>> This series extends this functionality to NFSv4.  With this, to unmount
>>> an exported filesystem is it sufficient to disable export of that
>>> filesystem, and then write the path to unlock_filesystem.
>>> 
>>> I've cursed mainly on NFSv4.1 and later for this.  I haven't tested
>>> yet with NFSv4.0 which has different mechanisms for state management.
>>> 
>>> If this series is seen as a general acceptable approach, I'll look into
>>> the NFSv4.0 aspects properly and make sure it works there.
>> 
>> I've browsed this series and need to think about:
>> - whether we want to enable administrative state revocation and
>> - whether NFSv4.0 can support that reasonably
>> 
>> In particular, are there security consequences for revoking
>> state? What would applications see, and would that depend on
>> which minor version is in use? Are there data corruption risks
>> if this facility were to be misused?
> 
> The expectation is that this would only be used after unexporting the
> filesystem.  In that case, the client wouldn't notice any difference
> from the act of writing to unlock_filesystem, as the entire filesystem
> would already be inaccessible.
> 
> If we did unlock_filesystem a filesystem that was still exported, the
> client would see similar behaviour to a network partition that was of
> longer duration than the lease time.   Locks would be lost.
> 
>> 
>> Also, Dai's courteous server work is something that potentially
>> conflicts with some of this, and I'd like to see that go in
>> first.
> 
> I'm perfectly happy to wait for the courteous server work to land before
> pursuing this.
> 
>> 
>> Do you have specific user requests for this feature, and if so,
>> what are the particular usage scenarios?
> 
> It's complicated....
> 
> The customer has an HA config with multiple filesystem resource which
> they want to be able to migrate independently.  I don't think we really
> support that,

With NFSv4, the protocol has mechanisms to indicate to clients that
a shared filesystem has migrated, and to indicate that the clients'
state has been migrated too. Clients can reclaim their state if the
servers did not migrate that state with the data. It deals with the
edge cases to prevent clients from stealing open/lock state during
the migration.

Unexporting doesn't seem like the right approach to that.


> but they seem to want to see if they can make it work (and
> it should be noted that I talk to an L2 support technician who talks to
> the customer representative, so I might be getting the full story).
> 
> Customer reported that even after unexporting a filesystem, they cannot
> then unmount it.

My first thought is that probably clients are still pinning
resources on that shared filesystem. I guess that's what the
unlock_ interface is supposed to deal with. But that suggests
to me that unexporting first is not as risk-free as you
describe above. I think applications would notice and there
would be edge cases where other clients might be able to
grab open/lock state before the original holders could
re-establish their lease.


> Whether or not we think that independent filesystem
> resources is supportable, I do think that the customer should have a
> clear path for unmounting a filesystem without interfering with service
> provided from other filesystems.

Maybe. I guess I put that in the "last resort" category
rather than "this is something safe that I want to do as
part of daily operation" category.


> Stopping nfsd would interfere with
> that service by forcing a grace-period on all filesystems.

Yep. We have discussed implementing a per-filesystem
grace period in the past. That is probably a pre-requisite
to enabling filesystem migration.


> The RFC explicitly supports admin-revocation of state, and that would
> address this specific need, so it seemed completely appropriate to
> provide it.

Well the RFC also provides for migrating filesystems without
stopping the NFS service. If that's truly the goal, then I
think we want to encourage that direction instead of ripping
out open and lock state.

Also, it's not clear to me that clients support administrative
revocation as broadly as we might like. The Linux NFS client
does have support for NFSv4 migration, though it's a bit
fallow these days.


> As an aside ...  I'd like to be able to suggest that the customer use
> network namespaces for the different filesystem resources.  Each could
> be in its own namespace and managed independently.  However I don't
> think we have good admin infrastructure for that do we?

None that I'm aware of. SteveD is the best person to ask.


> I'd like to be able to say "set up these 2 or 3 config files and run 
> systemctl start nfs-server@foo and the 'foo' network namespace will be
> created, configured, and have an nfs server running".
> Do we have anything approaching that?  Even a HOWTO ??

Interesting idea! But doesn't ring a bell.

--
Chuck Lever




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/4] nfsd: allow NFSv4 state to be revoked.
  2022-01-27 22:41   ` NeilBrown
  2022-01-27 23:15     ` dai.ngo
  2022-01-28  0:07     ` Chuck Lever III
@ 2022-01-28  2:51     ` J. Bruce Fields
  2022-01-28  4:14       ` NeilBrown
  2 siblings, 1 reply; 22+ messages in thread
From: J. Bruce Fields @ 2022-01-28  2:51 UTC (permalink / raw)
  To: NeilBrown; +Cc: Chuck Lever III, Linux NFS Mailing List

On Fri, Jan 28, 2022 at 09:41:24AM +1100, NeilBrown wrote:
> It's complicated....
> 
> The customer has an HA config with multiple filesystem resource which
> they want to be able to migrate independently.  I don't think we really
> support that, but they seem to want to see if they can make it work (and
> it should be noted that I talk to an L2 support technician who talks to
> the customer representative, so I might be getting the full story).
> 
> Customer reported that even after unexporting a filesystem, they cannot
> then unmount it.  Whether or not we think that independent filesystem
> resources is supportable, I do think that the customer should have a
> clear path for unmounting a filesystem without interfering with service
> provided from other filesystems.  Stopping nfsd would interfere with
> that service by forcing a grace-period on all filesystems.
> The RFC explicitly supports admin-revocation of state, and that would
> address this specific need, so it seemed completely appropriate to
> provide it.

I was little worried that might be the use-case.

I don't see how it's going to work.  You've got clients that hold locks
an opens on the unexported filesystem.  So maybe you can use an NFSv4
referral to point them to the new server.  Are they going to try to
issue reclaims to the new server?  There's more to do before this works.

> As an aside ...  I'd like to be able to suggest that the customer use
> network namespaces for the different filesystem resources.  Each could
> be in its own namespace and managed independently.

Yeah.  Then you're basically migrating the whole server, not just the
one export, and that's more of a solved problem.

> However I don't think we have good admin infrastructure for that do
> we?
> 
> I'd like to be able to say "set up these 2 or 3 config files and run
> systemctl start nfs-server@foo and the 'foo' network namespace will be
> created, configured, and have an nfs server running".  Do we have
> anything approaching that?  Even a HOWTO ??

But I don't think we've got anything that simple yet?

--b.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/4] nfsd: allow NFSv4 state to be revoked.
  2022-01-28  2:51     ` J. Bruce Fields
@ 2022-01-28  4:14       ` NeilBrown
  2022-01-28 13:38         ` J. Bruce Fields
  0 siblings, 1 reply; 22+ messages in thread
From: NeilBrown @ 2022-01-28  4:14 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Chuck Lever III, Linux NFS Mailing List

On Fri, 28 Jan 2022, J. Bruce Fields wrote:
> On Fri, Jan 28, 2022 at 09:41:24AM +1100, NeilBrown wrote:
> > It's complicated....
> > 
> > The customer has an HA config with multiple filesystem resource which
> > they want to be able to migrate independently.  I don't think we really
> > support that, but they seem to want to see if they can make it work (and
> > it should be noted that I talk to an L2 support technician who talks to
> > the customer representative, so I might be getting the full story).
> > 
> > Customer reported that even after unexporting a filesystem, they cannot
> > then unmount it.  Whether or not we think that independent filesystem
> > resources is supportable, I do think that the customer should have a
> > clear path for unmounting a filesystem without interfering with service
> > provided from other filesystems.  Stopping nfsd would interfere with
> > that service by forcing a grace-period on all filesystems.
> > The RFC explicitly supports admin-revocation of state, and that would
> > address this specific need, so it seemed completely appropriate to
> > provide it.
> 
> I was little worried that might be the use-case.
> 
> 

:-)

> I don't see how it's going to work.  You've got clients that hold locks
> an opens on the unexported filesystem.  So maybe you can use an NFSv4
> referral to point them to the new server.  Are they going to try to
> issue reclaims to the new server?  There's more to do before this works.

As I hope I implied, I'm not at all sure that the specific problem that
the customer raised (cannot unmount a filesystem) directly related to
the general solution that the customer is trying to create.  Some
customers like us to hold their hand the whole way, others like to (feel
that they) have more control.  In general I like to encourage
independence (but I have to consciously avoid trusting the results).

We have an "unlock_filesystem" interface.  I want it to work for NFSv4. 
The HA config was background, not a complete motivation.

> 
> > As an aside ...  I'd like to be able to suggest that the customer use
> > network namespaces for the different filesystem resources.  Each could
> > be in its own namespace and managed independently.
> 
> Yeah.  Then you're basically migrating the whole server, not just the
> one export, and that's more of a solved problem.

Exactly.

> 
> > However I don't think we have good admin infrastructure for that do
> > we?
> > 
> > I'd like to be able to say "set up these 2 or 3 config files and run
> > systemctl start nfs-server@foo and the 'foo' network namespace will be
> > created, configured, and have an nfs server running".  Do we have
> > anything approaching that?  Even a HOWTO ??
> 
> But I don't think we've got anything that simple yet?

I guess I have some work to do....

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/4] nfsd: allow NFSv4 state to be revoked.
  2022-01-28  0:07     ` Chuck Lever III
@ 2022-01-28  4:24       ` NeilBrown
  2022-01-28  4:35         ` Trond Myklebust
  2022-01-28 13:46         ` Chuck Lever III
  0 siblings, 2 replies; 22+ messages in thread
From: NeilBrown @ 2022-01-28  4:24 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Linux NFS Mailing List

On Fri, 28 Jan 2022, Chuck Lever III wrote:
> 
> > On Jan 27, 2022, at 5:41 PM, NeilBrown <neilb@suse.de> wrote:
> > 
> > On Fri, 28 Jan 2022, Chuck Lever III wrote:
> >> Hi Neil-
> >> 
> >>> On Jan 26, 2022, at 11:58 PM, NeilBrown <neilb@suse.de> wrote:
> >>> 
> >>> If a filesystem is exported to a client with NFSv4 and that client holds
> >>> a file open, the filesystem cannot be unmounted without either stopping the
> >>> NFS server completely, or blocking all access from that client
> >>> (unexporting all filesystems) and waiting for the lease timeout.
> >>> 
> >>> For NFSv3 - and particularly NLM - it is possible to revoke all state by
> >>> writing the path to the filesystem into /proc/fs/nfsd/unlock_filesystem.
> >>> 
> >>> This series extends this functionality to NFSv4.  With this, to unmount
> >>> an exported filesystem is it sufficient to disable export of that
> >>> filesystem, and then write the path to unlock_filesystem.
> >>> 
> >>> I've cursed mainly on NFSv4.1 and later for this.  I haven't tested
> >>> yet with NFSv4.0 which has different mechanisms for state management.
> >>> 
> >>> If this series is seen as a general acceptable approach, I'll look into
> >>> the NFSv4.0 aspects properly and make sure it works there.
> >> 
> >> I've browsed this series and need to think about:
> >> - whether we want to enable administrative state revocation and
> >> - whether NFSv4.0 can support that reasonably
> >> 
> >> In particular, are there security consequences for revoking
> >> state? What would applications see, and would that depend on
> >> which minor version is in use? Are there data corruption risks
> >> if this facility were to be misused?
> > 
> > The expectation is that this would only be used after unexporting the
> > filesystem.  In that case, the client wouldn't notice any difference
> > from the act of writing to unlock_filesystem, as the entire filesystem
> > would already be inaccessible.
> > 
> > If we did unlock_filesystem a filesystem that was still exported, the
> > client would see similar behaviour to a network partition that was of
> > longer duration than the lease time.   Locks would be lost.
> > 
> >> 
> >> Also, Dai's courteous server work is something that potentially
> >> conflicts with some of this, and I'd like to see that go in
> >> first.
> > 
> > I'm perfectly happy to wait for the courteous server work to land before
> > pursuing this.
> > 
> >> 
> >> Do you have specific user requests for this feature, and if so,
> >> what are the particular usage scenarios?
> > 
> > It's complicated....
> > 
> > The customer has an HA config with multiple filesystem resource which
> > they want to be able to migrate independently.  I don't think we really
> > support that,
> 
> With NFSv4, the protocol has mechanisms to indicate to clients that
> a shared filesystem has migrated, and to indicate that the clients'
> state has been migrated too. Clients can reclaim their state if the
> servers did not migrate that state with the data. It deals with the
> edge cases to prevent clients from stealing open/lock state during
> the migration.
> 
> Unexporting doesn't seem like the right approach to that.

No, but it something that should work, and should allow the filesystem
to be unmounted.  You get to keep both halves.

> 
> 
> > but they seem to want to see if they can make it work (and
> > it should be noted that I talk to an L2 support technician who talks to
> > the customer representative, so I might be getting the full story).
> > 
> > Customer reported that even after unexporting a filesystem, they cannot
> > then unmount it.
> 
> My first thought is that probably clients are still pinning
> resources on that shared filesystem. I guess that's what the
> unlock_ interface is supposed to deal with. But that suggests
> to me that unexporting first is not as risk-free as you
> describe above. I think applications would notice and there
> would be edge cases where other clients might be able to
> grab open/lock state before the original holders could
> re-establish their lease.

Unexporting isn't risk free.  It just absorbs all the risks - none are
left of unlock_filesystem to be blamed for.

Expecting an application to recover if you unexport a filesystem and
later re-export it is certainly not guaranteed.  That isn't the use-case
I particularly want to fix.  I want to be able to unmount a filesystem
without visiting call clients and killing off applications.

> 
> 
> > Whether or not we think that independent filesystem
> > resources is supportable, I do think that the customer should have a
> > clear path for unmounting a filesystem without interfering with service
> > provided from other filesystems.
> 
> Maybe. I guess I put that in the "last resort" category
> rather than "this is something safe that I want to do as
> part of daily operation" category.

Agree.  Definitely "last resort".

> 
> 
> > Stopping nfsd would interfere with
> > that service by forcing a grace-period on all filesystems.
> 
> Yep. We have discussed implementing a per-filesystem
> grace period in the past. That is probably a pre-requisite
> to enabling filesystem migration.
> 
> 
> > The RFC explicitly supports admin-revocation of state, and that would
> > address this specific need, so it seemed completely appropriate to
> > provide it.
> 
> Well the RFC also provides for migrating filesystems without
> stopping the NFS service. If that's truly the goal, then I
> think we want to encourage that direction instead of ripping
> out open and lock state.

I suspect that virtual IPs and network namespaces is the better approach
for migrating exported filesystems.  It isn't clear to me that
integrated migration support in NFS would add anything of value.

But as I think I said to Bruce - seamless migration support is not my
goal here.  In the context where a site has multiple filesystems that
are all NFS exported, there is a case for being able to forcibly
unexport/unmount one filesystem without affecting the others.  That is
my aim here.

Thanks,
NeilBrown


> 
> Also, it's not clear to me that clients support administrative
> revocation as broadly as we might like. The Linux NFS client
> does have support for NFSv4 migration, though it's a bit
> fallow these days.
> 
> 
> > As an aside ...  I'd like to be able to suggest that the customer use
> > network namespaces for the different filesystem resources.  Each could
> > be in its own namespace and managed independently.  However I don't
> > think we have good admin infrastructure for that do we?
> 
> None that I'm aware of. SteveD is the best person to ask.
> 
> 
> > I'd like to be able to say "set up these 2 or 3 config files and run 
> > systemctl start nfs-server@foo and the 'foo' network namespace will be
> > created, configured, and have an nfs server running".
> > Do we have anything approaching that?  Even a HOWTO ??
> 
> Interesting idea! But doesn't ring a bell.
> 
> --
> Chuck Lever
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/4] nfsd: allow NFSv4 state to be revoked.
  2022-01-28  4:24       ` NeilBrown
@ 2022-01-28  4:35         ` Trond Myklebust
  2022-01-28 13:46         ` Chuck Lever III
  1 sibling, 0 replies; 22+ messages in thread
From: Trond Myklebust @ 2022-01-28  4:35 UTC (permalink / raw)
  To: neilb, chuck.lever; +Cc: linux-nfs

On Fri, 2022-01-28 at 15:24 +1100, NeilBrown wrote:
> On Fri, 28 Jan 2022, Chuck Lever III wrote:
> > 
> > > On Jan 27, 2022, at 5:41 PM, NeilBrown <neilb@suse.de> wrote:
> > > 
> > > On Fri, 28 Jan 2022, Chuck Lever III wrote:
> > > > Hi Neil-
> > > > 
> > > > > On Jan 26, 2022, at 11:58 PM, NeilBrown <neilb@suse.de>
> > > > > wrote:
> > > > > 
> > > > > If a filesystem is exported to a client with NFSv4 and that
> > > > > client holds
> > > > > a file open, the filesystem cannot be unmounted without
> > > > > either stopping the
> > > > > NFS server completely, or blocking all access from that
> > > > > client
> > > > > (unexporting all filesystems) and waiting for the lease
> > > > > timeout.
> > > > > 
> > > > > For NFSv3 - and particularly NLM - it is possible to revoke
> > > > > all state by
> > > > > writing the path to the filesystem into
> > > > > /proc/fs/nfsd/unlock_filesystem.
> > > > > 
> > > > > This series extends this functionality to NFSv4.  With this,
> > > > > to unmount
> > > > > an exported filesystem is it sufficient to disable export of
> > > > > that
> > > > > filesystem, and then write the path to unlock_filesystem.
> > > > > 
> > > > > I've cursed mainly on NFSv4.1 and later for this.  I haven't
> > > > > tested
> > > > > yet with NFSv4.0 which has different mechanisms for state
> > > > > management.
> > > > > 
> > > > > If this series is seen as a general acceptable approach, I'll
> > > > > look into
> > > > > the NFSv4.0 aspects properly and make sure it works there.
> > > > 
> > > > I've browsed this series and need to think about:
> > > > - whether we want to enable administrative state revocation and
> > > > - whether NFSv4.0 can support that reasonably
> > > > 
> > > > In particular, are there security consequences for revoking
> > > > state? What would applications see, and would that depend on
> > > > which minor version is in use? Are there data corruption risks
> > > > if this facility were to be misused?
> > > 
> > > The expectation is that this would only be used after unexporting
> > > the
> > > filesystem.  In that case, the client wouldn't notice any
> > > difference
> > > from the act of writing to unlock_filesystem, as the entire
> > > filesystem
> > > would already be inaccessible.
> > > 
> > > If we did unlock_filesystem a filesystem that was still exported,
> > > the
> > > client would see similar behaviour to a network partition that
> > > was of
> > > longer duration than the lease time.   Locks would be lost.
> > > 
> > > > 
> > > > Also, Dai's courteous server work is something that potentially
> > > > conflicts with some of this, and I'd like to see that go in
> > > > first.
> > > 
> > > I'm perfectly happy to wait for the courteous server work to land
> > > before
> > > pursuing this.
> > > 
> > > > 
> > > > Do you have specific user requests for this feature, and if so,
> > > > what are the particular usage scenarios?
> > > 
> > > It's complicated....
> > > 
> > > The customer has an HA config with multiple filesystem resource
> > > which
> > > they want to be able to migrate independently.  I don't think we
> > > really
> > > support that,
> > 
> > With NFSv4, the protocol has mechanisms to indicate to clients that
> > a shared filesystem has migrated, and to indicate that the clients'
> > state has been migrated too. Clients can reclaim their state if the
> > servers did not migrate that state with the data. It deals with the
> > edge cases to prevent clients from stealing open/lock state during
> > the migration.
> > 
> > Unexporting doesn't seem like the right approach to that.
> 
> No, but it something that should work, and should allow the
> filesystem
> to be unmounted.  You get to keep both halves.
> 
> > 
> > 
> > > but they seem to want to see if they can make it work (and
> > > it should be noted that I talk to an L2 support technician who
> > > talks to
> > > the customer representative, so I might be getting the full
> > > story).
> > > 
> > > Customer reported that even after unexporting a filesystem, they
> > > cannot
> > > then unmount it.
> > 
> > My first thought is that probably clients are still pinning
> > resources on that shared filesystem. I guess that's what the
> > unlock_ interface is supposed to deal with. But that suggests
> > to me that unexporting first is not as risk-free as you
> > describe above. I think applications would notice and there
> > would be edge cases where other clients might be able to
> > grab open/lock state before the original holders could
> > re-establish their lease.
> 
> Unexporting isn't risk free.  It just absorbs all the risks - none
> are
> left of unlock_filesystem to be blamed for.
> 
> Expecting an application to recover if you unexport a filesystem and
> later re-export it is certainly not guaranteed.  That isn't the use-
> case
> I particularly want to fix.  I want to be able to unmount a
> filesystem
> without visiting call clients and killing off applications.
> 
> > 
> > 
> > > Whether or not we think that independent filesystem
> > > resources is supportable, I do think that the customer should
> > > have a
> > > clear path for unmounting a filesystem without interfering with
> > > service
> > > provided from other filesystems.
> > 
> > Maybe. I guess I put that in the "last resort" category
> > rather than "this is something safe that I want to do as
> > part of daily operation" category.
> 
> Agree.  Definitely "last resort".
> 
> > 
> > 
> > > Stopping nfsd would interfere with
> > > that service by forcing a grace-period on all filesystems.
> > 
> > Yep. We have discussed implementing a per-filesystem
> > grace period in the past. That is probably a pre-requisite
> > to enabling filesystem migration.
> > 
> > 
> > > The RFC explicitly supports admin-revocation of state, and that
> > > would
> > > address this specific need, so it seemed completely appropriate
> > > to
> > > provide it.
> > 
> > Well the RFC also provides for migrating filesystems without
> > stopping the NFS service. If that's truly the goal, then I
> > think we want to encourage that direction instead of ripping
> > out open and lock state.
> 
> I suspect that virtual IPs and network namespaces is the better
> approach
> for migrating exported filesystems.  It isn't clear to me that
> integrated migration support in NFS would add anything of value.

No, but referrals allow you to create an arbitrary namespace out of a
set of containerised knfsd instances. It really wouldn't be hard to
convert an existing setup into something that gives you the single-
filesystem migration capabilities you're asking for.

> 

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/4] nfsd: allow NFSv4 state to be revoked.
  2022-01-28  4:14       ` NeilBrown
@ 2022-01-28 13:38         ` J. Bruce Fields
  2022-01-28 16:23           ` J. Bruce Fields
  0 siblings, 1 reply; 22+ messages in thread
From: J. Bruce Fields @ 2022-01-28 13:38 UTC (permalink / raw)
  To: NeilBrown; +Cc: Chuck Lever III, Linux NFS Mailing List

On Fri, Jan 28, 2022 at 03:14:51PM +1100, NeilBrown wrote:
> On Fri, 28 Jan 2022, J. Bruce Fields wrote:
> > I don't see how it's going to work.  You've got clients that hold locks
> > an opens on the unexported filesystem.  So maybe you can use an NFSv4
> > referral to point them to the new server.  Are they going to try to
> > issue reclaims to the new server?  There's more to do before this works.
> 
> As I hope I implied, I'm not at all sure that the specific problem that
> the customer raised (cannot unmount a filesystem) directly related to
> the general solution that the customer is trying to create.  Some
> customers like us to hold their hand the whole way, others like to (feel
> that they) have more control.  In general I like to encourage
> independence (but I have to consciously avoid trusting the results).
> 
> We have an "unlock_filesystem" interface.  I want it to work for NFSv4. 
> The HA config was background, not a complete motivation.

I think people do occasionally need to just rip a filesystem out even if
it means IO errors to applications.  And we already do this for NFSv3.
So, I'm inclined to support the idea.

(I also wonder whether some of the code could be a useful step towards
other functionality.)

> > > However I don't think we have good admin infrastructure for that do
> > > we?
> > > 
> > > I'd like to be able to say "set up these 2 or 3 config files and run
> > > systemctl start nfs-server@foo and the 'foo' network namespace will be
> > > created, configured, and have an nfs server running".  Do we have
> > > anything approaching that?  Even a HOWTO ??
> > 
> > But I don't think we've got anything that simple yet?
> 
> I guess I have some work to do....

RHEL HA does support NFS failover using containers.  I think it's a bit
more complicated than you're looking for.  Let me go dig that up....

With a KVM VM and shared backend storage I think it's pretty easy: just
shut down the VM on one machine and bring it up on another.

--b.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/4] nfsd: allow NFSv4 state to be revoked.
  2022-01-28  4:24       ` NeilBrown
  2022-01-28  4:35         ` Trond Myklebust
@ 2022-01-28 13:46         ` Chuck Lever III
  2022-01-28 15:03           ` J. Bruce Fields
  1 sibling, 1 reply; 22+ messages in thread
From: Chuck Lever III @ 2022-01-28 13:46 UTC (permalink / raw)
  To: Neil Brown; +Cc: Linux NFS Mailing List



> On Jan 27, 2022, at 11:24 PM, NeilBrown <neilb@suse.de> wrote:
> 
> On Fri, 28 Jan 2022, Chuck Lever III wrote:
>> 
>>> On Jan 27, 2022, at 5:41 PM, NeilBrown <neilb@suse.de> wrote:
>>> 
>>> On Fri, 28 Jan 2022, Chuck Lever III wrote:
>>>> Hi Neil-
>>>> 
>>>>> On Jan 26, 2022, at 11:58 PM, NeilBrown <neilb@suse.de> wrote:
>>>>> 
>>>>> If a filesystem is exported to a client with NFSv4 and that client holds
>>>>> a file open, the filesystem cannot be unmounted without either stopping the
>>>>> NFS server completely, or blocking all access from that client
>>>>> (unexporting all filesystems) and waiting for the lease timeout.
>>>>> 
>>>>> For NFSv3 - and particularly NLM - it is possible to revoke all state by
>>>>> writing the path to the filesystem into /proc/fs/nfsd/unlock_filesystem.
>>>>> 
>>>>> This series extends this functionality to NFSv4.  With this, to unmount
>>>>> an exported filesystem is it sufficient to disable export of that
>>>>> filesystem, and then write the path to unlock_filesystem.
>>>>> 
>>>>> I've cursed mainly on NFSv4.1 and later for this.  I haven't tested
>>>>> yet with NFSv4.0 which has different mechanisms for state management.
>>>>> 
>>>>> If this series is seen as a general acceptable approach, I'll look into
>>>>> the NFSv4.0 aspects properly and make sure it works there.
>>>> 
>>>> I've browsed this series and need to think about:
>>>> - whether we want to enable administrative state revocation and
>>>> - whether NFSv4.0 can support that reasonably
>>>> 
>>>> In particular, are there security consequences for revoking
>>>> state? What would applications see, and would that depend on
>>>> which minor version is in use? Are there data corruption risks
>>>> if this facility were to be misused?
>>> 
>>> The expectation is that this would only be used after unexporting the
>>> filesystem.  In that case, the client wouldn't notice any difference
>>> from the act of writing to unlock_filesystem, as the entire filesystem
>>> would already be inaccessible.
>>> 
>>> If we did unlock_filesystem a filesystem that was still exported, the
>>> client would see similar behaviour to a network partition that was of
>>> longer duration than the lease time.   Locks would be lost.
>>> 
>>>> 
>>>> Also, Dai's courteous server work is something that potentially
>>>> conflicts with some of this, and I'd like to see that go in
>>>> first.
>>> 
>>> I'm perfectly happy to wait for the courteous server work to land before
>>> pursuing this.
>>> 
>>>> 
>>>> Do you have specific user requests for this feature, and if so,
>>>> what are the particular usage scenarios?
>>> 
>>> It's complicated....
>>> 
>>> The customer has an HA config with multiple filesystem resource which
>>> they want to be able to migrate independently.  I don't think we really
>>> support that,
>> 
>> With NFSv4, the protocol has mechanisms to indicate to clients that
>> a shared filesystem has migrated, and to indicate that the clients'
>> state has been migrated too. Clients can reclaim their state if the
>> servers did not migrate that state with the data. It deals with the
>> edge cases to prevent clients from stealing open/lock state during
>> the migration.
>> 
>> Unexporting doesn't seem like the right approach to that.
> 
> No, but it something that should work, and should allow the filesystem
> to be unmounted.  You get to keep both halves.
> 
>> 
>> 
>>> but they seem to want to see if they can make it work (and
>>> it should be noted that I talk to an L2 support technician who talks to
>>> the customer representative, so I might be getting the full story).
>>> 
>>> Customer reported that even after unexporting a filesystem, they cannot
>>> then unmount it.
>> 
>> My first thought is that probably clients are still pinning
>> resources on that shared filesystem. I guess that's what the
>> unlock_ interface is supposed to deal with. But that suggests
>> to me that unexporting first is not as risk-free as you
>> describe above. I think applications would notice and there
>> would be edge cases where other clients might be able to
>> grab open/lock state before the original holders could
>> re-establish their lease.
> 
> Unexporting isn't risk free.  It just absorbs all the risks - none are
> left of unlock_filesystem to be blamed for.
> 
> Expecting an application to recover if you unexport a filesystem and
> later re-export it is certainly not guaranteed.  That isn't the use-case
> I particularly want to fix.  I want to be able to unmount a filesystem
> without visiting call clients and killing off applications.

OK. The top level goal then is simply to provide another
arrow in the administrator's quiver to manage a large
NFS server. It brings NFSv4 closer to par with the NFSv3
toolset.

I say we have enough motivation for a full proof of
concept. I would like to see support for minor version 0
added, and a fuller discussion of the consequences for
clients and applications will be needed (at least for
the purpose of administrator documentation).


>>> Whether or not we think that independent filesystem
>>> resources is supportable, I do think that the customer should have a
>>> clear path for unmounting a filesystem without interfering with service
>>> provided from other filesystems.
>> 
>> Maybe. I guess I put that in the "last resort" category
>> rather than "this is something safe that I want to do as
>> part of daily operation" category.
> 
> Agree.  Definitely "last resort".
> 
>> 
>> 
>>> Stopping nfsd would interfere with
>>> that service by forcing a grace-period on all filesystems.
>> 
>> Yep. We have discussed implementing a per-filesystem
>> grace period in the past. That is probably a pre-requisite
>> to enabling filesystem migration.
>> 
>> 
>>> The RFC explicitly supports admin-revocation of state, and that would
>>> address this specific need, so it seemed completely appropriate to
>>> provide it.
>> 
>> Well the RFC also provides for migrating filesystems without
>> stopping the NFS service. If that's truly the goal, then I
>> think we want to encourage that direction instead of ripping
>> out open and lock state.
> 
> I suspect that virtual IPs and network namespaces is the better approach
> for migrating exported filesystems.  It isn't clear to me that
> integrated migration support in NFS would add anything of value.
> 
> But as I think I said to Bruce - seamless migration support is not my
> goal here.  In the context where a site has multiple filesystems that
> are all NFS exported, there is a case for being able to forcibly
> unexport/unmount one filesystem without affecting the others.  That is
> my aim here.

My initial impulse is to better understand what is preventing
the unexported filesystem from being unmounted. Better
observability there could potentially be of value.


> Thanks,
> NeilBrown
> 
> 
>> 
>> Also, it's not clear to me that clients support administrative
>> revocation as broadly as we might like. The Linux NFS client
>> does have support for NFSv4 migration, though it's a bit
>> fallow these days.
>> 
>> 
>>> As an aside ...  I'd like to be able to suggest that the customer use
>>> network namespaces for the different filesystem resources.  Each could
>>> be in its own namespace and managed independently.  However I don't
>>> think we have good admin infrastructure for that do we?
>> 
>> None that I'm aware of. SteveD is the best person to ask.
>> 
>> 
>>> I'd like to be able to say "set up these 2 or 3 config files and run 
>>> systemctl start nfs-server@foo and the 'foo' network namespace will be
>>> created, configured, and have an nfs server running".
>>> Do we have anything approaching that?  Even a HOWTO ??
>> 
>> Interesting idea! But doesn't ring a bell.
>> 
>> --
>> Chuck Lever

--
Chuck Lever




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/4] nfsd: allow NFSv4 state to be revoked.
  2022-01-28 13:46         ` Chuck Lever III
@ 2022-01-28 15:03           ` J. Bruce Fields
  0 siblings, 0 replies; 22+ messages in thread
From: J. Bruce Fields @ 2022-01-28 15:03 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Neil Brown, Linux NFS Mailing List

On Fri, Jan 28, 2022 at 01:46:45PM +0000, Chuck Lever III wrote:
> My initial impulse is to better understand what is preventing
> the unexported filesystem from being unmounted. Better
> observability there could potentially be of value.

In theory that information's in /proc/fs/nfsd/clients/.  Somebody could
write a tool to scan that for state referencing a given filesystem.

--b.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/4] nfsd: allow NFSv4 state to be revoked.
  2022-01-28 13:38         ` J. Bruce Fields
@ 2022-01-28 16:23           ` J. Bruce Fields
  0 siblings, 0 replies; 22+ messages in thread
From: J. Bruce Fields @ 2022-01-28 16:23 UTC (permalink / raw)
  To: NeilBrown; +Cc: Chuck Lever III, Linux NFS Mailing List

On Fri, Jan 28, 2022 at 08:38:43AM -0500, J. Bruce Fields wrote:
> On Fri, Jan 28, 2022 at 03:14:51PM +1100, NeilBrown wrote:
> > On Fri, 28 Jan 2022, J. Bruce Fields wrote:
> > > I don't see how it's going to work.  You've got clients that hold locks
> > > an opens on the unexported filesystem.  So maybe you can use an NFSv4
> > > referral to point them to the new server.  Are they going to try to
> > > issue reclaims to the new server?  There's more to do before this works.
> > 
> > As I hope I implied, I'm not at all sure that the specific problem that
> > the customer raised (cannot unmount a filesystem) directly related to
> > the general solution that the customer is trying to create.  Some
> > customers like us to hold their hand the whole way, others like to (feel
> > that they) have more control.  In general I like to encourage
> > independence (but I have to consciously avoid trusting the results).
> > 
> > We have an "unlock_filesystem" interface.  I want it to work for NFSv4. 
> > The HA config was background, not a complete motivation.
> 
> I think people do occasionally need to just rip a filesystem out even if
> it means IO errors to applications.  And we already do this for NFSv3.
> So, I'm inclined to support the idea.
> 
> (I also wonder whether some of the code could be a useful step towards
> other functionality.)

For example, AFS-like read-only replica update: unmount a filesystem,
mount a new version in its place.  "Reconnecting" locks after the open
would be difficult, I think, but opens should be doable?  And in the
read-only case nobody should care about locks.

--b.


> 
> > > > However I don't think we have good admin infrastructure for that do
> > > > we?
> > > > 
> > > > I'd like to be able to say "set up these 2 or 3 config files and run
> > > > systemctl start nfs-server@foo and the 'foo' network namespace will be
> > > > created, configured, and have an nfs server running".  Do we have
> > > > anything approaching that?  Even a HOWTO ??
> > > 
> > > But I don't think we've got anything that simple yet?
> > 
> > I guess I have some work to do....
> 
> RHEL HA does support NFS failover using containers.  I think it's a bit
> more complicated than you're looking for.  Let me go dig that up....
> 
> With a KVM VM and shared backend storage I think it's pretty easy: just
> shut down the VM on one machine and bring it up on another.
> 
> --b.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2022-01-28 16:23 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-27  4:58 [RFC PATCH 0/4] nfsd: allow NFSv4 state to be revoked NeilBrown
2022-01-27  4:58 ` [PATCH 1/4] nfsd: prepare for supporting admin-revocation of state NeilBrown
2022-01-27 12:59   ` kernel test robot
2022-01-27 12:59     ` kernel test robot
2022-01-27 14:51   ` kernel test robot
2022-01-27 14:51     ` kernel test robot
2022-01-27  4:58 ` [PATCH 3/4] nfsd: allow lock state ids to be revoked and then freed NeilBrown
2022-01-27  4:58 ` [PATCH 4/4] nfsd: allow delegation " NeilBrown
2022-01-27  4:58 ` [PATCH 2/4] nfsd: allow open " NeilBrown
2022-01-27 17:00 ` [RFC PATCH 0/4] nfsd: allow NFSv4 state to be revoked Chuck Lever III
2022-01-27 22:41   ` NeilBrown
2022-01-27 23:15     ` dai.ngo
2022-01-28  0:07     ` Chuck Lever III
2022-01-28  4:24       ` NeilBrown
2022-01-28  4:35         ` Trond Myklebust
2022-01-28 13:46         ` Chuck Lever III
2022-01-28 15:03           ` J. Bruce Fields
2022-01-28  2:51     ` J. Bruce Fields
2022-01-28  4:14       ` NeilBrown
2022-01-28 13:38         ` J. Bruce Fields
2022-01-28 16:23           ` J. Bruce Fields
2022-01-27 20:06 ` J. Bruce Fields

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.