* [PATCH v2 0/5] nfsd: support for lifting grace period early
@ 2014-08-19 18:38 Jeff Layton
  2014-08-19 18:38 ` [PATCH v2 1/5] lockd: move lockd's grace period handling into its own module Jeff Layton
                   ` (5 more replies)
  0 siblings, 6 replies; 37+ messages in thread
From: Jeff Layton @ 2014-08-19 18:38 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

v2:
- move grace period handling into its own module

One of the huge annoyances in dealing with knfsd is the 90s grace period
that's imposed when the server reboots. This is not just an annoyance;
it means a significant amount of "downtime" in many production
environments.

This patchset is aimed at reducing that pain. It adds a couple of /proc
knobs that tell the lockd and nfsd lock managers to lift the grace
period early.

It also changes the UMH upcalls to pass a little bit of extra info in
the form of environment variables so that the upcall program can
determine whether there are still any clients that may be in the process
of reclaiming.
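
As a rough illustration of how the new knobs are meant to be driven
from userland (just a sketch, not part of this series -- the real
callers would be sm-notify and the client tracking program; the paths
are the ones added by patches 2 and 3):

#include <stdio.h>

/* Any non-empty write to one of the new proc files lifts the
 * corresponding grace period. */
static int end_grace(const char *path)
{
        FILE *f = fopen(path, "w");

        if (!f)
                return -1;
        fputs("Y", f);
        return fclose(f);
}

int main(void)
{
        end_grace("/proc/fs/lockd/nlm_end_grace");
        end_grace("/proc/fs/nfsd/v4_end_grace");
        return 0;
}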

There are also a couple of cleanup patches in here that are not strictly
required. In particular, making a separate grace.ko module doesn't have
to be done, but I think it's a good idea.

Jeff Layton (5):
  lockd: move lockd's grace period handling into its own module
  lockd: add a /proc/fs/lockd/nlm_end_grace file
  nfsd: add a v4_end_grace file to /proc/fs/nfsd
  nfsd: remove redundant boot_time parm from grace_done client tracking
    op
  nfsd: pass extra info in env vars to upcalls to allow for early grace
    period end

 fs/Kconfig                       |   6 ++-
 fs/lockd/Makefile                |   3 +-
 fs/lockd/netns.h                 |   1 -
 fs/lockd/procfs.c                |  76 +++++++++++++++++++++++++++
 fs/lockd/procfs.h                |  28 ++++++++++
 fs/lockd/svc.c                   |  10 +++-
 fs/nfs_common/Makefile           |   3 +-
 fs/{lockd => nfs_common}/grace.c |  68 +++++++++++++++++++++----
 fs/nfsd/Kconfig                  |   1 +
 fs/nfsd/nfs4recover.c            | 107 +++++++++++++++++++++++++++++++--------
 fs/nfsd/nfs4state.c              |   8 +--
 fs/nfsd/nfsctl.c                 |  35 +++++++++++++
 fs/nfsd/state.h                  |   5 +-
 include/linux/proc_fs.h          |   2 +
 14 files changed, 312 insertions(+), 41 deletions(-)
 create mode 100644 fs/lockd/procfs.c
 create mode 100644 fs/lockd/procfs.h
 rename fs/{lockd => nfs_common}/grace.c (50%)

-- 
1.9.3



* [PATCH v2 1/5] lockd: move lockd's grace period handling into its own module
  2014-08-19 18:38 [PATCH v2 0/5] nfsd: support for lifting grace period early Jeff Layton
@ 2014-08-19 18:38 ` Jeff Layton
  2014-08-28 20:01   ` J. Bruce Fields
  2014-08-19 18:38 ` [PATCH v2 2/5] lockd: add a /proc/fs/lockd/nlm_end_grace file Jeff Layton
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 37+ messages in thread
From: Jeff Layton @ 2014-08-19 18:38 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

Currently, all of the grace period handling is part of lockd. Eventually
though we'd like to be able to build v4-only servers, at which point
we'll need to put all of this elsewhere.

Move the code itself into fs/nfs_common and have it build a grace.ko
module. Then, rejigger the Kconfig options so that both nfsd and lockd
enable it automatically.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
---
 fs/Kconfig                       |  6 +++-
 fs/lockd/Makefile                |  2 +-
 fs/lockd/netns.h                 |  1 -
 fs/lockd/svc.c                   |  1 -
 fs/nfs_common/Makefile           |  3 +-
 fs/{lockd => nfs_common}/grace.c | 68 ++++++++++++++++++++++++++++++++++------
 fs/nfsd/Kconfig                  |  1 +
 include/linux/proc_fs.h          |  2 ++
 8 files changed, 69 insertions(+), 15 deletions(-)
 rename fs/{lockd => nfs_common}/grace.c (50%)

diff --git a/fs/Kconfig b/fs/Kconfig
index 312393f32948..db5dc1598716 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -233,9 +233,13 @@ if NETWORK_FILESYSTEMS
 source "fs/nfs/Kconfig"
 source "fs/nfsd/Kconfig"
 
+config GRACE_PERIOD
+	tristate
+
 config LOCKD
 	tristate
 	depends on FILE_LOCKING
+	select GRACE_PERIOD
 
 config LOCKD_V4
 	bool
@@ -249,7 +253,7 @@ config NFS_ACL_SUPPORT
 
 config NFS_COMMON
 	bool
-	depends on NFSD || NFS_FS
+	depends on NFSD || NFS_FS || LOCKD
 	default y
 
 source "net/sunrpc/Kconfig"
diff --git a/fs/lockd/Makefile b/fs/lockd/Makefile
index ca58d64374ca..6a0b351ce30e 100644
--- a/fs/lockd/Makefile
+++ b/fs/lockd/Makefile
@@ -5,6 +5,6 @@
 obj-$(CONFIG_LOCKD) += lockd.o
 
 lockd-objs-y := clntlock.o clntproc.o clntxdr.o host.o svc.o svclock.o \
-	        svcshare.o svcproc.o svcsubs.o mon.o xdr.o grace.o
+	        svcshare.o svcproc.o svcsubs.o mon.o xdr.o
 lockd-objs-$(CONFIG_LOCKD_V4) += clnt4xdr.o xdr4.o svc4proc.o
 lockd-objs		      := $(lockd-objs-y)
diff --git a/fs/lockd/netns.h b/fs/lockd/netns.h
index 5010b55628b4..097bfa3adb1c 100644
--- a/fs/lockd/netns.h
+++ b/fs/lockd/netns.h
@@ -11,7 +11,6 @@ struct lockd_net {
 
 	struct delayed_work grace_period_end;
 	struct lock_manager lockd_manager;
-	struct list_head grace_list;
 
 	spinlock_t nsm_clnt_lock;
 	unsigned int nsm_users;
diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index 8f27c93f8d2e..3599c9ca28ae 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -583,7 +583,6 @@ static int lockd_init_net(struct net *net)
 	struct lockd_net *ln = net_generic(net, lockd_net_id);
 
 	INIT_DELAYED_WORK(&ln->grace_period_end, grace_ender);
-	INIT_LIST_HEAD(&ln->grace_list);
 	spin_lock_init(&ln->nsm_clnt_lock);
 	return 0;
 }
diff --git a/fs/nfs_common/Makefile b/fs/nfs_common/Makefile
index f689ed82af3a..d153ca3ea577 100644
--- a/fs/nfs_common/Makefile
+++ b/fs/nfs_common/Makefile
@@ -3,5 +3,6 @@
 #
 
 obj-$(CONFIG_NFS_ACL_SUPPORT) += nfs_acl.o
-
 nfs_acl-objs := nfsacl.o
+
+obj-$(CONFIG_GRACE_PERIOD) += grace.o
diff --git a/fs/lockd/grace.c b/fs/nfs_common/grace.c
similarity index 50%
rename from fs/lockd/grace.c
rename to fs/nfs_common/grace.c
index 6d1ee7204c88..ae6e58ea4de5 100644
--- a/fs/lockd/grace.c
+++ b/fs/nfs_common/grace.c
@@ -1,17 +1,20 @@
 /*
  * Common code for control of lockd and nfsv4 grace periods.
+ *
+ * Transplanted from lockd code
  */
 
 #include <linux/module.h>
-#include <linux/lockd/bind.h>
 #include <net/net_namespace.h>
+#include <net/netns/generic.h>
+#include <linux/fs.h>
 
-#include "netns.h"
-
+static int grace_net_id;
 static DEFINE_SPINLOCK(grace_lock);
 
 /**
  * locks_start_grace
+ * @net: net namespace that this lock manager belongs to
  * @lm: who this grace period is for
  *
  * A grace period is a period during which locks should not be given
@@ -21,18 +24,20 @@ static DEFINE_SPINLOCK(grace_lock);
  *
  * This function is called to start a grace period.
  */
-void locks_start_grace(struct net *net, struct lock_manager *lm)
+void
+locks_start_grace(struct net *net, struct lock_manager *lm)
 {
-	struct lockd_net *ln = net_generic(net, lockd_net_id);
+	struct list_head *grace_list = net_generic(net, grace_net_id);
 
 	spin_lock(&grace_lock);
-	list_add(&lm->list, &ln->grace_list);
+	list_add(&lm->list, grace_list);
 	spin_unlock(&grace_lock);
 }
 EXPORT_SYMBOL_GPL(locks_start_grace);
 
 /**
  * locks_end_grace
+ * @net: net namespace that this lock manager belongs to
  * @lm: who this grace period is for
  *
  * Call this function to state that the given lock manager is ready to
@@ -41,7 +46,8 @@ EXPORT_SYMBOL_GPL(locks_start_grace);
  * Note that callers count on it being safe to call this more than once,
  * and the second call should be a no-op.
  */
-void locks_end_grace(struct lock_manager *lm)
+void
+locks_end_grace(struct lock_manager *lm)
 {
 	spin_lock(&grace_lock);
 	list_del_init(&lm->list);
@@ -56,10 +62,52 @@ EXPORT_SYMBOL_GPL(locks_end_grace);
  * to answer ordinary lock requests, and when they should accept only
  * lock reclaims.
  */
-int locks_in_grace(struct net *net)
+int
+locks_in_grace(struct net *net)
 {
-	struct lockd_net *ln = net_generic(net, lockd_net_id);
+	struct list_head *grace_list = net_generic(net, grace_net_id);
 
-	return !list_empty(&ln->grace_list);
+	return !list_empty(grace_list);
 }
 EXPORT_SYMBOL_GPL(locks_in_grace);
+
+static int __net_init
+grace_init_net(struct net *net)
+{
+	struct list_head *grace_list = net_generic(net, grace_net_id);
+
+	INIT_LIST_HEAD(grace_list);
+	return 0;
+}
+
+static void __net_exit
+grace_exit_net(struct net *net)
+{
+	struct list_head *grace_list = net_generic(net, grace_net_id);
+
+	BUG_ON(!list_empty(grace_list));
+}
+
+static struct pernet_operations grace_net_ops = {
+	.init = grace_init_net,
+	.exit = grace_exit_net,
+	.id   = &grace_net_id,
+	.size = sizeof(struct list_head),
+};
+
+static int __init
+init_grace(void)
+{
+	return register_pernet_subsys(&grace_net_ops);
+}
+
+static void __exit
+exit_grace(void)
+{
+	unregister_pernet_subsys(&grace_net_ops);
+}
+
+MODULE_AUTHOR("Jeff Layton <jlayton@primarydata.com>");
+MODULE_LICENSE("GPL");
+module_init(init_grace)
+module_exit(exit_grace)
diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
index f994e750e0d1..4fa98764de21 100644
--- a/fs/nfsd/Kconfig
+++ b/fs/nfsd/Kconfig
@@ -71,6 +71,7 @@ config NFSD_V4
 	select FS_POSIX_ACL
 	select SUNRPC_GSS
 	select CRYPTO
+	select GRACE_PERIOD
 	help
 	  This option enables support in your system's NFS server for
 	  version 4 of the NFS protocol (RFC 3530).
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index 9d117f61d976..b97bf2ef996e 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -74,6 +74,8 @@ static inline int remove_proc_subtree(const char *name, struct proc_dir_entry *p
 
 #endif /* CONFIG_PROC_FS */
 
+struct net;
+
 static inline struct proc_dir_entry *proc_net_mkdir(
 	struct net *net, const char *name, struct proc_dir_entry *parent)
 {
-- 
1.9.3



* [PATCH v2 2/5] lockd: add a /proc/fs/lockd/nlm_end_grace file
  2014-08-19 18:38 [PATCH v2 0/5] nfsd: support for lifting grace period early Jeff Layton
  2014-08-19 18:38 ` [PATCH v2 1/5] lockd: move lockd's grace period handling into its own module Jeff Layton
@ 2014-08-19 18:38 ` Jeff Layton
  2014-09-04 19:52   ` J. Bruce Fields
  2014-08-19 18:38 ` [PATCH v2 3/5] nfsd: add a v4_end_grace file to /proc/fs/nfsd Jeff Layton
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 37+ messages in thread
From: Jeff Layton @ 2014-08-19 18:38 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

Add a new procfile that will allow a (privileged) userland process to
end the NLM grace period early. The basic idea here is to have
sm-notify write to this file if it sent out no NOTIFY requests when
it ran. In that situation, we can generally expect that there will be
no reclaim requests, so the grace period can be lifted early.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
---
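(Not for merging -- just a sketch of the sm-notify side of this. The
real change would be an nfs-utils patch; only the path and the "write
anything to end grace" behaviour come from this patch.)

#include <stdio.h>

/*
 * If sm-notify sent no NOTIFY requests, then no client is going to
 * reclaim NLM locks, so the grace period can be lifted right away.
 */
static void maybe_end_nlm_grace(unsigned int notify_count)
{
        FILE *f;

        if (notify_count > 0)
                return;

        f = fopen("/proc/fs/lockd/nlm_end_grace", "w");
        if (!f)
                return;
        fputs("Y", f);          /* any non-empty write ends the grace period */
        fclose(f);
}
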
 fs/lockd/Makefile |  1 +
 fs/lockd/procfs.c | 76 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/lockd/procfs.h | 28 ++++++++++++++++++++
 fs/lockd/svc.c    |  9 +++++++
 4 files changed, 114 insertions(+)
 create mode 100644 fs/lockd/procfs.c
 create mode 100644 fs/lockd/procfs.h

diff --git a/fs/lockd/Makefile b/fs/lockd/Makefile
index 6a0b351ce30e..9b320cc2a8cf 100644
--- a/fs/lockd/Makefile
+++ b/fs/lockd/Makefile
@@ -7,4 +7,5 @@ obj-$(CONFIG_LOCKD) += lockd.o
 lockd-objs-y := clntlock.o clntproc.o clntxdr.o host.o svc.o svclock.o \
 	        svcshare.o svcproc.o svcsubs.o mon.o xdr.o
 lockd-objs-$(CONFIG_LOCKD_V4) += clnt4xdr.o xdr4.o svc4proc.o
+lockd-objs-$(CONFIG_PROC_FS) += procfs.o
 lockd-objs		      := $(lockd-objs-y)
diff --git a/fs/lockd/procfs.c b/fs/lockd/procfs.c
new file mode 100644
index 000000000000..dfd708fa43a3
--- /dev/null
+++ b/fs/lockd/procfs.c
@@ -0,0 +1,76 @@
+/*
+ * Procfs support for lockd
+ *
+ * Copyright (c) 2014 Jeff Layton <jlayton@primarydata.com>
+ */
+
+#include <linux/fs.h>
+#include <linux/proc_fs.h>
+#include <linux/module.h>
+#include <linux/nsproxy.h>
+#include <net/net_namespace.h>
+
+#include "netns.h"
+#include "procfs.h"
+
+/*
+ * Just check to see if the data written is longer than zero length. We
+ * don't really care what's in the data, just that someone wrote something
+ * to the file.
+ */
+static ssize_t
+nlm_end_grace_write(struct file *file, const char __user *buf, size_t size,
+		    loff_t *pos)
+{
+	struct lockd_net *ln = net_generic(current->nsproxy->net_ns, lockd_net_id);
+
+	if (size > 0)
+		locks_end_grace(&ln->lockd_manager);
+
+	return size;
+}
+
+static ssize_t
+nlm_end_grace_read(struct file *file, char __user *buf, size_t size,
+		   loff_t *pos)
+{
+	struct lockd_net *ln = net_generic(current->nsproxy->net_ns, lockd_net_id);
+	char resp[3];
+
+	resp[0] = list_empty(&ln->lockd_manager.list) ? 'Y' : 'N';
+	resp[1] = '\n';
+	resp[2] = '\0';
+
+	return simple_read_from_buffer(buf, size, pos, resp, sizeof(resp));
+}
+
+static const struct file_operations lockd_end_grace_operations = {
+	.write		= nlm_end_grace_write,
+	.read		= nlm_end_grace_read,
+	.llseek		= default_llseek,
+	.owner		= THIS_MODULE,
+};
+
+int __init
+lockd_create_procfs(void)
+{
+	struct proc_dir_entry *entry;
+
+	entry = proc_mkdir("fs/lockd", NULL);
+	if (!entry)
+		return -ENOMEM;
+	entry = proc_create("nlm_end_grace", S_IRUGO|S_IWUSR, entry,
+				 &lockd_end_grace_operations);
+	if (!entry) {
+		remove_proc_entry("fs/lockd", NULL);
+		return -ENOMEM;
+	}
+	return 0;
+}
+
+void __exit
+lockd_remove_procfs(void)
+{
+	remove_proc_entry("fs/lockd/nlm_end_grace", NULL);
+	remove_proc_entry("fs/lockd", NULL);
+}
diff --git a/fs/lockd/procfs.h b/fs/lockd/procfs.h
new file mode 100644
index 000000000000..2257a1311027
--- /dev/null
+++ b/fs/lockd/procfs.h
@@ -0,0 +1,28 @@
+/*
+ * Procfs support for lockd
+ *
+ * Copyright (c) 2014 Jeff Layton <jlayton@primarydata.com>
+ */
+#ifndef _LOCKD_PROCFS_H
+#define _LOCKD_PROCFS_H
+
+#include <linux/kconfig.h>
+
+#if IS_ENABLED(CONFIG_PROC_FS)
+int lockd_create_procfs(void);
+void lockd_remove_procfs(void);
+#else
+static inline int
+lockd_create_procfs(void)
+{
+	return 0;
+}
+
+static inline void
+lockd_remove_procfs(void)
+{
+	return;
+}
+#endif /* IS_ENABLED(CONFIG_PROC_FS) */
+
+#endif /* _LOCKD_PROCFS_H */
diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index 3599c9ca28ae..f01a9aa01dff 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -36,6 +36,7 @@
 #include <linux/nfs.h>
 
 #include "netns.h"
+#include "procfs.h"
 
 #define NLMDBG_FACILITY		NLMDBG_SVC
 #define LOCKD_BUFSIZE		(1024 + NLMSVC_XDRSIZE)
@@ -616,8 +617,15 @@ static int __init init_nlm(void)
 	err = register_pernet_subsys(&lockd_net_ops);
 	if (err)
 		goto err_pernet;
+
+	err = lockd_create_procfs();
+	if (err)
+		goto err_procfs;
+
 	return 0;
 
+err_procfs:
+	unregister_pernet_subsys(&lockd_net_ops);
 err_pernet:
 #ifdef CONFIG_SYSCTL
 	unregister_sysctl_table(nlm_sysctl_table);
@@ -630,6 +638,7 @@ static void __exit exit_nlm(void)
 {
 	/* FIXME: delete all NLM clients */
 	nlm_shutdown_hosts();
+	lockd_remove_procfs();
 	unregister_pernet_subsys(&lockd_net_ops);
 #ifdef CONFIG_SYSCTL
 	unregister_sysctl_table(nlm_sysctl_table);
-- 
1.9.3



* [PATCH v2 3/5] nfsd: add a v4_end_grace file to /proc/fs/nfsd
  2014-08-19 18:38 [PATCH v2 0/5] nfsd: support for lifting grace period early Jeff Layton
  2014-08-19 18:38 ` [PATCH v2 1/5] lockd: move lockd's grace period handling into its own module Jeff Layton
  2014-08-19 18:38 ` [PATCH v2 2/5] lockd: add a /proc/fs/lockd/nlm_end_grace file Jeff Layton
@ 2014-08-19 18:38 ` Jeff Layton
  2014-09-04 19:54   ` J. Bruce Fields
  2014-08-19 18:38 ` [PATCH v2 4/5] nfsd: remove redundant boot_time parm from grace_done client tracking op Jeff Layton
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 37+ messages in thread
From: Jeff Layton @ 2014-08-19 18:38 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

Allow a privileged userland process to end the v4 grace period early.
Any write to the file will cause the v4 grace period to be lifted.
The basic idea is to allow the userland client tracking program to
lift the grace period once it knows that no more clients will be
reclaiming state.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
---
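(Illustration only, not part of the patch: how a client tracking
program might drive this file. Since it's an nfsctl transaction file,
a plain read with no prior write just reports "Y" or "N"; any write
lifts the v4 grace period.)

#include <stdio.h>

static void maybe_end_v4_grace(int clients_still_reclaiming)
{
        char status[8] = "";
        FILE *f = fopen("/proc/fs/nfsd/v4_end_grace", "r");

        if (f) {
                fgets(status, sizeof(status), f);
                fclose(f);
        }

        /* already lifted, or someone might still reclaim? leave it alone */
        if (status[0] == 'Y' || clients_still_reclaiming)
                return;

        f = fopen("/proc/fs/nfsd/v4_end_grace", "w");
        if (f) {
                fputs("Y", f);
                fclose(f);
        }
}
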
 fs/nfsd/nfs4state.c |  2 +-
 fs/nfsd/nfsctl.c    | 35 +++++++++++++++++++++++++++++++++++
 fs/nfsd/state.h     |  3 +++
 3 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 2e80a59e7e91..711280e0e4ac 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -4107,7 +4107,7 @@ out:
 	return status;
 }
 
-static void
+void
 nfsd4_end_grace(struct nfsd_net *nn)
 {
 	/* do nothing if grace period already ended */
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 4e042105fb6e..99044e237861 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -49,6 +49,7 @@ enum {
 	NFSD_Leasetime,
 	NFSD_Gracetime,
 	NFSD_RecoveryDir,
+	NFSD_V4EndGrace,
 #endif
 };
 
@@ -68,6 +69,7 @@ static ssize_t write_maxconn(struct file *file, char *buf, size_t size);
 static ssize_t write_leasetime(struct file *file, char *buf, size_t size);
 static ssize_t write_gracetime(struct file *file, char *buf, size_t size);
 static ssize_t write_recoverydir(struct file *file, char *buf, size_t size);
+static ssize_t write_v4_end_grace(struct file *file, char *buf, size_t size);
 #endif
 
 static ssize_t (*write_op[])(struct file *, char *, size_t) = {
@@ -84,6 +86,7 @@ static ssize_t (*write_op[])(struct file *, char *, size_t) = {
 	[NFSD_Leasetime] = write_leasetime,
 	[NFSD_Gracetime] = write_gracetime,
 	[NFSD_RecoveryDir] = write_recoverydir,
+	[NFSD_V4EndGrace] = write_v4_end_grace,
 #endif
 };
 
@@ -1077,6 +1080,37 @@ static ssize_t write_recoverydir(struct file *file, char *buf, size_t size)
 	return rv;
 }
 
+/**
+ * write_v4_end_grace - release grace period for nfsd's v4.x lock manager
+ *
+ * Input:
+ *			buf:		ignored
+ *			size:		zero
+ * OR
+ *
+ * Input:
+ * 			buf:		any value
+ *			size:		non-zero length of C string in @buf
+ * Output:
+ *			passed-in buffer filled with "Y" or "N" with a newline
+ *			and NULL-terminated C string. This indicates whether
+ *			the grace period has ended in the current net
+ *			namespace. Return code is the size in bytes of the
+ *			string. Writing to the file will end the grace period
+ *			for nfsd's v4 lock manager.
+ */
+static ssize_t write_v4_end_grace(struct file *file, char *buf, size_t size)
+{
+	struct net *net = file->f_dentry->d_sb->s_fs_info;
+	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+
+	if (size > 0)
+		nfsd4_end_grace(nn);
+
+	return scnprintf(buf, SIMPLE_TRANSACTION_LIMIT, "%c\n",
+			 nn->grace_ended ? 'Y' : 'N');
+}
+
 #endif
 
 /*----------------------------------------------------------------------------*/
@@ -1110,6 +1144,7 @@ static int nfsd_fill_super(struct super_block * sb, void * data, int silent)
 		[NFSD_Leasetime] = {"nfsv4leasetime", &transaction_ops, S_IWUSR|S_IRUSR},
 		[NFSD_Gracetime] = {"nfsv4gracetime", &transaction_ops, S_IWUSR|S_IRUSR},
 		[NFSD_RecoveryDir] = {"nfsv4recoverydir", &transaction_ops, S_IWUSR|S_IRUSR},
+		[NFSD_V4EndGrace] = {"v4_end_grace", &transaction_ops, S_IWUSR|S_IRUGO},
 #endif
 		/* last one */ {""}
 	};
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 4a89e00d7461..ecf579904892 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -545,6 +545,9 @@ extern struct nfs4_client_reclaim *nfs4_client_to_reclaim(const char *name,
 							struct nfsd_net *nn);
 extern bool nfs4_has_reclaimed_state(const char *name, struct nfsd_net *nn);
 
+/* grace period management */
+void nfsd4_end_grace(struct nfsd_net *nn);
+
 /* nfs4recover operations */
 extern int nfsd4_client_tracking_init(struct net *net);
 extern void nfsd4_client_tracking_exit(struct net *net);
-- 
1.9.3



* [PATCH v2 4/5] nfsd: remove redundant boot_time parm from grace_done client tracking op
  2014-08-19 18:38 [PATCH v2 0/5] nfsd: support for lifting grace period early Jeff Layton
                   ` (2 preceding siblings ...)
  2014-08-19 18:38 ` [PATCH v2 3/5] nfsd: add a v4_end_grace file to /proc/fs/nfsd Jeff Layton
@ 2014-08-19 18:38 ` Jeff Layton
  2014-09-04 19:54   ` J. Bruce Fields
  2014-08-19 18:38 ` [PATCH v2 5/5] nfsd: pass extra info in env vars to upcalls to allow for early grace period end Jeff Layton
  2014-09-26 18:39 ` [PATCH v2 0/5] nfsd: support for lifting grace period early J. Bruce Fields
  5 siblings, 1 reply; 37+ messages in thread
From: Jeff Layton @ 2014-08-19 18:38 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

Since it's stored in nfsd_net, we don't need to pass it in separately.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
---
 fs/nfsd/nfs4recover.c | 17 ++++++++---------
 fs/nfsd/nfs4state.c   |  2 +-
 fs/nfsd/state.h       |  2 +-
 3 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
index 9c271f42604a..a0d2ba956a3f 100644
--- a/fs/nfsd/nfs4recover.c
+++ b/fs/nfsd/nfs4recover.c
@@ -58,7 +58,7 @@ struct nfsd4_client_tracking_ops {
 	void (*create)(struct nfs4_client *);
 	void (*remove)(struct nfs4_client *);
 	int (*check)(struct nfs4_client *);
-	void (*grace_done)(struct nfsd_net *, time_t);
+	void (*grace_done)(struct nfsd_net *);
 };
 
 /* Globals */
@@ -392,7 +392,7 @@ purge_old(struct dentry *parent, struct dentry *child, struct nfsd_net *nn)
 }
 
 static void
-nfsd4_recdir_purge_old(struct nfsd_net *nn, time_t boot_time)
+nfsd4_recdir_purge_old(struct nfsd_net *nn)
 {
 	int status;
 
@@ -1016,7 +1016,7 @@ nfsd4_cld_check(struct nfs4_client *clp)
 }
 
 static void
-nfsd4_cld_grace_done(struct nfsd_net *nn, time_t boot_time)
+nfsd4_cld_grace_done(struct nfsd_net *nn)
 {
 	int ret;
 	struct cld_upcall *cup;
@@ -1029,7 +1029,7 @@ nfsd4_cld_grace_done(struct nfsd_net *nn, time_t boot_time)
 	}
 
 	cup->cu_msg.cm_cmd = Cld_GraceDone;
-	cup->cu_msg.cm_u.cm_gracetime = (int64_t)boot_time;
+	cup->cu_msg.cm_u.cm_gracetime = (int64_t)nn->boot_time;
 	ret = cld_pipe_upcall(cn->cn_pipe, &cup->cu_msg);
 	if (!ret)
 		ret = cup->cu_msg.cm_status;
@@ -1245,13 +1245,12 @@ nfsd4_umh_cltrack_check(struct nfs4_client *clp)
 }
 
 static void
-nfsd4_umh_cltrack_grace_done(struct nfsd_net __attribute__((unused)) *nn,
-				time_t boot_time)
+nfsd4_umh_cltrack_grace_done(struct nfsd_net *nn)
 {
 	char *legacy;
 	char timestr[22]; /* FIXME: better way to determine max size? */
 
-	sprintf(timestr, "%ld", boot_time);
+	sprintf(timestr, "%ld", nn->boot_time);
 	legacy = nfsd4_cltrack_legacy_topdir();
 	nfsd4_umh_cltrack_upcall("gracedone", timestr, legacy);
 	kfree(legacy);
@@ -1356,10 +1355,10 @@ nfsd4_client_record_check(struct nfs4_client *clp)
 }
 
 void
-nfsd4_record_grace_done(struct nfsd_net *nn, time_t boot_time)
+nfsd4_record_grace_done(struct nfsd_net *nn)
 {
 	if (nn->client_tracking_ops)
-		nn->client_tracking_ops->grace_done(nn, boot_time);
+		nn->client_tracking_ops->grace_done(nn);
 }
 
 static int
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 711280e0e4ac..21becb29dae1 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -4116,7 +4116,7 @@ nfsd4_end_grace(struct nfsd_net *nn)
 
 	dprintk("NFSD: end of grace period\n");
 	nn->grace_ended = true;
-	nfsd4_record_grace_done(nn, nn->boot_time);
+	nfsd4_record_grace_done(nn);
 	locks_end_grace(&nn->nfsd4_manager);
 	/*
 	 * Now that every NFSv4 client has had the chance to recover and
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index ecf579904892..854f0c574ccf 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -554,7 +554,7 @@ extern void nfsd4_client_tracking_exit(struct net *net);
 extern void nfsd4_client_record_create(struct nfs4_client *clp);
 extern void nfsd4_client_record_remove(struct nfs4_client *clp);
 extern int nfsd4_client_record_check(struct nfs4_client *clp);
-extern void nfsd4_record_grace_done(struct nfsd_net *nn, time_t boot_time);
+extern void nfsd4_record_grace_done(struct nfsd_net *nn);
 
 /* nfs fault injection functions */
 #ifdef CONFIG_NFSD_FAULT_INJECTION
-- 
1.9.3



* [PATCH v2 5/5] nfsd: pass extra info in env vars to upcalls to allow for early grace period end
  2014-08-19 18:38 [PATCH v2 0/5] nfsd: support for lifting grace period early Jeff Layton
                   ` (3 preceding siblings ...)
  2014-08-19 18:38 ` [PATCH v2 4/5] nfsd: remove redundant boot_time parm from grace_done client tracking op Jeff Layton
@ 2014-08-19 18:38 ` Jeff Layton
  2014-09-04 19:59   ` J. Bruce Fields
  2014-09-26 18:39 ` [PATCH v2 0/5] nfsd: support for lifting grace period early J. Bruce Fields
  5 siblings, 1 reply; 37+ messages in thread
From: Jeff Layton @ 2014-08-19 18:38 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

In order to support lifting the grace period early, we must tell
nfsdcltrack what sort of client the "create" upcall is for. We can't
reliably tell if a v4.0 client has completed reclaiming, so we can only
lift the grace period once all the v4.1+ clients have issued a
RECLAIM_COMPLETE and if there are no v4.0 clients.

Also, in order to lift the grace period, we have to tell userland when
the grace period started so that it can tell whether a RECLAIM_COMPLETE
has been issued for each client since then.

Since this is all optional info, we pass it along in environment
variables to the "init" and "create" upcalls. By doing this, we don't
need to revise the upcall format. The UMH upcall can simply make use of
this info if it happens to be present. If it's not then it can just
avoid lifting the grace period early.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
---
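(Not part of the patch -- a sketch of how the upcall program could
consume the new variables. nfsdcltrack is the intended consumer; this
only shows the variable names the kernel now sets.)

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
        char *vers  = getenv("NFSDCLTRACK_CLIENT_MINORVERSION");
        char *start = getenv("NFSDCLTRACK_GRACE_START");

        if (!vers || !start) {
                /* older kernel: don't try to lift the grace period early */
                return 0;
        }

        printf("client minorversion %s, grace started at %s\n", vers, start);
        return 0;
}
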
 fs/nfsd/nfs4recover.c | 90 ++++++++++++++++++++++++++++++++++++++++++++-------
 fs/nfsd/nfs4state.c   |  4 +--
 2 files changed, 80 insertions(+), 14 deletions(-)

diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
index a0d2ba956a3f..2b61bcd92b58 100644
--- a/fs/nfsd/nfs4recover.c
+++ b/fs/nfsd/nfs4recover.c
@@ -1062,6 +1062,8 @@ MODULE_PARM_DESC(cltrack_legacy_disable,
 
 #define LEGACY_TOPDIR_ENV_PREFIX "NFSDCLTRACK_LEGACY_TOPDIR="
 #define LEGACY_RECDIR_ENV_PREFIX "NFSDCLTRACK_LEGACY_RECDIR="
+#define CLIENT_MINORVERS_ENV_PREFIX "NFSDCLTRACK_CLIENT_MINORVERSION="
+#define GRACE_START_ENV_PREFIX "NFSDCLTRACK_GRACE_START="
 
 static char *
 nfsd4_cltrack_legacy_topdir(void)
@@ -1126,10 +1128,60 @@ nfsd4_cltrack_legacy_recdir(const struct xdr_netobj *name)
 	return result;
 }
 
+static char *
+nfsd4_cltrack_client_minorversion(struct nfs4_client *clp)
+{
+	int copied;
+	size_t len;
+	char *result;
+
+	/* prefix + max width of integer string + terminating NULL */
+	len = strlen(CLIENT_MINORVERS_ENV_PREFIX) + 10 + 1;
+
+	result = kmalloc(len, GFP_KERNEL);
+	if (!result)
+		return result;
+
+	copied = snprintf(result, len, CLIENT_MINORVERS_ENV_PREFIX "%u",
+				clp->cl_minorversion);
+	if (copied >= len) {
+		/* just return nothing if output was truncated */
+		kfree(result);
+		return NULL;
+	}
+
+	return result;
+}
+
+static char *
+nfsd4_cltrack_grace_start(time_t grace_start)
+{
+	int copied;
+	size_t len;
+	char *result;
+
+	/* prefix + max width of int64_t string + terminating NULL */
+	len = strlen(GRACE_START_ENV_PREFIX) + 22 + 1;
+
+	result = kmalloc(len, GFP_KERNEL);
+	if (!result)
+		return result;
+
+	copied = snprintf(result, len, GRACE_START_ENV_PREFIX "%ld",
+				grace_start);
+	if (copied >= len) {
+		/* just return nothing if output was truncated */
+		kfree(result);
+		return NULL;
+	}
+
+	return result;
+}
+
 static int
-nfsd4_umh_cltrack_upcall(char *cmd, char *arg, char *legacy)
+nfsd4_umh_cltrack_upcall(char *cmd, char *arg, char *env0, char *env1)
 {
-	char *envp[2];
+	char *envp[3];
 	char *argv[4];
 	int ret;
 
@@ -1140,10 +1192,12 @@ nfsd4_umh_cltrack_upcall(char *cmd, char *arg, char *legacy)
 
 	dprintk("%s: cmd: %s\n", __func__, cmd);
 	dprintk("%s: arg: %s\n", __func__, arg ? arg : "(null)");
-	dprintk("%s: legacy: %s\n", __func__, legacy ? legacy : "(null)");
+	dprintk("%s: env0: %s\n", __func__, env0 ? env0 : "(null)");
+	dprintk("%s: env1: %s\n", __func__, env1 ? env1 : "(null)");
 
-	envp[0] = legacy;
-	envp[1] = NULL;
+	envp[0] = env0;
+	envp[1] = env1;
+	envp[2] = NULL;
 
 	argv[0] = (char *)cltrack_prog;
 	argv[1] = cmd;
@@ -1187,28 +1241,40 @@ bin_to_hex_dup(const unsigned char *src, int srclen)
 }
 
 static int
-nfsd4_umh_cltrack_init(struct net __attribute__((unused)) *net)
+nfsd4_umh_cltrack_init(struct net *net)
 {
+	int ret;
+	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+	char *grace_start = nfsd4_cltrack_grace_start(nn->boot_time);
+
 	/* XXX: The usermode helper s not working in container yet. */
 	if (net != &init_net) {
 		WARN(1, KERN_ERR "NFSD: attempt to initialize umh client "
 			"tracking in a container!\n");
 		return -EINVAL;
 	}
-	return nfsd4_umh_cltrack_upcall("init", NULL, NULL);
+
+	ret = nfsd4_umh_cltrack_upcall("init", NULL, grace_start, NULL);
+	kfree(grace_start);
+	return ret;
 }
 
 static void
 nfsd4_umh_cltrack_create(struct nfs4_client *clp)
 {
-	char *hexid;
+	char *hexid, *minorvers, *grace_start;
+	struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
 
 	hexid = bin_to_hex_dup(clp->cl_name.data, clp->cl_name.len);
 	if (!hexid) {
 		dprintk("%s: can't allocate memory for upcall!\n", __func__);
 		return;
 	}
-	nfsd4_umh_cltrack_upcall("create", hexid, NULL);
+	minorvers = nfsd4_cltrack_client_minorversion(clp);
+	grace_start = nfsd4_cltrack_grace_start(nn->boot_time);
+	nfsd4_umh_cltrack_upcall("create", hexid, minorvers, grace_start);
+	kfree(minorvers);
+	kfree(grace_start);
 	kfree(hexid);
 }
 
@@ -1222,7 +1288,7 @@ nfsd4_umh_cltrack_remove(struct nfs4_client *clp)
 		dprintk("%s: can't allocate memory for upcall!\n", __func__);
 		return;
 	}
-	nfsd4_umh_cltrack_upcall("remove", hexid, NULL);
+	nfsd4_umh_cltrack_upcall("remove", hexid, NULL, NULL);
 	kfree(hexid);
 }
 
@@ -1238,7 +1304,7 @@ nfsd4_umh_cltrack_check(struct nfs4_client *clp)
 		return -ENOMEM;
 	}
 	legacy = nfsd4_cltrack_legacy_recdir(&clp->cl_name);
-	ret = nfsd4_umh_cltrack_upcall("check", hexid, legacy);
+	ret = nfsd4_umh_cltrack_upcall("check", hexid, legacy, NULL);
 	kfree(legacy);
 	kfree(hexid);
 	return ret;
@@ -1252,7 +1318,7 @@ nfsd4_umh_cltrack_grace_done(struct nfsd_net *nn)
 
 	sprintf(timestr, "%ld", nn->boot_time);
 	legacy = nfsd4_cltrack_legacy_topdir();
-	nfsd4_umh_cltrack_upcall("gracedone", timestr, legacy);
+	nfsd4_umh_cltrack_upcall("gracedone", timestr, legacy, NULL);
 	kfree(legacy);
 }
 
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 21becb29dae1..5b04353bf2ec 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -6342,10 +6342,10 @@ nfs4_state_start_net(struct net *net)
 	ret = nfs4_state_create_net(net);
 	if (ret)
 		return ret;
-	nfsd4_client_tracking_init(net);
 	nn->boot_time = get_seconds();
-	locks_start_grace(net, &nn->nfsd4_manager);
 	nn->grace_ended = false;
+	locks_start_grace(net, &nn->nfsd4_manager);
+	nfsd4_client_tracking_init(net);
 	printk(KERN_INFO "NFSD: starting %ld-second grace period (net %p)\n",
 	       nn->nfsd4_grace, net);
 	queue_delayed_work(laundry_wq, &nn->laundromat_work, nn->nfsd4_grace * HZ);
-- 
1.9.3



* Re: [PATCH v2 1/5] lockd: move lockd's grace period handling into its own module
  2014-08-19 18:38 ` [PATCH v2 1/5] lockd: move lockd's grace period handling into its own module Jeff Layton
@ 2014-08-28 20:01   ` J. Bruce Fields
  2014-08-28 20:24     ` J. Bruce Fields
  0 siblings, 1 reply; 37+ messages in thread
From: J. Bruce Fields @ 2014-08-28 20:01 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-nfs

On Tue, Aug 19, 2014 at 02:38:25PM -0400, Jeff Layton wrote:
> Currently, all of the grace period handling is part of lockd. Eventually
> though we'd like to be able to build v4-only servers, at which point
> we'll need to put all of this elsewhere.
> 
> Move the code itself into fs/nfs_common and have it build a grace.ko
> module. Then, rejigger the Kconfig options so that both nfsd and lockd
> enable it automatically.

Thanks, applying this one for 3.18 independently of the others.

--b.

> 
> Signed-off-by: Jeff Layton <jlayton@primarydata.com>
> ---
>  fs/Kconfig                       |  6 +++-
>  fs/lockd/Makefile                |  2 +-
>  fs/lockd/netns.h                 |  1 -
>  fs/lockd/svc.c                   |  1 -
>  fs/nfs_common/Makefile           |  3 +-
>  fs/{lockd => nfs_common}/grace.c | 68 ++++++++++++++++++++++++++++++++++------
>  fs/nfsd/Kconfig                  |  1 +
>  include/linux/proc_fs.h          |  2 ++
>  8 files changed, 69 insertions(+), 15 deletions(-)
>  rename fs/{lockd => nfs_common}/grace.c (50%)
> 
> diff --git a/fs/Kconfig b/fs/Kconfig
> index 312393f32948..db5dc1598716 100644
> --- a/fs/Kconfig
> +++ b/fs/Kconfig
> @@ -233,9 +233,13 @@ if NETWORK_FILESYSTEMS
>  source "fs/nfs/Kconfig"
>  source "fs/nfsd/Kconfig"
>  
> +config GRACE_PERIOD
> +	tristate
> +
>  config LOCKD
>  	tristate
>  	depends on FILE_LOCKING
> +	select GRACE_PERIOD
>  
>  config LOCKD_V4
>  	bool
> @@ -249,7 +253,7 @@ config NFS_ACL_SUPPORT
>  
>  config NFS_COMMON
>  	bool
> -	depends on NFSD || NFS_FS
> +	depends on NFSD || NFS_FS || LOCKD
>  	default y
>  
>  source "net/sunrpc/Kconfig"
> diff --git a/fs/lockd/Makefile b/fs/lockd/Makefile
> index ca58d64374ca..6a0b351ce30e 100644
> --- a/fs/lockd/Makefile
> +++ b/fs/lockd/Makefile
> @@ -5,6 +5,6 @@
>  obj-$(CONFIG_LOCKD) += lockd.o
>  
>  lockd-objs-y := clntlock.o clntproc.o clntxdr.o host.o svc.o svclock.o \
> -	        svcshare.o svcproc.o svcsubs.o mon.o xdr.o grace.o
> +	        svcshare.o svcproc.o svcsubs.o mon.o xdr.o
>  lockd-objs-$(CONFIG_LOCKD_V4) += clnt4xdr.o xdr4.o svc4proc.o
>  lockd-objs		      := $(lockd-objs-y)
> diff --git a/fs/lockd/netns.h b/fs/lockd/netns.h
> index 5010b55628b4..097bfa3adb1c 100644
> --- a/fs/lockd/netns.h
> +++ b/fs/lockd/netns.h
> @@ -11,7 +11,6 @@ struct lockd_net {
>  
>  	struct delayed_work grace_period_end;
>  	struct lock_manager lockd_manager;
> -	struct list_head grace_list;
>  
>  	spinlock_t nsm_clnt_lock;
>  	unsigned int nsm_users;
> diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
> index 8f27c93f8d2e..3599c9ca28ae 100644
> --- a/fs/lockd/svc.c
> +++ b/fs/lockd/svc.c
> @@ -583,7 +583,6 @@ static int lockd_init_net(struct net *net)
>  	struct lockd_net *ln = net_generic(net, lockd_net_id);
>  
>  	INIT_DELAYED_WORK(&ln->grace_period_end, grace_ender);
> -	INIT_LIST_HEAD(&ln->grace_list);
>  	spin_lock_init(&ln->nsm_clnt_lock);
>  	return 0;
>  }
> diff --git a/fs/nfs_common/Makefile b/fs/nfs_common/Makefile
> index f689ed82af3a..d153ca3ea577 100644
> --- a/fs/nfs_common/Makefile
> +++ b/fs/nfs_common/Makefile
> @@ -3,5 +3,6 @@
>  #
>  
>  obj-$(CONFIG_NFS_ACL_SUPPORT) += nfs_acl.o
> -
>  nfs_acl-objs := nfsacl.o
> +
> +obj-$(CONFIG_GRACE_PERIOD) += grace.o
> diff --git a/fs/lockd/grace.c b/fs/nfs_common/grace.c
> similarity index 50%
> rename from fs/lockd/grace.c
> rename to fs/nfs_common/grace.c
> index 6d1ee7204c88..ae6e58ea4de5 100644
> --- a/fs/lockd/grace.c
> +++ b/fs/nfs_common/grace.c
> @@ -1,17 +1,20 @@
>  /*
>   * Common code for control of lockd and nfsv4 grace periods.
> + *
> + * Transplanted from lockd code
>   */
>  
>  #include <linux/module.h>
> -#include <linux/lockd/bind.h>
>  #include <net/net_namespace.h>
> +#include <net/netns/generic.h>
> +#include <linux/fs.h>
>  
> -#include "netns.h"
> -
> +static int grace_net_id;
>  static DEFINE_SPINLOCK(grace_lock);
>  
>  /**
>   * locks_start_grace
> + * @net: net namespace that this lock manager belongs to
>   * @lm: who this grace period is for
>   *
>   * A grace period is a period during which locks should not be given
> @@ -21,18 +24,20 @@ static DEFINE_SPINLOCK(grace_lock);
>   *
>   * This function is called to start a grace period.
>   */
> -void locks_start_grace(struct net *net, struct lock_manager *lm)
> +void
> +locks_start_grace(struct net *net, struct lock_manager *lm)
>  {
> -	struct lockd_net *ln = net_generic(net, lockd_net_id);
> +	struct list_head *grace_list = net_generic(net, grace_net_id);
>  
>  	spin_lock(&grace_lock);
> -	list_add(&lm->list, &ln->grace_list);
> +	list_add(&lm->list, grace_list);
>  	spin_unlock(&grace_lock);
>  }
>  EXPORT_SYMBOL_GPL(locks_start_grace);
>  
>  /**
>   * locks_end_grace
> + * @net: net namespace that this lock manager belongs to
>   * @lm: who this grace period is for
>   *
>   * Call this function to state that the given lock manager is ready to
> @@ -41,7 +46,8 @@ EXPORT_SYMBOL_GPL(locks_start_grace);
>   * Note that callers count on it being safe to call this more than once,
>   * and the second call should be a no-op.
>   */
> -void locks_end_grace(struct lock_manager *lm)
> +void
> +locks_end_grace(struct lock_manager *lm)
>  {
>  	spin_lock(&grace_lock);
>  	list_del_init(&lm->list);
> @@ -56,10 +62,52 @@ EXPORT_SYMBOL_GPL(locks_end_grace);
>   * to answer ordinary lock requests, and when they should accept only
>   * lock reclaims.
>   */
> -int locks_in_grace(struct net *net)
> +int
> +locks_in_grace(struct net *net)
>  {
> -	struct lockd_net *ln = net_generic(net, lockd_net_id);
> +	struct list_head *grace_list = net_generic(net, grace_net_id);
>  
> -	return !list_empty(&ln->grace_list);
> +	return !list_empty(grace_list);
>  }
>  EXPORT_SYMBOL_GPL(locks_in_grace);
> +
> +static int __net_init
> +grace_init_net(struct net *net)
> +{
> +	struct list_head *grace_list = net_generic(net, grace_net_id);
> +
> +	INIT_LIST_HEAD(grace_list);
> +	return 0;
> +}
> +
> +static void __net_exit
> +grace_exit_net(struct net *net)
> +{
> +	struct list_head *grace_list = net_generic(net, grace_net_id);
> +
> +	BUG_ON(!list_empty(grace_list));
> +}
> +
> +static struct pernet_operations grace_net_ops = {
> +	.init = grace_init_net,
> +	.exit = grace_exit_net,
> +	.id   = &grace_net_id,
> +	.size = sizeof(struct list_head),
> +};
> +
> +static int __init
> +init_grace(void)
> +{
> +	return register_pernet_subsys(&grace_net_ops);
> +}
> +
> +static void __exit
> +exit_grace(void)
> +{
> +	unregister_pernet_subsys(&grace_net_ops);
> +}
> +
> +MODULE_AUTHOR("Jeff Layton <jlayton@primarydata.com>");
> +MODULE_LICENSE("GPL");
> +module_init(init_grace)
> +module_exit(exit_grace)
> diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
> index f994e750e0d1..4fa98764de21 100644
> --- a/fs/nfsd/Kconfig
> +++ b/fs/nfsd/Kconfig
> @@ -71,6 +71,7 @@ config NFSD_V4
>  	select FS_POSIX_ACL
>  	select SUNRPC_GSS
>  	select CRYPTO
> +	select GRACE_PERIOD
>  	help
>  	  This option enables support in your system's NFS server for
>  	  version 4 of the NFS protocol (RFC 3530).
> diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
> index 9d117f61d976..b97bf2ef996e 100644
> --- a/include/linux/proc_fs.h
> +++ b/include/linux/proc_fs.h
> @@ -74,6 +74,8 @@ static inline int remove_proc_subtree(const char *name, struct proc_dir_entry *p
>  
>  #endif /* CONFIG_PROC_FS */
>  
> +struct net;
> +
>  static inline struct proc_dir_entry *proc_net_mkdir(
>  	struct net *net, const char *name, struct proc_dir_entry *parent)
>  {
> -- 
> 1.9.3
> 


* Re: [PATCH v2 1/5] lockd: move lockd's grace period handling into its own module
  2014-08-28 20:01   ` J. Bruce Fields
@ 2014-08-28 20:24     ` J. Bruce Fields
  2014-08-28 23:53       ` Jeff Layton
  2014-09-15 22:08       ` J. Bruce Fields
  0 siblings, 2 replies; 37+ messages in thread
From: J. Bruce Fields @ 2014-08-28 20:24 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-nfs

On Thu, Aug 28, 2014 at 04:01:00PM -0400, J. Bruce Fields wrote:
> On Tue, Aug 19, 2014 at 02:38:25PM -0400, Jeff Layton wrote:
> > Currently, all of the grace period handling is part of lockd. Eventually
> > though we'd like to be able to build v4-only servers, at which point
> > we'll need to put all of this elsewhere.
> > 
> > Move the code itself into fs/nfs_common and have it build a grace.ko
> > module. Then, rejigger the Kconfig options so that both nfsd and lockd
> > enable it automatically.
> 
> Thanks, applying this one for 3.18 independently of the others.

This code should also be fixed, though.

Currently nfsd is recording the grace period as done when its own timer
runs out, but then it continues to accept reclaims until lockd is also
done.

This is an unusual bug to actually hit in part because lockd's grace
period is by default less than nfsd's.

Your code introduces new cases when nfsd's grace period could end before
lockd's, but in all of them future reclaims would be denied (because the
only clients active on last shutdown would be 4.1 clients that have all
issued RECLAIM_COMPLETE, so reclaims are all going to be denied).

So maybe it's not urgent.

--b.

> 
> --b.
> 
> > 
> > Signed-off-by: Jeff Layton <jlayton@primarydata.com>
> > ---
> >  fs/Kconfig                       |  6 +++-
> >  fs/lockd/Makefile                |  2 +-
> >  fs/lockd/netns.h                 |  1 -
> >  fs/lockd/svc.c                   |  1 -
> >  fs/nfs_common/Makefile           |  3 +-
> >  fs/{lockd => nfs_common}/grace.c | 68 ++++++++++++++++++++++++++++++++++------
> >  fs/nfsd/Kconfig                  |  1 +
> >  include/linux/proc_fs.h          |  2 ++
> >  8 files changed, 69 insertions(+), 15 deletions(-)
> >  rename fs/{lockd => nfs_common}/grace.c (50%)
> > 
> > diff --git a/fs/Kconfig b/fs/Kconfig
> > index 312393f32948..db5dc1598716 100644
> > --- a/fs/Kconfig
> > +++ b/fs/Kconfig
> > @@ -233,9 +233,13 @@ if NETWORK_FILESYSTEMS
> >  source "fs/nfs/Kconfig"
> >  source "fs/nfsd/Kconfig"
> >  
> > +config GRACE_PERIOD
> > +	tristate
> > +
> >  config LOCKD
> >  	tristate
> >  	depends on FILE_LOCKING
> > +	select GRACE_PERIOD
> >  
> >  config LOCKD_V4
> >  	bool
> > @@ -249,7 +253,7 @@ config NFS_ACL_SUPPORT
> >  
> >  config NFS_COMMON
> >  	bool
> > -	depends on NFSD || NFS_FS
> > +	depends on NFSD || NFS_FS || LOCKD
> >  	default y
> >  
> >  source "net/sunrpc/Kconfig"
> > diff --git a/fs/lockd/Makefile b/fs/lockd/Makefile
> > index ca58d64374ca..6a0b351ce30e 100644
> > --- a/fs/lockd/Makefile
> > +++ b/fs/lockd/Makefile
> > @@ -5,6 +5,6 @@
> >  obj-$(CONFIG_LOCKD) += lockd.o
> >  
> >  lockd-objs-y := clntlock.o clntproc.o clntxdr.o host.o svc.o svclock.o \
> > -	        svcshare.o svcproc.o svcsubs.o mon.o xdr.o grace.o
> > +	        svcshare.o svcproc.o svcsubs.o mon.o xdr.o
> >  lockd-objs-$(CONFIG_LOCKD_V4) += clnt4xdr.o xdr4.o svc4proc.o
> >  lockd-objs		      := $(lockd-objs-y)
> > diff --git a/fs/lockd/netns.h b/fs/lockd/netns.h
> > index 5010b55628b4..097bfa3adb1c 100644
> > --- a/fs/lockd/netns.h
> > +++ b/fs/lockd/netns.h
> > @@ -11,7 +11,6 @@ struct lockd_net {
> >  
> >  	struct delayed_work grace_period_end;
> >  	struct lock_manager lockd_manager;
> > -	struct list_head grace_list;
> >  
> >  	spinlock_t nsm_clnt_lock;
> >  	unsigned int nsm_users;
> > diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
> > index 8f27c93f8d2e..3599c9ca28ae 100644
> > --- a/fs/lockd/svc.c
> > +++ b/fs/lockd/svc.c
> > @@ -583,7 +583,6 @@ static int lockd_init_net(struct net *net)
> >  	struct lockd_net *ln = net_generic(net, lockd_net_id);
> >  
> >  	INIT_DELAYED_WORK(&ln->grace_period_end, grace_ender);
> > -	INIT_LIST_HEAD(&ln->grace_list);
> >  	spin_lock_init(&ln->nsm_clnt_lock);
> >  	return 0;
> >  }
> > diff --git a/fs/nfs_common/Makefile b/fs/nfs_common/Makefile
> > index f689ed82af3a..d153ca3ea577 100644
> > --- a/fs/nfs_common/Makefile
> > +++ b/fs/nfs_common/Makefile
> > @@ -3,5 +3,6 @@
> >  #
> >  
> >  obj-$(CONFIG_NFS_ACL_SUPPORT) += nfs_acl.o
> > -
> >  nfs_acl-objs := nfsacl.o
> > +
> > +obj-$(CONFIG_GRACE_PERIOD) += grace.o
> > diff --git a/fs/lockd/grace.c b/fs/nfs_common/grace.c
> > similarity index 50%
> > rename from fs/lockd/grace.c
> > rename to fs/nfs_common/grace.c
> > index 6d1ee7204c88..ae6e58ea4de5 100644
> > --- a/fs/lockd/grace.c
> > +++ b/fs/nfs_common/grace.c
> > @@ -1,17 +1,20 @@
> >  /*
> >   * Common code for control of lockd and nfsv4 grace periods.
> > + *
> > + * Transplanted from lockd code
> >   */
> >  
> >  #include <linux/module.h>
> > -#include <linux/lockd/bind.h>
> >  #include <net/net_namespace.h>
> > +#include <net/netns/generic.h>
> > +#include <linux/fs.h>
> >  
> > -#include "netns.h"
> > -
> > +static int grace_net_id;
> >  static DEFINE_SPINLOCK(grace_lock);
> >  
> >  /**
> >   * locks_start_grace
> > + * @net: net namespace that this lock manager belongs to
> >   * @lm: who this grace period is for
> >   *
> >   * A grace period is a period during which locks should not be given
> > @@ -21,18 +24,20 @@ static DEFINE_SPINLOCK(grace_lock);
> >   *
> >   * This function is called to start a grace period.
> >   */
> > -void locks_start_grace(struct net *net, struct lock_manager *lm)
> > +void
> > +locks_start_grace(struct net *net, struct lock_manager *lm)
> >  {
> > -	struct lockd_net *ln = net_generic(net, lockd_net_id);
> > +	struct list_head *grace_list = net_generic(net, grace_net_id);
> >  
> >  	spin_lock(&grace_lock);
> > -	list_add(&lm->list, &ln->grace_list);
> > +	list_add(&lm->list, grace_list);
> >  	spin_unlock(&grace_lock);
> >  }
> >  EXPORT_SYMBOL_GPL(locks_start_grace);
> >  
> >  /**
> >   * locks_end_grace
> > + * @net: net namespace that this lock manager belongs to
> >   * @lm: who this grace period is for
> >   *
> >   * Call this function to state that the given lock manager is ready to
> > @@ -41,7 +46,8 @@ EXPORT_SYMBOL_GPL(locks_start_grace);
> >   * Note that callers count on it being safe to call this more than once,
> >   * and the second call should be a no-op.
> >   */
> > -void locks_end_grace(struct lock_manager *lm)
> > +void
> > +locks_end_grace(struct lock_manager *lm)
> >  {
> >  	spin_lock(&grace_lock);
> >  	list_del_init(&lm->list);
> > @@ -56,10 +62,52 @@ EXPORT_SYMBOL_GPL(locks_end_grace);
> >   * to answer ordinary lock requests, and when they should accept only
> >   * lock reclaims.
> >   */
> > -int locks_in_grace(struct net *net)
> > +int
> > +locks_in_grace(struct net *net)
> >  {
> > -	struct lockd_net *ln = net_generic(net, lockd_net_id);
> > +	struct list_head *grace_list = net_generic(net, grace_net_id);
> >  
> > -	return !list_empty(&ln->grace_list);
> > +	return !list_empty(grace_list);
> >  }
> >  EXPORT_SYMBOL_GPL(locks_in_grace);
> > +
> > +static int __net_init
> > +grace_init_net(struct net *net)
> > +{
> > +	struct list_head *grace_list = net_generic(net, grace_net_id);
> > +
> > +	INIT_LIST_HEAD(grace_list);
> > +	return 0;
> > +}
> > +
> > +static void __net_exit
> > +grace_exit_net(struct net *net)
> > +{
> > +	struct list_head *grace_list = net_generic(net, grace_net_id);
> > +
> > +	BUG_ON(!list_empty(grace_list));
> > +}
> > +
> > +static struct pernet_operations grace_net_ops = {
> > +	.init = grace_init_net,
> > +	.exit = grace_exit_net,
> > +	.id   = &grace_net_id,
> > +	.size = sizeof(struct list_head),
> > +};
> > +
> > +static int __init
> > +init_grace(void)
> > +{
> > +	return register_pernet_subsys(&grace_net_ops);
> > +}
> > +
> > +static void __exit
> > +exit_grace(void)
> > +{
> > +	unregister_pernet_subsys(&grace_net_ops);
> > +}
> > +
> > +MODULE_AUTHOR("Jeff Layton <jlayton@primarydata.com>");
> > +MODULE_LICENSE("GPL");
> > +module_init(init_grace)
> > +module_exit(exit_grace)
> > diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
> > index f994e750e0d1..4fa98764de21 100644
> > --- a/fs/nfsd/Kconfig
> > +++ b/fs/nfsd/Kconfig
> > @@ -71,6 +71,7 @@ config NFSD_V4
> >  	select FS_POSIX_ACL
> >  	select SUNRPC_GSS
> >  	select CRYPTO
> > +	select GRACE_PERIOD
> >  	help
> >  	  This option enables support in your system's NFS server for
> >  	  version 4 of the NFS protocol (RFC 3530).
> > diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
> > index 9d117f61d976..b97bf2ef996e 100644
> > --- a/include/linux/proc_fs.h
> > +++ b/include/linux/proc_fs.h
> > @@ -74,6 +74,8 @@ static inline int remove_proc_subtree(const char *name, struct proc_dir_entry *p
> >  
> >  #endif /* CONFIG_PROC_FS */
> >  
> > +struct net;
> > +
> >  static inline struct proc_dir_entry *proc_net_mkdir(
> >  	struct net *net, const char *name, struct proc_dir_entry *parent)
> >  {
> > -- 
> > 1.9.3
> > 


* Re: [PATCH v2 1/5] lockd: move lockd's grace period handling into its own module
  2014-08-28 20:24     ` J. Bruce Fields
@ 2014-08-28 23:53       ` Jeff Layton
  2014-09-03 21:54         ` J. Bruce Fields
  2014-09-15 22:08       ` J. Bruce Fields
  1 sibling, 1 reply; 37+ messages in thread
From: Jeff Layton @ 2014-08-28 23:53 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

On Thu, 28 Aug 2014 16:24:43 -0400
"J. Bruce Fields" <bfields@fieldses.org> wrote:

> On Thu, Aug 28, 2014 at 04:01:00PM -0400, J. Bruce Fields wrote:
> > On Tue, Aug 19, 2014 at 02:38:25PM -0400, Jeff Layton wrote:
> > > Currently, all of the grace period handling is part of lockd. Eventually
> > > though we'd like to be able to build v4-only servers, at which point
> > > we'll need to put all of this elsewhere.
> > > 
> > > Move the code itself into fs/nfs_common and have it build a grace.ko
> > > module. Then, rejigger the Kconfig options so that both nfsd and lockd
> > > enable it automatically.
> > 
> > Thanks, applying this one for 3.18 independently of the others.
> 
> This code should also be fixed, though.
> 
> Currently nfsd is recording the grace period as done when its own timer
> runs out, but then it continues to accept reclaims until lockd is also
> done.
> 
> This is an unusual bug to actually hit in part because lockd's grace
> period is by default less than nfsd's.
> 
> Your code introduces new cases when nfsd's grace period could end before
> lockd's, but in all of them future reclaims would be denied (because the
> only clients active on last shutdown would be 4.1 clients that have all
> issued RECLAIM_COMPLETE, so reclaims are all going to be denied).
> 
> So maybe it's not urgent.
> 
> --b.
> 

Yeah, I don't really see a bug here...

We'll only lift the nfsd grace period once all the v4.1 clients have
sent a RECLAIM_COMPLETE and if there aren't any v4.0 ones. At that
point, no one should be sending any reclaims but we'll still allow them
if the lockd grace period hasn't been lifted yet.

I suppose that we could deny them in principle, but there doesn't seem
to be any real harm in allowing it to happen. The simple "fix" for it
would be to have nfs4_check_open_reclaim do a test for
NFSD4_CLIENT_RECLAIM_COMPLETE, and reject the reclaim request if it has
been set.
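
Something like this (untested sketch, in nfs4_check_open_reclaim once
the client has been looked up):

        if (test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE, &cstate->clp->cl_flags))
                return nfserr_reclaim_bad;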

It also occurs to me that we probably ought to have the nfsdcltrack
upcall set and test NFSD4_CLIENT_STABLE to reduce upcalls. The fact
that we don't do that there seems inefficient (though it shouldn't harm
correctness).

> > 
> > --b.
> > 
> > > 
> > > Signed-off-by: Jeff Layton <jlayton@primarydata.com>
> > > ---
> > >  fs/Kconfig                       |  6 +++-
> > >  fs/lockd/Makefile                |  2 +-
> > >  fs/lockd/netns.h                 |  1 -
> > >  fs/lockd/svc.c                   |  1 -
> > >  fs/nfs_common/Makefile           |  3 +-
> > >  fs/{lockd => nfs_common}/grace.c | 68 ++++++++++++++++++++++++++++++++++------
> > >  fs/nfsd/Kconfig                  |  1 +
> > >  include/linux/proc_fs.h          |  2 ++
> > >  8 files changed, 69 insertions(+), 15 deletions(-)
> > >  rename fs/{lockd => nfs_common}/grace.c (50%)
> > > 
> > > diff --git a/fs/Kconfig b/fs/Kconfig
> > > index 312393f32948..db5dc1598716 100644
> > > --- a/fs/Kconfig
> > > +++ b/fs/Kconfig
> > > @@ -233,9 +233,13 @@ if NETWORK_FILESYSTEMS
> > >  source "fs/nfs/Kconfig"
> > >  source "fs/nfsd/Kconfig"
> > >  
> > > +config GRACE_PERIOD
> > > +	tristate
> > > +
> > >  config LOCKD
> > >  	tristate
> > >  	depends on FILE_LOCKING
> > > +	select GRACE_PERIOD
> > >  
> > >  config LOCKD_V4
> > >  	bool
> > > @@ -249,7 +253,7 @@ config NFS_ACL_SUPPORT
> > >  
> > >  config NFS_COMMON
> > >  	bool
> > > -	depends on NFSD || NFS_FS
> > > +	depends on NFSD || NFS_FS || LOCKD
> > >  	default y
> > >  
> > >  source "net/sunrpc/Kconfig"
> > > diff --git a/fs/lockd/Makefile b/fs/lockd/Makefile
> > > index ca58d64374ca..6a0b351ce30e 100644
> > > --- a/fs/lockd/Makefile
> > > +++ b/fs/lockd/Makefile
> > > @@ -5,6 +5,6 @@
> > >  obj-$(CONFIG_LOCKD) += lockd.o
> > >  
> > >  lockd-objs-y := clntlock.o clntproc.o clntxdr.o host.o svc.o svclock.o \
> > > -	        svcshare.o svcproc.o svcsubs.o mon.o xdr.o grace.o
> > > +	        svcshare.o svcproc.o svcsubs.o mon.o xdr.o
> > >  lockd-objs-$(CONFIG_LOCKD_V4) += clnt4xdr.o xdr4.o svc4proc.o
> > >  lockd-objs		      := $(lockd-objs-y)
> > > diff --git a/fs/lockd/netns.h b/fs/lockd/netns.h
> > > index 5010b55628b4..097bfa3adb1c 100644
> > > --- a/fs/lockd/netns.h
> > > +++ b/fs/lockd/netns.h
> > > @@ -11,7 +11,6 @@ struct lockd_net {
> > >  
> > >  	struct delayed_work grace_period_end;
> > >  	struct lock_manager lockd_manager;
> > > -	struct list_head grace_list;
> > >  
> > >  	spinlock_t nsm_clnt_lock;
> > >  	unsigned int nsm_users;
> > > diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
> > > index 8f27c93f8d2e..3599c9ca28ae 100644
> > > --- a/fs/lockd/svc.c
> > > +++ b/fs/lockd/svc.c
> > > @@ -583,7 +583,6 @@ static int lockd_init_net(struct net *net)
> > >  	struct lockd_net *ln = net_generic(net, lockd_net_id);
> > >  
> > >  	INIT_DELAYED_WORK(&ln->grace_period_end, grace_ender);
> > > -	INIT_LIST_HEAD(&ln->grace_list);
> > >  	spin_lock_init(&ln->nsm_clnt_lock);
> > >  	return 0;
> > >  }
> > > diff --git a/fs/nfs_common/Makefile b/fs/nfs_common/Makefile
> > > index f689ed82af3a..d153ca3ea577 100644
> > > --- a/fs/nfs_common/Makefile
> > > +++ b/fs/nfs_common/Makefile
> > > @@ -3,5 +3,6 @@
> > >  #
> > >  
> > >  obj-$(CONFIG_NFS_ACL_SUPPORT) += nfs_acl.o
> > > -
> > >  nfs_acl-objs := nfsacl.o
> > > +
> > > +obj-$(CONFIG_GRACE_PERIOD) += grace.o
> > > diff --git a/fs/lockd/grace.c b/fs/nfs_common/grace.c
> > > similarity index 50%
> > > rename from fs/lockd/grace.c
> > > rename to fs/nfs_common/grace.c
> > > index 6d1ee7204c88..ae6e58ea4de5 100644
> > > --- a/fs/lockd/grace.c
> > > +++ b/fs/nfs_common/grace.c
> > > @@ -1,17 +1,20 @@
> > >  /*
> > >   * Common code for control of lockd and nfsv4 grace periods.
> > > + *
> > > + * Transplanted from lockd code
> > >   */
> > >  
> > >  #include <linux/module.h>
> > > -#include <linux/lockd/bind.h>
> > >  #include <net/net_namespace.h>
> > > +#include <net/netns/generic.h>
> > > +#include <linux/fs.h>
> > >  
> > > -#include "netns.h"
> > > -
> > > +static int grace_net_id;
> > >  static DEFINE_SPINLOCK(grace_lock);
> > >  
> > >  /**
> > >   * locks_start_grace
> > > + * @net: net namespace that this lock manager belongs to
> > >   * @lm: who this grace period is for
> > >   *
> > >   * A grace period is a period during which locks should not be given
> > > @@ -21,18 +24,20 @@ static DEFINE_SPINLOCK(grace_lock);
> > >   *
> > >   * This function is called to start a grace period.
> > >   */
> > > -void locks_start_grace(struct net *net, struct lock_manager *lm)
> > > +void
> > > +locks_start_grace(struct net *net, struct lock_manager *lm)
> > >  {
> > > -	struct lockd_net *ln = net_generic(net, lockd_net_id);
> > > +	struct list_head *grace_list = net_generic(net, grace_net_id);
> > >  
> > >  	spin_lock(&grace_lock);
> > > -	list_add(&lm->list, &ln->grace_list);
> > > +	list_add(&lm->list, grace_list);
> > >  	spin_unlock(&grace_lock);
> > >  }
> > >  EXPORT_SYMBOL_GPL(locks_start_grace);
> > >  
> > >  /**
> > >   * locks_end_grace
> > > + * @net: net namespace that this lock manager belongs to
> > >   * @lm: who this grace period is for
> > >   *
> > >   * Call this function to state that the given lock manager is ready to
> > > @@ -41,7 +46,8 @@ EXPORT_SYMBOL_GPL(locks_start_grace);
> > >   * Note that callers count on it being safe to call this more than once,
> > >   * and the second call should be a no-op.
> > >   */
> > > -void locks_end_grace(struct lock_manager *lm)
> > > +void
> > > +locks_end_grace(struct lock_manager *lm)
> > >  {
> > >  	spin_lock(&grace_lock);
> > >  	list_del_init(&lm->list);
> > > @@ -56,10 +62,52 @@ EXPORT_SYMBOL_GPL(locks_end_grace);
> > >   * to answer ordinary lock requests, and when they should accept only
> > >   * lock reclaims.
> > >   */
> > > -int locks_in_grace(struct net *net)
> > > +int
> > > +locks_in_grace(struct net *net)
> > >  {
> > > -	struct lockd_net *ln = net_generic(net, lockd_net_id);
> > > +	struct list_head *grace_list = net_generic(net, grace_net_id);
> > >  
> > > -	return !list_empty(&ln->grace_list);
> > > +	return !list_empty(grace_list);
> > >  }
> > >  EXPORT_SYMBOL_GPL(locks_in_grace);
> > > +
> > > +static int __net_init
> > > +grace_init_net(struct net *net)
> > > +{
> > > +	struct list_head *grace_list = net_generic(net, grace_net_id);
> > > +
> > > +	INIT_LIST_HEAD(grace_list);
> > > +	return 0;
> > > +}
> > > +
> > > +static void __net_exit
> > > +grace_exit_net(struct net *net)
> > > +{
> > > +	struct list_head *grace_list = net_generic(net, grace_net_id);
> > > +
> > > +	BUG_ON(!list_empty(grace_list));
> > > +}
> > > +
> > > +static struct pernet_operations grace_net_ops = {
> > > +	.init = grace_init_net,
> > > +	.exit = grace_exit_net,
> > > +	.id   = &grace_net_id,
> > > +	.size = sizeof(struct list_head),
> > > +};
> > > +
> > > +static int __init
> > > +init_grace(void)
> > > +{
> > > +	return register_pernet_subsys(&grace_net_ops);
> > > +}
> > > +
> > > +static void __exit
> > > +exit_grace(void)
> > > +{
> > > +	unregister_pernet_subsys(&grace_net_ops);
> > > +}
> > > +
> > > +MODULE_AUTHOR("Jeff Layton <jlayton@primarydata.com>");
> > > +MODULE_LICENSE("GPL");
> > > +module_init(init_grace)
> > > +module_exit(exit_grace)
> > > diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
> > > index f994e750e0d1..4fa98764de21 100644
> > > --- a/fs/nfsd/Kconfig
> > > +++ b/fs/nfsd/Kconfig
> > > @@ -71,6 +71,7 @@ config NFSD_V4
> > >  	select FS_POSIX_ACL
> > >  	select SUNRPC_GSS
> > >  	select CRYPTO
> > > +	select GRACE_PERIOD
> > >  	help
> > >  	  This option enables support in your system's NFS server for
> > >  	  version 4 of the NFS protocol (RFC 3530).
> > > diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
> > > index 9d117f61d976..b97bf2ef996e 100644
> > > --- a/include/linux/proc_fs.h
> > > +++ b/include/linux/proc_fs.h
> > > @@ -74,6 +74,8 @@ static inline int remove_proc_subtree(const char *name, struct proc_dir_entry *p
> > >  
> > >  #endif /* CONFIG_PROC_FS */
> > >  
> > > +struct net;
> > > +
> > >  static inline struct proc_dir_entry *proc_net_mkdir(
> > >  	struct net *net, const char *name, struct proc_dir_entry *parent)
> > >  {
> > > -- 
> > > 1.9.3
> > > 


-- 
Jeff Layton <jlayton@primarydata.com>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/5] lockd: move lockd's grace period handling into its own module
  2014-08-28 23:53       ` Jeff Layton
@ 2014-09-03 21:54         ` J. Bruce Fields
  0 siblings, 0 replies; 37+ messages in thread
From: J. Bruce Fields @ 2014-09-03 21:54 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-nfs

On Thu, Aug 28, 2014 at 07:53:26PM -0400, Jeff Layton wrote:
> On Thu, 28 Aug 2014 16:24:43 -0400
> "J. Bruce Fields" <bfields@fieldses.org> wrote:
> 
> > On Thu, Aug 28, 2014 at 04:01:00PM -0400, J. Bruce Fields wrote:
> > > On Tue, Aug 19, 2014 at 02:38:25PM -0400, Jeff Layton wrote:
> > > > Currently, all of the grace period handling is part of lockd. Eventually
> > > > though we'd like to be able to build v4-only servers, at which point
> > > > we'll need to put all of this elsewhere.
> > > > 
> > > > Move the code itself into fs/nfs_common and have it build a grace.ko
> > > > module. Then, rejigger the Kconfig options so that both nfsd and lockd
> > > > enable it automatically.
> > > 
> > > Thanks, applying this one for 3.18 indepedently of the others.
> > 
> > This code should also be fixed, though.
> > 
> > Currently nfsd is recording the grace period as done when its own timer
> > runs out, but then it continues to accept reclaims until lockd is also
> > done.
> > 
> > This is an unusual bug to actually hit in part because lockd's grace
> > period is by default less than nfsd's.
> > 
> > Your code introduces new cases when nfsd's grace period could end before
> > lockd's, but in all of them future reclaims would be denied (because the
> > only clients active on last shutdown would be 4.1 clients that have all
> > issued RECLAIM_COMPLETE, so reclaims are all going to be denied).
> > 
> > So maybe it's not urgent.
> > 
> > --b.
> > 
> 
> Yeah, I don't really see a bug here...
> 
> We'll only lift the nfsd grace period once all the v4.1 clients have
> sent a RECLAIM_COMPLETE and if there aren't any v4.0 ones. At that
> point, no one should be sending any reclaims but we'll still allow them
> if the lockd grace period hasn't been lifted yet.
> 
> I suppose that we could deny them in principle, but there doesn't seem
> to be any real harm in allowing it to happen. The simple "fix" for it
> would be to have nfs4_check_open_reclaim do a test for
> NFSD4_CLIENT_RECLAIM_COMPLETE, and reject the reclaim request if it has
> been set.

Oops, I didn't realize we don't already do that.  I don't know that such
reclaims are really harmless.  But since only a very broken client would
send them, maybe it's not a big deal that we don't catch them.  Still,
it's probably a bug not to check for that.
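
For the record, the check Jeff describes would presumably boil down to
something like this in nfs4_check_open_reclaim(), once the clientid lookup
has succeeded (an untested sketch; the helper name and the choice of error
code are just illustrative):

	/*
	 * Sketch only: reject a reclaim from a client that has already sent
	 * RECLAIM_COMPLETE.  Assumes the caller has already looked up the
	 * client, as nfs4_check_open_reclaim does.
	 */
	static __be32
	check_reclaim_allowed(struct nfs4_client *clp)
	{
		if (test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE, &clp->cl_flags))
			return nfserr_reclaim_bad;
		return nfs_ok;
	}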

Anyway, we still do have the preexisting bug in the (non-default) case
where the nfsd grace period ends first.

--b.

> It also occurs to me that we probably ought to have the nfsdcltrack
> upcall set and test NFSD4_CLIENT_STABLE to reduce upcalls. The fact
> that we don't do that there seems inefficient (though it shouldn't harm
> correctness).
> 
> > > 
> > > --b.
> > > 
> > > > 
> > > > Signed-off-by: Jeff Layton <jlayton@primarydata.com>
> > > > ---
> > > >  fs/Kconfig                       |  6 +++-
> > > >  fs/lockd/Makefile                |  2 +-
> > > >  fs/lockd/netns.h                 |  1 -
> > > >  fs/lockd/svc.c                   |  1 -
> > > >  fs/nfs_common/Makefile           |  3 +-
> > > >  fs/{lockd => nfs_common}/grace.c | 68 ++++++++++++++++++++++++++++++++++------
> > > >  fs/nfsd/Kconfig                  |  1 +
> > > >  include/linux/proc_fs.h          |  2 ++
> > > >  8 files changed, 69 insertions(+), 15 deletions(-)
> > > >  rename fs/{lockd => nfs_common}/grace.c (50%)
> > > > 
> > > > diff --git a/fs/Kconfig b/fs/Kconfig
> > > > index 312393f32948..db5dc1598716 100644
> > > > --- a/fs/Kconfig
> > > > +++ b/fs/Kconfig
> > > > @@ -233,9 +233,13 @@ if NETWORK_FILESYSTEMS
> > > >  source "fs/nfs/Kconfig"
> > > >  source "fs/nfsd/Kconfig"
> > > >  
> > > > +config GRACE_PERIOD
> > > > +	tristate
> > > > +
> > > >  config LOCKD
> > > >  	tristate
> > > >  	depends on FILE_LOCKING
> > > > +	select GRACE_PERIOD
> > > >  
> > > >  config LOCKD_V4
> > > >  	bool
> > > > @@ -249,7 +253,7 @@ config NFS_ACL_SUPPORT
> > > >  
> > > >  config NFS_COMMON
> > > >  	bool
> > > > -	depends on NFSD || NFS_FS
> > > > +	depends on NFSD || NFS_FS || LOCKD
> > > >  	default y
> > > >  
> > > >  source "net/sunrpc/Kconfig"
> > > > diff --git a/fs/lockd/Makefile b/fs/lockd/Makefile
> > > > index ca58d64374ca..6a0b351ce30e 100644
> > > > --- a/fs/lockd/Makefile
> > > > +++ b/fs/lockd/Makefile
> > > > @@ -5,6 +5,6 @@
> > > >  obj-$(CONFIG_LOCKD) += lockd.o
> > > >  
> > > >  lockd-objs-y := clntlock.o clntproc.o clntxdr.o host.o svc.o svclock.o \
> > > > -	        svcshare.o svcproc.o svcsubs.o mon.o xdr.o grace.o
> > > > +	        svcshare.o svcproc.o svcsubs.o mon.o xdr.o
> > > >  lockd-objs-$(CONFIG_LOCKD_V4) += clnt4xdr.o xdr4.o svc4proc.o
> > > >  lockd-objs		      := $(lockd-objs-y)
> > > > diff --git a/fs/lockd/netns.h b/fs/lockd/netns.h
> > > > index 5010b55628b4..097bfa3adb1c 100644
> > > > --- a/fs/lockd/netns.h
> > > > +++ b/fs/lockd/netns.h
> > > > @@ -11,7 +11,6 @@ struct lockd_net {
> > > >  
> > > >  	struct delayed_work grace_period_end;
> > > >  	struct lock_manager lockd_manager;
> > > > -	struct list_head grace_list;
> > > >  
> > > >  	spinlock_t nsm_clnt_lock;
> > > >  	unsigned int nsm_users;
> > > > diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
> > > > index 8f27c93f8d2e..3599c9ca28ae 100644
> > > > --- a/fs/lockd/svc.c
> > > > +++ b/fs/lockd/svc.c
> > > > @@ -583,7 +583,6 @@ static int lockd_init_net(struct net *net)
> > > >  	struct lockd_net *ln = net_generic(net, lockd_net_id);
> > > >  
> > > >  	INIT_DELAYED_WORK(&ln->grace_period_end, grace_ender);
> > > > -	INIT_LIST_HEAD(&ln->grace_list);
> > > >  	spin_lock_init(&ln->nsm_clnt_lock);
> > > >  	return 0;
> > > >  }
> > > > diff --git a/fs/nfs_common/Makefile b/fs/nfs_common/Makefile
> > > > index f689ed82af3a..d153ca3ea577 100644
> > > > --- a/fs/nfs_common/Makefile
> > > > +++ b/fs/nfs_common/Makefile
> > > > @@ -3,5 +3,6 @@
> > > >  #
> > > >  
> > > >  obj-$(CONFIG_NFS_ACL_SUPPORT) += nfs_acl.o
> > > > -
> > > >  nfs_acl-objs := nfsacl.o
> > > > +
> > > > +obj-$(CONFIG_GRACE_PERIOD) += grace.o
> > > > diff --git a/fs/lockd/grace.c b/fs/nfs_common/grace.c
> > > > similarity index 50%
> > > > rename from fs/lockd/grace.c
> > > > rename to fs/nfs_common/grace.c
> > > > index 6d1ee7204c88..ae6e58ea4de5 100644
> > > > --- a/fs/lockd/grace.c
> > > > +++ b/fs/nfs_common/grace.c
> > > > @@ -1,17 +1,20 @@
> > > >  /*
> > > >   * Common code for control of lockd and nfsv4 grace periods.
> > > > + *
> > > > + * Transplanted from lockd code
> > > >   */
> > > >  
> > > >  #include <linux/module.h>
> > > > -#include <linux/lockd/bind.h>
> > > >  #include <net/net_namespace.h>
> > > > +#include <net/netns/generic.h>
> > > > +#include <linux/fs.h>
> > > >  
> > > > -#include "netns.h"
> > > > -
> > > > +static int grace_net_id;
> > > >  static DEFINE_SPINLOCK(grace_lock);
> > > >  
> > > >  /**
> > > >   * locks_start_grace
> > > > + * @net: net namespace that this lock manager belongs to
> > > >   * @lm: who this grace period is for
> > > >   *
> > > >   * A grace period is a period during which locks should not be given
> > > > @@ -21,18 +24,20 @@ static DEFINE_SPINLOCK(grace_lock);
> > > >   *
> > > >   * This function is called to start a grace period.
> > > >   */
> > > > -void locks_start_grace(struct net *net, struct lock_manager *lm)
> > > > +void
> > > > +locks_start_grace(struct net *net, struct lock_manager *lm)
> > > >  {
> > > > -	struct lockd_net *ln = net_generic(net, lockd_net_id);
> > > > +	struct list_head *grace_list = net_generic(net, grace_net_id);
> > > >  
> > > >  	spin_lock(&grace_lock);
> > > > -	list_add(&lm->list, &ln->grace_list);
> > > > +	list_add(&lm->list, grace_list);
> > > >  	spin_unlock(&grace_lock);
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(locks_start_grace);
> > > >  
> > > >  /**
> > > >   * locks_end_grace
> > > > + * @net: net namespace that this lock manager belongs to
> > > >   * @lm: who this grace period is for
> > > >   *
> > > >   * Call this function to state that the given lock manager is ready to
> > > > @@ -41,7 +46,8 @@ EXPORT_SYMBOL_GPL(locks_start_grace);
> > > >   * Note that callers count on it being safe to call this more than once,
> > > >   * and the second call should be a no-op.
> > > >   */
> > > > -void locks_end_grace(struct lock_manager *lm)
> > > > +void
> > > > +locks_end_grace(struct lock_manager *lm)
> > > >  {
> > > >  	spin_lock(&grace_lock);
> > > >  	list_del_init(&lm->list);
> > > > @@ -56,10 +62,52 @@ EXPORT_SYMBOL_GPL(locks_end_grace);
> > > >   * to answer ordinary lock requests, and when they should accept only
> > > >   * lock reclaims.
> > > >   */
> > > > -int locks_in_grace(struct net *net)
> > > > +int
> > > > +locks_in_grace(struct net *net)
> > > >  {
> > > > -	struct lockd_net *ln = net_generic(net, lockd_net_id);
> > > > +	struct list_head *grace_list = net_generic(net, grace_net_id);
> > > >  
> > > > -	return !list_empty(&ln->grace_list);
> > > > +	return !list_empty(grace_list);
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(locks_in_grace);
> > > > +
> > > > +static int __net_init
> > > > +grace_init_net(struct net *net)
> > > > +{
> > > > +	struct list_head *grace_list = net_generic(net, grace_net_id);
> > > > +
> > > > +	INIT_LIST_HEAD(grace_list);
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +static void __net_exit
> > > > +grace_exit_net(struct net *net)
> > > > +{
> > > > +	struct list_head *grace_list = net_generic(net, grace_net_id);
> > > > +
> > > > +	BUG_ON(!list_empty(grace_list));
> > > > +}
> > > > +
> > > > +static struct pernet_operations grace_net_ops = {
> > > > +	.init = grace_init_net,
> > > > +	.exit = grace_exit_net,
> > > > +	.id   = &grace_net_id,
> > > > +	.size = sizeof(struct list_head),
> > > > +};
> > > > +
> > > > +static int __init
> > > > +init_grace(void)
> > > > +{
> > > > +	return register_pernet_subsys(&grace_net_ops);
> > > > +}
> > > > +
> > > > +static void __exit
> > > > +exit_grace(void)
> > > > +{
> > > > +	unregister_pernet_subsys(&grace_net_ops);
> > > > +}
> > > > +
> > > > +MODULE_AUTHOR("Jeff Layton <jlayton@primarydata.com>");
> > > > +MODULE_LICENSE("GPL");
> > > > +module_init(init_grace)
> > > > +module_exit(exit_grace)
> > > > diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
> > > > index f994e750e0d1..4fa98764de21 100644
> > > > --- a/fs/nfsd/Kconfig
> > > > +++ b/fs/nfsd/Kconfig
> > > > @@ -71,6 +71,7 @@ config NFSD_V4
> > > >  	select FS_POSIX_ACL
> > > >  	select SUNRPC_GSS
> > > >  	select CRYPTO
> > > > +	select GRACE_PERIOD
> > > >  	help
> > > >  	  This option enables support in your system's NFS server for
> > > >  	  version 4 of the NFS protocol (RFC 3530).
> > > > diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
> > > > index 9d117f61d976..b97bf2ef996e 100644
> > > > --- a/include/linux/proc_fs.h
> > > > +++ b/include/linux/proc_fs.h
> > > > @@ -74,6 +74,8 @@ static inline int remove_proc_subtree(const char *name, struct proc_dir_entry *p
> > > >  
> > > >  #endif /* CONFIG_PROC_FS */
> > > >  
> > > > +struct net;
> > > > +
> > > >  static inline struct proc_dir_entry *proc_net_mkdir(
> > > >  	struct net *net, const char *name, struct proc_dir_entry *parent)
> > > >  {
> > > > -- 
> > > > 1.9.3
> > > > 
> 
> 
> -- 
> Jeff Layton <jlayton@primarydata.com>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 2/5] lockd: add a /proc/fs/lockd/nlm_end_grace file
  2014-08-19 18:38 ` [PATCH v2 2/5] lockd: add a /proc/fs/lockd/nlm_end_grace file Jeff Layton
@ 2014-09-04 19:52   ` J. Bruce Fields
  0 siblings, 0 replies; 37+ messages in thread
From: J. Bruce Fields @ 2014-09-04 19:52 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-nfs

On Tue, Aug 19, 2014 at 02:38:26PM -0400, Jeff Layton wrote:
> Add a new procfile that will allow a (privileged) userland process to
> end the NLM grace period early. The basic idea here will be to have
> sm-notify write to this file, if it sent out no NOTIFY requests when
> it runs. In that situation, we can generally expect that there will be
> no reclaim requests so the grace period can be lifted early.
> 
> +static ssize_t
> +nlm_end_grace_write(struct file *file, const char __user *buf, size_t size,
> +		    loff_t *pos)
> +{
> +	struct lockd_net *ln = net_generic(current->nsproxy->net_ns, lockd_net_id);
> +
> +	if (size > 0)

I'd rather we expect some particular string ("Y\n"?) and return -EINVAL
otherwise, just in case we want to extend this some day.
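
Something along these lines, say (an illustrative sketch only, reusing the
lockd_net plumbing from your patch):

	static ssize_t
	nlm_end_grace_write(struct file *file, const char __user *buf,
			    size_t size, loff_t *pos)
	{
		struct lockd_net *ln = net_generic(current->nsproxy->net_ns,
						   lockd_net_id);
		char data[2];

		if (size < 1 || size > sizeof(data))
			return -EINVAL;
		if (copy_from_user(data, buf, size))
			return -EFAULT;

		/* insist on "Y" or "Y\n"; anything else is an error */
		if (data[0] != 'Y')
			return -EINVAL;
		if (size == 2 && data[1] != '\n')
			return -EINVAL;

		locks_end_grace(&ln->lockd_manager);
		return size;
	}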

--b.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 3/5] nfsd: add a v4_end_grace file to /proc/fs/nfsd
  2014-08-19 18:38 ` [PATCH v2 3/5] nfsd: add a v4_end_grace file to /proc/fs/nfsd Jeff Layton
@ 2014-09-04 19:54   ` J. Bruce Fields
  2014-09-05 11:40     ` Jeff Layton
  0 siblings, 1 reply; 37+ messages in thread
From: J. Bruce Fields @ 2014-09-04 19:54 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-nfs

On Tue, Aug 19, 2014 at 02:38:27PM -0400, Jeff Layton wrote:
> Allow a privileged userland process to end the v4 grace period early.
> Any write to the file will cause the v4 grace period to be lifted.
> The basic idea with this will be to allow the userland client tracking
> program to lift the grace period once it knows that no more clients
> will be reclaiming state.
...
> +/**
> + * write_v4_end_grace - release grace period for nfsd's v4.x lock manager
> + *
> + * Input:
> + *			buf:		ignored
> + *			size:		zero
> + * OR
> + *
> + * Input:
> + * 			buf:		any value
> + *			size:		non-zero length of C string in @buf
> + * Output:
> + *			passed-in buffer filled with "Y" or "N" with a newline
> + *			and NULL-terminated C string. This indicates whether
> + *			the grace period has ended in the current net
> + *			namespace. Return code is the size in bytes of the
> + *			string. Writing to the file will end the grace period
> + *			for nfsd's v4 lock manager.
> + */
> +static ssize_t write_v4_end_grace(struct file *file, char *buf, size_t size)
> +{
> +	struct net *net = file->f_dentry->d_sb->s_fs_info;
> +	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> +
> +	if (size > 0)
> +		nfsd4_end_grace(nn);

Ditto for this one.

Do we really need separate files for nlm and nfsd?

I think the separate nlm and nfsd grace periods may just be a historical
mistake.

--b.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 4/5] nfsd: remove redundant boot_time parm from grace_done client tracking op
  2014-08-19 18:38 ` [PATCH v2 4/5] nfsd: remove redundant boot_time parm from grace_done client tracking op Jeff Layton
@ 2014-09-04 19:54   ` J. Bruce Fields
  0 siblings, 0 replies; 37+ messages in thread
From: J. Bruce Fields @ 2014-09-04 19:54 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-nfs

On Tue, Aug 19, 2014 at 02:38:28PM -0400, Jeff Layton wrote:
> Since it's stored in nfsd_net, we don't need to pass it in separately.

OK.  I may as well take this for 3.18 now, I guess.

--b.

> 
> Signed-off-by: Jeff Layton <jlayton@primarydata.com>
> ---
>  fs/nfsd/nfs4recover.c | 17 ++++++++---------
>  fs/nfsd/nfs4state.c   |  2 +-
>  fs/nfsd/state.h       |  2 +-
>  3 files changed, 10 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
> index 9c271f42604a..a0d2ba956a3f 100644
> --- a/fs/nfsd/nfs4recover.c
> +++ b/fs/nfsd/nfs4recover.c
> @@ -58,7 +58,7 @@ struct nfsd4_client_tracking_ops {
>  	void (*create)(struct nfs4_client *);
>  	void (*remove)(struct nfs4_client *);
>  	int (*check)(struct nfs4_client *);
> -	void (*grace_done)(struct nfsd_net *, time_t);
> +	void (*grace_done)(struct nfsd_net *);
>  };
>  
>  /* Globals */
> @@ -392,7 +392,7 @@ purge_old(struct dentry *parent, struct dentry *child, struct nfsd_net *nn)
>  }
>  
>  static void
> -nfsd4_recdir_purge_old(struct nfsd_net *nn, time_t boot_time)
> +nfsd4_recdir_purge_old(struct nfsd_net *nn)
>  {
>  	int status;
>  
> @@ -1016,7 +1016,7 @@ nfsd4_cld_check(struct nfs4_client *clp)
>  }
>  
>  static void
> -nfsd4_cld_grace_done(struct nfsd_net *nn, time_t boot_time)
> +nfsd4_cld_grace_done(struct nfsd_net *nn)
>  {
>  	int ret;
>  	struct cld_upcall *cup;
> @@ -1029,7 +1029,7 @@ nfsd4_cld_grace_done(struct nfsd_net *nn, time_t boot_time)
>  	}
>  
>  	cup->cu_msg.cm_cmd = Cld_GraceDone;
> -	cup->cu_msg.cm_u.cm_gracetime = (int64_t)boot_time;
> +	cup->cu_msg.cm_u.cm_gracetime = (int64_t)nn->boot_time;
>  	ret = cld_pipe_upcall(cn->cn_pipe, &cup->cu_msg);
>  	if (!ret)
>  		ret = cup->cu_msg.cm_status;
> @@ -1245,13 +1245,12 @@ nfsd4_umh_cltrack_check(struct nfs4_client *clp)
>  }
>  
>  static void
> -nfsd4_umh_cltrack_grace_done(struct nfsd_net __attribute__((unused)) *nn,
> -				time_t boot_time)
> +nfsd4_umh_cltrack_grace_done(struct nfsd_net *nn)
>  {
>  	char *legacy;
>  	char timestr[22]; /* FIXME: better way to determine max size? */
>  
> -	sprintf(timestr, "%ld", boot_time);
> +	sprintf(timestr, "%ld", nn->boot_time);
>  	legacy = nfsd4_cltrack_legacy_topdir();
>  	nfsd4_umh_cltrack_upcall("gracedone", timestr, legacy);
>  	kfree(legacy);
> @@ -1356,10 +1355,10 @@ nfsd4_client_record_check(struct nfs4_client *clp)
>  }
>  
>  void
> -nfsd4_record_grace_done(struct nfsd_net *nn, time_t boot_time)
> +nfsd4_record_grace_done(struct nfsd_net *nn)
>  {
>  	if (nn->client_tracking_ops)
> -		nn->client_tracking_ops->grace_done(nn, boot_time);
> +		nn->client_tracking_ops->grace_done(nn);
>  }
>  
>  static int
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 711280e0e4ac..21becb29dae1 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -4116,7 +4116,7 @@ nfsd4_end_grace(struct nfsd_net *nn)
>  
>  	dprintk("NFSD: end of grace period\n");
>  	nn->grace_ended = true;
> -	nfsd4_record_grace_done(nn, nn->boot_time);
> +	nfsd4_record_grace_done(nn);
>  	locks_end_grace(&nn->nfsd4_manager);
>  	/*
>  	 * Now that every NFSv4 client has had the chance to recover and
> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
> index ecf579904892..854f0c574ccf 100644
> --- a/fs/nfsd/state.h
> +++ b/fs/nfsd/state.h
> @@ -554,7 +554,7 @@ extern void nfsd4_client_tracking_exit(struct net *net);
>  extern void nfsd4_client_record_create(struct nfs4_client *clp);
>  extern void nfsd4_client_record_remove(struct nfs4_client *clp);
>  extern int nfsd4_client_record_check(struct nfs4_client *clp);
> -extern void nfsd4_record_grace_done(struct nfsd_net *nn, time_t boot_time);
> +extern void nfsd4_record_grace_done(struct nfsd_net *nn);
>  
>  /* nfs fault injection functions */
>  #ifdef CONFIG_NFSD_FAULT_INJECTION
> -- 
> 1.9.3
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 5/5] nfsd: pass extra info in env vars to upcalls to allow for early grace period end
  2014-08-19 18:38 ` [PATCH v2 5/5] nfsd: pass extra info in env vars to upcalls to allow for early grace period end Jeff Layton
@ 2014-09-04 19:59   ` J. Bruce Fields
  2014-09-05 11:43     ` Jeff Layton
  0 siblings, 1 reply; 37+ messages in thread
From: J. Bruce Fields @ 2014-09-04 19:59 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-nfs

On Tue, Aug 19, 2014 at 02:38:29PM -0400, Jeff Layton wrote:
> In order to support lifting the grace period early, we must tell
> nfsdcltrack what sort of client the "create" upcall is for. We can't
> reliably tell if a v4.0 client has completed reclaiming, so we can only
> lift the grace period once all the v4.1+ clients have issued a
> RECLAIM_COMPLETE and if there are no v4.0 clients.

We really only care about 4.0 vs 4.1 clients.  Maybe
NFSDCLTRACK_CLIENT_WILL_RECLAIM_COMPLETE=Y|N would be better.
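
The kernel side could then mirror the minorversion helper almost exactly,
something like this (a sketch; the macro and function names are just one
possible spelling):

	#define RECLAIM_COMPLETE_ENV_PREFIX \
		"NFSDCLTRACK_CLIENT_WILL_RECLAIM_COMPLETE="

	static char *
	nfsd4_cltrack_client_will_reclaim_complete(struct nfs4_client *clp)
	{
		size_t len;
		char *result;

		/* prefix + "Y" or "N" + terminating NULL */
		len = strlen(RECLAIM_COMPLETE_ENV_PREFIX) + 1 + 1;

		result = kmalloc(len, GFP_KERNEL);
		if (!result)
			return result;

		/* only sessions-based (v4.1+) clients send RECLAIM_COMPLETE */
		snprintf(result, len, RECLAIM_COMPLETE_ENV_PREFIX "%c",
			 clp->cl_minorversion ? 'Y' : 'N');
		return result;
	}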

--b.

> 
> Also, in order to lift the grace period, we have to tell userland when
> the grace period started so that it can tell whether a RECLAIM_COMPLETE
> has been issued for each client since then.
> 
> Since this is all optional info, we pass it along in environment
> variables to the "init" and "create" upcalls. By doing this, we don't
> need to revise the upcall format. The UMH upcall can simply make use of
> this info if it happens to be present. If it's not then it can just
> avoid lifting the grace period early.
> 
> Signed-off-by: Jeff Layton <jlayton@primarydata.com>
> ---
>  fs/nfsd/nfs4recover.c | 90 ++++++++++++++++++++++++++++++++++++++++++++-------
>  fs/nfsd/nfs4state.c   |  4 +--
>  2 files changed, 80 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
> index a0d2ba956a3f..2b61bcd92b58 100644
> --- a/fs/nfsd/nfs4recover.c
> +++ b/fs/nfsd/nfs4recover.c
> @@ -1062,6 +1062,8 @@ MODULE_PARM_DESC(cltrack_legacy_disable,
>  
>  #define LEGACY_TOPDIR_ENV_PREFIX "NFSDCLTRACK_LEGACY_TOPDIR="
>  #define LEGACY_RECDIR_ENV_PREFIX "NFSDCLTRACK_LEGACY_RECDIR="
> +#define CLIENT_MINORVERS_ENV_PREFIX "NFSDCLTRACK_CLIENT_MINORVERSION="
> +#define GRACE_START_ENV_PREFIX "NFSDCLTRACK_GRACE_START="
>  
>  static char *
>  nfsd4_cltrack_legacy_topdir(void)
> @@ -1126,10 +1128,60 @@ nfsd4_cltrack_legacy_recdir(const struct xdr_netobj *name)
>  	return result;
>  }
>  
> +static char *
> +nfsd4_cltrack_client_minorversion(struct nfs4_client *clp)
> +{
> +	int copied;
> +	size_t len;
> +	char *result;
> +
> +	/* prefix + max width of integer string + terminating NULL */
> +	len = strlen(CLIENT_MINORVERS_ENV_PREFIX) + 10 + 1;
> +
> +	result = kmalloc(len, GFP_KERNEL);
> +	if (!result)
> +		return result;
> +
> +	copied = snprintf(result, len, CLIENT_MINORVERS_ENV_PREFIX "%u",
> +				clp->cl_minorversion);
> +	if (copied >= len) {
> +		/* just return nothing if output was truncated */
> +		kfree(result);
> +		return NULL;
> +	}
> +
> +	return result;
> +}
> +
> +static char *
> +nfsd4_cltrack_grace_start(time_t grace_start)
> +{
> +	int copied;
> +	size_t len;
> +	char *result;
> +
> +	/* prefix + max width of int64_t string + terminating NULL */
> +	len = strlen(GRACE_START_ENV_PREFIX) + 22 + 1;
> +
> +	result = kmalloc(len, GFP_KERNEL);
> +	if (!result)
> +		return result;
> +
> +	copied = snprintf(result, len, GRACE_START_ENV_PREFIX "%ld",
> +				grace_start);
> +	if (copied >= len) {
> +		/* just return nothing if output was truncated */
> +		kfree(result);
> +		return NULL;
> +	}
> +
> +	return result;
> +}
> +
>  static int
> -nfsd4_umh_cltrack_upcall(char *cmd, char *arg, char *legacy)
> +nfsd4_umh_cltrack_upcall(char *cmd, char *arg, char *env0, char *env1)
>  {
> -	char *envp[2];
> +	char *envp[3];
>  	char *argv[4];
>  	int ret;
>  
> @@ -1140,10 +1192,12 @@ nfsd4_umh_cltrack_upcall(char *cmd, char *arg, char *legacy)
>  
>  	dprintk("%s: cmd: %s\n", __func__, cmd);
>  	dprintk("%s: arg: %s\n", __func__, arg ? arg : "(null)");
> -	dprintk("%s: legacy: %s\n", __func__, legacy ? legacy : "(null)");
> +	dprintk("%s: env0: %s\n", __func__, env0 ? env0 : "(null)");
> +	dprintk("%s: env1: %s\n", __func__, env1 ? env1 : "(null)");
>  
> -	envp[0] = legacy;
> -	envp[1] = NULL;
> +	envp[0] = env0;
> +	envp[1] = env1;
> +	envp[2] = NULL;
>  
>  	argv[0] = (char *)cltrack_prog;
>  	argv[1] = cmd;
> @@ -1187,28 +1241,40 @@ bin_to_hex_dup(const unsigned char *src, int srclen)
>  }
>  
>  static int
> -nfsd4_umh_cltrack_init(struct net __attribute__((unused)) *net)
> +nfsd4_umh_cltrack_init(struct net *net)
>  {
> +	int ret;
> +	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> +	char *grace_start = nfsd4_cltrack_grace_start(nn->boot_time);
> +
> >  	/* XXX: The usermode helper is not working in a container yet. */
>  	if (net != &init_net) {
>  		WARN(1, KERN_ERR "NFSD: attempt to initialize umh client "
>  			"tracking in a container!\n");
>  		return -EINVAL;
>  	}
> -	return nfsd4_umh_cltrack_upcall("init", NULL, NULL);
> +
> +	ret = nfsd4_umh_cltrack_upcall("init", NULL, grace_start, NULL);
> +	kfree(grace_start);
> +	return ret;
>  }
>  
>  static void
>  nfsd4_umh_cltrack_create(struct nfs4_client *clp)
>  {
> -	char *hexid;
> +	char *hexid, *minorvers, *grace_start;
> +	struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
>  
>  	hexid = bin_to_hex_dup(clp->cl_name.data, clp->cl_name.len);
>  	if (!hexid) {
>  		dprintk("%s: can't allocate memory for upcall!\n", __func__);
>  		return;
>  	}
> -	nfsd4_umh_cltrack_upcall("create", hexid, NULL);
> +	minorvers = nfsd4_cltrack_client_minorversion(clp);
> +	grace_start = nfsd4_cltrack_grace_start(nn->boot_time);
> +	nfsd4_umh_cltrack_upcall("create", hexid, minorvers, grace_start);
> +	kfree(minorvers);
> +	kfree(grace_start);
>  	kfree(hexid);
>  }
>  
> @@ -1222,7 +1288,7 @@ nfsd4_umh_cltrack_remove(struct nfs4_client *clp)
>  		dprintk("%s: can't allocate memory for upcall!\n", __func__);
>  		return;
>  	}
> -	nfsd4_umh_cltrack_upcall("remove", hexid, NULL);
> +	nfsd4_umh_cltrack_upcall("remove", hexid, NULL, NULL);
>  	kfree(hexid);
>  }
>  
> @@ -1238,7 +1304,7 @@ nfsd4_umh_cltrack_check(struct nfs4_client *clp)
>  		return -ENOMEM;
>  	}
>  	legacy = nfsd4_cltrack_legacy_recdir(&clp->cl_name);
> -	ret = nfsd4_umh_cltrack_upcall("check", hexid, legacy);
> +	ret = nfsd4_umh_cltrack_upcall("check", hexid, legacy, NULL);
>  	kfree(legacy);
>  	kfree(hexid);
>  	return ret;
> @@ -1252,7 +1318,7 @@ nfsd4_umh_cltrack_grace_done(struct nfsd_net *nn)
>  
>  	sprintf(timestr, "%ld", nn->boot_time);
>  	legacy = nfsd4_cltrack_legacy_topdir();
> -	nfsd4_umh_cltrack_upcall("gracedone", timestr, legacy);
> +	nfsd4_umh_cltrack_upcall("gracedone", timestr, legacy, NULL);
>  	kfree(legacy);
>  }
>  
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 21becb29dae1..5b04353bf2ec 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -6342,10 +6342,10 @@ nfs4_state_start_net(struct net *net)
>  	ret = nfs4_state_create_net(net);
>  	if (ret)
>  		return ret;
> -	nfsd4_client_tracking_init(net);
>  	nn->boot_time = get_seconds();
> -	locks_start_grace(net, &nn->nfsd4_manager);
>  	nn->grace_ended = false;
> +	locks_start_grace(net, &nn->nfsd4_manager);
> +	nfsd4_client_tracking_init(net);
>  	printk(KERN_INFO "NFSD: starting %ld-second grace period (net %p)\n",
>  	       nn->nfsd4_grace, net);
>  	queue_delayed_work(laundry_wq, &nn->laundromat_work, nn->nfsd4_grace * HZ);
> -- 
> 1.9.3
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 3/5] nfsd: add a v4_end_grace file to /proc/fs/nfsd
  2014-09-04 19:54   ` J. Bruce Fields
@ 2014-09-05 11:40     ` Jeff Layton
  0 siblings, 0 replies; 37+ messages in thread
From: Jeff Layton @ 2014-09-05 11:40 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

On Thu, 4 Sep 2014 15:54:20 -0400
"J. Bruce Fields" <bfields@fieldses.org> wrote:

> On Tue, Aug 19, 2014 at 02:38:27PM -0400, Jeff Layton wrote:
> > Allow a privileged userland process to end the v4 grace period early.
> > Any write to the file will cause the v4 grace period to be lifted.
> > The basic idea with this will be to allow the userland client tracking
> > program to lift the grace period once it knows that no more clients
> > will be reclaiming state.
> ...
> > +/**
> > + * write_v4_end_grace - release grace period for nfsd's v4.x lock manager
> > + *
> > + * Input:
> > + *			buf:		ignored
> > + *			size:		zero
> > + * OR
> > + *
> > + * Input:
> > + * 			buf:		any value
> > + *			size:		non-zero length of C string in @buf
> > + * Output:
> > + *			passed-in buffer filled with "Y" or "N" with a newline
> > + *			and NULL-terminated C string. This indicates whether
> > + *			the grace period has ended in the current net
> > + *			namespace. Return code is the size in bytes of the
> > + *			string. Writing to the file will end the grace period
> > + *			for nfsd's v4 lock manager.
> > + */
> > +static ssize_t write_v4_end_grace(struct file *file, char *buf, size_t size)
> > +{
> > +	struct net *net = file->f_dentry->d_sb->s_fs_info;
> > +	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> > +
> > +	if (size > 0)
> > +		nfsd4_end_grace(nn);
> 
> Ditto for this one.
> 

Sure. I'll fix that up in the next iteration.

> Do we really need separate files for nlm and nfsd?
> 
> I think the separate nlm and nfsd grace periods may just be a historical
> mistake.
> 

statd and nfsdcltrack are separate programs, and both will need to
"sign off" before we can lift the grace period. With the way that grace
period management works today, I don't see a way to do this without
separate files. If you have an idea in mind for how to unify this
interface then I'm happy to entertain it however...
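
For concreteness, the "sign off" on either side is nothing more than a write
to the corresponding file (sm-notify for /proc/fs/lockd/nlm_end_grace,
nfsdcltrack for /proc/fs/nfsd/v4_end_grace), along the lines of this
userland sketch:

	#include <fcntl.h>
	#include <unistd.h>
	#include <errno.h>

	/* write "Y\n" to the given end-grace file; returns 0 or -errno */
	static int
	lift_grace(const char *path)
	{
		int fd, ret = 0;

		fd = open(path, O_WRONLY);
		if (fd < 0)
			return -errno;
		if (write(fd, "Y\n", 2) < 0)
			ret = -errno;
		close(fd);
		return ret;
	}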

-- 
Jeff Layton <jlayton@primarydata.com>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 5/5] nfsd: pass extra info in env vars to upcalls to allow for early grace period end
  2014-09-04 19:59   ` J. Bruce Fields
@ 2014-09-05 11:43     ` Jeff Layton
  2014-09-05 15:58       ` J. Bruce Fields
  0 siblings, 1 reply; 37+ messages in thread
From: Jeff Layton @ 2014-09-05 11:43 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

On Thu, 4 Sep 2014 15:59:14 -0400
"J. Bruce Fields" <bfields@fieldses.org> wrote:

> On Tue, Aug 19, 2014 at 02:38:29PM -0400, Jeff Layton wrote:
> > In order to support lifting the grace period early, we must tell
> > nfsdcltrack what sort of client the "create" upcall is for. We can't
> > reliably tell if a v4.0 client has completed reclaiming, so we can only
> > lift the grace period once all the v4.1+ clients have issued a
> > RECLAIM_COMPLETE and if there are no v4.0 clients.
> 
> We really only care about 4.0 vs 4.1 clients.  Maybe
> NFSDCLTRACK_CLIENT_WILL_RECLAIM_COMPLETE=Y|N would be better.
> 
> --b.
> 

We have v4.2 in the queue now, and v4.3 will eventually come later.
Maybe v4.4 or v4.5 will change how recovery works? I figured it was
better to pass down more specific info and let userland sort out
what to do with it.

That said, if you think that a simple y/n flag is better, then I'll go
with that.

> > 
> > Also, in order to lift the grace period, we have to tell userland when
> > the grace period started so that it can tell whether a RECLAIM_COMPLETE
> > has been issued for each client since then.
> > 
> > Since this is all optional info, we pass it along in environment
> > variables to the "init" and "create" upcalls. By doing this, we don't
> > need to revise the upcall format. The UMH upcall can simply make use of
> > this info if it happens to be present. If it's not then it can just
> > avoid lifting the grace period early.
> > 
> > Signed-off-by: Jeff Layton <jlayton@primarydata.com>
> > ---
> >  fs/nfsd/nfs4recover.c | 90 ++++++++++++++++++++++++++++++++++++++++++++-------
> >  fs/nfsd/nfs4state.c   |  4 +--
> >  2 files changed, 80 insertions(+), 14 deletions(-)
> > 
> > diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
> > index a0d2ba956a3f..2b61bcd92b58 100644
> > --- a/fs/nfsd/nfs4recover.c
> > +++ b/fs/nfsd/nfs4recover.c
> > @@ -1062,6 +1062,8 @@ MODULE_PARM_DESC(cltrack_legacy_disable,
> >  
> >  #define LEGACY_TOPDIR_ENV_PREFIX "NFSDCLTRACK_LEGACY_TOPDIR="
> >  #define LEGACY_RECDIR_ENV_PREFIX "NFSDCLTRACK_LEGACY_RECDIR="
> > +#define CLIENT_MINORVERS_ENV_PREFIX "NFSDCLTRACK_CLIENT_MINORVERSION="
> > +#define GRACE_START_ENV_PREFIX "NFSDCLTRACK_GRACE_START="
> >  
> >  static char *
> >  nfsd4_cltrack_legacy_topdir(void)
> > @@ -1126,10 +1128,60 @@ nfsd4_cltrack_legacy_recdir(const struct xdr_netobj *name)
> >  	return result;
> >  }
> >  
> > +static char *
> > +nfsd4_cltrack_client_minorversion(struct nfs4_client *clp)
> > +{
> > +	int copied;
> > +	size_t len;
> > +	char *result;
> > +
> > +	/* prefix + max width of integer string + terminating NULL */
> > +	len = strlen(CLIENT_MINORVERS_ENV_PREFIX) + 10 + 1;
> > +
> > +	result = kmalloc(len, GFP_KERNEL);
> > +	if (!result)
> > +		return result;
> > +
> > +	copied = snprintf(result, len, CLIENT_MINORVERS_ENV_PREFIX "%u",
> > +				clp->cl_minorversion);
> > +	if (copied >= len) {
> > +		/* just return nothing if output was truncated */
> > +		kfree(result);
> > +		return NULL;
> > +	}
> > +
> > +	return result;
> > +}
> > +
> > +static char *
> > +nfsd4_cltrack_grace_start(time_t grace_start)
> > +{
> > +	int copied;
> > +	size_t len;
> > +	char *result;
> > +
> > +	/* prefix + max width of int64_t string + terminating NULL */
> > +	len = strlen(GRACE_START_ENV_PREFIX) + 22 + 1;
> > +
> > +	result = kmalloc(len, GFP_KERNEL);
> > +	if (!result)
> > +		return result;
> > +
> > +	copied = snprintf(result, len, GRACE_START_ENV_PREFIX "%ld",
> > +				grace_start);
> > +	if (copied >= len) {
> > +		/* just return nothing if output was truncated */
> > +		kfree(result);
> > +		return NULL;
> > +	}
> > +
> > +	return result;
> > +}
> > +
> >  static int
> > -nfsd4_umh_cltrack_upcall(char *cmd, char *arg, char *legacy)
> > +nfsd4_umh_cltrack_upcall(char *cmd, char *arg, char *env0, char *env1)
> >  {
> > -	char *envp[2];
> > +	char *envp[3];
> >  	char *argv[4];
> >  	int ret;
> >  
> > @@ -1140,10 +1192,12 @@ nfsd4_umh_cltrack_upcall(char *cmd, char *arg, char *legacy)
> >  
> >  	dprintk("%s: cmd: %s\n", __func__, cmd);
> >  	dprintk("%s: arg: %s\n", __func__, arg ? arg : "(null)");
> > -	dprintk("%s: legacy: %s\n", __func__, legacy ? legacy : "(null)");
> > +	dprintk("%s: env0: %s\n", __func__, env0 ? env0 : "(null)");
> > +	dprintk("%s: env1: %s\n", __func__, env1 ? env1 : "(null)");
> >  
> > -	envp[0] = legacy;
> > -	envp[1] = NULL;
> > +	envp[0] = env0;
> > +	envp[1] = env1;
> > +	envp[2] = NULL;
> >  
> >  	argv[0] = (char *)cltrack_prog;
> >  	argv[1] = cmd;
> > @@ -1187,28 +1241,40 @@ bin_to_hex_dup(const unsigned char *src, int srclen)
> >  }
> >  
> >  static int
> > -nfsd4_umh_cltrack_init(struct net __attribute__((unused)) *net)
> > +nfsd4_umh_cltrack_init(struct net *net)
> >  {
> > +	int ret;
> > +	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> > +	char *grace_start = nfsd4_cltrack_grace_start(nn->boot_time);
> > +
> > >  	/* XXX: The usermode helper is not working in a container yet. */
> >  	if (net != &init_net) {
> >  		WARN(1, KERN_ERR "NFSD: attempt to initialize umh client "
> >  			"tracking in a container!\n");
> >  		return -EINVAL;
> >  	}
> > -	return nfsd4_umh_cltrack_upcall("init", NULL, NULL);
> > +
> > +	ret = nfsd4_umh_cltrack_upcall("init", NULL, grace_start, NULL);
> > +	kfree(grace_start);
> > +	return ret;
> >  }
> >  
> >  static void
> >  nfsd4_umh_cltrack_create(struct nfs4_client *clp)
> >  {
> > -	char *hexid;
> > +	char *hexid, *minorvers, *grace_start;
> > +	struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
> >  
> >  	hexid = bin_to_hex_dup(clp->cl_name.data, clp->cl_name.len);
> >  	if (!hexid) {
> >  		dprintk("%s: can't allocate memory for upcall!\n", __func__);
> >  		return;
> >  	}
> > -	nfsd4_umh_cltrack_upcall("create", hexid, NULL);
> > +	minorvers = nfsd4_cltrack_client_minorversion(clp);
> > +	grace_start = nfsd4_cltrack_grace_start(nn->boot_time);
> > +	nfsd4_umh_cltrack_upcall("create", hexid, minorvers, grace_start);
> > +	kfree(minorvers);
> > +	kfree(grace_start);
> >  	kfree(hexid);
> >  }
> >  
> > @@ -1222,7 +1288,7 @@ nfsd4_umh_cltrack_remove(struct nfs4_client *clp)
> >  		dprintk("%s: can't allocate memory for upcall!\n", __func__);
> >  		return;
> >  	}
> > -	nfsd4_umh_cltrack_upcall("remove", hexid, NULL);
> > +	nfsd4_umh_cltrack_upcall("remove", hexid, NULL, NULL);
> >  	kfree(hexid);
> >  }
> >  
> > @@ -1238,7 +1304,7 @@ nfsd4_umh_cltrack_check(struct nfs4_client *clp)
> >  		return -ENOMEM;
> >  	}
> >  	legacy = nfsd4_cltrack_legacy_recdir(&clp->cl_name);
> > -	ret = nfsd4_umh_cltrack_upcall("check", hexid, legacy);
> > +	ret = nfsd4_umh_cltrack_upcall("check", hexid, legacy, NULL);
> >  	kfree(legacy);
> >  	kfree(hexid);
> >  	return ret;
> > @@ -1252,7 +1318,7 @@ nfsd4_umh_cltrack_grace_done(struct nfsd_net *nn)
> >  
> >  	sprintf(timestr, "%ld", nn->boot_time);
> >  	legacy = nfsd4_cltrack_legacy_topdir();
> > -	nfsd4_umh_cltrack_upcall("gracedone", timestr, legacy);
> > +	nfsd4_umh_cltrack_upcall("gracedone", timestr, legacy, NULL);
> >  	kfree(legacy);
> >  }
> >  
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index 21becb29dae1..5b04353bf2ec 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -6342,10 +6342,10 @@ nfs4_state_start_net(struct net *net)
> >  	ret = nfs4_state_create_net(net);
> >  	if (ret)
> >  		return ret;
> > -	nfsd4_client_tracking_init(net);
> >  	nn->boot_time = get_seconds();
> > -	locks_start_grace(net, &nn->nfsd4_manager);
> >  	nn->grace_ended = false;
> > +	locks_start_grace(net, &nn->nfsd4_manager);
> > +	nfsd4_client_tracking_init(net);
> >  	printk(KERN_INFO "NFSD: starting %ld-second grace period (net %p)\n",
> >  	       nn->nfsd4_grace, net);
> >  	queue_delayed_work(laundry_wq, &nn->laundromat_work, nn->nfsd4_grace * HZ);
> > -- 
> > 1.9.3
> > 


-- 
Jeff Layton <jlayton@primarydata.com>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 5/5] nfsd: pass extra info in env vars to upcalls to allow for early grace period end
  2014-09-05 11:43     ` Jeff Layton
@ 2014-09-05 15:58       ` J. Bruce Fields
  0 siblings, 0 replies; 37+ messages in thread
From: J. Bruce Fields @ 2014-09-05 15:58 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-nfs

On Fri, Sep 05, 2014 at 07:43:00AM -0400, Jeff Layton wrote:
> On Thu, 4 Sep 2014 15:59:14 -0400
> "J. Bruce Fields" <bfields@fieldses.org> wrote:
> 
> > On Tue, Aug 19, 2014 at 02:38:29PM -0400, Jeff Layton wrote:
> > > In order to support lifting the grace period early, we must tell
> > > nfsdcltrack what sort of client the "create" upcall is for. We can't
> > > reliably tell if a v4.0 client has completed reclaiming, so we can only
> > > lift the grace period once all the v4.1+ clients have issued a
> > > RECLAIM_COMPLETE and if there are no v4.0 clients.
> > 
> > We really only care about 4.0 vs 4.1 clients.  Maybe
> > NFSDCLTRACK_CLIENT_WILL_RECLAIM_COMPLETE=Y|N would be better.
> > 
> > --b.
> > 
> 
> We have v4.2 in the queue now, and v4.3 will eventually come later.
> Maybe v4.4 or v4.5 will change how recovery works? I figured it was
> better to pass down more specific info and let userland sort out
> what to do with it.
> 
> That said, if you think that a simple y/n flag is better, then I'll go
> with that.

Yes, I'd prefer the single flag, thanks.  That doesn't rule out adding
more information later if you turn out to be right.

--b.

> 
> > > 
> > > Also, in order to lift the grace period, we have to tell userland when
> > > the grace period started so that it can tell whether a RECLAIM_COMPLETE
> > > has been issued for each client since then.
> > > 
> > > Since this is all optional info, we pass it along in environment
> > > variables to the "init" and "create" upcalls. By doing this, we don't
> > > need to revise the upcall format. The UMH upcall can simply make use of
> > > this info if it happens to be present. If it's not then it can just
> > > avoid lifting the grace period early.
> > > 
> > > Signed-off-by: Jeff Layton <jlayton@primarydata.com>
> > > ---
> > >  fs/nfsd/nfs4recover.c | 90 ++++++++++++++++++++++++++++++++++++++++++++-------
> > >  fs/nfsd/nfs4state.c   |  4 +--
> > >  2 files changed, 80 insertions(+), 14 deletions(-)
> > > 
> > > diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
> > > index a0d2ba956a3f..2b61bcd92b58 100644
> > > --- a/fs/nfsd/nfs4recover.c
> > > +++ b/fs/nfsd/nfs4recover.c
> > > @@ -1062,6 +1062,8 @@ MODULE_PARM_DESC(cltrack_legacy_disable,
> > >  
> > >  #define LEGACY_TOPDIR_ENV_PREFIX "NFSDCLTRACK_LEGACY_TOPDIR="
> > >  #define LEGACY_RECDIR_ENV_PREFIX "NFSDCLTRACK_LEGACY_RECDIR="
> > > +#define CLIENT_MINORVERS_ENV_PREFIX "NFSDCLTRACK_CLIENT_MINORVERSION="
> > > +#define GRACE_START_ENV_PREFIX "NFSDCLTRACK_GRACE_START="
> > >  
> > >  static char *
> > >  nfsd4_cltrack_legacy_topdir(void)
> > > @@ -1126,10 +1128,60 @@ nfsd4_cltrack_legacy_recdir(const struct xdr_netobj *name)
> > >  	return result;
> > >  }
> > >  
> > > +static char *
> > > +nfsd4_cltrack_client_minorversion(struct nfs4_client *clp)
> > > +{
> > > +	int copied;
> > > +	size_t len;
> > > +	char *result;
> > > +
> > > +	/* prefix + max width of integer string + terminating NULL */
> > > +	len = strlen(CLIENT_MINORVERS_ENV_PREFIX) + 10 + 1;
> > > +
> > > +	result = kmalloc(len, GFP_KERNEL);
> > > +	if (!result)
> > > +		return result;
> > > +
> > > +	copied = snprintf(result, len, CLIENT_MINORVERS_ENV_PREFIX "%u",
> > > +				clp->cl_minorversion);
> > > +	if (copied >= len) {
> > > +		/* just return nothing if output was truncated */
> > > +		kfree(result);
> > > +		return NULL;
> > > +	}
> > > +
> > > +	return result;
> > > +}
> > > +
> > > +static char *
> > > +nfsd4_cltrack_grace_start(time_t grace_start)
> > > +{
> > > +	int copied;
> > > +	size_t len;
> > > +	char *result;
> > > +
> > > +	/* prefix + max width of int64_t string + terminating NULL */
> > > +	len = strlen(GRACE_START_ENV_PREFIX) + 22 + 1;
> > > +
> > > +	result = kmalloc(len, GFP_KERNEL);
> > > +	if (!result)
> > > +		return result;
> > > +
> > > +	copied = snprintf(result, len, GRACE_START_ENV_PREFIX "%ld",
> > > +				grace_start);
> > > +	if (copied >= len) {
> > > +		/* just return nothing if output was truncated */
> > > +		kfree(result);
> > > +		return NULL;
> > > +	}
> > > +
> > > +	return result;
> > > +}
> > > +
> > >  static int
> > > -nfsd4_umh_cltrack_upcall(char *cmd, char *arg, char *legacy)
> > > +nfsd4_umh_cltrack_upcall(char *cmd, char *arg, char *env0, char *env1)
> > >  {
> > > -	char *envp[2];
> > > +	char *envp[3];
> > >  	char *argv[4];
> > >  	int ret;
> > >  
> > > @@ -1140,10 +1192,12 @@ nfsd4_umh_cltrack_upcall(char *cmd, char *arg, char *legacy)
> > >  
> > >  	dprintk("%s: cmd: %s\n", __func__, cmd);
> > >  	dprintk("%s: arg: %s\n", __func__, arg ? arg : "(null)");
> > > -	dprintk("%s: legacy: %s\n", __func__, legacy ? legacy : "(null)");
> > > +	dprintk("%s: env0: %s\n", __func__, env0 ? env0 : "(null)");
> > > +	dprintk("%s: env1: %s\n", __func__, env1 ? env1 : "(null)");
> > >  
> > > -	envp[0] = legacy;
> > > -	envp[1] = NULL;
> > > +	envp[0] = env0;
> > > +	envp[1] = env1;
> > > +	envp[2] = NULL;
> > >  
> > >  	argv[0] = (char *)cltrack_prog;
> > >  	argv[1] = cmd;
> > > @@ -1187,28 +1241,40 @@ bin_to_hex_dup(const unsigned char *src, int srclen)
> > >  }
> > >  
> > >  static int
> > > -nfsd4_umh_cltrack_init(struct net __attribute__((unused)) *net)
> > > +nfsd4_umh_cltrack_init(struct net *net)
> > >  {
> > > +	int ret;
> > > +	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> > > +	char *grace_start = nfsd4_cltrack_grace_start(nn->boot_time);
> > > +
> > > >  	/* XXX: The usermode helper is not working in a container yet. */
> > >  	if (net != &init_net) {
> > >  		WARN(1, KERN_ERR "NFSD: attempt to initialize umh client "
> > >  			"tracking in a container!\n");
> > >  		return -EINVAL;
> > >  	}
> > > -	return nfsd4_umh_cltrack_upcall("init", NULL, NULL);
> > > +
> > > +	ret = nfsd4_umh_cltrack_upcall("init", NULL, grace_start, NULL);
> > > +	kfree(grace_start);
> > > +	return ret;
> > >  }
> > >  
> > >  static void
> > >  nfsd4_umh_cltrack_create(struct nfs4_client *clp)
> > >  {
> > > -	char *hexid;
> > > +	char *hexid, *minorvers, *grace_start;
> > > +	struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
> > >  
> > >  	hexid = bin_to_hex_dup(clp->cl_name.data, clp->cl_name.len);
> > >  	if (!hexid) {
> > >  		dprintk("%s: can't allocate memory for upcall!\n", __func__);
> > >  		return;
> > >  	}
> > > -	nfsd4_umh_cltrack_upcall("create", hexid, NULL);
> > > +	minorvers = nfsd4_cltrack_client_minorversion(clp);
> > > +	grace_start = nfsd4_cltrack_grace_start(nn->boot_time);
> > > +	nfsd4_umh_cltrack_upcall("create", hexid, minorvers, grace_start);
> > > +	kfree(minorvers);
> > > +	kfree(grace_start);
> > >  	kfree(hexid);
> > >  }
> > >  
> > > @@ -1222,7 +1288,7 @@ nfsd4_umh_cltrack_remove(struct nfs4_client *clp)
> > >  		dprintk("%s: can't allocate memory for upcall!\n", __func__);
> > >  		return;
> > >  	}
> > > -	nfsd4_umh_cltrack_upcall("remove", hexid, NULL);
> > > +	nfsd4_umh_cltrack_upcall("remove", hexid, NULL, NULL);
> > >  	kfree(hexid);
> > >  }
> > >  
> > > @@ -1238,7 +1304,7 @@ nfsd4_umh_cltrack_check(struct nfs4_client *clp)
> > >  		return -ENOMEM;
> > >  	}
> > >  	legacy = nfsd4_cltrack_legacy_recdir(&clp->cl_name);
> > > -	ret = nfsd4_umh_cltrack_upcall("check", hexid, legacy);
> > > +	ret = nfsd4_umh_cltrack_upcall("check", hexid, legacy, NULL);
> > >  	kfree(legacy);
> > >  	kfree(hexid);
> > >  	return ret;
> > > @@ -1252,7 +1318,7 @@ nfsd4_umh_cltrack_grace_done(struct nfsd_net *nn)
> > >  
> > >  	sprintf(timestr, "%ld", nn->boot_time);
> > >  	legacy = nfsd4_cltrack_legacy_topdir();
> > > -	nfsd4_umh_cltrack_upcall("gracedone", timestr, legacy);
> > > +	nfsd4_umh_cltrack_upcall("gracedone", timestr, legacy, NULL);
> > >  	kfree(legacy);
> > >  }
> > >  
> > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > index 21becb29dae1..5b04353bf2ec 100644
> > > --- a/fs/nfsd/nfs4state.c
> > > +++ b/fs/nfsd/nfs4state.c
> > > @@ -6342,10 +6342,10 @@ nfs4_state_start_net(struct net *net)
> > >  	ret = nfs4_state_create_net(net);
> > >  	if (ret)
> > >  		return ret;
> > > -	nfsd4_client_tracking_init(net);
> > >  	nn->boot_time = get_seconds();
> > > -	locks_start_grace(net, &nn->nfsd4_manager);
> > >  	nn->grace_ended = false;
> > > +	locks_start_grace(net, &nn->nfsd4_manager);
> > > +	nfsd4_client_tracking_init(net);
> > >  	printk(KERN_INFO "NFSD: starting %ld-second grace period (net %p)\n",
> > >  	       nn->nfsd4_grace, net);
> > >  	queue_delayed_work(laundry_wq, &nn->laundromat_work, nn->nfsd4_grace * HZ);
> > > -- 
> > > 1.9.3
> > > 
> 
> 
> -- 
> Jeff Layton <jlayton@primarydata.com>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/5] lockd: move lockd's grace period handling into its own module
  2014-08-28 20:24     ` J. Bruce Fields
  2014-08-28 23:53       ` Jeff Layton
@ 2014-09-15 22:08       ` J. Bruce Fields
  2014-09-15 22:09         ` J. Bruce Fields
  2014-09-15 23:11         ` Jeff Layton
  1 sibling, 2 replies; 37+ messages in thread
From: J. Bruce Fields @ 2014-09-15 22:08 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-nfs

On Thu, Aug 28, 2014 at 04:24:43PM -0400, J. Bruce Fields wrote:
> On Thu, Aug 28, 2014 at 04:01:00PM -0400, J. Bruce Fields wrote:
> > On Tue, Aug 19, 2014 at 02:38:25PM -0400, Jeff Layton wrote:
> > > Currently, all of the grace period handling is part of lockd. Eventually
> > > though we'd like to be able to build v4-only servers, at which point
> > > we'll need to put all of this elsewhere.
> > > 
> > > Move the code itself into fs/nfs_common and have it build a grace.ko
> > > module. Then, rejigger the Kconfig options so that both nfsd and lockd
> > > enable it automatically.
> > 
> > Thanks, applying this one for 3.18 indepedently of the others.
> 
> This code should also be fixed, though.
> 
> Currently nfsd is recording the grace period as done when its own timer
> runs out, but then it continues to accept reclaims until lockd is also
> done.

I sat down to fix this today, then decided there's not a real bug:

All the grace_done upcall really does is throw away any previous clients
that haven't reclaimed yet.

It's legal to do that as soon as the correct amount of time has passed.
It's actually OK to continue to allow the grace period to run past that
point; the only requirement is that all reclaims precede all new opens,
which the code still does enforce.

So I think it might just be worth a little extra explanation.

--b.

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index a298c3d..ca6f6ee 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -4122,8 +4122,23 @@ nfsd4_end_grace(struct nfsd_net *nn)
 
 	dprintk("NFSD: end of grace period\n");
 	nn->grace_ended = true;
+	/*
+	 * If the server goes down again right now, an NFSv4
+	 * client will still be allowed to reclaim after it comes back up,
+	 * even if it hasn't yet had a chance to reclaim state this time.
+	 */
 	nfsd4_record_grace_done(nn);
+	/*
+	 * At this point, NFSv4 clients can still reclaim.  But if the
+	 * server crashes, any that have not yet reclaimed will be out
+	 * of luck on the next boot.
+	 */
 	locks_end_grace(&nn->nfsd4_manager);
+	/*
+	 * At this point, and once lockd and/or any other containers
+	 * exit their grace period, further reclaims aren't permitted
+	 * and regular locking can resume.
+	 */
 }
 
 static time_t

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/5] lockd: move lockd's grace period handling into its own module
  2014-09-15 22:08       ` J. Bruce Fields
@ 2014-09-15 22:09         ` J. Bruce Fields
  2014-09-15 22:10           ` J. Bruce Fields
  2014-09-15 23:19           ` Jeff Layton
  2014-09-15 23:11         ` Jeff Layton
  1 sibling, 2 replies; 37+ messages in thread
From: J. Bruce Fields @ 2014-09-15 22:09 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-nfs

On Mon, Sep 15, 2014 at 06:08:13PM -0400, J. Bruce Fields wrote:
> On Thu, Aug 28, 2014 at 04:24:43PM -0400, J. Bruce Fields wrote:
> > On Thu, Aug 28, 2014 at 04:01:00PM -0400, J. Bruce Fields wrote:
> > > On Tue, Aug 19, 2014 at 02:38:25PM -0400, Jeff Layton wrote:
> > > > Currently, all of the grace period handling is part of lockd. Eventually
> > > > though we'd like to be able to build v4-only servers, at which point
> > > > we'll need to put all of this elsewhere.
> > > > 
> > > > Move the code itself into fs/nfs_common and have it build a grace.ko
> > > > module. Then, rejigger the Kconfig options so that both nfsd and lockd
> > > > enable it automatically.
> > > 
> > > Thanks, applying this one for 3.18 indepedently of the others.
> > 
> > This code should also be fixed, though.
> > 
> > Currently nfsd is recording the grace period as done when its own timer
> > runs out, but then it continues to accept reclaims until lockd is also
> > done.
> 
> I sat down to fix this today then decided there's not a real bug:
> 
> All the grace_done upcall really does is throw away any previous clients
> that haven't reclaimed yet.
> 
> It's legal to do that as soon as the correct amount of time has passed.
> It's actually OK to continue to allow the grace period to run past that
> point, the only requirement is that all reclaims precede all new opens,
> which the code still does enforce.
> 
> So I think it might just be worth a little extra explanation.

I considered doing something like the following anyway (total
off-the-cuff draft, not tested), but I think it's overkill.

--b.

commit 675121da48fb4d7d85b58467c7f79dd3a6964879
Author: J. Bruce Fields <bfields@redhat.com>
Date:   Mon Sep 15 17:42:14 2014 -0400

    nfsd4: end grace only when last grace period ends
    
    In the case where both lockd and nfsd4 are running, or when we have
    multiple containers exporting the same filesystem, there may be multiple
    grace periods.  In that case the grace period doesn't really end until
    the last one does, and though it's harmless to do the grace_done upcall
    before then, it would probably be better not to.
    
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>

diff --git a/fs/nfs_common/grace.c b/fs/nfs_common/grace.c
index ae6e58e..dc67e5d 100644
--- a/fs/nfs_common/grace.c
+++ b/fs/nfs_common/grace.c
@@ -12,6 +12,11 @@
 static int grace_net_id;
 static DEFINE_SPINLOCK(grace_lock);
 
+struct grace_net {
+	struct list_head grace_list;
+	bool in_grace;
+};
+
 /**
  * locks_start_grace
  * @net: net namespace that this lock manager belongs to
@@ -27,10 +32,10 @@ static DEFINE_SPINLOCK(grace_lock);
 void
 locks_start_grace(struct net *net, struct lock_manager *lm)
 {
-	struct list_head *grace_list = net_generic(net, grace_net_id);
+	struct grace_net *gn = net_generic(net, grace_net_id);
 
 	spin_lock(&grace_lock);
-	list_add(&lm->list, grace_list);
+	list_add(&lm->list, &gn->grace_list);
 	spin_unlock(&grace_lock);
 }
 EXPORT_SYMBOL_GPL(locks_start_grace);
@@ -49,9 +54,31 @@ EXPORT_SYMBOL_GPL(locks_start_grace);
 void
 locks_end_grace(struct lock_manager *lm)
 {
+	struct grace_net *gn = net_generic(net, grace_net_id);
+
+	if (list_empty(&lm->list))
+		return;
 	spin_lock(&grace_lock);
 	list_del_init(&lm->list);
 	spin_unlock(&grace_lock);
+	if (!list_empty(&gn->grace_list))
+		return;
+	/*
+	 * If the server goes down again right now, an NFSv4
+	 * client will still be allowed to reclaim after it comes back up, even if
+	 * it hasn't yet had a chance to reclaim state this time.
+	 */
+	lm->finish_grace(lm);
+	/*
+	 * At this point, NFSv4 clients can still reclaim.  But if the
+	 * server crashes, any that have not yet reclaimed will be out
+	 * of luck on the next boot.
+	 */
+	gn->in_grace = false;
+	/*
+	 * At this point, no reclaims are permitted.  Regular locking
+	 * can resume.
+	 */
 }
 EXPORT_SYMBOL_GPL(locks_end_grace);
 
@@ -65,27 +92,28 @@ EXPORT_SYMBOL_GPL(locks_end_grace);
 int
 locks_in_grace(struct net *net)
 {
-	struct list_head *grace_list = net_generic(net, grace_net_id);
+	struct grace_net *gn = net_generic(net, grace_net_id);
 
-	return !list_empty(grace_list);
+	return gn->in_grace;
 }
 EXPORT_SYMBOL_GPL(locks_in_grace);
 
 static int __net_init
 grace_init_net(struct net *net)
 {
-	struct list_head *grace_list = net_generic(net, grace_net_id);
+	struct grace_net *gn = net_generic(net, grace_net_id);
 
-	INIT_LIST_HEAD(grace_list);
+	INIT_LIST_HEAD(&gn->grace_list);
+	gn->in_grace = true;
 	return 0;
 }
 
 static void __net_exit
 grace_exit_net(struct net *net)
 {
-	struct list_head *grace_list = net_generic(net, grace_net_id);
+	struct grace_net *gn = net_generic(net, grace_net_id);
 
-	BUG_ON(!list_empty(grace_list));
+	BUG_ON(!list_empty(&gn->grace_list));
 }
 
 static struct pernet_operations grace_net_ops = {
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index fc88b57..968acd0 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -4122,7 +4122,6 @@ nfsd4_end_grace(struct nfsd_net *nn)
 
 	dprintk("NFSD: end of grace period\n");
 	nn->grace_ended = true;
-	nfsd4_record_grace_done(nn);
 	locks_end_grace(&nn->nfsd4_manager);
 	/*
 	 * Now that every NFSv4 client has had the chance to recover and
@@ -6341,6 +6340,13 @@ nfs4_state_destroy_net(struct net *net)
 	put_net(net);
 }
 
+void nfsd4_finish_grace(struct lock_manager *lm)
+{
+	struct nfsd_net *nn = lm->private;
+
+	nfsd4_record_grace_done(nn);
+}
+
 int
 nfs4_state_start_net(struct net *net)
 {
@@ -6352,6 +6358,8 @@ nfs4_state_start_net(struct net *net)
 		return ret;
 	nn->boot_time = get_seconds();
 	nn->grace_ended = false;
+	nn->nfsd4_manager.finish_grace = nfsd4_finish_grace;
+	nn->nfsd4_manager.private = nn;
 	locks_start_grace(net, &nn->nfsd4_manager);
 	nfsd4_client_tracking_init(net);
 	printk(KERN_INFO "NFSD: starting %ld-second grace period (net %p)\n",
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9418772..0e33e7c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -876,6 +876,8 @@ struct lock_manager_operations {
 
 struct lock_manager {
 	struct list_head list;
+	void (*finish_grace)(struct lock_manager *lm);
+	void *private;
 };
 
 struct net;

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/5] lockd: move lockd's grace period handling into its own module
  2014-09-15 22:09         ` J. Bruce Fields
@ 2014-09-15 22:10           ` J. Bruce Fields
  2014-09-15 23:19           ` Jeff Layton
  1 sibling, 0 replies; 37+ messages in thread
From: J. Bruce Fields @ 2014-09-15 22:10 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-nfs

On Mon, Sep 15, 2014 at 06:09:39PM -0400, J. Bruce Fields wrote:
> I considered doing something like the following anyway (total
> off-the-cuff draft, not tested), but I think it's overkill.

Oh, and the one other nit here is this.

--b.

commit 24e24d2e8125d917a49b3df6a5248d6ca5c9592b
Author: J. Bruce Fields <bfields@redhat.com>
Date:   Mon Sep 15 11:05:46 2014 -0400

    nfsd4: stop grace_time update at end of grace period
    
    The attempt to automatically set a new grace period time at the end of
    the grace period isn't really helpful.  We'll probably shut down and
    reboot before we actually make use of the new grace period time anyway.
    So may as well leave it up to the init system to get this right.
    
    This just confuses people when they see /proc/fs/nfsd/nfsv4gracetime
    change from what they set it to.
    
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index fc88b57..a298c3d 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -4124,12 +4124,6 @@ nfsd4_end_grace(struct nfsd_net *nn)
 	nn->grace_ended = true;
 	nfsd4_record_grace_done(nn);
 	locks_end_grace(&nn->nfsd4_manager);
-	/*
-	 * Now that every NFSv4 client has had the chance to recover and
-	 * to see the (possibly new, possibly shorter) lease time, we
-	 * can safely set the next grace time to the current lease time:
-	 */
-	nn->nfsd4_grace = nn->nfsd4_lease;
 }
 
 static time_t

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/5] lockd: move lockd's grace period handling into its own module
  2014-09-15 22:08       ` J. Bruce Fields
  2014-09-15 22:09         ` J. Bruce Fields
@ 2014-09-15 23:11         ` Jeff Layton
  1 sibling, 0 replies; 37+ messages in thread
From: Jeff Layton @ 2014-09-15 23:11 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

On Mon, 15 Sep 2014 18:08:13 -0400
"J. Bruce Fields" <bfields@fieldses.org> wrote:

> On Thu, Aug 28, 2014 at 04:24:43PM -0400, J. Bruce Fields wrote:
> > On Thu, Aug 28, 2014 at 04:01:00PM -0400, J. Bruce Fields wrote:
> > > On Tue, Aug 19, 2014 at 02:38:25PM -0400, Jeff Layton wrote:
> > > > Currently, all of the grace period handling is part of lockd. Eventually
> > > > though we'd like to be able to build v4-only servers, at which point
> > > > we'll need to put all of this elsewhere.
> > > > 
> > > > Move the code itself into fs/nfs_common and have it build a grace.ko
> > > > module. Then, rejigger the Kconfig options so that both nfsd and lockd
> > > > enable it automatically.
> > > 
> > > Thanks, applying this one for 3.18 independently of the others.
> > 
> > This code should also be fixed, though.
> > 
> > Currently nfsd is recording the grace period as done when its own timer
> > runs out, but then it continues to accept reclaims until lockd is also
> > done.
> 
> I sat down to fix this today then decided there's not a real bug:
> 
> All the grace_done upcall really does is throw away any previous clients
> that haven't reclaimed yet.
> 
> It's legal to do that as soon as the correct amount of time has passed.
> It's actually OK to continue to allow the grace period to run past that
> point, the only requirement is that all reclaims precede all new opens,
> which the code still does enforce.
> 
> So I think it might just be worth a little extra explanation.
> 
> --b.
> 
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index a298c3d..ca6f6ee 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -4122,8 +4122,23 @@ nfsd4_end_grace(struct nfsd_net *nn)
>  
>  	dprintk("NFSD: end of grace period\n");
>  	nn->grace_ended = true;
> +	/*
> +	 * If the server goes down again right now, an NFSv4
> +	 * client will still be allowed to reclaim after it comes back up,
> +	 * even if it hasn't yet had a chance to reclaim state this time.
> +	 */

Might even be reasonable to lay out the v4.0/v4.1+ difference here? For
a v4.0 client any lock reclaim will do, but for v4.1+ the client must
have _completed_ reclaim at this point (a la RECLAIM_COMPLETE); there's
a small sketch of that distinction at the end of this message.

...or at least that'll be the situation once the patches I sent go in...
 
>  	nfsd4_record_grace_done(nn);
> +	/*
> +	 * At this point, NFSv4 clients can still reclaim.  But if the
> +	 * server crashes, any that have not yet reclaimed will be out
> +	 * of luck on the next boot.
> +	 */
>  	locks_end_grace(&nn->nfsd4_manager);
> +	/*
> +	 * At this point, and once lockd and/or any other containers
> +	 * exit their grace period, further reclaims aren't permitted
> +	 * and regular locking can resume.
> +	 */
>  }
>  
>  static time_t

ACK, overall. The comments make sense to me and it does clarify things
a bit.
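
As for the v4.0/v4.1+ distinction above, here's a minimal userspace
sketch of the rule being proposed; the struct and function names are
invented for illustration and this is not nfsd's client-tracking code:

#include <stdbool.h>
#include <stdio.h>

struct toy_client {
	const char *name;
	unsigned int minorversion;    /* 0 for NFSv4.0, 1+ for NFSv4.1+ */
	bool sent_reclaim_open;       /* saw an OPEN with CLAIM_PREVIOUS */
	bool sent_reclaim_complete;   /* saw RECLAIM_COMPLETE (v4.1+ only) */
};

/*
 * Should this client still be treated as reclaim-safe, i.e. allowed to
 * reclaim again if the server were to restart right now?
 */
static bool toy_reclaim_safe(const struct toy_client *clp)
{
	if (clp->minorversion == 0)
		/* v4.0: a reclaim open is the best signal available */
		return clp->sent_reclaim_open;
	/* v4.1+: only RECLAIM_COMPLETE proves reclaim actually finished */
	return clp->sent_reclaim_complete;
}

int main(void)
{
	struct toy_client clients[] = {
		{ "v4.0, one reclaim open",      0, true, false },
		{ "v4.1, opens reclaimed only",  1, true, false },
		{ "v4.1, RECLAIM_COMPLETE sent", 1, true, true  },
	};
	unsigned int i;

	for (i = 0; i < sizeof(clients) / sizeof(clients[0]); i++)
		printf("%-30s -> %s\n", clients[i].name,
		       toy_reclaim_safe(&clients[i]) ? "keep" : "drop");
	return 0;
}

The point being that a v4.1+ record only becomes trustworthy once
RECLAIM_COMPLETE arrives, while for v4.0 the first reclaim open is the
best signal the server ever gets.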

-- 
Jeff Layton <jlayton@primarydata.com>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/5] lockd: move lockd's grace period handling into its own module
  2014-09-15 22:09         ` J. Bruce Fields
  2014-09-15 22:10           ` J. Bruce Fields
@ 2014-09-15 23:19           ` Jeff Layton
  2014-09-16  0:19             ` J. Bruce Fields
  1 sibling, 1 reply; 37+ messages in thread
From: Jeff Layton @ 2014-09-15 23:19 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

On Mon, 15 Sep 2014 18:09:39 -0400
"J. Bruce Fields" <bfields@fieldses.org> wrote:

> On Mon, Sep 15, 2014 at 06:08:13PM -0400, J. Bruce Fields wrote:
> > On Thu, Aug 28, 2014 at 04:24:43PM -0400, J. Bruce Fields wrote:
> > > On Thu, Aug 28, 2014 at 04:01:00PM -0400, J. Bruce Fields wrote:
> > > > On Tue, Aug 19, 2014 at 02:38:25PM -0400, Jeff Layton wrote:
> > > > > Currently, all of the grace period handling is part of lockd. Eventually
> > > > > though we'd like to be able to build v4-only servers, at which point
> > > > > we'll need to put all of this elsewhere.
> > > > > 
> > > > > Move the code itself into fs/nfs_common and have it build a grace.ko
> > > > > module. Then, rejigger the Kconfig options so that both nfsd and lockd
> > > > > enable it automatically.
> > > > 
> > > > Thanks, applying this one for 3.18 independently of the others.
> > > 
> > > This code should also be fixed, though.
> > > 
> > > Currently nfsd is recording the grace period as done when its own timer
> > > runs out, but then it continues to accept reclaims until lockd is also
> > > done.
> > 
> > I sat down to fix this today then decided there's not a real bug:
> > 
> > All the grace_done upcall really does is throw away any previous clients
> > that haven't reclaimed yet.
> > 
> > It's legal to do that as soon as the correct amount of time has passed.
> > It's actually OK to continue to allow the grace period to run past that
> > point, the only requirement is that all reclaims precede all new opens,
> > which the code still does enforce.
> > 
> > So I think it might just be worth a little extra explanation.
> 
> I considered doing something like the following anyway (total
> off-the-cuff draft, not tested), but I think it's overkill.
> 
> --b.
> 

Yeah, I tend to agree. I don't see a real need for it.

In principle, I guess you could end up in a situation where the lockd
and nfsd grace periods differ by a large amount. In that case, you
could start denying reclaims for v4.x clients that haven't reclaimed
anything yet, while still not allowing them to establish new locks
either.

So it might be reasonable to add something like this, but I don't see
it as anything most setups would ever hit -- particularly not in sane
configurations.
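
A toy model of that window, assuming reclaims are tied to nfsd's own
timer while new locks wait for every lock manager in the namespace to
leave grace (names invented, not the kernel code):

#include <stdbool.h>
#include <stdio.h>

struct toy_net {
	bool nfsd_grace_ended;   /* nfsd's own grace timer has expired */
	bool lockd_in_grace;     /* lockd's lock manager still in grace */
};

static bool toy_allow_reclaim(const struct toy_net *n)
{
	/* CLAIM_PREVIOUS is only honoured while nfsd's own grace runs */
	return !n->nfsd_grace_ended;
}

static bool toy_allow_new_lock(const struct toy_net *n)
{
	/* new locks wait until *every* lock manager has left grace */
	return n->nfsd_grace_ended && !n->lockd_in_grace;
}

int main(void)
{
	struct toy_net stuck = { true, true };	/* nfsd done, lockd not */

	printf("reclaim allowed: %s, new lock allowed: %s\n",
	       toy_allow_reclaim(&stuck) ? "yes" : "no",
	       toy_allow_new_lock(&stuck) ? "yes" : "no");
	/* prints "no" for both: the window described above */
	return 0;
}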


> commit 675121da48fb4d7d85b58467c7f79dd3a6964879
> Author: J. Bruce Fields <bfields@redhat.com>
> Date:   Mon Sep 15 17:42:14 2014 -0400
> 
>     nfsd4: end grace only when last grace period ends
>     
>     In the case both lockd and nfsd4 are running, or when we have multiple
>     containers exporting the same filesystem, there may be multiple grace
>     periods.  In that case the grace period doesn't really end till the last
>     one does and though it's harmless to do the grace_done upcall before
>     then, it would probably be better not to.
>     
>     Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> 
> diff --git a/fs/nfs_common/grace.c b/fs/nfs_common/grace.c
> index ae6e58e..dc67e5d 100644
> --- a/fs/nfs_common/grace.c
> +++ b/fs/nfs_common/grace.c
> @@ -12,6 +12,11 @@
>  static int grace_net_id;
>  static DEFINE_SPINLOCK(grace_lock);
>  
> +struct grace_net {
> +	struct list_head *grace_list;
> +	bool in_grace;
> +};
> +
>  /**
>   * locks_start_grace
>   * @net: net namespace that this lock manager belongs to
> @@ -27,10 +32,10 @@ static DEFINE_SPINLOCK(grace_lock);
>  void
>  locks_start_grace(struct net *net, struct lock_manager *lm)
>  {
> -	struct list_head *grace_list = net_generic(net, grace_net_id);
> +	struct list_head *gn = net_generic(net, grace_net_id);
>  
>  	spin_lock(&grace_lock);
> -	list_add(&lm->list, grace_list);
> +	list_add(&lm->list, &gn->grace_list);
>  	spin_unlock(&grace_lock);
>  }
>  EXPORT_SYMBOL_GPL(locks_start_grace);
> @@ -49,9 +54,31 @@ EXPORT_SYMBOL_GPL(locks_start_grace);
>  void
>  locks_end_grace(struct lock_manager *lm)
>  {
> +	struct list_head *gn = net_generic(net, grace_net_id);
> +
> +	if (list_empty(&lm->list))
> +		return;
>  	spin_lock(&grace_lock);
>  	list_del_init(&lm->list);
>  	spin_unlock(&grace_lock);
> +	if (list_empty(&gn->grace_list))
> +		return;
> +	/*
> +	 * If the server goes down again right now, an NFSv4
> +	 * client will still be reclaim after it comes back up, even if
> +	 * it hasn't yet had a chance to reclaim state this time.
> +	 */
> +	lm->finish_grace(lm);
> +	/*
> +	 * At this point, NFSv4 clients can still reclaim.  But if the
> +	 * server crashes, any that have not yet reclaimed will be out
> +	 * of luck on the next boot.
> +	 */
> +	ln->in_grace = false;

	gn->in_grace = false;

> +	/*
> +	 * At this point, no reclaims are permitted.  Regular locking
> +	 * can resume.
> +	 */
>  }
>  EXPORT_SYMBOL_GPL(locks_end_grace);
>  
> @@ -65,27 +92,28 @@ EXPORT_SYMBOL_GPL(locks_end_grace);
>  int
>  locks_in_grace(struct net *net)
>  {
> -	struct list_head *grace_list = net_generic(net, grace_net_id);
> +	struct list_head *gn = net_generic(net, grace_net_id);
>  
> -	return !list_empty(grace_list);
> +	return gn->in_grace;
>  }
>  EXPORT_SYMBOL_GPL(locks_in_grace);
>  
>  static int __net_init
>  grace_init_net(struct net *net)
>  {
> -	struct list_head *grace_list = net_generic(net, grace_net_id);
> +	struct list_head *gn = net_generic(net, grace_net_id);
>  
> -	INIT_LIST_HEAD(grace_list);
> +	INIT_LIST_HEAD(&gn->grace_list);
> +	gn->in_grace = true;
>  	return 0;
>  }
>  
>  static void __net_exit
>  grace_exit_net(struct net *net)
>  {
> -	struct list_head *grace_list = net_generic(net, grace_net_id);
> +	struct list_head *gn = net_generic(net, grace_net_id);
>  
> -	BUG_ON(!list_empty(grace_list));
> +	BUG_ON(!list_empty(&gn->grace_list));
>  }
>  
>  static struct pernet_operations grace_net_ops = {
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index fc88b57..968acd0 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -4122,7 +4122,6 @@ nfsd4_end_grace(struct nfsd_net *nn)
>  
>  	dprintk("NFSD: end of grace period\n");
>  	nn->grace_ended = true;
> -	nfsd4_record_grace_done(nn);
>  	locks_end_grace(&nn->nfsd4_manager);
>  	/*
>  	 * Now that every NFSv4 client has had the chance to recover and
> @@ -6341,6 +6340,13 @@ nfs4_state_destroy_net(struct net *net)
>  	put_net(net);
>  }
>  
> +void nfsd4_finish_grace(struct lock_manager *lm)
> +{
> +	struct nfsd_net *nn = lm->private;
> +
> +	nfsd4_record_grace_done(nn);
> +}
> +
>  int
>  nfs4_state_start_net(struct net *net)
>  {
> @@ -6352,6 +6358,8 @@ nfs4_state_start_net(struct net *net)
>  		return ret;
>  	nn->boot_time = get_seconds();
>  	nn->grace_ended = false;
> +	nn->nfsd4_manager.finish_grace = nfsd4_finish_grace;
> +	nn->nfsd4_manager.private = nn;
>  	locks_start_grace(net, &nn->nfsd4_manager);
>  	nfsd4_client_tracking_init(net);
>  	printk(KERN_INFO "NFSD: starting %ld-second grace period (net %p)\n",
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 9418772..0e33e7c 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -876,6 +876,8 @@ struct lock_manager_operations {
>  
>  struct lock_manager {
>  	struct list_head list;
> +	void (*finish_grace)(struct lock_manager *lm);
> +	void *private;
>  };
>  
>  struct net;


-- 
Jeff Layton <jlayton@primarydata.com>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/5] lockd: move lockd's grace period handling into its own module
  2014-09-15 23:19           ` Jeff Layton
@ 2014-09-16  0:19             ` J. Bruce Fields
  0 siblings, 0 replies; 37+ messages in thread
From: J. Bruce Fields @ 2014-09-16  0:19 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-nfs

On Mon, Sep 15, 2014 at 07:19:19PM -0400, Jeff Layton wrote:
> On Mon, 15 Sep 2014 18:09:39 -0400
> "J. Bruce Fields" <bfields@fieldses.org> wrote:
> 
> > On Mon, Sep 15, 2014 at 06:08:13PM -0400, J. Bruce Fields wrote:
> > > On Thu, Aug 28, 2014 at 04:24:43PM -0400, J. Bruce Fields wrote:
> > > > On Thu, Aug 28, 2014 at 04:01:00PM -0400, J. Bruce Fields wrote:
> > > > > On Tue, Aug 19, 2014 at 02:38:25PM -0400, Jeff Layton wrote:
> > > > > > Currently, all of the grace period handling is part of lockd. Eventually
> > > > > > though we'd like to be able to build v4-only servers, at which point
> > > > > > we'll need to put all of this elsewhere.
> > > > > > 
> > > > > > Move the code itself into fs/nfs_common and have it build a grace.ko
> > > > > > module. Then, rejigger the Kconfig options so that both nfsd and lockd
> > > > > > enable it automatically.
> > > > > 
> > > > > Thanks, applying this one for 3.18 independently of the others.
> > > > 
> > > > This code should also be fixed, though.
> > > > 
> > > > Currently nfsd is recording the grace period as done when its own timer
> > > > runs out, but then it continues to accept reclaims until lockd is also
> > > > done.
> > > 
> > > I sat down to fix this today then decided there's not a real bug:
> > > 
> > > All the grace_done upcall really does is throw away any previous clients
> > > that haven't reclaimed yet.
> > > 
> > > It's legal to do that as soon as the correct amount of time has passed.
> > > It's actually OK to continue to allow the grace period to run past that
> > > point, the only requirement is that all reclaims precede all new opens,
> > > which the code still does enforce.
> > > 
> > > So I think it might just be worth a little extra explanation.
> > 
> > I considered doing something like the following anyway (total
> > off-the-cuff draft, not tested), but I think it's overkill.
> > 
> > --b.
> > 
> 
> Yeah, I tend to agree. I don't see a real need for it.
> 
> In principle, I guess you could end up in a situation where the lockd
> and nfsd grace periods different by a large amount. In that case, you
> could end up in a situation where you started denying reclaims for v4.x
> clients that haven't reclaimed anything yet, but then still didn't
> allow them to establish new locks either.

Oh, right, I forgot that this also keeps old clients from doing further
reclaims during this boot.

I think that could actually be fixed purely in userspace--instead of
purging those old clients, just record the new end-of-grace timestamp
and keep them around a while longer.  You still know which ones to deny
on the next boot, but they're still around if you want to keep allowing
them to reclaim.
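
A rough sketch of that userspace idea, with an invented record layout
(this is not nfsdcltrack's actual schema):

#include <stdio.h>
#include <time.h>

struct toy_record {
	const char *client_id;
	time_t last_reclaim_boot;  /* boot time of last successful reclaim */
};

/* Called when the current boot's grace period ends. */
static void toy_grace_done(struct toy_record *recs, int n, time_t this_boot)
{
	int i;

	for (i = 0; i < n; i++) {
		if (recs[i].last_reclaim_boot == this_boot)
			continue;	/* reclaimed this boot: up to date */
		/*
		 * Previously we would purge this record.  Keeping it,
		 * still stamped with the old boot time, lets this boot
		 * keep accepting the client's reclaims while the next
		 * boot can see it never finished and refuse it.
		 */
		printf("keeping stale record for %s (boot %ld)\n",
		       recs[i].client_id, (long)recs[i].last_reclaim_boot);
	}
}

int main(void)
{
	time_t prev_boot = 1000, this_boot = 2000;
	struct toy_record recs[] = {
		{ "client1", prev_boot },	/* never reclaimed this boot */
		{ "client2", this_boot },	/* already reclaimed */
	};

	toy_grace_done(recs, 2, this_boot);
	return 0;
}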

Anyway:

> So it might be reasonable to add something like this, but I don't see
> it as anything most setups would ever hit -- particularly not in sane
> configurations.

Yeah, no big deal.

--b.

> 
> 
> > commit 675121da48fb4d7d85b58467c7f79dd3a6964879
> > Author: J. Bruce Fields <bfields@redhat.com>
> > Date:   Mon Sep 15 17:42:14 2014 -0400
> > 
> >     nfsd4: end grace only when last grace period ends
> >     
> >     In the case both lockd and nfsd4 are running, or when we have multiple
> >     containers exporting the same filesystem, there may be multiple grace
> >     periods.  In that case the grace period doesn't really end till the last
> >     one does and though it's harmless to do the grace_done upcall before
> >     then, it would probably be better not to.
> >     
> >     Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> > 
> > diff --git a/fs/nfs_common/grace.c b/fs/nfs_common/grace.c
> > index ae6e58e..dc67e5d 100644
> > --- a/fs/nfs_common/grace.c
> > +++ b/fs/nfs_common/grace.c
> > @@ -12,6 +12,11 @@
> >  static int grace_net_id;
> >  static DEFINE_SPINLOCK(grace_lock);
> >  
> > +struct grace_net {
> > +	struct list_head *grace_list;
> > +	bool in_grace;
> > +};
> > +
> >  /**
> >   * locks_start_grace
> >   * @net: net namespace that this lock manager belongs to
> > @@ -27,10 +32,10 @@ static DEFINE_SPINLOCK(grace_lock);
> >  void
> >  locks_start_grace(struct net *net, struct lock_manager *lm)
> >  {
> > -	struct list_head *grace_list = net_generic(net, grace_net_id);
> > +	struct list_head *gn = net_generic(net, grace_net_id);
> >  
> >  	spin_lock(&grace_lock);
> > -	list_add(&lm->list, grace_list);
> > +	list_add(&lm->list, &gn->grace_list);
> >  	spin_unlock(&grace_lock);
> >  }
> >  EXPORT_SYMBOL_GPL(locks_start_grace);
> > @@ -49,9 +54,31 @@ EXPORT_SYMBOL_GPL(locks_start_grace);
> >  void
> >  locks_end_grace(struct lock_manager *lm)
> >  {
> > +	struct list_head *gn = net_generic(net, grace_net_id);
> > +
> > +	if (list_empty(&lm->list))
> > +		return;
> >  	spin_lock(&grace_lock);
> >  	list_del_init(&lm->list);
> >  	spin_unlock(&grace_lock);
> > +	if (list_empty(&gn->grace_list))
> > +		return;
> > +	/*
> > +	 * If the server goes down again right now, an NFSv4
> > +	 * client will still be reclaim after it comes back up, even if
> > +	 * it hasn't yet had a chance to reclaim state this time.
> > +	 */
> > +	lm->finish_grace(lm);
> > +	/*
> > +	 * At this point, NFSv4 clients can still reclaim.  But if the
> > +	 * server crashes, any that have not yet reclaimed will be out
> > +	 * of luck on the next boot.
> > +	 */
> > +	ln->in_grace = false;
> 
> 	gn->in_grace = false;
> 
> > +	/*
> > +	 * At this point, no reclaims are permitted.  Regular locking
> > +	 * can resume.
> > +	 */
> >  }
> >  EXPORT_SYMBOL_GPL(locks_end_grace);
> >  
> > @@ -65,27 +92,28 @@ EXPORT_SYMBOL_GPL(locks_end_grace);
> >  int
> >  locks_in_grace(struct net *net)
> >  {
> > -	struct list_head *grace_list = net_generic(net, grace_net_id);
> > +	struct list_head *gn = net_generic(net, grace_net_id);
> >  
> > -	return !list_empty(grace_list);
> > +	return gn->in_grace;
> >  }
> >  EXPORT_SYMBOL_GPL(locks_in_grace);
> >  
> >  static int __net_init
> >  grace_init_net(struct net *net)
> >  {
> > -	struct list_head *grace_list = net_generic(net, grace_net_id);
> > +	struct list_head *gn = net_generic(net, grace_net_id);
> >  
> > -	INIT_LIST_HEAD(grace_list);
> > +	INIT_LIST_HEAD(&gn->grace_list);
> > +	gn->in_grace = true;
> >  	return 0;
> >  }
> >  
> >  static void __net_exit
> >  grace_exit_net(struct net *net)
> >  {
> > -	struct list_head *grace_list = net_generic(net, grace_net_id);
> > +	struct list_head *gn = net_generic(net, grace_net_id);
> >  
> > -	BUG_ON(!list_empty(grace_list));
> > +	BUG_ON(!list_empty(&gn->grace_list));
> >  }
> >  
> >  static struct pernet_operations grace_net_ops = {
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index fc88b57..968acd0 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -4122,7 +4122,6 @@ nfsd4_end_grace(struct nfsd_net *nn)
> >  
> >  	dprintk("NFSD: end of grace period\n");
> >  	nn->grace_ended = true;
> > -	nfsd4_record_grace_done(nn);
> >  	locks_end_grace(&nn->nfsd4_manager);
> >  	/*
> >  	 * Now that every NFSv4 client has had the chance to recover and
> > @@ -6341,6 +6340,13 @@ nfs4_state_destroy_net(struct net *net)
> >  	put_net(net);
> >  }
> >  
> > +void nfsd4_finish_grace(struct lock_manager *lm)
> > +{
> > +	struct nfsd_net *nn = lm->private;
> > +
> > +	nfsd4_record_grace_done(nn);
> > +}
> > +
> >  int
> >  nfs4_state_start_net(struct net *net)
> >  {
> > @@ -6352,6 +6358,8 @@ nfs4_state_start_net(struct net *net)
> >  		return ret;
> >  	nn->boot_time = get_seconds();
> >  	nn->grace_ended = false;
> > +	nn->nfsd4_manager.finish_grace = nfsd4_finish_grace;
> > +	nn->nfsd4_manager.private = nn;
> >  	locks_start_grace(net, &nn->nfsd4_manager);
> >  	nfsd4_client_tracking_init(net);
> >  	printk(KERN_INFO "NFSD: starting %ld-second grace period (net %p)\n",
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 9418772..0e33e7c 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -876,6 +876,8 @@ struct lock_manager_operations {
> >  
> >  struct lock_manager {
> >  	struct list_head list;
> > +	void (*finish_grace)(struct lock_manager *lm);
> > +	void *private;
> >  };
> >  
> >  struct net;
> 
> 
> -- 
> Jeff Layton <jlayton@primarydata.com>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 0/5] nfsd: support for lifting grace period early
  2014-08-19 18:38 [PATCH v2 0/5] nfsd: support for lifting grace period early Jeff Layton
                   ` (4 preceding siblings ...)
  2014-08-19 18:38 ` [PATCH v2 5/5] nfsd: pass extra info in env vars to upcalls to allow for early grace period end Jeff Layton
@ 2014-09-26 18:39 ` J. Bruce Fields
  2014-09-26 18:54   ` Jeff Layton
  5 siblings, 1 reply; 37+ messages in thread
From: J. Bruce Fields @ 2014-09-26 18:39 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-nfs

By the way, I've seen the following *before* your patches, but in case
you're still looking at reboot recovery problems:

I'm getting sporadic failures in the REBT6 pynfs test--a reclaim open
succeeds after a previous boot (with full grace period) during which the
client had failed to reclaim.

I managed to catch one trace, the relevant parts looked like:

	SETCLIENTID client1
	OPEN
	LOCK

	(server restart here)

	SETCLIENTID client2
	OPEN
	LOCK (lock that conflicts with client1's)

	(server restart here)

	SETCLIENTID client1
	OPEN CLAIM_PREVIOUS

And all those ops (including the last reclaim open) succeeded.

So I didn't have a chance to review it more carefully, but it certainly
looks like a server bug, not a test bug.  (Well, technically the server
behavior above is correct since it's not required to refuse anything
till we actually attempt to reclaim the original lock, but we know our
server's not that smart.)

But I haven't gotten any further than that....

--b.

On Tue, Aug 19, 2014 at 02:38:24PM -0400, Jeff Layton wrote:
> v2:
> - move grace period handling into its own module
> 
> One of the huge annoyances in dealing with knfsd is the 90s grace period
> that's imposed when the server reboots. This is not just an annoyance,
> but means a siginificant amount of "downtime" in many production
> environments.
> 
> This patchset aimed at reducing this pain. It adds a couple of /proc
> knobs that tell the lockd and nfsd lock managers to lift the grace
> period.
> 
> It also changes the UMH upcalls to pass a little bit of extra info in
> the form of environment variables so that the upcall program can
> determine whether there are still any clients that may be in the process
> of reclaiming.
> 
> There are also a couple of cleanup patches in here that are not strictly
> required. In particular, making a separate grace.ko module doesn't have
> to be done, but I think it's a good idea.
> 
> Jeff Layton (5):
>   lockd: move lockd's grace period handling into its own module
>   lockd: add a /proc/fs/lockd/nlm_end_grace file
>   nfsd: add a v4_end_grace file to /proc/fs/nfsd
>   nfsd: remove redundant boot_time parm from grace_done client tracking
>     op
>   nfsd: pass extra info in env vars to upcalls to allow for early grace
>     period end
> 
>  fs/Kconfig                       |   6 ++-
>  fs/lockd/Makefile                |   3 +-
>  fs/lockd/netns.h                 |   1 -
>  fs/lockd/procfs.c                |  76 +++++++++++++++++++++++++++
>  fs/lockd/procfs.h                |  28 ++++++++++
>  fs/lockd/svc.c                   |  10 +++-
>  fs/nfs_common/Makefile           |   3 +-
>  fs/{lockd => nfs_common}/grace.c |  68 +++++++++++++++++++++----
>  fs/nfsd/Kconfig                  |   1 +
>  fs/nfsd/nfs4recover.c            | 107 +++++++++++++++++++++++++++++++--------
>  fs/nfsd/nfs4state.c              |   8 +--
>  fs/nfsd/nfsctl.c                 |  35 +++++++++++++
>  fs/nfsd/state.h                  |   5 +-
>  include/linux/proc_fs.h          |   2 +
>  14 files changed, 312 insertions(+), 41 deletions(-)
>  create mode 100644 fs/lockd/procfs.c
>  create mode 100644 fs/lockd/procfs.h
>  rename fs/{lockd => nfs_common}/grace.c (50%)
> 
> -- 
> 1.9.3
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 0/5] nfsd: support for lifting grace period early
  2014-09-26 18:39 ` [PATCH v2 0/5] nfsd: support for lifting grace period early J. Bruce Fields
@ 2014-09-26 18:54   ` Jeff Layton
  2014-09-26 19:46     ` J. Bruce Fields
  0 siblings, 1 reply; 37+ messages in thread
From: Jeff Layton @ 2014-09-26 18:54 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

On Fri, 26 Sep 2014 14:39:49 -0400
"J. Bruce Fields" <bfields@fieldses.org> wrote:

> By the way, I've seen the following *before* your patches, but in case
> you're still looking at reboot recovery problems:
> 
> I'm getting sporadic failures in the REBT6 pynfs test--a reclaim open
> succeeds after a previous boot (with full grace period) during which the
> client had failed to reclaim.
> 
> I managed to catch one trace, the relevant parts looked like:
> 
> 	SETCLIENTID client1
> 	OPEN
> 	LOCK
> 
> 	(server restart here)
> 
> 	SETCLIENTID client2
> 	OPEN
> 	LOCK (lock that conflicts with client1's)
> 
> 	(server restart here)
> 
> 	SETCLIENTID client1
> 	OPEN CLAIM_PREVIOUS
> 
> And all those ops (including the last reclaim open) succeeded.
> 
> So I didn't have a chance to review it more carefully, but it certainly
> looks like a server bug, not a test bug.  (Well, technically the server
> behavior above is correct since it's not required to refuse anything
> till we actually attempt to reclaim the original lock, but we know our
> server's not that smart.)
> 
> But I haven't gotten any further than that....
> 
> --b.
> 

Ewww...v4.0... ;)

Well, I guess that could happen if, after the first reboot, client1 also
did a SETCLIENTID *and* reclaimed something that didn't conflict with
the lock that client2 grabs...or, did an OPEN/OPEN_CONFIRM after the
grace period without reclaiming its lock previously.

If it didn't do one or the other, then its record should have been
cleaned out of the DB after the grace period ended between the reboots
and it wouldn't have been able to reclaim after the second reboot.

It's a bit of a pathological case, and I don't see a way to fix that in
the context of v4.0. The fact that there's no RECLAIM_COMPLETE is a
pretty nasty protocol bug, IMO. Yet another reason to start really
moving people toward v4.1+...

> On Tue, Aug 19, 2014 at 02:38:24PM -0400, Jeff Layton wrote:
> > v2:
> > - move grace period handling into its own module
> > 
> > One of the huge annoyances in dealing with knfsd is the 90s grace period
> > that's imposed when the server reboots. This is not just an annoyance,
> > but means a siginificant amount of "downtime" in many production
> > environments.
> > 
> > This patchset aimed at reducing this pain. It adds a couple of /proc
> > knobs that tell the lockd and nfsd lock managers to lift the grace
> > period.
> > 
> > It also changes the UMH upcalls to pass a little bit of extra info in
> > the form of environment variables so that the upcall program can
> > determine whether there are still any clients that may be in the process
> > of reclaiming.
> > 
> > There are also a couple of cleanup patches in here that are not strictly
> > required. In particular, making a separate grace.ko module doesn't have
> > to be done, but I think it's a good idea.
> > 
> > Jeff Layton (5):
> >   lockd: move lockd's grace period handling into its own module
> >   lockd: add a /proc/fs/lockd/nlm_end_grace file
> >   nfsd: add a v4_end_grace file to /proc/fs/nfsd
> >   nfsd: remove redundant boot_time parm from grace_done client tracking
> >     op
> >   nfsd: pass extra info in env vars to upcalls to allow for early grace
> >     period end
> > 
> >  fs/Kconfig                       |   6 ++-
> >  fs/lockd/Makefile                |   3 +-
> >  fs/lockd/netns.h                 |   1 -
> >  fs/lockd/procfs.c                |  76 +++++++++++++++++++++++++++
> >  fs/lockd/procfs.h                |  28 ++++++++++
> >  fs/lockd/svc.c                   |  10 +++-
> >  fs/nfs_common/Makefile           |   3 +-
> >  fs/{lockd => nfs_common}/grace.c |  68 +++++++++++++++++++++----
> >  fs/nfsd/Kconfig                  |   1 +
> >  fs/nfsd/nfs4recover.c            | 107 +++++++++++++++++++++++++++++++--------
> >  fs/nfsd/nfs4state.c              |   8 +--
> >  fs/nfsd/nfsctl.c                 |  35 +++++++++++++
> >  fs/nfsd/state.h                  |   5 +-
> >  include/linux/proc_fs.h          |   2 +
> >  14 files changed, 312 insertions(+), 41 deletions(-)
> >  create mode 100644 fs/lockd/procfs.c
> >  create mode 100644 fs/lockd/procfs.h
> >  rename fs/{lockd => nfs_common}/grace.c (50%)
> > 
> > -- 
> > 1.9.3
> > 


-- 
Jeff Layton <jlayton@primarydata.com>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 0/5] nfsd: support for lifting grace period early
  2014-09-26 18:54   ` Jeff Layton
@ 2014-09-26 19:46     ` J. Bruce Fields
  2014-09-26 20:37       ` Trond Myklebust
  2014-09-27 13:04       ` Jeff Layton
  0 siblings, 2 replies; 37+ messages in thread
From: J. Bruce Fields @ 2014-09-26 19:46 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-nfs

On Fri, Sep 26, 2014 at 02:54:46PM -0400, Jeff Layton wrote:
> On Fri, 26 Sep 2014 14:39:49 -0400
> "J. Bruce Fields" <bfields@fieldses.org> wrote:
> 
> > By the way, I've seen the following *before* your patches, but in case
> > you're still looking at reboot recovery problems:
> > 
> > I'm getting sporadic failures in the REBT6 pynfs test--a reclaim open
> > succeeds after a previous boot (with full grace period) during which the
> > client had failed to reclaim.
> > 
> > I managed to catch one trace, the relevant parts looked like:
> > 
> > 	SETCLIENTID client1
> > 	OPEN
> > 	LOCK
> > 
> > 	(server restart here)
> > 
> > 	SETCLIENTID client2
> > 	OPEN
> > 	LOCK (lock that conflicts with client1's)
> > 
> > 	(server restart here)
> > 
> > 	SETCLIENTID client1
> > 	OPEN CLAIM_PREVIOUS
> > 
> > And all those ops (including the last reclaim open) succeeded.
> > 
> > So I didn't have a chance to review it more carefully, but it certainly
> > looks like a server bug, not a test bug.  (Well, technically the server
> > behavior above is correct since it's not required to refuse anything
> > till we actually attempt to reclaim the original lock, but we know our
> > server's not that smart.)
> > 
> > But I haven't gotten any further than that....
> > 
> > --b.
> > 
> 
> Ewww...v4.0... ;)
> 
> Well, I guess that could happen if, after the first reboot, client1 also
> did a SETCLIENTID *and* reclaimed something that didn't conflict with
> the lock that client2 grabs...or, did an OPEN/OPEN_CONFIRM after the
> grace period without reclaiming its lock previously.
> 
> If it didn't do one or the other, then its record should have been
> cleaned out of the DB after the grace period ended between the reboots
> and it wouldn't have been able to reclaim after the second reboot.

Yeah.  Is there an easy way to tell nfsdcltrack to log everything?  I'm
only seeing this occasionally.

> It's a bit of a pathological case, and I don't see a way to fix that in
> the context of v4.0. The fact that there's no RECLAIM_COMPLETE is a
> pretty nasty protocol bug, IMO. Yet another reason to start really
> moving people toward v4.1+...

I don't believe there's a protocol bug here.

A correct NFSv4.0 client wouldn't send the open reclaim in the case you
describe above.

As I understand it, the rule for the client is: you're allowed to
reclaim only the set of locks that you held previously, where "the set of
locks you held previously" is "the set of locks held by the clientid
which last managed to send a reclaim OPEN or OPEN_CONFIRM".  So for
example once client1 sends that unrelated OPEN reclaim it's giving up on
anything else it doesn't manage to reclaim this time around.

Any server that loses client state on reboot has no choice but to trust
clients to get this sort of thing right.
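
A minimal sketch of that client-side rule, with invented types (this is
not the Linux client's state machine):

#include <stdbool.h>
#include <stdio.h>

struct toy_lock {
	const char *what;
	bool reclaimable;		/* held against the boot that died */
	bool reclaimed_this_boot;	/* CLAIM_PREVIOUS already succeeded */
};

/*
 * When the new server instance's grace period ends, anything not yet
 * reclaimed leaves the reclaimable set for good: a conflicting lock may
 * have been handed out, so it must not be retried after a later reboot.
 */
static void toy_grace_over(struct toy_lock *locks, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		if (locks[i].reclaimable && !locks[i].reclaimed_this_boot) {
			locks[i].reclaimable = false;
			printf("giving up on %s\n", locks[i].what);
		}
	}
}

int main(void)
{
	struct toy_lock locks[] = {
		{ "open on fileA", true, true  },	/* reclaimed in time */
		{ "lock on fileB", true, false },	/* missed the window */
	};

	toy_grace_over(locks, 2);
	return 0;
}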

--b.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 0/5] nfsd: support for lifting grace period early
  2014-09-26 19:46     ` J. Bruce Fields
@ 2014-09-26 20:37       ` Trond Myklebust
  2014-09-26 20:45         ` J. Bruce Fields
  2014-09-27 13:04       ` Jeff Layton
  1 sibling, 1 reply; 37+ messages in thread
From: Trond Myklebust @ 2014-09-26 20:37 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Jeff Layton, linux-nfs

On Fri, Sep 26, 2014 at 3:46 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
>
> As I understand it, the rule for the client is: you're allowed to
> reclaim only the set locks that you held previously, where "the set of
> locks you held previously" is "the set of locks held by the clientid
> which last managed to send a reclaim OPEN or OPEN_CONFIRM".  So for
> example once client1 sends that unrelated OPEN reclaim it's giving up on
> anything else it doesn't manage to reclaim this time around.

The rule for the client is very simple: "You may attempt to reclaim
any locks that were held immediately prior to the reboot of the
server."
It doesn't matter how those locks were established (ordinary OPEN,
delegated open, reclaim open, LOCK, reclaim lock...).

However if the server reboots and the client did not manage to
re-establish a lease (SETCLIENTID+SETCLIENTID_CONFIRM and/or
EXCHANGE_ID+CREATE_SESSION) before the second reboot, then it is the
server's responsibility to block that client from reclaiming any
locks, since the client has no way to know how many times the server
has rebooted.
Ditto, of course, if the client tries to reclaim any locks outside the
grace period and the server isn't tracking whether or not those locks
have been handed out to another client.
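
A toy version of those two server-side checks; the record layout and
names here are invented, not knfsd's:

#include <stdbool.h>
#include <stdio.h>

struct toy_server {
	bool in_grace;			/* a grace period is running */
	bool tracks_individual_locks;	/* can spot conflicting grants? */
};

struct toy_client_record {
	bool lease_established_last_boot;  /* lease seen in previous boot */
};

static bool toy_allow_claim_previous(const struct toy_server *srv,
				     const struct toy_client_record *rec)
{
	/*
	 * A client that never re-established a lease during the previous
	 * boot cannot know how many reboots it missed, so its reclaims
	 * have to be refused.
	 */
	if (!rec->lease_established_last_boot)
		return false;
	/*
	 * Outside the grace period, reclaims are only safe if the server
	 * tracks enough state to spot conflicting grants.
	 */
	if (!srv->in_grace && !srv->tracks_individual_locks)
		return false;
	return true;
}

int main(void)
{
	struct toy_server srv = { true, false };
	struct toy_client_record stale = { false };

	printf("stale client may reclaim: %s\n",
	       toy_allow_claim_previous(&srv, &stale) ? "yes" : "no");
	return 0;
}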

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 0/5] nfsd: support for lifting grace period early
  2014-09-26 20:37       ` Trond Myklebust
@ 2014-09-26 20:45         ` J. Bruce Fields
  2014-09-26 20:58           ` Trond Myklebust
  0 siblings, 1 reply; 37+ messages in thread
From: J. Bruce Fields @ 2014-09-26 20:45 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Jeff Layton, linux-nfs

On Fri, Sep 26, 2014 at 04:37:23PM -0400, Trond Myklebust wrote:
> On Fri, Sep 26, 2014 at 3:46 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> >
> > As I understand it, the rule for the client is: you're allowed to
> > reclaim only the set of locks that you held previously, where "the set of
> > locks you held previously" is "the set of locks held by the clientid
> > which last managed to send a reclaim OPEN or OPEN_CONFIRM".  So for
> > example once client1 sends that unrelated OPEN reclaim it's giving up on
> > anything else it doesn't manage to reclaim this time around.
> 
> The rule for the client is very simple: "You may attempt to reclaim
> any locks that were held immediately prior to the reboot of the
> server."
> It doesn't matter how those locks were established (ordinary OPEN,
> delegated open, reclaim open, LOCK, reclaim lock...).
> 
> However if the server reboots and the client did not manage to
> re-establish a lease (SETCLIENTID+SETCLIENTID_CONFIRM and/or
> EXCHANGE_ID+CREATE_SESSION) before the second reboot, then it is the
> server's responsibility to block that client from reclaiming any
> locks, since the client has no way to know how many times the server
> has rebooted.
> Ditto, of course, if the client tries to reclaim any locks outside the
> grace period and the server isn't tracking whether or not those locks
> have been handed out to another client.

Agreed with everything except:

	(SETCLIENTID+SETCLIENTID_CONFIRM and/or
	EXCHANGE_ID+CREATE_SESSION)

If I remember correctly: RFC 5661 says the point where this happens is
actually RECLAIM_COMPLETE.  RFC 3530 was more vague but suggested first
OPEN reclaim or OPEN_CONFIRM, and 3530bis makes that explicit.

But the client can choose an earlier point without violating the
protocol--it means it will decline reclaiming some things it could have,
but that's safer than the reverse mistake.

--b.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 0/5] nfsd: support for lifting grace period early
  2014-09-26 20:45         ` J. Bruce Fields
@ 2014-09-26 20:58           ` Trond Myklebust
  2014-09-26 21:47             ` J. Bruce Fields
  0 siblings, 1 reply; 37+ messages in thread
From: Trond Myklebust @ 2014-09-26 20:58 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Jeff Layton, Linux NFS Mailing List

On Fri, Sep 26, 2014 at 4:45 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> On Fri, Sep 26, 2014 at 04:37:23PM -0400, Trond Myklebust wrote:
>> On Fri, Sep 26, 2014 at 3:46 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
>> >
>> > As I understand it, the rule for the client is: you're allowed to
> > > reclaim only the set of locks that you held previously, where "the set of
>> > locks you held previously" is "the set of locks held by the clientid
>> > which last managed to send a reclaim OPEN or OPEN_CONFIRM".  So for
>> > example once client1 sends that unrelated OPEN reclaim it's giving up on
>> > anything else it doesn't manage to reclaim this time around.
>>
>> The rule for the client is very simple: "You may attempt to reclaim
>> any locks that were held immediately prior to the reboot of the
>> server."
>> It doesn't matter how those locks were established (ordinary OPEN,
>> delegated open, reclaim open, LOCK, reclaim lock...).
>>
>> However if the server reboots and the client did not manage to
>> re-establish a lease (SETCLIENTID+SETCLIENTID_CONFIRM and/or
>> EXCHANGE_ID+CREATE_SESSION) before the second reboot, then it is the
>> server's responsibility to block that client from reclaiming any
>> locks, since the client has no way to know how many times the server
>> has rebooted.
>> Ditto, of course, if the client tries to reclaim any locks outside the
>> grace period and the server isn't tracking whether or not those locks
>> have been handed out to another client.
>
> Agreed with everything except:
>
>         (SETCLIENTID+SETCLIENTID_CONFIRM and/or
>         EXCHANGE_ID+CREATE_SESSION)
>
> If I remember correctly: RFC 5661 says the point where this happens is
> actually RECLAIM_COMPLETE.  RFC 3530 was more vague but suggested first
> OPEN reclaim or OPEN_CONFIRM, and 3530bis makes that explicit.
>
> But the client can choose an earlier point without violating the
> protocol--it means it will decline reclaiming some things it could have,
> but that's safer than the reverse mistake.
>

Where is this documented? I'm not seeing it.

-- 
Trond Myklebust

Linux NFS client maintainer, PrimaryData

trond.myklebust@primarydata.com

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 0/5] nfsd: support for lifting grace period early
  2014-09-26 20:58           ` Trond Myklebust
@ 2014-09-26 21:47             ` J. Bruce Fields
  2014-09-26 22:17               ` Trond Myklebust
  0 siblings, 1 reply; 37+ messages in thread
From: J. Bruce Fields @ 2014-09-26 21:47 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Jeff Layton, Linux NFS Mailing List

On Fri, Sep 26, 2014 at 04:58:47PM -0400, Trond Myklebust wrote:
> On Fri, Sep 26, 2014 at 4:45 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > On Fri, Sep 26, 2014 at 04:37:23PM -0400, Trond Myklebust wrote:
> >> On Fri, Sep 26, 2014 at 3:46 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> >> >
> >> > As I understand it, the rule for the client is: you're allowed to
> > > > reclaim only the set of locks that you held previously, where "the set of
> >> > locks you held previously" is "the set of locks held by the clientid
> >> > which last managed to send a reclaim OPEN or OPEN_CONFIRM".  So for
> >> > example once client1 sends that unrelated OPEN reclaim it's giving up on
> >> > anything else it doesn't manage to reclaim this time around.
> >>
> >> The rule for the client is very simple: "You may attempt to reclaim
> >> any locks that were held immediately prior to the reboot of the
> >> server."
> >> It doesn't matter how those locks were established (ordinary OPEN,
> >> delegated open, reclaim open, LOCK, reclaim lock...).
> >>
> >> However if the server reboots and the client did not manage to
> >> re-establish a lease (SETCLIENTID+SETCLIENTID_CONFIRM and/or
> >> EXCHANGE_ID+CREATE_SESSION) before the second reboot, then it is the
> >> server's responsibility to block that client from reclaiming any
> >> locks, since the client has no way to know how many times the server
> >> has rebooted.
> >> Ditto, of course, if the client tries to reclaim any locks outside the
> >> grace period and the server isn't tracking whether or not those locks
> >> have been handed out to another client.
> >
> > Agreed with everything except:
> >
> >         (SETCLIENTID+SETCLIENTID_CONFIRM and/or
> >         EXCHANGE_ID+CREATE_SESSION)
> >
> > If I remember correctly: RFC 5661 says the point where this happens is
> > actually RECLAIM_COMPLETE.  RFC 3530 was more vague but suggested first
> > OPEN reclaim or OPEN_CONFIRM, and 3530bis makes that explicit.
> >
> > But the client can choose an earlier point without violating the
> > protocol--it means it will decline reclaiming some things it could have,
> > but that's safer than the reverse mistake.
> >
> 
> Where is this documented? I'm not seeing it.

It's more vague than I remembered:

http://tools.ietf.org/html/rfc5661#section-8.4.3

	The server will set this for any client record in stable
	storage where the client has not done a suitable
	RECLAIM_COMPLETE (global or file system-specific depending on
	the target of the lock request) before it grants any new (i.e.,
	not reclaimed) lock to any client.

And the corresponding language in section 8.6.3 of RFC 3530 is:

	a timestamp that is updated the first time after a server boot
	or reboot the client acquires record locking, share reservation,
	or delegation state on the server.  The timestamp need not be
	updated on subsequent lock requests until the server reboots.

I thought there was something referring specifically to OPEN reclaim or
OPEN_CONFIRM as the point where "the client acquires record locking" but
can't find it on a quick skim.

I also say this is "vague" because, unfortunately, in both cases this
language is part of a description of an example server implementation;
no actual protocol requirement is made explicit.

Which is weird given that noticing the partial-reclaim case was actually
Dave Noveck's original motivation for introducing RECLAIM_COMPLETE (then
RECOVERY_COMPLETE), with the grace-period shortening an extra benefit:

	http://osdir.com/ml/ietf.nfsv4/2006-01/msg00020.html

	Adding the RECOVERY_COMPLETE op allows this situation to be
	dealt with fairly simply. If a client has not recovered all of
	its locks and we have the possiblity of having given out a lock
	inconsistent with one of those (the normal realization of this
	would be that once we declare grace over with some client's
	reclaims not complete) we mark that client as essentially having
	had a lock effectively revoked and thus it would not allowed to
	reclaim locks after a subsequent reboot since it could no longer
	vouch for all the locks it thinks it had.

In the 3530 case we decided that the only safe point to choose was the
one described in the sample server implementation, so 3530bis says:

	A server may consider a client's lease "successfully
	established" once it has received an open operation from that
	client.

(And "open operation" probably is still too vague.)

Sorry for the length.

Anyway, if the client's currently doing this at SETCLIENTID_CONFIRM and
CREATE_SESSION then I think that's correct but more conservative than
necessary.  Which may be a good idea given that I think the chance of a
random server implementor making their way through all this is small.

--b.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 0/5] nfsd: support for lifting grace period early
  2014-09-26 21:47             ` J. Bruce Fields
@ 2014-09-26 22:17               ` Trond Myklebust
  2014-09-26 22:35                 ` Trond Myklebust
  0 siblings, 1 reply; 37+ messages in thread
From: Trond Myklebust @ 2014-09-26 22:17 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Jeff Layton, Linux NFS Mailing List

On Fri, Sep 26, 2014 at 5:47 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> On Fri, Sep 26, 2014 at 04:58:47PM -0400, Trond Myklebust wrote:
>> On Fri, Sep 26, 2014 at 4:45 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
>> > On Fri, Sep 26, 2014 at 04:37:23PM -0400, Trond Myklebust wrote:
>> >> On Fri, Sep 26, 2014 at 3:46 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
>> >> >
>> >> > As I understand it, the rule for the client is: you're allowed to
> > > > > reclaim only the set of locks that you held previously, where "the set of
>> >> > locks you held previously" is "the set of locks held by the clientid
>> >> > which last managed to send a reclaim OPEN or OPEN_CONFIRM".  So for
>> >> > example once client1 sends that unrelated OPEN reclaim it's giving up on
>> >> > anything else it doesn't manage to reclaim this time around.
>> >>
>> >> The rule for the client is very simple: "You may attempt to reclaim
>> >> any locks that were held immediately prior to the reboot of the
>> >> server."
>> >> It doesn't matter how those locks were established (ordinary OPEN,
>> >> delegated open, reclaim open, LOCK, reclaim lock...).
>> >>
>> >> However if the server reboots and the client did not manage to
>> >> re-establish a lease (SETCLIENTID+SETCLIENTID_CONFIRM and/or
>> >> EXCHANGE_ID+CREATE_SESSION) before the second reboot, then it is the
>> >> server's responsibility to block that client from reclaiming any
>> >> locks, since the client has no way to know how many times the server
>> >> has rebooted.
>> >> Ditto, of course, if the client tries to reclaim any locks outside the
>> >> grace period and the server isn't tracking whether or not those locks
>> >> have been handed out to another client.
>> >
>> > Agreed with everything except:
>> >
>> >         (SETCLIENTID+SETCLIENTID_CONFIRM and/or
>> >         EXCHANGE_ID+CREATE_SESSION)
>> >
>> > If I remember correctly: RFC 5661 says the point where this happens is
>> > actually RECLAIM_COMPLETE.  RFC 3530 was more vague but suggested first
>> > OPEN reclaim or OPEN_CONFIRM, and 3530bis makes that explicit.
>> >
>> > But the client can choose an earlier point without violating the
>> > protocol--it means it will decline reclaiming some things it could have,
>> > but that's safer than the reverse mistake.
>> >
>>
>> Where is this documented? I'm not seeing it.
>
> It's more vague than I remembered:
>
> http://tools.ietf.org/html/rfc5661#section-8.4.3
>
>         The server will set this for any client record in stable
>         storage where the client has not done a suitable
>         RECLAIM_COMPLETE (global or file system-specific depending on
>         the target of the lock request) before it grants any new (i.e.,
>         not reclaimed) lock to any client.

Yes, I read that. Then I read this:

   For the second edge condition, after the server restarts for a second
   time, the indication that the client had not completed its reclaims
   at the time at which the grace period ended means that the server
   must reject a reclaim from client A with the error NFS4ERR_NO_GRACE.

This text is just plain wrong, and we should fix it in an errata.
There is absolutely NO edge case condition for those locks that were
successfully reclaimed after the first reboot. There is absolutely no
reason why the client shouldn't be able to reclaim those locks after
reboot number 2.

All the absence of the RECLAIM_COMPLETE tells the server is that the
client MAY have held more locks; what is the server supposed to do
with that information? It's not the server that is responsible for
reclaiming locks.

> And the corresponding langue in 8.6.3 of rfc 3530 is:
>
>         a timestamp that is updated the first time after a server boot
>         or reboot the client acquires record locking, share reservation,
>         or delegation state on the server.  The timestamp need not be
>         updated on subsequent lock requests until the server reboots.
>
> I thought there was something referring specifically to OPEN reclaim or
> OPEN_CONFIRM as the point where "the client acquires record locking" but
> can't find it on a quick skim.
>
> I also say this is "vague" because, unfortunately, in both cases, this
> language is part of a description of an example server implementation,
> no actual protocol requirement is made explicit.
>
> Which is weird given that noticing the partial-reclaim case was actually
> Dave Noveck's original motivation for introducing RECLAIM_COMPLETE (then
> RECOVERY_COMPLETE), with the grace-period shortening an extra benefit:
>
>         http://osdir.com/ml/ietf.nfsv4/2006-01/msg00020.html
>
>         Adding the RECOVERY_COMPLETE op allows this situation to be
>         dealt with fairly simply. If a client has not recovered all of
>         its locks and we have the possiblity of having given out a lock
>         inconsistent with one of those (the normal realization of this
>         would be that once we declare grace over with some client's
>         reclaims not complete) we mark that client as essentially having
>         had a lock effectively revoked and thus it would not allowed to
>         reclaim locks after a subsequent reboot since it could no longer
>         vouch for all the locks it thinks it had.

Sigh. Another example of one of Dave's proposals going through without
adequate review because the reader died of exhaustion while wrestling
with the preamble text.
So, I agree that it would be a problem if the client were to try to
reclaim a lock that it didn't own in the previous boot instance
(because it didn't manage to reclaim it). However, WHY would a sane
client do this?

> In the 3530 case we decided that the only safe point to choose was the
> one described in the sample server implementation, so 3530bis says:
>
>         A server may consider a client's lease "successfully
>         established" once it has received an open operation from that
>         client.
>
> (And "open operation" probably is still too vague.)
>
> Sorry for the length.
>
> Anyway, if the client's currently doing this at SETCLIENTID_CONFIRM and
> CREATE_SESSION then I think that's correct but more conservative than
> necessary.  Which may be a good idea given that I think the chance of a
> random server implementor making there way through all this is small.
>

The client is pruning all those locks that it did not manage to
reclaim before the grace period expired and/or the second reboot
occurred. That is the correct behaviour.

-- 
Trond Myklebust

Linux NFS client maintainer, PrimaryData

trond.myklebust@primarydata.com

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 0/5] nfsd: support for lifting grace period early
  2014-09-26 22:17               ` Trond Myklebust
@ 2014-09-26 22:35                 ` Trond Myklebust
  0 siblings, 0 replies; 37+ messages in thread
From: Trond Myklebust @ 2014-09-26 22:35 UTC (permalink / raw)
  To: J. Bruce Fields, Thomas D Haynes; +Cc: Jeff Layton, Linux NFS Mailing List

On Fri, Sep 26, 2014 at 6:17 PM, Trond Myklebust
<trond.myklebust@primarydata.com> wrote:
> On Fri, Sep 26, 2014 at 5:47 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
>> On Fri, Sep 26, 2014 at 04:58:47PM -0400, Trond Myklebust wrote:
>>> On Fri, Sep 26, 2014 at 4:45 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
>>> > On Fri, Sep 26, 2014 at 04:37:23PM -0400, Trond Myklebust wrote:
>>> >> On Fri, Sep 26, 2014 at 3:46 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
>>> >> >
>>> >> > As I understand it, the rule for the client is: you're allowed to
>>> >> > reclaim only the set of locks that you held previously, where "the set of
>>> >> > locks you held previously" is "the set of locks held by the clientid
>>> >> > which last managed to send a reclaim OPEN or OPEN_CONFIRM".  So for
>>> >> > example once client1 sends that unrelated OPEN reclaim it's giving up on
>>> >> > anything else it doesn't manage to reclaim this time around.
>>> >>
>>> >> The rule for the client is very simple: "You may attempt to reclaim
>>> >> any locks that were held immediately prior to the reboot of the
>>> >> server."
>>> >> It doesn't matter how those locks were established (ordinary OPEN,
>>> >> delegated open, reclaim open, LOCK, reclaim lock...).
>>> >>
>>> >> However if the server reboots and the client did not manage to
>>> >> re-establish a lease (SETCLIENTID+SETCLIENTID_CONFIRM and/or
>>> >> EXCHANGE_ID+CREATE_SESSION) before the second reboot, then it is the
>>> >> server's responsibility to block that client from reclaiming any
>>> >> locks, since the client has no way to know how many times the server
>>> >> has rebooted.
>>> >> Ditto, of course, if the client tries to reclaim any locks outside the
>>> >> grace period and the server isn't tracking whether or not those locks
>>> >> have been handed out to another client.
>>> >
>>> > Agreed with everything except:
>>> >
>>> >         (SETCLIENTID+SETCLIENTID_CONFIRM and/or
>>> >         EXCHANGE_ID+CREATE_SESSION)
>>> >
>>> > If I remember correctly: RFC 5661 says the point where this happens is
>>> > actually RECLAIM_COMPLETE.  RFC 3530 was more vague but suggested first
>>> > OPEN reclaim or OPEN_CONFIRM, and 3530bis makes that explicit.
>>> >
>>> > But the client can choose an earlier point without violating the
>>> > protocol--it means it will decline reclaiming some things it could have,
>>> > but that's safer than the reverse mistake.
>>> >
>>>
>>> Where is this documented? I'm not seeing it.
>>
>> It's more vague than I remembered:
>>
>> http://tools.ietf.org/html/rfc5661#section-8.4.3
>>
>>         The server will set this for any client record in stable
>>         storage where the client has not done a suitable
>>         RECLAIM_COMPLETE (global or file system-specific depending on
>>         the target of the lock request) before it grants any new (i.e.,
>>         not reclaimed) lock to any client.
>
> Yes, I read that. Then I read this:
>
>    For the second edge condition, after the server restarts for a second
>    time, the indication that the client had not completed its reclaims
>    at the time at which the grace period ended means that the server
>    must reject a reclaim from client A with the error NFS4ERR_NO_GRACE.
>
> This text is just plain wrong, and we should fix it in an errata.
> There is absolutely NO edge case condition for those locks that were
> successfully reclaimed after the first reboot. There is absolutely no
> reason why the client shouldn't be able to reclaim those locks after
> reboot number 2.
>
> All the absence of the RECLAIM_COMPLETE tells the server is that the
> client MAY have held more locks; what is the server supposed to do
> with that information? It's not the server that is responsible for
> reclaiming locks.
>

Sigh, the information in RFC3530bis is still confused.

This makes sense:
   o  a timestamp that is updated the first time after a server boot or
      reboot the client acquires byte-range locking, share reservation,
      or delegation state on the server.  The timestamp need not be
      updated on subsequent lock requests until the server reboots.
....
This does not:
   For the second edge condition, after the server reboots for a second
   time, the record that the client had an unexpired record lock, share
   reservation, or delegation established before the server's previous
   incarnation means that the server must reject a reclaim from client A
   with the error NFS4ERR_NO_GRACE or NFS4ERR_RECLAIM_BAD.

Tom?

-- 
Trond Myklebust

Linux NFS client maintainer, PrimaryData

trond.myklebust@primarydata.com

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 0/5] nfsd: support for lifting grace period early
  2014-09-26 19:46     ` J. Bruce Fields
  2014-09-26 20:37       ` Trond Myklebust
@ 2014-09-27 13:04       ` Jeff Layton
  2014-09-29 16:44         ` J. Bruce Fields
  1 sibling, 1 reply; 37+ messages in thread
From: Jeff Layton @ 2014-09-27 13:04 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Jeff Layton, linux-nfs

On Fri, 26 Sep 2014 15:46:17 -0400
"J. Bruce Fields" <bfields@fieldses.org> wrote:

> On Fri, Sep 26, 2014 at 02:54:46PM -0400, Jeff Layton wrote:
> > On Fri, 26 Sep 2014 14:39:49 -0400
> > "J. Bruce Fields" <bfields@fieldses.org> wrote:
> > 
> > > By the way, I've seen the following *before* your patches, but in case
> > > you're still looking at reboot recovery problems:
> > > 
> > > I'm getting sporadic failures in the REBT6 pynfs test--a reclaim open
> > > succeeds after a previous boot (with full grace period) during which the
> > > client had failed to reclaim.
> > > 
> > > I managed to catch one trace, the relevant parts looked like:
> > > 
> > > 	SETCLIENTID client1
> > > 	OPEN
> > > 	LOCK
> > > 
> > > 	(server restart here)
> > > 
> > > 	SETCLIENTID client2
> > > 	OPEN
> > > 	LOCK (lock that conflicts with client1's)
> > > 
> > > 	(server restart here)
> > > 
> > > 	SETCLIENTID client1
> > > 	OPEN CLAIM_PREVIOUS
> > > 
> > > And all those ops (including the last reclaim open) succeeded.
> > > 
> > > So I didn't have a chance to review it more carefully, but it certainly
> > > looks like a server bug, not a test bug.  (Well, technically the server
> > > behavior above is correct since it's not required to refuse anything
> > > till we actually attempt to reclaim the original lock, but we know our
> > > server's not that smart.)
> > > 
> > > But I haven't gotten any further than that....
> > > 
> > > --b.
> > > 
> > 
> > Ewww...v4.0... ;)
> > 
> > Well, I guess that could happen if, after the first reboot, client1 also
> > did a SETCLIENTID *and* reclaimed something that didn't conflict with
> > the lock that client2 grabs...or, did an OPEN/OPEN_CONFIRM after the
> > grace period without reclaiming its lock previously.
> > 
> > If it didn't do one or the other, then its record should have been
> > cleaned out of the DB after the grace period ended between the reboots
> > and it wouldn't have been able to reclaim after the second reboot.
> 
> Yeah.  Is there an easy way to tell nfsdcltrack to log everything?  I'm
> only seeing this occasionally.
> 

If you can make sure that it's run with the '-d' flag then that'll make
it do debug level logging. Unfortunately, the kernel doesn't have a
handy switch to enable that so you'll need to patch the kernel (or
maybe wrap nfsdcltrack) to make that happen.

Maybe we should add a module parm to nfsd.ko that makes it run
nfsdcltrack with -d?
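
To make that idea concrete, here is a minimal sketch of what such a knob
could look like. The parameter and function names (cltrack_debug,
nfsd_run_cltrack) and the hard-coded helper path are illustrative
assumptions, not existing nfsd symbols; the real upcall code in
fs/nfsd/nfs4recover.c builds its argv differently.

/*
 * Hypothetical sketch: a writable module parameter that inserts "-d"
 * into the nfsdcltrack upcall's argument list so the helper logs at
 * debug level.
 */
#include <linux/module.h>
#include <linux/kmod.h>

static bool cltrack_debug;
module_param(cltrack_debug, bool, 0644);
MODULE_PARM_DESC(cltrack_debug, "Run nfsdcltrack with -d for debug logging");

static int nfsd_run_cltrack(char *cmd, char *arg, char **envp)
{
	char *argv[5];
	int i = 0;

	argv[i++] = "/sbin/nfsdcltrack";	/* path is an assumption */
	if (cltrack_debug)
		argv[i++] = "-d";
	argv[i++] = cmd;	/* e.g. "create", "remove", "check" */
	argv[i++] = arg;	/* client identifier string */
	argv[i] = NULL;

	return call_usermodehelper(argv[0], argv, envp, UMH_WAIT_PROC);
}

Because the parameter is writable (0644), it could also be flipped at
runtime via /sys/module/nfsd/parameters/cltrack_debug without reloading
the module.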

> > It's a bit of a pathological case, and I don't see a way to fix that in
> > the context of v4.0. The fact that there's no RECLAIM_COMPLETE is a
> > pretty nasty protocol bug, IMO. Yet another reason to start really
> > moving people toward v4.1+...
> 
> I don't believe there's a protocol bug here.
> 
> A correct NFSv4.0 client wouldn't send the open reclaim in the case you
> describe above.
> 

It would in the partial reclaim case.

Suppose we start reclaiming things but don't get everything before
there's a network partition. The server ends the grace period and then
hands out the conflicting lock to client2. It then reboots again and the
network partition heals. At that point, client1 could try to reclaim
everything it had before (including the lock that conflicts with the
one that client2 had).

I'd still argue that this is a protocol bug. Without RECLAIM_COMPLETE,
the server simply has no way to know whether the reclaims done by
client1 are complete or not.

> As I understand it, the rule for the client is: you're allowed to
> reclaim only the set of locks that you held previously, where "the set of
> locks you held previously" is "the set of locks held by the clientid
> which last managed to send a reclaim OPEN or OPEN_CONFIRM".  So for
> example once client1 sends that unrelated OPEN reclaim it's giving up
> on anything else it doesn't manage to reclaim this time around.
> 
> Any server that loses client state on reboot has no choice but to
> trust clients to get this sort of thing right.
> 


-- 
Jeff Layton <jlayton@primarydata.com>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 0/5] nfsd: support for lifting grace period early
  2014-09-27 13:04       ` Jeff Layton
@ 2014-09-29 16:44         ` J. Bruce Fields
  2014-09-29 16:53           ` Trond Myklebust
  2014-09-29 17:11           ` Jeff Layton
  0 siblings, 2 replies; 37+ messages in thread
From: J. Bruce Fields @ 2014-09-29 16:44 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-nfs

On Sat, Sep 27, 2014 at 09:04:58AM -0400, Jeff Layton wrote:
> On Fri, 26 Sep 2014 15:46:17 -0400
> "J. Bruce Fields" <bfields@fieldses.org> wrote:
> 
> > On Fri, Sep 26, 2014 at 02:54:46PM -0400, Jeff Layton wrote:
> > > On Fri, 26 Sep 2014 14:39:49 -0400
> > > "J. Bruce Fields" <bfields@fieldses.org> wrote:
> > > 
> > > > By the way, I've seen the following *before* your patches, but in case
> > > > you're still looking at reboot recovery problems:
> > > > 
> > > > I'm getting sporadic failures in the REBT6 pynfs test--a reclaim open
> > > > succeeds after a previous boot (with full grace period) during which the
> > > > client had failed to reclaim.
> > > > 
> > > > I managed to catch one trace, the relevant parts looked like:
> > > > 
> > > > 	SETCLIENTID client1
> > > > 	OPEN
> > > > 	LOCK
> > > > 
> > > > 	(server restart here)
> > > > 
> > > > 	SETCLIENTID client2
> > > > 	OPEN
> > > > 	LOCK (lock that conflicts with client1's)
> > > > 
> > > > 	(server restart here)
> > > > 
> > > > 	SETCLIENTID client1
> > > > 	OPEN CLAIM_PREVIOUS
> > > > 
> > > > And all those ops (including the last reclaim open) succeeded.
> > > > 
> > > > So I didn't have a chance to review it more carefully, but it certainly
> > > > looks like a server bug, not a test bug.  (Well, technically the server
> > > > behavior above is correct since it's not required to refuse anything
> > > > till we actually attempt to reclaim the original lock, but we know our
> > > > server's not that smart.)
> > > > 
> > > > But I haven't gotten any further than that....
> > > > 
> > > > --b.
> > > > 
> > > 
> > > Ewww...v4.0... ;)
> > > 
> > > Well, I guess that could happen if, after the first reboot, client1 also
> > > did a SETCLIENTID *and* reclaimed something that didn't conflict with
> > > the lock that client2 grabs...or, did an OPEN/OPEN_CONFIRM after the
> > > grace period without reclaiming its lock previously.
> > > 
> > > If it didn't do one or the other, then its record should have been
> > > cleaned out of the DB after the grace period ended between the reboots
> > > and it wouldn't have been able to reclaim after the second reboot.
> > 
> > Yeah.  Is there an easy way to tell nfsdcltrack to log everything?  I'm
> > only seeing this occasionally.
> > 
> 
> If you can make sure that it's run with the '-d' flag then that'll make
> it do debug level logging. Unfortunately, the kernel doesn't have a
> handy switch to enable that so you'll need to patch the kernel (or
> maybe wrap nfsdcltrack) to make that happen.
> 
> Maybe we should add a module parm to nfsd.ko that makes it run
> nfsdcltrack with -d?

Would userland configuration (e.g. an optional /etc/ file) be more
flexible?

> 
> > > It's a bit of a pathological case, and I don't see a way to fix that in
> > > the context of v4.0. The fact that there's no RECLAIM_COMPLETE is a
> > > pretty nasty protocol bug, IMO. Yet another reason to start really
> > > moving people toward v4.1+...
> > 
> > I don't believe there's a protocol bug here.
> > 
> > A correct NFSv4.0 client wouldn't send the open reclaim in the case you
> > describe above.
> > 
> 
> It would in the partial reclaim case.

My (and I think Trond's) understanding is that client1 would not send
the reclaim in that case.  In the case of a partial reclaim, a correct
4.0 client will stop trying to reclaim the state that it failed to
reclaim previously.  My understanding is that this is how the Linux client in
fact behaves.

So the bug would be in the client, not the protocol.  That said,
admittedly:

	- the "protocol" here seems to be largely in our heads; the spec
	  does a poor job of explaining all this.
> 	- we (not surprisingly) disagree on some of the details.

Nevertheless, I believe that despite our disagreement there's no actual
bug at least between the existing Linux client and server.  And that
3530bis and 5661 were at least *supposed* to deal with this case....

Oh, look, Trond and you have written a pile more of text on this on the
ietf list.  Should I even try to catch up, or would I be better off just
spending the rest of my day back in bed?

--b.

> 
> Suppose we start reclaiming things but don't get everything before
> there's a network partition. The server ends the grace period and then
> hands out the conflicting lock to client2. It then reboots again and the
> network partition heals. At that point, client1 could try to reclaim
> everything it had before (including the lock that conflicts with the
> one that client2 had).
> 
> I'd still argue that this is a protocol bug. Without RECLAIM_COMPLETE,
> the server simply has no way to know whether the reclaims done by
> client1 are complete or not.
> 
> > As I understand it, the rule for the client is: you're allowed to
> > reclaim only the set of locks that you held previously, where "the set of
> > locks you held previously" is "the set of locks held by the clientid
> > which last managed to send a reclaim OPEN or OPEN_CONFIRM".  So for
> > example once client1 sends that unrelated OPEN reclaim it's giving up
> > on anything else it doesn't manage to reclaim this time around.
> > 
> > Any server that loses client state on reboot has no choice but to
> > trust clients to get this sort of thing right.
> > 
> 
> 
> -- 
> Jeff Layton <jlayton@primarydata.com>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 0/5] nfsd: support for lifting grace period early
  2014-09-29 16:44         ` J. Bruce Fields
@ 2014-09-29 16:53           ` Trond Myklebust
  2014-09-29 17:11           ` Jeff Layton
  1 sibling, 0 replies; 37+ messages in thread
From: Trond Myklebust @ 2014-09-29 16:53 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Jeff Layton, Linux NFS Mailing List

On Mon, Sep 29, 2014 at 12:44 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> Oh, look, Trond and you have written a pile more of text on this on the
> ietf list.  Should I even try to catch up, or would I be better off just
> spending the rest of my day back in bed?
>

Sleep, Bruce...

-- 
Trond Myklebust

Linux NFS client maintainer, PrimaryData

trond.myklebust@primarydata.com

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 0/5] nfsd: support for lifting grace period early
  2014-09-29 16:44         ` J. Bruce Fields
  2014-09-29 16:53           ` Trond Myklebust
@ 2014-09-29 17:11           ` Jeff Layton
  2014-09-29 17:55             ` J. Bruce Fields
  1 sibling, 1 reply; 37+ messages in thread
From: Jeff Layton @ 2014-09-29 17:11 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Jeff Layton, linux-nfs

On Mon, 29 Sep 2014 12:44:26 -0400
"J. Bruce Fields" <bfields@fieldses.org> wrote:

> On Sat, Sep 27, 2014 at 09:04:58AM -0400, Jeff Layton wrote:
> > On Fri, 26 Sep 2014 15:46:17 -0400
> > "J. Bruce Fields" <bfields@fieldses.org> wrote:
> > 
> > > On Fri, Sep 26, 2014 at 02:54:46PM -0400, Jeff Layton wrote:
> > > > On Fri, 26 Sep 2014 14:39:49 -0400
> > > > "J. Bruce Fields" <bfields@fieldses.org> wrote:
> > > > 
> > > > > By the way, I've seen the following *before* your patches, but in case
> > > > > you're still looking at reboot recovery problems:
> > > > > 
> > > > > I'm getting sporadic failures in the REBT6 pynfs test--a reclaim open
> > > > > succeeds after a previous boot (with full grace period) during which the
> > > > > client had failed to reclaim.
> > > > > 
> > > > > I managed to catch one trace, the relevant parts looked like:
> > > > > 
> > > > > 	SETCLIENTID client1
> > > > > 	OPEN
> > > > > 	LOCK
> > > > > 
> > > > > 	(server restart here)
> > > > > 
> > > > > 	SETCLIENTID client2
> > > > > 	OPEN
> > > > > 	LOCK (lock that conflicts with client1's)
> > > > > 
> > > > > 	(server restart here)
> > > > > 
> > > > > 	SETCLIENTID client1
> > > > > 	OPEN CLAIM_PREVIOUS
> > > > > 
> > > > > And all those ops (including the last reclaim open) succeeded.
> > > > > 
> > > > > So I didn't have a chance to review it more carefully, but it certainly
> > > > > looks like a server bug, not a test bug.  (Well, technically the server
> > > > > behavior above is correct since it's not required to refuse anything
> > > > > till we actually attempt to reclaim the original lock, but we know our
> > > > > server's not that smart.)
> > > > > 
> > > > > But I haven't gotten any further than that....
> > > > > 
> > > > > --b.
> > > > > 
> > > > 
> > > > Ewww...v4.0... ;)
> > > > 
> > > > Well, I guess that could happen if, after the first reboot, client1 also
> > > > did a SETCLIENTID *and* reclaimed something that didn't conflict with
> > > > the lock that client2 grabs...or, did an OPEN/OPEN_CONFIRM after the
> > > > grace period without reclaiming its lock previously.
> > > > 
> > > > If it didn't do one or the other, then its record should have been
> > > > cleaned out of the DB after the grace period ended between the reboots
> > > > and it wouldn't have been able to reclaim after the second reboot.
> > > 
> > > Yeah.  Is there an easy way to tell nfsdcltrack to log everything?  I'm
> > > only seeing this occasionally.
> > > 
> > 
> > If you can make sure that it's run with the '-d' flag then that'll make
> > it do debug level logging. Unfortunately, the kernel doesn't have a
> > handy switch to enable that so you'll need to patch the kernel (or
> > maybe wrap nfsdcltrack) to make that happen.
> > 
> > Maybe we should add a module parm to nfsd.ko that makes it run
> > nfsdcltrack with -d?
> 
> Would userland configuration (e.g. an optional /etc/ file) be more
> flexible?
> 

Yeah, that might not hurt.

We do want this thing to run pretty quickly, but if the config file
were only processed if it were present (and it usually wasn't) then
that might not be too bad.
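
A rough userspace sketch of that "only pay for the config file when it
exists" approach follows. The path /etc/nfsdcltrack.conf and the debug=1
key are assumptions for illustration, not an actual nfs-utils interface.

/*
 * Hypothetical sketch: probe for an optional config file and bail out
 * cheaply when it is absent, so the common case costs a single access()
 * call per upcall.
 */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define CLTRACK_CONF "/etc/nfsdcltrack.conf"

static bool conf_debug_enabled(void)
{
	FILE *fp;
	char line[128];
	bool debug = false;

	if (access(CLTRACK_CONF, R_OK) != 0)
		return false;	/* no file: fast path, nothing to parse */

	fp = fopen(CLTRACK_CONF, "r");
	if (!fp)
		return false;

	while (fgets(line, sizeof(line), fp)) {
		if (strncmp(line, "debug=1", 7) == 0) {
			debug = true;
			break;
		}
	}
	fclose(fp);
	return debug;
}

The upside over a module parameter is that no kernel change is needed;
the downside is one extra syscall on every upcall even when the file is
absent.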

> > 
> > > > It's a bit of a pathological case, and I don't see a way to fix that in
> > > > the context of v4.0. The fact that there's no RECLAIM_COMPLETE is a
> > > > pretty nasty protocol bug, IMO. Yet another reason to start really
> > > > moving people toward v4.1+...
> > > 
> > > I don't believe there's a protocol bug here.
> > > 
> > > A correct NFSv4.0 client wouldn't send the open reclaim in the case you
> > > describe above.
> > > 
> > 
> > It would in the partial reclaim case.
> 
> My (and I think Trond's) understanding is that client1 would not send
> the reclaim in that case.  In the case of a partial reclaim, a correct
> 4.0 client will stop trying to reclaim the state that it failed to
> reclaim previously.  My understanding is that this is how the Linux client in
> fact behaves.
> 
> So the bug would be in the client, not the protocol.  That said,
> admittedly:
> 
> 	- the "protocol" here seems to be largely in our heads; the spec
> 	  does a poor job of explaining all this.
> 	- we (not surprisingly) disagree on some of the details.
> 
> Nevertheless, I believe that despite our disagreement there's no actual
> bug at least between the existing Linux client and server.  And that
> 3530bis and 5661 were at least *supposed* to deal with this case....
> 
> Oh, look, Trond and you have written a pile more of text on this on the
> ietf list.  Should I even try to catch up, or would I be better off just
> spending the rest of my day back in bed?
> 
> --b.
> 

If there is any bug, I think it's only going to crop up under extremely
rare circumstances -- network partitions cropping up during reclaims,
multiple reboots, etc...

We probably ought to try and clarify this stuff for the spec (and our
own implementations) but I don't think there's any cause for alarm...

-- 
Jeff Layton <jlayton@primarydata.com>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 0/5] nfsd: support for lifting grace period early
  2014-09-29 17:11           ` Jeff Layton
@ 2014-09-29 17:55             ` J. Bruce Fields
  0 siblings, 0 replies; 37+ messages in thread
From: J. Bruce Fields @ 2014-09-29 17:55 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-nfs

On Mon, Sep 29, 2014 at 01:11:30PM -0400, Jeff Layton wrote:
> If there is any bug, I think it's only going to crop up under extremely
> rare circumstances -- network partitions cropping up during reclaims,
> multiple reboots, etc...
> 
> We probably ought to try and clarify this stuff for the spec (and our
> own implementations) but I don't think there's any cause for alarm...

Yes in this case I agree.

In general though reboot-recovery interoperability is hard to test and
bad documentation increases the chance we'll get it wrong.

(And these "edge cases" could also be more common than we expect: e.g.
if power failures come in clusters, and bring your switches down at
about the same time, could multiple-restarts-with-network-partitions be
the normal case?)

--b.

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2014-09-29 17:55 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-19 18:38 [PATCH v2 0/5] nfsd: support for lifting grace period early Jeff Layton
2014-08-19 18:38 ` [PATCH v2 1/5] lockd: move lockd's grace period handling into its own module Jeff Layton
2014-08-28 20:01   ` J. Bruce Fields
2014-08-28 20:24     ` J. Bruce Fields
2014-08-28 23:53       ` Jeff Layton
2014-09-03 21:54         ` J. Bruce Fields
2014-09-15 22:08       ` J. Bruce Fields
2014-09-15 22:09         ` J. Bruce Fields
2014-09-15 22:10           ` J. Bruce Fields
2014-09-15 23:19           ` Jeff Layton
2014-09-16  0:19             ` J. Bruce Fields
2014-09-15 23:11         ` Jeff Layton
2014-08-19 18:38 ` [PATCH v2 2/5] lockd: add a /proc/fs/lockd/nlm_end_grace file Jeff Layton
2014-09-04 19:52   ` J. Bruce Fields
2014-08-19 18:38 ` [PATCH v2 3/5] nfsd: add a v4_end_grace file to /proc/fs/nfsd Jeff Layton
2014-09-04 19:54   ` J. Bruce Fields
2014-09-05 11:40     ` Jeff Layton
2014-08-19 18:38 ` [PATCH v2 4/5] nfsd: remove redundant boot_time parm from grace_done client tracking op Jeff Layton
2014-09-04 19:54   ` J. Bruce Fields
2014-08-19 18:38 ` [PATCH v2 5/5] nfsd: pass extra info in env vars to upcalls to allow for early grace period end Jeff Layton
2014-09-04 19:59   ` J. Bruce Fields
2014-09-05 11:43     ` Jeff Layton
2014-09-05 15:58       ` J. Bruce Fields
2014-09-26 18:39 ` [PATCH v2 0/5] nfsd: support for lifting grace period early J. Bruce Fields
2014-09-26 18:54   ` Jeff Layton
2014-09-26 19:46     ` J. Bruce Fields
2014-09-26 20:37       ` Trond Myklebust
2014-09-26 20:45         ` J. Bruce Fields
2014-09-26 20:58           ` Trond Myklebust
2014-09-26 21:47             ` J. Bruce Fields
2014-09-26 22:17               ` Trond Myklebust
2014-09-26 22:35                 ` Trond Myklebust
2014-09-27 13:04       ` Jeff Layton
2014-09-29 16:44         ` J. Bruce Fields
2014-09-29 16:53           ` Trond Myklebust
2014-09-29 17:11           ` Jeff Layton
2014-09-29 17:55             ` J. Bruce Fields
