All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/7] nfs-utils: support for lifting grace period early
@ 2014-08-19 18:49 Jeff Layton
  2014-08-19 18:49 ` [PATCH v2 1/7] sm-notify: inform the kernel if there were no hosts to notify Jeff Layton
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: Jeff Layton @ 2014-08-19 18:49 UTC (permalink / raw)
  To: steved; +Cc: linux-nfs

This patchset adds some support to sm-notify and nfsdcltrack for lifting
the grace periods early. Allowing this to actually work depends on the
companion kernel patchset, but the approach I've taken here should deal
properly with userland/kernel mismatches.

There are two main pieces:

sm-notify: in the event that sm-notify isn't sending any NOTIFY
requests, we don't expect to see any reclaims from clients. In that
case, we should be able to safely lift the lockd grace period early.
The first patch in the series implements this (though we'll probably
need a bit of selinux work to get that working in Fedora under enforcing
mode).

nfsdcltrack: if there are no v4.0 clients and all v4.1+ clients have
issued a RECLAIM_COMPLETE, then we can go ahead and end the nfsd grace
period. The remainder of the patchset adds the support for this. This
requires revving the DB schema for it, and making use of the environment
variables that are passed to the upcall by the kernel.

With an updated kernel and nfs-utils, I typically see the grace period
being lifted just a few seconds after it starts. It may take a little
longer with more clients, but this is a vast improvement over having
to wait 90s after each reboot to get meaningful work done.

Assuming that the kernel parts are acceptable to Bruce et. al., then I
think we'll want this merged around the same time.

Jeff Layton (7):
  sm-notify: inform the kernel if there were no hosts to notify
  nfsdcltrack: update comments in sqlite.c
  nfsdcltrack: rename CLD_* constants with CLTRACK_* prefixes
  nfsdcltrack: overhaul database initializtion
  nfsdcltrack: update schema to v2
  nfsdcltrack: grab the client minorversion from the env var if it's
    present
  nfsdcltrack: fetch NFSDCLTRACK_GRACE_START out of environment

 utils/nfsdcltrack/nfsdcltrack.c | 106 +++++++++++-
 utils/nfsdcltrack/sqlite.c      | 375 ++++++++++++++++++++++++++++++----------
 utils/nfsdcltrack/sqlite.h      |   5 +-
 utils/statd/sm-notify.c         |  25 +++
 4 files changed, 413 insertions(+), 98 deletions(-)

-- 
1.9.3


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 1/7] sm-notify: inform the kernel if there were no hosts to notify
  2014-08-19 18:49 [PATCH v2 0/7] nfs-utils: support for lifting grace period early Jeff Layton
@ 2014-08-19 18:49 ` Jeff Layton
  2014-08-19 18:49 ` [PATCH v2 2/7] nfsdcltrack: update comments in sqlite.c Jeff Layton
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Jeff Layton @ 2014-08-19 18:49 UTC (permalink / raw)
  To: steved; +Cc: linux-nfs

In the event that there no hosts to be notified after a reboot, there's
no real reason to force lockd to wait the entire grace period before
handing out locks. We're not expecting any reclaim requests to come in
that situation.

Have sm-notify do a write to /proc/fs/lockd/nlm_end_grace if that file
is present. That informs the kernel that it's OK to go ahead and lift
lockd's grace period early.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
---
 utils/statd/sm-notify.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/utils/statd/sm-notify.c b/utils/statd/sm-notify.c
index 9dbe5d908336..828a6991f8e6 100644
--- a/utils/statd/sm-notify.c
+++ b/utils/statd/sm-notify.c
@@ -42,6 +42,8 @@
 #define NSM_TIMEOUT	2
 #define NSM_MAX_TIMEOUT	120	/* don't make this too big */
 
+#define NLM_END_GRACE_FILE	"/proc/fs/lockd/nlm_end_grace"
+
 struct nsm_host {
 	struct nsm_host *	next;
 	char *			name;
@@ -450,6 +452,28 @@ retry:
 	return sock;
 }
 
+/* Inform the kernel that it's OK to lift lockd's grace period */
+static void
+nsm_lift_grace_period(void)
+{
+	int fd;
+
+	fd = open(NLM_END_GRACE_FILE, O_WRONLY);
+	if (fd < 0) {
+		/* Don't warn if file isn't present */
+		if (errno != ENOENT)
+			xlog(L_WARNING, "Unable to open %s: %m",
+				NLM_END_GRACE_FILE);
+		return;
+	}
+
+	if (write(fd, "Y", 1) < 0)
+		xlog(L_WARNING, "Unable to write to %s: %m", NLM_END_GRACE_FILE);
+
+	close(fd);
+	return;
+}
+
 int
 main(int argc, char **argv)
 {
@@ -534,6 +558,7 @@ usage:		fprintf(stderr,
 	(void)nsm_retire_monitored_hosts();
 	if (nsm_load_notify_list(smn_get_host) == 0) {
 		xlog(D_GENERAL, "No hosts to notify; exiting");
+		nsm_lift_grace_period();
 		return 0;
 	}
 
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 2/7] nfsdcltrack: update comments in sqlite.c
  2014-08-19 18:49 [PATCH v2 0/7] nfs-utils: support for lifting grace period early Jeff Layton
  2014-08-19 18:49 ` [PATCH v2 1/7] sm-notify: inform the kernel if there were no hosts to notify Jeff Layton
@ 2014-08-19 18:49 ` Jeff Layton
  2014-08-19 18:49 ` [PATCH v2 3/7] nfsdcltrack: rename CLD_* constants with CLTRACK_* prefixes Jeff Layton
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Jeff Layton @ 2014-08-19 18:49 UTC (permalink / raw)
  To: steved; +Cc: linux-nfs

Clean up and fix some inaccuracies.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
---
 utils/nfsdcltrack/sqlite.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/utils/nfsdcltrack/sqlite.c b/utils/nfsdcltrack/sqlite.c
index bac678980938..352f029d55c0 100644
--- a/utils/nfsdcltrack/sqlite.c
+++ b/utils/nfsdcltrack/sqlite.c
@@ -21,17 +21,15 @@
  * Explanation:
  *
  * This file contains the code to manage the sqlite backend database for the
- * clstated upcall daemon.
+ * nfsdcltrack usermodehelper upcall program.
  *
  * The main database is called main.sqlite and contains the following tables:
  *
  * parameters: simple key/value pairs for storing database info
  *
- * clients: one column containing a BLOB with the as sent by the client
- * 	    and a timestamp (in epoch seconds) of when the record was
- * 	    established
- *
- * FIXME: should we also record the fsid being accessed?
+ * clients: an "id" column containing a BLOB with the long-form clientid as
+ * 	    sent by the client, a "time" column containing a timestamp (in
+ * 	    epoch seconds) of when the record was last updated.
  */
 
 #ifdef HAVE_CONFIG_H
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 3/7] nfsdcltrack: rename CLD_* constants with CLTRACK_* prefixes
  2014-08-19 18:49 [PATCH v2 0/7] nfs-utils: support for lifting grace period early Jeff Layton
  2014-08-19 18:49 ` [PATCH v2 1/7] sm-notify: inform the kernel if there were no hosts to notify Jeff Layton
  2014-08-19 18:49 ` [PATCH v2 2/7] nfsdcltrack: update comments in sqlite.c Jeff Layton
@ 2014-08-19 18:49 ` Jeff Layton
  2014-08-19 18:49 ` [PATCH v2 4/7] nfsdcltrack: overhaul database initializtion Jeff Layton
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Jeff Layton @ 2014-08-19 18:49 UTC (permalink / raw)
  To: steved; +Cc: linux-nfs

From: Jeff Layton <jlayton@poochiereds.net>

Since nfsdcld has been dead for a few years now, clean up the prefixes
on the constants.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
---
 utils/nfsdcltrack/sqlite.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/utils/nfsdcltrack/sqlite.c b/utils/nfsdcltrack/sqlite.c
index 352f029d55c0..52fcaa4d7ca3 100644
--- a/utils/nfsdcltrack/sqlite.c
+++ b/utils/nfsdcltrack/sqlite.c
@@ -50,10 +50,10 @@
 
 #include "xlog.h"
 
-#define CLD_SQLITE_SCHEMA_VERSION 1
+#define CLTRACK_SQLITE_LATEST_SCHEMA_VERSION 1
 
 /* in milliseconds */
-#define CLD_SQLITE_BUSY_TIMEOUT 10000
+#define CLTRACK_SQLITE_BUSY_TIMEOUT 10000
 
 /* private data structures */
 
@@ -111,7 +111,7 @@ sqlite_prepare_dbh(const char *topdir)
 		return ret;
 	}
 
-	ret = sqlite3_busy_timeout(dbh, CLD_SQLITE_BUSY_TIMEOUT);
+	ret = sqlite3_busy_timeout(dbh, CLTRACK_SQLITE_BUSY_TIMEOUT);
 	if (ret != SQLITE_OK) {
 		xlog(L_ERROR, "Unable to set sqlite busy timeout: %d", ret);
 		sqlite3_close(dbh);
@@ -156,7 +156,7 @@ sqlite_maindb_init(const char *topdir)
 	/* insert version into table -- ignore error if it fails */
 	ret = snprintf(buf, sizeof(buf),
 		       "INSERT OR IGNORE INTO parameters values (\"version\", "
-		       "\"%d\");", CLD_SQLITE_SCHEMA_VERSION);
+		       "\"%d\");", CLTRACK_SQLITE_CURRENT_SCHEMA_VERSION);
 	if (ret < 0) {
 		goto out_err;
 	} else if ((size_t)ret >= sizeof(buf)) {
@@ -189,10 +189,10 @@ sqlite_maindb_init(const char *topdir)
 
 	/* process SELECT result */
 	ret = sqlite3_column_int(stmt, 0);
-	if (ret != CLD_SQLITE_SCHEMA_VERSION) {
+	if (ret != CLTRACK_SQLITE_CURRENT_SCHEMA_VERSION) {
 		xlog(L_ERROR, "Unsupported database schema version! "
 			"Expected %d, got %d.",
-			CLD_SQLITE_SCHEMA_VERSION, ret);
+			CLTRACK_SQLITE_CURRENT_SCHEMA_VERSION, ret);
 		ret = -EINVAL;
 		goto out_err;
 	}
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 4/7] nfsdcltrack: overhaul database initializtion
  2014-08-19 18:49 [PATCH v2 0/7] nfs-utils: support for lifting grace period early Jeff Layton
                   ` (2 preceding siblings ...)
  2014-08-19 18:49 ` [PATCH v2 3/7] nfsdcltrack: rename CLD_* constants with CLTRACK_* prefixes Jeff Layton
@ 2014-08-19 18:49 ` Jeff Layton
  2014-08-19 18:49 ` [PATCH v2 5/7] nfsdcltrack: update schema to v2 Jeff Layton
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Jeff Layton @ 2014-08-19 18:49 UTC (permalink / raw)
  To: steved; +Cc: linux-nfs

We have some possibility for races with nfsdcltrack when the DB schema
is upgraded.

Suppose we update the nfs-utils package on a machine after the DB has
been initialized. With the current scheme of initializing the DB only
during the "init" phase, we could end up with a new program that expects
a new schema with an old database.

We could try to do a one-time update when the package is installed, but
that could be racy. We could get an upcall between when the program is
installed and when we run the update. Also, relying on packaging to get
that right is tricky at best. To fix this, change how the database
initialization and checking of the schema revision works.

On every upcall, attempt to open the db as we normally would. If that
fails, then try to create the directory if it doesn't exist and then
retry the open. If it fails again, then give up.

If we get a successful open, then query the DB for the schema version.
If it matches what we expect, then declare success and move on. If the
query fails then assume that the DB isn't set up yet. Start an exclusive
transaction, check the schema version again and then set up the DB if no
one raced in to create it in the meantime.

This should only add a tiny bit of overhead on most upcalls (just an
extra select of the parameters table), and should improve the
performance of the "init" upcall. It'll also make it possible to handle
DB schema changes sanely.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
---
 utils/nfsdcltrack/nfsdcltrack.c |   2 +-
 utils/nfsdcltrack/sqlite.c      | 225 +++++++++++++++++++++++++---------------
 utils/nfsdcltrack/sqlite.h      |   1 -
 3 files changed, 142 insertions(+), 86 deletions(-)

diff --git a/utils/nfsdcltrack/nfsdcltrack.c b/utils/nfsdcltrack/nfsdcltrack.c
index 4334340197b1..f2813610be73 100644
--- a/utils/nfsdcltrack/nfsdcltrack.c
+++ b/utils/nfsdcltrack/nfsdcltrack.c
@@ -241,7 +241,7 @@ cltrack_init(const char __attribute__((unused)) *unused)
 	}
 
 	/* set up storage db */
-	ret = sqlite_maindb_init(storagedir);
+	ret = sqlite_prepare_dbh(storagedir);
 	if (ret) {
 		xlog(L_ERROR, "Failed to init database: %d", ret);
 		/*
diff --git a/utils/nfsdcltrack/sqlite.c b/utils/nfsdcltrack/sqlite.c
index 52fcaa4d7ca3..dde2f7ff8621 100644
--- a/utils/nfsdcltrack/sqlite.c
+++ b/utils/nfsdcltrack/sqlite.c
@@ -88,135 +88,192 @@ mkdir_if_not_exist(const char *dirname)
 	return ret;
 }
 
-/* Open the database and set up the database handle for it */
-int
-sqlite_prepare_dbh(const char *topdir)
+static int
+sqlite_query_schema_version(void)
 {
 	int ret;
+	sqlite3_stmt *stmt = NULL;
 
-	/* Do nothing if the database handle is already set up */
-	if (dbh)
-		return 0;
-
-	ret = snprintf(buf, PATH_MAX - 1, "%s/main.sqlite", topdir);
-	if (ret < 0)
-		return ret;
-
-	buf[PATH_MAX - 1] = '\0';
-
-	ret = sqlite3_open(buf, &dbh);
+	/* prepare select query */
+	ret = sqlite3_prepare_v2(dbh,
+		"SELECT value FROM parameters WHERE key == \"version\";",
+		 -1, &stmt, NULL);
 	if (ret != SQLITE_OK) {
-		xlog(L_ERROR, "Unable to open main database: %d", ret);
-		dbh = NULL;
-		return ret;
+		xlog(L_ERROR, "Unable to prepare select statement: %s",
+			sqlite3_errstr(ret));
+		ret = 0;
+		goto out;
 	}
 
-	ret = sqlite3_busy_timeout(dbh, CLTRACK_SQLITE_BUSY_TIMEOUT);
-	if (ret != SQLITE_OK) {
-		xlog(L_ERROR, "Unable to set sqlite busy timeout: %d", ret);
-		sqlite3_close(dbh);
-		dbh = NULL;
+	/* query schema version */
+	ret = sqlite3_step(stmt);
+	if (ret != SQLITE_ROW) {
+		xlog(L_ERROR, "Select statement execution failed: %s",
+				sqlite3_errstr(ret));
+		ret = 0;
+		goto out;
 	}
 
+	ret = sqlite3_column_int(stmt, 0);
+out:
+	sqlite3_finalize(stmt);
 	return ret;
 }
 
 /*
- * Open the "main" database, and attempt to initialize it by creating the
- * parameters table and inserting the schema version into it. Ignore any errors
- * from that, and then attempt to select the version out of it again. If the
- * version appears wrong, then assume that the DB is corrupt or has been
- * upgraded, and return an error. If all of that works, then attempt to create
- * the "clients" table.
+ * Start an exclusive transaction and recheck the DB schema version. If it's
+ * still zero (indicating a new database) then set it up. If that all works,
+ * then insert schema version into the parameters table and commit the
+ * transaction. On any error, rollback the transaction.
  */
 int
-sqlite_maindb_init(const char *topdir)
+sqlite_maindb_init_v1(void)
 {
 	int ret;
 	char *err = NULL;
-	sqlite3_stmt *stmt = NULL;
 
-	ret = mkdir_if_not_exist(topdir);
-	if (ret)
+	/* Start a transaction */
+	ret = sqlite3_exec(dbh, "BEGIN EXCLUSIVE TRANSACTION;", NULL, NULL,
+				&err);
+	if (ret != SQLITE_OK) {
+		xlog(L_ERROR, "Unable to begin transaction: %s", err);
 		return ret;
+	}
 
-	ret = sqlite_prepare_dbh(topdir);
-	if (ret)
-		return ret;
+	/*
+	 * Check schema version again. This time, under an exclusive
+	 * transaction to guard against racing DB setup attempts
+	 */
+	ret = sqlite_query_schema_version();
+	switch (ret) {
+	case 0:
+		/* Query failed again -- set up DB */
+		break;
+	case CLTRACK_SQLITE_LATEST_SCHEMA_VERSION:
+		/* Someone else raced in and set it up */
+		ret = 0;
+		goto rollback;
+	default:
+		/* Something went wrong -- fail! */
+		ret = -EINVAL;
+		goto rollback;
+	}
 
-	/* Try to create table */
-	ret = sqlite3_exec(dbh, "CREATE TABLE IF NOT EXISTS parameters "
+	ret = sqlite3_exec(dbh, "CREATE TABLE parameters "
 				"(key TEXT PRIMARY KEY, value TEXT);",
 				NULL, NULL, &err);
 	if (ret != SQLITE_OK) {
-		xlog(L_ERROR, "Unable to create parameter table: %d", ret);
-		goto out_err;
+		xlog(L_ERROR, "Unable to create parameter table: %s", err);
+		goto rollback;
 	}
 
-	/* insert version into table -- ignore error if it fails */
-	ret = snprintf(buf, sizeof(buf),
-		       "INSERT OR IGNORE INTO parameters values (\"version\", "
-		       "\"%d\");", CLTRACK_SQLITE_CURRENT_SCHEMA_VERSION);
+	/* create the "clients" table */
+	ret = sqlite3_exec(dbh, "CREATE TABLE clients (id BLOB PRIMARY KEY, "
+				"time INTEGER);",
+				NULL, NULL, &err);
+	if (ret != SQLITE_OK) {
+		xlog(L_ERROR, "Unable to create clients table: %s", err);
+		goto rollback;
+	}
+
+
+	/* insert version into parameters table */
+	ret = snprintf(buf, sizeof(buf), "INSERT OR FAIL INTO parameters "
+			"values (\"version\", \"%d\");",
+			CLTRACK_SQLITE_LATEST_SCHEMA_VERSION);
 	if (ret < 0) {
-		goto out_err;
+		xlog(L_ERROR, "sprintf failed!");
+		goto rollback;
 	} else if ((size_t)ret >= sizeof(buf)) {
+		xlog(L_ERROR, "sprintf output too long! (%d chars)", ret);
 		ret = -EINVAL;
-		goto out_err;
+		goto rollback;
 	}
 
 	ret = sqlite3_exec(dbh, (const char *)buf, NULL, NULL, &err);
 	if (ret != SQLITE_OK) {
-		xlog(L_ERROR, "Unable to insert into parameter table: %d",
-				ret);
-		goto out_err;
+		xlog(L_ERROR, "Unable to insert into parameter table: %s", err);
+		goto rollback;
 	}
 
-	ret = sqlite3_prepare_v2(dbh,
-		"SELECT value FROM parameters WHERE key == \"version\";",
-		 -1, &stmt, NULL);
+	ret = sqlite3_exec(dbh, "COMMIT TRANSACTION;", NULL, NULL, &err);
 	if (ret != SQLITE_OK) {
-		xlog(L_ERROR, "Unable to prepare select statement: %d", ret);
-		goto out_err;
+		xlog(L_ERROR, "Unable to commit transaction: %s", err);
+		goto rollback;
 	}
+out:
+	sqlite3_free(err);
+	return ret;
 
-	/* check schema version */
-	ret = sqlite3_step(stmt);
-	if (ret != SQLITE_ROW) {
-		xlog(L_ERROR, "Select statement execution failed: %s",
-				sqlite3_errmsg(dbh));
-		goto out_err;
-	}
+rollback:
+	/* Attempt to rollback the transaction */
+	sqlite3_exec(dbh, "ROLLBACK TRANSACTION;", NULL, NULL, &err);
+	goto out;
+}
 
-	/* process SELECT result */
-	ret = sqlite3_column_int(stmt, 0);
-	if (ret != CLTRACK_SQLITE_CURRENT_SCHEMA_VERSION) {
-		xlog(L_ERROR, "Unsupported database schema version! "
-			"Expected %d, got %d.",
-			CLTRACK_SQLITE_CURRENT_SCHEMA_VERSION, ret);
-		ret = -EINVAL;
-		goto out_err;
-	}
+/* Open the database and set up the database handle for it */
+int
+sqlite_prepare_dbh(const char *topdir)
+{
+	int ret;
 
-	/* now create the "clients" table */
-	ret = sqlite3_exec(dbh, "CREATE TABLE IF NOT EXISTS clients "
-				"(id BLOB PRIMARY KEY, time INTEGER);",
-				NULL, NULL, &err);
+	/* Do nothing if the database handle is already set up */
+	if (dbh)
+		return 0;
+
+	ret = snprintf(buf, PATH_MAX - 1, "%s/main.sqlite", topdir);
+	if (ret < 0)
+		return ret;
+
+	buf[PATH_MAX - 1] = '\0';
+
+	/* open a new DB handle */
+	ret = sqlite3_open(buf, &dbh);
 	if (ret != SQLITE_OK) {
-		xlog(L_ERROR, "Unable to create clients table: %s", err);
-		goto out_err;
+		/* try to create the dir */
+		ret = mkdir_if_not_exist(topdir);
+		if (ret)
+			goto out_close;
+
+		/* retry open */
+		ret = sqlite3_open(buf, &dbh);
+		if (ret != SQLITE_OK)
+			goto out_close;
 	}
 
-	sqlite3_free(err);
-	sqlite3_finalize(stmt);
-	return 0;
+	/* set busy timeout */
+	ret = sqlite3_busy_timeout(dbh, CLTRACK_SQLITE_BUSY_TIMEOUT);
+	if (ret != SQLITE_OK) {
+		xlog(L_ERROR, "Unable to set sqlite busy timeout: %s",
+				sqlite3_errstr(ret));
+		goto out_close;
+	}
 
-out_err:
-	if (err) {
-		xlog(L_ERROR, "sqlite error: %s", err);
-		sqlite3_free(err);
+	ret = sqlite_query_schema_version();
+	switch (ret) {
+	case CLTRACK_SQLITE_LATEST_SCHEMA_VERSION:
+		/* DB is already set up. Do nothing */
+		ret = 0;
+		break;
+	case 0:
+		/* Query failed -- try to set up new DB */
+		ret = sqlite_maindb_init_v1();
+		if (ret)
+			goto out_close;
+		break;
+	default:
+		/* Unknown DB version -- downgrade? Fail */
+		xlog(L_ERROR, "Unsupported database schema version! "
+			"Expected %d, got %d.",
+			CLTRACK_SQLITE_LATEST_SCHEMA_VERSION, ret);
+		ret = -EINVAL;
+		goto out_close;
 	}
-	sqlite3_finalize(stmt);
-	sqlite3_close(dbh);
+
+	return ret;
+out_close:
+	sqlite3_close_v2(dbh);
+	dbh = NULL;
 	return ret;
 }
 
diff --git a/utils/nfsdcltrack/sqlite.h b/utils/nfsdcltrack/sqlite.h
index ebf04c3cdbaa..3713bfb66e2f 100644
--- a/utils/nfsdcltrack/sqlite.h
+++ b/utils/nfsdcltrack/sqlite.h
@@ -21,7 +21,6 @@
 #define _SQLITE_H_
 
 int sqlite_prepare_dbh(const char *topdir);
-int sqlite_maindb_init(const char *topdir);
 int sqlite_insert_client(const unsigned char *clname, const size_t namelen);
 int sqlite_remove_client(const unsigned char *clname, const size_t namelen);
 int sqlite_check_client(const unsigned char *clname, const size_t namelen);
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 5/7] nfsdcltrack: update schema to v2
  2014-08-19 18:49 [PATCH v2 0/7] nfs-utils: support for lifting grace period early Jeff Layton
                   ` (3 preceding siblings ...)
  2014-08-19 18:49 ` [PATCH v2 4/7] nfsdcltrack: overhaul database initializtion Jeff Layton
@ 2014-08-19 18:49 ` Jeff Layton
  2014-08-19 18:49 ` [PATCH v2 6/7] nfsdcltrack: grab the client minorversion from the env var if it's present Jeff Layton
  2014-08-19 18:49 ` [PATCH v2 7/7] nfsdcltrack: fetch NFSDCLTRACK_GRACE_START out of environment Jeff Layton
  6 siblings, 0 replies; 8+ messages in thread
From: Jeff Layton @ 2014-08-19 18:49 UTC (permalink / raw)
  To: steved; +Cc: linux-nfs

From: Jeff Layton <jlayton@poochiereds.net>

In order to allow knfsd's lock manager to lift its grace period early,
we need to figure out whether all clients have finished reclaiming
their state not. Unfortunately, the current code doesn't allow us to
ascertain this. All we track for each client is a timestamp that tells
us when the last "check" or "create" operation came in.

We need to track the two timestamps separately. Add a new
"reclaim_complete" column to the database that tells us when the last
"create" operation came in. For now, we just insert "0" in that column
but a later patch will make it so that we insert a real timestamp for
v4.1+ client records.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
---
 utils/nfsdcltrack/sqlite.c | 102 +++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 94 insertions(+), 8 deletions(-)

diff --git a/utils/nfsdcltrack/sqlite.c b/utils/nfsdcltrack/sqlite.c
index dde2f7ff8621..e260e81b1722 100644
--- a/utils/nfsdcltrack/sqlite.c
+++ b/utils/nfsdcltrack/sqlite.c
@@ -29,7 +29,10 @@
  *
  * clients: an "id" column containing a BLOB with the long-form clientid as
  * 	    sent by the client, a "time" column containing a timestamp (in
- * 	    epoch seconds) of when the record was last updated.
+ * 	    epoch seconds) of when the record was last updated, and a
+ * 	    "reclaim_complete" column containing a timestamp (in epoch seconds)
+ * 	    of when the last "create" operation came in for v4.1+ clients.
+ * 	    v4.0 clients should always have this set to 0.
  */
 
 #ifdef HAVE_CONFIG_H
@@ -50,7 +53,7 @@
 
 #include "xlog.h"
 
-#define CLTRACK_SQLITE_LATEST_SCHEMA_VERSION 1
+#define CLTRACK_SQLITE_LATEST_SCHEMA_VERSION 2
 
 /* in milliseconds */
 #define CLTRACK_SQLITE_BUSY_TIMEOUT 10000
@@ -120,6 +123,81 @@ out:
 	return ret;
 }
 
+static int
+sqlite_maindb_update_v1_to_v2(void)
+{
+	int ret, ret2;
+	char *err;
+
+	/* begin transaction */
+	ret = sqlite3_exec(dbh, "BEGIN EXCLUSIVE TRANSACTION;", NULL, NULL,
+				&err);
+	if (ret != SQLITE_OK) {
+		xlog(L_ERROR, "Unable to begin transaction: %s", err);
+		goto rollback;
+	}
+
+	/*
+	 * Check schema version again. This time, under an exclusive
+	 * transaction to guard against racing DB setup attempts
+	 */
+	ret = sqlite_query_schema_version();
+	switch (ret) {
+	case 1:
+		/* Still at v1 -- do conversion */
+		break;
+	case CLTRACK_SQLITE_LATEST_SCHEMA_VERSION:
+		/* Someone else raced in and set it up */
+		ret = 0;
+		goto rollback;
+	default:
+		/* Something went wrong -- fail! */
+		ret = -EINVAL;
+		goto rollback;
+	}
+
+	/* create v2 clients table */
+	ret = sqlite3_exec(dbh, "ALTER TABLE clients ADD COLUMN "
+				"reclaim_complete INTEGER;",
+				NULL, NULL, &err);
+	if (ret != SQLITE_OK) {
+		xlog(L_ERROR, "Unable to update clients table: %s", err);
+		goto rollback;
+	}
+
+	ret = snprintf(buf, sizeof(buf), "UPDATE parameters SET value = %d "
+			"WHERE key = \"version\";",
+			CLTRACK_SQLITE_LATEST_SCHEMA_VERSION);
+	if (ret < 0) {
+		xlog(L_ERROR, "sprintf failed!");
+		goto rollback;
+	} else if ((size_t)ret >= sizeof(buf)) {
+		xlog(L_ERROR, "sprintf output too long! (%d chars)", ret);
+		ret = -EINVAL;
+		goto rollback;
+	}
+
+	ret = sqlite3_exec(dbh, (const char *)buf, NULL, NULL, &err);
+	if (ret != SQLITE_OK) {
+		xlog(L_ERROR, "Unable to update schema version: %s", err);
+		goto rollback;
+	}
+
+	ret = sqlite3_exec(dbh, "COMMIT TRANSACTION;", NULL, NULL, &err);
+	if (ret != SQLITE_OK) {
+		xlog(L_ERROR, "Unable to commit transaction: %s", err);
+		goto rollback;
+	}
+out:
+	sqlite3_free(err);
+	return ret;
+rollback:
+	ret2 = sqlite3_exec(dbh, "ROLLBACK TRANSACTION;", NULL, NULL, &err);
+	if (ret2 != SQLITE_OK)
+		xlog(L_ERROR, "Unable to rollback transaction: %s", err);
+	goto out;
+}
+
 /*
  * Start an exclusive transaction and recheck the DB schema version. If it's
  * still zero (indicating a new database) then set it up. If that all works,
@@ -127,9 +205,9 @@ out:
  * transaction. On any error, rollback the transaction.
  */
 int
-sqlite_maindb_init_v1(void)
+sqlite_maindb_init_v2(void)
 {
-	int ret;
+	int ret, ret2;
 	char *err = NULL;
 
 	/* Start a transaction */
@@ -169,7 +247,7 @@ sqlite_maindb_init_v1(void)
 
 	/* create the "clients" table */
 	ret = sqlite3_exec(dbh, "CREATE TABLE clients (id BLOB PRIMARY KEY, "
-				"time INTEGER);",
+				"time INTEGER, reclaim_complete INTEGER);",
 				NULL, NULL, &err);
 	if (ret != SQLITE_OK) {
 		xlog(L_ERROR, "Unable to create clients table: %s", err);
@@ -207,7 +285,9 @@ out:
 
 rollback:
 	/* Attempt to rollback the transaction */
-	sqlite3_exec(dbh, "ROLLBACK TRANSACTION;", NULL, NULL, &err);
+	ret2 = sqlite3_exec(dbh, "ROLLBACK TRANSACTION;", NULL, NULL, &err);
+	if (ret2 != SQLITE_OK)
+		xlog(L_ERROR, "Unable to rollback transaction: %s", err);
 	goto out;
 }
 
@@ -255,9 +335,15 @@ sqlite_prepare_dbh(const char *topdir)
 		/* DB is already set up. Do nothing */
 		ret = 0;
 		break;
+	case 1:
+		/* Old DB -- update to new schema */
+		ret = sqlite_maindb_update_v1_to_v2();
+		if (ret)
+			goto out_close;
+		break;
 	case 0:
 		/* Query failed -- try to set up new DB */
-		ret = sqlite_maindb_init_v1();
+		ret = sqlite_maindb_init_v2();
 		if (ret)
 			goto out_close;
 		break;
@@ -289,7 +375,7 @@ sqlite_insert_client(const unsigned char *clname, const size_t namelen)
 	sqlite3_stmt *stmt = NULL;
 
 	ret = sqlite3_prepare_v2(dbh, "INSERT OR REPLACE INTO clients VALUES "
-				      "(?, strftime('%s', 'now'));", -1,
+					"(?, strftime('%s', 'now'), 0);", -1,
 					&stmt, NULL);
 	if (ret != SQLITE_OK) {
 		xlog(L_ERROR, "%s: insert statement prepare failed: %s",
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 6/7] nfsdcltrack: grab the client minorversion from the env var if it's present
  2014-08-19 18:49 [PATCH v2 0/7] nfs-utils: support for lifting grace period early Jeff Layton
                   ` (4 preceding siblings ...)
  2014-08-19 18:49 ` [PATCH v2 5/7] nfsdcltrack: update schema to v2 Jeff Layton
@ 2014-08-19 18:49 ` Jeff Layton
  2014-08-19 18:49 ` [PATCH v2 7/7] nfsdcltrack: fetch NFSDCLTRACK_GRACE_START out of environment Jeff Layton
  6 siblings, 0 replies; 8+ messages in thread
From: Jeff Layton @ 2014-08-19 18:49 UTC (permalink / raw)
  To: steved; +Cc: linux-nfs

And set the reclaim_complete field in the DB based on whether it's zero
or not. We have no way to know for certain when a v4.0 client has
completed reclaim, so we always set that field in the DB to zero for
v4.0 clients.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
---
 utils/nfsdcltrack/nfsdcltrack.c | 29 +++++++++++++++++++++++++++--
 utils/nfsdcltrack/sqlite.c      | 16 ++++++++++++----
 utils/nfsdcltrack/sqlite.h      |  3 ++-
 3 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/utils/nfsdcltrack/nfsdcltrack.c b/utils/nfsdcltrack/nfsdcltrack.c
index f2813610be73..cbc98c7edc11 100644
--- a/utils/nfsdcltrack/nfsdcltrack.c
+++ b/utils/nfsdcltrack/nfsdcltrack.c
@@ -37,6 +37,7 @@
 #include <libgen.h>
 #include <sys/inotify.h>
 #include <dirent.h>
+#include <limits.h>
 #ifdef HAVE_SYS_CAPABILITY_H
 #include <sys/prctl.h>
 #include <sys/capability.h>
@@ -254,6 +255,30 @@ cltrack_init(const char __attribute__((unused)) *unused)
 	return ret;
 }
 
+/*
+ * Fetch the contents of the NFSDCLTRACK_CLIENT_MINORVERSION env var. If
+ * it's not set or there is an error converting it to an unsigned int
+ * then return 0 (since the minorversion isn't reliable at that point).
+ */
+static unsigned int
+cltrack_get_minorvers(void)
+{
+	unsigned long minorvers;
+	char *end;
+	char *minorvers_str = getenv("NFSDCLTRACK_CLIENT_MINORVERSION");
+
+	if (!minorvers_str)
+		return 0;
+
+	errno = 0;
+	minorvers = strtoul(minorvers_str, &end, 0);
+	/* Problem converting or value is too large? */
+	if (errno || minorvers > UINT_MAX)
+		return 0;
+
+	return (unsigned int)minorvers;
+}
+
 static int
 cltrack_create(const char *id)
 {
@@ -270,7 +295,7 @@ cltrack_create(const char *id)
 	if (len < 0)
 		return (int)len;
 
-	ret = sqlite_insert_client(blob, len);
+	ret = sqlite_insert_client(blob, len, cltrack_get_minorvers());
 
 	return ret ? -EREMOTEIO : ret;
 }
@@ -323,7 +348,7 @@ cltrack_check_legacy(const unsigned char *blob, const ssize_t len)
 	}
 
 	/* Dir exists, try to insert record into db */
-	ret = sqlite_insert_client(blob, len);
+	ret = sqlite_insert_client(blob, len, 0);
 	if (ret) {
 		xlog(D_GENERAL, "Failed to insert client: %d", ret);
 		return -EREMOTEIO;
diff --git a/utils/nfsdcltrack/sqlite.c b/utils/nfsdcltrack/sqlite.c
index e260e81b1722..01084bb6c4d8 100644
--- a/utils/nfsdcltrack/sqlite.c
+++ b/utils/nfsdcltrack/sqlite.c
@@ -369,14 +369,22 @@ out_close:
  * Returns a non-zero sqlite error code, or SQLITE_OK (aka 0)
  */
 int
-sqlite_insert_client(const unsigned char *clname, const size_t namelen)
+sqlite_insert_client(const unsigned char *clname, const size_t namelen,
+			const unsigned int minorvers)
 {
 	int ret;
 	sqlite3_stmt *stmt = NULL;
 
-	ret = sqlite3_prepare_v2(dbh, "INSERT OR REPLACE INTO clients VALUES "
-					"(?, strftime('%s', 'now'), 0);", -1,
-					&stmt, NULL);
+	if (minorvers == 0)
+		ret = sqlite3_prepare_v2(dbh, "INSERT OR REPLACE INTO clients "
+				"VALUES (?, strftime('%s', 'now'), 0);", -1,
+				&stmt, NULL);
+	else
+		ret = sqlite3_prepare_v2(dbh, "INSERT OR REPLACE INTO clients "
+				"VALUES (?, strftime('%s', 'now'), "
+				"strftime('%s', 'now'));", -1,
+				&stmt, NULL);
+
 	if (ret != SQLITE_OK) {
 		xlog(L_ERROR, "%s: insert statement prepare failed: %s",
 			__func__, sqlite3_errmsg(dbh));
diff --git a/utils/nfsdcltrack/sqlite.h b/utils/nfsdcltrack/sqlite.h
index 3713bfb66e2f..e9cc84cbb294 100644
--- a/utils/nfsdcltrack/sqlite.h
+++ b/utils/nfsdcltrack/sqlite.h
@@ -21,7 +21,8 @@
 #define _SQLITE_H_
 
 int sqlite_prepare_dbh(const char *topdir);
-int sqlite_insert_client(const unsigned char *clname, const size_t namelen);
+int sqlite_insert_client(const unsigned char *clname, const size_t namelen,
+				const unsigned int minorvers);
 int sqlite_remove_client(const unsigned char *clname, const size_t namelen);
 int sqlite_check_client(const unsigned char *clname, const size_t namelen);
 int sqlite_remove_unreclaimed(const time_t grace_start);
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 7/7] nfsdcltrack: fetch NFSDCLTRACK_GRACE_START out of environment
  2014-08-19 18:49 [PATCH v2 0/7] nfs-utils: support for lifting grace period early Jeff Layton
                   ` (5 preceding siblings ...)
  2014-08-19 18:49 ` [PATCH v2 6/7] nfsdcltrack: grab the client minorversion from the env var if it's present Jeff Layton
@ 2014-08-19 18:49 ` Jeff Layton
  6 siblings, 0 replies; 8+ messages in thread
From: Jeff Layton @ 2014-08-19 18:49 UTC (permalink / raw)
  To: steved; +Cc: linux-nfs

Allow the fetching of NFSDCLTRACK_GRACE_START out of environment
variables. If it's present in the "create" or "init" upcalls, then we
can use that to query the database to see whether there are any clients
that have not issued a RECLAIM_COMPLETE since that time.

If there aren't any, then we know that all reclaim activity is now done
and we can then cue the kernel to lift the grace period.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
---
 utils/nfsdcltrack/nfsdcltrack.c | 77 ++++++++++++++++++++++++++++++++++++++++-
 utils/nfsdcltrack/sqlite.c      | 40 +++++++++++++++++++++
 utils/nfsdcltrack/sqlite.h      |  1 +
 3 files changed, 117 insertions(+), 1 deletion(-)

diff --git a/utils/nfsdcltrack/nfsdcltrack.c b/utils/nfsdcltrack/nfsdcltrack.c
index cbc98c7edc11..f500216e6021 100644
--- a/utils/nfsdcltrack/nfsdcltrack.c
+++ b/utils/nfsdcltrack/nfsdcltrack.c
@@ -50,6 +50,8 @@
 #define CLD_DEFAULT_STORAGEDIR NFS_STATEDIR "/nfsdcltrack"
 #endif
 
+#define NFSD_END_GRACE_FILE "/proc/fs/nfsd/v4_end_grace"
+
 /* defined by RFC 3530 */
 #define NFS4_OPAQUE_LIMIT	1024
 
@@ -211,6 +213,64 @@ cltrack_set_caps(void)
 	return ret;
 }
 
+/* Inform the kernel that it's OK to lift nfsd's grace period */
+static void
+cltrack_lift_grace_period(void)
+{
+	int fd;
+
+	fd = open(NFSD_END_GRACE_FILE, O_WRONLY);
+	if (fd < 0) {
+		/* Don't warn if file isn't present */
+		if (errno != ENOENT)
+			xlog(L_WARNING, "Unable to open %s: %m",
+				NFSD_END_GRACE_FILE);
+		return;
+	}
+
+	if (write(fd, "Y", 1) < 0)
+		xlog(L_WARNING, "Unable to write to %s: %m",
+				NFSD_END_GRACE_FILE);
+
+	close(fd);
+	return;
+}
+
+/*
+ * Fetch the contents of the NFSDCLTRACK_GRACE_START env var. If it's not set
+ * or there's an error converting it to time_t, then return LONG_MAX.
+ */
+static time_t
+cltrack_get_grace_start(void)
+{
+	time_t grace_start;
+	char *end;
+	char *grace_start_str = getenv("NFSDCLTRACK_GRACE_START");
+
+	if (!grace_start_str)
+		return LONG_MAX;
+
+	errno = 0;
+	grace_start = strtol(grace_start_str, &end, 0);
+	/* Problem converting or value is too large? */
+	if (errno)
+		return LONG_MAX;
+
+	return grace_start;
+}
+
+static bool
+cltrack_reclaims_complete(void)
+{
+	time_t grace_start = cltrack_get_grace_start();
+
+	/* Don't query DB if we didn't get a valid boot time */
+	if (grace_start == LONG_MAX)
+		return false;
+
+	return !sqlite_query_reclaiming(grace_start);
+}
+
 static int
 cltrack_init(const char __attribute__((unused)) *unused)
 {
@@ -251,7 +311,11 @@ cltrack_init(const char __attribute__((unused)) *unused)
 		 * stop upcalling until the problem is resolved.
 		 */
 		ret = -EACCES;
+	} else {
+		if (cltrack_reclaims_complete())
+			cltrack_lift_grace_period();
 	}
+
 	return ret;
 }
 
@@ -284,6 +348,7 @@ cltrack_create(const char *id)
 {
 	int ret;
 	ssize_t len;
+	unsigned int minorvers;
 
 	xlog(D_GENERAL, "%s: create client record.", __func__);
 
@@ -295,8 +360,18 @@ cltrack_create(const char *id)
 	if (len < 0)
 		return (int)len;
 
-	ret = sqlite_insert_client(blob, len, cltrack_get_minorvers());
+	minorvers = cltrack_get_minorvers();
+	ret = sqlite_insert_client(blob, len, minorvers);
+	if (ret)
+		goto out;
+
+	/* No point in querying the DB if this client was v4.0 */
+	if (minorvers == 0)
+		goto out;
 
+	if (cltrack_reclaims_complete())
+		cltrack_lift_grace_period();
+out:
 	return ret ? -EREMOTEIO : ret;
 }
 
diff --git a/utils/nfsdcltrack/sqlite.c b/utils/nfsdcltrack/sqlite.c
index 01084bb6c4d8..17437e48ffaa 100644
--- a/utils/nfsdcltrack/sqlite.c
+++ b/utils/nfsdcltrack/sqlite.c
@@ -547,3 +547,43 @@ sqlite_remove_unreclaimed(time_t grace_start)
 	sqlite3_free(err);
 	return ret;
 }
+
+/*
+ * Are there any clients that are possibly still reclaiming? Return a positive
+ * integer (usually number of clients) if so. If not, then return 0. On any
+ * error, return non-zero.
+ */
+int
+sqlite_query_reclaiming(const time_t grace_start)
+{
+	int ret;
+	sqlite3_stmt *stmt = NULL;
+
+	ret = sqlite3_prepare_v2(dbh, "SELECT count(*) FROM clients WHERE "
+				      "reclaim_complete < ?", -1, &stmt, NULL);
+	if (ret != SQLITE_OK) {
+		xlog(L_ERROR, "%s: unable to prepare select statement: %s",
+				__func__, sqlite3_errstr(ret));
+		return ret;
+	}
+
+	ret = sqlite3_bind_int64(stmt, 1, (sqlite3_int64)grace_start);
+	if (ret != SQLITE_OK) {
+		xlog(L_ERROR, "%s: bind int64 failed: %s",
+				__func__, sqlite3_errstr(ret));
+		return ret;
+	}
+
+	ret = sqlite3_step(stmt);
+	if (ret != SQLITE_ROW) {
+		xlog(L_ERROR, "%s: unexpected return code from select: %s",
+				__func__, sqlite3_errstr(ret));
+		return ret;
+	}
+
+	ret = sqlite3_column_int(stmt, 0);
+	sqlite3_finalize(stmt);
+	xlog(D_GENERAL, "%s: there are %d clients that have not completed "
+			"reclaim", __func__, ret);
+	return ret;
+}
diff --git a/utils/nfsdcltrack/sqlite.h b/utils/nfsdcltrack/sqlite.h
index e9cc84cbb294..0e1939198367 100644
--- a/utils/nfsdcltrack/sqlite.h
+++ b/utils/nfsdcltrack/sqlite.h
@@ -26,5 +26,6 @@ int sqlite_insert_client(const unsigned char *clname, const size_t namelen,
 int sqlite_remove_client(const unsigned char *clname, const size_t namelen);
 int sqlite_check_client(const unsigned char *clname, const size_t namelen);
 int sqlite_remove_unreclaimed(const time_t grace_start);
+int sqlite_query_reclaiming(const time_t grace_start);
 
 #endif /* _SQLITE_H */
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2014-08-19 18:50 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-19 18:49 [PATCH v2 0/7] nfs-utils: support for lifting grace period early Jeff Layton
2014-08-19 18:49 ` [PATCH v2 1/7] sm-notify: inform the kernel if there were no hosts to notify Jeff Layton
2014-08-19 18:49 ` [PATCH v2 2/7] nfsdcltrack: update comments in sqlite.c Jeff Layton
2014-08-19 18:49 ` [PATCH v2 3/7] nfsdcltrack: rename CLD_* constants with CLTRACK_* prefixes Jeff Layton
2014-08-19 18:49 ` [PATCH v2 4/7] nfsdcltrack: overhaul database initializtion Jeff Layton
2014-08-19 18:49 ` [PATCH v2 5/7] nfsdcltrack: update schema to v2 Jeff Layton
2014-08-19 18:49 ` [PATCH v2 6/7] nfsdcltrack: grab the client minorversion from the env var if it's present Jeff Layton
2014-08-19 18:49 ` [PATCH v2 7/7] nfsdcltrack: fetch NFSDCLTRACK_GRACE_START out of environment Jeff Layton

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.