All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/8] Assorted mount-related nfs-utils patches.
@ 2016-07-14  2:26 NeilBrown
  2016-07-14  2:26 ` [PATCH 3/8] mountd: remove 'dev_missing' checks NeilBrown
                   ` (7 more replies)
  0 siblings, 8 replies; 28+ messages in thread
From: NeilBrown @ 2016-07-14  2:26 UTC (permalink / raw)
  To: Steve Dickson; +Cc: Linux NFS Mailing list

This series is an assortment of mount-related things.

1/ resend the manpage update for "retry=" that I sent a while
   back, but messed up
2/ remove a meaningless option from mountd.
3/ Fix up support for the "mountpoint" export option in mountd,
   particularly making it work sensibly with NFSv4
4/ temporary hostname failure should be treated by mount.nfs
   as a temporary condition
5/ Avoid advertising a temporary IPv6 address for NFSv4.0 callback.

That last point needs a kernel fix too, but I'm not sure what is best
yet.

If you have IPv6 privacy extensions enabled, then the default
address for outgoing connections is a temporary address which
becomes "deprecated" after a period of time, when another address will
become available for use.  A while later the deprecated address becomes
invalid, and will stop working.
So if a connection is held open for longer than the "valid lifetime"
(valid_lft in "ip -6 addr" output) it will stop working, which isn't
what users will expect.

We could possibly close and re-open the connection after the address
becomes deprecated.  Or we could insist on a 'public' address when
establishing the connection.  Thoughts?

When setting the call-back address for NFSv4.0, using a public address
is the only real option, so that is what this patch does.

Thanks,
NeilBrown

---

NeilBrown (8):
      nfs.man: clarify effect of 'retry' option.
      mountd: remove the --exports-file option
      mountd: remove 'dev_missing' checks
      mountd: cause attempts to access unmounted exportpoints to return ESTALE
      mountd: Don't export unmounted exports to NFSv4
      mountd: don't add paths to non-mounted export points to pseudo-root
      mount: don't treat temporary name resolution failure as permanent.
      mount: use a public address for IPv6 callback.


 support/include/v4root.h |    2 +-
 utils/mount/network.c    |    5 ++++
 utils/mount/nfs.man      |   14 ++++++++++--
 utils/mount/stropts.c    |   54 ++++++++++++++++++++++++++--------------------
 utils/mountd/auth.c      |   34 ++++++++++++++++++++++++-----
 utils/mountd/cache.c     |   36 ++++++++++++++-----------------
 utils/mountd/mountd.c    |   11 +++------
 utils/mountd/mountd.h    |    2 +-
 utils/mountd/mountd.man  |    8 -------
 utils/mountd/v4root.c    |   11 +++++++++
 10 files changed, 107 insertions(+), 70 deletions(-)

--
Signature


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 1/8] nfs.man: clarify effect of 'retry' option.
  2016-07-14  2:26 [PATCH 0/8] Assorted mount-related nfs-utils patches NeilBrown
                   ` (3 preceding siblings ...)
  2016-07-14  2:26 ` [PATCH 2/8] mountd: remove the --exports-file option NeilBrown
@ 2016-07-14  2:26 ` NeilBrown
  2016-07-14  2:26 ` [PATCH 5/8] mountd: Don't export unmounted exports to NFSv4 NeilBrown
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 28+ messages in thread
From: NeilBrown @ 2016-07-14  2:26 UTC (permalink / raw)
  To: Steve Dickson; +Cc: Linux NFS Mailing list

The total timeout for a "mount" attempt to a non-responsive server
will always be a multiple of the time a single mount attempt in the
kernel takes, which for TCP defaults to about 4 minutes.

The documentation for the "retry" option seems to suggest that this can be used
to set a maximum but it really sets a time after which to stop retrying.
The total can be as much as "retry" plus the time for a single attempt.

So clarify the documentation a bit, and also note that retrans
defaults are different for UDP and TCP:
   #define NFS_DEF_UDP_RETRANS	(3)
   #define NFS_DEF_TCP_RETRANS	(2)

Reported-by: Howard Guo<hguo@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 utils/mount/nfs.man |   14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/utils/mount/nfs.man b/utils/mount/nfs.man
index e541cdc95cb1..a0f790a5961b 100644
--- a/utils/mount/nfs.man
+++ b/utils/mount/nfs.man
@@ -158,8 +158,8 @@ up to a maximum timeout length of 60 seconds.
 The number of times the NFS client retries a request before
 it attempts further recovery action. If the
 .B retrans
-option is not specified, the NFS client tries each request
-three times.
+option is not specified, the NFS client tries each UDP request
+three times and each TCP request twice.
 .IP
 The NFS client generates a "server not responding" message
 after
@@ -391,6 +391,16 @@ is 2 minutes, and the default value for background mounts is 10000 minutes
 If a value of zero is specified, the
 .BR mount (8)
 command exits immediately after the first failure.
+.IP
+Note that this only affects how many retries are made and doesn't
+affect the delay caused by each retry.  For UDP each retry takes the
+time determined by the
+.BR timeo
+and
+.BR retrans
+options, which by default will be about 7 seconds.  For TCP the
+default is 3 minutes, but system TCP connection timeouts will
+sometimes limit the timeout of each retransmission to around 2 minutes.
 .TP 1.5i
 .BI sec= flavors
 A colon-separated list of one or more security flavors to use for accessing



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 2/8] mountd: remove the --exports-file option
  2016-07-14  2:26 [PATCH 0/8] Assorted mount-related nfs-utils patches NeilBrown
                   ` (2 preceding siblings ...)
  2016-07-14  2:26 ` [PATCH 4/8] mountd: cause attempts to access unmounted exportpoints to return ESTALE NeilBrown
@ 2016-07-14  2:26 ` NeilBrown
  2016-07-18 16:19   ` J. Bruce Fields
  2016-07-14  2:26 ` [PATCH 1/8] nfs.man: clarify effect of 'retry' option NeilBrown
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: NeilBrown @ 2016-07-14  2:26 UTC (permalink / raw)
  To: Steve Dickson; +Cc: Linux NFS Mailing list

It is completely ineffective.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 utils/mountd/auth.c     |    5 +----
 utils/mountd/mountd.c   |   11 +++--------
 utils/mountd/mountd.h   |    2 +-
 utils/mountd/mountd.man |    8 --------
 4 files changed, 5 insertions(+), 21 deletions(-)

diff --git a/utils/mountd/auth.c b/utils/mountd/auth.c
index 894a7a53957f..0881d9a6edba 100644
--- a/utils/mountd/auth.c
+++ b/utils/mountd/auth.c
@@ -36,7 +36,6 @@ enum auth_error
 };
 
 static void		auth_fixpath(char *path);
-static char	*export_file = NULL;
 static nfs_export my_exp;
 static nfs_client my_client;
 
@@ -44,10 +43,8 @@ extern int new_cache;
 extern int use_ipaddr;
 
 void
-auth_init(char *exports)
+auth_init(void)
 {
-
-	export_file = exports;
 	auth_reload();
 	xtab_mount_write();
 }
diff --git a/utils/mountd/mountd.c b/utils/mountd/mountd.c
index 063da269f895..7a51b093f66a 100644
--- a/utils/mountd/mountd.c
+++ b/utils/mountd/mountd.c
@@ -57,7 +57,6 @@ static struct option longopts[] =
 	{ "descriptors", 1, 0, 'o' },
 	{ "debug", 1, 0, 'd' },
 	{ "help", 0, 0, 'h' },
-	{ "exports-file", 1, 0, 'f' },
 	{ "nfs-version", 1, 0, 'V' },
 	{ "no-nfs-version", 1, 0, 'N' },
 	{ "version", 0, 0, 'v' },
@@ -689,7 +688,6 @@ get_exportlist(void)
 int
 main(int argc, char **argv)
 {
-	char	*export_file = _PATH_EXPORTS;
 	char    *state_dir = NFS_STATEDIR;
 	char	*progname;
 	unsigned int listeners = 0;
@@ -709,7 +707,7 @@ main(int argc, char **argv)
 
 	/* Parse the command line options and arguments. */
 	opterr = 0;
-	while ((c = getopt_long(argc, argv, "o:nFd:f:p:P:hH:N:V:vurs:t:g", longopts, NULL)) != EOF)
+	while ((c = getopt_long(argc, argv, "o:nFd:p:P:hH:N:V:vurs:t:g", longopts, NULL)) != EOF)
 		switch (c) {
 		case 'g':
 			manage_gids = 1;
@@ -728,9 +726,6 @@ main(int argc, char **argv)
 		case 'd':
 			xlog_sconfig(optarg, 1);
 			break;
-		case 'f':
-			export_file = optarg;
-			break;
 		case 'H': /* PRC: specify a high-availability callout program */
 			ha_callout_prog = optarg;
 			break;
@@ -862,7 +857,7 @@ main(int argc, char **argv)
 	sa.sa_handler = sig_hup;
 	sigaction(SIGHUP, &sa, NULL);
 
-	auth_init(export_file);
+	auth_init();
 
 	if (!foreground) {
 		/* We first fork off a child. */
@@ -908,7 +903,7 @@ usage(const char *prog, int n)
 {
 	fprintf(stderr,
 "Usage: %s [-F|--foreground] [-h|--help] [-v|--version] [-d kind|--debug kind]\n"
-"	[-o num|--descriptors num] [-f exports-file|--exports-file=file]\n"
+"	[-o num|--descriptors num]\n"
 "	[-p|--port port] [-V version|--nfs-version version]\n"
 "	[-N version|--no-nfs-version version] [-n|--no-tcp]\n"
 "	[-H prog |--ha-callout prog] [-r |--reverse-lookup]\n"
diff --git a/utils/mountd/mountd.h b/utils/mountd/mountd.h
index 6d358a75d9f3..f058f01d3584 100644
--- a/utils/mountd/mountd.h
+++ b/utils/mountd/mountd.h
@@ -39,7 +39,7 @@ bool_t		mount_pathconf_2_svc(struct svc_req *, dirpath *, ppathcnf *);
 bool_t		mount_mnt_3_svc(struct svc_req *, dirpath *, mountres3 *);
 
 void		mount_dispatch(struct svc_req *, SVCXPRT *);
-void		auth_init(char *export_file);
+void		auth_init(void);
 unsigned int	auth_reload(void);
 nfs_export *	auth_authenticate(const char *what,
 					const struct sockaddr *caller,
diff --git a/utils/mountd/mountd.man b/utils/mountd/mountd.man
index 66e3bba7e865..e0d1a0acba3a 100644
--- a/utils/mountd/mountd.man
+++ b/utils/mountd/mountd.man
@@ -86,14 +86,6 @@ Turn on debugging. Valid kinds are: all, auth, call, general and parse.
 .B \-F " or " \-\-foreground
 Run in foreground (do not daemonize)
 .TP
-.B \-f export-file " or " \-\-exports-file export-file
-This option specifies the exports file, listing the clients that this
-server is prepared to serve and parameters to apply to each
-such mount (see
-.BR exports (5)).
-By default, export information is read from
-.IR /etc/exports .
-.TP
 .B \-h " or " \-\-help
 Display usage message.
 .TP



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 3/8] mountd: remove 'dev_missing' checks
  2016-07-14  2:26 [PATCH 0/8] Assorted mount-related nfs-utils patches NeilBrown
@ 2016-07-14  2:26 ` NeilBrown
  2016-07-18 20:01   ` J. Bruce Fields
  2016-07-14  2:26 ` [PATCH 6/8] mountd: don't add paths to non-mounted export points to pseudo-root NeilBrown
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: NeilBrown @ 2016-07-14  2:26 UTC (permalink / raw)
  To: Steve Dickson; +Cc: Linux NFS Mailing list

I now think this was a mistaken idea.

If a filesystem is exported with the "mountpoint" or "mp" option, it
should only be exported if the directory is a mount point.  The
intention is that if there is a problem with one filesystem on a
server, the others can still be exported, but clients won't
incorrectly see the empty directory on the parent when accessing the
missing filesystem, they will see clearly that the filesystem is
missing.

It is easy to handle this correctly for NFSv3 MOUNT requests, but what
is the correct behavior if a client already has the filesystem mounted
and so has a filehandle?  Maybe the server rebooted and came back with
one device missing.  What should the client see?

The "dev_missing" code tries to detect this case and causes the server
to respond with silence rather than ESTALE.  The idea was that the
client would retry and when (or if) the filesystem came back, service
would be transparently restored.

The problem with this is that arbitrarily long delays are not what
people would expect, and can be quite annoying.  ESTALE, while
unpleasant, it at least easily understood.  A device disappearing is a
fairly significant event and hiding it doesn't really serve anyone.

So: remove the code and allow ESTALE.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 utils/mountd/cache.c |   12 ------------
 1 file changed, 12 deletions(-)

diff --git a/utils/mountd/cache.c b/utils/mountd/cache.c
index ec86a22613cf..346a8b3af8b5 100644
--- a/utils/mountd/cache.c
+++ b/utils/mountd/cache.c
@@ -687,7 +687,6 @@ static void nfsd_fh(int f)
 	char *found_path = NULL;
 	nfs_export *exp;
 	int i;
-	int dev_missing = 0;
 	char buf[RPC_CHAN_BUF_SIZE], *bp;
 	int blen;
 
@@ -754,11 +753,6 @@ static void nfsd_fh(int f)
 			if (!is_ipaddr_client(dom)
 					&& !namelist_client_matches(exp, dom))
 				continue;
-			if (exp->m_export.e_mountpoint &&
-			    !is_mountpoint(exp->m_export.e_mountpoint[0]?
-					   exp->m_export.e_mountpoint:
-					   exp->m_export.e_path))
-				dev_missing ++;
 
 			if (!match_fsid(&parsed, exp, path))
 				continue;
@@ -801,12 +795,6 @@ static void nfsd_fh(int f)
 		/* FIXME we need to make sure we re-visit this later */
 		goto out;
 	}
-	if (!found && dev_missing) {
-		/* The missing dev could be what we want, so just be
-		 * quite rather than returning stale yet
-		 */
-		goto out;
-	}
 
 	if (found)
 		if (cache_export_ent(buf, sizeof(buf), dom, found, found_path) < 0)



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 4/8] mountd: cause attempts to access unmounted exportpoints to return ESTALE
  2016-07-14  2:26 [PATCH 0/8] Assorted mount-related nfs-utils patches NeilBrown
  2016-07-14  2:26 ` [PATCH 3/8] mountd: remove 'dev_missing' checks NeilBrown
  2016-07-14  2:26 ` [PATCH 6/8] mountd: don't add paths to non-mounted export points to pseudo-root NeilBrown
@ 2016-07-14  2:26 ` NeilBrown
  2016-07-14  2:26 ` [PATCH 2/8] mountd: remove the --exports-file option NeilBrown
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 28+ messages in thread
From: NeilBrown @ 2016-07-14  2:26 UTC (permalink / raw)
  To: Steve Dickson; +Cc: Linux NFS Mailing list

If a filehandle arrives for an export with the "mountpoint" option,
but which isn't mounted, then we should return ESTALE rather than
delaying in the hope the filesystem might get mounted.
It is safest to assume that if a filesystem didn't get mounted, then
there is a serious problem, and waiting isn't likely to fix it.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 utils/mountd/cache.c |   17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/utils/mountd/cache.c b/utils/mountd/cache.c
index 346a8b3af8b5..f33f66640c6c 100644
--- a/utils/mountd/cache.c
+++ b/utils/mountd/cache.c
@@ -689,6 +689,7 @@ static void nfsd_fh(int f)
 	int i;
 	char buf[RPC_CHAN_BUF_SIZE], *bp;
 	int blen;
+	int short_timeout = 0;
 
 	blen = read(f, buf, sizeof(buf));
 	if (blen <= 0 || buf[blen-1] != '\n') return;
@@ -787,13 +788,12 @@ static void nfsd_fh(int f)
 	    !is_mountpoint(found->e_mountpoint[0]?
 			   found->e_mountpoint:
 			   found->e_path)) {
-		/* Cannot export this yet 
-		 * should log a warning, but need to rate limit
-		   xlog(L_WARNING, "%s not exported as %d not a mountpoint",
-		   found->e_path, found->e_mountpoint);
+		/* Cannot export this, as it isn't mounted.
+		 * Set a short timeout so we might try again
+		 * if the filesystem gets mounted.
 		 */
-		/* FIXME we need to make sure we re-visit this later */
-		goto out;
+		found = 0;
+		short_timeout = 1;
 	}
 
 	if (found)
@@ -812,7 +812,10 @@ static void nfsd_fh(int f)
 	 * timeout.  Maybe this should be configurable on the command
 	 * line.
 	 */
-	qword_addint(&bp, &blen, 0x7fffffff);
+	if (short_timeout)
+		qword_addint(&bp, &blen, DEFAULT_TTL);
+	else
+		qword_addint(&bp, &blen, 0x7fffffff);
 	if (found)
 		qword_add(&bp, &blen, found_path);
 	qword_addeol(&bp, &blen);



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 5/8] mountd: Don't export unmounted exports to NFSv4
  2016-07-14  2:26 [PATCH 0/8] Assorted mount-related nfs-utils patches NeilBrown
                   ` (4 preceding siblings ...)
  2016-07-14  2:26 ` [PATCH 1/8] nfs.man: clarify effect of 'retry' option NeilBrown
@ 2016-07-14  2:26 ` NeilBrown
  2016-07-14  2:26 ` [PATCH 7/8] mount: don't treat temporary name resolution failure as permanent NeilBrown
  2016-07-14  2:26 ` [PATCH 8/8] mount: use a public address for IPv6 callback NeilBrown
  7 siblings, 0 replies; 28+ messages in thread
From: NeilBrown @ 2016-07-14  2:26 UTC (permalink / raw)
  To: Steve Dickson; +Cc: Linux NFS Mailing list

An export point with the "mountpoint" option should not be exported
if it isn't a mount point.
For NFSv3, this is handled primarily by failing the MOUNT request.
For NFSv4, we must ensure a lookup from the pseduo-root fails too.

This means nfsd_export must check for the 'mountpoint' option and handle
it correctly.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 utils/mountd/cache.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/utils/mountd/cache.c b/utils/mountd/cache.c
index f33f66640c6c..19e7607e4fb1 100644
--- a/utils/mountd/cache.c
+++ b/utils/mountd/cache.c
@@ -1321,7 +1321,12 @@ static void nfsd_export(int f)
 	found = lookup_export(dom, path, ai);
 
 	if (found) {
-		if (dump_to_cache(f, buf, sizeof(buf), dom, path, &found->m_export) < 0) {
+		if (found->m_export.e_mountpoint &&
+		    !is_mountpoint(found->m_export.e_mountpoint[0]?
+				   found->m_export.e_mountpoint:
+				   found->m_export.e_path))
+			dump_to_cache(f, buf, sizeof(buf), dom, path, NULL);
+		else if (dump_to_cache(f, buf, sizeof(buf), dom, path, &found->m_export) < 0) {
 			xlog(L_WARNING,
 			     "Cannot export %s, possibly unsupported filesystem"
 			     " or fsid= required", path);



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 6/8] mountd: don't add paths to non-mounted export points to pseudo-root
  2016-07-14  2:26 [PATCH 0/8] Assorted mount-related nfs-utils patches NeilBrown
  2016-07-14  2:26 ` [PATCH 3/8] mountd: remove 'dev_missing' checks NeilBrown
@ 2016-07-14  2:26 ` NeilBrown
  2016-07-18 20:32   ` J. Bruce Fields
  2016-07-14  2:26 ` [PATCH 4/8] mountd: cause attempts to access unmounted exportpoints to return ESTALE NeilBrown
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: NeilBrown @ 2016-07-14  2:26 UTC (permalink / raw)
  To: Steve Dickson; +Cc: Linux NFS Mailing list

export points with the "mountpoint" flag should not be exported
if they aren't mounted.
They shouldn't even appear in the pseudo-root filesystem.
So add an appropriate check to v4root_set().

This means that the v4root might need to be recomputed whenever a
filesystem is mounted or unmounted.  So when there are export points
with the "mountpoint" flag, check for changes in the mount table.
This is done be measuring the size of /proc/mounts.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 support/include/v4root.h |    2 +-
 utils/mountd/auth.c      |   29 +++++++++++++++++++++++++++--
 utils/mountd/v4root.c    |   11 ++++++++++-
 3 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/support/include/v4root.h b/support/include/v4root.h
index 706c15c70d95..406fd4e43e5a 100644
--- a/support/include/v4root.h
+++ b/support/include/v4root.h
@@ -10,6 +10,6 @@
 #define V4ROOT_H
 
 extern int v4root_needed;
-extern void v4root_set(void);
+extern void v4root_set(int *mountpoints_checked);
 
 #endif /* V4ROOT_H */
diff --git a/utils/mountd/auth.c b/utils/mountd/auth.c
index 0881d9a6edba..5bd7e6622307 100644
--- a/utils/mountd/auth.c
+++ b/utils/mountd/auth.c
@@ -77,6 +77,29 @@ check_useipaddr(void)
 		cache_flush(1);
 }
 
+static int mountpoints_changed(void)
+{
+	static int last_size = 0;
+	int size;
+	int fd;
+	char buf[4096];
+	int n;
+
+	fd = open("/proc/mounts", O_RDONLY);
+	if (fd < 0)
+		/* ignore mountpoint changes if we cannot read /proc/mounts */
+		return 0;
+	size = 0;
+	while ((n = read(fd, buf, sizeof(buf))) > 0)
+		size += n;
+	if (n < 0)
+		return 0;
+	if (size == last_size)
+		return 0;
+	last_size = size;
+	return 1;
+}
+
 unsigned int
 auth_reload()
 {
@@ -84,6 +107,7 @@ auth_reload()
 	static ino_t		last_inode;
 	static int		last_fd = -1;
 	static unsigned int	counter;
+	static int		mountpoints_checked = 0;
 	int			fd;
 
 	if ((fd = open(_PATH_ETAB, O_RDONLY)) < 0) {
@@ -91,7 +115,8 @@ auth_reload()
 	} else if (fstat(fd, &stb) < 0) {
 		xlog(L_FATAL, "couldn't stat %s", _PATH_ETAB);
 		close(fd);
-	} else if (last_fd != -1 && stb.st_ino == last_inode) {
+	} else if (last_fd != -1 && stb.st_ino == last_inode &&
+		   (!mountpoints_checked || !mountpoints_changed())) {
 		/* We opened the etab file before, and its inode
 		 * number hasn't changed since then.
 		 */
@@ -114,7 +139,7 @@ auth_reload()
 	memset(&my_client, 0, sizeof(my_client));
 	xtab_export_read();
 	check_useipaddr();
-	v4root_set();
+	v4root_set(&mountpoints_checked);
 
 	++counter;
 
diff --git a/utils/mountd/v4root.c b/utils/mountd/v4root.c
index d52172592823..1a5778f9c7de 100644
--- a/utils/mountd/v4root.c
+++ b/utils/mountd/v4root.c
@@ -183,7 +183,7 @@ static int v4root_add_parents(nfs_export *exp)
  * looking for components of the v4 mount.
  */
 void
-v4root_set()
+v4root_set(int *mountpoints_checked)
 {
 	nfs_export	*exp;
 	int	i;
@@ -193,6 +193,7 @@ v4root_set()
 	if (!v4root_support())
 		return;
 
+	*mountpoints_checked = 0;
 	for (i = 0; i < MCL_MAXTYPES; i++) {
 		for (exp = exportlist[i].p_head; exp; exp = exp->m_next) {
 			if (exp->m_export.e_flags & NFSEXP_V4ROOT)
@@ -202,6 +203,14 @@ v4root_set()
 				 */
 				continue;
 
+			if (exp->m_export.e_mountpoint) {
+				*mountpoints_checked = 1;
+				if (!is_mountpoint(exp->m_export.e_mountpoint[0]?
+						   exp->m_export.e_mountpoint:
+						   exp->m_export.e_path))
+					continue;
+			}
+
 			if (strcmp(exp->m_export.e_path, "/") == 0 &&
 			    !(exp->m_export.e_flags & NFSEXP_FSID)) {
 				/* Force '/' to be exported as fsid == 0*/



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 7/8] mount: don't treat temporary name resolution failure as permanent.
  2016-07-14  2:26 [PATCH 0/8] Assorted mount-related nfs-utils patches NeilBrown
                   ` (5 preceding siblings ...)
  2016-07-14  2:26 ` [PATCH 5/8] mountd: Don't export unmounted exports to NFSv4 NeilBrown
@ 2016-07-14  2:26 ` NeilBrown
  2016-07-19 23:01   ` NeilBrown
  2016-07-14  2:26 ` [PATCH 8/8] mount: use a public address for IPv6 callback NeilBrown
  7 siblings, 1 reply; 28+ messages in thread
From: NeilBrown @ 2016-07-14  2:26 UTC (permalink / raw)
  To: Steve Dickson; +Cc: Linux NFS Mailing list

If getaddrinfo() returns EAI_AGAIN, we shouldn't just give up, but
should continue normal retries as the nameserver may be unavailable
for the same reason as the NFS server.

So move the getaddrinfo() call from nfs_validate_options() into
nfs_try_mounts() which is always called soon after, except in the
'remount' case when we don't want it anyway.

If EAI_AGAIN is returned, set errno to EAGAIN and allow this to be a
temporary failure.  Otherwise report error and set errno to EALREADY
so no further message is given.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 utils/mount/stropts.c |   54 ++++++++++++++++++++++++++++---------------------
 1 file changed, 31 insertions(+), 23 deletions(-)

diff --git a/utils/mount/stropts.c b/utils/mount/stropts.c
index d60b484ab960..522c7ae015f1 100644
--- a/utils/mount/stropts.c
+++ b/utils/mount/stropts.c
@@ -84,6 +84,7 @@ struct nfsmount_info {
 				*type;		/* "nfs" or "nfs4" */
 	char			*hostname;	/* server's hostname */
 	struct addrinfo		*address;	/* server's addresses */
+	sa_family_t		family;		/* Address family */
 
 	struct mount_options	*options;	/* parsed mount options */
 	char			**extra_opts;	/* string for /etc/mtab */
@@ -371,39 +372,19 @@ static int nfs_set_version(struct nfsmount_info *mi)
  */
 static int nfs_validate_options(struct nfsmount_info *mi)
 {
-	struct addrinfo hint = {
-		.ai_protocol	= (int)IPPROTO_UDP,
-	};
-	sa_family_t family;
-	int error;
-
 	if (!nfs_parse_devname(mi->spec, &mi->hostname, NULL))
 		return 0;
 
-	if (!nfs_nfs_proto_family(mi->options, &family))
+	if (!nfs_nfs_proto_family(mi->options, &mi->family))
 		return 0;
 
 	/*
 	 * A remount is not going to be able to change the server's address,
 	 * nor should we try to resolve another address for the server as we
 	 * may end up with a different address.
+	 * A non-remount will set 'addr' from ->hostname
 	 */
-	if (mi->flags & MS_REMOUNT) {
-		po_remove_all(mi->options, "addr");
-	} else {
-		hint.ai_family = (int)family;
-		error = getaddrinfo(mi->hostname, NULL, &hint, &mi->address);
-		if (error != 0) {
-			nfs_error(_("%s: Failed to resolve server %s: %s"),
-				progname, mi->hostname, gai_strerror(error));
-			mi->address = NULL;
-			return 0;
-		}
-
-		if (!nfs_append_addr_option(mi->address->ai_addr,
-						mi->address->ai_addrlen, mi->options))
-			return 0;
-	}
+	po_remove_all(mi->options, "addr");
 
 	if (!nfs_set_version(mi))
 		return 0;
@@ -903,6 +884,32 @@ static int nfs_try_mount(struct nfsmount_info *mi)
 {
 	int result = 0;
 
+	if (mi->address == NULL) {
+		struct addrinfo hint = {
+			.ai_protocol	= (int)IPPROTO_UDP,
+		};
+		int error;
+		struct addrinfo *address;
+
+		hint.ai_family = (int)mi->family;
+		error = getaddrinfo(mi->hostname, NULL, &hint, &address);
+		if (error != 0) {
+			if (error == EAI_AGAIN)
+				errno = EAGAIN;
+			else {
+				nfs_error(_("%s: Failed to resolve server %s: %s"),
+					  progname, mi->hostname, gai_strerror(error));
+				errno = EALREADY;
+			}
+			return 0;
+		}
+
+		if (!nfs_append_addr_option(mi->address->ai_addr,
+					    mi->address->ai_addrlen, mi->options))
+			return 0;
+		mi->address = address;
+	}
+
 	switch (mi->version.major) {
 		case 2:
 		case 3:
@@ -941,6 +948,7 @@ static int nfs_is_permanent_error(int error)
 	case ETIMEDOUT:
 	case ECONNREFUSED:
 	case EHOSTUNREACH:
+	case EAGAIN:
 		return 0;	/* temporary */
 	default:
 		return 1;	/* permanent */



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 8/8] mount: use a public address for IPv6 callback.
  2016-07-14  2:26 [PATCH 0/8] Assorted mount-related nfs-utils patches NeilBrown
                   ` (6 preceding siblings ...)
  2016-07-14  2:26 ` [PATCH 7/8] mount: don't treat temporary name resolution failure as permanent NeilBrown
@ 2016-07-14  2:26 ` NeilBrown
  7 siblings, 0 replies; 28+ messages in thread
From: NeilBrown @ 2016-07-14  2:26 UTC (permalink / raw)
  To: Steve Dickson; +Cc: Linux NFS Mailing list

If IPv6 address privacy is active, the "clientaddr" given to the server
will likely be a temporary address which will eventually expire, thus
breaking callback.
So ask for a public address to ensure continued service.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 utils/mount/network.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/utils/mount/network.c b/utils/mount/network.c
index 0d12613e86a4..6529b6ee6b01 100644
--- a/utils/mount/network.c
+++ b/utils/mount/network.c
@@ -38,6 +38,7 @@
 #include <sys/socket.h>
 #include <sys/wait.h>
 #include <sys/stat.h>
+#include <linux/in6.h>
 #include <netinet/in.h>
 #include <rpc/rpc.h>
 #include <rpc/pmap_prot.h>
@@ -1113,6 +1114,7 @@ static int nfs_ca_sockname(const struct sockaddr *sap, const socklen_t salen,
 		.sin6_addr		= IN6ADDR_ANY_INIT,
 	};
 	int sock, result = 0;
+	int val;
 
 	sock = socket(sap->sa_family, SOCK_DGRAM, IPPROTO_UDP);
 	if (sock < 0)
@@ -1124,6 +1126,9 @@ static int nfs_ca_sockname(const struct sockaddr *sap, const socklen_t salen,
 			goto out;
 		break;
 	case AF_INET6:
+		/* Make sure the call-back address is public */
+		val = IPV6_PREFER_SRC_PUBLIC;
+		setsockopt(sock, SOL_IPV6, IPV6_ADDR_PREFERENCES, &val, sizeof(val));
 		if (bind(sock, SAFE_SOCKADDR(&sin6), sizeof(sin6)) < 0)
 			goto out;
 		break;



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/8] mountd: remove the --exports-file option
  2016-07-14  2:26 ` [PATCH 2/8] mountd: remove the --exports-file option NeilBrown
@ 2016-07-18 16:19   ` J. Bruce Fields
  0 siblings, 0 replies; 28+ messages in thread
From: J. Bruce Fields @ 2016-07-18 16:19 UTC (permalink / raw)
  To: NeilBrown; +Cc: Steve Dickson, Linux NFS Mailing list

On Thu, Jul 14, 2016 at 12:26:43PM +1000, NeilBrown wrote:
> It is completely ineffective.

And it's been that way since the beginning of git history, and nobody's
noticed.  Weird.

ACK to the patch.--b.

> 
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  utils/mountd/auth.c     |    5 +----
>  utils/mountd/mountd.c   |   11 +++--------
>  utils/mountd/mountd.h   |    2 +-
>  utils/mountd/mountd.man |    8 --------
>  4 files changed, 5 insertions(+), 21 deletions(-)
> 
> diff --git a/utils/mountd/auth.c b/utils/mountd/auth.c
> index 894a7a53957f..0881d9a6edba 100644
> --- a/utils/mountd/auth.c
> +++ b/utils/mountd/auth.c
> @@ -36,7 +36,6 @@ enum auth_error
>  };
>  
>  static void		auth_fixpath(char *path);
> -static char	*export_file = NULL;
>  static nfs_export my_exp;
>  static nfs_client my_client;
>  
> @@ -44,10 +43,8 @@ extern int new_cache;
>  extern int use_ipaddr;
>  
>  void
> -auth_init(char *exports)
> +auth_init(void)
>  {
> -
> -	export_file = exports;
>  	auth_reload();
>  	xtab_mount_write();
>  }
> diff --git a/utils/mountd/mountd.c b/utils/mountd/mountd.c
> index 063da269f895..7a51b093f66a 100644
> --- a/utils/mountd/mountd.c
> +++ b/utils/mountd/mountd.c
> @@ -57,7 +57,6 @@ static struct option longopts[] =
>  	{ "descriptors", 1, 0, 'o' },
>  	{ "debug", 1, 0, 'd' },
>  	{ "help", 0, 0, 'h' },
> -	{ "exports-file", 1, 0, 'f' },
>  	{ "nfs-version", 1, 0, 'V' },
>  	{ "no-nfs-version", 1, 0, 'N' },
>  	{ "version", 0, 0, 'v' },
> @@ -689,7 +688,6 @@ get_exportlist(void)
>  int
>  main(int argc, char **argv)
>  {
> -	char	*export_file = _PATH_EXPORTS;
>  	char    *state_dir = NFS_STATEDIR;
>  	char	*progname;
>  	unsigned int listeners = 0;
> @@ -709,7 +707,7 @@ main(int argc, char **argv)
>  
>  	/* Parse the command line options and arguments. */
>  	opterr = 0;
> -	while ((c = getopt_long(argc, argv, "o:nFd:f:p:P:hH:N:V:vurs:t:g", longopts, NULL)) != EOF)
> +	while ((c = getopt_long(argc, argv, "o:nFd:p:P:hH:N:V:vurs:t:g", longopts, NULL)) != EOF)
>  		switch (c) {
>  		case 'g':
>  			manage_gids = 1;
> @@ -728,9 +726,6 @@ main(int argc, char **argv)
>  		case 'd':
>  			xlog_sconfig(optarg, 1);
>  			break;
> -		case 'f':
> -			export_file = optarg;
> -			break;
>  		case 'H': /* PRC: specify a high-availability callout program */
>  			ha_callout_prog = optarg;
>  			break;
> @@ -862,7 +857,7 @@ main(int argc, char **argv)
>  	sa.sa_handler = sig_hup;
>  	sigaction(SIGHUP, &sa, NULL);
>  
> -	auth_init(export_file);
> +	auth_init();
>  
>  	if (!foreground) {
>  		/* We first fork off a child. */
> @@ -908,7 +903,7 @@ usage(const char *prog, int n)
>  {
>  	fprintf(stderr,
>  "Usage: %s [-F|--foreground] [-h|--help] [-v|--version] [-d kind|--debug kind]\n"
> -"	[-o num|--descriptors num] [-f exports-file|--exports-file=file]\n"
> +"	[-o num|--descriptors num]\n"
>  "	[-p|--port port] [-V version|--nfs-version version]\n"
>  "	[-N version|--no-nfs-version version] [-n|--no-tcp]\n"
>  "	[-H prog |--ha-callout prog] [-r |--reverse-lookup]\n"
> diff --git a/utils/mountd/mountd.h b/utils/mountd/mountd.h
> index 6d358a75d9f3..f058f01d3584 100644
> --- a/utils/mountd/mountd.h
> +++ b/utils/mountd/mountd.h
> @@ -39,7 +39,7 @@ bool_t		mount_pathconf_2_svc(struct svc_req *, dirpath *, ppathcnf *);
>  bool_t		mount_mnt_3_svc(struct svc_req *, dirpath *, mountres3 *);
>  
>  void		mount_dispatch(struct svc_req *, SVCXPRT *);
> -void		auth_init(char *export_file);
> +void		auth_init(void);
>  unsigned int	auth_reload(void);
>  nfs_export *	auth_authenticate(const char *what,
>  					const struct sockaddr *caller,
> diff --git a/utils/mountd/mountd.man b/utils/mountd/mountd.man
> index 66e3bba7e865..e0d1a0acba3a 100644
> --- a/utils/mountd/mountd.man
> +++ b/utils/mountd/mountd.man
> @@ -86,14 +86,6 @@ Turn on debugging. Valid kinds are: all, auth, call, general and parse.
>  .B \-F " or " \-\-foreground
>  Run in foreground (do not daemonize)
>  .TP
> -.B \-f export-file " or " \-\-exports-file export-file
> -This option specifies the exports file, listing the clients that this
> -server is prepared to serve and parameters to apply to each
> -such mount (see
> -.BR exports (5)).
> -By default, export information is read from
> -.IR /etc/exports .
> -.TP
>  .B \-h " or " \-\-help
>  Display usage message.
>  .TP
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/8] mountd: remove 'dev_missing' checks
  2016-07-14  2:26 ` [PATCH 3/8] mountd: remove 'dev_missing' checks NeilBrown
@ 2016-07-18 20:01   ` J. Bruce Fields
  2016-07-19 22:50     ` NeilBrown
  0 siblings, 1 reply; 28+ messages in thread
From: J. Bruce Fields @ 2016-07-18 20:01 UTC (permalink / raw)
  To: NeilBrown; +Cc: Steve Dickson, Linux NFS Mailing list

On Thu, Jul 14, 2016 at 12:26:43PM +1000, NeilBrown wrote:
> I now think this was a mistaken idea.
> 
> If a filesystem is exported with the "mountpoint" or "mp" option, it
> should only be exported if the directory is a mount point.  The
> intention is that if there is a problem with one filesystem on a
> server, the others can still be exported, but clients won't
> incorrectly see the empty directory on the parent when accessing the
> missing filesystem, they will see clearly that the filesystem is
> missing.
> 
> It is easy to handle this correctly for NFSv3 MOUNT requests, but what
> is the correct behavior if a client already has the filesystem mounted
> and so has a filehandle?  Maybe the server rebooted and came back with
> one device missing.  What should the client see?
> 
> The "dev_missing" code tries to detect this case and causes the server
> to respond with silence rather than ESTALE.  The idea was that the
> client would retry and when (or if) the filesystem came back, service
> would be transparently restored.
> 
> The problem with this is that arbitrarily long delays are not what
> people would expect, and can be quite annoying.  ESTALE, while
> unpleasant, it at least easily understood.  A device disappearing is a
> fairly significant event and hiding it doesn't really serve anyone.

It could also be a filesystem disappearing because it failed to mount in
time on a reboot.

> So: remove the code and allow ESTALE.

I'm not completely sure I understand the justification.

I don't like the current behavior either--I'd be happy if we could
deprecate "mountpoint" entirely--but changing it now would seem to risk
regressions if anyone depends on it.

--b.

> 
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  utils/mountd/cache.c |   12 ------------
>  1 file changed, 12 deletions(-)
> 
> diff --git a/utils/mountd/cache.c b/utils/mountd/cache.c
> index ec86a22613cf..346a8b3af8b5 100644
> --- a/utils/mountd/cache.c
> +++ b/utils/mountd/cache.c
> @@ -687,7 +687,6 @@ static void nfsd_fh(int f)
>  	char *found_path = NULL;
>  	nfs_export *exp;
>  	int i;
> -	int dev_missing = 0;
>  	char buf[RPC_CHAN_BUF_SIZE], *bp;
>  	int blen;
>  
> @@ -754,11 +753,6 @@ static void nfsd_fh(int f)
>  			if (!is_ipaddr_client(dom)
>  					&& !namelist_client_matches(exp, dom))
>  				continue;
> -			if (exp->m_export.e_mountpoint &&
> -			    !is_mountpoint(exp->m_export.e_mountpoint[0]?
> -					   exp->m_export.e_mountpoint:
> -					   exp->m_export.e_path))
> -				dev_missing ++;
>  
>  			if (!match_fsid(&parsed, exp, path))
>  				continue;
> @@ -801,12 +795,6 @@ static void nfsd_fh(int f)
>  		/* FIXME we need to make sure we re-visit this later */
>  		goto out;
>  	}
> -	if (!found && dev_missing) {
> -		/* The missing dev could be what we want, so just be
> -		 * quite rather than returning stale yet
> -		 */
> -		goto out;
> -	}
>  
>  	if (found)
>  		if (cache_export_ent(buf, sizeof(buf), dom, found, found_path) < 0)
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/8] mountd: don't add paths to non-mounted export points to pseudo-root
  2016-07-14  2:26 ` [PATCH 6/8] mountd: don't add paths to non-mounted export points to pseudo-root NeilBrown
@ 2016-07-18 20:32   ` J. Bruce Fields
  2016-07-19  8:00     ` Chuck Lever
  2016-07-19 22:59     ` NeilBrown
  0 siblings, 2 replies; 28+ messages in thread
From: J. Bruce Fields @ 2016-07-18 20:32 UTC (permalink / raw)
  To: NeilBrown; +Cc: Steve Dickson, Linux NFS Mailing list

On Thu, Jul 14, 2016 at 12:26:43PM +1000, NeilBrown wrote:
> export points with the "mountpoint" flag should not be exported
> if they aren't mounted.
> They shouldn't even appear in the pseudo-root filesystem.
> So add an appropriate check to v4root_set().
> 
> This means that the v4root might need to be recomputed whenever a
> filesystem is mounted or unmounted.  So when there are export points
> with the "mountpoint" flag, check for changes in the mount table.
> This is done be measuring the size of /proc/mounts.

Surely there's some more reliable measurement--could we track some data
about the mountpoint itself, maybe?

But I'd still like some more justification for this change in logic.
Does anyone currently use the "mp" option?  If not, could we just
deprecate it?  If so, can we really get away with changing it this way?

--b.

> 
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  support/include/v4root.h |    2 +-
>  utils/mountd/auth.c      |   29 +++++++++++++++++++++++++++--
>  utils/mountd/v4root.c    |   11 ++++++++++-
>  3 files changed, 38 insertions(+), 4 deletions(-)
> 
> diff --git a/support/include/v4root.h b/support/include/v4root.h
> index 706c15c70d95..406fd4e43e5a 100644
> --- a/support/include/v4root.h
> +++ b/support/include/v4root.h
> @@ -10,6 +10,6 @@
>  #define V4ROOT_H
>  
>  extern int v4root_needed;
> -extern void v4root_set(void);
> +extern void v4root_set(int *mountpoints_checked);
>  
>  #endif /* V4ROOT_H */
> diff --git a/utils/mountd/auth.c b/utils/mountd/auth.c
> index 0881d9a6edba..5bd7e6622307 100644
> --- a/utils/mountd/auth.c
> +++ b/utils/mountd/auth.c
> @@ -77,6 +77,29 @@ check_useipaddr(void)
>  		cache_flush(1);
>  }
>  
> +static int mountpoints_changed(void)
> +{
> +	static int last_size = 0;
> +	int size;
> +	int fd;
> +	char buf[4096];
> +	int n;
> +
> +	fd = open("/proc/mounts", O_RDONLY);
> +	if (fd < 0)
> +		/* ignore mountpoint changes if we cannot read /proc/mounts */
> +		return 0;
> +	size = 0;
> +	while ((n = read(fd, buf, sizeof(buf))) > 0)
> +		size += n;
> +	if (n < 0)
> +		return 0;
> +	if (size == last_size)
> +		return 0;
> +	last_size = size;
> +	return 1;
> +}
> +
>  unsigned int
>  auth_reload()
>  {
> @@ -84,6 +107,7 @@ auth_reload()
>  	static ino_t		last_inode;
>  	static int		last_fd = -1;
>  	static unsigned int	counter;
> +	static int		mountpoints_checked = 0;
>  	int			fd;
>  
>  	if ((fd = open(_PATH_ETAB, O_RDONLY)) < 0) {
> @@ -91,7 +115,8 @@ auth_reload()
>  	} else if (fstat(fd, &stb) < 0) {
>  		xlog(L_FATAL, "couldn't stat %s", _PATH_ETAB);
>  		close(fd);
> -	} else if (last_fd != -1 && stb.st_ino == last_inode) {
> +	} else if (last_fd != -1 && stb.st_ino == last_inode &&
> +		   (!mountpoints_checked || !mountpoints_changed())) {
>  		/* We opened the etab file before, and its inode
>  		 * number hasn't changed since then.
>  		 */
> @@ -114,7 +139,7 @@ auth_reload()
>  	memset(&my_client, 0, sizeof(my_client));
>  	xtab_export_read();
>  	check_useipaddr();
> -	v4root_set();
> +	v4root_set(&mountpoints_checked);
>  
>  	++counter;
>  
> diff --git a/utils/mountd/v4root.c b/utils/mountd/v4root.c
> index d52172592823..1a5778f9c7de 100644
> --- a/utils/mountd/v4root.c
> +++ b/utils/mountd/v4root.c
> @@ -183,7 +183,7 @@ static int v4root_add_parents(nfs_export *exp)
>   * looking for components of the v4 mount.
>   */
>  void
> -v4root_set()
> +v4root_set(int *mountpoints_checked)
>  {
>  	nfs_export	*exp;
>  	int	i;
> @@ -193,6 +193,7 @@ v4root_set()
>  	if (!v4root_support())
>  		return;
>  
> +	*mountpoints_checked = 0;
>  	for (i = 0; i < MCL_MAXTYPES; i++) {
>  		for (exp = exportlist[i].p_head; exp; exp = exp->m_next) {
>  			if (exp->m_export.e_flags & NFSEXP_V4ROOT)
> @@ -202,6 +203,14 @@ v4root_set()
>  				 */
>  				continue;
>  
> +			if (exp->m_export.e_mountpoint) {
> +				*mountpoints_checked = 1;
> +				if (!is_mountpoint(exp->m_export.e_mountpoint[0]?
> +						   exp->m_export.e_mountpoint:
> +						   exp->m_export.e_path))
> +					continue;
> +			}
> +
>  			if (strcmp(exp->m_export.e_path, "/") == 0 &&
>  			    !(exp->m_export.e_flags & NFSEXP_FSID)) {
>  				/* Force '/' to be exported as fsid == 0*/
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/8] mountd: don't add paths to non-mounted export points to pseudo-root
  2016-07-18 20:32   ` J. Bruce Fields
@ 2016-07-19  8:00     ` Chuck Lever
  2016-07-19 22:59     ` NeilBrown
  1 sibling, 0 replies; 28+ messages in thread
From: Chuck Lever @ 2016-07-19  8:00 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: NeilBrown, Steve Dickson, Linux NFS Mailing List


> On Jul 18, 2016, at 10:32 PM, bfields@fieldses.org wrote:
> 
> On Thu, Jul 14, 2016 at 12:26:43PM +1000, NeilBrown wrote:
>> export points with the "mountpoint" flag should not be exported
>> if they aren't mounted.
>> They shouldn't even appear in the pseudo-root filesystem.
>> So add an appropriate check to v4root_set().
>> 
>> This means that the v4root might need to be recomputed whenever a
>> filesystem is mounted or unmounted.  So when there are export points
>> with the "mountpoint" flag, check for changes in the mount table.
>> This is done be measuring the size of /proc/mounts.
> 
> Surely there's some more reliable measurement--could we track some data
> about the mountpoint itself, maybe?
> 
> But I'd still like some more justification for this change in logic.
> Does anyone currently use the "mp" option?

The fedfs-domainroot tool used to specify "mp" on
domain root exports. There were some issues with it,
and I think "mp" was removed.


> If not, could we just
> deprecate it?  If so, can we really get away with changing it this way?
> 
> --b.
> 
>> 
>> Signed-off-by: NeilBrown <neilb@suse.com>
>> ---
>> support/include/v4root.h |    2 +-
>> utils/mountd/auth.c      |   29 +++++++++++++++++++++++++++--
>> utils/mountd/v4root.c    |   11 ++++++++++-
>> 3 files changed, 38 insertions(+), 4 deletions(-)
>> 
>> diff --git a/support/include/v4root.h b/support/include/v4root.h
>> index 706c15c70d95..406fd4e43e5a 100644
>> --- a/support/include/v4root.h
>> +++ b/support/include/v4root.h
>> @@ -10,6 +10,6 @@
>> #define V4ROOT_H
>> 
>> extern int v4root_needed;
>> -extern void v4root_set(void);
>> +extern void v4root_set(int *mountpoints_checked);
>> 
>> #endif /* V4ROOT_H */
>> diff --git a/utils/mountd/auth.c b/utils/mountd/auth.c
>> index 0881d9a6edba..5bd7e6622307 100644
>> --- a/utils/mountd/auth.c
>> +++ b/utils/mountd/auth.c
>> @@ -77,6 +77,29 @@ check_useipaddr(void)
>> 		cache_flush(1);
>> }
>> 
>> +static int mountpoints_changed(void)
>> +{
>> +	static int last_size = 0;
>> +	int size;
>> +	int fd;
>> +	char buf[4096];
>> +	int n;
>> +
>> +	fd = open("/proc/mounts", O_RDONLY);
>> +	if (fd < 0)
>> +		/* ignore mountpoint changes if we cannot read /proc/mounts */
>> +		return 0;
>> +	size = 0;
>> +	while ((n = read(fd, buf, sizeof(buf))) > 0)
>> +		size += n;
>> +	if (n < 0)
>> +		return 0;
>> +	if (size == last_size)
>> +		return 0;
>> +	last_size = size;
>> +	return 1;
>> +}
>> +
>> unsigned int
>> auth_reload()
>> {
>> @@ -84,6 +107,7 @@ auth_reload()
>> 	static ino_t		last_inode;
>> 	static int		last_fd = -1;
>> 	static unsigned int	counter;
>> +	static int		mountpoints_checked = 0;
>> 	int			fd;
>> 
>> 	if ((fd = open(_PATH_ETAB, O_RDONLY)) < 0) {
>> @@ -91,7 +115,8 @@ auth_reload()
>> 	} else if (fstat(fd, &stb) < 0) {
>> 		xlog(L_FATAL, "couldn't stat %s", _PATH_ETAB);
>> 		close(fd);
>> -	} else if (last_fd != -1 && stb.st_ino == last_inode) {
>> +	} else if (last_fd != -1 && stb.st_ino == last_inode &&
>> +		   (!mountpoints_checked || !mountpoints_changed())) {
>> 		/* We opened the etab file before, and its inode
>> 		 * number hasn't changed since then.
>> 		 */
>> @@ -114,7 +139,7 @@ auth_reload()
>> 	memset(&my_client, 0, sizeof(my_client));
>> 	xtab_export_read();
>> 	check_useipaddr();
>> -	v4root_set();
>> +	v4root_set(&mountpoints_checked);
>> 
>> 	++counter;
>> 
>> diff --git a/utils/mountd/v4root.c b/utils/mountd/v4root.c
>> index d52172592823..1a5778f9c7de 100644
>> --- a/utils/mountd/v4root.c
>> +++ b/utils/mountd/v4root.c
>> @@ -183,7 +183,7 @@ static int v4root_add_parents(nfs_export *exp)
>>  * looking for components of the v4 mount.
>>  */
>> void
>> -v4root_set()
>> +v4root_set(int *mountpoints_checked)
>> {
>> 	nfs_export	*exp;
>> 	int	i;
>> @@ -193,6 +193,7 @@ v4root_set()
>> 	if (!v4root_support())
>> 		return;
>> 
>> +	*mountpoints_checked = 0;
>> 	for (i = 0; i < MCL_MAXTYPES; i++) {
>> 		for (exp = exportlist[i].p_head; exp; exp = exp->m_next) {
>> 			if (exp->m_export.e_flags & NFSEXP_V4ROOT)
>> @@ -202,6 +203,14 @@ v4root_set()
>> 				 */
>> 				continue;
>> 
>> +			if (exp->m_export.e_mountpoint) {
>> +				*mountpoints_checked = 1;
>> +				if (!is_mountpoint(exp->m_export.e_mountpoint[0]?
>> +						   exp->m_export.e_mountpoint:
>> +						   exp->m_export.e_path))
>> +					continue;
>> +			}
>> +
>> 			if (strcmp(exp->m_export.e_path, "/") == 0 &&
>> 			    !(exp->m_export.e_flags & NFSEXP_FSID)) {
>> 				/* Force '/' to be exported as fsid == 0*/
>> 
>> 
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Chuck Lever




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/8] mountd: remove 'dev_missing' checks
  2016-07-18 20:01   ` J. Bruce Fields
@ 2016-07-19 22:50     ` NeilBrown
  2016-07-21 17:24       ` J. Bruce Fields
  0 siblings, 1 reply; 28+ messages in thread
From: NeilBrown @ 2016-07-19 22:50 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Steve Dickson, Linux NFS Mailing list

[-- Attachment #1: Type: text/plain, Size: 3003 bytes --]

On Tue, Jul 19 2016, J. Bruce Fields wrote:

> On Thu, Jul 14, 2016 at 12:26:43PM +1000, NeilBrown wrote:
>> I now think this was a mistaken idea.
>> 
>> If a filesystem is exported with the "mountpoint" or "mp" option, it
>> should only be exported if the directory is a mount point.  The
>> intention is that if there is a problem with one filesystem on a
>> server, the others can still be exported, but clients won't
>> incorrectly see the empty directory on the parent when accessing the
>> missing filesystem, they will see clearly that the filesystem is
>> missing.
>> 
>> It is easy to handle this correctly for NFSv3 MOUNT requests, but what
>> is the correct behavior if a client already has the filesystem mounted
>> and so has a filehandle?  Maybe the server rebooted and came back with
>> one device missing.  What should the client see?
>> 
>> The "dev_missing" code tries to detect this case and causes the server
>> to respond with silence rather than ESTALE.  The idea was that the
>> client would retry and when (or if) the filesystem came back, service
>> would be transparently restored.
>> 
>> The problem with this is that arbitrarily long delays are not what
>> people would expect, and can be quite annoying.  ESTALE, while
>> unpleasant, it at least easily understood.  A device disappearing is a
>> fairly significant event and hiding it doesn't really serve anyone.
>
> It could also be a filesystem disappearing because it failed to mount in
> time on a reboot.

I don't think "in time" is really an issue.  Boot sequencing should not
start nfsd until everything in /etc/fstab is mounted, has failed and the
failure has been deemed acceptable.
That is why nfs-server.services has "After= local-fs.target"

>
>> So: remove the code and allow ESTALE.
>
> I'm not completely sure I understand the justification.

"hangs are bad".

When you cannot get a reply from the NFS server there are multiple
possible causes from temporary network glitch to server-is-dead.
You cannot reliably differentiate, so you have to just wait.

There server itself doesn't have the same uncertainty about its exported
filesystems.  They are either working or they aren't.
So it is possible, and I think reasonable, to send a more definitive
reply - ESTALE.

This particularly became an issues with NFSv4.
With NFSv3, mounting the filesystem is distinct from accessing it.
So it was easy to fail a mount request but delay an access request.
With NFSv4 we don't have that distinction.  If we make accesses wait,
then we make mount attempts wait too, which isn't at all friendly.

>
> I don't like the current behavior either--I'd be happy if we could
> deprecate "mountpoint" entirely--but changing it now would seem to risk
> regressions if anyone depends on it.

True.  There isn't really an easy solution there.

"mountpoint" seemed like a good idea when I wrote it.  But I never got
any proper peer review.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/8] mountd: don't add paths to non-mounted export points to pseudo-root
  2016-07-18 20:32   ` J. Bruce Fields
  2016-07-19  8:00     ` Chuck Lever
@ 2016-07-19 22:59     ` NeilBrown
  2016-07-21 17:33       ` J. Bruce Fields
  1 sibling, 1 reply; 28+ messages in thread
From: NeilBrown @ 2016-07-19 22:59 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Steve Dickson, Linux NFS Mailing list

[-- Attachment #1: Type: text/plain, Size: 5809 bytes --]

On Tue, Jul 19 2016, J. Bruce Fields wrote:

> On Thu, Jul 14, 2016 at 12:26:43PM +1000, NeilBrown wrote:
>> export points with the "mountpoint" flag should not be exported
>> if they aren't mounted.
>> They shouldn't even appear in the pseudo-root filesystem.
>> So add an appropriate check to v4root_set().
>> 
>> This means that the v4root might need to be recomputed whenever a
>> filesystem is mounted or unmounted.  So when there are export points
>> with the "mountpoint" flag, check for changes in the mount table.
>> This is done be measuring the size of /proc/mounts.
>
> Surely there's some more reliable measurement--could we track some data
> about the mountpoint itself, maybe?

We could.  But it would be more complex code for very little gain.
I did consider using select() on /proc/mounts to get a notification
whenever anything changes.  What we be more reliable but more difficult.
I also considered calculating an SHA1, or maybe just a crc32 on the
contents of /proc/mounts.  But then I realised that the size was very
easy and very nearly as reliable.

>
> But I'd still like some more justification for this change in logic.
> Does anyone currently use the "mp" option?  If not, could we just
> deprecate it?  If so, can we really get away with changing it this way?

I have a customer complaining that it doesn't work as advertised for
NFSv4.  So presumably they have a use-case, though I haven't asked for
details on exactly why they want it.

I actually think this is the most useful of the changes.  It means that
if a filesystem isn't mounted, it isn't even visible over NFSv4.

After all, the reality is that people export filesystems, not names in
their namespace.  NFSv4 tries to make it all look like the same thing,
and there is some justification for that.  But I think a lot of people
think about it as filesystems being exported, and the mountpoint option
allows that thought to be expressed in the configuration.

NeilBrown


>
> --b.
>
>> 
>> Signed-off-by: NeilBrown <neilb@suse.com>
>> ---
>>  support/include/v4root.h |    2 +-
>>  utils/mountd/auth.c      |   29 +++++++++++++++++++++++++++--
>>  utils/mountd/v4root.c    |   11 ++++++++++-
>>  3 files changed, 38 insertions(+), 4 deletions(-)
>> 
>> diff --git a/support/include/v4root.h b/support/include/v4root.h
>> index 706c15c70d95..406fd4e43e5a 100644
>> --- a/support/include/v4root.h
>> +++ b/support/include/v4root.h
>> @@ -10,6 +10,6 @@
>>  #define V4ROOT_H
>>  
>>  extern int v4root_needed;
>> -extern void v4root_set(void);
>> +extern void v4root_set(int *mountpoints_checked);
>>  
>>  #endif /* V4ROOT_H */
>> diff --git a/utils/mountd/auth.c b/utils/mountd/auth.c
>> index 0881d9a6edba..5bd7e6622307 100644
>> --- a/utils/mountd/auth.c
>> +++ b/utils/mountd/auth.c
>> @@ -77,6 +77,29 @@ check_useipaddr(void)
>>  		cache_flush(1);
>>  }
>>  
>> +static int mountpoints_changed(void)
>> +{
>> +	static int last_size = 0;
>> +	int size;
>> +	int fd;
>> +	char buf[4096];
>> +	int n;
>> +
>> +	fd = open("/proc/mounts", O_RDONLY);
>> +	if (fd < 0)
>> +		/* ignore mountpoint changes if we cannot read /proc/mounts */
>> +		return 0;
>> +	size = 0;
>> +	while ((n = read(fd, buf, sizeof(buf))) > 0)
>> +		size += n;
>> +	if (n < 0)
>> +		return 0;
>> +	if (size == last_size)
>> +		return 0;
>> +	last_size = size;
>> +	return 1;
>> +}
>> +
>>  unsigned int
>>  auth_reload()
>>  {
>> @@ -84,6 +107,7 @@ auth_reload()
>>  	static ino_t		last_inode;
>>  	static int		last_fd = -1;
>>  	static unsigned int	counter;
>> +	static int		mountpoints_checked = 0;
>>  	int			fd;
>>  
>>  	if ((fd = open(_PATH_ETAB, O_RDONLY)) < 0) {
>> @@ -91,7 +115,8 @@ auth_reload()
>>  	} else if (fstat(fd, &stb) < 0) {
>>  		xlog(L_FATAL, "couldn't stat %s", _PATH_ETAB);
>>  		close(fd);
>> -	} else if (last_fd != -1 && stb.st_ino == last_inode) {
>> +	} else if (last_fd != -1 && stb.st_ino == last_inode &&
>> +		   (!mountpoints_checked || !mountpoints_changed())) {
>>  		/* We opened the etab file before, and its inode
>>  		 * number hasn't changed since then.
>>  		 */
>> @@ -114,7 +139,7 @@ auth_reload()
>>  	memset(&my_client, 0, sizeof(my_client));
>>  	xtab_export_read();
>>  	check_useipaddr();
>> -	v4root_set();
>> +	v4root_set(&mountpoints_checked);
>>  
>>  	++counter;
>>  
>> diff --git a/utils/mountd/v4root.c b/utils/mountd/v4root.c
>> index d52172592823..1a5778f9c7de 100644
>> --- a/utils/mountd/v4root.c
>> +++ b/utils/mountd/v4root.c
>> @@ -183,7 +183,7 @@ static int v4root_add_parents(nfs_export *exp)
>>   * looking for components of the v4 mount.
>>   */
>>  void
>> -v4root_set()
>> +v4root_set(int *mountpoints_checked)
>>  {
>>  	nfs_export	*exp;
>>  	int	i;
>> @@ -193,6 +193,7 @@ v4root_set()
>>  	if (!v4root_support())
>>  		return;
>>  
>> +	*mountpoints_checked = 0;
>>  	for (i = 0; i < MCL_MAXTYPES; i++) {
>>  		for (exp = exportlist[i].p_head; exp; exp = exp->m_next) {
>>  			if (exp->m_export.e_flags & NFSEXP_V4ROOT)
>> @@ -202,6 +203,14 @@ v4root_set()
>>  				 */
>>  				continue;
>>  
>> +			if (exp->m_export.e_mountpoint) {
>> +				*mountpoints_checked = 1;
>> +				if (!is_mountpoint(exp->m_export.e_mountpoint[0]?
>> +						   exp->m_export.e_mountpoint:
>> +						   exp->m_export.e_path))
>> +					continue;
>> +			}
>> +
>>  			if (strcmp(exp->m_export.e_path, "/") == 0 &&
>>  			    !(exp->m_export.e_flags & NFSEXP_FSID)) {
>>  				/* Force '/' to be exported as fsid == 0*/
>> 
>> 
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 7/8] mount: don't treat temporary name resolution failure as permanent.
  2016-07-14  2:26 ` [PATCH 7/8] mount: don't treat temporary name resolution failure as permanent NeilBrown
@ 2016-07-19 23:01   ` NeilBrown
  0 siblings, 0 replies; 28+ messages in thread
From: NeilBrown @ 2016-07-19 23:01 UTC (permalink / raw)
  To: Steve Dickson; +Cc: Linux NFS Mailing list

[-- Attachment #1: Type: text/plain, Size: 1097 bytes --]

On Thu, Jul 14 2016, NeilBrown wrote:

> If getaddrinfo() returns EAI_AGAIN, we shouldn't just give up, but
> should continue normal retries as the nameserver may be unavailable
> for the same reason as the NFS server.
>
> So move the getaddrinfo() call from nfs_validate_options() into
> nfs_try_mounts() which is always called soon after, except in the
> 'remount' case when we don't want it anyway.
>
> If EAI_AGAIN is returned, set errno to EAGAIN and allow this to be a
> temporary failure.  Otherwise report error and set errno to EALREADY
> so no further message is given.
>
> Signed-off-by: NeilBrown <neilb@suse.com>


> +
> +		if (!nfs_append_addr_option(mi->address->ai_addr,
> +					    mi->address->ai_addrlen, mi->options))
> +			return 0;
> +		mi->address = address;

This patch is badly buggy.  I'm de-referencing mi->address before
assigning it.  How did that ever pass quality control I ask myself.
(fortunately SUSE does better QA than I do).

I'll submit a revised version ... maybe put all the patches affecting
the mountpoint export option at the end of the list.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/8] mountd: remove 'dev_missing' checks
  2016-07-19 22:50     ` NeilBrown
@ 2016-07-21 17:24       ` J. Bruce Fields
  2016-08-11  2:51         ` NeilBrown
  0 siblings, 1 reply; 28+ messages in thread
From: J. Bruce Fields @ 2016-07-21 17:24 UTC (permalink / raw)
  To: NeilBrown; +Cc: Steve Dickson, Linux NFS Mailing list

On Wed, Jul 20, 2016 at 08:50:12AM +1000, NeilBrown wrote:
> On Tue, Jul 19 2016, J. Bruce Fields wrote:
> 
> > On Thu, Jul 14, 2016 at 12:26:43PM +1000, NeilBrown wrote:
> >> I now think this was a mistaken idea.
> >> 
> >> If a filesystem is exported with the "mountpoint" or "mp" option, it
> >> should only be exported if the directory is a mount point.  The
> >> intention is that if there is a problem with one filesystem on a
> >> server, the others can still be exported, but clients won't
> >> incorrectly see the empty directory on the parent when accessing the
> >> missing filesystem, they will see clearly that the filesystem is
> >> missing.
> >> 
> >> It is easy to handle this correctly for NFSv3 MOUNT requests, but what
> >> is the correct behavior if a client already has the filesystem mounted
> >> and so has a filehandle?  Maybe the server rebooted and came back with
> >> one device missing.  What should the client see?
> >> 
> >> The "dev_missing" code tries to detect this case and causes the server
> >> to respond with silence rather than ESTALE.  The idea was that the
> >> client would retry and when (or if) the filesystem came back, service
> >> would be transparently restored.
> >> 
> >> The problem with this is that arbitrarily long delays are not what
> >> people would expect, and can be quite annoying.  ESTALE, while
> >> unpleasant, it at least easily understood.  A device disappearing is a
> >> fairly significant event and hiding it doesn't really serve anyone.
> >
> > It could also be a filesystem disappearing because it failed to mount in
> > time on a reboot.
> 
> I don't think "in time" is really an issue.  Boot sequencing should not
> start nfsd until everything in /etc/fstab is mounted, has failed and the
> failure has been deemed acceptable.
> That is why nfs-server.services has "After= local-fs.target"

Yeah, I agree, that's the right way to do it.  I believe the old
behavior would be forgiving of misconfiguration here, though, which
means there's a chance somebody would witness a failure on upgrade.
Maybe the chance is small.

--b.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/8] mountd: don't add paths to non-mounted export points to pseudo-root
  2016-07-19 22:59     ` NeilBrown
@ 2016-07-21 17:33       ` J. Bruce Fields
  2016-07-25  7:22         ` NeilBrown
  0 siblings, 1 reply; 28+ messages in thread
From: J. Bruce Fields @ 2016-07-21 17:33 UTC (permalink / raw)
  To: NeilBrown; +Cc: Steve Dickson, Linux NFS Mailing list

On Wed, Jul 20, 2016 at 08:59:30AM +1000, NeilBrown wrote:
> On Tue, Jul 19 2016, J. Bruce Fields wrote:
> 
> > On Thu, Jul 14, 2016 at 12:26:43PM +1000, NeilBrown wrote:
> >> export points with the "mountpoint" flag should not be exported
> >> if they aren't mounted.
> >> They shouldn't even appear in the pseudo-root filesystem.
> >> So add an appropriate check to v4root_set().
> >> 
> >> This means that the v4root might need to be recomputed whenever a
> >> filesystem is mounted or unmounted.  So when there are export points
> >> with the "mountpoint" flag, check for changes in the mount table.
> >> This is done be measuring the size of /proc/mounts.
> >
> > Surely there's some more reliable measurement--could we track some data
> > about the mountpoint itself, maybe?
> 
> We could.  But it would be more complex code for very little gain.
> I did consider using select() on /proc/mounts to get a notification
> whenever anything changes.  What we be more reliable but more difficult.
> I also considered calculating an SHA1, or maybe just a crc32 on the
> contents of /proc/mounts.  But then I realised that the size was very
> easy and very nearly as reliable.

So we don't care enough about the mountpoint option enough to make it
work 100% reliably?

If we expect too few users for there to be a real chance of hitting the
bad case here, then I wonder again whether the whole feature is worth
the trouble.

> > But I'd still like some more justification for this change in logic.
> > Does anyone currently use the "mp" option?  If not, could we just
> > deprecate it?  If so, can we really get away with changing it this way?
> 
> I have a customer complaining that it doesn't work as advertised for
> NFSv4.  So presumably they have a use-case, though I haven't asked for
> details on exactly why they want it.

I'd be inclined to ask for more details about the use case before
continuing.

As it is I'm inclined towards some plan like documenting "mountpoint" as
deprecated, warning on use, and taking it out eventually.

--b.

> I actually think this is the most useful of the changes.  It means that
> if a filesystem isn't mounted, it isn't even visible over NFSv4.
> 
> After all, the reality is that people export filesystems, not names in
> their namespace.  NFSv4 tries to make it all look like the same thing,
> and there is some justification for that.  But I think a lot of people
> think about it as filesystems being exported, and the mountpoint option
> allows that thought to be expressed in the configuration.
> 
> NeilBrown
> 
> 
> >
> > --b.
> >
> >> 
> >> Signed-off-by: NeilBrown <neilb@suse.com>
> >> ---
> >>  support/include/v4root.h |    2 +-
> >>  utils/mountd/auth.c      |   29 +++++++++++++++++++++++++++--
> >>  utils/mountd/v4root.c    |   11 ++++++++++-
> >>  3 files changed, 38 insertions(+), 4 deletions(-)
> >> 
> >> diff --git a/support/include/v4root.h b/support/include/v4root.h
> >> index 706c15c70d95..406fd4e43e5a 100644
> >> --- a/support/include/v4root.h
> >> +++ b/support/include/v4root.h
> >> @@ -10,6 +10,6 @@
> >>  #define V4ROOT_H
> >>  
> >>  extern int v4root_needed;
> >> -extern void v4root_set(void);
> >> +extern void v4root_set(int *mountpoints_checked);
> >>  
> >>  #endif /* V4ROOT_H */
> >> diff --git a/utils/mountd/auth.c b/utils/mountd/auth.c
> >> index 0881d9a6edba..5bd7e6622307 100644
> >> --- a/utils/mountd/auth.c
> >> +++ b/utils/mountd/auth.c
> >> @@ -77,6 +77,29 @@ check_useipaddr(void)
> >>  		cache_flush(1);
> >>  }
> >>  
> >> +static int mountpoints_changed(void)
> >> +{
> >> +	static int last_size = 0;
> >> +	int size;
> >> +	int fd;
> >> +	char buf[4096];
> >> +	int n;
> >> +
> >> +	fd = open("/proc/mounts", O_RDONLY);
> >> +	if (fd < 0)
> >> +		/* ignore mountpoint changes if we cannot read /proc/mounts */
> >> +		return 0;
> >> +	size = 0;
> >> +	while ((n = read(fd, buf, sizeof(buf))) > 0)
> >> +		size += n;
> >> +	if (n < 0)
> >> +		return 0;
> >> +	if (size == last_size)
> >> +		return 0;
> >> +	last_size = size;
> >> +	return 1;
> >> +}
> >> +
> >>  unsigned int
> >>  auth_reload()
> >>  {
> >> @@ -84,6 +107,7 @@ auth_reload()
> >>  	static ino_t		last_inode;
> >>  	static int		last_fd = -1;
> >>  	static unsigned int	counter;
> >> +	static int		mountpoints_checked = 0;
> >>  	int			fd;
> >>  
> >>  	if ((fd = open(_PATH_ETAB, O_RDONLY)) < 0) {
> >> @@ -91,7 +115,8 @@ auth_reload()
> >>  	} else if (fstat(fd, &stb) < 0) {
> >>  		xlog(L_FATAL, "couldn't stat %s", _PATH_ETAB);
> >>  		close(fd);
> >> -	} else if (last_fd != -1 && stb.st_ino == last_inode) {
> >> +	} else if (last_fd != -1 && stb.st_ino == last_inode &&
> >> +		   (!mountpoints_checked || !mountpoints_changed())) {
> >>  		/* We opened the etab file before, and its inode
> >>  		 * number hasn't changed since then.
> >>  		 */
> >> @@ -114,7 +139,7 @@ auth_reload()
> >>  	memset(&my_client, 0, sizeof(my_client));
> >>  	xtab_export_read();
> >>  	check_useipaddr();
> >> -	v4root_set();
> >> +	v4root_set(&mountpoints_checked);
> >>  
> >>  	++counter;
> >>  
> >> diff --git a/utils/mountd/v4root.c b/utils/mountd/v4root.c
> >> index d52172592823..1a5778f9c7de 100644
> >> --- a/utils/mountd/v4root.c
> >> +++ b/utils/mountd/v4root.c
> >> @@ -183,7 +183,7 @@ static int v4root_add_parents(nfs_export *exp)
> >>   * looking for components of the v4 mount.
> >>   */
> >>  void
> >> -v4root_set()
> >> +v4root_set(int *mountpoints_checked)
> >>  {
> >>  	nfs_export	*exp;
> >>  	int	i;
> >> @@ -193,6 +193,7 @@ v4root_set()
> >>  	if (!v4root_support())
> >>  		return;
> >>  
> >> +	*mountpoints_checked = 0;
> >>  	for (i = 0; i < MCL_MAXTYPES; i++) {
> >>  		for (exp = exportlist[i].p_head; exp; exp = exp->m_next) {
> >>  			if (exp->m_export.e_flags & NFSEXP_V4ROOT)
> >> @@ -202,6 +203,14 @@ v4root_set()
> >>  				 */
> >>  				continue;
> >>  
> >> +			if (exp->m_export.e_mountpoint) {
> >> +				*mountpoints_checked = 1;
> >> +				if (!is_mountpoint(exp->m_export.e_mountpoint[0]?
> >> +						   exp->m_export.e_mountpoint:
> >> +						   exp->m_export.e_path))
> >> +					continue;
> >> +			}
> >> +
> >>  			if (strcmp(exp->m_export.e_path, "/") == 0 &&
> >>  			    !(exp->m_export.e_flags & NFSEXP_FSID)) {
> >>  				/* Force '/' to be exported as fsid == 0*/
> >> 
> >> 
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/8] mountd: don't add paths to non-mounted export points to pseudo-root
  2016-07-21 17:33       ` J. Bruce Fields
@ 2016-07-25  7:22         ` NeilBrown
  2016-07-28 20:54           ` J. Bruce Fields
  0 siblings, 1 reply; 28+ messages in thread
From: NeilBrown @ 2016-07-25  7:22 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Steve Dickson, Linux NFS Mailing list

[-- Attachment #1: Type: text/plain, Size: 9325 bytes --]

On Fri, Jul 22 2016, J. Bruce Fields wrote:

> On Wed, Jul 20, 2016 at 08:59:30AM +1000, NeilBrown wrote:
>> On Tue, Jul 19 2016, J. Bruce Fields wrote:
>> 
>> > On Thu, Jul 14, 2016 at 12:26:43PM +1000, NeilBrown wrote:
>> >> export points with the "mountpoint" flag should not be exported
>> >> if they aren't mounted.
>> >> They shouldn't even appear in the pseudo-root filesystem.
>> >> So add an appropriate check to v4root_set().
>> >> 
>> >> This means that the v4root might need to be recomputed whenever a
>> >> filesystem is mounted or unmounted.  So when there are export points
>> >> with the "mountpoint" flag, check for changes in the mount table.
>> >> This is done be measuring the size of /proc/mounts.
>> >
>> > Surely there's some more reliable measurement--could we track some data
>> > about the mountpoint itself, maybe?
>> 
>> We could.  But it would be more complex code for very little gain.
>> I did consider using select() on /proc/mounts to get a notification
>> whenever anything changes.  What we be more reliable but more difficult.
>> I also considered calculating an SHA1, or maybe just a crc32 on the
>> contents of /proc/mounts.  But then I realised that the size was very
>> easy and very nearly as reliable.
>
> So we don't care enough about the mountpoint option enough to make it
> work 100% reliably?
>
> If we expect too few users for there to be a real chance of hitting the
> bad case here, then I wonder again whether the whole feature is worth
> the trouble.
>
>> > But I'd still like some more justification for this change in logic.
>> > Does anyone currently use the "mp" option?  If not, could we just
>> > deprecate it?  If so, can we really get away with changing it this way?
>> 
>> I have a customer complaining that it doesn't work as advertised for
>> NFSv4.  So presumably they have a use-case, though I haven't asked for
>> details on exactly why they want it.
>
> I'd be inclined to ask for more details about the use case before
> continuing.

I asked, and found the answer quite helpful.  So thanks for prompting
that.

For NFSv2/3, If I list "/export/foo" in /etc/exports, but /export/foo
fails to mount during boot, then a client which tries to mount
"/export/foo" will get a file handle on /export (probably the root
filesystem).
Unless subtree_check is set (which we don't like) this effectively means
that the whole root filesystem is potentially exported, if the client can
determine the filehandles.
Once the problem is fixed, the filesystem is mounted, and "exportfs -r"
is run (possibly by reboot), the root filesystem will no longer be
exported, so that filehandle that the client has becomes stale (this is
the particlar symptom the customer mentioned).

I think it is safe to argue that having 'mount' fail is safer than
having it succeed, present an empty directory, and then have that
directory suddenly become stale at some later time.

For NFSv4 the root filesystem is always exported, but usually as the
'pseudo-root', being read-only and files being completely unavailable.
If /export/foo is not mounted but /export/foo is exported, then at least
part of the root filesystem will be exported (potentially) r/w.
I'm not sure what happens with filehandles.  A filehandle from the pseudo-root
filesystem has fsid=0.  A filehandle from a properly exported directory
on the root filesystem might not - I'd have to check another day.
So you might not get the 'stale file handles', but would might still get
unexpected access to the root filesystem.

So I think this justifies maintaining (and maybe even encouraging) the
'mountpoint' export option.

For NFSv4 it is probably OK for the to-be-mounted-on directory to be visible,
but firmly 'pseudo'.  So I can probably drop my elegant /proc/mounts
change detector which you aren't fond of.

When we get a filehandle for a filesystem which isn't currently mounted
the current code sends no response, so clients hang.  My latest patch
sends ESTALE, so client gets an error.
I wonder if we could arrange to make just the exported root look like an
empty (pseudo-root-style) directory.  Then when the filesystem gets
mounted the directory morphs into the real thing..
For files on the filesystem I could probably be convinced either way,
unless testing shows some unpleasant behaviour.

Are you convinced?  At all?

Thanks,
NeilBrown


>
> As it is I'm inclined towards some plan like documenting "mountpoint" as
> deprecated, warning on use, and taking it out eventually.
>
> --b.
>
>> I actually think this is the most useful of the changes.  It means that
>> if a filesystem isn't mounted, it isn't even visible over NFSv4.
>> 
>> After all, the reality is that people export filesystems, not names in
>> their namespace.  NFSv4 tries to make it all look like the same thing,
>> and there is some justification for that.  But I think a lot of people
>> think about it as filesystems being exported, and the mountpoint option
>> allows that thought to be expressed in the configuration.
>> 
>> NeilBrown
>> 
>> 
>> >
>> > --b.
>> >
>> >> 
>> >> Signed-off-by: NeilBrown <neilb@suse.com>
>> >> ---
>> >>  support/include/v4root.h |    2 +-
>> >>  utils/mountd/auth.c      |   29 +++++++++++++++++++++++++++--
>> >>  utils/mountd/v4root.c    |   11 ++++++++++-
>> >>  3 files changed, 38 insertions(+), 4 deletions(-)
>> >> 
>> >> diff --git a/support/include/v4root.h b/support/include/v4root.h
>> >> index 706c15c70d95..406fd4e43e5a 100644
>> >> --- a/support/include/v4root.h
>> >> +++ b/support/include/v4root.h
>> >> @@ -10,6 +10,6 @@
>> >>  #define V4ROOT_H
>> >>  
>> >>  extern int v4root_needed;
>> >> -extern void v4root_set(void);
>> >> +extern void v4root_set(int *mountpoints_checked);
>> >>  
>> >>  #endif /* V4ROOT_H */
>> >> diff --git a/utils/mountd/auth.c b/utils/mountd/auth.c
>> >> index 0881d9a6edba..5bd7e6622307 100644
>> >> --- a/utils/mountd/auth.c
>> >> +++ b/utils/mountd/auth.c
>> >> @@ -77,6 +77,29 @@ check_useipaddr(void)
>> >>  		cache_flush(1);
>> >>  }
>> >>  
>> >> +static int mountpoints_changed(void)
>> >> +{
>> >> +	static int last_size = 0;
>> >> +	int size;
>> >> +	int fd;
>> >> +	char buf[4096];
>> >> +	int n;
>> >> +
>> >> +	fd = open("/proc/mounts", O_RDONLY);
>> >> +	if (fd < 0)
>> >> +		/* ignore mountpoint changes if we cannot read /proc/mounts */
>> >> +		return 0;
>> >> +	size = 0;
>> >> +	while ((n = read(fd, buf, sizeof(buf))) > 0)
>> >> +		size += n;
>> >> +	if (n < 0)
>> >> +		return 0;
>> >> +	if (size == last_size)
>> >> +		return 0;
>> >> +	last_size = size;
>> >> +	return 1;
>> >> +}
>> >> +
>> >>  unsigned int
>> >>  auth_reload()
>> >>  {
>> >> @@ -84,6 +107,7 @@ auth_reload()
>> >>  	static ino_t		last_inode;
>> >>  	static int		last_fd = -1;
>> >>  	static unsigned int	counter;
>> >> +	static int		mountpoints_checked = 0;
>> >>  	int			fd;
>> >>  
>> >>  	if ((fd = open(_PATH_ETAB, O_RDONLY)) < 0) {
>> >> @@ -91,7 +115,8 @@ auth_reload()
>> >>  	} else if (fstat(fd, &stb) < 0) {
>> >>  		xlog(L_FATAL, "couldn't stat %s", _PATH_ETAB);
>> >>  		close(fd);
>> >> -	} else if (last_fd != -1 && stb.st_ino == last_inode) {
>> >> +	} else if (last_fd != -1 && stb.st_ino == last_inode &&
>> >> +		   (!mountpoints_checked || !mountpoints_changed())) {
>> >>  		/* We opened the etab file before, and its inode
>> >>  		 * number hasn't changed since then.
>> >>  		 */
>> >> @@ -114,7 +139,7 @@ auth_reload()
>> >>  	memset(&my_client, 0, sizeof(my_client));
>> >>  	xtab_export_read();
>> >>  	check_useipaddr();
>> >> -	v4root_set();
>> >> +	v4root_set(&mountpoints_checked);
>> >>  
>> >>  	++counter;
>> >>  
>> >> diff --git a/utils/mountd/v4root.c b/utils/mountd/v4root.c
>> >> index d52172592823..1a5778f9c7de 100644
>> >> --- a/utils/mountd/v4root.c
>> >> +++ b/utils/mountd/v4root.c
>> >> @@ -183,7 +183,7 @@ static int v4root_add_parents(nfs_export *exp)
>> >>   * looking for components of the v4 mount.
>> >>   */
>> >>  void
>> >> -v4root_set()
>> >> +v4root_set(int *mountpoints_checked)
>> >>  {
>> >>  	nfs_export	*exp;
>> >>  	int	i;
>> >> @@ -193,6 +193,7 @@ v4root_set()
>> >>  	if (!v4root_support())
>> >>  		return;
>> >>  
>> >> +	*mountpoints_checked = 0;
>> >>  	for (i = 0; i < MCL_MAXTYPES; i++) {
>> >>  		for (exp = exportlist[i].p_head; exp; exp = exp->m_next) {
>> >>  			if (exp->m_export.e_flags & NFSEXP_V4ROOT)
>> >> @@ -202,6 +203,14 @@ v4root_set()
>> >>  				 */
>> >>  				continue;
>> >>  
>> >> +			if (exp->m_export.e_mountpoint) {
>> >> +				*mountpoints_checked = 1;
>> >> +				if (!is_mountpoint(exp->m_export.e_mountpoint[0]?
>> >> +						   exp->m_export.e_mountpoint:
>> >> +						   exp->m_export.e_path))
>> >> +					continue;
>> >> +			}
>> >> +
>> >>  			if (strcmp(exp->m_export.e_path, "/") == 0 &&
>> >>  			    !(exp->m_export.e_flags & NFSEXP_FSID)) {
>> >>  				/* Force '/' to be exported as fsid == 0*/
>> >> 
>> >> 
>> >> --
>> >> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> >> the body of a message to majordomo@vger.kernel.org
>> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/8] mountd: don't add paths to non-mounted export points to pseudo-root
  2016-07-25  7:22         ` NeilBrown
@ 2016-07-28 20:54           ` J. Bruce Fields
  0 siblings, 0 replies; 28+ messages in thread
From: J. Bruce Fields @ 2016-07-28 20:54 UTC (permalink / raw)
  To: NeilBrown; +Cc: Steve Dickson, Linux NFS Mailing list

On Mon, Jul 25, 2016 at 05:22:09PM +1000, NeilBrown wrote:
> On Fri, Jul 22 2016, J. Bruce Fields wrote:
> 
> > On Wed, Jul 20, 2016 at 08:59:30AM +1000, NeilBrown wrote:
> >> On Tue, Jul 19 2016, J. Bruce Fields wrote:
> >> 
> >> > On Thu, Jul 14, 2016 at 12:26:43PM +1000, NeilBrown wrote:
> >> >> export points with the "mountpoint" flag should not be exported
> >> >> if they aren't mounted.
> >> >> They shouldn't even appear in the pseudo-root filesystem.
> >> >> So add an appropriate check to v4root_set().
> >> >> 
> >> >> This means that the v4root might need to be recomputed whenever a
> >> >> filesystem is mounted or unmounted.  So when there are export points
> >> >> with the "mountpoint" flag, check for changes in the mount table.
> >> >> This is done be measuring the size of /proc/mounts.
> >> >
> >> > Surely there's some more reliable measurement--could we track some data
> >> > about the mountpoint itself, maybe?
> >> 
> >> We could.  But it would be more complex code for very little gain.
> >> I did consider using select() on /proc/mounts to get a notification
> >> whenever anything changes.  What we be more reliable but more difficult.
> >> I also considered calculating an SHA1, or maybe just a crc32 on the
> >> contents of /proc/mounts.  But then I realised that the size was very
> >> easy and very nearly as reliable.
> >
> > So we don't care enough about the mountpoint option enough to make it
> > work 100% reliably?
> >
> > If we expect too few users for there to be a real chance of hitting the
> > bad case here, then I wonder again whether the whole feature is worth
> > the trouble.
> >
> >> > But I'd still like some more justification for this change in logic.
> >> > Does anyone currently use the "mp" option?  If not, could we just
> >> > deprecate it?  If so, can we really get away with changing it this way?
> >> 
> >> I have a customer complaining that it doesn't work as advertised for
> >> NFSv4.  So presumably they have a use-case, though I haven't asked for
> >> details on exactly why they want it.
> >
> > I'd be inclined to ask for more details about the use case before
> > continuing.
> 
> I asked, and found the answer quite helpful.

I agree, thanks!

> So thanks for prompting that.
> 
> For NFSv2/3, If I list "/export/foo" in /etc/exports, but /export/foo
> fails to mount during boot, then a client which tries to mount
> "/export/foo" will get a file handle on /export (probably the root
> filesystem).
> Unless subtree_check is set (which we don't like) this effectively means
> that the whole root filesystem is potentially exported, if the client can
> determine the filehandles.
> Once the problem is fixed, the filesystem is mounted, and "exportfs -r"
> is run (possibly by reboot), the root filesystem will no longer be
> exported, so that filehandle that the client has becomes stale (this is
> the particlar symptom the customer mentioned).
> 
> I think it is safe to argue that having 'mount' fail is safer than
> having it succeed, present an empty directory, and then have that
> directory suddenly become stale at some later time.

Yes, and the security exposure is terrible too.

But users should get security by default.  And the same for sensible
errors on mount failures.  They shouldn't have to request it.

(Maybe they do: on typical distributions, nfsd probably won't start
until all local filesystems are mounted, will it?)

> For NFSv4 the root filesystem is always exported, but usually as the
> 'pseudo-root', being read-only and files being completely unavailable.
> If /export/foo is not mounted but /export/foo is exported, then at least
> part of the root filesystem will be exported (potentially) r/w.
> I'm not sure what happens with filehandles.  A filehandle from the pseudo-root
> filesystem has fsid=0.  A filehandle from a properly exported directory
> on the root filesystem might not - I'd have to check another day.
> So you might not get the 'stale file handles', but would might still get
> unexpected access to the root filesystem.

In the end the situation sounds about the same for all NFS versions.

> So I think this justifies maintaining (and maybe even encouraging) the
> 'mountpoint' export option.
> 
> For NFSv4 it is probably OK for the to-be-mounted-on directory to be visible,
> but firmly 'pseudo'.  So I can probably drop my elegant /proc/mounts
> change detector which you aren't fond of.

If we do keep (even encourage) "mountpoint", then we will get a bug
where somebody hit a false negative.

> When we get a filehandle for a filesystem which isn't currently mounted
> the current code sends no response, so clients hang.  My latest patch
> sends ESTALE, so client gets an error.
> I wonder if we could arrange to make just the exported root look like an
> empty (pseudo-root-style) directory.  Then when the filesystem gets
> mounted the directory morphs into the real thing..
> For files on the filesystem I could probably be convinced either way,
> unless testing shows some unpleasant behaviour.
> 
> Are you convinced?  At all?

I dunno.  "mountpoint" probably isn't widely used, so maybe we can get
away with changing it in the way you suggest, and I agree that that
would be better (though I still don't get why the
not-completely-reliable /proc/mounts thing is OK).

--b.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/8] mountd: remove 'dev_missing' checks
  2016-07-21 17:24       ` J. Bruce Fields
@ 2016-08-11  2:51         ` NeilBrown
  2016-08-16 15:21           ` J. Bruce Fields
  0 siblings, 1 reply; 28+ messages in thread
From: NeilBrown @ 2016-08-11  2:51 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Steve Dickson, Linux NFS Mailing list

[-- Attachment #1: Type: text/plain, Size: 4917 bytes --]

On Fri, Jul 22 2016, J. Bruce Fields wrote:

> On Wed, Jul 20, 2016 at 08:50:12AM +1000, NeilBrown wrote:
>> On Tue, Jul 19 2016, J. Bruce Fields wrote:
>> 
>> > On Thu, Jul 14, 2016 at 12:26:43PM +1000, NeilBrown wrote:
>> >> I now think this was a mistaken idea.
>> >> 
>> >> If a filesystem is exported with the "mountpoint" or "mp" option, it
>> >> should only be exported if the directory is a mount point.  The
>> >> intention is that if there is a problem with one filesystem on a
>> >> server, the others can still be exported, but clients won't
>> >> incorrectly see the empty directory on the parent when accessing the
>> >> missing filesystem, they will see clearly that the filesystem is
>> >> missing.
>> >> 
>> >> It is easy to handle this correctly for NFSv3 MOUNT requests, but what
>> >> is the correct behavior if a client already has the filesystem mounted
>> >> and so has a filehandle?  Maybe the server rebooted and came back with
>> >> one device missing.  What should the client see?
>> >> 
>> >> The "dev_missing" code tries to detect this case and causes the server
>> >> to respond with silence rather than ESTALE.  The idea was that the
>> >> client would retry and when (or if) the filesystem came back, service
>> >> would be transparently restored.
>> >> 
>> >> The problem with this is that arbitrarily long delays are not what
>> >> people would expect, and can be quite annoying.  ESTALE, while
>> >> unpleasant, it at least easily understood.  A device disappearing is a
>> >> fairly significant event and hiding it doesn't really serve anyone.
>> >
>> > It could also be a filesystem disappearing because it failed to mount in
>> > time on a reboot.
>> 
>> I don't think "in time" is really an issue.  Boot sequencing should not
>> start nfsd until everything in /etc/fstab is mounted, has failed and the
>> failure has been deemed acceptable.
>> That is why nfs-server.services has "After= local-fs.target"
>
> Yeah, I agree, that's the right way to do it.  [snip]

There is actually more here ... don't you love getting drip-feed
symptoms and requirements :-)

It turns out the the customer is NFS-exporting a filesystem mounted
using iSCSI.  Such filesystems are treated by systemd as "network"
filesystem, which seems at least a little bit reasonable.
So it is "remote-fs" that applies, or more accurately
"remote-fs-pre.target"
And nfs-server.services contains:

Before=remote-fs-pre.target

So nfsd is likely to start up before the iSCSI filesystems are mounted.

The customer tried to stop this bt using a systemd drop-in to add
RequiresMountsFor= for the remote filesystem, but that causes
a loop with the Before=remote-fs-pre.target.

I don't think we need this line for sequencing start-up, but it does
seem to be useful for sequencing shutdown - so that nfs-server is
stopped after remote-fs-pre, which is stopped after things are
unmounted.
"useful", but not "right".  This doesn't stop remote servers from
shutting down in the wrong order.
We should probably remove this line and teach systemd to use "umount -f"
which doesn't block when the server is down.  If systemd just used a
script, that would be easy....

I'm not 100% certain that "umount -f" is correct.  We just want to stop
umount from stating the mountpoint, we don't need to send MNT_FORCE.
I sometimes think it would be really nice if NFS didn't block a
'getattr' request of the mountpoint.  That would remove some pain from
unmount and other places where the server was down, but probably would
cause other problem.

Does anyone have any opinions on the best way to make sure systemd
doesn't hang when it tries to umount a filesystem from an unresponsive
server?  Is "-f" best, or something else.



There is another issue related to this that I've been meaning to
mention.  It related to the start-up ordering rather than shut down.

When you try to mount an NFS filesystem and the server isn't responding,
mount.nfs retries for a little while and then - if "bg" is given - it
forks and retries a bit longer.
While it keeps gets failures that appear temporary, like ECONNREFUSED or
ETIMEDOUT (see nfs_is_permanent_error()) it keeps retrying.

There is typically a window between when rpcbind starts responding to
queries, and when nfsd has registered with it.  If mount.nfs sends an
rpcbind query in this window. It gets RPC_PROGNOTREGISTERED which
nfs_rewrite_pmap_mount_options maps to EOPNOTSUPP, and
nfs_is_permanent_error() thinks that is a permanent error.

Strangely, when the 'bg' option is used, there is special-case code
Commit: bf66c9facb8e ("mounts.nfs: v2 and v3 background mounts should retry when server is down.")
to stop EOPNOTSUPP from being a permanent error.

Do people think it would be reasonable to make it a transient error for
foreground mounts too?

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/8] mountd: remove 'dev_missing' checks
  2016-08-11  2:51         ` NeilBrown
@ 2016-08-16 15:21           ` J. Bruce Fields
  2016-08-18  1:32             ` NeilBrown
  0 siblings, 1 reply; 28+ messages in thread
From: J. Bruce Fields @ 2016-08-16 15:21 UTC (permalink / raw)
  To: NeilBrown; +Cc: Steve Dickson, Linux NFS Mailing list

On Thu, Aug 11, 2016 at 12:51:45PM +1000, NeilBrown wrote:
> On Fri, Jul 22 2016, J. Bruce Fields wrote:
> 
> > On Wed, Jul 20, 2016 at 08:50:12AM +1000, NeilBrown wrote:
> >> On Tue, Jul 19 2016, J. Bruce Fields wrote:
> >> 
> >> > On Thu, Jul 14, 2016 at 12:26:43PM +1000, NeilBrown wrote:
> >> >> I now think this was a mistaken idea.
> >> >> 
> >> >> If a filesystem is exported with the "mountpoint" or "mp" option, it
> >> >> should only be exported if the directory is a mount point.  The
> >> >> intention is that if there is a problem with one filesystem on a
> >> >> server, the others can still be exported, but clients won't
> >> >> incorrectly see the empty directory on the parent when accessing the
> >> >> missing filesystem, they will see clearly that the filesystem is
> >> >> missing.
> >> >> 
> >> >> It is easy to handle this correctly for NFSv3 MOUNT requests, but what
> >> >> is the correct behavior if a client already has the filesystem mounted
> >> >> and so has a filehandle?  Maybe the server rebooted and came back with
> >> >> one device missing.  What should the client see?
> >> >> 
> >> >> The "dev_missing" code tries to detect this case and causes the server
> >> >> to respond with silence rather than ESTALE.  The idea was that the
> >> >> client would retry and when (or if) the filesystem came back, service
> >> >> would be transparently restored.
> >> >> 
> >> >> The problem with this is that arbitrarily long delays are not what
> >> >> people would expect, and can be quite annoying.  ESTALE, while
> >> >> unpleasant, it at least easily understood.  A device disappearing is a
> >> >> fairly significant event and hiding it doesn't really serve anyone.
> >> >
> >> > It could also be a filesystem disappearing because it failed to mount in
> >> > time on a reboot.
> >> 
> >> I don't think "in time" is really an issue.  Boot sequencing should not
> >> start nfsd until everything in /etc/fstab is mounted, has failed and the
> >> failure has been deemed acceptable.
> >> That is why nfs-server.services has "After= local-fs.target"
> >
> > Yeah, I agree, that's the right way to do it.  [snip]
> 
> There is actually more here ... don't you love getting drip-feed
> symptoms and requirements :-)

It's all good.

> It turns out the the customer is NFS-exporting a filesystem mounted
> using iSCSI.  Such filesystems are treated by systemd as "network"
> filesystem, which seems at least a little bit reasonable.
> So it is "remote-fs" that applies, or more accurately
> "remote-fs-pre.target"
> And nfs-server.services contains:
> 
> Before=remote-fs-pre.target

This is to handle the loopback case?

In which case what it really wants to say is "before nfs mounts" (or
even "before nfs mounts of localhost"; and vice versa on shutdown).  I
can't tell if there's an easy way to get say that.

> So nfsd is likely to start up before the iSCSI filesystems are mounted.
> 
> The customer tried to stop this bt using a systemd drop-in to add
> RequiresMountsFor= for the remote filesystem, but that causes
> a loop with the Before=remote-fs-pre.target.
> 
> I don't think we need this line for sequencing start-up, but it does
> seem to be useful for sequencing shutdown - so that nfs-server is
> stopped after remote-fs-pre, which is stopped after things are
> unmounted.
> "useful", but not "right".  This doesn't stop remote servers from
> shutting down in the wrong order.
> We should probably remove this line and teach systemd to use "umount -f"
> which doesn't block when the server is down.  If systemd just used a
> script, that would be easy....
> 
> I'm not 100% certain that "umount -f" is correct.  We just want to stop
> umount from stating the mountpoint, we don't need to send MNT_FORCE.
> I sometimes think it would be really nice if NFS didn't block a
> 'getattr' request of the mountpoint.  That would remove some pain from
> unmount and other places where the server was down, but probably would
> cause other problem.

I thought I remembered Jeff Layton doing some work to remove the need to
revalidate the mountpoint on unmount in recent kernels.

Is that the only risk, though?  Maybe so--presumably you've killed any
users, so any write data associated with opens should be flushed.  And
if you do a sync after that you take care of write delegations too.

> Does anyone have any opinions on the best way to make sure systemd
> doesn't hang when it tries to umount a filesystem from an unresponsive
> server?  Is "-f" best, or something else.
> 
> 
> 
> There is another issue related to this that I've been meaning to
> mention.  It related to the start-up ordering rather than shut down.
> 
> When you try to mount an NFS filesystem and the server isn't responding,
> mount.nfs retries for a little while and then - if "bg" is given - it
> forks and retries a bit longer.
> While it keeps gets failures that appear temporary, like ECONNREFUSED or
> ETIMEDOUT (see nfs_is_permanent_error()) it keeps retrying.
> 
> There is typically a window between when rpcbind starts responding to
> queries, and when nfsd has registered with it.  If mount.nfs sends an
> rpcbind query in this window. It gets RPC_PROGNOTREGISTERED which
> nfs_rewrite_pmap_mount_options maps to EOPNOTSUPP, and
> nfs_is_permanent_error() thinks that is a permanent error.

Looking at rpcbind(8)....  Shouldn't "-w" prevent this by loading some
registrations before it starts responding to requests?

--b.

> 
> Strangely, when the 'bg' option is used, there is special-case code
> Commit: bf66c9facb8e ("mounts.nfs: v2 and v3 background mounts should retry when server is down.")
> to stop EOPNOTSUPP from being a permanent error.
> 
> Do people think it would be reasonable to make it a transient error for
> foreground mounts too?
> 
> Thanks,
> NeilBrown



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/8] mountd: remove 'dev_missing' checks
  2016-08-16 15:21           ` J. Bruce Fields
@ 2016-08-18  1:32             ` NeilBrown
  2016-08-18  2:57               ` Chuck Lever
  2016-08-18 13:57               ` J. Bruce Fields
  0 siblings, 2 replies; 28+ messages in thread
From: NeilBrown @ 2016-08-18  1:32 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Steve Dickson, Linux NFS Mailing list

[-- Attachment #1: Type: text/plain, Size: 6199 bytes --]

On Wed, Aug 17 2016, J. Bruce Fields wrote:
>
>> It turns out the the customer is NFS-exporting a filesystem mounted
>> using iSCSI.  Such filesystems are treated by systemd as "network"
>> filesystem, which seems at least a little bit reasonable.
>> So it is "remote-fs" that applies, or more accurately
>> "remote-fs-pre.target"
>> And nfs-server.services contains:
>> 
>> Before=remote-fs-pre.target
>
> This is to handle the loopback case?

That is what the git-commit says.  Specifically to handle the
shutdown/unmount side.

>
> In which case what it really wants to say is "before nfs mounts" (or
> even "before nfs mounts of localhost"; and vice versa on shutdown).  I
> can't tell if there's an easy way to get say that.

I'd be happy with a difficult/complex way, if it was reliable.
Could we write a systemd generator which parses /etc/fstab, determines
all mount points which a loop-back NFS mounts (or even just any NFS
mounts) and creates a drop-in for nfs-server which adds
  Before=mount-point.mount
for each /mount/point.

Could that be reliable?  I might try.


Another generator could process /etc/exports and add
   RequiresMountsFor=/export/point
for every export point.  Maybe skip export points with the "mountpoint"
option as mountd expects those to possibly appear later.
   
>
>> So nfsd is likely to start up before the iSCSI filesystems are mounted.
>> 
>> The customer tried to stop this bt using a systemd drop-in to add
>> RequiresMountsFor= for the remote filesystem, but that causes
>> a loop with the Before=remote-fs-pre.target.
>> 
>> I don't think we need this line for sequencing start-up, but it does
>> seem to be useful for sequencing shutdown - so that nfs-server is
>> stopped after remote-fs-pre, which is stopped after things are
>> unmounted.
>> "useful", but not "right".  This doesn't stop remote servers from
>> shutting down in the wrong order.
>> We should probably remove this line and teach systemd to use "umount -f"
>> which doesn't block when the server is down.  If systemd just used a
>> script, that would be easy....
>> 
>> I'm not 100% certain that "umount -f" is correct.  We just want to stop
>> umount from stating the mountpoint, we don't need to send MNT_FORCE.
>> I sometimes think it would be really nice if NFS didn't block a
>> 'getattr' request of the mountpoint.  That would remove some pain from
>> unmount and other places where the server was down, but probably would
>> cause other problem.
>
> I thought I remembered Jeff Layton doing some work to remove the need to
> revalidate the mountpoint on unmount in recent kernels.

Jeff's work means that the umount systemcall won't revalidate the mount
point.  This is important and useful, but not sufficient.
/usr/bin/mount will 'lstat' the mountpoint (unless -f or -l is given).
That is what causes the problem.
mount mostly want to make sure it has a canonical name.  It should be
possible to get it to avoid the stat if the name can be determined to
be canonical some other way.  Just looking to see if it is in
/proc/mounts would work, but there are comments in the code about
/proc/mounts sometimes being very large, and not wanting to do that too
much.... need to look again.

>
> Is that the only risk, though?  Maybe so--presumably you've killed any
> users, so any write data associated with opens should be flushed.  And
> if you do a sync after that you take care of write delegations too.

In the easily reproducible case, all user processes are gone.
It would be worth checking what happens if processes are accessing a
filesystem from an unreachable server at shutdown.
"kill -9" should get rid of them all now, so it might be OK.
"sync" would hang though.  I'd be happy for that to cause a delay of a
minute or so, but hopefully systemd would (or could be told to) kill -9
a sync if it took too long.

>
>> Does anyone have any opinions on the best way to make sure systemd
>> doesn't hang when it tries to umount a filesystem from an unresponsive
>> server?  Is "-f" best, or something else.
>> 
>> 
>> 
>> There is another issue related to this that I've been meaning to
>> mention.  It related to the start-up ordering rather than shut down.
>> 
>> When you try to mount an NFS filesystem and the server isn't responding,
>> mount.nfs retries for a little while and then - if "bg" is given - it
>> forks and retries a bit longer.
>> While it keeps gets failures that appear temporary, like ECONNREFUSED or
>> ETIMEDOUT (see nfs_is_permanent_error()) it keeps retrying.
>> 
>> There is typically a window between when rpcbind starts responding to
>> queries, and when nfsd has registered with it.  If mount.nfs sends an
>> rpcbind query in this window. It gets RPC_PROGNOTREGISTERED which
>> nfs_rewrite_pmap_mount_options maps to EOPNOTSUPP, and
>> nfs_is_permanent_error() thinks that is a permanent error.
>
> Looking at rpcbind(8)....  Shouldn't "-w" prevent this by loading some
> registrations before it starts responding to requests?

"-w" (which isn't listed in the SYNOPSIS!) only applies to a warm-start
where the daemons which previously registered are still running.
The problem case is that the daemons haven't registered yet (so we don't
necessarily know what port number they will get).

To address the issue in rpcbind, we would need a flag to say "don't
respond to lookup requests, just accept registrations", then when all
registrations are complete, send some message to rpcbind to say "OK,
respond to lookups now".  That could even be done by killing and
restarting with "-w", though that it a bit ugly.

I'm leaning towards having mount retry after RPC_PROGNOTREGISTERED for
fg like it does with bg.

Thanks,
NeilBrown


>
> --b.
>
>> 
>> Strangely, when the 'bg' option is used, there is special-case code
>> Commit: bf66c9facb8e ("mounts.nfs: v2 and v3 background mounts should retry when server is down.")
>> to stop EOPNOTSUPP from being a permanent error.
>> 
>> Do people think it would be reasonable to make it a transient error for
>> foreground mounts too?
>> 
>> Thanks,
>> NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/8] mountd: remove 'dev_missing' checks
  2016-08-18  1:32             ` NeilBrown
@ 2016-08-18  2:57               ` Chuck Lever
  2016-08-19  1:31                 ` NeilBrown
  2016-08-18 13:57               ` J. Bruce Fields
  1 sibling, 1 reply; 28+ messages in thread
From: Chuck Lever @ 2016-08-18  2:57 UTC (permalink / raw)
  To: NeilBrown; +Cc: J. Bruce Fields, Steve Dickson, Linux NFS Mailing List


> On Aug 17, 2016, at 9:32 PM, NeilBrown <neilb@suse.com> wrote:
> 
> On Wed, Aug 17 2016, J. Bruce Fields wrote:
>>> 
>>> 
>>> There is another issue related to this that I've been meaning to
>>> mention.  It related to the start-up ordering rather than shut down.
>>> 
>>> When you try to mount an NFS filesystem and the server isn't responding,
>>> mount.nfs retries for a little while and then - if "bg" is given - it
>>> forks and retries a bit longer.
>>> While it keeps gets failures that appear temporary, like ECONNREFUSED or
>>> ETIMEDOUT (see nfs_is_permanent_error()) it keeps retrying.
>>> 
>>> There is typically a window between when rpcbind starts responding to
>>> queries, and when nfsd has registered with it.  If mount.nfs sends an
>>> rpcbind query in this window. It gets RPC_PROGNOTREGISTERED which
>>> nfs_rewrite_pmap_mount_options maps to EOPNOTSUPP, and
>>> nfs_is_permanent_error() thinks that is a permanent error.
>> 
>> Looking at rpcbind(8)....  Shouldn't "-w" prevent this by loading some
>> registrations before it starts responding to requests?
> 
> "-w" (which isn't listed in the SYNOPSIS!) only applies to a warm-start
> where the daemons which previously registered are still running.
> The problem case is that the daemons haven't registered yet (so we don't
> necessarily know what port number they will get).
> 
> To address the issue in rpcbind, we would need a flag to say "don't
> respond to lookup requests, just accept registrations", then when all
> registrations are complete, send some message to rpcbind to say "OK,
> respond to lookups now".  That could even be done by killing and
> restarting with "-w", though that it a bit ugly.

An alternative would be to create a temporary firewall rule that
blocked port 111 to remote connections. Once local RPC services
had registered, the rule is removed.

Just a thought.


> I'm leaning towards having mount retry after RPC_PROGNOTREGISTERED for
> fg like it does with bg.

It probably should do that. If rpcbind is up, then the other
services are probably on their way.


--
Chuck Lever




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/8] mountd: remove 'dev_missing' checks
  2016-08-18  1:32             ` NeilBrown
  2016-08-18  2:57               ` Chuck Lever
@ 2016-08-18 13:57               ` J. Bruce Fields
  2016-08-19  1:28                 ` NeilBrown
  1 sibling, 1 reply; 28+ messages in thread
From: J. Bruce Fields @ 2016-08-18 13:57 UTC (permalink / raw)
  To: NeilBrown; +Cc: Steve Dickson, Linux NFS Mailing list

Not really arguing--I'll trust your judgement--just some random ideas:

On Thu, Aug 18, 2016 at 11:32:52AM +1000, NeilBrown wrote:
> On Wed, Aug 17 2016, J. Bruce Fields wrote:
> > In which case what it really wants to say is "before nfs mounts" (or
> > even "before nfs mounts of localhost"; and vice versa on shutdown).  I
> > can't tell if there's an easy way to get say that.
> 
> I'd be happy with a difficult/complex way, if it was reliable.
> Could we write a systemd generator which parses /etc/fstab, determines
> all mount points which a loop-back NFS mounts (or even just any NFS
> mounts) and creates a drop-in for nfs-server which adds
>   Before=mount-point.mount
> for each /mount/point.
> 
> Could that be reliable?  I might try.

Digging around... we've also got this callout from mount to start-statd,
can we use something like that to make loopback nfs mounts wait on nfs
server startup?

> > Is that the only risk, though?  Maybe so--presumably you've killed any
> > users, so any write data associated with opens should be flushed.  And
> > if you do a sync after that you take care of write delegations too.
> 
> In the easily reproducible case, all user processes are gone.
> It would be worth checking what happens if processes are accessing a
> filesystem from an unreachable server at shutdown.
> "kill -9" should get rid of them all now, so it might be OK.
> "sync" would hang though.  I'd be happy for that to cause a delay of a
> minute or so, but hopefully systemd would (or could be told to) kill -9
> a sync if it took too long.

We shouldn't have to resort to that in the loopback nfs case, where we
control ordering.  So in that case, I'm just pointing out that:

	kill -9 all users of the filesystem
	shutdown nfs server
	umount nfs filesystems

isn't the right ordering, because in the presence of write delegations
there could still be writeback data.

(OK, actually, knfsd doesn't currently implement write delegations--but
we shouldn't depend on that assumption.)

Adding a sync between the first two steps might help, though the write
delegations themselves could still linger, and I don't know how the
client will behave when it finds it can't return them.

So it'd be nice if we could just order the umount before the server
shutdown.

The case of a remote server shut down too early is different of course.

> > Looking at rpcbind(8)....  Shouldn't "-w" prevent this by loading some
> > registrations before it starts responding to requests?
> 
> "-w" (which isn't listed in the SYNOPSIS!) only applies to a warm-start
> where the daemons which previously registered are still running.
> The problem case is that the daemons haven't registered yet (so we don't
> necessarily know what port number they will get).

We probably know the port in the specific case of nfsd, and could fake
up rpcbind's state file if necessary.  Eh, your idea's not as bad:

> To address the issue in rpcbind, we would need a flag to say "don't
> respond to lookup requests, just accept registrations", then when all
> registrations are complete, send some message to rpcbind to say "OK,
> respond to lookups now".  That could even be done by killing and
> restarting with "-w", though that it a bit ugly.
> 
> I'm leaning towards having mount retry after RPC_PROGNOTREGISTERED for
> fg like it does with bg.

Anyway, sounds OK to me.

--b.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/8] mountd: remove 'dev_missing' checks
  2016-08-18 13:57               ` J. Bruce Fields
@ 2016-08-19  1:28                 ` NeilBrown
  2016-08-19 17:27                   ` J. Bruce Fields
  0 siblings, 1 reply; 28+ messages in thread
From: NeilBrown @ 2016-08-19  1:28 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Steve Dickson, Linux NFS Mailing list

[-- Attachment #1: Type: text/plain, Size: 3999 bytes --]

On Thu, Aug 18 2016, J. Bruce Fields wrote:

> Not really arguing--I'll trust your judgement--just some random ideas:
>
> On Thu, Aug 18, 2016 at 11:32:52AM +1000, NeilBrown wrote:
>> On Wed, Aug 17 2016, J. Bruce Fields wrote:
>> > In which case what it really wants to say is "before nfs mounts" (or
>> > even "before nfs mounts of localhost"; and vice versa on shutdown).  I
>> > can't tell if there's an easy way to get say that.
>> 
>> I'd be happy with a difficult/complex way, if it was reliable.
>> Could we write a systemd generator which parses /etc/fstab, determines
>> all mount points which a loop-back NFS mounts (or even just any NFS
>> mounts) and creates a drop-in for nfs-server which adds
>>   Before=mount-point.mount
>> for each /mount/point.
>> 
>> Could that be reliable?  I might try.
>
> Digging around... we've also got this callout from mount to start-statd,
> can we use something like that to make loopback nfs mounts wait on nfs
> server startup?

An nfs mount already waits for the server to start up.  The ordering
dependency between NFS mounts and the nfs-server only really matters at
shutdown, and we cannot enhance mount.nfs to wait for a negative amount
of time (also known as "time travel")

>
>> > Is that the only risk, though?  Maybe so--presumably you've killed any
>> > users, so any write data associated with opens should be flushed.  And
>> > if you do a sync after that you take care of write delegations too.
>> 
>> In the easily reproducible case, all user processes are gone.
>> It would be worth checking what happens if processes are accessing a
>> filesystem from an unreachable server at shutdown.
>> "kill -9" should get rid of them all now, so it might be OK.
>> "sync" would hang though.  I'd be happy for that to cause a delay of a
>> minute or so, but hopefully systemd would (or could be told to) kill -9
>> a sync if it took too long.
>
> We shouldn't have to resort to that in the loopback nfs case, where we
> control ordering.  So in that case, I'm just pointing out that:
>
> 	kill -9 all users of the filesystem
> 	shutdown nfs server
> 	umount nfs filesystems
>
> isn't the right ordering, because in the presence of write delegations
> there could still be writeback data.

Yes, that does make a good case for getting the ordering right, rather
than just getting the shutdown-sequence not to block.  Thanks,

>
> (OK, actually, knfsd doesn't currently implement write delegations--but
> we shouldn't depend on that assumption.)
>
> Adding a sync between the first two steps might help, though the write
> delegations themselves could still linger, and I don't know how the
> client will behave when it finds it can't return them.
>
> So it'd be nice if we could just order the umount before the server
> shutdown.
>
> The case of a remote server shut down too early is different of course.
>
>> > Looking at rpcbind(8)....  Shouldn't "-w" prevent this by loading some
>> > registrations before it starts responding to requests?
>> 
>> "-w" (which isn't listed in the SYNOPSIS!) only applies to a warm-start
>> where the daemons which previously registered are still running.
>> The problem case is that the daemons haven't registered yet (so we don't
>> necessarily know what port number they will get).
>
> We probably know the port in the specific case of nfsd, and could fake
> up rpcbind's state file if necessary.  Eh, your idea's not as bad:
>
>> To address the issue in rpcbind, we would need a flag to say "don't
>> respond to lookup requests, just accept registrations", then when all
>> registrations are complete, send some message to rpcbind to say "OK,
>> respond to lookups now".  That could even be done by killing and
>> restarting with "-w", though that it a bit ugly.
>> 
>> I'm leaning towards having mount retry after RPC_PROGNOTREGISTERED for
>> fg like it does with bg.
>
> Anyway, sounds OK to me.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/8] mountd: remove 'dev_missing' checks
  2016-08-18  2:57               ` Chuck Lever
@ 2016-08-19  1:31                 ` NeilBrown
  0 siblings, 0 replies; 28+ messages in thread
From: NeilBrown @ 2016-08-19  1:31 UTC (permalink / raw)
  To: Chuck Lever; +Cc: J. Bruce Fields, Steve Dickson, Linux NFS Mailing List

[-- Attachment #1: Type: text/plain, Size: 2519 bytes --]

On Thu, Aug 18 2016, Chuck Lever wrote:

>> On Aug 17, 2016, at 9:32 PM, NeilBrown <neilb@suse.com> wrote:
>> 
>> On Wed, Aug 17 2016, J. Bruce Fields wrote:
>>>> 
>>>> 
>>>> There is another issue related to this that I've been meaning to
>>>> mention.  It related to the start-up ordering rather than shut down.
>>>> 
>>>> When you try to mount an NFS filesystem and the server isn't responding,
>>>> mount.nfs retries for a little while and then - if "bg" is given - it
>>>> forks and retries a bit longer.
>>>> While it keeps gets failures that appear temporary, like ECONNREFUSED or
>>>> ETIMEDOUT (see nfs_is_permanent_error()) it keeps retrying.
>>>> 
>>>> There is typically a window between when rpcbind starts responding to
>>>> queries, and when nfsd has registered with it.  If mount.nfs sends an
>>>> rpcbind query in this window. It gets RPC_PROGNOTREGISTERED which
>>>> nfs_rewrite_pmap_mount_options maps to EOPNOTSUPP, and
>>>> nfs_is_permanent_error() thinks that is a permanent error.
>>> 
>>> Looking at rpcbind(8)....  Shouldn't "-w" prevent this by loading some
>>> registrations before it starts responding to requests?
>> 
>> "-w" (which isn't listed in the SYNOPSIS!) only applies to a warm-start
>> where the daemons which previously registered are still running.
>> The problem case is that the daemons haven't registered yet (so we don't
>> necessarily know what port number they will get).
>> 
>> To address the issue in rpcbind, we would need a flag to say "don't
>> respond to lookup requests, just accept registrations", then when all
>> registrations are complete, send some message to rpcbind to say "OK,
>> respond to lookups now".  That could even be done by killing and
>> restarting with "-w", though that it a bit ugly.
>
> An alternative would be to create a temporary firewall rule that
> blocked port 111 to remote connections. Once local RPC services
> had registered, the rule is removed.
>
> Just a thought.

Interesting ..... probably the sort of thing that I would resort to if I
really needed to fix this and didn't have any source code.
But fiddling with fire-wall rules is not one of my favourite things so I
think I stick with a more focussed solution.
Thanks!

NeilBrown


>
>
>> I'm leaning towards having mount retry after RPC_PROGNOTREGISTERED for
>> fg like it does with bg.
>
> It probably should do that. If rpcbind is up, then the other
> services are probably on their way.
>
>
> --
> Chuck Lever

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/8] mountd: remove 'dev_missing' checks
  2016-08-19  1:28                 ` NeilBrown
@ 2016-08-19 17:27                   ` J. Bruce Fields
  0 siblings, 0 replies; 28+ messages in thread
From: J. Bruce Fields @ 2016-08-19 17:27 UTC (permalink / raw)
  To: NeilBrown; +Cc: Steve Dickson, Linux NFS Mailing list

On Fri, Aug 19, 2016 at 11:28:30AM +1000, NeilBrown wrote:
> On Thu, Aug 18 2016, J. Bruce Fields wrote:
> 
> > Not really arguing--I'll trust your judgement--just some random ideas:
> >
> > On Thu, Aug 18, 2016 at 11:32:52AM +1000, NeilBrown wrote:
> >> On Wed, Aug 17 2016, J. Bruce Fields wrote:
> >> > In which case what it really wants to say is "before nfs mounts" (or
> >> > even "before nfs mounts of localhost"; and vice versa on shutdown).  I
> >> > can't tell if there's an easy way to get say that.
> >> 
> >> I'd be happy with a difficult/complex way, if it was reliable.
> >> Could we write a systemd generator which parses /etc/fstab, determines
> >> all mount points which a loop-back NFS mounts (or even just any NFS
> >> mounts) and creates a drop-in for nfs-server which adds
> >>   Before=mount-point.mount
> >> for each /mount/point.
> >> 
> >> Could that be reliable?  I might try.
> >
> > Digging around... we've also got this callout from mount to start-statd,
> > can we use something like that to make loopback nfs mounts wait on nfs
> > server startup?
> 
> An nfs mount already waits for the server to start up.  The ordering
> dependency between NFS mounts and the nfs-server only really matters at
> shutdown, and we cannot enhance mount.nfs to wait for a negative amount
> of time (also known as "time travel")

D'oh, I keep forgetting that point.

--b.

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2016-08-19 17:27 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-14  2:26 [PATCH 0/8] Assorted mount-related nfs-utils patches NeilBrown
2016-07-14  2:26 ` [PATCH 3/8] mountd: remove 'dev_missing' checks NeilBrown
2016-07-18 20:01   ` J. Bruce Fields
2016-07-19 22:50     ` NeilBrown
2016-07-21 17:24       ` J. Bruce Fields
2016-08-11  2:51         ` NeilBrown
2016-08-16 15:21           ` J. Bruce Fields
2016-08-18  1:32             ` NeilBrown
2016-08-18  2:57               ` Chuck Lever
2016-08-19  1:31                 ` NeilBrown
2016-08-18 13:57               ` J. Bruce Fields
2016-08-19  1:28                 ` NeilBrown
2016-08-19 17:27                   ` J. Bruce Fields
2016-07-14  2:26 ` [PATCH 6/8] mountd: don't add paths to non-mounted export points to pseudo-root NeilBrown
2016-07-18 20:32   ` J. Bruce Fields
2016-07-19  8:00     ` Chuck Lever
2016-07-19 22:59     ` NeilBrown
2016-07-21 17:33       ` J. Bruce Fields
2016-07-25  7:22         ` NeilBrown
2016-07-28 20:54           ` J. Bruce Fields
2016-07-14  2:26 ` [PATCH 4/8] mountd: cause attempts to access unmounted exportpoints to return ESTALE NeilBrown
2016-07-14  2:26 ` [PATCH 2/8] mountd: remove the --exports-file option NeilBrown
2016-07-18 16:19   ` J. Bruce Fields
2016-07-14  2:26 ` [PATCH 1/8] nfs.man: clarify effect of 'retry' option NeilBrown
2016-07-14  2:26 ` [PATCH 5/8] mountd: Don't export unmounted exports to NFSv4 NeilBrown
2016-07-14  2:26 ` [PATCH 7/8] mount: don't treat temporary name resolution failure as permanent NeilBrown
2016-07-19 23:01   ` NeilBrown
2016-07-14  2:26 ` [PATCH 8/8] mount: use a public address for IPv6 callback NeilBrown

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.