linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] Revise handling of some mount errors.
@ 2016-11-30  0:53 NeilBrown
  2016-11-30  0:53 ` [PATCH 2/2] mount: take history into account when assessing if an error is permanent NeilBrown
  2016-11-30  0:53 ` [PATCH 1/2] mount: don't hide temporary error code on timeout NeilBrown
  0 siblings, 2 replies; 5+ messages in thread
From: NeilBrown @ 2016-11-30  0:53 UTC (permalink / raw)
  To: Steve Dickson; +Cc: Linux NFS Mailing List

These two patches improve the error messages reported when mount.nfs
fails, and reduces the timeout for some errors which are unlikely to
persist.

---

NeilBrown (2):
      mount: don't hide temporary error code on timeout.
      mount: take history into account when assessing if an error is permanent.


 utils/mount/stropts.c |   31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

--
Signature


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/2] mount: don't hide temporary error code on timeout.
  2016-11-30  0:53 [PATCH 0/2] Revise handling of some mount errors NeilBrown
  2016-11-30  0:53 ` [PATCH 2/2] mount: take history into account when assessing if an error is permanent NeilBrown
@ 2016-11-30  0:53 ` NeilBrown
  1 sibling, 0 replies; 5+ messages in thread
From: NeilBrown @ 2016-11-30  0:53 UTC (permalink / raw)
  To: Steve Dickson; +Cc: Linux NFS Mailing List

If a mount attempt times out due to repeated non-permanent errors, we
always report ETIMEDOUT rather than the actual error.
Errors like "ECONNREFUSED" or "EHOSTUNREACH" or "ESTALE" might be more
useful than the generic "ETIMEDOUT".

So preserve the error code.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 utils/mount/stropts.c |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/utils/mount/stropts.c b/utils/mount/stropts.c
index d5dfb5e4a669..7b1ad93effc0 100644
--- a/utils/mount/stropts.c
+++ b/utils/mount/stropts.c
@@ -990,10 +990,8 @@ static int nfsmount_fg(struct nfsmount_info *mi)
 		if (nfs_is_permanent_error(errno))
 			break;
 
-		if (time(NULL) > timeout) {
-			errno = ETIMEDOUT;
+		if (time(NULL) > timeout)
 			break;
-		}
 
 		if (errno != ETIMEDOUT) {
 			if (sleep(secs))



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/2] mount: take history into account when assessing if an error is permanent.
  2016-11-30  0:53 [PATCH 0/2] Revise handling of some mount errors NeilBrown
@ 2016-11-30  0:53 ` NeilBrown
  2016-12-01 19:36   ` Steve Dickson
  2016-11-30  0:53 ` [PATCH 1/2] mount: don't hide temporary error code on timeout NeilBrown
  1 sibling, 1 reply; 5+ messages in thread
From: NeilBrown @ 2016-11-30  0:53 UTC (permalink / raw)
  To: Steve Dickson; +Cc: Linux NFS Mailing List

When attempting an NFSv3 mount request, it is possible to catch the
server at an "awkward" moment while it is still starting up.
In these cases it is possible to get an error that would otherwise
indiciate a permanent error, but which should be considered temporary during
the start-up window.

In particular:
 ECONNREFUSED will be returned between the time the network interface
   is configured, and the time that rpcbind starts
 EOPNOTSUPP (representing RPC_PROGNOTREGISTERED) will be returned between
   the time that rpcbind starts and the time when nfsd registers, and
 ESTALE will be returned between the time nfsd starts and when filesystems
   are exported (this windown can be removed with correct configuration).

So these errors only deserve a relatively small timeout.

So change nfs_is_permanent_error() to record the previous error
and the number of times the same error has been seen.
If ESTALE or EOPNOTSUPP is seen 3 times (over 3 seconds or more)
or ECONNREFUSED is seen 5 times (15 seconds), report a permanent
error, others assume it could be temporary.

A result of this is that if you try a UDP mount from a server which
doesn't support UDP, you get an error without a few seconds, rather
than a 2-minute timeout.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 utils/mount/stropts.c |   27 ++++++++++++++++++++++-----
 1 file changed, 22 insertions(+), 5 deletions(-)

diff --git a/utils/mount/stropts.c b/utils/mount/stropts.c
index 7b1ad93effc0..ba6593036a50 100644
--- a/utils/mount/stropts.c
+++ b/utils/mount/stropts.c
@@ -935,20 +935,37 @@ static int nfs_try_mount(struct nfsmount_info *mi)
  * failed so far, but fail immediately if there is a local
  * error (like a bad mount option).
  *
- * ESTALE is also a temporary error because some servers
- * return ESTALE when a share is temporarily offline.
+ * If there is a remote error, like ESTALE or RPC_PROGNOTREGISTERED
+ * then it is probably permanent, but there is a small chance
+ * the it is temporary can we caught the server at an awkward
+ * time during start-up.  So require that we see three of those
+ * before treating them as permanent.
+ * For ECONNREFUSED, wait a bit longer as there is often a longer
+ * gap between the network being ready and the NFS server starting.
  *
  * Returns 1 if we should fail immediately, or 0 if we
  * should retry.
  */
 static int nfs_is_permanent_error(int error)
 {
+	static int prev_error;
+	static int rpt_cnt;
+
+	if (error == prev_error)
+		rpt_cnt += 1;
+	else
+		rpt_cnt = 1;
+	prev_error = error;
+
 	switch (error) {
 	case ESTALE:
-	case ETIMEDOUT:
+	case EOPNOTSUPP:	/* aka RPC_PROGNOTREGISTERED */
+		/* If two in a row, assume permanent */
+		return rpt_cnt >= 3;
 	case ECONNREFUSED:
+		return rpt_cnt >= 5;
+	case ETIMEDOUT:
 	case EHOSTUNREACH:
-	case EOPNOTSUPP:	/* aka RPC_PROGNOTREGISTERED */
 	case EAGAIN:
 		return 0;	/* temporary */
 	default:
@@ -1000,7 +1017,7 @@ static int nfsmount_fg(struct nfsmount_info *mi)
 			if (secs > 10)
 				secs = 10;
 		}
-	};
+	}
 
 	mount_error(mi->spec, mi->node, errno);
 	return EX_FAIL;



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 2/2] mount: take history into account when assessing if an error is permanent.
  2016-11-30  0:53 ` [PATCH 2/2] mount: take history into account when assessing if an error is permanent NeilBrown
@ 2016-12-01 19:36   ` Steve Dickson
  2016-12-01 22:09     ` NeilBrown
  0 siblings, 1 reply; 5+ messages in thread
From: Steve Dickson @ 2016-12-01 19:36 UTC (permalink / raw)
  To: NeilBrown; +Cc: Linux NFS Mailing List



On 11/29/2016 07:53 PM, NeilBrown wrote:
> When attempting an NFSv3 mount request, it is possible to catch the
> server at an "awkward" moment while it is still starting up.
> In these cases it is possible to get an error that would otherwise
> indiciate a permanent error, but which should be considered temporary during
> the start-up window.
> 
> In particular:
>  ECONNREFUSED will be returned between the time the network interface
>    is configured, and the time that rpcbind starts
>  EOPNOTSUPP (representing RPC_PROGNOTREGISTERED) will be returned between
>    the time that rpcbind starts and the time when nfsd registers, and
>  ESTALE will be returned between the time nfsd starts and when filesystems
>    are exported (this windown can be removed with correct configuration).
> 
> So these errors only deserve a relatively small timeout.
> 
> So change nfs_is_permanent_error() to record the previous error
> and the number of times the same error has been seen.
> If ESTALE or EOPNOTSUPP is seen 3 times (over 3 seconds or more)
> or ECONNREFUSED is seen 5 times (15 seconds), report a permanent
> error, others assume it could be temporary.
> 
> A result of this is that if you try a UDP mount from a server which
> doesn't support UDP, you get an error without a few seconds, rather
> than a 2-minute timeout.
> 
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  utils/mount/stropts.c |   27 ++++++++++++++++++++++-----
>  1 file changed, 22 insertions(+), 5 deletions(-)
> 
> diff --git a/utils/mount/stropts.c b/utils/mount/stropts.c
> index 7b1ad93effc0..ba6593036a50 100644
> --- a/utils/mount/stropts.c
> +++ b/utils/mount/stropts.c
> @@ -935,20 +935,37 @@ static int nfs_try_mount(struct nfsmount_info *mi)
>   * failed so far, but fail immediately if there is a local
>   * error (like a bad mount option).
>   *
> - * ESTALE is also a temporary error because some servers
> - * return ESTALE when a share is temporarily offline.
> + * If there is a remote error, like ESTALE or RPC_PROGNOTREGISTERED
> + * then it is probably permanent, but there is a small chance
> + * the it is temporary can we caught the server at an awkward
> + * time during start-up.  So require that we see three of those
> + * before treating them as permanent.
> + * For ECONNREFUSED, wait a bit longer as there is often a longer
> + * gap between the network being ready and the NFS server starting.
>   *
>   * Returns 1 if we should fail immediately, or 0 if we
>   * should retry.
>   */
>  static int nfs_is_permanent_error(int error)
>  {
> +	static int prev_error;
> +	static int rpt_cnt;
> +
> +	if (error == prev_error)
> +		rpt_cnt += 1;
> +	else
> +		rpt_cnt = 1;
> +	prev_error = error;
> +
>  	switch (error) {
>  	case ESTALE:
> -	case ETIMEDOUT:
> +	case EOPNOTSUPP:	/* aka RPC_PROGNOTREGISTERED */
> +		/* If two in a row, assume permanent */
> +		return rpt_cnt >= 3;
This looks good... very clean... and now the time
out is control by the -o retry setting... Perfect!

>  	case ECONNREFUSED:
> +		return rpt_cnt >= 5;
My only question is why mess with this? Over the years 
we've always time out after 2 mins (the default) and now 
we are bring that down to 15 secs? I'm think that might
be a bit short... I seems like we might be fixing something
that is not broken... 

But again the design is very clever... nice work! 

steved.

> +	case ETIMEDOUT:
>  	case EHOSTUNREACH:
> -	case EOPNOTSUPP:	/* aka RPC_PROGNOTREGISTERED */
>  	case EAGAIN:
>  		return 0;	/* temporary */
>  	default:
> @@ -1000,7 +1017,7 @@ static int nfsmount_fg(struct nfsmount_info *mi)
>  			if (secs > 10)
>  				secs = 10;
>  		}
> -	};
> +	}
>  
>  	mount_error(mi->spec, mi->node, errno);
>  	return EX_FAIL;
> 
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 2/2] mount: take history into account when assessing if an error is permanent.
  2016-12-01 19:36   ` Steve Dickson
@ 2016-12-01 22:09     ` NeilBrown
  0 siblings, 0 replies; 5+ messages in thread
From: NeilBrown @ 2016-12-01 22:09 UTC (permalink / raw)
  To: Steve Dickson; +Cc: Linux NFS Mailing List

[-- Attachment #1: Type: text/plain, Size: 1265 bytes --]

On Fri, Dec 02 2016, Steve Dickson wrote:
>>  static int nfs_is_permanent_error(int error)
>>  {
>> +	static int prev_error;
>> +	static int rpt_cnt;
>> +
>> +	if (error == prev_error)
>> +		rpt_cnt += 1;
>> +	else
>> +		rpt_cnt = 1;
>> +	prev_error = error;
>> +
>>  	switch (error) {
>>  	case ESTALE:
>> -	case ETIMEDOUT:
>> +	case EOPNOTSUPP:	/* aka RPC_PROGNOTREGISTERED */
>> +		/* If two in a row, assume permanent */
>> +		return rpt_cnt >= 3;
> This looks good... very clean... and now the time
> out is control by the -o retry setting... Perfect!
>
>>  	case ECONNREFUSED:
>> +		return rpt_cnt >= 5;
> My only question is why mess with this? Over the years 
> we've always time out after 2 mins (the default) and now 
> we are bring that down to 15 secs? I'm think that might
> be a bit short... I seems like we might be fixing something
> that is not broken... 

Because it seemed like the "right" thing to do.  ECONNREFUSED should be
a "permanent" error, except during a small window when the server is
booting.
But following the principle "if it ain't broke, don't fix it", I guess
it should stay as a "temporary" error.  I'll resend.

>
> But again the design is very clever... nice work! 

Thanks!

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-12-01 22:10 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-30  0:53 [PATCH 0/2] Revise handling of some mount errors NeilBrown
2016-11-30  0:53 ` [PATCH 2/2] mount: take history into account when assessing if an error is permanent NeilBrown
2016-12-01 19:36   ` Steve Dickson
2016-12-01 22:09     ` NeilBrown
2016-11-30  0:53 ` [PATCH 1/2] mount: don't hide temporary error code on timeout NeilBrown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).