All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] sunrpc: wake up SOFTCONN tasks when a connection error happens.
@ 2011-11-07  4:06 NeilBrown
  2011-11-07 14:56 ` Chuck Lever
  2011-11-07 19:49 ` Trond Myklebust
  0 siblings, 2 replies; 4+ messages in thread
From: NeilBrown @ 2011-11-07  4:06 UTC (permalink / raw)
  To: Trond Myklebust, Chuck Lever, Jeff Layton; +Cc: NFS

[-- Attachment #1: Type: text/plain, Size: 4378 bytes --]


hi all,
 It being over a year since I last raised this I thought it might be time to
 try again.

 The problem is that an NFSv4 mount request (the default) to an unrouteable
 server results in a 3 minute timeout instead of an instant failure.

 This is easy to test by simply removing your default route then trying to
 mount something outside your local network.

 This patch causes any SOFTCONN task to be woken up as soon as a connection
 error occurs so that it can fail promptly.  The failure reasons gets passed
 back and as it is not ETIMEDOUT it causes immediate failure.

 Is this a reasonable approach?

Thanks,
NeilBrown




From a1aea8fc3977ffa9951c3d7f27dbb1905e5f560f Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Mon, 7 Nov 2011 15:00:17 +1100
Subject: [PATCH] sunrpc: wake up SOFTCONN tasks when a connection error
 happens.

A 'SOFTCONN' task should fail if there is an error or a major timeout
during connection.

However errors are currently converted into a timeout (60seconds for
TCP) which is treated as a minor timeout and 3 of these are required
before failure.

The result of this is that if you try to mount an NFSv4 filesystem
(which doesn't require rpcbind and the failure modes that provides)
from a server which you do not have a route to (an so get
NETUNREACHABLE), you have an unnecessary 3 minutes timeout.

So when ENETUNREACH is reported for a connection - or other errors
which are fatal, wake up any SOFTCONN tasks with that error - rather
than letting them wait 60 seconds and then generate ETIMEDOUT.

This causes the above mentioned mount attempt to fail instantly.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 include/linux/sunrpc/sched.h |    1 +
 net/sunrpc/sched.c           |   29 +++++++++++++++++++++++++++++
 net/sunrpc/xprtsock.c        |    6 +++++-
 3 files changed, 35 insertions(+), 1 deletions(-)

diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
index e775689..b85451b 100644
--- a/include/linux/sunrpc/sched.h
+++ b/include/linux/sunrpc/sched.h
@@ -236,6 +236,7 @@ void		rpc_wake_up_queued_task(struct rpc_wait_queue *,
 void		rpc_wake_up(struct rpc_wait_queue *);
 struct rpc_task *rpc_wake_up_next(struct rpc_wait_queue *);
 void		rpc_wake_up_status(struct rpc_wait_queue *, int);
+void		rpc_wake_up_softconn_status(struct rpc_wait_queue *, int);
 int		rpc_queue_empty(struct rpc_wait_queue *);
 void		rpc_delay(struct rpc_task *, unsigned long);
 void *		rpc_malloc(struct rpc_task *, size_t);
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index d12ffa5..d92000a 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -543,6 +543,35 @@ void rpc_wake_up_status(struct rpc_wait_queue *queue, int status)
 }
 EXPORT_SYMBOL_GPL(rpc_wake_up_status);
 
+/**
+ * rpc_wake_up_softconn_status - wake up all SOFTCONN rpc_tasks and set their
+ * status value.
+ * @queue: rpc_wait_queue on which the tasks are sleeping
+ * @status: status value to set
+ *
+ * Grabs queue->lock
+ */
+void rpc_wake_up_softconn_status(struct rpc_wait_queue *queue, int status)
+{
+	struct rpc_task *task, *next;
+	struct list_head *head;
+
+	spin_lock_bh(&queue->lock);
+	head = &queue->tasks[queue->maxpriority];
+	for (;;) {
+		list_for_each_entry_safe(task, next, head, u.tk_wait.list)
+			if (RPC_IS_SOFTCONN(task)) {
+				task->tk_status = status;
+				rpc_wake_up_task_queue_locked(queue, task);
+			}
+		if (head == &queue->tasks[0])
+			break;
+		head--;
+	}
+	spin_unlock_bh(&queue->lock);
+}
+EXPORT_SYMBOL_GPL(rpc_wake_up_softconn_status);
+
 static void __rpc_queue_timer_fn(unsigned long ptr)
 {
 	struct rpc_wait_queue *queue = (struct rpc_wait_queue *)ptr;
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index d7f97ef..02c683b 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -2158,7 +2158,11 @@ static void xs_tcp_setup_socket(struct work_struct *work)
 	case -ECONNREFUSED:
 	case -ECONNRESET:
 	case -ENETUNREACH:
-		/* retry with existing socket, after a delay */
+		/* Retry with existing socket after a delay, except
+		 * for SOFTCONN tasks which fail. */
+		xprt_clear_connecting(xprt);
+		rpc_wake_up_softconn_status(&xprt->pending, status);
+		return;
 	case 0:
 	case -EINPROGRESS:
 	case -EALREADY:
-- 
1.7.7


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] sunrpc: wake up SOFTCONN tasks when a connection error happens.
  2011-11-07  4:06 [PATCH] sunrpc: wake up SOFTCONN tasks when a connection error happens NeilBrown
@ 2011-11-07 14:56 ` Chuck Lever
  2011-11-07 19:49 ` Trond Myklebust
  1 sibling, 0 replies; 4+ messages in thread
From: Chuck Lever @ 2011-11-07 14:56 UTC (permalink / raw)
  To: NeilBrown; +Cc: Trond Myklebust, Jeff Layton, NFS


On Nov 6, 2011, at 11:06 PM, NeilBrown wrote:

> 
> hi all,
> It being over a year since I last raised this I thought it might be time to
> try again.
> 
> The problem is that an NFSv4 mount request (the default) to an unrouteable
> server results in a 3 minute timeout instead of an instant failure.
> 
> This is easy to test by simply removing your default route then trying to
> mount something outside your local network.

Awesome, thanks for the simple reproducer!

> This patch causes any SOFTCONN task to be woken up as soon as a connection
> error occurs so that it can fail promptly.  The failure reasons gets passed
> back and as it is not ETIMEDOUT it causes immediate failure.
> 
> Is this a reasonable approach?

I don't have a philosophical objection to this.  However, Trond knows the architecture of TCP reconnection the best; I can't comment on the choice of returning the actual errno instead of ETIMEDOUT.

> Thanks,
> NeilBrown
> 
> 
> 
> 
> From a1aea8fc3977ffa9951c3d7f27dbb1905e5f560f Mon Sep 17 00:00:00 2001
> From: NeilBrown <neilb@suse.de>
> Date: Mon, 7 Nov 2011 15:00:17 +1100
> Subject: [PATCH] sunrpc: wake up SOFTCONN tasks when a connection error
> happens.
> 
> A 'SOFTCONN' task should fail if there is an error or a major timeout
> during connection.
> 
> However errors are currently converted into a timeout (60seconds for
> TCP) which is treated as a minor timeout and 3 of these are required
> before failure.
> 
> The result of this is that if you try to mount an NFSv4 filesystem
> (which doesn't require rpcbind and the failure modes that provides)
> from a server which you do not have a route to (an so get
> NETUNREACHABLE), you have an unnecessary 3 minutes timeout.
> 
> So when ENETUNREACH is reported for a connection - or other errors
> which are fatal, wake up any SOFTCONN tasks with that error - rather
> than letting them wait 60 seconds and then generate ETIMEDOUT.
> 
> This causes the above mentioned mount attempt to fail instantly.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
> include/linux/sunrpc/sched.h |    1 +
> net/sunrpc/sched.c           |   29 +++++++++++++++++++++++++++++
> net/sunrpc/xprtsock.c        |    6 +++++-
> 3 files changed, 35 insertions(+), 1 deletions(-)
> 
> diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
> index e775689..b85451b 100644
> --- a/include/linux/sunrpc/sched.h
> +++ b/include/linux/sunrpc/sched.h
> @@ -236,6 +236,7 @@ void		rpc_wake_up_queued_task(struct rpc_wait_queue *,
> void		rpc_wake_up(struct rpc_wait_queue *);
> struct rpc_task *rpc_wake_up_next(struct rpc_wait_queue *);
> void		rpc_wake_up_status(struct rpc_wait_queue *, int);
> +void		rpc_wake_up_softconn_status(struct rpc_wait_queue *, int);
> int		rpc_queue_empty(struct rpc_wait_queue *);
> void		rpc_delay(struct rpc_task *, unsigned long);
> void *		rpc_malloc(struct rpc_task *, size_t);
> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> index d12ffa5..d92000a 100644
> --- a/net/sunrpc/sched.c
> +++ b/net/sunrpc/sched.c
> @@ -543,6 +543,35 @@ void rpc_wake_up_status(struct rpc_wait_queue *queue, int status)
> }
> EXPORT_SYMBOL_GPL(rpc_wake_up_status);
> 
> +/**
> + * rpc_wake_up_softconn_status - wake up all SOFTCONN rpc_tasks and set their
> + * status value.
> + * @queue: rpc_wait_queue on which the tasks are sleeping
> + * @status: status value to set
> + *
> + * Grabs queue->lock
> + */
> +void rpc_wake_up_softconn_status(struct rpc_wait_queue *queue, int status)
> +{
> +	struct rpc_task *task, *next;
> +	struct list_head *head;
> +
> +	spin_lock_bh(&queue->lock);
> +	head = &queue->tasks[queue->maxpriority];
> +	for (;;) {
> +		list_for_each_entry_safe(task, next, head, u.tk_wait.list)
> +			if (RPC_IS_SOFTCONN(task)) {
> +				task->tk_status = status;
> +				rpc_wake_up_task_queue_locked(queue, task);
> +			}
> +		if (head == &queue->tasks[0])
> +			break;
> +		head--;
> +	}
> +	spin_unlock_bh(&queue->lock);
> +}
> +EXPORT_SYMBOL_GPL(rpc_wake_up_softconn_status);
> +
> static void __rpc_queue_timer_fn(unsigned long ptr)
> {
> 	struct rpc_wait_queue *queue = (struct rpc_wait_queue *)ptr;
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index d7f97ef..02c683b 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -2158,7 +2158,11 @@ static void xs_tcp_setup_socket(struct work_struct *work)
> 	case -ECONNREFUSED:
> 	case -ECONNRESET:
> 	case -ENETUNREACH:
> -		/* retry with existing socket, after a delay */
> +		/* Retry with existing socket after a delay, except
> +		 * for SOFTCONN tasks which fail. */
> +		xprt_clear_connecting(xprt);
> +		rpc_wake_up_softconn_status(&xprt->pending, status);
> +		return;
> 	case 0:
> 	case -EINPROGRESS:
> 	case -EALREADY:
> -- 
> 1.7.7
> 

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] sunrpc: wake up SOFTCONN tasks when a connection error happens.
  2011-11-07  4:06 [PATCH] sunrpc: wake up SOFTCONN tasks when a connection error happens NeilBrown
  2011-11-07 14:56 ` Chuck Lever
@ 2011-11-07 19:49 ` Trond Myklebust
  2011-11-08  0:13   ` NeilBrown
  1 sibling, 1 reply; 4+ messages in thread
From: Trond Myklebust @ 2011-11-07 19:49 UTC (permalink / raw)
  To: NeilBrown; +Cc: Chuck Lever, Jeff Layton, NFS

On Sun, 2011-11-06 at 20:06 -0800, NeilBrown wrote: 
> hi all,
>  It being over a year since I last raised this I thought it might be time to
>  try again.
> 
>  The problem is that an NFSv4 mount request (the default) to an unrouteable
>  server results in a 3 minute timeout instead of an instant failure.
> 
>  This is easy to test by simply removing your default route then trying to
>  mount something outside your local network.
> 
>  This patch causes any SOFTCONN task to be woken up as soon as a connection
>  error occurs so that it can fail promptly.  The failure reasons gets passed
>  back and as it is not ETIMEDOUT it causes immediate failure.
> 
>  Is this a reasonable approach?
> 
> Thanks,
> NeilBrown
> 
> 
> 
> 
> From a1aea8fc3977ffa9951c3d7f27dbb1905e5f560f Mon Sep 17 00:00:00 2001
> From: NeilBrown <neilb@suse.de>
> Date: Mon, 7 Nov 2011 15:00:17 +1100
> Subject: [PATCH] sunrpc: wake up SOFTCONN tasks when a connection error
>  happens.
> 
> A 'SOFTCONN' task should fail if there is an error or a major timeout
> during connection.
> 
> However errors are currently converted into a timeout (60seconds for
> TCP) which is treated as a minor timeout and 3 of these are required
> before failure.
> 
> The result of this is that if you try to mount an NFSv4 filesystem
> (which doesn't require rpcbind and the failure modes that provides)
> from a server which you do not have a route to (an so get
> NETUNREACHABLE), you have an unnecessary 3 minutes timeout.
> 
> So when ENETUNREACH is reported for a connection - or other errors
> which are fatal, wake up any SOFTCONN tasks with that error - rather
> than letting them wait 60 seconds and then generate ETIMEDOUT.
> 
> This causes the above mentioned mount attempt to fail instantly.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  include/linux/sunrpc/sched.h |    1 +
>  net/sunrpc/sched.c           |   29 +++++++++++++++++++++++++++++
>  net/sunrpc/xprtsock.c        |    6 +++++-
>  3 files changed, 35 insertions(+), 1 deletions(-)
> 
> diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
> index e775689..b85451b 100644
> --- a/include/linux/sunrpc/sched.h
> +++ b/include/linux/sunrpc/sched.h
> @@ -236,6 +236,7 @@ void		rpc_wake_up_queued_task(struct rpc_wait_queue *,
>  void		rpc_wake_up(struct rpc_wait_queue *);
>  struct rpc_task *rpc_wake_up_next(struct rpc_wait_queue *);
>  void		rpc_wake_up_status(struct rpc_wait_queue *, int);
> +void		rpc_wake_up_softconn_status(struct rpc_wait_queue *, int);
>  int		rpc_queue_empty(struct rpc_wait_queue *);
>  void		rpc_delay(struct rpc_task *, unsigned long);
>  void *		rpc_malloc(struct rpc_task *, size_t);
> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> index d12ffa5..d92000a 100644
> --- a/net/sunrpc/sched.c
> +++ b/net/sunrpc/sched.c
> @@ -543,6 +543,35 @@ void rpc_wake_up_status(struct rpc_wait_queue *queue, int status)
>  }
>  EXPORT_SYMBOL_GPL(rpc_wake_up_status);
>  
> +/**
> + * rpc_wake_up_softconn_status - wake up all SOFTCONN rpc_tasks and set their
> + * status value.
> + * @queue: rpc_wait_queue on which the tasks are sleeping
> + * @status: status value to set
> + *
> + * Grabs queue->lock
> + */
> +void rpc_wake_up_softconn_status(struct rpc_wait_queue *queue, int status)
> +{
> +	struct rpc_task *task, *next;
> +	struct list_head *head;
> +
> +	spin_lock_bh(&queue->lock);
> +	head = &queue->tasks[queue->maxpriority];
> +	for (;;) {
> +		list_for_each_entry_safe(task, next, head, u.tk_wait.list)
> +			if (RPC_IS_SOFTCONN(task)) {
> +				task->tk_status = status;
> +				rpc_wake_up_task_queue_locked(queue, task);
> +			}

This is basically rpc_wake_up_status() with an extra conditional test
(which again is just rpc_wake_up() with an extra status argument).
Should we consider merging these functions?

> +		if (head == &queue->tasks[0])
> +			break;
> +		head--;
> +	}
> +	spin_unlock_bh(&queue->lock);
> +}
> +EXPORT_SYMBOL_GPL(rpc_wake_up_softconn_status);

Why do we want to export this?



-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] sunrpc: wake up SOFTCONN tasks when a connection error happens.
  2011-11-07 19:49 ` Trond Myklebust
@ 2011-11-08  0:13   ` NeilBrown
  0 siblings, 0 replies; 4+ messages in thread
From: NeilBrown @ 2011-11-08  0:13 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Chuck Lever, Jeff Layton, NFS

[-- Attachment #1: Type: text/plain, Size: 5092 bytes --]

On Mon, 07 Nov 2011 14:49:03 -0500 Trond Myklebust
<Trond.Myklebust@netapp.com> wrote:

> On Sun, 2011-11-06 at 20:06 -0800, NeilBrown wrote: 
> > hi all,
> >  It being over a year since I last raised this I thought it might be time to
> >  try again.
> > 
> >  The problem is that an NFSv4 mount request (the default) to an unrouteable
> >  server results in a 3 minute timeout instead of an instant failure.
> > 
> >  This is easy to test by simply removing your default route then trying to
> >  mount something outside your local network.
> > 
> >  This patch causes any SOFTCONN task to be woken up as soon as a connection
> >  error occurs so that it can fail promptly.  The failure reasons gets passed
> >  back and as it is not ETIMEDOUT it causes immediate failure.
> > 
> >  Is this a reasonable approach?
> > 
> > Thanks,
> > NeilBrown
> > 
> > 
> > 
> > 
> > From a1aea8fc3977ffa9951c3d7f27dbb1905e5f560f Mon Sep 17 00:00:00 2001
> > From: NeilBrown <neilb@suse.de>
> > Date: Mon, 7 Nov 2011 15:00:17 +1100
> > Subject: [PATCH] sunrpc: wake up SOFTCONN tasks when a connection error
> >  happens.
> > 
> > A 'SOFTCONN' task should fail if there is an error or a major timeout
> > during connection.
> > 
> > However errors are currently converted into a timeout (60seconds for
> > TCP) which is treated as a minor timeout and 3 of these are required
> > before failure.
> > 
> > The result of this is that if you try to mount an NFSv4 filesystem
> > (which doesn't require rpcbind and the failure modes that provides)
> > from a server which you do not have a route to (an so get
> > NETUNREACHABLE), you have an unnecessary 3 minutes timeout.
> > 
> > So when ENETUNREACH is reported for a connection - or other errors
> > which are fatal, wake up any SOFTCONN tasks with that error - rather
> > than letting them wait 60 seconds and then generate ETIMEDOUT.
> > 
> > This causes the above mentioned mount attempt to fail instantly.
> > 
> > Signed-off-by: NeilBrown <neilb@suse.de>
> > ---
> >  include/linux/sunrpc/sched.h |    1 +
> >  net/sunrpc/sched.c           |   29 +++++++++++++++++++++++++++++
> >  net/sunrpc/xprtsock.c        |    6 +++++-
> >  3 files changed, 35 insertions(+), 1 deletions(-)
> > 
> > diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
> > index e775689..b85451b 100644
> > --- a/include/linux/sunrpc/sched.h
> > +++ b/include/linux/sunrpc/sched.h
> > @@ -236,6 +236,7 @@ void		rpc_wake_up_queued_task(struct rpc_wait_queue *,
> >  void		rpc_wake_up(struct rpc_wait_queue *);
> >  struct rpc_task *rpc_wake_up_next(struct rpc_wait_queue *);
> >  void		rpc_wake_up_status(struct rpc_wait_queue *, int);
> > +void		rpc_wake_up_softconn_status(struct rpc_wait_queue *, int);
> >  int		rpc_queue_empty(struct rpc_wait_queue *);
> >  void		rpc_delay(struct rpc_task *, unsigned long);
> >  void *		rpc_malloc(struct rpc_task *, size_t);
> > diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> > index d12ffa5..d92000a 100644
> > --- a/net/sunrpc/sched.c
> > +++ b/net/sunrpc/sched.c
> > @@ -543,6 +543,35 @@ void rpc_wake_up_status(struct rpc_wait_queue *queue, int status)
> >  }
> >  EXPORT_SYMBOL_GPL(rpc_wake_up_status);
> >  
> > +/**
> > + * rpc_wake_up_softconn_status - wake up all SOFTCONN rpc_tasks and set their
> > + * status value.
> > + * @queue: rpc_wait_queue on which the tasks are sleeping
> > + * @status: status value to set
> > + *
> > + * Grabs queue->lock
> > + */
> > +void rpc_wake_up_softconn_status(struct rpc_wait_queue *queue, int status)
> > +{
> > +	struct rpc_task *task, *next;
> > +	struct list_head *head;
> > +
> > +	spin_lock_bh(&queue->lock);
> > +	head = &queue->tasks[queue->maxpriority];
> > +	for (;;) {
> > +		list_for_each_entry_safe(task, next, head, u.tk_wait.list)
> > +			if (RPC_IS_SOFTCONN(task)) {
> > +				task->tk_status = status;
> > +				rpc_wake_up_task_queue_locked(queue, task);
> > +			}
> 
> This is basically rpc_wake_up_status() with an extra conditional test
> (which again is just rpc_wake_up() with an extra status argument).
> Should we consider merging these functions?

I wondered a bit about this, but felt it safest to leave  the code structured
as it was.
You could possibly combine them all into one function with:

 if ((task->tk_flags & flags) == flags) {
    if (status < 0)
        task->tk_status = status;
    rpc_wake_up.....
 }
in the heart of the loop.   Though that would only be right if
rpc_wake_up_status was never called with a zero status (or positive).


> 
> > +		if (head == &queue->tasks[0])
> > +			break;
> > +		head--;
> > +	}
> > +	spin_unlock_bh(&queue->lock);
> > +}
> > +EXPORT_SYMBOL_GPL(rpc_wake_up_softconn_status);
> 
> Why do we want to export this?
>

We probably don't.  It is just a copy/paste artefact.

If you agree with the approach, and suggest how you would like to handle the
proliferation of rpc_wake_up_* I can respin that patch as a formal submission.

Thanks,
NeilBrown




[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2011-11-08  0:13 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-11-07  4:06 [PATCH] sunrpc: wake up SOFTCONN tasks when a connection error happens NeilBrown
2011-11-07 14:56 ` Chuck Lever
2011-11-07 19:49 ` Trond Myklebust
2011-11-08  0:13   ` NeilBrown

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.