NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts.

All of lore.kernel.org
 help / color / mirror / Atom feed

* NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts.
@ 2010-10-20  7:17 Neil Brown
  2010-10-20 14:29 ` Chuck Lever
  2010-10-20 17:55 ` Jeff Layton
  0 siblings, 2 replies; 20+ messages in thread
From: Neil Brown @ 2010-10-20  7:17 UTC (permalink / raw)
  To: Linux NFS Mailing List

If I don't have any network configured (except loop-back), and try an NFSv3
mount, then it fails quickly:

....
mount.nfs: portmap query failed: RPC: Remote system error - Network is unreachable
mount.nfs: Network is unreachable

If I try the same thing with a NFSv4 mount, it times out before it fails,
making a much longer delay.

This is because mount.nfs doesn't do a portmap lookup but just leaves
everything to the kernel.
The kernel does an 'rpc_ping()' which sets RPC_TASK_SOFTCONN.
So at least it doesn't retry after the timeout.  But given that we have a
clear error, we shouldn't timeout at all.

Unfortunately I cannot see an easy way to fix this.

The place where ENETUNREACH is in xs_tcp_setup_socket.  The comment there
says "Retry with the same socket after a delay".  The "delay" bit is correct,
the "retry" isn't.

It would seem that we should just add a 'goto out' there if RPC_TASK_SOFTCONN
was set.  However we cannot see the task at this point - in fact it seems
that there could be a queue of tasks waiting on this connection.  I guess
some could be soft, and some not. ???

So: An suggestions how to get a ENETUNREACH (or ECONNREFUSED or similar) to
fail immediately when  RPC_TASK_SOFTCONN is set ???

This affects people who upgrade from openSUSE11.2 (which didn't support v4
mounts) to openSUSE11.3 (which defaults to v4) and who use network-manager
(which configures networks late) and have NFS mounts in /etc/fstab with
either explicit IP addresses or host names that can be resolved without the
network.
This config will work because when the network comes up, network-manager will
re-run the 'init.d/nfs' script.  However since 11.3 there is an unpleasant
pause before boot completes.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts.
  2010-10-20  7:17 NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts Neil Brown
@ 2010-10-20 14:29 ` Chuck Lever
  2010-10-20 21:29   ` Neil Brown
  2010-10-20 17:55 ` Jeff Layton
  1 sibling, 1 reply; 20+ messages in thread
From: Chuck Lever @ 2010-10-20 14:29 UTC (permalink / raw)
  To: Neil Brown; +Cc: Linux NFS Mailing List


On Oct 20, 2010, at 3:17 AM, Neil Brown wrote:

> 
> 
> If I don't have any network configured (except loop-back), and try an NFSv3
> mount, then it fails quickly:
> 
> 
> ....
> mount.nfs: portmap query failed: RPC: Remote system error - Network is unreachable
> mount.nfs: Network is unreachable
> 
> 
> If I try the same thing with a NFSv4 mount, it times out before it fails,
> making a much longer delay.
> 
> This is because mount.nfs doesn't do a portmap lookup but just leaves
> everything to the kernel.
> The kernel does an 'rpc_ping()' which sets RPC_TASK_SOFTCONN.
> So at least it doesn't retry after the timeout.  But given that we have a
> clear error, we shouldn't timeout at all.
> 
> Unfortunately I cannot see an easy way to fix this.
> 
> The place where ENETUNREACH is in xs_tcp_setup_socket.  The comment there
> says "Retry with the same socket after a delay".  The "delay" bit is correct,
> the "retry" isn't.
> 
> It would seem that we should just add a 'goto out' there if RPC_TASK_SOFTCONN
> was set.  However we cannot see the task at this point - in fact it seems
> that there could be a queue of tasks waiting on this connection.  I guess
> some could be soft, and some not. ???
> 
> So: An suggestions how to get a ENETUNREACH (or ECONNREFUSED or similar) to
> fail immediately when  RPC_TASK_SOFTCONN is set ???

ECONNREFUSED should already fail immediately in this case.  If it's not failing immediately, that's a bug.

I agree that ENETUNREACH seems appropriate for quick failure if RPC_TASK_SOFTCONN is set.  (I thought it already worked this way, but maybe I'm mistaken).

> This affects people who upgrade from openSUSE11.2 (which didn't support v4
> mounts) to openSUSE11.3 (which defaults to v4) and who use network-manager
> (which configures networks late) and have NFS mounts in /etc/fstab with
> either explicit IP addresses or host names that can be resolved without the
> network.
> This config will work because when the network comes up, network-manager will
> re-run the 'init.d/nfs' script.  However since 11.3 there is an unpleasant
> pause before boot completes.


-- 
chuck[dot]lever[at]oracle[dot]com





^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts.
  2010-10-20  7:17 NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts Neil Brown
  2010-10-20 14:29 ` Chuck Lever
@ 2010-10-20 17:55 ` Jeff Layton
  2010-10-20 19:16   ` Jeff Layton
  1 sibling, 1 reply; 20+ messages in thread
From: Jeff Layton @ 2010-10-20 17:55 UTC (permalink / raw)
  To: Neil Brown; +Cc: Linux NFS Mailing List

On Wed, 20 Oct 2010 18:17:01 +1100
Neil Brown <neilb@suse.de> wrote:

> 
> 
> If I don't have any network configured (except loop-back), and try an NFSv3
> mount, then it fails quickly:
> 
> 
> ....
> mount.nfs: portmap query failed: RPC: Remote system error - Network is unreachable
> mount.nfs: Network is unreachable
> 
> 
> If I try the same thing with a NFSv4 mount, it times out before it fails,
> making a much longer delay.
> 
> This is because mount.nfs doesn't do a portmap lookup but just leaves
> everything to the kernel.
> The kernel does an 'rpc_ping()' which sets RPC_TASK_SOFTCONN.
> So at least it doesn't retry after the timeout.  But given that we have a
> clear error, we shouldn't timeout at all.
> 
> Unfortunately I cannot see an easy way to fix this.
> 
> The place where ENETUNREACH is in xs_tcp_setup_socket.  The comment there
> says "Retry with the same socket after a delay".  The "delay" bit is correct,
> the "retry" isn't.
> 
> It would seem that we should just add a 'goto out' there if RPC_TASK_SOFTCONN
> was set.  However we cannot see the task at this point - in fact it seems
> that there could be a queue of tasks waiting on this connection.  I guess
> some could be soft, and some not. ???
> 
> So: An suggestions how to get a ENETUNREACH (or ECONNREFUSED or similar) to
> fail immediately when  RPC_TASK_SOFTCONN is set ???
> 
> 
> This affects people who upgrade from openSUSE11.2 (which didn't support v4
> mounts) to openSUSE11.3 (which defaults to v4) and who use network-manager
> (which configures networks late) and have NFS mounts in /etc/fstab with
> either explicit IP addresses or host names that can be resolved without the
> network.
> This config will work because when the network comes up, network-manager will
> re-run the 'init.d/nfs' script.  However since 11.3 there is an unpleasant
> pause before boot completes.
> 

Took me a few tries to get an ENETUNREACH error but I see the same hang
you do. For the record I was able to get one by not configuring an IPv6
addr on the box and attempting to mount an IPv6 address.

Interestingly while I was trying to reproduce it, I ended up
reproducing an EHOSTUNREACH error by trying to mount a IPv6 host to
which I didn't have a route. That error returns quickly from the
kernel. Maybe we can solve this simply by treating ENETUNREACH the same
as EHOSTUNREACH in this situation?

I'm not quite sure exactly how to make that happen, but it seems like
reasonable behavior.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts.
  2010-10-20 17:55 ` Jeff Layton
@ 2010-10-20 19:16   ` Jeff Layton
  2010-10-20 20:40     ` Neil Brown
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff Layton @ 2010-10-20 19:16 UTC (permalink / raw)
  To: Neil Brown, chuck.lever; +Cc: Linux NFS Mailing List

On Wed, 20 Oct 2010 13:55:25 -0400
Jeff Layton <jlayton@redhat.com> wrote:

> On Wed, 20 Oct 2010 18:17:01 +1100
> Neil Brown <neilb@suse.de> wrote:
> 
> > 
> > 
> > If I don't have any network configured (except loop-back), and try an NFSv3
> > mount, then it fails quickly:
> > 
> > 
> > ....
> > mount.nfs: portmap query failed: RPC: Remote system error - Network is unreachable
> > mount.nfs: Network is unreachable
> > 
> > 
> > If I try the same thing with a NFSv4 mount, it times out before it fails,
> > making a much longer delay.
> > 
> > This is because mount.nfs doesn't do a portmap lookup but just leaves
> > everything to the kernel.
> > The kernel does an 'rpc_ping()' which sets RPC_TASK_SOFTCONN.
> > So at least it doesn't retry after the timeout.  But given that we have a
> > clear error, we shouldn't timeout at all.
> > 
> > Unfortunately I cannot see an easy way to fix this.
> > 
> > The place where ENETUNREACH is in xs_tcp_setup_socket.  The comment there
> > says "Retry with the same socket after a delay".  The "delay" bit is correct,
> > the "retry" isn't.
> > 
> > It would seem that we should just add a 'goto out' there if RPC_TASK_SOFTCONN
> > was set.  However we cannot see the task at this point - in fact it seems
> > that there could be a queue of tasks waiting on this connection.  I guess
> > some could be soft, and some not. ???
> > 
> > So: An suggestions how to get a ENETUNREACH (or ECONNREFUSED or similar) to
> > fail immediately when  RPC_TASK_SOFTCONN is set ???
> > 
> > 
> > This affects people who upgrade from openSUSE11.2 (which didn't support v4
> > mounts) to openSUSE11.3 (which defaults to v4) and who use network-manager
> > (which configures networks late) and have NFS mounts in /etc/fstab with
> > either explicit IP addresses or host names that can be resolved without the
> > network.
> > This config will work because when the network comes up, network-manager will
> > re-run the 'init.d/nfs' script.  However since 11.3 there is an unpleasant
> > pause before boot completes.
> > 
> 
> Took me a few tries to get an ENETUNREACH error but I see the same hang
> you do. For the record I was able to get one by not configuring an IPv6
> addr on the box and attempting to mount an IPv6 address.
> 
> Interestingly while I was trying to reproduce it, I ended up
> reproducing an EHOSTUNREACH error by trying to mount a IPv6 host to
> which I didn't have a route. That error returns quickly from the
> kernel. Maybe we can solve this simply by treating ENETUNREACH the same
> as EHOSTUNREACH in this situation?
> 
> I'm not quite sure exactly how to make that happen, but it seems like
> reasonable behavior.
> 

Sigh, nothing's ever easy in the RPC layer. Please bear with my
scatterbrained analysis...

There's a bit of difference at the socket layer between those two cases.
xs_tcp_finish_connecting calls this to connect the socket:

    kernel_connect(sock, xs_addr(xprt), xprt->addrlen, O_NONBLOCK);

...in the ENETUNREACH case, this returns immediately with the error. In
EHOSTUNREACH case, it returns EINPROGRESS and then the sk_error_report
handles the rest. Fine...we can emulate the similar behavior, but...

Then what happens is that xs_tcp_send_request gets called again to try
to resend the packet. In the EHOSTUNREACH case, that returns
EHOSTUNREACH which eventually causes an rpc_exit with that error. In
the ENETUNREACH case that returns EPIPE, which makes the state machine
move next to call_bind and the whole thing starts over again.

I'm still not sure what the right approach is here. The fact that
attempting to send on the socket in this case gives us an EPIPE makes
it tough to handle this case the same way as EHOSTUNREACH.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts.
  2010-10-20 19:16   ` Jeff Layton
@ 2010-10-20 20:40     ` Neil Brown
  2010-10-21  0:45       ` Jeff Layton
  0 siblings, 1 reply; 20+ messages in thread
From: Neil Brown @ 2010-10-20 20:40 UTC (permalink / raw)
  To: Jeff Layton; +Cc: chuck.lever, Linux NFS Mailing List

On Wed, 20 Oct 2010 15:16:57 -0400
Jeff Layton <jlayton@redhat.com> wrote:

> On Wed, 20 Oct 2010 13:55:25 -0400
> Jeff Layton <jlayton@redhat.com> wrote:
> 
> > On Wed, 20 Oct 2010 18:17:01 +1100
> > Neil Brown <neilb@suse.de> wrote:
> > 
> > > 
> > > 
> > > If I don't have any network configured (except loop-back), and try an NFSv3
> > > mount, then it fails quickly:
> > > 
> > > 
> > > ....
> > > mount.nfs: portmap query failed: RPC: Remote system error - Network is unreachable
> > > mount.nfs: Network is unreachable
> > > 
> > > 
> > > If I try the same thing with a NFSv4 mount, it times out before it fails,
> > > making a much longer delay.
> > > 
> > > This is because mount.nfs doesn't do a portmap lookup but just leaves
> > > everything to the kernel.
> > > The kernel does an 'rpc_ping()' which sets RPC_TASK_SOFTCONN.
> > > So at least it doesn't retry after the timeout.  But given that we have a
> > > clear error, we shouldn't timeout at all.
> > > 
> > > Unfortunately I cannot see an easy way to fix this.
> > > 
> > > The place where ENETUNREACH is in xs_tcp_setup_socket.  The comment there
> > > says "Retry with the same socket after a delay".  The "delay" bit is correct,
> > > the "retry" isn't.
> > > 
> > > It would seem that we should just add a 'goto out' there if RPC_TASK_SOFTCONN
> > > was set.  However we cannot see the task at this point - in fact it seems
> > > that there could be a queue of tasks waiting on this connection.  I guess
> > > some could be soft, and some not. ???
> > > 
> > > So: An suggestions how to get a ENETUNREACH (or ECONNREFUSED or similar) to
> > > fail immediately when  RPC_TASK_SOFTCONN is set ???
> > > 
> > > 
> > > This affects people who upgrade from openSUSE11.2 (which didn't support v4
> > > mounts) to openSUSE11.3 (which defaults to v4) and who use network-manager
> > > (which configures networks late) and have NFS mounts in /etc/fstab with
> > > either explicit IP addresses or host names that can be resolved without the
> > > network.
> > > This config will work because when the network comes up, network-manager will
> > > re-run the 'init.d/nfs' script.  However since 11.3 there is an unpleasant
> > > pause before boot completes.
> > > 
> > 
> > Took me a few tries to get an ENETUNREACH error but I see the same hang
> > you do. For the record I was able to get one by not configuring an IPv6
> > addr on the box and attempting to mount an IPv6 address.
> > 
> > Interestingly while I was trying to reproduce it, I ended up
> > reproducing an EHOSTUNREACH error by trying to mount a IPv6 host to
> > which I didn't have a route. That error returns quickly from the
> > kernel. Maybe we can solve this simply by treating ENETUNREACH the same
> > as EHOSTUNREACH in this situation?
> > 
> > I'm not quite sure exactly how to make that happen, but it seems like
> > reasonable behavior.
> > 
> 
> Sigh, nothing's ever easy in the RPC layer. Please bear with my
> scatterbrained analysis...
> 
> There's a bit of difference at the socket layer between those two cases.
> xs_tcp_finish_connecting calls this to connect the socket:
> 
>     kernel_connect(sock, xs_addr(xprt), xprt->addrlen, O_NONBLOCK);
> 
> ...in the ENETUNREACH case, this returns immediately with the error. In
> EHOSTUNREACH case, it returns EINPROGRESS and then the sk_error_report
> handles the rest. Fine...we can emulate the similar behavior, but...

That all seems to make sense and helps complete the picture, however ...


> 
> Then what happens is that xs_tcp_send_request gets called again to try
> to resend the packet. In the EHOSTUNREACH case, that returns
> EHOSTUNREACH which eventually causes an rpc_exit with that error. In
> the ENETUNREACH case that returns EPIPE, which makes the state machine
> move next to call_bind and the whole thing starts over again.

This confuses me.  Why would  xs_tcp_send_request (aka ->send_request) get
called before the connect has succeeded?  Can you make sense of that?


Thanks,
NeilBrown


> 
> I'm still not sure what the right approach is here. The fact that
> attempting to send on the socket in this case gives us an EPIPE makes
> it tough to handle this case the same way as EHOSTUNREACH.
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts.
  2010-10-20 14:29 ` Chuck Lever
@ 2010-10-20 21:29   ` Neil Brown
  2010-10-21  0:56     ` Neil Brown
  0 siblings, 1 reply; 20+ messages in thread
From: Neil Brown @ 2010-10-20 21:29 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Linux NFS Mailing List

On Wed, 20 Oct 2010 10:29:05 -0400
Chuck Lever <chuck.lever@ORACLE.COM> wrote:

> 
> On Oct 20, 2010, at 3:17 AM, Neil Brown wrote:
> 
> > 
> > 
> > If I don't have any network configured (except loop-back), and try an NFSv3
> > mount, then it fails quickly:
> > 
> > 
> > ....
> > mount.nfs: portmap query failed: RPC: Remote system error - Network is unreachable
> > mount.nfs: Network is unreachable
> > 
> > 
> > If I try the same thing with a NFSv4 mount, it times out before it fails,
> > making a much longer delay.
> > 
> > This is because mount.nfs doesn't do a portmap lookup but just leaves
> > everything to the kernel.
> > The kernel does an 'rpc_ping()' which sets RPC_TASK_SOFTCONN.
> > So at least it doesn't retry after the timeout.  But given that we have a
> > clear error, we shouldn't timeout at all.
> > 
> > Unfortunately I cannot see an easy way to fix this.
> > 
> > The place where ENETUNREACH is in xs_tcp_setup_socket.  The comment there
> > says "Retry with the same socket after a delay".  The "delay" bit is correct,
> > the "retry" isn't.
> > 
> > It would seem that we should just add a 'goto out' there if RPC_TASK_SOFTCONN
> > was set.  However we cannot see the task at this point - in fact it seems
> > that there could be a queue of tasks waiting on this connection.  I guess
> > some could be soft, and some not. ???
> > 
> > So: An suggestions how to get a ENETUNREACH (or ECONNREFUSED or similar) to
> > fail immediately when  RPC_TASK_SOFTCONN is set ???
> 
> ECONNREFUSED should already fail immediately in this case.  If it's not failing immediately, that's a bug.
> 
> I agree that ENETUNREACH seems appropriate for quick failure if RPC_TASK_SOFTCONN is set.  (I thought it already worked this way, but maybe I'm mistaken).

There is certainly code that seems to treat ENETUNREACH differently if
RPC_TASK_SOFTCONN is set, but it doesn't seem to apply in the particular case
I am testing.
e.g. call_bind_status handles ENETUNREACH as a retry if not SOFTCONN and as a
failure in the SOFTCONN case.  
I guess NFSv4 doesn't hit this because the port is explicitly set to 2049 so
it never does the rpcbind step.
So maybe we need to handle ENETUNREACH in call_connect_status as well as
call_bind_status ??

Maybe something like that ...  The placement of rpc_delay seems a little of
to me, but follows call_bind_status, so it could be correct.

??

I haven't thought how EHOSTUNREACH fits into this... presumably it should
fail-quickly when SOFTCONN (which Jeff suggests it does) and should retry for
not SOFTCONN (which I haven't checked).

NeilBrown


diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index fa55490..539885e 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1245,6 +1245,12 @@ call_connect_status(struct rpc_task *task)
        }
 
        switch (status) {
+       case -ENETUNREACH:
+       case -ECONNRESET:
+       case -ECONNREFUSED:
+               if (!RPC_IS_SOFTCONN(task))
+                       rpc_delay(task, 5*HZ);
+               /* fall through */
                /* if soft mounted, test if we've timed out */
        case -ETIMEDOUT:
                task->tk_action = call_timeout;
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index fe9306b..0743994 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1906,7 +1906,8 @@ static void xs_tcp_setup_socket(struct rpc_xprt *xprt,
        case -ECONNREFUSED:
        case -ECONNRESET:
        case -ENETUNREACH:
-               /* retry with existing socket, after a delay */
+               /* allow upper layers to choose between failure and retry */
+               goto out;
        case 0:
        case -EINPROGRESS:
        case -EALREADY:


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts.
  2010-10-20 20:40     ` Neil Brown
@ 2010-10-21  0:45       ` Jeff Layton
  2010-10-21  3:25         ` Neil Brown
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff Layton @ 2010-10-21  0:45 UTC (permalink / raw)
  To: Neil Brown; +Cc: chuck.lever, Linux NFS Mailing List

On Thu, 21 Oct 2010 07:40:28 +1100
Neil Brown <neilb@suse.de> wrote:

> > 
> > Then what happens is that xs_tcp_send_request gets called again to try
> > to resend the packet. In the EHOSTUNREACH case, that returns
> > EHOSTUNREACH which eventually causes an rpc_exit with that error. In
> > the ENETUNREACH case that returns EPIPE, which makes the state machine
> > move next to call_bind and the whole thing starts over again.
> 
> This confuses me.  Why would  xs_tcp_send_request (aka ->send_request) get
> called before the connect has succeeded?  Can you make sense of that?
> 

It confuses me too. I suspect that this may actually be a bug...

So EINPROGRESS makes the connect_worker task clear the connecting bit
and return. Eventually, the EHOSTUNREACH error is reported to
xs_error_report. That function does this:

        xprt_wake_pending_tasks(xprt, -EAGAIN);

The task that was waiting on the connect_worker is then woken up.
call_connect_status does this:

        if (status >= 0 || status == -EAGAIN) {
                clnt->cl_stats->netreconn++;
                task->tk_action = call_transmit;
                return;
        }

...and we end up in call_transmit without the socket being connected.

So I understand how this happened, but I don't really understand the
design of the connect mechanism well enough to know whether this is
by design or not.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts.
  2010-10-20 21:29   ` Neil Brown
@ 2010-10-21  0:56     ` Neil Brown
  2010-10-21 12:09       ` Jeff Layton
  2010-10-21 14:10       ` Chuck Lever
  0 siblings, 2 replies; 20+ messages in thread
From: Neil Brown @ 2010-10-21  0:56 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Linux NFS Mailing List

On Thu, 21 Oct 2010 08:29:38 +1100
Neil Brown <neilb@suse.de> wrote:

> On Wed, 20 Oct 2010 10:29:05 -0400
> Chuck Lever <chuck.lever@ORACLE.COM> wrote:
> 
> > 
> > On Oct 20, 2010, at 3:17 AM, Neil Brown wrote:
> > 
> > > 
> > > 
> > > If I don't have any network configured (except loop-back), and try an NFSv3
> > > mount, then it fails quickly:
> > > 
> > > 
> > > ....
> > > mount.nfs: portmap query failed: RPC: Remote system error - Network is unreachable
> > > mount.nfs: Network is unreachable
> > > 
> > > 
> > > If I try the same thing with a NFSv4 mount, it times out before it fails,
> > > making a much longer delay.
> > > 
> > > This is because mount.nfs doesn't do a portmap lookup but just leaves
> > > everything to the kernel.
> > > The kernel does an 'rpc_ping()' which sets RPC_TASK_SOFTCONN.
> > > So at least it doesn't retry after the timeout.  But given that we have a
> > > clear error, we shouldn't timeout at all.
> > > 
> > > Unfortunately I cannot see an easy way to fix this.
> > > 
> > > The place where ENETUNREACH is in xs_tcp_setup_socket.  The comment there
> > > says "Retry with the same socket after a delay".  The "delay" bit is correct,
> > > the "retry" isn't.
> > > 
> > > It would seem that we should just add a 'goto out' there if RPC_TASK_SOFTCONN
> > > was set.  However we cannot see the task at this point - in fact it seems
> > > that there could be a queue of tasks waiting on this connection.  I guess
> > > some could be soft, and some not. ???
> > > 
> > > So: An suggestions how to get a ENETUNREACH (or ECONNREFUSED or similar) to
> > > fail immediately when  RPC_TASK_SOFTCONN is set ???
> > 
> > ECONNREFUSED should already fail immediately in this case.  If it's not failing immediately, that's a bug.
> > 
> > I agree that ENETUNREACH seems appropriate for quick failure if RPC_TASK_SOFTCONN is set.  (I thought it already worked this way, but maybe I'm mistaken).
> 
> There is certainly code that seems to treat ENETUNREACH differently if
> RPC_TASK_SOFTCONN is set, but it doesn't seem to apply in the particular case
> I am testing.
> e.g. call_bind_status handles ENETUNREACH as a retry if not SOFTCONN and as a
> failure in the SOFTCONN case.  
> I guess NFSv4 doesn't hit this because the port is explicitly set to 2049 so
> it never does the rpcbind step.
> So maybe we need to handle ENETUNREACH in call_connect_status as well as
> call_bind_status ??
> 
> Maybe something like that ...  The placement of rpc_delay seems a little of
> to me, but follows call_bind_status, so it could be correct.
>

I did a bit of testing of the patch that I sent and it isn't quite write -
the ENETUNREACH doesn't propagate all the way up to call_connect_status.
This patch fixes that.

With it, the rpc_ping fails nicely,  but when a reconnect is tried on an
already-mounted filesystem it doesn't fail but rather retries every 5 seconds.
This is what I wanted to happen.

However I'm not at all sure that "5 seconds" is correct.  I copied it from
call_bind_status, but it seems a bit short.  Maybe the number in
call_bind_status is a bit low???

Here is my current patch - which is more a starting point for discussion than
a concrete proposal.

Thanks,
NeilBrown

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index fa55490..539885e 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1245,6 +1245,12 @@ call_connect_status(struct rpc_task *task)
 	}
 
 	switch (status) {
+	case -ENETUNREACH:
+	case -ECONNRESET:
+	case -ECONNREFUSED:
+		if (!RPC_IS_SOFTCONN(task))
+			rpc_delay(task, 5*HZ);
+		/* fall through */
 		/* if soft mounted, test if we've timed out */
 	case -ETIMEDOUT:
 		task->tk_action = call_timeout;
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 970fb00..27673d9 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -733,6 +733,10 @@ static void xprt_connect_status(struct rpc_task *task)
 	}
 
 	switch (task->tk_status) {
+	case -ENETUNREACH:
+	case -ECONNREFUSED:
+	case -ECONNRESET:
+		break;
 	case -EAGAIN:
 		dprintk("RPC: %5u xprt_connect_status: retrying\n", task->tk_pid);
 		break;
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index fe9306b..0743994 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1906,7 +1906,8 @@ static void xs_tcp_setup_socket(struct rpc_xprt *xprt,
 	case -ECONNREFUSED:
 	case -ECONNRESET:
 	case -ENETUNREACH:
-		/* retry with existing socket, after a delay */
+		/* allow upper layers to choose between failure and retry */
+		goto out;
 	case 0:
 	case -EINPROGRESS:
 	case -EALREADY:


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts.
  2010-10-21  0:45       ` Jeff Layton
@ 2010-10-21  3:25         ` Neil Brown
  2010-10-21 14:05           ` Trond Myklebust
  0 siblings, 1 reply; 20+ messages in thread
From: Neil Brown @ 2010-10-21  3:25 UTC (permalink / raw)
  To: Jeff Layton; +Cc: chuck.lever, Linux NFS Mailing List, Trond Myklebust

On Wed, 20 Oct 2010 20:45:32 -0400
Jeff Layton <jlayton@redhat.com> wrote:

> On Thu, 21 Oct 2010 07:40:28 +1100
> Neil Brown <neilb@suse.de> wrote:
> 
> > > 
> > > Then what happens is that xs_tcp_send_request gets called again to try
> > > to resend the packet. In the EHOSTUNREACH case, that returns
> > > EHOSTUNREACH which eventually causes an rpc_exit with that error. In
> > > the ENETUNREACH case that returns EPIPE, which makes the state machine
> > > move next to call_bind and the whole thing starts over again.
> > 
> > This confuses me.  Why would  xs_tcp_send_request (aka ->send_request) get
> > called before the connect has succeeded?  Can you make sense of that?
> > 
> 
> It confuses me too. I suspect that this may actually be a bug...
> 
> So EINPROGRESS makes the connect_worker task clear the connecting bit
> and return. Eventually, the EHOSTUNREACH error is reported to
> xs_error_report. That function does this:
> 
>         xprt_wake_pending_tasks(xprt, -EAGAIN);
> 
> The task that was waiting on the connect_worker is then woken up.
> call_connect_status does this:
> 
>         if (status >= 0 || status == -EAGAIN) {
>                 clnt->cl_stats->netreconn++;
>                 task->tk_action = call_transmit;
>                 return;
>         }
> 
> ...and we end up in call_transmit without the socket being connected.
> 
> So I understand how this happened, but I don't really understand the
> design of the connect mechanism well enough to know whether this is
> by design or not.
> 

Now that *is* interesting.....

I thought that code in call_connect_status was hard to understand too, so I
asked git who to blame it on. It said:

commit 2a4919919a97911b0aa4b9f5ac1eab90ba87652b
Author: Trond Myklebust <Trond.Myklebust@netapp.com>
Date:   Wed Mar 11 14:38:00 2009 -0400

    SUNRPC: Return EAGAIN instead of ENOTCONN when waking up xprt->pending
    
    While we should definitely return socket errors to the task that is
    currently trying to send data, there is no need to propagate the same error
    to all the other tasks on xprt->pending. Doing so actually slows down
    recovery, since it causes more than one tasks to attempt socket recovery.
    
    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>



That commit not only adds the "status == -EAGAIN" test, but also introduced
some of the xprtsock.c code that I suggested changing in a previous patch.

So there seem to be some correlation between that commit and the present
problem.

I tried compiling the kernel just prior to that commit, and 
  mount -t nfs4 unrouteable.ip.addres:/ /mnt

took 3 seconds to time fail.

I then stepped forward to that commit and the same command took 3 *minutes* to
time out.  So something isn't right there.  Unfortunately I don't know what.

Trond: can you comment on this - maybe explain the reasoning behind that
commit better, and suggest how we can get ENOTCONN to fail SOFTCONN
connections faster without undoing the things this patch tried to achieve?

Thanks,

NeilBrown

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts.
  2010-10-21  0:56     ` Neil Brown
@ 2010-10-21 12:09       ` Jeff Layton
  2010-10-21 13:52         ` Chuck Lever
  2010-10-21 14:10       ` Chuck Lever
  1 sibling, 1 reply; 20+ messages in thread
From: Jeff Layton @ 2010-10-21 12:09 UTC (permalink / raw)
  To: Neil Brown; +Cc: Chuck Lever, Linux NFS Mailing List

On Thu, 21 Oct 2010 11:56:22 +1100
Neil Brown <neilb@suse.de> wrote:

> On Thu, 21 Oct 2010 08:29:38 +1100
> Neil Brown <neilb@suse.de> wrote:
> 
> > On Wed, 20 Oct 2010 10:29:05 -0400
> > Chuck Lever <chuck.lever@ORACLE.COM> wrote:
> > 
> > > 
> > > On Oct 20, 2010, at 3:17 AM, Neil Brown wrote:
> > > 
> > > > 
> > > > 
> > > > If I don't have any network configured (except loop-back), and try an NFSv3
> > > > mount, then it fails quickly:
> > > > 
> > > > 
> > > > ....
> > > > mount.nfs: portmap query failed: RPC: Remote system error - Network is unreachable
> > > > mount.nfs: Network is unreachable
> > > > 
> > > > 
> > > > If I try the same thing with a NFSv4 mount, it times out before it fails,
> > > > making a much longer delay.
> > > > 
> > > > This is because mount.nfs doesn't do a portmap lookup but just leaves
> > > > everything to the kernel.
> > > > The kernel does an 'rpc_ping()' which sets RPC_TASK_SOFTCONN.
> > > > So at least it doesn't retry after the timeout.  But given that we have a
> > > > clear error, we shouldn't timeout at all.
> > > > 
> > > > Unfortunately I cannot see an easy way to fix this.
> > > > 
> > > > The place where ENETUNREACH is in xs_tcp_setup_socket.  The comment there
> > > > says "Retry with the same socket after a delay".  The "delay" bit is correct,
> > > > the "retry" isn't.
> > > > 
> > > > It would seem that we should just add a 'goto out' there if RPC_TASK_SOFTCONN
> > > > was set.  However we cannot see the task at this point - in fact it seems
> > > > that there could be a queue of tasks waiting on this connection.  I guess
> > > > some could be soft, and some not. ???
> > > > 
> > > > So: An suggestions how to get a ENETUNREACH (or ECONNREFUSED or similar) to
> > > > fail immediately when  RPC_TASK_SOFTCONN is set ???
> > > 
> > > ECONNREFUSED should already fail immediately in this case.  If it's not failing immediately, that's a bug.
> > > 
> > > I agree that ENETUNREACH seems appropriate for quick failure if RPC_TASK_SOFTCONN is set.  (I thought it already worked this way, but maybe I'm mistaken).
> > 
> > There is certainly code that seems to treat ENETUNREACH differently if
> > RPC_TASK_SOFTCONN is set, but it doesn't seem to apply in the particular case
> > I am testing.
> > e.g. call_bind_status handles ENETUNREACH as a retry if not SOFTCONN and as a
> > failure in the SOFTCONN case.  
> > I guess NFSv4 doesn't hit this because the port is explicitly set to 2049 so
> > it never does the rpcbind step.
> > So maybe we need to handle ENETUNREACH in call_connect_status as well as
> > call_bind_status ??
> > 
> > Maybe something like that ...  The placement of rpc_delay seems a little of
> > to me, but follows call_bind_status, so it could be correct.
> >
> 
> I did a bit of testing of the patch that I sent and it isn't quite write -
> the ENETUNREACH doesn't propagate all the way up to call_connect_status.
> This patch fixes that.
> 
> With it, the rpc_ping fails nicely,  but when a reconnect is tried on an
> already-mounted filesystem it doesn't fail but rather retries every 5 seconds.
> This is what I wanted to happen.
> 
> However I'm not at all sure that "5 seconds" is correct.  I copied it from
> call_bind_status, but it seems a bit short.  Maybe the number in
> call_bind_status is a bit low???
> 
> Here is my current patch - which is more a starting point for discussion than
> a concrete proposal.
> 
> Thanks,
> NeilBrown
> 
> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> index fa55490..539885e 100644
> --- a/net/sunrpc/clnt.c
> +++ b/net/sunrpc/clnt.c
> @@ -1245,6 +1245,12 @@ call_connect_status(struct rpc_task *task)
>  	}
>  
>  	switch (status) {
> +	case -ENETUNREACH:
> +	case -ECONNRESET:
> +	case -ECONNREFUSED:
> +		if (!RPC_IS_SOFTCONN(task))
> +			rpc_delay(task, 5*HZ);
> +		/* fall through */

Maybe instead of the above, we should do something like:

		if (RPC_IS_SOFTCONN(task)) {
			rpc_exit(task, status);
			return;
		}

...IOW, if this is a SOFTCONN task, return connect errors immediately.
If it's not a SOFTCONN task, treat it as we would a timeout?

That'll would probably fix the -ENETUNREACH case, but I'm not sure what
to do about the cases that rely on xs_error_report. It seems a little
suspicious that those errors all get turned into -EAGAIN.

>  		/* if soft mounted, test if we've timed out */
>  	case -ETIMEDOUT:
>  		task->tk_action = call_timeout;
> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> index 970fb00..27673d9 100644
> --- a/net/sunrpc/xprt.c
> +++ b/net/sunrpc/xprt.c
> @@ -733,6 +733,10 @@ static void xprt_connect_status(struct rpc_task *task)
>  	}
>  
>  	switch (task->tk_status) {
> +	case -ENETUNREACH:
> +	case -ECONNREFUSED:
> +	case -ECONNRESET:
> +		break;
>  	case -EAGAIN:
>  		dprintk("RPC: %5u xprt_connect_status: retrying\n", task->tk_pid);
>  		break;
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index fe9306b..0743994 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -1906,7 +1906,8 @@ static void xs_tcp_setup_socket(struct rpc_xprt *xprt,
>  	case -ECONNREFUSED:
>  	case -ECONNRESET:
>  	case -ENETUNREACH:
> -		/* retry with existing socket, after a delay */
> +		/* allow upper layers to choose between failure and retry */
> +		goto out;
>  	case 0:
>  	case -EINPROGRESS:
>  	case -EALREADY:
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts.
  2010-10-21 12:09       ` Jeff Layton
@ 2010-10-21 13:52         ` Chuck Lever
  0 siblings, 0 replies; 20+ messages in thread
From: Chuck Lever @ 2010-10-21 13:52 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Neil Brown, Linux NFS Mailing List


On Oct 21, 2010, at 8:09 AM, Jeff Layton wrote:

> On Thu, 21 Oct 2010 11:56:22 +1100
> Neil Brown <neilb@suse.de> wrote:
> 
>> On Thu, 21 Oct 2010 08:29:38 +1100
>> Neil Brown <neilb@suse.de> wrote:
>> 
>>> On Wed, 20 Oct 2010 10:29:05 -0400
>>> Chuck Lever <chuck.lever@ORACLE.COM> wrote:
>>> 
>>>> 
>>>> On Oct 20, 2010, at 3:17 AM, Neil Brown wrote:
>>>> 
>>>>> 
>>>>> 
>>>>> If I don't have any network configured (except loop-back), and try an NFSv3
>>>>> mount, then it fails quickly:
>>>>> 
>>>>> 
>>>>> ....
>>>>> mount.nfs: portmap query failed: RPC: Remote system error - Network is unreachable
>>>>> mount.nfs: Network is unreachable
>>>>> 
>>>>> 
>>>>> If I try the same thing with a NFSv4 mount, it times out before it fails,
>>>>> making a much longer delay.
>>>>> 
>>>>> This is because mount.nfs doesn't do a portmap lookup but just leaves
>>>>> everything to the kernel.
>>>>> The kernel does an 'rpc_ping()' which sets RPC_TASK_SOFTCONN.
>>>>> So at least it doesn't retry after the timeout.  But given that we have a
>>>>> clear error, we shouldn't timeout at all.
>>>>> 
>>>>> Unfortunately I cannot see an easy way to fix this.
>>>>> 
>>>>> The place where ENETUNREACH is in xs_tcp_setup_socket.  The comment there
>>>>> says "Retry with the same socket after a delay".  The "delay" bit is correct,
>>>>> the "retry" isn't.
>>>>> 
>>>>> It would seem that we should just add a 'goto out' there if RPC_TASK_SOFTCONN
>>>>> was set.  However we cannot see the task at this point - in fact it seems
>>>>> that there could be a queue of tasks waiting on this connection.  I guess
>>>>> some could be soft, and some not. ???
>>>>> 
>>>>> So: An suggestions how to get a ENETUNREACH (or ECONNREFUSED or similar) to
>>>>> fail immediately when  RPC_TASK_SOFTCONN is set ???
>>>> 
>>>> ECONNREFUSED should already fail immediately in this case.  If it's not failing immediately, that's a bug.
>>>> 
>>>> I agree that ENETUNREACH seems appropriate for quick failure if RPC_TASK_SOFTCONN is set.  (I thought it already worked this way, but maybe I'm mistaken).
>>> 
>>> There is certainly code that seems to treat ENETUNREACH differently if
>>> RPC_TASK_SOFTCONN is set, but it doesn't seem to apply in the particular case
>>> I am testing.
>>> e.g. call_bind_status handles ENETUNREACH as a retry if not SOFTCONN and as a
>>> failure in the SOFTCONN case.  
>>> I guess NFSv4 doesn't hit this because the port is explicitly set to 2049 so
>>> it never does the rpcbind step.
>>> So maybe we need to handle ENETUNREACH in call_connect_status as well as
>>> call_bind_status ??
>>> 
>>> Maybe something like that ...  The placement of rpc_delay seems a little of
>>> to me, but follows call_bind_status, so it could be correct.
>>> 
>> 
>> I did a bit of testing of the patch that I sent and it isn't quite write -
>> the ENETUNREACH doesn't propagate all the way up to call_connect_status.
>> This patch fixes that.
>> 
>> With it, the rpc_ping fails nicely,  but when a reconnect is tried on an
>> already-mounted filesystem it doesn't fail but rather retries every 5 seconds.
>> This is what I wanted to happen.
>> 
>> However I'm not at all sure that "5 seconds" is correct.  I copied it from
>> call_bind_status, but it seems a bit short.  Maybe the number in
>> call_bind_status is a bit low???
>> 
>> Here is my current patch - which is more a starting point for discussion than
>> a concrete proposal.
>> 
>> Thanks,
>> NeilBrown
>> 
>> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
>> index fa55490..539885e 100644
>> --- a/net/sunrpc/clnt.c
>> +++ b/net/sunrpc/clnt.c
>> @@ -1245,6 +1245,12 @@ call_connect_status(struct rpc_task *task)
>> 	}
>> 
>> 	switch (status) {
>> +	case -ENETUNREACH:
>> +	case -ECONNRESET:
>> +	case -ECONNREFUSED:
>> +		if (!RPC_IS_SOFTCONN(task))
>> +			rpc_delay(task, 5*HZ);
>> +		/* fall through */
> 
> Maybe instead of the above, we should do something like:
> 
> 		if (RPC_IS_SOFTCONN(task)) {
> 			rpc_exit(task, status);
> 			return;
> 		}
> 
> ...IOW, if this is a SOFTCONN task, return connect errors immediately.
> If it's not a SOFTCONN task, treat it as we would a timeout?
> 
> That'll would probably fix the -ENETUNREACH case, but I'm not sure what
> to do about the cases that rely on xs_error_report. It seems a little
> suspicious that those errors all get turned into -EAGAIN.

I think the theory is that all of the transport connect problems should be handled by the lower layers (xprtsock and the kernel's socket code) during the call_transmit step.  If something needs to be retried (say, a connection attempt) then the lower layer returns EAGAIN and the finite state machine will retry.

One of the issues is how to deal with partially transmitted requests and GSS credentials.  Some cases you have to re-encode the request before transmitting again.  The finite state machine attempts to deal with that.


> 
>> 		/* if soft mounted, test if we've timed out */
>> 	case -ETIMEDOUT:
>> 		task->tk_action = call_timeout;
>> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
>> index 970fb00..27673d9 100644
>> --- a/net/sunrpc/xprt.c
>> +++ b/net/sunrpc/xprt.c
>> @@ -733,6 +733,10 @@ static void xprt_connect_status(struct rpc_task *task)
>> 	}
>> 
>> 	switch (task->tk_status) {
>> +	case -ENETUNREACH:
>> +	case -ECONNREFUSED:
>> +	case -ECONNRESET:
>> +		break;
>> 	case -EAGAIN:
>> 		dprintk("RPC: %5u xprt_connect_status: retrying\n", task->tk_pid);
>> 		break;
>> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
>> index fe9306b..0743994 100644
>> --- a/net/sunrpc/xprtsock.c
>> +++ b/net/sunrpc/xprtsock.c
>> @@ -1906,7 +1906,8 @@ static void xs_tcp_setup_socket(struct rpc_xprt *xprt,
>> 	case -ECONNREFUSED:
>> 	case -ECONNRESET:
>> 	case -ENETUNREACH:
>> -		/* retry with existing socket, after a delay */
>> +		/* allow upper layers to choose between failure and retry */
>> +		goto out;
>> 	case 0:
>> 	case -EINPROGRESS:
>> 	case -EALREADY:
>> 
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> 
> 
> 
> -- 
> Jeff Layton <jlayton@redhat.com>

-- 
chuck[dot]lever[at]oracle[dot]com





^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts.
  2010-10-21  3:25         ` Neil Brown
@ 2010-10-21 14:05           ` Trond Myklebust
  2010-10-21 14:31             ` Chuck Lever
  0 siblings, 1 reply; 20+ messages in thread
From: Trond Myklebust @ 2010-10-21 14:05 UTC (permalink / raw)
  To: Neil Brown; +Cc: Jeff Layton, chuck.lever, Linux NFS Mailing List

On Thu, 2010-10-21 at 14:25 +1100, Neil Brown wrote:
> On Wed, 20 Oct 2010 20:45:32 -0400
> Jeff Layton <jlayton@redhat.com> wrote:
> 
> > On Thu, 21 Oct 2010 07:40:28 +1100
> > Neil Brown <neilb@suse.de> wrote:
> > 
> > > > 
> > > > Then what happens is that xs_tcp_send_request gets called again to try
> > > > to resend the packet. In the EHOSTUNREACH case, that returns
> > > > EHOSTUNREACH which eventually causes an rpc_exit with that error. In
> > > > the ENETUNREACH case that returns EPIPE, which makes the state machine
> > > > move next to call_bind and the whole thing starts over again.
> > > 
> > > This confuses me.  Why would  xs_tcp_send_request (aka ->send_request) get
> > > called before the connect has succeeded?  Can you make sense of that?
> > > 
> > 
> > It confuses me too. I suspect that this may actually be a bug...
> > 
> > So EINPROGRESS makes the connect_worker task clear the connecting bit
> > and return. Eventually, the EHOSTUNREACH error is reported to
> > xs_error_report. That function does this:
> > 
> >         xprt_wake_pending_tasks(xprt, -EAGAIN);
> > 
> > The task that was waiting on the connect_worker is then woken up.
> > call_connect_status does this:
> > 
> >         if (status >= 0 || status == -EAGAIN) {
> >                 clnt->cl_stats->netreconn++;
> >                 task->tk_action = call_transmit;
> >                 return;
> >         }
> > 
> > ...and we end up in call_transmit without the socket being connected.
> > 
> > So I understand how this happened, but I don't really understand the
> > design of the connect mechanism well enough to know whether this is
> > by design or not.
> > 
> 
> Now that *is* interesting.....
> 
> I thought that code in call_connect_status was hard to understand too, so I
> asked git who to blame it on. It said:
> 
> commit 2a4919919a97911b0aa4b9f5ac1eab90ba87652b
> Author: Trond Myklebust <Trond.Myklebust@netapp.com>
> Date:   Wed Mar 11 14:38:00 2009 -0400
> 
>     SUNRPC: Return EAGAIN instead of ENOTCONN when waking up xprt->pending
>     
>     While we should definitely return socket errors to the task that is
>     currently trying to send data, there is no need to propagate the same error
>     to all the other tasks on xprt->pending. Doing so actually slows down
>     recovery, since it causes more than one tasks to attempt socket recovery.
>     
>     Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> 
> 
> 
> That commit not only adds the "status == -EAGAIN" test, but also introduced
> some of the xprtsock.c code that I suggested changing in a previous patch.

That is because ENOTCONN was trying to report the current state of the
socket to a bunch of tasks on the 'pending' list. The problem is that
because none of those tasks hold any form of lock, then by the time they
get round to processing that ENOTCONN, the state of the socket can (and
usually will) have changed.

> So there seem to be some correlation between that commit and the present
> problem.
> 
> I tried compiling the kernel just prior to that commit, and 
>   mount -t nfs4 unrouteable.ip.addres:/ /mnt
> 
> took 3 seconds to time fail.
> 
> I then stepped forward to that commit and the same command took 3 *minutes* to
> time out.  So something isn't right there.  Unfortunately I don't know what.
> 
> Trond: can you comment on this - maybe explain the reasoning behind that
> commit better, and suggest how we can get ENOTCONN to fail SOFTCONN
> connections faster without undoing the things this patch tried to achieve?

It seems to me that the problem is basically that RPC_IS_SOFTCONN is
poorly defined. IMO, the definition should really be that
'RPC_IS_SOFTCONN' tasks MUST exit with an error if they ever have to run
call_bind or a call_connect more than once (i.e. if they ever have to
loop back).

With that in mind, I really don't see why the RPC_IS_SOFTCONN case is
being handled in call_transmit_status() instead of in call_status().
I furthermore don't see why ECONNRESET, ENOTCONN and EPIPE should be
treated any differently from ECONNREFUSED, EHOSTDOWN, EHOSTUNREACH and
ENETUNREACH.

Cheers
  Trond

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts.
  2010-10-21  0:56     ` Neil Brown
  2010-10-21 12:09       ` Jeff Layton
@ 2010-10-21 14:10       ` Chuck Lever
  1 sibling, 0 replies; 20+ messages in thread
From: Chuck Lever @ 2010-10-21 14:10 UTC (permalink / raw)
  To: Neil Brown; +Cc: Linux NFS Mailing List


On Oct 20, 2010, at 8:56 PM, Neil Brown wrote:

> On Thu, 21 Oct 2010 08:29:38 +1100
> Neil Brown <neilb@suse.de> wrote:
> 
>> On Wed, 20 Oct 2010 10:29:05 -0400
>> Chuck Lever <chuck.lever@ORACLE.COM> wrote:
>> 
>>> 
>>> On Oct 20, 2010, at 3:17 AM, Neil Brown wrote:
>>> 
>>>> 
>>>> 
>>>> If I don't have any network configured (except loop-back), and try an NFSv3
>>>> mount, then it fails quickly:
>>>> 
>>>> 
>>>> ....
>>>> mount.nfs: portmap query failed: RPC: Remote system error - Network is unreachable
>>>> mount.nfs: Network is unreachable
>>>> 
>>>> 
>>>> If I try the same thing with a NFSv4 mount, it times out before it fails,
>>>> making a much longer delay.
>>>> 
>>>> This is because mount.nfs doesn't do a portmap lookup but just leaves
>>>> everything to the kernel.
>>>> The kernel does an 'rpc_ping()' which sets RPC_TASK_SOFTCONN.
>>>> So at least it doesn't retry after the timeout.  But given that we have a
>>>> clear error, we shouldn't timeout at all.
>>>> 
>>>> Unfortunately I cannot see an easy way to fix this.
>>>> 
>>>> The place where ENETUNREACH is in xs_tcp_setup_socket.  The comment there
>>>> says "Retry with the same socket after a delay".  The "delay" bit is correct,
>>>> the "retry" isn't.
>>>> 
>>>> It would seem that we should just add a 'goto out' there if RPC_TASK_SOFTCONN
>>>> was set.  However we cannot see the task at this point - in fact it seems
>>>> that there could be a queue of tasks waiting on this connection.  I guess
>>>> some could be soft, and some not. ???
>>>> 
>>>> So: An suggestions how to get a ENETUNREACH (or ECONNREFUSED or similar) to
>>>> fail immediately when  RPC_TASK_SOFTCONN is set ???
>>> 
>>> ECONNREFUSED should already fail immediately in this case.  If it's not failing immediately, that's a bug.
>>> 
>>> I agree that ENETUNREACH seems appropriate for quick failure if RPC_TASK_SOFTCONN is set.  (I thought it already worked this way, but maybe I'm mistaken).
>> 
>> There is certainly code that seems to treat ENETUNREACH differently if
>> RPC_TASK_SOFTCONN is set, but it doesn't seem to apply in the particular case
>> I am testing.
>> e.g. call_bind_status handles ENETUNREACH as a retry if not SOFTCONN and as a
>> failure in the SOFTCONN case.  
>> I guess NFSv4 doesn't hit this because the port is explicitly set to 2049 so
>> it never does the rpcbind step.
>> So maybe we need to handle ENETUNREACH in call_connect_status as well as
>> call_bind_status ??
>> 
>> Maybe something like that ...  The placement of rpc_delay seems a little of
>> to me, but follows call_bind_status, so it could be correct.
>> 
> 
> I did a bit of testing of the patch that I sent and it isn't quite write -
> the ENETUNREACH doesn't propagate all the way up to call_connect_status.
> This patch fixes that.
> 
> With it, the rpc_ping fails nicely,  but when a reconnect is tried on an
> already-mounted filesystem it doesn't fail but rather retries every 5 seconds.
> This is what I wanted to happen.
> 
> However I'm not at all sure that "5 seconds" is correct.  I copied it from
> call_bind_status, but it seems a bit short.  Maybe the number in
> call_bind_status is a bit low???

I think that number is arbitrary.  Five seconds might be too often to try on a server that is trying to come up, I agree.

> Here is my current patch - which is more a starting point for discussion than
> a concrete proposal.

Sure.

I'm sorry I can't be more helpful here, it's been a while.  An important thing to remember is that the call_bind step is different than the call_connect step: call_bind can actually perform a whole other RPC under the covers.  And: rpcbind doesn't do an initial NULL RPC, it just does the RPCBIND request.

So the three places where this SOFTCONN thingie seems to matter is a) during the initial RPC ping, b) during an rpcbind request, and c) when the server is probing the callback parameters the client gave it via SETCLIENTID.

We should be sure that all three places are working as expected; ie. the check-in criteria should include more than just "mount".

Also, TCP is not the only connection-oriented transport we support here, so we should be careful that we're not doing anything that will break RDMA.


> Thanks,
> NeilBrown
> 
> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> index fa55490..539885e 100644
> --- a/net/sunrpc/clnt.c
> +++ b/net/sunrpc/clnt.c
> @@ -1245,6 +1245,12 @@ call_connect_status(struct rpc_task *task)
> 	}
> 
> 	switch (status) {
> +	case -ENETUNREACH:
> +	case -ECONNRESET:
> +	case -ECONNREFUSED:
> +		if (!RPC_IS_SOFTCONN(task))
> +			rpc_delay(task, 5*HZ);
> +		/* fall through */
> 		/* if soft mounted, test if we've timed out */
> 	case -ETIMEDOUT:
> 		task->tk_action = call_timeout;
> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> index 970fb00..27673d9 100644
> --- a/net/sunrpc/xprt.c
> +++ b/net/sunrpc/xprt.c
> @@ -733,6 +733,10 @@ static void xprt_connect_status(struct rpc_task *task)
> 	}
> 
> 	switch (task->tk_status) {
> +	case -ENETUNREACH:
> +	case -ECONNREFUSED:
> +	case -ECONNRESET:
> +		break;

I'm not sure why you are including ECONN* here (and above).  My impression was that case was working as expected.  Can you explain the (non-NETUNREACH) transport connection-related problem you see?

Theoretically, the architecture of the connection retry logic means that all three of these should be mapped to EAGAIN by the transports (xprtsock.c, in this specific case).

> 	case -EAGAIN:
> 		dprintk("RPC: %5u xprt_connect_status: retrying\n", task->tk_pid);
> 		break;
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index fe9306b..0743994 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -1906,7 +1906,8 @@ static void xs_tcp_setup_socket(struct rpc_xprt *xprt,
> 	case -ECONNREFUSED:
> 	case -ECONNRESET:
> 	case -ENETUNREACH:
> -		/* retry with existing socket, after a delay */
> +		/* allow upper layers to choose between failure and retry */
> +		goto out;
> 	case 0:
> 	case -EINPROGRESS:
> 	case -EALREADY:
> 

-- 
chuck[dot]lever[at]oracle[dot]com





^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts.
  2010-10-21 14:05           ` Trond Myklebust
@ 2010-10-21 14:31             ` Chuck Lever
  2010-10-21 14:42               ` Trond Myklebust
  0 siblings, 1 reply; 20+ messages in thread
From: Chuck Lever @ 2010-10-21 14:31 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Neil Brown, Jeff Layton, Linux NFS Mailing List


On Oct 21, 2010, at 10:05 AM, Trond Myklebust wrote:

> On Thu, 2010-10-21 at 14:25 +1100, Neil Brown wrote:
>> On Wed, 20 Oct 2010 20:45:32 -0400
>> Jeff Layton <jlayton@redhat.com> wrote:
>> 
>>> On Thu, 21 Oct 2010 07:40:28 +1100
>>> Neil Brown <neilb@suse.de> wrote:
>>> 
>>>>> 
>>>>> Then what happens is that xs_tcp_send_request gets called again to try
>>>>> to resend the packet. In the EHOSTUNREACH case, that returns
>>>>> EHOSTUNREACH which eventually causes an rpc_exit with that error. In
>>>>> the ENETUNREACH case that returns EPIPE, which makes the state machine
>>>>> move next to call_bind and the whole thing starts over again.
>>>> 
>>>> This confuses me.  Why would  xs_tcp_send_request (aka ->send_request) get
>>>> called before the connect has succeeded?  Can you make sense of that?
>>>> 
>>> 
>>> It confuses me too. I suspect that this may actually be a bug...
>>> 
>>> So EINPROGRESS makes the connect_worker task clear the connecting bit
>>> and return. Eventually, the EHOSTUNREACH error is reported to
>>> xs_error_report. That function does this:
>>> 
>>>        xprt_wake_pending_tasks(xprt, -EAGAIN);
>>> 
>>> The task that was waiting on the connect_worker is then woken up.
>>> call_connect_status does this:
>>> 
>>>        if (status >= 0 || status == -EAGAIN) {
>>>                clnt->cl_stats->netreconn++;
>>>                task->tk_action = call_transmit;
>>>                return;
>>>        }
>>> 
>>> ...and we end up in call_transmit without the socket being connected.
>>> 
>>> So I understand how this happened, but I don't really understand the
>>> design of the connect mechanism well enough to know whether this is
>>> by design or not.
>>> 
>> 
>> Now that *is* interesting.....
>> 
>> I thought that code in call_connect_status was hard to understand too, so I
>> asked git who to blame it on. It said:
>> 
>> commit 2a4919919a97911b0aa4b9f5ac1eab90ba87652b
>> Author: Trond Myklebust <Trond.Myklebust@netapp.com>
>> Date:   Wed Mar 11 14:38:00 2009 -0400
>> 
>>    SUNRPC: Return EAGAIN instead of ENOTCONN when waking up xprt->pending
>> 
>>    While we should definitely return socket errors to the task that is
>>    currently trying to send data, there is no need to propagate the same error
>>    to all the other tasks on xprt->pending. Doing so actually slows down
>>    recovery, since it causes more than one tasks to attempt socket recovery.
>> 
>>    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
>> 
>> 
>> 
>> That commit not only adds the "status == -EAGAIN" test, but also introduced
>> some of the xprtsock.c code that I suggested changing in a previous patch.
> 
> That is because ENOTCONN was trying to report the current state of the
> socket to a bunch of tasks on the 'pending' list. The problem is that
> because none of those tasks hold any form of lock, then by the time they
> get round to processing that ENOTCONN, the state of the socket can (and
> usually will) have changed.
> 
>> So there seem to be some correlation between that commit and the present
>> problem.
>> 
>> I tried compiling the kernel just prior to that commit, and 
>>  mount -t nfs4 unrouteable.ip.addres:/ /mnt
>> 
>> took 3 seconds to time fail.
>> 
>> I then stepped forward to that commit and the same command took 3 *minutes* to
>> time out.  So something isn't right there.  Unfortunately I don't know what.
>> 
>> Trond: can you comment on this - maybe explain the reasoning behind that
>> commit better, and suggest how we can get ENOTCONN to fail SOFTCONN
>> connections faster without undoing the things this patch tried to achieve?
> 
> It seems to me that the problem is basically that RPC_IS_SOFTCONN is
> poorly defined. IMO, the definition should really be that
> 'RPC_IS_SOFTCONN' tasks MUST exit with an error if they ever have to run
> call_bind or a call_connect more than once (i.e. if they ever have to
> loop back).

I think you are suggesting that we define the places where RPC_IS_SOFTCONN should be tested as exactly those places that might want to restart the RPC call via rpc_force_rebind, tk_action = call_refresh, and so on.  Yes?

> With that in mind, I really don't see why the RPC_IS_SOFTCONN case is
> being handled in call_transmit_status() instead of in call_status().

I tried it in call_status(), but it didn't work as expected.  My notes aren't specific about the problem.  One thing is for sure, there seems to be some redundant error handling logic in both of those states.  Unraveling it might be more than Neil bargained for, but would help "future generations."

> I furthermore don't see why ECONNRESET, ENOTCONN and EPIPE should be
> treated any differently from ECONNREFUSED, EHOSTDOWN, EHOSTUNREACH and
> ENETUNREACH.

I can get behind that.  It looks like call_bind_status already combines all these error codes, so it makes some sense.

-- 
chuck[dot]lever[at]oracle[dot]com





^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts.
  2010-10-21 14:31             ` Chuck Lever
@ 2010-10-21 14:42               ` Trond Myklebust
  2010-10-21 19:40                 ` Jeff Layton
  0 siblings, 1 reply; 20+ messages in thread
From: Trond Myklebust @ 2010-10-21 14:42 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Neil Brown, Jeff Layton, Linux NFS Mailing List

On Thu, 2010-10-21 at 10:31 -0400, Chuck Lever wrote:
> On Oct 21, 2010, at 10:05 AM, Trond Myklebust wrote:
> > It seems to me that the problem is basically that RPC_IS_SOFTCONN is
> > poorly defined. IMO, the definition should really be that
> > 'RPC_IS_SOFTCONN' tasks MUST exit with an error if they ever have to run
> > call_bind or a call_connect more than once (i.e. if they ever have to
> > loop back).
> 
> I think you are suggesting that we define the places where RPC_IS_SOFTCONN should be tested as exactly those places that might want to restart the RPC call via rpc_force_rebind, tk_action = call_refresh, and so on.  Yes?

Yes.

> > With that in mind, I really don't see why the RPC_IS_SOFTCONN case is
> > being handled in call_transmit_status() instead of in call_status().
> 
> I tried it in call_status(), but it didn't work as expected.  My notes aren't specific about the problem.  One thing is for sure, there seems to be some redundant error handling logic in both of those states.  Unraveling it might be more than Neil bargained for, but would help "future generations."

call_transmit_status() is there basically in order to call
xprt_end_transmit() in the cases where we want to free up the socket
write lock (and then possibly sleep).

Actual state error handling is supposed occur in call_status().

Cheers
   Trond

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts.
  2010-10-21 14:42               ` Trond Myklebust
@ 2010-10-21 19:40                 ` Jeff Layton
  2010-10-21 19:47                   ` Trond Myklebust
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff Layton @ 2010-10-21 19:40 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Chuck Lever, Neil Brown, Linux NFS Mailing List

On Thu, 21 Oct 2010 10:42:04 -0400
Trond Myklebust <Trond.Myklebust@netapp.com> wrote:

> On Thu, 2010-10-21 at 10:31 -0400, Chuck Lever wrote:
> > On Oct 21, 2010, at 10:05 AM, Trond Myklebust wrote:
> > > It seems to me that the problem is basically that RPC_IS_SOFTCONN is
> > > poorly defined. IMO, the definition should really be that
> > > 'RPC_IS_SOFTCONN' tasks MUST exit with an error if they ever have to run
> > > call_bind or a call_connect more than once (i.e. if they ever have to
> > > loop back).
> > 
> > I think you are suggesting that we define the places where RPC_IS_SOFTCONN should be tested as exactly those places that might want to restart the RPC call via rpc_force_rebind, tk_action = call_refresh, and so on.  Yes?
> 
> Yes.
> 
> > > With that in mind, I really don't see why the RPC_IS_SOFTCONN case is
> > > being handled in call_transmit_status() instead of in call_status().
> > 
> > I tried it in call_status(), but it didn't work as expected.  My notes aren't specific about the problem.  One thing is for sure, there seems to be some redundant error handling logic in both of those states.  Unraveling it might be more than Neil bargained for, but would help "future generations."
> 
> call_transmit_status() is there basically in order to call
> xprt_end_transmit() in the cases where we want to free up the socket
> write lock (and then possibly sleep).
> 
> Actual state error handling is supposed occur in call_status().
> 

This is terribly confusing. So the connect handling is done in
call_transmit?

It seems like it would make more sense to have the connect handling
mostly done in call_connect and call_connect_status, and only allow the
tasks to proceed to the call_transmit phase after the connect has
succeeded.

If that's not the case, then maybe some renaming of functions is in
order so that the purpose of them is more clear?

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts.
  2010-10-21 19:40                 ` Jeff Layton
@ 2010-10-21 19:47                   ` Trond Myklebust
  2010-10-21 20:08                     ` Jeff Layton
  0 siblings, 1 reply; 20+ messages in thread
From: Trond Myklebust @ 2010-10-21 19:47 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Chuck Lever, Neil Brown, Linux NFS Mailing List

On Thu, 2010-10-21 at 15:40 -0400, Jeff Layton wrote:
> On Thu, 21 Oct 2010 10:42:04 -0400
> Trond Myklebust <Trond.Myklebust@netapp.com> wrote:
> 
> > On Thu, 2010-10-21 at 10:31 -0400, Chuck Lever wrote:
> > > On Oct 21, 2010, at 10:05 AM, Trond Myklebust wrote:
> > > > It seems to me that the problem is basically that RPC_IS_SOFTCONN is
> > > > poorly defined. IMO, the definition should really be that
> > > > 'RPC_IS_SOFTCONN' tasks MUST exit with an error if they ever have to run
> > > > call_bind or a call_connect more than once (i.e. if they ever have to
> > > > loop back).
> > > 
> > > I think you are suggesting that we define the places where RPC_IS_SOFTCONN should be tested as exactly those places that might want to restart the RPC call via rpc_force_rebind, tk_action = call_refresh, and so on.  Yes?
> > 
> > Yes.
> > 
> > > > With that in mind, I really don't see why the RPC_IS_SOFTCONN case is
> > > > being handled in call_transmit_status() instead of in call_status().
> > > 
> > > I tried it in call_status(), but it didn't work as expected.  My notes aren't specific about the problem.  One thing is for sure, there seems to be some redundant error handling logic in both of those states.  Unraveling it might be more than Neil bargained for, but would help "future generations."
> > 
> > call_transmit_status() is there basically in order to call
> > xprt_end_transmit() in the cases where we want to free up the socket
> > write lock (and then possibly sleep).
> > 
> > Actual state error handling is supposed occur in call_status().
> > 
> 
> This is terribly confusing. So the connect handling is done in
> call_transmit?

No. Connection-related _error conditions_ that result from trying to
send stuff through a socket that is not connected, are handled in
call_status(). Usually, by sending the process back to call_connect().

> It seems like it would make more sense to have the connect handling
> mostly done in call_connect and call_connect_status, and only allow the
> tasks to proceed to the call_transmit phase after the connect has
> succeeded.

That is the case. However sockets can and do get closed by the _server_
in unpredictable ways. This is what may need to be handled after the
task has passed the call_connect state.

   Trond


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts.
  2010-10-21 19:47                   ` Trond Myklebust
@ 2010-10-21 20:08                     ` Jeff Layton
  2010-10-21 20:18                       ` Trond Myklebust
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff Layton @ 2010-10-21 20:08 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Chuck Lever, Neil Brown, Linux NFS Mailing List

On Thu, 21 Oct 2010 15:47:24 -0400
Trond Myklebust <Trond.Myklebust@netapp.com> wrote:

> On Thu, 2010-10-21 at 15:40 -0400, Jeff Layton wrote:
> > On Thu, 21 Oct 2010 10:42:04 -0400
> > Trond Myklebust <Trond.Myklebust@netapp.com> wrote:
> > 
> > > On Thu, 2010-10-21 at 10:31 -0400, Chuck Lever wrote:
> > > > On Oct 21, 2010, at 10:05 AM, Trond Myklebust wrote:
> > > > > It seems to me that the problem is basically that RPC_IS_SOFTCONN is
> > > > > poorly defined. IMO, the definition should really be that
> > > > > 'RPC_IS_SOFTCONN' tasks MUST exit with an error if they ever have to run
> > > > > call_bind or a call_connect more than once (i.e. if they ever have to
> > > > > loop back).
> > > > 
> > > > I think you are suggesting that we define the places where RPC_IS_SOFTCONN should be tested as exactly those places that might want to restart the RPC call via rpc_force_rebind, tk_action = call_refresh, and so on.  Yes?
> > > 
> > > Yes.
> > > 
> > > > > With that in mind, I really don't see why the RPC_IS_SOFTCONN case is
> > > > > being handled in call_transmit_status() instead of in call_status().
> > > > 
> > > > I tried it in call_status(), but it didn't work as expected.  My notes aren't specific about the problem.  One thing is for sure, there seems to be some redundant error handling logic in both of those states.  Unraveling it might be more than Neil bargained for, but would help "future generations."
> > > 
> > > call_transmit_status() is there basically in order to call
> > > xprt_end_transmit() in the cases where we want to free up the socket
> > > write lock (and then possibly sleep).
> > > 
> > > Actual state error handling is supposed occur in call_status().
> > > 
> > 
> > This is terribly confusing. So the connect handling is done in
> > call_transmit?
> 
> No. Connection-related _error conditions_ that result from trying to
> send stuff through a socket that is not connected, are handled in
> call_status(). Usually, by sending the process back to call_connect().
> 

Yep, I get that part...

> > It seems like it would make more sense to have the connect handling
> > mostly done in call_connect and call_connect_status, and only allow the
> > tasks to proceed to the call_transmit phase after the connect has
> > succeeded.
> 
> That is the case. However sockets can and do get closed by the _server_
> in unpredictable ways. This is what may need to be handled after the
> task has passed the call_connect state.
> 

Right, the socket can change state at any time, but the code doesn't
seem to do what you describe for initial connects.

In the EHOSTDOWN case, kernel_connect returns EINPROGRESS and then
xs_error_report is called with a socket error of EHOSTDOWN. That wakes
up the tasks with a status of EAGAIN and call_connect_status sends the
task to call_transmit even though the socket is still not connected.

Having xs_error_report set the status of all tasks with EAGAIN seems
wrong to me since we're essentially losing the error code that the
socket layer sent up.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts.
  2010-10-21 20:08                     ` Jeff Layton
@ 2010-10-21 20:18                       ` Trond Myklebust
  2011-03-23  6:41                         ` NeilBrown
  0 siblings, 1 reply; 20+ messages in thread
From: Trond Myklebust @ 2010-10-21 20:18 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Chuck Lever, Neil Brown, Linux NFS Mailing List

On Thu, 2010-10-21 at 16:08 -0400, Jeff Layton wrote:
> On Thu, 21 Oct 2010 15:47:24 -0400
> Trond Myklebust <Trond.Myklebust@netapp.com> wrote:
> 
> > On Thu, 2010-10-21 at 15:40 -0400, Jeff Layton wrote:
> > > On Thu, 21 Oct 2010 10:42:04 -0400
> > > Trond Myklebust <Trond.Myklebust@netapp.com> wrote:
> > > 
> > > > On Thu, 2010-10-21 at 10:31 -0400, Chuck Lever wrote:
> > > > > On Oct 21, 2010, at 10:05 AM, Trond Myklebust wrote:
> > > > > > It seems to me that the problem is basically that RPC_IS_SOFTCONN is
> > > > > > poorly defined. IMO, the definition should really be that
> > > > > > 'RPC_IS_SOFTCONN' tasks MUST exit with an error if they ever have to run
> > > > > > call_bind or a call_connect more than once (i.e. if they ever have to
> > > > > > loop back).
> > > > > 
> > > > > I think you are suggesting that we define the places where RPC_IS_SOFTCONN should be tested as exactly those places that might want to restart the RPC call via rpc_force_rebind, tk_action = call_refresh, and so on.  Yes?
> > > > 
> > > > Yes.
> > > > 
> > > > > > With that in mind, I really don't see why the RPC_IS_SOFTCONN case is
> > > > > > being handled in call_transmit_status() instead of in call_status().
> > > > > 
> > > > > I tried it in call_status(), but it didn't work as expected.  My notes aren't specific about the problem.  One thing is for sure, there seems to be some redundant error handling logic in both of those states.  Unraveling it might be more than Neil bargained for, but would help "future generations."
> > > > 
> > > > call_transmit_status() is there basically in order to call
> > > > xprt_end_transmit() in the cases where we want to free up the socket
> > > > write lock (and then possibly sleep).
> > > > 
> > > > Actual state error handling is supposed occur in call_status().
> > > > 
> > > 
> > > This is terribly confusing. So the connect handling is done in
> > > call_transmit?
> > 
> > No. Connection-related _error conditions_ that result from trying to
> > send stuff through a socket that is not connected, are handled in
> > call_status(). Usually, by sending the process back to call_connect().
> > 
> 
> Yep, I get that part...
> 
> > > It seems like it would make more sense to have the connect handling
> > > mostly done in call_connect and call_connect_status, and only allow the
> > > tasks to proceed to the call_transmit phase after the connect has
> > > succeeded.
> > 
> > That is the case. However sockets can and do get closed by the _server_
> > in unpredictable ways. This is what may need to be handled after the
> > task has passed the call_connect state.
> > 
> 
> Right, the socket can change state at any time, but the code doesn't
> seem to do what you describe for initial connects.
> 
> In the EHOSTDOWN case, kernel_connect returns EINPROGRESS and then
> xs_error_report is called with a socket error of EHOSTDOWN. That wakes
> up the tasks with a status of EAGAIN and call_connect_status sends the
> task to call_transmit even though the socket is still not connected.
> 
> Having xs_error_report set the status of all tasks with EAGAIN seems
> wrong to me since we're essentially losing the error code that the
> socket layer sent up.

No. The next task to try to access the socket should pick it up.

See my previous comment about ensuring that we don't try to handle
state-related without adequate protection (i.e. locks) to ensure that
the state being reported by the error hasn't changed.

Trond

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts.
  2010-10-21 20:18                       ` Trond Myklebust
@ 2011-03-23  6:41                         ` NeilBrown
  0 siblings, 0 replies; 20+ messages in thread
From: NeilBrown @ 2011-03-23  6:41 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Jeff Layton, Chuck Lever, Linux NFS Mailing List



Hi all,

 I wold like to resurrect this old thread from last October, as the issue is
still a problem.

i.e. when the NFSv4 mount client in the kernel gets ENETUNREACH it doesn't
fail immediately but rather it times out and then fails.
Note that it doesn't timeout, retry, then fail.
Rather it gets the error status, waits the timeout (2 minutes), and then fails
with no retry.

The message below was the last in the thread.  There-in Trond says something
which sounds sensible and relevant, but unfortunately I completely fail to
understand the full relevance of because I don't know the ins-and-outs of the
rpc layer well enough.

Trond (or anyone else):  Could you possibly expand on this, maybe with some
pointers into the code.  I'm happy to try to come up with a fix myself, but I
don't like my chances with my current level of understanding.

Thanks,
NeilBrown




On Thu, 21 Oct 2010 16:18:18 -0400 Trond Myklebust
<Trond.Myklebust@netapp.com> wrote:

> On Thu, 2010-10-21 at 16:08 -0400, Jeff Layton wrote:
> > On Thu, 21 Oct 2010 15:47:24 -0400
> > Trond Myklebust <Trond.Myklebust@netapp.com> wrote:
> > 
> > > On Thu, 2010-10-21 at 15:40 -0400, Jeff Layton wrote:
> > > > On Thu, 21 Oct 2010 10:42:04 -0400
> > > > Trond Myklebust <Trond.Myklebust@netapp.com> wrote:
> > > > 
> > > > > On Thu, 2010-10-21 at 10:31 -0400, Chuck Lever wrote:
> > > > > > On Oct 21, 2010, at 10:05 AM, Trond Myklebust wrote:
> > > > > > > It seems to me that the problem is basically that RPC_IS_SOFTCONN is
> > > > > > > poorly defined. IMO, the definition should really be that
> > > > > > > 'RPC_IS_SOFTCONN' tasks MUST exit with an error if they ever have to run
> > > > > > > call_bind or a call_connect more than once (i.e. if they ever have to
> > > > > > > loop back).
> > > > > > 
> > > > > > I think you are suggesting that we define the places where RPC_IS_SOFTCONN should be tested as exactly those places that might want to restart the RPC call via rpc_force_rebind, tk_action = call_refresh, and so on.  Yes?
> > > > > 
> > > > > Yes.
> > > > > 
> > > > > > > With that in mind, I really don't see why the RPC_IS_SOFTCONN case is
> > > > > > > being handled in call_transmit_status() instead of in call_status().
> > > > > > 
> > > > > > I tried it in call_status(), but it didn't work as expected.  My notes aren't specific about the problem.  One thing is for sure, there seems to be some redundant error handling logic in both of those states.  Unraveling it might be more than Neil bargained for, but would help "future generations."
> > > > > 
> > > > > call_transmit_status() is there basically in order to call
> > > > > xprt_end_transmit() in the cases where we want to free up the socket
> > > > > write lock (and then possibly sleep).
> > > > > 
> > > > > Actual state error handling is supposed occur in call_status().
> > > > > 
> > > > 
> > > > This is terribly confusing. So the connect handling is done in
> > > > call_transmit?
> > > 
> > > No. Connection-related _error conditions_ that result from trying to
> > > send stuff through a socket that is not connected, are handled in
> > > call_status(). Usually, by sending the process back to call_connect().
> > > 
> > 
> > Yep, I get that part...
> > 
> > > > It seems like it would make more sense to have the connect handling
> > > > mostly done in call_connect and call_connect_status, and only allow the
> > > > tasks to proceed to the call_transmit phase after the connect has
> > > > succeeded.
> > > 
> > > That is the case. However sockets can and do get closed by the _server_
> > > in unpredictable ways. This is what may need to be handled after the
> > > task has passed the call_connect state.
> > > 
> > 
> > Right, the socket can change state at any time, but the code doesn't
> > seem to do what you describe for initial connects.
> > 
> > In the EHOSTDOWN case, kernel_connect returns EINPROGRESS and then
> > xs_error_report is called with a socket error of EHOSTDOWN. That wakes
> > up the tasks with a status of EAGAIN and call_connect_status sends the
> > task to call_transmit even though the socket is still not connected.
> > 
> > Having xs_error_report set the status of all tasks with EAGAIN seems
> > wrong to me since we're essentially losing the error code that the
> > socket layer sent up.
> 
> No. The next task to try to access the socket should pick it up.
> 
> See my previous comment about ensuring that we don't try to handle
> state-related without adequate protection (i.e. locks) to ensure that
> the state being reported by the error hasn't changed.
> 
> Trond
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2011-03-23  6:41 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-10-20  7:17 NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts Neil Brown
2010-10-20 14:29 ` Chuck Lever
2010-10-20 21:29   ` Neil Brown
2010-10-21  0:56     ` Neil Brown
2010-10-21 12:09       ` Jeff Layton
2010-10-21 13:52         ` Chuck Lever
2010-10-21 14:10       ` Chuck Lever
2010-10-20 17:55 ` Jeff Layton
2010-10-20 19:16   ` Jeff Layton
2010-10-20 20:40     ` Neil Brown
2010-10-21  0:45       ` Jeff Layton
2010-10-21  3:25         ` Neil Brown
2010-10-21 14:05           ` Trond Myklebust
2010-10-21 14:31             ` Chuck Lever
2010-10-21 14:42               ` Trond Myklebust
2010-10-21 19:40                 ` Jeff Layton
2010-10-21 19:47                   ` Trond Myklebust
2010-10-21 20:08                     ` Jeff Layton
2010-10-21 20:18                       ` Trond Myklebust
2011-03-23  6:41                         ` NeilBrown

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.