Hi,

On Tue, Mar 14, 2017 at 11:15:25PM +0100, Sebastian Schmidt wrote:
> I was debugging some mysterious high CPU usage and tracked it down to
> a monitoring daemon regularly calling stat*() on an NFS automount
> directory. The problem is triggered when mount.nfs passes mount() an
> addr= that points to an unreachable address (i.e. connecting fails
> immediately).

I looked further into the busy-reconnect issue and I want to share what
I believe happens.

My initial report called mount.nfs with "2001:4860:4860:0:0:0:0:8888:/",
which is, as Jeff pointed out, incorrect, but it caused mount(2) to be
called with addr=0.0.7.209. In reality, I'm losing my default route and
an actually valid addr= is being passed to mount(), but both cases hit
the same code.

In xs_tcp_setup_socket(), xs_tcp_finish_connecting() returns an error.
For my made-up test case (0.0.7.209) it's EINVAL; in real life it's
ENETUNREACH. A third trigger is passing a valid IPv6 addr= and setting
net.ipv6.conf.all.disable_ipv6 to 1, which causes an EADDRNOTAVAIL.

Interestingly, the EADDRNOTAVAIL branch has this comment:

	/* We're probably in TIME_WAIT. Get rid of existing socket,
	 * and retry
	 */
	xs_tcp_force_close(xprt);
	break;

whereas the EINVAL and ENETUNREACH cases carry this one:

	/* retry with existing socket, after a delay */
	xs_tcp_force_close(xprt);
	goto out;

So both calls to xs_tcp_force_close() claim to retry, but one reuses the
socket and the other doesn't? The only code skipped by the "goto out" in
the second case is "status = -EAGAIN", and this apparently does not
cause any delayed retries either.

That second case was changed in 4efdd92c921135175a85452cd41273d9e2788db3,
which initially added the call to xs_tcp_force_close(). That call,
however, triggers an autoclose via xprt_force_disconnect(), eventually
invalidating transport->sock. And transport->sock is exactly what
xs_connect() checks for != NULL: only in that case is a delayed
reconnect scheduled.
If disable_ipv6=1 had already caused connect() to return EADDRNOTAVAIL,
rather than ENETUNREACH as on 3.19-ish kernels, the same busy-reconnect
loop would have been triggered in that case too, even before 4efdd92c.

So apparently the only(?) code responsible for delaying a reconnect is
in xs_connect(), and because xs_tcp_force_close() is called on every
connection error, transport->sock gets NULLed by autoclose and that
delay code is never run.

This is where I'm stuck: I can't figure out what the code is intended
to do, and I would appreciate any help.

Thanks,
Sebastian