linux-kernel.vger.kernel.org archive mirror
* Infinite loop when connecting to unreachable NFS servers
@ 2017-03-14 22:15 Sebastian Schmidt
  2017-03-14 22:53 ` Jeff Layton
  2017-03-20 22:42 ` Sebastian Schmidt
  0 siblings, 2 replies; 3+ messages in thread
From: Sebastian Schmidt @ 2017-03-14 22:15 UTC (permalink / raw)
  To: linux-nfs
  Cc: J. Bruce Fields, Jeff Layton, Trond Myklebust, Anna Schumaker,
	David S. Miller, linux-kernel


Hi,

I was debugging some mysterious high CPU usage and tracked it down to a
monitoring daemon regularly calling stat*() on an NFS automount
directory. The problem is triggered when mount.nfs passes mount() an
addr= that points to an unreachable address (i.e. connecting fails
immediately).

One way to reproduce is by trying to mount Google's DNS server, whose
IPv6 address isn't properly parsed or rejected by nfs-utils, causing an
invalid addr= to be passed to mount():

# strace -vf mount.nfs -v 2001:4860:4860:0:0:0:0:8888:/ /mnt
write(1, "mount.nfs: timeout set for Tue Mar 14 22:57:49 2017", ...
[...]
socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP) = 3
bind(3, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.7.209")}, 16) = -1 EINVAL (Invalid argument)
close(3)                                = 0
write(1, "mount.nfs: trying text-based options 'vers=4.2,addr=0.0.7.209,clientaddr=0.0.0.0'", ...
mount("2001:4860:4860:0:0:0:0:8888:/", "/mnt", "nfs", 0, "vers=4.2,addr=0.0.7.209,clientad"...) = ?
+++ killed by SIGKILL +++

While this is certainly a bug in mount.nfs, too, the bug can also be
triggered with net.ipv6.conf.all.disable_ipv6=1 and a hostname that
resolves only to an IPv6 address (an /etc/hosts entry suffices).

<4>[ 6564.082853] RPC:     8 sleep_on(queue "xprt_pending" time 4296533344)
<4>[ 6564.082855] RPC:     8 added to queue ffff8a673b146a18 "xprt_pending"
<4>[ 6564.082858] RPC:     8 setting alarm for 180000 ms
<4>[ 6564.082860] RPC:       xs_connect scheduled xprt ffff8a673b146800
<4>[ 6564.082863] RPC:     8 sync task going to sleep
<4>[ 6564.082868] RPC:       xs_bind 0000:0000:0000:0000:0000:0000:0000:0000:784: ok (0)
<4>[ 6564.082870] RPC:       worker connecting xprt ffff8a673b146800 via tcp to 2001:4860:4860::8888 (port 2049)
<4>[ 6564.082874] RPC:       ffff8a673b146800 connect status 99 connected 0 sock state 7
<4>[ 6564.082876] RPC:     8 __rpc_wake_up_task (now 4296533344)
<4>[ 6564.082878] RPC:     8 disabling timer
<4>[ 6564.082880] RPC:     8 removed from queue ffff8a673b146a18 "xprt_pending"
<4>[ 6564.082883] RPC:       __rpc_wake_up_task done
<4>[ 6564.082885] RPC:       xs_tcp_state_change client ffff8a673b146800...
<4>[ 6564.082887] RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
<4>[ 6564.082889] RPC:       disconnected transport ffff8a673b146800
<4>[ 6564.082893] RPC:       wake_up_first(ffff8a673b146970 "xprt_sending")
<4>[ 6564.082897] RPC:     8 sync task resuming
<4>[ 6564.082899] RPC:     8 xprt_connect_status: retrying
<4>[ 6564.082901] RPC:     8 call_connect_status (status -11)
<4>[ 6564.082903] RPC:     8 call_timeout (minor)
<4>[ 6564.082905] RPC:     8 call_bind (status 0)
<4>[ 6564.082907] RPC:     8 call_connect xprt ffff8a673b146800 is not connected
<4>[ 6564.082909] RPC:     8 xprt_connect xprt ffff8a673b146800 is not connected
<4>[ 6564.082912] RPC:     8 sleep_on(queue "xprt_pending" time 4296533344)
<4>[ 6564.082913] RPC:     8 added to queue ffff8a673b146a18 "xprt_pending"
<4>[ 6564.082914] RPC:     8 setting alarm for 180000 ms
<4>[ 6564.082914] RPC:       xs_connect scheduled xprt ffff8a673b146800
<4>[ 6564.082916] RPC:     8 sync task going to sleep
<4>[ 6564.082919] RPC:       xs_bind 0000:0000:0000:0000:0000:0000:0000:0000:938: ok (0)
<4>[ 6564.082920] RPC:       worker connecting xprt ffff8a673b146800 via tcp to 2001:4860:4860::8888 (port 2049)
<4>[ 6564.082923] RPC:       ffff8a673b146800 connect status 99 connected 0 sock state 7
<4>[ 6564.082924] RPC:     8 __rpc_wake_up_task (now 4296533344)
<4>[ 6564.082924] RPC:     8 disabling timer
(and so on, until mount() times out)

The same loop occurs for the IPv6-literal test case, where it fails to
connect to 0.0.7.209, so IPv4 is affected, too. While grepping around I
found a comment on xs_connect() saying "If a UDP socket connect fails,
the delay behavior here prevents retry floods (hard mounts).", but I
haven't figured out how to trigger the UDP connection code.

Sebastian



* Re: Infinite loop when connecting to unreachable NFS servers
  2017-03-14 22:15 Infinite loop when connecting to unreachable NFS servers Sebastian Schmidt
@ 2017-03-14 22:53 ` Jeff Layton
  2017-03-20 22:42 ` Sebastian Schmidt
  1 sibling, 0 replies; 3+ messages in thread
From: Jeff Layton @ 2017-03-14 22:53 UTC (permalink / raw)
  To: Sebastian Schmidt, linux-nfs
  Cc: J. Bruce Fields, Trond Myklebust, Anna Schumaker,
	David S. Miller, linux-kernel

On Tue, 2017-03-14 at 23:15 +0100, Sebastian Schmidt wrote:
> Hi,
> 
> I was debugging some mysterious high CPU usage and tracked it down to a
> monitoring daemon regularly calling stat*() on an NFS automount
> directory. The problem is triggered when mount.nfs passes mount() an
> addr= that points to an unreachable address (i.e. connecting fails
> immediately).
> 
> One way to reproduce is by trying to mount Google's DNS server, whose
> IPv6 address isn't properly parsed or rejected by nfs-utils, causing an
> invalid addr= to be passed to mount():
> 
> # strace -vf mount.nfs -v 2001:4860:4860:0:0:0:0:8888:/ /mnt
> write(1, "mount.nfs: timeout set for Tue Mar 14 22:57:49 2017", ...
> [...]
> socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP) = 3
> bind(3, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
> connect(3, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.7.209")}, 16) = -1 EINVAL (Invalid argument)
> close(3)                                = 0
> write(1, "mount.nfs: trying text-based options 'vers=4.2,addr=0.0.7.209,clientaddr=0.0.0.0'", ...
> mount("2001:4860:4860:0:0:0:0:8888:/", "/mnt", "nfs", 0, "vers=4.2,addr=0.0.7.209,clientad"...) = ?
> +++ killed by SIGKILL +++
> 

That mount syntax is invalid. From nfs(5):

       The server's hostname can be an unqualified hostname, a fully qualified
       domain name, a dotted quad IPv4 address, or an IPv6 address enclosed in
       square brackets.  Link-local and  site-local  IPv6  addresses  must  be
       accompanied  by  an  interface  identifier.  See ipv6(7) for details on
       specifying raw IPv6 addresses.

So, something like this may work better:

    mount -t nfs '[2001:4860:4860:0:0:0:0:8888]:/' /mnt


> While this is certainly a bug in mount.nfs, too, the bug can also be
> triggered with net.ipv6.conf.all.disable_ipv6=1 and a hostname that
> resolves only to an IPv6 address (an /etc/hosts entry suffices).
> 
> <4>[ 6564.082853] RPC:     8 sleep_on(queue "xprt_pending" time 4296533344)
> <4>[ 6564.082855] RPC:     8 added to queue ffff8a673b146a18 "xprt_pending"
> <4>[ 6564.082858] RPC:     8 setting alarm for 180000 ms
> <4>[ 6564.082860] RPC:       xs_connect scheduled xprt ffff8a673b146800
> <4>[ 6564.082863] RPC:     8 sync task going to sleep
> <4>[ 6564.082868] RPC:       xs_bind 0000:0000:0000:0000:0000:0000:0000:0000:784: ok (0)
> <4>[ 6564.082870] RPC:       worker connecting xprt ffff8a673b146800 via tcp to 2001:4860:4860::8888 (port 2049)
> <4>[ 6564.082874] RPC:       ffff8a673b146800 connect status 99 connected 0 sock state 7
> <4>[ 6564.082876] RPC:     8 __rpc_wake_up_task (now 4296533344)
> <4>[ 6564.082878] RPC:     8 disabling timer
> <4>[ 6564.082880] RPC:     8 removed from queue ffff8a673b146a18 "xprt_pending"
> <4>[ 6564.082883] RPC:       __rpc_wake_up_task done
> <4>[ 6564.082885] RPC:       xs_tcp_state_change client ffff8a673b146800...
> <4>[ 6564.082887] RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
> <4>[ 6564.082889] RPC:       disconnected transport ffff8a673b146800
> <4>[ 6564.082893] RPC:       wake_up_first(ffff8a673b146970 "xprt_sending")
> <4>[ 6564.082897] RPC:     8 sync task resuming
> <4>[ 6564.082899] RPC:     8 xprt_connect_status: retrying
> <4>[ 6564.082901] RPC:     8 call_connect_status (status -11)
> <4>[ 6564.082903] RPC:     8 call_timeout (minor)
> <4>[ 6564.082905] RPC:     8 call_bind (status 0)
> <4>[ 6564.082907] RPC:     8 call_connect xprt ffff8a673b146800 is not connected
> <4>[ 6564.082909] RPC:     8 xprt_connect xprt ffff8a673b146800 is not connected
> <4>[ 6564.082912] RPC:     8 sleep_on(queue "xprt_pending" time 4296533344)
> <4>[ 6564.082913] RPC:     8 added to queue ffff8a673b146a18 "xprt_pending"
> <4>[ 6564.082914] RPC:     8 setting alarm for 180000 ms
> <4>[ 6564.082914] RPC:       xs_connect scheduled xprt ffff8a673b146800
> <4>[ 6564.082916] RPC:     8 sync task going to sleep
> <4>[ 6564.082919] RPC:       xs_bind 0000:0000:0000:0000:0000:0000:0000:0000:938: ok (0)
> <4>[ 6564.082920] RPC:       worker connecting xprt ffff8a673b146800 via tcp to 2001:4860:4860::8888 (port 2049)
> <4>[ 6564.082923] RPC:       ffff8a673b146800 connect status 99 connected 0 sock state 7
> <4>[ 6564.082924] RPC:     8 __rpc_wake_up_task (now 4296533344)
> <4>[ 6564.082924] RPC:     8 disabling timer
> (and so on, until mount() times out)
> 
> The same loop occurs for the IPv6-literal test case, where it fails to
> connect to 0.0.7.209, so IPv4 is affected, too. While grepping around I
> found a comment on xs_connect() saying "If a UDP socket connect fails,
> the delay behavior here prevents retry floods (hard mounts).", but I
> haven't figured out how to trigger the UDP connection code.
> 



-- 
Jeff Layton <jlayton@poochiereds.net>


* Re: Infinite loop when connecting to unreachable NFS servers
  2017-03-14 22:15 Infinite loop when connecting to unreachable NFS servers Sebastian Schmidt
  2017-03-14 22:53 ` Jeff Layton
@ 2017-03-20 22:42 ` Sebastian Schmidt
  1 sibling, 0 replies; 3+ messages in thread
From: Sebastian Schmidt @ 2017-03-20 22:42 UTC (permalink / raw)
  To: linux-nfs, Trond Myklebust
  Cc: J. Bruce Fields, Jeff Layton, Anna Schumaker, David S. Miller,
	linux-kernel


Hi,

On Tue, Mar 14, 2017 at 11:15:25PM +0100, Sebastian Schmidt wrote:
> I was debugging some mysterious high CPU usage and tracked it down to a
> monitoring daemon regularly calling stat*() on an NFS automount
> directory. The problem is triggered when mount.nfs passes mount() an
> addr= that points to an unreachable address (i.e. connecting fails
> immediately).

I looked further into the busy-reconnect issue and I want to share what
I believe happens.

My initial report called mount.nfs with "2001:4860:4860:0:0:0:0:8888:/",
which is, as Jeff pointed out, incorrect, but it caused mount(2) to be
called with addr=0.0.7.209. In reality, I'm losing my default route and
an actually valid addr= gets passed to mount(), but both cases hit the
same code.

In xs_tcp_setup_socket(), xs_tcp_finish_connecting() returns an error. 
For my made-up test case (0.0.7.209) it's EINVAL, in real life
ENETUNREACH. The third trigger is passing a valid IPv6 addr= and setting
net.ipv6.conf.all.disable_ipv6 to 1, thereby causing an EADDRNOTAVAIL.

Interestingly, the EADDRNOTAVAIL branch has this comment:
        /* We're probably in TIME_WAIT. Get rid of existing socket,
         * and retry
         */
        xs_tcp_force_close(xprt);
        break;

whereas the EINVAL and ENETUNREACH case carries this one:
        /* retry with existing socket, after a delay */
        xs_tcp_force_close(xprt);
        goto out;

So both calls to xs_tcp_force_close() claim to retry, but one reuses the
socket and the other doesn't? The only code skipped by the "goto out" in
the second case is "status = -EAGAIN", and that apparently does not
cause any delayed retries either.

That second case was changed in commit
4efdd92c921135175a85452cd41273d9e2788db3, which added the call to
xs_tcp_force_close() in the first place. That call, however, triggers an
autoclose via xprt_force_disconnect(), eventually invalidating
transport->sock.

xs_connect(), however, checks transport->sock for non-NULL and, only in
that case, schedules a delayed reconnect. If disable_ipv6=1 had already
caused connect() to return EADDRNOTAVAIL, rather than ENETUNREACH as on
3.19-ish kernels, that same busy-reconnect loop would have been
triggered even before 4efdd92c.

So apparently the only code responsible for delaying a reconnect lives
in xs_connect(), and because xs_tcp_force_close() is called on every
connection error, transport->sock gets NULLed by the autoclose and that
delay code never runs.

Here I'm stuck at figuring out what the code is intended to do and would
appreciate any help.

Thanks,

Sebastian


