* Inconsistent error codes between NFSv4 and v3 on network issues
@ 2013-03-01 11:43 Jan Engelhardt
  2013-03-04 14:16 ` J. Bruce Fields
  2013-03-04 17:47 ` Chuck Lever
  0 siblings, 2 replies; 8+ messages in thread
From: Jan Engelhardt @ 2013-03-01 11:43 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

Hi.


I have a case here with a Linux 3.7.9 system, a virtual machine in an
RFC 1918 address range, that fails to mount an NFS export.

linux-3lzm:~ # strace -fe mount mount -t nfs 134.76.12.5:/X /mnt
Process 1477 attached
[pid  1515] mount("134.76.12.5:/X", "/mnt", "nfs", 0, "vers=4,addr=134.76.12.5,clientaddr=0.0.0.0") = -1 EIO (Input/output error)
mount.nfs: mount system call failed
[pid  1477] +++ exited with 32 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1477, si_status=32, si_utime=0, si_stime=0} ---
+++ exited with 32 +++

Nothing in dmesg...

[   84.202243] RPC: Registered named UNIX socket transport module.
[   84.202246] RPC: Registered udp transport module.
[   84.202248] RPC: Registered tcp transport module.
[   84.202249] RPC: Registered tcp NFSv4.1 backchannel transport module.
[   84.205909] FS-Cache: Loaded
[   84.208782] FS-Cache: Netfs 'nfs' registered for caching
[   84.215733] NFS: Registering the id_resolver key type
[   84.215762] Key type id_resolver registered
[   84.215763] Key type id_legacy registered

When mounting with NFSv3, the error became clear:

# strace -fe mount -s 65536 mount -t nfs 134.76.12.5:/X /mnt -o nfsvers=3,nolock
Process 1550 attached
mount.nfs: Network is unreachable
[pid  1550] +++ exited with 32 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1550, si_status=32, si_utime=0, si_stime=0} ---
+++ exited with 32 +++

Can NFSv4 be made to return -ENETUNREACH as well?
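
For reference, a minimal sketch of the two error values involved. The
numbers are the Linux errno values and the messages are the strings seen
in the output above; both are hard-coded here (an assumption made so the
sketch does not depend on the platform it runs on), and the helper name
describe() is purely illustrative.

```python
# Linux errno numbers and the messages seen in the output above.
# Hard-coded on purpose: errno numbering differs between platforms.
LINUX_ERRNO = {
    5: ("EIO", "Input/output error"),
    101: ("ENETUNREACH", "Network is unreachable"),
}

def describe(ret):
    """Turn a negative kernel-style return value such as -5 into text."""
    name, message = LINUX_ERRNO.get(-ret, ("UNKNOWN", "unknown error"))
    return f"{name} ({message})"

print("NFSv4, from mount(2): ", describe(-5))
print("NFSv3, from mount.nfs:", describe(-101))
```

EIO says only that something failed, whereas ENETUNREACH names the
actual cause, which is why the NFSv3 message is the more useful one.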


* Re: Inconsistent error codes between NFSv4 and v3 on network issues
  2013-03-01 11:43 Inconsistent error codes between NFSv4 and v3 on network issues Jan Engelhardt
@ 2013-03-04 14:16 ` J. Bruce Fields
  2013-03-04 14:16   ` J. Bruce Fields
  2013-03-04 17:47 ` Chuck Lever
  1 sibling, 1 reply; 8+ messages in thread
From: J. Bruce Fields @ 2013-03-04 14:16 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: linux-nfs

Adding Chuck to cc:, as he's probably done the most mucking with the
mount code in recent years?

--b.

On Fri, Mar 01, 2013 at 12:43:58PM +0100, Jan Engelhardt wrote:
> Hi.
> 
> 
> I had here a case with a Linux 3.7.9 system, a virtual machine in a
> RFC1918 range, that did not want to mount NFS.
> 
> linux-3lzm:~ # strace -fe mount mount -t nfs 134.76.12.5:/X /mnt
> Process 1477 attached
> [pid  1515] mount("134.76.12.5:/X", "/mnt", "nfs", 0, "vers=4,addr=134.76.12.5,clientaddr=0.0.0.0") = -1 EIO (Input/output error)
> mount.nfs: mount system call failed
> [pid  1477] +++ exited with 32 +++
> --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1477, si_status=32, si_utime=0, si_stime=0} ---
> +++ exited with 32 +++
> 
> Nothing in dmesg...
> 
> [   84.202243] RPC: Registered named UNIX socket transport module.
> [   84.202246] RPC: Registered udp transport module.
> [   84.202248] RPC: Registered tcp transport module.
> [   84.202249] RPC: Registered tcp NFSv4.1 backchannel transport module.
> [   84.205909] FS-Cache: Loaded
> [   84.208782] FS-Cache: Netfs 'nfs' registered for caching
> [   84.215733] NFS: Registering the id_resolver key type
> [   84.215762] Key type id_resolver registered
> [   84.215763] Key type id_legacy registered
> 
> When mounting with NFSv3, the error became clear:
> 
> # strace -fe mount -s 65536 mount -t nfs 134.76.12.5:/X /mnt -o nfsvers=3,nolock
> Process 1550 attached
> mount.nfs: Network is unreachable
> [pid  1550] +++ exited with 32 +++
> --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1550, si_status=32, si_utime=0, si_stime=0} ---
> +++ exited with 32 +++
> 
> Can NFSv4 be made to return -ENETUNREACH as well?


* Re: Inconsistent error codes between NFSv4 and v3 on network issues
  2013-03-04 14:16 ` J. Bruce Fields
@ 2013-03-04 14:16   ` J. Bruce Fields
  0 siblings, 0 replies; 8+ messages in thread
From: J. Bruce Fields @ 2013-03-04 14:16 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: linux-nfs, chuck.lever

Um, for real this time.

On Mon, Mar 04, 2013 at 09:16:00AM -0500, J. Bruce Fields wrote:
> Adding Chuck to cc:, as he's probably done the most mucking with the
> mount code in recent years?
> 
> --b.
> 
> On Fri, Mar 01, 2013 at 12:43:58PM +0100, Jan Engelhardt wrote:
> > Hi.
> > 
> > 
> > I had here a case with a Linux 3.7.9 system, a virtual machine in a
> > RFC1918 range, that did not want to mount NFS.
> > 
> > linux-3lzm:~ # strace -fe mount mount -t nfs 134.76.12.5:/X /mnt
> > Process 1477 attached
> > [pid  1515] mount("134.76.12.5:/X", "/mnt", "nfs", 0, "vers=4,addr=134.76.12.5,clientaddr=0.0.0.0") = -1 EIO (Input/output error)
> > mount.nfs: mount system call failed
> > [pid  1477] +++ exited with 32 +++
> > --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1477, si_status=32, si_utime=0, si_stime=0} ---
> > +++ exited with 32 +++
> > 
> > Nothing in dmesg...
> > 
> > [   84.202243] RPC: Registered named UNIX socket transport module.
> > [   84.202246] RPC: Registered udp transport module.
> > [   84.202248] RPC: Registered tcp transport module.
> > [   84.202249] RPC: Registered tcp NFSv4.1 backchannel transport module.
> > [   84.205909] FS-Cache: Loaded
> > [   84.208782] FS-Cache: Netfs 'nfs' registered for caching
> > [   84.215733] NFS: Registering the id_resolver key type
> > [   84.215762] Key type id_resolver registered
> > [   84.215763] Key type id_legacy registered
> > 
> > When mounting with NFSv3, the error became clear:
> > 
> > # strace -fe mount -s 65536 mount -t nfs 134.76.12.5:/X /mnt -o nfsvers=3,nolock
> > Process 1550 attached
> > mount.nfs: Network is unreachable
> > [pid  1550] +++ exited with 32 +++
> > --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1550, si_status=32, si_utime=0, si_stime=0} ---
> > +++ exited with 32 +++
> > 
> > Can NFSv4 be made to return -ENETUNREACH as well?


* Re: Inconsistent error codes between NFSv4 and v3 on network issues
  2013-03-01 11:43 Inconsistent error codes between NFSv4 and v3 on network issues Jan Engelhardt
  2013-03-04 14:16 ` J. Bruce Fields
@ 2013-03-04 17:47 ` Chuck Lever
  2013-03-04 19:10   ` Jan Engelhardt
  1 sibling, 1 reply; 8+ messages in thread
From: Chuck Lever @ 2013-03-04 17:47 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: J. Bruce Fields, linux-nfs


On Mar 1, 2013, at 6:43 AM, Jan Engelhardt <jengelh@inai.de> wrote:

> Hi.
> 
> 
> I had here a case with a Linux 3.7.9 system, a virtual machine in a
> RFC1918 range, that did not want to mount NFS.
> 
> linux-3lzm:~ # strace -fe mount mount -t nfs 134.76.12.5:/X /mnt
> Process 1477 attached
> [pid  1515] mount("134.76.12.5:/X", "/mnt", "nfs", 0, "vers=4,addr=134.76.12.5,clientaddr=0.0.0.0") = -1 EIO (Input/output error)

"clientaddr=0.0.0.0" is interesting: perhaps the mount.nfs command should have failed right there.

But, let's find out why the kernel is returning EIO.  Enter:

 # rpcdebug -m nfs -s all
 # rpcdebug -m rpc -s call xprt

Try your NFSv4 mount command again, then post relevant excerpts from the kernel log.
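
Once the log has been captured, the same flags can be cleared again so
the kernel log is not flooded. A minimal sketch that builds both sets of
commands: the module and flag names are the ones from the two commands
above, -s and -c are rpcdebug's set and clear options, and the helper
name rpcdebug_commands() is hypothetical.

```python
import shlex

# Modules and flags taken from the two rpcdebug commands above.
DEBUG_FLAGS = {"nfs": ["all"], "rpc": ["call", "xprt"]}

def rpcdebug_commands(enable):
    """Build rpcdebug argv lists; -s sets the flags, -c clears them."""
    action = "-s" if enable else "-c"
    return [["rpcdebug", "-m", module, action, *flags]
            for module, flags in DEBUG_FLAGS.items()]

# Print the commands that switch the debugging output off again.
for argv in rpcdebug_commands(enable=False):
    print(shlex.join(argv))
```

The sketch only prints the commands; running them needs root.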

By the way, does this happen on older kernels?

> mount.nfs: mount system call failed
> [pid  1477] +++ exited with 32 +++
> --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1477, si_status=32, si_utime=0, si_stime=0} ---
> +++ exited with 32 +++
> 
> Nothing in dmesg...
> 
> [   84.202243] RPC: Registered named UNIX socket transport module.
> [   84.202246] RPC: Registered udp transport module.
> [   84.202248] RPC: Registered tcp transport module.
> [   84.202249] RPC: Registered tcp NFSv4.1 backchannel transport module.
> [   84.205909] FS-Cache: Loaded
> [   84.208782] FS-Cache: Netfs 'nfs' registered for caching
> [   84.215733] NFS: Registering the id_resolver key type
> [   84.215762] Key type id_resolver registered
> [   84.215763] Key type id_legacy registered

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com



* Re: Inconsistent error codes between NFSv4 and v3 on network issues
  2013-03-04 17:47 ` Chuck Lever
@ 2013-03-04 19:10   ` Jan Engelhardt
  2013-03-04 20:43     ` Myklebust, Trond
  0 siblings, 1 reply; 8+ messages in thread
From: Jan Engelhardt @ 2013-03-04 19:10 UTC (permalink / raw)
  To: Chuck Lever; +Cc: J. Bruce Fields, linux-nfs


On Monday 2013-03-04 18:47, Chuck Lever wrote:
>> 
>> linux-3lzm:~ # strace -fe mount mount -t nfs 134.76.12.5:/X /mnt
>> Process 1477 attached
>> [pid  1515] mount("134.76.12.5:/X", "/mnt", "nfs", 0, "vers=4,addr=134.76.12.5,clientaddr=0.0.0.0") = -1 EIO (Input/output error)
>
>"clientaddr=0.0.0.0" is interesting: perhaps the mount.nfs command
>should have failed right there.

Maybe (since there is no route). Maybe not: I would interpret 0.0.0.0
as "kernel will pick something".

This kernel 3.7.9 system has rpcbind-0.2.0_git201103171419-7.1.1.x86_64
and nfs-client-1.2.7-2.1.1.x86_64.


>But, let's find out why the kernel is returning EIO.  Enter:
>
> # rpcdebug -m nfs -s all
> # rpcdebug -m rpc -s call xprt
>
>Try your NFSv4 mount command again, then post relevant excerpts from the kernel log.

[205196.949572] NFS: nfs mount opts='vers=4,addr=134.76.12.5,clientaddr=0.0.0.0'
[205196.949580] NFS:   parsing nfs mount option 'vers=4'
[205196.949586] NFS:   parsing nfs mount option 'addr=134.76.12.5'
[205196.949592] NFS:   parsing nfs mount option 'clientaddr=0.0.0.0'
[205196.949597] NFS: MNTPATH: '/X'
[205196.949600] --> nfs4_try_mount()
[205196.949607] --> nfs4_create_server()
[205196.949639] --> nfs4_init_server()
[205196.949641] --> nfs4_set_client()
[205196.949644] --> nfs_get_client(134.76.12.5,v4)
[205196.949651] NFS: get client cookie (0xffff88007abe3800/0xffff88007ba29420)
[205196.949666] RPC:       created transport ffff88007ba50800 with 65536 slots
[205196.949670] RPC:       creating nfs client for 134.76.12.5 (xprt ffff88007ba50800)
[205196.949693] RPC:     2 call_start nfs4 proc NULL (sync)
[205196.949705] RPC:     2 call_reserve (status 0)
[205196.949709] RPC:     2 reserved req ffff88007b342800 xid 11fca56c
[205196.949712] RPC:     2 call_reserveresult (status 0)
[205196.949714] RPC:     2 call_refresh (status 0)
[205196.949716] RPC:     2 call_refreshresult (status 0)
[205196.949718] RPC:     2 call_allocate (status 0)
[205196.949722] RPC:     2 call_bind (status 0)
[205196.949724] RPC:     2 call_connect xprt ffff88007ba50800 is not connected
[205196.949727] RPC:     2 xprt_connect xprt ffff88007ba50800 is not connected
[205196.949765] RPC:     2 xprt_connect_status: error 101 connecting to server 134.76.12.5
[205196.949779] RPC:     2 call_connect_status (status -5)
[205196.949791] RPC:     2 release request ffff88007b342800
[205196.949794] RPC:       rpc_release_client(ffff88007b342e00)
[205196.949797] RPC:       shutting down nfs client for 134.76.12.5
[205196.949798] RPC:       rpc_release_client(ffff88007b342e00)
[205196.949801] RPC:       destroying nfs client for 134.76.12.5
[205196.949803] RPC:       destroying transport ffff88007ba50800
[205196.949812] RPC:       disconnected transport ffff88007ba50800
[205196.949816] nfs_create_rpc_client: cannot create RPC client. Error = -5
[205196.949818] --> nfs_put_client({1})
[205196.949820] --> nfs_free_client(4)
[205196.949822] NFS: releasing client cookie (0xffff88007abe3800/0xffff88007ba29420)
[205196.949824] <-- nfs_free_client()
[205196.949826] <-- nfs4_init_client() = xerror -5
[205196.949828] <-- nfs4_set_client() = xerror -5
[205196.949829] <-- nfs4_init_server() = -5
[205196.949831] --> nfs_free_server()
[205196.949850] <-- nfs_free_server()
[205196.949852] <-- nfs4_create_server() = error -5
[205196.949856] <-- nfs4_try_mount() = -5 [error]

Just nuke your default route, and it should be easily reproducible.
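
The interesting step in that log is where the transport reports error
101 and the RPC task then carries status -5. A minimal sketch that picks
this out of an rpcdebug excerpt; the three log lines are copied from the
output above, while the regular expressions and the function name
find_translations() are illustrative assumptions about the log format,
not anything provided by the kernel or nfs-utils.

```python
import re

# Excerpt copied from the 3.7.9 log above.
LOG = """\
[205196.949765] RPC:     2 xprt_connect_status: error 101 connecting to server 134.76.12.5
[205196.949779] RPC:     2 call_connect_status (status -5)
[205196.949816] nfs_create_rpc_client: cannot create RPC client. Error = -5
"""

XPRT_RE = re.compile(r"RPC:\s+(\d+) xprt_connect_status: error (\d+) connecting")
CALL_RE = re.compile(r"RPC:\s+(\d+) call_connect_status \(status (-?\d+)\)")

def find_translations(log):
    """Return (task_id, transport_error, task_status) for every RPC task
    whose transport error is followed by a call_connect_status line."""
    pending = {}   # task id -> error number reported by xprt_connect_status
    found = []
    for line in log.splitlines():
        m = XPRT_RE.search(line)
        if m:
            pending[m.group(1)] = int(m.group(2))
            continue
        m = CALL_RE.search(line)
        if m and m.group(1) in pending:
            found.append((int(m.group(1)), pending.pop(m.group(1)), int(m.group(2))))
    return found

for task, xprt_err, status in find_translations(LOG):
    print(f"task {task}: transport error {xprt_err} -> task status {status}")
```

On the excerpt this reports task 2 going from transport error 101 to
task status -5, i.e. the ENETUNREACH is already gone by the time
call_connect_status is logged.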


>By the way, does this happen on older kernels?

On kernel 3.0.51 with rpcbind-0.1.6+git20080930-6.18.1 and
nfs-utils-1.2.3, I observe that

rtsnode1:~ # strace -fe mount -s 65535 mount -t nfs 134.76.12.5:/X /mnt -o nfsvers=4
Process 24358 attached
Process 24357 suspended
[pid 24358] mount("134.76.12.5:/X", "/mnt", "nfs", 0, "nfsvers=4,addr=134.76.12.5,clientaddr=0.0.0.0"
<hang, but interruptible>

rpcdebug/dmesg:
[790206.304549] NFS: nfs mount opts='nfsvers=4,addr=134.76.12.5,clientaddr=0.0.0.0'
[790206.304549] NFS:   parsing nfs mount option 'nfsvers=4'
[790206.304549] NFS:   parsing nfs mount option 'addr=134.76.12.5'
[790206.304549] NFS:   parsing nfs mount option 'clientaddr=0.0.0.0'
[790206.304549] NFS: MNTPATH: '/X'
[790206.304549] --> nfs4_try_mount()
[790206.304549] --> nfs4_create_server()
[790206.304549] --> nfs4_init_server()
[790206.304549] --> nfs4_set_client()
[790206.304549] --> nfs_get_client(134.76.12.5,v4)
[790206.304549] NFS: get client cookie (0xffff880079a16400/0xffff880079cd41e0)
[790206.304549] RPC:       created transport ffff880078f26800 with 16 slots
[790206.304549] RPC:       creating nfs client for 134.76.12.5 (xprt ffff880078f26800)
[790206.304550] RPC:    45 call_start nfs4 proc NULL (sync)
[790206.304550] RPC:    45 call_reserve (status 0)
[790206.304550] RPC:    45 reserved req ffff880037a16000 xid 68604dfd
[790206.304550] RPC:    45 call_reserveresult (status 0)
[790206.304550] RPC:    45 call_refresh (status 0)
[790206.304550] RPC:    45 call_refreshresult (status 0)
[790206.304550] RPC:    45 call_allocate (status 0)
[790206.304550] RPC:    45 call_bind (status 0)
[790206.304550] RPC:    45 call_connect xprt ffff880078f26800 is not connected
[790206.304550] RPC:    45 xprt_connect xprt ffff880078f26800 is not connected

<no further output until Ctrl-C issued>

[790320.020951] RPC:    45 xprt_connect_status: error 512 connecting to server 134.76.12.5
[790320.020960] RPC:    45 release request ffff880037a16000
[790320.020963] RPC:       rpc_release_client(ffff880079bc0200)
[790320.020968] RPC:       shutting down nfs client for 134.76.12.5
[790320.020970] RPC:       rpc_release_client(ffff880079bc0200)
[790320.020973] RPC:       destroying nfs client for 134.76.12.5
[790320.024752] RPC:       destroying transport ffff880078f26800
[790320.024767] RPC:       disconnected transport ffff880078f26800
[790320.024772] nfs_create_rpc_client: cannot create RPC client. Error = -5
[790320.024775] <-- nfs4_init_client() = xerror -5
[790320.024777] --> nfs_put_client({1})
[790320.024781] --> nfs_free_client(4)
[790320.024783] NFS: releasing client cookie (0xffff880079a16400/0xffff880079cd41e0)
[790320.024788] <-- nfs_free_client()
[790320.024789] <-- nfs4_set_client() = xerror -5
[790320.024791] <-- nfs4_init_server() = -5
[790320.024810] --> nfs_free_server()
[790320.024816] <-- nfs_free_server()
[790320.024818] <-- nfs4_create_server() = error -5
[790320.024822] <-- nfs4_try_mount() = -5 [error]



And, if network connectivity is present, this occurs:


# strace -fe mount -s 65535 mount -t nfs 134.76.12.5:/X /mnt -o nfsvers=4
Process 24383 attached
Process 24382 suspended
[pid 24383] mount("134.76.12.5:/X", "/mnt", "nfs", 0, "nfsvers=4,addr=134.76.12.5,clientaddr=10.10.7.142" <short wait> ) = 0
Process 24382 resumed
Process 24383 detached
--- SIGCHLD (Child exited) @ 0 (0) ---


And that then probably answers your clientaddr= concern.

So what 3.7.x does better is that it immediately identifies
EHOSTUNREACH/ENETUNREACH while 3.0.x waits for Godot.
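
To put numbers on that difference, a small worked calculation using the
timestamps of the xprt_connect and xprt_connect_status lines in the two
logs above. The timestamps are copied from the logs; the variable names
and the helper elapsed() are only illustrative. Note that the 3.0.51
connect attempt did not end on its own: it ended when Ctrl-C was issued,
and error 512 in that log is the kernel-internal ERESTARTSYS.

```python
from decimal import Decimal

def elapsed(start, end):
    """Seconds between two dmesg timestamps, kept exact with Decimal."""
    return Decimal(end) - Decimal(start)

# (xprt_connect timestamp, xprt_connect_status timestamp) from each log.
KERNEL_3_7_9 = ("205196.949727", "205196.949765")
KERNEL_3_0_51 = ("790206.304550", "790320.020951")

print("3.7.9 :", elapsed(*KERNEL_3_7_9), "s")
print("3.0.51:", elapsed(*KERNEL_3_0_51), "s (ended by Ctrl-C)")
```

That is 0.000038 s on 3.7.9 against 113.716401 s on 3.0.51, the latter
being a lower bound since the wait was interrupted by hand.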


* Re: Inconsistent error codes between NFSv4 and v3 on network issues
  2013-03-04 19:10   ` Jan Engelhardt
@ 2013-03-04 20:43     ` Myklebust, Trond
  2013-03-04 22:37       ` Myklebust, Trond
  0 siblings, 1 reply; 8+ messages in thread
From: Myklebust, Trond @ 2013-03-04 20:43 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Chuck Lever, J. Bruce Fields, linux-nfs

On Mon, 2013-03-04 at 20:10 +0100, Jan Engelhardt wrote:
> On Monday 2013-03-04 18:47, Chuck Lever wrote:
> >> 
> >> linux-3lzm:~ # strace -fe mount mount -t nfs 134.76.12.5:/X /mnt
> >> Process 1477 attached
> >> [pid  1515] mount("134.76.12.5:/X", "/mnt", "nfs", 0, "vers=4,addr=134.76.12.5,clientaddr=0.0.0.0") = -1 EIO (Input/output error)
> >
> >"clientaddr=0.0.0.0" is interesting: perhaps the mount.nfs command
> >should have failed right there.
> 
> Maybe (since there is no route). Maybe not: I would interpret 0.0.0.0
> as "kernel will pick something".
> 
> This kernel 3.7.9 system has rpcbind-0.2.0_git201103171419-7.1.1.x86_64
> and nfs-client-1.2.7-2.1.1.x86_64.
> 
> 
> >But, let's find out why the kernel is returning EIO.  Enter:
> >
> > # rpcdebug -m nfs -s all
> > # rpcdebug -m rpc -s call xprt
> >
> >Try your NFSv4 mount command again, then post relevant excerpts from the kernel log.
> 
> [205196.949572] NFS: nfs mount opts='vers=4,addr=134.76.12.5,clientaddr=0.0.0.0'
> [205196.949580] NFS:   parsing nfs mount option 'vers=4'
> [205196.949586] NFS:   parsing nfs mount option 'addr=134.76.12.5'
> [205196.949592] NFS:   parsing nfs mount option 'clientaddr=0.0.0.0'
> [205196.949597] NFS: MNTPATH: '/X'
> [205196.949600] --> nfs4_try_mount()
> [205196.949607] --> nfs4_create_server()
> [205196.949639] --> nfs4_init_server()
> [205196.949641] --> nfs4_set_client()
> [205196.949644] --> nfs_get_client(134.76.12.5,v4)
> [205196.949651] NFS: get client cookie (0xffff88007abe3800/0xffff88007ba29420)
> [205196.949666] RPC:       created transport ffff88007ba50800 with 65536 slots
> [205196.949670] RPC:       creating nfs client for 134.76.12.5 (xprt ffff88007ba50800)
> [205196.949693] RPC:     2 call_start nfs4 proc NULL (sync)
> [205196.949705] RPC:     2 call_reserve (status 0)
> [205196.949709] RPC:     2 reserved req ffff88007b342800 xid 11fca56c
> [205196.949712] RPC:     2 call_reserveresult (status 0)
> [205196.949714] RPC:     2 call_refresh (status 0)
> [205196.949716] RPC:     2 call_refreshresult (status 0)
> [205196.949718] RPC:     2 call_allocate (status 0)
> [205196.949722] RPC:     2 call_bind (status 0)
> [205196.949724] RPC:     2 call_connect xprt ffff88007ba50800 is not connected
> [205196.949727] RPC:     2 xprt_connect xprt ffff88007ba50800 is not connected
> [205196.949765] RPC:     2 xprt_connect_status: error 101 connecting to server 134.76.12.5
> [205196.949779] RPC:     2 call_connect_status (status -5)
> [205196.949791] RPC:     2 release request ffff88007b342800
> [205196.949794] RPC:       rpc_release_client(ffff88007b342e00)
> [205196.949797] RPC:       shutting down nfs client for 134.76.12.5
> [205196.949798] RPC:       rpc_release_client(ffff88007b342e00)
> [205196.949801] RPC:       destroying nfs client for 134.76.12.5
> [205196.949803] RPC:       destroying transport ffff88007ba50800
> [205196.949812] RPC:       disconnected transport ffff88007ba50800
> [205196.949816] nfs_create_rpc_client: cannot create RPC client. Error = -5
> [205196.949818] --> nfs_put_client({1})
> [205196.949820] --> nfs_free_client(4)
> [205196.949822] NFS: releasing client cookie (0xffff88007abe3800/0xffff88007ba29420)
> [205196.949824] <-- nfs_free_client()
> [205196.949826] <-- nfs4_init_client() = xerror -5
> [205196.949828] <-- nfs4_set_client() = xerror -5
> [205196.949829] <-- nfs4_init_server() = -5
> [205196.949831] --> nfs_free_server()
> [205196.949850] <-- nfs_free_server()
> [205196.949852] <-- nfs4_create_server() = error -5
> [205196.949856] <-- nfs4_try_mount() = -5 [error]
> 
> Just nuke your default route, and it should be easily reproducible.
> 

The problem is that call_connect_status() is converting that ENETUNREACH
into an EIO. We shouldn't be doing that, but should leave it up to the
caller (i.e. the NFS layer) to perform that kind of mapping.
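
A minimal userspace model of the conversion described here, based on the
lines of call_connect_status() that the patch later in this thread
removes. It is a sketch, not the kernel code: the function name
old_connect_status() and the returned tuples are illustrative, and the
errno numbers are the Linux values, hard-coded.

```python
# Linux errno values (hard-coded; the numbering is Linux-specific).
EIO, EAGAIN, ENETUNREACH, ETIMEDOUT = 5, 11, 101, 110

def old_connect_status(status):
    """Model of the pre-patch decision.

    Returns (next_action, exit_status); exit_status is None unless the
    task exits.
    """
    if status >= 0 or status == -EAGAIN:
        return ("call_transmit", None)   # connected, or try again
    if status == -ETIMEDOUT:
        return ("call_timeout", None)    # let the timeout logic decide
    return ("rpc_exit", -EIO)            # every other error becomes -EIO

print(old_connect_status(-ENETUNREACH))
```

In this model -ENETUNREACH, like any other unlisted error, leaves the
function as -EIO, which matches the "status -5" seen in the log.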

Cheers
  Trond


* Re: Inconsistent error codes between NFSv4 and v3 on network issues
  2013-03-04 20:43     ` Myklebust, Trond
@ 2013-03-04 22:37       ` Myklebust, Trond
  2013-03-13 14:47         ` Jan Engelhardt
  0 siblings, 1 reply; 8+ messages in thread
From: Myklebust, Trond @ 2013-03-04 22:37 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Chuck Lever, J. Bruce Fields, linux-nfs

[-- Attachment #1: Type: text/plain, Size: 576 bytes --]

On Mon, 2013-03-04 at 20:43 +0000, Myklebust, Trond wrote:
> On Mon, 2013-03-04 at 20:10 +0100, Jan Engelhardt wrote:
> > Just nuke your default route, and it should be easily reproducible.
> > 
> 
> The problem is that call_connect_status() is converting that ENETUNREACH
> into a EIO. We shouldn't be doing that, but should leave it up to the
> caller (i.e. the NFS layer) to perform that kind of mapping.

Could you please check if the attached patch helps.
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

[-- Attachment #2: 0001-SUNRPC-Report-network-connection-errors-correctly-fo.patch --]
[-- Type: text/x-patch; name="0001-SUNRPC-Report-network-connection-errors-correctly-fo.patch", Size: 2560 bytes --]

From 724bf7a71b44145811a1fec3b20a26bd1edae51b Mon Sep 17 00:00:00 2001
From: Trond Myklebust <Trond.Myklebust@netapp.com>
Date: Mon, 4 Mar 2013 17:29:33 -0500
Subject: [PATCH] SUNRPC: Report network/connection errors correctly for
 SOFTCONN rpc tasks

In the case of a SOFTCONN rpc task, we really want to ensure that it
reports errors like ENETUNREACH back to the caller. Currently, only
some of these errors are being reported back (connect errors are not),
and they are being converted by the RPC layer into EIO.

Reported-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
 net/sunrpc/clnt.c     | 24 ++++++++++++++----------
 net/sunrpc/xprtsock.c |  8 ++++----
 2 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index dcc446e..b95a0a2 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1644,22 +1644,26 @@ call_connect_status(struct rpc_task *task)
 
 	dprint_status(task);
 
-	task->tk_status = 0;
-	if (status >= 0 || status == -EAGAIN) {
-		clnt->cl_stats->netreconn++;
-		task->tk_action = call_transmit;
-		return;
-	}
-
 	trace_rpc_connect_status(task, status);
 	switch (status) {
 		/* if soft mounted, test if we've timed out */
 	case -ETIMEDOUT:
 		task->tk_action = call_timeout;
-		break;
-	default:
-		rpc_exit(task, -EIO);
+		return;
+	case -ECONNREFUSED:
+	case -ECONNRESET:
+	case -ENETUNREACH:
+		if (RPC_IS_SOFTCONN(task))
+			break;
+		/* retry with existing socket, after a delay */
+	case 0:
+	case -EAGAIN:
+		task->tk_status = 0;
+		clnt->cl_stats->netreconn++;
+		task->tk_action = call_transmit;
+		return;
 	}
+	rpc_exit(task, status);
 }
 
 /*
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index c1d8476..3081620 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -2202,10 +2202,6 @@ static void xs_tcp_setup_socket(struct work_struct *work)
 		 */
 		xs_tcp_force_close(xprt);
 		break;
-	case -ECONNREFUSED:
-	case -ECONNRESET:
-	case -ENETUNREACH:
-		/* retry with existing socket, after a delay */
 	case 0:
 	case -EINPROGRESS:
 	case -EALREADY:
@@ -2216,6 +2212,10 @@ static void xs_tcp_setup_socket(struct work_struct *work)
 		/* Happens, for instance, if the user specified a link
 		 * local IPv6 address without a scope-id.
 		 */
+	case -ECONNREFUSED:
+	case -ECONNRESET:
+	case -ENETUNREACH:
+		/* retry with existing socket, after a delay */
 		goto out;
 	}
 out_eagain:
-- 
1.8.1.4



* Re: Inconsistent error codes between NFSv4 and v3 on network issues
  2013-03-04 22:37       ` Myklebust, Trond
@ 2013-03-13 14:47         ` Jan Engelhardt
  0 siblings, 0 replies; 8+ messages in thread
From: Jan Engelhardt @ 2013-03-13 14:47 UTC (permalink / raw)
  To: Myklebust, Trond; +Cc: Chuck Lever, J. Bruce Fields, linux-nfs


On Monday 2013-03-04 23:37, Myklebust, Trond wrote:

>On Mon, 2013-03-04 at 20:43 +0000, Myklebust, Trond wrote:
>> On Mon, 2013-03-04 at 20:10 +0100, Jan Engelhardt wrote:
>> > Just nuke your default route, and it should be easily reproducible.
>> > 
>> 
>> The problem is that call_connect_status() is converting that ENETUNREACH
>> into a EIO. We shouldn't be doing that, but should leave it up to the
>> caller (i.e. the NFS layer) to perform that kind of mapping.
>
>Could you please check if the attached patch helps.

With the patch, I still get EIO from the mount syscall.


Thread overview: 8+ messages
2013-03-01 11:43 Inconsistent error codes between NFSv4 and v3 on network issues Jan Engelhardt
2013-03-04 14:16 ` J. Bruce Fields
2013-03-04 14:16   ` J. Bruce Fields
2013-03-04 17:47 ` Chuck Lever
2013-03-04 19:10   ` Jan Engelhardt
2013-03-04 20:43     ` Myklebust, Trond
2013-03-04 22:37       ` Myklebust, Trond
2013-03-13 14:47         ` Jan Engelhardt
