* 3.10.y regression caused by:  lockd: ensure we tear down any live sockets when socket creation fails during lockd_up
From: Nikita Yushchenko @ 2014-06-20 11:14 UTC
  To: stable
  Cc: Raphos, Jeff Layton, Stanislav Kinsbursky, J. Bruce Fields,
	Greg Kroah-Hartman, 'Alexey Lugovskoy',
	Konstantin Kholopov, linux-kernel

With current 3.10.y, if the kernel is booted with init=/bin/sh and an NFS mount
is then attempted (without portmap or rpcbind running) using busybox mount, the
following oops happens:

# mount -t nfs 10.30.130.21:/opt /mnt
svc: failed to register lockdv1 RPC service (errno 111).
lockd_up: makesock failed, error=-111
Unable to handle kernel paging request for data at address 0x00000030
Faulting instruction address: 0xc055e65c
Oops: Kernel access of bad area, sig: 11 [#1]
MPC85xx CDS
Modules linked in:
CPU: 0 PID: 1338 Comm: mount Not tainted 3.10.44.cge #117
task: cf29cea0 ti: cf35c000 task.ti: cf35c000
NIP: c055e65c LR: c0566490 CTR: c055e648
REGS: cf35dad0 TRAP: 0300   Not tainted  (3.10.44.cge)
MSR: 00029000 <CE,EE,ME>  CR: 22442488  XER: 20000000
DEAR: 00000030, ESR: 00000000

GPR00: c05606f4 cf35db80 cf29cea0 cf0ded80 cf0dedb8 00000001 1dec3086 00000000 
GPR08: 00000000 c07b1640 00000007 1dec3086 22442482 100b9758 00000000 10090ae8 
GPR16: 00000000 000186a5 00000000 00000000 100c3018 bfa46edc 100b0000 bfa46ef0 
GPR24: cf386ae0 c07834f0 00000000 c0565f88 00000001 cf0dedb8 00000000 cf0ded80 
NIP [c055e65c] call_start+0x14/0x34
LR [c0566490] __rpc_execute+0x70/0x250
Call Trace:
[cf35db80] [00000080] 0x80 (unreliable)
[cf35dbb0] [c05606f4] rpc_run_task+0x9c/0xc4
[cf35dbc0] [c0560840] rpc_call_sync+0x50/0xb8
[cf35dbf0] [c056ee90] rpcb_register_call+0x54/0x84
[cf35dc10] [c056f24c] rpcb_register+0xf8/0x10c
[cf35dc70] [c0569e18] svc_unregister.isra.23+0x100/0x108
[cf35dc90] [c0569e38] svc_rpcb_cleanup+0x18/0x30
[cf35dca0] [c0198c5c] lockd_up+0x1dc/0x2e0
[cf35dcd0] [c0195348] nlmclnt_init+0x2c/0xc8
[cf35dcf0] [c015bb5c] nfs_start_lockd+0x98/0xec
[cf35dd20] [c015ce6c] nfs_create_server+0x1e8/0x3f4
[cf35dd90] [c0171590] nfs3_create_server+0x10/0x44
[cf35dda0] [c016528c] nfs_try_mount+0x158/0x1e4
[cf35de20] [c01670d0] nfs_fs_mount+0x434/0x8c8
[cf35de70] [c00cd3bc] mount_fs+0x20/0xbc
[cf35de90] [c00e4f88] vfs_kern_mount+0x50/0x104
[cf35dec0] [c00e6e0c] do_mount+0x1d0/0x8e0
[cf35df10] [c00e75ac] SyS_mount+0x90/0xd0
[cf35df40] [c000ccf4] ret_from_syscall+0x0/0x3c
--- Exception: c01 at 0xff2acc4
    LR = 0x10048ab8
Instruction dump:
3d20c056 3929e648 91230028 38600001 4e800020 38600000 4e800020 81230014 
8103000c 81490014 394a0001 91490014 <81280030> 81490018 394a0001 91490018 
---[ end trace 033b5b4715cb5452 ]---


This does not happen if

commit 72a6e594497032bd911bd187a88fae4b4473abb3
Author: Jeff Layton <jlayton@redhat.com>
Date:   Tue Mar 25 11:55:26 2014 -0700

    lockd: ensure we tear down any live sockets when socket creation fails during lockd_up
    
    commit 679b033df48422191c4cac52b610d9980e019f9b upstream.

is reverted:

# mount -t nfs 10.30.130.21:/opt /mnt
svc: failed to register lockdv1 RPC service (errno 111).
lockd_up: makesock failed, error=-111
mount: mounting 10.30.130.21:/opt on /mnt failed: Connection refused
#


The direct cause of the oops is:

- the svc_shutdown_net() call added to the error path of make_socks() causes
  svc_rpcb_cleanup() to be called twice:
  - the first call comes from within svc_shutdown_net(), because
    serv->sv_shutdown points to svc_rpcb_cleanup() at this time,
  - it is immediately followed by a second call from lockd_up_net()'s error
    path;

- when the second svc_rpcb_cleanup() runs, then on the
  svc_unregister() -> __svc_unregister() -> rpcb_register() -> rpcb_register_call()
  call path, rpcb_register_call() is called with clnt=NULL.


* Re: 3.10.y regression caused by:  lockd: ensure we tear down any live sockets when socket creation fails during lockd_up
From: Greg Kroah-Hartman @ 2014-07-07 22:27 UTC
  To: Nikita Yushchenko
  Cc: stable, Raphos, Jeff Layton, Stanislav Kinsbursky,
	J. Bruce Fields, 'Alexey Lugovskoy',
	Konstantin Kholopov, linux-kernel

On Fri, Jun 20, 2014 at 03:14:03PM +0400, Nikita Yushchenko wrote:
> [...]

So, Jeff, what should I do here?  Drop this patch from 3.10?  Add
something else to fix it up?  Something else entirely?

thanks,

greg k-h


* Re: 3.10.y regression caused by:  lockd: ensure we tear down any live sockets when socket creation fails during lockd_up
From: Nikita Yushchenko @ 2014-07-22 13:59 UTC
  To: Greg Kroah-Hartman
  Cc: stable, Raphos, Jeff Layton, Stanislav Kinsbursky,
	J. Bruce Fields, 'Alexey Lugovskoy',
	Konstantin Kholopov, linux-kernel

>> [...]
> 
> So, Jeff, what should I do here?  Drop this patch from 3.10?  Add
> something else to fix it up?  Something else entirely?

The problem is still there with 3.10.49:

sh-4.2# /tmp/mount 10.150.42.24:/opt /mnt
svc: failed to register lockdv1 RPC service (errno 111).
lockd_up: makesock failed, error=-111
Unable to handle kernel paging request for data at address 0x00000038
Faulting instruction address: 0xc055bb5c
Oops: Kernel access of bad area, sig: 11 [#1]
PREEMPT SMP NR_CPUS=2 MPC8572 DS
Modules linked in:
CPU: 0 PID: 1315 Comm: mount Not tainted 3.10.49.cge #123
task: efb1f300 ti: c7ab0000 task.ti: c7ab0000
NIP: c055bb5c LR: c0564df4 CTR: c055bb48
REGS: c7ab1aa0 TRAP: 0300   Not tainted  (3.10.49.cge)
MSR: 00029000 <CE,EE,ME>  CR: 22442482  XER: 20000000
DEAR: 00000038, ESR: 00000000

GPR00: c055e124 c7ab1b50 efb1f300 ef8f8d80 ef8f8db8 00000001 5d3119ef 00000000
GPR08: 00000000 c075e534 00000007 21964fef 22442482 100b9758 00000000 10090ae8
GPR16: 00000000 000186a5 00000000 00000000 101e3018 c0564814 00000001 ef8f8db8
GPR24: c0760000 00000000 c7ab0000 c055bb48 00000000 c055bb48 c7ab1bb8 ef8f8d80
NIP [c055bb5c] call_start+0x14/0x34
LR [c0564df4] __rpc_execute+0x90/0x388
Call Trace:
[c7ab1b50] [c00879e8] ktime_get+0x154/0x170 (unreliable)
[c7ab1ba0] [c055e124] rpc_run_task+0x9c/0xc4
[c7ab1bb0] [c055e270] rpc_call_sync+0x50/0xb8
[c7ab1be0] [c056e1e4] rpcb_register_call+0x54/0x84
[c7ab1c00] [c056e680] rpcb_register+0x108/0x11c
[c7ab1c70] [c0568d08] svc_unregister+0x110/0x118
[c7ab1c90] [c0568d28] svc_rpcb_cleanup+0x18/0x30
[c7ab1ca0] [c02803c4] lockd_up+0x1e4/0x2e8
[c7ab1cd0] [c027c8fc] nlmclnt_init+0x2c/0xc8
[c7ab1cf0] [c024b3bc] nfs_start_lockd+0x98/0xec
[c7ab1d20] [c024c744] nfs_create_server+0x1e8/0x3f4
[c7ab1d90] [c02622dc] nfs3_create_server+0x14/0x40
[c7ab1da0] [c0255558] nfs_try_mount+0x158/0x1e4
[c7ab1e20] [c0257420] nfs_fs_mount+0x438/0x8cc
[c7ab1e70] [c0140e3c] mount_fs+0x20/0xbc
[c7ab1e90] [c015b7a8] vfs_kern_mount+0x50/0x104
[c7ab1ec0] [c015dad0] do_mount+0x1d0/0x8ec
[c7ab1f10] [c015e27c] SyS_mount+0x90/0xd0
[c7ab1f40] [c000ee74] ret_from_syscall+0x0/0x3c
--- Exception: c01 at 0xff0ada0
    LR = 0x10048ab8
Instruction dump:
3d20c056 3929bb48 91230028 38600001 4e800020 38600000 4e800020 81230014 
8103000c 81490014 394a0001 91490014 <81280038> 81490018 394a0001 91490018 
---[ end trace 17b77871713e3175 ]---



* Re: 3.10.y regression caused by:  lockd: ensure we tear down any live sockets when socket creation fails during lockd_up
From: J. Bruce Fields @ 2014-08-29 20:25 UTC
  To: Greg Kroah-Hartman
  Cc: Nikita Yushchenko, stable, Raphos, Stanislav Kinsbursky,
	'Alexey Lugovskoy',
	Konstantin Kholopov, linux-kernel, jlayton, linux-nfs

On Mon, Jul 07, 2014 at 03:27:21PM -0700, Greg Kroah-Hartman wrote:
> On Fri, Jun 20, 2014 at 03:14:03PM +0400, Nikita Yushchenko wrote:
> > [...]
> 
> So, Jeff, what should I do here?  Drop this patch from 3.10?  Add
> something else to fix it up?  Something else entirely?

Sorry this got ignored.  Adding more useful addresses....

So it looks like the new svc_shutdown_net() call made lockd_up_net()'s
cleanup redundant, and just removing it might do the job?

--b.

diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index 673668a9eec1..685e953c5103 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -253,13 +253,11 @@ static int lockd_up_net(struct svc_serv *serv, struct net *net)
 
 	error = make_socks(serv, net);
 	if (error < 0)
-		goto err_socks;
+		goto err_bind;
 	set_grace_period(net);
 	dprintk("lockd_up_net: per-net data created; net=%p\n", net);
 	return 0;
 
-err_socks:
-	svc_rpcb_cleanup(serv, net);
 err_bind:
 	ln->nlmsvc_users--;
 	return error;


* Re: 3.10.y regression caused by:  lockd: ensure we tear down any live sockets when socket creation fails during lockd_up
From: Jeff Layton @ 2014-08-29 21:22 UTC
  To: J. Bruce Fields
  Cc: Greg Kroah-Hartman, Nikita Yushchenko, stable, Raphos,
	Stanislav Kinsbursky, 'Alexey Lugovskoy',
	Konstantin Kholopov, linux-kernel, linux-nfs

On Fri, 29 Aug 2014 16:25:33 -0400
"J. Bruce Fields" <bfields@redhat.com> wrote:

> On Mon, Jul 07, 2014 at 03:27:21PM -0700, Greg Kroah-Hartman wrote:
> > On Fri, Jun 20, 2014 at 03:14:03PM +0400, Nikita Yushchenko wrote:
> > > [...]
> > 
> > So, Jeff, what should I do here?  Drop this patch from 3.10?  Add
> > something else to fix it up?  Something else entirely?
> 
> Sorry this got ignored.  Adding more useful addresses....
> 
> So it looks like the new svc_shutdown_net() call made lockd_up_net()'s
> cleanup redundant, and just removing it might do the job?
> 
> --b.
> 
> diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
> index 673668a9eec1..685e953c5103 100644
> --- a/fs/lockd/svc.c
> +++ b/fs/lockd/svc.c
> @@ -253,13 +253,11 @@ static int lockd_up_net(struct svc_serv *serv, struct net *net)
>  
>  	error = make_socks(serv, net);
>  	if (error < 0)
> -		goto err_socks;
> +		goto err_bind;
>  	set_grace_period(net);
>  	dprintk("lockd_up_net: per-net data created; net=%p\n", net);
>  	return 0;
>  
> -err_socks:
> -	svc_rpcb_cleanup(serv, net);
>  err_bind:
>  	ln->nlmsvc_users--;
>  	return error;

Oof -- sorry I missed this. Must have gotten lost in the shuffle with my
email address change...

Yeah, that patch looks correct to me. I do wish the whole svc
setup/shutdown codepath weren't so godawful complicated, but that's not
a trivial thing to untangle at this point (particularly not in the
context of -stable).

Acked-by: Jeff Layton <jlayton@primarydata.com>
