* [RFC][PATCH] sunrpc: fix oops in rpc_create() when the mount namespace is unshared @ 2008-09-08 13:39 Cedric Le Goater 2008-09-08 15:19 ` Serge E. Hallyn ` (2 more replies) 0 siblings, 3 replies; 22+ messages in thread From: Cedric Le Goater @ 2008-09-08 13:39 UTC (permalink / raw) To: Andrew Morton Cc: Serge E. Hallyn, Eric W. Biederman, Trond Myklebust, Chuck Lever, Linux Kernel Mailing List, Linux Containers, linux-nfs On a system with nfs mounts, if a task unshares its mount namespace, a oops can occur when the system is rebooted if the task is the last to unreference the nfs mount. It will try to create a rpc request using utsname() which has been invalidated by free_nsproxy(). The patch fixes the issue by using the global init_utsname() but at the same time, it breaks the capability of identifying rpc clients per uts namespace. Any better suggestions ? BUG: unable to handle kernel NULL pointer dereference at 00000004 IP: [<c024c9ab>] rpc_create+0x332/0x42f Oops: 0000 [#1] DEBUG_PAGEALLOC Pid: 1857, comm: uts-oops Not tainted (2.6.27-rc5-00319-g7686ad5 #4) EIP: 0060:[<c024c9ab>] EFLAGS: 00210287 CPU: 0 EIP is at rpc_create+0x332/0x42f EAX: 00000000 EBX: df26adf0 ECX: c0251887 EDX: 00000001 ESI: df26ae58 EDI: c02f293c EBP: dda0fc9c ESP: dda0fc2c DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 Process uts-oops (pid: 1857, ti=dda0e000 task=dd9a0778 task.ti=dda0e000) Stack: c0104532 dda0fffc dda0fcac dda0e000 dda0e000 dd93b7f0 00000009 c02f2880 df26aefc dda0fc68 c01096b7 00000000 c0266ee0 c039a070 c039a070 dda0fc74 c012ca67 c039a064 dda0fc8c c012cb20 c03daf74 00000011 00000000 c0275c90 Call Trace: [<c0104532>] ? dump_trace+0xc2/0xe2 [<c01096b7>] ? save_stack_trace+0x1c/0x3a [<c012ca67>] ? save_trace+0x37/0x8c [<c012cb20>] ? add_lock_to_list+0x64/0x96 [<c0256fc4>] ? rpcb_register_call+0x62/0xbb [<c02570c8>] ? rpcb_register+0xab/0xb3 [<c0252f4d>] ? svc_register+0xb4/0x128 [<c0253114>] ? svc_destroy+0xec/0x103 [<c02531b2>] ? svc_exit_thread+0x87/0x8d [<c01a75cd>] ? lockd_down+0x61/0x81 [<c01a577b>] ? nlmclnt_done+0xd/0xf [<c01941fe>] ? nfs_destroy_server+0x14/0x16 [<c0194328>] ? nfs_free_server+0x4c/0xaa [<c019a066>] ? nfs_kill_super+0x23/0x27 [<c0158585>] ? deactivate_super+0x3f/0x51 [<c01695d1>] ? mntput_no_expire+0x95/0xb4 [<c016965b>] ? release_mounts+0x6b/0x7a [<c01696cc>] ? __put_mnt_ns+0x62/0x70 [<c0127501>] ? free_nsproxy+0x25/0x80 [<c012759a>] ? switch_task_namespaces+0x3e/0x43 [<c01275a9>] ? exit_task_namespaces+0xa/0xc [<c0117fed>] ? do_exit+0x4fd/0x666 [<c01181b3>] ? do_group_exit+0x5d/0x83 [<c011fa8c>] ? get_signal_to_deliver+0x2c8/0x2e0 [<c0102630>] ? do_notify_resume+0x69/0x700 [<c011d85a>] ? do_sigaction+0x134/0x145 [<c0127205>] ? hrtimer_nanosleep+0x8f/0xce [<c0126d1a>] ? hrtimer_wakeup+0x0/0x1c [<c0103488>] ? work_notifysig+0x13/0x1b ======================= Code: 70 20 68 cb c1 2c c0 e8 75 4e 01 00 8b 83 ac 00 00 00 59 3d 00 f0 ff ff 5f 77 63 eb 57 a1 00 80 2d c0 8b 80 a8 02 00 00 8d 73 68 <8b> 40 04 83 c0 45 e8 41 46 f7 ff ba 20 00 00 00 83 f8 21 0f 4c EIP: [<c024c9ab>] rpc_create+0x332/0x42f SS:ESP 0068:dda0fc2c Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> --- net/sunrpc/clnt.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index 76739e9..a59cdf4 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -213,10 +213,10 @@ static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args, stru } /* save the nodename */ - clnt->cl_nodelen = strlen(utsname()->nodename); + clnt->cl_nodelen = strlen(init_utsname()->nodename); if (clnt->cl_nodelen > UNX_MAXNODENAME) clnt->cl_nodelen = UNX_MAXNODENAME; - memcpy(clnt->cl_nodename, utsname()->nodename, clnt->cl_nodelen); + memcpy(clnt->cl_nodename, init_utsname()->nodename, clnt->cl_nodelen); rpc_register_client(clnt); return clnt; -- 1.5.5.1 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [RFC][PATCH] sunrpc: fix oops in rpc_create() when the mount namespace is unshared 2008-09-08 13:39 [RFC][PATCH] sunrpc: fix oops in rpc_create() when the mount namespace is unshared Cedric Le Goater @ 2008-09-08 15:19 ` Serge E. Hallyn 2008-09-08 15:27 ` Serge E. Hallyn 2008-09-08 15:37 ` Cedric Le Goater 2008-09-08 16:09 ` Eric W. Biederman 2008-09-09 12:43 ` Serge E. Hallyn 2 siblings, 2 replies; 22+ messages in thread From: Serge E. Hallyn @ 2008-09-08 15:19 UTC (permalink / raw) To: Cedric Le Goater Cc: Andrew Morton, Eric W. Biederman, Trond Myklebust, Chuck Lever, Linux Kernel Mailing List, Linux Containers, linux-nfs Quoting Cedric Le Goater (clg@fr.ibm.com): > On a system with nfs mounts, if a task unshares its mount namespace, > a oops can occur when the system is rebooted if the task is the last > to unreference the nfs mount. It will try to create a rpc request > using utsname() which has been invalidated by free_nsproxy(). > > The patch fixes the issue by using the global init_utsname() but at > the same time, it breaks the capability of identifying rpc clients > per uts namespace. > > Any better suggestions ? But the utsname gets freed after the mnt_ns, so the analysis seems wrong somehow. I trust addr2line or whatever verified that rpc_create+0x332/0x42f is exactly at the call to utsname()? -serge > BUG: unable to handle kernel NULL pointer dereference at 00000004 > IP: [<c024c9ab>] rpc_create+0x332/0x42f > Oops: 0000 [#1] DEBUG_PAGEALLOC > > Pid: 1857, comm: uts-oops Not tainted (2.6.27-rc5-00319-g7686ad5 #4) > EIP: 0060:[<c024c9ab>] EFLAGS: 00210287 CPU: 0 > EIP is at rpc_create+0x332/0x42f > EAX: 00000000 EBX: df26adf0 ECX: c0251887 EDX: 00000001 > ESI: df26ae58 EDI: c02f293c EBP: dda0fc9c ESP: dda0fc2c > DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 > Process uts-oops (pid: 1857, ti=dda0e000 task=dd9a0778 task.ti=dda0e000) > Stack: c0104532 dda0fffc dda0fcac dda0e000 dda0e000 dd93b7f0 00000009 c02f2880 > df26aefc dda0fc68 c01096b7 00000000 c0266ee0 c039a070 c039a070 dda0fc74 > c012ca67 c039a064 dda0fc8c c012cb20 c03daf74 00000011 00000000 c0275c90 > Call Trace: > [<c0104532>] ? dump_trace+0xc2/0xe2 > [<c01096b7>] ? save_stack_trace+0x1c/0x3a > [<c012ca67>] ? save_trace+0x37/0x8c > [<c012cb20>] ? add_lock_to_list+0x64/0x96 > [<c0256fc4>] ? rpcb_register_call+0x62/0xbb > [<c02570c8>] ? rpcb_register+0xab/0xb3 > [<c0252f4d>] ? svc_register+0xb4/0x128 > [<c0253114>] ? svc_destroy+0xec/0x103 > [<c02531b2>] ? svc_exit_thread+0x87/0x8d > [<c01a75cd>] ? lockd_down+0x61/0x81 > [<c01a577b>] ? nlmclnt_done+0xd/0xf > [<c01941fe>] ? nfs_destroy_server+0x14/0x16 > [<c0194328>] ? nfs_free_server+0x4c/0xaa > [<c019a066>] ? nfs_kill_super+0x23/0x27 > [<c0158585>] ? deactivate_super+0x3f/0x51 > [<c01695d1>] ? mntput_no_expire+0x95/0xb4 > [<c016965b>] ? release_mounts+0x6b/0x7a > [<c01696cc>] ? __put_mnt_ns+0x62/0x70 > [<c0127501>] ? free_nsproxy+0x25/0x80 > [<c012759a>] ? switch_task_namespaces+0x3e/0x43 > [<c01275a9>] ? exit_task_namespaces+0xa/0xc > [<c0117fed>] ? do_exit+0x4fd/0x666 > [<c01181b3>] ? do_group_exit+0x5d/0x83 > [<c011fa8c>] ? get_signal_to_deliver+0x2c8/0x2e0 > [<c0102630>] ? do_notify_resume+0x69/0x700 > [<c011d85a>] ? do_sigaction+0x134/0x145 > [<c0127205>] ? hrtimer_nanosleep+0x8f/0xce > [<c0126d1a>] ? hrtimer_wakeup+0x0/0x1c > [<c0103488>] ? work_notifysig+0x13/0x1b > ======================= > Code: 70 20 68 cb c1 2c c0 e8 75 4e 01 00 8b 83 ac 00 00 00 59 3d 00 f0 ff ff 5f 77 63 eb 57 a1 00 80 2d c0 8b 80 a8 02 00 00 8d 73 68 <8b> 40 04 83 c0 45 e8 41 46 f7 ff ba 20 00 00 00 83 f8 21 0f 4c > EIP: [<c024c9ab>] rpc_create+0x332/0x42f SS:ESP 0068:dda0fc2c > > Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> > --- > net/sunrpc/clnt.c | 4 ++-- > 1 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c > index 76739e9..a59cdf4 100644 > --- a/net/sunrpc/clnt.c > +++ b/net/sunrpc/clnt.c > @@ -213,10 +213,10 @@ static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args, stru > } > > /* save the nodename */ > - clnt->cl_nodelen = strlen(utsname()->nodename); > + clnt->cl_nodelen = strlen(init_utsname()->nodename); > if (clnt->cl_nodelen > UNX_MAXNODENAME) > clnt->cl_nodelen = UNX_MAXNODENAME; > - memcpy(clnt->cl_nodename, utsname()->nodename, clnt->cl_nodelen); > + memcpy(clnt->cl_nodename, init_utsname()->nodename, clnt->cl_nodelen); > rpc_register_client(clnt); > return clnt; > > -- > 1.5.5.1 ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC][PATCH] sunrpc: fix oops in rpc_create() when the mount namespace is unshared 2008-09-08 15:19 ` Serge E. Hallyn @ 2008-09-08 15:27 ` Serge E. Hallyn 2008-09-08 15:37 ` Cedric Le Goater 1 sibling, 0 replies; 22+ messages in thread From: Serge E. Hallyn @ 2008-09-08 15:27 UTC (permalink / raw) To: Serge E. Hallyn Cc: Cedric Le Goater, Andrew Morton, Eric W. Biederman, Trond Myklebust, Chuck Lever, Linux Kernel Mailing List, Linux Containers, linux-nfs Quoting Serge E. Hallyn (serue@us.ibm.com): > Quoting Cedric Le Goater (clg@fr.ibm.com): > > On a system with nfs mounts, if a task unshares its mount namespace, > > a oops can occur when the system is rebooted if the task is the last > > to unreference the nfs mount. It will try to create a rpc request > > using utsname() which has been invalidated by free_nsproxy(). > > > > The patch fixes the issue by using the global init_utsname() but at > > the same time, it breaks the capability of identifying rpc clients > > per uts namespace. > > > > Any better suggestions ? > > But the utsname gets freed after the mnt_ns, so the analysis seems > wrong somehow. (Oh, you straightened me out on irc - current->nsproxy is already NULL.) > I trust addr2line or whatever verified that rpc_create+0x332/0x42f is > exactly at the call to utsname()? > > -serge > > > BUG: unable to handle kernel NULL pointer dereference at 00000004 > > IP: [<c024c9ab>] rpc_create+0x332/0x42f > > Oops: 0000 [#1] DEBUG_PAGEALLOC > > > > Pid: 1857, comm: uts-oops Not tainted (2.6.27-rc5-00319-g7686ad5 #4) > > EIP: 0060:[<c024c9ab>] EFLAGS: 00210287 CPU: 0 > > EIP is at rpc_create+0x332/0x42f > > EAX: 00000000 EBX: df26adf0 ECX: c0251887 EDX: 00000001 > > ESI: df26ae58 EDI: c02f293c EBP: dda0fc9c ESP: dda0fc2c > > DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 > > Process uts-oops (pid: 1857, ti=dda0e000 task=dd9a0778 task.ti=dda0e000) > > Stack: c0104532 dda0fffc dda0fcac dda0e000 dda0e000 dd93b7f0 00000009 c02f2880 > > df26aefc dda0fc68 c01096b7 00000000 c0266ee0 c039a070 c039a070 dda0fc74 > > c012ca67 c039a064 dda0fc8c c012cb20 c03daf74 00000011 00000000 c0275c90 > > Call Trace: > > [<c0104532>] ? dump_trace+0xc2/0xe2 > > [<c01096b7>] ? save_stack_trace+0x1c/0x3a > > [<c012ca67>] ? save_trace+0x37/0x8c > > [<c012cb20>] ? add_lock_to_list+0x64/0x96 > > [<c0256fc4>] ? rpcb_register_call+0x62/0xbb > > [<c02570c8>] ? rpcb_register+0xab/0xb3 > > [<c0252f4d>] ? svc_register+0xb4/0x128 > > [<c0253114>] ? svc_destroy+0xec/0x103 > > [<c02531b2>] ? svc_exit_thread+0x87/0x8d > > [<c01a75cd>] ? lockd_down+0x61/0x81 > > [<c01a577b>] ? nlmclnt_done+0xd/0xf > > [<c01941fe>] ? nfs_destroy_server+0x14/0x16 > > [<c0194328>] ? nfs_free_server+0x4c/0xaa > > [<c019a066>] ? nfs_kill_super+0x23/0x27 > > [<c0158585>] ? deactivate_super+0x3f/0x51 > > [<c01695d1>] ? mntput_no_expire+0x95/0xb4 > > [<c016965b>] ? release_mounts+0x6b/0x7a > > [<c01696cc>] ? __put_mnt_ns+0x62/0x70 > > [<c0127501>] ? free_nsproxy+0x25/0x80 > > [<c012759a>] ? switch_task_namespaces+0x3e/0x43 > > [<c01275a9>] ? exit_task_namespaces+0xa/0xc > > [<c0117fed>] ? do_exit+0x4fd/0x666 > > [<c01181b3>] ? do_group_exit+0x5d/0x83 > > [<c011fa8c>] ? get_signal_to_deliver+0x2c8/0x2e0 > > [<c0102630>] ? do_notify_resume+0x69/0x700 > > [<c011d85a>] ? do_sigaction+0x134/0x145 > > [<c0127205>] ? hrtimer_nanosleep+0x8f/0xce > > [<c0126d1a>] ? hrtimer_wakeup+0x0/0x1c > > [<c0103488>] ? work_notifysig+0x13/0x1b > > ======================= > > Code: 70 20 68 cb c1 2c c0 e8 75 4e 01 00 8b 83 ac 00 00 00 59 3d 00 f0 ff ff 5f 77 63 eb 57 a1 00 80 2d c0 8b 80 a8 02 00 00 8d 73 68 <8b> 40 04 83 c0 45 e8 41 46 f7 ff ba 20 00 00 00 83 f8 21 0f 4c > > EIP: [<c024c9ab>] rpc_create+0x332/0x42f SS:ESP 0068:dda0fc2c > > > > Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> > > --- > > net/sunrpc/clnt.c | 4 ++-- > > 1 files changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c > > index 76739e9..a59cdf4 100644 > > --- a/net/sunrpc/clnt.c > > +++ b/net/sunrpc/clnt.c > > @@ -213,10 +213,10 @@ static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args, stru > > } > > > > /* save the nodename */ > > - clnt->cl_nodelen = strlen(utsname()->nodename); > > + clnt->cl_nodelen = strlen(init_utsname()->nodename); > > if (clnt->cl_nodelen > UNX_MAXNODENAME) > > clnt->cl_nodelen = UNX_MAXNODENAME; > > - memcpy(clnt->cl_nodename, utsname()->nodename, clnt->cl_nodelen); > > + memcpy(clnt->cl_nodename, init_utsname()->nodename, clnt->cl_nodelen); > > rpc_register_client(clnt); > > return clnt; > > > > -- > > 1.5.5.1 > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC][PATCH] sunrpc: fix oops in rpc_create() when the mount namespace is unshared 2008-09-08 15:19 ` Serge E. Hallyn 2008-09-08 15:27 ` Serge E. Hallyn @ 2008-09-08 15:37 ` Cedric Le Goater 2008-09-08 15:43 ` Serge E. Hallyn 1 sibling, 1 reply; 22+ messages in thread From: Cedric Le Goater @ 2008-09-08 15:37 UTC (permalink / raw) To: Serge E. Hallyn Cc: Andrew Morton, Eric W. Biederman, Trond Myklebust, Chuck Lever, Linux Kernel Mailing List, Linux Containers, linux-nfs Serge E. Hallyn wrote: > Quoting Cedric Le Goater (clg@fr.ibm.com): >> On a system with nfs mounts, if a task unshares its mount namespace, >> a oops can occur when the system is rebooted if the task is the last >> to unreference the nfs mount. It will try to create a rpc request >> using utsname() which has been invalidated by free_nsproxy(). >> >> The patch fixes the issue by using the global init_utsname() but at >> the same time, it breaks the capability of identifying rpc clients >> per uts namespace. >> >> Any better suggestions ? > > But the utsname gets freed after the mnt_ns, so the analysis seems > wrong somehow. yes but switch_task_namespaces() assigns ->nsproxy to NULL. the result is the same. > I trust addr2line or whatever verified that rpc_create+0x332/0x42f is > exactly at the call to utsname()? yes. it points to net/sunrpc/clnt.c:216 clnt->cl_nodelen = strlen(utsname()->nodename); Thanks, C. > -serge > >> BUG: unable to handle kernel NULL pointer dereference at 00000004 >> IP: [<c024c9ab>] rpc_create+0x332/0x42f >> Oops: 0000 [#1] DEBUG_PAGEALLOC >> >> Pid: 1857, comm: uts-oops Not tainted (2.6.27-rc5-00319-g7686ad5 #4) >> EIP: 0060:[<c024c9ab>] EFLAGS: 00210287 CPU: 0 >> EIP is at rpc_create+0x332/0x42f >> EAX: 00000000 EBX: df26adf0 ECX: c0251887 EDX: 00000001 >> ESI: df26ae58 EDI: c02f293c EBP: dda0fc9c ESP: dda0fc2c >> DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 >> Process uts-oops (pid: 1857, ti=dda0e000 task=dd9a0778 task.ti=dda0e000) >> Stack: c0104532 dda0fffc dda0fcac dda0e000 dda0e000 dd93b7f0 00000009 c02f2880 >> df26aefc dda0fc68 c01096b7 00000000 c0266ee0 c039a070 c039a070 dda0fc74 >> c012ca67 c039a064 dda0fc8c c012cb20 c03daf74 00000011 00000000 c0275c90 >> Call Trace: >> [<c0104532>] ? dump_trace+0xc2/0xe2 >> [<c01096b7>] ? save_stack_trace+0x1c/0x3a >> [<c012ca67>] ? save_trace+0x37/0x8c >> [<c012cb20>] ? add_lock_to_list+0x64/0x96 >> [<c0256fc4>] ? rpcb_register_call+0x62/0xbb >> [<c02570c8>] ? rpcb_register+0xab/0xb3 >> [<c0252f4d>] ? svc_register+0xb4/0x128 >> [<c0253114>] ? svc_destroy+0xec/0x103 >> [<c02531b2>] ? svc_exit_thread+0x87/0x8d >> [<c01a75cd>] ? lockd_down+0x61/0x81 >> [<c01a577b>] ? nlmclnt_done+0xd/0xf >> [<c01941fe>] ? nfs_destroy_server+0x14/0x16 >> [<c0194328>] ? nfs_free_server+0x4c/0xaa >> [<c019a066>] ? nfs_kill_super+0x23/0x27 >> [<c0158585>] ? deactivate_super+0x3f/0x51 >> [<c01695d1>] ? mntput_no_expire+0x95/0xb4 >> [<c016965b>] ? release_mounts+0x6b/0x7a >> [<c01696cc>] ? __put_mnt_ns+0x62/0x70 >> [<c0127501>] ? free_nsproxy+0x25/0x80 >> [<c012759a>] ? switch_task_namespaces+0x3e/0x43 >> [<c01275a9>] ? exit_task_namespaces+0xa/0xc >> [<c0117fed>] ? do_exit+0x4fd/0x666 >> [<c01181b3>] ? do_group_exit+0x5d/0x83 >> [<c011fa8c>] ? get_signal_to_deliver+0x2c8/0x2e0 >> [<c0102630>] ? do_notify_resume+0x69/0x700 >> [<c011d85a>] ? do_sigaction+0x134/0x145 >> [<c0127205>] ? hrtimer_nanosleep+0x8f/0xce >> [<c0126d1a>] ? hrtimer_wakeup+0x0/0x1c >> [<c0103488>] ? work_notifysig+0x13/0x1b >> ======================= >> Code: 70 20 68 cb c1 2c c0 e8 75 4e 01 00 8b 83 ac 00 00 00 59 3d 00 f0 ff ff 5f 77 63 eb 57 a1 00 80 2d c0 8b 80 a8 02 00 00 8d 73 68 <8b> 40 04 83 c0 45 e8 41 46 f7 ff ba 20 00 00 00 83 f8 21 0f 4c >> EIP: [<c024c9ab>] rpc_create+0x332/0x42f SS:ESP 0068:dda0fc2c >> >> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> >> --- >> net/sunrpc/clnt.c | 4 ++-- >> 1 files changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c >> index 76739e9..a59cdf4 100644 >> --- a/net/sunrpc/clnt.c >> +++ b/net/sunrpc/clnt.c >> @@ -213,10 +213,10 @@ static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args, stru >> } >> >> /* save the nodename */ >> - clnt->cl_nodelen = strlen(utsname()->nodename); >> + clnt->cl_nodelen = strlen(init_utsname()->nodename); >> if (clnt->cl_nodelen > UNX_MAXNODENAME) >> clnt->cl_nodelen = UNX_MAXNODENAME; >> - memcpy(clnt->cl_nodename, utsname()->nodename, clnt->cl_nodelen); >> + memcpy(clnt->cl_nodename, init_utsname()->nodename, clnt->cl_nodelen); >> rpc_register_client(clnt); >> return clnt; >> >> -- >> 1.5.5.1 ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC][PATCH] sunrpc: fix oops in rpc_create() when the mount namespace is unshared 2008-09-08 15:37 ` Cedric Le Goater @ 2008-09-08 15:43 ` Serge E. Hallyn 0 siblings, 0 replies; 22+ messages in thread From: Serge E. Hallyn @ 2008-09-08 15:43 UTC (permalink / raw) To: Cedric Le Goater Cc: Andrew Morton, Eric W. Biederman, Trond Myklebust, Chuck Lever, Linux Kernel Mailing List, Linux Containers, linux-nfs Quoting Cedric Le Goater (clg@fr.ibm.com): > Serge E. Hallyn wrote: > > Quoting Cedric Le Goater (clg@fr.ibm.com): > >> On a system with nfs mounts, if a task unshares its mount namespace, > >> a oops can occur when the system is rebooted if the task is the last > >> to unreference the nfs mount. It will try to create a rpc request > >> using utsname() which has been invalidated by free_nsproxy(). > >> > >> The patch fixes the issue by using the global init_utsname() but at > >> the same time, it breaks the capability of identifying rpc clients > >> per uts namespace. > >> > >> Any better suggestions ? > > > > But the utsname gets freed after the mnt_ns, so the analysis seems > > wrong somehow. > > yes but switch_task_namespaces() assigns ->nsproxy to NULL. the result > is the same. > > > I trust addr2line or whatever verified that rpc_create+0x332/0x42f is > > exactly at the call to utsname()? > > yes. it points to net/sunrpc/clnt.c:216 > > clnt->cl_nodelen = strlen(utsname()->nodename); Pavel, at the mini-summit you mentioned sunrpc transports as one example of the mini-namespaces openvz currently implements. We then apparently went off on a bit of a tangent (http://wiki.openvz.org/Containers/Mini-summit_2008_notes#Namespaces_and_containers) after Dave asked for a list of mini-namespaces openvz implements. What exactly is openvz doing with sunrpc transports, and would it be a better solution to this problem? thanks, -serge ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC][PATCH] sunrpc: fix oops in rpc_create() when the mount namespace is unshared 2008-09-08 13:39 [RFC][PATCH] sunrpc: fix oops in rpc_create() when the mount namespace is unshared Cedric Le Goater 2008-09-08 15:19 ` Serge E. Hallyn @ 2008-09-08 16:09 ` Eric W. Biederman 2008-09-09 11:54 ` Cedric Le Goater 2008-09-09 12:43 ` Serge E. Hallyn 2 siblings, 1 reply; 22+ messages in thread From: Eric W. Biederman @ 2008-09-08 16:09 UTC (permalink / raw) To: Cedric Le Goater Cc: Andrew Morton, Serge E. Hallyn, Trond Myklebust, Chuck Lever, Linux Kernel Mailing List, Linux Containers, linux-nfs Cedric Le Goater <clg@fr.ibm.com> writes: > On a system with nfs mounts, if a task unshares its mount namespace, > a oops can occur when the system is rebooted if the task is the last > to unreference the nfs mount. It will try to create a rpc request > using utsname() which has been invalidated by free_nsproxy(). > > The patch fixes the issue by using the global init_utsname() but at > the same time, it breaks the capability of identifying rpc clients > per uts namespace. > > Any better suggestions ? Can we push utsname into rpc_create_args and push the access of utsname up the food chain? My gut feeling says we should capture the utsname or the uts_ns when we mount the nfs filesystem so we stay in sync for the life of the mount. Eric > BUG: unable to handle kernel NULL pointer dereference at 00000004 > IP: [<c024c9ab>] rpc_create+0x332/0x42f > Oops: 0000 [#1] DEBUG_PAGEALLOC > > Pid: 1857, comm: uts-oops Not tainted (2.6.27-rc5-00319-g7686ad5 #4) > EIP: 0060:[<c024c9ab>] EFLAGS: 00210287 CPU: 0 > EIP is at rpc_create+0x332/0x42f > EAX: 00000000 EBX: df26adf0 ECX: c0251887 EDX: 00000001 > ESI: df26ae58 EDI: c02f293c EBP: dda0fc9c ESP: dda0fc2c > DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 > Process uts-oops (pid: 1857, ti=dda0e000 task=dd9a0778 task.ti=dda0e000) > Stack: c0104532 dda0fffc dda0fcac dda0e000 dda0e000 dd93b7f0 00000009 c02f2880 > df26aefc dda0fc68 c01096b7 00000000 c0266ee0 c039a070 c039a070 dda0fc74 > c012ca67 c039a064 dda0fc8c c012cb20 c03daf74 00000011 00000000 c0275c90 > Call Trace: > [<c0104532>] ? dump_trace+0xc2/0xe2 > [<c01096b7>] ? save_stack_trace+0x1c/0x3a > [<c012ca67>] ? save_trace+0x37/0x8c > [<c012cb20>] ? add_lock_to_list+0x64/0x96 > [<c0256fc4>] ? rpcb_register_call+0x62/0xbb > [<c02570c8>] ? rpcb_register+0xab/0xb3 > [<c0252f4d>] ? svc_register+0xb4/0x128 > [<c0253114>] ? svc_destroy+0xec/0x103 > [<c02531b2>] ? svc_exit_thread+0x87/0x8d > [<c01a75cd>] ? lockd_down+0x61/0x81 > [<c01a577b>] ? nlmclnt_done+0xd/0xf > [<c01941fe>] ? nfs_destroy_server+0x14/0x16 > [<c0194328>] ? nfs_free_server+0x4c/0xaa > [<c019a066>] ? nfs_kill_super+0x23/0x27 > [<c0158585>] ? deactivate_super+0x3f/0x51 > [<c01695d1>] ? mntput_no_expire+0x95/0xb4 > [<c016965b>] ? release_mounts+0x6b/0x7a > [<c01696cc>] ? __put_mnt_ns+0x62/0x70 > [<c0127501>] ? free_nsproxy+0x25/0x80 > [<c012759a>] ? switch_task_namespaces+0x3e/0x43 > [<c01275a9>] ? exit_task_namespaces+0xa/0xc > [<c0117fed>] ? do_exit+0x4fd/0x666 > [<c01181b3>] ? do_group_exit+0x5d/0x83 > [<c011fa8c>] ? get_signal_to_deliver+0x2c8/0x2e0 > [<c0102630>] ? do_notify_resume+0x69/0x700 > [<c011d85a>] ? do_sigaction+0x134/0x145 > [<c0127205>] ? hrtimer_nanosleep+0x8f/0xce > [<c0126d1a>] ? hrtimer_wakeup+0x0/0x1c > [<c0103488>] ? work_notifysig+0x13/0x1b > ======================= > Code: 70 20 68 cb c1 2c c0 e8 75 4e 01 00 8b 83 ac 00 00 00 59 3d 00 f0 ff ff 5f > 77 63 eb 57 a1 00 80 2d c0 8b 80 a8 02 00 00 8d 73 68 <8b> 40 04 83 c0 45 e8 41 > 46 f7 ff ba 20 00 00 00 83 f8 21 0f 4c > EIP: [<c024c9ab>] rpc_create+0x332/0x42f SS:ESP 0068:dda0fc2c > > Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> > --- > net/sunrpc/clnt.c | 4 ++-- > 1 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c > index 76739e9..a59cdf4 100644 > --- a/net/sunrpc/clnt.c > +++ b/net/sunrpc/clnt.c > @@ -213,10 +213,10 @@ static struct rpc_clnt * rpc_new_client(const struct > rpc_create_args *args, stru > } > > /* save the nodename */ > - clnt->cl_nodelen = strlen(utsname()->nodename); > + clnt->cl_nodelen = strlen(init_utsname()->nodename); > if (clnt->cl_nodelen > UNX_MAXNODENAME) > clnt->cl_nodelen = UNX_MAXNODENAME; > - memcpy(clnt->cl_nodename, utsname()->nodename, clnt->cl_nodelen); > + memcpy(clnt->cl_nodename, init_utsname()->nodename, clnt->cl_nodelen); > rpc_register_client(clnt); > return clnt; > > -- > 1.5.5.1 ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC][PATCH] sunrpc: fix oops in rpc_create() when the mount namespace is unshared 2008-09-08 16:09 ` Eric W. Biederman @ 2008-09-09 11:54 ` Cedric Le Goater 0 siblings, 0 replies; 22+ messages in thread From: Cedric Le Goater @ 2008-09-09 11:54 UTC (permalink / raw) To: Eric W. Biederman Cc: Andrew Morton, Serge E. Hallyn, Trond Myklebust, Chuck Lever, Linux Kernel Mailing List, Linux Containers, linux-nfs Eric W. Biederman wrote: > Cedric Le Goater <clg@fr.ibm.com> writes: > >> On a system with nfs mounts, if a task unshares its mount namespace, >> a oops can occur when the system is rebooted if the task is the last >> to unreference the nfs mount. It will try to create a rpc request >> using utsname() which has been invalidated by free_nsproxy(). >> >> The patch fixes the issue by using the global init_utsname() but at >> the same time, it breaks the capability of identifying rpc clients >> per uts namespace. >> >> Any better suggestions ? > > Can we push utsname into rpc_create_args and push the access > of utsname up the food chain? struct rpc_create_args seems to be used only as a stack argument for rpc_create() it's not kept in any nfs or sunrpc objects. > My gut feeling says we should capture the utsname or the > uts_ns when we mount the nfs filesystem so we stay in sync > for the life of the mount. I see. It make sense but, looking at the code, the nfs and sunrpc will need some heavy changes ... Thanks, C. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC][PATCH] sunrpc: fix oops in rpc_create() when the mount namespace is unshared 2008-09-08 13:39 [RFC][PATCH] sunrpc: fix oops in rpc_create() when the mount namespace is unshared Cedric Le Goater 2008-09-08 15:19 ` Serge E. Hallyn 2008-09-08 16:09 ` Eric W. Biederman @ 2008-09-09 12:43 ` Serge E. Hallyn 2008-09-09 15:09 ` Eric W. Biederman 2 siblings, 1 reply; 22+ messages in thread From: Serge E. Hallyn @ 2008-09-09 12:43 UTC (permalink / raw) To: Cedric Le Goater Cc: Andrew Morton, Eric W. Biederman, Trond Myklebust, Chuck Lever, Linux Kernel Mailing List, Linux Containers, linux-nfs Quoting Cedric Le Goater (clg@fr.ibm.com): > On a system with nfs mounts, if a task unshares its mount namespace, > a oops can occur when the system is rebooted if the task is the last > to unreference the nfs mount. It will try to create a rpc request > using utsname() which has been invalidated by free_nsproxy(). > > The patch fixes the issue by using the global init_utsname() but at > the same time, it breaks the capability of identifying rpc clients > per uts namespace. > > Any better suggestions ? > > BUG: unable to handle kernel NULL pointer dereference at 00000004 > IP: [<c024c9ab>] rpc_create+0x332/0x42f > Oops: 0000 [#1] DEBUG_PAGEALLOC > > Pid: 1857, comm: uts-oops Not tainted (2.6.27-rc5-00319-g7686ad5 #4) > EIP: 0060:[<c024c9ab>] EFLAGS: 00210287 CPU: 0 > EIP is at rpc_create+0x332/0x42f > EAX: 00000000 EBX: df26adf0 ECX: c0251887 EDX: 00000001 > ESI: df26ae58 EDI: c02f293c EBP: dda0fc9c ESP: dda0fc2c > DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 > Process uts-oops (pid: 1857, ti=dda0e000 task=dd9a0778 task.ti=dda0e000) > Stack: c0104532 dda0fffc dda0fcac dda0e000 dda0e000 dd93b7f0 00000009 c02f2880 > df26aefc dda0fc68 c01096b7 00000000 c0266ee0 c039a070 c039a070 dda0fc74 > c012ca67 c039a064 dda0fc8c c012cb20 c03daf74 00000011 00000000 c0275c90 > Call Trace: > [<c0104532>] ? dump_trace+0xc2/0xe2 > [<c01096b7>] ? save_stack_trace+0x1c/0x3a > [<c012ca67>] ? save_trace+0x37/0x8c > [<c012cb20>] ? add_lock_to_list+0x64/0x96 > [<c0256fc4>] ? rpcb_register_call+0x62/0xbb > [<c02570c8>] ? rpcb_register+0xab/0xb3 > [<c0252f4d>] ? svc_register+0xb4/0x128 > [<c0253114>] ? svc_destroy+0xec/0x103 > [<c02531b2>] ? svc_exit_thread+0x87/0x8d > [<c01a75cd>] ? lockd_down+0x61/0x81 > [<c01a577b>] ? nlmclnt_done+0xd/0xf > [<c01941fe>] ? nfs_destroy_server+0x14/0x16 > [<c0194328>] ? nfs_free_server+0x4c/0xaa > [<c019a066>] ? nfs_kill_super+0x23/0x27 > [<c0158585>] ? deactivate_super+0x3f/0x51 > [<c01695d1>] ? mntput_no_expire+0x95/0xb4 > [<c016965b>] ? release_mounts+0x6b/0x7a > [<c01696cc>] ? __put_mnt_ns+0x62/0x70 > [<c0127501>] ? free_nsproxy+0x25/0x80 > [<c012759a>] ? switch_task_namespaces+0x3e/0x43 > [<c01275a9>] ? exit_task_namespaces+0xa/0xc > [<c0117fed>] ? do_exit+0x4fd/0x666 > [<c01181b3>] ? do_group_exit+0x5d/0x83 > [<c011fa8c>] ? get_signal_to_deliver+0x2c8/0x2e0 > [<c0102630>] ? do_notify_resume+0x69/0x700 > [<c011d85a>] ? do_sigaction+0x134/0x145 > [<c0127205>] ? hrtimer_nanosleep+0x8f/0xce > [<c0126d1a>] ? hrtimer_wakeup+0x0/0x1c > [<c0103488>] ? work_notifysig+0x13/0x1b > ======================= > Code: 70 20 68 cb c1 2c c0 e8 75 4e 01 00 8b 83 ac 00 00 00 59 3d 00 f0 ff ff 5f 77 63 eb 57 a1 00 80 2d c0 8b 80 a8 02 00 00 8d 73 68 <8b> 40 04 83 c0 45 e8 41 46 f7 ff ba 20 00 00 00 83 f8 21 0f 4c > EIP: [<c024c9ab>] rpc_create+0x332/0x42f SS:ESP 0068:dda0fc2c > > Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Thanks, Cedric. Eric is probably right about the long-term fix, but yeah it might take a while to properly wade through the sunrpc and nfs layers to store the nodename at nfs mount time, and in the meantime this fixes a real oops. Acked-by: Serge Hallyn <serue@us.ibm.com> > --- > net/sunrpc/clnt.c | 4 ++-- > 1 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c > index 76739e9..a59cdf4 100644 > --- a/net/sunrpc/clnt.c > +++ b/net/sunrpc/clnt.c > @@ -213,10 +213,10 @@ static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args, stru > } > > /* save the nodename */ > - clnt->cl_nodelen = strlen(utsname()->nodename); > + clnt->cl_nodelen = strlen(init_utsname()->nodename); > if (clnt->cl_nodelen > UNX_MAXNODENAME) > clnt->cl_nodelen = UNX_MAXNODENAME; > - memcpy(clnt->cl_nodename, utsname()->nodename, clnt->cl_nodelen); > + memcpy(clnt->cl_nodename, init_utsname()->nodename, clnt->cl_nodelen); > rpc_register_client(clnt); > return clnt; > > -- > 1.5.5.1 ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC][PATCH] sunrpc: fix oops in rpc_create() when the mount namespace is unshared 2008-09-09 12:43 ` Serge E. Hallyn @ 2008-09-09 15:09 ` Eric W. Biederman 2008-09-09 15:29 ` Serge E. Hallyn 0 siblings, 1 reply; 22+ messages in thread From: Eric W. Biederman @ 2008-09-09 15:09 UTC (permalink / raw) To: Serge E. Hallyn Cc: Cedric Le Goater, Andrew Morton, Trond Myklebust, Chuck Lever, Linux Kernel Mailing List, Linux Containers, linux-nfs "Serge E. Hallyn" <serue@us.ibm.com> writes: > Thanks, Cedric. Eric is probably right about the long-term fix, but > yeah it might take a while to properly wade through the sunrpc and nfs > layers to store the nodename at nfs mount time, and in the meantime this > fixes a real oops. A very esoteric oops that hasn't shown up for two years. Please let's look at this and see what it would take to fix this properly. What are we trying to achieve by reading utsname? Eric ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC][PATCH] sunrpc: fix oops in rpc_create() when the mount namespace is unshared 2008-09-09 15:09 ` Eric W. Biederman @ 2008-09-09 15:29 ` Serge E. Hallyn 2008-09-09 15:40 ` Cedric Le Goater 2008-09-09 17:07 ` Chuck Lever 0 siblings, 2 replies; 22+ messages in thread From: Serge E. Hallyn @ 2008-09-09 15:29 UTC (permalink / raw) To: Eric W. Biederman Cc: Cedric Le Goater, Andrew Morton, Trond Myklebust, Chuck Lever, Linux Kernel Mailing List, Linux Containers, linux-nfs Quoting Eric W. Biederman (ebiederm@xmission.com): > "Serge E. Hallyn" <serue@us.ibm.com> writes: > > > Thanks, Cedric. Eric is probably right about the long-term fix, but > > yeah it might take a while to properly wade through the sunrpc and nfs > > layers to store the nodename at nfs mount time, and in the meantime this > > fixes a real oops. > > A very esoteric oops that hasn't shown up for two years. But an easily reproducible one. It's not as though we'll stop looking for the right fix just bc we have this "bad" fix in for a short while. > Please let's look at this and see what it would take to fix this > properly. Of course. Cedric is looking at the best way to fix it... > What are we trying to achieve by reading utsname? It looks like it gets copied into the sunrpc messages so I assume it is a part of the sunrpc spec? I don't want to do this, but we *could* put a conditional in utsname() to have it return init_utsname if current->nsproxy is null... -serge ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC][PATCH] sunrpc: fix oops in rpc_create() when the mount namespace is unshared 2008-09-09 15:29 ` Serge E. Hallyn @ 2008-09-09 15:40 ` Cedric Le Goater 2008-09-09 17:07 ` Chuck Lever 1 sibling, 0 replies; 22+ messages in thread From: Cedric Le Goater @ 2008-09-09 15:40 UTC (permalink / raw) To: Serge E. Hallyn Cc: Eric W. Biederman, Andrew Morton, Trond Myklebust, Chuck Lever, Linux Kernel Mailing List, Linux Containers, linux-nfs Serge E. Hallyn wrote: > Quoting Eric W. Biederman (ebiederm@xmission.com): >> "Serge E. Hallyn" <serue@us.ibm.com> writes: >> >>> Thanks, Cedric. Eric is probably right about the long-term fix, but >>> yeah it might take a while to properly wade through the sunrpc and nfs >>> layers to store the nodename at nfs mount time, and in the meantime this >>> fixes a real oops. >> A very esoteric oops that hasn't shown up for two years. > > But an easily reproducible one. > > It's not as though we'll stop looking for the right fix just bc we have > this "bad" fix in for a short while. > >> Please let's look at this and see what it would take to fix this >> properly. > > Of course. Cedric is looking at the best way to fix it... yes. well, my eyes are making progress in the NFS code. it will take some time :) >> What are we trying to achieve by reading utsname? > > It looks like it gets copied into the sunrpc messages so I assume it is > a part of the sunrpc spec? > > I don't want to do this, but we *could* put a conditional in utsname() > to have it return init_utsname if current->nsproxy is null... I nearly did that one but it will hide future misusage of utsname(). So I think it's better to keep it that way, and let the machine oops when we need to fix our code. C. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC][PATCH] sunrpc: fix oops in rpc_create() when the mount namespace is unshared 2008-09-09 15:29 ` Serge E. Hallyn 2008-09-09 15:40 ` Cedric Le Goater @ 2008-09-09 17:07 ` Chuck Lever 2008-09-09 18:20 ` Eric W. Biederman 1 sibling, 1 reply; 22+ messages in thread From: Chuck Lever @ 2008-09-09 17:07 UTC (permalink / raw) To: Serge E. Hallyn Cc: Eric W. Biederman, Cedric Le Goater, Andrew Morton, Trond Myklebust, Linux Kernel Mailing List, Linux Containers, linux-nfs On Sep 9, 2008, at Sep 9, 2008, 11:29 AM, Serge E. Hallyn wrote: > Quoting Eric W. Biederman (ebiederm@xmission.com): >> "Serge E. Hallyn" <serue@us.ibm.com> writes: >> >>> Thanks, Cedric. Eric is probably right about the long-term fix, but >>> yeah it might take a while to properly wade through the sunrpc and >>> nfs >>> layers to store the nodename at nfs mount time, and in the >>> meantime this >>> fixes a real oops. >> >> A very esoteric oops that hasn't shown up for two years. > > But an easily reproducible one. > > It's not as though we'll stop looking for the right fix just bc we > have > this "bad" fix in for a short while. > >> Please let's look at this and see what it would take to fix this >> properly. > > Of course. Cedric is looking at the best way to fix it... If the upper layers are responsible for providing the utsname, you will need to fix up lockd and the NFS server's callback client too, at least. >> What are we trying to achieve by reading utsname? > > It looks like it gets copied into the sunrpc messages so I assume it > is > a part of the sunrpc spec? It appears to be used only for RPC's AUTH_SYS credentials. The nodename is used to identify the caller's host. See RFC 1831, Appendix A: http://rfclibrary.hosting.com/rfc/rfc1831/rfc1831-16.asp I'm not terribly familiar with uts namespaces, though. Can someone explain why we need to distinguish between these for AUTH_SYS if the caller is on a remote system? > I don't want to do this, but we *could* put a conditional in utsname() > to have it return init_utsname if current->nsproxy is null... I don't like the idea of an oops in here. Instead, (for now) it should warn and fail to create the client, IMO. -- Chuck Lever chuck[dot]lever[at]oracle[dot]com ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC][PATCH] sunrpc: fix oops in rpc_create() when the mount namespace is unshared 2008-09-09 17:07 ` Chuck Lever @ 2008-09-09 18:20 ` Eric W. Biederman 2008-09-09 19:00 ` Chuck Lever 2008-09-10 9:23 ` Cedric Le Goater 0 siblings, 2 replies; 22+ messages in thread From: Eric W. Biederman @ 2008-09-09 18:20 UTC (permalink / raw) To: Chuck Lever Cc: Serge E. Hallyn, Cedric Le Goater, Andrew Morton, Trond Myklebust, Linux Kernel Mailing List, Linux Containers, linux-nfs Chuck Lever <chuck.lever@oracle.com> writes: > If the upper layers are responsible for providing the utsname, you will need to > fix up lockd and the NFS server's callback client too, at least. Actually looking at the code. It looks like a proper fix may be even simpler. Why do we have both clnt->cl_server and clnt->cl_nodename? Or is cl_server the other side of the connection? >>> What are we trying to achieve by reading utsname? >> >> It looks like it gets copied into the sunrpc messages so I assume it is >> a part of the sunrpc spec? > > It appears to be used only for RPC's AUTH_SYS credentials. The nodename is used > to identify the caller's host. See RFC 1831, Appendix A: > > http://rfclibrary.hosting.com/rfc/rfc1831/rfc1831-16.asp Thanks that helps a lot. > I'm not terribly familiar with uts namespaces, though. Can someone explain why > we need to distinguish between these for AUTH_SYS if the caller is on a remote > system? Semantically processes in different uts namespaces are on different machines. > I don't like the idea of an oops in here. Instead, (for now) it should warn and > fail to create the client, IMO. Which is interesting when the problem happens during NFS unmount. Although frankly it could fail anyway. It seems strange that we are creating a client during unmount anyway. Eric ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC][PATCH] sunrpc: fix oops in rpc_create() when the mount namespace is unshared 2008-09-09 18:20 ` Eric W. Biederman @ 2008-09-09 19:00 ` Chuck Lever 2008-09-09 20:08 ` Eric W. Biederman 2008-09-10 9:23 ` Cedric Le Goater 1 sibling, 1 reply; 22+ messages in thread From: Chuck Lever @ 2008-09-09 19:00 UTC (permalink / raw) To: ebiederm Cc: Serge E. Hallyn, Cedric Le Goater, Andrew Morton, Trond Myklebust, Linux Kernel Mailing List, Linux Containers, linux-nfs On Sep 9, 2008, at Sep 9, 2008, 2:20 PM, ebiederm@xmission.com wrote: > Chuck Lever <chuck.lever@oracle.com> writes: > >> If the upper layers are responsible for providing the utsname, you >> will need to >> fix up lockd and the NFS server's callback client too, at least. > > Actually looking at the code. It looks like a proper fix may be > even simpler. > Why do we have both clnt->cl_server and clnt->cl_nodename? Or is > cl_server > the other side of the connection? cl_server is the server-side. cl_nodename is the "machine name" of the client. >>>> What are we trying to achieve by reading utsname? >>> >>> It looks like it gets copied into the sunrpc messages so I assume >>> it is >>> a part of the sunrpc spec? >> >> It appears to be used only for RPC's AUTH_SYS credentials. The >> nodename is used >> to identify the caller's host. See RFC 1831, Appendix A: >> >> http://rfclibrary.hosting.com/rfc/rfc1831/rfc1831-16.asp > > Thanks that helps a lot. > >> I'm not terribly familiar with uts namespaces, though. Can someone >> explain why >> we need to distinguish between these for AUTH_SYS if the caller is >> on a remote >> system? > > Semantically processes in different uts namespaces are on different > machines. OK. >> I don't like the idea of an oops in here. Instead, (for now) it >> should warn and >> fail to create the client, IMO. > > Which is interesting when the problem happens during NFS unmount. > Although > frankly it could fail anyway. > > It seems strange that we are creating a client during unmount anyway. It's worth investigating. Just enable RPC tracing (/usr/sbin/rpcdebug -m rpc -s all) before shutting down the client. NFS unmounting is historically difficult because during a system shutdown, unmounting is the last thing to occur, and usually happens after most system services (such as the portmapper service) have become unavailable. That's fine for local file systems, but it makes for a rather dodgy situation for NFS. -- Chuck Lever chuck[dot]lever[at]oracle[dot]com ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC][PATCH] sunrpc: fix oops in rpc_create() when the mount namespace is unshared 2008-09-09 19:00 ` Chuck Lever @ 2008-09-09 20:08 ` Eric W. Biederman 0 siblings, 0 replies; 22+ messages in thread From: Eric W. Biederman @ 2008-09-09 20:08 UTC (permalink / raw) To: Chuck Lever Cc: Serge E. Hallyn, Cedric Le Goater, Andrew Morton, Trond Myklebust, Linux Kernel Mailing List, Linux Containers, linux-nfs Chuck Lever <chuck.lever@oracle.com> writes: > On Sep 9, 2008, at Sep 9, 2008, 2:20 PM, ebiederm@xmission.com wrote: >> Chuck Lever <chuck.lever@oracle.com> writes: >> >>> If the upper layers are responsible for providing the utsname, you will need >>> to >>> fix up lockd and the NFS server's callback client too, at least. >> >> Actually looking at the code. It looks like a proper fix may be even simpler. >> Why do we have both clnt->cl_server and clnt->cl_nodename? Or is cl_server >> the other side of the connection? > > cl_server is the server-side. cl_nodename is the "machine name" of the client. Thanks, Darn. Looks like we need to capture both names at the same or a similar point. I'm wondering if we need to capture a network namespace as well. >>> I don't like the idea of an oops in here. Instead, (for now) it should warn >>> and >>> fail to create the client, IMO. >> >> Which is interesting when the problem happens during NFS unmount. Although >> frankly it could fail anyway. >> >> It seems strange that we are creating a client during unmount anyway. > > It's worth investigating. Just enable RPC tracing (/usr/sbin/rpcdebug -m rpc -s > all) before shutting down the client. > > NFS unmounting is historically difficult because during a system shutdown, > unmounting is the last thing to occur, and usually happens after most system > services (such as the portmapper service) have become unavailable. That's fine > for local file systems, but it makes for a rather dodgy situation for NFS. Interesting. Eric ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC][PATCH] sunrpc: fix oops in rpc_create() when the mount namespace is unshared 2008-09-09 18:20 ` Eric W. Biederman 2008-09-09 19:00 ` Chuck Lever @ 2008-09-10 9:23 ` Cedric Le Goater 2008-09-10 15:12 ` Chuck Lever 1 sibling, 1 reply; 22+ messages in thread From: Cedric Le Goater @ 2008-09-10 9:23 UTC (permalink / raw) To: Eric W. Biederman Cc: Chuck Lever, Serge E. Hallyn, Andrew Morton, Trond Myklebust, Linux Kernel Mailing List, Linux Containers, linux-nfs Eric W. Biederman wrote: > Chuck Lever <chuck.lever@oracle.com> writes: > >> If the upper layers are responsible for providing the utsname, you will need to >> fix up lockd and the NFS server's callback client too, at least. > > Actually looking at the code. It looks like a proper fix may be even simpler. > Why do we have both clnt->cl_server and clnt->cl_nodename? Or is cl_server > the other side of the connection? > >>>> What are we trying to achieve by reading utsname? >>> It looks like it gets copied into the sunrpc messages so I assume it is >>> a part of the sunrpc spec? >> It appears to be used only for RPC's AUTH_SYS credentials. The nodename is used >> to identify the caller's host. See RFC 1831, Appendix A: >> >> http://rfclibrary.hosting.com/rfc/rfc1831/rfc1831-16.asp > > Thanks that helps a lot. > >> I'm not terribly familiar with uts namespaces, though. Can someone explain why >> we need to distinguish between these for AUTH_SYS if the caller is on a remote >> system? > > Semantically processes in different uts namespaces are on different machines. > >> I don't like the idea of an oops in here. Instead, (for now) it should warn and >> fail to create the client, IMO. > > Which is interesting when the problem happens during NFS unmount. Although > frankly it could fail anyway. > > It seems strange that we are creating a client during unmount anyway. the task exiting brings down the lockd thread and unregisters the lockd service with the portmapper. This is done with a rpc call which creates a client and a request. that's how I understand the code and the oops. C. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC][PATCH] sunrpc: fix oops in rpc_create() when the mount namespace is unshared 2008-09-10 9:23 ` Cedric Le Goater @ 2008-09-10 15:12 ` Chuck Lever 2008-09-10 20:02 ` Eric W. Biederman 0 siblings, 1 reply; 22+ messages in thread From: Chuck Lever @ 2008-09-10 15:12 UTC (permalink / raw) To: Cedric Le Goater Cc: Eric W. Biederman, Serge E. Hallyn, Andrew Morton, Trond Myklebust, Linux Kernel Mailing List, Linux Containers, linux-nfs On Wed, Sep 10, 2008 at 5:23 AM, Cedric Le Goater <clg@fr.ibm.com> wrote: > Eric W. Biederman wrote: >> Chuck Lever <chuck.lever@oracle.com> writes: >> >>> If the upper layers are responsible for providing the utsname, you will need to >>> fix up lockd and the NFS server's callback client too, at least. >> >> Actually looking at the code. It looks like a proper fix may be even simpler. >> Why do we have both clnt->cl_server and clnt->cl_nodename? Or is cl_server >> the other side of the connection? >> >>>>> What are we trying to achieve by reading utsname? >>>> It looks like it gets copied into the sunrpc messages so I assume it is >>>> a part of the sunrpc spec? >>> It appears to be used only for RPC's AUTH_SYS credentials. The nodename is used >>> to identify the caller's host. See RFC 1831, Appendix A: >>> >>> http://rfclibrary.hosting.com/rfc/rfc1831/rfc1831-16.asp >> >> Thanks that helps a lot. >> >>> I'm not terribly familiar with uts namespaces, though. Can someone explain why >>> we need to distinguish between these for AUTH_SYS if the caller is on a remote >>> system? >> >> Semantically processes in different uts namespaces are on different machines. >> >>> I don't like the idea of an oops in here. Instead, (for now) it should warn and >>> fail to create the client, IMO. >> >> Which is interesting when the problem happens during NFS unmount. Although >> frankly it could fail anyway. >> >> It seems strange that we are creating a client during unmount anyway. > > the task exiting brings down the lockd thread and unregisters the lockd service > with the portmapper. This is done with a rpc call which creates a client and a > request. That makes sense. This is likely coming from lockd_down(), and is almost certainly not coming from the same uts namespace as the lockd_up() that did the pmap_set, which was done by the first NFS mount done in the first uts namespace on the system. It's just something that the kernel has to do for maintenance. There is only one lockd() instance that is shared among all the uts namespaces, right? In this case, what is the correct utsname to use? -- Chuck Lever ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC][PATCH] sunrpc: fix oops in rpc_create() when the mount namespace is unshared 2008-09-10 15:12 ` Chuck Lever @ 2008-09-10 20:02 ` Eric W. Biederman 2008-09-10 20:54 ` Chuck Lever 2008-09-11 9:02 ` Cedric Le Goater 0 siblings, 2 replies; 22+ messages in thread From: Eric W. Biederman @ 2008-09-10 20:02 UTC (permalink / raw) To: chucklever Cc: Cedric Le Goater, Serge E. Hallyn, Andrew Morton, Trond Myklebust, Linux Kernel Mailing List, Linux Containers, linux-nfs "Chuck Lever" <chuck.lever@oracle.com> writes: > That makes sense. > > This is likely coming from lockd_down(), and is almost certainly not > coming from the same uts namespace as the lockd_up() that did the > pmap_set, which was done by the first NFS mount done in the first uts > namespace on the system. It's just something that the kernel has to > do for maintenance. > > There is only one lockd() instance that is shared among all the uts > namespaces, right? In this case, what is the correct utsname to use? Interesting. As a general rule I would say we should capture the uts instance in locked_up(). And use the same instance in locked_down(). I'm not at all familiar with how locked interacts with nfs mounts in a practical sense. Is there one locked instance (or at least context) per nfs mount? The way I would expect things to work is that when we mount an nfs filesystem from an nfs server. We would create a locked context for that server, that additional nfs mounts to the same nfs server could share. The way I would expect nfs to interact with the namespaces is for the nfs mount to capture the uts and network namespaces, and use them for all transactions relating to the mount. In particular when creating or a locked context the nfs mount would use the uts namespace and the network namespace as discriminators to see if an existing locked context is the same. I don't think nfs has a 1-1 thread to context model which is where things get really hazy for me. The conservative play is to always force use of the initial namespace and to deny creation of mounts that would use different namespaces. In part because the initial version of the namespace always exists. Which means as relates to Cedrics initial patch we would still need to know which mounts should cause us to use a different uts namespace so we can deny them. Eric ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC][PATCH] sunrpc: fix oops in rpc_create() when the mount namespace is unshared 2008-09-10 20:02 ` Eric W. Biederman @ 2008-09-10 20:54 ` Chuck Lever 2008-09-11 9:02 ` Cedric Le Goater 1 sibling, 0 replies; 22+ messages in thread From: Chuck Lever @ 2008-09-10 20:54 UTC (permalink / raw) To: ebiederm Cc: chucklever, Cedric Le Goater, Serge E. Hallyn, Andrew Morton, Trond Myklebust, Linux Kernel Mailing List, Linux Containers, linux-nfs On Sep 10, 2008, at Sep 10, 2008, 4:02 PM, ebiederm@xmission.com wrote: > "Chuck Lever" <chuck.lever@oracle.com> writes: >> That makes sense. >> >> This is likely coming from lockd_down(), and is almost certainly not >> coming from the same uts namespace as the lockd_up() that did the >> pmap_set, which was done by the first NFS mount done in the first uts >> namespace on the system. It's just something that the kernel has to >> do for maintenance. >> >> There is only one lockd() instance that is shared among all the uts >> namespaces, right? In this case, what is the correct utsname to use? > > Interesting. > > As a general rule I would say we should capture the uts instance > in locked_up(). And use the same instance in locked_down(). > > I'm not at all familiar with how locked interacts with nfs mounts > in a practical sense. Is there one locked instance (or at least > context) > per nfs mount? > > The way I would expect things to work is that when we mount an nfs > filesystem > from an nfs server. We would create a locked context for that > server, that > additional nfs mounts to the same nfs server could share. There is one lockd, one statd, and one rpcbind per client. These are shared between all the NFS mounts on the client. Likewise, there is one of each of these per server, and they are shared among all exports. lockd_up() and lockd_down() maintain a count of mounts and exports, and lockd_down() shuts down lockd when the count goes to zero. statd provides the ability to signal a server when a client reboots (and vice versa). This gives the server an indication of when to free locks for any applications on a rebooting client, and gives the client an indication of when it needs to reclaim locks on a rebooting server. statd (user space) and lockd (kernel) have to share a cookie (mon_name) which is used to identify the client to servers, and the server to clients, so reboots can be detected. That cookie would probably need to be the initial utsname. > The way I would expect nfs to interact with the namespaces is for > the nfs > mount to capture the uts and network namespaces, and use them for all > transactions relating to the mount. That works for the main NFS protocol, perhaps, but the auxiliary protocols are another matter. They operate on behalf of a whole client or server, not on behalf of an individual mount or export. > In particular when creating > or a locked context the nfs mount would use the uts namespace and the > network namespace as discriminators to see if an existing locked > context > is the same. Possible, but I would expect this to be a lot of work for not much gain. The right answer is likely that you need a lockd and statd instance (virtual or real) for each namespace. The mounts and exports in each namespace would have their own lockd and statd. > I don't think nfs has a 1-1 thread to context model which is where > things > get really hazy for me. Users are assigned credentials. The credentials are passed from client to server, and the server does work on behalf of that credential (user). lockd uses a credential and a process identifier to find locks on files. AUTH_SYS credentials (the lowest common denominator) are constructed from the user's UID and GID and the client's utsname. The kernel, then, will have to construct unique credentials for users in each uts namespace. This is likely not an NFS mount-time issue, but is instead part of the mechanism of mapping requests from processes to RPC credentials. > The conservative play is to always force use of the initial namespace > and to deny creation of mounts that would use different namespaces. > In part > because the initial version of the namespace always exists. Which > means > as relates to Cedrics initial patch we would still need to know which > mounts should cause us to use a different uts namespace so we can deny > them. OK. I think what you are saying is that NFS won't work outside of the initial uts namespace, for now? Also, how would an automounter fit into this uts namespace scheme? -- Chuck Lever chuck[dot]lever[at]oracle[dot]com ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC][PATCH] sunrpc: fix oops in rpc_create() when the mount namespace is unshared 2008-09-10 20:02 ` Eric W. Biederman 2008-09-10 20:54 ` Chuck Lever @ 2008-09-11 9:02 ` Cedric Le Goater 2008-09-11 10:27 ` Eric W. Biederman 2008-09-11 16:39 ` Chuck Lever 1 sibling, 2 replies; 22+ messages in thread From: Cedric Le Goater @ 2008-09-11 9:02 UTC (permalink / raw) To: Eric W. Biederman Cc: chucklever, Serge E. Hallyn, Andrew Morton, Trond Myklebust, Linux Kernel Mailing List, Linux Containers, linux-nfs > The conservative play is to always force use of the initial namespace > and to deny creation of mounts that would use different namespaces. In part > because the initial version of the namespace always exists. Which means > as relates to Cedrics initial patch we would still need to know which > mounts should cause us to use a different uts namespace so we can deny > them. I will send the initial patch which forces use of the initial namespace because it does fix a real oops. Then, I should be able to find some time to work on improving the uts namespace checks when NFS mounts are done. Thanks ! C. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC][PATCH] sunrpc: fix oops in rpc_create() when the mount namespace is unshared 2008-09-11 9:02 ` Cedric Le Goater @ 2008-09-11 10:27 ` Eric W. Biederman 2008-09-11 16:39 ` Chuck Lever 1 sibling, 0 replies; 22+ messages in thread From: Eric W. Biederman @ 2008-09-11 10:27 UTC (permalink / raw) To: Cedric Le Goater Cc: chucklever, Serge E. Hallyn, Andrew Morton, Trond Myklebust, Linux Kernel Mailing List, Linux Containers, linux-nfs Cedric Le Goater <clg@fr.ibm.com> writes: >> The conservative play is to always force use of the initial namespace >> and to deny creation of mounts that would use different namespaces. In part >> because the initial version of the namespace always exists. Which means >> as relates to Cedrics initial patch we would still need to know which >> mounts should cause us to use a different uts namespace so we can deny >> them. > > I will send the initial patch which forces use of the initial namespace because > it does fix a real oops. > > Then, I should be able to find some time to work on improving the uts namespace > checks when NFS mounts are done. Thanks. Eric ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC][PATCH] sunrpc: fix oops in rpc_create() when the mount namespace is unshared 2008-09-11 9:02 ` Cedric Le Goater 2008-09-11 10:27 ` Eric W. Biederman @ 2008-09-11 16:39 ` Chuck Lever 1 sibling, 0 replies; 22+ messages in thread From: Chuck Lever @ 2008-09-11 16:39 UTC (permalink / raw) To: Cedric Le Goater Cc: Eric W. Biederman, Serge E. Hallyn, Andrew Morton, Trond Myklebust, Linux Kernel Mailing List, Linux Containers, Linux NFS Mailing List Cedric- On Sep 11, 2008, at Sep 11, 2008, 5:02 AM, Cedric Le Goater wrote: >> The conservative play is to always force use of the initial namespace >> and to deny creation of mounts that would use different >> namespaces. In part >> because the initial version of the namespace always exists. Which >> means >> as relates to Cedrics initial patch we would still need to know which >> mounts should cause us to use a different uts namespace so we can >> deny >> them. > > I will send the initial patch which forces use of the initial > namespace because > it does fix a real oops. Today's patch looks reasonable to me. I would add a FIXME comment in net/sunrpc/clnt.c near the site of the fix. > Then, I should be able to find some time to work on improving the > uts namespace > checks when NFS mounts are done. As a final note, I don't think this is an issue only for NFS mounts. The problem is with how AUTH_SYS RPC credentials are formed. cl_nodename is a copy of utsname kept in the rpc_clnt structure for efficiency. I think the auth_unix code has to be more sensitive to which utsname is "in effect" for each RPC request. We probably can't use the same utsname for all RPC requests for the life of an RPC client. And since RPC credentials are cached, we should be more careful about which cached credential is selected. -- Chuck Lever chuck[dot]lever[at]oracle[dot]com ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2008-09-11 16:45 UTC | newest] Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2008-09-08 13:39 [RFC][PATCH] sunrpc: fix oops in rpc_create() when the mount namespace is unshared Cedric Le Goater 2008-09-08 15:19 ` Serge E. Hallyn 2008-09-08 15:27 ` Serge E. Hallyn 2008-09-08 15:37 ` Cedric Le Goater 2008-09-08 15:43 ` Serge E. Hallyn 2008-09-08 16:09 ` Eric W. Biederman 2008-09-09 11:54 ` Cedric Le Goater 2008-09-09 12:43 ` Serge E. Hallyn 2008-09-09 15:09 ` Eric W. Biederman 2008-09-09 15:29 ` Serge E. Hallyn 2008-09-09 15:40 ` Cedric Le Goater 2008-09-09 17:07 ` Chuck Lever 2008-09-09 18:20 ` Eric W. Biederman 2008-09-09 19:00 ` Chuck Lever 2008-09-09 20:08 ` Eric W. Biederman 2008-09-10 9:23 ` Cedric Le Goater 2008-09-10 15:12 ` Chuck Lever 2008-09-10 20:02 ` Eric W. Biederman 2008-09-10 20:54 ` Chuck Lever 2008-09-11 9:02 ` Cedric Le Goater 2008-09-11 10:27 ` Eric W. Biederman 2008-09-11 16:39 ` Chuck Lever
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).