linux-nfs.vger.kernel.org archive mirror
* [PATCH 0/3] auth_gss: netns refcount leaks when use-gss-proxy==1
@ 2019-05-01  6:42 Wenbin Zeng
  2019-05-01  6:42 ` [PATCH 1/3] nsfs: add evict callback into struct proc_ns_operations Wenbin Zeng
                   ` (5 more replies)
  0 siblings, 6 replies; 27+ messages in thread
From: Wenbin Zeng @ 2019-05-01  6:42 UTC (permalink / raw)
  To: viro, davem, bfields, jlayton, trond.myklebust, anna.schumaker,
	wenbinzeng, dsahern, nicolas.dichtel, willy, edumazet,
	jakub.kicinski, tyhicks, chuck.lever, neilb
  Cc: linux-fsdevel, linux-kernel, netdev, linux-nfs

This patch series fixes an auth_gss bug that leaks netns refcounts when use-gss-proxy is set to 1.

The problem was found in privileged docker containers with the gssproxy service enabled and /proc/net/rpc/use-gss-proxy set to 1. After such a container is killed, the corresponding struct net->count stays at 2, so the struct net can never be freed.

It turns out that write_gssp() calls gssp_rpc_create() to create an rpc client, which increases net->count by 2. rpcsec_gss_exit_net() is supposed to decrease net->count again, but it never gets called because its call path is:
	net->count==0 -> cleanup_net -> ops_exit_list -> rpcsec_gss_exit_net
net->count cannot reach 0 before rpcsec_gss_exit_net() gets called, and rpcsec_gss_exit_net() is not called until net->count reaches 0. This is a deadlock situation.

To fix the problem we must break the deadlock: rpcsec_gss_exit_net() has to move out of the put() path and be called from somewhere else. I think nsfs_evict() is a good place: when the netns inode gets evicted, we call rpcsec_gss_exit_net() to free the rpc client. This requires a new callback, evict, in struct proc_ns_operations, and a netns_evict() added to netns_operations as well.
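
The cycle described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: struct toy_ns and every toy_* function are hypothetical stand-ins (toy_ns.count for net->count, toy_create_client() for gssp_rpc_create(), toy_destroy_client() for the client teardown done on the exit path), and locking and real memory management are left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct net: a refcount plus a client that pins it. */
struct toy_ns {
	int count;        /* models net->count */
	bool has_client;  /* models "an rpc client exists for this netns" */
	bool cleaned_up;  /* set once the cleanup path has run */
};

/* Models gssp_rpc_create(): the client takes two references on the netns. */
static void toy_create_client(struct toy_ns *ns)
{
	ns->has_client = true;
	ns->count += 2;
}

/*
 * Models freeing the rpc client: its two references are dropped.
 * Callers in this model always hold a reference of their own, so the
 * count does not reach 0 here.
 */
static void toy_destroy_client(struct toy_ns *ns)
{
	if (!ns->has_client)
		return;
	ns->has_client = false;
	ns->count -= 2;
}

/* Models put_net(): the cleanup path runs only when the count reaches 0. */
static void toy_put(struct toy_ns *ns)
{
	if (--ns->count == 0) {
		/* exit hook: never reached while the client pins the count */
		toy_destroy_client(ns);
		ns->cleaned_up = true;
	}
}

/* Inode eviction before this series: only put() is called. */
static void toy_evict_put_only(struct toy_ns *ns)
{
	toy_put(ns);
}

/* Inode eviction with an evict hook that frees the client before put(). */
static void toy_evict_then_put(struct toy_ns *ns)
{
	toy_destroy_client(ns);
	toy_put(ns);
}
```

Starting from a count of 1 (the reference held by the nsfs inode) and creating a client, toy_evict_put_only() leaves the count stuck at 2 with cleanup never run, which matches the symptom reported above. toy_evict_then_put() lets the count reach 0.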

Wenbin Zeng (3):
  nsfs: add evict callback into struct proc_ns_operations
  netns: add netns_evict into netns_operations
  auth_gss: fix deadlock that blocks rpcsec_gss_exit_net when
    use-gss-proxy==1

 fs/nsfs.c                      |  2 ++
 include/linux/proc_ns.h        |  1 +
 include/net/net_namespace.h    |  1 +
 net/core/net_namespace.c       | 12 ++++++++++++
 net/sunrpc/auth_gss/auth_gss.c |  9 ++++++---
 5 files changed, 22 insertions(+), 3 deletions(-)

-- 
1.8.3.1



* [PATCH 1/3] nsfs: add evict callback into struct proc_ns_operations
  2019-05-01  6:42 [PATCH 0/3] auth_gss: netns refcount leaks when use-gss-proxy==1 Wenbin Zeng
@ 2019-05-01  6:42 ` Wenbin Zeng
  2019-05-02  3:04   ` Al Viro
  2019-05-01  6:42 ` [PATCH 2/3] netns: add netns_evict into netns_operations Wenbin Zeng
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 27+ messages in thread
From: Wenbin Zeng @ 2019-05-01  6:42 UTC (permalink / raw)
  To: viro, davem, bfields, jlayton, trond.myklebust, anna.schumaker,
	wenbinzeng, dsahern, nicolas.dichtel, willy, edumazet,
	jakub.kicinski, tyhicks, chuck.lever, neilb
  Cc: linux-fsdevel, linux-kernel, netdev, linux-nfs

The newly added evict callback is called by nsfs_evict(). Currently
nsfs_evict() calls only the put() callback, which cannot release all
netns refcounts. For example, an rpc client holds two netns refcounts
that are supposed to be released when the rpc client is freed, but the
code that frees the rpc client is normally reached from the put()
callback only when the netns refcount drops to 0, specifically:
    refcount=0 -> cleanup_net() -> ops_exit_list -> free rpc client
The netns refcount can never drop to 0 before the rpc client is freed.
To break the deadlock, the code that frees the rpc client can be put
into the newly added evict callback.

Signed-off-by: Wenbin Zeng <wenbinzeng@tencent.com>
---
 fs/nsfs.c               | 2 ++
 include/linux/proc_ns.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/fs/nsfs.c b/fs/nsfs.c
index 60702d6..5939b12 100644
--- a/fs/nsfs.c
+++ b/fs/nsfs.c
@@ -49,6 +49,8 @@ static void nsfs_evict(struct inode *inode)
 	struct ns_common *ns = inode->i_private;
 	clear_inode(inode);
 	ns->ops->put(ns);
+	if (ns->ops->evict)
+		ns->ops->evict(ns);
 }
 
 static void *__ns_get_path(struct path *path, struct ns_common *ns)
diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h
index d31cb62..919f0d4 100644
--- a/include/linux/proc_ns.h
+++ b/include/linux/proc_ns.h
@@ -19,6 +19,7 @@ struct proc_ns_operations {
 	int type;
 	struct ns_common *(*get)(struct task_struct *task);
 	void (*put)(struct ns_common *ns);
+	void (*evict)(struct ns_common *ns);
 	int (*install)(struct nsproxy *nsproxy, struct ns_common *ns);
 	struct user_namespace *(*owner)(struct ns_common *ns);
 	struct ns_common *(*get_parent)(struct ns_common *ns);
-- 
1.8.3.1



* [PATCH 2/3] netns: add netns_evict into netns_operations
  2019-05-01  6:42 [PATCH 0/3] auth_gss: netns refcount leaks when use-gss-proxy==1 Wenbin Zeng
  2019-05-01  6:42 ` [PATCH 1/3] nsfs: add evict callback into struct proc_ns_operations Wenbin Zeng
@ 2019-05-01  6:42 ` Wenbin Zeng
  2019-05-04  4:10   ` David Miller
  2019-05-01  6:42 ` [PATCH 3/3] auth_gss: fix deadlock that blocks rpcsec_gss_exit_net when use-gss-proxy==1 Wenbin Zeng
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 27+ messages in thread
From: Wenbin Zeng @ 2019-05-01  6:42 UTC (permalink / raw)
  To: viro, davem, bfields, jlayton, trond.myklebust, anna.schumaker,
	wenbinzeng, dsahern, nicolas.dichtel, willy, edumazet,
	jakub.kicinski, tyhicks, chuck.lever, neilb
  Cc: linux-fsdevel, linux-kernel, netdev, linux-nfs

The newly added netns_evict() is called when the netns inode is being
evicted. It provides another path for releasing netns refcounts.
Previously netns_put() was the only choice, but it cannot release all
netns refcounts. For example, an rpc client holds two netns refcounts
that are supposed to be released when the rpc client is freed, but the
code that frees the rpc client is normally reached from the put()
callback only when the netns refcount drops to 0, specifically:
    refcount=0 -> cleanup_net() -> ops_exit_list -> free rpc client
The netns refcount can never drop to 0 before the rpc client is freed.
To break the deadlock, the code that frees the rpc client can be put
into the newly added netns_evict().

Signed-off-by: Wenbin Zeng <wenbinzeng@tencent.com>
---
 include/net/net_namespace.h |  1 +
 net/core/net_namespace.c    | 12 ++++++++++++
 2 files changed, 13 insertions(+)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 12689dd..c44306a 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -357,6 +357,7 @@ struct pernet_operations {
 	int (*init)(struct net *net);
 	void (*exit)(struct net *net);
 	void (*exit_batch)(struct list_head *net_exit_list);
+	void (*evict)(struct net *net);
 	unsigned int *id;
 	size_t size;
 };
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 7e6dcc6..0626fc4 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -1296,6 +1296,17 @@ static void netns_put(struct ns_common *ns)
 	put_net(to_net_ns(ns));
 }
 
+static void netns_evict(struct ns_common *ns)
+{
+	struct net *net = to_net_ns(ns);
+	const struct pernet_operations *ops;
+
+	list_for_each_entry_reverse(ops, &pernet_list, list) {
+		if (ops->evict)
+			ops->evict(net);
+	}
+}
+
 static int netns_install(struct nsproxy *nsproxy, struct ns_common *ns)
 {
 	struct net *net = to_net_ns(ns);
@@ -1319,6 +1330,7 @@ static struct user_namespace *netns_owner(struct ns_common *ns)
 	.type		= CLONE_NEWNET,
 	.get		= netns_get,
 	.put		= netns_put,
+	.evict		= netns_evict,
 	.install	= netns_install,
 	.owner		= netns_owner,
 };
-- 
1.8.3.1



* [PATCH 3/3] auth_gss: fix deadlock that blocks rpcsec_gss_exit_net when use-gss-proxy==1
  2019-05-01  6:42 [PATCH 0/3] auth_gss: netns refcount leaks when use-gss-proxy==1 Wenbin Zeng
  2019-05-01  6:42 ` [PATCH 1/3] nsfs: add evict callback into struct proc_ns_operations Wenbin Zeng
  2019-05-01  6:42 ` [PATCH 2/3] netns: add netns_evict into netns_operations Wenbin Zeng
@ 2019-05-01  6:42 ` Wenbin Zeng
  2019-05-09 20:52 ` [PATCH 0/3] auth_gss: netns refcount leaks " J. Bruce Fields
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: Wenbin Zeng @ 2019-05-01  6:42 UTC (permalink / raw)
  To: viro, davem, bfields, jlayton, trond.myklebust, anna.schumaker,
	wenbinzeng, dsahern, nicolas.dichtel, willy, edumazet,
	jakub.kicinski, tyhicks, chuck.lever, neilb
  Cc: linux-fsdevel, linux-kernel, netdev, linux-nfs

When use-gss-proxy is set to 1, write_gssp() creates an rpc client in
gssp_rpc_create(), which increases the netns refcount by 2. These refcounts
are supposed to be released in rpcsec_gss_exit_net(), but that never
happens because rpcsec_gss_exit_net() is triggered only when the netns
refcount drops to 0, specifically:
    refcount=0 -> cleanup_net() -> ops_exit_list -> rpcsec_gss_exit_net
This is a deadlock situation: the refcount will never drop to 0 unless
rpcsec_gss_exit_net() is called.

This fix introduces a new callback, evict, in struct proc_ns_operations,
which is called in nsfs_evict(). Moving rpcsec_gss_exit_net() to the evict
path gives it a chance to be called and avoids the above deadlock.

Signed-off-by: Wenbin Zeng <wenbinzeng@tencent.com>
---
 net/sunrpc/auth_gss/auth_gss.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
index 3fd56c0..3e6bd59 100644
--- a/net/sunrpc/auth_gss/auth_gss.c
+++ b/net/sunrpc/auth_gss/auth_gss.c
@@ -2136,14 +2136,17 @@ static __net_init int rpcsec_gss_init_net(struct net *net)
 	return gss_svc_init_net(net);
 }
 
-static __net_exit void rpcsec_gss_exit_net(struct net *net)
+static void rpcsec_gss_evict_net(struct net *net)
 {
-	gss_svc_shutdown_net(net);
+	struct sunrpc_net *sn = net_generic(net, sunrpc_net_id);
+
+	if (sn->gssp_clnt)
+		gss_svc_shutdown_net(net);
 }
 
 static struct pernet_operations rpcsec_gss_net_ops = {
 	.init = rpcsec_gss_init_net,
-	.exit = rpcsec_gss_exit_net,
+	.evict = rpcsec_gss_evict_net,
 };
 
 /*
-- 
1.8.3.1



* Re: [PATCH 1/3] nsfs: add evict callback into struct proc_ns_operations
  2019-05-01  6:42 ` [PATCH 1/3] nsfs: add evict callback into struct proc_ns_operations Wenbin Zeng
@ 2019-05-02  3:04   ` Al Viro
  2019-05-04 16:08     ` Wenbin Zeng
  0 siblings, 1 reply; 27+ messages in thread
From: Al Viro @ 2019-05-02  3:04 UTC (permalink / raw)
  To: Wenbin Zeng
  Cc: davem, bfields, jlayton, trond.myklebust, anna.schumaker,
	wenbinzeng, dsahern, nicolas.dichtel, willy, edumazet,
	jakub.kicinski, tyhicks, chuck.lever, neilb, linux-fsdevel,
	linux-kernel, netdev, linux-nfs

On Wed, May 01, 2019 at 02:42:23PM +0800, Wenbin Zeng wrote:
> The newly added evict callback shall be called by nsfs_evict(). Currently
> only put() callback is called in nsfs_evict(), it is not able to release
> all netns refcount, for example, a rpc client holds two netns refcounts,
> these refcounts are supposed to be released when the rpc client is freed,
> but the code to free rpc client is normally triggered by put() callback
> only when netns refcount gets to 0, specifically:
>     refcount=0 -> cleanup_net() -> ops_exit_list -> free rpc client
> But netns refcount will never get to 0 before rpc client gets freed, to
> break the deadlock, the code to free rpc client can be put into the newly
> added evict callback.
> 
> Signed-off-by: Wenbin Zeng <wenbinzeng@tencent.com>
> ---
>  fs/nsfs.c               | 2 ++
>  include/linux/proc_ns.h | 1 +
>  2 files changed, 3 insertions(+)
> 
> diff --git a/fs/nsfs.c b/fs/nsfs.c
> index 60702d6..5939b12 100644
> --- a/fs/nsfs.c
> +++ b/fs/nsfs.c
> @@ -49,6 +49,8 @@ static void nsfs_evict(struct inode *inode)
>  	struct ns_common *ns = inode->i_private;
>  	clear_inode(inode);
>  	ns->ops->put(ns);
> +	if (ns->ops->evict)
> +		ns->ops->evict(ns);

What's to guarantee that ns will not be freed by ->put()?
Confused...


* Re: [PATCH 2/3] netns: add netns_evict into netns_operations
  2019-05-01  6:42 ` [PATCH 2/3] netns: add netns_evict into netns_operations Wenbin Zeng
@ 2019-05-04  4:10   ` David Miller
  0 siblings, 0 replies; 27+ messages in thread
From: David Miller @ 2019-05-04  4:10 UTC (permalink / raw)
  To: wenbin.zeng
  Cc: viro, bfields, jlayton, trond.myklebust, anna.schumaker,
	wenbinzeng, dsahern, nicolas.dichtel, willy, edumazet,
	jakub.kicinski, tyhicks, chuck.lever, neilb, linux-fsdevel,
	linux-kernel, netdev, linux-nfs

From: Wenbin Zeng <wenbin.zeng@gmail.com>
Date: Wed,  1 May 2019 14:42:24 +0800

> The newly added netns_evict() shall be called when the netns inode being
> evicted. It provides another path to release netns refcounts, previously
> netns_put() is the only choice, but it is not able to release all netns
> refcount, for example, a rpc client holds two netns refcounts, these
> refcounts are supposed to be released when the rpc client is freed, but
> the code to free rpc client is normally triggered by put() callback only
> when netns refcount gets to 0, specifically:
>     refcount=0 -> cleanup_net() -> ops_exit_list -> free rpc client
> But netns refcount will never get to 0 before rpc client gets freed, to
> break the deadlock, the code to free rpc client can be put into the newly
> added netns_evict.
> 
> Signed-off-by: Wenbin Zeng <wenbinzeng@tencent.com>

Acked-by: David S. Miller <davem@davemloft.net>


* Re: [PATCH 1/3] nsfs: add evict callback into struct proc_ns_operations
  2019-05-02  3:04   ` Al Viro
@ 2019-05-04 16:08     ` Wenbin Zeng
  0 siblings, 0 replies; 27+ messages in thread
From: Wenbin Zeng @ 2019-05-04 16:08 UTC (permalink / raw)
  To: Al Viro
  Cc: davem, bfields, jlayton, trond.myklebust, anna.schumaker,
	wenbinzeng, dsahern, nicolas.dichtel, willy, edumazet,
	jakub.kicinski, tyhicks, chuck.lever, neilb, linux-fsdevel,
	linux-kernel, netdev, linux-nfs

On Thu, May 02, 2019 at 04:04:06AM +0100, Al Viro wrote:
> On Wed, May 01, 2019 at 02:42:23PM +0800, Wenbin Zeng wrote:
> > The newly added evict callback shall be called by nsfs_evict(). Currently
> > only put() callback is called in nsfs_evict(), it is not able to release
> > all netns refcount, for example, a rpc client holds two netns refcounts,
> > these refcounts are supposed to be released when the rpc client is freed,
> > but the code to free rpc client is normally triggered by put() callback
> > only when netns refcount gets to 0, specifically:
> >     refcount=0 -> cleanup_net() -> ops_exit_list -> free rpc client
> > But netns refcount will never get to 0 before rpc client gets freed, to
> > break the deadlock, the code to free rpc client can be put into the newly
> > added evict callback.
> > 
> > Signed-off-by: Wenbin Zeng <wenbinzeng@tencent.com>
> > ---
> >  fs/nsfs.c               | 2 ++
> >  include/linux/proc_ns.h | 1 +
> >  2 files changed, 3 insertions(+)
> > 
> > diff --git a/fs/nsfs.c b/fs/nsfs.c
> > index 60702d6..5939b12 100644
> > --- a/fs/nsfs.c
> > +++ b/fs/nsfs.c
> > @@ -49,6 +49,8 @@ static void nsfs_evict(struct inode *inode)
> >  	struct ns_common *ns = inode->i_private;
> >  	clear_inode(inode);
> >  	ns->ops->put(ns);
> > +	if (ns->ops->evict)
> > +		ns->ops->evict(ns);
> 
> What's to guarantee that ns will not be freed by ->put()?
> Confused...

Hi Al, thank you very much. You are absolutely right.
->evict() should be called before ->put(), i.e.:

@@ -49,6 +49,8 @@ static void nsfs_evict(struct inode *inode)
	struct ns_common *ns = inode->i_private;
	clear_inode(inode);
+	if (ns->ops->evict)
+		ns->ops->evict(ns);
	ns->ops->put(ns);
 }

Does this look good?
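
The ordering concern can be sketched in user space as follows. This is an illustrative model under stated assumptions, not the kernel implementation: struct toy_ns, struct toy_ns_ops and all toy_* names are hypothetical, and a "released" flag stands in for the object actually being freed so that the bad ordering can be observed without touching freed memory.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_ns;

/* Hypothetical, trimmed-down analogue of struct proc_ns_operations. */
struct toy_ns_ops {
	void (*put)(struct toy_ns *ns);
	void (*evict)(struct toy_ns *ns);  /* optional, may be NULL */
};

struct toy_ns {
	const struct toy_ns_ops *ops;
	int count;
	bool released;        /* stands in for "the object has been freed" */
	bool evict_ran;
	bool evict_saw_live;  /* true if evict ran before the object was released */
};

/* Dropping the last reference releases the object. */
static void toy_put(struct toy_ns *ns)
{
	if (--ns->count == 0)
		ns->released = true;  /* real code would free the object here */
}

/* Records whether the object was still valid when the hook ran. */
static void toy_evict_hook(struct toy_ns *ns)
{
	ns->evict_ran = true;
	ns->evict_saw_live = !ns->released;
}

static const struct toy_ns_ops toy_ops = {
	.put = toy_put,
	.evict = toy_evict_hook,
};

/* v1 ordering: put() first, so evict() may run on a released object. */
static void toy_nsfs_evict_v1(struct toy_ns *ns)
{
	ns->ops->put(ns);
	if (ns->ops->evict)
		ns->ops->evict(ns);
}

/* Corrected ordering: evict() runs while the caller still holds its reference. */
static void toy_nsfs_evict_v2(struct toy_ns *ns)
{
	if (ns->ops->evict)
		ns->ops->evict(ns);
	ns->ops->put(ns);
}
```

With a single remaining reference, toy_nsfs_evict_v1() runs the hook after the object is marked released, which in real code would be a use-after-free. toy_nsfs_evict_v2() runs the hook first and only then drops the reference.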


* Re: [PATCH 0/3] auth_gss: netns refcount leaks when use-gss-proxy==1
  2019-05-01  6:42 [PATCH 0/3] auth_gss: netns refcount leaks when use-gss-proxy==1 Wenbin Zeng
                   ` (2 preceding siblings ...)
  2019-05-01  6:42 ` [PATCH 3/3] auth_gss: fix deadlock that blocks rpcsec_gss_exit_net when use-gss-proxy==1 Wenbin Zeng
@ 2019-05-09 20:52 ` J. Bruce Fields
  2019-05-10  5:09   ` Wenbin Zeng
  2019-05-10  6:36 ` [PATCH v2 " Wenbin Zeng
  2019-06-12 12:09 ` [PATCH v3 " Wenbin Zeng
  5 siblings, 1 reply; 27+ messages in thread
From: J. Bruce Fields @ 2019-05-09 20:52 UTC (permalink / raw)
  To: Wenbin Zeng
  Cc: viro, davem, jlayton, trond.myklebust, anna.schumaker,
	wenbinzeng, dsahern, nicolas.dichtel, willy, edumazet,
	jakub.kicinski, tyhicks, chuck.lever, neilb, linux-fsdevel,
	linux-kernel, netdev, linux-nfs

Thanks for figuring this out!

I guess I'll take these patches (with the one fix in your response to
Al) through the nfsd tree, unless someone tells me otherwise.  (The
original bug was introduced through nfsd.)

How serious are the consequences of the leak?  I'm wondering if it's
worth a stable cc or not.

--b.



* Re: [PATCH 0/3] auth_gss: netns refcount leaks when use-gss-proxy==1
  2019-05-09 20:52 ` [PATCH 0/3] auth_gss: netns refcount leaks " J. Bruce Fields
@ 2019-05-10  5:09   ` Wenbin Zeng
  0 siblings, 0 replies; 27+ messages in thread
From: Wenbin Zeng @ 2019-05-10  5:09 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: viro, davem, jlayton, trond.myklebust, anna.schumaker,
	wenbinzeng, dsahern, nicolas.dichtel, willy, edumazet,
	jakub.kicinski, tyhicks, chuck.lever, neilb, linux-fsdevel,
	linux-kernel, netdev, linux-nfs

On Thu, May 09, 2019 at 04:52:18PM -0400, J. Bruce Fields wrote:
> Thanks for figuring this out!
> 
> I guess I'll take these patches (with the one fix in your response to
> Al) through the nfsd tree, unless someone tells me otherwise.  (The
> original bug was introduced through nfsd.)

Thank you, Bruce.
I am submitting v2 with that fix right away.

> 
> How serious are the consequences of the leak?  I'm wondering if it's
> worth a stable cc or not.

Though the leak only happens with _privileged_ docker containers that have
the gssproxy service enabled and use-gss-proxy set to 1, the consequences
can be ugly. Killed/stopped containers not only leave struct net
unfreed, they may also leave behind veth devices linked to the netns. In
environments where containers are frequently killed/stopped, this gets
quite ugly.

> 
> --b.


* [PATCH v2 0/3] auth_gss: netns refcount leaks when use-gss-proxy==1
  2019-05-01  6:42 [PATCH 0/3] auth_gss: netns refcount leaks when use-gss-proxy==1 Wenbin Zeng
                   ` (3 preceding siblings ...)
  2019-05-09 20:52 ` [PATCH 0/3] auth_gss: netns refcount leaks " J. Bruce Fields
@ 2019-05-10  6:36 ` Wenbin Zeng
  2019-05-10  6:36   ` [PATCH v2 1/3] nsfs: add evict callback into struct proc_ns_operations Wenbin Zeng
                     ` (3 more replies)
  2019-06-12 12:09 ` [PATCH v3 " Wenbin Zeng
  5 siblings, 4 replies; 27+ messages in thread
From: Wenbin Zeng @ 2019-05-10  6:36 UTC (permalink / raw)
  To: bfields, viro, davem
  Cc: jlayton, trond.myklebust, anna.schumaker, wenbinzeng, dsahern,
	nicolas.dichtel, willy, edumazet, jakub.kicinski, tyhicks,
	chuck.lever, neilb, linux-fsdevel, linux-kernel, netdev,
	linux-nfs

This patch series fixes an auth_gss bug that leaks netns refcounts when
use-gss-proxy is set to 1.

The problem was found in privileged docker containers with the gssproxy
service enabled and /proc/net/rpc/use-gss-proxy set to 1. After such a
container is killed, the corresponding struct net->count stays at 2, so
the struct net can never be freed.

It turns out that write_gssp() calls gssp_rpc_create() to create an rpc
client, which increases net->count by 2. rpcsec_gss_exit_net() is supposed
to decrease net->count again, but it never gets called because its call
path is:
        net->count==0 -> cleanup_net -> ops_exit_list -> rpcsec_gss_exit_net
net->count cannot reach 0 before rpcsec_gss_exit_net() gets called, and
rpcsec_gss_exit_net() is not called until net->count reaches 0. This is a
deadlock situation.

To fix the problem we must break the deadlock: rpcsec_gss_exit_net() has
to move out of the put() path and be called from somewhere else. I think
nsfs_evict() is a good place: when the netns inode gets evicted, we call
rpcsec_gss_exit_net() to free the rpc client. This requires a new
callback, evict, in struct proc_ns_operations, and a netns_evict() added
to netns_operations as well.

v1->v2:
 * in nsfs_evict(), move ->evict() in front of ->put()

Wenbin Zeng (3):
  nsfs: add evict callback into struct proc_ns_operations
  netns: add netns_evict into netns_operations
  auth_gss: fix deadlock that blocks rpcsec_gss_exit_net when
    use-gss-proxy==1

 fs/nsfs.c                      |  2 ++
 include/linux/proc_ns.h        |  1 +
 include/net/net_namespace.h    |  1 +
 net/core/net_namespace.c       | 12 ++++++++++++
 net/sunrpc/auth_gss/auth_gss.c |  9 ++++++---
 5 files changed, 22 insertions(+), 3 deletions(-)

-- 
1.8.3.1



* [PATCH v2 1/3] nsfs: add evict callback into struct proc_ns_operations
  2019-05-10  6:36 ` [PATCH v2 " Wenbin Zeng
@ 2019-05-10  6:36   ` Wenbin Zeng
  2019-05-10  6:36   ` [PATCH v2 2/3] netns: add netns_evict into netns_operations Wenbin Zeng
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 27+ messages in thread
From: Wenbin Zeng @ 2019-05-10  6:36 UTC (permalink / raw)
  To: bfields, viro, davem
  Cc: jlayton, trond.myklebust, anna.schumaker, wenbinzeng, dsahern,
	nicolas.dichtel, willy, edumazet, jakub.kicinski, tyhicks,
	chuck.lever, neilb, linux-fsdevel, linux-kernel, netdev,
	linux-nfs

The newly added evict callback is called by nsfs_evict(). Currently
nsfs_evict() calls only the put() callback, which cannot release all
netns refcounts. For example, an rpc client holds two netns refcounts
that are supposed to be released when the rpc client is freed, but the
code that frees the rpc client is normally reached from the put()
callback only when the netns refcount drops to 0, specifically:
    refcount=0 -> cleanup_net() -> ops_exit_list -> free rpc client
The netns refcount can never drop to 0 before the rpc client is freed.
To break the deadlock, the code that frees the rpc client can be put
into the newly added evict callback.

Signed-off-by: Wenbin Zeng <wenbinzeng@tencent.com>
---
 fs/nsfs.c               | 2 ++
 include/linux/proc_ns.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/fs/nsfs.c b/fs/nsfs.c
index 60702d6..a122288 100644
--- a/fs/nsfs.c
+++ b/fs/nsfs.c
@@ -48,6 +48,8 @@ static void nsfs_evict(struct inode *inode)
 {
 	struct ns_common *ns = inode->i_private;
 	clear_inode(inode);
+	if (ns->ops->evict)
+		ns->ops->evict(ns);
 	ns->ops->put(ns);
 }
 
diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h
index d31cb62..919f0d4 100644
--- a/include/linux/proc_ns.h
+++ b/include/linux/proc_ns.h
@@ -19,6 +19,7 @@ struct proc_ns_operations {
 	int type;
 	struct ns_common *(*get)(struct task_struct *task);
 	void (*put)(struct ns_common *ns);
+	void (*evict)(struct ns_common *ns);
 	int (*install)(struct nsproxy *nsproxy, struct ns_common *ns);
 	struct user_namespace *(*owner)(struct ns_common *ns);
 	struct ns_common *(*get_parent)(struct ns_common *ns);
-- 
1.8.3.1



* [PATCH v2 2/3] netns: add netns_evict into netns_operations
  2019-05-10  6:36 ` [PATCH v2 " Wenbin Zeng
  2019-05-10  6:36   ` [PATCH v2 1/3] nsfs: add evict callback into struct proc_ns_operations Wenbin Zeng
@ 2019-05-10  6:36   ` Wenbin Zeng
  2019-05-10 22:13     ` David Miller
  2019-05-10  6:36   ` [PATCH v2 3/3] auth_gss: fix deadlock that blocks rpcsec_gss_exit_net when use-gss-proxy==1 Wenbin Zeng
  2019-05-15  1:03   ` [PATCH v2 0/3] auth_gss: netns refcount leaks " J. Bruce Fields
  3 siblings, 1 reply; 27+ messages in thread
From: Wenbin Zeng @ 2019-05-10  6:36 UTC (permalink / raw)
  To: bfields, viro, davem
  Cc: jlayton, trond.myklebust, anna.schumaker, wenbinzeng, dsahern,
	nicolas.dichtel, willy, edumazet, jakub.kicinski, tyhicks,
	chuck.lever, neilb, linux-fsdevel, linux-kernel, netdev,
	linux-nfs

The newly added netns_evict() is called when the netns inode is being
evicted. It provides another path for releasing netns refcounts.
Previously netns_put() was the only choice, but it cannot release all
netns refcounts. For example, an rpc client holds two netns refcounts
that are supposed to be released when the rpc client is freed, but the
code that frees the rpc client is normally reached from the put()
callback only when the netns refcount drops to 0, specifically:
    refcount=0 -> cleanup_net() -> ops_exit_list -> free rpc client
The netns refcount can never drop to 0 before the rpc client is freed.
To break the deadlock, the code that frees the rpc client can be put
into the newly added netns_evict().

Signed-off-by: Wenbin Zeng <wenbinzeng@tencent.com>
---
 include/net/net_namespace.h |  1 +
 net/core/net_namespace.c    | 12 ++++++++++++
 2 files changed, 13 insertions(+)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 12689dd..c44306a 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -357,6 +357,7 @@ struct pernet_operations {
 	int (*init)(struct net *net);
 	void (*exit)(struct net *net);
 	void (*exit_batch)(struct list_head *net_exit_list);
+	void (*evict)(struct net *net);
 	unsigned int *id;
 	size_t size;
 };
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 7e6dcc6..0626fc4 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -1296,6 +1296,17 @@ static void netns_put(struct ns_common *ns)
 	put_net(to_net_ns(ns));
 }
 
+static void netns_evict(struct ns_common *ns)
+{
+	struct net *net = to_net_ns(ns);
+	const struct pernet_operations *ops;
+
+	list_for_each_entry_reverse(ops, &pernet_list, list) {
+		if (ops->evict)
+			ops->evict(net);
+	}
+}
+
 static int netns_install(struct nsproxy *nsproxy, struct ns_common *ns)
 {
 	struct net *net = to_net_ns(ns);
@@ -1319,6 +1330,7 @@ static struct user_namespace *netns_owner(struct ns_common *ns)
 	.type		= CLONE_NEWNET,
 	.get		= netns_get,
 	.put		= netns_put,
+	.evict		= netns_evict,
 	.install	= netns_install,
 	.owner		= netns_owner,
 };
-- 
1.8.3.1



* [PATCH v2 3/3] auth_gss: fix deadlock that blocks rpcsec_gss_exit_net when use-gss-proxy==1
  2019-05-10  6:36 ` [PATCH v2 " Wenbin Zeng
  2019-05-10  6:36   ` [PATCH v2 1/3] nsfs: add evict callback into struct proc_ns_operations Wenbin Zeng
  2019-05-10  6:36   ` [PATCH v2 2/3] netns: add netns_evict into netns_operations Wenbin Zeng
@ 2019-05-10  6:36   ` Wenbin Zeng
  2019-05-15  1:03   ` [PATCH v2 0/3] auth_gss: netns refcount leaks " J. Bruce Fields
  3 siblings, 0 replies; 27+ messages in thread
From: Wenbin Zeng @ 2019-05-10  6:36 UTC (permalink / raw)
  To: bfields, viro, davem
  Cc: jlayton, trond.myklebust, anna.schumaker, wenbinzeng, dsahern,
	nicolas.dichtel, willy, edumazet, jakub.kicinski, tyhicks,
	chuck.lever, neilb, linux-fsdevel, linux-kernel, netdev,
	linux-nfs

When use-gss-proxy is set to 1, write_gssp() creates an rpc client in
gssp_rpc_create(), which increases the netns refcount by 2. These refcounts
are supposed to be released in rpcsec_gss_exit_net(), but that never
happens because rpcsec_gss_exit_net() is triggered only when the netns
refcount drops to 0, specifically:
    refcount=0 -> cleanup_net() -> ops_exit_list -> rpcsec_gss_exit_net
This is a deadlock situation: the refcount will never drop to 0 unless
rpcsec_gss_exit_net() is called.

This fix introduces a new callback, evict, in struct proc_ns_operations,
which is called in nsfs_evict(). Moving rpcsec_gss_exit_net() to the evict
path gives it a chance to be called and avoids the above deadlock.

Signed-off-by: Wenbin Zeng <wenbinzeng@tencent.com>
---
 net/sunrpc/auth_gss/auth_gss.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
index 3fd56c0..3e6bd59 100644
--- a/net/sunrpc/auth_gss/auth_gss.c
+++ b/net/sunrpc/auth_gss/auth_gss.c
@@ -2136,14 +2136,17 @@ static __net_init int rpcsec_gss_init_net(struct net *net)
 	return gss_svc_init_net(net);
 }
 
-static __net_exit void rpcsec_gss_exit_net(struct net *net)
+static void rpcsec_gss_evict_net(struct net *net)
 {
-	gss_svc_shutdown_net(net);
+	struct sunrpc_net *sn = net_generic(net, sunrpc_net_id);
+
+	if (sn->gssp_clnt)
+		gss_svc_shutdown_net(net);
 }
 
 static struct pernet_operations rpcsec_gss_net_ops = {
 	.init = rpcsec_gss_init_net,
-	.exit = rpcsec_gss_exit_net,
+	.evict = rpcsec_gss_evict_net,
 };
 
 /*
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 2/3] netns: add netns_evict into netns_operations
  2019-05-10  6:36   ` [PATCH v2 2/3] netns: add netns_evict into netns_operations Wenbin Zeng
@ 2019-05-10 22:13     ` David Miller
  0 siblings, 0 replies; 27+ messages in thread
From: David Miller @ 2019-05-10 22:13 UTC (permalink / raw)
  To: wenbin.zeng
  Cc: bfields, viro, jlayton, trond.myklebust, anna.schumaker,
	wenbinzeng, dsahern, nicolas.dichtel, willy, edumazet,
	jakub.kicinski, tyhicks, chuck.lever, neilb, linux-fsdevel,
	linux-kernel, netdev, linux-nfs

From: Wenbin Zeng <wenbin.zeng@gmail.com>
Date: Fri, 10 May 2019 14:36:02 +0800

> The newly added netns_evict() is called when the netns inode is being
> evicted. It provides another path for releasing netns refcounts.
> Previously netns_put() was the only choice, but it is not able to release
> all netns refcounts. For example, an rpc client holds two netns refcounts
> that are supposed to be released when the rpc client is freed, but the
> code that frees the rpc client is normally triggered by the put() callback
> only when the netns refcount gets to 0, specifically:
>     refcount=0 -> cleanup_net() -> ops_exit_list -> free rpc client
> But the netns refcount will never get to 0 before the rpc client is freed.
> To break the deadlock, the code that frees the rpc client can be put into
> the newly added netns_evict().
> 
> Signed-off-by: Wenbin Zeng <wenbinzeng@tencent.com>

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/3] auth_gss: netns refcount leaks when use-gss-proxy==1
  2019-05-10  6:36 ` [PATCH v2 " Wenbin Zeng
                     ` (2 preceding siblings ...)
  2019-05-10  6:36   ` [PATCH v2 3/3] auth_gss: fix deadlock that blocks rpcsec_gss_exit_net when use-gss-proxy==1 Wenbin Zeng
@ 2019-05-15  1:03   ` J. Bruce Fields
  2019-06-12  8:37     ` Wenbin Zeng
  3 siblings, 1 reply; 27+ messages in thread
From: J. Bruce Fields @ 2019-05-15  1:03 UTC (permalink / raw)
  To: Wenbin Zeng
  Cc: viro, davem, jlayton, trond.myklebust, anna.schumaker,
	wenbinzeng, dsahern, nicolas.dichtel, willy, edumazet,
	jakub.kicinski, tyhicks, chuck.lever, neilb, linux-fsdevel,
	linux-kernel, netdev, linux-nfs

Whoops, I was slow to test these.  I'm getting failing krb5 nfs
mounts, and the following in the server's logs.  Dropping the three patches
for now.

--b.

[   40.894408] remove_proc_entry: removing non-empty directory 'net/rpc', leaking at least 'use-gss-proxy'
[   40.897352] WARNING: CPU: 2 PID: 31 at fs/proc/generic.c:683 remove_proc_entry+0x17d/0x190
[   40.899373] Modules linked in: nfsd nfs_acl lockd grace auth_rpcgss sunrpc
[   40.901335] CPU: 2 PID: 31 Comm: kworker/u8:1 Not tainted 5.1.0-10733-g4f10d1cb695e #2220
[   40.903759] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180724_192412-buildhw-07.phx2.fedoraproject.org-1.fc29 04/01/2014
[   40.906972] Workqueue: netns cleanup_net
[   40.907828] RIP: 0010:remove_proc_entry+0x17d/0x190
[   40.908904] Code: 52 82 48 85 c0 48 8d 90 48 ff ff ff 48 0f 45 c2 48 8b 93 a8 00 00 00 4c 8b 80 d0 00 00 00 48 8b 92 d0 00 00 00 e8 a7 24 dc ff <0f> 0b e9 52 ff ff ff e8 a7 21 dc ff 0f 1f 80 00 00 00 00 0f 1f 44
[   40.912689] RSP: 0018:ffffc90000123d80 EFLAGS: 00010282
[   40.913495] RAX: 0000000000000000 RBX: ffff888079f96e40 RCX: 0000000000000000
[   40.914747] RDX: ffff88807fd24e80 RSI: ffff88807fd165b8 RDI: 00000000ffffffff
[   40.916107] RBP: ffff888079f96ef0 R08: 0000000000000000 R09: 0000000000000000
[   40.917253] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88807cd76d68
[   40.918508] R13: ffffffffa0057000 R14: ffff8880683db200 R15: ffffffff82970240
[   40.919642] FS:  0000000000000000(0000) GS:ffff88807fd00000(0000) knlGS:0000000000000000
[   40.920956] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   40.921867] CR2: 00007f9d70010cb8 CR3: 000000007cc5c006 CR4: 00000000001606e0
[   40.923044] Call Trace:
[   40.923364]  sunrpc_exit_net+0xcc/0x190 [sunrpc]
[   40.924069]  ops_exit_list.isra.0+0x36/0x70
[   40.924713]  cleanup_net+0x1cb/0x2c0
[   40.925182]  process_one_work+0x219/0x620
[   40.925780]  worker_thread+0x3c/0x390
[   40.926312]  ? process_one_work+0x620/0x620
[   40.927015]  kthread+0x11d/0x140
[   40.927430]  ? kthread_park+0x80/0x80
[   40.927822]  ret_from_fork+0x3a/0x50
[   40.928281] irq event stamp: 11688
[   40.928780] hardirqs last  enabled at (11687): [<ffffffff811225fe>] console_unlock+0x41e/0x590
[   40.930319] hardirqs last disabled at (11688): [<ffffffff81001b2c>] trace_hardirqs_off_thunk+0x1a/0x1c
[   40.932123] softirqs last  enabled at (11684): [<ffffffff820002c5>] __do_softirq+0x2c5/0x4c5
[   40.933657] softirqs last disabled at (11673): [<ffffffff810bf970>] irq_exit+0x80/0x90


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/3] auth_gss: netns refcount leaks when use-gss-proxy==1
  2019-05-15  1:03   ` [PATCH v2 0/3] auth_gss: netns refcount leaks " J. Bruce Fields
@ 2019-06-12  8:37     ` Wenbin Zeng
  2019-06-12 15:52       ` J. Bruce Fields
  0 siblings, 1 reply; 27+ messages in thread
From: Wenbin Zeng @ 2019-06-12  8:37 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: viro, davem, jlayton, trond.myklebust, anna.schumaker,
	wenbinzeng, dsahern, nicolas.dichtel, willy, edumazet,
	jakub.kicinski, tyhicks, chuck.lever, neilb, linux-fsdevel,
	linux-kernel, netdev, linux-nfs

On Tue, May 14, 2019 at 09:03:31PM -0400, J. Bruce Fields wrote:
> Whoops, I was slow to test these.  I'm getting failing krb5 nfs
> mounts, and the following in the server's logs.  Dropping the three patches
> for now.
My bad, I should have found it earlier. Thank you for testing it, Bruce.

I figured it out. The problem you saw is due to the following code: the
if-condition is incorrect because sn->gssp_clnt == NULL doesn't mean that
'use-gss-proxy' does not exist:

-static __net_exit void rpcsec_gss_exit_net(struct net *net)
+static void rpcsec_gss_evict_net(struct net *net)
 {
-       gss_svc_shutdown_net(net);
+       struct sunrpc_net *sn = net_generic(net, sunrpc_net_id);
+
+       if (sn->gssp_clnt)
+               gss_svc_shutdown_net(net);
 }

Simply using the original logic in rpcsec_gss_exit_net() should be fine,
i.e.:

-static __net_exit void rpcsec_gss_exit_net(struct net *net)
+static void rpcsec_gss_evict_net(struct net *net)
 {
        gss_svc_shutdown_net(net);
 }
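
The difference between the guarded and unguarded variants can be sketched
with a small user-space model. This is an illustration under assumptions,
not the kernel implementation: every gp_* name is hypothetical, and the
model assumes that the 'use-gss-proxy' proc entry is created when the
namespace is initialised, while the rpc client only appears once 1 has
been written to that file.

```c
#include <stdbool.h>

/* Toy stand-in for the per-netns state (hypothetical, not struct sunrpc_net). */
struct gp_net {
	bool proc_entry;	/* models /proc/net/rpc/use-gss-proxy */
	bool gssp_clnt;		/* models sn->gssp_clnt */
};

/* Models namespace init: the proc entry exists from the start. */
static void gp_init(struct gp_net *sn)
{
	sn->proc_entry = true;
	sn->gssp_clnt = false;
}

/* Models writing 1 to use-gss-proxy: only now is the client created. */
static void gp_write_one(struct gp_net *sn)
{
	sn->gssp_clnt = true;
}

/* Models the shutdown step: removes the proc entry and the client. */
static void gp_shutdown(struct gp_net *sn)
{
	sn->proc_entry = false;
	sn->gssp_clnt = false;
}

/* v2-style hook: shutdown is skipped when no client was ever created. */
static void gp_evict_guarded(struct gp_net *sn)
{
	if (sn->gssp_clnt)
		gp_shutdown(sn);
}

/* v3-style hook: shutdown always runs. */
static void gp_evict_unguarded(struct gp_net *sn)
{
	gp_shutdown(sn);
}

/* Returns true if the proc entry is still present after the evict hook. */
static bool gp_leaks_proc_entry(bool proxy_enabled, bool guarded)
{
	struct gp_net sn;

	gp_init(&sn);
	if (proxy_enabled)
		gp_write_one(&sn);
	if (guarded)
		gp_evict_guarded(&sn);
	else
		gp_evict_unguarded(&sn);
	return sn.proc_entry;
}
```

In this model only the combination "proxy never enabled, guarded hook"
leaves the entry behind, which is consistent with the "leaking at least
'use-gss-proxy'" warning quoted below.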

I'm going to submit v3 soon.

Wenbin.

> 
> --b.
> 
> [   40.894408] remove_proc_entry: removing non-empty directory 'net/rpc', leaking at least 'use-gss-proxy'
> [   40.897352] WARNING: CPU: 2 PID: 31 at fs/proc/generic.c:683 remove_proc_entry+0x17d/0x190
> [   40.899373] Modules linked in: nfsd nfs_acl lockd grace auth_rpcgss sunrpc
> [   40.901335] CPU: 2 PID: 31 Comm: kworker/u8:1 Not tainted 5.1.0-10733-g4f10d1cb695e #2220
> [   40.903759] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180724_192412-buildhw-07.phx2.fedoraproject.org-1.fc29 04/01/2014
> [   40.906972] Workqueue: netns cleanup_net
> [   40.907828] RIP: 0010:remove_proc_entry+0x17d/0x190
> [   40.908904] Code: 52 82 48 85 c0 48 8d 90 48 ff ff ff 48 0f 45 c2 48 8b 93 a8 00 00 00 4c 8b 80 d0 00 00 00 48 8b 92 d0 00 00 00 e8 a7 24 dc ff <0f> 0b e9 52 ff ff ff e8 a7 21 dc ff 0f 1f 80 00 00 00 00 0f 1f 44
> [   40.912689] RSP: 0018:ffffc90000123d80 EFLAGS: 00010282
> [   40.913495] RAX: 0000000000000000 RBX: ffff888079f96e40 RCX: 0000000000000000
> [   40.914747] RDX: ffff88807fd24e80 RSI: ffff88807fd165b8 RDI: 00000000ffffffff
> [   40.916107] RBP: ffff888079f96ef0 R08: 0000000000000000 R09: 0000000000000000
> [   40.917253] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88807cd76d68
> [   40.918508] R13: ffffffffa0057000 R14: ffff8880683db200 R15: ffffffff82970240
> [   40.919642] FS:  0000000000000000(0000) GS:ffff88807fd00000(0000) knlGS:0000000000000000
> [   40.920956] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   40.921867] CR2: 00007f9d70010cb8 CR3: 000000007cc5c006 CR4: 00000000001606e0
> [   40.923044] Call Trace:
> [   40.923364]  sunrpc_exit_net+0xcc/0x190 [sunrpc]
> [   40.924069]  ops_exit_list.isra.0+0x36/0x70
> [   40.924713]  cleanup_net+0x1cb/0x2c0
> [   40.925182]  process_one_work+0x219/0x620
> [   40.925780]  worker_thread+0x3c/0x390
> [   40.926312]  ? process_one_work+0x620/0x620
> [   40.927015]  kthread+0x11d/0x140
> [   40.927430]  ? kthread_park+0x80/0x80
> [   40.927822]  ret_from_fork+0x3a/0x50
> [   40.928281] irq event stamp: 11688
> [   40.928780] hardirqs last  enabled at (11687): [<ffffffff811225fe>] console_unlock+0x41e/0x590
> [   40.930319] hardirqs last disabled at (11688): [<ffffffff81001b2c>] trace_hardirqs_off_thunk+0x1a/0x1c
> [   40.932123] softirqs last  enabled at (11684): [<ffffffff820002c5>] __do_softirq+0x2c5/0x4c5
> [   40.933657] softirqs last disabled at (11673): [<ffffffff810bf970>] irq_exit+0x80/0x90
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v3 0/3] auth_gss: netns refcount leaks when use-gss-proxy==1
  2019-05-01  6:42 [PATCH 0/3] auth_gss: netns refcount leaks when use-gss-proxy==1 Wenbin Zeng
                   ` (4 preceding siblings ...)
  2019-05-10  6:36 ` [PATCH v2 " Wenbin Zeng
@ 2019-06-12 12:09 ` Wenbin Zeng
  2019-06-12 12:09   ` [PATCH v3 1/3] nsfs: add evict callback into struct proc_ns_operations Wenbin Zeng
                     ` (3 more replies)
  5 siblings, 4 replies; 27+ messages in thread
From: Wenbin Zeng @ 2019-06-12 12:09 UTC (permalink / raw)
  To: bfields, davem, viro
  Cc: jlayton, trond.myklebust, anna.schumaker, wenbinzeng, dsahern,
	nicolas.dichtel, willy, edumazet, jakub.kicinski, tyhicks,
	chuck.lever, neilb, linux-fsdevel, linux-kernel, netdev,
	linux-nfs

This patch series fixes an auth_gss bug that results in netns refcount
leaks when use-gss-proxy is set to 1.

The problem was found in privileged docker containers with the gssproxy
service enabled and /proc/net/rpc/use-gss-proxy set to 1. The corresponding
struct net->count ends up at 2 after the container gets killed, and as a
consequence the struct net cannot be freed.

It turns out that write_gssp() calls gssp_rpc_create() to create an rpc
client, which increases net->count by 2. rpcsec_gss_exit_net() is supposed
to decrease net->count, but it never gets called because its call path is:
        net->count==0 -> cleanup_net -> ops_exit_list -> rpcsec_gss_exit_net
net->count cannot reach 0 before rpcsec_gss_exit_net() gets called, so this
is a deadlock.

To fix the problem we must break the deadlock: rpcsec_gss_exit_net() has
to move out of the put() path and get called from somewhere else. I think
nsfs_evict() is a good place: when the netns inode gets evicted, we call
rpcsec_gss_exit_net() to free the rpc client. This requires adding a new
callback, evict, to struct proc_ns_operations, and adding netns_evict() as
one of the netns_operations as well.
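
The reference cycle and the effect of running the shutdown step before the
final put can be modelled in user space. The following is only a minimal
sketch, not kernel code: struct toy_net, toy_shutdown(), toy_put() and
toy_teardown() are hypothetical names invented for illustration, and the
starting count of 1 stands for a single remaining external reference
(assumed here to be the one dropped from nsfs_evict()).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct net: just the refcount and the state of interest. */
struct toy_net {
	int count;		/* models net->count */
	bool has_gssp_clnt;	/* models the rpc client made by write_gssp() */
	bool cleaned_up;	/* set once the cleanup_net() analogue has run */
};

/* Models the shutdown step: free the rpc client and drop its two refs. */
static void toy_shutdown(struct toy_net *net)
{
	if (net->has_gssp_clnt) {
		assert(net->count >= 2);
		net->has_gssp_clnt = false;
		net->count -= 2;
	}
}

/* Models the put path: cleanup (and any exit hook) runs only at zero. */
static void toy_put(struct toy_net *net, bool shutdown_in_exit)
{
	if (--net->count != 0)
		return;
	if (shutdown_in_exit)
		toy_shutdown(net);
	net->cleaned_up = true;
}

/*
 * Drops the last external reference and returns the final refcount.
 * With use_evict set, the shutdown step runs before the put, as in the
 * ->evict() then ->put() ordering of this series.
 */
static int toy_teardown(bool use_gss_proxy, bool use_evict, bool *cleaned_up)
{
	struct toy_net net = { .count = 1 };

	if (use_gss_proxy) {
		net.has_gssp_clnt = true;
		net.count += 2;
	}
	if (use_evict)
		toy_shutdown(&net);
	toy_put(&net, !use_evict);
	*cleaned_up = net.cleaned_up;
	return net.count;
}
```

In this model, toy_teardown(true, false, ...) ends with a count of 2 and
no cleanup, matching the leak described above, while running the shutdown
step first lets the count reach 0.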

v1->v2:
 * in nsfs_evict(), move ->evict() in front of ->put()
v2->v3:
 * rpcsec_gss_evict_net() calls gss_svc_shutdown_net() directly, regardless
   of whether gssp_clnt is NULL; this is exactly what rpcsec_gss_exit_net()
   previously did

Wenbin Zeng (3):
  nsfs: add evict callback into struct proc_ns_operations
  netns: add netns_evict into netns_operations
  auth_gss: fix deadlock that blocks rpcsec_gss_exit_net when
    use-gss-proxy==1

 fs/nsfs.c                      |  2 ++
 include/linux/proc_ns.h        |  1 +
 include/net/net_namespace.h    |  1 +
 net/core/net_namespace.c       | 12 ++++++++++++
 net/sunrpc/auth_gss/auth_gss.c |  4 ++--
 5 files changed, 18 insertions(+), 2 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v3 1/3] nsfs: add evict callback into struct proc_ns_operations
  2019-06-12 12:09 ` [PATCH v3 " Wenbin Zeng
@ 2019-06-12 12:09   ` Wenbin Zeng
  2019-06-12 12:09   ` [PATCH v3 2/3] netns: add netns_evict into netns_operations Wenbin Zeng
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 27+ messages in thread
From: Wenbin Zeng @ 2019-06-12 12:09 UTC (permalink / raw)
  To: bfields, davem, viro
  Cc: jlayton, trond.myklebust, anna.schumaker, wenbinzeng, dsahern,
	nicolas.dichtel, willy, edumazet, jakub.kicinski, tyhicks,
	chuck.lever, neilb, linux-fsdevel, linux-kernel, netdev,
	linux-nfs

The newly added evict callback is called by nsfs_evict(). Currently only
the put() callback is called in nsfs_evict(), and it is not able to release
all netns refcounts. For example, an rpc client holds two netns refcounts
that are supposed to be released when the rpc client is freed, but the
code that frees the rpc client is normally triggered by the put() callback
only when the netns refcount gets to 0, specifically:
    refcount=0 -> cleanup_net() -> ops_exit_list -> free rpc client
But the netns refcount will never get to 0 before the rpc client is freed.
To break the deadlock, the code that frees the rpc client can be put into
the newly added evict callback.

Signed-off-by: Wenbin Zeng <wenbinzeng@tencent.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/nsfs.c               | 2 ++
 include/linux/proc_ns.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/fs/nsfs.c b/fs/nsfs.c
index 60702d6..a122288 100644
--- a/fs/nsfs.c
+++ b/fs/nsfs.c
@@ -48,6 +48,8 @@ static void nsfs_evict(struct inode *inode)
 {
 	struct ns_common *ns = inode->i_private;
 	clear_inode(inode);
+	if (ns->ops->evict)
+		ns->ops->evict(ns);
 	ns->ops->put(ns);
 }
 
diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h
index d31cb62..919f0d4 100644
--- a/include/linux/proc_ns.h
+++ b/include/linux/proc_ns.h
@@ -19,6 +19,7 @@ struct proc_ns_operations {
 	int type;
 	struct ns_common *(*get)(struct task_struct *task);
 	void (*put)(struct ns_common *ns);
+	void (*evict)(struct ns_common *ns);
 	int (*install)(struct nsproxy *nsproxy, struct ns_common *ns);
 	struct user_namespace *(*owner)(struct ns_common *ns);
 	struct ns_common *(*get_parent)(struct ns_common *ns);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 2/3] netns: add netns_evict into netns_operations
  2019-06-12 12:09 ` [PATCH v3 " Wenbin Zeng
  2019-06-12 12:09   ` [PATCH v3 1/3] nsfs: add evict callback into struct proc_ns_operations Wenbin Zeng
@ 2019-06-12 12:09   ` Wenbin Zeng
  2019-06-12 12:09   ` [PATCH v3 3/3] auth_gss: fix deadlock that blocks rpcsec_gss_exit_net when use-gss-proxy==1 Wenbin Zeng
  2019-08-01 19:53   ` [PATCH v3 0/3] auth_gss: netns refcount leaks " J. Bruce Fields
  3 siblings, 0 replies; 27+ messages in thread
From: Wenbin Zeng @ 2019-06-12 12:09 UTC (permalink / raw)
  To: bfields, davem, viro
  Cc: jlayton, trond.myklebust, anna.schumaker, wenbinzeng, dsahern,
	nicolas.dichtel, willy, edumazet, jakub.kicinski, tyhicks,
	chuck.lever, neilb, linux-fsdevel, linux-kernel, netdev,
	linux-nfs

The newly added netns_evict() is called when the netns inode is being
evicted. It provides another path for releasing netns refcounts.
Previously netns_put() was the only choice, but it is not able to release
all netns refcounts. For example, an rpc client holds two netns refcounts
that are supposed to be released when the rpc client is freed, but the
code that frees the rpc client is normally triggered by the put() callback
only when the netns refcount gets to 0, specifically:
    refcount=0 -> cleanup_net() -> ops_exit_list -> free rpc client
But the netns refcount will never get to 0 before the rpc client is freed.
To break the deadlock, the code that frees the rpc client can be put into
the newly added netns_evict().

Signed-off-by: Wenbin Zeng <wenbinzeng@tencent.com>
Acked-by: David S. Miller <davem@davemloft.net>
---
 include/net/net_namespace.h |  1 +
 net/core/net_namespace.c    | 12 ++++++++++++
 2 files changed, 13 insertions(+)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 12689dd..c44306a 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -357,6 +357,7 @@ struct pernet_operations {
 	int (*init)(struct net *net);
 	void (*exit)(struct net *net);
 	void (*exit_batch)(struct list_head *net_exit_list);
+	void (*evict)(struct net *net);
 	unsigned int *id;
 	size_t size;
 };
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 7e6dcc6..0626fc4 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -1296,6 +1296,17 @@ static void netns_put(struct ns_common *ns)
 	put_net(to_net_ns(ns));
 }
 
+static void netns_evict(struct ns_common *ns)
+{
+	struct net *net = to_net_ns(ns);
+	const struct pernet_operations *ops;
+
+	list_for_each_entry_reverse(ops, &pernet_list, list) {
+		if (ops->evict)
+			ops->evict(net);
+	}
+}
+
 static int netns_install(struct nsproxy *nsproxy, struct ns_common *ns)
 {
 	struct net *net = to_net_ns(ns);
@@ -1319,6 +1330,7 @@ static struct user_namespace *netns_owner(struct ns_common *ns)
 	.type		= CLONE_NEWNET,
 	.get		= netns_get,
 	.put		= netns_put,
+	.evict		= netns_evict,
 	.install	= netns_install,
 	.owner		= netns_owner,
 };
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 3/3] auth_gss: fix deadlock that blocks rpcsec_gss_exit_net when use-gss-proxy==1
  2019-06-12 12:09 ` [PATCH v3 " Wenbin Zeng
  2019-06-12 12:09   ` [PATCH v3 1/3] nsfs: add evict callback into struct proc_ns_operations Wenbin Zeng
  2019-06-12 12:09   ` [PATCH v3 2/3] netns: add netns_evict into netns_operations Wenbin Zeng
@ 2019-06-12 12:09   ` Wenbin Zeng
  2019-08-01 19:53   ` [PATCH v3 0/3] auth_gss: netns refcount leaks " J. Bruce Fields
  3 siblings, 0 replies; 27+ messages in thread
From: Wenbin Zeng @ 2019-06-12 12:09 UTC (permalink / raw)
  To: bfields, davem, viro
  Cc: jlayton, trond.myklebust, anna.schumaker, wenbinzeng, dsahern,
	nicolas.dichtel, willy, edumazet, jakub.kicinski, tyhicks,
	chuck.lever, neilb, linux-fsdevel, linux-kernel, netdev,
	linux-nfs, J. Bruce Fields

When use-gss-proxy is set to 1, write_gssp() creates an rpc client in
gssp_rpc_create(), which increases the netns refcount by 2. These
refcounts are supposed to be released in rpcsec_gss_exit_net(), but that
never happens because rpcsec_gss_exit_net() is triggered only when the
netns refcount gets to 0, specifically:
    refcount=0 -> cleanup_net() -> ops_exit_list -> rpcsec_gss_exit_net
This is a deadlock: the refcount will never get to 0 unless
rpcsec_gss_exit_net() is called.

This series introduces a new callback, evict, in struct proc_ns_operations,
which is called in nsfs_evict(). Moving rpcsec_gss_exit_net() to the evict
path gives it a chance to get called and avoids the above deadlock.

Signed-off-by: Wenbin Zeng <wenbinzeng@tencent.com>
Cc: J. Bruce Fields <bfields@redhat.com>
---
 net/sunrpc/auth_gss/auth_gss.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
index 3fd56c0..3e76c8a 100644
--- a/net/sunrpc/auth_gss/auth_gss.c
+++ b/net/sunrpc/auth_gss/auth_gss.c
@@ -2136,14 +2136,14 @@ static __net_init int rpcsec_gss_init_net(struct net *net)
 	return gss_svc_init_net(net);
 }
 
-static __net_exit void rpcsec_gss_exit_net(struct net *net)
+static void rpcsec_gss_evict_net(struct net *net)
 {
 	gss_svc_shutdown_net(net);
 }
 
 static struct pernet_operations rpcsec_gss_net_ops = {
 	.init = rpcsec_gss_init_net,
-	.exit = rpcsec_gss_exit_net,
+	.evict = rpcsec_gss_evict_net,
 };
 
 /*
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/3] auth_gss: netns refcount leaks when use-gss-proxy==1
  2019-06-12  8:37     ` Wenbin Zeng
@ 2019-06-12 15:52       ` J. Bruce Fields
  2021-09-07 14:48         ` wanghai (M)
  0 siblings, 1 reply; 27+ messages in thread
From: J. Bruce Fields @ 2019-06-12 15:52 UTC (permalink / raw)
  To: Wenbin Zeng
  Cc: viro, davem, jlayton, trond.myklebust, anna.schumaker,
	wenbinzeng, dsahern, nicolas.dichtel, willy, edumazet,
	jakub.kicinski, tyhicks, chuck.lever, neilb, linux-fsdevel,
	linux-kernel, netdev, linux-nfs

On Wed, Jun 12, 2019 at 04:37:55PM +0800, Wenbin Zeng wrote:
> On Tue, May 14, 2019 at 09:03:31PM -0400, J. Bruce Fields wrote:
> > Whoops, I was slow to test these.  I'm getting failing krb5 nfs
> > mounts, and the following in the server's logs.  Dropping the three patches
> > for now.
> My bad, I should have found it earlier. Thank you for testing it, Bruce.
> 
> I figured it out. The problem you saw is due to the following code: the
> if-condition is incorrect because sn->gssp_clnt == NULL doesn't mean that
> 'use-gss-proxy' does not exist:

Thanks, but with the new patches I see the following.  I haven't tried
to investigate.

--b.

[ 2908.134813] ------------[ cut here ]------------
[ 2908.135732] name 'use-gss-proxy'
[ 2908.136276] WARNING: CPU: 2 PID: 15032 at fs/proc/generic.c:673 remove_proc_entry+0x124/0x190
[ 2908.138144] Modules linked in: nfsv4 rpcsec_gss_krb5 nfsv3 nfs_acl nfs lockd grace auth_rpcgss sunrpc
[ 2908.140183] CPU: 2 PID: 15032 Comm: (coredump) Not tainted 5.2.0-rc2-00441-gaef575f54640 #2257
[ 2908.142062] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
[ 2908.143756] RIP: 0010:remove_proc_entry+0x124/0x190
[ 2908.144519] Code: c3 48 c7 c7 60 24 8b 82 e8 29 16 a5 00 eb d5 48 c7 c7 60 24 8b 82 e8 1b 16 a5 00 4c 89 e6 48 c7 c7 ec 4c 52 82 e8 50 fd db ff <0f> 0b eb b6 48 8b 04 24 83 a8 90 00 00 00 01 e9 78 ff ff ff 4c 89
[ 2908.148138] RSP: 0018:ffffc900047bbdb0 EFLAGS: 00010282
[ 2908.148945] RAX: 0000000000000000 RBX: ffff888036060580 RCX: 0000000000000000
[ 2908.150139] RDX: ffff88807fd24e80 RSI: ffff88807fd165b8 RDI: 00000000ffffffff
[ 2908.151334] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 2908.152564] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa00adb1b
[ 2908.153816] R13: 00007ffc8bda5d30 R14: 0000000000000000 R15: ffff88805e2873a8
[ 2908.155007] FS:  00007f470bc27e40(0000) GS:ffff88807fd00000(0000) knlGS:0000000000000000
[ 2908.156421] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2908.157333] CR2: 0000562b07764c58 CR3: 000000005e8ea001 CR4: 00000000001606e0
[ 2908.158529] Call Trace:
[ 2908.158796]  destroy_use_gss_proxy_proc_entry+0xb7/0x150 [auth_rpcgss]
[ 2908.159966]  gss_svc_shutdown_net+0x11/0x170 [auth_rpcgss]
[ 2908.160830]  netns_evict+0x2f/0x40
[ 2908.161266]  nsfs_evict+0x27/0x40
[ 2908.161685]  evict+0xd0/0x1a0
[ 2908.162035]  __dentry_kill+0xdf/0x180
[ 2908.162520]  dentry_kill+0x50/0x1c0
[ 2908.163005]  ? dput+0x1c/0x2b0
[ 2908.163369]  dput+0x260/0x2b0
[ 2908.163739]  path_put+0x12/0x20
[ 2908.164155]  do_faccessat+0x17c/0x240
[ 2908.164643]  do_syscall_64+0x50/0x1c0
[ 2908.165170]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 2908.165959] RIP: 0033:0x7f47098e2157
[ 2908.166445] Code: 77 01 c3 48 8b 15 69 dd 2c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 15 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 39 dd 2c 00 f7 d8 64 89 02 b8
[ 2908.169994] RSP: 002b:00007ffc8bda5d28 EFLAGS: 00000246 ORIG_RAX: 0000000000000015
[ 2908.171315] RAX: ffffffffffffffda RBX: 0000562b0774d979 RCX: 00007f47098e2157
[ 2908.172563] RDX: 00007ffc8bda5d3e RSI: 0000000000000000 RDI: 00007ffc8bda5d30
[ 2908.173753] RBP: 00007ffc8bda5d70 R08: 0000000000000000 R09: 0000562b07d0b130
[ 2908.174943] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffc8bda5d30
[ 2908.176163] R13: 0000562b07b34c80 R14: 0000562b07b35120 R15: 0000000000000000
[ 2908.177395] irq event stamp: 4256
[ 2908.177835] hardirqs last  enabled at (4255): [<ffffffff811221ee>] console_unlock+0x41e/0x590
[ 2908.179378] hardirqs last disabled at (4256): [<ffffffff81001b2f>] trace_hardirqs_off_thunk+0x1a/0x1c
[ 2908.181031] softirqs last  enabled at (4252): [<ffffffff820002be>] __do_softirq+0x2be/0x4aa
[ 2908.182458] softirqs last disabled at (4233): [<ffffffff810bf8e0>] irq_exit+0x80/0x90
[ 2908.183869] ---[ end trace d88132b63efc09d8 ]---
[ 2908.184620] BUG: kernel NULL pointer dereference, address: 0000000000000030
[ 2908.185829] #PF: supervisor read access in kernel mode
[ 2908.186924] #PF: error_code(0x0000) - not-present page
[ 2908.187887] PGD 0 P4D 0 
[ 2908.188318] Oops: 0000 [#1] PREEMPT SMP PTI
[ 2908.189254] CPU: 2 PID: 15032 Comm: (coredump) Tainted: G        W         5.2.0-rc2-00441-gaef575f54640 #2257
[ 2908.192506] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
[ 2908.195137] RIP: 0010:__lock_acquire+0x3d2/0x1d90
[ 2908.196414] Code: db 48 8b 84 24 88 00 00 00 65 48 33 04 25 28 00 00 00 0f 85 be 10 00 00 48 8d 65 d8 44 89 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <48> 81 3f 60 0d 01 83 41 bb 00 00 00 00 45 0f 45 d8 83 fe 01 0f 87
[ 2908.202720] RSP: 0018:ffffc900047bbc80 EFLAGS: 00010002
[ 2908.204165] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[ 2908.206125] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000030
[ 2908.208203] RBP: ffffc900047bbd40 R08: 0000000000000001 R09: 0000000000000000
[ 2908.210219] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88807ad91500
[ 2908.211386] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000282
[ 2908.212532] FS:  00007f470bc27e40(0000) GS:ffff88807fd00000(0000) knlGS:0000000000000000
[ 2908.213647] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2908.214400] CR2: 0000000000000030 CR3: 000000005e8ea001 CR4: 00000000001606e0
[ 2908.215393] Call Trace:
[ 2908.215589]  ? __lock_acquire+0x255/0x1d90
[ 2908.216071]  ? clear_gssp_clnt+0x1b/0x50 [auth_rpcgss]
[ 2908.216720]  ? __mutex_lock+0x99/0x920
[ 2908.217114]  lock_acquire+0x95/0x1b0
[ 2908.217484]  ? cache_purge+0x1c/0x110 [sunrpc]
[ 2908.218000]  _raw_spin_lock+0x2f/0x40
[ 2908.218370]  ? cache_purge+0x1c/0x110 [sunrpc]
[ 2908.218882]  cache_purge+0x1c/0x110 [sunrpc]
[ 2908.219346]  gss_svc_shutdown_net+0xb8/0x170 [auth_rpcgss]
[ 2908.220104]  netns_evict+0x2f/0x40
[ 2908.220439]  nsfs_evict+0x27/0x40
[ 2908.220786]  evict+0xd0/0x1a0
[ 2908.221050]  __dentry_kill+0xdf/0x180
[ 2908.221458]  dentry_kill+0x50/0x1c0
[ 2908.221842]  ? dput+0x1c/0x2b0
[ 2908.222126]  dput+0x260/0x2b0
[ 2908.222384]  path_put+0x12/0x20
[ 2908.222753]  do_faccessat+0x17c/0x240
[ 2908.223125]  do_syscall_64+0x50/0x1c0
[ 2908.223479]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 2908.224152] RIP: 0033:0x7f47098e2157
[ 2908.224566] Code: 77 01 c3 48 8b 15 69 dd 2c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 15 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 39 dd 2c 00 f7 d8 64 89 02 b8
[ 2908.228198] RSP: 002b:00007ffc8bda5d28 EFLAGS: 00000246 ORIG_RAX: 0000000000000015
[ 2908.229496] RAX: ffffffffffffffda RBX: 0000562b0774d979 RCX: 00007f47098e2157
[ 2908.230938] RDX: 00007ffc8bda5d3e RSI: 0000000000000000 RDI: 00007ffc8bda5d30
[ 2908.232182] RBP: 00007ffc8bda5d70 R08: 0000000000000000 R09: 0000562b07d0b130
[ 2908.233481] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffc8bda5d30
[ 2908.234750] R13: 0000562b07b34c80 R14: 0000562b07b35120 R15: 0000000000000000
[ 2908.236068] Modules linked in: nfsv4 rpcsec_gss_krb5 nfsv3 nfs_acl nfs lockd grace auth_rpcgss sunrpc
[ 2908.237861] CR2: 0000000000000030
[ 2908.238277] ---[ end trace d88132b63efc09d9 ]---
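
Both call traces above enter gss_svc_shutdown_net() through nsfs_evict()
during an ordinary faccessat() path walk, and the first warning comes from
remove_proc_entry() and names 'use-gss-proxy'. One possible reading, which
nobody in this thread confirms, is that the evict hook ran for a namespace
whose state had already been torn down (or ran more than once), and that
the shutdown step is not written to be repeated. The sketch below only
models that hypothesis in user space; struct ev_net, ev_shutdown() and
ev_warnings_after() are made-up names, not kernel symbols.

```c
#include <stdbool.h>

/* Toy per-namespace state (hypothetical). */
struct ev_net {
	bool proc_entry;	/* models the 'use-gss-proxy' proc entry */
	int warnings;		/* counts "entry not found" style warnings */
};

/* A shutdown step that assumes it runs exactly once per namespace. */
static void ev_shutdown(struct ev_net *net)
{
	if (!net->proc_entry) {
		/* Second and later calls find nothing left to remove. */
		net->warnings++;
		return;
	}
	net->proc_entry = false;
}

/* Runs the evict hook the given number of times; returns the warning count. */
static int ev_warnings_after(int evictions)
{
	struct ev_net net = { .proc_entry = true, .warnings = 0 };
	int i;

	for (i = 0; i < evictions; i++)
		ev_shutdown(&net);
	return net.warnings;
}
```

Under this assumption a single eviction is clean, and every further one
trips over state that is already gone.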

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 0/3] auth_gss: netns refcount leaks when use-gss-proxy==1
  2019-06-12 12:09 ` [PATCH v3 " Wenbin Zeng
                     ` (2 preceding siblings ...)
  2019-06-12 12:09   ` [PATCH v3 3/3] auth_gss: fix deadlock that blocks rpcsec_gss_exit_net when use-gss-proxy==1 Wenbin Zeng
@ 2019-08-01 19:53   ` J. Bruce Fields
  2021-08-28 11:26     ` wanghai (M)
  3 siblings, 1 reply; 27+ messages in thread
From: J. Bruce Fields @ 2019-08-01 19:53 UTC (permalink / raw)
  To: Wenbin Zeng
  Cc: davem, viro, jlayton, trond.myklebust, anna.schumaker,
	wenbinzeng, dsahern, nicolas.dichtel, willy, edumazet,
	jakub.kicinski, tyhicks, chuck.lever, neilb, linux-fsdevel,
	linux-kernel, netdev, linux-nfs

I lost track, what happened to these patches?

--b.

On Wed, Jun 12, 2019 at 08:09:27PM +0800, Wenbin Zeng wrote:
> This patch series fixes an auth_gss bug that results in netns refcount
> leaks when use-gss-proxy is set to 1.
> 
> The problem was found in privileged docker containers with the gssproxy
> service enabled and /proc/net/rpc/use-gss-proxy set to 1. The corresponding
> struct net->count ends up at 2 after the container gets killed, and as a
> consequence the struct net cannot be freed.
> 
> It turns out that write_gssp() calls gssp_rpc_create() to create an rpc
> client, which increases net->count by 2. rpcsec_gss_exit_net() is supposed
> to decrease net->count, but it never gets called because its call path is:
>         net->count==0 -> cleanup_net -> ops_exit_list -> rpcsec_gss_exit_net
> net->count cannot reach 0 before rpcsec_gss_exit_net() gets called, so this
> is a deadlock.
> 
> To fix the problem we must break the deadlock: rpcsec_gss_exit_net() has
> to move out of the put() path and get called from somewhere else. I think
> nsfs_evict() is a good place: when the netns inode gets evicted, we call
> rpcsec_gss_exit_net() to free the rpc client. This requires adding a new
> callback, evict, to struct proc_ns_operations, and adding netns_evict() as
> one of the netns_operations as well.
> 
> v1->v2:
>  * in nsfs_evict(), move ->evict() in front of ->put()
> v2->v3:
>  * rpcsec_gss_evict_net() calls gss_svc_shutdown_net() directly, regardless
>    of whether gssp_clnt is NULL; this is exactly what rpcsec_gss_exit_net()
>    previously did
> 
> Wenbin Zeng (3):
>   nsfs: add evict callback into struct proc_ns_operations
>   netns: add netns_evict into netns_operations
>   auth_gss: fix deadlock that blocks rpcsec_gss_exit_net when
>     use-gss-proxy==1
> 
>  fs/nsfs.c                      |  2 ++
>  include/linux/proc_ns.h        |  1 +
>  include/net/net_namespace.h    |  1 +
>  net/core/net_namespace.c       | 12 ++++++++++++
>  net/sunrpc/auth_gss/auth_gss.c |  4 ++--
>  5 files changed, 18 insertions(+), 2 deletions(-)
> 
> -- 
> 1.8.3.1

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Re: [PATCH v3 0/3] auth_gss: netns refcount leaks when use-gss-proxy==1
  2019-08-01 19:53   ` [PATCH v3 0/3] auth_gss: netns refcount leaks " J. Bruce Fields
@ 2021-08-28 11:26     ` wanghai (M)
  0 siblings, 0 replies; 27+ messages in thread
From: wanghai (M) @ 2021-08-28 11:26 UTC (permalink / raw)
  To: J. Bruce Fields, Wenbin Zeng
  Cc: davem, viro, jlayton, trond.myklebust, anna.schumaker,
	wenbinzeng, dsahern, nicolas.dichtel, willy, edumazet,
	jakub.kicinski, tyhicks, chuck.lever, neilb, linux-fsdevel,
	linux-kernel, netdev, linux-nfs


On 2019/8/2 3:53, J. Bruce Fields wrote:
> I lost track, what happened to these patches?
>
> --b.
>
> On Wed, Jun 12, 2019 at 08:09:27PM +0800, Wenbin Zeng wrote:
>> This patch series fixes an auth_gss bug that results in netns refcount
>> leaks when use-gss-proxy is set to 1.
>>
>> The problem was found in privileged docker containers with the gssproxy
>> service enabled and /proc/net/rpc/use-gss-proxy set to 1. The corresponding
>> struct net->count ends up at 2 after the container gets killed, and as a
>> consequence the struct net cannot be freed.
>>
>> It turns out that write_gssp() calls gssp_rpc_create() to create an rpc
>> client, which increases net->count by 2. rpcsec_gss_exit_net() is supposed
>> to decrease net->count, but it never gets called because its call path is:
>>          net->count==0 -> cleanup_net -> ops_exit_list -> rpcsec_gss_exit_net
>> net->count cannot reach 0 before rpcsec_gss_exit_net() gets called, so this
>> is a deadlock.
>>
>> To fix the problem we must break the deadlock: rpcsec_gss_exit_net() has
>> to move out of the put() path and get called from somewhere else. I think
>> nsfs_evict() is a good place: when the netns inode gets evicted, we call
>> rpcsec_gss_exit_net() to free the rpc client. This requires adding a new
>> callback, evict, to struct proc_ns_operations, and adding netns_evict() as
>> one of the netns_operations as well.
>>
>> v1->v2:
>>   * in nsfs_evict(), move ->evict() in front of ->put()
>> v2->v3:
>>   * rpcsec_gss_evict_net() directly call gss_svc_shutdown_net() regardless
>>     if gssp_clnt is null, this is exactly same to what rpcsec_gss_exit_net()
>>     previously did
>>
>> Wenbin Zeng (3):
>>    nsfs: add evict callback into struct proc_ns_operations
>>    netns: add netns_evict into netns_operations
>>    auth_gss: fix deadlock that blocks rpcsec_gss_exit_net when
>>      use-gss-proxy==1
>>
>>   fs/nsfs.c                      |  2 ++
>>   include/linux/proc_ns.h        |  1 +
>>   include/net/net_namespace.h    |  1 +
>>   net/core/net_namespace.c       | 12 ++++++++++++
>>   net/sunrpc/auth_gss/auth_gss.c |  4 ++--
>>   5 files changed, 18 insertions(+), 2 deletions(-)
>>
>> -- 
>> 1.8.3.1
This patchset doesn't seem to have been merged into mainline. Are there any
other patches that fix this bug?


* Re: [PATCH v2 0/3] auth_gss: netns refcount leaks when use-gss-proxy==1
  2019-06-12 15:52       ` J. Bruce Fields
@ 2021-09-07 14:48         ` wanghai (M)
  2021-09-08 20:51           ` J. Bruce Fields
  0 siblings, 1 reply; 27+ messages in thread
From: wanghai (M) @ 2021-09-07 14:48 UTC (permalink / raw)
  To: J. Bruce Fields, Wenbin Zeng, viro, davem, jlayton,
	trond.myklebust, anna.schumaker, wenbinzeng, dsahern,
	nicolas.dichtel, willy, edumazet, jakub.kicinski, tyhicks,
	chuck.lever, neilb, linux-fsdevel, linux-kernel, netdev,
	linux-nfs


On 2019/6/12 23:52, J. Bruce Fields wrote:
> On Wed, Jun 12, 2019 at 04:37:55PM +0800, Wenbin Zeng wrote:
>> On Tue, May 14, 2019 at 09:03:31PM -0400, J. Bruce Fields wrote:
> >>> Whoops, I was slow to test these.  I'm getting failing krb5 nfs
> >>> mounts, and the following in the server's logs.  Dropping the three patches
>>> for now.
>> My bad, I should have found it earlier. Thank you for testing it, Bruce.
>>
>> I figured it out, the problem that you saw is due to the following code:
>> the if-condition is incorrect here because sn->gssp_clnt==NULL doesn't mean
>> inexistence of 'use-gss-proxy':
> Thanks, but with the new patches I see the following.  I haven't tried
> to investigate.
This patchset adds the nsfs_evict()->netns_evict() path to break the
existing deadlock, but it can cause a double free because
nsfs_evict()->netns_evict() may be called multiple times.

For example:

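#include <fcntl.h>
#include <unistd.h>

int main(void)
{
     /* Per the analysis above, each open()/close() pair can evict the
      * netns inode, so the evict path may run once per cycle. */
     int fd = open("/proc/self/ns/net", O_RDONLY);
     close(fd);

     fd = open("/proc/self/ns/net", O_RDONLY);
     close(fd);
     return 0;
}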

Therefore, the nsfs evict cannot be used to break the deadlock.
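
The failure mode can be sketched with a toy userspace model. This is not
kernel code: every toy_* name is hypothetical, and the model assumes the
inode is evicted as soon as its last open file is closed, with the evict
hook tearing down per-netns state unconditionally.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model, not kernel code. All toy_* names are hypothetical. */
struct toy_netns {
    int evict_calls;      /* how many times the evict hook ran */
    bool gss_torn_down;   /* has the per-netns state been released? */
    int double_teardowns; /* teardowns attempted on already released state */
};

struct toy_ns_inode {
    struct toy_netns *ns;
    int open_count;
    bool live;
};

/* Stands in for the proposed ->evict() hook: it tears down unconditionally. */
static void toy_evict_hook(struct toy_netns *ns)
{
    ns->evict_calls++;
    if (ns->gss_torn_down)
        ns->double_teardowns++; /* in the kernel this would be the double free */
    ns->gss_torn_down = true;
}

/* Opening the ns file instantiates the inode if it is not currently live. */
static void toy_open(struct toy_ns_inode *ino, struct toy_netns *ns)
{
    if (!ino->live) {
        ino->ns = ns;
        ino->live = true;
        ino->open_count = 0;
    }
    ino->open_count++;
}

/* Closing the last open file evicts the inode and fires the hook. */
static void toy_close(struct toy_ns_inode *ino)
{
    assert(ino->live && ino->open_count > 0);
    if (--ino->open_count == 0) {
        toy_evict_hook(ino->ns);
        ino->live = false;
    }
}
```

In this model, two open/close cycles on the same namespace run the hook
twice, and the second run hits state that the first one already released.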

A large number of netns leaks may cause OOM problems. I currently can't
find a good way to fix this. Does anyone have a good idea?
> --b.
>
> [ 2908.134813] ------------[ cut here ]------------
> [ 2908.135732] name 'use-gss-proxy'
> [ 2908.136276] WARNING: CPU: 2 PID: 15032 at fs/proc/generic.c:673 remove_proc_entry+0x124/0x190
> [ 2908.138144] Modules linked in: nfsv4 rpcsec_gss_krb5 nfsv3 nfs_acl nfs lockd grace auth_rpcgss sunrpc
> [ 2908.140183] CPU: 2 PID: 15032 Comm: (coredump) Not tainted 5.2.0-rc2-00441-gaef575f54640 #2257
> [ 2908.142062] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
> [ 2908.143756] RIP: 0010:remove_proc_entry+0x124/0x190
> [ 2908.144519] Code: c3 48 c7 c7 60 24 8b 82 e8 29 16 a5 00 eb d5 48 c7 c7 60 24 8b 82 e8 1b 16 a5 00 4c 89 e6 48 c7 c7 ec 4c 52 82 e8 50 fd db ff <0f> 0b eb b6 48 8b 04 24 83 a8 90 00 00 00 01 e9 78 ff ff ff 4c 89
> [ 2908.148138] RSP: 0018:ffffc900047bbdb0 EFLAGS: 00010282
> [ 2908.148945] RAX: 0000000000000000 RBX: ffff888036060580 RCX: 0000000000000000
> [ 2908.150139] RDX: ffff88807fd24e80 RSI: ffff88807fd165b8 RDI: 00000000ffffffff
> [ 2908.151334] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> [ 2908.152564] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa00adb1b
> [ 2908.153816] R13: 00007ffc8bda5d30 R14: 0000000000000000 R15: ffff88805e2873a8
> [ 2908.155007] FS:  00007f470bc27e40(0000) GS:ffff88807fd00000(0000) knlGS:0000000000000000
> [ 2908.156421] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2908.157333] CR2: 0000562b07764c58 CR3: 000000005e8ea001 CR4: 00000000001606e0
> [ 2908.158529] Call Trace:
> [ 2908.158796]  destroy_use_gss_proxy_proc_entry+0xb7/0x150 [auth_rpcgss]
> [ 2908.159966]  gss_svc_shutdown_net+0x11/0x170 [auth_rpcgss]
> [ 2908.160830]  netns_evict+0x2f/0x40
> [ 2908.161266]  nsfs_evict+0x27/0x40
> [ 2908.161685]  evict+0xd0/0x1a0
> [ 2908.162035]  __dentry_kill+0xdf/0x180
> [ 2908.162520]  dentry_kill+0x50/0x1c0
> [ 2908.163005]  ? dput+0x1c/0x2b0
> [ 2908.163369]  dput+0x260/0x2b0
> [ 2908.163739]  path_put+0x12/0x20
> [ 2908.164155]  do_faccessat+0x17c/0x240
> [ 2908.164643]  do_syscall_64+0x50/0x1c0
> [ 2908.165170]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [ 2908.165959] RIP: 0033:0x7f47098e2157
> [ 2908.166445] Code: 77 01 c3 48 8b 15 69 dd 2c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 15 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 39 dd 2c 00 f7 d8 64 89 02 b8
> [ 2908.169994] RSP: 002b:00007ffc8bda5d28 EFLAGS: 00000246 ORIG_RAX: 0000000000000015
> [ 2908.171315] RAX: ffffffffffffffda RBX: 0000562b0774d979 RCX: 00007f47098e2157
> [ 2908.172563] RDX: 00007ffc8bda5d3e RSI: 0000000000000000 RDI: 00007ffc8bda5d30
> [ 2908.173753] RBP: 00007ffc8bda5d70 R08: 0000000000000000 R09: 0000562b07d0b130
> [ 2908.174943] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffc8bda5d30
> [ 2908.176163] R13: 0000562b07b34c80 R14: 0000562b07b35120 R15: 0000000000000000
> [ 2908.177395] irq event stamp: 4256
> [ 2908.177835] hardirqs last  enabled at (4255): [<ffffffff811221ee>] console_unlock+0x41e/0x590
> [ 2908.179378] hardirqs last disabled at (4256): [<ffffffff81001b2f>] trace_hardirqs_off_thunk+0x1a/0x1c
> [ 2908.181031] softirqs last  enabled at (4252): [<ffffffff820002be>] __do_softirq+0x2be/0x4aa
> [ 2908.182458] softirqs last disabled at (4233): [<ffffffff810bf8e0>] irq_exit+0x80/0x90
> [ 2908.183869] ---[ end trace d88132b63efc09d8 ]---
> [ 2908.184620] BUG: kernel NULL pointer dereference, address: 0000000000000030
> [ 2908.185829] #PF: supervisor read access in kernel mode
> [ 2908.186924] #PF: error_code(0x0000) - not-present page
> [ 2908.187887] PGD 0 P4D 0
> [ 2908.188318] Oops: 0000 [#1] PREEMPT SMP PTI
> [ 2908.189254] CPU: 2 PID: 15032 Comm: (coredump) Tainted: G        W         5.2.0-rc2-00441-gaef575f54640 #2257
> [ 2908.192506] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
> [ 2908.195137] RIP: 0010:__lock_acquire+0x3d2/0x1d90
> [ 2908.196414] Code: db 48 8b 84 24 88 00 00 00 65 48 33 04 25 28 00 00 00 0f 85 be 10 00 00 48 8d 65 d8 44 89 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <48> 81 3f 60 0d 01 83 41 bb 00 00 00 00 45 0f 45 d8 83 fe 01 0f 87
> [ 2908.202720] RSP: 0018:ffffc900047bbc80 EFLAGS: 00010002
> [ 2908.204165] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
> [ 2908.206125] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000030
> [ 2908.208203] RBP: ffffc900047bbd40 R08: 0000000000000001 R09: 0000000000000000
> [ 2908.210219] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88807ad91500
> [ 2908.211386] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000282
> [ 2908.212532] FS:  00007f470bc27e40(0000) GS:ffff88807fd00000(0000) knlGS:0000000000000000
> [ 2908.213647] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2908.214400] CR2: 0000000000000030 CR3: 000000005e8ea001 CR4: 00000000001606e0
> [ 2908.215393] Call Trace:
> [ 2908.215589]  ? __lock_acquire+0x255/0x1d90
> [ 2908.216071]  ? clear_gssp_clnt+0x1b/0x50 [auth_rpcgss]
> [ 2908.216720]  ? __mutex_lock+0x99/0x920
> [ 2908.217114]  lock_acquire+0x95/0x1b0
> [ 2908.217484]  ? cache_purge+0x1c/0x110 [sunrpc]
> [ 2908.218000]  _raw_spin_lock+0x2f/0x40
> [ 2908.218370]  ? cache_purge+0x1c/0x110 [sunrpc]
> [ 2908.218882]  cache_purge+0x1c/0x110 [sunrpc]
> [ 2908.219346]  gss_svc_shutdown_net+0xb8/0x170 [auth_rpcgss]
> [ 2908.220104]  netns_evict+0x2f/0x40
> [ 2908.220439]  nsfs_evict+0x27/0x40
> [ 2908.220786]  evict+0xd0/0x1a0
> [ 2908.221050]  __dentry_kill+0xdf/0x180
> [ 2908.221458]  dentry_kill+0x50/0x1c0
> [ 2908.221842]  ? dput+0x1c/0x2b0
> [ 2908.222126]  dput+0x260/0x2b0
> [ 2908.222384]  path_put+0x12/0x20
> [ 2908.222753]  do_faccessat+0x17c/0x240
> [ 2908.223125]  do_syscall_64+0x50/0x1c0
> [ 2908.223479]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [ 2908.224152] RIP: 0033:0x7f47098e2157
> [ 2908.224566] Code: 77 01 c3 48 8b 15 69 dd 2c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 15 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 39 dd 2c 00 f7 d8 64 89 02 b8
> [ 2908.228198] RSP: 002b:00007ffc8bda5d28 EFLAGS: 00000246 ORIG_RAX: 0000000000000015
> [ 2908.229496] RAX: ffffffffffffffda RBX: 0000562b0774d979 RCX: 00007f47098e2157
> [ 2908.230938] RDX: 00007ffc8bda5d3e RSI: 0000000000000000 RDI: 00007ffc8bda5d30
> [ 2908.232182] RBP: 00007ffc8bda5d70 R08: 0000000000000000 R09: 0000562b07d0b130
> [ 2908.233481] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffc8bda5d30
> [ 2908.234750] R13: 0000562b07b34c80 R14: 0000562b07b35120 R15: 0000000000000000
> [ 2908.236068] Modules linked in: nfsv4 rpcsec_gss_krb5 nfsv3 nfs_acl nfs lockd grace auth_rpcgss sunrpc
> [ 2908.237861] CR2: 0000000000000030
> [ 2908.238277] ---[ end trace d88132b63efc09d9 ]---


* Re: [PATCH v2 0/3] auth_gss: netns refcount leaks when use-gss-proxy==1
  2021-09-07 14:48         ` wanghai (M)
@ 2021-09-08 20:51           ` J. Bruce Fields
  2021-09-09  2:52             ` wanghai (M)
  0 siblings, 1 reply; 27+ messages in thread
From: J. Bruce Fields @ 2021-09-08 20:51 UTC (permalink / raw)
  To: wanghai (M)
  Cc: Wenbin Zeng, viro, davem, jlayton, trond.myklebust,
	anna.schumaker, wenbinzeng, dsahern, nicolas.dichtel, willy,
	edumazet, jakub.kicinski, tyhicks, chuck.lever, neilb,
	linux-fsdevel, linux-kernel, netdev, linux-nfs

On Tue, Sep 07, 2021 at 10:48:52PM +0800, wanghai (M) wrote:
> 
> On 2019/6/12 23:52, J. Bruce Fields wrote:
> >On Wed, Jun 12, 2019 at 04:37:55PM +0800, Wenbin Zeng wrote:
> >>On Tue, May 14, 2019 at 09:03:31PM -0400, J. Bruce Fields wrote:
> >>>Whoops, I was slow to test these.  I'm getting failing krb5 nfs
> >>>mounts, and the following in the server's logs.  Dropping the three patches
> >>>for now.
> >>My bad, I should have found it earlier. Thank you for testing it, Bruce.
> >>
> >>I figured it out, the problem that you saw is due to the following code:
> >>the if-condition is incorrect here because sn->gssp_clnt==NULL doesn't mean
> >>inexistence of 'use-gss-proxy':
> >Thanks, but with the new patches I see the following.  I haven't tried
> >to investigate.
> This patchset adds the nsfs_evict()->netns_evict() code for breaking
> deadlock bugs that exist, but this may cause double free because
> nsfs_evict()->netns_evict() may be called multiple times.
> 
> for example:
> 
> int main()
> {
>     int fd = open("/proc/self/ns/net", O_RDONLY);
>     close(fd);
> 
>     fd = open("/proc/self/ns/net", O_RDONLY);
>     close(fd);
> }
> 
> Therefore, the nsfs evict cannot be used to break the deadlock.

Sorry, I haven't really been following this, but I thought this problem
was fixed by your checking for gssp_clnt (instead of just relying on the
use_gssp_proc check) in v3 of your patches?

--b.

> 
> A large number of netns leaks may cause OOM problems, currently I
> can't find a good solution to fix it, does anyone have a good idea?


* Re: [PATCH v2 0/3] auth_gss: netns refcount leaks when use-gss-proxy==1
  2021-09-08 20:51           ` J. Bruce Fields
@ 2021-09-09  2:52             ` wanghai (M)
  2021-09-09 19:52               ` J. Bruce Fields
  0 siblings, 1 reply; 27+ messages in thread
From: wanghai (M) @ 2021-09-09  2:52 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Wenbin Zeng, viro, davem, jlayton, trond.myklebust,
	anna.schumaker, wenbinzeng, dsahern, nicolas.dichtel, willy,
	edumazet, jakub.kicinski, tyhicks, chuck.lever, neilb,
	linux-fsdevel, linux-kernel, netdev, linux-nfs


On 2021/9/9 4:51, J. Bruce Fields wrote:
> On Tue, Sep 07, 2021 at 10:48:52PM +0800, wanghai (M) wrote:
>> On 2019/6/12 23:52, J. Bruce Fields wrote:
>>> On Wed, Jun 12, 2019 at 04:37:55PM +0800, Wenbin Zeng wrote:
>>>> On Tue, May 14, 2019 at 09:03:31PM -0400, J. Bruce Fields wrote:
>>>>> Whoops, I was slow to test these.  I'm getting failing krb5 nfs
>>>>> mounts, and the following in the server's logs.  Dropping the three patches
>>>>> for now.
>>>> My bad, I should have found it earlier. Thank you for testing it, Bruce.
>>>>
>>>> I figured it out, the problem that you saw is due to the following code:
>>>> the if-condition is incorrect here because sn->gssp_clnt==NULL doesn't mean
>>>> inexistence of 'use-gss-proxy':
>>> Thanks, but with the new patches I see the following.  I haven't tried
>>> to investigate.
>> This patchset adds the nsfs_evict()->netns_evict() code for breaking
>> deadlock bugs that exist, but this may cause double free because
>> nsfs_evict()->netns_evict() may be called multiple times.
>>
>> for example:
>>
>> int main()
>> {
>>      int fd = open("/proc/self/ns/net", O_RDONLY);
>>      close(fd);
>>
>>      fd = open("/proc/self/ns/net", O_RDONLY);
>>      close(fd);
>> }
>>
>> Therefore, the nsfs evict cannot be used to break the deadlock.
> Sorry, I haven't really been following this, but I thought this problem
> was fixed by your checking for gssp_clnt (instead of just relying on the
> use_gssp_proc check) in v3 of your patches?
>
> --b.
Sorry, I'm not Wenbin Zeng. I recently encountered the same problem and
found that Zeng had posted a patchset for it. However, after my own
analysis, I found that Zeng's patchset causes a double free problem.

The v3 patches also have the double free issue, because
nsfs_evict()->netns_evict()->gss_svc_shutdown_net()->cache_purge() can
be called multiple times.

And even without the double free problem, an application can trigger
the evict early with 'open("/proc/self/ns/net", O_RDONLY);
close(fd);', which makes sunrpc unusable.

Therefore, the v3 patchset is also not applicable.

This issue causes an OOM on my server after docker containers are
repeatedly created and destroyed, and I don't have a good solution at the moment.
>> A large number of netns leaks may cause OOM problems, currently I
>> can't find a good solution to fix it, does anyone have a good idea?
> .
>


* Re: [PATCH v2 0/3] auth_gss: netns refcount leaks when use-gss-proxy==1
  2021-09-09  2:52             ` wanghai (M)
@ 2021-09-09 19:52               ` J. Bruce Fields
  0 siblings, 0 replies; 27+ messages in thread
From: J. Bruce Fields @ 2021-09-09 19:52 UTC (permalink / raw)
  To: wanghai (M)
  Cc: Wenbin Zeng, viro, davem, jlayton, trond.myklebust,
	anna.schumaker, wenbinzeng, dsahern, nicolas.dichtel, willy,
	edumazet, jakub.kicinski, tyhicks, chuck.lever, neilb,
	linux-fsdevel, linux-kernel, netdev, linux-nfs

On Thu, Sep 09, 2021 at 10:52:51AM +0800, wanghai (M) wrote:
> 
> On 2021/9/9 4:51, J. Bruce Fields wrote:
> >On Tue, Sep 07, 2021 at 10:48:52PM +0800, wanghai (M) wrote:
> >>On 2019/6/12 23:52, J. Bruce Fields wrote:
> >>>On Wed, Jun 12, 2019 at 04:37:55PM +0800, Wenbin Zeng wrote:
> >>>>On Tue, May 14, 2019 at 09:03:31PM -0400, J. Bruce Fields wrote:
> >>>>>Whoops, I was slow to test these.  I'm getting failing krb5 nfs
> >>>>>mounts, and the following in the server's logs.  Dropping the three patches
> >>>>>for now.
> >>>>My bad, I should have found it earlier. Thank you for testing it, Bruce.
> >>>>
> >>>>I figured it out, the problem that you saw is due to the following code:
> >>>>the if-condition is incorrect here because sn->gssp_clnt==NULL doesn't mean
> >>>>inexistence of 'use-gss-proxy':
> >>>Thanks, but with the new patches I see the following.  I haven't tried
> >>>to investigate.
> >>This patchset adds the nsfs_evict()->netns_evict() code for breaking
> >>deadlock bugs that exist, but this may cause double free because
> >>nsfs_evict()->netns_evict() may be called multiple times.
> >>
> >>for example:
> >>
> >>int main()
> >>{
> >>     int fd = open("/proc/self/ns/net", O_RDONLY);
> >>     close(fd);
> >>
> >>     fd = open("/proc/self/ns/net", O_RDONLY);
> >>     close(fd);
> >>}
> >>
> >>Therefore, the nsfs evict cannot be used to break the deadlock.
> >Sorry, I haven't really been following this, but I thought this problem
> >was fixed by your checking for gssp_clnt (instead of just relying on the
> >use_gssp_proc check) in v3 of your patches?
> >
> >--b.
> Sorry, I'm not Wenbin Zeng.

Apologies!

> I recently encountered the same problem
> and found that Zeng had posted a patchset for it. However, after my
> own analysis, I found that Zeng's patchset will cause the double
> free problem.
> 
> The v3 patches also has the double free issue. Because
> nsfs_evict()->netns_evict()->gss_svc_shutdown_net()->cache_purge()
> can be called multiple times.

OK, I see.

> And, even if there were no double free problem, an application could
> trigger the evict early with 'open("/proc/self/ns/net",
> O_RDONLY); close(fd);', which makes sunrpc unusable.

So it seems like we'd need a way to atomically check whether the only
remaining reference counts on the network namespace were held by this
rpc client and then continue shutting it down if so?
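
As a rough illustration of that kind of check, here is a minimal user-space
sketch, not kernel code. It assumes the client holds a known number of
references (the cover letter mentions net->count ending up at 2), and the
names ns_model and ns_put_if_only_client are hypothetical, made up for this
example. A single compare-and-swap drops the references only when nobody
else holds one:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model of a namespace; not the kernel's struct net. */
struct ns_model {
	atomic_int count;	/* stands in for the netns reference count */
};

/*
 * Drop the client's references only if they are the last ones left.
 * Returns true when the caller should go on to shut the client down,
 * false (leaving the count untouched) when someone else still holds
 * a reference.
 */
static bool ns_put_if_only_client(struct ns_model *ns, int client_refs)
{
	int expected = client_refs;

	assert(client_refs > 0);
	/* One compare-and-swap: succeeds only when count == client_refs. */
	return atomic_compare_exchange_strong(&ns->count, &expected, 0);
}
```

With the count at 3 and two client references the call returns false and
changes nothing; with the count at 2 it returns true and the count drops to
0.  A real version would also have to keep new references from being taken
once the check has succeeded, which this model does not attempt.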

If that's not practical, maybe another solution would be to provide
userspace with some explicit way to shut this down, and modify gssproxy
to do that.

--b.

> Therefore, the v3 patchset is also not applicable.
> 
> This issue causes an OOM on my server after repeatedly creating and
> destroying docker containers, and I don't have a good solution at the moment.
> >>A large number of leaked netns may cause OOM problems. Currently I
> >>can't find a good solution to fix it. Does anyone have a good idea?
> >>>--b.
> >>>
> >>>[ 2908.134813] ------------[ cut here ]------------
> >>>[ 2908.135732] name 'use-gss-proxy'
> >>>[ 2908.136276] WARNING: CPU: 2 PID: 15032 at fs/proc/generic.c:673 remove_proc_entry+0x124/0x190
> >>>[ 2908.138144] Modules linked in: nfsv4 rpcsec_gss_krb5 nfsv3 nfs_acl nfs lockd grace auth_rpcgss sunrpc
> >>>[ 2908.140183] CPU: 2 PID: 15032 Comm: (coredump) Not tainted 5.2.0-rc2-00441-gaef575f54640 #2257
> >>>[ 2908.142062] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
> >>>[ 2908.143756] RIP: 0010:remove_proc_entry+0x124/0x190
> >>>[ 2908.144519] Code: c3 48 c7 c7 60 24 8b 82 e8 29 16 a5 00 eb d5 48 c7 c7 60 24 8b 82 e8 1b 16 a5 00 4c 89 e6 48 c7 c7 ec 4c 52 82 e8 50 fd db ff <0f> 0b eb b6 48 8b 04 24 83 a8 90 00 00 00 01 e9 78 ff ff ff 4c 89
> >>>[ 2908.148138] RSP: 0018:ffffc900047bbdb0 EFLAGS: 00010282
> >>>[ 2908.148945] RAX: 0000000000000000 RBX: ffff888036060580 RCX: 0000000000000000
> >>>[ 2908.150139] RDX: ffff88807fd24e80 RSI: ffff88807fd165b8 RDI: 00000000ffffffff
> >>>[ 2908.151334] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> >>>[ 2908.152564] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa00adb1b
> >>>[ 2908.153816] R13: 00007ffc8bda5d30 R14: 0000000000000000 R15: ffff88805e2873a8
> >>>[ 2908.155007] FS:  00007f470bc27e40(0000) GS:ffff88807fd00000(0000) knlGS:0000000000000000
> >>>[ 2908.156421] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>[ 2908.157333] CR2: 0000562b07764c58 CR3: 000000005e8ea001 CR4: 00000000001606e0
> >>>[ 2908.158529] Call Trace:
> >>>[ 2908.158796]  destroy_use_gss_proxy_proc_entry+0xb7/0x150 [auth_rpcgss]
> >>>[ 2908.159966]  gss_svc_shutdown_net+0x11/0x170 [auth_rpcgss]
> >>>[ 2908.160830]  netns_evict+0x2f/0x40
> >>>[ 2908.161266]  nsfs_evict+0x27/0x40
> >>>[ 2908.161685]  evict+0xd0/0x1a0
> >>>[ 2908.162035]  __dentry_kill+0xdf/0x180
> >>>[ 2908.162520]  dentry_kill+0x50/0x1c0
> >>>[ 2908.163005]  ? dput+0x1c/0x2b0
> >>>[ 2908.163369]  dput+0x260/0x2b0
> >>>[ 2908.163739]  path_put+0x12/0x20
> >>>[ 2908.164155]  do_faccessat+0x17c/0x240
> >>>[ 2908.164643]  do_syscall_64+0x50/0x1c0
> >>>[ 2908.165170]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >>>[ 2908.165959] RIP: 0033:0x7f47098e2157
> >>>[ 2908.166445] Code: 77 01 c3 48 8b 15 69 dd 2c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 15 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 39 dd 2c 00 f7 d8 64 89 02 b8
> >>>[ 2908.169994] RSP: 002b:00007ffc8bda5d28 EFLAGS: 00000246 ORIG_RAX: 0000000000000015
> >>>[ 2908.171315] RAX: ffffffffffffffda RBX: 0000562b0774d979 RCX: 00007f47098e2157
> >>>[ 2908.172563] RDX: 00007ffc8bda5d3e RSI: 0000000000000000 RDI: 00007ffc8bda5d30
> >>>[ 2908.173753] RBP: 00007ffc8bda5d70 R08: 0000000000000000 R09: 0000562b07d0b130
> >>>[ 2908.174943] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffc8bda5d30
> >>>[ 2908.176163] R13: 0000562b07b34c80 R14: 0000562b07b35120 R15: 0000000000000000
> >>>[ 2908.177395] irq event stamp: 4256
> >>>[ 2908.177835] hardirqs last  enabled at (4255): [<ffffffff811221ee>] console_unlock+0x41e/0x590
> >>>[ 2908.179378] hardirqs last disabled at (4256): [<ffffffff81001b2f>] trace_hardirqs_off_thunk+0x1a/0x1c
> >>>[ 2908.181031] softirqs last  enabled at (4252): [<ffffffff820002be>] __do_softirq+0x2be/0x4aa
> >>>[ 2908.182458] softirqs last disabled at (4233): [<ffffffff810bf8e0>] irq_exit+0x80/0x90
> >>>[ 2908.183869] ---[ end trace d88132b63efc09d8 ]---
> >>>[ 2908.184620] BUG: kernel NULL pointer dereference, address: 0000000000000030
> >>>[ 2908.185829] #PF: supervisor read access in kernel mode
> >>>[ 2908.186924] #PF: error_code(0x0000) - not-present page
> >>>[ 2908.187887] PGD 0 P4D 0
> >>>[ 2908.188318] Oops: 0000 [#1] PREEMPT SMP PTI
> >>>[ 2908.189254] CPU: 2 PID: 15032 Comm: (coredump) Tainted: G        W         5.2.0-rc2-00441-gaef575f54640 #2257
> >>>[ 2908.192506] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
> >>>[ 2908.195137] RIP: 0010:__lock_acquire+0x3d2/0x1d90
> >>>[ 2908.196414] Code: db 48 8b 84 24 88 00 00 00 65 48 33 04 25 28 00 00 00 0f 85 be 10 00 00 48 8d 65 d8 44 89 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <48> 81 3f 60 0d 01 83 41 bb 00 00 00 00 45 0f 45 d8 83 fe 01 0f 87
> >>>[ 2908.202720] RSP: 0018:ffffc900047bbc80 EFLAGS: 00010002
> >>>[ 2908.204165] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
> >>>[ 2908.206125] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000030
> >>>[ 2908.208203] RBP: ffffc900047bbd40 R08: 0000000000000001 R09: 0000000000000000
> >>>[ 2908.210219] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88807ad91500
> >>>[ 2908.211386] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000282
> >>>[ 2908.212532] FS:  00007f470bc27e40(0000) GS:ffff88807fd00000(0000) knlGS:0000000000000000
> >>>[ 2908.213647] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>[ 2908.214400] CR2: 0000000000000030 CR3: 000000005e8ea001 CR4: 00000000001606e0
> >>>[ 2908.215393] Call Trace:
> >>>[ 2908.215589]  ? __lock_acquire+0x255/0x1d90
> >>>[ 2908.216071]  ? clear_gssp_clnt+0x1b/0x50 [auth_rpcgss]
> >>>[ 2908.216720]  ? __mutex_lock+0x99/0x920
> >>>[ 2908.217114]  lock_acquire+0x95/0x1b0
> >>>[ 2908.217484]  ? cache_purge+0x1c/0x110 [sunrpc]
> >>>[ 2908.218000]  _raw_spin_lock+0x2f/0x40
> >>>[ 2908.218370]  ? cache_purge+0x1c/0x110 [sunrpc]
> >>>[ 2908.218882]  cache_purge+0x1c/0x110 [sunrpc]
> >>>[ 2908.219346]  gss_svc_shutdown_net+0xb8/0x170 [auth_rpcgss]
> >>>[ 2908.220104]  netns_evict+0x2f/0x40
> >>>[ 2908.220439]  nsfs_evict+0x27/0x40
> >>>[ 2908.220786]  evict+0xd0/0x1a0
> >>>[ 2908.221050]  __dentry_kill+0xdf/0x180
> >>>[ 2908.221458]  dentry_kill+0x50/0x1c0
> >>>[ 2908.221842]  ? dput+0x1c/0x2b0
> >>>[ 2908.222126]  dput+0x260/0x2b0
> >>>[ 2908.222384]  path_put+0x12/0x20
> >>>[ 2908.222753]  do_faccessat+0x17c/0x240
> >>>[ 2908.223125]  do_syscall_64+0x50/0x1c0
> >>>[ 2908.223479]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >>>[ 2908.224152] RIP: 0033:0x7f47098e2157
> >>>[ 2908.224566] Code: 77 01 c3 48 8b 15 69 dd 2c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 15 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 39 dd 2c 00 f7 d8 64 89 02 b8
> >>>[ 2908.228198] RSP: 002b:00007ffc8bda5d28 EFLAGS: 00000246 ORIG_RAX: 0000000000000015
> >>>[ 2908.229496] RAX: ffffffffffffffda RBX: 0000562b0774d979 RCX: 00007f47098e2157
> >>>[ 2908.230938] RDX: 00007ffc8bda5d3e RSI: 0000000000000000 RDI: 00007ffc8bda5d30
> >>>[ 2908.232182] RBP: 00007ffc8bda5d70 R08: 0000000000000000 R09: 0000562b07d0b130
> >>>[ 2908.233481] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffc8bda5d30
> >>>[ 2908.234750] R13: 0000562b07b34c80 R14: 0000562b07b35120 R15: 0000000000000000
> >>>[ 2908.236068] Modules linked in: nfsv4 rpcsec_gss_krb5 nfsv3 nfs_acl nfs lockd grace auth_rpcgss sunrpc
> >>>[ 2908.237861] CR2: 0000000000000030
> >>>[ 2908.238277] ---[ end trace d88132b63efc09d9 ]---
> >.
> >

Thread overview: 27+ messages
2019-05-01  6:42 [PATCH 0/3] auth_gss: netns refcount leaks when use-gss-proxy==1 Wenbin Zeng
2019-05-01  6:42 ` [PATCH 1/3] nsfs: add evict callback into struct proc_ns_operations Wenbin Zeng
2019-05-02  3:04   ` Al Viro
2019-05-04 16:08     ` Wenbin Zeng
2019-05-01  6:42 ` [PATCH 2/3] netns: add netns_evict into netns_operations Wenbin Zeng
2019-05-04  4:10   ` David Miller
2019-05-01  6:42 ` [PATCH 3/3] auth_gss: fix deadlock that blocks rpcsec_gss_exit_net when use-gss-proxy==1 Wenbin Zeng
2019-05-09 20:52 ` [PATCH 0/3] auth_gss: netns refcount leaks " J. Bruce Fields
2019-05-10  5:09   ` Wenbin Zeng
2019-05-10  6:36 ` [PATCH v2 " Wenbin Zeng
2019-05-10  6:36   ` [PATCH v2 1/3] nsfs: add evict callback into struct proc_ns_operations Wenbin Zeng
2019-05-10  6:36   ` [PATCH v2 2/3] netns: add netns_evict into netns_operations Wenbin Zeng
2019-05-10 22:13     ` David Miller
2019-05-10  6:36   ` [PATCH v2 3/3] auth_gss: fix deadlock that blocks rpcsec_gss_exit_net when use-gss-proxy==1 Wenbin Zeng
2019-05-15  1:03   ` [PATCH v2 0/3] auth_gss: netns refcount leaks " J. Bruce Fields
2019-06-12  8:37     ` Wenbin Zeng
2019-06-12 15:52       ` J. Bruce Fields
2021-09-07 14:48         ` wanghai (M)
2021-09-08 20:51           ` J. Bruce Fields
2021-09-09  2:52             ` wanghai (M)
2021-09-09 19:52               ` J. Bruce Fields
2019-06-12 12:09 ` [PATCH v3 " Wenbin Zeng
2019-06-12 12:09   ` [PATCH v3 1/3] nsfs: add evict callback into struct proc_ns_operations Wenbin Zeng
2019-06-12 12:09   ` [PATCH v3 2/3] netns: add netns_evict into netns_operations Wenbin Zeng
2019-06-12 12:09   ` [PATCH v3 3/3] auth_gss: fix deadlock that blocks rpcsec_gss_exit_net when use-gss-proxy==1 Wenbin Zeng
2019-08-01 19:53   ` [PATCH v3 0/3] auth_gss: netns refcount leaks " J. Bruce Fields
2021-08-28 11:26     ` wanghai (M)
