All of lore.kernel.org
 help / color / mirror / Atom feed
* Netlink socket leaks
@ 2015-05-01 10:22 Andrey Wagin
  2015-05-01 23:31 ` Andrey Wagin
  0 siblings, 1 reply; 6+ messages in thread
From: Andrey Wagin @ 2015-05-01 10:22 UTC (permalink / raw)
  To: netdev, Herbert Xu

Hi Herbert,

I've found that netlink sockets start leaking after your patch:
commit c428ecd1a21f1457ca3beb4df71b8a079c410e41
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Fri Mar 20 21:57:01 2015 +1100

    netlink: Move namespace into hash key

Could you look at this?

It can be reproduced by the following script:
ss -a | wc -l; ip netns add test && ip netns exec test ip a && ip
netns del test ; ss -a | wc -l

[root@jenkins ~]# uname -a
Linux jenkins.criu.org 4.1.0-rc1-00117-g4a152c3 #46 SMP Fri May 1
14:09:02 MSK 2015 x86_64 x86_64 x86_64 GNU/Linux


I have applied a patch with debug messages:
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 72c6b55..79aece4 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1081,7 +1081,9 @@ static int netlink_insert(struct sock *sk, u32 portid)
                if (err == -EEXIST)
                        err = -EADDRINUSE;
                sock_put(sk);
+               printk("%s:%d: %p\n", __func__, __LINE__, sk);
        }
+       printk("%s:%d: %p\n", __func__, __LINE__, sk);

 err:
        release_sock(sk);
@@ -1097,7 +1099,9 @@ static void netlink_remove(struct sock *sk)
                                    netlink_rhashtable_params)) {
                WARN_ON(atomic_read(&sk->sk_refcnt) == 1);
                __sock_put(sk);
+               printk("%s:%d: %p\n", __func__, __LINE__, sk);
        }
+       printk("%s:%d: %p\n", __func__, __LINE__, sk);

        netlink_table_grab();
        if (nlk_sk(sk)->subscriptions) {

And here is the output
[ 3868.202260] netlink_insert:1086: ffff8800d8c14000
[ 3868.202962] netlink_insert:1086: ffff8800d8c15800
[ 3868.202995] netlink_insert:1086: ffff8800d8c12000
[ 3868.203228] netlink_insert:1086: ffff8800d8c16000
[ 3868.203360] netlink_insert:1086: ffff8800d8c11000
[ 3868.203920] netlink_insert:1086: ffff8800d8c13000
[ 3868.203951] netlink_insert:1086: ffff8800d8c11800
[ 3868.204216] netlink_insert:1086: ffff8800d8c17000
[ 3868.205846] netlink_remove:1102: ffff8800d8c14000
[ 3868.205849] netlink_remove:1104: ffff8800d8c14000
[ 3868.208270] netlink_insert:1086: ffff8800d81d3800
[ 3868.215363] netlink_remove:1102: ffff8800d81d3800
[ 3868.215368] netlink_remove:1104: ffff8800d81d3800
[ 3868.216465] netlink_insert:1086: ffff8800d8c0a000
[ 3868.226320] netlink_remove:1102: ffff8800d8c0a000
[ 3868.226324] netlink_remove:1104: ffff8800d8c0a000
[ 3868.230662] netlink_insert:1086: ffff8800d8c0e800
[ 3868.235906] netlink_remove:1102: ffff8800d8c0e800
[ 3868.235910] netlink_remove:1104: ffff8800d8c0e800
[ 3868.280231] netlink_remove:1104: ffff8800d8c17000
[ 3868.280283] netlink_remove:1104: ffff8800d8c11800
[ 3868.286152] netlink_remove:1104: ffff8800d8c13000
[ 3868.318148] netlink_remove:1104: ffff8800d8c11000
[ 3868.318185] netlink_remove:1104: ffff8800d8c16000
[ 3868.318241] netlink_remove:1104: ffff8800d8c12000
[ 3868.318255] netlink_remove:1104: ffff8800d8c15800

So it looks like

netlink_insert() returns zero, but then rhashtable_remove_fast()
returns non-zero code

Thanks,
Andrew

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: Netlink socket leaks
  2015-05-01 10:22 Netlink socket leaks Andrey Wagin
@ 2015-05-01 23:31 ` Andrey Wagin
  2015-05-02  2:05   ` Herbert Xu
  2015-05-03  0:04   ` Herbert Xu
  0 siblings, 2 replies; 6+ messages in thread
From: Andrey Wagin @ 2015-05-01 23:31 UTC (permalink / raw)
  To: netdev, Herbert Xu

2015-05-01 13:22 GMT+03:00 Andrey Wagin <avagin@gmail.com>:
> Hi Herbert,
>
> I've found that netlink sockets start leaking after your patch:
> commit c428ecd1a21f1457ca3beb4df71b8a079c410e41
> Author: Herbert Xu <herbert@gondor.apana.org.au>
> Date:   Fri Mar 20 21:57:01 2015 +1100
>
>     netlink: Move namespace into hash key
>
> Could you look at this?

A socket leaks if it is released by sk_release_kernel(). The problem
is that netlink_insert() and netlink_remove() is called when a socket
has different values of sk->sk_net.

The following patch fixes this problem, but I am not sure that it
breaks nothing else.
diff --git a/net/core/sock.c b/net/core/sock.c
index e891bcf..292f422 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1474,8 +1474,8 @@ void sk_release_kernel(struct sock *sk)
                return;

        sock_hold(sk);
-       sock_net_set(sk, get_net(&init_net));
        sock_release(sk->sk_socket);
+       sock_net_set(sk, get_net(&init_net));
        sock_put(sk);
 }
 EXPORT_SYMBOL(sk_release_kernel);

>
> It can be reproduced by the following script:
> ss -a | wc -l; ip netns add test && ip netns exec test ip a && ip
> netns del test ; ss -a | wc -l
>
> [root@jenkins ~]# uname -a
> Linux jenkins.criu.org 4.1.0-rc1-00117-g4a152c3 #46 SMP Fri May 1
> 14:09:02 MSK 2015 x86_64 x86_64 x86_64 GNU/Linux
>
>
> I have applied a patch with debug messages:
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index 72c6b55..79aece4 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -1081,7 +1081,9 @@ static int netlink_insert(struct sock *sk, u32 portid)
>                 if (err == -EEXIST)
>                         err = -EADDRINUSE;
>                 sock_put(sk);
> +               printk("%s:%d: %p\n", __func__, __LINE__, sk);
>         }
> +       printk("%s:%d: %p\n", __func__, __LINE__, sk);
>
>  err:
>         release_sock(sk);
> @@ -1097,7 +1099,9 @@ static void netlink_remove(struct sock *sk)
>                                     netlink_rhashtable_params)) {
>                 WARN_ON(atomic_read(&sk->sk_refcnt) == 1);
>                 __sock_put(sk);
> +               printk("%s:%d: %p\n", __func__, __LINE__, sk);
>         }
> +       printk("%s:%d: %p\n", __func__, __LINE__, sk);
>
>         netlink_table_grab();
>         if (nlk_sk(sk)->subscriptions) {
>
> And here is the output
> [ 3868.202260] netlink_insert:1086: ffff8800d8c14000
> [ 3868.202962] netlink_insert:1086: ffff8800d8c15800
> [ 3868.202995] netlink_insert:1086: ffff8800d8c12000
> [ 3868.203228] netlink_insert:1086: ffff8800d8c16000
> [ 3868.203360] netlink_insert:1086: ffff8800d8c11000
> [ 3868.203920] netlink_insert:1086: ffff8800d8c13000
> [ 3868.203951] netlink_insert:1086: ffff8800d8c11800
> [ 3868.204216] netlink_insert:1086: ffff8800d8c17000
> [ 3868.205846] netlink_remove:1102: ffff8800d8c14000
> [ 3868.205849] netlink_remove:1104: ffff8800d8c14000
> [ 3868.208270] netlink_insert:1086: ffff8800d81d3800
> [ 3868.215363] netlink_remove:1102: ffff8800d81d3800
> [ 3868.215368] netlink_remove:1104: ffff8800d81d3800
> [ 3868.216465] netlink_insert:1086: ffff8800d8c0a000
> [ 3868.226320] netlink_remove:1102: ffff8800d8c0a000
> [ 3868.226324] netlink_remove:1104: ffff8800d8c0a000
> [ 3868.230662] netlink_insert:1086: ffff8800d8c0e800
> [ 3868.235906] netlink_remove:1102: ffff8800d8c0e800
> [ 3868.235910] netlink_remove:1104: ffff8800d8c0e800
> [ 3868.280231] netlink_remove:1104: ffff8800d8c17000
> [ 3868.280283] netlink_remove:1104: ffff8800d8c11800
> [ 3868.286152] netlink_remove:1104: ffff8800d8c13000
> [ 3868.318148] netlink_remove:1104: ffff8800d8c11000
> [ 3868.318185] netlink_remove:1104: ffff8800d8c16000
> [ 3868.318241] netlink_remove:1104: ffff8800d8c12000
> [ 3868.318255] netlink_remove:1104: ffff8800d8c15800
>
> So it looks like
>
> netlink_insert() returns zero, but then rhashtable_remove_fast()
> returns non-zero code
>
> Thanks,
> Andrew

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: Netlink socket leaks
  2015-05-01 23:31 ` Andrey Wagin
@ 2015-05-02  2:05   ` Herbert Xu
  2015-05-03  0:04   ` Herbert Xu
  1 sibling, 0 replies; 6+ messages in thread
From: Herbert Xu @ 2015-05-02  2:05 UTC (permalink / raw)
  To: Andrey Wagin; +Cc: netdev

On Sat, May 02, 2015 at 02:31:09AM +0300, Andrey Wagin wrote:
>
> A socket leaks if it is released by sk_release_kernel(). The problem
> is that netlink_insert() and netlink_remove() is called when a socket
> has different values of sk->sk_net.

Thanks for catching this! Let me look into this to see if we
can still safely dereference the namespace at this point.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Netlink socket leaks
  2015-05-01 23:31 ` Andrey Wagin
  2015-05-02  2:05   ` Herbert Xu
@ 2015-05-03  0:04   ` Herbert Xu
  2015-05-04  4:13     ` David Miller
  2015-05-04 10:09     ` Ying Xue
  1 sibling, 2 replies; 6+ messages in thread
From: Herbert Xu @ 2015-05-03  0:04 UTC (permalink / raw)
  To: Andrey Wagin; +Cc: netdev, Ying Xue

On Sat, May 02, 2015 at 02:31:09AM +0300, Andrey Wagin wrote:
>
> A socket leaks if it is released by sk_release_kernel(). The problem
> is that netlink_insert() and netlink_remove() is called when a socket
> has different values of sk->sk_net.

I think we simply need to revert
c243d7e20996254f89c28d4838b5feca735c030d.

---8<---
Subject: Revert "net: kernel socket should be released in init_net namespace"

This reverts commit c243d7e20996254f89c28d4838b5feca735c030d.

That patch is solving a non-existant problem while creating a
real problem.  Just because a socket is allocated in the init
name space doesn't mean that it gets hashed in the init name space.

When we unhash it the name space must be the same as the one
we had when we hashed it.  So this patch is completely bogus
and causes socket leaks.

Reported-by: Andrey Wagin <avagin@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/net/core/sock.c b/net/core/sock.c
index e891bcf..292f422 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1474,8 +1474,8 @@ void sk_release_kernel(struct sock *sk)
 		return;
 
 	sock_hold(sk);
-	sock_net_set(sk, get_net(&init_net));
 	sock_release(sk->sk_socket);
+	sock_net_set(sk, get_net(&init_net));
 	sock_put(sk);
 }
 EXPORT_SYMBOL(sk_release_kernel);
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: Netlink socket leaks
  2015-05-03  0:04   ` Herbert Xu
@ 2015-05-04  4:13     ` David Miller
  2015-05-04 10:09     ` Ying Xue
  1 sibling, 0 replies; 6+ messages in thread
From: David Miller @ 2015-05-04  4:13 UTC (permalink / raw)
  To: herbert; +Cc: avagin, netdev, ying.xue

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Sun, 3 May 2015 08:04:28 +0800

> ---8<---
> Subject: Revert "net: kernel socket should be released in init_net namespace"
> 
> This reverts commit c243d7e20996254f89c28d4838b5feca735c030d.
> 
> That patch is solving a non-existant problem while creating a
> real problem.  Just because a socket is allocated in the init
> name space doesn't mean that it gets hashed in the init name space.
> 
> When we unhash it the name space must be the same as the one
> we had when we hashed it.  So this patch is completely bogus
> and causes socket leaks.
> 
> Reported-by: Andrey Wagin <avagin@gmail.com>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied, thanks Herbert.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Netlink socket leaks
  2015-05-03  0:04   ` Herbert Xu
  2015-05-04  4:13     ` David Miller
@ 2015-05-04 10:09     ` Ying Xue
  1 sibling, 0 replies; 6+ messages in thread
From: Ying Xue @ 2015-05-04 10:09 UTC (permalink / raw)
  To: Herbert Xu, Andrey Wagin; +Cc: netdev

On 05/03/2015 08:04 AM, Herbert Xu wrote:
> On Sat, May 02, 2015 at 02:31:09AM +0300, Andrey Wagin wrote:
>>
>> A socket leaks if it is released by sk_release_kernel(). The problem
>> is that netlink_insert() and netlink_remove() is called when a socket
>> has different values of sk->sk_net.
> 
> I think we simply need to revert
> c243d7e20996254f89c28d4838b5feca735c030d.
> 
> ---8<---
> Subject: Revert "net: kernel socket should be released in init_net namespace"
> 
> This reverts commit c243d7e20996254f89c28d4838b5feca735c030d.
> 
> That patch is solving a non-existant problem while creating a
> real problem.  Just because a socket is allocated in the init
> name space doesn't mean that it gets hashed in the init name space.
> 
> When we unhash it the name space must be the same as the one
> we had when we hashed it.  So this patch is completely bogus
> and causes socket leaks.
> 

Herbert, thanks for the fix.

Reverting commit c243d7e20996254f89c28d4838b5feca735c030d is absolutely a
correct decision now.

Actually my initial purpose of creating the commit is because inserting tipc
socket into its rhashtable happens in socket creation, and deleting the socket
from its rhashtable occurs in socket release. Without the commit, the creation
of tipc kernel internal socket happens in init_net context, but the socket
release occurs in the current namespace. More importantly, tipc allocates
different rhashtables for different namespaces. Therefore, tipc kernel internal
sockets created init_net would be inserted into init_net's rhashtable, but they
would be removed from current namespace's rhashtable when deleting them. As a
result, tipc kernel sockets are leaked as they are unable to be found in current
namespace's rhashtable. But as the commit can guarantee that both creation and
deletion of tipc kernel internal socket always happens in the current namespace,
leaking tipc socket can be avoided.

However, I did not realize that hashing of inet sockets usually occurs in
bind(), and unashing is in release(). As The former context is current
namespace, and the latter is init_net. the socket leak happens on netlink sockets.

Currently as for tipc kernel sockets, even your patch is involved, the leak
would never happen on tipc sockets because tipc uses __sock_create() instead of
sock_create_kern() to create its kernel sockets.

Until now, I believe that it's safe for all kinds of kernel sockets together
with your patch.

However, when I reviewed your patch, I found that moving netlink socket's
namespace back and forth is unnecessary for us at all. Instead it artificially
increases the complexity of netlink code. Therefore, I create the following
patch to avoid it, please review it:

http://patchwork.ozlabs.org/patch/467535/

Thanks,
Ying

> Reported-by: Andrey Wagin <avagin@gmail.com>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> diff --git a/net/core/sock.c b/net/core/sock.c
> index e891bcf..292f422 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1474,8 +1474,8 @@ void sk_release_kernel(struct sock *sk)
>  		return;
>  
>  	sock_hold(sk);
> -	sock_net_set(sk, get_net(&init_net));
>  	sock_release(sk->sk_socket);
> +	sock_net_set(sk, get_net(&init_net));
>  	sock_put(sk);
>  }
>  EXPORT_SYMBOL(sk_release_kernel);
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2015-05-04 10:10 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-01 10:22 Netlink socket leaks Andrey Wagin
2015-05-01 23:31 ` Andrey Wagin
2015-05-02  2:05   ` Herbert Xu
2015-05-03  0:04   ` Herbert Xu
2015-05-04  4:13     ` David Miller
2015-05-04 10:09     ` Ying Xue

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.