* [PATCH v5 1/2] sock: Change the netns_core member name.
From: Tonghao Zhang @ 2017-12-07 16:45 UTC
To: davem, xiyou.wangcong, edumazet, willemb; +Cc: netdev, Tonghao Zhang
Changing the member name makes the code more readable.
The next patch builds on this change.
Signed-off-by: Martin Zhang <zhangjunweimartin@didichuxing.com>
Signed-off-by: Tonghao Zhang <zhangtonghao@didichuxing.com>
---
include/net/netns/core.h | 2 +-
net/core/sock.c | 10 +++++-----
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/include/net/netns/core.h b/include/net/netns/core.h
index 0ad4d0c..45cfb5d 100644
--- a/include/net/netns/core.h
+++ b/include/net/netns/core.h
@@ -11,7 +11,7 @@ struct netns_core {
int sysctl_somaxconn;
- struct prot_inuse __percpu *inuse;
+ struct prot_inuse __percpu *prot_inuse;
};
#endif
diff --git a/net/core/sock.c b/net/core/sock.c
index c0b5b2f..c2dd2d3 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3045,7 +3045,7 @@ struct prot_inuse {
void sock_prot_inuse_add(struct net *net, struct proto *prot, int val)
{
- __this_cpu_add(net->core.inuse->val[prot->inuse_idx], val);
+ __this_cpu_add(net->core.prot_inuse->val[prot->inuse_idx], val);
}
EXPORT_SYMBOL_GPL(sock_prot_inuse_add);
@@ -3055,7 +3055,7 @@ int sock_prot_inuse_get(struct net *net, struct proto *prot)
int res = 0;
for_each_possible_cpu(cpu)
- res += per_cpu_ptr(net->core.inuse, cpu)->val[idx];
+ res += per_cpu_ptr(net->core.prot_inuse, cpu)->val[idx];
return res >= 0 ? res : 0;
}
@@ -3063,13 +3063,13 @@ int sock_prot_inuse_get(struct net *net, struct proto *prot)
static int __net_init sock_inuse_init_net(struct net *net)
{
- net->core.inuse = alloc_percpu(struct prot_inuse);
- return net->core.inuse ? 0 : -ENOMEM;
+ net->core.prot_inuse = alloc_percpu(struct prot_inuse);
+ return net->core.prot_inuse ? 0 : -ENOMEM;
}
static void __net_exit sock_inuse_exit_net(struct net *net)
{
- free_percpu(net->core.inuse);
+ free_percpu(net->core.prot_inuse);
}
static struct pernet_operations net_inuse_ops = {
--
1.8.3.1
* [PATCH v5 2/2] sock: Move the socket inuse to namespace.
From: Tonghao Zhang @ 2017-12-07 16:45 UTC
To: davem, xiyou.wangcong, edumazet, willemb; +Cc: netdev, Tonghao Zhang
In some cases, we want to know how many sockets are in use in
different _net_ namespaces. It's a key resource metric.
This patch adds a member to struct netns_core: a per-CPU counter
of in-use sockets in the _net_ namespace. The counter is incremented
in sk_alloc() and sk_clone_lock() and decremented in __sk_free().
The main reasons for doing this are:
1. When a process exits, do_exit() calls 'exit_task_namespaces' and
'exit_task_work' in that order. 'exit_task_namespaces' may already have
destroyed the _net_ namespace, but 'sock_release', called from
'exit_task_work', would use the _net_ namespace if we counted
socket-inuse in sock_release.
2. socket and sock come in pairs. More importantly, sock already holds
a reference to the _net_ namespace. Counting socket-inuse in sock avoids
taking another reference on the _net_ namespace in socket, which keeps
the code easy to maintain.
Signed-off-by: Martin Zhang <zhangjunweimartin@didichuxing.com>
Signed-off-by: Tonghao Zhang <zhangtonghao@didichuxing.com>
---
include/net/netns/core.h | 1 +
include/net/sock.h | 1 +
net/core/sock.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++--
net/socket.c | 21 ++-----------------
4 files changed, 54 insertions(+), 21 deletions(-)
diff --git a/include/net/netns/core.h b/include/net/netns/core.h
index 45cfb5d..d1b4748f 100644
--- a/include/net/netns/core.h
+++ b/include/net/netns/core.h
@@ -11,6 +11,7 @@ struct netns_core {
int sysctl_somaxconn;
+ int __percpu *sock_inuse;
struct prot_inuse __percpu *prot_inuse;
};
diff --git a/include/net/sock.h b/include/net/sock.h
index 79e1a2c..0809b31 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1266,6 +1266,7 @@ static inline void sk_sockets_allocated_inc(struct sock *sk)
/* Called with local bh disabled */
void sock_prot_inuse_add(struct net *net, struct proto *prot, int inc);
int sock_prot_inuse_get(struct net *net, struct proto *proto);
+int sock_inuse_get(struct net *net);
#else
static inline void sock_prot_inuse_add(struct net *net, struct proto *prot,
int inc)
diff --git a/net/core/sock.c b/net/core/sock.c
index c2dd2d3..a11680a 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -145,6 +145,8 @@
static DEFINE_MUTEX(proto_list_mutex);
static LIST_HEAD(proto_list);
+static void sock_inuse_add(struct net *net, int val);
+
/**
* sk_ns_capable - General socket capability test
* @sk: Socket to use a capability on or through
@@ -1534,6 +1536,7 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
if (likely(sk->sk_net_refcnt))
get_net(net);
sock_net_set(sk, net);
+ sock_inuse_add(net, 1);
refcount_set(&sk->sk_wmem_alloc, 1);
mem_cgroup_sk_alloc(sk);
@@ -1595,6 +1598,8 @@ void sk_destruct(struct sock *sk)
static void __sk_free(struct sock *sk)
{
+ sock_inuse_add(sock_net(sk), -1);
+
if (unlikely(sock_diag_has_destroy_listeners(sk) && sk->sk_net_refcnt))
sock_diag_broadcast_destroy(sk);
else
@@ -1716,6 +1721,7 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
newsk->sk_priority = 0;
newsk->sk_incoming_cpu = raw_smp_processor_id();
atomic64_set(&newsk->sk_cookie, 0);
+ sock_inuse_add(sock_net(newsk), 1);
/*
* Before updating sk_refcnt, we must commit prior changes to memory
@@ -3061,15 +3067,53 @@ int sock_prot_inuse_get(struct net *net, struct proto *prot)
}
EXPORT_SYMBOL_GPL(sock_prot_inuse_get);
+static void sock_inuse_add(struct net *net, int val)
+{
+ if (net->core.prot_inuse)
+ this_cpu_add(*net->core.sock_inuse, val);
+}
+
+int sock_inuse_get(struct net *net)
+{
+ int cpu, res = 0;
+
+ if (!net->core.prot_inuse)
+ return 0;
+
+ for_each_possible_cpu(cpu)
+ res += *per_cpu_ptr(net->core.sock_inuse, cpu);
+
+ return res >= 0 ? res : 0;
+}
+EXPORT_SYMBOL_GPL(sock_inuse_get);
+
static int __net_init sock_inuse_init_net(struct net *net)
{
net->core.prot_inuse = alloc_percpu(struct prot_inuse);
- return net->core.prot_inuse ? 0 : -ENOMEM;
+ if (!net->core.prot_inuse)
+ return -ENOMEM;
+
+ net->core.sock_inuse = alloc_percpu(int);
+ if (!net->core.sock_inuse)
+ goto out;
+
+ return 0;
+out:
+ free_percpu(net->core.prot_inuse);
+ return -ENOMEM;
}
static void __net_exit sock_inuse_exit_net(struct net *net)
{
- free_percpu(net->core.prot_inuse);
+ if (net->core.prot_inuse) {
+ free_percpu(net->core.prot_inuse);
+ net->core.prot_inuse = NULL;
+ }
+
+ if (net->core.sock_inuse) {
+ free_percpu(net->core.sock_inuse);
+ net->core.prot_inuse = NULL;
+ }
}
static struct pernet_operations net_inuse_ops = {
@@ -3112,6 +3156,10 @@ static inline void assign_proto_idx(struct proto *prot)
static inline void release_proto_idx(struct proto *prot)
{
}
+
+static void sock_inuse_add(struct net *net, int val)
+{
+}
#endif
static void req_prot_cleanup(struct request_sock_ops *rsk_prot)
diff --git a/net/socket.c b/net/socket.c
index 42d8e9c..183de8f01 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -163,12 +163,6 @@ static ssize_t sock_splice_read(struct file *file, loff_t *ppos,
static const struct net_proto_family __rcu *net_families[NPROTO] __read_mostly;
/*
- * Statistics counters of the socket lists
- */
-
-static DEFINE_PER_CPU(int, sockets_in_use);
-
-/*
* Support routines.
* Move socket addresses back and forth across the kernel/user
* divide and look after the messy bits.
@@ -574,7 +568,6 @@ struct socket *sock_alloc(void)
inode->i_gid = current_fsgid();
inode->i_op = &sockfs_inode_ops;
- this_cpu_add(sockets_in_use, 1);
return sock;
}
EXPORT_SYMBOL(sock_alloc);
@@ -601,7 +594,6 @@ void sock_release(struct socket *sock)
if (rcu_dereference_protected(sock->wq, 1)->fasync_list)
pr_err("%s: fasync list not empty!\n", __func__);
- this_cpu_sub(sockets_in_use, 1);
if (!sock->file) {
iput(SOCK_INODE(sock));
return;
@@ -2644,17 +2636,8 @@ static int __init sock_init(void)
#ifdef CONFIG_PROC_FS
void socket_seq_show(struct seq_file *seq)
{
- int cpu;
- int counter = 0;
-
- for_each_possible_cpu(cpu)
- counter += per_cpu(sockets_in_use, cpu);
-
- /* It can be negative, by the way. 8) */
- if (counter < 0)
- counter = 0;
-
- seq_printf(seq, "sockets: used %d\n", counter);
+ seq_printf(seq, "sockets: used %d\n",
+ sock_inuse_get(seq->private));
}
#endif /* CONFIG_PROC_FS */
--
1.8.3.1
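The counter added by this patch follows the usual per-CPU pattern: each CPU updates only its own slot, and readers fold all slots together. Below is a userspace sketch of that pattern. It assumes a fixed CPU count (`NR_CPUS_MODEL`) and uses plain arrays in place of alloc_percpu(), this_cpu_add() and per_cpu_ptr(). `struct net_model` and the `_model` functions are hypothetical names, not kernel APIs.

```c
#include <assert.h>

#define NR_CPUS_MODEL 4	/* hypothetical CPU count for the model */

/* Hypothetical stand-in for struct net: one int slot per CPU,
 * mirroring the int __percpu *sock_inuse member added to netns_core. */
struct net_model {
	int sock_inuse[NR_CPUS_MODEL];
};

/* Models sock_inuse_add(): only the slot of the CPU doing the update
 * changes, so a socket freed on another CPU drives that slot negative. */
static void sock_inuse_add_model(struct net_model *net, int cpu, int val)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	net->sock_inuse[cpu] += val;
}

/* Models sock_inuse_get(): fold every CPU's slot, then clamp a
 * transiently negative total to 0. */
static int sock_inuse_get_model(const struct net_model *net)
{
	int cpu, res = 0;

	for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		res += net->sock_inuse[cpu];

	return res >= 0 ? res : 0;
}
```

Suppose a socket is allocated on CPU 0 and freed on CPU 2. Slot 0 ends at 1 and slot 2 ends at -1, so only the folded total is meaningful. A lockless reader can observe a decrement without the matching increment, which means the total itself can dip below zero. The clamp in sock_inuse_get() accounts for this, as did the old "It can be negative" comment in socket_seq_show().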
* Re: [PATCH v5 2/2] sock: Move the socket inuse to namespace.
From: Eric Dumazet @ 2017-12-07 17:20 UTC
To: Tonghao Zhang, davem, xiyou.wangcong, edumazet, willemb; +Cc: netdev
On Thu, 2017-12-07 at 08:45 -0800, Tonghao Zhang wrote:
> In some case, we want to know how many sockets are in use in
> different _net_ namespaces. It's a key resource metric.
>
...
> +static void sock_inuse_add(struct net *net, int val)
> +{
> + if (net->core.prot_inuse)
> + this_cpu_add(*net->core.sock_inuse, val);
> +}
This is very confusing.
Why is testing net->core.prot_inuse for NULL needed at all?
Why not test net->core.sock_inuse instead?
* Re: [PATCH v5 2/2] sock: Move the socket inuse to namespace.
From: Cong Wang @ 2017-12-07 21:28 UTC
To: Eric Dumazet
Cc: Tonghao Zhang, David Miller, Eric Dumazet, Willem de Bruijn,
Linux Kernel Network Developers
On Thu, Dec 7, 2017 at 9:20 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2017-12-07 at 08:45 -0800, Tonghao Zhang wrote:
>> In some case, we want to know how many sockets are in use in
>> different _net_ namespaces. It's a key resource metric.
>>
>
> ...
>
>> +static void sock_inuse_add(struct net *net, int val)
>> +{
>> + if (net->core.prot_inuse)
>> + this_cpu_add(*net->core.sock_inuse, val);
>> +}
>
> This is very confusing.
>
> Why testing net->core.prot_inuse for NULL is needed at all ?
>
> Why not testing net->core.sock_inuse instead ?
I bet that is a copy-and-paste error, given that sock_inuse_exit_net()
has a similar typo.
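For reference, here is a userspace sketch of the exit path in which each free is paired with clearing the same pointer. The v5 hunk frees sock_inuse but then sets prot_inuse to NULL. `struct netns_core_model` and `exit_net_model()` are hypothetical names, and malloc()/free() stand in for alloc_percpu()/free_percpu().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the two netns_core members;
 * free() plays the role of free_percpu(). */
struct netns_core_model {
	int *sock_inuse;
	int *prot_inuse;
};

/* Each free() is followed by clearing the same pointer. The v5 hunk
 * freed sock_inuse but then cleared prot_inuse, leaving sock_inuse
 * dangling. free(NULL) is a no-op, so no NULL tests are needed here. */
static void exit_net_model(struct netns_core_model *core)
{
	free(core->prot_inuse);
	core->prot_inuse = NULL;

	free(core->sock_inuse);
	core->sock_inuse = NULL;
}
```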
* Re: [PATCH v5 2/2] sock: Move the socket inuse to namespace.
From: Tonghao Zhang @ 2017-12-08 5:28 UTC
To: Eric Dumazet
Cc: David Miller, Cong Wang, Eric Dumazet, Willem de Bruijn,
Linux Kernel Network Developers
On Fri, Dec 8, 2017 at 1:20 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2017-12-07 at 08:45 -0800, Tonghao Zhang wrote:
>> In some case, we want to know how many sockets are in use in
>> different _net_ namespaces. It's a key resource metric.
>>
>
> ...
>
>> +static void sock_inuse_add(struct net *net, int val)
>> +{
>> + if (net->core.prot_inuse)
>> + this_cpu_add(*net->core.sock_inuse, val);
>> +}
>
> This is very confusing.
>
> Why testing net->core.prot_inuse for NULL is needed at all ?
>
> Why not testing net->core.sock_inuse instead ?
>
Hi Eric and Cong, oh, it's a typo: it should be net->core.sock_inuse there.
Here is why we need to check net->core.sock_inuse at all.
cleanup_net() calls all of the network namespace exit methods, then
rcu_barrier(), and then frees the _net_ namespace:
cleanup_net:
	list_for_each_entry_reverse(ops, &pernet_list, list)
		ops_exit_list(ops, &net_exit_list);
	/* For a netlink sock, deferred_put_nlk_sk() is called here,
	 * but sock_inuse has already been released. */
	rcu_barrier();
	/* Finally it is safe to free my network namespace structure */
	list_for_each_entry_safe(net, tmp, &net_exit_list, exit_list) {}
Releasing a netlink sock created in the kernel (which does not hold a
reference on the _net_ namespace):
netlink_release
	call_rcu(&nlk->rcu, deferred_put_nlk_sk);
deferred_put_nlk_sk
	sk_free(sk);
I may add a comment for sock_inuse_add in v6.
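The ordering problem described above can be modelled in userspace. In this sketch, all names are hypothetical. Freeing the counter stands in for sock_inuse_exit_net() running from ops_exit_list(). The loop stands in for rcu_barrier() flushing pending deferred_put_nlk_sk() callbacks. Each late callback is counted at the point where the kernel would write to freed per-CPU memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace model of one namespace being dismantled. */
struct net_model {
	int *sock_inuse;	/* stands in for the per-CPU counter */
	bool counter_freed;	/* set once the exit method has run */
	int stale_updates;	/* updates that would hit freed memory */
};

/* Stands in for deferred_put_nlk_sk() -> sk_free() -> __sk_free(),
 * which ends up calling sock_inuse_add(net, -1). */
static void deferred_sk_free(struct net_model *net)
{
	if (net->counter_freed) {
		net->stale_updates++;	/* a use-after-free in the kernel */
		return;
	}
	(*net->sock_inuse)--;
}

/* v5 ordering inside cleanup_net(): pernet exit methods run first,
 * then rcu_barrier() flushes the pending netlink callbacks. */
static void cleanup_net_v5_order(struct net_model *net, int pending_cbs)
{
	free(net->sock_inuse);		/* sock_inuse_exit_net() */
	net->sock_inuse = NULL;
	net->counter_freed = true;

	while (pending_cbs-- > 0)	/* rcu_barrier() */
		deferred_sk_free(net);
}
```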
* Re: [PATCH v5 2/2] sock: Move the socket inuse to namespace.
From: Eric Dumazet @ 2017-12-08 5:40 UTC
To: Tonghao Zhang
Cc: David Miller, Cong Wang, Eric Dumazet, Willem de Bruijn,
Linux Kernel Network Developers
On Fri, 2017-12-08 at 13:28 +0800, Tonghao Zhang wrote:
> On Fri, Dec 8, 2017 at 1:20 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Thu, 2017-12-07 at 08:45 -0800, Tonghao Zhang wrote:
> > > In some case, we want to know how many sockets are in use in
> > > different _net_ namespaces. It's a key resource metric.
> > >
> >
> > ...
> >
> > > +static void sock_inuse_add(struct net *net, int val)
> > > +{
> > > + if (net->core.prot_inuse)
> > > + this_cpu_add(*net->core.sock_inuse, val);
> > > +}
> >
> > This is very confusing.
> >
> > Why testing net->core.prot_inuse for NULL is needed at all ?
> >
> > Why not testing net->core.sock_inuse instead ?
> >
>
> Hi Eric and Cong, oh it's a typo. it's net->core.sock_inuse there.
> Why we should check the net->core.sock_inuse
> Now show you the code:
>
> cleanup_net will call all of the network namespace exit methods,
> rcu_barrier, and then remove the _net_ namespace.
>
> cleanup_net:
> list_for_each_entry_reverse(ops, &pernet_list, list)
> ops_exit_list(ops, &net_exit_list);
>
> rcu_barrier(); /* for netlink sock, the ‘deferred_put_nlk_sk’ will
> be called. But sock_inuse has been released. */
That would be a bug.
Please find another way. Ultimately, before net->core.sock_inuse is
freed, we want to check that the inuse count folded over all CPUs is 0,
to make sure we do not have a bug somewhere.
We should not have to test whether net->core.sock_inuse is NULL in
sock_inuse_add(). The pointer must be there all the time.
The freeing should only happen once we are sure sock_inuse_add() cannot
be called anymore.
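The requirement above can be sketched as a teardown helper. It folds the per-CPU slots without clamping and reports a non-zero total before freeing. This is a userspace model: `fold_inuse()`, `release_inuse()` and `NR_CPUS_MODEL` are hypothetical. In the kernel, the report would more likely be a WARN_ON_ONCE() than an fprintf().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define NR_CPUS_MODEL 4	/* hypothetical CPU count */

/* Fold the per-CPU slots into one total, as sock_inuse_get() does,
 * but without clamping: a teardown check wants the raw value. */
static int fold_inuse(const int *slots)
{
	int cpu, sum = 0;

	for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		sum += slots[cpu];
	return sum;
}

/* Free the counter only after checking that it folded back to 0; a
 * non-zero total means some socket was never accounted for. */
static int release_inuse(int *slots)
{
	int sum = fold_inuse(slots);

	if (sum != 0)
		fprintf(stderr, "sock_inuse leak: %d\n", sum);
	free(slots);
	return sum;
}
```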
* Re: [PATCH v5 2/2] sock: Move the socket inuse to namespace.
From: Tonghao Zhang @ 2017-12-08 9:52 UTC
To: Eric Dumazet
Cc: David Miller, Cong Wang, Eric Dumazet, Willem de Bruijn,
Linux Kernel Network Developers
On Fri, Dec 8, 2017 at 1:40 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2017-12-08 at 13:28 +0800, Tonghao Zhang wrote:
>> On Fri, Dec 8, 2017 at 1:20 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > On Thu, 2017-12-07 at 08:45 -0800, Tonghao Zhang wrote:
>> > > In some case, we want to know how many sockets are in use in
>> > > different _net_ namespaces. It's a key resource metric.
>> > >
>> >
>> > ...
>> >
>> > > +static void sock_inuse_add(struct net *net, int val)
>> > > +{
>> > > + if (net->core.prot_inuse)
>> > > + this_cpu_add(*net->core.sock_inuse, val);
>> > > +}
>> >
>> > This is very confusing.
>> >
>> > Why testing net->core.prot_inuse for NULL is needed at all ?
>> >
>> > Why not testing net->core.sock_inuse instead ?
>> >
>>
>> Hi Eric and Cong, oh it's a typo. it's net->core.sock_inuse there.
>> Why we should check the net->core.sock_inuse
>> Now show you the code:
>>
>> cleanup_net will call all of the network namespace exit methods,
>> rcu_barrier, and then remove the _net_ namespace.
>>
>> cleanup_net:
>> list_for_each_entry_reverse(ops, &pernet_list, list)
>> ops_exit_list(ops, &net_exit_list);
>>
>> rcu_barrier(); /* for netlink sock, the ‘deferred_put_nlk_sk’ will
>> be called. But sock_inuse has been released. */
>
>
> Thats would be a bug.
>
> Please find another way, but we want ultimately to check that before
> net->core.sock_inuse is freed, folding the inuse count on all cpus is
> 0, to make sure we do not have a bug somewhere.
Yes, I am aware of this issue when the network namespace is being destroyed.
Alternatively, we could count socket-inuse in sock_alloc() and sock_release().
In that case, we would have to take another reference on the network
namespace (via get_net()), even though the sock may already hold one.
What do you think of this idea?
* Re: [PATCH v5 2/2] sock: Move the socket inuse to namespace.
From: Tonghao Zhang @ 2017-12-08 11:29 UTC
To: Eric Dumazet
Cc: David Miller, Cong Wang, Eric Dumazet, Willem de Bruijn,
Linux Kernel Network Developers
Hi all. We can add synchronize_rcu() and rcu_barrier() in sock_inuse_exit_net()
to ensure there are no outstanding RCU callbacks using this network namespace.
Then we will not have to test whether net->core.sock_inuse is NULL in
sock_inuse_add(). :)
static void __net_exit sock_inuse_exit_net(struct net *net)
{
free_percpu(net->core.prot_inuse);
+
+ synchronize_rcu();
+ rcu_barrier();
+
+ free_percpu(net->core.sock_inuse);
}
* Re: [PATCH v5 2/2] sock: Move the socket inuse to namespace.
From: Eric Dumazet @ 2017-12-08 13:24 UTC
To: Tonghao Zhang
Cc: David Miller, Cong Wang, Eric Dumazet, Willem de Bruijn,
Linux Kernel Network Developers
On Fri, 2017-12-08 at 19:29 +0800, Tonghao Zhang wrote:
> hi all. we can add synchronize_rcu and rcu_barrier in sock_inuse_exit_net to
> ensure there are no outstanding rcu callbacks using this network namespace.
> we will not have to test if net->core.sock_inuse is NULL or not from
> sock_inuse_add(). :)
>
> static void __net_exit sock_inuse_exit_net(struct net *net)
> {
> free_percpu(net->core.prot_inuse);
> +
> + synchronize_rcu();
> + rcu_barrier();
> +
> + free_percpu(net->core.sock_inuse);
> }
Oh well. Do you have any idea of the major problem this would add?
Try the following, before and after your patches:
for i in `seq 1 40`
do
(for j in `seq 1 100` ; do unshare -n /bin/true >/dev/null ; done) &
done
wait
( Check commit 8ca712c373a462cfa1b62272870b6c2c74aa83f9 )
This is a complex problem. We won't accept patches that kill network
namespace dismantling performance by adding brute-force
synchronize_rcu() or rcu_barrier() calls.
Why not free net->core.sock_inuse right before freeing net itself in
net_free()?
You do not have to hijack sock_inuse_exit_net() just because it has a
misleading name.
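This suggestion ties the counter's lifetime to struct net itself. Here is a minimal userspace sketch, assuming calloc()/free() in place of the per-CPU allocator. `struct net_model`, `net_alloc_model()`, `sock_inuse_add_model()` and `net_free_model()` are hypothetical stand-ins for net_alloc(), sock_inuse_add() and net_free().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model: the counter lives exactly as long as struct net. */
struct net_model {
	int *sock_inuse;
};

/* Allocate the counter together with the namespace (cf. net_alloc()). */
static struct net_model *net_alloc_model(void)
{
	struct net_model *net = calloc(1, sizeof(*net));

	if (!net)
		return NULL;
	net->sock_inuse = calloc(1, sizeof(int));
	if (!net->sock_inuse) {
		free(net);
		return NULL;
	}
	return net;
}

/* No NULL test: the pointer is valid for the namespace's whole life. */
static void sock_inuse_add_model(struct net_model *net, int val)
{
	*net->sock_inuse += val;
}

/* Free the counter only where struct net itself is freed (cf.
 * net_free()), i.e. after exit methods and rcu_barrier() have run. */
static void net_free_model(struct net_model *net)
{
	free(net->sock_inuse);
	free(net);
}
```

With this layout, a deferred sk_free() that runs during rcu_barrier() still finds valid memory, because the counter is released only afterwards.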
* Re: [PATCH v5 2/2] sock: Move the socket inuse to namespace.
From: Cong Wang @ 2017-12-08 22:09 UTC
To: Tonghao Zhang
Cc: Eric Dumazet, David Miller, Eric Dumazet, Willem de Bruijn,
Linux Kernel Network Developers
On Thu, Dec 7, 2017 at 9:28 PM, Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>
> Release the netlink sock created in kernel(not hold the _net_ namespace):
>
You can avoid counting kernel sock by testing 'kern' in sk_alloc()
and testing 'sk->sk_net_refcnt' in __sk_free().
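The suggestion above can be sketched in userspace. Sockets created with `kern` set are neither counted at allocation nor uncounted at free, so a late free of a kernel socket never touches the counter. `struct sock_model`, `sk_alloc_model()` and `sk_free_model()` are hypothetical. Only the `kern` test and the `sk_net_refcnt` field name come from the thread. Setting `sk_net_refcnt = kern ? 0 : 1` is assumed here to mirror how sk_alloc() sets that field.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; only the field name mirrors the kernel. */
struct sock_model {
	int sk_net_refcnt;	/* 1 for user sockets, 0 for kernel sockets */
};

static int sock_inuse;	/* single counter standing in for the per-CPU one */

/* Assumed to mirror sk_alloc(): kernel sockets take no netns reference,
 * and with this change they are not counted either. */
static void sk_alloc_model(struct sock_model *sk, bool kern)
{
	sk->sk_net_refcnt = kern ? 0 : 1;
	if (!kern)
		sock_inuse++;
}

/* In __sk_free(), undo the count only for sockets that were counted. */
static void sk_free_model(struct sock_model *sk)
{
	if (sk->sk_net_refcnt)
		sock_inuse--;
}
```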
* Re: [PATCH v5 2/2] sock: Move the socket inuse to namespace.
From: Tonghao Zhang @ 2017-12-09 5:25 UTC
To: Eric Dumazet
Cc: David Miller, Cong Wang, Eric Dumazet, Willem de Bruijn,
Linux Kernel Network Developers
On Fri, Dec 8, 2017 at 9:24 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2017-12-08 at 19:29 +0800, Tonghao Zhang wrote:
>> hi all. we can add synchronize_rcu and rcu_barrier in sock_inuse_exit_net to
>> ensure there are no outstanding rcu callbacks using this network namespace.
>> we will not have to test if net->core.sock_inuse is NULL or not from
>> sock_inuse_add(). :)
>>
>> static void __net_exit sock_inuse_exit_net(struct net *net)
>> {
>> free_percpu(net->core.prot_inuse);
>> +
>> + synchronize_rcu();
>> + rcu_barrier();
>> +
>> + free_percpu(net->core.sock_inuse);
>> }
>
>
> Oh well. Do you have any idea of the major problem this would add ?
>
> Try the following, before and after your patches :
>
> for i in `seq 1 40`
> do
> (for j in `seq 1 100` ; do unshare -n /bin/true >/dev/null ; done) &
> done
> wait
>
> ( Check commit 8ca712c373a462cfa1b62272870b6c2c74aa83f9 )
>
Yes, I ran the test. With these changes, performance drops.
Before the patch:
# time ./add_del_unshare.sh
net_namespace 97 125 6016 5 8 : tunables 0 0 0 : slabdata 25 25 0
real 8m19.665s
user 0m4.268s
sys 0m6.477s
After:
# time ./add_del_unshare.sh
net_namespace 102 130 6016 5 8 : tunables 0 0 0 : slabdata 26 26 0
real 8m52.563s
user 0m4.040s
sys 0m7.558s
>
> This is a complex problem, we wont accept patches that kill network
> namespaces dismantling performance by adding brute force
> synchronize_rcu() or rcu_barrier() calls.
>
> Why not freeing net->core.sock_inuse right before feeing net itself in
> net_free() ?
I tried this approach: allocate core.sock_inuse in net_alloc() and free
it in net_free(). It does not reduce performance, and we no longer need
to check core.sock_inuse in sock_inuse_add().
After:
# time ./add_del_unshare.sh
net_namespace 109 135 6016 5 8 : tunables 0 0 0 : slabdata 27 27 0
real 8m19.265s
user 0m4.090s
sys 0m8.185s
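Converting the reported `real` (wall-clock) times to seconds gives the relative cost of each variant. This is plain arithmetic on the figures above. Each figure appears to be a single run, so a difference of a fraction of a second is likely within noise.

```latex
\begin{aligned}
t_{\text{before}}   &= 8\,\text{min}\;19.665\,\text{s} = 499.665\,\text{s}\\
t_{\text{sync}}     &= 8\,\text{min}\;52.563\,\text{s} = 532.563\,\text{s}\\
t_{\text{net\_free}} &= 8\,\text{min}\;19.265\,\text{s} = 499.265\,\text{s}\\
\frac{t_{\text{sync}} - t_{\text{before}}}{t_{\text{before}}} &= \frac{32.898}{499.665} \approx 6.6\,\%\\
t_{\text{net\_free}} - t_{\text{before}} &= -0.400\,\text{s}
\end{aligned}
```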
* Re: [PATCH v5 2/2] sock: Move the socket inuse to namespace.
From: Tonghao Zhang @ 2017-12-09 5:27 UTC
To: Cong Wang
Cc: Eric Dumazet, David Miller, Eric Dumazet, Willem de Bruijn,
Linux Kernel Network Developers
On Sat, Dec 9, 2017 at 6:09 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Thu, Dec 7, 2017 at 9:28 PM, Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>>
>> Release the netlink sock created in kernel(not hold the _net_ namespace):
>>
>
> You can avoid counting kernel sock by testing 'kern' in sk_alloc()
> and testing 'sk->sk_net_refcnt' in __sk_free().
Hi Cong, if we do it this way, we will not count the socks created
in the kernel, right?
* Re: [PATCH v5 2/2] sock: Move the socket inuse to namespace.
From: Cong Wang @ 2017-12-09 19:42 UTC
To: Tonghao Zhang
Cc: Eric Dumazet, David Miller, Eric Dumazet, Willem de Bruijn,
Linux Kernel Network Developers
On Fri, Dec 8, 2017 at 9:27 PM, Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> On Sat, Dec 9, 2017 at 6:09 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> On Thu, Dec 7, 2017 at 9:28 PM, Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>>>
>>> Release the netlink sock created in kernel(not hold the _net_ namespace):
>>>
>>
>> You can avoid counting kernel sock by testing 'kern' in sk_alloc()
>> and testing 'sk->sk_net_refcnt' in __sk_free().
> Hi cong, if we do it in this way, we will not counter the sock created
> in kernel, right ?
Yes, it is not very useful for user-space to know how many kernel
sockets we create, IMHO, so not counting kernel sockets seems
fine.
Thread overview: 13+ messages
2017-12-07 16:45 [PATCH v5 1/2] sock: Change the netns_core member name Tonghao Zhang
2017-12-07 16:45 ` [PATCH v5 2/2] sock: Move the socket inuse to namespace Tonghao Zhang
2017-12-07 17:20 ` Eric Dumazet
2017-12-07 21:28 ` Cong Wang
2017-12-08 5:28 ` Tonghao Zhang
2017-12-08 5:40 ` Eric Dumazet
2017-12-08 9:52 ` Tonghao Zhang
2017-12-08 11:29 ` Tonghao Zhang
2017-12-08 13:24 ` Eric Dumazet
2017-12-09 5:25 ` Tonghao Zhang
2017-12-08 22:09 ` Cong Wang
2017-12-09 5:27 ` Tonghao Zhang
2017-12-09 19:42 ` Cong Wang