* [PATCH v5 1/2] sock: Change the netns_core member name.
From: Tonghao Zhang @ 2017-12-07 16:45 UTC
To: davem, xiyou.wangcong, edumazet, willemb; +Cc: netdev, Tonghao Zhang

Changing the member name makes the code more readable.
This rename is used by the next patch.

Signed-off-by: Martin Zhang <zhangjunweimartin@didichuxing.com>
Signed-off-by: Tonghao Zhang <zhangtonghao@didichuxing.com>
---
 include/net/netns/core.h |  2 +-
 net/core/sock.c          | 10 +++++-----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/net/netns/core.h b/include/net/netns/core.h
index 0ad4d0c..45cfb5d 100644
--- a/include/net/netns/core.h
+++ b/include/net/netns/core.h
@@ -11,7 +11,7 @@ struct netns_core {
 
 	int	sysctl_somaxconn;
 
-	struct prot_inuse __percpu *inuse;
+	struct prot_inuse __percpu *prot_inuse;
 };
 
 #endif
diff --git a/net/core/sock.c b/net/core/sock.c
index c0b5b2f..c2dd2d3 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3045,7 +3045,7 @@ struct prot_inuse {
 
 void sock_prot_inuse_add(struct net *net, struct proto *prot, int val)
 {
-	__this_cpu_add(net->core.inuse->val[prot->inuse_idx], val);
+	__this_cpu_add(net->core.prot_inuse->val[prot->inuse_idx], val);
 }
 EXPORT_SYMBOL_GPL(sock_prot_inuse_add);
 
@@ -3055,7 +3055,7 @@ int sock_prot_inuse_get(struct net *net, struct proto *prot)
 	int res = 0;
 
 	for_each_possible_cpu(cpu)
-		res += per_cpu_ptr(net->core.inuse, cpu)->val[idx];
+		res += per_cpu_ptr(net->core.prot_inuse, cpu)->val[idx];
 
 	return res >= 0 ? res : 0;
 }
@@ -3063,13 +3063,13 @@ int sock_prot_inuse_get(struct net *net, struct proto *prot)
 
 static int __net_init sock_inuse_init_net(struct net *net)
 {
-	net->core.inuse = alloc_percpu(struct prot_inuse);
-	return net->core.inuse ? 0 : -ENOMEM;
+	net->core.prot_inuse = alloc_percpu(struct prot_inuse);
+	return net->core.prot_inuse ? 0 : -ENOMEM;
 }
 
 static void __net_exit sock_inuse_exit_net(struct net *net)
 {
-	free_percpu(net->core.inuse);
+	free_percpu(net->core.prot_inuse);
 }
 
 static struct pernet_operations net_inuse_ops = {
-- 
1.8.3.1

* [PATCH v5 2/2] sock: Move the socket inuse to namespace.
From: Tonghao Zhang @ 2017-12-07 16:45 UTC
To: davem, xiyou.wangcong, edumazet, willemb; +Cc: netdev, Tonghao Zhang

In some cases, we want to know how many sockets are in use in
different _net_ namespaces. It's a key resource metric.

This patch adds a member to struct netns_core: a counter of the
sockets in use in the _net_ namespace. The counter is incremented
in sk_alloc and sk_clone_lock and decremented in __sk_free.

The main reasons for counting there are:

1. When Linux calls 'do_exit' for a process that is exiting, the
functions 'exit_task_namespaces' and 'exit_task_work' are called
sequentially. 'exit_task_namespaces' may already have destroyed the
_net_ namespace, but 'sock_release', called in 'exit_task_work', may
still use the _net_ namespace if we count socket-inuse in
sock_release.

2. socket and sock come in pairs. More importantly, sock holds the
_net_ namespace. We count socket-inuse in sock to avoid holding the
_net_ namespace again in socket. It's an easy way to maintain the
code.

Signed-off-by: Martin Zhang <zhangjunweimartin@didichuxing.com>
Signed-off-by: Tonghao Zhang <zhangtonghao@didichuxing.com>
---
 include/net/netns/core.h |  1 +
 include/net/sock.h       |  1 +
 net/core/sock.c          | 52 ++++++++++++++++++++++++++++++++++++++++++++++--
 net/socket.c             | 21 ++-----------------
 4 files changed, 54 insertions(+), 21 deletions(-)

diff --git a/include/net/netns/core.h b/include/net/netns/core.h
index 45cfb5d..d1b4748f 100644
--- a/include/net/netns/core.h
+++ b/include/net/netns/core.h
@@ -11,6 +11,7 @@ struct netns_core {
 
 	int	sysctl_somaxconn;
 
+	int __percpu *sock_inuse;
 	struct prot_inuse __percpu *prot_inuse;
 };
 
diff --git a/include/net/sock.h b/include/net/sock.h
index 79e1a2c..0809b31 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1266,6 +1266,7 @@ static inline void sk_sockets_allocated_inc(struct sock *sk)
 /* Called with local bh disabled */
 void sock_prot_inuse_add(struct net *net, struct proto *prot, int inc);
 int sock_prot_inuse_get(struct net *net, struct proto *proto);
+int sock_inuse_get(struct net *net);
 #else
 static inline void sock_prot_inuse_add(struct net *net, struct proto *prot,
 		int inc)
diff --git a/net/core/sock.c b/net/core/sock.c
index c2dd2d3..a11680a 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -145,6 +145,8 @@
 static DEFINE_MUTEX(proto_list_mutex);
 static LIST_HEAD(proto_list);
 
+static void sock_inuse_add(struct net *net, int val);
+
 /**
  * sk_ns_capable - General socket capability test
  * @sk: Socket to use a capability on or through
@@ -1534,6 +1536,7 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 		if (likely(sk->sk_net_refcnt))
 			get_net(net);
 		sock_net_set(sk, net);
+		sock_inuse_add(net, 1);
 		refcount_set(&sk->sk_wmem_alloc, 1);
 
 		mem_cgroup_sk_alloc(sk);
@@ -1595,6 +1598,8 @@ void sk_destruct(struct sock *sk)
 
 static void __sk_free(struct sock *sk)
 {
+	sock_inuse_add(sock_net(sk), -1);
+
 	if (unlikely(sock_diag_has_destroy_listeners(sk) && sk->sk_net_refcnt))
 		sock_diag_broadcast_destroy(sk);
 	else
@@ -1716,6 +1721,7 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 		newsk->sk_priority = 0;
 		newsk->sk_incoming_cpu = raw_smp_processor_id();
 		atomic64_set(&newsk->sk_cookie, 0);
+		sock_inuse_add(sock_net(newsk), 1);
 
 		/*
 		 * Before updating sk_refcnt, we must commit prior changes to memory
@@ -3061,15 +3067,53 @@ int sock_prot_inuse_get(struct net *net, struct proto *prot)
 }
 EXPORT_SYMBOL_GPL(sock_prot_inuse_get);
 
+static void sock_inuse_add(struct net *net, int val)
+{
+	if (net->core.prot_inuse)
+		this_cpu_add(*net->core.sock_inuse, val);
+}
+
+int sock_inuse_get(struct net *net)
+{
+	int cpu, res = 0;
+
+	if (!net->core.prot_inuse)
+		return 0;
+
+	for_each_possible_cpu(cpu)
+		res += *per_cpu_ptr(net->core.sock_inuse, cpu);
+
+	return res >= 0 ? res : 0;
+}
+EXPORT_SYMBOL_GPL(sock_inuse_get);
+
 static int __net_init sock_inuse_init_net(struct net *net)
 {
 	net->core.prot_inuse = alloc_percpu(struct prot_inuse);
-	return net->core.prot_inuse ? 0 : -ENOMEM;
+	if (!net->core.prot_inuse)
+		return -ENOMEM;
+
+	net->core.sock_inuse = alloc_percpu(int);
+	if (!net->core.sock_inuse)
+		goto out;
+
+	return 0;
+out:
+	free_percpu(net->core.prot_inuse);
+	return -ENOMEM;
 }
 
 static void __net_exit sock_inuse_exit_net(struct net *net)
 {
-	free_percpu(net->core.prot_inuse);
+	if (net->core.prot_inuse) {
+		free_percpu(net->core.prot_inuse);
+		net->core.prot_inuse = NULL;
+	}
+
+	if (net->core.sock_inuse) {
+		free_percpu(net->core.sock_inuse);
+		net->core.prot_inuse = NULL;
+	}
 }
 
 static struct pernet_operations net_inuse_ops = {
@@ -3112,6 +3156,10 @@ static inline void assign_proto_idx(struct proto *prot)
 static inline void release_proto_idx(struct proto *prot)
 {
 }
+
+static void sock_inuse_add(struct net *net, int val)
+{
+}
 #endif
 
 static void req_prot_cleanup(struct request_sock_ops *rsk_prot)
diff --git a/net/socket.c b/net/socket.c
index 42d8e9c..183de8f01 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -163,12 +163,6 @@ static ssize_t sock_splice_read(struct file *file, loff_t *ppos,
 static const struct net_proto_family __rcu *net_families[NPROTO] __read_mostly;
 
 /*
- * Statistics counters of the socket lists
- */
-
-static DEFINE_PER_CPU(int, sockets_in_use);
-
-/*
  * Support routines.
  * Move socket addresses back and forth across the kernel/user
  * divide and look after the messy bits.
@@ -574,7 +568,6 @@ struct socket *sock_alloc(void)
 	inode->i_gid = current_fsgid();
 	inode->i_op = &sockfs_inode_ops;
 
-	this_cpu_add(sockets_in_use, 1);
 	return sock;
 }
 EXPORT_SYMBOL(sock_alloc);
@@ -601,7 +594,6 @@ void sock_release(struct socket *sock)
 	if (rcu_dereference_protected(sock->wq, 1)->fasync_list)
 		pr_err("%s: fasync list not empty!\n", __func__);
 
-	this_cpu_sub(sockets_in_use, 1);
 	if (!sock->file) {
 		iput(SOCK_INODE(sock));
 		return;
@@ -2644,17 +2636,8 @@ static int __init sock_init(void)
 #ifdef CONFIG_PROC_FS
 void socket_seq_show(struct seq_file *seq)
 {
-	int cpu;
-	int counter = 0;
-
-	for_each_possible_cpu(cpu)
-		counter += per_cpu(sockets_in_use, cpu);
-
-	/* It can be negative, by the way. 8) */
-	if (counter < 0)
-		counter = 0;
-
-	seq_printf(seq, "sockets: used %d\n", counter);
+	seq_printf(seq, "sockets: used %d\n",
+		   sock_inuse_get(seq->private));
 }
 #endif				/* CONFIG_PROC_FS */
 
-- 
1.8.3.1

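The counting scheme in the patch above can be modelled in user space. What follows is only a minimal sketch, not kernel code: `NR_CPUS`, `struct percpu_int`, `counter_add` and `counter_get` are hypothetical names, and a plain array indexed by an explicit `cpu` argument stands in for the kernel's per-CPU storage. It illustrates why a single slot may go negative (a socket can be counted on one CPU and released on another) while the folded sum remains meaningful.

```c
#include <assert.h>

#define NR_CPUS 4	/* hypothetical CPU count for this model */

/* Stand-in for an "int __percpu *": one slot per CPU. */
struct percpu_int {
	int slot[NR_CPUS];
};

/* Models sock_inuse_add(): only the slot of the given CPU is touched. */
static void counter_add(struct percpu_int *c, int cpu, int val)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	c->slot[cpu] += val;
}

/* Models sock_inuse_get(): fold every slot, then clamp at zero. */
static int counter_get(const struct percpu_int *c)
{
	int cpu, res = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		res += c->slot[cpu];

	return res >= 0 ? res : 0;
}
```

The clamp mirrors the `res >= 0 ? res : 0` line in the patch. In this single-threaded model it never triggers when additions and subtractions are balanced; in the kernel it presumably guards readers that fold the slots while updates are still in flight.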
* Re: [PATCH v5 2/2] sock: Move the socket inuse to namespace.
From: Eric Dumazet @ 2017-12-07 17:20 UTC
To: Tonghao Zhang, davem, xiyou.wangcong, edumazet, willemb; +Cc: netdev

On Thu, 2017-12-07 at 08:45 -0800, Tonghao Zhang wrote:
> In some cases, we want to know how many sockets are in use in
> different _net_ namespaces. It's a key resource metric.
>

...

> +static void sock_inuse_add(struct net *net, int val)
> +{
> +	if (net->core.prot_inuse)
> +		this_cpu_add(*net->core.sock_inuse, val);
> +}

This is very confusing.

Why is testing net->core.prot_inuse for NULL needed at all?

Why not test net->core.sock_inuse instead?

* Re: [PATCH v5 2/2] sock: Move the socket inuse to namespace.
From: Cong Wang @ 2017-12-07 21:28 UTC
To: Eric Dumazet
Cc: Tonghao Zhang, David Miller, Eric Dumazet, Willem de Bruijn,
    Linux Kernel Network Developers

On Thu, Dec 7, 2017 at 9:20 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2017-12-07 at 08:45 -0800, Tonghao Zhang wrote:
>> In some cases, we want to know how many sockets are in use in
>> different _net_ namespaces. It's a key resource metric.
>>
>
> ...
>
>> +static void sock_inuse_add(struct net *net, int val)
>> +{
>> +	if (net->core.prot_inuse)
>> +		this_cpu_add(*net->core.sock_inuse, val);
>> +}
>
> This is very confusing.
>
> Why is testing net->core.prot_inuse for NULL needed at all?
>
> Why not test net->core.sock_inuse instead?

I bet that is a copy-and-paste error, given that sock_inuse_exit_net()
has a similar typo.

* Re: [PATCH v5 2/2] sock: Move the socket inuse to namespace.
From: Tonghao Zhang @ 2017-12-08 5:28 UTC
To: Eric Dumazet
Cc: David Miller, Cong Wang, Eric Dumazet, Willem de Bruijn,
    Linux Kernel Network Developers

On Fri, Dec 8, 2017 at 1:20 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2017-12-07 at 08:45 -0800, Tonghao Zhang wrote:
>> In some cases, we want to know how many sockets are in use in
>> different _net_ namespaces. It's a key resource metric.
>>
>
> ...
>
>> +static void sock_inuse_add(struct net *net, int val)
>> +{
>> +	if (net->core.prot_inuse)
>> +		this_cpu_add(*net->core.sock_inuse, val);
>> +}
>
> This is very confusing.
>
> Why is testing net->core.prot_inuse for NULL needed at all?
>
> Why not test net->core.sock_inuse instead?
>

Hi Eric and Cong, oh, it's a typo; it should be net->core.sock_inuse
there. As for why we should check net->core.sock_inuse, here is the
code:

cleanup_net will call all of the network namespace exit methods,
rcu_barrier, and then remove the _net_ namespace.

cleanup_net:
	list_for_each_entry_reverse(ops, &pernet_list, list)
		ops_exit_list(ops, &net_exit_list);

	rcu_barrier(); /* for netlink sock, the ‘deferred_put_nlk_sk’ will
			  be called. But sock_inuse has been released. */

	/* Finally it is safe to free my network namespace structure */
	list_for_each_entry_safe(net, tmp, &net_exit_list, exit_list) {}

Release the netlink sock created in kernel (not holding the _net_
namespace):

netlink_release
	call_rcu(&nlk->rcu, deferred_put_nlk_sk);

deferred_put_nlk_sk
	sk_free(sk);

I may add a comment for sock_inuse_add in v6.

* Re: [PATCH v5 2/2] sock: Move the socket inuse to namespace.
From: Eric Dumazet @ 2017-12-08 5:40 UTC
To: Tonghao Zhang
Cc: David Miller, Cong Wang, Eric Dumazet, Willem de Bruijn,
    Linux Kernel Network Developers

On Fri, 2017-12-08 at 13:28 +0800, Tonghao Zhang wrote:
> On Fri, Dec 8, 2017 at 1:20 AM, Eric Dumazet <eric.dumazet@gmail.com>
> wrote:
> > On Thu, 2017-12-07 at 08:45 -0800, Tonghao Zhang wrote:
> > > In some cases, we want to know how many sockets are in use in
> > > different _net_ namespaces. It's a key resource metric.
> > >
> >
> > ...
> >
> > > +static void sock_inuse_add(struct net *net, int val)
> > > +{
> > > +	if (net->core.prot_inuse)
> > > +		this_cpu_add(*net->core.sock_inuse, val);
> > > +}
> >
> > This is very confusing.
> >
> > Why is testing net->core.prot_inuse for NULL needed at all?
> >
> > Why not test net->core.sock_inuse instead?
> >
>
> Hi Eric and Cong, oh, it's a typo; it should be net->core.sock_inuse
> there. As for why we should check net->core.sock_inuse, here is the
> code:
>
> cleanup_net will call all of the network namespace exit methods,
> rcu_barrier, and then remove the _net_ namespace.
>
> cleanup_net:
>	list_for_each_entry_reverse(ops, &pernet_list, list)
>		ops_exit_list(ops, &net_exit_list);
>
>	rcu_barrier(); /* for netlink sock, the ‘deferred_put_nlk_sk’ will
>			  be called. But sock_inuse has been released. */

That would be a bug.

Please find another way, but ultimately we want to check that, before
net->core.sock_inuse is freed, folding the inuse count on all cpus
gives 0, to make sure we do not have a bug somewhere.

We should not have to test whether net->core.sock_inuse is NULL from
sock_inuse_add(). The pointer must be there all the time.

The freeing should only happen once we are sure sock_inuse_add() can
not be called anymore.

>
>	/* Finally it is safe to free my network namespace structure */
>	list_for_each_entry_safe(net, tmp, &net_exit_list, exit_list) {}
>
> Release the netlink sock created in kernel (not holding the _net_
> namespace):
>
> netlink_release
>	call_rcu(&nlk->rcu, deferred_put_nlk_sk);
>
> deferred_put_nlk_sk
>	sk_free(sk);
>
> I may add a comment for sock_inuse_add in v6.

* Re: [PATCH v5 2/2] sock: Move the socket inuse to namespace.
From: Tonghao Zhang @ 2017-12-08 9:52 UTC
To: Eric Dumazet
Cc: David Miller, Cong Wang, Eric Dumazet, Willem de Bruijn,
    Linux Kernel Network Developers

On Fri, Dec 8, 2017 at 1:40 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2017-12-08 at 13:28 +0800, Tonghao Zhang wrote:
>> On Fri, Dec 8, 2017 at 1:20 AM, Eric Dumazet <eric.dumazet@gmail.com>
>> wrote:
>> > On Thu, 2017-12-07 at 08:45 -0800, Tonghao Zhang wrote:
>> > > In some cases, we want to know how many sockets are in use in
>> > > different _net_ namespaces. It's a key resource metric.
>> > >
>> >
>> > ...
>> >
>> > > +static void sock_inuse_add(struct net *net, int val)
>> > > +{
>> > > +	if (net->core.prot_inuse)
>> > > +		this_cpu_add(*net->core.sock_inuse, val);
>> > > +}
>> >
>> > This is very confusing.
>> >
>> > Why is testing net->core.prot_inuse for NULL needed at all?
>> >
>> > Why not test net->core.sock_inuse instead?
>> >
>>
>> Hi Eric and Cong, oh, it's a typo; it should be net->core.sock_inuse
>> there. As for why we should check net->core.sock_inuse, here is the
>> code:
>>
>> cleanup_net will call all of the network namespace exit methods,
>> rcu_barrier, and then remove the _net_ namespace.
>>
>> cleanup_net:
>>	list_for_each_entry_reverse(ops, &pernet_list, list)
>>		ops_exit_list(ops, &net_exit_list);
>>
>>	rcu_barrier(); /* for netlink sock, the ‘deferred_put_nlk_sk’ will
>>			  be called. But sock_inuse has been released. */
>
>
> That would be a bug.
>
> Please find another way, but ultimately we want to check that, before
> net->core.sock_inuse is freed, folding the inuse count on all cpus
> gives 0, to make sure we do not have a bug somewhere.

Yes, I am aware of this issue, even though we are about to destroy the
network namespace. By the way, we could count socket-inuse in
sock_alloc and sock_release instead. That way, we would have to hold
the network namespace again (via get_net()) while the sock may already
hold it.

What do you think of this idea?

> We should not have to test whether net->core.sock_inuse is NULL from
> sock_inuse_add(). The pointer must be there all the time.
>
> The freeing should only happen once we are sure sock_inuse_add() can
> not be called anymore.
>
>>
>>	/* Finally it is safe to free my network namespace structure */
>>	list_for_each_entry_safe(net, tmp, &net_exit_list, exit_list) {}
>>
>> Release the netlink sock created in kernel (not holding the _net_
>> namespace):
>>
>> netlink_release
>>	call_rcu(&nlk->rcu, deferred_put_nlk_sk);
>>
>> deferred_put_nlk_sk
>>	sk_free(sk);
>>
>> I may add a comment for sock_inuse_add in v6.
>
>

* Re: [PATCH v5 2/2] sock: Move the socket inuse to namespace.
From: Tonghao Zhang @ 2017-12-08 11:29 UTC
To: Eric Dumazet
Cc: David Miller, Cong Wang, Eric Dumazet, Willem de Bruijn,
    Linux Kernel Network Developers

Hi all. We can add synchronize_rcu and rcu_barrier in
sock_inuse_exit_net to ensure there are no outstanding RCU callbacks
using this network namespace. Then we will not have to test whether
net->core.sock_inuse is NULL in sock_inuse_add(). :)

 static void __net_exit sock_inuse_exit_net(struct net *net)
 {
 	free_percpu(net->core.prot_inuse);
+
+	synchronize_rcu();
+	rcu_barrier();
+
+	free_percpu(net->core.sock_inuse);
 }

On Fri, Dec 8, 2017 at 5:52 PM, Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> On Fri, Dec 8, 2017 at 1:40 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Fri, 2017-12-08 at 13:28 +0800, Tonghao Zhang wrote:
>>> On Fri, Dec 8, 2017 at 1:20 AM, Eric Dumazet <eric.dumazet@gmail.com>
>>> wrote:
>>> > On Thu, 2017-12-07 at 08:45 -0800, Tonghao Zhang wrote:
>>> > > In some cases, we want to know how many sockets are in use in
>>> > > different _net_ namespaces. It's a key resource metric.
>>> > >
>>> >
>>> > ...
>>> >
>>> > > +static void sock_inuse_add(struct net *net, int val)
>>> > > +{
>>> > > +	if (net->core.prot_inuse)
>>> > > +		this_cpu_add(*net->core.sock_inuse, val);
>>> > > +}
>>> >
>>> > This is very confusing.
>>> >
>>> > Why is testing net->core.prot_inuse for NULL needed at all?
>>> >
>>> > Why not test net->core.sock_inuse instead?
>>> >
>>>
>>> Hi Eric and Cong, oh, it's a typo; it should be net->core.sock_inuse
>>> there. As for why we should check net->core.sock_inuse, here is the
>>> code:
>>>
>>> cleanup_net will call all of the network namespace exit methods,
>>> rcu_barrier, and then remove the _net_ namespace.
>>>
>>> cleanup_net:
>>>	list_for_each_entry_reverse(ops, &pernet_list, list)
>>>		ops_exit_list(ops, &net_exit_list);
>>>
>>>	rcu_barrier(); /* for netlink sock, the ‘deferred_put_nlk_sk’ will
>>>			  be called. But sock_inuse has been released. */
>>
>>
>> That would be a bug.
>>
>> Please find another way, but ultimately we want to check that, before
>> net->core.sock_inuse is freed, folding the inuse count on all cpus
>> gives 0, to make sure we do not have a bug somewhere.
>
> Yes, I am aware of this issue, even though we are about to destroy the
> network namespace. By the way, we could count socket-inuse in
> sock_alloc and sock_release instead. That way, we would have to hold
> the network namespace again (via get_net()) while the sock may already
> hold it.
>
> What do you think of this idea?
>
>> We should not have to test whether net->core.sock_inuse is NULL from
>> sock_inuse_add(). The pointer must be there all the time.
>>
>> The freeing should only happen once we are sure sock_inuse_add() can
>> not be called anymore.
>>
>>>
>>>	/* Finally it is safe to free my network namespace structure */
>>>	list_for_each_entry_safe(net, tmp, &net_exit_list, exit_list) {}
>>>
>>> Release the netlink sock created in kernel (not holding the _net_
>>> namespace):
>>>
>>> netlink_release
>>>	call_rcu(&nlk->rcu, deferred_put_nlk_sk);
>>>
>>> deferred_put_nlk_sk
>>>	sk_free(sk);
>>>
>>> I may add a comment for sock_inuse_add in v6.
>>
>>

* Re: [PATCH v5 2/2] sock: Move the socket inuse to namespace.
From: Eric Dumazet @ 2017-12-08 13:24 UTC
To: Tonghao Zhang
Cc: David Miller, Cong Wang, Eric Dumazet, Willem de Bruijn,
    Linux Kernel Network Developers

On Fri, 2017-12-08 at 19:29 +0800, Tonghao Zhang wrote:
> Hi all. We can add synchronize_rcu and rcu_barrier in
> sock_inuse_exit_net to ensure there are no outstanding RCU callbacks
> using this network namespace. Then we will not have to test whether
> net->core.sock_inuse is NULL in sock_inuse_add(). :)
>
>  static void __net_exit sock_inuse_exit_net(struct net *net)
>  {
>  	free_percpu(net->core.prot_inuse);
> +
> +	synchronize_rcu();
> +	rcu_barrier();
> +
> +	free_percpu(net->core.sock_inuse);
>  }


Oh well. Do you have any idea of the major problem this would add?

Try the following, before and after your patches:

for i in `seq 1 40`
do
 (for j in `seq 1 100` ; do unshare -n /bin/true >/dev/null ; done) &
done
wait

( Check commit 8ca712c373a462cfa1b62272870b6c2c74aa83f9 )

This is a complex problem; we won't accept patches that kill network
namespace dismantling performance by adding brute-force
synchronize_rcu() or rcu_barrier() calls.

Why not free net->core.sock_inuse right before freeing net itself in
net_free()?

You do not have to hijack sock_inuse_exit_net() just because it has a
misleading name.

* Re: [PATCH v5 2/2] sock: Move the socket inuse to namespace.
From: Tonghao Zhang @ 2017-12-09 5:25 UTC
To: Eric Dumazet
Cc: David Miller, Cong Wang, Eric Dumazet, Willem de Bruijn,
    Linux Kernel Network Developers

On Fri, Dec 8, 2017 at 9:24 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2017-12-08 at 19:29 +0800, Tonghao Zhang wrote:
>> Hi all. We can add synchronize_rcu and rcu_barrier in
>> sock_inuse_exit_net to ensure there are no outstanding RCU callbacks
>> using this network namespace. Then we will not have to test whether
>> net->core.sock_inuse is NULL in sock_inuse_add(). :)
>>
>>  static void __net_exit sock_inuse_exit_net(struct net *net)
>>  {
>>  	free_percpu(net->core.prot_inuse);
>> +
>> +	synchronize_rcu();
>> +	rcu_barrier();
>> +
>> +	free_percpu(net->core.sock_inuse);
>>  }
>
>
> Oh well. Do you have any idea of the major problem this would add?
>
> Try the following, before and after your patches:
>
> for i in `seq 1 40`
> do
>  (for j in `seq 1 100` ; do unshare -n /bin/true >/dev/null ; done) &
> done
> wait
>
> ( Check commit 8ca712c373a462cfa1b62272870b6c2c74aa83f9 )
>

Yes, I did the test. The patches do reduce performance.

Before patch:
# time ./add_del_unshare.sh
net_namespace 97 125 6016 5 8 : tunables 0 0 0 : slabdata 25 25 0

real 8m19.665s
user 0m4.268s
sys 0m6.477s

After:
# time ./add_del_unshare.sh
net_namespace 102 130 6016 5 8 : tunables 0 0 0 : slabdata 26 26 0

real 8m52.563s
user 0m4.040s
sys 0m7.558s

>
> This is a complex problem; we won't accept patches that kill network
> namespace dismantling performance by adding brute-force
> synchronize_rcu() or rcu_barrier() calls.
>
> Why not free net->core.sock_inuse right before freeing net itself in
> net_free()?

I tried it this way: allocate core.sock_inuse in net_alloc() and free
it in net_free(). It does not reduce performance, and we no longer need
to check core.sock_inuse in sock_inuse_add().

After:
# time ./add_del_unshare.sh
net_namespace 109 135 6016 5 8 : tunables 0 0 0 : slabdata 27 27 0

real 8m19.265s
user 0m4.090s
sys 0m8.185s

> You do not have to hijack sock_inuse_exit_net() just because it has a
> misleading name.
>
>

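The lifetime rule agreed on in this exchange (the counter lives exactly as long as the namespace object, so the add path needs no NULL test, and the folded count can be checked for leaks just before freeing) can be sketched in user space. This is an illustrative model only: `struct model_net` and the `model_*` functions are hypothetical names, `calloc`/`free` stand in for the kernel allocators, and no claim is made that the eventual kernel patch looks like this.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_CPUS 4	/* hypothetical CPU count for this model */

/* Hypothetical stand-in for struct net with its socket counter. */
struct model_net {
	int *sock_inuse;	/* NR_CPUS slots, one per CPU */
};

/* Models net_alloc(): the counter is created together with the object. */
static struct model_net *model_net_alloc(void)
{
	struct model_net *net = calloc(1, sizeof(*net));

	if (!net)
		return NULL;

	net->sock_inuse = calloc(NR_CPUS, sizeof(*net->sock_inuse));
	if (!net->sock_inuse) {
		free(net);
		return NULL;
	}
	return net;
}

/* No NULL test: the pointer is valid for the whole lifetime of net. */
static void model_sock_inuse_add(struct model_net *net, int cpu, int val)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	net->sock_inuse[cpu] += val;
}

/* Fold all slots without clamping, so leaks and underflows stay visible. */
static int model_sock_inuse_fold(const struct model_net *net)
{
	int cpu, res = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		res += net->sock_inuse[cpu];
	return res;
}

/*
 * Models net_free(): the counter is released only together with the
 * object. The folded count at that moment is returned; a non-zero value
 * would indicate a counting bug somewhere.
 */
static int model_net_free(struct model_net *net)
{
	int leaked = model_sock_inuse_fold(net);

	free(net->sock_inuse);
	free(net);
	return leaked;
}
```

A caller would allocate the object, adjust the counter from any "CPU", and treat a non-zero return from `model_net_free()` as a bug report, which is the check Eric asks for before the counter memory goes away.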
* Re: [PATCH v5 2/2] sock: Move the socket inuse to namespace.
From: Cong Wang @ 2017-12-08 22:09 UTC
To: Tonghao Zhang
Cc: Eric Dumazet, David Miller, Eric Dumazet, Willem de Bruijn,
    Linux Kernel Network Developers

On Thu, Dec 7, 2017 at 9:28 PM, Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>
> Release the netlink sock created in kernel (not holding the _net_
> namespace):
>

You can avoid counting kernel socks by testing 'kern' in sk_alloc()
and testing 'sk->sk_net_refcnt' in __sk_free().

* Re: [PATCH v5 2/2] sock: Move the socket inuse to namespace.
From: Tonghao Zhang @ 2017-12-09 5:27 UTC
To: Cong Wang
Cc: Eric Dumazet, David Miller, Eric Dumazet, Willem de Bruijn,
    Linux Kernel Network Developers

On Sat, Dec 9, 2017 at 6:09 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Thu, Dec 7, 2017 at 9:28 PM, Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>>
>> Release the netlink sock created in kernel (not holding the _net_
>> namespace):
>>
>
> You can avoid counting kernel socks by testing 'kern' in sk_alloc()
> and testing 'sk->sk_net_refcnt' in __sk_free().

Hi Cong, if we do it this way, we will not count the socks created in
the kernel, right?

* Re: [PATCH v5 2/2] sock: Move the socket inuse to namespace.
From: Cong Wang @ 2017-12-09 19:42 UTC
To: Tonghao Zhang
Cc: Eric Dumazet, David Miller, Eric Dumazet, Willem de Bruijn,
    Linux Kernel Network Developers

On Fri, Dec 8, 2017 at 9:27 PM, Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> On Sat, Dec 9, 2017 at 6:09 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> On Thu, Dec 7, 2017 at 9:28 PM, Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>>>
>>> Release the netlink sock created in kernel (not holding the _net_
>>> namespace):
>>>
>>
>> You can avoid counting kernel socks by testing 'kern' in sk_alloc()
>> and testing 'sk->sk_net_refcnt' in __sk_free().
> Hi Cong, if we do it this way, we will not count the socks created in
> the kernel, right?

Yes, it is not very useful for user-space to know how many kernel
sockets we create, IMHO, so not counting kernel sockets seems fine.
