From: Tonghao Zhang
Subject: Re: [PATCH v5 2/2] sock: Move the socket inuse to namespace.
Date: Fri, 8 Dec 2017 19:29:43 +0800
References: <1512665148-2413-1-git-send-email-xiangxia.m.yue@gmail.com> <1512665148-2413-2-git-send-email-xiangxia.m.yue@gmail.com> <1512667208.25033.13.camel@gmail.com> <1512711658.25033.23.camel@gmail.com>
Cc: David Miller, Cong Wang, Eric Dumazet, Willem de Bruijn, Linux Kernel Network Developers
To: Eric Dumazet

Hi all. We can add synchronize_rcu() and rcu_barrier() in
sock_inuse_exit_net() to ensure there are no outstanding RCU callbacks
using this network namespace. Then we will not have to test whether
net->core.sock_inuse is NULL in sock_inuse_add(). :)

 static void __net_exit sock_inuse_exit_net(struct net *net)
 {
 	free_percpu(net->core.prot_inuse);
+
+	synchronize_rcu();
+	rcu_barrier();
+
+	free_percpu(net->core.sock_inuse);
 }

On Fri, Dec 8, 2017 at 5:52 PM, Tonghao Zhang wrote:
> On Fri, Dec 8, 2017 at 1:40 PM, Eric Dumazet wrote:
>> On Fri, 2017-12-08 at 13:28 +0800, Tonghao Zhang wrote:
>>> On Fri, Dec 8, 2017 at 1:20 AM, Eric Dumazet wrote:
>>> > On Thu, 2017-12-07 at 08:45 -0800, Tonghao Zhang wrote:
>>> > > In some case, we want to know how many sockets are in use in
>>> > > different _net_ namespaces. It's a key resource metric.
>>> > >
>>> >
>>> > ...
>>> >
>>> > > +static void sock_inuse_add(struct net *net, int val)
>>> > > +{
>>> > > +	if (net->core.prot_inuse)
>>> > > +		this_cpu_add(*net->core.sock_inuse, val);
>>> > > +}
>>> >
>>> > This is very confusing.
>>> >
>>> > Why testing net->core.prot_inuse for NULL is needed at all ?
>>> >
>>> > Why not testing net->core.sock_inuse instead ?
>>> >
>>>
>>> Hi Eric and Cong, oh it's a typo. it's net->core.sock_inuse there.
>>> Why we should check the net->core.sock_inuse
>>> Now show you the code:
>>>
>>> cleanup_net will call all of the network namespace exit methods,
>>> rcu_barrier, and then remove the _net_ namespace.
>>>
>>> cleanup_net:
>>>    list_for_each_entry_reverse(ops, &pernet_list, list)
>>>      ops_exit_list(ops, &net_exit_list);
>>>
>>>    rcu_barrier(); /* for netlink sock, the ‘deferred_put_nlk_sk’ will
>>>                      be called. But sock_inuse has been released. */
>>
>>
>> Thats would be a bug.
>>
>> Please find another way, but we want ultimately to check that before
>> net->core.sock_inuse is freed, folding the inuse count on all cpus is
>> 0, to make sure we do not have a bug somewhere.
>
> Yes, I am aware of this issue even we will destroy the network namespace.
> By the way, we can counter the socket-inuse in sock_alloc or sock_release.
> In this way, we have to hold the network namespace again(via
> get_net()) while sock may hold it.
>
> what do you think of this idea?
>
>> We should not have to test if net->core.sock_inuse is NULL or not from
>> sock_inuse_add(). Pointer must be there all the time.
>>
>> The freeing should only happen once we are sure sock_inuse_add() can
>> not be called anymore.
>>
>>>
>>>
>>>    /* Finally it is safe to free my network namespace structure */
>>>    list_for_each_entry_safe(net, tmp, &net_exit_list, exit_list) {}
>>>
>>>
>>>
>>> Release the netlink sock created in kernel(not hold the _net_
>>> namespace):
>>>
>>> netlink_release
>>>    call_rcu(&nlk->rcu, deferred_put_nlk_sk);
>>>
>>> deferred_put_nlk_sk
>>>    sk_free(sk);
>>>
>>>
>>> I may add a comment for sock_inuse_add in v6.
>>
>>
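
For illustration only, here is a minimal userspace sketch of the ordering
discussed in this thread: drain the deferred socket releases first, then
fold the per-CPU counts and check for zero, and only then free the counter.
This is not kernel code. struct net_model, NR_CPUS_MODEL,
defer_sock_release() and run_deferred() are hypothetical names invented for
the model; run_deferred() merely stands in for what rcu_barrier() guarantees
in the kernel, and a plain array stands in for the percpu allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS_MODEL 4	/* hypothetical fixed CPU count for this model */

/* Model of a namespace holding a per-CPU "sockets in use" counter. */
struct net_model {
	int *sock_inuse;	/* one slot per modelled CPU */
};

/* Model of one queued deferred release (stand-in for an RCU callback). */
struct deferred {
	struct net_model *net;
	int cpu;
	struct deferred *next;
};

static struct deferred *pending;

static int net_model_init(struct net_model *net)
{
	net->sock_inuse = calloc(NR_CPUS_MODEL, sizeof(*net->sock_inuse));
	return net->sock_inuse ? 0 : -1;
}

/* No NULL test on the hot path: the pointer must be valid for every caller. */
static void sock_inuse_add(struct net_model *net, int cpu, int val)
{
	assert(net->sock_inuse != NULL);
	net->sock_inuse[cpu] += val;
}

/* Fold the per-CPU slots into a single total. */
static int sock_inuse_fold(const struct net_model *net)
{
	int cpu, sum = 0;

	for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		sum += net->sock_inuse[cpu];
	return sum;
}

/* Queue a release to run later on the given modelled CPU. */
static int defer_sock_release(struct net_model *net, int cpu)
{
	struct deferred *d = malloc(sizeof(*d));

	if (!d)
		return -1;
	d->net = net;
	d->cpu = cpu;
	d->next = pending;
	pending = d;
	return 0;
}

/* Run every queued release; models waiting for outstanding callbacks. */
static void run_deferred(void)
{
	while (pending) {
		struct deferred *d = pending;

		pending = d->next;
		sock_inuse_add(d->net, d->cpu, -1);
		free(d);
	}
}

/* Exit path: drain callbacks, fold, then free. Returns the folded count. */
static int net_model_exit(struct net_model *net)
{
	int left;

	run_deferred();
	left = sock_inuse_fold(net);
	free(net->sock_inuse);
	net->sock_inuse = NULL;
	return left;
}
```

A caller would treat a nonzero return from net_model_exit() as a leaked or
double-counted socket. Because the increment and the deferred decrement may
land on different CPUs, only the folded total is meaningful, not any single
slot.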