From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pf0-f196.google.com ([209.85.192.196]:37609 "EHLO mail-pf0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729720AbeGQTgV (ORCPT ); Tue, 17 Jul 2018 15:36:21 -0400
Subject: Re: [PATCH RFC/RFT net-next 00/17] net: Convert neighbor tables to per-namespace
References: <20180717120651.15748-1-dsahern@kernel.org> <1a3f59a9-0ba5-c83f-16a6-f9550a84f693@gmail.com>
From: David Ahern
Message-ID: <1a27e301-3275-b349-a2f8-afdfdc02f04f@gmail.com>
Date: Tue, 17 Jul 2018 13:02:18 -0600
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-wpan-owner@vger.kernel.org
List-ID:
To: Cong Wang
Cc: Linux Kernel Network Developers, nikita.leshchenko@oracle.com, Roopa Prabhu, Stephen Hemminger, Ido Schimmel, Jiri Pirko, Saeed Mahameed, alex.aring@gmail.com, linux-wpan@vger.kernel.org, NetFilter, LKML

On 7/17/18 11:53 AM, Cong Wang wrote:
> You can see the original discussion here:
> https://marc.info/?l=linux-netdev&m=140356141019653&w=2
>

Thanks for the reference. I was surprised that the tables are still global.

A number of objections raised in that thread were due to a large patch tackling multiple issues. This set is focused on one thing - moving the tables to net - and does so in small incremental changes to make it easy to review.

One of DaveM's comments:

"Finally, another problem are permanent neigh entries as those cannot be reclaimed, that might be part of the main problem here. One idea wrt. permanent entries is that we could decide that, since they are administratively added, they don't count against the thresholds and limits."

This is another issue we have hit, and we came to the same conclusion: permanent entries should not count in the gc numbers. We need to address this for EVPN.

As for the per-namespace tables, it is 4 years later, and over that time Linux has gained support for a number of features: EVPN, which is very mac heavy; VRR, which doubles mac entries (one against the VRR device and one against the lower device); and NOS-level features such as mlxsw, which has to ensure mac entries for nexthop gateways stay active. In addition there are other features on the horizon, like the ability to use namespaces to create virtual switches (what Cisco calls a VDC), where you absolutely want isolation and do not want entries from one virtual switch to evict entries from another. And of course there is the continued proliferation of containerized workloads where isolation is desired.

I understand the concern about global resources and limits: as it stands you have to increase the limits in init_net to the max expected and hope for the best. With per-namespace limits you can lower the limits of each namespace to better control the total memory used. Perhaps namespaces created after init_net could have really low defaults (e.g., 16 / 32 / 64 for gc_thresh 1/2/3), requiring admin intervention to raise them.
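
To make the isolation argument concrete, here is a minimal Python sketch of per-namespace tables with their own hard limit. It is a toy model, not kernel code: the `NeighTable` class, the `fill` helper, the namespace name `ns1` and the addresses are all made up for illustration. It only models gc_thresh3 as a hard cap (no eviction or forced gc), and it assumes permanent entries are excluded from the count, as suggested in the discussion above. The value 64 is the gc_thresh3 figure floated above; 1024 for init_net is just an example value.

```python
from dataclasses import dataclass, field


@dataclass
class NeighTable:
    """Toy per-namespace neighbor table (hypothetical, not the kernel structure)."""
    gc_thresh3: int  # hard cap on non-permanent entries in this namespace
    entries: dict = field(default_factory=dict)  # ip -> (mac, permanent)

    def gc_count(self) -> int:
        # Assumption: permanent entries are administratively added,
        # so they are not counted against the limit.
        return sum(1 for _, permanent in self.entries.values() if not permanent)

    def add(self, ip: str, mac: str, permanent: bool = False) -> bool:
        existing = self.entries.get(ip)
        replaces_counted = existing is not None and not existing[1]
        # Reject a new dynamic entry once this namespace's own cap is reached.
        if not permanent and not replaces_counted and self.gc_count() >= self.gc_thresh3:
            return False
        self.entries[ip] = (mac, permanent)
        return True


def fill(table: NeighTable, prefix: str, n: int) -> int:
    """Try to add n dynamic entries; return how many were accepted."""
    return sum(table.add(f"{prefix}.{i}", f"mac-{i}") for i in range(n))


# One table per namespace, each with its own limit (illustrative values).
tables = {
    "init_net": NeighTable(gc_thresh3=1024),
    "ns1": NeighTable(gc_thresh3=64),
}

# A burst of 100 dynamic entries in ns1 is capped by ns1's limit only.
accepted = fill(tables["ns1"], "10.0.0", 100)

# A permanent entry is still accepted because it is not counted.
static_ok = tables["ns1"].add("192.0.2.1", "mac-static", permanent=True)

print(accepted, static_ok, tables["init_net"].gc_count())
```

In this model a burst in `ns1` can neither consume nor evict entries in `init_net`, and the total number of dynamic entries is bounded by the sum of the per-namespace limits.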