Linux-WPAN Archive on lore.kernel.org
 help / color / Atom feed
From: David Ahern <dsahern@gmail.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>,
	David Miller <davem@davemloft.net>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>,
	nikita.leshchenko@oracle.com,
	Roopa Prabhu <roopa@cumulusnetworks.com>,
	Stephen Hemminger <stephen@networkplumber.org>,
	Ido Schimmel <idosch@mellanox.com>,
	Jiri Pirko <jiri@mellanox.com>,
	Saeed Mahameed <saeedm@mellanox.com>,
	Alexander Aring <alex.aring@gmail.com>,
	linux-wpan@vger.kernel.org,
	NetFilter <netfilter-devel@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH RFC/RFT net-next 00/17] net: Convert neighbor tables to per-namespace
Date: Mon, 13 Aug 2018 15:48:17 -0600
Message-ID: <a227d23c-40a1-97a9-d39c-70d16cd470b4@gmail.com> (raw)
In-Reply-To: <87o9evt9a6.fsf@xmission.com>

On 7/25/18 1:17 PM, Eric W. Biederman wrote:
> David Ahern <dsahern@gmail.com> writes:
> 
>> On 7/25/18 11:38 AM, Eric W. Biederman wrote:
>>>
>>> Absolutely NOT.  Global thresholds are exactly correct given the fact
>>> you are running on a single kernel.
>>>
>>> Memory is not free (Even though we are swimming in enough of it memory
>>> rarely matters).  One of the few remaining challenges is for containers
>>> is finding was to limit resources in such a way that one application
>>> does not mess things up for another container during ordinary usage.
>>>
>>> It looks like the neighbour tables absolutely are that kind of problem,
>>> because the artificial limits are too strict.   Completely giving up on
>>> limits does not seem right approach either.  We need to fix the limits
>>> we have (perhaps making them go away entirely), not just apply a
>>> band-aid.  Let's get to the bottom of this and make the system better.
>>
>> Eric: yes, they all share the global resource of memory and there should
>> be limits on how many entries a remote entity can create.
>>
>> Network namespaces can provide a separation such that one namespace does
>> not disrupt networking in another. It is absolutely appropriate to do
>> so. Your rigid stance is inconsistent given the basic meaning of a
>> network namespace and the parallels to this same problem -- bridges,
>> vxlans, and ip fragments. Only neighbor tables are not per-device or per
>> namespace; your insistence on global limits is missing the mark and wrong.
> 
> That is not what I said.  Let me rephrase and see if you understand.
> 
> The problem appears to be of lots of devices.  Fundamentally if you use
> lots of network devices today unless you adjust gc_thresh3 you will run
> out of neighbour table entries.
> 
> The problem has a bigger scope than what you are looking at.
> 
> If you fix the core problem you won't see the problem in the context
> of network namespaces either.
> 
> Default limits should be something that will never be hit unless
> something goes crazy.  We are hitting them.  Therefore by definition
> there is a bug in these limits.

I disagree that the problem is a global limit. It is trivial for users
to increase gc_thresh3. That does not solve the fundamental problem.

> 
> 
> And yes there is absolutely a place for global limits on things like
> inodes, file descriptors etc, that does not care about which part of the
> kernel you are in.  However hitting those limits in normal operation is
> a bug.
> 
> We have ourselves a bug.

I agree we have a bug; we disagree on what that bug is.

I am just back from vacation and re-read your responses. No where do you
acknowledge the fundamental point of this patch set - that adding a new
neighbor entry in one namespace can evict an entry in another namespace
or worse networking in one namespace can fail due to table overflow
because of entries from another. That is a real problem.

It is not a matter of increasing the default gc_thresh3 to some number
N; it is ensuring that regardless of the value of gc_thresh3 one
namespace is not affected by another.

You created network namespaces and it provides isolation -- separate
tables essentially -- for devices, FIB entries, sockets, etc, but you
argue against completing the task with separate neighbor tables which is
very strange given the impact (completely broken networking).

> 
> Eric
> 
> p.s. I wrote the definition of network namespaces and it absolutely does
> have room for global limits.   One of the things Linus has periodically
> yelled at me about is that there are not enough of them.
> 

  reply index

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-17 12:06 dsahern
2018-07-17 12:06 ` [PATCH RFC/RFT net-next 01/17] net/ipv4: rename ipv4_neigh_lookup to ipv4_dst_neigh_lookup dsahern
2018-07-17 12:06 ` [PATCH RFC/RFT net-next 02/17] net/neigh: export neigh_find_table dsahern
2018-07-17 12:06 ` [PATCH RFC/RFT net-next 03/17] net/ipv4: wrappers for arp table references dsahern
2018-07-17 12:06 ` [PATCH RFC/RFT net-next 04/17] net/ipv4: Remove open coded use of arp table dsahern
2018-07-17 12:06 ` [PATCH RFC/RFT net-next 05/17] net/ipv6: wrappers for neighbor table references dsahern
2018-07-17 12:06 ` [PATCH RFC/RFT net-next 06/17] net/ipv6: Remove open coded use of neighbor table dsahern
2018-07-17 12:06 ` [PATCH RFC/RFT net-next 07/17] drivers/net: remove open coding of neighbor tables dsahern
2018-07-17 12:06 ` [PATCH RFC/RFT net-next 08/17] net: Remove nd_tbl from ipv6 stub dsahern
2018-07-17 12:06 ` [PATCH RFC/RFT net-next 09/17] net: Remove arp_tbl and nd_tbl from headers dsahern
2018-07-17 12:06 ` [PATCH RFC/RFT net-next 10/17] net: Add key_len to neighbor constructor dsahern
2018-07-17 12:06 ` [PATCH RFC/RFT net-next 11/17] net: Change neigh_table_init and neigh_table_clear signature dsahern
2018-07-17 12:06 ` [PATCH RFC/RFT net-next 12/17] net/neigh: Change neigh_xmit to take an address family dsahern
2018-07-17 12:06 ` [PATCH RFC/RFT net-next 13/17] net/neighbor: Convert internal functions away from neigh_tables dsahern
2018-07-17 12:06 ` [PATCH RFC/RFT net-next 14/17] net/ipv4: Convert arp table to per namespace dsahern
2018-07-17 12:06 ` [PATCH RFC/RFT net-next 15/17] net/ipv6: Convert neighbor table to per-namespace dsahern
2018-07-17 12:06 ` [PATCH RFC/RFT net-next 16/17] net/decnet: Move " dsahern
2018-07-17 12:06 ` [PATCH RFC/RFT net-next 17/17] net/neighbor: Remove neigh_tables and NEIGH enum dsahern
2018-07-17 17:40 ` [PATCH RFC/RFT net-next 00/17] net: Convert neighbor tables to per-namespace Cong Wang
2018-07-17 17:43   ` David Ahern
2018-07-17 17:53     ` Cong Wang
2018-07-17 19:02       ` David Ahern
2018-07-17 20:37         ` Cong Wang
2018-07-18  3:59         ` David Miller
2018-07-19 16:16           ` David Ahern
2018-07-19 17:12             ` Cong Wang
2018-07-24 15:14               ` David Ahern
2018-07-24 17:14                 ` David Miller
2018-07-25 18:23                   ` David Ahern
2018-07-24 22:09                 ` Cong Wang
2018-07-25 12:33                   ` Eric W. Biederman
2018-07-25 14:06                     ` David Ahern
2018-07-25 17:38                       ` Eric W. Biederman
2018-07-25 18:13                         ` David Ahern
2018-07-25 19:17                           ` Eric W. Biederman
2018-08-13 21:48                             ` David Ahern [this message]
2018-08-15  4:36                               ` Eric W. Biederman
2018-07-26 11:12                         ` David Laight
2018-07-27 16:27                           ` Eric W. Biederman
2018-07-19  0:54 ` Michael Richardson
2018-07-19 15:49   ` David Ahern
2018-08-12  6:46 ` [RFC/RFT, net-next, " Vasily Averin
2018-08-12 17:37   ` David Ahern

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a227d23c-40a1-97a9-d39c-70d16cd470b4@gmail.com \
    --to=dsahern@gmail.com \
    --cc=alex.aring@gmail.com \
    --cc=davem@davemloft.net \
    --cc=ebiederm@xmission.com \
    --cc=idosch@mellanox.com \
    --cc=jiri@mellanox.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-wpan@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=nikita.leshchenko@oracle.com \
    --cc=roopa@cumulusnetworks.com \
    --cc=saeedm@mellanox.com \
    --cc=stephen@networkplumber.org \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-WPAN Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-wpan/0 linux-wpan/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-wpan linux-wpan/ https://lore.kernel.org/linux-wpan \
		linux-wpan@vger.kernel.org
	public-inbox-index linux-wpan

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-wpan


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git