From: David Ahern
Subject: Re: [PATCH net-next] veth: report NEWLINK event when moving the peer device in a new namespace
Date: Sat, 8 Sep 2018 19:16:46 -0600
Message-ID: <3edbd1ad-49ba-7fe4-0dd3-6c8d16dd307c@gmail.com>
In-Reply-To: <8eb1f295bbfe0d662e0df19448e9719be24348f5.camel@redhat.com>
To: Thomas Haller, Lorenzo Bianconi
Cc: David S. Miller, Network Development

Hi Thomas:

On 9/7/18 12:52 PM, Thomas Haller wrote:
> Hi David,
>
> On Mon, 2018-09-03 at 20:54 -0600, David Ahern wrote:
>
>> From init_net:
>> $ ip monitor all-nsid
>
> I thought the concern of the patch is the overhead of sending one
> additional RTM_NEWLINK message. This workaround has likely higher
> overhead. More importantly, it's so cumbersome, that I doubt anybody
> would implementing such a solution.

Yes, the concern is additional messages. Each one adds non-negligible
overhead to build and send, something that is measurable in a system
at scale. For example, consider an interface manager creating 4k VLANs
on one or more bridges. The RTM_NEWLINK messages (there are 2) add 33%
to the overhead and time of creating the vlans. In some quick tests I
see times vary from as little as 15 usec to over 1 msec - per message,
with the RTNL held. The norm seems to be somewhere between 35 and 40
usecs, but those add up when it is per link and per action. We have to
be more disciplined about the size and frequency of these messages.

> When the events of one namespace are not sufficient to get all
> relevant information (local to the namespace itself), the solution is
> not monitor all-nsid.
>
> You might save complexity and performance overhead in kernel. But
> what you save here is just moved to user-space, which faces higher
> complexity (at multiple places/projects, where developers are not
> experts in netlink) and higher overhead.

First, this is one use case that cares about a message coming back on
a veth pair. The message is sent to everyone, always. Adding a 3rd
message puts additional load on all use cases.

Second, I am questioning this use case driving a kernel change. You
say a userspace app cares about a veth pair but only wants to monitor
events for one end of it. That's a userspace choice. Another argument
brought up is that the other end of the pair can change namespaces
faster than the app can track it. Sure. But the monitored end can move
as well - and faster than the app can track it. e.g.:

1. veth1 and veth2 are a pair.
2. app filters events for veth1 only.
3. veth2 is moved to namespace foo.
4. app is sent notification of the move.
5. veth2 is moved to namespace bar.
6. app processes the notification from step 3, looks in namespace foo,
   and the link is gone.

That seems to be the crux of your request for this new message, yes?
What happens when veth1 is moved to another namespace and then
another - faster than the app can track it? You have the same problem.
Your use case may not have this problem, but generically it can happen
and this solution does not cover it. The movement of links is provided
to userspace, and userspace needs to be smart about what it wants to
do.
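As an illustration only (this is not part of Lorenzo's patch), here is
roughly what the "smart" side can look like with what the kernel
already emits: on a namespace move the old namespace sees RTM_DELLINK
carrying IFLA_NEW_NETNSID, so a listener can tell "moved" from "gone"
and chase the device by nsid instead of assuming it stayed put. Error
handling and nsid-to-namespace resolution are omitted:

#include <stdio.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>

/* Distinguish "link moved to another netns" from "link deleted". */
static void handle_dellink(struct nlmsghdr *nlh)
{
    struct ifinfomsg *ifi = NLMSG_DATA(nlh);
    struct rtattr *rta = IFLA_RTA(ifi);
    int len = IFLA_PAYLOAD(nlh);
    int new_nsid = -1;

    for (; RTA_OK(rta, len); rta = RTA_NEXT(rta, len))
        if (rta->rta_type == IFLA_NEW_NETNSID)
            new_nsid = *(int *)RTA_DATA(rta);

    if (new_nsid >= 0)
        printf("ifindex %d moved to nsid %d\n",
               ifi->ifi_index, new_nsid);
    else
        printf("ifindex %d deleted\n", ifi->ifi_index);
}

int main(void)
{
    struct sockaddr_nl sa = {
        .nl_family = AF_NETLINK,
        .nl_groups = RTMGRP_LINK,   /* link events, this netns */
    };
    char buf[8192];
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

    if (fd < 0 || bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
        return 1;

    for (;;) {
        int n = recv(fd, buf, sizeof(buf), 0);
        struct nlmsghdr *nlh = (struct nlmsghdr *)buf;

        if (n < 0)
            return 1;
        for (; NLMSG_OK(nlh, n); nlh = NLMSG_NEXT(nlh, n))
            if (nlh->nlmsg_type == RTM_DELLINK)
                handle_dellink(nlh);
    }
}

Note IFLA_NEW_NETNSID needs a reasonably recent kernel; where it is
absent the app is back to guessing, which is exactly the generic race
above.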
Alternatively, you could provide an API for your interface manager --
whatever is handling the link movement. It knows where it is moving
interfaces. You could have it respond to inquiries about where device
X is at any given moment.
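To sketch that alternative (all names here - if_location, ifmgr_* -
are invented for illustration): the component that performs the moves
records each one, so it can always answer "where is veth2 right now"
without chasing events across namespaces. A real version would sit
behind IPC (a unix socket, say) rather than direct function calls:

#include <stdio.h>
#include <string.h>

#define MAX_TRACKED 64

struct if_location {
    char ifname[16];    /* IFNAMSIZ */
    char netns[64];     /* namespace the manager last moved it to */
};

static struct if_location table[MAX_TRACKED];
static int nr_tracked;

/* Called by the manager itself right after it moves a device. */
static void ifmgr_record_move(const char *ifname, const char *netns)
{
    int i;

    for (i = 0; i < nr_tracked; i++)
        if (!strcmp(table[i].ifname, ifname))
            break;
    if (i == nr_tracked) {
        if (nr_tracked == MAX_TRACKED)
            return;
        nr_tracked++;
        snprintf(table[i].ifname, sizeof(table[i].ifname), "%s", ifname);
    }
    snprintf(table[i].netns, sizeof(table[i].netns), "%s", netns);
}

/* Answers "where is device X?" for other components. */
static const char *ifmgr_lookup(const char *ifname)
{
    for (int i = 0; i < nr_tracked; i++)
        if (!strcmp(table[i].ifname, ifname))
            return table[i].netns;
    return NULL;
}

int main(void)
{
    ifmgr_record_move("veth2", "foo");
    ifmgr_record_move("veth2", "bar");  /* second move wins the race */
    printf("veth2 is in %s\n", ifmgr_lookup("veth2"));
    return 0;
}

The mover always knows the current location - no extra netlink message
needed.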