Re: [PATCH net-next] veth: report NEWLINK event when moving the peer device in a new namespace

From: David Ahern <dsahern@gmail.com>
To: Thomas Haller <thaller@redhat.com>,
	Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>,
	Network Development <netdev@vger.kernel.org>
Subject: Re: [PATCH net-next] veth: report NEWLINK event when moving the peer device in a new namespace
Date: Sat, 8 Sep 2018 19:16:46 -0600
Message-ID: <3edbd1ad-49ba-7fe4-0dd3-6c8d16dd307c@gmail.com>
In-Reply-To: <8eb1f295bbfe0d662e0df19448e9719be24348f5.camel@redhat.com>

Hi Thomas:

On 9/7/18 12:52 PM, Thomas Haller wrote:
> Hi David,
> 
> 
> On Mon, 2018-09-03 at 20:54 -0600, David Ahern wrote:
> 
>> From init_net:
>> $ ip monitor all-nsid
> 
> I thought the concern of the patch is the overhead of sending one
> additional RTM_NEWLINK message. This workaround has likely higher
> overhead. More importantly, it's so cumbersome, that I doubt anybody
> would implement such a solution.

Yes, the concern is additional messages. Each one adds non-negligible
overhead to build and send, something that is measurable in a system at
scale. For example, consider an interface manager creating 4k VLANs on
one or more bridges. The RTM_NEWLINK messages (there are 2) add 33% to
the overhead and time of creating the vlans. Just in some quick tests I
see times vary from as little as 15 usec to over 1 msec - per message
with the RTNL held. The norm seems to be somewhere between 35 and 40
usecs, but those add up when it is per link and per action.
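
To put rough numbers on "those add up", here is a back-of-the-envelope
sketch. It assumes "4k" means 4096 links and that every message costs the
typical 35 to 40 usec; both are simplifications, and the names are
illustrative only.

```python
# Back-of-the-envelope cost of link notifications sent with the RTNL held.
LINKS = 4096              # "4k VLANs" read as 4096 (assumption)
USEC_PER_MSG = (35, 40)   # typical per-message range quoted above


def total_ms(msgs_per_link, usec_per_msg, links=LINKS):
    """Total notification time in milliseconds across all links."""
    return links * msgs_per_link * usec_per_msg / 1000.0


for usec in USEC_PER_MSG:
    two = total_ms(2, usec)      # today: 2 RTM_NEWLINK messages per link
    three = total_ms(3, usec)    # with a 3rd message per link
    print(f"{usec} usec/msg: 2 msgs = {two:.1f} ms, "
          f"3 msgs = {three:.1f} ms, extra = {three - two:.1f} ms")
```

Under these assumptions a third message per link costs half again as much
notification time as the existing two, all of it with the RTNL held.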

We have to be more disciplined about the size and frequency of these messages.

> 
> When the events of one namespace are not sufficient to get all relevant
> information (local to the namespace itself), the solution is not
> monitor all-nsid.
> 
> You might save complexity and performance overhead in kernel. But what
> you save here is just moved to user-space, which faces higher
> complexity (at multiple places/projects, where developers are not
> experts in netlink) and higher overhead.

First, this is one use case that seems to care about a message coming
back on the veth pairs. The message is sent for everyone, always. Adding
a 3rd message puts additional load on all use cases.

Second, I am questioning this use case driving a kernel change. You say
a userspace app cares about a veth pair but only wants to monitor events
for one end of it. That's a userspace choice. Another argument brought
up is that the other end of the pair can change namespaces faster than
the app can track it. Sure. But the monitored end can move as well - and
faster than the app can track it. e.g.,

1. veth1 and veth2 are a pair.
2. app filters events for veth1 only
3. veth2 is moved to namespace foo
4. app is sent notification of move
5. veth2 is moved to namespace bar
6. app processes notification of the event in step 3, looks in namespace
foo and link is gone.

That seems to be the basis of your request for this new message, yes?

What happens when veth1 is moved to another namespace and then another
namespace - faster than the app can track it? You have the same
problem. Your use case may not have this problem, but generically it can
happen and this solution does not cover it.
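
The race in steps 1-6, and its mirror image on the monitored end, can be
sketched with a toy model. This is not netlink code: the dictionary, the
queue, and the function names are all hypothetical stand-ins for the
kernel state and the app's event backlog.

```python
from collections import deque

# Toy model: 'location' maps a link name to its current namespace.
location = {"veth1": "init_net", "veth2": "init_net"}
events = deque()   # notifications queued for the app, oldest first


def move(link, new_ns):
    """Move a link and queue a notification naming the destination."""
    location[link] = new_ns
    events.append((link, new_ns))


def process_one():
    """App handles the oldest notification.

    Returns (link, namespace named in the event, whether the link is
    still in that namespace when the app gets around to looking).
    """
    link, ns = events.popleft()
    return link, ns, location[link] == ns


move("veth2", "foo")        # step 3
move("veth2", "bar")        # step 5, before the app has run
stale_peer = process_one()  # step 6: app looks in foo, link is gone

events.clear()
move("veth1", "foo")        # same race, now on the monitored end
move("veth1", "bar")
stale_self = process_one()

print(stale_peer)  # ('veth2', 'foo', False)
print(stale_self)  # ('veth1', 'foo', False)
```

Both lookups come back stale, so an extra notification for the peer does
not close the window in the general case.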

The movement of links is provided to userspace, and userspace needs to
be smart about what it wants to do.

Alternatively, you could provide an API for your interface manager --
whatever is handling the link movement. It knows where it is moving
interfaces. You could have it respond to inquiries of where device X is
at any given moment.
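
A minimal sketch of what such an API might look like, assuming the
interface manager is a single process that performs every move itself.
The class and method names are hypothetical and do not refer to any
existing tool.

```python
import threading


class LinkRegistry:
    """Hypothetical location service kept by whatever moves the links."""

    def __init__(self):
        self._lock = threading.Lock()
        self._where = {}   # link name -> namespace name

    def record_move(self, link, namespace):
        """Called by the manager each time it moves a link."""
        with self._lock:
            self._where[link] = namespace

    def where_is(self, link):
        """Answer an inquiry; returns None for a link never recorded."""
        with self._lock:
            return self._where.get(link)


registry = LinkRegistry()
registry.record_move("veth2", "foo")
registry.record_move("veth2", "bar")
print(registry.where_is("veth2"))  # bar
```

Because the manager records each move as it makes it, an inquiry returns
the latest known location rather than whatever a queued notification said.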