From mboxrd@z Thu Jan  1 00:00:00 1970
From: Florian Westphal <fw@strlen.de>
Subject: Re: [RFC PATCH 0/3] sk_buff: add skb extension infrastructure
Date: Mon, 26 Nov 2018 22:19:33 +0100
Message-ID: <20181126211933.kw6wzb63sgjmg634@breakpoint.cc>
References: <20181126113857.29270-1-fw@strlen.de>
 <20181126.124114.1938501276935155970.davem@davemloft.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: fw@strlen.de, netdev@vger.kernel.org
To: David Miller <davem@davemloft.net>
Return-path: <netdev-owner@vger.kernel.org>
Received: from Chamillionaire.breakpoint.cc ([146.0.238.67]:48906 "EHLO
        Chamillionaire.breakpoint.cc" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S1726315AbeK0IPB (ORCPT
        <rfc822;netdev@vger.kernel.org>); Tue, 27 Nov 2018 03:15:01 -0500
Content-Disposition: inline
In-Reply-To: <20181126.124114.1938501276935155970.davem@davemloft.net>
Sender: netdev-owner@vger.kernel.org
List-ID: <netdev.vger.kernel.org>

David Miller <davem@davemloft.net> wrote:
> > This adds an extension infrastructure for sk_buff instead:
> > 1. extension memory is released when the sk_buff is free'd.
> > 2. data is shared after cloning an skb.
> > 3. adding extension to an skb will COW the extension
> >    buffer if needed.
> 
> So MP-TCP, when enabled for a connection, will have a new atomic
> operation for every packet?

Yes, at least for every kfree_skb call.

> And new tests all in the fast paths of the networking to facilitate
> this feature, a cost paid by everyone.

No, right now everyone has two non-atomic tests (skb->sp + skb->nf_bridge),
with this proposal everyone has one (skb->active_extensions), assuming that
both br_nf and xfrm are converted to use the extension system.

Test(s) occur both on copy/clone and kfree_skb, just like in current
kernels.

atomic test(s) are done in case skb->{sp,nf_bridge} are set, with
this patch its done if skb->active_exensions is != 0.

So from that angle current status is kept.

Main motivation was to find a solution that does not add more costs
for normal cases.

I did a quick hack to also convert skb->sp, it seems possible to do so.

In that case skbuff size is reduced by 8 bytes as sp/nf_bridge get
replaced by single 'extension pointer', and slightly less code
provided kernel is built with both XFRM and bridge netfilter support.

> Sorry, that doesn't seem like a good idea to me.
>
> Can't they just encode whatever huge amount of crap they want to
> put into the CB by deriving the information from skb->sk and some
> tiny value like an index or something to resolve the path?

Perhaps, if thats the only way I'm afraid thats what will need to be
used.  I did try such a scheme once in the past to get
rid of skb->nf_bridge and things became very very fugly due to
kfree_skb() not being aware of such 'external storage', i.e. no
way to easily clean the external storage when an skbuff gets tossed.

Might be possibe to use destructor to take care of this in mptcp case.
I can have a look if this is the only possible way.

> In the future please document what is so enormous and absolutely
> required that they must put it all into the SKB control block.

Ok, will do.

> Like Eric, I am concerned about the slow creep of overhead.  Lots of
> small "not that bad" additions of extra cycles here and there over
> time adds up to impossible to fix performance regressions.

I have the same concern, which is why i am proposing the conversion
of xfrm and nf_bridge to use this instead of the current
nf_bridge/secpath maintanance.

Although MPTCP is the main motivation here, it was intended as a
standalone series, i.e., these 3 patches and a few followup changes
to convert xfrm.

> I'm sorry if this is a major disappointment for the MP-TCP folks but a
> better way needs to be found to integrate what they want to do with
> real zero cost for the rest of the world which won't be using MP-TCP
> and therefore should not be paying for it's added overhead at all.

Agreed.