All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Stanislav Fomichev <sdf@fomichev.me>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
	Stanislav Fomichev <sdf@google.com>,
	Network Development <netdev@vger.kernel.org>,
	David Miller <davem@davemloft.net>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	simon.horman@netronome.com, Willem de Bruijn <willemb@google.com>
Subject: Re: [RFC bpf-next 0/7] net: flow_dissector: trigger BPF hook when called from eth_get_headlen
Date: Wed, 13 Feb 2019 20:39:32 -0800	[thread overview]
Message-ID: <20190214043931.hdpddeom57rnnlhm@ast-mbp> (raw)
In-Reply-To: <20190212170232.GB10595@mini-arch>

On Tue, Feb 12, 2019 at 09:02:32AM -0800, Stanislav Fomichev wrote:
> On 02/05, Stanislav Fomichev wrote:
> > On 02/05, Alexei Starovoitov wrote:
> > > On Tue, Feb 05, 2019 at 07:56:19PM -0800, Stanislav Fomichev wrote:
> > > > On 02/05, Alexei Starovoitov wrote:
> > > > > On Tue, Feb 05, 2019 at 04:59:31PM -0800, Stanislav Fomichev wrote:
> > > > > > On 02/05, Alexei Starovoitov wrote:
> > > > > > > On Tue, Feb 05, 2019 at 12:40:03PM -0800, Stanislav Fomichev wrote:
> > > > > > > > On 02/05, Willem de Bruijn wrote:
> > > > > > > > > On Tue, Feb 5, 2019 at 12:57 PM Stanislav Fomichev <sdf@google.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Currently, when eth_get_headlen calls flow dissector, it doesn't pass any
> > > > > > > > > > skb. Because we use passed skb to lookup associated networking namespace
> > > > > > > > > > to find whether we have a BPF program attached or not, we always use
> > > > > > > > > > C-based flow dissector in this case.
> > > > > > > > > >
> > > > > > > > > > The goal of this patch series is to add new networking namespace argument
> > > > > > > > > > to the eth_get_headlen and make BPF flow dissector programs be able to
> > > > > > > > > > work in the skb-less case.
> > > > > > > > > >
> > > > > > > > > > The series goes like this:
> > > > > > > > > > 1. introduce __init_skb and __init_skb_shinfo; those will be used to
> > > > > > > > > >    initialize temporary skb
> > > > > > > > > > 2. introduce skb_net which can be used to get networking namespace
> > > > > > > > > >    associated with an skb
> > > > > > > > > > 3. add new optional network namespace argument to __skb_flow_dissect and
> > > > > > > > > >    plumb through the callers
> > > > > > > > > > 4. add new __flow_bpf_dissect which constructs temporary on-stack skb
> > > > > > > > > >    (using __init_skb) and calls BPF flow dissector program
> > > > > > > > > 
> > > > > > > > > The main concern I see with this series is this cost of skb zeroing
> > > > > > > > > for every packet in the device driver receive routine, *independent*
> > > > > > > > > from the real skb allocation and zeroing which will likely happen
> > > > > > > > > later.
> > > > > > > > Yes, plus ~200 bytes on the stack for the callers.
> > > > > > > > 
> > > > > > > > Not sure how visible this zeroing though, I can probably try to get some
> > > > > > > > numbers from BPF_PROG_TEST_RUN (running current version vs running with
> > > > > > > > on-stack skb).
> > > > > > > 
> > > > > > > imo extra 256 byte memset for every packet is non starter.
> > > > > > We can put pre-allocated/initialized skbs without data into percpu or even
> > > > > > use pcpu_freelist_pop/pcpu_freelist_push to make sure we don't have to think
> > > > > > about having multiple percpu for irq/softirq/process contexts.
> > > > > > Any concerns with that approach?
> > > > > > Any other possible concerns with the overall series?
> > > > > 
> > > > > I'm missing why the whole thing is needed.
> > > > > You're saying:
> > > > > " make BPF flow dissector programs be able to work in the skb-less case".
> > > > > What does it mean specifically?
> > > > > The only non-skb case is XDP.
> > > > > Are you saying you want flow_dissector prog to be run in XDP?
> > > > eth_get_headlen that drivers call on RX path on a chunk of data to
> > > > guesstimate the length of the headers calls flow dissector without an skb
> > > > (__skb_flow_dissect was a weird interface where it accepts skb or
> > > > data+len). Right now, there is no way to trigger BPF flow dissector
> > > > for this case (we don't have an skb to get associated namespace/etc/etc).
> > > > The patch series tries to fix that to make sure that we always trigger
> > > > BPF program if it's attached to a device's namespace.
> > > 
> > > then why not to create flow_dissector prog type that works without skb?
> > > Why do you need to fake an skb?
> > > XDP progs work just fine without it.
> > What's the advantage of having another prog type? In this case we would have
> > to write the same flow dissector program twice: first time against __skb_buff
> > interface, second time against xdp_md.
> > By using fake skb, we make the same flow dissector __sk_buff BPF program
> > work in both contexts without a rewrite to an xdp interface (I don't
> > think users should care whether flow dissector was called form "xdp" vs skb
> > context; and we're sort of stuck with __sk_buff interface already).
> Should I follow up with v2 where I address memset(,,256) for each packet?
> Or you still have some questions/doubts/suggestions regarding the problem
> I'm trying to solve?

sorry for delay. I'm still thinking what is the path forward here.

That 'stuck with __sk_buff' is what bothers me.
It's an indication that api wasn't thought through if first thing
it needs is this fake skb hack.
If bpf_flow.c is a realistic example of such flow dissector prog
it means that real skb fields are accessed.
In particular skb->vlan_proto, skb->protocol.
These fields in case of 'fake skb' will not be set, since eth_type_trans()
isn't called yet.
So either flow_dissector needs a real __sk_buff and all of its fields
should be real or it's a different flow_dissector prog type that
needs ctx->data, ctx->data_end, ctx->flow_keys only.
Either way going with fake skb is incorrect, since bpf_flow.c example
will be broken and for program writers it will be hard to figure why
it's broken.


  reply	other threads:[~2019-02-14  4:39 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-05 17:36 [RFC bpf-next 0/7] net: flow_dissector: trigger BPF hook when called from eth_get_headlen Stanislav Fomichev
2019-02-05 17:36 ` [RFC bpf-next 1/7] net: introduce __init_skb and __init_skb_shinfo helpers Stanislav Fomichev
2019-02-05 20:18   ` Willem de Bruijn
2019-02-05 17:36 ` [RFC bpf-next 2/7] net: introduce skb_net helper Stanislav Fomichev
2019-02-05 20:19   ` Willem de Bruijn
2019-02-05 20:42     ` Stanislav Fomichev
2019-02-05 17:36 ` [RFC bpf-next 3/7] net: plumb network namespace into __skb_flow_dissect Stanislav Fomichev
2019-02-05 20:19   ` Willem de Bruijn
2019-02-05 20:40     ` Stanislav Fomichev
2019-02-05 17:36 ` [RFC bpf-next 4/7] net: flow_dissector: handle no-skb use case Stanislav Fomichev
2019-02-05 20:19   ` Willem de Bruijn
2019-02-05 20:45     ` Stanislav Fomichev
2019-02-05 17:36 ` [RFC bpf-next 5/7] bpf: when doing BPF_PROG_TEST_RUN for flow dissector use no-skb mode Stanislav Fomichev
2019-02-05 20:19   ` Willem de Bruijn
2019-02-05 17:36 ` [RFC bpf-next 6/7] selftests/bpf: add flow dissector bpf_skb_load_bytes helper test Stanislav Fomichev
2019-02-05 17:36 ` [RFC bpf-next 7/7] net: flow_dissector: pass net argument to the eth_get_headlen Stanislav Fomichev
2019-02-05 20:18 ` [RFC bpf-next 0/7] net: flow_dissector: trigger BPF hook when called from eth_get_headlen Willem de Bruijn
2019-02-05 20:40   ` Stanislav Fomichev
2019-02-06  0:47     ` Alexei Starovoitov
2019-02-06  0:59       ` Stanislav Fomichev
2019-02-06  3:12         ` Alexei Starovoitov
2019-02-06  3:56           ` Stanislav Fomichev
2019-02-06  4:11             ` Alexei Starovoitov
2019-02-06  5:49               ` Stanislav Fomichev
2019-02-12 17:02                 ` Stanislav Fomichev
2019-02-14  4:39                   ` Alexei Starovoitov [this message]
2019-02-14  5:57                     ` Stanislav Fomichev
2019-02-14  6:38                       ` Alexei Starovoitov
2019-02-14 17:35                         ` Stanislav Fomichev
2019-02-25 20:33                           ` Stanislav Fomichev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190214043931.hdpddeom57rnnlhm@ast-mbp \
    --to=alexei.starovoitov@gmail.com \
    --cc=ast@kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=netdev@vger.kernel.org \
    --cc=sdf@fomichev.me \
    --cc=sdf@google.com \
    --cc=simon.horman@netronome.com \
    --cc=willemb@google.com \
    --cc=willemdebruijn.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.