From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: Stanislav Fomichev <sdf@fomichev.me>,
Stanislav Fomichev <sdf@google.com>,
Network Development <netdev@vger.kernel.org>,
bpf <bpf@vger.kernel.org>, David Miller <davem@davemloft.net>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Simon Horman <simon.horman@netronome.com>,
Willem de Bruijn <willemb@google.com>,
Petar Penkov <peterpenkov96@gmail.com>
Subject: Re: [RFC bpf-next v3 6/8] flow_dissector: handle no-skb use case
Date: Wed, 27 Mar 2019 20:32:14 -0700
Message-ID: <20190328033212.hmhmnvksxfyaxmm4@ast-mbp>
In-Reply-To: <CAF=yD-KkurbKskd=7XpZDG8K0T9pj-Pv-Bsi+u1RubTF6_tHcQ@mail.gmail.com>
On Wed, Mar 27, 2019 at 11:14:46PM -0400, Willem de Bruijn wrote:
> On Wed, Mar 27, 2019 at 9:26 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Wed, Mar 27, 2019 at 12:58:20PM -0700, Stanislav Fomichev wrote:
> > > On 03/27, Alexei Starovoitov wrote:
> > > > On Tue, Mar 26, 2019 at 07:44:21PM -0700, Stanislav Fomichev wrote:
> > > > > On 03/26, Alexei Starovoitov wrote:
> > > > > > On Tue, Mar 26, 2019 at 11:54:56AM -0700, Stanislav Fomichev wrote:
> > > > > > > On 03/26, Alexei Starovoitov wrote:
> > > > > > > > On Tue, Mar 26, 2019 at 11:17:19AM -0700, Stanislav Fomichev wrote:
> > > > > > > > > On 03/26, Alexei Starovoitov wrote:
> > > > > > > > > > On Tue, Mar 26, 2019 at 10:52 AM Willem de Bruijn
> > > > > > > > > > <willemdebruijn.kernel@gmail.com> wrote:
> > > > > > > > > > > The BPF flow dissector should work the same. It is fine to pass the
> > > > > > > > > > > data including ethernet header, but parsing can start at nhoff with
> > > > > > > > > > > proto explicitly passed.
> > > > > > > > > > >
> > > > > > > > > > > We should not assume Ethernet link layer.
> > > > > > > > > >
> > > > > > > > > > then skb-less dissector has to be different program type
> > > > > > > > > > because semantics are different.
> > > > > > > > > The semantics are the same as for c-based __skb_flow_dissect.
> > > > > > > > > We just need to pass nhoff and proto that has been passed to
> > > > > > > > > __skb_flow_dissect to the bpf program. In case of with-skb,
> > > > > > > > > take this initial data from skb, like __skb_flow_dissect does (and don't
> > > > > > > > > ask BPF program to do it essentially):
> > > > > > > > >
> > > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/tree/net/core/flow_dissector.c#n763
> > > > > > > > >
> > > > > > > > > I was thinking of passing proto as flow_keys->n_proto and we already
> > > > > > > > > pass flow_keys->nhoff, so no need to do anything for it. With that,
> > > > > > > > > BPF program doesn't need to look into skb and can parse optional vlan
> > > > > > > > > and L3+ headers. The same way __skb_flow_dissect does that.
> > > > > > > >
> > > > > > > > > makes sense. then I'd also prefer for proto to be in flow_keys to
> > > > > > > > > highlight this difference.
> > > > > > > Maybe rename existing flow_keys->n_proto to flow_keys->proto?
> > > > > > > That would match __skb_flow_dissect and remove ambiguity with both proto
> > > > > > > and n_proto in flow_keys.
> > > > > >
> > > > > > disabling useless fields in ctx is one thing, since probability of breaking users
> > > > > > is low, but renaming n_proto is imo too much.
> > > > > >
> > > > > > > > may be add vlan_proto/present/tci there as well?
> > > > > > > > At least on the kernel side ctx rewriter will be the same for w/ & w/o skb cases.
> > > > > > > Why do you think we need them? My understanding was that when
> > > > > > > skb_vlan_tag_present(skb) (or skb->vlan_present) returns true, that means
> > > > > > > that vlan info has been already parsed out of the packet and stored in
> > > > > > > the vlan_tci/vlan_proto (where vlan_proto is 8021Q/8021AD); skb data
> > > > > > > points to proper L3 header.
> > > > > > >
> > > > > > > If that's correct, BPF flow dissector should not care about that. For
> > > > > > > example, look at how C-based flow dissector does that:
> > > > > > >
> > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/tree/net/core/flow_dissector.c#n944
> > > > > > >
> > > > > > > If skb_vlan_tag_present(skb) returns true, we set proto to skb->protocol
> > > > > > > and move on.
> > > > > > >
> > > > > > > But, we would need vlan_proto/present/tci in the flow_keys in the future.
> > > > > > > We don't currently return parsed vlan data from the BPF flow dissector.
> > > > > > > But it feels like it's getting into bpf-next territory :-)
> > > > > >
> > > > > > Whether ctx->data points to L2 or L3 is uapi regardless whether
> > > > > > progs/bpf_flow.c is relying on that or not.
> > > > > > So far I think you're saying that in all three cases:
> > > > > > > no-skb, skb before rfs, skb after rfs ctx->data points to L2, right?
> > > > > > This has to be preserved.
> > > > > It points to L3 (or vlan). And this will be preserved, I have no
> > > > > intention to change that.
> > > > >
> > > > > Just to make sure, we are on the same page, here is what
> > > > > __skb_flow_dissect (and BPF prog) is seeing in nhoff.
> > > > >
> > > > > NO-VLAN is always the same for both with-skb/no-skb:
> > > > > +----+----+-----+--+
> > > > > |DMAC|SMAC|PROTO|L3|
> > > > > +----+----+-----+--+
> > > > > ^
> > > > > +-- nhoff
> > > > > proto = PROTO
> > > > >
> > > > > VLAN no-skb (eth_get_headlen):
> > > > > +----+----+----+---+-----+--+
> > > > > |DMAC|SMAC|TPID|TCI|PROTO|L3|
> > > > > +----+----+----+---+-----+--+
> > > > > ^
> > > > > +-- nhoff
> > > > > proto = TPID
> > > >
> > > > where ctx->data will point to ?
> > > > These nhoff differences are fine.
> > > > I want to make sure that ctx->data is the same for all.
> > > For with-skb, nhoff would be zero, and ctx->data would point to
> > > TCI/L3.
> > > For skb-less, ctx->data would point to L2 (DMAC), and nhoff would be
> > > non-zero (TCI/L3 offset).
> > >
> > > If you want, for skb-less case, when calling BPF program we can do the math
> > > ourselves and set ctx->data to data + nhoff, and pass nhoff = 0.
> > > But I'm not sure whether we need to do that; flow dissector is supposed
> > > to look at ctx->data + nhoff, it should not matter what each individual
> > > value is, they only make sense together.
> >
> > My strong preference is to have data to point to L2 in all cases.
> > Semantics of requiring bpf prog to start processing from a tuple
> > (data + nhoff) where both point to random places is very confusing.
>
> Since flow dissection starts at the network layer, I would then
> suggest data always at L3 and nhoff 0.
>
> This can be derived in the same manner as __skb_flow_dissect
> already does if !data, using only skb_network_offset.
>
> From a quick scan, skb_mac_offset should also be valid in all cases
> where the flow dissector is called today, so the other can be computed, too.
>
> But this is less obvious. For instance, tun_get_user calls into the flow
> dissector up to three times (wow) and IFF_TUN has no link layer
> (ARPHRD_NONE). And then there are also fun variable length link layer
> protocols to deal with..
ahh. ok. Can we guarantee some stable position?
Current bpf_flow_dissect_get_header assumes that
ctx->data + ctx->flow_keys->thoff points to IP, right?
Based on what Stanislav is saying above, even that is not guaranteed?
I'm struggling to see how users can wrap their heads around this.
It seems bpf_flow.c will become the only prog that can deal with
this range of possible inputs.
I propose to start with a doc that describes all cases, where
things point to, and how the prog is supposed to parse that.
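
To make the cases concrete, here is a rough userspace sketch in C, not
kernel or BPF code. It assumes the convention Stanislav describes above:
parsing starts at data + nhoff, and proto names whatever sits there (TPID
in the VLAN no-skb case, so the TCI comes first). struct dissect_input and
find_l3 are hypothetical names made up for this illustration, and proto is
kept in host byte order for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_P_IP     0x0800
#define ETH_P_8021Q  0x8100
#define ETH_P_8021AD 0x88A8

/* Hypothetical stand-in for the (data, nhoff, proto) inputs discussed in
 * this thread. This is NOT the kernel's struct bpf_flow_keys. */
struct dissect_input {
	const uint8_t *data; /* start of the buffer the prog can see */
	size_t len;          /* bytes available at data */
	size_t nhoff;        /* offset from data where parsing starts */
	uint16_t proto;      /* host byte order; describes data + nhoff */
};

/* Walk optional VLAN tags starting at data + nhoff.
 * Returns the offset of the L3 header relative to data and stores the
 * L3 protocol in *l3_proto, or returns -1 if the buffer is too short. */
static long find_l3(const struct dissect_input *in, uint16_t *l3_proto)
{
	size_t off = in->nhoff;
	uint16_t proto = in->proto;

	if (off > in->len)
		return -1;

	while (proto == ETH_P_8021Q || proto == ETH_P_8021AD) {
		/* At off: 2 bytes of TCI, then 2 bytes of encapsulated proto. */
		if (off + 4 > in->len)
			return -1;
		proto = (uint16_t)(in->data[off + 2] << 8 | in->data[off + 3]);
		off += 4;
	}

	*l3_proto = proto;
	return (long)off;
}
```

With the no-VLAN layout (data at DMAC, nhoff = 14, proto = ETH_P_IP) this
returns 14. With the VLAN no-skb layout (data at DMAC, nhoff = 14,
proto = ETH_P_8021Q) it steps over TCI and the inner PROTO and returns 18.
With data already at L3 and nhoff = 0 it returns 0. Only the sum
data + nhoff matters to the walk, which is exactly why the individual
values need to be documented.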