bpf.vger.kernel.org archive mirror
From: Stanislav Fomichev <sdf@fomichev.me>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Stanislav Fomichev <sdf@google.com>,
	Network Development <netdev@vger.kernel.org>,
	bpf <bpf@vger.kernel.org>, David Miller <davem@davemloft.net>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Simon Horman <simon.horman@netronome.com>,
	Willem de Bruijn <willemb@google.com>,
	Petar Penkov <peterpenkov96@gmail.com>
Subject: Re: [RFC bpf-next v3 6/8] flow_dissector: handle no-skb use case
Date: Mon, 1 Apr 2019 09:30:15 -0700
Message-ID: <20190401163015.GH7431@mini-arch.hsd1.ca.comcast.net>
In-Reply-To: <CAF=yD-KG8xaP+VMAXcAYKnbJOtuLmF7yibnetJfws94bQZvmVQ@mail.gmail.com>

On 03/28, Willem de Bruijn wrote:
> > > > > > > > > > If skb_vlan_tag_present(skb) returns true, we set proto to skb->protocol
> > > > > > > > > > and move on.
> > > > > > > > > >
> > > > > > > > > > But, we would need vlan_proto/present/tci in the flow_keys in the future.
> > > > > > > > > > We don't currently return parsed vlan data from the BPF flow dissector.
> > > > > > > > > > But it feels like it's getting into bpf-next territory :-)
> > > > > > > > >
> > > > > > > > > > Whether ctx->data points to L2 or L3 is uapi regardless of whether
> > > > > > > > > > progs/bpf_flow.c is relying on that or not.
> > > > > > > > > > So far I think you're saying that in all three cases
> > > > > > > > > > (no-skb, skb before rfs, skb after rfs) ctx->data points to L2, right?
> > > > > > > > > This has to be preserved.
> > > > > > > > It points to L3 (or vlan). And this will be preserved, I have no
> > > > > > > > intention to change that.
> > > > > > > >
> > > > > > > > Just to make sure, we are on the same page, here is what
> > > > > > > > __skb_flow_dissect (and BPF prog) is seeing in nhoff.
> > > > > > > >
> > > > > > > > NO-VLAN is always the same for both with-skb/no-skb:
> > > > > > > > +----+----+-----+--+
> > > > > > > > |DMAC|SMAC|PROTO|L3|
> > > > > > > > +----+----+-----+--+
> > > > > > > >                  ^
> > > > > > > >                  +-- nhoff
> > > > > > > >                      proto = PROTO
> > > > > > > >
> > > > > > > > VLAN no-skb (eth_get_headlen):
> > > > > > > > +----+----+----+---+-----+--+
> > > > > > > > |DMAC|SMAC|TPID|TCI|PROTO|L3|
> > > > > > > > +----+----+----+---+-----+--+
> > > > > > > >                 ^
> > > > > > > >                 +-- nhoff
> > > > > > > >                     proto = TPID
> > > > > > >
> > > > > > > where ctx->data will point to ?
> > > > > > > These nhoff differences are fine.
> > > > > > > I want to make sure that ctx->data is the same for all.
> > > > > > For with-skb, nhoff would be zero, and ctx->data would point to
> > > > > > TCI/L3.
> > > > > > For skb-less, ctx->data would point to L2 (DMAC), and nhoff would be
> > > > > > non-zero (TCI/L3 offset).
> > > > > >
> > > > > > If you want, for skb-less case, when calling BPF program we can do the math
> > > > > > ourselves and set ctx->data to data + nhoff, and pass nhoff = 0.
> > > > > > But I'm not sure whether we need to do that; flow dissector is supposed
> > > > > > to look at ctx->data + nhoff, it should not matter what each individual
> > > > > > value is, they only make sense together.
> > > > >
> > > > > My strong preference is to have data to point to L2 in all cases.
> > > > > Semantics of requiring bpf prog to start processing from a tuple
> > > > > (data + nhoff) where both point to random places is very confusing.
> > > >
> > > > Since flow dissection starts at the network layer, I would then
> > > > suggest data always at L3 and nhoff 0.
> > For eth_get_headlen we need to manually parse 802.1q header. And for RFS
> > case as well (unless I'm missing something).
> >
> > > > This can be derived in the same manner as __skb_flow_dissect
> > > > already does if !data, using only skb_network_offset.
> > > >
> > > > From a quick scan, skb_mac_offset should also be valid in all cases
> > > > where the flow dissector is called today, so the other can be computed, too.
> > > >
> > > > But this is less obvious. For instance, tun_get_user calls into the flow
> > > > dissector up to three times (wow) and IFF_TUN has no link layer
> > > > (ARPHRD_NONE). And then there are also fun variable length link layer
> > > > protocols to deal with..
> > >
> > > ahh. ok. Can we guarantee some stable position?
> > I don't think so. Pre RFS ctx->data+nhoff can point to 802.1q header,
> > post RFS it will point to L3. The only thing we can do is to have
> > nhoff=0 (and adjust ctx->data accordingly) when the main bpf
> > flow dissector procedure is called. But that would require bringing
> > this new kernel context (bpf_flow_dissector) into bpf/stable.
> > (And it's not clear what's the benefit, since tail calls would still
> > have to look at that offset).
> 
> The flow dissector can be called also before and after tunneling, in
> which case skb_network_offset points to an inner header. Or after
> MPLS, which stumps a flow dissector called earlier as that has no
> information about the encapsulated protocol.
> 
> I don't think that there should be a goal that flow dissection starts
> at the same point in the packet for all callsites along the datapath.
> As long as it always starts at a known ETH_P_.. type protocol header
> the program should be able to parse that. That is how the non-BPF
> flow dissector works.
> 
> > > Current bpf_flow_dissect_get_header assumes that
> > > ctx->data + ctx->flow_keys->thoff point to IP, right?
> > Yes, mostly, except that if skb->protocol is 802.1q/ad, it's 802.1q header.
> > And it's only for the "main" call; bpf program adjusts this thoff
> > to make sure that tail calls preserve some sense of progress (so it
> > eventually points to L4 and that's what we export back).
> >
> > > Based on what Stanislav is saying above, even that is not a guarantee?
> > > I'm struggling to see how users can wrap their heads around this.
> > > It seems bpf_flow.c will become the only prog that can deal with
> > > this range of possible inputs.
> > >
> > > I propose to start with a doc that describes all cases, where
> > > things point to, and how a prog is supposed to parse that.
> > Yeah, that is what I was going to propose - add a doc along with the
> > patch series. I don't see how we can make it simple(r) at this point :-(
> 
> Does it have to be simpler? A flow dissector should be ready to
> dissect VLAN tags. That's the only complication here?
I don't see how it can be made simpler. That's the context from which
the existing __skb_flow_dissect is called, and that's what we have to
dissect from BPF as well. We can try to make nhoff 0 when the
dissector is called; that's probably the only simplification we can
attempt (but, as I said previously, it requires bringing the
new kernel context to bpf/stable and seems more complicated than
necessary).

Let me prepare a series for bpf/stable with a small doc describing the
BPF flow dissector environment. We can continue the discussion
from there :-)

> > I can try to document everything so users don't have to read the
> > kernel code to understand how to write the bpf flow dissector programs.
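
For illustration only, here is a minimal userspace sketch of the contract
described in the diagrams above: the dissector is handed (data, nhoff, proto),
and when proto is 802.1Q/802.1ad, nhoff points at the TCI rather than at L3.
The names find_l3 and struct dissect_result are hypothetical and do not exist
in the kernel or in progs/bpf_flow.c; a real BPF program would work on
ctx->data and ctx->flow_keys instead of a plain buffer.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_P_IP     0x0800
#define ETH_P_8021Q  0x8100
#define ETH_P_8021AD 0x88A8

/* Hypothetical result type for this sketch, not a kernel structure. */
struct dissect_result {
	uint16_t l3_proto;	/* EtherType of the header found at l3_off */
	size_t   l3_off;	/* offset of that header from data */
};

static uint16_t load_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Hypothetical helper: starting at data + nhoff, where proto names the
 * header found there, skip any VLAN tags. For a VLAN proto, nhoff points
 * at the 2-byte TCI, which is followed by the 2-byte encapsulated proto.
 * Returns 0 on success, -1 if the buffer is too short.
 */
static int find_l3(const uint8_t *data, size_t len, size_t nhoff,
		   uint16_t proto, struct dissect_result *out)
{
	while (proto == ETH_P_8021Q || proto == ETH_P_8021AD) {
		if (nhoff + 4 > len)
			return -1;
		proto = load_be16(data + nhoff + 2);
		nhoff += 4;
	}
	out->l3_proto = proto;
	out->l3_off = nhoff;
	return 0;
}
```

In the NO-VLAN case the loop body never runs and the result is just
(nhoff, proto) as passed in; in the VLAN no-skb case the offset advances by
four bytes per tag. Only the sum data + nhoff matters to the parser, which is
the point made earlier in the thread.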


Thread overview: 31+ messages
2019-03-22 19:58 [RFC bpf-next v3 0/8] net: flow_dissector: trigger BPF hook when called from eth_get_headlen Stanislav Fomichev
2019-03-22 19:58 ` [RFC bpf-next v3 1/8] flow_dissector: allow access only to a subset of __sk_buff fields Stanislav Fomichev
2019-03-22 19:58 ` [RFC bpf-next v3 2/8] flow_dissector: switch kernel context to struct bpf_flow_dissector Stanislav Fomichev
2019-03-22 19:58 ` [RFC bpf-next v3 3/8] flow_dissector: fix clamping of BPF flow_keys for non-zero nhoff Stanislav Fomichev
2019-03-22 19:58 ` [RFC bpf-next v3 4/8] bpf: when doing BPF_PROG_TEST_RUN for flow dissector use no-skb mode Stanislav Fomichev
2019-03-22 19:59 ` [RFC bpf-next v3 5/8] net: plumb network namespace into __skb_flow_dissect Stanislav Fomichev
2019-03-22 19:59 ` [RFC bpf-next v3 6/8] flow_dissector: handle no-skb use case Stanislav Fomichev
2019-03-23  1:00   ` Alexei Starovoitov
2019-03-23  1:19     ` Stanislav Fomichev
2019-03-23  1:41       ` Alexei Starovoitov
2019-03-23 16:05         ` Stanislav Fomichev
2019-03-26  0:35           ` Alexei Starovoitov
2019-03-26 16:45             ` Stanislav Fomichev
2019-03-26 17:48               ` Alexei Starovoitov
2019-03-26 17:51                 ` Willem de Bruijn
2019-03-26 18:08                   ` Alexei Starovoitov
2019-03-26 18:17                     ` Stanislav Fomichev
2019-03-26 18:30                       ` Alexei Starovoitov
2019-03-26 18:54                         ` Stanislav Fomichev
2019-03-27  1:41                           ` Alexei Starovoitov
2019-03-27  2:44                             ` Stanislav Fomichev
2019-03-27 17:55                               ` Alexei Starovoitov
2019-03-27 19:58                                 ` Stanislav Fomichev
2019-03-28  1:26                                   ` Alexei Starovoitov
2019-03-28  3:14                                     ` Willem de Bruijn
2019-03-28  3:32                                       ` Alexei Starovoitov
2019-03-28  4:17                                         ` Stanislav Fomichev
2019-03-28 12:58                                           ` Willem de Bruijn
2019-04-01 16:30                                             ` Stanislav Fomichev [this message]
2019-03-22 19:59 ` [RFC bpf-next v3 7/8] net: pass net argument to the eth_get_headlen Stanislav Fomichev
2019-03-22 19:59 ` [RFC bpf-next v3 8/8] selftests/bpf: add flow dissector bpf_skb_load_bytes helper test Stanislav Fomichev
