Re: [RFC bpf-next v2 7/9] bpf: when doing BPF_PROG_TEST_RUN for flow dissector use no-skb mode

From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Stanislav Fomichev <sdf@fomichev.me>
Cc: Stanislav Fomichev <sdf@google.com>,
	Network Development <netdev@vger.kernel.org>,
	bpf@vger.kernel.org, David Miller <davem@davemloft.net>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	simon.horman@netronome.com, Willem de Bruijn <willemb@google.com>,
	Petar Penkov <peterpenkov96@gmail.com>
Subject: Re: [RFC bpf-next v2 7/9] bpf: when doing BPF_PROG_TEST_RUN for flow dissector use no-skb mode
Date: Wed, 20 Mar 2019 16:03:26 -0400	[thread overview]
Message-ID: <CAF=yD-Jbtz39CLXp2TeUiVyW4DPmNFGscy2YosLUQ+=ODM_syg@mail.gmail.com> (raw)
In-Reply-To: <20190320194854.GN7431@mini-arch.hsd1.ca.comcast.net>

On Wed, Mar 20, 2019 at 3:48 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
>
> On 03/20, Willem de Bruijn wrote:
> > On Wed, Mar 20, 2019 at 3:19 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > >
> > > On 03/20, Willem de Bruijn wrote:
> > > > On Wed, Mar 20, 2019 at 3:02 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > > > >
> > > > > On 03/20, Willem de Bruijn wrote:
> > > > > > On Wed, Mar 20, 2019 at 12:57 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > > > > > >
> > > > > > > On 03/19, Willem de Bruijn wrote:
> > > > > > > > On Tue, Mar 19, 2019 at 6:21 PM Stanislav Fomichev <sdf@google.com> wrote:
> > > > > > > > >
> > > > > > > > > Now that we have __flow_bpf_dissect which works on raw data (by
> > > > > > > > > constructing temporary on-stack skb), use it when doing
> > > > > > > > > BPF_PROG_TEST_RUN for flow dissector.
> > > > > > > > >
> > > > > > > > > This should help us catch any possible bugs due to missing shinfo on
> > > > > > > > > the per-cpu skb.
> > > > > > > > >
> > > > > > > > > Note that existing __skb_flow_bpf_dissect swallows L2 headers and returns
> > > > > > > > > nhoff=0, we need to preserve the existing behavior.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > > > > > > > ---
> > > > > > > > >  net/bpf/test_run.c | 48 ++++++++++++++--------------------------------
> > > > > > > > >  1 file changed, 14 insertions(+), 34 deletions(-)
> > > > > > > > >
> > > > > > > >
> > > > > > > > > @@ -300,9 +277,13 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
> > > > > > > > >         preempt_disable();
> > > > > > > > >         time_start = ktime_get_ns();
> > > > > > > > >         for (i = 0; i < repeat; i++) {
> > > > > > > > > -               retval = bpf_flow_dissect_skb(prog, skb,
> > > > > > > > > -                                             &flow_keys_dissector,
> > > > > > > > > -                                             &flow_keys);
> > > > > > > > > +               retval = bpf_flow_dissect(prog, data, eth->h_proto, ETH_HLEN,
> > > > > > > > > +                                         size, &flow_keys_dissector,
> > > > > > > > > +                                         &flow_keys);
> > > > > > > > > +               if (flow_keys.nhoff >= ETH_HLEN)
> > > > > > > > > +                       flow_keys.nhoff -= ETH_HLEN;
> > > > > > > > > +               if (flow_keys.thoff >= ETH_HLEN)
> > > > > > > > > +                       flow_keys.thoff -= ETH_HLEN;
> > > > > > > >
> > > > > > > > why are these conditional?
> > > > > > > Hm, I didn't want these to be negative, because bpf flow program can set
> > > > > > > them to zero and clamp_flow_keys makes sure they are in a "sensible"
> > > > > > > range. For this particular case, I think we need to amend
> > > > > > > clamp_flow_keys to make sure that flow_keys.nhoff is in the range of
> > > > > > > initial_nhoff..hlen, not 0..hlen (and then we can drop these checks).
> > > > > >
> > > > > > So, previously eth_type_trans would call with data at the network
> > > > > > header. Now it is called with data at the link layer. How would
> > > > > > __skb_flow_bpf_dissect "swallows L2 headers and returns nhoff=0"? That
> > > > > s/__skb_flow_bpf_dissect/eth_type_trans/, I'll clarify that in the patch
> > > > > description.
> > > > >
> > > > > > sounds incorrect.
> > > > > Previously, for skb case, eth_type_trans would pull ETH_HLEN (L2) and
> > > > > after that we did skb_reset_network_header. So when later we initialized
> > > > > flow keys (flow_keys->nhoff = skb_network_offset(skb)), that would
> > > > > yield nhoff == 0.
> > > > >
> > > > > For example, see:
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
> > > > >
> > > > > Now, we explicitly call bpf_flow_dissect with nhoff=ETH_HLEN and have to
> > > > > undo it, otherwise, it breaks those tests.
> > > > >
> > > > > We could do something like the following instead:
> > > > > retval = bpf_flow_dissect(prog, data + ETH_HLEN, eth->h_proto, 0,
> > > > >                           size, &flow_keys_dissector,
> > > > >                           &flow_keys);
> > > > >
> > > > > But I wanted to make sure nhoff != 0 works.
> > > >
> > > > Makes sense. Ensuring that nhoff lies within initial_nhoff..hlen
> > > > sounds correct to me. But this is a limitation of the test, so should
> > > > be in the test logic, not in the generic clamp code. Perhaps just fail
> > > > the test if returned nhoff < ETH_HLEN?
> > > I don't think it's only about the tests. BPF program can return
> > > nhoff/thoff out of range as well (if there was some bug in its logic,
> > > for example). We should not blindly trust whatever it returns, right?
> >
> > Definitely. That's why we clamp. I'm not sure that we have to restrict
> > the minimum offset to initial nhoff, however.
> Makes sense. TBH, only the tests currently care about nhoff that flow
> dissector returns. In the kernel we use only thoff from bpf flow dissector
> and ignore any modifications to the nhoff.
>
> Do you think there is a usecase for nhoff possibly going backwards?
> In other words, why not prohibit that from the beginning and set the
> expectations strait (i.e. nhoff only grows).

Fair point. That is how the non bpf flow dissector works. And if the
initial offset is always sensible, indeed I see no reasonable case
where the program would return a lower value. I was a bit concerned
about that precondition. In practice, all but one caller passes 0 and
data at network header, where this discussion is moot. And the
exception is eth_get_headlen which hardcodes ETH_HLEN. Given that, y
our original suggestion to adjust the clamp function SGTM.