bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Daniel Borkmann <daniel@iogearbox.net>
Cc: bpf@vger.kernel.org, netdev@vger.kernel.org,
	Daniel Borkmann <borkmann@iogearbox.net>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	maze@google.com, lmb@cloudflare.com, shaun@tigera.io,
	Lorenzo Bianconi <lorenzo@kernel.org>,
	marek@cloudflare.com, John Fastabend <john.fastabend@gmail.com>,
	Jakub Kicinski <kuba@kernel.org>,
	eyal.birger@gmail.com, colrack@gmail.com,
	David Ahern <dsahern@kernel.org>,
	brouer@redhat.com
Subject: Re: [PATCH bpf-next V15 2/7] bpf: fix bpf_fib_lookup helper MTU check for SKB ctx
Date: Mon, 8 Feb 2021 16:20:56 +0100	[thread overview]
Message-ID: <20210208162056.44b0236e@carbon> (raw)
In-Reply-To: <20210208145713.4ee3e9ba@carbon>

On Mon, 8 Feb 2021 14:57:13 +0100
Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> On Fri, 5 Feb 2021 01:06:35 +0100
> Daniel Borkmann <daniel@iogearbox.net> wrote:
> 
> > On 2/2/21 5:26 PM, Jesper Dangaard Brouer wrote:  
> > > BPF end-user on Cilium slack-channel (Carlo Carraro) wants to use
> > > bpf_fib_lookup for doing MTU-check, but *prior* to extending packet size,
> > > by adjusting fib_params 'tot_len' with the packet length plus the expected
> > > encap size. (Just like the bpf_check_mtu helper supports). He discovered
> > > that for SKB ctx the param->tot_len was not used, instead skb->len was used
> > > (via MTU check in is_skb_forwardable() that checks against netdev MTU).
> > > 
> > > Fix this by using fib_params 'tot_len' for MTU check. If not provided (e.g.
> > > zero) then keep existing TC behaviour intact. Notice that 'tot_len' for MTU
> > > check is done like XDP code-path, which checks against FIB-dst MTU.
> > > 
> > > V13:
> > > - Only do ifindex lookup one time, calling dev_get_by_index_rcu().
> > > 
> > > V10:
> > > - Use same method as XDP for 'tot_len' MTU check
> > > 
> > > Fixes: 4c79579b44b1 ("bpf: Change bpf_fib_lookup to return lookup status")
> > > Reported-by: Carlo Carraro <colrack@gmail.com>
> > > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > > Acked-by: John Fastabend <john.fastabend@gmail.com>    
> > [...]
> > 
> > I was about to apply the series just now, but on a last double check there is
> > a subtle logic bug in here that still needs fixing unfortunately. :/ See below:
> >   
> > > @@ -5568,7 +5565,9 @@ BPF_CALL_4(bpf_skb_fib_lookup, struct sk_buff *, skb,
> > >   	   struct bpf_fib_lookup *, params, int, plen, u32, flags)
> > >   {
> > >   	struct net *net = dev_net(skb->dev);
> > > +	struct net_device *dev;
> > >   	int rc = -EAFNOSUPPORT;
> > > +	bool check_mtu = false;
> > >   
> > >   	if (plen < sizeof(*params))
> > >   		return -EINVAL;
> > > @@ -5576,23 +5575,30 @@ BPF_CALL_4(bpf_skb_fib_lookup, struct sk_buff *, skb,
> > >   	if (flags & ~(BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT))
> > >   		return -EINVAL;
> > >   
> > > +	dev = dev_get_by_index_rcu(net, params->ifindex);
> > > +	if (unlikely(!dev))
> > > +		return -ENODEV;    
> > 
> > Based on your earlier idea, you try to avoid refetching the dev this way, so
> > here it's being looked up via params->ifindex provided from the BPF prog ...
> >   
> > > +	if (params->tot_len)
> > > +		check_mtu = true;
> > > +
> > >   	switch (params->family) {
> > >   #if IS_ENABLED(CONFIG_INET)
> > >   	case AF_INET:
> > > -		rc = bpf_ipv4_fib_lookup(net, params, flags, false);
> > > +		rc = bpf_ipv4_fib_lookup(net, dev, params, flags, check_mtu);    
> > 
> > ... however, bpf_ipv{4,6}_fib_lookup() might change params->ifindex here to
> > indicate nexthop output dev:
> > 
> > [...]
> >          dev = nhc->nhc_dev;
> > 
> >          params->rt_metric = res.fi->fib_priority;
> >          params->ifindex = dev->ifindex;
> > [...]  
> 
> I want to hear David Ahern, what cases does this cover?

Let me answer this myself (as it should have be obvious to myself).
The ifindex that is used as part of return param, is "set to the device
index of the nexthop from the FIB lookup" (as man-page says).  The
params->ifindex will *always* be updated, and will very likely be
another ifindex, as on "input" it is the ingress-device and on
"output" it will be egress-device (result of FIB lookup).

 
> > >   		break;
> > >   #endif
> > >   #if IS_ENABLED(CONFIG_IPV6)
> > >   	case AF_INET6:
> > > -		rc = bpf_ipv6_fib_lookup(net, params, flags, false);
> > > +		rc = bpf_ipv6_fib_lookup(net, dev, params, flags, check_mtu);
> > >   		break;
> > >   #endif
> > >   	}
> > >   
> > > -	if (!rc) {
> > > -		struct net_device *dev;
> > > -
> > > -		dev = dev_get_by_index_rcu(net, params->ifindex);
> > > +	if (rc == BPF_FIB_LKUP_RET_SUCCESS && !check_mtu) {
> > > +		/* When tot_len isn't provided by user,
> > > +		 * check skb against net_device MTU
> > > +		 */
> > >   		if (!is_skb_forwardable(dev, skb))
> > >   			rc = BPF_FIB_LKUP_RET_FRAG_NEEDED;    
> > 
> > ... so using old cached dev from above will result in wrong MTU check &
> > subsequent passing of wrong params->mtu_result = dev->mtu this way.   
> 
> Yes, you are right, params->ifindex have a chance to change in the calls.
> So, our attempt to save an ifindex lookup (dev_get_by_index_rcu) is not
> correct.
> 
> > So one
> > way to fix is that we would need to pass &dev to bpf_ipv{4,6}_fib_lookup().  
> 
> Ok, I will try to code it up, and see how ugly it looks, but I'm no
> longer sure that it is worth saving this ifindex lookup, as it will
> only happen if BPF-prog didn't specify params->tot_len.

I guess we can still do this as an "optimization", but the dev/ifindex
will very likely be another at this point.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


  reply	other threads:[~2021-02-08 15:29 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-02 16:26 [PATCH bpf-next V15 0/7] bpf: New approach for BPF MTU handling Jesper Dangaard Brouer
2021-02-02 16:26 ` [PATCH bpf-next V15 1/7] bpf: Remove MTU check in __bpf_skb_max_len Jesper Dangaard Brouer
2021-02-02 16:26 ` [PATCH bpf-next V15 2/7] bpf: fix bpf_fib_lookup helper MTU check for SKB ctx Jesper Dangaard Brouer
2021-02-05  0:06   ` Daniel Borkmann
2021-02-08 13:57     ` Jesper Dangaard Brouer
2021-02-08 15:20       ` Jesper Dangaard Brouer [this message]
2021-02-08 15:41         ` Daniel Borkmann
2021-02-08 16:27           ` Jesper Dangaard Brouer
2021-02-08 16:46             ` Daniel Borkmann
2021-02-02 16:26 ` [PATCH bpf-next V15 3/7] bpf: bpf_fib_lookup return MTU value as output when looked up Jesper Dangaard Brouer
2021-02-02 16:27 ` [PATCH bpf-next V15 4/7] bpf: add BPF-helper for MTU checking Jesper Dangaard Brouer
2021-02-02 16:27 ` [PATCH bpf-next V15 5/7] bpf: drop MTU check when doing TC-BPF redirect to ingress Jesper Dangaard Brouer
2021-02-02 16:27 ` [PATCH bpf-next V15 6/7] selftests/bpf: use bpf_check_mtu in selftest test_cls_redirect Jesper Dangaard Brouer
2021-02-02 16:27 ` [PATCH bpf-next V15 7/7] selftests/bpf: tests using bpf_check_mtu BPF-helper Jesper Dangaard Brouer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210208162056.44b0236e@carbon \
    --to=brouer@redhat.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=borkmann@iogearbox.net \
    --cc=bpf@vger.kernel.org \
    --cc=colrack@gmail.com \
    --cc=daniel@iogearbox.net \
    --cc=dsahern@kernel.org \
    --cc=eyal.birger@gmail.com \
    --cc=john.fastabend@gmail.com \
    --cc=kuba@kernel.org \
    --cc=lmb@cloudflare.com \
    --cc=lorenzo@kernel.org \
    --cc=marek@cloudflare.com \
    --cc=maze@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=shaun@tigera.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).