Re: [PATCH bpf-next V3 0/6] bpf: New approach for BPF MTU handling

From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>,
	bpf@vger.kernel.org, netdev@vger.kernel.org,
	Daniel Borkmann <borkmann@iogearbox.net>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	maze@google.com, lmb@cloudflare.com, shaun@tigera.io,
	Lorenzo Bianconi <lorenzo@kernel.org>,
	marek@cloudflare.com, eyal.birger@gmail.com, brouer@redhat.com
Subject: Re: [PATCH bpf-next V3 0/6] bpf: New approach for BPF MTU handling
Date: Tue, 13 Oct 2020 22:40:09 +0200	[thread overview]
Message-ID: <20201013224009.77d6f746@carbon> (raw)
In-Reply-To: <20201010093212.374d1e68@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>

On Sat, 10 Oct 2020 09:32:12 -0700
Jakub Kicinski <kuba@kernel.org> wrote:

> On Sat, 10 Oct 2020 12:44:02 +0200 Jesper Dangaard Brouer wrote:
> > > > > We will not be sprinkling validation checks across the drivers because
> > > > > some reconfiguration path may occasionally yield a bad packet, or it's
> > > > > hard to do something right with BPF.        
> > > > 
> > > > This is a driver bug then. As it stands today drivers may get hit with
> > > > skb with MTU greater than set MTU as best I can tell.      
> > > 
> > > You're talking about taking it from "maybe this can happen, but will
> > > still be at most jumbo" to "it's going to be very easy to trigger and
> > > length may be > MAX_U16".    
> > 
> > It is interesting that a misbehaving BPF program can easily trigger this.
> > Next week, I will looking writing such a BPF-prog and then test it on
> > the hardware I have avail in my testlab.  

I've tested sending different packet sizes that exceed the MTU on
different hardware. They all silently drop the transmitted packet. mlx5
and i40e configured to (L3) MTU 1500, will lets through upto 1504, while
ixgbe will drop size 1504.

Packets can be observed locally with tcpdump, but the other end doesn't
receive the packet. I didn't find any counters (including ethtool -S)
indicating these packets were dropped at hardware/firmware level, which
were a little concerning for later troubleshooting.

Another observation is that size increases (with bpf_skb_adjust_room)
above 4096 + e.g 128 will likely fail, even-though I have the 64K limit in
this kernel.

> FWIW I took a quick swing at testing it with the HW I have and it did
> exactly what hardware should do. The TX unit entered an error state 
> and then the driver detected that and reset it a few seconds later.

The drivers (i40e, mlx5, ixgbe) I tested with didn't entered an error
state, when getting packets exceeding the MTU.  I didn't go much above
4K, so maybe I didn't trigger those cases.

> Hardware is almost always designed to behave like that. If some NIC
> actually cleanly drops over sized TX frames, I'd bet it's done in FW,
> or some other software piece.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer