netdev.vger.kernel.org archive mirror
From: Daniel Xu <dxu@dxuuu.xyz>
To: Edward Cree <ecree.xilinx@gmail.com>
Cc: bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
	netdev@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH bpf-next v2 0/8] Support defragmenting IPv(4|6) packets in BPF
Date: Mon, 27 Feb 2023 15:04:06 -0700
Message-ID: <20230227220406.4x45jcigpnjjpdfy@kashmir.localdomain>
In-Reply-To: <cf49a091-9b14-05b8-6a79-00e56f3019e1@gmail.com>

Hi Ed,

Thanks for giving this a look.

On Mon, Feb 27, 2023 at 08:38:41PM +0000, Edward Cree wrote:
> On 27/02/2023 19:51, Daniel Xu wrote:
> > However, when policy is enforced through BPF, the prog is run before the
> > kernel reassembles fragmented packets. This leaves BPF developers in an
> > awkward place: implement reassembly (possibly poorly) or use a stateless
> > method as described above.
> 
> Just out of curiosity - what stops BPF progs using the middle ground of
>  stateful validation?  I'm thinking of something like:
> First-frag: run the usual checks on L4 headers etc, if we PASS then save
>  IPID and maybe expected next frag-offset into a map.  But don't try to
>  stash the packet contents anywhere for later reassembly, just PASS it.
> Subsequent frags: look up the IPID in the map.  If we find it, validate
>  and update the frag-offset in the map; if this is the last fragment then
>  delete the map entry.  If the frag-offset was bogus or the IPID wasn't
>  found in the map, DROP; otherwise PASS.
> (If re-ordering is prevalent then use something more sophisticated than
>  just expected next frag-offset, but the principle is the same. And of
>  course you might want to put in timers for expiry etc.)
> So this avoids the need to stash the packet data and modify/consume SKBs,
>  because you're not actually doing reassembly; the down-side is that the
>  BPF program can't so easily make decisions about the application-layer
>  contents of the fragmented datagram, but for the common case (we just
>  care about the 5-tuple) it's simple enough.
> But I haven't actually tried it, so maybe there's some obvious reason why
>  it can't work this way.
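
For concreteness, the scheme quoted above could be sketched in plain
userspace C roughly as follows. This is only an illustration under
stated assumptions, not code from the patch series: the fixed-size
table stands in for a BPF map, and every name in it (FRAG_SLOTS,
check_frag(), frag_selftest()) is hypothetical. It keys on the IPID
alone, ignores expiry, and lets colliding IPIDs overwrite each other;
a real filter would at least key on source, destination, and protocol
as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FRAG_SLOTS 64 /* hypothetical size; stands in for a BPF map */

enum verdict { DROP = 0, PASS = 1 };

struct frag_state {
	bool used;
	uint16_t ipid;     /* IP Identification field */
	uint16_t next_off; /* expected offset of the next fragment, in bytes */
};

static struct frag_state table[FRAG_SLOTS];

static struct frag_state *lookup(uint16_t ipid)
{
	struct frag_state *s = &table[ipid % FRAG_SLOTS];

	return (s->used && s->ipid == ipid) ? s : NULL;
}

/*
 * off:   fragment offset in bytes
 * len:   payload length of this fragment in bytes
 * more:  More Fragments flag
 * l4_ok: result of the usual L4 checks (only consulted when off == 0)
 */
static enum verdict check_frag(uint16_t ipid, uint16_t off, uint16_t len,
			       bool more, bool l4_ok)
{
	struct frag_state *s;

	if (off == 0) {
		if (!l4_ok)
			return DROP;
		if (!more)
			return PASS; /* not fragmented at all */
		/* First fragment: remember where the next one must start. */
		s = &table[ipid % FRAG_SLOTS];
		s->used = true;
		s->ipid = ipid;
		s->next_off = len;
		return PASS;
	}

	s = lookup(ipid);
	if (!s || off != s->next_off)
		return DROP; /* unknown IPID or bogus offset */

	if (more)
		s->next_off = (uint16_t)(off + len);
	else
		s->used = false; /* last fragment: delete the entry */
	return PASS;
}

/* In-order fragments pass; a fragment with no first-frag state is dropped. */
static void frag_selftest(void)
{
	memset(table, 0, sizeof(table));
	assert(check_frag(0x1234, 0, 1480, true, true) == PASS);
	assert(check_frag(0x1234, 1480, 1480, true, false) == PASS);
	assert(check_frag(0x1234, 2960, 100, false, false) == PASS);
	assert(check_frag(0x9999, 1480, 100, false, false) == DROP);
}
```

Note that the sketch drops anything that arrives out of order, which
is exactly the re-ordering caveat mentioned in the quoted text.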

I don't believe the first fragment is required to carry the full L4
header. Sufficiently sneaky attackers can, I think, send a byte at a
time to subvert your proposed algorithm, so storing skb data seems
inevitable. Someone can correct me if I'm wrong.
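
To make the short-first-fragment concern concrete, here is a minimal
sketch of the length check a first-fragment validator would need
before it can trust anything in the L4 header. The helper name and
constants are hypothetical and not taken from the series. One nuance:
IPv4 fragment offsets are expressed in 8-byte units, so a non-final
fragment carries at least 8 bytes rather than a single byte, but 8
bytes is still well short of the 20-byte minimum TCP header (the TCP
flags sit at byte 13). This is the "tiny fragment" problem described
in RFC 1858.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PROTO_TCP   6  /* IP protocol numbers */
#define PROTO_UDP   17
#define TCP_MIN_HDR 20 /* TCP header without options, in bytes */
#define UDP_HDR     8  /* UDP header, in bytes */

/*
 * Returns true only if a first fragment with frag_payload_len bytes of
 * IP payload contains the whole minimal L4 header. TCP options (a data
 * offset greater than 5) would need a further check once the header
 * itself is known to be readable.
 */
static bool first_frag_has_l4_header(uint8_t proto, uint16_t frag_payload_len)
{
	switch (proto) {
	case PROTO_TCP:
		return frag_payload_len >= TCP_MIN_HDR;
	case PROTO_UDP:
		return frag_payload_len >= UDP_HDR;
	default:
		return false; /* unknown protocol: cannot validate */
	}
}

/* An 8-byte first fragment covers a UDP header but not a TCP header. */
static void tiny_frag_selftest(void)
{
	assert(first_frag_has_l4_header(PROTO_UDP, 8));
	assert(!first_frag_has_l4_header(PROTO_TCP, 8));
	assert(first_frag_has_l4_header(PROTO_TCP, 20));
}
```

A filter that does not store packet data can only drop such short
first fragments outright; it cannot wait for the rest of the header.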

Reordering, as you mentioned, is another attack vector. Perhaps there
are more sophisticated semi-stateful algorithms that can solve the
problem, but that leads me to my next point.

A semi-stateful method like the one you are proposing concerns me from
a reliability and correctness standpoint. Such a method can suffer from
impedance mismatches with the rest of the system. For example, whatever
map sizes you choose should probably be aligned with the conntrack
sysctl values; otherwise you may get some very interesting and
unexpected packet drops. I think Cilium had a talk a while ago about
debugging a related conntrack issue in the same vein. Furthermore, the
debugging and troubleshooting facilities (counters, logs, etc.) will be
different.

Unless someone has had lots of experience writing an IP stack from
the ground up, I suspect there are quite a few more unknown unknowns
here. What I find valuable about this patch series is that we can
leverage the well-understood and battle-hardened kernel facilities, and
so avoid all the correctness and security issues that the kernel has
spent 20+ years fixing. It also makes it trivial for the next person
who comes along to do the right thing.

Hopefully this all makes sense.

Thanks,
Daniel
