From: Daniel Xu <dxu@dxuuu.xyz>
To: bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
	netdev@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH bpf-next v2 0/8] Support defragmenting IPv(4|6) packets in BPF
Date: Mon, 27 Feb 2023 12:51:02 -0700
Message-ID: <cover.1677526810.git.dxu@dxuuu.xyz>

=== Context ===

In the context of a middlebox, fragmented packets are tricky to handle.
The full 5-tuple of a packet is often only available in the first
fragment, which makes enforcing consistent policy difficult. There are
really only two stateless options, neither of which is very nice:

1. Enforce policy on first fragment and accept all subsequent fragments.
   This works but may let in certain attacks or allow data exfiltration.

2. Enforce policy on first fragment and drop all subsequent fragments.
   This does not really work b/c some protocols may rely on
   fragmentation. For example, DNS may rely on oversized UDP packets for
   large responses.
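
As a rough illustration of the two stateless options above (not code
from this patchset), the sketch below classifies an IPv4 packet by its
flags/fragment-offset field and applies either policy. The IP_MF and
IP_OFFSET values follow the IPv4 header layout; the verdict enum and
the function names are hypothetical, and frag_off is assumed to be
already converted to host byte order.

```c
#include <stdbool.h>
#include <stdint.h>

#define IP_MF     0x2000u /* "more fragments" flag */
#define IP_OFFSET 0x1FFFu /* fragment offset mask (8-byte units) */

/* Hypothetical verdict type, for illustration only. */
enum verdict { VERDICT_PASS, VERDICT_DROP };

/* True for any fragment: MF set or a non-zero offset. */
static bool ipv4_is_fragment(uint16_t frag_off)
{
	return (frag_off & (IP_MF | IP_OFFSET)) != 0;
}

/* True only for the first fragment: MF set and offset zero. */
static bool ipv4_is_first_fragment(uint16_t frag_off)
{
	return (frag_off & IP_MF) && (frag_off & IP_OFFSET) == 0;
}

/*
 * Option 1: enforce policy on the first fragment (and on unfragmented
 * packets), accept every later fragment unconditionally.
 */
static enum verdict stateless_accept_rest(uint16_t frag_off,
					  enum verdict l4_policy)
{
	if (ipv4_is_fragment(frag_off) && !ipv4_is_first_fragment(frag_off))
		return VERDICT_PASS;
	return l4_policy;
}

/*
 * Option 2: enforce policy on the first fragment (and on unfragmented
 * packets), drop every later fragment unconditionally.
 */
static enum verdict stateless_drop_rest(uint16_t frag_off,
					enum verdict l4_policy)
{
	if (ipv4_is_fragment(frag_off) && !ipv4_is_first_fragment(frag_off))
		return VERDICT_DROP;
	return l4_policy;
}
```

In both variants the later fragments never see the L4 policy at all,
which is exactly the weakness described above.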

So stateful tracking is the only sane option. RFC 8900 [0] calls this
out as well in section 6.3:

    Middleboxes [...] should process IP fragments in a manner that is
    consistent with [RFC0791] and [RFC8200]. In many cases, middleboxes
    must maintain state in order to achieve this goal.

=== BPF related bits ===

However, when policy is enforced through BPF, the prog is run before the
kernel reassembles fragmented packets. This leaves BPF developers in an
awkward place: implement reassembly (possibly poorly) or use a stateless
method as described above.

Fortunately, the kernel has robust support for fragmented IP packets.
This patchset wraps the existing defragmentation facilities in kfuncs so
that BPF progs running on middleboxes can reassemble fragmented packets
before applying policy.

=== Patchset details ===

This patchset is (hopefully) relatively straightforward from a BPF
perspective. One thing I'd like to call out is the skb_copy()ing of the
prog skb. I did this to maintain the invariant that the ctx remains
valid after the prog has run. This is relevant b/c ip_defrag() and
ip_check_defrag() may consume the skb if the skb is a fragment.
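
The ownership concern described above can be modelled in plain
userspace C. This is only a sketch, not the kernel code: struct pkt,
pkt_copy(), model_defrag() and defrag_preserving_ctx() are hypothetical
stand-ins for an skb, skb_copy(), a consuming defrag routine and the
kfunc wrapper respectively. The -EINPROGRESS return for a queued
fragment is an assumption of the model.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an skb; not a kernel type. */
struct pkt {
	size_t len;
	bool is_fragment;
	unsigned char data[64];
};

/* Stand-in for skb_copy(): independent heap copy, or NULL on failure. */
static struct pkt *pkt_copy(const struct pkt *p)
{
	struct pkt *c = malloc(sizeof(*c));

	if (c)
		memcpy(c, p, sizeof(*c));
	return c;
}

/*
 * Stand-in for a defrag routine that takes ownership of fragments:
 * a fragment is consumed (freed here) and -EINPROGRESS is returned;
 * a non-fragment is left untouched and 0 is returned.
 */
static int model_defrag(struct pkt *p)
{
	if (p->is_fragment) {
		free(p);
		return -EINPROGRESS;
	}
	return 0;
}

/*
 * Only a private copy is handed to the consuming routine, so the
 * caller's packet (the "ctx") stays valid whatever the outcome.
 */
static int defrag_preserving_ctx(const struct pkt *ctx)
{
	struct pkt *copy = pkt_copy(ctx);
	int ret;

	if (!copy)
		return -ENOMEM;
	ret = model_defrag(copy);
	if (ret == 0)
		free(copy); /* not consumed, so the copy is still ours */
	return ret;
}
```

The price of this approach is one extra copy per call; the gain is
that nothing about ctx validity has to change elsewhere.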

Originally I did play around with teaching the verifier about kfuncs
that may consume the ctx and disallowing ctx accesses in ret != 0
branches. It worked ok, but it seemed too complex to modify the
surrounding assumptions about ctx validity.

[0]: https://datatracker.ietf.org/doc/html/rfc8900

===

Changes from v1:
* Add support for IPv6 defragmentation


Daniel Xu (8):
  ip: frags: Return actual error codes from ip_check_defrag()
  bpf: verifier: Support KF_CHANGES_PKT flag
  bpf, net, frags: Add bpf_ip_check_defrag() kfunc
  net: ipv6: Factor ipv6_frag_rcv() to take netns and user
  bpf: net: ipv6: Add bpf_ipv6_frag_rcv() kfunc
  bpf: selftests: Support not connecting client socket
  bpf: selftests: Support custom type and proto for client sockets
  bpf: selftests: Add defrag selftests

 Documentation/bpf/kfuncs.rst                  |   7 +
 drivers/net/macvlan.c                         |   2 +-
 include/linux/btf.h                           |   1 +
 include/net/ip.h                              |  11 +
 include/net/ipv6.h                            |   1 +
 include/net/ipv6_frag.h                       |   1 +
 include/net/transp_v6.h                       |   1 +
 kernel/bpf/verifier.c                         |   8 +
 net/ipv4/Makefile                             |   1 +
 net/ipv4/ip_fragment.c                        |  15 +-
 net/ipv4/ip_fragment_bpf.c                    |  98 ++++++
 net/ipv6/Makefile                             |   1 +
 net/ipv6/af_inet6.c                           |   4 +
 net/ipv6/reassembly.c                         |  16 +-
 net/ipv6/reassembly_bpf.c                     | 143 ++++++++
 net/packet/af_packet.c                        |   2 +-
 tools/testing/selftests/bpf/Makefile          |   3 +-
 .../selftests/bpf/generate_udp_fragments.py   |  90 +++++
 .../selftests/bpf/ip_check_defrag_frags.h     |  57 +++
 tools/testing/selftests/bpf/network_helpers.c |  26 +-
 tools/testing/selftests/bpf/network_helpers.h |   3 +
 .../bpf/prog_tests/ip_check_defrag.c          | 327 ++++++++++++++++++
 .../selftests/bpf/progs/bpf_tracing_net.h     |   1 +
 .../selftests/bpf/progs/ip_check_defrag.c     | 133 +++++++
 24 files changed, 931 insertions(+), 21 deletions(-)
 create mode 100644 net/ipv4/ip_fragment_bpf.c
 create mode 100644 net/ipv6/reassembly_bpf.c
 create mode 100755 tools/testing/selftests/bpf/generate_udp_fragments.py
 create mode 100644 tools/testing/selftests/bpf/ip_check_defrag_frags.h
 create mode 100644 tools/testing/selftests/bpf/prog_tests/ip_check_defrag.c
 create mode 100644 tools/testing/selftests/bpf/progs/ip_check_defrag.c

-- 
2.39.1
