All of lore.kernel.org
 help / color / mirror / Atom feed
From: Daniel Borkmann <daniel@iogearbox.net>
To: Daniel Xu <dxu@dxuuu.xyz>,
	"David S. Miller" <davem@davemloft.net>,
	Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
	David Ahern <dsahern@kernel.org>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>
Cc: ppenkov@aviatrix.com, dbird@aviatrix.com,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	bpf@vger.kernel.org
Subject: Re: [PATCH bpf-next 3/6] bpf, net, frags: Add bpf_ip_check_defrag() kfunc
Date: Thu, 15 Dec 2022 23:31:52 +0100	[thread overview]
Message-ID: <451b291a-7798-cfe2-84da-815937b54f70@iogearbox.net> (raw)
In-Reply-To: <1f48a340a898c4d22d65e0e445dbf15f72081b9a.1671049840.git.dxu@dxuuu.xyz>

Hi Daniel,

Thanks for working on this!

On 12/15/22 12:25 AM, Daniel Xu wrote:
[...]
> +#include <linux/bpf.h>
> +#include <linux/btf_ids.h>
> +#include <linux/ip.h>
> +#include <linux/filter.h>
> +#include <linux/netdevice.h>
> +#include <net/ip.h>
> +#include <net/sock.h>
> +
> +__diag_push();
> +__diag_ignore_all("-Wmissing-prototypes",
> +		  "Global functions as their definitions will be in ip_fragment BTF");
> +
> +/* bpf_ip_check_defrag - Defragment an ipv4 packet
> + *
> + * This helper takes an skb as input. If this skb successfully reassembles
> + * the original packet, the skb is updated to contain the original, reassembled
> + * packet.
> + *
> + * Otherwise (on error or incomplete reassembly), the input skb remains
> + * unmodified.
> + *
> + * Parameters:
> + * @ctx		- Pointer to program context (skb)
> + * @netns	- Child network namespace id. If value is a negative signed
> + *		  32-bit integer, the netns of the device in the skb is used.
> + *
> + * Return:
> + * 0 on successfully reassembly or non-fragmented packet. Negative value on
> + * error or incomplete reassembly.
> + */
> +int bpf_ip_check_defrag(struct __sk_buff *ctx, u64 netns)

small nit: for sk lookup helper we've used u32 netns_id, would be nice to have
this consistent here as well.

> +{
> +	struct sk_buff *skb = (struct sk_buff *)ctx;
> +	struct sk_buff *skb_cpy, *skb_out;
> +	struct net *caller_net;
> +	struct net *net;
> +	int mac_len;
> +	void *mac;
> +
> +	if (unlikely(!((s32)netns < 0 || netns <= S32_MAX)))
> +		return -EINVAL;
> +
> +	caller_net = skb->dev ? dev_net(skb->dev) : sock_net(skb->sk);
> +	if ((s32)netns < 0) {
> +		net = caller_net;
> +	} else {
> +		net = get_net_ns_by_id(caller_net, netns);
> +		if (unlikely(!net))
> +			return -EINVAL;
> +	}
> +
> +	mac_len = skb->mac_len;
> +	skb_cpy = skb_copy(skb, GFP_ATOMIC);
> +	if (!skb_cpy)
> +		return -ENOMEM;

Given slow path, this idea is expensive but okay. Maybe in future it could be lifted
which might be a bigger lift to teach verifier that input ctx cannot be accessed
anymore.. but then frags are very much discouraged either way and bpf_ip_check_defrag()
might only apply in corner case situations (like DNS, etc).

> +	skb_out = ip_check_defrag(net, skb_cpy, IP_DEFRAG_BPF);
> +	if (IS_ERR(skb_out))
> +		return PTR_ERR(skb_out);

Looks like ip_check_defrag() can gracefully handle IPv6 packet. It will just return back
skb_cpy pointer in that case. However, this brings me to my main complaint.. I don't
think we should merge anything IPv4-related without also having IPv6 equivalent support,
otherwise we're building up tech debt, so pls also add support for the latter.

> +	skb_morph(skb, skb_out);
> +	kfree_skb(skb_out);
> +
> +	/* ip_check_defrag() does not maintain mac header, so push empty header
> +	 * in so prog sees the correct layout. The empty mac header will be
> +	 * later pulled from cls_bpf.
> +	 */
> +	mac = skb_push(skb, mac_len);
> +	memset(mac, 0, mac_len);
> +	bpf_compute_data_pointers(skb);
> +
> +	return 0;
> +}
> +

Thanks,
Daniel

  reply	other threads:[~2022-12-15 22:32 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-14 23:25 [PATCH bpf-next 0/6] Support defragmenting IPv4 packets in BPF Daniel Xu
2022-12-14 23:25 ` [PATCH bpf-next 1/6] ip: frags: Return actual error codes from ip_check_defrag() Daniel Xu
2022-12-14 23:25 ` [PATCH bpf-next 2/6] bpf: verifier: Support KF_CHANGES_PKT flag Daniel Xu
2022-12-17 18:08   ` kernel test robot
2022-12-14 23:25 ` [PATCH bpf-next 3/6] bpf, net, frags: Add bpf_ip_check_defrag() kfunc Daniel Xu
2022-12-15 22:31   ` Daniel Borkmann [this message]
2022-12-15 23:58     ` Daniel Xu
2022-12-14 23:25 ` [PATCH bpf-next 4/6] bpf: selftests: Support not connecting client socket Daniel Xu
2022-12-14 23:25 ` [PATCH bpf-next 5/6] bpf: selftests: Support custom type and proto for client sockets Daniel Xu
2022-12-14 23:25 ` [PATCH bpf-next 6/6] bpf: selftests: Add bpf_ip_check_defrag() selftest Daniel Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=451b291a-7798-cfe2-84da-815937b54f70@iogearbox.net \
    --to=daniel@iogearbox.net \
    --cc=bpf@vger.kernel.org \
    --cc=davem@davemloft.net \
    --cc=dbird@aviatrix.com \
    --cc=dsahern@kernel.org \
    --cc=dxu@dxuuu.xyz \
    --cc=edumazet@google.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=ppenkov@aviatrix.com \
    --cc=yoshfuji@linux-ipv6.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.