All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v3 0/3] virtio_net: add optional flow dissection in virtio_net_hdr_to_skb
@ 2021-06-01 22:18 Tanner Love
  2021-06-01 22:18 ` [PATCH net-next v3 1/3] net: flow_dissector: extend bpf flow dissector support with vnet hdr Tanner Love
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Tanner Love @ 2021-06-01 22:18 UTC (permalink / raw)
  To: netdev
  Cc: davem, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Eric Dumazet, Willem de Bruijn, Petar Penkov, Jakub Kicinski,
	Tanner Love

From: Tanner Love <tannerlove@google.com>

First patch extends the flow dissector BPF program type to accept
virtio-net header members.

Second patch uses this feature to add optional flow dissection in
virtio_net_hdr_to_skb(). This allows admins to define permitted
packets more strictly, for example dropping deprecated UDP_UFO
packets.

Third patch extends kselftest to cover this feature.

Tanner Love (3):
  net: flow_dissector: extend bpf flow dissector support with vnet hdr
  virtio_net: add optional flow dissection in virtio_net_hdr_to_skb
  selftests/net: amend bpf flow dissector prog to do vnet hdr validation

 drivers/net/bonding/bond_main.c               |   2 +-
 include/linux/skbuff.h                        |  26 ++-
 include/linux/virtio_net.h                    |  25 ++-
 include/net/flow_dissector.h                  |   6 +
 include/uapi/linux/bpf.h                      |   6 +
 net/core/filter.c                             |  55 +++++
 net/core/flow_dissector.c                     |  27 ++-
 net/core/sysctl_net_core.c                    |   9 +
 tools/include/uapi/linux/bpf.h                |   6 +
 tools/testing/selftests/bpf/progs/bpf_flow.c  | 188 +++++++++++++-----
 .../selftests/bpf/test_flow_dissector.c       | 181 +++++++++++++++--
 .../selftests/bpf/test_flow_dissector.sh      |  19 ++
 12 files changed, 470 insertions(+), 80 deletions(-)

-- 
2.32.0.rc0.204.g9fa02ecfa5-goog


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH net-next v3 1/3] net: flow_dissector: extend bpf flow dissector support with vnet hdr
  2021-06-01 22:18 [PATCH net-next v3 0/3] virtio_net: add optional flow dissection in virtio_net_hdr_to_skb Tanner Love
@ 2021-06-01 22:18 ` Tanner Love
  2021-06-03 15:39   ` sdf
  2021-06-01 22:18 ` [PATCH net-next v3 2/3] virtio_net: add optional flow dissection in virtio_net_hdr_to_skb Tanner Love
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Tanner Love @ 2021-06-01 22:18 UTC (permalink / raw)
  To: netdev
  Cc: davem, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Eric Dumazet, Willem de Bruijn, Petar Penkov, Jakub Kicinski,
	Tanner Love

From: Tanner Love <tannerlove@google.com>

Amend the bpf flow dissector program type to accept virtio_net_hdr
members. Do this to enable bpf flow dissector programs to perform
virtio-net header validation. The next patch in this series will add
a flow dissection hook in virtio_net_hdr_to_skb and make use of this
extended functionality. That commit message has more background on the
use case.

Signed-off-by: Tanner Love <tannerlove@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Petar Penkov <ppenkov@google.com>
---
 drivers/net/bonding/bond_main.c |  2 +-
 include/linux/skbuff.h          | 26 ++++++++++++----
 include/net/flow_dissector.h    |  6 ++++
 include/uapi/linux/bpf.h        |  6 ++++
 net/core/filter.c               | 55 +++++++++++++++++++++++++++++++++
 net/core/flow_dissector.c       | 24 ++++++++++++--
 tools/include/uapi/linux/bpf.h  |  6 ++++
 7 files changed, 116 insertions(+), 9 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 7e469c203ca5..5d2d7d5c5704 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3554,7 +3554,7 @@ static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
 	case BOND_XMIT_POLICY_ENCAP34:
 		memset(fk, 0, sizeof(*fk));
 		return __skb_flow_dissect(NULL, skb, &flow_keys_bonding,
-					  fk, NULL, 0, 0, 0, 0);
+					  fk, NULL, 0, 0, 0, 0, NULL);
 	default:
 		break;
 	}
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index dbf820a50a39..fef8f4b5db6e 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1312,18 +1312,20 @@ struct bpf_flow_dissector;
 bool bpf_flow_dissect(struct bpf_prog *prog, struct bpf_flow_dissector *ctx,
 		      __be16 proto, int nhoff, int hlen, unsigned int flags);
 
+struct virtio_net_hdr;
 bool __skb_flow_dissect(const struct net *net,
 			const struct sk_buff *skb,
 			struct flow_dissector *flow_dissector,
 			void *target_container, const void *data,
-			__be16 proto, int nhoff, int hlen, unsigned int flags);
+			__be16 proto, int nhoff, int hlen, unsigned int flags,
+			const struct virtio_net_hdr *vhdr);
 
 static inline bool skb_flow_dissect(const struct sk_buff *skb,
 				    struct flow_dissector *flow_dissector,
 				    void *target_container, unsigned int flags)
 {
 	return __skb_flow_dissect(NULL, skb, flow_dissector,
-				  target_container, NULL, 0, 0, 0, flags);
+				  target_container, NULL, 0, 0, 0, flags, NULL);
 }
 
 static inline bool skb_flow_dissect_flow_keys(const struct sk_buff *skb,
@@ -1332,7 +1334,20 @@ static inline bool skb_flow_dissect_flow_keys(const struct sk_buff *skb,
 {
 	memset(flow, 0, sizeof(*flow));
 	return __skb_flow_dissect(NULL, skb, &flow_keys_dissector,
-				  flow, NULL, 0, 0, 0, flags);
+				  flow, NULL, 0, 0, 0, flags, NULL);
+}
+
+static inline bool
+__skb_flow_dissect_flow_keys_basic(const struct net *net,
+				   const struct sk_buff *skb,
+				   struct flow_keys_basic *flow,
+				   const void *data, __be16 proto,
+				   int nhoff, int hlen, unsigned int flags,
+				   const struct virtio_net_hdr *vhdr)
+{
+	memset(flow, 0, sizeof(*flow));
+	return __skb_flow_dissect(net, skb, &flow_keys_basic_dissector, flow,
+				  data, proto, nhoff, hlen, flags, vhdr);
 }
 
 static inline bool
@@ -1342,9 +1357,8 @@ skb_flow_dissect_flow_keys_basic(const struct net *net,
 				 const void *data, __be16 proto,
 				 int nhoff, int hlen, unsigned int flags)
 {
-	memset(flow, 0, sizeof(*flow));
-	return __skb_flow_dissect(net, skb, &flow_keys_basic_dissector, flow,
-				  data, proto, nhoff, hlen, flags);
+	return __skb_flow_dissect_flow_keys_basic(net, skb, flow, data, proto,
+						  nhoff, hlen, flags, NULL);
 }
 
 void skb_flow_dissect_meta(const struct sk_buff *skb,
diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
index ffd386ea0dbb..0796ad745e69 100644
--- a/include/net/flow_dissector.h
+++ b/include/net/flow_dissector.h
@@ -370,6 +370,12 @@ struct bpf_flow_dissector {
 	const struct sk_buff	*skb;
 	const void		*data;
 	const void		*data_end;
+	__u8			vhdr_flags;
+	__u8			vhdr_gso_type;
+	__u16			vhdr_hdr_len;
+	__u16			vhdr_gso_size;
+	__u16			vhdr_csum_start;
+	__u16			vhdr_csum_offset;
 };
 
 static inline void
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 418b9b813d65..de525defd462 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5155,6 +5155,12 @@ struct __sk_buff {
 	__u32 gso_segs;
 	__bpf_md_ptr(struct bpf_sock *, sk);
 	__u32 gso_size;
+	__u8  vhdr_flags;
+	__u8  vhdr_gso_type;
+	__u16 vhdr_hdr_len;
+	__u16 vhdr_gso_size;
+	__u16 vhdr_csum_start;
+	__u16 vhdr_csum_offset;
 };
 
 struct bpf_tunnel_key {
diff --git a/net/core/filter.c b/net/core/filter.c
index 239de1306de9..af45e769ced6 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -8358,6 +8358,16 @@ static bool flow_dissector_is_valid_access(int off, int size,
 			return false;
 		info->reg_type = PTR_TO_FLOW_KEYS;
 		return true;
+	case bpf_ctx_range(struct __sk_buff, len):
+		return size == size_default;
+	case bpf_ctx_range(struct __sk_buff, vhdr_flags):
+	case bpf_ctx_range(struct __sk_buff, vhdr_gso_type):
+		return size == sizeof(__u8);
+	case bpf_ctx_range(struct __sk_buff, vhdr_hdr_len):
+	case bpf_ctx_range(struct __sk_buff, vhdr_gso_size):
+	case bpf_ctx_range(struct __sk_buff, vhdr_csum_start):
+	case bpf_ctx_range(struct __sk_buff, vhdr_csum_offset):
+		return size == sizeof(__u16);
 	default:
 		return false;
 	}
@@ -8390,6 +8400,51 @@ static u32 flow_dissector_convert_ctx_access(enum bpf_access_type type,
 				      si->dst_reg, si->src_reg,
 				      offsetof(struct bpf_flow_dissector, flow_keys));
 		break;
+
+	case offsetof(struct __sk_buff, vhdr_flags):
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_flow_dissector, vhdr_flags),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_flow_dissector, vhdr_flags));
+		break;
+
+	case offsetof(struct __sk_buff, vhdr_gso_type):
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_flow_dissector, vhdr_gso_type),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_flow_dissector, vhdr_gso_type));
+		break;
+
+	case offsetof(struct __sk_buff, vhdr_hdr_len):
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_flow_dissector, vhdr_hdr_len),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_flow_dissector, vhdr_hdr_len));
+		break;
+
+	case offsetof(struct __sk_buff, vhdr_gso_size):
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_flow_dissector, vhdr_gso_size),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_flow_dissector, vhdr_gso_size));
+		break;
+
+	case offsetof(struct __sk_buff, vhdr_csum_start):
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_flow_dissector, vhdr_csum_start),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_flow_dissector, vhdr_csum_start));
+		break;
+
+	case offsetof(struct __sk_buff, vhdr_csum_offset):
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_flow_dissector, vhdr_csum_offset),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_flow_dissector, vhdr_csum_offset));
+		break;
+
+	case offsetof(struct __sk_buff, len):
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_flow_dissector, skb),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_flow_dissector, skb));
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct sk_buff, len),
+				      si->dst_reg, si->dst_reg,
+				      offsetof(struct sk_buff, len));
+		break;
 	}
 
 	return insn - insn_buf;
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 3ed7c98a98e1..4b171ebec084 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -28,6 +28,7 @@
 #include <scsi/fc/fc_fcoe.h>
 #include <uapi/linux/batadv_packet.h>
 #include <linux/bpf.h>
+#include <linux/virtio_net.h>
 #if IS_ENABLED(CONFIG_NF_CONNTRACK)
 #include <net/netfilter/nf_conntrack_core.h>
 #include <net/netfilter/nf_conntrack_labels.h>
@@ -904,6 +905,7 @@ bool bpf_flow_dissect(struct bpf_prog *prog, struct bpf_flow_dissector *ctx,
  * @hlen: packet header length, if @data is NULL use skb_headlen(skb)
  * @flags: flags that control the dissection process, e.g.
  *         FLOW_DISSECTOR_F_STOP_AT_ENCAP.
+ * @vhdr: virtio_net_header to include in kernel context for BPF flow dissector
  *
  * The function will try to retrieve individual keys into target specified
  * by flow_dissector from either the skbuff or a raw buffer specified by the
@@ -915,7 +917,8 @@ bool __skb_flow_dissect(const struct net *net,
 			const struct sk_buff *skb,
 			struct flow_dissector *flow_dissector,
 			void *target_container, const void *data,
-			__be16 proto, int nhoff, int hlen, unsigned int flags)
+			__be16 proto, int nhoff, int hlen, unsigned int flags,
+			const struct virtio_net_hdr *vhdr)
 {
 	struct flow_dissector_key_control *key_control;
 	struct flow_dissector_key_basic *key_basic;
@@ -1001,6 +1004,23 @@ bool __skb_flow_dissect(const struct net *net,
 			__be16 n_proto = proto;
 			struct bpf_prog *prog;
 
+			if (vhdr) {
+				ctx.vhdr_flags = vhdr->flags;
+				ctx.vhdr_gso_type = vhdr->gso_type;
+				ctx.vhdr_hdr_len =
+					__virtio16_to_cpu(virtio_legacy_is_little_endian(),
+							  vhdr->hdr_len);
+				ctx.vhdr_gso_size =
+					__virtio16_to_cpu(virtio_legacy_is_little_endian(),
+							  vhdr->gso_size);
+				ctx.vhdr_csum_start =
+					__virtio16_to_cpu(virtio_legacy_is_little_endian(),
+							  vhdr->csum_start);
+				ctx.vhdr_csum_offset =
+					__virtio16_to_cpu(virtio_legacy_is_little_endian(),
+							  vhdr->csum_offset);
+			}
+
 			if (skb) {
 				ctx.skb = skb;
 				/* we can't use 'proto' in the skb case
@@ -1610,7 +1630,7 @@ u32 __skb_get_hash_symmetric(const struct sk_buff *skb)
 	memset(&keys, 0, sizeof(keys));
 	__skb_flow_dissect(NULL, skb, &flow_keys_dissector_symmetric,
 			   &keys, NULL, 0, 0, 0,
-			   FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL);
+			   FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL, NULL);
 
 	return __flow_hash_from_keys(&keys, &hashrnd);
 }
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 418b9b813d65..de525defd462 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5155,6 +5155,12 @@ struct __sk_buff {
 	__u32 gso_segs;
 	__bpf_md_ptr(struct bpf_sock *, sk);
 	__u32 gso_size;
+	__u8  vhdr_flags;
+	__u8  vhdr_gso_type;
+	__u16 vhdr_hdr_len;
+	__u16 vhdr_gso_size;
+	__u16 vhdr_csum_start;
+	__u16 vhdr_csum_offset;
 };
 
 struct bpf_tunnel_key {
-- 
2.32.0.rc0.204.g9fa02ecfa5-goog


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH net-next v3 2/3] virtio_net: add optional flow dissection in virtio_net_hdr_to_skb
  2021-06-01 22:18 [PATCH net-next v3 0/3] virtio_net: add optional flow dissection in virtio_net_hdr_to_skb Tanner Love
  2021-06-01 22:18 ` [PATCH net-next v3 1/3] net: flow_dissector: extend bpf flow dissector support with vnet hdr Tanner Love
@ 2021-06-01 22:18 ` Tanner Love
  2021-06-03 15:54   ` sdf
  2021-06-03 23:56   ` Alexei Starovoitov
  2021-06-01 22:18 ` [PATCH net-next v3 3/3] selftests/net: amend bpf flow dissector prog to do vnet hdr validation Tanner Love
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 15+ messages in thread
From: Tanner Love @ 2021-06-01 22:18 UTC (permalink / raw)
  To: netdev
  Cc: davem, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Eric Dumazet, Willem de Bruijn, Petar Penkov, Jakub Kicinski,
	Tanner Love, kernel test robot

From: Tanner Love <tannerlove@google.com>

Syzkaller bugs have resulted from loose specification of
virtio_net_hdr[1]. Enable execution of a BPF flow dissector program
in virtio_net_hdr_to_skb to validate the vnet header and drop bad
input.

The existing behavior of accepting these vnet headers is part of the
ABI. But individual admins may want to enforce restrictions. For
example, verifying that a GSO_TCPV4 gso_type matches packet contents:
unencapsulated TCP/IPV4 packet with payload exceeding gso_size and
hdr_len at payload offset.

Introduce a new sysctl net.core.flow_dissect_vnet_hdr controlling a
static key to decide whether to perform flow dissection. When the key
is false, virtio_net_hdr_to_skb computes as before.

[1] https://syzkaller.appspot.com/bug?id=b419a5ca95062664fe1a60b764621eb4526e2cd0

[ um build error ]
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Tanner Love <tannerlove@google.com>
Suggested-by: Willem de Bruijn <willemb@google.com>
---
 include/linux/virtio_net.h | 25 +++++++++++++++++++++----
 net/core/flow_dissector.c  |  3 +++
 net/core/sysctl_net_core.c |  9 +++++++++
 3 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index b465f8f3e554..cdc6152b40c6 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -25,10 +25,13 @@ static inline int virtio_net_hdr_set_proto(struct sk_buff *skb,
 	return 0;
 }
 
+DECLARE_STATIC_KEY_FALSE(sysctl_flow_dissect_vnet_hdr_key);
+
 static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
 					const struct virtio_net_hdr *hdr,
 					bool little_endian)
 {
+	struct flow_keys_basic keys;
 	unsigned int gso_type = 0;
 	unsigned int thlen = 0;
 	unsigned int p_off = 0;
@@ -78,13 +81,24 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
 		p_off = skb_transport_offset(skb) + thlen;
 		if (!pskb_may_pull(skb, p_off))
 			return -EINVAL;
-	} else {
+	}
+
+	/* BPF flow dissection for optional strict validation.
+	 *
+	 * Admins can define permitted packets more strictly, such as dropping
+	 * deprecated UDP_UFO packets and requiring skb->protocol to be non-zero
+	 * and matching packet headers.
+	 */
+	if (static_branch_unlikely(&sysctl_flow_dissect_vnet_hdr_key) &&
+	    !__skb_flow_dissect_flow_keys_basic(NULL, skb, &keys, NULL, 0, 0, 0,
+						0, hdr))
+		return -EINVAL;
+
+	if (!(hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM)) {
 		/* gso packets without NEEDS_CSUM do not set transport_offset.
 		 * probe and drop if does not match one of the above types.
 		 */
 		if (gso_type && skb->network_header) {
-			struct flow_keys_basic keys;
-
 			if (!skb->protocol) {
 				__be16 protocol = dev_parse_header_protocol(skb);
 
@@ -92,8 +106,11 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
 				if (protocol && protocol != skb->protocol)
 					return -EINVAL;
 			}
+
 retry:
-			if (!skb_flow_dissect_flow_keys_basic(NULL, skb, &keys,
+			/* only if flow dissection not already done */
+			if (!static_branch_unlikely(&sysctl_flow_dissect_vnet_hdr_key) &&
+			    !skb_flow_dissect_flow_keys_basic(NULL, skb, &keys,
 							      NULL, 0, 0, 0,
 							      0)) {
 				/* UFO does not specify ipv4 or 6: try both */
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 4b171ebec084..ae2ce382f4f4 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -35,6 +35,9 @@
 #endif
 #include <linux/bpf-netns.h>
 
+DEFINE_STATIC_KEY_FALSE(sysctl_flow_dissect_vnet_hdr_key);
+EXPORT_SYMBOL(sysctl_flow_dissect_vnet_hdr_key);
+
 static void dissector_set_key(struct flow_dissector *flow_dissector,
 			      enum flow_dissector_key_id key_id)
 {
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index c8496c1142c9..c01b9366bb75 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -36,6 +36,8 @@ static int net_msg_warn;	/* Unused, but still a sysctl */
 int sysctl_fb_tunnels_only_for_init_net __read_mostly = 0;
 EXPORT_SYMBOL(sysctl_fb_tunnels_only_for_init_net);
 
+DECLARE_STATIC_KEY_FALSE(sysctl_flow_dissect_vnet_hdr_key);
+
 /* 0 - Keep current behavior:
  *     IPv4: inherit all current settings from init_net
  *     IPv6: reset all settings to default
@@ -580,6 +582,13 @@ static struct ctl_table net_core_table[] = {
 		.extra1		= SYSCTL_ONE,
 		.extra2		= &int_3600,
 	},
+	{
+		.procname       = "flow_dissect_vnet_hdr",
+		.data           = &sysctl_flow_dissect_vnet_hdr_key.key,
+		.maxlen         = sizeof(sysctl_flow_dissect_vnet_hdr_key),
+		.mode           = 0644,
+		.proc_handler   = proc_do_static_key,
+	},
 	{ }
 };
 
-- 
2.32.0.rc0.204.g9fa02ecfa5-goog


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH net-next v3 3/3] selftests/net: amend bpf flow dissector prog to do vnet hdr validation
  2021-06-01 22:18 [PATCH net-next v3 0/3] virtio_net: add optional flow dissection in virtio_net_hdr_to_skb Tanner Love
  2021-06-01 22:18 ` [PATCH net-next v3 1/3] net: flow_dissector: extend bpf flow dissector support with vnet hdr Tanner Love
  2021-06-01 22:18 ` [PATCH net-next v3 2/3] virtio_net: add optional flow dissection in virtio_net_hdr_to_skb Tanner Love
@ 2021-06-01 22:18 ` Tanner Love
  2021-06-02 20:10 ` [PATCH net-next v3 0/3] virtio_net: add optional flow dissection in virtio_net_hdr_to_skb David Miller
  2021-06-04  2:55 ` Jason Wang
  4 siblings, 0 replies; 15+ messages in thread
From: Tanner Love @ 2021-06-01 22:18 UTC (permalink / raw)
  To: netdev
  Cc: davem, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Eric Dumazet, Willem de Bruijn, Petar Penkov, Jakub Kicinski,
	Tanner Love

From: Tanner Love <tannerlove@google.com>

Change the BPF flow dissector program to perform various checks on the
virtio_net_hdr fields after doing flow dissection.

Amend test_flow_dissector.(c|sh) to add test cases that inject packets
with reasonable or unreasonable virtio-net headers and assert that bad
packets are dropped and good packets are not. Do this via packet
socket; the kernel executes tpacket_snd, which enters
virtio_net_hdr_to_skb, where flow dissection / vnet header validation
occurs.

Signed-off-by: Tanner Love <tannerlove@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
---
 tools/testing/selftests/bpf/progs/bpf_flow.c  | 188 +++++++++++++-----
 .../selftests/bpf/test_flow_dissector.c       | 181 +++++++++++++++--
 .../selftests/bpf/test_flow_dissector.sh      |  19 ++
 3 files changed, 321 insertions(+), 67 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/bpf_flow.c b/tools/testing/selftests/bpf/progs/bpf_flow.c
index 95a5a0778ed7..681cfbdba1d7 100644
--- a/tools/testing/selftests/bpf/progs/bpf_flow.c
+++ b/tools/testing/selftests/bpf/progs/bpf_flow.c
@@ -13,6 +13,7 @@
 #include <linux/tcp.h>
 #include <linux/udp.h>
 #include <linux/if_packet.h>
+#include <linux/virtio_net.h>
 #include <sys/socket.h>
 #include <linux/if_tunnel.h>
 #include <linux/mpls.h>
@@ -71,15 +72,119 @@ struct {
 	__type(value, struct bpf_flow_keys);
 } last_dissection SEC(".maps");
 
-static __always_inline int export_flow_keys(struct bpf_flow_keys *keys,
-					    int ret)
+/* Drops invalid virtio-net headers */
+static __always_inline int validate_virtio_net_hdr(const struct __sk_buff *skb)
 {
+	const struct bpf_flow_keys *keys = skb->flow_keys;
+
+	/* Check gso */
+	if (skb->vhdr_gso_type != VIRTIO_NET_HDR_GSO_NONE) {
+		if (!(skb->vhdr_flags & VIRTIO_NET_HDR_F_NEEDS_CSUM))
+			return BPF_DROP;
+
+		if (keys->is_encap)
+			return BPF_DROP;
+
+		switch (skb->vhdr_gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
+		case VIRTIO_NET_HDR_GSO_TCPV4:
+			if (keys->addr_proto != ETH_P_IP ||
+			    keys->ip_proto != IPPROTO_TCP)
+				return BPF_DROP;
+
+			if (skb->vhdr_gso_size >= skb->len - keys->thoff -
+					sizeof(struct tcphdr))
+				return BPF_DROP;
+
+			break;
+		case VIRTIO_NET_HDR_GSO_TCPV6:
+			if (keys->addr_proto != ETH_P_IPV6 ||
+			    keys->ip_proto != IPPROTO_TCP)
+				return BPF_DROP;
+
+			if (skb->vhdr_gso_size >= skb->len - keys->thoff -
+					sizeof(struct tcphdr))
+				return BPF_DROP;
+
+			break;
+		case VIRTIO_NET_HDR_GSO_UDP:
+			if (keys->ip_proto != IPPROTO_UDP)
+				return BPF_DROP;
+
+			if (skb->vhdr_gso_size >= skb->len - keys->thoff -
+					sizeof(struct udphdr))
+				return BPF_DROP;
+
+			break;
+		default:
+			return BPF_DROP;
+		}
+	}
+
+	/* Check hdr_len */
+	if (skb->vhdr_hdr_len) {
+		switch (keys->ip_proto) {
+		case IPPROTO_TCP:
+			if (skb->vhdr_hdr_len != skb->flow_keys->thoff +
+					sizeof(struct tcphdr))
+				return BPF_DROP;
+
+			break;
+		case IPPROTO_UDP:
+			if (skb->vhdr_hdr_len != keys->thoff +
+					sizeof(struct udphdr))
+				return BPF_DROP;
+
+			break;
+		}
+	}
+
+	/* Check csum */
+	if (skb->vhdr_flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
+		if (keys->addr_proto != ETH_P_IP &&
+		    keys->addr_proto != ETH_P_IPV6)
+			return BPF_DROP;
+
+		if (skb->vhdr_csum_start != keys->thoff)
+			return BPF_DROP;
+
+		switch (keys->ip_proto) {
+		case IPPROTO_TCP:
+			if (skb->vhdr_csum_offset !=
+					offsetof(struct tcphdr, check))
+				return BPF_DROP;
+
+			break;
+		case IPPROTO_UDP:
+			if (skb->vhdr_csum_offset !=
+					offsetof(struct udphdr, check))
+				return BPF_DROP;
+
+			break;
+		default:
+			return BPF_DROP;
+		}
+	}
+
+	return BPF_OK;
+}
+
+/* Common steps to perform regardless of where protocol parsing finishes:
+ * 1. store flow keys in map
+ * 2. if parse result is BPF_OK, parse the vnet hdr if present
+ * 3. return the parse result
+ */
+static __always_inline int parse_epilogue(struct __sk_buff *skb, int ret)
+{
+	const struct bpf_flow_keys *keys = skb->flow_keys;
 	__u32 key = (__u32)(keys->sport) << 16 | keys->dport;
 	struct bpf_flow_keys val;
 
 	memcpy(&val, keys, sizeof(val));
 	bpf_map_update_elem(&last_dissection, &key, &val, BPF_ANY);
-	return ret;
+
+	if (ret != BPF_OK)
+		return ret;
+	return validate_virtio_net_hdr(skb);
 }
 
 #define IPV6_FLOWLABEL_MASK		__bpf_constant_htonl(0x000FFFFF)
@@ -114,8 +219,6 @@ static __always_inline void *bpf_flow_dissect_get_header(struct __sk_buff *skb,
 /* Dispatches on ETHERTYPE */
 static __always_inline int parse_eth_proto(struct __sk_buff *skb, __be16 proto)
 {
-	struct bpf_flow_keys *keys = skb->flow_keys;
-
 	switch (proto) {
 	case bpf_htons(ETH_P_IP):
 		bpf_tail_call_static(skb, &jmp_table, IP);
@@ -131,12 +234,10 @@ static __always_inline int parse_eth_proto(struct __sk_buff *skb, __be16 proto)
 	case bpf_htons(ETH_P_8021AD):
 		bpf_tail_call_static(skb, &jmp_table, VLAN);
 		break;
-	default:
-		/* Protocol not supported */
-		return export_flow_keys(keys, BPF_DROP);
 	}
 
-	return export_flow_keys(keys, BPF_DROP);
+	/* Protocol not supported */
+	return parse_epilogue(skb, BPF_DROP);
 }
 
 SEC("flow_dissector")
@@ -162,28 +263,28 @@ static __always_inline int parse_ip_proto(struct __sk_buff *skb, __u8 proto)
 	case IPPROTO_ICMP:
 		icmp = bpf_flow_dissect_get_header(skb, sizeof(*icmp), &_icmp);
 		if (!icmp)
-			return export_flow_keys(keys, BPF_DROP);
-		return export_flow_keys(keys, BPF_OK);
+			return parse_epilogue(skb, BPF_DROP);
+		return parse_epilogue(skb, BPF_OK);
 	case IPPROTO_IPIP:
 		keys->is_encap = true;
 		if (keys->flags & BPF_FLOW_DISSECTOR_F_STOP_AT_ENCAP)
-			return export_flow_keys(keys, BPF_OK);
+			return parse_epilogue(skb, BPF_OK);
 
 		return parse_eth_proto(skb, bpf_htons(ETH_P_IP));
 	case IPPROTO_IPV6:
 		keys->is_encap = true;
 		if (keys->flags & BPF_FLOW_DISSECTOR_F_STOP_AT_ENCAP)
-			return export_flow_keys(keys, BPF_OK);
+			return parse_epilogue(skb, BPF_OK);
 
 		return parse_eth_proto(skb, bpf_htons(ETH_P_IPV6));
 	case IPPROTO_GRE:
 		gre = bpf_flow_dissect_get_header(skb, sizeof(*gre), &_gre);
 		if (!gre)
-			return export_flow_keys(keys, BPF_DROP);
+			return parse_epilogue(skb, BPF_DROP);
 
 		if (bpf_htons(gre->flags & GRE_VERSION))
 			/* Only inspect standard GRE packets with version 0 */
-			return export_flow_keys(keys, BPF_OK);
+			return parse_epilogue(skb, BPF_OK);
 
 		keys->thoff += sizeof(*gre); /* Step over GRE Flags and Proto */
 		if (GRE_IS_CSUM(gre->flags))
@@ -195,13 +296,13 @@ static __always_inline int parse_ip_proto(struct __sk_buff *skb, __u8 proto)
 
 		keys->is_encap = true;
 		if (keys->flags & BPF_FLOW_DISSECTOR_F_STOP_AT_ENCAP)
-			return export_flow_keys(keys, BPF_OK);
+			return parse_epilogue(skb, BPF_OK);
 
 		if (gre->proto == bpf_htons(ETH_P_TEB)) {
 			eth = bpf_flow_dissect_get_header(skb, sizeof(*eth),
 							  &_eth);
 			if (!eth)
-				return export_flow_keys(keys, BPF_DROP);
+				return parse_epilogue(skb, BPF_DROP);
 
 			keys->thoff += sizeof(*eth);
 
@@ -212,37 +313,35 @@ static __always_inline int parse_ip_proto(struct __sk_buff *skb, __u8 proto)
 	case IPPROTO_TCP:
 		tcp = bpf_flow_dissect_get_header(skb, sizeof(*tcp), &_tcp);
 		if (!tcp)
-			return export_flow_keys(keys, BPF_DROP);
+			return parse_epilogue(skb, BPF_DROP);
 
 		if (tcp->doff < 5)
-			return export_flow_keys(keys, BPF_DROP);
+			return parse_epilogue(skb, BPF_DROP);
 
 		if ((__u8 *)tcp + (tcp->doff << 2) > data_end)
-			return export_flow_keys(keys, BPF_DROP);
+			return parse_epilogue(skb, BPF_DROP);
 
 		keys->sport = tcp->source;
 		keys->dport = tcp->dest;
-		return export_flow_keys(keys, BPF_OK);
+		return parse_epilogue(skb, BPF_OK);
 	case IPPROTO_UDP:
 	case IPPROTO_UDPLITE:
 		udp = bpf_flow_dissect_get_header(skb, sizeof(*udp), &_udp);
 		if (!udp)
-			return export_flow_keys(keys, BPF_DROP);
+			return parse_epilogue(skb, BPF_DROP);
 
 		keys->sport = udp->source;
 		keys->dport = udp->dest;
-		return export_flow_keys(keys, BPF_OK);
+		return parse_epilogue(skb, BPF_OK);
 	default:
-		return export_flow_keys(keys, BPF_DROP);
+		return parse_epilogue(skb, BPF_DROP);
 	}
 
-	return export_flow_keys(keys, BPF_DROP);
+	return parse_epilogue(skb, BPF_DROP);
 }
 
 static __always_inline int parse_ipv6_proto(struct __sk_buff *skb, __u8 nexthdr)
 {
-	struct bpf_flow_keys *keys = skb->flow_keys;
-
 	switch (nexthdr) {
 	case IPPROTO_HOPOPTS:
 	case IPPROTO_DSTOPTS:
@@ -255,7 +354,7 @@ static __always_inline int parse_ipv6_proto(struct __sk_buff *skb, __u8 nexthdr)
 		return parse_ip_proto(skb, nexthdr);
 	}
 
-	return export_flow_keys(keys, BPF_DROP);
+	return parse_epilogue(skb, BPF_DROP);
 }
 
 PROG(IP)(struct __sk_buff *skb)
@@ -268,11 +367,11 @@ PROG(IP)(struct __sk_buff *skb)
 
 	iph = bpf_flow_dissect_get_header(skb, sizeof(*iph), &_iph);
 	if (!iph)
-		return export_flow_keys(keys, BPF_DROP);
+		return parse_epilogue(skb, BPF_DROP);
 
 	/* IP header cannot be smaller than 20 bytes */
 	if (iph->ihl < 5)
-		return export_flow_keys(keys, BPF_DROP);
+		return parse_epilogue(skb, BPF_DROP);
 
 	keys->addr_proto = ETH_P_IP;
 	keys->ipv4_src = iph->saddr;
@@ -281,7 +380,7 @@ PROG(IP)(struct __sk_buff *skb)
 
 	keys->thoff += iph->ihl << 2;
 	if (data + keys->thoff > data_end)
-		return export_flow_keys(keys, BPF_DROP);
+		return parse_epilogue(skb, BPF_DROP);
 
 	if (iph->frag_off & bpf_htons(IP_MF | IP_OFFSET)) {
 		keys->is_frag = true;
@@ -302,7 +401,7 @@ PROG(IP)(struct __sk_buff *skb)
 	}
 
 	if (done)
-		return export_flow_keys(keys, BPF_OK);
+		return parse_epilogue(skb, BPF_OK);
 
 	return parse_ip_proto(skb, iph->protocol);
 }
@@ -314,7 +413,7 @@ PROG(IPV6)(struct __sk_buff *skb)
 
 	ip6h = bpf_flow_dissect_get_header(skb, sizeof(*ip6h), &_ip6h);
 	if (!ip6h)
-		return export_flow_keys(keys, BPF_DROP);
+		return parse_epilogue(skb, BPF_DROP);
 
 	keys->addr_proto = ETH_P_IPV6;
 	memcpy(&keys->ipv6_src, &ip6h->saddr, 2*sizeof(ip6h->saddr));
@@ -324,7 +423,7 @@ PROG(IPV6)(struct __sk_buff *skb)
 	keys->flow_label = ip6_flowlabel(ip6h);
 
 	if (keys->flags & BPF_FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL)
-		return export_flow_keys(keys, BPF_OK);
+		return parse_epilogue(skb, BPF_OK);
 
 	return parse_ipv6_proto(skb, ip6h->nexthdr);
 }
@@ -336,7 +435,7 @@ PROG(IPV6OP)(struct __sk_buff *skb)
 
 	ip6h = bpf_flow_dissect_get_header(skb, sizeof(*ip6h), &_ip6h);
 	if (!ip6h)
-		return export_flow_keys(keys, BPF_DROP);
+		return parse_epilogue(skb, BPF_DROP);
 
 	/* hlen is in 8-octets and does not include the first 8 bytes
 	 * of the header
@@ -354,7 +453,7 @@ PROG(IPV6FR)(struct __sk_buff *skb)
 
 	fragh = bpf_flow_dissect_get_header(skb, sizeof(*fragh), &_fragh);
 	if (!fragh)
-		return export_flow_keys(keys, BPF_DROP);
+		return parse_epilogue(skb, BPF_DROP);
 
 	keys->thoff += sizeof(*fragh);
 	keys->is_frag = true;
@@ -367,9 +466,9 @@ PROG(IPV6FR)(struct __sk_buff *skb)
 		 * explicitly asked for.
 		 */
 		if (!(keys->flags & BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG))
-			return export_flow_keys(keys, BPF_OK);
+			return parse_epilogue(skb, BPF_OK);
 	} else {
-		return export_flow_keys(keys, BPF_OK);
+		return parse_epilogue(skb, BPF_OK);
 	}
 
 	return parse_ipv6_proto(skb, fragh->nexthdr);
@@ -377,14 +476,13 @@ PROG(IPV6FR)(struct __sk_buff *skb)
 
 PROG(MPLS)(struct __sk_buff *skb)
 {
-	struct bpf_flow_keys *keys = skb->flow_keys;
 	struct mpls_label *mpls, _mpls;
 
 	mpls = bpf_flow_dissect_get_header(skb, sizeof(*mpls), &_mpls);
 	if (!mpls)
-		return export_flow_keys(keys, BPF_DROP);
+		return parse_epilogue(skb, BPF_DROP);
 
-	return export_flow_keys(keys, BPF_OK);
+	return parse_epilogue(skb, BPF_OK);
 }
 
 PROG(VLAN)(struct __sk_buff *skb)
@@ -396,10 +494,10 @@ PROG(VLAN)(struct __sk_buff *skb)
 	if (keys->n_proto == bpf_htons(ETH_P_8021AD)) {
 		vlan = bpf_flow_dissect_get_header(skb, sizeof(*vlan), &_vlan);
 		if (!vlan)
-			return export_flow_keys(keys, BPF_DROP);
+			return parse_epilogue(skb, BPF_DROP);
 
 		if (vlan->h_vlan_encapsulated_proto != bpf_htons(ETH_P_8021Q))
-			return export_flow_keys(keys, BPF_DROP);
+			return parse_epilogue(skb, BPF_DROP);
 
 		keys->nhoff += sizeof(*vlan);
 		keys->thoff += sizeof(*vlan);
@@ -407,14 +505,14 @@ PROG(VLAN)(struct __sk_buff *skb)
 
 	vlan = bpf_flow_dissect_get_header(skb, sizeof(*vlan), &_vlan);
 	if (!vlan)
-		return export_flow_keys(keys, BPF_DROP);
+		return parse_epilogue(skb, BPF_DROP);
 
 	keys->nhoff += sizeof(*vlan);
 	keys->thoff += sizeof(*vlan);
 	/* Only allow 8021AD + 8021Q double tagging and no triple tagging.*/
 	if (vlan->h_vlan_encapsulated_proto == bpf_htons(ETH_P_8021AD) ||
 	    vlan->h_vlan_encapsulated_proto == bpf_htons(ETH_P_8021Q))
-		return export_flow_keys(keys, BPF_DROP);
+		return parse_epilogue(skb, BPF_DROP);
 
 	keys->n_proto = vlan->h_vlan_encapsulated_proto;
 	return parse_eth_proto(skb, vlan->h_vlan_encapsulated_proto);
diff --git a/tools/testing/selftests/bpf/test_flow_dissector.c b/tools/testing/selftests/bpf/test_flow_dissector.c
index 571cc076dd7d..583c13f75ead 100644
--- a/tools/testing/selftests/bpf/test_flow_dissector.c
+++ b/tools/testing/selftests/bpf/test_flow_dissector.c
@@ -17,6 +17,8 @@
 #include <linux/if_packet.h>
 #include <linux/if_ether.h>
 #include <linux/ipv6.h>
+#include <linux/virtio_net.h>
+#include <net/if.h>
 #include <netinet/ip.h>
 #include <netinet/in.h>
 #include <netinet/udp.h>
@@ -65,7 +67,8 @@ struct guehdr {
 static uint8_t	cfg_dsfield_inner;
 static uint8_t	cfg_dsfield_outer;
 static uint8_t	cfg_encap_proto;
-static bool	cfg_expect_failure = false;
+static bool	cfg_expect_norx;
+static bool	cfg_expect_snd_failure;
 static int	cfg_l3_extra = AF_UNSPEC;	/* optional SIT prefix */
 static int	cfg_l3_inner = AF_UNSPEC;
 static int	cfg_l3_outer = AF_UNSPEC;
@@ -77,8 +80,14 @@ static int	cfg_port_gue = 6080;
 static bool	cfg_only_rx;
 static bool	cfg_only_tx;
 static int	cfg_src_port = 9;
+static bool	cfg_tx_pf_packet;
+static bool	cfg_use_vnet;
+static bool	cfg_vnet_use_hdr_len_bad;
+static bool	cfg_vnet_use_gso;
+static bool	cfg_vnet_use_csum_off;
+static bool	cfg_partial_udp_hdr;
 
-static char	buf[ETH_DATA_LEN];
+static char	buf[ETH_MAX_MTU];
 
 #define INIT_ADDR4(name, addr4, port)				\
 	static struct sockaddr_in name = {			\
@@ -273,8 +282,48 @@ static int l3_length(int family)
 		return sizeof(struct ipv6hdr);
 }
 
+static int build_vnet_header(void *header, int il3_len)
+{
+	struct virtio_net_hdr *vh = header;
+
+	vh->hdr_len = ETH_HLEN + il3_len + sizeof(struct udphdr);
+
+	if (cfg_partial_udp_hdr) {
+		vh->hdr_len -= (sizeof(struct udphdr) >> 1);
+		return sizeof(*vh);
+	}
+
+	/* Alteration must increase hdr_len; if not, kernel overwrites it */
+	if (cfg_vnet_use_hdr_len_bad)
+		vh->hdr_len++;
+
+	if (cfg_vnet_use_csum_off) {
+		vh->flags |= VIRTIO_NET_HDR_F_NEEDS_CSUM;
+		vh->csum_start = ETH_HLEN + il3_len;
+		vh->csum_offset = __builtin_offsetof(struct udphdr, check);
+	}
+
+	if (cfg_vnet_use_gso) {
+		vh->gso_type = VIRTIO_NET_HDR_GSO_UDP;
+		vh->gso_size = ETH_DATA_LEN - il3_len;
+	}
+
+	return sizeof(*vh);
+}
+
+static int build_eth_header(void *header)
+{
+	struct ethhdr *eth = header;
+	uint16_t proto = cfg_l3_inner == PF_INET ? ETH_P_IP : ETH_P_IPV6;
+
+	eth->h_proto = htons(proto);
+
+	return ETH_HLEN;
+}
+
 static int build_packet(void)
 {
+	int l2_len = 0;
 	int ol3_len = 0, ol4_len = 0, il3_len = 0, il4_len = 0;
 	int el3_len = 0;
 
@@ -294,23 +343,29 @@ static int build_packet(void)
 	il3_len = l3_length(cfg_l3_inner);
 	il4_len = sizeof(struct udphdr);
 
-	if (el3_len + ol3_len + ol4_len + il3_len + il4_len + cfg_payload_len >=
-	    sizeof(buf))
+	if (cfg_use_vnet)
+		l2_len += build_vnet_header(buf, il3_len);
+	if (cfg_tx_pf_packet)
+		l2_len += build_eth_header(buf + l2_len);
+
+	if (l2_len + el3_len + ol3_len + ol4_len + il3_len + il4_len +
+	    cfg_payload_len >= sizeof(buf))
 		error(1, 0, "packet too large\n");
 
 	/*
 	 * Fill packet from inside out, to calculate correct checksums.
 	 * But create ip before udp headers, as udp uses ip for pseudo-sum.
 	 */
-	memset(buf + el3_len + ol3_len + ol4_len + il3_len + il4_len,
+	memset(buf + l2_len + el3_len + ol3_len + ol4_len + il3_len + il4_len,
 	       cfg_payload_char, cfg_payload_len);
 
 	/* add zero byte for udp csum padding */
-	buf[el3_len + ol3_len + ol4_len + il3_len + il4_len + cfg_payload_len] = 0;
+	buf[l2_len + el3_len + ol3_len + ol4_len + il3_len + il4_len +
+	    cfg_payload_len] = 0;
 
 	switch (cfg_l3_inner) {
 	case PF_INET:
-		build_ipv4_header(buf + el3_len + ol3_len + ol4_len,
+		build_ipv4_header(buf + l2_len + el3_len + ol3_len + ol4_len,
 				  IPPROTO_UDP,
 				  in_saddr4.sin_addr.s_addr,
 				  in_daddr4.sin_addr.s_addr,
@@ -318,7 +373,7 @@ static int build_packet(void)
 				  cfg_dsfield_inner);
 		break;
 	case PF_INET6:
-		build_ipv6_header(buf + el3_len + ol3_len + ol4_len,
+		build_ipv6_header(buf + l2_len + el3_len + ol3_len + ol4_len,
 				  IPPROTO_UDP,
 				  &in_saddr6, &in_daddr6,
 				  il4_len + cfg_payload_len,
@@ -326,22 +381,25 @@ static int build_packet(void)
 		break;
 	}
 
-	build_udp_header(buf + el3_len + ol3_len + ol4_len + il3_len,
+	build_udp_header(buf + l2_len + el3_len + ol3_len + ol4_len + il3_len,
 			 cfg_payload_len, CFG_PORT_INNER, cfg_l3_inner);
 
+	if (cfg_partial_udp_hdr)
+		return l2_len + il3_len + (il4_len >> 1);
+
 	if (!cfg_encap_proto)
-		return il3_len + il4_len + cfg_payload_len;
+		return l2_len + il3_len + il4_len + cfg_payload_len;
 
 	switch (cfg_l3_outer) {
 	case PF_INET:
-		build_ipv4_header(buf + el3_len, cfg_encap_proto,
+		build_ipv4_header(buf + l2_len + el3_len, cfg_encap_proto,
 				  out_saddr4.sin_addr.s_addr,
 				  out_daddr4.sin_addr.s_addr,
 				  ol4_len + il3_len + il4_len + cfg_payload_len,
 				  cfg_dsfield_outer);
 		break;
 	case PF_INET6:
-		build_ipv6_header(buf + el3_len, cfg_encap_proto,
+		build_ipv6_header(buf + l2_len + el3_len, cfg_encap_proto,
 				  &out_saddr6, &out_daddr6,
 				  ol4_len + il3_len + il4_len + cfg_payload_len,
 				  cfg_dsfield_outer);
@@ -350,17 +408,17 @@ static int build_packet(void)
 
 	switch (cfg_encap_proto) {
 	case IPPROTO_UDP:
-		build_gue_header(buf + el3_len + ol3_len + ol4_len -
+		build_gue_header(buf + l2_len + el3_len + ol3_len + ol4_len -
 				 sizeof(struct guehdr),
 				 cfg_l3_inner == PF_INET ? IPPROTO_IPIP
 							 : IPPROTO_IPV6);
-		build_udp_header(buf + el3_len + ol3_len,
+		build_udp_header(buf + l2_len + el3_len + ol3_len,
 				 sizeof(struct guehdr) + il3_len + il4_len +
 				 cfg_payload_len,
 				 cfg_port_gue, cfg_l3_outer);
 		break;
 	case IPPROTO_GRE:
-		build_gre_header(buf + el3_len + ol3_len,
+		build_gre_header(buf + l2_len + el3_len + ol3_len,
 				 cfg_l3_inner == PF_INET ? ETH_P_IP
 							 : ETH_P_IPV6);
 		break;
@@ -368,7 +426,7 @@ static int build_packet(void)
 
 	switch (cfg_l3_extra) {
 	case PF_INET:
-		build_ipv4_header(buf,
+		build_ipv4_header(buf + l2_len,
 				  cfg_l3_outer == PF_INET ? IPPROTO_IPIP
 							  : IPPROTO_IPV6,
 				  extra_saddr4.sin_addr.s_addr,
@@ -377,7 +435,7 @@ static int build_packet(void)
 				  cfg_payload_len, 0);
 		break;
 	case PF_INET6:
-		build_ipv6_header(buf,
+		build_ipv6_header(buf + l2_len,
 				  cfg_l3_outer == PF_INET ? IPPROTO_IPIP
 							  : IPPROTO_IPV6,
 				  &extra_saddr6, &extra_daddr6,
@@ -386,15 +444,46 @@ static int build_packet(void)
 		break;
 	}
 
-	return el3_len + ol3_len + ol4_len + il3_len + il4_len +
+	return l2_len + el3_len + ol3_len + ol4_len + il3_len + il4_len +
 	       cfg_payload_len;
 }
 
+static int setup_tx_pfpacket(void)
+{
+	struct sockaddr_ll laddr = {0};
+	const int one = 1;
+	uint16_t proto;
+	int fd;
+
+	fd = socket(PF_PACKET, SOCK_RAW, 0);
+	if (fd == -1)
+		error(1, errno, "socket tx");
+
+	if (cfg_use_vnet &&
+	    setsockopt(fd, SOL_PACKET, PACKET_VNET_HDR, &one, sizeof(one)))
+		error(1, errno, "setsockopt vnet");
+
+	proto = cfg_l3_inner == PF_INET ? ETH_P_IP : ETH_P_IPV6;
+	laddr.sll_family = AF_PACKET;
+	laddr.sll_protocol = htons(proto);
+	laddr.sll_ifindex = if_nametoindex("lo");
+	if (!laddr.sll_ifindex)
+		error(1, errno, "if_nametoindex");
+
+	if (bind(fd, (void *)&laddr, sizeof(laddr)))
+		error(1, errno, "bind");
+
+	return fd;
+}
+
 /* sender transmits encapsulated over RAW or unencap'd over UDP */
 static int setup_tx(void)
 {
 	int family, fd, ret;
 
+	if (cfg_tx_pf_packet)
+		return setup_tx_pfpacket();
+
 	if (cfg_l3_extra)
 		family = cfg_l3_extra;
 	else if (cfg_l3_outer)
@@ -464,6 +553,13 @@ static int do_tx(int fd, const char *pkt, int len)
 	int ret;
 
 	ret = write(fd, pkt, len);
+
+	if (cfg_expect_snd_failure) {
+		if (ret == -1)
+			return 0;
+		error(1, 0, "expected tx to fail but it did not");
+	}
+
 	if (ret == -1)
 		error(1, errno, "send");
 	if (ret != len)
@@ -571,7 +667,7 @@ static int do_main(void)
 	 * success (== 0) only if received all packets
 	 * unless failure is expected, in which case none must arrive.
 	 */
-	if (cfg_expect_failure)
+	if (cfg_expect_norx || cfg_expect_snd_failure)
 		return rx != 0;
 	else
 		return rx != tx;
@@ -623,8 +719,12 @@ static void parse_opts(int argc, char **argv)
 {
 	int c;
 
-	while ((c = getopt(argc, argv, "d:D:e:f:Fhi:l:n:o:O:Rs:S:t:Tx:X:")) != -1) {
+	while ((c = getopt(argc, argv,
+			   "cd:D:e:Ef:FGghi:l:Ln:o:O:pRs:S:t:TUvx:X:")) != -1) {
 		switch (c) {
+		case 'c':
+			cfg_vnet_use_csum_off = true;
+			break;
 		case 'd':
 			if (cfg_l3_outer == AF_UNSPEC)
 				error(1, 0, "-d must be preceded by -o");
@@ -653,11 +753,17 @@ static void parse_opts(int argc, char **argv)
 			else
 				usage(argv[0]);
 			break;
+		case 'E':
+			cfg_expect_snd_failure = true;
+			break;
 		case 'f':
 			cfg_src_port = strtol(optarg, NULL, 0);
 			break;
 		case 'F':
-			cfg_expect_failure = true;
+			cfg_expect_norx = true;
+			break;
+		case 'g':
+			cfg_vnet_use_gso = true;
 			break;
 		case 'h':
 			usage(argv[0]);
@@ -673,6 +779,9 @@ static void parse_opts(int argc, char **argv)
 		case 'l':
 			cfg_payload_len = strtol(optarg, NULL, 0);
 			break;
+		case 'L':
+			cfg_vnet_use_hdr_len_bad = true;
+			break;
 		case 'n':
 			cfg_num_pkt = strtol(optarg, NULL, 0);
 			break;
@@ -682,6 +791,9 @@ static void parse_opts(int argc, char **argv)
 		case 'O':
 			cfg_l3_extra = parse_protocol_family(argv[0], optarg);
 			break;
+		case 'p':
+			cfg_tx_pf_packet = true;
+			break;
 		case 'R':
 			cfg_only_rx = true;
 			break;
@@ -703,6 +815,12 @@ static void parse_opts(int argc, char **argv)
 		case 'T':
 			cfg_only_tx = true;
 			break;
+		case 'U':
+			cfg_partial_udp_hdr = true;
+			break;
+		case 'v':
+			cfg_use_vnet = true;
+			break;
 		case 'x':
 			cfg_dsfield_outer = strtol(optarg, NULL, 0);
 			break;
@@ -733,7 +851,26 @@ static void parse_opts(int argc, char **argv)
 	 */
 	if (((cfg_dsfield_outer & 0x3) == 0x3) &&
 	    ((cfg_dsfield_inner & 0x3) == 0x0))
-		cfg_expect_failure = true;
+		cfg_expect_norx = true;
+
+	/* Don't wait around for packets that we expect to fail to send */
+	if (cfg_expect_snd_failure && !cfg_num_secs)
+		cfg_num_secs = 3;
+
+	if (cfg_partial_udp_hdr && cfg_encap_proto)
+		error(1, 0,
+		      "ops: can't specify partial UDP hdr (-U) and encap (-e)");
+
+	if (cfg_use_vnet && cfg_encap_proto)
+		error(1, 0, "options: can't specify encap (-e) with vnet (-v)");
+	if (cfg_use_vnet && !cfg_tx_pf_packet)
+		error(1, 0, "options: vnet (-v) requires psock for tx (-p)");
+	if (cfg_vnet_use_gso && !cfg_use_vnet)
+		error(1, 0, "options: gso (-g) requires vnet (-v)");
+	if (cfg_vnet_use_csum_off && !cfg_use_vnet)
+		error(1, 0, "options: vnet csum (-c) requires vnet (-v)");
+	if (cfg_vnet_use_hdr_len_bad && !cfg_use_vnet)
+		error(1, 0, "options: bad vnet hdrlen (-L) requires vnet (-v)");
 }
 
 static void print_opts(void)
diff --git a/tools/testing/selftests/bpf/test_flow_dissector.sh b/tools/testing/selftests/bpf/test_flow_dissector.sh
index 174b72a64a4c..5852cf815eeb 100755
--- a/tools/testing/selftests/bpf/test_flow_dissector.sh
+++ b/tools/testing/selftests/bpf/test_flow_dissector.sh
@@ -51,6 +51,9 @@ if [[ -z $(ip netns identify $$) ]]; then
 		echo "Skipping root flow dissector test, bpftool not found" >&2
 	fi
 
+	orig_flow_dissect_sysctl=$(</proc/sys/net/core/flow_dissect_vnet_hdr)
+	sysctl net.core.flow_dissect_vnet_hdr=1
+
 	# Run the rest of the tests in a net namespace.
 	../net/in_netns.sh "$0" "$@"
 	err=$(( $err + $? ))
@@ -61,6 +64,7 @@ if [[ -z $(ip netns identify $$) ]]; then
 		echo "selftests: $TESTNAME [FAILED]";
 	fi
 
+	sysctl net.core.flow_dissect_vnet_hdr=$orig_flow_dissect_sysctl
 	exit $err
 fi
 
@@ -165,4 +169,19 @@ tc filter add dev lo parent ffff: protocol ipv6 pref 1337 flower ip_proto \
 # Send 10 IPv6/UDP packets from port 10. Filter should not drop any.
 ./test_flow_dissector -i 6 -f 10
 
+
+echo "Testing virtio-net header validation..."
+echo "Testing valid vnet headers. Should *not* be dropped."
+./with_addr.sh ./test_flow_dissector -i 4 -D 192.168.0.1 -S 1.1.1.1 -p -v
+echo "Testing partial transport header. Should be dropped."
+./with_addr.sh ./test_flow_dissector -i 4 -D 192.168.0.1 -S 1.1.1.1 -p -v -U -E
+echo "Testing valid vnet gso spec. Should *not* be dropped."
+./with_addr.sh ./test_flow_dissector -i 4 -D 192.168.0.1 -S 1.1.1.1 -p -v -g -c -l 8000
+echo "Testing invalid vnet gso size. Should be dropped."
+./with_addr.sh ./test_flow_dissector -i 4 -D 192.168.0.1 -S 1.1.1.1 -p -v -g -c -l 100 -E
+echo "Testing invalid vnet header len. Should be dropped."
+./with_addr.sh ./test_flow_dissector -i 4 -D 192.168.0.1 -S 1.1.1.1 -p -v -L -E
+echo "Testing vnet gso without csum. Should be dropped."
+./with_addr.sh ./test_flow_dissector -i 4 -D 192.168.0.1 -S 1.1.1.1 -p -v -g -l 8000 -E
+
 exit 0
-- 
2.32.0.rc0.204.g9fa02ecfa5-goog


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v3 0/3] virtio_net: add optional flow dissection in virtio_net_hdr_to_skb
  2021-06-01 22:18 [PATCH net-next v3 0/3] virtio_net: add optional flow dissection in virtio_net_hdr_to_skb Tanner Love
                   ` (2 preceding siblings ...)
  2021-06-01 22:18 ` [PATCH net-next v3 3/3] selftests/net: amend bpf flow dissector prog to do vnet hdr validation Tanner Love
@ 2021-06-02 20:10 ` David Miller
  2021-06-02 23:16   ` Alexei Starovoitov
  2021-06-04  2:55 ` Jason Wang
  4 siblings, 1 reply; 15+ messages in thread
From: David Miller @ 2021-06-02 20:10 UTC (permalink / raw)
  To: tannerlove.kernel
  Cc: netdev, ast, daniel, andrii, edumazet, willemb, ppenkov, kuba,
	tannerlove

From: Tanner Love <tannerlove.kernel@gmail.com>
Date: Tue,  1 Jun 2021 18:18:37 -0400

> From: Tanner Love <tannerlove@google.com>
> 
> First patch extends the flow dissector BPF program type to accept
> virtio-net header members.
> 
> Second patch uses this feature to add optional flow dissection in
> virtio_net_hdr_to_skb(). This allows admins to define permitted
> packets more strictly, for example dropping deprecated UDP_UFO
> packets.
> 
> Third patch extends kselftest to cover this feature.

Definitely need some bpf review of these changes.

Alexei, Daniel?

Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v3 0/3] virtio_net: add optional flow dissection in virtio_net_hdr_to_skb
  2021-06-02 20:10 ` [PATCH net-next v3 0/3] virtio_net: add optional flow dissection in virtio_net_hdr_to_skb David Miller
@ 2021-06-02 23:16   ` Alexei Starovoitov
  0 siblings, 0 replies; 15+ messages in thread
From: Alexei Starovoitov @ 2021-06-02 23:16 UTC (permalink / raw)
  To: David Miller
  Cc: tannerlove.kernel, Network Development, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Eric Dumazet, Willem de Bruijn,
	Petar Penkov, Jakub Kicinski, tannerlove

On Wed, Jun 2, 2021 at 1:10 PM David Miller <davem@davemloft.net> wrote:
>
> From: Tanner Love <tannerlove.kernel@gmail.com>
> Date: Tue,  1 Jun 2021 18:18:37 -0400
>
> > From: Tanner Love <tannerlove@google.com>
> >
> > First patch extends the flow dissector BPF program type to accept
> > virtio-net header members.
> >
> > Second patch uses this feature to add optional flow dissection in
> > virtio_net_hdr_to_skb(). This allows admins to define permitted
> > packets more strictly, for example dropping deprecated UDP_UFO
> > packets.
> >
> > Third patch extends kselftest to cover this feature.
>
> Definitely need some bpf review of these changes.

Yeah. imo it's more bpf material than net.
We'll process it.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v3 1/3] net: flow_dissector: extend bpf flow dissector support with vnet hdr
  2021-06-01 22:18 ` [PATCH net-next v3 1/3] net: flow_dissector: extend bpf flow dissector support with vnet hdr Tanner Love
@ 2021-06-03 15:39   ` sdf
  0 siblings, 0 replies; 15+ messages in thread
From: sdf @ 2021-06-03 15:39 UTC (permalink / raw)
  To: Tanner Love
  Cc: netdev, davem, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Eric Dumazet, Willem de Bruijn, Petar Penkov,
	Jakub Kicinski, Tanner Love

On 06/01, Tanner Love wrote:
> From: Tanner Love <tannerlove@google.com>

> Amend the bpf flow dissector program type to accept virtio_net_hdr
> members. Do this to enable bpf flow dissector programs to perform
> virtio-net header validation. The next patch in this series will add
> a flow dissection hook in virtio_net_hdr_to_skb and make use of this
> extended functionality. That commit message has more background on the
> use case.

> Signed-off-by: Tanner Love <tannerlove@google.com>
> Reviewed-by: Willem de Bruijn <willemb@google.com>
> Reviewed-by: Petar Penkov <ppenkov@google.com>
> ---
>   drivers/net/bonding/bond_main.c |  2 +-
>   include/linux/skbuff.h          | 26 ++++++++++++----
>   include/net/flow_dissector.h    |  6 ++++
>   include/uapi/linux/bpf.h        |  6 ++++
>   net/core/filter.c               | 55 +++++++++++++++++++++++++++++++++
>   net/core/flow_dissector.c       | 24 ++++++++++++--
>   tools/include/uapi/linux/bpf.h  |  6 ++++
>   7 files changed, 116 insertions(+), 9 deletions(-)

> diff --git a/drivers/net/bonding/bond_main.c  
> b/drivers/net/bonding/bond_main.c
> index 7e469c203ca5..5d2d7d5c5704 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -3554,7 +3554,7 @@ static bool bond_flow_dissect(struct bonding *bond,  
> struct sk_buff *skb,
>   	case BOND_XMIT_POLICY_ENCAP34:
>   		memset(fk, 0, sizeof(*fk));
>   		return __skb_flow_dissect(NULL, skb, &flow_keys_bonding,
> -					  fk, NULL, 0, 0, 0, 0);
> +					  fk, NULL, 0, 0, 0, 0, NULL);
>   	default:
>   		break;
>   	}
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index dbf820a50a39..fef8f4b5db6e 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -1312,18 +1312,20 @@ struct bpf_flow_dissector;
>   bool bpf_flow_dissect(struct bpf_prog *prog, struct bpf_flow_dissector  
> *ctx,
>   		      __be16 proto, int nhoff, int hlen, unsigned int flags);

> +struct virtio_net_hdr;
>   bool __skb_flow_dissect(const struct net *net,
>   			const struct sk_buff *skb,
>   			struct flow_dissector *flow_dissector,
>   			void *target_container, const void *data,
> -			__be16 proto, int nhoff, int hlen, unsigned int flags);
> +			__be16 proto, int nhoff, int hlen, unsigned int flags,
> +			const struct virtio_net_hdr *vhdr);

>   static inline bool skb_flow_dissect(const struct sk_buff *skb,
>   				    struct flow_dissector *flow_dissector,
>   				    void *target_container, unsigned int flags)
>   {
>   	return __skb_flow_dissect(NULL, skb, flow_dissector,
> -				  target_container, NULL, 0, 0, 0, flags);
> +				  target_container, NULL, 0, 0, 0, flags, NULL);
>   }

>   static inline bool skb_flow_dissect_flow_keys(const struct sk_buff *skb,
> @@ -1332,7 +1334,20 @@ static inline bool  
> skb_flow_dissect_flow_keys(const struct sk_buff *skb,
>   {
>   	memset(flow, 0, sizeof(*flow));
>   	return __skb_flow_dissect(NULL, skb, &flow_keys_dissector,
> -				  flow, NULL, 0, 0, 0, flags);
> +				  flow, NULL, 0, 0, 0, flags, NULL);
> +}
> +
> +static inline bool
> +__skb_flow_dissect_flow_keys_basic(const struct net *net,
> +				   const struct sk_buff *skb,
> +				   struct flow_keys_basic *flow,
> +				   const void *data, __be16 proto,
> +				   int nhoff, int hlen, unsigned int flags,
> +				   const struct virtio_net_hdr *vhdr)
> +{
> +	memset(flow, 0, sizeof(*flow));
> +	return __skb_flow_dissect(net, skb, &flow_keys_basic_dissector, flow,
> +				  data, proto, nhoff, hlen, flags, vhdr);
>   }

>   static inline bool
> @@ -1342,9 +1357,8 @@ skb_flow_dissect_flow_keys_basic(const struct net  
> *net,
>   				 const void *data, __be16 proto,
>   				 int nhoff, int hlen, unsigned int flags)
>   {
> -	memset(flow, 0, sizeof(*flow));
> -	return __skb_flow_dissect(net, skb, &flow_keys_basic_dissector, flow,
> -				  data, proto, nhoff, hlen, flags);
> +	return __skb_flow_dissect_flow_keys_basic(net, skb, flow, data, proto,
> +						  nhoff, hlen, flags, NULL);
>   }

>   void skb_flow_dissect_meta(const struct sk_buff *skb,
> diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
> index ffd386ea0dbb..0796ad745e69 100644
> --- a/include/net/flow_dissector.h
> +++ b/include/net/flow_dissector.h
> @@ -370,6 +370,12 @@ struct bpf_flow_dissector {
>   	const struct sk_buff	*skb;
>   	const void		*data;
>   	const void		*data_end;
> +	__u8			vhdr_flags;
> +	__u8			vhdr_gso_type;
> +	__u16			vhdr_hdr_len;
> +	__u16			vhdr_gso_size;
> +	__u16			vhdr_csum_start;
> +	__u16			vhdr_csum_offset;
>   };

>   static inline void
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 418b9b813d65..de525defd462 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -5155,6 +5155,12 @@ struct __sk_buff {
>   	__u32 gso_segs;
>   	__bpf_md_ptr(struct bpf_sock *, sk);
>   	__u32 gso_size;

[..]

> +	__u8  vhdr_flags;
> +	__u8  vhdr_gso_type;
> +	__u16 vhdr_hdr_len;
> +	__u16 vhdr_gso_size;
> +	__u16 vhdr_csum_start;
> +	__u16 vhdr_csum_offset;

These are flow dissector specific, any reason not to add them to
struct bpf_flow_keys instead?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v3 2/3] virtio_net: add optional flow dissection in virtio_net_hdr_to_skb
  2021-06-01 22:18 ` [PATCH net-next v3 2/3] virtio_net: add optional flow dissection in virtio_net_hdr_to_skb Tanner Love
@ 2021-06-03 15:54   ` sdf
  2021-06-03 23:56   ` Alexei Starovoitov
  1 sibling, 0 replies; 15+ messages in thread
From: sdf @ 2021-06-03 15:54 UTC (permalink / raw)
  To: Tanner Love
  Cc: netdev, davem, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Eric Dumazet, Willem de Bruijn, Petar Penkov,
	Jakub Kicinski, Tanner Love, kernel test robot

On 06/01, Tanner Love wrote:
> From: Tanner Love <tannerlove@google.com>

> Syzkaller bugs have resulted from loose specification of
> virtio_net_hdr[1]. Enable execution of a BPF flow dissector program
> in virtio_net_hdr_to_skb to validate the vnet header and drop bad
> input.

> The existing behavior of accepting these vnet headers is part of the
> ABI. But individual admins may want to enforce restrictions. For
> example, verifying that a GSO_TCPV4 gso_type matches packet contents:
> unencapsulated TCP/IPV4 packet with payload exceeding gso_size and
> hdr_len at payload offset.

> Introduce a new sysctl net.core.flow_dissect_vnet_hdr controlling a
> static key to decide whether to perform flow dissection. When the key
> is false, virtio_net_hdr_to_skb computes as before.

> [1]  
> https://syzkaller.appspot.com/bug?id=b419a5ca95062664fe1a60b764621eb4526e2cd0

> [ um build error ]
> Reported-by: kernel test robot <lkp@intel.com>
> Signed-off-by: Tanner Love <tannerlove@google.com>
> Suggested-by: Willem de Bruijn <willemb@google.com>
> ---
>   include/linux/virtio_net.h | 25 +++++++++++++++++++++----
>   net/core/flow_dissector.c  |  3 +++
>   net/core/sysctl_net_core.c |  9 +++++++++
>   3 files changed, 33 insertions(+), 4 deletions(-)

> diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
> index b465f8f3e554..cdc6152b40c6 100644
> --- a/include/linux/virtio_net.h
> +++ b/include/linux/virtio_net.h
> @@ -25,10 +25,13 @@ static inline int virtio_net_hdr_set_proto(struct  
> sk_buff *skb,
>   	return 0;
>   }

> +DECLARE_STATIC_KEY_FALSE(sysctl_flow_dissect_vnet_hdr_key);
> +
>   static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
>   					const struct virtio_net_hdr *hdr,
>   					bool little_endian)
>   {
> +	struct flow_keys_basic keys;
>   	unsigned int gso_type = 0;
>   	unsigned int thlen = 0;
>   	unsigned int p_off = 0;
> @@ -78,13 +81,24 @@ static inline int virtio_net_hdr_to_skb(struct  
> sk_buff *skb,
>   		p_off = skb_transport_offset(skb) + thlen;
>   		if (!pskb_may_pull(skb, p_off))
>   			return -EINVAL;
> -	} else {
> +	}
> +
> +	/* BPF flow dissection for optional strict validation.
> +	 *
> +	 * Admins can define permitted packets more strictly, such as dropping
> +	 * deprecated UDP_UFO packets and requiring skb->protocol to be non-zero
> +	 * and matching packet headers.
> +	 */
> +	if (static_branch_unlikely(&sysctl_flow_dissect_vnet_hdr_key) &&
> +	    !__skb_flow_dissect_flow_keys_basic(NULL, skb, &keys, NULL, 0, 0, 0,
> +						0, hdr))
> +		return -EINVAL;
> +
> +	if (!(hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM)) {
>   		/* gso packets without NEEDS_CSUM do not set transport_offset.
>   		 * probe and drop if does not match one of the above types.
>   		 */
>   		if (gso_type && skb->network_header) {
> -			struct flow_keys_basic keys;
> -
>   			if (!skb->protocol) {
>   				__be16 protocol = dev_parse_header_protocol(skb);

> @@ -92,8 +106,11 @@ static inline int virtio_net_hdr_to_skb(struct  
> sk_buff *skb,
>   				if (protocol && protocol != skb->protocol)
>   					return -EINVAL;
>   			}
> +
>   retry:
> -			if (!skb_flow_dissect_flow_keys_basic(NULL, skb, &keys,
> +			/* only if flow dissection not already done */
> +			if (!static_branch_unlikely(&sysctl_flow_dissect_vnet_hdr_key) &&
> +			    !skb_flow_dissect_flow_keys_basic(NULL, skb, &keys,
>   							      NULL, 0, 0, 0,
>   							      0)) {
>   				/* UFO does not specify ipv4 or 6: try both */
> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> index 4b171ebec084..ae2ce382f4f4 100644
> --- a/net/core/flow_dissector.c
> +++ b/net/core/flow_dissector.c
> @@ -35,6 +35,9 @@
>   #endif
>   #include <linux/bpf-netns.h>

> +DEFINE_STATIC_KEY_FALSE(sysctl_flow_dissect_vnet_hdr_key);
> +EXPORT_SYMBOL(sysctl_flow_dissect_vnet_hdr_key);
> +
>   static void dissector_set_key(struct flow_dissector *flow_dissector,
>   			      enum flow_dissector_key_id key_id)
>   {
> diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
> index c8496c1142c9..c01b9366bb75 100644
> --- a/net/core/sysctl_net_core.c
> +++ b/net/core/sysctl_net_core.c
> @@ -36,6 +36,8 @@ static int net_msg_warn;	/* Unused, but still a sysctl  
> */
>   int sysctl_fb_tunnels_only_for_init_net __read_mostly = 0;
>   EXPORT_SYMBOL(sysctl_fb_tunnels_only_for_init_net);

> +DECLARE_STATIC_KEY_FALSE(sysctl_flow_dissect_vnet_hdr_key);
> +
>   /* 0 - Keep current behavior:
>    *     IPv4: inherit all current settings from init_net
>    *     IPv6: reset all settings to default
> @@ -580,6 +582,13 @@ static struct ctl_table net_core_table[] = {
>   		.extra1		= SYSCTL_ONE,
>   		.extra2		= &int_3600,
>   	},

[..]

> +	{
> +		.procname       = "flow_dissect_vnet_hdr",

This sounds generic, but iiuc, it makes sense only for bpf flow
dissection? Should it be bpf_flow_dissect_vnet_hdr instead?

Or should we drop this sysctl in favor of some per-program flag
(either attach_flags or some signal via btf)? That way,
individual bpf programs can signal their willingness to
parse vnet hdr.

> +		.data           = &sysctl_flow_dissect_vnet_hdr_key.key,
> +		.maxlen         = sizeof(sysctl_flow_dissect_vnet_hdr_key),
> +		.mode           = 0644,
> +		.proc_handler   = proc_do_static_key,
> +	},
>   	{ }
>   };

> --
> 2.32.0.rc0.204.g9fa02ecfa5-goog


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v3 2/3] virtio_net: add optional flow dissection in virtio_net_hdr_to_skb
  2021-06-01 22:18 ` [PATCH net-next v3 2/3] virtio_net: add optional flow dissection in virtio_net_hdr_to_skb Tanner Love
  2021-06-03 15:54   ` sdf
@ 2021-06-03 23:56   ` Alexei Starovoitov
  2021-06-04  0:44     ` Willem de Bruijn
  1 sibling, 1 reply; 15+ messages in thread
From: Alexei Starovoitov @ 2021-06-03 23:56 UTC (permalink / raw)
  To: Tanner Love
  Cc: netdev, davem, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Eric Dumazet, Willem de Bruijn, Petar Penkov,
	Jakub Kicinski, Tanner Love, kernel test robot

On Tue, Jun 01, 2021 at 06:18:39PM -0400, Tanner Love wrote:
> From: Tanner Love <tannerlove@google.com>
> 
> Syzkaller bugs have resulted from loose specification of
> virtio_net_hdr[1]. Enable execution of a BPF flow dissector program
> in virtio_net_hdr_to_skb to validate the vnet header and drop bad
> input.
> 
> The existing behavior of accepting these vnet headers is part of the
> ABI. 

So ?
It's ok to fix ABI when it's broken.
The whole feature is a way to workaround broken ABI with additional
BPF based validation.
It's certainly a novel idea.
I've never seen BPF being used to fix the kernel bugs.
But I think the better way forward is to admit that vnet ABI is broken
and fix it in the kernel with proper validation.
BPF-based validation is a band-aid. The out of the box kernel will
stay broken and syzbot will continue to crash it.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v3 2/3] virtio_net: add optional flow dissection in virtio_net_hdr_to_skb
  2021-06-03 23:56   ` Alexei Starovoitov
@ 2021-06-04  0:44     ` Willem de Bruijn
  2021-06-04  2:04       ` Alexei Starovoitov
  0 siblings, 1 reply; 15+ messages in thread
From: Willem de Bruijn @ 2021-06-04  0:44 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Tanner Love, Network Development, David Miller,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Eric Dumazet, Petar Penkov, Jakub Kicinski, Tanner Love,
	kernel test robot

On Thu, Jun 3, 2021 at 7:56 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Jun 01, 2021 at 06:18:39PM -0400, Tanner Love wrote:
> > From: Tanner Love <tannerlove@google.com>
> >
> > Syzkaller bugs have resulted from loose specification of
> > virtio_net_hdr[1]. Enable execution of a BPF flow dissector program
> > in virtio_net_hdr_to_skb to validate the vnet header and drop bad
> > input.
> >
> > The existing behavior of accepting these vnet headers is part of the
> > ABI.
>
> So ?
> It's ok to fix ABI when it's broken.
> The whole feature is a way to workaround broken ABI with additional
> BPF based validation.
> It's certainly a novel idea.
> I've never seen BPF being used to fix the kernel bugs.
> But I think the better way forward is to admit that vnet ABI is broken
> and fix it in the kernel with proper validation.
> BPF-based validation is a band-aid. The out of the box kernel will
> stay broken and syzbot will continue to crash it.

The intention is not to use this to avoid kernel fixes.

There are three parts:

1. is the ABI broken and can we simply drop certain weird packets?
2. will we accept an extra packet parsing stage in
virtio_net_hdr_to_skb to find all the culprits?
3. do we want to add yet another parser in C when this is exactly what
the BPF flow dissector does well?

The virtio_net_hdr interface is more permissive than I think it was
intended. But weird edge cases like protocol 0 do occur. So I believe
it is simply too late to drop everything that is not obviously
correct.

More problematic is that some of the gray area cases are non-trivial
and require protocol parsing. Do the packet headers match the gso
type? Are subtle variants like IPv6/TCP with extension headers
supported or not?

We have previously tried to add unconditional protocol parsing in
virtio_net_hdr_to_skb, but received pushback because this will cause
performance regressions, e.g., for vm-to-vm forwarding.

Combined, that's why I think BPF flow dissection is the right approach
here. It is optional, can be as pedantic as the admin wants / workload
allows (e.g., dropping UFO altogether) and ensures protocol parsing
itself is robust. Even if not enabled continuously, offers a quick way
to patch a vulnerability when it is discovered. This is the same
reasoning of the existing BPF flow dissector.

It is not intended as an alternative to subsequently fixing a bug in
the kernel path.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v3 2/3] virtio_net: add optional flow dissection in virtio_net_hdr_to_skb
  2021-06-04  0:44     ` Willem de Bruijn
@ 2021-06-04  2:04       ` Alexei Starovoitov
  0 siblings, 0 replies; 15+ messages in thread
From: Alexei Starovoitov @ 2021-06-04  2:04 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Tanner Love, Network Development, David Miller,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Eric Dumazet, Petar Penkov, Jakub Kicinski, Tanner Love,
	kernel test robot

On Thu, Jun 03, 2021 at 08:44:51PM -0400, Willem de Bruijn wrote:
> On Thu, Jun 3, 2021 at 7:56 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Tue, Jun 01, 2021 at 06:18:39PM -0400, Tanner Love wrote:
> > > From: Tanner Love <tannerlove@google.com>
> > >
> > > Syzkaller bugs have resulted from loose specification of
> > > virtio_net_hdr[1]. Enable execution of a BPF flow dissector program
> > > in virtio_net_hdr_to_skb to validate the vnet header and drop bad
> > > input.
> > >
> > > The existing behavior of accepting these vnet headers is part of the
> > > ABI.
> >
> > So ?
> > It's ok to fix ABI when it's broken.
> > The whole feature is a way to workaround broken ABI with additional
> > BPF based validation.
> > It's certainly a novel idea.
> > I've never seen BPF being used to fix the kernel bugs.
> > But I think the better way forward is to admit that vnet ABI is broken
> > and fix it in the kernel with proper validation.
> > BPF-based validation is a band-aid. The out of the box kernel will
> > stay broken and syzbot will continue to crash it.
> 
> The intention is not to use this to avoid kernel fixes.
> 
> There are three parts:
> 
> 1. is the ABI broken and can we simply drop certain weird packets?
> 2. will we accept an extra packet parsing stage in
> virtio_net_hdr_to_skb to find all the culprits?
> 3. do we want to add yet another parser in C when this is exactly what
> the BPF flow dissector does well?
> 
> The virtio_net_hdr interface is more permissive than I think it was
> intended. But weird edge cases like protocol 0 do occur. So I believe
> it is simply too late to drop everything that is not obviously
> correct.
> 
> More problematic is that some of the gray area cases are non-trivial
> and require protocol parsing. Do the packet headers match the gso
> type? Are subtle variants like IPv6/TCP with extension headers
> supported or not?
> 
> We have previously tried to add unconditional protocol parsing in
> virtio_net_hdr_to_skb, but received pushback because this will cause
> performance regressions, e.g., for vm-to-vm forwarding.
> 
> Combined, that's why I think BPF flow dissection is the right approach
> here. It is optional, can be as pedantic as the admin wants / workload
> allows (e.g., dropping UFO altogether) and ensures protocol parsing
> itself is robust. Even if not enabled continuously, offers a quick way
> to patch a vulnerability when it is discovered. This is the same
> reasoning of the existing BPF flow dissector.
> 
> It is not intended as an alternative to subsequently fixing a bug in
> the kernel path.

Thanks for these details. They need to be part of commit log.

As far as patches the overhead of copying extra fields can be avoided.
How about copying single pointer to struct virtio_net_hdr into bpf_flow_keys instead?
See struct bpf_sockopt and _bpf_md_ptr(struct bpf_sock *, sk).
The overhead will much lower and bpf prog will access all virtio_net_hdr directly.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v3 0/3] virtio_net: add optional flow dissection in virtio_net_hdr_to_skb
  2021-06-01 22:18 [PATCH net-next v3 0/3] virtio_net: add optional flow dissection in virtio_net_hdr_to_skb Tanner Love
                   ` (3 preceding siblings ...)
  2021-06-02 20:10 ` [PATCH net-next v3 0/3] virtio_net: add optional flow dissection in virtio_net_hdr_to_skb David Miller
@ 2021-06-04  2:55 ` Jason Wang
  2021-06-04  3:51   ` Willem de Bruijn
  4 siblings, 1 reply; 15+ messages in thread
From: Jason Wang @ 2021-06-04  2:55 UTC (permalink / raw)
  To: Tanner Love, netdev
  Cc: davem, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Eric Dumazet, Willem de Bruijn, Petar Penkov, Jakub Kicinski,
	Tanner Love, Michael S. Tsirkin


在 2021/6/2 上午6:18, Tanner Love 写道:
> From: Tanner Love <tannerlove@google.com>
>
> First patch extends the flow dissector BPF program type to accept
> virtio-net header members.
>
> Second patch uses this feature to add optional flow dissection in
> virtio_net_hdr_to_skb(). This allows admins to define permitted
> packets more strictly, for example dropping deprecated UDP_UFO
> packets.
>
> Third patch extends kselftest to cover this feature.


I wonder why virtio maintainers is not copied in this series.

Several questions:

1) having bpf core to know about virito-net header seems like a layer 
violation, it doesn't scale as we may add new fields, actually there's 
already fields that is not implemented in the spec but not Linux right now.
2) virtio_net_hdr_to_skb() is not the single entry point, packet could 
go via XDP
3) I wonder whether we can simply use XDP to solve this issue (metadata 
probably but I don't have a deep thought)
4) If I understand the code correctly, it should deal with all dodgy 
packets instead of just for virtio

Thanks


>
> Tanner Love (3):
>    net: flow_dissector: extend bpf flow dissector support with vnet hdr
>    virtio_net: add optional flow dissection in virtio_net_hdr_to_skb
>    selftests/net: amend bpf flow dissector prog to do vnet hdr validation
>
>   drivers/net/bonding/bond_main.c               |   2 +-
>   include/linux/skbuff.h                        |  26 ++-
>   include/linux/virtio_net.h                    |  25 ++-
>   include/net/flow_dissector.h                  |   6 +
>   include/uapi/linux/bpf.h                      |   6 +
>   net/core/filter.c                             |  55 +++++
>   net/core/flow_dissector.c                     |  27 ++-
>   net/core/sysctl_net_core.c                    |   9 +
>   tools/include/uapi/linux/bpf.h                |   6 +
>   tools/testing/selftests/bpf/progs/bpf_flow.c  | 188 +++++++++++++-----
>   .../selftests/bpf/test_flow_dissector.c       | 181 +++++++++++++++--
>   .../selftests/bpf/test_flow_dissector.sh      |  19 ++
>   12 files changed, 470 insertions(+), 80 deletions(-)
>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v3 0/3] virtio_net: add optional flow dissection in virtio_net_hdr_to_skb
  2021-06-04  2:55 ` Jason Wang
@ 2021-06-04  3:51   ` Willem de Bruijn
  2021-06-04  6:43     ` Jason Wang
  0 siblings, 1 reply; 15+ messages in thread
From: Willem de Bruijn @ 2021-06-04  3:51 UTC (permalink / raw)
  To: Jason Wang
  Cc: Tanner Love, Network Development, David Miller,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Eric Dumazet, Petar Penkov, Jakub Kicinski, Tanner Love,
	Michael S. Tsirkin

On Thu, Jun 3, 2021 at 10:55 PM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2021/6/2 上午6:18, Tanner Love 写道:
> > From: Tanner Love <tannerlove@google.com>
> >
> > First patch extends the flow dissector BPF program type to accept
> > virtio-net header members.
> >
> > Second patch uses this feature to add optional flow dissection in
> > virtio_net_hdr_to_skb(). This allows admins to define permitted
> > packets more strictly, for example dropping deprecated UDP_UFO
> > packets.
> >
> > Third patch extends kselftest to cover this feature.
>
>
> I wonder why virtio maintainers is not copied in this series.

Sorry, an oversight.

> Several questions:
>
> 1) having bpf core to know about virito-net header seems like a layer
> violation, it doesn't scale as we may add new fields, actually there's
> already fields that is not implemented in the spec but not Linux right now.

struct virtio_net_hdr is used by multiple interfaces, not just virtio.
The interface as is will remain, regardless of additional extensions.

If the interface is extended, the validation can be extended with it.

Just curious: can you share what extra fields may be in the pipeline?
The struct has historically not seen (m)any changes.

> 2) virtio_net_hdr_to_skb() is not the single entry point, packet could
> go via XDP

Do you mean AF_XDP? As far as I know, vnet_hdr is the only injection
interface for complex packets that include offload instructions (GSO,
csum) -- which are the ones mostly implicated in bug reports.

> 3) I wonder whether we can simply use XDP to solve this issue (metadata
> probably but I don't have a deep thought)
> 4) If I understand the code correctly, it should deal with all dodgy
> packets instead of just for virtio

Yes. Some callers of virtio_net_hdr_to_skb, such as tun_get_user and
virtio receive_buf, pass all packets to it. Others, like tap_get_user
and packet_snd, only call it if a virtio_net_hdr is passed. Once we
have a validation hook, ideally all packets need to pass it. Modifying
callers like tap_get_user can be a simple follow-on.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v3 0/3] virtio_net: add optional flow dissection in virtio_net_hdr_to_skb
  2021-06-04  3:51   ` Willem de Bruijn
@ 2021-06-04  6:43     ` Jason Wang
  2021-06-04 14:43       ` Willem de Bruijn
  0 siblings, 1 reply; 15+ messages in thread
From: Jason Wang @ 2021-06-04  6:43 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Tanner Love, Network Development, David Miller,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Eric Dumazet, Petar Penkov, Jakub Kicinski, Tanner Love,
	Michael S. Tsirkin


在 2021/6/4 上午11:51, Willem de Bruijn 写道:
> On Thu, Jun 3, 2021 at 10:55 PM Jason Wang <jasowang@redhat.com> wrote:
>>
>> 在 2021/6/2 上午6:18, Tanner Love 写道:
>>> From: Tanner Love <tannerlove@google.com>
>>>
>>> First patch extends the flow dissector BPF program type to accept
>>> virtio-net header members.
>>>
>>> Second patch uses this feature to add optional flow dissection in
>>> virtio_net_hdr_to_skb(). This allows admins to define permitted
>>> packets more strictly, for example dropping deprecated UDP_UFO
>>> packets.
>>>
>>> Third patch extends kselftest to cover this feature.
>>
>> I wonder why virtio maintainers is not copied in this series.
> Sorry, an oversight.


No problem.


>
>> Several questions:
>>
>> 1) having bpf core to know about virito-net header seems like a layer
>> violation, it doesn't scale as we may add new fields, actually there's
>> already fields that is not implemented in the spec but not Linux right now.
> struct virtio_net_hdr is used by multiple interfaces, not just virtio.
> The interface as is will remain, regardless of additional extensions.
>
> If the interface is extended, the validation can be extended with it.


One possible problem is that there's no sufficient context.

The vnet header length is not a fixed value but depends on the feature 
negotiation. The num_buffers (not implemented in this series) is an 
example. The field doesn't not exist for legacy device if mergeable 
buffer is disabled. If we decide to go with this way, we probably need 
to fix this by introducing a vnet header length.

And I'm not sure it can work for all the future cases e.g the semantic 
of a field may vary depends on the feature negotiated, but maybe it's 
safe since it needs to set the flags.

Another note is that the spec doesn't exclude the possibility to have a 
complete new vnet header format in the future. And the bpf program is 
unaware of any virtio features.


>
> Just curious: can you share what extra fields may be in the pipeline?
> The struct has historically not seen (m)any changes.


For extra fields, I vaguely remember we had some discussions on the 
possible method to extend that, but I forget the actual features.

But spec support RSC which may reuse csum_start/offset but it looks to 
me RSC is not something like Linux need.


>
>> 2) virtio_net_hdr_to_skb() is not the single entry point, packet could
>> go via XDP
> Do you mean AF_XDP?


Yes and kernel XDP as well. If the packet is redirected or transmitted, 
it won't even go to virtio_net_hdr_to_skb().

Since there's no GSO/csum support for XDP, it's probably ok, but needs 
to consider this for the future consider the multi-buffer XDP is being 
developed right now, we can release those restriction.


> As far as I know, vnet_hdr is the only injection
> interface for complex packets that include offload instructions (GSO,
> csum) -- which are the ones mostly implicated in bug reports.


Ideally, if GSO/csum is supported by XDP, it would be more simple to use 
XDP I think.


>
>> 3) I wonder whether we can simply use XDP to solve this issue (metadata
>> probably but I don't have a deep thought)
>> 4) If I understand the code correctly, it should deal with all dodgy
>> packets instead of just for virtio
> Yes. Some callers of virtio_net_hdr_to_skb, such as tun_get_user and
> virtio receive_buf, pass all packets to it. Others, like tap_get_user
> and packet_snd, only call it if a virtio_net_hdr is passed. Once we
> have a validation hook, ideally all packets need to pass it. Modifying
> callers like tap_get_user can be a simple follow-on.


Ok.

Thanks

>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v3 0/3] virtio_net: add optional flow dissection in virtio_net_hdr_to_skb
  2021-06-04  6:43     ` Jason Wang
@ 2021-06-04 14:43       ` Willem de Bruijn
  0 siblings, 0 replies; 15+ messages in thread
From: Willem de Bruijn @ 2021-06-04 14:43 UTC (permalink / raw)
  To: Jason Wang
  Cc: Tanner Love, Network Development, David Miller,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Eric Dumazet, Petar Penkov, Jakub Kicinski, Tanner Love,
	Michael S. Tsirkin

> >> Several questions:
> >>
> >> 1) having bpf core to know about virito-net header seems like a layer
> >> violation, it doesn't scale as we may add new fields, actually there's
> >> already fields that is not implemented in the spec but not Linux right now.
> > struct virtio_net_hdr is used by multiple interfaces, not just virtio.
> > The interface as is will remain, regardless of additional extensions.
> >
> > If the interface is extended, the validation can be extended with it.
>
>
> One possible problem is that there's no sufficient context.
>
> The vnet header length is not a fixed value but depends on the feature
> negotiation. The num_buffers (not implemented in this series) is an
> example. The field doesn't not exist for legacy device if mergeable
> buffer is disabled. If we decide to go with this way, we probably need
> to fix this by introducing a vnet header length.
>
> And I'm not sure it can work for all the future cases e.g the semantic
> of a field may vary depends on the feature negotiated, but maybe it's
> safe since it needs to set the flags.
>
> Another note is that the spec doesn't exclude the possibility to have a
> complete new vnet header format in the future. And the bpf program is
> unaware of any virtio features.

We can extend the program with a version or type field, if multiple
variants appear. The callers can set this.

Thanks for the examples. As a matter of fact, I do know that kind of
extension. I proposed new fields myself this winter, to for timestamps,
pacing offload and hash info on tx:

https://lore.kernel.org/netdev/20210208185558.995292-1-willemdebruijn.kernel@gmail.com/T/#mcbd4dff966a93d61a31844c9d968e7cd4ee7f0ab

Like num_buffers, those are new fields appended to the struct.

Agreed that if the semantics of the existing fields would change or
a whole new v2 type would be defined (with much stricter semantics
that time around, and validation from the start), then a type field in
the flow dissector will be needed.

That is feasible and won't have to break the BPF interface.

> >
> > Just curious: can you share what extra fields may be in the pipeline?
> > The struct has historically not seen (m)any changes.
>
>
> For extra fields, I vaguely remember we had some discussions on the
> possible method to extend that, but I forget the actual features.
>
> But spec support RSC which may reuse csum_start/offset but it looks to
> me RSC is not something like Linux need.
>
>
> >
> >> 2) virtio_net_hdr_to_skb() is not the single entry point, packet could
> >> go via XDP
> > Do you mean AF_XDP?
>
>
> Yes and kernel XDP as well. If the packet is redirected or transmitted,
> it won't even go to virtio_net_hdr_to_skb().

Redirected packets are already in the kernel.

This is strictly a chokepoint for new packets injected from userspace.

> Since there's no GSO/csum support for XDP, it's probably ok, but needs
> to consider this for the future consider the multi-buffer XDP is being
> developed right now, we can release those restriction.

Yes, we have to make sure not to introduce the same issues with any
XDP GSO extensions, if it comes to that.

> > As far as I know, vnet_hdr is the only injection
> > interface for complex packets that include offload instructions (GSO,
> > csum) -- which are the ones mostly implicated in bug reports.
>
>
> Ideally, if GSO/csum is supported by XDP, it would be more simple to use
> XDP I think.

That might actually reduce the odds of seeing new virtio_net_hdr extensions?

That legacy interface is here to stay, though, so we have to continue
to be prepared to handle any input that comes that way.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-06-04 14:44 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-01 22:18 [PATCH net-next v3 0/3] virtio_net: add optional flow dissection in virtio_net_hdr_to_skb Tanner Love
2021-06-01 22:18 ` [PATCH net-next v3 1/3] net: flow_dissector: extend bpf flow dissector support with vnet hdr Tanner Love
2021-06-03 15:39   ` sdf
2021-06-01 22:18 ` [PATCH net-next v3 2/3] virtio_net: add optional flow dissection in virtio_net_hdr_to_skb Tanner Love
2021-06-03 15:54   ` sdf
2021-06-03 23:56   ` Alexei Starovoitov
2021-06-04  0:44     ` Willem de Bruijn
2021-06-04  2:04       ` Alexei Starovoitov
2021-06-01 22:18 ` [PATCH net-next v3 3/3] selftests/net: amend bpf flow dissector prog to do vnet hdr validation Tanner Love
2021-06-02 20:10 ` [PATCH net-next v3 0/3] virtio_net: add optional flow dissection in virtio_net_hdr_to_skb David Miller
2021-06-02 23:16   ` Alexei Starovoitov
2021-06-04  2:55 ` Jason Wang
2021-06-04  3:51   ` Willem de Bruijn
2021-06-04  6:43     ` Jason Wang
2021-06-04 14:43       ` Willem de Bruijn

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.