All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6 net-next] VXLAN Group Policy Extension
@ 2015-01-07  2:05 Thomas Graf
  2015-01-07  2:05 ` [PATCH 1/6] vxlan: Allow for VXLAN extensions to be implemented Thomas Graf
                   ` (5 more replies)
  0 siblings, 6 replies; 26+ messages in thread
From: Thomas Graf @ 2015-01-07  2:05 UTC (permalink / raw)
  To: davem, jesse, stephen, pshelar; +Cc: netdev, dev

Implements supports for the Group Policy VXLAN extension [0] to provide
a lightweight and simple security label mechanism across network peers
based on VXLAN. The security context and associated metadata is mapped
to/from skb->mark. This allows further mapping to a SELinux context
using SECMARK, to implement ACLs directly with nftables, iptables, OVS,
tc, etc.

The extension is disabled by default and should be run on a distinct
port in mixed Linux VXLAN VTEP environments. Liberal VXLAN VTEPs
which ignore unknown reserved bits will be able to receive VXLAN-GBP
frames.

Simple usage example:

10.1.1.1:
   # ip link add vxlan0 type vxlan id 10 remote 10.1.1.2 gbp
   # iptables -I OUTPUT -m owner --uid-owner 101 -j MARK --set-mark 0x200

10.1.1.2:
   # ip link add vxlan0 type vxlan id 10 remote 10.1.1.1 gbp
   # iptables -I INPUT -m mark --mark 0x200 -j DROP

iproute2 [1] and OVS [2] support will be provided in separate patches.

[0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy
[1] https://github.com/tgraf/iproute2/tree/vxlan-gbp
[2] https://github.com/tgraf/ovs/tree/vxlan-gbp

Thomas Graf (6):
  vxlan: Allow for VXLAN extensions to be implemented
  vxlan: Group Policy extension
  vxlan: Only bind to sockets with correct extensions enabled
  vxlan: Fail build if VXLAN header is misdefined
  openvswitch: Rename GENEVE_TUN_OPTS() to TUN_METADATA_OPTS()
  openvswitch: Support VXLAN Group Policy extension

 drivers/net/vxlan.c              | 221 +++++++++++++++++++++++++++------------
 include/net/vxlan.h              | 104 ++++++++++++++++--
 include/uapi/linux/if_link.h     |   8 ++
 include/uapi/linux/openvswitch.h |  19 ++++
 net/openvswitch/flow.c           |   2 +-
 net/openvswitch/flow.h           |  14 +--
 net/openvswitch/flow_netlink.c   | 111 ++++++++++++--------
 net/openvswitch/vport-vxlan.c    |  89 +++++++++++++++-
 8 files changed, 435 insertions(+), 133 deletions(-)

-- 
1.9.3

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/6] vxlan: Allow for VXLAN extensions to be implemented
  2015-01-07  2:05 [PATCH 0/6 net-next] VXLAN Group Policy Extension Thomas Graf
@ 2015-01-07  2:05 ` Thomas Graf
  2015-01-07  3:46   ` Tom Herbert
  2015-01-07  2:05 ` [PATCH 2/6] vxlan: Group Policy extension Thomas Graf
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: Thomas Graf @ 2015-01-07  2:05 UTC (permalink / raw)
  To: davem, jesse, stephen, pshelar; +Cc: netdev, dev

The VXLAN receive code is currently conservative in what it accepts and
will reject any frame that uses any of the reserved VXLAN protocol fields.
The VXLAN draft specifies that "reserved fields MUST be set to zero on
transmit and ignored on receive.".

Retain the current conservative parsing behaviour by default but allows
these fields to be used by VXLAN extensions which are explicitly enabled
on the VXLAN socket respectively VXLAN net_device.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 drivers/net/vxlan.c | 29 +++++++++++++++++++----------
 include/net/vxlan.h | 32 +++++++++++++++++++++++++++++---
 2 files changed, 48 insertions(+), 13 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 2ab0922..4d52aa9 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -65,7 +65,7 @@
 #define VXLAN_VID_MASK	(VXLAN_N_VID - 1)
 #define VXLAN_HLEN (sizeof(struct udphdr) + sizeof(struct vxlanhdr))
 
-#define VXLAN_FLAGS 0x08000000	/* struct vxlanhdr.vx_flags required value. */
+#define VXLAN_FLAGS 0x08000000 /* struct vxlanhdr.vx_flags default value. */
 
 /* UDP port for VXLAN traffic.
  * The IANA assigned port is 4789, but the Linux default is 8472
@@ -1100,22 +1100,28 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 	if (!pskb_may_pull(skb, VXLAN_HLEN))
 		goto error;
 
+	vs = rcu_dereference_sk_user_data(sk);
+	if (!vs)
+		goto drop;
+
 	/* Return packets with reserved bits set */
 	vxh = (struct vxlanhdr *)(udp_hdr(skb) + 1);
-	if (vxh->vx_flags != htonl(VXLAN_FLAGS) ||
-	    (vxh->vx_vni & htonl(0xff))) {
-		netdev_dbg(skb->dev, "invalid vxlan flags=%#x vni=%#x\n",
-			   ntohl(vxh->vx_flags), ntohl(vxh->vx_vni));
-		goto error;
+
+	/* For backwards compatibility, only allow reserved fields to be
+	 * used by VXLAN extensions if explicitly requested.
+	 */
+	if (vs->exts) {
+		if (!vxh->vni_present)
+			goto error_invalid_header;
+	} else {
+		if (vxh->vx_flags != htonl(VXLAN_FLAGS) ||
+		    (vxh->vx_vni & htonl(0xff)))
+			goto error_invalid_header;
 	}
 
 	if (iptunnel_pull_header(skb, VXLAN_HLEN, htons(ETH_P_TEB)))
 		goto drop;
 
-	vs = rcu_dereference_sk_user_data(sk);
-	if (!vs)
-		goto drop;
-
 	vs->rcv(vs, skb, vxh->vx_vni);
 	return 0;
 
@@ -1124,6 +1130,9 @@ drop:
 	kfree_skb(skb);
 	return 0;
 
+error_invalid_header:
+	netdev_dbg(skb->dev, "invalid vxlan flags=%#x vni=%#x\n",
+		   ntohl(vxh->vx_flags), ntohl(vxh->vx_vni));
 error:
 	/* Return non vxlan pkt */
 	return 1;
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 903461a..3e98d31 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -11,10 +11,35 @@
 #define VNI_HASH_BITS	10
 #define VNI_HASH_SIZE	(1<<VNI_HASH_BITS)
 
-/* VXLAN protocol header */
+/* VXLAN protocol header:
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |R|R|R|R|I|R|R|R|               Reserved                        |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |                VXLAN Network Identifier (VNI) |   Reserved    |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ * I = 1	VXLAN Network Identifier (VNI) present
+ */
 struct vxlanhdr {
-	__be32 vx_flags;
-	__be32 vx_vni;
+	union {
+		struct {
+#ifdef __LITTLE_ENDIAN_BITFIELD
+			__u8	reserved_flags1:3,
+				vni_present:1,
+				reserved_flags2:4;
+#elif defined(__BIG_ENDIAN_BITFIELD)
+			__u8	reserved_flags2:4,
+				vni_present:1,
+				reserved_flags1:3;
+#else
+#error	"Please fix <asm/byteorder.h>"
+#endif
+			__u8	vx_reserved1;
+			__be16	vx_reserved2;
+		};
+		__be32 vx_flags;
+	};
+	__be32	vx_vni;
 };
 
 struct vxlan_sock;
@@ -25,6 +50,7 @@ struct vxlan_sock {
 	struct hlist_node hlist;
 	vxlan_rcv_t	 *rcv;
 	void		 *data;
+	u32		  exts;
 	struct work_struct del_work;
 	struct socket	 *sock;
 	struct rcu_head	  rcu;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/6] vxlan: Group Policy extension
  2015-01-07  2:05 [PATCH 0/6 net-next] VXLAN Group Policy Extension Thomas Graf
  2015-01-07  2:05 ` [PATCH 1/6] vxlan: Allow for VXLAN extensions to be implemented Thomas Graf
@ 2015-01-07  2:05 ` Thomas Graf
  2015-01-07 16:05   ` Tom Herbert
  2015-01-07  2:05 ` [PATCH 3/6] vxlan: Only bind to sockets with correct extensions enabled Thomas Graf
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: Thomas Graf @ 2015-01-07  2:05 UTC (permalink / raw)
  To: davem, jesse, stephen, pshelar; +Cc: netdev, dev

Implements supports for the Group Policy VXLAN extension [0] to provide
a lightweight and simple security label mechanism across network peers
based on VXLAN. The security context and associated metadata is mapped
to/from skb->mark. This allows further mapping to a SELinux context
using SECMARK, to implement ACLs directly with nftables, iptables, OVS,
tc, etc.

The group membership is defined by the lower 16 bits of skb->mark, the
upper 16 bits are used for flags.

SELinux allows to manage label to secure local resources. However,
distributed applications require ACLs to implemented across hosts. This
is typically achieved by matching on L2-L4 fields to identify the
original sending host and process on the receiver. On top of that,
netlabel and specifically CIPSO [1] allow to map security contexts to
universal labels.  However, netlabel and CIPSO are relatively complex.
This patch provides a lightweight alternative for overlay network
environments with a trusted underlay. No additional control protocol
is required.

           Host 1:                       Host 2:

      Group A        Group B        Group B     Group A
      +-----+   +-------------+    +-------+   +-----+
      | lxc |   | SELinux CTX |    | httpd |   | VM  |
      +--+--+   +--+----------+    +---+---+   +--+--+
	  \---+---/                     \----+---/
	      |                              |
	  +---+---+                      +---+---+
	  | vxlan |                      | vxlan |
	  +---+---+                      +---+---+
	      +------------------------------+

Backwards compatibility:
A VXLAN-GBP socket can receive standard VXLAN frames and will assign
the default group 0x0000 to such frames. A Linux VXLAN socket will
drop VXLAN-GBP  frames. The extension is therefore disabled by default
and needs to be specifically enabled:

   ip link add [...] type vxlan [...] gbp

In a mixed environment with VXLAN and VXLAN-GBP sockets, the GBP socket
must run on a separate port number.

Examples:
  iptables:
  $ iptables -I OUTPUT -p icmp -j MARK --set-mark 0x200
  $ iptables -I INPUT -i br0 -m mark --mark 0x200 -j ACCEPT

  OVS (patches provided separately):
  in_port=1, actions=load:0x200->NXM_NX_TUN_GBP_ID[],NORMAL

[0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy
[1] http://lwn.net/Articles/204905/

Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 drivers/net/vxlan.c           | 155 ++++++++++++++++++++++++++++++------------
 include/net/vxlan.h           |  80 ++++++++++++++++++++--
 include/uapi/linux/if_link.h  |   8 +++
 net/openvswitch/vport-vxlan.c |   9 ++-
 4 files changed, 197 insertions(+), 55 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 4d52aa9..30b7b59 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -132,6 +132,7 @@ struct vxlan_dev {
 	__u8		  tos;		/* TOS override */
 	__u8		  ttl;
 	u32		  flags;	/* VXLAN_F_* in vxlan.h */
+	u32		  exts;		/* Enabled extensions */
 
 	struct work_struct sock_work;
 	struct work_struct igmp_join;
@@ -568,7 +569,8 @@ static struct sk_buff **vxlan_gro_receive(struct sk_buff **head, struct sk_buff
 			continue;
 
 		vh2 = (struct vxlanhdr *)(p->data + off_vx);
-		if (vh->vx_vni != vh2->vx_vni) {
+		if (vh->vx_flags != vh2->vx_flags ||
+		    vh->vx_vni != vh2->vx_vni) {
 			NAPI_GRO_CB(p)->same_flow = 0;
 			continue;
 		}
@@ -1095,6 +1097,7 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 {
 	struct vxlan_sock *vs;
 	struct vxlanhdr *vxh;
+	struct vxlan_metadata md = {0};
 
 	/* Need Vxlan and inner Ethernet header to be present */
 	if (!pskb_may_pull(skb, VXLAN_HLEN))
@@ -1113,6 +1116,19 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 	if (vs->exts) {
 		if (!vxh->vni_present)
 			goto error_invalid_header;
+
+		if (vxh->gbp_present) {
+			if (!(vs->exts & VXLAN_EXT_GBP))
+				goto error_invalid_header;
+
+			md.gbp = ntohs(vxh->gbp.policy_id);
+
+			if (vxh->gbp.dont_learn)
+				md.gbp |= VXLAN_GBP_DONT_LEARN;
+
+			if (vxh->gbp.policy_applied)
+				md.gbp |= VXLAN_GBP_POLICY_APPLIED;
+		}
 	} else {
 		if (vxh->vx_flags != htonl(VXLAN_FLAGS) ||
 		    (vxh->vx_vni & htonl(0xff)))
@@ -1122,7 +1138,8 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 	if (iptunnel_pull_header(skb, VXLAN_HLEN, htons(ETH_P_TEB)))
 		goto drop;
 
-	vs->rcv(vs, skb, vxh->vx_vni);
+	md.vni = vxh->vx_vni;
+	vs->rcv(vs, skb, &md);
 	return 0;
 
 drop:
@@ -1138,8 +1155,8 @@ error:
 	return 1;
 }
 
-static void vxlan_rcv(struct vxlan_sock *vs,
-		      struct sk_buff *skb, __be32 vx_vni)
+static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
+		      struct vxlan_metadata *md)
 {
 	struct iphdr *oip = NULL;
 	struct ipv6hdr *oip6 = NULL;
@@ -1150,7 +1167,7 @@ static void vxlan_rcv(struct vxlan_sock *vs,
 	int err = 0;
 	union vxlan_addr *remote_ip;
 
-	vni = ntohl(vx_vni) >> 8;
+	vni = ntohl(md->vni) >> 8;
 	/* Is this VNI defined? */
 	vxlan = vxlan_vs_find_vni(vs, vni);
 	if (!vxlan)
@@ -1184,6 +1201,7 @@ static void vxlan_rcv(struct vxlan_sock *vs,
 		goto drop;
 
 	skb_reset_network_header(skb);
+	skb->mark = md->gbp;
 
 	if (oip6)
 		err = IP6_ECN_decapsulate(oip6, skb);
@@ -1533,15 +1551,54 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
 	return false;
 }
 
+static int vxlan_build_hdr(struct sk_buff *skb, struct vxlan_sock *vs,
+			   int min_headroom, struct vxlan_metadata *md)
+{
+	struct vxlanhdr *vxh;
+	int err;
+
+	/* Need space for new headers (invalidates iph ptr) */
+	err = skb_cow_head(skb, min_headroom);
+	if (unlikely(err)) {
+		kfree_skb(skb);
+		return err;
+	}
+
+	skb = vlan_hwaccel_push_inside(skb);
+	if (WARN_ON(!skb))
+		return -ENOMEM;
+
+	vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
+	vxh->vx_flags = htonl(VXLAN_FLAGS);
+	vxh->vx_vni = md->vni;
+
+	if (vs->exts)  {
+		if (vs->exts & VXLAN_EXT_GBP) {
+			vxh->gbp_present = 1;
+
+			if (md->gbp & VXLAN_GBP_DONT_LEARN)
+				vxh->gbp.dont_learn = 1;
+
+			if (md->gbp & VXLAN_GBP_POLICY_APPLIED)
+				vxh->gbp.policy_applied = 1;
+
+			vxh->gbp.policy_id = htons(md->gbp & VXLAN_GBP_ID_MASK);
+		}
+	}
+
+	skb_set_inner_protocol(skb, htons(ETH_P_TEB));
+
+	return 0;
+}
+
 #if IS_ENABLED(CONFIG_IPV6)
 static int vxlan6_xmit_skb(struct vxlan_sock *vs,
 			   struct dst_entry *dst, struct sk_buff *skb,
 			   struct net_device *dev, struct in6_addr *saddr,
 			   struct in6_addr *daddr, __u8 prio, __u8 ttl,
-			   __be16 src_port, __be16 dst_port, __be32 vni,
-			   bool xnet)
+			   __be16 src_port, __be16 dst_port,
+			   struct vxlan_metadata *md, bool xnet)
 {
-	struct vxlanhdr *vxh;
 	int min_headroom;
 	int err;
 	bool udp_sum = !udp_get_no_check6_tx(vs->sock->sk);
@@ -1558,24 +1615,9 @@ static int vxlan6_xmit_skb(struct vxlan_sock *vs,
 			+ VXLAN_HLEN + sizeof(struct ipv6hdr)
 			+ (vlan_tx_tag_present(skb) ? VLAN_HLEN : 0);
 
-	/* Need space for new headers (invalidates iph ptr) */
-	err = skb_cow_head(skb, min_headroom);
-	if (unlikely(err)) {
-		kfree_skb(skb);
-		goto err;
-	}
-
-	skb = vlan_hwaccel_push_inside(skb);
-	if (WARN_ON(!skb)) {
-		err = -ENOMEM;
+	err = vxlan_build_hdr(skb, vs, min_headroom, md);
+	if (err)
 		goto err;
-	}
-
-	vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
-	vxh->vx_flags = htonl(VXLAN_FLAGS);
-	vxh->vx_vni = vni;
-
-	skb_set_inner_protocol(skb, htons(ETH_P_TEB));
 
 	udp_tunnel6_xmit_skb(vs->sock, dst, skb, dev, saddr, daddr, prio,
 			     ttl, src_port, dst_port);
@@ -1589,9 +1631,9 @@ err:
 int vxlan_xmit_skb(struct vxlan_sock *vs,
 		   struct rtable *rt, struct sk_buff *skb,
 		   __be32 src, __be32 dst, __u8 tos, __u8 ttl, __be16 df,
-		   __be16 src_port, __be16 dst_port, __be32 vni, bool xnet)
+		   __be16 src_port, __be16 dst_port,
+		   struct vxlan_metadata *md, bool xnet)
 {
-	struct vxlanhdr *vxh;
 	int min_headroom;
 	int err;
 	bool udp_sum = !vs->sock->sk->sk_no_check_tx;
@@ -1604,22 +1646,9 @@ int vxlan_xmit_skb(struct vxlan_sock *vs,
 			+ VXLAN_HLEN + sizeof(struct iphdr)
 			+ (vlan_tx_tag_present(skb) ? VLAN_HLEN : 0);
 
-	/* Need space for new headers (invalidates iph ptr) */
-	err = skb_cow_head(skb, min_headroom);
-	if (unlikely(err)) {
-		kfree_skb(skb);
+	err = vxlan_build_hdr(skb, vs, min_headroom, md);
+	if (err)
 		return err;
-	}
-
-	skb = vlan_hwaccel_push_inside(skb);
-	if (WARN_ON(!skb))
-		return -ENOMEM;
-
-	vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
-	vxh->vx_flags = htonl(VXLAN_FLAGS);
-	vxh->vx_vni = vni;
-
-	skb_set_inner_protocol(skb, htons(ETH_P_TEB));
 
 	return udp_tunnel_xmit_skb(vs->sock, rt, skb, src, dst, tos,
 				   ttl, df, src_port, dst_port, xnet);
@@ -1679,6 +1708,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 	const struct iphdr *old_iph;
 	struct flowi4 fl4;
 	union vxlan_addr *dst;
+	struct vxlan_metadata md;
 	__be16 src_port = 0, dst_port;
 	u32 vni;
 	__be16 df = 0;
@@ -1749,11 +1779,12 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 
 		tos = ip_tunnel_ecn_encap(tos, old_iph, skb);
 		ttl = ttl ? : ip4_dst_hoplimit(&rt->dst);
+		md.vni = htonl(vni << 8);
+		md.gbp = skb->mark;
 
 		err = vxlan_xmit_skb(vxlan->vn_sock, rt, skb,
 				     fl4.saddr, dst->sin.sin_addr.s_addr,
-				     tos, ttl, df, src_port, dst_port,
-				     htonl(vni << 8),
+				     tos, ttl, df, src_port, dst_port, &md,
 				     !net_eq(vxlan->net, dev_net(vxlan->dev)));
 		if (err < 0) {
 			/* skb is already freed. */
@@ -1806,10 +1837,12 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 		}
 
 		ttl = ttl ? : ip6_dst_hoplimit(ndst);
+		md.vni = htonl(vni << 8);
+		md.gbp = skb->mark;
 
 		err = vxlan6_xmit_skb(vxlan->vn_sock, ndst, skb,
 				      dev, &fl6.saddr, &fl6.daddr, 0, ttl,
-				      src_port, dst_port, htonl(vni << 8),
+				      src_port, dst_port, &md,
 				      !net_eq(vxlan->net, dev_net(vxlan->dev)));
 #endif
 	}
@@ -2210,6 +2243,11 @@ static const struct nla_policy vxlan_policy[IFLA_VXLAN_MAX + 1] = {
 	[IFLA_VXLAN_UDP_CSUM]	= { .type = NLA_U8 },
 	[IFLA_VXLAN_UDP_ZERO_CSUM6_TX]	= { .type = NLA_U8 },
 	[IFLA_VXLAN_UDP_ZERO_CSUM6_RX]	= { .type = NLA_U8 },
+	[IFLA_VXLAN_EXTENSION]	= { .type = NLA_NESTED },
+};
+
+static const struct nla_policy vxlan_ext_policy[IFLA_VXLAN_EXT_MAX + 1] = {
+	[IFLA_VXLAN_EXT_GBP]	= { .type = NLA_FLAG, },
 };
 
 static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
@@ -2246,6 +2284,18 @@ static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
 		}
 	}
 
+	if (data[IFLA_VXLAN_EXTENSION]) {
+		int err;
+
+		err = nla_validate_nested(data[IFLA_VXLAN_EXTENSION],
+					  IFLA_VXLAN_EXT_MAX, vxlan_ext_policy);
+		if (err < 0) {
+			pr_debug("invalid VXLAN extension configuration: %d\n",
+				 err);
+			return -EINVAL;
+		}
+	}
+
 	return 0;
 }
 
@@ -2400,6 +2450,18 @@ static void vxlan_sock_work(struct work_struct *work)
 	dev_put(vxlan->dev);
 }
 
+static void configure_vxlan_exts(struct vxlan_dev *vxlan, struct nlattr *attr)
+{
+	struct nlattr *exts[IFLA_VXLAN_EXT_MAX+1];
+
+	/* Validated in vxlan_validate() */
+	if (nla_parse_nested(exts, IFLA_VXLAN_EXT_MAX, attr, NULL) < 0)
+		BUG();
+
+	if (exts[IFLA_VXLAN_EXT_GBP])
+		vxlan->exts |= VXLAN_EXT_GBP;
+}
+
 static int vxlan_newlink(struct net *net, struct net_device *dev,
 			 struct nlattr *tb[], struct nlattr *data[])
 {
@@ -2525,6 +2587,9 @@ static int vxlan_newlink(struct net *net, struct net_device *dev,
 	    nla_get_u8(data[IFLA_VXLAN_UDP_ZERO_CSUM6_RX]))
 		vxlan->flags |= VXLAN_F_UDP_ZERO_CSUM6_RX;
 
+	if (data[IFLA_VXLAN_EXTENSION])
+		configure_vxlan_exts(vxlan, data[IFLA_VXLAN_EXTENSION]);
+
 	if (vxlan_find_vni(net, vni, use_ipv6 ? AF_INET6 : AF_INET,
 			   vxlan->dst_port)) {
 		pr_info("duplicate VNI %u\n", vni);
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 3e98d31..66000d0 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -11,13 +11,60 @@
 #define VNI_HASH_BITS	10
 #define VNI_HASH_SIZE	(1<<VNI_HASH_BITS)
 
+/*
+ * VXLAN Group Based Policy Extension:
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |1|-|-|-|1|-|-|-|R|D|R|R|A|R|R|R|        Group Policy ID        |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |                VXLAN Network Identifier (VNI) |   Reserved    |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ * D = Don't Learn bit. When set, this bit indicates that the egress
+ *     VTEP MUST NOT learn the source address of the encapsulated frame.
+ *
+ * A = Indicates that the group policy has already been applied to
+ *     this packet. Policies MUST NOT be applied by devices when the
+ *     A bit is set.
+ *
+ * [0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy
+ */
+struct vxlan_gbp {
+#ifdef __LITTLE_ENDIAN_BITFIELD
+	__u8	reserved_flags1:3,
+		policy_applied:1,
+		reserved_flags2:2,
+		dont_learn:1,
+		reserved_flags3:1;
+#elif defined(__BIG_ENDIAN_BITFIELD)
+	__u8	reserved_flags1:1,
+		dont_learn:1,
+		reserved_flags2:2,
+		policy_applied:1,
+		reserved_flags3:3;
+#else
+#error	"Please fix <asm/byteorder.h>"
+#endif
+	__be16 policy_id;
+} __packed;
+
+/* skb->mark mapping
+ *
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |R|R|R|R|R|R|R|R|R|D|R|R|A|R|R|R|        Group Policy ID        |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ */
+#define VXLAN_GBP_DONT_LEARN		(BIT(6) << 16)
+#define VXLAN_GBP_POLICY_APPLIED	(BIT(3) << 16)
+#define VXLAN_GBP_ID_MASK		(0xFFFF)
+
 /* VXLAN protocol header:
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- * |R|R|R|R|I|R|R|R|               Reserved                        |
+ * |G|R|R|R|I|R|R|R|               Reserved                        |
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  * |                VXLAN Network Identifier (VNI) |   Reserved    |
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  *
+ * G = 1	Group Policy (VXLAN-GBP)
  * I = 1	VXLAN Network Identifier (VNI) present
  */
 struct vxlanhdr {
@@ -26,24 +73,42 @@ struct vxlanhdr {
 #ifdef __LITTLE_ENDIAN_BITFIELD
 			__u8	reserved_flags1:3,
 				vni_present:1,
-				reserved_flags2:4;
+				reserved_flags2:3,
+				gbp_present:1;
 #elif defined(__BIG_ENDIAN_BITFIELD)
-			__u8	reserved_flags2:4,
+			__u8	gbp_present:1,
+				reserved_flags2:3,
 				vni_present:1,
 				reserved_flags1:3;
 #else
 #error	"Please fix <asm/byteorder.h>"
 #endif
-			__u8	vx_reserved1;
-			__be16	vx_reserved2;
+			union {
+				/* NOTE: Offset 0 will be 1 byte aligned, so
+				 * all member structs must be marked packed.
+				 */
+				struct vxlan_gbp gbp;
+				struct {
+					__u8	vx_reserved1;
+					__be16	vx_reserved2;
+				} __packed;
+			};
 		};
 		__be32 vx_flags;
 	};
 	__be32	vx_vni;
 };
 
+struct vxlan_metadata {
+	__be32		vni;
+	u32		gbp;
+};
+
 struct vxlan_sock;
-typedef void (vxlan_rcv_t)(struct vxlan_sock *vh, struct sk_buff *skb, __be32 key);
+typedef void (vxlan_rcv_t)(struct vxlan_sock *vh, struct sk_buff *skb,
+			   struct vxlan_metadata *md);
+
+#define VXLAN_EXT_GBP			BIT(0)
 
 /* per UDP socket information */
 struct vxlan_sock {
@@ -78,7 +143,8 @@ void vxlan_sock_release(struct vxlan_sock *vs);
 int vxlan_xmit_skb(struct vxlan_sock *vs,
 		   struct rtable *rt, struct sk_buff *skb,
 		   __be32 src, __be32 dst, __u8 tos, __u8 ttl, __be16 df,
-		   __be16 src_port, __be16 dst_port, __be32 vni, bool xnet);
+		   __be16 src_port, __be16 dst_port, struct vxlan_metadata *md,
+		   bool xnet);
 
 static inline netdev_features_t vxlan_features_check(struct sk_buff *skb,
 						     netdev_features_t features)
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index f7d0d2d..9f07bf5 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -370,10 +370,18 @@ enum {
 	IFLA_VXLAN_UDP_CSUM,
 	IFLA_VXLAN_UDP_ZERO_CSUM6_TX,
 	IFLA_VXLAN_UDP_ZERO_CSUM6_RX,
+	IFLA_VXLAN_EXTENSION,
 	__IFLA_VXLAN_MAX
 };
 #define IFLA_VXLAN_MAX	(__IFLA_VXLAN_MAX - 1)
 
+enum {
+	IFLA_VXLAN_EXT_UNSPEC,
+	IFLA_VXLAN_EXT_GBP,
+	__IFLA_VXLAN_EXT_MAX,
+};
+#define IFLA_VXLAN_EXT_MAX (__IFLA_VXLAN_EXT_MAX - 1)
+
 struct ifla_vxlan_port_range {
 	__be16	low;
 	__be16	high;
diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c
index d7c46b3..dd68c97 100644
--- a/net/openvswitch/vport-vxlan.c
+++ b/net/openvswitch/vport-vxlan.c
@@ -59,7 +59,8 @@ static inline struct vxlan_port *vxlan_vport(const struct vport *vport)
 }
 
 /* Called with rcu_read_lock and BH disabled. */
-static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb, __be32 vx_vni)
+static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
+		      struct vxlan_metadata *md)
 {
 	struct ovs_tunnel_info tun_info;
 	struct vport *vport = vs->data;
@@ -68,7 +69,7 @@ static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb, __be32 vx_vni)
 
 	/* Save outer tunnel values */
 	iph = ip_hdr(skb);
-	key = cpu_to_be64(ntohl(vx_vni) >> 8);
+	key = cpu_to_be64(ntohl(md->vni) >> 8);
 	ovs_flow_tun_info_init(&tun_info, iph,
 			       udp_hdr(skb)->source, udp_hdr(skb)->dest,
 			       key, TUNNEL_KEY, NULL, 0);
@@ -146,6 +147,7 @@ static int vxlan_tnl_send(struct vport *vport, struct sk_buff *skb)
 	struct vxlan_port *vxlan_port = vxlan_vport(vport);
 	__be16 dst_port = inet_sk(vxlan_port->vs->sock->sk)->inet_sport;
 	struct ovs_key_ipv4_tunnel *tun_key;
+	struct vxlan_metadata md;
 	struct rtable *rt;
 	struct flowi4 fl;
 	__be16 src_port;
@@ -178,12 +180,13 @@ static int vxlan_tnl_send(struct vport *vport, struct sk_buff *skb)
 	skb->ignore_df = 1;
 
 	src_port = udp_flow_src_port(net, skb, 0, 0, true);
+	md.vni = htonl(be64_to_cpu(tun_key->tun_id) << 8);
 
 	err = vxlan_xmit_skb(vxlan_port->vs, rt, skb,
 			     fl.saddr, tun_key->ipv4_dst,
 			     tun_key->ipv4_tos, tun_key->ipv4_ttl, df,
 			     src_port, dst_port,
-			     htonl(be64_to_cpu(tun_key->tun_id) << 8),
+			     &md,
 			     false);
 	if (err < 0)
 		ip_rt_put(rt);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 3/6] vxlan: Only bind to sockets with correct extensions enabled
  2015-01-07  2:05 [PATCH 0/6 net-next] VXLAN Group Policy Extension Thomas Graf
  2015-01-07  2:05 ` [PATCH 1/6] vxlan: Allow for VXLAN extensions to be implemented Thomas Graf
  2015-01-07  2:05 ` [PATCH 2/6] vxlan: Group Policy extension Thomas Graf
@ 2015-01-07  2:05 ` Thomas Graf
  2015-01-07 22:45   ` Jesse Gross
  2015-01-07  2:05 ` [PATCH 4/6] vxlan: Fail build if VXLAN header is misdefined Thomas Graf
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: Thomas Graf @ 2015-01-07  2:05 UTC (permalink / raw)
  To: davem, jesse, stephen, pshelar; +Cc: netdev, dev

A VXLAN net_device looking for an appropriate socket may only
consider a socket which has the exact set of extensions enabled.
If none can be found, a new socket must be created.

The OVS VXLAN port is kept unaware of extensions at this point.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 drivers/net/vxlan.c           | 35 +++++++++++++++++++++--------------
 include/net/vxlan.h           |  2 +-
 net/openvswitch/vport-vxlan.c |  2 +-
 3 files changed, 23 insertions(+), 16 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 30b7b59..2b75c62 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -271,14 +271,15 @@ static inline struct vxlan_rdst *first_remote_rtnl(struct vxlan_fdb *fdb)
 }
 
 /* Find VXLAN socket based on network namespace, address family and UDP port */
-static struct vxlan_sock *vxlan_find_sock(struct net *net,
-					  sa_family_t family, __be16 port)
+static struct vxlan_sock *vxlan_find_sock(struct net *net, sa_family_t family,
+					  __be16 port, u32 exts)
 {
 	struct vxlan_sock *vs;
 
 	hlist_for_each_entry_rcu(vs, vs_head(net, port), hlist) {
 		if (inet_sk(vs->sock->sk)->inet_sport == port &&
-		    inet_sk(vs->sock->sk)->sk.sk_family == family)
+		    inet_sk(vs->sock->sk)->sk.sk_family == family &&
+		    vs->exts == exts)
 			return vs;
 	}
 	return NULL;
@@ -298,11 +299,12 @@ static struct vxlan_dev *vxlan_vs_find_vni(struct vxlan_sock *vs, u32 id)
 
 /* Look up VNI in a per net namespace table */
 static struct vxlan_dev *vxlan_find_vni(struct net *net, u32 id,
-					sa_family_t family, __be16 port)
+					sa_family_t family, __be16 port,
+					u32 exts)
 {
 	struct vxlan_sock *vs;
 
-	vs = vxlan_find_sock(net, family, port);
+	vs = vxlan_find_sock(net, family, port, exts);
 	if (!vs)
 		return NULL;
 
@@ -1770,7 +1772,8 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 
 			ip_rt_put(rt);
 			dst_vxlan = vxlan_find_vni(vxlan->net, vni,
-						   dst->sa.sa_family, dst_port);
+						   dst->sa.sa_family, dst_port,
+						   vxlan->exts);
 			if (!dst_vxlan)
 				goto tx_error;
 			vxlan_encap_bypass(skb, vxlan, dst_vxlan);
@@ -1829,7 +1832,8 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 
 			dst_release(ndst);
 			dst_vxlan = vxlan_find_vni(vxlan->net, vni,
-						   dst->sa.sa_family, dst_port);
+						   dst->sa.sa_family, dst_port,
+						   vxlan->exts);
 			if (!dst_vxlan)
 				goto tx_error;
 			vxlan_encap_bypass(skb, vxlan, dst_vxlan);
@@ -1999,7 +2003,7 @@ static int vxlan_init(struct net_device *dev)
 
 	spin_lock(&vn->sock_lock);
 	vs = vxlan_find_sock(vxlan->net, ipv6 ? AF_INET6 : AF_INET,
-			     vxlan->dst_port);
+			     vxlan->dst_port, vxlan->exts);
 	if (vs && atomic_add_unless(&vs->refcnt, 1, 0)) {
 		/* If we have a socket with same port already, reuse it */
 		vxlan_vs_add_dev(vs, vxlan);
@@ -2353,7 +2357,7 @@ static struct socket *vxlan_create_sock(struct net *net, bool ipv6,
 /* Create new listen socket if needed */
 static struct vxlan_sock *vxlan_socket_create(struct net *net, __be16 port,
 					      vxlan_rcv_t *rcv, void *data,
-					      u32 flags)
+					      u32 flags, u32 exts)
 {
 	struct vxlan_net *vn = net_generic(net, vxlan_net_id);
 	struct vxlan_sock *vs;
@@ -2381,6 +2385,7 @@ static struct vxlan_sock *vxlan_socket_create(struct net *net, __be16 port,
 	atomic_set(&vs->refcnt, 1);
 	vs->rcv = rcv;
 	vs->data = data;
+	vs->exts = exts;
 
 	/* Initialize the vxlan udp offloads structure */
 	vs->udp_offloads.port = port;
@@ -2405,13 +2410,14 @@ static struct vxlan_sock *vxlan_socket_create(struct net *net, __be16 port,
 
 struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
 				  vxlan_rcv_t *rcv, void *data,
-				  bool no_share, u32 flags)
+				  bool no_share, u32 flags,
+				  u32 exts)
 {
 	struct vxlan_net *vn = net_generic(net, vxlan_net_id);
 	struct vxlan_sock *vs;
 	bool ipv6 = flags & VXLAN_F_IPV6;
 
-	vs = vxlan_socket_create(net, port, rcv, data, flags);
+	vs = vxlan_socket_create(net, port, rcv, data, flags, exts);
 	if (!IS_ERR(vs))
 		return vs;
 
@@ -2419,7 +2425,7 @@ struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
 		return vs;
 
 	spin_lock(&vn->sock_lock);
-	vs = vxlan_find_sock(net, ipv6 ? AF_INET6 : AF_INET, port);
+	vs = vxlan_find_sock(net, ipv6 ? AF_INET6 : AF_INET, port, exts);
 	if (vs && ((vs->rcv != rcv) ||
 		   !atomic_add_unless(&vs->refcnt, 1, 0)))
 			vs = ERR_PTR(-EBUSY);
@@ -2441,7 +2447,8 @@ static void vxlan_sock_work(struct work_struct *work)
 	__be16 port = vxlan->dst_port;
 	struct vxlan_sock *nvs;
 
-	nvs = vxlan_sock_add(net, port, vxlan_rcv, NULL, false, vxlan->flags);
+	nvs = vxlan_sock_add(net, port, vxlan_rcv, NULL, false, vxlan->flags,
+			     vxlan->exts);
 	spin_lock(&vn->sock_lock);
 	if (!IS_ERR(nvs))
 		vxlan_vs_add_dev(nvs, vxlan);
@@ -2591,7 +2598,7 @@ static int vxlan_newlink(struct net *net, struct net_device *dev,
 		configure_vxlan_exts(vxlan, data[IFLA_VXLAN_EXTENSION]);
 
 	if (vxlan_find_vni(net, vni, use_ipv6 ? AF_INET6 : AF_INET,
-			   vxlan->dst_port)) {
+			   vxlan->dst_port, vxlan->exts)) {
 		pr_info("duplicate VNI %u\n", vni);
 		return -EEXIST;
 	}
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 66000d0..da257a7 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -136,7 +136,7 @@ struct vxlan_sock {
 
 struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
 				  vxlan_rcv_t *rcv, void *data,
-				  bool no_share, u32 flags);
+				  bool no_share, u32 flags, u32 exts);
 
 void vxlan_sock_release(struct vxlan_sock *vs);
 
diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c
index dd68c97..266c595 100644
--- a/net/openvswitch/vport-vxlan.c
+++ b/net/openvswitch/vport-vxlan.c
@@ -128,7 +128,7 @@ static struct vport *vxlan_tnl_create(const struct vport_parms *parms)
 	vxlan_port = vxlan_vport(vport);
 	strncpy(vxlan_port->name, parms->name, IFNAMSIZ);
 
-	vs = vxlan_sock_add(net, htons(dst_port), vxlan_rcv, vport, true, 0);
+	vs = vxlan_sock_add(net, htons(dst_port), vxlan_rcv, vport, true, 0, 0);
 	if (IS_ERR(vs)) {
 		ovs_vport_free(vport);
 		return (void *)vs;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 4/6] vxlan: Fail build if VXLAN header is misdefined
  2015-01-07  2:05 [PATCH 0/6 net-next] VXLAN Group Policy Extension Thomas Graf
                   ` (2 preceding siblings ...)
  2015-01-07  2:05 ` [PATCH 3/6] vxlan: Only bind to sockets with correct extensions enabled Thomas Graf
@ 2015-01-07  2:05 ` Thomas Graf
  2015-01-07  2:05 ` [PATCH 5/6] openvswitch: Rename GENEVE_TUN_OPTS() to TUN_METADATA_OPTS() Thomas Graf
  2015-01-07  2:05 ` [PATCH 6/6] openvswitch: Support VXLAN Group Policy extension Thomas Graf
  5 siblings, 0 replies; 26+ messages in thread
From: Thomas Graf @ 2015-01-07  2:05 UTC (permalink / raw)
  To: davem, jesse, stephen, pshelar; +Cc: netdev, dev

Due to the complexity of struct vxlanhdr, protect against unwanted
and undesired changes by failing the build if the size of the struct
changes.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 drivers/net/vxlan.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 2b75c62..293d524 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2842,6 +2842,8 @@ static int __init vxlan_init_module(void)
 {
 	int rc;
 
+	BUILD_BUG_ON(sizeof(struct vxlanhdr) != 8);
+
 	vxlan_wq = alloc_workqueue("vxlan", 0, 0);
 	if (!vxlan_wq)
 		return -ENOMEM;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 5/6] openvswitch: Rename GENEVE_TUN_OPTS() to TUN_METADATA_OPTS()
  2015-01-07  2:05 [PATCH 0/6 net-next] VXLAN Group Policy Extension Thomas Graf
                   ` (3 preceding siblings ...)
  2015-01-07  2:05 ` [PATCH 4/6] vxlan: Fail build if VXLAN header is misdefined Thomas Graf
@ 2015-01-07  2:05 ` Thomas Graf
  2015-01-07 22:46   ` Jesse Gross
  2015-01-07  2:05 ` [PATCH 6/6] openvswitch: Support VXLAN Group Policy extension Thomas Graf
  5 siblings, 1 reply; 26+ messages in thread
From: Thomas Graf @ 2015-01-07  2:05 UTC (permalink / raw)
  To: davem, jesse, stephen, pshelar; +Cc: netdev, dev

A subsequent patch will introduce VXLAN options. Rename the existing
GENEVE_TUN_OPTS() to reflect its extended purpose of carrying generic
tunnel metadata options.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 net/openvswitch/flow.c         |  2 +-
 net/openvswitch/flow.h         | 14 +++++++-------
 net/openvswitch/flow_netlink.c | 37 +++++++++++++++++--------------------
 3 files changed, 25 insertions(+), 28 deletions(-)

diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 70bef2a..bfc74ac 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -690,7 +690,7 @@ int ovs_flow_key_extract(const struct ovs_tunnel_info *tun_info,
 			BUILD_BUG_ON((1 << (sizeof(tun_info->options_len) *
 						   8)) - 1
 					> sizeof(key->tun_opts));
-			memcpy(GENEVE_OPTS(key, tun_info->options_len),
+			memcpy(TUN_METADATA_OPTS(key, tun_info->options_len),
 			       tun_info->options, tun_info->options_len);
 			key->tun_opts_len = tun_info->options_len;
 		} else {
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index a8b30f3..d3d0a40 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -53,7 +53,7 @@ struct ovs_key_ipv4_tunnel {
 
 struct ovs_tunnel_info {
 	struct ovs_key_ipv4_tunnel tunnel;
-	const struct geneve_opt *options;
+	const void *options;
 	u8 options_len;
 };
 
@@ -61,10 +61,10 @@ struct ovs_tunnel_info {
  * maximum size. This allows us to get the benefits of variable length
  * matching for small options.
  */
-#define GENEVE_OPTS(flow_key, opt_len)	\
-	((struct geneve_opt *)((flow_key)->tun_opts + \
-			       FIELD_SIZEOF(struct sw_flow_key, tun_opts) - \
-			       opt_len))
+#define TUN_METADATA_OFFSET(opt_len) \
+	(FIELD_SIZEOF(struct sw_flow_key, tun_opts) - opt_len)
+#define TUN_METADATA_OPTS(flow_key, opt_len) \
+	((void *)((flow_key)->tun_opts + TUN_METADATA_OFFSET(opt_len)))
 
 static inline void __ovs_flow_tun_info_init(struct ovs_tunnel_info *tun_info,
 					    __be32 saddr, __be32 daddr,
@@ -73,7 +73,7 @@ static inline void __ovs_flow_tun_info_init(struct ovs_tunnel_info *tun_info,
 					    __be16 tp_dst,
 					    __be64 tun_id,
 					    __be16 tun_flags,
-					    const struct geneve_opt *opts,
+					    const void *opts,
 					    u8 opts_len)
 {
 	tun_info->tunnel.tun_id = tun_id;
@@ -105,7 +105,7 @@ static inline void ovs_flow_tun_info_init(struct ovs_tunnel_info *tun_info,
 					  __be16 tp_dst,
 					  __be64 tun_id,
 					  __be16 tun_flags,
-					  const struct geneve_opt *opts,
+					  const void *opts,
 					  u8 opts_len)
 {
 	__ovs_flow_tun_info_init(tun_info, iph->saddr, iph->daddr,
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index d1eecf7..c60ae3f 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -387,20 +387,20 @@ static int parse_flow_nlattrs(const struct nlattr *attr,
 	return __parse_flow_nlattrs(attr, a, attrsp, log, false);
 }
 
-static int genev_tun_opt_from_nlattr(const struct nlattr *a,
-				     struct sw_flow_match *match, bool is_mask,
-				     bool log)
+static int tun_md_opt_from_nlattr(const struct nlattr *a,
+				  struct sw_flow_match *match, bool is_mask,
+				  bool log)
 {
 	unsigned long opt_key_offset;
 
 	if (nla_len(a) > sizeof(match->key->tun_opts)) {
-		OVS_NLERR(log, "Geneve option length err (len %d, max %zu).",
+		OVS_NLERR(log, "Tunnel metadata option length err (len %d, max %zu).",
 			  nla_len(a), sizeof(match->key->tun_opts));
 		return -EINVAL;
 	}
 
 	if (nla_len(a) % 4 != 0) {
-		OVS_NLERR(log, "Geneve opt len %d is not a multiple of 4.",
+		OVS_NLERR(log, "Tunnel metadata opt len %d is not a multiple of 4.",
 			  nla_len(a));
 		return -EINVAL;
 	}
@@ -424,7 +424,7 @@ static int genev_tun_opt_from_nlattr(const struct nlattr *a,
 		 * information later.
 		 */
 		if (match->key->tun_opts_len != nla_len(a)) {
-			OVS_NLERR(log, "Geneve option len %d != mask len %d",
+			OVS_NLERR(log, "Tunnel metadata option len %d != mask len %d",
 				  match->key->tun_opts_len, nla_len(a));
 			return -EINVAL;
 		}
@@ -432,8 +432,7 @@ static int genev_tun_opt_from_nlattr(const struct nlattr *a,
 		SW_FLOW_KEY_PUT(match, tun_opts_len, 0xff, true);
 	}
 
-	opt_key_offset = (unsigned long)GENEVE_OPTS((struct sw_flow_key *)0,
-						    nla_len(a));
+	opt_key_offset = TUN_METADATA_OFFSET(nla_len(a));
 	SW_FLOW_KEY_MEMCPY_OFFSET(match, opt_key_offset, nla_data(a),
 				  nla_len(a), is_mask);
 	return 0;
@@ -520,7 +519,7 @@ static int ipv4_tun_from_nlattr(const struct nlattr *attr,
 			tun_flags |= TUNNEL_OAM;
 			break;
 		case OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS:
-			err = genev_tun_opt_from_nlattr(a, match, is_mask, log);
+			err = tun_md_opt_from_nlattr(a, match, is_mask, log);
 			if (err)
 				return err;
 
@@ -558,8 +557,7 @@ static int ipv4_tun_from_nlattr(const struct nlattr *attr,
 
 static int __ipv4_tun_to_nlattr(struct sk_buff *skb,
 				const struct ovs_key_ipv4_tunnel *output,
-				const struct geneve_opt *tun_opts,
-				int swkey_tun_opts_len)
+				const void *tun_opts, int swkey_tun_opts_len)
 {
 	if (output->tun_flags & TUNNEL_KEY &&
 	    nla_put_be64(skb, OVS_TUNNEL_KEY_ATTR_ID, output->tun_id))
@@ -600,8 +598,7 @@ static int __ipv4_tun_to_nlattr(struct sk_buff *skb,
 
 static int ipv4_tun_to_nlattr(struct sk_buff *skb,
 			      const struct ovs_key_ipv4_tunnel *output,
-			      const struct geneve_opt *tun_opts,
-			      int swkey_tun_opts_len)
+			      const void *tun_opts, int swkey_tun_opts_len)
 {
 	struct nlattr *nla;
 	int err;
@@ -1148,10 +1145,10 @@ int ovs_nla_put_flow(const struct sw_flow_key *swkey,
 		goto nla_put_failure;
 
 	if ((swkey->tun_key.ipv4_dst || is_mask)) {
-		const struct geneve_opt *opts = NULL;
+		const void *opts = NULL;
 
 		if (output->tun_key.tun_flags & TUNNEL_OPTIONS_PRESENT)
-			opts = GENEVE_OPTS(output, swkey->tun_opts_len);
+			opts = TUN_METADATA_OPTS(output, swkey->tun_opts_len);
 
 		if (ipv4_tun_to_nlattr(skb, &output->tun_key, opts,
 				       swkey->tun_opts_len))
@@ -1555,11 +1552,11 @@ static int validate_and_copy_set_tun(const struct nlattr *attr,
 		return err;
 
 	if (key.tun_opts_len) {
-		struct geneve_opt *option = GENEVE_OPTS(&key,
-							key.tun_opts_len);
+		struct geneve_opt *option;
 		int opts_len = key.tun_opts_len;
 		bool crit_opt = false;
 
+		option = (struct geneve_opt *) TUN_METADATA_OPTS(&key, key.tun_opts_len);
 		while (opts_len > 0) {
 			int len;
 
@@ -1597,9 +1594,9 @@ static int validate_and_copy_set_tun(const struct nlattr *attr,
 		 * everything else will go away after flow setup. We can append
 		 * it to tun_info and then point there.
 		 */
-		memcpy((tun_info + 1), GENEVE_OPTS(&key, key.tun_opts_len),
-		       key.tun_opts_len);
-		tun_info->options = (struct geneve_opt *)(tun_info + 1);
+		memcpy((tun_info + 1), TUN_METADATA_OPTS(&key,
+		       key.tun_opts_len), key.tun_opts_len);
+		tun_info->options = (tun_info + 1);
 	} else {
 		tun_info->options = NULL;
 	}
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 6/6] openvswitch: Support VXLAN Group Policy extension
  2015-01-07  2:05 [PATCH 0/6 net-next] VXLAN Group Policy Extension Thomas Graf
                   ` (4 preceding siblings ...)
  2015-01-07  2:05 ` [PATCH 5/6] openvswitch: Rename GENEVE_TUN_OPTS() to TUN_METADATA_OPTS() Thomas Graf
@ 2015-01-07  2:05 ` Thomas Graf
  2015-01-07 22:46   ` Jesse Gross
  5 siblings, 1 reply; 26+ messages in thread
From: Thomas Graf @ 2015-01-07  2:05 UTC (permalink / raw)
  To: davem, jesse, stephen, pshelar; +Cc: netdev, dev

Introduces support for the group policy extension to the VXLAN virtual
port. The extension is disabled by default and only enabled if the user
has provided the respective configuration.

  ovs-vsctl add-port br0 vxlan0 -- \
     set Interface vxlan0 type=vxlan options:exts=gbp

The configuration interface to enable the extension is based on a new
attribute OVS_VXLAN_EXT_GBP nested inside OVS_TUNNEL_ATTR_EXTENSION
which can carry additional extensions as needed in the future.

The group policy metadata is handled in the same way as Geneve options
and transported as binary blob in a new Netlink attribute
OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS which is mutually exclusive to the
existing OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 include/uapi/linux/openvswitch.h | 19 ++++++++++
 net/openvswitch/flow_netlink.c   | 78 +++++++++++++++++++++++++--------------
 net/openvswitch/vport-vxlan.c    | 80 +++++++++++++++++++++++++++++++++++++++-
 3 files changed, 148 insertions(+), 29 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 3a6dcaa..676a89e 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -248,11 +248,29 @@ enum ovs_vport_attr {
 
 #define OVS_VPORT_ATTR_MAX (__OVS_VPORT_ATTR_MAX - 1)
 
+/**
+ * struct ovs_vxlan_opts - VXLAN tunnel options
+ * @gbp: Group policy bits
+ */
+struct ovs_vxlan_opts {
+	__u32 gbp;
+};
+
+enum {
+	OVS_VXLAN_EXT_UNSPEC,
+	OVS_VXLAN_EXT_GBP,
+	__OVS_VXLAN_EXT_MAX,
+};
+
+#define OVS_VXLAN_EXT_MAX (__OVS_VXLAN_EXT_MAX - 1)
+
+
 /* OVS_VPORT_ATTR_OPTIONS attributes for tunnels.
  */
 enum {
 	OVS_TUNNEL_ATTR_UNSPEC,
 	OVS_TUNNEL_ATTR_DST_PORT, /* 16-bit UDP port, used by L4 tunnels. */
+	OVS_TUNNEL_ATTR_EXTENSION,
 	__OVS_TUNNEL_ATTR_MAX
 };
 
@@ -324,6 +342,7 @@ enum ovs_tunnel_key_attr {
 	OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS,        /* Array of Geneve options. */
 	OVS_TUNNEL_KEY_ATTR_TP_SRC,		/* be16 src Transport Port. */
 	OVS_TUNNEL_KEY_ATTR_TP_DST,		/* be16 dst Transport Port. */
+	OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS,		/* struct ovs_vxlan_opts. */
 	__OVS_TUNNEL_KEY_ATTR_MAX
 };
 
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index c60ae3f..1528709 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -446,6 +446,7 @@ static int ipv4_tun_from_nlattr(const struct nlattr *attr,
 	int rem;
 	bool ttl = false;
 	__be16 tun_flags = 0;
+	int opts_type = 0;
 
 	nla_for_each_nested(a, attr, rem) {
 		int type = nla_type(a);
@@ -463,6 +464,7 @@ static int ipv4_tun_from_nlattr(const struct nlattr *attr,
 			[OVS_TUNNEL_KEY_ATTR_TP_DST] = sizeof(u16),
 			[OVS_TUNNEL_KEY_ATTR_OAM] = 0,
 			[OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS] = -1,
+			[OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS] = -1,
 		};
 
 		if (type > OVS_TUNNEL_KEY_ATTR_MAX) {
@@ -519,11 +521,18 @@ static int ipv4_tun_from_nlattr(const struct nlattr *attr,
 			tun_flags |= TUNNEL_OAM;
 			break;
 		case OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS:
+		case OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS:
+			if (opts_type) {
+				OVS_NLERR(log, "Multiple metadata blocks provided");
+				return -EINVAL;
+			}
+
 			err = tun_md_opt_from_nlattr(a, match, is_mask, log);
 			if (err)
 				return err;
 
 			tun_flags |= TUNNEL_OPTIONS_PRESENT;
+			opts_type = type;
 			break;
 		default:
 			OVS_NLERR(log, "Unknown IPv4 tunnel attribute %d",
@@ -552,7 +561,7 @@ static int ipv4_tun_from_nlattr(const struct nlattr *attr,
 		}
 	}
 
-	return 0;
+	return opts_type;
 }
 
 static int __ipv4_tun_to_nlattr(struct sk_buff *skb,
@@ -1537,6 +1546,34 @@ void ovs_match_init(struct sw_flow_match *match,
 	}
 }
 
+static int validate_and_copy_geneve_opts(struct sw_flow_key *key)
+{
+	struct geneve_opt *option;
+	int opts_len = key->tun_opts_len;
+	bool crit_opt = false;
+
+	option = (struct geneve_opt *) TUN_METADATA_OPTS(key, key->tun_opts_len);
+	while (opts_len > 0) {
+		int len;
+
+		if (opts_len < sizeof(*option))
+			return -EINVAL;
+
+		len = sizeof(*option) + option->length * 4;
+		if (len > opts_len)
+			return -EINVAL;
+
+		crit_opt |= !!(option->type & GENEVE_CRIT_OPT_TYPE);
+
+		option = (struct geneve_opt *)((u8 *)option + len);
+		opts_len -= len;
+	};
+
+	key->tun_key.tun_flags |= crit_opt ? TUNNEL_CRIT_OPT : 0;
+
+	return 0;
+}
+
 static int validate_and_copy_set_tun(const struct nlattr *attr,
 				     struct sw_flow_actions **sfa, bool log)
 {
@@ -1544,36 +1581,23 @@ static int validate_and_copy_set_tun(const struct nlattr *attr,
 	struct sw_flow_key key;
 	struct ovs_tunnel_info *tun_info;
 	struct nlattr *a;
-	int err, start;
+	int err, start, opts_type;
 
 	ovs_match_init(&match, &key, NULL);
-	err = ipv4_tun_from_nlattr(nla_data(attr), &match, false, log);
-	if (err)
-		return err;
+	opts_type = ipv4_tun_from_nlattr(nla_data(attr), &match, false, log);
+	if (opts_type < 0)
+		return opts_type;
 
 	if (key.tun_opts_len) {
-		struct geneve_opt *option;
-		int opts_len = key.tun_opts_len;
-		bool crit_opt = false;
-
-		option = (struct geneve_opt *) TUN_METADATA_OPTS(&key, key.tun_opts_len);
-		while (opts_len > 0) {
-			int len;
-
-			if (opts_len < sizeof(*option))
-				return -EINVAL;
-
-			len = sizeof(*option) + option->length * 4;
-			if (len > opts_len)
-				return -EINVAL;
-
-			crit_opt |= !!(option->type & GENEVE_CRIT_OPT_TYPE);
-
-			option = (struct geneve_opt *)((u8 *)option + len);
-			opts_len -= len;
-		};
-
-		key.tun_key.tun_flags |= crit_opt ? TUNNEL_CRIT_OPT : 0;
+		switch (opts_type) {
+		case OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS:
+			err = validate_and_copy_geneve_opts(&key);
+			if (err < 0)
+				return err;
+			break;
+		case OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS:
+			break;
+		}
 	};
 
 	start = add_nested_action_start(sfa, OVS_ACTION_ATTR_SET, log);
diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c
index 266c595..8ed7163 100644
--- a/net/openvswitch/vport-vxlan.c
+++ b/net/openvswitch/vport-vxlan.c
@@ -49,6 +49,7 @@
 struct vxlan_port {
 	struct vxlan_sock *vs;
 	char name[IFNAMSIZ];
+	u32 exts; /* VXLAN_EXT_* in <net/vxlan.h> */
 };
 
 static struct vport_ops ovs_vxlan_vport_ops;
@@ -63,16 +64,26 @@ static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
 		      struct vxlan_metadata *md)
 {
 	struct ovs_tunnel_info tun_info;
+	struct vxlan_port *vxlan_port;
 	struct vport *vport = vs->data;
 	struct iphdr *iph;
+	struct ovs_vxlan_opts opts = {
+		.gbp = md->gbp,
+	};
 	__be64 key;
+	__be16 flags;
+
+	flags = TUNNEL_KEY;
+	vxlan_port = vxlan_vport(vport);
+	if (vxlan_port->exts & VXLAN_EXT_GBP)
+		flags |= TUNNEL_OPTIONS_PRESENT;
 
 	/* Save outer tunnel values */
 	iph = ip_hdr(skb);
 	key = cpu_to_be64(ntohl(md->vni) >> 8);
 	ovs_flow_tun_info_init(&tun_info, iph,
 			       udp_hdr(skb)->source, udp_hdr(skb)->dest,
-			       key, TUNNEL_KEY, NULL, 0);
+			       key, flags, &opts, sizeof(opts));
 
 	ovs_vport_receive(vport, skb, &tun_info);
 }
@@ -84,6 +95,21 @@ static int vxlan_get_options(const struct vport *vport, struct sk_buff *skb)
 
 	if (nla_put_u16(skb, OVS_TUNNEL_ATTR_DST_PORT, ntohs(dst_port)))
 		return -EMSGSIZE;
+
+	if (vxlan_port->exts) {
+		struct nlattr *exts;
+
+		exts = nla_nest_start(skb, OVS_TUNNEL_ATTR_EXTENSION);
+		if (!exts)
+			return -EMSGSIZE;
+
+		if (vxlan_port->exts & VXLAN_EXT_GBP &&
+		    nla_put_flag(skb, OVS_VXLAN_EXT_GBP))
+			return -EMSGSIZE;
+
+		nla_nest_end(skb, exts);
+	}
+
 	return 0;
 }
 
@@ -96,6 +122,31 @@ static void vxlan_tnl_destroy(struct vport *vport)
 	ovs_vport_deferred_free(vport);
 }
 
+static const struct nla_policy exts_policy[OVS_VXLAN_EXT_MAX+1] = {
+	[OVS_VXLAN_EXT_GBP]	= { .type = NLA_FLAG, },
+};
+
+static int vxlan_configure_exts(struct vport *vport, struct nlattr *attr)
+{
+	struct nlattr *exts[OVS_VXLAN_EXT_MAX+1];
+	struct vxlan_port *vxlan_port;
+	int err;
+
+	if (nla_len(attr) < sizeof(struct nlattr))
+		return -EINVAL;
+
+	err = nla_parse_nested(exts, OVS_VXLAN_EXT_MAX, attr, exts_policy);
+	if (err < 0)
+		return err;
+
+	vxlan_port = vxlan_vport(vport);
+
+	if (exts[OVS_VXLAN_EXT_GBP])
+		vxlan_port->exts |= VXLAN_EXT_GBP;
+
+	return 0;
+}
+
 static struct vport *vxlan_tnl_create(const struct vport_parms *parms)
 {
 	struct net *net = ovs_dp_get_net(parms->dp);
@@ -128,7 +179,17 @@ static struct vport *vxlan_tnl_create(const struct vport_parms *parms)
 	vxlan_port = vxlan_vport(vport);
 	strncpy(vxlan_port->name, parms->name, IFNAMSIZ);
 
-	vs = vxlan_sock_add(net, htons(dst_port), vxlan_rcv, vport, true, 0, 0);
+	a = nla_find_nested(options, OVS_TUNNEL_ATTR_EXTENSION);
+	if (a) {
+		err = vxlan_configure_exts(vport, a);
+		if (err) {
+			ovs_vport_free(vport);
+			goto error;
+		}
+	}
+
+	vs = vxlan_sock_add(net, htons(dst_port), vxlan_rcv, vport, true, 0,
+			    vxlan_port->exts);
 	if (IS_ERR(vs)) {
 		ovs_vport_free(vport);
 		return (void *)vs;
@@ -141,6 +202,20 @@ error:
 	return ERR_PTR(err);
 }
 
+static int vxlan_ext_gbp(struct sk_buff *skb)
+{
+	const struct ovs_tunnel_info *tun_info;
+	const struct ovs_vxlan_opts *opts;
+
+	tun_info = OVS_CB(skb)->egress_tun_info;
+	opts = tun_info->options;
+
+	if (tun_info->options_len >= sizeof(*opts))
+		return opts->gbp;
+	else
+		return 0;
+}
+
 static int vxlan_tnl_send(struct vport *vport, struct sk_buff *skb)
 {
 	struct net *net = ovs_dp_get_net(vport->dp);
@@ -181,6 +256,7 @@ static int vxlan_tnl_send(struct vport *vport, struct sk_buff *skb)
 
 	src_port = udp_flow_src_port(net, skb, 0, 0, true);
 	md.vni = htonl(be64_to_cpu(tun_key->tun_id) << 8);
+	md.gbp = vxlan_ext_gbp(skb);
 
 	err = vxlan_xmit_skb(vxlan_port->vs, rt, skb,
 			     fl.saddr, tun_key->ipv4_dst,
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] vxlan: Allow for VXLAN extensions to be implemented
  2015-01-07  2:05 ` [PATCH 1/6] vxlan: Allow for VXLAN extensions to be implemented Thomas Graf
@ 2015-01-07  3:46   ` Tom Herbert
  2015-01-07 10:27     ` Thomas Graf
  0 siblings, 1 reply; 26+ messages in thread
From: Tom Herbert @ 2015-01-07  3:46 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David Miller, Jesse Gross, Stephen Hemminger, Pravin B Shelar,
	Linux Netdev List, dev

On Tue, Jan 6, 2015 at 6:05 PM, Thomas Graf <tgraf@suug.ch> wrote:
> The VXLAN receive code is currently conservative in what it accepts and
> will reject any frame that uses any of the reserved VXLAN protocol fields.
> The VXLAN draft specifies that "reserved fields MUST be set to zero on
> transmit and ignored on receive.".
>
IMO it is an unfortunate decision in VXLAN to ignore set reserved
fields on receive. Had they not done this, then adding a protocol
field to the header would have been feasible and we wouldn't need yet
another encapsulation protocol (i.e. VXLAN-GPE). Rejecting frames with
reserved bits set is the better behavior, but I think the comment
about this needs to be clear about why this diverges from RFC7348.

> Retain the current conservative parsing behaviour by default but allows
> these fields to be used by VXLAN extensions which are explicitly enabled
> on the VXLAN socket respectively VXLAN net_device.
>
Conceptually, VXLAN has both mandatory flags and optional flags for
extensions. You may want to look at the VXLAN RCO patches that added
an extension and infrastructure for them.

Tom

> Signed-off-by: Thomas Graf <tgraf@suug.ch>
> ---
>  drivers/net/vxlan.c | 29 +++++++++++++++++++----------
>  include/net/vxlan.h | 32 +++++++++++++++++++++++++++++---
>  2 files changed, 48 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 2ab0922..4d52aa9 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -65,7 +65,7 @@
>  #define VXLAN_VID_MASK (VXLAN_N_VID - 1)
>  #define VXLAN_HLEN (sizeof(struct udphdr) + sizeof(struct vxlanhdr))
>
> -#define VXLAN_FLAGS 0x08000000 /* struct vxlanhdr.vx_flags required value. */
> +#define VXLAN_FLAGS 0x08000000 /* struct vxlanhdr.vx_flags default value. */
>
>  /* UDP port for VXLAN traffic.
>   * The IANA assigned port is 4789, but the Linux default is 8472
> @@ -1100,22 +1100,28 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
>         if (!pskb_may_pull(skb, VXLAN_HLEN))
>                 goto error;
>
> +       vs = rcu_dereference_sk_user_data(sk);
> +       if (!vs)
> +               goto drop;
> +
>         /* Return packets with reserved bits set */
>         vxh = (struct vxlanhdr *)(udp_hdr(skb) + 1);
> -       if (vxh->vx_flags != htonl(VXLAN_FLAGS) ||
> -           (vxh->vx_vni & htonl(0xff))) {
> -               netdev_dbg(skb->dev, "invalid vxlan flags=%#x vni=%#x\n",
> -                          ntohl(vxh->vx_flags), ntohl(vxh->vx_vni));
> -               goto error;
> +
> +       /* For backwards compatibility, only allow reserved fields to be
> +        * used by VXLAN extensions if explicitly requested.
> +        */
> +       if (vs->exts) {
> +               if (!vxh->vni_present)
> +                       goto error_invalid_header;
> +       } else {
> +               if (vxh->vx_flags != htonl(VXLAN_FLAGS) ||
> +                   (vxh->vx_vni & htonl(0xff)))
> +                       goto error_invalid_header;
>         }
>
>         if (iptunnel_pull_header(skb, VXLAN_HLEN, htons(ETH_P_TEB)))
>                 goto drop;
>
> -       vs = rcu_dereference_sk_user_data(sk);
> -       if (!vs)
> -               goto drop;
> -
>         vs->rcv(vs, skb, vxh->vx_vni);
>         return 0;
>
> @@ -1124,6 +1130,9 @@ drop:
>         kfree_skb(skb);
>         return 0;
>
> +error_invalid_header:
> +       netdev_dbg(skb->dev, "invalid vxlan flags=%#x vni=%#x\n",
> +                  ntohl(vxh->vx_flags), ntohl(vxh->vx_vni));
>  error:
>         /* Return non vxlan pkt */
>         return 1;
> diff --git a/include/net/vxlan.h b/include/net/vxlan.h
> index 903461a..3e98d31 100644
> --- a/include/net/vxlan.h
> +++ b/include/net/vxlan.h
> @@ -11,10 +11,35 @@
>  #define VNI_HASH_BITS  10
>  #define VNI_HASH_SIZE  (1<<VNI_HASH_BITS)
>
> -/* VXLAN protocol header */
> +/* VXLAN protocol header:
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + * |R|R|R|R|I|R|R|R|               Reserved                        |
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + * |                VXLAN Network Identifier (VNI) |   Reserved    |
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + *
> + * I = 1       VXLAN Network Identifier (VNI) present
> + */
>  struct vxlanhdr {
> -       __be32 vx_flags;
> -       __be32 vx_vni;
> +       union {
> +               struct {
> +#ifdef __LITTLE_ENDIAN_BITFIELD
> +                       __u8    reserved_flags1:3,
> +                               vni_present:1,
> +                               reserved_flags2:4;
> +#elif defined(__BIG_ENDIAN_BITFIELD)
> +                       __u8    reserved_flags2:4,
> +                               vni_present:1,
> +                               reserved_flags1:3;
> +#else
> +#error "Please fix <asm/byteorder.h>"
> +#endif
> +                       __u8    vx_reserved1;
> +                       __be16  vx_reserved2;
> +               };
> +               __be32 vx_flags;
> +       };
> +       __be32  vx_vni;
>  };
>
>  struct vxlan_sock;
> @@ -25,6 +50,7 @@ struct vxlan_sock {
>         struct hlist_node hlist;
>         vxlan_rcv_t      *rcv;
>         void             *data;
> +       u32               exts;
>         struct work_struct del_work;
>         struct socket    *sock;
>         struct rcu_head   rcu;
> --
> 1.9.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] vxlan: Allow for VXLAN extensions to be implemented
  2015-01-07  3:46   ` Tom Herbert
@ 2015-01-07 10:27     ` Thomas Graf
  2015-01-07 22:45       ` Jesse Gross
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Graf @ 2015-01-07 10:27 UTC (permalink / raw)
  To: Tom Herbert
  Cc: David Miller, Jesse Gross, Stephen Hemminger, Pravin B Shelar,
	Linux Netdev List, dev

On 01/06/15 at 07:46pm, Tom Herbert wrote:
> On Tue, Jan 6, 2015 at 6:05 PM, Thomas Graf <tgraf@suug.ch> wrote:
> > The VXLAN receive code is currently conservative in what it accepts and
> > will reject any frame that uses any of the reserved VXLAN protocol fields.
> > The VXLAN draft specifies that "reserved fields MUST be set to zero on
> > transmit and ignored on receive.".
> >
> IMO it is an unfortunate decision in VXLAN to ignore set reserved
> fields on receive. Had they not done this, then adding a protocol
> field to the header would have been feasible and we wouldn't need yet
> another encapsulation protocol (i.e. VXLAN-GPE). Rejecting frames with
> reserved bits set is the better behavior, but I think the comment
> about this needs to be clear about why this diverges from RFC7348.

Agreed. Do you want to give it a shot? I know you have been involved
on all aspects of this topic for a long time in NVO3 ;-)

> > Retain the current conservative parsing behaviour by default but allows
> > these fields to be used by VXLAN extensions which are explicitly enabled
> > on the VXLAN socket respectively VXLAN net_device.
> >
> Conceptually, VXLAN has both mandatory flags and optional flags for
> extensions. You may want to look at the VXLAN RCO patches that added
> an extension and infrastructure for them.

I've seen your proposed changes. I think they could be easily rebased
on top of this and use the enablement infrastructure. In fact, I see
this as the only feasible option to deal with VXLAN extensions: Leave
it to the user to decide which extensions to run. That way we can
support any combination of extensions, even conflicting ones. All we
have to do is catch incompatible extension configurations on a socket
and reject them at configuration time. Expecting that NVO3/IETF will
find consensus on a list of compatible set of extensions with perfect
forward and backwards compatibility is unrealistic in my eyes. We have
Geneve and GUE to solve it properly in the future.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/6] vxlan: Group Policy extension
  2015-01-07  2:05 ` [PATCH 2/6] vxlan: Group Policy extension Thomas Graf
@ 2015-01-07 16:05   ` Tom Herbert
       [not found]     ` <CA+mtBx_Jj-tUM1nbHd2fHb0-=QpK3tcQgA=smWmg=cB-fupdGg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Tom Herbert @ 2015-01-07 16:05 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David Miller, Jesse Gross, Stephen Hemminger, Pravin B Shelar,
	Linux Netdev List, dev

On Tue, Jan 6, 2015 at 6:05 PM, Thomas Graf <tgraf@suug.ch> wrote:
> Implements supports for the Group Policy VXLAN extension [0] to provide
> a lightweight and simple security label mechanism across network peers
> based on VXLAN. The security context and associated metadata is mapped
> to/from skb->mark. This allows further mapping to a SELinux context
> using SECMARK, to implement ACLs directly with nftables, iptables, OVS,
> tc, etc.
>
> The group membership is defined by the lower 16 bits of skb->mark, the
> upper 16 bits are used for flags.
>
> SELinux allows to manage label to secure local resources. However,
> distributed applications require ACLs to implemented across hosts. This
> is typically achieved by matching on L2-L4 fields to identify the
> original sending host and process on the receiver. On top of that,
> netlabel and specifically CIPSO [1] allow to map security contexts to
> universal labels.  However, netlabel and CIPSO are relatively complex.
> This patch provides a lightweight alternative for overlay network
> environments with a trusted underlay. No additional control protocol
> is required.
>
Associating a sixteen bit field with security is worrisome, especially
considering that VXLAN provides no verification for any header fields
and doesn't even advocate use of outer UDP checksum so the field is
susceptible to an undetected single bit flip. The concept of a
"trusted underlay" is weak justification and hardly universal, so the
only way to actually secure this is through IPsec (this is mentioned
in the VXLAN-GPB draft). But if we have the security state of IPsec
then why would we need this field anyway?

Could this same functionality be achieved if we just match the VNI to
a mark in IP tables?

Tom

>            Host 1:                       Host 2:
>
>       Group A        Group B        Group B     Group A
>       +-----+   +-------------+    +-------+   +-----+
>       | lxc |   | SELinux CTX |    | httpd |   | VM  |
>       +--+--+   +--+----------+    +---+---+   +--+--+
>           \---+---/                     \----+---/
>               |                              |
>           +---+---+                      +---+---+
>           | vxlan |                      | vxlan |
>           +---+---+                      +---+---+
>               +------------------------------+
>
> Backwards compatibility:
> A VXLAN-GBP socket can receive standard VXLAN frames and will assign
> the default group 0x0000 to such frames. A Linux VXLAN socket will
> drop VXLAN-GBP  frames. The extension is therefore disabled by default
> and needs to be specifically enabled:
>
>    ip link add [...] type vxlan [...] gbp
>
> In a mixed environment with VXLAN and VXLAN-GBP sockets, the GBP socket
> must run on a separate port number.
>
> Examples:
>   iptables:
>   $ iptables -I OUTPUT -p icmp -j MARK --set-mark 0x200
>   $ iptables -I INPUT -i br0 -m mark --mark 0x200 -j ACCEPT
>
>   OVS (patches provided separately):
>   in_port=1, actions=load:0x200->NXM_NX_TUN_GBP_ID[],NORMAL
>
> [0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy
> [1] http://lwn.net/Articles/204905/
>
> Signed-off-by: Thomas Graf <tgraf@suug.ch>
> ---
>  drivers/net/vxlan.c           | 155 ++++++++++++++++++++++++++++++------------
>  include/net/vxlan.h           |  80 ++++++++++++++++++++--
>  include/uapi/linux/if_link.h  |   8 +++
>  net/openvswitch/vport-vxlan.c |   9 ++-
>  4 files changed, 197 insertions(+), 55 deletions(-)
>
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 4d52aa9..30b7b59 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -132,6 +132,7 @@ struct vxlan_dev {
>         __u8              tos;          /* TOS override */
>         __u8              ttl;
>         u32               flags;        /* VXLAN_F_* in vxlan.h */
> +       u32               exts;         /* Enabled extensions */
>
>         struct work_struct sock_work;
>         struct work_struct igmp_join;
> @@ -568,7 +569,8 @@ static struct sk_buff **vxlan_gro_receive(struct sk_buff **head, struct sk_buff
>                         continue;
>
>                 vh2 = (struct vxlanhdr *)(p->data + off_vx);
> -               if (vh->vx_vni != vh2->vx_vni) {
> +               if (vh->vx_flags != vh2->vx_flags ||
> +                   vh->vx_vni != vh2->vx_vni) {
>                         NAPI_GRO_CB(p)->same_flow = 0;
>                         continue;
>                 }
> @@ -1095,6 +1097,7 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
>  {
>         struct vxlan_sock *vs;
>         struct vxlanhdr *vxh;
> +       struct vxlan_metadata md = {0};
>
>         /* Need Vxlan and inner Ethernet header to be present */
>         if (!pskb_may_pull(skb, VXLAN_HLEN))
> @@ -1113,6 +1116,19 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
>         if (vs->exts) {
>                 if (!vxh->vni_present)
>                         goto error_invalid_header;
> +
> +               if (vxh->gbp_present) {
> +                       if (!(vs->exts & VXLAN_EXT_GBP))
> +                               goto error_invalid_header;
> +
> +                       md.gbp = ntohs(vxh->gbp.policy_id);
> +
> +                       if (vxh->gbp.dont_learn)
> +                               md.gbp |= VXLAN_GBP_DONT_LEARN;
> +
> +                       if (vxh->gbp.policy_applied)
> +                               md.gbp |= VXLAN_GBP_POLICY_APPLIED;
> +               }
>         } else {
>                 if (vxh->vx_flags != htonl(VXLAN_FLAGS) ||
>                     (vxh->vx_vni & htonl(0xff)))
> @@ -1122,7 +1138,8 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
>         if (iptunnel_pull_header(skb, VXLAN_HLEN, htons(ETH_P_TEB)))
>                 goto drop;
>
> -       vs->rcv(vs, skb, vxh->vx_vni);
> +       md.vni = vxh->vx_vni;
> +       vs->rcv(vs, skb, &md);
>         return 0;
>
>  drop:
> @@ -1138,8 +1155,8 @@ error:
>         return 1;
>  }
>
> -static void vxlan_rcv(struct vxlan_sock *vs,
> -                     struct sk_buff *skb, __be32 vx_vni)
> +static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
> +                     struct vxlan_metadata *md)
>  {
>         struct iphdr *oip = NULL;
>         struct ipv6hdr *oip6 = NULL;
> @@ -1150,7 +1167,7 @@ static void vxlan_rcv(struct vxlan_sock *vs,
>         int err = 0;
>         union vxlan_addr *remote_ip;
>
> -       vni = ntohl(vx_vni) >> 8;
> +       vni = ntohl(md->vni) >> 8;
>         /* Is this VNI defined? */
>         vxlan = vxlan_vs_find_vni(vs, vni);
>         if (!vxlan)
> @@ -1184,6 +1201,7 @@ static void vxlan_rcv(struct vxlan_sock *vs,
>                 goto drop;
>
>         skb_reset_network_header(skb);
> +       skb->mark = md->gbp;
>
>         if (oip6)
>                 err = IP6_ECN_decapsulate(oip6, skb);
> @@ -1533,15 +1551,54 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
>         return false;
>  }
>
> +static int vxlan_build_hdr(struct sk_buff *skb, struct vxlan_sock *vs,
> +                          int min_headroom, struct vxlan_metadata *md)
> +{
> +       struct vxlanhdr *vxh;
> +       int err;
> +
> +       /* Need space for new headers (invalidates iph ptr) */
> +       err = skb_cow_head(skb, min_headroom);
> +       if (unlikely(err)) {
> +               kfree_skb(skb);
> +               return err;
> +       }
> +
> +       skb = vlan_hwaccel_push_inside(skb);
> +       if (WARN_ON(!skb))
> +               return -ENOMEM;
> +
> +       vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
> +       vxh->vx_flags = htonl(VXLAN_FLAGS);
> +       vxh->vx_vni = md->vni;
> +
> +       if (vs->exts)  {
> +               if (vs->exts & VXLAN_EXT_GBP) {
> +                       vxh->gbp_present = 1;
> +
> +                       if (md->gbp & VXLAN_GBP_DONT_LEARN)
> +                               vxh->gbp.dont_learn = 1;
> +
> +                       if (md->gbp & VXLAN_GBP_POLICY_APPLIED)
> +                               vxh->gbp.policy_applied = 1;
> +
> +                       vxh->gbp.policy_id = htons(md->gbp & VXLAN_GBP_ID_MASK);
> +               }
> +       }
> +
> +       skb_set_inner_protocol(skb, htons(ETH_P_TEB));
> +
> +       return 0;
> +}
> +
>  #if IS_ENABLED(CONFIG_IPV6)
>  static int vxlan6_xmit_skb(struct vxlan_sock *vs,
>                            struct dst_entry *dst, struct sk_buff *skb,
>                            struct net_device *dev, struct in6_addr *saddr,
>                            struct in6_addr *daddr, __u8 prio, __u8 ttl,
> -                          __be16 src_port, __be16 dst_port, __be32 vni,
> -                          bool xnet)
> +                          __be16 src_port, __be16 dst_port,
> +                          struct vxlan_metadata *md, bool xnet)
>  {
> -       struct vxlanhdr *vxh;
>         int min_headroom;
>         int err;
>         bool udp_sum = !udp_get_no_check6_tx(vs->sock->sk);
> @@ -1558,24 +1615,9 @@ static int vxlan6_xmit_skb(struct vxlan_sock *vs,
>                         + VXLAN_HLEN + sizeof(struct ipv6hdr)
>                         + (vlan_tx_tag_present(skb) ? VLAN_HLEN : 0);
>
> -       /* Need space for new headers (invalidates iph ptr) */
> -       err = skb_cow_head(skb, min_headroom);
> -       if (unlikely(err)) {
> -               kfree_skb(skb);
> -               goto err;
> -       }
> -
> -       skb = vlan_hwaccel_push_inside(skb);
> -       if (WARN_ON(!skb)) {
> -               err = -ENOMEM;
> +       err = vxlan_build_hdr(skb, vs, min_headroom, md);
> +       if (err)
>                 goto err;
> -       }
> -
> -       vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
> -       vxh->vx_flags = htonl(VXLAN_FLAGS);
> -       vxh->vx_vni = vni;
> -
> -       skb_set_inner_protocol(skb, htons(ETH_P_TEB));
>
>         udp_tunnel6_xmit_skb(vs->sock, dst, skb, dev, saddr, daddr, prio,
>                              ttl, src_port, dst_port);
> @@ -1589,9 +1631,9 @@ err:
>  int vxlan_xmit_skb(struct vxlan_sock *vs,
>                    struct rtable *rt, struct sk_buff *skb,
>                    __be32 src, __be32 dst, __u8 tos, __u8 ttl, __be16 df,
> -                  __be16 src_port, __be16 dst_port, __be32 vni, bool xnet)
> +                  __be16 src_port, __be16 dst_port,
> +                  struct vxlan_metadata *md, bool xnet)
>  {
> -       struct vxlanhdr *vxh;
>         int min_headroom;
>         int err;
>         bool udp_sum = !vs->sock->sk->sk_no_check_tx;
> @@ -1604,22 +1646,9 @@ int vxlan_xmit_skb(struct vxlan_sock *vs,
>                         + VXLAN_HLEN + sizeof(struct iphdr)
>                         + (vlan_tx_tag_present(skb) ? VLAN_HLEN : 0);
>
> -       /* Need space for new headers (invalidates iph ptr) */
> -       err = skb_cow_head(skb, min_headroom);
> -       if (unlikely(err)) {
> -               kfree_skb(skb);
> +       err = vxlan_build_hdr(skb, vs, min_headroom, md);
> +       if (err)
>                 return err;
> -       }
> -
> -       skb = vlan_hwaccel_push_inside(skb);
> -       if (WARN_ON(!skb))
> -               return -ENOMEM;
> -
> -       vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
> -       vxh->vx_flags = htonl(VXLAN_FLAGS);
> -       vxh->vx_vni = vni;
> -
> -       skb_set_inner_protocol(skb, htons(ETH_P_TEB));
>
>         return udp_tunnel_xmit_skb(vs->sock, rt, skb, src, dst, tos,
>                                    ttl, df, src_port, dst_port, xnet);
> @@ -1679,6 +1708,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>         const struct iphdr *old_iph;
>         struct flowi4 fl4;
>         union vxlan_addr *dst;
> +       struct vxlan_metadata md;
>         __be16 src_port = 0, dst_port;
>         u32 vni;
>         __be16 df = 0;
> @@ -1749,11 +1779,12 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>
>                 tos = ip_tunnel_ecn_encap(tos, old_iph, skb);
>                 ttl = ttl ? : ip4_dst_hoplimit(&rt->dst);
> +               md.vni = htonl(vni << 8);
> +               md.gbp = skb->mark;
>
>                 err = vxlan_xmit_skb(vxlan->vn_sock, rt, skb,
>                                      fl4.saddr, dst->sin.sin_addr.s_addr,
> -                                    tos, ttl, df, src_port, dst_port,
> -                                    htonl(vni << 8),
> +                                    tos, ttl, df, src_port, dst_port, &md,
>                                      !net_eq(vxlan->net, dev_net(vxlan->dev)));
>                 if (err < 0) {
>                         /* skb is already freed. */
> @@ -1806,10 +1837,12 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>                 }
>
>                 ttl = ttl ? : ip6_dst_hoplimit(ndst);
> +               md.vni = htonl(vni << 8);
> +               md.gbp = skb->mark;
>
>                 err = vxlan6_xmit_skb(vxlan->vn_sock, ndst, skb,
>                                       dev, &fl6.saddr, &fl6.daddr, 0, ttl,
> -                                     src_port, dst_port, htonl(vni << 8),
> +                                     src_port, dst_port, &md,
>                                       !net_eq(vxlan->net, dev_net(vxlan->dev)));
>  #endif
>         }
> @@ -2210,6 +2243,11 @@ static const struct nla_policy vxlan_policy[IFLA_VXLAN_MAX + 1] = {
>         [IFLA_VXLAN_UDP_CSUM]   = { .type = NLA_U8 },
>         [IFLA_VXLAN_UDP_ZERO_CSUM6_TX]  = { .type = NLA_U8 },
>         [IFLA_VXLAN_UDP_ZERO_CSUM6_RX]  = { .type = NLA_U8 },
> +       [IFLA_VXLAN_EXTENSION]  = { .type = NLA_NESTED },
> +};
> +
> +static const struct nla_policy vxlan_ext_policy[IFLA_VXLAN_EXT_MAX + 1] = {
> +       [IFLA_VXLAN_EXT_GBP]    = { .type = NLA_FLAG, },
>  };
>
>  static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
> @@ -2246,6 +2284,18 @@ static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
>                 }
>         }
>
> +       if (data[IFLA_VXLAN_EXTENSION]) {
> +               int err;
> +
> +               err = nla_validate_nested(data[IFLA_VXLAN_EXTENSION],
> +                                         IFLA_VXLAN_EXT_MAX, vxlan_ext_policy);
> +               if (err < 0) {
> +                       pr_debug("invalid VXLAN extension configuration: %d\n",
> +                                err);
> +                       return -EINVAL;
> +               }
> +       }
> +
>         return 0;
>  }
>
> @@ -2400,6 +2450,18 @@ static void vxlan_sock_work(struct work_struct *work)
>         dev_put(vxlan->dev);
>  }
>
> +static void configure_vxlan_exts(struct vxlan_dev *vxlan, struct nlattr *attr)
> +{
> +       struct nlattr *exts[IFLA_VXLAN_EXT_MAX+1];
> +
> +       /* Validated in vxlan_validate() */
> +       if (nla_parse_nested(exts, IFLA_VXLAN_EXT_MAX, attr, NULL) < 0)
> +               BUG();
> +
> +       if (exts[IFLA_VXLAN_EXT_GBP])
> +               vxlan->exts |= VXLAN_EXT_GBP;
> +}
> +
>  static int vxlan_newlink(struct net *net, struct net_device *dev,
>                          struct nlattr *tb[], struct nlattr *data[])
>  {
> @@ -2525,6 +2587,9 @@ static int vxlan_newlink(struct net *net, struct net_device *dev,
>             nla_get_u8(data[IFLA_VXLAN_UDP_ZERO_CSUM6_RX]))
>                 vxlan->flags |= VXLAN_F_UDP_ZERO_CSUM6_RX;
>
> +       if (data[IFLA_VXLAN_EXTENSION])
> +               configure_vxlan_exts(vxlan, data[IFLA_VXLAN_EXTENSION]);
> +
>         if (vxlan_find_vni(net, vni, use_ipv6 ? AF_INET6 : AF_INET,
>                            vxlan->dst_port)) {
>                 pr_info("duplicate VNI %u\n", vni);
> diff --git a/include/net/vxlan.h b/include/net/vxlan.h
> index 3e98d31..66000d0 100644
> --- a/include/net/vxlan.h
> +++ b/include/net/vxlan.h
> @@ -11,13 +11,60 @@
>  #define VNI_HASH_BITS  10
>  #define VNI_HASH_SIZE  (1<<VNI_HASH_BITS)
>
> +/*
> + * VXLAN Group Based Policy Extension:
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + * |1|-|-|-|1|-|-|-|R|D|R|R|A|R|R|R|        Group Policy ID        |
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + * |                VXLAN Network Identifier (VNI) |   Reserved    |
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + *
> + * D = Don't Learn bit. When set, this bit indicates that the egress
> + *     VTEP MUST NOT learn the source address of the encapsulated frame.
> + *
> + * A = Indicates that the group policy has already been applied to
> + *     this packet. Policies MUST NOT be applied by devices when the
> + *     A bit is set.
> + *
> + * [0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy
> + */
> +struct vxlan_gbp {
> +#ifdef __LITTLE_ENDIAN_BITFIELD
> +       __u8    reserved_flags1:3,
> +               policy_applied:1,
> +               reserved_flags2:2,
> +               dont_learn:1,
> +               reserved_flags3:1;
> +#elif defined(__BIG_ENDIAN_BITFIELD)
> +       __u8    reserved_flags1:1,
> +               dont_learn:1,
> +               reserved_flags2:2,
> +               policy_applied:1,
> +               reserved_flags3:3;
> +#else
> +#error "Please fix <asm/byteorder.h>"
> +#endif
> +       __be16 policy_id;
> +} __packed;
> +
> +/* skb->mark mapping
> + *
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + * |R|R|R|R|R|R|R|R|R|D|R|R|A|R|R|R|        Group Policy ID        |
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + */
> +#define VXLAN_GBP_DONT_LEARN           (BIT(6) << 16)
> +#define VXLAN_GBP_POLICY_APPLIED       (BIT(3) << 16)
> +#define VXLAN_GBP_ID_MASK              (0xFFFF)
> +
>  /* VXLAN protocol header:
>   * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> - * |R|R|R|R|I|R|R|R|               Reserved                        |
> + * |G|R|R|R|I|R|R|R|               Reserved                        |
>   * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>   * |                VXLAN Network Identifier (VNI) |   Reserved    |
>   * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>   *
> + * G = 1       Group Policy (VXLAN-GBP)
>   * I = 1       VXLAN Network Identifier (VNI) present
>   */
>  struct vxlanhdr {
> @@ -26,24 +73,42 @@ struct vxlanhdr {
>  #ifdef __LITTLE_ENDIAN_BITFIELD
>                         __u8    reserved_flags1:3,
>                                 vni_present:1,
> -                               reserved_flags2:4;
> +                               reserved_flags2:3,
> +                               gbp_present:1;
>  #elif defined(__BIG_ENDIAN_BITFIELD)
> -                       __u8    reserved_flags2:4,
> +                       __u8    gbp_present:1,
> +                               reserved_flags2:3,
>                                 vni_present:1,
>                                 reserved_flags1:3;
>  #else
>  #error "Please fix <asm/byteorder.h>"
>  #endif
> -                       __u8    vx_reserved1;
> -                       __be16  vx_reserved2;
> +                       union {
> +                               /* NOTE: Offset 0 will be 1 byte aligned, so
> +                                * all member structs must be marked packed.
> +                                */
> +                               struct vxlan_gbp gbp;
> +                               struct {
> +                                       __u8    vx_reserved1;
> +                                       __be16  vx_reserved2;
> +                               } __packed;
> +                       };
>                 };
>                 __be32 vx_flags;
>         };
>         __be32  vx_vni;
>  };
>
> +struct vxlan_metadata {
> +       __be32          vni;
> +       u32             gbp;
> +};
> +
>  struct vxlan_sock;
> -typedef void (vxlan_rcv_t)(struct vxlan_sock *vh, struct sk_buff *skb, __be32 key);
> +typedef void (vxlan_rcv_t)(struct vxlan_sock *vh, struct sk_buff *skb,
> +                          struct vxlan_metadata *md);
> +
> +#define VXLAN_EXT_GBP                  BIT(0)
>
>  /* per UDP socket information */
>  struct vxlan_sock {
> @@ -78,7 +143,8 @@ void vxlan_sock_release(struct vxlan_sock *vs);
>  int vxlan_xmit_skb(struct vxlan_sock *vs,
>                    struct rtable *rt, struct sk_buff *skb,
>                    __be32 src, __be32 dst, __u8 tos, __u8 ttl, __be16 df,
> -                  __be16 src_port, __be16 dst_port, __be32 vni, bool xnet);
> +                  __be16 src_port, __be16 dst_port, struct vxlan_metadata *md,
> +                  bool xnet);
>
>  static inline netdev_features_t vxlan_features_check(struct sk_buff *skb,
>                                                      netdev_features_t features)
> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index f7d0d2d..9f07bf5 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -370,10 +370,18 @@ enum {
>         IFLA_VXLAN_UDP_CSUM,
>         IFLA_VXLAN_UDP_ZERO_CSUM6_TX,
>         IFLA_VXLAN_UDP_ZERO_CSUM6_RX,
> +       IFLA_VXLAN_EXTENSION,
>         __IFLA_VXLAN_MAX
>  };
>  #define IFLA_VXLAN_MAX (__IFLA_VXLAN_MAX - 1)
>
> +enum {
> +       IFLA_VXLAN_EXT_UNSPEC,
> +       IFLA_VXLAN_EXT_GBP,
> +       __IFLA_VXLAN_EXT_MAX,
> +};
> +#define IFLA_VXLAN_EXT_MAX (__IFLA_VXLAN_EXT_MAX - 1)
> +
>  struct ifla_vxlan_port_range {
>         __be16  low;
>         __be16  high;
> diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c
> index d7c46b3..dd68c97 100644
> --- a/net/openvswitch/vport-vxlan.c
> +++ b/net/openvswitch/vport-vxlan.c
> @@ -59,7 +59,8 @@ static inline struct vxlan_port *vxlan_vport(const struct vport *vport)
>  }
>
>  /* Called with rcu_read_lock and BH disabled. */
> -static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb, __be32 vx_vni)
> +static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
> +                     struct vxlan_metadata *md)
>  {
>         struct ovs_tunnel_info tun_info;
>         struct vport *vport = vs->data;
> @@ -68,7 +69,7 @@ static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb, __be32 vx_vni)
>
>         /* Save outer tunnel values */
>         iph = ip_hdr(skb);
> -       key = cpu_to_be64(ntohl(vx_vni) >> 8);
> +       key = cpu_to_be64(ntohl(md->vni) >> 8);
>         ovs_flow_tun_info_init(&tun_info, iph,
>                                udp_hdr(skb)->source, udp_hdr(skb)->dest,
>                                key, TUNNEL_KEY, NULL, 0);
> @@ -146,6 +147,7 @@ static int vxlan_tnl_send(struct vport *vport, struct sk_buff *skb)
>         struct vxlan_port *vxlan_port = vxlan_vport(vport);
>         __be16 dst_port = inet_sk(vxlan_port->vs->sock->sk)->inet_sport;
>         struct ovs_key_ipv4_tunnel *tun_key;
> +       struct vxlan_metadata md;
>         struct rtable *rt;
>         struct flowi4 fl;
>         __be16 src_port;
> @@ -178,12 +180,13 @@ static int vxlan_tnl_send(struct vport *vport, struct sk_buff *skb)
>         skb->ignore_df = 1;
>
>         src_port = udp_flow_src_port(net, skb, 0, 0, true);
> +       md.vni = htonl(be64_to_cpu(tun_key->tun_id) << 8);
>
>         err = vxlan_xmit_skb(vxlan_port->vs, rt, skb,
>                              fl.saddr, tun_key->ipv4_dst,
>                              tun_key->ipv4_tos, tun_key->ipv4_ttl, df,
>                              src_port, dst_port,
> -                            htonl(be64_to_cpu(tun_key->tun_id) << 8),
> +                            &md,
>                              false);
>         if (err < 0)
>                 ip_rt_put(rt);
> --
> 1.9.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/6] vxlan: Group Policy extension
       [not found]     ` <CA+mtBx_Jj-tUM1nbHd2fHb0-=QpK3tcQgA=smWmg=cB-fupdGg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-01-07 16:21       ` Thomas Graf
  2015-01-07 16:56         ` Tom Herbert
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Graf @ 2015-01-07 16:21 UTC (permalink / raw)
  To: Tom Herbert
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, Linux Netdev List, Stephen Hemminger,
	David Miller

On 01/07/15 at 08:05am, Tom Herbert wrote:
> Associating a sixteen bit field with security is worrisome, especially
> considering that VXLAN provides no verification for any header fields
> and doesn't even advocate use of outer UDP checksum so the field is
> susceptible to an undetected single bit flip. The concept of a
> "trusted underlay" is weak justification and hardly universal, so the
> only way to actually secure this is through IPsec (this is mentioned
> in the VXLAN-GPB draft).

As you state correctly, this work requires a trusted underlay which can
be achieved with IPsec, OpenVPN, SSH, ...

> But if we have the security state of IPsec then why would we need
> this field anyway?

It's a separation of concern: the security label mechanism of the
overlay should not depend on an eventual encryption layer in the
underlay as not all of them provide a mechanism to label packets.
 
> Could this same functionality be achieved if we just match the VNI to
> a mark in IP tables?

If the VNI is not already used for another purpose, yes. The solution
as proposed can be integrated into existing VXLAN overlays separated by
VNI. It is also compatible with hardware VXLAN VTEPs which ignore the
reserved bits while continueing to maintain VNI separation.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/6] vxlan: Group Policy extension
  2015-01-07 16:21       ` Thomas Graf
@ 2015-01-07 16:56         ` Tom Herbert
       [not found]           ` <CA+mtBx_A_M3+irq7w4nNCyPZBgM7ja+wfJT4w4Q0Yo6GMGYVgA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Tom Herbert @ 2015-01-07 16:56 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David Miller, Jesse Gross, Stephen Hemminger, Pravin B Shelar,
	Linux Netdev List, dev

On Wed, Jan 7, 2015 at 8:21 AM, Thomas Graf <tgraf@suug.ch> wrote:
> On 01/07/15 at 08:05am, Tom Herbert wrote:
>> Associating a sixteen bit field with security is worrisome, especially
>> considering that VXLAN provides no verification for any header fields
>> and doesn't even advocate use of outer UDP checksum so the field is
>> susceptible to an undetected single bit flip. The concept of a
>> "trusted underlay" is weak justification and hardly universal, so the
>> only way to actually secure this is through IPsec (this is mentioned
>> in the VXLAN-GPB draft).
>
> As you state correctly, this work requires a trusted underlay which can
> be achieved with IPsec, OpenVPN, SSH, ...
>
This can't be enforced. There's already a lot of deployment of VXLAN
in non-trusted networks, and there's nothing to prevent someone from
using this feature in those environments. Maybe there's an argument
that VXLAN already fundamentally lacks security and verification, so
adding this field might not make things worse :-/.

>> But if we have the security state of IPsec then why would we need
>> this field anyway?
>
> It's a separation of concern: the security label mechanism of the
> overlay should not depend on an eventual encryption layer in the
> underlay as not all of them provide a mechanism to label packets.
>
>> Could this same functionality be achieved if we just match the VNI to
>> a mark in IP tables?
>
> If the VNI is not already used for another purpose, yes. The solution
> as proposed can be integrated into existing VXLAN overlays separated by
> VNI. It is also compatible with hardware VXLAN VTEPs which ignore the
> reserved bits while continueing to maintain VNI separation.

It seems like it should be relatively easy to group VNIs together to
have the same mark with the current use of VNI. The works up to the
point that all packets corresponding to a single VNI get the same
mark.

Tom

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/6] vxlan: Group Policy extension
       [not found]           ` <CA+mtBx_A_M3+irq7w4nNCyPZBgM7ja+wfJT4w4Q0Yo6GMGYVgA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-01-07 17:21             ` Thomas Graf
  0 siblings, 0 replies; 26+ messages in thread
From: Thomas Graf @ 2015-01-07 17:21 UTC (permalink / raw)
  To: Tom Herbert
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, Linux Netdev List, Stephen Hemminger,
	David Miller

On 01/07/15 at 08:56am, Tom Herbert wrote:
> On Wed, Jan 7, 2015 at 8:21 AM, Thomas Graf <tgraf@suug.ch> wrote:
> > If the VNI is not already used for another purpose, yes. The solution
> > as proposed can be integrated into existing VXLAN overlays separated by
> > VNI. It is also compatible with hardware VXLAN VTEPs which ignore the
> > reserved bits while continueing to maintain VNI separation.
> 
> It seems like it should be relatively easy to group VNIs together to
> have the same mark with the current use of VNI. The works up to the
> point that all packets corresponding to a single VNI get the same
> mark.

This really depends on the network architecture and assumes that you
can remap the VNIs in the entire network. You might want to run L3
with group definitions across multiple L2 VNI segments. A second issue
is that many hardware VXLAN VTEPs do VNI based learning and will run
into capacity limits.

I'm not saying it's impossible but it's very tricky to intergrate if
you can't start from scratch. The whole point of this is to come up
with something that is painfully easy to use and integrate without
requiring to change much.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] vxlan: Allow for VXLAN extensions to be implemented
  2015-01-07 10:27     ` Thomas Graf
@ 2015-01-07 22:45       ` Jesse Gross
       [not found]         ` <CAEP_g=_Xr=6nVoeVT8dgRZoM0bNXDiiua5GrjweZ1GaT1ixhZg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Jesse Gross @ 2015-01-07 22:45 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Tom Herbert, David Miller, Stephen Hemminger, Pravin B Shelar,
	Linux Netdev List, dev

On Wed, Jan 7, 2015 at 2:27 AM, Thomas Graf <tgraf@suug.ch> wrote:
> On 01/06/15 at 07:46pm, Tom Herbert wrote:
>> On Tue, Jan 6, 2015 at 6:05 PM, Thomas Graf <tgraf@suug.ch> wrote:
>> > The VXLAN receive code is currently conservative in what it accepts and
>> > will reject any frame that uses any of the reserved VXLAN protocol fields.
>> > The VXLAN draft specifies that "reserved fields MUST be set to zero on
>> > transmit and ignored on receive.".
>> >
>> IMO it is an unfortunate decision in VXLAN to ignore set reserved
>> fields on receive. Had they not done this, then adding a protocol
>> field to the header would have been feasible and we wouldn't need yet
>> another encapsulation protocol (i.e. VXLAN-GPE). Rejecting frames with
>> reserved bits set is the better behavior, but I think the comment
>> about this needs to be clear about why this diverges from RFC7348.
>
> Agreed. Do you want to give it a shot? I know you have been involved
> on all aspects of this topic for a long time in NVO3 ;-)

There is little chance of this happening. Besides the IETF procedural
aspects, ignoring unknown reserved bits on receive is the long
standing standard way of extending protocols. It's how TCP was
extended to support ECN, for example.

>> > Retain the current conservative parsing behaviour by default but allows
>> > these fields to be used by VXLAN extensions which are explicitly enabled
>> > on the VXLAN socket respectively VXLAN net_device.
>> >
>> Conceptually, VXLAN has both mandatory flags and optional flags for
>> extensions. You may want to look at the VXLAN RCO patches that added
>> an extension and infrastructure for them.
>
> I've seen your proposed changes. I think they could be easily rebased
> on top of this and use the enablement infrastructure. In fact, I see
> this as the only feasible option to deal with VXLAN extensions: Leave
> it to the user to decide which extensions to run. That way we can
> support any combination of extensions, even conflicting ones. All we
> have to do is catch incompatible extension configurations on a socket
> and reject them at configuration time. Expecting that NVO3/IETF will
> find consensus on a list of compatible set of extensions with perfect
> forward and backwards compatibility is unrealistic in my eyes. We have
> Geneve and GUE to solve it properly in the future.

My concern is that having multiple (and potentially overlapping)
extensions is going to make the VXLAN code very messy and hard to
follow. I think there's already quite a big of complexity there from
the DOVE extensions (which are basically dead at this point to the
best of my knowledge). We already see this some with the discussion
over the header struct but that's just the beginning - can you imagine
the code that checks a single bit for several different
interpretations? Not to mention the fact that the location of the bits
in some of these proposals has already been shuffled several times. It
seems very painful...

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/6] vxlan: Only bind to sockets with correct extensions enabled
  2015-01-07  2:05 ` [PATCH 3/6] vxlan: Only bind to sockets with correct extensions enabled Thomas Graf
@ 2015-01-07 22:45   ` Jesse Gross
  2015-01-07 22:52     ` Thomas Graf
  0 siblings, 1 reply; 26+ messages in thread
From: Jesse Gross @ 2015-01-07 22:45 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David Miller, Stephen Hemminger, Pravin Shelar, netdev, dev

On Tue, Jan 6, 2015 at 6:05 PM, Thomas Graf <tgraf@suug.ch> wrote:
> A VXLAN net_device looking for an appropriate socket may only
> consider a socket which has the exact set of extensions enabled.
> If none can be found, a new socket must be created.

Maybe it's just the phrasing of the commit message but won't the new
socket that needs to be created immediately fail? I think this is
really just checking that you don't try to instantiate two different
sets of extensions on the same UDP port - it's not like this is going
to somehow create a new socket and they will be able to coexist.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/6] openvswitch: Rename GENEVE_TUN_OPTS() to TUN_METADATA_OPTS()
  2015-01-07  2:05 ` [PATCH 5/6] openvswitch: Rename GENEVE_TUN_OPTS() to TUN_METADATA_OPTS() Thomas Graf
@ 2015-01-07 22:46   ` Jesse Gross
  2015-01-07 22:55     ` Thomas Graf
  0 siblings, 1 reply; 26+ messages in thread
From: Jesse Gross @ 2015-01-07 22:46 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David Miller, Stephen Hemminger, Pravin Shelar, netdev, dev

On Tue, Jan 6, 2015 at 6:05 PM, Thomas Graf <tgraf@suug.ch> wrote:
> A subsequent patch will introduce VXLAN options. Rename the existing
> GENEVE_TUN_OPTS() to reflect its extended purpose of carrying generic
> tunnel metadata options.
>
> Signed-off-by: Thomas Graf <tgraf@suug.ch>

This is generally a good idea (even outside of the larger context of
this patch series) although a couple comments below:

> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> index d1eecf7..c60ae3f 100644
> --- a/net/openvswitch/flow_netlink.c
> +++ b/net/openvswitch/flow_netlink.c
> @@ -387,20 +387,20 @@ static int parse_flow_nlattrs(const struct nlattr *attr,
>         return __parse_flow_nlattrs(attr, a, attrsp, log, false);
>  }
>
> -static int genev_tun_opt_from_nlattr(const struct nlattr *a,
> -                                    struct sw_flow_match *match, bool is_mask,
> -                                    bool log)
> +static int tun_md_opt_from_nlattr(const struct nlattr *a,
> +                                 struct sw_flow_match *match, bool is_mask,
> +                                 bool log)

I think this is a somewhat overzealous conversion. This function is
verifying OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS and as such has some
elements of Geneve protocol knowledge baked in (such as the check that
the options are a multiple of 4 bytes). It's also nice for the log
messages to indicate which netlink key is causing the problem and not
be too generic.

> @@ -1148,10 +1145,10 @@ int ovs_nla_put_flow(const struct sw_flow_key *swkey,
>                 goto nla_put_failure;
>
>         if ((swkey->tun_key.ipv4_dst || is_mask)) {
> -               const struct geneve_opt *opts = NULL;
> +               const void *opts = NULL;
>
>                 if (output->tun_key.tun_flags & TUNNEL_OPTIONS_PRESENT)
> -                       opts = GENEVE_OPTS(output, swkey->tun_opts_len);
> +                       opts = TUN_METADATA_OPTS(output, swkey->tun_opts_len);
>
>                 if (ipv4_tun_to_nlattr(skb, &output->tun_key, opts,
>                                        swkey->tun_opts_len))

I think it's probably better to factor this block of code out into a
new function as you have done in the next patch (and include it with
this one). It makes it clearer that it is Geneve-specific.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/6] openvswitch: Support VXLAN Group Policy extension
  2015-01-07  2:05 ` [PATCH 6/6] openvswitch: Support VXLAN Group Policy extension Thomas Graf
@ 2015-01-07 22:46   ` Jesse Gross
  2015-01-07 23:01     ` Thomas Graf
  0 siblings, 1 reply; 26+ messages in thread
From: Jesse Gross @ 2015-01-07 22:46 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David Miller, Stephen Hemminger, Pravin Shelar, netdev, dev

On Tue, Jan 6, 2015 at 6:05 PM, Thomas Graf <tgraf@suug.ch> wrote:
> The group policy metadata is handled in the same way as Geneve options
> and transported as binary blob in a new Netlink attribute
> OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS which is mutually exclusive to the
> existing OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS.

Can you explain some more what the encoding would look like if
additional options were defined (including ones that are potentially
mutually exclusive)? The Geneve options are binary but that is coming
directly from the protocol specification. However, this isn't an on
the wire format so I'm not sure what it would look like or how it
would be defined to avoid conflict and allow evolution.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/6] vxlan: Only bind to sockets with correct extensions enabled
  2015-01-07 22:45   ` Jesse Gross
@ 2015-01-07 22:52     ` Thomas Graf
  0 siblings, 0 replies; 26+ messages in thread
From: Thomas Graf @ 2015-01-07 22:52 UTC (permalink / raw)
  To: Jesse Gross; +Cc: David Miller, Stephen Hemminger, Pravin Shelar, netdev, dev

On 01/07/15 at 02:45pm, Jesse Gross wrote:
> On Tue, Jan 6, 2015 at 6:05 PM, Thomas Graf <tgraf@suug.ch> wrote:
> > A VXLAN net_device looking for an appropriate socket may only
> > consider a socket which has the exact set of extensions enabled.
> > If none can be found, a new socket must be created.
> 
> Maybe it's just the phrasing of the commit message but won't the new
> socket that needs to be created immediately fail? I think this is
> really just checking that you don't try to instantiate two different
> sets of extensions on the same UDP port - it's not like this is going
> to somehow create a new socket and they will be able to coexist.

Your interpretation is correct and the phrasing is poor. It prevents
a non-GBP socket from being used for a GBP socket. I will rework the
commit message.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/6] openvswitch: Rename GENEVE_TUN_OPTS() to TUN_METADATA_OPTS()
  2015-01-07 22:46   ` Jesse Gross
@ 2015-01-07 22:55     ` Thomas Graf
  0 siblings, 0 replies; 26+ messages in thread
From: Thomas Graf @ 2015-01-07 22:55 UTC (permalink / raw)
  To: Jesse Gross; +Cc: David Miller, Stephen Hemminger, Pravin Shelar, netdev, dev

On 01/07/15 at 02:46pm, Jesse Gross wrote:
> On Tue, Jan 6, 2015 at 6:05 PM, Thomas Graf <tgraf@suug.ch> wrote:
> > A subsequent patch will introduce VXLAN options. Rename the existing
> > GENEVE_TUN_OPTS() to reflect its extended purpose of carrying generic
> > tunnel metadata options.
> >
> > Signed-off-by: Thomas Graf <tgraf@suug.ch>
> 
> This is generally a good idea (even outside of the larger context of
> this patch series) although a couple comments below:
> 
> > diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> > index d1eecf7..c60ae3f 100644
> > --- a/net/openvswitch/flow_netlink.c
> > +++ b/net/openvswitch/flow_netlink.c
> > @@ -387,20 +387,20 @@ static int parse_flow_nlattrs(const struct nlattr *attr,
> >         return __parse_flow_nlattrs(attr, a, attrsp, log, false);
> >  }
> >
> > -static int genev_tun_opt_from_nlattr(const struct nlattr *a,
> > -                                    struct sw_flow_match *match, bool is_mask,
> > -                                    bool log)
> > +static int tun_md_opt_from_nlattr(const struct nlattr *a,
> > +                                 struct sw_flow_match *match, bool is_mask,
> > +                                 bool log)
> 
> I think this is a somewhat overzealous conversion. This function is
> verifying OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS and as such has some
> elements of Geneve protocol knowledge baked in (such as the check that
> the options are a multiple of 4 bytes). It's also nice for the log
> messages to indicate which netlink key is causing the problem and not
> be too generic.

OK. I was planning on always padding vxlan_opts to 4 bytes as well but
I agree with you. It seems clearer to have distinct functions for each
and they can share code where it makes sense.

> > @@ -1148,10 +1145,10 @@ int ovs_nla_put_flow(const struct sw_flow_key *swkey,
> >                 goto nla_put_failure;
> >
> >         if ((swkey->tun_key.ipv4_dst || is_mask)) {
> > -               const struct geneve_opt *opts = NULL;
> > +               const void *opts = NULL;
> >
> >                 if (output->tun_key.tun_flags & TUNNEL_OPTIONS_PRESENT)
> > -                       opts = GENEVE_OPTS(output, swkey->tun_opts_len);
> > +                       opts = TUN_METADATA_OPTS(output, swkey->tun_opts_len);
> >
> >                 if (ipv4_tun_to_nlattr(skb, &output->tun_key, opts,
> >                                        swkey->tun_opts_len))
> 
> I think it's probably better to factor this block of code out into a
> new function as you have done in the next patch (and include it with
> this one). It makes it clearer that it is Geneve-specific.

OK, will do.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/6] openvswitch: Support VXLAN Group Policy extension
  2015-01-07 22:46   ` Jesse Gross
@ 2015-01-07 23:01     ` Thomas Graf
  2015-01-08  1:18       ` Jesse Gross
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Graf @ 2015-01-07 23:01 UTC (permalink / raw)
  To: Jesse Gross; +Cc: David Miller, Stephen Hemminger, Pravin Shelar, netdev, dev

On 01/07/15 at 02:46pm, Jesse Gross wrote:
> On Tue, Jan 6, 2015 at 6:05 PM, Thomas Graf <tgraf@suug.ch> wrote:
> > The group policy metadata is handled in the same way as Geneve options
> > and transported as binary blob in a new Netlink attribute
> > OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS which is mutually exclusive to the
> > existing OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS.
> 
> Can you explain some more what the encoding would look like if
> additional options were defined (including ones that are potentially
> mutually exclusive)? The Geneve options are binary but that is coming
> directly from the protocol specification. However, this isn't an on
> the wire format so I'm not sure what it would look like or how it
> would be defined to avoid conflict and allow evolution.

The encoding will be based on struct ovs_vxlan_opts which is extended
as needed by appending new members to the end of the struct. Parsers
will look at the provided length to see which fields are provided.

The user space side looks as follows. I will add similar logic to the
kernel side as soon as we have a 2nd extension.

+/* Returns true if attribute is long enough to cover member of type. */
+#define NL_PROVIDES_MEMBER(attr, type, member, size) \
+       (nl_attr_get_size(attr) >= (offsetof(type, member) + size))
+

[...]

+        case OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS: {
+            struct ovs_vxlan_opts *vxlan_opts;
+
+            /* Length verification done per member */
+            vxlan_opts = (struct ovs_vxlan_opts *)nl_attr_get_unspec(a, 0);
+
+            if (NL_PROVIDES_MEMBER(a, struct ovs_vxlan_opts, gbp, sizeof(vxlan_opts->gbp))) {
+                tun->gbp_id = htons(vxlan_opts->gbp & 0xFFFF);
+                tun->gbp_flags = (vxlan_opts->gbp >> 16) & 0xFF;
+            }
+            break;
+        }

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] vxlan: Allow for VXLAN extensions to be implemented
       [not found]         ` <CAEP_g=_Xr=6nVoeVT8dgRZoM0bNXDiiua5GrjweZ1GaT1ixhZg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-01-07 23:24           ` Thomas Graf
       [not found]             ` <20150107232412.GD21149-FZi0V3Vbi30CUdFEqe4BF2D2FQJk+8+b@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Graf @ 2015-01-07 23:24 UTC (permalink / raw)
  To: Jesse Gross
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, Linux Netdev List, Stephen Hemminger,
	David Miller, Tom Herbert

On 01/07/15 at 02:45pm, Jesse Gross wrote:
> My concern is that having multiple (and potentially overlapping)
> extensions is going to make the VXLAN code very messy and hard to
> follow. I think there's already quite a big of complexity there from
> the DOVE extensions (which are basically dead at this point to the
> best of my knowledge). We already see this some with the discussion
> over the header struct but that's just the beginning - can you imagine
> the code that checks a single bit for several different
> interpretations?

I'm not disagreeing with you Jesse. I think it's manageable though.
I don't think the code as proposed is overly difficult to understand
or messy. I'm glad to adjust the struct definition if the union style
is considered messy, no problem.

We have these reserved bits in a widely used protocol and it is our
choice to make use of these bits or not. My opinion is that we should.
Good candidates to me seem to be GBP for security labels, RCO for
checksumming and GPE for non-L2 over VXLAN.

I would love for everybody to switch over to Geneve or GUE and as you
know I'm highly interested in pushing it forward. Reality is that it
might be another couple of years for it to become fully established.

> Not to mention the fact that the location of the bits
> in some of these proposals has already been shuffled several times. It
> seems very painful...

I don't really want to defend the process behind these drafts. I
believe that code defines the standard, in particular open source code.
As for GBP, we are the authors of the draft and this has been deployed
to hardware as well so this won't change.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] vxlan: Allow for VXLAN extensions to be implemented
       [not found]             ` <20150107232412.GD21149-FZi0V3Vbi30CUdFEqe4BF2D2FQJk+8+b@public.gmane.org>
@ 2015-01-08  0:02               ` Tom Herbert
  2015-01-08  0:14                 ` Thomas Graf
  0 siblings, 1 reply; 26+ messages in thread
From: Tom Herbert @ 2015-01-08  0:02 UTC (permalink / raw)
  To: Thomas Graf
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, Linux Netdev List, Stephen Hemminger,
	David Miller

On Wed, Jan 7, 2015 at 3:24 PM, Thomas Graf <tgraf@suug.ch> wrote:
> On 01/07/15 at 02:45pm, Jesse Gross wrote:
>> My concern is that having multiple (and potentially overlapping)
>> extensions is going to make the VXLAN code very messy and hard to
>> follow. I think there's already quite a big of complexity there from
>> the DOVE extensions (which are basically dead at this point to the
>> best of my knowledge). We already see this some with the discussion
>> over the header struct but that's just the beginning - can you imagine
>> the code that checks a single bit for several different
>> interpretations?
>
> I'm not disagreeing with you Jesse. I think it's manageable though.
> I don't think the code as proposed is overly difficult to understand
> or messy. I'm glad to adjust the struct definition if the union style
> is considered messy, no problem.
>
> We have these reserved bits in a widely used protocol and it is our
> choice to make use of these bits or not. My opinion is that we should.
> Good candidates to me seem to be GBP for security labels, RCO for
> checksumming and GPE for non-L2 over VXLAN.
>
Do you know how could GPE work with GBP they want to use the same bits
in header for data? Seems like these are mutually exclusive
extensions. RCO should be fine with either :-)

> I would love for everybody to switch over to Geneve or GUE and as you
> know I'm highly interested in pushing it forward. Reality is that it
> might be another couple of years for it to become fully established.
>
>> Not to mention the fact that the location of the bits
>> in some of these proposals has already been shuffled several times. It
>> seems very painful...
>
> I don't really want to defend the process behind these drafts. I
> believe that code defines the standard, in particular open source code.
> As for GBP, we are the authors of the draft and this has been deployed
> to hardware as well so this won't change.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] vxlan: Allow for VXLAN extensions to be implemented
  2015-01-08  0:02               ` Tom Herbert
@ 2015-01-08  0:14                 ` Thomas Graf
       [not found]                   ` <20150108001401.GG21149-FZi0V3Vbi30CUdFEqe4BF2D2FQJk+8+b@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Graf @ 2015-01-08  0:14 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Jesse Gross, David Miller, Stephen Hemminger, Pravin B Shelar,
	Linux Netdev List, dev

On 01/07/15 at 04:02pm, Tom Herbert wrote:
> Do you know how could GPE work with GBP they want to use the same bits
> in header for data? Seems like these are mutually exclusive
> extensions. RCO should be fine with either :-)

Yes, GBP and GPE are mutually exclusive extensions. Although
GPE would allow to define an intermediate header for additional
metadata, f.e. encoded as GUE or Geneve options.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] vxlan: Allow for VXLAN extensions to be implemented
       [not found]                   ` <20150108001401.GG21149-FZi0V3Vbi30CUdFEqe4BF2D2FQJk+8+b@public.gmane.org>
@ 2015-01-08  0:23                     ` Tom Herbert
  0 siblings, 0 replies; 26+ messages in thread
From: Tom Herbert @ 2015-01-08  0:23 UTC (permalink / raw)
  To: Thomas Graf
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, Linux Netdev List, Stephen Hemminger,
	David Miller

On Wed, Jan 7, 2015 at 4:14 PM, Thomas Graf <tgraf@suug.ch> wrote:
> On 01/07/15 at 04:02pm, Tom Herbert wrote:
>> Do you know how could GPE work with GBP they want to use the same bits
>> in header for data? Seems like these are mutually exclusive
>> extensions. RCO should be fine with either :-)
>
> Yes, GBP and GPE are mutually exclusive extensions. Although
> GPE would allow to define an intermediate header for additional
> metadata, f.e. encoded as GUE or Geneve options.

The latest GPE draft uses eight bits for next protocol, so if you make
the security field eight bits it would fit nicely in the octet before
the next protocol.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/6] openvswitch: Support VXLAN Group Policy extension
  2015-01-07 23:01     ` Thomas Graf
@ 2015-01-08  1:18       ` Jesse Gross
  2015-01-08 10:22         ` Thomas Graf
  0 siblings, 1 reply; 26+ messages in thread
From: Jesse Gross @ 2015-01-08  1:18 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David Miller, Stephen Hemminger, Pravin Shelar, netdev, dev

On Wed, Jan 7, 2015 at 3:01 PM, Thomas Graf <tgraf@suug.ch> wrote:
> On 01/07/15 at 02:46pm, Jesse Gross wrote:
>> On Tue, Jan 6, 2015 at 6:05 PM, Thomas Graf <tgraf@suug.ch> wrote:
>> > The group policy metadata is handled in the same way as Geneve options
>> > and transported as binary blob in a new Netlink attribute
>> > OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS which is mutually exclusive to the
>> > existing OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS.
>>
>> Can you explain some more what the encoding would look like if
>> additional options were defined (including ones that are potentially
>> mutually exclusive)? The Geneve options are binary but that is coming
>> directly from the protocol specification. However, this isn't an on
>> the wire format so I'm not sure what it would look like or how it
>> would be defined to avoid conflict and allow evolution.
>
> The encoding will be based on struct ovs_vxlan_opts which is extended
> as needed by appending new members to the end of the struct. Parsers
> will look at the provided length to see which fields are provided.

But this means that if there are two extensions that are conflicting
or if one is retired you still need to carry the earlier members of
the struct. Why not make them real netlink attributes?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/6] openvswitch: Support VXLAN Group Policy extension
  2015-01-08  1:18       ` Jesse Gross
@ 2015-01-08 10:22         ` Thomas Graf
  0 siblings, 0 replies; 26+ messages in thread
From: Thomas Graf @ 2015-01-08 10:22 UTC (permalink / raw)
  To: Jesse Gross; +Cc: David Miller, Stephen Hemminger, Pravin Shelar, netdev, dev

On 01/07/15 at 05:18pm, Jesse Gross wrote:
> On Wed, Jan 7, 2015 at 3:01 PM, Thomas Graf <tgraf@suug.ch> wrote:
> > The encoding will be based on struct ovs_vxlan_opts which is extended
> > as needed by appending new members to the end of the struct. Parsers
> > will look at the provided length to see which fields are provided.
> 
> But this means that if there are two extensions that are conflicting
> or if one is retired you still need to carry the earlier members of
> the struct. Why not make them real netlink attributes?

I figured that due to the limited space available in the VXLAN header
the structure would never grow big. I have no problem converting this
to use Netlink attributes internally though. Will address this in v2.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2015-01-08 10:22 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-07  2:05 [PATCH 0/6 net-next] VXLAN Group Policy Extension Thomas Graf
2015-01-07  2:05 ` [PATCH 1/6] vxlan: Allow for VXLAN extensions to be implemented Thomas Graf
2015-01-07  3:46   ` Tom Herbert
2015-01-07 10:27     ` Thomas Graf
2015-01-07 22:45       ` Jesse Gross
     [not found]         ` <CAEP_g=_Xr=6nVoeVT8dgRZoM0bNXDiiua5GrjweZ1GaT1ixhZg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-01-07 23:24           ` Thomas Graf
     [not found]             ` <20150107232412.GD21149-FZi0V3Vbi30CUdFEqe4BF2D2FQJk+8+b@public.gmane.org>
2015-01-08  0:02               ` Tom Herbert
2015-01-08  0:14                 ` Thomas Graf
     [not found]                   ` <20150108001401.GG21149-FZi0V3Vbi30CUdFEqe4BF2D2FQJk+8+b@public.gmane.org>
2015-01-08  0:23                     ` Tom Herbert
2015-01-07  2:05 ` [PATCH 2/6] vxlan: Group Policy extension Thomas Graf
2015-01-07 16:05   ` Tom Herbert
     [not found]     ` <CA+mtBx_Jj-tUM1nbHd2fHb0-=QpK3tcQgA=smWmg=cB-fupdGg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-01-07 16:21       ` Thomas Graf
2015-01-07 16:56         ` Tom Herbert
     [not found]           ` <CA+mtBx_A_M3+irq7w4nNCyPZBgM7ja+wfJT4w4Q0Yo6GMGYVgA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-01-07 17:21             ` Thomas Graf
2015-01-07  2:05 ` [PATCH 3/6] vxlan: Only bind to sockets with correct extensions enabled Thomas Graf
2015-01-07 22:45   ` Jesse Gross
2015-01-07 22:52     ` Thomas Graf
2015-01-07  2:05 ` [PATCH 4/6] vxlan: Fail build if VXLAN header is misdefined Thomas Graf
2015-01-07  2:05 ` [PATCH 5/6] openvswitch: Rename GENEVE_TUN_OPTS() to TUN_METADATA_OPTS() Thomas Graf
2015-01-07 22:46   ` Jesse Gross
2015-01-07 22:55     ` Thomas Graf
2015-01-07  2:05 ` [PATCH 6/6] openvswitch: Support VXLAN Group Policy extension Thomas Graf
2015-01-07 22:46   ` Jesse Gross
2015-01-07 23:01     ` Thomas Graf
2015-01-08  1:18       ` Jesse Gross
2015-01-08 10:22         ` Thomas Graf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.