[PATCH v9 net-next 0/7] openvswitch: support for layer 3 encapsulated packets

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v9 net-next 0/7] openvswitch: support for layer 3 encapsulated packets
@ 2016-05-04  7:36 Simon Horman
  2016-05-04  7:36 ` [PATCH v9 net-next 1/7] net: add skb_vlan_deaccel helper Simon Horman
                   ` (6 more replies)
  0 siblings, 7 replies; 43+ messages in thread
From: Simon Horman @ 2016-05-04  7:36 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA, dev-yBygre7rU0TnMu66kgdUjQ; +Cc: Simon Horman

At the core of this patch set is removing the assumption in Open vSwitch
datapath that all packets have Ethernet header.  Support for layer 3 GRE
tunnels is also added by this patchset.

The implementation relies on the presence of pop_eth and push_eth actions
in datapath flows to facilitate adding and removing Ethernet headers as
appropriate. The construction of such flows is left up to user-space.

This series is based on work by Lorand Jakab, Thomas Morin and others.


This patch set is comprised of kernel patches against net-next.
It depends on:

    "[PATCH net-next 0/3] gre: receive also TEB packets for lwtunnels"

To aid review it and the above dependency is available at:

    tree: https://github.com/horms/openvswitch
    branch: me/l3-vpn
    tag: l3-vpn-v9

There is a companion patch set for the Open vSwitch user-space code
which I will post separately to the dev@openvswitch.org mailing list as:

    "[PATCH v10 0/5] userspace: Support for layer 3 encapsulated packets"



Lorand Jakab (1):
  openvswitch: add layer 3 flow/port support

Simon Horman (6):
  net: add skb_vlan_deaccel helper
  openvswitch: set skb protocol when receiving on internal device
  openvswitch: add support to push and pop mpls for layer3 packets
  openvswitch: add layer 3 support to ovs_packet_cmd_execute()
  openvswitch: extend layer 3 support to cover non-IP packets
  openvswitch: use ipgre tunnel rather than gretap tunnel

 include/linux/skbuff.h               |   1 +
 include/net/gre.h                    |   4 +-
 include/uapi/linux/openvswitch.h     |  14 +++
 net/core/skbuff.c                    |  50 +++++----
 net/ipv4/ip_gre.c                    |   8 +-
 net/openvswitch/actions.c            |  70 ++++++++++--
 net/openvswitch/datapath.c           |  13 +--
 net/openvswitch/flow.c               |  66 ++++++++----
 net/openvswitch/flow.h               |   4 +-
 net/openvswitch/flow_netlink.c       | 200 +++++++++++++++++++++++++----------
 net/openvswitch/vport-geneve.c       |   2 +-
 net/openvswitch/vport-gre.c          |   4 +-
 net/openvswitch/vport-internal_dev.c |   8 ++
 net/openvswitch/vport-netdev.c       |  31 +++++-
 net/openvswitch/vport-netdev.h       |   3 +
 net/openvswitch/vport-vxlan.c        |   2 +-
 16 files changed, 350 insertions(+), 130 deletions(-)

-- 
2.7.0.rc3.207.g0ac5344

_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v9 net-next 1/7] net: add skb_vlan_deaccel helper
  2016-05-04  7:36 [PATCH v9 net-next 0/7] openvswitch: support for layer 3 encapsulated packets Simon Horman
@ 2016-05-04  7:36 ` Simon Horman
  2016-05-04  7:36 ` [PATCH v9 net-next 2/7] openvswitch: set skb protocol when receiving on internal device Simon Horman
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 43+ messages in thread
From: Simon Horman @ 2016-05-04  7:36 UTC (permalink / raw)
  To: netdev, dev; +Cc: Simon Horman

This breaks out the bulk of skb_vlan_push into a separate helper.
This new helper moves any vlan tag present in metadata into packet data.
The existing skb_vlan_push uses this new helper and then pushes a new
vlan tag (into metadata).

The motivation is to allow deaccelerating VLAN tags without adding a new
one. This is in preparation for a push ethernet header support in Open
vSwitch.

Signed-off-by: Simon Horman <simon.horman@netronome.com>

---
v9 [Simon Horman]
* New patch
---
 include/linux/skbuff.h |  1 +
 net/core/skbuff.c      | 50 ++++++++++++++++++++++++++++++++------------------
 2 files changed, 33 insertions(+), 18 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index c413c588a24f..fa504d28c7e4 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2994,6 +2994,7 @@ int skb_vlan_pop(struct sk_buff *skb);
 int skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16 vlan_tci);
 struct sk_buff *pskb_extract(struct sk_buff *skb, int off, int to_copy,
 			     gfp_t gfp);
+int skb_vlan_deaccel(struct sk_buff *skb);
 
 static inline int memcpy_from_msg(void *data, struct msghdr *msg, int len)
 {
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 7a1d48983f81..84a3f8cc4b0a 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4516,30 +4516,44 @@ int skb_vlan_pop(struct sk_buff *skb)
 }
 EXPORT_SYMBOL(skb_vlan_pop);
 
-int skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16 vlan_tci)
+int skb_vlan_deaccel(struct sk_buff *skb)
 {
-	if (skb_vlan_tag_present(skb)) {
-		unsigned int offset = skb->data - skb_mac_header(skb);
-		int err;
+	unsigned int offset;
+	int err;
 
-		/* __vlan_insert_tag expect skb->data pointing to mac header.
-		 * So change skb->data before calling it and change back to
-		 * original position later
-		 */
-		__skb_push(skb, offset);
-		err = __vlan_insert_tag(skb, skb->vlan_proto,
-					skb_vlan_tag_get(skb));
-		if (err) {
-			__skb_pull(skb, offset);
-			return err;
-		}
+	if (!skb_vlan_tag_present(skb))
+		return 0;
 
-		skb->protocol = skb->vlan_proto;
-		skb->mac_len += VLAN_HLEN;
+	offset = skb->data - skb_mac_header(skb);
 
-		skb_postpush_rcsum(skb, skb->data + (2 * ETH_ALEN), VLAN_HLEN);
+	/* __vlan_insert_tag expect skb->data pointing to mac header.
+	 * So change skb->data before calling it and change back to
+	 * original position later
+	 */
+	__skb_push(skb, offset);
+	err = __vlan_insert_tag(skb, skb->vlan_proto, skb_vlan_tag_get(skb));
+	if (err) {
 		__skb_pull(skb, offset);
+		return err;
 	}
+
+	skb->protocol = skb->vlan_proto;
+	skb->mac_len += VLAN_HLEN;
+
+	skb_postpush_rcsum(skb, skb->data + (2 * ETH_ALEN), VLAN_HLEN);
+	__skb_pull(skb, offset);
+
+	return 0;
+}
+EXPORT_SYMBOL(skb_vlan_deaccel);
+
+int skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16 vlan_tci)
+{
+	int err;
+
+	err = skb_vlan_deaccel(skb);
+	if (err)
+		return err;
 	__vlan_hwaccel_put_tag(skb, vlan_proto, vlan_tci);
 	return 0;
 }
-- 
2.7.0.rc3.207.g0ac5344

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v9 net-next 2/7] openvswitch: set skb protocol when receiving on internal device
  2016-05-04  7:36 [PATCH v9 net-next 0/7] openvswitch: support for layer 3 encapsulated packets Simon Horman
  2016-05-04  7:36 ` [PATCH v9 net-next 1/7] net: add skb_vlan_deaccel helper Simon Horman
@ 2016-05-04  7:36 ` Simon Horman
  2016-05-04  7:36 ` [PATCH v9 net-next 3/7] openvswitch: add support to push and pop mpls for layer3 packets Simon Horman
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 43+ messages in thread
From: Simon Horman @ 2016-05-04  7:36 UTC (permalink / raw)
  To: netdev, dev; +Cc: Simon Horman

Set skb protocol based on contents of packet. I have observed this is
necessary to get actual protocol of a packet when it is injected into an
internal device e.g. by libnet in which case skb protocol will be set to
ETH_ALL.

The motivation for this is that support for outputting to layer 3 (non-tap)
GRE tunnels as implemented by a subsequent patch depends on protocol being
set correctly in order to provide the protocol for the inner packet.

Signed-off-by: Simon Horman <simon.horman@netronome.com>

---
v9
* New patch
---
 net/openvswitch/vport-internal_dev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
index 2ee48e447b72..4ce2ad8c3a5c 100644
--- a/net/openvswitch/vport-internal_dev.c
+++ b/net/openvswitch/vport-internal_dev.c
@@ -48,6 +48,8 @@ static int internal_dev_xmit(struct sk_buff *skb, struct net_device *netdev)
 {
 	int len, err;
 
+	skb->protocol = eth_type_trans(skb, netdev);
+	skb_push(skb, ETH_HLEN);
 	len = skb->len;
 	rcu_read_lock();
 	err = ovs_vport_receive(internal_dev_priv(netdev)->vport, skb, NULL);
-- 
2.7.0.rc3.207.g0ac5344

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v9 net-next 3/7] openvswitch: add support to push and pop mpls for layer3 packets
  2016-05-04  7:36 [PATCH v9 net-next 0/7] openvswitch: support for layer 3 encapsulated packets Simon Horman
  2016-05-04  7:36 ` [PATCH v9 net-next 1/7] net: add skb_vlan_deaccel helper Simon Horman
  2016-05-04  7:36 ` [PATCH v9 net-next 2/7] openvswitch: set skb protocol when receiving on internal device Simon Horman
@ 2016-05-04  7:36 ` Simon Horman
       [not found]   ` <1462347393-22354-4-git-send-email-simon.horman-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>
  2016-05-04  7:36 ` [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support Simon Horman
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 43+ messages in thread
From: Simon Horman @ 2016-05-04  7:36 UTC (permalink / raw)
  To: netdev, dev; +Cc: Simon Horman

Allow push and pop mpls actions to act on layer 3 packets by teaching
them not to access non-existent L2 headers of such packets.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
v9
* New Patch
---
 net/openvswitch/actions.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 879185fe183f..fe885a89f501 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -160,8 +160,10 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
 
 	skb_postpush_rcsum(skb, new_mpls_lse, MPLS_HLEN);
 
-	hdr = eth_hdr(skb);
-	hdr->h_proto = mpls->mpls_ethertype;
+	if (skb->mac_len) {
+		hdr = eth_hdr(skb);
+		hdr->h_proto = mpls->mpls_ethertype;
+	}
 
 	if (!skb->inner_protocol)
 		skb_set_inner_protocol(skb, skb->protocol);
@@ -174,7 +176,6 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
 static int pop_mpls(struct sk_buff *skb, struct sw_flow_key *key,
 		    const __be16 ethertype)
 {
-	struct ethhdr *hdr;
 	int err;
 
 	err = skb_ensure_writable(skb, skb->mac_len + MPLS_HLEN);
@@ -189,11 +190,16 @@ static int pop_mpls(struct sk_buff *skb, struct sw_flow_key *key,
 	__skb_pull(skb, MPLS_HLEN);
 	skb_reset_mac_header(skb);
 
-	/* skb_mpls_header() is used to locate the ethertype
-	 * field correctly in the presence of VLAN tags.
-	 */
-	hdr = (struct ethhdr *)(skb_mpls_header(skb) - ETH_HLEN);
-	hdr->h_proto = ethertype;
+	if (skb->mac_len) {
+		/* skb_mpls_header() is used to locate the ethertype
+		 * field correctly in the presence of VLAN tags.
+		 */
+		struct ethhdr *hdr;
+
+		hdr = (struct ethhdr *)(skb_mpls_header(skb) - ETH_HLEN);
+		BUG_ON((unsigned char *)hdr < skb_mac_header(skb));
+		hdr->h_proto = ethertype;
+	}
 	if (eth_p_mpls(skb->protocol))
 		skb->protocol = ethertype;
 
-- 
2.7.0.rc3.207.g0ac5344

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support
  2016-05-04  7:36 [PATCH v9 net-next 0/7] openvswitch: support for layer 3 encapsulated packets Simon Horman
                   ` (2 preceding siblings ...)
  2016-05-04  7:36 ` [PATCH v9 net-next 3/7] openvswitch: add support to push and pop mpls for layer3 packets Simon Horman
@ 2016-05-04  7:36 ` Simon Horman
       [not found]   ` <1462347393-22354-5-git-send-email-simon.horman-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>
                     ` (2 more replies)
       [not found] ` <1462347393-22354-1-git-send-email-simon.horman-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>
                   ` (2 subsequent siblings)
  6 siblings, 3 replies; 43+ messages in thread
From: Simon Horman @ 2016-05-04  7:36 UTC (permalink / raw)
  To: netdev, dev; +Cc: Lorand Jakab, Thomas Morin, Simon Horman

From: Lorand Jakab <lojakab@cisco.com>

Implementation of the pop_eth and push_eth actions in the kernel, and
layer 3 flow support.

This doesn't actually do anything yet as no layer 2 tunnel ports are
supported yet. The original patch by Lorand was against the Open vSwtich
tree which has L2 LISP tunnels but that is not supported in mainline Linux.
I (Simon) plan to follow up with support for non-TEB GRE ports based on
work by Thomas Morin.

Cc: Thomas Morin <thomas.morin@orange.com>
Signed-off-by: Lorand Jakab <lojakab@cisco.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>

---
v9 [Simon Horman]
* Rebase
* Minor coding style updates
* Prohibit push/pop MPLS on l3 packets
* There are no layer 3 ports supported at this time so only
  send and receive layer 2 packets: that is don't actually
  use this new infrastructure yet
* Expect that vports that can handle layer 3 packets will: have
  a type other than ARPHRD_IPETHER; can also handle layer 2 packets;
  and that packets can be differentiated by layer 2 packets having
  skb->protocol set to htons(ETH_P_TEB)

v1 - v8 [Lorand Jakub]
---
 include/uapi/linux/openvswitch.h     |  12 +++
 net/openvswitch/actions.c            |  48 ++++++++++++
 net/openvswitch/flow.c               |  63 ++++++++++-----
 net/openvswitch/flow.h               |   4 +-
 net/openvswitch/flow_netlink.c       | 148 +++++++++++++++++++++++++----------
 net/openvswitch/vport-geneve.c       |   2 +-
 net/openvswitch/vport-gre.c          |   2 +-
 net/openvswitch/vport-internal_dev.c |   6 ++
 net/openvswitch/vport-netdev.c       |  19 ++++-
 net/openvswitch/vport-netdev.h       |   2 +
 net/openvswitch/vport-vxlan.c        |   2 +-
 11 files changed, 240 insertions(+), 68 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index bb0d515b7654..53129c03273c 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -699,6 +699,16 @@ enum ovs_nat_attr {
 
 #define OVS_NAT_ATTR_MAX (__OVS_NAT_ATTR_MAX - 1)
 
+/*
+ * struct ovs_action_push_eth - %OVS_ACTION_ATTR_PUSH_ETH action argument.
+ * @addresses: Source and destination MAC addresses.
+ * @eth_type: Ethernet type
+ */
+struct ovs_action_push_eth {
+	struct ovs_key_ethernet addresses;
+	__be16   eth_type;
+};
+
 /**
  * enum ovs_action_attr - Action types.
  *
@@ -756,6 +766,8 @@ enum ovs_action_attr {
 				       * The data must be zero for the unmasked
 				       * bits. */
 	OVS_ACTION_ATTR_CT,           /* Nested OVS_CT_ATTR_* . */
+	OVS_ACTION_ATTR_PUSH_ETH,     /* struct ovs_action_push_eth. */
+	OVS_ACTION_ATTR_POP_ETH,      /* No argument. */
 
 	__OVS_ACTION_ATTR_MAX,	      /* Nothing past this will be accepted
 				       * from userspace. */
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index fe885a89f501..63d29263d51a 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -291,6 +291,46 @@ static int set_eth_addr(struct sk_buff *skb, struct sw_flow_key *flow_key,
 	return 0;
 }
 
+static int pop_eth(struct sk_buff *skb, struct sw_flow_key *key)
+{
+	skb_pull_rcsum(skb, ETH_HLEN);
+	skb_reset_mac_header(skb);
+	skb->mac_len -= ETH_HLEN;
+
+	invalidate_flow_key(key);
+	return 0;
+}
+
+static int push_eth(struct sk_buff *skb, struct sw_flow_key *key,
+		    const struct ovs_action_push_eth *ethh)
+{
+	int err;
+
+	/* De-accelerate any hardware accelerated VLAN tag added to a previous
+	 * Ethernet header */
+	err = skb_vlan_deaccel(skb);
+	if (unlikely(err))
+		return err;
+
+	/* Add the new Ethernet header */
+	if (skb_cow_head(skb, ETH_HLEN) < 0)
+		return -ENOMEM;
+
+	skb_push(skb, ETH_HLEN);
+	skb_reset_mac_header(skb);
+	skb_reset_mac_len(skb);
+
+	ether_addr_copy(eth_hdr(skb)->h_source, ethh->addresses.eth_src);
+	ether_addr_copy(eth_hdr(skb)->h_dest, ethh->addresses.eth_dst);
+	eth_hdr(skb)->h_proto = ethh->eth_type;
+
+	skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
+
+	skb->protocol = ethh->eth_type;
+	invalidate_flow_key(key);
+	return 0;
+}
+
 static void update_ip_l4_checksum(struct sk_buff *skb, struct iphdr *nh,
 				  __be32 addr, __be32 new_addr)
 {
@@ -1079,6 +1119,14 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
 			err = pop_vlan(skb, key);
 			break;
 
+		case OVS_ACTION_ATTR_PUSH_ETH:
+			err = push_eth(skb, key, nla_data(a));
+			break;
+
+		case OVS_ACTION_ATTR_POP_ETH:
+			err = pop_eth(skb, key);
+			break;
+
 		case OVS_ACTION_ATTR_RECIRC:
 			err = execute_recirc(dp, skb, key, a, rem);
 			if (nla_is_last(a, rem)) {
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 0ea128eeeab2..6e174ea5f2bb 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -468,28 +468,31 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
 
 	skb_reset_mac_header(skb);
 
-	/* Link layer.  We are guaranteed to have at least the 14 byte Ethernet
-	 * header in the linear data area.
-	 */
-	eth = eth_hdr(skb);
-	ether_addr_copy(key->eth.src, eth->h_source);
-	ether_addr_copy(key->eth.dst, eth->h_dest);
+	/* Link layer. */
+	if (key->phy.is_layer3) {
+		key->eth.tci = 0;
+		key->eth.type = skb->protocol;
+	} else {
+		eth = eth_hdr(skb);
+		ether_addr_copy(key->eth.src, eth->h_source);
+		ether_addr_copy(key->eth.dst, eth->h_dest);
 
-	__skb_pull(skb, 2 * ETH_ALEN);
-	/* We are going to push all headers that we pull, so no need to
-	 * update skb->csum here.
-	 */
+		__skb_pull(skb, 2 * ETH_ALEN);
+		/* We are going to push all headers that we pull, so no need to
+		 * update skb->csum here.
+		 */
 
-	key->eth.tci = 0;
-	if (skb_vlan_tag_present(skb))
-		key->eth.tci = htons(skb->vlan_tci);
-	else if (eth->h_proto == htons(ETH_P_8021Q))
-		if (unlikely(parse_vlan(skb, key)))
-			return -ENOMEM;
+		key->eth.tci = 0;
+		if (skb_vlan_tag_present(skb))
+			key->eth.tci = htons(skb->vlan_tci);
+		else if (eth->h_proto == htons(ETH_P_8021Q))
+			if (unlikely(parse_vlan(skb, key)))
+				return -ENOMEM;
 
-	key->eth.type = parse_ethertype(skb);
-	if (unlikely(key->eth.type == htons(0)))
-		return -ENOMEM;
+		key->eth.type = parse_ethertype(skb);
+		if (unlikely(key->eth.type == htons(0)))
+			return -ENOMEM;
+	}
 
 	skb_reset_network_header(skb);
 	skb_reset_mac_len(skb);
@@ -696,11 +699,23 @@ int ovs_flow_key_update(struct sk_buff *skb, struct sw_flow_key *key)
 int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
 			 struct sk_buff *skb, struct sw_flow_key *key)
 {
+	bool is_layer3 = false;
+	bool is_teb = false;
+	int err;
+
 	/* Extract metadata from packet. */
 	if (tun_info) {
 		key->tun_proto = ip_tunnel_info_af(tun_info);
 		memcpy(&key->tun_key, &tun_info->key, sizeof(key->tun_key));
 
+		if (OVS_CB(skb)->input_vport->dev->type != ARPHRD_ETHER) {
+			if (skb->protocol == htons(ETH_P_TEB))
+				is_teb = true;
+			else
+				is_layer3 = true;
+		}
+
+
 		if (tun_info->options_len) {
 			BUILD_BUG_ON((1 << (sizeof(tun_info->options_len) *
 						   8)) - 1
@@ -723,9 +738,17 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
 	key->phy.skb_mark = skb->mark;
 	ovs_ct_fill_key(skb, key);
 	key->ovs_flow_hash = 0;
+	key->phy.is_layer3 = is_layer3;
 	key->recirc_id = 0;
 
-	return key_extract(skb, key);
+	err = key_extract(skb, key);
+	if (err < 0)
+		return err;
+
+	if (is_teb)
+		skb->protocol = key->eth.type;
+
+	return err;
 }
 
 int ovs_flow_key_extract_userspace(struct net *net, const struct nlattr *attr,
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index 03378e75a67c..5395ec0c3c13 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -62,6 +62,7 @@ struct sw_flow_key {
 		u32	priority;	/* Packet QoS priority. */
 		u32	skb_mark;	/* SKB mark. */
 		u16	in_port;	/* Input switch port (or DP_MAX_PORTS). */
+		bool	is_layer3;	/* Packet has no Ethernet header */
 	} __packed phy; /* Safe when right after 'tun_key'. */
 	u8 tun_proto;			/* Protocol of encapsulating tunnel. */
 	u32 ovs_flow_hash;		/* Datapath computed hash value.  */
@@ -219,8 +220,7 @@ u64 ovs_flow_used_time(unsigned long flow_jiffies);
 
 int ovs_flow_key_update(struct sk_buff *skb, struct sw_flow_key *key);
 int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
-			 struct sk_buff *skb,
-			 struct sw_flow_key *key);
+			 struct sk_buff *skb, struct sw_flow_key *key);
 /* Extract key from packet coming from userspace. */
 int ovs_flow_key_extract_userspace(struct net *net, const struct nlattr *attr,
 				   struct sk_buff *skb,
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 0bb650f4f219..d0ef4fa69bea 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -123,7 +123,7 @@ static void update_range(struct sw_flow_match *match,
 static bool match_validate(const struct sw_flow_match *match,
 			   u64 key_attrs, u64 mask_attrs, bool log)
 {
-	u64 key_expected = 1 << OVS_KEY_ATTR_ETHERNET;
+	u64 key_expected = 0;
 	u64 mask_allowed = key_attrs;  /* At most allow all key attributes */
 
 	/* The following mask attributes allowed only if they
@@ -145,6 +145,10 @@ static bool match_validate(const struct sw_flow_match *match,
 		       | (1 << OVS_KEY_ATTR_IN_PORT)
 		       | (1 << OVS_KEY_ATTR_ETHERTYPE));
 
+	/* If Ethertype is present, expect MAC addresses */
+	if (key_attrs & (1ULL << OVS_KEY_ATTR_ETHERTYPE))
+		key_expected |= 1ULL << OVS_KEY_ATTR_ETHERNET;
+
 	/* Check key attributes. */
 	if (match->key->eth.type == htons(ETH_P_ARP)
 			|| match->key->eth.type == htons(ETH_P_RARP)) {
@@ -898,6 +902,15 @@ static int metadata_from_nlattrs(struct net *net, struct sw_flow_match *match,
 				   sizeof(*cl), is_mask);
 		*attrs &= ~(1ULL << OVS_KEY_ATTR_CT_LABELS);
 	}
+	if (is_mask) {
+		/* Always exact match is_layer3 */
+		SW_FLOW_KEY_PUT(match, phy.is_layer3, true, is_mask);
+	} else {
+		if (*attrs & (1ULL << OVS_KEY_ATTR_ETHERNET))
+			SW_FLOW_KEY_PUT(match, phy.is_layer3, false, is_mask);
+		else
+			SW_FLOW_KEY_PUT(match, phy.is_layer3, true, is_mask);
+	}
 	return 0;
 }
 
@@ -961,6 +974,17 @@ static int ovs_key_from_nlattrs(struct net *net, struct sw_flow_match *match,
 	if (attrs & (1 << OVS_KEY_ATTR_IPV4)) {
 		const struct ovs_key_ipv4 *ipv4_key;
 
+		/* Add eth.type value for layer 3 flows */
+		if (!(attrs & (1ULL << OVS_KEY_ATTR_ETHERTYPE))) {
+			__be16 eth_type;
+
+			if (is_mask)
+				eth_type = htons(0xffff);
+			else
+				eth_type = htons(ETH_P_IP);
+			SW_FLOW_KEY_PUT(match, eth.type, eth_type, is_mask);
+		}
+
 		ipv4_key = nla_data(a[OVS_KEY_ATTR_IPV4]);
 		if (!is_mask && ipv4_key->ipv4_frag > OVS_FRAG_TYPE_MAX) {
 			OVS_NLERR(log, "IPv4 frag type %d is out of range max %d",
@@ -985,6 +1009,17 @@ static int ovs_key_from_nlattrs(struct net *net, struct sw_flow_match *match,
 	if (attrs & (1 << OVS_KEY_ATTR_IPV6)) {
 		const struct ovs_key_ipv6 *ipv6_key;
 
+		/* Add eth.type value for layer 3 flows */
+		if (!(attrs & (1ULL << OVS_KEY_ATTR_ETHERTYPE))) {
+			__be16 eth_type;
+
+			if (is_mask)
+				eth_type = htons(0xffff);
+			else
+				eth_type = htons(ETH_P_IPV6);
+			SW_FLOW_KEY_PUT(match, eth.type, eth_type, is_mask);
+		}
+
 		ipv6_key = nla_data(a[OVS_KEY_ATTR_IPV6]);
 		if (!is_mask && ipv6_key->ipv6_frag > OVS_FRAG_TYPE_MAX) {
 			OVS_NLERR(log, "IPv6 frag type %d is out of range max %d",
@@ -1415,7 +1450,7 @@ static int __ovs_nla_put_key(const struct sw_flow_key *swkey,
 			     struct sk_buff *skb)
 {
 	struct ovs_key_ethernet *eth_key;
-	struct nlattr *nla, *encap;
+	struct nlattr *nla, *encap = NULL;
 
 	if (nla_put_u32(skb, OVS_KEY_ATTR_RECIRC_ID, output->recirc_id))
 		goto nla_put_failure;
@@ -1456,42 +1491,44 @@ static int __ovs_nla_put_key(const struct sw_flow_key *swkey,
 	if (ovs_ct_put_key(output, skb))
 		goto nla_put_failure;
 
-	nla = nla_reserve(skb, OVS_KEY_ATTR_ETHERNET, sizeof(*eth_key));
-	if (!nla)
-		goto nla_put_failure;
+	if (!swkey->phy.is_layer3) {
+		nla = nla_reserve(skb, OVS_KEY_ATTR_ETHERNET, sizeof(*eth_key));
+		if (!nla)
+			goto nla_put_failure;
 
-	eth_key = nla_data(nla);
-	ether_addr_copy(eth_key->eth_src, output->eth.src);
-	ether_addr_copy(eth_key->eth_dst, output->eth.dst);
+		eth_key = nla_data(nla);
+		ether_addr_copy(eth_key->eth_src, output->eth.src);
+		ether_addr_copy(eth_key->eth_dst, output->eth.dst);
 
-	if (swkey->eth.tci || swkey->eth.type == htons(ETH_P_8021Q)) {
-		__be16 eth_type;
-		eth_type = !is_mask ? htons(ETH_P_8021Q) : htons(0xffff);
-		if (nla_put_be16(skb, OVS_KEY_ATTR_ETHERTYPE, eth_type) ||
-		    nla_put_be16(skb, OVS_KEY_ATTR_VLAN, output->eth.tci))
-			goto nla_put_failure;
-		encap = nla_nest_start(skb, OVS_KEY_ATTR_ENCAP);
-		if (!swkey->eth.tci)
-			goto unencap;
-	} else
-		encap = NULL;
-
-	if (swkey->eth.type == htons(ETH_P_802_2)) {
-		/*
-		 * Ethertype 802.2 is represented in the netlink with omitted
-		 * OVS_KEY_ATTR_ETHERTYPE in the flow key attribute, and
-		 * 0xffff in the mask attribute.  Ethertype can also
-		 * be wildcarded.
-		 */
-		if (is_mask && output->eth.type)
-			if (nla_put_be16(skb, OVS_KEY_ATTR_ETHERTYPE,
-						output->eth.type))
+		if (swkey->eth.tci || swkey->eth.type == htons(ETH_P_8021Q)) {
+			__be16 eth_type;
+			eth_type = !is_mask ? htons(ETH_P_8021Q) : htons(0xffff);
+			if (nla_put_be16(skb, OVS_KEY_ATTR_ETHERTYPE, eth_type) ||
+			    nla_put_be16(skb, OVS_KEY_ATTR_VLAN,
+					 output->eth.tci))
 				goto nla_put_failure;
-		goto unencap;
-	}
+			encap = nla_nest_start(skb, OVS_KEY_ATTR_ENCAP);
+			if (!swkey->eth.tci)
+				goto unencap;
+		}
 
-	if (nla_put_be16(skb, OVS_KEY_ATTR_ETHERTYPE, output->eth.type))
-		goto nla_put_failure;
+		if (swkey->eth.type == htons(ETH_P_802_2)) {
+			/*
+			 * Ethertype 802.2 is represented in the netlink
+			 * with omitted OVS_KEY_ATTR_ETHERTYPE in the flow
+			 * key attribute, and 0xffff in the mask attribute.
+			 * Ethertype can also be wildcarded.
+			 */
+			if (is_mask && output->eth.type)
+				if (nla_put_be16(skb, OVS_KEY_ATTR_ETHERTYPE,
+						 output->eth.type))
+					goto nla_put_failure;
+			goto unencap;
+		}
+
+		if (nla_put_be16(skb, OVS_KEY_ATTR_ETHERTYPE, output->eth.type))
+			goto nla_put_failure;
+	}
 
 	if (swkey->eth.type == htons(ETH_P_IP)) {
 		struct ovs_key_ipv4 *ipv4_key;
@@ -2010,8 +2047,8 @@ static bool validate_masked(u8 *data, int len)
 
 static int validate_set(const struct nlattr *a,
 			const struct sw_flow_key *flow_key,
-			struct sw_flow_actions **sfa,
-			bool *skip_copy, __be16 eth_type, bool masked, bool log)
+			struct sw_flow_actions **sfa, bool *skip_copy,
+			__be16 eth_type, bool masked, bool log, bool is_layer3)
 {
 	const struct nlattr *ovs_key = nla_data(a);
 	int key_type = nla_type(ovs_key);
@@ -2041,7 +2078,11 @@ static int validate_set(const struct nlattr *a,
 	case OVS_KEY_ATTR_SKB_MARK:
 	case OVS_KEY_ATTR_CT_MARK:
 	case OVS_KEY_ATTR_CT_LABELS:
+		break;
+
 	case OVS_KEY_ATTR_ETHERNET:
+		if (is_layer3)
+			return -EINVAL;
 		break;
 
 	case OVS_KEY_ATTR_TUNNEL:
@@ -2120,6 +2161,8 @@ static int validate_set(const struct nlattr *a,
 		break;
 
 	case OVS_KEY_ATTR_MPLS:
+		if (is_layer3)
+			return -EINVAL;
 		if (!eth_p_mpls(eth_type))
 			return -EINVAL;
 		break;
@@ -2208,6 +2251,7 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 				  int depth, struct sw_flow_actions **sfa,
 				  __be16 eth_type, __be16 vlan_tci, bool log)
 {
+	bool is_layer3 = key->phy.is_layer3;
 	const struct nlattr *a;
 	int rem, err;
 
@@ -2229,6 +2273,8 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 			[OVS_ACTION_ATTR_SAMPLE] = (u32)-1,
 			[OVS_ACTION_ATTR_HASH] = sizeof(struct ovs_action_hash),
 			[OVS_ACTION_ATTR_CT] = (u32)-1,
+			[OVS_ACTION_ATTR_PUSH_ETH] = sizeof(struct ovs_action_push_eth),
+			[OVS_ACTION_ATTR_POP_ETH] = 0,
 		};
 		const struct ovs_action_push_vlan *vlan;
 		int type = nla_type(a);
@@ -2269,10 +2315,14 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 		}
 
 		case OVS_ACTION_ATTR_POP_VLAN:
+			if (is_layer3)
+				return -EINVAL;
 			vlan_tci = htons(0);
 			break;
 
 		case OVS_ACTION_ATTR_PUSH_VLAN:
+			if (is_layer3)
+				return -EINVAL;
 			vlan = nla_data(a);
 			if (vlan->vlan_tpid != htons(ETH_P_8021Q))
 				return -EINVAL;
@@ -2287,7 +2337,7 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 		case OVS_ACTION_ATTR_PUSH_MPLS: {
 			const struct ovs_action_push_mpls *mpls = nla_data(a);
 
-			if (!eth_p_mpls(mpls->mpls_ethertype))
+			if (is_layer3 || !eth_p_mpls(mpls->mpls_ethertype))
 				return -EINVAL;
 			/* Prohibit push MPLS other than to a white list
 			 * for packets that have a known tag order.
@@ -2304,7 +2354,7 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 		}
 
 		case OVS_ACTION_ATTR_POP_MPLS:
-			if (vlan_tci & htons(VLAN_TAG_PRESENT) ||
+			if (is_layer3 || vlan_tci & htons(VLAN_TAG_PRESENT) ||
 			    !eth_p_mpls(eth_type))
 				return -EINVAL;
 
@@ -2322,14 +2372,16 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 
 		case OVS_ACTION_ATTR_SET:
 			err = validate_set(a, key, sfa,
-					   &skip_copy, eth_type, false, log);
+					   &skip_copy, eth_type, false, log,
+					   is_layer3);
 			if (err)
 				return err;
 			break;
 
 		case OVS_ACTION_ATTR_SET_MASKED:
 			err = validate_set(a, key, sfa,
-					   &skip_copy, eth_type, true, log);
+					   &skip_copy, eth_type, true, log,
+					   is_layer3);
 			if (err)
 				return err;
 			break;
@@ -2349,6 +2401,22 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 			skip_copy = true;
 			break;
 
+		case OVS_ACTION_ATTR_POP_ETH:
+			if (is_layer3)
+				return -EINVAL;
+			if (vlan_tci & htons(VLAN_TAG_PRESENT))
+				return -EINVAL;
+			is_layer3 = true;
+			break;
+
+		case OVS_ACTION_ATTR_PUSH_ETH:
+			/* For now disallow pushing an Ethernet header if one
+			 * is already present */
+			if (!is_layer3)
+				return -EINVAL;
+			is_layer3 = false;
+			break;
+
 		default:
 			OVS_NLERR(log, "Unknown Action type %d", type);
 			return -EINVAL;
diff --git a/net/openvswitch/vport-geneve.c b/net/openvswitch/vport-geneve.c
index 1a1fcec88695..906ae9c58c2e 100644
--- a/net/openvswitch/vport-geneve.c
+++ b/net/openvswitch/vport-geneve.c
@@ -116,7 +116,7 @@ static struct vport_ops ovs_geneve_vport_ops = {
 	.create		= geneve_create,
 	.destroy	= ovs_netdev_tunnel_destroy,
 	.get_options	= geneve_get_options,
-	.send		= dev_queue_xmit,
+	.send		= ovs_netdev_send_tap,
 };
 
 static int __init ovs_geneve_tnl_init(void)
diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c
index 7f8897f33a67..f003225de994 100644
--- a/net/openvswitch/vport-gre.c
+++ b/net/openvswitch/vport-gre.c
@@ -87,7 +87,7 @@ static struct vport *gre_create(const struct vport_parms *parms)
 static struct vport_ops ovs_gre_vport_ops = {
 	.type		= OVS_VPORT_TYPE_GRE,
 	.create		= gre_create,
-	.send		= dev_queue_xmit,
+	.send		= ovs_netdev_send_tap,
 	.destroy	= ovs_netdev_tunnel_destroy,
 };
 
diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
index 4ce2ad8c3a5c..a20f090bd971 100644
--- a/net/openvswitch/vport-internal_dev.c
+++ b/net/openvswitch/vport-internal_dev.c
@@ -256,6 +256,12 @@ static netdev_tx_t internal_dev_recv(struct sk_buff *skb)
 	struct net_device *netdev = skb->dev;
 	struct pcpu_sw_netstats *stats;
 
+	/* Only send/receive L2 packets */
+	if (!skb->mac_len) {
+		kfree_skb(skb);
+		return -EINVAL;
+	}
+
 	if (unlikely(!(netdev->flags & IFF_UP))) {
 		kfree_skb(skb);
 		netdev->stats.rx_dropped++;
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index 4e3972344aa6..0e0b9286dd11 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -57,8 +57,10 @@ static void netdev_port_receive(struct sk_buff *skb)
 	if (unlikely(!skb))
 		return;
 
-	skb_push(skb, ETH_HLEN);
-	skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
+	if (vport->dev->type == ARPHRD_ETHER) {
+		skb_push(skb, ETH_HLEN);
+		skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
+	}
 	ovs_vport_receive(vport, skb, skb_tunnel_info(skb));
 	return;
 error:
@@ -194,6 +196,17 @@ void ovs_netdev_tunnel_destroy(struct vport *vport)
 }
 EXPORT_SYMBOL_GPL(ovs_netdev_tunnel_destroy);
 
+int ovs_netdev_send_tap(struct sk_buff *skb)
+{
+	/* Only send L2 packets */
+	if (skb->mac_len)
+		return dev_queue_xmit(skb);
+
+	kfree_skb(skb);
+	return -EINVAL;
+}
+EXPORT_SYMBOL_GPL(ovs_netdev_send_tap);
+
 /* Returns null if this device is not attached to a datapath. */
 struct vport *ovs_netdev_get_vport(struct net_device *dev)
 {
@@ -208,7 +221,7 @@ static struct vport_ops ovs_netdev_vport_ops = {
 	.type		= OVS_VPORT_TYPE_NETDEV,
 	.create		= netdev_create,
 	.destroy	= netdev_destroy,
-	.send		= dev_queue_xmit,
+	.send		= ovs_netdev_send_tap,
 };
 
 int __init ovs_netdev_init(void)
diff --git a/net/openvswitch/vport-netdev.h b/net/openvswitch/vport-netdev.h
index 19e29c12adcc..02f38a822334 100644
--- a/net/openvswitch/vport-netdev.h
+++ b/net/openvswitch/vport-netdev.h
@@ -33,4 +33,6 @@ int __init ovs_netdev_init(void);
 void ovs_netdev_exit(void);
 
 void ovs_netdev_tunnel_destroy(struct vport *vport);
+
+int ovs_netdev_send_tap(struct sk_buff *skb);
 #endif /* vport_netdev.h */
diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c
index 5eb7694348b5..009a4aab505a 100644
--- a/net/openvswitch/vport-vxlan.c
+++ b/net/openvswitch/vport-vxlan.c
@@ -153,7 +153,7 @@ static struct vport_ops ovs_vxlan_netdev_vport_ops = {
 	.create			= vxlan_create,
 	.destroy		= ovs_netdev_tunnel_destroy,
 	.get_options		= vxlan_get_options,
-	.send			= dev_queue_xmit,
+	.send			= ovs_netdev_send_tap,
 };
 
 static int __init ovs_vxlan_tnl_init(void)
-- 
2.7.0.rc3.207.g0ac5344

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v9 net-next 5/7] openvswitch: add layer 3 support to ovs_packet_cmd_execute()
       [not found] ` <1462347393-22354-1-git-send-email-simon.horman-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>
@ 2016-05-04  7:36   ` Simon Horman
       [not found]     ` <1462347393-22354-6-git-send-email-simon.horman-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>
  0 siblings, 1 reply; 43+ messages in thread
From: Simon Horman @ 2016-05-04  7:36 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA, dev-yBygre7rU0TnMu66kgdUjQ; +Cc: Simon Horman

From: Lorand Jakab <lojakab@cisco.com>

When user space handles a packet back to the kernel datapath for
execution, it is not accompanied by the full flow key, only packet
metadata.  This doesn't allow detection of packet type (L2/L3).  Add the
OVS_KEY_ATTR_PACKET_ETHERTYPE attribute to the packet metadata,
containing 0 for layer 2 packets and the Ethertype for layer 3 packets.

Signed-off-by: Lorand Jakab <lojakab@cisco.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
v9 [Simon Horman]
* Rebase

v8 [Lorand Jakab]

v7 [Lorand Jakab]
* New patch
---
 include/uapi/linux/openvswitch.h |  2 ++
 net/openvswitch/datapath.c       | 13 +------------
 net/openvswitch/flow.c           |  5 ++++-
 net/openvswitch/flow_netlink.c   | 39 +++++++++++++++++++++++++++++++--------
 4 files changed, 38 insertions(+), 21 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 53129c03273c..2bc2c12ced78 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -329,6 +329,8 @@ enum ovs_key_attr {
 	OVS_KEY_ATTR_CT_ZONE,	/* u16 connection tracking zone. */
 	OVS_KEY_ATTR_CT_MARK,	/* u32 connection tracking mark */
 	OVS_KEY_ATTR_CT_LABELS,	/* 16-octet connection tracking label */
+	OVS_KEY_ATTR_PACKET_ETHERTYPE,  /* be16 Ethernet type for packet
+					 * execution */
 
 #ifdef __KERNEL__
 	OVS_KEY_ATTR_TUNNEL_INFO,  /* struct ip_tunnel_info */
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 856bd8dba676..780957e78d2b 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -546,7 +546,6 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 	struct sw_flow *flow;
 	struct sw_flow_actions *sf_acts;
 	struct datapath *dp;
-	struct ethhdr *eth;
 	struct vport *input_vport;
 	u16 mru = 0;
 	int len;
@@ -567,17 +566,6 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 
 	nla_memcpy(__skb_put(packet, len), a[OVS_PACKET_ATTR_PACKET], len);
 
-	skb_reset_mac_header(packet);
-	eth = eth_hdr(packet);
-
-	/* Normally, setting the skb 'protocol' field would be handled by a
-	 * call to eth_type_trans(), but it assumes there's a sending
-	 * device, which we may not have. */
-	if (eth_proto_is_802_3(eth->h_proto))
-		packet->protocol = eth->h_proto;
-	else
-		packet->protocol = htons(ETH_P_802_2);
-
 	/* Set packet's mru */
 	if (a[OVS_PACKET_ATTR_MRU]) {
 		mru = nla_get_u16(a[OVS_PACKET_ATTR_MRU]);
@@ -604,6 +592,7 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 	rcu_assign_pointer(flow->sf_acts, acts);
 	packet->priority = flow->key.phy.priority;
 	packet->mark = flow->key.phy.skb_mark;
+	packet->protocol = flow->key.eth.type;
 
 	rcu_read_lock();
 	dp = get_dp_rcu(net, ovs_header->dp_ifindex);
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 6e174ea5f2bb..d320c2657627 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -471,7 +471,6 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
 	/* Link layer. */
 	if (key->phy.is_layer3) {
 		key->eth.tci = 0;
-		key->eth.type = skb->protocol;
 	} else {
 		eth = eth_hdr(skb);
 		ether_addr_copy(key->eth.src, eth->h_source);
@@ -693,6 +692,8 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
 
 int ovs_flow_key_update(struct sk_buff *skb, struct sw_flow_key *key)
 {
+	key->eth.type = skb->protocol;
+
 	return key_extract(skb, key);
 }
 
@@ -747,6 +748,8 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
 
 	if (is_teb)
 		skb->protocol = key->eth.type;
+	else if (is_layer3)
+		key->eth.type = skb->protocol;
 
 	return err;
 }
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index d0ef4fa69bea..2bca1e5e9a18 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -286,7 +286,7 @@ size_t ovs_key_attr_size(void)
 	/* Whenever adding new OVS_KEY_ FIELDS, we should consider
 	 * updating this function.
 	 */
-	BUILD_BUG_ON(OVS_KEY_ATTR_TUNNEL_INFO != 26);
+	BUILD_BUG_ON(OVS_KEY_ATTR_TUNNEL_INFO != 27);
 
 	return    nla_total_size(4)   /* OVS_KEY_ATTR_PRIORITY */
 		+ nla_total_size(0)   /* OVS_KEY_ATTR_TUNNEL */
@@ -359,6 +359,7 @@ static const struct ovs_len_tbl ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
 	[OVS_KEY_ATTR_CT_ZONE]	 = { .len = sizeof(u16) },
 	[OVS_KEY_ATTR_CT_MARK]	 = { .len = sizeof(u32) },
 	[OVS_KEY_ATTR_CT_LABELS] = { .len = sizeof(struct ovs_key_ct_labels) },
+	[OVS_KEY_ATTR_PACKET_ETHERTYPE] = { .len = sizeof(__be16) },
 };
 
 static bool check_attr_len(unsigned int attr_len, unsigned int expected_len)
@@ -816,6 +817,8 @@ static int metadata_from_nlattrs(struct net *net, struct sw_flow_match *match,
 				 u64 *attrs, const struct nlattr **a,
 				 bool is_mask, bool log)
 {
+	bool is_layer3;
+
 	if (*attrs & (1 << OVS_KEY_ATTR_DP_HASH)) {
 		u32 hash_val = nla_get_u32(a[OVS_KEY_ATTR_DP_HASH]);
 
@@ -902,15 +905,35 @@ static int metadata_from_nlattrs(struct net *net, struct sw_flow_match *match,
 				   sizeof(*cl), is_mask);
 		*attrs &= ~(1ULL << OVS_KEY_ATTR_CT_LABELS);
 	}
-	if (is_mask) {
+
+	/* For full flow keys the layer is determined based on the presence of
+	 * OVS_KEY_ATTR_ETHERNET
+	 */
+	if (is_mask)
 		/* Always exact match is_layer3 */
-		SW_FLOW_KEY_PUT(match, phy.is_layer3, true, is_mask);
-	} else {
-		if (*attrs & (1ULL << OVS_KEY_ATTR_ETHERNET))
-			SW_FLOW_KEY_PUT(match, phy.is_layer3, false, is_mask);
-		else
-			SW_FLOW_KEY_PUT(match, phy.is_layer3, true, is_mask);
+		is_layer3 = true;
+	else
+		is_layer3 =  !(*attrs & (1ULL << OVS_KEY_ATTR_ETHERNET));
+	/* Packets from user space for execution only have metadata key
+	 * attributes.  OVS_KEY_ATTR_PACKET_ETHERTYPE is then used to specify
+	 * the starting layer of the packet.  Packets with Ethernet headers
+	 * have this attribute set to 0
+	 */
+	if (*attrs & (1ULL << OVS_KEY_ATTR_PACKET_ETHERTYPE)) {
+		__be16 eth_type;
+
+		if (is_mask) {
+			/* Always exact match packet EtherType */
+			eth_type = htons(0xffff);
+		} else {
+			eth_type = nla_get_be16(a[OVS_KEY_ATTR_PACKET_ETHERTYPE]);
+			is_layer3 = ((eth_type == htons(ETH_P_IP)) ||
+				     (eth_type == htons(ETH_P_IPV6)));
+		}
+		SW_FLOW_KEY_PUT(match, eth.type, eth_type, is_mask);
 	}
+
+	SW_FLOW_KEY_PUT(match, phy.is_layer3, is_layer3, is_mask);
 	return 0;
 }
 
-- 
2.7.0.rc3.207.g0ac5344

_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v9 net-next 6/7] openvswitch: extend layer 3 support to cover non-IP packets
  2016-05-04  7:36 [PATCH v9 net-next 0/7] openvswitch: support for layer 3 encapsulated packets Simon Horman
                   ` (4 preceding siblings ...)
       [not found] ` <1462347393-22354-1-git-send-email-simon.horman-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>
@ 2016-05-04  7:36 ` Simon Horman
  2016-05-04  7:36 ` [PATCH v9 net-next 7/7] openvswitch: use ipgre tunnel rather than gretap tunnel Simon Horman
  6 siblings, 0 replies; 43+ messages in thread
From: Simon Horman @ 2016-05-04  7:36 UTC (permalink / raw)
  To: netdev, dev; +Cc: Simon Horman

Extend support for layer 3 packets to cover non-IP packets.

This removes the assumption that the first octet of a layer 3 packet
indicates the IP protocol version - true for IP (v4 and v6), but not
for necessarily for other protocols.

The key motivation for this is to allow forwarding of MPLS packets which
are technically layer 2.5 rather than 3 but the distinction seems unimportant
here.

This sets OVS_KEY_ATTR_PACKET_ETHERTYPE to the ethernet type corresponding
to the protocol of layer 3 packets on a flow miss.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
v9
* New patch
---
 net/openvswitch/flow_netlink.c | 121 ++++++++++++++++++++---------------------
 1 file changed, 59 insertions(+), 62 deletions(-)

diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 2bca1e5e9a18..1e1392c3c0ed 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -817,7 +817,7 @@ static int metadata_from_nlattrs(struct net *net, struct sw_flow_match *match,
 				 u64 *attrs, const struct nlattr **a,
 				 bool is_mask, bool log)
 {
-	bool is_layer3;
+	bool is_layer3 = false;
 
 	if (*attrs & (1 << OVS_KEY_ATTR_DP_HASH)) {
 		u32 hash_val = nla_get_u32(a[OVS_KEY_ATTR_DP_HASH]);
@@ -906,48 +906,40 @@ static int metadata_from_nlattrs(struct net *net, struct sw_flow_match *match,
 		*attrs &= ~(1ULL << OVS_KEY_ATTR_CT_LABELS);
 	}
 
-	/* For full flow keys the layer is determined based on the presence of
-	 * OVS_KEY_ATTR_ETHERNET
-	 */
-	if (is_mask)
-		/* Always exact match is_layer3 */
-		is_layer3 = true;
-	else
-		is_layer3 =  !(*attrs & (1ULL << OVS_KEY_ATTR_ETHERNET));
-	/* Packets from user space for execution only have metadata key
-	 * attributes.  OVS_KEY_ATTR_PACKET_ETHERTYPE is then used to specify
-	 * the starting layer of the packet.  Packets with Ethernet headers
-	 * have this attribute set to 0
+	/* For layer 3 packets the ethernet type is provided by by the
+	 * metadata field OVS_KEY_ATTR_PACKET_ETHERTYPE which has a
+	 * non-zero value. Otherwise the ethernet type is provided by the
+	 * packet and reflected by OVS_KEY_ATTR_PACKET_ETHERTYPE.
 	 */
 	if (*attrs & (1ULL << OVS_KEY_ATTR_PACKET_ETHERTYPE)) {
+		/* This is duplicate code from ovs_key_from_nlattrs*/
 		__be16 eth_type;
 
-		if (is_mask) {
+		if (is_mask)
 			/* Always exact match packet EtherType */
 			eth_type = htons(0xffff);
-		} else {
+		else
 			eth_type = nla_get_be16(a[OVS_KEY_ATTR_PACKET_ETHERTYPE]);
-			is_layer3 = ((eth_type == htons(ETH_P_IP)) ||
-				     (eth_type == htons(ETH_P_IPV6)));
+
+		if (eth_type != htons(0)) {
+			is_layer3 = true;
+			SW_FLOW_KEY_PUT(match, eth.type, eth_type, is_mask);
 		}
-		SW_FLOW_KEY_PUT(match, eth.type, eth_type, is_mask);
+
+		*attrs &= ~(1ULL << OVS_KEY_ATTR_PACKET_ETHERTYPE);
 	}
 
-	SW_FLOW_KEY_PUT(match, phy.is_layer3, is_layer3, is_mask);
-	return 0;
+	/* Always exact match is_layer3 */
+	SW_FLOW_KEY_PUT(match, phy.is_layer3, is_mask ? true : is_layer3,
+			is_mask);
+	return is_layer3;
 }
 
-static int ovs_key_from_nlattrs(struct net *net, struct sw_flow_match *match,
-				u64 attrs, const struct nlattr **a,
-				bool is_mask, bool log)
+static int l2_from_nlattrs(struct net *net, struct sw_flow_match *match,
+			   u64 *attrs, const struct nlattr **a,
+			   bool is_mask, bool log)
 {
-	int err;
-
-	err = metadata_from_nlattrs(net, match, &attrs, a, is_mask, log);
-	if (err)
-		return err;
-
-	if (attrs & (1 << OVS_KEY_ATTR_ETHERNET)) {
+	if (*attrs & (1 << OVS_KEY_ATTR_ETHERNET)) {
 		const struct ovs_key_ethernet *eth_key;
 
 		eth_key = nla_data(a[OVS_KEY_ATTR_ETHERNET]);
@@ -955,10 +947,10 @@ static int ovs_key_from_nlattrs(struct net *net, struct sw_flow_match *match,
 				eth_key->eth_src, ETH_ALEN, is_mask);
 		SW_FLOW_KEY_MEMCPY(match, eth.dst,
 				eth_key->eth_dst, ETH_ALEN, is_mask);
-		attrs &= ~(1 << OVS_KEY_ATTR_ETHERNET);
+		*attrs &= ~(1 << OVS_KEY_ATTR_ETHERNET);
 	}
 
-	if (attrs & (1 << OVS_KEY_ATTR_VLAN)) {
+	if (*attrs & (1 << OVS_KEY_ATTR_VLAN)) {
 		__be16 tci;
 
 		tci = nla_get_be16(a[OVS_KEY_ATTR_VLAN]);
@@ -972,10 +964,10 @@ static int ovs_key_from_nlattrs(struct net *net, struct sw_flow_match *match,
 		}
 
 		SW_FLOW_KEY_PUT(match, eth.tci, tci, is_mask);
-		attrs &= ~(1 << OVS_KEY_ATTR_VLAN);
+		*attrs &= ~(1 << OVS_KEY_ATTR_VLAN);
 	}
 
-	if (attrs & (1 << OVS_KEY_ATTR_ETHERTYPE)) {
+	if (*attrs & (1 << OVS_KEY_ATTR_ETHERTYPE)) {
 		__be16 eth_type;
 
 		eth_type = nla_get_be16(a[OVS_KEY_ATTR_ETHERTYPE]);
@@ -989,24 +981,34 @@ static int ovs_key_from_nlattrs(struct net *net, struct sw_flow_match *match,
 		}
 
 		SW_FLOW_KEY_PUT(match, eth.type, eth_type, is_mask);
-		attrs &= ~(1 << OVS_KEY_ATTR_ETHERTYPE);
+		*attrs &= ~(1 << OVS_KEY_ATTR_ETHERTYPE);
 	} else if (!is_mask) {
 		SW_FLOW_KEY_PUT(match, eth.type, htons(ETH_P_802_2), is_mask);
 	}
 
-	if (attrs & (1 << OVS_KEY_ATTR_IPV4)) {
-		const struct ovs_key_ipv4 *ipv4_key;
+	return 0;
+}
 
-		/* Add eth.type value for layer 3 flows */
-		if (!(attrs & (1ULL << OVS_KEY_ATTR_ETHERTYPE))) {
-			__be16 eth_type;
+static int ovs_key_from_nlattrs(struct net *net, struct sw_flow_match *match,
+				u64 attrs, const struct nlattr **a,
+				bool is_mask, bool log)
+{
+	int err;
+	bool is_layer3;
 
-			if (is_mask)
-				eth_type = htons(0xffff);
-			else
-				eth_type = htons(ETH_P_IP);
-			SW_FLOW_KEY_PUT(match, eth.type, eth_type, is_mask);
-		}
+	err = metadata_from_nlattrs(net, match, &attrs, a, is_mask, log);
+	if (err < 0)
+		return err;
+	is_layer3 = err != 0;
+
+	if (!is_layer3) {
+		err = l2_from_nlattrs(net, match, &attrs, a, is_mask, log);
+		if (err < 0)
+			return err;
+	}
+
+	if (attrs & (1 << OVS_KEY_ATTR_IPV4)) {
+		const struct ovs_key_ipv4 *ipv4_key;
 
 		ipv4_key = nla_data(a[OVS_KEY_ATTR_IPV4]);
 		if (!is_mask && ipv4_key->ipv4_frag > OVS_FRAG_TYPE_MAX) {
@@ -1032,17 +1034,6 @@ static int ovs_key_from_nlattrs(struct net *net, struct sw_flow_match *match,
 	if (attrs & (1 << OVS_KEY_ATTR_IPV6)) {
 		const struct ovs_key_ipv6 *ipv6_key;
 
-		/* Add eth.type value for layer 3 flows */
-		if (!(attrs & (1ULL << OVS_KEY_ATTR_ETHERTYPE))) {
-			__be16 eth_type;
-
-			if (is_mask)
-				eth_type = htons(0xffff);
-			else
-				eth_type = htons(ETH_P_IPV6);
-			SW_FLOW_KEY_PUT(match, eth.type, eth_type, is_mask);
-		}
-
 		ipv6_key = nla_data(a[OVS_KEY_ATTR_IPV6]);
 		if (!is_mask && ipv6_key->ipv6_frag > OVS_FRAG_TYPE_MAX) {
 			OVS_NLERR(log, "IPv6 frag type %d is out of range max %d",
@@ -1465,7 +1456,11 @@ int ovs_nla_get_flow_metadata(struct net *net, const struct nlattr *attr,
 	memset(&key->ct, 0, sizeof(key->ct));
 	key->phy.in_port = DP_MAX_PORTS;
 
-	return metadata_from_nlattrs(net, &match, &attrs, a, false, log);
+	err = metadata_from_nlattrs(net, &match, &attrs, a, false, log);
+	if (err < 0)
+		return err;
+
+	return 0;
 }
 
 static int __ovs_nla_put_key(const struct sw_flow_key *swkey,
@@ -1551,6 +1546,10 @@ static int __ovs_nla_put_key(const struct sw_flow_key *swkey,
 
 		if (nla_put_be16(skb, OVS_KEY_ATTR_ETHERTYPE, output->eth.type))
 			goto nla_put_failure;
+	} else {
+		if (nla_put_be16(skb, OVS_KEY_ATTR_PACKET_ETHERTYPE,
+				 output->eth.type))
+			goto nla_put_failure;
 	}
 
 	if (swkey->eth.type == htons(ETH_P_IP)) {
@@ -2184,8 +2183,6 @@ static int validate_set(const struct nlattr *a,
 		break;
 
 	case OVS_KEY_ATTR_MPLS:
-		if (is_layer3)
-			return -EINVAL;
 		if (!eth_p_mpls(eth_type))
 			return -EINVAL;
 		break;
@@ -2360,7 +2357,7 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 		case OVS_ACTION_ATTR_PUSH_MPLS: {
 			const struct ovs_action_push_mpls *mpls = nla_data(a);
 
-			if (is_layer3 || !eth_p_mpls(mpls->mpls_ethertype))
+			if (!eth_p_mpls(mpls->mpls_ethertype))
 				return -EINVAL;
 			/* Prohibit push MPLS other than to a white list
 			 * for packets that have a known tag order.
@@ -2377,7 +2374,7 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 		}
 
 		case OVS_ACTION_ATTR_POP_MPLS:
-			if (is_layer3 || vlan_tci & htons(VLAN_TAG_PRESENT) ||
+			if (vlan_tci & htons(VLAN_TAG_PRESENT) ||
 			    !eth_p_mpls(eth_type))
 				return -EINVAL;
 
-- 
2.7.0.rc3.207.g0ac5344

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v9 net-next 7/7] openvswitch: use ipgre tunnel rather than gretap tunnel
  2016-05-04  7:36 [PATCH v9 net-next 0/7] openvswitch: support for layer 3 encapsulated packets Simon Horman
                   ` (5 preceding siblings ...)
  2016-05-04  7:36 ` [PATCH v9 net-next 6/7] openvswitch: extend layer 3 support to cover non-IP packets Simon Horman
@ 2016-05-04  7:36 ` Simon Horman
       [not found]   ` <1462347393-22354-8-git-send-email-simon.horman-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>
  6 siblings, 1 reply; 43+ messages in thread
From: Simon Horman @ 2016-05-04  7:36 UTC (permalink / raw)
  To: netdev, dev; +Cc: Simon Horman

This allows GRE tunnels to send and receive both
layer 2 packets (packets with an ethernet header) and
layer 3 packets (packets without an ethernet header).

Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
v9
New Patch
---
 include/net/gre.h              |  4 ++--
 net/ipv4/ip_gre.c              |  8 ++++----
 net/openvswitch/vport-gre.c    |  4 ++--
 net/openvswitch/vport-netdev.c | 12 +++++++++++-
 net/openvswitch/vport-netdev.h |  1 +
 5 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/include/net/gre.h b/include/net/gre.h
index 29e37322c06e..181357d124b1 100644
--- a/include/net/gre.h
+++ b/include/net/gre.h
@@ -23,8 +23,8 @@ struct gre_protocol {
 int gre_add_protocol(const struct gre_protocol *proto, u8 version);
 int gre_del_protocol(const struct gre_protocol *proto, u8 version);
 
-struct net_device *gretap_fb_dev_create(struct net *net, const char *name,
-				       u8 name_assign_type);
+struct net_device *gre_fb_dev_create(struct net *net, const char *name,
+				     u8 name_assign_type);
 int gre_parse_header(struct sk_buff *skb, struct tnl_ptk_info *tpi,
 		     bool *csum_err, int *hdr_len);
 
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 52011f78e3c7..d542812217d7 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -1114,8 +1114,8 @@ static struct rtnl_link_ops ipgre_tap_ops __read_mostly = {
 	.get_link_net	= ip_tunnel_get_link_net,
 };
 
-struct net_device *gretap_fb_dev_create(struct net *net, const char *name,
-					u8 name_assign_type)
+struct net_device *gre_fb_dev_create(struct net *net, const char *name,
+				     u8 name_assign_type)
 {
 	struct nlattr *tb[IFLA_MAX + 1];
 	struct net_device *dev;
@@ -1125,7 +1125,7 @@ struct net_device *gretap_fb_dev_create(struct net *net, const char *name,
 	memset(&tb, 0, sizeof(tb));
 
 	dev = rtnl_create_link(net, name, name_assign_type,
-			       &ipgre_tap_ops, tb);
+			       &ipgre_link_ops, tb);
 	if (IS_ERR(dev))
 		return dev;
 
@@ -1149,7 +1149,7 @@ out:
 	free_netdev(dev);
 	return ERR_PTR(err);
 }
-EXPORT_SYMBOL_GPL(gretap_fb_dev_create);
+EXPORT_SYMBOL_GPL(gre_fb_dev_create);
 
 static int __net_init ipgre_tap_init_net(struct net *net)
 {
diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c
index f003225de994..b1aa02904ae4 100644
--- a/net/openvswitch/vport-gre.c
+++ b/net/openvswitch/vport-gre.c
@@ -60,7 +60,7 @@ static struct vport *gre_tnl_create(const struct vport_parms *parms)
 		return vport;
 
 	rtnl_lock();
-	dev = gretap_fb_dev_create(net, parms->name, NET_NAME_USER);
+	dev = gre_fb_dev_create(net, parms->name, NET_NAME_USER);
 	if (IS_ERR(dev)) {
 		rtnl_unlock();
 		ovs_vport_free(vport);
@@ -87,7 +87,7 @@ static struct vport *gre_create(const struct vport_parms *parms)
 static struct vport_ops ovs_gre_vport_ops = {
 	.type		= OVS_VPORT_TYPE_GRE,
 	.create		= gre_create,
-	.send		= ovs_netdev_send_tap,
+	.send		= ovs_netdev_send_raw_tun,
 	.destroy	= ovs_netdev_tunnel_destroy,
 };
 
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index 0e0b9286dd11..e6a2718204a8 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -99,7 +99,8 @@ struct vport *ovs_netdev_link(struct vport *vport, const char *name)
 	}
 
 	if (vport->dev->flags & IFF_LOOPBACK ||
-	    vport->dev->type != ARPHRD_ETHER ||
+	    (vport->dev->type != ARPHRD_ETHER &&
+	     vport->dev->type != ARPHRD_IPGRE) ||
 	    ovs_is_internal_dev(vport->dev)) {
 		err = -EINVAL;
 		goto error_put;
@@ -207,6 +208,15 @@ int ovs_netdev_send_tap(struct sk_buff *skb)
 }
 EXPORT_SYMBOL_GPL(ovs_netdev_send_tap);
 
+int ovs_netdev_send_raw_tun(struct sk_buff *skb)
+{
+	if (skb->mac_len)
+		skb->protocol = ntohs(ETH_P_TEB);
+
+	return dev_queue_xmit(skb);
+}
+EXPORT_SYMBOL_GPL(ovs_netdev_send_raw_tun);
+
 /* Returns null if this device is not attached to a datapath. */
 struct vport *ovs_netdev_get_vport(struct net_device *dev)
 {
diff --git a/net/openvswitch/vport-netdev.h b/net/openvswitch/vport-netdev.h
index 02f38a822334..ae59c02ba6a9 100644
--- a/net/openvswitch/vport-netdev.h
+++ b/net/openvswitch/vport-netdev.h
@@ -35,4 +35,5 @@ void ovs_netdev_exit(void);
 void ovs_netdev_tunnel_destroy(struct vport *vport);
 
 int ovs_netdev_send_tap(struct sk_buff *skb);
+int ovs_netdev_send_raw_tun(struct sk_buff *skb);
 #endif /* vport_netdev.h */
-- 
2.7.0.rc3.207.g0ac5344

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH v9 net-next 3/7] openvswitch: add support to push and pop mpls for layer3 packets
       [not found]   ` <1462347393-22354-4-git-send-email-simon.horman-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>
@ 2016-05-05 17:35     ` pravin shelar
  2016-05-06  4:33       ` Simon Horman
  0 siblings, 1 reply; 43+ messages in thread
From: pravin shelar @ 2016-05-05 17:35 UTC (permalink / raw)
  To: Simon Horman; +Cc: ovs dev, Linux Kernel Network Developers

On Wed, May 4, 2016 at 12:36 AM, Simon Horman
<simon.horman@netronome.com> wrote:
> Allow push and pop mpls actions to act on layer 3 packets by teaching
> them not to access non-existent L2 headers of such packets.
>
> Signed-off-by: Simon Horman <simon.horman@netronome.com>
> ---
> v9
> * New Patch
> ---
>  net/openvswitch/actions.c | 22 ++++++++++++++--------
>  1 file changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index 879185fe183f..fe885a89f501 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -160,8 +160,10 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
>
>         skb_postpush_rcsum(skb, new_mpls_lse, MPLS_HLEN);
>
> -       hdr = eth_hdr(skb);
> -       hdr->h_proto = mpls->mpls_ethertype;
> +       if (skb->mac_len) {
can you move hdr definition to this block?

> +               hdr = eth_hdr(skb);
> +               hdr->h_proto = mpls->mpls_ethertype;
> +       }
>
We need to update skb checksum here.  This bug is not related to your
patch. since you are changing this code can you also fix this?

>         if (!skb->inner_protocol)
>                 skb_set_inner_protocol(skb, skb->protocol);
> @@ -174,7 +176,6 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
>  static int pop_mpls(struct sk_buff *skb, struct sw_flow_key *key,
>                     const __be16 ethertype)
>  {
> -       struct ethhdr *hdr;
>         int err;
>
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support
       [not found]   ` <1462347393-22354-5-git-send-email-simon.horman-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>
@ 2016-05-05 17:37     ` pravin shelar
  2016-05-06  5:57       ` Simon Horman
  0 siblings, 1 reply; 43+ messages in thread
From: pravin shelar @ 2016-05-05 17:37 UTC (permalink / raw)
  To: Simon Horman; +Cc: ovs dev, Linux Kernel Network Developers

On Wed, May 4, 2016 at 12:36 AM, Simon Horman
<simon.horman@netronome.com> wrote:
> From: Lorand Jakab <lojakab@cisco.com>
>
> Implementation of the pop_eth and push_eth actions in the kernel, and
> layer 3 flow support.
>
> This doesn't actually do anything yet as no layer 2 tunnel ports are
> supported yet. The original patch by Lorand was against the Open vSwtich
> tree which has L2 LISP tunnels but that is not supported in mainline Linux.
> I (Simon) plan to follow up with support for non-TEB GRE ports based on
> work by Thomas Morin.
>
> Cc: Thomas Morin <thomas.morin@orange.com>
> Signed-off-by: Lorand Jakab <lojakab@cisco.com>
> Signed-off-by: Simon Horman <simon.horman@netronome.com>
>
> ---

...

> diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> index 0ea128eeeab2..6e174ea5f2bb 100644
> --- a/net/openvswitch/flow.c
> +++ b/net/openvswitch/flow.c
> @@ -468,28 +468,31 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
>
>         skb_reset_mac_header(skb);
>
> -       /* Link layer.  We are guaranteed to have at least the 14 byte Ethernet
> -        * header in the linear data area.
> -        */
> -       eth = eth_hdr(skb);
> -       ether_addr_copy(key->eth.src, eth->h_source);
> -       ether_addr_copy(key->eth.dst, eth->h_dest);
> +       /* Link layer. */
> +       if (key->phy.is_layer3) {
> +               key->eth.tci = 0;
> +               key->eth.type = skb->protocol;
> +       } else {
> +               eth = eth_hdr(skb);
> +               ether_addr_copy(key->eth.src, eth->h_source);
> +               ether_addr_copy(key->eth.dst, eth->h_dest);
>
> -       __skb_pull(skb, 2 * ETH_ALEN);
> -       /* We are going to push all headers that we pull, so no need to
> -        * update skb->csum here.
> -        */
> +               __skb_pull(skb, 2 * ETH_ALEN);
> +               /* We are going to push all headers that we pull, so no need to
> +                * update skb->csum here.
> +                */
>
> -       key->eth.tci = 0;
> -       if (skb_vlan_tag_present(skb))
> -               key->eth.tci = htons(skb->vlan_tci);
> -       else if (eth->h_proto == htons(ETH_P_8021Q))
> -               if (unlikely(parse_vlan(skb, key)))
> -                       return -ENOMEM;
> +               key->eth.tci = 0;
> +               if (skb_vlan_tag_present(skb))
> +                       key->eth.tci = htons(skb->vlan_tci);
> +               else if (eth->h_proto == htons(ETH_P_8021Q))
> +                       if (unlikely(parse_vlan(skb, key)))
> +                               return -ENOMEM;
>
> -       key->eth.type = parse_ethertype(skb);
> -       if (unlikely(key->eth.type == htons(0)))
> -               return -ENOMEM;
> +               key->eth.type = parse_ethertype(skb);
> +               if (unlikely(key->eth.type == htons(0)))
> +                       return -ENOMEM;
> +       }
>
>         skb_reset_network_header(skb);
>         skb_reset_mac_len(skb);
> @@ -696,11 +699,23 @@ int ovs_flow_key_update(struct sk_buff *skb, struct sw_flow_key *key)
>  int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
>                          struct sk_buff *skb, struct sw_flow_key *key)
>  {
> +       bool is_layer3 = false;
> +       bool is_teb = false;
is_layer3 and is_teb are mutually exclusive, so can't we use single
boolean here?


> +       int err;
> +
>         /* Extract metadata from packet. */
>         if (tun_info) {
>                 key->tun_proto = ip_tunnel_info_af(tun_info);
>                 memcpy(&key->tun_key, &tun_info->key, sizeof(key->tun_key));
>
> +               if (OVS_CB(skb)->input_vport->dev->type != ARPHRD_ETHER) {
> +                       if (skb->protocol == htons(ETH_P_TEB))
> +                               is_teb = true;
> +                       else
> +                               is_layer3 = true;
> +               }
> +
On transmit side you are using mac_len to detect l3 packet, why not do
same while extracting the key?
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v9 net-next 7/7] openvswitch: use ipgre tunnel rather than gretap tunnel
       [not found]   ` <1462347393-22354-8-git-send-email-simon.horman-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>
@ 2016-05-05 21:45     ` pravin shelar
  2016-05-06  6:54       ` Simon Horman
  0 siblings, 1 reply; 43+ messages in thread
From: pravin shelar @ 2016-05-05 21:45 UTC (permalink / raw)
  To: Simon Horman; +Cc: ovs dev, Linux Kernel Network Developers

On Wed, May 4, 2016 at 12:36 AM, Simon Horman
<simon.horman@netronome.com> wrote:
> This allows GRE tunnels to send and receive both
> layer 2 packets (packets with an ethernet header) and
> layer 3 packets (packets without an ethernet header).
>
> Signed-off-by: Simon Horman <simon.horman@netronome.com>
> ---
> v9
> New Patch
> ---
>  include/net/gre.h              |  4 ++--
>  net/ipv4/ip_gre.c              |  8 ++++----
>  net/openvswitch/vport-gre.c    |  4 ++--
>  net/openvswitch/vport-netdev.c | 12 +++++++++++-
>  net/openvswitch/vport-netdev.h |  1 +
>  5 files changed, 20 insertions(+), 9 deletions(-)
>
...
...
> diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c
> index f003225de994..b1aa02904ae4 100644
> --- a/net/openvswitch/vport-gre.c
> +++ b/net/openvswitch/vport-gre.c
> @@ -60,7 +60,7 @@ static struct vport *gre_tnl_create(const struct vport_parms *parms)
>                 return vport;
>
>         rtnl_lock();
> -       dev = gretap_fb_dev_create(net, parms->name, NET_NAME_USER);
> +       dev = gre_fb_dev_create(net, parms->name, NET_NAME_USER);
>         if (IS_ERR(dev)) {
>                 rtnl_unlock();
>                 ovs_vport_free(vport);
> @@ -87,7 +87,7 @@ static struct vport *gre_create(const struct vport_parms *parms)
>  static struct vport_ops ovs_gre_vport_ops = {
>         .type           = OVS_VPORT_TYPE_GRE,
>         .create         = gre_create,
> -       .send           = ovs_netdev_send_tap,
> +       .send           = ovs_netdev_send_raw_tun,
>         .destroy        = ovs_netdev_tunnel_destroy,
>  };
>

This trick of using vport-send only works in case of compat tunnel
device mode. But in normal case the LWT interface allows us to use net
devices for tunnel traffic. So you need some sort of mechanism to
handle l3 only packets on vport-netdev.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v9 net-next 3/7] openvswitch: add support to push and pop mpls for layer3 packets
  2016-05-05 17:35     ` pravin shelar
@ 2016-05-06  4:33       ` Simon Horman
  0 siblings, 0 replies; 43+ messages in thread
From: Simon Horman @ 2016-05-06  4:33 UTC (permalink / raw)
  To: pravin shelar; +Cc: Linux Kernel Network Developers, ovs dev

On Thu, May 05, 2016 at 10:35:52AM -0700, pravin shelar wrote:
> On Wed, May 4, 2016 at 12:36 AM, Simon Horman
> <simon.horman@netronome.com> wrote:
> > Allow push and pop mpls actions to act on layer 3 packets by teaching
> > them not to access non-existent L2 headers of such packets.
> >
> > Signed-off-by: Simon Horman <simon.horman@netronome.com>
> > ---
> > v9
> > * New Patch
> > ---
> >  net/openvswitch/actions.c | 22 ++++++++++++++--------
> >  1 file changed, 14 insertions(+), 8 deletions(-)
> >
> > diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> > index 879185fe183f..fe885a89f501 100644
> > --- a/net/openvswitch/actions.c
> > +++ b/net/openvswitch/actions.c
> > @@ -160,8 +160,10 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
> >
> >         skb_postpush_rcsum(skb, new_mpls_lse, MPLS_HLEN);
> >
> > -       hdr = eth_hdr(skb);
> > -       hdr->h_proto = mpls->mpls_ethertype;
> > +       if (skb->mac_len) {
> can you move hdr definition to this block?

Sure, sorry for overlooking that.

> > +               hdr = eth_hdr(skb);
> > +               hdr->h_proto = mpls->mpls_ethertype;
> > +       }
> >
> We need to update skb checksum here.  This bug is not related to your
> patch. since you are changing this code can you also fix this?

Yes, of course. Would something like this broken out into a patch earlier
in the series resolve the problem you see? If so it looks like a similar
fix is also needed for pop_mpls().

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 63d29263d51a..89ad0027420a 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -161,6 +160,14 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
 		struct ethhdr *hdr;
 
 		hdr = eth_hdr(skb);
+
+		if (skb->ip_summed == CHECKSUM_COMPLETE) {
+			__be16 diff[] = { ~(hdr->h_proto), mpls->mpls_ethertype };
+
+			skb->csum = ~csum_partial((char *)diff, sizeof(diff),
+						~skb->csum);
+		}
+
 		hdr->h_proto = mpls->mpls_ethertype;
 	}
 

> 
> >         if (!skb->inner_protocol)
> >                 skb_set_inner_protocol(skb, skb->protocol);
> > @@ -174,7 +176,6 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
> >  static int pop_mpls(struct sk_buff *skb, struct sw_flow_key *key,
> >                     const __be16 ethertype)
> >  {
> > -       struct ethhdr *hdr;
> >         int err;
> >

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support
  2016-05-05 17:37     ` pravin shelar
@ 2016-05-06  5:57       ` Simon Horman
  2016-05-06  9:25         ` Jiri Benc
  0 siblings, 1 reply; 43+ messages in thread
From: Simon Horman @ 2016-05-06  5:57 UTC (permalink / raw)
  To: pravin shelar
  Cc: Linux Kernel Network Developers, ovs dev, Lorand Jakab,
	Thomas Morin, Jiri Benc

[CC Jiri Benc]

On Thu, May 05, 2016 at 10:37:08AM -0700, pravin shelar wrote:
> On Wed, May 4, 2016 at 12:36 AM, Simon Horman
> <simon.horman@netronome.com> wrote:
> > From: Lorand Jakab <lojakab@cisco.com>
> >
> > Implementation of the pop_eth and push_eth actions in the kernel, and
> > layer 3 flow support.
> >
> > This doesn't actually do anything yet as no layer 2 tunnel ports are
> > supported yet. The original patch by Lorand was against the Open vSwtich
> > tree which has L2 LISP tunnels but that is not supported in mainline Linux.
> > I (Simon) plan to follow up with support for non-TEB GRE ports based on
> > work by Thomas Morin.
> >
> > Cc: Thomas Morin <thomas.morin@orange.com>
> > Signed-off-by: Lorand Jakab <lojakab@cisco.com>
> > Signed-off-by: Simon Horman <simon.horman@netronome.com>
> >
> > ---
> 
> ...
> 
> > diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> > index 0ea128eeeab2..6e174ea5f2bb 100644
> > --- a/net/openvswitch/flow.c
> > +++ b/net/openvswitch/flow.c
> > @@ -468,28 +468,31 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
> >
> >         skb_reset_mac_header(skb);
> >
> > -       /* Link layer.  We are guaranteed to have at least the 14 byte Ethernet
> > -        * header in the linear data area.
> > -        */
> > -       eth = eth_hdr(skb);
> > -       ether_addr_copy(key->eth.src, eth->h_source);
> > -       ether_addr_copy(key->eth.dst, eth->h_dest);
> > +       /* Link layer. */
> > +       if (key->phy.is_layer3) {
> > +               key->eth.tci = 0;
> > +               key->eth.type = skb->protocol;
> > +       } else {
> > +               eth = eth_hdr(skb);
> > +               ether_addr_copy(key->eth.src, eth->h_source);
> > +               ether_addr_copy(key->eth.dst, eth->h_dest);
> >
> > -       __skb_pull(skb, 2 * ETH_ALEN);
> > -       /* We are going to push all headers that we pull, so no need to
> > -        * update skb->csum here.
> > -        */
> > +               __skb_pull(skb, 2 * ETH_ALEN);
> > +               /* We are going to push all headers that we pull, so no need to
> > +                * update skb->csum here.
> > +                */
> >
> > -       key->eth.tci = 0;
> > -       if (skb_vlan_tag_present(skb))
> > -               key->eth.tci = htons(skb->vlan_tci);
> > -       else if (eth->h_proto == htons(ETH_P_8021Q))
> > -               if (unlikely(parse_vlan(skb, key)))
> > -                       return -ENOMEM;
> > +               key->eth.tci = 0;
> > +               if (skb_vlan_tag_present(skb))
> > +                       key->eth.tci = htons(skb->vlan_tci);
> > +               else if (eth->h_proto == htons(ETH_P_8021Q))
> > +                       if (unlikely(parse_vlan(skb, key)))
> > +                               return -ENOMEM;
> >
> > -       key->eth.type = parse_ethertype(skb);
> > -       if (unlikely(key->eth.type == htons(0)))
> > -               return -ENOMEM;
> > +               key->eth.type = parse_ethertype(skb);
> > +               if (unlikely(key->eth.type == htons(0)))
> > +                       return -ENOMEM;
> > +       }
> >
> >         skb_reset_network_header(skb);
> >         skb_reset_mac_len(skb);
> > @@ -696,11 +699,23 @@ int ovs_flow_key_update(struct sk_buff *skb, struct sw_flow_key *key)
> >  int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
> >                          struct sk_buff *skb, struct sw_flow_key *key)
> >  {
> > +       bool is_layer3 = false;
> > +       bool is_teb = false;
> is_layer3 and is_teb are mutually exclusive, so can't we use single
> boolean here?

Sure, I can do something like the following if you prefer.
To my mind it makes things a bit less readable. But I don't feel
strongly about this.

diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index d320c2657627..fc92cf542101 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -701,7 +701,6 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
 			 struct sk_buff *skb, struct sw_flow_key *key)
 {
 	bool is_layer3 = false;
-	bool is_teb = false;
 	int err;
 
 	/* Extract metadata from packet. */
@@ -709,13 +708,9 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
 		key->tun_proto = ip_tunnel_info_af(tun_info);
 		memcpy(&key->tun_key, &tun_info->key, sizeof(key->tun_key));
 
-		if (OVS_CB(skb)->input_vport->dev->type != ARPHRD_ETHER) {
-			if (skb->protocol == htons(ETH_P_TEB))
-				is_teb = true;
-			else
-				is_layer3 = true;
-		}
-
+		if (OVS_CB(skb)->input_vport->dev->type != ARPHRD_ETHER &&
+		    skb->protocol != htons(ETH_P_TEB))
+			is_layer3 = true;
 
 		if (tun_info->options_len) {
 			BUILD_BUG_ON((1 << (sizeof(tun_info->options_len) *
@@ -746,10 +741,12 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
 	if (err < 0)
 		return err;
 
-	if (is_teb)
-		skb->protocol = key->eth.type;
-	else if (is_layer3)
-		key->eth.type = skb->protocol;
+	if (tun_info && OVS_CB(skb)->input_vport->dev->type != ARPHRD_ETHER) {
+		if (is_layer3)
+			key->eth.type = skb->protocol;
+		else
+			skb->protocol = key->eth.type;
+	}
 
 	return err;
 }

> > +       int err;
> > +
> >         /* Extract metadata from packet. */
> >         if (tun_info) {
> >                 key->tun_proto = ip_tunnel_info_af(tun_info);
> >                 memcpy(&key->tun_key, &tun_info->key, sizeof(key->tun_key));
> >
> > +               if (OVS_CB(skb)->input_vport->dev->type != ARPHRD_ETHER) {
> > +                       if (skb->protocol == htons(ETH_P_TEB))
> > +                               is_teb = true;
> > +                       else
> > +                               is_layer3 = true;
> > +               }
> > +
> On transmit side you are using mac_len to detect l3 packet, why not do
> same while extracting the key?

Unfortunately mac_len can't be relied on here, emprically it has the same
value (28 in my tests) for both the TEB and layer3 case above.

Perhaps that could be changed by futher enhancements in the tunneling code
but I think things are symetric as they stand:

* On recieve skb->protocol can be read to distinguish TEB and layer3 packets
* On transmit skb->protocol should be set to distinguish TEB and layer3 packets

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH v9 net-next 7/7] openvswitch: use ipgre tunnel rather than gretap tunnel
  2016-05-05 21:45     ` pravin shelar
@ 2016-05-06  6:54       ` Simon Horman
  2016-05-06  9:15         ` Jiri Benc
  0 siblings, 1 reply; 43+ messages in thread
From: Simon Horman @ 2016-05-06  6:54 UTC (permalink / raw)
  To: pravin shelar; +Cc: Linux Kernel Network Developers, ovs dev, Jiri Benc

[CC Jiri Benc]

On Thu, May 05, 2016 at 02:45:15PM -0700, pravin shelar wrote:
> On Wed, May 4, 2016 at 12:36 AM, Simon Horman
> <simon.horman@netronome.com> wrote:
> > This allows GRE tunnels to send and receive both
> > layer 2 packets (packets with an ethernet header) and
> > layer 3 packets (packets without an ethernet header).
> >
> > Signed-off-by: Simon Horman <simon.horman@netronome.com>
> > ---
> > v9
> > New Patch
> > ---
> >  include/net/gre.h              |  4 ++--
> >  net/ipv4/ip_gre.c              |  8 ++++----
> >  net/openvswitch/vport-gre.c    |  4 ++--
> >  net/openvswitch/vport-netdev.c | 12 +++++++++++-
> >  net/openvswitch/vport-netdev.h |  1 +
> >  5 files changed, 20 insertions(+), 9 deletions(-)
> >
> ...
> ...
> > diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c
> > index f003225de994..b1aa02904ae4 100644
> > --- a/net/openvswitch/vport-gre.c
> > +++ b/net/openvswitch/vport-gre.c
> > @@ -60,7 +60,7 @@ static struct vport *gre_tnl_create(const struct vport_parms *parms)
> >                 return vport;
> >
> >         rtnl_lock();
> > -       dev = gretap_fb_dev_create(net, parms->name, NET_NAME_USER);
> > +       dev = gre_fb_dev_create(net, parms->name, NET_NAME_USER);
> >         if (IS_ERR(dev)) {
> >                 rtnl_unlock();
> >                 ovs_vport_free(vport);
> > @@ -87,7 +87,7 @@ static struct vport *gre_create(const struct vport_parms *parms)
> >  static struct vport_ops ovs_gre_vport_ops = {
> >         .type           = OVS_VPORT_TYPE_GRE,
> >         .create         = gre_create,
> > -       .send           = ovs_netdev_send_tap,
> > +       .send           = ovs_netdev_send_raw_tun,
> >         .destroy        = ovs_netdev_tunnel_destroy,
> >  };
> >
> 
> This trick of using vport-send only works in case of compat tunnel
> device mode. But in normal case the LWT interface allows us to use net
> devices for tunnel traffic. So you need some sort of mechanism to
> handle l3 only packets on vport-netdev.

Thanks. Clearly I did not consider that and clearly I should have.

On the kernel side I wonder if something like the following (untested
with tunnel vport-netdev) would resolve the problem.

I think that some work probably also needs to be done in user-space to make it
aware of how to handle l3 only packets on such vports.

diff --git a/net/openvswitch/vport-geneve.c b/net/openvswitch/vport-geneve.c
index 906ae9c58c2e..7a06e19f5279 100644
--- a/net/openvswitch/vport-geneve.c
+++ b/net/openvswitch/vport-geneve.c
@@ -116,7 +116,7 @@ static struct vport_ops ovs_geneve_vport_ops = {
 	.create		= geneve_create,
 	.destroy	= ovs_netdev_tunnel_destroy,
 	.get_options	= geneve_get_options,
-	.send		= ovs_netdev_send_tap,
+	.send		= ovs_netdev_send,
 };
 
 static int __init ovs_geneve_tnl_init(void)
diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c
index b1aa02904ae4..c1cab9dd392f 100644
--- a/net/openvswitch/vport-gre.c
+++ b/net/openvswitch/vport-gre.c
@@ -87,7 +87,7 @@ static struct vport *gre_create(const struct vport_parms *parms)
 static struct vport_ops ovs_gre_vport_ops = {
 	.type		= OVS_VPORT_TYPE_GRE,
 	.create		= gre_create,
-	.send		= ovs_netdev_send_raw_tun,
+	.send		= ovs_netdev_send,
 	.destroy	= ovs_netdev_tunnel_destroy,
 };
 
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index e6a2718204a8..0ffe07721336 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -197,25 +197,20 @@ void ovs_netdev_tunnel_destroy(struct vport *vport)
 }
 EXPORT_SYMBOL_GPL(ovs_netdev_tunnel_destroy);
 
-int ovs_netdev_send_tap(struct sk_buff *skb)
+int ovs_netdev_send(struct sk_buff *skb)
 {
-	/* Only send L2 packets */
-	if (skb->mac_len)
-		return dev_queue_xmit(skb);
+	struct net_device *dev = skb->dev;
 
-	kfree_skb(skb);
-	return -EINVAL;
-}
-EXPORT_SYMBOL_GPL(ovs_netdev_send_tap);
-
-int ovs_netdev_send_raw_tun(struct sk_buff *skb)
-{
-	if (skb->mac_len)
-		skb->protocol = ntohs(ETH_P_TEB);
+	if (dev->type != ARPHRD_ETHER && skb->mac_len) {
+		skb->protocol = htons(ETH_P_TEB);
+	} else if (dev->type == ARPHRD_ETHER && !skb->mac_len) {
+		kfree_skb(skb);
+		return -EINVAL;
+	}
 
 	return dev_queue_xmit(skb);
 }
-EXPORT_SYMBOL_GPL(ovs_netdev_send_raw_tun);
+EXPORT_SYMBOL_GPL(ovs_netdev_send);
 
 /* Returns null if this device is not attached to a datapath. */
 struct vport *ovs_netdev_get_vport(struct net_device *dev)
@@ -231,7 +226,7 @@ static struct vport_ops ovs_netdev_vport_ops = {
 	.type		= OVS_VPORT_TYPE_NETDEV,
 	.create		= netdev_create,
 	.destroy	= netdev_destroy,
-	.send		= ovs_netdev_send_tap,
+	.send		= ovs_netdev_send,
 };
 
 int __init ovs_netdev_init(void)
diff --git a/net/openvswitch/vport-netdev.h b/net/openvswitch/vport-netdev.h
index ae59c02ba6a9..637b14a9963c 100644
--- a/net/openvswitch/vport-netdev.h
+++ b/net/openvswitch/vport-netdev.h
@@ -34,6 +34,5 @@ void ovs_netdev_exit(void);
 
 void ovs_netdev_tunnel_destroy(struct vport *vport);
 
-int ovs_netdev_send_tap(struct sk_buff *skb);
-int ovs_netdev_send_raw_tun(struct sk_buff *skb);
+int ovs_netdev_send(struct sk_buff *skb);
 #endif /* vport_netdev.h */
diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c
index 009a4aab505a..13f11ad7e35a 100644
--- a/net/openvswitch/vport-vxlan.c
+++ b/net/openvswitch/vport-vxlan.c
@@ -153,7 +153,7 @@ static struct vport_ops ovs_vxlan_netdev_vport_ops = {
 	.create			= vxlan_create,
 	.destroy		= ovs_netdev_tunnel_destroy,
 	.get_options		= vxlan_get_options,
-	.send			= ovs_netdev_send_tap,
+	.send			= ovs_netdev_send,
 };
 
 static int __init ovs_vxlan_tnl_init(void)

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH v9 net-next 7/7] openvswitch: use ipgre tunnel rather than gretap tunnel
  2016-05-06  6:54       ` Simon Horman
@ 2016-05-06  9:15         ` Jiri Benc
  0 siblings, 0 replies; 43+ messages in thread
From: Jiri Benc @ 2016-05-06  9:15 UTC (permalink / raw)
  To: Simon Horman; +Cc: pravin shelar, Linux Kernel Network Developers, ovs dev

On Fri, 6 May 2016 15:54:02 +0900, Simon Horman wrote:
> -int ovs_netdev_send_raw_tun(struct sk_buff *skb)
> -{
> -	if (skb->mac_len)
> -		skb->protocol = ntohs(ETH_P_TEB);
> +	if (dev->type != ARPHRD_ETHER && skb->mac_len) {
> +		skb->protocol = htons(ETH_P_TEB);
> +	} else if (dev->type == ARPHRD_ETHER && !skb->mac_len) {
> +		kfree_skb(skb);
> +		return -EINVAL;
> +	}

This was something I was missing in your patches (sorry, did not get to
the full review yet).

You'll also need to enable ARPHRD_NONE and ARPHRD_IPGRE interfaces in
ovs_netdev_link.

 Jiri

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support
  2016-05-06  5:57       ` Simon Horman
@ 2016-05-06  9:25         ` Jiri Benc
  2016-05-09  8:04           ` Simon Horman
  0 siblings, 1 reply; 43+ messages in thread
From: Jiri Benc @ 2016-05-06  9:25 UTC (permalink / raw)
  To: Simon Horman
  Cc: pravin shelar, Linux Kernel Network Developers, ovs dev,
	Lorand Jakab, Thomas Morin

On Fri, 6 May 2016 14:57:07 +0900, Simon Horman wrote:
> On Thu, May 05, 2016 at 10:37:08AM -0700, pravin shelar wrote:
> > On transmit side you are using mac_len to detect l3 packet, why not do
> > same while extracting the key?

I agree. The skb should be self-contained, i.e. it should be obvious
whether it has the MAC header set or not just from the skb itself at
any point in the packet processing. Otherwise, I'd expect things like
recirculation to break after push/pop of eth header.

> Unfortunately mac_len can't be relied on here, emprically it has the same
> value (28 in my tests) for both the TEB and layer3 case above.

That's strange, it looks like there's something setting the mac header
unconditionally in ovs code. We should find that place and change it.

The ARPHRD_NONE interfaces don't even set mac_header and mac_len, this
will need to be set by ovs upon getting frame from such interface. 

> Perhaps that could be changed by futher enhancements in the tunneling code
> but I think things are symetric as they stand:
> 
> * On recieve skb->protocol can be read to distinguish TEB and layer3 packets
> * On transmit skb->protocol should be set to distinguish TEB and layer3 packets

Yes, but you need to act upon this directly after receiving the
frame/just before sending the frame and set up an internal flag that
will be used throughout the code. That way, the packet can be handed
over to different parts of the code, recirculated, etc. without
worries. skb->mac_len is probably a good candidate for such flag.

 Jiri

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support
  2016-05-04  7:36 ` [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support Simon Horman
       [not found]   ` <1462347393-22354-5-git-send-email-simon.horman-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>
@ 2016-05-06  9:35   ` Jiri Benc
  2016-05-09  8:18     ` Simon Horman
  2016-05-17 14:32   ` Jiri Benc
  2 siblings, 1 reply; 43+ messages in thread
From: Jiri Benc @ 2016-05-06  9:35 UTC (permalink / raw)
  To: Simon Horman; +Cc: netdev, dev, Lorand Jakab, Thomas Morin

On Wed,  4 May 2016 16:36:30 +0900, Simon Horman wrote:
> +static int push_eth(struct sk_buff *skb, struct sw_flow_key *key,
> +		    const struct ovs_action_push_eth *ethh)
> +{
> +	int err;
> +
> +	/* De-accelerate any hardware accelerated VLAN tag added to a previous
> +	 * Ethernet header */
> +	err = skb_vlan_deaccel(skb);
> +	if (unlikely(err))
> +		return err;
> +
> +	/* Add the new Ethernet header */
> +	if (skb_cow_head(skb, ETH_HLEN) < 0)
> +		return -ENOMEM;
> +
> +	skb_push(skb, ETH_HLEN);
> +	skb_reset_mac_header(skb);
> +	skb_reset_mac_len(skb);
> +
> +	ether_addr_copy(eth_hdr(skb)->h_source, ethh->addresses.eth_src);
> +	ether_addr_copy(eth_hdr(skb)->h_dest, ethh->addresses.eth_dst);
> +	eth_hdr(skb)->h_proto = ethh->eth_type;

This doesn't seem right. We know the packet type, it's skb->protocol.
We should fill in that.

In addition, we should check whether mac_len > 0 and in such case,
change skb->protocol to ETH_P_TEB first (and store that value in the
pushed eth header).

Similarly on pop_eth, we need to check skb->protocol and if it is
ETH_P_TEB, call eth_type_trans on the modified frame to set the new
skb->protocol correctly. It's probably not that simple, as we'd need a
version of eth_type_trans that doesn't need a net device.

 Jiri

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support
  2016-05-06  9:25         ` Jiri Benc
@ 2016-05-09  8:04           ` Simon Horman
       [not found]             ` <20160509080420.GA4470-IxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
  0 siblings, 1 reply; 43+ messages in thread
From: Simon Horman @ 2016-05-09  8:04 UTC (permalink / raw)
  To: Jiri Benc
  Cc: pravin shelar, Linux Kernel Network Developers, ovs dev,
	Lorand Jakab, Thomas Morin

On Fri, May 06, 2016 at 11:25:14AM +0200, Jiri Benc wrote:
> On Fri, 6 May 2016 14:57:07 +0900, Simon Horman wrote:
> > On Thu, May 05, 2016 at 10:37:08AM -0700, pravin shelar wrote:
> > > On transmit side you are using mac_len to detect l3 packet, why not do
> > > same while extracting the key?
> 
> I agree. The skb should be self-contained, i.e. it should be obvious
> whether it has the MAC header set or not just from the skb itself at
> any point in the packet processing. Otherwise, I'd expect things like
> recirculation to break after push/pop of eth header.
> 
> > Unfortunately mac_len can't be relied on here, emprically it has the same
> > value (28 in my tests) for both the TEB and layer3 case above.
> 
> That's strange, it looks like there's something setting the mac header
> unconditionally in ovs code. We should find that place and change it.

It seems to be caused by the following:

1. __ipgre_rcv() calls skb_pop_mac_header() which
   sets skb->mac_header to the skb->network_header.

2. __ipgre_rcv() then calls ip_tunnel_rcv() which calls
   skb_reset_network_header(). This updates skb->network_header to
   just after the end of the GRE header.

   This is 28 bytes after the old skb->network_header
   as there is a 20 byte IPv4 header followed by an
   8 byte GRE header.

3. Later, dev_gro_receive() calls skb_reset_mac_len().
   This calculates skb->mac_len based on skb->network_header and
   skb->mac_header. I.e. 28 bytes.

I think this may be possible to address by calling
skb_reset_network_header() instead of skb_pop_mac_header()
in __ipgre_rcv().

I think that would leave skb->mac_header and skb->network_header both
set to just after the end of the GRE header and result in mac_len of 0.
It looks like this owuld be for for both TEB and non-TEB GRE packets
and that OVS would need to check against mac_len, protocol and most
likely dev->type early on.

> The ARPHRD_NONE interfaces don't even set mac_header and mac_len, this
> will need to be set by ovs upon getting frame from such interface. 

Noted.

> > Perhaps that could be changed by futher enhancements in the tunneling code
> > but I think things are symetric as they stand:
> > 
> > * On recieve skb->protocol can be read to distinguish TEB and layer3 packets
> > * On transmit skb->protocol should be set to distinguish TEB and layer3 packets
> 
> Yes, but you need to act upon this directly after receiving the
> frame/just before sending the frame and set up an internal flag that
> will be used throughout the code. That way, the packet can be handed
> over to different parts of the code, recirculated, etc. without
> worries. skb->mac_len is probably a good candidate for such flag.

Its possible that I've overlooked something but as things stand I think
things look like this:

* ovs_flow_key_extract() keys off dev->type and skb->protocol.
* ovs_flow_key_extract() calls key_extract() which amongst other things
  sets up the skb->mac_header and skb->mac_len of the skb.
* ovs_flow_key_extract() sets skb->protocol to that of the inner packet
  in the case of TEB
* Actions update the above mentioned skb fields as appropriate.

So it seems to me that it should be safe to rely on skb->protocol
in the receive path. Or more specifically, in netdev_port_receive().

If mac_len is also able to be used then I think fine. But it seems to me
that it needs to be set up by OvS at least for the ARPHRD_NONE case. This
could be done early on, say in netdev_port_receive(). But it seems that
would involve duplicating some of what is already occurring in
key_extract().

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support
  2016-05-06  9:35   ` Jiri Benc
@ 2016-05-09  8:18     ` Simon Horman
  2016-05-10  0:16       ` [ovs-dev] " Yang, Yi Y
  2016-05-10 12:06       ` Jiri Benc
  0 siblings, 2 replies; 43+ messages in thread
From: Simon Horman @ 2016-05-09  8:18 UTC (permalink / raw)
  To: Jiri Benc; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA

On Fri, May 06, 2016 at 11:35:04AM +0200, Jiri Benc wrote:
> On Wed,  4 May 2016 16:36:30 +0900, Simon Horman wrote:
> > +static int push_eth(struct sk_buff *skb, struct sw_flow_key *key,
> > +		    const struct ovs_action_push_eth *ethh)
> > +{
> > +	int err;
> > +
> > +	/* De-accelerate any hardware accelerated VLAN tag added to a previous
> > +	 * Ethernet header */
> > +	err = skb_vlan_deaccel(skb);
> > +	if (unlikely(err))
> > +		return err;
> > +
> > +	/* Add the new Ethernet header */
> > +	if (skb_cow_head(skb, ETH_HLEN) < 0)
> > +		return -ENOMEM;
> > +
> > +	skb_push(skb, ETH_HLEN);
> > +	skb_reset_mac_header(skb);
> > +	skb_reset_mac_len(skb);
> > +
> > +	ether_addr_copy(eth_hdr(skb)->h_source, ethh->addresses.eth_src);
> > +	ether_addr_copy(eth_hdr(skb)->h_dest, ethh->addresses.eth_dst);
> > +	eth_hdr(skb)->h_proto = ethh->eth_type;
> 
> This doesn't seem right. We know the packet type, it's skb->protocol.
> We should fill in that.

I think that makes sense. Possibly the eth_type field
can be removed from ovs_action_push_eth.

> In addition, we should check whether mac_len > 0 and in such case,
> change skb->protocol to ETH_P_TEB first (and store that value in the
> pushed eth header).
> 
> Similarly on pop_eth, we need to check skb->protocol and if it is
> ETH_P_TEB, call eth_type_trans on the modified frame to set the new
> skb->protocol correctly. It's probably not that simple, as we'd need a
> version of eth_type_trans that doesn't need a net device.

I'm not sure I understand the interaction with ETH_P_TEB here.

In my mind skb->protocol == ETH_P_TEB may be used early on in OvS's receive
processing to find the inner protocol from the packet and at that point
skb->protocol is set to that value. And that for further packet processing
the fact the packet was received as TEB is transparent.

Conversely, skb->protocol may be set as necessary on transmit as a packet
exits OvS.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: [ovs-dev] [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support
  2016-05-09  8:18     ` Simon Horman
@ 2016-05-10  0:16       ` Yang, Yi Y
       [not found]         ` <79BBBFE6CB6C9B488C1A45ACD284F51913CB6446-0J0gbvR4kTggGBtAFL8yw7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
  2016-05-10 12:06       ` Jiri Benc
  1 sibling, 1 reply; 43+ messages in thread
From: Yang, Yi Y @ 2016-05-10  0:16 UTC (permalink / raw)
  To: Simon Horman, Jiri Benc; +Cc: dev, netdev

I think it is still necessary to keep eth_type in push_eth action, for the classifier case, it will push_nsh then push_eth for the original frame, this will need to set eth_type to 0x894f by push_eth, otherwise push_eth won't know this eth_type.

-----Original Message-----
From: dev [mailto:dev-bounces@openvswitch.org] On Behalf Of Simon Horman
Sent: Monday, May 09, 2016 4:18 PM
To: Jiri Benc <jbenc@redhat.com>
Cc: dev@openvswitch.org; netdev@vger.kernel.org
Subject: Re: [ovs-dev] [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support

On Fri, May 06, 2016 at 11:35:04AM +0200, Jiri Benc wrote:
> On Wed,  4 May 2016 16:36:30 +0900, Simon Horman wrote:
> > +static int push_eth(struct sk_buff *skb, struct sw_flow_key *key,
> > +		    const struct ovs_action_push_eth *ethh) {
> > +	int err;
> > +
> > +	/* De-accelerate any hardware accelerated VLAN tag added to a previous
> > +	 * Ethernet header */
> > +	err = skb_vlan_deaccel(skb);
> > +	if (unlikely(err))
> > +		return err;
> > +
> > +	/* Add the new Ethernet header */
> > +	if (skb_cow_head(skb, ETH_HLEN) < 0)
> > +		return -ENOMEM;
> > +
> > +	skb_push(skb, ETH_HLEN);
> > +	skb_reset_mac_header(skb);
> > +	skb_reset_mac_len(skb);
> > +
> > +	ether_addr_copy(eth_hdr(skb)->h_source, ethh->addresses.eth_src);
> > +	ether_addr_copy(eth_hdr(skb)->h_dest, ethh->addresses.eth_dst);
> > +	eth_hdr(skb)->h_proto = ethh->eth_type;
> 
> This doesn't seem right. We know the packet type, it's skb->protocol.
> We should fill in that.

I think that makes sense. Possibly the eth_type field can be removed from ovs_action_push_eth.

> In addition, we should check whether mac_len > 0 and in such case, 
> change skb->protocol to ETH_P_TEB first (and store that value in the 
> pushed eth header).
> 
> Similarly on pop_eth, we need to check skb->protocol and if it is 
> ETH_P_TEB, call eth_type_trans on the modified frame to set the new
> skb->protocol correctly. It's probably not that simple, as we'd need a
> version of eth_type_trans that doesn't need a net device.

I'm not sure I understand the interaction with ETH_P_TEB here.

In my mind skb->protocol == ETH_P_TEB may be used early on in OvS's receive processing to find the inner protocol from the packet and at that point
skb->protocol is set to that value. And that for further packet 
skb->processing
the fact the packet was received as TEB is transparent.

Conversely, skb->protocol may be set as necessary on transmit as a packet exits OvS.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support
       [not found]             ` <20160509080420.GA4470-IxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
@ 2016-05-10 12:01               ` Jiri Benc
  2016-05-11  1:50                 ` Simon Horman
  0 siblings, 1 reply; 43+ messages in thread
From: Jiri Benc @ 2016-05-10 12:01 UTC (permalink / raw)
  To: Simon Horman; +Cc: ovs dev, Linux Kernel Network Developers

On Mon, 9 May 2016 17:04:22 +0900, Simon Horman wrote:
> It seems to be caused by the following:
> 
> 1. __ipgre_rcv() calls skb_pop_mac_header() which
>    sets skb->mac_header to the skb->network_header.
> 
> 2. __ipgre_rcv() then calls ip_tunnel_rcv() which calls
>    skb_reset_network_header(). This updates skb->network_header to
>    just after the end of the GRE header.
> 
>    This is 28 bytes after the old skb->network_header
>    as there is a 20 byte IPv4 header followed by an
>    8 byte GRE header.
> 
> 3. Later, dev_gro_receive() calls skb_reset_mac_len().
>    This calculates skb->mac_len based on skb->network_header and
>    skb->mac_header. I.e. 28 bytes.

Right. Thanks for tracking this down!

> I think this may be possible to address by calling
> skb_reset_network_header() instead of skb_pop_mac_header()
> in __ipgre_rcv().

We can't do that. The interface type is ARPHRD_IPGRE and not
ARPHRD_NONE, so the current behavior makes pretty good sense. See
e.g. commit 0e3da5bb8da45.

We have two options here:

1. As for metadata tunnels all the info is in metadata_dst and we
   don't need the IP/GRE header for anything, we can make the ipgre
   interface ARPHRD_NONE in metadata based mode.

2. We can fix this up in ovs after receiving the packet from
   ARPHRD_IPGRE interface.

I think the first option is the correct one. We already don't assign
dev->header_ops in metadata mode. I'll prepare a patch.

> Its possible that I've overlooked something but as things stand I think
> things look like this:
> 
> * ovs_flow_key_extract() keys off dev->type and skb->protocol.
> * ovs_flow_key_extract() calls key_extract() which amongst other things
>   sets up the skb->mac_header and skb->mac_len of the skb.
> * ovs_flow_key_extract() sets skb->protocol to that of the inner packet
>   in the case of TEB
> * Actions update the above mentioned skb fields as appropriate.

Okay, that actually eases things somewhat.

> So it seems to me that it should be safe to rely on skb->protocol
> in the receive path. Or more specifically, in netdev_port_receive().
> 
> If mac_len is also able to be used then I think fine. But it seems to me
> that it needs to be set up by OvS at least for the ARPHRD_NONE case. This
> could be done early on, say in netdev_port_receive(). But it seems that
> would involve duplicating some of what is already occurring in
> key_extract().

I'd actually prefer doing this earlier, netdev_port_receive looks like
the right place. Just set mac_len there (or whatever) and let
key_extract do the rest of the work, not depending on dev->type in
there.

My point about recirculation was not actually valid, as I missed you're
doing this in ovs_flow_key_extract and not in key_extract. Still,
I think the special handling of particular interface types belongs to
the tx processing on those interfaces, not to the common code.

Thanks!

 Jiri
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support
  2016-05-09  8:18     ` Simon Horman
  2016-05-10  0:16       ` [ovs-dev] " Yang, Yi Y
@ 2016-05-10 12:06       ` Jiri Benc
  2016-05-11  3:28         ` Simon Horman
  1 sibling, 1 reply; 43+ messages in thread
From: Jiri Benc @ 2016-05-10 12:06 UTC (permalink / raw)
  To: Simon Horman; +Cc: netdev, dev, Lorand Jakab, Thomas Morin

On Mon, 9 May 2016 17:18:20 +0900, Simon Horman wrote:
> On Fri, May 06, 2016 at 11:35:04AM +0200, Jiri Benc wrote:
> > In addition, we should check whether mac_len > 0 and in such case,
> > change skb->protocol to ETH_P_TEB first (and store that value in the
> > pushed eth header).
> > 
> > Similarly on pop_eth, we need to check skb->protocol and if it is
> > ETH_P_TEB, call eth_type_trans on the modified frame to set the new
> > skb->protocol correctly. It's probably not that simple, as we'd need a
> > version of eth_type_trans that doesn't need a net device.
> 
> I'm not sure I understand the interaction with ETH_P_TEB here.
> 
> In my mind skb->protocol == ETH_P_TEB may be used early on in OvS's receive
> processing to find the inner protocol from the packet and at that point
> skb->protocol is set to that value. And that for further packet processing
> the fact the packet was received as TEB is transparent.

Yes but think about the case when you have two Ethernet headers pushed.

We can either disallow it, or do what I described.

Specifically, let's assume we have an IP packet with an Ethernet
header present. skb->protocol is ETH_P_IP. Now, when there's skb_push,
the correct operation would be setting skb->protocol to ETH_P_TEB,
pushing a new Ethernet header and filing ETH_P_TEB to the ethertype
field in the new header. The packet is not ETH_P_IP anymore, as the L2
header is followed by another Ethernet header now.

 Jiri

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support
       [not found]         ` <79BBBFE6CB6C9B488C1A45ACD284F51913CB6446-0J0gbvR4kTggGBtAFL8yw7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2016-05-10 12:07           ` Jiri Benc
  0 siblings, 0 replies; 43+ messages in thread
From: Jiri Benc @ 2016-05-10 12:07 UTC (permalink / raw)
  To: Yang, Yi Y
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, Simon Horman, netdev-u79uwXL29TY76Z2rM5mHXA

Please do not top post.

On Tue, 10 May 2016 00:16:36 +0000, Yang, Yi Y wrote:
> I think it is still necessary to keep eth_type in push_eth action, for
> the classifier case, it will push_nsh then push_eth for the original
> frame, this will need to set eth_type to 0x894f by push_eth, otherwise
> push_eth won't know this eth_type.

Nope, push_nsh will set skb->protocol to ETH_P_NSH. Later push_eth will
use that value correctly.

 Jiri
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support
  2016-05-10 12:01               ` Jiri Benc
@ 2016-05-11  1:50                 ` Simon Horman
       [not found]                   ` <20160511015009.GB24436-IxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
  2016-05-11 13:57                   ` Jiri Benc
  0 siblings, 2 replies; 43+ messages in thread
From: Simon Horman @ 2016-05-11  1:50 UTC (permalink / raw)
  To: Jiri Benc; +Cc: ovs dev, Linux Kernel Network Developers

Hi Jiri,

On Tue, May 10, 2016 at 02:01:06PM +0200, Jiri Benc wrote:
> On Mon, 9 May 2016 17:04:22 +0900, Simon Horman wrote:
> > It seems to be caused by the following:
> > 
> > 1. __ipgre_rcv() calls skb_pop_mac_header() which
> >    sets skb->mac_header to the skb->network_header.
> > 
> > 2. __ipgre_rcv() then calls ip_tunnel_rcv() which calls
> >    skb_reset_network_header(). This updates skb->network_header to
> >    just after the end of the GRE header.
> > 
> >    This is 28 bytes after the old skb->network_header
> >    as there is a 20 byte IPv4 header followed by an
> >    8 byte GRE header.
> > 
> > 3. Later, dev_gro_receive() calls skb_reset_mac_len().
> >    This calculates skb->mac_len based on skb->network_header and
> >    skb->mac_header. I.e. 28 bytes.
> 
> Right. Thanks for tracking this down!
> 
> > I think this may be possible to address by calling
> > skb_reset_network_header() instead of skb_pop_mac_header()
> > in __ipgre_rcv().
> 
> We can't do that. The interface type is ARPHRD_IPGRE and not
> ARPHRD_NONE, so the current behavior makes pretty good sense. See
> e.g. commit 0e3da5bb8da45.
> 
> We have two options here:
> 
> 1. As for metadata tunnels all the info is in metadata_dst and we
>    don't need the IP/GRE header for anything, we can make the ipgre
>    interface ARPHRD_NONE in metadata based mode.
> 
> 2. We can fix this up in ovs after receiving the packet from
>    ARPHRD_IPGRE interface.
> 
> I think the first option is the correct one. We already don't assign
> dev->header_ops in metadata mode. I'll prepare a patch.

I agree that 1. seems to be the better approach.

> > Its possible that I've overlooked something but as things stand I think
> > things look like this:
> > 
> > * ovs_flow_key_extract() keys off dev->type and skb->protocol.
> > * ovs_flow_key_extract() calls key_extract() which amongst other things
> >   sets up the skb->mac_header and skb->mac_len of the skb.
> > * ovs_flow_key_extract() sets skb->protocol to that of the inner packet
> >   in the case of TEB
> > * Actions update the above mentioned skb fields as appropriate.
> 
> Okay, that actually eases things somewhat.
> 
> > So it seems to me that it should be safe to rely on skb->protocol
> > in the receive path. Or more specifically, in netdev_port_receive().
> > 
> > If mac_len is also able to be used then I think fine. But it seems to me
> > that it needs to be set up by OvS at least for the ARPHRD_NONE case. This
> > could be done early on, say in netdev_port_receive(). But it seems that
> > would involve duplicating some of what is already occurring in
> > key_extract().
> 
> I'd actually prefer doing this earlier, netdev_port_receive looks like
> the right place. Just set mac_len there (or whatever) and let
> key_extract do the rest of the work, not depending on dev->type in
> there.
> 
> My point about recirculation was not actually valid, as I missed you're
> doing this in ovs_flow_key_extract and not in key_extract. Still,
> I think the special handling of particular interface types belongs to
> the tx processing on those interfaces, not to the common code.

Sure, if that is your preference I think it should be simple enough to
implement. I agree that netdev_port_receive() looks like a good place for
this.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support
       [not found]                   ` <20160511015009.GB24436-IxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
@ 2016-05-11  3:06                     ` Simon Horman
  2016-05-11 14:09                       ` Jiri Benc
  0 siblings, 1 reply; 43+ messages in thread
From: Simon Horman @ 2016-05-11  3:06 UTC (permalink / raw)
  To: Jiri Benc; +Cc: ovs dev, Linux Kernel Network Developers

Hi Jiri,

On Wed, May 11, 2016 at 10:50:09AM +0900, Simon Horman wrote:

[...]

> > > Its possible that I've overlooked something but as things stand I think
> > > things look like this:
> > > 
> > > * ovs_flow_key_extract() keys off dev->type and skb->protocol.
> > > * ovs_flow_key_extract() calls key_extract() which amongst other things
> > >   sets up the skb->mac_header and skb->mac_len of the skb.
> > > * ovs_flow_key_extract() sets skb->protocol to that of the inner packet
> > >   in the case of TEB
> > > * Actions update the above mentioned skb fields as appropriate.
> > 
> > Okay, that actually eases things somewhat.
> > 
> > > So it seems to me that it should be safe to rely on skb->protocol
> > > in the receive path. Or more specifically, in netdev_port_receive().
> > > 
> > > If mac_len is also able to be used then I think fine. But it seems to me
> > > that it needs to be set up by OvS at least for the ARPHRD_NONE case. This
> > > could be done early on, say in netdev_port_receive(). But it seems that
> > > would involve duplicating some of what is already occurring in
> > > key_extract().
> > 
> > I'd actually prefer doing this earlier, netdev_port_receive looks like
> > the right place. Just set mac_len there (or whatever) and let
> > key_extract do the rest of the work, not depending on dev->type in
> > there.
> > 
> > My point about recirculation was not actually valid, as I missed you're
> > doing this in ovs_flow_key_extract and not in key_extract. Still,
> > I think the special handling of particular interface types belongs to
> > the tx processing on those interfaces, not to the common code.
> 
> Sure, if that is your preference I think it should be simple enough to
> implement. I agree that netdev_port_receive() looks like a good place for
> this.

So far I have the following, which seems to work
changes to make dev->type ARPHRD_NONE for compat GRE vports.

Is this close to what you had in mind?

diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index d320c2657627..4d2698596033 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -700,8 +700,6 @@ int ovs_flow_key_update(struct sk_buff *skb, struct sw_flow_key *key)
 int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
 			 struct sk_buff *skb, struct sw_flow_key *key)
 {
-	bool is_layer3 = false;
-	bool is_teb = false;
 	int err;
 
 	/* Extract metadata from packet. */
@@ -709,14 +707,6 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
 		key->tun_proto = ip_tunnel_info_af(tun_info);
 		memcpy(&key->tun_key, &tun_info->key, sizeof(key->tun_key));
 
-		if (OVS_CB(skb)->input_vport->dev->type != ARPHRD_ETHER) {
-			if (skb->protocol == htons(ETH_P_TEB))
-				is_teb = true;
-			else
-				is_layer3 = true;
-		}
-
-
 		if (tun_info->options_len) {
 			BUILD_BUG_ON((1 << (sizeof(tun_info->options_len) *
 						   8)) - 1
@@ -739,17 +729,17 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
 	key->phy.skb_mark = skb->mark;
 	ovs_ct_fill_key(skb, key);
 	key->ovs_flow_hash = 0;
-	key->phy.is_layer3 = is_layer3;
+	key->phy.is_layer3 = (tun_info && skb->mac_len == 0);
 	key->recirc_id = 0;
 
 	err = key_extract(skb, key);
 	if (err < 0)
 		return err;
 
-	if (is_teb)
-		skb->protocol = key->eth.type;
-	else if (is_layer3)
+	if (key->phy.is_layer3)
 		key->eth.type = skb->protocol;
+	else if (tun_info && skb->protocol == htons(ETH_P_TEB))
+		skb->protocol = key->eth.type;
 
 	return err;
 }
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index 7d54414b35eb..ac8178ac2c81 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -60,7 +60,21 @@ static void netdev_port_receive(struct sk_buff *skb)
 	if (vport->dev->type == ARPHRD_ETHER) {
 		skb_push(skb, ETH_HLEN);
 		skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
+	} else if (vport->dev->type == ARPHRD_NONE) {
+		if (skb->protocol == htons(ETH_P_TEB)) {
+			struct ethhdr *eth = eth_hdr(skb);
+
+			if (unlikely(skb->len < ETH_HLEN))
+				goto error;
+
+			skb->mac_len = ETH_HLEN;
+			if (eth->h_proto == htons(ETH_P_8021Q))
+				skb->mac_len += VLAN_HLEN;
+		} else {
+			skb->mac_len = 0;
+		}
 	}
+
 	ovs_vport_receive(vport, skb, skb_tunnel_info(skb));
 	return;
 error:
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support
  2016-05-10 12:06       ` Jiri Benc
@ 2016-05-11  3:28         ` Simon Horman
  2016-05-11 14:10           ` Jiri Benc
  0 siblings, 1 reply; 43+ messages in thread
From: Simon Horman @ 2016-05-11  3:28 UTC (permalink / raw)
  To: Jiri Benc; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA

Hi Jiri,

On Tue, May 10, 2016 at 02:06:18PM +0200, Jiri Benc wrote:
> On Mon, 9 May 2016 17:18:20 +0900, Simon Horman wrote:
> > On Fri, May 06, 2016 at 11:35:04AM +0200, Jiri Benc wrote:
> > > In addition, we should check whether mac_len > 0 and in such case,
> > > change skb->protocol to ETH_P_TEB first (and store that value in the
> > > pushed eth header).
> > > 
> > > Similarly on pop_eth, we need to check skb->protocol and if it is
> > > ETH_P_TEB, call eth_type_trans on the modified frame to set the new
> > > skb->protocol correctly. It's probably not that simple, as we'd need a
> > > version of eth_type_trans that doesn't need a net device.
> > 
> > I'm not sure I understand the interaction with ETH_P_TEB here.
> > 
> > In my mind skb->protocol == ETH_P_TEB may be used early on in OvS's receive
> > processing to find the inner protocol from the packet and at that point
> > skb->protocol is set to that value. And that for further packet processing
> > the fact the packet was received as TEB is transparent.
> 
> Yes but think about the case when you have two Ethernet headers pushed.
> 
> We can either disallow it, or do what I described.
> 
> Specifically, let's assume we have an IP packet with an Ethernet
> header present. skb->protocol is ETH_P_IP. Now, when there's skb_push,
> the correct operation would be setting skb->protocol to ETH_P_TEB,
> pushing a new Ethernet header and filing ETH_P_TEB to the ethertype
> field in the new header. The packet is not ETH_P_IP anymore, as the L2
> header is followed by another Ethernet header now.

Thanks for the clarification, I had not considered the case of two
ethernet headers when I wrote my previous email.

I think that at this stage I would prefer to prohibit push_eth() acting
on a packet which already has an ethernet header. Indeed that is what
my patch-set already does in its modifications of __ovs_nla_copy_actions().

The reason that I lean towards prohibiting this is that I do not
have an easy way to exercise this case within the current patch-set.
And thus this extra complexity seems well suited to being handled handled
incrementally as further work.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support
  2016-05-11  1:50                 ` Simon Horman
       [not found]                   ` <20160511015009.GB24436-IxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
@ 2016-05-11 13:57                   ` Jiri Benc
  1 sibling, 0 replies; 43+ messages in thread
From: Jiri Benc @ 2016-05-11 13:57 UTC (permalink / raw)
  To: Simon Horman
  Cc: pravin shelar, Linux Kernel Network Developers, ovs dev,
	Lorand Jakab, Thomas Morin

On Wed, 11 May 2016 10:50:12 +0900, Simon Horman wrote:
> On Tue, May 10, 2016 at 02:01:06PM +0200, Jiri Benc wrote:
> > We have two options here:
> > 
> > 1. As for metadata tunnels all the info is in metadata_dst and we
> >    don't need the IP/GRE header for anything, we can make the ipgre
> >    interface ARPHRD_NONE in metadata based mode.
> > 
> > 2. We can fix this up in ovs after receiving the packet from
> >    ARPHRD_IPGRE interface.
> > 
> > I think the first option is the correct one. We already don't assign
> > dev->header_ops in metadata mode. I'll prepare a patch.
> 
> I agree that 1. seems to be the better approach.

I just sent a patch that fixes this. And we have the same bug in
VXLAN-GPE, I sent a patch, too.

> Sure, if that is your preference I think it should be simple enough to
> implement. I agree that netdev_port_receive() looks like a good place for
> this.

Great, thanks!

 Jiri

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support
  2016-05-11  3:06                     ` Simon Horman
@ 2016-05-11 14:09                       ` Jiri Benc
  2016-05-11 22:46                         ` Simon Horman
  0 siblings, 1 reply; 43+ messages in thread
From: Jiri Benc @ 2016-05-11 14:09 UTC (permalink / raw)
  To: Simon Horman
  Cc: pravin shelar, Linux Kernel Network Developers, ovs dev,
	Lorand Jakab, Thomas Morin

On Wed, 11 May 2016 12:06:35 +0900, Simon Horman wrote:
> Is this close to what you had in mind?

Yes but see below.

> @@ -739,17 +729,17 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
>  	key->phy.skb_mark = skb->mark;
>  	ovs_ct_fill_key(skb, key);
>  	key->ovs_flow_hash = 0;
> -	key->phy.is_layer3 = is_layer3;
> +	key->phy.is_layer3 = (tun_info && skb->mac_len == 0);

Do we have to depend on tun_info? It would be nice to support all
ARPHRD_NONE interfaces, not just tunnels. The tun interface (from
the tuntap driver) comes to mind, for example.

> --- a/net/openvswitch/vport-netdev.c
> +++ b/net/openvswitch/vport-netdev.c
> @@ -60,7 +60,21 @@ static void netdev_port_receive(struct sk_buff *skb)
>  	if (vport->dev->type == ARPHRD_ETHER) {
>  		skb_push(skb, ETH_HLEN);
>  		skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
> +	} else if (vport->dev->type == ARPHRD_NONE) {
> +		if (skb->protocol == htons(ETH_P_TEB)) {
> +			struct ethhdr *eth = eth_hdr(skb);
> +
> +			if (unlikely(skb->len < ETH_HLEN))
> +				goto error;
> +
> +			skb->mac_len = ETH_HLEN;
> +			if (eth->h_proto == htons(ETH_P_8021Q))
> +				skb->mac_len += VLAN_HLEN;
> +		} else {
> +			skb->mac_len = 0;
> +		}

Without putting much thought into this, could this perhaps be left for
parse_ethertype (called from key_extract) to do?

Thanks,

 Jiri

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support
  2016-05-11  3:28         ` Simon Horman
@ 2016-05-11 14:10           ` Jiri Benc
  0 siblings, 0 replies; 43+ messages in thread
From: Jiri Benc @ 2016-05-11 14:10 UTC (permalink / raw)
  To: Simon Horman; +Cc: netdev, dev, Lorand Jakab, Thomas Morin

On Wed, 11 May 2016 12:28:14 +0900, Simon Horman wrote:
> I think that at this stage I would prefer to prohibit push_eth() acting
> on a packet which already has an ethernet header. Indeed that is what
> my patch-set already does in its modifications of __ovs_nla_copy_actions().
> 
> The reason that I lean towards prohibiting this is that I do not
> have an easy way to exercise this case within the current patch-set.
> And thus this extra complexity seems well suited to being handled handled
> incrementally as further work.

Works for me. I don't see any real usage for multiple Ethernet headers.

Thanks!

 Jiri

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support
  2016-05-11 14:09                       ` Jiri Benc
@ 2016-05-11 22:46                         ` Simon Horman
  2016-05-17 14:43                           ` Jiri Benc
  0 siblings, 1 reply; 43+ messages in thread
From: Simon Horman @ 2016-05-11 22:46 UTC (permalink / raw)
  To: Jiri Benc
  Cc: pravin shelar, Linux Kernel Network Developers, ovs dev,
	Lorand Jakab, Thomas Morin

On Wed, May 11, 2016 at 04:09:28PM +0200, Jiri Benc wrote:
> On Wed, 11 May 2016 12:06:35 +0900, Simon Horman wrote:
> > Is this close to what you had in mind?
> 
> Yes but see below.
> 
> > @@ -739,17 +729,17 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
> >  	key->phy.skb_mark = skb->mark;
> >  	ovs_ct_fill_key(skb, key);
> >  	key->ovs_flow_hash = 0;
> > -	key->phy.is_layer3 = is_layer3;
> > +	key->phy.is_layer3 = (tun_info && skb->mac_len == 0);
> 
> Do we have to depend on tun_info? It would be nice to support all
> ARPHRD_NONE interfaces, not just tunnels. The tun interface (from
> the tuntap driver) comes to mind, for example.

Yes, I think that should work. I was just being cautious.

Do you think it is safe to detect TEB based on skb->protocol regardless
of the presence of tun_info?

> > +++ b/net/openvswitch/vport-netdev.c
> > @@ -60,7 +60,21 @@ static void netdev_port_receive(struct sk_buff *skb)
> >  	if (vport->dev->type == ARPHRD_ETHER) {
> >  		skb_push(skb, ETH_HLEN);
> >  		skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
> > +	} else if (vport->dev->type == ARPHRD_NONE) {
> > +		if (skb->protocol == htons(ETH_P_TEB)) {
> > +			struct ethhdr *eth = eth_hdr(skb);
> > +
> > +			if (unlikely(skb->len < ETH_HLEN))
> > +				goto error;
> > +
> > +			skb->mac_len = ETH_HLEN;
> > +			if (eth->h_proto == htons(ETH_P_8021Q))
> > +				skb->mac_len += VLAN_HLEN;
> > +		} else {
> > +			skb->mac_len = 0;
> > +		}
> 
> Without putting much thought into this, could this perhaps be left for
> parse_ethertype (called from key_extract) to do?

I think I am confused.

I believe that key_extract() does already do all of the above (and more).

The purpose of the above change was to do this work here rather than
leaving it to parse_ethertype. This is because I was under the impression
that is what you were after. Specifically as a mechanism to avoid relying
on vport->dev->type in ovs_flow_key_extract.

If we can live with a bogus skb->mac_len value that is sufficient for
ovs_flow_key_extract.() and set correctly by key_extract() (which happens
anyway) we could do something like this:

	} else if (vport->dev->type == ARPHRD_NONE) {
		if (skb->protocol == htons(ETH_P_TEB))
			/* Ignores presence of VLAN but is sufficient for
			 * ovs_flow_key_extract() which then calls key_extract()
			 * which calculates skb->mac_len correctly. */
			skb->mac_len = ETH_HLEN; /* Ignores presence of VLAN */
		else
			skb->mac_len = 0;
	}


But perhaps I have missed the point somehow.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support
  2016-05-04  7:36 ` [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support Simon Horman
       [not found]   ` <1462347393-22354-5-git-send-email-simon.horman-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>
  2016-05-06  9:35   ` Jiri Benc
@ 2016-05-17 14:32   ` Jiri Benc
  2016-05-20  5:29     ` Simon Horman
  2 siblings, 1 reply; 43+ messages in thread
From: Jiri Benc @ 2016-05-17 14:32 UTC (permalink / raw)
  To: Simon Horman; +Cc: netdev, dev, Lorand Jakab, Thomas Morin

Looking through the patchset again, this time more deeply. Sorry for
the delay.

On Wed,  4 May 2016 16:36:30 +0900, Simon Horman wrote:
> +struct ovs_action_push_eth {
> +	struct ovs_key_ethernet addresses;
> +	__be16   eth_type;

Extra spaces.

> +static int pop_eth(struct sk_buff *skb, struct sw_flow_key *key)
> +{
> +	skb_pull_rcsum(skb, ETH_HLEN);
> +	skb_reset_mac_header(skb);
> +	skb->mac_len -= ETH_HLEN;
> +
> +	invalidate_flow_key(key);
> +	return 0;
> +}

There's a fundamental question here: how should pop_eth behave when
vlan tag is present?

There are two options: either vlan is considered part of the Ethernet
header and pop_eth means implicitly resetting vlan tag, or packet can
have vlan tag even if it's not Ethernet.

This patch seems to implement the first option; however, skb->vlan_tci
should be reset and pop_eth should check whether the vlan tag is
present in the frame (deaccelerated) and remove it if it is. Otherwise,
the behavior of pop_eth would be inconsistent.

However, I'm not sure whether the second option does not make more
sense. It may, in fact, be needed - ARPHRD_NONE tunnel port could not
be set as an access port otherwise (unless I'm missing something).

In that case, pop_eth will need to put the vlan tag to skb->vlan_tci if
it's in the frame itself. Also, push_vlan and pop_vlan would need to be
modified to work with is_layer3 packets.

> +static int push_eth(struct sk_buff *skb, struct sw_flow_key *key,
> +		    const struct ovs_action_push_eth *ethh)
> +{
> +	int err;
> +
> +	/* De-accelerate any hardware accelerated VLAN tag added to a previous
> +	 * Ethernet header */
> +	err = skb_vlan_deaccel(skb);

Why? Just keep it in skb->vlan_tci.

> --- a/net/openvswitch/flow.c
> +++ b/net/openvswitch/flow.c
> @@ -468,28 +468,31 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
>  
>  	skb_reset_mac_header(skb);
>  
> -	/* Link layer.  We are guaranteed to have at least the 14 byte Ethernet
> -	 * header in the linear data area.
> -	 */
> -	eth = eth_hdr(skb);
> -	ether_addr_copy(key->eth.src, eth->h_source);
> -	ether_addr_copy(key->eth.dst, eth->h_dest);
> +	/* Link layer. */
> +	if (key->phy.is_layer3) {
> +		key->eth.tci = 0;

Could make sense to use skb->vlan_tci, see above.

Thanks,

 Jiri

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support
  2016-05-11 22:46                         ` Simon Horman
@ 2016-05-17 14:43                           ` Jiri Benc
  2016-05-18  2:18                             ` Simon Horman
  0 siblings, 1 reply; 43+ messages in thread
From: Jiri Benc @ 2016-05-17 14:43 UTC (permalink / raw)
  To: Simon Horman
  Cc: pravin shelar, Linux Kernel Network Developers, ovs dev,
	Lorand Jakab, Thomas Morin

On Thu, 12 May 2016 07:46:52 +0900, Simon Horman wrote:
> If we can live with a bogus skb->mac_len value that is sufficient for
> ovs_flow_key_extract.() and set correctly by key_extract() (which happens
> anyway) we could do something like this:
> 
> 	} else if (vport->dev->type == ARPHRD_NONE) {
> 		if (skb->protocol == htons(ETH_P_TEB))
> 			/* Ignores presence of VLAN but is sufficient for
> 			 * ovs_flow_key_extract() which then calls key_extract()
> 			 * which calculates skb->mac_len correctly. */
> 			skb->mac_len = ETH_HLEN; /* Ignores presence of VLAN */
> 		else
> 			skb->mac_len = 0;
> 	}
> 
> 
> But perhaps I have missed the point somehow.

You did not, I was more thinking aloud. But you're right, it doesn't
make much sense.

So, wouldn't it be actually more correct and in line with patch 2 to
call eth_type_trans() here? Possibly even followed by skb_vlan_untag
(not needed. But it might make things easier to have the first vlan tag
always reside inside skb->vlan_tci and treat that as an invariant in
the whole ovs kernel code. It'll need to be done in more spots than
just this one, though, and is probably matter of a separate patchset).

 Jiri

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v9 net-next 5/7] openvswitch: add layer 3 support to ovs_packet_cmd_execute()
       [not found]     ` <1462347393-22354-6-git-send-email-simon.horman-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>
@ 2016-05-17 14:51       ` Jiri Benc
  2016-05-18  2:24         ` Simon Horman
  0 siblings, 1 reply; 43+ messages in thread
From: Jiri Benc @ 2016-05-17 14:51 UTC (permalink / raw)
  To: Simon Horman; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA

On Wed,  4 May 2016 16:36:31 +0900, Simon Horman wrote:
> +	/* Packets from user space for execution only have metadata key
> +	 * attributes.  OVS_KEY_ATTR_PACKET_ETHERTYPE is then used to specify
> +	 * the starting layer of the packet.  Packets with Ethernet headers
> +	 * have this attribute set to 0
> +	 */
> +	if (*attrs & (1ULL << OVS_KEY_ATTR_PACKET_ETHERTYPE)) {
> +		__be16 eth_type;
> +
> +		if (is_mask) {
> +			/* Always exact match packet EtherType */
> +			eth_type = htons(0xffff);
> +		} else {
> +			eth_type = nla_get_be16(a[OVS_KEY_ATTR_PACKET_ETHERTYPE]);
> +			is_layer3 = ((eth_type == htons(ETH_P_IP)) ||
> +				     (eth_type == htons(ETH_P_IPV6)));

Unknown types need to be rejected, not treated as layer2, otherwise we
may run into problems later (with combination of this kernel + newer
user space) when we add more types, such as ETH_P_NSH.

 Jiri
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support
  2016-05-17 14:43                           ` Jiri Benc
@ 2016-05-18  2:18                             ` Simon Horman
  0 siblings, 0 replies; 43+ messages in thread
From: Simon Horman @ 2016-05-18  2:18 UTC (permalink / raw)
  To: Jiri Benc; +Cc: ovs dev, Linux Kernel Network Developers

On Tue, May 17, 2016 at 04:43:20PM +0200, Jiri Benc wrote:
> On Thu, 12 May 2016 07:46:52 +0900, Simon Horman wrote:
> > If we can live with a bogus skb->mac_len value that is sufficient for
> > ovs_flow_key_extract.() and set correctly by key_extract() (which happens
> > anyway) we could do something like this:
> > 
> > 	} else if (vport->dev->type == ARPHRD_NONE) {
> > 		if (skb->protocol == htons(ETH_P_TEB))
> > 			/* Ignores presence of VLAN but is sufficient for
> > 			 * ovs_flow_key_extract() which then calls key_extract()
> > 			 * which calculates skb->mac_len correctly. */
> > 			skb->mac_len = ETH_HLEN; /* Ignores presence of VLAN */
> > 		else
> > 			skb->mac_len = 0;
> > 	}
> > 
> > 
> > But perhaps I have missed the point somehow.
> 
> You did not, I was more thinking aloud. But you're right, it doesn't
> make much sense.
> 
> So, wouldn't it be actually more correct and in line with patch 2 to
> call eth_type_trans() here?

Yes, that seems reasonable.

> Possibly even followed by skb_vlan_untag
> (not needed. But it might make things easier to have the first vlan tag
> always reside inside skb->vlan_tci and treat that as an invariant in
> the whole ovs kernel code. It'll need to be done in more spots than
> just this one, though, and is probably matter of a separate patchset).

That also seems reasonable. But as it seems more invasive and is not
strictly necessary I'd rather handle that update separately.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v9 net-next 5/7] openvswitch: add layer 3 support to ovs_packet_cmd_execute()
  2016-05-17 14:51       ` Jiri Benc
@ 2016-05-18  2:24         ` Simon Horman
  0 siblings, 0 replies; 43+ messages in thread
From: Simon Horman @ 2016-05-18  2:24 UTC (permalink / raw)
  To: Jiri Benc; +Cc: netdev, dev, Lorand Jakab

On Tue, May 17, 2016 at 04:51:08PM +0200, Jiri Benc wrote:
> On Wed,  4 May 2016 16:36:31 +0900, Simon Horman wrote:
> > +	/* Packets from user space for execution only have metadata key
> > +	 * attributes.  OVS_KEY_ATTR_PACKET_ETHERTYPE is then used to specify
> > +	 * the starting layer of the packet.  Packets with Ethernet headers
> > +	 * have this attribute set to 0
> > +	 */
> > +	if (*attrs & (1ULL << OVS_KEY_ATTR_PACKET_ETHERTYPE)) {
> > +		__be16 eth_type;
> > +
> > +		if (is_mask) {
> > +			/* Always exact match packet EtherType */
> > +			eth_type = htons(0xffff);
> > +		} else {
> > +			eth_type = nla_get_be16(a[OVS_KEY_ATTR_PACKET_ETHERTYPE]);
> > +			is_layer3 = ((eth_type == htons(ETH_P_IP)) ||
> > +				     (eth_type == htons(ETH_P_IPV6)));
> 
> Unknown types need to be rejected, not treated as layer2, otherwise we
> may run into problems later (with combination of this kernel + newer
> user space) when we add more types, such as ETH_P_NSH.

I believe this is fixed by the following patch.

Possibly it makes sense to squash that patch into and earlier patches.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support
  2016-05-17 14:32   ` Jiri Benc
@ 2016-05-20  5:29     ` Simon Horman
  2016-05-20  8:00       ` Jiri Benc
  0 siblings, 1 reply; 43+ messages in thread
From: Simon Horman @ 2016-05-20  5:29 UTC (permalink / raw)
  To: Jiri Benc; +Cc: netdev, dev, Lorand Jakab, Thomas Morin

Hi Jiri,

On Tue, May 17, 2016 at 04:32:50PM +0200, Jiri Benc wrote:
> Looking through the patchset again, this time more deeply. Sorry for
> the delay.

No need to be sorry, good things take time.

> On Wed,  4 May 2016 16:36:30 +0900, Simon Horman wrote:
> > +struct ovs_action_push_eth {
> > +	struct ovs_key_ethernet addresses;
> > +	__be16   eth_type;
> 
> Extra spaces.

Sorry about that.

As per some earlier discussion I plan to remove the eth_type field entirely.

> 
> > +static int pop_eth(struct sk_buff *skb, struct sw_flow_key *key)
> > +{
> > +	skb_pull_rcsum(skb, ETH_HLEN);
> > +	skb_reset_mac_header(skb);
> > +	skb->mac_len -= ETH_HLEN;
> > +
> > +	invalidate_flow_key(key);
> > +	return 0;
> > +}
> 
> There's a fundamental question here: how should pop_eth behave when
> vlan tag is present?
> 
> There are two options: either vlan is considered part of the Ethernet
> header and pop_eth means implicitly resetting vlan tag, or packet can
> have vlan tag even if it's not Ethernet.
> 
> This patch seems to implement the first option; however, skb->vlan_tci
> should be reset and pop_eth should check whether the vlan tag is
> present in the frame (deaccelerated) and remove it if it is. Otherwise,
> the behavior of pop_eth would be inconsistent.
> 
> However, I'm not sure whether the second option does not make more
> sense. It may, in fact, be needed - ARPHRD_NONE tunnel port could not
> be set as an access port otherwise (unless I'm missing something).
> 
> In that case, pop_eth will need to put the vlan tag to skb->vlan_tci if
> it's in the frame itself. Also, push_vlan and pop_vlan would need to be
> modified to work with is_layer3 packets.

Good point.

The second option does seem rather tempting although I'm not sure
that it actually plays out in the access-port scenario at this time.

> > +static int push_eth(struct sk_buff *skb, struct sw_flow_key *key,
> > +		    const struct ovs_action_push_eth *ethh)
> > +{
> > +	int err;
> > +
> > +	/* De-accelerate any hardware accelerated VLAN tag added to a previous
> > +	 * Ethernet header */
> > +	err = skb_vlan_deaccel(skb);
> 
> Why? Just keep it in skb->vlan_tci.

Agreed, this seems unnecessary.

> > --- a/net/openvswitch/flow.c
> > +++ b/net/openvswitch/flow.c
> > @@ -468,28 +468,31 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
> >  
> >  	skb_reset_mac_header(skb);
> >  
> > -	/* Link layer.  We are guaranteed to have at least the 14 byte Ethernet
> > -	 * header in the linear data area.
> > -	 */
> > -	eth = eth_hdr(skb);
> > -	ether_addr_copy(key->eth.src, eth->h_source);
> > -	ether_addr_copy(key->eth.dst, eth->h_dest);
> > +	/* Link layer. */
> > +	if (key->phy.is_layer3) {
> > +		key->eth.tci = 0;
> 
> Could make sense to use skb->vlan_tci, see above.

The incremental patch below is what I have so far.
The patch to add skb_vlan_deaccel() should also be dropped.

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index c413c588a24f..6853ab008861 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2994,6 +2994,7 @@ int skb_vlan_pop(struct sk_buff *skb);
 int skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16 vlan_tci);
 struct sk_buff *pskb_extract(struct sk_buff *skb, int off, int to_copy,
 			     gfp_t gfp);
+int skb_vlan_accel(struct sk_buff *skb);
 
 static inline int memcpy_from_msg(void *data, struct msghdr *msg, int len)
 {
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 7a1d48983f81..a36c7491f714 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4482,12 +4482,28 @@ pull:
 	return err;
 }
 
-int skb_vlan_pop(struct sk_buff *skb)
+/* If a vlan tag is present move it to hw accel tag */
+int skb_vlan_accel(struct sk_buff *skb)
 {
 	u16 vlan_tci;
 	__be16 vlan_proto;
 	int err;
 
+	vlan_proto = skb->protocol;
+	err = __skb_vlan_pop(skb, &vlan_tci);
+	if (unlikely(err))
+		return err;
+
+	__vlan_hwaccel_put_tag(skb, vlan_proto, vlan_tci);
+	return 0;
+}
+EXPORT_SYMBOL(skb_vlan_accel);
+
+int skb_vlan_pop(struct sk_buff *skb)
+{
+	u16 vlan_tci;
+	int err;
+
 	if (likely(skb_vlan_tag_present(skb))) {
 		skb->vlan_tci = 0;
 	} else {
@@ -4500,19 +4516,13 @@ int skb_vlan_pop(struct sk_buff *skb)
 		if (err)
 			return err;
 	}
-	/* move next vlan tag to hw accel tag */
+
 	if (likely((skb->protocol != htons(ETH_P_8021Q) &&
 		    skb->protocol != htons(ETH_P_8021AD)) ||
 		   skb->len < VLAN_ETH_HLEN))
 		return 0;
 
-	vlan_proto = skb->protocol;
-	err = __skb_vlan_pop(skb, &vlan_tci);
-	if (unlikely(err))
-		return err;
-
-	__vlan_hwaccel_put_tag(skb, vlan_proto, vlan_tci);
-	return 0;
+	return skb_vlan_accel(skb);
 }
 EXPORT_SYMBOL(skb_vlan_pop);
 
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 7d9b2307d6eb..ad2331cde732 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -302,6 +302,17 @@ static int set_eth_addr(struct sk_buff *skb, struct sw_flow_key *flow_key,
 
 static int pop_eth(struct sk_buff *skb, struct sw_flow_key *key)
 {
+	/* Push outermost VLAN tag to skb metadata unless a VLAN tag
+	 * is already present there.
+	 */
+	if ((skb->protocol == htons(ETH_P_8021Q) ||
+	     skb->protocol == htons(ETH_P_8021AD)) &&
+	    !skb_vlan_tag_present(skb)) {
+		int err = skb_vlan_accel(skb);
+		if (unlikely(err))
+			return err;
+	}
+
 	skb_pull_rcsum(skb, ETH_HLEN);
 	skb_reset_mac_header(skb);
 	skb->mac_len -= ETH_HLEN;
@@ -314,13 +325,6 @@ static int push_eth(struct sk_buff *skb, struct sw_flow_key *key,
 		    const struct ovs_action_push_eth *ethh)
 {
 	struct ethhdr *hdr;
-	int err;
-
-	/* De-accelerate any hardware accelerated VLAN tag added to a previous
-	 * Ethernet header */
-	err = skb_vlan_deaccel(skb);
-	if (unlikely(err))
-		return err;
 
 	/* Add the new Ethernet header */
 	if (skb_cow_head(skb, ETH_HLEN) < 0)
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 4d2698596033..fdefee776d4f 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -469,8 +469,10 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
 	skb_reset_mac_header(skb);
 
 	/* Link layer. */
+	key->eth.tci = 0;
 	if (key->phy.is_layer3) {
-		key->eth.tci = 0;
+		if (skb_vlan_tag_present(skb))
+			key->eth.tci = htons(skb->vlan_tci);
 	} else {
 		eth = eth_hdr(skb);
 		ether_addr_copy(key->eth.src, eth->h_source);
@@ -481,7 +483,6 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
 		 * update skb->csum here.
 		 */
 
-		key->eth.tci = 0;
 		if (skb_vlan_tag_present(skb))
 			key->eth.tci = htons(skb->vlan_tci);
 		else if (eth->h_proto == htons(ETH_P_8021Q))

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support
  2016-05-20  5:29     ` Simon Horman
@ 2016-05-20  8:00       ` Jiri Benc
  2016-05-20  8:11         ` Simon Horman
  0 siblings, 1 reply; 43+ messages in thread
From: Jiri Benc @ 2016-05-20  8:00 UTC (permalink / raw)
  To: Simon Horman; +Cc: netdev, dev, Lorand Jakab, Thomas Morin

On Fri, 20 May 2016 14:29:01 +0900, Simon Horman wrote:
> The second option does seem rather tempting although I'm not sure
> that it actually plays out in the access-port scenario at this time.

We support gre ports to be access ports currently. With conversion to
ipgre, this needs to continue working. It's no problem for frames with
the Ethernet header but now we have a situation where a port is tagged,
thus the user expects that packets received on that port will behave
accordingly. I don't think we can make some packets honor this and some
ignore this; and we can't disallow gre to be an access port.

How do you plan to solve this? By user space always pushing an ethernet
header before push_vlan?

 Jiri

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support
  2016-05-20  8:00       ` Jiri Benc
@ 2016-05-20  8:11         ` Simon Horman
  2016-05-20  8:16           ` Simon Horman
  0 siblings, 1 reply; 43+ messages in thread
From: Simon Horman @ 2016-05-20  8:11 UTC (permalink / raw)
  To: Jiri Benc; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA

On Fri, May 20, 2016 at 10:00:28AM +0200, Jiri Benc wrote:
> On Fri, 20 May 2016 14:29:01 +0900, Simon Horman wrote:
> > The second option does seem rather tempting although I'm not sure
> > that it actually plays out in the access-port scenario at this time.
> 
> We support gre ports to be access ports currently. With conversion to
> ipgre, this needs to continue working. It's no problem for frames with
> the Ethernet header but now we have a situation where a port is tagged,
> thus the user expects that packets received on that port will behave
> accordingly. I don't think we can make some packets honor this and some
> ignore this; and we can't disallow gre to be an access port.
> 
> How do you plan to solve this? By user space always pushing an ethernet
> header before push_vlan?

Yes. That is my understanding of how OvS currently handles access ports but
I have a feeling that either I am mistaken or that you are referring to a
slightly different scenario.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support
  2016-05-20  8:11         ` Simon Horman
@ 2016-05-20  8:16           ` Simon Horman
       [not found]             ` <20160520081611.GB17561-IxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
  0 siblings, 1 reply; 43+ messages in thread
From: Simon Horman @ 2016-05-20  8:16 UTC (permalink / raw)
  To: Jiri Benc; +Cc: netdev, dev, Lorand Jakab, Thomas Morin

On Fri, May 20, 2016 at 05:11:23PM +0900, Simon Horman wrote:
> On Fri, May 20, 2016 at 10:00:28AM +0200, Jiri Benc wrote:
> > On Fri, 20 May 2016 14:29:01 +0900, Simon Horman wrote:
> > > The second option does seem rather tempting although I'm not sure
> > > that it actually plays out in the access-port scenario at this time.
> > 
> > We support gre ports to be access ports currently. With conversion to
> > ipgre, this needs to continue working. It's no problem for frames with
> > the Ethernet header but now we have a situation where a port is tagged,
> > thus the user expects that packets received on that port will behave
> > accordingly. I don't think we can make some packets honor this and some
> > ignore this; and we can't disallow gre to be an access port.
> > 
> > How do you plan to solve this? By user space always pushing an ethernet
> > header before push_vlan?
> 
> Yes. That is my understanding of how OvS currently handles access ports but
> I have a feeling that either I am mistaken or that you are referring to a
> slightly different scenario.

Hi again.

I apologise for having sent my previous email a little too quickly.

My understanding is that currently OvS handles access ports using a
push_vlan action. And that this patch set in conjunction with its
user-space counterpart should ensure that a push_eth action occurs first.
This is the context of my remarks above.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support
       [not found]             ` <20160520081611.GB17561-IxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
@ 2016-05-20  8:39               ` Jiri Benc
  2016-05-20  9:12                 ` Simon Horman
  0 siblings, 1 reply; 43+ messages in thread
From: Jiri Benc @ 2016-05-20  8:39 UTC (permalink / raw)
  To: Simon Horman; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA

On Fri, 20 May 2016 17:16:13 +0900, Simon Horman wrote:
> My understanding is that currently OvS handles access ports using a
> push_vlan action.

When needed (i.e. when the packet goes to a non-access port), yes.

> And that this patch set in conjunction with its
> user-space counterpart should ensure that a push_eth action occurs first.
> This is the context of my remarks above.

Okay, works for me principally. If it's later found insufficient,
relaxing push_vlan and pop_vlan to work also for L3 frames is still
easily possible without breaking compatibility.

Out of curiosity (and without looking at the user space patchset), what
will the pushed Ethernet header contain? E.g., consider the following
scenario: two GRE ports, both set as access ports with the same tag,
and a mirror port mirroring everything. Now an IP packet without inner
Ethernet header is received on one of the GRE interfaces.

Will the packet be output to the second GRE port? Will it be sent out
without the inner Ethernet header? Are custom rules necessary, or will
NORMAL take care of this? What will be sent to the mirror port?

Thanks!

 Jiri
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support
  2016-05-20  8:39               ` Jiri Benc
@ 2016-05-20  9:12                 ` Simon Horman
  2016-05-20  9:20                   ` Jiri Benc
  0 siblings, 1 reply; 43+ messages in thread
From: Simon Horman @ 2016-05-20  9:12 UTC (permalink / raw)
  To: Jiri Benc; +Cc: netdev, dev, Lorand Jakab, Thomas Morin

On Fri, May 20, 2016 at 10:39:39AM +0200, Jiri Benc wrote:
> On Fri, 20 May 2016 17:16:13 +0900, Simon Horman wrote:
> > My understanding is that currently OvS handles access ports using a
> > push_vlan action.
> 
> When needed (i.e. when the packet goes to a non-access port), yes.
> 
> > And that this patch set in conjunction with its
> > user-space counterpart should ensure that a push_eth action occurs first.
> > This is the context of my remarks above.
> 
> Okay, works for me principally. If it's later found insufficient,
> relaxing push_vlan and pop_vlan to work also for L3 frames is still
> easily possible without breaking compatibility.

Right. I'm all for allowing extension later if the need arises.

> Out of curiosity (and without looking at the user space patchset), what
> will the pushed Ethernet header contain? E.g., consider the following
> scenario: two GRE ports, both set as access ports with the same tag,
> and a mirror port mirroring everything. Now an IP packet without inner
> Ethernet header is received on one of the GRE interfaces.
> 
> Will the packet be output to the second GRE port? Will it be sent out
> without the inner Ethernet header? Are custom rules necessary, or will
> NORMAL take care of this? What will be sent to the mirror port?

Let me take a stab at answering that without running any tests.

1. push_eth adds an Ethernet header with all-zero addresses and
   the Ethernet type as determined from skb->protocol which is in
   turn determined by the tunnel header (we have discussed that
   bit before).

   In principle it is pushed when needed. And this happens automatically
   as controlled by user-space.

   It is possible to modify the Ethernet addresses using a custom rule.
   (I need to exercise that more often.)

2. For the GRE part of the scenario above it is important to know that with
   the accompanying user-space patch set OvS user-space the user-space
   representation of a vport (from now on simply vport) may be "layer3" or
   not.

   This allows OvS user-space to determine if an Ethernet header should be
   present or not on receive. And if it needs to be present or not on
   transmit. This allows it to automatically use pop_eth and push_eth to
   control the presence of an Ethernet header so its there when it needs to
   be and not when it doesn't.

   So if a GRE vport is "layer3" then no Ethernet header should be
   present on transmit, regardless of where the packet came from. And
   conversely if the GRE vport is not "layer3" then an Ethernet header
   should be present.

3. With regards to the mirroring part of your connection, I need to check
   on that and possibly its broken. But my thinking is that a mirroring
   vport would regarded in the same way as any other vport in this respect
   and the presence of the "layer3" flag would control whether an Ethernet
   header is present or not.

   It may be the case that its not possible to use a tunnel vport as a
   mirroring vport. And as all other vports currently do not support
   "layer3" then currently an Ethernet header would always be present
   on output to a mirror.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support
  2016-05-20  9:12                 ` Simon Horman
@ 2016-05-20  9:20                   ` Jiri Benc
  2016-05-20 10:14                     ` Simon Horman
  0 siblings, 1 reply; 43+ messages in thread
From: Jiri Benc @ 2016-05-20  9:20 UTC (permalink / raw)
  To: Simon Horman; +Cc: netdev, dev, Lorand Jakab, Thomas Morin

On Fri, 20 May 2016 18:12:05 +0900, Simon Horman wrote:
> 1. push_eth adds an Ethernet header with all-zero addresses and
>    the Ethernet type as determined from skb->protocol which is in
>    turn determined by the tunnel header (we have discussed that
>    bit before).
> 
>    In principle it is pushed when needed. And this happens automatically
>    as controlled by user-space.
> 
>    It is possible to modify the Ethernet addresses using a custom rule.
>    (I need to exercise that more often.)
> 
> 2. For the GRE part of the scenario above it is important to know that with
>    the accompanying user-space patch set OvS user-space the user-space
>    representation of a vport (from now on simply vport) may be "layer3" or
>    not.
> 
>    This allows OvS user-space to determine if an Ethernet header should be
>    present or not on receive. And if it needs to be present or not on
>    transmit. This allows it to automatically use pop_eth and push_eth to
>    control the presence of an Ethernet header so its there when it needs to
>    be and not when it doesn't.
> 
>    So if a GRE vport is "layer3" then no Ethernet header should be
>    present on transmit, regardless of where the packet came from. And
>    conversely if the GRE vport is not "layer3" then an Ethernet header
>    should be present.

I think this works for me. Thanks a lot for answering my questions!

> 3. With regards to the mirroring part of your connection, I need to check
>    on that and possibly its broken. But my thinking is that a mirroring
>    vport would regarded in the same way as any other vport in this respect
>    and the presence of the "layer3" flag would control whether an Ethernet
>    header is present or not.
> 
>    It may be the case that its not possible to use a tunnel vport as a
>    mirroring vport. And as all other vports currently do not support
>    "layer3" then currently an Ethernet header would always be present
>    on output to a mirror.

We can just require a mirror port to be always L2 and output the L3
packets with zero Ethernet addresses. For mirroring, this should be
okay(?)

 Jiri

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support
  2016-05-20  9:20                   ` Jiri Benc
@ 2016-05-20 10:14                     ` Simon Horman
  0 siblings, 0 replies; 43+ messages in thread
From: Simon Horman @ 2016-05-20 10:14 UTC (permalink / raw)
  To: Jiri Benc; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA

On Fri, May 20, 2016 at 11:20:04AM +0200, Jiri Benc wrote:
> On Fri, 20 May 2016 18:12:05 +0900, Simon Horman wrote:

[...]

> > 3. With regards to the mirroring part of your question, I need to check
> >    on that and possibly its broken. But my thinking is that a mirroring
> >    vport would regarded in the same way as any other vport in this respect
> >    and the presence of the "layer3" flag would control whether an Ethernet
> >    header is present or not.
> > 
> >    It may be the case that its not possible to use a tunnel vport as a
> >    mirroring vport. And as all other vports currently do not support
> >    "layer3" then currently an Ethernet header would always be present
> >    on output to a mirror.
> 
> We can just require a mirror port to be always L2 and output the L3
> packets with zero Ethernet addresses. For mirroring, this should be
> okay(?)

Yes, I expect that is a good approach.

I'll look over the code to see about making that so if its not already the
case.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2016-05-20 10:14 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-04  7:36 [PATCH v9 net-next 0/7] openvswitch: support for layer 3 encapsulated packets Simon Horman
2016-05-04  7:36 ` [PATCH v9 net-next 1/7] net: add skb_vlan_deaccel helper Simon Horman
2016-05-04  7:36 ` [PATCH v9 net-next 2/7] openvswitch: set skb protocol when receiving on internal device Simon Horman
2016-05-04  7:36 ` [PATCH v9 net-next 3/7] openvswitch: add support to push and pop mpls for layer3 packets Simon Horman
     [not found]   ` <1462347393-22354-4-git-send-email-simon.horman-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>
2016-05-05 17:35     ` pravin shelar
2016-05-06  4:33       ` Simon Horman
2016-05-04  7:36 ` [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support Simon Horman
     [not found]   ` <1462347393-22354-5-git-send-email-simon.horman-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>
2016-05-05 17:37     ` pravin shelar
2016-05-06  5:57       ` Simon Horman
2016-05-06  9:25         ` Jiri Benc
2016-05-09  8:04           ` Simon Horman
     [not found]             ` <20160509080420.GA4470-IxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
2016-05-10 12:01               ` Jiri Benc
2016-05-11  1:50                 ` Simon Horman
     [not found]                   ` <20160511015009.GB24436-IxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
2016-05-11  3:06                     ` Simon Horman
2016-05-11 14:09                       ` Jiri Benc
2016-05-11 22:46                         ` Simon Horman
2016-05-17 14:43                           ` Jiri Benc
2016-05-18  2:18                             ` Simon Horman
2016-05-11 13:57                   ` Jiri Benc
2016-05-06  9:35   ` Jiri Benc
2016-05-09  8:18     ` Simon Horman
2016-05-10  0:16       ` [ovs-dev] " Yang, Yi Y
     [not found]         ` <79BBBFE6CB6C9B488C1A45ACD284F51913CB6446-0J0gbvR4kTggGBtAFL8yw7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2016-05-10 12:07           ` Jiri Benc
2016-05-10 12:06       ` Jiri Benc
2016-05-11  3:28         ` Simon Horman
2016-05-11 14:10           ` Jiri Benc
2016-05-17 14:32   ` Jiri Benc
2016-05-20  5:29     ` Simon Horman
2016-05-20  8:00       ` Jiri Benc
2016-05-20  8:11         ` Simon Horman
2016-05-20  8:16           ` Simon Horman
     [not found]             ` <20160520081611.GB17561-IxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
2016-05-20  8:39               ` Jiri Benc
2016-05-20  9:12                 ` Simon Horman
2016-05-20  9:20                   ` Jiri Benc
2016-05-20 10:14                     ` Simon Horman
     [not found] ` <1462347393-22354-1-git-send-email-simon.horman-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>
2016-05-04  7:36   ` [PATCH v9 net-next 5/7] openvswitch: add layer 3 support to ovs_packet_cmd_execute() Simon Horman
     [not found]     ` <1462347393-22354-6-git-send-email-simon.horman-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>
2016-05-17 14:51       ` Jiri Benc
2016-05-18  2:24         ` Simon Horman
2016-05-04  7:36 ` [PATCH v9 net-next 6/7] openvswitch: extend layer 3 support to cover non-IP packets Simon Horman
2016-05-04  7:36 ` [PATCH v9 net-next 7/7] openvswitch: use ipgre tunnel rather than gretap tunnel Simon Horman
     [not found]   ` <1462347393-22354-8-git-send-email-simon.horman-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>
2016-05-05 21:45     ` pravin shelar
2016-05-06  6:54       ` Simon Horman
2016-05-06  9:15         ` Jiri Benc

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.