[PATCH net-next v11 0/6] openvswitch: support for layer 3 encapsulated packets


* [PATCH net-next v11 0/6] openvswitch: support for layer 3 encapsulated packets
@ 2016-07-06 17:59 Simon Horman
From: Simon Horman @ 2016-07-06 17:59 UTC
  To: netdev-u79uwXL29TY76Z2rM5mHXA; +Cc: dev-yBygre7rU0TnMu66kgdUjQ

At the core of this patch set is removing the assumption in the Open vSwitch
datapath that all packets have an Ethernet header. Support for layer 3 GRE
tunnels is also added by this patch set.

The implementation relies on the presence of pop_eth and push_eth actions
in datapath flows to facilitate adding and removing Ethernet headers as
appropriate. The construction of such flows is left up to user-space.
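As a rough userspace illustration of what dropping the Ethernet assumption
means for flow key extraction, here is a sketch. struct toy_flow_key and
toy_extract() are hypothetical simplifications, not datapath code; VLAN
tags are ignored, and meta_proto stands in for the EtherType that the
kernel takes from skb->protocol when a packet has no Ethernet header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_ETH_ALEN 6
#define TOY_ETH_HLEN 14

/* Hypothetical, simplified flow key; not the kernel's struct sw_flow_key. */
struct toy_flow_key {
	bool is_layer3;               /* packet has no Ethernet header */
	uint8_t src[TOY_ETH_ALEN];
	uint8_t dst[TOY_ETH_ALEN];
	uint16_t eth_type;            /* host byte order for simplicity */
};

/* Fill the key from a packet.  For layer 3 packets the EtherType comes
 * from metadata (meta_proto) and the MAC addresses stay zeroed.
 * Returns 0 on success, -1 if a layer 2 frame is too short. */
static int toy_extract(const uint8_t *pkt, size_t len, bool is_layer3,
		       uint16_t meta_proto, struct toy_flow_key *key)
{
	assert(key != NULL);
	memset(key, 0, sizeof(*key));
	key->is_layer3 = is_layer3;

	if (is_layer3) {
		/* No L2 header to parse: trust the metadata. */
		key->eth_type = meta_proto;
		return 0;
	}

	if (len < TOY_ETH_HLEN)
		return -1;
	/* Ethernet layout: destination MAC, source MAC, EtherType. */
	memcpy(key->dst, pkt, TOY_ETH_ALEN);
	memcpy(key->src, pkt + TOY_ETH_ALEN, TOY_ETH_ALEN);
	/* EtherType is big-endian on the wire. */
	key->eth_type = (uint16_t)((pkt[12] << 8) | pkt[13]);
	return 0;
}
```

The point of the branch is that a layer 3 key still carries an EtherType,
so matching on the network protocol keeps working without MAC addresses.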

This series is based on work by Lorand Jakab, Thomas Morin and others.
It also relies on recently merged work by Jiri Benc; many thanks to him
for his help.

This patch set consists of kernel patches against net-next.

To aid review, this series and the above dependency are available at:

    tree: https://github.com/horms/openvswitch
    branch: l3-vpn
    tag: l3-vpn-v11

There is a companion patch set for the Open vSwitch user-space code
which I will post separately to the dev@openvswitch.org mailing list as:

    "[PATCH v12 0/3] userspace: Support for layer 3 encapsulated packets"


Lorand Jakab (1):
  openvswitch: add layer 3 flow/port support

Simon Horman (5):
  net: introduce skb_transport_header_was_set()
  gre: unset mac header for non-TEB packets received by ipgre device
  openvswitch: set skb protocol and mac_len when receiving on internal
    device
  openvswitch: add support to push and pop mpls for layer3 packets
  openvswitch: use ipgre tunnel rather than gretap tunnel

 include/linux/skbuff.h               |   5 +
 include/net/gre.h                    |   4 +-
 include/uapi/linux/openvswitch.h     |  13 +++
 net/ipv4/ip_gre.c                    |  11 +-
 net/openvswitch/actions.c            |  69 ++++++++++--
 net/openvswitch/datapath.c           |  13 +--
 net/openvswitch/flow.c               |  65 +++++++----
 net/openvswitch/flow.h               |   4 +-
 net/openvswitch/flow_netlink.c       | 213 ++++++++++++++++++++++++-----------
 net/openvswitch/vport-geneve.c       |   2 +-
 net/openvswitch/vport-gre.c          |   4 +-
 net/openvswitch/vport-internal_dev.c |   9 ++
 net/openvswitch/vport-netdev.c       |  27 ++++-
 net/openvswitch/vport-netdev.h       |   2 +
 net/openvswitch/vport-vxlan.c        |   2 +-
 15 files changed, 321 insertions(+), 122 deletions(-)

-- 
2.7.0.rc3.207.g0ac5344

_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

* [PATCH net-next v11 1/6] net: introduce skb_transport_header_was_set()
@ 2016-07-06 17:59   ` Simon Horman
  2016-07-06 17:59   ` [PATCH net-next v11 2/6] gre: unset mac header for non-TEB packets received by ipgre device Simon Horman
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 36+ messages in thread
From: Simon Horman @ 2016-07-06 17:59 UTC
  To: netdev-u79uwXL29TY76Z2rM5mHXA; +Cc: dev-yBygre7rU0TnMu66kgdUjQ

The new helper, skb_unset_mac_header(), resets the mac_header of an skb
to a state where skb_mac_header_was_set() will return false.

This is intended to be used with packets received on ARPHRD_NONE
devices without an Ethernet header in the inner packet. It allows
skb_mac_header_was_set() to be subsequently used to differentiate such
packets from those with an Ethernet header.
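The sentinel trick can be sketched in userspace as follows. struct toy_skb
and the toy_* functions are hypothetical stand-ins; the sketch assumes a
16-bit mac_header offset, as sk_buff uses in kernels of this era, so
(typeof(skb->mac_header))~0U becomes 0xffff, a value no real offset takes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sk_buff with only the fields needed.
 * In the kernel, mac_header is an offset from skb->head. */
struct toy_skb {
	uint16_t mac_header;
	uint16_t data_off;   /* offset of skb->data from skb->head */
};

/* As in the patch: all-ones, truncated to the field width, means "unset". */
static void toy_unset_mac_header(struct toy_skb *skb)
{
	skb->mac_header = (uint16_t)~0U;
}

/* Point the MAC header at the current data pointer. */
static void toy_reset_mac_header(struct toy_skb *skb)
{
	/* A real offset must never collide with the sentinel. */
	assert(skb->data_off != (uint16_t)~0U);
	skb->mac_header = skb->data_off;
}

/* The same comparison skb_mac_header_was_set() performs. */
static bool toy_mac_header_was_set(const struct toy_skb *skb)
{
	return skb->mac_header != (uint16_t)~0U;
}
```

A receive path can therefore record "no Ethernet header here" without
adding a new field to the skb.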

Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
v11
* New patch
---
 include/linux/skbuff.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 638b0e004310..669d63b038f6 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2170,6 +2170,11 @@ static inline void skb_reset_mac_header(struct sk_buff *skb)
 	skb->mac_header = skb->data - skb->head;
 }
 
+static inline void skb_unset_mac_header(struct sk_buff *skb)
+{
+	skb->mac_header = (typeof(skb->mac_header))~0U;
+}
+
 static inline void skb_set_mac_header(struct sk_buff *skb, const int offset)
 {
 	skb_reset_mac_header(skb);
-- 
2.7.0.rc3.207.g0ac5344

* [PATCH net-next v11 2/6] gre: unset mac header for non-TEB packets received by ipgre device
  2016-07-06 17:59   ` [PATCH net-next v11 1/6] net: introduce skb_transport_header_was_set() Simon Horman
@ 2016-07-06 17:59   ` Simon Horman
  2016-07-07 20:51     ` [ovs-dev] " pravin shelar
  2016-07-06 17:59   ` [PATCH net-next v11 3/6] openvswitch: set skb protocol and mac_len when receiving on internal device Simon Horman
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 36+ messages in thread
From: Simon Horman @ 2016-07-06 17:59 UTC
  To: netdev-u79uwXL29TY76Z2rM5mHXA; +Cc: dev-yBygre7rU0TnMu66kgdUjQ

Unset rather than reset the mac header for non-TEB packets received by an
ipgre device.  This allows skb_mac_header_was_set() to be subsequently
used to differentiate TEB and non-TEB packets received on an ipgre device.
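The three-way decision that __ipgre_rcv() makes after this change can be
summarised as a small pure function. The enum and toy_ipgre_mac_action()
are hypothetical names for illustration; the sketch compares the GRE
protocol in host byte order, whereas the kernel compares tpi->proto
against htons(ETH_P_TEB).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_ETH_P_TEB 0x6558   /* Transparent Ethernet Bridging */

enum toy_mac_action {
	TOY_POP_MAC_HEADER,    /* device is not ARPHRD_NONE */
	TOY_UNSET_MAC_HEADER,  /* ARPHRD_NONE, inner packet has no Ethernet header */
	TOY_RESET_MAC_HEADER,  /* ARPHRD_NONE, inner packet is an Ethernet (TEB) frame */
};

static enum toy_mac_action toy_ipgre_mac_action(bool dev_is_arphrd_none,
						uint16_t gre_proto)
{
	if (!dev_is_arphrd_none)
		return TOY_POP_MAC_HEADER;
	if (gre_proto != TOY_ETH_P_TEB)
		return TOY_UNSET_MAC_HEADER;
	return TOY_RESET_MAC_HEADER;
}
```

Only the middle case is new; the other two branches keep their previous
behaviour.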

Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
v11
* New patch
---
 net/ipv4/ip_gre.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 5b1481be0282..330d58e9c523 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -283,6 +283,8 @@ static int __ipgre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi,
 
 		if (tunnel->dev->type != ARPHRD_NONE)
 			skb_pop_mac_header(skb);
+		else if (tpi->proto != htons(ETH_P_TEB))
+			skb_unset_mac_header(skb);
 		else
 			skb_reset_mac_header(skb);
 		if (tunnel->collect_md) {
-- 
2.7.0.rc3.207.g0ac5344

* [PATCH net-next v11 3/6] openvswitch: set skb protocol and mac_len when receiving on internal device
  2016-07-06 17:59   ` [PATCH net-next v11 1/6] net: introduce skb_transport_header_was_set() Simon Horman
  2016-07-06 17:59   ` [PATCH net-next v11 2/6] gre: unset mac header for non-TEB packets received by ipgre device Simon Horman
@ 2016-07-06 17:59   ` Simon Horman
  2016-07-06 17:59   ` [PATCH net-next v11 4/6] openvswitch: add support to push and pop mpls for layer3 packets Simon Horman
                     ` (2 subsequent siblings)
  5 siblings, 1 reply; 36+ messages in thread
From: Simon Horman @ 2016-07-06 17:59 UTC
  To: netdev-u79uwXL29TY76Z2rM5mHXA; +Cc: dev-yBygre7rU0TnMu66kgdUjQ

* Set skb protocol based on the contents of the packet. I have observed
  this is necessary to get the actual protocol of a packet when it is
  injected into an internal device, e.g. by libnet, in which case skb
  protocol will be set to ETH_P_ALL.

* Set the mac_len which has been observed to not be set up correctly when
  an ARP packet is generated and sent via an Open vSwitch bridge.
  My test case is a scenario where there are two Open vSwitch bridges.
  One outputs to a tunnel port which egresses on the other.

The motivation for this is that support for outputting to layer 3 (non-tap)
GRE tunnels as implemented by a subsequent patch depends on protocol and
mac_len being set correctly on receive.
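For reference, the basic rule for deriving a protocol from an Ethernet
header is sketched below in userspace. It mirrors the simplified logic
that ovs_packet_cmd_execute() used before this series, not the full
eth_type_trans(), which additionally detects raw 802.3 frames and sets
pkt_type. toy_protocol_from_frame() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ETH_HLEN        14
#define TOY_ETH_P_802_3_MIN 0x0600  /* smallest value that is an EtherType */
#define TOY_ETH_P_802_2     0x0004  /* Linux pseudo-protocol for 802.2 LLC */

/* Return the protocol in host byte order, or 0 if the frame is too short. */
static uint16_t toy_protocol_from_frame(const uint8_t *frame, size_t len)
{
	uint16_t h_proto;

	if (len < TOY_ETH_HLEN)
		return 0;
	h_proto = (uint16_t)((frame[12] << 8) | frame[13]);
	/* Values below 0x0600 are an 802.3 length field, not an EtherType. */
	return h_proto >= TOY_ETH_P_802_3_MIN ? h_proto : TOY_ETH_P_802_2;
}
```

With the patch applied, eth_type_trans() does this work on the internal
device transmit path, so skb->protocol no longer stays at ETH_P_ALL.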

Signed-off-by: Simon Horman <simon.horman@netronome.com>

---
v11
* Do not set mac_len.
  Instead of relying on mac_len follow-up patches now
  use skb_unset_mac_header()

v10
* Set mac_len

v9
* New patch
---
 net/openvswitch/vport-internal_dev.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
index 434e04c3a189..32d8e94d9bff 100644
--- a/net/openvswitch/vport-internal_dev.c
+++ b/net/openvswitch/vport-internal_dev.c
@@ -48,6 +48,9 @@ static int internal_dev_xmit(struct sk_buff *skb, struct net_device *netdev)
 {
 	int len, err;
 
+	skb->protocol = eth_type_trans(skb, netdev);
+	skb_push(skb, ETH_HLEN);
+
 	len = skb->len;
 	rcu_read_lock();
 	err = ovs_vport_receive(internal_dev_priv(netdev)->vport, skb, NULL);
-- 
2.7.0.rc3.207.g0ac5344

* [PATCH net-next v11 4/6] openvswitch: add support to push and pop mpls for layer3 packets
                     ` (2 preceding siblings ...)
  2016-07-06 17:59   ` [PATCH net-next v11 3/6] openvswitch: set skb protocol and mac_len when receiving on internal device Simon Horman
@ 2016-07-06 17:59   ` Simon Horman
  2016-07-06 17:59   ` [PATCH net-next v11 5/6] openvswitch: add layer 3 flow/port support Simon Horman
  2016-07-06 17:59   ` [PATCH net-next v11 6/6] openvswitch: use ipgre tunnel rather than gretap tunnel Simon Horman
  5 siblings, 1 reply; 36+ messages in thread
From: Simon Horman @ 2016-07-06 17:59 UTC
  To: netdev-u79uwXL29TY76Z2rM5mHXA; +Cc: dev-yBygre7rU0TnMu66kgdUjQ

Allow push and pop mpls actions to act on layer 3 packets by teaching
them not to access non-existent L2 headers of such packets.
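The buffer manipulation behind push_mpls can be sketched in userspace as
below. struct toy_pkt and toy_push_mpls() are hypothetical; the sketch
assumes an untagged 14-byte Ethernet header when mac_len is non-zero and
always writes the unicast MPLS EtherType 0x8847. Note the ordering: the
L2 header is slid into the new headroom first, and only then is the label
written into the gap it leaves behind.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MPLS_HLEN     4
#define TOY_ETH_P_MPLS_UC 0x8847

/* Packet data starts at buf + off; the first mac_len bytes are the
 * L2 header (0 for a layer 3 packet). */
struct toy_pkt {
	uint8_t buf[128];
	size_t off;
	size_t len;
	size_t mac_len;
};

/* Push one label stack entry, given in network byte order. */
static int toy_push_mpls(struct toy_pkt *p, const uint8_t lse[TOY_MPLS_HLEN])
{
	assert(p->mac_len == 0 || p->mac_len >= 14);
	if (p->off < TOY_MPLS_HLEN)
		return -1;  /* not enough headroom */

	/* 1. Grow the packet at the front. */
	p->off -= TOY_MPLS_HLEN;
	p->len += TOY_MPLS_HLEN;
	/* 2. Slide the L2 header (if any) forward; no-op when mac_len is 0. */
	memmove(p->buf + p->off, p->buf + p->off + TOY_MPLS_HLEN, p->mac_len);
	/* 3. The label goes right after the (possibly empty) L2 header. */
	memcpy(p->buf + p->off + p->mac_len, lse, TOY_MPLS_HLEN);
	/* 4. Only an Ethernet header has an EtherType to rewrite. */
	if (p->mac_len) {
		p->buf[p->off + 12] = TOY_ETH_P_MPLS_UC >> 8;
		p->buf[p->off + 13] = TOY_ETH_P_MPLS_UC & 0xff;
	}
	return 0;
}
```

For a layer 3 packet the label simply becomes the first four bytes of
the packet.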

Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
v11
* group l2 code in pop_mpls()

v10
* Limit scope of hdr in {push,pop}_mpls()

v9
* New Patch
---
 include/uapi/linux/openvswitch.h |  2 ++
 net/openvswitch/actions.c        | 24 +++++++++++++++---------
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index d95a3018f6a1..5cde501433eb 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -737,6 +737,8 @@ enum ovs_nat_attr {
  * is no MPLS label stack, as determined by ethertype, no action is taken.
  * @OVS_ACTION_ATTR_CT: Track the connection. Populate the conntrack-related
  * entries in the flow key.
+ * @OVS_ACTION_ATTR_PUSH_ETH: Push a new outermost Ethernet header onto the packet.
+ * @OVS_ACTION_ATTR_POP_ETH: Pop the outermost Ethernet header off the packet.
  *
  * Only a single header can be set with a single %OVS_ACTION_ATTR_SET.  Not all
  * fields within a header are modifiable, e.g. the IPv4 protocol and fragment
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 1ecbd7715f6d..12e8a8942a42 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -163,8 +163,6 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
 		return -ENOMEM;
 
 	skb_push(skb, MPLS_HLEN);
-	memmove(skb_mac_header(skb) - MPLS_HLEN, skb_mac_header(skb),
-		skb->mac_len);
 	skb_reset_mac_header(skb);
 
 	new_mpls_lse = (__be32 *)skb_mpls_header(skb);
@@ -172,7 +170,11 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
 
 	skb_postpush_rcsum(skb, new_mpls_lse, MPLS_HLEN);
 
-	update_ethertype(skb, eth_hdr(skb), mpls->mpls_ethertype);
+	if (skb->mac_len) {
+		update_ethertype(skb, eth_hdr(skb), mpls->mpls_ethertype);
+		memmove(skb_mac_header(skb) - MPLS_HLEN, skb_mac_header(skb),
+			skb->mac_len);
+	}
 	if (!skb->inner_protocol)
 		skb_set_inner_protocol(skb, skb->protocol);
 	skb->protocol = mpls->mpls_ethertype;
@@ -184,7 +186,6 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
 static int pop_mpls(struct sk_buff *skb, struct sw_flow_key *key,
 		    const __be16 ethertype)
 {
-	struct ethhdr *hdr;
 	int err;
 
 	err = skb_ensure_writable(skb, skb->mac_len + MPLS_HLEN);
@@ -199,11 +200,16 @@ static int pop_mpls(struct sk_buff *skb, struct sw_flow_key *key,
 	__skb_pull(skb, MPLS_HLEN);
 	skb_reset_mac_header(skb);
 
-	/* skb_mpls_header() is used to locate the ethertype
-	 * field correctly in the presence of VLAN tags.
-	 */
-	hdr = (struct ethhdr *)(skb_mpls_header(skb) - ETH_HLEN);
-	update_ethertype(skb, hdr, ethertype);
+	if (skb->mac_len) {
+		struct ethhdr *hdr;
+
+		/* skb_mpls_header() is used to locate the ethertype
+		 * field correctly in the presence of VLAN tags.
+		 */
+		hdr = (struct ethhdr *)(skb_mpls_header(skb) - ETH_HLEN);
+		update_ethertype(skb, hdr, ethertype);
+	}
+
 	if (eth_p_mpls(skb->protocol))
 		skb->protocol = ethertype;
 
-- 
2.7.0.rc3.207.g0ac5344

* [PATCH net-next v11 5/6] openvswitch: add layer 3 flow/port support
                     ` (3 preceding siblings ...)
  2016-07-06 17:59   ` [PATCH net-next v11 4/6] openvswitch: add support to push and pop mpls for layer3 packets Simon Horman
@ 2016-07-06 17:59   ` Simon Horman
  2016-07-07 20:54     ` [ovs-dev] " pravin shelar
  2016-07-06 17:59   ` [PATCH net-next v11 6/6] openvswitch: use ipgre tunnel rather than gretap tunnel Simon Horman
  5 siblings, 1 reply; 36+ messages in thread
From: Simon Horman @ 2016-07-06 17:59 UTC
  To: netdev-u79uwXL29TY76Z2rM5mHXA; +Cc: dev-yBygre7rU0TnMu66kgdUjQ

From: Lorand Jakab <lojakab@cisco.com>

Implementation of the pop_eth and push_eth actions in the kernel, and
layer 3 flow support.

This doesn't actually do anything yet as no layer 3 tunnel ports are
supported yet. The original patch by Lorand was against the Open vSwitch
tree, which has L3 LISP tunnels, but those are not supported in mainline
Linux. I (Simon) plan to follow up with support for non-TEB GRE ports
based on work by Thomas Morin.
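The header arithmetic of push_eth and pop_eth can be sketched on a plain
byte buffer. struct toy_pkt, toy_push_eth() and toy_pop_eth() are
hypothetical userspace stand-ins; the kernel versions additionally call
skb_cow_head(), fix up checksums with skb_postpush_rcsum() and
skb_pull_rcsum(), and invalidate the flow key, all of which the sketch
omits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_ETH_ALEN 6
#define TOY_ETH_HLEN 14

struct toy_pkt {
	uint8_t buf[128];
	size_t off;        /* start of packet data within buf */
	size_t len;        /* bytes of packet data */
	size_t mac_len;    /* 0 for a layer 3 packet */
	uint16_t protocol; /* host order; plays the role of skb->protocol */
};

static int toy_push_eth(struct toy_pkt *p, const uint8_t dst[TOY_ETH_ALEN],
			const uint8_t src[TOY_ETH_ALEN])
{
	uint8_t *h;

	if (p->off < TOY_ETH_HLEN)
		return -1;  /* not enough headroom */
	p->off -= TOY_ETH_HLEN;
	p->len += TOY_ETH_HLEN;
	p->mac_len += TOY_ETH_HLEN;

	h = p->buf + p->off;
	memcpy(h, dst, TOY_ETH_ALEN);
	memcpy(h + TOY_ETH_ALEN, src, TOY_ETH_ALEN);
	/* As in the patch, the EtherType comes from the packet's protocol. */
	h[12] = (uint8_t)(p->protocol >> 8);
	h[13] = (uint8_t)(p->protocol & 0xff);
	return 0;
}

static int toy_pop_eth(struct toy_pkt *p)
{
	if (p->mac_len < TOY_ETH_HLEN || p->len < TOY_ETH_HLEN)
		return -1;  /* no Ethernet header to remove */
	p->off += TOY_ETH_HLEN;
	p->len -= TOY_ETH_HLEN;
	p->mac_len -= TOY_ETH_HLEN;
	return 0;
}
```

Popping after pushing restores the original layer 3 view of the packet,
which is what lets user-space build flows that convert between the two.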

Cc: Thomas Morin <thomas.morin@orange.com>
Signed-off-by: Lorand Jakab <lojakab@cisco.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>

---
v11 [Simon Horman]
* Consolidate setting of eth.key in key_extract.
* Limit scope of eth in key_extract()
* Update push_eth to account for MPLS
* Do not include VLAN support in pop_eth
  - pop_eth is never called for VLAN packets so don't add support for them
* Use OVS_KEY_ATTR_ETHERTYPE instead of adding OVS_KEY_ATTR_PACKET_ETHERTYPE
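The resulting netlink convention (a key is layer 3 when it carries
OVS_KEY_ATTR_ETHERTYPE but no OVS_KEY_ATTR_ETHERNET) reduces to a bitmask
test. toy_key_is_layer3() is hypothetical, and the bit positions are
assumed to match enum ovs_key_attr in <linux/openvswitch.h>; check the
header before relying on them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror enum ovs_key_attr; verify against the uapi header. */
enum toy_key_attr {
	TOY_KEY_ATTR_ETHERNET = 4,
	TOY_KEY_ATTR_ETHERTYPE = 6,
};

/* Layer 3 key: EtherType present, Ethernet (MAC address) attribute absent. */
static bool toy_key_is_layer3(uint64_t attrs)
{
	bool has_ethertype = attrs & (1ULL << TOY_KEY_ATTR_ETHERTYPE);
	bool has_ethernet = attrs & (1ULL << TOY_KEY_ATTR_ETHERNET);

	return has_ethertype && !has_ethernet;
}
```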

v10 [Simon Horman]
* Move outermost VLAN into skb metadata in pop_eth and
  leave any VLAN as-is in push_eth. The effect is to allow the presence
  of a VLAN to be independent of pushing and popping Ethernet headers.
* Omit unnecessary type field from push_eth action
* Squash with the following patches to make a more complete patch:
  "openvswitch: add layer 3 support to ovs_packet_cmd_execute()"
  "openvswitch: extend layer 3 support to cover non-IP packets"

v9 [Simon Horman]
* Rebase
* Minor coding style updates
* Prohibit push/pop MPLS on l3 packets
* There are no layer 3 ports supported at this time so only
  send and receive layer 2 packets: that is, don't actually
  use this new infrastructure yet
* Expect that vports that can handle layer 3 packets will: have
  a type other than ARPHRD_ETHER; can also handle layer 2 packets;
  and that packets can be differentiated by layer 2 packets having
  skb->protocol set to htons(ETH_P_TEB)

v1 - v8 [Lorand Jakab]
---
 include/uapi/linux/openvswitch.h     |  11 ++
 net/openvswitch/actions.c            |  45 ++++++++
 net/openvswitch/datapath.c           |  13 +--
 net/openvswitch/flow.c               |  65 +++++++----
 net/openvswitch/flow.h               |   4 +-
 net/openvswitch/flow_netlink.c       | 213 ++++++++++++++++++++++++-----------
 net/openvswitch/vport-geneve.c       |   2 +-
 net/openvswitch/vport-gre.c          |   2 +-
 net/openvswitch/vport-internal_dev.c |   6 +
 net/openvswitch/vport-netdev.c       |  19 +++-
 net/openvswitch/vport-netdev.h       |   2 +
 net/openvswitch/vport-vxlan.c        |   2 +-
 12 files changed, 279 insertions(+), 105 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 5cde501433eb..6f505e486e93 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -705,6 +705,15 @@ enum ovs_nat_attr {
 
 #define OVS_NAT_ATTR_MAX (__OVS_NAT_ATTR_MAX - 1)
 
+/*
+ * struct ovs_action_push_eth - %OVS_ACTION_ATTR_PUSH_ETH action argument.
+ * @addresses: Source and destination MAC addresses.
+ * @eth_type: Ethernet type
+ */
+struct ovs_action_push_eth {
+	struct ovs_key_ethernet addresses;
+};
+
 /**
  * enum ovs_action_attr - Action types.
  *
@@ -766,6 +775,8 @@ enum ovs_action_attr {
 				       * bits. */
 	OVS_ACTION_ATTR_CT,           /* Nested OVS_CT_ATTR_* . */
 	OVS_ACTION_ATTR_TRUNC,        /* u32 struct ovs_action_trunc. */
+	OVS_ACTION_ATTR_PUSH_ETH,     /* struct ovs_action_push_eth. */
+	OVS_ACTION_ATTR_POP_ETH,      /* No argument. */
 
 	__OVS_ACTION_ATTR_MAX,	      /* Nothing past this will be accepted
 				       * from userspace. */
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 12e8a8942a42..0001f651c934 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -301,6 +301,43 @@ static int set_eth_addr(struct sk_buff *skb, struct sw_flow_key *flow_key,
 	return 0;
 }
 
+/* pop_eth does not support VLAN packets as this action is never called
+ * for them.
+ */
+static int pop_eth(struct sk_buff *skb, struct sw_flow_key *key)
+{
+	skb_pull_rcsum(skb, ETH_HLEN);
+	skb_reset_mac_header(skb);
+	skb->mac_len -= ETH_HLEN;
+
+	invalidate_flow_key(key);
+	return 0;
+}
+
+static int push_eth(struct sk_buff *skb, struct sw_flow_key *key,
+		    const struct ovs_action_push_eth *ethh)
+{
+	struct ethhdr *hdr;
+
+	/* Add the new Ethernet header */
+	if (skb_cow_head(skb, ETH_HLEN) < 0)
+		return -ENOMEM;
+
+	skb_push(skb, ETH_HLEN);
+	skb_reset_mac_header(skb);
+	skb->mac_len += ETH_HLEN;
+
+	hdr = eth_hdr(skb);
+	ether_addr_copy(hdr->h_source, ethh->addresses.eth_src);
+	ether_addr_copy(hdr->h_dest, ethh->addresses.eth_dst);
+	hdr->h_proto = skb->protocol;
+
+	skb_postpush_rcsum(skb, hdr, ETH_HLEN);
+
+	invalidate_flow_key(key);
+	return 0;
+}
+
 static void update_ip_l4_checksum(struct sk_buff *skb, struct iphdr *nh,
 				  __be32 addr, __be32 new_addr)
 {
@@ -1121,6 +1158,14 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
 			err = pop_vlan(skb, key);
 			break;
 
+		case OVS_ACTION_ATTR_PUSH_ETH:
+			err = push_eth(skb, key, nla_data(a));
+			break;
+
+		case OVS_ACTION_ATTR_POP_ETH:
+			err = pop_eth(skb, key);
+			break;
+
 		case OVS_ACTION_ATTR_RECIRC:
 			err = execute_recirc(dp, skb, key, a, rem);
 			if (nla_is_last(a, rem)) {
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 524c0fd3078e..277f4f5ffea8 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -562,7 +562,6 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 	struct sw_flow *flow;
 	struct sw_flow_actions *sf_acts;
 	struct datapath *dp;
-	struct ethhdr *eth;
 	struct vport *input_vport;
 	u16 mru = 0;
 	int len;
@@ -583,17 +582,6 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 
 	nla_memcpy(__skb_put(packet, len), a[OVS_PACKET_ATTR_PACKET], len);
 
-	skb_reset_mac_header(packet);
-	eth = eth_hdr(packet);
-
-	/* Normally, setting the skb 'protocol' field would be handled by a
-	 * call to eth_type_trans(), but it assumes there's a sending
-	 * device, which we may not have. */
-	if (eth_proto_is_802_3(eth->h_proto))
-		packet->protocol = eth->h_proto;
-	else
-		packet->protocol = htons(ETH_P_802_2);
-
 	/* Set packet's mru */
 	if (a[OVS_PACKET_ATTR_MRU]) {
 		mru = nla_get_u16(a[OVS_PACKET_ATTR_MRU]);
@@ -620,6 +608,7 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 	rcu_assign_pointer(flow->sf_acts, acts);
 	packet->priority = flow->key.phy.priority;
 	packet->mark = flow->key.phy.skb_mark;
+	packet->protocol = flow->key.eth.type;
 
 	rcu_read_lock();
 	dp = get_dp_rcu(net, ovs_header->dp_ifindex);
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 0ea128eeeab2..86f2cfb19de3 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -461,35 +461,39 @@ invalid:
 static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
 {
 	int error;
-	struct ethhdr *eth;
 
 	/* Flags are always used as part of stats */
 	key->tp.flags = 0;
 
 	skb_reset_mac_header(skb);
 
-	/* Link layer.  We are guaranteed to have at least the 14 byte Ethernet
-	 * header in the linear data area.
-	 */
-	eth = eth_hdr(skb);
-	ether_addr_copy(key->eth.src, eth->h_source);
-	ether_addr_copy(key->eth.dst, eth->h_dest);
+	/* Link layer. */
+	key->eth.tci = 0;
+	if (key->phy.is_layer3) {
+		if (skb_vlan_tag_present(skb))
+			key->eth.tci = htons(skb->vlan_tci);
+		key->eth.type = skb->protocol;
+	} else {
+		struct ethhdr *eth = eth_hdr(skb);
 
-	__skb_pull(skb, 2 * ETH_ALEN);
-	/* We are going to push all headers that we pull, so no need to
-	 * update skb->csum here.
-	 */
+		ether_addr_copy(key->eth.src, eth->h_source);
+		ether_addr_copy(key->eth.dst, eth->h_dest);
 
-	key->eth.tci = 0;
-	if (skb_vlan_tag_present(skb))
-		key->eth.tci = htons(skb->vlan_tci);
-	else if (eth->h_proto == htons(ETH_P_8021Q))
-		if (unlikely(parse_vlan(skb, key)))
-			return -ENOMEM;
+		__skb_pull(skb, 2 * ETH_ALEN);
+		/* We are going to push all headers that we pull, so no need to
+		 * update skb->csum here.
+		 */
 
-	key->eth.type = parse_ethertype(skb);
-	if (unlikely(key->eth.type == htons(0)))
-		return -ENOMEM;
+		if (skb_vlan_tag_present(skb))
+			key->eth.tci = htons(skb->vlan_tci);
+		else if (eth->h_proto == htons(ETH_P_8021Q))
+			if (unlikely(parse_vlan(skb, key)))
+				return -ENOMEM;
+
+		key->eth.type = parse_ethertype(skb);
+		if (unlikely(key->eth.type == htons(0)))
+			return -ENOMEM;
+	}
 
 	skb_reset_network_header(skb);
 	skb_reset_mac_len(skb);
@@ -696,6 +700,8 @@ int ovs_flow_key_update(struct sk_buff *skb, struct sw_flow_key *key)
 int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
 			 struct sk_buff *skb, struct sw_flow_key *key)
 {
+	int err;
+
 	/* Extract metadata from packet. */
 	if (tun_info) {
 		key->tun_proto = ip_tunnel_info_af(tun_info);
@@ -723,9 +729,17 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
 	key->phy.skb_mark = skb->mark;
 	ovs_ct_fill_key(skb, key);
 	key->ovs_flow_hash = 0;
+	key->phy.is_layer3 = skb->mac_len == 0;
 	key->recirc_id = 0;
 
-	return key_extract(skb, key);
+	err = key_extract(skb, key);
+	if (err < 0)
+		return err;
+
+	if (tun_info && skb->protocol == htons(ETH_P_TEB))
+		skb->protocol = key->eth.type;
+
+	return err;
 }
 
 int ovs_flow_key_extract_userspace(struct net *net, const struct nlattr *attr,
@@ -741,5 +755,14 @@ int ovs_flow_key_extract_userspace(struct net *net, const struct nlattr *attr,
 	if (err)
 		return err;
 
+	/* key_extract assumes that skb->protocol is set-up for
+	 * layer 3 packets which is the case for other callers,
+	 * in particular packets received from the network stack.
+	 * Here the correct value can be set from the metadata
+	 * extracted above.
+	 */
+	if (key->phy.is_layer3)
+		skb->protocol = key->eth.type;
+
 	return key_extract(skb, key);
 }
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index 03378e75a67c..5395ec0c3c13 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -62,6 +62,7 @@ struct sw_flow_key {
 		u32	priority;	/* Packet QoS priority. */
 		u32	skb_mark;	/* SKB mark. */
 		u16	in_port;	/* Input switch port (or DP_MAX_PORTS). */
+		bool	is_layer3;	/* Packet has no Ethernet header */
 	} __packed phy; /* Safe when right after 'tun_key'. */
 	u8 tun_proto;			/* Protocol of encapsulating tunnel. */
 	u32 ovs_flow_hash;		/* Datapath computed hash value.  */
@@ -219,8 +220,7 @@ u64 ovs_flow_used_time(unsigned long flow_jiffies);
 
 int ovs_flow_key_update(struct sk_buff *skb, struct sw_flow_key *key);
 int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
-			 struct sk_buff *skb,
-			 struct sw_flow_key *key);
+			 struct sk_buff *skb, struct sw_flow_key *key);
 /* Extract key from packet coming from userspace. */
 int ovs_flow_key_extract_userspace(struct net *net, const struct nlattr *attr,
 				   struct sk_buff *skb,
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index c78a6a1476fb..fc94fbe1ddc3 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -123,7 +123,7 @@ static void update_range(struct sw_flow_match *match,
 static bool match_validate(const struct sw_flow_match *match,
 			   u64 key_attrs, u64 mask_attrs, bool log)
 {
-	u64 key_expected = 1 << OVS_KEY_ATTR_ETHERNET;
+	u64 key_expected = 0;
 	u64 mask_allowed = key_attrs;  /* At most allow all key attributes */
 
 	/* The following mask attributes allowed only if they
@@ -808,10 +808,34 @@ int ovs_nla_put_tunnel_info(struct sk_buff *skb,
 				  ip_tunnel_info_af(tun_info));
 }
 
+static int ethertype_from_nlattrs(struct net *net, struct sw_flow_match *match,
+				 u64 *attrs, const struct nlattr **a,
+				 bool is_mask, bool log)
+{
+	__be16 eth_type;
+
+	eth_type = nla_get_be16(a[OVS_KEY_ATTR_ETHERTYPE]);
+	if (is_mask) {
+		/* Always exact match EtherType. */
+		eth_type = htons(0xffff);
+	} else if (!eth_proto_is_802_3(eth_type)) {
+		OVS_NLERR(log, "EtherType %x is less than min %x",
+			  ntohs(eth_type), ETH_P_802_3_MIN);
+		return -EINVAL;
+	}
+
+	SW_FLOW_KEY_PUT(match, eth.type, eth_type, is_mask);
+	*attrs &= ~(1 << OVS_KEY_ATTR_ETHERTYPE);
+
+	return 0;
+}
+
 static int metadata_from_nlattrs(struct net *net, struct sw_flow_match *match,
 				 u64 *attrs, const struct nlattr **a,
 				 bool is_mask, bool log)
 {
+	bool is_layer3 = false;
+
 	if (*attrs & (1 << OVS_KEY_ATTR_DP_HASH)) {
 		u32 hash_val = nla_get_u32(a[OVS_KEY_ATTR_DP_HASH]);
 
@@ -898,20 +922,33 @@ static int metadata_from_nlattrs(struct net *net, struct sw_flow_match *match,
 				   sizeof(*cl), is_mask);
 		*attrs &= ~(1ULL << OVS_KEY_ATTR_CT_LABELS);
 	}
-	return 0;
-}
 
-static int ovs_key_from_nlattrs(struct net *net, struct sw_flow_match *match,
-				u64 attrs, const struct nlattr **a,
-				bool is_mask, bool log)
-{
-	int err;
+	/* For layer 3 packets the ethernet type is provided
+	 * and treated as metadata but no MAC addresses are provided.
+	 */
+	if (*attrs & (1ULL << OVS_KEY_ATTR_ETHERTYPE) &&
+	    !(*attrs & (1 << OVS_KEY_ATTR_ETHERNET))) {
+		int err;
 
-	err = metadata_from_nlattrs(net, match, &attrs, a, is_mask, log);
-	if (err)
-		return err;
+		err = ethertype_from_nlattrs(net, match, attrs, a, is_mask,
+					     log);
+		if (err)
+			return err;
+
+		is_layer3 = true;
+	}
 
-	if (attrs & (1 << OVS_KEY_ATTR_ETHERNET)) {
+	/* Always exact match is_layer3 */
+	SW_FLOW_KEY_PUT(match, phy.is_layer3, is_mask ? true : is_layer3,
+			is_mask);
+	return is_layer3;
+}
+
+static int l2_from_nlattrs(struct net *net, struct sw_flow_match *match,
+			   u64 *attrs, const struct nlattr **a,
+			   bool is_mask, bool log)
+{
+	if (*attrs & (1 << OVS_KEY_ATTR_ETHERNET)) {
 		const struct ovs_key_ethernet *eth_key;
 
 		eth_key = nla_data(a[OVS_KEY_ATTR_ETHERNET]);
@@ -919,10 +956,10 @@ static int ovs_key_from_nlattrs(struct net *net, struct sw_flow_match *match,
 				eth_key->eth_src, ETH_ALEN, is_mask);
 		SW_FLOW_KEY_MEMCPY(match, eth.dst,
 				eth_key->eth_dst, ETH_ALEN, is_mask);
-		attrs &= ~(1 << OVS_KEY_ATTR_ETHERNET);
+		*attrs &= ~(1 << OVS_KEY_ATTR_ETHERNET);
 	}
 
-	if (attrs & (1 << OVS_KEY_ATTR_VLAN)) {
+	if (*attrs & (1 << OVS_KEY_ATTR_VLAN)) {
 		__be16 tci;
 
 		tci = nla_get_be16(a[OVS_KEY_ATTR_VLAN]);
@@ -936,28 +973,41 @@ static int ovs_key_from_nlattrs(struct net *net, struct sw_flow_match *match,
 		}
 
 		SW_FLOW_KEY_PUT(match, eth.tci, tci, is_mask);
-		attrs &= ~(1 << OVS_KEY_ATTR_VLAN);
+		*attrs &= ~(1 << OVS_KEY_ATTR_VLAN);
 	}
 
-	if (attrs & (1 << OVS_KEY_ATTR_ETHERTYPE)) {
-		__be16 eth_type;
-
-		eth_type = nla_get_be16(a[OVS_KEY_ATTR_ETHERTYPE]);
-		if (is_mask) {
-			/* Always exact match EtherType. */
-			eth_type = htons(0xffff);
-		} else if (!eth_proto_is_802_3(eth_type)) {
-			OVS_NLERR(log, "EtherType %x is less than min %x",
-				  ntohs(eth_type), ETH_P_802_3_MIN);
-			return -EINVAL;
-		}
+	if (*attrs & (1 << OVS_KEY_ATTR_ETHERTYPE)) {
+		int err;
 
-		SW_FLOW_KEY_PUT(match, eth.type, eth_type, is_mask);
-		attrs &= ~(1 << OVS_KEY_ATTR_ETHERTYPE);
+		err = ethertype_from_nlattrs(net, match, attrs, a, is_mask,
+					     log);
+		if (err)
+			return err;
 	} else if (!is_mask) {
 		SW_FLOW_KEY_PUT(match, eth.type, htons(ETH_P_802_2), is_mask);
 	}
 
+	return 0;
+}
+
+static int ovs_key_from_nlattrs(struct net *net, struct sw_flow_match *match,
+				u64 attrs, const struct nlattr **a,
+				bool is_mask, bool log)
+{
+	int err;
+	bool is_layer3;
+
+	err = metadata_from_nlattrs(net, match, &attrs, a, is_mask, log);
+	if (err < 0)
+		return err;
+	is_layer3 = err != 0;
+
+	if (!is_layer3) {
+		err = l2_from_nlattrs(net, match, &attrs, a, is_mask, log);
+		if (err < 0)
+			return err;
+	}
+
 	if (attrs & (1 << OVS_KEY_ATTR_IPV4)) {
 		const struct ovs_key_ipv4 *ipv4_key;
 
@@ -1407,7 +1457,11 @@ int ovs_nla_get_flow_metadata(struct net *net, const struct nlattr *attr,
 	memset(&key->ct, 0, sizeof(key->ct));
 	key->phy.in_port = DP_MAX_PORTS;
 
-	return metadata_from_nlattrs(net, &match, &attrs, a, false, log);
+	err = metadata_from_nlattrs(net, &match, &attrs, a, false, log);
+	if (err < 0)
+		return err;
+
+	return 0;
 }
 
 static int __ovs_nla_put_key(const struct sw_flow_key *swkey,
@@ -1415,7 +1469,7 @@ static int __ovs_nla_put_key(const struct sw_flow_key *swkey,
 			     struct sk_buff *skb)
 {
 	struct ovs_key_ethernet *eth_key;
-	struct nlattr *nla, *encap;
+	struct nlattr *nla, *encap = NULL;
 
 	if (nla_put_u32(skb, OVS_KEY_ATTR_RECIRC_ID, output->recirc_id))
 		goto nla_put_failure;
@@ -1456,38 +1510,40 @@ static int __ovs_nla_put_key(const struct sw_flow_key *swkey,
 	if (ovs_ct_put_key(output, skb))
 		goto nla_put_failure;
 
-	nla = nla_reserve(skb, OVS_KEY_ATTR_ETHERNET, sizeof(*eth_key));
-	if (!nla)
-		goto nla_put_failure;
+	if (!swkey->phy.is_layer3) {
+		nla = nla_reserve(skb, OVS_KEY_ATTR_ETHERNET, sizeof(*eth_key));
+		if (!nla)
+			goto nla_put_failure;
 
-	eth_key = nla_data(nla);
-	ether_addr_copy(eth_key->eth_src, output->eth.src);
-	ether_addr_copy(eth_key->eth_dst, output->eth.dst);
+		eth_key = nla_data(nla);
+		ether_addr_copy(eth_key->eth_src, output->eth.src);
+		ether_addr_copy(eth_key->eth_dst, output->eth.dst);
 
-	if (swkey->eth.tci || swkey->eth.type == htons(ETH_P_8021Q)) {
-		__be16 eth_type;
-		eth_type = !is_mask ? htons(ETH_P_8021Q) : htons(0xffff);
-		if (nla_put_be16(skb, OVS_KEY_ATTR_ETHERTYPE, eth_type) ||
-		    nla_put_be16(skb, OVS_KEY_ATTR_VLAN, output->eth.tci))
-			goto nla_put_failure;
-		encap = nla_nest_start(skb, OVS_KEY_ATTR_ENCAP);
-		if (!swkey->eth.tci)
-			goto unencap;
-	} else
-		encap = NULL;
-
-	if (swkey->eth.type == htons(ETH_P_802_2)) {
-		/*
-		 * Ethertype 802.2 is represented in the netlink with omitted
-		 * OVS_KEY_ATTR_ETHERTYPE in the flow key attribute, and
-		 * 0xffff in the mask attribute.  Ethertype can also
-		 * be wildcarded.
-		 */
-		if (is_mask && output->eth.type)
-			if (nla_put_be16(skb, OVS_KEY_ATTR_ETHERTYPE,
-						output->eth.type))
+		if (swkey->eth.tci || swkey->eth.type == htons(ETH_P_8021Q)) {
+			__be16 eth_type;
+			eth_type = !is_mask ? htons(ETH_P_8021Q) : htons(0xffff);
+			if (nla_put_be16(skb, OVS_KEY_ATTR_ETHERTYPE, eth_type) ||
+			    nla_put_be16(skb, OVS_KEY_ATTR_VLAN,
+					 output->eth.tci))
 				goto nla_put_failure;
-		goto unencap;
+			encap = nla_nest_start(skb, OVS_KEY_ATTR_ENCAP);
+			if (!swkey->eth.tci)
+				goto unencap;
+		}
+
+		if (swkey->eth.type == htons(ETH_P_802_2)) {
+			/*
+			 * Ethertype 802.2 is represented in the netlink
+			 * with omitted OVS_KEY_ATTR_ETHERTYPE in the flow
+			 * key attribute, and 0xffff in the mask attribute.
+			 * Ethertype can also be wildcarded.
+			 */
+			if (is_mask && output->eth.type)
+				if (nla_put_be16(skb, OVS_KEY_ATTR_ETHERTYPE,
+						 output->eth.type))
+					goto nla_put_failure;
+			goto unencap;
+		}
 	}
 
 	if (nla_put_be16(skb, OVS_KEY_ATTR_ETHERTYPE, output->eth.type))
@@ -2010,8 +2066,8 @@ static bool validate_masked(u8 *data, int len)
 
 static int validate_set(const struct nlattr *a,
 			const struct sw_flow_key *flow_key,
-			struct sw_flow_actions **sfa,
-			bool *skip_copy, __be16 eth_type, bool masked, bool log)
+			struct sw_flow_actions **sfa, bool *skip_copy,
+			__be16 eth_type, bool masked, bool log, bool is_layer3)
 {
 	const struct nlattr *ovs_key = nla_data(a);
 	int key_type = nla_type(ovs_key);
@@ -2041,7 +2097,11 @@ static int validate_set(const struct nlattr *a,
 	case OVS_KEY_ATTR_SKB_MARK:
 	case OVS_KEY_ATTR_CT_MARK:
 	case OVS_KEY_ATTR_CT_LABELS:
+		break;
+
 	case OVS_KEY_ATTR_ETHERNET:
+		if (is_layer3)
+			return -EINVAL;
 		break;
 
 	case OVS_KEY_ATTR_TUNNEL:
@@ -2208,6 +2268,7 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 				  int depth, struct sw_flow_actions **sfa,
 				  __be16 eth_type, __be16 vlan_tci, bool log)
 {
+	bool is_layer3 = key->phy.is_layer3;
 	const struct nlattr *a;
 	int rem, err;
 
@@ -2230,6 +2291,8 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 			[OVS_ACTION_ATTR_HASH] = sizeof(struct ovs_action_hash),
 			[OVS_ACTION_ATTR_CT] = (u32)-1,
 			[OVS_ACTION_ATTR_TRUNC] = sizeof(struct ovs_action_trunc),
+			[OVS_ACTION_ATTR_PUSH_ETH] = sizeof(struct ovs_action_push_eth),
+			[OVS_ACTION_ATTR_POP_ETH] = 0,
 		};
 		const struct ovs_action_push_vlan *vlan;
 		int type = nla_type(a);
@@ -2278,10 +2341,14 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 		}
 
 		case OVS_ACTION_ATTR_POP_VLAN:
+			if (is_layer3)
+				return -EINVAL;
 			vlan_tci = htons(0);
 			break;
 
 		case OVS_ACTION_ATTR_PUSH_VLAN:
+			if (is_layer3)
+				return -EINVAL;
 			vlan = nla_data(a);
 			if (vlan->vlan_tpid != htons(ETH_P_8021Q))
 				return -EINVAL;
@@ -2331,14 +2398,16 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 
 		case OVS_ACTION_ATTR_SET:
 			err = validate_set(a, key, sfa,
-					   &skip_copy, eth_type, false, log);
+					   &skip_copy, eth_type, false, log,
+					   is_layer3);
 			if (err)
 				return err;
 			break;
 
 		case OVS_ACTION_ATTR_SET_MASKED:
 			err = validate_set(a, key, sfa,
-					   &skip_copy, eth_type, true, log);
+					   &skip_copy, eth_type, true, log,
+					   is_layer3);
 			if (err)
 				return err;
 			break;
@@ -2358,6 +2427,22 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 			skip_copy = true;
 			break;
 
+		case OVS_ACTION_ATTR_POP_ETH:
+			if (is_layer3)
+				return -EINVAL;
+			if (vlan_tci & htons(VLAN_TAG_PRESENT))
+				return -EINVAL;
+			is_layer3 = true;
+			break;
+
+		case OVS_ACTION_ATTR_PUSH_ETH:
+			/* For now disallow pushing an Ethernet header if one
+			 * is already present */
+			if (!is_layer3)
+				return -EINVAL;
+			is_layer3 = false;
+			break;
+
 		default:
 			OVS_NLERR(log, "Unknown Action type %d", type);
 			return -EINVAL;
diff --git a/net/openvswitch/vport-geneve.c b/net/openvswitch/vport-geneve.c
index 1a1fcec88695..7a06e19f5279 100644
--- a/net/openvswitch/vport-geneve.c
+++ b/net/openvswitch/vport-geneve.c
@@ -116,7 +116,7 @@ static struct vport_ops ovs_geneve_vport_ops = {
 	.create		= geneve_create,
 	.destroy	= ovs_netdev_tunnel_destroy,
 	.get_options	= geneve_get_options,
-	.send		= dev_queue_xmit,
+	.send		= ovs_netdev_send,
 };
 
 static int __init ovs_geneve_tnl_init(void)
diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c
index 7f8897f33a67..bcbc91b8b077 100644
--- a/net/openvswitch/vport-gre.c
+++ b/net/openvswitch/vport-gre.c
@@ -87,7 +87,7 @@ static struct vport *gre_create(const struct vport_parms *parms)
 static struct vport_ops ovs_gre_vport_ops = {
 	.type		= OVS_VPORT_TYPE_GRE,
 	.create		= gre_create,
-	.send		= dev_queue_xmit,
+	.send		= ovs_netdev_send,
 	.destroy	= ovs_netdev_tunnel_destroy,
 };
 
diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
index 32d8e94d9bff..adc364161626 100644
--- a/net/openvswitch/vport-internal_dev.c
+++ b/net/openvswitch/vport-internal_dev.c
@@ -257,6 +257,12 @@ static netdev_tx_t internal_dev_recv(struct sk_buff *skb)
 	struct net_device *netdev = skb->dev;
 	struct pcpu_sw_netstats *stats;
 
+	/* Only send/receive L2 packets */
+	if (!skb->mac_len) {
+		kfree_skb(skb);
+		return -EINVAL;
+	}
+
 	if (unlikely(!(netdev->flags & IFF_UP))) {
 		kfree_skb(skb);
 		netdev->stats.rx_dropped++;
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index 4e3972344aa6..733e7914f6bd 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -57,8 +57,10 @@ static void netdev_port_receive(struct sk_buff *skb)
 	if (unlikely(!skb))
 		return;
 
-	skb_push(skb, ETH_HLEN);
-	skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
+	if (vport->dev->type == ARPHRD_ETHER) {
+		skb_push(skb, ETH_HLEN);
+		skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
+	}
 	ovs_vport_receive(vport, skb, skb_tunnel_info(skb));
 	return;
 error:
@@ -194,6 +196,17 @@ void ovs_netdev_tunnel_destroy(struct vport *vport)
 }
 EXPORT_SYMBOL_GPL(ovs_netdev_tunnel_destroy);
 
+int ovs_netdev_send(struct sk_buff *skb)
+{
+	/* Only send L2 packets */
+	if (skb->mac_len)
+		return dev_queue_xmit(skb);
+
+	kfree_skb(skb);
+	return -EINVAL;
+}
+EXPORT_SYMBOL_GPL(ovs_netdev_send);
+
 /* Returns null if this device is not attached to a datapath. */
 struct vport *ovs_netdev_get_vport(struct net_device *dev)
 {
@@ -208,7 +221,7 @@ static struct vport_ops ovs_netdev_vport_ops = {
 	.type		= OVS_VPORT_TYPE_NETDEV,
 	.create		= netdev_create,
 	.destroy	= netdev_destroy,
-	.send		= dev_queue_xmit,
+	.send		= ovs_netdev_send,
 };
 
 int __init ovs_netdev_init(void)
diff --git a/net/openvswitch/vport-netdev.h b/net/openvswitch/vport-netdev.h
index 19e29c12adcc..637b14a9963c 100644
--- a/net/openvswitch/vport-netdev.h
+++ b/net/openvswitch/vport-netdev.h
@@ -33,4 +33,6 @@ int __init ovs_netdev_init(void);
 void ovs_netdev_exit(void);
 
 void ovs_netdev_tunnel_destroy(struct vport *vport);
+
+int ovs_netdev_send(struct sk_buff *skb);
 #endif /* vport_netdev.h */
diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c
index 5eb7694348b5..13f11ad7e35a 100644
--- a/net/openvswitch/vport-vxlan.c
+++ b/net/openvswitch/vport-vxlan.c
@@ -153,7 +153,7 @@ static struct vport_ops ovs_vxlan_netdev_vport_ops = {
 	.create			= vxlan_create,
 	.destroy		= ovs_netdev_tunnel_destroy,
 	.get_options		= vxlan_get_options,
-	.send			= dev_queue_xmit,
+	.send			= ovs_netdev_send,
 };
 
 static int __init ovs_vxlan_tnl_init(void)
-- 
2.7.0.rc3.207.g0ac5344

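The layer 3 checks that this patch adds to __ovs_nla_copy_actions() and validate_set() work like a small state machine over the action list:

- pop_eth moves the flow from L2 to L3.
- push_eth moves it back.
- L2-only actions are rejected while the flow is L3.

Below is a minimal userspace sketch of that state tracking. The act_kind enum and validate_actions() are hypothetical simplifications. The real actions are OVS_ACTION_ATTR_* netlink attributes, VLAN state is tracked through vlan_tci, and the kernel returns -EINVAL where this sketch returns -1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified action kinds (not the uapi values). */
enum act_kind {
	ACT_OUTPUT,
	ACT_PUSH_VLAN,
	ACT_POP_VLAN,
	ACT_SET_ETHERNET,	/* set(eth(...)) */
	ACT_PUSH_ETH,
	ACT_POP_ETH,
};

/* Walk the action list, tracking whether the packet currently has an
 * Ethernet header (is_layer3 == false) and whether a VLAN tag is present.
 * Returns 0 if every action is valid in the state it sees, else -1. */
int validate_actions(const enum act_kind *acts, size_t n,
		     bool is_layer3, bool vlan_present)
{
	for (size_t i = 0; i < n; i++) {
		switch (acts[i]) {
		case ACT_PUSH_VLAN:
			if (is_layer3)
				return -1;	/* no L2 header to tag */
			vlan_present = true;
			break;
		case ACT_POP_VLAN:
			if (is_layer3)
				return -1;
			vlan_present = false;
			break;
		case ACT_SET_ETHERNET:
			if (is_layer3)
				return -1;	/* no MAC addresses to set */
			break;
		case ACT_POP_ETH:
			/* Only one Ethernet header can be popped, and not
			 * while a VLAN tag is still present. */
			if (is_layer3 || vlan_present)
				return -1;
			is_layer3 = true;
			break;
		case ACT_PUSH_ETH:
			/* Pushing onto a packet that already has an
			 * Ethernet header is disallowed for now. */
			if (!is_layer3)
				return -1;
			is_layer3 = false;
			break;
		case ACT_OUTPUT:
			break;
		}
	}
	return 0;
}
```

For example, the sequence { POP_VLAN, POP_ETH, PUSH_ETH } is accepted on a tagged L2 flow. A lone POP_ETH on the same flow is rejected because the VLAN tag is still present.
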
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


* [PATCH net-next v11 6/6] openvswitch: use ipgre tunnel rather than gretap tunnel
       [not found] ` <1467827996-32547-1-git-send-email-simon.horman-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>
                     ` (4 preceding siblings ...)
  2016-07-06 17:59   ` [PATCH net-next v11 5/6] openvswitch: add layer 3 flow/port support Simon Horman
@ 2016-07-06 17:59   ` Simon Horman
  5 siblings, 0 replies; 36+ messages in thread
From: Simon Horman @ 2016-07-06 17:59 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA; +Cc: dev-yBygre7rU0TnMu66kgdUjQ

This allows GRE tunnels to send and receive both
layer 2 packets (packets with an ethernet header) and
layer 3 packets (packets without an ethernet header).

Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
v11
* Make use of skb_mac_header_was_set() to avoid needing to calculate mac_len

v10
* Handle case of l3 only packets on vport-netdev
* Use ARPHRD_NONE for ipgre interfaces as per recent changes in mainline
* Ensure skb->mac_len is set correctly in netdev_port_receive and
  rely on this value to differentiate layer3 packets in
  ovs_flow_key_extract()
---
 include/net/gre.h              |  4 ++--
 net/ipv4/ip_gre.c              |  9 +++++----
 net/openvswitch/flow.c         |  2 +-
 net/openvswitch/vport-gre.c    |  2 +-
 net/openvswitch/vport-netdev.c | 18 ++++++++++++------
 5 files changed, 21 insertions(+), 14 deletions(-)

diff --git a/include/net/gre.h b/include/net/gre.h
index 7a54a31d1d4c..a218a4deffd1 100644
--- a/include/net/gre.h
+++ b/include/net/gre.h
@@ -23,8 +23,8 @@ struct gre_protocol {
 int gre_add_protocol(const struct gre_protocol *proto, u8 version);
 int gre_del_protocol(const struct gre_protocol *proto, u8 version);
 
-struct net_device *gretap_fb_dev_create(struct net *net, const char *name,
-				       u8 name_assign_type);
+struct net_device *gre_fb_dev_create(struct net *net, const char *name,
+				     u8 name_assign_type);
 int gre_parse_header(struct sk_buff *skb, struct tnl_ptk_info *tpi,
 		     bool *csum_err, __be16 proto, int nhs);
 
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 330d58e9c523..a20248355da0 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -1147,8 +1147,8 @@ static struct rtnl_link_ops ipgre_tap_ops __read_mostly = {
 	.get_link_net	= ip_tunnel_get_link_net,
 };
 
-struct net_device *gretap_fb_dev_create(struct net *net, const char *name,
-					u8 name_assign_type)
+struct net_device *gre_fb_dev_create(struct net *net, const char *name,
+				     u8 name_assign_type)
 {
 	struct nlattr *tb[IFLA_MAX + 1];
 	struct net_device *dev;
@@ -1159,13 +1159,14 @@ struct net_device *gretap_fb_dev_create(struct net *net, const char *name,
 	memset(&tb, 0, sizeof(tb));
 
 	dev = rtnl_create_link(net, name, name_assign_type,
-			       &ipgre_tap_ops, tb);
+			       &ipgre_link_ops, tb);
 	if (IS_ERR(dev))
 		return dev;
 
 	/* Configure flow based GRE device. */
 	t = netdev_priv(dev);
 	t->collect_md = true;
+	dev->type = ARPHRD_NONE;
 
 	err = ipgre_newlink(net, dev, tb, NULL);
 	if (err < 0) {
@@ -1190,7 +1191,7 @@ out:
 	unregister_netdevice_many(&list_kill);
 	return ERR_PTR(err);
 }
-EXPORT_SYMBOL_GPL(gretap_fb_dev_create);
+EXPORT_SYMBOL_GPL(gre_fb_dev_create);
 
 static int __net_init ipgre_tap_init_net(struct net *net)
 {
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 86f2cfb19de3..42587d5bf894 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -729,7 +729,7 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
 	key->phy.skb_mark = skb->mark;
 	ovs_ct_fill_key(skb, key);
 	key->ovs_flow_hash = 0;
-	key->phy.is_layer3 = skb->mac_len == 0;
+	key->phy.is_layer3 = skb_mac_header_was_set(skb) == 0;
 	key->recirc_id = 0;
 
 	err = key_extract(skb, key);
diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c
index bcbc91b8b077..c1cab9dd392f 100644
--- a/net/openvswitch/vport-gre.c
+++ b/net/openvswitch/vport-gre.c
@@ -60,7 +60,7 @@ static struct vport *gre_tnl_create(const struct vport_parms *parms)
 		return vport;
 
 	rtnl_lock();
-	dev = gretap_fb_dev_create(net, parms->name, NET_NAME_USER);
+	dev = gre_fb_dev_create(net, parms->name, NET_NAME_USER);
 	if (IS_ERR(dev)) {
 		rtnl_unlock();
 		ovs_vport_free(vport);
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index 733e7914f6bd..82b10802abe6 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -61,6 +61,7 @@ static void netdev_port_receive(struct sk_buff *skb)
 		skb_push(skb, ETH_HLEN);
 		skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
 	}
+
 	ovs_vport_receive(vport, skb, skb_tunnel_info(skb));
 	return;
 error:
@@ -99,7 +100,8 @@ struct vport *ovs_netdev_link(struct vport *vport, const char *name)
 	}
 
 	if (vport->dev->flags & IFF_LOOPBACK ||
-	    vport->dev->type != ARPHRD_ETHER ||
+	    (vport->dev->type != ARPHRD_ETHER &&
+	     vport->dev->type != ARPHRD_NONE) ||
 	    ovs_is_internal_dev(vport->dev)) {
 		err = -EINVAL;
 		goto error_put;
@@ -198,12 +200,16 @@ EXPORT_SYMBOL_GPL(ovs_netdev_tunnel_destroy);
 
 int ovs_netdev_send(struct sk_buff *skb)
 {
-	/* Only send L2 packets */
-	if (skb->mac_len)
-		return dev_queue_xmit(skb);
+	struct net_device *dev = skb->dev;
 
-	kfree_skb(skb);
-	return -EINVAL;
+	if (dev->type != ARPHRD_ETHER && skb->mac_len) {
+		skb->protocol = htons(ETH_P_TEB);
+	} else if (dev->type == ARPHRD_ETHER && !skb->mac_len) {
+		kfree_skb(skb);
+		return -EINVAL;
+	}
+
+	return dev_queue_xmit(skb);
 }
 EXPORT_SYMBOL_GPL(ovs_netdev_send);
 
-- 
2.7.0.rc3.207.g0ac5344

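The ovs_netdev_send() logic in this patch reduces to a three-way decision. It depends on the device type and on whether the packet carries an Ethernet header, which is signalled here by mac_len.

The sketch below models that decision in userspace. It assumes a hypothetical toy_skb that holds only the fields involved. The ARPHRD_* and ETH_P_TEB values match <linux/if_arp.h> and <linux/if_ether.h>. DROPPED stands in for the kfree_skb()/-EINVAL path.

```c
#include <assert.h>
#include <stdint.h>

/* Values from <linux/if_arp.h> and <linux/if_ether.h>. */
#define ARPHRD_ETHER	1
#define ARPHRD_NONE	0xFFFE
#define ETH_P_TEB	0x6558	/* Transparent Ethernet Bridging */

/* Hypothetical stand-in for the sk_buff/net_device fields used. */
struct toy_skb {
	unsigned short dev_type;	/* skb->dev->type */
	unsigned int mac_len;		/* skb->mac_len */
	uint16_t protocol;		/* host order here, unlike the kernel */
};

enum send_result { SENT, DROPPED };

enum send_result toy_netdev_send(struct toy_skb *skb)
{
	if (skb->dev_type != ARPHRD_ETHER && skb->mac_len) {
		/* L2 frame leaving via an L3 (e.g. ipgre) device: mark it
		 * as TEB so the encapsulation advertises an Ethernet
		 * payload. */
		skb->protocol = ETH_P_TEB;
	} else if (skb->dev_type == ARPHRD_ETHER && !skb->mac_len) {
		/* An Ethernet device cannot transmit a bare L3 packet. */
		return DROPPED;
	}
	/* Every other combination is passed on unchanged. */
	return SENT;
}
```

Keeping protocol in host byte order is a simplification. The kernel stores skb->protocol in network order, for example via htons(ETH_P_TEB).
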

* Re: [PATCH net-next v11 1/6] net: introduce skb_transport_header_was_set()
       [not found]     ` <1467827996-32547-2-git-send-email-simon.horman-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>
@ 2016-07-07 20:51       ` pravin shelar
  0 siblings, 0 replies; 36+ messages in thread
From: pravin shelar @ 2016-07-07 20:51 UTC (permalink / raw)
  To: Simon Horman; +Cc: ovs dev, Linux Kernel Network Developers

On Wed, Jul 6, 2016 at 10:59 AM, Simon Horman
<simon.horman@netronome.com> wrote:
> This helper resets the mac_header of an skb to a state where
> skb_transport_header_was_set() will return false.
>
> This is intended to be used with packets received on
> ARPHRD_NONE devices without an Ethernet header in the inner packet.
> It allows skb_transport_header_was_set to be subsequently used to
> differentiate such packets from those with an Ethernet header.
>
> Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
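
This helper relies on the distinction between resetting and unsetting the MAC header, which can be modelled with a sentinel offset. In the kernel, skb_mac_header_was_set() compares skb->mac_header against an all-ones value.

The sketch below mimics that with a hypothetical toy_skb and toy_* helpers. It is not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the offset fields of struct sk_buff. */
struct toy_skb {
	uint16_t data;		/* offset of skb->data from skb->head */
	uint16_t mac_header;	/* offset of the MAC header, or ~0 if unset */
};

/* "Reset": the MAC header starts at the current data offset. */
void toy_reset_mac_header(struct toy_skb *skb)
{
	skb->mac_header = skb->data;
}

/* "Unset": record that the packet has no MAC header at all. */
void toy_unset_mac_header(struct toy_skb *skb)
{
	skb->mac_header = (uint16_t)~0U;
}

bool toy_mac_header_was_set(const struct toy_skb *skb)
{
	return skb->mac_header != (uint16_t)~0U;
}
```

Patch 6/6 then derives key->phy.is_layer3 as skb_mac_header_was_set(skb) == 0. In this model, that corresponds to !toy_mac_header_was_set().
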

* Re: [ovs-dev] [PATCH net-next v11 2/6] gre: unset mac header for non-TEB packets received by ipgre device
  2016-07-06 17:59   ` [PATCH net-next v11 2/6] gre: unset mac header for non-TEB packets received by ipgre device Simon Horman
@ 2016-07-07 20:51     ` pravin shelar
  0 siblings, 0 replies; 36+ messages in thread
From: pravin shelar @ 2016-07-07 20:51 UTC (permalink / raw)
  To: Simon Horman; +Cc: Linux Kernel Network Developers, ovs dev

On Wed, Jul 6, 2016 at 10:59 AM, Simon Horman
<simon.horman@netronome.com> wrote:
> Unset rather than reset the mac header for non-TEB packets received by an
> ipgre device.  This allows skb_transport_header_was_set to be subsequently
> used to differentiate TEB and non-TEB packets received on an ipgre device.
>
> Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
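
The receive-side rule described above can be stated per GRE protocol type. Only Transparent Ethernet Bridging payloads (ETH_P_TEB, 0x6558) begin with an inner Ethernet header, so only they keep a MAC header.

Below is a minimal sketch of that classification. The function names are hypothetical, and no real packet handling is done.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* EtherType values from <linux/if_ether.h>, as carried in the GRE
 * protocol type field. */
#define ETH_P_IP	0x0800
#define ETH_P_IPV6	0x86DD
#define ETH_P_TEB	0x6558	/* Transparent Ethernet Bridging */

/* Should the ipgre receive path leave a MAC header set? Only TEB
 * payloads start with an Ethernet header; all others get it unset. */
bool gre_rx_keeps_mac_header(uint16_t gre_proto)
{
	return gre_proto == ETH_P_TEB;
}

/* The classification OVS can then make from the MAC header state alone. */
bool ovs_rx_is_layer3(uint16_t gre_proto)
{
	return !gre_rx_keeps_mac_header(gre_proto);
}
```
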


* Re: [PATCH net-next v11 3/6] openvswitch: set skb protocol and mac_len when receiving on internal device
       [not found]     ` <1467827996-32547-4-git-send-email-simon.horman-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>
@ 2016-07-07 20:52       ` pravin shelar
       [not found]         ` <CAOrHB_B2VDPcEe0B471J+XjmviAbTO0JRPTHiS7jHzF5V8uHZg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 36+ messages in thread
From: pravin shelar @ 2016-07-07 20:52 UTC (permalink / raw)
  To: Simon Horman; +Cc: ovs dev, Linux Kernel Network Developers

On Wed, Jul 6, 2016 at 10:59 AM, Simon Horman
<simon.horman@netronome.com> wrote:
> * Set skb protocol based on contents of packet. I have observed this is
>   necessary to get actual protocol of a packet when it is injected into an
>   internal device e.g. by libnet in which case skb protocol will be set to
>   ETH_ALL.
>
I am not sure what you mean by ETH_ALL; I could not find it in the kernel.
Anyway, can we fix libnet to set the skb->protocol field correctly? This
change introduces overhead for every packet received on an internal
port.

> * Set the mac_len which has been observed to not be set up correctly when
>   an ARP packet is generated and sent via an openvswitch bridge.
>   My test case is a scenario where there are two Open vSwitch bridges.
>   One outputs to a tunnel port which egresses on the other.
>
> The motivation for this is that support for outputting to layer 3 (non-tap)
> GRE tunnels as implemented by a subsequent patch depends on protocol and
> mac_len being set correctly on receive.
>
The commit message and the change do not match anymore.


> Signed-off-by: Simon Horman <simon.horman@netronome.com>
>
> ---
> v11
> * Do not set mac_len.
>   Instead of relying on mac_len follow-up patches now
>   use skb_unset_mac_header()
>
> v10
> * Set mac_len
>
> v9
> * New patch
> ---
>  net/openvswitch/vport-internal_dev.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
> index 434e04c3a189..32d8e94d9bff 100644
> --- a/net/openvswitch/vport-internal_dev.c
> +++ b/net/openvswitch/vport-internal_dev.c
> @@ -48,6 +48,9 @@ static int internal_dev_xmit(struct sk_buff *skb, struct net_device *netdev)
>  {
>         int len, err;
>
> +       skb->protocol = eth_type_trans(skb, netdev);
> +       skb_push(skb, ETH_HLEN);
> +

* Re: [PATCH net-next v11 4/6] openvswitch: add support to push and pop mpls for layer3 packets
       [not found]     ` <1467827996-32547-5-git-send-email-simon.horman-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>
@ 2016-07-07 20:52       ` pravin shelar
  2016-07-10 11:14         ` [ovs-dev] " Simon Horman
  0 siblings, 1 reply; 36+ messages in thread
From: pravin shelar @ 2016-07-07 20:52 UTC (permalink / raw)
  To: Simon Horman; +Cc: ovs dev, Linux Kernel Network Developers

On Wed, Jul 6, 2016 at 10:59 AM, Simon Horman
<simon.horman@netronome.com> wrote:
> Allow push and pop mpls actions to act on layer 3 packets by teaching
> them not to access non-existent L2 headers of such packets.
>
> Signed-off-by: Simon Horman <simon.horman@netronome.com>
> ---
> v11
> * group l2 code in pop_mpls()
>
> v10
> * Limit scope of hdr in {push,pop}_mpls()
>
> v9
> * New Patch
> ---
>  include/uapi/linux/openvswitch.h |  2 ++
>  net/openvswitch/actions.c        | 24 +++++++++++++++---------
>  2 files changed, 17 insertions(+), 9 deletions(-)
>
> diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
> index d95a3018f6a1..5cde501433eb 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -737,6 +737,8 @@ enum ovs_nat_attr {
>   * is no MPLS label stack, as determined by ethertype, no action is taken.
>   * @OVS_ACTION_ATTR_CT: Track the connection. Populate the conntrack-related
>   * entries in the flow key.
> + * @OVS_ACTION_ATTR_PUSH_ETH: Push a new outermost Ethernet header onto the      * packet.
> + * @OVS_ACTION_ATTR_POP_ETH: Pop the outermost Ethernet header off the packet.
>   *
>   * Only a single header can be set with a single %OVS_ACTION_ATTR_SET.  Not all
>   * fields within a header are modifiable, e.g. the IPv4 protocol and fragment

This hunk is not related to this patch.
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index 1ecbd7715f6d..12e8a8942a42 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
...

* Re: [ovs-dev] [PATCH net-next v11 5/6] openvswitch: add layer 3 flow/port support
  2016-07-06 17:59   ` [PATCH net-next v11 5/6] openvswitch: add layer 3 flow/port support Simon Horman
@ 2016-07-07 20:54     ` pravin shelar
       [not found]       ` <CAOrHB_BYD40ZkWbU0dvhPOCcaCVgooksOUkejxyFoagyoiBTNw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 36+ messages in thread
From: pravin shelar @ 2016-07-07 20:54 UTC (permalink / raw)
  To: Simon Horman; +Cc: Linux Kernel Network Developers, ovs dev

On Wed, Jul 6, 2016 at 10:59 AM, Simon Horman
<simon.horman@netronome.com> wrote:
> From: Lorand Jakab <lojakab@cisco.com>
>
> Implementation of the pop_eth and push_eth actions in the kernel, and
> layer 3 flow support.
>
> This doesn't actually do anything yet as no layer 3 tunnel ports are
> supported yet. The original patch by Lorand was against the Open vSwitch
> tree which has L3 LISP tunnels but that is not supported in mainline Linux.
> I (Simon) plan to follow up with support for non-TEB GRE ports based on
> work by Thomas Morin.
>
> Cc: Thomas Morin <thomas.morin@orange.com>
> Signed-off-by: Lorand Jakab <lojakab@cisco.com>
> Signed-off-by: Simon Horman <simon.horman@netronome.com>
>
> ---
....
> ---
>  include/uapi/linux/openvswitch.h     |  11 ++
>  net/openvswitch/actions.c            |  45 ++++++++
>  net/openvswitch/datapath.c           |  13 +--
>  net/openvswitch/flow.c               |  65 +++++++----
>  net/openvswitch/flow.h               |   4 +-
>  net/openvswitch/flow_netlink.c       | 213 ++++++++++++++++++++++++-----------
>  net/openvswitch/vport-geneve.c       |   2 +-
>  net/openvswitch/vport-gre.c          |   2 +-
>  net/openvswitch/vport-internal_dev.c |   6 +
>  net/openvswitch/vport-netdev.c       |  19 +++-
>  net/openvswitch/vport-netdev.h       |   2 +
>  net/openvswitch/vport-vxlan.c        |   2 +-
>  12 files changed, 279 insertions(+), 105 deletions(-)
>
> diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
> index 5cde501433eb..6f505e486e93 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
...

> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index 12e8a8942a42..0001f651c934 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -301,6 +301,43 @@ static int set_eth_addr(struct sk_buff *skb, struct sw_flow_key *flow_key,
>         return 0;
>  }
>
> +/* pop_eth does not support VLAN packets as this action is never called
> + * for them.
> + */
> +static int pop_eth(struct sk_buff *skb, struct sw_flow_key *key)
> +{
> +       skb_pull_rcsum(skb, ETH_HLEN);
> +       skb_reset_mac_header(skb);
> +       skb->mac_len -= ETH_HLEN;
> +
> +       invalidate_flow_key(key);
> +       return 0;
> +}
This changes an L2 packet into an L3 packet by resetting the mac header. We
need to unset the mac header so that OVS key extraction can identify this
packet later on, for example after a recirc action.
The other option would be to keep the key's is_layer3 consistent with the
packet: the push_eth and pop_eth actions can set and unset the flag in the
flow key, so that OVS can keep track of packet headers by looking at the
packet key. When the packet leaves OVS (in netdev-send) we can unset the mac
header according to this flag.

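The alternative suggested above could look roughly like this on a toy packet buffer. Here pop_eth and push_eth keep the flow key's is_layer3 flag in step with the packet, and pop_eth unsets (rather than resets) the MAC header.

The names toy_pkt, toy_key and MAC_UNSET, and both functions, are hypothetical. Unlike the patch, push_eth takes the EtherType as a parameter instead of reading skb->protocol. Checksum handling is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_HLEN	14
#define MAC_UNSET	0xFFFFu		/* "no MAC header" sentinel */

/* Hypothetical toy packet: a fixed buffer with a movable data offset. */
struct toy_pkt {
	uint8_t buf[128];
	uint16_t data;		/* offset of first valid byte */
	uint16_t len;		/* bytes from data to end of packet */
	uint16_t mac_header;	/* offset of Ethernet header or MAC_UNSET */
};

struct toy_key {
	bool is_layer3;
};

/* Strip the Ethernet header, unset the MAC header, record L3 in the key. */
int toy_pop_eth(struct toy_pkt *p, struct toy_key *key)
{
	if (key->is_layer3 || p->len < ETH_HLEN)
		return -1;
	p->data += ETH_HLEN;
	p->len -= ETH_HLEN;
	p->mac_header = MAC_UNSET;
	key->is_layer3 = true;
	return 0;
}

/* Prepend an Ethernet header, reset the MAC header, record L2 in the key. */
int toy_push_eth(struct toy_pkt *p, struct toy_key *key,
		 const uint8_t dst[6], const uint8_t src[6], uint16_t ethertype)
{
	uint8_t *hdr;

	if (!key->is_layer3 || p->data < ETH_HLEN)
		return -1;	/* header already present, or no headroom */
	p->data -= ETH_HLEN;
	p->len += ETH_HLEN;
	p->mac_header = p->data;	/* like skb_reset_mac_header() */
	hdr = p->buf + p->data;
	memcpy(hdr, dst, 6);
	memcpy(hdr + 6, src, 6);
	hdr[12] = (uint8_t)(ethertype >> 8);	/* EtherType, network order */
	hdr[13] = (uint8_t)(ethertype & 0xFF);
	key->is_layer3 = false;
	return 0;
}
```
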
> +
> +static int push_eth(struct sk_buff *skb, struct sw_flow_key *key,
> +                   const struct ovs_action_push_eth *ethh)
> +{
> +       struct ethhdr *hdr;
> +
> +       /* Add the new Ethernet header */
> +       if (skb_cow_head(skb, ETH_HLEN) < 0)
> +               return -ENOMEM;
> +
> +       skb_push(skb, ETH_HLEN);
> +       skb_reset_mac_header(skb);
> +       skb->mac_len += ETH_HLEN;
> +
> +       hdr = eth_hdr(skb);
> +       ether_addr_copy(hdr->h_source, ethh->addresses.eth_src);
> +       ether_addr_copy(hdr->h_dest, ethh->addresses.eth_dst);
> +       hdr->h_proto = skb->protocol;
> +
> +       skb_postpush_rcsum(skb, hdr, ETH_HLEN);
> +
> +       invalidate_flow_key(key);
> +       return 0;
> +}
> +
....

> diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> index 0ea128eeeab2..86f2cfb19de3 100644
> --- a/net/openvswitch/flow.c
> +++ b/net/openvswitch/flow.c
...

> @@ -723,9 +729,17 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
>         key->phy.skb_mark = skb->mark;
>         ovs_ct_fill_key(skb, key);
>         key->ovs_flow_hash = 0;
> +       key->phy.is_layer3 = skb->mac_len == 0;

I do not think mac_len can be used. mac_header needs to be checked.
...

> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> index c78a6a1476fb..fc94fbe1ddc3 100644
> --- a/net/openvswitch/flow_netlink.c
> +++ b/net/openvswitch/flow_netlink.c
...

> @@ -898,20 +922,33 @@ static int metadata_from_nlattrs(struct net *net, struct sw_flow_match *match,
>                                    sizeof(*cl), is_mask);
>                 *attrs &= ~(1ULL << OVS_KEY_ATTR_CT_LABELS);
>         }
> -       return 0;
> -}
>
> -static int ovs_key_from_nlattrs(struct net *net, struct sw_flow_match *match,
> -                               u64 attrs, const struct nlattr **a,
> -                               bool is_mask, bool log)
> -{
> -       int err;
> +       /* For layer 3 packets the ethernet type is provided
> +        * and treated as metadata but no MAC addresses are provided.
> +        */
> +       if (*attrs & (1ULL << OVS_KEY_ATTR_ETHERTYPE) &&
> +           !(*attrs & (1 << OVS_KEY_ATTR_ETHERNET))) {
> +               int err;
>
Here attr_ethertype can be processed irrespective of attr_ethernet, and
is_layer3 can be derived independently. This way there is no need to
process attr_ethertype again in l2_from_nlattrs().

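Below is a sketch of the review suggestion. It consumes the EtherType attribute once, whether or not the Ethernet attribute is present, and derives is_layer3 purely from which attributes were supplied.

The bit positions, toy_match and toy_parse() are hypothetical stand-ins for the attrs bitmap and sw_flow_match. Mask handling is reduced to the 802.2 default. ETH_P_802_2 is the value from <linux/if_ether.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bits standing in for OVS_KEY_ATTR_ETHERNET/ETHERTYPE. */
#define ATTR_ETHERNET	(1ULL << 4)
#define ATTR_ETHERTYPE	(1ULL << 6)

#define ETH_P_802_2	0x0004

struct toy_match {
	bool is_layer3;
	uint16_t eth_type;
};

void toy_parse(uint64_t *attrs, uint16_t ethertype_val, bool is_mask,
	       struct toy_match *m)
{
	bool has_type = *attrs & ATTR_ETHERTYPE;
	bool has_eth = *attrs & ATTR_ETHERNET;

	/* L3 key: EtherType given as metadata, but no MAC addresses. */
	m->is_layer3 = has_type && !has_eth;
	m->eth_type = 0;

	if (has_type) {
		m->eth_type = ethertype_val;
		*attrs &= ~ATTR_ETHERTYPE;	/* consumed exactly once */
	} else if (!is_mask) {
		/* An L2 key without an EtherType denotes an 802.2 frame. */
		m->eth_type = ETH_P_802_2;
	}
}
```
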
> -       err = metadata_from_nlattrs(net, match, &attrs, a, is_mask, log);
> -       if (err)
> -               return err;
> +               err = ethertype_from_nlattrs(net, match, attrs, a, is_mask,
> +                                            log);
> +               if (err)
> +                       return err;
> +
> +               is_layer3 = true;
> +       }
>
> -       if (attrs & (1 << OVS_KEY_ATTR_ETHERNET)) {
> +       /* Always exact match is_layer3 */
> +       SW_FLOW_KEY_PUT(match, phy.is_layer3, is_mask ? true : is_layer3,
> +                       is_mask);
> +       return is_layer3;
> +}
> +
....
> +       if (*attrs & (1 << OVS_KEY_ATTR_ETHERTYPE)) {
> +               int err;
>
> -               SW_FLOW_KEY_PUT(match, eth.type, eth_type, is_mask);
> -               attrs &= ~(1 << OVS_KEY_ATTR_ETHERTYPE);
> +               err = ethertype_from_nlattrs(net, match, attrs, a, is_mask,
> +                                            log);
> +               if (err)
> +                       return err;
>         } else if (!is_mask) {
>                 SW_FLOW_KEY_PUT(match, eth.type, htons(ETH_P_802_2), is_mask);
>         }
>
> +       return 0;
> +}
> +
> +static int ovs_key_from_nlattrs(struct net *net, struct sw_flow_match *match,
> +                               u64 attrs, const struct nlattr **a,
> +                               bool is_mask, bool log)
> +{
> +       int err;
> +       bool is_layer3;
> +
> +       err = metadata_from_nlattrs(net, match, &attrs, a, is_mask, log);
> +       if (err < 0)
> +               return err;
> +       is_layer3 = err != 0;
> +
> +       if (!is_layer3) {
> +               err = l2_from_nlattrs(net, match, &attrs, a, is_mask, log);
> +               if (err < 0)
> +                       return err;
> +       }
> +
...


> diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
> index 32d8e94d9bff..adc364161626 100644
> --- a/net/openvswitch/vport-internal_dev.c
> +++ b/net/openvswitch/vport-internal_dev.c
> @@ -257,6 +257,12 @@ static netdev_tx_t internal_dev_recv(struct sk_buff *skb)
>         struct net_device *netdev = skb->dev;
>         struct pcpu_sw_netstats *stats;
>
> +       /* Only send/receive L2 packets */
> +       if (!skb->mac_len) {
> +               kfree_skb(skb);
> +               return -EINVAL;
> +       }
> +
Is mac_len consistent? I thought we decided to use
skb_mac_header_was_set() to detect l3 packets.

>         if (unlikely(!(netdev->flags & IFF_UP))) {
>                 kfree_skb(skb);
>                 netdev->stats.rx_dropped++;
> diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
> index 4e3972344aa6..733e7914f6bd 100644
> --- a/net/openvswitch/vport-netdev.c
> +++ b/net/openvswitch/vport-netdev.c
> @@ -57,8 +57,10 @@ static void netdev_port_receive(struct sk_buff *skb)
>         if (unlikely(!skb))
>                 return;
>
> -       skb_push(skb, ETH_HLEN);
> -       skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
> +       if (vport->dev->type == ARPHRD_ETHER) {
> +               skb_push(skb, ETH_HLEN);
> +               skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
> +       }
This is still required for tunnel devices of type ARPHRD_NONE that can
handle L2 packets.

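One way to read the comment above is that the amount of header to push back should follow the packet rather than the device type. The sketch below contrasts the two approaches on a hypothetical toy_rx.

It assumes the tunnel receive path has already pulled the inner Ethernet header and left mac_header pointing at it. The names are illustrative, and this is not the merged code.

```c
#include <assert.h>
#include <stdint.h>

#define ARPHRD_ETHER	1
#define ARPHRD_NONE	0xFFFE
#define ETH_HLEN	14
#define MAC_UNSET	0xFFFFu		/* "no MAC header" sentinel */

/* Hypothetical received packet, described by offsets into its buffer. */
struct toy_rx {
	unsigned short dev_type;
	uint16_t data;		/* current data offset (after header pull) */
	uint16_t mac_header;	/* MAC header offset, or MAC_UNSET */
};

/* As in v11 5/6: push back ETH_HLEN only on ARPHRD_ETHER devices. */
unsigned int push_by_dev_type(const struct toy_rx *rx)
{
	return rx->dev_type == ARPHRD_ETHER ? ETH_HLEN : 0;
}

/* Keyed on the packet: push back whatever lies between the MAC header
 * and the current data offset, if the packet has a MAC header at all. */
unsigned int push_by_mac_header(const struct toy_rx *rx)
{
	if (rx->mac_header == MAC_UNSET || rx->mac_header > rx->data)
		return 0;
	return rx->data - rx->mac_header;
}
```

An L2 (TEB) packet arriving on an ARPHRD_NONE device gets nothing pushed back under the first rule, but ETH_HLEN under the second.
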
>         ovs_vport_receive(vport, skb, skb_tunnel_info(skb));
>         return;
>  error:
> @@ -194,6 +196,17 @@ void ovs_netdev_tunnel_destroy(struct vport *vport)
>  }
>  EXPORT_SYMBOL_GPL(ovs_netdev_tunnel_destroy);
>
> +int ovs_netdev_send(struct sk_buff *skb)
> +{
> +       /* Only send L2 packets */
> +       if (skb->mac_len)
> +               return dev_queue_xmit(skb);
> +
As commented earlier, we can pass the is_layer3 flag from the flow key. If
it is an L3 packet, then unset the mac header before sending it out to keep
the packet metadata consistent.


* Re: [ovs-dev] [PATCH net-next v11 4/6] openvswitch: add support to push and pop mpls for layer3 packets
  2016-07-07 20:52       ` pravin shelar
@ 2016-07-10 11:14         ` Simon Horman
  0 siblings, 0 replies; 36+ messages in thread
From: Simon Horman @ 2016-07-10 11:14 UTC (permalink / raw)
  To: pravin shelar; +Cc: Linux Kernel Network Developers, ovs dev

On Thu, Jul 07, 2016 at 01:52:47PM -0700, pravin shelar wrote:
> On Wed, Jul 6, 2016 at 10:59 AM, Simon Horman
> <simon.horman@netronome.com> wrote:
> > Allow push and pop mpls actions to act on layer 3 packets by teaching
> > them not to access non-existent L2 headers of such packets.
> >
> > Signed-off-by: Simon Horman <simon.horman@netronome.com>
> > ---
> > v11
> > * group l2 code in pop_mpls()
> >
> > v10
> > * Limit scope of hdr in {push,pop}_mpls()
> >
> > v9
> > * New Patch
> > ---
> >  include/uapi/linux/openvswitch.h |  2 ++
> >  net/openvswitch/actions.c        | 24 +++++++++++++++---------
> >  2 files changed, 17 insertions(+), 9 deletions(-)
> >
> > diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
> > index d95a3018f6a1..5cde501433eb 100644
> > --- a/include/uapi/linux/openvswitch.h
> > +++ b/include/uapi/linux/openvswitch.h
> > @@ -737,6 +737,8 @@ enum ovs_nat_attr {
> >   * is no MPLS label stack, as determined by ethertype, no action is taken.
> >   * @OVS_ACTION_ATTR_CT: Track the connection. Populate the conntrack-related
> >   * entries in the flow key.
> > + * @OVS_ACTION_ATTR_PUSH_ETH: Push a new outermost Ethernet header onto the
> > + * packet.
> > + * @OVS_ACTION_ATTR_POP_ETH: Pop the outermost Ethernet header off the packet.
> >   *
> >   * Only a single header can be set with a single %OVS_ACTION_ATTR_SET.  Not all
> >   * fields within a header are modifiable, e.g. the IPv4 protocol and fragment
> 
> This hunk is not related to this patch.

Sorry about that, I will move it.

> > diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> > index 1ecbd7715f6d..12e8a8942a42 100644
> > --- a/net/openvswitch/actions.c
> > +++ b/net/openvswitch/actions.c
> ...


* Re: [PATCH net-next v11 3/6] openvswitch: set skb protocol and mac_len when receiving on internal device
@ 2016-07-13  7:17           ` Simon Horman
  0 siblings, 0 replies; 36+ messages in thread
From: Simon Horman @ 2016-07-13  7:17 UTC (permalink / raw)
  To: pravin shelar; +Cc: ovs dev, Linux Kernel Network Developers

On Thu, Jul 07, 2016 at 01:52:25PM -0700, pravin shelar wrote:
> On Wed, Jul 6, 2016 at 10:59 AM, Simon Horman
> <simon.horman@netronome.com> wrote:
> > * Set skb protocol based on contents of packet. I have observed this is
> >   necessary to get actual protocol of a packet when it is injected into an
> >   internal device e.g. by libnet in which case skb protocol will be set to
> >   ETH_ALL.
> >
> I am not sure what you mean by ETH_ALL; I could not find it in the kernel.
> Anyway, can we fix libnet to set the skb->protocol field correctly? This
> change introduces overhead for every packet received on an internal
> port.

It has been quite a while since I wrote this patch and I don't recall what
I meant by ETH_ALL.

I now plan to drop this patch.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


* Re: [PATCH net-next v11 5/6] openvswitch: add layer 3 flow/port support
@ 2016-07-13  7:31         ` Simon Horman
  2016-07-15 21:07           ` [ovs-dev] " pravin shelar
  0 siblings, 1 reply; 36+ messages in thread
From: Simon Horman @ 2016-07-13  7:31 UTC (permalink / raw)
  To: pravin shelar; +Cc: ovs dev, Linux Kernel Network Developers

Hi Pravin,

On Thu, Jul 07, 2016 at 01:54:15PM -0700, pravin shelar wrote:
> On Wed, Jul 6, 2016 at 10:59 AM, Simon Horman
> <simon.horman@netronome.com> wrote:

...

> > diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> > index 12e8a8942a42..0001f651c934 100644
> > --- a/net/openvswitch/actions.c
> > +++ b/net/openvswitch/actions.c
> > @@ -301,6 +301,43 @@ static int set_eth_addr(struct sk_buff *skb, struct sw_flow_key *flow_key,
> >         return 0;
> >  }
> >
> > +/* pop_eth does not support VLAN packets as this action is never called
> > + * for them.
> > + */
> > +static int pop_eth(struct sk_buff *skb, struct sw_flow_key *key)
> > +{
> > +       skb_pull_rcsum(skb, ETH_HLEN);
> > +       skb_reset_mac_header(skb);
> > +       skb->mac_len -= ETH_HLEN;
> > +
> > +       invalidate_flow_key(key);
> > +       return 0;
> > +}
> This changes an L2 packet into an L3 packet by resetting the mac header.
> We need to unset the mac header so that OVS key extraction can identify
> this packet later on, for example after a recirc action.
> Another option would be to keep the is_layer3 flag in the key consistent
> with the packet: the push_eth and pop_eth actions can unset and set the
> flag in the flow key, respectively, so that OVS can keep track of packet
> headers by looking at the flow key. When the packet leaves OVS (in
> netdev-send) we can unset the mac header according to this flag.
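The second option can be sketched with a small userspace model in which push_eth and pop_eth keep a flow-key flag in step with the packet. All `model_` names and the offset-based buffer are hypothetical. The pop side follows the incremental patch later in this thread (unset the mac header, subtract ETH_HLEN from mac_len); the push side resets the mac header and sets mac_len to ETH_HLEN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model, not kernel code. */
#define MODEL_ETH_HLEN	14u
#define MODEL_MAC_UNSET	0xFFFFu	/* all-ones marks an unset mac_header */

struct model_skb {
	uint16_t data;		/* offset of skb->data in the buffer */
	uint16_t mac_header;	/* offset of the L2 header, or MODEL_MAC_UNSET */
	uint16_t mac_len;
};

struct model_key {
	bool is_layer3;
};

static void model_pop_eth(struct model_skb *skb, struct model_key *key)
{
	assert(!key->is_layer3 && skb->mac_len >= MODEL_ETH_HLEN);
	skb->data += MODEL_ETH_HLEN;		/* cf. skb_pull_rcsum() */
	skb->mac_header = MODEL_MAC_UNSET;	/* cf. skb_unset_mac_header() */
	skb->mac_len -= MODEL_ETH_HLEN;
	key->is_layer3 = true;			/* key follows the packet */
}

static void model_push_eth(struct model_skb *skb, struct model_key *key)
{
	assert(key->is_layer3 && skb->data >= MODEL_ETH_HLEN);
	skb->data -= MODEL_ETH_HLEN;		/* cf. skb_push() */
	skb->mac_header = skb->data;		/* cf. skb_reset_mac_header() */
	skb->mac_len = MODEL_ETH_HLEN;
	key->is_layer3 = false;
}
```

A pop followed by a push returns the model to its starting offsets, and at every step the key flag agrees with whether a mac header is set.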

...

> > diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> > index 0ea128eeeab2..86f2cfb19de3 100644
> > --- a/net/openvswitch/flow.c
> > +++ b/net/openvswitch/flow.c
> ...
> 
> > @@ -723,9 +729,17 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
> >         key->phy.skb_mark = skb->mark;
> >         ovs_ct_fill_key(skb, key);
> >         key->ovs_flow_hash = 0;
> > +       key->phy.is_layer3 = skb->mac_len == 0;
> 
> I do not think mac_len can be used. mac_header needs to be checked.
> ...

Yes, indeed. The update to use skb_mac_header_was_set() here accidentally
slipped into the following patch, sorry about that.

With that change in place I believe that this patch is internally
consistent because mac_header and mac_len are set correctly by the
call to key_extract() which is called by ovs_flow_key_extract() just
after where the excerpt above ends.

That said, I do think that it is possible to rely on skb_mac_header_was_set
throughout the datapath, including action processing etc... I have provided
an incremental patch - which I created on top of this entire series - at
the end of this email. If you prefer that approach I am happy to take it,
though I do feel that using mac_len leads to slightly cleaner code. Let me
know what you think.

> > diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> > index c78a6a1476fb..fc94fbe1ddc3 100644
> > --- a/net/openvswitch/flow_netlink.c
> > +++ b/net/openvswitch/flow_netlink.c
> ...
> 
> > @@ -898,20 +922,33 @@ static int metadata_from_nlattrs(struct net *net, struct sw_flow_match *match,
> >                                    sizeof(*cl), is_mask);
> >                 *attrs &= ~(1ULL << OVS_KEY_ATTR_CT_LABELS);
> >         }
> > -       return 0;
> > -}
> >
> > -static int ovs_key_from_nlattrs(struct net *net, struct sw_flow_match *match,
> > -                               u64 attrs, const struct nlattr **a,
> > -                               bool is_mask, bool log)
> > -{
> > -       int err;
> > +       /* For layer 3 packets the ethernet type is provided
> > +        * and treated as metadata but no MAC addresses are provided.
> > +        */
> > +       if (*attrs & (1ULL << OVS_KEY_ATTR_ETHERTYPE) &&
> > +           !(*attrs & (1 << OVS_KEY_ATTR_ETHERNET))) {
> > +               int err;
> >
> Here attr_ethertype can be processed irrespective of attr_ethernet, and
> is_layer3 can be derived independently. This way there is no need to
> process attr_ethertype again in l2_from_nlattrs().
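A sketch of this restructuring follows, using a hypothetical bitmask of parsed attributes. The attribute indices are placeholders chosen for illustration (the real ones come from enum ovs_key_attr), and `model_key_from_attrs()` is not a kernel function. It only shows is_layer3 being derived from the absence of the Ethernet attribute while the ethertype is consumed unconditionally and cleared from the bitmask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical placeholder attribute indices, for illustration only. */
enum model_attr {
	MODEL_ATTR_ETHERNET = 4,
	MODEL_ATTR_ETHERTYPE = 6,
};

struct model_match {
	bool is_layer3;
	bool has_eth_type;
	uint16_t eth_type;	/* host byte order in this model */
};

static void model_key_from_attrs(uint64_t *attrs, uint16_t eth_type,
				 struct model_match *m)
{
	assert(attrs && m);

	/* is_layer3 depends only on whether MAC addresses were supplied. */
	m->is_layer3 = !(*attrs & (1ULL << MODEL_ATTR_ETHERNET));

	/* Consume the ethertype whether or not the Ethernet attribute is
	 * present, so later L2 parsing never needs to look at it again. */
	if (*attrs & (1ULL << MODEL_ATTR_ETHERTYPE)) {
		m->has_eth_type = true;
		m->eth_type = eth_type;
		*attrs &= ~(1ULL << MODEL_ATTR_ETHERTYPE);
	}
}
```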

Thanks, I have reworked things as you suggest.

...

> > diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
> > index 4e3972344aa6..733e7914f6bd 100644
> > --- a/net/openvswitch/vport-netdev.c
> > +++ b/net/openvswitch/vport-netdev.c
> > @@ -57,8 +57,10 @@ static void netdev_port_receive(struct sk_buff *skb)
> >         if (unlikely(!skb))
> >                 return;
> >
> > -       skb_push(skb, ETH_HLEN);
> > -       skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
> > +       if (vport->dev->type == ARPHRD_ETHER) {
> > +               skb_push(skb, ETH_HLEN);
> > +               skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
> > +       }
> This is still required for tunnel devices of type ARPHRD_NONE, which can
> handle L2 packets.

That is not necessary given the current implementation (of ipgre) as it
supplies an skb with the mac header in place if the inner packet was an
Ethernet packet. This scheme could of course be adjusted.

...



Update to use skb_mac_header_was_set() more as mentioned above.
Please let me know what you think about this approach.

 include/net/mpls.h                   |    4 ++-
 net/openvswitch/actions.c            |   42 ++++++++++++++++++++---------------
 net/openvswitch/flow.c               |   23 +++++++++++--------
 net/openvswitch/vport-internal_dev.c |    2 -
 net/openvswitch/vport-netdev.c       |    4 +--
 5 files changed, 44 insertions(+), 31 deletions(-)

diff --git a/include/net/mpls.h b/include/net/mpls.h
index 5b3b5addfb08..296b68661be0 100644
--- a/include/net/mpls.h
+++ b/include/net/mpls.h
@@ -34,6 +34,8 @@ static inline bool eth_p_mpls(__be16 eth_type)
  */
 static inline unsigned char *skb_mpls_header(struct sk_buff *skb)
 {
-	return skb_mac_header(skb) + skb->mac_len;
+	return skb_mac_header_was_set(skb) ?
+		skb_mac_header(skb) + skb->mac_len :
+		skb->data;
 }
 #endif
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 0001f651c934..a18feccb2baa 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -163,18 +163,20 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
 		return -ENOMEM;
 
 	skb_push(skb, MPLS_HLEN);
-	skb_reset_mac_header(skb);
 
 	new_mpls_lse = (__be32 *)skb_mpls_header(skb);
 	*new_mpls_lse = mpls->mpls_lse;
 
 	skb_postpush_rcsum(skb, new_mpls_lse, MPLS_HLEN);
 
-	if (skb->mac_len) {
+	if (skb_mac_header_was_set(skb)) {
+		skb_reset_mac_header(skb);
+
 		update_ethertype(skb, eth_hdr(skb), mpls->mpls_ethertype);
 		memmove(skb_mac_header(skb) - MPLS_HLEN, skb_mac_header(skb),
 			skb->mac_len);
 	}
+
 	if (!skb->inner_protocol)
 		skb_set_inner_protocol(skb, skb->protocol);
 	skb->protocol = mpls->mpls_ethertype;
@@ -186,22 +188,18 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
 static int pop_mpls(struct sk_buff *skb, struct sw_flow_key *key,
 		    const __be16 ethertype)
 {
-	int err;
-
-	err = skb_ensure_writable(skb, skb->mac_len + MPLS_HLEN);
-	if (unlikely(err))
-		return err;
-
-	skb_postpull_rcsum(skb, skb_mpls_header(skb), MPLS_HLEN);
+	if (skb_mac_header_was_set(skb)) {
+		struct ethhdr *hdr;
+		int err;
 
-	memmove(skb_mac_header(skb) + MPLS_HLEN, skb_mac_header(skb),
-		skb->mac_len);
+		skb_postpull_rcsum(skb, skb_mpls_header(skb), MPLS_HLEN);
 
-	__skb_pull(skb, MPLS_HLEN);
-	skb_reset_mac_header(skb);
+		err = skb_ensure_writable(skb, skb->mac_len + MPLS_HLEN);
+		if (unlikely(err))
+			return err;
 
-	if (skb->mac_len) {
-		struct ethhdr *hdr;
+		memmove(skb_mac_header(skb) + MPLS_HLEN, skb_mac_header(skb),
+			skb->mac_len);
 
 		/* skb_mpls_header() is used to locate the ethertype
 		 * field correctly in the presence of VLAN tags.
@@ -210,6 +208,11 @@ static int pop_mpls(struct sk_buff *skb, struct sw_flow_key *key,
 		update_ethertype(skb, hdr, ethertype);
 	}
 
+	__skb_pull(skb, MPLS_HLEN);
+
+	if (skb_mac_header_was_set(skb))
+		skb_reset_mac_header(skb);
+
 	if (eth_p_mpls(skb->protocol))
 		skb->protocol = ethertype;
 
@@ -220,11 +223,14 @@ static int pop_mpls(struct sk_buff *skb, struct sw_flow_key *key,
 static int set_mpls(struct sk_buff *skb, struct sw_flow_key *flow_key,
 		    const __be32 *mpls_lse, const __be32 *mask)
 {
+	__u16 mac_len;
 	__be32 *stack;
 	__be32 lse;
 	int err;
 
-	err = skb_ensure_writable(skb, skb->mac_len + MPLS_HLEN);
+	mac_len = skb_mac_header_was_set(skb) ? skb->mac_len : 0;
+
+	err = skb_ensure_writable(skb, mac_len + MPLS_HLEN);
 	if (unlikely(err))
 		return err;
 
@@ -307,7 +313,7 @@ static int set_eth_addr(struct sk_buff *skb, struct sw_flow_key *flow_key,
 static int pop_eth(struct sk_buff *skb, struct sw_flow_key *key)
 {
 	skb_pull_rcsum(skb, ETH_HLEN);
-	skb_reset_mac_header(skb);
+	skb_unset_mac_header(skb);
 	skb->mac_len -= ETH_HLEN;
 
 	invalidate_flow_key(key);
@@ -325,7 +331,7 @@ static int push_eth(struct sk_buff *skb, struct sw_flow_key *key,
 
 	skb_push(skb, ETH_HLEN);
 	skb_reset_mac_header(skb);
-	skb->mac_len += ETH_HLEN;
+	skb->mac_len = ETH_HLEN;
 
 	hdr = eth_hdr(skb);
 	ether_addr_copy(hdr->h_source, ethh->addresses.eth_src);
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 42587d5bf894..837ea4f9a71d 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -465,17 +465,18 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
 	/* Flags are always used as part of stats */
 	key->tp.flags = 0;
 
-	skb_reset_mac_header(skb);
-
 	/* Link layer. */
 	key->eth.tci = 0;
 	if (key->phy.is_layer3) {
 		if (skb_vlan_tag_present(skb))
 			key->eth.tci = htons(skb->vlan_tci);
 		key->eth.type = skb->protocol;
+		skb_reset_network_header(skb);
 	} else {
 		struct ethhdr *eth = eth_hdr(skb);
 
+		skb_reset_mac_header(skb);
+
 		ether_addr_copy(key->eth.src, eth->h_source);
 		ether_addr_copy(key->eth.dst, eth->h_dest);
 
@@ -493,11 +494,11 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
 		key->eth.type = parse_ethertype(skb);
 		if (unlikely(key->eth.type == htons(0)))
 			return -ENOMEM;
-	}
 
-	skb_reset_network_header(skb);
-	skb_reset_mac_len(skb);
-	__skb_push(skb, skb->data - skb_mac_header(skb));
+		skb_reset_network_header(skb);
+		skb_reset_mac_len(skb);
+		__skb_push(skb, skb->data - skb_mac_header(skb));
+	}
 
 	/* Network layer. */
 	if (key->eth.type == htons(ETH_P_IP)) {
@@ -608,12 +609,16 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
 		 * header and the beginning of the L3 header differ.
 		 *
 		 * Advance network_header to the beginning of the L3
-		 * header. mac_len corresponds to the end of the L2 header.
+		 * header. For packets with an L2 header mac_len corresponds
+		 * to the end of the L2 header.
 		 */
 		while (1) {
+			__u16 mac_len;
 			__be32 lse;
 
-			error = check_header(skb, skb->mac_len + stack_len);
+			mac_len = key->phy.is_layer3 ? 0 : skb->mac_len;
+
+			error = check_header(skb, mac_len + stack_len);
 			if (unlikely(error))
 				return 0;
 
@@ -622,7 +627,7 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
 			if (stack_len == MPLS_HLEN)
 				memcpy(&key->mpls.top_lse, &lse, MPLS_HLEN);
 
-			skb_set_network_header(skb, skb->mac_len + stack_len);
+			skb_set_network_header(skb, mac_len + stack_len);
 			if (lse & htonl(MPLS_LS_S_MASK))
 				break;
 
diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
index 5ad184bd5802..a5ea0bcd310c 100644
--- a/net/openvswitch/vport-internal_dev.c
+++ b/net/openvswitch/vport-internal_dev.c
@@ -255,7 +255,7 @@ static netdev_tx_t internal_dev_recv(struct sk_buff *skb)
 	struct pcpu_sw_netstats *stats;
 
 	/* Only send/receive L2 packets */
-	if (!skb->mac_len) {
+	if (!skb_mac_header_was_set(skb)) {
 		kfree_skb(skb);
 		return -EINVAL;
 	}
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index 7d54414b35eb..05985209f611 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -201,9 +201,9 @@ int ovs_netdev_send(struct sk_buff *skb)
 {
 	struct net_device *dev = skb->dev;
 
-	if (dev->type != ARPHRD_ETHER && skb->mac_len) {
+	if (dev->type != ARPHRD_ETHER && skb_mac_header_was_set(skb)) {
 		skb->protocol = htons(ETH_P_TEB);
-	} else if (dev->type == ARPHRD_ETHER && !skb->mac_len) {
+	} else if (dev->type == ARPHRD_ETHER && !skb_mac_header_was_set(skb)) {
 		kfree_skb(skb);
 		return -EINVAL;
 	}


* Re: [ovs-dev] [PATCH net-next v11 5/6] openvswitch: add layer 3 flow/port support
  2016-07-13  7:31         ` Simon Horman
@ 2016-07-15 21:07           ` pravin shelar
  2016-07-18  4:50             ` Simon Horman
  2016-09-26 16:53             ` [ovs-dev] " Jiri Benc
  0 siblings, 2 replies; 36+ messages in thread
From: pravin shelar @ 2016-07-15 21:07 UTC (permalink / raw)
  To: Simon Horman; +Cc: Linux Kernel Network Developers, ovs dev

On Wed, Jul 13, 2016 at 12:31 AM, Simon Horman
<simon.horman@netronome.com> wrote:
> Hi Pravin,
>
> On Thu, Jul 07, 2016 at 01:54:15PM -0700, pravin shelar wrote:
>> On Wed, Jul 6, 2016 at 10:59 AM, Simon Horman
>> <simon.horman@netronome.com> wrote:
>
> ...

>
>> > diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
>> > index 0ea128eeeab2..86f2cfb19de3 100644
>> > --- a/net/openvswitch/flow.c
>> > +++ b/net/openvswitch/flow.c
>> ...
>>
>> > @@ -723,9 +729,17 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
>> >         key->phy.skb_mark = skb->mark;
>> >         ovs_ct_fill_key(skb, key);
>> >         key->ovs_flow_hash = 0;
>> > +       key->phy.is_layer3 = skb->mac_len == 0;
>>
>> I do not think mac_len can be used. mac_header needs to be checked.
>> ...
>
> Yes, indeed. The update to use skb_mac_header_was_set() here accidentally
> slipped into the following patch, sorry about that.
>
> With that change in place I believe that this patch is internally
> consistent because mac_header and mac_len are set correctly by the
> call to key_extract() which is called by ovs_flow_key_extract() just
> after where the excerpt above ends.
>
> That said, I do think that it is possible to rely on skb_mac_header_was_set
> throughout the datapath, including action processing etc... I have provided
> an incremental patch - which I created on top of this entire series - at
> the end of this email. If you prefer that approach I am happy to take it,
> though I do feel that using mac_len leads to slightly cleaner code. Let me
> know what you think.
>


I am not sure you can use only mac_len to detect an L3 packet. This
does not work with MPLS packets, as mac_len is used to account for MPLS
headers pushed on the skb. Therefore, in the case of an MPLS header on an
L3 packet, mac_len would be non-zero and we have to look at either
mac_header or some other metadata, like the is_layer3 flag from the key,
to check for an L3 packet.


>> > diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
>> > index 4e3972344aa6..733e7914f6bd 100644
>> > --- a/net/openvswitch/vport-netdev.c
>> > +++ b/net/openvswitch/vport-netdev.c
>> > @@ -57,8 +57,10 @@ static void netdev_port_receive(struct sk_buff *skb)
>> >         if (unlikely(!skb))
>> >                 return;
>> >
>> > -       skb_push(skb, ETH_HLEN);
>> > -       skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
>> > +       if (vport->dev->type == ARPHRD_ETHER) {
>> > +               skb_push(skb, ETH_HLEN);
>> > +               skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
>> > +       }
>> This is still required for tunnel devices of type ARPHRD_NONE, which can
>> handle L2 packets.
>
> That is not necessary given the current implementation (of ipgre) as it
> supplies an skb with the mac header in place if the inner packet was an
> Ethernet packet. This scheme could of course be adjusted.
>
> ...
>

I think we should send L2 packets with the L2 header pushed on the skb;
this is what OVS expects. The skb_push() should be done for all L2 packets
rather than for a particular type of device.

>
>
> Update to use skb_mac_header_was_set() more as mentioned above.
> Please let me know what you think about this approach.
>
>  include/net/mpls.h                   |    4 ++-
>  net/openvswitch/actions.c            |   42 ++++++++++++++++++++---------------
>  net/openvswitch/flow.c               |   23 +++++++++++--------
>  net/openvswitch/vport-internal_dev.c |    2 -
>  net/openvswitch/vport-netdev.c       |    4 +--
>  5 files changed, 44 insertions(+), 31 deletions(-)
>
> diff --git a/include/net/mpls.h b/include/net/mpls.h
> index 5b3b5addfb08..296b68661be0 100644
> --- a/include/net/mpls.h
> +++ b/include/net/mpls.h
> @@ -34,6 +34,8 @@ static inline bool eth_p_mpls(__be16 eth_type)
>   */
>  static inline unsigned char *skb_mpls_header(struct sk_buff *skb)
>  {
> -       return skb_mac_header(skb) + skb->mac_len;
> +       return skb_mac_header_was_set(skb) ?
> +               skb_mac_header(skb) + skb->mac_len :
> +               skb->data;
>  }

This function is also called from the GSO layer. The issue is that the GSO
layer resets the mac header and mac length and then calls the MPLS GSO
handler, so all subsequent checks for an L3 packet fail.
So far we have explored three different ways to detect an L3 packet, but
each has its own issue:
1. skb mac header: GSO can reset the mac header.
2. skb mac length: MPLS uses mac_len to account for the MPLS header
length along with the L2 header.
3. skb protocol: ETH_P_TEB is not set for all L2 frames, and the networking
stack is not ready to handle this type for a given skb.

So none of them works consistently. I think the only option to detect an
L3 packet reliably (and without adding a field to the skb) is to use
skb->protocol along with the ARPHRD_NONE device type. If an ARPHRD_NONE
device generates an L2 packet, it needs to set the protocol to ETH_P_TEB.
Some networking stack functions also need to be fixed to handle this
protocol type, e.g. vlan_get_protocol(), br_dev_queue_push_xmit(),
etc.
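A sketch of the proposed rule, assuming (as stated above) that an ARPHRD_NONE device delivering an L2 frame sets the protocol to ETH_P_TEB. The constant values are those of the kernel's ARPHRD_ETHER, ARPHRD_NONE, ETH_P_IP and ETH_P_TEB definitions, but protocols are kept in host byte order here, and `model_is_layer3()` is a hypothetical helper rather than an existing kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; values match the kernel UAPI constants. */
#define MODEL_ARPHRD_ETHER	1
#define MODEL_ARPHRD_NONE	0xFFFE
#define MODEL_ETH_P_IP		0x0800
#define MODEL_ETH_P_TEB		0x6558

/* A packet is treated as L3 only when the device has no link-layer type
 * and the protocol does not announce an encapsulated Ethernet frame.
 * mac_header and mac_len are deliberately not consulted, given the GSO
 * and MPLS concerns raised in this thread. */
static bool model_is_layer3(uint16_t dev_type, uint16_t protocol)
{
	return dev_type == MODEL_ARPHRD_NONE && protocol != MODEL_ETH_P_TEB;
}
```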


* Re: [ovs-dev] [PATCH net-next v11 5/6] openvswitch: add layer 3 flow/port support
  2016-07-15 21:07           ` [ovs-dev] " pravin shelar
@ 2016-07-18  4:50             ` Simon Horman
  2016-07-18 22:34               ` pravin shelar
  2016-09-26 16:53             ` [ovs-dev] " Jiri Benc
  1 sibling, 2 replies; 36+ messages in thread
From: Simon Horman @ 2016-07-18  4:50 UTC (permalink / raw)
  To: pravin shelar; +Cc: Linux Kernel Network Developers, ovs dev, Jiri Benc

[CC Jiri Benc for portion regarding GRE]

Hi Pravin,

On Fri, Jul 15, 2016 at 02:07:37PM -0700, pravin shelar wrote:
> On Wed, Jul 13, 2016 at 12:31 AM, Simon Horman
> <simon.horman@netronome.com> wrote:
> > Hi Pravin,
> >
> > On Thu, Jul 07, 2016 at 01:54:15PM -0700, pravin shelar wrote:
> >> On Wed, Jul 6, 2016 at 10:59 AM, Simon Horman
> >> <simon.horman@netronome.com> wrote:
> >
> > ...
> 
> >
> >> > diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> >> > index 0ea128eeeab2..86f2cfb19de3 100644
> >> > --- a/net/openvswitch/flow.c
> >> > +++ b/net/openvswitch/flow.c
> >> ...
> >>
> >> > @@ -723,9 +729,17 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
> >> >         key->phy.skb_mark = skb->mark;
> >> >         ovs_ct_fill_key(skb, key);
> >> >         key->ovs_flow_hash = 0;
> >> > +       key->phy.is_layer3 = skb->mac_len == 0;
> >>
> >> I do not think mac_len can be used. mac_header needs to be checked.
> >> ...
> >
> > Yes, indeed. The update to use skb_mac_header_was_set() here accidentally
> > slipped into the following patch, sorry about that.
> >
> > With that change in place I believe that this patch is internally
> > consistent because mac_header and mac_len are set correctly by the
> > call to key_extract() which is called by ovs_flow_key_extract() just
> > after where the excerpt above ends.
> >
> > That said, I do think that it is possible to rely on skb_mac_header_was_set
> > throughout the datapath, including action processing etc... I have provided
> > an incremental patch - which I created on top of this entire series - at
> > the end of this email. If you prefer that approach I am happy to take it,
> > though I do feel that using mac_len leads to slightly cleaner code. Let me
> > know what you think.
> >
> 
> 
> I am not sure you can use only mac_len to detect an L3 packet. This
> does not work with MPLS packets, as mac_len is used to account for MPLS
> headers pushed on the skb. Therefore, in the case of an MPLS header on an
> L3 packet, mac_len would be non-zero and we have to look at either
> mac_header or some other metadata, like the is_layer3 flag from the key,
> to check for an L3 packet.

At least within OvS, mac_len does not include the length of the MPLS label
stack. Rather, the MPLS label stack length is the difference between
network_header and the end of the L2 header (mac_header + mac_len).

So I think that the scheme does work, as mac_len is 0 if there is no L2
header, regardless of whether an MPLS label stack is present.
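The offset arithmetic described above can be checked with a small userspace model. The `model_` names are hypothetical; `model_mpls_header()` mirrors the skb_mpls_header() change in the incremental patch earlier in this thread, and all offsets are relative to the start of the buffer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model, not kernel code. */
#define MODEL_MAC_UNSET	0xFFFFu	/* all-ones marks an unset mac_header */

struct model_skb {
	uint16_t data;			/* offset of skb->data */
	uint16_t mac_header;		/* offset of L2 header, or MODEL_MAC_UNSET */
	uint16_t mac_len;		/* L2 header only; excludes MPLS labels */
	uint16_t network_header;	/* offset of the L3 header */
};

/* Start of the MPLS label stack, as in the patched skb_mpls_header(). */
static uint16_t model_mpls_header(const struct model_skb *skb)
{
	if (skb->mac_header != MODEL_MAC_UNSET)
		return (uint16_t)(skb->mac_header + skb->mac_len);
	return skb->data;
}

/* Label stack length: network_header minus the end of the L2 header. */
static uint16_t model_mpls_stack_len(const struct model_skb *skb)
{
	uint16_t start = model_mpls_header(skb);

	assert(skb->network_header >= start);
	return (uint16_t)(skb->network_header - start);
}
```

For example, an Ethernet frame carrying two labels (mac_len 14, network_header 22) yields a stack length of 8, while an L3 packet carrying one label (mac header unset, mac_len 0, network_header 4) yields 4.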

> >> > diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
> >> > index 4e3972344aa6..733e7914f6bd 100644
> >> > --- a/net/openvswitch/vport-netdev.c
> >> > +++ b/net/openvswitch/vport-netdev.c
> >> > @@ -57,8 +57,10 @@ static void netdev_port_receive(struct sk_buff *skb)
> >> >         if (unlikely(!skb))
> >> >                 return;
> >> >
> >> > -       skb_push(skb, ETH_HLEN);
> >> > -       skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
> >> > +       if (vport->dev->type == ARPHRD_ETHER) {
> >> > +               skb_push(skb, ETH_HLEN);
> >> > +               skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
> >> > +       }
> >> This is still required for tunnel devices of type ARPHRD_NONE, which can
> >> handle L2 packets.
> >
> > That is not necessary given the current implementation (of ipgre) as it
> > supplies an skb with the mac header in place if the inner packet was an
> > Ethernet packet. This scheme could of course be adjusted.
> >
> > ...
> >
> 
> I think we should send L2 packets with the L2 header pushed on the skb;
> this is what OVS expects. The skb_push() should be done for all L2 packets
> rather than for a particular type of device.

The following seems to achieve that.
Jiri, what do you think?

diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index a20248355da0..edbc10690b60 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -281,10 +281,9 @@ static int __ipgre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi,
 					   raw_proto, false) < 0)
 			goto drop;
 
-		if (tunnel->dev->type != ARPHRD_NONE)
+		if (tunnel->dev->type != ARPHRD_NONE ||
+		    tpi->proto == htons(ETH_P_TEB))
 			skb_pop_mac_header(skb);
-		else if (tpi->proto != htons(ETH_P_TEB))
-			skb_unset_mac_header(skb);
 		else
 			skb_reset_mac_header(skb);
 		if (tunnel->collect_md) {
@@ -326,7 +325,7 @@ static int ipgre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi,
 		 * also ETH_P_TEB traffic.
 		 */
 		itn = net_generic(net, ipgre_net_id);
-		res = __ipgre_rcv(skb, tpi, itn, hdr_len, true);
+		res = __ipgre_rcv(skb, tpi, itn, hdr_len, false);
 	}
 	return res;
 }
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index 05985209f611..933ac46db53a 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -57,7 +57,8 @@ static void netdev_port_receive(struct sk_buff *skb)
 	if (unlikely(!skb))
 		return;
 
-	if (vport->dev->type == ARPHRD_ETHER) {
+	if (vport->dev->type != ARPHRD_NONE ||
+	    skb->protocol == htons(ETH_P_TEB)) {
 		skb_push(skb, ETH_HLEN);
 		skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
 	}
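The effect of the two hunks above on the receive path can be tabulated with a small userspace model. The enum and function names are hypothetical; the conditions are transcribed from the old and new __ipgre_rcv() branches and from the new netdev_port_receive() test, with protocols in host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; values match the kernel UAPI constants. */
#define MODEL_ARPHRD_NONE	0xFFFE
#define MODEL_ETH_P_TEB		0x6558

enum model_mac_action { MODEL_POP, MODEL_UNSET, MODEL_RESET };

/* __ipgre_rcv() mac header handling before the change. */
static enum model_mac_action model_ipgre_old(uint16_t dev_type, uint16_t proto)
{
	if (dev_type != MODEL_ARPHRD_NONE)
		return MODEL_POP;
	else if (proto != MODEL_ETH_P_TEB)
		return MODEL_UNSET;
	else
		return MODEL_RESET;
}

/* __ipgre_rcv() mac header handling after the change. */
static enum model_mac_action model_ipgre_new(uint16_t dev_type, uint16_t proto)
{
	if (dev_type != MODEL_ARPHRD_NONE || proto == MODEL_ETH_P_TEB)
		return MODEL_POP;
	return MODEL_RESET;
}

/* netdev_port_receive(): push ETH_HLEN back before handing the skb to OVS? */
static bool model_netdev_push_eth(uint16_t dev_type, uint16_t skb_protocol)
{
	return dev_type != MODEL_ARPHRD_NONE || skb_protocol == MODEL_ETH_P_TEB;
}
```

On an ARPHRD_NONE device, a TEB payload now has its Ethernet header popped rather than left in place, and a non-TEB payload has its mac header reset rather than unset.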

> > Update to use skb_mac_header_was_set() more as mentioned above.
> > Please let me know what you think about this approach.
> >
> >  include/net/mpls.h                   |    4 ++-
> >  net/openvswitch/actions.c            |   42 ++++++++++++++++++++---------------
> >  net/openvswitch/flow.c               |   23 +++++++++++--------
> >  net/openvswitch/vport-internal_dev.c |    2 -
> >  net/openvswitch/vport-netdev.c       |    4 +--
> >  5 files changed, 44 insertions(+), 31 deletions(-)
> >
> > diff --git a/include/net/mpls.h b/include/net/mpls.h
> > index 5b3b5addfb08..296b68661be0 100644
> > --- a/include/net/mpls.h
> > +++ b/include/net/mpls.h
> > @@ -34,6 +34,8 @@ static inline bool eth_p_mpls(__be16 eth_type)
> >   */
> >  static inline unsigned char *skb_mpls_header(struct sk_buff *skb)
> >  {
> > -       return skb_mac_header(skb) + skb->mac_len;
> > +       return skb_mac_header_was_set(skb) ?
> > +               skb_mac_header(skb) + skb->mac_len :
> > +               skb->data;
> >  }
> 
> This function is also called from the GSO layer. The issue is that the GSO
> layer resets the mac header and mac length and then calls the MPLS GSO
> handler, so all subsequent checks for an L3 packet fail.
> So far we have explored three different ways to detect an L3 packet, but
> each has its own issue:
> 1. skb mac header: GSO can reset the mac header.
> 2. skb mac length: MPLS uses mac_len to account for the MPLS header
> length along with the L2 header.
> 3. skb protocol: ETH_P_TEB is not set for all L2 frames, and the networking
> stack is not ready to handle this type for a given skb.
> 
> So none of them works consistently. I think the only option to detect an
> L3 packet reliably (and without adding a field to the skb) is to use
> skb->protocol along with the ARPHRD_NONE device type. If an ARPHRD_NONE
> device generates an L2 packet, it needs to set the protocol to ETH_P_TEB.
> Some networking stack functions also need to be fixed to handle this
> protocol type, e.g. vlan_get_protocol(), br_dev_queue_push_xmit(),
> etc.


* Re: [ovs-dev] [PATCH net-next v11 5/6] openvswitch: add layer 3 flow/port support
  2016-07-18  4:50             ` Simon Horman
@ 2016-07-18 22:34               ` pravin shelar
  1 sibling, 1 reply; 36+ messages in thread
From: pravin shelar @ 2016-07-18 22:34 UTC (permalink / raw)
  To: Simon Horman; +Cc: Linux Kernel Network Developers, ovs dev, Jiri Benc

On Sun, Jul 17, 2016 at 9:50 PM, Simon Horman
<simon.horman@netronome.com> wrote:
> [CC Jiri Benc for portion regarding GRE]
>
> Hi Pravin,
>
> On Fri, Jul 15, 2016 at 02:07:37PM -0700, pravin shelar wrote:
>> On Wed, Jul 13, 2016 at 12:31 AM, Simon Horman
>> <simon.horman@netronome.com> wrote:
>> > Hi Pravin,
>> >
>> > On Thu, Jul 07, 2016 at 01:54:15PM -0700, pravin shelar wrote:
>> >> On Wed, Jul 6, 2016 at 10:59 AM, Simon Horman
>> >> <simon.horman@netronome.com> wrote:
>> >
>> > ...
>>
>> >
>> >> > diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
>> >> > index 0ea128eeeab2..86f2cfb19de3 100644
>> >> > --- a/net/openvswitch/flow.c
>> >> > +++ b/net/openvswitch/flow.c
>> >> ...
>> >>
>> >> > @@ -723,9 +729,17 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
>> >> >         key->phy.skb_mark = skb->mark;
>> >> >         ovs_ct_fill_key(skb, key);
>> >> >         key->ovs_flow_hash = 0;
>> >> > +       key->phy.is_layer3 = skb->mac_len == 0;
>> >>
>> >> I do not think mac_len can be used. mac_header needs to be checked.
>> >> ...
>> >
>> > Yes, indeed. The update to use skb_mac_header_was_set() here accidentally
>> > slipped into the following patch, sorry about that.
>> >
>> > With that change in place I believe that this patch is internally
>> > consistent because mac_header and mac_len are set correctly by the
>> > call to key_extract() which is called by ovs_flow_key_extract() just
>> > after where the excerpt above ends.
>> >
>> > That said, I do think that it is possible to rely on skb_mac_header_was_set
>> > throughout the datapath, including action processing etc... I have provided
>> > an incremental patch - which I created on top of this entire series - at
>> > the end of this email. If you prefer that approach I am happy to take it,
>> > though I do feel that using mac_len leads to slightly cleaner code. Let me
>> > know what you think.
>> >
>>
>>
>> I am not sure you can use only mac_len to detect an L3 packet. This
>> does not work with MPLS packets, as mac_len is used to account for MPLS
>> headers pushed on the skb. Therefore, in the case of an MPLS header on an
>> L3 packet, mac_len would be non-zero and we have to look at either
>> mac_header or some other metadata, like the is_layer3 flag from the key,
>> to check for an L3 packet.
>
> At least within OvS, mac_len does not include the length of the MPLS label
> stack. Rather, the MPLS label stack length is the difference between
> network_header and the end of the L2 header (mac_header + mac_len).
>
> So I think that the scheme does work, as mac_len is 0 if there is no L2
> header, regardless of whether an MPLS label stack is present.
>

I was thinking of the networking stack as a whole rather than just the
OVS datapath. I think we should have a consistent method of detecting L3
packets. As commented in my previous mail, this could be achieved using
skb->protocol and the device type.


* Re: [PATCH net-next v11 5/6] openvswitch: add layer 3 flow/port support
@ 2016-07-20  0:02                   ` Simon Horman
       [not found]                     ` <20160720000243.GA4688-ucRxlxcrRFEsysjaEhV7d2ey4e3TpSOZIxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
  0 siblings, 1 reply; 36+ messages in thread
From: Simon Horman @ 2016-07-20  0:02 UTC (permalink / raw)
  To: pravin shelar; +Cc: ovs dev, Linux Kernel Network Developers, Jiri Benc

On Mon, Jul 18, 2016 at 03:34:52PM -0700, pravin shelar wrote:
> On Sun, Jul 17, 2016 at 9:50 PM, Simon Horman
> <simon.horman@netronome.com> wrote:
> > [CC Jiri Benc for portion regarding GRE]
> >
> > Hi Pravin,
> >
> > On Fri, Jul 15, 2016 at 02:07:37PM -0700, pravin shelar wrote:
> >> On Wed, Jul 13, 2016 at 12:31 AM, Simon Horman
> >> <simon.horman@netronome.com> wrote:
> >> > Hi Pravin,
> >> >
> >> > On Thu, Jul 07, 2016 at 01:54:15PM -0700, pravin shelar wrote:
> >> >> On Wed, Jul 6, 2016 at 10:59 AM, Simon Horman
> >> >> <simon.horman@netronome.com> wrote:
> >> >
> >> > ...
> >>
> >> >
> >> >> > diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> >> >> > index 0ea128eeeab2..86f2cfb19de3 100644
> >> >> > --- a/net/openvswitch/flow.c
> >> >> > +++ b/net/openvswitch/flow.c
> >> >> ...
> >> >>
> >> >> > @@ -723,9 +729,17 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
> >> >> >         key->phy.skb_mark = skb->mark;
> >> >> >         ovs_ct_fill_key(skb, key);
> >> >> >         key->ovs_flow_hash = 0;
> >> >> > +       key->phy.is_layer3 = skb->mac_len == 0;
> >> >>
> >> >> I do not think mac_len can be used. mac_header needs to be checked.
> >> >> ...
> >> >
> >> > Yes, indeed. The update to use skb_mac_header_was_set() here accidentally
> >> > slipped into the following patch, sorry about that.
> >> >
> >> > With that change in place I believe that this patch is internally
> >> > consistent because mac_header and mac_len are set correctly by the
> >> > call to key_extract() which is called by ovs_flow_key_extract() just
> >> > after where the excerpt above ends.
> >> >
> >> > That said, I do think that it is possible to rely on skb_mac_header_was_set
> >> > throughout the datapath, including action processing etc... I have provided
> >> > an incremental patch - which I created on top of this entire series - at
> >> > the end of this email. If you prefer that approach I am happy to take it,
> >> > though I do feel that using mac_len leads to slightly cleaner code. Let me
> >> > know what you think.
> >> >
> >>
> >>
> >> I am not sure you can use only mac_len to detect an L3 packet. This
> >> does not work with MPLS packets: mac_len is used to account for MPLS
> >> headers pushed on the skb. Therefore, in the case of an MPLS header on
> >> an L3 packet, mac_len would be non-zero and we have to look at either
> >> mac_header or some other metadata, such as the is_layer3 flag in the
> >> key, to check for an L3 packet.
> >
> > At least within OvS, mac_len does not include the length of the MPLS label
> > stack. Rather, the MPLS label stack length is the difference between
> > network_header and (mac_header + mac_len), the end of the L2 header.
> >
> > So I think that the scheme does work, as mac_len is 0 if there is no L2
> > header, regardless of whether an MPLS label stack is present.
> >
> 
> I was thinking of the overall networking stack rather than just the OvS
> datapath. I think we should have a consistent method of detecting an L3
> packet. As commented in my previous mail, it could be achieved using
> skb->protocol and the device type.

This is somewhat of a surprise to me. As far as I recall, when MPLS support
was added to OvS, it and the accompanying support for MPLS GSO were the only
MPLS support present in the kernel. And at the time, the scheme developed by
Jesse Gross, myself and others was as I described above.

Internally, OvS relies on this scheme; in particular, it is used
by skb_mpls_header() to calculate the beginning of the MPLS label stack
accurately in the presence of VLAN tags.

Is it mpls_gso_segment() that you are concerned about?
If so, perhaps the problem could be addressed there.
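
The offset scheme described above can be modelled with plain integers. This is a sketch, assuming offsets relative to skb->head and assuming that for a packet without an L2 header mac_header is left at the first label with mac_len set to 0; struct hdr_offsets, mpls_stack_start() and mpls_stack_len() are hypothetical. mpls_stack_start() corresponds to what skb_mpls_header() computes, skb_mac_header(skb) + skb->mac_len.

```c
#include <assert.h>
#include <stdint.h>

#define ETH_HLEN   14  /* Ethernet header */
#define VLAN_HLEN   4  /* one 802.1Q tag */
#define MPLS_HLEN   4  /* one MPLS label stack entry */

/* Hypothetical model of the sk_buff header offsets (relative to skb->head). */
struct hdr_offsets {
	uint16_t mac_header;      /* start of the packet as seen by OvS */
	uint16_t mac_len;         /* L2 header incl. VLAN tags, excl. MPLS; 0 if no L2 */
	uint16_t network_header;  /* first byte after the bottom-of-stack label */
};

/* Start of the MPLS label stack: mac_header + mac_len, as skb_mpls_header()
 * computes it. VLAN tags are already covered by mac_len. */
static unsigned int mpls_stack_start(const struct hdr_offsets *h)
{
	return (unsigned int)h->mac_header + h->mac_len;
}

/* Length of the label stack: the gap between the end of the L2 header and
 * network_header (0 when no labels are present). */
static unsigned int mpls_stack_len(const struct hdr_offsets *h)
{
	assert(h->network_header >= mpls_stack_start(h));
	return h->network_header - mpls_stack_start(h);
}
```

For a VLAN-tagged Ethernet frame (mac_len = 14 + 4 = 18) whose mac_header is at offset 64 and which carries two labels, the stack starts at offset 82 and is 8 bytes long; for an L3 packet with one label the stack starts at mac_header itself.
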
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

* Re: [PATCH net-next v11 5/6] openvswitch: add layer 3 flow/port support
       [not found]                     ` <20160720000243.GA4688-ucRxlxcrRFEsysjaEhV7d2ey4e3TpSOZIxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
@ 2016-07-20 18:06                       ` pravin shelar
  2016-08-08 15:17                         ` [ovs-dev] " Simon Horman
  0 siblings, 1 reply; 36+ messages in thread
From: pravin shelar @ 2016-07-20 18:06 UTC (permalink / raw)
  To: Simon Horman; +Cc: ovs dev, Linux Kernel Network Developers, Jiri Benc

On Tue, Jul 19, 2016 at 5:02 PM, Simon Horman
<simon.horman@netronome.com> wrote:
> On Mon, Jul 18, 2016 at 03:34:52PM -0700, pravin shelar wrote:
>> On Sun, Jul 17, 2016 at 9:50 PM, Simon Horman
>> <simon.horman@netronome.com> wrote:
>> > [CC Jiri Benc for portion regarding GRE]
>> >
>> > Hi Pravin,
>> >
>> > On Fri, Jul 15, 2016 at 02:07:37PM -0700, pravin shelar wrote:
>> >> On Wed, Jul 13, 2016 at 12:31 AM, Simon Horman
>> >> <simon.horman@netronome.com> wrote:
>> >> > Hi Pravin,
>> >> >
>> >> > On Thu, Jul 07, 2016 at 01:54:15PM -0700, pravin shelar wrote:
>> >> >> On Wed, Jul 6, 2016 at 10:59 AM, Simon Horman
>> >> >> <simon.horman@netronome.com> wrote:
>> >> >
>> >> > ...
>> >>
>> >> >
>> >> >> > diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
>> >> >> > index 0ea128eeeab2..86f2cfb19de3 100644
>> >> >> > --- a/net/openvswitch/flow.c
>> >> >> > +++ b/net/openvswitch/flow.c
>> >> >> ...
>> >> >>
>> >> >> > @@ -723,9 +729,17 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
>> >> >> >         key->phy.skb_mark = skb->mark;
>> >> >> >         ovs_ct_fill_key(skb, key);
>> >> >> >         key->ovs_flow_hash = 0;
>> >> >> > +       key->phy.is_layer3 = skb->mac_len == 0;
>> >> >>
>> >> >> I do not think mac_len can be used. mac_header needs to be checked.
>> >> >> ...
>> >> >
>> >> > Yes, indeed. The update to use skb_mac_header_was_set() here accidentally
>> >> > slipped into the following patch, sorry about that.
>> >> >
>> >> > With that change in place I believe that this patch is internally
>> >> > consistent because mac_header and mac_len are set correctly by the
>> >> > call to key_extract() which is called by ovs_flow_key_extract() just
>> >> > after where the excerpt above ends.
>> >> >
>> >> > That said, I do think that it is possible to rely on skb_mac_header_was_set
>> >> > throughout the datapath, including action processing etc... I have provided
>> >> > an incremental patch - which I created on top of this entire series - at
>> >> > the end of this email. If you prefer that approach I am happy to take it,
>> >> > though I do feel that using mac_len leads to slightly cleaner code. Let me
>> >> > know what you think.
>> >> >
>> >>
>> >>
>> >> I am not sure you can use only mac_len to detect an L3 packet. This
>> >> does not work with MPLS packets: mac_len is used to account for MPLS
>> >> headers pushed on the skb. Therefore, in the case of an MPLS header on
>> >> an L3 packet, mac_len would be non-zero and we have to look at either
>> >> mac_header or some other metadata, such as the is_layer3 flag in the
>> >> key, to check for an L3 packet.
>> >
>> > At least within OvS, mac_len does not include the length of the MPLS label
>> > stack. Rather, the MPLS label stack length is the difference between
>> > network_header and (mac_header + mac_len), the end of the L2 header.
>> >
>> > So I think that the scheme does work, as mac_len is 0 if there is no L2
>> > header, regardless of whether an MPLS label stack is present.
>> >
>>
>> I was thinking of the overall networking stack rather than just the OvS
>> datapath. I think we should have a consistent method of detecting an L3
>> packet. As commented in my previous mail, it could be achieved using
>> skb->protocol and the device type.
>
> This is somewhat of a surprise to me. As far as I recall, when MPLS support
> was added to OvS, it and the accompanying support for MPLS GSO were the only
> MPLS support present in the kernel. And at the time, the scheme developed by
> Jesse Gross, myself and others was as I described above.
>
> Internally, OvS relies on this scheme; in particular, it is used
> by skb_mpls_header() to calculate the beginning of the MPLS label stack
> accurately in the presence of VLAN tags.
>
> Is it mpls_gso_segment() that you are concerned about?
> If so, perhaps the problem could be addressed there.

Yes.
Can you read the comment I made in my previous mail in the context of
function skb_mpls_header()? I have given the rationale for the requested
change.

* Re: [PATCH net-next v11 5/6] openvswitch: add layer 3 flow/port support
       [not found]               ` <20160718045025.GA2490-ucRxlxcrRFEsysjaEhV7d2ey4e3TpSOZIxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
@ 2016-07-21 15:39                 ` Jiri Benc
  0 siblings, 0 replies; 36+ messages in thread
From: Jiri Benc @ 2016-07-21 15:39 UTC (permalink / raw)
  To: Simon Horman; +Cc: ovs dev, Linux Kernel Network Developers

On Mon, 18 Jul 2016 13:50:27 +0900, Simon Horman wrote:
> On Fri, Jul 15, 2016 at 02:07:37PM -0700, pravin shelar wrote:
> > I think we should send L2 packets with the L2 header pushed on the skb.
> > This is what OVS expects. The skb_push() should be done for all L2 packets
> > rather than for a particular type of device.
> 
> The following seems to achieve that.
> Jiri, what do you think?
> 
> diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> index a20248355da0..edbc10690b60 100644
> --- a/net/ipv4/ip_gre.c
> +++ b/net/ipv4/ip_gre.c
> @@ -281,10 +281,9 @@ static int __ipgre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi,
>  					   raw_proto, false) < 0)
>  			goto drop;
>  
> -		if (tunnel->dev->type != ARPHRD_NONE)
> +		if (tunnel->dev->type != ARPHRD_NONE ||
> +		    tpi->proto == htons(ETH_P_TEB))
>  			skb_pop_mac_header(skb);

This is wrong. The MAC header for ARPHRD_NONE interfaces is null; that's
the meaning of ARPHRD_NONE. mac_header cannot point to the outer IP
header. That would be ARPHRD_IPGRE.

 Jiri
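
A minimal model of the receive-side decision discussed here, following the branch in __ipgre_rcv() as it stands in this series (it appears as context in the ip_gre.c hunk of the incremental diff later in this thread). The enum and function names are hypothetical and only that one branch is modelled; the point is that an ARPHRD_NONE device never reports the outer IP header as its MAC header.

```c
#include <assert.h>
#include <stdint.h>

#define ARPHRD_IPGRE  778     /* GRE over IP */
#define ARPHRD_NONE   0xFFFE  /* no link-layer header */
#define ETH_P_TEB     0x6558  /* Transparent Ethernet Bridging */

/* Hypothetical: where mac_header ends up after the branch below. */
enum mac_hdr_pos {
	MAC_HDR_OUTER_IP,   /* skb_pop_mac_header(): the outer IP header */
	MAC_HDR_UNSET,      /* skb_unset_mac_header(): no L2 header */
	MAC_HDR_INNER_ETH,  /* skb_reset_mac_header(): the inner Ethernet header */
};

static enum mac_hdr_pos gre_rcv_mac_hdr(uint16_t dev_type, uint16_t gre_proto)
{
	if (dev_type != ARPHRD_NONE)
		return MAC_HDR_OUTER_IP;   /* e.g. ARPHRD_IPGRE */
	if (gre_proto != ETH_P_TEB)
		return MAC_HDR_UNSET;      /* ARPHRD_NONE: the MAC header is null */
	return MAC_HDR_INNER_ETH;      /* TEB: the payload is an Ethernet frame */
}
```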

* Re: [ovs-dev] [PATCH net-next v11 5/6] openvswitch: add layer 3 flow/port support
  2016-07-20 18:06                       ` pravin shelar
@ 2016-08-08 15:17                         ` Simon Horman
  2016-08-08 15:28                           ` Jiri Benc
       [not found]                           ` <20160808151716.GA8477-ucRxlxcrRFEsysjaEhV7d2ey4e3TpSOZIxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
  0 siblings, 2 replies; 36+ messages in thread
From: Simon Horman @ 2016-08-08 15:17 UTC (permalink / raw)
  To: pravin shelar; +Cc: Linux Kernel Network Developers, ovs dev, Jiri Benc

On Wed, Jul 20, 2016 at 11:06:37AM -0700, pravin shelar wrote:
> On Tue, Jul 19, 2016 at 5:02 PM, Simon Horman
> <simon.horman@netronome.com> wrote:
> > On Mon, Jul 18, 2016 at 03:34:52PM -0700, pravin shelar wrote:
> >> On Sun, Jul 17, 2016 at 9:50 PM, Simon Horman
> >> <simon.horman@netronome.com> wrote:
> >> > [CC Jiri Benc for portion regarding GRE]
> >> >
> >> > Hi Pravin,
> >> >
> >> > On Fri, Jul 15, 2016 at 02:07:37PM -0700, pravin shelar wrote:
> >> >> On Wed, Jul 13, 2016 at 12:31 AM, Simon Horman
> >> >> <simon.horman@netronome.com> wrote:
> >> >> > Hi Pravin,
> >> >> >
> >> >> > On Thu, Jul 07, 2016 at 01:54:15PM -0700, pravin shelar wrote:
> >> >> >> On Wed, Jul 6, 2016 at 10:59 AM, Simon Horman
> >> >> >> <simon.horman@netronome.com> wrote:
> >> >> >
> >> >> > ...
> >> >>
> >> >> >
> >> >> >> > diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> >> >> >> > index 0ea128eeeab2..86f2cfb19de3 100644
> >> >> >> > --- a/net/openvswitch/flow.c
> >> >> >> > +++ b/net/openvswitch/flow.c
> >> >> >> ...
> >> >> >>
> >> >> >> > @@ -723,9 +729,17 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
> >> >> >> >         key->phy.skb_mark = skb->mark;
> >> >> >> >         ovs_ct_fill_key(skb, key);
> >> >> >> >         key->ovs_flow_hash = 0;
> >> >> >> > +       key->phy.is_layer3 = skb->mac_len == 0;
> >> >> >>
> >> >> >> I do not think mac_len can be used. mac_header needs to be checked.
> >> >> >> ...
> >> >> >
> >> >> > Yes, indeed. The update to use skb_mac_header_was_set() here accidentally
> >> >> > slipped into the following patch, sorry about that.
> >> >> >
> >> >> > With that change in place I believe that this patch is internally
> >> >> > consistent because mac_header and mac_len are set correctly by the
> >> >> > call to key_extract() which is called by ovs_flow_key_extract() just
> >> >> > after where the excerpt above ends.
> >> >> >
> >> >> > That said, I do think that it is possible to rely on skb_mac_header_was_set
> >> >> > throughout the datapath, including action processing etc... I have provided
> >> >> > an incremental patch - which I created on top of this entire series - at
> >> >> > the end of this email. If you prefer that approach I am happy to take it,
> >> >> > though I do feel that using mac_len leads to slightly cleaner code. Let me
> >> >> > know what you think.
> >> >> >
> >> >>
> >> >>
> >> >> I am not sure you can use only mac_len to detect an L3 packet. This
> >> >> does not work with MPLS packets: mac_len is used to account for MPLS
> >> >> headers pushed on the skb. Therefore, in the case of an MPLS header on
> >> >> an L3 packet, mac_len would be non-zero and we have to look at either
> >> >> mac_header or some other metadata, such as the is_layer3 flag in the
> >> >> key, to check for an L3 packet.
> >> >
> >> > At least within OvS, mac_len does not include the length of the MPLS label
> >> > stack. Rather, the MPLS label stack length is the difference between
> >> > network_header and (mac_header + mac_len), the end of the L2 header.
> >> >
> >> > So I think that the scheme does work, as mac_len is 0 if there is no L2
> >> > header, regardless of whether an MPLS label stack is present.
> >> >
> >>
> >> I was thinking of the overall networking stack rather than just the OvS
> >> datapath. I think we should have a consistent method of detecting an L3
> >> packet. As commented in my previous mail, it could be achieved using
> >> skb->protocol and the device type.
> >
> > This is somewhat of a surprise to me. As far as I recall, when MPLS support
> > was added to OvS, it and the accompanying support for MPLS GSO were the only
> > MPLS support present in the kernel. And at the time, the scheme developed by
> > Jesse Gross, myself and others was as I described above.
> >
> > Internally, OvS relies on this scheme; in particular, it is used
> > by skb_mpls_header() to calculate the beginning of the MPLS label stack
> > accurately in the presence of VLAN tags.
> >
> > Is it mpls_gso_segment() that you are concerned about?
> > If so, perhaps the problem could be addressed there.
> 
> Yes.
> Can you read the comment I made in my previous mail in the context of
> function skb_mpls_header()? I have given the rationale for the requested
> change.

Hi Pravin,

I have made an attempt to implement your suggestion to the extent that
I understand it. The following is an incremental change on top
of this patch-set. Does it move things closer to what you have in mind?

Light testing seems to indicate that it works for GSO skbs
received over both L3 and L2 GRE tunnels by OvS with both
IP-in-MPLS and IP (without MPLS) payloads.
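
As a rough illustration of the push_mpls() change in the incremental diff below, the position of the new label stack entry depends on whether the packet has an L2 header. This sketch assumes a flat byte buffer with headroom: struct pkt and push_mpls_sketch() are hypothetical, data_off stands in for skb->data, and the ethertype rewrite and checksum update performed by the real code are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MPLS_HLEN 4  /* one MPLS label stack entry */

/* Hypothetical packet model: packet bytes are buf[data_off .. data_off+len). */
struct pkt {
	uint8_t buf[128];
	size_t data_off;  /* like skb->data - skb->head */
	size_t len;       /* bytes from data_off */
	size_t mac_len;   /* L2 header length incl. VLAN tags; unused if is_layer3 */
	bool is_layer3;   /* like key->phy.is_layer3 */
};

/* Push one label stack entry (lse[] already in network byte order).
 * Returns the buffer offset at which the label was written. */
static size_t push_mpls_sketch(struct pkt *p, const uint8_t lse[MPLS_HLEN])
{
	size_t lse_off;

	assert(p->data_off >= MPLS_HLEN);  /* caller must provide headroom */
	p->data_off -= MPLS_HLEN;          /* like skb_push() */
	p->len += MPLS_HLEN;

	if (p->is_layer3) {
		/* No L2 header: the label becomes the first bytes. */
		lse_off = p->data_off;
	} else {
		/* Move the L2 header MPLS_HLEN bytes towards the head,
		 * opening a gap for the label right after it. */
		memmove(&p->buf[p->data_off], &p->buf[p->data_off + MPLS_HLEN],
			p->mac_len);
		lse_off = p->data_off + p->mac_len;
	}
	memcpy(&p->buf[lse_off], lse, MPLS_HLEN);
	return lse_off;
}
```

In the L2 case the Ethernet header (and any VLAN tags counted in mac_len) ends up in front of the new label, which sits directly before the old payload; in the L3 case the label simply becomes the first four bytes.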

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 72ece516535d..42033537eb4d 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2171,17 +2171,14 @@ static inline void skb_reset_mac_header(struct sk_buff *skb)
 	skb->mac_header = skb->data - skb->head;
 }
 
-static inline void skb_unset_mac_header(struct sk_buff *skb)
-{
-	skb->mac_header = (typeof(skb->mac_header))~0U;
-}
-
 static inline void skb_set_mac_header(struct sk_buff *skb, const int offset)
 {
 	skb_reset_mac_header(skb);
 	skb->mac_header += offset;
 }
 
+bool skb_mac_header_present(struct sk_buff *skb);
+
 static inline void skb_pop_mac_header(struct sk_buff *skb)
 {
 	skb->mac_header = skb->network_header;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 3864b4b68fa1..8e55e9503c9d 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4883,3 +4883,11 @@ struct sk_buff *pskb_extract(struct sk_buff *skb, int off,
 	return clone;
 }
 EXPORT_SYMBOL(pskb_extract);
+
+bool skb_mac_header_present(struct sk_buff *skb)
+{
+	return skb->dev->type == ARPHRD_ETHER ||
+		(skb->dev->type == ARPHRD_NONE &&
+		 skb->protocol == htons(ETH_P_TEB));
+}
+EXPORT_SYMBOL(skb_mac_header_present);
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index a20248355da0..3f730ad4a874 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -283,8 +283,6 @@ static int __ipgre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi,
 
 		if (tunnel->dev->type != ARPHRD_NONE)
 			skb_pop_mac_header(skb);
-		else if (tpi->proto != htons(ETH_P_TEB))
-			skb_unset_mac_header(skb);
 		else
 			skb_reset_mac_header(skb);
 		if (tunnel->collect_md) {
@@ -453,6 +451,7 @@ static void gre_fb_xmit(struct sk_buff *skb, struct net_device *dev,
 
 	df = key->tun_flags & TUNNEL_DONT_FRAGMENT ?  htons(IP_DF) : 0;
 
+	skb_set_inner_protocol(skb, proto);
 	iptunnel_xmit(skb->sk, rt, skb, fl.saddr, key->u.ipv4.dst, IPPROTO_GRE,
 		      key->tos, key->ttl, df, false);
 	return;
diff --git a/net/mpls/mpls_gso.c b/net/mpls/mpls_gso.c
index 2055e57ed1c3..113cba89653d 100644
--- a/net/mpls/mpls_gso.c
+++ b/net/mpls/mpls_gso.c
@@ -39,16 +39,18 @@ static struct sk_buff *mpls_gso_segment(struct sk_buff *skb,
 	mpls_features = skb->dev->mpls_features & features;
 	segs = skb_mac_gso_segment(skb, mpls_features);
 
-
-	/* Restore outer protocol. */
-	skb->protocol = mpls_protocol;
-
 	/* Re-pull the mac header that the call to skb_mac_gso_segment()
 	 * above pulled.  It will be re-pushed after returning
 	 * skb_mac_gso_segment(), an indirect caller of this function.
 	 */
 	__skb_pull(skb, skb->data - skb_mac_header(skb));
 
+	/* Restore outer protocol. */
+	skb->protocol = mpls_protocol;
+	if (!IS_ERR(segs))
+		for (skb = segs; skb; skb = skb->next)
+			skb->protocol = mpls_protocol;
+
 	return segs;
 }
 
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 0001f651c934..424164222f1e 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -163,18 +163,20 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
 		return -ENOMEM;
 
 	skb_push(skb, MPLS_HLEN);
-	skb_reset_mac_header(skb);
-
-	new_mpls_lse = (__be32 *)skb_mpls_header(skb);
-	*new_mpls_lse = mpls->mpls_lse;
-
-	skb_postpush_rcsum(skb, new_mpls_lse, MPLS_HLEN);
 
-	if (skb->mac_len) {
+	if (key->phy.is_layer3) {
+		new_mpls_lse = (__be32 *)skb->data;
+	} else {
 		update_ethertype(skb, eth_hdr(skb), mpls->mpls_ethertype);
 		memmove(skb_mac_header(skb) - MPLS_HLEN, skb_mac_header(skb),
 			skb->mac_len);
+		skb_reset_mac_header(skb);
+		new_mpls_lse = (__be32 *)skb_mpls_header(skb);
 	}
+	*new_mpls_lse = mpls->mpls_lse;
+
+	skb_postpush_rcsum(skb, new_mpls_lse, MPLS_HLEN);
+
 	if (!skb->inner_protocol)
 		skb_set_inner_protocol(skb, skb->protocol);
 	skb->protocol = mpls->mpls_ethertype;
@@ -186,30 +188,31 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
 static int pop_mpls(struct sk_buff *skb, struct sw_flow_key *key,
 		    const __be16 ethertype)
 {
-	int err;
-
-	err = skb_ensure_writable(skb, skb->mac_len + MPLS_HLEN);
-	if (unlikely(err))
-		return err;
-
-	skb_postpull_rcsum(skb, skb_mpls_header(skb), MPLS_HLEN);
-
-	memmove(skb_mac_header(skb) + MPLS_HLEN, skb_mac_header(skb),
-		skb->mac_len);
+	if (!key->phy.is_layer3) {
+		struct ethhdr *hdr;
+		int err;
 
-	__skb_pull(skb, MPLS_HLEN);
-	skb_reset_mac_header(skb);
+		skb_postpull_rcsum(skb, skb_mpls_header(skb), MPLS_HLEN);
 
-	if (skb->mac_len) {
-		struct ethhdr *hdr;
+		err = skb_ensure_writable(skb, skb->mac_len + MPLS_HLEN);
+		if (unlikely(err))
+			return err;
 
 		/* skb_mpls_header() is used to locate the ethertype
 		 * field correctly in the presence of VLAN tags.
 		 */
 		hdr = (struct ethhdr *)(skb_mpls_header(skb) - ETH_HLEN);
 		update_ethertype(skb, hdr, ethertype);
+
+		memmove(skb_mac_header(skb) + MPLS_HLEN, skb_mac_header(skb),
+			skb->mac_len);
 	}
 
+	__skb_pull(skb, MPLS_HLEN);
+
+	if (!key->phy.is_layer3)
+		skb_reset_mac_header(skb);
+
 	if (eth_p_mpls(skb->protocol))
 		skb->protocol = ethertype;
 
@@ -220,15 +223,23 @@ static int pop_mpls(struct sk_buff *skb, struct sw_flow_key *key,
 static int set_mpls(struct sk_buff *skb, struct sw_flow_key *flow_key,
 		    const __be32 *mpls_lse, const __be32 *mask)
 {
+	__be16 mac_len;
 	__be32 *stack;
 	__be32 lse;
 	int err;
 
+	if (flow_key->phy.is_layer3) {
+		mac_len = 0;
+		stack = (__be32 *)skb->data;
+	} else {
+		mac_len = skb->mac_len;
+	}
+
 	err = skb_ensure_writable(skb, skb->mac_len + MPLS_HLEN);
 	if (unlikely(err))
 		return err;
-
 	stack = (__be32 *)skb_mpls_header(skb);
+
 	lse = OVS_MASKED(*stack, *mpls_lse, *mask);
 	if (skb->ip_summed == CHECKSUM_COMPLETE) {
 		__be32 diff[] = { ~(*stack), lse };
@@ -308,8 +319,8 @@ static int pop_eth(struct sk_buff *skb, struct sw_flow_key *key)
 {
 	skb_pull_rcsum(skb, ETH_HLEN);
 	skb_reset_mac_header(skb);
-	skb->mac_len -= ETH_HLEN;
 
+	key->phy.is_layer3 = true;
 	invalidate_flow_key(key);
 	return 0;
 }
@@ -325,7 +336,7 @@ static int push_eth(struct sk_buff *skb, struct sw_flow_key *key,
 
 	skb_push(skb, ETH_HLEN);
 	skb_reset_mac_header(skb);
-	skb->mac_len += ETH_HLEN;
+	skb->mac_len = ETH_HLEN;
 
 	hdr = eth_hdr(skb);
 	ether_addr_copy(hdr->h_source, ethh->addresses.eth_src);
@@ -334,6 +345,7 @@ static int push_eth(struct sk_buff *skb, struct sw_flow_key *key,
 
 	skb_postpush_rcsum(skb, hdr, ETH_HLEN);
 
+	key->phy.is_layer3 = false;
 	invalidate_flow_key(key);
 	return 0;
 }
@@ -795,6 +807,9 @@ static void do_output(struct datapath *dp, struct sk_buff *skb, int out_port,
 		u16 mru = OVS_CB(skb)->mru;
 		u32 cutlen = OVS_CB(skb)->cutlen;
 
+		if (vport->dev->type == ARPHRD_NONE && !key->phy.is_layer3)
+			skb->protocol = htons(ETH_P_TEB);
+
 		if (unlikely(cutlen > 0)) {
 			if (skb->len - cutlen > ETH_HLEN)
 				pskb_trim(skb, skb->len - cutlen);
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 42587d5bf894..812f8e10d9d4 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -448,7 +448,7 @@ invalid:
  *
  * Initializes @skb header pointers as follows:
  *
- *    - skb->mac_header: the Ethernet header.
+ *    - skb->mac_header: the Ethernet header if flow is L2, unset otherwise
  *
  *    - skb->network_header: just past the Ethernet header, or just past the
  *      VLAN header, to the first byte of the Ethernet payload.
@@ -465,17 +465,19 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
 	/* Flags are always used as part of stats */
 	key->tp.flags = 0;
 
-	skb_reset_mac_header(skb);
-
 	/* Link layer. */
 	key->eth.tci = 0;
 	if (key->phy.is_layer3) {
 		if (skb_vlan_tag_present(skb))
 			key->eth.tci = htons(skb->vlan_tci);
 		key->eth.type = skb->protocol;
+		skb_reset_network_header(skb);
 	} else {
-		struct ethhdr *eth = eth_hdr(skb);
+		struct ethhdr *eth;
+
+		skb_reset_mac_header(skb);
 
+		eth = eth_hdr(skb);
 		ether_addr_copy(key->eth.src, eth->h_source);
 		ether_addr_copy(key->eth.dst, eth->h_dest);
 
@@ -493,11 +495,11 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
 		key->eth.type = parse_ethertype(skb);
 		if (unlikely(key->eth.type == htons(0)))
 			return -ENOMEM;
-	}
 
-	skb_reset_network_header(skb);
-	skb_reset_mac_len(skb);
-	__skb_push(skb, skb->data - skb_mac_header(skb));
+		skb_reset_network_header(skb);
+		skb_reset_mac_len(skb);
+		__skb_push(skb, skb->data - skb_mac_header(skb));
+	}
 
 	/* Network layer. */
 	if (key->eth.type == htons(ETH_P_IP)) {
@@ -608,12 +610,16 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
 		 * header and the beginning of the L3 header differ.
 		 *
 		 * Advance network_header to the beginning of the L3
-		 * header. mac_len corresponds to the end of the L2 header.
+		 * header. For packets with an L2 header mac_len corresponds
+		 * to the end of the L2 header.
 		 */
 		while (1) {
+			__u16 mac_len;
 			__be32 lse;
 
-			error = check_header(skb, skb->mac_len + stack_len);
+			mac_len = key->phy.is_layer3 ? 0 : skb->mac_len;
+
+			error = check_header(skb, mac_len + stack_len);
 			if (unlikely(error))
 				return 0;
 
@@ -622,7 +628,7 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
 			if (stack_len == MPLS_HLEN)
 				memcpy(&key->mpls.top_lse, &lse, MPLS_HLEN);
 
-			skb_set_network_header(skb, skb->mac_len + stack_len);
+			skb_set_network_header(skb, mac_len + stack_len);
 			if (lse & htonl(MPLS_LS_S_MASK))
 				break;
 
@@ -729,7 +735,7 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
 	key->phy.skb_mark = skb->mark;
 	ovs_ct_fill_key(skb, key);
 	key->ovs_flow_hash = 0;
-	key->phy.is_layer3 = skb_mac_header_was_set(skb) == 0;
+	key->phy.is_layer3 = !skb_mac_header_present(skb);
 	key->recirc_id = 0;
 
 	err = key_extract(skb, key);
diff --git a/net/openvswitch/vport-geneve.c b/net/openvswitch/vport-geneve.c
index 7a06e19f5279..1a1fcec88695 100644
--- a/net/openvswitch/vport-geneve.c
+++ b/net/openvswitch/vport-geneve.c
@@ -116,7 +116,7 @@ static struct vport_ops ovs_geneve_vport_ops = {
 	.create		= geneve_create,
 	.destroy	= ovs_netdev_tunnel_destroy,
 	.get_options	= geneve_get_options,
-	.send		= ovs_netdev_send,
+	.send		= dev_queue_xmit,
 };
 
 static int __init ovs_geneve_tnl_init(void)
diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c
index c1cab9dd392f..cbfb0afe041d 100644
--- a/net/openvswitch/vport-gre.c
+++ b/net/openvswitch/vport-gre.c
@@ -87,7 +87,7 @@ static struct vport *gre_create(const struct vport_parms *parms)
 static struct vport_ops ovs_gre_vport_ops = {
 	.type		= OVS_VPORT_TYPE_GRE,
 	.create		= gre_create,
-	.send		= ovs_netdev_send,
+	.send		= dev_queue_xmit,
 	.destroy	= ovs_netdev_tunnel_destroy,
 };
 
diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
index 5ad184bd5802..3d392e84e7a7 100644
--- a/net/openvswitch/vport-internal_dev.c
+++ b/net/openvswitch/vport-internal_dev.c
@@ -255,7 +255,7 @@ static netdev_tx_t internal_dev_recv(struct sk_buff *skb)
 	struct pcpu_sw_netstats *stats;
 
 	/* Only send/receive L2 packets */
-	if (!skb->mac_len) {
+	if (!skb_mac_header_present(skb)) {
 		kfree_skb(skb);
 		return -EINVAL;
 	}
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index 7d54414b35eb..b6b45cf90816 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -197,21 +197,6 @@ void ovs_netdev_tunnel_destroy(struct vport *vport)
 }
 EXPORT_SYMBOL_GPL(ovs_netdev_tunnel_destroy);
 
-int ovs_netdev_send(struct sk_buff *skb)
-{
-	struct net_device *dev = skb->dev;
-
-	if (dev->type != ARPHRD_ETHER && skb->mac_len) {
-		skb->protocol = htons(ETH_P_TEB);
-	} else if (dev->type == ARPHRD_ETHER && !skb->mac_len) {
-		kfree_skb(skb);
-		return -EINVAL;
-	}
-
-	return dev_queue_xmit(skb);
-}
-EXPORT_SYMBOL_GPL(ovs_netdev_send);
-
 /* Returns null if this device is not attached to a datapath. */
 struct vport *ovs_netdev_get_vport(struct net_device *dev)
 {
@@ -226,7 +211,7 @@ static struct vport_ops ovs_netdev_vport_ops = {
 	.type		= OVS_VPORT_TYPE_NETDEV,
 	.create		= netdev_create,
 	.destroy	= netdev_destroy,
-	.send		= ovs_netdev_send,
+	.send		= dev_queue_xmit,
 };
 
 int __init ovs_netdev_init(void)
diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c
index 13f11ad7e35a..5eb7694348b5 100644
--- a/net/openvswitch/vport-vxlan.c
+++ b/net/openvswitch/vport-vxlan.c
@@ -153,7 +153,7 @@ static struct vport_ops ovs_vxlan_netdev_vport_ops = {
 	.create			= vxlan_create,
 	.destroy		= ovs_netdev_tunnel_destroy,
 	.get_options		= vxlan_get_options,
-	.send			= ovs_netdev_send,
+	.send			= dev_queue_xmit,
 };
 
 static int __init ovs_vxlan_tnl_init(void)
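
The label-stack walk in the key_extract() hunk above can be modelled in user space as below. mpls_network_offset() is a hypothetical helper over a plain byte array; passing l2_len = 0 corresponds to the is_layer3 case, where the diff uses a mac_len of 0. MPLS_LS_S_MASK has the value from include/uapi/linux/mpls.h, applied here to a word already converted to host order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MPLS_HLEN       4            /* one MPLS label stack entry */
#define MPLS_LS_S_MASK  0x00000100u  /* bottom-of-stack bit (host order) */

/* Hypothetical helper: walk the label stack that starts at pkt[l2_len] and
 * return the offset just past the bottom-of-stack entry (where the diff sets
 * network_header), or 0 if the packet is truncated. */
static size_t mpls_network_offset(const uint8_t *pkt, size_t pkt_len,
				  size_t l2_len)
{
	size_t stack_len = 0;

	assert(pkt);
	for (;;) {
		const uint8_t *p;
		uint32_t lse;

		/* Stand-in for check_header() failing on a short packet. */
		if (l2_len + stack_len + MPLS_HLEN > pkt_len)
			return 0;
		p = pkt + l2_len + stack_len;
		/* Label stack entries are big-endian on the wire. */
		lse = (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
		      (uint32_t)p[2] << 8 | (uint32_t)p[3];
		stack_len += MPLS_HLEN;
		if (lse & MPLS_LS_S_MASK)
			return l2_len + stack_len;
	}
}
```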

* Re: [ovs-dev] [PATCH net-next v11 5/6] openvswitch: add layer 3 flow/port support
  2016-08-08 15:17                         ` [ovs-dev] " Simon Horman
@ 2016-08-08 15:28                           ` Jiri Benc
  2016-08-10 10:16                             ` Simon Horman
       [not found]                           ` <20160808151716.GA8477-ucRxlxcrRFEsysjaEhV7d2ey4e3TpSOZIxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
  1 sibling, 1 reply; 36+ messages in thread
From: Jiri Benc @ 2016-08-08 15:28 UTC (permalink / raw)
  To: Simon Horman; +Cc: pravin shelar, Linux Kernel Network Developers, ovs dev

On Mon, 8 Aug 2016 17:17:17 +0200, Simon Horman wrote:
> +bool skb_mac_header_present(struct sk_buff *skb)
> +{
> +	return skb->dev->type == ARPHRD_ETHER ||
> +		(skb->dev->type == ARPHRD_NONE &&
> +		 skb->protocol == htons(ETH_P_TEB));
> +}
> +EXPORT_SYMBOL(skb_mac_header_present);

I'd suggest a different name, this looks like it has something to do
with skb->mac_header, which it doesn't. skb_eth_header_present, perhaps?

 Jiri

* Re: [PATCH net-next v11 5/6] openvswitch: add layer 3 flow/port support
       [not found]                           ` <20160808151716.GA8477-ucRxlxcrRFEsysjaEhV7d2ey4e3TpSOZIxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
@ 2016-08-09 15:47                             ` pravin shelar
       [not found]                               ` <CAOrHB_BYtGsWPSs2pxTjPajqFEP=5YySmqjc93NbdtY96-dYfw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 36+ messages in thread
From: pravin shelar @ 2016-08-09 15:47 UTC (permalink / raw)
  To: Simon Horman; +Cc: ovs dev, Linux Kernel Network Developers, Jiri Benc

On Mon, Aug 8, 2016 at 8:17 AM, Simon Horman <simon.horman@netronome.com> wrote:
> On Wed, Jul 20, 2016 at 11:06:37AM -0700, pravin shelar wrote:
>> On Tue, Jul 19, 2016 at 5:02 PM, Simon Horman
>> <simon.horman@netronome.com> wrote:
>> > On Mon, Jul 18, 2016 at 03:34:52PM -0700, pravin shelar wrote:
>> >> On Sun, Jul 17, 2016 at 9:50 PM, Simon Horman
>> >> <simon.horman@netronome.com> wrote:
>> >> > [CC Jiri Benc for portion regarding GRE]
>> >> >
>> >> > Hi Pravin,
>> >> >
>> >> > On Fri, Jul 15, 2016 at 02:07:37PM -0700, pravin shelar wrote:
>> >> >> On Wed, Jul 13, 2016 at 12:31 AM, Simon Horman
>> >> >> <simon.horman@netronome.com> wrote:
>> >> >> > Hi Pravin,
>> >> >> >
>> >> >> > On Thu, Jul 07, 2016 at 01:54:15PM -0700, pravin shelar wrote:
>> >> >> >> On Wed, Jul 6, 2016 at 10:59 AM, Simon Horman
>> >> >> >> <simon.horman@netronome.com> wrote:
>> >> >> >
>> >> >> > ...
>> >> >>
>> >> >> >
>> >> >> >> > diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
>> >> >> >> > index 0ea128eeeab2..86f2cfb19de3 100644
>> >> >> >> > --- a/net/openvswitch/flow.c
>> >> >> >> > +++ b/net/openvswitch/flow.c
>> >> >> >> ...
>> >> >> >>
>> >> >> >> > @@ -723,9 +729,17 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
>> >> >> >> >         key->phy.skb_mark = skb->mark;
>> >> >> >> >         ovs_ct_fill_key(skb, key);
>> >> >> >> >         key->ovs_flow_hash = 0;
>> >> >> >> > +       key->phy.is_layer3 = skb->mac_len == 0;
>> >> >> >>
>> >> >> >> I do not think mac_len can be used. mac_header needs to be checked.
>> >> >> >> ...
>> >> >> >
>> >> >> > Yes, indeed. The update to use skb_mac_header_was_set() here accidentally
>> >> >> > slipped into the following patch, sorry about that.
>> >> >> >
>> >> >> > With that change in place I believe that this patch is internally
>> >> >> > consistent because mac_header and mac_len are set correctly by the
>> >> >> > call to key_extract() which is called by ovs_flow_key_extract() just
>> >> >> > after where the excerpt above ends.
>> >> >> >
>> >> >> > That said, I do think that it is possible to rely on skb_mac_header_was_set
>> >> >> > throughout the datapath, including action processing etc... I have provided
>> >> >> > an incremental patch - which I created on top of this entire series - at
>> >> >> > the end of this email. If you prefer that approach I am happy to take it,
>> >> >> > though I do feel that using mac_len leads to slightly cleaner code. Let me
>> >> >> > know what you think.
>> >> >> >
>> >> >>
>> >> >>
> >> >> >> I am not sure if you can use only mac_len to detect an L3 packet. This
> >> >> >> does not work with MPLS packets: mac_len is used to account for MPLS
> >> >> >> headers pushed on the skb. Therefore, in the case of an MPLS header on
> >> >> >> an L3 packet, mac_len would be non-zero and we have to look at either
> >> >> >> mac_header or some other metadata, like the is_layer3 flag from the
> >> >> >> key, to check for an L3 packet.
>> >> >
> >> >> > At least within OvS, mac_len does not include the length of the MPLS label
> >> >> > stack. Rather, the MPLS label stack length is the difference between
> >> >> > network_header and (mac_header + mac_len).
> >> >> >
> >> >> > So I think that the scheme does work, as mac_len is 0 if there is no L2
> >> >> > header, regardless of whether an MPLS label stack is present.
>> >> >
>> >>
> >> >> I was thinking of the overall networking stack rather than just the ovs
> >> >> datapath. I think we should have a consistent method of detecting an L3
> >> >> packet. As commented in my previous mail, it could be achieved using
> >> >> skb->protocol and the device type.
>> >
> >> > This is somewhat of a surprise to me. As far as I recall, when MPLS support
> >> > was added to OvS, it and the accompanying support for MPLS GSO were the only
> >> > MPLS support present in the kernel. And at the time, the scheme developed by
> >> > Jesse Gross, myself and others was as I describe above.
>> >
>> > Internally OvS relies on this scheme and in particular it is used
>> > by skb_mpls_header() to calculate the beginning of the MPLS label stack
>> > accurately in the presence of VLAN tags.
>> >
>> > Is it mpls_gso_segment() that you are concerned about?
>> > If so, perhaps the problem could be addressed there.
>>
>> Yes.
>> Can you read the comment I made in my previous mail in the context of
>> function skb_mpls_header()? I have given the rationale for the requested
>> change.
>
> Hi Pravin,
>
> I have made an attempt to implement your suggestion to the extent that
> I understand it. The following is an incremental change on top
> of this patch-set. Does it move things closer to what you have in mind?
>
The following approach looks good to me. I have posted a couple of comments.

> Light testing seems to indicate that it works for GSO skbs
> received over both L3 and L2 GRE tunnels by OvS with both
> IP-in-MPLS and IP (without MPLS) payloads.
>

Thanks for testing it. Can you also add those tests to OVS kmod test suite?
..

> diff --git a/net/mpls/mpls_gso.c b/net/mpls/mpls_gso.c
> index 2055e57ed1c3..113cba89653d 100644
> --- a/net/mpls/mpls_gso.c
> +++ b/net/mpls/mpls_gso.c
> @@ -39,16 +39,18 @@ static struct sk_buff *mpls_gso_segment(struct sk_buff *skb,
>         mpls_features = skb->dev->mpls_features & features;
>         segs = skb_mac_gso_segment(skb, mpls_features);
>
> -
> -       /* Restore outer protocol. */
> -       skb->protocol = mpls_protocol;
> -
>         /* Re-pull the mac header that the call to skb_mac_gso_segment()
>          * above pulled.  It will be re-pushed after returning
>          * skb_mac_gso_segment(), an indirect caller of this function.
>          */
>         __skb_pull(skb, skb->data - skb_mac_header(skb));
>
> +       /* Restore outer protocol. */
> +       skb->protocol = mpls_protocol;
> +       if (!IS_ERR(segs))
> +               for (skb = segs; skb; skb = skb->next)
> +                       skb->protocol = mpls_protocol;
> +

skb_mac_gso_segment() can also return NULL. Therefore segs should be
checked for NULL case.

> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index 0001f651c934..424164222f1e 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c

...
> @@ -308,8 +319,8 @@ static int pop_eth(struct sk_buff *skb, struct sw_flow_key *key)
>  {
>         skb_pull_rcsum(skb, ETH_HLEN);
>         skb_reset_mac_header(skb);
> -       skb->mac_len -= ETH_HLEN;
>

I am not sure why this line is removed.
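For reference, the OvS-internal offset convention Simon describes earlier in this thread (as I read it: mac_len covers the Ethernet header plus any VLAN tags but not the MPLS label stack, which lies between mac_header + mac_len and network_header) can be modelled in userspace roughly as below. This is a hypothetical sketch: struct model_skb and the model_* helpers are illustrative names, not kernel APIs, and as Pravin points out the convention is not necessarily shared by the rest of the networking stack (e.g. mpls_gso_segment()).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the sk_buff offsets discussed in this
 * thread; this is NOT the kernel's struct sk_buff. */
struct model_skb {
	size_t mac_header;     /* offset of the L2 header, if any */
	size_t mac_len;        /* L2 header incl. VLAN tags, excl. MPLS labels */
	size_t network_header; /* offset of the L3 (e.g. IP) header */
};

/* The MPLS label stack sits between the end of the L2 header and the
 * network header, so its length is the gap between the two. */
static size_t model_mpls_stack_len(const struct model_skb *skb)
{
	return skb->network_header - (skb->mac_header + skb->mac_len);
}

/* Offset of the first label stack entry (cf. skb_mpls_header() as
 * described in the thread); VLAN tags are skipped via mac_len. */
static size_t model_mpls_header(const struct model_skb *skb)
{
	return skb->mac_header + skb->mac_len;
}

/* Under this convention mac_len is 0 when there is no L2 header,
 * whether or not an MPLS label stack is present. */
static bool model_is_layer3(const struct model_skb *skb)
{
	return skb->mac_len == 0;
}
```

For example, an Ethernet frame with one VLAN tag and two labels has mac_len 18 and a label stack of 8 bytes starting at offset 18, while an L3 packet carrying one label has mac_len 0 and a 4-byte stack.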
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next v11 5/6] openvswitch: add layer 3 flow/port support
  2016-08-08 15:28                           ` Jiri Benc
@ 2016-08-10 10:16                             ` Simon Horman
  0 siblings, 0 replies; 36+ messages in thread
From: Simon Horman @ 2016-08-10 10:16 UTC (permalink / raw)
  To: Jiri Benc; +Cc: ovs dev, Linux Kernel Network Developers

On Mon, Aug 08, 2016 at 05:28:39PM +0200, Jiri Benc wrote:
> On Mon, 8 Aug 2016 17:17:17 +0200, Simon Horman wrote:
> > +bool skb_mac_header_present(struct sk_buff *skb)
> > +{
> > +	return skb->dev->type == ARPHRD_ETHER ||
> > +		(skb->dev->type == ARPHRD_NONE &&
> > +		 skb->protocol == htons(ETH_P_TEB));
> > +}
> > +EXPORT_SYMBOL(skb_mac_header_present);
> 
> I'd suggest a different name, this looks like it has something to do
> with skb->mac_header, which it doesn't. skb_eth_header_present, perhaps?

I struggled to come up with a reasonable name and I do like your
suggestion better than mine. I'll update the code as you suggest unless a
better name emerges.
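The check quoted above, under the skb_eth_header_present() name Jiri suggests, can be exercised outside the kernel with a small model. The structs below are hypothetical stand-ins for struct net_device and struct sk_buff, and the MODEL_* constants copy the values of ARPHRD_ETHER, ARPHRD_NONE and ETH_P_TEB from the Linux uapi headers; this is a sketch for illustration, not the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <arpa/inet.h> /* htons() */

/* Same values as ARPHRD_ETHER, ARPHRD_NONE and ETH_P_TEB in Linux. */
#define MODEL_ARPHRD_ETHER 1
#define MODEL_ARPHRD_NONE  0xFFFE
#define MODEL_ETH_P_TEB    0x6558

/* Hypothetical stand-ins for the two kernel structures involved. */
struct model_net_device {
	unsigned short type;      /* ARPHRD_* device type */
};

struct model_skb {
	struct model_net_device *dev;
	uint16_t protocol;        /* network byte order, like skb->protocol */
};

/* True if the skb is expected to carry an Ethernet header, judged from
 * the device type and skb->protocol rather than from skb->mac_header:
 * either an Ethernet device, or a header-less (ARPHRD_NONE) device
 * carrying transparent Ethernet bridging (TEB) frames. */
static bool model_skb_eth_header_present(const struct model_skb *skb)
{
	return skb->dev->type == MODEL_ARPHRD_ETHER ||
	       (skb->dev->type == MODEL_ARPHRD_NONE &&
		skb->protocol == htons(MODEL_ETH_P_TEB));
}
```

An IPv4 packet on an ARPHRD_NONE device is then classified as having no Ethernet header, while a TEB frame on the same device is classified as having one.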

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next v11 5/6] openvswitch: add layer 3 flow/port support
       [not found]                               ` <CAOrHB_BYtGsWPSs2pxTjPajqFEP=5YySmqjc93NbdtY96-dYfw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-08-10 10:20                                 ` Simon Horman
       [not found]                                   ` <20160810102043.GE5451-ucRxlxcrRFEsysjaEhV7d2ey4e3TpSOZIxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
  0 siblings, 1 reply; 36+ messages in thread
From: Simon Horman @ 2016-08-10 10:20 UTC (permalink / raw)
  To: pravin shelar; +Cc: ovs dev, Linux Kernel Network Developers, Jiri Benc

On Tue, Aug 09, 2016 at 08:47:40AM -0700, pravin shelar wrote:
> On Mon, Aug 8, 2016 at 8:17 AM, Simon Horman <simon.horman@netronome.com> wrote:

...

> > Hi Pravin,
> >
> > I have made an attempt to implement your suggestion to the extent that
> > I understand it. The following is an incremental change on top
> > of this patch-set. Does it move things closer to what you have in mind?
> >
> The following approach looks good to me. I have posted a couple of comments.

Thanks, I am rather glad to hear that.

> > Light testing seems to indicate that it works for GSO skbs
> > received over both L3 and L2 GRE tunnels by OvS with both
> > IP-in-MPLS and IP (without MPLS) payloads.
> >
> 
> Thanks for testing it. Can you also add those tests to OVS kmod test suite?
> ..

Sure, I will look into doing that.
Am I correct in thinking Joe Stringer is the best person to contact if
I run into trouble there?

> > diff --git a/net/mpls/mpls_gso.c b/net/mpls/mpls_gso.c
> > index 2055e57ed1c3..113cba89653d 100644
> > --- a/net/mpls/mpls_gso.c
> > +++ b/net/mpls/mpls_gso.c
> > @@ -39,16 +39,18 @@ static struct sk_buff *mpls_gso_segment(struct sk_buff *skb,
> >         mpls_features = skb->dev->mpls_features & features;
> >         segs = skb_mac_gso_segment(skb, mpls_features);
> >
> > -
> > -       /* Restore outer protocol. */
> > -       skb->protocol = mpls_protocol;
> > -
> >         /* Re-pull the mac header that the call to skb_mac_gso_segment()
> >          * above pulled.  It will be re-pushed after returning
> >          * skb_mac_gso_segment(), an indirect caller of this function.
> >          */
> >         __skb_pull(skb, skb->data - skb_mac_header(skb));
> >
> > +       /* Restore outer protocol. */
> > +       skb->protocol = mpls_protocol;
> > +       if (!IS_ERR(segs))
> > +               for (skb = segs; skb; skb = skb->next)
> > +                       skb->protocol = mpls_protocol;
> > +
> 
> skb_mac_gso_segment() can also return NULL. Therefore segs should be
> checked for NULL case.

Sure, I will fix that.
I think that can be trivially resolved using IS_ERR_OR_NULL().
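A userspace sketch of that fix: restore the outer protocol on the original skb, and walk the segment list only when skb_mac_gso_segment() returned neither NULL nor an error pointer. The model_* names are hypothetical, and model_is_err_or_null() / model_err_ptr() are simplified stand-ins for the kernel's IS_ERR_OR_NULL() / ERR_PTR() from include/linux/err.h.

```c
#include <assert.h>
#include <errno.h>   /* error codes such as ENOMEM for model_err_ptr() */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for ERR_PTR() and IS_ERR_OR_NULL(): the top
 * MODEL_MAX_ERRNO addresses encode negative errno values. */
#define MODEL_MAX_ERRNO 4095

static inline void *model_err_ptr(long error)
{
	return (void *)error;
}

static inline bool model_is_err_or_null(const void *ptr)
{
	return !ptr || (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Hypothetical model of the two sk_buff fields the loop touches. */
struct model_skb {
	struct model_skb *next;
	uint16_t protocol;
};

/* Restore the outer protocol on skb and, if segmentation produced a
 * list, on every segment. NULL or error segs leave the list alone. */
static void model_restore_protocol(struct model_skb *skb,
				   struct model_skb *segs,
				   uint16_t mpls_protocol)
{
	skb->protocol = mpls_protocol;
	if (model_is_err_or_null(segs))
		return;
	for (struct model_skb *seg = segs; seg; seg = seg->next)
		seg->protocol = mpls_protocol;
}
```

The sketch uses a separate cursor (seg) rather than reusing skb as the loop variable, as the incremental patch does, so skb stays valid after the loop; whether that matters depends on what follows in mpls_gso_segment().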

> > diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> > index 0001f651c934..424164222f1e 100644
> > --- a/net/openvswitch/actions.c
> > +++ b/net/openvswitch/actions.c
> 
> ...
> > @@ -308,8 +319,8 @@ static int pop_eth(struct sk_buff *skb, struct sw_flow_key *key)
> >  {
> >         skb_pull_rcsum(skb, ETH_HLEN);
> >         skb_reset_mac_header(skb);
> > -       skb->mac_len -= ETH_HLEN;
> >
> 
> I am not sure why this line is removed.

I will restore it.
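Why the removed line matters can be seen in a small model of pop_eth(): after the Ethernet header is pulled and the MAC header reset, mac_len has to drop by ETH_HLEN so that mac_len == 0 again means "no L2 header" and mac_header + mac_len still marks the start of any MPLS label stack. This is a hypothetical sketch that tracks offsets instead of pointers and assumes any VLAN tags were popped beforehand; it is not the OvS code.

```c
#include <assert.h>
#include <stddef.h>

/* Same value as the kernel's ETH_HLEN. */
#define MODEL_ETH_HLEN 14

/* Hypothetical model of the sk_buff fields pop_eth() touches; all
 * fields are offsets from the start of the packet buffer. */
struct model_skb {
	size_t data;           /* first byte of packet data */
	size_t mac_header;     /* start of the L2 header */
	size_t mac_len;        /* L2 header length, excluding MPLS labels */
	size_t network_header; /* start of the L3 header */
};

/* Mirrors the three steps of the pop_eth() hunk quoted above. */
static void model_pop_eth(struct model_skb *skb)
{
	/* Assumes a bare Ethernet header: VLAN tags already popped. */
	assert(skb->mac_len == MODEL_ETH_HLEN);
	skb->data += MODEL_ETH_HLEN;    /* skb_pull_rcsum(skb, ETH_HLEN) */
	skb->mac_header = skb->data;    /* skb_reset_mac_header(skb) */
	skb->mac_len -= MODEL_ETH_HLEN; /* the line being restored */
}

/* MPLS label stack length under the OvS offset convention. */
static size_t model_mpls_stack_len(const struct model_skb *skb)
{
	return skb->network_header - (skb->mac_header + skb->mac_len);
}
```

With one label after the Ethernet header, the stack length is 4 both before and after the pop; without the mac_len adjustment, mac_header + mac_len would point past network_header after the pop.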

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next v11 5/6] openvswitch: add layer 3 flow/port support
       [not found]                                   ` <20160810102043.GE5451-ucRxlxcrRFEsysjaEhV7d2ey4e3TpSOZIxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
@ 2016-08-10 17:17                                     ` Joe Stringer
  2016-08-22 11:04                                       ` [ovs-dev] " Simon Horman
  0 siblings, 1 reply; 36+ messages in thread
From: Joe Stringer @ 2016-08-10 17:17 UTC (permalink / raw)
  To: Simon Horman; +Cc: ovs dev, Linux Kernel Network Developers, Jiri Benc

On 10 August 2016 at 03:20, Simon Horman <simon.horman@netronome.com> wrote:
> On Tue, Aug 09, 2016 at 08:47:40AM -0700, pravin shelar wrote:
>> On Mon, Aug 8, 2016 at 8:17 AM, Simon Horman <simon.horman@netronome.com> wrote:
>> > Light testing seems to indicate that it works for GSO skbs
>> > received over both L3 and L2 GRE tunnels by OvS with both
>> > IP-in-MPLS and IP (without MPLS) payloads.
>> >
>>
>> Thanks for testing it. Can you also add those tests to OVS kmod test suite?
>> ..
>
> Sure, I will look into doing that.
> Am I correct in thinking Joe Stringer is the best person to contact if
> I run into trouble there?

Sure. The basics of running the tests are documented here:
https://github.com/openvswitch/ovs/blob/master/INSTALL.md#datapath-testing

You should be able to get a good feel for how to add tests by perusing
the commits to tests/system-{traffic,kmod-macros}.at in the OVS source
tree.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [ovs-dev] [PATCH net-next v11 5/6] openvswitch: add layer 3 flow/port support
  2016-08-10 17:17                                     ` Joe Stringer
@ 2016-08-22 11:04                                       ` Simon Horman
       [not found]                                         ` <20160822110444.GA29971-ucRxlxcrRFEsysjaEhV7d2ey4e3TpSOZIxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
  0 siblings, 1 reply; 36+ messages in thread
From: Simon Horman @ 2016-08-22 11:04 UTC (permalink / raw)
  To: Joe Stringer
  Cc: pravin shelar, ovs dev, Linux Kernel Network Developers, Jiri Benc

On Wed, Aug 10, 2016 at 10:17:30AM -0700, Joe Stringer wrote:
> On 10 August 2016 at 03:20, Simon Horman <simon.horman@netronome.com> wrote:
> > On Tue, Aug 09, 2016 at 08:47:40AM -0700, pravin shelar wrote:
> >> On Mon, Aug 8, 2016 at 8:17 AM, Simon Horman <simon.horman@netronome.com> wrote:
> >> > Light testing seems to indicate that it works for GSO skbs
> >> > received over both L3 and L2 GRE tunnels by OvS with both
> >> > IP-in-MPLS and IP (without MPLS) payloads.
> >> >
> >>
> >> Thanks for testing it. Can you also add those tests to OVS kmod test suite?
> >> ..
> >
> > Sure, I will look into doing that.
> > Am I correct in thinking Joe Stringer is the best person to contact if
> > I run into trouble there?
> 
> Sure. The basics of running the tests are documented here:
> https://github.com/openvswitch/ovs/blob/master/INSTALL.md#datapath-testing
> 
> You should be able to get a good feel for how to add tests by perusing
> the commits to tests/system-{traffic,kmod-macros}.at in the OVS source
> tree.

Thanks Joe,

it took me a while but I think that I have something working
against the head branch of the OVS tree. I'd value opinions
on the direction I have taken.

Subject: [PATCH] system-traffic: Exercise GSO

Exercise GSO for: unencapsulated; MPLS; GRE; and MPLS in GRE.

There is scope to extend this testing to other encapsulation formats
if desired.

This is motivated by a desire to test GRE and MPLS encapsulation in
the context of L3/VPN (MPLS over non-TEB GRE work). That is not
tested here but tests for those cases would ideally be based on those in
this patch.

---
 tests/system-common-macros.at |  36 +++++--
 tests/system-kmod-macros.at   |  22 +++++
 tests/system-traffic.at       | 225 +++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 274 insertions(+), 9 deletions(-)

diff --git a/tests/system-common-macros.at b/tests/system-common-macros.at
index 4ffc3822a4d3..a201cf8ce100 100644
--- a/tests/system-common-macros.at
+++ b/tests/system-common-macros.at
@@ -56,7 +56,7 @@ m4_define([ADD_INT],
     ]
 )
 
-# ADD_VETH([port], [namespace], [ovs-br], [ip_addr] [mac_addr [gateway]])
+# ADD_VETH([port], [namespace], [ovs-br], [ip_addr] [mac_addr [gateway [ofport]]])
 #
 # Add a pair of veth ports. 'port' will be added to name space 'namespace',
 # and "ovs-'port'" will be added to ovs bridge 'ovs-br'.
@@ -64,8 +64,8 @@ m4_define([ADD_INT],
 # The 'port' in 'namespace' will be brought up with static IP address
 # with 'ip_addr' in CIDR notation.
 #
-# Optionally, one can specify the 'mac_addr' for 'port' and the default
-# 'gateway'.
+# Optionally, one can specify the 'mac_addr' for 'port', the default
+# 'gateway' and the 'ofport' number.
 #
 # The existing 'port' or 'ovs-port' will be removed before new ones are added.
 #
@@ -74,8 +74,14 @@ m4_define([ADD_VETH],
       CONFIGURE_VETH_OFFLOADS([$1])
       AT_CHECK([ip link set $1 netns $2])
       AT_CHECK([ip link set dev ovs-$1 up])
-      AT_CHECK([ovs-vsctl add-port $3 ovs-$1 -- \
-                set interface ovs-$1 external-ids:iface-id="$1"])
+      if test -n "$7"; then
+        AT_CHECK([ovs-vsctl add-port $3 ovs-$1 -- \
+                  set interface ovs-$1 external-ids:iface-id="$1" \
+                  ofport_request=$7])
+      else
+        AT_CHECK([ovs-vsctl add-port $3 ovs-$1 -- \
+                  set interface ovs-$1 external-ids:iface-id="$1"])
+      fi
       NS_CHECK_EXEC([$2], [ip addr add $4 dev $1])
       NS_CHECK_EXEC([$2], [ip link set dev $1 up])
       if test -n "$5"; then
@@ -99,7 +105,7 @@ m4_define([ADD_VLAN],
     ]
 )
 
-# ADD_OVS_TUNNEL([type], [bridge], [port], [remote-addr], [overlay-addr])
+# ADD_OVS_TUNNEL([type], [bridge], [port], [remote-addr], [overlay-addr [ofport]])
 #
 # Add an ovs-based tunnel device in the root namespace, with name 'port' and
 # type 'type'. The tunnel device will be configured as point-to-point with the
@@ -107,9 +113,17 @@ m4_define([ADD_VLAN],
 #
 # 'port will be configured with the address 'overlay-addr'.
 #
+# Optionally, one can specify the 'ofport' number.
+#
 m4_define([ADD_OVS_TUNNEL],
-   [AT_CHECK([ovs-vsctl add-port $2 $3 -- \
-              set int $3 type=$1 options:remote_ip=$4])
+   [if test -n "$6"; then
+      AT_CHECK([ovs-vsctl add-port $2 $3 -- \
+                set int $3 type=$1 options:remote_ip=$4 \
+                ofport_request=$6])
+    else
+      AT_CHECK([ovs-vsctl add-port $2 $3 -- \
+                set int $3 type=$1 options:remote_ip=$4])
+    fi
     AT_CHECK([ip addr add dev $2 $5])
     AT_CHECK([ip link set dev $2 up])
     AT_CHECK([ip link set dev $2 mtu 1450])
@@ -143,6 +157,12 @@ m4_define([ADD_NATIVE_TUNNEL],
 #
 m4_define([FORMAT_PING], [grep "transmitted" | sed 's/time.*ms$/time 0ms/'])
 
+# FORMAT_DD([])
+#
+# Strip variant pieces from dd output so the output can be reliably compared.
+#
+m4_define([FORMAT_DD], [sed 's/copied,.*$/copied, .../'])
+
 # FORMAT_CT([ip-addr])
 #
 # Strip content from the piped input which would differ from test to test
diff --git a/tests/system-kmod-macros.at b/tests/system-kmod-macros.at
index e1b5707925a5..c71186630e99 100644
--- a/tests/system-kmod-macros.at
+++ b/tests/system-kmod-macros.at
@@ -95,3 +95,25 @@ m4_define([CHECK_CONNTRACK_LOCAL_STACK])
 # always supports NAT, so no check is needed.
 #
 m4_define([CHECK_CONNTRACK_NAT])
+
+# CHECK_MPLS()
+#
+# Perform requirements checks for running MPLS tests.
+#
+m4_define([CHECK_MPLS],
+    [AT_SKIP_IF([test $HAVE_PYTHON = no])
+     m4_foreach([mod], [[mpls_router]],
+                [modprobe mod || echo "Module mod not loaded."
+                 on_exit 'modprobe -r mod'
+                ])
+     # Requires Linux v4.7+ and ip route v4.7+
+     AT_CHECK([# Prepare
+               echo 101 > /proc/sys/net/mpls/platform_labels || exit 77 # skip
+               ip -f mpls route del 100
+               # Test
+               ip -f mpls route add 100 dev lo && \
+               # Cleanup \
+               ip -f mpls route del 100 && \
+               echo 0 > /proc/sys/net/mpls/platform_labels || exit 77 #skip])
+    ]
+)
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 2f42efaeacbc..f14a9ff47a5b 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -33,12 +33,27 @@ ADD_NAMESPACES(at_ns0, at_ns1)
 ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
 ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
 
+dnl Disable TSO so as to exercise software segmentation
+dnl when outputting GSO skbs from OvS
+AT_CHECK([ethtool -K ovs-p0 tso off])
+
 NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl
 3 packets transmitted, 3 received, 0% packet loss, time 0ms
 ])
 
+AT_CHECK([yes | dd bs=1k count=32 of=32k.txt], [0], [], stderr)
+AT_CHECK([cat stderr | FORMAT_DD], [0], [dnl
+32+0 records in
+32+0 records out
+32768 bytes (33 kB) copied, ...
+])
 NETNS_DAEMONIZE([at_ns1], [[$PYTHON $srcdir/test-l7.py]], [http0.pid])
-NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log])
+NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2/32k.txt -t 3 -T 1 --retry-connrefused -v -o wget0.log])
+
+dnl Use the absence of retransmitted segments as a proxy for functioning GSO
+NS_CHECK_EXEC([at_ns1], [netstat -s | grep retransmited], [0], [dnl
+    0 segments retransmited
+])
 
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
@@ -69,6 +84,69 @@ NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.2.2.2 | FORMAT_PING
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([datapath - http over mpls between two ports])
+CHECK_MPLS()
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+dnl Set up underlay
+ADD_VETH(p0, at_ns0, br0, "172.31.1.1/24", [], [], 2)
+ADD_VETH(p1, at_ns1, br0, "172.31.1.2/24", [], [], 3)
+
+dnl Set up MPLS overlay
+dnl IP is encapsulated in MPLS when sent from and received by ns0.
+dnl OvS, sitting between ns0 and ns1, pushes MPLS onto IP received from ns1
+dnl before sending to ns0, and pops MPLS received from ns0 and sends the
+dnl resulting IP packets to ns1.
+NS_CHECK_EXEC([at_ns1], [ip addr add 10.1.1.2/24 dev p1])
+
+dnl push MPLS LSE on packets from p1 (ns1) to p0 (ns0)
+AT_CHECK([ovs-ofctl add-flow br0 "in_port=3,ip,nw_src=10.1.1.2,actions=push_mpls:0x8847,set_field:101->mpls_label,output:2"])
+
+dnl pop MPLS LSE from packets from p0 (ns0) to p1 (ns1)
+AT_CHECK([ovs-ofctl add-flow br0 "in_port=2,dl_type=0x8847,mpls_label=100,actions=pop_mpls:0x0800,output:3"])
+
+dnl Default to normal rule
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+
+dnl Allow MPLS forwarding of packets received on p0 in ns0
+NS_CHECK_EXEC([at_ns0], [echo 1 > /proc/sys/net/mpls/conf/p0/input])
+dnl Must be larger than the MPLS label to be routed by ns0 (101)
+NS_CHECK_EXEC([at_ns0], [echo 102 > /proc/sys/net/mpls/platform_labels])
+
+dnl Set up route to encapsulate 10.1.1.0/24 packets in MPLS
+NS_CHECK_EXEC([at_ns0], [ip route add 10.1.1.0/24 encap mpls 100 via inet 172.31.1.2 dev p0])
+
+dnl Set up route to decapsulate MPLS label 101 and deliver locally
+NS_CHECK_EXEC([at_ns0], [ip -f mpls route add 101 dev lo])
+
+dnl Set loopback interface up for local delivery
+NS_CHECK_EXEC([at_ns0], [ip link set up dev lo])
+
+dnl Test ping
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Test http
+AT_CHECK([yes | dd bs=1k count=32 of=32k.txt], [0], [], stderr)
+AT_CHECK([cat stderr | FORMAT_DD], [0], [dnl
+32+0 records in
+32+0 records out
+32768 bytes (33 kB) copied, ...
+])
+NETNS_DAEMONIZE([at_ns1], [[$PYTHON $srcdir/test-l7.py]], [http0.pid])
+NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2/32k.txt -t 3 -T 1 --retry-connrefused -v -o wget0.log])
+
+dnl Use the absence of retransmitted segments as a proxy for functioning GSO
+NS_CHECK_EXEC([at_ns1], [netstat -s | grep retransmited], [0], [dnl
+    0 segments retransmited
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_SETUP([datapath - ping6 between two ports])
 OVS_TRAFFIC_VSWITCHD_START()
 
@@ -206,6 +284,151 @@ NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PI
 3 packets transmitted, 3 received, 0% packet loss, time 0ms
 ])
 
+AT_CHECK([yes | dd bs=1k count=32 of=32k.txt], [0], [], stderr)
+AT_CHECK([cat stderr | FORMAT_DD], [0], [dnl
+32+0 records in
+32+0 records out
+32768 bytes (33 kB) copied, ...
+])
+NETNS_DAEMONIZE([at_ns1], [[$PYTHON $srcdir/test-l7.py]], [http0.pid])
+NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2/32k.txt -t 3 -T 1 --retry-connrefused -v -o wget0.log])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - http over gre tunnel])
+OVS_CHECK_GRE()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+dnl Set up underlay link from host into the at_ns0 namespace using veth pair.
+ADD_VETH(p0, at_ns0, br-underlay, "172.31.1.1/24")
+AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the at_ns0 namespace and
+dnl with a native linux device inside the namespace.
+ADD_OVS_TUNNEL([gre], [br0], [at_gre0], [172.31.1.1], [10.1.1.100/24])
+ADD_NATIVE_TUNNEL([gretap], [ns_gre0], [at_ns0], [172.31.1.100], [10.1.1.1/24])
+
+dnl Add veth pair connected to br0 and the at_ns1 namespace
+ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
+
+dnl Disable TSO so as to exercise software segmentation
+dnl when outputting GSO skbs over GRE from OvS
+AT_CHECK([ethtool -K ovs-p0 tso off])
+
+dnl First check the underlay
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Next check the overlay
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2  | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Now check HTTP transfer
+AT_CHECK([yes | dd bs=1k count=32 of=32k.txt], [0], [], stderr)
+AT_CHECK([cat stderr | FORMAT_DD], [0], [dnl
+32+0 records in
+32+0 records out
+32768 bytes (33 kB) copied, ...
+])
+NETNS_DAEMONIZE([at_ns1], [[$PYTHON $srcdir/test-l7.py]], [http0.pid])
+NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2/32k.txt -t 3 -T 1 --retry-connrefused -v -o wget0.log])
+
+dnl Use the absence of retransmitted segments as a proxy for functioning GSO
+NS_CHECK_EXEC([at_ns1], [netstat -s | grep retransmited], [0], [dnl
+    0 segments retransmited
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - http over mpls over gre tunnel])
+OVS_CHECK_GRE()
+OVS_CHECK_MPLS()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+dnl Set up underlay link from host into the at_ns0 namespace using veth pair.
+ADD_VETH(p0, at_ns0, br-underlay, "172.31.1.1/24")
+AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the at_ns0 namespace and
+dnl with a native linux device inside the namespace.
+ADD_OVS_TUNNEL([gre], [br0], [at_gre0], [172.31.1.1], [10.1.1.100/24], 2)
+ADD_NATIVE_TUNNEL([gretap], [ns_gre0], [at_ns0], [172.31.1.100], [10.1.1.1/24])
+
+dnl Add veth pair connected to br0 and the at_ns1 namespace
+ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", [], [], 3)
+
+dnl Set up MPLS overlay
+dnl IP is encapsulated in MPLS when sent from and received by ns0.
+dnl OvS, sitting between ns0 and ns1, pushes MPLS onto IP received from ns1
+dnl before sending to ns0, and pops MPLS received from ns0 and sends the
+dnl resulting IP packets to ns1.
+NS_CHECK_EXEC([at_ns1], [ip addr add 10.1.2.2/24 dev p1])
+
+dnl push MPLS LSE on packets from p1 (ns1) to p0 (ns0)
+AT_CHECK([ovs-ofctl add-flow br0 "in_port=3,ip,nw_src=10.1.2.2,actions=push_mpls:0x8847,set_field:101->mpls_label,output:2"])
+
+dnl pop MPLS LSE from packets from p0 (ns0) to p1 (ns1)
+AT_CHECK([ovs-ofctl add-flow br0 "in_port=2,dl_type=0x8847,mpls_label=100,actions=pop_mpls:0x0800,output:3"])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+dnl Allow MPLS forwarding of packets received on ns_gre0 in ns0
+NS_CHECK_EXEC([at_ns0], [echo 1 > /proc/sys/net/mpls/conf/ns_gre0/input])
+dnl Must be larger than the MPLS label to be routed by ns0 (101)
+NS_CHECK_EXEC([at_ns0], [echo 102 > /proc/sys/net/mpls/platform_labels])
+
+dnl Set up route to encapsulate 10.1.2.0/24 packets in MPLS
+NS_CHECK_EXEC([at_ns0], [ip route add 10.1.2.0/24 encap mpls 100 via inet 10.1.1.2 dev ns_gre0])
+
+dnl Set up route to decapsulate MPLS label 101 and deliver locally
+NS_CHECK_EXEC([at_ns0], [ip -f mpls route add 101 dev lo])
+
+dnl Set loopback interface up for local delivery
+NS_CHECK_EXEC([at_ns0], [ip link set up dev lo])
+
+dnl First check the underlay
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Next check the overlay
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.2.2  | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Now check HTTP transfer
+AT_CHECK([yes | dd bs=1k count=32 of=32k.txt], [0], [], stderr)
+AT_CHECK([cat stderr | FORMAT_DD], [0], [dnl
+32+0 records in
+32+0 records out
+32768 bytes (33 kB) copied, ...
+])
+NETNS_DAEMONIZE([at_ns1], [[$PYTHON $srcdir/test-l7.py]], [http0.pid])
+NS_CHECK_EXEC([at_ns0], [wget 10.1.2.2/32k.txt -t 3 -T 1 --retry-connrefused -v -o wget0.log])
+
+dnl Use the absence of retransmitted segments as a proxy for functioning GSO
+NS_CHECK_EXEC([at_ns1], [netstat -s | grep retransmited], [0], [dnl
+    0 segments retransmited
+])
+
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
-- 
2.7.0.rc3.207.g0ac5344

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next v11 5/6] openvswitch: add layer 3 flow/port support
       [not found]                                         ` <20160822110444.GA29971-ucRxlxcrRFEsysjaEhV7d2ey4e3TpSOZIxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
@ 2016-08-22 21:47                                           ` Joe Stringer
       [not found]                                             ` <CAPWQB7EQhbcDEk==AmN58Qxndmd6oHpw8z78kj2Q4M4-mD7+Dw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 36+ messages in thread
From: Joe Stringer @ 2016-08-22 21:47 UTC (permalink / raw)
  To: Simon Horman; +Cc: ovs dev, Linux Kernel Network Developers, Jiri Benc

On 22 August 2016 at 04:04, Simon Horman <simon.horman@netronome.com> wrote:
> On Wed, Aug 10, 2016 at 10:17:30AM -0700, Joe Stringer wrote:
>> On 10 August 2016 at 03:20, Simon Horman <simon.horman@netronome.com> wrote:
>> > On Tue, Aug 09, 2016 at 08:47:40AM -0700, pravin shelar wrote:
>> >> On Mon, Aug 8, 2016 at 8:17 AM, Simon Horman <simon.horman@netronome.com> wrote:
>> >> > Light testing seems to indicate that it works for GSO skbs
>> >> > received over both L3 and L2 GRE tunnels by OvS with both
>> >> > IP-in-MPLS and IP (without MPLS) payloads.
>> >> >
>> >>
>> >> Thanks for testing it. Can you also add those tests to OVS kmod test suite?
>> >> ..
>> >
>> > Sure, I will look into doing that.
>> > Am I correct in thinking Joe Stringer is the best person to contact if
>> > I run into trouble there?
>>
>> Sure. The basics of running the tests are documented here:
>> https://github.com/openvswitch/ovs/blob/master/INSTALL.md#datapath-testing
>>
>> You should be able to get a good feel for how to add tests by perusing
>> the commits to tests/system-{traffic,kmod-macros}.at in the OVS source
>> tree.
>
> Thanks Joe,
>
> it took me a while but I think that I have something working
> against the head branch of the OVS tree. I'd value opinions
> on the direction I have taken.
>
> Subject: [PATCH] system-traffic: Exercise GSO
>
> Exercise GSO for: unencapsulated; MPLS; GRE; and MPLS in GRE.
>
> There is scope to extend this testing to other encapsulation formats
> if desired.
>
> This is motivated by a desire to test GRE and MPLS encapsulation in
> the context of L3/VPN (MPLS over non-TEB GRE work). That is not
> tested here but tests for those cases would ideally be based on those in
> this patch.

This makes sense to me. There are a few corners that could be improved,
primarily for reproducing sane results on a variety of systems, then a
couple of style comments. Please do run the tests via both "make
check-kernel" and "make check-system-userspace" before submitting,
ideally with at least two varieties of kernel: One where you would
expect the test to pass, and one where you would expect the tests to
be skipped.

* CHECK_MPLS is defined in system-kmod-macros.at, so a corresponding
version should be provided in system-userspace-macros.at. If the
criteria for running the test(s) with both userspace and kernel
datapaths is the same, then this could instead be moved into
system-common-macros.at.
* "datapath - ping over gre tunnel" adds a command to execute in
at_ns1, but that namespace doesn't exist.
* "datapath - http over gre tunnel" is missing MPLS_CHECK.
* Is there a way to clear the netstat statistics before running the
tests which rely on it? I'm getting a failure on one of my systems
(ubuntu trusty with a 4.7 kernel), but I'm not sure if the counter was
already high before I ran the test.
* "datapath - http over mpls between two ports"  (maybe others too?)
should shift all openflow rules into a single section using AT_DATA,
similar to the other tests. This makes it easier to reason about the
flow table and understand what's going on before reading through the
rest of the test.
* If there is a common set of configuration you do for local stack
within a namespace to route MPLS traffic, you could consider adding
another macro into system-common-macros.at.

I also see this error on "http over mpls over gre tunnel":
+sh: 1: cannot create /proc/sys/net/mpls/conf/ns_gre0/input: Directory nonexistent

Maybe MPLS + GRE needs a separate check?

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next v11 5/6] openvswitch: add layer 3 flow/port support
       [not found]                                             ` <CAPWQB7EQhbcDEk==AmN58Qxndmd6oHpw8z78kj2Q4M4-mD7+Dw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-08-23  8:51                                               ` Simon Horman
       [not found]                                                 ` <20160823085144.GA22304-ucRxlxcrRFEsysjaEhV7d2ey4e3TpSOZIxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
  0 siblings, 1 reply; 36+ messages in thread
From: Simon Horman @ 2016-08-23  8:51 UTC (permalink / raw)
  To: Joe Stringer; +Cc: ovs dev, Linux Kernel Network Developers, Jiri Benc

On Mon, Aug 22, 2016 at 02:47:42PM -0700, Joe Stringer wrote:
> On 22 August 2016 at 04:04, Simon Horman <simon.horman@netronome.com> wrote:
> > On Wed, Aug 10, 2016 at 10:17:30AM -0700, Joe Stringer wrote:
> >> On 10 August 2016 at 03:20, Simon Horman <simon.horman@netronome.com> wrote:
> >> > On Tue, Aug 09, 2016 at 08:47:40AM -0700, pravin shelar wrote:
> >> >> On Mon, Aug 8, 2016 at 8:17 AM, Simon Horman <simon.horman@netronome.com> wrote:
> >> >> > Light testing seems to indicate that it works for GSO skbs
> >> >> > received over both L3 and L2 GRE tunnels by OvS with both
> >> >> > IP-in-MPLS and IP (without MPLS) payloads.
> >> >> >
> >> >>
> >> >> Thanks for testing it. Can you also add those tests to OVS kmod test suite?
> >> >> ..
> >> >
> >> > Sure, I will look into doing that.
> >> > Am I correct in thinking Joe Stringer is the best person to contact if
> >> > I run into trouble there?
> >>
> >> Sure. The basics of running the tests is documented here:
> >> https://github.com/openvswitch/ovs/blob/master/INSTALL.md#datapath-testing
> >>
> >> You should be able to get a good feel for how to add tests by perusing
> >> the commits to tests/system-{traffic,kmod-macros}.at in the OVS source
> >> tree.
> >
> > Thanks Joe,
> >
> > it took me a while but I think that I have something working
> > against the head branch of the OVS tree. I'd value opinions
> > on the direction I have taken.
> >
> > Subject: [PATCH] system-traffic: Exercise GSO
> >
> > Exercise GSO for: unencapsulated; MPLS; GRE; and MPLS in GRE.
> >
> > There is scope to extend this testing to other encapsulation formats
> > if desired.
> >
> > This is motivated by a desire to test GRE and MPLS encapsulation in
> > the context of L3/VPN (MPLS over non-TEB GRE work). That is not
> > tested here but tests for those cases would ideally be based on those in
> > this patch.
> 
> This makes sense to me. There's a few corners that could be improved,
> primarily for reproducing sane results on a variety of systems, then a
> couple of style comments. Please do run the tests via both "make
> check-kernel" and "make check-system-userspace" before submitting,
> ideally with at least two varieties of kernel: One where you would
> expect the test to pass, and one where you would expect the tests to
> be skipped.

Thanks. I'm glad I ran this by you before expanding the number of tests.

> * CHECK_MPLS is defined in system-kmod-macros.at, so a corresponding
> version should be provided in system-userspace-macros.at. If the
> criteria for running the test(s) with both userspace and kernel
> datapaths is the same, then this could instead be moved into
> system-common-macros.at.

Understood.

> * "datapath - ping over gre tunnel" adds a command to execute in
> at_ns1, but that namespace doesn't exist.

Oops.

> * "datapath - http over gre tunnel" is missing MPLS_CHECK.

Thanks, I'll fix that.

> * Is there a way to clear the netstat statistics before running the
> tests which rely on it? I'm getting a failure on one of my systems
> (ubuntu trusty with a 4.7 kernel), but I'm not sure if the counter was
> already high before I ran the test.

I'll look into that. If that isn't possible, the counters could be recorded
before and after the test to allow a check for a non-zero delta.

Possibly an entirely different mechanism is needed to check for GSO
functioning. But I'm not sure what it would be at this point.
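A minimal sketch of the record-and-compare idea, written as hypothetical shell helpers (`retrans_count` and `retrans_delta_is_zero` are illustrative names, not OVS macros). It assumes the net-tools `netstat -s` wording, which prints "N segments retransmited" (sic); the pattern also matches the corrected spelling. Snapshots are passed in as text, so the logic can be tried without a namespace:

```shell
# Hypothetical helper: print the TCP "segments retransmit(t)ed" counter
# from `netstat -s` output read on stdin; print 0 if the line is absent.
retrans_count() {
    awk '/segments retransmit/ { n = $1 } END { if (n == "") n = 0; print n }'
}

# Hypothetical helper: succeed only if the counter did not move between
# two snapshots, i.e. the delta is zero whatever the starting value was.
retrans_delta_is_zero() {
    before=$(printf '%s\n' "$1" | retrans_count)
    after=$(printf '%s\n' "$2" | retrans_count)
    [ "$((after - before))" -eq 0 ]
}
```

In a test the two snapshots would presumably come from `netstat -s` run inside the receiving namespace before and after the transfer. Comparing the counter value rather than the whole line keeps the check independent of the exact wording.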

> * "datapath - http over mpls between two ports"  (maybe others too?)
> should shift all openflow rules into a single section using AT_DATA,
> similar to the other tests. This makes it easier to reason about the
> flow table and understand what's going on before reading through the
> rest of the test.

Sure, will do.

> * If there is a common set of configuration you do for local stack
> within a namespace to route MPLS traffic, you could consider adding
> another macro into system-common-macros.at.

OK, possibly there is, if some of the configuration is parametrised,
e.g. by the namespace/netdev used to send/receive MPLS using native
Linux MPLS routing.
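For illustration, a hypothetical dry-run helper parametrised by namespace, netdev, prefix, gateway and the two labels might look like this. It only prints the `ip`/sysctl commands instead of running them, so it can be inspected without root or namespaces. The label values used in the example (100 outgoing, 101 incoming) and the rule that `platform_labels` must exceed the highest locally routed label are assumptions for this sketch:

```shell
# Hypothetical dry-run helper: print the commands that would configure
# native Linux MPLS routing in namespace $1 for device $2, sending IP
# traffic for prefix $3 via gateway $4 with outgoing label $5, and
# delivering packets that arrive with label $6 locally.
mpls_route_cmds() {
    ns=$1 dev=$2 prefix=$3 gw=$4 out_label=$5 in_label=$6
    # platform_labels must exceed the highest label added to the MPLS
    # routing table (here only $in_label is added).
    labels=$((in_label + 1))
    cat <<EOF
ip netns exec $ns sh -c 'echo 1 > /proc/sys/net/mpls/conf/$dev/input'
ip netns exec $ns sh -c 'echo $labels > /proc/sys/net/mpls/platform_labels'
ip netns exec $ns ip route add $prefix encap mpls $out_label via inet $gw dev $dev
ip netns exec $ns ip -f mpls route add $in_label dev lo
ip netns exec $ns ip link set up dev lo
EOF
}
```

For example, `mpls_route_cmds at_ns0 p0 10.1.1.0/24 172.31.1.2 100 101` prints five commands. Making the labels parameters, rather than hard-coding them, is the main design choice in this sketch.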

> I also see this error on "http over mpls over gre tunnel":
> +sh: 1: cannot create /proc/sys/net/mpls/conf/ns_gre0/input: Directory
> nonexistent
> 
> Maybe MPLS + GRE needs a separate check?

Yes, that is probably the case.

I believe some versions of the kernel support MPLS routing for
some interfaces but not GRE interfaces.
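If MPLS over GRE does get its own check, one hedged way to express it is to test for the per-device MPLS `input` sysctl, which is the file the failing command above could not create. The helper name and the optional sysctl-root argument (added only so it can be exercised against a scratch directory) are hypothetical:

```shell
# Hypothetical helper: succeed if the kernel exposes the per-device MPLS
# "input" sysctl for device $1, i.e. MPLS input can be enabled on it.
# $2 optionally overrides the /proc/sys prefix (useful for testing).
mpls_input_supported() {
    dev=$1
    sysctl_root=${2:-/proc/sys}
    test -e "$sysctl_root/net/mpls/conf/$dev/input"
}
```

In an autotest file this would presumably reduce to an `AT_SKIP_IF([! test -e ...])` on that path for a GRE device that already exists when the check runs.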


* Re: [PATCH net-next v11 5/6] openvswitch: add layer 3 flow/port support
       [not found]                                                 ` <20160823085144.GA22304-ucRxlxcrRFEsysjaEhV7d2ey4e3TpSOZIxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
@ 2016-08-25 10:08                                                   ` Simon Horman
       [not found]                                                     ` <20160825100833.GA31926-ucRxlxcrRFEsysjaEhV7d2ey4e3TpSOZIxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
  0 siblings, 1 reply; 36+ messages in thread
From: Simon Horman @ 2016-08-25 10:08 UTC (permalink / raw)
  To: Joe Stringer; +Cc: ovs dev, Linux Kernel Network Developers, Jiri Benc

On Tue, Aug 23, 2016 at 10:51:47AM +0200, Simon Horman wrote:
> On Mon, Aug 22, 2016 at 02:47:42PM -0700, Joe Stringer wrote:
> > On 22 August 2016 at 04:04, Simon Horman <simon.horman@netronome.com> wrote:
> > > On Wed, Aug 10, 2016 at 10:17:30AM -0700, Joe Stringer wrote:
> > >> On 10 August 2016 at 03:20, Simon Horman <simon.horman@netronome.com> wrote:
> > >> > On Tue, Aug 09, 2016 at 08:47:40AM -0700, pravin shelar wrote:
> > >> >> On Mon, Aug 8, 2016 at 8:17 AM, Simon Horman <simon.horman@netronome.com> wrote:
> > >> >> > Light testing seems to indicate that it works for GSO skbs
> > >> >> > received over both L3 and L2 GRE tunnels by OvS with both
> > >> >> > IP-in-MPLS and IP (without MPLS) payloads.
> > >> >> >
> > >> >>
> > >> >> Thanks for testing it. Can you also add those tests to OVS kmod test suite?
> > >> >> ..
> > >> >
> > >> > Sure, I will look into doing that.
> > >> > Am I correct in thinking Joe Stringer is the best person to contact if
> > >> > I run into trouble there?
> > >>
> > >> Sure. The basics of running the tests is documented here:
> > >> https://github.com/openvswitch/ovs/blob/master/INSTALL.md#datapath-testing
> > >>
> > >> You should be able to get a good feel for how to add tests by perusing
> > >> the commits to tests/system-{traffic,kmod-macros}.at in the OVS source
> > >> tree.
> > >
> > > Thanks Joe,
> > >
> > > it took me a while but I think that I have something working
> > > against the head branch of the OVS tree. I'd value opinions
> > > on the direction I have taken.
> > >
> > > Subject: [PATCH] system-traffic: Exercise GSO
> > >
> > > Exercise GSO for: unencapsulated; MPLS; GRE; and MPLS in GRE.
> > >
> > > There is scope to extend this testing to other encapsulation formats
> > > if desired.
> > >
> > > This is motivated by a desire to test GRE and MPLS encapsulation in
> > > the context of L3/VPN (MPLS over non-TEB GRE work). That is not
> > > tested here but tests for those cases would ideally be based on those in
> > > this patch.
> > 
> > This makes sense to me. There's a few corners that could be improved,
> > primarily for reproducing sane results on a variety of systems, then a
> > couple of style comments. Please do run the tests via both "make
> > check-kernel" and "make check-system-userspace" before submitting,
> > ideally with at least two varieties of kernel: One where you would
> > expect the test to pass, and one where you would expect the tests to
> > be skipped.

Both make check-kernel and make check-system-userspace are now working.
I have tested against net-next and the 3.16 kernel that ships with
Debian stable.

> Thanks. I'm glad I ran this by you before expanding the number of tests.
> 
> > * CHECK_MPLS is defined in system-kmod-macros.at, so a corresponding
> > version should be provided in system-userspace-macros.at. If the
> > criteria for running the test(s) with both userspace and kernel
> > datapaths is the same, then this could instead be moved into
> > system-common-macros.at.
> 
> Understood.
>
> > * "datapath - ping over gre tunnel" adds a command to execute in
> > at_ns1, but that namespace doesn't exist.
> 
> Oops.

I have removed the chunk in question; it seems to be an artifact
of my development of the tests.

> > * "datapath - http over gre tunnel" is missing MPLS_CHECK.
> 
> Thanks, I'll fix that.

On further inspection, it seems to me that this test does not use MPLS;
rather, it is testing GSO for GRE (without MPLS).

> > * Is there a way to clear the netstat statistics before running the
> > tests which rely on it? I'm getting a failure on one of my systems
> > (ubuntu trusty with a 4.7 kernel), but I'm not sure if the counter was
> > already high before I ran the test.
> 
> I'll look into that. If that isn't possible, the counters could be recorded
> before and after the test to allow a check for a non-zero delta.
> 
> Possibly an entirely different mechanism is needed to check for GSO
> functioning. But I'm not sure what it would be at this point.
>
> > * "datapath - http over mpls between two ports"  (maybe others too?)
> > should shift all openflow rules into a single section using AT_DATA,
> > similar to the other tests. This makes it easier to reason about the
> > flow table and understand what's going on before reading through the
> > rest of the test.
> 
> Sure, will do.
> 
> > * If there is a common set of configuration you do for local stack
> > within a namespace to route MPLS traffic, you could consider adding
> > another macro into system-common-macros.at.
> 
> OK, possibly there is, if some of the configuration is parametrised,
> e.g. by the namespace/netdev used to send/receive MPLS using native
> Linux MPLS routing.
> 
> > I also see this error on "http over mpls over gre tunnel":
> > +sh: 1: cannot create /proc/sys/net/mpls/conf/ns_gre0/input: Directory
> > nonexistent
> > 
> > Maybe MPLS + GRE needs a separate check?
> 
> Yes, that is probably the case.
> 
> I believe some versions of the kernel support MPLS routing for
> some interfaces but not GRE interfaces.

Please find my working patch below.

From: Simon Horman <horms@verge.net.au>
Subject: [PATCH] system-traffic: Exercise GSO

Exercise GSO for: unencapsulated; MPLS; GRE; and MPLS in GRE.

There is scope to extend this testing to other encapsulation formats
if desired.

This is motivated by a desire to test GRE and MPLS encapsulation in
the context of L3/VPN (MPLS over non-TEB GRE work). That is not
tested here but tests for those cases would ideally be based on those in
this patch.

Signed-off-by: Simon Horman <horms@verge.net.au>
---
v2
* As suggested by Joe Stringer
  - Do not add spurious ns1 / http commands to "datapath - ping over gre tunnel"
  - Add OVS_CHECK_MPLS_GRE
  - Use AT_SKIP_IF
  - Use AT_DATA
  - Check for zero delta of retransmits rather than absolute zero
  - Consolidate MPLS configuration into NS_ADD_MPLS_ROUTE macro
  - Move macros from system-kmod-macros.at to system-common-macros.at
    so that the tests work with make check-system-userspace
    as well as make check-kernel
---
 tests/system-common-macros.at |  94 +++++++++++++++++++--
 tests/system-traffic.at       | 191 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 276 insertions(+), 9 deletions(-)

diff --git a/tests/system-common-macros.at b/tests/system-common-macros.at
index 4ffc382..41040a8 100644
--- a/tests/system-common-macros.at
+++ b/tests/system-common-macros.at
@@ -56,7 +56,7 @@ m4_define([ADD_INT],
     ]
 )
 
-# ADD_VETH([port], [namespace], [ovs-br], [ip_addr] [mac_addr [gateway]])
+# ADD_VETH([port], [namespace], [ovs-br], [ip_addr] [mac_addr [gateway [ofport]]])
 #
 # Add a pair of veth ports. 'port' will be added to name space 'namespace',
 # and "ovs-'port'" will be added to ovs bridge 'ovs-br'.
@@ -64,8 +64,8 @@ m4_define([ADD_INT],
 # The 'port' in 'namespace' will be brought up with static IP address
 # with 'ip_addr' in CIDR notation.
 #
-# Optionally, one can specify the 'mac_addr' for 'port' and the default
-# 'gateway'.
+# Optionally, one can specify the 'mac_addr' for 'port', the default
+# 'gateway' and the 'ofport' number.
 #
 # The existing 'port' or 'ovs-port' will be removed before new ones are added.
 #
@@ -74,8 +74,14 @@ m4_define([ADD_VETH],
       CONFIGURE_VETH_OFFLOADS([$1])
       AT_CHECK([ip link set $1 netns $2])
       AT_CHECK([ip link set dev ovs-$1 up])
-      AT_CHECK([ovs-vsctl add-port $3 ovs-$1 -- \
-                set interface ovs-$1 external-ids:iface-id="$1"])
+      if test -n "$7"; then
+        AT_CHECK([ovs-vsctl add-port $3 ovs-$1 -- \
+                  set interface ovs-$1 external-ids:iface-id="$1" \
+                  ofport_request=$7])
+      else
+        AT_CHECK([ovs-vsctl add-port $3 ovs-$1 -- \
+                  set interface ovs-$1 external-ids:iface-id="$1"])
+      fi
       NS_CHECK_EXEC([$2], [ip addr add $4 dev $1])
       NS_CHECK_EXEC([$2], [ip link set dev $1 up])
       if test -n "$5"; then
@@ -99,7 +105,7 @@ m4_define([ADD_VLAN],
     ]
 )
 
-# ADD_OVS_TUNNEL([type], [bridge], [port], [remote-addr], [overlay-addr])
+# ADD_OVS_TUNNEL([type], [bridge], [port], [remote-addr], [overlay-addr [ofport]])
 #
 # Add an ovs-based tunnel device in the root namespace, with name 'port' and
 # type 'type'. The tunnel device will be configured as point-to-point with the
@@ -107,9 +113,17 @@ m4_define([ADD_VLAN],
 #
 # 'port will be configured with the address 'overlay-addr'.
 #
+# Optionally, one can specify the 'ofport' number.
+#
 m4_define([ADD_OVS_TUNNEL],
-   [AT_CHECK([ovs-vsctl add-port $2 $3 -- \
-              set int $3 type=$1 options:remote_ip=$4])
+   [if test -n "$6"; then
+      AT_CHECK([ovs-vsctl add-port $2 $3 -- \
+                set int $3 type=$1 options:remote_ip=$4 \
+                ofport_request=$6])
+    else
+      AT_CHECK([ovs-vsctl add-port $2 $3 -- \
+                set int $3 type=$1 options:remote_ip=$4])
+    fi
     AT_CHECK([ip addr add dev $2 $5])
     AT_CHECK([ip link set dev $2 up])
     AT_CHECK([ip link set dev $2 mtu 1450])
@@ -143,6 +157,12 @@ m4_define([ADD_NATIVE_TUNNEL],
 #
 m4_define([FORMAT_PING], [grep "transmitted" | sed 's/time.*ms$/time 0ms/'])
 
+# FORMAT_DD([])
+#
+# Strip variant pieces from dd output so the output can be reliably compared.
+#
+m4_define([FORMAT_DD], [sed 's/copied,.*$/copied, .../'])
+
 # FORMAT_CT([ip-addr])
 #
 # Strip content from the piped input which would differ from test to test
@@ -168,10 +188,68 @@ m4_define([NETNS_DAEMONIZE],
 m4_define([OVS_CHECK_VXLAN],
     [AT_SKIP_IF([! ip link add foo type vxlan help 2>&1 | grep dstport >/dev/null])])
 
+# OVS_CHECK_MPLS()
+#
+# Perform requirements checks for running MPLS tests.
+#
+m4_define([OVS_CHECK_MPLS],
+    [m4_foreach([mod], [[mpls_router]],
+                [modprobe mod || echo "Module mod not loaded."
+                 on_exit 'modprobe -r mod'
+                ])
+     dnl Requires Linux v4.7+ and ip route v4.7+
+     dnl  Prepare
+     AT_SKIP_IF([! echo 101 > /proc/sys/net/mpls/platform_labels])
+     [ip -f mpls route del 100]
+     dnl  Test
+     AT_SKIP_IF([! ip -f mpls route add 100 dev lo])
+     dnl  Cleanup
+     AT_SKIP_IF([! ip -f mpls route del 100])
+     AT_SKIP_IF([! echo 0 > /proc/sys/net/mpls/platform_labels])
+    ]
+)
+
 # OVS_CHECK_GRE()
 m4_define([OVS_CHECK_GRE],
     [AT_SKIP_IF([! ip link add foo type gretap help 2>&1 | grep gre >/dev/null])])
 
+# OVS_CHECK_MPLS_GRE()
+#
+# Perform requirements checks for running MPLS over GRE tests.
+#
+m4_define([OVS_CHECK_MPLS_GRE],
+    [OVS_CHECK_MPLS()
+     OVS_CHECK_GRE()
+     m4_foreach([mod], [[ip_gre]],
+                [modprobe mod || echo "Module mod not loaded."
+                 on_exit 'modprobe -r mod'
+                ])
+     dnl Requires Linux v4.8
+     AT_SKIP_IF([! test -e /proc/sys/net/mpls/conf/gretap0/input])
+    ]
+)
+
 # OVS_CHECK_GENEVE()
 m4_define([OVS_CHECK_GENEVE],
     [AT_SKIP_IF([! ip link add foo type geneve help 2>&1 | grep geneve >/dev/null])])
+
+# NS_ADD_MPLS_ROUTE([ns], [port], [prefix], [gateway])
+#
+# Route IP packets for 'prefix' via 'gateway' on 'port' using MPLS label 100,
+# and receive MPLS packets with label 101 on 'port' locally as IP packets.
+m4_define([NS_ADD_MPLS_ROUTE],
+    [dnl Allow MPLS forwarding of packets received on $2
+     NS_CHECK_EXEC([$1], [echo 1 > /proc/sys/net/mpls/conf/$2/input])
+
+     dnl Must be larger than the highest MPLS label to be routed (101)
+     NS_CHECK_EXEC([$1], [echo 102 > /proc/sys/net/mpls/platform_labels])
+
+     dnl Set up route to encapsulate packets to $3 in MPLS
+     NS_CHECK_EXEC([$1], [ip route add $3 encap mpls 100 via inet $4 dev $2])
+
+     dnl Set up route to decapsulate MPLS label 101 and deliver locally
+     NS_CHECK_EXEC([$1], [ip -f mpls route add 101 dev lo])
+
+     dnl Set loopback interface up for local delivery
+     NS_CHECK_EXEC([$1], [ip link set up dev lo])
+    ])
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 2f42efa..3f9db51 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -33,12 +33,27 @@ ADD_NAMESPACES(at_ns0, at_ns1)
 ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
 ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
 
+dnl Disable TSO so as to exercise software segmentation
+dnl when outputting GSO skbs from OvS
+AT_CHECK([ethtool -K ovs-p0 tso off])
+
 NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl
 3 packets transmitted, 3 received, 0% packet loss, time 0ms
 ])
 
+AT_CHECK([yes | dd bs=1k count=32 of=32k.txt], [0], [], stderr)
+AT_CHECK([cat stderr | FORMAT_DD], [0], [dnl
+32+0 records in
+32+0 records out
+32768 bytes (33 kB) copied, ...
+])
 NETNS_DAEMONIZE([at_ns1], [[$PYTHON $srcdir/test-l7.py]], [http0.pid])
-NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log])
+NS_CHECK_EXEC([at_ns1], [netstat -s | grep retransmited > retransmit_before])
+NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2/32k.txt -t 3 -T 1 --retry-connrefused -v -o wget0.log])
+NS_CHECK_EXEC([at_ns1], [netstat -s | grep retransmited > retransmit_after])
+
+dnl Use the absence of retransmitted segments as a proxy for functioning GSO
+AT_CHECK([diff -u retransmit_before retransmit_after], [0])
 
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
@@ -69,6 +84,58 @@ NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.2.2.2 | FORMAT_PING
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([datapath - http over mpls between two ports])
+OVS_CHECK_MPLS()
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+dnl Set up underlay
+ADD_VETH(p0, at_ns0, br0, "172.31.1.1/24", [], [], 2)
+ADD_VETH(p1, at_ns1, br0, "172.31.1.2/24", [], [], 3)
+
+dnl Set up MPLS overlay
+dnl IP is encapsulated in MPLS when sent to and received from ns0.
+dnl OvS, sitting between ns0 and ns1, pushes MPLS onto IP received from ns1
+dnl before sending it to ns0, and pops MPLS received from ns0 and sends the
+dnl resulting IP packets to ns1.
+NS_CHECK_EXEC([at_ns1], [ip addr add 10.1.1.2/24 dev p1])
+
+AT_DATA([flows.txt], [dnl
+in_port=3,ip,nw_src=10.1.1.2,actions=push_mpls:0x8847,set_field:101->mpls_label,output:2
+in_port=2,dl_type=0x8847,mpls_label=100,actions=pop_mpls:0x0800,output:3
+actions=normal
+])
+AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
+
+NS_ADD_MPLS_ROUTE([at_ns0], [p0], [10.1.1.0/24], [172.31.1.2])
+
+dnl Allow for overhead of MPLS LSE
+NS_CHECK_EXEC([at_ns1], [ip link set mtu 1496 dev p1])
+
+dnl Test ping
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Test http
+AT_CHECK([yes | dd bs=1k count=32 of=32k.txt], [0], [], stderr)
+AT_CHECK([cat stderr | FORMAT_DD], [0], [dnl
+32+0 records in
+32+0 records out
+32768 bytes (33 kB) copied, ...
+])
+NETNS_DAEMONIZE([at_ns1], [[$PYTHON $srcdir/test-l7.py]], [http0.pid])
+NS_CHECK_EXEC([at_ns1], [netstat -s | grep retransmited > retransmit_before])
+NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2/32k.txt -t 3 -T 1 --retry-connrefused -v -o wget0.log])
+NS_CHECK_EXEC([at_ns1], [netstat -s | grep retransmited > retransmit_after])
+
+dnl Use the absence of retransmitted segments as a proxy for functioning GSO
+AT_CHECK([diff -u retransmit_before retransmit_after], [0])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_SETUP([datapath - ping6 between two ports])
 OVS_TRAFFIC_VSWITCHD_START()
 
@@ -209,6 +276,128 @@ NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PI
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([datapath - http over gre tunnel])
+OVS_CHECK_GRE()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+dnl Set up underlay link from host into the at_ns0 namespace using veth pair.
+ADD_VETH(p0, at_ns0, br-underlay, "172.31.1.1/24")
+AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the at_ns0 namespace and
+dnl with a native linux device inside the namespace.
+ADD_OVS_TUNNEL([gre], [br0], [at_gre0], [172.31.1.1], [10.1.1.100/24])
+ADD_NATIVE_TUNNEL([gretap], [ns_gre0], [at_ns0], [172.31.1.100], [10.1.1.1/24])
+
+dnl Add veth pair connected to br0 and at_ns1 namespace
+ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
+
+dnl Disable TSO so as to exercise software segmentation
+dnl when outputting GSO skbs over GRE from OvS
+AT_CHECK([ethtool -K ovs-p0 tso off])
+
+dnl First check the underlay
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Next check the overlay
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2  | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Now check HTTP transfer
+AT_CHECK([yes | dd bs=1k count=32 of=32k.txt], [0], [], stderr)
+AT_CHECK([cat stderr | FORMAT_DD], [0], [dnl
+32+0 records in
+32+0 records out
+32768 bytes (33 kB) copied, ...
+])
+NETNS_DAEMONIZE([at_ns1], [[$PYTHON $srcdir/test-l7.py]], [http0.pid])
+NS_CHECK_EXEC([at_ns1], [netstat -s | grep retransmited > retransmit_before])
+NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2/32k.txt -t 3 -T 1 --retry-connrefused -v -o wget0.log])
+NS_CHECK_EXEC([at_ns1], [netstat -s | grep retransmited > retransmit_after])
+
+dnl Use the absence of retransmitted segments as a proxy for functioning GSO
+AT_CHECK([diff -u retransmit_before retransmit_after], [0])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - http over mpls over gre tunnel])
+OVS_CHECK_MPLS_GRE()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+dnl Set up underlay link from host into the at_ns0 namespace using veth pair.
+ADD_VETH(p0, at_ns0, br-underlay, "172.31.1.1/24")
+AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the at_ns0 namespace and
+dnl with a native linux device inside the namespace.
+ADD_OVS_TUNNEL([gre], [br0], [at_gre0], [172.31.1.1], [10.1.1.100/24], 2)
+ADD_NATIVE_TUNNEL([gretap], [ns_gre0], [at_ns0], [172.31.1.100], [10.1.1.1/24])
+
+dnl Add veth pair connected to br0 and at_ns1 namespace
+ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", [], [], 3)
+
+dnl Set up MPLS overlay
+dnl IP is encapsulated in MPLS when sent to and received from ns0.
+dnl OvS, sitting between ns0 and ns1, pushes MPLS onto IP received from ns1
+dnl before sending it to ns0, and pops MPLS received from ns0 and sends the
+dnl resulting IP packets to ns1.
+NS_CHECK_EXEC([at_ns1], [ip addr add 10.1.2.2/24 dev p1])
+
+AT_DATA([flows.txt], [dnl
+in_port=3,ip,nw_src=10.1.2.2,actions=push_mpls:0x8847,set_field:101->mpls_label,output:2
+in_port=2,dl_type=0x8847,mpls_label=100,actions=pop_mpls:0x0800,output:3
+actions=normal
+])
+AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+NS_ADD_MPLS_ROUTE([at_ns0], [ns_gre0], [10.1.2.0/24], [10.1.1.2])
+
+dnl First check the underlay
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Next check the overlay
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.2.2  | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Now check HTTP transfer
+AT_CHECK([yes | dd bs=1k count=32 of=32k.txt], [0], [], stderr)
+AT_CHECK([cat stderr | FORMAT_DD], [0], [dnl
+32+0 records in
+32+0 records out
+32768 bytes (33 kB) copied, ...
+])
+NETNS_DAEMONIZE([at_ns1], [[$PYTHON $srcdir/test-l7.py]], [http0.pid])
+NS_CHECK_EXEC([at_ns1], [netstat -s | grep retransmited > retransmit_before])
+NS_CHECK_EXEC([at_ns0], [wget 10.1.2.2/32k.txt -t 3 -T 1 --retry-connrefused -v -o wget0.log])
+NS_CHECK_EXEC([at_ns1], [netstat -s | grep retransmited > retransmit_after])
+
+dnl Use the absence of retransmitted segments as a proxy for functioning GSO
+AT_CHECK([diff -u retransmit_before retransmit_after], [0])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_SETUP([datapath - ping over geneve tunnel])
 OVS_CHECK_GENEVE()
 
-- 
2.1.4
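As a worked aside on the `mtu 1496` setting in the MPLS tests above: per RFC 3032 an MPLS label stack entry is 32 bits (20-bit label, 3-bit traffic class, bottom-of-stack bit, 8-bit TTL), so each pushed label costs 4 bytes and a 1500-byte underlay leaves 1496 bytes for the inner IP packet. The helpers below are an illustrative sketch in plain shell arithmetic, not part of the patch:

```shell
# Encode one MPLS label stack entry (RFC 3032):
#   label (20 bits) | traffic class (3) | bottom of stack (1) | TTL (8)
mpls_lse() {
    label=$1 tc=$2 bos=$3 ttl=$4
    printf '0x%08x\n' $(( (label << 12) | (tc << 9) | (bos << 8) | ttl ))
}

# Each label stack entry is 4 bytes, so the inner MTU shrinks by 4 per label.
inner_mtu() {
    echo $(( $1 - 4 * $2 ))
}
```

For example, `mpls_lse 101 0 1 64` prints `0x00065140` (label 101, bottom of stack, TTL 64), and `inner_mtu 1500 1` prints `1496`.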



* Re: [PATCH net-next v11 5/6] openvswitch: add layer 3 flow/port support
       [not found]                                                     ` <20160825100833.GA31926-ucRxlxcrRFEsysjaEhV7d2ey4e3TpSOZIxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
@ 2016-08-26  0:33                                                       ` Joe Stringer
       [not found]                                                         ` <CAPWQB7G8RekHoTMNR5jAJGu7n2i8fNZ1=Fvj4XX_tXVSovpGug-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 36+ messages in thread
From: Joe Stringer @ 2016-08-26  0:33 UTC (permalink / raw)
  To: Simon Horman; +Cc: ovs dev, Linux Kernel Network Developers, Jiri Benc

On 25 August 2016 at 03:08, Simon Horman <simon.horman@netronome.com> wrote:
> Please find my working patch below.
>
> From: Simon Horman <horms@verge.net.au>
> Subject: [PATCH] system-traffic: Exercise GSO
>
> Exercise GSO for: unencapsulated; MPLS; GRE; and MPLS in GRE.
>
> There is scope to extend this testing to other encapsulation formats
> if desired.
>
> This is motivated by a desire to test GRE and MPLS encapsulation in
> the context of L3/VPN (MPLS over non-TEB GRE work). That is not
> tested here but tests for those cases would ideally be based on those in
> this patch.
>
> Signed-off-by: Simon Horman <horms@verge.net.au>

I realised that these tests disable TSO, but they don't actually check
if GSO is enabled. Maybe it's safe to assume this, but it's more
explicit to actually look for it in the tests.
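One way to make the GSO assumption explicit, sketched as hypothetical helpers that parse `ethtool -k DEV` output read on stdin (the top-level `generic-segmentation-offload` and `tcp-segmentation-offload` feature lines):

```shell
# Hypothetical helper: succeed only if `ethtool -k` output on stdin
# reports generic segmentation offload as on.
gso_enabled() {
    grep -q '^generic-segmentation-offload: on'
}

# Hypothetical helper: succeed only if TCP segmentation offload is off,
# i.e. the earlier `ethtool -K ... tso off` actually took effect.
tso_disabled() {
    grep -q '^tcp-segmentation-offload: off'
}
```

In a test this might be used as `ethtool -k ovs-p0 | gso_enabled` after `ethtool -K ovs-p0 tso off`, skipping or failing if it does not hold.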

With particular setups (fedora23 in particular, both kernel and
userspace testsuites) I see this:

./system-traffic.at:371: ip netns exec at_ns0 sh << NS_EXEC_HEREDOC
ip route add 10.1.2.0/24 encap mpls 100 via inet 10.1.1.2 dev ns_gre0
NS_EXEC_HEREDOC
--- /dev/null 2016-08-19 01:28:02.151000000 +0000
+++ /home/gitlab-runner/builds/83c49bff/0/root/gitlab-ovs/ovs/tests/system-kmod-testsuite.dir/at-groups/10/stderr
2016-08-25 17:16:27.324000000 +0000
@@ -0,0 +1 @@
+Error: either "to" is duplicate, or "encap" is a garbage.

I'm guessing the ip tool is a little out of date. We could detect and
skip this with something like:

AT_SKIP_IF([! ip route help 2>&1 | grep encap >/dev/null])

in the CHECK_MPLS.
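A hedged sketch of the same detection as a standalone helper (the name is hypothetical). The skip condition needs to be negated: the test should be skipped when the help text does not mention `encap`, i.e. when the installed iproute2 is too old to parse `encap mpls`:

```shell
# Hypothetical helper: read `ip route help` output on stdin and succeed if
# it mentions "encap", taken here as a sign that this iproute2 can parse
# lightweight tunnel routes such as "encap mpls".
ip_route_has_encap() {
    grep -q 'encap'
}

# Usage (runs the real tool):
#   ip route help 2>&1 | ip_route_has_encap || echo "skip: no encap support"
```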

Hmm, I'm still seeing the bad counts of segments retransmited even
with the diff change on a kernel I have built at bf0f500bd019 ("Merge
tag 'trace-v4.8-1' of
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace").
If it's passing on latest net-next then maybe I just need to swap out
that box's kernel for a newer build. I'll try that.

* Re: [PATCH net-next v11 5/6] openvswitch: add layer 3 flow/port support
       [not found]                                                         ` <CAPWQB7G8RekHoTMNR5jAJGu7n2i8fNZ1=Fvj4XX_tXVSovpGug-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-08-26  9:13                                                           ` Simon Horman
       [not found]                                                             ` <20160826091322.GE22464-ucRxlxcrRFEsysjaEhV7d2ey4e3TpSOZIxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
  0 siblings, 1 reply; 36+ messages in thread
From: Simon Horman @ 2016-08-26  9:13 UTC (permalink / raw)
  To: Joe Stringer; +Cc: ovs dev, Linux Kernel Network Developers, Jiri Benc

On Thu, Aug 25, 2016 at 05:33:57PM -0700, Joe Stringer wrote:
> On 25 August 2016 at 03:08, Simon Horman <simon.horman@netronome.com> wrote:
> > Please find my working patch below.
> >
> > From: Simon Horman <horms@verge.net.au>
> > Subject: [PATCH] system-traffic: Exercise GSO
> >
> > Exercise GSO for: unencapsulated; MPLS; GRE; and MPLS in GRE.
> >
> > There is scope to extend this testing to other encapsulation formats
> > if desired.
> >
> > This is motivated by a desire to test GRE and MPLS encapsulation in
> > the context of L3/VPN (MPLS over non-TEB GRE work). That is not
> > tested here but tests for those cases would ideally be based on those in
> > this patch.
> >
> > Signed-off-by: Simon Horman <horms@verge.net.au>
> 
> I realised that these tests disable TSO, but they don't actually check
> if GSO is enabled. Maybe it's safe to assume this, but it's more
> explicit to actually look for it in the tests.

Good point, I'll see about checking that.

> With particular setups (fedora23 in particular, both kernel and
> userspace testsuites) I see this:
> 
> ./system-traffic.at:371: ip netns exec at_ns0 sh << NS_EXEC_HEREDOC
> ip route add 10.1.2.0/24 encap mpls 100 via inet 10.1.1.2 dev ns_gre0
> NS_EXEC_HEREDOC
> --- /dev/null 2016-08-19 01:28:02.151000000 +0000
> +++ /home/gitlab-runner/builds/83c49bff/0/root/gitlab-ovs/ovs/tests/system-kmod-testsuite.dir/at-groups/10/stderr
> 2016-08-25 17:16:27.324000000 +0000
> @@ -0,0 +1 @@
> +Error: either "to" is duplicate, or "encap" is a garbage.
> 
> I'm guessing the ip tool is a little out of date. We could detect and
> skip this with something like:
> 
> AT_SKIP_IF([! ip route help 2>&1 | grep encap >/dev/null])
> 
> in the CHECK_MPLS.

Thanks, I'll add something like that.

> Hmm, I'm still seeing the bad counts of segments retransmited even
> with the diff change on a kernel I have built at bf0f500bd019 ("Merge
> tag 'trace-v4.8-1' of
> git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace").
> If it's passing on latest net-next then maybe I just need to swap out
> that box's kernel for a newer build. I'll try that.

It is possible that it is detecting a bug.
Which test is failing?

At this stage I have mostly added TSO/GSO testing to existing checks.
Perhaps it would be better to break them out into separate checks so that
ping/http can be checked without considering TSO/GSO. That may have some
value in situations where TSO/GSO is broken, which is actually what I am
interested in testing.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next v11 5/6] openvswitch: add layer 3 flow/port support
       [not found]                                                             ` <20160826091322.GE22464-ucRxlxcrRFEsysjaEhV7d2ey4e3TpSOZIxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
@ 2016-08-30 23:23                                                               ` Joe Stringer
  0 siblings, 0 replies; 36+ messages in thread
From: Joe Stringer @ 2016-08-30 23:23 UTC (permalink / raw)
  To: Simon Horman; +Cc: ovs dev, Linux Kernel Network Developers, Jiri Benc

On 26 August 2016 at 02:13, Simon Horman <simon.horman@netronome.com> wrote:
> On Thu, Aug 25, 2016 at 05:33:57PM -0700, Joe Stringer wrote:
>> On 25 August 2016 at 03:08, Simon Horman <simon.horman@netronome.com> wrote:
>> > Please find my working patch below.
>> >
>> > From: Simon Horman <horms@verge.net.au>
>> > Subject: [PATCH] system-traffic: Exercise GSO
>> >
>> > Exercise GSO for: unencapsulated; MPLS; GRE; and MPLS in GRE.
>> >
>> > There is scope to extend this testing to other encapsulation formats
>> > if desired.
>> >
>> > This is motivated by a desire to test GRE and MPLS encapsulation in
>> > the context of L3/VPN (MPLS over non-TEB GRE work). That is not
>> > tested here but tests for those cases would ideally be based on those in
>> > this patch.
>> >
>> > Signed-off-by: Simon Horman <horms@verge.net.au>
>>
>> I realised that these tests disable TSO, but they don't actually check
>> if GSO is enabled. Maybe it's safe to assume this, but it's more
>> explicit to actually look for it in the tests.
>
> Good point, I'll see about checking that.
>
>> With particular setups (fedora23 in particular, both kernel and
>> userspace testsuites) I see this:
>>
>> ./system-traffic.at:371: ip netns exec at_ns0 sh << NS_EXEC_HEREDOC
>> ip route add 10.1.2.0/24 encap mpls 100 via inet 10.1.1.2 dev ns_gre0
>> NS_EXEC_HEREDOC
>> --- /dev/null 2016-08-19 01:28:02.151000000 +0000
>> +++ /home/gitlab-runner/builds/83c49bff/0/root/gitlab-ovs/ovs/tests/system-kmod-testsuite.dir/at-groups/10/stderr 2016-08-25 17:16:27.324000000 +0000
>> @@ -0,0 +1 @@
>> +Error: either "to" is duplicate, or "encap" is a garbage.
>>
>> I'm guessing the ip tool is a little out of date. We could detect and
>> skip this with something like:
>>
>> AT_SKIP_IF([! ip route help 2>&1 | grep encap >/dev/null])
>>
>> in the CHECK_MPLS.
>
> Thanks, I'll add something like that.
>
>> Hmm, I'm still seeing the bad counts of segments retransmitted even
>> with the diff change on a kernel I have built at bf0f500bd019 ("Merge
>> tag 'trace-v4.8-1' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace").
>> If it's passing on latest net-next then maybe I just need to swap out
>> that box's kernel for a newer build. I'll try that.
>
> It is possible that it is detecting a bug.
> Which test is failing?

FWIW I tried with a newer build, commit 9a0a5c4cb1af ("net:
systemport: Fix ordering in intrl2_*_mask_clear macro"). I no longer
see the issue.

Unfortunately I lost my test output. It was one of these two:

  8: datapath - ping over gre tunnel                 FAILED (system-traffic.at:294)
  9: datapath - http over gre tunnel                 FAILED (system-traffic.at:348)

I also realised that I didn't have MPLS routing enabled in my kernel
config, so the MPLS tests were being skipped. I enabled MPLS_ROUTING,
but now I see this failure on the "http over mpls" tests:

./system-traffic.at:111: ip netns exec at_ns0 sh << NS_EXEC_HEREDOC
ip route add 10.1.1.0/24 encap mpls 100 via inet 172.31.1.2 dev p0
NS_EXEC_HEREDOC
--- /dev/null 2016-08-30 15:22:28.813316948 -0700
+++ /home/gitlab-runner/builds/f1d4a2be/0/root/gitlab-ovs/ovs/tests/system-kmod-testsuite.dir/at-groups/4/stderr 2016-08-30 15:33:45.133306581 -0700
@@ -0,0 +1 @@
+RTNETLINK answers: Operation not supported


> At this stage I have mostly added TSO/GSO testing to existing checks.
> Perhaps it would be better to break them out into separate checks so
> ping/http can be checked without considering TSO/GSO, which may have some
> value in situations where TSO/GSO is broken which is actually what I am
> interested in testing.

Sounds reasonable.

* Re: [ovs-dev] [PATCH net-next v11 5/6] openvswitch: add layer 3 flow/port support
  2016-07-15 21:07           ` [ovs-dev] " pravin shelar
  2016-07-18  4:50             ` Simon Horman
@ 2016-09-26 16:53             ` Jiri Benc
  2016-09-27  4:09               ` pravin shelar
  1 sibling, 1 reply; 36+ messages in thread
From: Jiri Benc @ 2016-09-26 16:53 UTC (permalink / raw)
  To: pravin shelar; +Cc: Simon Horman, Linux Kernel Network Developers, ovs dev

Reviving a very old thread, sorry. Simon handed this over to me, I'm
preparing v12.

On Fri, 15 Jul 2016 14:07:37 -0700, pravin shelar wrote:
> I am not sure if you can use only mac_len to detect an L3 packet. This
> does not work with MPLS packets: mac_len is used to account MPLS
> headers pushed on skb. Therefore, in the case of an MPLS header on an L3
> packet, mac_len would be non-zero and we have to look at either
> mac_header or some other metadata, like the is_layer3 flag from the key,
> to check for an L3 packet.

I went through the relevant code paths and I don't see any problem in
using mac_len for that. MPLS GSO seems to work correctly. The kernel
MPLS code expects mac_len to be just the L2 header len, excluding MPLS.
The same is the case for openvswitch (you're not correct that "mac_len
is used to account MPLS headers pushed on skb", at least not with the
current code). I don't see any problem anywhere with mac_len being 0;
the calculations just work.

What was your concern with that, Pravin?

In another mail in this thread you mentioned skb_mpls_header. That one
works correctly with mac_len == 0 if mac_header points to the beginning
of the packet.

You also wrote:

> I was thinking in overall networking stack rather than just ovs
> datapath. I think we should have consistent method of detecting L3
> packet. As commented in previous mail it could be achieved using
> skb-protocol and device type.

Again, mac_len == 0 works correctly and consistently, provided that
both mac_header and network_header point to the same place. In the case of
an MPLS packet, it would be the beginning of the MPLS headers.
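A minimal userspace sketch of the offset argument above, assuming a simplified model rather than the kernel's struct sk_buff; struct skb_model, model_init() and model_mpls_header() are hypothetical names. With mac_header and network_header at the same offset, mac_len is 0 and the unpatched skb_mpls_header() arithmetic (mac header plus mac_len) lands exactly on the first MPLS label.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified userspace model of the sk_buff fields discussed here;
 * this is not the kernel's struct sk_buff. Offsets are from `head`. */
struct skb_model {
	unsigned char *head;
	unsigned char *data;      /* start of packet data */
	uint16_t mac_header;      /* offset of the L2 header */
	uint16_t network_header;  /* offset of the L3 header or MPLS stack */
	uint16_t mac_len;         /* L2 header length only, MPLS excluded */
};

/* Same arithmetic as the unpatched skb_mpls_header(). */
static inline unsigned char *model_mpls_header(const struct skb_model *skb)
{
	return skb->head + skb->mac_header + skb->mac_len;
}

/* Hypothetical constructor: the packet starts at data_off in buf and
 * has an L2 header of l2_len bytes (0 for an L3 packet). */
static inline struct skb_model model_init(unsigned char *buf,
					  uint16_t data_off, uint16_t l2_len)
{
	struct skb_model skb = {
		.head = buf,
		.data = buf + data_off,
		.mac_header = data_off,
		.network_header = (uint16_t)(data_off + l2_len),
		.mac_len = l2_len,
	};

	/* mac_len must span exactly mac_header..network_header. */
	assert(skb.network_header - skb.mac_header == skb.mac_len);
	return skb;
}
```

For an Ethernet frame the helper skips the 14-byte L2 header; for an L3 packet it returns skb->data itself, which is the MPLS label stack.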

> > --- a/include/net/mpls.h
> > +++ b/include/net/mpls.h
> > @@ -34,6 +34,8 @@ static inline bool eth_p_mpls(__be16 eth_type)
> >   */
> >  static inline unsigned char *skb_mpls_header(struct sk_buff *skb)
> >  {
> > -       return skb_mac_header(skb) + skb->mac_len;
> > +       return skb_mac_header_was_set(skb) ?
> > +               skb_mac_header(skb) + skb->mac_len :
> > +               skb->data;
> >  }
> 
> This function is also called from the GSO layer.

I don't see it used anywhere outside of openvswitch. Not even when
grepping git history. I may be missing something, though.
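For comparison, a sketch of the patched helper quoted above, again using a hypothetical userspace model. It assumes, as in kernels of this era, that an unset mac header is marked by storing ~0 in the 16-bit offset; MODEL_MAC_UNSET and the model_* functions are illustrative names, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the kernel's "mac header not set" marker: all ones. */
#define MODEL_MAC_UNSET ((uint16_t)~0U)

struct skb_model {
	unsigned char *head;
	unsigned char *data;   /* start of packet data */
	uint16_t mac_header;   /* offset from head, or MODEL_MAC_UNSET */
	uint16_t mac_len;      /* L2 header length only */
};

static inline bool model_mac_header_was_set(const struct skb_model *skb)
{
	return skb->mac_header != MODEL_MAC_UNSET;
}

/* Same shape as the proposed skb_mpls_header() change: fall back to
 * skb->data when no mac header was recorded, e.g. for a non-TEB packet
 * whose mac header was unset by the ipgre receive path (patch 2/6). */
static inline unsigned char *model_mpls_header(const struct skb_model *skb)
{
	return model_mac_header_was_set(skb) ?
		skb->head + skb->mac_header + skb->mac_len :
		skb->data;
}
```

The fallback only matters when the mac header is unset; when it is set with mac_len == 0, both branches agree, which is the case Jiri argues is already sufficient.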

> The issue is in the GSO layer: it
> resets the mac header and mac length and then calls the MPLS GSO handler,
> so all subsequent checks for L3 packets fail.
> So far we have explored three different ways to detect L3 packets, but
> each has its own issue.
> 1. skb mac header : GSO can reset mac header.
> 2. skb mac length : MPLS uses mac_len to account for MPLS header
> length along with L2 header

It does not appear to be the case. Or at least not anymore.

> 3. skb protocol: ETH_P_TEB is not set for all L2 frames, and the networking
> stack is not ready to handle this type for a given skb.
> 
> So none of them works consistently. I think the only option to detect
> L3 packets reliably (and without adding a field to skb) is to use
> skb-protocol along with ARPHRD_NONE device type. If an ARPHRD_NONE type
> device generates an L2 packet, it needs to set protocol to ETH_P_TEB. Some
> networking stack functions also need to be fixed to handle this
> protocol type, e.g. vlan_get_protocol(), br_dev_queue_push_xmit(),
> etc.

All of this said, I'm not opposed to using the skb_eth_header_present
helper and checking the device type; it works. I just want to understand
whether I missed some problem with mac_len. It seems it would make some
things simpler if we could use mac_len.
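A sketch of the alternative Pravin describes (L3 detection from the device type plus skb protocol). This is a hypothetical helper, not the skb_eth_header_present() from the patch set, whose body is not quoted in this thread. The ARPHRD_NONE and ETH_P_TEB values are copied into local constants so the sketch builds without kernel headers, and the protocol is handled in host byte order for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in <linux/if_arp.h> and <linux/if_ether.h>. */
#define MODEL_ARPHRD_ETHER 1
#define MODEL_ARPHRD_NONE  0xFFFE
#define MODEL_ETH_P_TEB    0x6558

/* Hypothetical helper: returns true when the packet has no L2 header. */
static inline bool model_is_l3_by_devtype(uint16_t dev_type, uint16_t protocol)
{
	/* In this sketch, any device type other than ARPHRD_NONE is
	 * treated as carrying an L2 header. */
	if (dev_type != MODEL_ARPHRD_NONE)
		return false;
	/* On an ARPHRD_NONE device, an L2 frame must be tagged ETH_P_TEB;
	 * anything else is taken to be an L3 packet. */
	return protocol != MODEL_ETH_P_TEB;
}
```

The cost noted in the thread is visible here: correctness depends on every ARPHRD_NONE device setting ETH_P_TEB for L2 frames, whereas a mac_len check needs no cooperation from the device.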

Thanks,

 Jiri

* Re: [PATCH net-next v11 5/6] openvswitch: add layer 3 flow/port support
  2016-09-26 16:53             ` [ovs-dev] " Jiri Benc
@ 2016-09-27  4:09               ` pravin shelar
  0 siblings, 0 replies; 36+ messages in thread
From: pravin shelar @ 2016-09-27  4:09 UTC (permalink / raw)
  To: Jiri Benc; +Cc: ovs dev, Simon Horman, Linux Kernel Network Developers

On Mon, Sep 26, 2016 at 9:53 AM, Jiri Benc <jbenc@redhat.com> wrote:
> Reviving a very old thread, sorry. Simon handed this over to me, I'm
> preparing v12.
>
> On Fri, 15 Jul 2016 14:07:37 -0700, pravin shelar wrote:
>> I am not sure if you can use only mac_len to detect an L3 packet. This
>> does not work with MPLS packets: mac_len is used to account MPLS
>> headers pushed on skb. Therefore, in the case of an MPLS header on an L3
>> packet, mac_len would be non-zero and we have to look at either
>> mac_header or some other metadata, like the is_layer3 flag from the key,
>> to check for an L3 packet.
>
> I went through the relevant code paths and I don't see any problem in
> using mac_len for that. MPLS GSO seems to work correctly. The kernel
> MPLS code expects mac_len to be just the L2 header len, excluding MPLS.
> The same is the case for openvswitch (you're not correct that "mac_len
> is used to account MPLS headers pushed on skb", at least not with the
> current code). I don't see any problem anywhere with mac_len being 0;
> the calculations just work.
>
> What was your concern with that, Pravin?
>
> In another mail in this thread you mentioned skb_mpls_header. That one
> works correctly with mac_len == 0 if mac_header points to the beginning
> of the packet.
>
> You also wrote:
>
>> I was thinking in overall networking stack rather than just ovs
>> datapath. I think we should have consistent method of detecting L3
>> packet. As commented in previous mail it could be achieved using
>> skb-protocol and device type.
>
> Again, mac_len == 0 works correctly and consistently, provided that
> both mac_header and network_header point to the same place. In the case of
> an MPLS packet, it would be the beginning of the MPLS headers.
>
>> > --- a/include/net/mpls.h
>> > +++ b/include/net/mpls.h
>> > @@ -34,6 +34,8 @@ static inline bool eth_p_mpls(__be16 eth_type)
>> >   */
>> >  static inline unsigned char *skb_mpls_header(struct sk_buff *skb)
>> >  {
>> > -       return skb_mac_header(skb) + skb->mac_len;
>> > +       return skb_mac_header_was_set(skb) ?
>> > +               skb_mac_header(skb) + skb->mac_len :
>> > +               skb->data;
>> >  }
>>
>> This function is also called from the GSO layer.
>
> I don't see it used anywhere outside of openvswitch. Not even when
> grepping git history. I may be missing something, though.
>
>> The issue is in the GSO layer: it
>> resets the mac header and mac length and then calls the MPLS GSO handler,
>> so all subsequent checks for L3 packets fail.
>> So far we have explored three different ways to detect L3 packets, but
>> each has its own issue.
>> 1. skb mac header : GSO can reset mac header.
>> 2. skb mac length : MPLS uses mac_len to account for MPLS header
>> length along with L2 header
>
> It does not appear to be the case. Or at least not anymore.
>
>> 3. skb protocol: ETH_P_TEB is not set for all L2 frames, and the networking
>> stack is not ready to handle this type for a given skb.
>>
>> So none of them works consistently. I think the only option to detect
>> L3 packets reliably (and without adding a field to skb) is to use
>> skb-protocol along with ARPHRD_NONE device type. If an ARPHRD_NONE type
>> device generates an L2 packet, it needs to set protocol to ETH_P_TEB. Some
>> networking stack functions also need to be fixed to handle this
>> protocol type, e.g. vlan_get_protocol(), br_dev_queue_push_xmit(),
>> etc.
>
> All of this said, I'm not opposed to using the skb_eth_header_present
> helper and checking the device type; it works. I just want to understand
> whether I missed some problem with mac_len. It seems it would make some
> things simpler if we could use mac_len.
>

After commit 48d2ab609b6bb ("net: mpls: Fixups for GSO"), MPLS does not
need to use skb mac_len to track the header, so using a mac_len test for
L3 packet detection would result in a better and cleaner solution.
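A sketch of why the mac_len test survives an MPLS push once MPLS labels are no longer counted in mac_len. The push routine is a simplified, hypothetical model loosely following the shape of openvswitch's push_mpls() as described in this thread (make room, move the L2 header forward, reset the mac header, put the network header at mac_len); it ignores skb->protocol, ethertype rewriting and GSO state. All model_* names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_MPLS_HLEN 4

struct skb_model {
	unsigned char buf[256];
	uint16_t data;            /* offset of packet start within buf */
	uint16_t len;             /* bytes of packet data from `data` */
	uint16_t mac_header;
	uint16_t network_header;
	uint16_t mac_len;         /* L2 only; MPLS labels are not counted */
};

/* The detection rule suggested at the end of the thread. */
static inline bool model_is_l3(const struct skb_model *skb)
{
	return skb->mac_len == 0;
}

/* Simplified push: insert one label stack entry between the L2 header
 * (if any) and the network header, leaving mac_len untouched. */
static inline void model_push_mpls(struct skb_model *skb, uint32_t lse)
{
	assert(skb->mac_header == skb->data);   /* L2 header at packet start */
	assert(skb->data >= MODEL_MPLS_HLEN);   /* headroom available */

	skb->data -= MODEL_MPLS_HLEN;
	skb->len += MODEL_MPLS_HLEN;
	/* Slide the L2 header (mac_len bytes, possibly 0) into the new room. */
	memmove(skb->buf + skb->data, skb->buf + skb->mac_header, skb->mac_len);
	skb->mac_header = skb->data;
	skb->network_header = (uint16_t)(skb->data + skb->mac_len);

	/* Store the label stack entry in network byte order. */
	unsigned char *p = skb->buf + skb->network_header;
	p[0] = (unsigned char)(lse >> 24);
	p[1] = (unsigned char)(lse >> 16);
	p[2] = (unsigned char)(lse >> 8);
	p[3] = (unsigned char)lse;
}
```

For an L3 packet the push leaves mac_len at 0, so mac_header and network_header both end up on the new label; for an Ethernet frame mac_len stays 14 and the check keeps reporting L2.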

end of thread, other threads:[~2016-09-27  4:09 UTC | newest]

Thread overview: 36+ messages
2016-07-06 17:59 [PATCH net-next v11 0/6] openvswitch: support for layer 3 encapsulated packets Simon Horman
     [not found] ` <1467827996-32547-1-git-send-email-simon.horman-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>
2016-07-06 17:59   ` [PATCH net-next v11 1/6] net: introduce skb_transport_header_was_set() Simon Horman
     [not found]     ` <1467827996-32547-2-git-send-email-simon.horman-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>
2016-07-07 20:51       ` pravin shelar
2016-07-06 17:59   ` [PATCH net-next v11 2/6] gre: unset mac header for non-TEB packets received by ipgre device Simon Horman
2016-07-07 20:51     ` [ovs-dev] " pravin shelar
2016-07-06 17:59   ` [PATCH net-next v11 3/6] openvswitch: set skb protocol and mac_len when receiving on internal device Simon Horman
     [not found]     ` <1467827996-32547-4-git-send-email-simon.horman-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>
2016-07-07 20:52       ` pravin shelar
     [not found]         ` <CAOrHB_B2VDPcEe0B471J+XjmviAbTO0JRPTHiS7jHzF5V8uHZg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-07-13  7:17           ` Simon Horman
2016-07-06 17:59   ` [PATCH net-next v11 4/6] openvswitch: add support to push and pop mpls for layer3 packets Simon Horman
     [not found]     ` <1467827996-32547-5-git-send-email-simon.horman-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>
2016-07-07 20:52       ` pravin shelar
2016-07-10 11:14         ` [ovs-dev] " Simon Horman
2016-07-06 17:59   ` [PATCH net-next v11 5/6] openvswitch: add layer 3 flow/port support Simon Horman
2016-07-07 20:54     ` [ovs-dev] " pravin shelar
     [not found]       ` <CAOrHB_BYD40ZkWbU0dvhPOCcaCVgooksOUkejxyFoagyoiBTNw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-07-13  7:31         ` Simon Horman
2016-07-15 21:07           ` [ovs-dev] " pravin shelar
2016-07-18  4:50             ` Simon Horman
2016-07-18 22:34               ` pravin shelar
     [not found]                 ` <CAOrHB_C3Hq-V4uPWLELSc2VMywjYSnKiFJ4VJQDnPpCu7s1Xkw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-07-20  0:02                   ` Simon Horman
     [not found]                     ` <20160720000243.GA4688-ucRxlxcrRFEsysjaEhV7d2ey4e3TpSOZIxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
2016-07-20 18:06                       ` pravin shelar
2016-08-08 15:17                         ` [ovs-dev] " Simon Horman
2016-08-08 15:28                           ` Jiri Benc
2016-08-10 10:16                             ` Simon Horman
     [not found]                           ` <20160808151716.GA8477-ucRxlxcrRFEsysjaEhV7d2ey4e3TpSOZIxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
2016-08-09 15:47                             ` pravin shelar
     [not found]                               ` <CAOrHB_BYtGsWPSs2pxTjPajqFEP=5YySmqjc93NbdtY96-dYfw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-08-10 10:20                                 ` Simon Horman
     [not found]                                   ` <20160810102043.GE5451-ucRxlxcrRFEsysjaEhV7d2ey4e3TpSOZIxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
2016-08-10 17:17                                     ` Joe Stringer
2016-08-22 11:04                                       ` [ovs-dev] " Simon Horman
     [not found]                                         ` <20160822110444.GA29971-ucRxlxcrRFEsysjaEhV7d2ey4e3TpSOZIxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
2016-08-22 21:47                                           ` Joe Stringer
     [not found]                                             ` <CAPWQB7EQhbcDEk==AmN58Qxndmd6oHpw8z78kj2Q4M4-mD7+Dw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-08-23  8:51                                               ` Simon Horman
     [not found]                                                 ` <20160823085144.GA22304-ucRxlxcrRFEsysjaEhV7d2ey4e3TpSOZIxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
2016-08-25 10:08                                                   ` Simon Horman
     [not found]                                                     ` <20160825100833.GA31926-ucRxlxcrRFEsysjaEhV7d2ey4e3TpSOZIxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
2016-08-26  0:33                                                       ` Joe Stringer
     [not found]                                                         ` <CAPWQB7G8RekHoTMNR5jAJGu7n2i8fNZ1=Fvj4XX_tXVSovpGug-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-08-26  9:13                                                           ` Simon Horman
     [not found]                                                             ` <20160826091322.GE22464-ucRxlxcrRFEsysjaEhV7d2ey4e3TpSOZIxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
2016-08-30 23:23                                                               ` Joe Stringer
     [not found]               ` <20160718045025.GA2490-ucRxlxcrRFEsysjaEhV7d2ey4e3TpSOZIxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
2016-07-21 15:39                 ` Jiri Benc
2016-09-26 16:53             ` [ovs-dev] " Jiri Benc
2016-09-27  4:09               ` pravin shelar
2016-07-06 17:59   ` [PATCH net-next v11 6/6] openvswitch: use ipgre tunnel rather than gretap tunnel Simon Horman
