[PATCH net-next v10 0/5] openvswitch: support for layer 3 encapsulated packets

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH net-next v10 0/5] openvswitch: support for layer 3 encapsulated packets
@ 2016-06-02  6:24 Simon Horman
  2016-06-02  6:24 ` [PATCH net-next v10 1/5] net: add skb_vlan_accel helper Simon Horman
                   ` (4 more replies)
  0 siblings, 5 replies; 23+ messages in thread
From: Simon Horman @ 2016-06-02  6:24 UTC (permalink / raw)
  To: netdev; +Cc: dev

At the core of this patch set is removing the assumption in Open vSwitch
datapath that all packets have Ethernet header. Support for layer 3 GRE
tunnels is also added by this patchset.

The implementation relies on the presence of pop_eth and push_eth actions
in datapath flows to facilitate adding and removing Ethernet headers as
appropriate. The construction of such flows is left up to user-space.

This series is based on work by Lorand Jakab, Thomas Morin and others.
And it relies on recently merged work by Jiri Benc, much thanks to him for
his help.

This patch set is comprised of kernel patches against net-next.

To aid review it and the above dependency is available at:

    tree: https://github.com/horms/openvswitch
    branch: me/l3-vpn
    tag: l3-vpn-v10

There is a companion patch set for the Open vSwitch user-space code
which I will post separately to the dev@openvswitch.org mailing list as:

    "[PATCH v11 0/5] userspace: Support for layer 3 encapsulated packets"

Changes since the previous posting are noted in the changelogs
of individual patches.

Lorand Jakab (1):
  openvswitch: add layer 3 flow/port support

Simon Horman (4):
  net: add skb_vlan_accel helper
  openvswitch: set skb protocol and mac_len when receiving on internal
    device
  openvswitch: add support to push and pop mpls for layer3 packets
  openvswitch: use ipgre tunnel rather than gretap tunnel

 include/linux/skbuff.h               |   1 +
 include/net/gre.h                    |   4 +-
 include/uapi/linux/openvswitch.h     |  13 +++
 net/core/skbuff.c                    |  28 +++--
 net/ipv4/ip_gre.c                    |   9 +-
 net/openvswitch/actions.c            |  72 +++++++++++--
 net/openvswitch/datapath.c           |  13 +--
 net/openvswitch/flow.c               |  57 ++++++----
 net/openvswitch/flow.h               |   4 +-
 net/openvswitch/flow_netlink.c       | 200 +++++++++++++++++++++++++----------
 net/openvswitch/vport-geneve.c       |   2 +-
 net/openvswitch/vport-gre.c          |   4 +-
 net/openvswitch/vport-internal_dev.c |  10 ++
 net/openvswitch/vport-netdev.c       |  43 +++++++-
 net/openvswitch/vport-netdev.h       |   2 +
 net/openvswitch/vport-vxlan.c        |   2 +-
 16 files changed, 344 insertions(+), 120 deletions(-)

-- 
2.1.4

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH net-next v10 1/5] net: add skb_vlan_accel helper
  2016-06-02  6:24 [PATCH net-next v10 0/5] openvswitch: support for layer 3 encapsulated packets Simon Horman
@ 2016-06-02  6:24 ` Simon Horman
  2016-06-02 22:01   ` pravin shelar
  2016-06-02  6:24 ` [PATCH net-next v10 2/5] openvswitch: set skb protocol and mac_len when receiving on internal device Simon Horman
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 23+ messages in thread
From: Simon Horman @ 2016-06-02  6:24 UTC (permalink / raw)
  To: netdev; +Cc: dev

This breaks out some of of skb_vlan_pop into a separate helper.
This new helper moves the outer-most vlan tag present in packet data
into metadata.

The motivation is to allow acceleration VLAN tags without adding a new
one. This is in preparation for a push ethernet header support in Open
vSwitch.

Signed-off-by: Simon Horman <simon.horman@netronome.com>

---
v10 [Simon Horman]
* New patch
---
 include/linux/skbuff.h |  1 +
 net/core/skbuff.c      | 28 +++++++++++++++++++---------
 2 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index ee38a4127475..5a7eb1c6f211 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2994,6 +2994,7 @@ int skb_vlan_pop(struct sk_buff *skb);
 int skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16 vlan_tci);
 struct sk_buff *pskb_extract(struct sk_buff *skb, int off, int to_copy,
 			     gfp_t gfp);
+int skb_vlan_accel(struct sk_buff *skb);
 
 static inline int memcpy_from_msg(void *data, struct msghdr *msg, int len)
 {
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index f2b77e549c03..99bd231e3bf5 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4485,12 +4485,28 @@ pull:
 	return err;
 }
 
-int skb_vlan_pop(struct sk_buff *skb)
+/* Move vlan tag from packet to hw accel tag */
+int skb_vlan_accel(struct sk_buff *skb)
 {
 	u16 vlan_tci;
 	__be16 vlan_proto;
 	int err;
 
+	vlan_proto = skb->protocol;
+	err = __skb_vlan_pop(skb, &vlan_tci);
+	if (unlikely(err))
+		return err;
+
+	__vlan_hwaccel_put_tag(skb, vlan_proto, vlan_tci);
+	return 0;
+}
+EXPORT_SYMBOL(skb_vlan_accel);
+
+int skb_vlan_pop(struct sk_buff *skb)
+{
+	u16 vlan_tci;
+	int err;
+
 	if (likely(skb_vlan_tag_present(skb))) {
 		skb->vlan_tci = 0;
 	} else {
@@ -4503,19 +4519,13 @@ int skb_vlan_pop(struct sk_buff *skb)
 		if (err)
 			return err;
 	}
-	/* move next vlan tag to hw accel tag */
+
 	if (likely((skb->protocol != htons(ETH_P_8021Q) &&
 		    skb->protocol != htons(ETH_P_8021AD)) ||
 		   skb->len < VLAN_ETH_HLEN))
 		return 0;
 
-	vlan_proto = skb->protocol;
-	err = __skb_vlan_pop(skb, &vlan_tci);
-	if (unlikely(err))
-		return err;
-
-	__vlan_hwaccel_put_tag(skb, vlan_proto, vlan_tci);
-	return 0;
+	return skb_vlan_accel(skb);
 }
 EXPORT_SYMBOL(skb_vlan_pop);
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH net-next v10 2/5] openvswitch: set skb protocol and mac_len when receiving on internal device
  2016-06-02  6:24 [PATCH net-next v10 0/5] openvswitch: support for layer 3 encapsulated packets Simon Horman
  2016-06-02  6:24 ` [PATCH net-next v10 1/5] net: add skb_vlan_accel helper Simon Horman
@ 2016-06-02  6:24 ` Simon Horman
       [not found]   ` <1464848686-7656-3-git-send-email-simon.horman-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>
  2016-06-02  6:24 ` [PATCH net-next v10 3/5] openvswitch: add support to push and pop mpls for layer3 packets Simon Horman
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 23+ messages in thread
From: Simon Horman @ 2016-06-02  6:24 UTC (permalink / raw)
  To: netdev; +Cc: dev

* Set skb protocol based on contents of packet. I have observed this is
  necessary to get actual protocol of a packet when it is injected into an
  internal device e.g. by libnet in which case skb protocol will be set to
  ETH_ALL.

* Set the mac_len which has been observed to not be set up correctly when
  an ARP packet is generated and sent via an openvswitch bridge.
  My test case is a scenario where there are two open vswtich bridges.
  One outputs to a tunnel port which egresses on the other.

The motivation for this is that support for outputting to layer 3 (non-tap)
GRE tunnels as implemented by a subsequent patch depends on protocol and
mac_len being set correctly on receive.

Signed-off-by: Simon Horman <simon.horman@netronome.com>

---
v10
* Set mac_len

v9
* New patch
---
 net/openvswitch/vport-internal_dev.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
index 2ee48e447b72..f89b1efa88f1 100644
--- a/net/openvswitch/vport-internal_dev.c
+++ b/net/openvswitch/vport-internal_dev.c
@@ -48,6 +48,10 @@ static int internal_dev_xmit(struct sk_buff *skb, struct net_device *netdev)
 {
 	int len, err;

+	skb->protocol = eth_type_trans(skb, netdev);
+	skb_push(skb, ETH_HLEN);
+	skb_reset_mac_len(skb);
+
 	len = skb->len;
 	rcu_read_lock();
 	err = ovs_vport_receive(internal_dev_priv(netdev)->vport, skb, NULL);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH net-next v10 3/5] openvswitch: add support to push and pop mpls for layer3 packets
  2016-06-02  6:24 [PATCH net-next v10 0/5] openvswitch: support for layer 3 encapsulated packets Simon Horman
  2016-06-02  6:24 ` [PATCH net-next v10 1/5] net: add skb_vlan_accel helper Simon Horman
  2016-06-02  6:24 ` [PATCH net-next v10 2/5] openvswitch: set skb protocol and mac_len when receiving on internal device Simon Horman
@ 2016-06-02  6:24 ` Simon Horman
  2016-06-02 22:02   ` pravin shelar
  2016-06-02  6:24 ` [PATCH net-next v10 4/5] openvswitch: add layer 3 flow/port support Simon Horman
  2016-06-02  6:24 ` [PATCH net-next v10 5/5] openvswitch: use ipgre tunnel rather than gretap tunnel Simon Horman
  4 siblings, 1 reply; 23+ messages in thread
From: Simon Horman @ 2016-06-02  6:24 UTC (permalink / raw)
  To: netdev; +Cc: dev

Allow push and pop mpls actions to act on layer 3 packets by teaching
them not to access non-existent L2 headers of such packets.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
v10
* Limit scope of hdr in {push,pop}_mpls()

v9
* New Patch
---
 net/openvswitch/actions.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 9a3eb7a0ebf4..15f130e4c22b 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -172,7 +172,8 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
 
 	skb_postpush_rcsum(skb, new_mpls_lse, MPLS_HLEN);
 
-	update_ethertype(skb, eth_hdr(skb), mpls->mpls_ethertype);
+	if (skb->mac_len)
+		update_ethertype(skb, eth_hdr(skb), mpls->mpls_ethertype);
 	if (!skb->inner_protocol)
 		skb_set_inner_protocol(skb, skb->protocol);
 	skb->protocol = mpls->mpls_ethertype;
@@ -184,7 +185,6 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
 static int pop_mpls(struct sk_buff *skb, struct sw_flow_key *key,
 		    const __be16 ethertype)
 {
-	struct ethhdr *hdr;
 	int err;
 
 	err = skb_ensure_writable(skb, skb->mac_len + MPLS_HLEN);
@@ -199,11 +199,16 @@ static int pop_mpls(struct sk_buff *skb, struct sw_flow_key *key,
 	__skb_pull(skb, MPLS_HLEN);
 	skb_reset_mac_header(skb);
 
-	/* skb_mpls_header() is used to locate the ethertype
-	 * field correctly in the presence of VLAN tags.
-	 */
-	hdr = (struct ethhdr *)(skb_mpls_header(skb) - ETH_HLEN);
-	update_ethertype(skb, hdr, ethertype);
+	if (skb->mac_len) {
+		struct ethhdr *hdr;
+
+		/* skb_mpls_header() is used to locate the ethertype
+		 * field correctly in the presence of VLAN tags.
+		 */
+		hdr = (struct ethhdr *)(skb_mpls_header(skb) - ETH_HLEN);
+		update_ethertype(skb, hdr, ethertype);
+	}
+
 	if (eth_p_mpls(skb->protocol))
 		skb->protocol = ethertype;
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH net-next v10 4/5] openvswitch: add layer 3 flow/port support
  2016-06-02  6:24 [PATCH net-next v10 0/5] openvswitch: support for layer 3 encapsulated packets Simon Horman
                   ` (2 preceding siblings ...)
  2016-06-02  6:24 ` [PATCH net-next v10 3/5] openvswitch: add support to push and pop mpls for layer3 packets Simon Horman
@ 2016-06-02  6:24 ` Simon Horman
       [not found]   ` <1464848686-7656-5-git-send-email-simon.horman-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>
  2016-06-02  6:24 ` [PATCH net-next v10 5/5] openvswitch: use ipgre tunnel rather than gretap tunnel Simon Horman
  4 siblings, 1 reply; 23+ messages in thread
From: Simon Horman @ 2016-06-02  6:24 UTC (permalink / raw)
  To: netdev; +Cc: dev

From: Lorand Jakab <lojakab@cisco.com>

Implementation of the pop_eth and push_eth actions in the kernel, and
layer 3 flow support.

This doesn't actually do anything yet as no layer 2 tunnel ports are
supported yet. The original patch by Lorand was against the Open vSwitch
tree which has L2 LISP tunnels but that is not supported in mainline Linux.
I (Simon) plan to follow up with support for non-TEB GRE ports based on
work by Thomas Morin.

Cc: Thomas Morin <thomas.morin@orange.com>
Signed-off-by: Lorand Jakab <lojakab@cisco.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>

---
v10 [Simon Horman]
* Move outermost VLAN into skb metadata in pop_eth and
  leave any VLAN as-is in push_eth. The effect is to allow the presence
  of a vlan to be independent of pushing and popping ethernet headers.
* Omit unnecessary type field from push_eth action
* Squash with the following patches to make a more complete patch:
  "openvswitch: add layer 3 support to ovs_packet_cmd_execute()"
  "openvswitch: extend layer 3 support to cover non-IP packets"

v9 [Simon Horman]
* Rebase
* Minor coding style updates
* Prohibit push/pop MPLS on l3 packets
* There are no layer 3 ports supported at this time so only
  send and receive layer 2 packets: that is don't actually
  use this new infrastructure yet
* Expect that vports that can handle layer 3 packets will: have
  a type other than ARPHRD_IPETHER; can also handle layer 2 packets;
  and that packets can be differentiated by layer 2 packets having
  skb->protocol set to htons(ETH_P_TEB)

v1 - v8 [Lorand Jakub]

wip: fix: openvswitch: add support to push and pop

* Consistently use skb_hdr() in push_eth() by assigning
  its value to a local variable.
* Limit scope of hdr in push_mpls()
* Recalculate csum for protocl change in push_mpls.
  - Also needed for pop_mpls?
  - Break out into a fix-patch

Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
 include/uapi/linux/openvswitch.h     |  13 +++
 net/openvswitch/actions.c            |  53 ++++++++++
 net/openvswitch/datapath.c           |  13 +--
 net/openvswitch/flow.c               |  57 ++++++----
 net/openvswitch/flow.h               |   4 +-
 net/openvswitch/flow_netlink.c       | 200 +++++++++++++++++++++++++----------
 net/openvswitch/vport-geneve.c       |   2 +-
 net/openvswitch/vport-gre.c          |   2 +-
 net/openvswitch/vport-internal_dev.c |   6 ++
 net/openvswitch/vport-netdev.c       |  19 +++-
 net/openvswitch/vport-netdev.h       |   2 +
 net/openvswitch/vport-vxlan.c        |   2 +-
 12 files changed, 277 insertions(+), 96 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index bb0d515b7654..ace0ad6229ce 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -329,6 +329,8 @@ enum ovs_key_attr {
 	OVS_KEY_ATTR_CT_ZONE,	/* u16 connection tracking zone. */
 	OVS_KEY_ATTR_CT_MARK,	/* u32 connection tracking mark */
 	OVS_KEY_ATTR_CT_LABELS,	/* 16-octet connection tracking label */
+	OVS_KEY_ATTR_PACKET_ETHERTYPE,  /* be16 Ethernet type for packet
+					 * execution */
 
 #ifdef __KERNEL__
 	OVS_KEY_ATTR_TUNNEL_INFO,  /* struct ip_tunnel_info */
@@ -699,6 +701,15 @@ enum ovs_nat_attr {
 
 #define OVS_NAT_ATTR_MAX (__OVS_NAT_ATTR_MAX - 1)
 
+/*
+ * struct ovs_action_push_eth - %OVS_ACTION_ATTR_PUSH_ETH action argument.
+ * @addresses: Source and destination MAC addresses.
+ * @eth_type: Ethernet type
+ */
+struct ovs_action_push_eth {
+	struct ovs_key_ethernet addresses;
+};
+
 /**
  * enum ovs_action_attr - Action types.
  *
@@ -756,6 +767,8 @@ enum ovs_action_attr {
 				       * The data must be zero for the unmasked
 				       * bits. */
 	OVS_ACTION_ATTR_CT,           /* Nested OVS_CT_ATTR_* . */
+	OVS_ACTION_ATTR_PUSH_ETH,     /* struct ovs_action_push_eth. */
+	OVS_ACTION_ATTR_POP_ETH,      /* No argument. */
 
 	__OVS_ACTION_ATTR_MAX,	      /* Nothing past this will be accepted
 				       * from userspace. */
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 15f130e4c22b..5567529904fa 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -300,6 +300,51 @@ static int set_eth_addr(struct sk_buff *skb, struct sw_flow_key *flow_key,
 	return 0;
 }
 
+static int pop_eth(struct sk_buff *skb, struct sw_flow_key *key)
+{
+	/* Pop outermost VLAN tag to skb metadata unless a VLAN tag
+	 * is already present there.
+	 */
+	if ((skb->protocol == htons(ETH_P_8021Q) ||
+	     skb->protocol == htons(ETH_P_8021AD)) &&
+	    !skb_vlan_tag_present(skb)) {
+		int err = skb_vlan_accel(skb);
+		if (unlikely(err))
+			return err;
+	}
+
+	skb_pull_rcsum(skb, ETH_HLEN);
+	skb_reset_mac_header(skb);
+	skb->mac_len -= ETH_HLEN;
+
+	invalidate_flow_key(key);
+	return 0;
+}
+
+static int push_eth(struct sk_buff *skb, struct sw_flow_key *key,
+		    const struct ovs_action_push_eth *ethh)
+{
+	struct ethhdr *hdr;
+
+	/* Add the new Ethernet header */
+	if (skb_cow_head(skb, ETH_HLEN) < 0)
+		return -ENOMEM;
+
+	skb_push(skb, ETH_HLEN);
+	skb_reset_mac_header(skb);
+	skb_reset_mac_len(skb);
+
+	hdr = eth_hdr(skb);
+	ether_addr_copy(hdr->h_source, ethh->addresses.eth_src);
+	ether_addr_copy(hdr->h_dest, ethh->addresses.eth_dst);
+	hdr->h_proto = skb->protocol;
+
+	skb_postpush_rcsum(skb, hdr, ETH_HLEN);
+
+	invalidate_flow_key(key);
+	return 0;
+}
+
 static void update_ip_l4_checksum(struct sk_buff *skb, struct iphdr *nh,
 				  __be32 addr, __be32 new_addr)
 {
@@ -1088,6 +1133,14 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
 			err = pop_vlan(skb, key);
 			break;
 
+		case OVS_ACTION_ATTR_PUSH_ETH:
+			err = push_eth(skb, key, nla_data(a));
+			break;
+
+		case OVS_ACTION_ATTR_POP_ETH:
+			err = pop_eth(skb, key);
+			break;
+
 		case OVS_ACTION_ATTR_RECIRC:
 			err = execute_recirc(dp, skb, key, a, rem);
 			if (nla_is_last(a, rem)) {
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 856bd8dba676..780957e78d2b 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -546,7 +546,6 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 	struct sw_flow *flow;
 	struct sw_flow_actions *sf_acts;
 	struct datapath *dp;
-	struct ethhdr *eth;
 	struct vport *input_vport;
 	u16 mru = 0;
 	int len;
@@ -567,17 +566,6 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 
 	nla_memcpy(__skb_put(packet, len), a[OVS_PACKET_ATTR_PACKET], len);
 
-	skb_reset_mac_header(packet);
-	eth = eth_hdr(packet);
-
-	/* Normally, setting the skb 'protocol' field would be handled by a
-	 * call to eth_type_trans(), but it assumes there's a sending
-	 * device, which we may not have. */
-	if (eth_proto_is_802_3(eth->h_proto))
-		packet->protocol = eth->h_proto;
-	else
-		packet->protocol = htons(ETH_P_802_2);
-
 	/* Set packet's mru */
 	if (a[OVS_PACKET_ATTR_MRU]) {
 		mru = nla_get_u16(a[OVS_PACKET_ATTR_MRU]);
@@ -604,6 +592,7 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 	rcu_assign_pointer(flow->sf_acts, acts);
 	packet->priority = flow->key.phy.priority;
 	packet->mark = flow->key.phy.skb_mark;
+	packet->protocol = flow->key.eth.type;
 
 	rcu_read_lock();
 	dp = get_dp_rcu(net, ovs_header->dp_ifindex);
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 0ea128eeeab2..2d9777abcfc9 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -468,28 +468,31 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
 
 	skb_reset_mac_header(skb);
 
-	/* Link layer.  We are guaranteed to have at least the 14 byte Ethernet
-	 * header in the linear data area.
-	 */
-	eth = eth_hdr(skb);
-	ether_addr_copy(key->eth.src, eth->h_source);
-	ether_addr_copy(key->eth.dst, eth->h_dest);
+	/* Link layer. */
+	key->eth.tci = 0;
+	if (key->phy.is_layer3) {
+		if (skb_vlan_tag_present(skb))
+			key->eth.tci = htons(skb->vlan_tci);
+	} else {
+		eth = eth_hdr(skb);
+		ether_addr_copy(key->eth.src, eth->h_source);
+		ether_addr_copy(key->eth.dst, eth->h_dest);
 
-	__skb_pull(skb, 2 * ETH_ALEN);
-	/* We are going to push all headers that we pull, so no need to
-	 * update skb->csum here.
-	 */
+		__skb_pull(skb, 2 * ETH_ALEN);
+		/* We are going to push all headers that we pull, so no need to
+		 * update skb->csum here.
+		 */
 
-	key->eth.tci = 0;
-	if (skb_vlan_tag_present(skb))
-		key->eth.tci = htons(skb->vlan_tci);
-	else if (eth->h_proto == htons(ETH_P_8021Q))
-		if (unlikely(parse_vlan(skb, key)))
-			return -ENOMEM;
+		if (skb_vlan_tag_present(skb))
+			key->eth.tci = htons(skb->vlan_tci);
+		else if (eth->h_proto == htons(ETH_P_8021Q))
+			if (unlikely(parse_vlan(skb, key)))
+				return -ENOMEM;
 
-	key->eth.type = parse_ethertype(skb);
-	if (unlikely(key->eth.type == htons(0)))
-		return -ENOMEM;
+		key->eth.type = parse_ethertype(skb);
+		if (unlikely(key->eth.type == htons(0)))
+			return -ENOMEM;
+	}
 
 	skb_reset_network_header(skb);
 	skb_reset_mac_len(skb);
@@ -690,12 +693,16 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
 
 int ovs_flow_key_update(struct sk_buff *skb, struct sw_flow_key *key)
 {
+	key->eth.type = skb->protocol;
+
 	return key_extract(skb, key);
 }
 
 int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
 			 struct sk_buff *skb, struct sw_flow_key *key)
 {
+	int err;
+
 	/* Extract metadata from packet. */
 	if (tun_info) {
 		key->tun_proto = ip_tunnel_info_af(tun_info);
@@ -723,9 +730,19 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
 	key->phy.skb_mark = skb->mark;
 	ovs_ct_fill_key(skb, key);
 	key->ovs_flow_hash = 0;
+	key->phy.is_layer3 = skb->mac_len == 0;
 	key->recirc_id = 0;
 
-	return key_extract(skb, key);
+	err = key_extract(skb, key);
+	if (err < 0)
+		return err;
+
+	if (key->phy.is_layer3)
+		key->eth.type = skb->protocol;
+	else if (tun_info && skb->protocol == htons(ETH_P_TEB))
+		skb->protocol = key->eth.type;
+
+	return err;
 }
 
 int ovs_flow_key_extract_userspace(struct net *net, const struct nlattr *attr,
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index 03378e75a67c..5395ec0c3c13 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -62,6 +62,7 @@ struct sw_flow_key {
 		u32	priority;	/* Packet QoS priority. */
 		u32	skb_mark;	/* SKB mark. */
 		u16	in_port;	/* Input switch port (or DP_MAX_PORTS). */
+		bool	is_layer3;	/* Packet has no Ethernet header */
 	} __packed phy; /* Safe when right after 'tun_key'. */
 	u8 tun_proto;			/* Protocol of encapsulating tunnel. */
 	u32 ovs_flow_hash;		/* Datapath computed hash value.  */
@@ -219,8 +220,7 @@ u64 ovs_flow_used_time(unsigned long flow_jiffies);
 
 int ovs_flow_key_update(struct sk_buff *skb, struct sw_flow_key *key);
 int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
-			 struct sk_buff *skb,
-			 struct sw_flow_key *key);
+			 struct sk_buff *skb, struct sw_flow_key *key);
 /* Extract key from packet coming from userspace. */
 int ovs_flow_key_extract_userspace(struct net *net, const struct nlattr *attr,
 				   struct sk_buff *skb,
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 0bb650f4f219..1e1392c3c0ed 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -123,7 +123,7 @@ static void update_range(struct sw_flow_match *match,
 static bool match_validate(const struct sw_flow_match *match,
 			   u64 key_attrs, u64 mask_attrs, bool log)
 {
-	u64 key_expected = 1 << OVS_KEY_ATTR_ETHERNET;
+	u64 key_expected = 0;
 	u64 mask_allowed = key_attrs;  /* At most allow all key attributes */
 
 	/* The following mask attributes allowed only if they
@@ -145,6 +145,10 @@ static bool match_validate(const struct sw_flow_match *match,
 		       | (1 << OVS_KEY_ATTR_IN_PORT)
 		       | (1 << OVS_KEY_ATTR_ETHERTYPE));
 
+	/* If Ethertype is present, expect MAC addresses */
+	if (key_attrs & (1ULL << OVS_KEY_ATTR_ETHERTYPE))
+		key_expected |= 1ULL << OVS_KEY_ATTR_ETHERNET;
+
 	/* Check key attributes. */
 	if (match->key->eth.type == htons(ETH_P_ARP)
 			|| match->key->eth.type == htons(ETH_P_RARP)) {
@@ -282,7 +286,7 @@ size_t ovs_key_attr_size(void)
 	/* Whenever adding new OVS_KEY_ FIELDS, we should consider
 	 * updating this function.
 	 */
-	BUILD_BUG_ON(OVS_KEY_ATTR_TUNNEL_INFO != 26);
+	BUILD_BUG_ON(OVS_KEY_ATTR_TUNNEL_INFO != 27);
 
 	return    nla_total_size(4)   /* OVS_KEY_ATTR_PRIORITY */
 		+ nla_total_size(0)   /* OVS_KEY_ATTR_TUNNEL */
@@ -355,6 +359,7 @@ static const struct ovs_len_tbl ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
 	[OVS_KEY_ATTR_CT_ZONE]	 = { .len = sizeof(u16) },
 	[OVS_KEY_ATTR_CT_MARK]	 = { .len = sizeof(u32) },
 	[OVS_KEY_ATTR_CT_LABELS] = { .len = sizeof(struct ovs_key_ct_labels) },
+	[OVS_KEY_ATTR_PACKET_ETHERTYPE] = { .len = sizeof(__be16) },
 };
 
 static bool check_attr_len(unsigned int attr_len, unsigned int expected_len)
@@ -812,6 +817,8 @@ static int metadata_from_nlattrs(struct net *net, struct sw_flow_match *match,
 				 u64 *attrs, const struct nlattr **a,
 				 bool is_mask, bool log)
 {
+	bool is_layer3 = false;
+
 	if (*attrs & (1 << OVS_KEY_ATTR_DP_HASH)) {
 		u32 hash_val = nla_get_u32(a[OVS_KEY_ATTR_DP_HASH]);
 
@@ -898,20 +905,41 @@ static int metadata_from_nlattrs(struct net *net, struct sw_flow_match *match,
 				   sizeof(*cl), is_mask);
 		*attrs &= ~(1ULL << OVS_KEY_ATTR_CT_LABELS);
 	}
-	return 0;
-}
 
-static int ovs_key_from_nlattrs(struct net *net, struct sw_flow_match *match,
-				u64 attrs, const struct nlattr **a,
-				bool is_mask, bool log)
-{
-	int err;
+	/* For layer 3 packets the ethernet type is provided by by the
+	 * metadata field OVS_KEY_ATTR_PACKET_ETHERTYPE which has a
+	 * non-zero value. Otherwise the ethernet type is provided by the
+	 * packet and reflected by OVS_KEY_ATTR_PACKET_ETHERTYPE.
+	 */
+	if (*attrs & (1ULL << OVS_KEY_ATTR_PACKET_ETHERTYPE)) {
+		/* This is duplicate code from ovs_key_from_nlattrs*/
+		__be16 eth_type;
 
-	err = metadata_from_nlattrs(net, match, &attrs, a, is_mask, log);
-	if (err)
-		return err;
+		if (is_mask)
+			/* Always exact match packet EtherType */
+			eth_type = htons(0xffff);
+		else
+			eth_type = nla_get_be16(a[OVS_KEY_ATTR_PACKET_ETHERTYPE]);
+
+		if (eth_type != htons(0)) {
+			is_layer3 = true;
+			SW_FLOW_KEY_PUT(match, eth.type, eth_type, is_mask);
+		}
+
+		*attrs &= ~(1ULL << OVS_KEY_ATTR_PACKET_ETHERTYPE);
+	}
 
-	if (attrs & (1 << OVS_KEY_ATTR_ETHERNET)) {
+	/* Always exact match is_layer3 */
+	SW_FLOW_KEY_PUT(match, phy.is_layer3, is_mask ? true : is_layer3,
+			is_mask);
+	return is_layer3;
+}
+
+static int l2_from_nlattrs(struct net *net, struct sw_flow_match *match,
+			   u64 *attrs, const struct nlattr **a,
+			   bool is_mask, bool log)
+{
+	if (*attrs & (1 << OVS_KEY_ATTR_ETHERNET)) {
 		const struct ovs_key_ethernet *eth_key;
 
 		eth_key = nla_data(a[OVS_KEY_ATTR_ETHERNET]);
@@ -919,10 +947,10 @@ static int ovs_key_from_nlattrs(struct net *net, struct sw_flow_match *match,
 				eth_key->eth_src, ETH_ALEN, is_mask);
 		SW_FLOW_KEY_MEMCPY(match, eth.dst,
 				eth_key->eth_dst, ETH_ALEN, is_mask);
-		attrs &= ~(1 << OVS_KEY_ATTR_ETHERNET);
+		*attrs &= ~(1 << OVS_KEY_ATTR_ETHERNET);
 	}
 
-	if (attrs & (1 << OVS_KEY_ATTR_VLAN)) {
+	if (*attrs & (1 << OVS_KEY_ATTR_VLAN)) {
 		__be16 tci;
 
 		tci = nla_get_be16(a[OVS_KEY_ATTR_VLAN]);
@@ -936,10 +964,10 @@ static int ovs_key_from_nlattrs(struct net *net, struct sw_flow_match *match,
 		}
 
 		SW_FLOW_KEY_PUT(match, eth.tci, tci, is_mask);
-		attrs &= ~(1 << OVS_KEY_ATTR_VLAN);
+		*attrs &= ~(1 << OVS_KEY_ATTR_VLAN);
 	}
 
-	if (attrs & (1 << OVS_KEY_ATTR_ETHERTYPE)) {
+	if (*attrs & (1 << OVS_KEY_ATTR_ETHERTYPE)) {
 		__be16 eth_type;
 
 		eth_type = nla_get_be16(a[OVS_KEY_ATTR_ETHERTYPE]);
@@ -953,11 +981,32 @@ static int ovs_key_from_nlattrs(struct net *net, struct sw_flow_match *match,
 		}
 
 		SW_FLOW_KEY_PUT(match, eth.type, eth_type, is_mask);
-		attrs &= ~(1 << OVS_KEY_ATTR_ETHERTYPE);
+		*attrs &= ~(1 << OVS_KEY_ATTR_ETHERTYPE);
 	} else if (!is_mask) {
 		SW_FLOW_KEY_PUT(match, eth.type, htons(ETH_P_802_2), is_mask);
 	}
 
+	return 0;
+}
+
+static int ovs_key_from_nlattrs(struct net *net, struct sw_flow_match *match,
+				u64 attrs, const struct nlattr **a,
+				bool is_mask, bool log)
+{
+	int err;
+	bool is_layer3;
+
+	err = metadata_from_nlattrs(net, match, &attrs, a, is_mask, log);
+	if (err < 0)
+		return err;
+	is_layer3 = err != 0;
+
+	if (!is_layer3) {
+		err = l2_from_nlattrs(net, match, &attrs, a, is_mask, log);
+		if (err < 0)
+			return err;
+	}
+
 	if (attrs & (1 << OVS_KEY_ATTR_IPV4)) {
 		const struct ovs_key_ipv4 *ipv4_key;
 
@@ -1407,7 +1456,11 @@ int ovs_nla_get_flow_metadata(struct net *net, const struct nlattr *attr,
 	memset(&key->ct, 0, sizeof(key->ct));
 	key->phy.in_port = DP_MAX_PORTS;
 
-	return metadata_from_nlattrs(net, &match, &attrs, a, false, log);
+	err = metadata_from_nlattrs(net, &match, &attrs, a, false, log);
+	if (err < 0)
+		return err;
+
+	return 0;
 }
 
 static int __ovs_nla_put_key(const struct sw_flow_key *swkey,
@@ -1415,7 +1468,7 @@ static int __ovs_nla_put_key(const struct sw_flow_key *swkey,
 			     struct sk_buff *skb)
 {
 	struct ovs_key_ethernet *eth_key;
-	struct nlattr *nla, *encap;
+	struct nlattr *nla, *encap = NULL;
 
 	if (nla_put_u32(skb, OVS_KEY_ATTR_RECIRC_ID, output->recirc_id))
 		goto nla_put_failure;
@@ -1456,42 +1509,48 @@ static int __ovs_nla_put_key(const struct sw_flow_key *swkey,
 	if (ovs_ct_put_key(output, skb))
 		goto nla_put_failure;
 
-	nla = nla_reserve(skb, OVS_KEY_ATTR_ETHERNET, sizeof(*eth_key));
-	if (!nla)
-		goto nla_put_failure;
+	if (!swkey->phy.is_layer3) {
+		nla = nla_reserve(skb, OVS_KEY_ATTR_ETHERNET, sizeof(*eth_key));
+		if (!nla)
+			goto nla_put_failure;
 
-	eth_key = nla_data(nla);
-	ether_addr_copy(eth_key->eth_src, output->eth.src);
-	ether_addr_copy(eth_key->eth_dst, output->eth.dst);
+		eth_key = nla_data(nla);
+		ether_addr_copy(eth_key->eth_src, output->eth.src);
+		ether_addr_copy(eth_key->eth_dst, output->eth.dst);
 
-	if (swkey->eth.tci || swkey->eth.type == htons(ETH_P_8021Q)) {
-		__be16 eth_type;
-		eth_type = !is_mask ? htons(ETH_P_8021Q) : htons(0xffff);
-		if (nla_put_be16(skb, OVS_KEY_ATTR_ETHERTYPE, eth_type) ||
-		    nla_put_be16(skb, OVS_KEY_ATTR_VLAN, output->eth.tci))
-			goto nla_put_failure;
-		encap = nla_nest_start(skb, OVS_KEY_ATTR_ENCAP);
-		if (!swkey->eth.tci)
-			goto unencap;
-	} else
-		encap = NULL;
-
-	if (swkey->eth.type == htons(ETH_P_802_2)) {
-		/*
-		 * Ethertype 802.2 is represented in the netlink with omitted
-		 * OVS_KEY_ATTR_ETHERTYPE in the flow key attribute, and
-		 * 0xffff in the mask attribute.  Ethertype can also
-		 * be wildcarded.
-		 */
-		if (is_mask && output->eth.type)
-			if (nla_put_be16(skb, OVS_KEY_ATTR_ETHERTYPE,
-						output->eth.type))
+		if (swkey->eth.tci || swkey->eth.type == htons(ETH_P_8021Q)) {
+			__be16 eth_type;
+			eth_type = !is_mask ? htons(ETH_P_8021Q) : htons(0xffff);
+			if (nla_put_be16(skb, OVS_KEY_ATTR_ETHERTYPE, eth_type) ||
+			    nla_put_be16(skb, OVS_KEY_ATTR_VLAN,
+					 output->eth.tci))
 				goto nla_put_failure;
-		goto unencap;
-	}
+			encap = nla_nest_start(skb, OVS_KEY_ATTR_ENCAP);
+			if (!swkey->eth.tci)
+				goto unencap;
+		}
 
-	if (nla_put_be16(skb, OVS_KEY_ATTR_ETHERTYPE, output->eth.type))
-		goto nla_put_failure;
+		if (swkey->eth.type == htons(ETH_P_802_2)) {
+			/*
+			 * Ethertype 802.2 is represented in the netlink
+			 * with omitted OVS_KEY_ATTR_ETHERTYPE in the flow
+			 * key attribute, and 0xffff in the mask attribute.
+			 * Ethertype can also be wildcarded.
+			 */
+			if (is_mask && output->eth.type)
+				if (nla_put_be16(skb, OVS_KEY_ATTR_ETHERTYPE,
+						 output->eth.type))
+					goto nla_put_failure;
+			goto unencap;
+		}
+
+		if (nla_put_be16(skb, OVS_KEY_ATTR_ETHERTYPE, output->eth.type))
+			goto nla_put_failure;
+	} else {
+		if (nla_put_be16(skb, OVS_KEY_ATTR_PACKET_ETHERTYPE,
+				 output->eth.type))
+			goto nla_put_failure;
+	}
 
 	if (swkey->eth.type == htons(ETH_P_IP)) {
 		struct ovs_key_ipv4 *ipv4_key;
@@ -2010,8 +2069,8 @@ static bool validate_masked(u8 *data, int len)
 
 static int validate_set(const struct nlattr *a,
 			const struct sw_flow_key *flow_key,
-			struct sw_flow_actions **sfa,
-			bool *skip_copy, __be16 eth_type, bool masked, bool log)
+			struct sw_flow_actions **sfa, bool *skip_copy,
+			__be16 eth_type, bool masked, bool log, bool is_layer3)
 {
 	const struct nlattr *ovs_key = nla_data(a);
 	int key_type = nla_type(ovs_key);
@@ -2041,7 +2100,11 @@ static int validate_set(const struct nlattr *a,
 	case OVS_KEY_ATTR_SKB_MARK:
 	case OVS_KEY_ATTR_CT_MARK:
 	case OVS_KEY_ATTR_CT_LABELS:
+		break;
+
 	case OVS_KEY_ATTR_ETHERNET:
+		if (is_layer3)
+			return -EINVAL;
 		break;
 
 	case OVS_KEY_ATTR_TUNNEL:
@@ -2208,6 +2271,7 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 				  int depth, struct sw_flow_actions **sfa,
 				  __be16 eth_type, __be16 vlan_tci, bool log)
 {
+	bool is_layer3 = key->phy.is_layer3;
 	const struct nlattr *a;
 	int rem, err;
 
@@ -2229,6 +2293,8 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 			[OVS_ACTION_ATTR_SAMPLE] = (u32)-1,
 			[OVS_ACTION_ATTR_HASH] = sizeof(struct ovs_action_hash),
 			[OVS_ACTION_ATTR_CT] = (u32)-1,
+			[OVS_ACTION_ATTR_PUSH_ETH] = sizeof(struct ovs_action_push_eth),
+			[OVS_ACTION_ATTR_POP_ETH] = 0,
 		};
 		const struct ovs_action_push_vlan *vlan;
 		int type = nla_type(a);
@@ -2269,10 +2335,14 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 		}
 
 		case OVS_ACTION_ATTR_POP_VLAN:
+			if (is_layer3)
+				return -EINVAL;
 			vlan_tci = htons(0);
 			break;
 
 		case OVS_ACTION_ATTR_PUSH_VLAN:
+			if (is_layer3)
+				return -EINVAL;
 			vlan = nla_data(a);
 			if (vlan->vlan_tpid != htons(ETH_P_8021Q))
 				return -EINVAL;
@@ -2322,14 +2392,16 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 
 		case OVS_ACTION_ATTR_SET:
 			err = validate_set(a, key, sfa,
-					   &skip_copy, eth_type, false, log);
+					   &skip_copy, eth_type, false, log,
+					   is_layer3);
 			if (err)
 				return err;
 			break;
 
 		case OVS_ACTION_ATTR_SET_MASKED:
 			err = validate_set(a, key, sfa,
-					   &skip_copy, eth_type, true, log);
+					   &skip_copy, eth_type, true, log,
+					   is_layer3);
 			if (err)
 				return err;
 			break;
@@ -2349,6 +2421,22 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 			skip_copy = true;
 			break;
 
+		case OVS_ACTION_ATTR_POP_ETH:
+			if (is_layer3)
+				return -EINVAL;
+			if (vlan_tci & htons(VLAN_TAG_PRESENT))
+				return -EINVAL;
+			is_layer3 = true;
+			break;
+
+		case OVS_ACTION_ATTR_PUSH_ETH:
+			/* For now disallow pushing an Ethernet header if one
+			 * is already present */
+			if (!is_layer3)
+				return -EINVAL;
+			is_layer3 = false;
+			break;
+
 		default:
 			OVS_NLERR(log, "Unknown Action type %d", type);
 			return -EINVAL;
diff --git a/net/openvswitch/vport-geneve.c b/net/openvswitch/vport-geneve.c
index 1a1fcec88695..7a06e19f5279 100644
--- a/net/openvswitch/vport-geneve.c
+++ b/net/openvswitch/vport-geneve.c
@@ -116,7 +116,7 @@ static struct vport_ops ovs_geneve_vport_ops = {
 	.create		= geneve_create,
 	.destroy	= ovs_netdev_tunnel_destroy,
 	.get_options	= geneve_get_options,
-	.send		= dev_queue_xmit,
+	.send		= ovs_netdev_send,
 };
 
 static int __init ovs_geneve_tnl_init(void)
diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c
index 7f8897f33a67..bcbc91b8b077 100644
--- a/net/openvswitch/vport-gre.c
+++ b/net/openvswitch/vport-gre.c
@@ -87,7 +87,7 @@ static struct vport *gre_create(const struct vport_parms *parms)
 static struct vport_ops ovs_gre_vport_ops = {
 	.type		= OVS_VPORT_TYPE_GRE,
 	.create		= gre_create,
-	.send		= dev_queue_xmit,
+	.send		= ovs_netdev_send,
 	.destroy	= ovs_netdev_tunnel_destroy,
 };
 
diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
index f89b1efa88f1..484ba529c682 100644
--- a/net/openvswitch/vport-internal_dev.c
+++ b/net/openvswitch/vport-internal_dev.c
@@ -258,6 +258,12 @@ static netdev_tx_t internal_dev_recv(struct sk_buff *skb)
 	struct net_device *netdev = skb->dev;
 	struct pcpu_sw_netstats *stats;
 
+	/* Only send/receive L2 packets */
+	if (!skb->mac_len) {
+		kfree_skb(skb);
+		return -EINVAL;
+	}
+
 	if (unlikely(!(netdev->flags & IFF_UP))) {
 		kfree_skb(skb);
 		netdev->stats.rx_dropped++;
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index 4e3972344aa6..733e7914f6bd 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -57,8 +57,10 @@ static void netdev_port_receive(struct sk_buff *skb)
 	if (unlikely(!skb))
 		return;
 
-	skb_push(skb, ETH_HLEN);
-	skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
+	if (vport->dev->type == ARPHRD_ETHER) {
+		skb_push(skb, ETH_HLEN);
+		skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
+	}
 	ovs_vport_receive(vport, skb, skb_tunnel_info(skb));
 	return;
 error:
@@ -194,6 +196,17 @@ void ovs_netdev_tunnel_destroy(struct vport *vport)
 }
 EXPORT_SYMBOL_GPL(ovs_netdev_tunnel_destroy);
 
+int ovs_netdev_send(struct sk_buff *skb)
+{
+	/* Only send L2 packets */
+	if (skb->mac_len)
+		return dev_queue_xmit(skb);
+
+	kfree_skb(skb);
+	return -EINVAL;
+}
+EXPORT_SYMBOL_GPL(ovs_netdev_send);
+
 /* Returns null if this device is not attached to a datapath. */
 struct vport *ovs_netdev_get_vport(struct net_device *dev)
 {
@@ -208,7 +221,7 @@ static struct vport_ops ovs_netdev_vport_ops = {
 	.type		= OVS_VPORT_TYPE_NETDEV,
 	.create		= netdev_create,
 	.destroy	= netdev_destroy,
-	.send		= dev_queue_xmit,
+	.send		= ovs_netdev_send,
 };
 
 int __init ovs_netdev_init(void)
diff --git a/net/openvswitch/vport-netdev.h b/net/openvswitch/vport-netdev.h
index 19e29c12adcc..637b14a9963c 100644
--- a/net/openvswitch/vport-netdev.h
+++ b/net/openvswitch/vport-netdev.h
@@ -33,4 +33,6 @@ int __init ovs_netdev_init(void);
 void ovs_netdev_exit(void);
 
 void ovs_netdev_tunnel_destroy(struct vport *vport);
+
+int ovs_netdev_send(struct sk_buff *skb);
 #endif /* vport_netdev.h */
diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c
index 5eb7694348b5..13f11ad7e35a 100644
--- a/net/openvswitch/vport-vxlan.c
+++ b/net/openvswitch/vport-vxlan.c
@@ -153,7 +153,7 @@ static struct vport_ops ovs_vxlan_netdev_vport_ops = {
 	.create			= vxlan_create,
 	.destroy		= ovs_netdev_tunnel_destroy,
 	.get_options		= vxlan_get_options,
-	.send			= dev_queue_xmit,
+	.send			= ovs_netdev_send,
 };
 
 static int __init ovs_vxlan_tnl_init(void)
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH net-next v10 5/5] openvswitch: use ipgre tunnel rather than gretap tunnel
  2016-06-02  6:24 [PATCH net-next v10 0/5] openvswitch: support for layer 3 encapsulated packets Simon Horman
                   ` (3 preceding siblings ...)
  2016-06-02  6:24 ` [PATCH net-next v10 4/5] openvswitch: add layer 3 flow/port support Simon Horman
@ 2016-06-02  6:24 ` Simon Horman
  4 siblings, 0 replies; 23+ messages in thread
From: Simon Horman @ 2016-06-02  6:24 UTC (permalink / raw)
  To: netdev; +Cc: dev

This allows GRE tunnels to send and receive both
layer 2 packets (packets with an ethernet header) and
layer 3 packets (packets without an ethernet header).

Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
v10
* Handle case of l3 only packets on vport-netdev
* Use ARPHRD_NONE for ipgre interfaces as per recent changes in mainline
* Ensure skb->mac_len is set correctly in netdev_port_receive and
  relay on this value to differentiate layer3 packets in
  ovs_flow_key_extract()

Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
 include/net/gre.h              |  4 ++--
 net/ipv4/ip_gre.c              |  9 +++++----
 net/openvswitch/vport-gre.c    |  2 +-
 net/openvswitch/vport-netdev.c | 34 ++++++++++++++++++++++++++++------
 4 files changed, 36 insertions(+), 13 deletions(-)

diff --git a/include/net/gre.h b/include/net/gre.h
index 5dce30a6abe3..aeb748a97e0d 100644
--- a/include/net/gre.h
+++ b/include/net/gre.h
@@ -23,8 +23,8 @@ struct gre_protocol {
 int gre_add_protocol(const struct gre_protocol *proto, u8 version);
 int gre_del_protocol(const struct gre_protocol *proto, u8 version);
 
-struct net_device *gretap_fb_dev_create(struct net *net, const char *name,
-				       u8 name_assign_type);
+struct net_device *gre_fb_dev_create(struct net *net, const char *name,
+				     u8 name_assign_type);
 int gre_parse_header(struct sk_buff *skb, struct tnl_ptk_info *tpi,
 		     bool *csum_err, __be16 proto);
 
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 4d2025f7ec57..58d323289872 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -1116,8 +1116,8 @@ static struct rtnl_link_ops ipgre_tap_ops __read_mostly = {
 	.get_link_net	= ip_tunnel_get_link_net,
 };
 
-struct net_device *gretap_fb_dev_create(struct net *net, const char *name,
-					u8 name_assign_type)
+struct net_device *gre_fb_dev_create(struct net *net, const char *name,
+				     u8 name_assign_type)
 {
 	struct nlattr *tb[IFLA_MAX + 1];
 	struct net_device *dev;
@@ -1127,13 +1127,14 @@ struct net_device *gretap_fb_dev_create(struct net *net, const char *name,
 	memset(&tb, 0, sizeof(tb));
 
 	dev = rtnl_create_link(net, name, name_assign_type,
-			       &ipgre_tap_ops, tb);
+			       &ipgre_link_ops, tb);
 	if (IS_ERR(dev))
 		return dev;
 
 	/* Configure flow based GRE device. */
 	t = netdev_priv(dev);
 	t->collect_md = true;
+	dev->type = ARPHRD_NONE;
 
 	err = ipgre_newlink(net, dev, tb, NULL);
 	if (err < 0)
@@ -1151,7 +1152,7 @@ out:
 	free_netdev(dev);
 	return ERR_PTR(err);
 }
-EXPORT_SYMBOL_GPL(gretap_fb_dev_create);
+EXPORT_SYMBOL_GPL(gre_fb_dev_create);
 
 static int __net_init ipgre_tap_init_net(struct net *net)
 {
diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c
index bcbc91b8b077..c1cab9dd392f 100644
--- a/net/openvswitch/vport-gre.c
+++ b/net/openvswitch/vport-gre.c
@@ -60,7 +60,7 @@ static struct vport *gre_tnl_create(const struct vport_parms *parms)
 		return vport;
 
 	rtnl_lock();
-	dev = gretap_fb_dev_create(net, parms->name, NET_NAME_USER);
+	dev = gre_fb_dev_create(net, parms->name, NET_NAME_USER);
 	if (IS_ERR(dev)) {
 		rtnl_unlock();
 		ovs_vport_free(vport);
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index 733e7914f6bd..3df36df62ee9 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -60,7 +60,24 @@ static void netdev_port_receive(struct sk_buff *skb)
 	if (vport->dev->type == ARPHRD_ETHER) {
 		skb_push(skb, ETH_HLEN);
 		skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
+	} else if (vport->dev->type == ARPHRD_NONE) {
+		if (skb->protocol == htons(ETH_P_TEB)) {
+			__be16 eth_type;
+
+			if (unlikely(skb->len < ETH_HLEN))
+				goto error;
+
+			eth_type = eth_type_trans(skb, skb->dev);
+			skb->mac_len = skb->data - skb_mac_header(skb);
+			__skb_push(skb, skb->mac_len);
+
+			if (eth_type == htons(ETH_P_8021Q))
+				skb->mac_len += VLAN_HLEN;
+		} else {
+			skb->mac_len = 0;
+		}
 	}
+
 	ovs_vport_receive(vport, skb, skb_tunnel_info(skb));
 	return;
 error:
@@ -99,7 +116,8 @@ struct vport *ovs_netdev_link(struct vport *vport, const char *name)
 	}
 
 	if (vport->dev->flags & IFF_LOOPBACK ||
-	    vport->dev->type != ARPHRD_ETHER ||
+	    (vport->dev->type != ARPHRD_ETHER &&
+	     vport->dev->type != ARPHRD_NONE) ||
 	    ovs_is_internal_dev(vport->dev)) {
 		err = -EINVAL;
 		goto error_put;
@@ -198,12 +216,16 @@ EXPORT_SYMBOL_GPL(ovs_netdev_tunnel_destroy);
 
 int ovs_netdev_send(struct sk_buff *skb)
 {
-	/* Only send L2 packets */
-	if (skb->mac_len)
-		return dev_queue_xmit(skb);
+	struct net_device *dev = skb->dev;
 
-	kfree_skb(skb);
-	return -EINVAL;
+	if (dev->type != ARPHRD_ETHER && skb->mac_len) {
+		skb->protocol = htons(ETH_P_TEB);
+	} else if (dev->type == ARPHRD_ETHER && !skb->mac_len) {
+		kfree_skb(skb);
+		return -EINVAL;
+	}
+
+	return dev_queue_xmit(skb);
 }
 EXPORT_SYMBOL_GPL(ovs_netdev_send);
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v10 1/5] net: add skb_vlan_accel helper
  2016-06-02  6:24 ` [PATCH net-next v10 1/5] net: add skb_vlan_accel helper Simon Horman
@ 2016-06-02 22:01   ` pravin shelar
  0 siblings, 0 replies; 23+ messages in thread
From: pravin shelar @ 2016-06-02 22:01 UTC (permalink / raw)
  To: Simon Horman; +Cc: Linux Kernel Network Developers, ovs dev

On Wed, Jun 1, 2016 at 11:24 PM, Simon Horman
<simon.horman@netronome.com> wrote:
> This breaks out some of of skb_vlan_pop into a separate helper.
> This new helper moves the outer-most vlan tag present in packet data
> into metadata.
>
> The motivation is to allow acceleration VLAN tags without adding a new
> one. This is in preparation for a push ethernet header support in Open
> vSwitch.
>
> Signed-off-by: Simon Horman <simon.horman@netronome.com>
>

I am not sure we need this function at this point. I will post comment
on patch 4 where it is used.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v10 2/5] openvswitch: set skb protocol and mac_len when receiving on internal device
       [not found]   ` <1464848686-7656-3-git-send-email-simon.horman-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>
@ 2016-06-02 22:01     ` pravin shelar
  2016-06-07  3:08       ` Simon Horman
  0 siblings, 1 reply; 23+ messages in thread
From: pravin shelar @ 2016-06-02 22:01 UTC (permalink / raw)
  To: Simon Horman; +Cc: ovs dev, Linux Kernel Network Developers

On Wed, Jun 1, 2016 at 11:24 PM, Simon Horman
<simon.horman@netronome.com> wrote:
> * Set skb protocol based on contents of packet. I have observed this is
>   necessary to get actual protocol of a packet when it is injected into an
>   internal device e.g. by libnet in which case skb protocol will be set to
>   ETH_ALL.
>
> * Set the mac_len which has been observed to not be set up correctly when
>   an ARP packet is generated and sent via an openvswitch bridge.
>   My test case is a scenario where there are two open vswtich bridges.
>   One outputs to a tunnel port which egresses on the other.
>
> The motivation for this is that support for outputting to layer 3 (non-tap)
> GRE tunnels as implemented by a subsequent patch depends on protocol and
> mac_len being set correctly on receive.
>
> Signed-off-by: Simon Horman <simon.horman@netronome.com>
>
> ---
> v10
> * Set mac_len
>
> v9
> * New patch
> ---
>  net/openvswitch/vport-internal_dev.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
> index 2ee48e447b72..f89b1efa88f1 100644
> --- a/net/openvswitch/vport-internal_dev.c
> +++ b/net/openvswitch/vport-internal_dev.c
> @@ -48,6 +48,10 @@ static int internal_dev_xmit(struct sk_buff *skb, struct net_device *netdev)
>  {
>         int len, err;
>
> +       skb->protocol = eth_type_trans(skb, netdev);
> +       skb_push(skb, ETH_HLEN);
> +       skb_reset_mac_len(skb);
> +
resetting mac-len breaks the assumption about mac_len for referencing
MPLS header ref: skb_mpls_header().
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v10 3/5] openvswitch: add support to push and pop mpls for layer3 packets
  2016-06-02  6:24 ` [PATCH net-next v10 3/5] openvswitch: add support to push and pop mpls for layer3 packets Simon Horman
@ 2016-06-02 22:02   ` pravin shelar
  2016-06-07  2:51     ` Simon Horman
  0 siblings, 1 reply; 23+ messages in thread
From: pravin shelar @ 2016-06-02 22:02 UTC (permalink / raw)
  To: Simon Horman; +Cc: Linux Kernel Network Developers, ovs dev

On Wed, Jun 1, 2016 at 11:24 PM, Simon Horman
<simon.horman@netronome.com> wrote:
> Allow push and pop mpls actions to act on layer 3 packets by teaching
> them not to access non-existent L2 headers of such packets.
>
> Signed-off-by: Simon Horman <simon.horman@netronome.com>
> ---
> v10
> * Limit scope of hdr in {push,pop}_mpls()
>
> v9
> * New Patch
> ---
>  net/openvswitch/actions.c | 19 ++++++++++++-------
>  1 file changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index 9a3eb7a0ebf4..15f130e4c22b 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -172,7 +172,8 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
>
>         skb_postpush_rcsum(skb, new_mpls_lse, MPLS_HLEN);
>
> -       update_ethertype(skb, eth_hdr(skb), mpls->mpls_ethertype);
> +       if (skb->mac_len)
> +               update_ethertype(skb, eth_hdr(skb), mpls->mpls_ethertype);
We can move all ethernet related code in this if block. for example memmove().

>         if (!skb->inner_protocol)
>                 skb_set_inner_protocol(skb, skb->protocol);
>         skb->protocol = mpls->mpls_ethertype;
> @@ -184,7 +185,6 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
>  static int pop_mpls(struct sk_buff *skb, struct sw_flow_key *key,
>                     const __be16 ethertype)
>  {
> -       struct ethhdr *hdr;
>         int err;
>
>         err = skb_ensure_writable(skb, skb->mac_len + MPLS_HLEN);
> @@ -199,11 +199,16 @@ static int pop_mpls(struct sk_buff *skb, struct sw_flow_key *key,
>         __skb_pull(skb, MPLS_HLEN);
>         skb_reset_mac_header(skb);
>
> -       /* skb_mpls_header() is used to locate the ethertype
> -        * field correctly in the presence of VLAN tags.
> -        */
> -       hdr = (struct ethhdr *)(skb_mpls_header(skb) - ETH_HLEN);
> -       update_ethertype(skb, hdr, ethertype);
> +       if (skb->mac_len) {
> +               struct ethhdr *hdr;
> +
> +               /* skb_mpls_header() is used to locate the ethertype
> +                * field correctly in the presence of VLAN tags.
> +                */
> +               hdr = (struct ethhdr *)(skb_mpls_header(skb) - ETH_HLEN);
> +               update_ethertype(skb, hdr, ethertype);
> +       }
same here.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v10 4/5] openvswitch: add layer 3 flow/port support
       [not found]   ` <1464848686-7656-5-git-send-email-simon.horman-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>
@ 2016-06-02 22:02     ` pravin shelar
  2016-06-07  2:46       ` Simon Horman
  0 siblings, 1 reply; 23+ messages in thread
From: pravin shelar @ 2016-06-02 22:02 UTC (permalink / raw)
  To: Simon Horman; +Cc: ovs dev, Linux Kernel Network Developers

On Wed, Jun 1, 2016 at 11:24 PM, Simon Horman
<simon.horman@netronome.com> wrote:
> From: Lorand Jakab <lojakab@cisco.com>
>
> Implementation of the pop_eth and push_eth actions in the kernel, and
> layer 3 flow support.
>
> This doesn't actually do anything yet as no layer 2 tunnel ports are
> supported yet. The original patch by Lorand was against the Open vSwitch
> tree which has L2 LISP tunnels but that is not supported in mainline Linux.
> I (Simon) plan to follow up with support for non-TEB GRE ports based on
> work by Thomas Morin.
>
> Cc: Thomas Morin <thomas.morin@orange.com>
> Signed-off-by: Lorand Jakab <lojakab@cisco.com>
> Signed-off-by: Simon Horman <simon.horman@netronome.com>
>
> ---
> v10 [Simon Horman]
> * Move outermost VLAN into skb metadata in pop_eth and
>   leave any VLAN as-is in push_eth. The effect is to allow the presence
>   of a vlan to be independent of pushing and popping ethernet headers.
> * Omit unnecessary type field from push_eth action
> * Squash with the following patches to make a more complete patch:
>   "openvswitch: add layer 3 support to ovs_packet_cmd_execute()"
>   "openvswitch: extend layer 3 support to cover non-IP packets"
>
> v9 [Simon Horman]
> * Rebase
> * Minor coding style updates
> * Prohibit push/pop MPLS on l3 packets
> * There are no layer 3 ports supported at this time so only
>   send and receive layer 2 packets: that is don't actually
>   use this new infrastructure yet
> * Expect that vports that can handle layer 3 packets will: have
>   a type other than ARPHRD_IPETHER; can also handle layer 2 packets;
>   and that packets can be differentiated by layer 2 packets having
>   skb->protocol set to htons(ETH_P_TEB)
>
> v1 - v8 [Lorand Jakub]
>
> wip: fix: openvswitch: add support to push and pop
>
> * Consistently use skb_hdr() in push_eth() by assigning
>   its value to a local variable.
> * Limit scope of hdr in push_mpls()
> * Recalculate csum for protocl change in push_mpls.
>   - Also needed for pop_mpls?
>   - Break out into a fix-patch
>
> Signed-off-by: Simon Horman <simon.horman@netronome.com>
...

> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index 15f130e4c22b..5567529904fa 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -300,6 +300,51 @@ static int set_eth_addr(struct sk_buff *skb, struct sw_flow_key *flow_key,
>         return 0;
>  }
>
> +static int pop_eth(struct sk_buff *skb, struct sw_flow_key *key)
> +{
> +       /* Pop outermost VLAN tag to skb metadata unless a VLAN tag
> +        * is already present there.
> +        */
> +       if ((skb->protocol == htons(ETH_P_8021Q) ||
> +            skb->protocol == htons(ETH_P_8021AD)) &&
> +           !skb_vlan_tag_present(skb)) {
> +               int err = skb_vlan_accel(skb);
> +               if (unlikely(err))
> +                       return err;
> +       }
> +
I do not think we can keep just the vlan tag and pop ethernet header.
There are multiple issues with this.
First networking stack can not handle suck packet. second issue even
after this patch OVS can not parse this type of packet. third this
patch does not allow pop-eth action on vlan tagged packet.
There is already separate vlan related actions in OVS so lets keep it simple.

> +       skb_pull_rcsum(skb, ETH_HLEN);
> +       skb_reset_mac_header(skb);
> +       skb->mac_len -= ETH_HLEN;
> +
> +       invalidate_flow_key(key);
> +       return 0;
> +}
> +
...
...
> diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> index 0ea128eeeab2..2d9777abcfc9 100644
> --- a/net/openvswitch/flow.c
> +++ b/net/openvswitch/flow.c
> @@ -468,28 +468,31 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
>
>         skb_reset_mac_header(skb);
>
> -       /* Link layer.  We are guaranteed to have at least the 14 byte Ethernet
> -        * header in the linear data area.
> -        */
> -       eth = eth_hdr(skb);
> -       ether_addr_copy(key->eth.src, eth->h_source);
> -       ether_addr_copy(key->eth.dst, eth->h_dest);
> +       /* Link layer. */
> +       key->eth.tci = 0;
> +       if (key->phy.is_layer3) {
> +               if (skb_vlan_tag_present(skb))
> +                       key->eth.tci = htons(skb->vlan_tci);
> +       } else {
> +               eth = eth_hdr(skb);
eth can be moved to this block.

> +               ether_addr_copy(key->eth.src, eth->h_source);
> +               ether_addr_copy(key->eth.dst, eth->h_dest);
>
> -       __skb_pull(skb, 2 * ETH_ALEN);
> -       /* We are going to push all headers that we pull, so no need to
> -        * update skb->csum here.
> -        */
> +               __skb_pull(skb, 2 * ETH_ALEN);
> +               /* We are going to push all headers that we pull, so no need to
> +                * update skb->csum here.
> +                */
>
> -       key->eth.tci = 0;
> -       if (skb_vlan_tag_present(skb))
> -               key->eth.tci = htons(skb->vlan_tci);
> -       else if (eth->h_proto == htons(ETH_P_8021Q))
> -               if (unlikely(parse_vlan(skb, key)))
> -                       return -ENOMEM;
> +               if (skb_vlan_tag_present(skb))
> +                       key->eth.tci = htons(skb->vlan_tci);
> +               else if (eth->h_proto == htons(ETH_P_8021Q))
> +                       if (unlikely(parse_vlan(skb, key)))
> +                               return -ENOMEM;
>
> -       key->eth.type = parse_ethertype(skb);
> -       if (unlikely(key->eth.type == htons(0)))
> -               return -ENOMEM;
> +               key->eth.type = parse_ethertype(skb);
> +               if (unlikely(key->eth.type == htons(0)))
> +                       return -ENOMEM;
> +       }

...

> @@ -723,9 +730,19 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
>         key->phy.skb_mark = skb->mark;
>         ovs_ct_fill_key(skb, key);
>         key->ovs_flow_hash = 0;
> +       key->phy.is_layer3 = skb->mac_len == 0;
>         key->recirc_id = 0;
>
> -       return key_extract(skb, key);
> +       err = key_extract(skb, key);
> +       if (err < 0)
> +               return err;
> +
> +       if (key->phy.is_layer3)
> +               key->eth.type = skb->protocol;
Now key->eth.type is set in three different function, can you
centralize in key_extract()?

> +       else if (tun_info && skb->protocol == htons(ETH_P_TEB))
> +               skb->protocol = key->eth.type;
> +
> +       return err;
>  }
>
>  int ovs_flow_key_extract_userspace(struct net *net, const struct nlattr *attr,
> diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
> index 03378e75a67c..5395ec0c3c13 100644
...

> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> index 0bb650f4f219..1e1392c3c0ed 100644
> --- a/net/openvswitch/flow_netlink.c
> +++ b/net/openvswitch/flow_netlink.c
> @@ -123,7 +123,7 @@ static void update_range(struct sw_flow_match *match,
>  static bool match_validate(const struct sw_flow_match *match,
>                            u64 key_attrs, u64 mask_attrs, bool log)
>  {
> -       u64 key_expected = 1 << OVS_KEY_ATTR_ETHERNET;
> +       u64 key_expected = 0;
>         u64 mask_allowed = key_attrs;  /* At most allow all key attributes */
>
>         /* The following mask attributes allowed only if they
> @@ -145,6 +145,10 @@ static bool match_validate(const struct sw_flow_match *match,
>                        | (1 << OVS_KEY_ATTR_IN_PORT)
>                        | (1 << OVS_KEY_ATTR_ETHERTYPE));
>
> +       /* If Ethertype is present, expect MAC addresses */
> +       if (key_attrs & (1ULL << OVS_KEY_ATTR_ETHERTYPE))
> +               key_expected |= 1ULL << OVS_KEY_ATTR_ETHERNET;
> +
>         /* Check key attributes. */
>         if (match->key->eth.type == htons(ETH_P_ARP)
>                         || match->key->eth.type == htons(ETH_P_RARP)) {
> @@ -282,7 +286,7 @@ size_t ovs_key_attr_size(void)
>         /* Whenever adding new OVS_KEY_ FIELDS, we should consider
>          * updating this function.
>          */
> -       BUILD_BUG_ON(OVS_KEY_ATTR_TUNNEL_INFO != 26);
> +       BUILD_BUG_ON(OVS_KEY_ATTR_TUNNEL_INFO != 27);
>
>         return    nla_total_size(4)   /* OVS_KEY_ATTR_PRIORITY */
>                 + nla_total_size(0)   /* OVS_KEY_ATTR_TUNNEL */
> @@ -355,6 +359,7 @@ static const struct ovs_len_tbl ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
>         [OVS_KEY_ATTR_CT_ZONE]   = { .len = sizeof(u16) },
>         [OVS_KEY_ATTR_CT_MARK]   = { .len = sizeof(u32) },
>         [OVS_KEY_ATTR_CT_LABELS] = { .len = sizeof(struct ovs_key_ct_labels) },
> +       [OVS_KEY_ATTR_PACKET_ETHERTYPE] = { .len = sizeof(__be16) },
>  };
>
I do not see need for OVS_KEY_ATTR_PACKET_ETHERTYPE, we can use
existing OVS_KEY_ATTR_ETHERTYPE to serialize the flow key. If there is
no OVS_KEY_ATTR_ETHERNET attribute then its l3 packet.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v10 4/5] openvswitch: add layer 3 flow/port support
  2016-06-02 22:02     ` pravin shelar
@ 2016-06-07  2:46       ` Simon Horman
       [not found]         ` <20160607024609.GC31696-IxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
  0 siblings, 1 reply; 23+ messages in thread
From: Simon Horman @ 2016-06-07  2:46 UTC (permalink / raw)
  To: pravin shelar; +Cc: Linux Kernel Network Developers, ovs dev

On Thu, Jun 02, 2016 at 03:02:18PM -0700, pravin shelar wrote:
> On Wed, Jun 1, 2016 at 11:24 PM, Simon Horman
> <simon.horman@netronome.com> wrote:

[...]

> > diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> > index 15f130e4c22b..5567529904fa 100644
> > --- a/net/openvswitch/actions.c
> > +++ b/net/openvswitch/actions.c
> > @@ -300,6 +300,51 @@ static int set_eth_addr(struct sk_buff *skb, struct sw_flow_key *flow_key,
> >         return 0;
> >  }
> >
> > +static int pop_eth(struct sk_buff *skb, struct sw_flow_key *key)
> > +{
> > +       /* Pop outermost VLAN tag to skb metadata unless a VLAN tag
> > +        * is already present there.
> > +        */
> > +       if ((skb->protocol == htons(ETH_P_8021Q) ||
> > +            skb->protocol == htons(ETH_P_8021AD)) &&
> > +           !skb_vlan_tag_present(skb)) {
> > +               int err = skb_vlan_accel(skb);
> > +               if (unlikely(err))
> > +                       return err;
> > +       }
> > +
>
> I do not think we can keep just the vlan tag and pop ethernet header.
> There are multiple issues with this.
> First networking stack can not handle suck packet. second issue even
> after this patch OVS can not parse this type of packet. third this
> patch does not allow pop-eth action on vlan tagged packet.
> There is already separate vlan related actions in OVS so lets keep it simple.

I wonder if the best solution is to simply omit handling VLAN tags
in pop_eth for now. As you mention pop_eth is not permitted on such packets.

[...]

> > diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> > index 0ea128eeeab2..2d9777abcfc9 100644
> > --- a/net/openvswitch/flow.c
> > +++ b/net/openvswitch/flow.c
> > @@ -468,28 +468,31 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
> >
> >         skb_reset_mac_header(skb);
> >
> > -       /* Link layer.  We are guaranteed to have at least the 14 byte Ethernet
> > -        * header in the linear data area.
> > -        */
> > -       eth = eth_hdr(skb);
> > -       ether_addr_copy(key->eth.src, eth->h_source);
> > -       ether_addr_copy(key->eth.dst, eth->h_dest);
> > +       /* Link layer. */
> > +       key->eth.tci = 0;
> > +       if (key->phy.is_layer3) {
> > +               if (skb_vlan_tag_present(skb))
> > +                       key->eth.tci = htons(skb->vlan_tci);
> > +       } else {
> > +               eth = eth_hdr(skb);
> eth can be moved to this block.

Thanks, done.

[...]

> > @@ -723,9 +730,19 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
> >         key->phy.skb_mark = skb->mark;
> >         ovs_ct_fill_key(skb, key);
> >         key->ovs_flow_hash = 0;
> > +       key->phy.is_layer3 = skb->mac_len == 0;
> >         key->recirc_id = 0;
> >
> > -       return key_extract(skb, key);
> > +       err = key_extract(skb, key);
> > +       if (err < 0)
> > +               return err;
> > +
> > +       if (key->phy.is_layer3)
> > +               key->eth.type = skb->protocol;
> Now key->eth.type is set in three different function, can you
> centralize in key_extract()?

Sure, I think that the instance above can be trivially moved into
key_extract() and the one in ovs_flow_key_update() can be removed.

[...]

> > diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> > index 0bb650f4f219..1e1392c3c0ed 100644
> > --- a/net/openvswitch/flow_netlink.c
> > +++ b/net/openvswitch/flow_netlink.c

[...]

> > @@ -355,6 +359,7 @@ static const struct ovs_len_tbl ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
> >         [OVS_KEY_ATTR_CT_ZONE]   = { .len = sizeof(u16) },
> >         [OVS_KEY_ATTR_CT_MARK]   = { .len = sizeof(u32) },
> >         [OVS_KEY_ATTR_CT_LABELS] = { .len = sizeof(struct ovs_key_ct_labels) },
> > +       [OVS_KEY_ATTR_PACKET_ETHERTYPE] = { .len = sizeof(__be16) },
> >  };
> >
> I do not see need for OVS_KEY_ATTR_PACKET_ETHERTYPE, we can use
> existing OVS_KEY_ATTR_ETHERTYPE to serialize the flow key. If there is
> no OVS_KEY_ATTR_ETHERNET attribute then its l3 packet.

The idea of OVS_KEY_ATTR_PACKET_ETHERTYPE is to allow communication of
the L2 type of the packet which is not present in an L3 packet. In terms
of GRE (non-TEB) this correlates to the Protocol Type field in the GRE
header.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v10 3/5] openvswitch: add support to push and pop mpls for layer3 packets
  2016-06-02 22:02   ` pravin shelar
@ 2016-06-07  2:51     ` Simon Horman
  2016-06-07 22:45       ` pravin shelar
  0 siblings, 1 reply; 23+ messages in thread
From: Simon Horman @ 2016-06-07  2:51 UTC (permalink / raw)
  To: pravin shelar; +Cc: Linux Kernel Network Developers, ovs dev

On Thu, Jun 02, 2016 at 03:02:00PM -0700, pravin shelar wrote:
> On Wed, Jun 1, 2016 at 11:24 PM, Simon Horman
> <simon.horman@netronome.com> wrote:
> > Allow push and pop mpls actions to act on layer 3 packets by teaching
> > them not to access non-existent L2 headers of such packets.
> >
> > Signed-off-by: Simon Horman <simon.horman@netronome.com>
> > ---
> > v10
> > * Limit scope of hdr in {push,pop}_mpls()
> >
> > v9
> > * New Patch
> > ---
> >  net/openvswitch/actions.c | 19 ++++++++++++-------
> >  1 file changed, 12 insertions(+), 7 deletions(-)
> >
> > diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> > index 9a3eb7a0ebf4..15f130e4c22b 100644
> > --- a/net/openvswitch/actions.c
> > +++ b/net/openvswitch/actions.c
> > @@ -172,7 +172,8 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
> >
> >         skb_postpush_rcsum(skb, new_mpls_lse, MPLS_HLEN);
> >
> > -       update_ethertype(skb, eth_hdr(skb), mpls->mpls_ethertype);
> > +       if (skb->mac_len)
> > +               update_ethertype(skb, eth_hdr(skb), mpls->mpls_ethertype);
> We can move all ethernet related code in this if block. for example memmove().

My assumption is that the memmove() does nothing if skb->mac_len is zero
and from my point of view it seems clean to leave it where it is unless
the code around it also moves.

Is there other code you think could/should be moved into the
if (skb->mac_len) block?

> 
> >         if (!skb->inner_protocol)
> >                 skb_set_inner_protocol(skb, skb->protocol);
> >         skb->protocol = mpls->mpls_ethertype;
> > @@ -184,7 +185,6 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
> >  static int pop_mpls(struct sk_buff *skb, struct sw_flow_key *key,
> >                     const __be16 ethertype)
> >  {
> > -       struct ethhdr *hdr;
> >         int err;
> >
> >         err = skb_ensure_writable(skb, skb->mac_len + MPLS_HLEN);
> > @@ -199,11 +199,16 @@ static int pop_mpls(struct sk_buff *skb, struct sw_flow_key *key,
> >         __skb_pull(skb, MPLS_HLEN);
> >         skb_reset_mac_header(skb);
> >
> > -       /* skb_mpls_header() is used to locate the ethertype
> > -        * field correctly in the presence of VLAN tags.
> > -        */
> > -       hdr = (struct ethhdr *)(skb_mpls_header(skb) - ETH_HLEN);
> > -       update_ethertype(skb, hdr, ethertype);
> > +       if (skb->mac_len) {
> > +               struct ethhdr *hdr;
> > +
> > +               /* skb_mpls_header() is used to locate the ethertype
> > +                * field correctly in the presence of VLAN tags.
> > +                */
> > +               hdr = (struct ethhdr *)(skb_mpls_header(skb) - ETH_HLEN);
> > +               update_ethertype(skb, hdr, ethertype);
> > +       }
> same here.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v10 2/5] openvswitch: set skb protocol and mac_len when receiving on internal device
  2016-06-02 22:01     ` pravin shelar
@ 2016-06-07  3:08       ` Simon Horman
       [not found]         ` <20160607030809.GE31696-IxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
  0 siblings, 1 reply; 23+ messages in thread
From: Simon Horman @ 2016-06-07  3:08 UTC (permalink / raw)
  To: pravin shelar; +Cc: Linux Kernel Network Developers, ovs dev

On Thu, Jun 02, 2016 at 03:01:47PM -0700, pravin shelar wrote:
> On Wed, Jun 1, 2016 at 11:24 PM, Simon Horman
> <simon.horman@netronome.com> wrote:
> > * Set skb protocol based on contents of packet. I have observed this is
> >   necessary to get actual protocol of a packet when it is injected into an
> >   internal device e.g. by libnet in which case skb protocol will be set to
> >   ETH_ALL.
> >
> > * Set the mac_len which has been observed to not be set up correctly when
> >   an ARP packet is generated and sent via an openvswitch bridge.
> >   My test case is a scenario where there are two open vswtich bridges.
> >   One outputs to a tunnel port which egresses on the other.
> >
> > The motivation for this is that support for outputting to layer 3 (non-tap)
> > GRE tunnels as implemented by a subsequent patch depends on protocol and
> > mac_len being set correctly on receive.
> >
> > Signed-off-by: Simon Horman <simon.horman@netronome.com>
> >
> > ---
> > v10
> > * Set mac_len
> >
> > v9
> > * New patch
> > ---
> >  net/openvswitch/vport-internal_dev.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
> > index 2ee48e447b72..f89b1efa88f1 100644
> > --- a/net/openvswitch/vport-internal_dev.c
> > +++ b/net/openvswitch/vport-internal_dev.c
> > @@ -48,6 +48,10 @@ static int internal_dev_xmit(struct sk_buff *skb, struct net_device *netdev)
> >  {
> >         int len, err;
> >
> > +       skb->protocol = eth_type_trans(skb, netdev);
> > +       skb_push(skb, ETH_HLEN);
> > +       skb_reset_mac_len(skb);
> > +
> resetting mac-len breaks the assumption about mac_len for referencing
> MPLS header ref: skb_mpls_header().

Thanks I had overlooked this. I think it is actually safe as
the mac_len is recalculated quite soon in key_extract() and IIRC
the most important thing is for mac_len to be 0 or non-zero
for the benefit of ovs_flow_key_extract(). None the less it does
seem untidy and moreover inconsistent with the handling in
netdev_port_receive() by a latter patch which does the following:

	eth_type = eth_type_trans(skb, skb->dev);
	skb->mac_len = skb->data - skb_mac_header(skb);
	__skb_push(skb, skb->mac_len);

	if (eth_type == htons(ETH_P_8021Q))
		skb->mac_len += VLAN_HLEN;

Perhaps that logic ought to be in a helper used by both internal_dev_xmit()
and netdev_port_receive(). Or somehow centralised in ovs_vport_receive().

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v10 2/5] openvswitch: set skb protocol and mac_len when receiving on internal device
       [not found]         ` <20160607030809.GE31696-IxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
@ 2016-06-07 22:45           ` pravin shelar
  2016-06-17  5:53             ` Simon Horman
  0 siblings, 1 reply; 23+ messages in thread
From: pravin shelar @ 2016-06-07 22:45 UTC (permalink / raw)
  To: Simon Horman; +Cc: ovs dev, Linux Kernel Network Developers

On Mon, Jun 6, 2016 at 8:08 PM, Simon Horman <simon.horman@netronome.com> wrote:
> On Thu, Jun 02, 2016 at 03:01:47PM -0700, pravin shelar wrote:
>> On Wed, Jun 1, 2016 at 11:24 PM, Simon Horman
>> <simon.horman@netronome.com> wrote:
>> > * Set skb protocol based on contents of packet. I have observed this is
>> >   necessary to get actual protocol of a packet when it is injected into an
>> >   internal device e.g. by libnet in which case skb protocol will be set to
>> >   ETH_ALL.
>> >
>> > * Set the mac_len which has been observed to not be set up correctly when
>> >   an ARP packet is generated and sent via an openvswitch bridge.
>> >   My test case is a scenario where there are two open vswtich bridges.
>> >   One outputs to a tunnel port which egresses on the other.
>> >
>> > The motivation for this is that support for outputting to layer 3 (non-tap)
>> > GRE tunnels as implemented by a subsequent patch depends on protocol and
>> > mac_len being set correctly on receive.
>> >
>> > Signed-off-by: Simon Horman <simon.horman@netronome.com>
>> >
>> > ---
>> > v10
>> > * Set mac_len
>> >
>> > v9
>> > * New patch
>> > ---
>> >  net/openvswitch/vport-internal_dev.c | 4 ++++
>> >  1 file changed, 4 insertions(+)
>> >
>> > diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
>> > index 2ee48e447b72..f89b1efa88f1 100644
>> > --- a/net/openvswitch/vport-internal_dev.c
>> > +++ b/net/openvswitch/vport-internal_dev.c
>> > @@ -48,6 +48,10 @@ static int internal_dev_xmit(struct sk_buff *skb, struct net_device *netdev)
>> >  {
>> >         int len, err;
>> >
>> > +       skb->protocol = eth_type_trans(skb, netdev);
>> > +       skb_push(skb, ETH_HLEN);
>> > +       skb_reset_mac_len(skb);
>> > +
>> resetting mac-len breaks the assumption about mac_len for referencing
>> MPLS header ref: skb_mpls_header().
>
> Thanks I had overlooked this. I think it is actually safe as
> the mac_len is recalculated quite soon in key_extract() and IIRC
> the most important thing is for mac_len to be 0 or non-zero
> for the benefit of ovs_flow_key_extract(). None the less it does
> seem untidy and moreover inconsistent with the handling in
> netdev_port_receive() by a latter patch which does the following:
>
>         eth_type = eth_type_trans(skb, skb->dev);
>         skb->mac_len = skb->data - skb_mac_header(skb);
>         __skb_push(skb, skb->mac_len);
>
>         if (eth_type == htons(ETH_P_8021Q))
>                 skb->mac_len += VLAN_HLEN;
>
> Perhaps that logic ought to be in a helper used by both internal_dev_xmit()
> and netdev_port_receive(). Or somehow centralised in ovs_vport_receive().

This does looks bit complex. Can we use other skb metadata like
skb_mac_header_was_set()?
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v10 3/5] openvswitch: add support to push and pop mpls for layer3 packets
  2016-06-07  2:51     ` Simon Horman
@ 2016-06-07 22:45       ` pravin shelar
  0 siblings, 0 replies; 23+ messages in thread
From: pravin shelar @ 2016-06-07 22:45 UTC (permalink / raw)
  To: Simon Horman; +Cc: Linux Kernel Network Developers, ovs dev

On Mon, Jun 6, 2016 at 7:51 PM, Simon Horman <simon.horman@netronome.com> wrote:
> On Thu, Jun 02, 2016 at 03:02:00PM -0700, pravin shelar wrote:
>> On Wed, Jun 1, 2016 at 11:24 PM, Simon Horman
>> <simon.horman@netronome.com> wrote:
>> > Allow push and pop mpls actions to act on layer 3 packets by teaching
>> > them not to access non-existent L2 headers of such packets.
>> >
>> > Signed-off-by: Simon Horman <simon.horman@netronome.com>
>> > ---
>> > v10
>> > * Limit scope of hdr in {push,pop}_mpls()
>> >
>> > v9
>> > * New Patch
>> > ---
>> >  net/openvswitch/actions.c | 19 ++++++++++++-------
>> >  1 file changed, 12 insertions(+), 7 deletions(-)
>> >
>> > diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
>> > index 9a3eb7a0ebf4..15f130e4c22b 100644
>> > --- a/net/openvswitch/actions.c
>> > +++ b/net/openvswitch/actions.c
>> > @@ -172,7 +172,8 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
>> >
>> >         skb_postpush_rcsum(skb, new_mpls_lse, MPLS_HLEN);
>> >
>> > -       update_ethertype(skb, eth_hdr(skb), mpls->mpls_ethertype);
>> > +       if (skb->mac_len)
>> > +               update_ethertype(skb, eth_hdr(skb), mpls->mpls_ethertype);
>> We can move all ethernet related code in this if block. for example memmove().
>
> My assumption is that the memmove() does nothing if skb->mac_len is zero
> and from my point of view it seems clean to leave it where it is unless
> the code around it also moves.
>
> Is there other code you think could/should be moved into the
> if (skb->mac_len) block?
>
yes, the code is correct. But I think grouping l2 related updated
together makes code easier to follow.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v10 4/5] openvswitch: add layer 3 flow/port support
       [not found]         ` <20160607024609.GC31696-IxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
@ 2016-06-07 22:45           ` pravin shelar
  2016-06-17  6:53             ` Simon Horman
  0 siblings, 1 reply; 23+ messages in thread
From: pravin shelar @ 2016-06-07 22:45 UTC (permalink / raw)
  To: Simon Horman; +Cc: ovs dev, Linux Kernel Network Developers

On Mon, Jun 6, 2016 at 7:46 PM, Simon Horman <simon.horman@netronome.com> wrote:
> On Thu, Jun 02, 2016 at 03:02:18PM -0700, pravin shelar wrote:
>> On Wed, Jun 1, 2016 at 11:24 PM, Simon Horman
>> <simon.horman@netronome.com> wrote:
>
> [...]
>
>> > diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
>> > index 15f130e4c22b..5567529904fa 100644
>> > --- a/net/openvswitch/actions.c
>> > +++ b/net/openvswitch/actions.c
>> > @@ -300,6 +300,51 @@ static int set_eth_addr(struct sk_buff *skb, struct sw_flow_key *flow_key,
>> >         return 0;
>> >  }
>> >
>> > +static int pop_eth(struct sk_buff *skb, struct sw_flow_key *key)
>> > +{
>> > +       /* Pop outermost VLAN tag to skb metadata unless a VLAN tag
>> > +        * is already present there.
>> > +        */
>> > +       if ((skb->protocol == htons(ETH_P_8021Q) ||
>> > +            skb->protocol == htons(ETH_P_8021AD)) &&
>> > +           !skb_vlan_tag_present(skb)) {
>> > +               int err = skb_vlan_accel(skb);
>> > +               if (unlikely(err))
>> > +                       return err;
>> > +       }
>> > +
>>
>> I do not think we can keep just the vlan tag and pop ethernet header.
>> There are multiple issues with this.
>> First networking stack can not handle suck packet. second issue even
>> after this patch OVS can not parse this type of packet. third this
>> patch does not allow pop-eth action on vlan tagged packet.
>> There is already separate vlan related actions in OVS so lets keep it simple.
>
> I wonder if the best solution is to simply omit handling VLAN tags
> in pop_eth for now. As you mention pop_eth is not permitted on such packets.
>
yes, lets just drop vlan support here.


>> > diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
>> > index 0bb650f4f219..1e1392c3c0ed 100644
>> > --- a/net/openvswitch/flow_netlink.c
>> > +++ b/net/openvswitch/flow_netlink.c
>
> [...]
>
>> > @@ -355,6 +359,7 @@ static const struct ovs_len_tbl ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
>> >         [OVS_KEY_ATTR_CT_ZONE]   = { .len = sizeof(u16) },
>> >         [OVS_KEY_ATTR_CT_MARK]   = { .len = sizeof(u32) },
>> >         [OVS_KEY_ATTR_CT_LABELS] = { .len = sizeof(struct ovs_key_ct_labels) },
>> > +       [OVS_KEY_ATTR_PACKET_ETHERTYPE] = { .len = sizeof(__be16) },
>> >  };
>> >
>> I do not see need for OVS_KEY_ATTR_PACKET_ETHERTYPE, we can use
>> existing OVS_KEY_ATTR_ETHERTYPE to serialize the flow key. If there is
>> no OVS_KEY_ATTR_ETHERNET attribute then its l3 packet.
>
> The idea of OVS_KEY_ATTR_PACKET_ETHERTYPE is to allow communication of
> the L2 type of the packet which is not present in an L3 packet. In terms
> of GRE (non-TEB) this correlates to the Protocol Type field in the GRE
> header.

How about using OVS_KEY_ATTR_ETHERTYPE to communicate the protocol type?
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v10 2/5] openvswitch: set skb protocol and mac_len when receiving on internal device
  2016-06-07 22:45           ` pravin shelar
@ 2016-06-17  5:53             ` Simon Horman
       [not found]               ` <20160617055331.GA24833-IxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
  0 siblings, 1 reply; 23+ messages in thread
From: Simon Horman @ 2016-06-17  5:53 UTC (permalink / raw)
  To: pravin shelar; +Cc: Linux Kernel Network Developers, ovs dev

On Tue, Jun 07, 2016 at 03:45:27PM -0700, pravin shelar wrote:
> On Mon, Jun 6, 2016 at 8:08 PM, Simon Horman <simon.horman@netronome.com> wrote:
> > On Thu, Jun 02, 2016 at 03:01:47PM -0700, pravin shelar wrote:
> >> On Wed, Jun 1, 2016 at 11:24 PM, Simon Horman
> >> <simon.horman@netronome.com> wrote:
> >> > * Set skb protocol based on contents of packet. I have observed this is
> >> >   necessary to get actual protocol of a packet when it is injected into an
> >> >   internal device e.g. by libnet in which case skb protocol will be set to
> >> >   ETH_ALL.
> >> >
> >> > * Set the mac_len which has been observed to not be set up correctly when
> >> >   an ARP packet is generated and sent via an openvswitch bridge.
> >> >   My test case is a scenario where there are two open vswtich bridges.
> >> >   One outputs to a tunnel port which egresses on the other.
> >> >
> >> > The motivation for this is that support for outputting to layer 3 (non-tap)
> >> > GRE tunnels as implemented by a subsequent patch depends on protocol and
> >> > mac_len being set correctly on receive.
> >> >
> >> > Signed-off-by: Simon Horman <simon.horman@netronome.com>
> >> >
> >> > ---
> >> > v10
> >> > * Set mac_len
> >> >
> >> > v9
> >> > * New patch
> >> > ---
> >> >  net/openvswitch/vport-internal_dev.c | 4 ++++
> >> >  1 file changed, 4 insertions(+)
> >> >
> >> > diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
> >> > index 2ee48e447b72..f89b1efa88f1 100644
> >> > --- a/net/openvswitch/vport-internal_dev.c
> >> > +++ b/net/openvswitch/vport-internal_dev.c
> >> > @@ -48,6 +48,10 @@ static int internal_dev_xmit(struct sk_buff *skb, struct net_device *netdev)
> >> >  {
> >> >         int len, err;
> >> >
> >> > +       skb->protocol = eth_type_trans(skb, netdev);
> >> > +       skb_push(skb, ETH_HLEN);
> >> > +       skb_reset_mac_len(skb);
> >> > +
> >> resetting mac-len breaks the assumption about mac_len for referencing
> >> MPLS header ref: skb_mpls_header().
> >
> > Thanks I had overlooked this. I think it is actually safe as
> > the mac_len is recalculated quite soon in key_extract() and IIRC
> > the most important thing is for mac_len to be 0 or non-zero
> > for the benefit of ovs_flow_key_extract(). None the less it does
> > seem untidy and moreover inconsistent with the handling in
> > netdev_port_receive() by a latter patch which does the following:
> >
> >         eth_type = eth_type_trans(skb, skb->dev);
> >         skb->mac_len = skb->data - skb_mac_header(skb);
> >         __skb_push(skb, skb->mac_len);
> >
> >         if (eth_type == htons(ETH_P_8021Q))
> >                 skb->mac_len += VLAN_HLEN;
> >
> > Perhaps that logic ought to be in a helper used by both internal_dev_xmit()
> > and netdev_port_receive(). Or somehow centralised in ovs_vport_receive().
> 
> This does looks bit complex. Can we use other skb metadata like
> skb_mac_header_was_set()?

Yes, I think that can be made to work if skb->mac_header is unset
for l3 packets in netdev_port_receive(). The following is an incremental
patch on the entire series. Is this the kind of thing you had in mind?

diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 86f2cfb19de3..42587d5bf894 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -729,7 +729,7 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
 	key->phy.skb_mark = skb->mark;
 	ovs_ct_fill_key(skb, key);
 	key->ovs_flow_hash = 0;
-	key->phy.is_layer3 = skb->mac_len == 0;
+	key->phy.is_layer3 = skb_mac_header_was_set(skb) == 0;
 	key->recirc_id = 0;
 
 	err = key_extract(skb, key);
diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
index 484ba529c682..8973d4db509b 100644
--- a/net/openvswitch/vport-internal_dev.c
+++ b/net/openvswitch/vport-internal_dev.c
@@ -50,7 +50,6 @@ static int internal_dev_xmit(struct sk_buff *skb, struct net_device *netdev)
 
 	skb->protocol = eth_type_trans(skb, netdev);
 	skb_push(skb, ETH_HLEN);
-	skb_reset_mac_len(skb);
 
 	len = skb->len;
 	rcu_read_lock();
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index 3df36df62ee9..4cf3f12ffc99 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -60,22 +60,9 @@ static void netdev_port_receive(struct sk_buff *skb)
 	if (vport->dev->type == ARPHRD_ETHER) {
 		skb_push(skb, ETH_HLEN);
 		skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
-	} else if (vport->dev->type == ARPHRD_NONE) {
-		if (skb->protocol == htons(ETH_P_TEB)) {
-			__be16 eth_type;
-
-			if (unlikely(skb->len < ETH_HLEN))
-				goto error;
-
-			eth_type = eth_type_trans(skb, skb->dev);
-			skb->mac_len = skb->data - skb_mac_header(skb);
-			__skb_push(skb, skb->mac_len);
-
-			if (eth_type == htons(ETH_P_8021Q))
-				skb->mac_len += VLAN_HLEN;
-		} else {
-			skb->mac_len = 0;
-		}
+	} else if (vport->dev->type == ARPHRD_NONE &&
+		   skb->protocol != htons(ETH_P_TEB)) {
+		skb->mac_header = (typeof(skb->mac_header))~0U;
 	}
 
 	ovs_vport_receive(vport, skb, skb_tunnel_info(skb));

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v10 4/5] openvswitch: add layer 3 flow/port support
  2016-06-07 22:45           ` pravin shelar
@ 2016-06-17  6:53             ` Simon Horman
  0 siblings, 0 replies; 23+ messages in thread
From: Simon Horman @ 2016-06-17  6:53 UTC (permalink / raw)
  To: pravin shelar; +Cc: Linux Kernel Network Developers, ovs dev

On Tue, Jun 07, 2016 at 03:45:58PM -0700, pravin shelar wrote:
> On Mon, Jun 6, 2016 at 7:46 PM, Simon Horman <simon.horman@netronome.com> wrote:
> > On Thu, Jun 02, 2016 at 03:02:18PM -0700, pravin shelar wrote:
> >> On Wed, Jun 1, 2016 at 11:24 PM, Simon Horman
> >> <simon.horman@netronome.com> wrote:

[...]

> >> > diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> >> > index 0bb650f4f219..1e1392c3c0ed 100644
> >> > --- a/net/openvswitch/flow_netlink.c
> >> > +++ b/net/openvswitch/flow_netlink.c
> >
> > [...]
> >
> >> > @@ -355,6 +359,7 @@ static const struct ovs_len_tbl ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
> >> >         [OVS_KEY_ATTR_CT_ZONE]   = { .len = sizeof(u16) },
> >> >         [OVS_KEY_ATTR_CT_MARK]   = { .len = sizeof(u32) },
> >> >         [OVS_KEY_ATTR_CT_LABELS] = { .len = sizeof(struct ovs_key_ct_labels) },
> >> > +       [OVS_KEY_ATTR_PACKET_ETHERTYPE] = { .len = sizeof(__be16) },
> >> >  };
> >> >
> >> I do not see need for OVS_KEY_ATTR_PACKET_ETHERTYPE, we can use
> >> existing OVS_KEY_ATTR_ETHERTYPE to serialize the flow key. If there is
> >> no OVS_KEY_ATTR_ETHERNET attribute then its l3 packet.
> >
> > The idea of OVS_KEY_ATTR_PACKET_ETHERTYPE is to allow communication of
> > the L2 type of the packet which is not present in an L3 packet. In terms
> > of GRE (non-TEB) this correlates to the Protocol Type field in the GRE
> > header.
> 
> How about using OVS_KEY_ATTR_ETHERTYPE to communicate the protocol type?

Yes, I believe that I now have that working locally. The assumption that
I am now working with is that OVS_KEY_ATTR_ETHERTYPE provides the type
metadata for Layer 3 packets if OVS_KEY_ATTR_ETHERNET (the Mac addresses)
is absent.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v10 2/5] openvswitch: set skb protocol and mac_len when receiving on internal device
       [not found]               ` <20160617055331.GA24833-IxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
@ 2016-06-19  1:38                 ` pravin shelar
       [not found]                   ` <CAOrHB_BWfLk8yba2XkF43Hm-czGUZnVWQQ=HW3-vvpFEH7GNpA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 23+ messages in thread
From: pravin shelar @ 2016-06-19  1:38 UTC (permalink / raw)
  To: Simon Horman; +Cc: ovs dev, Linux Kernel Network Developers

On Thu, Jun 16, 2016 at 10:53 PM, Simon Horman
<simon.horman@netronome.com> wrote:
> On Tue, Jun 07, 2016 at 03:45:27PM -0700, pravin shelar wrote:
>> On Mon, Jun 6, 2016 at 8:08 PM, Simon Horman <simon.horman@netronome.com> wrote:
>> > On Thu, Jun 02, 2016 at 03:01:47PM -0700, pravin shelar wrote:
>> >> On Wed, Jun 1, 2016 at 11:24 PM, Simon Horman
>> >> <simon.horman@netronome.com> wrote:
>> >> > * Set skb protocol based on contents of packet. I have observed this is
>> >> >   necessary to get actual protocol of a packet when it is injected into an
>> >> >   internal device e.g. by libnet in which case skb protocol will be set to
>> >> >   ETH_ALL.
....
....
>> >         eth_type = eth_type_trans(skb, skb->dev);
>> >         skb->mac_len = skb->data - skb_mac_header(skb);
>> >         __skb_push(skb, skb->mac_len);
>> >
>> >         if (eth_type == htons(ETH_P_8021Q))
>> >                 skb->mac_len += VLAN_HLEN;
>> >
>> > Perhaps that logic ought to be in a helper used by both internal_dev_xmit()
>> > and netdev_port_receive(). Or somehow centralised in ovs_vport_receive().
>>
>> This does looks bit complex. Can we use other skb metadata like
>> skb_mac_header_was_set()?
>
> Yes, I think that can be made to work if skb->mac_header is unset
> for l3 packets in netdev_port_receive(). The following is an incremental
> patch on the entire series. Is this the kind of thing you had in mind?
>
> diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> index 86f2cfb19de3..42587d5bf894 100644
> --- a/net/openvswitch/flow.c
> +++ b/net/openvswitch/flow.c
> @@ -729,7 +729,7 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
>         key->phy.skb_mark = skb->mark;
>         ovs_ct_fill_key(skb, key);
>         key->ovs_flow_hash = 0;
> -       key->phy.is_layer3 = skb->mac_len == 0;
> +       key->phy.is_layer3 = skb_mac_header_was_set(skb) == 0;
>         key->recirc_id = 0;
>
>         err = key_extract(skb, key);
> diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
> index 484ba529c682..8973d4db509b 100644
> --- a/net/openvswitch/vport-internal_dev.c
> +++ b/net/openvswitch/vport-internal_dev.c
> @@ -50,7 +50,6 @@ static int internal_dev_xmit(struct sk_buff *skb, struct net_device *netdev)
>
>         skb->protocol = eth_type_trans(skb, netdev);
>         skb_push(skb, ETH_HLEN);
> -       skb_reset_mac_len(skb);
>
>         len = skb->len;
>         rcu_read_lock();
> diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
> index 3df36df62ee9..4cf3f12ffc99 100644
> --- a/net/openvswitch/vport-netdev.c
> +++ b/net/openvswitch/vport-netdev.c
> @@ -60,22 +60,9 @@ static void netdev_port_receive(struct sk_buff *skb)
>         if (vport->dev->type == ARPHRD_ETHER) {
>                 skb_push(skb, ETH_HLEN);
>                 skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
> -       } else if (vport->dev->type == ARPHRD_NONE) {
> -               if (skb->protocol == htons(ETH_P_TEB)) {
> -                       __be16 eth_type;
> -
> -                       if (unlikely(skb->len < ETH_HLEN))
> -                               goto error;
> -
> -                       eth_type = eth_type_trans(skb, skb->dev);
> -                       skb->mac_len = skb->data - skb_mac_header(skb);
> -                       __skb_push(skb, skb->mac_len);
> -
> -                       if (eth_type == htons(ETH_P_8021Q))
> -                               skb->mac_len += VLAN_HLEN;
> -               } else {
> -                       skb->mac_len = 0;
> -               }
> +       } else if (vport->dev->type == ARPHRD_NONE &&
> +                  skb->protocol != htons(ETH_P_TEB)) {
> +               skb->mac_header = (typeof(skb->mac_header))~0U;
>         }
>
>         ovs_vport_receive(vport, skb, skb_tunnel_info(skb));

This certainly looks better. I was wondering if we can unset the mac
header offset in L3 tunnel devices itself. So there is no need to have
this check here.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v10 2/5] openvswitch: set skb protocol and mac_len when receiving on internal device
       [not found]                   ` <CAOrHB_BWfLk8yba2XkF43Hm-czGUZnVWQQ=HW3-vvpFEH7GNpA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-06-21  2:25                     ` Simon Horman
       [not found]                       ` <20160621022458.GA28358-IxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
  0 siblings, 1 reply; 23+ messages in thread
From: Simon Horman @ 2016-06-21  2:25 UTC (permalink / raw)
  To: pravin shelar; +Cc: ovs dev, Linux Kernel Network Developers, Jiri Benc

[Cc Jiri Benc]

On Sat, Jun 18, 2016 at 06:38:54PM -0700, pravin shelar wrote:
> On Thu, Jun 16, 2016 at 10:53 PM, Simon Horman
> <simon.horman@netronome.com> wrote:
> > On Tue, Jun 07, 2016 at 03:45:27PM -0700, pravin shelar wrote:
> >> On Mon, Jun 6, 2016 at 8:08 PM, Simon Horman <simon.horman@netronome.com> wrote:
> >> > On Thu, Jun 02, 2016 at 03:01:47PM -0700, pravin shelar wrote:
> >> >> On Wed, Jun 1, 2016 at 11:24 PM, Simon Horman
> >> >> <simon.horman@netronome.com> wrote:
> >> >> > * Set skb protocol based on contents of packet. I have observed this is
> >> >> >   necessary to get actual protocol of a packet when it is injected into an
> >> >> >   internal device e.g. by libnet in which case skb protocol will be set to
> >> >> >   ETH_ALL.
> ....
> ....
> >> >         eth_type = eth_type_trans(skb, skb->dev);
> >> >         skb->mac_len = skb->data - skb_mac_header(skb);
> >> >         __skb_push(skb, skb->mac_len);
> >> >
> >> >         if (eth_type == htons(ETH_P_8021Q))
> >> >                 skb->mac_len += VLAN_HLEN;
> >> >
> >> > Perhaps that logic ought to be in a helper used by both internal_dev_xmit()
> >> > and netdev_port_receive(). Or somehow centralised in ovs_vport_receive().
> >>
> >> This does looks bit complex. Can we use other skb metadata like
> >> skb_mac_header_was_set()?
> >
> > Yes, I think that can be made to work if skb->mac_header is unset
> > for l3 packets in netdev_port_receive(). The following is an incremental
> > patch on the entire series. Is this the kind of thing you had in mind?
> >
> > diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> > index 86f2cfb19de3..42587d5bf894 100644
> > --- a/net/openvswitch/flow.c
> > +++ b/net/openvswitch/flow.c
> > @@ -729,7 +729,7 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
> >         key->phy.skb_mark = skb->mark;
> >         ovs_ct_fill_key(skb, key);
> >         key->ovs_flow_hash = 0;
> > -       key->phy.is_layer3 = skb->mac_len == 0;
> > +       key->phy.is_layer3 = skb_mac_header_was_set(skb) == 0;
> >         key->recirc_id = 0;
> >
> >         err = key_extract(skb, key);
> > diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
> > index 484ba529c682..8973d4db509b 100644
> > --- a/net/openvswitch/vport-internal_dev.c
> > +++ b/net/openvswitch/vport-internal_dev.c
> > @@ -50,7 +50,6 @@ static int internal_dev_xmit(struct sk_buff *skb, struct net_device *netdev)
> >
> >         skb->protocol = eth_type_trans(skb, netdev);
> >         skb_push(skb, ETH_HLEN);
> > -       skb_reset_mac_len(skb);
> >
> >         len = skb->len;
> >         rcu_read_lock();
> > diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
> > index 3df36df62ee9..4cf3f12ffc99 100644
> > --- a/net/openvswitch/vport-netdev.c
> > +++ b/net/openvswitch/vport-netdev.c
> > @@ -60,22 +60,9 @@ static void netdev_port_receive(struct sk_buff *skb)
> >         if (vport->dev->type == ARPHRD_ETHER) {
> >                 skb_push(skb, ETH_HLEN);
> >                 skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
> > -       } else if (vport->dev->type == ARPHRD_NONE) {
> > -               if (skb->protocol == htons(ETH_P_TEB)) {
> > -                       __be16 eth_type;
> > -
> > -                       if (unlikely(skb->len < ETH_HLEN))
> > -                               goto error;
> > -
> > -                       eth_type = eth_type_trans(skb, skb->dev);
> > -                       skb->mac_len = skb->data - skb_mac_header(skb);
> > -                       __skb_push(skb, skb->mac_len);
> > -
> > -                       if (eth_type == htons(ETH_P_8021Q))
> > -                               skb->mac_len += VLAN_HLEN;
> > -               } else {
> > -                       skb->mac_len = 0;
> > -               }
> > +       } else if (vport->dev->type == ARPHRD_NONE &&
> > +                  skb->protocol != htons(ETH_P_TEB)) {
> > +               skb->mac_header = (typeof(skb->mac_header))~0U;
> >         }
> >
> >         ovs_vport_receive(vport, skb, skb_tunnel_info(skb));
> 
> This certainly looks better. I was wondering if we can unset the mac
> header offset in L3 tunnel devices itself. So there is no need to have
> this check here.

I think that might be possible for GRE by modifying the following in
__ipgre_rcv().

                if (tunnel->dev->type != ARPHRD_NONE)
                        skb_pop_mac_header(skb);
                else
                        skb_reset_mac_header(skb);

But I am unsure what side effects this might have on other users of the
code.

Jiri, do you have any thoughts on this?
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v10 2/5] openvswitch: set skb protocol and mac_len when receiving on internal device
       [not found]                       ` <20160621022458.GA28358-IxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
@ 2016-06-21 16:30                         ` pravin shelar
  2016-06-23  2:04                           ` Simon Horman
  0 siblings, 1 reply; 23+ messages in thread
From: pravin shelar @ 2016-06-21 16:30 UTC (permalink / raw)
  To: Simon Horman; +Cc: ovs dev, Linux Kernel Network Developers, Jiri Benc

On Mon, Jun 20, 2016 at 7:25 PM, Simon Horman
<simon.horman@netronome.com> wrote:
> [Cc Jiri Benc]
>
> On Sat, Jun 18, 2016 at 06:38:54PM -0700, pravin shelar wrote:
>> On Thu, Jun 16, 2016 at 10:53 PM, Simon Horman
>> <simon.horman@netronome.com> wrote:
>> > On Tue, Jun 07, 2016 at 03:45:27PM -0700, pravin shelar wrote:
>> >> On Mon, Jun 6, 2016 at 8:08 PM, Simon Horman <simon.horman@netronome.com> wrote:
>> >> > On Thu, Jun 02, 2016 at 03:01:47PM -0700, pravin shelar wrote:
>> >> >> On Wed, Jun 1, 2016 at 11:24 PM, Simon Horman
>> >> >> <simon.horman@netronome.com> wrote:
>> >> >> > * Set skb protocol based on contents of packet. I have observed this is
>> >> >> >   necessary to get actual protocol of a packet when it is injected into an
>> >> >> >   internal device e.g. by libnet in which case skb protocol will be set to
>> >> >> >   ETH_ALL.
>> ....
>> ....
>> >> >         eth_type = eth_type_trans(skb, skb->dev);
>> >> >         skb->mac_len = skb->data - skb_mac_header(skb);
>> >> >         __skb_push(skb, skb->mac_len);
>> >> >
>> >> >         if (eth_type == htons(ETH_P_8021Q))
>> >> >                 skb->mac_len += VLAN_HLEN;
>> >> >
>> >> > Perhaps that logic ought to be in a helper used by both internal_dev_xmit()
>> >> > and netdev_port_receive(). Or somehow centralised in ovs_vport_receive().
>> >>
>> >> This does looks bit complex. Can we use other skb metadata like
>> >> skb_mac_header_was_set()?
>> >
>> > Yes, I think that can be made to work if skb->mac_header is unset
>> > for l3 packets in netdev_port_receive(). The following is an incremental
>> > patch on the entire series. Is this the kind of thing you had in mind?
>> >
>> > diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
>> > index 86f2cfb19de3..42587d5bf894 100644
>> > --- a/net/openvswitch/flow.c
>> > +++ b/net/openvswitch/flow.c
>> > @@ -729,7 +729,7 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
>> >         key->phy.skb_mark = skb->mark;
>> >         ovs_ct_fill_key(skb, key);
>> >         key->ovs_flow_hash = 0;
>> > -       key->phy.is_layer3 = skb->mac_len == 0;
>> > +       key->phy.is_layer3 = skb_mac_header_was_set(skb) == 0;
>> >         key->recirc_id = 0;
>> >
>> >         err = key_extract(skb, key);
>> > diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
>> > index 484ba529c682..8973d4db509b 100644
>> > --- a/net/openvswitch/vport-internal_dev.c
>> > +++ b/net/openvswitch/vport-internal_dev.c
>> > @@ -50,7 +50,6 @@ static int internal_dev_xmit(struct sk_buff *skb, struct net_device *netdev)
>> >
>> >         skb->protocol = eth_type_trans(skb, netdev);
>> >         skb_push(skb, ETH_HLEN);
>> > -       skb_reset_mac_len(skb);
>> >
>> >         len = skb->len;
>> >         rcu_read_lock();
>> > diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
>> > index 3df36df62ee9..4cf3f12ffc99 100644
>> > --- a/net/openvswitch/vport-netdev.c
>> > +++ b/net/openvswitch/vport-netdev.c
>> > @@ -60,22 +60,9 @@ static void netdev_port_receive(struct sk_buff *skb)
>> >         if (vport->dev->type == ARPHRD_ETHER) {
>> >                 skb_push(skb, ETH_HLEN);
>> >                 skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
>> > -       } else if (vport->dev->type == ARPHRD_NONE) {
>> > -               if (skb->protocol == htons(ETH_P_TEB)) {
>> > -                       __be16 eth_type;
>> > -
>> > -                       if (unlikely(skb->len < ETH_HLEN))
>> > -                               goto error;
>> > -
>> > -                       eth_type = eth_type_trans(skb, skb->dev);
>> > -                       skb->mac_len = skb->data - skb_mac_header(skb);
>> > -                       __skb_push(skb, skb->mac_len);
>> > -
>> > -                       if (eth_type == htons(ETH_P_8021Q))
>> > -                               skb->mac_len += VLAN_HLEN;
>> > -               } else {
>> > -                       skb->mac_len = 0;
>> > -               }
>> > +       } else if (vport->dev->type == ARPHRD_NONE &&
>> > +                  skb->protocol != htons(ETH_P_TEB)) {
>> > +               skb->mac_header = (typeof(skb->mac_header))~0U;
>> >         }
>> >
>> >         ovs_vport_receive(vport, skb, skb_tunnel_info(skb));
>>
>> This certainly looks better. I was wondering if we can unset the mac
>> header offset in L3 tunnel devices itself. So there is no need to have
>> this check here.
>
> I think that might be possible for GRE by modifying the following in
> __ipgre_rcv().
>
>                 if (tunnel->dev->type != ARPHRD_NONE)
>                         skb_pop_mac_header(skb);
>                 else
>                         skb_reset_mac_header(skb);
>
> But I am unsure what side effects this might have on other users of the
> code.
>
I think it is fine with device of type ARPHRD_NONE. metadata tunnel
devices would be of this type anyways.

> Jiri, do you have any thoughts on this?
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v10 2/5] openvswitch: set skb protocol and mac_len when receiving on internal device
  2016-06-21 16:30                         ` pravin shelar
@ 2016-06-23  2:04                           ` Simon Horman
  2016-06-27  9:35                             ` Jiri Benc
  0 siblings, 1 reply; 23+ messages in thread
From: Simon Horman @ 2016-06-23  2:04 UTC (permalink / raw)
  To: pravin shelar; +Cc: Linux Kernel Network Developers, ovs dev, Jiri Benc

On Tue, Jun 21, 2016 at 09:30:17AM -0700, pravin shelar wrote:
> On Mon, Jun 20, 2016 at 7:25 PM, Simon Horman
> <simon.horman@netronome.com> wrote:
> > [Cc Jiri Benc]
> >
> > On Sat, Jun 18, 2016 at 06:38:54PM -0700, pravin shelar wrote:
> >> On Thu, Jun 16, 2016 at 10:53 PM, Simon Horman
> >> <simon.horman@netronome.com> wrote:
> >> > On Tue, Jun 07, 2016 at 03:45:27PM -0700, pravin shelar wrote:
> >> >> On Mon, Jun 6, 2016 at 8:08 PM, Simon Horman <simon.horman@netronome.com> wrote:
> >> >> > On Thu, Jun 02, 2016 at 03:01:47PM -0700, pravin shelar wrote:
> >> >> >> On Wed, Jun 1, 2016 at 11:24 PM, Simon Horman
> >> >> >> <simon.horman@netronome.com> wrote:
> >> >> >> > * Set skb protocol based on contents of packet. I have observed this is
> >> >> >> >   necessary to get actual protocol of a packet when it is injected into an
> >> >> >> >   internal device e.g. by libnet in which case skb protocol will be set to
> >> >> >> >   ETH_ALL.
> >> ....
> >> ....
> >> >> >         eth_type = eth_type_trans(skb, skb->dev);
> >> >> >         skb->mac_len = skb->data - skb_mac_header(skb);
> >> >> >         __skb_push(skb, skb->mac_len);
> >> >> >
> >> >> >         if (eth_type == htons(ETH_P_8021Q))
> >> >> >                 skb->mac_len += VLAN_HLEN;
> >> >> >
> >> >> > Perhaps that logic ought to be in a helper used by both internal_dev_xmit()
> >> >> > and netdev_port_receive(). Or somehow centralised in ovs_vport_receive().
> >> >>
> >> >> This does looks bit complex. Can we use other skb metadata like
> >> >> skb_mac_header_was_set()?
> >> >
> >> > Yes, I think that can be made to work if skb->mac_header is unset
> >> > for l3 packets in netdev_port_receive(). The following is an incremental
> >> > patch on the entire series. Is this the kind of thing you had in mind?
> >> >
> >> > diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> >> > index 86f2cfb19de3..42587d5bf894 100644
> >> > --- a/net/openvswitch/flow.c
> >> > +++ b/net/openvswitch/flow.c
> >> > @@ -729,7 +729,7 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
> >> >         key->phy.skb_mark = skb->mark;
> >> >         ovs_ct_fill_key(skb, key);
> >> >         key->ovs_flow_hash = 0;
> >> > -       key->phy.is_layer3 = skb->mac_len == 0;
> >> > +       key->phy.is_layer3 = skb_mac_header_was_set(skb) == 0;
> >> >         key->recirc_id = 0;
> >> >
> >> >         err = key_extract(skb, key);
> >> > diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
> >> > index 484ba529c682..8973d4db509b 100644
> >> > --- a/net/openvswitch/vport-internal_dev.c
> >> > +++ b/net/openvswitch/vport-internal_dev.c
> >> > @@ -50,7 +50,6 @@ static int internal_dev_xmit(struct sk_buff *skb, struct net_device *netdev)
> >> >
> >> >         skb->protocol = eth_type_trans(skb, netdev);
> >> >         skb_push(skb, ETH_HLEN);
> >> > -       skb_reset_mac_len(skb);
> >> >
> >> >         len = skb->len;
> >> >         rcu_read_lock();
> >> > diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
> >> > index 3df36df62ee9..4cf3f12ffc99 100644
> >> > --- a/net/openvswitch/vport-netdev.c
> >> > +++ b/net/openvswitch/vport-netdev.c
> >> > @@ -60,22 +60,9 @@ static void netdev_port_receive(struct sk_buff *skb)
> >> >         if (vport->dev->type == ARPHRD_ETHER) {
> >> >                 skb_push(skb, ETH_HLEN);
> >> >                 skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
> >> > -       } else if (vport->dev->type == ARPHRD_NONE) {
> >> > -               if (skb->protocol == htons(ETH_P_TEB)) {
> >> > -                       __be16 eth_type;
> >> > -
> >> > -                       if (unlikely(skb->len < ETH_HLEN))
> >> > -                               goto error;
> >> > -
> >> > -                       eth_type = eth_type_trans(skb, skb->dev);
> >> > -                       skb->mac_len = skb->data - skb_mac_header(skb);
> >> > -                       __skb_push(skb, skb->mac_len);
> >> > -
> >> > -                       if (eth_type == htons(ETH_P_8021Q))
> >> > -                               skb->mac_len += VLAN_HLEN;
> >> > -               } else {
> >> > -                       skb->mac_len = 0;
> >> > -               }
> >> > +       } else if (vport->dev->type == ARPHRD_NONE &&
> >> > +                  skb->protocol != htons(ETH_P_TEB)) {
> >> > +               skb->mac_header = (typeof(skb->mac_header))~0U;
> >> >         }
> >> >
> >> >         ovs_vport_receive(vport, skb, skb_tunnel_info(skb));
> >>
> >> This certainly looks better. I was wondering if we can unset the mac
> >> header offset in L3 tunnel devices itself. So there is no need to have
> >> this check here.
> >
> > I think that might be possible for GRE by modifying the following in
> > __ipgre_rcv().
> >
> >                 if (tunnel->dev->type != ARPHRD_NONE)
> >                         skb_pop_mac_header(skb);
> >                 else
> >                         skb_reset_mac_header(skb);
> >
> > But I am unsure what side effects this might have on other users of the
> > code.
> >
> I think it is fine with device of type ARPHRD_NONE. metadata tunnel
> devices would be of this type anyways.
> 
> > Jiri, do you have any thoughts on this?

I think you are right as IIRC the call to skb_reset_mac_header was
added for this use-case. Its unfortunate that we can't use it in
internal_dev_xmit() because of loosing track of MPLS as you mentioned
earlier. But it does seem that setting mac_header to ~0 works well
in conjunction with updates to OvS posted earlier in this sub-therad.

I have the following working. Jiri, is the ip_gre portion acceptable to you?


diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 58d323289872..e6772b6934a3 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -279,7 +279,7 @@ static int __ipgre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi,
 		if (tunnel->dev->type != ARPHRD_NONE)
 			skb_pop_mac_header(skb);
 		else
-			skb_reset_mac_header(skb);
+			skb->mac_header = (typeof(skb->mac_header))~0U;
 		if (tunnel->collect_md) {
 			__be16 flags;
 			__be64 tun_id;
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index 4cf3f12ffc99..82b10802abe6 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -60,9 +60,6 @@ static void netdev_port_receive(struct sk_buff *skb)
 	if (vport->dev->type == ARPHRD_ETHER) {
 		skb_push(skb, ETH_HLEN);
 		skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
-	} else if (vport->dev->type == ARPHRD_NONE &&
-		   skb->protocol != htons(ETH_P_TEB)) {
-		skb->mac_header = (typeof(skb->mac_header))~0U;
 	}
 
 	ovs_vport_receive(vport, skb, skb_tunnel_info(skb));

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v10 2/5] openvswitch: set skb protocol and mac_len when receiving on internal device
  2016-06-23  2:04                           ` Simon Horman
@ 2016-06-27  9:35                             ` Jiri Benc
  0 siblings, 0 replies; 23+ messages in thread
From: Jiri Benc @ 2016-06-27  9:35 UTC (permalink / raw)
  To: Simon Horman; +Cc: pravin shelar, Linux Kernel Network Developers, ovs dev

On Thu, 23 Jun 2016 11:04:38 +0900, Simon Horman wrote:
> I think you are right as IIRC the call to skb_reset_mac_header was
> added for this use-case. Its unfortunate that we can't use it in
> internal_dev_xmit() because of loosing track of MPLS as you mentioned
> earlier. But it does seem that setting mac_header to ~0 works well
> in conjunction with updates to OvS posted earlier in this sub-therad.
> 
> I have the following working. Jiri, is the ip_gre portion acceptable to you?

Sorry for the late reply, I was off the last week.

> diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> index 58d323289872..e6772b6934a3 100644
> --- a/net/ipv4/ip_gre.c
> +++ b/net/ipv4/ip_gre.c
> @@ -279,7 +279,7 @@ static int __ipgre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi,
>  		if (tunnel->dev->type != ARPHRD_NONE)
>  			skb_pop_mac_header(skb);
>  		else
> -			skb_reset_mac_header(skb);
> +			skb->mac_header = (typeof(skb->mac_header))~0U;

Looks good. We should introduce a helper for this, though
(skb_unset_mac_header or so).

Thanks!

 Jiri

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2016-06-27  9:35 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-02  6:24 [PATCH net-next v10 0/5] openvswitch: support for layer 3 encapsulated packets Simon Horman
2016-06-02  6:24 ` [PATCH net-next v10 1/5] net: add skb_vlan_accel helper Simon Horman
2016-06-02 22:01   ` pravin shelar
2016-06-02  6:24 ` [PATCH net-next v10 2/5] openvswitch: set skb protocol and mac_len when receiving on internal device Simon Horman
     [not found]   ` <1464848686-7656-3-git-send-email-simon.horman-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>
2016-06-02 22:01     ` pravin shelar
2016-06-07  3:08       ` Simon Horman
     [not found]         ` <20160607030809.GE31696-IxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
2016-06-07 22:45           ` pravin shelar
2016-06-17  5:53             ` Simon Horman
     [not found]               ` <20160617055331.GA24833-IxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
2016-06-19  1:38                 ` pravin shelar
     [not found]                   ` <CAOrHB_BWfLk8yba2XkF43Hm-czGUZnVWQQ=HW3-vvpFEH7GNpA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-06-21  2:25                     ` Simon Horman
     [not found]                       ` <20160621022458.GA28358-IxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
2016-06-21 16:30                         ` pravin shelar
2016-06-23  2:04                           ` Simon Horman
2016-06-27  9:35                             ` Jiri Benc
2016-06-02  6:24 ` [PATCH net-next v10 3/5] openvswitch: add support to push and pop mpls for layer3 packets Simon Horman
2016-06-02 22:02   ` pravin shelar
2016-06-07  2:51     ` Simon Horman
2016-06-07 22:45       ` pravin shelar
2016-06-02  6:24 ` [PATCH net-next v10 4/5] openvswitch: add layer 3 flow/port support Simon Horman
     [not found]   ` <1464848686-7656-5-git-send-email-simon.horman-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>
2016-06-02 22:02     ` pravin shelar
2016-06-07  2:46       ` Simon Horman
     [not found]         ` <20160607024609.GC31696-IxS8c3vjKQDk1uMJSBkQmQ@public.gmane.org>
2016-06-07 22:45           ` pravin shelar
2016-06-17  6:53             ` Simon Horman
2016-06-02  6:24 ` [PATCH net-next v10 5/5] openvswitch: use ipgre tunnel rather than gretap tunnel Simon Horman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.