[PATCH v3 net-next 0/3] net: mpls: fragmentation and gso fixes for locally originated traffic

* [PATCH v3 net-next 0/3] net: mpls: fragmentation and gso fixes for locally originated traffic
@ 2016-08-19 17:08 David Ahern
  2016-08-19 17:09 ` [PATCH net-next 1/3] net: lwtunnel: Handle fragmentation David Ahern
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: David Ahern @ 2016-08-19 17:08 UTC (permalink / raw)
  To: netdev, davem
  Cc: buytenh, simon.horman, ebiederm, rshearma, tom, tgraf,
	olivier.dugeon, alexander.duyck, roopa, David Ahern

This series fixes mtu and fragmentation for tunnels using lwtunnel
output redirect, and fixes GSO for MPLS for locally originated traffic,
as reported by Lennert Buytenhek.

A follow-on series will address fragmentation and GSO for forwarded
MPLS traffic. Hardware offload of GSO with MPLS also needs to be
addressed.

v3
- updates to mpls_gso_segment per Alex's comments
- dropped skb->encapsulation = 1 from mpls_xmit per Alex's comment

v2
- consistent use of network_header in skb to fix GSO for MPLS
- update MPLS code in OVS to network_header and inner_network_header


David Ahern (2):
  net: mpls: Fixups for GSO
  net: veth: Set features for MPLS

Roopa Prabhu (1):
  net: lwtunnel: Handle fragmentation

 drivers/net/veth.c        |  1 +
 include/net/lwtunnel.h    | 44 ++++++++++++++++++++++++++++++++++++++++++++
 net/core/lwtunnel.c       | 35 +++++++++++++++++++++++++++++++++++
 net/ipv4/ip_output.c      |  8 ++++++++
 net/ipv4/route.c          |  4 +++-
 net/ipv6/ip6_output.c     |  8 ++++++++
 net/ipv6/route.c          |  4 +++-
 net/mpls/mpls_gso.c       | 38 +++++++++++++++++++++++++++-----------
 net/mpls/mpls_iptunnel.c  | 13 +++++++++----
 net/openvswitch/actions.c |  6 ++++++
 10 files changed, 144 insertions(+), 17 deletions(-)

-- 
2.1.4


* [PATCH net-next 1/3] net: lwtunnel: Handle fragmentation
  2016-08-19 17:08 [PATCH v3 net-next 0/3] net: mpls: fragmentation and gso fixes for locally originated traffic David Ahern
@ 2016-08-19 17:09 ` David Ahern
  2016-08-19 17:09 ` [PATCH net-next 2/3] net: mpls: Fixups for GSO David Ahern
  2016-08-19 17:09 ` [PATCH net-next 3/3] net: veth: Set features for MPLS David Ahern
  2 siblings, 0 replies; 22+ messages in thread
From: David Ahern @ 2016-08-19 17:09 UTC (permalink / raw)
  To: netdev, davem
  Cc: buytenh, simon.horman, ebiederm, rshearma, tom, tgraf,
	olivier.dugeon, alexander.duyck, roopa, David Ahern

From: Roopa Prabhu <roopa@cumulusnetworks.com>

Today the mpls iptunnel lwtunnel_output redirect expects the tunnel
output function to handle fragmentation. This is ok but can be
avoided if we do not do the mpls output redirect too early,
i.e. we can wait until ip fragmentation is done and then call
mpls output for each ip fragment.

To make this work we need:
1) the lwtunnel state to carry the encap headroom, and
2) the redirect to the encap output handler to be done on the ip
   fragment (essentially do the output redirect after fragmentation)

This patch adds tunnel headroom to lwtstate to make sure we
account for tunnel data in mtu calculations during fragmentation,
and adds a new xmit redirect handler to redirect to the lwtunnel
xmit func after ip fragmentation.
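
For illustration only, the mtu accounting described above can be
modelled in userspace roughly as follows. This is a sketch, not the
kernel code: struct lwt_model, XMIT_REDIRECT, model_headroom() and
effective_mtu() are hypothetical stand-ins for lwtunnel_state,
LWTUNNEL_STATE_XMIT_REDIRECT, lwtunnel_headroom() and the tail of
ipv4_mtu()/ip6_mtu() in the diff below.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; the names are illustrative, not kernel API. */
#define XMIT_REDIRECT (1u << 2)	/* models LWTUNNEL_STATE_XMIT_REDIRECT */

struct lwt_model {
	unsigned int flags;
	unsigned short headroom;	/* bytes the encap will prepend */
};

static bool xmit_redirect(const struct lwt_model *st)
{
	return st && (st->flags & XMIT_REDIRECT);
}

/* Headroom only counts for xmit-redirect tunnels, and only if it is
 * smaller than the mtu; otherwise nothing is subtracted.
 */
static unsigned int model_headroom(const struct lwt_model *st,
				   unsigned int mtu)
{
	if (xmit_redirect(st) && st->headroom < mtu)
		return st->headroom;

	return 0;
}

/* mtu seen by ip fragmentation once the encap bytes are reserved */
static unsigned int effective_mtu(const struct lwt_model *st,
				  unsigned int mtu)
{
	return mtu - model_headroom(st, mtu);
}
```

With two labels (8 bytes of headroom) on a 1500 byte link, the model
leaves 1492 bytes for each ip fragment; a route without lwtstate, or
without the xmit redirect flag, keeps the full 1500.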

This includes IPv6 support and some mtu fixes and testing from David Ahern.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 include/net/lwtunnel.h   | 44 ++++++++++++++++++++++++++++++++++++++++++++
 net/core/lwtunnel.c      | 35 +++++++++++++++++++++++++++++++++++
 net/ipv4/ip_output.c     |  8 ++++++++
 net/ipv4/route.c         |  4 +++-
 net/ipv6/ip6_output.c    |  8 ++++++++
 net/ipv6/route.c         |  4 +++-
 net/mpls/mpls_iptunnel.c |  9 +++++----
 7 files changed, 106 insertions(+), 6 deletions(-)

diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h
index e9f116e29c22..ea3f80f58fd6 100644
--- a/include/net/lwtunnel.h
+++ b/include/net/lwtunnel.h
@@ -13,6 +13,13 @@
 /* lw tunnel state flags */
 #define LWTUNNEL_STATE_OUTPUT_REDIRECT	BIT(0)
 #define LWTUNNEL_STATE_INPUT_REDIRECT	BIT(1)
+#define LWTUNNEL_STATE_XMIT_REDIRECT	BIT(2)
+
+enum {
+	LWTUNNEL_XMIT_DONE,
+	LWTUNNEL_XMIT_CONTINUE,
+};
+
 
 struct lwtunnel_state {
 	__u16		type;
@@ -21,6 +28,7 @@ struct lwtunnel_state {
 	int		(*orig_output)(struct net *net, struct sock *sk, struct sk_buff *skb);
 	int		(*orig_input)(struct sk_buff *);
 	int             len;
+	__u16		headroom;
 	__u8            data[0];
 };
 
@@ -34,6 +42,7 @@ struct lwtunnel_encap_ops {
 			  struct lwtunnel_state *lwtstate);
 	int (*get_encap_size)(struct lwtunnel_state *lwtstate);
 	int (*cmp_encap)(struct lwtunnel_state *a, struct lwtunnel_state *b);
+	int (*xmit)(struct sk_buff *skb);
 };
 
 #ifdef CONFIG_LWTUNNEL
@@ -75,6 +84,24 @@ static inline bool lwtunnel_input_redirect(struct lwtunnel_state *lwtstate)
 
 	return false;
 }
+
+static inline bool lwtunnel_xmit_redirect(struct lwtunnel_state *lwtstate)
+{
+	if (lwtstate && (lwtstate->flags & LWTUNNEL_STATE_XMIT_REDIRECT))
+		return true;
+
+	return false;
+}
+
+static inline unsigned int lwtunnel_headroom(struct lwtunnel_state *lwtstate,
+					     unsigned int mtu)
+{
+	if (lwtunnel_xmit_redirect(lwtstate) && lwtstate->headroom < mtu)
+		return lwtstate->headroom;
+
+	return 0;
+}
+
 int lwtunnel_encap_add_ops(const struct lwtunnel_encap_ops *op,
 			   unsigned int num);
 int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops *op,
@@ -90,6 +117,7 @@ struct lwtunnel_state *lwtunnel_state_alloc(int hdr_len);
 int lwtunnel_cmp_encap(struct lwtunnel_state *a, struct lwtunnel_state *b);
 int lwtunnel_output(struct net *net, struct sock *sk, struct sk_buff *skb);
 int lwtunnel_input(struct sk_buff *skb);
+int lwtunnel_xmit(struct sk_buff *skb);
 
 #else
 
@@ -117,6 +145,17 @@ static inline bool lwtunnel_input_redirect(struct lwtunnel_state *lwtstate)
 	return false;
 }
 
+static inline bool lwtunnel_xmit_redirect(struct lwtunnel_state *lwtstate)
+{
+	return false;
+}
+
+static inline unsigned int lwtunnel_headroom(struct lwtunnel_state *lwtstate,
+					     unsigned int mtu)
+{
+	return 0;
+}
+
 static inline int lwtunnel_encap_add_ops(const struct lwtunnel_encap_ops *op,
 					 unsigned int num)
 {
@@ -170,6 +209,11 @@ static inline int lwtunnel_input(struct sk_buff *skb)
 	return -EOPNOTSUPP;
 }
 
+static inline int lwtunnel_xmit(struct sk_buff *skb)
+{
+	return -EOPNOTSUPP;
+}
+
 #endif /* CONFIG_LWTUNNEL */
 
 #define MODULE_ALIAS_RTNL_LWT(encap_type) MODULE_ALIAS("rtnl-lwt-" __stringify(encap_type))
diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c
index 669ecc9f884e..e5f84c26ba1a 100644
--- a/net/core/lwtunnel.c
+++ b/net/core/lwtunnel.c
@@ -251,6 +251,41 @@ int lwtunnel_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 }
 EXPORT_SYMBOL(lwtunnel_output);
 
+int lwtunnel_xmit(struct sk_buff *skb)
+{
+	struct dst_entry *dst = skb_dst(skb);
+	const struct lwtunnel_encap_ops *ops;
+	struct lwtunnel_state *lwtstate;
+	int ret = -EINVAL;
+
+	if (!dst)
+		goto drop;
+
+	lwtstate = dst->lwtstate;
+
+	if (lwtstate->type == LWTUNNEL_ENCAP_NONE ||
+	    lwtstate->type > LWTUNNEL_ENCAP_MAX)
+		return 0;
+
+	ret = -EOPNOTSUPP;
+	rcu_read_lock();
+	ops = rcu_dereference(lwtun_encaps[lwtstate->type]);
+	if (likely(ops && ops->xmit))
+		ret = ops->xmit(skb);
+	rcu_read_unlock();
+
+	if (ret == -EOPNOTSUPP)
+		goto drop;
+
+	return ret;
+
+drop:
+	kfree_skb(skb);
+
+	return ret;
+}
+EXPORT_SYMBOL(lwtunnel_xmit);
+
 int lwtunnel_input(struct sk_buff *skb)
 {
 	struct dst_entry *dst = skb_dst(skb);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index dde37fb340bf..65569274efb8 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -73,6 +73,7 @@
 #include <net/icmp.h>
 #include <net/checksum.h>
 #include <net/inetpeer.h>
+#include <net/lwtunnel.h>
 #include <linux/igmp.h>
 #include <linux/netfilter_ipv4.h>
 #include <linux/netfilter_bridge.h>
@@ -197,6 +198,13 @@ static int ip_finish_output2(struct net *net, struct sock *sk, struct sk_buff *s
 		skb = skb2;
 	}
 
+	if (lwtunnel_xmit_redirect(dst->lwtstate)) {
+		int res = lwtunnel_xmit(skb);
+
+		if (res < 0 || res == LWTUNNEL_XMIT_DONE)
+			return res;
+	}
+
 	rcu_read_lock_bh();
 	nexthop = (__force u32) rt_nexthop(rt, ip_hdr(skb)->daddr);
 	neigh = __ipv4_neigh_lookup_noref(dev, nexthop);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index a1f2830d8110..3e992783c1d0 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1246,7 +1246,9 @@ static unsigned int ipv4_mtu(const struct dst_entry *dst)
 			mtu = 576;
 	}
 
-	return min_t(unsigned int, mtu, IP_MAX_MTU);
+	mtu = min_t(unsigned int, mtu, IP_MAX_MTU);
+
+	return mtu - lwtunnel_headroom(dst->lwtstate, mtu);
 }
 
 static struct fib_nh_exception *find_exception(struct fib_nh *nh, __be32 daddr)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 1dfc402d9ad1..993fd9666f1b 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -56,6 +56,7 @@
 #include <net/checksum.h>
 #include <linux/mroute6.h>
 #include <net/l3mdev.h>
+#include <net/lwtunnel.h>
 
 static int ip6_finish_output2(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
@@ -104,6 +105,13 @@ static int ip6_finish_output2(struct net *net, struct sock *sk, struct sk_buff *
 		}
 	}
 
+	if (lwtunnel_xmit_redirect(dst->lwtstate)) {
+		int res = lwtunnel_xmit(skb);
+
+		if (res < 0 || res == LWTUNNEL_XMIT_DONE)
+			return res;
+	}
+
 	rcu_read_lock_bh();
 	nexthop = rt6_nexthop((struct rt6_info *)dst, &ipv6_hdr(skb)->daddr);
 	neigh = __ipv6_neigh_lookup_noref(dst->dev, nexthop);
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 49817555449e..09d43ff11a8d 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1604,7 +1604,9 @@ static unsigned int ip6_mtu(const struct dst_entry *dst)
 	rcu_read_unlock();
 
 out:
-	return min_t(unsigned int, mtu, IP6_MAX_MTU);
+	mtu = min_t(unsigned int, mtu, IP6_MAX_MTU);
+
+	return mtu - lwtunnel_headroom(dst->lwtstate, mtu);
 }
 
 static struct dst_entry *icmp6_dst_gc_list;
diff --git a/net/mpls/mpls_iptunnel.c b/net/mpls/mpls_iptunnel.c
index 644a8da6d4bd..aed872cc05a6 100644
--- a/net/mpls/mpls_iptunnel.c
+++ b/net/mpls/mpls_iptunnel.c
@@ -37,7 +37,7 @@ static unsigned int mpls_encap_size(struct mpls_iptunnel_encap *en)
 	return en->labels * sizeof(struct mpls_shim_hdr);
 }
 
-static int mpls_output(struct net *net, struct sock *sk, struct sk_buff *skb)
+static int mpls_xmit(struct sk_buff *skb)
 {
 	struct mpls_iptunnel_encap *tun_encap_info;
 	struct mpls_shim_hdr *hdr;
@@ -115,7 +115,7 @@ static int mpls_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 		net_dbg_ratelimited("%s: packet transmission failed: %d\n",
 				    __func__, err);
 
-	return 0;
+	return LWTUNNEL_XMIT_DONE;
 
 drop:
 	kfree_skb(skb);
@@ -153,7 +153,8 @@ static int mpls_build_state(struct net_device *dev, struct nlattr *nla,
 	if (ret)
 		goto errout;
 	newts->type = LWTUNNEL_ENCAP_MPLS;
-	newts->flags |= LWTUNNEL_STATE_OUTPUT_REDIRECT;
+	newts->flags |= LWTUNNEL_STATE_XMIT_REDIRECT;
+	newts->headroom = mpls_encap_size(tun_encap_info);
 
 	*ts = newts;
 
@@ -209,7 +210,7 @@ static int mpls_encap_cmp(struct lwtunnel_state *a, struct lwtunnel_state *b)
 
 static const struct lwtunnel_encap_ops mpls_iptun_ops = {
 	.build_state = mpls_build_state,
-	.output = mpls_output,
+	.xmit = mpls_xmit,
 	.fill_encap = mpls_fill_encap_info,
 	.get_encap_size = mpls_encap_nlsize,
 	.cmp_encap = mpls_encap_cmp,
-- 
2.1.4


* [PATCH net-next 2/3] net: mpls: Fixups for GSO
  2016-08-19 17:08 [PATCH v3 net-next 0/3] net: mpls: fragmentation and gso fixes for locally originated traffic David Ahern
  2016-08-19 17:09 ` [PATCH net-next 1/3] net: lwtunnel: Handle fragmentation David Ahern
@ 2016-08-19 17:09 ` David Ahern
  2016-08-19 20:17   ` Alexander Duyck
  2016-08-22 12:21   ` Simon Horman
  2016-08-19 17:09 ` [PATCH net-next 3/3] net: veth: Set features for MPLS David Ahern
  2 siblings, 2 replies; 22+ messages in thread
From: David Ahern @ 2016-08-19 17:09 UTC (permalink / raw)
  To: netdev, davem
  Cc: buytenh, simon.horman, ebiederm, rshearma, tom, tgraf,
	olivier.dugeon, alexander.duyck, roopa, David Ahern

As reported by Lennert the MPLS GSO code is failing to properly segment
large packets. There are a couple of problems:

1. the inner protocol is not set so the gso segment functions for inner
   protocol layers are not getting run, and

2. MPLS labels for packets that use the "native" (non-OVS) MPLS code
   are not properly accounted for in mpls_gso_segment.

The MPLS GSO code was added for OVS. It is re-using skb_mac_gso_segment
to call the gso segment functions for the higher layer protocols. That
means skb_mac_gso_segment is called twice -- once with the network
protocol set to MPLS and again with the network protocol set to the
inner protocol.

This patch sets the inner skb protocol addressing item 1 above and sets
the network_header and inner_network_header to mark where the MPLS labels
start and end. The MPLS code in OVS is also updated to set the two
network markers.

From there the MPLS GSO code uses the difference between the network
header and the inner network header to know the size of the MPLS header
that was pushed. It then pulls the MPLS header, resets the mac_len and
protocol for the inner protocol and then calls skb_mac_gso_segment
to segment the skb.

After the inner protocol segmentation is done, the skb protocol
is set to MPLS for each segment and the network and mac headers
are restored.
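
For illustration only, the header bookkeeping described above can be
modelled in userspace as below. This is a sketch under assumptions,
not the kernel code: struct skb_model and its helpers are hypothetical,
all offsets are relative to the start of the buffer, and the actual
segmentation step in the middle is left out.

```c
#include <assert.h>

/* Userspace model of the skb fields the MPLS GSO path touches. */
struct skb_model {
	unsigned int data;			/* current data offset */
	unsigned int len;			/* bytes from data to end */
	unsigned int mac_header;
	unsigned int network_header;		/* start of MPLS labels */
	unsigned int inner_network_header;	/* start of inner IP header */
	unsigned int mac_len;
};

static void model_pull(struct skb_model *skb, unsigned int n)
{
	assert(n <= skb->len);
	skb->data += n;
	skb->len -= n;
}

static void model_push(struct skb_model *skb, unsigned int n)
{
	assert(n <= skb->data);
	skb->data -= n;
	skb->len += n;
}

/* Entered with data at the MPLS labels (mac header already pulled).
 * Returns the label stack size and leaves data at the inner header.
 */
static unsigned int strip_mpls(struct skb_model *skb)
{
	unsigned int hlen;

	skb->network_header = skb->data;
	hlen = skb->inner_network_header - skb->network_header;
	model_pull(skb, hlen);
	skb->mac_len = 0;
	skb->mac_header = skb->data;

	return hlen;
}

/* Applied to each segment: put the labels and mac header back. */
static void restore_mpls(struct skb_model *skb, unsigned int hlen,
			 unsigned int mac_len)
{
	skb->mac_len = mac_len;
	model_push(skb, hlen + mac_len);
	skb->mac_header = skb->data;
	skb->network_header = skb->data + mac_len;
}
```

In this model a frame with a 14 byte mac header at offset 64 and two
labels at offset 78 yields a label stack size of 8, and restoring a
segment puts the mac header back at 64 and the network header at 78.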

Reported-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 net/mpls/mpls_gso.c       | 38 +++++++++++++++++++++++++++-----------
 net/mpls/mpls_iptunnel.c  |  4 ++++
 net/openvswitch/actions.c |  6 ++++++
 3 files changed, 37 insertions(+), 11 deletions(-)

diff --git a/net/mpls/mpls_gso.c b/net/mpls/mpls_gso.c
index 2055e57ed1c3..2aa4beaa0e4f 100644
--- a/net/mpls/mpls_gso.c
+++ b/net/mpls/mpls_gso.c
@@ -23,32 +23,48 @@ static struct sk_buff *mpls_gso_segment(struct sk_buff *skb,
 				       netdev_features_t features)
 {
 	struct sk_buff *segs = ERR_PTR(-EINVAL);
+	u16 mac_offset = skb->mac_header;
 	netdev_features_t mpls_features;
+	u16 mac_len = skb->mac_len;
 	__be16 mpls_protocol;
+	int mpls_hlen;
+
+	skb_reset_network_header(skb);
+	mpls_hlen = skb_inner_network_header(skb) - skb_network_header(skb);
+	if (unlikely(!pskb_may_pull(skb, mpls_hlen)))
+		goto out;
 
 	/* Setup inner SKB. */
 	mpls_protocol = skb->protocol;
 	skb->protocol = skb->inner_protocol;
 
-	/* Push back the mac header that skb_mac_gso_segment() has pulled.
-	 * It will be re-pulled by the call to skb_mac_gso_segment() below
-	 */
-	__skb_push(skb, skb->mac_len);
+	__skb_pull(skb, mpls_hlen);
+
+	skb->mac_len = 0;
+	skb_reset_mac_header(skb);
+	skb_set_network_header(skb, skb_inner_network_offset(skb));
 
 	/* Segment inner packet. */
 	mpls_features = skb->dev->mpls_features & features;
 	segs = skb_mac_gso_segment(skb, mpls_features);
+	if (IS_ERR_OR_NULL(segs)) {
+		skb_gso_error_unwind(skb, mpls_protocol, mpls_hlen, mac_offset,
+				     mac_len);
+		goto out;
+	}
 
+	skb = segs;
+	do {
+		skb->mac_len = mac_len;
+		skb->protocol = mpls_protocol;
 
-	/* Restore outer protocol. */
-	skb->protocol = mpls_protocol;
+		__skb_push(skb, mpls_hlen + mac_len);
 
-	/* Re-pull the mac header that the call to skb_mac_gso_segment()
-	 * above pulled.  It will be re-pushed after returning
-	 * skb_mac_gso_segment(), an indirect caller of this function.
-	 */
-	__skb_pull(skb, skb->data - skb_mac_header(skb));
+		skb_reset_mac_header(skb);
+		skb_set_network_header(skb, mac_len);
+	} while ((skb = skb->next));
 
+out:
 	return segs;
 }
 
diff --git a/net/mpls/mpls_iptunnel.c b/net/mpls/mpls_iptunnel.c
index aed872cc05a6..cf52cf30ac4b 100644
--- a/net/mpls/mpls_iptunnel.c
+++ b/net/mpls/mpls_iptunnel.c
@@ -90,7 +90,11 @@ static int mpls_xmit(struct sk_buff *skb)
 	if (skb_cow(skb, hh_len + new_header_size))
 		goto drop;
 
+	skb_set_inner_protocol(skb, skb->protocol);
+	skb_reset_inner_network_header(skb);
+
 	skb_push(skb, new_header_size);
+
 	skb_reset_network_header(skb);
 
 	skb->dev = out_dev;
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 1ecbd7715f6d..6d78f162a88b 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -167,6 +167,12 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
 		skb->mac_len);
 	skb_reset_mac_header(skb);
 
+	/* for GSO: set MPLS as network header and encapsulated protocol
+	 * header as inner network header
+	 */
+	skb_set_network_header(skb, skb->mac_len);
+	skb_set_inner_network_header(skb, skb->mac_len + MPLS_HLEN);
+
 	new_mpls_lse = (__be32 *)skb_mpls_header(skb);
 	*new_mpls_lse = mpls->mpls_lse;
 
-- 
2.1.4


* [PATCH net-next 3/3] net: veth: Set features for MPLS
  2016-08-19 17:08 [PATCH v3 net-next 0/3] net: mpls: fragmentation and gso fixes for locally originated traffic David Ahern
  2016-08-19 17:09 ` [PATCH net-next 1/3] net: lwtunnel: Handle fragmentation David Ahern
  2016-08-19 17:09 ` [PATCH net-next 2/3] net: mpls: Fixups for GSO David Ahern
@ 2016-08-19 17:09 ` David Ahern
  2 siblings, 0 replies; 22+ messages in thread
From: David Ahern @ 2016-08-19 17:09 UTC (permalink / raw)
  To: netdev, davem
  Cc: buytenh, simon.horman, ebiederm, rshearma, tom, tgraf,
	olivier.dugeon, alexander.duyck, roopa, David Ahern

veth does not really transmit packets; it only moves the skb from one
netdev to another, so gso and checksum offload are not really needed.
Add the features to mpls_features to get the same benefit and
performance with MPLS as without it.

Reported-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 drivers/net/veth.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index f37a6e61d4ad..5db320a4d5cf 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -340,6 +340,7 @@ static void veth_setup(struct net_device *dev)
 
 	dev->hw_features = VETH_FEATURES;
 	dev->hw_enc_features = VETH_FEATURES;
+	dev->mpls_features = NETIF_F_HW_CSUM | NETIF_F_GSO_SOFTWARE;
 }
 
 /*
-- 
2.1.4


* Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO
  2016-08-19 17:09 ` [PATCH net-next 2/3] net: mpls: Fixups for GSO David Ahern
@ 2016-08-19 20:17   ` Alexander Duyck
  2016-08-22 12:21   ` Simon Horman
  1 sibling, 0 replies; 22+ messages in thread
From: Alexander Duyck @ 2016-08-19 20:17 UTC (permalink / raw)
  To: David Ahern
  Cc: Netdev, David Miller, Lennert Buytenhek, Simon Horman,
	Eric W. Biederman, rshearma, Tom Herbert, Thomas Graf,
	olivier.dugeon, Roopa Prabhu

On Fri, Aug 19, 2016 at 10:09 AM, David Ahern <dsa@cumulusnetworks.com> wrote:
> As reported by Lennert the MPLS GSO code is failing to properly segment
> large packets. There are a couple of problems:
>
> 1. the inner protocol is not set so the gso segment functions for inner
>    protocol layers are not getting run, and
>
> 2  MPLS labels for packets that use the "native" (non-OVS) MPLS code
>    are not properly accounted for in mpls_gso_segment.
>
> The MPLS GSO code was added for OVS. It is re-using skb_mac_gso_segment
> to call the gso segment functions for the higher layer protocols. That
> means skb_mac_gso_segment is called twice -- once with the network
> protocol set to MPLS and again with the network protocol set to the
> inner protocol.
>
> This patch sets the inner skb protocol addressing item 1 above and sets
> the network_header and inner_network_header to mark where the MPLS labels
> start and end. The MPLS code in OVS is also updated to set the two
> network markers.
>
> From there the MPLS GSO code uses the difference between the network
> header and the inner network header to know the size of the MPLS header
> that was pushed. It then pulls the MPLS header, resets the mac_len and
> protocol for the inner protocol and then calls skb_mac_gso_segment
> to segment the skb.
>
> Afterward the inner protocol segmentation is done the skb protocol
> is set to mpls for each segment and the network and mac headers
> restored.
>
> Reported-by: Lennert Buytenhek <buytenh@wantstofly.org>
> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
> ---
>  net/mpls/mpls_gso.c       | 38 +++++++++++++++++++++++++++-----------
>  net/mpls/mpls_iptunnel.c  |  4 ++++
>  net/openvswitch/actions.c |  6 ++++++
>  3 files changed, 37 insertions(+), 11 deletions(-)
>
> diff --git a/net/mpls/mpls_gso.c b/net/mpls/mpls_gso.c
> index 2055e57ed1c3..2aa4beaa0e4f 100644
> --- a/net/mpls/mpls_gso.c
> +++ b/net/mpls/mpls_gso.c
> @@ -23,32 +23,48 @@ static struct sk_buff *mpls_gso_segment(struct sk_buff *skb,
>                                        netdev_features_t features)
>  {
>         struct sk_buff *segs = ERR_PTR(-EINVAL);
> +       u16 mac_offset = skb->mac_header;
>         netdev_features_t mpls_features;
> +       u16 mac_len = skb->mac_len;
>         __be16 mpls_protocol;
> +       int mpls_hlen;
> +
> +       skb_reset_network_header(skb);
> +       mpls_hlen = skb_inner_network_header(skb) - skb_network_header(skb);
> +       if (unlikely(!pskb_may_pull(skb, mpls_hlen)))
> +               goto out;
>
>         /* Setup inner SKB. */
>         mpls_protocol = skb->protocol;
>         skb->protocol = skb->inner_protocol;
>
> -       /* Push back the mac header that skb_mac_gso_segment() has pulled.
> -        * It will be re-pulled by the call to skb_mac_gso_segment() below
> -        */
> -       __skb_push(skb, skb->mac_len);
> +       __skb_pull(skb, mpls_hlen);
> +
> +       skb->mac_len = 0;
> +       skb_reset_mac_header(skb);
> +       skb_set_network_header(skb, skb_inner_network_offset(skb));

No need to set the network header.  Both IPv4 and IPv6 GSO paths will
reset the network header just like you did at the start.

>         /* Segment inner packet. */
>         mpls_features = skb->dev->mpls_features & features;
>         segs = skb_mac_gso_segment(skb, mpls_features);
> +       if (IS_ERR_OR_NULL(segs)) {
> +               skb_gso_error_unwind(skb, mpls_protocol, mpls_hlen, mac_offset,
> +                                    mac_len);
> +               goto out;
> +       }
>
> +       skb = segs;

You could probably pull your math for mpls_hlen + mac_len out of the
loop below and just take care of adding mac_len to mpls_hlen up here
and store it off in mpls_hlen since it isn't used anywhere else.

> +       do {
> +               skb->mac_len = mac_len;
> +               skb->protocol = mpls_protocol;
>
> -       /* Restore outer protocol. */
> -       skb->protocol = mpls_protocol;
> +               __skb_push(skb, mpls_hlen + mac_len);
>
> -       /* Re-pull the mac header that the call to skb_mac_gso_segment()
> -        * above pulled.  It will be re-pushed after returning
> -        * skb_mac_gso_segment(), an indirect caller of this function.
> -        */
> -       __skb_pull(skb, skb->data - skb_mac_header(skb));

You need to store off the inner network header before you overwrite it
in the lines below.  Either skb_reset_inner_network_header before the
push, or skb_reset_inner_headers before you call the two lines below.

> +               skb_reset_mac_header(skb);
> +               skb_set_network_header(skb, mac_len);
> +       } while ((skb = skb->next));
>
> +out:
>         return segs;
>  }
>
> diff --git a/net/mpls/mpls_iptunnel.c b/net/mpls/mpls_iptunnel.c
> index aed872cc05a6..cf52cf30ac4b 100644
> --- a/net/mpls/mpls_iptunnel.c
> +++ b/net/mpls/mpls_iptunnel.c
> @@ -90,7 +90,11 @@ static int mpls_xmit(struct sk_buff *skb)
>         if (skb_cow(skb, hh_len + new_header_size))
>                 goto drop;
>
> +       skb_set_inner_protocol(skb, skb->protocol);
> +       skb_reset_inner_network_header(skb);
> +
>         skb_push(skb, new_header_size);
> +
>         skb_reset_network_header(skb);
>
>         skb->dev = out_dev;
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index 1ecbd7715f6d..6d78f162a88b 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -167,6 +167,12 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
>                 skb->mac_len);
>         skb_reset_mac_header(skb);
>
> +       /* for GSO: set MPLS as network header and encapsulated protocol
> +        * header as inner network header
> +        */
> +       skb_set_network_header(skb, skb->mac_len);
> +       skb_set_inner_network_header(skb, skb->mac_len + MPLS_HLEN);
> +
>         new_mpls_lse = (__be32 *)skb_mpls_header(skb);
>         *new_mpls_lse = mpls->mpls_lse;
>
> --
> 2.1.4
>


* Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO
  2016-08-19 17:09 ` [PATCH net-next 2/3] net: mpls: Fixups for GSO David Ahern
  2016-08-19 20:17   ` Alexander Duyck
@ 2016-08-22 12:21   ` Simon Horman
  2016-08-22 13:11     ` David Ahern
  1 sibling, 1 reply; 22+ messages in thread
From: Simon Horman @ 2016-08-22 12:21 UTC (permalink / raw)
  To: David Ahern
  Cc: netdev, davem, buytenh, ebiederm, rshearma, tom, tgraf,
	olivier.dugeon, alexander.duyck, roopa

On Fri, Aug 19, 2016 at 10:09:01AM -0700, David Ahern wrote:
> As reported by Lennert the MPLS GSO code is failing to properly segment
> large packets. There are a couple of problems:
> 
> 1. the inner protocol is not set so the gso segment functions for inner
>    protocol layers are not getting run, and
> 
> 2  MPLS labels for packets that use the "native" (non-OVS) MPLS code
>    are not properly accounted for in mpls_gso_segment.
> 
> The MPLS GSO code was added for OVS. It is re-using skb_mac_gso_segment
> to call the gso segment functions for the higher layer protocols. That
> means skb_mac_gso_segment is called twice -- once with the network
> protocol set to MPLS and again with the network protocol set to the
> inner protocol.
> 
> This patch sets the inner skb protocol addressing item 1 above and sets
> the network_header and inner_network_header to mark where the MPLS labels
> start and end. The MPLS code in OVS is also updated to set the two
> network markers.
> 
> From there the MPLS GSO code uses the difference between the network
> header and the inner network header to know the size of the MPLS header
> that was pushed. It then pulls the MPLS header, resets the mac_len and
> protocol for the inner protocol and then calls skb_mac_gso_segment
> to segment the skb.
> 
> Afterward the inner protocol segmentation is done the skb protocol
> is set to mpls for each segment and the network and mac headers
> restored.
> 
> Reported-by: Lennert Buytenhek <buytenh@wantstofly.org>
> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
> ---
>  net/mpls/mpls_gso.c       | 38 +++++++++++++++++++++++++++-----------
>  net/mpls/mpls_iptunnel.c  |  4 ++++
>  net/openvswitch/actions.c |  6 ++++++
>  3 files changed, 37 insertions(+), 11 deletions(-)
> 
> diff --git a/net/mpls/mpls_gso.c b/net/mpls/mpls_gso.c
> index 2055e57ed1c3..2aa4beaa0e4f 100644

...

> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index 1ecbd7715f6d..6d78f162a88b 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -167,6 +167,12 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
>  		skb->mac_len);
>  	skb_reset_mac_header(skb);
>  
> +	/* for GSO: set MPLS as network header and encapsulated protocol
> +	 * header as inner network header
> +	 */
> +	skb_set_network_header(skb, skb->mac_len);
> +	skb_set_inner_network_header(skb, skb->mac_len + MPLS_HLEN);
> +
>  	new_mpls_lse = (__be32 *)skb_mpls_header(skb);
>  	*new_mpls_lse = mpls->mpls_lse;

Is the above calculation correct if push_mpls() is called multiple times?


* Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO
  2016-08-22 12:21   ` Simon Horman
@ 2016-08-22 13:11     ` David Ahern
  2016-08-22 14:51       ` Simon Horman
  0 siblings, 1 reply; 22+ messages in thread
From: David Ahern @ 2016-08-22 13:11 UTC (permalink / raw)
  To: Simon Horman
  Cc: netdev, davem, buytenh, ebiederm, rshearma, tom, tgraf,
	olivier.dugeon, alexander.duyck, roopa

On 8/22/16 6:21 AM, Simon Horman wrote:
>> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
>> index 1ecbd7715f6d..6d78f162a88b 100644
>> --- a/net/openvswitch/actions.c
>> +++ b/net/openvswitch/actions.c
>> @@ -167,6 +167,12 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
>>  		skb->mac_len);
>>  	skb_reset_mac_header(skb);
>>  
>> +	/* for GSO: set MPLS as network header and encapsulated protocol
>> +	 * header as inner network header
>> +	 */
>> +	skb_set_network_header(skb, skb->mac_len);
>> +	skb_set_inner_network_header(skb, skb->mac_len + MPLS_HLEN);
>> +
>>  	new_mpls_lse = (__be32 *)skb_mpls_header(skb);
>>  	*new_mpls_lse = mpls->mpls_lse;
> 
> Is the above calculation correct if push_mpls() is called multiple times?
> 

No. Does OVS support more than one label? I really need someone who is familiar with the OVS code to make sure it works for all use cases. One option would be to call skb_set_inner_network_header() before pushing a series of MPLS labels.
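
For illustration only, the problem with repeated pushes can be shown
with a small userspace model. This is a sketch under assumptions, not
OVS code: struct ovs_skb_model and both push helpers are hypothetical,
offsets are relative to the start of the buffer, and the "pinned"
variant is just one possible reading of the idea of setting the inner
network header before the labels are pushed.

```c
#include <assert.h>

#define MPLS_HLEN 4

/* Userspace model; data is the offset of the mac header. */
struct ovs_skb_model {
	unsigned int data;
	unsigned int mac_len;
	unsigned int network_header;
	unsigned int inner_network_header;
	unsigned int labels;		/* labels pushed so far */
};

/* Models the hunk as posted: both markers recomputed on every push. */
static void push_label_as_posted(struct ovs_skb_model *skb)
{
	assert(skb->data >= MPLS_HLEN);
	skb->data -= MPLS_HLEN;
	skb->network_header = skb->data + skb->mac_len;
	skb->inner_network_header = skb->data + skb->mac_len + MPLS_HLEN;
	skb->labels++;
}

/* Hypothetical variant: record the inner header once, before the
 * first label goes on, and only move the outer marker afterwards.
 */
static void push_label_pinned(struct ovs_skb_model *skb)
{
	assert(skb->data >= MPLS_HLEN);
	if (skb->labels == 0)
		skb->inner_network_header = skb->data + skb->mac_len;
	skb->data -= MPLS_HLEN;
	skb->network_header = skb->data + skb->mac_len;
	skb->labels++;
}

/* Label stack size as the GSO code would compute it. */
static unsigned int stack_bytes(const struct ovs_skb_model *skb)
{
	return skb->inner_network_header - skb->network_header;
}
```

After two pushes the as-posted model reports a 4 byte label stack
although 8 bytes of labels are present, while the pinned variant
reports 8.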


* Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO
  2016-08-22 13:11     ` David Ahern
@ 2016-08-22 14:51       ` Simon Horman
  2016-08-23 19:24         ` David Ahern
  0 siblings, 1 reply; 22+ messages in thread
From: Simon Horman @ 2016-08-22 14:51 UTC (permalink / raw)
  To: David Ahern
  Cc: netdev, davem, buytenh, ebiederm, rshearma, tom, tgraf,
	olivier.dugeon, alexander.duyck, roopa, Pravin B Shelar

On Mon, Aug 22, 2016 at 07:11:27AM -0600, David Ahern wrote:
> On 8/22/16 6:21 AM, Simon Horman wrote:
> >> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> >> index 1ecbd7715f6d..6d78f162a88b 100644
> >> --- a/net/openvswitch/actions.c
> >> +++ b/net/openvswitch/actions.c
> >> @@ -167,6 +167,12 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
> >>  		skb->mac_len);
> >>  	skb_reset_mac_header(skb);
> >>  
> >> +	/* for GSO: set MPLS as network header and encapsulated protocol
> >> +	 * header as inner network header
> >> +	 */
> >> +	skb_set_network_header(skb, skb->mac_len);
> >> +	skb_set_inner_network_header(skb, skb->mac_len + MPLS_HLEN);
> >> +
> >>  	new_mpls_lse = (__be32 *)skb_mpls_header(skb);
> >>  	*new_mpls_lse = mpls->mpls_lse;
> > 
> > Is the above calculation correct if push_mpls() is called multiple times?
> > 
> 
> No. Does OVS support more than 1? I really need someone who is familiar with the OVS code to make sure it works for all use cases. e.g., set skb_set_inner_network_header() before pushing a series of MPLS labels.

Yes that is supported.

The scheme that OvS uses so far is that mac_len denotes the number of bytes
from the start of the MAC header until its end. In the absence of MPLS that
will be the beginning of the network header. And in the presence of MPLS it
will be the beginning of the MPLS label stack. The network header is... the
network header. This allows the MAC header, MPLS label stack and network
header to be tracked.

Pravin (CCed) may have different ideas but I wonder if the above scheme can
be preserved while also meeting the needs of your new MPLS GSO scheme if
you set skb_set_network_header() and skb_set_inner_network_header() in
net/openvswitch/actions.c:do_output().

It may also be possible to teach OvS to use skb_set_network_header to
denote the beginning of the MPLS LSE and skb_set_inner_network_header to
denote the network header in the presence of MPLS. Which is my current
understanding of what you are trying to achieve. But I think it's likely
that I misunderstand things as it seems strange to me to pretend that an
MPLS LSE is a network header and the outermost network header is an inner
network header.
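
To make the two offset schemes above concrete, here is a minimal user-space
sketch in C. It is not kernel code: struct hdr_offsets and the helper
functions are hypothetical, offsets are counted from the start of the MAC
header, and an untagged 14-byte Ethernet header is assumed.

```c
#include <assert.h>
#include <stddef.h>

#define ETH_HLEN  14	/* Ethernet header length, no VLAN tag assumed */
#define MPLS_HLEN 4	/* one MPLS label stack entry (LSE) */

/* Hypothetical stand-in for the few sk_buff offsets discussed here. */
struct hdr_offsets {
	size_t mac_len;			/* bytes from MAC header start to its end */
	size_t network_header;		/* offset from the MAC header start */
	size_t inner_network_header;	/* 0 when unused */
};

/* Existing OvS scheme: the label stack sits between mac_len and
 * network_header, and network_header is the real network header.
 */
static struct hdr_offsets ovs_scheme(size_t nlabels)
{
	struct hdr_offsets o = {
		.mac_len = ETH_HLEN,
		.network_header = ETH_HLEN + nlabels * MPLS_HLEN,
		.inner_network_header = 0,
	};

	assert(nlabels > 0);
	return o;
}

/* Proposed GSO scheme: network_header marks the first LSE and
 * inner_network_header marks the encapsulated (e.g. IP) header.
 */
static struct hdr_offsets gso_scheme(size_t nlabels)
{
	struct hdr_offsets o = {
		.mac_len = ETH_HLEN,
		.network_header = ETH_HLEN,
		.inner_network_header = ETH_HLEN + nlabels * MPLS_HLEN,
	};

	assert(nlabels > 0);
	return o;
}

/* Start of the label stack under each scheme. */
static size_t ovs_mpls_offset(const struct hdr_offsets *o)
{
	return o->mac_len;
}

static size_t gso_mpls_offset(const struct hdr_offsets *o)
{
	return o->network_header;
}
```

Both schemes track the same two positions (start of the label stack and
start of the encapsulated header); they differ only in which field records
which position.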

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO
  2016-08-22 14:51       ` Simon Horman
@ 2016-08-23 19:24         ` David Ahern
  2016-08-24  7:20           ` Simon Horman
  0 siblings, 1 reply; 22+ messages in thread
From: David Ahern @ 2016-08-23 19:24 UTC (permalink / raw)
  To: Simon Horman, Pravin B Shelar
  Cc: netdev, davem, buytenh, ebiederm, rshearma, tom, tgraf,
	olivier.dugeon, alexander.duyck, roopa

On 8/22/16 8:51 AM, Simon Horman wrote:
> 
> The scheme that OvS uses so far is that mac_len denotes the number of bytes
> from the start of the MAC header until its end. In the absence of MPLS that
> will be the beginning of the network header. And in the presence of MPLS it
> will be the beginning of the MPLS label stack. The network header is... the
> network header. This allows the MAC header, MPLS label stack and network
> header to be tracked.

The neigh output functions do '__skb_pull(skb, skb_network_offset(skb))', so if mpls_xmit does not reset the network header the labels get dropped. To me this says MPLS labels cannot be lumped with the MAC header, which leaves the outer network header as the only option.
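
The effect described above can be modelled outside the kernel. The following is only a sketch under stated assumptions: struct toy_skb and the toy_* helpers are hypothetical stand-ins for sk_buff, skb_push() and the __skb_pull() call in the neigh output path, and the starting offsets are arbitrary.

```c
#include <assert.h>
#include <stddef.h>

#define MPLS_HLEN 4	/* one MPLS label stack entry */

/* Hypothetical, much-reduced model of an sk_buff: offsets into one buffer. */
struct toy_skb {
	size_t data;		/* offset of the current start of packet data */
	size_t len;		/* bytes of packet data from 'data' onwards */
	size_t network_header;	/* offset of the network header */
};

/* Models skb_push(): prepend n bytes of header space. */
static void toy_push(struct toy_skb *skb, size_t n)
{
	assert(skb->data >= n);
	skb->data -= n;
	skb->len += n;
}

/* Models __skb_pull(skb, skb_network_offset(skb)) on neigh output. */
static void toy_pull_to_network(struct toy_skb *skb)
{
	size_t off = skb->network_header - skb->data;

	assert(off <= skb->len);
	skb->data += off;
	skb->len -= off;
}

/* Push nlabels LSEs, optionally re-point network_header at the labels,
 * then run the neigh-style pull. Returns the number of label bytes still
 * in front of the original payload afterwards.
 */
static size_t labels_surviving(size_t nlabels, int reset_network_header)
{
	struct toy_skb skb = { .data = 64, .len = 100, .network_header = 64 };
	size_t payload = skb.data;

	toy_push(&skb, nlabels * MPLS_HLEN);
	if (reset_network_header)
		skb.network_header = skb.data;	/* like skb_reset_network_header() */

	toy_pull_to_network(&skb);
	return payload - skb.data;
}
```

In this model, leaving network_header on the encapsulated header makes the pull skip over every pushed label, while resetting it to the first label turns the pull into a no-op.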

> 
> Pravin (CCed) may have different ideas but I wonder if the above scheme can
> be preserved while also meeting the needs of your new MPLS GSO scheme if
> you set skb_set_network_header() and skb_set_inner_network_header() in
> net/openvswitch/actions.c:do_output().
> 
> It may also be possible to teach OvS to use skb_set_network_header to
> denote the beginning of the MPLS LSE and skb_set_inner_network_header to
> denote the network header in the presence of MPLS. Which is my current
> understanding of what you are trying to achieve. But I think it's likely
> that I misunderstand things as it seems strange to me to pretend that an
> MPLS LSE is a network header and the outermost network header is an inner
> network header.
> 

This is the only option I can see working, but open to patches showing an alternative.

I would like to get it resolved this week so I can move on to gso in the mpls forward case.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO
  2016-08-23 19:24         ` David Ahern
@ 2016-08-24  7:20           ` Simon Horman
  2016-08-24 16:28             ` pravin shelar
  0 siblings, 1 reply; 22+ messages in thread
From: Simon Horman @ 2016-08-24  7:20 UTC (permalink / raw)
  To: David Ahern
  Cc: Pravin B Shelar, netdev, davem, buytenh, ebiederm, rshearma, tom,
	tgraf, olivier.dugeon, alexander.duyck, roopa

Hi David,

On Tue, Aug 23, 2016 at 01:24:51PM -0600, David Ahern wrote:
> On 8/22/16 8:51 AM, Simon Horman wrote:
> > 
> > The scheme that OvS uses so far is that mac_len denotes the number of bytes
> > from the start of the MAC header until its end. In the absence of MPLS that
> > will be the beginning of the network header. And in the presence of MPLS it
> > will be the beginning of the MPLS label stack. The network header is... the
> > network header. This allows the MAC header, MPLS label stack and network
> > header to be tracked.
> 
> The neigh output functions do '__skb_pull(skb, skb_network_offset(skb))' so if mpls_xmit does not reset the network header the labels get dropped. To me this says MPLS labels can not be lumped with the mac header which leaves the only option as the outer network header.
> 
> > 
> > Pravin (CCed) may have different ideas but I wonder if the above scheme can
> > be preserved while also meeting the needs of your new MPLS GSO scheme if
> > you set skb_set_network_header() and skb_set_inner_network_header() in
> > net/openvswitch/actions.c:do_output().
> > 
> > It may also be possible to teach OvS to use skb_set_network_header to
> > denote the beginning of the MPLS LSE and skb_set_inner_network_header to
> > denote the network header in the presence of MPLS. Which is my current
> > understanding of what you are trying to achieve. But I think it's likely
> > that I misunderstand things as it seems strange to me to pretend that an
> > MPLS LSE is a network header and the outermost network header is an inner
> > network header.
> > 
> 
> This is the only option I can see working, but open to patches showing an
> alternative.

On reflection I came to a similar conclusion.

> I would like to get it resolved this week so I can move on to gso in the
> mpls forward case.

How do you feel about implementing the do_output() idea I suggested above?
I'm happy to provide testing and review.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO
  2016-08-24  7:20           ` Simon Horman
@ 2016-08-24 16:28             ` pravin shelar
  2016-08-24 16:37               ` David Ahern
  0 siblings, 1 reply; 22+ messages in thread
From: pravin shelar @ 2016-08-24 16:28 UTC (permalink / raw)
  To: Simon Horman
  Cc: David Ahern, Pravin B Shelar, Linux Kernel Network Developers,
	David S. Miller, buytenh, Eric W. Biederman, rshearma, tom,
	Thomas Graf, olivier.dugeon, Alexander Duyck, roopa

On Wed, Aug 24, 2016 at 12:20 AM, Simon Horman
<simon.horman@netronome.com> wrote:
> Hi David,
>
> On Tue, Aug 23, 2016 at 01:24:51PM -0600, David Ahern wrote:
>> On 8/22/16 8:51 AM, Simon Horman wrote:
>> >
>> > The scheme that OvS uses so far is that mac_len denotes the number of bytes
>> > from the start of the MAC header until its end. In the absence of MPLS that
>> > will be the beginning of the network header. And in the presence of MPLS it
>> > will be the beginning of the MPLS label stack. The network header is... the
>> > network header. This allows the MAC header, MPLS label stack and network
>> > header to be tracked.
>>
>> The neigh output functions do '__skb_pull(skb, skb_network_offset(skb))' so if mpls_xmit does not reset the network header the labels get dropped. To me this says MPLS labels can not be lumped with the mac header which leaves the only option as the outer network header.
>>
>> >
>> > Pravin (CCed) may have different ideas but I wonder if the above scheme can
>> > be preserved while also meeting the needs of your new MPLS GSO scheme if
>> > you set skb_set_network_header() and skb_set_inner_network_header() in
>> > net/openvswitch/actions.c:do_output().
>> >
>> > It may also be possible to teach OvS to use skb_set_network_header to
>> > denote the beginning of the MPLS LSE and skb_set_inner_network_header to
>> > denote the network header in the presence of MPLS. Which is my current
>> > understanding of what you are trying to achieve. But I think it's likely
>> > that I misunderstand things as it seems strange to me to pretend that an
>> > MPLS LSE is a network header and the outermost network header is an inner
>> > network header.
>> >
>>
>> This is the only option I can see working, but open to patches showing an
>> alternative.
>
> On reflection I came to a similar conclusion.
>
>> I would like to get it resolved this week so I can move on to gso in the
>> mpls forward case.
>
> How do you feel about implementing the do_output() idea I suggested above?
> I'm happy to provide testing and review.

I am not sure about changing do_output(). Why not just use the same scheme
to track the MPLS header in the OVS datapath as is done in the MPLS device?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO
  2016-08-24 16:28             ` pravin shelar
@ 2016-08-24 16:37               ` David Ahern
  2016-08-24 17:41                 ` pravin shelar
  2016-09-26 15:56                 ` Jiri Benc
  0 siblings, 2 replies; 22+ messages in thread
From: David Ahern @ 2016-08-24 16:37 UTC (permalink / raw)
  To: pravin shelar, Simon Horman
  Cc: Pravin B Shelar, Linux Kernel Network Developers,
	David S. Miller, buytenh, Eric W. Biederman, rshearma, tom,
	Thomas Graf, olivier.dugeon, Alexander Duyck, roopa

On 8/24/16 10:28 AM, pravin shelar wrote:
>> How do you feel about implementing the do_output() idea I suggested above?
>> I'm happy to provide testing and review.
> 
> I am not sure about changing do_output(). why not just use same scheme
> to track mpls header in OVS datapath as done in mpls device?
> 

was just replying with the same. 

Something like this should be able to handle multiple labels. The inner network header is set once and the outer one pointing to MPLS is adjusted each time a label is pushed:

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 1ecbd7715f6d..0f37b17e3a73 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -162,10 +162,16 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
        if (skb_cow_head(skb, MPLS_HLEN) < 0)
                return -ENOMEM;

+       if (!skb->inner_protocol) {
+               skb_set_inner_network_header(skb, skb->mac_len);
+               skb_set_inner_protocol(skb, skb->protocol);
+       }
+
        skb_push(skb, MPLS_HLEN);
        memmove(skb_mac_header(skb) - MPLS_HLEN, skb_mac_header(skb),
                skb->mac_len);
        skb_reset_mac_header(skb);
+       skb_set_network_header(skb, skb->mac_len);

        new_mpls_lse = (__be32 *)skb_mpls_header(skb);
        *new_mpls_lse = mpls->mpls_lse;
@@ -173,8 +179,7 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
        skb_postpush_rcsum(skb, new_mpls_lse, MPLS_HLEN);

        update_ethertype(skb, eth_hdr(skb), mpls->mpls_ethertype);
-       if (!skb->inner_protocol)
-               skb_set_inner_protocol(skb, skb->protocol);
+
        skb->protocol = mpls->mpls_ethertype;

        invalidate_flow_key(key);




If it does, what else needs to be changed in OVS to handle the network layer now pointing to the MPLS labels?
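
A user-space model of the bookkeeping in the diff above, as a sketch only: struct toy_skb and toy_push_mpls() are hypothetical, all offsets are kept as absolute positions in the buffer, and the memmove of the MAC header and the write of the LSE itself are omitted.

```c
#include <assert.h>
#include <stddef.h>

#define MPLS_HLEN 4	/* one MPLS label stack entry */

/* Hypothetical reduced sk_buff: only the fields the diff touches. */
struct toy_skb {
	size_t data;			/* absolute offset of the MAC header start */
	size_t mac_len;			/* length of the MAC header */
	size_t network_header;		/* absolute offset */
	size_t inner_network_header;	/* absolute offset, set on first push */
	unsigned short protocol;
	unsigned short inner_protocol;	/* 0 until the first label is pushed */
};

/* Mirrors the order of operations in the proposed push_mpls() change. */
static void toy_push_mpls(struct toy_skb *skb, unsigned short mpls_ethertype)
{
	assert(skb->data >= MPLS_HLEN);

	/* First push only: remember where the encapsulated header starts. */
	if (!skb->inner_protocol) {
		skb->inner_network_header = skb->data + skb->mac_len;
		skb->inner_protocol = skb->protocol;
	}

	/* skb_push() plus moving the MAC header to the new front. */
	skb->data -= MPLS_HLEN;

	/* Every push: the outer network header follows the new MAC header. */
	skb->network_header = skb->data + skb->mac_len;
	skb->protocol = mpls_ethertype;
}
```

After any number of pushes, inner_network_header minus network_header equals the size of the label stack in this model, which is the property the GSO path relies on.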

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO
  2016-08-24 16:37               ` David Ahern
@ 2016-08-24 17:41                 ` pravin shelar
  2016-08-24 18:53                   ` David Ahern
  2016-09-26 15:56                 ` Jiri Benc
  1 sibling, 1 reply; 22+ messages in thread
From: pravin shelar @ 2016-08-24 17:41 UTC (permalink / raw)
  To: David Ahern
  Cc: Simon Horman, Pravin B Shelar, Linux Kernel Network Developers,
	David S. Miller, buytenh, Eric W. Biederman, rshearma, tom,
	Thomas Graf, olivier.dugeon, Alexander Duyck, roopa

On Wed, Aug 24, 2016 at 9:37 AM, David Ahern <dsa@cumulusnetworks.com> wrote:
> On 8/24/16 10:28 AM, pravin shelar wrote:
>>> How do you feel about implementing the do_output() idea I suggested above?
>>> I'm happy to provide testing and review.
>>
>> I am not sure about changing do_output(). why not just use same scheme
>> to track mpls header in OVS datapath as done in mpls device?
>>
>
> was just replying with the same.
>
> Something like this should be able to handle multiple labels. The inner network header is set once and the outer one pointing to MPLS is adjusted each time a label is pushed:
>
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index 1ecbd7715f6d..0f37b17e3a73 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -162,10 +162,16 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
>         if (skb_cow_head(skb, MPLS_HLEN) < 0)
>                 return -ENOMEM;
>
> +       if (!skb->inner_protocol) {
> +               skb_set_inner_network_header(skb, skb->mac_len);
> +               skb_set_inner_protocol(skb, skb->protocol);
> +       }
> +
>         skb_push(skb, MPLS_HLEN);
>         memmove(skb_mac_header(skb) - MPLS_HLEN, skb_mac_header(skb),
>                 skb->mac_len);
>         skb_reset_mac_header(skb);
> +       skb_set_network_header(skb, skb->mac_len);
>
>         new_mpls_lse = (__be32 *)skb_mpls_header(skb);
>         *new_mpls_lse = mpls->mpls_lse;
> @@ -173,8 +179,7 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
>         skb_postpush_rcsum(skb, new_mpls_lse, MPLS_HLEN);
>
>         update_ethertype(skb, eth_hdr(skb), mpls->mpls_ethertype);
> -       if (!skb->inner_protocol)
> -               skb_set_inner_protocol(skb, skb->protocol);
> +
>         skb->protocol = mpls->mpls_ethertype;
>
>         invalidate_flow_key(key);
>
>
>
>
> If it does, what else needs to be changed in OVS to handle the network layer now pointing to the MPLS labels?
>
You also need to change pop_mpls().

Anyway, I was thinking about the skb pull issue in the neigh output
functions, where the network-header offset is used. Can we use mac_len
instead? That way we would not need any inner offsets for MPLS skbs, and
the current scheme used by the OVS datapath keeps working.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO
  2016-08-24 17:41                 ` pravin shelar
@ 2016-08-24 18:53                   ` David Ahern
  2016-08-25  3:12                     ` David Ahern
  2016-08-25  3:58                     ` pravin shelar
  0 siblings, 2 replies; 22+ messages in thread
From: David Ahern @ 2016-08-24 18:53 UTC (permalink / raw)
  To: pravin shelar
  Cc: Simon Horman, Pravin B Shelar, Linux Kernel Network Developers,
	David S. Miller, buytenh, Eric W. Biederman, rshearma, tom,
	Thomas Graf, olivier.dugeon, Alexander Duyck, roopa

On 8/24/16 11:41 AM, pravin shelar wrote:
> You also need to change pop_mpls().

What change is needed in pop_mpls? It already resets the mac_header and if MPLS labels are removed there is no need to set network_header. I take it you mean if the protocol is still MPLS and there are still labels then the network header needs to be set and that means finding the bottom label. Does OVS set the bottom of stack bit? From what I can tell OVS is not parsing the MPLS label so no requirement that BOS is set. Without that there is no way to tell when the labels are done short of guessing.

> 
> Anyways I was thinking about the neigh output functions skb pull
> issue, where it is using network-header offset. Can we use mac_len?
> this way we would not use any inner offsets for MPLS skb and current
> scheme used by OVS datapath works.

neigh_resolve_output and neigh_connected_output both do an __skb_pull to the network offset. When these functions are called there may or may not be a MAC header set in the skb, which makes mac_header unreliable for how you want to use it. For example, I tried this:

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 2ae929f9bd06..9f20a0b8e6be 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1292,12 +1292,16 @@ int neigh_resolve_output(struct neighbour *neigh, struct sk_buff *skb)
                int err;
                struct net_device *dev = neigh->dev;
                unsigned int seq;
+               unsigned int offset = skb_network_offset(skb);
+
+               if (unlikely(skb_mac_header_was_set(skb)))
+                       offset = skb_mac_header(skb) - skb->data;

                if (dev->header_ops->cache && !neigh->hh.hh_len)
                        neigh_hh_init(neigh);

                do {
-                       __skb_pull(skb, skb_network_offset(skb));
+                       __skb_pull(skb, offset);
                        seq = read_seqbegin(&neigh->ha_lock);
                        err = dev_hard_header(skb, dev, ntohs(skb->protocol),
                                              neigh->ha, NULL, skb->len);


It does not work. The MPLS packet goes down the stack fine, but when the packet is forwarded from one namespace to another you can get a panic since it hits neigh_resolve_output with a mac header and the pull above will do the wrong thing.

[   18.254133] BUG: unable to handle kernel paging request at ffff88023860404a
[   18.255566] IP: [<ffffffff813eb418>] eth_header+0x40/0xaf
[   18.256649] PGD 1c40067 PUD 0
[   18.257277] Oops: 0002 [#1] SMP
[   18.257872] Modules linked in: veth 8021q garp mrp stp llc vrf
[   18.259168] CPU: 2 PID: 868 Comm: ping Not tainted 4.8.0-rc2+ #81
[   18.260308] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[   18.262184] task: ffff88013ab61040 task.stack: ffff880135090000
[   18.263285] RIP: 0010:[<ffffffff813eb418>]  [<ffffffff813eb418>] eth_header+0x40/0xaf
[   18.264762] RSP: 0018:ffff88013fd03c80  EFLAGS: 00010216
[   18.265791] RAX: ffff88023860403e RBX: 0000000000000008 RCX: ffff88013a5c18a0
[   18.267040] RDX: ffff88023860403e RSI: 000000000000000e RDI: ffff88013ab0a200
[   18.268307] RBP: ffff88013fd03ca8 R08: 0000000000000000 R09: 0000000000000058
[   18.269556] R10: ffff88023860403e R11: 0000000000000000 R12: ffff88013a5c18a0
[   18.270807] R13: ffff880135b0b000 R14: ffff880135b0b000 R15: ffff88013a5c1828
[   18.272064] FS:  00007fbc44b66700(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000
[   18.273477] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   18.274492] CR2: ffff88023860404a CR3: 00000001350c8000 CR4: 00000000000406e0
[   18.275746] Stack:
[   18.276125]  0000000000000000 0000005800000246 ffff88013ab0a200 0000000000000002
[   18.277519]  ffff88013a5c1800 ffff88013fd03cb8 ffffffff813d5912 ffff88013fd03d00
[   18.278904]  ffffffff813d73ea ffff88013a5c18a0 fffffffc01000246 ffff88013a5c1838
[   18.280295] Call Trace:
[   18.280712]  <IRQ>
[   18.281049]  [<ffffffff813d5912>] dev_hard_header.constprop.42+0x26/0x28
[   18.282204]  [<ffffffff813d73ea>] neigh_resolve_output+0x1b9/0x270
[   18.283228]  [<ffffffff813d627c>] neigh_update+0x372/0x497
[   18.284160]  [<ffffffff81429704>] arp_process+0x520/0x572
[   18.285061]  [<ffffffff8142987f>] arp_rcv+0x10e/0x17d
[   18.285909]  [<ffffffff813ca6bd>] __netif_receive_skb_core+0x3ea/0x4b8
[   18.286995]  [<ffffffff813ca7a1>] __netif_receive_skb+0x16/0x66
[   18.287993]  [<ffffffff813cad3d>] process_backlog+0xa4/0x132
[   18.288935]  [<ffffffff813cab28>] net_rx_action+0xd1/0x242
[   18.289854]  [<ffffffff8104e611>] __do_softirq+0x100/0x26d
[   18.290764]  [<ffffffff814b1d8c>] do_softirq_own_stack+0x1c/0x30
[   18.291775]  <EOI>
[   18.292100]  [<ffffffff8104e7e3>] do_softirq+0x30/0x3b
[   18.292968]  [<ffffffff8104e857>] __local_bh_enable_ip+0x69/0x73
[   18.293919]  [<ffffffff813d468d>] local_bh_enable+0x15/0x17
[   18.294798]  [<ffffffff813d6fb7>] neigh_xmit+0x93/0xe3
[   18.295626]  [<ffffffff814a86e4>] mpls_xmit+0x379/0x3c0
[   18.296464]  [<ffffffff813e9ac3>] lwtunnel_xmit+0x48/0x63



More generally, though, this approach just feels wrong. You want to lump the MPLS labels with the Ethernet header, but not formally, just by playing games with skb markers. The core networking stack is resisting this approach.



 

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO
  2016-08-24 18:53                   ` David Ahern
@ 2016-08-25  3:12                     ` David Ahern
  2016-08-25  3:58                     ` pravin shelar
  1 sibling, 0 replies; 22+ messages in thread
From: David Ahern @ 2016-08-25  3:12 UTC (permalink / raw)
  To: pravin shelar
  Cc: Simon Horman, Pravin B Shelar, Linux Kernel Network Developers,
	David S. Miller, buytenh, Eric W. Biederman, rshearma, tom,
	Thomas Graf, olivier.dugeon, Alexander Duyck, roopa

On 8/24/16 12:53 PM, David Ahern wrote:
> What change is needed in pop_mpls? It already resets the mac_header and if MPLS labels are removed there is no need to set network_header. I take it you mean if the protocol is still MPLS and there are still labels then the network header needs to be set and that means finding the bottom label. Does OVS set the bottom of stack bit? From what I can tell OVS is not parsing the MPLS label so no requirement that BOS is set. Without that there is no way to tell when the labels are done short of guessing.

I was confusing the inner network layer with the MPLS network header. Just sent a v4. Can you verify it works for single and multiple labels with OVS?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO
  2016-08-24 18:53                   ` David Ahern
  2016-08-25  3:12                     ` David Ahern
@ 2016-08-25  3:58                     ` pravin shelar
  1 sibling, 0 replies; 22+ messages in thread
From: pravin shelar @ 2016-08-25  3:58 UTC (permalink / raw)
  To: David Ahern
  Cc: Simon Horman, Pravin B Shelar, Linux Kernel Network Developers,
	David S. Miller, buytenh, Eric W. Biederman, rshearma, tom,
	Thomas Graf, olivier.dugeon, Alexander Duyck, roopa

On Wed, Aug 24, 2016 at 11:53 AM, David Ahern <dsa@cumulusnetworks.com> wrote:
> On 8/24/16 11:41 AM, pravin shelar wrote:
>> You also need to change pop_mpls().
>
> What change is needed in pop_mpls? It already resets the mac_header and if MPLS labels are removed there is no need to set network_header. I take it you mean if the protocol is still MPLS and there are still labels then the network header needs to be set and that means finding the bottom label. Does OVS set the bottom of stack bit? From what I can tell OVS is not parsing the MPLS label so no requirement that BOS is set. Without that there is no way to tell when the labels are done short of guessing.
>

The OVS MPLS push and pop actions work on the outermost MPLS label. So,
under the new MPLS offset tracking scheme, the MPLS pop action also needs
to adjust the skb network offset.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO
  2016-08-24 16:37               ` David Ahern
  2016-08-24 17:41                 ` pravin shelar
@ 2016-09-26 15:56                 ` Jiri Benc
  2016-09-26 17:02                   ` Jiri Benc
  1 sibling, 1 reply; 22+ messages in thread
From: Jiri Benc @ 2016-09-26 15:56 UTC (permalink / raw)
  To: David Ahern
  Cc: pravin shelar, Simon Horman, Pravin B Shelar,
	Linux Kernel Network Developers, David S. Miller, buytenh,
	Eric W. Biederman, rshearma, tom, Thomas Graf, olivier.dugeon,
	Alexander Duyck, roopa

On Wed, 24 Aug 2016 10:37:51 -0600, David Ahern wrote:
> Something like this should be able to handle multiple labels. The
> inner network header is set once and the outer one pointing to MPLS
> is adjusted each time a label is pushed:
> 
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index 1ecbd7715f6d..0f37b17e3a73 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -162,10 +162,16 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
>         if (skb_cow_head(skb, MPLS_HLEN) < 0)
>                 return -ENOMEM;
> 
> +       if (!skb->inner_protocol) {
> +               skb_set_inner_network_header(skb, skb->mac_len);
> +               skb_set_inner_protocol(skb, skb->protocol);
> +       }
> +
>         skb_push(skb, MPLS_HLEN);
>         memmove(skb_mac_header(skb) - MPLS_HLEN, skb_mac_header(skb),
>                 skb->mac_len);
>         skb_reset_mac_header(skb);
> +       skb_set_network_header(skb, skb->mac_len);

Sorry for chiming in after a month. The code above got in
(48d2ab609b6bb), I'm currently looking at this and it looks very
suspicious to me.

After push_mpls, network_header points to the start of MPLS headers.
Which I understand was the point of this patch. However, push_mpls also
calls invalidate_flow_key. Meaning that, depending on actions, we may
end up calling key_extract soon after. And key_extract sets the network
header *after* the MPLS headers.

That means that on output, for an otherwise identical packet,
network_header can point before or after the MPLS headers, depending on
which actions happened to be executed (recirculation, mainly).

If I'm not misreading the code or missing something, this can't be
right.

mpls_gso_segment does not care, it resets the network_header anyway.
What about drivers? What is the correct behavior?

 Jiri

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO
  2016-09-26 15:56                 ` Jiri Benc
@ 2016-09-26 17:02                   ` Jiri Benc
  2016-09-27  2:04                     ` David Ahern
  0 siblings, 1 reply; 22+ messages in thread
From: Jiri Benc @ 2016-09-26 17:02 UTC (permalink / raw)
  To: David Ahern
  Cc: pravin shelar, Simon Horman, Pravin B Shelar,
	Linux Kernel Network Developers, David S. Miller, buytenh,
	Eric W. Biederman, rshearma, tom, Thomas Graf, olivier.dugeon,
	Alexander Duyck, roopa

On Mon, 26 Sep 2016 17:56:22 +0200, Jiri Benc wrote:
> After push_mpls, network_header points to the start of MPLS headers.
> Which I understand was the point of this patch. However, push_mpls also
> calls invalidate_flow_key. Meaning that, depending on actions, we may
> end up calling key_extract soon after. And key_extract sets the network
> header *after* the MPLS headers.
> 
> That means that on output, for otherwise identical packet,
> network_header can point before or after MPLS headers based on what
> actions happened to be executed (recirculation, mainly).
> 
> If I'm not misreading the code or missing something, this can't be
> right.
> 
> mpls_gso_segment does not care, it resets the network_header anyway.
> What about drivers? What is the correct behavior?

Answering myself: it breaks skb_mac_gso_segment. Seems we need to
fix key_extract to set network_header to the beginning of MPLS headers.
I'll prepare a patch.

 Jiri

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO
  2016-09-26 17:02                   ` Jiri Benc
@ 2016-09-27  2:04                     ` David Ahern
  2016-09-27  7:45                       ` Jiri Benc
  0 siblings, 1 reply; 22+ messages in thread
From: David Ahern @ 2016-09-27  2:04 UTC (permalink / raw)
  To: Jiri Benc
  Cc: pravin shelar, Simon Horman, Pravin B Shelar,
	Linux Kernel Network Developers, David S. Miller, buytenh,
	Eric W. Biederman, rshearma, tom, Thomas Graf, olivier.dugeon,
	Alexander Duyck, roopa

On 9/26/16 11:02 AM, Jiri Benc wrote:
> On Mon, 26 Sep 2016 17:56:22 +0200, Jiri Benc wrote:
>> After push_mpls, network_header points to the start of MPLS headers.
>> Which I understand was the point of this patch. However, push_mpls also
>> calls invalidate_flow_key. Meaning that, depending on actions, we may
>> end up calling key_extract soon after. And key_extract sets the network
>> header *after* the MPLS headers.

You know this code better than me, but key_extract pulls the eth header and then sets the network header. If MPLS labels are present then it is the labels that network_header now points to. How did you come to the conclusion it is after the labels?

>>
>> That means that on output, for otherwise identical packet,
>> network_header can point before or after MPLS headers based on what
>> actions happened to be executed (recirculation, mainly).
>>
>> If I'm not misreading the code or missing something, this can't be
>> right.
>>
>> mpls_gso_segment does not care, it resets the network_header anyway.
>> What about drivers? What is the correct behavior?
> 
> Answering to myself: it breaks skb_mac_gso_segment. Seems we need to
> fix key_extract to set network_header to the beginning of MPLS headers.
> I'll prepare a patch.
> 
>  Jiri
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO
  2016-09-27  2:04                     ` David Ahern
@ 2016-09-27  7:45                       ` Jiri Benc
  2016-09-27 16:38                         ` David Ahern
  0 siblings, 1 reply; 22+ messages in thread
From: Jiri Benc @ 2016-09-27  7:45 UTC (permalink / raw)
  To: David Ahern
  Cc: pravin shelar, Simon Horman, Pravin B Shelar,
	Linux Kernel Network Developers, David S. Miller, buytenh,
	Eric W. Biederman, rshearma, tom, Thomas Graf, olivier.dugeon,
	Alexander Duyck, roopa

On Mon, 26 Sep 2016 20:04:06 -0600, David Ahern wrote:
> you know this code better than me, but key_extract pulls the eth
> header and then sets network header. If MPLS labels are present then
> it is the labels that the network_header now points to. How did you come
> to the conclusion it is after the labels?

Look ~100 lines below that, to "if (eth_p_mpls(key->eth.type))".
There's a while loop advancing network header.

 Jiri

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO
  2016-09-27  7:45                       ` Jiri Benc
@ 2016-09-27 16:38                         ` David Ahern
  2016-09-27 16:45                           ` Jiri Benc
  0 siblings, 1 reply; 22+ messages in thread
From: David Ahern @ 2016-09-27 16:38 UTC (permalink / raw)
  To: Jiri Benc
  Cc: pravin shelar, Simon Horman, Pravin B Shelar,
	Linux Kernel Network Developers, David S. Miller, buytenh,
	Eric W. Biederman, rshearma, tom, Thomas Graf, olivier.dugeon,
	Alexander Duyck, roopa

On 9/27/16 1:45 AM, Jiri Benc wrote:
> On Mon, 26 Sep 2016 20:04:06 -0600, David Ahern wrote:
>> you know this code better than me, but key_extract pulls the eth
>> header and then sets network header. If MPLS labels are present then
>> it is the labels that the network_header now points to. How did you come
>> to the conclusion it is after the labels?
> 
> Look ~100 lines below that, to "if (eth_p_mpls(key->eth.type))".
> There's a while loop advancing network header.

Got it, thanks. So that block can drop the while loop and just set mpls.top_lse.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO
  2016-09-27 16:38                         ` David Ahern
@ 2016-09-27 16:45                           ` Jiri Benc
  0 siblings, 0 replies; 22+ messages in thread
From: Jiri Benc @ 2016-09-27 16:45 UTC (permalink / raw)
  To: David Ahern
  Cc: pravin shelar, Simon Horman, Pravin B Shelar,
	Linux Kernel Network Developers, David S. Miller, buytenh,
	Eric W. Biederman, rshearma, tom, Thomas Graf, olivier.dugeon,
	Alexander Duyck, roopa

On Tue, 27 Sep 2016 10:38:41 -0600, David Ahern wrote:
> On 9/27/16 1:45 AM, Jiri Benc wrote:
> > On Mon, 26 Sep 2016 20:04:06 -0600, David Ahern wrote:
> >> you know this code better than me, but key_extract pulls the eth
> >> header and then sets network header. If MPLS labels are present then
> >> it is the labels that the network_header now points to. How did you come
> >> to the conclusion it is after the labels?
> > 
> > Look ~100 lines below that, to "if (eth_p_mpls(key->eth.type))".
> > There's a while loop advancing network header.
> 
> got it, thanks. so that block can drop the while loop and just set mpls.top_lse

I think we still need to traverse the loop to set inner_network_header.

 Jiri
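
A sketch of the traversal in question, written as stand-alone C rather than
as a change to key_extract: the function name and its return convention are
hypothetical. It assumes network_header already marks the first LSE and
walks the stack up to the bottom-of-stack bit to find where
inner_network_header would point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MPLS_HLEN	4		/* one MPLS label stack entry */
#define MPLS_BOS_MASK	0x00000100u	/* bottom-of-stack (S) bit of an LSE */

/* Read one LSE (big-endian on the wire) as a host-order integer. */
static uint32_t read_lse(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical helper: mpls_off is the offset of the first LSE (where
 * network_header points). Returns the offset of the first byte after the
 * bottom-of-stack LSE, or 0 if no LSE with the S bit fits within len.
 */
static size_t find_inner_network_offset(const uint8_t *pkt, size_t len,
					size_t mpls_off)
{
	size_t off = mpls_off;

	assert(pkt != NULL);

	while (off + MPLS_HLEN <= len) {
		uint32_t lse = read_lse(pkt + off);

		off += MPLS_HLEN;
		if (lse & MPLS_BOS_MASK)
			return off;	/* encapsulated header starts here */
	}

	return 0;	/* truncated or malformed label stack */
}
```

The top LSE is simply the entry at mpls_off; the loop is only needed to
locate the end of the stack.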

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2016-09-27 16:45 UTC | newest]

Thread overview: 22+ messages
2016-08-19 17:08 [PATCH v3 net-next 0/3] net: mpls: fragmentation and gso fixes for locally originated traffic David Ahern
2016-08-19 17:09 ` [PATCH net-next 1/3] net: lwtunnel: Handle fragmentation David Ahern
2016-08-19 17:09 ` [PATCH net-next 2/3] net: mpls: Fixups for GSO David Ahern
2016-08-19 20:17   ` Alexander Duyck
2016-08-22 12:21   ` Simon Horman
2016-08-22 13:11     ` David Ahern
2016-08-22 14:51       ` Simon Horman
2016-08-23 19:24         ` David Ahern
2016-08-24  7:20           ` Simon Horman
2016-08-24 16:28             ` pravin shelar
2016-08-24 16:37               ` David Ahern
2016-08-24 17:41                 ` pravin shelar
2016-08-24 18:53                   ` David Ahern
2016-08-25  3:12                     ` David Ahern
2016-08-25  3:58                     ` pravin shelar
2016-09-26 15:56                 ` Jiri Benc
2016-09-26 17:02                   ` Jiri Benc
2016-09-27  2:04                     ` David Ahern
2016-09-27  7:45                       ` Jiri Benc
2016-09-27 16:38                         ` David Ahern
2016-09-27 16:45                           ` Jiri Benc
2016-08-19 17:09 ` [PATCH net-next 3/3] net: veth: Set features for MPLS David Ahern
