netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHv2 net-next 0/6] net: add support for ip_tun_info options setting
@ 2019-10-08 15:16 Xin Long
  2019-10-08 15:16 ` [PATCHv2 net-next 1/6] lwtunnel: add options process for arp request Xin Long
  0 siblings, 1 reply; 12+ messages in thread
From: Xin Long @ 2019-10-08 15:16 UTC (permalink / raw)
  To: network dev; +Cc: davem, Jiri Benc, Thomas Graf, u9012063

With this patchset, users can configure options with LWTUNNEL_IP(6)_OPTS
by ip route encap for ersapn or vxlan lwtunnel. Note that in kernel part
it won't parse the option details but do some check and memcpy only, and
the options will be parsed by iproute in userspace.

We also improve the vxlan and erspan options processing in this patchset.

As an example I also wrote a patch for iproute2 (see v1), with it we can
add options for erspan lwtunnel like:

   # ip net a a; ip net a b
   # ip -n a l a eth0 type veth peer name eth0 netns b
   # ip -n a l s eth0 up; ip -n b link set eth0 up
   # ip -n a a a 10.1.0.1/24 dev eth0; ip -n b a a 10.1.0.2/24 dev eth0
   # ip -n b l a erspan1 type erspan key 1 seq erspan 123 \
        local 10.1.0.2 remote 10.1.0.1
   # ip -n b a a 1.1.1.1/24 dev erspan1; ip -n b l s erspan1 up
   # ip -n b r a 2.1.1.0/24 dev erspan1
   # ip -n a l a erspan1 type erspan key 1 seq local 10.1.0.1 external
   # ip -n a a a 2.1.1.1/24 dev erspan1; ip -n a l s erspan1 up
   # ip -n a r a 1.1.1.0/24 encap ip id 1 erspan ver 1 idx 123 \
        dst 10.1.0.2 dev erspan1
   # ip -n a r s; ip net exec a ping 1.1.1.1 -c 1

v1->v2:
  - no change, net-next reopened.

Xin Long (6):
  lwtunnel: add options process for arp request
  lwtunnel: add LWTUNNEL_IP_OPTS support for lwtunnel_ip
  lwtunnel: add LWTUNNEL_IP6_OPTS support for lwtunnel_ip6
  vxlan: check tun_info options_len properly
  erspan: fix the tun_info options_len check
  erspan: make md work without TUNNEL_ERSPAN_OPT set

 drivers/net/vxlan.c           |  6 +++--
 include/uapi/linux/lwtunnel.h |  2 ++
 net/ipv4/ip_gre.c             | 31 ++++++++++-------------
 net/ipv4/ip_tunnel_core.c     | 59 +++++++++++++++++++++++++++++++++----------
 net/ipv6/ip6_gre.c            | 35 +++++++++++++------------
 5 files changed, 84 insertions(+), 49 deletions(-)

-- 
2.1.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCHv2 net-next 1/6] lwtunnel: add options process for arp request
  2019-10-08 15:16 [PATCHv2 net-next 0/6] net: add support for ip_tun_info options setting Xin Long
@ 2019-10-08 15:16 ` Xin Long
  2019-10-08 15:16   ` [PATCHv2 net-next 2/6] lwtunnel: add LWTUNNEL_IP_OPTS support for lwtunnel_ip Xin Long
  0 siblings, 1 reply; 12+ messages in thread
From: Xin Long @ 2019-10-08 15:16 UTC (permalink / raw)
  To: network dev; +Cc: davem, Jiri Benc, Thomas Graf, u9012063

Without options copied to the dst tun_info in iptunnel_metadata_reply()
called by arp_process for handling arp_request, the generated arp_reply
packet may be dropped or sent out with wrong options for some tunnels
like erspan and vxlan, and the traffic will break.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/ipv4/ip_tunnel_core.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index 1452a97..10f0848 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -126,15 +126,14 @@ struct metadata_dst *iptunnel_metadata_reply(struct metadata_dst *md,
 
 	if (!md || md->type != METADATA_IP_TUNNEL ||
 	    md->u.tun_info.mode & IP_TUNNEL_INFO_TX)
-
 		return NULL;
 
-	res = metadata_dst_alloc(0, METADATA_IP_TUNNEL, flags);
+	src = &md->u.tun_info;
+	res = metadata_dst_alloc(src->options_len, METADATA_IP_TUNNEL, flags);
 	if (!res)
 		return NULL;
 
 	dst = &res->u.tun_info;
-	src = &md->u.tun_info;
 	dst->key.tun_id = src->key.tun_id;
 	if (src->mode & IP_TUNNEL_INFO_IPV6)
 		memcpy(&dst->key.u.ipv6.dst, &src->key.u.ipv6.src,
@@ -143,6 +142,8 @@ struct metadata_dst *iptunnel_metadata_reply(struct metadata_dst *md,
 		dst->key.u.ipv4.dst = src->key.u.ipv4.src;
 	dst->key.tun_flags = src->key.tun_flags;
 	dst->mode = src->mode | IP_TUNNEL_INFO_TX;
+	ip_tunnel_info_opts_set(dst, ip_tunnel_info_opts(src),
+				src->options_len, 0);
 
 	return res;
 }
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCHv2 net-next 2/6] lwtunnel: add LWTUNNEL_IP_OPTS support for lwtunnel_ip
  2019-10-08 15:16 ` [PATCHv2 net-next 1/6] lwtunnel: add options process for arp request Xin Long
@ 2019-10-08 15:16   ` Xin Long
  2019-10-08 15:16     ` [PATCHv2 net-next 3/6] lwtunnel: add LWTUNNEL_IP6_OPTS support for lwtunnel_ip6 Xin Long
  2019-10-09  7:55     ` [PATCHv2 net-next 2/6] lwtunnel: add LWTUNNEL_IP_OPTS support for lwtunnel_ip Simon Horman
  0 siblings, 2 replies; 12+ messages in thread
From: Xin Long @ 2019-10-08 15:16 UTC (permalink / raw)
  To: network dev; +Cc: davem, Jiri Benc, Thomas Graf, u9012063

This patch is to add LWTUNNEL_IP_OPTS into lwtunnel_ip_t, by which
users will be able to set options for ip_tunnel_info by "ip route
encap" for erspan and vxlan's private metadata. Like one way to go
in iproute is:

  # ip route add 1.1.1.0/24 encap ip id 1 erspan ver 1 idx 123 \
      dst 10.1.0.2 dev erspan1
  # ip route show
    1.1.1.0/24  encap ip id 1 src 0.0.0.0 dst 10.1.0.2 ttl 0 \
      tos 0 erspan ver 1 idx 123 dev erspan1 scope link

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 include/uapi/linux/lwtunnel.h |  1 +
 net/ipv4/ip_tunnel_core.c     | 30 ++++++++++++++++++++++++------
 2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/lwtunnel.h b/include/uapi/linux/lwtunnel.h
index de696ca..93f2c05 100644
--- a/include/uapi/linux/lwtunnel.h
+++ b/include/uapi/linux/lwtunnel.h
@@ -27,6 +27,7 @@ enum lwtunnel_ip_t {
 	LWTUNNEL_IP_TOS,
 	LWTUNNEL_IP_FLAGS,
 	LWTUNNEL_IP_PAD,
+	LWTUNNEL_IP_OPTS,
 	__LWTUNNEL_IP_MAX,
 };
 
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index 10f0848..d9b7188 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -218,6 +218,7 @@ static const struct nla_policy ip_tun_policy[LWTUNNEL_IP_MAX + 1] = {
 	[LWTUNNEL_IP_TTL]	= { .type = NLA_U8 },
 	[LWTUNNEL_IP_TOS]	= { .type = NLA_U8 },
 	[LWTUNNEL_IP_FLAGS]	= { .type = NLA_U16 },
+	[LWTUNNEL_IP_OPTS]	= { .type = NLA_BINARY },
 };
 
 static int ip_tun_build_state(struct nlattr *attr,
@@ -228,14 +229,20 @@ static int ip_tun_build_state(struct nlattr *attr,
 	struct ip_tunnel_info *tun_info;
 	struct lwtunnel_state *new_state;
 	struct nlattr *tb[LWTUNNEL_IP_MAX + 1];
-	int err;
+	int err, opts_len = 0;
+	void *opts;
 
 	err = nla_parse_nested_deprecated(tb, LWTUNNEL_IP_MAX, attr,
 					  ip_tun_policy, extack);
 	if (err < 0)
 		return err;
 
-	new_state = lwtunnel_state_alloc(sizeof(*tun_info));
+	if (tb[LWTUNNEL_IP_OPTS]) {
+		opts = nla_data(tb[LWTUNNEL_IP_OPTS]);
+		opts_len = nla_len(tb[LWTUNNEL_IP_OPTS]);
+	}
+
+	new_state = lwtunnel_state_alloc(sizeof(*tun_info) + opts_len);
 	if (!new_state)
 		return -ENOMEM;
 
@@ -269,8 +276,10 @@ static int ip_tun_build_state(struct nlattr *attr,
 	if (tb[LWTUNNEL_IP_FLAGS])
 		tun_info->key.tun_flags = nla_get_be16(tb[LWTUNNEL_IP_FLAGS]);
 
+	if (opts_len)
+		ip_tunnel_info_opts_set(tun_info, opts, opts_len, 0);
+
 	tun_info->mode = IP_TUNNEL_INFO_TX;
-	tun_info->options_len = 0;
 
 	*ts = new_state;
 
@@ -299,6 +308,10 @@ static int ip_tun_fill_encap_info(struct sk_buff *skb,
 	    nla_put_u8(skb, LWTUNNEL_IP_TTL, tun_info->key.ttl) ||
 	    nla_put_be16(skb, LWTUNNEL_IP_FLAGS, tun_info->key.tun_flags))
 		return -ENOMEM;
+	if (tun_info->options_len &&
+	    nla_put(skb, LWTUNNEL_IP_OPTS,
+		    tun_info->options_len, ip_tunnel_info_opts(tun_info)))
+		return -ENOMEM;
 
 	return 0;
 }
@@ -310,13 +323,18 @@ static int ip_tun_encap_nlsize(struct lwtunnel_state *lwtstate)
 		+ nla_total_size(4)	/* LWTUNNEL_IP_SRC */
 		+ nla_total_size(1)	/* LWTUNNEL_IP_TOS */
 		+ nla_total_size(1)	/* LWTUNNEL_IP_TTL */
-		+ nla_total_size(2);	/* LWTUNNEL_IP_FLAGS */
+		+ nla_total_size(2)	/* LWTUNNEL_IP_FLAGS */
+		+ lwt_tun_info(lwtstate)->options_len;	/* LWTUNNEL_IP_OPTS */
 }
 
 static int ip_tun_cmp_encap(struct lwtunnel_state *a, struct lwtunnel_state *b)
 {
-	return memcmp(lwt_tun_info(a), lwt_tun_info(b),
-		      sizeof(struct ip_tunnel_info));
+	struct ip_tunnel_info *info_a = lwt_tun_info(a);
+	struct ip_tunnel_info *info_b = lwt_tun_info(b);
+	u8 opts_len;
+
+	opts_len = min(info_a->options_len, info_b->options_len);
+	return memcmp(info_a, info_b, sizeof(*info_a) + opts_len);
 }
 
 static const struct lwtunnel_encap_ops ip_tun_lwt_ops = {
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCHv2 net-next 3/6] lwtunnel: add LWTUNNEL_IP6_OPTS support for lwtunnel_ip6
  2019-10-08 15:16   ` [PATCHv2 net-next 2/6] lwtunnel: add LWTUNNEL_IP_OPTS support for lwtunnel_ip Xin Long
@ 2019-10-08 15:16     ` Xin Long
  2019-10-08 15:16       ` [PATCHv2 net-next 4/6] vxlan: check tun_info options_len properly Xin Long
  2019-10-09  7:55     ` [PATCHv2 net-next 2/6] lwtunnel: add LWTUNNEL_IP_OPTS support for lwtunnel_ip Simon Horman
  1 sibling, 1 reply; 12+ messages in thread
From: Xin Long @ 2019-10-08 15:16 UTC (permalink / raw)
  To: network dev; +Cc: davem, Jiri Benc, Thomas Graf, u9012063

Similar to lwtunnel_ip, this patch is to add options set/dump support
for lwtunnel_ip6.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 include/uapi/linux/lwtunnel.h |  1 +
 net/ipv4/ip_tunnel_core.c     | 22 ++++++++++++++++++----
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/lwtunnel.h b/include/uapi/linux/lwtunnel.h
index 93f2c05..4bed5e6 100644
--- a/include/uapi/linux/lwtunnel.h
+++ b/include/uapi/linux/lwtunnel.h
@@ -42,6 +42,7 @@ enum lwtunnel_ip6_t {
 	LWTUNNEL_IP6_TC,
 	LWTUNNEL_IP6_FLAGS,
 	LWTUNNEL_IP6_PAD,
+	LWTUNNEL_IP6_OPTS,
 	__LWTUNNEL_IP6_MAX,
 };
 
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index d9b7188..c8f5375a 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -353,6 +353,7 @@ static const struct nla_policy ip6_tun_policy[LWTUNNEL_IP6_MAX + 1] = {
 	[LWTUNNEL_IP6_HOPLIMIT]		= { .type = NLA_U8 },
 	[LWTUNNEL_IP6_TC]		= { .type = NLA_U8 },
 	[LWTUNNEL_IP6_FLAGS]		= { .type = NLA_U16 },
+	[LWTUNNEL_IP6_OPTS]		= { .type = NLA_BINARY },
 };
 
 static int ip6_tun_build_state(struct nlattr *attr,
@@ -363,14 +364,20 @@ static int ip6_tun_build_state(struct nlattr *attr,
 	struct ip_tunnel_info *tun_info;
 	struct lwtunnel_state *new_state;
 	struct nlattr *tb[LWTUNNEL_IP6_MAX + 1];
-	int err;
+	int err, opts_len = 0;
+	void *opts;
 
 	err = nla_parse_nested_deprecated(tb, LWTUNNEL_IP6_MAX, attr,
 					  ip6_tun_policy, extack);
 	if (err < 0)
 		return err;
 
-	new_state = lwtunnel_state_alloc(sizeof(*tun_info));
+	if (tb[LWTUNNEL_IP6_OPTS]) {
+		opts = nla_data(tb[LWTUNNEL_IP6_OPTS]);
+		opts_len = nla_len(tb[LWTUNNEL_IP6_OPTS]);
+	}
+
+	new_state = lwtunnel_state_alloc(sizeof(*tun_info)  + opts_len);
 	if (!new_state)
 		return -ENOMEM;
 
@@ -396,8 +403,10 @@ static int ip6_tun_build_state(struct nlattr *attr,
 	if (tb[LWTUNNEL_IP6_FLAGS])
 		tun_info->key.tun_flags = nla_get_be16(tb[LWTUNNEL_IP6_FLAGS]);
 
+	if (opts_len)
+		ip_tunnel_info_opts_set(tun_info, opts, opts_len, 0);
+
 	tun_info->mode = IP_TUNNEL_INFO_TX | IP_TUNNEL_INFO_IPV6;
-	tun_info->options_len = 0;
 
 	*ts = new_state;
 
@@ -417,6 +426,10 @@ static int ip6_tun_fill_encap_info(struct sk_buff *skb,
 	    nla_put_u8(skb, LWTUNNEL_IP6_HOPLIMIT, tun_info->key.ttl) ||
 	    nla_put_be16(skb, LWTUNNEL_IP6_FLAGS, tun_info->key.tun_flags))
 		return -ENOMEM;
+	if (tun_info->options_len &&
+	    nla_put(skb, LWTUNNEL_IP6_OPTS,
+		    tun_info->options_len, ip_tunnel_info_opts(tun_info)))
+		return -ENOMEM;
 
 	return 0;
 }
@@ -428,7 +441,8 @@ static int ip6_tun_encap_nlsize(struct lwtunnel_state *lwtstate)
 		+ nla_total_size(16)	/* LWTUNNEL_IP6_SRC */
 		+ nla_total_size(1)	/* LWTUNNEL_IP6_HOPLIMIT */
 		+ nla_total_size(1)	/* LWTUNNEL_IP6_TC */
-		+ nla_total_size(2);	/* LWTUNNEL_IP6_FLAGS */
+		+ nla_total_size(2)	/* LWTUNNEL_IP6_FLAGS */
+		+ lwt_tun_info(lwtstate)->options_len;  /* LWTUNNEL_IP6_OPTS */
 }
 
 static const struct lwtunnel_encap_ops ip6_tun_lwt_ops = {
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCHv2 net-next 4/6] vxlan: check tun_info options_len properly
  2019-10-08 15:16     ` [PATCHv2 net-next 3/6] lwtunnel: add LWTUNNEL_IP6_OPTS support for lwtunnel_ip6 Xin Long
@ 2019-10-08 15:16       ` Xin Long
  2019-10-08 15:16         ` [PATCHv2 net-next 5/6] erspan: fix the tun_info options_len check Xin Long
  0 siblings, 1 reply; 12+ messages in thread
From: Xin Long @ 2019-10-08 15:16 UTC (permalink / raw)
  To: network dev; +Cc: davem, Jiri Benc, Thomas Graf, u9012063

This patch is to improve the tun_info options_len by dropping
the skb when TUNNEL_VXLAN_OPT is set but options_len is less
than vxlan_metadata. This can void a potential out-of-bounds
access on ip_tun_info.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 drivers/net/vxlan.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 3d9bcc9..e0787286 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2487,9 +2487,11 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 		vni = tunnel_id_to_key32(info->key.tun_id);
 		ifindex = 0;
 		dst_cache = &info->dst_cache;
-		if (info->options_len &&
-		    info->key.tun_flags & TUNNEL_VXLAN_OPT)
+		if (info->key.tun_flags & TUNNEL_VXLAN_OPT) {
+			if (info->options_len < sizeof(*md))
+				goto drop;
 			md = ip_tunnel_info_opts(info);
+		}
 		ttl = info->key.ttl;
 		tos = info->key.tos;
 		label = info->key.label;
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCHv2 net-next 5/6] erspan: fix the tun_info options_len check
  2019-10-08 15:16       ` [PATCHv2 net-next 4/6] vxlan: check tun_info options_len properly Xin Long
@ 2019-10-08 15:16         ` Xin Long
  2019-10-08 15:16           ` [PATCHv2 net-next 6/6] erspan: make md work without TUNNEL_ERSPAN_OPT set Xin Long
  0 siblings, 1 reply; 12+ messages in thread
From: Xin Long @ 2019-10-08 15:16 UTC (permalink / raw)
  To: network dev; +Cc: davem, Jiri Benc, Thomas Graf, u9012063

The check for !md doens't really work for ip_tunnel_info_opts(info) which
only does info + 1. Also to avoid out-of-bounds access on info, it should
ensure options_len is not less than erspan_metadata in both erspan_xmit()
and ip6erspan_tunnel_xmit().

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/ipv4/ip_gre.c  | 4 ++--
 net/ipv6/ip6_gre.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 52690bb..b5e1f5e 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -509,9 +509,9 @@ static void erspan_fb_xmit(struct sk_buff *skb, struct net_device *dev)
 	key = &tun_info->key;
 	if (!(tun_info->key.tun_flags & TUNNEL_ERSPAN_OPT))
 		goto err_free_skb;
-	md = ip_tunnel_info_opts(tun_info);
-	if (!md)
+	if (sizeof(*md) > tun_info->options_len)
 		goto err_free_skb;
+	md = ip_tunnel_info_opts(tun_info);
 
 	/* ERSPAN has fixed 8 byte GRE header */
 	version = md->version;
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index d5779d6..116987d 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -980,9 +980,9 @@ static netdev_tx_t ip6erspan_tunnel_xmit(struct sk_buff *skb,
 		dsfield = key->tos;
 		if (!(tun_info->key.tun_flags & TUNNEL_ERSPAN_OPT))
 			goto tx_err;
-		md = ip_tunnel_info_opts(tun_info);
-		if (!md)
+		if (sizeof(*md) > tun_info->options_len)
 			goto tx_err;
+		md = ip_tunnel_info_opts(tun_info);
 
 		tun_id = tunnel_id_to_key32(key->tun_id);
 		if (md->version == 1) {
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCHv2 net-next 6/6] erspan: make md work without TUNNEL_ERSPAN_OPT set
  2019-10-08 15:16         ` [PATCHv2 net-next 5/6] erspan: fix the tun_info options_len check Xin Long
@ 2019-10-08 15:16           ` Xin Long
  0 siblings, 0 replies; 12+ messages in thread
From: Xin Long @ 2019-10-08 15:16 UTC (permalink / raw)
  To: network dev; +Cc: davem, Jiri Benc, Thomas Graf, u9012063

Now when a skb comes with ip_tun_info but with no TUNNEL_ERSPAN_OPT to
a md erspan device, it will be dropped.

This patch is to allow this skb to go through this erspan device, and
the options (version, index, hwid, dir, etc.) will be filled with
tunnel's params, which can be configured by users.

This can be verified by:

 # ip net a a; ip net a b
 # ip -n a l a eth0 type veth peer name eth0 netns b
 # ip -n a l s eth0 up; ip -n b link set eth0 up
 # ip -n a a a 10.1.0.1/24 dev eth0; ip -n b a a 10.1.0.2/24 dev eth0
 # ip -n b l a erspan1 type erspan key 1 seq local 10.1.0.2 remote 10.1.0.1
 # ip -n b a a 1.1.1.1/24 dev erspan1; ip -n b l s erspan1 up
 # ip -n b r a 2.1.1.0/24 dev erspan1
 # ip -n a l a erspan1 type erspan key 1 seq local 10.1.0.1 external
 # ip -n a a a 2.1.1.1/24 dev erspan1; ip -n a l s erspan1 up
 # ip -n a r a 1.1.1.0/24 encap ip id 1 dst 10.1.0.2 dev erspan1
 # ip net exec a ping 1.1.1.1 -c 1

Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/ipv4/ip_gre.c  | 31 +++++++++++++------------------
 net/ipv6/ip6_gre.c | 35 +++++++++++++++++++----------------
 2 files changed, 32 insertions(+), 34 deletions(-)

diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index b5e1f5e..84a445d 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -491,15 +491,12 @@ static void gre_fb_xmit(struct sk_buff *skb, struct net_device *dev,
 static void erspan_fb_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct ip_tunnel *tunnel = netdev_priv(dev);
+	struct erspan_metadata *md = NULL;
 	struct ip_tunnel_info *tun_info;
 	const struct ip_tunnel_key *key;
-	struct erspan_metadata *md;
+	int version, nhoff, thoff;
 	bool truncate = false;
 	__be16 proto;
-	int tunnel_hlen;
-	int version;
-	int nhoff;
-	int thoff;
 
 	tun_info = skb_tunnel_info(skb);
 	if (unlikely(!tun_info || !(tun_info->mode & IP_TUNNEL_INFO_TX) ||
@@ -507,15 +504,11 @@ static void erspan_fb_xmit(struct sk_buff *skb, struct net_device *dev)
 		goto err_free_skb;
 
 	key = &tun_info->key;
-	if (!(tun_info->key.tun_flags & TUNNEL_ERSPAN_OPT))
-		goto err_free_skb;
-	if (sizeof(*md) > tun_info->options_len)
-		goto err_free_skb;
-	md = ip_tunnel_info_opts(tun_info);
-
-	/* ERSPAN has fixed 8 byte GRE header */
-	version = md->version;
-	tunnel_hlen = 8 + erspan_hdr_len(version);
+	if (key->tun_flags & TUNNEL_ERSPAN_OPT) {
+		if (tun_info->options_len < sizeof(*md))
+			goto err_free_skb;
+		md = ip_tunnel_info_opts(tun_info);
+	}
 
 	if (skb_cow_head(skb, dev->needed_headroom))
 		goto err_free_skb;
@@ -538,15 +531,17 @@ static void erspan_fb_xmit(struct sk_buff *skb, struct net_device *dev)
 	    (ntohs(ipv6_hdr(skb)->payload_len) > skb->len - thoff))
 		truncate = true;
 
+	version = md ? md->version : tunnel->erspan_ver;
 	if (version == 1) {
 		erspan_build_header(skb, ntohl(tunnel_id_to_key32(key->tun_id)),
-				    ntohl(md->u.index), truncate, true);
+				    md ? ntohl(md->u.index) : tunnel->index,
+				    truncate, true);
 		proto = htons(ETH_P_ERSPAN);
 	} else if (version == 2) {
 		erspan_build_header_v2(skb,
 				       ntohl(tunnel_id_to_key32(key->tun_id)),
-				       md->u.md2.dir,
-				       get_hwid(&md->u.md2),
+				       md ? md->u.md2.dir : tunnel->dir,
+				       md ? get_hwid(&md->u.md2) : tunnel->hwid,
 				       truncate, true);
 		proto = htons(ETH_P_ERSPAN2);
 	} else {
@@ -556,7 +551,7 @@ static void erspan_fb_xmit(struct sk_buff *skb, struct net_device *dev)
 	gre_build_header(skb, 8, TUNNEL_SEQ,
 			 proto, 0, htonl(tunnel->o_seqno++));
 
-	ip_md_tunnel_xmit(skb, dev, IPPROTO_GRE, tunnel_hlen);
+	ip_md_tunnel_xmit(skb, dev, IPPROTO_GRE, 8 + erspan_hdr_len(version));
 
 	return;
 
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 116987d..3b7d213 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -959,10 +959,11 @@ static netdev_tx_t ip6erspan_tunnel_xmit(struct sk_buff *skb,
 	 * for native mode, call prepare_ip6gre_xmit_{ipv4,ipv6}.
 	 */
 	if (t->parms.collect_md) {
+		struct erspan_metadata *md = NULL;
 		struct ip_tunnel_info *tun_info;
 		const struct ip_tunnel_key *key;
-		struct erspan_metadata *md;
 		__be32 tun_id;
+		int version;
 
 		tun_info = skb_tunnel_info(skb);
 		if (unlikely(!tun_info ||
@@ -978,23 +979,25 @@ static netdev_tx_t ip6erspan_tunnel_xmit(struct sk_buff *skb,
 		fl6.flowi6_uid = sock_net_uid(dev_net(dev), NULL);
 
 		dsfield = key->tos;
-		if (!(tun_info->key.tun_flags & TUNNEL_ERSPAN_OPT))
-			goto tx_err;
-		if (sizeof(*md) > tun_info->options_len)
-			goto tx_err;
-		md = ip_tunnel_info_opts(tun_info);
+		if (key->tun_flags & TUNNEL_ERSPAN_OPT) {
+			if (tun_info->options_len < sizeof(*md))
+				goto tx_err;
+			md = ip_tunnel_info_opts(tun_info);
+		}
 
 		tun_id = tunnel_id_to_key32(key->tun_id);
-		if (md->version == 1) {
-			erspan_build_header(skb,
-					    ntohl(tun_id),
-					    ntohl(md->u.index), truncate,
-					    false);
-		} else if (md->version == 2) {
-			erspan_build_header_v2(skb,
-					       ntohl(tun_id),
-					       md->u.md2.dir,
-					       get_hwid(&md->u.md2),
+		version = md ? md->version : t->parms.erspan_ver;
+		if (version == 1) {
+			erspan_build_header(skb, ntohl(tun_id),
+					    md ? ntohl(md->u.index)
+					       : t->parms.index,
+					    truncate, false);
+		} else if (version == 2) {
+			erspan_build_header_v2(skb, ntohl(tun_id),
+					       md ? md->u.md2.dir
+						  : t->parms.dir,
+					       md ? get_hwid(&md->u.md2)
+						  : t->parms.hwid,
 					       truncate, false);
 		} else {
 			goto tx_err;
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCHv2 net-next 2/6] lwtunnel: add LWTUNNEL_IP_OPTS support for lwtunnel_ip
  2019-10-08 15:16   ` [PATCHv2 net-next 2/6] lwtunnel: add LWTUNNEL_IP_OPTS support for lwtunnel_ip Xin Long
  2019-10-08 15:16     ` [PATCHv2 net-next 3/6] lwtunnel: add LWTUNNEL_IP6_OPTS support for lwtunnel_ip6 Xin Long
@ 2019-10-09  7:55     ` Simon Horman
  2019-10-09  9:15       ` Jiri Benc
  2019-10-10  9:45       ` Xin Long
  1 sibling, 2 replies; 12+ messages in thread
From: Simon Horman @ 2019-10-09  7:55 UTC (permalink / raw)
  To: Xin Long; +Cc: network dev, davem, Jiri Benc, Thomas Graf, u9012063

On Tue, Oct 08, 2019 at 11:16:12PM +0800, Xin Long wrote:
> This patch is to add LWTUNNEL_IP_OPTS into lwtunnel_ip_t, by which
> users will be able to set options for ip_tunnel_info by "ip route
> encap" for erspan and vxlan's private metadata. Like one way to go
> in iproute is:
> 
>   # ip route add 1.1.1.0/24 encap ip id 1 erspan ver 1 idx 123 \
>       dst 10.1.0.2 dev erspan1
>   # ip route show
>     1.1.1.0/24  encap ip id 1 src 0.0.0.0 dst 10.1.0.2 ttl 0 \
>       tos 0 erspan ver 1 idx 123 dev erspan1 scope link
> 
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

Hi Xin,

thanks for your patch.

While I have no objection to allowing options to be set, as per the sample
ip commands above, I am concerned that the approach you have taken exposes
to userspace the internal encoding used by the kernel for these options.

This is the same concerned that was raised by others when I posed a patch
to allow setting of Geneve options in a similar manner. I think what is
called for here, as was the case in the Geneve work, is to expose netlink
attributes for each option that may be set and have the kernel form
these into the internal format (which appears to also be the wire format).

> ---
>  include/uapi/linux/lwtunnel.h |  1 +
>  net/ipv4/ip_tunnel_core.c     | 30 ++++++++++++++++++++++++------
>  2 files changed, 25 insertions(+), 6 deletions(-)
> 
> diff --git a/include/uapi/linux/lwtunnel.h b/include/uapi/linux/lwtunnel.h
> index de696ca..93f2c05 100644
> --- a/include/uapi/linux/lwtunnel.h
> +++ b/include/uapi/linux/lwtunnel.h
> @@ -27,6 +27,7 @@ enum lwtunnel_ip_t {
>  	LWTUNNEL_IP_TOS,
>  	LWTUNNEL_IP_FLAGS,
>  	LWTUNNEL_IP_PAD,
> +	LWTUNNEL_IP_OPTS,
>  	__LWTUNNEL_IP_MAX,
>  };
>  
> diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
> index 10f0848..d9b7188 100644
> --- a/net/ipv4/ip_tunnel_core.c
> +++ b/net/ipv4/ip_tunnel_core.c
> @@ -218,6 +218,7 @@ static const struct nla_policy ip_tun_policy[LWTUNNEL_IP_MAX + 1] = {
>  	[LWTUNNEL_IP_TTL]	= { .type = NLA_U8 },
>  	[LWTUNNEL_IP_TOS]	= { .type = NLA_U8 },
>  	[LWTUNNEL_IP_FLAGS]	= { .type = NLA_U16 },
> +	[LWTUNNEL_IP_OPTS]	= { .type = NLA_BINARY },
>  };

...

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHv2 net-next 2/6] lwtunnel: add LWTUNNEL_IP_OPTS support for lwtunnel_ip
  2019-10-09  7:55     ` [PATCHv2 net-next 2/6] lwtunnel: add LWTUNNEL_IP_OPTS support for lwtunnel_ip Simon Horman
@ 2019-10-09  9:15       ` Jiri Benc
  2019-10-10  9:45       ` Xin Long
  1 sibling, 0 replies; 12+ messages in thread
From: Jiri Benc @ 2019-10-09  9:15 UTC (permalink / raw)
  To: Simon Horman; +Cc: Xin Long, network dev, davem, Thomas Graf, u9012063

On Wed, 9 Oct 2019 09:55:27 +0200, Simon Horman wrote:
> This is the same concerned that was raised by others when I posed a patch
> to allow setting of Geneve options in a similar manner. I think what is
> called for here, as was the case in the Geneve work, is to expose netlink
> attributes for each option that may be set and have the kernel form
> these into the internal format (which appears to also be the wire format).

I agree with Simon.

Thanks,

 Jiri

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHv2 net-next 2/6] lwtunnel: add LWTUNNEL_IP_OPTS support for lwtunnel_ip
  2019-10-09  7:55     ` [PATCHv2 net-next 2/6] lwtunnel: add LWTUNNEL_IP_OPTS support for lwtunnel_ip Simon Horman
  2019-10-09  9:15       ` Jiri Benc
@ 2019-10-10  9:45       ` Xin Long
  2019-10-11  5:31         ` Simon Horman
  1 sibling, 1 reply; 12+ messages in thread
From: Xin Long @ 2019-10-10  9:45 UTC (permalink / raw)
  To: Simon Horman; +Cc: network dev, davem, Jiri Benc, Thomas Graf, William Tu

On Wed, Oct 9, 2019 at 3:55 PM Simon Horman <simon.horman@netronome.com> wrote:
>
> On Tue, Oct 08, 2019 at 11:16:12PM +0800, Xin Long wrote:
> > This patch is to add LWTUNNEL_IP_OPTS into lwtunnel_ip_t, by which
> > users will be able to set options for ip_tunnel_info by "ip route
> > encap" for erspan and vxlan's private metadata. Like one way to go
> > in iproute is:
> >
> >   # ip route add 1.1.1.0/24 encap ip id 1 erspan ver 1 idx 123 \
> >       dst 10.1.0.2 dev erspan1
> >   # ip route show
> >     1.1.1.0/24  encap ip id 1 src 0.0.0.0 dst 10.1.0.2 ttl 0 \
> >       tos 0 erspan ver 1 idx 123 dev erspan1 scope link
> >
> > Signed-off-by: Xin Long <lucien.xin@gmail.com>
>
> Hi Xin,
>
> thanks for your patch.
>
> While I have no objection to allowing options to be set, as per the sample
> ip commands above, I am concerned that the approach you have taken exposes
> to userspace the internal encoding used by the kernel for these options.
>
> This is the same concerned that was raised by others when I posed a patch
> to allow setting of Geneve options in a similar manner. I think what is
> called for here, as was the case in the Geneve work, is to expose netlink
> attributes for each option that may be set and have the kernel form
> these into the internal format (which appears to also be the wire format).
Understand.

Do you think if it's necessary to support for setting these options
by ip route command?

or if yes, should we introduce a global lwtun_option_ops_list where
geneve/vxlan/erspan could register their own option parsing functions?

>
> > ---
> >  include/uapi/linux/lwtunnel.h |  1 +
> >  net/ipv4/ip_tunnel_core.c     | 30 ++++++++++++++++++++++++------
> >  2 files changed, 25 insertions(+), 6 deletions(-)
> >
> > diff --git a/include/uapi/linux/lwtunnel.h b/include/uapi/linux/lwtunnel.h
> > index de696ca..93f2c05 100644
> > --- a/include/uapi/linux/lwtunnel.h
> > +++ b/include/uapi/linux/lwtunnel.h
> > @@ -27,6 +27,7 @@ enum lwtunnel_ip_t {
> >       LWTUNNEL_IP_TOS,
> >       LWTUNNEL_IP_FLAGS,
> >       LWTUNNEL_IP_PAD,
> > +     LWTUNNEL_IP_OPTS,
> >       __LWTUNNEL_IP_MAX,
> >  };
> >
> > diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
> > index 10f0848..d9b7188 100644
> > --- a/net/ipv4/ip_tunnel_core.c
> > +++ b/net/ipv4/ip_tunnel_core.c
> > @@ -218,6 +218,7 @@ static const struct nla_policy ip_tun_policy[LWTUNNEL_IP_MAX + 1] = {
> >       [LWTUNNEL_IP_TTL]       = { .type = NLA_U8 },
> >       [LWTUNNEL_IP_TOS]       = { .type = NLA_U8 },
> >       [LWTUNNEL_IP_FLAGS]     = { .type = NLA_U16 },
> > +     [LWTUNNEL_IP_OPTS]      = { .type = NLA_BINARY },
> >  };
>
> ...

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHv2 net-next 2/6] lwtunnel: add LWTUNNEL_IP_OPTS support for lwtunnel_ip
  2019-10-10  9:45       ` Xin Long
@ 2019-10-11  5:31         ` Simon Horman
  2019-10-14 11:26           ` Xin Long
  0 siblings, 1 reply; 12+ messages in thread
From: Simon Horman @ 2019-10-11  5:31 UTC (permalink / raw)
  To: Xin Long; +Cc: network dev, davem, Jiri Benc, Thomas Graf, William Tu

On Thu, Oct 10, 2019 at 05:45:14PM +0800, Xin Long wrote:
> On Wed, Oct 9, 2019 at 3:55 PM Simon Horman <simon.horman@netronome.com> wrote:
> >
> > On Tue, Oct 08, 2019 at 11:16:12PM +0800, Xin Long wrote:
> > > This patch is to add LWTUNNEL_IP_OPTS into lwtunnel_ip_t, by which
> > > users will be able to set options for ip_tunnel_info by "ip route
> > > encap" for erspan and vxlan's private metadata. Like one way to go
> > > in iproute is:
> > >
> > >   # ip route add 1.1.1.0/24 encap ip id 1 erspan ver 1 idx 123 \
> > >       dst 10.1.0.2 dev erspan1
> > >   # ip route show
> > >     1.1.1.0/24  encap ip id 1 src 0.0.0.0 dst 10.1.0.2 ttl 0 \
> > >       tos 0 erspan ver 1 idx 123 dev erspan1 scope link
> > >
> > > Signed-off-by: Xin Long <lucien.xin@gmail.com>
> >
> > Hi Xin,
> >
> > thanks for your patch.
> >
> > While I have no objection to allowing options to be set, as per the sample
> > ip commands above, I am concerned that the approach you have taken exposes
> > to userspace the internal encoding used by the kernel for these options.
> >
> > This is the same concerned that was raised by others when I posed a patch
> > to allow setting of Geneve options in a similar manner. I think what is
> > called for here, as was the case in the Geneve work, is to expose netlink
> > attributes for each option that may be set and have the kernel form
> > these into the internal format (which appears to also be the wire format).
>
> Understand.
>
> Do you think if it's necessary to support for setting these options
> by ip route command?
>
> or if yes, should we introduce a global lwtun_option_ops_list where
> geneve/vxlan/erspan could register their own option parsing functions?

Hi Xin,

In the case of Geneve the options are (now) exposed via tc cls_flower and
act_tunnel rather than ip route. The approach taken was to create an
ENC_OPTS attribute with a ENC_OPTS_GENEVE sub-attribute which in turn has
sub-attributes that allow options to be described.

The idea behind this design, as I recall, was to allow other options, say
for VXLAN, to be added via a new sub-attribute of ENC_OPTS.  Without
examining the details beyond your patch I think a similar approach would
work well for options supplied via ip route.

...

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHv2 net-next 2/6] lwtunnel: add LWTUNNEL_IP_OPTS support for lwtunnel_ip
  2019-10-11  5:31         ` Simon Horman
@ 2019-10-14 11:26           ` Xin Long
  0 siblings, 0 replies; 12+ messages in thread
From: Xin Long @ 2019-10-14 11:26 UTC (permalink / raw)
  To: Simon Horman; +Cc: network dev, davem, Jiri Benc, Thomas Graf, William Tu

On Fri, Oct 11, 2019 at 1:31 PM Simon Horman <simon.horman@netronome.com> wrote:
>
> On Thu, Oct 10, 2019 at 05:45:14PM +0800, Xin Long wrote:
> > On Wed, Oct 9, 2019 at 3:55 PM Simon Horman <simon.horman@netronome.com> wrote:
> > >
> > > On Tue, Oct 08, 2019 at 11:16:12PM +0800, Xin Long wrote:
> > > > This patch is to add LWTUNNEL_IP_OPTS into lwtunnel_ip_t, by which
> > > > users will be able to set options for ip_tunnel_info by "ip route
> > > > encap" for erspan and vxlan's private metadata. Like one way to go
> > > > in iproute is:
> > > >
> > > >   # ip route add 1.1.1.0/24 encap ip id 1 erspan ver 1 idx 123 \
> > > >       dst 10.1.0.2 dev erspan1
> > > >   # ip route show
> > > >     1.1.1.0/24  encap ip id 1 src 0.0.0.0 dst 10.1.0.2 ttl 0 \
> > > >       tos 0 erspan ver 1 idx 123 dev erspan1 scope link
> > > >
> > > > Signed-off-by: Xin Long <lucien.xin@gmail.com>
> > >
> > > Hi Xin,
> > >
> > > thanks for your patch.
> > >
> > > While I have no objection to allowing options to be set, as per the sample
> > > ip commands above, I am concerned that the approach you have taken exposes
> > > to userspace the internal encoding used by the kernel for these options.
> > >
> > > This is the same concerned that was raised by others when I posed a patch
> > > to allow setting of Geneve options in a similar manner. I think what is
> > > called for here, as was the case in the Geneve work, is to expose netlink
> > > attributes for each option that may be set and have the kernel form
> > > these into the internal format (which appears to also be the wire format).
> >
> > Understand.
> >
> > Do you think if it's necessary to support for setting these options
> > by ip route command?
> >
> > or if yes, should we introduce a global lwtun_option_ops_list where
> > geneve/vxlan/erspan could register their own option parsing functions?
>
> Hi Xin,
>
> In the case of Geneve the options are (now) exposed via tc cls_flower and
> act_tunnel rather than ip route. The approach taken was to create an
> ENC_OPTS attribute with a ENC_OPTS_GENEVE sub-attribute which in turn has
> sub-attributes that allow options to be described.
>
> The idea behind this design, as I recall, was to allow other options, say
> for VXLAN, to be added via a new sub-attribute of ENC_OPTS.  Without
> examining the details beyond your patch I think a similar approach would
> work well for options supplied via ip route.
Thanks, I will give it a try.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-10-14 11:26 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-08 15:16 [PATCHv2 net-next 0/6] net: add support for ip_tun_info options setting Xin Long
2019-10-08 15:16 ` [PATCHv2 net-next 1/6] lwtunnel: add options process for arp request Xin Long
2019-10-08 15:16   ` [PATCHv2 net-next 2/6] lwtunnel: add LWTUNNEL_IP_OPTS support for lwtunnel_ip Xin Long
2019-10-08 15:16     ` [PATCHv2 net-next 3/6] lwtunnel: add LWTUNNEL_IP6_OPTS support for lwtunnel_ip6 Xin Long
2019-10-08 15:16       ` [PATCHv2 net-next 4/6] vxlan: check tun_info options_len properly Xin Long
2019-10-08 15:16         ` [PATCHv2 net-next 5/6] erspan: fix the tun_info options_len check Xin Long
2019-10-08 15:16           ` [PATCHv2 net-next 6/6] erspan: make md work without TUNNEL_ERSPAN_OPT set Xin Long
2019-10-09  7:55     ` [PATCHv2 net-next 2/6] lwtunnel: add LWTUNNEL_IP_OPTS support for lwtunnel_ip Simon Horman
2019-10-09  9:15       ` Jiri Benc
2019-10-10  9:45       ` Xin Long
2019-10-11  5:31         ` Simon Horman
2019-10-14 11:26           ` Xin Long

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).