All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v2 0/4] net: Geneve options support for TC act_tunnel_key
@ 2018-06-27  4:39 Jakub Kicinski
  2018-06-27  4:39 ` [PATCH net-next v2 1/4] net/sched: act_tunnel_key: disambiguate metadata dst error cases Jakub Kicinski
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Jakub Kicinski @ 2018-06-27  4:39 UTC (permalink / raw)
  To: davem, jbenc
  Cc: Roopa Prabhu, jiri, jhs, xiyou.wangcong, oss-drivers, netdev,
	Jakub Kicinski

Hi,

Simon & Pieter say:

This set adds Geneve Options support to the TC tunnel key action.
It provides the plumbing required to configure Geneve variable length
options.  The options can be configured in the form CLASS:TYPE:DATA,
where CLASS is represented as a 16bit hexadecimal value, TYPE as an 8bit
hexadecimal value and DATA as a variable length hexadecimal value.
Additionally multiple options may be listed using a comma delimiter.

v2:
 - fix sparse warnings in patches 3 and 4 (first one reported by
   build bot).

Pieter Jansen van Vuuren (1):
  net: check tunnel option type in tunnel flags

Simon Horman (3):
  net/sched: act_tunnel_key: disambiguate metadata dst error cases
  net/sched: act_tunnel_key: add extended ack support
  net/sched: add tunnel option support to act_tunnel_key

 drivers/net/geneve.c                      |   6 +-
 drivers/net/vxlan.c                       |   3 +-
 include/net/ip_tunnels.h                  |   8 +-
 include/uapi/linux/tc_act/tc_tunnel_key.h |  26 +++
 net/core/filter.c                         |   2 +-
 net/ipv4/ip_gre.c                         |   2 +
 net/ipv6/ip6_gre.c                        |   2 +
 net/openvswitch/flow_netlink.c            |   7 +-
 net/sched/act_tunnel_key.c                | 246 +++++++++++++++++++++-
 9 files changed, 284 insertions(+), 18 deletions(-)

-- 
2.17.1

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH net-next v2 1/4] net/sched: act_tunnel_key: disambiguate metadata dst error cases
  2018-06-27  4:39 [PATCH net-next v2 0/4] net: Geneve options support for TC act_tunnel_key Jakub Kicinski
@ 2018-06-27  4:39 ` Jakub Kicinski
  2018-06-27  4:39 ` [PATCH net-next v2 2/4] net/sched: act_tunnel_key: add extended ack support Jakub Kicinski
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Jakub Kicinski @ 2018-06-27  4:39 UTC (permalink / raw)
  To: davem, jbenc
  Cc: Roopa Prabhu, jiri, jhs, xiyou.wangcong, oss-drivers, netdev,
	Simon Horman

From: Simon Horman <simon.horman@netronome.com>

Metadata may be NULL for one of two reasons:
* Missing user input
* Failure to allocate the metadata dst

Disambiguate these case by returning -EINVAL for the former and -ENOMEM
for the latter rather than -EINVAL for both cases.

This is in preparation for using extended ack to provide more information
to users when parsing their input.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 net/sched/act_tunnel_key.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/sched/act_tunnel_key.c b/net/sched/act_tunnel_key.c
index 626dac81a48a..2edd389e7c92 100644
--- a/net/sched/act_tunnel_key.c
+++ b/net/sched/act_tunnel_key.c
@@ -143,10 +143,13 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla,
 			metadata = __ipv6_tun_set_dst(&saddr, &daddr, 0, 0, dst_port,
 						      0, flags,
 						      key_id, 0);
+		} else {
+			ret = -EINVAL;
+			goto err_out;
 		}
 
 		if (!metadata) {
-			ret = -EINVAL;
+			ret = -ENOMEM;
 			goto err_out;
 		}
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next v2 2/4] net/sched: act_tunnel_key: add extended ack support
  2018-06-27  4:39 [PATCH net-next v2 0/4] net: Geneve options support for TC act_tunnel_key Jakub Kicinski
  2018-06-27  4:39 ` [PATCH net-next v2 1/4] net/sched: act_tunnel_key: disambiguate metadata dst error cases Jakub Kicinski
@ 2018-06-27  4:39 ` Jakub Kicinski
  2018-06-27  4:39 ` [PATCH net-next v2 3/4] net: check tunnel option type in tunnel flags Jakub Kicinski
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Jakub Kicinski @ 2018-06-27  4:39 UTC (permalink / raw)
  To: davem, jbenc
  Cc: Roopa Prabhu, jiri, jhs, xiyou.wangcong, oss-drivers, netdev,
	Simon Horman, Alexander Aring, Pieter Jansen van Vuuren

From: Simon Horman <simon.horman@netronome.com>

Add extended ack support for the tunnel key action by using NL_SET_ERR_MSG
during validation of user input.

Cc: Alexander Aring <aring@mojatatu.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
---
 net/sched/act_tunnel_key.c | 27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/net/sched/act_tunnel_key.c b/net/sched/act_tunnel_key.c
index 2edd389e7c92..20e98ed8d498 100644
--- a/net/sched/act_tunnel_key.c
+++ b/net/sched/act_tunnel_key.c
@@ -86,16 +86,22 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla,
 	int ret = 0;
 	int err;
 
-	if (!nla)
+	if (!nla) {
+		NL_SET_ERR_MSG(extack, "Tunnel requires attributes to be passed");
 		return -EINVAL;
+	}
 
 	err = nla_parse_nested(tb, TCA_TUNNEL_KEY_MAX, nla, tunnel_key_policy,
-			       NULL);
-	if (err < 0)
+			       extack);
+	if (err < 0) {
+		NL_SET_ERR_MSG(extack, "Failed to parse nested tunnel key attributes");
 		return err;
+	}
 
-	if (!tb[TCA_TUNNEL_KEY_PARMS])
+	if (!tb[TCA_TUNNEL_KEY_PARMS]) {
+		NL_SET_ERR_MSG(extack, "Missing tunnel key parameters");
 		return -EINVAL;
+	}
 
 	parm = nla_data(tb[TCA_TUNNEL_KEY_PARMS]);
 	exists = tcf_idr_check(tn, parm->index, a, bind);
@@ -107,6 +113,7 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla,
 		break;
 	case TCA_TUNNEL_KEY_ACT_SET:
 		if (!tb[TCA_TUNNEL_KEY_ENC_KEY_ID]) {
+			NL_SET_ERR_MSG(extack, "Missing tunnel key id");
 			ret = -EINVAL;
 			goto err_out;
 		}
@@ -144,11 +151,13 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla,
 						      0, flags,
 						      key_id, 0);
 		} else {
+			NL_SET_ERR_MSG(extack, "Missing either ipv4 or ipv6 src and dst");
 			ret = -EINVAL;
 			goto err_out;
 		}
 
 		if (!metadata) {
+			NL_SET_ERR_MSG(extack, "Cannot allocate tunnel metadata dst");
 			ret = -ENOMEM;
 			goto err_out;
 		}
@@ -156,6 +165,7 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla,
 		metadata->u.tun_info.mode |= IP_TUNNEL_INFO_TX;
 		break;
 	default:
+		NL_SET_ERR_MSG(extack, "Unknown tunnel key action");
 		ret = -EINVAL;
 		goto err_out;
 	}
@@ -163,14 +173,18 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla,
 	if (!exists) {
 		ret = tcf_idr_create(tn, parm->index, est, a,
 				     &act_tunnel_key_ops, bind, true);
-		if (ret)
+		if (ret) {
+			NL_SET_ERR_MSG(extack, "Cannot create TC IDR");
 			return ret;
+		}
 
 		ret = ACT_P_CREATED;
 	} else {
 		tcf_idr_release(*a, bind);
-		if (!ovr)
+		if (!ovr) {
+			NL_SET_ERR_MSG(extack, "TC IDR already exists");
 			return -EEXIST;
+		}
 	}
 
 	t = to_tunnel_key(*a);
@@ -180,6 +194,7 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla,
 	if (unlikely(!params_new)) {
 		if (ret == ACT_P_CREATED)
 			tcf_idr_release(*a, bind);
+		NL_SET_ERR_MSG(extack, "Cannot allocate tunnel key parameters");
 		return -ENOMEM;
 	}
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next v2 3/4] net: check tunnel option type in tunnel flags
  2018-06-27  4:39 [PATCH net-next v2 0/4] net: Geneve options support for TC act_tunnel_key Jakub Kicinski
  2018-06-27  4:39 ` [PATCH net-next v2 1/4] net/sched: act_tunnel_key: disambiguate metadata dst error cases Jakub Kicinski
  2018-06-27  4:39 ` [PATCH net-next v2 2/4] net/sched: act_tunnel_key: add extended ack support Jakub Kicinski
@ 2018-06-27  4:39 ` Jakub Kicinski
  2018-06-27  9:49   ` Daniel Borkmann
  2018-06-27  4:39 ` [PATCH net-next v2 4/4] net/sched: add tunnel option support to act_tunnel_key Jakub Kicinski
  2018-06-29 14:51 ` [PATCH net-next v2 0/4] net: Geneve options support for TC act_tunnel_key David Miller
  4 siblings, 1 reply; 15+ messages in thread
From: Jakub Kicinski @ 2018-06-27  4:39 UTC (permalink / raw)
  To: davem, jbenc
  Cc: Roopa Prabhu, jiri, jhs, xiyou.wangcong, oss-drivers, netdev,
	Pieter Jansen van Vuuren, Jakub Kicinski, Daniel Borkmann

From: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>

Check the tunnel option type stored in tunnel flags when creating options
for tunnels. Thereby ensuring we do not set geneve, vxlan or erspan tunnel
options on interfaces that are not associated with them.

Make sure all users of the infrastructure set correct flags, for the BPF
helper we have to set all bits to keep backward compatibility.

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
CC: Daniel Borkmann <daniel@iogearbox.net>

v2:
 - use __be16 for dst_opt_type in net/openvswitch/flow_netlink.c (build bot).
---
 drivers/net/geneve.c           | 6 ++++--
 drivers/net/vxlan.c            | 3 ++-
 include/net/ip_tunnels.h       | 8 ++++++--
 net/core/filter.c              | 2 +-
 net/ipv4/ip_gre.c              | 2 ++
 net/ipv6/ip6_gre.c             | 2 ++
 net/openvswitch/flow_netlink.c | 7 ++++++-
 7 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 3e94375b9b01..471edd76ff55 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -236,7 +236,8 @@ static void geneve_rx(struct geneve_dev *geneve, struct geneve_sock *gs,
 		}
 		/* Update tunnel dst according to Geneve options. */
 		ip_tunnel_info_opts_set(&tun_dst->u.tun_info,
-					gnvh->options, gnvh->opt_len * 4);
+					gnvh->options, gnvh->opt_len * 4,
+					TUNNEL_GENEVE_OPT);
 	} else {
 		/* Drop packets w/ critical options,
 		 * since we don't support any...
@@ -675,7 +676,8 @@ static void geneve_build_header(struct genevehdr *geneveh,
 	geneveh->proto_type = htons(ETH_P_TEB);
 	geneveh->rsvd2 = 0;
 
-	ip_tunnel_info_opts_get(geneveh->options, info);
+	if (info->key.tun_flags & TUNNEL_GENEVE_OPT)
+		ip_tunnel_info_opts_get(geneveh->options, info);
 }
 
 static int geneve_build_skb(struct dst_entry *dst, struct sk_buff *skb,
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index cc14e0cd5647..7eb30d7c8bd7 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2122,7 +2122,8 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 		vni = tunnel_id_to_key32(info->key.tun_id);
 		ifindex = 0;
 		dst_cache = &info->dst_cache;
-		if (info->options_len)
+		if (info->options_len &&
+		    info->key.tun_flags & TUNNEL_VXLAN_OPT)
 			md = ip_tunnel_info_opts(info);
 		ttl = info->key.ttl;
 		tos = info->key.tos;
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 90ff430f5e9d..b0d022ff6ea1 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -466,10 +466,12 @@ static inline void ip_tunnel_info_opts_get(void *to,
 }
 
 static inline void ip_tunnel_info_opts_set(struct ip_tunnel_info *info,
-					   const void *from, int len)
+					   const void *from, int len,
+					   __be16 flags)
 {
 	memcpy(ip_tunnel_info_opts(info), from, len);
 	info->options_len = len;
+	info->key.tun_flags |= flags;
 }
 
 static inline struct ip_tunnel_info *lwt_tun_info(struct lwtunnel_state *lwtstate)
@@ -511,9 +513,11 @@ static inline void ip_tunnel_info_opts_get(void *to,
 }
 
 static inline void ip_tunnel_info_opts_set(struct ip_tunnel_info *info,
-					   const void *from, int len)
+					   const void *from, int len,
+					   __be16 flags)
 {
 	info->options_len = 0;
+	info->key.tun_flags |= flags;
 }
 
 #endif /* CONFIG_INET */
diff --git a/net/core/filter.c b/net/core/filter.c
index e7f12e9f598c..dade922678f6 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3582,7 +3582,7 @@ BPF_CALL_3(bpf_skb_set_tunnel_opt, struct sk_buff *, skb,
 	if (unlikely(size > IP_TUNNEL_OPTS_MAX))
 		return -ENOMEM;
 
-	ip_tunnel_info_opts_set(info, from, size);
+	ip_tunnel_info_opts_set(info, from, size, TUNNEL_OPTIONS_PRESENT);
 
 	return 0;
 }
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 2d8efeecf619..c8ca5d8f0f75 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -587,6 +587,8 @@ static void erspan_fb_xmit(struct sk_buff *skb, struct net_device *dev,
 		goto err_free_skb;
 
 	key = &tun_info->key;
+	if (!(tun_info->key.tun_flags & TUNNEL_ERSPAN_OPT))
+		goto err_free_rt;
 	md = ip_tunnel_info_opts(tun_info);
 	if (!md)
 		goto err_free_rt;
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index c8cf2fdbb13b..367177786e34 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -990,6 +990,8 @@ static netdev_tx_t ip6erspan_tunnel_xmit(struct sk_buff *skb,
 		fl6.flowi6_uid = sock_net_uid(dev_net(dev), NULL);
 
 		dsfield = key->tos;
+		if (!(tun_info->key.tun_flags & TUNNEL_ERSPAN_OPT))
+			goto tx_err;
 		md = ip_tunnel_info_opts(tun_info);
 		if (!md)
 			goto tx_err;
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 492ab0c36f7c..391c4073a6dc 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -2516,7 +2516,9 @@ static int validate_and_copy_set_tun(const struct nlattr *attr,
 	struct ovs_tunnel_info *ovs_tun;
 	struct nlattr *a;
 	int err = 0, start, opts_type;
+	__be16 dst_opt_type;
 
+	dst_opt_type = 0;
 	ovs_match_init(&match, &key, true, NULL);
 	opts_type = ip_tun_from_nlattr(nla_data(attr), &match, false, log);
 	if (opts_type < 0)
@@ -2528,10 +2530,13 @@ static int validate_and_copy_set_tun(const struct nlattr *attr,
 			err = validate_geneve_opts(&key);
 			if (err < 0)
 				return err;
+			dst_opt_type = TUNNEL_GENEVE_OPT;
 			break;
 		case OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS:
+			dst_opt_type = TUNNEL_VXLAN_OPT;
 			break;
 		case OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS:
+			dst_opt_type = TUNNEL_ERSPAN_OPT;
 			break;
 		}
 	}
@@ -2574,7 +2579,7 @@ static int validate_and_copy_set_tun(const struct nlattr *attr,
 	 */
 	ip_tunnel_info_opts_set(tun_info,
 				TUN_METADATA_OPTS(&key, key.tun_opts_len),
-				key.tun_opts_len);
+				key.tun_opts_len, dst_opt_type);
 	add_nested_action_end(*sfa, start);
 
 	return err;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next v2 4/4] net/sched: add tunnel option support to act_tunnel_key
  2018-06-27  4:39 [PATCH net-next v2 0/4] net: Geneve options support for TC act_tunnel_key Jakub Kicinski
                   ` (2 preceding siblings ...)
  2018-06-27  4:39 ` [PATCH net-next v2 3/4] net: check tunnel option type in tunnel flags Jakub Kicinski
@ 2018-06-27  4:39 ` Jakub Kicinski
  2018-06-29 14:51 ` [PATCH net-next v2 0/4] net: Geneve options support for TC act_tunnel_key David Miller
  4 siblings, 0 replies; 15+ messages in thread
From: Jakub Kicinski @ 2018-06-27  4:39 UTC (permalink / raw)
  To: davem, jbenc
  Cc: Roopa Prabhu, jiri, jhs, xiyou.wangcong, oss-drivers, netdev,
	Simon Horman, Pieter Jansen van Vuuren

From: Simon Horman <simon.horman@netronome.com>

Allow setting tunnel options using the act_tunnel_key action.

Options are expressed as class:type:data and multiple options
may be listed using a comma delimiter.

 # ip link add name geneve0 type geneve dstport 0 external
 # tc qdisc add dev eth0 ingress
 # tc filter add dev eth0 protocol ip parent ffff: \
     flower indev eth0 \
        ip_proto udp \
        action tunnel_key \
            set src_ip 10.0.99.192 \
            dst_ip 10.0.99.193 \
            dst_port 6081 \
            id 11 \
            geneve_opts 0102:80:00800022,0102:80:00800022 \
    action mirred egress redirect dev geneve0

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
v2:
 - use nla_get_be16()/nla_put_be16() for TCA_TUNNEL_KEY_ENC_OPT_GENEVE_CLASS
   as struct geneve_opt :: opt_class is __be16.
---
 include/uapi/linux/tc_act/tc_tunnel_key.h |  26 +++
 net/sched/act_tunnel_key.c                | 214 +++++++++++++++++++++-
 2 files changed, 236 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/tc_act/tc_tunnel_key.h b/include/uapi/linux/tc_act/tc_tunnel_key.h
index 72bbefe5d1d1..e284fec8c467 100644
--- a/include/uapi/linux/tc_act/tc_tunnel_key.h
+++ b/include/uapi/linux/tc_act/tc_tunnel_key.h
@@ -36,9 +36,35 @@ enum {
 	TCA_TUNNEL_KEY_PAD,
 	TCA_TUNNEL_KEY_ENC_DST_PORT,	/* be16 */
 	TCA_TUNNEL_KEY_NO_CSUM,		/* u8 */
+	TCA_TUNNEL_KEY_ENC_OPTS,	/* Nested TCA_TUNNEL_KEY_ENC_OPTS_
+					 * attributes
+					 */
 	__TCA_TUNNEL_KEY_MAX,
 };
 
 #define TCA_TUNNEL_KEY_MAX (__TCA_TUNNEL_KEY_MAX - 1)
 
+enum {
+	TCA_TUNNEL_KEY_ENC_OPTS_UNSPEC,
+	TCA_TUNNEL_KEY_ENC_OPTS_GENEVE,		/* Nested
+						 * TCA_TUNNEL_KEY_ENC_OPTS_
+						 * attributes
+						 */
+	__TCA_TUNNEL_KEY_ENC_OPTS_MAX,
+};
+
+#define TCA_TUNNEL_KEY_ENC_OPTS_MAX (__TCA_TUNNEL_KEY_ENC_OPTS_MAX - 1)
+
+enum {
+	TCA_TUNNEL_KEY_ENC_OPT_GENEVE_UNSPEC,
+	TCA_TUNNEL_KEY_ENC_OPT_GENEVE_CLASS,		/* be16 */
+	TCA_TUNNEL_KEY_ENC_OPT_GENEVE_TYPE,		/* u8 */
+	TCA_TUNNEL_KEY_ENC_OPT_GENEVE_DATA,		/* 4 to 128 bytes */
+
+	__TCA_TUNNEL_KEY_ENC_OPT_GENEVE_MAX,
+};
+
+#define TCA_TUNNEL_KEY_ENC_OPT_GENEVE_MAX \
+	(__TCA_TUNNEL_KEY_ENC_OPT_GENEVE_MAX - 1)
+
 #endif
diff --git a/net/sched/act_tunnel_key.c b/net/sched/act_tunnel_key.c
index 20e98ed8d498..ea203e386a92 100644
--- a/net/sched/act_tunnel_key.c
+++ b/net/sched/act_tunnel_key.c
@@ -13,6 +13,7 @@
 #include <linux/kernel.h>
 #include <linux/skbuff.h>
 #include <linux/rtnetlink.h>
+#include <net/geneve.h>
 #include <net/netlink.h>
 #include <net/pkt_sched.h>
 #include <net/dst.h>
@@ -57,6 +58,135 @@ static int tunnel_key_act(struct sk_buff *skb, const struct tc_action *a,
 	return action;
 }
 
+static const struct nla_policy
+enc_opts_policy[TCA_TUNNEL_KEY_ENC_OPTS_MAX + 1] = {
+	[TCA_TUNNEL_KEY_ENC_OPTS_GENEVE]	= { .type = NLA_NESTED },
+};
+
+static const struct nla_policy
+geneve_opt_policy[TCA_TUNNEL_KEY_ENC_OPT_GENEVE_MAX + 1] = {
+	[TCA_TUNNEL_KEY_ENC_OPT_GENEVE_CLASS]	   = { .type = NLA_U16 },
+	[TCA_TUNNEL_KEY_ENC_OPT_GENEVE_TYPE]	   = { .type = NLA_U8 },
+	[TCA_TUNNEL_KEY_ENC_OPT_GENEVE_DATA]	   = { .type = NLA_BINARY,
+						       .len = 128 },
+};
+
+static int
+tunnel_key_copy_geneve_opt(const struct nlattr *nla, void *dst, int dst_len,
+			   struct netlink_ext_ack *extack)
+{
+	struct nlattr *tb[TCA_TUNNEL_KEY_ENC_OPT_GENEVE_MAX + 1];
+	int err, data_len, opt_len;
+	u8 *data;
+
+	err = nla_parse_nested(tb, TCA_TUNNEL_KEY_ENC_OPT_GENEVE_MAX,
+			       nla, geneve_opt_policy, extack);
+	if (err < 0)
+		return err;
+
+	if (!tb[TCA_TUNNEL_KEY_ENC_OPT_GENEVE_CLASS] ||
+	    !tb[TCA_TUNNEL_KEY_ENC_OPT_GENEVE_TYPE] ||
+	    !tb[TCA_TUNNEL_KEY_ENC_OPT_GENEVE_DATA]) {
+		NL_SET_ERR_MSG(extack, "Missing tunnel key geneve option class, type or data");
+		return -EINVAL;
+	}
+
+	data = nla_data(tb[TCA_TUNNEL_KEY_ENC_OPT_GENEVE_DATA]);
+	data_len = nla_len(tb[TCA_TUNNEL_KEY_ENC_OPT_GENEVE_DATA]);
+	if (data_len < 4) {
+		NL_SET_ERR_MSG(extack, "Tunnel key geneve option data is less than 4 bytes long");
+		return -ERANGE;
+	}
+	if (data_len % 4) {
+		NL_SET_ERR_MSG(extack, "Tunnel key geneve option data is not a multiple of 4 bytes long");
+		return -ERANGE;
+	}
+
+	opt_len = sizeof(struct geneve_opt) + data_len;
+	if (dst) {
+		struct geneve_opt *opt = dst;
+
+		WARN_ON(dst_len < opt_len);
+
+		opt->opt_class =
+			nla_get_be16(tb[TCA_TUNNEL_KEY_ENC_OPT_GENEVE_CLASS]);
+		opt->type = nla_get_u8(tb[TCA_TUNNEL_KEY_ENC_OPT_GENEVE_TYPE]);
+		opt->length = data_len / 4; /* length is in units of 4 bytes */
+		opt->r1 = 0;
+		opt->r2 = 0;
+		opt->r3 = 0;
+
+		memcpy(opt + 1, data, data_len);
+	}
+
+	return opt_len;
+}
+
+static int tunnel_key_copy_opts(const struct nlattr *nla, u8 *dst,
+				int dst_len, struct netlink_ext_ack *extack)
+{
+	int err, rem, opt_len, len = nla_len(nla), opts_len = 0;
+	const struct nlattr *attr, *head = nla_data(nla);
+
+	err = nla_validate(head, len, TCA_TUNNEL_KEY_ENC_OPTS_MAX,
+			   enc_opts_policy, extack);
+	if (err)
+		return err;
+
+	nla_for_each_attr(attr, head, len, rem) {
+		switch (nla_type(attr)) {
+		case TCA_TUNNEL_KEY_ENC_OPTS_GENEVE:
+			opt_len = tunnel_key_copy_geneve_opt(attr, dst,
+							     dst_len, extack);
+			if (opt_len < 0)
+				return opt_len;
+			opts_len += opt_len;
+			if (dst) {
+				dst_len -= opt_len;
+				dst += opt_len;
+			}
+			break;
+		}
+	}
+
+	if (!opts_len) {
+		NL_SET_ERR_MSG(extack, "Empty list of tunnel options");
+		return -EINVAL;
+	}
+
+	if (rem > 0) {
+		NL_SET_ERR_MSG(extack, "Trailing data after parsing tunnel key options attributes");
+		return -EINVAL;
+	}
+
+	return opts_len;
+}
+
+static int tunnel_key_get_opts_len(struct nlattr *nla,
+				   struct netlink_ext_ack *extack)
+{
+	return tunnel_key_copy_opts(nla, NULL, 0, extack);
+}
+
+static int tunnel_key_opts_set(struct nlattr *nla, struct ip_tunnel_info *info,
+			       int opts_len, struct netlink_ext_ack *extack)
+{
+	info->options_len = opts_len;
+	switch (nla_type(nla_data(nla))) {
+	case TCA_TUNNEL_KEY_ENC_OPTS_GENEVE:
+#if IS_ENABLED(CONFIG_INET)
+		info->key.tun_flags |= TUNNEL_GENEVE_OPT;
+		return tunnel_key_copy_opts(nla, ip_tunnel_info_opts(info),
+					    opts_len, extack);
+#else
+		return -EAFNOSUPPORT;
+#endif
+	default:
+		NL_SET_ERR_MSG(extack, "Cannot set tunnel options for unknown tunnel type");
+		return -EINVAL;
+	}
+}
+
 static const struct nla_policy tunnel_key_policy[TCA_TUNNEL_KEY_MAX + 1] = {
 	[TCA_TUNNEL_KEY_PARMS]	    = { .len = sizeof(struct tc_tunnel_key) },
 	[TCA_TUNNEL_KEY_ENC_IPV4_SRC] = { .type = NLA_U32 },
@@ -66,6 +196,7 @@ static const struct nla_policy tunnel_key_policy[TCA_TUNNEL_KEY_MAX + 1] = {
 	[TCA_TUNNEL_KEY_ENC_KEY_ID]   = { .type = NLA_U32 },
 	[TCA_TUNNEL_KEY_ENC_DST_PORT] = {.type = NLA_U16},
 	[TCA_TUNNEL_KEY_NO_CSUM]      = { .type = NLA_U8 },
+	[TCA_TUNNEL_KEY_ENC_OPTS]     = { .type = NLA_NESTED },
 };
 
 static int tunnel_key_init(struct net *net, struct nlattr *nla,
@@ -81,6 +212,7 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla,
 	struct tcf_tunnel_key *t;
 	bool exists = false;
 	__be16 dst_port = 0;
+	int opts_len = 0;
 	__be64 key_id;
 	__be16 flags;
 	int ret = 0;
@@ -128,6 +260,15 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla,
 		if (tb[TCA_TUNNEL_KEY_ENC_DST_PORT])
 			dst_port = nla_get_be16(tb[TCA_TUNNEL_KEY_ENC_DST_PORT]);
 
+		if (tb[TCA_TUNNEL_KEY_ENC_OPTS]) {
+			opts_len = tunnel_key_get_opts_len(tb[TCA_TUNNEL_KEY_ENC_OPTS],
+							   extack);
+			if (opts_len < 0) {
+				ret = opts_len;
+				goto err_out;
+			}
+		}
+
 		if (tb[TCA_TUNNEL_KEY_ENC_IPV4_SRC] &&
 		    tb[TCA_TUNNEL_KEY_ENC_IPV4_DST]) {
 			__be32 saddr;
@@ -138,7 +279,7 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla,
 
 			metadata = __ip_tun_set_dst(saddr, daddr, 0, 0,
 						    dst_port, flags,
-						    key_id, 0);
+						    key_id, opts_len);
 		} else if (tb[TCA_TUNNEL_KEY_ENC_IPV6_SRC] &&
 			   tb[TCA_TUNNEL_KEY_ENC_IPV6_DST]) {
 			struct in6_addr saddr;
@@ -162,6 +303,14 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla,
 			goto err_out;
 		}
 
+		if (opts_len) {
+			ret = tunnel_key_opts_set(tb[TCA_TUNNEL_KEY_ENC_OPTS],
+						  &metadata->u.tun_info,
+						  opts_len, extack);
+			if (ret < 0)
+				goto err_out;
+		}
+
 		metadata->u.tun_info.mode |= IP_TUNNEL_INFO_TX;
 		break;
 	default:
@@ -234,6 +383,61 @@ static void tunnel_key_release(struct tc_action *a)
 	}
 }
 
+static int tunnel_key_geneve_opts_dump(struct sk_buff *skb,
+				       const struct ip_tunnel_info *info)
+{
+	int len = info->options_len;
+	u8 *src = (u8 *)(info + 1);
+	struct nlattr *start;
+
+	start = nla_nest_start(skb, TCA_TUNNEL_KEY_ENC_OPTS_GENEVE);
+	if (!start)
+		return -EMSGSIZE;
+
+	while (len > 0) {
+		struct geneve_opt *opt = (struct geneve_opt *)src;
+
+		if (nla_put_be16(skb, TCA_TUNNEL_KEY_ENC_OPT_GENEVE_CLASS,
+				 opt->opt_class) ||
+		    nla_put_u8(skb, TCA_TUNNEL_KEY_ENC_OPT_GENEVE_TYPE,
+			       opt->type) ||
+		    nla_put(skb, TCA_TUNNEL_KEY_ENC_OPT_GENEVE_DATA,
+			    opt->length * 4, opt + 1))
+			return -EMSGSIZE;
+
+		len -= sizeof(struct geneve_opt) + opt->length * 4;
+		src += sizeof(struct geneve_opt) + opt->length * 4;
+	}
+
+	nla_nest_end(skb, start);
+	return 0;
+}
+
+static int tunnel_key_opts_dump(struct sk_buff *skb,
+				const struct ip_tunnel_info *info)
+{
+	struct nlattr *start;
+	int err;
+
+	if (!info->options_len)
+		return 0;
+
+	start = nla_nest_start(skb, TCA_TUNNEL_KEY_ENC_OPTS);
+	if (!start)
+		return -EMSGSIZE;
+
+	if (info->key.tun_flags & TUNNEL_GENEVE_OPT) {
+		err = tunnel_key_geneve_opts_dump(skb, info);
+		if (err)
+			return err;
+	} else {
+		return -EINVAL;
+	}
+
+	nla_nest_end(skb, start);
+	return 0;
+}
+
 static int tunnel_key_dump_addresses(struct sk_buff *skb,
 				     const struct ip_tunnel_info *info)
 {
@@ -284,8 +488,9 @@ static int tunnel_key_dump(struct sk_buff *skb, struct tc_action *a,
 		goto nla_put_failure;
 
 	if (params->tcft_action == TCA_TUNNEL_KEY_ACT_SET) {
-		struct ip_tunnel_key *key =
-			&params->tcft_enc_metadata->u.tun_info.key;
+		struct ip_tunnel_info *info =
+			&params->tcft_enc_metadata->u.tun_info;
+		struct ip_tunnel_key *key = &info->key;
 		__be32 key_id = tunnel_id_to_key32(key->tun_id);
 
 		if (nla_put_be32(skb, TCA_TUNNEL_KEY_ENC_KEY_ID, key_id) ||
@@ -293,7 +498,8 @@ static int tunnel_key_dump(struct sk_buff *skb, struct tc_action *a,
 					      &params->tcft_enc_metadata->u.tun_info) ||
 		    nla_put_be16(skb, TCA_TUNNEL_KEY_ENC_DST_PORT, key->tp_dst) ||
 		    nla_put_u8(skb, TCA_TUNNEL_KEY_NO_CSUM,
-			       !(key->tun_flags & TUNNEL_CSUM)))
+			       !(key->tun_flags & TUNNEL_CSUM)) ||
+		    tunnel_key_opts_dump(skb, info))
 			goto nla_put_failure;
 	}
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v2 3/4] net: check tunnel option type in tunnel flags
  2018-06-27  4:39 ` [PATCH net-next v2 3/4] net: check tunnel option type in tunnel flags Jakub Kicinski
@ 2018-06-27  9:49   ` Daniel Borkmann
  2018-06-28  7:42     ` Jiri Benc
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel Borkmann @ 2018-06-27  9:49 UTC (permalink / raw)
  To: Jakub Kicinski, davem, jbenc
  Cc: Roopa Prabhu, jiri, jhs, xiyou.wangcong, oss-drivers, netdev,
	Pieter Jansen van Vuuren

On 06/27/2018 06:39 AM, Jakub Kicinski wrote:
> From: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
> 
> Check the tunnel option type stored in tunnel flags when creating options
> for tunnels. Thereby ensuring we do not set geneve, vxlan or erspan tunnel
> options on interfaces that are not associated with them.
> 
> Make sure all users of the infrastructure set correct flags, for the BPF
> helper we have to set all bits to keep backward compatibility.
> 
> Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> ---
> CC: Daniel Borkmann <daniel@iogearbox.net>
> 
> v2:
>  - use __be16 for dst_opt_type in net/openvswitch/flow_netlink.c (build bot).

Looks good to me, and yes in BPF case a mask like TUNNEL_OPTIONS_PRESENT is
right approach since this is opaque info and solely defined by the BPF prog
that is using the generic helper.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v2 3/4] net: check tunnel option type in tunnel flags
  2018-06-27  9:49   ` Daniel Borkmann
@ 2018-06-28  7:42     ` Jiri Benc
  2018-06-28 16:54       ` Jakub Kicinski
  0 siblings, 1 reply; 15+ messages in thread
From: Jiri Benc @ 2018-06-28  7:42 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Jakub Kicinski, davem, Roopa Prabhu, jiri, jhs, xiyou.wangcong,
	oss-drivers, netdev, Pieter Jansen van Vuuren

On Wed, 27 Jun 2018 11:49:49 +0200, Daniel Borkmann wrote:
> Looks good to me, and yes in BPF case a mask like TUNNEL_OPTIONS_PRESENT is
> right approach since this is opaque info and solely defined by the BPF prog
> that is using the generic helper.

Wouldn't it make sense to introduce some safeguards here (in a backward
compatible way, of course)? It's easy to mistakenly set data for a
different tunnel type in a BPF program and then be surprised by the
result. It might help users if such usage was detected by the kernel,
one way or another.

I'm thinking about something like the BPF program voluntarily
specifying the type of the data; if not specified, the wildcard would be
used as it is now.

 Jiri

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v2 3/4] net: check tunnel option type in tunnel flags
  2018-06-28  7:42     ` Jiri Benc
@ 2018-06-28 16:54       ` Jakub Kicinski
  2018-06-28 17:01         ` Jiri Benc
  2018-06-29  7:04         ` Daniel Borkmann
  0 siblings, 2 replies; 15+ messages in thread
From: Jakub Kicinski @ 2018-06-28 16:54 UTC (permalink / raw)
  To: Jiri Benc
  Cc: Daniel Borkmann, davem, Roopa Prabhu, jiri, jhs, xiyou.wangcong,
	oss-drivers, netdev, Pieter Jansen van Vuuren

On Thu, 28 Jun 2018 09:42:06 +0200, Jiri Benc wrote:
> On Wed, 27 Jun 2018 11:49:49 +0200, Daniel Borkmann wrote:
> > Looks good to me, and yes in BPF case a mask like TUNNEL_OPTIONS_PRESENT is
> > right approach since this is opaque info and solely defined by the BPF prog
> > that is using the generic helper.  
> 
> Wouldn't it make sense to introduce some safeguards here (in a backward
> compatible way, of course)? It's easy to mistakenly set data for a
> different tunnel type in a BPF program and then be surprised by the
> result. It might help users if such usage was detected by the kernel,
> one way or another.

Well, that's how it works today ;)

> I'm thinking about something like the BPF program voluntarily
> specifying the type of the data; if not specified, the wildcard would be
> used as it is now.

Hmm... in practice we could steal top bits of the size parameter for
some flags, since it seems to be limited to values < 256 today?  Is it
worth it?

It would look something along the lines of:

---

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 59b19b6a40d7..194b40efa8e8 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2213,6 +2213,13 @@ enum bpf_func_id {
 /* BPF_FUNC_perf_event_output for sk_buff input context. */
 #define BPF_F_CTXLEN_MASK              (0xfffffULL << 32)
 
+#define BPF_F_TUN_VXLAN                        (1U << 31)
+#define BPF_F_TUN_GENEVE               (1U << 30)
+#define BPF_F_TUN_ERSPAN               (1U << 29)
+#define BPF_F_TUN_FLAGS_ALL            (BPF_F_TUN_VXLAN | \
+                                        BPF_F_TUN_GENEVE | \
+                                        BPF_F_TUN_ERSPAN)
+
 /* Mode for BPF_FUNC_skb_adjust_room helper. */
 enum bpf_adj_room_mode {
        BPF_ADJ_ROOM_NET,
diff --git a/net/core/filter.c b/net/core/filter.c
index dade922678f6..cc592a1e8945 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3576,6 +3576,22 @@ BPF_CALL_3(bpf_skb_set_tunnel_opt, struct sk_buff *, skb,
 {
        struct ip_tunnel_info *info = skb_tunnel_info(skb);
        const struct metadata_dst *md = this_cpu_ptr(md_dst);
+       __be16 tun_flags;
+       u32 flags;
+
+       BUILD_BUG_ON(BPF_F_TUN_FLAGS_ALL & IP_TUNNEL_OPTS_MAX);
+
+       flags = size & BPF_F_TUN_FLAGS_ALL;
+       size &= ~flags;
+       if (flags & BPF_F_TUN_VXLAN)
+               tun_flags |= TUNNEL_VXLAN_OPT;
+       if (flags & BPF_F_TUN_GENEVE)
+               tun_flags |= TUNNEL_GENEVE_OPT;
+       if (flags & BPF_F_TUN_ERSPAN)
+               tun_flags |= TUNNEL_ERSPAN_OPT;
+       /* User didn't specify the tunnel type, for backward compat set all */
+       if (!(tun_flags & TUNNEL_OPTIONS_PRESENT))
+               tun_flags |= TUNNEL_OPTIONS_PRESENT;
 
        if (unlikely(info != &md->u.tun_info || (size & (sizeof(u32) - 1))))
                return -EINVAL;

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v2 3/4] net: check tunnel option type in tunnel flags
  2018-06-28 16:54       ` Jakub Kicinski
@ 2018-06-28 17:01         ` Jiri Benc
  2018-06-28 17:05           ` Jakub Kicinski
  2018-06-29  7:04         ` Daniel Borkmann
  1 sibling, 1 reply; 15+ messages in thread
From: Jiri Benc @ 2018-06-28 17:01 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Daniel Borkmann, davem, Roopa Prabhu, jiri, jhs, xiyou.wangcong,
	oss-drivers, netdev, Pieter Jansen van Vuuren

On Thu, 28 Jun 2018 09:54:52 -0700, Jakub Kicinski wrote:
> Hmm... in practice we could steal top bits of the size parameter for
> some flags, since it seems to be limited to values < 256 today?  Is it
> worth it?
> 
> It would look something along the lines of:

Something like that, yes. I'll leave to Daniel to review how much sense
it makes from the BPF side.

Thanks!

 Jiri

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v2 3/4] net: check tunnel option type in tunnel flags
  2018-06-28 17:01         ` Jiri Benc
@ 2018-06-28 17:05           ` Jakub Kicinski
  2018-06-29  9:06             ` Jiri Benc
  0 siblings, 1 reply; 15+ messages in thread
From: Jakub Kicinski @ 2018-06-28 17:05 UTC (permalink / raw)
  To: Jiri Benc
  Cc: Daniel Borkmann, davem, Roopa Prabhu, jiri, jhs, xiyou.wangcong,
	oss-drivers, netdev, Pieter Jansen van Vuuren

On Thu, 28 Jun 2018 19:01:52 +0200, Jiri Benc wrote:
> On Thu, 28 Jun 2018 09:54:52 -0700, Jakub Kicinski wrote:
> > Hmm... in practice we could steal top bits of the size parameter for
> > some flags, since it seems to be limited to values < 256 today?  Is it
> > worth it?
> > 
> > It would look something along the lines of:  
> 
> Something like that, yes. I'll leave to Daniel to review how much sense
> it makes from the BPF side.

Can we take this as a follow up through the bpf-next tree or do you
want us to respin as part of this set?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v2 3/4] net: check tunnel option type in tunnel flags
  2018-06-28 16:54       ` Jakub Kicinski
  2018-06-28 17:01         ` Jiri Benc
@ 2018-06-29  7:04         ` Daniel Borkmann
  2018-06-29 17:01           ` Jakub Kicinski
  1 sibling, 1 reply; 15+ messages in thread
From: Daniel Borkmann @ 2018-06-29  7:04 UTC (permalink / raw)
  To: Jakub Kicinski, Jiri Benc
  Cc: davem, Roopa Prabhu, jiri, jhs, xiyou.wangcong, oss-drivers,
	netdev, Pieter Jansen van Vuuren

On 06/28/2018 06:54 PM, Jakub Kicinski wrote:
> On Thu, 28 Jun 2018 09:42:06 +0200, Jiri Benc wrote:
>> On Wed, 27 Jun 2018 11:49:49 +0200, Daniel Borkmann wrote:
>>> Looks good to me, and yes in BPF case a mask like TUNNEL_OPTIONS_PRESENT is
>>> right approach since this is opaque info and solely defined by the BPF prog
>>> that is using the generic helper.  
>>
>> Wouldn't it make sense to introduce some safeguards here (in a backward
>> compatible way, of course)? It's easy to mistakenly set data for a
>> different tunnel type in a BPF program and then be surprised by the
>> result. It might help users if such usage was detected by the kernel,
>> one way or another.
> 
> Well, that's how it works today ;)

Well, it was designed like that on purpose, to be i) agnostic of the underlying
device, ii) to not clutter BPF API with tens of different APIs effectively doing
the same thing, and at the same time to avoid adding protocol specifics. E.g. at
least core bits of bpf_skb_{set,get}_tunnel_key() will work whether I use vxlan
or geneve underneath (we are actually using it this way) and I could use things
like tun_id to encode custom meta data from BPF for either of them depending on flavor
picked by orchestration system. For the tunnel options in bpf_skb_{set,get}_tunnel_opt()
it's similar although here there needs to be awareness of the underlying dev depending
on whether you encode data into e.g. gbp or tlvs, etc. However, downside right now I
can see with a patch like below is that:

i) People might still just keep using 'TUNNEL_OPTIONS_PRESENT path' since available
and backwards compatible with current/older kernels, ii) we cut bits away from
size over time for each new tunnel proto added in future that would support tunnel
options, iii) that extension is one-sided (at least below) and same would be needed
in getter part, and iv) there needs to be a way for the case when folks add new
tunnel options where we don't need to worry that we forget updating BPF_F_TUN_*
each time otherwise this will easily slip through and again people will just rely
on using TUNNEL_OPTIONS_PRESENT catchall. Given latter and in particular point i)
I wouldn't think it's worth the pain, the APIs were added to BPF in v4.6 so this
would buy them 2 more years wrt kernel compatibility with same functionality level.
And point v), I just noticed the patch is actually buggy: size is ARG_CONST_SIZE and
verifier will attempt to check the value whether the buffer passed in argument 2 is
valid or not, so using flags here in upper bits would let verification fail, you'd
really have to make a new helper just for this.

Best,
Daniel

>> I'm thinking about something like the BPF program voluntarily
>> specifying the type of the data; if not specified, the wildcard would be
>> used as it is now.
> 
> Hmm... in practice we could steal top bits of the size parameter for
> some flags, since it seems to be limited to values < 256 today?  Is it
> worth it?
> 
> It would look something along the lines of:
> 
> ---
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 59b19b6a40d7..194b40efa8e8 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2213,6 +2213,13 @@ enum bpf_func_id {
>  /* BPF_FUNC_perf_event_output for sk_buff input context. */
>  #define BPF_F_CTXLEN_MASK              (0xfffffULL << 32)
>  
> +#define BPF_F_TUN_VXLAN                        (1U << 31)
> +#define BPF_F_TUN_GENEVE               (1U << 30)
> +#define BPF_F_TUN_ERSPAN               (1U << 29)
> +#define BPF_F_TUN_FLAGS_ALL            (BPF_F_TUN_VXLAN | \
> +                                        BPF_F_TUN_GENEVE | \
> +                                        BPF_F_TUN_ERSPAN)
> +
>  /* Mode for BPF_FUNC_skb_adjust_room helper. */
>  enum bpf_adj_room_mode {
>         BPF_ADJ_ROOM_NET,
> diff --git a/net/core/filter.c b/net/core/filter.c
> index dade922678f6..cc592a1e8945 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -3576,6 +3576,22 @@ BPF_CALL_3(bpf_skb_set_tunnel_opt, struct sk_buff *, skb,
>  {
>         struct ip_tunnel_info *info = skb_tunnel_info(skb);
>         const struct metadata_dst *md = this_cpu_ptr(md_dst);
> +       __be16 tun_flags;
> +       u32 flags;
> +
> +       BUILD_BUG_ON(BPF_F_TUN_FLAGS_ALL & IP_TUNNEL_OPTS_MAX);
> +
> +       flags = size & BPF_F_TUN_FLAGS_ALL;
> +       size &= ~flags;
> +       if (flags & BPF_F_TUN_VXLAN)
> +               tun_flags |= TUNNEL_VXLAN_OPT;
> +       if (flags & BPF_F_TUN_GENEVE)
> +               tun_flags |= TUNNEL_GENEVE_OPT;
> +       if (flags & BPF_F_TUN_ERSPAN)
> +               tun_flags |= TUNNEL_ERSPAN_OPT;
> +       /* User didn't specify the tunnel type, for backward compat set all */
> +       if (!(tun_flags & TUNNEL_OPTIONS_PRESENT))
> +               tun_flags |= TUNNEL_OPTIONS_PRESENT;
>  
>         if (unlikely(info != &md->u.tun_info || (size & (sizeof(u32) - 1))))
>                 return -EINVAL;
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v2 3/4] net: check tunnel option type in tunnel flags
  2018-06-28 17:05           ` Jakub Kicinski
@ 2018-06-29  9:06             ` Jiri Benc
  0 siblings, 0 replies; 15+ messages in thread
From: Jiri Benc @ 2018-06-29  9:06 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Daniel Borkmann, davem, Roopa Prabhu, jiri, jhs, xiyou.wangcong,
	oss-drivers, netdev, Pieter Jansen van Vuuren

On Thu, 28 Jun 2018 10:05:36 -0700, Jakub Kicinski wrote:
> Can we take this as a follow up through the bpf-next tree or do you
> want us to respin as part of this set?

I think BPF is a separate issue.

 Jiri

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v2 0/4] net: Geneve options support for TC act_tunnel_key
  2018-06-27  4:39 [PATCH net-next v2 0/4] net: Geneve options support for TC act_tunnel_key Jakub Kicinski
                   ` (3 preceding siblings ...)
  2018-06-27  4:39 ` [PATCH net-next v2 4/4] net/sched: add tunnel option support to act_tunnel_key Jakub Kicinski
@ 2018-06-29 14:51 ` David Miller
  4 siblings, 0 replies; 15+ messages in thread
From: David Miller @ 2018-06-29 14:51 UTC (permalink / raw)
  To: jakub.kicinski
  Cc: jbenc, roopa, jiri, jhs, xiyou.wangcong, oss-drivers, netdev

From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Tue, 26 Jun 2018 21:39:33 -0700

> Simon & Pieter say:
> 
> This set adds Geneve Options support to the TC tunnel key action.
> It provides the plumbing required to configure Geneve variable length
> options.  The options can be configured in the form CLASS:TYPE:DATA,
> where CLASS is represented as a 16bit hexadecimal value, TYPE as an 8bit
> hexadecimal value and DATA as a variable length hexadecimal value.
> Additionally multiple options may be listed using a comma delimiter.
> 
> v2:
>  - fix sparse warnings in patches 3 and 4 (first one reported by
>    build bot).

Series applied, thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v2 3/4] net: check tunnel option type in tunnel flags
  2018-06-29  7:04         ` Daniel Borkmann
@ 2018-06-29 17:01           ` Jakub Kicinski
  2018-06-30  8:47             ` Daniel Borkmann
  0 siblings, 1 reply; 15+ messages in thread
From: Jakub Kicinski @ 2018-06-29 17:01 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Jiri Benc, davem, Roopa Prabhu, jiri, jhs, xiyou.wangcong,
	oss-drivers, netdev, Pieter Jansen van Vuuren

On Fri, 29 Jun 2018 09:04:15 +0200, Daniel Borkmann wrote:
> On 06/28/2018 06:54 PM, Jakub Kicinski wrote:
> > On Thu, 28 Jun 2018 09:42:06 +0200, Jiri Benc wrote:  
> >> On Wed, 27 Jun 2018 11:49:49 +0200, Daniel Borkmann wrote:  
> >>> Looks good to me, and yes in BPF case a mask like TUNNEL_OPTIONS_PRESENT is
> >>> right approach since this is opaque info and solely defined by the BPF prog
> >>> that is using the generic helper.    
> >>
> >> Wouldn't it make sense to introduce some safeguards here (in a backward
> >> compatible way, of course)? It's easy to mistakenly set data for a
> >> different tunnel type in a BPF program and then be surprised by the
> >> result. It might help users if such usage was detected by the kernel,
> >> one way or another.  
> > 
> > Well, that's how it works today ;)  
> 
> Well, it was designed like that on purpose, to be i) agnostic of the underlying
> device, ii) to not clutter BPF API with tens of different APIs effectively doing
> the same thing, and at the same time to avoid adding protocol specifics. E.g. at
> least core bits of bpf_skb_{set,get}_tunnel_key() will work whether I use vxlan
> or geneve underneath (we are actually using it this way) and I could use things
> like tun_id to encode custom meta data from BPF for either of them depending on flavor
> picked by orchestration system. For the tunnel options in bpf_skb_{set,get}_tunnel_opt()
> it's similar although here there needs to be awareness of the underlying dev depending
> on whether you encode data into e.g. gbp or tlvs, etc. However, downside right now I
> can see with a patch like below is that:
> 
> i) People might still just keep using 'TUNNEL_OPTIONS_PRESENT path' since available
> and backwards compatible with current/older kernels, ii) we cut bits away from
> size over time for each new tunnel proto added in future that would support tunnel
> options, iii) that extension is one-sided (at least below) and same would be needed
> in getter part, and iv) there needs to be a way for the case when folks add new
> tunnel options where we don't need to worry that we forget updating BPF_F_TUN_*
> each time otherwise this will easily slip through and again people will just rely
> on using TUNNEL_OPTIONS_PRESENT catchall. Given latter and in particular point i)
> I wouldn't think it's worth the pain, the APIs were added to BPF in v4.6 so this
> would buy them 2 more years wrt kernel compatibility with same functionality level.
> And point v), I just noticed the patch is actually buggy: size is ARG_CONST_SIZE and
> verifier will attempt to check the value whether the buffer passed in argument 2 is
> valid or not, so using flags here in upper bits would let verification fail, you'd
> really have to make a new helper just for this.

Ah, indeed.  I'd rather avoid a new helper, if we reuse an old one
people can always write a program like:

	err = helper(all_flags);
	if (err == -EINVAL)
		err = helper(fallback_flags);

With a new helper the program will not even load on old kernels :(

Could we add the flags new to bpf_skb_set_tunnel_key(), maybe?  It is a
bit ugly because only options care about the flags and in theory
bpf_skb_set_tunnel_key() doesn't have to be called before
bpf_skb_set_tunnel_opt() ...

Alternatively we could do this:

diff --git a/net/core/filter.c b/net/core/filter.c
index dade922678f6..9edcc2947be5 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3582,7 +3582,9 @@ BPF_CALL_3(bpf_skb_set_tunnel_opt, struct sk_buff *, skb,
        if (unlikely(size > IP_TUNNEL_OPTS_MAX))
                return -ENOMEM;
 
-       ip_tunnel_info_opts_set(info, from, size, TUNNEL_OPTIONS_PRESENT);
+       ip_tunnel_info_opts_set(info, from, size, TUNNEL_VXLAN_OPT |
+                                                 TUNNEL_GENEVE_OPT |
+                                                 TUNNEL_ERSPAN_OPT);
 
        return 0;
 }

to force ourselves to solve the problem when a next protocol is added ;)

Or should we just leave things as they are?  As you explain the new
helper would really be of marginal use given the backward compatibility
requirement and availability of the old one...

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v2 3/4] net: check tunnel option type in tunnel flags
  2018-06-29 17:01           ` Jakub Kicinski
@ 2018-06-30  8:47             ` Daniel Borkmann
  0 siblings, 0 replies; 15+ messages in thread
From: Daniel Borkmann @ 2018-06-30  8:47 UTC (permalink / raw)
  To: Jakub Kicinski, Daniel Borkmann
  Cc: Jiri Benc, davem, Roopa Prabhu, jiri, jhs, xiyou.wangcong,
	oss-drivers, netdev, Pieter Jansen van Vuuren

On 06/29/2018 07:01 PM, Jakub Kicinski wrote:
> On Fri, 29 Jun 2018 09:04:15 +0200, Daniel Borkmann wrote:
>> On 06/28/2018 06:54 PM, Jakub Kicinski wrote:
>>> On Thu, 28 Jun 2018 09:42:06 +0200, Jiri Benc wrote:  
>>>> On Wed, 27 Jun 2018 11:49:49 +0200, Daniel Borkmann wrote:  
>>>>> Looks good to me, and yes in BPF case a mask like TUNNEL_OPTIONS_PRESENT is
>>>>> right approach since this is opaque info and solely defined by the BPF prog
>>>>> that is using the generic helper.    
>>>>
>>>> Wouldn't it make sense to introduce some safeguards here (in a backward
>>>> compatible way, of course)? It's easy to mistakenly set data for a
>>>> different tunnel type in a BPF program and then be surprised by the
>>>> result. It might help users if such usage was detected by the kernel,
>>>> one way or another.  
>>>
>>> Well, that's how it works today ;)  
>>
>> Well, it was designed like that on purpose, to be i) agnostic of the underlying
>> device, ii) to not clutter BPF API with tens of different APIs effectively doing
>> the same thing, and at the same time to avoid adding protocol specifics. E.g. at
>> least core bits of bpf_skb_{set,get}_tunnel_key() will work whether I use vxlan
>> or geneve underneath (we are actually using it this way) and I could use things
>> like tun_id to encode custom meta data from BPF for either of them depending on flavor
>> picked by orchestration system. For the tunnel options in bpf_skb_{set,get}_tunnel_opt()
>> it's similar although here there needs to be awareness of the underlying dev depending
>> on whether you encode data into e.g. gbp or tlvs, etc. However, downside right now I
>> can see with a patch like below is that:
>>
>> i) People might still just keep using 'TUNNEL_OPTIONS_PRESENT path' since available
>> and backwards compatible with current/older kernels, ii) we cut bits away from
>> size over time for each new tunnel proto added in future that would support tunnel
>> options, iii) that extension is one-sided (at least below) and same would be needed
>> in getter part, and iv) there needs to be a way for the case when folks add new
>> tunnel options where we don't need to worry that we forget updating BPF_F_TUN_*
>> each time otherwise this will easily slip through and again people will just rely
>> on using TUNNEL_OPTIONS_PRESENT catchall. Given latter and in particular point i)
>> I wouldn't think it's worth the pain, the APIs were added to BPF in v4.6 so this
>> would buy them 2 more years wrt kernel compatibility with same functionality level.
>> And point v), I just noticed the patch is actually buggy: size is ARG_CONST_SIZE and
>> verifier will attempt to check the value whether the buffer passed in argument 2 is
>> valid or not, so using flags here in upper bits would let verification fail, you'd
>> really have to make a new helper just for this.
> 
> Ah, indeed.  I'd rather avoid a new helper, if we reuse an old one
> people can always write a program like:
> 
> 	err = helper(all_flags);
> 	if (err == -EINVAL)
> 		err = helper(fallback_flags);
> 
> With a new helper the program will not even load on old kernels :(
> 
> Could we add the flags new to bpf_skb_set_tunnel_key(), maybe?  It is a
> bit ugly because only options care about the flags and in theory
> bpf_skb_set_tunnel_key() doesn't have to be called before
> bpf_skb_set_tunnel_opt() ...

Right, though it's also ugly splitting things this way over two helpers
when this is _specifically_ only for one of them (unless you do need it for
hw offload where you need to signal for tunnel key _and_ options in BPF
itself but that's not from what I see is the case here). For sake of
'safeguard'-only I bet users won't do such fallback for reasons above but
again use currently TUNNEL_OPTIONS_PRESENT route, as is since it's simpler
and available since long time already. Thus I would prefer to leave it as
is in such case since ship has sailed long ago on this, and usage of this
extra check is not really obvious either, so I don't think it's worth it
and 'value-add' is marginal.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2018-06-30  8:47 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-27  4:39 [PATCH net-next v2 0/4] net: Geneve options support for TC act_tunnel_key Jakub Kicinski
2018-06-27  4:39 ` [PATCH net-next v2 1/4] net/sched: act_tunnel_key: disambiguate metadata dst error cases Jakub Kicinski
2018-06-27  4:39 ` [PATCH net-next v2 2/4] net/sched: act_tunnel_key: add extended ack support Jakub Kicinski
2018-06-27  4:39 ` [PATCH net-next v2 3/4] net: check tunnel option type in tunnel flags Jakub Kicinski
2018-06-27  9:49   ` Daniel Borkmann
2018-06-28  7:42     ` Jiri Benc
2018-06-28 16:54       ` Jakub Kicinski
2018-06-28 17:01         ` Jiri Benc
2018-06-28 17:05           ` Jakub Kicinski
2018-06-29  9:06             ` Jiri Benc
2018-06-29  7:04         ` Daniel Borkmann
2018-06-29 17:01           ` Jakub Kicinski
2018-06-30  8:47             ` Daniel Borkmann
2018-06-27  4:39 ` [PATCH net-next v2 4/4] net/sched: add tunnel option support to act_tunnel_key Jakub Kicinski
2018-06-29 14:51 ` [PATCH net-next v2 0/4] net: Geneve options support for TC act_tunnel_key David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.