With this patchset, users can configure options with LWTUNNEL_IP(6)_OPTS by ip route encap for ersapn or vxlan lwtunnel. Note that in kernel part it won't parse the option details but do some check and memcpy only, and the options will be parsed by iproute in userspace. We also improve the vxlan and erspan options processing in this patchset. As an example I also wrote a patch for iproute2 (see v1), with it we can add options for erspan lwtunnel like: # ip net a a; ip net a b # ip -n a l a eth0 type veth peer name eth0 netns b # ip -n a l s eth0 up; ip -n b link set eth0 up # ip -n a a a 10.1.0.1/24 dev eth0; ip -n b a a 10.1.0.2/24 dev eth0 # ip -n b l a erspan1 type erspan key 1 seq erspan 123 \ local 10.1.0.2 remote 10.1.0.1 # ip -n b a a 1.1.1.1/24 dev erspan1; ip -n b l s erspan1 up # ip -n b r a 2.1.1.0/24 dev erspan1 # ip -n a l a erspan1 type erspan key 1 seq local 10.1.0.1 external # ip -n a a a 2.1.1.1/24 dev erspan1; ip -n a l s erspan1 up # ip -n a r a 1.1.1.0/24 encap ip id 1 erspan ver 1 idx 123 \ dst 10.1.0.2 dev erspan1 # ip -n a r s; ip net exec a ping 1.1.1.1 -c 1 v1->v2: - no change, net-next reopened. Xin Long (6): lwtunnel: add options process for arp request lwtunnel: add LWTUNNEL_IP_OPTS support for lwtunnel_ip lwtunnel: add LWTUNNEL_IP6_OPTS support for lwtunnel_ip6 vxlan: check tun_info options_len properly erspan: fix the tun_info options_len check erspan: make md work without TUNNEL_ERSPAN_OPT set drivers/net/vxlan.c | 6 +++-- include/uapi/linux/lwtunnel.h | 2 ++ net/ipv4/ip_gre.c | 31 ++++++++++------------- net/ipv4/ip_tunnel_core.c | 59 +++++++++++++++++++++++++++++++++---------- net/ipv6/ip6_gre.c | 35 +++++++++++++------------ 5 files changed, 84 insertions(+), 49 deletions(-) -- 2.1.0
Without options copied to the dst tun_info in iptunnel_metadata_reply() called by arp_process for handling arp_request, the generated arp_reply packet may be dropped or sent out with wrong options for some tunnels like erspan and vxlan, and the traffic will break. Signed-off-by: Xin Long <lucien.xin@gmail.com> --- net/ipv4/ip_tunnel_core.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c index 1452a97..10f0848 100644 --- a/net/ipv4/ip_tunnel_core.c +++ b/net/ipv4/ip_tunnel_core.c @@ -126,15 +126,14 @@ struct metadata_dst *iptunnel_metadata_reply(struct metadata_dst *md, if (!md || md->type != METADATA_IP_TUNNEL || md->u.tun_info.mode & IP_TUNNEL_INFO_TX) - return NULL; - res = metadata_dst_alloc(0, METADATA_IP_TUNNEL, flags); + src = &md->u.tun_info; + res = metadata_dst_alloc(src->options_len, METADATA_IP_TUNNEL, flags); if (!res) return NULL; dst = &res->u.tun_info; - src = &md->u.tun_info; dst->key.tun_id = src->key.tun_id; if (src->mode & IP_TUNNEL_INFO_IPV6) memcpy(&dst->key.u.ipv6.dst, &src->key.u.ipv6.src, @@ -143,6 +142,8 @@ struct metadata_dst *iptunnel_metadata_reply(struct metadata_dst *md, dst->key.u.ipv4.dst = src->key.u.ipv4.src; dst->key.tun_flags = src->key.tun_flags; dst->mode = src->mode | IP_TUNNEL_INFO_TX; + ip_tunnel_info_opts_set(dst, ip_tunnel_info_opts(src), + src->options_len, 0); return res; } -- 2.1.0
This patch is to add LWTUNNEL_IP_OPTS into lwtunnel_ip_t, by which users will be able to set options for ip_tunnel_info by "ip route encap" for erspan and vxlan's private metadata. Like one way to go in iproute is: # ip route add 1.1.1.0/24 encap ip id 1 erspan ver 1 idx 123 \ dst 10.1.0.2 dev erspan1 # ip route show 1.1.1.0/24 encap ip id 1 src 0.0.0.0 dst 10.1.0.2 ttl 0 \ tos 0 erspan ver 1 idx 123 dev erspan1 scope link Signed-off-by: Xin Long <lucien.xin@gmail.com> --- include/uapi/linux/lwtunnel.h | 1 + net/ipv4/ip_tunnel_core.c | 30 ++++++++++++++++++++++++------ 2 files changed, 25 insertions(+), 6 deletions(-) diff --git a/include/uapi/linux/lwtunnel.h b/include/uapi/linux/lwtunnel.h index de696ca..93f2c05 100644 --- a/include/uapi/linux/lwtunnel.h +++ b/include/uapi/linux/lwtunnel.h @@ -27,6 +27,7 @@ enum lwtunnel_ip_t { LWTUNNEL_IP_TOS, LWTUNNEL_IP_FLAGS, LWTUNNEL_IP_PAD, + LWTUNNEL_IP_OPTS, __LWTUNNEL_IP_MAX, }; diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c index 10f0848..d9b7188 100644 --- a/net/ipv4/ip_tunnel_core.c +++ b/net/ipv4/ip_tunnel_core.c @@ -218,6 +218,7 @@ static const struct nla_policy ip_tun_policy[LWTUNNEL_IP_MAX + 1] = { [LWTUNNEL_IP_TTL] = { .type = NLA_U8 }, [LWTUNNEL_IP_TOS] = { .type = NLA_U8 }, [LWTUNNEL_IP_FLAGS] = { .type = NLA_U16 }, + [LWTUNNEL_IP_OPTS] = { .type = NLA_BINARY }, }; static int ip_tun_build_state(struct nlattr *attr, @@ -228,14 +229,20 @@ static int ip_tun_build_state(struct nlattr *attr, struct ip_tunnel_info *tun_info; struct lwtunnel_state *new_state; struct nlattr *tb[LWTUNNEL_IP_MAX + 1]; - int err; + int err, opts_len = 0; + void *opts; err = nla_parse_nested_deprecated(tb, LWTUNNEL_IP_MAX, attr, ip_tun_policy, extack); if (err < 0) return err; - new_state = lwtunnel_state_alloc(sizeof(*tun_info)); + if (tb[LWTUNNEL_IP_OPTS]) { + opts = nla_data(tb[LWTUNNEL_IP_OPTS]); + opts_len = nla_len(tb[LWTUNNEL_IP_OPTS]); + } + + new_state = lwtunnel_state_alloc(sizeof(*tun_info) + opts_len); if (!new_state) return -ENOMEM; @@ -269,8 +276,10 @@ static int ip_tun_build_state(struct nlattr *attr, if (tb[LWTUNNEL_IP_FLAGS]) tun_info->key.tun_flags = nla_get_be16(tb[LWTUNNEL_IP_FLAGS]); + if (opts_len) + ip_tunnel_info_opts_set(tun_info, opts, opts_len, 0); + tun_info->mode = IP_TUNNEL_INFO_TX; - tun_info->options_len = 0; *ts = new_state; @@ -299,6 +308,10 @@ static int ip_tun_fill_encap_info(struct sk_buff *skb, nla_put_u8(skb, LWTUNNEL_IP_TTL, tun_info->key.ttl) || nla_put_be16(skb, LWTUNNEL_IP_FLAGS, tun_info->key.tun_flags)) return -ENOMEM; + if (tun_info->options_len && + nla_put(skb, LWTUNNEL_IP_OPTS, + tun_info->options_len, ip_tunnel_info_opts(tun_info))) + return -ENOMEM; return 0; } @@ -310,13 +323,18 @@ static int ip_tun_encap_nlsize(struct lwtunnel_state *lwtstate) + nla_total_size(4) /* LWTUNNEL_IP_SRC */ + nla_total_size(1) /* LWTUNNEL_IP_TOS */ + nla_total_size(1) /* LWTUNNEL_IP_TTL */ - + nla_total_size(2); /* LWTUNNEL_IP_FLAGS */ + + nla_total_size(2) /* LWTUNNEL_IP_FLAGS */ + + lwt_tun_info(lwtstate)->options_len; /* LWTUNNEL_IP_OPTS */ } static int ip_tun_cmp_encap(struct lwtunnel_state *a, struct lwtunnel_state *b) { - return memcmp(lwt_tun_info(a), lwt_tun_info(b), - sizeof(struct ip_tunnel_info)); + struct ip_tunnel_info *info_a = lwt_tun_info(a); + struct ip_tunnel_info *info_b = lwt_tun_info(b); + u8 opts_len; + + opts_len = min(info_a->options_len, info_b->options_len); + return memcmp(info_a, info_b, sizeof(*info_a) + opts_len); } static const struct lwtunnel_encap_ops ip_tun_lwt_ops = { -- 2.1.0
Similar to lwtunnel_ip, this patch is to add options set/dump support for lwtunnel_ip6. Signed-off-by: Xin Long <lucien.xin@gmail.com> --- include/uapi/linux/lwtunnel.h | 1 + net/ipv4/ip_tunnel_core.c | 22 ++++++++++++++++++---- 2 files changed, 19 insertions(+), 4 deletions(-) diff --git a/include/uapi/linux/lwtunnel.h b/include/uapi/linux/lwtunnel.h index 93f2c05..4bed5e6 100644 --- a/include/uapi/linux/lwtunnel.h +++ b/include/uapi/linux/lwtunnel.h @@ -42,6 +42,7 @@ enum lwtunnel_ip6_t { LWTUNNEL_IP6_TC, LWTUNNEL_IP6_FLAGS, LWTUNNEL_IP6_PAD, + LWTUNNEL_IP6_OPTS, __LWTUNNEL_IP6_MAX, }; diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c index d9b7188..c8f5375a 100644 --- a/net/ipv4/ip_tunnel_core.c +++ b/net/ipv4/ip_tunnel_core.c @@ -353,6 +353,7 @@ static const struct nla_policy ip6_tun_policy[LWTUNNEL_IP6_MAX + 1] = { [LWTUNNEL_IP6_HOPLIMIT] = { .type = NLA_U8 }, [LWTUNNEL_IP6_TC] = { .type = NLA_U8 }, [LWTUNNEL_IP6_FLAGS] = { .type = NLA_U16 }, + [LWTUNNEL_IP6_OPTS] = { .type = NLA_BINARY }, }; static int ip6_tun_build_state(struct nlattr *attr, @@ -363,14 +364,20 @@ static int ip6_tun_build_state(struct nlattr *attr, struct ip_tunnel_info *tun_info; struct lwtunnel_state *new_state; struct nlattr *tb[LWTUNNEL_IP6_MAX + 1]; - int err; + int err, opts_len = 0; + void *opts; err = nla_parse_nested_deprecated(tb, LWTUNNEL_IP6_MAX, attr, ip6_tun_policy, extack); if (err < 0) return err; - new_state = lwtunnel_state_alloc(sizeof(*tun_info)); + if (tb[LWTUNNEL_IP6_OPTS]) { + opts = nla_data(tb[LWTUNNEL_IP6_OPTS]); + opts_len = nla_len(tb[LWTUNNEL_IP6_OPTS]); + } + + new_state = lwtunnel_state_alloc(sizeof(*tun_info) + opts_len); if (!new_state) return -ENOMEM; @@ -396,8 +403,10 @@ static int ip6_tun_build_state(struct nlattr *attr, if (tb[LWTUNNEL_IP6_FLAGS]) tun_info->key.tun_flags = nla_get_be16(tb[LWTUNNEL_IP6_FLAGS]); + if (opts_len) + ip_tunnel_info_opts_set(tun_info, opts, opts_len, 0); + tun_info->mode = IP_TUNNEL_INFO_TX | IP_TUNNEL_INFO_IPV6; - tun_info->options_len = 0; *ts = new_state; @@ -417,6 +426,10 @@ static int ip6_tun_fill_encap_info(struct sk_buff *skb, nla_put_u8(skb, LWTUNNEL_IP6_HOPLIMIT, tun_info->key.ttl) || nla_put_be16(skb, LWTUNNEL_IP6_FLAGS, tun_info->key.tun_flags)) return -ENOMEM; + if (tun_info->options_len && + nla_put(skb, LWTUNNEL_IP6_OPTS, + tun_info->options_len, ip_tunnel_info_opts(tun_info))) + return -ENOMEM; return 0; } @@ -428,7 +441,8 @@ static int ip6_tun_encap_nlsize(struct lwtunnel_state *lwtstate) + nla_total_size(16) /* LWTUNNEL_IP6_SRC */ + nla_total_size(1) /* LWTUNNEL_IP6_HOPLIMIT */ + nla_total_size(1) /* LWTUNNEL_IP6_TC */ - + nla_total_size(2); /* LWTUNNEL_IP6_FLAGS */ + + nla_total_size(2) /* LWTUNNEL_IP6_FLAGS */ + + lwt_tun_info(lwtstate)->options_len; /* LWTUNNEL_IP6_OPTS */ } static const struct lwtunnel_encap_ops ip6_tun_lwt_ops = { -- 2.1.0
This patch is to improve the tun_info options_len by dropping the skb when TUNNEL_VXLAN_OPT is set but options_len is less than vxlan_metadata. This can void a potential out-of-bounds access on ip_tun_info. Signed-off-by: Xin Long <lucien.xin@gmail.com> --- drivers/net/vxlan.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index 3d9bcc9..e0787286 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -2487,9 +2487,11 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev, vni = tunnel_id_to_key32(info->key.tun_id); ifindex = 0; dst_cache = &info->dst_cache; - if (info->options_len && - info->key.tun_flags & TUNNEL_VXLAN_OPT) + if (info->key.tun_flags & TUNNEL_VXLAN_OPT) { + if (info->options_len < sizeof(*md)) + goto drop; md = ip_tunnel_info_opts(info); + } ttl = info->key.ttl; tos = info->key.tos; label = info->key.label; -- 2.1.0
The check for !md doens't really work for ip_tunnel_info_opts(info) which only does info + 1. Also to avoid out-of-bounds access on info, it should ensure options_len is not less than erspan_metadata in both erspan_xmit() and ip6erspan_tunnel_xmit(). Signed-off-by: Xin Long <lucien.xin@gmail.com> --- net/ipv4/ip_gre.c | 4 ++-- net/ipv6/ip6_gre.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c index 52690bb..b5e1f5e 100644 --- a/net/ipv4/ip_gre.c +++ b/net/ipv4/ip_gre.c @@ -509,9 +509,9 @@ static void erspan_fb_xmit(struct sk_buff *skb, struct net_device *dev) key = &tun_info->key; if (!(tun_info->key.tun_flags & TUNNEL_ERSPAN_OPT)) goto err_free_skb; - md = ip_tunnel_info_opts(tun_info); - if (!md) + if (sizeof(*md) > tun_info->options_len) goto err_free_skb; + md = ip_tunnel_info_opts(tun_info); /* ERSPAN has fixed 8 byte GRE header */ version = md->version; diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c index d5779d6..116987d 100644 --- a/net/ipv6/ip6_gre.c +++ b/net/ipv6/ip6_gre.c @@ -980,9 +980,9 @@ static netdev_tx_t ip6erspan_tunnel_xmit(struct sk_buff *skb, dsfield = key->tos; if (!(tun_info->key.tun_flags & TUNNEL_ERSPAN_OPT)) goto tx_err; - md = ip_tunnel_info_opts(tun_info); - if (!md) + if (sizeof(*md) > tun_info->options_len) goto tx_err; + md = ip_tunnel_info_opts(tun_info); tun_id = tunnel_id_to_key32(key->tun_id); if (md->version == 1) { -- 2.1.0
Now when a skb comes with ip_tun_info but with no TUNNEL_ERSPAN_OPT to a md erspan device, it will be dropped. This patch is to allow this skb to go through this erspan device, and the options (version, index, hwid, dir, etc.) will be filled with tunnel's params, which can be configured by users. This can be verified by: # ip net a a; ip net a b # ip -n a l a eth0 type veth peer name eth0 netns b # ip -n a l s eth0 up; ip -n b link set eth0 up # ip -n a a a 10.1.0.1/24 dev eth0; ip -n b a a 10.1.0.2/24 dev eth0 # ip -n b l a erspan1 type erspan key 1 seq local 10.1.0.2 remote 10.1.0.1 # ip -n b a a 1.1.1.1/24 dev erspan1; ip -n b l s erspan1 up # ip -n b r a 2.1.1.0/24 dev erspan1 # ip -n a l a erspan1 type erspan key 1 seq local 10.1.0.1 external # ip -n a a a 2.1.1.1/24 dev erspan1; ip -n a l s erspan1 up # ip -n a r a 1.1.1.0/24 encap ip id 1 dst 10.1.0.2 dev erspan1 # ip net exec a ping 1.1.1.1 -c 1 Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> --- net/ipv4/ip_gre.c | 31 +++++++++++++------------------ net/ipv6/ip6_gre.c | 35 +++++++++++++++++++---------------- 2 files changed, 32 insertions(+), 34 deletions(-) diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c index b5e1f5e..84a445d 100644 --- a/net/ipv4/ip_gre.c +++ b/net/ipv4/ip_gre.c @@ -491,15 +491,12 @@ static void gre_fb_xmit(struct sk_buff *skb, struct net_device *dev, static void erspan_fb_xmit(struct sk_buff *skb, struct net_device *dev) { struct ip_tunnel *tunnel = netdev_priv(dev); + struct erspan_metadata *md = NULL; struct ip_tunnel_info *tun_info; const struct ip_tunnel_key *key; - struct erspan_metadata *md; + int version, nhoff, thoff; bool truncate = false; __be16 proto; - int tunnel_hlen; - int version; - int nhoff; - int thoff; tun_info = skb_tunnel_info(skb); if (unlikely(!tun_info || !(tun_info->mode & IP_TUNNEL_INFO_TX) || @@ -507,15 +504,11 @@ static void erspan_fb_xmit(struct sk_buff *skb, struct net_device *dev) goto err_free_skb; key = &tun_info->key; - if (!(tun_info->key.tun_flags & TUNNEL_ERSPAN_OPT)) - goto err_free_skb; - if (sizeof(*md) > tun_info->options_len) - goto err_free_skb; - md = ip_tunnel_info_opts(tun_info); - - /* ERSPAN has fixed 8 byte GRE header */ - version = md->version; - tunnel_hlen = 8 + erspan_hdr_len(version); + if (key->tun_flags & TUNNEL_ERSPAN_OPT) { + if (tun_info->options_len < sizeof(*md)) + goto err_free_skb; + md = ip_tunnel_info_opts(tun_info); + } if (skb_cow_head(skb, dev->needed_headroom)) goto err_free_skb; @@ -538,15 +531,17 @@ static void erspan_fb_xmit(struct sk_buff *skb, struct net_device *dev) (ntohs(ipv6_hdr(skb)->payload_len) > skb->len - thoff)) truncate = true; + version = md ? md->version : tunnel->erspan_ver; if (version == 1) { erspan_build_header(skb, ntohl(tunnel_id_to_key32(key->tun_id)), - ntohl(md->u.index), truncate, true); + md ? ntohl(md->u.index) : tunnel->index, + truncate, true); proto = htons(ETH_P_ERSPAN); } else if (version == 2) { erspan_build_header_v2(skb, ntohl(tunnel_id_to_key32(key->tun_id)), - md->u.md2.dir, - get_hwid(&md->u.md2), + md ? md->u.md2.dir : tunnel->dir, + md ? get_hwid(&md->u.md2) : tunnel->hwid, truncate, true); proto = htons(ETH_P_ERSPAN2); } else { @@ -556,7 +551,7 @@ static void erspan_fb_xmit(struct sk_buff *skb, struct net_device *dev) gre_build_header(skb, 8, TUNNEL_SEQ, proto, 0, htonl(tunnel->o_seqno++)); - ip_md_tunnel_xmit(skb, dev, IPPROTO_GRE, tunnel_hlen); + ip_md_tunnel_xmit(skb, dev, IPPROTO_GRE, 8 + erspan_hdr_len(version)); return; diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c index 116987d..3b7d213 100644 --- a/net/ipv6/ip6_gre.c +++ b/net/ipv6/ip6_gre.c @@ -959,10 +959,11 @@ static netdev_tx_t ip6erspan_tunnel_xmit(struct sk_buff *skb, * for native mode, call prepare_ip6gre_xmit_{ipv4,ipv6}. */ if (t->parms.collect_md) { + struct erspan_metadata *md = NULL; struct ip_tunnel_info *tun_info; const struct ip_tunnel_key *key; - struct erspan_metadata *md; __be32 tun_id; + int version; tun_info = skb_tunnel_info(skb); if (unlikely(!tun_info || @@ -978,23 +979,25 @@ static netdev_tx_t ip6erspan_tunnel_xmit(struct sk_buff *skb, fl6.flowi6_uid = sock_net_uid(dev_net(dev), NULL); dsfield = key->tos; - if (!(tun_info->key.tun_flags & TUNNEL_ERSPAN_OPT)) - goto tx_err; - if (sizeof(*md) > tun_info->options_len) - goto tx_err; - md = ip_tunnel_info_opts(tun_info); + if (key->tun_flags & TUNNEL_ERSPAN_OPT) { + if (tun_info->options_len < sizeof(*md)) + goto tx_err; + md = ip_tunnel_info_opts(tun_info); + } tun_id = tunnel_id_to_key32(key->tun_id); - if (md->version == 1) { - erspan_build_header(skb, - ntohl(tun_id), - ntohl(md->u.index), truncate, - false); - } else if (md->version == 2) { - erspan_build_header_v2(skb, - ntohl(tun_id), - md->u.md2.dir, - get_hwid(&md->u.md2), + version = md ? md->version : t->parms.erspan_ver; + if (version == 1) { + erspan_build_header(skb, ntohl(tun_id), + md ? ntohl(md->u.index) + : t->parms.index, + truncate, false); + } else if (version == 2) { + erspan_build_header_v2(skb, ntohl(tun_id), + md ? md->u.md2.dir + : t->parms.dir, + md ? get_hwid(&md->u.md2) + : t->parms.hwid, truncate, false); } else { goto tx_err; -- 2.1.0
On Tue, Oct 08, 2019 at 11:16:12PM +0800, Xin Long wrote: > This patch is to add LWTUNNEL_IP_OPTS into lwtunnel_ip_t, by which > users will be able to set options for ip_tunnel_info by "ip route > encap" for erspan and vxlan's private metadata. Like one way to go > in iproute is: > > # ip route add 1.1.1.0/24 encap ip id 1 erspan ver 1 idx 123 \ > dst 10.1.0.2 dev erspan1 > # ip route show > 1.1.1.0/24 encap ip id 1 src 0.0.0.0 dst 10.1.0.2 ttl 0 \ > tos 0 erspan ver 1 idx 123 dev erspan1 scope link > > Signed-off-by: Xin Long <lucien.xin@gmail.com> Hi Xin, thanks for your patch. While I have no objection to allowing options to be set, as per the sample ip commands above, I am concerned that the approach you have taken exposes to userspace the internal encoding used by the kernel for these options. This is the same concerned that was raised by others when I posed a patch to allow setting of Geneve options in a similar manner. I think what is called for here, as was the case in the Geneve work, is to expose netlink attributes for each option that may be set and have the kernel form these into the internal format (which appears to also be the wire format). > --- > include/uapi/linux/lwtunnel.h | 1 + > net/ipv4/ip_tunnel_core.c | 30 ++++++++++++++++++++++++------ > 2 files changed, 25 insertions(+), 6 deletions(-) > > diff --git a/include/uapi/linux/lwtunnel.h b/include/uapi/linux/lwtunnel.h > index de696ca..93f2c05 100644 > --- a/include/uapi/linux/lwtunnel.h > +++ b/include/uapi/linux/lwtunnel.h > @@ -27,6 +27,7 @@ enum lwtunnel_ip_t { > LWTUNNEL_IP_TOS, > LWTUNNEL_IP_FLAGS, > LWTUNNEL_IP_PAD, > + LWTUNNEL_IP_OPTS, > __LWTUNNEL_IP_MAX, > }; > > diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c > index 10f0848..d9b7188 100644 > --- a/net/ipv4/ip_tunnel_core.c > +++ b/net/ipv4/ip_tunnel_core.c > @@ -218,6 +218,7 @@ static const struct nla_policy ip_tun_policy[LWTUNNEL_IP_MAX + 1] = { > [LWTUNNEL_IP_TTL] = { .type = NLA_U8 }, > [LWTUNNEL_IP_TOS] = { .type = NLA_U8 }, > [LWTUNNEL_IP_FLAGS] = { .type = NLA_U16 }, > + [LWTUNNEL_IP_OPTS] = { .type = NLA_BINARY }, > }; ...
On Wed, 9 Oct 2019 09:55:27 +0200, Simon Horman wrote:
> This is the same concerned that was raised by others when I posed a patch
> to allow setting of Geneve options in a similar manner. I think what is
> called for here, as was the case in the Geneve work, is to expose netlink
> attributes for each option that may be set and have the kernel form
> these into the internal format (which appears to also be the wire format).
I agree with Simon.
Thanks,
Jiri
On Wed, Oct 9, 2019 at 3:55 PM Simon Horman <simon.horman@netronome.com> wrote: > > On Tue, Oct 08, 2019 at 11:16:12PM +0800, Xin Long wrote: > > This patch is to add LWTUNNEL_IP_OPTS into lwtunnel_ip_t, by which > > users will be able to set options for ip_tunnel_info by "ip route > > encap" for erspan and vxlan's private metadata. Like one way to go > > in iproute is: > > > > # ip route add 1.1.1.0/24 encap ip id 1 erspan ver 1 idx 123 \ > > dst 10.1.0.2 dev erspan1 > > # ip route show > > 1.1.1.0/24 encap ip id 1 src 0.0.0.0 dst 10.1.0.2 ttl 0 \ > > tos 0 erspan ver 1 idx 123 dev erspan1 scope link > > > > Signed-off-by: Xin Long <lucien.xin@gmail.com> > > Hi Xin, > > thanks for your patch. > > While I have no objection to allowing options to be set, as per the sample > ip commands above, I am concerned that the approach you have taken exposes > to userspace the internal encoding used by the kernel for these options. > > This is the same concerned that was raised by others when I posed a patch > to allow setting of Geneve options in a similar manner. I think what is > called for here, as was the case in the Geneve work, is to expose netlink > attributes for each option that may be set and have the kernel form > these into the internal format (which appears to also be the wire format). Understand. Do you think if it's necessary to support for setting these options by ip route command? or if yes, should we introduce a global lwtun_option_ops_list where geneve/vxlan/erspan could register their own option parsing functions? > > > --- > > include/uapi/linux/lwtunnel.h | 1 + > > net/ipv4/ip_tunnel_core.c | 30 ++++++++++++++++++++++++------ > > 2 files changed, 25 insertions(+), 6 deletions(-) > > > > diff --git a/include/uapi/linux/lwtunnel.h b/include/uapi/linux/lwtunnel.h > > index de696ca..93f2c05 100644 > > --- a/include/uapi/linux/lwtunnel.h > > +++ b/include/uapi/linux/lwtunnel.h > > @@ -27,6 +27,7 @@ enum lwtunnel_ip_t { > > LWTUNNEL_IP_TOS, > > LWTUNNEL_IP_FLAGS, > > LWTUNNEL_IP_PAD, > > + LWTUNNEL_IP_OPTS, > > __LWTUNNEL_IP_MAX, > > }; > > > > diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c > > index 10f0848..d9b7188 100644 > > --- a/net/ipv4/ip_tunnel_core.c > > +++ b/net/ipv4/ip_tunnel_core.c > > @@ -218,6 +218,7 @@ static const struct nla_policy ip_tun_policy[LWTUNNEL_IP_MAX + 1] = { > > [LWTUNNEL_IP_TTL] = { .type = NLA_U8 }, > > [LWTUNNEL_IP_TOS] = { .type = NLA_U8 }, > > [LWTUNNEL_IP_FLAGS] = { .type = NLA_U16 }, > > + [LWTUNNEL_IP_OPTS] = { .type = NLA_BINARY }, > > }; > > ...
On Thu, Oct 10, 2019 at 05:45:14PM +0800, Xin Long wrote:
> On Wed, Oct 9, 2019 at 3:55 PM Simon Horman <simon.horman@netronome.com> wrote:
> >
> > On Tue, Oct 08, 2019 at 11:16:12PM +0800, Xin Long wrote:
> > > This patch is to add LWTUNNEL_IP_OPTS into lwtunnel_ip_t, by which
> > > users will be able to set options for ip_tunnel_info by "ip route
> > > encap" for erspan and vxlan's private metadata. Like one way to go
> > > in iproute is:
> > >
> > > # ip route add 1.1.1.0/24 encap ip id 1 erspan ver 1 idx 123 \
> > > dst 10.1.0.2 dev erspan1
> > > # ip route show
> > > 1.1.1.0/24 encap ip id 1 src 0.0.0.0 dst 10.1.0.2 ttl 0 \
> > > tos 0 erspan ver 1 idx 123 dev erspan1 scope link
> > >
> > > Signed-off-by: Xin Long <lucien.xin@gmail.com>
> >
> > Hi Xin,
> >
> > thanks for your patch.
> >
> > While I have no objection to allowing options to be set, as per the sample
> > ip commands above, I am concerned that the approach you have taken exposes
> > to userspace the internal encoding used by the kernel for these options.
> >
> > This is the same concerned that was raised by others when I posed a patch
> > to allow setting of Geneve options in a similar manner. I think what is
> > called for here, as was the case in the Geneve work, is to expose netlink
> > attributes for each option that may be set and have the kernel form
> > these into the internal format (which appears to also be the wire format).
>
> Understand.
>
> Do you think if it's necessary to support for setting these options
> by ip route command?
>
> or if yes, should we introduce a global lwtun_option_ops_list where
> geneve/vxlan/erspan could register their own option parsing functions?
Hi Xin,
In the case of Geneve the options are (now) exposed via tc cls_flower and
act_tunnel rather than ip route. The approach taken was to create an
ENC_OPTS attribute with a ENC_OPTS_GENEVE sub-attribute which in turn has
sub-attributes that allow options to be described.
The idea behind this design, as I recall, was to allow other options, say
for VXLAN, to be added via a new sub-attribute of ENC_OPTS. Without
examining the details beyond your patch I think a similar approach would
work well for options supplied via ip route.
...
On Fri, Oct 11, 2019 at 1:31 PM Simon Horman <simon.horman@netronome.com> wrote:
>
> On Thu, Oct 10, 2019 at 05:45:14PM +0800, Xin Long wrote:
> > On Wed, Oct 9, 2019 at 3:55 PM Simon Horman <simon.horman@netronome.com> wrote:
> > >
> > > On Tue, Oct 08, 2019 at 11:16:12PM +0800, Xin Long wrote:
> > > > This patch is to add LWTUNNEL_IP_OPTS into lwtunnel_ip_t, by which
> > > > users will be able to set options for ip_tunnel_info by "ip route
> > > > encap" for erspan and vxlan's private metadata. Like one way to go
> > > > in iproute is:
> > > >
> > > > # ip route add 1.1.1.0/24 encap ip id 1 erspan ver 1 idx 123 \
> > > > dst 10.1.0.2 dev erspan1
> > > > # ip route show
> > > > 1.1.1.0/24 encap ip id 1 src 0.0.0.0 dst 10.1.0.2 ttl 0 \
> > > > tos 0 erspan ver 1 idx 123 dev erspan1 scope link
> > > >
> > > > Signed-off-by: Xin Long <lucien.xin@gmail.com>
> > >
> > > Hi Xin,
> > >
> > > thanks for your patch.
> > >
> > > While I have no objection to allowing options to be set, as per the sample
> > > ip commands above, I am concerned that the approach you have taken exposes
> > > to userspace the internal encoding used by the kernel for these options.
> > >
> > > This is the same concerned that was raised by others when I posed a patch
> > > to allow setting of Geneve options in a similar manner. I think what is
> > > called for here, as was the case in the Geneve work, is to expose netlink
> > > attributes for each option that may be set and have the kernel form
> > > these into the internal format (which appears to also be the wire format).
> >
> > Understand.
> >
> > Do you think if it's necessary to support for setting these options
> > by ip route command?
> >
> > or if yes, should we introduce a global lwtun_option_ops_list where
> > geneve/vxlan/erspan could register their own option parsing functions?
>
> Hi Xin,
>
> In the case of Geneve the options are (now) exposed via tc cls_flower and
> act_tunnel rather than ip route. The approach taken was to create an
> ENC_OPTS attribute with a ENC_OPTS_GENEVE sub-attribute which in turn has
> sub-attributes that allow options to be described.
>
> The idea behind this design, as I recall, was to allow other options, say
> for VXLAN, to be added via a new sub-attribute of ENC_OPTS. Without
> examining the details beyond your patch I think a similar approach would
> work well for options supplied via ip route.
Thanks, I will give it a try.