* [PATCH net-next V7 1/4] net/ip_tunnels: Introduce tunnel_id_to_key32() and key32_to_tunnel_id()
2016-09-08 13:23 [PATCH net-next V7 0/4] net/sched: ip tunnel metadata set/release/classify by using TC Hadar Hen Zion
@ 2016-09-08 13:23 ` Hadar Hen Zion
2016-09-08 13:23 ` [PATCH net-next V7 2/4] net/dst: Utility functions to build dst_metadata without supplying an skb Hadar Hen Zion
` (3 subsequent siblings)
4 siblings, 0 replies; 11+ messages in thread
From: Hadar Hen Zion @ 2016-09-08 13:23 UTC
To: David S. Miller
Cc: netdev, Jiri Pirko, Jiri Benc, Jamal Hadi Salim, Shmulik Ladkani,
Tom Herbert, Eric Dumazet, Cong Wang, Amir Vadai, Or Gerlitz,
Amir Vadai, Hadar Hen Zion
From: Amir Vadai <amir@vadai.me>
Add utility functions to convert a 32-bit key into a 64-bit tunnel ID and
vice versa.
These functions replace the duplicated code in GRE and VXLAN, and will
also be used by the tc tunnel_key action (act_tunnel_key), which is
introduced in a following patch in this patchset.
Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
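Illustration only, not part of the patch: a minimal userspace sketch of the
conversion described above, assuming plain uint32_t/uint64_t in place of the
kernel's __be32/__be64 annotated types and the compiler-provided
__BYTE_ORDER__ macro in place of __BIG_ENDIAN. The helper tunnel_id_bytes()
is hypothetical and exists only to make the memory layout visible.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-in: widen a network-order 32-bit key so that it ends up
 * in the last four bytes (in memory) of a network-order 64-bit tunnel id.
 * __BYTE_ORDER__ is a GCC/Clang predefined macro (an assumption here). */
static uint64_t key32_to_tunnel_id(uint32_t key_be)
{
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	return (uint64_t)key_be;
#else
	return (uint64_t)key_be << 32;
#endif
}

/* Inverse: take the last four in-memory bytes of the 64-bit tunnel id. */
static uint32_t tunnel_id_to_key32(uint64_t tun_id_be)
{
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	return (uint32_t)tun_id_be;
#else
	return (uint32_t)(tun_id_be >> 32);
#endif
}

/* Hypothetical helper: copy out the in-memory bytes of the tunnel id that
 * corresponds to a host-order key, so the layout can be inspected. */
static void tunnel_id_bytes(uint32_t host_key, unsigned char out[8])
{
	uint64_t tun_id = key32_to_tunnel_id(htonl(host_key));

	memcpy(out, &tun_id, sizeof(tun_id));
}
```

With key 11, the eight bytes in memory should read 00 00 00 00 00 00 00 0b on
either endianness, which is why the shift is needed only on little-endian
hosts.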
drivers/net/vxlan.c | 4 ++--
include/net/ip_tunnels.h | 19 +++++++++++++++++++
include/net/vxlan.h | 18 ------------------
net/ipv4/ip_gre.c | 23 ++---------------------
4 files changed, 23 insertions(+), 41 deletions(-)
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 199dec0..4bfeb97 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1291,7 +1291,7 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
struct metadata_dst *tun_dst;
tun_dst = udp_tun_rx_dst(skb, vxlan_get_sk_family(vs), TUNNEL_KEY,
- vxlan_vni_to_tun_id(vni), sizeof(*md));
+ key32_to_tunnel_id(vni), sizeof(*md));
if (!tun_dst)
goto drop;
@@ -1945,7 +1945,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
goto drop;
}
dst_port = info->key.tp_dst ? : vxlan->cfg.dst_port;
- vni = vxlan_tun_id_to_vni(info->key.tun_id);
+ vni = tunnel_id_to_key32(info->key.tun_id);
remote_ip.sa.sa_family = ip_tunnel_info_af(info);
if (remote_ip.sa.sa_family == AF_INET) {
remote_ip.sin.sin_addr.s_addr = info->key.u.ipv4.dst;
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index a5e7035..e598c63 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -222,6 +222,25 @@ static inline unsigned short ip_tunnel_info_af(const struct ip_tunnel_info
return tun_info->mode & IP_TUNNEL_INFO_IPV6 ? AF_INET6 : AF_INET;
}
+static inline __be64 key32_to_tunnel_id(__be32 key)
+{
+#ifdef __BIG_ENDIAN
+ return (__force __be64)key;
+#else
+ return (__force __be64)((__force u64)key << 32);
+#endif
+}
+
+/* Returns the least-significant 32 bits of a __be64. */
+static inline __be32 tunnel_id_to_key32(__be64 tun_id)
+{
+#ifdef __BIG_ENDIAN
+ return (__force __be32)tun_id;
+#else
+ return (__force __be32)((__force u64)tun_id >> 32);
+#endif
+}
+
#ifdef CONFIG_INET
int ip_tunnel_init(struct net_device *dev);
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index b96d036..0255613 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -350,24 +350,6 @@ static inline __be32 vxlan_vni_field(__be32 vni)
#endif
}
-static inline __be32 vxlan_tun_id_to_vni(__be64 tun_id)
-{
-#if defined(__BIG_ENDIAN)
- return (__force __be32)tun_id;
-#else
- return (__force __be32)((__force u64)tun_id >> 32);
-#endif
-}
-
-static inline __be64 vxlan_vni_to_tun_id(__be32 vni)
-{
-#if defined(__BIG_ENDIAN)
- return (__force __be64)vni;
-#else
- return (__force __be64)((u64)(__force u32)vni << 32);
-#endif
-}
-
static inline size_t vxlan_rco_start(__be32 vni_field)
{
return be32_to_cpu(vni_field & VXLAN_RCO_MASK) << VXLAN_RCO_SHIFT;
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 113cc43..576f705 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -246,25 +246,6 @@ static void gre_err(struct sk_buff *skb, u32 info)
ipgre_err(skb, info, &tpi);
}
-static __be64 key_to_tunnel_id(__be32 key)
-{
-#ifdef __BIG_ENDIAN
- return (__force __be64)((__force u32)key);
-#else
- return (__force __be64)((__force u64)key << 32);
-#endif
-}
-
-/* Returns the least-significant 32 bits of a __be64. */
-static __be32 tunnel_id_to_key(__be64 x)
-{
-#ifdef __BIG_ENDIAN
- return (__force __be32)x;
-#else
- return (__force __be32)((__force u64)x >> 32);
-#endif
-}
-
static int __ipgre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi,
struct ip_tunnel_net *itn, int hdr_len, bool raw_proto)
{
@@ -290,7 +271,7 @@ static int __ipgre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi,
__be64 tun_id;
flags = tpi->flags & (TUNNEL_CSUM | TUNNEL_KEY);
- tun_id = key_to_tunnel_id(tpi->key);
+ tun_id = key32_to_tunnel_id(tpi->key);
tun_dst = ip_tun_rx_dst(skb, flags, tun_id, 0);
if (!tun_dst)
return PACKET_REJECT;
@@ -446,7 +427,7 @@ static void gre_fb_xmit(struct sk_buff *skb, struct net_device *dev,
flags = tun_info->key.tun_flags & (TUNNEL_CSUM | TUNNEL_KEY);
gre_build_header(skb, tunnel_hlen, flags, proto,
- tunnel_id_to_key(tun_info->key.tun_id), 0);
+ tunnel_id_to_key32(tun_info->key.tun_id), 0);
df = key->tun_flags & TUNNEL_DONT_FRAGMENT ? htons(IP_DF) : 0;
--
1.8.3.1
* [PATCH net-next V7 2/4] net/dst: Utility functions to build dst_metadata without supplying an skb
2016-09-08 13:23 [PATCH net-next V7 0/4] net/sched: ip tunnel metadata set/release/classify by using TC Hadar Hen Zion
2016-09-08 13:23 ` [PATCH net-next V7 1/4] net/ip_tunnels: Introduce tunnel_id_to_key32() and key32_to_tunnel_id() Hadar Hen Zion
@ 2016-09-08 13:23 ` Hadar Hen Zion
2016-09-08 13:23 ` [PATCH net-next V7 3/4] net/sched: cls_flower: Classify packet in ip tunnels Hadar Hen Zion
` (2 subsequent siblings)
4 siblings, 0 replies; 11+ messages in thread
From: Hadar Hen Zion @ 2016-09-08 13:23 UTC
To: David S. Miller
Cc: netdev, Jiri Pirko, Jiri Benc, Jamal Hadi Salim, Shmulik Ladkani,
Tom Herbert, Eric Dumazet, Cong Wang, Amir Vadai, Or Gerlitz,
Amir Vadai, Hadar Hen Zion
From: Amir Vadai <amir@vadai.me>
Extract __ip_tun_set_dst() and __ipv6_tun_set_dst() out of
ip_tun_rx_dst() and ipv6_tun_rx_dst(), to be used without supplying an
skb.
Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
---
include/net/dst_metadata.h | 52 ++++++++++++++++++++++++++++++++++------------
1 file changed, 39 insertions(+), 13 deletions(-)
diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
index 5db9f59..6965c8f 100644
--- a/include/net/dst_metadata.h
+++ b/include/net/dst_metadata.h
@@ -112,12 +112,13 @@ static inline struct ip_tunnel_info *skb_tunnel_info_unclone(struct sk_buff *skb
return &dst->u.tun_info;
}
-static inline struct metadata_dst *ip_tun_rx_dst(struct sk_buff *skb,
- __be16 flags,
- __be64 tunnel_id,
- int md_size)
+static inline struct metadata_dst *__ip_tun_set_dst(__be32 saddr,
+ __be32 daddr,
+ __u8 tos, __u8 ttl,
+ __be16 flags,
+ __be64 tunnel_id,
+ int md_size)
{
- const struct iphdr *iph = ip_hdr(skb);
struct metadata_dst *tun_dst;
tun_dst = tun_rx_dst(md_size);
@@ -125,17 +126,30 @@ static inline struct metadata_dst *ip_tun_rx_dst(struct sk_buff *skb,
return NULL;
ip_tunnel_key_init(&tun_dst->u.tun_info.key,
- iph->saddr, iph->daddr, iph->tos, iph->ttl,
+ saddr, daddr, tos, ttl,
0, 0, 0, tunnel_id, flags);
return tun_dst;
}
-static inline struct metadata_dst *ipv6_tun_rx_dst(struct sk_buff *skb,
+static inline struct metadata_dst *ip_tun_rx_dst(struct sk_buff *skb,
__be16 flags,
__be64 tunnel_id,
int md_size)
{
- const struct ipv6hdr *ip6h = ipv6_hdr(skb);
+ const struct iphdr *iph = ip_hdr(skb);
+
+ return __ip_tun_set_dst(iph->saddr, iph->daddr, iph->tos, iph->ttl,
+ flags, tunnel_id, md_size);
+}
+
+static inline struct metadata_dst *__ipv6_tun_set_dst(const struct in6_addr *saddr,
+ const struct in6_addr *daddr,
+ __u8 tos, __u8 ttl,
+ __be32 label,
+ __be16 flags,
+ __be64 tunnel_id,
+ int md_size)
+{
struct metadata_dst *tun_dst;
struct ip_tunnel_info *info;
@@ -150,14 +164,26 @@ static inline struct metadata_dst *ipv6_tun_rx_dst(struct sk_buff *skb,
info->key.tp_src = 0;
info->key.tp_dst = 0;
- info->key.u.ipv6.src = ip6h->saddr;
- info->key.u.ipv6.dst = ip6h->daddr;
+ info->key.u.ipv6.src = *saddr;
+ info->key.u.ipv6.dst = *daddr;
- info->key.tos = ipv6_get_dsfield(ip6h);
- info->key.ttl = ip6h->hop_limit;
- info->key.label = ip6_flowlabel(ip6h);
+ info->key.tos = tos;
+ info->key.ttl = ttl;
+ info->key.label = label;
return tun_dst;
}
+static inline struct metadata_dst *ipv6_tun_rx_dst(struct sk_buff *skb,
+ __be16 flags,
+ __be64 tunnel_id,
+ int md_size)
+{
+ const struct ipv6hdr *ip6h = ipv6_hdr(skb);
+
+ return __ipv6_tun_set_dst(&ip6h->saddr, &ip6h->daddr,
+ ipv6_get_dsfield(ip6h), ip6h->hop_limit,
+ ip6_flowlabel(ip6h), flags, tunnel_id,
+ md_size);
+}
#endif /* __NET_DST_METADATA_H */
--
1.8.3.1
* [PATCH net-next V7 3/4] net/sched: cls_flower: Classify packet in ip tunnels
2016-09-08 13:23 [PATCH net-next V7 0/4] net/sched: ip tunnel metadata set/release/classify by using TC Hadar Hen Zion
2016-09-08 13:23 ` [PATCH net-next V7 1/4] net/ip_tunnels: Introduce tunnel_id_to_key32() and key32_to_tunnel_id() Hadar Hen Zion
2016-09-08 13:23 ` [PATCH net-next V7 2/4] net/dst: Utility functions to build dst_metadata without supplying an skb Hadar Hen Zion
@ 2016-09-08 13:23 ` Hadar Hen Zion
2016-09-08 13:23 ` [PATCH net-next V7 4/4] net/sched: Introduce act_tunnel_key Hadar Hen Zion
2016-09-11 4:06 ` [PATCH net-next V7 0/4] net/sched: ip tunnel metadata set/release/classify by using TC David Miller
4 siblings, 0 replies; 11+ messages in thread
From: Hadar Hen Zion @ 2016-09-08 13:23 UTC
To: David S. Miller
Cc: netdev, Jiri Pirko, Jiri Benc, Jamal Hadi Salim, Shmulik Ladkani,
Tom Herbert, Eric Dumazet, Cong Wang, Amir Vadai, Or Gerlitz,
Amir Vadai, Hadar Hen Zion
From: Amir Vadai <amir@vadai.me>
Introduce classifying by metadata extracted by the tunnel device.
Outer header fields - source/dest ip and tunnel id, are extracted from
the metadata when classifying.
For example, the following adds a filter on the ingress Qdisc of the shared
vxlan device 'vxlan0' that matches packets with outer src ip 11.11.0.2,
dst ip 11.11.0.1 and tunnel id 11. Matching packets are forwarded to tap
device 'vnet0' (after the metadata is released):
$ tc filter add dev vxlan0 protocol ip parent ffff: \
flower \
enc_src_ip 11.11.0.2 \
enc_dst_ip 11.11.0.1 \
enc_key_id 11 \
dst_ip 11.11.11.1 \
action tunnel_key release \
action mirred egress redirect dev vnet0
The tunnel_key action is introduced in the next patch in this
series.
Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
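Illustration only, not part of the patch: a simplified userspace sketch of
what matching on the outer header fields means for the tc example above.
The struct enc_key and both helpers are hypothetical names; the real
classifier looks the masked key up in a hash table, so this linear
comparison only illustrates the mask semantics.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical, simplified stand-in for the tunnel part of the flow key.
 * All fields are kept in network byte order. */
struct enc_key {
	uint32_t src;    /* outer IPv4 source address */
	uint32_t dst;    /* outer IPv4 destination address */
	uint32_t key_id; /* tunnel id reduced to a 32-bit key */
};

/* Build a key from dotted-quad addresses and a host-order tunnel id. */
static struct enc_key make_enc_key(const char *src, const char *dst,
				   uint32_t id)
{
	struct enc_key k;

	memset(&k, 0, sizeof(k));
	inet_pton(AF_INET, src, &k.src);
	inet_pton(AF_INET, dst, &k.dst);
	k.key_id = htonl(id);
	return k;
}

/* A packet matches when every bit selected by the mask equals the
 * corresponding bit of the filter key. */
static bool enc_key_match(const struct enc_key *pkt,
			  const struct enc_key *filter,
			  const struct enc_key *mask)
{
	const unsigned char *p = (const unsigned char *)pkt;
	const unsigned char *f = (const unsigned char *)filter;
	const unsigned char *m = (const unsigned char *)mask;
	size_t i;

	for (i = 0; i < sizeof(*pkt); i++)
		if ((p[i] & m[i]) != (f[i] & m[i]))
			return false;
	return true;
}
```

With a filter built from 11.11.0.2, 11.11.0.1 and id 11 and an all-ones
mask, only packets carrying exactly those outer fields match; clearing a
field in the mask turns it into a wildcard.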
include/uapi/linux/pkt_cls.h | 11 +++++
net/sched/cls_flower.c | 100 ++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 110 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 51b5b24..f9c287c 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -431,6 +431,17 @@ enum {
TCA_FLOWER_KEY_VLAN_ID,
TCA_FLOWER_KEY_VLAN_PRIO,
TCA_FLOWER_KEY_VLAN_ETH_TYPE,
+
+ TCA_FLOWER_KEY_ENC_KEY_ID, /* be32 */
+ TCA_FLOWER_KEY_ENC_IPV4_SRC, /* be32 */
+ TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK,/* be32 */
+ TCA_FLOWER_KEY_ENC_IPV4_DST, /* be32 */
+ TCA_FLOWER_KEY_ENC_IPV4_DST_MASK,/* be32 */
+ TCA_FLOWER_KEY_ENC_IPV6_SRC, /* struct in6_addr */
+ TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK,/* struct in6_addr */
+ TCA_FLOWER_KEY_ENC_IPV6_DST, /* struct in6_addr */
+ TCA_FLOWER_KEY_ENC_IPV6_DST_MASK,/* struct in6_addr */
+
__TCA_FLOWER_MAX,
};
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index cf9ad5b..b084b2a 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -23,9 +23,13 @@
#include <net/ip.h>
#include <net/flow_dissector.h>
+#include <net/dst.h>
+#include <net/dst_metadata.h>
+
struct fl_flow_key {
int indev_ifindex;
struct flow_dissector_key_control control;
+ struct flow_dissector_key_control enc_control;
struct flow_dissector_key_basic basic;
struct flow_dissector_key_eth_addrs eth;
struct flow_dissector_key_vlan vlan;
@@ -35,6 +39,11 @@ struct fl_flow_key {
struct flow_dissector_key_ipv6_addrs ipv6;
};
struct flow_dissector_key_ports tp;
+ struct flow_dissector_key_keyid enc_key_id;
+ union {
+ struct flow_dissector_key_ipv4_addrs enc_ipv4;
+ struct flow_dissector_key_ipv6_addrs enc_ipv6;
+ };
} __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as longs. */
struct fl_flow_mask_range {
@@ -124,11 +133,31 @@ static int fl_classify(struct sk_buff *skb, const struct tcf_proto *tp,
struct cls_fl_filter *f;
struct fl_flow_key skb_key;
struct fl_flow_key skb_mkey;
+ struct ip_tunnel_info *info;
if (!atomic_read(&head->ht.nelems))
return -1;
fl_clear_masked_range(&skb_key, &head->mask);
+
+ info = skb_tunnel_info(skb);
+ if (info) {
+ struct ip_tunnel_key *key = &info->key;
+
+ switch (ip_tunnel_info_af(info)) {
+ case AF_INET:
+ skb_key.enc_ipv4.src = key->u.ipv4.src;
+ skb_key.enc_ipv4.dst = key->u.ipv4.dst;
+ break;
+ case AF_INET6:
+ skb_key.enc_ipv6.src = key->u.ipv6.src;
+ skb_key.enc_ipv6.dst = key->u.ipv6.dst;
+ break;
+ }
+
+ skb_key.enc_key_id.keyid = tunnel_id_to_key32(key->tun_id);
+ }
+
skb_key.indev_ifindex = skb->skb_iif;
/* skb_flow_dissect() does not set n_proto in case an unknown protocol,
* so do it rather here.
@@ -297,7 +326,15 @@ static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 1] = {
[TCA_FLOWER_KEY_VLAN_ID] = { .type = NLA_U16 },
[TCA_FLOWER_KEY_VLAN_PRIO] = { .type = NLA_U8 },
[TCA_FLOWER_KEY_VLAN_ETH_TYPE] = { .type = NLA_U16 },
-
+ [TCA_FLOWER_KEY_ENC_KEY_ID] = { .type = NLA_U32 },
+ [TCA_FLOWER_KEY_ENC_IPV4_SRC] = { .type = NLA_U32 },
+ [TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK] = { .type = NLA_U32 },
+ [TCA_FLOWER_KEY_ENC_IPV4_DST] = { .type = NLA_U32 },
+ [TCA_FLOWER_KEY_ENC_IPV4_DST_MASK] = { .type = NLA_U32 },
+ [TCA_FLOWER_KEY_ENC_IPV6_SRC] = { .len = sizeof(struct in6_addr) },
+ [TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK] = { .len = sizeof(struct in6_addr) },
+ [TCA_FLOWER_KEY_ENC_IPV6_DST] = { .len = sizeof(struct in6_addr) },
+ [TCA_FLOWER_KEY_ENC_IPV6_DST_MASK] = { .len = sizeof(struct in6_addr) },
};
static void fl_set_key_val(struct nlattr **tb,
@@ -409,6 +446,40 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
sizeof(key->tp.dst));
}
+ if (tb[TCA_FLOWER_KEY_ENC_IPV4_SRC] ||
+ tb[TCA_FLOWER_KEY_ENC_IPV4_DST]) {
+ key->enc_control.addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS;
+ fl_set_key_val(tb, &key->enc_ipv4.src,
+ TCA_FLOWER_KEY_ENC_IPV4_SRC,
+ &mask->enc_ipv4.src,
+ TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK,
+ sizeof(key->enc_ipv4.src));
+ fl_set_key_val(tb, &key->enc_ipv4.dst,
+ TCA_FLOWER_KEY_ENC_IPV4_DST,
+ &mask->enc_ipv4.dst,
+ TCA_FLOWER_KEY_ENC_IPV4_DST_MASK,
+ sizeof(key->enc_ipv4.dst));
+ }
+
+ if (tb[TCA_FLOWER_KEY_ENC_IPV6_SRC] ||
+ tb[TCA_FLOWER_KEY_ENC_IPV6_DST]) {
+ key->enc_control.addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
+ fl_set_key_val(tb, &key->enc_ipv6.src,
+ TCA_FLOWER_KEY_ENC_IPV6_SRC,
+ &mask->enc_ipv6.src,
+ TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK,
+ sizeof(key->enc_ipv6.src));
+ fl_set_key_val(tb, &key->enc_ipv6.dst,
+ TCA_FLOWER_KEY_ENC_IPV6_DST,
+ &mask->enc_ipv6.dst,
+ TCA_FLOWER_KEY_ENC_IPV6_DST_MASK,
+ sizeof(key->enc_ipv6.dst));
+ }
+
+ fl_set_key_val(tb, &key->enc_key_id.keyid, TCA_FLOWER_KEY_ENC_KEY_ID,
+ &mask->enc_key_id.keyid, TCA_FLOWER_KEY_ENC_KEY_ID,
+ sizeof(key->enc_key_id.keyid));
+
return 0;
}
@@ -821,6 +892,33 @@ static int fl_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
sizeof(key->tp.dst))))
goto nla_put_failure;
+ if (key->enc_control.addr_type == FLOW_DISSECTOR_KEY_IPV4_ADDRS &&
+ (fl_dump_key_val(skb, &key->enc_ipv4.src,
+ TCA_FLOWER_KEY_ENC_IPV4_SRC, &mask->enc_ipv4.src,
+ TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK,
+ sizeof(key->enc_ipv4.src)) ||
+ fl_dump_key_val(skb, &key->enc_ipv4.dst,
+ TCA_FLOWER_KEY_ENC_IPV4_DST, &mask->enc_ipv4.dst,
+ TCA_FLOWER_KEY_ENC_IPV4_DST_MASK,
+ sizeof(key->enc_ipv4.dst))))
+ goto nla_put_failure;
+ else if (key->enc_control.addr_type == FLOW_DISSECTOR_KEY_IPV6_ADDRS &&
+ (fl_dump_key_val(skb, &key->enc_ipv6.src,
+ TCA_FLOWER_KEY_ENC_IPV6_SRC, &mask->enc_ipv6.src,
+ TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK,
+ sizeof(key->enc_ipv6.src)) ||
+ fl_dump_key_val(skb, &key->enc_ipv6.dst,
+ TCA_FLOWER_KEY_ENC_IPV6_DST,
+ &mask->enc_ipv6.dst,
+ TCA_FLOWER_KEY_ENC_IPV6_DST_MASK,
+ sizeof(key->enc_ipv6.dst))))
+ goto nla_put_failure;
+
+ if (fl_dump_key_val(skb, &key->enc_key_id, TCA_FLOWER_KEY_ENC_KEY_ID,
+ &mask->enc_key_id, TCA_FLOWER_KEY_ENC_KEY_ID,
+ sizeof(key->enc_key_id)))
+ goto nla_put_failure;
+
nla_put_u32(skb, TCA_FLOWER_FLAGS, f->flags);
if (tcf_exts_dump(skb, &f->exts))
--
1.8.3.1
* [PATCH net-next V7 4/4] net/sched: Introduce act_tunnel_key
2016-09-08 13:23 [PATCH net-next V7 0/4] net/sched: ip tunnel metadata set/release/classify by using TC Hadar Hen Zion
` (2 preceding siblings ...)
2016-09-08 13:23 ` [PATCH net-next V7 3/4] net/sched: cls_flower: Classify packet in ip tunnels Hadar Hen Zion
@ 2016-09-08 13:23 ` Hadar Hen Zion
2016-09-08 14:19 ` Eric Dumazet
2016-09-08 16:15 ` John Fastabend
2016-09-11 4:06 ` [PATCH net-next V7 0/4] net/sched: ip tunnel metadata set/release/classify by using TC David Miller
4 siblings, 2 replies; 11+ messages in thread
From: Hadar Hen Zion @ 2016-09-08 13:23 UTC
To: David S. Miller
Cc: netdev, Jiri Pirko, Jiri Benc, Jamal Hadi Salim, Shmulik Ladkani,
Tom Herbert, Eric Dumazet, Cong Wang, Amir Vadai, Or Gerlitz,
Amir Vadai, Hadar Hen Zion
From: Amir Vadai <amir@vadai.me>
This action can be used before redirecting packets to a shared tunnel
device, or when redirecting packets arriving from such a device.
The action will release the metadata created by the tunnel device
(decap), or set the metadata with the specified values for encap
operation.
For example, the following flower filter will forward all ICMP packets
destined to 11.11.11.2 through the shared vxlan device 'vxlan0'. Before
redirecting, metadata for the vxlan tunnel is created using the
tunnel_key action and its arguments:
$ tc filter add dev net0 protocol ip parent ffff: \
flower \
ip_proto 1 \
dst_ip 11.11.11.2 \
action tunnel_key set \
src_ip 11.11.0.1 \
dst_ip 11.11.0.2 \
id 11 \
action mirred egress redirect dev vxlan0
Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
include/net/tc_act/tc_tunnel_key.h | 31 +++
include/uapi/linux/tc_act/tc_tunnel_key.h | 42 ++++
net/sched/Kconfig | 11 +
net/sched/Makefile | 1 +
net/sched/act_tunnel_key.c | 351 ++++++++++++++++++++++++++++++
5 files changed, 436 insertions(+)
create mode 100644 include/net/tc_act/tc_tunnel_key.h
create mode 100644 include/uapi/linux/tc_act/tc_tunnel_key.h
create mode 100644 net/sched/act_tunnel_key.c
diff --git a/include/net/tc_act/tc_tunnel_key.h b/include/net/tc_act/tc_tunnel_key.h
new file mode 100644
index 0000000..6fd2255
--- /dev/null
+++ b/include/net/tc_act/tc_tunnel_key.h
@@ -0,0 +1,31 @@
+/*
+ * Copyright (c) 2016, Amir Vadai <amir@vadai.me>
+ * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef __NET_TC_TUNNEL_KEY_H
+#define __NET_TC_TUNNEL_KEY_H
+
+#include <net/act_api.h>
+
+struct tcf_tunnel_key_params {
+ struct rcu_head rcu;
+ int tcft_action;
+ int action;
+ struct metadata_dst *tcft_enc_metadata;
+};
+
+struct tcf_tunnel_key {
+ struct tc_action common;
+ struct tcf_tunnel_key_params __rcu *params;
+};
+
+#define to_tunnel_key(a) ((struct tcf_tunnel_key *)a)
+
+#endif /* __NET_TC_TUNNEL_KEY_H */
+
diff --git a/include/uapi/linux/tc_act/tc_tunnel_key.h b/include/uapi/linux/tc_act/tc_tunnel_key.h
new file mode 100644
index 0000000..f9ddf53
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_tunnel_key.h
@@ -0,0 +1,42 @@
+/*
+ * Copyright (c) 2016, Amir Vadai <amir@vadai.me>
+ * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef __LINUX_TC_TUNNEL_KEY_H
+#define __LINUX_TC_TUNNEL_KEY_H
+
+#include <linux/pkt_cls.h>
+
+#define TCA_ACT_TUNNEL_KEY 17
+
+#define TCA_TUNNEL_KEY_ACT_SET 1
+#define TCA_TUNNEL_KEY_ACT_RELEASE 2
+
+struct tc_tunnel_key {
+ tc_gen;
+ int t_action;
+};
+
+enum {
+ TCA_TUNNEL_KEY_UNSPEC,
+ TCA_TUNNEL_KEY_TM,
+ TCA_TUNNEL_KEY_PARMS,
+ TCA_TUNNEL_KEY_ENC_IPV4_SRC, /* be32 */
+ TCA_TUNNEL_KEY_ENC_IPV4_DST, /* be32 */
+ TCA_TUNNEL_KEY_ENC_IPV6_SRC, /* struct in6_addr */
+ TCA_TUNNEL_KEY_ENC_IPV6_DST, /* struct in6_addr */
+ TCA_TUNNEL_KEY_ENC_KEY_ID, /* be32 */
+ TCA_TUNNEL_KEY_PAD,
+ __TCA_TUNNEL_KEY_MAX,
+};
+
+#define TCA_TUNNEL_KEY_MAX (__TCA_TUNNEL_KEY_MAX - 1)
+
+#endif
+
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index ccf931b..72e3426 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -761,6 +761,17 @@ config NET_ACT_IFE
To compile this code as a module, choose M here: the
module will be called act_ife.
+config NET_ACT_TUNNEL_KEY
+ tristate "IP tunnel metadata manipulation"
+ depends on NET_CLS_ACT
+ ---help---
+ Say Y here to set/release ip tunnel metadata.
+
+ If unsure, say N.
+
+ To compile this code as a module, choose M here: the
+ module will be called act_tunnel_key.
+
config NET_IFE_SKBMARK
tristate "Support to encoding decoding skb mark on IFE action"
depends on NET_ACT_IFE
diff --git a/net/sched/Makefile b/net/sched/Makefile
index ae088a5..b9d046b 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -22,6 +22,7 @@ obj-$(CONFIG_NET_ACT_CONNMARK) += act_connmark.o
obj-$(CONFIG_NET_ACT_IFE) += act_ife.o
obj-$(CONFIG_NET_IFE_SKBMARK) += act_meta_mark.o
obj-$(CONFIG_NET_IFE_SKBPRIO) += act_meta_skbprio.o
+obj-$(CONFIG_NET_ACT_TUNNEL_KEY)+= act_tunnel_key.o
obj-$(CONFIG_NET_SCH_FIFO) += sch_fifo.o
obj-$(CONFIG_NET_SCH_CBQ) += sch_cbq.o
obj-$(CONFIG_NET_SCH_HTB) += sch_htb.o
diff --git a/net/sched/act_tunnel_key.c b/net/sched/act_tunnel_key.c
new file mode 100644
index 0000000..dceff74
--- /dev/null
+++ b/net/sched/act_tunnel_key.c
@@ -0,0 +1,351 @@
+/*
+ * Copyright (c) 2016, Amir Vadai <amir@vadai.me>
+ * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/skbuff.h>
+#include <linux/rtnetlink.h>
+#include <net/netlink.h>
+#include <net/pkt_sched.h>
+#include <net/dst.h>
+#include <net/dst_metadata.h>
+
+#include <linux/tc_act/tc_tunnel_key.h>
+#include <net/tc_act/tc_tunnel_key.h>
+
+#define TUNNEL_KEY_TAB_MASK 15
+
+static int tunnel_key_net_id;
+static struct tc_action_ops act_tunnel_key_ops;
+
+static int tunnel_key_act(struct sk_buff *skb, const struct tc_action *a,
+ struct tcf_result *res)
+{
+ struct tcf_tunnel_key *t = to_tunnel_key(a);
+ struct tcf_tunnel_key_params *params;
+ int action;
+
+ rcu_read_lock();
+
+ params = rcu_dereference(t->params);
+
+ tcf_lastuse_update(&t->tcf_tm);
+ bstats_cpu_update(this_cpu_ptr(t->common.cpu_bstats), skb);
+ action = params->action;
+
+ switch (params->tcft_action) {
+ case TCA_TUNNEL_KEY_ACT_RELEASE:
+ skb_dst_drop(skb);
+ break;
+ case TCA_TUNNEL_KEY_ACT_SET:
+ skb_dst_drop(skb);
+ skb_dst_set(skb, dst_clone(&params->tcft_enc_metadata->dst));
+ break;
+ default:
+ WARN_ONCE(1, "Bad tunnel_key action %d.\n",
+ params->tcft_action);
+ break;
+ }
+
+ rcu_read_unlock();
+
+ return action;
+}
+
+static const struct nla_policy tunnel_key_policy[TCA_TUNNEL_KEY_MAX + 1] = {
+ [TCA_TUNNEL_KEY_PARMS] = { .len = sizeof(struct tc_tunnel_key) },
+ [TCA_TUNNEL_KEY_ENC_IPV4_SRC] = { .type = NLA_U32 },
+ [TCA_TUNNEL_KEY_ENC_IPV4_DST] = { .type = NLA_U32 },
+ [TCA_TUNNEL_KEY_ENC_IPV6_SRC] = { .len = sizeof(struct in6_addr) },
+ [TCA_TUNNEL_KEY_ENC_IPV6_DST] = { .len = sizeof(struct in6_addr) },
+ [TCA_TUNNEL_KEY_ENC_KEY_ID] = { .type = NLA_U32 },
+};
+
+static int tunnel_key_init(struct net *net, struct nlattr *nla,
+ struct nlattr *est, struct tc_action **a,
+ int ovr, int bind)
+{
+ struct tc_action_net *tn = net_generic(net, tunnel_key_net_id);
+ struct nlattr *tb[TCA_TUNNEL_KEY_MAX + 1];
+ struct tcf_tunnel_key_params *params_old;
+ struct tcf_tunnel_key_params *params_new;
+ struct metadata_dst *metadata = NULL;
+ struct tc_tunnel_key *parm;
+ struct tcf_tunnel_key *t;
+ bool exists = false;
+ __be64 key_id;
+ int ret = 0;
+ int err;
+
+ if (!nla)
+ return -EINVAL;
+
+ err = nla_parse_nested(tb, TCA_TUNNEL_KEY_MAX, nla, tunnel_key_policy);
+ if (err < 0)
+ return err;
+
+ if (!tb[TCA_TUNNEL_KEY_PARMS])
+ return -EINVAL;
+
+ parm = nla_data(tb[TCA_TUNNEL_KEY_PARMS]);
+ exists = tcf_hash_check(tn, parm->index, a, bind);
+ if (exists && bind)
+ return 0;
+
+ switch (parm->t_action) {
+ case TCA_TUNNEL_KEY_ACT_RELEASE:
+ break;
+ case TCA_TUNNEL_KEY_ACT_SET:
+ if (!tb[TCA_TUNNEL_KEY_ENC_KEY_ID]) {
+ ret = -EINVAL;
+ goto err_out;
+ }
+
+ key_id = key32_to_tunnel_id(nla_get_be32(tb[TCA_TUNNEL_KEY_ENC_KEY_ID]));
+
+ if (tb[TCA_TUNNEL_KEY_ENC_IPV4_SRC] &&
+ tb[TCA_TUNNEL_KEY_ENC_IPV4_DST]) {
+ __be32 saddr;
+ __be32 daddr;
+
+ saddr = nla_get_in_addr(tb[TCA_TUNNEL_KEY_ENC_IPV4_SRC]);
+ daddr = nla_get_in_addr(tb[TCA_TUNNEL_KEY_ENC_IPV4_DST]);
+
+ metadata = __ip_tun_set_dst(saddr, daddr, 0, 0,
+ TUNNEL_KEY, key_id, 0);
+ } else if (tb[TCA_TUNNEL_KEY_ENC_IPV6_SRC] &&
+ tb[TCA_TUNNEL_KEY_ENC_IPV6_DST]) {
+ struct in6_addr saddr;
+ struct in6_addr daddr;
+
+ saddr = nla_get_in6_addr(tb[TCA_TUNNEL_KEY_ENC_IPV6_SRC]);
+ daddr = nla_get_in6_addr(tb[TCA_TUNNEL_KEY_ENC_IPV6_DST]);
+
+ metadata = __ipv6_tun_set_dst(&saddr, &daddr, 0, 0, 0,
+ TUNNEL_KEY, key_id, 0);
+ }
+
+ if (!metadata) {
+ ret = -EINVAL;
+ goto err_out;
+ }
+
+ metadata->u.tun_info.mode |= IP_TUNNEL_INFO_TX;
+ break;
+ default:
+ goto err_out;
+ }
+
+ if (!exists) {
+ ret = tcf_hash_create(tn, parm->index, est, a,
+ &act_tunnel_key_ops, bind, true);
+ if (ret)
+ return ret;
+
+ ret = ACT_P_CREATED;
+ } else {
+ tcf_hash_release(*a, bind);
+ if (!ovr)
+ return -EEXIST;
+ }
+
+ t = to_tunnel_key(*a);
+
+ ASSERT_RTNL();
+ params_new = kzalloc(sizeof(*params_new), GFP_KERNEL);
+ if (unlikely(!params_new)) {
+ if (ret == ACT_P_CREATED)
+ tcf_hash_release(*a, bind);
+ return -ENOMEM;
+ }
+
+ params_old = rtnl_dereference(t->params);
+
+ params_new->action = parm->action;
+ params_new->tcft_action = parm->t_action;
+ params_new->tcft_enc_metadata = metadata;
+
+ rcu_assign_pointer(t->params, params_new);
+
+ if (params_old)
+ kfree_rcu(params_old, rcu);
+
+ if (ret == ACT_P_CREATED)
+ tcf_hash_insert(tn, *a);
+
+ return ret;
+
+err_out:
+ if (exists)
+ tcf_hash_release(*a, bind);
+ return ret;
+}
+
+static void tunnel_key_release(struct tc_action *a, int bind)
+{
+ struct tcf_tunnel_key *t = to_tunnel_key(a);
+ struct tcf_tunnel_key_params *params;
+
+ rcu_read_lock();
+ params = rcu_dereference(t->params);
+
+ if (params->tcft_action == TCA_TUNNEL_KEY_ACT_SET)
+ dst_release(&params->tcft_enc_metadata->dst);
+
+ kfree_rcu(params, rcu);
+
+ rcu_read_unlock();
+}
+
+static int tunnel_key_dump_addresses(struct sk_buff *skb,
+ const struct ip_tunnel_info *info)
+{
+ unsigned short family = ip_tunnel_info_af(info);
+
+ if (family == AF_INET) {
+ __be32 saddr = info->key.u.ipv4.src;
+ __be32 daddr = info->key.u.ipv4.dst;
+
+ if (!nla_put_in_addr(skb, TCA_TUNNEL_KEY_ENC_IPV4_SRC, saddr) &&
+ !nla_put_in_addr(skb, TCA_TUNNEL_KEY_ENC_IPV4_DST, daddr))
+ return 0;
+ }
+
+ if (family == AF_INET6) {
+ const struct in6_addr *saddr6 = &info->key.u.ipv6.src;
+ const struct in6_addr *daddr6 = &info->key.u.ipv6.dst;
+
+ if (!nla_put_in6_addr(skb,
+ TCA_TUNNEL_KEY_ENC_IPV6_SRC, saddr6) &&
+ !nla_put_in6_addr(skb,
+ TCA_TUNNEL_KEY_ENC_IPV6_DST, daddr6))
+ return 0;
+ }
+
+ return -EINVAL;
+}
+
+static int tunnel_key_dump(struct sk_buff *skb, struct tc_action *a,
+ int bind, int ref)
+{
+ unsigned char *b = skb_tail_pointer(skb);
+ struct tcf_tunnel_key *t = to_tunnel_key(a);
+ struct tcf_tunnel_key_params *params;
+ struct tc_tunnel_key opt = {
+ .index = t->tcf_index,
+ .refcnt = t->tcf_refcnt - ref,
+ .bindcnt = t->tcf_bindcnt - bind,
+ };
+ struct tcf_t tm;
+ int ret = -1;
+
+ rcu_read_lock();
+ params = rcu_dereference(t->params);
+
+ opt.t_action = params->tcft_action;
+ opt.action = params->action;
+
+ if (nla_put(skb, TCA_TUNNEL_KEY_PARMS, sizeof(opt), &opt))
+ goto nla_put_failure;
+
+ if (params->tcft_action == TCA_TUNNEL_KEY_ACT_SET) {
+ struct ip_tunnel_key *key =
+ &params->tcft_enc_metadata->u.tun_info.key;
+ __be32 key_id = tunnel_id_to_key32(key->tun_id);
+
+ if (nla_put_be32(skb, TCA_TUNNEL_KEY_ENC_KEY_ID, key_id) ||
+ tunnel_key_dump_addresses(skb,
+ &params->tcft_enc_metadata->u.tun_info))
+ goto nla_put_failure;
+ }
+
+ tcf_tm_dump(&tm, &t->tcf_tm);
+ if (nla_put_64bit(skb, TCA_TUNNEL_KEY_TM, sizeof(tm),
+ &tm, TCA_TUNNEL_KEY_PAD))
+ goto nla_put_failure;
+
+ ret = skb->len;
+ goto out;
+
+nla_put_failure:
+ nlmsg_trim(skb, b);
+out:
+ rcu_read_unlock();
+
+ return ret;
+}
+
+static int tunnel_key_walker(struct net *net, struct sk_buff *skb,
+ struct netlink_callback *cb, int type,
+ const struct tc_action_ops *ops)
+{
+ struct tc_action_net *tn = net_generic(net, tunnel_key_net_id);
+
+ return tcf_generic_walker(tn, skb, cb, type, ops);
+}
+
+static int tunnel_key_search(struct net *net, struct tc_action **a, u32 index)
+{
+ struct tc_action_net *tn = net_generic(net, tunnel_key_net_id);
+
+ return tcf_hash_search(tn, a, index);
+}
+
+static struct tc_action_ops act_tunnel_key_ops = {
+ .kind = "tunnel_key",
+ .type = TCA_ACT_TUNNEL_KEY,
+ .owner = THIS_MODULE,
+ .act = tunnel_key_act,
+ .dump = tunnel_key_dump,
+ .init = tunnel_key_init,
+ .cleanup = tunnel_key_release,
+ .walk = tunnel_key_walker,
+ .lookup = tunnel_key_search,
+ .size = sizeof(struct tcf_tunnel_key),
+};
+
+static __net_init int tunnel_key_init_net(struct net *net)
+{
+ struct tc_action_net *tn = net_generic(net, tunnel_key_net_id);
+
+ return tc_action_net_init(tn, &act_tunnel_key_ops, TUNNEL_KEY_TAB_MASK);
+}
+
+static void __net_exit tunnel_key_exit_net(struct net *net)
+{
+ struct tc_action_net *tn = net_generic(net, tunnel_key_net_id);
+
+ tc_action_net_exit(tn);
+}
+
+static struct pernet_operations tunnel_key_net_ops = {
+ .init = tunnel_key_init_net,
+ .exit = tunnel_key_exit_net,
+ .id = &tunnel_key_net_id,
+ .size = sizeof(struct tc_action_net),
+};
+
+static int __init tunnel_key_init_module(void)
+{
+ return tcf_register_action(&act_tunnel_key_ops, &tunnel_key_net_ops);
+}
+
+static void __exit tunnel_key_cleanup_module(void)
+{
+ tcf_unregister_action(&act_tunnel_key_ops, &tunnel_key_net_ops);
+}
+
+module_init(tunnel_key_init_module);
+module_exit(tunnel_key_cleanup_module);
+
+MODULE_AUTHOR("Amir Vadai <amir@vadai.me>");
+MODULE_DESCRIPTION("ip tunnel manipulation actions");
+MODULE_LICENSE("GPL v2");
--
1.8.3.1
* Re: [PATCH net-next V7 4/4] net/sched: Introduce act_tunnel_key
2016-09-08 13:23 ` [PATCH net-next V7 4/4] net/sched: Introduce act_tunnel_key Hadar Hen Zion
@ 2016-09-08 14:19 ` Eric Dumazet
2016-09-08 16:15 ` John Fastabend
1 sibling, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2016-09-08 14:19 UTC (permalink / raw)
To: Hadar Hen Zion
Cc: David S. Miller, netdev, Jiri Pirko, Jiri Benc, Jamal Hadi Salim,
Shmulik Ladkani, Tom Herbert, Eric Dumazet, Cong Wang,
Amir Vadai, Or Gerlitz, Amir Vadai
On Thu, 2016-09-08 at 16:23 +0300, Hadar Hen Zion wrote:
> From: Amir Vadai <amir@vadai.me>
>
> This action could be used before redirecting packets to a shared tunnel
> device, or when redirecting packets arriving from such a device.
> +static void tunnel_key_release(struct tc_action *a, int bind)
> +{
> + struct tcf_tunnel_key *t = to_tunnel_key(a);
> + struct tcf_tunnel_key_params *params;
> +
> + rcu_read_lock();
> + params = rcu_dereference(t->params);
> +
> + if (params->tcft_action == TCA_TUNNEL_KEY_ACT_SET)
> + dst_release(&params->tcft_enc_metadata->dst);
> +
> + kfree_rcu(params, rcu);
> +
> + rcu_read_unlock();
> +}
Note that you own the action here, so technically speaking no writer
could possibly modify t->params while this function is running.
So you could use
params = rcu_dereference_protected(t->params, 1)
(I could not find a way to express the 'I own this action and am the
last user' for LOCKDEP sake so I used 1)
instead of
rcu_read_lock();
params = rcu_dereference(t->params);
rcu_read_unlock();
But this is a very minor detail, and this patch looks fine to me, thanks
a lot for your patience, Hadar.
Acked-by: Eric Dumazet <edumazet@google.com>
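The suggestion above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the patch itself: the macro and helpers below are simplified stand-ins for the kernel's rcu_dereference_protected(), dst_release() and kfree_rcu(), the structs are cut down to the fields used here, free_params() is a hypothetical name, and the function takes the tcf_tunnel_key directly instead of (struct tc_action *a, int bind).

```c
/* Userspace model only: simplified stand-ins, not the kernel definitions. */
#include <assert.h>
#include <stdlib.h>

/* Stand-in: the kernel macro hands the condition to lockdep; here we assert. */
#define rcu_dereference_protected(p, c) (assert(c), (p))

/* Model values for the two action types. */
enum { TCA_TUNNEL_KEY_ACT_SET = 1, TCA_TUNNEL_KEY_ACT_RELEASE = 2 };

struct dst_entry { int refcnt; };
struct metadata_dst { struct dst_entry dst; };

struct tcf_tunnel_key_params {
	int tcft_action;
	struct metadata_dst *tcft_enc_metadata;
};

struct tcf_tunnel_key {
	struct tcf_tunnel_key_params *params;
};

/* Stand-in for dst_release(): drop one reference. */
static void dst_release(struct dst_entry *dst)
{
	dst->refcnt--;
}

/* Hypothetical stand-in for kfree_rcu(): this model has no concurrent
 * readers, so the memory is freed immediately. */
static void free_params(struct tcf_tunnel_key_params *params)
{
	free(params);
}

static void tunnel_key_release(struct tcf_tunnel_key *t)
{
	struct tcf_tunnel_key_params *params;

	/* The caller owns the action and is its last user, so no writer can
	 * change t->params here; "1" states that no lock is required. */
	params = rcu_dereference_protected(t->params, 1);

	if (params->tcft_action == TCA_TUNNEL_KEY_ACT_SET)
		dst_release(&params->tcft_enc_metadata->dst);

	free_params(params);
}
```

Compared with the quoted version, the only change in the body is that the rcu_read_lock()/rcu_read_unlock() pair is gone and the dereference documents why that is safe.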
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net-next V7 4/4] net/sched: Introduce act_tunnel_key
2016-09-08 13:23 ` [PATCH net-next V7 4/4] net/sched: Introduce act_tunnel_key Hadar Hen Zion
2016-09-08 14:19 ` Eric Dumazet
@ 2016-09-08 16:15 ` John Fastabend
2016-09-09 5:30 ` Cong Wang
1 sibling, 1 reply; 11+ messages in thread
From: John Fastabend @ 2016-09-08 16:15 UTC (permalink / raw)
To: Hadar Hen Zion, David S. Miller
Cc: netdev, Jiri Pirko, Jiri Benc, Jamal Hadi Salim, Shmulik Ladkani,
Tom Herbert, Eric Dumazet, Cong Wang, Amir Vadai, Or Gerlitz,
Amir Vadai
On 16-09-08 06:23 AM, Hadar Hen Zion wrote:
> From: Amir Vadai <amir@vadai.me>
>
> This action could be used before redirecting packets to a shared tunnel
> device, or when redirecting packets arriving from such a device.
>
> The action will release the metadata created by the tunnel device
> (decap), or set the metadata with the specified values for encap
> operation.
>
> For example, the following flower filter will forward all ICMP packets
> destined to 11.11.11.2 through the shared vxlan device 'vxlan0'. Before
> redirecting, a metadata for the vxlan tunnel is created using the
> tunnel_key action and its arguments:
>
> $ tc filter add dev net0 protocol ip parent ffff: \
> flower \
> ip_proto 1 \
> dst_ip 11.11.11.2 \
> action tunnel_key set \
> src_ip 11.11.0.1 \
> dst_ip 11.11.0.2 \
> id 11 \
> action mirred egress redirect dev vxlan0
>
> Signed-off-by: Amir Vadai <amir@vadai.me>
> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
> Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> ---
[...]
> +static void tunnel_key_release(struct tc_action *a, int bind)
> +{
> + struct tcf_tunnel_key *t = to_tunnel_key(a);
> + struct tcf_tunnel_key_params *params;
> +
> + rcu_read_lock();
> + params = rcu_dereference(t->params);
> +
> + if (params->tcft_action == TCA_TUNNEL_KEY_ACT_SET)
> + dst_release(&params->tcft_enc_metadata->dst);
> +
> + kfree_rcu(params, rcu);
> +
> + rcu_read_unlock();
> +}
> +
Same comment as Eric, you better own the action or else this could
race.
> +
> +static int tunnel_key_dump(struct sk_buff *skb, struct tc_action *a,
> + int bind, int ref)
> +{
> + unsigned char *b = skb_tail_pointer(skb);
> + struct tcf_tunnel_key *t = to_tunnel_key(a);
> + struct tcf_tunnel_key_params *params;
> + struct tc_tunnel_key opt = {
> + .index = t->tcf_index,
> + .refcnt = t->tcf_refcnt - ref,
> + .bindcnt = t->tcf_bindcnt - bind,
> + };
> + struct tcf_t tm;
> + int ret = -1;
> +
> + rcu_read_lock();
> + params = rcu_dereference(t->params);
This should be rtnl_dereference(t->params) and drop the read_lock/unlock
pair. This is always called with RTNL lock unless you have a path I'm
not seeing.
> +
> + opt.t_action = params->tcft_action;
> + opt.action = params->action;
> +
> + if (nla_put(skb, TCA_TUNNEL_KEY_PARMS, sizeof(opt), &opt))
> + goto nla_put_failure;
> +
> + if (params->tcft_action == TCA_TUNNEL_KEY_ACT_SET) {
> + struct ip_tunnel_key *key =
> + &params->tcft_enc_metadata->u.tun_info.key;
> + __be32 key_id = tunnel_id_to_key32(key->tun_id);
> +
> + if (nla_put_be32(skb, TCA_TUNNEL_KEY_ENC_KEY_ID, key_id) ||
> + tunnel_key_dump_addresses(skb,
> + &params->tcft_enc_metadata->u.tun_info))
> + goto nla_put_failure;
> + }
> +
> + tcf_tm_dump(&tm, &t->tcf_tm);
> + if (nla_put_64bit(skb, TCA_TUNNEL_KEY_TM, sizeof(tm),
> + &tm, TCA_TUNNEL_KEY_PAD))
> + goto nla_put_failure;
> +
> + ret = skb->len;
> + goto out;
> +
> +nla_put_failure:
> + nlmsg_trim(skb, b);
> +out:
> + rcu_read_unlock();
> +
> + return ret;
> +}
> +
I don't really care if you roll the above two rcu cleanups on top of
the patch as a follow up or roll a v8. But I think we should get the
annotation right here so it's clear later.
.John
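The dump-side annotation discussed above can be sketched in userspace as follows. This is a model under stated assumptions, not the kernel code: rtnl_held, rtnl_lock() and rtnl_unlock() are simplified stand-ins, the macro mimics rtnl_dereference() by asserting that the lock is held (the kernel version defers that check to lockdep), and tunnel_key_dump_opt() is a hypothetical helper that covers only the part of tunnel_key_dump() that reads params.

```c
/* Userspace model only: simplified stand-ins, not the kernel definitions. */
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of "the RTNL lock is currently held". */
static bool rtnl_held;

static void rtnl_lock(void)   { rtnl_held = true; }
static void rtnl_unlock(void) { rtnl_held = false; }

/* Stand-in: dereference is valid only while RTNL is held. */
#define rtnl_dereference(p) (assert(rtnl_held), (p))

/* Cut down to the fields the sketch reads and writes. */
struct tcf_tunnel_key_params {
	int tcft_action;
	int action;
};

struct tcf_tunnel_key {
	struct tcf_tunnel_key_params *params;
};

struct tc_tunnel_key {
	int t_action;
	int action;
};

/* Hypothetical helper: fill the dump header from params. No
 * rcu_read_lock()/rcu_read_unlock() pair is needed because the
 * caller is expected to hold RTNL for the whole dump. */
static void tunnel_key_dump_opt(struct tcf_tunnel_key *t,
				struct tc_tunnel_key *opt)
{
	struct tcf_tunnel_key_params *params = rtnl_dereference(t->params);

	opt->t_action = params->tcft_action;
	opt->action = params->action;
}
```

In this model, calling tunnel_key_dump_opt() without rtnl_lock() trips the assertion, which is roughly the kind of mistake the annotation is meant to make visible.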
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net-next V7 4/4] net/sched: Introduce act_tunnel_key
2016-09-08 16:15 ` John Fastabend
@ 2016-09-09 5:30 ` Cong Wang
2016-09-09 13:19 ` Eric Dumazet
0 siblings, 1 reply; 11+ messages in thread
From: Cong Wang @ 2016-09-09 5:30 UTC (permalink / raw)
To: John Fastabend
Cc: Hadar Hen Zion, David S. Miller, Linux Kernel Network Developers,
Jiri Pirko, Jiri Benc, Jamal Hadi Salim, Shmulik Ladkani,
Tom Herbert, Eric Dumazet, Amir Vadai, Or Gerlitz, Amir Vadai
On Thu, Sep 8, 2016 at 9:15 AM, John Fastabend <john.fastabend@gmail.com> wrote:
>
> This should be rtnl_dereference(t->params) and drop the read_lock/unlock
> pair. This is always called with RTNL lock unless you have a path I'm
> not seeing.
You missed the previous discussion on V6, John.
BTW, you really should follow the whole discussion instead of
jumping in the middle, like what you did for my patchset.
I understand you are eager to comment, but please don't waste
others' time in this way.... Please.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net-next V7 4/4] net/sched: Introduce act_tunnel_key
2016-09-09 5:30 ` Cong Wang
@ 2016-09-09 13:19 ` Eric Dumazet
2016-09-09 15:42 ` John Fastabend
0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2016-09-09 13:19 UTC (permalink / raw)
To: Cong Wang
Cc: John Fastabend, Hadar Hen Zion, David S. Miller,
Linux Kernel Network Developers, Jiri Pirko, Jiri Benc,
Jamal Hadi Salim, Shmulik Ladkani, Tom Herbert, Eric Dumazet,
Amir Vadai, Or Gerlitz, Amir Vadai
On Thu, 2016-09-08 at 22:30 -0700, Cong Wang wrote:
> On Thu, Sep 8, 2016 at 9:15 AM, John Fastabend <john.fastabend@gmail.com> wrote:
> >
> > This should be rtnl_dereference(t->params) and drop the read_lock/unlock
> > pair. This is always called with RTNL lock unless you have a path I'm
> > not seeing.
>
> You missed the previous discussion on V6, John.
>
> BTW, you really should follow the whole discussion instead of
> jumping in the middle, like what you did for my patchset.
> I understand you are eager to comment, but please don't waste
> others' time in this way.... Please.
But John is right, and he definitely is welcome to give his feedback
even at V13 if he wants to.
tunnel_key_dump() is called with RTNL being held.
Take a deep breath, vacations, and come back when you are relaxed.
Thanks.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net-next V7 4/4] net/sched: Introduce act_tunnel_key
2016-09-09 13:19 ` Eric Dumazet
@ 2016-09-09 15:42 ` John Fastabend
0 siblings, 0 replies; 11+ messages in thread
From: John Fastabend @ 2016-09-09 15:42 UTC (permalink / raw)
To: Eric Dumazet, Cong Wang
Cc: Hadar Hen Zion, David S. Miller, Linux Kernel Network Developers,
Jiri Pirko, Jiri Benc, Jamal Hadi Salim, Shmulik Ladkani,
Tom Herbert, Eric Dumazet, Amir Vadai, Or Gerlitz, Amir Vadai
On 16-09-09 06:19 AM, Eric Dumazet wrote:
> On Thu, 2016-09-08 at 22:30 -0700, Cong Wang wrote:
>> On Thu, Sep 8, 2016 at 9:15 AM, John Fastabend <john.fastabend@gmail.com> wrote:
>>>
>>> This should be rtnl_dereference(t->params) and drop the read_lock/unlock
>>> pair. This is always called with RTNL lock unless you have a path I'm
>>> not seeing.
>>
>> You missed the previous discussion on V6, John.
>>
>> BTW, you really should follow the whole discussion instead of
>> jumping in the middle, like what you did for my patchset.
>> I understand you are eager to comment, but please don't waste
>> others' time in this way.... Please.
>
> But John is right, and he definitely is welcome to give his feedback
> even at V13 if he wants to.
>
> tunnel_key_dump() is called with RTNL being held.
>
> Take a deep breath, vacations, and come back when you are relaxed.
>
> Thanks.
>
>
Also, the v6 discussion was about the cleanup() callback; I see nothing
about the dump() callback. And if there was, it wasn't fixed, so it
should be resolved.
Anyways Dave/Hadar feel free to submit a follow up patch or v8 it
doesn't much matter to me as noted in the original post.
.John
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net-next V7 0/4] net/sched: ip tunnel metadata set/release/classify by using TC
2016-09-08 13:23 [PATCH net-next V7 0/4] net/sched: ip tunnel metadata set/release/classify by using TC Hadar Hen Zion
` (3 preceding siblings ...)
2016-09-08 13:23 ` [PATCH net-next V7 4/4] net/sched: Introduce act_tunnel_key Hadar Hen Zion
@ 2016-09-11 4:06 ` David Miller
4 siblings, 0 replies; 11+ messages in thread
From: David Miller @ 2016-09-11 4:06 UTC (permalink / raw)
To: hadarh
Cc: netdev, jiri, jbenc, jhs, shmulik.ladkani, tom, edumazet,
xiyou.wangcong, amirva, ogerlitz
From: Hadar Hen Zion <hadarh@mellanox.com>
Date: Thu, 8 Sep 2016 16:23:44 +0300
> This patchset introduces ip tunnel manipulation support using the TC subsystem.
Series applied, thanks.
^ permalink raw reply [flat|nested] 11+ messages in thread