* [PATCH net-next V3 0/4] net/sched: ip tunnel metadata set/release/classify by using TC
@ 2016-08-25 16:13 Hadar Hen Zion
  2016-08-25 16:13 ` [PATCH net-next V3 1/4] net/ip_tunnels: Introduce tunnel_id_to_key32() and key32_to_tunnel_id() Hadar Hen Zion
                   ` (3 more replies)
  0 siblings, 4 replies; 23+ messages in thread
From: Hadar Hen Zion @ 2016-08-25 16:13 UTC
  To: David S. Miller
  Cc: netdev, Jiri Pirko, Jiri Benc, Jamal Hadi Salim, Shmulik Ladkani,
	Tom Herbert, Or Gerlitz, Amir Vadai, Hadar Hen Zion

Hi,

This patchset introduces ip tunnel manipulation support using the TC subsystem.

In the decap flow, it enables the user to redirect packets from a shared tunnel
device and classify by outer and inner headers. The outer headers are extracted
from the metadata and used by the flower filter. A new action, act_tunnel_key,
releases the metadata.

In the encap flow, act_tunnel_key creates a metadata object to be used by the
shared tunnel device. The actual redirection to the tunnel device is done using
act_mirred.

For example:
$ tc qdisc add dev vnet0 ingress
$ tc filter add dev vnet0 protocol ip parent ffff: \
	flower \
	 ip_proto 1 \
	action tunnel_key set \
	 src_ip 11.11.0.1 \
	 dst_ip 11.11.0.2 \
	 id 11 \
	action mirred egress redirect dev vxlan0

$ tc qdisc add dev vxlan0 ingress
$ tc filter add dev vxlan0 protocol ip parent ffff: \
	flower \
	 enc_src_ip 11.11.0.2 \
	 enc_dst_ip 11.11.0.1 \
	 enc_key_id 11 \
	action tunnel_key release \
	action mirred egress redirect dev vnet0

Amir & Hadar

Changes from V2:
- Use union in struct fl_flow_key for enc_ipv6 and enc_ipv4.
- Rename functions _ip_tun_rx_dst and _ipv6_tun_rx_dst to _ip_tun_set_dst and
  _ipv6_tun_set_dst accordingly.
- Remove local parameter 'encapdecap' from tunnel_key_init function.
- Don't copy in6_addr values in tunnel_key_dump_addresses function, use pointers.

Changes from V1:
- More cleanups to key32_to_tunnel_id() and tunnel_id_to_key32()
- IPv6 Support added
- Set TUNNEL_KEY flag to make GRE work
- Handle zero tunnel id properly in act_tunnel_key
- Don't leave junk in decap action
- Fix bug in act_tunnel_key initialization where (exists & ovr) is true
- Remove BUG() from code
- Rename action to tunnel_key
- Improve grep-ability of code
- Reuse code from ip_tun_rx_dst() and ipv6_tun_rx_dst()

Changes from RFC:
- Add a new action instead of making mirred too complex
- No need to specify UDP port in action - it is already in the tunnel device
  configuration
- Added a decap operation to drop tunnel metadata

Amir Vadai (4):
  net/ip_tunnels: Introduce tunnel_id_to_key32() and
    key32_to_tunnel_id()
  net/dst: Utility functions to build dst_metadata without supplying an
    skb
  net/sched: cls_flower: Classify packet in ip tunnels
  net/sched: Introduce act_tunnel_key

 drivers/net/vxlan.c                       |   4 +-
 include/net/dst_metadata.h                |  45 +++--
 include/net/ip_tunnels.h                  |  19 ++
 include/net/tc_act/tc_tunnel_key.h        |  25 +++
 include/net/vxlan.h                       |  18 --
 include/uapi/linux/pkt_cls.h              |  11 ++
 include/uapi/linux/tc_act/tc_tunnel_key.h |  42 ++++
 net/ipv4/ip_gre.c                         |  23 +--
 net/sched/Kconfig                         |  11 ++
 net/sched/Makefile                        |   1 +
 net/sched/act_tunnel_key.c                | 312 ++++++++++++++++++++++++++++++
 net/sched/cls_flower.c                    | 101 +++++++++-
 12 files changed, 556 insertions(+), 56 deletions(-)
 create mode 100644 include/net/tc_act/tc_tunnel_key.h
 create mode 100644 include/uapi/linux/tc_act/tc_tunnel_key.h
 create mode 100644 net/sched/act_tunnel_key.c

-- 
1.8.3.1

* [PATCH net-next V3 1/4] net/ip_tunnels: Introduce tunnel_id_to_key32() and key32_to_tunnel_id()
  2016-08-25 16:13 [PATCH net-next V3 0/4] net/sched: ip tunnel metadata set/release/classify by using TC Hadar Hen Zion
@ 2016-08-25 16:13 ` Hadar Hen Zion
  2016-08-26 10:26   ` Jiri Benc
  2016-08-25 16:13 ` [PATCH net-next V3 2/4] net/dst: Utility functions to build dst_metadata without supplying an skb Hadar Hen Zion
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 23+ messages in thread
From: Hadar Hen Zion @ 2016-08-25 16:13 UTC
  To: David S. Miller
  Cc: netdev, Jiri Pirko, Jiri Benc, Jamal Hadi Salim, Shmulik Ladkani,
	Tom Herbert, Or Gerlitz, Amir Vadai, Amir Vadai, Hadar Hen Zion

From: Amir Vadai <amir@vadai.me>

Add utility functions to convert a 32-bit key into a 64-bit tunnel id and
vice versa.
These functions will be used instead of duplicated code in GRE and VXLAN,
and in the tc action act_tunnel_key, which is introduced in a following
patch in this patchset.
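
As a rough user-space illustration of the conversion (not the kernel code
itself): the 32-bit key ends up in the low-order half of a network-order
64-bit tunnel id, i.e. in bytes 4..7 of its in-memory representation. The
names sketch_key32_to_tunnel_id() and sketch_tunnel_id_to_key32() are
hypothetical, plain uint32_t/uint64_t stand in for __be32/__be64, and byte
copies are used in place of the endianness #ifdef.

```c
#include <stdint.h>
#include <string.h>

/* Place a network-order 32-bit key into bytes 4..7 of a network-order
 * 64-bit tunnel id; bytes 0..3 are left as zero.
 */
static uint64_t sketch_key32_to_tunnel_id(uint32_t key_be)
{
	unsigned char bytes[8] = { 0 };
	uint64_t tun_id_be;

	memcpy(bytes + 4, &key_be, sizeof(key_be));
	memcpy(&tun_id_be, bytes, sizeof(tun_id_be));
	return tun_id_be;
}

/* Recover the network-order 32-bit key from bytes 4..7 of the tunnel id. */
static uint32_t sketch_tunnel_id_to_key32(uint64_t tun_id_be)
{
	unsigned char bytes[8];
	uint32_t key_be;

	memcpy(bytes, &tun_id_be, sizeof(bytes));
	memcpy(&key_be, bytes + 4, sizeof(key_be));
	return key_be;
}
```

Because only byte positions are touched, the sketch behaves the same on big
and little endian hosts, which is the property the shift by 32 provides in
the little endian branch of the real helpers.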

Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
---
 drivers/net/vxlan.c      |  4 ++--
 include/net/ip_tunnels.h | 19 +++++++++++++++++++
 include/net/vxlan.h      | 18 ------------------
 net/ipv4/ip_gre.c        | 23 ++---------------------
 4 files changed, 23 insertions(+), 41 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index c0dda6f..b1ddf8f 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1294,7 +1294,7 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
 		struct metadata_dst *tun_dst;
 
 		tun_dst = udp_tun_rx_dst(skb, vxlan_get_sk_family(vs), TUNNEL_KEY,
-					 vxlan_vni_to_tun_id(vni), sizeof(*md));
+					 key32_to_tunnel_id(vni), sizeof(*md));
 
 		if (!tun_dst)
 			goto drop;
@@ -1948,7 +1948,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 			goto drop;
 		}
 		dst_port = info->key.tp_dst ? : vxlan->cfg.dst_port;
-		vni = vxlan_tun_id_to_vni(info->key.tun_id);
+		vni = tunnel_id_to_key32(info->key.tun_id);
 		remote_ip.sa.sa_family = ip_tunnel_info_af(info);
 		if (remote_ip.sa.sa_family == AF_INET) {
 			remote_ip.sin.sin_addr.s_addr = info->key.u.ipv4.dst;
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index a5e7035..e598c63 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -222,6 +222,25 @@ static inline unsigned short ip_tunnel_info_af(const struct ip_tunnel_info
 	return tun_info->mode & IP_TUNNEL_INFO_IPV6 ? AF_INET6 : AF_INET;
 }
 
+static inline __be64 key32_to_tunnel_id(__be32 key)
+{
+#ifdef __BIG_ENDIAN
+	return (__force __be64)key;
+#else
+	return (__force __be64)((__force u64)key << 32);
+#endif
+}
+
+/* Returns the least-significant 32 bits of a __be64. */
+static inline __be32 tunnel_id_to_key32(__be64 tun_id)
+{
+#ifdef __BIG_ENDIAN
+	return (__force __be32)tun_id;
+#else
+	return (__force __be32)((__force u64)tun_id >> 32);
+#endif
+}
+
 #ifdef CONFIG_INET
 
 int ip_tunnel_init(struct net_device *dev);
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index b96d036..0255613 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -350,24 +350,6 @@ static inline __be32 vxlan_vni_field(__be32 vni)
 #endif
 }
 
-static inline __be32 vxlan_tun_id_to_vni(__be64 tun_id)
-{
-#if defined(__BIG_ENDIAN)
-	return (__force __be32)tun_id;
-#else
-	return (__force __be32)((__force u64)tun_id >> 32);
-#endif
-}
-
-static inline __be64 vxlan_vni_to_tun_id(__be32 vni)
-{
-#if defined(__BIG_ENDIAN)
-	return (__force __be64)vni;
-#else
-	return (__force __be64)((u64)(__force u32)vni << 32);
-#endif
-}
-
 static inline size_t vxlan_rco_start(__be32 vni_field)
 {
 	return be32_to_cpu(vni_field & VXLAN_RCO_MASK) << VXLAN_RCO_SHIFT;
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 113cc43..576f705 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -246,25 +246,6 @@ static void gre_err(struct sk_buff *skb, u32 info)
 	ipgre_err(skb, info, &tpi);
 }
 
-static __be64 key_to_tunnel_id(__be32 key)
-{
-#ifdef __BIG_ENDIAN
-	return (__force __be64)((__force u32)key);
-#else
-	return (__force __be64)((__force u64)key << 32);
-#endif
-}
-
-/* Returns the least-significant 32 bits of a __be64. */
-static __be32 tunnel_id_to_key(__be64 x)
-{
-#ifdef __BIG_ENDIAN
-	return (__force __be32)x;
-#else
-	return (__force __be32)((__force u64)x >> 32);
-#endif
-}
-
 static int __ipgre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi,
 		       struct ip_tunnel_net *itn, int hdr_len, bool raw_proto)
 {
@@ -290,7 +271,7 @@ static int __ipgre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi,
 			__be64 tun_id;
 
 			flags = tpi->flags & (TUNNEL_CSUM | TUNNEL_KEY);
-			tun_id = key_to_tunnel_id(tpi->key);
+			tun_id = key32_to_tunnel_id(tpi->key);
 			tun_dst = ip_tun_rx_dst(skb, flags, tun_id, 0);
 			if (!tun_dst)
 				return PACKET_REJECT;
@@ -446,7 +427,7 @@ static void gre_fb_xmit(struct sk_buff *skb, struct net_device *dev,
 
 	flags = tun_info->key.tun_flags & (TUNNEL_CSUM | TUNNEL_KEY);
 	gre_build_header(skb, tunnel_hlen, flags, proto,
-			 tunnel_id_to_key(tun_info->key.tun_id), 0);
+			 tunnel_id_to_key32(tun_info->key.tun_id), 0);
 
 	df = key->tun_flags & TUNNEL_DONT_FRAGMENT ?  htons(IP_DF) : 0;
 
-- 
1.8.3.1

* [PATCH net-next V3 2/4] net/dst: Utility functions to build dst_metadata without supplying an skb
  2016-08-25 16:13 [PATCH net-next V3 0/4] net/sched: ip tunnel metadata set/release/classify by using TC Hadar Hen Zion
  2016-08-25 16:13 ` [PATCH net-next V3 1/4] net/ip_tunnels: Introduce tunnel_id_to_key32() and key32_to_tunnel_id() Hadar Hen Zion
@ 2016-08-25 16:13 ` Hadar Hen Zion
  2016-08-25 16:40   ` Shmulik Ladkani
  2016-08-25 16:13 ` [PATCH net-next V3 3/4] net/sched: cls_flower: Classify packet in ip tunnels Hadar Hen Zion
  2016-08-25 16:13 ` [PATCH net-next V3 4/4] net/sched: Introduce act_tunnel_key Hadar Hen Zion
  3 siblings, 1 reply; 23+ messages in thread
From: Hadar Hen Zion @ 2016-08-25 16:13 UTC
  To: David S. Miller
  Cc: netdev, Jiri Pirko, Jiri Benc, Jamal Hadi Salim, Shmulik Ladkani,
	Tom Herbert, Or Gerlitz, Amir Vadai, Amir Vadai, Hadar Hen Zion

From: Amir Vadai <amir@vadai.me>

Extract _ip_tun_set_dst() and _ipv6_tun_set_dst() out of ip_tun_rx_dst()
and ipv6_tun_rx_dst(), so that dst_metadata can be built without supplying
an skb.
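
The shape of this refactoring can be sketched in user space as follows.
This is only an illustration under stated assumptions: struct sketch_iphdr,
struct sketch_tun_key and both function names are hypothetical stand-ins,
and calloc() replaces the kernel's metadata allocation.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the IP header and the tunnel key. */
struct sketch_iphdr {
	uint32_t saddr;
	uint32_t daddr;
	uint8_t tos;
	uint8_t ttl;
};

struct sketch_tun_key {
	uint32_t src;
	uint32_t dst;
	uint8_t tos;
	uint8_t ttl;
	uint16_t flags;
	uint64_t tun_id;
};

/* Core helper: every field is passed explicitly, so no packet is needed. */
static struct sketch_tun_key *
sketch_tun_set_dst(uint32_t saddr, uint32_t daddr, uint8_t tos, uint8_t ttl,
		   uint16_t flags, uint64_t tun_id)
{
	struct sketch_tun_key *key = calloc(1, sizeof(*key));

	if (!key)
		return NULL;

	key->src = saddr;
	key->dst = daddr;
	key->tos = tos;
	key->ttl = ttl;
	key->flags = flags;
	key->tun_id = tun_id;
	return key;
}

/* Receive-path wrapper: reads the fields from a header and delegates. */
static struct sketch_tun_key *
sketch_tun_rx_dst(const struct sketch_iphdr *iph, uint16_t flags,
		  uint64_t tun_id)
{
	return sketch_tun_set_dst(iph->saddr, iph->daddr, iph->tos, iph->ttl,
				  flags, tun_id);
}
```

A caller that has no packet at hand, such as a configuration path, can call
the core helper directly with the addresses it was given, while the receive
path keeps its existing signature.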

Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
---
 include/net/dst_metadata.h | 45 ++++++++++++++++++++++++++++++++-------------
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
index 5db9f59..f82ea58 100644
--- a/include/net/dst_metadata.h
+++ b/include/net/dst_metadata.h
@@ -112,12 +112,10 @@ static inline struct ip_tunnel_info *skb_tunnel_info_unclone(struct sk_buff *skb
 	return &dst->u.tun_info;
 }
 
-static inline struct metadata_dst *ip_tun_rx_dst(struct sk_buff *skb,
-						 __be16 flags,
-						 __be64 tunnel_id,
-						 int md_size)
+static inline struct metadata_dst *
+_ip_tun_set_dst(__be32 saddr, __be32 daddr, __u8 tos, __u8 ttl,
+		__be16 flags, __be64 tunnel_id, int md_size)
 {
-	const struct iphdr *iph = ip_hdr(skb);
 	struct metadata_dst *tun_dst;
 
 	tun_dst = tun_rx_dst(md_size);
@@ -125,17 +123,27 @@ static inline struct metadata_dst *ip_tun_rx_dst(struct sk_buff *skb,
 		return NULL;
 
 	ip_tunnel_key_init(&tun_dst->u.tun_info.key,
-			   iph->saddr, iph->daddr, iph->tos, iph->ttl,
+			   saddr, daddr, tos, ttl,
 			   0, 0, 0, tunnel_id, flags);
 	return tun_dst;
 }
 
-static inline struct metadata_dst *ipv6_tun_rx_dst(struct sk_buff *skb,
+static inline struct metadata_dst *ip_tun_rx_dst(struct sk_buff *skb,
 						 __be16 flags,
 						 __be64 tunnel_id,
 						 int md_size)
 {
-	const struct ipv6hdr *ip6h = ipv6_hdr(skb);
+	const struct iphdr *iph = ip_hdr(skb);
+
+	return _ip_tun_set_dst(iph->saddr, iph->daddr, iph->tos, iph->ttl,
+			      flags, tunnel_id, md_size);
+}
+
+static inline struct metadata_dst *
+_ipv6_tun_set_dst(const struct in6_addr saddr, const struct in6_addr daddr,
+		  __u8 tos, __u8 ttl, __be32 label, __be16 flags,
+		  __be64 tunnel_id, int md_size)
+{
 	struct metadata_dst *tun_dst;
 	struct ip_tunnel_info *info;
 
@@ -150,14 +158,25 @@ static inline struct metadata_dst *ipv6_tun_rx_dst(struct sk_buff *skb,
 	info->key.tp_src = 0;
 	info->key.tp_dst = 0;
 
-	info->key.u.ipv6.src = ip6h->saddr;
-	info->key.u.ipv6.dst = ip6h->daddr;
+	info->key.u.ipv6.src = saddr;
+	info->key.u.ipv6.dst = daddr;
 
-	info->key.tos = ipv6_get_dsfield(ip6h);
-	info->key.ttl = ip6h->hop_limit;
-	info->key.label = ip6_flowlabel(ip6h);
+	info->key.tos = tos;
+	info->key.ttl = ttl;
+	info->key.label = label;
 
 	return tun_dst;
 }
 
+static inline struct metadata_dst *
+ipv6_tun_rx_dst(struct sk_buff *skb, __be16 flags, __be64 tunnel_id,
+		int md_size)
+{
+	const struct ipv6hdr *ip6h = ipv6_hdr(skb);
+
+	return _ipv6_tun_set_dst(ip6h->saddr, ip6h->daddr,
+				ipv6_get_dsfield(ip6h), ip6h->hop_limit,
+				ip6_flowlabel(ip6h), flags, tunnel_id,
+				md_size);
+}
 #endif /* __NET_DST_METADATA_H */
-- 
1.8.3.1

* [PATCH net-next V3 3/4] net/sched: cls_flower: Classify packet in ip tunnels
  2016-08-25 16:13 [PATCH net-next V3 0/4] net/sched: ip tunnel metadata set/release/classify by using TC Hadar Hen Zion
  2016-08-25 16:13 ` [PATCH net-next V3 1/4] net/ip_tunnels: Introduce tunnel_id_to_key32() and key32_to_tunnel_id() Hadar Hen Zion
  2016-08-25 16:13 ` [PATCH net-next V3 2/4] net/dst: Utility functions to build dst_metadata without supplying an skb Hadar Hen Zion
@ 2016-08-25 16:13 ` Hadar Hen Zion
  2016-08-26 10:46   ` Jiri Benc
  2016-08-25 16:13 ` [PATCH net-next V3 4/4] net/sched: Introduce act_tunnel_key Hadar Hen Zion
  3 siblings, 1 reply; 23+ messages in thread
From: Hadar Hen Zion @ 2016-08-25 16:13 UTC
  To: David S. Miller
  Cc: netdev, Jiri Pirko, Jiri Benc, Jamal Hadi Salim, Shmulik Ladkani,
	Tom Herbert, Or Gerlitz, Amir Vadai, Amir Vadai, Hadar Hen Zion

From: Amir Vadai <amir@vadai.me>

Introduce classifying by metadata extracted by the tunnel device.
Outer header fields - source/dest ip and tunnel id, are extracted from
the metadata when classifying.

For example, the following adds a filter on the ingress qdisc of a shared
vxlan device named 'vxlan0'. It matches packets with outer src ip
11.11.0.2, outer dst ip 11.11.0.1, tunnel id 11 and inner dst ip
11.11.11.1. The packets are forwarded to tap device 'vnet0' (after the
metadata is released):

$ tc filter add dev vxlan0 protocol ip parent ffff: \
    flower \
      enc_src_ip 11.11.0.2 \
      enc_dst_ip 11.11.0.1 \
      enc_key_id 11 \
      dst_ip 11.11.11.1 \
    action tunnel_key release \
    action mirred egress redirect dev vnet0

The tunnel_key action is introduced in the next patch in this
series.
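
The matching idea can be sketched in user space as a masked comparison of
the outer header fields. This is a simplification under stated
assumptions: struct sketch_enc_key and sketch_enc_match() are hypothetical
names, addresses are written as host-order integers for readability, and a
direct comparison stands in for the lookup that flower performs on the
masked key.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the encapsulation part of the flow key. */
struct sketch_enc_key {
	uint32_t src;     /* outer source address */
	uint32_t dst;     /* outer destination address */
	uint32_t key_id;  /* tunnel id */
};

/* A packet matches when each field agrees with the filter on the bits
 * selected by the mask. A zero mask makes the field a wildcard.
 */
static bool sketch_enc_match(const struct sketch_enc_key *pkt,
			     const struct sketch_enc_key *filter,
			     const struct sketch_enc_key *mask)
{
	return (pkt->src & mask->src) == (filter->src & mask->src) &&
	       (pkt->dst & mask->dst) == (filter->dst & mask->dst) &&
	       (pkt->key_id & mask->key_id) ==
	       (filter->key_id & mask->key_id);
}
```

With all-ones masks this gives the exact match on enc_src_ip, enc_dst_ip
and enc_key_id used in the example command above.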

Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
---
 include/uapi/linux/pkt_cls.h |  11 +++++
 net/sched/cls_flower.c       | 101 ++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 110 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 51b5b24..f9c287c 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -431,6 +431,17 @@ enum {
 	TCA_FLOWER_KEY_VLAN_ID,
 	TCA_FLOWER_KEY_VLAN_PRIO,
 	TCA_FLOWER_KEY_VLAN_ETH_TYPE,
+
+	TCA_FLOWER_KEY_ENC_KEY_ID,	/* be32 */
+	TCA_FLOWER_KEY_ENC_IPV4_SRC,	/* be32 */
+	TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK,/* be32 */
+	TCA_FLOWER_KEY_ENC_IPV4_DST,	/* be32 */
+	TCA_FLOWER_KEY_ENC_IPV4_DST_MASK,/* be32 */
+	TCA_FLOWER_KEY_ENC_IPV6_SRC,	/* struct in6_addr */
+	TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK,/* struct in6_addr */
+	TCA_FLOWER_KEY_ENC_IPV6_DST,	/* struct in6_addr */
+	TCA_FLOWER_KEY_ENC_IPV6_DST_MASK,/* struct in6_addr */
+
 	__TCA_FLOWER_MAX,
 };
 
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 1e11e57..46f4f52 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -23,9 +23,13 @@
 #include <net/ip.h>
 #include <net/flow_dissector.h>
 
+#include <net/dst.h>
+#include <net/dst_metadata.h>
+
 struct fl_flow_key {
 	int	indev_ifindex;
 	struct flow_dissector_key_control control;
+	struct flow_dissector_key_control enc_control;
 	struct flow_dissector_key_basic basic;
 	struct flow_dissector_key_eth_addrs eth;
 	struct flow_dissector_key_vlan vlan;
@@ -35,6 +39,11 @@ struct fl_flow_key {
 		struct flow_dissector_key_ipv6_addrs ipv6;
 	};
 	struct flow_dissector_key_ports tp;
+	struct flow_dissector_key_keyid enc_key_id;
+	union {
+		struct flow_dissector_key_ipv4_addrs enc_ipv4;
+		struct flow_dissector_key_ipv6_addrs enc_ipv6;
+	};
 } __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as longs. */
 
 struct fl_flow_mask_range {
@@ -124,11 +133,31 @@ static int fl_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 	struct cls_fl_filter *f;
 	struct fl_flow_key skb_key;
 	struct fl_flow_key skb_mkey;
+	struct ip_tunnel_info *info;
 
 	if (!atomic_read(&head->ht.nelems))
 		return -1;
 
 	fl_clear_masked_range(&skb_key, &head->mask);
+
+	info = skb_tunnel_info(skb);
+	if (info) {
+		struct ip_tunnel_key *key = &info->key;
+
+		switch (ip_tunnel_info_af(info)) {
+		case AF_INET:
+			skb_key.enc_ipv4.src = key->u.ipv4.src;
+			skb_key.enc_ipv4.dst = key->u.ipv4.dst;
+			break;
+		case AF_INET6:
+			skb_key.enc_ipv6.src = key->u.ipv6.src;
+			skb_key.enc_ipv6.dst = key->u.ipv6.dst;
+			break;
+		}
+
+		skb_key.enc_key_id.keyid = tunnel_id_to_key32(key->tun_id);
+	}
+
 	skb_key.indev_ifindex = skb->skb_iif;
 	/* skb_flow_dissect() does not set n_proto in case an unknown protocol,
 	 * so do it rather here.
@@ -297,7 +326,15 @@ static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 1] = {
 	[TCA_FLOWER_KEY_VLAN_ID]	= { .type = NLA_U16 },
 	[TCA_FLOWER_KEY_VLAN_PRIO]	= { .type = NLA_U8 },
 	[TCA_FLOWER_KEY_VLAN_ETH_TYPE]	= { .type = NLA_U16 },
-
+	[TCA_FLOWER_KEY_ENC_KEY_ID]	= { .type = NLA_U32 },
+	[TCA_FLOWER_KEY_ENC_IPV4_SRC]	= { .type = NLA_U32 },
+	[TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK] = { .type = NLA_U32 },
+	[TCA_FLOWER_KEY_ENC_IPV4_DST]	= { .type = NLA_U32 },
+	[TCA_FLOWER_KEY_ENC_IPV4_DST_MASK] = { .type = NLA_U32 },
+	[TCA_FLOWER_KEY_ENC_IPV6_SRC]	= { .len = sizeof(struct in6_addr) },
+	[TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK] = { .len = sizeof(struct in6_addr) },
+	[TCA_FLOWER_KEY_ENC_IPV6_DST]	= { .len = sizeof(struct in6_addr) },
+	[TCA_FLOWER_KEY_ENC_IPV6_DST_MASK] = { .len = sizeof(struct in6_addr) },
 };
 
 static void fl_set_key_val(struct nlattr **tb,
@@ -345,7 +382,6 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
 		mask->indev_ifindex = 0xffffffff;
 	}
 #endif
-
 	fl_set_key_val(tb, key->eth.dst, TCA_FLOWER_KEY_ETH_DST,
 		       mask->eth.dst, TCA_FLOWER_KEY_ETH_DST_MASK,
 		       sizeof(key->eth.dst));
@@ -408,6 +444,40 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
 			       sizeof(key->tp.dst));
 	}
 
+	if (tb[TCA_FLOWER_KEY_ENC_IPV4_SRC] ||
+	    tb[TCA_FLOWER_KEY_ENC_IPV4_DST]) {
+		key->enc_control.addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS;
+		fl_set_key_val(tb, &key->enc_ipv4.src,
+			       TCA_FLOWER_KEY_ENC_IPV4_SRC,
+			       &mask->enc_ipv4.src,
+			       TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK,
+			       sizeof(key->enc_ipv4.src));
+		fl_set_key_val(tb, &key->enc_ipv4.dst,
+			       TCA_FLOWER_KEY_ENC_IPV4_DST,
+			       &mask->enc_ipv4.dst,
+			       TCA_FLOWER_KEY_ENC_IPV4_DST_MASK,
+			       sizeof(key->enc_ipv4.dst));
+	}
+
+	if (tb[TCA_FLOWER_KEY_ENC_IPV6_SRC] ||
+	    tb[TCA_FLOWER_KEY_ENC_IPV6_DST]) {
+		key->enc_control.addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
+		fl_set_key_val(tb, &key->enc_ipv6.src,
+			       TCA_FLOWER_KEY_ENC_IPV6_SRC,
+			       &mask->enc_ipv6.src,
+			       TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK,
+			       sizeof(key->enc_ipv6.src));
+		fl_set_key_val(tb, &key->enc_ipv6.dst,
+			       TCA_FLOWER_KEY_ENC_IPV6_DST,
+			       &mask->enc_ipv6.dst,
+			       TCA_FLOWER_KEY_ENC_IPV6_DST_MASK,
+			       sizeof(key->enc_ipv6.dst));
+	}
+
+	fl_set_key_val(tb, &key->enc_key_id.keyid, TCA_FLOWER_KEY_ENC_KEY_ID,
+		       &mask->enc_key_id.keyid, TCA_FLOWER_KEY_ENC_KEY_ID,
+		       sizeof(key->enc_key_id.keyid));
+
 	return 0;
 }
 
@@ -815,6 +885,33 @@ static int fl_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
 				  sizeof(key->tp.dst))))
 		goto nla_put_failure;
 
+	if (key->enc_control.addr_type == FLOW_DISSECTOR_KEY_IPV4_ADDRS &&
+	    (fl_dump_key_val(skb, &key->enc_ipv4.src,
+			    TCA_FLOWER_KEY_ENC_IPV4_SRC, &mask->enc_ipv4.src,
+			    TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK,
+			    sizeof(key->enc_ipv4.src)) ||
+	     fl_dump_key_val(skb, &key->enc_ipv4.dst,
+			     TCA_FLOWER_KEY_ENC_IPV4_DST, &mask->enc_ipv4.dst,
+			     TCA_FLOWER_KEY_ENC_IPV4_DST_MASK,
+			     sizeof(key->enc_ipv4.dst))))
+		goto nla_put_failure;
+	else if (key->enc_control.addr_type == FLOW_DISSECTOR_KEY_IPV6_ADDRS &&
+		 (fl_dump_key_val(skb, &key->enc_ipv6.src,
+			    TCA_FLOWER_KEY_ENC_IPV6_SRC, &mask->enc_ipv6.src,
+			    TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK,
+			    sizeof(key->enc_ipv6.src)) ||
+		 fl_dump_key_val(skb, &key->enc_ipv6.dst,
+				 TCA_FLOWER_KEY_ENC_IPV6_DST,
+				 &mask->enc_ipv6.dst,
+				 TCA_FLOWER_KEY_ENC_IPV6_DST_MASK,
+			    sizeof(key->enc_ipv6.dst))))
+		goto nla_put_failure;
+
+	if (fl_dump_key_val(skb, &key->enc_key_id, TCA_FLOWER_KEY_ENC_KEY_ID,
+			    &mask->enc_key_id, TCA_FLOWER_KEY_ENC_KEY_ID,
+			    sizeof(key->enc_key_id)))
+		goto nla_put_failure;
+
 	nla_put_u32(skb, TCA_FLOWER_FLAGS, f->flags);
 
 	if (tcf_exts_dump(skb, &f->exts))
-- 
1.8.3.1

* [PATCH net-next V3 4/4] net/sched: Introduce act_tunnel_key
  2016-08-25 16:13 [PATCH net-next V3 0/4] net/sched: ip tunnel metadata set/release/classify by using TC Hadar Hen Zion
                   ` (2 preceding siblings ...)
  2016-08-25 16:13 ` [PATCH net-next V3 3/4] net/sched: cls_flower: Classify packet in ip tunnels Hadar Hen Zion
@ 2016-08-25 16:13 ` Hadar Hen Zion
  2016-08-25 16:52   ` Shmulik Ladkani
                     ` (2 more replies)
  3 siblings, 3 replies; 23+ messages in thread
From: Hadar Hen Zion @ 2016-08-25 16:13 UTC
  To: David S. Miller
  Cc: netdev, Jiri Pirko, Jiri Benc, Jamal Hadi Salim, Shmulik Ladkani,
	Tom Herbert, Or Gerlitz, Amir Vadai, Amir Vadai, Hadar Hen Zion

From: Amir Vadai <amir@vadai.me>

This action can be used before redirecting packets to a shared tunnel
device, or when redirecting packets arriving from such a device.

The action will release the metadata created by the tunnel device
(decap), or set the metadata with the specified values for encap
operation.

For example, the following flower filter will forward all ICMP packets
destined to 11.11.11.2 through the shared vxlan device 'vxlan0'. Before
redirecting, metadata for the vxlan tunnel is created using the
tunnel_key action and its arguments:

$ tc filter add dev net0 protocol ip parent ffff: \
    flower \
      ip_proto 1 \
      dst_ip 11.11.11.2 \
    action tunnel_key set \
      src_ip 11.11.0.1 \
      dst_ip 11.11.0.2 \
      id 11 \
    action mirred egress redirect dev vxlan0
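
The two modes of the action can be sketched in user space as below. This
is a minimal illustration, not the kernel implementation: all sketch_*
types and names are hypothetical, the packet is reduced to a single
metadata pointer, and the return value is a plain 0 or -1 rather than the
configured tc action.

```c
#include <stddef.h>

/* Mirrors the two action modes; the values follow the uapi defines. */
enum sketch_tk_mode {
	SKETCH_TK_SET = 1,
	SKETCH_TK_RELEASE = 2,
};

/* Hypothetical tunnel metadata: outer addresses and tunnel id. */
struct sketch_metadata {
	unsigned int src;
	unsigned int dst;
	unsigned int id;
};

/* Hypothetical packet carrying at most one metadata reference. */
struct sketch_packet {
	const struct sketch_metadata *md;
};

/* Per-action state: the mode and, for "set", the prepared metadata. */
struct sketch_tunnel_key {
	enum sketch_tk_mode mode;
	const struct sketch_metadata *enc_md;
};

/* "release" drops the metadata left by the tunnel device (decap);
 * "set" attaches the prepared metadata for the tunnel device (encap).
 */
static int sketch_tunnel_key_act(struct sketch_packet *pkt,
				 const struct sketch_tunnel_key *t)
{
	switch (t->mode) {
	case SKETCH_TK_RELEASE:
		pkt->md = NULL;
		return 0;
	case SKETCH_TK_SET:
		pkt->md = t->enc_md;
		return 0;
	}
	return -1; /* unknown mode */
}
```

In the example command above, the "set" mode would attach metadata with
src 11.11.0.1, dst 11.11.0.2 and id 11 before mirred redirects the packet
to 'vxlan0'.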

Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
---
 include/net/tc_act/tc_tunnel_key.h        |  25 +++
 include/uapi/linux/tc_act/tc_tunnel_key.h |  42 ++++
 net/sched/Kconfig                         |  11 ++
 net/sched/Makefile                        |   1 +
 net/sched/act_tunnel_key.c                | 312 ++++++++++++++++++++++++++++++
 5 files changed, 391 insertions(+)
 create mode 100644 include/net/tc_act/tc_tunnel_key.h
 create mode 100644 include/uapi/linux/tc_act/tc_tunnel_key.h
 create mode 100644 net/sched/act_tunnel_key.c

diff --git a/include/net/tc_act/tc_tunnel_key.h b/include/net/tc_act/tc_tunnel_key.h
new file mode 100644
index 0000000..18d5950
--- /dev/null
+++ b/include/net/tc_act/tc_tunnel_key.h
@@ -0,0 +1,25 @@
+/*
+ * Copyright (c) 2016, Amir Vadai <amir@vadai.me>
+ * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef __NET_TC_TUNNEL_KEY_H
+#define __NET_TC_TUNNEL_KEY_H
+
+#include <net/act_api.h>
+
+struct tcf_tunnel_key {
+	struct tc_action	common;
+	int			tcft_action;
+	struct metadata_dst     *tcft_enc_metadata;
+};
+
+#define to_tunnel_key(a) ((struct tcf_tunnel_key *)a)
+
+#endif /* __NET_TC_TUNNEL_KEY_H */
+
diff --git a/include/uapi/linux/tc_act/tc_tunnel_key.h b/include/uapi/linux/tc_act/tc_tunnel_key.h
new file mode 100644
index 0000000..f9ddf53
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_tunnel_key.h
@@ -0,0 +1,42 @@
+/*
+ * Copyright (c) 2016, Amir Vadai <amir@vadai.me>
+ * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef __LINUX_TC_TUNNEL_KEY_H
+#define __LINUX_TC_TUNNEL_KEY_H
+
+#include <linux/pkt_cls.h>
+
+#define TCA_ACT_TUNNEL_KEY 17
+
+#define TCA_TUNNEL_KEY_ACT_SET	    1
+#define TCA_TUNNEL_KEY_ACT_RELEASE  2
+
+struct tc_tunnel_key {
+	tc_gen;
+	int t_action;
+};
+
+enum {
+	TCA_TUNNEL_KEY_UNSPEC,
+	TCA_TUNNEL_KEY_TM,
+	TCA_TUNNEL_KEY_PARMS,
+	TCA_TUNNEL_KEY_ENC_IPV4_SRC,	/* be32 */
+	TCA_TUNNEL_KEY_ENC_IPV4_DST,	/* be32 */
+	TCA_TUNNEL_KEY_ENC_IPV6_SRC,	/* struct in6_addr */
+	TCA_TUNNEL_KEY_ENC_IPV6_DST,	/* struct in6_addr */
+	TCA_TUNNEL_KEY_ENC_KEY_ID,	/* be64 */
+	TCA_TUNNEL_KEY_PAD,
+	__TCA_TUNNEL_KEY_MAX,
+};
+
+#define TCA_TUNNEL_KEY_MAX (__TCA_TUNNEL_KEY_MAX - 1)
+
+#endif
+
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index ccf931b..72e3426 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -761,6 +761,17 @@ config NET_ACT_IFE
 	  To compile this code as a module, choose M here: the
 	  module will be called act_ife.
 
+config NET_ACT_TUNNEL_KEY
+        tristate "IP tunnel metadata manipulation"
+        depends on NET_CLS_ACT
+        ---help---
+	  Say Y here to set/release ip tunnel metadata.
+
+	  If unsure, say N.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called act_tunnel_key.
+
 config NET_IFE_SKBMARK
         tristate "Support to encoding decoding skb mark on IFE action"
         depends on NET_ACT_IFE
diff --git a/net/sched/Makefile b/net/sched/Makefile
index ae088a5..b9d046b 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -22,6 +22,7 @@ obj-$(CONFIG_NET_ACT_CONNMARK)	+= act_connmark.o
 obj-$(CONFIG_NET_ACT_IFE)	+= act_ife.o
 obj-$(CONFIG_NET_IFE_SKBMARK)	+= act_meta_mark.o
 obj-$(CONFIG_NET_IFE_SKBPRIO)	+= act_meta_skbprio.o
+obj-$(CONFIG_NET_ACT_TUNNEL_KEY)+= act_tunnel_key.o
 obj-$(CONFIG_NET_SCH_FIFO)	+= sch_fifo.o
 obj-$(CONFIG_NET_SCH_CBQ)	+= sch_cbq.o
 obj-$(CONFIG_NET_SCH_HTB)	+= sch_htb.o
diff --git a/net/sched/act_tunnel_key.c b/net/sched/act_tunnel_key.c
new file mode 100644
index 0000000..4726317
--- /dev/null
+++ b/net/sched/act_tunnel_key.c
@@ -0,0 +1,312 @@
+/*
+ * Copyright (c) 2016, Amir Vadai <amir@vadai.me>
+ * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/skbuff.h>
+#include <linux/rtnetlink.h>
+#include <net/netlink.h>
+#include <net/pkt_sched.h>
+#include <net/dst.h>
+#include <net/dst_metadata.h>
+
+#include <linux/tc_act/tc_tunnel_key.h>
+#include <net/tc_act/tc_tunnel_key.h>
+
+#define TUNNEL_KEY_TAB_MASK     15
+
+static int tunnel_key_net_id;
+static struct tc_action_ops act_tunnel_key_ops;
+
+static int tunnel_key_act(struct sk_buff *skb, const struct tc_action *a,
+			  struct tcf_result *res)
+{
+	struct tcf_tunnel_key *t = to_tunnel_key(a);
+	int action;
+
+	spin_lock(&t->tcf_lock);
+	tcf_lastuse_update(&t->tcf_tm);
+	bstats_update(&t->tcf_bstats, skb);
+	action = t->tcf_action;
+
+	switch (t->tcft_action) {
+	case TCA_TUNNEL_KEY_ACT_RELEASE:
+		skb_dst_set_noref(skb, NULL);
+		break;
+	case TCA_TUNNEL_KEY_ACT_SET:
+		skb_dst_set_noref(skb, &t->tcft_enc_metadata->dst);
+		break;
+	default:
+		WARN_ONCE(1, "Bad tunnel_key action.\n");
+		break;
+	}
+
+	spin_unlock(&t->tcf_lock);
+	return action;
+}
+
+static const struct nla_policy tunnel_key_policy[TCA_TUNNEL_KEY_MAX + 1] = {
+	[TCA_TUNNEL_KEY_PARMS]	    = { .len = sizeof(struct tc_tunnel_key) },
+	[TCA_TUNNEL_KEY_ENC_IPV4_SRC] = { .type = NLA_U32 },
+	[TCA_TUNNEL_KEY_ENC_IPV4_DST] = { .type = NLA_U32 },
+	[TCA_TUNNEL_KEY_ENC_IPV6_SRC] = { .len = sizeof(struct in6_addr) },
+	[TCA_TUNNEL_KEY_ENC_IPV6_DST] = { .len = sizeof(struct in6_addr) },
+	[TCA_TUNNEL_KEY_ENC_KEY_ID]   = { .type = NLA_U32 },
+};
+
+static int tunnel_key_init(struct net *net, struct nlattr *nla,
+			   struct nlattr *est, struct tc_action **a,
+			   int ovr, int bind)
+{
+	struct tc_action_net *tn = net_generic(net, tunnel_key_net_id);
+	struct nlattr *tb[TCA_TUNNEL_KEY_MAX + 1];
+	struct metadata_dst *metadata = NULL;
+	struct tc_tunnel_key *parm;
+	struct tcf_tunnel_key *t;
+	__be64 key_id;
+	bool exists = false;
+	int ret = 0;
+	int err;
+
+	if (!nla)
+		return -EINVAL;
+
+	err = nla_parse_nested(tb, TCA_TUNNEL_KEY_MAX, nla, tunnel_key_policy);
+	if (err < 0)
+		return err;
+
+	if (!tb[TCA_TUNNEL_KEY_PARMS])
+		return -EINVAL;
+
+	parm = nla_data(tb[TCA_TUNNEL_KEY_PARMS]);
+	exists = tcf_hash_check(tn, parm->index, a, bind);
+	if (exists && bind)
+		return 0;
+
+	switch (parm->t_action) {
+	case TCA_TUNNEL_KEY_ACT_RELEASE:
+		break;
+	case TCA_TUNNEL_KEY_ACT_SET:
+		if (!tb[TCA_TUNNEL_KEY_ENC_KEY_ID]) {
+			ret = -EINVAL;
+			goto err_out;
+		}
+
+		key_id = key32_to_tunnel_id(nla_get_be32(tb[TCA_TUNNEL_KEY_ENC_KEY_ID]));
+
+		if (tb[TCA_TUNNEL_KEY_ENC_IPV4_SRC] &&
+		    tb[TCA_TUNNEL_KEY_ENC_IPV4_DST]) {
+			__be32 saddr;
+			__be32 daddr;
+
+			saddr = nla_get_be32(tb[TCA_TUNNEL_KEY_ENC_IPV4_SRC]);
+			daddr = nla_get_be32(tb[TCA_TUNNEL_KEY_ENC_IPV4_DST]);
+
+			metadata = _ip_tun_set_dst(saddr, daddr, 0, 0,
+						   TUNNEL_KEY, key_id, 0);
+		} else if (tb[TCA_TUNNEL_KEY_ENC_IPV6_SRC] &&
+			   tb[TCA_TUNNEL_KEY_ENC_IPV6_DST]) {
+			struct in6_addr saddr;
+			struct in6_addr daddr;
+
+			saddr = nla_get_in6_addr(tb[TCA_TUNNEL_KEY_ENC_IPV6_SRC]);
+			daddr = nla_get_in6_addr(tb[TCA_TUNNEL_KEY_ENC_IPV6_DST]);
+
+			metadata = _ipv6_tun_set_dst(saddr, daddr, 0, 0, 0,
+						     TUNNEL_KEY, key_id, 0);
+		}
+
+		if (!metadata) {
+			ret = -EINVAL;
+			goto err_out;
+		}
+
+		metadata->u.tun_info.mode |= IP_TUNNEL_INFO_TX;
+		break;
+	default:
+		goto err_out;
+	}
+
+	if (!exists) {
+		ret = tcf_hash_create(tn, parm->index, est, a,
+				      &act_tunnel_key_ops, bind, false);
+		if (ret)
+			return ret;
+
+		ret = ACT_P_CREATED;
+	} else {
+		tcf_hash_release(*a, bind);
+		if (!ovr)
+			return -EEXIST;
+	}
+
+	t = to_tunnel_key(*a);
+
+	spin_lock_bh(&t->tcf_lock);
+
+	t->tcf_action = parm->action;
+	t->tcft_action = parm->t_action;
+	t->tcft_enc_metadata = metadata;
+
+	spin_unlock_bh(&t->tcf_lock);
+
+	if (ret == ACT_P_CREATED)
+		tcf_hash_insert(tn, *a);
+
+	return ret;
+
+err_out:
+	if (exists)
+		tcf_hash_release(*a, bind);
+	return ret;
+}
+
+static void tunnel_key_release(struct tc_action *a, int bind)
+{
+	struct tcf_tunnel_key *t = to_tunnel_key(a);
+
+	if (t->tcft_action == TCA_TUNNEL_KEY_ACT_SET)
+		dst_release(&t->tcft_enc_metadata->dst);
+}
+
+static int tunnel_key_dump_addresses(struct sk_buff *skb,
+				     const struct ip_tunnel_info *info)
+{
+	unsigned short family = ip_tunnel_info_af(info);
+
+	if (family == AF_INET) {
+		__be32 saddr = info->key.u.ipv4.src;
+		__be32 daddr = info->key.u.ipv4.dst;
+
+		if (!nla_put_be32(skb, TCA_TUNNEL_KEY_ENC_IPV4_SRC, saddr) &&
+		    !nla_put_be32(skb, TCA_TUNNEL_KEY_ENC_IPV4_DST, daddr))
+			return 0;
+	}
+
+	if (family == AF_INET6) {
+		const struct in6_addr *saddr6 = &info->key.u.ipv6.src;
+		const struct in6_addr *daddr6 = &info->key.u.ipv6.dst;
+
+		if (!nla_put_in6_addr(skb,
+				      TCA_TUNNEL_KEY_ENC_IPV6_SRC, saddr6) &&
+		    !nla_put_in6_addr(skb,
+				      TCA_TUNNEL_KEY_ENC_IPV6_DST, daddr6))
+			return 0;
+	}
+
+	return -EINVAL;
+}
+
+static int tunnel_key_dump(struct sk_buff *skb, struct tc_action *a,
+			   int bind, int ref)
+{
+	unsigned char *b = skb_tail_pointer(skb);
+	struct tcf_tunnel_key *t = to_tunnel_key(a);
+	struct tc_tunnel_key opt = {
+		.index    = t->tcf_index,
+		.refcnt   = t->tcf_refcnt - ref,
+		.bindcnt  = t->tcf_bindcnt - bind,
+		.action   = t->tcf_action,
+		.t_action = t->tcft_action,
+	};
+	struct tcf_t tm;
+
+	if (nla_put(skb, TCA_TUNNEL_KEY_PARMS, sizeof(opt), &opt))
+		goto nla_put_failure;
+
+	if (t->tcft_action == TCA_TUNNEL_KEY_ACT_SET) {
+		struct ip_tunnel_key *key =
+			&t->tcft_enc_metadata->u.tun_info.key;
+		__be32 key_id = tunnel_id_to_key32(key->tun_id);
+
+		if (nla_put_be32(skb, TCA_TUNNEL_KEY_ENC_KEY_ID, key_id) ||
+		    tunnel_key_dump_addresses(skb, &t->tcft_enc_metadata->u.tun_info))
+			goto nla_put_failure;
+	}
+
+	tcf_tm_dump(&tm, &t->tcf_tm);
+	if (nla_put_64bit(skb, TCA_TUNNEL_KEY_TM, sizeof(tm),
+			  &tm, TCA_TUNNEL_KEY_PAD))
+		goto nla_put_failure;
+
+	return skb->len;
+
+nla_put_failure:
+	nlmsg_trim(skb, b);
+	return -1;
+}
+
+static int tunnel_key_walker(struct net *net, struct sk_buff *skb,
+			     struct netlink_callback *cb, int type,
+			     const struct tc_action_ops *ops)
+{
+	struct tc_action_net *tn = net_generic(net, tunnel_key_net_id);
+
+	return tcf_generic_walker(tn, skb, cb, type, ops);
+}
+
+static int tunnel_key_search(struct net *net, struct tc_action **a, u32 index)
+{
+	struct tc_action_net *tn = net_generic(net, tunnel_key_net_id);
+
+	return tcf_hash_search(tn, a, index);
+}
+
+static struct tc_action_ops act_tunnel_key_ops = {
+	.kind		=	"tunnel_key",
+	.type		=	TCA_ACT_TUNNEL_KEY,
+	.owner		=	THIS_MODULE,
+	.act		=	tunnel_key_act,
+	.dump		=	tunnel_key_dump,
+	.init		=	tunnel_key_init,
+	.cleanup	=	tunnel_key_release,
+	.walk		=	tunnel_key_walker,
+	.lookup		=	tunnel_key_search,
+	.size		=	sizeof(struct tcf_tunnel_key),
+};
+
+static __net_init int tunnel_key_init_net(struct net *net)
+{
+	struct tc_action_net *tn = net_generic(net, tunnel_key_net_id);
+
+	return tc_action_net_init(tn, &act_tunnel_key_ops, TUNNEL_KEY_TAB_MASK);
+}
+
+static void __net_exit tunnel_key_exit_net(struct net *net)
+{
+	struct tc_action_net *tn = net_generic(net, tunnel_key_net_id);
+
+	tc_action_net_exit(tn);
+}
+
+static struct pernet_operations tunnel_key_net_ops = {
+	.init = tunnel_key_init_net,
+	.exit = tunnel_key_exit_net,
+	.id   = &tunnel_key_net_id,
+	.size = sizeof(struct tc_action_net),
+};
+
+static int __init tunnel_key_init_module(void)
+{
+	return tcf_register_action(&act_tunnel_key_ops, &tunnel_key_net_ops);
+}
+
+static void __exit tunnel_key_cleanup_module(void)
+{
+	tcf_unregister_action(&act_tunnel_key_ops, &tunnel_key_net_ops);
+}
+
+module_init(tunnel_key_init_module);
+module_exit(tunnel_key_cleanup_module);
+
+MODULE_AUTHOR("Amir Vadai <amir@vadai.me>");
+MODULE_DESCRIPTION("ip tunnel manipulation actions");
+MODULE_LICENSE("GPL v2");
-- 
1.8.3.1


* Re: [PATCH net-next V3 2/4] net/dst: Utility functions to build dst_metadata without supplying an skb
  2016-08-25 16:13 ` [PATCH net-next V3 2/4] net/dst: Utility functions to build dst_metadata without supplying an skb Hadar Hen Zion
@ 2016-08-25 16:40   ` Shmulik Ladkani
  2016-08-26  6:14     ` Or Gerlitz
  2016-08-26 10:31     ` Jiri Benc
  0 siblings, 2 replies; 23+ messages in thread
From: Shmulik Ladkani @ 2016-08-25 16:40 UTC (permalink / raw)
  To: Hadar Hen Zion
  Cc: David S. Miller, netdev, Jiri Pirko, Jiri Benc, Jamal Hadi Salim,
	Tom Herbert, Or Gerlitz, Amir Vadai, Amir Vadai

Hi,

On Thu, 25 Aug 2016 19:13:45 +0300 Hadar Hen Zion <hadarh@mellanox.com> wrote:
> From: Amir Vadai <amir@vadai.me>
> 
> Extract _ip_tun_rx_dst() and _ipv6_tun_rx_dst() out of ip_tun_rx_dst()
> and ipv6_tun_rx_dst(), to be used without supplying an skb.

Per this v3, the newly introduced helpers are named _ip_tun_set_dst and
_ipv6_tun_set_dst - better alter the log message to reflect that.

> +static inline struct metadata_dst *
> +_ipv6_tun_set_dst(const struct in6_addr saddr, const struct in6_addr daddr,
> +		  __u8 tos, __u8 ttl, __be32 label, __be16 flags,
> +		  __be64 tunnel_id, int md_size)
> +{

Any reason for not passing in6_addr pointers, as suggested?
This is the common practice for ipv6 address parameters.
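
For illustration only, a minimal userspace sketch of the pointer-passing
convention. The names tun_key_sketch and tun_key_set_v6 are made up for
this example and are not the helper from the patch:

```c
#include <netinet/in.h>

/* Hypothetical stand-in for the IPv6 part of a tunnel key. */
struct tun_key_sketch {
	struct in6_addr src;
	struct in6_addr dst;
};

/* The addresses arrive as const pointers, so the two 16-byte structs are
 * not passed by value; they are copied once, into the key itself.
 */
static void tun_key_set_v6(struct tun_key_sketch *key,
			   const struct in6_addr *saddr,
			   const struct in6_addr *daddr)
{
	key->src = *saddr;
	key->dst = *daddr;
}
```

A caller would pass &saddr and &daddr of its own local variables.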

Thanks,
Shmulik


* Re: [PATCH net-next V3 4/4] net/sched: Introduce act_tunnel_key
  2016-08-25 16:13 ` [PATCH net-next V3 4/4] net/sched: Introduce act_tunnel_key Hadar Hen Zion
@ 2016-08-25 16:52   ` Shmulik Ladkani
  2016-08-25 17:48   ` Eric Dumazet
  2016-08-26 11:13   ` Jiri Benc
  2 siblings, 0 replies; 23+ messages in thread
From: Shmulik Ladkani @ 2016-08-25 16:52 UTC (permalink / raw)
  To: Hadar Hen Zion
  Cc: David S. Miller, netdev, Jiri Pirko, Jiri Benc, Jamal Hadi Salim,
	Tom Herbert, Or Gerlitz, Amir Vadai, Amir Vadai

Hi,

On Thu, 25 Aug 2016 19:13:47 +0300 Hadar Hen Zion <hadarh@mellanox.com> wrote:
> +static int tunnel_key_act(struct sk_buff *skb, const struct tc_action *a,
> +			  struct tcf_result *res)
> +{
> +	struct tcf_tunnel_key *t = to_tunnel_key(a);
> +	int action;
> +
> +	spin_lock(&t->tcf_lock);
> +	tcf_lastuse_update(&t->tcf_tm);
> +	bstats_update(&t->tcf_bstats, skb);
> +	action = t->tcf_action;
> +
> +	switch (t->tcft_action) {
> +	case TCA_TUNNEL_KEY_ACT_RELEASE:
> +		skb_dst_set_noref(skb, NULL);
> +		break;
> +	case TCA_TUNNEL_KEY_ACT_SET:
> +		skb_dst_set_noref(skb, &t->tcft_enc_metadata->dst);
> +		break;

Two additional questions:
 - No need to perform a 'skb_dst_drop(skb)' prior to the dst set calls?
 - Why is there no need to take a reference on tcft_enc_metadata->dst?

Besides that,

Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>

Thanks!


* Re: [PATCH net-next V3 4/4] net/sched: Introduce act_tunnel_key
  2016-08-25 16:13 ` [PATCH net-next V3 4/4] net/sched: Introduce act_tunnel_key Hadar Hen Zion
  2016-08-25 16:52   ` Shmulik Ladkani
@ 2016-08-25 17:48   ` Eric Dumazet
  2016-08-26  6:16     ` Or Gerlitz
  2016-08-26 18:26     ` Cong Wang
  2016-08-26 11:13   ` Jiri Benc
  2 siblings, 2 replies; 23+ messages in thread
From: Eric Dumazet @ 2016-08-25 17:48 UTC (permalink / raw)
  To: Hadar Hen Zion
  Cc: David S. Miller, netdev, Jiri Pirko, Jiri Benc, Jamal Hadi Salim,
	Shmulik Ladkani, Tom Herbert, Or Gerlitz, Amir Vadai, Amir Vadai

On Thu, 2016-08-25 at 19:13 +0300, Hadar Hen Zion wrote:
> From: Amir Vadai <amir@vadai.me>
> 
> This action could be used before redirecting packets to a shared tunnel
> device, or when redirecting packets arriving from such a device.
> 
> The action will release the metadata created by the tunnel device
> (decap), or set the metadata with the specified values for encap
> operation.
> 
> For example, the following flower filter will forward all ICMP packets
> destined to 11.11.11.2 through the shared vxlan device 'vxlan0'. Before
> redirecting, metadata for the vxlan tunnel is created using the
> tunnel_key action and its arguments:

....

> +
> +static int tunnel_key_act(struct sk_buff *skb, const struct tc_action *a,
> +			  struct tcf_result *res)
> +{
> +	struct tcf_tunnel_key *t = to_tunnel_key(a);
> +	int action;
> +
> +	spin_lock(&t->tcf_lock);
> +	tcf_lastuse_update(&t->tcf_tm);
> +	bstats_update(&t->tcf_bstats, skb);
> +	action = t->tcf_action;
> +
> +	switch (t->tcft_action) {
> +	case TCA_TUNNEL_KEY_ACT_RELEASE:
> +		skb_dst_set_noref(skb, NULL);
> +		break;
> +	case TCA_TUNNEL_KEY_ACT_SET:
> +		skb_dst_set_noref(skb, &t->tcft_enc_metadata->dst);
> +		break;
> +	default:
> +		WARN_ONCE(1, "Bad tunnel_key action.\n");
> +		break;
> +	}
> +
> +	spin_unlock(&t->tcf_lock);
> +	return action;


Please find a better way than using a spinlock in this hot path.

Maybe looking at 
2ee22a90c7afac265bb6f7abea610b938195e2b8 net_sched: act_mirred: remove spinlock in fast path
56e5d1ca183d8616fab377d7d466c244b4dbb3b9 net_sched: act_gact: remove spinlock in fast path
8f2ae965b7ef4f4ddab6110f06388e270723d694 net_sched: act_gact: read tcfg_ptype once
cc6510a9504fd3c03d76bd68d99653148342eecc net_sched: act_gact: use a separate packet counters for gact_determ()
cef5ecf96b28dc91c4e9f398a336c578fb9e1a0c net_sched: act_gact: make tcfg_pval non zero
519c818e8fb646eef1e8bfedd18519bec47bc9a9 net: sched: add percpu stats to actions


* Re: [PATCH net-next V3 2/4] net/dst: Utility functions to build dst_metadata without supplying an skb
  2016-08-25 16:40   ` Shmulik Ladkani
@ 2016-08-26  6:14     ` Or Gerlitz
  2016-08-26 10:31     ` Jiri Benc
  1 sibling, 0 replies; 23+ messages in thread
From: Or Gerlitz @ 2016-08-26  6:14 UTC (permalink / raw)
  To: Shmulik Ladkani
  Cc: Hadar Hen Zion, David S. Miller, Linux Netdev List, Jiri Pirko,
	Jiri Benc, Jamal Hadi Salim, Tom Herbert, Or Gerlitz, Amir Vadai,
	Amir Vadai

On Thu, Aug 25, 2016 at 7:40 PM, Shmulik Ladkani
<shmulik.ladkani@gmail.com> wrote:
> Hi,
>
> On Thu, 25 Aug 2016 19:13:45 +0300 Hadar Hen Zion <hadarh@mellanox.com> wrote:
>> From: Amir Vadai <amir@vadai.me>
>>
>> Extract _ip_tun_rx_dst() and _ipv6_tun_rx_dst() out of ip_tun_rx_dst()
>> and ipv6_tun_rx_dst(), to be used without supplying an skb.
>
> Per this v3, the newly introduced helpers are named _ip_tun_set_dst and
> _ipv6_tun_set_dst - better alter the log message to reflect that.

sure, will fix that

>> +static inline struct metadata_dst *
>> +_ipv6_tun_set_dst(const struct in6_addr saddr, const struct in6_addr daddr,
>> +               __u8 tos, __u8 ttl, __be32 label, __be16 flags,
>> +               __be64 tunnel_id, int md_size)
>> +{
>
> Any reason for not passing in6_addr pointers, as suggested?

guess not

> This is the common practice for ipv6 address parameters.

will do


* Re: [PATCH net-next V3 4/4] net/sched: Introduce act_tunnel_key
  2016-08-25 17:48   ` Eric Dumazet
@ 2016-08-26  6:16     ` Or Gerlitz
  2016-08-26 18:26     ` Cong Wang
  1 sibling, 0 replies; 23+ messages in thread
From: Or Gerlitz @ 2016-08-26  6:16 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Hadar Hen Zion, David S. Miller, Linux Netdev List, Jiri Pirko,
	Jiri Benc, Jamal Hadi Salim, Shmulik Ladkani, Tom Herbert,
	Or Gerlitz, Amir Vadai, Amir Vadai

On Thu, Aug 25, 2016 at 8:48 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2016-08-25 at 19:13 +0300, Hadar Hen Zion wrote:
>> From: Amir Vadai <amir@vadai.me>

>> +static int tunnel_key_act(struct sk_buff *skb, const struct tc_action *a,
>> +                       struct tcf_result *res)
>> +{
>> +     struct tcf_tunnel_key *t = to_tunnel_key(a);
>> +     int action;
>> +
>> +     spin_lock(&t->tcf_lock);
>> +     tcf_lastuse_update(&t->tcf_tm);
>> +     bstats_update(&t->tcf_bstats, skb);
>> +     action = t->tcf_action;
>> +
>> +     switch (t->tcft_action) {
>> +     case TCA_TUNNEL_KEY_ACT_RELEASE:
>> +             skb_dst_set_noref(skb, NULL);
>> +             break;
>> +     case TCA_TUNNEL_KEY_ACT_SET:
>> +             skb_dst_set_noref(skb, &t->tcft_enc_metadata->dst);
>> +             break;
>> +     default:
>> +             WARN_ONCE(1, "Bad tunnel_key action.\n");
>> +             break;
>> +     }
>> +
>> +     spin_unlock(&t->tcf_lock);
>> +     return action;

> Please find a better way than using a spinlock in this hot path.
> Maybe looking at
> 2ee22a90c7afac265bb6f7abea610b938195e2b8 net_sched: act_mirred: remove spinlock in fast path
[...]

okay, thanks for the heads up, will look there


* Re: [PATCH net-next V3 1/4] net/ip_tunnels: Introduce tunnel_id_to_key32() and key32_to_tunnel_id()
  2016-08-25 16:13 ` [PATCH net-next V3 1/4] net/ip_tunnels: Introduce tunnel_id_to_key32() and key32_to_tunnel_id() Hadar Hen Zion
@ 2016-08-26 10:26   ` Jiri Benc
  0 siblings, 0 replies; 23+ messages in thread
From: Jiri Benc @ 2016-08-26 10:26 UTC (permalink / raw)
  To: Hadar Hen Zion
  Cc: David S. Miller, netdev, Jiri Pirko, Jamal Hadi Salim,
	Shmulik Ladkani, Tom Herbert, Or Gerlitz, Amir Vadai, Amir Vadai

On Thu, 25 Aug 2016 19:13:44 +0300, Hadar Hen Zion wrote:
> From: Amir Vadai <amir@vadai.me>
> 
> Add utility functions to convert a 32-bit key into a 64-bit tunnel ID and
> vice versa.
> These functions will be used instead of cloning code in GRE and VXLAN,
> and in tc act_tunnel_key which will be introduced in a following patch in
> this patchset.
> 
> Signed-off-by: Amir Vadai <amir@vadai.me>
> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
> Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>

Acked-by: Jiri Benc <jbenc@redhat.com>
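
For readers following along, a rough userspace sketch of what such a
conversion amounts to. It assumes the 32-bit key sits in the low-order
half of the 64-bit network-order tunnel ID; that layout and both function
names are assumptions for illustration, not code taken from the patch:

```c
#include <stdint.h>
#include <string.h>

/* Widen: copy the 4 key bytes (network order) into the low-order half of
 * an 8-byte network-order tunnel ID and zero the high-order half.
 */
static void key32_to_id64_sketch(const uint8_t key[4], uint8_t id[8])
{
	memset(id, 0, 4);
	memcpy(id + 4, key, 4);
}

/* Narrow: take the low-order half of the tunnel ID back out as the key. */
static void id64_to_key32_sketch(const uint8_t id[8], uint8_t key[4])
{
	memcpy(key, id + 4, 4);
}
```

Working on byte arrays keeps the sketch independent of host endianness;
a key of 11 widens to an ID whose last byte is 11 and narrows back to
the same key.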


* Re: [PATCH net-next V3 2/4] net/dst: Utility functions to build dst_metadata without supplying an skb
  2016-08-25 16:40   ` Shmulik Ladkani
  2016-08-26  6:14     ` Or Gerlitz
@ 2016-08-26 10:31     ` Jiri Benc
  1 sibling, 0 replies; 23+ messages in thread
From: Jiri Benc @ 2016-08-26 10:31 UTC (permalink / raw)
  To: Shmulik Ladkani
  Cc: Hadar Hen Zion, David S. Miller, netdev, Jiri Pirko,
	Jamal Hadi Salim, Tom Herbert, Or Gerlitz, Amir Vadai,
	Amir Vadai

On Thu, 25 Aug 2016 19:40:50 +0300, Shmulik Ladkani wrote:
> On Thu, 25 Aug 2016 19:13:45 +0300 Hadar Hen Zion <hadarh@mellanox.com> wrote:
> > From: Amir Vadai <amir@vadai.me>
> > 
> > Extract _ip_tun_rx_dst() and _ipv6_tun_rx_dst() out of ip_tun_rx_dst()
> > and ipv6_tun_rx_dst(), to be used without supplying an skb.
> 
> > Per this v3, the newly introduced helpers are named _ip_tun_set_dst and
> > _ipv6_tun_set_dst - better alter the log message to reflect that.

And please rename them to start with double underscore to match the
coding style of the rest of the kernel.

Thanks!

 Jiri


* Re: [PATCH net-next V3 3/4] net/sched: cls_flower: Classify packet in ip tunnels
  2016-08-25 16:13 ` [PATCH net-next V3 3/4] net/sched: cls_flower: Classify packet in ip tunnels Hadar Hen Zion
@ 2016-08-26 10:46   ` Jiri Benc
  0 siblings, 0 replies; 23+ messages in thread
From: Jiri Benc @ 2016-08-26 10:46 UTC (permalink / raw)
  To: Hadar Hen Zion
  Cc: David S. Miller, netdev, Jiri Pirko, Jamal Hadi Salim,
	Shmulik Ladkani, Tom Herbert, Or Gerlitz, Amir Vadai, Amir Vadai

Just a nit,

On Thu, 25 Aug 2016 19:13:46 +0300, Hadar Hen Zion wrote:
> $ tc filter add dev vxlan0 protocol ip parent ffff: \
>     flower \
>       enc_src_ip 11.11.0.2 \
>       enc_dst_ip 11.11.0.1 \
>       enc_key_id 11 \
>       dst_ip 11.11.11.1 \
>     action iptunnel decap \

             ^ this is now called tunnel_key :-)

>     action mirred egress redirect dev vnet0
> 
> The action iptunnel, will be introduced in the next patch in this

             ^ and here.

Thanks!

 Jiri


* Re: [PATCH net-next V3 4/4] net/sched: Introduce act_tunnel_key
  2016-08-25 16:13 ` [PATCH net-next V3 4/4] net/sched: Introduce act_tunnel_key Hadar Hen Zion
  2016-08-25 16:52   ` Shmulik Ladkani
  2016-08-25 17:48   ` Eric Dumazet
@ 2016-08-26 11:13   ` Jiri Benc
  2 siblings, 0 replies; 23+ messages in thread
From: Jiri Benc @ 2016-08-26 11:13 UTC (permalink / raw)
  To: Hadar Hen Zion
  Cc: David S. Miller, netdev, Jiri Pirko, Jamal Hadi Salim,
	Shmulik Ladkani, Tom Herbert, Or Gerlitz, Amir Vadai, Amir Vadai

On Thu, 25 Aug 2016 19:13:47 +0300, Hadar Hen Zion wrote:
> +static int tunnel_key_act(struct sk_buff *skb, const struct tc_action *a,
> +			  struct tcf_result *res)
> +{
> +	struct tcf_tunnel_key *t = to_tunnel_key(a);
> +	int action;
> +
> +	spin_lock(&t->tcf_lock);
> +	tcf_lastuse_update(&t->tcf_tm);
> +	bstats_update(&t->tcf_bstats, skb);
> +	action = t->tcf_action;
> +
> +	switch (t->tcft_action) {
> +	case TCA_TUNNEL_KEY_ACT_RELEASE:
> +		skb_dst_set_noref(skb, NULL);
> +		break;

You're leaking dst here.

> +	case TCA_TUNNEL_KEY_ACT_SET:
> +		skb_dst_set_noref(skb, &t->tcft_enc_metadata->dst);
> +		break;

And here, too. Also, what protects the tcft_enc_metadata->dst from
being freed if there's a skb queued and the action is removed? Seems
that tunnel_key_release just happily frees it. You probably need to
take a reference here.
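
To make the lifetime concern concrete, a minimal userspace sketch of the
hold/release pattern using C11 atomics. struct md_sketch and the md_*
helpers are hypothetical stand-ins, not kernel API; in kernel terms this
would presumably map to dst_clone() together with skb_dst_set() instead
of skb_dst_set_noref():

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for the metadata dst. */
struct md_sketch {
	atomic_int refcnt;
};

static struct md_sketch *md_alloc(void)
{
	struct md_sketch *md = malloc(sizeof(*md));

	if (md)
		atomic_init(&md->refcnt, 1);	/* the action's own reference */
	return md;
}

/* Taken on behalf of each packet the object is attached to. */
static void md_hold(struct md_sketch *md)
{
	atomic_fetch_add(&md->refcnt, 1);
}

/* Drops one reference; returns 1 if this call freed the object. */
static int md_release(struct md_sketch *md)
{
	if (atomic_fetch_sub(&md->refcnt, 1) == 1) {
		free(md);
		return 1;
	}
	return 0;
}
```

With the extra hold, removing the action only drops the action's
reference; the object stays alive until the queued packet drops its own.
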

> +		if (tb[TCA_TUNNEL_KEY_ENC_IPV4_SRC] &&
> +		    tb[TCA_TUNNEL_KEY_ENC_IPV4_DST]) {
> +			__be32 saddr;
> +			__be32 daddr;
> +
> +			saddr = nla_get_be32(tb[TCA_TUNNEL_KEY_ENC_IPV4_SRC]);
> +			daddr = nla_get_be32(tb[TCA_TUNNEL_KEY_ENC_IPV4_DST]);

Use nla_get_in_addr, please.

> +	if (family == AF_INET) {
> +		__be32 saddr = info->key.u.ipv4.src;
> +		__be32 daddr = info->key.u.ipv4.dst;
> +
> +		if (!nla_put_be32(skb, TCA_TUNNEL_KEY_ENC_IPV4_SRC, saddr) &&
> +		    !nla_put_be32(skb, TCA_TUNNEL_KEY_ENC_IPV4_DST, daddr))

nla_put_in_addr

Thanks!

 Jiri


* Re: [PATCH net-next V3 4/4] net/sched: Introduce act_tunnel_key
  2016-08-25 17:48   ` Eric Dumazet
  2016-08-26  6:16     ` Or Gerlitz
@ 2016-08-26 18:26     ` Cong Wang
  2016-08-26 19:16       ` Eric Dumazet
  1 sibling, 1 reply; 23+ messages in thread
From: Cong Wang @ 2016-08-26 18:26 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Hadar Hen Zion, David S. Miller, Linux Kernel Network Developers,
	Jiri Pirko, Jiri Benc, Jamal Hadi Salim, Shmulik Ladkani,
	Tom Herbert, Or Gerlitz, Amir Vadai, Amir Vadai

On Thu, Aug 25, 2016 at 10:48 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> Please find a better way than using a spinlock in this hot path.
>
> Maybe looking at
> 2ee22a90c7afac265bb6f7abea610b938195e2b8 net_sched: act_mirred: remove spinlock in fast path
> 56e5d1ca183d8616fab377d7d466c244b4dbb3b9 net_sched: act_gact: remove spinlock in fast path

This is not necessary at the moment, because:

1) Currently only a few actions are lockless, and they are questionable:
as we already discussed, there could be a race condition when you
modify an existing action.

2) We need to change the tc action API in order to fully support RCU,
which is what I have been working on these days. I should come up
with something next Monday (if not this weekend).

So for this patchset, using a spinlock is fine, just as in many other actions.
I will take care of it later.


* Re: [PATCH net-next V3 4/4] net/sched: Introduce act_tunnel_key
  2016-08-26 18:26     ` Cong Wang
@ 2016-08-26 19:16       ` Eric Dumazet
  2016-08-29  5:04         ` Cong Wang
  0 siblings, 1 reply; 23+ messages in thread
From: Eric Dumazet @ 2016-08-26 19:16 UTC (permalink / raw)
  To: Cong Wang
  Cc: Hadar Hen Zion, David S. Miller, Linux Kernel Network Developers,
	Jiri Pirko, Jiri Benc, Jamal Hadi Salim, Shmulik Ladkani,
	Tom Herbert, Or Gerlitz, Amir Vadai, Amir Vadai

On Fri, 2016-08-26 at 11:26 -0700, Cong Wang wrote:
> On Thu, Aug 25, 2016 at 10:48 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > Please find a better way than using a spinlock in this hot path.
> >
> > Maybe looking at
> > 2ee22a90c7afac265bb6f7abea610b938195e2b8 net_sched: act_mirred: remove spinlock in fast path
> > 56e5d1ca183d8616fab377d7d466c244b4dbb3b9 net_sched: act_gact: remove spinlock in fast path
> 
> This is not necessary at the moment, because:
> 
> 1) Currently there are only a few actions using lockless, and they are
> questionable, as we already discussed before, there could be some
> race condition when you modify an existing action.

There is no fundamental issue with a race condition.

Sure, there are races, but they have no serious effect.

Feel free to send a fix if you really have time to spare.

> 
> 2) We need to change the tc action API in order to fully support RCU,
> which is what I have been working on these days. I should come up
> with something next Monday (if not this weekend).
> 
> So for this patchset, using spinlock is fine, just as many other actions.
> I will take care of it later.

This is _not_ fine.

We are in 2016, not in 1995 anymore.

We are not adding a spinlock in a hot path unless absolutely needed.

With a multi-queue NIC, this spinlock is going to hurt performance so much
that this action won't be used by any serious user.

Here, it is absolutely trivial to use RCU and/or percpu counters.
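
As a rough illustration of the percpu-counter idea, a minimal userspace
sketch follows. NR_SLOTS, struct pcpu_stats_sketch and both helpers are
invented for this example and are not the kernel's percpu stats API; the
sketch also ignores how a concurrent reader gets a consistent 64-bit
snapshot:

```c
#include <stdint.h>

#define NR_SLOTS 4	/* assumed number of CPUs, for illustration only */

/* One slot per CPU, padded to 64 bytes to limit false sharing. */
struct pcpu_stats_sketch {
	struct {
		uint64_t packets;
		uint64_t bytes;
		char pad[64 - 2 * sizeof(uint64_t)];
	} slot[NR_SLOTS];
};

/* Fast path: each CPU touches only its own slot, so no lock is taken. */
static void stats_update(struct pcpu_stats_sketch *s, unsigned int cpu,
			 uint64_t len)
{
	s->slot[cpu].packets++;
	s->slot[cpu].bytes += len;
}

/* Slow path (dump): fold all slots into one total. */
static void stats_read(const struct pcpu_stats_sketch *s,
		       uint64_t *packets, uint64_t *bytes)
{
	unsigned int i;

	*packets = 0;
	*bytes = 0;
	for (i = 0; i < NR_SLOTS; i++) {
		*packets += s->slot[i].packets;
		*bytes += s->slot[i].bytes;
	}
}
```

The cost moves from the per-packet path to the rare dump path, which has
to walk every slot.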


* Re: [PATCH net-next V3 4/4] net/sched: Introduce act_tunnel_key
  2016-08-26 19:16       ` Eric Dumazet
@ 2016-08-29  5:04         ` Cong Wang
  2016-08-30 11:03           ` Amir Vadai
  0 siblings, 1 reply; 23+ messages in thread
From: Cong Wang @ 2016-08-29  5:04 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Hadar Hen Zion, David S. Miller, Linux Kernel Network Developers,
	Jiri Pirko, Jiri Benc, Jamal Hadi Salim, Shmulik Ladkani,
	Tom Herbert, Or Gerlitz, Amir Vadai, Amir Vadai

On Fri, Aug 26, 2016 at 12:16 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2016-08-26 at 11:26 -0700, Cong Wang wrote:
>> 1) Currently there are only a few actions using lockless, and they are
>> questionable, as we already discussed before, there could be some
>> race condition when you modify an existing action.
>
> There is no fundamental issue with a race condition.

For the mirred action, maybe. As we already discussed, the more
complex an action is, the harder it is to make it lockless in your
way (that is, not using RCU).

>
> Sure, there are races, but they have no serious effect.
>
> Feel free to send a fix if you really have time to spare.

It's because the code is written by you?

I am surprised how you try to hide your own problem in
such a way...


>
>>
>> 2) We need to change the tc action API in order to fully support RCU,
>> which is what I have been working on these days. I should come up
>> with something next Monday (if not this weekend).
>>
>> So for this patchset, using spinlock is fine, just as many other actions.
>> I will take care of it later.
>
> This is _not_ fine.


OK, so where are your patches to make the rest of the actions
lockless?


>
> We are in 2016, not in 1995 anymore.
>

Fair enough, sounds like all actions are already lockless in the
fast path now in 2016; you know this is not true...


> We are not adding a spinlock in a hot path unless absolutely needed.

If it is bug-free, yes, I am totally with you. I care about correctness
more than any performance.


>
> With a multi-queue NIC, this spinlock is going to hurt performance so much
> that this action won't be used by any serious user.

We used the mirred action even before you made it lockless.


>
> Here, it is absolutely trivial to use RCU and/or percpu counters.

Sounds like we don't need any API change, so why not go ahead
and try it? Please do teach me how to modify an existing
action in a lockless way without changing any API (and of course
it needs to be bug-free). I am very happy to learn your "trivial" way
to fix this, since I don't have any trivial fix.

Please, stop bullsh*t, show me your trivial code.


* Re: [PATCH net-next V3 4/4] net/sched: Introduce act_tunnel_key
  2016-08-29  5:04         ` Cong Wang
@ 2016-08-30 11:03           ` Amir Vadai
  2016-08-30 11:39             ` Amir Vadai
  2016-08-30 12:05             ` Jamal Hadi Salim
  0 siblings, 2 replies; 23+ messages in thread
From: Amir Vadai @ 2016-08-30 11:03 UTC (permalink / raw)
  To: Cong Wang, Eric Dumazet
  Cc: Hadar Hen Zion, David S. Miller, Linux Kernel Network Developers,
	Jiri Pirko, Jiri Benc, Jamal Hadi Salim, Shmulik Ladkani,
	Tom Herbert, Or Gerlitz, Amir Vadai

On Sun, Aug 28, 2016 at 10:04:21PM -0700, Cong Wang wrote:
> On Fri, Aug 26, 2016 at 12:16 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Fri, 2016-08-26 at 11:26 -0700, Cong Wang wrote:
> >> 1) Currently there are only a few actions using lockless, and they are
> >> questionable, as we already discussed before, there could be some
> >> race condition when you modify an existing action.
> >
> > There is no fundamental issue with a race condition.
> 
> For mirred action, maybe. As we already discussed, the more
> complex an action is, the harder to make it lockless in your
> way (that is, not using RCU)
> 
> >
> > Sure, there are races, but they have no serious effect.
> >
> > Feel free to send a fix if you really have time to spare.
> 
> It's because the code is written by you?
> 
> I am surprised how you try to hide your own problem in
> such a way...
> 
> 
> >
> >>
> >> 2) We need to change the tc action API in order to fully support RCU,
> >> which is what I have been working on these days. I should come up
> >> with something next Monday (if not this weekend).
> >>
> >> So for this patchset, using spinlock is fine, just as many other actions.
> >> I will take care of it later.
> >
> > This is _not_ fine.
> 
> 
> OK, so where are your patches to make the rest actions
> lockless?
> 
> 
> >
> > We are in 2016, not in 1995 anymore.
> >
> 
> Fair enough, sounds like all actions are already lockless in
> fast path now in 2016, you know this is not true...
> 
> 
> > We are not adding a spinlock in a hot path unless absolutely needed.
> 
> > If it is bug-free, yes, I am totally with you. I care about correctness
> more than any performance.
> 
> 
> >
> > > With a multi-queue NIC, this spinlock is going to hurt performance so much
> > > that this action won't be used by any serious user.
> 
> We have used mirred action even before you make it lockless.
> 
> 
> >
> > Here, it is absolutely trivial to use RCU and/or percpu counters.
> 
> Sounds like we don't need any API change, why not go ahead
> and try it? Please do teach me how to modify an existing
> action in a lockless way without changing any API (and of course
> needs to be bug-free), I am very happy to learn your "trivial" way
> to fix this, since I don't have any trivial fix.
> 
> Please, stop bullsh*t, show me your trivial code.

Regarding the specific action in this patchset, correct me if I'm wrong,
but I think that the lock could be removed safely.

When the action is modified during traffic, an existing tcft_enc_metadata
is not changed; instead, new metadata is allocated and the pointer is
replaced to point to the new one.
I just need to make sure that when changing an action from 'release'
into 'set', tcft_enc_metadata will be set before the action type is
changed: change the order of operations and add a memory barrier.
Here is some pseudocode to explain:

metadata_new = new allocated metadata
metadata_old = t->tcft_enc_metadata

t->tcft_action = encapdecap

/* make sure the compiler won't swap the setting of tcft_action with
 * tcft_enc_metadata
 */
wmb()

t->tcft_enc_metadata = metadata_new
release metadata_old


This way, no need for lock between the init() and act() operations.

Please let me know if you see a problem with this approach.
I will also change the stats to be percpu.

Thanks,
Amir


* Re: [PATCH net-next V3 4/4] net/sched: Introduce act_tunnel_key
  2016-08-30 11:03           ` Amir Vadai
@ 2016-08-30 11:39             ` Amir Vadai
  2016-08-30 12:27               ` Eric Dumazet
  2016-08-30 12:05             ` Jamal Hadi Salim
  1 sibling, 1 reply; 23+ messages in thread
From: Amir Vadai @ 2016-08-30 11:39 UTC (permalink / raw)
  To: Cong Wang, Eric Dumazet
  Cc: Hadar Hen Zion, David S. Miller, Linux Kernel Network Developers,
	Jiri Pirko, Jiri Benc, Jamal Hadi Salim, Shmulik Ladkani,
	Tom Herbert, Or Gerlitz, Amir Vadai

On Tue, Aug 30, 2016 at 02:03:08PM +0300, Amir Vadai wrote:
> On Sun, Aug 28, 2016 at 10:04:21PM -0700, Cong Wang wrote:
> > On Fri, Aug 26, 2016 at 12:16 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > > On Fri, 2016-08-26 at 11:26 -0700, Cong Wang wrote:
> > >> 1) Currently there are only a few actions using lockless, and they are
> > >> questionable, as we already discussed before, there could be some
> > >> race condition when you modify an existing action.
> > >
> > > There is no fundamental issue with a race condition.
> > 
> > For mirred action, maybe. As we already discussed, the more
> > complex an action is, the harder to make it lockless in your
> > way (that is, not using RCU)
> > 
> > >
> > > Sure, there are races, but they have no serious effect.
> > >
> > > Feel free to send a fix if you really have time to spare.
> > 
> > It's because the code is written by you?
> > 
> > I am surprised how you try to hide your own problem in
> > such a way...
> > 
> > 
> > >
> > >>
> > >> 2) We need to change the tc action API in order to fully support RCU,
> > >> which is what I have been working on these days. I should come up
> > >> with something next Monday (if not this weekend).
> > >>
> > >> So for this patchset, using spinlock is fine, just as many other actions.
> > >> I will take care of it later.
> > >
> > > This is _not_ fine.
> > 
> > 
> > OK, so where are your patches to make the rest actions
> > lockless?
> > 
> > 
> > >
> > > We are in 2016, not in 1995 anymore.
> > >
> > 
> > Fair enough, sounds like all actions are already lockless in
> > fast path now in 2016, you know this is not true...
> > 
> > 
> > > We are not adding a spinlock in a hot path unless absolutely needed.
> > 
> > > If it is bug-free, yes, I am totally with you. I care about correctness
> > more than any performance.
> > 
> > 
> > >
> > > > With a multi-queue NIC, this spinlock is going to hurt performance so much
> > > > that this action won't be used by any serious user.
> > 
> > We have used mirred action even before you make it lockless.
> > 
> > 
> > >
> > > Here, it is absolutely trivial to use RCU and/or percpu counters.
> > 
> > Sounds like we don't need any API change, why not go ahead
> > and try it? Please do teach me how to modify an existing
> > action in a lockless way without changing any API (and of course
> > needs to be bug-free), I am very happy to learn your "trivial" way
> > to fix this, since I don't have any trivial fix.
> > 
> > Please, stop bullsh*t, show me your trivial code.
> 
> Regarding the specific action in this patchset, correct me if I'm wrong,
> but I think that the lock could be removed safely.
> 
> When the action is modified during traffic, an existing tcft_enc_metadata
> is not changed; instead, new metadata is allocated and the pointer is
> replaced to point to the new one.
> I just need to make sure that when changing an action from 'release'
> into 'set', tcft_enc_metadata will be set before the action type is
> changed: change the order of operations and add a memory barrier.
> Here is some pseudocode to explain:
> 
> metadata_new = new allocated metadata
> metadata_old = t->tcft_enc_metadata
> 

Oh - I had a typo here:
Need to set the metadata and only after that, set the action:

t->tcft_enc_metadata = metadata_new
wmb()
t->tcft_action = encapdecap
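
A minimal userspace sketch of this corrected ordering, written with C11
atomics rather than kernel primitives. All names are hypothetical. The
release store plays the role of wmb(), and the sketch assumes the reader
needs a matching acquire for the ordering to hold on its side; releasing
the old metadata safely is a separate problem and is not covered here:

```c
#include <stdatomic.h>
#include <stddef.h>

enum { ACT_RELEASE_SKETCH = 0, ACT_SET_SKETCH = 1 };

/* Hypothetical stand-in for the tunnel metadata. */
struct md_pub_sketch {
	int tunnel_id;
};

struct action_sketch {
	struct md_pub_sketch *_Atomic metadata;
	atomic_int action;
};

/* Writer (init path): store the metadata first, then the action type.
 * The release store keeps the metadata store from being reordered after it.
 */
static void action_switch_to_set(struct action_sketch *t,
				 struct md_pub_sketch *md)
{
	atomic_store_explicit(&t->metadata, md, memory_order_relaxed);
	atomic_store_explicit(&t->action, ACT_SET_SKETCH,
			      memory_order_release);
}

/* Reader (act path): the acquire load pairs with the release store, so a
 * reader that observes ACT_SET_SKETCH also observes the metadata pointer.
 */
static struct md_pub_sketch *action_get_md(struct action_sketch *t)
{
	if (atomic_load_explicit(&t->action, memory_order_acquire) !=
	    ACT_SET_SKETCH)
		return NULL;
	return atomic_load_explicit(&t->metadata, memory_order_relaxed);
}
```
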

> t->tcft_action = encapdecap
> 
> /* make sure the compiler won't swap the setting of tcft_action with
>  * tcft_enc_metadata
>  */
> wmb()
> 
> t->tcft_enc_metadata = metadata_new
> release metadata_old
> 
> 
> This way, no need for lock between the init() and act() operations.
> 
> Please let me know if you see a problem with this approach.
> I will also change the stats to be percpu.
> 
> Thanks,
> Amir
> 
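
The corrected ordering above (set the metadata, barrier, then set the
action) can be sketched in userspace with C11 atomics. This is only an
analogy under stated assumptions, not the kernel code: the release store
stands in for wmb(), the acquire load stands in for the matching
read-side barrier that the reader would also need, and struct metadata,
ACT_SET/ACT_RELEASE and the tunnel_key_*() helpers are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

enum { ACT_RELEASE = 0, ACT_SET = 1 };	/* hypothetical action codes */

struct metadata {			/* hypothetical stand-in for the tunnel metadata */
	int tunnel_id;
};

struct tunnel_key {
	_Atomic(struct metadata *) enc_metadata;
	atomic_int action;
};

static void tunnel_key_init(struct tunnel_key *t)
{
	atomic_init(&t->enc_metadata, NULL);
	atomic_init(&t->action, ACT_RELEASE);
}

/* Writer: store the metadata first, then publish the action type. */
static void tunnel_key_set(struct tunnel_key *t, struct metadata *md)
{
	atomic_store_explicit(&t->enc_metadata, md, memory_order_relaxed);
	/* Release store: the metadata store above cannot move below it. */
	atomic_store_explicit(&t->action, ACT_SET, memory_order_release);
}

/* Reader: returns the tunnel id for 'set', -1 for 'release'. */
static int tunnel_key_read(struct tunnel_key *t)
{
	struct metadata *md;

	/* Acquire load: pairs with the release store in the writer. */
	if (atomic_load_explicit(&t->action, memory_order_acquire) != ACT_SET)
		return -1;
	md = atomic_load_explicit(&t->enc_metadata, memory_order_relaxed);
	assert(md != NULL);	/* seeing ACT_SET implies the metadata is visible */
	return md->tunnel_id;
}
```

The sketch only covers the 'release' to 'set' transition. It does not say
when the old metadata may be freed while readers might still hold it.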


* Re: [PATCH net-next V3 4/4] net/sched: Introduce act_tunnel_key
  2016-08-30 11:03           ` Amir Vadai
  2016-08-30 11:39             ` Amir Vadai
@ 2016-08-30 12:05             ` Jamal Hadi Salim
  2016-08-30 13:17               ` Amir Vadai
  1 sibling, 1 reply; 23+ messages in thread
From: Jamal Hadi Salim @ 2016-08-30 12:05 UTC (permalink / raw)
  To: Amir Vadai, Cong Wang, Eric Dumazet
  Cc: Hadar Hen Zion, David S. Miller, Linux Kernel Network Developers,
	Jiri Pirko, Jiri Benc, Shmulik Ladkani, Tom Herbert, Or Gerlitz,
	Amir Vadai

On 16-08-30 07:03 AM, Amir Vadai wrote:
> On Sun, Aug 28, 2016 at 10:04:21PM -0700, Cong Wang wrote:
>> On Fri, Aug 26, 2016 at 12:16 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> On Fri, 2016-08-26 at 11:26 -0700, Cong Wang wrote:


> Regarding the specific action in this patchset, correct me if I'm wrong,
> but I think that the lock could be removed safely.
>

From what Eric suggested (refer to my posting on skbmod),
this becomes:

+struct tcf_tunnel_key_p {
+	int			tcft_action;
+	struct metadata_dst     *tcft_enc_metadata;
+};

/* rcu protected */
+struct tcf_tunnel_key {
+	struct tc_action	common;
+	struct tcf_tunnel_key_p *p;
+};

At init(), always allocate a new struct tcf_tunnel_key_p, 'new':

old = rtnl_dereference(mykey->p);
if (ovr)
     spin_lock_bh(&mykey->tcf_lock);
... update all params here ..
rcu_assign_pointer(mykey->p, new);
if (ovr) {
      spin_unlock_bh(&mykey->tcf_lock);
      synchronize_rcu();
}

kfree(old);

at act():

rcu_read_lock();
struct tcf_tunnel_key_p *p = rcu_dereference(mykey->p);
... use p here ...
rcu_read_unlock();

Cong was looking to do something more generic for all actions.

cheers,
jamal
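
The init()/act() split above can be approximated in userspace as a
pointer swap on a parameters object. This is a rough sketch under stated
assumptions: C11 atomics stand in for rcu_assign_pointer() and
rcu_dereference(), the grace period (synchronize_rcu()) is not modelled
at all, and tunnel_id plus the tunnel_key_*() helpers are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct tunnel_key_params {	/* loosely mirrors tcf_tunnel_key_p */
	int action;
	int tunnel_id;		/* hypothetical stand-in for the metadata */
};

struct tunnel_key {
	_Atomic(struct tunnel_key_params *) p;
};

static void tunnel_key_init(struct tunnel_key *k)
{
	atomic_init(&k->p, NULL);
}

/*
 * init() side: fill in a complete new object, then publish it with a
 * single pointer store. The previous object is handed back in *oldp;
 * the caller may free it only once no reader can still be using it
 * (the kernel would wait for a grace period first).
 * Returns 0 on success, -1 if allocation fails.
 */
static int tunnel_key_update(struct tunnel_key *k, int action, int tunnel_id,
			     struct tunnel_key_params **oldp)
{
	struct tunnel_key_params *newp = malloc(sizeof(*newp));

	if (!newp)
		return -1;
	newp->action = action;
	newp->tunnel_id = tunnel_id;
	*oldp = atomic_exchange_explicit(&k->p, newp, memory_order_acq_rel);
	return 0;
}

/* act() side: load the pointer once and read every field through it. */
static int tunnel_key_act(struct tunnel_key *k, int *action, int *tunnel_id)
{
	struct tunnel_key_params *p =
		atomic_load_explicit(&k->p, memory_order_acquire);

	if (!p)
		return -1;
	*action = p->action;
	*tunnel_id = p->tunnel_id;
	return 0;
}
```

Because the reader takes one snapshot of the pointer, it always sees an
action and metadata that belong together, which is the property the
ordering discussion earlier in the thread was trying to achieve by hand.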


* Re: [PATCH net-next V3 4/4] net/sched: Introduce act_tunnel_key
  2016-08-30 11:39             ` Amir Vadai
@ 2016-08-30 12:27               ` Eric Dumazet
  0 siblings, 0 replies; 23+ messages in thread
From: Eric Dumazet @ 2016-08-30 12:27 UTC (permalink / raw)
  To: Amir Vadai
  Cc: Cong Wang, Hadar Hen Zion, David S. Miller,
	Linux Kernel Network Developers, Jiri Pirko, Jiri Benc,
	Jamal Hadi Salim, Shmulik Ladkani, Tom Herbert, Or Gerlitz,
	Amir Vadai

On Tue, 2016-08-30 at 14:39 +0300, Amir Vadai wrote:
> On Tue, Aug 30, 2016 at 02:03:08PM +0300, Amir Vadai wrote:

> > Regarding the specific action in this patchset, correct me if I'm wrong,
> > but I think that the lock could be removed safely.

Sure ;)

> > 
> > When the action is modified during traffic, an existing tcf_enc_metadata
> > is not changed, but a new metadata is allocated and the pointer is
> > replaced to point to the new one.
> > I just need to make sure that when changing an action from 'release'
> > into 'set' - tcf_enc_metadata will be set before the action type is
> > changed - change the order of operations and add a memory barrier.
> > Here is some pseudocode to explain:
> > 
> > metadata_new = new allocated metadata
> > metadata_old = t->tcft_enc_metadata
> > 
> 
> Oh - I had a typo here:
> Need to set the metadata and only after that, set the action:
> 
> t->tcft_enc_metadata = metadata_new
> wmb()

rcu_assign_pointer() is your friend, it auto documents the thing.


Note that you probably need to store in the allocated object:

	dst,
	tcf_action (a copy of it, read in tunnel_key_act()),
	tcft_action (a copy of it, read in tunnel_key_act()),
	rcu_head rcu, for kfree_rcu()

> t->tcft_action = encapdecap
> 
> > t->tcft_action = encapdecap
> > 
> > /* make sure the compiler won't swap the setting of tcft_action with
> >  * tcft_enc_metadata
> >  */
> > wmb()
> > 
> > t->tcft_enc_metadata = metadata_new
> > release metadata_old
> > 
> > 
> > This way, no need for lock between the init() and act() operations.
> > 
> > Please let me know if you see a problem with this approach.
> > I will also change the stats to be percpu.

Right, check the last argument of tcf_hash_create() (false -> true).

Thanks.
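
The idea behind percpu stats can be sketched as one counter slot per CPU:
each CPU updates only its own slot without a lock, and a reader sums the
slots on demand. This is a single-threaded illustration under stated
assumptions, not the kernel's per-CPU machinery; NR_SLOTS, struct
pkt_stats and the stats_*() helpers are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define NR_SLOTS 4	/* hypothetical stand-in for the number of CPUs */

struct pkt_stats {
	uint64_t packets;
	uint64_t bytes;
};

struct percpu_stats {
	struct pkt_stats slot[NR_SLOTS];	/* one slot per CPU */
};

/* Fast path: touch only the slot owned by the current CPU, no lock. */
static void stats_update(struct percpu_stats *s, unsigned int cpu, uint64_t len)
{
	assert(cpu < NR_SLOTS);
	s->slot[cpu].packets++;
	s->slot[cpu].bytes += len;
}

/* Slow path (e.g. a stats dump): fold all slots into one total. */
static struct pkt_stats stats_read(const struct percpu_stats *s)
{
	struct pkt_stats sum = { 0, 0 };
	unsigned int i;

	for (i = 0; i < NR_SLOTS; i++) {
		sum.packets += s->slot[i].packets;
		sum.bytes += s->slot[i].bytes;
	}
	return sum;
}
```

The cost moves from the per-packet path to the rare read path, which has
to walk every slot. A concurrent version would also need each slot to be
read safely while its owner is updating it, which this sketch leaves out.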


* Re: [PATCH net-next V3 4/4] net/sched: Introduce act_tunnel_key
  2016-08-30 12:05             ` Jamal Hadi Salim
@ 2016-08-30 13:17               ` Amir Vadai
  2016-08-30 14:27                 ` Eric Dumazet
  0 siblings, 1 reply; 23+ messages in thread
From: Amir Vadai @ 2016-08-30 13:17 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Cong Wang, Eric Dumazet, Hadar Hen Zion, David S. Miller,
	Linux Kernel Network Developers, Jiri Pirko, Jiri Benc,
	Shmulik Ladkani, Tom Herbert, Or Gerlitz, Amir Vadai

On Tue, Aug 30, 2016 at 08:05:03AM -0400, Jamal Hadi Salim wrote:
> On 16-08-30 07:03 AM, Amir Vadai wrote:
> > On Sun, Aug 28, 2016 at 10:04:21PM -0700, Cong Wang wrote:
> > > On Fri, Aug 26, 2016 at 12:16 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > > > On Fri, 2016-08-26 at 11:26 -0700, Cong Wang wrote:
> 
> 
> > Regarding the specific action in this patchset, correct me if I'm wrong,
> > but I think that the lock could be removed safely.
> > 
> 
> From what Eric suggested (refer to my posting on skbmod),
> this becomes:
> 
> +struct tcf_tunnel_key_p {
> +	int			tcft_action;
> +	struct metadata_dst     *tcft_enc_metadata;
> +};
> 
> /* rcu protected */
> +struct tcf_tunnel_key {
> +	struct tc_action	common;
> +	struct tcf_tunnel_key_p *p;
> +};
> 
> At init() - always alloc struct tcf_tunnel_key_p, new
> 
> old = rtnl_dereference(mykey->p);
> if (ovr)
>     spin_lock_bh(&mykey->tcf_lock);
Thanks for the detailed example :)

What are we protecting with this spin lock here? Aren't concurrent init()
calls protected by the rtnl lock?


> ... update all params here ..
> rcu_assign_pointer(mykey->p, new);
> if (ovr) {
>      spin_unlock_bh(&mykey->tcf_lock);
>      synchronize_rcu();
> }
> 
> kfree(old);
> 
> at act():
> 
> rcu_read_lock();
> struct tcf_tunnel_key_p *p = rcu_dereference(mykey->p);
> ... use p here ...
> rcu_read_unlock();
> 
> Cong was looking to do something more generic for all actions.
> 
> cheers,
> jamal


* Re: [PATCH net-next V3 4/4] net/sched: Introduce act_tunnel_key
  2016-08-30 13:17               ` Amir Vadai
@ 2016-08-30 14:27                 ` Eric Dumazet
  0 siblings, 0 replies; 23+ messages in thread
From: Eric Dumazet @ 2016-08-30 14:27 UTC (permalink / raw)
  To: Amir Vadai
  Cc: Jamal Hadi Salim, Cong Wang, Hadar Hen Zion, David S. Miller,
	Linux Kernel Network Developers, Jiri Pirko, Jiri Benc,
	Shmulik Ladkani, Tom Herbert, Or Gerlitz, Amir Vadai

On Tue, 2016-08-30 at 16:17 +0300, Amir Vadai wrote:
> On Tue, Aug 30, 2016 at 08:05:03AM -0400, Jamal Hadi Salim wrote:
> > 
> > old = rtnl_dereference(mykey->p);
> > if (ovr)
> >     spin_lock_bh(&mykey->tcf_lock);
> Thanks for the detailed example :)
> 
> What are we protecting with this spin lock here? Aren't concurrent init()
> calls protected by the rtnl lock?

Right. RTNL should be enough here for the write exclusion.



Thread overview: 23+ messages
2016-08-25 16:13 [PATCH net-next V3 0/4] net/sched: ip tunnel metadata set/release/classify by using TC Hadar Hen Zion
2016-08-25 16:13 ` [PATCH net-next V3 1/4] net/ip_tunnels: Introduce tunnel_id_to_key32() and key32_to_tunnel_id() Hadar Hen Zion
2016-08-26 10:26   ` Jiri Benc
2016-08-25 16:13 ` [PATCH net-next V3 2/4] net/dst: Utility functions to build dst_metadata without supplying an skb Hadar Hen Zion
2016-08-25 16:40   ` Shmulik Ladkani
2016-08-26  6:14     ` Or Gerlitz
2016-08-26 10:31     ` Jiri Benc
2016-08-25 16:13 ` [PATCH net-next V3 3/4] net/sched: cls_flower: Classify packet in ip tunnels Hadar Hen Zion
2016-08-26 10:46   ` Jiri Benc
2016-08-25 16:13 ` [PATCH net-next V3 4/4] net/sched: Introduce act_tunnel_key Hadar Hen Zion
2016-08-25 16:52   ` Shmulik Ladkani
2016-08-25 17:48   ` Eric Dumazet
2016-08-26  6:16     ` Or Gerlitz
2016-08-26 18:26     ` Cong Wang
2016-08-26 19:16       ` Eric Dumazet
2016-08-29  5:04         ` Cong Wang
2016-08-30 11:03           ` Amir Vadai
2016-08-30 11:39             ` Amir Vadai
2016-08-30 12:27               ` Eric Dumazet
2016-08-30 12:05             ` Jamal Hadi Salim
2016-08-30 13:17               ` Amir Vadai
2016-08-30 14:27                 ` Eric Dumazet
2016-08-26 11:13   ` Jiri Benc
