* [PATCH net-next 0/6] ila: Optimization to preserve value of early demux
@ 2015-09-29 22:17 Tom Herbert
  2015-09-29 22:17 ` [PATCH net-next 1/6] ila: Create net/ipv6/ila directory Tom Herbert
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Tom Herbert @ 2015-09-29 22:17 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team

In the current implementation of ILA, LWT is used to perform
translation on both the input and output paths. This is functional,
but there is a big performance hit in the receive path. Early
demux occurs before the routing lookup (a hit actually obviates the
route lookup), whereas LWT translation happens only after the route
lookup. The stack therefore performs early demux before translation,
so a local connection with ILA addresses is never matched. Note that
this issue is not specific to ILA: pretty much any translated or
encapsulated packet handled by LWT would miss the opportunity for
early demux. Solving the general problem seems nontrivial, since we
would need to move the route lookup before early demux, thereby
diminishing its value.

This patch set addresses the issue for ILA by adding a fast locator
lookup that occurs before early demux. This is done by creating an
XFRM hook to perform address translation early in the receive path.
For the backend we implement an rhashtable that contains identifier
to locator mappings. The table also allows more specific matches
that include the original locator and interface.
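
As a rough illustration of the lookup described above, the following
userspace sketch models identifier to locator translation with
optional, more specific matches. It is not the code in ila_xlat.c:
the struct layout, the *_sketch names, the linear scan (the series
uses an rhashtable) and the use of 0 as a wildcard are all
assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one mapping entry; not the kernel layout. */
struct ila_map_sketch {
	uint64_t identifier;    /* low 64 bits of the destination address */
	uint64_t locator;       /* value written into the high 64 bits */
	uint64_t locator_match; /* original locator to match, 0 = wildcard */
	int ifindex;            /* receiving interface to match, 0 = wildcard */
};

/* Return the most specific entry matching the packet, or NULL. */
static const struct ila_map_sketch *
ila_lookup_sketch(const struct ila_map_sketch *tbl, size_t n,
		  uint64_t identifier, uint64_t locator, int ifindex)
{
	const struct ila_map_sketch *best = NULL;
	int best_score = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		const struct ila_map_sketch *m = &tbl[i];
		int score = 0;

		if (m->identifier != identifier)
			continue;
		if (m->locator_match) {
			if (m->locator_match != locator)
				continue;
			score++;
		}
		if (m->ifindex) {
			if (m->ifindex != ifindex)
				continue;
			score++;
		}
		if (score > best_score) {
			best = m;
			best_score = score;
		}
	}
	return best;
}

/* Rewrite the high 64 bits of a destination address if a mapping
 * exists. addr[0] is the locator half, addr[1] the identifier half.
 * Returns 1 if the address was translated, 0 if it was left alone.
 */
static int ila_xlat_sketch(const struct ila_map_sketch *tbl, size_t n,
			   uint64_t addr[2], int ifindex)
{
	const struct ila_map_sketch *m;

	m = ila_lookup_sketch(tbl, n, addr[1], addr[0], ifindex);
	if (!m)
		return 0;
	addr[0] = m->locator;
	return 1;
}
```

An entry that names both the original locator and the interface wins
over one that names only the identifier, which is the "more specific
match" behaviour the table is meant to provide.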

This patch set:
 - Adds an rhashtable function to atomically replace an element.
   This is useful to implement sub-trees from a table entry
   without needing to use a special anchor structure as the
   table entry.
 - Adds a start callback for starting a netlink dump.
 - Creates an ila directory under net/ipv6 and moves ila.c to it.
   ila.c is split into ila_common.c and ila_lwt.c.
 - Implements a table to do identifier->locator mapping. This is
   an rhashtable.
 - Adds configuration for the table with netlink.
 - Adds the XFRM xlat_addr facility. This includes a callback
   registration function and a hook to call registered callbacks.
 - Calls xfrm6_xlat_addr from ipv6_rcv before NF_HOOK and routing.

Testing:
   Running 200 netperf TCP_RR streams

No ILA, baseline
   85.72% CPU utilization
   1861945 tps
   93/163/330 50/90/99% latencies

ILA before fix (LWT on both input and output)
   83.47% CPU utilization
   16583186 tps (-11% from baseline)
   107/183/338 50/90/99% latencies

ILA after fix (hook for input)
   84.97% CPU utilization
   1833948 tps (-1.5% from baseline)
   95/164/331 50/90/99% latencies

Hacked DNPT to do ILA
   80.94% CPU utilization
   1683315 tps (-10% from baseline)
   104/179/350 50/90/99% latencies

Tom Herbert (6):
  ila: Create net/ipv6/ila directory
  rhashtable: add function to replace an element
  netlink: add a start callback for starting a netlink dump
  xfrm: Add xfrm6 address translation function
  ipv6: Call xfrm6_xlat_addr from ipv6_rcv
  ila: Add support for xfrm6_xlat_addr

 include/linux/netlink.h    |   2 +
 include/linux/rhashtable.h |  82 ++++++
 include/net/genetlink.h    |   2 +
 include/net/xfrm.h         |  25 ++
 include/uapi/linux/ila.h   |  22 ++
 net/ipv6/Kconfig           |   5 +
 net/ipv6/Makefile          |   3 +-
 net/ipv6/ila.c             | 229 ----------------
 net/ipv6/ila/Makefile      |   7 +
 net/ipv6/ila/ila.h         |  48 ++++
 net/ipv6/ila/ila_common.c  | 103 ++++++++
 net/ipv6/ila/ila_lwt.c     | 152 +++++++++++
 net/ipv6/ila/ila_xlat.c    | 642 +++++++++++++++++++++++++++++++++++++++++++++
 net/ipv6/ip6_input.c       |   3 +
 net/ipv6/xfrm6_policy.c    |   7 +
 net/ipv6/xfrm6_xlat_addr.c |  66 +++++
 net/netlink/af_netlink.c   |   4 +
 net/netlink/genetlink.c    |  16 ++
 18 files changed, 1188 insertions(+), 230 deletions(-)
 delete mode 100644 net/ipv6/ila.c
 create mode 100644 net/ipv6/ila/Makefile
 create mode 100644 net/ipv6/ila/ila.h
 create mode 100644 net/ipv6/ila/ila_common.c
 create mode 100644 net/ipv6/ila/ila_lwt.c
 create mode 100644 net/ipv6/ila/ila_xlat.c
 create mode 100644 net/ipv6/xfrm6_xlat_addr.c

-- 
2.4.6


* [PATCH net-next 1/6] ila: Create net/ipv6/ila directory
  2015-09-29 22:17 [PATCH net-next 0/6] ila: Optimization to preserve value of early demux Tom Herbert
@ 2015-09-29 22:17 ` Tom Herbert
  2015-09-29 22:17 ` [PATCH net-next 2/6] rhashtable: add function to replace an element Tom Herbert
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Tom Herbert @ 2015-09-29 22:17 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team

Create an ila directory in preparation for supporting hooks in the
kernel other than LWT for doing ILA. This includes:
  - Moving ila.c to ila/ila_lwt.c
  - Splitting out some common functions into ila_common.c
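
One of the helpers carried over is the checksum adjustment: only the
upper 64 bits of the destination address change, so the transport
checksum can be patched from the difference between the old and new
locator instead of being recomputed. A minimal userspace sketch of
that idea follows, using 16-bit words and RFC 1624 style arithmetic.
The function names are hypothetical and this is not the kernel's
csum_partial() path:

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones' complement sum. */
static uint16_t csum_fold_sketch(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Full checksum over n 16-bit words. */
static uint16_t csum_full_sketch(const uint16_t *w, size_t n)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += w[i];
	return (uint16_t)~csum_fold_sketch(sum);
}

/* Patch an existing checksum after words from[0..n) became to[0..n):
 * add the complement of each old word and the value of each new one.
 */
static uint16_t csum_patch_sketch(uint16_t check, const uint16_t *from,
				  const uint16_t *to, size_t n)
{
	uint32_t sum = (uint16_t)~check;
	size_t i;

	for (i = 0; i < n; i++) {
		sum += (uint16_t)~from[i];
		sum += to[i];
	}
	return (uint16_t)~csum_fold_sketch(sum);
}
```

Patching with the four locator words should give the same result as
recomputing the checksum over the whole rewritten buffer.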

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 net/ipv6/Makefile         |   2 +-
 net/ipv6/ila.c            | 229 ----------------------------------------------
 net/ipv6/ila/Makefile     |   7 ++
 net/ipv6/ila/ila.h        |  46 ++++++++++
 net/ipv6/ila/ila_common.c |  95 +++++++++++++++++++
 net/ipv6/ila/ila_lwt.c    | 152 ++++++++++++++++++++++++++++++
 6 files changed, 301 insertions(+), 230 deletions(-)
 delete mode 100644 net/ipv6/ila.c
 create mode 100644 net/ipv6/ila/Makefile
 create mode 100644 net/ipv6/ila/ila.h
 create mode 100644 net/ipv6/ila/ila_common.c
 create mode 100644 net/ipv6/ila/ila_lwt.c

diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile
index 2c900c7..2fbd90b 100644
--- a/net/ipv6/Makefile
+++ b/net/ipv6/Makefile
@@ -34,7 +34,7 @@ obj-$(CONFIG_INET6_XFRM_MODE_TUNNEL) += xfrm6_mode_tunnel.o
 obj-$(CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION) += xfrm6_mode_ro.o
 obj-$(CONFIG_INET6_XFRM_MODE_BEET) += xfrm6_mode_beet.o
 obj-$(CONFIG_IPV6_MIP6) += mip6.o
-obj-$(CONFIG_IPV6_ILA) += ila.o
+obj-$(CONFIG_IPV6_ILA) += ila/
 obj-$(CONFIG_NETFILTER)	+= netfilter/
 
 obj-$(CONFIG_IPV6_VTI) += ip6_vti.o
diff --git a/net/ipv6/ila.c b/net/ipv6/ila.c
deleted file mode 100644
index 678d2df..0000000
--- a/net/ipv6/ila.c
+++ /dev/null
@@ -1,229 +0,0 @@
-#include <linux/errno.h>
-#include <linux/ip.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/skbuff.h>
-#include <linux/socket.h>
-#include <linux/types.h>
-#include <net/checksum.h>
-#include <net/ip.h>
-#include <net/ip6_fib.h>
-#include <net/lwtunnel.h>
-#include <net/protocol.h>
-#include <uapi/linux/ila.h>
-
-struct ila_params {
-	__be64 locator;
-	__be64 locator_match;
-	__wsum csum_diff;
-};
-
-static inline struct ila_params *ila_params_lwtunnel(
-	struct lwtunnel_state *lwstate)
-{
-	return (struct ila_params *)lwstate->data;
-}
-
-static inline __wsum compute_csum_diff8(const __be32 *from, const __be32 *to)
-{
-	__be32 diff[] = {
-		~from[0], ~from[1], to[0], to[1],
-	};
-
-	return csum_partial(diff, sizeof(diff), 0);
-}
-
-static inline __wsum get_csum_diff(struct ipv6hdr *ip6h, struct ila_params *p)
-{
-	if (*(__be64 *)&ip6h->daddr == p->locator_match)
-		return p->csum_diff;
-	else
-		return compute_csum_diff8((__be32 *)&ip6h->daddr,
-					  (__be32 *)&p->locator);
-}
-
-static void update_ipv6_locator(struct sk_buff *skb, struct ila_params *p)
-{
-	__wsum diff;
-	struct ipv6hdr *ip6h = ipv6_hdr(skb);
-	size_t nhoff = sizeof(struct ipv6hdr);
-
-	/* First update checksum */
-	switch (ip6h->nexthdr) {
-	case NEXTHDR_TCP:
-		if (likely(pskb_may_pull(skb, nhoff + sizeof(struct tcphdr)))) {
-			struct tcphdr *th = (struct tcphdr *)
-					(skb_network_header(skb) + nhoff);
-
-			diff = get_csum_diff(ip6h, p);
-			inet_proto_csum_replace_by_diff(&th->check, skb,
-							diff, true);
-		}
-		break;
-	case NEXTHDR_UDP:
-		if (likely(pskb_may_pull(skb, nhoff + sizeof(struct udphdr)))) {
-			struct udphdr *uh = (struct udphdr *)
-					(skb_network_header(skb) + nhoff);
-
-			if (uh->check || skb->ip_summed == CHECKSUM_PARTIAL) {
-				diff = get_csum_diff(ip6h, p);
-				inet_proto_csum_replace_by_diff(&uh->check, skb,
-								diff, true);
-				if (!uh->check)
-					uh->check = CSUM_MANGLED_0;
-			}
-		}
-		break;
-	case NEXTHDR_ICMP:
-		if (likely(pskb_may_pull(skb,
-					 nhoff + sizeof(struct icmp6hdr)))) {
-			struct icmp6hdr *ih = (struct icmp6hdr *)
-					(skb_network_header(skb) + nhoff);
-
-			diff = get_csum_diff(ip6h, p);
-			inet_proto_csum_replace_by_diff(&ih->icmp6_cksum, skb,
-							diff, true);
-		}
-		break;
-	}
-
-	/* Now change destination address */
-	*(__be64 *)&ip6h->daddr = p->locator;
-}
-
-static int ila_output(struct sock *sk, struct sk_buff *skb)
-{
-	struct dst_entry *dst = skb_dst(skb);
-
-	if (skb->protocol != htons(ETH_P_IPV6))
-		goto drop;
-
-	update_ipv6_locator(skb, ila_params_lwtunnel(dst->lwtstate));
-
-	return dst->lwtstate->orig_output(sk, skb);
-
-drop:
-	kfree_skb(skb);
-	return -EINVAL;
-}
-
-static int ila_input(struct sk_buff *skb)
-{
-	struct dst_entry *dst = skb_dst(skb);
-
-	if (skb->protocol != htons(ETH_P_IPV6))
-		goto drop;
-
-	update_ipv6_locator(skb, ila_params_lwtunnel(dst->lwtstate));
-
-	return dst->lwtstate->orig_input(skb);
-
-drop:
-	kfree_skb(skb);
-	return -EINVAL;
-}
-
-static struct nla_policy ila_nl_policy[ILA_ATTR_MAX + 1] = {
-	[ILA_ATTR_LOCATOR] = { .type = NLA_U64, },
-};
-
-static int ila_build_state(struct net_device *dev, struct nlattr *nla,
-			   unsigned int family, const void *cfg,
-			   struct lwtunnel_state **ts)
-{
-	struct ila_params *p;
-	struct nlattr *tb[ILA_ATTR_MAX + 1];
-	size_t encap_len = sizeof(*p);
-	struct lwtunnel_state *newts;
-	const struct fib6_config *cfg6 = cfg;
-	int ret;
-
-	if (family != AF_INET6)
-		return -EINVAL;
-
-	ret = nla_parse_nested(tb, ILA_ATTR_MAX, nla,
-			       ila_nl_policy);
-	if (ret < 0)
-		return ret;
-
-	if (!tb[ILA_ATTR_LOCATOR])
-		return -EINVAL;
-
-	newts = lwtunnel_state_alloc(encap_len);
-	if (!newts)
-		return -ENOMEM;
-
-	newts->len = encap_len;
-	p = ila_params_lwtunnel(newts);
-
-	p->locator = (__force __be64)nla_get_u64(tb[ILA_ATTR_LOCATOR]);
-
-	if (cfg6->fc_dst_len > sizeof(__be64)) {
-		/* Precompute checksum difference for translation since we
-		 * know both the old locator and the new one.
-		 */
-		p->locator_match = *(__be64 *)&cfg6->fc_dst;
-		p->csum_diff = compute_csum_diff8(
-			(__be32 *)&p->locator_match, (__be32 *)&p->locator);
-	}
-
-	newts->type = LWTUNNEL_ENCAP_ILA;
-	newts->flags |= LWTUNNEL_STATE_OUTPUT_REDIRECT |
-			LWTUNNEL_STATE_INPUT_REDIRECT;
-
-	*ts = newts;
-
-	return 0;
-}
-
-static int ila_fill_encap_info(struct sk_buff *skb,
-			       struct lwtunnel_state *lwtstate)
-{
-	struct ila_params *p = ila_params_lwtunnel(lwtstate);
-
-	if (nla_put_u64(skb, ILA_ATTR_LOCATOR, (__force u64)p->locator))
-		goto nla_put_failure;
-
-	return 0;
-
-nla_put_failure:
-	return -EMSGSIZE;
-}
-
-static int ila_encap_nlsize(struct lwtunnel_state *lwtstate)
-{
-	/* No encapsulation overhead */
-	return 0;
-}
-
-static int ila_encap_cmp(struct lwtunnel_state *a, struct lwtunnel_state *b)
-{
-	struct ila_params *a_p = ila_params_lwtunnel(a);
-	struct ila_params *b_p = ila_params_lwtunnel(b);
-
-	return (a_p->locator != b_p->locator);
-}
-
-static const struct lwtunnel_encap_ops ila_encap_ops = {
-	.build_state = ila_build_state,
-	.output = ila_output,
-	.input = ila_input,
-	.fill_encap = ila_fill_encap_info,
-	.get_encap_size = ila_encap_nlsize,
-	.cmp_encap = ila_encap_cmp,
-};
-
-static int __init ila_init(void)
-{
-	return lwtunnel_encap_add_ops(&ila_encap_ops, LWTUNNEL_ENCAP_ILA);
-}
-
-static void __exit ila_fini(void)
-{
-	lwtunnel_encap_del_ops(&ila_encap_ops, LWTUNNEL_ENCAP_ILA);
-}
-
-module_init(ila_init);
-module_exit(ila_fini);
-MODULE_AUTHOR("Tom Herbert <tom@herbertland.com>");
-MODULE_LICENSE("GPL");
diff --git a/net/ipv6/ila/Makefile b/net/ipv6/ila/Makefile
new file mode 100644
index 0000000..31d136b
--- /dev/null
+++ b/net/ipv6/ila/Makefile
@@ -0,0 +1,7 @@
+#
+# Makefile for ILA module
+#
+
+obj-$(CONFIG_IPV6_ILA) += ila.o
+
+ila-objs := ila_common.o ila_lwt.o
diff --git a/net/ipv6/ila/ila.h b/net/ipv6/ila/ila.h
new file mode 100644
index 0000000..b94081f
--- /dev/null
+++ b/net/ipv6/ila/ila.h
@@ -0,0 +1,46 @@
+/*
+ * Copyright (c) 2015 Tom Herbert <tom@herbertland.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ */
+
+#ifndef __ILA_H
+#define __ILA_H
+
+#include <linux/errno.h>
+#include <linux/ip.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/socket.h>
+#include <linux/skbuff.h>
+#include <linux/types.h>
+#include <net/checksum.h>
+#include <net/ip.h>
+#include <net/protocol.h>
+#include <uapi/linux/ila.h>
+
+struct ila_params {
+	__be64 locator;
+	__be64 locator_match;
+	__wsum csum_diff;
+};
+
+static inline __wsum compute_csum_diff8(const __be32 *from, const __be32 *to)
+{
+	__be32 diff[] = {
+		~from[0], ~from[1], to[0], to[1],
+	};
+
+	return csum_partial(diff, sizeof(diff), 0);
+}
+
+void update_ipv6_locator(struct sk_buff *skb, struct ila_params *p);
+
+int ila_lwt_init(void);
+void ila_lwt_fini(void);
+
+#endif /* __ILA_H */
diff --git a/net/ipv6/ila/ila_common.c b/net/ipv6/ila/ila_common.c
new file mode 100644
index 0000000..1a1e1e0
--- /dev/null
+++ b/net/ipv6/ila/ila_common.c
@@ -0,0 +1,95 @@
+#include <linux/errno.h>
+#include <linux/ip.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/socket.h>
+#include <linux/types.h>
+#include <net/checksum.h>
+#include <net/ip.h>
+#include <net/ip6_fib.h>
+#include <net/lwtunnel.h>
+#include <net/protocol.h>
+#include <uapi/linux/ila.h>
+#include "ila.h"
+
+static __wsum get_csum_diff(struct ipv6hdr *ip6h, struct ila_params *p)
+{
+	if (*(__be64 *)&ip6h->daddr == p->locator_match)
+		return p->csum_diff;
+	else
+		return compute_csum_diff8((__be32 *)&ip6h->daddr,
+					  (__be32 *)&p->locator);
+}
+
+void update_ipv6_locator(struct sk_buff *skb, struct ila_params *p)
+{
+	__wsum diff;
+	struct ipv6hdr *ip6h = ipv6_hdr(skb);
+	size_t nhoff = sizeof(struct ipv6hdr);
+
+	/* First update checksum */
+	switch (ip6h->nexthdr) {
+	case NEXTHDR_TCP:
+		if (likely(pskb_may_pull(skb, nhoff + sizeof(struct tcphdr)))) {
+			struct tcphdr *th = (struct tcphdr *)
+					(skb_network_header(skb) + nhoff);
+
+			diff = get_csum_diff(ip6h, p);
+			inet_proto_csum_replace_by_diff(&th->check, skb,
+							diff, true);
+		}
+		break;
+	case NEXTHDR_UDP:
+		if (likely(pskb_may_pull(skb, nhoff + sizeof(struct udphdr)))) {
+			struct udphdr *uh = (struct udphdr *)
+					(skb_network_header(skb) + nhoff);
+
+			if (uh->check || skb->ip_summed == CHECKSUM_PARTIAL) {
+				diff = get_csum_diff(ip6h, p);
+				inet_proto_csum_replace_by_diff(&uh->check, skb,
+								diff, true);
+				if (!uh->check)
+					uh->check = CSUM_MANGLED_0;
+			}
+		}
+		break;
+	case NEXTHDR_ICMP:
+		if (likely(pskb_may_pull(skb,
+					 nhoff + sizeof(struct icmp6hdr)))) {
+			struct icmp6hdr *ih = (struct icmp6hdr *)
+					(skb_network_header(skb) + nhoff);
+
+			diff = get_csum_diff(ip6h, p);
+			inet_proto_csum_replace_by_diff(&ih->icmp6_cksum, skb,
+							diff, true);
+		}
+		break;
+	}
+
+	/* Now change destination address */
+	*(__be64 *)&ip6h->daddr = p->locator;
+}
+
+static int __init ila_init(void)
+{
+	int ret;
+
+	ret = ila_lwt_init();
+
+	if (ret)
+		goto fail_lwt;
+
+fail_lwt:
+	return ret;
+}
+
+static void __exit ila_fini(void)
+{
+	ila_lwt_fini();
+}
+
+module_init(ila_init);
+module_exit(ila_fini);
+MODULE_AUTHOR("Tom Herbert <tom@herbertland.com>");
+MODULE_LICENSE("GPL");
diff --git a/net/ipv6/ila/ila_lwt.c b/net/ipv6/ila/ila_lwt.c
new file mode 100644
index 0000000..d4ed944
--- /dev/null
+++ b/net/ipv6/ila/ila_lwt.c
@@ -0,0 +1,152 @@
+#include <linux/errno.h>
+#include <linux/ip.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/socket.h>
+#include <linux/types.h>
+#include <net/checksum.h>
+#include <net/ip.h>
+#include <net/ip6_fib.h>
+#include <net/lwtunnel.h>
+#include <net/protocol.h>
+#include <uapi/linux/ila.h>
+#include "ila.h"
+
+static inline struct ila_params *ila_params_lwtunnel(
+	struct lwtunnel_state *lwstate)
+{
+	return (struct ila_params *)lwstate->data;
+}
+
+static int ila_output(struct sock *sk, struct sk_buff *skb)
+{
+	struct dst_entry *dst = skb_dst(skb);
+
+	if (skb->protocol != htons(ETH_P_IPV6))
+		goto drop;
+
+	update_ipv6_locator(skb, ila_params_lwtunnel(dst->lwtstate));
+
+	return dst->lwtstate->orig_output(sk, skb);
+
+drop:
+	kfree_skb(skb);
+	return -EINVAL;
+}
+
+static int ila_input(struct sk_buff *skb)
+{
+	struct dst_entry *dst = skb_dst(skb);
+
+	if (skb->protocol != htons(ETH_P_IPV6))
+		goto drop;
+
+	update_ipv6_locator(skb, ila_params_lwtunnel(dst->lwtstate));
+
+	return dst->lwtstate->orig_input(skb);
+
+drop:
+	kfree_skb(skb);
+	return -EINVAL;
+}
+
+static struct nla_policy ila_nl_policy[ILA_ATTR_MAX + 1] = {
+	[ILA_ATTR_LOCATOR] = { .type = NLA_U64, },
+};
+
+static int ila_build_state(struct net_device *dev, struct nlattr *nla,
+			   unsigned int family, const void *cfg,
+			   struct lwtunnel_state **ts)
+{
+	struct ila_params *p;
+	struct nlattr *tb[ILA_ATTR_MAX + 1];
+	size_t encap_len = sizeof(*p);
+	struct lwtunnel_state *newts;
+	const struct fib6_config *cfg6 = cfg;
+	int ret;
+
+	if (family != AF_INET6)
+		return -EINVAL;
+
+	ret = nla_parse_nested(tb, ILA_ATTR_MAX, nla,
+			       ila_nl_policy);
+	if (ret < 0)
+		return ret;
+
+	if (!tb[ILA_ATTR_LOCATOR])
+		return -EINVAL;
+
+	newts = lwtunnel_state_alloc(encap_len);
+	if (!newts)
+		return -ENOMEM;
+
+	newts->len = encap_len;
+	p = ila_params_lwtunnel(newts);
+
+	p->locator = (__force __be64)nla_get_u64(tb[ILA_ATTR_LOCATOR]);
+
+	if (cfg6->fc_dst_len > sizeof(__be64)) {
+		/* Precompute checksum difference for translation since we
+		 * know both the old locator and the new one.
+		 */
+		p->locator_match = *(__be64 *)&cfg6->fc_dst;
+		p->csum_diff = compute_csum_diff8(
+			(__be32 *)&p->locator_match, (__be32 *)&p->locator);
+	}
+
+	newts->type = LWTUNNEL_ENCAP_ILA;
+	newts->flags |= LWTUNNEL_STATE_OUTPUT_REDIRECT |
+			LWTUNNEL_STATE_INPUT_REDIRECT;
+
+	*ts = newts;
+
+	return 0;
+}
+
+static int ila_fill_encap_info(struct sk_buff *skb,
+			       struct lwtunnel_state *lwtstate)
+{
+	struct ila_params *p = ila_params_lwtunnel(lwtstate);
+
+	if (nla_put_u64(skb, ILA_ATTR_LOCATOR, (__force u64)p->locator))
+		goto nla_put_failure;
+
+	return 0;
+
+nla_put_failure:
+	return -EMSGSIZE;
+}
+
+static int ila_encap_nlsize(struct lwtunnel_state *lwtstate)
+{
+	/* No encapsulation overhead */
+	return 0;
+}
+
+static int ila_encap_cmp(struct lwtunnel_state *a, struct lwtunnel_state *b)
+{
+	struct ila_params *a_p = ila_params_lwtunnel(a);
+	struct ila_params *b_p = ila_params_lwtunnel(b);
+
+	return (a_p->locator != b_p->locator);
+}
+
+static const struct lwtunnel_encap_ops ila_encap_ops = {
+	.build_state = ila_build_state,
+	.output = ila_output,
+	.input = ila_input,
+	.fill_encap = ila_fill_encap_info,
+	.get_encap_size = ila_encap_nlsize,
+	.cmp_encap = ila_encap_cmp,
+};
+
+int ila_lwt_init(void)
+{
+	return lwtunnel_encap_add_ops(&ila_encap_ops, LWTUNNEL_ENCAP_ILA);
+}
+
+void ila_lwt_fini(void)
+{
+	lwtunnel_encap_del_ops(&ila_encap_ops, LWTUNNEL_ENCAP_ILA);
+}
-- 
2.4.6


* [PATCH net-next 2/6] rhashtable: add function to replace an element
  2015-09-29 22:17 [PATCH net-next 0/6] ila: Optimization to preserve value of early demux Tom Herbert
  2015-09-29 22:17 ` [PATCH net-next 1/6] ila: Create net/ipv6/ila directory Tom Herbert
@ 2015-09-29 22:17 ` Tom Herbert
  2015-09-29 22:17 ` [PATCH net-next 3/6] netlink: add a start callback for starting a netlink dump Tom Herbert
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Tom Herbert @ 2015-09-29 22:17 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team

Add the rhashtable_replace_fast function. This replaces one object in
the table with another atomically. The hashes of the new and old objects
must be equal.
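
To make the replace step concrete, here is a userspace sketch of
swapping one node for another in a singly linked bucket chain. It
leaves out everything that makes the real function safe under
concurrency (the bucket lock, RCU publication, the future_tbl walk),
and the node type, hash function and *_sketch names are assumptions
for illustration only:

```c
#include <errno.h>
#include <stddef.h>

#define NBUCKETS_SKETCH 8

/* Hypothetical node type; stands in for an object with an rhash_head. */
struct node_sketch {
	struct node_sketch *next;
	unsigned int key;
};

static unsigned int hash_sketch(unsigned int key)
{
	return key % NBUCKETS_SKETCH;
}

/* Replace obj_old with obj_new in its bucket chain.
 * Returns 0 on success, -EINVAL if the two objects hash differently,
 * -ENOENT if obj_old is not in the chain.
 */
static int replace_sketch(struct node_sketch **buckets,
			  struct node_sketch *obj_old,
			  struct node_sketch *obj_new)
{
	unsigned int hash = hash_sketch(obj_old->key);
	struct node_sketch **pprev;

	if (hash != hash_sketch(obj_new->key))
		return -EINVAL;

	for (pprev = &buckets[hash]; *pprev; pprev = &(*pprev)->next) {
		if (*pprev != obj_old)
			continue;
		/* Link the new node to the rest of the chain first ... */
		obj_new->next = obj_old->next;
		/* ... then make the predecessor point at it. */
		*pprev = obj_new;
		return 0;
	}
	return -ENOENT;
}
```

Setting the new node's next pointer before redirecting the
predecessor means a walker sees either the old node or the new one,
never a broken chain; the kernel version expresses the same ordering
with rcu_assign_pointer().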

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 include/linux/rhashtable.h | 82 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 82 insertions(+)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 843ceca..77deece 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -819,4 +819,86 @@ out:
 	return err;
 }
 
+/* Internal function, please use rhashtable_replace_fast() instead */
+static inline int __rhashtable_replace_fast(
+	struct rhashtable *ht, struct bucket_table *tbl,
+	struct rhash_head *obj_old, struct rhash_head *obj_new,
+	const struct rhashtable_params params)
+{
+	struct rhash_head __rcu **pprev;
+	struct rhash_head *he;
+	spinlock_t *lock;
+	unsigned int hash;
+	int err = -ENOENT;
+
+	/* Minimally, the old and new objects must have same hash
+	 * (which should mean identifiers are the same).
+	 */
+	hash = rht_head_hashfn(ht, tbl, obj_old, params);
+	if (hash != rht_head_hashfn(ht, tbl, obj_new, params))
+		return -EINVAL;
+
+	lock = rht_bucket_lock(tbl, hash);
+
+	spin_lock_bh(lock);
+
+	pprev = &tbl->buckets[hash];
+	rht_for_each(he, tbl, hash) {
+		if (he != obj_old) {
+			pprev = &he->next;
+			continue;
+		}
+
+		rcu_assign_pointer(obj_new->next, obj_old->next);
+		rcu_assign_pointer(*pprev, obj_new);
+		err = 0;
+		break;
+	}
+
+	spin_unlock_bh(lock);
+
+	return err;
+}
+
+/**
+ * rhashtable_replace_fast - replace an object in hash table
+ * @ht:		hash table
+ * @obj_old:	pointer to hash head inside object being replaced
+ * @obj_new:	pointer to hash head inside object which is new
+ * @params:	hash table parameters
+ *
+ * Replacing an object doesn't affect the number of elements in the hash table
+ * or bucket, so we don't need to worry about shrinking or expanding the
+ * table here.
+ *
+ * Returns zero on success, -ENOENT if the entry could not be found,
+ * -EINVAL if hash is not the same for the old and new objects.
+ */
+static inline int rhashtable_replace_fast(
+	struct rhashtable *ht, struct rhash_head *obj_old,
+	struct rhash_head *obj_new,
+	const struct rhashtable_params params)
+{
+	struct bucket_table *tbl;
+	int err;
+
+	rcu_read_lock();
+
+	tbl = rht_dereference_rcu(ht->tbl, ht);
+
+	/* Because we have already taken (and released) the bucket
+	 * lock in old_tbl, if we find that future_tbl is not yet
+	 * visible then that guarantees the entry to still be in
+	 * the old tbl if it exists.
+	 */
+	while ((err = __rhashtable_replace_fast(ht, tbl, obj_old,
+						obj_new, params)) &&
+	       (tbl = rht_dereference_rcu(tbl->future_tbl, ht)))
+		;
+
+	rcu_read_unlock();
+
+	return err;
+}
+
 #endif /* _LINUX_RHASHTABLE_H */
-- 
2.4.6


* [PATCH net-next 3/6] netlink: add a start callback for starting a netlink dump
  2015-09-29 22:17 [PATCH net-next 0/6] ila: Optimization to preserve value of early demux Tom Herbert
  2015-09-29 22:17 ` [PATCH net-next 1/6] ila: Create net/ipv6/ila directory Tom Herbert
  2015-09-29 22:17 ` [PATCH net-next 2/6] rhashtable: add function to replace an element Tom Herbert
@ 2015-09-29 22:17 ` Tom Herbert
  2015-09-29 22:17 ` [PATCH net-next 4/6] xfrm: Add xfrm6 address translation function Tom Herbert
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Tom Herbert @ 2015-09-29 22:17 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team

The start callback allows the caller to set up a context for the
dump callbacks. Presumably, the context can then be destroyed in
the done callback.
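
The intended lifecycle can be sketched in userspace as start, then
dump until nothing is left, then done. The types and names below are
hypothetical stand-ins for netlink_callback and its users. Note one
deliberate difference: this sketch aborts if start fails, whereas the
patch as posted calls cb->start() without checking its return value.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct netlink_callback. */
struct dump_cb_sketch {
	int (*start)(struct dump_cb_sketch *cb);
	int (*dump)(struct dump_cb_sketch *cb);
	int (*done)(struct dump_cb_sketch *cb);
	void *ctx;
};

/* Example context set up by start and torn down by done. */
struct counter_ctx_sketch {
	int remaining;
	int emitted;
	int started;
	int finished;
};

static int counter_start_sketch(struct dump_cb_sketch *cb)
{
	struct counter_ctx_sketch *ctx = cb->ctx;

	ctx->started = 1;
	ctx->remaining = 3;
	return 0;
}

/* Returns > 0 while there is more to send, 0 when the dump is over. */
static int counter_dump_sketch(struct dump_cb_sketch *cb)
{
	struct counter_ctx_sketch *ctx = cb->ctx;

	if (!ctx->remaining)
		return 0;
	ctx->remaining--;
	ctx->emitted++;
	return 1;
}

static int counter_done_sketch(struct dump_cb_sketch *cb)
{
	struct counter_ctx_sketch *ctx = cb->ctx;

	ctx->finished = 1;
	return 0;
}

/* Drive one complete dump: optional start, repeated dump, optional done. */
static int run_dump_sketch(struct dump_cb_sketch *cb)
{
	int ret;

	if (cb->start) {
		ret = cb->start(cb);
		if (ret)
			return ret;
	}
	while ((ret = cb->dump(cb)) > 0)
		;
	if (cb->done)
		cb->done(cb);
	return ret;
}
```

Without a start hook, the dump callback has to detect its own first
invocation in order to set up state; with it, setup and teardown are
symmetric.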

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 include/linux/netlink.h  |  2 ++
 include/net/genetlink.h  |  2 ++
 net/netlink/af_netlink.c |  4 ++++
 net/netlink/genetlink.c  | 16 ++++++++++++++++
 4 files changed, 24 insertions(+)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index 639e9b8..0b41959 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -131,6 +131,7 @@ netlink_skb_clone(struct sk_buff *skb, gfp_t gfp_mask)
 struct netlink_callback {
 	struct sk_buff		*skb;
 	const struct nlmsghdr	*nlh;
+	int			(*start)(struct netlink_callback *);
 	int			(*dump)(struct sk_buff * skb,
 					struct netlink_callback *cb);
 	int			(*done)(struct netlink_callback *cb);
@@ -153,6 +154,7 @@ struct nlmsghdr *
 __nlmsg_put(struct sk_buff *skb, u32 portid, u32 seq, int type, int len, int flags);
 
 struct netlink_dump_control {
+	int (*start)(struct netlink_callback *);
 	int (*dump)(struct sk_buff *skb, struct netlink_callback *);
 	int (*done)(struct netlink_callback *);
 	void *data;
diff --git a/include/net/genetlink.h b/include/net/genetlink.h
index 1b6b6dc..43c0e77 100644
--- a/include/net/genetlink.h
+++ b/include/net/genetlink.h
@@ -114,6 +114,7 @@ static inline void genl_info_net_set(struct genl_info *info, struct net *net)
  * @flags: flags
  * @policy: attribute validation policy
  * @doit: standard command callback
+ * @start: start callback for dumps
  * @dumpit: callback for dumpers
  * @done: completion callback for dumps
  * @ops_list: operations list
@@ -122,6 +123,7 @@ struct genl_ops {
 	const struct nla_policy	*policy;
 	int		       (*doit)(struct sk_buff *skb,
 				       struct genl_info *info);
+	int		       (*start)(struct netlink_callback *cb);
 	int		       (*dumpit)(struct sk_buff *skb,
 					 struct netlink_callback *cb);
 	int		       (*done)(struct netlink_callback *cb);
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 8f060d7..c8c43ac 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2905,6 +2905,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 
 	cb = &nlk->cb;
 	memset(cb, 0, sizeof(*cb));
+	cb->start = control->start;
 	cb->dump = control->dump;
 	cb->done = control->done;
 	cb->nlh = nlh;
@@ -2917,6 +2918,9 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 
 	mutex_unlock(nlk->cb_mutex);
 
+	if (cb->start)
+		cb->start(cb);
+
 	ret = netlink_dump(sk);
 	sock_put(sk);
 
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 75724a9..5fd08c0 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -513,6 +513,20 @@ void *genlmsg_put(struct sk_buff *skb, u32 portid, u32 seq,
 }
 EXPORT_SYMBOL(genlmsg_put);
 
+static int genl_lock_start(struct netlink_callback *cb)
+{
+	/* our ops are always const - netlink API doesn't propagate that */
+	const struct genl_ops *ops = cb->data;
+	int rc = 0;
+
+	if (ops->start) {
+		genl_lock();
+		rc = ops->start(cb);
+		genl_unlock();
+	}
+	return rc;
+}
+
 static int genl_lock_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
 {
 	/* our ops are always const - netlink API doesn't propagate that */
@@ -577,6 +591,7 @@ static int genl_family_rcv_msg(struct genl_family *family,
 				.module = family->module,
 				/* we have const, but the netlink API doesn't */
 				.data = (void *)ops,
+				.start = genl_lock_start,
 				.dump = genl_lock_dumpit,
 				.done = genl_lock_done,
 			};
@@ -588,6 +603,7 @@ static int genl_family_rcv_msg(struct genl_family *family,
 		} else {
 			struct netlink_dump_control c = {
 				.module = family->module,
+				.start = ops->start,
 				.dump = ops->dumpit,
 				.done = ops->done,
 			};
-- 
2.4.6


* [PATCH net-next 4/6] xfrm: Add xfrm6 address translation function
  2015-09-29 22:17 [PATCH net-next 0/6] ila: Optimization to preserve value of early demux Tom Herbert
                   ` (2 preceding siblings ...)
  2015-09-29 22:17 ` [PATCH net-next 3/6] netlink: add a start callback for starting a netlink dump Tom Herbert
@ 2015-09-29 22:17 ` Tom Herbert
  2015-09-29 22:58   ` David Ahern
  2015-09-29 22:17 ` [PATCH net-next 5/6] ipv6: Call xfrm6_xlat_addr from ipv6_rcv Tom Herbert
  2015-09-29 22:17 ` [PATCH net-next 6/6] ila: Add support for xfrm6_xlat_addr Tom Herbert
  5 siblings, 1 reply; 15+ messages in thread
From: Tom Herbert @ 2015-09-29 22:17 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team

This patch adds xfrm6_xlat_addr, which is called in the data path
to perform address translation on an sk_buff (primarily for the
receive path). Modules may register their own callback to perform a
translation; registration is managed by xfrm6_xlat_addr_add and
xfrm6_xlat_addr_del. Registered callbacks are called in turn until
one returns a negative value.
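
The registration pattern can be sketched in userspace as a list of
callbacks that are run in order. This is only a model: the packet
type, the two example callbacks and all *_sketch names are made up
for illustration, and the spinlock and RCU protection used by the
real list are omitted.

```c
#include <stddef.h>

/* Hypothetical packet: just the two halves of a destination address. */
struct pkt_sketch {
	unsigned long long daddr_hi;
	unsigned long long daddr_lo;
};

struct xlat_hook_sketch {
	int (*xlat)(struct pkt_sketch *pkt);
	struct xlat_hook_sketch *next;
};

static struct xlat_hook_sketch *xlat_hooks_sketch;

/* Register a hook at the head of the list. */
static void xlat_hook_add_sketch(struct xlat_hook_sketch *h)
{
	h->next = xlat_hooks_sketch;
	xlat_hooks_sketch = h;
}

/* Unregister a hook. Returns 0 on success, -1 if it was not found. */
static int xlat_hook_del_sketch(struct xlat_hook_sketch *h)
{
	struct xlat_hook_sketch **pp;

	for (pp = &xlat_hooks_sketch; *pp; pp = &(*pp)->next) {
		if (*pp == h) {
			*pp = h->next;
			return 0;
		}
	}
	return -1;
}

/* Run every registered hook; stop at the first negative return. */
static int xlat_run_sketch(struct pkt_sketch *pkt)
{
	struct xlat_hook_sketch *h;
	int err = 0;

	for (h = xlat_hooks_sketch; h; h = h->next) {
		err = h->xlat(pkt);
		if (err < 0)
			break;
	}
	return err;
}

/* Example callback: rewrite the upper half of the address. */
static int set_hi_sketch(struct pkt_sketch *pkt)
{
	pkt->daddr_hi = 0xfeedULL;
	return 0;
}

/* Example callback: refuse the packet without touching it. */
static int reject_sketch(struct pkt_sketch *pkt)
{
	(void)pkt;
	return -1;
}
```

Because new hooks go to the head of the list, the most recently
registered callback runs first, and a negative return from it keeps
the remaining callbacks from seeing the packet.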

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 include/net/xfrm.h         | 25 ++++++++++++++++++
 net/ipv6/Kconfig           |  4 +++
 net/ipv6/Makefile          |  1 +
 net/ipv6/xfrm6_policy.c    |  7 +++++
 net/ipv6/xfrm6_xlat_addr.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 103 insertions(+)
 create mode 100644 net/ipv6/xfrm6_xlat_addr.c

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index fd17610..ea05c4e 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -607,6 +607,31 @@ struct xfrm_mgr {
 int xfrm_register_km(struct xfrm_mgr *km);
 int xfrm_unregister_km(struct xfrm_mgr *km);
 
+struct xfrm6_xlat_addr {
+	int (*xlat)(struct sk_buff *skb);
+	struct list_head list;
+};
+
+#ifdef CONFIG_INET6_XFRM_XLAT_ADDR
+void xfrm6_xlat_addr_add(struct xfrm6_xlat_addr *xla);
+void xfrm6_xlat_addr_del(struct xfrm6_xlat_addr *xla);
+int xfrm6_xlat_addr(struct sk_buff *skb);
+int xfrm6_xlat_addr_init(void);
+void xfrm6_xlat_addr_fini(void);
+#else
+static inline int xfrm6_xlat_addr(struct sk_buff *skb)
+{
+	return 0;
+}
+
+static inline int xfrm6_xlat_addr_init(void)
+{
+	return 0;
+}
+
+static inline void xfrm6_xlat_addr_fini(void) { }
+#endif
+
 struct xfrm_tunnel_skb_cb {
 	union {
 		struct inet_skb_parm h4;
diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig
index 983bb99..6e8ca06 100644
--- a/net/ipv6/Kconfig
+++ b/net/ipv6/Kconfig
@@ -153,6 +153,10 @@ config INET6_XFRM_MODE_ROUTEOPTIMIZATION
 	---help---
 	  Support for MIPv6 route optimization mode.
 
+config INET6_XFRM_XLAT_ADDR
+	select XFRM
+	bool
+
 config IPV6_VTI
 	tristate "Virtual (secure) IPv6: tunneling"
 	select IPV6_TUNNEL
diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile
index 2fbd90b..c719d6f 100644
--- a/net/ipv6/Makefile
+++ b/net/ipv6/Makefile
@@ -33,6 +33,7 @@ obj-$(CONFIG_INET6_XFRM_MODE_TRANSPORT) += xfrm6_mode_transport.o
 obj-$(CONFIG_INET6_XFRM_MODE_TUNNEL) += xfrm6_mode_tunnel.o
 obj-$(CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION) += xfrm6_mode_ro.o
 obj-$(CONFIG_INET6_XFRM_MODE_BEET) += xfrm6_mode_beet.o
+obj-$(CONFIG_INET6_XFRM_XLAT_ADDR) += xfrm6_xlat_addr.o
 obj-$(CONFIG_IPV6_MIP6) += mip6.o
 obj-$(CONFIG_IPV6_ILA) += ila/
 obj-$(CONFIG_NETFILTER)	+= netfilter/
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index 30caa28..81b9079 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -390,11 +390,17 @@ int __init xfrm6_init(void)
 	if (ret)
 		goto out_state;
 
+	ret = xfrm6_xlat_addr_init();
+	if (ret)
+		goto out_protocol;
+
 #ifdef CONFIG_SYSCTL
 	register_pernet_subsys(&xfrm6_net_ops);
 #endif
 out:
 	return ret;
+out_protocol:
+	xfrm6_protocol_fini();
 out_state:
 	xfrm6_state_fini();
 out_policy:
@@ -407,6 +413,7 @@ void xfrm6_fini(void)
 #ifdef CONFIG_SYSCTL
 	unregister_pernet_subsys(&xfrm6_net_ops);
 #endif
+	xfrm6_xlat_addr_fini();
 	xfrm6_protocol_fini();
 	xfrm6_policy_fini();
 	xfrm6_state_fini();
diff --git a/net/ipv6/xfrm6_xlat_addr.c b/net/ipv6/xfrm6_xlat_addr.c
new file mode 100644
index 0000000..dd2199a
--- /dev/null
+++ b/net/ipv6/xfrm6_xlat_addr.c
@@ -0,0 +1,66 @@
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <net/ipv6.h>
+#include <net/xfrm.h>
+
+static struct list_head xfrm6_xlat_addr_head __read_mostly;
+static DEFINE_SPINLOCK(xfrm6_xlat_addr_lock);
+
+void xfrm6_xlat_addr_add(struct xfrm6_xlat_addr *xla)
+{
+	spin_lock(&xfrm6_xlat_addr_lock);
+	list_add_rcu(&xla->list, &xfrm6_xlat_addr_head);
+	spin_unlock(&xfrm6_xlat_addr_lock);
+}
+EXPORT_SYMBOL(xfrm6_xlat_addr_add);
+
+void xfrm6_xlat_addr_del(struct xfrm6_xlat_addr *xla)
+{
+	struct xfrm6_xlat_addr *tmp;
+
+	spin_lock(&xfrm6_xlat_addr_lock);
+
+	list_for_each_entry_rcu(tmp, &xfrm6_xlat_addr_head, list) {
+		if (xla == tmp) {
+			list_del_rcu(&xla->list);
+			goto out;
+		}
+	}
+
+	pr_warn("xfrm6_xlat_addr_del: %p not found\n", xla);
+out:
+	spin_unlock(&xfrm6_xlat_addr_lock);
+}
+EXPORT_SYMBOL(xfrm6_xlat_addr_del);
+
+int xfrm6_xlat_addr(struct sk_buff *skb)
+{
+	struct xfrm6_xlat_addr *xla;
+	int err = 0;
+
+	rcu_read_lock();
+
+	list_for_each_entry_rcu(xla, &xfrm6_xlat_addr_head, list) {
+		err = xla->xlat(skb);
+		if (err < 0)
+			break;
+	}
+
+	rcu_read_unlock();
+
+	return err;
+}
+EXPORT_SYMBOL(xfrm6_xlat_addr);
+
+int __init xfrm6_xlat_addr_init(void)
+{
+	INIT_LIST_HEAD(&xfrm6_xlat_addr_head);
+
+	return 0;
+}
+
+void xfrm6_xlat_addr_fini(void)
+{
+}
+
-- 
2.4.6


* [PATCH net-next 5/6] ipv6: Call xfrm6_xlat_addr from ipv6_rcv
  2015-09-29 22:17 [PATCH net-next 0/6] ila: Optimization to preserve value of early demux Tom Herbert
                   ` (3 preceding siblings ...)
  2015-09-29 22:17 ` [PATCH net-next 4/6] xfrm: Add xfrm6 address translation function Tom Herbert
@ 2015-09-29 22:17 ` Tom Herbert
  2015-09-29 23:26   ` Florian Westphal
  2015-09-30  9:06   ` Steffen Klassert
  2015-09-29 22:17 ` [PATCH net-next 6/6] ila: Add support for xfrm6_xlat_addr Tom Herbert
  5 siblings, 2 replies; 15+ messages in thread
From: Tom Herbert @ 2015-09-29 22:17 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team

Call xfrm6_xlat_addr before performing NF_HOOK and routing in order to
perform address translation in the receive path.
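
For readers who want the shape of the hook without the kernel context, here
is a minimal userspace sketch of a translator chain with the semantics that
xfrm6_xlat_addr() has in patch 4/6: each registered callback runs in turn and
the walk stops at the first negative return. Everything here is an assumption
made for illustration (struct packet, xlat_hook_add, xlat_hook_del, xlat_run
and the two example translators are hypothetical names), and the RCU list and
spinlock of the real code are replaced by a plain singly linked list.

```c
#include <stddef.h>

/* Hypothetical userspace stand-in for struct sk_buff. */
struct packet {
	unsigned char daddr[16];
};

/* One registered translator; plays the role of struct xfrm6_xlat_addr. */
struct xlat_hook {
	int (*xlat)(struct packet *pkt);
	struct xlat_hook *next;
};

static struct xlat_hook *xlat_hooks;

/* Register a hook at the head of the list (no locking in this sketch). */
static void xlat_hook_add(struct xlat_hook *hook)
{
	hook->next = xlat_hooks;
	xlat_hooks = hook;
}

/* Unregister a hook; returns 0 on success, -1 if it was not registered. */
static int xlat_hook_del(struct xlat_hook *hook)
{
	struct xlat_hook **pp;

	for (pp = &xlat_hooks; *pp; pp = &(*pp)->next) {
		if (*pp == hook) {
			*pp = hook->next;
			hook->next = NULL;
			return 0;
		}
	}
	return -1;
}

/* Run every hook in list order; stop at the first negative return. */
static int xlat_run(struct packet *pkt)
{
	struct xlat_hook *hook;
	int err = 0;

	for (hook = xlat_hooks; hook; hook = hook->next) {
		err = hook->xlat(pkt);
		if (err < 0)
			break;
	}
	return err;
}

/* Example translator: rewrites the first destination byte. */
static int xlat_set_first_byte(struct packet *pkt)
{
	pkt->daddr[0] = 0x20;
	return 0;
}

/* Example translator: always reports an error. */
static int xlat_fail(struct packet *pkt)
{
	(void)pkt;
	return -1;
}
```

Because hooks are added at the head, the most recently registered translator
runs first. Note that the ipv6_rcv() hunk in this patch ignores the return
value of xfrm6_xlat_addr(), so a failing translator would not drop the packet
there.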

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 net/ipv6/ip6_input.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index 9075acf..06dac55 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -183,6 +183,9 @@ int ipv6_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt
 	/* Must drop socket now because of tproxy. */
 	skb_orphan(skb);
 
+	/* Translate destination address before routing */
+	xfrm6_xlat_addr(skb);
+
 	return NF_HOOK(NFPROTO_IPV6, NF_INET_PRE_ROUTING,
 		       net, NULL, skb, dev, NULL,
 		       ip6_rcv_finish);
-- 
2.4.6


* [PATCH net-next 6/6] ila: Add support for xfrm6_xlat_addr
  2015-09-29 22:17 [PATCH net-next 0/6] ila: Optimization to preserve value of early demux Tom Herbert
                   ` (4 preceding siblings ...)
  2015-09-29 22:17 ` [PATCH net-next 5/6] ipv6: Call xfrm6_xlat_addr from ipv6_rcv Tom Herbert
@ 2015-09-29 22:17 ` Tom Herbert
  2015-09-29 22:34   ` kbuild test robot
                     ` (2 more replies)
  5 siblings, 3 replies; 15+ messages in thread
From: Tom Herbert @ 2015-09-29 22:17 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team

This patch sets up a hook for xfrm6_xlat_addr. This provides a way to
perform ILA translation before early demux, which can be a significant
performance advantage over LWT, where translation occurs later.

The implementation uses an rhashtable to do the locator lookup. The
rhashtable is configured via new netlink commands.
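
As a rough illustration of the lookup, the sketch below models the per
identifier chain that hangs off one table entry: entries are kept ordered from
most to least specific, and the first entry whose wildcards match wins. This
is a userspace approximation, not the patch code. The names struct mapping,
mapping_order, mapping_matches, chain_insert and chain_lookup are hypothetical,
the rhashtable step that finds the chain head is omitted, and there is no RCU
or locking.

```c
#include <stddef.h>
#include <stdint.h>

#define DIR_IN  (1u << 0)
#define DIR_OUT (1u << 1)

/* One identifier->locator mapping; zero fields act as wildcards. */
struct mapping {
	uint64_t identifier;
	uint64_t locator;       /* locator to write into the packet */
	uint64_t locator_match; /* 0 means "match any original locator" */
	int ifindex;            /* 0 means "match any interface" */
	unsigned int dir;       /* bitmask of DIR_IN / DIR_OUT */
	struct mapping *next;   /* next, less specific, entry */
};

/* Specificity score: an interface match outranks a locator match. */
static int mapping_order(const struct mapping *m)
{
	int score = 0;

	if (m->locator_match)
		score += 1 << 0;
	if (m->ifindex)
		score += 1 << 1;
	return score;
}

/* Nonzero if the entry applies to this locator, interface and direction. */
static int mapping_matches(const struct mapping *m, uint64_t loc,
			   int ifindex, unsigned int dir)
{
	return (!m->locator_match || m->locator_match == loc) &&
	       (!m->ifindex || m->ifindex == ifindex) &&
	       (m->dir & dir);
}

/* Insert before the first entry with a strictly lower score. */
static void chain_insert(struct mapping **head, struct mapping *m)
{
	struct mapping **pp = head;

	while (*pp && mapping_order(*pp) >= mapping_order(m))
		pp = &(*pp)->next;
	m->next = *pp;
	*pp = m;
}

/* Return the most specific matching entry, or NULL if none applies. */
static const struct mapping *chain_lookup(const struct mapping *head,
					  uint64_t loc, int ifindex,
					  unsigned int dir)
{
	for (; head; head = head->next)
		if (mapping_matches(head, loc, ifindex, dir))
			return head;
	return NULL;
}
```

Keeping the chain sorted at insert time means the receive path only needs a
linear walk that returns on the first hit, at the cost of slightly more work
when a mapping is added.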

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 include/uapi/linux/ila.h  |  22 ++
 net/ipv6/Kconfig          |   1 +
 net/ipv6/ila/Makefile     |   2 +-
 net/ipv6/ila/ila.h        |   2 +
 net/ipv6/ila/ila_common.c |   8 +
 net/ipv6/ila/ila_xlat.c   | 642 ++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 676 insertions(+), 1 deletion(-)
 create mode 100644 net/ipv6/ila/ila_xlat.c

diff --git a/include/uapi/linux/ila.h b/include/uapi/linux/ila.h
index 7ed9e67..abde7bb 100644
--- a/include/uapi/linux/ila.h
+++ b/include/uapi/linux/ila.h
@@ -3,13 +3,35 @@
 #ifndef _UAPI_LINUX_ILA_H
 #define _UAPI_LINUX_ILA_H
 
+/* NETLINK_GENERIC related info */
+#define ILA_GENL_NAME		"ila"
+#define ILA_GENL_VERSION	0x1
+
 enum {
 	ILA_ATTR_UNSPEC,
 	ILA_ATTR_LOCATOR,			/* u64 */
+	ILA_ATTR_IDENTIFIER,			/* u64 */
+	ILA_ATTR_LOCATOR_MATCH,			/* u64 */
+	ILA_ATTR_IFINDEX,			/* s32 */
+	ILA_ATTR_DIR,				/* u32 */
 
 	__ILA_ATTR_MAX,
 };
 
 #define ILA_ATTR_MAX		(__ILA_ATTR_MAX - 1)
 
+enum {
+	ILA_CMD_UNSPEC,
+	ILA_CMD_ADD,
+	ILA_CMD_DEL,
+	ILA_CMD_GET,
+
+	__ILA_CMD_MAX,
+};
+
+#define ILA_CMD_MAX	(__ILA_CMD_MAX - 1)
+
+#define ILA_DIR_IN	(1 << 0)
+#define ILA_DIR_OUT	(1 << 1)
+
 #endif /* _UAPI_LINUX_ILA_H */
diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig
index 6e8ca06..c972497 100644
--- a/net/ipv6/Kconfig
+++ b/net/ipv6/Kconfig
@@ -95,6 +95,7 @@ config IPV6_MIP6
 config IPV6_ILA
 	tristate "IPv6: Identifier Locator Addressing (ILA)"
 	select LWTUNNEL
+	select INET6_XFRM_XLAT_ADDR
 	---help---
 	  Support for IPv6 Identifier Locator Addressing (ILA).
 
diff --git a/net/ipv6/ila/Makefile b/net/ipv6/ila/Makefile
index 31d136b..4b32e59 100644
--- a/net/ipv6/ila/Makefile
+++ b/net/ipv6/ila/Makefile
@@ -4,4 +4,4 @@
 
 obj-$(CONFIG_IPV6_ILA) += ila.o
 
-ila-objs := ila_common.o ila_lwt.o
+ila-objs := ila_common.o ila_lwt.o ila_xlat.o
diff --git a/net/ipv6/ila/ila.h b/net/ipv6/ila/ila.h
index b94081f..28542cb 100644
--- a/net/ipv6/ila/ila.h
+++ b/net/ipv6/ila/ila.h
@@ -42,5 +42,7 @@ void update_ipv6_locator(struct sk_buff *skb, struct ila_params *p);
 
 int ila_lwt_init(void);
 void ila_lwt_fini(void);
+int ila_xlat_init(void);
+void ila_xlat_fini(void);
 
 #endif /* __ILA_H */
diff --git a/net/ipv6/ila/ila_common.c b/net/ipv6/ila/ila_common.c
index 1a1e1e0..cde7b96 100644
--- a/net/ipv6/ila/ila_common.c
+++ b/net/ipv6/ila/ila_common.c
@@ -80,12 +80,20 @@ static int __init ila_init(void)
 	if (ret)
 		goto fail_lwt;
 
+	ret = ila_xlat_init();
+	if (ret)
+		goto fail_xlat;
+
+	return 0;
+fail_xlat:
+	ila_lwt_fini();
 fail_lwt:
 	return ret;
 }
 
 static void __exit ila_fini(void)
 {
+	ila_xlat_fini();
 	ila_lwt_fini();
 }
 
diff --git a/net/ipv6/ila/ila_xlat.c b/net/ipv6/ila/ila_xlat.c
new file mode 100644
index 0000000..cd6135b
--- /dev/null
+++ b/net/ipv6/ila/ila_xlat.c
@@ -0,0 +1,642 @@
+#include <linux/jhash.h>
+#include <linux/netfilter.h>
+#include <linux/rcupdate.h>
+#include <linux/rhashtable.h>
+#include <linux/vmalloc.h>
+#include <net/genetlink.h>
+#include <net/netns/generic.h>
+#include <net/xfrm.h>
+#include <uapi/linux/genetlink.h>
+#include "ila.h"
+
+struct ila_xlat_params {
+	struct ila_params ip;
+	__be64 identifier;
+	int ifindex;
+	unsigned int dir;
+};
+
+struct ila_map {
+	struct ila_xlat_params p;
+	struct rhash_head node;
+	struct ila_map *next;
+	struct rcu_head rcu;
+};
+
+static unsigned int ila_net_id;
+
+struct ila_net {
+	struct rhashtable rhash_table;
+	spinlock_t *locks; /* Bucket locks for entry manipulation */
+	unsigned int locks_mask;
+};
+
+#define	LOCKS_PER_CPU 10
+
+static int alloc_ila_locks(struct ila_net *ilan, gfp_t gfp)
+{
+	unsigned int i, size;
+	unsigned int nr_pcpus = num_possible_cpus();
+
+	nr_pcpus = min_t(unsigned int, nr_pcpus, 32UL);
+	size = roundup_pow_of_two(nr_pcpus * LOCKS_PER_CPU);
+
+	if (sizeof(spinlock_t) != 0) {
+#ifdef CONFIG_NUMA
+		if (size * sizeof(spinlock_t) > PAGE_SIZE &&
+		    gfp == GFP_KERNEL)
+			ilan->locks = vmalloc(size * sizeof(spinlock_t));
+		else
+#endif
+		ilan->locks = kmalloc_array(size, sizeof(spinlock_t),
+					    gfp);
+		if (!ilan->locks)
+			return -ENOMEM;
+		for (i = 0; i < size; i++)
+			spin_lock_init(&ilan->locks[i]);
+	}
+	ilan->locks_mask = size - 1;
+
+	return 0;
+}
+
+static u32 hashrnd __read_mostly;
+static __always_inline void __ila_hash_secret_init(void)
+{
+	net_get_random_once(&hashrnd, sizeof(hashrnd));
+}
+
+static inline u32 ila_identifier_hash(__be64 identifier)
+{
+	u32 *v = (u32 *)&identifier;
+
+	return jhash_2words(v[0], v[1], hashrnd);
+}
+
+static inline spinlock_t *ila_get_lock(struct ila_net *ilan, __be64 identifier)
+{
+	return &ilan->locks[ila_identifier_hash(identifier) & ilan->locks_mask];
+}
+
+static inline int ila_cmp_wildcards(struct ila_map *ila, __be64 loc,
+				    int ifindex, unsigned int dir)
+{
+	return (ila->p.ip.locator_match && ila->p.ip.locator_match != loc) ||
+	       (ila->p.ifindex && ila->p.ifindex != ifindex) ||
+	       !(ila->p.dir & dir);
+}
+
+static inline int ila_cmp_params(struct ila_map *ila, struct ila_xlat_params *p)
+{
+	return (ila->p.ip.locator_match != p->ip.locator_match) ||
+	       (ila->p.ifindex != p->ifindex) ||
+	       (ila->p.dir != p->dir);
+}
+
+static int ila_cmpfn(struct rhashtable_compare_arg *arg,
+		     const void *obj)
+{
+	const struct ila_map *ila = obj;
+
+	return (ila->p.identifier != *(__be64 *)arg->key);
+}
+
+static inline int ila_order(struct ila_map *ila)
+{
+	int score = 0;
+
+	if (ila->p.ip.locator_match)
+		score += 1 << 0;
+
+	if (ila->p.ifindex)
+		score += 1 << 1;
+
+	return score;
+}
+
+static const struct rhashtable_params rht_params = {
+	.nelem_hint = 1024,
+	.head_offset = offsetof(struct ila_map, node),
+	.key_offset = offsetof(struct ila_map, p.identifier),
+	.key_len = sizeof(u64), /* identifier */
+	.max_size = 1048576,
+	.min_size = 256,
+	.automatic_shrinking = true,
+	.obj_cmpfn = ila_cmpfn,
+};
+
+static struct genl_family ila_nl_family = {
+	.id		= GENL_ID_GENERATE,
+	.hdrsize	= 0,
+	.name		= ILA_GENL_NAME,
+	.version	= ILA_GENL_VERSION,
+	.maxattr	= ILA_ATTR_MAX,
+	.netnsok	= true,
+	.parallel_ops	= true,
+};
+
+static struct nla_policy ila_nl_policy[ILA_ATTR_MAX + 1] = {
+	[ILA_ATTR_IDENTIFIER] = { .type = NLA_U64, },
+	[ILA_ATTR_LOCATOR] = { .type = NLA_U64, },
+	[ILA_ATTR_LOCATOR_MATCH] = { .type = NLA_U64, },
+	[ILA_ATTR_IFINDEX] = { .type = NLA_U32, },
+	[ILA_ATTR_DIR] = { .type = NLA_U32, },
+};
+
+static int parse_nl_config(struct genl_info *info,
+			   struct ila_xlat_params *p)
+{
+	memset(p, 0, sizeof(*p));
+
+	if (info->attrs[ILA_ATTR_IDENTIFIER])
+		p->identifier = (__force __be64)nla_get_u64(
+			info->attrs[ILA_ATTR_IDENTIFIER]);
+
+	if (info->attrs[ILA_ATTR_LOCATOR])
+		p->ip.locator = (__force __be64)nla_get_u64(
+			info->attrs[ILA_ATTR_LOCATOR]);
+
+	if (info->attrs[ILA_ATTR_LOCATOR_MATCH])
+		p->ip.locator_match = (__force __be64)nla_get_u64(
+			info->attrs[ILA_ATTR_LOCATOR_MATCH]);
+
+	if (info->attrs[ILA_ATTR_IFINDEX])
+		p->ifindex = nla_get_s32(info->attrs[ILA_ATTR_IFINDEX]);
+
+	if (info->attrs[ILA_ATTR_DIR])
+		p->dir = nla_get_u32(info->attrs[ILA_ATTR_DIR]);
+
+	return 0;
+}
+
+/* Must be called with rcu readlock */
+static inline struct ila_map *ila_lookup_wildcards(__be64 id, __be64 loc,
+						   int ifindex,
+						   unsigned int dir,
+						   struct ila_net *ilan)
+{
+	struct ila_map *ila;
+
+	ila = rhashtable_lookup_fast(&ilan->rhash_table, &id, rht_params);
+	while (ila) {
+		if (!ila_cmp_wildcards(ila, loc, ifindex, dir))
+			return ila;
+		ila = rcu_access_pointer(ila->next);
+	}
+
+	return NULL;
+}
+
+/* Must be called with rcu readlock */
+static inline struct ila_map *ila_lookup_by_params(struct ila_xlat_params *p,
+						   struct ila_net *ilan)
+{
+	struct ila_map *ila;
+
+	ila = rhashtable_lookup_fast(&ilan->rhash_table, &p->identifier,
+				     rht_params);
+	while (ila) {
+		if (!ila_cmp_params(ila, p))
+			return ila;
+		ila = rcu_access_pointer(ila->next);
+	}
+
+	return NULL;
+}
+
+static inline void ila_release(struct ila_map *ila)
+{
+	kfree_rcu(ila, rcu);
+}
+
+static void ila_free_cb(void *ptr, void *arg)
+{
+	struct ila_map *ila = (struct ila_map *)ptr, *next;
+
+	/* Assume rcu_readlock held */
+	while (ila) {
+		next = rcu_access_pointer(ila->next);
+		ila_release(ila);
+		ila = next;
+	}
+}
+
+static int ila_add_mapping(struct net *net, struct ila_xlat_params *p)
+{
+	struct ila_net *ilan = net_generic(net, ila_net_id);
+	struct ila_map *ila, *head;
+	spinlock_t *lock = ila_get_lock(ilan, p->identifier);
+	int err = 0, order;
+
+	ila = kzalloc(sizeof(*ila), GFP_KERNEL);
+	if (!ila)
+		return -ENOMEM;
+
+	ila->p = *p;
+
+	if (p->ip.locator_match) {
+		/* Precompute checksum difference for translation since we
+		 * know both the old locator and the new one.
+		 */
+		ila->p.ip.csum_diff = compute_csum_diff8(
+			(__be32 *)&p->ip.locator_match,
+			(__be32 *)&p->ip.locator);
+	}
+
+	order = ila_order(ila);
+
+	spin_lock(lock);
+
+	head = rhashtable_lookup_fast(&ilan->rhash_table, &p->identifier,
+				      rht_params);
+	if (!head) {
+		/* New entry for the rhash_table */
+		err = rhashtable_lookup_insert_fast(&ilan->rhash_table,
+						    &ila->node, rht_params);
+	} else {
+		struct ila_map *tila = head, *prev = NULL;
+
+		do {
+			if (!ila_cmp_params(tila, p)) {
+				err = -EEXIST;
+				goto out;
+			}
+
+			if (order > ila_order(tila))
+				break;
+
+			prev = tila;
+			tila = rcu_dereference_protected(tila->next,
+				lockdep_is_held(lock));
+		} while (tila);
+
+		if (prev) {
+			/* Insert in sub list of head */
+			RCU_INIT_POINTER(ila->next, tila);
+			rcu_assign_pointer(prev->next, ila);
+		} else {
+			/* Make this ila new head */
+			RCU_INIT_POINTER(ila->next, head);
+			err = rhashtable_replace_fast(&ilan->rhash_table,
+						      &head->node,
+						      &ila->node, rht_params);
+			if (err)
+				goto out;
+		}
+	}
+
+out:
+	spin_unlock(lock);
+
+	if (err)
+		kfree(ila);
+
+	return err;
+}
+
+static int ila_del_mapping(struct net *net, struct ila_xlat_params *p)
+{
+	struct ila_net *ilan = net_generic(net, ila_net_id);
+	struct ila_map *ila, *head, *prev;
+	spinlock_t *lock = ila_get_lock(ilan, p->identifier);
+	int err = -ENOENT;
+
+	spin_lock(lock);
+
+	head = rhashtable_lookup_fast(&ilan->rhash_table,
+				      &p->identifier, rht_params);
+	ila = head;
+
+	prev = NULL;
+
+	while (ila) {
+		if (ila_cmp_params(ila, p)) {
+			prev = ila;
+			ila = rcu_dereference_protected(ila->next,
+							lockdep_is_held(lock));
+			continue;
+		}
+
+		err = 0;
+
+		if (prev) {
+			/* Not head, just delete from list */
+			rcu_assign_pointer(prev->next, ila->next);
+		} else {
+			/* It is the head. If there is something in the
+			 * sublist we need to make a new head.
+			 */
+			head = rcu_dereference_protected(ila->next,
+							 lockdep_is_held(lock));
+			if (head) {
+				/* Put first entry in the sublist into the
+				 * table
+				 */
+				err = rhashtable_replace_fast(
+					&ilan->rhash_table, &ila->node,
+					&head->node, rht_params);
+				if (err)
+					goto out;
+			} else {
+				/* Entry no longer used */
+				err = rhashtable_remove_fast(&ilan->rhash_table,
+							     &ila->node,
+							     rht_params);
+			}
+		}
+
+		ila_release(ila);
+
+		break;
+	}
+
+out:
+	spin_unlock(lock);
+
+	return err;
+}
+
+static int ila_nl_cmd_add_mapping(struct sk_buff *skb, struct genl_info *info)
+{
+	struct net *net = genl_info_net(info);
+	struct ila_xlat_params p;
+	int err;
+
+	err = parse_nl_config(info, &p);
+	if (err)
+		return err;
+
+	return ila_add_mapping(net, &p);
+}
+
+static int ila_nl_cmd_del_mapping(struct sk_buff *skb, struct genl_info *info)
+{
+	struct net *net = genl_info_net(info);
+	struct ila_xlat_params p;
+	int err;
+
+	err = parse_nl_config(info, &p);
+	if (err)
+		return err;
+
+	ila_del_mapping(net, &p);
+
+	return 0;
+}
+
+static int ila_fill_info(struct ila_map *ila, struct sk_buff *msg)
+{
+	if (nla_put_u64(msg, ILA_ATTR_IDENTIFIER,
+			(__force u64)ila->p.identifier) ||
+	    nla_put_u64(msg, ILA_ATTR_LOCATOR,
+			(__force u64)ila->p.ip.locator) ||
+	    nla_put_u64(msg, ILA_ATTR_LOCATOR_MATCH,
+			(__force u64)ila->p.ip.locator_match) ||
+	    nla_put_s32(msg, ILA_ATTR_IFINDEX, ila->p.ifindex) ||
+	    nla_put_u32(msg, ILA_ATTR_DIR, ila->p.dir))
+		return -1;
+
+	return 0;
+}
+
+static int ila_dump_info(struct ila_map *ila,
+			 u32 portid, u32 seq, u32 flags,
+			 struct sk_buff *skb, u8 cmd)
+{
+	void *hdr;
+
+	hdr = genlmsg_put(skb, portid, seq, &ila_nl_family, flags, cmd);
+	if (!hdr)
+		return -ENOMEM;
+
+	if (ila_fill_info(ila, skb) < 0)
+		goto nla_put_failure;
+
+	genlmsg_end(skb, hdr);
+	return 0;
+
+nla_put_failure:
+	genlmsg_cancel(skb, hdr);
+	return -EMSGSIZE;
+}
+
+static int ila_nl_cmd_get_mapping(struct sk_buff *skb, struct genl_info *info)
+{
+	struct net *net = genl_info_net(info);
+	struct ila_net *ilan = net_generic(net, ila_net_id);
+	struct sk_buff *msg;
+	struct ila_xlat_params p;
+	struct ila_map *ila;
+	int ret;
+
+	ret = parse_nl_config(info, &p);
+	if (ret)
+		return ret;
+
+	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!msg)
+		return -ENOMEM;
+
+	rcu_read_lock();
+
+	ila = ila_lookup_by_params(&p, ilan);
+	if (ila) {
+		ret = ila_dump_info(ila,
+				    info->snd_portid,
+				    info->snd_seq, 0, msg,
+				    info->genlhdr->cmd);
+	}
+
+	rcu_read_unlock();
+
+	if (ret < 0)
+		goto out_free;
+
+	return genlmsg_reply(msg, info);
+
+out_free:
+	nlmsg_free(msg);
+	return ret;
+}
+
+struct ila_dump_iter {
+	struct rhashtable_iter rhiter;
+};
+
+static int ila_nl_dump_start(struct netlink_callback *cb)
+{
+	struct net *net = sock_net(cb->skb->sk);
+	struct ila_net *ilan = net_generic(net, ila_net_id);
+	struct ila_dump_iter *iter = (struct ila_dump_iter *)cb->args;
+
+	return rhashtable_walk_init(&ilan->rhash_table, &iter->rhiter);
+}
+
+static int ila_nl_dump_done(struct netlink_callback *cb)
+{
+	struct ila_dump_iter *iter = (struct ila_dump_iter *)cb->args;
+
+	rhashtable_walk_exit(&iter->rhiter);
+
+	return 0;
+}
+
+static int ila_nl_dump(struct sk_buff *skb, struct netlink_callback *cb)
+{
+	struct ila_dump_iter *iter = (struct ila_dump_iter *)cb->args;
+	struct rhashtable_iter *rhiter = &iter->rhiter;
+	struct ila_map *ila;
+	int ret;
+
+	ret = rhashtable_walk_start(rhiter);
+	if (ret && ret != -EAGAIN)
+		goto done;
+
+	for (;;) {
+		ila = rhashtable_walk_next(rhiter);
+
+		if (IS_ERR(ila)) {
+			if (PTR_ERR(ila) == -EAGAIN)
+				continue;
+			ret = PTR_ERR(ila);
+			goto done;
+		} else if (!ila) {
+			break;
+		}
+
+		while (ila) {
+			ret =  ila_dump_info(ila, NETLINK_CB(cb->skb).portid,
+					     cb->nlh->nlmsg_seq, NLM_F_MULTI,
+					     skb, ILA_CMD_GET);
+			if (ret)
+				goto done;
+
+			ila = rcu_access_pointer(ila->next);
+		}
+	}
+
+	ret = skb->len;
+
+done:
+	rhashtable_walk_stop(rhiter);
+	return ret;
+}
+
+static const struct genl_ops ila_nl_ops[] = {
+	{
+		.cmd = ILA_CMD_ADD,
+		.doit = ila_nl_cmd_add_mapping,
+		.policy = ila_nl_policy,
+		.flags = GENL_ADMIN_PERM,
+	},
+	{
+		.cmd = ILA_CMD_DEL,
+		.doit = ila_nl_cmd_del_mapping,
+		.policy = ila_nl_policy,
+		.flags = GENL_ADMIN_PERM,
+	},
+	{
+		.cmd = ILA_CMD_GET,
+		.doit = ila_nl_cmd_get_mapping,
+		.start = ila_nl_dump_start,
+		.dumpit = ila_nl_dump,
+		.done = ila_nl_dump_done,
+		.policy = ila_nl_policy,
+	},
+};
+
+#define ILA_HASH_TABLE_SIZE 1024
+
+static __net_init int ila_init_net(struct net *net)
+{
+	int err;
+	struct ila_net *ilan = net_generic(net, ila_net_id);
+
+	err = alloc_ila_locks(ilan, GFP_KERNEL);
+	if (err)
+		return err;
+
+	rhashtable_init(&ilan->rhash_table, &rht_params);
+
+	return 0;
+}
+
+static __net_exit void ila_exit_net(struct net *net)
+{
+	struct ila_net *ilan = net_generic(net, ila_net_id);
+
+	rhashtable_free_and_destroy(&ilan->rhash_table, ila_free_cb, NULL);
+
+	kvfree(ilan->locks);
+}
+
+static struct pernet_operations ila_net_ops = {
+	.init = ila_init_net,
+	.exit = ila_exit_net,
+	.id   = &ila_net_id,
+	.size = sizeof(struct ila_net),
+};
+
+static int ila_xlat_addr_in(struct sk_buff *skb)
+{
+	struct ila_map *ila;
+	struct ipv6hdr *ip6h = ipv6_hdr(skb);
+	struct net *net = dev_net(skb->dev);
+	struct ila_net *ilan = net_generic(net, ila_net_id);
+	__be64 identifier, locator_match;
+	size_t nhoff;
+
+	/* Assumes skb contains a valid IPv6 header that is pulled */
+
+	identifier = *(__be64 *)&ip6h->daddr.in6_u.u6_addr8[8];
+	locator_match = *(__be64 *)&ip6h->daddr.in6_u.u6_addr8[0];
+	nhoff = sizeof(struct ipv6hdr);
+
+	rcu_read_lock();
+
+	ila = ila_lookup_wildcards(identifier, locator_match,
+				   skb->dev->ifindex, ILA_DIR_IN, ilan);
+	if (ila)
+		update_ipv6_locator(skb, &ila->p.ip);
+
+	rcu_read_unlock();
+
+	return 0;
+}
+
+struct xfrm6_xlat_addr ila_xlat = {
+	.xlat = ila_xlat_addr_in,
+};
+
+int ila_xlat_init(void)
+{
+	int ret;
+
+	ret = register_pernet_device(&ila_net_ops);
+	if (ret)
+		goto exit;
+
+	ret = genl_register_family_with_ops(&ila_nl_family,
+					    ila_nl_ops);
+	if (ret < 0)
+		goto unregister;
+
+	xfrm6_xlat_addr_add(&ila_xlat);
+
+	return 0;
+
+unregister:
+	unregister_pernet_device(&ila_net_ops);
+exit:
+	return ret;
+}
+
+void ila_xlat_fini(void)
+{
+	int i;
+
+	xfrm6_xlat_addr_del(&ila_xlat);
+	genl_unregister_family(&ila_nl_family);
+	unregister_pernet_device(&ila_net_ops);
+}
+
-- 
2.4.6


* Re: [PATCH net-next 6/6] ila: Add support for xfrm6_xlat_addr
  2015-09-29 22:17 ` [PATCH net-next 6/6] ila: Add support for xfrm6_xlat_addr Tom Herbert
@ 2015-09-29 22:34   ` kbuild test robot
  2015-09-29 22:49   ` kbuild test robot
  2015-09-29 23:18   ` kbuild test robot
  2 siblings, 0 replies; 15+ messages in thread
From: kbuild test robot @ 2015-09-29 22:34 UTC (permalink / raw)
  To: Tom Herbert; +Cc: kbuild-all, davem, netdev, kernel-team

[-- Attachment #1: Type: text/plain, Size: 1328 bytes --]

Hi Tom,

[auto build test results on next-20150929 -- if it's inappropriate base, please ignore]

config: xtensa-allyesconfig (attached as .config)
reproduce:
  wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
  chmod +x ~/bin/make.cross
  git checkout c505336670b5c681c0a36053a68591e0f9074245
  # save the attached .config to linux build tree
  make.cross ARCH=xtensa 

All warnings (new ones prefixed by >>):

   net/ipv6/ila/ila_xlat.c: In function 'ila_xlat_fini':
>> net/ipv6/ila/ila_xlat.c:636:6: warning: unused variable 'i' [-Wunused-variable]
     int i;
         ^

vim +/i +636 net/ipv6/ila/ila_xlat.c

   620						    ila_nl_ops);
   621		if (ret < 0)
   622			goto unregister;
   623	
   624		xfrm6_xlat_addr_add(&ila_xlat);
   625	
   626		return 0;
   627	
   628	unregister:
   629		unregister_pernet_device(&ila_net_ops);
   630	exit:
   631		return ret;
   632	}
   633	
   634	void ila_xlat_fini(void)
   635	{
 > 636		int i;
   637	
   638		xfrm6_xlat_addr_del(&ila_xlat);
   639		genl_unregister_family(&ila_nl_family);
   640		unregister_pernet_device(&ila_net_ops);
   641	}
   642	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 42546 bytes --]


* Re: [PATCH net-next 6/6] ila: Add support for xfrm6_xlat_addr
  2015-09-29 22:17 ` [PATCH net-next 6/6] ila: Add support for xfrm6_xlat_addr Tom Herbert
  2015-09-29 22:34   ` kbuild test robot
@ 2015-09-29 22:49   ` kbuild test robot
  2015-09-29 23:18   ` kbuild test robot
  2 siblings, 0 replies; 15+ messages in thread
From: kbuild test robot @ 2015-09-29 22:49 UTC (permalink / raw)
  To: Tom Herbert; +Cc: kbuild-all, davem, netdev, kernel-team

[-- Attachment #1: Type: text/plain, Size: 764 bytes --]

Hi Tom,

[auto build test results on next-20150929 -- if it's inappropriate base, please ignore]

config: m68k-sun3_defconfig (attached as .config)
reproduce:
  wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
  chmod +x ~/bin/make.cross
  git checkout c505336670b5c681c0a36053a68591e0f9074245
  # save the attached .config to linux build tree
  make.cross ARCH=m68k 

All error/warnings (new ones prefixed by >>):

>> ERROR: "xfrm6_xlat_addr_fini" [net/ipv6/ipv6.ko] undefined!
>> ERROR: "xfrm6_xlat_addr_init" [net/ipv6/ipv6.ko] undefined!

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 10984 bytes --]


* Re: [PATCH net-next 4/6] xfrm: Add xfrm6 address translation function
  2015-09-29 22:17 ` [PATCH net-next 4/6] xfrm: Add xfrm6 address translation function Tom Herbert
@ 2015-09-29 22:58   ` David Ahern
  2015-09-30  8:39     ` Steffen Klassert
  0 siblings, 1 reply; 15+ messages in thread
From: David Ahern @ 2015-09-29 22:58 UTC (permalink / raw)
  To: Tom Herbert, davem, netdev; +Cc: kernel-team

Hi Tom:

On 9/29/15 4:17 PM, Tom Herbert wrote:
> This patch adds xfrm6_xlat_addr which is called in the data path
> to perform address translation (primarily for the receive path). Modules
> may register their own callback to perform a translation-- this
> registration is managed by xfrm6_xlat_addr_add and xfrm6_xlat_addr_del.
> xfrm6_xlat_addr allows translation of addresses for an sk_buff.


Seems like a stretch to lump this into xfrms. You have a separate genl 
based config as opposed to the netlink xfrm API and you are calling the 
xlat_addr function directly in ip6_rcv as opposed to via some policy 
with dst_ops driven redirection. Why call this a xfrm?


* Re: [PATCH net-next 6/6] ila: Add support for xfrm6_xlat_addr
  2015-09-29 22:17 ` [PATCH net-next 6/6] ila: Add support for xfrm6_xlat_addr Tom Herbert
  2015-09-29 22:34   ` kbuild test robot
  2015-09-29 22:49   ` kbuild test robot
@ 2015-09-29 23:18   ` kbuild test robot
  2 siblings, 0 replies; 15+ messages in thread
From: kbuild test robot @ 2015-09-29 23:18 UTC (permalink / raw)
  To: Tom Herbert; +Cc: kbuild-all, davem, netdev, kernel-team

Hi Tom,

[auto build test results on next-20150929 -- if it's inappropriate base, please ignore]

reproduce:
  # apt-get install sparse
  make ARCH=x86_64 allmodconfig
  make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

>> net/ipv6/ila/ila_xlat.c:218:24: sparse: incompatible types in comparison expression (different address spaces)
   net/ipv6/ila/ila_xlat.c:269:32: sparse: incompatible types in comparison expression (different address spaces)
   net/ipv6/ila/ila_xlat.c:275:25: sparse: incompatible types in comparison expression (different address spaces)
   net/ipv6/ila/ila_xlat.c:279:25: sparse: incompatible types in comparison expression (different address spaces)
   net/ipv6/ila/ila_xlat.c:315:31: sparse: incompatible types in comparison expression (different address spaces)
   net/ipv6/ila/ila_xlat.c:329:32: sparse: incompatible types in comparison expression (different address spaces)
   net/ipv6/ila/ila_xlat.c:201:23: sparse: incompatible types in comparison expression (different address spaces)
   net/ipv6/ila/ila_xlat.c:514:31: sparse: incompatible types in comparison expression (different address spaces)
   net/ipv6/ila/ila_xlat.c:184:23: sparse: incompatible types in comparison expression (different address spaces)
   net/ipv6/ila/ila_xlat.c: In function 'ila_xlat_fini':
   net/ipv6/ila/ila_xlat.c:636:6: warning: unused variable 'i' [-Wunused-variable]
     int i;
         ^

vim +218 net/ipv6/ila/ila_xlat.c

   202		}
   203	
   204		return NULL;
   205	}
   206	
   207	static inline void ila_release(struct ila_map *ila)
   208	{
   209		kfree_rcu(ila, rcu);
   210	}
   211	
   212	static void ila_free_cb(void *ptr, void *arg)
   213	{
   214		struct ila_map *ila = (struct ila_map *)ptr, *next;
   215	
   216		/* Assume rcu_readlock held */
   217		while (ila) {
 > 218			next = rcu_access_pointer(ila->next);
   219			ila_release(ila);
   220			ila = next;
   221		}
   222	}
   223	
   224	static int ila_add_mapping(struct net *net, struct ila_xlat_params *p)
   225	{
   226		struct ila_net *ilan = net_generic(net, ila_net_id);

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


* Re: [PATCH net-next 5/6] ipv6: Call xfrm6_xlat_addr from ipv6_rcv
  2015-09-29 22:17 ` [PATCH net-next 5/6] ipv6: Call xfrm6_xlat_addr from ipv6_rcv Tom Herbert
@ 2015-09-29 23:26   ` Florian Westphal
  2015-09-30  9:06   ` Steffen Klassert
  1 sibling, 0 replies; 15+ messages in thread
From: Florian Westphal @ 2015-09-29 23:26 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev, kernel-team

Tom Herbert <tom@herbertland.com> wrote:
> Call before performing NF_HOOK and routing in order to perform address
> translation in the receive path.
> 
> Signed-off-by: Tom Herbert <tom@herbertland.com>
> ---
>  net/ipv6/ip6_input.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
> index 9075acf..06dac55 100644
> --- a/net/ipv6/ip6_input.c
> +++ b/net/ipv6/ip6_input.c
> @@ -183,6 +183,9 @@ int ipv6_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt
>  	/* Must drop socket now because of tproxy. */
>  	skb_orphan(skb);
>  
> +	/* Translate destination address before routing */
> +	xfrm6_xlat_addr(skb);
> +

Ugh.  Yet another hook :-(
One would think we have enough by now.

In any case, I still think this ILA translation stuff should either
go into xtables (NPT-ish), nftables, or into tc if nft is unusable for
whatever reason.  Judging by where this hook is placed, nf hooks
would work just fine.

If the iptables traverser has too high a cost (unfortunately, the
xtables design enforces counters and iface name matching even if it's
not wanted/needed, for instance), maybe nft would perform better in that
regard.


* Re: [PATCH net-next 4/6] xfrm: Add xfrm6 address translation function
  2015-09-29 22:58   ` David Ahern
@ 2015-09-30  8:39     ` Steffen Klassert
  0 siblings, 0 replies; 15+ messages in thread
From: Steffen Klassert @ 2015-09-30  8:39 UTC (permalink / raw)
  To: David Ahern; +Cc: Tom Herbert, davem, netdev, kernel-team

On Tue, Sep 29, 2015 at 04:58:46PM -0600, David Ahern wrote:
> Hi Tom:
> 
> On 9/29/15 4:17 PM, Tom Herbert wrote:
> >This patch adds xfrm6_xlat_addr which is called in the data path
> >to perform address translation (primarily for the receive path). Modules
> >may register their own callback to perform a translation-- this
> >registration is managed by xfrm6_xlat_addr_add and xfrm6_xlat_addr_del.
> >xfrm6_xlat_addr allows translation of addresses for an sk_buff.
> 
> 
> Seems like a stretch to lump this into xfrms. You have a separate
> genl based config as opposed to the netlink xfrm API and you are
> calling the xlat_addr function directly in ip6_rcv as opposed to via
> some policy with dst_ops driven redirection. Why call this a xfrm?

I have to agree here. We have policies and states to do the lookups
and to describe the transformation. Just adding a callback to do this
in a different way does not integrate well into xfrm.


* Re: [PATCH net-next 5/6] ipv6: Call xfrm6_xlat_addr from ipv6_rcv
  2015-09-29 22:17 ` [PATCH net-next 5/6] ipv6: Call xfrm6_xlat_addr from ipv6_rcv Tom Herbert
  2015-09-29 23:26   ` Florian Westphal
@ 2015-09-30  9:06   ` Steffen Klassert
  2015-09-30 18:40     ` Tom Herbert
  1 sibling, 1 reply; 15+ messages in thread
From: Steffen Klassert @ 2015-09-30  9:06 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev, kernel-team

On Tue, Sep 29, 2015 at 03:17:22PM -0700, Tom Herbert wrote:
> Call before performing NF_HOOK and routing in order to perform address
> translation in the receive path.
> 
> Signed-off-by: Tom Herbert <tom@herbertland.com>
> ---
>  net/ipv6/ip6_input.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
> index 9075acf..06dac55 100644
> --- a/net/ipv6/ip6_input.c
> +++ b/net/ipv6/ip6_input.c
> @@ -183,6 +183,9 @@ int ipv6_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt
>  	/* Must drop socket now because of tproxy. */
>  	skb_orphan(skb);
>  
> +	/* Translate destination address before routing */
> +	xfrm6_xlat_addr(skb);
> +

This shows that xfrm is not the right place to add this. The existing
xfrm hooks are located at the same place as your current LWT hooks are.

You could use the existing xfrm hooks similar to xfrm tunnel modes.
This reinserts the transformed packet back into layer2, but I guess
this is not what you want.

I'm currently playing with a GRO codepath for IPsec to get the
packets transformed early. If you can do your address translation
that early, it could be an option too. This clearly depends on
GRO being enabled at the receiving device, but you would still have
the LWT hook as a fallback.

>  	return NF_HOOK(NFPROTO_IPV6, NF_INET_PRE_ROUTING,
>  		       net, NULL, skb, dev, NULL,
>  		       ip6_rcv_finish);

Or, try to use the netfilter hook that seems to be at the right
place at least.


* Re: [PATCH net-next 5/6] ipv6: Call xfrm6_xlat_addr from ipv6_rcv
  2015-09-30  9:06   ` Steffen Klassert
@ 2015-09-30 18:40     ` Tom Herbert
  0 siblings, 0 replies; 15+ messages in thread
From: Tom Herbert @ 2015-09-30 18:40 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: David S. Miller, Linux Kernel Network Developers, Kernel Team

On Wed, Sep 30, 2015 at 2:06 AM, Steffen Klassert
<steffen.klassert@secunet.com> wrote:
> On Tue, Sep 29, 2015 at 03:17:22PM -0700, Tom Herbert wrote:
>> Call before performing NF_HOOK and routing in order to perform address
>> translation in the receive path.
>>
>> Signed-off-by: Tom Herbert <tom@herbertland.com>
>> ---
>>  net/ipv6/ip6_input.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
>> index 9075acf..06dac55 100644
>> --- a/net/ipv6/ip6_input.c
>> +++ b/net/ipv6/ip6_input.c
>> @@ -183,6 +183,9 @@ int ipv6_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt
>>       /* Must drop socket now because of tproxy. */
>>       skb_orphan(skb);
>>
>> +     /* Translate destination address before routing */
>> +     xfrm6_xlat_addr(skb);
>> +
>
> This shows that xfrm is not the right place to add this. The existing
> xfrm hooks are located at the same place as your current LWT hooks are.
>
> You could use the existing xfrm hooks similar to xfrm tunnel modes.
> This reinserts the transformed packet back into layer2, but I guess
> this is not what you want.
>
> I'm currently playing with a GRO codepath for IPsec to get the
> packets transformed early. If you can do your address translation
> that early, it could be an option too. This clearly depends on
> enabled GRO at the receiving device, but you would still have
> the LWT hook as a fallback.
>
GRO probably doesn't help here. ILA already works with GRO, and
performing translation for every segment instead of just once for the
GRO packet would be unnecessary overhead. Besides, that still doesn't
address the problem of how to hook in a lookup and translation
function in the data path.

>>       return NF_HOOK(NFPROTO_IPV6, NF_INET_PRE_ROUTING,
>>                      net, NULL, skb, dev, NULL,
>>                      ip6_rcv_finish);
>
> Or, try to use the netfilter hook that seems to be at the right
> place at least.
>
My original patch did hook into nf so it didn't require any change to
IP data path. The suggested alternatives were to use iptables or nft,
but their overhead is too great for them to be useful as a
performance optimization. The problem is that any additional lookup
added for this purpose only makes sense if it is significantly cheaper
than the cost of doing a route lookup (the part that can be eliminated
by early demux), and needs to have near zero impact on unrelated
traffic.

Tom
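
To make the cost argument concrete, here is a minimal userspace sketch of the kind of lookup being discussed: one hash probe on the low 64 bits (identifier) of the destination address and, on a hit, an overwrite of the high 64 bits (locator). It is an illustration under stated assumptions, not the patch's implementation. The patch set uses an rhashtable, while this sketch uses a fixed-size open-addressed table with no deletion, keeps addresses as two host-order 64-bit halves, and ignores any checksum adjustment. All names (`ila_table`, `ila_insert`, `ila_lookup`, `ila_xlat`, `addr6`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TBL_SIZE 64 /* must match the 6-bit hash below */

struct ila_entry {
	uint64_t identifier; /* low 64 bits of the address (lookup key) */
	uint64_t locator;    /* high 64 bits to write on a hit */
	int used;
};

struct ila_table {
	struct ila_entry slots[TBL_SIZE];
};

/* Simplified address: two 64-bit halves in host byte order. */
struct addr6 {
	uint64_t hi;
	uint64_t lo;
};

/* Multiplicative hash; the top 6 bits select one of 64 slots. */
static size_t slot_of(uint64_t id)
{
	return (size_t)((id * 0x9E3779B97F4A7C15ULL) >> 58);
}

/* Insert or update a mapping; returns 0 on success, -1 if the table is full. */
int ila_insert(struct ila_table *t, uint64_t id, uint64_t locator)
{
	assert(t != NULL);
	for (size_t n = 0; n < TBL_SIZE; n++) {
		struct ila_entry *e = &t->slots[(slot_of(id) + n) % TBL_SIZE];

		if (!e->used || e->identifier == id) {
			e->identifier = id;
			e->locator = locator;
			e->used = 1;
			return 0;
		}
	}
	return -1;
}

/* Linear probing; an empty slot ends the search, so a miss is usually cheap. */
const struct ila_entry *ila_lookup(const struct ila_table *t, uint64_t id)
{
	for (size_t n = 0; n < TBL_SIZE; n++) {
		const struct ila_entry *e = &t->slots[(slot_of(id) + n) % TBL_SIZE];

		if (!e->used)
			return NULL;
		if (e->identifier == id)
			return e;
	}
	return NULL;
}

/* Translate in place; returns 1 if the address was rewritten, 0 otherwise. */
int ila_xlat(const struct ila_table *t, struct addr6 *a)
{
	const struct ila_entry *e = ila_lookup(t, a->lo);

	if (!e)
		return 0; /* unrelated traffic: address left untouched */
	a->hi = e->locator;
	return 1;
}
```

The miss path matters most for the "near zero impact on unrelated traffic" requirement: with a sparsely filled table, a packet with no mapping typically pays for one hash and one slot check.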


