All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net 0/2] lwtunnel: make it really work, for IPv4
@ 2015-09-22 16:12 Jiri Benc
  2015-09-22 16:12 ` [PATCH net 1/2] ipv4: send arp replies to the correct tunnel Jiri Benc
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Jiri Benc @ 2015-09-22 16:12 UTC (permalink / raw)
  To: netdev; +Cc: Thomas Graf, Roopa Prabhu

One of the selling points of lwtunnel was the ability to specify the tunnel
destination using routes. However, this doesn't really work currently, as
ARP and ndisc replies are not handled correctly. ARP and ndisc replies won't
have tunnel metadata attached, thus they will be sent out with the default
parameters or not sent at all, either way never reaching the requester.

Most of the egress tunnel parameters can be inferred from the ingress
metada. The only and important exception is UDP ports. This patchset infers
the egress data from the ingress data and disallow settings of UDP ports in
tunnel routes. If there's a need for different UDP ports, a new interface
needs to be created for each port combination. Note that it's still possible
to specify the UDP ports to use, it just needs to be done while creating the
vxlan/geneve interface.

This covers only ARPs. IPv6 ndisc has the same problem but is harder to
solve, as there's already dst attached to outgoing skbs. Ideas to solve this
are welcome.

Jiri Benc (2):
  ipv4: send arp replies to the correct tunnel
  lwtunnel: remove source and destination UDP port config option

 include/net/ip_tunnels.h      |  2 ++
 include/uapi/linux/lwtunnel.h |  4 ----
 net/ipv4/arp.c                | 39 ++++++++++++++++++++------------
 net/ipv4/ip_tunnel_core.c     | 52 +++++++++++++++++++++++--------------------
 4 files changed, 55 insertions(+), 42 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH net 1/2] ipv4: send arp replies to the correct tunnel
  2015-09-22 16:12 [PATCH net 0/2] lwtunnel: make it really work, for IPv4 Jiri Benc
@ 2015-09-22 16:12 ` Jiri Benc
  2015-09-23  8:10   ` Thomas Graf
  2015-09-22 16:12 ` [PATCH net 2/2] lwtunnel: remove source and destination UDP port config option Jiri Benc
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Jiri Benc @ 2015-09-22 16:12 UTC (permalink / raw)
  To: netdev; +Cc: Thomas Graf, Roopa Prabhu

When using ip lwtunnels, the additional data for xmit (basically, the actual
tunnel to use) are carried in ip_tunnel_info either in dst->lwtstate or in
metadata dst. When replying to ARP requests, we need to send the reply to
the same tunnel the request came from. This means we need to construct
proper metadata dst for ARP replies.

We could perform another route lookup to get a dst entry with the correct
lwtstate. However, this won't always ensure that the outgoing tunnel is the
same as the incoming one, and it won't work anyway for IPv4 duplicate
address detection.

The only thing to do is to "reverse" the ip_tunnel_info.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
---
 include/net/ip_tunnels.h  |  2 ++
 net/ipv4/arp.c            | 39 +++++++++++++++++++++++++--------------
 net/ipv4/ip_tunnel_core.c | 28 ++++++++++++++++++++++++++++
 3 files changed, 55 insertions(+), 14 deletions(-)

diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 9a6a3ba888e8..f6dafec9102c 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -276,6 +276,8 @@ int iptunnel_pull_header(struct sk_buff *skb, int hdr_len, __be16 inner_proto);
 int iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
 		  __be32 src, __be32 dst, u8 proto,
 		  u8 tos, u8 ttl, __be16 df, bool xnet);
+struct metadata_dst *iptunnel_metadata_reply(struct metadata_dst *md,
+					     gfp_t flags);
 
 struct sk_buff *iptunnel_handle_offloads(struct sk_buff *skb, bool gre_csum,
 					 int gso_type_mask);
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 30409b75e925..f03db8b7abee 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -113,6 +113,8 @@
 #include <net/arp.h>
 #include <net/ax25.h>
 #include <net/netrom.h>
+#include <net/dst_metadata.h>
+#include <net/ip_tunnels.h>
 
 #include <linux/uaccess.h>
 
@@ -296,7 +298,8 @@ static void arp_send_dst(int type, int ptype, __be32 dest_ip,
 			 struct net_device *dev, __be32 src_ip,
 			 const unsigned char *dest_hw,
 			 const unsigned char *src_hw,
-			 const unsigned char *target_hw, struct sk_buff *oskb)
+			 const unsigned char *target_hw,
+			 struct dst_entry *dst)
 {
 	struct sk_buff *skb;
 
@@ -309,9 +312,7 @@ static void arp_send_dst(int type, int ptype, __be32 dest_ip,
 	if (!skb)
 		return;
 
-	if (oskb)
-		skb_dst_copy(skb, oskb);
-
+	skb_dst_set(skb, dst);
 	arp_xmit(skb);
 }
 
@@ -333,6 +334,7 @@ static void arp_solicit(struct neighbour *neigh, struct sk_buff *skb)
 	__be32 target = *(__be32 *)neigh->primary_key;
 	int probes = atomic_read(&neigh->probes);
 	struct in_device *in_dev;
+	struct dst_entry *dst = NULL;
 
 	rcu_read_lock();
 	in_dev = __in_dev_get_rcu(dev);
@@ -381,9 +383,10 @@ static void arp_solicit(struct neighbour *neigh, struct sk_buff *skb)
 		}
 	}
 
+	if (skb && !(dev->priv_flags & IFF_XMIT_DST_RELEASE))
+		dst = dst_clone(skb_dst(skb));
 	arp_send_dst(ARPOP_REQUEST, ETH_P_ARP, target, dev, saddr,
-		     dst_hw, dev->dev_addr, NULL,
-		     dev->priv_flags & IFF_XMIT_DST_RELEASE ? NULL : skb);
+		     dst_hw, dev->dev_addr, NULL, dst);
 }
 
 static int arp_ignore(struct in_device *in_dev, __be32 sip, __be32 tip)
@@ -649,6 +652,7 @@ static int arp_process(struct sock *sk, struct sk_buff *skb)
 	int addr_type;
 	struct neighbour *n;
 	struct net *net = dev_net(dev);
+	struct dst_entry *reply_dst = NULL;
 	bool is_garp = false;
 
 	/* arp_rcv below verifies the ARP header and verifies the device
@@ -749,13 +753,18 @@ static int arp_process(struct sock *sk, struct sk_buff *skb)
  *  cache.
  */
 
+	if (arp->ar_op == htons(ARPOP_REQUEST) && skb_metadata_dst(skb))
+		reply_dst = (struct dst_entry *)
+			    iptunnel_metadata_reply(skb_metadata_dst(skb),
+						    GFP_ATOMIC);
+
 	/* Special case: IPv4 duplicate address detection packet (RFC2131) */
 	if (sip == 0) {
 		if (arp->ar_op == htons(ARPOP_REQUEST) &&
 		    inet_addr_type_dev_table(net, dev, tip) == RTN_LOCAL &&
 		    !arp_ignore(in_dev, sip, tip))
-			arp_send(ARPOP_REPLY, ETH_P_ARP, sip, dev, tip, sha,
-				 dev->dev_addr, sha);
+			arp_send_dst(ARPOP_REPLY, ETH_P_ARP, sip, dev, tip,
+				     sha, dev->dev_addr, sha, reply_dst);
 		goto out;
 	}
 
@@ -774,9 +783,10 @@ static int arp_process(struct sock *sk, struct sk_buff *skb)
 			if (!dont_send) {
 				n = neigh_event_ns(&arp_tbl, sha, &sip, dev);
 				if (n) {
-					arp_send(ARPOP_REPLY, ETH_P_ARP, sip,
-						 dev, tip, sha, dev->dev_addr,
-						 sha);
+					arp_send_dst(ARPOP_REPLY, ETH_P_ARP,
+						     sip, dev, tip, sha,
+						     dev->dev_addr, sha,
+						     reply_dst);
 					neigh_release(n);
 				}
 			}
@@ -794,9 +804,10 @@ static int arp_process(struct sock *sk, struct sk_buff *skb)
 				if (NEIGH_CB(skb)->flags & LOCALLY_ENQUEUED ||
 				    skb->pkt_type == PACKET_HOST ||
 				    NEIGH_VAR(in_dev->arp_parms, PROXY_DELAY) == 0) {
-					arp_send(ARPOP_REPLY, ETH_P_ARP, sip,
-						 dev, tip, sha, dev->dev_addr,
-						 sha);
+					arp_send_dst(ARPOP_REPLY, ETH_P_ARP,
+						     sip, dev, tip, sha,
+						     dev->dev_addr, sha,
+						     reply_dst);
 				} else {
 					pneigh_enqueue(&arp_tbl,
 						       in_dev->arp_parms, skb);
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index 9b97204b8c81..ce3a1e728606 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -46,6 +46,7 @@
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
 #include <net/rtnetlink.h>
+#include <net/dst_metadata.h>
 
 int iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
 		  __be32 src, __be32 dst, __u8 proto,
@@ -119,6 +120,33 @@ int iptunnel_pull_header(struct sk_buff *skb, int hdr_len, __be16 inner_proto)
 }
 EXPORT_SYMBOL_GPL(iptunnel_pull_header);
 
+struct metadata_dst *iptunnel_metadata_reply(struct metadata_dst *md,
+					     gfp_t flags)
+{
+	struct metadata_dst *res;
+	struct ip_tunnel_info *dst, *src;
+
+	if (!md || md->u.tun_info.mode & IP_TUNNEL_INFO_TX)
+		return NULL;
+
+	res = metadata_dst_alloc(0, flags);
+	if (!res)
+		return NULL;
+
+	dst = &res->u.tun_info;
+	src = &md->u.tun_info;
+	dst->key.tun_id = src->key.tun_id;
+	if (src->mode & IP_TUNNEL_INFO_IPV6)
+		memcpy(&dst->key.u.ipv6.dst, &src->key.u.ipv6.src,
+		       sizeof(struct in6_addr));
+	else
+		dst->key.u.ipv4.dst = src->key.u.ipv4.src;
+	dst->mode = src->mode | IP_TUNNEL_INFO_TX;
+
+	return res;
+}
+EXPORT_SYMBOL_GPL(iptunnel_metadata_reply);
+
 struct sk_buff *iptunnel_handle_offloads(struct sk_buff *skb,
 					 bool csum_help,
 					 int gso_type_mask)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net 2/2] lwtunnel: remove source and destination UDP port config option
  2015-09-22 16:12 [PATCH net 0/2] lwtunnel: make it really work, for IPv4 Jiri Benc
  2015-09-22 16:12 ` [PATCH net 1/2] ipv4: send arp replies to the correct tunnel Jiri Benc
@ 2015-09-22 16:12 ` Jiri Benc
  2015-09-23  8:12   ` Thomas Graf
  2015-09-23  4:39 ` [PATCH net 0/2] lwtunnel: make it really work, for IPv4 Eric W. Biederman
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Jiri Benc @ 2015-09-22 16:12 UTC (permalink / raw)
  To: netdev; +Cc: Thomas Graf, Roopa Prabhu

The UDP tunnel config is asymmetric wrt. to the ports used. The source and
destination ports from one direction of the tunnel are not related to the
ports of the other direction. We need to be able to respond to ARP requests
using the correct ports without involving routing.

As the consequence, UDP ports need to be fixed property of the tunnel
interface and cannot be set per route. Remove the ability to set ports per
route. This is still okay to do, as no kernel has been released with these
attributes yet.

Note that the ability to specify source and destination ports is preserved
for other users of the lwtunnel API which don't use routes for tunnel key
specification (like openvswitch).

If in the future we rework ARP handling to allow port specification, the
attributes can be added back.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
---
 include/uapi/linux/lwtunnel.h |  4 ----
 net/ipv4/ip_tunnel_core.c     | 24 ------------------------
 2 files changed, 28 deletions(-)

diff --git a/include/uapi/linux/lwtunnel.h b/include/uapi/linux/lwtunnel.h
index 34141a5dfe74..f8b01887a495 100644
--- a/include/uapi/linux/lwtunnel.h
+++ b/include/uapi/linux/lwtunnel.h
@@ -21,8 +21,6 @@ enum lwtunnel_ip_t {
 	LWTUNNEL_IP_SRC,
 	LWTUNNEL_IP_TTL,
 	LWTUNNEL_IP_TOS,
-	LWTUNNEL_IP_SPORT,
-	LWTUNNEL_IP_DPORT,
 	LWTUNNEL_IP_FLAGS,
 	__LWTUNNEL_IP_MAX,
 };
@@ -36,8 +34,6 @@ enum lwtunnel_ip6_t {
 	LWTUNNEL_IP6_SRC,
 	LWTUNNEL_IP6_HOPLIMIT,
 	LWTUNNEL_IP6_TC,
-	LWTUNNEL_IP6_SPORT,
-	LWTUNNEL_IP6_DPORT,
 	LWTUNNEL_IP6_FLAGS,
 	__LWTUNNEL_IP6_MAX,
 };
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index ce3a1e728606..84dce6a92f93 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -226,8 +226,6 @@ static const struct nla_policy ip_tun_policy[LWTUNNEL_IP_MAX + 1] = {
 	[LWTUNNEL_IP_SRC]	= { .type = NLA_U32 },
 	[LWTUNNEL_IP_TTL]	= { .type = NLA_U8 },
 	[LWTUNNEL_IP_TOS]	= { .type = NLA_U8 },
-	[LWTUNNEL_IP_SPORT]	= { .type = NLA_U16 },
-	[LWTUNNEL_IP_DPORT]	= { .type = NLA_U16 },
 	[LWTUNNEL_IP_FLAGS]	= { .type = NLA_U16 },
 };
 
@@ -267,12 +265,6 @@ static int ip_tun_build_state(struct net_device *dev, struct nlattr *attr,
 	if (tb[LWTUNNEL_IP_TOS])
 		tun_info->key.tos = nla_get_u8(tb[LWTUNNEL_IP_TOS]);
 
-	if (tb[LWTUNNEL_IP_SPORT])
-		tun_info->key.tp_src = nla_get_be16(tb[LWTUNNEL_IP_SPORT]);
-
-	if (tb[LWTUNNEL_IP_DPORT])
-		tun_info->key.tp_dst = nla_get_be16(tb[LWTUNNEL_IP_DPORT]);
-
 	if (tb[LWTUNNEL_IP_FLAGS])
 		tun_info->key.tun_flags = nla_get_u16(tb[LWTUNNEL_IP_FLAGS]);
 
@@ -294,8 +286,6 @@ static int ip_tun_fill_encap_info(struct sk_buff *skb,
 	    nla_put_be32(skb, LWTUNNEL_IP_SRC, tun_info->key.u.ipv4.src) ||
 	    nla_put_u8(skb, LWTUNNEL_IP_TOS, tun_info->key.tos) ||
 	    nla_put_u8(skb, LWTUNNEL_IP_TTL, tun_info->key.ttl) ||
-	    nla_put_u16(skb, LWTUNNEL_IP_SPORT, tun_info->key.tp_src) ||
-	    nla_put_u16(skb, LWTUNNEL_IP_DPORT, tun_info->key.tp_dst) ||
 	    nla_put_u16(skb, LWTUNNEL_IP_FLAGS, tun_info->key.tun_flags))
 		return -ENOMEM;
 
@@ -309,8 +299,6 @@ static int ip_tun_encap_nlsize(struct lwtunnel_state *lwtstate)
 		+ nla_total_size(4)	/* LWTUNNEL_IP_SRC */
 		+ nla_total_size(1)	/* LWTUNNEL_IP_TOS */
 		+ nla_total_size(1)	/* LWTUNNEL_IP_TTL */
-		+ nla_total_size(2)	/* LWTUNNEL_IP_SPORT */
-		+ nla_total_size(2)	/* LWTUNNEL_IP_DPORT */
 		+ nla_total_size(2);	/* LWTUNNEL_IP_FLAGS */
 }
 
@@ -333,8 +321,6 @@ static const struct nla_policy ip6_tun_policy[LWTUNNEL_IP6_MAX + 1] = {
 	[LWTUNNEL_IP6_SRC]		= { .len = sizeof(struct in6_addr) },
 	[LWTUNNEL_IP6_HOPLIMIT]		= { .type = NLA_U8 },
 	[LWTUNNEL_IP6_TC]		= { .type = NLA_U8 },
-	[LWTUNNEL_IP6_SPORT]		= { .type = NLA_U16 },
-	[LWTUNNEL_IP6_DPORT]		= { .type = NLA_U16 },
 	[LWTUNNEL_IP6_FLAGS]		= { .type = NLA_U16 },
 };
 
@@ -374,12 +360,6 @@ static int ip6_tun_build_state(struct net_device *dev, struct nlattr *attr,
 	if (tb[LWTUNNEL_IP6_TC])
 		tun_info->key.tos = nla_get_u8(tb[LWTUNNEL_IP6_TC]);
 
-	if (tb[LWTUNNEL_IP6_SPORT])
-		tun_info->key.tp_src = nla_get_be16(tb[LWTUNNEL_IP6_SPORT]);
-
-	if (tb[LWTUNNEL_IP6_DPORT])
-		tun_info->key.tp_dst = nla_get_be16(tb[LWTUNNEL_IP6_DPORT]);
-
 	if (tb[LWTUNNEL_IP6_FLAGS])
 		tun_info->key.tun_flags = nla_get_u16(tb[LWTUNNEL_IP6_FLAGS]);
 
@@ -401,8 +381,6 @@ static int ip6_tun_fill_encap_info(struct sk_buff *skb,
 	    nla_put_in6_addr(skb, LWTUNNEL_IP6_SRC, &tun_info->key.u.ipv6.src) ||
 	    nla_put_u8(skb, LWTUNNEL_IP6_HOPLIMIT, tun_info->key.tos) ||
 	    nla_put_u8(skb, LWTUNNEL_IP6_TC, tun_info->key.ttl) ||
-	    nla_put_u16(skb, LWTUNNEL_IP6_SPORT, tun_info->key.tp_src) ||
-	    nla_put_u16(skb, LWTUNNEL_IP6_DPORT, tun_info->key.tp_dst) ||
 	    nla_put_u16(skb, LWTUNNEL_IP6_FLAGS, tun_info->key.tun_flags))
 		return -ENOMEM;
 
@@ -416,8 +394,6 @@ static int ip6_tun_encap_nlsize(struct lwtunnel_state *lwtstate)
 		+ nla_total_size(16)	/* LWTUNNEL_IP6_SRC */
 		+ nla_total_size(1)	/* LWTUNNEL_IP6_HOPLIMIT */
 		+ nla_total_size(1)	/* LWTUNNEL_IP6_TC */
-		+ nla_total_size(2)	/* LWTUNNEL_IP6_SPORT */
-		+ nla_total_size(2)	/* LWTUNNEL_IP6_DPORT */
 		+ nla_total_size(2);	/* LWTUNNEL_IP6_FLAGS */
 }
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH net 0/2] lwtunnel: make it really work, for IPv4
  2015-09-22 16:12 [PATCH net 0/2] lwtunnel: make it really work, for IPv4 Jiri Benc
  2015-09-22 16:12 ` [PATCH net 1/2] ipv4: send arp replies to the correct tunnel Jiri Benc
  2015-09-22 16:12 ` [PATCH net 2/2] lwtunnel: remove source and destination UDP port config option Jiri Benc
@ 2015-09-23  4:39 ` Eric W. Biederman
  2015-09-23  8:09   ` Thomas Graf
  2015-09-23  8:08 ` Thomas Graf
  2015-09-24 21:32 ` David Miller
  4 siblings, 1 reply; 18+ messages in thread
From: Eric W. Biederman @ 2015-09-23  4:39 UTC (permalink / raw)
  To: Jiri Benc; +Cc: netdev, Thomas Graf, Roopa Prabhu

Jiri Benc <jbenc@redhat.com> writes:

> One of the selling points of lwtunnel was the ability to specify the tunnel
> destination using routes. However, this doesn't really work currently, as
> ARP and ndisc replies are not handled correctly. ARP and ndisc replies won't
> have tunnel metadata attached, thus they will be sent out with the default
> parameters or not sent at all, either way never reaching the requester.
>
> Most of the egress tunnel parameters can be inferred from the ingress
> metada. The only and important exception is UDP ports. This patchset infers
> the egress data from the ingress data and disallow settings of UDP ports in
> tunnel routes. If there's a need for different UDP ports, a new interface
> needs to be created for each port combination. Note that it's still possible
> to specify the UDP ports to use, it just needs to be done while creating the
> vxlan/geneve interface.
>
> This covers only ARPs. IPv6 ndisc has the same problem but is harder to
> solve, as there's already dst attached to outgoing skbs. Ideas to solve this
> are welcome.

What distinguishes a skb received from a tunnel as opposed to a skb
received on from a network device is that a skb recevied on a tunnel
has a socket.

I could be easily missing something but couldn't you look at skb->sk
on the input path and if a socket is present use the socket to compute
the outgoing route?

I expect it would just need to be something like:
dst = sk_dst_check(sk, 0);

Eric

> Jiri Benc (2):
>   ipv4: send arp replies to the correct tunnel
>   lwtunnel: remove source and destination UDP port config option
>
>  include/net/ip_tunnels.h      |  2 ++
>  include/uapi/linux/lwtunnel.h |  4 ----
>  net/ipv4/arp.c                | 39 ++++++++++++++++++++------------
>  net/ipv4/ip_tunnel_core.c     | 52 +++++++++++++++++++++++--------------------
>  4 files changed, 55 insertions(+), 42 deletions(-)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net 0/2] lwtunnel: make it really work, for IPv4
  2015-09-22 16:12 [PATCH net 0/2] lwtunnel: make it really work, for IPv4 Jiri Benc
                   ` (2 preceding siblings ...)
  2015-09-23  4:39 ` [PATCH net 0/2] lwtunnel: make it really work, for IPv4 Eric W. Biederman
@ 2015-09-23  8:08 ` Thomas Graf
  2015-09-24 21:32 ` David Miller
  4 siblings, 0 replies; 18+ messages in thread
From: Thomas Graf @ 2015-09-23  8:08 UTC (permalink / raw)
  To: Jiri Benc; +Cc: netdev, Roopa Prabhu

On 09/22/15 at 06:12pm, Jiri Benc wrote:
> One of the selling points of lwtunnel was the ability to specify the tunnel
> destination using routes. However, this doesn't really work currently, as
> ARP and ndisc replies are not handled correctly. ARP and ndisc replies won't
> have tunnel metadata attached, thus they will be sent out with the default
> parameters or not sent at all, either way never reaching the requester.

Note that ARP is not strictly required in most cases where this form of
encapsulation is used. In particular any kind of broadcasting is very
undesirable for obvious reasons ;-)

The matrix which maps overlay addresses to tunnel endpoint and tunnel id
would normally also contain the MAC address of the virtual endpoint or a
dummy MAC if it's a L3 only endpoint (which is what you want to go for
these days) so you would typically answer the ARP requests locally or just
fill the neighbour cache yourself.

However, I very much agree that we should make this work since it eases
use.

> Most of the egress tunnel parameters can be inferred from the ingress
> metada. The only and important exception is UDP ports. This patchset infers
> the egress data from the ingress data and disallow settings of UDP ports in
> tunnel routes. If there's a need for different UDP ports, a new interface
> needs to be created for each port combination. Note that it's still possible
> to specify the UDP ports to use, it just needs to be done while creating the
> vxlan/geneve interface.

I would assume that dst.tp_dst = src.tp_dst would work well in practice.
The port number should be standardized anyway and would only differ to
support non compatible flavours of VXLAN.

I don't mind putting this restriction on though.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net 0/2] lwtunnel: make it really work, for IPv4
  2015-09-23  4:39 ` [PATCH net 0/2] lwtunnel: make it really work, for IPv4 Eric W. Biederman
@ 2015-09-23  8:09   ` Thomas Graf
  2015-09-23 12:17     ` Eric W. Biederman
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Graf @ 2015-09-23  8:09 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Jiri Benc, netdev, Roopa Prabhu

On 09/22/15 at 11:39pm, Eric W. Biederman wrote:
> What distinguishes a skb received from a tunnel as opposed to a skb
> received on from a network device is that a skb recevied on a tunnel
> has a socket.
> 
> I could be easily missing something but couldn't you look at skb->sk
> on the input path and if a socket is present use the socket to compute
> the outgoing route?
> 
> I expect it would just need to be something like:
> dst = sk_dst_check(sk, 0);

If you are talking about the UDP socket then that socket would cache
the underlay route corresponding to the outer header. Jiri is looking
for the outer header route to derive tunnel parameters from.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net 1/2] ipv4: send arp replies to the correct tunnel
  2015-09-22 16:12 ` [PATCH net 1/2] ipv4: send arp replies to the correct tunnel Jiri Benc
@ 2015-09-23  8:10   ` Thomas Graf
  0 siblings, 0 replies; 18+ messages in thread
From: Thomas Graf @ 2015-09-23  8:10 UTC (permalink / raw)
  To: Jiri Benc; +Cc: netdev, Roopa Prabhu

On 09/22/15 at 06:12pm, Jiri Benc wrote:
> When using ip lwtunnels, the additional data for xmit (basically, the actual
> tunnel to use) are carried in ip_tunnel_info either in dst->lwtstate or in
> metadata dst. When replying to ARP requests, we need to send the reply to
> the same tunnel the request came from. This means we need to construct
> proper metadata dst for ARP replies.
> 
> We could perform another route lookup to get a dst entry with the correct
> lwtstate. However, this won't always ensure that the outgoing tunnel is the
> same as the incoming one, and it won't work anyway for IPv4 duplicate
> address detection.
> 
> The only thing to do is to "reverse" the ip_tunnel_info.
> 
> Signed-off-by: Jiri Benc <jbenc@redhat.com>

Great start, thanks!

Acked-by: Thomas Graf <tgraf@suug.ch>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net 2/2] lwtunnel: remove source and destination UDP port config option
  2015-09-22 16:12 ` [PATCH net 2/2] lwtunnel: remove source and destination UDP port config option Jiri Benc
@ 2015-09-23  8:12   ` Thomas Graf
  0 siblings, 0 replies; 18+ messages in thread
From: Thomas Graf @ 2015-09-23  8:12 UTC (permalink / raw)
  To: Jiri Benc; +Cc: netdev, Roopa Prabhu

On 09/22/15 at 06:12pm, Jiri Benc wrote:
> The UDP tunnel config is asymmetric wrt. to the ports used. The source and
> destination ports from one direction of the tunnel are not related to the
> ports of the other direction. We need to be able to respond to ARP requests
> using the correct ports without involving routing.
> 
> As the consequence, UDP ports need to be fixed property of the tunnel
> interface and cannot be set per route. Remove the ability to set ports per
> route. This is still okay to do, as no kernel has been released with these
> attributes yet.
> 
> Note that the ability to specify source and destination ports is preserved
> for other users of the lwtunnel API which don't use routes for tunnel key
> specification (like openvswitch).
> 
> If in the future we rework ARP handling to allow port specification, the
> attributes can be added back.
> 
> Signed-off-by: Jiri Benc <jbenc@redhat.com>

Acked-by: Thomas Graf <tgraf@suug.ch>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net 0/2] lwtunnel: make it really work, for IPv4
  2015-09-23  8:09   ` Thomas Graf
@ 2015-09-23 12:17     ` Eric W. Biederman
  2015-09-23 14:29       ` Jiri Benc
  0 siblings, 1 reply; 18+ messages in thread
From: Eric W. Biederman @ 2015-09-23 12:17 UTC (permalink / raw)
  To: Thomas Graf; +Cc: Jiri Benc, netdev, Roopa Prabhu

Thomas Graf <tgraf@suug.ch> writes:

> On 09/22/15 at 11:39pm, Eric W. Biederman wrote:
>> What distinguishes a skb received from a tunnel as opposed to a skb
>> received on from a network device is that a skb recevied on a tunnel
>> has a socket.
>> 
>> I could be easily missing something but couldn't you look at skb->sk
>> on the input path and if a socket is present use the socket to compute
>> the outgoing route?
>> 
>> I expect it would just need to be something like:
>> dst = sk_dst_check(sk, 0);
>
> If you are talking about the UDP socket then that socket would cache
> the underlay route corresponding to the outer header. Jiri is looking
> for the outer header route to derive tunnel parameters from.

Assuming the transport is UDP then it would be a UDP socket.  That
socket will have all of the information needed to construct the outer
header as the receive path of that socket removes the outer header.

I admit you can't use the cached dst.  It is the wrong on that socket.

My point is that if we have the UDP socket and we have the sk we have
all of the information we need to compute the reverse dst.

Having looked a little closer this seems to fall apart in practice as
the tunnel receive path appears to never call skb_set_owner or otherwise
set skb->sk.

Still I am not certain that limitation is fatal.  Looking at the
incoming socket still feels like the right thing to do in this
situation.

Eric

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net 0/2] lwtunnel: make it really work, for IPv4
  2015-09-23 12:17     ` Eric W. Biederman
@ 2015-09-23 14:29       ` Jiri Benc
  2015-09-23 17:42         ` Eric W. Biederman
  0 siblings, 1 reply; 18+ messages in thread
From: Jiri Benc @ 2015-09-23 14:29 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Thomas Graf, netdev, Roopa Prabhu

On Wed, 23 Sep 2015 07:17:53 -0500, Eric W. Biederman wrote:
> Assuming the transport is UDP then it would be a UDP socket.  That
> socket will have all of the information needed to construct the outer
> header as the receive path of that socket removes the outer header.
> 
> I admit you can't use the cached dst.  It is the wrong on that socket.
> 
> My point is that if we have the UDP socket and we have the sk we have
> all of the information we need to compute the reverse dst.

That (single) UDP socket may represent many tunnels with different
parameters. Knowing the socket is still not enough to construct the
data. The only place where the needed data is stored is routing table
which won't help us much for ARP, and the metadata dst attached to the
incoming skb.

 Jiri

-- 
Jiri Benc

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net 0/2] lwtunnel: make it really work, for IPv4
  2015-09-23 14:29       ` Jiri Benc
@ 2015-09-23 17:42         ` Eric W. Biederman
  2015-09-23 20:54           ` Jiri Benc
  0 siblings, 1 reply; 18+ messages in thread
From: Eric W. Biederman @ 2015-09-23 17:42 UTC (permalink / raw)
  To: Jiri Benc; +Cc: Thomas Graf, netdev, Roopa Prabhu

Jiri Benc <jbenc@redhat.com> writes:

> On Wed, 23 Sep 2015 07:17:53 -0500, Eric W. Biederman wrote:
>> Assuming the transport is UDP then it would be a UDP socket.  That
>> socket will have all of the information needed to construct the outer
>> header as the receive path of that socket removes the outer header.
>> 
>> I admit you can't use the cached dst.  It is the wrong on that socket.
>> 
>> My point is that if we have the UDP socket and we have the sk we have
                                                              ^skb
>> all of the information we need to compute the reverse dst.
>
> That (single) UDP socket may represent many tunnels with different
> parameters. Knowing the socket is still not enough to construct the
> data. The only place where the needed data is stored is routing table
> which won't help us much for ARP, and the metadata dst attached to the
> incoming skb.

Having read through the code a bit more.  I understand more clearly what
is happening with metadata dsts.

Without your ARP changes in this patchset I don't understand what the
purpose of metadata dsts are, so I am probably still missing something
important.

If ARP and ndisc are the only uses of metadata dsts, allocating a
metadata dst for every packet seems undesirable.

In which case I suspect what is desriable is a ndo operation that
looks at the packet and computes the dst.

So perhaps instead of:
+	if (arp->ar_op == htons(ARPOP_REQUEST) && skb_metadata_dst(skb))
+		reply_dst = (struct dst_entry *)
+			    iptunnel_metadata_reply(skb_metadata_dst(skb),
+						    GFP_ATOMIC);
+
The code would be:
	if (arp->ar_op == htons(AROP_REQUEST) && dev->ndo_reply_dst)
		reply_dst = dev->ndo_reply_dst(skb, GFP_ATOMIC);


For any network that does interesting things with network level
identifiers below IP this seems like a general problem.  Further it
seems more desirable to only perform an allocation when necessary,
rather than for every packet, and two for the packets that actually
need replies.

Eric

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net 0/2] lwtunnel: make it really work, for IPv4
  2015-09-23 17:42         ` Eric W. Biederman
@ 2015-09-23 20:54           ` Jiri Benc
  2015-09-23 21:09             ` Eric W. Biederman
  0 siblings, 1 reply; 18+ messages in thread
From: Jiri Benc @ 2015-09-23 20:54 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Thomas Graf, netdev, Roopa Prabhu

On Wed, 23 Sep 2015 12:42:30 -0500, Eric W. Biederman wrote:
> Without your ARP changes in this patchset I don't understand what the
> purpose of metadata dsts are, so I am probably still missing something
> important.
> 
> If ARP and ndisc are the only uses of metadata dsts, allocating a
> metadata dst for every packet seems undesirable.

They're not. Metadata dsts carry information about the original
encapsulation. This is used to match flows on the original
encapsulation data. The current users are openvswitch, eBPF, in the
future also tc. Could be used also by e.g. nftables in the future.

In the other direction, it can be used to specify egress encapsulation
data. Alternatively, lwtstate pointer in dst_entry can be used for
egress.

> In which case I suspect what is desriable is a ndo operation that
> looks at the packet and computes the dst.
> 
> So perhaps instead of:
> +	if (arp->ar_op == htons(ARPOP_REQUEST) && skb_metadata_dst(skb))
> +		reply_dst = (struct dst_entry *)
> +			    iptunnel_metadata_reply(skb_metadata_dst(skb),
> +						    GFP_ATOMIC);
> +
> The code would be:
> 	if (arp->ar_op == htons(AROP_REQUEST) && dev->ndo_reply_dst)
> 		reply_dst = dev->ndo_reply_dst(skb, GFP_ATOMIC);

This is something I intended to do originally. I also considered adding
the callback into metadata_dst itself. However, it doesn't make much
sense for the time being. The only thing that's using metadata_dst
currently is IP based lwtunnels. Look at the struct metadata_dst
definition:

struct metadata_dst {
        struct dst_entry                dst;
        union { 
                struct ip_tunnel_info   tun_info; 
        } u;            
};                      

Although there's an union, there's nothing that says what kind of data
the metadata_dst carries. Adding new field to the union would also
require adding a new type field. Otherwise you risk misinterpretation
of the data, as all current places just blindly reach into tun_info if
metadata dst is present, regardless of where the skb came from.

If there's another user of metadata_dst in the future, this needs to be
changed anyway. We can add such ndo callback (or some other kind of
callback) then, if it turns out to be needed. For now, we don't need it
and the changes to make metadata_dst usable for other things than IP
tunnels are not suitable for net.git.

> For any network that does interesting things with network level
> identifiers below IP this seems like a general problem.  Further it
> seems more desirable to only perform an allocation when necessary,
> rather than for every packet, and two for the packets that actually
> need replies.

The metadata_dst are only allocated when requested. Look at e.g.
vxlan_collect_metadata function. If they're requested, it means they are
needed. And they certainly need to be allocated for ARP replies to such
packets. I don't see what could be further optimized here. Seems to me
that you assume that the sole purpose of why metadata_dst exist is ARP
which is not the case.

 Jiri

-- 
Jiri Benc

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net 0/2] lwtunnel: make it really work, for IPv4
  2015-09-23 20:54           ` Jiri Benc
@ 2015-09-23 21:09             ` Eric W. Biederman
  2015-09-23 23:08               ` Thomas Graf
  2015-09-24  8:35               ` Jiri Benc
  0 siblings, 2 replies; 18+ messages in thread
From: Eric W. Biederman @ 2015-09-23 21:09 UTC (permalink / raw)
  To: Jiri Benc; +Cc: Thomas Graf, netdev, Roopa Prabhu

Jiri Benc <jbenc@redhat.com> writes:

> On Wed, 23 Sep 2015 12:42:30 -0500, Eric W. Biederman wrote:
[snip]
>> So perhaps instead of:
>> +	if (arp->ar_op == htons(ARPOP_REQUEST) && skb_metadata_dst(skb))
>> +		reply_dst = (struct dst_entry *)
>> +			    iptunnel_metadata_reply(skb_metadata_dst(skb),
>> +						    GFP_ATOMIC);
>> +
>> The code would be:
>> 	if (arp->ar_op == htons(AROP_REQUEST) && dev->ndo_reply_dst)
>> 		reply_dst = dev->ndo_reply_dst(skb, GFP_ATOMIC);
>
> This is something I intended to do originally. I also considered adding
> the callback into metadata_dst itself. However, it doesn't make much
> sense for the time being.

If you convert this into the equivalent for neighbour discovery where
the ingress metadata dst has already been lost it makes a lot of sense.

Which is the question you were asking that I have been replying to as
I think this all through.

[snip]

> If there's another user of metadata_dst in the future, this needs to be
> changed anyway. We can add such ndo callback (or some other kind of
> callback) then, if it turns out to be needed. For now, we don't need it
> and the changes to make metadata_dst usable for other things than IP
> tunnels are not suitable for net.git.

I think you were misunderstanding what I was suggesting.

I was suggesting an ndo callback that does not need an metadata dst on
the skb but has enough information between the network device and the
skb itself to properly construct a dst that can be used for arp and
ndisc type replies where you want to exactly reverse the path.

*Blink* You were targeting net.git with a feature enhancement????
I will just ignore that.

>> For any network that does interesting things with network level
>> identifiers below IP this seems like a general problem.  Further it
>> seems more desirable to only perform an allocation when necessary,
>> rather than for every packet, and two for the packets that actually
>> need replies.
>
> The metadata_dst are only allocated when requested. Look at e.g.
> vxlan_collect_metadata function. If they're requested, it means they are
> needed. And they certainly need to be allocated for ARP replies to such
> packets. I don't see what could be further optimized here. Seems to me
> that you assume that the sole purpose of why metadata_dst exist is ARP
> which is not the case.

What I was observing is that in general the only tunneled packets that
need an ingress metadata dst for a tunneled medium ethernet like medium
are arp and ndisc packets.  In other cases if you aren't doing something
exceptional like openvswitch the normal routing should be sufficient.

Which means a ndo_reply_dst method could remove the need in many cases
for an ingress metadata dst to need to be allocated.

Regardless a netdevice operation that digs into the packet and figures
out what is necessary for a reply seems like the clean way to make this
work for both arp and neighbour discovery.

Eric

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net 0/2] lwtunnel: make it really work, for IPv4
  2015-09-23 21:09             ` Eric W. Biederman
@ 2015-09-23 23:08               ` Thomas Graf
  2015-09-24  5:54                 ` Eric W. Biederman
  2015-09-24  8:35               ` Jiri Benc
  1 sibling, 1 reply; 18+ messages in thread
From: Thomas Graf @ 2015-09-23 23:08 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Jiri Benc, netdev, Roopa Prabhu

On 09/23/15 at 04:09pm, Eric W. Biederman wrote:

[...]

> *Blink* You were targeting net.git with a feature enhancement????
> I will just ignore that.

The point of this series is to not expose the src and dst port Netlink
bits to user space in a released kernel because the ABI is not set in
stone yet. Hence targeting net.

If patch 1 is regarded unacceptable we should at least pull in patch 2
to not expose these bits until this has been worked out to leave the
option proposed here on the table.

> What I was observing is that in general the only tunneled packets that
> need an ingress metadata dst for a tunneled medium ethernet like medium
> are arp and ndisc packets.  In other cases if you aren't doing something
> exceptional like openvswitch the normal routing should be sufficient.
> 
> Which means a ndo_reply_dst method could remove the need in many cases
> for an ingress metadata dst to need to be allocated.

The tunnel RX metadata collected is used to associate packets matching
a particular tunnel id with the appropriate virtual networks by forwarding
them to a separate netns, separate VRF device or a separate bridge.

More sophisticated hypervisors may run multiple tunnel endpoints on
the same host using different host addresses and differentiate packets
based on the underlay destination IP as well.

> Regardless a netdevice operation that digs into the packet and figures
> out what is necessary for a reply seems like the clean way to make this
> work for both arp and neighbour discovery.

I'm not disagreeing entirely although I disagree that you can do the
NDO without looking at the original metadata dst. Even a full fib
lookup based on the requested IP in the ARP header is somewhat error
prone. I fully agree though that once we support additional types
besides IP tunneling then such an NDO might in fact make sense.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net 0/2] lwtunnel: make it really work, for IPv4
  2015-09-23 23:08               ` Thomas Graf
@ 2015-09-24  5:54                 ` Eric W. Biederman
  2015-09-24  8:19                   ` Jiri Benc
  0 siblings, 1 reply; 18+ messages in thread
From: Eric W. Biederman @ 2015-09-24  5:54 UTC (permalink / raw)
  To: Thomas Graf; +Cc: Jiri Benc, netdev, Roopa Prabhu

Thomas Graf <tgraf@suug.ch> writes:

> On 09/23/15 at 04:09pm, Eric W. Biederman wrote:
>
> [...]
>
>> *Blink* You were targeting net.git with a feature enhancement????
>> I will just ignore that.
>
> The point of this series is to not expose the src and dst port Netlink
> bits to user space in a released kernel because the ABI is not set in
> stone yet. Hence targeting net.
>
> If patch 1 is regarded unacceptable we should at least pull in patch 2
> to not expose these bits until this has been worked out to leave the
> option proposed here on the table.

My only interest in this is to help figure out how to make IPv6 ndisc
work over light weight tunnels.

>> What I was observing is that in general the only tunneled packets that
>> need an ingress metadata dst for a tunneled medium ethernet like medium
>> are arp and ndisc packets.  In other cases if you aren't doing something
>> exceptional like openvswitch the normal routing should be sufficient.
>> 
>> Which means a ndo_reply_dst method could remove the need in many cases
>> for an ingress metadata dst to need to be allocated.
>
> The tunnel RX metadata collected is used to associate packets matching
> a particular tunnel id with the appropriate virtual networks by forwarding
> them to a separate netns, separate VRF device or a separate bridge.
>
> More sophisticated hypervisors may run multiple tunnel endpoints on
> the same host using different host addresses and differentiate packets
> based on the underlay destination IP as well.

Fair enough.  And in at least some of those situations the dst metadata
will be needed on every packet.  I think the extra allocation per packet
for the metadata dst is unfortunate but I won't say it is wrong.

>> Regardless a netdevice operation that digs into the packet and figures
>> out what is necessary for a reply seems like the clean way to make this
>> work for both arp and neighbour discovery.
>
> I'm not disagreeing entirely although I disagree that you can do the
> NDO without looking at the original metadata dst. Even a full fib
> lookup based on the requested IP in the ARP header is somewhat error
> prone. I fully agree though that once we support additional types
> besides IP tunneling then such an NDO might in fact make sense.

We can't use the metadata dst for IPv6 neighbour discovery.  Neighbour
discovery processing comes after ip6_route_input.  That is what makes
such a network device operation interesting today.

We don't need the information in the metadata dst because the
information that was in the metadata dst is still in the packet we just
need to reparse the packet.

Given that the input network device is per tunnel type, the network
device method will already know the format of the tunnel packet and so
should not have any trouble parsing it.

As an assist we can preserve 90% of the information in ip_tunnel_key by
repurposing inner_transport_header, inner_network_header and
inner_mac_header (which are only valid on output packets today) as
outer_transport_header, outer_network_header and outer_mac_header for
input packets.

That makes tun_id the only field of struct ip_tunnel_key that we have to
work to find.



Creating outer_transport_header, outer_network_header and
outer_mac_header should open up a lot of optmization opportunities
for input tunnel processing.  I expect with just a little bit of care
we should be able to replace the input metadata dst with a handful
of fields stored in skb->cb.  Which in turn means no memory allocations
are necessary, and that the work can be done unconditionally.

Eric

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net 0/2] lwtunnel: make it really work, for IPv4
  2015-09-24  5:54                 ` Eric W. Biederman
@ 2015-09-24  8:19                   ` Jiri Benc
  0 siblings, 0 replies; 18+ messages in thread
From: Jiri Benc @ 2015-09-24  8:19 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Thomas Graf, netdev, Roopa Prabhu

On Thu, 24 Sep 2015 00:54:00 -0500, Eric W. Biederman wrote:
> My only interest in this is to help figure out how to make IPv6 ndisc
> work over light weight tunnels.

Thanks for that. However, as much as some of the things you wrote in
this mail make sense (see below), I don't see how they help with this
particular problem. You seem to be concerned mostly with ingress and
the existence of metadata_dst, while the problem with IPv6 is there's
currently no place to attach egress tunnel info to.

> We can't use the metadata dst for IPv6 neighbour discovery.  Neighbour
> discovery processing comes after ip6_route_input.  That is what makes
> such a network device operation interesting today.
> 
> We don't need the information in the metadata dst because the
> information that was in the metadata dst is still in the packet we just
> need to reparse the packet.

That's an interesting point. For ARP and ndisc, we can certainly do
that. For ovs and similar use cases, I'm not sure. We can certainly
make it work this way but we'd need a ndo for reparsing the header.
Compared to the current situation where we just carry the parsed data,
reparsing seems to be more expensive. I'd expect this would show in
performance numbers.

> Given that the input network device is per tunnel type, the network
> device method will already know the format of the tunnel packet and so
> should not have any trouble parsing it.

Yes.

> As an assist we can preserve 90% of the information in ip_tunnel_key by
> repurposing inner_transport_header, inner_network_header and
> inner_mac_header (which are only valid on output packets today) as
> outer_transport_header, outer_network_header and outer_mac_header for
> input packets.
>
> That makes tun_id the only field of struct ip_tunnel_key that we have to
> work to find.

You will need to know the format of the tunnel header to get tun_id.
Because we want the users generic, it means a ndo to parse the headers.

> Creating outer_transport_header, outer_network_header and
> outer_mac_header should open up a lot of optmization opportunities
> for input tunnel processing.  I expect with just a little bit of care
> we should be able to replace the input metadata dst with a handful
> of fields stored in skb->cb.  Which in turn means no memory allocations
> are necessary, and that the work can be done unconditionally.

Which would mean no GRO. Which is probably no problem. Seems it may
indeed be an optimization. Doesn't have much to do with this set,
though, we'd still have the exact same problems as now and the exact
same ways to fix them.

 Jiri

-- 
Jiri Benc

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net 0/2] lwtunnel: make it really work, for IPv4
  2015-09-23 21:09             ` Eric W. Biederman
  2015-09-23 23:08               ` Thomas Graf
@ 2015-09-24  8:35               ` Jiri Benc
  1 sibling, 0 replies; 18+ messages in thread
From: Jiri Benc @ 2015-09-24  8:35 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Thomas Graf, netdev, Roopa Prabhu

On Wed, 23 Sep 2015 16:09:14 -0500, Eric W. Biederman wrote:
> *Blink* You were targeting net.git with a feature enhancement????
> I will just ignore that.

I do not. This is set is a minimal fix to make IP based lwtunnels with
tunnel information specified in a route working (for IPv4, anyway).
Note that lwtunnels and IP/UDP based lwtunnels is a new feature merged
for 4.3. This set fixes a flaw in that feature.

Eric, I'd like you to ask how do you propose to fix this problem in
net.git. So far, your proposals were:

1. Use a new ndo to "reverse" metadata dst. I can do that but all
   current users will just call iptunnel_metadata_reply. Seems to be
   superfluous but whatever.

2. Replace metadata_dst on ingress with a combination of cb and
   outer_*_header. This is certainly not suitable for net.git and won't
   solve the problem anyway.

Meanwhile, I'm going to resubmit patch 2 as a separate fix, as this is
kind of urgent. Having non working feature in a released kernel is sad
but only a temporary problem. Having non working uAPI in a released
kernel is a problem we'll have to deal with forever.

 Jiri

-- 
Jiri Benc

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net 0/2] lwtunnel: make it really work, for IPv4
  2015-09-22 16:12 [PATCH net 0/2] lwtunnel: make it really work, for IPv4 Jiri Benc
                   ` (3 preceding siblings ...)
  2015-09-23  8:08 ` Thomas Graf
@ 2015-09-24 21:32 ` David Miller
  4 siblings, 0 replies; 18+ messages in thread
From: David Miller @ 2015-09-24 21:32 UTC (permalink / raw)
  To: jbenc; +Cc: netdev, tgraf, roopa

From: Jiri Benc <jbenc@redhat.com>
Date: Tue, 22 Sep 2015 18:12:10 +0200

> One of the selling points of lwtunnel was the ability to specify the tunnel
> destination using routes. However, this doesn't really work currently, as
> ARP and ndisc replies are not handled correctly. ARP and ndisc replies won't
> have tunnel metadata attached, thus they will be sent out with the default
> parameters or not sent at all, either way never reaching the requester.
> 
> Most of the egress tunnel parameters can be inferred from the ingress
> metada. The only and important exception is UDP ports. This patchset infers
> the egress data from the ingress data and disallow settings of UDP ports in
> tunnel routes. If there's a need for different UDP ports, a new interface
> needs to be created for each port combination. Note that it's still possible
> to specify the UDP ports to use, it just needs to be done while creating the
> vxlan/geneve interface.
> 
> This covers only ARPs. IPv6 ndisc has the same problem but is harder to
> solve, as there's already dst attached to outgoing skbs. Ideas to solve this
> are welcome.

I'm applying this series as-is, as it's a reasonable step forward in
dealing with this issue.

Thanks Jiri.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2015-09-24 21:32 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-22 16:12 [PATCH net 0/2] lwtunnel: make it really work, for IPv4 Jiri Benc
2015-09-22 16:12 ` [PATCH net 1/2] ipv4: send arp replies to the correct tunnel Jiri Benc
2015-09-23  8:10   ` Thomas Graf
2015-09-22 16:12 ` [PATCH net 2/2] lwtunnel: remove source and destination UDP port config option Jiri Benc
2015-09-23  8:12   ` Thomas Graf
2015-09-23  4:39 ` [PATCH net 0/2] lwtunnel: make it really work, for IPv4 Eric W. Biederman
2015-09-23  8:09   ` Thomas Graf
2015-09-23 12:17     ` Eric W. Biederman
2015-09-23 14:29       ` Jiri Benc
2015-09-23 17:42         ` Eric W. Biederman
2015-09-23 20:54           ` Jiri Benc
2015-09-23 21:09             ` Eric W. Biederman
2015-09-23 23:08               ` Thomas Graf
2015-09-24  5:54                 ` Eric W. Biederman
2015-09-24  8:19                   ` Jiri Benc
2015-09-24  8:35               ` Jiri Benc
2015-09-23  8:08 ` Thomas Graf
2015-09-24 21:32 ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.