All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] IPv6: fix rt_lookup in pmtu_discovery
@ 2010-01-07  4:43 Tom Herbert
  2010-01-07  9:27 ` David Miller
  0 siblings, 1 reply; 18+ messages in thread
From: Tom Herbert @ 2010-01-07  4:43 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Netdev List, Lorenzo Colitti

In rt6_pmtu_discovery a route lookup is done constrained to the
receiving interface of the ICMP message.  In the case of an asymmetric
routing, the receive interface may be different than the sending
interface, so no route will be found in such cases and pmtu is never
correctly set in the route to the destination.  This patch fixes that
by not using the receive interface in the lookup.

Tom

Signed-off-by: Tom Herbert <therbert@google.com>

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index df9432a..d7d8893 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1569,7 +1569,7 @@ void rt6_pmtu_discovery(struct in6_addr *daddr,
struct in6_addr *saddr,
        struct net *net = dev_net(dev);
        int allfrag = 0;

-       rt = rt6_lookup(net, daddr, saddr, dev->ifindex, 0);
+       rt = rt6_lookup(net, daddr, saddr, 0, 0);
        if (rt == NULL)
                return;

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] IPv6: fix rt_lookup in pmtu_discovery
  2010-01-07  4:43 [PATCH] IPv6: fix rt_lookup in pmtu_discovery Tom Herbert
@ 2010-01-07  9:27 ` David Miller
  2010-01-08  1:05   ` Maciej Żenczykowski
  0 siblings, 1 reply; 18+ messages in thread
From: David Miller @ 2010-01-07  9:27 UTC (permalink / raw)
  To: therbert; +Cc: netdev, lorenzo

From: Tom Herbert <therbert@google.com>
Date: Wed, 6 Jan 2010 20:43:48 -0800

> In rt6_pmtu_discovery a route lookup is done constrained to the
> receiving interface of the ICMP message.  In the case of an asymmetric
> routing, the receive interface may be different than the sending
> interface, so no route will be found in such cases and pmtu is never
> correctly set in the route to the destination.  This patch fixes that
> by not using the receive interface in the lookup.
> 
> Signed-off-by: Tom Herbert <therbert@google.com>

This needs to do what the IPV4 side does, iterate over specific then
"any" device index, doing a lookup for each case until the route is
found, therefore starting from more specific and going towards less
specific routes.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] IPv6: fix rt_lookup in pmtu_discovery
  2010-01-07  9:27 ` David Miller
@ 2010-01-08  1:05   ` Maciej Żenczykowski
  2010-01-08  1:10     ` David Miller
  0 siblings, 1 reply; 18+ messages in thread
From: Maciej Żenczykowski @ 2010-01-08  1:05 UTC (permalink / raw)
  To: David Miller; +Cc: therbert, netdev, lorenzo

> This needs to do what the IPV4 side does, iterate over specific then
> "any" device index, doing a lookup for each case until the route is
> found, therefore starting from more specific and going towards less
> specific routes.

I've spoken with Tom and we can't quite seem to figure out what
exactly the code should be attempting to accomplish here.
Is checking the specific device index meant to deal with link local IPs?

As for the v4 code, I assume you're referring to ip_rt_frag_needed in
net/ipv4/route.c.

If so, shouldn't this sort of route lookup be abstracted away into
some function?
Path mtu discovery / fragmentation handling functions don't seem to be
the right place to be implementing route lookup policy.

Are you suggesting the following logic for ipv6:

rt = rt6_lookup(net, daddr, saddr, dev->ifindex, 0);
if (rt == NULL)
        rt = rt6_lookup(net, daddr, saddr, 0, 0);
if (rt == NULL)
        rt = rt6_lookup(net, daddr, 0, dev->ifindex, 0);
if (rt == NULL)
        rt = rt6_lookup(net, daddr, 0, 0, 0);
if (rt == NULL)
        return;

It's not clear to me what the last two lookups with saddr replaced
with 0 are for.

Maciej

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] IPv6: fix rt_lookup in pmtu_discovery
  2010-01-08  1:05   ` Maciej Żenczykowski
@ 2010-01-08  1:10     ` David Miller
  2010-01-09  0:12       ` Lorenzo Colitti
  0 siblings, 1 reply; 18+ messages in thread
From: David Miller @ 2010-01-08  1:10 UTC (permalink / raw)
  To: zenczykowski; +Cc: therbert, netdev, lorenzo

From: Maciej Żenczykowski <zenczykowski@gmail.com>
Date: Thu, 7 Jan 2010 17:05:36 -0800

> I've spoken with Tom and we can't quite seem to figure out what
> exactly the code should be attempting to accomplish here.

"git blame net/ipv4/route.c >x.c" might help you solve that
mystery, here is one example of what it might show you:

commit 0010e46577a27c1d915034637f6c2fa57a9a091c
Author: Timo Teras <timo.teras@iki.fi>
Date:   Tue Apr 29 03:32:25 2008 -0700

    ipv4: Update MTU to all related cache entries in ip_rt_frag_needed()
    
    Add struct net_device parameter to ip_rt_frag_needed() and update MTU to
    cache entries where ifindex is specified. This is similar to what is
    already done in ip_rt_redirect().
    
    Signed-off-by: Timo Teras <timo.teras@iki.fi>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/include/net/route.h b/include/net/route.h
index c633880..fc836ff 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -116,7 +116,7 @@ extern int		__ip_route_output_key(struct net *, struct rtable **, const struct f
 extern int		ip_route_output_key(struct net *, struct rtable **, struct flowi *flp);
 extern int		ip_route_output_flow(struct net *, struct rtable **rp, struct flowi *flp, struct sock *sk, int flags);
 extern int		ip_route_input(struct sk_buff*, __be32 dst, __be32 src, u8 tos, struct net_device *devin);
-extern unsigned short	ip_rt_frag_needed(struct net *net, struct iphdr *iph, unsigned short new_mtu);
+extern unsigned short	ip_rt_frag_needed(struct net *net, struct iphdr *iph, unsigned short new_mtu, struct net_device *dev);
 extern void		ip_rt_send_redirect(struct sk_buff *skb);
 
 extern unsigned		inet_addr_type(struct net *net, __be32 addr);
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index c67d00e..8739735 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -691,7 +691,8 @@ static void icmp_unreach(struct sk_buff *skb)
 					       NIPQUAD(iph->daddr));
 			} else {
 				info = ip_rt_frag_needed(net, iph,
-						     ntohs(icmph->un.frag.mtu));
+							 ntohs(icmph->un.frag.mtu),
+							 skb->dev);
 				if (!info)
 					goto out;
 			}
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index ce25a13..5e3685c 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1430,11 +1430,13 @@ static inline unsigned short guess_mtu(unsigned short old_mtu)
 }
 
 unsigned short ip_rt_frag_needed(struct net *net, struct iphdr *iph,
-				 unsigned short new_mtu)
+				 unsigned short new_mtu,
+				 struct net_device *dev)
 {
-	int i;
+	int i, k;
 	unsigned short old_mtu = ntohs(iph->tot_len);
 	struct rtable *rth;
+	int  ikeys[2] = { dev->ifindex, 0 };
 	__be32  skeys[2] = { iph->saddr, 0, };
 	__be32  daddr = iph->daddr;
 	unsigned short est_mtu = 0;
@@ -1442,22 +1444,26 @@ unsigned short ip_rt_frag_needed(struct net *net, struct iphdr *iph,
 	if (ipv4_config.no_pmtu_disc)
 		return 0;
 
-	for (i = 0; i < 2; i++) {
-		unsigned hash = rt_hash(daddr, skeys[i], 0);
+	for (k = 0; k < 2; k++) {
+		for (i = 0; i < 2; i++) {
+			unsigned hash = rt_hash(daddr, skeys[i], ikeys[k]);
 
-		rcu_read_lock();
-		for (rth = rcu_dereference(rt_hash_table[hash].chain); rth;
-		     rth = rcu_dereference(rth->u.dst.rt_next)) {
-			if (rth->fl.fl4_dst == daddr &&
-			    rth->fl.fl4_src == skeys[i] &&
-			    rth->rt_dst  == daddr &&
-			    rth->rt_src  == iph->saddr &&
-			    rth->fl.iif == 0 &&
-			    !(dst_metric_locked(&rth->u.dst, RTAX_MTU)) &&
-			    net_eq(dev_net(rth->u.dst.dev), net) &&
-			    rth->rt_genid == atomic_read(&rt_genid)) {
+			rcu_read_lock();
+			for (rth = rcu_dereference(rt_hash_table[hash].chain); rth;
+			     rth = rcu_dereference(rth->u.dst.rt_next)) {
 				unsigned short mtu = new_mtu;
 
+				if (rth->fl.fl4_dst != daddr ||
+				    rth->fl.fl4_src != skeys[i] ||
+				    rth->rt_dst != daddr ||
+				    rth->rt_src != iph->saddr ||
+				    rth->fl.oif != ikeys[k] ||
+				    rth->fl.iif != 0 ||
+				    dst_metric_locked(&rth->u.dst, RTAX_MTU) ||
+				    !net_eq(dev_net(rth->u.dst.dev), net) ||
+				    rth->rt_genid != atomic_read(&rt_genid))
+					continue;
+
 				if (new_mtu < 68 || new_mtu >= old_mtu) {
 
 					/* BSD 4.2 compatibility hack :-( */
@@ -1483,8 +1489,8 @@ unsigned short ip_rt_frag_needed(struct net *net, struct iphdr *iph,
 					est_mtu = mtu;
 				}
 			}
+			rcu_read_unlock();
 		}
-		rcu_read_unlock();
 	}
 	return est_mtu ? : new_mtu;
 }

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] IPv6: fix rt_lookup in pmtu_discovery
  2010-01-08  1:10     ` David Miller
@ 2010-01-09  0:12       ` Lorenzo Colitti
  2010-01-10 21:15         ` David Miller
  0 siblings, 1 reply; 18+ messages in thread
From: Lorenzo Colitti @ 2010-01-09  0:12 UTC (permalink / raw)
  To: David Miller; +Cc: zenczykowski, therbert, netdev

2010/1/7 David Miller <davem@davemloft.net>
>    ipv4: Update MTU to all related cache entries in ip_rt_frag_needed()
>
>    Add struct net_device parameter to ip_rt_frag_needed() and update MTU to
>    cache entries where ifindex is specified. This is similar to what is
>    already done in ip_rt_redirect().
> [...]
> +       int  ikeys[2] = { dev->ifindex, 0 };
>        __be32  skeys[2] = { iph->saddr, 0, };
>        __be32  daddr = iph->daddr;
> [...]

That patch makes it so that if a fragmentation needed message is
received on an interface other than the one that the kernel would
normally use to send a message to the original destination, then any
route cache entries pointing out that interface are updated as well.
AFAICT it was motivated by a scenario where traffic  was intended to
be sent through a particular interface with SO_BINDTODEVICE set:
http://lists.openwall.net/netdev/2008/04/24/44

The correct thing to do would be to update the MTU on all the route
cache entries, including entries pointing to other interfaces on the
box (for example, consider a box with a default route pointing at
eth0, the packet too big coming in on eth1, and the original packet
having been sent through gre1 with SO_BINDTODEVICE; in this case, the
existing IPv4 code would silently fail). However, this is expensive
and doing it for the two common cases seems a reasonable compromise,
so it's probably worth doing it for IPv6 as well.

How about this patch instead?

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index c2bd74c..c27464d 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1562,14 +1562,13 @@ out:
  *	i.e. Path MTU discovery
  */

-void rt6_pmtu_discovery(struct in6_addr *daddr, struct in6_addr *saddr,
-			struct net_device *dev, u32 pmtu)
+static void rt6_do_pmtu_disc(struct in6_addr *daddr, struct in6_addr *saddr,
+			     struct net *net, u32 pmtu, int ifindex)
 {
 	struct rt6_info *rt, *nrt;
-	struct net *net = dev_net(dev);
 	int allfrag = 0;

-	rt = rt6_lookup(net, daddr, saddr, dev->ifindex, 0);
+	rt = rt6_lookup(net, daddr, saddr, ifindex, 0);
 	if (rt == NULL)
 		return;

@@ -1637,6 +1636,28 @@ out:
 	dst_release(&rt->u.dst);
 }

+void rt6_pmtu_discovery(struct in6_addr *daddr, struct in6_addr *saddr,
+			struct net_device *dev, u32 pmtu)
+{
+	struct net *net = dev_net(dev);
+
+	/*
+	 * RFC 1981 states that a node "MUST reduce the size of the packets it
+	 * is sending along the path" that caused the Packet Too Big message.
+	 * Since it's not possible in the general case to determine which
+	 * interface was used to send the original packet, we update the MTU
+	 * on the interface that will be used to send future packets. We also
+	 * update the MTU on the interface that received the Packet Too Big in
+	 * case the original packet was forced out that interface with
+	 * SO_BINDTODEVICE or similar. This is the next best thing to the
+	 * correct behaviour, which would be to update the MTU on all
+	 * interfaces.
+	 */
+	rt6_do_pmtu_disc(daddr, saddr, net, pmtu, 0);
+	rt6_do_pmtu_disc(daddr, saddr, net, pmtu, dev->ifindex);
+}
+
+
 /*
  *	Misc support functions
  */

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] IPv6: fix rt_lookup in pmtu_discovery
  2010-01-09  0:12       ` Lorenzo Colitti
@ 2010-01-10 21:15         ` David Miller
  2010-01-14  0:51           ` Maciej Żenczykowski
  0 siblings, 1 reply; 18+ messages in thread
From: David Miller @ 2010-01-10 21:15 UTC (permalink / raw)
  To: lorenzo; +Cc: zenczykowski, therbert, netdev

From: Lorenzo Colitti <lorenzo@google.com>
Date: Fri, 8 Jan 2010 16:12:55 -0800

> How about this patch instead?

It's still not fully correct.

You can't ignore source based routing and pretend it doesn't
exist or matter.

That's why ipv4 (and this code you are trying to change) needs to
iterate over saddr [ ANY, saddr ], and that's why you need that
special loop that iterates over all combinations of ifindex/saddr.

This is what I've been asking you to do from the beginning, please
implement it this way and I'll apply your patch.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] IPv6: fix rt_lookup in pmtu_discovery
  2010-01-10 21:15         ` David Miller
@ 2010-01-14  0:51           ` Maciej Żenczykowski
  2010-01-20 22:55             ` Maciej Żenczykowski
                               ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Maciej Żenczykowski @ 2010-01-14  0:51 UTC (permalink / raw)
  To: David Miller; +Cc: lorenzo, therbert, netdev

I believe you are looking for the following patch.

I don't think it solves the problem of source based routing to any
(significantly) greater extent than the previous patch.
That would actually require iterating over all IPs assigned to any
interface on our machine.

However, I really don't understand the code (nor the precise issues
involved) sufficiently well to be certain of that.

I actually think the ipv4 code path suffers from a very similar problem.

Mind you - I don't think these issues are really significant problems,
because they just mean we do PMTU once per (dest ip, src ip) instead
of once per (dest ip).

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index c2bd74c..dd8e3b3 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1562,14 +1562,13 @@ out:
  *	i.e. Path MTU discovery
  */

-void rt6_pmtu_discovery(struct in6_addr *daddr, struct in6_addr *saddr,
-			struct net_device *dev, u32 pmtu)
+static void rt6_do_pmtu_disc(struct in6_addr *daddr, struct in6_addr *saddr,
+			     struct net *net, u32 pmtu, int ifindex)
 {
 	struct rt6_info *rt, *nrt;
-	struct net *net = dev_net(dev);
 	int allfrag = 0;

-	rt = rt6_lookup(net, daddr, saddr, dev->ifindex, 0);
+	rt = rt6_lookup(net, daddr, saddr, ifindex, 0);
 	if (rt == NULL)
 		return;

@@ -1637,6 +1636,31 @@ out:
 	dst_release(&rt->u.dst);
 }

+void rt6_pmtu_discovery(struct in6_addr *daddr, struct in6_addr *saddr,
+			struct net_device *dev, u32 pmtu)
+{
+	struct net *net = dev_net(dev);
+
+	/*
+	 * RFC 1981 states that a node "MUST reduce the size of the packets it
+	 * is sending along the path" that caused the Packet Too Big message.
+	 * Since it's not possible in the general case to determine which
+	 * interface was used to send the original packet, we update the MTU
+	 * on the interface that will be used to send future packets. We also
+	 * update the MTU on the interface that received the Packet Too Big in
+	 * case the original packet was forced out that interface with
+	 * SO_BINDTODEVICE or similar. This is the next best thing to the
+	 * correct behaviour, which would be to update the MTU on all
+	 * interfaces.
+	 */
+	rt6_do_pmtu_disc(daddr, saddr, net, pmtu, 0);
+	rt6_do_pmtu_disc(daddr, saddr, net, pmtu, dev->ifindex);
+	/* also support source address based routing */
+	rt6_do_pmtu_disc(daddr, NULL, net, pmtu, 0);
+	rt6_do_pmtu_disc(daddr, NULL, net, pmtu, dev->ifindex);
+}
+
+
 /*
  *	Misc support functions
  */

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] IPv6: fix rt_lookup in pmtu_discovery
  2010-01-14  0:51           ` Maciej Żenczykowski
@ 2010-01-20 22:55             ` Maciej Żenczykowski
  2010-01-20 22:57               ` David Miller
  2010-01-23 10:20             ` David Miller
  2010-09-27 10:05             ` [PATCH] net: Fix IPv6 PMTU disc. w/ asymmetric routes Maciej Żenczykowski
  2 siblings, 1 reply; 18+ messages in thread
From: Maciej Żenczykowski @ 2010-01-20 22:55 UTC (permalink / raw)
  To: David Miller; +Cc: lorenzo, therbert, netdev

David, could you comment on this?

2010/1/13 Maciej Żenczykowski <zenczykowski@gmail.com>:
> I believe you are looking for the following patch.
>
> I don't think it solves the problem of source based routing to any
> (significantly) greater extent than the previous patch.
> That would actually require iterating over all IPs assigned to any
> interface on our machine.
>
> However, I really don't understand the code (nor the precise issues
> involved) sufficiently well to be certain of that.
>
> I actually think the ipv4 code path suffers from a very similar problem.
>
> Mind you - I don't think these issues are really significant problems,
> because they just mean we do PMTU once per (dest ip, src ip) instead
> of once per (dest ip).
>
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index c2bd74c..dd8e3b3 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -1562,14 +1562,13 @@ out:
>  *     i.e. Path MTU discovery
>  */
>
> -void rt6_pmtu_discovery(struct in6_addr *daddr, struct in6_addr *saddr,
> -                       struct net_device *dev, u32 pmtu)
> +static void rt6_do_pmtu_disc(struct in6_addr *daddr, struct in6_addr *saddr,
> +                            struct net *net, u32 pmtu, int ifindex)
>  {
>        struct rt6_info *rt, *nrt;
> -       struct net *net = dev_net(dev);
>        int allfrag = 0;
>
> -       rt = rt6_lookup(net, daddr, saddr, dev->ifindex, 0);
> +       rt = rt6_lookup(net, daddr, saddr, ifindex, 0);
>        if (rt == NULL)
>                return;
>
> @@ -1637,6 +1636,31 @@ out:
>        dst_release(&rt->u.dst);
>  }
>
> +void rt6_pmtu_discovery(struct in6_addr *daddr, struct in6_addr *saddr,
> +                       struct net_device *dev, u32 pmtu)
> +{
> +       struct net *net = dev_net(dev);
> +
> +       /*
> +        * RFC 1981 states that a node "MUST reduce the size of the packets it
> +        * is sending along the path" that caused the Packet Too Big message.
> +        * Since it's not possible in the general case to determine which
> +        * interface was used to send the original packet, we update the MTU
> +        * on the interface that will be used to send future packets. We also
> +        * update the MTU on the interface that received the Packet Too Big in
> +        * case the original packet was forced out that interface with
> +        * SO_BINDTODEVICE or similar. This is the next best thing to the
> +        * correct behaviour, which would be to update the MTU on all
> +        * interfaces.
> +        */
> +       rt6_do_pmtu_disc(daddr, saddr, net, pmtu, 0);
> +       rt6_do_pmtu_disc(daddr, saddr, net, pmtu, dev->ifindex);
> +       /* also support source address based routing */
> +       rt6_do_pmtu_disc(daddr, NULL, net, pmtu, 0);
> +       rt6_do_pmtu_disc(daddr, NULL, net, pmtu, dev->ifindex);
> +}
> +
> +
>  /*
>  *     Misc support functions
>  */
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] IPv6: fix rt_lookup in pmtu_discovery
  2010-01-20 22:55             ` Maciej Żenczykowski
@ 2010-01-20 22:57               ` David Miller
  2010-01-20 23:33                 ` Maciej Żenczykowski
  0 siblings, 1 reply; 18+ messages in thread
From: David Miller @ 2010-01-20 22:57 UTC (permalink / raw)
  To: zenczykowski; +Cc: lorenzo, therbert, netdev

From: Maciej Żenczykowski <zenczykowski@gmail.com>
Date: Wed, 20 Jan 2010 14:55:10 -0800

> David, could you comment on this?

I'll get to your patch when I get to it.  I have tons of other thing
that have much higher priority than this.

If it's in patchwork in "under review" state (which this one is), that
means it's in my backlog and just poking me about it will serve only
to irritate me and put it even lower ino my TODO list.

Thanks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] IPv6: fix rt_lookup in pmtu_discovery
  2010-01-20 22:57               ` David Miller
@ 2010-01-20 23:33                 ` Maciej Żenczykowski
  0 siblings, 0 replies; 18+ messages in thread
From: Maciej Żenczykowski @ 2010-01-20 23:33 UTC (permalink / raw)
  To: David Miller; +Cc: lorenzo, therbert, netdev

>> David, could you comment on this?
>
> I'll get to your patch when I get to it.  I have tons of other thing
> that have much higher priority than this.
>
> If it's in patchwork in "under review" state (which this one is), that
> means it's in my backlog and just poking me about it will serve only
> to irritate me and put it even lower ino my TODO list.

Thank you, and sorry for the trouble.

Maciej

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] IPv6: fix rt_lookup in pmtu_discovery
  2010-01-14  0:51           ` Maciej Żenczykowski
  2010-01-20 22:55             ` Maciej Żenczykowski
@ 2010-01-23 10:20             ` David Miller
  2010-09-27 10:05             ` [PATCH] net: Fix IPv6 PMTU disc. w/ asymmetric routes Maciej Żenczykowski
  2 siblings, 0 replies; 18+ messages in thread
From: David Miller @ 2010-01-23 10:20 UTC (permalink / raw)
  To: zenczykowski; +Cc: lorenzo, therbert, netdev

From: Maciej Żenczykowski <zenczykowski@gmail.com>
Date: Wed, 13 Jan 2010 16:51:30 -0800

> I believe you are looking for the following patch.
> 
> I don't think it solves the problem of source based routing to any
> (significantly) greater extent than the previous patch.
> That would actually require iterating over all IPs assigned to any
> interface on our machine.
> 
> However, I really don't understand the code (nor the precise issues
> involved) sufficiently well to be certain of that.
> 
> I actually think the ipv4 code path suffers from a very similar problem.
> 
> Mind you - I don't think these issues are really significant problems,
> because they just mean we do PMTU once per (dest ip, src ip) instead
> of once per (dest ip).

Yes, this is about right.

Please resubmit with a proper Signed-off-by: tag so I can apply
this, thanks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] net: Fix IPv6 PMTU disc. w/ asymmetric routes
  2010-01-14  0:51           ` Maciej Żenczykowski
  2010-01-20 22:55             ` Maciej Żenczykowski
  2010-01-23 10:20             ` David Miller
@ 2010-09-27 10:05             ` Maciej Żenczykowski
       [not found]               ` <AANLkTikPOHy79E1ZG=iJ-rHj0vzS+AY-mGqCEtWoXp2o@mail.gmail.com>
  2010-09-28 20:58               ` David Miller
  2 siblings, 2 replies; 18+ messages in thread
From: Maciej Żenczykowski @ 2010-09-27 10:05 UTC (permalink / raw)
  To: David Miller, netdev; +Cc: Maciej Żenczykowski

From: Maciej Żenczykowski <maze@google.com>

Signed-off-by: Maciej Żenczykowski <maze@google.com>
---
 net/ipv6/route.c |   31 +++++++++++++++++++++++++++----
 1 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 3a74f90..d700d1c 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1559,14 +1559,13 @@ out:
  *	i.e. Path MTU discovery
  */
 
-void rt6_pmtu_discovery(struct in6_addr *daddr, struct in6_addr *saddr,
-			struct net_device *dev, u32 pmtu)
+static void rt6_do_pmtu_disc(struct in6_addr *daddr, struct in6_addr *saddr,
+			     struct net *net, u32 pmtu, int ifindex)
 {
 	struct rt6_info *rt, *nrt;
-	struct net *net = dev_net(dev);
 	int allfrag = 0;
 
-	rt = rt6_lookup(net, daddr, saddr, dev->ifindex, 0);
+	rt = rt6_lookup(net, daddr, saddr, ifindex, 0);
 	if (rt == NULL)
 		return;
 
@@ -1634,6 +1633,30 @@ out:
 	dst_release(&rt->dst);
 }
 
+void rt6_pmtu_discovery(struct in6_addr *daddr, struct in6_addr *saddr,
+			struct net_device *dev, u32 pmtu)
+{
+	struct net *net = dev_net(dev);
+
+	/*
+	 * RFC 1981 states that a node "MUST reduce the size of the packets it
+	 * is sending along the path" that caused the Packet Too Big message.
+	 * Since it's not possible in the general case to determine which
+	 * interface was used to send the original packet, we update the MTU
+	 * on the interface that will be used to send future packets. We also
+	 * update the MTU on the interface that received the Packet Too Big in
+	 * case the original packet was forced out that interface with
+	 * SO_BINDTODEVICE or similar. This is the next best thing to the
+	 * correct behaviour, which would be to update the MTU on all
+	 * interfaces.
+	 */
+	rt6_do_pmtu_disc(daddr, saddr, net, pmtu, 0);
+	rt6_do_pmtu_disc(daddr, saddr, net, pmtu, dev->ifindex);
+	/* also support source address based routing */
+	rt6_do_pmtu_disc(daddr, NULL, net, pmtu, 0);
+	rt6_do_pmtu_disc(daddr, NULL, net, pmtu, dev->ifindex);
+}
+
 /*
  *	Misc support functions
  */
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] net: Fix IPv6 PMTU disc. w/ asymmetric routes
       [not found]               ` <AANLkTikPOHy79E1ZG=iJ-rHj0vzS+AY-mGqCEtWoXp2o@mail.gmail.com>
@ 2010-09-27 18:11                 ` David Miller
  0 siblings, 0 replies; 18+ messages in thread
From: David Miller @ 2010-09-27 18:11 UTC (permalink / raw)
  To: maze; +Cc: netdev

From: Maciej Żenczykowski <maze@google.com>
Date: Mon, 27 Sep 2010 03:10:12 -0700

> FYI, I'm signed up to netdev on my personal account.

You don't need to subscribe in order to post patches to the mailing
list.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] net: Fix IPv6 PMTU disc. w/ asymmetric routes
  2010-09-27 10:05             ` [PATCH] net: Fix IPv6 PMTU disc. w/ asymmetric routes Maciej Żenczykowski
       [not found]               ` <AANLkTikPOHy79E1ZG=iJ-rHj0vzS+AY-mGqCEtWoXp2o@mail.gmail.com>
@ 2010-09-28 20:58               ` David Miller
  2010-09-28 22:37                 ` Maciej Żenczykowski
  1 sibling, 1 reply; 18+ messages in thread
From: David Miller @ 2010-09-28 20:58 UTC (permalink / raw)
  To: zenczykowski; +Cc: netdev, maze

From: Maciej Żenczykowski <zenczykowski@gmail.com>
Date: Mon, 27 Sep 2010 03:05:57 -0700

> From: Maciej Żenczykowski <maze@google.com>
> 
> Signed-off-by: Maciej Żenczykowski <maze@google.com>

Please handle all 4 cases just like the ipv4 routing code does:

{ saddr = SADDR, ifindex = dev->ifindex }
{ saddr = SADDR, ifindex = 0 }
{ saddr = INADDR_ANY, ifindex = dev->ifindex }
{ saddr = INADDR_ANY, ifindex = 0 }

I believe I've specifically asked for this every time someone
mentioned that they wanted to fix this issue.

And the ipv4 side is a good guide in other ways, it uses a set
of two arrays so you can just loop over them, making your
rt6_do_pmtu_disc() calls along the way.



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] net: Fix IPv6 PMTU disc. w/ asymmetric routes
  2010-09-28 20:58               ` David Miller
@ 2010-09-28 22:37                 ` Maciej Żenczykowski
  2010-09-30  7:41                   ` David Miller
  0 siblings, 1 reply; 18+ messages in thread
From: Maciej Żenczykowski @ 2010-09-28 22:37 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Hideaki Yoshifuji

> Please handle all 4 cases just like the ipv4 routing code does:
>
> { saddr = SADDR, ifindex = dev->ifindex }
> { saddr = SADDR, ifindex = 0 }
> { saddr = INADDR_ANY, ifindex = dev->ifindex }
> { saddr = INADDR_ANY, ifindex = 0 }
>
> I believe I've specifically asked for this every time someone mentioned that they wanted to fix this issue.

I believe that is exactly what the following code fragment from the
above patch does:

+       rt6_do_pmtu_disc(daddr, saddr, net, pmtu, 0);
+       rt6_do_pmtu_disc(daddr, saddr, net, pmtu, dev->ifindex);
+       rt6_do_pmtu_disc(daddr, NULL, net, pmtu, 0);
+       rt6_do_pmtu_disc(daddr, NULL, net, pmtu, dev->ifindex);

The last two of these lines were added to the patch last time around
per your feedback - precisely to address this issue.
I reposted with those changes and you appeared to be ok with the
patch, and your only comment was to add a 'Signed-off-by' line.


Side note:
* I still think that handling the saddr == NULL ie. INADDR_ANY case is
entirely superfluous, since it doesn't actually iterate through all
possible source addresses.  With IPv6 there can be many, many possible
source addresses (just think of link local vs global public vs privacy
addresses and then tack on 6to4 and mobility, etc... for example I see
13 ipv6 addresses on eth0 on my desktop at home, 12 of them globally
reachable).
* Currently PMTU discovery is broken.  With this patch with all 4
lines we end up doing PMTU discovery per source address that isn't the
default source address.  Without those last two lines we'd just do
PMTU discovery per source address.  Not much of a difference, both
allow PMTU discovery to actually happen.

Except:

Imagine a machine with a 1500 MTU network card and 2 source addresses:
  - one default v6 native 'A'
  - one additional 6to4 address 'B'
The native address can reach the internet with a max MTU of 1500.
While due to source address based routing the 6to4 address goes
through a sit ipv4 based tunnel and thus has a max MTU of 1480 (1500 -
20 bytes IPv4 header).

Imagine a remote machine with v6 address 'Z' (our default address to
reach 'Z' is 'A').
Clearly the PMTU of A->Z is 1500, while the PMTU of B->Z is 1480.
[Assume there's nothing which would cause them to be lower.]

If you do PMTU discovery of A->Z, you will indeed get 1500, and mark
that as the PMTU of A->Z.
However if you do PMTU discovery of B->Z, you will indeed get 1480,
but you will mark that as the PMTU of both B->Z and A->Z (since A is
the default to reach Z).
Suddenly the PMTU of A->Z _needlessly and incorrectly_ dropped from
1500 to 1480 merely because of PMTU discovery being performed on B->Z.

And this is precisely why I think that adding those last two lines
that handle INADDR_ANY is actually more of a problem then solution
(indeed as far as I can tell, adding them doesn't actually fix any
problem - the code works just fine and performs PMTU discovery
correctly with just the first two lines).

Thankfully ending up with a lower MTU than actually possible is only a
slight performance problem, while not doing PMTU discovery correctly
at all (current state with asymmetric routing) is a big issue.
Hence regardless of whether this patch includes INADDR_ANY or not the
end result will still be an improvement.

> And the ipv4 side is a good guide in other ways, it uses a set of two arrays so you can just loop over them, making your rt6_do_pmtu_disc() calls along the way.

Sure, I can change it to use arrays, although I don't see any
particular improvement in performance (even if rt6_do_pmtu_disc was
inlined) or readability or anything.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] net: Fix IPv6 PMTU disc. w/ asymmetric routes
  2010-09-28 22:37                 ` Maciej Żenczykowski
@ 2010-09-30  7:41                   ` David Miller
  2010-10-03 21:49                     ` David Miller
  0 siblings, 1 reply; 18+ messages in thread
From: David Miller @ 2010-09-30  7:41 UTC (permalink / raw)
  To: zenczykowski; +Cc: netdev, yoshfuji

From: Maciej Żenczykowski <zenczykowski@gmail.com>
Date: Tue, 28 Sep 2010 15:37:26 -0700

> * I still think that handling the saddr == NULL ie. INADDR_ANY case is
> entirely superfluous, since it doesn't actually iterate through all
> possible source addresses.  With IPv6 there can be many, many possible
> source addresses (just think of link local vs global public vs privacy
> addresses and then tack on 6to4 and mobility, etc... for example I see
> 13 ipv6 addresses on eth0 on my desktop at home, 12 of them globally
> reachable).

I only have %100 confidence in the reasoning behind why ipv4 handles
things this way, so I'll discuss this in those terms and then try
to tie it into the ipv6 side.

When we are looking up an ipv4 output route, there are 2 "source
address" objects.

1) The one specified in the "struct flowi" for the lookup
   (the flp->fl4_src passed into ip_route_output_flow) which
   is also the one that ends up in the routing cache entry's
   ->fl.fl4_src member.

2) The one contained in the routing cache entry's specification.
   Ie. rth->rt_src

These are distinct.  #1 is what is used to hash and find a matching
routing cache entry.

Since a source address of INADDR_ANY is allowed for routing lookups,
routing cache entries for the same daddr/saddr pair can exist in more
than one hash chains.

Therefore, if we didn't iterate over INADDR_ANY and the specific
address in the icmp PMTU message, we'd miss some routing cache
entries.

Look at the PMTU loops in ipv4 ip_rt_frag_needed():

	for (k = 0; k < 2; k++) {
		for (i = 0; i < 2; i++) {
			unsigned hash = rt_hash(daddr, skeys[i], ikeys[k],
						rt_genid(net));

("ANY vs. specific" ifindex and saddr are used for the hash
 computation)

				if (rth->fl.fl4_dst != daddr ||
				    rth->fl.fl4_src != skeys[i] ||
				    rth->rt_dst != daddr ||
				    rth->rt_src != iph->saddr ||
				    rth->fl.oif != ikeys[k] ||
				    rth->fl.iif != 0 ||

(and for the routing cache entry flow member comparisons)

But the routing cache entry "rt_src" member is compared always to
"iph->saddr", it doesn't use the "ANY vs. specific" skey[] value.

Unless ipv6 does not allow INADDR_ANY source address specifications
during route lookups, it ought to have the same issue too.

My understanding is that ipv6 uses a two-layered tree based scheme,
one layer to key off of the source address and one layer to key off
of the destination address.

So it seems to me that the lookups would have the same aliasing issue
that ipv4 does, and thus require checking both the specific saddr
and also the saddr INADDR_ANY.

Maybe the problem is that the ipv6 side uses the same saddr for both
the lookup and the entry comparison in these PMTU code paths?  Does it
not allow specifying them seperately as the ipv4 PMTU (and incidently
the RT redirect) code paths do?

Or is this not an issue on the ipv6 side for some reason?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] net: Fix IPv6 PMTU disc. w/ asymmetric routes
  2010-09-30  7:41                   ` David Miller
@ 2010-10-03 21:49                     ` David Miller
  2010-10-04  0:21                       ` Maciej Żenczykowski
  0 siblings, 1 reply; 18+ messages in thread
From: David Miller @ 2010-10-03 21:49 UTC (permalink / raw)
  To: zenczykowski; +Cc: netdev, yoshfuji

From: David Miller <davem@davemloft.net>
Date: Thu, 30 Sep 2010 00:41:36 -0700 (PDT)

> Maybe the problem is that the ipv6 side uses the same saddr for both
> the lookup and the entry comparison in these PMTU code paths?  Does it
> not allow specifying them seperately as the ipv4 PMTU (and incidently
> the RT redirect) code paths do?
> 
> Or is this not an issue on the ipv6 side for some reason?

Ok, meanwhile I did the research.

What ipv6 does is that when you lookup a route, it clones or copies
the prefixes route into one that is fully specified for a specific
SADDR/DADDR pair, and then inserts that specific route into the FIB6
tree.

Therefore the only cases we should lookup for PMTU discovery for ipv6
are:

	{ daddr, saddr, ifindex == 0 }
	{ daddr, saddr, ifindex == dev->ifindex }

This achieves the same effect as what ipv4 is doing.

So Maciej your original attempt was correct all along, and as a result
I'll commit the following.

Thanks!

--------------------
net: Fix IPv6 PMTU disc. w/ asymmetric routes

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv6/route.c |   28 ++++++++++++++++++++++++----
 1 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 8323136..a275c6e 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1556,14 +1556,13 @@ out:
  *	i.e. Path MTU discovery
  */
 
-void rt6_pmtu_discovery(struct in6_addr *daddr, struct in6_addr *saddr,
-			struct net_device *dev, u32 pmtu)
+static void rt6_do_pmtu_disc(struct in6_addr *daddr, struct in6_addr *saddr,
+			     struct net *net, u32 pmtu, int ifindex)
 {
 	struct rt6_info *rt, *nrt;
-	struct net *net = dev_net(dev);
 	int allfrag = 0;
 
-	rt = rt6_lookup(net, daddr, saddr, dev->ifindex, 0);
+	rt = rt6_lookup(net, daddr, saddr, ifindex, 0);
 	if (rt == NULL)
 		return;
 
@@ -1631,6 +1630,27 @@ out:
 	dst_release(&rt->dst);
 }
 
+void rt6_pmtu_discovery(struct in6_addr *daddr, struct in6_addr *saddr,
+			struct net_device *dev, u32 pmtu)
+{
+	struct net *net = dev_net(dev);
+
+	/*
+	 * RFC 1981 states that a node "MUST reduce the size of the packets it
+	 * is sending along the path" that caused the Packet Too Big message.
+	 * Since it's not possible in the general case to determine which
+	 * interface was used to send the original packet, we update the MTU
+	 * on the interface that will be used to send future packets. We also
+	 * update the MTU on the interface that received the Packet Too Big in
+	 * case the original packet was forced out that interface with
+	 * SO_BINDTODEVICE or similar. This is the next best thing to the
+	 * correct behaviour, which would be to update the MTU on all
+	 * interfaces.
+	 */
+	rt6_do_pmtu_disc(daddr, saddr, net, pmtu, 0);
+	rt6_do_pmtu_disc(daddr, saddr, net, pmtu, dev->ifindex);
+}
+
 /*
  *	Misc support functions
  */
-- 
1.7.3.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] net: Fix IPv6 PMTU disc. w/ asymmetric routes
  2010-10-03 21:49                     ` David Miller
@ 2010-10-04  0:21                       ` Maciej Żenczykowski
  0 siblings, 0 replies; 18+ messages in thread
From: Maciej Żenczykowski @ 2010-10-04  0:21 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, yoshfuji

> Ok, meanwhile I did the research.
>
> What ipv6 does is that when you lookup a route, it clones or copies
> the prefixes route into one that is fully specified for a specific
> SADDR/DADDR pair, and then inserts that specific route into the FIB6
> tree.
>
> Therefore the only cases we should lookup for PMTU discovery for ipv6
> are:
>
>        { daddr, saddr, ifindex == 0 }
>        { daddr, saddr, ifindex == dev->ifindex }
>
> This achieves the same effect as what ipv4 is doing.
>
> So Maciej your original attempt was correct all along, and as a result
> I'll commit the following.
>
> Thanks!

Awesome.  I've been trying to convince myself of this and come up with
a concise explanation - but it sounds like you got there first.

[I ran into SUBTREES and got stuck in trying to understand exactly how
they work and whether they affect this at all.]

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2010-10-04  0:21 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-01-07  4:43 [PATCH] IPv6: fix rt_lookup in pmtu_discovery Tom Herbert
2010-01-07  9:27 ` David Miller
2010-01-08  1:05   ` Maciej Żenczykowski
2010-01-08  1:10     ` David Miller
2010-01-09  0:12       ` Lorenzo Colitti
2010-01-10 21:15         ` David Miller
2010-01-14  0:51           ` Maciej Żenczykowski
2010-01-20 22:55             ` Maciej Żenczykowski
2010-01-20 22:57               ` David Miller
2010-01-20 23:33                 ` Maciej Żenczykowski
2010-01-23 10:20             ` David Miller
2010-09-27 10:05             ` [PATCH] net: Fix IPv6 PMTU disc. w/ asymmetric routes Maciej Żenczykowski
     [not found]               ` <AANLkTikPOHy79E1ZG=iJ-rHj0vzS+AY-mGqCEtWoXp2o@mail.gmail.com>
2010-09-27 18:11                 ` David Miller
2010-09-28 20:58               ` David Miller
2010-09-28 22:37                 ` Maciej Żenczykowski
2010-09-30  7:41                   ` David Miller
2010-10-03 21:49                     ` David Miller
2010-10-04  0:21                       ` Maciej Żenczykowski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.