netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net 0/4] ipv4: various pmtu discovery fixes
@ 2011-10-11 11:08 Steffen Klassert
  2011-10-11 11:09 ` [PATCH 1/4] ipv4: Fix pmtu propagating Steffen Klassert
                   ` (4 more replies)
  0 siblings, 5 replies; 30+ messages in thread
From: Steffen Klassert @ 2011-10-11 11:08 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

This patchset addresses some issues I found during investigating
pmtu discovery. These issues were introduced with
git commit 2c8cec5 (ipv4: Cache learned PMTU information in inetpeer).

The patchset is based on current net/master.

Steffen

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/4] ipv4: Fix pmtu propagating
  2011-10-11 11:08 [PATCH net 0/4] ipv4: various pmtu discovery fixes Steffen Klassert
@ 2011-10-11 11:09 ` Steffen Klassert
  2011-10-12 21:02   ` David Miller
  2011-10-11 11:10 ` [PATCH 2/4] ipv4: Update pmtu informations on inetpeer only for output routes Steffen Klassert
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 30+ messages in thread
From: Steffen Klassert @ 2011-10-11 11:09 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Since commit 2c8cec5c (ipv4: Cache learned PMTU information in inetpeer)
we cache the learned pmtu informations in inetpeer and propagate these
informations with the dst_ops->check() functions. However, these functions
might not be called for udp and raw packets. As a consequence, we don't
use the learned pmtu informations in these cases. With this patch we
call dst_check() from ip_setup_cork() to propagate the pmtu informations.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv4/ip_output.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 8c65633..f682437 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1063,6 +1063,11 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
 	rt = *rtp;
 	if (unlikely(!rt))
 		return -EFAULT;
+
+	rt = (struct rtable *)dst_check(&rt->dst, 0);
+	if (!rt)
+		return -EHOSTUNREACH;
+
 	/*
 	 * We steal reference to this route, caller should not release it
 	 */
-- 
1.7.0.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/4] ipv4: Update pmtu informations on inetpeer only for output routes
  2011-10-11 11:08 [PATCH net 0/4] ipv4: various pmtu discovery fixes Steffen Klassert
  2011-10-11 11:09 ` [PATCH 1/4] ipv4: Fix pmtu propagating Steffen Klassert
@ 2011-10-11 11:10 ` Steffen Klassert
  2011-10-12 21:08   ` David Miller
  2011-10-11 11:11 ` [PATCH 3/4] ipv4: Fix inetpeer expiration handling Steffen Klassert
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 30+ messages in thread
From: Steffen Klassert @ 2011-10-11 11:10 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

The pmtu informations on the inetpeer are visible for output and
input routes. On packet forwarding, we might propagate a learned
pmtu to the sender. As we update the pmtu informations of the
inetpeer on demand, the original sender of the forwarded packets
might never notice when the pmtu to that inetpeer increases.
So propagate the nexthop mtu instead of the pmtu to the final
destination.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv4/route.c |   15 +++++++++++----
 1 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 075212e..9a6623e 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1579,9 +1579,10 @@ unsigned short ip_rt_frag_needed(struct net *net, const struct iphdr *iph,
 
 static void check_peer_pmtu(struct dst_entry *dst, struct inet_peer *peer)
 {
+	struct rtable *rt = (struct rtable *) dst;
 	unsigned long expires = ACCESS_ONCE(peer->pmtu_expires);
 
-	if (!expires)
+	if (rt_is_input_route(rt) || !expires)
 		return;
 	if (time_before(jiffies, expires)) {
 		u32 orig_dst_mtu = dst_mtu(dst);
@@ -1803,6 +1804,7 @@ static void rt_init_metrics(struct rtable *rt, const struct flowi4 *fl4,
 			    struct fib_info *fi)
 {
 	struct inet_peer *peer;
+	struct dst_entry *dst = &rt->dst;
 	int create = 0;
 
 	/* If a peer entry exists for this destination, we must hook
@@ -1817,9 +1819,14 @@ static void rt_init_metrics(struct rtable *rt, const struct flowi4 *fl4,
 		if (inet_metrics_new(peer))
 			memcpy(peer->metrics, fi->fib_metrics,
 			       sizeof(u32) * RTAX_MAX);
-		dst_init_metrics(&rt->dst, peer->metrics, false);
 
-		check_peer_pmtu(&rt->dst, peer);
+		dst_init_metrics(dst, peer->metrics, false);
+		check_peer_pmtu(dst, peer);
+
+		if (rt_is_input_route(rt))
+			dst_metric_set(dst, RTAX_MTU,
+				       dst->ops->default_mtu(dst));
+
 		if (peer->redirect_learned.a4 &&
 		    peer->redirect_learned.a4 != rt->rt_gateway) {
 			rt->rt_gateway = peer->redirect_learned.a4;
@@ -1830,7 +1837,7 @@ static void rt_init_metrics(struct rtable *rt, const struct flowi4 *fl4,
 			rt->fi = fi;
 			atomic_inc(&fi->fib_clntref);
 		}
-		dst_init_metrics(&rt->dst, fi->fib_metrics, true);
+		dst_init_metrics(dst, fi->fib_metrics, true);
 	}
 }
 
-- 
1.7.0.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 3/4] ipv4: Fix inetpeer expiration handling
  2011-10-11 11:08 [PATCH net 0/4] ipv4: various pmtu discovery fixes Steffen Klassert
  2011-10-11 11:09 ` [PATCH 1/4] ipv4: Fix pmtu propagating Steffen Klassert
  2011-10-11 11:10 ` [PATCH 2/4] ipv4: Update pmtu informations on inetpeer only for output routes Steffen Klassert
@ 2011-10-11 11:11 ` Steffen Klassert
  2011-10-20  6:24   ` Gao feng
  2011-10-11 11:12 ` [PATCH 4/4] ipv4: Fix inetpeer expire time information Steffen Klassert
  2011-10-11 19:54 ` [PATCH net 0/4] ipv4: various pmtu discovery fixes David Miller
  4 siblings, 1 reply; 30+ messages in thread
From: Steffen Klassert @ 2011-10-11 11:11 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

We leak a trigger to check if the learned pmtu information has
expired, so the learned pmtu informations never expire. This patch
add such a trigger to ipv4_dst_check().

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv4/route.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 9a6623e..cda370c 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1660,6 +1660,10 @@ static struct dst_entry *ipv4_dst_check(struct dst_entry *dst, u32 cookie)
 
 	if (rt_is_expired(rt))
 		return NULL;
+
+	if (rt->peer && peer_pmtu_expired(rt->peer))
+		dst_metric_set(dst, RTAX_MTU, rt->peer->pmtu_orig);
+
 	if (rt->rt_peer_genid != rt_peer_genid()) {
 		struct inet_peer *peer;
 
-- 
1.7.0.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 4/4] ipv4: Fix inetpeer expire time information
  2011-10-11 11:08 [PATCH net 0/4] ipv4: various pmtu discovery fixes Steffen Klassert
                   ` (2 preceding siblings ...)
  2011-10-11 11:11 ` [PATCH 3/4] ipv4: Fix inetpeer expiration handling Steffen Klassert
@ 2011-10-11 11:12 ` Steffen Klassert
  2011-10-11 19:54 ` [PATCH net 0/4] ipv4: various pmtu discovery fixes David Miller
  4 siblings, 0 replies; 30+ messages in thread
From: Steffen Klassert @ 2011-10-11 11:12 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

As we update the learned pmtu informations on demand, we might
report a nagative expiration time value to userspace if the
pmtu informations are already expired and we have not send a
packet to that inetpeer after expiration. With this patch we
send a expire time of null to userspace after expiration
until the next packet is send to that inetpeer.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv4/route.c |   10 +++++++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index cda370c..501a3c0 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2832,7 +2832,7 @@ static int rt_fill_info(struct net *net,
 	struct rtable *rt = skb_rtable(skb);
 	struct rtmsg *r;
 	struct nlmsghdr *nlh;
-	long expires = 0;
+	unsigned long expires = 0;
 	const struct inet_peer *peer = rt->peer;
 	u32 id = 0, ts = 0, tsage = 0, error;
 
@@ -2889,8 +2889,12 @@ static int rt_fill_info(struct net *net,
 			tsage = get_seconds() - peer->tcp_ts_stamp;
 		}
 		expires = ACCESS_ONCE(peer->pmtu_expires);
-		if (expires)
-			expires -= jiffies;
+		if (expires) {
+			if (time_before(jiffies, expires))
+				expires -= jiffies;
+			else
+				expires = 0;
+		}
 	}
 
 	if (rt_is_input_route(rt)) {
-- 
1.7.0.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH net 0/4] ipv4: various pmtu discovery fixes
  2011-10-11 11:08 [PATCH net 0/4] ipv4: various pmtu discovery fixes Steffen Klassert
                   ` (3 preceding siblings ...)
  2011-10-11 11:12 ` [PATCH 4/4] ipv4: Fix inetpeer expire time information Steffen Klassert
@ 2011-10-11 19:54 ` David Miller
  2011-11-08 19:41   ` David Miller
  4 siblings, 1 reply; 30+ messages in thread
From: David Miller @ 2011-10-11 19:54 UTC (permalink / raw)
  To: steffen.klassert; +Cc: netdev

From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Tue, 11 Oct 2011 13:08:42 +0200

> This patchset addresses some issues I found during investigating
> pmtu discovery. These issues were introduced with
> git commit 2c8cec5 (ipv4: Cache learned PMTU information in inetpeer).
> 
> The patchset is based on current net/master.

Thanks a lot for these bug fixes, Herbert Xu reported similar issues.

I'll review these patches tonight.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/4] ipv4: Fix pmtu propagating
  2011-10-11 11:09 ` [PATCH 1/4] ipv4: Fix pmtu propagating Steffen Klassert
@ 2011-10-12 21:02   ` David Miller
  2011-10-13 10:09     ` Steffen Klassert
  0 siblings, 1 reply; 30+ messages in thread
From: David Miller @ 2011-10-12 21:02 UTC (permalink / raw)
  To: steffen.klassert; +Cc: netdev

From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Tue, 11 Oct 2011 13:09:22 +0200

> Since commit 2c8cec5c (ipv4: Cache learned PMTU information in inetpeer)
> we cache the learned pmtu informations in inetpeer and propagate these
> informations with the dst_ops->check() functions. However, these functions
> might not be called for udp and raw packets. As a consequence, we don't
> use the learned pmtu informations in these cases. With this patch we
> call dst_check() from ip_setup_cork() to propagate the pmtu informations.
> 
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

This dst_check() call will only do something if dst->obsolete is non-zero.

If dst->obsolete can be set in these circumstances, that's a bug.  The
caller is responsible for providing either a freshly looked up route
or a cached route which has had dst_check() or sk_dst_check() invoked
upon it.

I am pretty sure these rules are followed by the current code.

Again, there are only two scenerios:

1) 'rt' is just looked up by caller (f.e. udp_sendmsg() in rt == NULL case),
   here dst->obsolete is very unlikely to be non-zero.

2) Connected case, and we use cached route from the socket, but here
   we'll use sk_dst_check() to validate the route.  sk_dst_check()
   makes the necessary dst->ops->check() call if dst->obsolete is
   non-zero, and in fact that is it's one and only job.

So I cannot see a case where your new check can be necessary.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/4] ipv4: Update pmtu informations on inetpeer only for output routes
  2011-10-11 11:10 ` [PATCH 2/4] ipv4: Update pmtu informations on inetpeer only for output routes Steffen Klassert
@ 2011-10-12 21:08   ` David Miller
  2011-10-14  6:34     ` Steffen Klassert
  2011-11-08 19:36     ` David Miller
  0 siblings, 2 replies; 30+ messages in thread
From: David Miller @ 2011-10-12 21:08 UTC (permalink / raw)
  To: steffen.klassert; +Cc: netdev

From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Tue, 11 Oct 2011 13:10:27 +0200

> @@ -1817,9 +1819,14 @@ static void rt_init_metrics(struct rtable *rt, const struct flowi4 *fl4,
>  		if (inet_metrics_new(peer))
>  			memcpy(peer->metrics, fi->fib_metrics,
>  			       sizeof(u32) * RTAX_MAX);
> -		dst_init_metrics(&rt->dst, peer->metrics, false);
>  
> -		check_peer_pmtu(&rt->dst, peer);
> +		dst_init_metrics(dst, peer->metrics, false);
> +		check_peer_pmtu(dst, peer);
> +
> +		if (rt_is_input_route(rt))
> +			dst_metric_set(dst, RTAX_MTU,
> +				       dst->ops->default_mtu(dst));
> +

You really can't do this, it's going to kill all of the memory savings from
storing metrics in the inetpeer cache.

Every input route is going to have it's metrics COW'd with this change.

The whole idea is to use defaults as heavily as possible, and that's
the entire reason why the dst->ops->default_mtu() method exists, so
that we can just leave the values alone and have read-only copies %99
of the time.

Please rearrange your fix so that these goals are still achieved.

Thanks.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/4] ipv4: Fix pmtu propagating
  2011-10-12 21:02   ` David Miller
@ 2011-10-13 10:09     ` Steffen Klassert
  2011-10-13 17:58       ` David Miller
  0 siblings, 1 reply; 30+ messages in thread
From: Steffen Klassert @ 2011-10-13 10:09 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

On Wed, Oct 12, 2011 at 05:02:18PM -0400, David Miller wrote:
> 
> This dst_check() call will only do something if dst->obsolete is non-zero.
> 
> If dst->obsolete can be set in these circumstances, that's a bug.  The
> caller is responsible for providing either a freshly looked up route
> or a cached route which has had dst_check() or sk_dst_check() invoked
> upon it.

dst->obsolete was -1 in all cases I observed, regardless if connected or not.

> 
> I am pretty sure these rules are followed by the current code.
> 
> Again, there are only two scenerios:
> 
> 1) 'rt' is just looked up by caller (f.e. udp_sendmsg() in rt == NULL case),
>    here dst->obsolete is very unlikely to be non-zero.
> 
> 2) Connected case, and we use cached route from the socket, but here
>    we'll use sk_dst_check() to validate the route.  sk_dst_check()
>    makes the necessary dst->ops->check() call if dst->obsolete is
>    non-zero, and in fact that is it's one and only job.
> 

At least it seems that raw_sendmsg() and ping_sendmsg() don't use
a cached route, they do the route lookup in any case. I don't see
where we check if we learned a new pmtu in this cases. 

I added some debugging output and saw that peer->pmtu_learned
has the correct pmtu value, but it never ends up in the metric
of the dst_entry.

With a ping -s 1400 to a destination where the pmtu is 1000
I get for each packet 'Frag needed and DF set (mtu = 1000)'.
That's how I noticed that something changed.

With the dst_check() in ip_setup_cork() I get
'Frag needed and DF set (mtu = 1000)' for the first packet and
all further packets reach the destination, as it was before
commmit 2c8cec5c.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/4] ipv4: Fix pmtu propagating
  2011-10-13 10:09     ` Steffen Klassert
@ 2011-10-13 17:58       ` David Miller
  2011-10-14  5:54         ` Steffen Klassert
  0 siblings, 1 reply; 30+ messages in thread
From: David Miller @ 2011-10-13 17:58 UTC (permalink / raw)
  To: steffen.klassert; +Cc: netdev

From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Thu, 13 Oct 2011 12:09:50 +0200

> At least it seems that raw_sendmsg() and ping_sendmsg() don't use
> a cached route, they do the route lookup in any case. I don't see
> where we check if we learned a new pmtu in this cases. 

A freshly looked up route should not have ->obsolete set.

That's why we don't do dst_check() in that part of the ip_output.c
helper code you're modifying.

Please find out exactly why dst->obsolete is non-zero on a freshly
looked up route.  It's unexpected.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/4] ipv4: Fix pmtu propagating
  2011-10-13 17:58       ` David Miller
@ 2011-10-14  5:54         ` Steffen Klassert
  2011-10-17 12:18           ` Steffen Klassert
  0 siblings, 1 reply; 30+ messages in thread
From: Steffen Klassert @ 2011-10-14  5:54 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

On Thu, Oct 13, 2011 at 01:58:08PM -0400, David Miller wrote:
> From: Steffen Klassert <steffen.klassert@secunet.com>
> Date: Thu, 13 Oct 2011 12:09:50 +0200
> 
> > At least it seems that raw_sendmsg() and ping_sendmsg() don't use
> > a cached route, they do the route lookup in any case. I don't see
> > where we check if we learned a new pmtu in this cases. 
> 
> A freshly looked up route should not have ->obsolete set.
> 
> That's why we don't do dst_check() in that part of the ip_output.c
> helper code you're modifying.
> 
> Please find out exactly why dst->obsolete is non-zero on a freshly
> looked up route.  It's unexpected.

Hm, on a slow path route lookup e.g. __mkroute_output() calls
rt_dst_alloc() which initializes dst->obsolete to -1. It seems
that ___dst_free() is the only function that ever changes the
initial obsolete value. After calling ___dst_free() dst->obsolete
is 2.

Btw. on a slow path route lookup, __mkroute_output() and friends
initialize the pmtu informations via rt_set_nexthop(). How do we
check if these informations are still valid if we get the route
via the routing hash cache? Do we need to check in this case?

The raw protocol uses ip4_datagram_connect() as it's connect function.
ip4_datagram_connect() uses sk_dst_set() to cache the dst_entry on
the socket, why we don't use this cached dst_entry on raw_sendmsg()
in the connected case?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/4] ipv4: Update pmtu informations on inetpeer only for output routes
  2011-10-12 21:08   ` David Miller
@ 2011-10-14  6:34     ` Steffen Klassert
  2011-11-08 19:36     ` David Miller
  1 sibling, 0 replies; 30+ messages in thread
From: Steffen Klassert @ 2011-10-14  6:34 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

On Wed, Oct 12, 2011 at 05:08:05PM -0400, David Miller wrote:
> 
> You really can't do this, it's going to kill all of the memory savings from
> storing metrics in the inetpeer cache.
> 
> Every input route is going to have it's metrics COW'd with this change.
> 

Ok, I missed this completely. So if input and output routes share the
inetpeer information and we don't want to copy, we might not use the
(learned) pmtu informations on the inetpeer for input routes. So
for input routes, dst_mtu() could return dst->ops->default_mtu()
instead of the mtu informations stored on the metric. Are there other
(better) solutions?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/4] ipv4: Fix pmtu propagating
  2011-10-14  5:54         ` Steffen Klassert
@ 2011-10-17 12:18           ` Steffen Klassert
  2011-10-19  9:07             ` Gao feng
  0 siblings, 1 reply; 30+ messages in thread
From: Steffen Klassert @ 2011-10-17 12:18 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

On Fri, Oct 14, 2011 at 07:54:06AM +0200, Steffen Klassert wrote:
> On Thu, Oct 13, 2011 at 01:58:08PM -0400, David Miller wrote:
> > 
> > Please find out exactly why dst->obsolete is non-zero on a freshly
> > looked up route.  It's unexpected.
> 
> Hm, on a slow path route lookup e.g. __mkroute_output() calls
> rt_dst_alloc() which initializes dst->obsolete to -1.

Just a follow up:
git commit d11a4dc18 (ipv4: check rt_genid in dst_check)
changed the initialization value of dst->obsolete from
0 to -1.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/4] ipv4: Fix pmtu propagating
  2011-10-17 12:18           ` Steffen Klassert
@ 2011-10-19  9:07             ` Gao feng
  2011-10-19 19:32               ` David Miller
  0 siblings, 1 reply; 30+ messages in thread
From: Gao feng @ 2011-10-19  9:07 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: David Miller, netdev

2011.10.17 20:18, Steffen Klassert wrote:
> On Fri, Oct 14, 2011 at 07:54:06AM +0200, Steffen Klassert wrote:
>> On Thu, Oct 13, 2011 at 01:58:08PM -0400, David Miller wrote:
>>>
>>> Please find out exactly why dst->obsolete is non-zero on a freshly
>>> looked up route.  It's unexpected.
>>
>> Hm, on a slow path route lookup e.g. __mkroute_output() calls
>> rt_dst_alloc() which initializes dst->obsolete to -1.
> 
> Just a follow up:
> git commit d11a4dc18 (ipv4: check rt_genid in dst_check)
> changed the initialization value of dst->obsolete from
> 0 to -1.
> --

Anybody give any suggestion?
Actually,we can't suppose the dst->obsolete is non-zero.
maybe we should use both dst->obsolete and rt->rt_peer_genid to decide whether to do dst_check or not?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/4] ipv4: Fix pmtu propagating
  2011-10-19  9:07             ` Gao feng
@ 2011-10-19 19:32               ` David Miller
  2011-11-08 19:19                 ` David Miller
  0 siblings, 1 reply; 30+ messages in thread
From: David Miller @ 2011-10-19 19:32 UTC (permalink / raw)
  To: gaofeng; +Cc: steffen.klassert, netdev

From: Gao feng <gaofeng@cn.fujitsu.com>
Date: Wed, 19 Oct 2011 17:07:50 +0800

> 2011.10.17 20:18, Steffen Klassert wrote:
>> On Fri, Oct 14, 2011 at 07:54:06AM +0200, Steffen Klassert wrote:
>>> On Thu, Oct 13, 2011 at 01:58:08PM -0400, David Miller wrote:
>>>>
>>>> Please find out exactly why dst->obsolete is non-zero on a freshly
>>>> looked up route.  It's unexpected.
>>>
>>> Hm, on a slow path route lookup e.g. __mkroute_output() calls
>>> rt_dst_alloc() which initializes dst->obsolete to -1.
>> 
>> Just a follow up:
>> git commit d11a4dc18 (ipv4: check rt_genid in dst_check)
>> changed the initialization value of dst->obsolete from
>> 0 to -1.
>> --
> 
> Anybody give any suggestion?

I'm really busy but will look at this soon.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/4] ipv4: Fix inetpeer expiration handling
  2011-10-11 11:11 ` [PATCH 3/4] ipv4: Fix inetpeer expiration handling Steffen Klassert
@ 2011-10-20  6:24   ` Gao feng
  2011-11-08 19:38     ` David Miller
  2011-11-09 12:47     ` Steffen Klassert
  0 siblings, 2 replies; 30+ messages in thread
From: Gao feng @ 2011-10-20  6:24 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: David Miller, netdev

2011.10.11 19:11, Steffen Klassert wrote:
> We leak a trigger to check if the learned pmtu information has
> expired, so the learned pmtu informations never expire. This patch
> add such a trigger to ipv4_dst_check().
> 
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
> ---
>  net/ipv4/route.c |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
> 
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 9a6623e..cda370c 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -1660,6 +1660,10 @@ static struct dst_entry *ipv4_dst_check(struct dst_entry *dst, u32 cookie)
>  
>  	if (rt_is_expired(rt))
>  		return NULL;
> +
> +	if (rt->peer && peer_pmtu_expired(rt->peer))
> +		dst_metric_set(dst, RTAX_MTU, rt->peer->pmtu_orig);
> +
>  	if (rt->rt_peer_genid != rt_peer_genid()) {
>  		struct inet_peer *peer;
>  

there are serval problem.
1:rt->peer maybe null,we should call rt_bind_peer just like the code below.
2:rt->peer_pmtu_orig is null. if we hasn't send packet before,the func check_peer_pmtu hasn't be called.
  so the peer->pmtu_orig is null.
3:when rt->rt_peer_genid != rt_peer_genid(), we has no need to do dst_metric_set(dst, RTAX_MTU, rt->peer->pmtu_orig),
  because check_peer_pmtu will do this.

what about this patch?


diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 04a14db..45782d2 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1655,17 +1655,17 @@ static int check_peer_redir(struct dst_entry *dst, struct inet_peer *peer)
 static struct dst_entry *ipv4_dst_check(struct dst_entry *dst, u32 cookie)
 {
        struct rtable *rt = (struct rtable *) dst;
+       struct inet_peer *peer;

        if (rt_is_expired(rt))
                return NULL;
-       if (rt->rt_peer_genid != rt_peer_genid()) {
-               struct inet_peer *peer;

-               if (!rt->peer)
-                       rt_bind_peer(rt, rt->rt_dst, 0);
+       if (!rt->peer)
+               rt_bind_peer(rt, rt->rt_dst, 0);

-               peer = rt->peer;
-               if (peer) {
+       peer = rt->peer;
+       if (peer) {
+               if (rt->rt_peer_genid != rt_peer_genid()) {
                        check_peer_pmtu(dst, peer);

                        if (peer->redirect_learned.a4 &&
@@ -1673,9 +1673,12 @@ static struct dst_entry *ipv4_dst_check(struct dst_entry *dst, u32 cookie)
                                if (check_peer_redir(dst, peer))
                                        return NULL;
                        }
-               }

-               rt->rt_peer_genid = rt_peer_genid();
+                       rt->rt_peer_genid = rt_peer_genid();
+
+               }else if(peer_pmtu_expired(peer))
+                       dst_metric_set(dst,RTAX_MTU,peer->pmtu_orig);
+
        }
        return dst;
 }

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/4] ipv4: Fix pmtu propagating
  2011-10-19 19:32               ` David Miller
@ 2011-11-08 19:19                 ` David Miller
  2011-11-08 19:33                   ` David Miller
  0 siblings, 1 reply; 30+ messages in thread
From: David Miller @ 2011-11-08 19:19 UTC (permalink / raw)
  To: gaofeng; +Cc: steffen.klassert, netdev


Steffen I look at this specific patch again.

The peer should be found and loaded up, and the PMTU propagated, when
the route cache entry is created.  Specifically rt_init_metrics() will
find any existing peer, and do the whole check_peer_pmtu() sequence
that ipv4_dst_check() does.

If we have some issue with UDP or RAW caching a route past the first
use after the route lookup, yes we have to introduce a dst_check()
call somewhere.

But unilaterally doing this on every CORK setup seems at least very
excessive.  Because CORK setup happens even for single sends.  One
such example is ip_make_skb().

ip_make_skb() is used by, f.e., udp_sendmsg() when corkreq is false.
And in this case 'rt' is used immediately after being looked up
in udp_sendmsg().

And, as far as I can tell, routines like udp_sendmsg() in fact already
handle the "cached route across multiple sendmsg() calls" case too.

Specifically, udp_sendmsg() does this:

	if (connected)
		rt = (struct rtable *)sk_dst_check(sk, 0);

otherwise it makes a completely fresh route lookup.

RAW sendmsg unconditionally makes a fresh route lookup on every
sendmsg call.  So it should be OK too.

So I really fail to see the problematic case.  Therefore, if it exists
you'll have to give me an exact sequence of events that leads to the
problem.

I suspect that your real problem has nothing to do with UDP or RAW,
but rather the issue is that entries already in the routing cache
with a NULL peer need to be refreshed with peer information created
in another context.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/4] ipv4: Fix pmtu propagating
  2011-11-08 19:19                 ` David Miller
@ 2011-11-08 19:33                   ` David Miller
  2011-11-09 12:08                     ` Steffen Klassert
  0 siblings, 1 reply; 30+ messages in thread
From: David Miller @ 2011-11-08 19:33 UTC (permalink / raw)
  To: gaofeng; +Cc: steffen.klassert, netdev

From: David Miller <davem@davemloft.net>
Date: Tue, 08 Nov 2011 14:19:50 -0500 (EST)

> I suspect that your real problem has nothing to do with UDP or RAW,
> but rather the issue is that entries already in the routing cache
> with a NULL peer need to be refreshed with peer information created
> in another context.

So you want something like this patch:

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 155138d..2966631 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1677,12 +1677,8 @@ static int check_peer_redir(struct dst_entry *dst, struct inet_peer *peer)
 	return 0;
 }
 
-static struct dst_entry *ipv4_dst_check(struct dst_entry *dst, u32 cookie)
+static struct rtable *ipv4_validate_peer(struct rtable *rt)
 {
-	struct rtable *rt = (struct rtable *) dst;
-
-	if (rt_is_expired(rt))
-		return NULL;
 	if (rt->rt_peer_genid != rt_peer_genid()) {
 		struct inet_peer *peer;
 
@@ -1691,17 +1687,27 @@ static struct dst_entry *ipv4_dst_check(struct dst_entry *dst, u32 cookie)
 
 		peer = rt->peer;
 		if (peer) {
-			check_peer_pmtu(dst, peer);
+			check_peer_pmtu(&rt->dst, peer);
 
 			if (peer->redirect_learned.a4 &&
 			    peer->redirect_learned.a4 != rt->rt_gateway) {
-				if (check_peer_redir(dst, peer))
+				if (check_peer_redir(&rt->dst, peer))
 					return NULL;
 			}
 		}
 
 		rt->rt_peer_genid = rt_peer_genid();
 	}
+	return rt;
+}
+
+static struct dst_entry *ipv4_dst_check(struct dst_entry *dst, u32 cookie)
+{
+	struct rtable *rt = (struct rtable *) dst;
+
+	if (rt_is_expired(rt))
+		return NULL;
+	dst = (struct dst_entry *) ipv4_validate_peer(rt);
 	return dst;
 }
 
@@ -2349,6 +2355,9 @@ int ip_route_input_common(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 		    rth->rt_mark == skb->mark &&
 		    net_eq(dev_net(rth->dst.dev), net) &&
 		    !rt_is_expired(rth)) {
+			rth = ipv4_validate_peer(rth);
+			if (!rth)
+				continue;
 			if (noref) {
 				dst_use_noref(&rth->dst, jiffies);
 				skb_dst_set_noref(skb, &rth->dst);
@@ -2724,6 +2733,9 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *flp4)
 			    (IPTOS_RT_MASK | RTO_ONLINK)) &&
 		    net_eq(dev_net(rth->dst.dev), net) &&
 		    !rt_is_expired(rth)) {
+			rth = ipv4_validate_peer(rth);
+			if (!rth)
+				continue;
 			dst_use(&rth->dst, jiffies);
 			RT_CACHE_STAT_INC(out_hit);
 			rcu_read_unlock_bh();

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/4] ipv4: Update pmtu informations on inetpeer only for output routes
  2011-10-12 21:08   ` David Miller
  2011-10-14  6:34     ` Steffen Klassert
@ 2011-11-08 19:36     ` David Miller
  2011-11-09 12:11       ` Steffen Klassert
  2011-11-14 10:12       ` Steffen Klassert
  1 sibling, 2 replies; 30+ messages in thread
From: David Miller @ 2011-11-08 19:36 UTC (permalink / raw)
  To: steffen.klassert; +Cc: netdev

From: David Miller <davem@davemloft.net>
Date: Wed, 12 Oct 2011 17:08:05 -0400 (EDT)

> From: Steffen Klassert <steffen.klassert@secunet.com>
> Date: Tue, 11 Oct 2011 13:10:27 +0200
> 
>> @@ -1817,9 +1819,14 @@ static void rt_init_metrics(struct rtable *rt, const struct flowi4 *fl4,
>>  		if (inet_metrics_new(peer))
>>  			memcpy(peer->metrics, fi->fib_metrics,
>>  			       sizeof(u32) * RTAX_MAX);
>> -		dst_init_metrics(&rt->dst, peer->metrics, false);
>>  
>> -		check_peer_pmtu(&rt->dst, peer);
>> +		dst_init_metrics(dst, peer->metrics, false);
>> +		check_peer_pmtu(dst, peer);
>> +
>> +		if (rt_is_input_route(rt))
>> +			dst_metric_set(dst, RTAX_MTU,
>> +				       dst->ops->default_mtu(dst));
>> +
> 
> You really can't do this, it's going to kill all of the memory savings from
> storing metrics in the inetpeer cache.
> 
> Every input route is going to have it's metrics COW'd with this change.
> 
> The whole idea is to use defaults as heavily as possible, and that's
> the entire reason why the dst->ops->default_mtu() method exists, so
> that we can just leave the values alone and have read-only copies %99
> of the time.
> 
> Please rearrange your fix so that these goals are still achieved.

What I think you can do to solve this problem is explicitly use
dst->ops->default_mtu() in ip_forward() instead of dst_mtu().

That way you won't use the cached PMTU for input routes.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/4] ipv4: Fix inetpeer expiration handling
  2011-10-20  6:24   ` Gao feng
@ 2011-11-08 19:38     ` David Miller
  2011-11-09 12:47     ` Steffen Klassert
  1 sibling, 0 replies; 30+ messages in thread
From: David Miller @ 2011-11-08 19:38 UTC (permalink / raw)
  To: gaofeng; +Cc: steffen.klassert, netdev

From: Gao feng <gaofeng@cn.fujitsu.com>
Date: Thu, 20 Oct 2011 14:24:33 +0800

> there are serval problem.
> 1:rt->peer maybe null,we should call rt_bind_peer just like the code below.
> 2:rt->peer_pmtu_orig is null. if we hasn't send packet before,the func check_peer_pmtu hasn't be called.
>   so the peer->pmtu_orig is null.
> 3:when rt->rt_peer_genid != rt_peer_genid(), we has no need to do dst_metric_set(dst, RTAX_MTU, rt->peer->pmtu_orig),
>   because check_peer_pmtu will do this.
> 
> what about this patch?

In the case where no peer was created and we use default metrics and
other settings (can be common for UDP) I don't think it's wise to make
an inetpeer lookup every time we check the dst.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net 0/4] ipv4: various pmtu discovery fixes
  2011-10-11 19:54 ` [PATCH net 0/4] ipv4: various pmtu discovery fixes David Miller
@ 2011-11-08 19:41   ` David Miller
  0 siblings, 0 replies; 30+ messages in thread
From: David Miller @ 2011-11-08 19:41 UTC (permalink / raw)
  To: steffen.klassert; +Cc: netdev


Patch 4/4 looked fine so I applied that one as-is, thanks!

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/4] ipv4: Fix pmtu propagating
  2011-11-08 19:33                   ` David Miller
@ 2011-11-09 12:08                     ` Steffen Klassert
  2011-11-21  7:56                       ` Steffen Klassert
  0 siblings, 1 reply; 30+ messages in thread
From: Steffen Klassert @ 2011-11-09 12:08 UTC (permalink / raw)
  To: David Miller; +Cc: gaofeng, netdev

On Tue, Nov 08, 2011 at 02:33:02PM -0500, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Tue, 08 Nov 2011 14:19:50 -0500 (EST)
> 
> > I suspect that your real problem has nothing to do with UDP or RAW,
> > but rather the issue is that entries already in the routing cache
> > with a NULL peer need to be refreshed with peer information created
> > in another context.

Yes, that's the problem.

> 
> So you want something like this patch:
> 

Originally, I wanted to fix it with the patch below.

Given the fact that dst->obsolete is not null, this should
do the same like your patch for output routes. During the
tests with this patch I noticed a problem with that.
Unfortunately I can't remember what it was...

I'll do some investigating, perhaps I can get it back to my mind.

I did some quick tests with this and with your patch and both
seem to fix the problem at the first glance.

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 511f4a7..ac189c9 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2723,7 +2723,7 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *flp4)
 		    !((rth->rt_key_tos ^ flp4->flowi4_tos) &
 			    (IPTOS_RT_MASK | RTO_ONLINK)) &&
 		    net_eq(dev_net(rth->dst.dev), net) &&
-		    !rt_is_expired(rth)) {
+		    (rth = (struct rtable *) dst_check(&rth->dst, 0)) && rth) {
 			dst_use(&rth->dst, jiffies);
 			RT_CACHE_STAT_INC(out_hit);
 			rcu_read_unlock_bh();
-- 
1.7.0.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/4] ipv4: Update pmtu informations on inetpeer only for output routes
  2011-11-08 19:36     ` David Miller
@ 2011-11-09 12:11       ` Steffen Klassert
  2011-11-14 10:12       ` Steffen Klassert
  1 sibling, 0 replies; 30+ messages in thread
From: Steffen Klassert @ 2011-11-09 12:11 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

On Tue, Nov 08, 2011 at 02:36:30PM -0500, David Miller wrote:
> 
> What I think you can do to solve this problem is explicitly use
> dst->ops->default_mtu() in ip_forward() instead of dst_mtu().
> 
> That way you won't use the cached PMTU for input routes.

Yes, I already had something like that in mind. I'll do
a patch to fix it in this manner.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/4] ipv4: Fix inetpeer expiration handling
  2011-10-20  6:24   ` Gao feng
  2011-11-08 19:38     ` David Miller
@ 2011-11-09 12:47     ` Steffen Klassert
  1 sibling, 0 replies; 30+ messages in thread
From: Steffen Klassert @ 2011-11-09 12:47 UTC (permalink / raw)
  To: Gao feng; +Cc: David Miller, netdev

On Thu, Oct 20, 2011 at 02:24:33PM +0800, Gao feng wrote:
> 
> there are serval problem.
> 1:rt->peer maybe null,we should call rt_bind_peer just like the code below.

If rt->peer is NULL, rt_bind_peer() sets rt->rt_peer_genid = rt_peer_genid().
So your check for rt->rt_peer_genid != rt_peer_genid() is false then and
creates cases where an unchecked peer is bound to a route.

> 2:rt->peer_pmtu_orig is null. if we hasn't send packet before,the func check_peer_pmtu hasn't be called.
>   so the peer->pmtu_orig is null.

If a peer is bound to a route during slow path route lookup, the peer
should be properly initialized with rt_init_metrics(). So
rt->peer_pmtu_orig should not be null here as far as I can see.

I still think that my original patch fixes the problem.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/4] ipv4: Update pmtu informations on inetpeer only for output routes
  2011-11-08 19:36     ` David Miller
  2011-11-09 12:11       ` Steffen Klassert
@ 2011-11-14 10:12       ` Steffen Klassert
  2011-11-14 19:33         ` David Miller
  1 sibling, 1 reply; 30+ messages in thread
From: Steffen Klassert @ 2011-11-14 10:12 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

On Tue, Nov 08, 2011 at 02:36:30PM -0500, David Miller wrote:
> 
> What I think you can do to solve this problem is explicitly use
> dst->ops->default_mtu() in ip_forward() instead of dst_mtu().
> 

Unfortunately, that's not sufficient. We need to do an input route
check in ip_skb_dst_mtu() at least to get plain ipv4 to work,
IPsec is still broken then. So I assume most (all?) callers of
dst_mtu() don't want to get the cached PMTU for input routes on ipv4.

So for the moment I'm thinking about adding an ip_dst_mtu()
function that returns dst->ops->default_mtu() for input routes
and dst_mtu() for output routes. Then we could convert the
dst_mtu() users in net/ipv4/ over to this one.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/4] ipv4: Update pmtu informations on inetpeer only for output routes
  2011-11-14 10:12       ` Steffen Klassert
@ 2011-11-14 19:33         ` David Miller
  2011-11-15 10:00           ` Steffen Klassert
  2011-11-22 13:20           ` Steffen Klassert
  0 siblings, 2 replies; 30+ messages in thread
From: David Miller @ 2011-11-14 19:33 UTC (permalink / raw)
  To: steffen.klassert; +Cc: netdev

From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Mon, 14 Nov 2011 11:12:44 +0100

> So for the moment I'm thinking about adding an ip_dst_mtu()
> function that returns dst->ops->default_mtu() for input routes
> and dst_mtu() for output routes. Then we could convert the
> dst_mtu() users in net/ipv4/ over to this one.

We'll need something similar for ipv6 eventually...

I would suggest that we do away with dst_ops->default_mtu() and just
have dst_ops->mtu() which gets invoked unconditionally by dst_mtu().

You can integrate the ->default_mtu() handling and the input route
check into this new method.  Then IPv6 can be fixed in a
straightforward manner later.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/4] ipv4: Update pmtu informations on inetpeer only for output routes
  2011-11-14 19:33         ` David Miller
@ 2011-11-15 10:00           ` Steffen Klassert
  2011-11-22 13:20           ` Steffen Klassert
  1 sibling, 0 replies; 30+ messages in thread
From: Steffen Klassert @ 2011-11-15 10:00 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

On Mon, Nov 14, 2011 at 02:33:20PM -0500, David Miller wrote:
> From: Steffen Klassert <steffen.klassert@secunet.com>
> Date: Mon, 14 Nov 2011 11:12:44 +0100
> 
> > So for the moment I'm thinking about adding an ip_dst_mtu()
> > function that returns dst->ops->default_mtu() for input routes
> > and dst_mtu() for output routes. Then we could convert the
> > dst_mtu() users in net/ipv4/ over to this one.
> 
> We'll need something similar for ipv6 eventually...

Well, probaply. I did not test with ipv6 yet, so I focused to
fix the ipv4 part in the first place.

> 
> I would suggest that we do away with dst_ops->default_mtu() and just
> have dst_ops->mtu() which gets invoked unconditionally by dst_mtu().
> 
> You can integrate the ->default_mtu() handling and the input route
> check into this new method.  Then IPv6 can be fixed in a
> straightforward manner later.

Ok, I'll look into this. 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/4] ipv4: Fix pmtu propagating
  2011-11-09 12:08                     ` Steffen Klassert
@ 2011-11-21  7:56                       ` Steffen Klassert
  2011-12-01 18:40                         ` David Miller
  0 siblings, 1 reply; 30+ messages in thread
From: Steffen Klassert @ 2011-11-21  7:56 UTC (permalink / raw)
  To: David Miller; +Cc: gaofeng, netdev

On Wed, Nov 09, 2011 at 01:08:51PM +0100, Steffen Klassert wrote:
> On Tue, Nov 08, 2011 at 02:33:02PM -0500, David Miller wrote:
> > From: David Miller <davem@davemloft.net>
> > Date: Tue, 08 Nov 2011 14:19:50 -0500 (EST)
> > 
> > > I suspect that your real problem has nothing to do with UDP or RAW,
> > > but rather the issue is that entries already in the routing cache
> > > with a NULL peer need to be refreshed with peer information created
> > > in another context.
> 
> Yes, that's the problem.
> 
> > 
> > So you want something like this patch:
> > 
> 
> Originally, I wanted to fix it with the patch below.
> 
> Given the fact that dst->obsolete is not null, this should
> do the same like your patch for output routes. During the
> tests with this patch I noticed a problem with that.
> Unfortunately I can't remember what it was...
> 
> I'll do some investigating, perhaps I can get it back to my mind.
> 
> I did some quick tests with this and with your patch and both
> seem to fix the problem at the first glance.
> 

I still don't see any problems with both of the patches.
So I assume that both patches would fix the issue. However, 
your patch is probaply less fragile as this does not depend
on a certain value of dst->obsolete.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/4] ipv4: Update pmtu informations on inetpeer only for output routes
  2011-11-14 19:33         ` David Miller
  2011-11-15 10:00           ` Steffen Klassert
@ 2011-11-22 13:20           ` Steffen Klassert
  1 sibling, 0 replies; 30+ messages in thread
From: Steffen Klassert @ 2011-11-22 13:20 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

On Mon, Nov 14, 2011 at 02:33:20PM -0500, David Miller wrote:
> From: Steffen Klassert <steffen.klassert@secunet.com>
> Date: Mon, 14 Nov 2011 11:12:44 +0100
> 
> > So for the moment I'm thinking about adding an ip_dst_mtu()
> > function that returns dst->ops->default_mtu() for input routes
> > and dst_mtu() for output routes. Then we could convert the
> > dst_mtu() users in net/ipv4/ over to this one.
> 
> We'll need something similar for ipv6 eventually...
> 
> I would suggest that we do away with dst_ops->default_mtu() and just
> have dst_ops->mtu() which gets invoked unconditionally by dst_mtu().
> 

I found another pmtu related issue. Since commit 2774c131b
(xfrm: Handle blackhole route creation via afinfo)
we create a blackhole route even on packet forwarding
if we have a xfrm policy but we don't have yet the states.

In this case, the packet is not dropped immediately
but continues to travel in the packet forwarding path.
This means that the blackhole route's dst_ops->default_mtu()
method is invoked which returns a mtu of null. So
we announce a pmtu of null to the original sender of
the packet.

The simplest fix would be to return e.g. IP_MAX_MTU
as the default mtu on blackhole routes. But actually
I don't see why we should create a blackhole route on
packet forwarding as long as we finally drop the
packet anyway. So perhaps it is better to create
blackhole routes just in the case when we have socket
context.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/4] ipv4: Fix pmtu propagating
  2011-11-21  7:56                       ` Steffen Klassert
@ 2011-12-01 18:40                         ` David Miller
  0 siblings, 0 replies; 30+ messages in thread
From: David Miller @ 2011-12-01 18:40 UTC (permalink / raw)
  To: steffen.klassert; +Cc: gaofeng, netdev

From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Mon, 21 Nov 2011 08:56:21 +0100

> I still don't see any problems with both of the patches.
> So I assume that both patches would fix the issue. However, 
> your patch is probaply less fragile as this does not depend
> on a certain value of dst->obsolete.

Great, thanks for checking it out.

I'll add the following to the net tree and queue it up for -stable
too.

--------------------
ipv4: Perform peer validation on cached route lookup.

Otherwise we won't notice the peer GENID change.

Reported-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv4/route.c |   26 +++++++++++++++++++-------
 1 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 57e01bc..ca5e237 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1693,12 +1693,8 @@ static void ip_rt_update_pmtu(struct dst_entry *dst, u32 mtu)
 }
 
 
-static struct dst_entry *ipv4_dst_check(struct dst_entry *dst, u32 cookie)
+static struct rtable *ipv4_validate_peer(struct rtable *rt)
 {
-	struct rtable *rt = (struct rtable *) dst;
-
-	if (rt_is_expired(rt))
-		return NULL;
 	if (rt->rt_peer_genid != rt_peer_genid()) {
 		struct inet_peer *peer;
 
@@ -1707,19 +1703,29 @@ static struct dst_entry *ipv4_dst_check(struct dst_entry *dst, u32 cookie)
 
 		peer = rt->peer;
 		if (peer) {
-			check_peer_pmtu(dst, peer);
+			check_peer_pmtu(&rt->dst, peer);
 
 			if (peer->redirect_genid != redirect_genid)
 				peer->redirect_learned.a4 = 0;
 			if (peer->redirect_learned.a4 &&
 			    peer->redirect_learned.a4 != rt->rt_gateway) {
-				if (check_peer_redir(dst, peer))
+				if (check_peer_redir(&rt->dst, peer))
 					return NULL;
 			}
 		}
 
 		rt->rt_peer_genid = rt_peer_genid();
 	}
+	return rt;
+}
+
+static struct dst_entry *ipv4_dst_check(struct dst_entry *dst, u32 cookie)
+{
+	struct rtable *rt = (struct rtable *) dst;
+
+	if (rt_is_expired(rt))
+		return NULL;
+	dst = (struct dst_entry *) ipv4_validate_peer(rt);
 	return dst;
 }
 
@@ -2374,6 +2380,9 @@ int ip_route_input_common(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 		    rth->rt_mark == skb->mark &&
 		    net_eq(dev_net(rth->dst.dev), net) &&
 		    !rt_is_expired(rth)) {
+			rth = ipv4_validate_peer(rth);
+			if (!rth)
+				continue;
 			if (noref) {
 				dst_use_noref(&rth->dst, jiffies);
 				skb_dst_set_noref(skb, &rth->dst);
@@ -2749,6 +2758,9 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *flp4)
 			    (IPTOS_RT_MASK | RTO_ONLINK)) &&
 		    net_eq(dev_net(rth->dst.dev), net) &&
 		    !rt_is_expired(rth)) {
+			rth = ipv4_validate_peer(rth);
+			if (!rth)
+				continue;
 			dst_use(&rth->dst, jiffies);
 			RT_CACHE_STAT_INC(out_hit);
 			rcu_read_unlock_bh();
-- 
1.7.7.3

^ permalink raw reply related	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2011-12-01 18:41 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-10-11 11:08 [PATCH net 0/4] ipv4: various pmtu discovery fixes Steffen Klassert
2011-10-11 11:09 ` [PATCH 1/4] ipv4: Fix pmtu propagating Steffen Klassert
2011-10-12 21:02   ` David Miller
2011-10-13 10:09     ` Steffen Klassert
2011-10-13 17:58       ` David Miller
2011-10-14  5:54         ` Steffen Klassert
2011-10-17 12:18           ` Steffen Klassert
2011-10-19  9:07             ` Gao feng
2011-10-19 19:32               ` David Miller
2011-11-08 19:19                 ` David Miller
2011-11-08 19:33                   ` David Miller
2011-11-09 12:08                     ` Steffen Klassert
2011-11-21  7:56                       ` Steffen Klassert
2011-12-01 18:40                         ` David Miller
2011-10-11 11:10 ` [PATCH 2/4] ipv4: Update pmtu informations on inetpeer only for output routes Steffen Klassert
2011-10-12 21:08   ` David Miller
2011-10-14  6:34     ` Steffen Klassert
2011-11-08 19:36     ` David Miller
2011-11-09 12:11       ` Steffen Klassert
2011-11-14 10:12       ` Steffen Klassert
2011-11-14 19:33         ` David Miller
2011-11-15 10:00           ` Steffen Klassert
2011-11-22 13:20           ` Steffen Klassert
2011-10-11 11:11 ` [PATCH 3/4] ipv4: Fix inetpeer expiration handling Steffen Klassert
2011-10-20  6:24   ` Gao feng
2011-11-08 19:38     ` David Miller
2011-11-09 12:47     ` Steffen Klassert
2011-10-11 11:12 ` [PATCH 4/4] ipv4: Fix inetpeer expire time information Steffen Klassert
2011-10-11 19:54 ` [PATCH net 0/4] ipv4: various pmtu discovery fixes David Miller
2011-11-08 19:41   ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).