All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] fix NULL pointer + success return in route lookup path
@ 2009-06-19 17:18 Neil Horman
  2009-06-20  8:15 ` David Miller
  2009-06-20 12:37 ` Jarek Poplawski
  0 siblings, 2 replies; 19+ messages in thread
From: Neil Horman @ 2009-06-19 17:18 UTC (permalink / raw)
  To: netdev; +Cc: mbizon, dada1, kuznet, davem, pekkas, jmorris, yoshfuji, nhorman

Don't drop route if we're not caching	

	I recently got a report of an oops on a route lookup.  Maxime was
testing what would happen if route caching was turned off (doing so by setting
making rt_caching always return 0), and found that it triggered an oops.  I
looked at it and found that the problem stemmed from the fact that the route
lookup routines were returning success from their lookup paths (which is good),
but never set the **rp pointer to anything (which is bad).  This happens because
in rt_intern_hash, if rt_caching returns false, we call rt_drop and return 0.
This almost emulates slient success.  What we should be doing is assigning *rp =
rt and _not_ dropping the route.  This way, during slow path lookups, when we
create a new route cache entry, we don't immediately discard it, rather we just
don't add it into the cache hash table, but we let this one lookup use it for
the purpose of this route request.  Maxime has tested and reports it prevents
the oops.  There is still a subsequent routing issue that I'm looking into
further, but I'm confident that, even if its related to this same path, this
patch makes sense to take.

Regards
Neil
    
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>


 route.c |   14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index cd76b3c..65b3a8b 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1085,8 +1085,16 @@ restart:
 	now = jiffies;
 
 	if (!rt_caching(dev_net(rt->u.dst.dev))) {
-		rt_drop(rt);
-		return 0;
+		/*
+		 * If we're not caching, just tell the caller we
+		 * were successful and don't touch the route.  The
+		 * caller hold the sole reference to the cache entry, and
+		 * it will be released when the caller is done with it.
+		 * If we drop it here, the callers have no way to resolve routes
+		 * when we're not caching.  Instead, just point *rp at rt, so
+		 * the caller gets a single use out of the route
+		 */
+		goto report_and_exit;
 	}
 
 	rthp = &rt_hash_table[hash].chain;
@@ -1217,6 +1225,8 @@ restart:
 	rcu_assign_pointer(rt_hash_table[hash].chain, rt);
 
 	spin_unlock_bh(rt_hash_lock_addr(hash));
+
+report_and_exit:
 	if (rp)
 		*rp = rt;
 	else

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] fix NULL pointer + success return in route lookup path
  2009-06-19 17:18 [PATCH] fix NULL pointer + success return in route lookup path Neil Horman
@ 2009-06-20  8:15 ` David Miller
  2009-06-20 12:37 ` Jarek Poplawski
  1 sibling, 0 replies; 19+ messages in thread
From: David Miller @ 2009-06-20  8:15 UTC (permalink / raw)
  To: nhorman; +Cc: netdev, mbizon, dada1, kuznet, pekkas, jmorris, yoshfuji

From: Neil Horman <nhorman@tuxdriver.com>
Date: Fri, 19 Jun 2009 13:18:14 -0400

> Don't drop route if we're not caching	
> 
> 	I recently got a report of an oops on a route lookup.  Maxime was
> testing what would happen if route caching was turned off (doing so by setting
> making rt_caching always return 0), and found that it triggered an oops.  I
> looked at it and found that the problem stemmed from the fact that the route
> lookup routines were returning success from their lookup paths (which is good),
> but never set the **rp pointer to anything (which is bad).  This happens because
> in rt_intern_hash, if rt_caching returns false, we call rt_drop and return 0.
> This almost emulates slient success.  What we should be doing is assigning *rp =
> rt and _not_ dropping the route.  This way, during slow path lookups, when we
> create a new route cache entry, we don't immediately discard it, rather we just
> don't add it into the cache hash table, but we let this one lookup use it for
> the purpose of this route request.  Maxime has tested and reports it prevents
> the oops.  There is still a subsequent routing issue that I'm looking into
> further, but I'm confident that, even if its related to this same path, this
> patch makes sense to take.
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] fix NULL pointer + success return in route lookup path
  2009-06-19 17:18 [PATCH] fix NULL pointer + success return in route lookup path Neil Horman
  2009-06-20  8:15 ` David Miller
@ 2009-06-20 12:37 ` Jarek Poplawski
  2009-06-20 16:39   ` Jarek Poplawski
  2009-06-20 16:44   ` Neil Horman
  1 sibling, 2 replies; 19+ messages in thread
From: Jarek Poplawski @ 2009-06-20 12:37 UTC (permalink / raw)
  To: Neil Horman
  Cc: netdev, mbizon, dada1, kuznet, davem, pekkas, jmorris, yoshfuji

Neil Horman wrote, On 06/19/2009 07:18 PM:

> Don't drop route if we're not caching	
> 
> 	I recently got a report of an oops on a route lookup.  Maxime was
> testing what would happen if route caching was turned off (doing so by setting
> making rt_caching always return 0), and found that it triggered an oops.  I
> looked at it and found that the problem stemmed from the fact that the route
> lookup routines were returning success from their lookup paths (which is good),
> but never set the **rp pointer to anything (which is bad).  This happens because
> in rt_intern_hash, if rt_caching returns false, we call rt_drop and return 0.
> This almost emulates slient success.  What we should be doing is assigning *rp =
> rt and _not_ dropping the route.  This way, during slow path lookups, when we
> create a new route cache entry, we don't immediately discard it, rather we just
> don't add it into the cache hash table, but we let this one lookup use it for
> the purpose of this route request.  Maxime has tested and reports it prevents
> the oops.

Hmm... So, IOW, do you mean the same Maxime, by whom it was "Reported-by" and
"Tested-by", and probably anonymous on the Cc list, or I miss something?

Regards,
Jarek P.

> There is still a subsequent routing issue that I'm looking into
> further, but I'm confident that, even if its related to this same path, this
> patch makes sense to take.
> 
> Regards
> Neil
>     
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> 
> 
>  route.c |   14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index cd76b3c..65b3a8b 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -1085,8 +1085,16 @@ restart:
>  	now = jiffies;
>  
>  	if (!rt_caching(dev_net(rt->u.dst.dev))) {
> -		rt_drop(rt);
> -		return 0;
> +		/*
> +		 * If we're not caching, just tell the caller we
> +		 * were successful and don't touch the route.  The
> +		 * caller hold the sole reference to the cache entry, and
> +		 * it will be released when the caller is done with it.
> +		 * If we drop it here, the callers have no way to resolve routes
> +		 * when we're not caching.  Instead, just point *rp at rt, so
> +		 * the caller gets a single use out of the route
> +		 */
> +		goto report_and_exit;
>  	}
>  
>  	rthp = &rt_hash_table[hash].chain;
> @@ -1217,6 +1225,8 @@ restart:
>  	rcu_assign_pointer(rt_hash_table[hash].chain, rt);
>  
>  	spin_unlock_bh(rt_hash_lock_addr(hash));
> +
> +report_and_exit:
>  	if (rp)
>  		*rp = rt;
>  	else
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] fix NULL pointer + success return in route lookup path
  2009-06-20 12:37 ` Jarek Poplawski
@ 2009-06-20 16:39   ` Jarek Poplawski
  2009-06-20 17:11     ` Neil Horman
  2009-06-20 23:47     ` David Miller
  2009-06-20 16:44   ` Neil Horman
  1 sibling, 2 replies; 19+ messages in thread
From: Jarek Poplawski @ 2009-06-20 16:39 UTC (permalink / raw)
  To: Neil Horman
  Cc: netdev, mbizon, dada1, kuznet, davem, pekkas, jmorris, yoshfuji

Jarek Poplawski wrote, On 06/20/2009 02:37 PM:

> Neil Horman wrote, On 06/19/2009 07:18 PM:
> 
>> Don't drop route if we're not caching	

...

>>  route.c |   14 ++++++++++++--
>>  1 file changed, 12 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
>> index cd76b3c..65b3a8b 100644
>> --- a/net/ipv4/route.c
>> +++ b/net/ipv4/route.c
>> @@ -1085,8 +1085,16 @@ restart:
>>  	now = jiffies;
>>  
>>  	if (!rt_caching(dev_net(rt->u.dst.dev))) {
>> -		rt_drop(rt);


One more question: if this rt is assigned to an skb, there is only
skb_dst_drop() in kfree_skb(), but it seems we skip rt_free() part,
or I miss something?

Jarek P.

>> -		return 0;
>> +		/*
>> +		 * If we're not caching, just tell the caller we
>> +		 * were successful and don't touch the route.  The
>> +		 * caller hold the sole reference to the cache entry, and
>> +		 * it will be released when the caller is done with it.
>> +		 * If we drop it here, the callers have no way to resolve routes
>> +		 * when we're not caching.  Instead, just point *rp at rt, so
>> +		 * the caller gets a single use out of the route
>> +		 */
>> +		goto report_and_exit;
>>  	}
>>  
>>  	rthp = &rt_hash_table[hash].chain;
>> @@ -1217,6 +1225,8 @@ restart:
>>  	rcu_assign_pointer(rt_hash_table[hash].chain, rt);
>>  
>>  	spin_unlock_bh(rt_hash_lock_addr(hash));
>> +
>> +report_and_exit:
>>  	if (rp)
>>  		*rp = rt;
>>  	else



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] fix NULL pointer + success return in route lookup path
  2009-06-20 12:37 ` Jarek Poplawski
  2009-06-20 16:39   ` Jarek Poplawski
@ 2009-06-20 16:44   ` Neil Horman
  1 sibling, 0 replies; 19+ messages in thread
From: Neil Horman @ 2009-06-20 16:44 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: netdev, mbizon, dada1, kuznet, davem, pekkas, jmorris, yoshfuji

On Sat, Jun 20, 2009 at 02:37:00PM +0200, Jarek Poplawski wrote:
> Neil Horman wrote, On 06/19/2009 07:18 PM:
> 
> > Don't drop route if we're not caching	
> > 
> > 	I recently got a report of an oops on a route lookup.  Maxime was
> > testing what would happen if route caching was turned off (doing so by setting
> > making rt_caching always return 0), and found that it triggered an oops.  I
> > looked at it and found that the problem stemmed from the fact that the route
> > lookup routines were returning success from their lookup paths (which is good),
> > but never set the **rp pointer to anything (which is bad).  This happens because
> > in rt_intern_hash, if rt_caching returns false, we call rt_drop and return 0.
> > This almost emulates slient success.  What we should be doing is assigning *rp =
> > rt and _not_ dropping the route.  This way, during slow path lookups, when we
> > create a new route cache entry, we don't immediately discard it, rather we just
> > don't add it into the cache hash table, but we let this one lookup use it for
> > the purpose of this route request.  Maxime has tested and reports it prevents
> > the oops.
> 
> Hmm... So, IOW, do you mean the same Maxime, by whom it was "Reported-by" and
> "Tested-by", and probably anonymous on the Cc list, or I miss something?
> 
Yes, they are all one in the same person, I honestly had not thought of the
Reported-by tag in my email, apologies.  I had asked Maxime to follow up on the
list to add the tag, but that never seems to have happened.
Neil

> Regards,
> Jarek P.
> 
> > There is still a subsequent routing issue that I'm looking into
> > further, but I'm confident that, even if its related to this same path, this
> > patch makes sense to take.
> > 
> > Regards
> > Neil
> >     
> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > 
> > 
> >  route.c |   14 ++++++++++++--
> >  1 file changed, 12 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> > index cd76b3c..65b3a8b 100644
> > --- a/net/ipv4/route.c
> > +++ b/net/ipv4/route.c
> > @@ -1085,8 +1085,16 @@ restart:
> >  	now = jiffies;
> >  
> >  	if (!rt_caching(dev_net(rt->u.dst.dev))) {
> > -		rt_drop(rt);
> > -		return 0;
> > +		/*
> > +		 * If we're not caching, just tell the caller we
> > +		 * were successful and don't touch the route.  The
> > +		 * caller hold the sole reference to the cache entry, and
> > +		 * it will be released when the caller is done with it.
> > +		 * If we drop it here, the callers have no way to resolve routes
> > +		 * when we're not caching.  Instead, just point *rp at rt, so
> > +		 * the caller gets a single use out of the route
> > +		 */
> > +		goto report_and_exit;
> >  	}
> >  
> >  	rthp = &rt_hash_table[hash].chain;
> > @@ -1217,6 +1225,8 @@ restart:
> >  	rcu_assign_pointer(rt_hash_table[hash].chain, rt);
> >  
> >  	spin_unlock_bh(rt_hash_lock_addr(hash));
> > +
> > +report_and_exit:
> >  	if (rp)
> >  		*rp = rt;
> >  	else
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] fix NULL pointer + success return in route lookup path
  2009-06-20 16:39   ` Jarek Poplawski
@ 2009-06-20 17:11     ` Neil Horman
  2009-06-20 17:23       ` Jarek Poplawski
  2009-06-20 23:47     ` David Miller
  1 sibling, 1 reply; 19+ messages in thread
From: Neil Horman @ 2009-06-20 17:11 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: netdev, mbizon, dada1, kuznet, davem, pekkas, jmorris, yoshfuji

On Sat, Jun 20, 2009 at 06:39:25PM +0200, Jarek Poplawski wrote:
> Jarek Poplawski wrote, On 06/20/2009 02:37 PM:
> 
> > Neil Horman wrote, On 06/19/2009 07:18 PM:
> > 
> >> Don't drop route if we're not caching	
> 
> ...
> 
> >>  route.c |   14 ++++++++++++--
> >>  1 file changed, 12 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> >> index cd76b3c..65b3a8b 100644
> >> --- a/net/ipv4/route.c
> >> +++ b/net/ipv4/route.c
> >> @@ -1085,8 +1085,16 @@ restart:
> >>  	now = jiffies;
> >>  
> >>  	if (!rt_caching(dev_net(rt->u.dst.dev))) {
> >> -		rt_drop(rt);
> 
> 
> One more question: if this rt is assigned to an skb, there is only
> skb_dst_drop() in kfree_skb(), but it seems we skip rt_free() part,
> or I miss something?
> 
no, you're right.  That actually seems to be more general problem, independent
of the one I just fixed.  As I understand it, skb_dst_drop is intended to
release the reference on a route cache entry that it grabbed previously, but
specifically avoids going throught the free path that rt_drop implements
(ostensibly so that the garbage collector will clean it up later from the route
cache).  With the recent work that lets us disable the garbage collector
entirely, and simply not cache, this path seems like it will leak routes if an
skb holds the last reference to a route cache entry.  What we likely need to do
is enhance rt_caching to return true in the event that the garbage collector is
off or if the cache has been attacked (rebuilt too many times).  Then we need to
export that function and check it in skb_dst_drop, calling call_rcu_bh to free
the dst entry if rt_caching is false, and the dst entries refcnt is zero (or
something like that).  I'll look into this further and address it in a separate
thread.  Thanks, and good catch!
Neil

> Jarek P.
> 
> >> -		return 0;
> >> +		/*
> >> +		 * If we're not caching, just tell the caller we
> >> +		 * were successful and don't touch the route.  The
> >> +		 * caller hold the sole reference to the cache entry, and
> >> +		 * it will be released when the caller is done with it.
> >> +		 * If we drop it here, the callers have no way to resolve routes
> >> +		 * when we're not caching.  Instead, just point *rp at rt, so
> >> +		 * the caller gets a single use out of the route
> >> +		 */
> >> +		goto report_and_exit;
> >>  	}
> >>  
> >>  	rthp = &rt_hash_table[hash].chain;
> >> @@ -1217,6 +1225,8 @@ restart:
> >>  	rcu_assign_pointer(rt_hash_table[hash].chain, rt);
> >>  
> >>  	spin_unlock_bh(rt_hash_lock_addr(hash));
> >> +
> >> +report_and_exit:
> >>  	if (rp)
> >>  		*rp = rt;
> >>  	else
> 
> 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] fix NULL pointer + success return in route lookup path
  2009-06-20 17:11     ` Neil Horman
@ 2009-06-20 17:23       ` Jarek Poplawski
  0 siblings, 0 replies; 19+ messages in thread
From: Jarek Poplawski @ 2009-06-20 17:23 UTC (permalink / raw)
  To: Neil Horman
  Cc: netdev, mbizon, dada1, kuznet, davem, pekkas, jmorris, yoshfuji

On Sat, Jun 20, 2009 at 01:11:16PM -0400, Neil Horman wrote:
...
> something like that).  I'll look into this further and address it in a separate
> thread.  

I guess you'll have a chance to complete Maxime credits btw...;-)

Jarek P.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] fix NULL pointer + success return in route lookup path
  2009-06-20 16:39   ` Jarek Poplawski
  2009-06-20 17:11     ` Neil Horman
@ 2009-06-20 23:47     ` David Miller
  2009-06-21 17:11       ` Neil Horman
  1 sibling, 1 reply; 19+ messages in thread
From: David Miller @ 2009-06-20 23:47 UTC (permalink / raw)
  To: jarkao2; +Cc: nhorman, netdev, mbizon, dada1, kuznet, pekkas, jmorris, yoshfuji

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Sat, 20 Jun 2009 18:39:25 +0200

> Jarek Poplawski wrote, On 06/20/2009 02:37 PM:
> 
>> Neil Horman wrote, On 06/19/2009 07:18 PM:
>> 
>>> Don't drop route if we're not caching	
> 
> ...
> 
>>>  route.c |   14 ++++++++++++--
>>>  1 file changed, 12 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
>>> index cd76b3c..65b3a8b 100644
>>> --- a/net/ipv4/route.c
>>> +++ b/net/ipv4/route.c
>>> @@ -1085,8 +1085,16 @@ restart:
>>>  	now = jiffies;
>>>  
>>>  	if (!rt_caching(dev_net(rt->u.dst.dev))) {
>>> -		rt_drop(rt);
> 
> 
> One more question: if this rt is assigned to an skb, there is only
> skb_dst_drop() in kfree_skb(), but it seems we skip rt_free() part,
> or I miss something?

This whole code path was buggy, if it returns success it should
do as the normal success code path does which is assign for
non-SKB case to *rp, or skb_dst_set().

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] fix NULL pointer + success return in route lookup path
  2009-06-20 23:47     ` David Miller
@ 2009-06-21 17:11       ` Neil Horman
  2009-06-22  5:43         ` Jarek Poplawski
  0 siblings, 1 reply; 19+ messages in thread
From: Neil Horman @ 2009-06-21 17:11 UTC (permalink / raw)
  To: David Miller
  Cc: jarkao2, netdev, mbizon, dada1, kuznet, pekkas, jmorris, yoshfuji

On Sat, Jun 20, 2009 at 04:47:48PM -0700, David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Sat, 20 Jun 2009 18:39:25 +0200
> 
> > Jarek Poplawski wrote, On 06/20/2009 02:37 PM:
> > 
> >> Neil Horman wrote, On 06/19/2009 07:18 PM:
> >> 
> >>> Don't drop route if we're not caching	
> > 
> > ...
> > 
> >>>  route.c |   14 ++++++++++++--
> >>>  1 file changed, 12 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> >>> index cd76b3c..65b3a8b 100644
> >>> --- a/net/ipv4/route.c
> >>> +++ b/net/ipv4/route.c
> >>> @@ -1085,8 +1085,16 @@ restart:
> >>>  	now = jiffies;
> >>>  
> >>>  	if (!rt_caching(dev_net(rt->u.dst.dev))) {
> >>> -		rt_drop(rt);
> > 
> > 
> > One more question: if this rt is assigned to an skb, there is only
> > skb_dst_drop() in kfree_skb(), but it seems we skip rt_free() part,
> > or I miss something?
> 
> This whole code path was buggy, if it returns success it should
> do as the normal success code path does which is assign for
> non-SKB case to *rp, or skb_dst_set().

So I'm a bit confused.  I see how my patch corrects the path we take through
rt_intern_hash, doing the same thing that we normally do in the success case.
What I don't see is how we clean up those dst entries when we're done with them.
Since their not placed in the route cache (assuming rt_caching returns zero),
then don't we have a leak, since the garbage collector will never see it in the
cache to reap.

Assuming thats the case, I was thinking about closing that leak by setting
DST_NOHASH in rt_intern_hash for any dst_entry that was submitted when
rt_caching returns zero.  Then in skb_dst_drop, we can check for dst->_refcnt
== 0, and if flags & DST_NOHASH is true, then we can call dst_free on it.  Or
does that remove the dst_entry before a caller might be done with it in some
cases?

Thoughts welcome and appreciated.

Thanks!
Neil


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] fix NULL pointer + success return in route lookup path
  2009-06-21 17:11       ` Neil Horman
@ 2009-06-22  5:43         ` Jarek Poplawski
  2009-06-22  8:59           ` Alexey Kuznetsov
  0 siblings, 1 reply; 19+ messages in thread
From: Jarek Poplawski @ 2009-06-22  5:43 UTC (permalink / raw)
  To: Neil Horman
  Cc: David Miller, netdev, mbizon, dada1, kuznet, pekkas, jmorris, yoshfuji

On 21-06-2009 19:11, Neil Horman wrote:
> On Sat, Jun 20, 2009 at 04:47:48PM -0700, David Miller wrote:
>> From: Jarek Poplawski <jarkao2@gmail.com>
>> Date: Sat, 20 Jun 2009 18:39:25 +0200
>>
>>> Jarek Poplawski wrote, On 06/20/2009 02:37 PM:
>>>
>>>> Neil Horman wrote, On 06/19/2009 07:18 PM:
>>>>
>>>>> Don't drop route if we're not caching	
>>> ...
>>>
>>>>>  route.c |   14 ++++++++++++--
>>>>>  1 file changed, 12 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
>>>>> index cd76b3c..65b3a8b 100644
>>>>> --- a/net/ipv4/route.c
>>>>> +++ b/net/ipv4/route.c
>>>>> @@ -1085,8 +1085,16 @@ restart:
>>>>>  	now = jiffies;
>>>>>  
>>>>>  	if (!rt_caching(dev_net(rt->u.dst.dev))) {
>>>>> -		rt_drop(rt);
>>>
>>> One more question: if this rt is assigned to an skb, there is only
>>> skb_dst_drop() in kfree_skb(), but it seems we skip rt_free() part,
>>> or I miss something?
>> This whole code path was buggy, if it returns success it should
>> do as the normal success code path does which is assign for
>> non-SKB case to *rp, or skb_dst_set().
> 
> So I'm a bit confused.  I see how my patch corrects the path we take through
> rt_intern_hash, doing the same thing that we normally do in the success case.
> What I don't see is how we clean up those dst entries when we're done with them.
> Since their not placed in the route cache (assuming rt_caching returns zero),
> then don't we have a leak, since the garbage collector will never see it in the
> cache to reap.
> 
> Assuming thats the case, I was thinking about closing that leak by setting
> DST_NOHASH in rt_intern_hash for any dst_entry that was submitted when
> rt_caching returns zero.  Then in skb_dst_drop, we can check for dst->_refcnt
> == 0, and if flags & DST_NOHASH is true, then we can call dst_free on it.  Or
> does that remove the dst_entry before a caller might be done with it in some
> cases?

Maybe it can work, but it needs a thorough checking now and adds a new
code path to track later while looking for bugs. So, I wonder if it's
not better to link such dsts in rt_intern_hash anyway, probably as a
separate list, scanned only for expired entries.

Thanks,
Jarek P.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] fix NULL pointer + success return in route lookup path
  2009-06-22  5:43         ` Jarek Poplawski
@ 2009-06-22  8:59           ` Alexey Kuznetsov
  2009-06-22  9:42             ` David Miller
  2009-06-22 11:00             ` Jarek Poplawski
  0 siblings, 2 replies; 19+ messages in thread
From: Alexey Kuznetsov @ 2009-06-22  8:59 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Neil Horman, David Miller, netdev, mbizon, dada1, pekkas,
	jmorris, yoshfuji

On Mon, Jun 22, 2009 at 05:43:15AM +0000, Jarek Poplawski wrote:
> Maybe it can work, but it needs a thorough checking now and adds a new
> code path to track later while looking for bugs. So, I wonder if it's
> not better to link such dsts in rt_intern_hash anyway, probably as a
> separate list, scanned only for expired entries.

Such a list already exists, it is gc list in core/dst.c.

The fix to the problem could be replacing rt_drop() with rt_free()
(adding rt_free() after the patch, which deleted rt_drop()), something like:

	if (!rt_caching(dev_net(rt->u.dst.dev))) {
		/* ..... */
+		rt_free(rt);
		goto report_and_exit;
	}

rt_free() will put the route on that gc list and it will be releases
as soon as refcnt becomes 0.

Alexey


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] fix NULL pointer + success return in route lookup path
  2009-06-22  8:59           ` Alexey Kuznetsov
@ 2009-06-22  9:42             ` David Miller
  2009-06-22 10:56               ` Neil Horman
  2009-06-22 11:00             ` Jarek Poplawski
  1 sibling, 1 reply; 19+ messages in thread
From: David Miller @ 2009-06-22  9:42 UTC (permalink / raw)
  To: kuznet; +Cc: jarkao2, nhorman, netdev, mbizon, dada1, pekkas, jmorris, yoshfuji

From: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Date: Mon, 22 Jun 2009 12:59:50 +0400

> On Mon, Jun 22, 2009 at 05:43:15AM +0000, Jarek Poplawski wrote:
>> Maybe it can work, but it needs a thorough checking now and adds a new
>> code path to track later while looking for bugs. So, I wonder if it's
>> not better to link such dsts in rt_intern_hash anyway, probably as a
>> separate list, scanned only for expired entries.
> 
> Such a list already exists, it is gc list in core/dst.c.
> 
> The fix to the problem could be replacing rt_drop() with rt_free()
> (adding rt_free() after the patch, which deleted rt_drop()), something like:
> 
> 	if (!rt_caching(dev_net(rt->u.dst.dev))) {
> 		/* ..... */
> +		rt_free(rt);
> 		goto report_and_exit;
> 	}
> 
> rt_free() will put the route on that gc list and it will be releases
> as soon as refcnt becomes 0.

That should work.

Can someone put together a formal patch and give it at least
a quick test?

Thanks!

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] fix NULL pointer + success return in route lookup path
  2009-06-22  9:42             ` David Miller
@ 2009-06-22 10:56               ` Neil Horman
  0 siblings, 0 replies; 19+ messages in thread
From: Neil Horman @ 2009-06-22 10:56 UTC (permalink / raw)
  To: David Miller
  Cc: kuznet, jarkao2, netdev, mbizon, dada1, pekkas, jmorris, yoshfuji

On Mon, Jun 22, 2009 at 02:42:43AM -0700, David Miller wrote:
> From: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
> Date: Mon, 22 Jun 2009 12:59:50 +0400
> 
> > On Mon, Jun 22, 2009 at 05:43:15AM +0000, Jarek Poplawski wrote:
> >> Maybe it can work, but it needs a thorough checking now and adds a new
> >> code path to track later while looking for bugs. So, I wonder if it's
> >> not better to link such dsts in rt_intern_hash anyway, probably as a
> >> separate list, scanned only for expired entries.
> > 
> > Such a list already exists, it is gc list in core/dst.c.
> > 
> > The fix to the problem could be replacing rt_drop() with rt_free()
> > (adding rt_free() after the patch, which deleted rt_drop()), something like:
> > 
> > 	if (!rt_caching(dev_net(rt->u.dst.dev))) {
> > 		/* ..... */
> > +		rt_free(rt);
> > 		goto report_and_exit;
> > 	}
> > 
> > rt_free() will put the route on that gc list and it will be releases
> > as soon as refcnt becomes 0.
> 
> That should work.
> 
> Can someone put together a formal patch and give it at least
> a quick test?
> 
Yeah, I can handle this, I've found another problem involving arp in this code
that I'm validating.  I'll roll both fixes together.

Neil

> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] fix NULL pointer + success return in route lookup path
  2009-06-22  8:59           ` Alexey Kuznetsov
  2009-06-22  9:42             ` David Miller
@ 2009-06-22 11:00             ` Jarek Poplawski
  2009-06-22 11:08               ` Neil Horman
  2009-06-22 11:29               ` Alexey Kuznetsov
  1 sibling, 2 replies; 19+ messages in thread
From: Jarek Poplawski @ 2009-06-22 11:00 UTC (permalink / raw)
  To: Alexey Kuznetsov
  Cc: Neil Horman, David Miller, netdev, mbizon, dada1, pekkas,
	jmorris, yoshfuji

On Mon, Jun 22, 2009 at 12:59:50PM +0400, Alexey Kuznetsov wrote:
> On Mon, Jun 22, 2009 at 05:43:15AM +0000, Jarek Poplawski wrote:
> > Maybe it can work, but it needs a thorough checking now and adds a new
> > code path to track later while looking for bugs. So, I wonder if it's
> > not better to link such dsts in rt_intern_hash anyway, probably as a
> > separate list, scanned only for expired entries.
> 
> Such a list already exists, it is gc list in core/dst.c.
> 
> The fix to the problem could be replacing rt_drop() with rt_free()
> (adding rt_free() after the patch, which deleted rt_drop()), something like:
> 
> 	if (!rt_caching(dev_net(rt->u.dst.dev))) {
> 		/* ..... */
> +		rt_free(rt);
> 		goto report_and_exit;
> 	}
> 
> rt_free() will put the route on that gc list and it will be releases
> as soon as refcnt becomes 0.

One little doubt would be RCU: if it's currently used in rt_free, and
some code depends on it, there would be a change: rt could be freed
just after atomic dec, without waiting for RCU yet. The second one is
about timing: freeing this always from a workqueue could probably
make a problem if softirqs are often disabled.

Jarek P.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] fix NULL pointer + success return in route lookup path
  2009-06-22 11:00             ` Jarek Poplawski
@ 2009-06-22 11:08               ` Neil Horman
  2009-06-22 12:18                 ` Jarek Poplawski
  2009-06-22 11:29               ` Alexey Kuznetsov
  1 sibling, 1 reply; 19+ messages in thread
From: Neil Horman @ 2009-06-22 11:08 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Alexey Kuznetsov, David Miller, netdev, mbizon, dada1, pekkas,
	jmorris, yoshfuji

On Mon, Jun 22, 2009 at 11:00:23AM +0000, Jarek Poplawski wrote:
> On Mon, Jun 22, 2009 at 12:59:50PM +0400, Alexey Kuznetsov wrote:
> > On Mon, Jun 22, 2009 at 05:43:15AM +0000, Jarek Poplawski wrote:
> > > Maybe it can work, but it needs a thorough checking now and adds a new
> > > code path to track later while looking for bugs. So, I wonder if it's
> > > not better to link such dsts in rt_intern_hash anyway, probably as a
> > > separate list, scanned only for expired entries.
> > 
> > Such a list already exists, it is gc list in core/dst.c.
> > 
> > The fix to the problem could be replacing rt_drop() with rt_free()
> > (adding rt_free() after the patch, which deleted rt_drop()), something like:
> > 
> > 	if (!rt_caching(dev_net(rt->u.dst.dev))) {
> > 		/* ..... */
> > +		rt_free(rt);
> > 		goto report_and_exit;
> > 	}
> > 
> > rt_free() will put the route on that gc list and it will be releases
> > as soon as refcnt becomes 0.
> 
> One little doubt would be RCU: if it's currently used in rt_free, and
> some code depends on it, there would be a change: rt could be freed
> just after atomic dec, without waiting for RCU yet. The second one is

Not sure that I see the concern.  How are we going to call dst_rcu_free without
waiting for a quiesence of the use of this route?  When it enters rt_intern_hash
is use count is one, and if we call rt_free in this case, we are guaranteeing
that this path is the only user (no one else will be able to get a reference to
it as they can't look it up in the cache).  Once the route is dropped at the end
of this route lookup, its use is over and the garbage collector will reap it.

> about timing: freeing this always from a workqueue could probably
> make a problem if softirqs are often disabled.
> 
You mean often, or permanently?  If softirqs are often disabled, thats no big
deal, as long as they are enabled at least sometimes, the garbage colelctor will
run, and we'll be ok.  If softirqs are disabled permanently (or for sufficiently
long periods that we gather huge number of routes waiting to be reaped), I think
we have larger problems on our hands

Neil

> Jarek P.
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] fix NULL pointer + success return in route lookup path
  2009-06-22 11:00             ` Jarek Poplawski
  2009-06-22 11:08               ` Neil Horman
@ 2009-06-22 11:29               ` Alexey Kuznetsov
  2009-06-22 12:04                 ` Jarek Poplawski
  1 sibling, 1 reply; 19+ messages in thread
From: Alexey Kuznetsov @ 2009-06-22 11:29 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Neil Horman, David Miller, netdev, mbizon, dada1, pekkas,
	jmorris, yoshfuji

On Mon, Jun 22, 2009 at 11:00:23AM +0000, Jarek Poplawski wrote:
> One little doubt would be RCU: if it's currently used in rt_free, and
> some code depends on it, there would be a change: rt could be freed
> just after atomic dec, without waiting for RCU yet.

Is this bad 'could', or "would be good 'could'"? :-)

Such route always goes through RCU: nobody can destroy it,
destruction happens only if route is on gc list or when RCU callback
is called and refcnt is already 0.

BTW RCU on such routes is redundant, they never were in hash table.
This means that rt_free() could be safely replaced with straight dst_free()
in this place.


>					 The second one is
> about timing: freeing this always from a workqueue could probably
> make a problem if softirqs are often disabled.

Do you mean dst_gc_task?

Yes, it could be optimized out, route can be destroyed when
refcnt hits zero, but this will require checking refcnt in dst_release().
Old days it was well visible both on code size and performance.
Probably, nowadays it is not a problem anymore.

Alexey

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] fix NULL pointer + success return in route lookup path
  2009-06-22 11:29               ` Alexey Kuznetsov
@ 2009-06-22 12:04                 ` Jarek Poplawski
  0 siblings, 0 replies; 19+ messages in thread
From: Jarek Poplawski @ 2009-06-22 12:04 UTC (permalink / raw)
  To: Alexey Kuznetsov
  Cc: Neil Horman, David Miller, netdev, mbizon, dada1, pekkas,
	jmorris, yoshfuji

On Mon, Jun 22, 2009 at 03:29:56PM +0400, Alexey Kuznetsov wrote:
> On Mon, Jun 22, 2009 at 11:00:23AM +0000, Jarek Poplawski wrote:
> > One little doubt would be RCU: if it's currently used in rt_free, and
> > some code depends on it, there would be a change: rt could be freed
> > just after atomic dec, without waiting for RCU yet.
> 
> Is this bad 'could', or "would be good 'could'"? :-)

Hmm... 'would' could be 'could', I guess... ;-) 

> 
> Such route always goes through RCU: nobody can destroy it,
> destruction happens only if route is on gc list or when RCU callback
> is called and refcnt is already 0.

Righ, but now there is some RCU period after refcount goes 0, not
before.

> 
> BTW RCU on such routes is redundant, they never were in hash table.
> This means that rt_free() could be safely replaced with straight dst_free()
> in this place.

If it's like this then of course there is no problem.

> 
> >					 The second one is
> > about timing: freeing this always from a workqueue could probably
> > make a problem if softirqs are often disabled.
> 
> Do you mean dst_gc_task?
> 
> Yes, it could be optimized out, route can be destroyed when
> refcnt hits zero, but this will require checking refcnt in dst_release().
> Old days it was well visible both on code size and performance.
> Probably, nowadays it is not a problem anymore.

I mean checking refcnts on some list in rt_intern_hash while adding
a new uncached dst.

Jarek P.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] fix NULL pointer + success return in route lookup path
  2009-06-22 11:08               ` Neil Horman
@ 2009-06-22 12:18                 ` Jarek Poplawski
  2009-06-22 15:10                   ` Neil Horman
  0 siblings, 1 reply; 19+ messages in thread
From: Jarek Poplawski @ 2009-06-22 12:18 UTC (permalink / raw)
  To: Neil Horman
  Cc: Alexey Kuznetsov, David Miller, netdev, mbizon, dada1, pekkas,
	jmorris, yoshfuji

On Mon, Jun 22, 2009 at 07:08:19AM -0400, Neil Horman wrote:
> On Mon, Jun 22, 2009 at 11:00:23AM +0000, Jarek Poplawski wrote:
> > On Mon, Jun 22, 2009 at 12:59:50PM +0400, Alexey Kuznetsov wrote:
> > > On Mon, Jun 22, 2009 at 05:43:15AM +0000, Jarek Poplawski wrote:
> > > > Maybe it can work, but it needs a thorough checking now and adds a new
> > > > code path to track later while looking for bugs. So, I wonder if it's
> > > > not better to link such dsts in rt_intern_hash anyway, probably as a
> > > > separate list, scanned only for expired entries.
> > > 
> > > Such a list already exists, it is gc list in core/dst.c.
> > > 
> > > The fix to the problem could be replacing rt_drop() with rt_free()
> > > (adding rt_free() after the patch, which deleted rt_drop()), something like:
> > > 
> > > 	if (!rt_caching(dev_net(rt->u.dst.dev))) {
> > > 		/* ..... */
> > > +		rt_free(rt);
> > > 		goto report_and_exit;
> > > 	}
> > > 
> > > rt_free() will put the route on that gc list and it will be releases
> > > as soon as refcnt becomes 0.
> > 
> > One little doubt would be RCU: if it's currently used in rt_free, and
> > some code depends on it, there would be a change: rt could be freed
> > just after atomic dec, without waiting for RCU yet. The second one is
> 
> Not sure that I see the concern.  How are we going to call dst_rcu_free without
> waiting for a quiesence of the use of this route?  When it enters rt_intern_hash
> is use count is one, and if we call rt_free in this case, we are guaranteeing
> that this path is the only user (no one else will be able to get a reference to
> it as they can't look it up in the cache).  Once the route is dropped at the end
> of this route lookup, its use is over and the garbage collector will reap it.

The main question is where this RCU is really needed? (I guess not in
route cache, which controls moving it to dst gc.) If it's not necessary
like Alexey wrote, then this change doesn't matter.

> 
> > about timing: freeing this always from a workqueue could probably
> > make a problem if softirqs are often disabled.
> > 
> You mean often, or permanently?  If softirqs are often disabled, thats no big
> deal, as long as they are enabled at least sometimes, the garbage colelctor will
> run, and we'll be ok.  If softirqs are disabled permanently (or for sufficiently
> long periods that we gather huge number of routes waiting to be reaped), I think
> we have larger problems on our hands

If there is a lot of new skbs per napi and a workqueue code triggered
later there could be problem with getting this memory back on time.

Jarek P.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] fix NULL pointer + success return in route lookup path
  2009-06-22 12:18                 ` Jarek Poplawski
@ 2009-06-22 15:10                   ` Neil Horman
  0 siblings, 0 replies; 19+ messages in thread
From: Neil Horman @ 2009-06-22 15:10 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Alexey Kuznetsov, David Miller, netdev, mbizon, dada1, pekkas,
	jmorris, yoshfuji

On Mon, Jun 22, 2009 at 12:18:53PM +0000, Jarek Poplawski wrote:
> On Mon, Jun 22, 2009 at 07:08:19AM -0400, Neil Horman wrote:
> > On Mon, Jun 22, 2009 at 11:00:23AM +0000, Jarek Poplawski wrote:
> > > On Mon, Jun 22, 2009 at 12:59:50PM +0400, Alexey Kuznetsov wrote:
> > > > On Mon, Jun 22, 2009 at 05:43:15AM +0000, Jarek Poplawski wrote:
> > > > > Maybe it can work, but it needs a thorough checking now and adds a new
> > > > > code path to track later while looking for bugs. So, I wonder if it's
> > > > > not better to link such dsts in rt_intern_hash anyway, probably as a
> > > > > separate list, scanned only for expired entries.
> > > > 
> > > > Such a list already exists, it is gc list in core/dst.c.
> > > > 
> > > > The fix to the problem could be replacing rt_drop() with rt_free()
> > > > (adding rt_free() after the patch, which deleted rt_drop()), something like:
> > > > 
> > > > 	if (!rt_caching(dev_net(rt->u.dst.dev))) {
> > > > 		/* ..... */
> > > > +		rt_free(rt);
> > > > 		goto report_and_exit;
> > > > 	}
> > > > 
> > > > rt_free() will put the route on that gc list and it will be releases
> > > > as soon as refcnt becomes 0.
> > > 
> > > One little doubt would be RCU: if it's currently used in rt_free, and
> > > some code depends on it, there would be a change: rt could be freed
> > > just after atomic dec, without waiting for RCU yet. The second one is
> > 
> > Not sure that I see the concern.  How are we going to call dst_rcu_free without
> > waiting for a quiesence of the use of this route?  When it enters rt_intern_hash
> > is use count is one, and if we call rt_free in this case, we are guaranteeing
> > that this path is the only user (no one else will be able to get a reference to
> > it as they can't look it up in the cache).  Once the route is dropped at the end
> > of this route lookup, its use is over and the garbage collector will reap it.
> 
> The main question is where this RCU is really needed? (I guess not in
> route cache, which controls moving it to dst gc.) If it's not necessary
> like Alexey wrote, then this change doesn't matter.
> 
> > 
> > > about timing: freeing this always from a workqueue could probably
> > > make a problem if softirqs are often disabled.
> > > 
> > You mean often, or permanently?  If softirqs are often disabled, thats no big
> > deal, as long as they are enabled at least sometimes, the garbage colelctor will
> > run, and we'll be ok.  If softirqs are disabled permanently (or for sufficiently
> > long periods that we gather huge number of routes waiting to be reaped), I think
> > we have larger problems on our hands
> 
> If there is a lot of new skbs per napi and a workqueue code triggered
> later there could be problem with getting this memory back on time.
> 
Thats true, But then we'll drop frames until such time as the workqueue gets a
chance to run, at which point we'll recover the memory, and start
receiving/forwarding again.  keep in mind that if rt_caching returns false, that
means that we've rebuilt our route cache enough to think that someone is trying
to attack us by biasing the hash table with extra long chains, which can lead to
even worse performance than not caching at all.  So even though the performance
stinks without caching, we're doing it because it would be worse if we were
caching.

Neil

> Jarek P.
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2009-06-22 15:11 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-06-19 17:18 [PATCH] fix NULL pointer + success return in route lookup path Neil Horman
2009-06-20  8:15 ` David Miller
2009-06-20 12:37 ` Jarek Poplawski
2009-06-20 16:39   ` Jarek Poplawski
2009-06-20 17:11     ` Neil Horman
2009-06-20 17:23       ` Jarek Poplawski
2009-06-20 23:47     ` David Miller
2009-06-21 17:11       ` Neil Horman
2009-06-22  5:43         ` Jarek Poplawski
2009-06-22  8:59           ` Alexey Kuznetsov
2009-06-22  9:42             ` David Miller
2009-06-22 10:56               ` Neil Horman
2009-06-22 11:00             ` Jarek Poplawski
2009-06-22 11:08               ` Neil Horman
2009-06-22 12:18                 ` Jarek Poplawski
2009-06-22 15:10                   ` Neil Horman
2009-06-22 11:29               ` Alexey Kuznetsov
2009-06-22 12:04                 ` Jarek Poplawski
2009-06-20 16:44   ` Neil Horman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.