All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Avoid extra calculation in ip_route_input_common
@ 2011-12-21  5:12 Michael Wang
  2011-12-21  5:19 ` Michael Wang
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Michael Wang @ 2011-12-21  5:12 UTC (permalink / raw)
  To: netdev; +Cc: Michael Wang

From: Michael Wang <wangyun@linux.vnet.ibm.com>

If previous condition doesn't meet, the later check will be cancelled.
So we don't need to do all the calculation.

Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
---
 net/ipv4/route.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index f30112f..2872bfb 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2362,10 +2362,10 @@ int ip_route_input_common(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 
 	for (rth = rcu_dereference(rt_hash_table[hash].chain); rth;
 	     rth = rcu_dereference(rth->dst.rt_next)) {
-		if ((((__force u32)rth->rt_key_dst ^ (__force u32)daddr) |
-		     ((__force u32)rth->rt_key_src ^ (__force u32)saddr) |
-		     (rth->rt_route_iif ^ iif) |
-		     (rth->rt_key_tos ^ tos)) == 0 &&
+		if (((__force u32)rth->rt_key_dst ^ (__force u32)daddr) == 0 &&
+		    ((__force u32)rth->rt_key_src ^ (__force u32)saddr) == 0 &&
+		    rth->rt_route_iif == iif &&
+		    rth->rt_key_tos == tos &&
 		    rth->rt_mark == skb->mark &&
 		    net_eq(dev_net(rth->dst.dev), net) &&
 		    !rt_is_expired(rth)) {
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] Avoid extra calculation in ip_route_input_common
  2011-12-21  5:12 [PATCH] Avoid extra calculation in ip_route_input_common Michael Wang
@ 2011-12-21  5:19 ` Michael Wang
  2011-12-21  5:23 ` Joe Perches
  2011-12-21  5:57 ` David Miller
  2 siblings, 0 replies; 8+ messages in thread
From: Michael Wang @ 2011-12-21  5:19 UTC (permalink / raw)
  To: Michael Wang; +Cc: netdev

Please tell me if this patch is silly...
I really want to know the reason we need to use ^ and | instead of if and &&.

On 12/21/2011 01:12 PM, Michael Wang wrote:

> From: Michael Wang <wangyun@linux.vnet.ibm.com>
> 
> If previous condition doesn't meet, the later check will be cancelled.
> So we don't need to do all the calculation.
> 
> Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
> ---
>  net/ipv4/route.c |    8 ++++----
>  1 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index f30112f..2872bfb 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -2362,10 +2362,10 @@ int ip_route_input_common(struct sk_buff *skb, __be32 daddr, __be32 saddr,
> 
>  	for (rth = rcu_dereference(rt_hash_table[hash].chain); rth;
>  	     rth = rcu_dereference(rth->dst.rt_next)) {
> -		if ((((__force u32)rth->rt_key_dst ^ (__force u32)daddr) |


And I also wonder whether I can use "rth->rt_key_dst == daddr" here or not...

Thanks & Regards,
Michael Wang

> -		     ((__force u32)rth->rt_key_src ^ (__force u32)saddr) |
> -		     (rth->rt_route_iif ^ iif) |
> -		     (rth->rt_key_tos ^ tos)) == 0 &&
> +		if (((__force u32)rth->rt_key_dst ^ (__force u32)daddr) == 0 &&
> +		    ((__force u32)rth->rt_key_src ^ (__force u32)saddr) == 0 &&
> +		    rth->rt_route_iif == iif &&
> +		    rth->rt_key_tos == tos &&
>  		    rth->rt_mark == skb->mark &&
>  		    net_eq(dev_net(rth->dst.dev), net) &&
>  		    !rt_is_expired(rth)) {

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Avoid extra calculation in ip_route_input_common
  2011-12-21  5:12 [PATCH] Avoid extra calculation in ip_route_input_common Michael Wang
  2011-12-21  5:19 ` Michael Wang
@ 2011-12-21  5:23 ` Joe Perches
  2011-12-21  5:39   ` Michael Wang
  2011-12-21  5:57 ` David Miller
  2 siblings, 1 reply; 8+ messages in thread
From: Joe Perches @ 2011-12-21  5:23 UTC (permalink / raw)
  To: Michael Wang, Stephen Hemminger; +Cc: netdev

On Wed, 2011-12-21 at 13:12 +0800, Michael Wang wrote:
> From: Michael Wang <wangyun@linux.vnet.ibm.com>
> 
> If previous condition doesn't meet, the later check will be cancelled.
> So we don't need to do all the calculation.

Not sure about that.

> Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
> ---
>  net/ipv4/route.c |    8 ++++----
>  1 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index f30112f..2872bfb 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -2362,10 +2362,10 @@ int ip_route_input_common(struct sk_buff *skb, __be32 daddr, __be32 saddr,
>  
>  	for (rth = rcu_dereference(rt_hash_table[hash].chain); rth;
>  	     rth = rcu_dereference(rth->dst.rt_next)) {
> -		if ((((__force u32)rth->rt_key_dst ^ (__force u32)daddr) |
> -		     ((__force u32)rth->rt_key_src ^ (__force u32)saddr) |
> -		     (rth->rt_route_iif ^ iif) |
> -		     (rth->rt_key_tos ^ tos)) == 0 &&
> +		if (((__force u32)rth->rt_key_dst ^ (__force u32)daddr) == 0 &&
> +		    ((__force u32)rth->rt_key_src ^ (__force u32)saddr) == 0 &&
> +		    rth->rt_route_iif == iif &&
> +		    rth->rt_key_tos == tos &&
>  		    rth->rt_mark == skb->mark &&
>  		    net_eq(dev_net(rth->dst.dev), net) &&
>  		    !rt_is_expired(rth)) {

See:

commit c0b8c32b1c96afc9b32b717927330025cc1c501e
Author: Stephen Hemminger <shemminger@vyatta.com>
Date:   Thu Apr 10 04:00:28 2008 -0700

    IPV4: use xor rather than multiple ands for route compare
    
    The comparison in ip_route_input is a hot path, by recoding the C
    "and" as bit operations, fewer conditional branches get generated
    so the code should be faster. Maybe someday Gcc will be smart
    enough to do this?
    
    Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
    Acked-by: Eric Dumazet <dada1@cosmosbay.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Avoid extra calculation in ip_route_input_common
  2011-12-21  5:23 ` Joe Perches
@ 2011-12-21  5:39   ` Michael Wang
  2011-12-21  5:50     ` Joe Perches
  0 siblings, 1 reply; 8+ messages in thread
From: Michael Wang @ 2011-12-21  5:39 UTC (permalink / raw)
  To: Joe Perches; +Cc: Stephen Hemminger, netdev

On 12/21/2011 01:23 PM, Joe Perches wrote:

> On Wed, 2011-12-21 at 13:12 +0800, Michael Wang wrote:
>> From: Michael Wang <wangyun@linux.vnet.ibm.com>
>>
>> If previous condition doesn't meet, the later check will be cancelled.
>> So we don't need to do all the calculation.
> 
> Not sure about that.
> 

Hi, Joe

Thanks for your reply :)

>> Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
>> ---
>>  net/ipv4/route.c |    8 ++++----
>>  1 files changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
>> index f30112f..2872bfb 100644
>> --- a/net/ipv4/route.c
>> +++ b/net/ipv4/route.c
>> @@ -2362,10 +2362,10 @@ int ip_route_input_common(struct sk_buff *skb, __be32 daddr, __be32 saddr,
>>  
>>  	for (rth = rcu_dereference(rt_hash_table[hash].chain); rth;
>>  	     rth = rcu_dereference(rth->dst.rt_next)) {
>> -		if ((((__force u32)rth->rt_key_dst ^ (__force u32)daddr) |
>> -		     ((__force u32)rth->rt_key_src ^ (__force u32)saddr) |
>> -		     (rth->rt_route_iif ^ iif) |
>> -		     (rth->rt_key_tos ^ tos)) == 0 &&
>> +		if (((__force u32)rth->rt_key_dst ^ (__force u32)daddr) == 0 &&
>> +		    ((__force u32)rth->rt_key_src ^ (__force u32)saddr) == 0 &&
>> +		    rth->rt_route_iif == iif &&
>> +		    rth->rt_key_tos == tos &&
>>  		    rth->rt_mark == skb->mark &&
>>  		    net_eq(dev_net(rth->dst.dev), net) &&
>>  		    !rt_is_expired(rth)) {
> 
> See:
> 
> commit c0b8c32b1c96afc9b32b717927330025cc1c501e
> Author: Stephen Hemminger <shemminger@vyatta.com>
> Date:   Thu Apr 10 04:00:28 2008 -0700
> 
>     IPV4: use xor rather than multiple ands for route compare
>     
>     The comparison in ip_route_input is a hot path, by recoding the C
>     "and" as bit operations, fewer conditional branches get generated
>     so the code should be faster. Maybe someday Gcc will be smart
>     enough to do this?


This is what confused me, why "fewer conditional branches get generated"
will make code faster?
In this example, I think the best condition when daddr is different, we
only need to go to one branch do compare then quit, won't this be faster?

Thanks,
Michael Wang

>     
>     Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
>     Acked-by: Eric Dumazet <dada1@cosmosbay.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Avoid extra calculation in ip_route_input_common
  2011-12-21  5:39   ` Michael Wang
@ 2011-12-21  5:50     ` Joe Perches
  2011-12-21  6:00       ` Michael Wang
  0 siblings, 1 reply; 8+ messages in thread
From: Joe Perches @ 2011-12-21  5:50 UTC (permalink / raw)
  To: Michael Wang; +Cc: Stephen Hemminger, netdev

On Wed, 2011-12-21 at 13:39 +0800, Michael Wang wrote:
> On 12/21/2011 01:23 PM, Joe Perches wrote:
> > On Wed, 2011-12-21 at 13:12 +0800, Michael Wang wrote:
> >> From: Michael Wang <wangyun@linux.vnet.ibm.com>
> >>
> >> If previous condition doesn't meet, the later check will be cancelled.
> >> So we don't need to do all the calculation.
[]
> > commit c0b8c32b1c96afc9b32b717927330025cc1c501e
> > Author: Stephen Hemminger <shemminger@vyatta.com>
> > Date:   Thu Apr 10 04:00:28 2008 -0700
> > 
> >     IPV4: use xor rather than multiple ands for route compare
> >     
> >     The comparison in ip_route_input is a hot path, by recoding the C
> >     "and" as bit operations, fewer conditional branches get generated
> >     so the code should be faster. Maybe someday Gcc will be smart
> >     enough to do this?
> This is what confused me, why "fewer conditional branches get generated"
> will make code faster?
> In this example, I think the best condition when daddr is different, we
> only need to go to one branch do compare then quit, won't this be faster?

	if (a && b)
		...
pseudo-codes to:
	if (!a)
		goto fail;
	if (!b)
		goto fail;
	...
fail:

Each of those conditional branches has a cost.
Combining tests of variables in the same cache lines
has relatively little cost compared to the conditional
branches.

That's the theory anyway.

If you have tests that demonstrate otherwise, please
provide them.

cheers, Joe

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Avoid extra calculation in ip_route_input_common
  2011-12-21  5:12 [PATCH] Avoid extra calculation in ip_route_input_common Michael Wang
  2011-12-21  5:19 ` Michael Wang
  2011-12-21  5:23 ` Joe Perches
@ 2011-12-21  5:57 ` David Miller
  2011-12-21  6:04   ` Michael Wang
  2 siblings, 1 reply; 8+ messages in thread
From: David Miller @ 2011-12-21  5:57 UTC (permalink / raw)
  To: wangyun; +Cc: netdev

From: Michael Wang <wangyun@linux.vnet.ibm.com>
Date: Wed, 21 Dec 2011 13:12:03 +0800

> From: Michael Wang <wangyun@linux.vnet.ibm.com>
> 
> If previous condition doesn't meet, the later check will be cancelled.
> So we don't need to do all the calculation.
> 
> Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>

This is intentional to reduce the number of branch prediction
misses, please don't change this.

Once we read one of these values, the rest are incredibly cheap,
the real cost is if we have tons of real branches here, each
of which can be mispredicted.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Avoid extra calculation in ip_route_input_common
  2011-12-21  5:50     ` Joe Perches
@ 2011-12-21  6:00       ` Michael Wang
  0 siblings, 0 replies; 8+ messages in thread
From: Michael Wang @ 2011-12-21  6:00 UTC (permalink / raw)
  To: Joe Perches; +Cc: Stephen Hemminger, netdev

On 12/21/2011 01:50 PM, Joe Perches wrote:

> On Wed, 2011-12-21 at 13:39 +0800, Michael Wang wrote:
>> On 12/21/2011 01:23 PM, Joe Perches wrote:
>>> On Wed, 2011-12-21 at 13:12 +0800, Michael Wang wrote:
>>>> From: Michael Wang <wangyun@linux.vnet.ibm.com>
>>>>
>>>> If previous condition doesn't meet, the later check will be cancelled.
>>>> So we don't need to do all the calculation.
> []
>>> commit c0b8c32b1c96afc9b32b717927330025cc1c501e
>>> Author: Stephen Hemminger <shemminger@vyatta.com>
>>> Date:   Thu Apr 10 04:00:28 2008 -0700
>>>
>>>     IPV4: use xor rather than multiple ands for route compare
>>>     
>>>     The comparison in ip_route_input is a hot path, by recoding the C
>>>     "and" as bit operations, fewer conditional branches get generated
>>>     so the code should be faster. Maybe someday Gcc will be smart
>>>     enough to do this?
>> This is what confused me, why "fewer conditional branches get generated"
>> will make code faster?
>> In this example, I think the best condition when daddr is different, we
>> only need to go to one branch do compare then quit, won't this be faster?
> 
> 	if (a && b)
> 		...
> pseudo-codes to:
> 	if (!a)
> 		goto fail;
> 	if (!b)
> 		goto fail;
> 	...
> fail:
> 
> Each of those conditional branches has a cost.
> Combining tests of variables in the same cache lines
> has relatively little cost compared to the conditional
> branches.
> 

That make sense :)

> That's the theory anyway.
> 
> If you have tests that demonstrate otherwise, please
> provide them.
> 

I think the previous patch should have done such test, otherwise they
won't do this change.

Thanks for your reply, that clear my confusion.

Regards,
Michael Wang

> cheers, Joe
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Avoid extra calculation in ip_route_input_common
  2011-12-21  5:57 ` David Miller
@ 2011-12-21  6:04   ` Michael Wang
  0 siblings, 0 replies; 8+ messages in thread
From: Michael Wang @ 2011-12-21  6:04 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

On 12/21/2011 01:57 PM, David Miller wrote:

> From: Michael Wang <wangyun@linux.vnet.ibm.com>
> Date: Wed, 21 Dec 2011 13:12:03 +0800
> 
>> From: Michael Wang <wangyun@linux.vnet.ibm.com>
>>
>> If previous condition doesn't meet, the later check will be cancelled.
>> So we don't need to do all the calculation.
>>
>> Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
> 
> This is intentional to reduce the number of branch prediction
> misses, please don't change this.
> 
> Once we read one of these values, the rest are incredibly cheap,
> the real cost is if we have tons of real branches here, each
> of which can be mispredicted.
> 

Hi, David

Thanks for your detailed explain, now I know the reason we use this style :)

Thanks,
Michael Wang

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2011-12-21  6:04 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-12-21  5:12 [PATCH] Avoid extra calculation in ip_route_input_common Michael Wang
2011-12-21  5:19 ` Michael Wang
2011-12-21  5:23 ` Joe Perches
2011-12-21  5:39   ` Michael Wang
2011-12-21  5:50     ` Joe Perches
2011-12-21  6:00       ` Michael Wang
2011-12-21  5:57 ` David Miller
2011-12-21  6:04   ` Michael Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.