* [PATCH] ipv4: fix ipsec forward performance regression
@ 2011-10-23 7:58 Yan, Zheng
2011-10-23 9:03 ` Eric Dumazet
2011-10-23 14:52 ` [PATCH] ipv4: fix ipsec forward performance regression Julian Anastasov
0 siblings, 2 replies; 16+ messages in thread
From: Yan, Zheng @ 2011-10-23 7:58 UTC (permalink / raw)
To: netdev, davem, eric.dumazet, Kim Phillips
There is bug in commit 5e2b61f(ipv4: Remove flowi from struct rtable).
It makes xfrm4_fill_dst() modify wrong data structure.
Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
---
net/ipv4/xfrm4_policy.c | 14 +++++++-------
1 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index fc5368a..a0b4c5d 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -79,13 +79,13 @@ static int xfrm4_fill_dst(struct xfrm_dst *xdst, struct net_device *dev,
struct rtable *rt = (struct rtable *)xdst->route;
const struct flowi4 *fl4 = &fl->u.ip4;
- rt->rt_key_dst = fl4->daddr;
- rt->rt_key_src = fl4->saddr;
- rt->rt_key_tos = fl4->flowi4_tos;
- rt->rt_route_iif = fl4->flowi4_iif;
- rt->rt_iif = fl4->flowi4_iif;
- rt->rt_oif = fl4->flowi4_oif;
- rt->rt_mark = fl4->flowi4_mark;
+ xdst->u.rt.rt_key_dst = fl4->daddr;
+ xdst->u.rt.rt_key_src = fl4->saddr;
+ xdst->u.rt.rt_key_tos = fl4->flowi4_tos;
+ xdst->u.rt.rt_route_iif = fl4->flowi4_iif;
+ xdst->u.rt.rt_iif = fl4->flowi4_iif;
+ xdst->u.rt.rt_oif = fl4->flowi4_oif;
+ xdst->u.rt.rt_mark = fl4->flowi4_mark;
xdst->u.dst.dev = dev;
dev_hold(dev);
--
1.7.4.4
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH] ipv4: fix ipsec forward performance regression
2011-10-23 7:58 [PATCH] ipv4: fix ipsec forward performance regression Yan, Zheng
@ 2011-10-23 9:03 ` Eric Dumazet
2011-10-24 7:02 ` David Miller
2011-10-23 14:52 ` [PATCH] ipv4: fix ipsec forward performance regression Julian Anastasov
1 sibling, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2011-10-23 9:03 UTC (permalink / raw)
To: Yan, Zheng; +Cc: netdev, davem, Kim Phillips
Le dimanche 23 octobre 2011 à 15:58 +0800, Yan, Zheng a écrit :
> There is bug in commit 5e2b61f(ipv4: Remove flowi from struct rtable).
> It makes xfrm4_fill_dst() modify wrong data structure.
>
> Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
> ---
> net/ipv4/xfrm4_policy.c | 14 +++++++-------
> 1 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
> index fc5368a..a0b4c5d 100644
> --- a/net/ipv4/xfrm4_policy.c
> +++ b/net/ipv4/xfrm4_policy.c
> @@ -79,13 +79,13 @@ static int xfrm4_fill_dst(struct xfrm_dst *xdst, struct net_device *dev,
> struct rtable *rt = (struct rtable *)xdst->route;
> const struct flowi4 *fl4 = &fl->u.ip4;
>
> - rt->rt_key_dst = fl4->daddr;
> - rt->rt_key_src = fl4->saddr;
> - rt->rt_key_tos = fl4->flowi4_tos;
> - rt->rt_route_iif = fl4->flowi4_iif;
> - rt->rt_iif = fl4->flowi4_iif;
> - rt->rt_oif = fl4->flowi4_oif;
> - rt->rt_mark = fl4->flowi4_mark;
> + xdst->u.rt.rt_key_dst = fl4->daddr;
> + xdst->u.rt.rt_key_src = fl4->saddr;
> + xdst->u.rt.rt_key_tos = fl4->flowi4_tos;
> + xdst->u.rt.rt_route_iif = fl4->flowi4_iif;
> + xdst->u.rt.rt_iif = fl4->flowi4_iif;
> + xdst->u.rt.rt_oif = fl4->flowi4_oif;
> + xdst->u.rt.rt_mark = fl4->flowi4_mark;
>
> xdst->u.dst.dev = dev;
> dev_hold(dev);
Good catch, thanks !
Reported-by: Kim Phillips <kim.phillips@freescale.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] ipv4: fix ipsec forward performance regression
2011-10-23 7:58 [PATCH] ipv4: fix ipsec forward performance regression Yan, Zheng
2011-10-23 9:03 ` Eric Dumazet
@ 2011-10-23 14:52 ` Julian Anastasov
2011-10-24 0:41 ` Yan, Zheng
1 sibling, 1 reply; 16+ messages in thread
From: Julian Anastasov @ 2011-10-23 14:52 UTC (permalink / raw)
To: Yan, Zheng; +Cc: netdev, davem, eric.dumazet, Kim Phillips
Hello,
On Sun, 23 Oct 2011, Yan, Zheng wrote:
> There is bug in commit 5e2b61f(ipv4: Remove flowi from struct rtable).
> It makes xfrm4_fill_dst() modify wrong data structure.
>
> Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
> ---
> net/ipv4/xfrm4_policy.c | 14 +++++++-------
> 1 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
> index fc5368a..a0b4c5d 100644
> --- a/net/ipv4/xfrm4_policy.c
> +++ b/net/ipv4/xfrm4_policy.c
> @@ -79,13 +79,13 @@ static int xfrm4_fill_dst(struct xfrm_dst *xdst, struct net_device *dev,
> struct rtable *rt = (struct rtable *)xdst->route;
> const struct flowi4 *fl4 = &fl->u.ip4;
>
> - rt->rt_key_dst = fl4->daddr;
> - rt->rt_key_src = fl4->saddr;
> - rt->rt_key_tos = fl4->flowi4_tos;
> - rt->rt_route_iif = fl4->flowi4_iif;
> - rt->rt_iif = fl4->flowi4_iif;
> - rt->rt_oif = fl4->flowi4_oif;
> - rt->rt_mark = fl4->flowi4_mark;
> + xdst->u.rt.rt_key_dst = fl4->daddr;
> + xdst->u.rt.rt_key_src = fl4->saddr;
> + xdst->u.rt.rt_key_tos = fl4->flowi4_tos;
> + xdst->u.rt.rt_route_iif = fl4->flowi4_iif;
May be I'm missing something but I don't see where
flowi4_iif is set for the forwarding case. __xfrm_route_forward
calls xfrm_decode_session which does not appear to set
flowi4_iif. When providing fl4 for output routes flowi4_iif
is always set to 0, so it represents rt_route_iif. But
then there are 2 variants for __ip_route_output_key:
- ip_route_output_slow sets flowi4_iif to loopback and
flowi4_oif to outdev during lookup but never restores them
to original values. It is assumed that caller uses outdev
from dst, not from flowi4_oif.
- for cached route we do not update flowi4_iif and flowi4_oif
in __ip_route_output_key, so the resulting fl4 can not be
used for these values. I assume, the current rules are that
only fl4.saddr and daddr are updated while flowi4_iif and
flowi4_oif are not. It looks wrong flowi code to rely on them.
Currently, we have 3 values for devices:
rt_iif: indev for input routes, resulting outdev for output routes
which plays the role as indev for loopback traffic.
rt_oif: original outdev key, 0 for input routes, can be 0 for
output routes if socket is not bound to oif
rt_route_iif: indev for input routes, 0 for output routes
With above rules for flowi4_iif and flowi4_oif
it is impossible to select value for rt_iif from fl4.
I don't know the xfrm code well, may be after the
mentioned change we damaged rt_oif and rt_route_iif values
for cached dst which can lead to using slow path all the time.
Even if rt_intern_hash() avoids caching similar dsts multiple
times, if cached entry is damaged we will add more and
more new entries after every damage.
> + xdst->u.rt.rt_iif = fl4->flowi4_iif;
> + xdst->u.rt.rt_oif = fl4->flowi4_oif;
> + xdst->u.rt.rt_mark = fl4->flowi4_mark;
>
> xdst->u.dst.dev = dev;
> dev_hold(dev);
Regards
--
Julian Anastasov <ja@ssi.bg>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] ipv4: fix ipsec forward performance regression
2011-10-23 14:52 ` [PATCH] ipv4: fix ipsec forward performance regression Julian Anastasov
@ 2011-10-24 0:41 ` Yan, Zheng
2011-10-24 7:01 ` David Miller
0 siblings, 1 reply; 16+ messages in thread
From: Yan, Zheng @ 2011-10-24 0:41 UTC (permalink / raw)
To: Julian Anastasov; +Cc: netdev, davem, eric.dumazet, Kim Phillips
On 10/23/2011 10:52 PM, Julian Anastasov wrote:
>
> Hello,
>
> On Sun, 23 Oct 2011, Yan, Zheng wrote:
>
>> There is bug in commit 5e2b61f(ipv4: Remove flowi from struct rtable).
>> It makes xfrm4_fill_dst() modify wrong data structure.
>>
>> Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
>> ---
>> net/ipv4/xfrm4_policy.c | 14 +++++++-------
>> 1 files changed, 7 insertions(+), 7 deletions(-)
>>
>> diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
>> index fc5368a..a0b4c5d 100644
>> --- a/net/ipv4/xfrm4_policy.c
>> +++ b/net/ipv4/xfrm4_policy.c
>> @@ -79,13 +79,13 @@ static int xfrm4_fill_dst(struct xfrm_dst *xdst, struct net_device *dev,
>> struct rtable *rt = (struct rtable *)xdst->route;
>> const struct flowi4 *fl4 = &fl->u.ip4;
>>
>> - rt->rt_key_dst = fl4->daddr;
>> - rt->rt_key_src = fl4->saddr;
>> - rt->rt_key_tos = fl4->flowi4_tos;
>> - rt->rt_route_iif = fl4->flowi4_iif;
>> - rt->rt_iif = fl4->flowi4_iif;
>> - rt->rt_oif = fl4->flowi4_oif;
>> - rt->rt_mark = fl4->flowi4_mark;
>> + xdst->u.rt.rt_key_dst = fl4->daddr;
>> + xdst->u.rt.rt_key_src = fl4->saddr;
>> + xdst->u.rt.rt_key_tos = fl4->flowi4_tos;
>> + xdst->u.rt.rt_route_iif = fl4->flowi4_iif;
>
> May be I'm missing something but I don't see where
> flowi4_iif is set for the forwarding case. __xfrm_route_forward
> calls xfrm_decode_session which does not appear to set
> flowi4_iif. When providing fl4 for output routes flowi4_iif
> is always set to 0, so it represents rt_route_iif. But
> then there are 2 variants for __ip_route_output_key:
>
> - ip_route_output_slow sets flowi4_iif to loopback and
> flowi4_oif to outdev during lookup but never restores them
> to original values. It is assumed that caller uses outdev
> from dst, not from flowi4_oif.
>
> - for cached route we do not update flowi4_iif and flowi4_oif
> in __ip_route_output_key, so the resulting fl4 can not be
> used for these values. I assume, the current rules are that
> only fl4.saddr and daddr are updated while flowi4_iif and
> flowi4_oif are not. It looks wrong flowi code to rely on them.
>
> Currently, we have 3 values for devices:
>
> rt_iif: indev for input routes, resulting outdev for output routes
> which plays the role as indev for loopback traffic.
>
> rt_oif: original outdev key, 0 for input routes, can be 0 for
> output routes if socket is not bound to oif
>
> rt_route_iif: indev for input routes, 0 for output routes
>
> With above rules for flowi4_iif and flowi4_oif
> it is impossible to select value for rt_iif from fl4.
>
> I don't know the xfrm code well, may be after the
Neither do I. My understanding is that xfrm_dst(s) are managed by the
flow cache (net/core/flow.c). We don't put them into the routing cache.
Regards
Yan, Zheng
> mentioned change we damaged rt_oif and rt_route_iif values
> for cached dst which can lead to using slow path all the time.
> Even if rt_intern_hash() avoids caching similar dsts multiple
> times, if cached entry is damaged we will add more and
> more new entries after every damage.
>
>> + xdst->u.rt.rt_iif = fl4->flowi4_iif;
>> + xdst->u.rt.rt_oif = fl4->flowi4_oif;
>> + xdst->u.rt.rt_mark = fl4->flowi4_mark;
>>
>> xdst->u.dst.dev = dev;
>> dev_hold(dev);
>
> Regards
>
> --
> Julian Anastasov <ja@ssi.bg>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] ipv4: fix ipsec forward performance regression
2011-10-24 0:41 ` Yan, Zheng
@ 2011-10-24 7:01 ` David Miller
0 siblings, 0 replies; 16+ messages in thread
From: David Miller @ 2011-10-24 7:01 UTC (permalink / raw)
To: zheng.z.yan; +Cc: ja, netdev, eric.dumazet, kim.phillips
From: "Yan, Zheng" <zheng.z.yan@intel.com>
Date: Mon, 24 Oct 2011 08:41:53 +0800
> My understanding is that xfrm_dst(s) are managed by the flow cache
> (net/core/flow.c). We don't put them into the routing cache.
This is correct.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] ipv4: fix ipsec forward performance regression
2011-10-23 9:03 ` Eric Dumazet
@ 2011-10-24 7:02 ` David Miller
2011-11-01 23:50 ` Kim Phillips
0 siblings, 1 reply; 16+ messages in thread
From: David Miller @ 2011-10-24 7:02 UTC (permalink / raw)
To: eric.dumazet; +Cc: zheng.z.yan, netdev, kim.phillips
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 23 Oct 2011 11:03:10 +0200
> Le dimanche 23 octobre 2011 à 15:58 +0800, Yan, Zheng a écrit :
>> There is bug in commit 5e2b61f(ipv4: Remove flowi from struct rtable).
>> It makes xfrm4_fill_dst() modify wrong data structure.
>>
>> Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
...
> Reported-by: Kim Phillips <kim.phillips@freescale.com>
>
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Applied, thanks everyone.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] ipv4: fix ipsec forward performance regression
2011-10-24 7:02 ` David Miller
@ 2011-11-01 23:50 ` Kim Phillips
2011-11-02 0:34 ` David Miller
0 siblings, 1 reply; 16+ messages in thread
From: Kim Phillips @ 2011-11-01 23:50 UTC (permalink / raw)
To: stable; +Cc: eric.dumazet, zheng.z.yan, netdev, David Miller
On Mon, 24 Oct 2011 03:02:03 -0400
David Miller <davem@davemloft.net> wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Sun, 23 Oct 2011 11:03:10 +0200
>
> > Le dimanche 23 octobre 2011 à 15:58 +0800, Yan, Zheng a écrit :
> >> There is bug in commit 5e2b61f(ipv4: Remove flowi from struct rtable).
> >> It makes xfrm4_fill_dst() modify wrong data structure.
> >>
> >> Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
> ...
> > Reported-by: Kim Phillips <kim.phillips@freescale.com>
> >
> > Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
>
> Applied, thanks everyone.
To: <stable@kernel.org>
-stable maintainers, please consider the following two upstream
commits for inclusion in upcoming v3.0.x [1] stable releases:
v3.0.8 plus this:
upstream commit b73233960a59ee66e09d642f13d0592b13651e94
(ipv4: fix ipsec forward performance regression)
increases IPSec fwding performance from 0.2kpps to ~3.5kpps.
Adding this:
upstream commit aa1c366e4febc7f5c2b84958a2dd7cd70e28f9d0
(net: Handle different key sizes between address families in flow
cache)
to that, brings it back up to 2.6.38 levels, i.e., ~44kpps.
note that for v2.6.39.4 (.39 is the first kernel version with the
40->0.2kpps regression), commit b732339 depends on a slew of
commits, presumably ending with commit 5e2b61f: ipv4: Remove flowi
from struct rtable.
However it appears commit aa1c366e alone will restore almost all
the performance (~42kpps) on that kernel version.
So to summarize, please cherry-pick:
v2.6.39.x: aa1c366: net: Handle different key sizes between address families in flow cache
v3.0.x: aa1c366: net: Handle different key sizes between address families in flow cache
v3.0.x: b732339: ipv4: fix ipsec forward performance regression
v3.1.x: b732339: ipv4: fix ipsec forward performance regression
[All figures are based on a p2020ds board configured to rx, encrypt
and forward 64-byte packets.]
Thanks,
Kim
[1] initial kernel in long-term stable series for the embedded
industry (http://lwn.net/Articles/464834/)
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] ipv4: fix ipsec forward performance regression
2011-11-01 23:50 ` Kim Phillips
@ 2011-11-02 0:34 ` David Miller
2011-11-03 18:58 ` Kim Phillips
0 siblings, 1 reply; 16+ messages in thread
From: David Miller @ 2011-11-02 0:34 UTC (permalink / raw)
To: kim.phillips; +Cc: stable, eric.dumazet, zheng.z.yan, netdev
From: Kim Phillips <kim.phillips@freescale.com>
Date: Tue, 1 Nov 2011 18:50:22 -0500
> -stable maintainers, please consider the following two upstream
> commits for inclusion in upcoming v3.0.x [1] stable releases:
I submit networking stable fixes as needed, so please make such
requests to me.
I haven't submitted this change yet because I want it to sit a bit
longer in Linus's tree to make sure there aren't any unwanted
side-effects.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] ipv4: fix ipsec forward performance regression
2011-11-02 0:34 ` David Miller
@ 2011-11-03 18:58 ` Kim Phillips
2011-11-04 2:43 ` David Miller
0 siblings, 1 reply; 16+ messages in thread
From: Kim Phillips @ 2011-11-03 18:58 UTC (permalink / raw)
To: David Miller; +Cc: stable, eric.dumazet, zheng.z.yan, netdev
On Tue, 1 Nov 2011 20:34:30 -0400
David Miller <davem@davemloft.net> wrote:
> From: Kim Phillips <kim.phillips@freescale.com>
> Date: Tue, 1 Nov 2011 18:50:22 -0500
>
> > -stable maintainers, please consider the following two upstream
> > commits for inclusion in upcoming v3.0.x [1] stable releases:
>
> I submit networking stable fixes as needed, so please make such
> requests to me.
>
> I haven't submitted this change yet because I want it to sit a bit
> longer in Linus's tree to make sure there aren't any unwanted
> side-effects.
I see b732339 (ipv4: fix ipsec forward performance regression) has
been added to the 3.0- and 3.1-stable trees.
b732339 increases IPSec fwding performance from 0.2kpps to ~3.5kpps.
However, adding upstream commit aa1c366 (net: Handle different key
sizes between address families in flow cache) brings performance
back up to 2.6.38 levels, i.e., ~44kpps.
So can aa1c366 also be queued to the v2.6.39 and v3.0 stable
trees?
Thanks,
Kim
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] ipv4: fix ipsec forward performance regression
2011-11-03 18:58 ` Kim Phillips
@ 2011-11-04 2:43 ` David Miller
2011-11-04 19:46 ` [stable] net: Handle different key sizes between address families in flow cache Kim Phillips
0 siblings, 1 reply; 16+ messages in thread
From: David Miller @ 2011-11-04 2:43 UTC (permalink / raw)
To: kim.phillips; +Cc: stable, eric.dumazet, zheng.z.yan, netdev
From: Kim Phillips <kim.phillips@freescale.com>
Date: Thu, 3 Nov 2011 13:58:51 -0500
> I see b732339 (ipv4: fix ipsec forward performance regression) has
> been added to the 3.0- and 3.1-stable trees.
>
> b732339 increases IPSec fwding performance from 0.2kpps to ~3.5kpps.
>
> However, adding upstream commit aa1c366 (net: Handle different key
> sizes between address families in flow cache) brings performance
> back up to 2.6.38 levels, i.e., ~44kpps.
>
> So can aa1c366 also be queued to the v2.6.39 and v3.0 stable
> trees?
1) stable@kernel.org no longer exists and will just go into the bit
bucket, it's now stable@vger.kernel.org
2) The patch in question doesn't apply cleanly to v3.0.8 -stable so
if it's important to you you'll need to submit a backport.
3) 2.6.39 is no longer a maintained -stable kernel series.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [stable] net: Handle different key sizes between address families in flow cache
2011-11-04 2:43 ` David Miller
@ 2011-11-04 19:46 ` Kim Phillips
2011-11-04 20:41 ` David Miller
2011-11-04 21:24 ` Greg KH
0 siblings, 2 replies; 16+ messages in thread
From: Kim Phillips @ 2011-11-04 19:46 UTC (permalink / raw)
To: David Miller; +Cc: stable, eric.dumazet, zheng.z.yan, netdev, David Ward
commit aa1c366e4febc7f5c2b84958a2dd7cd70e28f9d0 upstream.
With the conversion of struct flowi to a union of AF-specific structs, some
operations on the flow cache need to account for the exact size of the key.
Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: stable@vger.kernel.org # v3.0.x
---
This patch is the result of a clean cherry-pick onto v3.0.8.
It restores IPSec fwding performance from ~4kpps back to ~44kpps on
a P2020DS board.
include/net/flow.h | 19 +++++++++++++++++++
net/core/flow.c | 31 +++++++++++++++++--------------
2 files changed, 36 insertions(+), 14 deletions(-)
diff --git a/include/net/flow.h b/include/net/flow.h
index c6d5fe5..93a8785 100644
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -7,6 +7,7 @@
#ifndef _NET_FLOW_H
#define _NET_FLOW_H
+#include <linux/socket.h>
#include <linux/in6.h>
#include <asm/atomic.h>
@@ -161,6 +162,24 @@ static inline struct flowi *flowidn_to_flowi(struct flowidn *fldn)
return container_of(fldn, struct flowi, u.dn);
}
+typedef unsigned long flow_compare_t;
+
+static inline size_t flow_key_size(u16 family)
+{
+ switch (family) {
+ case AF_INET:
+ BUILD_BUG_ON(sizeof(struct flowi4) % sizeof(flow_compare_t));
+ return sizeof(struct flowi4) / sizeof(flow_compare_t);
+ case AF_INET6:
+ BUILD_BUG_ON(sizeof(struct flowi6) % sizeof(flow_compare_t));
+ return sizeof(struct flowi6) / sizeof(flow_compare_t);
+ case AF_DECnet:
+ BUILD_BUG_ON(sizeof(struct flowidn) % sizeof(flow_compare_t));
+ return sizeof(struct flowidn) / sizeof(flow_compare_t);
+ }
+ return 0;
+}
+
#define FLOW_DIR_IN 0
#define FLOW_DIR_OUT 1
#define FLOW_DIR_FWD 2
diff --git a/net/core/flow.c b/net/core/flow.c
index 990703b..a6bda2a 100644
--- a/net/core/flow.c
+++ b/net/core/flow.c
@@ -172,29 +172,26 @@ static void flow_new_hash_rnd(struct flow_cache *fc,
static u32 flow_hash_code(struct flow_cache *fc,
struct flow_cache_percpu *fcp,
- const struct flowi *key)
+ const struct flowi *key,
+ size_t keysize)
{
const u32 *k = (const u32 *) key;
+ const u32 length = keysize * sizeof(flow_compare_t) / sizeof(u32);
- return jhash2(k, (sizeof(*key) / sizeof(u32)), fcp->hash_rnd)
+ return jhash2(k, length, fcp->hash_rnd)
& (flow_cache_hash_size(fc) - 1);
}
-typedef unsigned long flow_compare_t;
-
/* I hear what you're saying, use memcmp. But memcmp cannot make
- * important assumptions that we can here, such as alignment and
- * constant size.
+ * important assumptions that we can here, such as alignment.
*/
-static int flow_key_compare(const struct flowi *key1, const struct flowi *key2)
+static int flow_key_compare(const struct flowi *key1, const struct flowi *key2,
+ size_t keysize)
{
const flow_compare_t *k1, *k1_lim, *k2;
- const int n_elem = sizeof(struct flowi) / sizeof(flow_compare_t);
-
- BUILD_BUG_ON(sizeof(struct flowi) % sizeof(flow_compare_t));
k1 = (const flow_compare_t *) key1;
- k1_lim = k1 + n_elem;
+ k1_lim = k1 + keysize;
k2 = (const flow_compare_t *) key2;
@@ -215,6 +212,7 @@ flow_cache_lookup(struct net *net, const struct flowi *key, u16 family, u8 dir,
struct flow_cache_entry *fle, *tfle;
struct hlist_node *entry;
struct flow_cache_object *flo;
+ size_t keysize;
unsigned int hash;
local_bh_disable();
@@ -222,6 +220,11 @@ flow_cache_lookup(struct net *net, const struct flowi *key, u16 family, u8 dir,
fle = NULL;
flo = NULL;
+
+ keysize = flow_key_size(family);
+ if (!keysize)
+ goto nocache;
+
/* Packet really early in init? Making flow_cache_init a
* pre-smp initcall would solve this. --RR */
if (!fcp->hash_table)
@@ -230,11 +233,11 @@ flow_cache_lookup(struct net *net, const struct flowi *key, u16 family, u8 dir,
if (fcp->hash_rnd_recalc)
flow_new_hash_rnd(fc, fcp);
- hash = flow_hash_code(fc, fcp, key);
+ hash = flow_hash_code(fc, fcp, key, keysize);
hlist_for_each_entry(tfle, entry, &fcp->hash_table[hash], u.hlist) {
if (tfle->family == family &&
tfle->dir == dir &&
- flow_key_compare(key, &tfle->key) == 0) {
+ flow_key_compare(key, &tfle->key, keysize) == 0) {
fle = tfle;
break;
}
@@ -248,7 +251,7 @@ flow_cache_lookup(struct net *net, const struct flowi *key, u16 family, u8 dir,
if (fle) {
fle->family = family;
fle->dir = dir;
- memcpy(&fle->key, key, sizeof(*key));
+ memcpy(&fle->key, key, keysize * sizeof(flow_compare_t));
fle->object = NULL;
hlist_add_head(&fle->u.hlist, &fcp->hash_table[hash]);
fcp->hash_count++;
--
1.7.7.2
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [stable] net: Handle different key sizes between address families in flow cache
2011-11-04 19:46 ` [stable] net: Handle different key sizes between address families in flow cache Kim Phillips
@ 2011-11-04 20:41 ` David Miller
2011-11-04 21:24 ` Greg KH
1 sibling, 0 replies; 16+ messages in thread
From: David Miller @ 2011-11-04 20:41 UTC (permalink / raw)
To: kim.phillips; +Cc: stable, eric.dumazet, zheng.z.yan, netdev, david.ward
From: Kim Phillips <kim.phillips@freescale.com>
Date: Fri, 4 Nov 2011 14:46:59 -0500
> commit aa1c366e4febc7f5c2b84958a2dd7cd70e28f9d0 upstream.
>
> With the conversion of struct flowi to a union of AF-specific structs, some
> operations on the flow cache need to account for the exact size of the key.
>
> Signed-off-by: David Ward <david.ward@ll.mit.edu>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> Cc: stable@vger.kernel.org # v3.0.x
I'm fine with this, -stable folks please apply.
Thanks.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [stable] net: Handle different key sizes between address families in flow cache
2011-11-04 19:46 ` [stable] net: Handle different key sizes between address families in flow cache Kim Phillips
2011-11-04 20:41 ` David Miller
@ 2011-11-04 21:24 ` Greg KH
2011-11-05 2:29 ` Kim Phillips
1 sibling, 1 reply; 16+ messages in thread
From: Greg KH @ 2011-11-04 21:24 UTC (permalink / raw)
To: Kim Phillips
Cc: David Miller, stable, eric.dumazet, zheng.z.yan, netdev, David Ward
On Fri, Nov 04, 2011 at 02:46:59PM -0500, Kim Phillips wrote:
> commit aa1c366e4febc7f5c2b84958a2dd7cd70e28f9d0 upstream.
>
> With the conversion of struct flowi to a union of AF-specific structs, some
> operations on the flow cache need to account for the exact size of the key.
>
> Signed-off-by: David Ward <david.ward@ll.mit.edu>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> Cc: stable@vger.kernel.org # v3.0.x
> ---
> This patch is the result of a clean cherry-pick onto v3.0.8.
> It restores IPSec fwding performance from ~4kpps back to ~44kpps on
> a P2020DS board.
Too bad you forgot to build this patch after you applied it:
CC init/main.o
In file included from include/linux/security.h:39:0,
from init/main.c:32:
include/net/flow.h: In function ‘flow_key_size’:
include/net/flow.h:174:3: error: size of unnamed array is negative
include/net/flow.h:177:3: error: size of unnamed array is negative
Please be kind to your poor over-worked stable kernel maintainer and do
the decent thing and TEST YOUR PATCH before you ask him to accept it.
bah, I think it's time for the weekend to start a bit earlier than normal...
greg k-h
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [stable] net: Handle different key sizes between address families in flow cache
2011-11-04 21:24 ` Greg KH
@ 2011-11-05 2:29 ` Kim Phillips
2011-11-08 16:53 ` Greg KH
0 siblings, 1 reply; 16+ messages in thread
From: Kim Phillips @ 2011-11-05 2:29 UTC (permalink / raw)
To: Greg KH
Cc: David Miller, stable, eric.dumazet, zheng.z.yan, netdev, David Ward
On Fri, 4 Nov 2011 14:24:39 -0700
Greg KH <greg@kroah.com> wrote:
> On Fri, Nov 04, 2011 at 02:46:59PM -0500, Kim Phillips wrote:
> > commit aa1c366e4febc7f5c2b84958a2dd7cd70e28f9d0 upstream.
> >
> > With the conversion of struct flowi to a union of AF-specific structs, some
> > operations on the flow cache need to account for the exact size of the key.
> >
> > Signed-off-by: David Ward <david.ward@ll.mit.edu>
> > Signed-off-by: David S. Miller <davem@davemloft.net>
> > Cc: stable@vger.kernel.org # v3.0.x
> > ---
> > This patch is the result of a clean cherry-pick onto v3.0.8.
> > It restores IPSec fwding performance from ~4kpps back to ~44kpps on
> > a P2020DS board.
>
> Too bad you forgot to build this patch after you applied it:
> CC init/main.o
> In file included from include/linux/security.h:39:0,
> from init/main.c:32:
> include/net/flow.h: In function 'flow_key_size':
> include/net/flow.h:174:3: error: size of unnamed array is negative
> include/net/flow.h:177:3: error: size of unnamed array is negative
>
> Please be kind to your poor over-worked stable kernel maintainer and do
> the decent thing and TEST YOUR PATCH before you ask him to accept it.
>
> bah, I think it's time for the weekend to start a bit earlier than normal...
so sorry I hadn't tested 64-bit.
I found the problem - upstream commit aa1c366 depends on
upstream commit 728871b:
commit 728871bc05afc8ff310b17dba3e57a2472792b13
Author: David Ward <david.ward@ll.mit.edu>
Date: Mon Sep 5 16:47:23 2011 +0000
net: Align AF-specific flowi structs to long
AF-specific flowi structs are now passed to flow_key_compare, which must
also be aligned to a long.
Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
so, one more time, in stable sign-off area format:
Cc: <stable@vger.kernel.org> # v3.0.x: 728871b: net: Align AF-specific flowi structs to long
Cc: <stable@vger.kernel.org> # v3.0.x
build tested on 32- and 64-bit powerpc, and ARCH=x86
{i386,x86_64}_defconfig.
Kim
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [stable] net: Handle different key sizes between address families in flow cache
2011-11-05 2:29 ` Kim Phillips
@ 2011-11-08 16:53 ` Greg KH
2011-11-08 19:44 ` [stable v2] " Kim Phillips
0 siblings, 1 reply; 16+ messages in thread
From: Greg KH @ 2011-11-08 16:53 UTC (permalink / raw)
To: Kim Phillips
Cc: David Miller, stable, eric.dumazet, zheng.z.yan, netdev, David Ward
On Fri, Nov 04, 2011 at 09:29:58PM -0500, Kim Phillips wrote:
> On Fri, 4 Nov 2011 14:24:39 -0700
> Greg KH <greg@kroah.com> wrote:
>
> > On Fri, Nov 04, 2011 at 02:46:59PM -0500, Kim Phillips wrote:
> > > commit aa1c366e4febc7f5c2b84958a2dd7cd70e28f9d0 upstream.
> > >
> > > With the conversion of struct flowi to a union of AF-specific structs, some
> > > operations on the flow cache need to account for the exact size of the key.
> > >
> > > Signed-off-by: David Ward <david.ward@ll.mit.edu>
> > > Signed-off-by: David S. Miller <davem@davemloft.net>
> > > Cc: stable@vger.kernel.org # v3.0.x
> > > ---
> > > This patch is the result of a clean cherry-pick onto v3.0.8.
> > > It restores IPSec fwding performance from ~4kpps back to ~44kpps on
> > > a P2020DS board.
> >
> > Too bad you forgot to build this patch after you applied it:
> > CC init/main.o
> > In file included from include/linux/security.h:39:0,
> > from init/main.c:32:
> > include/net/flow.h: In function 'flow_key_size':
> > include/net/flow.h:174:3: error: size of unnamed array is negative
> > include/net/flow.h:177:3: error: size of unnamed array is negative
> >
> > Please be kind to your poor over-worked stable kernel maintainer and do
> > the decent thing and TEST YOUR PATCH before you ask him to accept it.
> >
> > bah, I think it's time for the weekend to start a bit earlier than normal...
>
> so sorry I hadn't tested 64-bit.
>
> I found the problem - upstream commit aa1c366 depends on
> upstream commit 728871b:
>
> commit 728871bc05afc8ff310b17dba3e57a2472792b13
> Author: David Ward <david.ward@ll.mit.edu>
> Date: Mon Sep 5 16:47:23 2011 +0000
>
> net: Align AF-specific flowi structs to long
>
> AF-specific flowi structs are now passed to flow_key_compare, which must
> also be aligned to a long.
>
> Signed-off-by: David Ward <david.ward@ll.mit.edu>
> Signed-off-by: David S. Miller <davem@davemloft.net>
>
> so, one more time, in stable sign-off area format:
>
> Cc: <stable@vger.kernel.org> # v3.0.x: 728871b: net: Align AF-specific flowi structs to long
> Cc: <stable@vger.kernel.org> # v3.0.x
>
> build tested on 32- and 64-bit powerpc, and ARCH=x86
> {i386,x86_64}_defconfig.
Can you please resend the original patch I needed to apply here as well?
It's long-gone in my deleted inbox.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 16+ messages in thread
* [stable v2] net: Handle different key sizes between address families in flow cache
2011-11-08 16:53 ` Greg KH
@ 2011-11-08 19:44 ` Kim Phillips
0 siblings, 0 replies; 16+ messages in thread
From: Kim Phillips @ 2011-11-08 19:44 UTC (permalink / raw)
To: Greg KH
Cc: David Miller, stable, eric.dumazet, zheng.z.yan, netdev, David Ward
commit aa1c366e4febc7f5c2b84958a2dd7cd70e28f9d0 upstream.
With the conversion of struct flowi to a union of AF-specific structs, some
operations on the flow cache need to account for the exact size of the key.
Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: <stable@vger.kernel.org> # v3.0.x: 728871b: net: Align AF-specific flowi structs to long
Cc: <stable@vger.kernel.org> # v3.0.x
---
v2: resend, with 64-bit build fix dependency 728871b included in sign-off area
include/net/flow.h | 19 +++++++++++++++++++
net/core/flow.c | 31 +++++++++++++++++--------------
2 files changed, 36 insertions(+), 14 deletions(-)
diff --git a/include/net/flow.h b/include/net/flow.h
index c6d5fe5..93a8785 100644
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -7,6 +7,7 @@
#ifndef _NET_FLOW_H
#define _NET_FLOW_H
+#include <linux/socket.h>
#include <linux/in6.h>
#include <asm/atomic.h>
@@ -161,6 +162,24 @@ static inline struct flowi *flowidn_to_flowi(struct flowidn *fldn)
return container_of(fldn, struct flowi, u.dn);
}
+typedef unsigned long flow_compare_t;
+
+static inline size_t flow_key_size(u16 family)
+{
+ switch (family) {
+ case AF_INET:
+ BUILD_BUG_ON(sizeof(struct flowi4) % sizeof(flow_compare_t));
+ return sizeof(struct flowi4) / sizeof(flow_compare_t);
+ case AF_INET6:
+ BUILD_BUG_ON(sizeof(struct flowi6) % sizeof(flow_compare_t));
+ return sizeof(struct flowi6) / sizeof(flow_compare_t);
+ case AF_DECnet:
+ BUILD_BUG_ON(sizeof(struct flowidn) % sizeof(flow_compare_t));
+ return sizeof(struct flowidn) / sizeof(flow_compare_t);
+ }
+ return 0;
+}
+
#define FLOW_DIR_IN 0
#define FLOW_DIR_OUT 1
#define FLOW_DIR_FWD 2
diff --git a/net/core/flow.c b/net/core/flow.c
index 990703b..a6bda2a 100644
--- a/net/core/flow.c
+++ b/net/core/flow.c
@@ -172,29 +172,26 @@ static void flow_new_hash_rnd(struct flow_cache *fc,
static u32 flow_hash_code(struct flow_cache *fc,
struct flow_cache_percpu *fcp,
- const struct flowi *key)
+ const struct flowi *key,
+ size_t keysize)
{
const u32 *k = (const u32 *) key;
+ const u32 length = keysize * sizeof(flow_compare_t) / sizeof(u32);
- return jhash2(k, (sizeof(*key) / sizeof(u32)), fcp->hash_rnd)
+ return jhash2(k, length, fcp->hash_rnd)
& (flow_cache_hash_size(fc) - 1);
}
-typedef unsigned long flow_compare_t;
-
/* I hear what you're saying, use memcmp. But memcmp cannot make
- * important assumptions that we can here, such as alignment and
- * constant size.
+ * important assumptions that we can here, such as alignment.
*/
-static int flow_key_compare(const struct flowi *key1, const struct flowi *key2)
+static int flow_key_compare(const struct flowi *key1, const struct flowi *key2,
+ size_t keysize)
{
const flow_compare_t *k1, *k1_lim, *k2;
- const int n_elem = sizeof(struct flowi) / sizeof(flow_compare_t);
-
- BUILD_BUG_ON(sizeof(struct flowi) % sizeof(flow_compare_t));
k1 = (const flow_compare_t *) key1;
- k1_lim = k1 + n_elem;
+ k1_lim = k1 + keysize;
k2 = (const flow_compare_t *) key2;
@@ -215,6 +212,7 @@ flow_cache_lookup(struct net *net, const struct flowi *key, u16 family, u8 dir,
struct flow_cache_entry *fle, *tfle;
struct hlist_node *entry;
struct flow_cache_object *flo;
+ size_t keysize;
unsigned int hash;
local_bh_disable();
@@ -222,6 +220,11 @@ flow_cache_lookup(struct net *net, const struct flowi *key, u16 family, u8 dir,
fle = NULL;
flo = NULL;
+
+ keysize = flow_key_size(family);
+ if (!keysize)
+ goto nocache;
+
/* Packet really early in init? Making flow_cache_init a
* pre-smp initcall would solve this. --RR */
if (!fcp->hash_table)
@@ -230,11 +233,11 @@ flow_cache_lookup(struct net *net, const struct flowi *key, u16 family, u8 dir,
if (fcp->hash_rnd_recalc)
flow_new_hash_rnd(fc, fcp);
- hash = flow_hash_code(fc, fcp, key);
+ hash = flow_hash_code(fc, fcp, key, keysize);
hlist_for_each_entry(tfle, entry, &fcp->hash_table[hash], u.hlist) {
if (tfle->family == family &&
tfle->dir == dir &&
- flow_key_compare(key, &tfle->key) == 0) {
+ flow_key_compare(key, &tfle->key, keysize) == 0) {
fle = tfle;
break;
}
@@ -248,7 +251,7 @@ flow_cache_lookup(struct net *net, const struct flowi *key, u16 family, u8 dir,
if (fle) {
fle->family = family;
fle->dir = dir;
- memcpy(&fle->key, key, sizeof(*key));
+ memcpy(&fle->key, key, keysize * sizeof(flow_compare_t));
fle->object = NULL;
hlist_add_head(&fle->u.hlist, &fcp->hash_table[hash]);
fcp->hash_count++;
--
1.7.7.2
^ permalink raw reply related [flat|nested] 16+ messages in thread
end of thread, other threads:[~2011-11-08 19:45 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-10-23 7:58 [PATCH] ipv4: fix ipsec forward performance regression Yan, Zheng
2011-10-23 9:03 ` Eric Dumazet
2011-10-24 7:02 ` David Miller
2011-11-01 23:50 ` Kim Phillips
2011-11-02 0:34 ` David Miller
2011-11-03 18:58 ` Kim Phillips
2011-11-04 2:43 ` David Miller
2011-11-04 19:46 ` [stable] net: Handle different key sizes between address families in flow cache Kim Phillips
2011-11-04 20:41 ` David Miller
2011-11-04 21:24 ` Greg KH
2011-11-05 2:29 ` Kim Phillips
2011-11-08 16:53 ` Greg KH
2011-11-08 19:44 ` [stable v2] " Kim Phillips
2011-10-23 14:52 ` [PATCH] ipv4: fix ipsec forward performance regression Julian Anastasov
2011-10-24 0:41 ` Yan, Zheng
2011-10-24 7:01 ` David Miller
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).