* [PATCH] ipv4: fix ipsec forward performance regression @ 2011-10-23 7:58 Yan, Zheng 2011-10-23 9:03 ` Eric Dumazet 2011-10-23 14:52 ` [PATCH] ipv4: fix ipsec forward performance regression Julian Anastasov 0 siblings, 2 replies; 16+ messages in thread From: Yan, Zheng @ 2011-10-23 7:58 UTC (permalink / raw) To: netdev, davem, eric.dumazet, Kim Phillips There is bug in commit 5e2b61f(ipv4: Remove flowi from struct rtable). It makes xfrm4_fill_dst() modify wrong data structure. Signed-off-by: Zheng Yan <zheng.z.yan@intel.com> --- net/ipv4/xfrm4_policy.c | 14 +++++++------- 1 files changed, 7 insertions(+), 7 deletions(-) diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c index fc5368a..a0b4c5d 100644 --- a/net/ipv4/xfrm4_policy.c +++ b/net/ipv4/xfrm4_policy.c @@ -79,13 +79,13 @@ static int xfrm4_fill_dst(struct xfrm_dst *xdst, struct net_device *dev, struct rtable *rt = (struct rtable *)xdst->route; const struct flowi4 *fl4 = &fl->u.ip4; - rt->rt_key_dst = fl4->daddr; - rt->rt_key_src = fl4->saddr; - rt->rt_key_tos = fl4->flowi4_tos; - rt->rt_route_iif = fl4->flowi4_iif; - rt->rt_iif = fl4->flowi4_iif; - rt->rt_oif = fl4->flowi4_oif; - rt->rt_mark = fl4->flowi4_mark; + xdst->u.rt.rt_key_dst = fl4->daddr; + xdst->u.rt.rt_key_src = fl4->saddr; + xdst->u.rt.rt_key_tos = fl4->flowi4_tos; + xdst->u.rt.rt_route_iif = fl4->flowi4_iif; + xdst->u.rt.rt_iif = fl4->flowi4_iif; + xdst->u.rt.rt_oif = fl4->flowi4_oif; + xdst->u.rt.rt_mark = fl4->flowi4_mark; xdst->u.dst.dev = dev; dev_hold(dev); -- 1.7.4.4 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH] ipv4: fix ipsec forward performance regression 2011-10-23 7:58 [PATCH] ipv4: fix ipsec forward performance regression Yan, Zheng @ 2011-10-23 9:03 ` Eric Dumazet 2011-10-24 7:02 ` David Miller 2011-10-23 14:52 ` [PATCH] ipv4: fix ipsec forward performance regression Julian Anastasov 1 sibling, 1 reply; 16+ messages in thread From: Eric Dumazet @ 2011-10-23 9:03 UTC (permalink / raw) To: Yan, Zheng; +Cc: netdev, davem, Kim Phillips Le dimanche 23 octobre 2011 à 15:58 +0800, Yan, Zheng a écrit : > There is bug in commit 5e2b61f(ipv4: Remove flowi from struct rtable). > It makes xfrm4_fill_dst() modify wrong data structure. > > Signed-off-by: Zheng Yan <zheng.z.yan@intel.com> > --- > net/ipv4/xfrm4_policy.c | 14 +++++++------- > 1 files changed, 7 insertions(+), 7 deletions(-) > > diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c > index fc5368a..a0b4c5d 100644 > --- a/net/ipv4/xfrm4_policy.c > +++ b/net/ipv4/xfrm4_policy.c > @@ -79,13 +79,13 @@ static int xfrm4_fill_dst(struct xfrm_dst *xdst, struct net_device *dev, > struct rtable *rt = (struct rtable *)xdst->route; > const struct flowi4 *fl4 = &fl->u.ip4; > > - rt->rt_key_dst = fl4->daddr; > - rt->rt_key_src = fl4->saddr; > - rt->rt_key_tos = fl4->flowi4_tos; > - rt->rt_route_iif = fl4->flowi4_iif; > - rt->rt_iif = fl4->flowi4_iif; > - rt->rt_oif = fl4->flowi4_oif; > - rt->rt_mark = fl4->flowi4_mark; > + xdst->u.rt.rt_key_dst = fl4->daddr; > + xdst->u.rt.rt_key_src = fl4->saddr; > + xdst->u.rt.rt_key_tos = fl4->flowi4_tos; > + xdst->u.rt.rt_route_iif = fl4->flowi4_iif; > + xdst->u.rt.rt_iif = fl4->flowi4_iif; > + xdst->u.rt.rt_oif = fl4->flowi4_oif; > + xdst->u.rt.rt_mark = fl4->flowi4_mark; > > xdst->u.dst.dev = dev; > dev_hold(dev); Good catch, thanks ! Reported-by: Kim Phillips <kim.phillips@freescale.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] ipv4: fix ipsec forward performance regression 2011-10-23 9:03 ` Eric Dumazet @ 2011-10-24 7:02 ` David Miller 2011-11-01 23:50 ` Kim Phillips 0 siblings, 1 reply; 16+ messages in thread From: David Miller @ 2011-10-24 7:02 UTC (permalink / raw) To: eric.dumazet; +Cc: zheng.z.yan, netdev, kim.phillips From: Eric Dumazet <eric.dumazet@gmail.com> Date: Sun, 23 Oct 2011 11:03:10 +0200 > Le dimanche 23 octobre 2011 à 15:58 +0800, Yan, Zheng a écrit : >> There is bug in commit 5e2b61f(ipv4: Remove flowi from struct rtable). >> It makes xfrm4_fill_dst() modify wrong data structure. >> >> Signed-off-by: Zheng Yan <zheng.z.yan@intel.com> ... > Reported-by: Kim Phillips <kim.phillips@freescale.com> > > Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Applied, thanks everyone. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] ipv4: fix ipsec forward performance regression 2011-10-24 7:02 ` David Miller @ 2011-11-01 23:50 ` Kim Phillips 2011-11-02 0:34 ` David Miller 0 siblings, 1 reply; 16+ messages in thread From: Kim Phillips @ 2011-11-01 23:50 UTC (permalink / raw) To: stable; +Cc: eric.dumazet, zheng.z.yan, netdev, David Miller On Mon, 24 Oct 2011 03:02:03 -0400 David Miller <davem@davemloft.net> wrote: > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Sun, 23 Oct 2011 11:03:10 +0200 > > > Le dimanche 23 octobre 2011 à 15:58 +0800, Yan, Zheng a écrit : > >> There is bug in commit 5e2b61f(ipv4: Remove flowi from struct rtable). > >> It makes xfrm4_fill_dst() modify wrong data structure. > >> > >> Signed-off-by: Zheng Yan <zheng.z.yan@intel.com> > ... > > Reported-by: Kim Phillips <kim.phillips@freescale.com> > > > > Acked-by: Eric Dumazet <eric.dumazet@gmail.com> > > Applied, thanks everyone. To: <stable@kernel.org> -stable maintainers, please consider the following two upstream commits for inclusion in upcoming v3.0.x [1] stable releases: v3.0.8 plus this: upstream commit b73233960a59ee66e09d642f13d0592b13651e94 (ipv4: fix ipsec forward performance regression) increases IPSec fwding performance from 0.2kpps to ~3.5kpps. Adding this: upstream commit aa1c366e4febc7f5c2b84958a2dd7cd70e28f9d0 (net: Handle different key sizes between address families in flow cache) to that, brings it back up to 2.6.38 levels, i.e., ~44kpps. note that for v2.6.39.4 (.39 is the first kernel version with the 40->0.2kpps regression), commit b732339 depends on a slew of commits, presumably ending with commit 5e2b61f: ipv4: Remove flowi from struct rtable. However it appears commit aa1c366e alone will restore almost all the performance (~42kpps) on that kernel version. So to summarize, please cherry-pick: v2.6.39.x: aa1c366: net: Handle different key sizes between address families in flow cache v3.0.x: aa1c366: net: Handle different key sizes between address families in flow cache v3.0.x: b732339: ipv4: fix ipsec forward performance regression v3.1.x: b732339: ipv4: fix ipsec forward performance regression [All figures are based on a p2020ds board configured to rx, encrypt and forward 64-byte packets.] Thanks, Kim [1] initial kernel in long-term stable series for the embedded industry (http://lwn.net/Articles/464834/) ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] ipv4: fix ipsec forward performance regression 2011-11-01 23:50 ` Kim Phillips @ 2011-11-02 0:34 ` David Miller 2011-11-03 18:58 ` Kim Phillips 0 siblings, 1 reply; 16+ messages in thread From: David Miller @ 2011-11-02 0:34 UTC (permalink / raw) To: kim.phillips; +Cc: stable, eric.dumazet, zheng.z.yan, netdev From: Kim Phillips <kim.phillips@freescale.com> Date: Tue, 1 Nov 2011 18:50:22 -0500 > -stable maintainers, please consider the following two upstream > commits for inclusion in upcoming v3.0.x [1] stable releases: I submit networking stable fixes as needed, so please make such requests to me. I haven't submitted this change yet because I want it to sit a bit longer in Linus's tree to make sure there aren't any unwanted side-effects. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] ipv4: fix ipsec forward performance regression 2011-11-02 0:34 ` David Miller @ 2011-11-03 18:58 ` Kim Phillips 2011-11-04 2:43 ` David Miller 0 siblings, 1 reply; 16+ messages in thread From: Kim Phillips @ 2011-11-03 18:58 UTC (permalink / raw) To: David Miller; +Cc: stable, eric.dumazet, zheng.z.yan, netdev On Tue, 1 Nov 2011 20:34:30 -0400 David Miller <davem@davemloft.net> wrote: > From: Kim Phillips <kim.phillips@freescale.com> > Date: Tue, 1 Nov 2011 18:50:22 -0500 > > > -stable maintainers, please consider the following two upstream > > commits for inclusion in upcoming v3.0.x [1] stable releases: > > I submit networking stable fixes as needed, so please make such > requests to me. > > I haven't submitted this change yet because I want it to sit a bit > longer in Linus's tree to make sure there aren't any unwanted > side-effects. I see b732339 (ipv4: fix ipsec forward performance regression) has been added to the 3.0- and 3.1-stable trees. b732339 increases IPSec fwding performance from 0.2kpps to ~3.5kpps. However, adding upstream commit aa1c366 (net: Handle different key sizes between address families in flow cache) brings performance back up to 2.6.38 levels, i.e., ~44kpps. So can aa1c366 also be queued to the v2.6.39 and v3.0 stable trees? Thanks, Kim ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] ipv4: fix ipsec forward performance regression 2011-11-03 18:58 ` Kim Phillips @ 2011-11-04 2:43 ` David Miller 2011-11-04 19:46 ` [stable] net: Handle different key sizes between address families in flow cache Kim Phillips 0 siblings, 1 reply; 16+ messages in thread From: David Miller @ 2011-11-04 2:43 UTC (permalink / raw) To: kim.phillips; +Cc: stable, eric.dumazet, zheng.z.yan, netdev From: Kim Phillips <kim.phillips@freescale.com> Date: Thu, 3 Nov 2011 13:58:51 -0500 > I see b732339 (ipv4: fix ipsec forward performance regression) has > been added to the 3.0- and 3.1-stable trees. > > b732339 increases IPSec fwding performance from 0.2kpps to ~3.5kpps. > > However, adding upstream commit aa1c366 (net: Handle different key > sizes between address families in flow cache) brings performance > back up to 2.6.38 levels, i.e., ~44kpps. > > So can aa1c366 also be queued to the v2.6.39 and v3.0 stable > trees? 1) stable@kernel.org no longer exists and will just go into the bit bucket, it's now stable@vger.kernel.org 2) The patch in question doesn't apply cleanly to v3.0.8 -stable so if it's important to you you'll need to submit a backport. 3) 2.6.39 is no longer a maintained -stable kernel series. ^ permalink raw reply [flat|nested] 16+ messages in thread
* [stable] net: Handle different key sizes between address families in flow cache 2011-11-04 2:43 ` David Miller @ 2011-11-04 19:46 ` Kim Phillips 2011-11-04 20:41 ` David Miller 2011-11-04 21:24 ` Greg KH 0 siblings, 2 replies; 16+ messages in thread From: Kim Phillips @ 2011-11-04 19:46 UTC (permalink / raw) To: David Miller; +Cc: stable, eric.dumazet, zheng.z.yan, netdev, David Ward commit aa1c366e4febc7f5c2b84958a2dd7cd70e28f9d0 upstream. With the conversion of struct flowi to a union of AF-specific structs, some operations on the flow cache need to account for the exact size of the key. Signed-off-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: stable@vger.kernel.org # v3.0.x --- This patch is the result of a clean cherry-pick onto v3.0.8. It restores IPSec fwding performance from ~4kpps back to ~44kpps on a P2020DS board. include/net/flow.h | 19 +++++++++++++++++++ net/core/flow.c | 31 +++++++++++++++++-------------- 2 files changed, 36 insertions(+), 14 deletions(-) diff --git a/include/net/flow.h b/include/net/flow.h index c6d5fe5..93a8785 100644 --- a/include/net/flow.h +++ b/include/net/flow.h @@ -7,6 +7,7 @@ #ifndef _NET_FLOW_H #define _NET_FLOW_H +#include <linux/socket.h> #include <linux/in6.h> #include <asm/atomic.h> @@ -161,6 +162,24 @@ static inline struct flowi *flowidn_to_flowi(struct flowidn *fldn) return container_of(fldn, struct flowi, u.dn); } +typedef unsigned long flow_compare_t; + +static inline size_t flow_key_size(u16 family) +{ + switch (family) { + case AF_INET: + BUILD_BUG_ON(sizeof(struct flowi4) % sizeof(flow_compare_t)); + return sizeof(struct flowi4) / sizeof(flow_compare_t); + case AF_INET6: + BUILD_BUG_ON(sizeof(struct flowi6) % sizeof(flow_compare_t)); + return sizeof(struct flowi6) / sizeof(flow_compare_t); + case AF_DECnet: + BUILD_BUG_ON(sizeof(struct flowidn) % sizeof(flow_compare_t)); + return sizeof(struct flowidn) / sizeof(flow_compare_t); + } + return 0; +} + #define FLOW_DIR_IN 0 #define FLOW_DIR_OUT 1 #define FLOW_DIR_FWD 2 diff --git a/net/core/flow.c b/net/core/flow.c index 990703b..a6bda2a 100644 --- a/net/core/flow.c +++ b/net/core/flow.c @@ -172,29 +172,26 @@ static void flow_new_hash_rnd(struct flow_cache *fc, static u32 flow_hash_code(struct flow_cache *fc, struct flow_cache_percpu *fcp, - const struct flowi *key) + const struct flowi *key, + size_t keysize) { const u32 *k = (const u32 *) key; + const u32 length = keysize * sizeof(flow_compare_t) / sizeof(u32); - return jhash2(k, (sizeof(*key) / sizeof(u32)), fcp->hash_rnd) + return jhash2(k, length, fcp->hash_rnd) & (flow_cache_hash_size(fc) - 1); } -typedef unsigned long flow_compare_t; - /* I hear what you're saying, use memcmp. But memcmp cannot make - * important assumptions that we can here, such as alignment and - * constant size. + * important assumptions that we can here, such as alignment. */ -static int flow_key_compare(const struct flowi *key1, const struct flowi *key2) +static int flow_key_compare(const struct flowi *key1, const struct flowi *key2, + size_t keysize) { const flow_compare_t *k1, *k1_lim, *k2; - const int n_elem = sizeof(struct flowi) / sizeof(flow_compare_t); - - BUILD_BUG_ON(sizeof(struct flowi) % sizeof(flow_compare_t)); k1 = (const flow_compare_t *) key1; - k1_lim = k1 + n_elem; + k1_lim = k1 + keysize; k2 = (const flow_compare_t *) key2; @@ -215,6 +212,7 @@ flow_cache_lookup(struct net *net, const struct flowi *key, u16 family, u8 dir, struct flow_cache_entry *fle, *tfle; struct hlist_node *entry; struct flow_cache_object *flo; + size_t keysize; unsigned int hash; local_bh_disable(); @@ -222,6 +220,11 @@ flow_cache_lookup(struct net *net, const struct flowi *key, u16 family, u8 dir, fle = NULL; flo = NULL; + + keysize = flow_key_size(family); + if (!keysize) + goto nocache; + /* Packet really early in init? Making flow_cache_init a * pre-smp initcall would solve this. --RR */ if (!fcp->hash_table) @@ -230,11 +233,11 @@ flow_cache_lookup(struct net *net, const struct flowi *key, u16 family, u8 dir, if (fcp->hash_rnd_recalc) flow_new_hash_rnd(fc, fcp); - hash = flow_hash_code(fc, fcp, key); + hash = flow_hash_code(fc, fcp, key, keysize); hlist_for_each_entry(tfle, entry, &fcp->hash_table[hash], u.hlist) { if (tfle->family == family && tfle->dir == dir && - flow_key_compare(key, &tfle->key) == 0) { + flow_key_compare(key, &tfle->key, keysize) == 0) { fle = tfle; break; } @@ -248,7 +251,7 @@ flow_cache_lookup(struct net *net, const struct flowi *key, u16 family, u8 dir, if (fle) { fle->family = family; fle->dir = dir; - memcpy(&fle->key, key, sizeof(*key)); + memcpy(&fle->key, key, keysize * sizeof(flow_compare_t)); fle->object = NULL; hlist_add_head(&fle->u.hlist, &fcp->hash_table[hash]); fcp->hash_count++; -- 1.7.7.2 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [stable] net: Handle different key sizes between address families in flow cache 2011-11-04 19:46 ` [stable] net: Handle different key sizes between address families in flow cache Kim Phillips @ 2011-11-04 20:41 ` David Miller 2011-11-04 21:24 ` Greg KH 1 sibling, 0 replies; 16+ messages in thread From: David Miller @ 2011-11-04 20:41 UTC (permalink / raw) To: kim.phillips; +Cc: stable, eric.dumazet, zheng.z.yan, netdev, david.ward From: Kim Phillips <kim.phillips@freescale.com> Date: Fri, 4 Nov 2011 14:46:59 -0500 > commit aa1c366e4febc7f5c2b84958a2dd7cd70e28f9d0 upstream. > > With the conversion of struct flowi to a union of AF-specific structs, some > operations on the flow cache need to account for the exact size of the key. > > Signed-off-by: David Ward <david.ward@ll.mit.edu> > Signed-off-by: David S. Miller <davem@davemloft.net> > Cc: stable@vger.kernel.org # v3.0.x I'm fine with this, -stable folks please apply. Thanks. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [stable] net: Handle different key sizes between address families in flow cache 2011-11-04 19:46 ` [stable] net: Handle different key sizes between address families in flow cache Kim Phillips 2011-11-04 20:41 ` David Miller @ 2011-11-04 21:24 ` Greg KH 2011-11-05 2:29 ` Kim Phillips 1 sibling, 1 reply; 16+ messages in thread From: Greg KH @ 2011-11-04 21:24 UTC (permalink / raw) To: Kim Phillips Cc: David Miller, stable, eric.dumazet, zheng.z.yan, netdev, David Ward On Fri, Nov 04, 2011 at 02:46:59PM -0500, Kim Phillips wrote: > commit aa1c366e4febc7f5c2b84958a2dd7cd70e28f9d0 upstream. > > With the conversion of struct flowi to a union of AF-specific structs, some > operations on the flow cache need to account for the exact size of the key. > > Signed-off-by: David Ward <david.ward@ll.mit.edu> > Signed-off-by: David S. Miller <davem@davemloft.net> > Cc: stable@vger.kernel.org # v3.0.x > --- > This patch is the result of a clean cherry-pick onto v3.0.8. > It restores IPSec fwding performance from ~4kpps back to ~44kpps on > a P2020DS board. Too bad you forgot to build this patch after you applied it: CC init/main.o In file included from include/linux/security.h:39:0, from init/main.c:32: include/net/flow.h: In function ‘flow_key_size’: include/net/flow.h:174:3: error: size of unnamed array is negative include/net/flow.h:177:3: error: size of unnamed array is negative Please be kind to your poor over-worked stable kernel maintainer and do the decent thing and TEST YOUR PATCH before you ask him to accept it. bah, I think it's time for the weekend to start a bit earlier than normal... greg k-h ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [stable] net: Handle different key sizes between address families in flow cache 2011-11-04 21:24 ` Greg KH @ 2011-11-05 2:29 ` Kim Phillips 2011-11-08 16:53 ` Greg KH 0 siblings, 1 reply; 16+ messages in thread From: Kim Phillips @ 2011-11-05 2:29 UTC (permalink / raw) To: Greg KH Cc: David Miller, stable, eric.dumazet, zheng.z.yan, netdev, David Ward On Fri, 4 Nov 2011 14:24:39 -0700 Greg KH <greg@kroah.com> wrote: > On Fri, Nov 04, 2011 at 02:46:59PM -0500, Kim Phillips wrote: > > commit aa1c366e4febc7f5c2b84958a2dd7cd70e28f9d0 upstream. > > > > With the conversion of struct flowi to a union of AF-specific structs, some > > operations on the flow cache need to account for the exact size of the key. > > > > Signed-off-by: David Ward <david.ward@ll.mit.edu> > > Signed-off-by: David S. Miller <davem@davemloft.net> > > Cc: stable@vger.kernel.org # v3.0.x > > --- > > This patch is the result of a clean cherry-pick onto v3.0.8. > > It restores IPSec fwding performance from ~4kpps back to ~44kpps on > > a P2020DS board. > > Too bad you forgot to build this patch after you applied it: > CC init/main.o > In file included from include/linux/security.h:39:0, > from init/main.c:32: > include/net/flow.h: In function 'flow_key_size': > include/net/flow.h:174:3: error: size of unnamed array is negative > include/net/flow.h:177:3: error: size of unnamed array is negative > > Please be kind to your poor over-worked stable kernel maintainer and do > the decent thing and TEST YOUR PATCH before you ask him to accept it. > > bah, I think it's time for the weekend to start a bit earlier than normal... so sorry I hadn't tested 64-bit. I found the problem - upstream commit aa1c366 depends on upstream commit 728871b: commit 728871bc05afc8ff310b17dba3e57a2472792b13 Author: David Ward <david.ward@ll.mit.edu> Date: Mon Sep 5 16:47:23 2011 +0000 net: Align AF-specific flowi structs to long AF-specific flowi structs are now passed to flow_key_compare, which must also be aligned to a long. Signed-off-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: David S. Miller <davem@davemloft.net> so, one more time, in stable sign-off area format: Cc: <stable@vger.kernel.org> # v3.0.x: 728871b: net: Align AF-specific flowi structs to long Cc: <stable@vger.kernel.org> # v3.0.x build tested on 32- and 64-bit powerpc, and ARCH=x86 {i386,x86_64}_defconfig. Kim ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [stable] net: Handle different key sizes between address families in flow cache 2011-11-05 2:29 ` Kim Phillips @ 2011-11-08 16:53 ` Greg KH 2011-11-08 19:44 ` [stable v2] " Kim Phillips 0 siblings, 1 reply; 16+ messages in thread From: Greg KH @ 2011-11-08 16:53 UTC (permalink / raw) To: Kim Phillips Cc: David Miller, stable, eric.dumazet, zheng.z.yan, netdev, David Ward On Fri, Nov 04, 2011 at 09:29:58PM -0500, Kim Phillips wrote: > On Fri, 4 Nov 2011 14:24:39 -0700 > Greg KH <greg@kroah.com> wrote: > > > On Fri, Nov 04, 2011 at 02:46:59PM -0500, Kim Phillips wrote: > > > commit aa1c366e4febc7f5c2b84958a2dd7cd70e28f9d0 upstream. > > > > > > With the conversion of struct flowi to a union of AF-specific structs, some > > > operations on the flow cache need to account for the exact size of the key. > > > > > > Signed-off-by: David Ward <david.ward@ll.mit.edu> > > > Signed-off-by: David S. Miller <davem@davemloft.net> > > > Cc: stable@vger.kernel.org # v3.0.x > > > --- > > > This patch is the result of a clean cherry-pick onto v3.0.8. > > > It restores IPSec fwding performance from ~4kpps back to ~44kpps on > > > a P2020DS board. > > > > Too bad you forgot to build this patch after you applied it: > > CC init/main.o > > In file included from include/linux/security.h:39:0, > > from init/main.c:32: > > include/net/flow.h: In function 'flow_key_size': > > include/net/flow.h:174:3: error: size of unnamed array is negative > > include/net/flow.h:177:3: error: size of unnamed array is negative > > > > Please be kind to your poor over-worked stable kernel maintainer and do > > the decent thing and TEST YOUR PATCH before you ask him to accept it. > > > > bah, I think it's time for the weekend to start a bit earlier than normal... > > so sorry I hadn't tested 64-bit. > > I found the problem - upstream commit aa1c366 depends on > upstream commit 728871b: > > commit 728871bc05afc8ff310b17dba3e57a2472792b13 > Author: David Ward <david.ward@ll.mit.edu> > Date: Mon Sep 5 16:47:23 2011 +0000 > > net: Align AF-specific flowi structs to long > > AF-specific flowi structs are now passed to flow_key_compare, which must > also be aligned to a long. > > Signed-off-by: David Ward <david.ward@ll.mit.edu> > Signed-off-by: David S. Miller <davem@davemloft.net> > > so, one more time, in stable sign-off area format: > > Cc: <stable@vger.kernel.org> # v3.0.x: 728871b: net: Align AF-specific flowi structs to long > Cc: <stable@vger.kernel.org> # v3.0.x > > build tested on 32- and 64-bit powerpc, and ARCH=x86 > {i386,x86_64}_defconfig. Can you please resend the original patch I needed to apply here as well? It's long-gone in my deleted inbox. thanks, greg k-h ^ permalink raw reply [flat|nested] 16+ messages in thread
* [stable v2] net: Handle different key sizes between address families in flow cache 2011-11-08 16:53 ` Greg KH @ 2011-11-08 19:44 ` Kim Phillips 0 siblings, 0 replies; 16+ messages in thread From: Kim Phillips @ 2011-11-08 19:44 UTC (permalink / raw) To: Greg KH Cc: David Miller, stable, eric.dumazet, zheng.z.yan, netdev, David Ward commit aa1c366e4febc7f5c2b84958a2dd7cd70e28f9d0 upstream. With the conversion of struct flowi to a union of AF-specific structs, some operations on the flow cache need to account for the exact size of the key. Signed-off-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: <stable@vger.kernel.org> # v3.0.x: 728871b: net: Align AF-specific flowi structs to long Cc: <stable@vger.kernel.org> # v3.0.x --- v2: resend, with 64-bit build fix dependency 728871b included in sign-off area include/net/flow.h | 19 +++++++++++++++++++ net/core/flow.c | 31 +++++++++++++++++-------------- 2 files changed, 36 insertions(+), 14 deletions(-) diff --git a/include/net/flow.h b/include/net/flow.h index c6d5fe5..93a8785 100644 --- a/include/net/flow.h +++ b/include/net/flow.h @@ -7,6 +7,7 @@ #ifndef _NET_FLOW_H #define _NET_FLOW_H +#include <linux/socket.h> #include <linux/in6.h> #include <asm/atomic.h> @@ -161,6 +162,24 @@ static inline struct flowi *flowidn_to_flowi(struct flowidn *fldn) return container_of(fldn, struct flowi, u.dn); } +typedef unsigned long flow_compare_t; + +static inline size_t flow_key_size(u16 family) +{ + switch (family) { + case AF_INET: + BUILD_BUG_ON(sizeof(struct flowi4) % sizeof(flow_compare_t)); + return sizeof(struct flowi4) / sizeof(flow_compare_t); + case AF_INET6: + BUILD_BUG_ON(sizeof(struct flowi6) % sizeof(flow_compare_t)); + return sizeof(struct flowi6) / sizeof(flow_compare_t); + case AF_DECnet: + BUILD_BUG_ON(sizeof(struct flowidn) % sizeof(flow_compare_t)); + return sizeof(struct flowidn) / sizeof(flow_compare_t); + } + return 0; +} + #define FLOW_DIR_IN 0 #define FLOW_DIR_OUT 1 #define FLOW_DIR_FWD 2 diff --git a/net/core/flow.c b/net/core/flow.c index 990703b..a6bda2a 100644 --- a/net/core/flow.c +++ b/net/core/flow.c @@ -172,29 +172,26 @@ static void flow_new_hash_rnd(struct flow_cache *fc, static u32 flow_hash_code(struct flow_cache *fc, struct flow_cache_percpu *fcp, - const struct flowi *key) + const struct flowi *key, + size_t keysize) { const u32 *k = (const u32 *) key; + const u32 length = keysize * sizeof(flow_compare_t) / sizeof(u32); - return jhash2(k, (sizeof(*key) / sizeof(u32)), fcp->hash_rnd) + return jhash2(k, length, fcp->hash_rnd) & (flow_cache_hash_size(fc) - 1); } -typedef unsigned long flow_compare_t; - /* I hear what you're saying, use memcmp. But memcmp cannot make - * important assumptions that we can here, such as alignment and - * constant size. + * important assumptions that we can here, such as alignment. */ -static int flow_key_compare(const struct flowi *key1, const struct flowi *key2) +static int flow_key_compare(const struct flowi *key1, const struct flowi *key2, + size_t keysize) { const flow_compare_t *k1, *k1_lim, *k2; - const int n_elem = sizeof(struct flowi) / sizeof(flow_compare_t); - - BUILD_BUG_ON(sizeof(struct flowi) % sizeof(flow_compare_t)); k1 = (const flow_compare_t *) key1; - k1_lim = k1 + n_elem; + k1_lim = k1 + keysize; k2 = (const flow_compare_t *) key2; @@ -215,6 +212,7 @@ flow_cache_lookup(struct net *net, const struct flowi *key, u16 family, u8 dir, struct flow_cache_entry *fle, *tfle; struct hlist_node *entry; struct flow_cache_object *flo; + size_t keysize; unsigned int hash; local_bh_disable(); @@ -222,6 +220,11 @@ flow_cache_lookup(struct net *net, const struct flowi *key, u16 family, u8 dir, fle = NULL; flo = NULL; + + keysize = flow_key_size(family); + if (!keysize) + goto nocache; + /* Packet really early in init? Making flow_cache_init a * pre-smp initcall would solve this. --RR */ if (!fcp->hash_table) @@ -230,11 +233,11 @@ flow_cache_lookup(struct net *net, const struct flowi *key, u16 family, u8 dir, if (fcp->hash_rnd_recalc) flow_new_hash_rnd(fc, fcp); - hash = flow_hash_code(fc, fcp, key); + hash = flow_hash_code(fc, fcp, key, keysize); hlist_for_each_entry(tfle, entry, &fcp->hash_table[hash], u.hlist) { if (tfle->family == family && tfle->dir == dir && - flow_key_compare(key, &tfle->key) == 0) { + flow_key_compare(key, &tfle->key, keysize) == 0) { fle = tfle; break; } @@ -248,7 +251,7 @@ flow_cache_lookup(struct net *net, const struct flowi *key, u16 family, u8 dir, if (fle) { fle->family = family; fle->dir = dir; - memcpy(&fle->key, key, sizeof(*key)); + memcpy(&fle->key, key, keysize * sizeof(flow_compare_t)); fle->object = NULL; hlist_add_head(&fle->u.hlist, &fcp->hash_table[hash]); fcp->hash_count++; -- 1.7.7.2 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH] ipv4: fix ipsec forward performance regression 2011-10-23 7:58 [PATCH] ipv4: fix ipsec forward performance regression Yan, Zheng 2011-10-23 9:03 ` Eric Dumazet @ 2011-10-23 14:52 ` Julian Anastasov 2011-10-24 0:41 ` Yan, Zheng 1 sibling, 1 reply; 16+ messages in thread From: Julian Anastasov @ 2011-10-23 14:52 UTC (permalink / raw) To: Yan, Zheng; +Cc: netdev, davem, eric.dumazet, Kim Phillips Hello, On Sun, 23 Oct 2011, Yan, Zheng wrote: > There is bug in commit 5e2b61f(ipv4: Remove flowi from struct rtable). > It makes xfrm4_fill_dst() modify wrong data structure. > > Signed-off-by: Zheng Yan <zheng.z.yan@intel.com> > --- > net/ipv4/xfrm4_policy.c | 14 +++++++------- > 1 files changed, 7 insertions(+), 7 deletions(-) > > diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c > index fc5368a..a0b4c5d 100644 > --- a/net/ipv4/xfrm4_policy.c > +++ b/net/ipv4/xfrm4_policy.c > @@ -79,13 +79,13 @@ static int xfrm4_fill_dst(struct xfrm_dst *xdst, struct net_device *dev, > struct rtable *rt = (struct rtable *)xdst->route; > const struct flowi4 *fl4 = &fl->u.ip4; > > - rt->rt_key_dst = fl4->daddr; > - rt->rt_key_src = fl4->saddr; > - rt->rt_key_tos = fl4->flowi4_tos; > - rt->rt_route_iif = fl4->flowi4_iif; > - rt->rt_iif = fl4->flowi4_iif; > - rt->rt_oif = fl4->flowi4_oif; > - rt->rt_mark = fl4->flowi4_mark; > + xdst->u.rt.rt_key_dst = fl4->daddr; > + xdst->u.rt.rt_key_src = fl4->saddr; > + xdst->u.rt.rt_key_tos = fl4->flowi4_tos; > + xdst->u.rt.rt_route_iif = fl4->flowi4_iif; May be I'm missing something but I don't see where flowi4_iif is set for the forwarding case. __xfrm_route_forward calls xfrm_decode_session which does not appear to set flowi4_iif. When providing fl4 for output routes flowi4_iif is always set to 0, so it represents rt_route_iif. But then there are 2 variants for __ip_route_output_key: - ip_route_output_slow sets flowi4_iif to loopback and flowi4_oif to outdev during lookup but never restores them to original values. It is assumed that caller uses outdev from dst, not from flowi4_oif. - for cached route we do not update flowi4_iif and flowi4_oif in __ip_route_output_key, so the resulting fl4 can not be used for these values. I assume, the current rules are that only fl4.saddr and daddr are updated while flowi4_iif and flowi4_oif are not. It looks wrong flowi code to rely on them. Currently, we have 3 values for devices: rt_iif: indev for input routes, resulting outdev for output routes which plays the role as indev for loopback traffic. rt_oif: original outdev key, 0 for input routes, can be 0 for output routes if socket is not bound to oif rt_route_iif: indev for input routes, 0 for output routes With above rules for flowi4_iif and flowi4_oif it is impossible to select value for rt_iif from fl4. I don't know the xfrm code well, may be after the mentioned change we damaged rt_oif and rt_route_iif values for cached dst which can lead to using slow path all the time. Even if rt_intern_hash() avoids caching similar dsts multiple times, if cached entry is damaged we will add more and more new entries after every damage. > + xdst->u.rt.rt_iif = fl4->flowi4_iif; > + xdst->u.rt.rt_oif = fl4->flowi4_oif; > + xdst->u.rt.rt_mark = fl4->flowi4_mark; > > xdst->u.dst.dev = dev; > dev_hold(dev); Regards -- Julian Anastasov <ja@ssi.bg> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] ipv4: fix ipsec forward performance regression 2011-10-23 14:52 ` [PATCH] ipv4: fix ipsec forward performance regression Julian Anastasov @ 2011-10-24 0:41 ` Yan, Zheng 2011-10-24 7:01 ` David Miller 0 siblings, 1 reply; 16+ messages in thread From: Yan, Zheng @ 2011-10-24 0:41 UTC (permalink / raw) To: Julian Anastasov; +Cc: netdev, davem, eric.dumazet, Kim Phillips On 10/23/2011 10:52 PM, Julian Anastasov wrote: > > Hello, > > On Sun, 23 Oct 2011, Yan, Zheng wrote: > >> There is bug in commit 5e2b61f(ipv4: Remove flowi from struct rtable). >> It makes xfrm4_fill_dst() modify wrong data structure. >> >> Signed-off-by: Zheng Yan <zheng.z.yan@intel.com> >> --- >> net/ipv4/xfrm4_policy.c | 14 +++++++------- >> 1 files changed, 7 insertions(+), 7 deletions(-) >> >> diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c >> index fc5368a..a0b4c5d 100644 >> --- a/net/ipv4/xfrm4_policy.c >> +++ b/net/ipv4/xfrm4_policy.c >> @@ -79,13 +79,13 @@ static int xfrm4_fill_dst(struct xfrm_dst *xdst, struct net_device *dev, >> struct rtable *rt = (struct rtable *)xdst->route; >> const struct flowi4 *fl4 = &fl->u.ip4; >> >> - rt->rt_key_dst = fl4->daddr; >> - rt->rt_key_src = fl4->saddr; >> - rt->rt_key_tos = fl4->flowi4_tos; >> - rt->rt_route_iif = fl4->flowi4_iif; >> - rt->rt_iif = fl4->flowi4_iif; >> - rt->rt_oif = fl4->flowi4_oif; >> - rt->rt_mark = fl4->flowi4_mark; >> + xdst->u.rt.rt_key_dst = fl4->daddr; >> + xdst->u.rt.rt_key_src = fl4->saddr; >> + xdst->u.rt.rt_key_tos = fl4->flowi4_tos; >> + xdst->u.rt.rt_route_iif = fl4->flowi4_iif; > > May be I'm missing something but I don't see where > flowi4_iif is set for the forwarding case. __xfrm_route_forward > calls xfrm_decode_session which does not appear to set > flowi4_iif. When providing fl4 for output routes flowi4_iif > is always set to 0, so it represents rt_route_iif. But > then there are 2 variants for __ip_route_output_key: > > - ip_route_output_slow sets flowi4_iif to loopback and > flowi4_oif to outdev during lookup but never restores them > to original values. It is assumed that caller uses outdev > from dst, not from flowi4_oif. > > - for cached route we do not update flowi4_iif and flowi4_oif > in __ip_route_output_key, so the resulting fl4 can not be > used for these values. I assume, the current rules are that > only fl4.saddr and daddr are updated while flowi4_iif and > flowi4_oif are not. It looks wrong flowi code to rely on them. > > Currently, we have 3 values for devices: > > rt_iif: indev for input routes, resulting outdev for output routes > which plays the role as indev for loopback traffic. > > rt_oif: original outdev key, 0 for input routes, can be 0 for > output routes if socket is not bound to oif > > rt_route_iif: indev for input routes, 0 for output routes > > With above rules for flowi4_iif and flowi4_oif > it is impossible to select value for rt_iif from fl4. > > I don't know the xfrm code well, may be after the Neither do I. My understanding is that xfrm_dst(s) are managed by the flow cache (net/core/flow.c). We don't put them into the routing cache. Regards Yan, Zheng > mentioned change we damaged rt_oif and rt_route_iif values > for cached dst which can lead to using slow path all the time. > Even if rt_intern_hash() avoids caching similar dsts multiple > times, if cached entry is damaged we will add more and > more new entries after every damage. > >> + xdst->u.rt.rt_iif = fl4->flowi4_iif; >> + xdst->u.rt.rt_oif = fl4->flowi4_oif; >> + xdst->u.rt.rt_mark = fl4->flowi4_mark; >> >> xdst->u.dst.dev = dev; >> dev_hold(dev); > > Regards > > -- > Julian Anastasov <ja@ssi.bg> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] ipv4: fix ipsec forward performance regression 2011-10-24 0:41 ` Yan, Zheng @ 2011-10-24 7:01 ` David Miller 0 siblings, 0 replies; 16+ messages in thread From: David Miller @ 2011-10-24 7:01 UTC (permalink / raw) To: zheng.z.yan; +Cc: ja, netdev, eric.dumazet, kim.phillips From: "Yan, Zheng" <zheng.z.yan@intel.com> Date: Mon, 24 Oct 2011 08:41:53 +0800 > My understanding is that xfrm_dst(s) are managed by the flow cache > (net/core/flow.c). We don't put them into the routing cache. This is correct. ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2011-11-08 19:45 UTC | newest] Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2011-10-23 7:58 [PATCH] ipv4: fix ipsec forward performance regression Yan, Zheng 2011-10-23 9:03 ` Eric Dumazet 2011-10-24 7:02 ` David Miller 2011-11-01 23:50 ` Kim Phillips 2011-11-02 0:34 ` David Miller 2011-11-03 18:58 ` Kim Phillips 2011-11-04 2:43 ` David Miller 2011-11-04 19:46 ` [stable] net: Handle different key sizes between address families in flow cache Kim Phillips 2011-11-04 20:41 ` David Miller 2011-11-04 21:24 ` Greg KH 2011-11-05 2:29 ` Kim Phillips 2011-11-08 16:53 ` Greg KH 2011-11-08 19:44 ` [stable v2] " Kim Phillips 2011-10-23 14:52 ` [PATCH] ipv4: fix ipsec forward performance regression Julian Anastasov 2011-10-24 0:41 ` Yan, Zheng 2011-10-24 7:01 ` David Miller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).