netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Help needed - Kernel lockup while running ipsec
@ 2019-08-19 12:55 Vakul Garg
  2019-08-19 17:38 ` Florian Westphal
  0 siblings, 1 reply; 15+ messages in thread
From: Vakul Garg @ 2019-08-19 12:55 UTC (permalink / raw)
  To: netdev

Hi

With kernel 4.14.122, I am getting a kernel softlockup while running single static ipsec tunnel.
The problem reproduces mostly after running 8-10 hours of ipsec encap test (on my dual core arm board).

I found that in function xfrm_policy_lookup_bytype(), the policy in variable 'ret' shows refcnt=0 under problem situation.
This creates an infinite loop in  xfrm_policy_lookup_bytype() and hence the lockup.

Can some body please provide me pointers about 'refcnt'?
Is it legitimate for 'refcnt' to become '0'? Under what condition can it become '0'?

Regards
Vakul

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Help needed - Kernel lockup while running ipsec
  2019-08-19 12:55 Help needed - Kernel lockup while running ipsec Vakul Garg
@ 2019-08-19 17:38 ` Florian Westphal
  2019-08-20  9:10   ` Vakul Garg
  0 siblings, 1 reply; 15+ messages in thread
From: Florian Westphal @ 2019-08-19 17:38 UTC (permalink / raw)
  To: Vakul Garg; +Cc: netdev

Vakul Garg <vakul.garg@nxp.com> wrote:
> Hi
> 
> With kernel 4.14.122, I am getting a kernel softlockup while running single static ipsec tunnel.
> The problem reproduces mostly after running 8-10 hours of ipsec encap test (on my dual core arm board).
> 
> I found that in function xfrm_policy_lookup_bytype(), the policy in variable 'ret' shows refcnt=0 under problem situation.
> This creates an infinite loop in  xfrm_policy_lookup_bytype() and hence the lockup.
> 
> Can some body please provide me pointers about 'refcnt'?
> Is it legitimate for 'refcnt' to become '0'? Under what condition can it become '0'?

Yes, when policy is destroyed and the last user calls
xfrm_pol_put() which will invoke call_rcu to free the structure.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: Help needed - Kernel lockup while running ipsec
  2019-08-19 17:38 ` Florian Westphal
@ 2019-08-20  9:10   ` Vakul Garg
  2019-08-20  9:23     ` Florian Westphal
  0 siblings, 1 reply; 15+ messages in thread
From: Vakul Garg @ 2019-08-20  9:10 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev

Thanks for your response.


> -----Original Message-----
> From: Florian Westphal <fw@strlen.de>
> Sent: Monday, August 19, 2019 11:08 PM
> To: Vakul Garg <vakul.garg@nxp.com>
> Cc: netdev@vger.kernel.org
> Subject: Re: Help needed - Kernel lockup while running ipsec
> 
> Vakul Garg <vakul.garg@nxp.com> wrote:
> > Hi
> >
> > With kernel 4.14.122, I am getting a kernel softlockup while running single
> static ipsec tunnel.
> > The problem reproduces mostly after running 8-10 hours of ipsec encap
> test (on my dual core arm board).
> >
> > I found that in function xfrm_policy_lookup_bytype(), the policy in variable
> 'ret' shows refcnt=0 under problem situation.
> > This creates an infinite loop in  xfrm_policy_lookup_bytype() and hence the
> lockup.
> >
> > Can some body please provide me pointers about 'refcnt'?
> > Is it legitimate for 'refcnt' to become '0'? Under what condition can it
> become '0'?
> 
> Yes, when policy is destroyed and the last user calls
> xfrm_pol_put() which will invoke call_rcu to free the structure.

It seems that policy reference count never gets decremented during packet ipsec encap.
It is getting incremented for every frame that hits the policy.
In setkey -DP output, I see refcnt to be wrapping around after '0'.

Is this designed to be like this or is it weird?


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Help needed - Kernel lockup while running ipsec
  2019-08-20  9:10   ` Vakul Garg
@ 2019-08-20  9:23     ` Florian Westphal
  2019-08-20  9:26       ` Vakul Garg
  0 siblings, 1 reply; 15+ messages in thread
From: Florian Westphal @ 2019-08-20  9:23 UTC (permalink / raw)
  To: Vakul Garg; +Cc: Florian Westphal, netdev

Vakul Garg <vakul.garg@nxp.com> wrote:
> > > With kernel 4.14.122, I am getting a kernel softlockup while running single
> > static ipsec tunnel.
> > > The problem reproduces mostly after running 8-10 hours of ipsec encap
> > test (on my dual core arm board).
> > >
> > > I found that in function xfrm_policy_lookup_bytype(), the policy in variable
> > 'ret' shows refcnt=0 under problem situation.
> > > This creates an infinite loop in  xfrm_policy_lookup_bytype() and hence the
> > lockup.
> > >
> > > Can some body please provide me pointers about 'refcnt'?
> > > Is it legitimate for 'refcnt' to become '0'? Under what condition can it
> > become '0'?
> > 
> > Yes, when policy is destroyed and the last user calls
> > xfrm_pol_put() which will invoke call_rcu to free the structure.
> 
> It seems that policy reference count never gets decremented during packet ipsec encap.
> It is getting incremented for every frame that hits the policy.
> In setkey -DP output, I see refcnt to be wrapping around after '0'.

Thats a bug.  Does this affect 4.14 only or does this happen on current
tree as well?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: Help needed - Kernel lockup while running ipsec
  2019-08-20  9:23     ` Florian Westphal
@ 2019-08-20  9:26       ` Vakul Garg
  2019-08-20  9:30         ` Vakul Garg
  2019-08-20  9:38         ` Florian Westphal
  0 siblings, 2 replies; 15+ messages in thread
From: Vakul Garg @ 2019-08-20  9:26 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev



> -----Original Message-----
> From: Florian Westphal <fw@strlen.de>
> Sent: Tuesday, August 20, 2019 2:53 PM
> To: Vakul Garg <vakul.garg@nxp.com>
> Cc: Florian Westphal <fw@strlen.de>; netdev@vger.kernel.org
> Subject: Re: Help needed - Kernel lockup while running ipsec
> 
> Vakul Garg <vakul.garg@nxp.com> wrote:
> > > > With kernel 4.14.122, I am getting a kernel softlockup while
> > > > running single
> > > static ipsec tunnel.
> > > > The problem reproduces mostly after running 8-10 hours of ipsec
> > > > encap
> > > test (on my dual core arm board).
> > > >
> > > > I found that in function xfrm_policy_lookup_bytype(), the policy
> > > > in variable
> > > 'ret' shows refcnt=0 under problem situation.
> > > > This creates an infinite loop in  xfrm_policy_lookup_bytype() and
> > > > hence the
> > > lockup.
> > > >
> > > > Can some body please provide me pointers about 'refcnt'?
> > > > Is it legitimate for 'refcnt' to become '0'? Under what condition
> > > > can it
> > > become '0'?
> > >
> > > Yes, when policy is destroyed and the last user calls
> > > xfrm_pol_put() which will invoke call_rcu to free the structure.
> >
> > It seems that policy reference count never gets decremented during packet
> ipsec encap.
> > It is getting incremented for every frame that hits the policy.
> > In setkey -DP output, I see refcnt to be wrapping around after '0'.
> 
> Thats a bug.  Does this affect 4.14 only or does this happen on current tree
> as well?
 
I am yet to try it on 4.19.
Can you help me with the right fix? Which part of code should it get decremented?
I am not conversant with xfrm code.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: Help needed - Kernel lockup while running ipsec
  2019-08-20  9:26       ` Vakul Garg
@ 2019-08-20  9:30         ` Vakul Garg
  2019-08-20  9:38         ` Florian Westphal
  1 sibling, 0 replies; 15+ messages in thread
From: Vakul Garg @ 2019-08-20  9:30 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev



> > -----Original Message-----
> > From: Florian Westphal <fw@strlen.de>
> > Sent: Tuesday, August 20, 2019 2:53 PM
> > To: Vakul Garg <vakul.garg@nxp.com>
> > Cc: Florian Westphal <fw@strlen.de>; netdev@vger.kernel.org
> > Subject: Re: Help needed - Kernel lockup while running ipsec
> >
> > Vakul Garg <vakul.garg@nxp.com> wrote:
> > > > > With kernel 4.14.122, I am getting a kernel softlockup while
> > > > > running single
> > > > static ipsec tunnel.
> > > > > The problem reproduces mostly after running 8-10 hours of ipsec
> > > > > encap
> > > > test (on my dual core arm board).
> > > > >
> > > > > I found that in function xfrm_policy_lookup_bytype(), the policy
> > > > > in variable
> > > > 'ret' shows refcnt=0 under problem situation.
> > > > > This creates an infinite loop in  xfrm_policy_lookup_bytype() and
> > > > > hence the
> > > > lockup.
> > > > >
> > > > > Can some body please provide me pointers about 'refcnt'?
> > > > > Is it legitimate for 'refcnt' to become '0'? Under what condition
> > > > > can it
> > > > become '0'?
> > > >
> > > > Yes, when policy is destroyed and the last user calls
> > > > xfrm_pol_put() which will invoke call_rcu to free the structure.
> > >
> > > It seems that policy reference count never gets decremented during
> packet
> > ipsec encap.
> > > It is getting incremented for every frame that hits the policy.
> > > In setkey -DP output, I see refcnt to be wrapping around after '0'.
> >
> > Thats a bug.  Does this affect 4.14 only or does this happen on current tree
> > as well?
> 
> I am yet to try it on 4.19.

Correction: I am yet to try it on current tree.

> Can you help me with the right fix? Which part of code should it get
> decremented?
> I am not conversant with xfrm code.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Help needed - Kernel lockup while running ipsec
  2019-08-20  9:26       ` Vakul Garg
  2019-08-20  9:30         ` Vakul Garg
@ 2019-08-20  9:38         ` Florian Westphal
  2019-08-20  9:52           ` Vakul Garg
  1 sibling, 1 reply; 15+ messages in thread
From: Florian Westphal @ 2019-08-20  9:38 UTC (permalink / raw)
  To: Vakul Garg; +Cc: Florian Westphal, netdev

Vakul Garg <vakul.garg@nxp.com> wrote:
> 
> 
> > -----Original Message-----
> > From: Florian Westphal <fw@strlen.de>
> > Sent: Tuesday, August 20, 2019 2:53 PM
> > To: Vakul Garg <vakul.garg@nxp.com>
> > Cc: Florian Westphal <fw@strlen.de>; netdev@vger.kernel.org
> > Subject: Re: Help needed - Kernel lockup while running ipsec
> > 
> > Vakul Garg <vakul.garg@nxp.com> wrote:
> > > > > With kernel 4.14.122, I am getting a kernel softlockup while
> > > > > running single
> > > > static ipsec tunnel.
> > > > > The problem reproduces mostly after running 8-10 hours of ipsec
> > > > > encap
> > > > test (on my dual core arm board).
> > > > >
> > > > > I found that in function xfrm_policy_lookup_bytype(), the policy
> > > > > in variable
> > > > 'ret' shows refcnt=0 under problem situation.
> > > > > This creates an infinite loop in  xfrm_policy_lookup_bytype() and
> > > > > hence the
> > > > lockup.
> > > > >
> > > > > Can some body please provide me pointers about 'refcnt'?
> > > > > Is it legitimate for 'refcnt' to become '0'? Under what condition
> > > > > can it
> > > > become '0'?
> > > >
> > > > Yes, when policy is destroyed and the last user calls
> > > > xfrm_pol_put() which will invoke call_rcu to free the structure.
> > >
> > > It seems that policy reference count never gets decremented during packet
> > ipsec encap.
> > > It is getting incremented for every frame that hits the policy.
> > > In setkey -DP output, I see refcnt to be wrapping around after '0'.
> > 
> > Thats a bug.  Does this affect 4.14 only or does this happen on current tree
> > as well?
>  
> I am yet to try it on 4.19.
> Can you help me with the right fix? Which part of code should it get decremented?
> I am not conversant with xfrm code.

Normally policy reference counts get decremented when the skb is free'd, via dst
destruction (xfrm_dst_destroy()).

Do you see a dst leak as well?


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: Help needed - Kernel lockup while running ipsec
  2019-08-20  9:38         ` Florian Westphal
@ 2019-08-20  9:52           ` Vakul Garg
  2019-08-20 10:38             ` Vakul Garg
  0 siblings, 1 reply; 15+ messages in thread
From: Vakul Garg @ 2019-08-20  9:52 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev



> -----Original Message-----
> From: Florian Westphal <fw@strlen.de>
> Sent: Tuesday, August 20, 2019 3:08 PM
> To: Vakul Garg <vakul.garg@nxp.com>
> Cc: Florian Westphal <fw@strlen.de>; netdev@vger.kernel.org
> Subject: Re: Help needed - Kernel lockup while running ipsec
> 
> Vakul Garg <vakul.garg@nxp.com> wrote:
> >
> >
> > > -----Original Message-----
> > > From: Florian Westphal <fw@strlen.de>
> > > Sent: Tuesday, August 20, 2019 2:53 PM
> > > To: Vakul Garg <vakul.garg@nxp.com>
> > > Cc: Florian Westphal <fw@strlen.de>; netdev@vger.kernel.org
> > > Subject: Re: Help needed - Kernel lockup while running ipsec
> > >
> > > Vakul Garg <vakul.garg@nxp.com> wrote:
> > > > > > With kernel 4.14.122, I am getting a kernel softlockup while
> > > > > > running single
> > > > > static ipsec tunnel.
> > > > > > The problem reproduces mostly after running 8-10 hours of
> > > > > > ipsec encap
> > > > > test (on my dual core arm board).
> > > > > >
> > > > > > I found that in function xfrm_policy_lookup_bytype(), the
> > > > > > policy in variable
> > > > > 'ret' shows refcnt=0 under problem situation.
> > > > > > This creates an infinite loop in  xfrm_policy_lookup_bytype()
> > > > > > and hence the
> > > > > lockup.
> > > > > >
> > > > > > Can some body please provide me pointers about 'refcnt'?
> > > > > > Is it legitimate for 'refcnt' to become '0'? Under what
> > > > > > condition can it
> > > > > become '0'?
> > > > >
> > > > > Yes, when policy is destroyed and the last user calls
> > > > > xfrm_pol_put() which will invoke call_rcu to free the structure.
> > > >
> > > > It seems that policy reference count never gets decremented during
> > > > packet
> > > ipsec encap.
> > > > It is getting incremented for every frame that hits the policy.
> > > > In setkey -DP output, I see refcnt to be wrapping around after '0'.
> > >
> > > Thats a bug.  Does this affect 4.14 only or does this happen on
> > > current tree as well?
> >
> > I am yet to try it on 4.19.
> > Can you help me with the right fix? Which part of code should it get
> decremented?
> > I am not conversant with xfrm code.
> 
> Normally policy reference counts get decremented when the skb is free'd, via
> dst destruction (xfrm_dst_destroy()).
> 
> Do you see a dst leak as well?

Can you please guide me how to detect it?

(I am checking refcount on recent kernel and will let you know.)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: Help needed - Kernel lockup while running ipsec
  2019-08-20  9:52           ` Vakul Garg
@ 2019-08-20 10:38             ` Vakul Garg
  2019-08-21  7:37               ` Vakul Garg
  0 siblings, 1 reply; 15+ messages in thread
From: Vakul Garg @ 2019-08-20 10:38 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev



> 
> > -----Original Message-----
> > From: Florian Westphal <fw@strlen.de>
> > Sent: Tuesday, August 20, 2019 3:08 PM
> > To: Vakul Garg <vakul.garg@nxp.com>
> > Cc: Florian Westphal <fw@strlen.de>; netdev@vger.kernel.org
> > Subject: Re: Help needed - Kernel lockup while running ipsec
> >
> > Vakul Garg <vakul.garg@nxp.com> wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Florian Westphal <fw@strlen.de>
> > > > Sent: Tuesday, August 20, 2019 2:53 PM
> > > > To: Vakul Garg <vakul.garg@nxp.com>
> > > > Cc: Florian Westphal <fw@strlen.de>; netdev@vger.kernel.org
> > > > Subject: Re: Help needed - Kernel lockup while running ipsec
> > > >
> > > > Vakul Garg <vakul.garg@nxp.com> wrote:
> > > > > > > With kernel 4.14.122, I am getting a kernel softlockup while
> > > > > > > running single
> > > > > > static ipsec tunnel.
> > > > > > > The problem reproduces mostly after running 8-10 hours of
> > > > > > > ipsec encap
> > > > > > test (on my dual core arm board).
> > > > > > >
> > > > > > > I found that in function xfrm_policy_lookup_bytype(), the
> > > > > > > policy in variable
> > > > > > 'ret' shows refcnt=0 under problem situation.
> > > > > > > This creates an infinite loop in  xfrm_policy_lookup_bytype()
> > > > > > > and hence the
> > > > > > lockup.
> > > > > > >
> > > > > > > Can some body please provide me pointers about 'refcnt'?
> > > > > > > Is it legitimate for 'refcnt' to become '0'? Under what
> > > > > > > condition can it
> > > > > > become '0'?
> > > > > >
> > > > > > Yes, when policy is destroyed and the last user calls
> > > > > > xfrm_pol_put() which will invoke call_rcu to free the structure.
> > > > >
> > > > > It seems that policy reference count never gets decremented during
> > > > > packet
> > > > ipsec encap.
> > > > > It is getting incremented for every frame that hits the policy.
> > > > > In setkey -DP output, I see refcnt to be wrapping around after '0'.
> > > >
> > > > Thats a bug.  Does this affect 4.14 only or does this happen on
> > > > current tree as well?
> > >
> > > I am yet to try it on 4.19.
> > > Can you help me with the right fix? Which part of code should it get
> > decremented?
> > > I am not conversant with xfrm code.
> >
> > Normally policy reference counts get decremented when the skb is free'd,
> via
> > dst destruction (xfrm_dst_destroy()).
> >
> > Do you see a dst leak as well?
> 
> Can you please guide me how to detect it?
> 
> (I am checking refcount on recent kernel and will let you know.)

Policy refcount is decreasing properly on 4.19.
Same should be on the latest kernel too.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: Help needed - Kernel lockup while running ipsec
  2019-08-20 10:38             ` Vakul Garg
@ 2019-08-21  7:37               ` Vakul Garg
  2019-08-21 16:11                 ` Florian Westphal
  0 siblings, 1 reply; 15+ messages in thread
From: Vakul Garg @ 2019-08-21  7:37 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev



> -----Original Message-----
> From: Vakul Garg
> Sent: Tuesday, August 20, 2019 4:08 PM
> To: Florian Westphal <fw@strlen.de>
> Cc: netdev@vger.kernel.org
> Subject: RE: Help needed - Kernel lockup while running ipsec
> 
> 
> 
> >
> > > -----Original Message-----
> > > From: Florian Westphal <fw@strlen.de>
> > > Sent: Tuesday, August 20, 2019 3:08 PM
> > > To: Vakul Garg <vakul.garg@nxp.com>
> > > Cc: Florian Westphal <fw@strlen.de>; netdev@vger.kernel.org
> > > Subject: Re: Help needed - Kernel lockup while running ipsec
> > >
> > > Vakul Garg <vakul.garg@nxp.com> wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Florian Westphal <fw@strlen.de>
> > > > > Sent: Tuesday, August 20, 2019 2:53 PM
> > > > > To: Vakul Garg <vakul.garg@nxp.com>
> > > > > Cc: Florian Westphal <fw@strlen.de>; netdev@vger.kernel.org
> > > > > Subject: Re: Help needed - Kernel lockup while running ipsec
> > > > >
> > > > > Vakul Garg <vakul.garg@nxp.com> wrote:
> > > > > > > > With kernel 4.14.122, I am getting a kernel softlockup while
> > > > > > > > running single
> > > > > > > static ipsec tunnel.
> > > > > > > > The problem reproduces mostly after running 8-10 hours of
> > > > > > > > ipsec encap
> > > > > > > test (on my dual core arm board).
> > > > > > > >
> > > > > > > > I found that in function xfrm_policy_lookup_bytype(), the
> > > > > > > > policy in variable
> > > > > > > 'ret' shows refcnt=0 under problem situation.
> > > > > > > > This creates an infinite loop in  xfrm_policy_lookup_bytype()
> > > > > > > > and hence the
> > > > > > > lockup.
> > > > > > > >
> > > > > > > > Can some body please provide me pointers about 'refcnt'?
> > > > > > > > Is it legitimate for 'refcnt' to become '0'? Under what
> > > > > > > > condition can it
> > > > > > > become '0'?
> > > > > > >
> > > > > > > Yes, when policy is destroyed and the last user calls
> > > > > > > xfrm_pol_put() which will invoke call_rcu to free the structure.
> > > > > >
> > > > > > It seems that policy reference count never gets decremented during
> > > > > > packet
> > > > > ipsec encap.
> > > > > > It is getting incremented for every frame that hits the policy.
> > > > > > In setkey -DP output, I see refcnt to be wrapping around after '0'.
> > > > >
> > > > > Thats a bug.  Does this affect 4.14 only or does this happen on
> > > > > current tree as well?
> > > >
> > > > I am yet to try it on 4.19.
> > > > Can you help me with the right fix? Which part of code should it get
> > > decremented?
> > > > I am not conversant with xfrm code.
> > >
> > > Normally policy reference counts get decremented when the skb is
> free'd,
> > via
> > > dst destruction (xfrm_dst_destroy()).
> > >
> > > Do you see a dst leak as well?
> >
> > Can you please guide me how to detect it?
> >
> > (I am checking refcount on recent kernel and will let you know.)
> 
> Policy refcount is decreasing properly on 4.19.
> Same should be on the latest kernel too.

On kernel-4.14, I find dst_release() is getting called through xfrm_output_one().
However since dst->__refcnt gets decremented to '1', 
the call_rcu(&dst->rcu_head, dst_destroy_rcu) is not invoked. 

On kernel-4.19, dst->__refcnt gets decremented to '0', hence things fall in place and 
dst_destroy_rcu() eventually executes.

Any further help/pointers for kernel-4.14 would be deeply appreciated.




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Help needed - Kernel lockup while running ipsec
  2019-08-21  7:37               ` Vakul Garg
@ 2019-08-21 16:11                 ` Florian Westphal
  2019-08-22 10:23                   ` Vakul Garg
  0 siblings, 1 reply; 15+ messages in thread
From: Florian Westphal @ 2019-08-21 16:11 UTC (permalink / raw)
  To: Vakul Garg; +Cc: Florian Westphal, netdev

[-- Attachment #1: Type: text/plain, Size: 759 bytes --]

Vakul Garg <vakul.garg@nxp.com> wrote:
> > Policy refcount is decreasing properly on 4.19.
> > Same should be on the latest kernel too.
> 
> On kernel-4.14, I find dst_release() is getting called through xfrm_output_one().
> However since dst->__refcnt gets decremented to '1', 
> the call_rcu(&dst->rcu_head, dst_destroy_rcu) is not invoked. 
> 
> On kernel-4.19, dst->__refcnt gets decremented to '0', hence things fall in place and 
> dst_destroy_rcu() eventually executes.
> 
> Any further help/pointers for kernel-4.14 would be deeply appreciated.

Can you try getting rid of the pcpu dst cache?

I had a look at 4.14-stable and it at least lacks 2950278d2d04ff531.

I've attached an (untested) revert of the pcpu cache (its gone in 4.19
and onwards).



[-- Attachment #2: 0001-xfrm-policy-remove-pcpu-policy-cache.patch --]
[-- Type: text/x-diff, Size: 8993 bytes --]

From 058cb6719223d10dc57743dbf5c20424f118e7e7 Mon Sep 17 00:00:00 2001
From: Florian Westphal <fw@strlen.de>
Date: Mon, 25 Jun 2018 17:26:02 +0200
Subject: [PATCH 4.4.14.y] xfrm: policy: remove pcpu policy cache

commit e4db5b61c572475bbbcf63e3c8a2606bfccf2c9d upstream.

Kristian Evensen says:
  In a project I am involved in, we are running ipsec (Strongswan) on
  different mt7621-based routers. Each router is configured as an
  initiator and has around ~30 tunnels to different responders (running
  on misc. devices). Before the flow cache was removed (kernel 4.9), we
  got a combined throughput of around 70Mbit/s for all tunnels on one
  router. However, we recently switched to kernel 4.14 (4.14.48), and
  the total throughput is somewhere around 57Mbit/s (best-case). I.e., a
  drop of around 20%. Reverting the flow cache removal restores, as
  expected, performance levels to that of kernel 4.9.

When pcpu xdst exists, it has to be validated first before it can be
used.

A negative hit thus increases cost vs. no-cache.

As number of tunnels increases, hit rate decreases so this pcpu caching
isn't a viable strategy.

Furthermore, the xdst cache also needs to run with BH off, so when
removing this the bh disable/enable pairs can be removed too.

Kristian tested a 4.14.y backport of this change and reported
increased performance:

  In our tests, the throughput reduction has been reduced from around -20%
  to -5%. We also see that the overall throughput is independent of the
  number of tunnels, while before the throughput was reduced as the number
  of tunnels increased.

Reported-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/net/xfrm.h     |   1 -
 net/xfrm/xfrm_device.c |  10 ---
 net/xfrm/xfrm_policy.c | 138 +----------------------------------------
 net/xfrm/xfrm_state.c  |   5 +-
 4 files changed, 3 insertions(+), 151 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index db99efb2d1d0..bdf185ae93db 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -323,7 +323,6 @@ int xfrm_policy_register_afinfo(const struct xfrm_policy_afinfo *afinfo, int fam
 void xfrm_policy_unregister_afinfo(const struct xfrm_policy_afinfo *afinfo);
 void km_policy_notify(struct xfrm_policy *xp, int dir,
 		      const struct km_event *c);
-void xfrm_policy_cache_flush(void);
 void km_state_notify(struct xfrm_state *x, const struct km_event *c);
 
 struct xfrm_tmpl;
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 30e5746085b8..4e458fd9236a 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -153,12 +153,6 @@ static int xfrm_dev_register(struct net_device *dev)
 	return NOTIFY_DONE;
 }
 
-static int xfrm_dev_unregister(struct net_device *dev)
-{
-	xfrm_policy_cache_flush();
-	return NOTIFY_DONE;
-}
-
 static int xfrm_dev_feat_change(struct net_device *dev)
 {
 	if ((dev->features & NETIF_F_HW_ESP) && !dev->xfrmdev_ops)
@@ -178,7 +172,6 @@ static int xfrm_dev_down(struct net_device *dev)
 	if (dev->features & NETIF_F_HW_ESP)
 		xfrm_dev_state_flush(dev_net(dev), dev, true);
 
-	xfrm_policy_cache_flush();
 	return NOTIFY_DONE;
 }
 
@@ -190,9 +183,6 @@ static int xfrm_dev_event(struct notifier_block *this, unsigned long event, void
 	case NETDEV_REGISTER:
 		return xfrm_dev_register(dev);
 
-	case NETDEV_UNREGISTER:
-		return xfrm_dev_unregister(dev);
-
 	case NETDEV_FEAT_CHANGE:
 		return xfrm_dev_feat_change(dev);
 
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 70ec57b887f6..b5006a091fd6 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -45,8 +45,6 @@ struct xfrm_flo {
 	u8 flags;
 };
 
-static DEFINE_PER_CPU(struct xfrm_dst *, xfrm_last_dst);
-static struct work_struct *xfrm_pcpu_work __read_mostly;
 static DEFINE_SPINLOCK(xfrm_policy_afinfo_lock);
 static struct xfrm_policy_afinfo const __rcu *xfrm_policy_afinfo[AF_INET6 + 1]
 						__read_mostly;
@@ -1715,108 +1713,6 @@ static int xfrm_expand_policies(const struct flowi *fl, u16 family,
 
 }
 
-static void xfrm_last_dst_update(struct xfrm_dst *xdst, struct xfrm_dst *old)
-{
-	this_cpu_write(xfrm_last_dst, xdst);
-	if (old)
-		dst_release(&old->u.dst);
-}
-
-static void __xfrm_pcpu_work_fn(void)
-{
-	struct xfrm_dst *old;
-
-	old = this_cpu_read(xfrm_last_dst);
-	if (old && !xfrm_bundle_ok(old))
-		xfrm_last_dst_update(NULL, old);
-}
-
-static void xfrm_pcpu_work_fn(struct work_struct *work)
-{
-	local_bh_disable();
-	rcu_read_lock();
-	__xfrm_pcpu_work_fn();
-	rcu_read_unlock();
-	local_bh_enable();
-}
-
-void xfrm_policy_cache_flush(void)
-{
-	struct xfrm_dst *old;
-	bool found = 0;
-	int cpu;
-
-	might_sleep();
-
-	local_bh_disable();
-	rcu_read_lock();
-	for_each_possible_cpu(cpu) {
-		old = per_cpu(xfrm_last_dst, cpu);
-		if (old && !xfrm_bundle_ok(old)) {
-			if (smp_processor_id() == cpu) {
-				__xfrm_pcpu_work_fn();
-				continue;
-			}
-			found = true;
-			break;
-		}
-	}
-
-	rcu_read_unlock();
-	local_bh_enable();
-
-	if (!found)
-		return;
-
-	get_online_cpus();
-
-	for_each_possible_cpu(cpu) {
-		bool bundle_release;
-
-		rcu_read_lock();
-		old = per_cpu(xfrm_last_dst, cpu);
-		bundle_release = old && !xfrm_bundle_ok(old);
-		rcu_read_unlock();
-
-		if (!bundle_release)
-			continue;
-
-		if (cpu_online(cpu)) {
-			schedule_work_on(cpu, &xfrm_pcpu_work[cpu]);
-			continue;
-		}
-
-		rcu_read_lock();
-		old = per_cpu(xfrm_last_dst, cpu);
-		if (old && !xfrm_bundle_ok(old)) {
-			per_cpu(xfrm_last_dst, cpu) = NULL;
-			dst_release(&old->u.dst);
-		}
-		rcu_read_unlock();
-	}
-
-	put_online_cpus();
-}
-
-static bool xfrm_xdst_can_reuse(struct xfrm_dst *xdst,
-				struct xfrm_state * const xfrm[],
-				int num)
-{
-	const struct dst_entry *dst = &xdst->u.dst;
-	int i;
-
-	if (xdst->num_xfrms != num)
-		return false;
-
-	for (i = 0; i < num; i++) {
-		if (!dst || dst->xfrm != xfrm[i])
-			return false;
-		dst = dst->child;
-	}
-
-	return xfrm_bundle_ok(xdst);
-}
-
 static struct xfrm_dst *
 xfrm_resolve_and_create_bundle(struct xfrm_policy **pols, int num_pols,
 			       const struct flowi *fl, u16 family,
@@ -1824,7 +1720,7 @@ xfrm_resolve_and_create_bundle(struct xfrm_policy **pols, int num_pols,
 {
 	struct net *net = xp_net(pols[0]);
 	struct xfrm_state *xfrm[XFRM_MAX_DEPTH];
-	struct xfrm_dst *xdst, *old;
+	struct xfrm_dst *xdst;
 	struct dst_entry *dst;
 	int err;
 
@@ -1839,21 +1735,6 @@ xfrm_resolve_and_create_bundle(struct xfrm_policy **pols, int num_pols,
 		return ERR_PTR(err);
 	}
 
-	xdst = this_cpu_read(xfrm_last_dst);
-	if (xdst &&
-	    xdst->u.dst.dev == dst_orig->dev &&
-	    xdst->num_pols == num_pols &&
-	    memcmp(xdst->pols, pols,
-		   sizeof(struct xfrm_policy *) * num_pols) == 0 &&
-	    xfrm_xdst_can_reuse(xdst, xfrm, err)) {
-		dst_hold(&xdst->u.dst);
-		while (err > 0)
-			xfrm_state_put(xfrm[--err]);
-		return xdst;
-	}
-
-	old = xdst;
-
 	dst = xfrm_bundle_create(pols[0], xfrm, err, fl, dst_orig);
 	if (IS_ERR(dst)) {
 		XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTBUNDLEGENERROR);
@@ -1866,9 +1747,6 @@ xfrm_resolve_and_create_bundle(struct xfrm_policy **pols, int num_pols,
 	memcpy(xdst->pols, pols, sizeof(struct xfrm_policy *) * num_pols);
 	xdst->policy_genid = atomic_read(&pols[0]->genid);
 
-	atomic_set(&xdst->u.dst.__refcnt, 2);
-	xfrm_last_dst_update(xdst, old);
-
 	return xdst;
 }
 
@@ -2069,11 +1947,8 @@ xfrm_bundle_lookup(struct net *net, const struct flowi *fl, u16 family, u8 dir,
 	if (num_xfrms <= 0)
 		goto make_dummy_bundle;
 
-	local_bh_disable();
 	xdst = xfrm_resolve_and_create_bundle(pols, num_pols, fl, family,
 					      xflo->dst_orig);
-	local_bh_enable();
-
 	if (IS_ERR(xdst)) {
 		err = PTR_ERR(xdst);
 		if (err != -EAGAIN)
@@ -2160,11 +2035,9 @@ struct dst_entry *xfrm_lookup(struct net *net, struct dst_entry *dst_orig,
 				goto no_transform;
 			}
 
-			local_bh_disable();
 			xdst = xfrm_resolve_and_create_bundle(
 					pols, num_pols, fl,
 					family, dst_orig);
-			local_bh_enable();
 
 			if (IS_ERR(xdst)) {
 				xfrm_pols_put(pols, num_pols);
@@ -2992,15 +2865,6 @@ static struct pernet_operations __net_initdata xfrm_net_ops = {
 
 void __init xfrm_init(void)
 {
-	int i;
-
-	xfrm_pcpu_work = kmalloc_array(NR_CPUS, sizeof(*xfrm_pcpu_work),
-				       GFP_KERNEL);
-	BUG_ON(!xfrm_pcpu_work);
-
-	for (i = 0; i < NR_CPUS; i++)
-		INIT_WORK(&xfrm_pcpu_work[i], xfrm_pcpu_work_fn);
-
 	register_pernet_subsys(&xfrm_net_ops);
 	seqcount_init(&xfrm_policy_hash_generation);
 	xfrm_input_init();
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 0cd2bdf3b217..7c093de68780 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -735,10 +735,9 @@ int xfrm_state_flush(struct net *net, u8 proto, bool task_valid)
 	}
 out:
 	spin_unlock_bh(&net->xfrm.xfrm_state_lock);
-	if (cnt) {
+	if (cnt)
 		err = 0;
-		xfrm_policy_cache_flush();
-	}
+
 	return err;
 }
 EXPORT_SYMBOL(xfrm_state_flush);
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* RE: Help needed - Kernel lockup while running ipsec
  2019-08-21 16:11                 ` Florian Westphal
@ 2019-08-22 10:23                   ` Vakul Garg
  2019-08-22 11:21                     ` [PATCH 4.14.y stable] xfrm: policy: remove pcpu policy cache Florian Westphal
  0 siblings, 1 reply; 15+ messages in thread
From: Vakul Garg @ 2019-08-22 10:23 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev



> -----Original Message-----
> From: Florian Westphal <fw@strlen.de>
> Sent: Wednesday, August 21, 2019 9:42 PM
> To: Vakul Garg <vakul.garg@nxp.com>
> Cc: Florian Westphal <fw@strlen.de>; netdev@vger.kernel.org
> Subject: Re: Help needed - Kernel lockup while running ipsec
> 
> Vakul Garg <vakul.garg@nxp.com> wrote:
> > > Policy refcount is decreasing properly on 4.19.
> > > Same should be on the latest kernel too.
> >
> > On kernel-4.14, I find dst_release() is getting called through
> xfrm_output_one().
> > However since dst->__refcnt gets decremented to '1', the
> > call_rcu(&dst->rcu_head, dst_destroy_rcu) is not invoked.
> >
> > On kernel-4.19, dst->__refcnt gets decremented to '0', hence things
> > fall in place and
> > dst_destroy_rcu() eventually executes.
> >
> > Any further help/pointers for kernel-4.14 would be deeply appreciated.
> 
> Can you try getting rid of the pcpu dst cache?
> 
> I had a look at 4.14-stable and it at least lacks 2950278d2d04ff531.
> 
> I've attached an (untested) revert of the pcpu cache (its gone in 4.19 and
> onwards).
> 

This patch fixed the refcnt issue. Many thanks for your help.
Would you send this patch for inclusion into 4.14-stable?


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 4.14.y stable] xfrm: policy: remove pcpu policy cache
  2019-08-22 10:23                   ` Vakul Garg
@ 2019-08-22 11:21                     ` Florian Westphal
  2019-08-22 13:09                       ` Greg KH
  0 siblings, 1 reply; 15+ messages in thread
From: Florian Westphal @ 2019-08-22 11:21 UTC (permalink / raw)
  To: stable
  Cc: vakul.garg, netdev, Florian Westphal, Kristian Evensen, Steffen Klassert

commit e4db5b61c572475bbbcf63e3c8a2606bfccf2c9d upstream.

Kristian Evensen says:
  In a project I am involved in, we are running ipsec (Strongswan) on
  different mt7621-based routers. Each router is configured as an
  initiator and has around ~30 tunnels to different responders (running
  on misc. devices). Before the flow cache was removed (kernel 4.9), we
  got a combined throughput of around 70Mbit/s for all tunnels on one
  router. However, we recently switched to kernel 4.14 (4.14.48), and
  the total throughput is somewhere around 57Mbit/s (best-case). I.e., a
  drop of around 20%. Reverting the flow cache removal restores, as
  expected, performance levels to that of kernel 4.9.

When pcpu xdst exists, it has to be validated first before it can be
used.

A negative hit thus increases cost vs. no-cache.

As number of tunnels increases, hit rate decreases so this pcpu caching
isn't a viable strategy.

Furthermore, the xdst cache also needs to run with BH off, so when
removing this the bh disable/enable pairs can be removed too.

Kristian tested a 4.14.y backport of this change and reported
increased performance:

  In our tests, the throughput reduction has been reduced from around -20%
  to -5%. We also see that the overall throughput is independent of the
  number of tunnels, while before the throughput was reduced as the number
  of tunnels increased.

Reported-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 Vakul Garg reports traffic going via ipsec tunnels will cause the kernel
 to spin in an infinite loop due to xfrm policy reference count
 overflowing and becoming 0.
 The refcount leak is in the pcpu cache.  Instead of fixing this, just
 remove the pcpu cache -- its not present in any other stable release.
 Vakul reported that this patch fixes the problem.

 There are no major deviations from the upstream revert; conflicts
 were only due to context.

 include/net/xfrm.h     |   1 -
 net/xfrm/xfrm_device.c |  10 ---
 net/xfrm/xfrm_policy.c | 138 +----------------------------------------
 net/xfrm/xfrm_state.c  |   5 +-
 4 files changed, 3 insertions(+), 151 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index db99efb2d1d0..bdf185ae93db 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -323,7 +323,6 @@ int xfrm_policy_register_afinfo(const struct xfrm_policy_afinfo *afinfo, int fam
 void xfrm_policy_unregister_afinfo(const struct xfrm_policy_afinfo *afinfo);
 void km_policy_notify(struct xfrm_policy *xp, int dir,
 		      const struct km_event *c);
-void xfrm_policy_cache_flush(void);
 void km_state_notify(struct xfrm_state *x, const struct km_event *c);
 
 struct xfrm_tmpl;
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 30e5746085b8..4e458fd9236a 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -153,12 +153,6 @@ static int xfrm_dev_register(struct net_device *dev)
 	return NOTIFY_DONE;
 }
 
-static int xfrm_dev_unregister(struct net_device *dev)
-{
-	xfrm_policy_cache_flush();
-	return NOTIFY_DONE;
-}
-
 static int xfrm_dev_feat_change(struct net_device *dev)
 {
 	if ((dev->features & NETIF_F_HW_ESP) && !dev->xfrmdev_ops)
@@ -178,7 +172,6 @@ static int xfrm_dev_down(struct net_device *dev)
 	if (dev->features & NETIF_F_HW_ESP)
 		xfrm_dev_state_flush(dev_net(dev), dev, true);
 
-	xfrm_policy_cache_flush();
 	return NOTIFY_DONE;
 }
 
@@ -190,9 +183,6 @@ static int xfrm_dev_event(struct notifier_block *this, unsigned long event, void
 	case NETDEV_REGISTER:
 		return xfrm_dev_register(dev);
 
-	case NETDEV_UNREGISTER:
-		return xfrm_dev_unregister(dev);
-
 	case NETDEV_FEAT_CHANGE:
 		return xfrm_dev_feat_change(dev);
 
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 70ec57b887f6..b5006a091fd6 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -45,8 +45,6 @@ struct xfrm_flo {
 	u8 flags;
 };
 
-static DEFINE_PER_CPU(struct xfrm_dst *, xfrm_last_dst);
-static struct work_struct *xfrm_pcpu_work __read_mostly;
 static DEFINE_SPINLOCK(xfrm_policy_afinfo_lock);
 static struct xfrm_policy_afinfo const __rcu *xfrm_policy_afinfo[AF_INET6 + 1]
 						__read_mostly;
@@ -1715,108 +1713,6 @@ static int xfrm_expand_policies(const struct flowi *fl, u16 family,
 
 }
 
-static void xfrm_last_dst_update(struct xfrm_dst *xdst, struct xfrm_dst *old)
-{
-	this_cpu_write(xfrm_last_dst, xdst);
-	if (old)
-		dst_release(&old->u.dst);
-}
-
-static void __xfrm_pcpu_work_fn(void)
-{
-	struct xfrm_dst *old;
-
-	old = this_cpu_read(xfrm_last_dst);
-	if (old && !xfrm_bundle_ok(old))
-		xfrm_last_dst_update(NULL, old);
-}
-
-static void xfrm_pcpu_work_fn(struct work_struct *work)
-{
-	local_bh_disable();
-	rcu_read_lock();
-	__xfrm_pcpu_work_fn();
-	rcu_read_unlock();
-	local_bh_enable();
-}
-
-void xfrm_policy_cache_flush(void)
-{
-	struct xfrm_dst *old;
-	bool found = 0;
-	int cpu;
-
-	might_sleep();
-
-	local_bh_disable();
-	rcu_read_lock();
-	for_each_possible_cpu(cpu) {
-		old = per_cpu(xfrm_last_dst, cpu);
-		if (old && !xfrm_bundle_ok(old)) {
-			if (smp_processor_id() == cpu) {
-				__xfrm_pcpu_work_fn();
-				continue;
-			}
-			found = true;
-			break;
-		}
-	}
-
-	rcu_read_unlock();
-	local_bh_enable();
-
-	if (!found)
-		return;
-
-	get_online_cpus();
-
-	for_each_possible_cpu(cpu) {
-		bool bundle_release;
-
-		rcu_read_lock();
-		old = per_cpu(xfrm_last_dst, cpu);
-		bundle_release = old && !xfrm_bundle_ok(old);
-		rcu_read_unlock();
-
-		if (!bundle_release)
-			continue;
-
-		if (cpu_online(cpu)) {
-			schedule_work_on(cpu, &xfrm_pcpu_work[cpu]);
-			continue;
-		}
-
-		rcu_read_lock();
-		old = per_cpu(xfrm_last_dst, cpu);
-		if (old && !xfrm_bundle_ok(old)) {
-			per_cpu(xfrm_last_dst, cpu) = NULL;
-			dst_release(&old->u.dst);
-		}
-		rcu_read_unlock();
-	}
-
-	put_online_cpus();
-}
-
-static bool xfrm_xdst_can_reuse(struct xfrm_dst *xdst,
-				struct xfrm_state * const xfrm[],
-				int num)
-{
-	const struct dst_entry *dst = &xdst->u.dst;
-	int i;
-
-	if (xdst->num_xfrms != num)
-		return false;
-
-	for (i = 0; i < num; i++) {
-		if (!dst || dst->xfrm != xfrm[i])
-			return false;
-		dst = dst->child;
-	}
-
-	return xfrm_bundle_ok(xdst);
-}
-
 static struct xfrm_dst *
 xfrm_resolve_and_create_bundle(struct xfrm_policy **pols, int num_pols,
 			       const struct flowi *fl, u16 family,
@@ -1824,7 +1720,7 @@ xfrm_resolve_and_create_bundle(struct xfrm_policy **pols, int num_pols,
 {
 	struct net *net = xp_net(pols[0]);
 	struct xfrm_state *xfrm[XFRM_MAX_DEPTH];
-	struct xfrm_dst *xdst, *old;
+	struct xfrm_dst *xdst;
 	struct dst_entry *dst;
 	int err;
 
@@ -1839,21 +1735,6 @@ xfrm_resolve_and_create_bundle(struct xfrm_policy **pols, int num_pols,
 		return ERR_PTR(err);
 	}
 
-	xdst = this_cpu_read(xfrm_last_dst);
-	if (xdst &&
-	    xdst->u.dst.dev == dst_orig->dev &&
-	    xdst->num_pols == num_pols &&
-	    memcmp(xdst->pols, pols,
-		   sizeof(struct xfrm_policy *) * num_pols) == 0 &&
-	    xfrm_xdst_can_reuse(xdst, xfrm, err)) {
-		dst_hold(&xdst->u.dst);
-		while (err > 0)
-			xfrm_state_put(xfrm[--err]);
-		return xdst;
-	}
-
-	old = xdst;
-
 	dst = xfrm_bundle_create(pols[0], xfrm, err, fl, dst_orig);
 	if (IS_ERR(dst)) {
 		XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTBUNDLEGENERROR);
@@ -1866,9 +1747,6 @@ xfrm_resolve_and_create_bundle(struct xfrm_policy **pols, int num_pols,
 	memcpy(xdst->pols, pols, sizeof(struct xfrm_policy *) * num_pols);
 	xdst->policy_genid = atomic_read(&pols[0]->genid);
 
-	atomic_set(&xdst->u.dst.__refcnt, 2);
-	xfrm_last_dst_update(xdst, old);
-
 	return xdst;
 }
 
@@ -2069,11 +1947,8 @@ xfrm_bundle_lookup(struct net *net, const struct flowi *fl, u16 family, u8 dir,
 	if (num_xfrms <= 0)
 		goto make_dummy_bundle;
 
-	local_bh_disable();
 	xdst = xfrm_resolve_and_create_bundle(pols, num_pols, fl, family,
 					      xflo->dst_orig);
-	local_bh_enable();
-
 	if (IS_ERR(xdst)) {
 		err = PTR_ERR(xdst);
 		if (err != -EAGAIN)
@@ -2160,11 +2035,9 @@ struct dst_entry *xfrm_lookup(struct net *net, struct dst_entry *dst_orig,
 				goto no_transform;
 			}
 
-			local_bh_disable();
 			xdst = xfrm_resolve_and_create_bundle(
 					pols, num_pols, fl,
 					family, dst_orig);
-			local_bh_enable();
 
 			if (IS_ERR(xdst)) {
 				xfrm_pols_put(pols, num_pols);
@@ -2992,15 +2865,6 @@ static struct pernet_operations __net_initdata xfrm_net_ops = {
 
 void __init xfrm_init(void)
 {
-	int i;
-
-	xfrm_pcpu_work = kmalloc_array(NR_CPUS, sizeof(*xfrm_pcpu_work),
-				       GFP_KERNEL);
-	BUG_ON(!xfrm_pcpu_work);
-
-	for (i = 0; i < NR_CPUS; i++)
-		INIT_WORK(&xfrm_pcpu_work[i], xfrm_pcpu_work_fn);
-
 	register_pernet_subsys(&xfrm_net_ops);
 	seqcount_init(&xfrm_policy_hash_generation);
 	xfrm_input_init();
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 0cd2bdf3b217..7c093de68780 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -735,10 +735,9 @@ int xfrm_state_flush(struct net *net, u8 proto, bool task_valid)
 	}
 out:
 	spin_unlock_bh(&net->xfrm.xfrm_state_lock);
-	if (cnt) {
+	if (cnt)
 		err = 0;
-		xfrm_policy_cache_flush();
-	}
+
 	return err;
 }
 EXPORT_SYMBOL(xfrm_state_flush);
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 4.14.y stable] xfrm: policy: remove pcpu policy cache
  2019-08-22 11:21                     ` [PATCH 4.14.y stable] xfrm: policy: remove pcpu policy cache Florian Westphal
@ 2019-08-22 13:09                       ` Greg KH
  2019-08-22 13:37                         ` Florian Westphal
  0 siblings, 1 reply; 15+ messages in thread
From: Greg KH @ 2019-08-22 13:09 UTC (permalink / raw)
  To: Florian Westphal
  Cc: stable, vakul.garg, netdev, Kristian Evensen, Steffen Klassert

On Thu, Aug 22, 2019 at 01:21:09PM +0200, Florian Westphal wrote:
> commit e4db5b61c572475bbbcf63e3c8a2606bfccf2c9d upstream.
> 
> Kristian Evensen says:
>   In a project I am involved in, we are running ipsec (Strongswan) on
>   different mt7621-based routers. Each router is configured as an
>   initiator and has around ~30 tunnels to different responders (running
>   on misc. devices). Before the flow cache was removed (kernel 4.9), we
>   got a combined throughput of around 70Mbit/s for all tunnels on one
>   router. However, we recently switched to kernel 4.14 (4.14.48), and
>   the total throughput is somewhere around 57Mbit/s (best-case). I.e., a
>   drop of around 20%. Reverting the flow cache removal restores, as
>   expected, performance levels to that of kernel 4.9.
> 
> When pcpu xdst exists, it has to be validated first before it can be
> used.
> 
> A negative hit thus increases cost vs. no-cache.
> 
> As number of tunnels increases, hit rate decreases so this pcpu caching
> isn't a viable strategy.
> 
> Furthermore, the xdst cache also needs to run with BH off, so when
> removing this the bh disable/enable pairs can be removed too.
> 
> Kristian tested a 4.14.y backport of this change and reported
> increased performance:
> 
>   In our tests, the throughput reduction has been reduced from around -20%
>   to -5%. We also see that the overall throughput is independent of the
>   number of tunnels, while before the throughput was reduced as the number
>   of tunnels increased.
> 
> Reported-by: Kristian Evensen <kristian.evensen@gmail.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
> ---
>  Vakul Garg reports traffic going via ipsec tunnels will cause the kernel
>  to spin in an infinite loop due to xfrm policy reference count
>  overflowing and becoming 0.
>  The refcount leak is in the pcpu cache.  Instead of fixing this, just
>  remove the pcpu cache -- its not present in any other stable release.
>  Vakul reported that this patch fixes the problem.
> 
>  There are no major deviations from the upstream revert; conflicts
>  were only due to context.

Now queued up, does 4.9.y also need this?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4.14.y stable] xfrm: policy: remove pcpu policy cache
  2019-08-22 13:09                       ` Greg KH
@ 2019-08-22 13:37                         ` Florian Westphal
  0 siblings, 0 replies; 15+ messages in thread
From: Florian Westphal @ 2019-08-22 13:37 UTC (permalink / raw)
  To: Greg KH
  Cc: Florian Westphal, stable, vakul.garg, netdev, Kristian Evensen,
	Steffen Klassert

Greg KH <greg@kroah.com> wrote:
> On Thu, Aug 22, 2019 at 01:21:09PM +0200, Florian Westphal wrote:
> > commit e4db5b61c572475bbbcf63e3c8a2606bfccf2c9d upstream.
> > 
> > Kristian Evensen says:
> >   In a project I am involved in, we are running ipsec (Strongswan) on
> >   different mt7621-based routers. Each router is configured as an
> >   initiator and has around ~30 tunnels to different responders (running
> >   on misc. devices). Before the flow cache was removed (kernel 4.9), we
> >   got a combined throughput of around 70Mbit/s for all tunnels on one
> >   router. However, we recently switched to kernel 4.14 (4.14.48), and
> >   the total throughput is somewhere around 57Mbit/s (best-case). I.e., a
> >   drop of around 20%. Reverting the flow cache removal restores, as
> >   expected, performance levels to that of kernel 4.9.
> > 
> > When pcpu xdst exists, it has to be validated first before it can be
> > used.
> > 
> > A negative hit thus increases cost vs. no-cache.
> > 
> > As number of tunnels increases, hit rate decreases so this pcpu caching
> > isn't a viable strategy.
> > 
> > Furthermore, the xdst cache also needs to run with BH off, so when
> > removing this the bh disable/enable pairs can be removed too.
> > 
> > Kristian tested a 4.14.y backport of this change and reported
> > increased performance:
> > 
> >   In our tests, the throughput reduction has been reduced from around -20%
> >   to -5%. We also see that the overall throughput is independent of the
> >   number of tunnels, while before the throughput was reduced as the number
> >   of tunnels increased.
> > 
> > Reported-by: Kristian Evensen <kristian.evensen@gmail.com>
> > Signed-off-by: Florian Westphal <fw@strlen.de>
> > Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
> > ---
> >  Vakul Garg reports traffic going via ipsec tunnels will cause the kernel
> >  to spin in an infinite loop due to xfrm policy reference count
> >  overflowing and becoming 0.
> >  The refcount leak is in the pcpu cache.  Instead of fixing this, just
> >  remove the pcpu cache -- its not present in any other stable release.
> >  Vakul reported that this patch fixes the problem.
> > 
> >  There are no major deviations from the upstream revert; conflicts
> >  were only due to context.
> 
> Now queued up, does 4.9.y also need this?

No, 4.14 was the first kernel with this thing and its already gone in
4.19.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2019-08-22 13:37 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-19 12:55 Help needed - Kernel lockup while running ipsec Vakul Garg
2019-08-19 17:38 ` Florian Westphal
2019-08-20  9:10   ` Vakul Garg
2019-08-20  9:23     ` Florian Westphal
2019-08-20  9:26       ` Vakul Garg
2019-08-20  9:30         ` Vakul Garg
2019-08-20  9:38         ` Florian Westphal
2019-08-20  9:52           ` Vakul Garg
2019-08-20 10:38             ` Vakul Garg
2019-08-21  7:37               ` Vakul Garg
2019-08-21 16:11                 ` Florian Westphal
2019-08-22 10:23                   ` Vakul Garg
2019-08-22 11:21                     ` [PATCH 4.14.y stable] xfrm: policy: remove pcpu policy cache Florian Westphal
2019-08-22 13:09                       ` Greg KH
2019-08-22 13:37                         ` Florian Westphal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).