* Re: Looks like issue in handling active_nodes count in 4.19 kernel .
       [not found] <0101016eeb5fdf43-18f58c0b-8670-43eb-ad08-60dae381f0fd-000000@us-west-2.amazonses.com>
@ 2019-12-09 18:05 ` Stephen Smalley
  2019-12-09 18:30   ` rsiddoji
  0 siblings, 1 reply; 18+ messages in thread
From: Stephen Smalley @ 2019-12-09 18:05 UTC (permalink / raw)
  To: rsiddoji, selinux; +Cc: paul, linux-security-module

On 12/9/19 10:55 AM, rsiddoji@codeaurora.org wrote:
> Hi team,
> It looks like we have an issue in handling the "active_nodes" count in
> the SELinux avc.c file: avc_cache.active_nodes grows larger than the
> slot array, and avc_reclaim_node() is called frequently from
> avc_alloc_node().
> 
> The following are the two instances that seem to be the possible
> culprits, as seen on a 4.19 kernel. Can you comment if my understanding
> is wrong?
> 
> 
> #1. The active_nodes count is incremented in avc_alloc_node(avc), which
> is called from avc_insert(). If the code takes the failure path on
> avc_xperms_populate(), the counter is never decremented.
> 
> 
> static struct avc_node *avc_insert(struct selinux_avc *avc,
> 				   u32 ssid, u32 tsid, u16 tclass,
> 				   struct av_decision *avd,
> ...
> 	node = avc_alloc_node(avc);	/* active_nodes incremented here */
> ...
> 	rc = avc_xperms_populate(node, xp_node);	/* this can fail */
> 	if (rc) {
> 		/* but on failure active_nodes is not decremented? */
> 		kmem_cache_free(avc_node_cachep, node);
> 		return NULL;
> 	}

I think you are correct; we should perhaps be calling avc_node_kill() 
here as we do in an earlier error path?
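
For reference, a minimal sketch of the suggested fix (a sketch only,
assuming avc_node_kill() both frees the node and decrements
avc_cache.active_nodes, as it does in the allocation-failure path):

	rc = avc_xperms_populate(node, xp_node);
	if (rc) {
		/* undo the active_nodes increment done by avc_alloc_node() */
		avc_node_kill(avc, node);
		return NULL;
	}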

> 
> #2. The logic comparing active_nodes against avc_cache_threshold seems
> weird: the count of active nodes will always grow past 512, so we may
> land in simply calling avc_reclaim_node() frequently, well before the
> slots are full, and perhaps we are not using the cache at its best.
> Should we be comparing against some high watermark, or is my
> understanding wrong?
>   
> /* in avc_alloc_node(struct selinux_avc *avc): */
>
> 	if (atomic_inc_return(&avc->avc_cache.active_nodes) >
> 	    avc->avc_cache_threshold)	/* default threshold is 512 */
> 		avc_reclaim_node(avc);
> 

Not entirely sure what you are asking here.  avc_reclaim_node() should 
reclaim multiple nodes up to AVC_CACHE_RECLAIM.  Possibly that should be 
configurable via selinuxfs too, and/or calculated from 
avc_cache_threshold in some way?
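
For reference, a simplified, from-memory sketch of the reclaim path (not
verbatim kernel code): avc_reclaim_node() walks the hash buckets starting
from a rotating lru_hint, skips buckets whose lock it cannot take, and
deletes nodes until it has reclaimed AVC_CACHE_RECLAIM of them:

	/* simplified sketch of avc_reclaim_node(); not verbatim kernel code */
	static int avc_reclaim_node(struct selinux_avc *avc)
	{
		struct avc_node *node;
		unsigned long flags;
		spinlock_t *lock;
		int hvalue, try, ecx = 0;

		for (try = 0; try < AVC_CACHE_SLOTS; try++) {
			/* rotate through buckets so reclaim pressure is spread out */
			hvalue = atomic_inc_return(&avc->avc_cache.lru_hint) &
				 (AVC_CACHE_SLOTS - 1);
			lock = &avc->avc_cache.slots_lock[hvalue];
			if (!spin_trylock_irqsave(lock, flags))
				continue;	/* contended bucket: skip it, don't spin */

			rcu_read_lock();
			hlist_for_each_entry(node, &avc->avc_cache.slots[hvalue], list) {
				avc_node_delete(avc, node);	/* decrements active_nodes */
				if (++ecx >= AVC_CACHE_RECLAIM) {
					rcu_read_unlock();
					spin_unlock_irqrestore(lock, flags);
					goto out;
				}
			}
			rcu_read_unlock();
			spin_unlock_irqrestore(lock, flags);
		}
	out:
		return ecx;
	}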

Were you interested in creating a patch to fix the first issue above or 
looking to us to do so?




* RE: Looks like issue in handling active_nodes count in 4.19 kernel .
  2019-12-09 18:05 ` Looks like issue in handling active_nodes count in 4.19 kernel Stephen Smalley
@ 2019-12-09 18:30   ` rsiddoji
  2019-12-11 14:37     ` Stephen Smalley
  0 siblings, 1 reply; 18+ messages in thread
From: rsiddoji @ 2019-12-09 18:30 UTC (permalink / raw)
  To: 'Stephen Smalley', selinux; +Cc: paul, linux-security-module

Thanks for the quick response; yes, it would be helpful if you can raise the change.
On the second issue: in avc_alloc_node() we are checking the slot status as active_nodes > 512 (the default).
The occupancy check should perhaps be corrected to something like active_nodes > 80% of slots occupied, or 16*512,
or maybe we need different logic altogether; a hypothetical sketch follows below.
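
A purely hypothetical sketch of that alternative check (AVC_CACHE_OCCUPANCY
is an invented constant, not an existing kernel symbol; AVC_CACHE_SLOTS is 512):

	/* hypothetical: reclaim only once active nodes substantially exceed
	 * the number of hash slots, rather than at one node per slot */
	#define AVC_CACHE_OCCUPANCY	16	/* invented: ~16 nodes per slot */

	if (atomic_inc_return(&avc->avc_cache.active_nodes) >
	    AVC_CACHE_SLOTS * AVC_CACHE_OCCUPANCY)
		avc_reclaim_node(avc);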

> /* in avc_alloc_node(struct selinux_avc *avc): */
>
> 	if (atomic_inc_return(&avc->avc_cache.active_nodes) >
> 	    avc->avc_cache_threshold)	/* default threshold is 512 */
> 		avc_reclaim_node(avc);
> 

Regards,
Ravi 




* Re: Looks like issue in handling active_nodes count in 4.19 kernel .
  2019-12-09 18:30   ` rsiddoji
@ 2019-12-11 14:37     ` Stephen Smalley
  2019-12-11 14:47       ` Stephen Smalley
  0 siblings, 1 reply; 18+ messages in thread
From: Stephen Smalley @ 2019-12-11 14:37 UTC (permalink / raw)
  To: rsiddoji, selinux; +Cc: paul, linux-security-module

On 12/9/19 1:30 PM, rsiddoji@codeaurora.org wrote:
> Thanks for the quick response; yes, it would be helpful if you can raise the change.
> On the second issue: in avc_alloc_node() we are checking the slot status as active_nodes > 512 (the default).
> The occupancy check should perhaps be corrected to something like active_nodes > 80% of slots occupied, or 16*512,
> or maybe we need different logic altogether.

Are you seeing an actual problem with this in practice, and if so, what 
exactly is it that you are seeing and do you have a reproducer?




* Re: Looks like issue in handling active_nodes count in 4.19 kernel .
  2019-12-11 14:37     ` Stephen Smalley
@ 2019-12-11 14:47       ` Stephen Smalley
  2019-12-11 15:35         ` rsiddoji
       [not found]         ` <0101016ef59a2152-41e65aac-8784-4401-b20d-45b2852872d4-000000@us-west-2.amazonses.com>
  0 siblings, 2 replies; 18+ messages in thread
From: Stephen Smalley @ 2019-12-11 14:47 UTC (permalink / raw)
  To: rsiddoji, selinux; +Cc: paul, linux-security-module

On 12/11/19 9:37 AM, Stephen Smalley wrote:
> On 12/9/19 1:30 PM, rsiddoji@codeaurora.org wrote:
>> Thanks for the quick response; yes, it would be helpful if you can
>> raise the change.
>> On the second issue: in avc_alloc_node() we are checking the slot
>> status as active_nodes > 512 (the default). The occupancy check should
>> perhaps be corrected to something like active_nodes > 80% of slots
>> occupied, or 16*512, or maybe we need different logic altogether.
> 
> Are you seeing an actual problem with this in practice, and if so, what 
> exactly is it that you are seeing and do you have a reproducer?

BTW, on Linux distributions there is an avcstat(8) utility that can be
used to monitor the AVC statistics, or you can read the stats directly
from the kernel via /sys/fs/selinux/avc/cache_stats.
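
(As best I recall, cache_stats prints one line per CPU, with columns
"lookups hits misses allocations reclaims frees".)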




* RE: Looks like issue in handling active_nodes count in 4.19 kernel .
  2019-12-11 14:47       ` Stephen Smalley
@ 2019-12-11 15:35         ` rsiddoji
       [not found]         ` <0101016ef59a2152-41e65aac-8784-4401-b20d-45b2852872d4-000000@us-west-2.amazonses.com>
  1 sibling, 0 replies; 18+ messages in thread
From: rsiddoji @ 2019-12-11 15:35 UTC (permalink / raw)
  To: 'Stephen Smalley', selinux; +Cc: paul, linux-security-module

Thanks for taking the patch forward. On the question:

The issue started when we were seeing "avc_reclaim_node" in the stack most of the time. Debugging further, avc_cache.active_nodes was already at 7K+ nodes, and the logic is as below:

	if (atomic_inc_return(&avc->avc_cache.active_nodes) >
	    avc->avc_cache_threshold)
		avc_reclaim_node(avc);

So once the active_nodes count is > 512 (if not configured higher), we will always call avc_reclaim_node(); eventually we call it for every node insert, which may be more expensive than not caching at all, so the advantage of the cache may be null and void due to this overhead.

Thanks,
Ravi




* Re: Looks like issue in handling active_nodes count in 4.19 kernel .
       [not found]         ` <0101016ef59a2152-41e65aac-8784-4401-b20d-45b2852872d4-000000@us-west-2.amazonses.com>
@ 2019-12-11 15:53           ` Stephen Smalley
  2019-12-17 15:40             ` Ravi Kumar Siddojigari
  0 siblings, 1 reply; 18+ messages in thread
From: Stephen Smalley @ 2019-12-11 15:53 UTC (permalink / raw)
  To: rsiddoji, selinux; +Cc: paul, linux-security-module

On 12/11/19 10:35 AM, rsiddoji@codeaurora.org wrote:
> Thanks for taking the patch forward. On the question:
>
> The issue started when we were seeing "avc_reclaim_node" in the stack
> most of the time. Debugging further, avc_cache.active_nodes was already
> at 7K+ nodes, and the logic is as below:
>
> 	if (atomic_inc_return(&avc->avc_cache.active_nodes) >
> 	    avc->avc_cache_threshold)
> 		avc_reclaim_node(avc);
>
> So once the active_nodes count is > 512 (if not configured higher), we
> will always call avc_reclaim_node(); eventually we call it for every
> node insert, which may be more expensive than not caching at all, so
> the advantage of the cache may be null and void due to this overhead.

Was this on a system with the default avc_cache_threshold value or was 
it set higher by the distro/user?

If it was still 512 or any value significantly less than 7K, then the 
bug is that it ever reached 7K in the first place. The first bug should 
only trigger under severe memory pressure.  The other potential reason 
for growing numbers of active nodes would be cache thrashing leading to 
avc_reclaim_node() being unable to take the lock on any buckets and 
therefore unable to release nodes.

Possibly you need a larger cache threshold set on this system.  It can 
be set via /sys/fs/selinux/avc/cache_threshold.

Allowing AVC_CACHE_RECLAIM to also be set via selinuxfs or computed 
relative to avc_cache_threshold would make sense as a further improvement.
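
A minimal sketch of that improvement (hypothetical; AVC_CACHE_RECLAIM is
currently a fixed constant of 16):

	/* hypothetical: derive the reclaim batch size from the configured
	 * threshold instead of using the fixed AVC_CACHE_RECLAIM of 16 */
	static u32 avc_reclaim_batch(const struct selinux_avc *avc)
	{
		return max_t(u32, 16, avc->avc_cache_threshold / 32);
	}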




* RE: Looks like issue in handling active_nodes count in 4.19 kernel .
  2019-12-11 15:53           ` Stephen Smalley
@ 2019-12-17 15:40             ` Ravi Kumar Siddojigari
  2019-12-17 15:52               ` Stephen Smalley
  0 siblings, 1 reply; 18+ messages in thread
From: Ravi Kumar Siddojigari @ 2019-12-17 15:40 UTC (permalink / raw)
  To: 'Stephen Smalley', selinux; +Cc: paul, linux-security-module

Yes, indeed this is a stress test on an ARM64 device with multiple cores, where most of the cores/tasks are stuck in avc_reclaim_node().
We still see this issue even after picking the earlier patch "selinux: ensure we cleanup the internal AVC counters on error in avc_insert()" (commit d8db60cb23e4).
The selinux_state during the issue was as below: all the slots are NULL and the count was above the threshold, so avc_reclaim_node() is called every time, and since all the slots are empty it runs the full for-loop, taking and releasing each bucket lock, which takes too long.
Not sure what could make the slots NULL; it is certainly not due to flush()/reset(). We think we still need to call avc_node_kill() in avc_update_node() as well.
Adding the patch below; can you please review or correct it?


  selinux_state = (
    disabled = FALSE,
    enforcing = TRUE,
    checkreqprot = FALSE,
    initialized = TRUE,
    policycap = (TRUE, TRUE, TRUE, FALSE, FALSE, TRUE),
    avc = 0xFFFFFF9BEFF1E890 -> (
      avc_cache_threshold = 512,  /* <<<<< not configured; still the default */
      avc_cache = (
        slots = ((first = 0x0), (first = 0x0), (first = 0x0), (first = 0x0), (first = 0x0), (first = 0x0), (first = 0x0), (first = 0x0), (first = 0x0), (first = 0x0), (first = 0x0), (first = 0x0), (first   /*<<<< all are NULL */
        slots_lock = ((rlock = (raw_lock = (val = (counter = 0), locked = 0, pending = 0, locked_pending = 0, tail = 0), magic = 3735899821, owner_cpu = 4294967295, owner = 0xFFFFFFFFFFFFFFFF, dep_map = (key = 0xFFFFFF9BEFF298A8, cla
        lru_hint = (counter = 616831529),
        active_nodes = (counter = 547),   /* <<<<< increased beyond the 512 threshold */
        latest_notif = 1)),
    ss = 0xFFFFFF9BEFF2E578)


--
In AVC update we don't call avc_node_kill() when avc_xperms_populate()
fails, resulting in the avc->avc_cache.active_nodes counter having a
false value. This case was missed in the earlier patch, so correct it here.

Change-Id: Ic0298162cc766c0f21be7ab232e259766654dad3
Signed-off-by: Jaihind Yadav <jaihindyadav@codeaurora.org>
---
 security/selinux/avc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/selinux/avc.c b/security/selinux/avc.c
index 91d24c2..3d1cff2 100644
--- a/security/selinux/avc.c
+++ b/security/selinux/avc.c
@@ -913,7 +913,7 @@ static int avc_update_node(struct selinux_avc *avc,
        if (orig->ae.xp_node) {
                rc = avc_xperms_populate(node, orig->ae.xp_node);
                if (rc) {
-                       kmem_cache_free(avc_node_cachep, node);
+                       avc_node_kill(avc, node);
                        goto out_unlock;
                }
        }
--

Regards,
Ravi





* Re: Looks like issue in handling active_nodes count in 4.19 kernel .
  2019-12-17 15:40             ` Ravi Kumar Siddojigari
@ 2019-12-17 15:52               ` Stephen Smalley
  2019-12-17 16:23                 ` Stephen Smalley
  2019-12-19  2:20                 ` Paul Moore
  0 siblings, 2 replies; 18+ messages in thread
From: Stephen Smalley @ 2019-12-17 15:52 UTC (permalink / raw)
  To: Ravi Kumar Siddojigari, selinux; +Cc: paul, linux-security-module

On 12/17/19 10:40 AM, Ravi Kumar Siddojigari wrote:
> Yes, indeed this is a stress test on an ARM64 device with multiple cores, where most of the cores/tasks are stuck in avc_reclaim_node().
> We still see this issue even after picking the earlier patch "selinux: ensure we cleanup the internal AVC counters on error in avc_insert()" (commit d8db60cb23e4).
> The selinux_state during the issue was as below: all the slots are NULL and the count was above the threshold, so avc_reclaim_node() is called every time, and since all the slots are empty it runs the full for-loop, taking and releasing each bucket lock, which takes too long.
> Not sure what could make the slots NULL; it is certainly not due to flush()/reset(). We think we still need to call avc_node_kill() in avc_update_node() as well.
> Adding the patch below; can you please review or correct it?
> 
> 
>    selinux_state = (
>      disabled = FALSE,
>      enforcing = TRUE,
>      checkreqprot = FALSE,
>      initialized = TRUE,
>      policycap = (TRUE, TRUE, TRUE, FALSE, FALSE, TRUE),
>      avc = 0xFFFFFF9BEFF1E890 -> (
>        avc_cache_threshold = 512,  /* <<<<< not configured; still the default */
>        avc_cache = (
>          slots = ((first = 0x0), (first = 0x0), (first = 0x0), (first = 0x0), (first = 0x0), (first = 0x0), (first = 0x0), (first = 0x0), (first = 0x0), (first = 0x0), (first = 0x0), (first = 0x0), (first   /*<<<< all are NULL */
>          slots_lock = ((rlock = (raw_lock = (val = (counter = 0), locked = 0, pending = 0, locked_pending = 0, tail = 0), magic = 3735899821, owner_cpu = 4294967295, owner = 0xFFFFFFFFFFFFFFFF, dep_map = (key = 0xFFFFFF9BEFF298A8, cla
>          lru_hint = (counter = 616831529),
>          active_nodes = (counter = 547),   /* <<<<< increased beyond the 512 threshold */
>          latest_notif = 1)),
>      ss = 0xFFFFFF9BEFF2E578)
> 
> 
> --
> In AVC update we don't call avc_node_kill() when avc_xperms_populate()
> fails, resulting in the avc->avc_cache.active_nodes counter having a
> false value. This case was missed in the earlier patch, so correct it here.
> 
> Change-Id: Ic0298162cc766c0f21be7ab232e259766654dad3
> Signed-off-by: Jaihind Yadav <jaihindyadav@codeaurora.org>
> ---
>   security/selinux/avc.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/security/selinux/avc.c b/security/selinux/avc.c
> index 91d24c2..3d1cff2 100644
> --- a/security/selinux/avc.c
> +++ b/security/selinux/avc.c
> @@ -913,7 +913,7 @@ static int avc_update_node(struct selinux_avc *avc,
>          if (orig->ae.xp_node) {
>                  rc = avc_xperms_populate(node, orig->ae.xp_node);
>                  if (rc) {
> -                       kmem_cache_free(avc_node_cachep, node);
> +                       avc_node_kill(avc, node);
>                          goto out_unlock;
>                  }
>          }
> --

That looks correct to me; I guess that one got missed by the prior fix.
Still not sure how your AVC got into that state though...

Acked-by: Stephen Smalley <sds@tycho.nsa.gov>



* Re: Looks like issue in handling active_nodes count in 4.19 kernel .
  2019-12-17 15:52               ` Stephen Smalley
@ 2019-12-17 16:23                 ` Stephen Smalley
  2019-12-18  5:58                   ` Ravi Kumar Siddojigari
  2019-12-19  2:20                 ` Paul Moore
  1 sibling, 1 reply; 18+ messages in thread
From: Stephen Smalley @ 2019-12-17 16:23 UTC (permalink / raw)
  To: Ravi Kumar Siddojigari, selinux; +Cc: paul, linux-security-module

On 12/17/19 10:52 AM, Stephen Smalley wrote:
> That looks correct to me; I guess that one got missed by the prior fix.
> Still not sure how your AVC got into that state though...
> 
> Acked-by: Stephen Smalley <sds@tycho.nsa.gov>

BTW, have you been running these stress tests on earlier kernels too? 
If so, what version(s) are known to pass them?  I ask because this code 
has been present since v4.3 and this is the first such report.



* RE: Looks like issue in handling active_nodes count in 4.19 kernel .
  2019-12-17 16:23                 ` Stephen Smalley
@ 2019-12-18  5:58                   ` Ravi Kumar Siddojigari
  2019-12-18 13:53                     ` Stephen Smalley
  0 siblings, 1 reply; 18+ messages in thread
From: Ravi Kumar Siddojigari @ 2019-12-18  5:58 UTC (permalink / raw)
  To: 'Stephen Smalley', selinux; +Cc: paul, linux-security-module

Yes, this is the first time we have run this stress test, on the v4.19 kernel; we had not tested prior kernel versions. The currently proposed changes seem to really help, and testing is still going on.
Per the delta, change 6b6bc620 seems to be missing in earlier kernel versions; not sure if this was the cause.

Br,
Ravi.



* Re: Looks like issue in handling active_nodes count in 4.19 kernel .
  2019-12-18  5:58                   ` Ravi Kumar Siddojigari
@ 2019-12-18 13:53                     ` Stephen Smalley
  0 siblings, 0 replies; 18+ messages in thread
From: Stephen Smalley @ 2019-12-18 13:53 UTC (permalink / raw)
  To: Ravi Kumar Siddojigari, selinux; +Cc: paul, linux-security-module

On 12/18/19 12:58 AM, Ravi Kumar Siddojigari wrote:
> Yes, this is the first time we have run this stress test, on the v4.19 kernel; we had not tested prior kernel versions.
> The currently proposed changes seem to really help, and testing is still going on.
> Per the delta, change 6b6bc620 seems to be missing in earlier kernel versions; not sure if this was the cause.

6b6bc620 shouldn't have altered any behavior; it was purely an 
encapsulation of the data structures.  Both of the bugs you've 
identified were introduced by the xperms support in fa1aa143ac4a68. 
Maybe they were harder to trigger when the AVC was still using 
GFP_ATOMIC instead of GFP_NOWAIT, but they were bugs nonetheless.
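
For context, the allocation site in avc_alloc_node() looks roughly like
this (a from-memory sketch; the exact GFP flags vary by kernel version):

	node = kmem_cache_zalloc(avc_node_cachep, GFP_NOWAIT);

With GFP_NOWAIT the allocation can fail under memory pressure, which is
exactly when the error paths fixed in this thread are taken.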




* Re: Looks like issue in handling active_nodes count in 4.19 kernel .
  2019-12-17 15:52               ` Stephen Smalley
  2019-12-17 16:23                 ` Stephen Smalley
@ 2019-12-19  2:20                 ` Paul Moore
  2019-12-19  9:48                   ` Ravi Kumar Siddojigari
  1 sibling, 1 reply; 18+ messages in thread
From: Paul Moore @ 2019-12-19  2:20 UTC (permalink / raw)
  To: Stephen Smalley; +Cc: Ravi Kumar Siddojigari, selinux, linux-security-module

On Tue, Dec 17, 2019 at 10:51 AM Stephen Smalley <sds@tycho.nsa.gov> wrote:
> That looks correct to me; I guess that one got missed by the prior fix.
> Still not sure how your AVC got into that state though...
>
> Acked-by: Stephen Smalley <sds@tycho.nsa.gov>

This looks good to me too.  Ravi, can you submit this as a proper
patch with From: set to Jaihind Yadav (assuming they are the author)
and your sign-off?

Thanks.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply	[flat|nested] 18+ messages in thread
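
A note on why this one-line change matters: kmem_cache_free() only returns
the node to the slab cache, while avc_node_kill() also drops the
active_nodes count taken in avc_alloc_node(). A sketch of that helper as it
appears in 4.19-era kernels (paraphrased from the source, not a verbatim
copy):

static void avc_node_kill(struct selinux_avc *avc, struct avc_node *node)
{
	avc_xperms_free(node->ae.xp_node);	/* free any xperms data */
	kmem_cache_free(avc_node_cachep, node);	/* return node to the slab */
	avc_cache_stats_incr(frees);
	atomic_dec(&avc->avc_cache.active_nodes); /* the missing decrement */
}

Any error path that frees a node after avc_alloc_node() has succeeded needs
that decrement, which is why the avc_insert() failure path (commit
d8db60cb23e4) and the avc_update_node() failure path above were patched the
same way.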

* RE: Looks like issue in handling active_nodes count in 4.19 kernel .
  2019-12-19  2:20                 ` Paul Moore
@ 2019-12-19  9:48                   ` Ravi Kumar Siddojigari
  2019-12-19 16:00                     ` Stephen Smalley
  2019-12-19 18:11                     ` Paul Moore
  0 siblings, 2 replies; 18+ messages in thread
From: Ravi Kumar Siddojigari @ 2019-12-19  9:48 UTC (permalink / raw)
  To: 'Paul Moore', 'Stephen Smalley'
  Cc: selinux, linux-security-module

Sorry, re-adding the patch below as requested.

Stephen,
The issue is fixed with these 2 changes. The issue was also reproduced on v4.14, and similar changes work there as well.

--
From 77c618006397c7a65ead257f3cb4e4fe3da2d4b8 Mon Sep 17 00:00:00 2001
From: Jaihind Yadav <jaihindyadav@codeaurora.org>
Date: Tue, 17 Dec 2019 17:25:47 +0530
Subject: [PATCH] selinux: ensure we cleanup the internal AVC counters on error
 in avc_update()

In AVC update we don't call avc_node_kill() when avc_xperms_populate()
fails, resulting in the avc->avc_cache.active_nodes counter having an
incorrect value. This change was missed in the last patch, so correcting it here.

Change-Id: Ic0298162cc766c0f21be7ab232e259766654dad3
Signed-off-by: Ravi Kumar Siddojigari <rsiddoji@codeaurora.org>
---
 security/selinux/avc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/selinux/avc.c b/security/selinux/avc.c
index 91d24c2..3d1cff2 100644
--- a/security/selinux/avc.c
+++ b/security/selinux/avc.c
@@ -913,7 +913,7 @@ static int avc_update_node(struct selinux_avc *avc,
        if (orig->ae.xp_node) {
                rc = avc_xperms_populate(node, orig->ae.xp_node);
                if (rc) {
-                       kmem_cache_free(avc_node_cachep, node);
+                       avc_node_kill(avc, node);
                        goto out_unlock;
                }
        }
--
1.9.1

Br,



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: Looks like issue in handling active_nodes count in 4.19 kernel .
  2019-12-19  9:48                   ` Ravi Kumar Siddojigari
@ 2019-12-19 16:00                     ` Stephen Smalley
  2019-12-19 18:11                     ` Paul Moore
  1 sibling, 0 replies; 18+ messages in thread
From: Stephen Smalley @ 2019-12-19 16:00 UTC (permalink / raw)
  To: Ravi Kumar Siddojigari, 'Paul Moore'
  Cc: selinux, linux-security-module

On 12/19/19 4:48 AM, Ravi Kumar Siddojigari wrote:
> Sorry, re-adding the patch below as requested.
> 
> Stephen,
> The issue is fixed with these 2 changes. The issue was also reproduced on v4.14, and similar changes work there as well.

It would be preferable if you sent the patch directly via git send-email 
or similar.  In any event, for the final version, we should drop the 
Change-Id because it is Android-specific and we should add a Fixes line 
like so:

Fixes: fa1aa143ac4a ("selinux: extended permissions for ioctls")

Given the behavior you are describing and the fact that you could 
reproduce it on v4.14 as well, I would recommend marking both it and 
Paul's earlier patch for stable (Paul will do this if he agrees; no 
action required by you).

> 
> --
>  From 77c618006397c7a65ead257f3cb4e4fe3da2d4b8 Mon Sep 17 00:00:00 2001
> From: Jaihind Yadav <jaihindyadav@codeaurora.org>
> Date: Tue, 17 Dec 2019 17:25:47 +0530
> Subject: [PATCH] selinux: ensure we cleanup the internal AVC counters on error
>   in avc_update()
> 
> In AVC update we don't call avc_node_kill() when avc_xperms_populate()
> fails, resulting in the avc->avc_cache.active_nodes counter having an
> incorrect value. This change was missed in the last patch, so correcting it here.
> 
> Change-Id: Ic0298162cc766c0f21be7ab232e259766654dad3
> Signed-off-by: Ravi Kumar Siddojigari <rsiddoji@codeaurora.org>
> ---
>   security/selinux/avc.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/security/selinux/avc.c b/security/selinux/avc.c
> index 91d24c2..3d1cff2 100644
> --- a/security/selinux/avc.c
> +++ b/security/selinux/avc.c
> @@ -913,7 +913,7 @@ static int avc_update_node(struct selinux_avc *avc,
>          if (orig->ae.xp_node) {
>                  rc = avc_xperms_populate(node, orig->ae.xp_node);
>                  if (rc) {
> -                       kmem_cache_free(avc_node_cachep, node);
> +                       avc_node_kill(avc, node);
>                          goto out_unlock;
>                  }
>          }
> --
> 1.9.1
> 
> Br,
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Looks like issue in handling active_nodes count in 4.19 kernel .
  2019-12-19  9:48                   ` Ravi Kumar Siddojigari
  2019-12-19 16:00                     ` Stephen Smalley
@ 2019-12-19 18:11                     ` Paul Moore
  2019-12-20 12:03                       ` Ravi Kumar Siddojigari
  1 sibling, 1 reply; 18+ messages in thread
From: Paul Moore @ 2019-12-19 18:11 UTC (permalink / raw)
  To: Ravi Kumar Siddojigari; +Cc: Stephen Smalley, selinux, linux-security-module

On Thu, Dec 19, 2019 at 4:48 AM Ravi Kumar Siddojigari
<rsiddoji@codeaurora.org> wrote:
>
> Sorry, re-adding the patch below as requested.
>
> Stephen,
> The issue is fixed with these 2 changes. The issue was also reproduced on v4.14, and similar changes work there as well.
>
> --
> From 77c618006397c7a65ead257f3cb4e4fe3da2d4b8 Mon Sep 17 00:00:00 2001
> From: Jaihind Yadav <jaihindyadav@codeaurora.org>
> Date: Tue, 17 Dec 2019 17:25:47 +0530
> Subject: [PATCH] selinux: ensure we cleanup the internal AVC counters on error
>  in avc_update()
>
> In AVC update we don't call avc_node_kill() when avc_xperms_populate()
> fails, resulting in the avc->avc_cache.active_nodes counter having an
> incorrect value. This change was missed in the last patch, so correcting it here.
>
> Change-Id: Ic0298162cc766c0f21be7ab232e259766654dad3
> Signed-off-by: Ravi Kumar Siddojigari <rsiddoji@codeaurora.org>
> ---
>  security/selinux/avc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Two things:

* As Stephen already pointed out, please don't include "Change-Id"
metadata in your commit; it means nothing to the upstream kernel.

* If the patch is really from Jaihind Yadav then they should include
their sign-off, and preferably you would include your sign-off as well
since you are the one posting the patch.  Please look at the
"Developer's Certificate of Origin" section in
Documentation/process/submitting-patches.rst.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply	[flat|nested] 18+ messages in thread
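
Putting Stephen's and Paul's feedback together, the trailer block on the
final version of this patch would presumably look like the following (the
Change-Id dropped, the Fixes: line added, and sign-offs from both the
author and the submitter):

Fixes: fa1aa143ac4a ("selinux: extended permissions for ioctls")
Signed-off-by: Jaihind Yadav <jaihindyadav@codeaurora.org>
Signed-off-by: Ravi Kumar Siddojigari <rsiddoji@codeaurora.org>

As it happens, the resend below keeps both sign-offs but does not carry the
Fixes: line.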

* Re: Looks like issue in handling active_nodes count in 4.19 kernel
  2019-12-19 18:11                     ` Paul Moore
@ 2019-12-20 12:03                       ` Ravi Kumar Siddojigari
  2019-12-21 16:02                         ` Paul Moore
  0 siblings, 1 reply; 18+ messages in thread
From: Ravi Kumar Siddojigari @ 2019-12-20 12:03 UTC (permalink / raw)
  To: selinux, linux-security-module, sds; +Cc: rsiddoji, Jaihind Yadav


Thanks for correcting; adding the sign-off of the original author in the
following commit.

From 6308b405e2097ab9d82c5a3894815daf7331e0b6 Mon Sep 17 00:00:00 2001
From: Jaihind Yadav <jaihindyadav@codeaurora.org>
Date: Tue, 17 Dec 2019 17:25:47 +0530
Subject: [PATCH] selinux: ensure we cleanup the internal AVC counters on error
 in avc_update()
To: rsiddoji@codeaurora.org

In AVC update we don't call avc_node_kill() when avc_xperms_populate()
fails, resulting in the avc->avc_cache.active_nodes counter having an
incorrect value. This change was missed in the last patch, so correcting it here.


Signed-off-by: Jaihind Yadav <jaihindyadav@codeaurora.org>
Signed-off-by: Ravi Kumar Siddojigari <rsiddoji@codeaurora.org>
---
 security/selinux/avc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/selinux/avc.c b/security/selinux/avc.c
index ecd3829996aa..9c69e83834b0 100644
--- a/security/selinux/avc.c
+++ b/security/selinux/avc.c
@@ -907,7 +907,7 @@ static int avc_update_node(struct selinux_avc *avc,
        if (orig->ae.xp_node) {
                rc = avc_xperms_populate(node, orig->ae.xp_node);
                if (rc) {
-                       kmem_cache_free(avc_node_cachep, node);
+                       avc_node_kill(avc, node);
                        goto out_unlock;
                }
        }
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: Looks like issue in handling active_nodes count in 4.19 kernel
  2019-12-20 12:03                       ` Ravi Kumar Siddojigari
@ 2019-12-21 16:02                         ` Paul Moore
  0 siblings, 0 replies; 18+ messages in thread
From: Paul Moore @ 2019-12-21 16:02 UTC (permalink / raw)
  To: Ravi Kumar Siddojigari
  Cc: selinux, linux-security-module, Stephen Smalley, Jaihind Yadav

On Fri, Dec 20, 2019 at 7:03 AM Ravi Kumar Siddojigari
<rsiddoji@codeaurora.org> wrote:
> Thanks for correcting; adding the sign-off of the original author in the
> following commit.
>
> From 6308b405e2097ab9d82c5a3894815daf7331e0b6 Mon Sep 17 00:00:00 2001
> From: Jaihind Yadav <jaihindyadav@codeaurora.org>
> Date: Tue, 17 Dec 2019 17:25:47 +0530
> Subject: [PATCH] selinux: ensure we cleanup the internal AVC counters on error
>  in avc_update()
> To: rsiddoji@codeaurora.org
>
> In AVC update we don't call avc_node_kill() when avc_xperms_populate()
> fails, resulting in the avc->avc_cache.active_nodes counter having an
> incorrect value. This change was missed in the last patch, so correcting it here.
>
>
> Signed-off-by: Jaihind Yadav <jaihindyadav@codeaurora.org>
> Signed-off-by: Ravi Kumar Siddojigari <rsiddoji@codeaurora.org>
> ---
>  security/selinux/avc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Merged into selinux/next, thanks!

-- 
paul moore
www.paul-moore.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Looks like issue in handling   active_nodes count  in 4.19 kernel .
@ 2019-12-09 15:55 rsiddoji
  0 siblings, 0 replies; 18+ messages in thread
From: rsiddoji @ 2019-12-09 15:55 UTC (permalink / raw)
  To: selinux; +Cc: paul, linux-security-module, sds

Hi team,
It looks like we have an issue in handling the "active_nodes" count in the
SELinux avc.c file, where avc_cache.active_nodes grows beyond the slot
array and the code ends up calling avc_reclaim_node() frequently from
avc_alloc_node().

The following are the 2 instances which seem to be possible culprits, as
seen on the 4.19 kernel. Please comment if my understanding is wrong.


#1. The active_nodes count is incremented in avc_alloc_node(avc), which is
called from avc_insert(). If the code takes the failure path on
avc_xperms_populate(), it will not decrement this counter.


static struct avc_node *avc_insert(struct selinux_avc *avc,
				   u32 ssid, u32 tsid, u16 tclass,
				   struct av_decision *avd,
...
	node = avc_alloc_node(avc);  // active_nodes incremented here
...
	rc = avc_xperms_populate(node, xp_node);  // this can fail
	if (rc) {
		kmem_cache_free(avc_node_cachep, node);  // but on failure we
							 // never decrement
							 // active_nodes?
		return NULL;
	}

#2. The logic comparing active_nodes against avc_cache_threshold seems
weird: the count of active nodes can climb past 512 long before the slots
are full, so we may end up simply calling avc_reclaim_node() frequently and
not using the cache at its best. Should we be comparing against some higher
watermark, or is my understanding wrong?
 
/* in avc_alloc_node(struct selinux_avc *avc): */

	if (atomic_inc_return(&avc->avc_cache.active_nodes) >
	    avc->avc_cache_threshold)	// default threshold is 512
		avc_reclaim_node(avc);


Regards,
Ravi


^ permalink raw reply	[flat|nested] 18+ messages in thread
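
For reference on question #2: crossing avc_cache_threshold does not flush
the cache. avc_reclaim_node() walks the hash slots starting from a rotating
lru_hint and evicts at most AVC_CACHE_RECLAIM (16 by default) nodes per
call, so the steady state is the node count hovering slightly above the
threshold. A condensed sketch of the 4.19-era logic (paraphrased from the
source, not verbatim):

#define AVC_CACHE_SLOTS		512
#define AVC_CACHE_RECLAIM	16

static inline int avc_reclaim_node(struct selinux_avc *avc)
{
	struct avc_node *node;
	struct hlist_head *head;
	spinlock_t *lock;
	unsigned long flags;
	int hvalue, try, ecx = 0;

	for (try = 0; try < AVC_CACHE_SLOTS; try++) {
		/* rotate the starting slot so eviction pressure is spread */
		hvalue = atomic_inc_return(&avc->avc_cache.lru_hint) &
			 (AVC_CACHE_SLOTS - 1);
		head = &avc->avc_cache.slots[hvalue];
		lock = &avc->avc_cache.slots_lock[hvalue];

		if (!spin_trylock_irqsave(lock, flags))
			continue;	/* skip contended slots */

		rcu_read_lock();
		hlist_for_each_entry(node, head, list) {
			avc_node_delete(avc, node);
			avc_cache_stats_incr(reclaims);
			if (++ecx >= AVC_CACHE_RECLAIM) {
				rcu_read_unlock();
				spin_unlock_irqrestore(lock, flags);
				goto out;
			}
		}
		rcu_read_unlock();
		spin_unlock_irqrestore(lock, flags);
	}
out:
	return ecx;
}

This is why a count of 547 against a threshold of 512 is normal under load;
the pathological case in this thread was the count staying above the
threshold while all the slots were empty, which the leaked-node fixes
address.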

end of thread, other threads:[~2019-12-21 16:02 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <0101016eeb5fdf43-18f58c0b-8670-43eb-ad08-60dae381f0fd-000000@us-west-2.amazonses.com>
2019-12-09 18:05 ` Looks like issue in handling active_nodes count in 4.19 kernel Stephen Smalley
2019-12-09 18:30   ` rsiddoji
2019-12-11 14:37     ` Stephen Smalley
2019-12-11 14:47       ` Stephen Smalley
2019-12-11 15:35         ` rsiddoji
     [not found]         ` <0101016ef59a2152-41e65aac-8784-4401-b20d-45b2852872d4-000000@us-west-2.amazonses.com>
2019-12-11 15:53           ` Stephen Smalley
2019-12-17 15:40             ` Ravi Kumar Siddojigari
2019-12-17 15:52               ` Stephen Smalley
2019-12-17 16:23                 ` Stephen Smalley
2019-12-18  5:58                   ` Ravi Kumar Siddojigari
2019-12-18 13:53                     ` Stephen Smalley
2019-12-19  2:20                 ` Paul Moore
2019-12-19  9:48                   ` Ravi Kumar Siddojigari
2019-12-19 16:00                     ` Stephen Smalley
2019-12-19 18:11                     ` Paul Moore
2019-12-20 12:03                       ` Ravi Kumar Siddojigari
2019-12-21 16:02                         ` Paul Moore
2019-12-09 15:55 rsiddoji
