Race condition during policy load in kernel

* Race condition during policy load in kernel
@ 2020-04-27 20:40 Daniel Burgener
  2020-04-28 18:54 ` Stephen Smalley
  0 siblings, 1 reply; 7+ messages in thread
From: Daniel Burgener @ 2020-04-27 20:40 UTC (permalink / raw)
  To: Stephen Smalley, selinux list; +Cc: james.morris

Hello all,

We've noticed a few instances of strange failures in userspace object 
managers happening immediately after a policy load, that we believe 
we've traced to a race condition in the kernel, and wanted to get your 
thoughts on our diagnosis and the appropriate fix before creating a 
patch in case we've missed something.

The issue is that userspace object managers rely on /sys/fs/selinux to 
determine the mapping of object class and permission strings to numeric 
identifiers, however immediately after a policy load, 
/sys/fs/selinux/{booleans,class,policy_capabilities}, are deleted and 
recreated.  This leaves a window where the object classes (etc) appear 
to be not defined, even if they are defined in both the old and new 
policies.

We have observed this with both dbus and systemd userspace object 
managers, and it can be reproduced straightforwardly by running the 
following (borrowed from bug linked below):

# (while true; do cat /sys/fs/selinux/class/service/index >/dev/null; 
done) &
# while true; do sudo load_policy; echo -n .; sleep 0.1;done

Periodically, we will get "No such file or directory" messages printed 
to stderr.  In the event of a userspace object manager using libselinux 
to check a userspace permission, that will result in a USER_AVC message 
indicating an unknown object class, and in the event that 
--handle-unknown is set to "deny", it will deny the access.

It seems to me as though some sort of locking should occur in the 
selinuxfs to prevent access to the files being deleted and recreated 
during the policy load, so that userspace programs relying on them (in 
particular userspace object managers doing class lookups) get a 
consistent view of the classes, perms, booleans and capabilities in the 
loaded policy.

This seems to be related to 
https://github.com/SELinuxProject/selinux-kernel/issues/42 but I believe 
it is a different case.  The deadlock in that bug seems to be related to 
the underlying filesystem functions, specifically around directory 
deletion while this is an issue the selinuxfs logic specifically. The 
above linked issue appears to have been fixed in recent upstream 
kernels, per the bug, but I have verified the issue I am discussing here 
in 5.7.0 rc3.

It seems to me as though from the perspective of userspace that all of 
sel_make_policy_nodes (or at least all of each of its component 
functions) should be atomic.  There was some discussion in a previous 
thread 
(https://lore.kernel.org/selinux/20181002155810.GP32577@ZenIV.linux.org.uk/) 
around a significant refactor of policy loading in general.  It appears 
as though the direct issue there of access during the deletion has been 
resolved (commit d4f4de5e5ef8efde85febb6876cd3c8ab1631999), although the 
complete suggested fix of refactoring policy_load into two parts has not 
been done.  Would that refactor be the right approach to the problem I 
am trying to solve?  Would a patch for adding locking around the 
selinuxfs delete/recreate operation be considered? That wouldn't address 
all the concerns, (namely the potential to access a view of the policy 
that doesn't match the currently loaded policy and error recovery in the 
case that sel_make_nodes fails), but would improve the reliability of 
existing userspace object managers

I'm happy to create and submit a patch, but I wanted to get the 
communities thought on the problem and correct approach first.

-Daniel

^ permalink raw reply	[flat|nested] 7+ messages in thread