Re: [PATCH] Fast status update interface (/selinux/status)

From: KaiGai Kohei <kaigai@ak.jp.nec.com>
To: Eric Paris <eparis@redhat.com>
Cc: KaiGai Kohei <kaigai@kaigai.gr.jp>,
	selinux@tycho.nsa.gov, ewalsh@tycho.nsa.gov
Subject: Re: [PATCH] Fast status update interface (/selinux/status)
Date: Tue, 14 Sep 2010 18:31:22 +0900
Message-ID: <4C8F40EA.1040207@ak.jp.nec.com>
In-Reply-To: <1284410701.2703.31.camel@localhost.localdomain>

(2010/09/14 5:45), Eric Paris wrote:
> On Thu, 2010-09-02 at 17:16 +0900, KaiGai Kohei wrote:
>> (2010/08/28 12:24), KaiGai Kohei wrote:
>>> (2010/08/28 1:19), Eric Paris wrote:
>>>> On Fri, Aug 27, 2010 at 11:48 AM, Eric Paris<eparis@parisplace.org>    wrote:
>>>>> 2010/8/27 KaiGai Kohei<kaigai@ak.jp.nec.com>:
>>>>>> I revised the /selinux/status implementation.
>>>>>>
>>>>>> * It now reports 'deny_unknown'. Userspace object managers
>>>>>>     also reference this flag to decide their behavior when the loaded
>>>>>>     policy does not support the expected object classes.
>>>>>> * It used to pass PAGE_READONLY to remap_pfn_range() as the page
>>>>>>     protection flag regardless of the arguments to mmap(2), which was
>>>>>>     uncommon. I changed it to pass vma->vm_page_prot instead of the
>>>>>>     hardwired flag, in line with other implementations.
>>>>>>     Now it returns an error if the user tries to map /selinux/status as
>>>>>>     writable pages.
>>>>>
>>>>> I really hate blowing 4k of memory on every system to show 40 bytes of
>>>>> data on just a few systems.  Is there any chance we could allocate the
>>>>> page the first time it is needed rather than at boot?  I know compared
>>>>> to the size of policy and other memory usage in SELinux it's odd for
>>>>> me to complain, but I've decided to get on a reduction if possible
>>>>> kick.
>>>>>
>>>>> Only other comment is that __initcall() is deprecated and we are
>>>>> supposed to use device_initcall() now.
>>>>>
>>>>> If you plan to use it, I'll ack if you change both of those things....
>>>>
>>>> actually if you move to dynamic allocation of the status page and use
>>>> static DEFINE_SPINLOCK instead of static spinlock_t you can get rid of
>>>> the __init() code altogether....
>>>>
>>>
>>> I revised the patch.
>>> The selinux_kernel_page is now allocated the first time an
>>> application tries to reference /selinux/status.
>>> At the same time, selinux_status_lock is now declared using DEFINE_MUTEX(),
>>> so the whole __init section is gone.
>>>
>>> In addition, I changed the first member of selinux_kernel_status from
>>> 'length' to 'version', because sizeof(struct ...) is aligned to a 64bit
>>> boundary (24bytes) on x86_64 systems, although it is actually 20bytes.
>>> If we want to add a 32bit member in the future, 'length' may not tell
>>> applications enough.
>>>
>> How about getting this feature merged?
>> Although I did not come up with this idea for a long time, it is a quite
>> helpful feature for implementing SE-PostgreSQL (and other upcoming
>> userspace object managers) in a less invasive way.
>>
>> I fixed up two minor points in the patch, as follows:
>> * The 4K status page is now allocated in the file_operations::open()
>>    method, because it seems a bit unnatural to me for either read() or
>>    mmap() to fail due to a memory allocation error.
>> * I removed an unnecessary extern variable declaration that I had
>>    forgotten to eliminate.
>>
>>   Signed-off-by: KaiGai Kohei<kaigai@ak.jp.nec.com>
> 
> Sorry I was on vacation for the last 2 weeks.  I'm happy with it so:
> 
> Acked-by: Eric Paris<eparis@redhat.com>
> 
> As to one comment in the code:
> 
> + * In addition, application should also checks the sequence number at
> + * tail of the access control routine. If it is changed from the value
> + * on the head, it means kernel status was changed under processing the
> + * routine. In this case, application should repeat to run the routine
> + * from head, but we expect it is much rare case.
> 
> Is this just to eliminate the race where:
> 
> userspace checks seqno
> 				kernel loads new policy
> userspace avc responds to request
> 
> but the response 'should' have been different thanks to the policy load
> that the userspace object manager didn't hear?  I claim there is no race
> here, since the request had to be made before the policy load, even if
> the userspace AVC didn't respond until after the load.
> 
The reason I think we should check the seqno at the tail of the access
control routine is that avc_has_perm() takes tclass as a security_class_t
and requested as an access_vector_t, both of which hold numbers that
depend on the current policy.

Unlike earlier versions of SELinux, the codes of object classes and
access vectors are no longer hard-wired; this allows kernel code and
security policy to be developed independently, which is good.

On the other hand, when an application queries SELinux, it needs to
translate the names of the object class and access vectors, as follows:

1: tclass = string_to_security_class("file");
2: requested = string_to_av_perm(tclass, "read");
3: avc_has_perm(..., tclass, requested, ...);

Although it is quite a rare event, reloading a policy may change the
codes of object classes and access vectors.
So, we need to ensure that steps 1-3 are handled atomically with respect
to policy reloading.
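
To illustrate, here is a minimal sketch of the head/tail seqno check
wrapped around steps 1-3. It is not the code from the patch. The names
check_with_seqno(), check_routine_t, fake_state and fake_routine() are
hypothetical, and the fake routine only simulates a policy reload so the
retry can be seen without a real status page. Real code would also need
read memory barriers around the two loads of the seqno.

```c
#include <stdint.h>

/* Hypothetical callback type: one pass over steps 1-3 (name lookups plus
 * avc_has_perm()), returning the access decision. */
typedef int (*check_routine_t)(void *arg);

/*
 * Run 'routine' and repeat it if the sequence number changed underneath it.
 * 'seqno' is assumed to point at the sequence field of the mapped
 * /selinux/status page. 'volatile' only forces a fresh load each time;
 * it is not a substitute for proper memory barriers.
 */
static int check_with_seqno(const volatile uint32_t *seqno,
                            check_routine_t routine, void *arg,
                            unsigned int *passes)
{
    uint32_t head;
    int decision;

    *passes = 0;
    do {
        head = *seqno;            /* seqno at the head of the routine */
        decision = routine(arg);  /* steps 1-3 */
        (*passes)++;
    } while (head != *seqno);     /* changed at the tail: start over */

    return decision;
}

/* Simulation only: a fake policy whose class code changes on reload. */
struct fake_state {
    uint32_t seqno;       /* stands in for the status page field */
    int reloads_left;     /* how many reloads to inject mid-routine */
    int class_code;       /* stands in for a policy-dependent code */
};

static int fake_routine(void *arg)
{
    struct fake_state *st = arg;
    int tclass = st->class_code;      /* step 1: translate the name */

    if (st->reloads_left > 0) {       /* policy reload hits mid-routine */
        st->reloads_left--;
        st->class_code++;
        st->seqno++;
    }
    return tclass == st->class_code;  /* 1 if the code was still valid */
}
```

With one reload injected, the first pass works on a stale class code and
is thrown away; the second pass runs against the new policy and its
decision is the one returned.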

Apart from whether we should describe this in the kernel source,
it seems to me to be a scenario that we need to pay attention to.

> I just don't see where closing that 'race' actually improves security.
> The application is going to actually do the privileged operation some
> time after the access check.  So what we have today seems just as good
> when we consider the set of operations
> 
> userspace checks seqno
> userspace avc responds to request
> userspace checks seqno
> 				kernel loads new policy
> userspace actually performs priv operation.
> 
I don't think it is a 'race' that needs to be fixed.
This pseudo code makes the access control decision atomically.
It is no different from any other kernel code which performs privileged
operations after read_unlock(&policy_rwlock) :-)

> Doesn't change the patch at all, I just think it makes your test case
> look a lot worse than it needs to.
> 
> I wonder what your results are when you use the separate thread method
> which doesn't have anything except 1,000,000 calls to avc_has_perm().
> Doesn't matter just me wondering out loud and wondering if there is any
> reason to use the separate thread implementation if this is just about
> as fast.
> 
In situations where we can use a separate thread, I expect there is
no significant difference between mmap()'ing and threading.
However, in some cases, launching a worker thread from a plugin module
is an invasive style. And right now, I'm developing SE-PostgreSQL
and memcached with SELinux as plugin modules.
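
For the plugin case, the mmap() side could look roughly like the sketch
below. It is an illustration under assumptions, not the userspace code
itself: struct status_view, map_status() and status_changed() are
hypothetical names, and the five-field layout is assumed and should be
checked against the kernel headers in use. The path is a parameter; on a
real system it would be /selinux/status.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumed layout of the status page (five 32-bit fields, 20 bytes). */
struct status_view {
    uint32_t version;
    uint32_t sequence;
    uint32_t enforcing;
    uint32_t policyload;
    uint32_t deny_unknown;
};

/* Map 'path' read-only and return a pointer to it, or NULL on failure.
 * A writable mapping is not requested; the kernel side rejects it. */
static const volatile struct status_view *map_status(const char *path)
{
    int fd = open(path, O_RDONLY);
    void *p;

    if (fd < 0)
        return NULL;
    p = mmap(NULL, sizeof(struct status_view), PROT_READ, MAP_SHARED, fd, 0);
    close(fd);   /* the mapping stays valid after close() */
    return p == MAP_FAILED ? NULL : (const volatile struct status_view *)p;
}

/* Non-zero if the kernel status changed since 'last_seen' was sampled. */
static int status_changed(const volatile struct status_view *st,
                          uint32_t last_seen)
{
    return st->sequence != last_seen;
}
```

A plugin would call map_status() once at load time and then compare the
sequence field on each access check, so no worker thread is needed to
listen for policy reload notifications.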

Thanks,
-- 
KaiGai Kohei <kaigai@ak.jp.nec.com>
