On 2011-07-12 13:26, Gilles Chanteperdrix wrote:
> On 07/12/2011 01:10 PM, Jan Kiszka wrote:
>> On 2011-07-12 13:08, Gilles Chanteperdrix wrote:
>>> On 07/12/2011 01:06 PM, Jan Kiszka wrote:
>>>> On 2011-07-12 13:04, Gilles Chanteperdrix wrote:
>>>>> On 07/12/2011 01:00 PM, Jan Kiszka wrote:
>>>>>> On 2011-07-12 12:59, Gilles Chanteperdrix wrote:
>>>>>>> On 07/12/2011 09:22 AM, Jan Kiszka wrote:
>>>>>>>> On 2011-07-12 08:41, Gilles Chanteperdrix wrote:
>>>>>>>>> On 07/11/2011 10:12 PM, Jan Kiszka wrote:
>>>>>>>>>> On 2011-07-11 22:09, Gilles Chanteperdrix wrote:
>>>>>>>>>>> On 07/11/2011 10:06 PM, Jan Kiszka wrote:
>>>>>>>>>>>> On 2011-07-11 22:02, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>> On 07/11/2011 09:59 PM, Jan Kiszka wrote:
>>>>>>>>>>>>>> On 2011-07-11 21:51, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>> On 07/11/2011 09:16 PM, Jan Kiszka wrote:
>>>>>>>>>>>>>>>> On 2011-07-11 21:10, Jan Kiszka wrote:
>>>>>>>>>>>>>>>>> On 2011-07-11 20:53, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>>>>> On 07/08/2011 06:29 PM, GIT version control wrote:
>>>>>>>>>>>>>>>>>>> @@ -2528,6 +2534,22 @@ static inline void do_taskexit_event(struct task_struct *p)
>>>>>>>>>>>>>>>>>>>  	magic = xnthread_get_magic(thread);
>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>>  	xnlock_get_irqsave(&nklock, s);
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +	gksched = thread->gksched;
>>>>>>>>>>>>>>>>>>> +	if (gksched) {
>>>>>>>>>>>>>>>>>>> +		xnlock_put_irqrestore(&nklock, s);
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Are we sure irqs are on here? Are you sure that what is needed is not an
>>>>>>>>>>>>>>>>>> xnlock_clear_irqon?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> We are in the context of do_exit. Not only IRQs are on, also preemption.
>>>>>>>>>>>>>>>>> And surely no nklock is held.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Furthermore, I do not understand how we
>>>>>>>>>>>>>>>>>> "synchronize" with the gatekeeper, how is the gatekeeper garanteed to
>>>>>>>>>>>>>>>>>> wait for this assignment?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The gatekeeper holds the gksync token while it's active. We request it,
>>>>>>>>>>>>>>>>> thus we wait for the gatekeeper to become idle again. While it is idle,
>>>>>>>>>>>>>>>>> we reset the queued reference - but I just realized that this may tramp
>>>>>>>>>>>>>>>>> on other tasks' values. I need to add a check that the value to be
>>>>>>>>>>>>>>>>> null'ified is actually still ours.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thinking again, that's actually not a problem: gktarget is only needed
>>>>>>>>>>>>>>>> while gksync is zero - but then we won't get hold of it anyway and,
>>>>>>>>>>>>>>>> thus, can't cause any damage.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Well, you make it look like it does not work. From what I understand,
>>>>>>>>>>>>>>> what you want is to set gktarget to null if a task being hardened is
>>>>>>>>>>>>>>> destroyed. But by waiting for the semaphore, you actually wait for the
>>>>>>>>>>>>>>> harden to be complete, so setting to NULL is useless. Or am I missing
>>>>>>>>>>>>>>> something else?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Setting to NULL is probably unneeded but still better than rely on the
>>>>>>>>>>>>>> gatekeeper never waking up spuriously and then dereferencing a stale
>>>>>>>>>>>>>> pointer.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The key element of this fix is waitng on gksync, thus on the completion
>>>>>>>>>>>>>> of the non-RT part of the hardening. Actually, this part usually fails
>>>>>>>>>>>>>> as the target task received a termination signal at this point.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, but since you wait on the completion of the hardening, the test
>>>>>>>>>>>>> if (target &&...) in the gatekeeper code will always be true, because at
>>>>>>>>>>>>> this point the cleanup code will still be waiting for the semaphore.
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, except we will ever wake up the gatekeeper later on without an
>>>>>>>>>>>> updated gktarget, ie. spuriously. Better safe than sorry, this is hairy
>>>>>>>>>>>> code anyway (hopefully obsolete one day).
>>>>>>>>>>>
>>>>>>>>>>> The gatekeeper is not woken up by posting the semaphore, the gatekeeper
>>>>>>>>>>> is woken up by the thread which is going to be hardened (and this thread
>>>>>>>>>>> is the one which waits for the semaphore).
>>>>>>>>>>
>>>>>>>>>> All true. And what is the point?
>>>>>>>>>
>>>>>>>>> The point being, would not something like this patch be sufficient?
>>>>>>>>>
>>>>>>>>> diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
>>>>>>>>> index 01f4200..4742c02 100644
>>>>>>>>> --- a/ksrc/nucleus/shadow.c
>>>>>>>>> +++ b/ksrc/nucleus/shadow.c
>>>>>>>>> @@ -2527,6 +2527,18 @@ static inline void do_taskexit_event(struct
>>>>>>>>> task_struct *p)
>>>>>>>>>  	magic = xnthread_get_magic(thread);
>>>>>>>>>
>>>>>>>>>  	xnlock_get_irqsave(&nklock, s);
>>>>>>>>> +	if (xnthread_test_info(thread, XNATOMIC)) {
>>>>>>>>> +		struct xnsched *gksched = xnpod_sched_slot(task_cpu(p));
>>>>>>>>
>>>>>>>> That's not reliable, the task might have been migrated by Linux in the
>>>>>>>> meantime. We must use the stored gksched.
>>>>>>>>
>>>>>>>>> +		xnlock_put_irqrestore(&nklock, s);
>>>>>>>>> +
>>>>>>>>> +		/* Thread is in flight to primary mode, wait for the
>>>>>>>>> +		   gatekeeper to be done with it. */
>>>>>>>>> +		down(&gksched->gksync);
>>>>>>>>> +		up(&gksched->gksync);
>>>>>>>>> +
>>>>>>>>> +		xnlock_get_irqsave(&nklock, s);
>>>>>>>>> +	}
>>>>>>>>> +
>>>>>>>>>  	/* Prevent wakeup call from xnshadow_unmap(). */
>>>>>>>>>  	xnshadow_thrptd(p) = NULL;
>>>>>>>>>  	xnthread_archtcb(thread)->user_task = NULL;
>>>>>>>>>
>>>>>>>>
>>>>>>>> Again, setting gktarget to NULL and testing for NULL is simply safer,
>>>>>>>> and I see no gain in skipping that. But if you prefer the
>>>>>>>> micro-optimization, I'll drop it.
>>>>>>>
>>>>>>> Could not we use an info bit instead of adding a pointer?
>>>>>>>
>>>>>>
>>>>>> "That's not reliable, the task might have been migrated by Linux in the
>>>>>> meantime. We must use the stored gksched."
>>>>>
>>>>> I mean add another info bit to mean that the task is queued for wakeup
>>>>> by the gatekeeper.
>>>>>
>>>>> XNGKQ, or something.
>>>>
>>>> What additional value does it provide to gksched != NULL? We need that
>>>> pointer anyway to identify the gatekeeper that holds a reference.
>>>
>>> No, the scheduler which holds the reference is xnpod_sched_slot(task_cpu(p))
>>
>> And that is still not reliable, the task may have been pushed to a
>> different CPU in the meantime.
> 
> I find all this complicated for a very small corner-case, so, I keep
> looking for a simpler solution. Let us try something else.

It's rather the contrary: this solution is straightforward IMHO.

> 
> If the thread is woken up for whatever reason before the gatekeeper,
> then we will hit the following "if" in xnshadow_harden, can we set the
> target to NULL at this point if it is the current thread? With a cmpxchg
> perhaps?

Gatekeeper and target task aren't synchronized, are they?

Jan