On 2011-07-13 21:12, Gilles Chanteperdrix wrote:
> On 07/13/2011 09:04 PM, Jan Kiszka wrote:
>> On 2011-07-13 20:39, Gilles Chanteperdrix wrote:
>>> On 07/12/2011 07:43 PM, Jan Kiszka wrote:
>>>> On 2011-07-12 19:38, Gilles Chanteperdrix wrote:
>>>>> On 07/12/2011 07:34 PM, Jan Kiszka wrote:
>>>>>> On 2011-07-12 19:31, Gilles Chanteperdrix wrote:
>>>>>>> On 07/12/2011 02:57 PM, Jan Kiszka wrote:
>>>>>>>> 		xnlock_put_irqrestore(&nklock, s);
>>>>>>>> 		xnpod_schedule();
>>>>>>>> 	}
>>>>>>>> @@ -1036,6 +1043,7 @@ redo:
>>>>>>>> 	 * to process this signal anyway.
>>>>>>>> 	 */
>>>>>>>> 	if (rthal_current_domain == rthal_root_domain) {
>>>>>>>> +		XENO_BUGON(NUCLEUS, xnthread_test_info(thread, XNATOMIC));
>>>>>>>
>>>>>>> Misleading dead code again, XNATOMIC is cleared not ten lines above.
>>>>>>
>>>>>> Nope, I forgot to remove that line.
>>>>>>
>>>>>>>
>>>>>>>> 		if (XENO_DEBUG(NUCLEUS) && (!signal_pending(this_task)
>>>>>>>> 		    || this_task->state != TASK_RUNNING))
>>>>>>>> 			xnpod_fatal
>>>>>>>> @@ -1044,6 +1052,8 @@ redo:
>>>>>>>> 		return -ERESTARTSYS;
>>>>>>>> 	}
>>>>>>>>
>>>>>>>> +	xnthread_clear_info(thread, XNATOMIC);
>>>>>>>
>>>>>>> Why this? I find the xnthread_clear_info(XNATOMIC) already at the
>>>>>>> right place where it currently is.
>>>>>>
>>>>>> Nope. Now we either clear XNATOMIC after successful migration or when
>>>>>> the signal is about to be sent (i.e. in the hook). That way we can test
>>>>>> more reliably (TM) in the gatekeeper whether the thread can be migrated.
>>>>>
>>>>> OK for adding the XNATOMIC test, because it improves the robustness, but
>>>>> why change the way XNATOMIC is set and cleared? Chances of breaking
>>>>> things while changing code in this area are really high...
>>>>
>>>> The current code is (most probably) broken as it does not properly
>>>> synchronize the gatekeeper against a signaled and "runaway" target
>>>> Linux task.
>>>>
>>>> We need an indication of whether a Linux signal will wake up (or already
>>>> has woken up) the to-be-migrated task. That task may have continued over
>>>> its context, potentially on a different CPU. Providing this indication is
>>>> the purpose of changing where XNATOMIC is cleared.
>>>
>>> What about synchronizing with the gatekeeper via a semaphore, as done
>>> in the first patch you sent, but doing it in xnshadow_harden, as soon as
>>> we detect that we are not back from schedule() in primary mode? It seems
>>> it would avoid any further issues, as we would then be guaranteed that
>>> the thread could not switch to TASK_INTERRUPTIBLE again before the
>>> gatekeeper is finished.
>>
>> The problem is that the gatekeeper tests the task state without holding
>> the task's rq lock (which is not available to us without a kernel
>> patch). That cannot work reliably as long as we accept signals. That's
>> why I'm trying to move the state change and test under nklock.
>>
>>>
>>> What worries me is the comment in xnshadow_harden:
>>>
>>>  * gatekeeper sent us to primary mode. Since
>>>  * TASK_UNINTERRUPTIBLE is unavailable to us without wrecking
>>>  * the runqueue's count of uniniterruptible tasks, we just
>>>  * notice the issue and gracefully fail; the caller will have
>>>  * to process this signal anyway.
>>>  */
>>>
>>> Does this mean that we cannot switch to TASK_UNINTERRUPTIBLE at this
>>> point? Or simply that TASK_UNINTERRUPTIBLE is not available for the
>>> business of xnshadow_harden?
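For reference, the nklock-protected check discussed above could look roughly
like the following sketch. This is illustrative only, not the actual Xenomai
gatekeeper code: gk_resume_target() is a placeholder for the real resume path,
and the fragment assumes the nucleus headers are already included.

static void gatekeeper_check_and_migrate(struct xnthread *thread,
                                         struct task_struct *p)
{
        spl_t s;

        xnlock_get_irqsave(&nklock, s);

        /*
         * XNATOMIC is cleared either after a successful migration or by
         * the signal hook. If it is gone, or the task is no longer
         * sleeping in xnshadow_harden(), a Linux signal raced us and the
         * shadow must not be resumed in primary mode.
         */
        if (xnthread_test_info(thread, XNATOMIC) &&
            p->state == TASK_INTERRUPTIBLE)
                gk_resume_target(thread);  /* placeholder for the real path */

        xnlock_put_irqrestore(&nklock, s);
}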
>>>
>> TASK_UNINTERRUPTIBLE is not available without patching the kernel's
>> scheduler, for the reason mentioned in the comment (the scheduler becomes
>> confused and may pick the wrong tasks, IIRC).
>
> Doesn't using down/up in the taskexit event handler risk causing the
> same issue?

Yes, and that means the first patch is incomplete without something like
the second.

>
>>
>> But I would refrain from trying to "improve" the gatekeeper design. I've
>> recently mentioned this to Philippe offlist: for Xenomai 3 with some
>> ipipe v3, we must instead patch schedule() to enable zero-switch domain
>> migration. That means: enter the scheduler, let it suspend current and
>> pick another task, but then simply escalate to the RT domain before doing
>> any context switch. That's much cheaper than the current design and
>> hopefully also less error-prone.
>
> So, do you want me to merge your for-upstream branch?

You may merge up to for-upstream^, i.e. without any gatekeeper fixes. I
strongly suspect that there are still more races in the migration path.
The crashes we face even with all patches applied may be related to a
shadow task being executed under Linux and Xenomai at the same time.

Jan
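For reference, the down/up handshake Gilles suggests could be sketched roughly
as below. This only illustrates the idea under discussion, not code that was
merged; gk_sync and the helper names are assumptions, the real gatekeeper
keeps its own per-CPU state, and Jan's "Yes" above points out that this kind
of uninterruptible wait has problems of its own.

#include <linux/semaphore.h>

static struct semaphore gk_sync;        /* assumed name */

static void gk_sync_init(void)
{
        sema_init(&gk_sync, 0);         /* start "locked" */
}

/*
 * xnshadow_harden() side: we came back from schedule() in secondary
 * mode, i.e. a signal won the race against the gatekeeper. Wait until
 * the gatekeeper has finished looking at this task before it may go
 * back to TASK_INTERRUPTIBLE on the next hardening attempt.
 */
static int harden_signal_path(void)
{
        down(&gk_sync);
        return -ERESTARTSYS;            /* caller handles the pending signal */
}

/*
 * Gatekeeper side: whether or not the migration went through, release
 * the waiter so it cannot re-enter TASK_INTERRUPTIBLE while we are
 * still inspecting it.
 */
static void gatekeeper_finish(void)
{
        up(&gk_sync);
}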