On 2011-07-15 14:30, Gilles Chanteperdrix wrote:
> On 07/14/2011 10:57 PM, Jan Kiszka wrote:
>> On 2011-07-13 21:12, Gilles Chanteperdrix wrote:
>>> On 07/13/2011 09:04 PM, Jan Kiszka wrote:
>>>> On 2011-07-13 20:39, Gilles Chanteperdrix wrote:
>>>>> On 07/12/2011 07:43 PM, Jan Kiszka wrote:
>>>>>> On 2011-07-12 19:38, Gilles Chanteperdrix wrote:
>>>>>>> On 07/12/2011 07:34 PM, Jan Kiszka wrote:
>>>>>>>> On 2011-07-12 19:31, Gilles Chanteperdrix wrote:
>>>>>>>>> On 07/12/2011 02:57 PM, Jan Kiszka wrote:
>>>>>>>>>> 	xnlock_put_irqrestore(&nklock, s);
>>>>>>>>>> 	xnpod_schedule();
>>>>>>>>>>  }
>>>>>>>>>> @@ -1036,6 +1043,7 @@ redo:
>>>>>>>>>>  	 * to process this signal anyway.
>>>>>>>>>>  	 */
>>>>>>>>>>  	if (rthal_current_domain == rthal_root_domain) {
>>>>>>>>>> +		XENO_BUGON(NUCLEUS, xnthread_test_info(thread, XNATOMIC));
>>>>>>>>>
>>>>>>>>> Misleading dead code again, XNATOMIC is cleared not ten lines above.
>>>>>>>>
>>>>>>>> Nope, I forgot to remove that line.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>  		if (XENO_DEBUG(NUCLEUS) && (!signal_pending(this_task)
>>>>>>>>>>  		    || this_task->state != TASK_RUNNING))
>>>>>>>>>>  			xnpod_fatal
>>>>>>>>>> @@ -1044,6 +1052,8 @@ redo:
>>>>>>>>>>  		return -ERESTARTSYS;
>>>>>>>>>>  	}
>>>>>>>>>>
>>>>>>>>>> +	xnthread_clear_info(thread, XNATOMIC);
>>>>>>>>>
>>>>>>>>> Why this? I find the xnthread_clear_info(XNATOMIC) right at the right
>>>>>>>>> place at the point it currently is.
>>>>>>>>
>>>>>>>> Nope. Now we either clear XNATOMIC after successful migration or when
>>>>>>>> the signal is about to be sent (ie. in the hook). That way we can test
>>>>>>>> more reliably (TM) in the gatekeeper if the thread can be migrated.
>>>>>>>
>>>>>>> Ok for adding the XNATOMIC test, because it improves the robustness, but
>>>>>>> why changing the way XNATOMIC is set and clear? Chances of breaking
>>>>>>> thing while changing code in this area are really high...
>>>>>>
>>>>>> The current code is (most probably) broken as it does not properly
>>>>>> synchronizes the gatekeeper against a signaled and "runaway" target
>>>>>> Linux task.
>>>>>>
>>>>>> We need an indication if a Linux signal will (or already has) woken up
>>>>>> the to-be-migrated task. That task may have continued over its context,
>>>>>> potentially on a different CPU. Providing this indication is the purpose
>>>>>> of changing where XNATOMIC is cleared.
>>>>>
>>>>> What about synchronizing with the gatekeeper with a semaphore, as done
>>>>> in the first patch you sent, but doing it in xnshadow_harden, as soon as
>>>>> we detect that we are not back from schedule in primary mode? It seems
>>>>> it would avoid any further issue, as we would then be guaranteed that
>>>>> the thread could not switch to TASK_INTERRUPTIBLE again before the
>>>>> gatekeeper is finished.
>>>>
>>>> The problem is that the gatekeeper tests the task state without holding
>>>> the task's rq lock (which is not available to us without a kernel
>>>> patch). That cannot work reliably as long as we accept signals. That's
>>>> why I'm trying to move state change and test under nklock.
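
Just to make that concrete, the gatekeeper-side test I'm after would look
roughly like the sketch below. This is only an illustration, not the actual
shadow.c code: the function name is made up, sched->gktarget stands for the
gatekeeper's target thread, and the real state check may additionally have
to mask I-pipe specific state bits.

/* Sketch only -- not the current shadow.c implementation. */
static void gatekeeper_try_migrate(struct xnsched *sched)
{
	xnthread_t *target = sched->gktarget;
	struct task_struct *p = xnthread_user_task(target);
	spl_t s;
	int resumed = 0;

	xnlock_get_irqsave(&nklock, s);

	/*
	 * Only hand the shadow over to the Xenomai scheduler if the Linux
	 * side still sleeps in schedule() and no signal has woken it up
	 * meanwhile, i.e. XNATOMIC is still set (it is now cleared either
	 * after a successful migration or in the signal hook).
	 */
	if (p->state == TASK_INTERRUPTIBLE &&
	    xnthread_test_info(target, XNATOMIC)) {
		xnpod_resume_thread(target, XNRELAX);
		resumed = 1;
	}

	xnlock_put_irqrestore(&nklock, s);

	if (resumed)
		xnpod_schedule();
}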
>>>>>
>>>>> What worries me is the comment in xnshadow_harden:
>>>>>
>>>>> * gatekeeper sent us to primary mode. Since
>>>>> * TASK_UNINTERRUPTIBLE is unavailable to us without wrecking
>>>>> * the runqueue's count of uniniterruptible tasks, we just
>>>>> * notice the issue and gracefully fail; the caller will have
>>>>> * to process this signal anyway.
>>>>> */
>>>>>
>>>>> Does this mean that we can not switch to TASK_UNINTERRUPTIBLE at this
>>>>> point? Or simply that TASK_UNINTERRUPTIBLE is not available for the
>>>>> business of xnshadow_harden?
>>>>>
>>>>
>>>> TASK_UNINTERRUPTIBLE is not available without patching the kernel's
>>>> scheduler for the reason mentioned in the comment (the scheduler becomes
>>>> confused and may pick the wrong tasks, IIRC).
>>>
>>> Does not using down/up in the taskexit event handler risk to cause the
>>> same issue?
>>
>> Yes, and that means the first patch is incomplete without something like
>> the second.
>>
>>>
>>>>
>>>> But I would refrain from trying to "improve" the gatekeeper design. I've
>>>> recently mentioned this to Philippe offlist: For Xenomai 3 with some
>>>> ipipe v3, we must rather patch schedule() to enable zero-switch domain
>>>> migration. Means: enter the scheduler, let it suspend current and pick
>>>> another task, but then simply escalate to the RT domain before doing any
>>>> context switch. That's much cheaper than the current design and
>>>> hopefully also less error-prone.
>>>
>>> So, do you want me to merge your for-upstream branch?
>>
>> You may merge up to for-upstream^, ie. without any gatekeeper fixes.
>>
>> I strongly suspect that there are still more races in the migration
>> path. The crashes we face even with all patches applied may be related
>> to a shadow task being executed under Linux and Xenomai at the same time.
>
> Maybe we could try the following patch instead?
>
> diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
> index 01f4200..deb7620 100644
> --- a/ksrc/nucleus/shadow.c
> +++ b/ksrc/nucleus/shadow.c
> @@ -1033,6 +1033,8 @@ redo:
>  			xnpod_fatal
>  			    ("xnshadow_harden() failed for thread %s[%d]",
>  			     thread->name, xnthread_user_pid(thread));
> +		down(&sched->gksync);
> +		up(&sched->gksync);
>  		return -ERESTARTSYS;
>  	}
>

I don't think we need this. But we may need locking around
clear_task_nowakeup in xnshadow_relax; it could race with state manipulation
done over the task context on a different CPU.

But... right now it looks like we found our primary regression:
"nucleus/shadow: shorten the uninterruptible path to secondary mode". It
opens a short window during relax where the migrated task may be active
under both schedulers. We are currently evaluating a revert (looks good so
far), and I need to work out my theory in more detail.

Jan
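
PS: Regarding the locking around clear_task_nowakeup in xnshadow_relax, the
shape I have in mind is simply the sketch below. This is not a tested patch;
I am assuming clear_task_nowakeup() takes the task pointer, and the exact
placement inside xnshadow_relax would still need checking against the wakeup
path.

static inline void relax_clear_nowakeup(void)
{
	struct task_struct *p = current;
	spl_t s;

	/*
	 * Sketch only: clear the nowakeup condition under nklock so it
	 * cannot race with shadow state manipulation performed over the
	 * task context on another CPU.
	 */
	xnlock_get_irqsave(&nklock, s);
	clear_task_nowakeup(p);		/* assumed signature */
	xnlock_put_irqrestore(&nklock, s);
}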