On 2011-07-12 14:13, Jan Kiszka wrote:
> On 2011-07-12 14:06, Gilles Chanteperdrix wrote:
>> On 07/12/2011 01:58 PM, Jan Kiszka wrote:
>>> On 2011-07-12 13:56, Jan Kiszka wrote:
>>>> However, this parallel unsynchronized execution of the gatekeeper and
>>>> its target thread leaves an increasingly bad feeling on my side. Did we
>>>> really catch all corner cases now? I wouldn't guarantee that yet,
>>>> specifically as I still have an obscure crash of a Xenomai thread in
>>>> Linux schedule() on my table.
>>>>
>>>> What if the target thread woke up due to a signal, continued much
>>>> further on a different CPU, blocked in TASK_INTERRUPTIBLE, and then the
>>>> gatekeeper continued? I wish we could already eliminate this complexity
>>>> and do the migration directly inside schedule()...
>>>
>>> BTW, why do we mask out TASK_ATOMICSWITCH when checking the task state in
>>> the gatekeeper? What would happen if we included it (state ==
>>> (TASK_ATOMICSWITCH | TASK_INTERRUPTIBLE))?
>>
>> I would tend to think that what we should check is
>> xnthread_test_info(XNATOMIC). Or maybe check both, the interruptible
>> state and the XNATOMIC info bit.
>
> Actually, neither the info bits nor the task state is sufficiently
> synchronized against the gatekeeper yet. We need to hold a shared lock
> when testing and resetting the state. I'm not sure yet whether that is
> fixable given the gatekeeper architecture.
>

This may work (on top of the exit-race fix):

diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
index 50dcf43..90feb16 100644
--- a/ksrc/nucleus/shadow.c
+++ b/ksrc/nucleus/shadow.c
@@ -913,20 +913,27 @@ static int gatekeeper_thread(void *data)
 		if ((xnthread_user_task(target)->state & ~TASK_ATOMICSWITCH) == TASK_INTERRUPTIBLE) {
 			rpi_pop(target);
 			xnlock_get_irqsave(&nklock, s);
-#ifdef CONFIG_SMP
+
 			/*
-			 * If the task changed its CPU while in
-			 * secondary mode, change the CPU of the
-			 * underlying Xenomai shadow too. We do not
-			 * migrate the thread timers here, it would
-			 * not work. For a "full" migration comprising
-			 * timers, using xnpod_migrate_thread is
-			 * required.
+			 * Recheck XNATOMIC to avoid waking the shadow if the
+			 * Linux task received a signal meanwhile.
 			 */
-			if (target->sched != sched)
-				xnsched_migrate_passive(target, sched);
+			if (xnthread_test_info(target, XNATOMIC)) {
+#ifdef CONFIG_SMP
+				/*
+				 * If the task changed its CPU while in
+				 * secondary mode, change the CPU of the
+				 * underlying Xenomai shadow too. We do not
+				 * migrate the thread timers here, it would
+				 * not work. For a "full" migration comprising
+				 * timers, using xnpod_migrate_thread is
+				 * required.
+				 */
+				if (target->sched != sched)
+					xnsched_migrate_passive(target, sched);
 #endif /* CONFIG_SMP */
-			xnpod_resume_thread(target, XNRELAX);
+				xnpod_resume_thread(target, XNRELAX);
+			}
 			xnlock_put_irqrestore(&nklock, s);
 			xnpod_schedule();
 		}
@@ -1036,6 +1043,7 @@ redo:
 	 * to process this signal anyway.
 	 */
 	if (rthal_current_domain == rthal_root_domain) {
+		XENO_BUGON(NUCLEUS, xnthread_test_info(thread, XNATOMIC));
 		if (XENO_DEBUG(NUCLEUS) && (!signal_pending(this_task)
 		    || this_task->state != TASK_RUNNING))
 			xnpod_fatal
@@ -1044,6 +1052,8 @@ redo:
 		return -ERESTARTSYS;
 	}
 
+	xnthread_clear_info(thread, XNATOMIC);
+
 	/* "current" is now running into the Xenomai domain. */
 	thread->gksched = NULL;
 	sched = xnsched_finish_unlocked_switch(thread->sched);
@@ -2650,6 +2660,8 @@ static inline void do_sigwake_event(struct task_struct *p)
 
 	xnlock_get_irqsave(&nklock, s);
 
+	xnthread_clear_info(thread, XNATOMIC);
+
 	if ((p->ptrace & PT_PTRACED) &&
 	    !xnthread_test_state(thread, XNDEBUG)) {
 		sigset_t pending;

It totally ignores RPI and PREEMPT_RT for now. RPI is broken anyway;
ripping it out would allow us to use solely XNATOMIC as the condition in
the gatekeeper.

/me is now looking to get this applied to a testbox that still crashes
suspiciously.

Jan
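
PS: To make the locking rule explicit for readers without the shadow.c
context: the patch turns XNATOMIC into a handshake bit that is only ever
tested and consumed under nklock, so either the gatekeeper hardens the
thread or the signal path cancels the migration, never both. Below is a
condensed, illustrative-only userspace model of that protocol - the names
nklock_model, gatekeeper() and sigwake() are invented for this sketch,
and a pthread mutex merely stands in for nklock; this is not Xenomai
code:

/* xnatomic-model.c: build with gcc -pthread -o model xnatomic-model.c */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t nklock_model = PTHREAD_MUTEX_INITIALIZER;
static int xnatomic = 1;	/* a migration request is pending */

/*
 * Gatekeeper path: act on the thread only if the request is still
 * pending once we hold the lock (the recheck added in the patch).
 */
static void *gatekeeper(void *arg)
{
	pthread_mutex_lock(&nklock_model);
	if (xnatomic) {
		xnatomic = 0;
		printf("gatekeeper: resuming shadow (hardening)\n");
	} else {
		printf("gatekeeper: request canceled, leaving thread alone\n");
	}
	pthread_mutex_unlock(&nklock_model);
	return NULL;
}

/*
 * Signal path (do_sigwake_event above): cancel a pending request
 * under the same lock, like xnthread_clear_info(thread, XNATOMIC).
 */
static void *sigwake(void *arg)
{
	pthread_mutex_lock(&nklock_model);
	if (xnatomic) {
		xnatomic = 0;
		printf("sigwake: canceled pending migration\n");
	}
	pthread_mutex_unlock(&nklock_model);
	return NULL;
}

int main(void)
{
	pthread_t gk, sig;

	pthread_create(&sig, NULL, sigwake, NULL);
	pthread_create(&gk, NULL, gatekeeper, NULL);
	pthread_join(sig, NULL);
	pthread_join(gk, NULL);
	return 0;
}

Whichever path takes the lock first consumes the flag and the other one
backs off. Without the lock, both paths could observe the flag set and
act concurrently, which is exactly the gatekeeper race discussed above.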