Gilles Chanteperdrix wrote: > Jan Kiszka wrote: >> Gilles Chanteperdrix wrote: >>> Jan Kiszka wrote: >>>> Jan Kiszka wrote: >>>>> Gilles Chanteperdrix wrote: >>>>>> Jan Kiszka wrote: >>>>>>> Charlton, John wrote: >>>>>>>> I added some printk output to the cond.c file in xenomai-2.5.1 and it shows that xnsync_sleep_on returned -ETIMEDOUT as it should: >>>>>>>> Xenomai: registered exported object M1-1908 (mutexes) >>>>>>>> Xenomai: registered exported object C1-1908 (condvars) >>>>>>>> xnsynch_sleep_on returned: -110 >>>>>>>> xnsynch_sleep_on returned: -110 >>>>>>>> xnsynch_sleep_on returned: -110 >>>>>>>> xnsynch_sleep_on returned: -110 >>>>>>>> xnsynch_sleep_on returned: -110 >>>>>>>> xnsynch_sleep_on returned: -110 >>>>>>>> xnsynch_sleep_on returned: -4 >>>>>>>> Xenomai: native: cleaning up cond "C1-1908" (ret=0). >>>>>>>> Xenomai: unregistered exported object C1-1908 (condvars) >>>>>>>> Xenomai: native: cleaning up mutex "M1-1908" (ret=0). >>>>>>>> Xenomai: unregistered exported object M1-1908 (mutexes) >>>>>>>> Xenomai: POSIX: destroyed thread df5a0800 >>>>>>>> >>>>>>>> I put the printk on line cond.c:433 after err is set in rt_cond_wait_prologue(): >>>>>>>> >>>>>>>> info = xnsynch_sleep_on(&cond->synch_base, >>>>>>>> timeout, timeout_mode); >>>>>>>> if (info & XNRMID) >>>>>>>> err = -EIDRM; /* Condvar deleted while pending. */ >>>>>>>> else if (info & XNTIMEO) { >>>>>>>> err = -ETIMEDOUT; /* Timeout. */ >>>>>>>> } >>>>>>>> else if (info & XNBREAK) { >>>>>>>> err = -EINTR; /* Unblocked. */ >>>>>>>> } >>>>>>>> printk(KERN_DEBUG "xnsynch_sleep_on returned: %d\n", err); >>>>>>>> >>>>>>>> I put a printk in rt_cond_wait_inner() which is called directly by rt_cond_wait() from user mode but did not see the output in the dmesg output for that one. >>>>>>> One of our problems is the prologue/epilogue split up (both in kernel >>>>>>> and user space): the epilogue can eat the error code or the prologue, >>>>>>> including the -ETIMEDOUT. >>>>>> Ah. My fault. Looks user-space is Ok though. Only kernel-space has a >>>>>> problem. >>>>>> >>>>> Both are affected. >>>>> >>>>> Could you help me with what issues >>>>> 97323b3287b5ee8cad99a7fa67cd050bc51f76c4 should fix? >>>> Ah, restart after RT-signals! So far we blocked on the concluding >>>> rt_mutex_lock without breaking out to user space, now we have to loop >>>> over -EINTR to allow signals, right? >>> Yes, it is needed even for handling linux signals (getting gdb working >>> for instance). However, since we do not want rt_cond_wait to result into >>> two syscalls all the time, we handle everything in the first syscall if >>> possible. >>> >>> Try this: >>> >>> diff --git a/ksrc/skins/native/syscall.c b/ksrc/skins/native/syscall.c >>> index 40e5cfd..d4e885c 100644 >>> --- a/ksrc/skins/native/syscall.c >>> +++ b/ksrc/skins/native/syscall.c >>> @@ -1869,9 +1869,12 @@ static int __rt_cond_wait_prologue(struct pt_regs *regs) >>> >>> err = rt_cond_wait_prologue(cond, mutex, &lockcnt, timeout_mode, timeout); >>> >>> - if (err == 0 || err == -ETIMEDOUT) >>> - err = rt_cond_wait_epilogue(mutex, lockcnt); >>> - >>> + if (err == 0 || err == -ETIMEDOUT) { >>> + int loc_err = rt_cond_wait_epilogue(mutex, lockcnt); >>> + if (loc_err < 0) >>> + err = loc_err; >>> + } >>> + >>> if (err == -EINTR && __xn_reg_arg3(regs) >>> && __xn_safe_copy_to_user((void __user *)__xn_reg_arg3(regs), >>> &lockcnt, sizeof(lockcnt))) >>> >> Same issue exists in user space. And rt_cond_wait_inner needs fixing. >> And then there is a spurious sign inversion: >> >> diff --git a/src/skins/native/cond.c b/src/skins/native/cond.c >> index 478321d..f874678 100644 >> --- a/src/skins/native/cond.c >> +++ b/src/skins/native/cond.c >> @@ -96,9 +96,9 @@ int rt_cond_wait_until(RT_COND *cond, RT_MUTEX *mutex, RTIME timeout) >> &saved_lockcnt, XN_REALTIME, &timeout); >> >> while (err == -EINTR) >> - err = -XENOMAI_SKINCALL2(__native_muxid, >> - __native_cond_wait_epilogue, mutex, >> - saved_lockcnt); >> + err = XENOMAI_SKINCALL2(__native_muxid, >> + __native_cond_wait_epilogue, mutex, >> + saved_lockcnt); >> #endif /* !CONFIG_XENO_FASTSYNCH */ > > Ok for the sign inversion, but in this case the status is Ok. We call > cond_wait_epilogue only if prologue returned -EINTR, and update the > status, this is what we want. > > There is only one way out which will not break the ABI: do not call > cond_wait_epilogue in the kernel-space cond_wait_prologue. > Unfortunately, that ABI is broken: If we get -ETIMEDOUT on cond wait an -EINTR on mutex lock, we lose the former, don't we? If we can't find a workaround, I'm for breaking the ABI for this particular service - but let's search for an alternative first. Jan