I have done 2 things which might be of interest:

I) An rt_mutex unittest suite. It might also be useful against the generic mutexes.

II) I changed the priority inheritance mechanism in rt.c, obtaining the following goals:

1) rt_mutex deadlocks don't become raw_spinlock deadlocks. And more importantly: futex deadlocks don't become raw_spinlock deadlocks.

2) Time-predictable code. No matter how deeply you nest your locks (kernel or futex), the time spent with irqs or preemption off is limited.

3) Simpler code. rt.c was kind of messy. Maybe it still is.... :-)

I have lost:

1) Some speed in the slow slow path. I _might_ have gained some in the normal slow path, though, without having measured it.

Idea: When a task blocks on a lock, it adds itself to the wait list and calls schedule(). When it is unblocked, it has the lock. Or rather, due to grab-locking, it has to check again. Therefore the schedule() call is wrapped in a loop (first sketch below).

Now, when a task is PI boosted, it is at the same time checked whether it is blocked on an rt_mutex. If it is, it is unblocked (wake_up_process_mutex()). It will then go around in the loop mentioned above. Within this loop it will boost the owner of the lock it is blocked on, maybe unblocking the owner, which in turn can boost and unblock the next task in the lock chain... At all points there is at least one task, boosted to the highest required priority, which is unblocked and working on boosting the next task in the lock chain, so there is no priority inversion.

The boosting of a long chain of blocked tasks will clearly take longer than in the previous version, as there will be task switches. But remember, it is in the slow slow path! And it only occurs when PI boosting happens on _nested_ locks.

What is gained is that the amount of time where irqs and preemption are off is limited: one task does its work with preemption disabled, wakes up the next, enables preemption and schedules. The amount of time spent with preemption disabled has a clear upper limit, untouched by how complicated and deep the lock structure is.

So how many locks do we have to worry about? Two. One for locking the lock. One for locking various PI-related data on the task structure, such as the pi_waiters list, blocked_on, pending_owner - and also prio. Therefore only lock->wait_lock and sometask->pi_lock will be held at the same time, and in that order. There are therefore no spinlock deadlocks. And the code is simpler.

Because of the simpler code I was able to implement an optimization: only the first waiter on each lock is a member of owner->pi_waiters. Therefore no list traversals are needed on either owner->pi_waiters or lock->wait_list. Every operation only requires removing and/or adding one element on these lists (second sketch below).

As for robust futexes: they ought to work out of the box now, blocking in deadlock situations. I have added a "BlckOn:" entry to /proc/<pid>/status. This can be used to do "post mortem" deadlock detection from userspace (third sketch below).

What am I missing:

Testing on SMP. I have no SMP machine. The unittest can mimic SMP somewhat, but no unittest can catch _all_ errors.

Testing with futexes.

ALL_PI_TASKS is always switched on now. This is for making the code simpler.

My machine fails to run with CONFIG_DEBUG_DEADLOCKS and CONFIG_DEBUG_PREEMPT on at the same time. I need a serial cable and a console over serial to debug it. My screen is too small to see enough there.

Figure out more tests to run in my unittester.
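To make the above more concrete, here is how I picture the slow path. This is illustrative pseudo-C only: the struct layouts and all helper names (sketch_rt_mutex, try_to_take_lock(), boost_owner(), add_waiter(), remove_waiter()) are made up for the example and are not the actual rt.c code.

#include <linux/spinlock.h>
#include <linux/sched.h>
#include <linux/list.h>

/* Illustrative types only; the real structures differ. */
struct sketch_rt_mutex {
	spinlock_t		wait_lock;	/* raw lock #1 */
	struct task_struct	*owner;
	struct list_head	wait_list;	/* all waiters on this lock */
};

struct sketch_waiter {
	struct list_head	list_entry;	/* on lock->wait_list */
	struct list_head	pi_list_entry;	/* on owner->pi_waiters, top waiter only */
	struct task_struct	*task;
	struct sketch_rt_mutex	*lock;
};

static void sketch_rt_mutex_slowlock(struct sketch_rt_mutex *lock)
{
	struct sketch_waiter waiter;

	spin_lock(&lock->wait_lock);
	add_waiter(lock, &waiter, current);	/* O(1), see the second sketch */

	for (;;) {
		/* Grab-locking: the lock may have been stolen, so retry. */
		if (try_to_take_lock(lock, current))
			break;

		/*
		 * Boost the current owner (takes owner->pi_lock, raw
		 * lock #2, always nested inside lock->wait_lock).  If
		 * the owner is itself blocked on another rt_mutex, it
		 * is woken with wake_up_process_mutex() and will run
		 * this same loop, boosting the *next* owner in the
		 * chain.  Preemption is only off for this one step,
		 * never for the whole chain walk.
		 */
		boost_owner(lock, current->prio);

		set_current_state(TASK_UNINTERRUPTIBLE);
		spin_unlock(&lock->wait_lock);
		schedule();			/* preemption back on here */
		spin_lock(&lock->wait_lock);
	}

	remove_waiter(lock, &waiter);
	spin_unlock(&lock->wait_lock);
}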
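And the add_waiter() helper from the sketch above, showing the "only the top waiter sits on owner->pi_waiters" bookkeeping. Again, everything except the fields already mentioned in the text (pi_waiters, pi_lock, blocked_on) is a made-up name:

/*
 * Sketch of add_waiter(); caller holds lock->wait_lock.  Only the top
 * waiter of each lock is kept on owner->pi_waiters, so a new waiter
 * means at most one removal and one addition on that list - no
 * traversal of either list.
 */
static void add_waiter(struct sketch_rt_mutex *lock,
		       struct sketch_waiter *waiter,
		       struct task_struct *task)
{
	struct sketch_waiter *old_top = top_waiter(lock);

	waiter->task = task;
	waiter->lock = lock;
	task->blocked_on = waiter;		/* used by the boosting code */

	enqueue_on_wait_list(lock, waiter);	/* one element added */

	if (top_waiter(lock) == waiter) {
		/*
		 * The new waiter displaced the old top waiter: swap
		 * exactly one element on owner->pi_waiters, under
		 * owner->pi_lock, nested inside lock->wait_lock.
		 */
		spin_lock(&lock->owner->pi_lock);
		if (old_top)
			dequeue_from_pi_waiters(lock->owner, old_top);
		enqueue_on_pi_waiters(lock->owner, waiter);
		spin_unlock(&lock->owner->pi_lock);
	}
}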
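Finally, a small userspace sketch of the "post mortem" deadlock detection via the new proc entry. It assumes the BlckOn value is a single pid that the task is blocked on; adjust the parsing to the real format.

#include <stdio.h>
#include <stdlib.h>

/* Return the pid this task is blocked on, or 0 if none/unknown. */
static int blocked_on(int pid)
{
	char path[64], line[256];
	FILE *f;
	int target = 0;

	snprintf(path, sizeof(path), "/proc/%d/status", pid);
	f = fopen(path, "r");
	if (!f)
		return 0;
	while (fgets(line, sizeof(line), f))
		if (sscanf(line, "BlckOn: %d", &target) == 1)
			break;
	fclose(f);
	return target;
}

int main(int argc, char **argv)
{
	int pid, i;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <pid>\n", argv[0]);
		return 1;
	}
	pid = atoi(argv[1]);

	/*
	 * Walk the chain for a bounded number of steps; a repeating
	 * pattern in the output means a cycle, i.e. a deadlock.
	 */
	for (i = 0; pid && i < 32; i++) {
		printf("%d -> ", pid);
		pid = blocked_on(pid);
	}
	printf("%d\n", pid);
	return 0;
}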
So why am I not doing those things before sending the patch?

1) Well, my girlfriend comes back tomorrow with our child. I know I will have no time to code anything substantial then.

2) I want to make sure Ingo sees this approach before he starts merging preempt_rt and rt_mutex with his now-mainstream mutex.

Esben