* RE: [PATCH] Priority Lists for the RT mutex
@ 2005-04-11  8:49 Perez-Gonzalez, Inaky
  2005-04-11  8:57 ` Ingo Molnar
  0 siblings, 1 reply; 38+ messages in thread
From: Perez-Gonzalez, Inaky @ 2005-04-11  8:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Sven-Thorsten Dietrich, Daniel Walker, linux-kernel,
	Steven Rostedt, Esben Nielsen, Joe Korty

>From: Ingo Molnar [mailto:mingo@elte.hu]
>
>* Perez-Gonzalez, Inaky <inaky.perez-gonzalez@intel.com> wrote:
>
>> >OTOH, deadlock detection is another issue. It's quite expensive and i'm
>> >not sure we want to make it a runtime thing. But for fusyn's deadlock
>> >detection and safe teardown on owner-exit is a must-have i suspect?
>>
>> Not really. Deadlock check is needed on PI, so it can be done at the
>> same time (you have to walk the chain anyway). In any other case, it
>> is an option you can request (or not).
>
>well, i was talking about the mutex code in PREEMPT_RT. There deadlock
>detection is an optional debug feature. You dont _have_ to do deadlock
>detection for the kernel's locks, and there's a difference in
>performance.

Big mouth'o mine :-| 

Let me re-phrase then: it is a must-have only on PI, to make sure
you don't have a loop when doing it. Maybe that is a consequence of the
algorithm I chose. -However- it should be possible to disable it
in cases where you are reasonably sure it won't happen (such as
kernel code). In any case, AFAIR, I still have not implemented it.

Was this more useful?

-- Inaky 


* Re: [PATCH] Priority Lists for the RT mutex
  2005-04-11  8:49 [PATCH] Priority Lists for the RT mutex Perez-Gonzalez, Inaky
@ 2005-04-11  8:57 ` Ingo Molnar
  2005-04-11 22:17   ` Bill Huey
  2005-04-12 20:35   ` Esben Nielsen
  0 siblings, 2 replies; 38+ messages in thread
From: Ingo Molnar @ 2005-04-11  8:57 UTC (permalink / raw)
  To: Perez-Gonzalez, Inaky
  Cc: Sven-Thorsten Dietrich, Daniel Walker, linux-kernel,
	Steven Rostedt, Esben Nielsen, Joe Korty


* Perez-Gonzalez, Inaky <inaky.perez-gonzalez@intel.com> wrote:

> Let me re-phrase then: it is a must have only on PI, to make sure you 
> don't have a loop when doing it. Maybe is a consequence of the 
> algorithm I chose. -However- it should be possible to disable it in 
> cases where you are reasonably sure it won't happen (such as kernel 
> code). In any case, AFAIR, I still did not implement it.

are there cases where userspace wants to disable deadlock-detection for 
its own locks?

the deadlock detector in PREEMPT_RT is pretty much specialized for 
debugging (it does all sorts of weird locking tricks to get the first 
deadlock out, and to really report it on the console), but it ought to 
be possible to make it usable for userspace-controlled locks as well.

	Ingo


* Re: [PATCH] Priority Lists for the RT mutex
  2005-04-11  8:57 ` Ingo Molnar
@ 2005-04-11 22:17   ` Bill Huey
  2005-04-12 20:35   ` Esben Nielsen
  1 sibling, 0 replies; 38+ messages in thread
From: Bill Huey @ 2005-04-11 22:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Perez-Gonzalez, Inaky, Sven-Thorsten Dietrich, Daniel Walker,
	linux-kernel, Steven Rostedt, Esben Nielsen, Joe Korty

On Mon, Apr 11, 2005 at 10:57:37AM +0200, Ingo Molnar wrote:
> 
> * Perez-Gonzalez, Inaky <inaky.perez-gonzalez@intel.com> wrote:
> 
> > Let me re-phrase then: it is a must have only on PI, to make sure you 
> > don't have a loop when doing it. Maybe is a consequence of the 
> > algorithm I chose. -However- it should be possible to disable it in 
> > cases where you are reasonably sure it won't happen (such as kernel 
> > code). In any case, AFAIR, I still did not implement it.
> 
> are there cases where userspace wants to disable deadlock-detection for 
> its own locks?

I'd disable it for userspace locks. There might be folks that want to
implement userspace drivers, but I can't imagine it being 'ok' to have
the kernel call out to userspace and have it block correctly. I would
expect them to do something else that's less drastic.
 
> the deadlock detector in PREEMPT_RT is pretty much specialized for 
> debugging (it does all sorts of weird locking tricks to get the first 
> deadlock out, and to really report it on the console), but it ought to 
> be possible to make it usable for userspace-controlled locks as well.

If I understand things correctly, I'd let that be an RT app issue and
the app folks should decide what is appropriate for their setup. If
they need a deadlock detector they should decide on their own protocol.
The kernel debugging issues are completely different.

That's my two cents.

bill



* Re: [PATCH] Priority Lists for the RT mutex
  2005-04-11  8:57 ` Ingo Molnar
  2005-04-11 22:17   ` Bill Huey
@ 2005-04-12 20:35   ` Esben Nielsen
  1 sibling, 0 replies; 38+ messages in thread
From: Esben Nielsen @ 2005-04-12 20:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Perez-Gonzalez, Inaky, Sven-Thorsten Dietrich, Daniel Walker,
	linux-kernel, Steven Rostedt, Joe Korty

I looked at the PI-code to see what priority the task (old_owner below)
would end up with when it released a lock. From rt.c:

	prio = mutex_getprio(old_owner);
	if (new_owner && !plist_empty(&new_owner->pi_waiters)) {
		w = plist_entry(&new_owner->pi_waiters,
				struct rt_mutex_waiter, pi_list);
		prio = w->task->prio;
	}
	if (prio != old_owner->prio)
		pi_setprio(lock, old_owner, prio);

What does new_owner have to do with it? Shouldn't it be old_owner in these
lines? I.e. the prio we want to set old_owner to should be the prio of the
head of old_owner->pi_waiters, not new_owner!
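
Something like this is what I would expect instead (just a sketch of that
substitution, not a tested patch; everything else from the snippet above is
kept unchanged):

	prio = mutex_getprio(old_owner);
	if (!plist_empty(&old_owner->pi_waiters)) {
		/* boost to the highest-priority task still waiting on
		 * one of old_owner's other locks */
		w = plist_entry(&old_owner->pi_waiters,
				struct rt_mutex_waiter, pi_list);
		prio = w->task->prio;
	}
	if (prio != old_owner->prio)
		pi_setprio(lock, old_owner, prio);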

Esben


On Mon, 11 Apr 2005, Ingo Molnar wrote:

> 
> * Perez-Gonzalez, Inaky <inaky.perez-gonzalez@intel.com> wrote:
> 
> > Let me re-phrase then: it is a must have only on PI, to make sure you 
> > don't have a loop when doing it. Maybe is a consequence of the 
> > algorithm I chose. -However- it should be possible to disable it in 
> > cases where you are reasonably sure it won't happen (such as kernel 
> > code). In any case, AFAIR, I still did not implement it.
> 
> are there cases where userspace wants to disable deadlock-detection for 
> its own locks?
> 
> the deadlock detector in PREEMPT_RT is pretty much specialized for 
> debugging (it does all sorts of weird locking tricks to get the first 
> deadlock out, and to really report it on the console), but it ought to 
> be possible to make it usable for userspace-controlled locks as well.
> 
> 	Ingo
> 



* Re: [PATCH] Priority Lists for the RT mutex
  2005-05-09 21:29             ` Daniel Walker
@ 2005-05-09 22:15               ` Daniel Walker
  0 siblings, 0 replies; 38+ messages in thread
From: Daniel Walker @ 2005-05-09 22:15 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Perez-Gonzalez, Inaky, Oleg Nesterov, Ingo Molnar

On Mon, 2005-05-09 at 14:29, Daniel Walker wrote:

> 
> It's a typo , I mean rt_mutex->wait_list . Sorry .. The list part is a
> list head , which is two pointers. We could condense that to one
> pointer, and it eliminates the overhead of the structure we use .. Below
> seems to work, but consider it an rfc patch.

That last patch misses the first guy on the wait_list, but I think we
could work around that somehow ..

Daniel



* Re: [PATCH] Priority Lists for the RT mutex
  2005-05-09 19:21           ` Steven Rostedt
@ 2005-05-09 21:29             ` Daniel Walker
  2005-05-09 22:15               ` Daniel Walker
  0 siblings, 1 reply; 38+ messages in thread
From: Daniel Walker @ 2005-05-09 21:29 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Perez-Gonzalez, Inaky, Oleg Nesterov, Ingo Molnar

On Mon, 2005-05-09 at 12:21, Steven Rostedt wrote:
> On Mon, 2005-05-09 at 11:13 -0700, Daniel Walker wrote:
> > On Mon, 2005-05-09 at 02:11, Ingo Molnar wrote:
> > 
> > > What would be nice to achieve are [low-cost] reductions of the size of 
> > > struct rt_mutex (in include/linux/rt_lock.h), upon which all other 
> > > PI-aware locking objects are based. Right now it's 9 words, of which 
> > > struct plist is 5 words. Would be nice to trim this to 8 words - which 
> > > would give a nice round size of 32 bytes on 32-bit.
> > 
> > Why not make rt_mutex->wait_lock a pointer ? Set it to NULL and handle
> > it in rt.c .
> 
> That may make the rt_mutex structure smaller but this increases the size
> of the kernel by the size of that pointer (times every rt_mutex in the
> kernel!). You still need to allocate the size of the raw spin lock,
> although now you just point to it. Is rounding worth that much overhead?


It's a typo, I mean rt_mutex->wait_list. Sorry .. The list part is a
list head, which is two pointers. We could condense that to one
pointer, and it eliminates the overhead of the structure we use .. Below
seems to work, but consider it an rfc patch..

Daniel


Index: linux-2.6.11/include/linux/rt_lock.h
===================================================================
--- linux-2.6.11.orig/include/linux/rt_lock.h	2005-05-09 17:49:26.000000000 +0000
+++ linux-2.6.11/include/linux/rt_lock.h	2005-05-09 19:29:05.000000000 +0000
@@ -58,7 +58,7 @@ typedef struct {
  */
 struct rt_mutex {
 	raw_spinlock_t		wait_lock;
-	struct plist		wait_list;
+	struct plist		*wait_list;
 	struct task_struct	*owner;
 	int			owner_prio;
 # ifdef CONFIG_RT_DEADLOCK_DETECT
@@ -86,12 +86,11 @@ struct rt_mutex_waiter {
 #ifdef CONFIG_RT_DEADLOCK_DETECT
 # define __RT_MUTEX_INITIALIZER(lockname) \
 	{ .wait_lock = RAW_SPIN_LOCK_UNLOCKED, \
-	.wait_list = PLIST_INIT((lockname).wait_list, MAX_PRIO),  \
 	.name = #lockname, .file = __FILE__, .line = __LINE__ }
 #else
 # define __RT_MUTEX_INITIALIZER(lockname) \
 	{ .wait_lock = RAW_SPIN_LOCK_UNLOCKED, \
-	PLIST_INIT((lockname).wait_list, MAX_PRIO) }
+	 }
 #endif
 /*
  * RW-semaphores are an RT mutex plus a reader-depth count.
Index: linux-2.6.11/kernel/rt.c
===================================================================
--- linux-2.6.11.orig/kernel/rt.c	2005-05-09 18:14:31.000000000 +0000
+++ linux-2.6.11/kernel/rt.c	2005-05-09 19:30:37.000000000 +0000
@@ -640,10 +640,11 @@ static void pi_setprio(struct rt_mutex *
 			plist_init(&w->pi_list, prio);
 			plist_add(&w->pi_list, &lock->owner->pi_waiters);
 
-			plist_del(&w->list, &lock->wait_list);
-			plist_init(&w->list, prio);
-			plist_add(&w->list, &lock->wait_list);
-
+			if (!plist_empty(lock->wait_list)) {
+				plist_del(&w->list, lock->wait_list);
+				plist_init(&w->list, prio);
+				plist_add(&w->list, lock->wait_list);
+			}
 		}
 		/*
 		 * If the task is blocked on a lock, and we just restored
@@ -656,10 +657,11 @@ static void pi_setprio(struct rt_mutex *
 		if (!rt_task(p) && !plist_empty(&w->pi_list)) {
 			TRACE_BUG_ON(!was_rt);
 			plist_del(&w->pi_list, &lock->owner->pi_waiters);
-			plist_del(&w->list, &lock->wait_list);
-			plist_init(&w->list, prio);
-			plist_add(&w->list, &lock->wait_list);
-
+			if (!plist_empty(lock->wait_list)) {
+				plist_del(&w->list, lock->wait_list);
+				plist_init(&w->list, prio);
+				plist_add(&w->list, lock->wait_list);
+			}
 		}
 
 		pi_walk++;
@@ -694,16 +696,23 @@ task_blocks_on_lock(struct rt_mutex_wait
 	 */
 #ifndef ALL_TASKS_PI
 	if (!rt_task(task)) {
-		plist_add(&waiter->list, &lock->wait_list);
+		if (lock->wait_list)
+			plist_add(&waiter->list, lock->wait_list);
+		else
+			lock->wait_list = &waiter->list;
 		return;
 	}
 #endif
 	spin_lock(&pi_lock);
 	plist_add(&waiter->pi_list, &lock->owner->pi_waiters);
+
 	/*
 	 * Add RT tasks to the head:
 	 */
-	plist_add(&waiter->list, &lock->wait_list);
+	if (lock->wait_list)
+		plist_add(&waiter->list, lock->wait_list);
+	else
+		lock->wait_list = &waiter->list;
 	/*
 	 * If the waiter has higher priority than the owner
 	 * then temporarily boost the owner:
@@ -721,7 +730,7 @@ static void __init_rt_mutex(struct rt_mu
 {
 	lock->owner = NULL;
 	spin_lock_init(&lock->wait_lock);
-	plist_init(&lock->wait_list, MAX_PRIO);
+	lock->wait_list = NULL;
 #ifdef CONFIG_RT_DEADLOCK_DETECT
 	lock->save_state = save_state;
 	INIT_LIST_HEAD(&lock->held_list);
@@ -771,7 +780,7 @@ static inline struct task_struct * pick_
 	 *
 	 * (same-prio RT tasks go FIFO)
 	 */
-	waiter = plist_entry(&lock->wait_list, struct rt_mutex_waiter, list);
+	waiter = plist_entry(lock->wait_list, struct rt_mutex_waiter, list);
 
 	trace_special_pid(waiter->task->pid, waiter->task->prio, 0);
 
@@ -779,7 +788,11 @@ static inline struct task_struct * pick_
 	check_pi_list_present(lock, waiter, old_owner);
 #endif
 	new_owner = waiter->task;
-	plist_del_init(&waiter->list, &lock->wait_list);
+	if (plist_empty(lock->wait_list)) {
+		plist_del_init(&waiter->list, lock->wait_list);
+		lock->wait_list = NULL;
+	} else
+		plist_del_init(&waiter->list, lock->wait_list);
 
 	plist_del(&waiter->pi_list, &old_owner->pi_waiters);
 	plist_init(&waiter->pi_list, waiter->task->prio);
@@ -796,9 +809,6 @@ static inline struct task_struct * pick_
 
 static inline void init_lists(struct rt_mutex *lock)
 {
-	// we have to do this until the static initializers get fixed:
-	if (!lock->wait_list.dp_node.prev && !lock->wait_list.dp_node.next)
-		plist_init(&lock->wait_list, MAX_PRIO);
 #ifdef CONFIG_RT_DEADLOCK_DETECT
 	if (!lock->held_list.prev && !lock->held_list.next)
 		INIT_LIST_HEAD(&lock->held_list);
@@ -927,7 +937,7 @@ static void __sched __down(struct rt_mut
 	if (grab_lock(lock,task)) {
 		/* granted */
 		struct task_struct *old_owner = lock->owner;
-		TRACE_WARN_ON(!plist_empty(&lock->wait_list) && !old_owner);
+		TRACE_WARN_ON(lock->wait_list && !old_owner);
 		spin_lock(&pi_lock);
 		set_new_owner(lock, old_owner, task, eip);
 		spin_unlock(&pi_lock);
@@ -1024,7 +1034,7 @@ static void __sched __down_mutex(struct 
 	if (grab_lock(lock,task)) {
 		/* granted */
 		struct task_struct *old_owner = lock->owner;
-		TRACE_WARN_ON(!plist_empty(&lock->wait_list) && !old_owner);
+		TRACE_WARN_ON(lock->wait_list && !old_owner);
 		spin_lock(&pi_lock);
 		set_new_owner(lock, old_owner, task, eip);
 		spin_unlock(&pi_lock);
@@ -1164,7 +1174,7 @@ static int __sched __down_interruptible(
 	if (grab_lock(lock,task)) {
 		/* granted */
 		struct task_struct *old_owner = lock->owner;
-		TRACE_WARN_ON(!plist_empty(&lock->wait_list) && !old_owner);
+		TRACE_WARN_ON(lock->wait_list && !old_owner);
 		spin_lock(&pi_lock);
 		set_new_owner(lock, old_owner, task, eip);
 		spin_unlock(&pi_lock);
@@ -1201,7 +1211,11 @@ wait_again:
 			trace_lock_irq(&trace_lock);
 			spin_lock(&lock->wait_lock);
 			if (waiter.task) {
-				plist_del_init(&waiter.list, &lock->wait_list);
+				if (plist_empty(lock->wait_list)) {
+					plist_del_init(&waiter.list, lock->wait_list);
+					lock->wait_list = NULL;
+				} else
+					plist_del_init(&waiter.list, lock->wait_list);
 				/*
 				 * Just remove ourselves from the PI list.
 				 * (No big problem if our PI effect lingers
@@ -1257,7 +1271,7 @@ static int __down_trylock(struct rt_mute
 	if (grab_lock(lock,task)) {
 		/* granted */
 		struct task_struct *old_owner = lock->owner;
-		TRACE_WARN_ON(!plist_empty(&lock->wait_list) && !old_owner);
+		TRACE_WARN_ON(lock->wait_list && !old_owner);
 		spin_lock(&pi_lock);
 		set_new_owner(lock, old_owner, task, eip);
 		spin_unlock(&pi_lock);
@@ -1324,7 +1338,6 @@ static void __up_mutex(struct rt_mutex *
 	trace_lock_irqsave(&trace_lock, flags);
 	TRACE_BUG_ON(!irqs_disabled());
 	spin_lock(&lock->wait_lock);
-	TRACE_BUG_ON(!lock->wait_list.dp_node.prev && !lock->wait_list.dp_node.next);
 
 #ifdef CONFIG_RT_DEADLOCK_DETECT
 	TRACE_WARN_ON(list_empty(&lock->held_list));
@@ -1334,12 +1347,12 @@ static void __up_mutex(struct rt_mutex *
 
 	old_owner = lock->owner;
 #ifdef ALL_TASKS_PI
-	if (plist_empty(&lock->wait_list))
+	if (lock->wait_list)
 		check_pi_list_empty(lock, old_owner);
 #endif
 	lock->owner = NULL;
 	new_owner = NULL;
-	if (!plist_empty(&lock->wait_list))
+	if (lock->wait_list)
 		new_owner = pick_new_owner(lock, old_owner, save_state, eip);
 
 	/*





* Re: [PATCH] Priority Lists for the RT mutex
  2005-05-09 18:13         ` Daniel Walker
@ 2005-05-09 19:21           ` Steven Rostedt
  2005-05-09 21:29             ` Daniel Walker
  0 siblings, 1 reply; 38+ messages in thread
From: Steven Rostedt @ 2005-05-09 19:21 UTC (permalink / raw)
  To: dwalker; +Cc: linux-kernel, Perez-Gonzalez, Inaky, Oleg Nesterov, Ingo Molnar

On Mon, 2005-05-09 at 11:13 -0700, Daniel Walker wrote:
> On Mon, 2005-05-09 at 02:11, Ingo Molnar wrote:
> 
> > What would be nice to achieve are [low-cost] reductions of the size of 
> > struct rt_mutex (in include/linux/rt_lock.h), upon which all other 
> > PI-aware locking objects are based. Right now it's 9 words, of which 
> > struct plist is 5 words. Would be nice to trim this to 8 words - which 
> > would give a nice round size of 32 bytes on 32-bit.
> 
> Why not make rt_mutex->wait_lock a pointer ? Set it to NULL and handle
> it in rt.c .

That may make the rt_mutex structure smaller but this increases the size
of the kernel by the size of that pointer (times every rt_mutex in the
kernel!). You still need to allocate the size of the raw spin lock,
although now you just point to it. Is rounding worth that much overhead?

-- Steve



* Re: [PATCH] Priority Lists for the RT mutex
  2005-05-09  9:11       ` Ingo Molnar
  2005-05-09 14:32         ` Oleg Nesterov
@ 2005-05-09 18:13         ` Daniel Walker
  2005-05-09 19:21           ` Steven Rostedt
  1 sibling, 1 reply; 38+ messages in thread
From: Daniel Walker @ 2005-05-09 18:13 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Oleg Nesterov, Perez-Gonzalez, Inaky, linux-kernel

On Mon, 2005-05-09 at 02:11, Ingo Molnar wrote:

> What would be nice to achieve are [low-cost] reductions of the size of 
> struct rt_mutex (in include/linux/rt_lock.h), upon which all other 
> PI-aware locking objects are based. Right now it's 9 words, of which 
> struct plist is 5 words. Would be nice to trim this to 8 words - which 
> would give a nice round size of 32 bytes on 32-bit.

Why not make rt_mutex->wait_lock a pointer ? Set it to NULL and handle
it in rt.c .

Daniel



* Re: [PATCH] Priority Lists for the RT mutex
  2005-05-09  9:11       ` Ingo Molnar
@ 2005-05-09 14:32         ` Oleg Nesterov
  2005-05-09 18:13         ` Daniel Walker
  1 sibling, 0 replies; 38+ messages in thread
From: Oleg Nesterov @ 2005-05-09 14:32 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Perez-Gonzalez, Inaky, linux-kernel, Daniel Walker

Ingo Molnar wrote:
> 
> What would be nice to achieve are [low-cost] reductions of the size of
> struct rt_mutex (in include/linux/rt_lock.h), upon which all other
> PI-aware locking objects are based. Right now it's 9 words, of which
> struct plist is 5 words. Would be nice to trim this to 8 words - which
> would give a nice round size of 32 bytes on 32-bit.

Yes, the size of pl_head in that patch is 4 words; it doesn't have ->prio.

Oleg.


* Re: [PATCH] Priority Lists for the RT mutex
  2005-05-07  7:25 ` Oleg Nesterov
  2005-05-09  7:30   ` Ingo Molnar
@ 2005-05-09 14:07   ` Daniel Walker
  1 sibling, 0 replies; 38+ messages in thread
From: Daniel Walker @ 2005-05-09 14:07 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Perez-Gonzalez, Inaky, linux-kernel, Ingo Molnar

On Sat, 7 May 2005, Oleg Nesterov wrote:
> Yes. ->node_list contains *ALL* nodes, that is why we can:
> 
> 	#define	plist_for_each(pos, head)	\
> 		 list_for_each_entry(pos, &(head)->node_list, node_list)
> 
> head <=======>  prio=1 <===> prio=2 <===> ...
>                  /\        /|  /\
>                  |         |   |
>                  \/        |   \/
>                 prio=1     | prio=2
>                  /\       /    /\
>                  |       /     |
>                  \/     /      \/
>                 prio=1 /      ....
>                   <---/
> 
>                                /\
> Where <===> means ->prio_list, |  ->node_list.
>                                \/
> Will do. However, I'm unfamiliar with Ingo's tree, so I
> can send only new plist's implementation.


I've got something like this queued up .. Feel free to send yours as well 
.. 


Daniel



* Re: [PATCH] Priority Lists for the RT mutex
  2005-05-09  8:08     ` Oleg Nesterov
@ 2005-05-09  9:11       ` Ingo Molnar
  2005-05-09 14:32         ` Oleg Nesterov
  2005-05-09 18:13         ` Daniel Walker
  0 siblings, 2 replies; 38+ messages in thread
From: Ingo Molnar @ 2005-05-09  9:11 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Perez-Gonzalez, Inaky, linux-kernel, Daniel Walker


* Oleg Nesterov <oleg@tv-sign.ru> wrote:

> > i've uploaded my latest tree to:
> > 
> >     http://redhat.com/~mingo/realtime-preempt/
> > 
> 
> Ok, I'll try to do it. Do you have any comments/objections to
> "[RFC][PATCH] alternative implementation of Priority Lists", see
> http://marc.theaimsgroup.com/?l=linux-kernel&m=111547290706136
> ?

i havent reviewed them closely, but on the face of it the changes seem 
plausible.

What would be nice to achieve are [low-cost] reductions of the size of 
struct rt_mutex (in include/linux/rt_lock.h), upon which all other 
PI-aware locking objects are based. Right now it's 9 words, of which 
struct plist is 5 words. Would be nice to trim this to 8 words - which 
would give a nice round size of 32 bytes on 32-bit.

	Ingo


* Re: [PATCH] Priority Lists for the RT mutex
  2005-05-09  7:30   ` Ingo Molnar
@ 2005-05-09  8:08     ` Oleg Nesterov
  2005-05-09  9:11       ` Ingo Molnar
  0 siblings, 1 reply; 38+ messages in thread
From: Oleg Nesterov @ 2005-05-09  8:08 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Perez-Gonzalez, Inaky, linux-kernel, Daniel Walker

Ingo Molnar wrote:
> 
> * Oleg Nesterov <oleg@tv-sign.ru> wrote:
> 
> > Daniel Walker wrote:
> > >
> > > Make a patch .
> >
> > Will do. However, I'm unfamiliar with Ingo's tree, so I
> > can send only new plist's implementation.
> 
> i've uploaded my latest tree to:
> 
>     http://redhat.com/~mingo/realtime-preempt/
> 

Ok, I'll try to do it. Do you have any comments/objections to
"[RFC][PATCH] alternative implementation of Priority Lists", see
http://marc.theaimsgroup.com/?l=linux-kernel&m=111547290706136
?

Oleg.


* Re: [PATCH] Priority Lists for the RT mutex
  2005-05-07  7:25 ` Oleg Nesterov
@ 2005-05-09  7:30   ` Ingo Molnar
  2005-05-09  8:08     ` Oleg Nesterov
  2005-05-09 14:07   ` Daniel Walker
  1 sibling, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2005-05-09  7:30 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Perez-Gonzalez, Inaky, linux-kernel, Daniel Walker


* Oleg Nesterov <oleg@tv-sign.ru> wrote:

> Daniel Walker wrote:
> >
> > Make a patch .
> 
> Will do. However, I'm unfamiliar with Ingo's tree, so I
> can send only new plist's implementation.

i've uploaded my latest tree to:

    http://redhat.com/~mingo/realtime-preempt/

it's easy to give it a go, it's pretty much plug & play, just apply 
these patches on top of a vanilla 2.6.11 kernel tree:

   http://kernel.org/pub/linux/kernel/v2.6/testing/patch-2.6.12-rc4.bz2
   http://redhat.com/~mingo/realtime-preempt/realtime-preempt-2.6.12-rc4-V0.7.47-00

take your usual .config, do a 'make oldconfig', make sure you pick up 
PREEMPT_RT (you can leave all the other .config options at their default 
values), recompile and you should be set. All RT tasks (SCHED_FIFO, 
SCHED_RR) will get PI handling and will use the plist.h code.

	Ingo


* Re: [PATCH] Priority Lists for the RT mutex
  2005-05-06 18:13 Perez-Gonzalez, Inaky
@ 2005-05-07  7:25 ` Oleg Nesterov
  2005-05-09  7:30   ` Ingo Molnar
  2005-05-09 14:07   ` Daniel Walker
  0 siblings, 2 replies; 38+ messages in thread
From: Oleg Nesterov @ 2005-05-07  7:25 UTC (permalink / raw)
  To: Perez-Gonzalez, Inaky; +Cc: linux-kernel, Daniel Walker, Ingo Molnar

"Perez-Gonzalez, Inaky" wrote:
>
> >Oleg Nesterov wrote:
> >>
> >> Daniel Walker wrote:
> >> >
> >> > Description:
> >> > 	This patch adds the priority list data structure from Inaky Perez-Gonzalez
> >> > to the Preempt Real-Time mutex.
> >> >
> >> ...
> >And I think it is possible to simplify plist's design.
> >
> > ...
> >
> >lt_prio:
> >	list_add_tail(&new->prio_list, &pos->prio_list);
> >eq_prio:
> >	list_add_tail(&new->node_list, &pos->node_list);
> >}
>
> Isn't this adding them to *both* lists in the lt_prio
> case? I don't understand what do you want to accomplish
> in this case.

Yes. ->node_list contains *ALL* nodes, that is why we can:

	#define	plist_for_each(pos, head)	\
		 list_for_each_entry(pos, &(head)->node_list, node_list)

head <=======>  prio=1 <===> prio=2 <===> ...
                 /\        /|  /\
                 |         |   |
                 \/        |   \/
                prio=1     | prio=2
                 /\       /    /\
                 |       /     |
                 \/     /      \/
                prio=1 /      ....
                  <---/

                               /\
Where <===> means ->prio_list, |  ->node_list.
                               \/

Daniel Walker wrote:
>
> Make a patch .

Will do. However, I'm unfamiliar with Ingo's tree, so I
can send only new plist's implementation.

Oleg.


* Re: [PATCH] Priority Lists for the RT mutex
  2005-05-06 11:13 ` Oleg Nesterov
@ 2005-05-06 18:56   ` Daniel Walker
  0 siblings, 0 replies; 38+ messages in thread
From: Daniel Walker @ 2005-05-06 18:56 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: linux-kernel, Inaky Perez-Gonzalez, Ingo Molnar

On Fri, 6 May 2005, Oleg Nesterov wrote:

> Oleg Nesterov wrote:
> >
> > Daniel Walker wrote:
> > >
> > > Description:
> > > 	This patch adds the priority list data structure from Inaky Perez-Gonzalez
> > > to the Preempt Real-Time mutex.
> > >
> > ...
> >
> > I can't understand how this can work.
> 
> And I think it is possible to simplify plist's design.
> 
> struct plist {
> 	int prio;
> 	struct list_head prio_list;
> 	struct list_head node_list;
> };
> 

Make a patch .

Daniel



* RE: [PATCH] Priority Lists for the RT mutex
@ 2005-05-06 18:13 Perez-Gonzalez, Inaky
  2005-05-07  7:25 ` Oleg Nesterov
  0 siblings, 1 reply; 38+ messages in thread
From: Perez-Gonzalez, Inaky @ 2005-05-06 18:13 UTC (permalink / raw)
  To: Oleg Nesterov, linux-kernel, Daniel Walker, Ingo Molnar


>-----Original Message-----
>From: tmp@several.ru [mailto:tmp@several.ru] On Behalf Of Oleg Nesterov
>Sent: Friday, May 06, 2005 4:13 AM
>To: linux-kernel@vger.kernel.org; Daniel Walker; Perez-Gonzalez, Inaky; Ingo Molnar
>Subject: Re: [PATCH] Priority Lists for the RT mutex
>
>Oleg Nesterov wrote:
>>
>> Daniel Walker wrote:
>> >
>> > Description:
>> > 	This patch adds the priority list data structure from Inaky Perez-Gonzalez
>> > to the Preempt Real-Time mutex.
>> >
>> ...
>>
>> I can't understand how this can work.
>
>And I think it is possible to simplify plist's design.
>
> ...
>
>					 struct plist, prio_list);
>			goto eq_prio;
>		}
>
>lt_prio:
>	list_add_tail(&new->prio_list, &pos->prio_list);
>eq_prio:
>	list_add_tail(&new->node_list, &pos->node_list);
>}

Isn't this adding them to *both* lists in the lt_prio
case? I don't understand what you want to accomplish
in this case.

[like for example, if the list is empty and you add one,
it will start a new node_list, but you have also added
it to the head's prio_list] 

If you are changing the operational mode, can you please
explain your change in more detail?

It could also be I have *the* headache...but hey :)

-- Inaky 


* Re: [PATCH] Priority Lists for the RT mutex
  2005-05-06  9:05 Oleg Nesterov
@ 2005-05-06 11:13 ` Oleg Nesterov
  2005-05-06 18:56   ` Daniel Walker
  0 siblings, 1 reply; 38+ messages in thread
From: Oleg Nesterov @ 2005-05-06 11:13 UTC (permalink / raw)
  To: linux-kernel, Daniel Walker, Inaky Perez-Gonzalez, Ingo Molnar

Oleg Nesterov wrote:
>
> Daniel Walker wrote:
> >
> > Description:
> > 	This patch adds the priority list data structure from Inaky Perez-Gonzalez
> > to the Preempt Real-Time mutex.
> >
> ...
>
> I can't understand how this can work.

And I think it is possible to simplify plist's design.

struct plist {
	int prio;
	struct list_head prio_list;
	struct list_head node_list;
};

#define	plist_for_each(pos, head)	\
	 list_for_each_entry(pos, &(head)->node_list, node_list)

static inline void plist_add(struct plist *new, struct plist *head)
{
	struct plist* pos;

	INIT_LIST_HEAD(&new->prio_list);

	list_for_each_entry(pos, &head->prio_list, prio_list)
		if (new->prio < pos->prio)
			goto lt_prio;
		else if (new->prio == pos->prio) {
			pos = list_entry(pos->prio_list.next,
					 struct plist, prio_list);
			goto eq_prio;
		}

lt_prio:
	list_add_tail(&new->prio_list, &pos->prio_list);
eq_prio:
	list_add_tail(&new->node_list, &pos->node_list);
}

static inline void plist_del(struct plist *del)
{
	if (!list_empty(&del->prio_list)) {
		struct plist *next = list_entry(del->node_list.next,
						struct plist, node_list);
		list_move_tail(&next->prio_list, &del->prio_list);
		list_del_init(&del->prio_list);
	}

	list_del_init(&del->node_list);
}

All nodes in the prio list are chained via ->node_list, sorted in ascending
->prio/FIFO order. Nodes which start a new priority slice are also chained
via ->prio_list, in ascending ->prio order.

Slightly tested, seems to work.
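
For example, a trivial user of the above (untested sketch, same userspace
setup as the test program from my other mail; only the head's two list_heads
need initialising, the head's ->prio is never looked at):

static void plist_example(void)
{
	struct plist head = {
		.prio_list = LIST_HEAD_INIT(head.prio_list),
		.node_list = LIST_HEAD_INIT(head.node_list),
	};
	struct plist a = { .prio = 2 }, b = { .prio = 1 }, *pos;

	plist_add(&a, &head);
	plist_add(&b, &head);		/* lower ->prio value sorts first */

	plist_for_each(pos, &head)	/* visits b (prio 1), then a (prio 2) */
		printf("prio %d\n", pos->prio);

	plist_del(&b);			/* a now starts the first prio slice */
}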

I think this would be even better:

struct plist_head {
	struct list_head prio_list;
	struct list_head node_list;
}

struct plist {
	int prio;
	struct plist_head head;
}

The plist_head's min/max prio can be found from
plist_head.prio_list.next/prev.
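
The lookup would then be something like this (untested; assuming nodes are
chained through their embedded plist_head, analogous to the first version):

static inline int plist_min_prio(struct plist_head *h)
{
	/* the first entry on ->prio_list starts the lowest-prio slice */
	return list_entry(h->prio_list.next, struct plist, head.prio_list)->prio;
}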

What do you think?

Oleg.


* Re: [PATCH] Priority Lists for the RT mutex
@ 2005-05-06  9:05 Oleg Nesterov
  2005-05-06 11:13 ` Oleg Nesterov
  0 siblings, 1 reply; 38+ messages in thread
From: Oleg Nesterov @ 2005-05-06  9:05 UTC (permalink / raw)
  To: linux-kernel; +Cc: Daniel Walker, Inaky Perez-Gonzalez, Ingo Molnar

Daniel Walker wrote:
>
> Description:
> 	This patch adds the priority list data structure from Inaky Perez-Gonzalez 
> to the Preempt Real-Time mutex.
>
> ...
>
> +#define plist_for_each(pos1, pos2, head)	\
> +	list_for_each_entry(pos1, &((head)->dp_node), dp_node)	\
> +		list_for_each_entry(pos2, &((pos1)->sp_node), sp_node)

I can't understand how this can work.

The first list_for_each_entry(->dp_node) will iterate over nodes
with different priorities, ok. But the second one will skip the
first node (which starts a new priority group), because list_for_each(head)
does not count the head.

To be sure, I wrote simple test:

#include <stdio.h>
#include <limits.h>
#include "list.h"
#include "plist.h"

int main(void)
{
	struct plist head, node, *pos1, *pos2;

	plist_init(&head, 0);
	plist_init(&node, 0);

	plist_add(&node, &head);

	plist_for_each(pos1, pos2, &head)
		printf("Strange ???\n");

	return 0;
}

Prints nothing.

My apologies if I'm misunderstanding something.

Oleg.


* Re: [PATCH] Priority Lists for the RT mutex
  2005-04-07 17:52 Daniel Walker
  2005-04-08  6:28 ` Ingo Molnar
  2005-04-10 10:53 ` Ingo Molnar
@ 2005-04-12 18:37 ` Paul E. McKenney
  2 siblings, 0 replies; 38+ messages in thread
From: Paul E. McKenney @ 2005-04-12 18:37 UTC (permalink / raw)
  To: Daniel Walker; +Cc: linux-kernel, mingo, sdietrich, inaky.perez-gonzalez

On Thu, Apr 07, 2005 at 10:52:25AM -0700, Daniel Walker wrote:
> 
> Source: Daniel Walker <dwalker@mvista.com> MontaVista Software, Inc
> Description:
> 	This patch adds the priority list data structure from Inaky Perez-Gonzalez 
> to the Preempt Real-Time mutex.
> 
> the patch order is (starting with a 2.6.11 kernel tree),
> 
> patch-2.6.12-rc2
> realtime-preempt-2.6.12-rc2-V0.7.44-01
> 	
> Signed-off-by: Daniel Walker <dwalker@mvista.com>
> 
> Index: linux-2.6.11/include/linux/plist.h
> ===================================================================
> --- linux-2.6.11.orig/include/linux/plist.h	1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6.11/include/linux/plist.h	2005-04-07 17:47:42.000000000 +0000
> @@ -0,0 +1,310 @@

[ . . . ]

> +/* Grunt to do the real removal work of @pl from the plist. */
> +static inline
> +void __plist_del (struct plist *pl)
> +{
> +	struct list_head *victim;
> +	if (list_empty (&pl->dp_node))  	/* SP-node, not head */
> +		victim = &pl->sp_node;
> +	else if (list_empty (&pl->sp_node)) 	/* DP-node, empty SP list */
> +		victim = &pl->dp_node;
> +	else {					/* SP list head, not empty */
> +		struct plist *pl_new = container_of (pl->sp_node.next,
> +						     struct plist, sp_node);
> +		victim = &pl->sp_node;
> +		list_replace_rcu (&pl->dp_node, &pl_new->dp_node);

If you are protecting this list with RCU...

> +	}
> +	list_del_init (victim);

... you need to wait for a grace period before deleting the element
removed from the list.

Or are you just using list_replace_rcu() for its replacement capability?
If so, seems like it might be worthwhile to make a list_replace().
This would get rid of the memory barrier, and also keep from confusing
people like myself.  ;-)
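
Something like the following would do (untested, and the name list_replace()
is just my suggestion -- list.h does not have such a helper yet):

static inline void list_replace(struct list_head *old, struct list_head *new)
{
	/* splice "new" into the exact position of "old"; no barriers,
	 * since no concurrent readers are assumed */
	new->next = old->next;
	new->next->prev = new;
	new->prev = old->prev;
	new->prev->next = new;
}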

							Thanx, Paul


* Re: [PATCH] Priority Lists for the RT mutex
  2005-04-10 11:09     ` Ingo Molnar
@ 2005-04-12 17:56       ` Daniel Walker
  0 siblings, 0 replies; 38+ messages in thread
From: Daniel Walker @ 2005-04-12 17:56 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Sven-Thorsten Dietrich, linux-kernel, inaky.perez-gonzalez,
	Steven Rostedt, Esben Nielsen, Joe Korty

On Sun, 2005-04-10 at 04:09, Ingo Molnar wrote:
> Unless i'm missing something, this could be implemented by detaching
> lock->owner_prio from lock->owner - via e.g. negative values. Thus some
> minimal code would check whether we need the owner's priority in the PI
> logic, or the semaphore's "own" priority level.

The owner's priority should be set to the semaphore's priority .. Or the
highest priority of all the semaphores that it has locked.

Daniel



* RE: [PATCH] Priority Lists for the RT mutex
@ 2005-04-12  0:36 Perez-Gonzalez, Inaky
  0 siblings, 0 replies; 38+ messages in thread
From: Perez-Gonzalez, Inaky @ 2005-04-12  0:36 UTC (permalink / raw)
  To: Bill Huey (hui)
  Cc: Ingo Molnar, Sven-Thorsten Dietrich, Daniel Walker, linux-kernel,
	Steven Rostedt, Esben Nielsen, Joe Korty

>From: Bill Huey (hui) [mailto:bhuey@lnxw.com]
>
>> Quick fix: the usual. Enable deadlock detection and if it
>> returns deadlock, assume it is locked already and proceed (or
>> do a recursive mutex, or a trylock).
>
>You have to be joking me ? geez.
>...

This is way *more* common than you think--I've seen it around
some big multithreaded OSS packages [can't remember which now]. 

>> Sure--and because most was for legacy reasons that adhered to
>> POSIX strictly, it was very simple: we need POSIX this, that and
>> that (PI, proper adherence to scheduler policy wake up/rt-behaviour,
>> deadlock detection, etc).
>
>Some of this stuff sounds like recursive locking. Would this be a
>better expression to solve the "top level API locking" problem
>you're referring to ?

Bingo. That's another way to "fix" it. Luckily, recursive locking
can be safely and quickly done in user space (I own this lock,
ergo I just inc the lock count).
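
Roughly like this in user space (just a sketch with a made-up type, not
NPTL's code; memory-ordering details glossed over):

#include <pthread.h>

struct rec_mutex {
	pthread_mutex_t	lock;
	pthread_t	owner;
	int		count;
};

static void rec_mutex_lock(struct rec_mutex *m)
{
	if (m->count && pthread_equal(m->owner, pthread_self())) {
		m->count++;		/* I own it already: just inc the count */
		return;
	}
	pthread_mutex_lock(&m->lock);	/* first acquisition: go the normal way */
	m->owner = pthread_self();
	m->count = 1;
}

static void rec_mutex_unlock(struct rec_mutex *m)
{
	if (--m->count == 0)		/* last unlock drops the real mutex */
		pthread_mutex_unlock(&m->lock);
}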

The problem with deadlocks is when the scenario gets more complex
and you are trying to lock a mutex and the owner is waiting for
a mutex whose owner is waiting for a mutex that you own...this
more commonly happens when you don't know what the heck is going
on in the code, which unfortunately is very common for people who
inherit big pieces of stacks to maintain.

>> Fortunately in those areas POSIX is not too gray; code to the book.
>> Deal.
>
>I would think that there will have to be a graph discontinuity
>between user/kernel spaces at kernel entry and exit for the deadlock
>detector. Can't say about issues at fork time, but I would expect
>that those objects would have to be destroyed when the process exits.

fork time is not an issue, as POSIX makes forks and threads incompatible
(in a nutshell, only the thread calling fork() survives, all the mutexes
are [IIRC] reinitialized or something like that...).

>The current RT (Ingo's) lock isn't recursive nor is the deadlock
>detector the last time I looked. Do think that this is a problem
>for legacy apps if it gets overload for being the userspace futex
>as well ? (assuming I'm understanding all of this correctly)

It shouldn't be, on the recursive side; as I said, that is easy to do
[currently NPTL does it with the futexes]. The deadlock stuff gets
hairier, but it's not that big of a deal when you have your data
structures set up. It takes time, though.

>> Of course, selling it to the lkml is another story.
>
>I would think that pushing as much of this into userspace would
>make the kernel hooks for it more acceptable. Don't know.

Agreed. Deadlock checking though, has to be done in the kernel. For
the generic case it is the only way to do it sanely.

-- Inaky 



* Re: [PATCH] Priority Lists for the RT mutex
  2005-04-11 23:28 Perez-Gonzalez, Inaky
@ 2005-04-12  0:27 ` Bill Huey
  0 siblings, 0 replies; 38+ messages in thread
From: Bill Huey @ 2005-04-12  0:27 UTC (permalink / raw)
  To: Perez-Gonzalez, Inaky
  Cc: Bill Huey (hui),
	Ingo Molnar, Sven-Thorsten Dietrich, Daniel Walker, linux-kernel,
	Steven Rostedt, Esben Nielsen, Joe Korty

On Mon, Apr 11, 2005 at 04:28:25PM -0700, Perez-Gonzalez, Inaky wrote:
> >From: Bill Huey (hui) [mailto:bhuey@lnxw.com]
...
> API than once upon a time was made multithreaded by just adding
> a bunch of pthread_mutex_[un]lock() at the API entry point...
> without realizing that some of the top level API calls also 
> called other top level API calls, so they'd deadlock.

Oh crap.

> Quick fix: the usual. Enable deadlock detection and if it
> returns deadlock, assume it is locked already and proceed (or
> do a recursive mutex, or a trylock).

You have to be joking me ? geez.
... 
> It is certainly something to explore, but I'd better drive your
> way than do it. It's cleaner. Hides implementation details.
>
> I agree, but it doesn't work that well when talking about legacy 
> systems...that's the problem.

Yeah, ok, I understand what's going on now. There isn't a notion
of projecting priority across into the Unix/Linux kernel traditionally,
which is why it seemed so bizarre.

> Sure--and because most was for legacy reasons that adhered to 
> POSIX strictly, it was very simple: we need POSIX this, that and
> that (PI, proper adherence to scheduler policy wake up/rt-behaviour,
> deadlock detection, etc). 

Some of this stuff sounds like recursive locking. Would this be a
better expression to solve the "top level API locking" problem
you're referring to ?

> Fortunately in those areas POSIX is not too gray; code to the book.
> Deal. 

I would think that there will have to be a graph discontinuity
between user/kernel spaces at kernel entry and exit for the deadlock
detector. Can't say about issues at fork time, but I would expect
that those objects would have to be destroyed when the process exits.

The current RT (Ingo's) lock isn't recursive nor is the deadlock
detector the last time I looked. Do you think that this is a problem
for legacy apps if it gets overloaded for being the userspace futex
as well? (assuming I'm understanding all of this correctly)

> Of course, selling it to the lkml is another story.

I would think that pushing as much of this into userspace would
make the kernel hooks for it more acceptable. Don't know.

/me thinks more

bill



* RE: [PATCH] Priority Lists for the RT mutex
@ 2005-04-11 23:28 Perez-Gonzalez, Inaky
  2005-04-12  0:27 ` Bill Huey
  0 siblings, 1 reply; 38+ messages in thread
From: Perez-Gonzalez, Inaky @ 2005-04-11 23:28 UTC (permalink / raw)
  To: Bill Huey (hui)
  Cc: Ingo Molnar, Sven-Thorsten Dietrich, Daniel Walker, linux-kernel,
	Steven Rostedt, Esben Nielsen, Joe Korty

>From: Bill Huey (hui) [mailto:bhuey@lnxw.com]
>On Mon, Apr 11, 2005 at 03:31:41PM -0700, Perez-Gonzalez, Inaky wrote:
>> If you are exposing the kernel locks to userspace to implement
>> mutexes (eg POSIX mutexes), deadlock checking is a feature you want
>> to have to comply with POSIX. According to some off the record
>> requirements I've been given, some applications badly need it (I have
>> a hard time believing that they are so broken, but heck...).
>
>I'd like to read about those requirements, but, IMO a lot of the value

More than a formal requirement, it is a "practical" one. Some
company (leader in their field, of course) has this huge
blob of code they want to use in Linux and it has the typical
API that once upon a time was made multithreaded by just adding
a bunch of pthread_mutex_[un]lock() at the API entry point...
without realizing that some of the top level API calls also
called other top level API calls, so they'd deadlock.

Quick fix: the usual. Enable deadlock detection and if it
returns deadlock, assume it is locked already and proceed (or
do a recursive mutex, or a trylock).

And so on, and so forth...

>of various priority protocols varies greatly on the context and size (N
>threads) of the application using it. If user/kernel space have to be
>coupled via some thread of execution, (IMO) then it's better to separate
>them with some event notification queues like signals (wake a thread
>via a queue event) than to mix locks across the user/kernel space
> ...

I wonder if we are confusing apples and oranges here--I don't think
we were considering cases about mixing kernel locks that are accessible
both from kernel and user space. 

The focus is to have a kernel lock entity and that user space can
use it for implementing the user space mutexes, but *not* mix
them (like in user and kernel space sharing this lock for doing 
whatever).

It is certainly something to explore, but I'd better drive your
way than do it. It's cleaner. Hides implementation details.

>It's important to outline the requirements of the applications and then
>see what you can do using minimal synchronization objects before
>exploding that divide.

I agree, but it doesn't work that well when talking about legacy 
systems...that's the problem.

>Also, Posix isn't always politically neutral nor complete regarding
>various things. You have to consider the context of these things.
>I'll have to think about this a bit more and review your patch more
>carefully.

Sure--and because most was for legacy reasons that adhered to 
POSIX strictly, it was very simple: we need POSIX this, that and
that (PI, proper adherence to scheduler policy wake up/rt-behaviour,
deadlock detection, etc). 

Fortunately in those areas POSIX is not too gray; code to the book.
Deal. 

Of course, selling it to the lkml is another story.

-- Inaky


* Re: [PATCH] Priority Lists for the RT mutex
  2005-04-11 22:31 Perez-Gonzalez, Inaky
@ 2005-04-11 23:11 ` Bill Huey
  0 siblings, 0 replies; 38+ messages in thread
From: Bill Huey @ 2005-04-11 23:11 UTC (permalink / raw)
  To: Perez-Gonzalez, Inaky
  Cc: Bill Huey (hui),
	Ingo Molnar, Sven-Thorsten Dietrich, Daniel Walker, linux-kernel,
	Steven Rostedt, Esben Nielsen, Joe Korty

On Mon, Apr 11, 2005 at 03:31:41PM -0700, Perez-Gonzalez, Inaky wrote:
> If you are exposing the kernel locks to userspace to implement
> mutexes (eg POSIX mutexes), deadlock checking is a feature you want
> to have to comply with POSIX. According to some off the record
> requirements I've been given, some applications badly need it (I have 
> a hard time believing that they are so broken, but heck...).

I'd like to read about those requirements, but, IMO, a lot of the value
of various priority protocols varies greatly with the context and size (N
threads) of the application using it. If user/kernel space have to be
coupled via some thread of execution, (IMO) then it's better to separate
them with some event notification queues like signals (wake a thread
via a queue event) than to mix locks across the user/kernel space
boundary. There's tons of abuse that can be opened up with various
priority protocols with regard to RT apps, and giving it a first-class
entry way without consideration is kind of scary.

It's important to outline the requirements of the applications and then
see what you can do using minimal synchronization objects before
exploding that divide.

Also, Posix isn't always politically neutral nor complete regarding
various things. You have to consider the context of these things.
I'll have to think about this a bit more and review your patch more
carefully.

I'm all ears if you think I'm wrong.

bill



* RE: [PATCH] Priority Lists for the RT mutex
@ 2005-04-11 22:31 Perez-Gonzalez, Inaky
  2005-04-11 23:11 ` Bill Huey
  0 siblings, 1 reply; 38+ messages in thread
From: Perez-Gonzalez, Inaky @ 2005-04-11 22:31 UTC (permalink / raw)
  To: Bill Huey (hui), Ingo Molnar
  Cc: Sven-Thorsten Dietrich, Daniel Walker, linux-kernel,
	Steven Rostedt, Esben Nielsen, Joe Korty

>From: Bill Huey (hui) [mailto:bhuey@lnxw.com]
>
>On Mon, Apr 11, 2005 at 10:57:37AM +0200, Ingo Molnar wrote:
>>
>> * Perez-Gonzalez, Inaky <inaky.perez-gonzalez@intel.com> wrote:
>>
>> > Let me re-phrase then: it is a must have only on PI, to make sure you
>> > don't have a loop when doing it. Maybe is a consequence of the
>> > algorithm I chose. -However- it should be possible to disable it in
>> > cases where you are reasonably sure it won't happen (such as kernel
>> > code). In any case, AFAIR, I still did not implement it.
>>
>> are there cases where userspace wants to disable deadlock-detection for
>> its own locks?
>
>I'd disable it for userspace locks. There might be folks that want to
>implement userspace drivers, but I can't imagine it being 'ok' to have
>the kernel call out to userspace and have it block correctly. I would
>expect them to do something else that's less drastic.

If you are exposing the kernel locks to userspace to implement
mutexes (eg POSIX mutexes), deadlock checking is a feature you want
to have to comply with POSIX. According to some off the record
requirements I've been given, some applications badly need it (I have 
a hard time believing that they are so broken, but heck...).

-- Inaky


* Re: [PATCH] Priority Lists for the RT mutex
  2005-04-10 10:53 ` Ingo Molnar
@ 2005-04-11 16:19   ` Daniel Walker
  0 siblings, 0 replies; 38+ messages in thread
From: Daniel Walker @ 2005-04-11 16:19 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, sdietrich, inaky.perez-gonzalez

On Sun, 2005-04-10 at 03:53, Ingo Molnar wrote:

> ok, i've added this patch to the -45-00 release. It's looking good on my 
> testsystems so far, but it will need some more testing i guess.


Yes, I ran the PI test, and just let the system run .. So it could use
more extensive testing..

Daniel 



* RE: [PATCH] Priority Lists for the RT mutex
@ 2005-04-11  9:03 Perez-Gonzalez, Inaky
  0 siblings, 0 replies; 38+ messages in thread
From: Perez-Gonzalez, Inaky @ 2005-04-11  9:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Sven-Thorsten Dietrich, Daniel Walker, linux-kernel,
	Steven Rostedt, Esben Nielsen, Joe Korty

>From: Ingo Molnar [mailto:mingo@elte.hu]
>
>* Perez-Gonzalez, Inaky <inaky.perez-gonzalez@intel.com> wrote:
>
>> Let me re-phrase then: it is a must have only on PI, to make sure you
>> don't have a loop when doing it. Maybe is a consequence of the
>> algorithm I chose. -However- it should be possible to disable it in
>> cases where you are reasonably sure it won't happen (such as kernel
>> code). In any case, AFAIR, I still did not implement it.
>
>are there cases where userspace wants to disable deadlock-detection for
>its own locks?

I would guess--if I know I have coded my application properly
(cough) or I am using locks that by design are completely orthogonal,
I would say deadlock checking is getting in the way.

>the deadlock detector in PREEMPT_RT is pretty much specialized for
>debugging (it does all sorts of weird locking tricks to get the first
>deadlock out, and to really report it on the console), but it ought to
>be possible to make it usable for userspace-controlled locks as well.

fusyn's is as simple as it can get: when you are about to lock(), it
checks that you don't own the lock already, but it generalizes it
(it checks that the owner of the lock is not waiting for a lock 
whose owner is waiting for a lock whose owner...is waiting for a lock
that you own).
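
In pseudo-C, the walk is no more than this (hypothetical structures, not
the actual fusyn code):

struct sketch_lock;
struct sketch_task {
	struct sketch_lock *blocked_on;	/* lock this task is waiting for */
};
struct sketch_lock {
	struct sketch_task *owner;	/* current owner, NULL if free */
};

/* would "self" deadlock by blocking on "lock"? */
static int would_deadlock(struct sketch_task *self, struct sketch_lock *lock)
{
	struct sketch_task *owner = lock->owner;

	while (owner != NULL) {
		if (owner == self)		/* chain loops back to us */
			return 1;
		if (owner->blocked_on == NULL)	/* chain ends: owner is runnable */
			return 0;
		owner = owner->blocked_on->owner;
	}
	return 0;				/* chain ends at a free lock */
}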

-- Inaky


* Re: [PATCH] Priority Lists for the RT mutex
  2005-04-11  8:27 Perez-Gonzalez, Inaky
@ 2005-04-11  8:42 ` Ingo Molnar
  0 siblings, 0 replies; 38+ messages in thread
From: Ingo Molnar @ 2005-04-11  8:42 UTC (permalink / raw)
  To: Perez-Gonzalez, Inaky
  Cc: Sven-Thorsten Dietrich, Daniel Walker, linux-kernel,
	Steven Rostedt, Esben Nielsen, Joe Korty


* Perez-Gonzalez, Inaky <inaky.perez-gonzalez@intel.com> wrote:

> >OTOH, deadlock detection is another issue. It's quite expensive and i'm
> >not sure we want to make it a runtime thing. But for fusyn's deadlock
> >detection and safe teardown on owner-exit is a must-have i suspect?
> 
> Not really. Deadlock check is needed on PI, so it can be done at the 
> same time (you have to walk the chain anyway). In any other case, it 
> is an option you can request (or not).

well, i was talking about the mutex code in PREEMPT_RT. There deadlock 
detection is an optional debug feature. You dont _have_ to do deadlock 
detection for the kernel's locks, and there's a difference in 
performance.

	Ingo


* RE: [PATCH] Priority Lists for the RT mutex
@ 2005-04-11  8:27 Perez-Gonzalez, Inaky
  2005-04-11  8:42 ` Ingo Molnar
  0 siblings, 1 reply; 38+ messages in thread
From: Perez-Gonzalez, Inaky @ 2005-04-11  8:27 UTC (permalink / raw)
  To: Ingo Molnar, Sven-Thorsten Dietrich
  Cc: Daniel Walker, linux-kernel, Steven Rostedt, Esben Nielsen, Joe Korty

>From: Ingo Molnar [mailto:mingo@elte.hu]
>
>
>i'd not mind merging the extra bits to PREEMPT_RT to enable fusyn's, if
>they come in small, clean steps. E.g. Daniel's plist.h stuff was nice
>and clean.

I am finishing breaking it up in small bits so you can take a look
at it. Should be finished tomorrow noon (PST).

>is priority ceiling coming in via some POSIX feature that fusyn's need
>to address? What would be needed precisely - a way to set a priority for
>a lock (independently of the owner's task priority), and let that
>control the PI mechanism?

Yep. It is kind of easy to do (at least in fusyns)--it is just a
matter of setting the priority of the lock, which sets the priority
of its list node.

Because the promotion code only cares about the priority of the
list node, it blends automatically in the whole scheme. The PI
code will modify the list node's priority while promoting all the
tasks affected in the ownership chain, only when the fulocks/mutexes
are PI. The PP code will modify the priority of the fulock/mutex's
list node with a special call.

[you can check my 2004 OLS paper for a deeper explanation, or
I can extend this one, if you want]. 
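
The "special call" boils down to re-queueing the lock's list node, with the
same del/init/add pattern that pi_setprio() uses in Daniel's patch
(hypothetical name, not the actual fusyn function):

static void fulock_set_prio_ceiling(struct plist *node, struct plist *head,
				    int ceiling)
{
	plist_del(node, head);		/* drop the node at its old priority */
	plist_init(node, ceiling);	/* re-key it at the ceiling priority */
	plist_add(node, head);		/* re-insert, keeping the list sorted */
}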

>i.e. this doesnt seem to really affect the core design of RT mutexes.

Nope it doesn't. As I said, it is done in such a way that no 
modifications are needed.

>OTOH, deadlock detection is another issue. It's quite expensive and i'm
>not sure we want to make it a runtime thing. But for fusyn's deadlock
>detection and safe teardown on owner-exit is a must-have i suspect?

Not really. Deadlock check is needed on PI, so it can be done at the
same time (you have to walk the chain anyway). In any other case, it
is an option you can request (or not).

-- Inaky


* Re: [PATCH] Priority Lists for the RT mutex
  2005-04-08  8:05   ` Sven-Thorsten Dietrich
@ 2005-04-10 11:09     ` Ingo Molnar
  2005-04-12 17:56       ` Daniel Walker
  0 siblings, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2005-04-10 11:09 UTC (permalink / raw)
  To: Sven-Thorsten Dietrich
  Cc: Daniel Walker, linux-kernel, inaky.perez-gonzalez,
	Steven Rostedt, Esben Nielsen, Joe Korty


* Sven-Thorsten Dietrich <sdietrich@mvista.com> wrote:

> On Fri, 2005-04-08 at 08:28 +0200, Ingo Molnar wrote:
> > * Daniel Walker <dwalker@mvista.com> wrote:
> > 
> > > 	This patch adds the priority list data structure from Inaky 
> > > Perez-Gonzalez to the Preempt Real-Time mutex.
> > 
> > this one looks really clean.
> > 
> > it makes me wonder, what is the current status of fusyn's? Such a light 
> > datastructure would be much more mergeable upstream than the former 
> > 100-queues approach.
> 
> Ingo,
> 
> Joe Korty has been doing a lot of work on Fusyn recently.
> 
> Fusyn is now a generic implementation, similar to the RT mutex. SMP 
> scalability is somewhat better for decoupled locks on PI (last I 
> checked). It has the extra bulk required to support user space.
> 
> The major issue that concerns the Fusyn and the real-time patch is 
> that merging the kernel portion of Fusyn creates a collision of 
> redundant PI implementations with respect to the RT mutex.

i'd not mind merging the extra bits to PREEMPT_RT to enable fusyn's, if 
they come in small, clean steps. E.g. Daniel's plist.h stuff was nice 
and clean.

> There are a few mistakes on the page (RT mutex does not do priority 
> ceiling), but for the most part the info is current.

is priority ceiling coming in via some POSIX feature that fusyn's need
to address? What would be needed precisely - a way to set a priority for
a lock (independently of the owner's task priority), and let that
control the PI mechanism?

Unless i'm missing something, this could be implemented by detaching
lock->owner_prio from lock->owner - via e.g. negative values. Thus some
minimal code would check whether we need the owner's priority in the PI
logic, or the semaphore's "own" priority level.
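
roughly something like this (hypothetical helper and encoding, untested):

static inline int pi_source_prio(struct rt_mutex *lock)
{
	if (lock->owner_prio < 0)		/* lock carries its own priority */
		return -lock->owner_prio - 1;	/* decode the ceiling value */
	return lock->owner ? lock->owner->prio : MAX_PRIO;
}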

i.e. this doesnt seem to really affect the core design of RT mutexes.

OTOH, deadlock detection is another issue. It's quite expensive and i'm 
not sure we want to make it a runtime thing. But for fusyn's deadlock 
detection and safe teardown on owner-exit is a must-have i suspect?

> If the RT mutex could be exposed in non PREEMPT_RT configurations, the 
> fulock portion could be superseded by the RT mutex, and the remaining 
> pieces merged in. Or vice versa.

sure, RT mutexes could be exposed in !PREEMPT_RT.

	Ingo


* Re: [PATCH] Priority Lists for the RT mutex
  2005-04-07 17:52 Daniel Walker
  2005-04-08  6:28 ` Ingo Molnar
@ 2005-04-10 10:53 ` Ingo Molnar
  2005-04-11 16:19   ` Daniel Walker
  2005-04-12 18:37 ` Paul E. McKenney
  2 siblings, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2005-04-10 10:53 UTC (permalink / raw)
  To: Daniel Walker; +Cc: linux-kernel, sdietrich, inaky.perez-gonzalez


* Daniel Walker <dwalker@mvista.com> wrote:

> Description:
> 	This patch adds the priority list data structure from Inaky 
> Perez-Gonzalez to the Preempt Real-Time mutex.

ok, i've added this patch to the -45-00 release. It's looking good on my 
testsystems so far, but it will need some more testing i guess.

	Ingo


* RE: [PATCH] Priority Lists for the RT mutex
@ 2005-04-08 23:05 Perez-Gonzalez, Inaky
  0 siblings, 0 replies; 38+ messages in thread
From: Perez-Gonzalez, Inaky @ 2005-04-08 23:05 UTC (permalink / raw)
  To: dwalker
  Cc: Ingo Molnar, linux-kernel, sdietrich, Steven Rostedt, Esben Nielsen

>From: Daniel Walker [mailto:dwalker@mvista.com]
>
>> The current tip of development has some issues with condition variables
>> and broadcasts (requeue stuff) that I need to sink my teeth into. Joe
>> Korty is fixing up a lot of corner cases I wasn't catching, but
>> other than that it is doing fine.
>
>You try to get out, and they suck you right back in.

Don't mention it :] That's why I want to get some more people
hooked up to this...so I can move on to do other things :)

>> How long ago did you see it? I also implemented the futex redirection
>> stuff we discussed some months ago.
>
>It's been a while since I've seen the fusyn scheduler changes. I have
>the current fusyn CVS; I'll take a look at it.

All that stuff is in futex.c; bear in mind what I said at
the confcall: it is just a hacky proof-of-concept--it doesn't
even implement the async interface.

It kind of works, but it is not all that solid [the last time
I tried, the JVMs locked up].

-- Inaky

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH] Priority Lists for the RT mutex
  2005-04-08 21:25 Perez-Gonzalez, Inaky
@ 2005-04-08 22:53 ` Daniel Walker
  0 siblings, 0 replies; 38+ messages in thread
From: Daniel Walker @ 2005-04-08 22:53 UTC (permalink / raw)
  To: Perez-Gonzalez, Inaky
  Cc: Ingo Molnar, linux-kernel, sdietrich, Steven Rostedt, Esben Nielsen

On Fri, 2005-04-08 at 14:25, Perez-Gonzalez, Inaky wrote:
> I concur with Daniel. If we can decide how to deal with that (toss
> one out, keep one, merge them, whatever), we could reuse all the user
> space glue that is the hardest part to get right.

	I have a preference for the Real-Time PI, but that's just because I've
worked with it more. Ingo's really the one that should be making those
choices though, since he has the biggest influence over what goes into
sched.c.

> The current tip of development has some issues with condition variables
> and broadcasts (requeue stuff) that I need to sink my teeth into. Joe
> Korty is fixing up a lot of corner cases I wasn't catching, but
> other than that it is doing fine.

You try to get out, and they suck you right back in.

> How long ago did you see it? I also implemented the futex redirection
> stuff we discussed some months ago.

It's been a while since I've seen the fusyn scheduler changes. I have
the current fusyn CVS; I'll take a look at it.

Daniel



^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH] Priority Lists for the RT mutex
@ 2005-04-08 21:25 Perez-Gonzalez, Inaky
  2005-04-08 22:53 ` Daniel Walker
  0 siblings, 1 reply; 38+ messages in thread
From: Perez-Gonzalez, Inaky @ 2005-04-08 21:25 UTC (permalink / raw)
  To: dwalker, Ingo Molnar
  Cc: linux-kernel, sdietrich, Steven Rostedt, Esben Nielsen

>From: Daniel Walker [mailto:dwalker@mvista.com]
>
>On Thu, 2005-04-07 at 23:28, Ingo Molnar wrote:
>
>> this one looks really clean.
>>
>> it makes me wonder, what is the current status of fusyn's? Such a light
>> datastructure would be much more mergeable upstream than the former
>> 100-queues approach.
>
>	Inaky was telling me that the 100-queues approach is two years old.
>
>The biggest problem is that fusyn has its own PI system. So it's not
>clear if that will work with the RT mutex. I was thinking the PI stuff
>could be made generic so fusyn, and maybe futex, can use it also.

I concur with Daniel. If we can decide how to deal with that (toss
one out, keep one, merge them, whatever), we could reuse all the user
space glue that is the hardest part to get right.

The current tip of development has some issues with condition variables
and broadcasts (requeue stuff) that I need to sink my teeth into. Joe
Korty is fixing up a lot of corner cases I wasn't catching, but
other than that it is doing fine.

How long ago did you see it? I also implemented the futex redirection
stuff we discussed some months ago.

-- Inaky

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] Priority Lists for the RT mutex
  2005-04-08  6:28 ` Ingo Molnar
  2005-04-08  8:05   ` Sven-Thorsten Dietrich
@ 2005-04-08 17:11   ` Daniel Walker
  1 sibling, 0 replies; 38+ messages in thread
From: Daniel Walker @ 2005-04-08 17:11 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, sdietrich, inaky.perez-gonzalez, Steven Rostedt,
	Esben Nielsen


On Thu, 2005-04-07 at 23:28, Ingo Molnar wrote:

> this one looks really clean.
> 
> it makes me wonder, what is the current status of fusyn's? Such a light 
> datastructure would be much more mergeable upstream than the former 
> 100-queues approach.


	Inaky was telling me that the 100-queues approach is two years old.

The biggest problem is that fusyn has its own PI system. So it's not
clear if that will work with the RT mutex. I was thinking the PI stuff
could be made generic so fusyn, and maybe futex, can use it also.

Daniel


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] Priority Lists for the RT mutex
  2005-04-08  6:28 ` Ingo Molnar
@ 2005-04-08  8:05   ` Sven-Thorsten Dietrich
  2005-04-10 11:09     ` Ingo Molnar
  2005-04-08 17:11   ` Daniel Walker
  1 sibling, 1 reply; 38+ messages in thread
From: Sven-Thorsten Dietrich @ 2005-04-08  8:05 UTC (permalink / raw)
  To: Ingo Molnar, joe.korty
  Cc: Daniel Walker, linux-kernel, inaky.perez-gonzalez,
	Steven Rostedt, Esben Nielsen

On Fri, 2005-04-08 at 08:28 +0200, Ingo Molnar wrote:
> * Daniel Walker <dwalker@mvista.com> wrote:
> 
> > 	This patch adds the priority list data structure from Inaky 
> > Perez-Gonzalez to the Preempt Real-Time mutex.
> 
> this one looks really clean.
> 
> it makes me wonder, what is the current status of fusyn's? Such a light 
> datastructure would be much more mergeable upstream than the former 
> 100-queues approach.

Ingo,

Joe Korty has been doing a lot of work on Fusyn recently.

Fusyn is now a generic implementation, similar to the RT mutex. SMP
scalability is somewhat better for decoupled locks on PI (last I
checked). It has the extra bulk required to support user space.

The major issue that concerns both Fusyn and the real-time patch is that
merging the kernel portion of Fusyn creates a collision of redundant PI
implementations with respect to the RT mutex.

The issues are outlined here:

http://developer.osdl.org/dev/mutexes/

There are a few mistakes on the page (the RT mutex does not do priority
ceiling), but for the most part the info is current.

If the RT mutex could be exposed in non PREEMPT_RT configurations,
the fulock portion could be superseded by the RT mutex, and the
remaining pieces merged in. Or vice versa.

We discussed the scenarios recently; any guidance you can offer to help
us out would be greatly appreciated.

http://lists.osdl.org/pipermail/robustmutexes/2005-April/000532.html


Sven



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] Priority Lists for the RT mutex
  2005-04-07 17:52 Daniel Walker
@ 2005-04-08  6:28 ` Ingo Molnar
  2005-04-08  8:05   ` Sven-Thorsten Dietrich
  2005-04-08 17:11   ` Daniel Walker
  2005-04-10 10:53 ` Ingo Molnar
  2005-04-12 18:37 ` Paul E. McKenney
  2 siblings, 2 replies; 38+ messages in thread
From: Ingo Molnar @ 2005-04-08  6:28 UTC (permalink / raw)
  To: Daniel Walker
  Cc: linux-kernel, sdietrich, inaky.perez-gonzalez, Steven Rostedt,
	Esben Nielsen


* Daniel Walker <dwalker@mvista.com> wrote:

> 	This patch adds the priority list data structure from Inaky 
> Perez-Gonzalez to the Preempt Real-Time mutex.

this one looks really clean.

it makes me wonder, what is the current status of fusyn's? Such a light 
datastructure would be much more mergeable upstream than the former 
100-queues approach.

	Ingo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH] Priority Lists for the RT mutex
@ 2005-04-07 17:52 Daniel Walker
  2005-04-08  6:28 ` Ingo Molnar
                   ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Daniel Walker @ 2005-04-07 17:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: mingo, sdietrich, inaky.perez-gonzalez


Source: Daniel Walker <dwalker@mvista.com> MontaVista Software, Inc
Description:
	This patch adds the priority list data structure from Inaky Perez-Gonzalez 
to the Preempt Real-Time mutex.

the patch order is (starting with a 2.6.11 kernel tree),

patch-2.6.12-rc2
realtime-preempt-2.6.12-rc2-V0.7.44-01
	
Signed-off-by: Daniel Walker <dwalker@mvista.com>
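
For reference, a tiny (untested) illustration of how the plist
primitives below are meant to be used - struct my_waiter and the
my_*() helpers are made up for the example, not part of the patch:

#include <linux/plist.h>

struct my_waiter {
	struct plist node;
	int data;			/* payload, just for the example */
};

static struct plist my_wait_list = PLIST_INIT(my_wait_list, 0);

static void my_enqueue(struct my_waiter *w, int prio)
{
	plist_init(&w->node, prio);
	/* nodes are kept sorted by prio, FIFO among equal priorities */
	plist_add(&w->node, &my_wait_list);
}

static struct my_waiter *my_dequeue_highest(void)
{
	struct my_waiter *w;

	if (plist_empty(&my_wait_list))
		return NULL;
	/* plist_entry() resolves the container of the first (highest prio) node */
	w = plist_entry(&my_wait_list, struct my_waiter, node);
	plist_del(&my_wait_list, &w->node);
	return w;
}

The RT mutex hunks below do the same plist_add()/plist_del()/
plist_entry() dance on lock->wait_list and task->pi_waiters.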

Index: linux-2.6.11/include/linux/plist.h
===================================================================
--- linux-2.6.11.orig/include/linux/plist.h	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.11/include/linux/plist.h	2005-04-07 17:47:42.000000000 +0000
@@ -0,0 +1,310 @@
+/*
+ * Descending-priority-sorted double-linked list
+ *
+ * (C) 2002-2003 Intel Corp
+ * Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com>.
+ *
+ * 2001-2005 (c) MontaVista Software, Inc.
+ * Daniel Walker <dwalker@mvista.com>
+ *
+ * Licensed under the FSF's GNU Public License v2 or later.
+ *
+ * Based on simple lists (include/linux/list.h).
+ *
+ * 
+ * This is a priority-sorted list of nodes; each node has a >= 0
+ * priority from 0 (highest) to INT_MAX (lowest). The list itself has
+ * a priority too (the highest of all the nodes), stored in the head
+ * of the list (that is a node itself).
+ *
+ * Addition is O(1), removal is O(1), change of priority of a node is
+ * O(1).
+ *
+ * Addition and change of priority are really O(K), where K is a
+ * constant: the maximum number of different priorities you will
+ * store in the list. Being a constant, it means it is O(1).
+ *
+ * This list is really a list of lists:
+ *
+ *  - The tier 1 list is the dp list (Different Priority)
+ *
+ *  - The tier 2 list is the sp list (Same Priority)
+ *
+ * All the nodes in a SP list have the same priority, and all the DP
+ * lists have different priorities (and are sorted by priority, of
+ * course).
+ *
+ * Addition means: look for the DP node in the DP list for the
+ * priority of the node and append to the SP list corresponding to
+ * that node. If it is the first node of that priority, add it to the
+ * DP list in the right position and it will be the head of the SP
+ * list for its priority.
+ *
+ * Removal means remove it from the SP list corresponding to its
+ * prio. If it is the only one, it means it is also the head and thus a
+ * DP node, so we remove it from the DP list. If it is the head but
+ * there are others in its SP list, then we remove it from both and
+ * get the first one of the SP list to be the new head.
+ *
+ * INT_MIN is the highest priority, 0 is the medium highest, INT_MAX
+ * is lowest priority.
+ *
+ * No locking is done, up to the caller.
+ *
+ * NOTE: This implementation does not offer as many interfaces as
+ *       linux/list.h does -- it is lazily minimal. You are welcome to
+ *       add them.
+ */
+
+#ifndef _LINUX_PLIST_H_
+#define _LINUX_PLIST_H_
+
+#include <linux/kernel.h>
+#include <linux/list.h>
+
+/* Priority-sorted list */
+struct plist {
+	int prio;
+	struct list_head dp_node;
+	struct list_head sp_node;
+};
+
+#define PLIST_INIT(p,__prio)				\
+{							\
+	.prio = __prio,					\
+	.dp_node = LIST_HEAD_INIT((p).dp_node),	\
+	.sp_node = LIST_HEAD_INIT((p).sp_node),	\
+}
+
+/**
+ * plist_entry - get the struct for this entry
+ * @ptr:        the &struct plist pointer.
+ * @type:       the type of the struct this is embedded in.
+ * @member:     the name of the list_struct within the struct.
+ */
+#define plist_entry(ptr, type, member) \
+        container_of(plist_first (ptr), type, member)
+/**
+ * plist_for_each  -       iterate over the plist
+ * @pos1:        the type * to use as a loop counter.
+ * @pos2:        the type * to use as a second loop counter.
+ * @head:       the head for your list.
+ */
+#define plist_for_each(pos1, pos2, head)	\
+	list_for_each_entry(pos1, &((head)->dp_node), dp_node)	\
+		list_for_each_entry(pos2, &((pos1)->sp_node), sp_node)
+/**
+ * plist_for_each_safe - iterate over a plist, safe against removal of list entries
+ * @pos1:        the type * to use as a loop counter.
+ * @pos2:        the type * to use as a loop counter.
+ * @n1:          another type * to use as temporary storage
+ * @n2:          another type * to use as temporary storage
+ * @head:       the head for your list.
+ */
+#define plist_for_each_safe(pos1, pos2, n1, n2, head)			\
+	list_for_each_entry_safe(pos1, n1, &((head)->dp_node), dp_node)	\
+		list_for_each_entry_safe(pos2, n2, &((pos1)->sp_node), sp_node)
+
+/* Initialize a pl */
+static inline
+void plist_init (struct plist *pl, int prio)
+{
+	pl->prio = prio;
+	INIT_LIST_HEAD (&pl->dp_node);
+	INIT_LIST_HEAD (&pl->sp_node);
+}
+
+/* Return the first node (and thus, highest priority)
+ *
+ * Assumes the plist is _not_ empty.
+ */
+static inline
+struct plist * plist_first (struct plist *plist)
+{
+	return list_entry (plist->dp_node.next, struct plist, dp_node);
+}
+
+/* Return if the plist is empty. */
+static inline
+unsigned plist_empty (const struct plist *plist)
+{
+	return list_empty (&plist->dp_node);
+}
+
+/* Update the maximum priority of the whole list
+ *
+ * @returns !0 if the plist prio changed, 0 otherwise.
+ * 
+ * __plist_update_prio() assumes the plist is not empty.
+ */
+static inline
+unsigned __plist_update_prio (struct plist *plist)
+{
+	int prio = plist_first (plist)->prio;
+	if (plist->prio == prio)
+		return 0;
+	plist->prio = prio;
+	return !0;
+}
+
+static inline
+unsigned plist_update_prio (struct plist *plist)
+{
+	int old_prio = plist->prio;
+	/* plist empty, lowest prio = INT_MAX */
+	plist->prio = plist_empty (plist)? INT_MAX : plist_first (plist)->prio;
+	return old_prio != plist->prio;
+}
+
+/* Add a node to the plist [internal]
+ *
+ * pl->prio == INT_MAX is a special case: it means lowest priority, so
+ * it goes down to the end of the plist. Note that we want FIFO
+ * behaviour within the same priority.
+ */
+static inline
+void __plist_add_sorted (struct plist *plist, struct plist *pl)
+{
+	struct list_head *itr;
+	struct plist *itr_pl;
+
+
+	if (pl->prio < INT_MAX) {
+		list_for_each (itr, &plist->dp_node) {
+			itr_pl = list_entry (itr, struct plist, dp_node);
+			if (pl->prio == itr_pl->prio)
+				goto existing_sp_head;
+			else if (pl->prio < itr_pl->prio)
+				goto new_sp_head;
+		}
+		itr_pl = plist;
+		goto new_sp_head;
+	}
+	/* Append to end, SP list for prio INT_MAX */
+	itr_pl = container_of (plist->dp_node.prev, struct plist, dp_node);
+	if (!list_empty (&plist->dp_node) && itr_pl->prio == INT_MAX)
+		goto existing_sp_head;
+	itr_pl = plist;
+
+new_sp_head:
+	list_add_tail (&pl->dp_node, &itr_pl->dp_node);
+	INIT_LIST_HEAD (&pl->sp_node);
+	return;
+	
+existing_sp_head:
+	list_add_tail (&pl->sp_node, &itr_pl->sp_node);
+	INIT_LIST_HEAD (&pl->dp_node);
+	return;
+}
+
+
+/**
+ * Add node @pl to @plist. @returns !0 if the plist prio changed, 0
+ * otherwise.
+ */
+static inline
+unsigned plist_add (struct plist *pl, struct plist *plist)
+{
+	__plist_add_sorted (plist, pl);
+	/* Are we setting a higher priority? */
+	if (pl->prio < plist->prio) {
+		plist->prio = pl->prio;
+		return !0;
+	}
+	return 0;
+}
+
+
+/* Grunt to do the real removal work of @pl from the plist. */
+static inline
+void __plist_del (struct plist *pl)
+{
+	struct list_head *victim;
+	if (list_empty (&pl->dp_node))  	/* SP-node, not head */
+		victim = &pl->sp_node;
+	else if (list_empty (&pl->sp_node)) 	/* DP-node, empty SP list */
+		victim = &pl->dp_node;
+	else {					/* SP list head, not empty */
+		struct plist *pl_new = container_of (pl->sp_node.next,
+						     struct plist, sp_node);
+		victim = &pl->sp_node;
+		list_replace_rcu (&pl->dp_node, &pl_new->dp_node);
+	}
+	list_del_init (victim);
+}
+
+/**
+ * Remove a node @pl from @plist. @returns !0 if the plist prio
+ * changed, 0 otherwise.
+ */
+static inline
+unsigned plist_del (struct plist *plist, struct plist *pl)
+{
+	__plist_del (pl);
+	return plist_update_prio (plist);
+}
+
+/**
+ * plist_del_init - deletes entry from list and reinitialize it.
+ * @entry: the element to delete from the list.
+ */
+static inline void plist_del_init(struct plist *plist, struct plist *pl)
+{
+        plist_del (plist, pl);
+        plist_init(pl, 0);
+}
+
+/* Return the priority of a pl node */
+static inline
+int plist_prio (struct plist *pl)
+{
+	return pl->prio;
+}
+
+/* Change the priority of a pl node, without updating plist position */
+static inline
+void __plist_chprio (struct plist *pl, int new_prio)
+{
+	pl->prio = new_prio;
+}
+
+/**
+ * Change the priority of node @pl in @plist (updating the list's max
+ * priority).  @returns !0 if the plist's maximum priority changes
+ */
+static inline
+unsigned plist_chprio (struct plist *plist, struct plist *pl, int new_prio)
+{
+	if (new_prio == plist->prio)
+		return 0;
+	__plist_chprio (pl, new_prio);
+	__plist_del (pl);
+	__plist_add_sorted (plist, pl);
+	return __plist_update_prio (plist);
+}
+
+
+static inline
+struct plist * __plist_dp_next (struct plist *node_dp) 
+{
+	return container_of (node_dp->dp_node.next, struct plist, dp_node);
+}
+
+
+extern void __plist_splice (struct plist *dst, struct plist *src);
+
+/** Join @src plist to @dst plist and reinitialise @src. @returns !0 if @dst's priority changed */
+static inline
+unsigned plist_splice_init (struct plist *dst, struct plist *src)
+{
+	int old_prio;
+	if (plist_empty (src))
+		return 0;
+	old_prio = dst->prio;
+	__plist_splice (dst, src);
+	plist_init (src, INT_MAX);
+	return old_prio != dst->prio;
+}
+
+#endif /* #ifndef _LINUX_PLIST_H_ */
+
Index: linux-2.6.11/include/linux/rt_lock.h
===================================================================
--- linux-2.6.11.orig/include/linux/rt_lock.h	2005-04-07 17:47:32.000000000 +0000
+++ linux-2.6.11/include/linux/rt_lock.h	2005-04-07 17:47:42.000000000 +0000
@@ -3,6 +3,7 @@
 
 #include <linux/config.h>
 #include <linux/list.h>
+#include <linux/plist.h>
 
 /*
  * These are the basic SMP spinlocks, allowing only a single CPU anywhere.
@@ -57,7 +58,7 @@
  */
 struct rt_mutex {
 	raw_spinlock_t		wait_lock;
-	struct list_head	wait_list;
+	struct plist		wait_list;
 	struct task_struct	*owner;
 	int			owner_prio;
 # ifdef CONFIG_RT_DEADLOCK_DETECT
@@ -75,8 +76,8 @@
  */
 struct rt_mutex_waiter {
 	struct rt_mutex *lock;
-	struct list_head list;
-	struct list_head pi_list;
+	struct plist	 list;
+	struct plist	 pi_list;
 	struct task_struct *task;
 
 	unsigned long eip;	// for debugging
@@ -85,12 +86,12 @@
 #ifdef CONFIG_RT_DEADLOCK_DETECT
 # define __RT_MUTEX_INITIALIZER(lockname) \
 	{ .wait_lock = RAW_SPIN_LOCK_UNLOCKED, \
-	.wait_list = LIST_HEAD_INIT((lockname).wait_list), \
+	.wait_list = PLIST_INIT((lockname).wait_list, 0),  \
 	.name = #lockname, .file = __FILE__, .line = __LINE__ }
 #else
 # define __RT_MUTEX_INITIALIZER(lockname) \
 	{ .wait_lock = RAW_SPIN_LOCK_UNLOCKED, \
-	   LIST_HEAD_INIT((lockname).wait_list) }
+	PLIST_INIT((lockname).wait_list, 0) }
 #endif
 /*
  * RW-semaphores are an RT mutex plus a reader-depth count.
Index: linux-2.6.11/include/linux/sched.h
===================================================================
--- linux-2.6.11.orig/include/linux/sched.h	2005-04-07 17:47:32.000000000 +0000
+++ linux-2.6.11/include/linux/sched.h	2005-04-07 17:47:42.000000000 +0000
@@ -837,7 +837,7 @@
 
 	/* realtime bits */
 	struct list_head delayed_put;
-	struct list_head pi_waiters;
+	struct plist pi_waiters;
 
 	/* RT deadlock detection and priority inheritance handling */
 	struct rt_mutex_waiter *blocked_on;
Index: linux-2.6.11/kernel/fork.c
===================================================================
--- linux-2.6.11.orig/kernel/fork.c	2005-04-07 17:47:32.000000000 +0000
+++ linux-2.6.11/kernel/fork.c	2005-04-07 17:47:42.000000000 +0000
@@ -990,7 +990,7 @@
  	}
 #endif
 	INIT_LIST_HEAD(&p->delayed_put);
-	INIT_LIST_HEAD(&p->pi_waiters);
+	plist_init(&p->pi_waiters, 0);
 	p->blocked_on = NULL; /* not blocked yet */
 
 	p->tgid = p->pid;
Index: linux-2.6.11/kernel/rt.c
===================================================================
--- linux-2.6.11.orig/kernel/rt.c	2005-04-07 17:47:32.000000000 +0000
+++ linux-2.6.11/kernel/rt.c	2005-04-07 17:47:42.000000000 +0000
@@ -31,6 +31,7 @@
 #include <linux/kallsyms.h>
 #include <linux/syscalls.h>
 #include <linux/interrupt.h>
+#include <linux/plist.h>
 
 /*
  * These flags are used for allowing of stealing of ownerships.
@@ -417,6 +418,7 @@
 void check_no_held_locks(struct task_struct *task)
 {
 	struct list_head *curr, *next, *cursor = NULL;
+	struct plist *curr1, *curr2;
 	struct rt_mutex *lock;
 	struct rt_mutex_waiter *w;
 	struct task_struct *p;
@@ -452,8 +454,8 @@
 		goto restart;
 	}
 	spin_lock(&pi_lock);
-	list_for_each(curr, &task->pi_waiters) {
-		w = list_entry(curr, struct rt_mutex_waiter, pi_list);
+	plist_for_each(curr1, curr2, &task->pi_waiters) {
+		w = plist_entry(curr2, struct rt_mutex_waiter, pi_list);
 		TRACE_OFF();
 		spin_unlock(&pi_lock);
 		trace_unlock_irqrestore(&trace_lock, flags);
@@ -514,13 +516,13 @@
 		      struct task_struct *old_owner)
 {
 	struct rt_mutex_waiter *w;
-	struct list_head *curr;
+	struct plist *curr1, *curr2;
 
-	TRACE_WARN_ON(list_empty(&waiter->pi_list));
+	TRACE_WARN_ON(plist_empty(&waiter->pi_list));
 	TRACE_WARN_ON(lock->owner);
 
-	list_for_each(curr, &old_owner->pi_waiters) {
-		w = list_entry(curr, struct rt_mutex_waiter, pi_list);
+	plist_for_each(curr1, curr2, &old_owner->pi_waiters) {
+		w = plist_entry(curr2, struct rt_mutex_waiter, pi_list);
 		if (w == waiter)
 			goto ok;
 	}
@@ -532,10 +534,10 @@
 check_pi_list_empty(struct rt_mutex *lock, struct task_struct *old_owner)
 {
 	struct rt_mutex_waiter *w;
-	struct list_head *curr;
+	struct plist *curr1, *curr2;
 
-	list_for_each(curr, &old_owner->pi_waiters) {
-		w = list_entry(curr, struct rt_mutex_waiter, pi_list);
+	plist_for_each(curr1, curr2, &old_owner->pi_waiters) {
+		w = plist_entry(curr2, struct rt_mutex_waiter, pi_list);
 		if (w->lock == lock) {
 			TRACE_OFF();
 			printk("hm, PI interest but no waiter? Old owner:\n");
@@ -569,17 +571,19 @@
 change_owner(struct rt_mutex *lock, struct task_struct *old_owner,
 		   struct task_struct *new_owner)
 {
-	struct list_head *curr, *next;
+	struct plist *next1, *next2, *curr1, *curr2;
 	struct rt_mutex_waiter *w;
 	int requeued = 0, sum = 0;
 
 	if (old_owner == new_owner)
 		return;
-	list_for_each_safe(curr, next, &old_owner->pi_waiters) {
-		w = list_entry(curr, struct rt_mutex_waiter, pi_list);
+
+	plist_for_each_safe(curr1, curr2, next1, next2, &old_owner->pi_waiters) {
+		w = plist_entry(curr2, struct rt_mutex_waiter, pi_list);
 		if (w->lock == lock) {
-			list_del_init(curr);
-			list_add_tail(curr, &new_owner->pi_waiters);
+			plist_del(&w->pi_list, &old_owner->pi_waiters);
+			plist_init(&w->pi_list, w->task->prio);
+			plist_add(&w->pi_list, &new_owner->pi_waiters);
 			requeued++;
 		}
 		sum++;
@@ -626,31 +630,36 @@
 		/*
 		 * If the task is blocked on a lock, and we just made
 		 * it RT, then register the task in the PI list and
-		 * requeue it to the head of the wait list:
+		 * requeue it to the wait list:
 		 */
 		lock = w->lock;
 		TRACE_BUG_ON(!lock);
 		TRACE_BUG_ON(!lock->owner);
-		if (rt_task(p) && list_empty(&w->pi_list)) {
+		if (rt_task(p) && plist_empty(&w->pi_list)) {
 			TRACE_BUG_ON(was_rt);
-			list_add_tail(&w->pi_list, &lock->owner->pi_waiters);
-			list_del(&w->list);
-			list_add(&w->list, &lock->wait_list);
+			plist_init(&w->pi_list, prio);
+			plist_add(&w->pi_list, &lock->owner->pi_waiters);
+
+			plist_del(&w->list, &lock->wait_list);
+			plist_init(&w->list, prio);
+			plist_add(&w->list, &lock->wait_list);
+
 		}
 		/*
 		 * If the task is blocked on a lock, and we just restored
 		 * it from RT to non-RT then unregister the task from
-		 * the PI list and requeue it to the tail of the wait
-		 * list:
+		 * the PI list and requeue it to the wait list.
 		 *
 		 * (TODO: this can be unfair to SCHED_NORMAL tasks if they
 		 *        get PI handled.)
 		 */
-		if (!rt_task(p) && !list_empty(&w->pi_list)) {
+		if (!rt_task(p) && !plist_empty(&w->pi_list)) {
 			TRACE_BUG_ON(!was_rt);
-			list_del(&w->pi_list);
-			list_del(&w->list);
-			list_add_tail(&w->list, &lock->wait_list);
+			plist_del(&w->pi_list, &lock->owner->pi_waiters);
+			plist_del(&w->list, &lock->wait_list);
+			plist_init(&w->list, prio);
+			plist_add(&w->list, &lock->wait_list);
+
 		}
 
 		pi_walk++;
@@ -679,22 +688,22 @@
 	task->blocked_on = waiter;
 	waiter->lock = lock;
 	waiter->task = task;
-	INIT_LIST_HEAD(&waiter->pi_list);
+	plist_init(&waiter->pi_list, task->prio);
 	/*
 	 * Add SCHED_NORMAL tasks to the end of the waitqueue (FIFO):
 	 */
 #ifndef ALL_TASKS_PI
 	if (!rt_task(task)) {
-		list_add_tail(&waiter->list, &lock->wait_list);
+		plist_add(&waiter->list, &lock->wait_list);
 		return;
 	}
 #endif
 	spin_lock(&pi_lock);
-	list_add_tail(&waiter->pi_list, &lock->owner->pi_waiters);
+	plist_add(&waiter->pi_list, &lock->owner->pi_waiters);
 	/*
 	 * Add RT tasks to the head:
 	 */
-	list_add(&waiter->list, &lock->wait_list);
+	plist_add(&waiter->list, &lock->wait_list);
 	/*
 	 * If the waiter has higher priority than the owner
 	 * then temporarily boost the owner:
@@ -712,7 +721,7 @@
 {
 	lock->owner = NULL;
 	spin_lock_init(&lock->wait_lock);
-	INIT_LIST_HEAD(&lock->wait_list);
+	plist_init(&lock->wait_list, 0);
 #ifdef CONFIG_RT_DEADLOCK_DETECT
 	lock->save_state = save_state;
 	INIT_LIST_HEAD(&lock->held_list);
@@ -754,44 +763,26 @@
 		struct task_struct *old_owner, int save_state,
 		unsigned long eip)
 {
-	struct rt_mutex_waiter *w, *waiter = NULL;
+	struct rt_mutex_waiter *waiter = NULL;
 	struct task_struct *new_owner;
-	struct list_head *curr;
 
 	/*
 	 * Get the highest prio one:
 	 *
 	 * (same-prio RT tasks go FIFO)
 	 */
-	list_for_each(curr, &lock->wait_list) {
-		w = list_entry(curr, struct rt_mutex_waiter, list);
-		trace_special_pid(w->task->pid, w->task->prio, 0);
-		/*
-		 * Break out upon meeting the first non-RT-prio
-		 * task - we inserted them to the tail, so if we
-	 	 * see the first one the rest is SCHED_NORMAL too:
-	 	 */
-		if (!rt_task(w->task))
-			break;
-		if (!waiter || w->task->prio <= waiter->task->prio)
-			waiter = w;
-	}
+	waiter = plist_entry(&lock->wait_list, struct rt_mutex_waiter, list);
 
-	/*
-	 * If no RT waiter then pick the first one:
-	 */
-	if (!waiter)
-		waiter = list_entry(lock->wait_list.next,
-					struct rt_mutex_waiter, list);
 	trace_special_pid(waiter->task->pid, waiter->task->prio, 0);
 
 #ifdef ALL_TASKS_PI
 	check_pi_list_present(lock, waiter, old_owner);
 #endif
 	new_owner = waiter->task;
-	list_del_init(&waiter->list);
+	plist_del_init(&lock->wait_list, &waiter->list);
 
-	list_del_init(&waiter->pi_list);
+	plist_del(&waiter->pi_list, &old_owner->pi_waiters);
+	plist_init(&waiter->pi_list, waiter->task->prio);
 
 	set_new_owner(lock, old_owner, new_owner, waiter->eip);
 	/* Don't touch waiter after ->task has been NULLed */
@@ -806,8 +797,8 @@
 static inline void init_lists(struct rt_mutex *lock)
 {
 	// we have to do this until the static initializers get fixed:
-	if (!lock->wait_list.prev && !lock->wait_list.next)
-		INIT_LIST_HEAD(&lock->wait_list);
+	if (!lock->wait_list.dp_node.prev && !lock->wait_list.dp_node.next)
+		plist_init(&lock->wait_list, 0);
 #ifdef CONFIG_RT_DEADLOCK_DETECT
 	if (!lock->held_list.prev && !lock->held_list.next)
 		INIT_LIST_HEAD(&lock->held_list);
@@ -936,7 +927,7 @@
 	if (grab_lock(lock,task)) {
 		/* granted */
 		struct task_struct *old_owner = lock->owner;
-		TRACE_WARN_ON(!list_empty(&lock->wait_list) && !old_owner);
+		TRACE_WARN_ON(!plist_empty(&lock->wait_list) && !old_owner);
 		spin_lock(&pi_lock);
 		set_new_owner(lock, old_owner, task, eip);
 		spin_unlock(&pi_lock);
@@ -1032,7 +1023,7 @@
 	if (grab_lock(lock,task)) {
 		/* granted */
 		struct task_struct *old_owner = lock->owner;
-		TRACE_WARN_ON(!list_empty(&lock->wait_list) && !old_owner);
+		TRACE_WARN_ON(!plist_empty(&lock->wait_list) && !old_owner);
 		spin_lock(&pi_lock);
 		set_new_owner(lock, old_owner, task, eip);
 		spin_unlock(&pi_lock);
@@ -1042,6 +1033,7 @@
 		return;
 	}
 
+	plist_init (&waiter.list, task->prio);
 	task_blocks_on_lock(&waiter, task, lock, eip);
 
 	TRACE_BUG_ON(!irqs_disabled());
@@ -1171,7 +1163,7 @@
 	if (grab_lock(lock,task)) {
 		/* granted */
 		struct task_struct *old_owner = lock->owner;
-		TRACE_WARN_ON(!list_empty(&lock->wait_list) && !old_owner);
+		TRACE_WARN_ON(!plist_empty(&lock->wait_list) && !old_owner);
 		spin_lock(&pi_lock);
 		set_new_owner(lock, old_owner, task, eip);
 		spin_unlock(&pi_lock);
@@ -1183,6 +1175,7 @@
 
 	set_task_state(task, TASK_INTERRUPTIBLE);
 
+	plist_init (&waiter.list, task->prio);
 	task_blocks_on_lock(&waiter, task, lock, eip);
 
 	TRACE_BUG_ON(!irqs_disabled());
@@ -1207,14 +1200,15 @@
 			trace_lock_irq(&trace_lock);
 			spin_lock(&lock->wait_lock);
 			if (waiter.task) {
-				list_del_init(&waiter.list);
+				plist_del_init(&lock->wait_list, &waiter.list);
 				/*
 				 * Just remove ourselves from the PI list.
 				 * (No big problem if our PI effect lingers
 				 *  a bit - owner will restore prio.)
 				 */
 				spin_lock(&pi_lock);
-				list_del_init(&waiter.pi_list);
+				plist_del(&waiter.pi_list, &lock->owner->pi_waiters);
+				plist_init(&waiter.pi_list, waiter.task->prio);
 				spin_unlock(&pi_lock);
 				ret = -EINTR;
 			}
@@ -1262,7 +1256,7 @@
 	if (grab_lock(lock,task)) {
 		/* granted */
 		struct task_struct *old_owner = lock->owner;
-		TRACE_WARN_ON(!list_empty(&lock->wait_list) && !old_owner);
+		TRACE_WARN_ON(!plist_empty(&lock->wait_list) && !old_owner);
 		spin_lock(&pi_lock);
 		set_new_owner(lock, old_owner, task, eip);
 		spin_unlock(&pi_lock);
@@ -1321,7 +1315,6 @@
 {
 	struct task_struct *old_owner, *new_owner;
 	struct rt_mutex_waiter *w;
-	struct list_head *curr;
 	unsigned long flags;
 	int prio;
 
@@ -1330,7 +1323,7 @@
 	trace_lock_irqsave(&trace_lock, flags);
 	TRACE_BUG_ON(!irqs_disabled());
 	spin_lock(&lock->wait_lock);
-	TRACE_BUG_ON(!lock->wait_list.prev && !lock->wait_list.next);
+	TRACE_BUG_ON(!lock->wait_list.dp_node.prev && !lock->wait_list.dp_node.next);
 
 #ifdef CONFIG_RT_DEADLOCK_DETECT
 	TRACE_WARN_ON(list_empty(&lock->held_list));
@@ -1340,12 +1333,12 @@
 
 	old_owner = lock->owner;
 #ifdef ALL_TASKS_PI
-	if (list_empty(&lock->wait_list))
+	if (plist_empty(&lock->wait_list))
 		check_pi_list_empty(lock, old_owner);
 #endif
 	lock->owner = NULL;
 	new_owner = NULL;
-	if (!list_empty(&lock->wait_list))
+	if (!plist_empty(&lock->wait_list))
 		new_owner = pick_new_owner(lock, old_owner, save_state, eip);
 
 	/*
@@ -1354,11 +1347,9 @@
 	 * waiter's priority):
 	 */
 	prio = mutex_getprio(old_owner);
-	list_for_each(curr, &old_owner->pi_waiters) {
-		w = list_entry(curr, struct rt_mutex_waiter, pi_list);
-		if (w->task->prio < prio)
-			prio = w->task->prio;
-		trace_special_pid(w->task->pid, w->task->prio, 0);
+	if (new_owner && !plist_empty(&new_owner->pi_waiters)) {
+		w = plist_entry(&new_owner->pi_waiters, struct rt_mutex_waiter, pi_list);
+		prio = w->task->prio;
 	}
 	if (prio != old_owner->prio)
 		pi_setprio(lock, old_owner, prio);






^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2005-05-09 22:15 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-04-11  8:49 [PATCH] Priority Lists for the RT mutex Perez-Gonzalez, Inaky
2005-04-11  8:57 ` Ingo Molnar
2005-04-11 22:17   ` Bill Huey
2005-04-12 20:35   ` Esben Nielsen
  -- strict thread matches above, loose matches on Subject: below --
2005-05-06 18:13 Perez-Gonzalez, Inaky
2005-05-07  7:25 ` Oleg Nesterov
2005-05-09  7:30   ` Ingo Molnar
2005-05-09  8:08     ` Oleg Nesterov
2005-05-09  9:11       ` Ingo Molnar
2005-05-09 14:32         ` Oleg Nesterov
2005-05-09 18:13         ` Daniel Walker
2005-05-09 19:21           ` Steven Rostedt
2005-05-09 21:29             ` Daniel Walker
2005-05-09 22:15               ` Daniel Walker
2005-05-09 14:07   ` Daniel Walker
2005-05-06  9:05 Oleg Nesterov
2005-05-06 11:13 ` Oleg Nesterov
2005-05-06 18:56   ` Daniel Walker
2005-04-12  0:36 Perez-Gonzalez, Inaky
2005-04-11 23:28 Perez-Gonzalez, Inaky
2005-04-12  0:27 ` Bill Huey
2005-04-11 22:31 Perez-Gonzalez, Inaky
2005-04-11 23:11 ` Bill Huey
2005-04-11  9:03 Perez-Gonzalez, Inaky
2005-04-11  8:27 Perez-Gonzalez, Inaky
2005-04-11  8:42 ` Ingo Molnar
2005-04-08 23:05 Perez-Gonzalez, Inaky
2005-04-08 21:25 Perez-Gonzalez, Inaky
2005-04-08 22:53 ` Daniel Walker
2005-04-07 17:52 Daniel Walker
2005-04-08  6:28 ` Ingo Molnar
2005-04-08  8:05   ` Sven-Thorsten Dietrich
2005-04-10 11:09     ` Ingo Molnar
2005-04-12 17:56       ` Daniel Walker
2005-04-08 17:11   ` Daniel Walker
2005-04-10 10:53 ` Ingo Molnar
2005-04-11 16:19   ` Daniel Walker
2005-04-12 18:37 ` Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).