linux-kernel.vger.kernel.org archive mirror
* 2.6.17-rt1
@ 2006-06-18  7:06 Ingo Molnar
  2006-06-18 16:13 ` 2.6.17-rt1 Michal Piotrowski
                   ` (3 more replies)
  0 siblings, 4 replies; 51+ messages in thread
From: Ingo Molnar @ 2006-06-18  7:06 UTC (permalink / raw)
  To: linux-kernel

i have released the 2.6.17-rt1 tree, which can be downloaded from the 
usual place:

   http://redhat.com/~mingo/realtime-preempt/

this is a fixes-only release: a merge to 2.6.17-final plus smaller fixes.

to build a 2.6.17-rt1 tree, the following patches should be applied:

  http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.17.tar.bz2
  http://redhat.com/~mingo/realtime-preempt/patch-2.6.17-rt1

	Ingo

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: 2.6.17-rt1
  2006-06-18  7:06 2.6.17-rt1 Ingo Molnar
@ 2006-06-18 16:13 ` Michal Piotrowski
       [not found] ` <Pine.LNX.4.64.0606201656230.11643@localhost.localdomain>
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 51+ messages in thread
From: Michal Piotrowski @ 2006-06-18 16:13 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

Hi Ingo,

On 18/06/06, Ingo Molnar <mingo@elte.hu> wrote:
> i have released the 2.6.17-rt1 tree, which can be downloaded from the
> usual place:
>
>    http://redhat.com/~mingo/realtime-preempt/
>

I get this when I reboot my system.

Jun 18 17:59:26 ltg01-fedora kernel: skge eth0: disabling interface
Jun 18 17:59:27 ltg01-fedora kernel: BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/20325
Jun 18 17:59:27 ltg01-fedora kernel: caller is drain_array+0x15/0xe5
Jun 18 17:59:27 ltg01-fedora kernel:  [<c0104647>] show_trace+0xd/0xf (8)
Jun 18 17:59:27 ltg01-fedora kernel:  [<c010470c>] dump_stack+0x17/0x19 (12)
Jun 18 17:59:27 ltg01-fedora kernel:  [<c01da50f>] debug_smp_processor_id+0x7b/0x8c (32)
Jun 18 17:59:27 ltg01-fedora kernel:  [<c015e044>] drain_array+0x15/0xe5 (32)
Jun 18 17:59:27 ltg01-fedora kernel:  [<c015e26e>] __cache_shrink+0x35/0xab (32)
Jun 18 17:59:27 ltg01-fedora kernel:  [<c015fac1>] kmem_cache_destroy+0x76/0x12e (16)
Jun 18 17:59:27 ltg01-fedora kernel:  [<fdc6d23e>] ip_conntrack_cleanup+0x80/0xad [ip_conntrack] (12)
Jun 18 17:59:27 ltg01-fedora kernel:  [<fdc7030e>] ip_conntrack_standalone_fini+0x56/0x85 [ip_conntrack] (8)
Jun 18 17:59:27 ltg01-fedora kernel:  [<c013a7c5>] sys_delete_module+0x18f/0x1b6 (96)
Jun 18 17:59:27 ltg01-fedora kernel:  [<c0102e73>] sysenter_past_esp+0x54/0x75 (-4020)
Jun 18 17:59:27 ltg01-fedora kernel: ---------------------------
Jun 18 17:59:27 ltg01-fedora kernel: | preempt count: 00000001 ]
Jun 18 17:59:27 ltg01-fedora kernel: | 1-level deep critical section nesting:
Jun 18 17:59:27 ltg01-fedora kernel: ----------------------------------------
Jun 18 17:59:27 ltg01-fedora kernel: .. [<c01da4d5>] .... debug_smp_processor_id+0x41/0x8c
Jun 18 17:59:27 ltg01-fedora kernel: .....[<c015e044>] ..   ( <= drain_array+0x15/0xe5)
Jun 18 17:59:27 ltg01-fedora kernel:
Jun 18 17:59:27 ltg01-fedora kernel: ------------------------------
Jun 18 17:59:27 ltg01-fedora kernel: | showing all locks held by: | (modprobe/20325 [f72380b0, 117]):
Jun 18 17:59:27 ltg01-fedora kernel: ------------------------------
Jun 18 17:59:27 ltg01-fedora kernel:
Jun 18 17:59:27 ltg01-fedora kernel: #001:             [c0313d04] {cpucontrol.lock}
Jun 18 17:59:27 ltg01-fedora kernel: ... acquired at: rt_down+0x11/0x28
Jun 18 17:59:27 ltg01-fedora kernel:
Jun 18 17:59:27 ltg01-fedora kernel: BUG: modprobe:20325 task might have lost a preemption check!
Jun 18 17:59:27 ltg01-fedora kernel:  [<c0104647>] show_trace+0xd/0xf (8)
Jun 18 17:59:27 ltg01-fedora kernel:  [<c010470c>] dump_stack+0x17/0x19 (12)
Jun 18 17:59:27 ltg01-fedora kernel:  [<c011a884>] preempt_enable_no_resched+0x4c/0x51 (20)
Jun 18 17:59:27 ltg01-fedora kernel:  [<c01da517>] debug_smp_processor_id+0x83/0x8c (16)
Jun 18 17:59:27 ltg01-fedora kernel:  [<c015e044>] drain_array+0x15/0xe5 (32)
Jun 18 17:59:27 ltg01-fedora kernel:  [<c015e26e>] __cache_shrink+0x35/0xab (32)
Jun 18 17:59:27 ltg01-fedora kernel:  [<c015fac1>] kmem_cache_destroy+0x76/0x12e (16)
Jun 18 17:59:27 ltg01-fedora kernel:  [<fdc6d23e>] ip_conntrack_cleanup+0x80/0xad [ip_conntrack] (12)
Jun 18 17:59:27 ltg01-fedora kernel:  [<fdc7030e>] ip_conntrack_standalone_fini+0x56/0x85 [ip_conntrack] (8)
Jun 18 17:59:27 ltg01-fedora kernel:  [<c013a7c5>] sys_delete_module+0x18f/0x1b6 (96)
Jun 18 17:59:27 ltg01-fedora kernel:  [<c0102e73>] sysenter_past_esp+0x54/0x75 (-4020)
Jun 18 17:59:27 ltg01-fedora kernel: ---------------------------
Jun 18 17:59:27 ltg01-fedora kernel: | preempt count: 00000000 ]
Jun 18 17:59:27 ltg01-fedora kernel: | 0-level deep critical section nesting:
Jun 18 17:59:27 ltg01-fedora kernel: ----------------------------------------
Jun 18 17:59:27 ltg01-fedora kernel:
Jun 18 17:59:27 ltg01-fedora kernel: ------------------------------
Jun 18 17:59:27 ltg01-fedora kernel: | showing all locks held by: | (modprobe/20325 [f72380b0, 117]):
Jun 18 17:59:27 ltg01-fedora kernel: ------------------------------
Jun 18 17:59:27 ltg01-fedora kernel:
Jun 18 17:59:27 ltg01-fedora kernel: #001:             [c0313d04] {cpucontrol.lock}
Jun 18 17:59:27 ltg01-fedora kernel: ... acquired at: rt_down+0x11/0x28
Jun 18 17:59:27 ltg01-fedora kernel:
Jun 18 17:59:27 ltg01-fedora kernel: BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/20325
Jun 18 17:59:27 ltg01-fedora kernel: BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/20325
Jun 18 17:59:27 ltg01-fedora kernel: caller is drain_array+0x15/0xe5
Jun 18 17:59:27 ltg01-fedora kernel:  [<c0104647>] show_trace+0xd/0xf (8)
Jun 18 17:59:27 ltg01-fedora kernel:  [<c010470c>] dump_stack+0x17/0x19 (12)
Jun 18 17:59:27 ltg01-fedora kernel:  [<c01da50f>] debug_smp_processor_id+0x7b/0x8c (32)
Jun 18 17:59:27 ltg01-fedora kernel:  [<c015e044>] drain_array+0x15/0xe5 (32)
Jun 18 17:59:27 ltg01-fedora kernel:  [<c015e26e>] __cache_shrink+0x35/0xab (32)
Jun 18 17:59:27 ltg01-fedora kernel:  [<c015fac1>] kmem_cache_destroy+0x76/0x12e (16)
Jun 18 17:59:27 ltg01-fedora kernel:  [<fdc6d248>] ip_conntrack_cleanup+0x8a/0xad [ip_conntrack] (12)
Jun 18 17:59:27 ltg01-fedora kernel:  [<fdc7030e>] ip_conntrack_standalone_fini+0x56/0x85 [ip_conntrack] (8)
Jun 18 17:59:27 ltg01-fedora kernel:  [<c013a7c5>] sys_delete_module+0x18f/0x1b6 (96)
Jun 18 17:59:27 ltg01-fedora kernel: Kernel logging (proc) stopped.
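
For reference, the check that fires here is the CONFIG_DEBUG_PREEMPT wrapper
around smp_processor_id(): it complains whenever the CPU number is read while
the code can still be preempted and migrated. A minimal sketch of the pattern
it catches (illustrative only, with hypothetical struct and helper names, not
the actual slab code):

/* Sketch of the pattern the debug check above catches. */
static void drain_percpu_sketch(struct percpu_cache_sketch *cachep)
{
	/* BUG: preemption is enabled, so the CPU can change under us. */
	int cpu = smp_processor_id();

	free_entries(cachep->cpudata[cpu]);	/* hypothetical helper */
}

/* The same access with preemption pinned for the per-CPU window. */
static void drain_percpu_sketch_fixed(struct percpu_cache_sketch *cachep)
{
	int cpu = get_cpu();

	free_entries(cachep->cpudata[cpu]);	/* hypothetical helper */
	put_cpu();
}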

Here is the config file:
http://www.stardust.webpages.pl/files/rt/2.6.17-rt1/rt-config

Regards,
Michal

-- 
Michal K. K. Piotrowski
LTG - Linux Testers Group
(http://www.stardust.webpages.pl/ltg/wiki/)

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
       [not found] ` <Pine.LNX.4.64.0606201656230.11643@localhost.localdomain>
@ 2006-06-20 15:13   ` Thomas Gleixner
  2006-06-20 17:09     ` Esben Nielsen
  0 siblings, 1 reply; 51+ messages in thread
From: Thomas Gleixner @ 2006-06-20 15:13 UTC (permalink / raw)
  To: Esben Nielsen; +Cc: Ingo Molnar, linux-kernel

On Tue, 2006-06-20 at 17:01 +0100, Esben Nielsen wrote:
> Hi,
>   I wanted to run some tests with RTExec and I wanted to play around with 
> the priorities, but I could not set the priorities of softirq-hrtXXXX.
> I looked a bit in the code and found that hrtimer_adjust_softirq_prio() is
> called every loop, setting it back to priority 1.
> 
> Why is that? Can it be fixed so it behaves as any other task you can use 
> chrt on?

No, please see

http://www.linutronix.de/index.php?mact=News,cntnt01,detail,0&cntnt01articleid=8&cntnt01dateformat=%25b%20%25d%2C%20%25Y&cntnt01returnid=31
        
Dynamic priority support for high resolution timers
        
	tglx
        


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
  2006-06-20 17:09     ` Esben Nielsen
@ 2006-06-20 16:35       ` Thomas Gleixner
  2006-06-20 21:16         ` Esben Nielsen
  2006-06-20 16:39       ` Steven Rostedt
  1 sibling, 1 reply; 51+ messages in thread
From: Thomas Gleixner @ 2006-06-20 16:35 UTC (permalink / raw)
  To: Esben Nielsen; +Cc: Ingo Molnar, linux-kernel

On Tue, 2006-06-20 at 18:09 +0100, Esben Nielsen wrote:
> The only question I have is why the priority of the callback is set to
> priority of the task calling hrtimer_start() (current->normal_prio). That 
> seems like an odd binding to me. Shouldn't the finding of the priority be moved over to the 
> posix-timer code, where it is needed, and be given as a parameter to 
> hrtimer_start()?

It was the simplest way to do it.

> In rtmutex.c, where a hrtimer is used as a timeout on a mutex, wouldn't it 
> make more sense to use current->prio than current->normal_prio if the task 
> is boosted when it starts to wait on a mutex.

Not sure about that.

> Let's say you have a bunch of callbacks running at priority 1 and then the
> next hrt timer with priority 99 expires. Then the callback which
> is running will be boosted to priority 99. So the overall latency at
> priority 99 will be at least the latency of the worst hrtimer callback.
> And worse: What if the running callback is blocked on a mutex? Will the
> owner of the mutex be boosted as well? Not according to the code in
> sched.c. Therefore you get priority inversion down to priority 1. That is,
> the worst-case hrtimer latency is that of priority 1.
> 
> Therefore, a simpler and more robust design would be to give the thread
> priority 99 as a default - just like the posix_cpu_timer thread. Then the
> system designer can move it around with chrt when needed.
> In fact you can say the current design has both of the worst cases of having
> it running at priority 99 and at priority 1!

We had this before and it is horrible.

> Another complicated design would be to make a task for each priority. 
> Then the interrupt wakes the highest priority one, which handles the first 
> callback and awakes the next one etc.

Uurgh. That's not a serious proposal?

A nice solution would be to enqueue the timer into the task struct of
the thread which is the target of the signal, and wake that thread in a
special state which only runs the callback and then does the signal delivery.
The scary thing about this is getting the locking straight, but it might
well be worth trying. That way we would move the softirq delivery to
the thread and maybe save a couple of task switches.
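
A rough sketch of that idea, with entirely hypothetical field and helper
names, just to make the shape concrete (the locking is exactly the part
glossed over here):

/* Hypothetical sketch only; none of these names exist in the -rt tree. */
struct task_timer_work {
	struct list_head	list;
	void			(*func)(void *data);
	void			*data;
};

/*
 * Called from the hrtimer interrupt instead of raising the softirq:
 * queue the expired timer on the thread that is the target of the
 * signal and wake it, so the callback and the signal delivery both
 * run in that thread's context and at its priority.
 */
static void queue_timer_on_target(struct task_struct *target,
				  struct task_timer_work *work)
{
	unsigned long flags;

	spin_lock_irqsave(&target->timer_work_lock, flags);	/* hypothetical field */
	list_add_tail(&work->list, &target->timer_work);	/* hypothetical field */
	spin_unlock_irqrestore(&target->timer_work_lock, flags);

	wake_up_process(target);	/* the real thing needs a special wakeup state */
}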

	tglx



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
  2006-06-20 17:09     ` Esben Nielsen
  2006-06-20 16:35       ` Thomas Gleixner
@ 2006-06-20 16:39       ` Steven Rostedt
  2006-06-20 18:12         ` Esben Nielsen
  1 sibling, 1 reply; 51+ messages in thread
From: Steven Rostedt @ 2006-06-20 16:39 UTC (permalink / raw)
  To: Esben Nielsen; +Cc: Thomas Gleixner, Ingo Molnar, linux-kernel


On Tue, 20 Jun 2006, Esben Nielsen wrote:

> I am sorry. I should have read some more of the code before asking.
>
> The only question I have is why the priority of the callback is set to
> priority of the task calling hrtimer_start() (current->normal_prio). That
> seems like an odd binding to me. Shouldn't the finding of the priority be moved over to the
> posix-timer code, where it is needed, and be given as a parameter to
> hrtimer_start()?
> In rtmutex.c, where a hrtimer is used as a timeout on a mutex, wouldn't it
> make more sense to use current->prio than current->normal_prio if the task
> is boosted when it starts to wait on a mutex.

That seems reasonable.  It probably is a bug to use normal_prio, since we
really do care what prio is at that time.

>
>
> But I am not sure I like the design at all:
>
> Let's say you have a bunch of callbacks running at priority 1 and then the
> next hrt timer with priority 99 expires. Then the callback which
> is running will be boosted to priority 99. So the overall latency at
> priority 99 will be at least the latency of the worst hrtimer callback.

You mean for those that expire at the same time?

I don't think this is a problem, because the run_hrtimer_hres_queue runs
the highest priority callback first, then it adjusts its prio to the next
priority callback.  See hrtimer_adjust_softirq_prio.
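
A simplified sketch of the shape of that loop, purely for illustration (the
helper names below are made up, this is not the actual -rt code):

/*
 * Callbacks are kept sorted by priority; before running each one the
 * softirq thread drops itself to that callback's priority, so a
 * higher-priority timer expiring meanwhile can still preempt the loop.
 */
static void run_hrtimer_queue_sketch(struct plist_head *pending)
{
	struct hrtimer_cb_sketch *cb;			/* hypothetical type */

	while ((cb = highest_prio_callback(pending))) {	/* hypothetical helper */
		set_softirq_prio(cb->prio);	/* cf. hrtimer_adjust_softirq_prio() */
		plist_del(&cb->node, pending);
		cb->function(cb);
	}
}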

> And worse: What if the running callback is blocked on a mutex? Will the
> owner of the mutex be boosted as well? Not according to the code in
> sched.c. Therefore you get priority inversion down to priority 1. That is,
> the worst-case hrtimer latency is that of priority 1.

I don't see this.

>
> Therefore, a simpler and more robust design would be to give the thread
> priority 99 as a default - just like the posix_cpu_timer thread. Then the
> system designer can move it around with chrt when needed.
> In fact you can say the current design has both of the worst cases of having
> it running at priority 99 and at priority 1!

I still don't see this happening.

>
> Another complicated design would be to make a task for each priority.
> Then the interrupt wakes the highest priority one, which handles the first
> callback and awakes the next one etc.

Don't think that is necessary.

-- Steve


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
  2006-06-20 15:13   ` Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1) Thomas Gleixner
@ 2006-06-20 17:09     ` Esben Nielsen
  2006-06-20 16:35       ` Thomas Gleixner
  2006-06-20 16:39       ` Steven Rostedt
  0 siblings, 2 replies; 51+ messages in thread
From: Esben Nielsen @ 2006-06-20 17:09 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Esben Nielsen, Ingo Molnar, linux-kernel

On Tue, 20 Jun 2006, Thomas Gleixner wrote:

> On Tue, 2006-06-20 at 17:01 +0100, Esben Nielsen wrote:
>> Hi,
>>   I wanted to run some tests with RTExec and I wanted to play around with
>> the priorities, but I could not set the priorities of softirq-hrtXXXX.
>> I looked a bit in the code and found that hrtimer_adjust_softirq_prio() is
>> called every loop, setting it back to priority 1.
>>
>> Why is that? Can it be fixed so it behaves as any other task you can use
>> chrt on?
>
> No, please see
>
> http://www.linutronix.de/index.php?mact=News,cntnt01,detail,0&cntnt01articleid=8&cntnt01dateformat=%25b%20%25d%2C%20%25Y&cntnt01returnid=31
>
> Dynamic priority support for high resolution timers
>

I am sorry. I should have read some more of the code before asking.

The only question I have is why the priority of the callback is set to
priority of the task calling hrtimer_start() (current->normal_prio). That 
seems like an odd binding to me. Shouldn't the finding of the priority be moved over to the 
posix-timer code, where it is needed, and be given as a parameter to 
hrtimer_start()?
In rtmutex.c, where a hrtimer is used as a timeout on a mutex, wouldn't it 
make more sense to use current->prio than current->normal_prio if the task 
is boosted when it starts to wait on a mutex.


But I am not sure I like the design at all:

Let's say you have a bunch of callbacks running at priority 1 and then the
next hrt timer with priority 99 expires. Then the callback which
is running will be boosted to priority 99. So the overall latency at
priority 99 will be at least the latency of the worst hrtimer callback.
And worse: What if the running callback is blocked on a mutex? Will the
owner of the mutex be boosted as well? Not according to the code in
sched.c. Therefore you get priority inversion down to priority 1. That is,
the worst-case hrtimer latency is that of priority 1.

Therefore, a simpler and more robust design would be to give the thread
priority 99 as a default - just like the posix_cpu_timer thread. Then the
system designer can move it around with chrt when needed.
In fact you can say the current design has both of the worst cases of having
it running at priority 99 and at priority 1!

Another complicated design would be to make a task for each priority. 
Then the interrupt wakes the highest priority one, which handles the first 
callback and awakes the next one etc.


Esben


> 	tglx
>
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
  2006-06-20 18:12         ` Esben Nielsen
@ 2006-06-20 17:21           ` Thomas Gleixner
  2006-06-20 21:26             ` Esben Nielsen
  2006-06-21  8:13           ` Steven Rostedt
  1 sibling, 1 reply; 51+ messages in thread
From: Thomas Gleixner @ 2006-06-20 17:21 UTC (permalink / raw)
  To: Esben Nielsen; +Cc: Steven Rostedt, Ingo Molnar, linux-kernel

On Tue, 2006-06-20 at 19:12 +0100, Esben Nielsen wrote:
> >
> >>
> >> Another complicated design would be to make a task for each priority.
> >> Then the interrupt wakes the highest priority one, which handles the first
> >> callback and awakes the next one etc.
> >
> > Don't think that is necessary.
> 
> Me neither :-) Running softirq-hrt at priority 99 - or whatever is
> set by chrt - should be sufficient.

It is not; that was the reason why we implemented it. You get arbitrary
latencies caused by timer storms.

I have to check whether the priority is propagated when the softirq is
blocked on a lock. If not, it's a bug and has to be fixed.

	tglx



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
  2006-06-20 16:39       ` Steven Rostedt
@ 2006-06-20 18:12         ` Esben Nielsen
  2006-06-20 17:21           ` Thomas Gleixner
  2006-06-21  8:13           ` Steven Rostedt
  0 siblings, 2 replies; 51+ messages in thread
From: Esben Nielsen @ 2006-06-20 18:12 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Esben Nielsen, Thomas Gleixner, Ingo Molnar, linux-kernel



On Tue, 20 Jun 2006, Steven Rostedt wrote:

>
> On Tue, 20 Jun 2006, Esben Nielsen wrote:
>
>> I am sorry. I should have read some more of the code before asking.
>>
>> The only question I have is why the priority of the callback is set to
>> priority of the task calling hrtimer_start() (current->normal_prio). That
>> seems like an odd binding to me. Shouldn't the finding of the priority be moved over to the
>> posix-timer code, where it is needed, and be given as a parameter to
>> hrtimer_start()?
>> In rtmutex.c, where a hrtimer is used as a timeout on a mutex, wouldn't it
>> make more sense to use current->prio than current->normal_prio if the task
>> is boosted when it starts to wait on a mutex.
>
> That seems reasonable.  It probably is a bug to use normal_prio, since we
> really do care what prio is at that time.
>
>>
>>
>> But I am not sure I like the design at all:
>>
>> Let's say you have a bunch of callbacks running at priority 1 and then the
>> next hrt timer with priority 99 expires. Then the callback which
>> is running will be boosted to priority 99. So the overall latency at
>> priority 99 will be at least the latency of the worst hrtimer callback.
>
> You mean for those that expire at the same time?
>

No, when the priority 1 timer (userspace prio) expires just before the
priority 99 one.

> I don't think this is a problem, because the run_hrtimer_hres_queue runs
> the hightest priorty callback first, then it adjusts its prio to the next
> priority callback.  See hrtimer_adjust_softirq_prio.
>
>> And worse: What if the running callback is blocked on a mutex? Will the
>> owner of the mutex be boosted as well? Not according to the code in
>> sched.c. Therefore you get priority inversion down to priority 1. That is,
>> the worst-case hrtimer latency is that of priority 1.
>
> I don't see this.

Look at this situation:
softirq-hrt, running some callback, is at priority 1 (US prio, as always),
blocked on a mutex owned by some task, A. A now has priority 1 (in
the worst case). The HRT interrupt comes and calls setscheduler(... prio 99).
That doesn't change the priority of task A as far as I can see from the code.
So in effect the priority 99 callback will wait for task A, which is still
priority 1. That is a priority inversion.
>
>>
>> Therefore, a simpler and more robust design would be to give the thread
>> priority 99 as a default - just like the posix_cpu_timer thread. Then the
>> system designer can move it around with chrt when needed.
>> In fact you can say the current design has both of the worst cases of having
>> it running at priority 99 and at priority 1!
>
> I still don't see this happening.

The two worst cases are:
1) The system-wide priority 99 worst-case latency is at least that of the
longest callback.
2) The worst-case latency of softirq-hrt is that of priority 1.

If you could set it by chrt you could at least choose which evil thing you
want.

>
>>
>> Another complicated design would be to make a task for each priority.
>> Then the interrupt wakes the highest priority one, which handles the first
>> callback and awakes the next one etc.
>
> Don't think that is necessary.

Me neither :-) Running softirq-hrt at priority 99 - or whatever is
set by chrt - should be sufficient.

>
> -- Steve
>

Esben

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
  2006-06-20 21:16         ` Esben Nielsen
@ 2006-06-20 20:35           ` Thomas Gleixner
  2006-06-20 23:19             ` Esben Nielsen
  0 siblings, 1 reply; 51+ messages in thread
From: Thomas Gleixner @ 2006-06-20 20:35 UTC (permalink / raw)
  To: Esben Nielsen; +Cc: Ingo Molnar, linux-kernel

On Tue, 2006-06-20 at 22:16 +0100, Esben Nielsen wrote:
> > We had this before and it is horrible.
> 
> "Horrible" in what respect?

Unbounded latencies.

> > A nice solution would be to enqueue the timer into the task struct of
> > the thread which is target of the signal, wake that thread in a special
> > state which only runs the callback and then does the signal delivery.
> 
> What if the thread is running?

Well, do it in the return from interrupt path.

> Hmm, a practical thing to do would be to make a system where you can post 
> jobs in a thread. These jobs can then be done in thread context 
> around schedule or just before the task returns to user-space.

That's basically what I said. But you have to take care of asynchronous
signal delivery. Therefore you need a special state. Also, return to
userspace is not enough. You have to do the check in the return from
interrupt path, as you might delay async signals otherwise.

	tglx



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
  2006-06-20 21:26             ` Esben Nielsen
@ 2006-06-20 20:51               ` Thomas Gleixner
  2006-06-21  8:20               ` Steven Rostedt
  1 sibling, 0 replies; 51+ messages in thread
From: Thomas Gleixner @ 2006-06-20 20:51 UTC (permalink / raw)
  To: Esben Nielsen; +Cc: Steven Rostedt, Ingo Molnar, linux-kernel

On Tue, 2006-06-20 at 22:26 +0100, Esben Nielsen wrote:
> > I have to check, whether the priority is propagated when the softirq is
> > blocked on a lock. If not its a bug and has to be fixed.
> 
> I think the simplest solution would be to add
> 
>          if (p->blocked_on)
>                  wake_up_process(p);
> 
> in __setscheduler().

I'm sure we had something to make this work. No idea where it got lost.
Will check tomorrow.

	tglx



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
  2006-06-20 16:35       ` Thomas Gleixner
@ 2006-06-20 21:16         ` Esben Nielsen
  2006-06-20 20:35           ` Thomas Gleixner
  0 siblings, 1 reply; 51+ messages in thread
From: Esben Nielsen @ 2006-06-20 21:16 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Esben Nielsen, Ingo Molnar, linux-kernel



On Tue, 20 Jun 2006, Thomas Gleixner wrote:

> On Tue, 2006-06-20 at 18:09 +0100, Esben Nielsen wrote:
>> The only question I have is why the priority of the callback is set to
>> priority of the task calling hrtimer_start() (current->normal_prio). That
>> seems like an odd binding to me. Shouldn't the finding of the priority be moved over to the
>> posix-timer code, where it is needed, and be given as a parameter to
>> hrtimer_start()?
>
> It was the simplest way to do it.
>

Yes, when you only use the hrtimers to implement signals. But what if you 
use them for something different?

>> In rtmutex.c, where a hrtimer is used as a timeout on a mutex, wouldn't it
>> make more sense to use current->prio than current->normal_prio if the task
>> is boosted when it starts to wait on a mutex.
>
> Not sure about that.
>
>> Let's say you have a bunch of callbacks running at priority 1 and then the
>> next hrt timer with priority 99 expires. Then the callback which
>> is running will be boosted to priority 99. So the overall latency at
>> priority 99 will be at least the latency of the worst hrtimer callback.
>> And worse: What if the running callback is blocked on a mutex? Will the
>> owner of the mutex be boosted as well? Not according to the code in
>> sched.c. Therefore you get priority inversion down to priority 1. That is,
>> the worst-case hrtimer latency is that of priority 1.
>>
>> Therefore, a simpler and more robust design would be to give the thread
>> priority 99 as a default - just like the posix_cpu_timer thread. Then the
>> system designer can move it around with chrt when needed.
>> In fact you can say the current design has both of the worst cases of having
>> it running at priority 99 and at priority 1!
>
> We had this before and it is horrible.

"Horrible" in what respect?

>
>> Another complicated design would be to make a task for each priority.
>> Then the interrupt wakes the highest priority one, which handles the first
>> callback and awakes the next one etc.
>
> Uurgh. That's not a serious proposal?

No, not really. But to generalize the idea a bit:
the kernel could have these threads as general worker threads on which
all kinds of callbacks could be executed at their right priority.
But, of course, having 100 tasks sitting around per CPU is a huge waste of
memory. Maybe some kind of pooling could be done?

As I said in another mail some time back: having 100 different RT
priorities doesn't make sense. You need at most 10 anyway.

>
> A nice solution would be to enqueue the timer into the task struct of
> the thread which is target of the signal, wake that thread in a special
> state which only runs the callback and then does the signal delivery.

What if the thread is running?

> The scary thing on this is to get the locking straight, but it might
> well worth to try it. That way we would burden the softirq delivery to
> the thread and maybe save a couple of task switches.
>
> 	tglx
>
>

Hmm, a practical thing to do would be to make a system where you can post 
jobs in a thread. These jobs can then be done in thread context 
around schedule or just before the task returns to user-space.

Esben

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
  2006-06-20 17:21           ` Thomas Gleixner
@ 2006-06-20 21:26             ` Esben Nielsen
  2006-06-20 20:51               ` Thomas Gleixner
  2006-06-21  8:20               ` Steven Rostedt
  0 siblings, 2 replies; 51+ messages in thread
From: Esben Nielsen @ 2006-06-20 21:26 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Esben Nielsen, Steven Rostedt, Ingo Molnar, linux-kernel

On Tue, 20 Jun 2006, Thomas Gleixner wrote:

> On Tue, 2006-06-20 at 19:12 +0100, Esben Nielsen wrote:
>>>
>>>>
>>>> Another complicated design would be to make a task for each priority.
>>>> Then the interrupt wakes the highest priority one, which handles the first
>>>> callback and awakes the next one etc.
>>>
>>> Don't think that is necessary.
>>
>> Me neither :-) Running softirq-hrt at priority 99 - or whatever is
>> set by chrt - should be sufficient.
>
> It is not; that was the reason why we implemented it. You get arbitrary
> latencies caused by timer storms.
>

What you are saying is that there are a lot of timers timing out at the
same time. When that happens you will get them all executed at priority
99 with the simple setup. In the current design you get them executed in
order of priority, and the task lowers its priority as it goes along.
If you have a long list of low priority callbacks pending to be executed,
the running one will finish at priority 99 and then the high priority one
will be put in first on the list.

Ok, I see your point: although you can't preempt the individual callbacks,
you can preempt the loop, which helps latencies, as many timers can
time out before they are executed.

> I have to check, whether the priority is propagated when the softirq is
> blocked on a lock. If not its a bug and has to be fixed.

I think the simplest solution would be to add

         if (p->blocked_on)
                 wake_up_process(p);

in __setscheduler().

>
> 	tglx
>
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
  2006-06-20 20:35           ` Thomas Gleixner
@ 2006-06-20 23:19             ` Esben Nielsen
  0 siblings, 0 replies; 51+ messages in thread
From: Esben Nielsen @ 2006-06-20 23:19 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Esben Nielsen, Ingo Molnar, linux-kernel

On Tue, 20 Jun 2006, Thomas Gleixner wrote:

> On Tue, 2006-06-20 at 22:16 +0100, Esben Nielsen wrote:
>>> We had this before and it is horrible.
>>
>> "Horrible" in what respect?
>
> Unbounded latencies.
>
>>> A nice solution would be to enqueue the timer into the task struct of
>>> the thread which is target of the signal, wake that thread in a special
>>> state which only runs the callback and then does the signal delivery.
>>
>> What if the thread is running?
>
> Well, do it in the return from interrupt path.

Wouldn't it just be the same as interrupt context then?

>
>> Hmm, a practical thing to do would be to make a system where you can post
>> jobs in a thread. These jobs can then be done in thread context
>> around schedule or just before the task returns to user-space.
>
> That's basically what I said. But you have to take care of asynchronous
> signal delivery. Therefore you need a special state. Also, return to
> userspace is not enough. You have to do the check in the return from
> interrupt path, as you might delay async signals otherwise.
>

It doesn't sound very preemptible then.

I was more thinking along the lines of having a "message" mechanism where
threads (and interrupts) can post functions to each other, to be executed in
the target's context.

The idea is:
1) You can get fairly standard code executed with a priority
associated with the target thread.
2) Make sure that only one thread touches some data, and that way avoid
having locks and therefore deadlocks.
3) Simplifying callbacks. Instead of installing a callback which can be
executed in some task or interrupt, you can with this system make a
callback which will be executed in the client's context, thus simplifying
the client's headaches with locks.

By "fairly standeard" code I mean code which doesn't block 
except that it should be allowed to use mutexes, which in many ways is not 
real blocking when PI is implemented.

For this to be safe you can't just preempt and execute these functions
anywhere in a thread. It must be at a well-defined place where no mutexes are
held. Many (all?) kernel threads have a while() loop with a central schedule()
in the middle. One can most often call these functions safely at this
schedule(). For userspace threads it would also be safe to call the functions
when going in and out of userspace. It would probably be safe to call the
functions at any call to schedule() except for the one inside the mutex
lock operation. The voluntary-preemption points Ingo made some years ago
would probably be OK places, too.
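
A minimal sketch of such a posting mechanism, with hypothetical names (the
list field and lock below do not exist in task_struct), just to make the
idea concrete:

struct posted_fn {
	struct list_head	list;
	void			(*func)(void *data);
	void			*data;
};

/* Poster side: another thread or an interrupt hands work to "target". */
static void post_fn(struct task_struct *target, struct posted_fn *p)
{
	unsigned long flags;

	spin_lock_irqsave(&target->posted_lock, flags);	/* hypothetical field */
	list_add_tail(&p->list, &target->posted_fns);		/* hypothetical field */
	spin_unlock_irqrestore(&target->posted_lock, flags);
	wake_up_process(target);
}

/*
 * Target side: called at a safe point (e.g. around the central
 * schedule() of a kernel thread's main loop), with no mutexes held.
 * The posted functions run at this thread's priority.
 */
static void run_posted_fns(void)
{
	struct posted_fn *p, *tmp;
	unsigned long flags;
	LIST_HEAD(todo);

	spin_lock_irqsave(&current->posted_lock, flags);
	list_splice_init(&current->posted_fns, &todo);
	spin_unlock_irqrestore(&current->posted_lock, flags);

	list_for_each_entry_safe(p, tmp, &todo, list) {
		list_del(&p->list);
		p->func(p->data);
	}
}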

Esben

> 	tglx
>
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
  2006-06-20 18:12         ` Esben Nielsen
  2006-06-20 17:21           ` Thomas Gleixner
@ 2006-06-21  8:13           ` Steven Rostedt
  2006-06-21 11:03             ` Esben Nielsen
  1 sibling, 1 reply; 51+ messages in thread
From: Steven Rostedt @ 2006-06-21  8:13 UTC (permalink / raw)
  To: Esben Nielsen; +Cc: Thomas Gleixner, Ingo Molnar, linux-kernel


On Tue, 20 Jun 2006, Esben Nielsen wrote:
> >>
> >> Let's say you have a bunch of callbacks running at priority 1 and then the
> >> next hrt timer with priority 99 expires. Then the callback which
> >> is running will be boosted to priority 99. So the overall latency at
> >> priority 99 will be at least the latency of the worst hrtimer callback.
> >
> > You mean for those that expire at the same time?
> >
>
> No, when the priority 1 timer (userspace prio) expires just before the
> priority 99 one.

So you are worried about a priority 1 timer running when the 99 priority
timer expires?  In this case we should up the timer softirq to 99 and make
the currently running timer just like a critical section.  So the latency
would be just the time of the longest running timer callback.

>
> > I don't think this is a problem, because the run_hrtimer_hres_queue runs
> > the hightest priorty callback first, then it adjusts its prio to the next
> > priority callback.  See hrtimer_adjust_softirq_prio.
> >
> >> And worse: What if the running callback is blocked on a mutex? Will the
> >> owner of the mutex be boosted as well? Not according to the code in
> >> sched.c. Therefore you get priority inversion down to priority 1. That is,
> >> the worst-case hrtimer latency is that of priority 1.
> >
> > I don't see this.
>
> Look at this situation:
> softirq-hrt, running some callback, is at priority 1 (US prio, as always),
> blocked on a mutex owned by some task, A. A now has priority 1 (in
> the worst case). The HRT interrupt comes and calls setscheduler(... prio 99).
> That doesn't change the priority of task A as far as I can see from the code.
> So in effect the priority 99 callback will wait for task A, which is still
> priority 1. That is a priority inversion.

OK, that sounds like a bug.

>
> >
> >>
> >> Therefore, a simpler and more robust design would be to give the thread
> >> priority 99 as a default - just like the posix_cpu_timer thread. Then the
> >> system designer can move it around with chrt when needed.
> >> In fact you can say the current design has both of the worst cases of having
> >> it running at priority 99 and at priority 1!
> >
> > I still don't see this happening.
>
> The two worst cases are:
> 1) The system-wide priority 99 worst-case latency is at least that of the
> longest callback.

If the above bug is fixed, that should read "at most" that of the longest
callback.

> 2) The worst case latency of softirq-hrt is that of priority 1.

We really do need the softirq-hrt dynamic priority. For what I deal with it
helps a lot.  I want the sleeping high priority task to wake up when its
timer goes off, and I don't want it preempted by timers of lower priority
tasks when it's not sleeping.

>
> If you could set it by chrt you could at least choose which evil thing you
> want.

Doesn't work for my situation, unless I tell the application programmers to
up the timer softirq-hrt just before the highest prio thread goes to
sleep. But they are Java programmers and I don't expect them to do this ;)

>
> >
> >>
> >> Another complicated design would be to make a task for each priority.
> >> Then the interrupt wakes the highest priority one, which handles the first
> >> callback and awakes the next one etc.
> >
> > Don't think that is necessary.
>
> Me neither :-) Running softirq-hrt at priority 99 - or whatever is
> set by chrt - should be sufficient.

No, I think fixing the bug you found is better :)

-- Steve

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
  2006-06-20 21:26             ` Esben Nielsen
  2006-06-20 20:51               ` Thomas Gleixner
@ 2006-06-21  8:20               ` Steven Rostedt
  2006-06-21 11:05                 ` Esben Nielsen
  1 sibling, 1 reply; 51+ messages in thread
From: Steven Rostedt @ 2006-06-21  8:20 UTC (permalink / raw)
  To: Esben Nielsen; +Cc: Thomas Gleixner, Ingo Molnar, linux-kernel


On Tue, 20 Jun 2006, Esben Nielsen wrote:

>
> > I have to check, whether the priority is propagated when the softirq is
> > blocked on a lock. If not its a bug and has to be fixed.
>
> I think the simplest solution would be to add
>
>          if (p->blocked_on)
>                  wake_up_process(p);
>
> in __setscheduler().
>

Except that wake_up_process calls try_to_wake_up, which grabs the runqueue
lock, which unfortunately is already held when __setscheduler is called.

-- Steve


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
  2006-06-21  8:13           ` Steven Rostedt
@ 2006-06-21 11:03             ` Esben Nielsen
  0 siblings, 0 replies; 51+ messages in thread
From: Esben Nielsen @ 2006-06-21 11:03 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Esben Nielsen, Thomas Gleixner, Ingo Molnar, linux-kernel



On Wed, 21 Jun 2006, Steven Rostedt wrote:

> [...]
> No, I think fixing the bug you found is better :)
>

Thomas convinced me. I don't like the design, though. Moving the
task priorities around isn't very healthy. It is confusing and it
complicates stuff a lot. But as Thomas says, with my solution one could
make a DoS attack from userspace by arranging for a lot of timers to time out
simultaneously (or sufficiently close to each other).
However, this is still possible, just less easy to do: in the interrupt,
every expiring timer is touched, i.e. by having a lot of timers you push
a lot of work into the interrupt routine. In fact the interrupt routine
also has to take the timer out of the binary tree, so the worst case interrupt
latency is O(N log N), N being the total number of timers in the system!!

But what about the config option HIGH_RES_RESOLUTION? It is barely
used in the system now. But if timers closer together than the resolution were
batched in a plist, then all the interrupt would have to do is merge
two plists, namely the list of new timeouts and the list of work still
pending to be executed by the softirq thread. The worst case would then be
max(O(log N), O(<number of priorities == 140>)), both of which are fairly
limited, as N can never be bigger than 1G on a 32-bit system, so that log N <= 30.
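
A rough sketch of that merge step, using the existing plist primitives but
otherwise hypothetical types and names:

/*
 * The interrupt only moves the newly expired batch onto the softirq's
 * pending plist (which keeps priority ordering); the callbacks
 * themselves run later in the softirq thread.
 */
static void merge_expired_into_pending(struct plist_head *expired,
				       struct plist_head *pending)
{
	struct hrtimer_batch *b, *tmp;		/* hypothetical type */

	plist_for_each_entry_safe(b, tmp, expired, node) {
		plist_del(&b->node, expired);
		plist_add(&b->node, pending);
	}
}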

Esben

> -- Steve
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
  2006-06-21  8:20               ` Steven Rostedt
@ 2006-06-21 11:05                 ` Esben Nielsen
  2006-06-21 15:43                   ` Esben Nielsen
  0 siblings, 1 reply; 51+ messages in thread
From: Esben Nielsen @ 2006-06-21 11:05 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Esben Nielsen, Thomas Gleixner, Ingo Molnar, linux-kernel



On Wed, 21 Jun 2006, Steven Rostedt wrote:

>
> On Tue, 20 Jun 2006, Esben Nielsen wrote:
>
>>
>>> I have to check, whether the priority is propagated when the softirq is
>>> blocked on a lock. If not its a bug and has to be fixed.
>>
>> I think the simplest solution would be to add
>>
>>          if (p->blocked_on)
>>                  wake_up_process(p);
>>
>> in __setscheduler().
>>
>
> Except that wake_up_process calls try_to_wakeup which grabs the runqueue
> lock, which unfortunately is already held when __setscheduler is called.
>

Yeah, I saw. Move it out in setscheduler() then. I'll try to fix it, but I 
am not sure I can make a test if it works.

Esben

> -- Steve
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
  2006-06-21 15:43                   ` Esben Nielsen
@ 2006-06-21 15:21                     ` Steven Rostedt
  2006-06-21 16:37                       ` Esben Nielsen
  2006-06-21 16:26                     ` Thomas Gleixner
  1 sibling, 1 reply; 51+ messages in thread
From: Steven Rostedt @ 2006-06-21 15:21 UTC (permalink / raw)
  To: Esben Nielsen; +Cc: Esben Nielsen, Thomas Gleixner, Ingo Molnar, linux-kernel


On Wed, 21 Jun 2006, Esben Nielsen wrote:

> > Yeah, I saw. Move it out in setscheduler() then. I'll try to fix it, but I am
> > not sure I can make a test if it works.
>
> What about the patch below? It compiles and my UP laptop runs fine, but I
> haven't otherwise tested it.  My laptop runs RTExec without hiccups
> until now.

Hm, I take it that English is not your native language (maybe it is??)
You do write very well regardless, but I notice with my German colleagues
that they seem to always translate "bis jetzt" to "until now" when I would
use something like "as of yet".  The "until now" (at least to me) gives
the impression that it hasn't happened up till now, but it has happened
now.  So one could say after running their car into a tree "Oh well,
I haven't had an accident until now". So I'm assuming that you still don't
have hiccups.
  OK, enough about that ;)

The below patch has whitespace damage.  Could you resend?

Thanks,

-- Steve

>
> Esben
>
> Index: linux-2.6.17-rt1/kernel/rtmutex.c
> ===================================================================
> --- linux-2.6.17-rt1.orig/kernel/rtmutex.c
> +++ linux-2.6.17-rt1/kernel/rtmutex.c
> @@ -625,6 +625,7 @@ rt_lock_slowlock(struct rt_mutex *lock _
>
>   	debug_rt_mutex_init_waiter(&waiter);
>   	waiter.task = NULL;
> +	waiter.save_state = 1;
>
>   	spin_lock(&lock->wait_lock);
>
> @@ -687,6 +688,19 @@ rt_lock_slowlock(struct rt_mutex *lock _
>   		state = xchg(&current->state, TASK_UNINTERRUPTIBLE);
>   		if (unlikely(state == TASK_RUNNING))
>   			saved_state = TASK_RUNNING;
> +
> +		if (unlikely(waiter.task) &&
> +		    waiter.list_entry.prio != current->prio) {
> +			/*
> +			 * We still not have the lock, but we are woken up with
> +			 * a different prio than the one we waited with
> +			 * originally. We remove the wait entry now and then
> +			 * reinsert ourselves with the right priority
> +			 */
> +			remove_waiter(lock, &waiter __IP__);
> +			waiter.task = NULL;
> +		}
> +
>   	}
>
>   	state = xchg(&current->state, saved_state);
> @@ -798,6 +812,7 @@ rt_mutex_slowlock(struct rt_mutex *lock,
>
>   	debug_rt_mutex_init_waiter(&waiter);
>   	waiter.task = NULL;
> +	waiter.save_state = 0;
>
>   	spin_lock(&lock->wait_lock);
>
> @@ -877,6 +892,18 @@ rt_mutex_slowlock(struct rt_mutex *lock,
>
>   		current->flags |= saved_flags;
>   		set_current_state(state);
> +
> +		if (unlikely(waiter.task) &&
> +		    waiter.list_entry.prio != current->prio) {
> +			/*
> +			 * We still not have the lock, but we are woken up with
> +			 * a different prio than the one we waited with
> +			 * originally. We remove the wait entry now and then
> +			 * reinsert ourselves with the right priority
> +			 */
> +			remove_waiter(lock, &waiter __IP__);
> +			waiter.task = NULL;
> +		}
>   	}
>
>   	set_current_state(TASK_RUNNING);
> Index: linux-2.6.17-rt1/kernel/sched.c
> ===================================================================
> --- linux-2.6.17-rt1.orig/kernel/sched.c
> +++ linux-2.6.17-rt1/kernel/sched.c
> @@ -57,6 +57,8 @@
>
>   #include <asm/unistd.h>
>
> +#include "rtmutex_common.h"
> +
>   /*
>    * Convert user-nice values [ -20 ... 0 ... 19 ]
>    * to static priority [ MAX_RT_PRIO..MAX_PRIO-1 ],
> @@ -646,7 +648,9 @@ static inline void sched_info_switch(tas
>   #define sched_info_switch(t, next)	do { } while (0)
>   #endif /* CONFIG_SCHEDSTATS */
>
> +#ifdef CONFIG_SMP
>   static __cacheline_aligned_in_smp atomic_t rt_overload;
> +#endif
>
>   static inline void inc_rt_tasks(task_t *p, runqueue_t *rq)
>   {
> @@ -1473,10 +1477,11 @@ static inline int wake_idle(int cpu, tas
>   static int try_to_wake_up(task_t *p, unsigned int state, int sync, int mutex)
>   {
>   	int cpu, this_cpu, success = 0;
> -	runqueue_t *this_rq, *rq;
> +	runqueue_t *rq;
>   	unsigned long flags;
>   	long old_state;
>   #ifdef CONFIG_SMP
> +	runqueue_t *this_rq;
>   	unsigned long load, this_load;
>   	struct sched_domain *sd, *this_sd = NULL;
>   	int new_cpu;
> @@ -4351,6 +4356,18 @@ int setscheduler(struct task_struct *p,
>   			resched_task(rq->curr);
>   	}
>   	task_rq_unlock(rq, &flags);
> +
> +	/*
> +	 * If the process is blocked on rt-mutex, it will now wake up and
> +	 * reinsert itself into the wait list and boost the owner correctly
> +	 */
> +	if (p->pi_blocked_on) {
> +		if (p->pi_blocked_on->save_state)
> +			wake_up_process_mutex(p);
> +		else
> +			wake_up_process(p);
> +	}
> +
>   	spin_unlock_irqrestore(&p->pi_lock, fp);
>   	return 0;
>   }
> @@ -7086,4 +7103,3 @@ void notrace preempt_enable_no_resched(v
>
>   EXPORT_SYMBOL(preempt_enable_no_resched);
>   #endif
> -
> Index: linux-2.6.17-rt1/kernel/rtmutex_common.h
> ===================================================================
> --- linux-2.6.17-rt1.orig/kernel/rtmutex_common.h
> +++ linux-2.6.17-rt1/kernel/rtmutex_common.h
> @@ -49,6 +49,7 @@ struct rt_mutex_waiter {
>   	struct plist_node	pi_list_entry;
>   	struct task_struct	*task;
>   	struct rt_mutex		*lock;
> +	int                     save_state;
>   #ifdef CONFIG_DEBUG_RT_MUTEXES
>   	unsigned long		ip;
>   	pid_t			deadlock_task_pid;
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
  2006-06-21 11:05                 ` Esben Nielsen
@ 2006-06-21 15:43                   ` Esben Nielsen
  2006-06-21 15:21                     ` Steven Rostedt
  2006-06-21 16:26                     ` Thomas Gleixner
  0 siblings, 2 replies; 51+ messages in thread
From: Esben Nielsen @ 2006-06-21 15:43 UTC (permalink / raw)
  To: Esben Nielsen
  Cc: Steven Rostedt, Esben Nielsen, Thomas Gleixner, Ingo Molnar,
	linux-kernel

On Wed, 21 Jun 2006, Esben Nielsen wrote:

>
>
> On Wed, 21 Jun 2006, Steven Rostedt wrote:
>
>>
>>  On Tue, 20 Jun 2006, Esben Nielsen wrote:
>> 
>> > 
>> > >  I have to check, whether the priority is propagated when the softirq 
>> > >  is
>> > >  blocked on a lock. If not its a bug and has to be fixed.
>> > 
>> >  I think the simplest solution would be to add
>> > 
>> >           if (p->blocked_on)
>> >                   wake_up_process(p);
>> > 
>> >  in __setscheduler().
>> > 
>>
>>  Except that wake_up_process calls try_to_wakeup which grabs the runqueue
>>  lock, which unfortunately is already held when __setscheduler is called.
>> 
>
> Yeah, I saw. Move it out in setscheduler() then. I'll try to fix it, but I am 
> not sure I can make a test if it works.

What about the patch below? It compiles and my UP laptop runs fine, but I
haven't otherwise tested it.  My laptop runs RTExec without hiccups
until now.

Esben

Index: linux-2.6.17-rt1/kernel/rtmutex.c
===================================================================
--- linux-2.6.17-rt1.orig/kernel/rtmutex.c
+++ linux-2.6.17-rt1/kernel/rtmutex.c
@@ -625,6 +625,7 @@ rt_lock_slowlock(struct rt_mutex *lock _

  	debug_rt_mutex_init_waiter(&waiter);
  	waiter.task = NULL;
+	waiter.save_state = 1;

  	spin_lock(&lock->wait_lock);

@@ -687,6 +688,19 @@ rt_lock_slowlock(struct rt_mutex *lock _
  		state = xchg(&current->state, TASK_UNINTERRUPTIBLE);
  		if (unlikely(state == TASK_RUNNING))
  			saved_state = TASK_RUNNING;
+
+		if (unlikely(waiter.task) &&
+		    waiter.list_entry.prio != current->prio) {
+			/*
+			 * We still not have the lock, but we are woken up with
+			 * a different prio than the one we waited with
+			 * originally. We remove the wait entry now and then
+			 * reinsert ourselves with the right priority
+			 */
+			remove_waiter(lock, &waiter __IP__);
+			waiter.task = NULL;
+		}
+
  	}

  	state = xchg(&current->state, saved_state);
@@ -798,6 +812,7 @@ rt_mutex_slowlock(struct rt_mutex *lock,

  	debug_rt_mutex_init_waiter(&waiter);
  	waiter.task = NULL;
+	waiter.save_state = 0;

  	spin_lock(&lock->wait_lock);

@@ -877,6 +892,18 @@ rt_mutex_slowlock(struct rt_mutex *lock,

  		current->flags |= saved_flags;
  		set_current_state(state);
+
+		if (unlikely(waiter.task) &&
+		    waiter.list_entry.prio != current->prio) {
+			/*
+			 * We still not have the lock, but we are woken up with
+			 * a different prio than the one we waited with
+			 * originally. We remove the wait entry now and then
+			 * reinsert ourselves with the right priority
+			 */
+			remove_waiter(lock, &waiter __IP__);
+			waiter.task = NULL;
+		}
  	}

  	set_current_state(TASK_RUNNING);
Index: linux-2.6.17-rt1/kernel/sched.c
===================================================================
--- linux-2.6.17-rt1.orig/kernel/sched.c
+++ linux-2.6.17-rt1/kernel/sched.c
@@ -57,6 +57,8 @@

  #include <asm/unistd.h>

+#include "rtmutex_common.h"
+
  /*
   * Convert user-nice values [ -20 ... 0 ... 19 ]
   * to static priority [ MAX_RT_PRIO..MAX_PRIO-1 ],
@@ -646,7 +648,9 @@ static inline void sched_info_switch(tas
  #define sched_info_switch(t, next)	do { } while (0)
  #endif /* CONFIG_SCHEDSTATS */

+#ifdef CONFIG_SMP
  static __cacheline_aligned_in_smp atomic_t rt_overload;
+#endif

  static inline void inc_rt_tasks(task_t *p, runqueue_t *rq)
  {
@@ -1473,10 +1477,11 @@ static inline int wake_idle(int cpu, tas
  static int try_to_wake_up(task_t *p, unsigned int state, int sync, int mutex)
  {
  	int cpu, this_cpu, success = 0;
-	runqueue_t *this_rq, *rq;
+	runqueue_t *rq;
  	unsigned long flags;
  	long old_state;
  #ifdef CONFIG_SMP
+	runqueue_t *this_rq;
  	unsigned long load, this_load;
  	struct sched_domain *sd, *this_sd = NULL;
  	int new_cpu;
@@ -4351,6 +4356,18 @@ int setscheduler(struct task_struct *p,
  			resched_task(rq->curr);
  	}
  	task_rq_unlock(rq, &flags);
+
+	/*
+	 * If the process is blocked on rt-mutex, it will now wake up and
+	 * reinsert itself into the wait list and boost the owner correctly
+	 */
+	if (p->pi_blocked_on) {
+		if (p->pi_blocked_on->save_state)
+			wake_up_process_mutex(p);
+		else
+			wake_up_process(p);
+	}
+
  	spin_unlock_irqrestore(&p->pi_lock, fp);
  	return 0;
  }
@@ -7086,4 +7103,3 @@ void notrace preempt_enable_no_resched(v

  EXPORT_SYMBOL(preempt_enable_no_resched);
  #endif
-
Index: linux-2.6.17-rt1/kernel/rtmutex_common.h
===================================================================
--- linux-2.6.17-rt1.orig/kernel/rtmutex_common.h
+++ linux-2.6.17-rt1/kernel/rtmutex_common.h
@@ -49,6 +49,7 @@ struct rt_mutex_waiter {
  	struct plist_node	pi_list_entry;
  	struct task_struct	*task;
  	struct rt_mutex		*lock;
+	int                     save_state;
  #ifdef CONFIG_DEBUG_RT_MUTEXES
  	unsigned long		ip;
  	pid_t			deadlock_task_pid;

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
  2006-06-21 16:37                       ` Esben Nielsen
@ 2006-06-21 15:51                         ` Steven Rostedt
  2006-06-21 17:14                           ` Esben Nielsen
  0 siblings, 1 reply; 51+ messages in thread
From: Steven Rostedt @ 2006-06-21 15:51 UTC (permalink / raw)
  To: Esben Nielsen; +Cc: Esben Nielsen, Thomas Gleixner, Ingo Molnar, linux-kernel


On Wed, 21 Jun 2006, Esben Nielsen wrote:

> > The below patch has whitespace damage.  Could you resend?
> >
>
> Does it help if I send it as an attachment?
>

Don't know.  See below:

>
>
> >>
> >> Index: linux-2.6.17-rt1/kernel/rtmutex.c
> >> ===================================================================
> >> --- linux-2.6.17-rt1.orig/kernel/rtmutex.c
> >> +++ linux-2.6.17-rt1/kernel/rtmutex.c
> >> @@ -625,6 +625,7 @@ rt_lock_slowlock(struct rt_mutex *lock _
> >>

The below has two spaces before the tab.  Is that from your mailer?

> >>   	debug_rt_mutex_init_waiter(&waiter);
> >>   	waiter.task = NULL;
> >> +	waiter.save_state = 1;
> >>
> >>   	spin_lock(&lock->wait_lock);
> >>

The rest of the patch is the same.  The two spaces kill patch.

-- Steve


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
  2006-06-21 15:43                   ` Esben Nielsen
  2006-06-21 15:21                     ` Steven Rostedt
@ 2006-06-21 16:26                     ` Thomas Gleixner
  2006-06-21 18:30                       ` Ingo Molnar
                                         ` (2 more replies)
  1 sibling, 3 replies; 51+ messages in thread
From: Thomas Gleixner @ 2006-06-21 16:26 UTC (permalink / raw)
  To: Esben Nielsen; +Cc: Esben Nielsen, Steven Rostedt, Ingo Molnar, linux-kernel

On Wed, 2006-06-21 at 16:43 +0100, Esben Nielsen wrote:
> What about the patch below? It compiles and my UP laptop runs fine, but I
> haven't otherwise tested it.  My laptop runs RTExec without hiccups
> until now.

NAK, it puts the burden into the lock path and also does a remove / add
which results in walking the chain twice.

Find a version against the code in -mm below. Not much tested yet.

	tglx

Index: linux-2.6.17-rc6-mm/kernel/rtmutex.c
===================================================================
--- linux-2.6.17-rc6-mm.orig/kernel/rtmutex.c	2006-06-21 13:34:34.000000000 +0200
+++ linux-2.6.17-rc6-mm/kernel/rtmutex.c	2006-06-21 15:45:21.000000000 +0200
@@ -160,7 +160,8 @@
 static int rt_mutex_adjust_prio_chain(task_t *task,
 				      int deadlock_detect,
 				      struct rt_mutex *orig_lock,
-				      struct rt_mutex_waiter *orig_waiter)
+				      struct rt_mutex_waiter *orig_waiter,
+				      struct task_struct *top_task)
 {
 	struct rt_mutex *lock;
 	struct rt_mutex_waiter *waiter, *top_waiter = orig_waiter;
@@ -188,7 +189,7 @@
 			prev_max = max_lock_depth;
 			printk(KERN_WARNING "Maximum lock depth %d reached "
 			       "task: %s (%d)\n", max_lock_depth,
-			       current->comm, current->pid);
+			       top_task->comm, top_task->pid);
 		}
 		put_task_struct(task);
 
@@ -228,7 +229,7 @@
 	}
 
 	/* Deadlock detection */
-	if (lock == orig_lock || rt_mutex_owner(lock) == current) {
+	if (lock == orig_lock || rt_mutex_owner(lock) == top_task) {
 		debug_rt_mutex_deadlock(deadlock_detect, orig_waiter, lock);
 		spin_unlock(&lock->wait_lock);
 		ret = deadlock_detect ? -EDEADLK : 0;
@@ -448,7 +449,8 @@
 
 	spin_unlock(&lock->wait_lock);
 
-	res = rt_mutex_adjust_prio_chain(owner, detect_deadlock, lock, waiter);
+	res = rt_mutex_adjust_prio_chain(owner, detect_deadlock, lock, waiter,
+					 current);
 
 	spin_lock(&lock->wait_lock);
 
@@ -561,12 +563,35 @@
 
 	spin_unlock(&lock->wait_lock);
 
-	rt_mutex_adjust_prio_chain(owner, 0, lock, NULL);
+	rt_mutex_adjust_prio_chain(owner, 0, lock, NULL, current);
 
 	spin_lock(&lock->wait_lock);
 }
 
 /*
+ * Recheck the pi chain, in case we got a priority setting
+ *
+ * Called from sched_setscheduler
+ */
+void rt_mutex_adjust_pi(struct task_struct *task)
+{
+	struct rt_mutex_waiter *waiter = task->pi_blocked_on;
+	unsigned long flags;
+
+	spin_lock_irqsave(&task->pi_lock, flags);
+
+	if (!waiter || waiter->list_entry.prio == task->prio) {
+		spin_unlock_irqrestore(&task->pi_lock, flags);
+		return;
+	}
+
+	get_task_struct(task);
+	spin_unlock_irqrestore(&task->pi_lock, flags);
+
+	rt_mutex_adjust_prio_chain(task, 0, NULL, NULL, task);
+}
+
+/*
  * Slow path lock function:
  */
 static int __sched
@@ -633,6 +658,7 @@
 			if (unlikely(ret))
 				break;
 		}
+
 		spin_unlock(&lock->wait_lock);
 
 		debug_rt_mutex_print_deadlock(&waiter);
Index: linux-2.6.17-rc6-mm/kernel/sched.c
===================================================================
--- linux-2.6.17-rc6-mm.orig/kernel/sched.c	2006-06-21 13:34:34.000000000 +0200
+++ linux-2.6.17-rc6-mm/kernel/sched.c	2006-06-21 18:21:16.000000000 +0200
@@ -4028,8 +4028,7 @@
 int sched_setscheduler(struct task_struct *p, int policy,
 		       struct sched_param *param)
 {
-	int retval;
-	int oldprio, oldpolicy = -1;
+	int retval, oldprio, oldpolicy = -1;
 	prio_array_t *array;
 	unsigned long flags;
 	runqueue_t *rq;
@@ -4119,6 +4118,8 @@
 	__task_rq_unlock(rq);
 	spin_unlock_irqrestore(&p->pi_lock, flags);
 
+	rt_mutex_adjust_pi(p);
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(sched_setscheduler);
Index: linux-2.6.17-rc6-mm/include/linux/sched.h
===================================================================
--- linux-2.6.17-rc6-mm.orig/include/linux/sched.h	2006-06-21 13:37:21.000000000 +0200
+++ linux-2.6.17-rc6-mm/include/linux/sched.h	2006-06-21 15:43:23.000000000 +0200
@@ -1123,11 +1123,13 @@
 #ifdef CONFIG_RT_MUTEXES
 extern int rt_mutex_getprio(task_t *p);
 extern void rt_mutex_setprio(task_t *p, int prio);
+extern void rt_mutex_adjust_pi(task_t *p);
 #else
 static inline int rt_mutex_getprio(task_t *p)
 {
 	return p->normal_prio;
 }
+# define rt_mutex_adjust_pi(p)		do { } while (0)
 #endif
 
 extern void set_user_nice(task_t *p, long nice);



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
  2006-06-21 15:21                     ` Steven Rostedt
@ 2006-06-21 16:37                       ` Esben Nielsen
  2006-06-21 15:51                         ` Steven Rostedt
  0 siblings, 1 reply; 51+ messages in thread
From: Esben Nielsen @ 2006-06-21 16:37 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Esben Nielsen, Esben Nielsen, Thomas Gleixner, Ingo Molnar, linux-kernel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 5388 bytes --]



On Wed, 21 Jun 2006, Steven Rostedt wrote:

>
> On Wed, 21 Jun 2006, Esben Nielsen wrote:
>
>>> Yeah, I saw. Move it out in setscheduler() then. I'll try to fix it, but I am
>>> not sure I can make a test if it works.
>>
>> What about the patch below. It compiles and my UP labtop runs fine, but I
>> haven't otherwise tested it.  My labtop runs RTExec without hichups
>> until now.
>
> Hm, I take it that English is not your native language (maybe it its??)
> You do write very well regardless, but I notice with my German colleagues,
> that they seem to always translate "bis jetzt" to "until now" when I would
> use something like "as of yet".  The "until now" (at least to me) gives
> the impression that it hasn't happened up till now, but it has happended
> now.  So one could say after running their car into a tree "Oh well,
> I haven't had an accident until now". So I'm assuming that you still don't
> have hichups.
>  OK enough about that ;)
>
> The below patch has whitespace damage.  Could you resend?
>

Does it help if I send it as an attachment?

Esben


> Thanks,
>
> -- Steve
>
>>
>> Esben
>>
>> Index: linux-2.6.17-rt1/kernel/rtmutex.c
>> ===================================================================
>> --- linux-2.6.17-rt1.orig/kernel/rtmutex.c
>> +++ linux-2.6.17-rt1/kernel/rtmutex.c
>> @@ -625,6 +625,7 @@ rt_lock_slowlock(struct rt_mutex *lock _
>>
>>   	debug_rt_mutex_init_waiter(&waiter);
>>   	waiter.task = NULL;
>> +	waiter.save_state = 1;
>>
>>   	spin_lock(&lock->wait_lock);
>>
>> @@ -687,6 +688,19 @@ rt_lock_slowlock(struct rt_mutex *lock _
>>   		state = xchg(&current->state, TASK_UNINTERRUPTIBLE);
>>   		if (unlikely(state == TASK_RUNNING))
>>   			saved_state = TASK_RUNNING;
>> +
>> +		if (unlikely(waiter.task) &&
>> +		    waiter.list_entry.prio != current->prio) {
>> +			/*
>> +			 * We still not have the lock, but we are woken up with
>> +			 * a different prio than the one we waited with
>> +			 * originally. We remove the wait entry now and then
>> +			 * reinsert ourselves with the right priority
>> +			 */
>> +			remove_waiter(lock, &waiter __IP__);
>> +			waiter.task = NULL;
>> +		}
>> +
>>   	}
>>
>>   	state = xchg(&current->state, saved_state);
>> @@ -798,6 +812,7 @@ rt_mutex_slowlock(struct rt_mutex *lock,
>>
>>   	debug_rt_mutex_init_waiter(&waiter);
>>   	waiter.task = NULL;
>> +	waiter.save_state = 0;
>>
>>   	spin_lock(&lock->wait_lock);
>>
>> @@ -877,6 +892,18 @@ rt_mutex_slowlock(struct rt_mutex *lock,
>>
>>   		current->flags |= saved_flags;
>>   		set_current_state(state);
>> +
>> +		if (unlikely(waiter.task) &&
>> +		    waiter.list_entry.prio != current->prio) {
>> +			/*
>> +			 * We still not have the lock, but we are woken up with
>> +			 * a different prio than the one we waited with
>> +			 * originally. We remove the wait entry now and then
>> +			 * reinsert ourselves with the right priority
>> +			 */
>> +			remove_waiter(lock, &waiter __IP__);
>> +			waiter.task = NULL;
>> +		}
>>   	}
>>
>>   	set_current_state(TASK_RUNNING);
>> Index: linux-2.6.17-rt1/kernel/sched.c
>> ===================================================================
>> --- linux-2.6.17-rt1.orig/kernel/sched.c
>> +++ linux-2.6.17-rt1/kernel/sched.c
>> @@ -57,6 +57,8 @@
>>
>>   #include <asm/unistd.h>
>>
>> +#include "rtmutex_common.h"
>> +
>>   /*
>>    * Convert user-nice values [ -20 ... 0 ... 19 ]
>>    * to static priority [ MAX_RT_PRIO..MAX_PRIO-1 ],
>> @@ -646,7 +648,9 @@ static inline void sched_info_switch(tas
>>   #define sched_info_switch(t, next)	do { } while (0)
>>   #endif /* CONFIG_SCHEDSTATS */
>>
>> +#ifdef CONFIG_SMP
>>   static __cacheline_aligned_in_smp atomic_t rt_overload;
>> +#endif
>>
>>   static inline void inc_rt_tasks(task_t *p, runqueue_t *rq)
>>   {
>> @@ -1473,10 +1477,11 @@ static inline int wake_idle(int cpu, tas
>>   static int try_to_wake_up(task_t *p, unsigned int state, int sync, int mutex)
>>   {
>>   	int cpu, this_cpu, success = 0;
>> -	runqueue_t *this_rq, *rq;
>> +	runqueue_t *rq;
>>   	unsigned long flags;
>>   	long old_state;
>>   #ifdef CONFIG_SMP
>> +	runqueue_t *this_rq;
>>   	unsigned long load, this_load;
>>   	struct sched_domain *sd, *this_sd = NULL;
>>   	int new_cpu;
>> @@ -4351,6 +4356,18 @@ int setscheduler(struct task_struct *p,
>>   			resched_task(rq->curr);
>>   	}
>>   	task_rq_unlock(rq, &flags);
>> +
>> +	/*
>> +	 * If the process is blocked on rt-mutex, it will now wake up and
>> +	 * reinsert itself into the wait list and boost the owner correctly
>> +	 */
>> +	if (p->pi_blocked_on) {
>> +		if (p->pi_blocked_on->save_state)
>> +			wake_up_process_mutex(p);
>> +		else
>> +			wake_up_process(p);
>> +	}
>> +
>>   	spin_unlock_irqrestore(&p->pi_lock, fp);
>>   	return 0;
>>   }
>> @@ -7086,4 +7103,3 @@ void notrace preempt_enable_no_resched(v
>>
>>   EXPORT_SYMBOL(preempt_enable_no_resched);
>>   #endif
>> -
>> Index: linux-2.6.17-rt1/kernel/rtmutex_common.h
>> ===================================================================
>> --- linux-2.6.17-rt1.orig/kernel/rtmutex_common.h
>> +++ linux-2.6.17-rt1/kernel/rtmutex_common.h
>> @@ -49,6 +49,7 @@ struct rt_mutex_waiter {
>>   	struct plist_node	pi_list_entry;
>>   	struct task_struct	*task;
>>   	struct rt_mutex		*lock;
>> +	int                     save_state;
>>   #ifdef CONFIG_DEBUG_RT_MUTEXES
>>   	unsigned long		ip;
>>   	pid_t			deadlock_task_pid;
>>
>

[-- Attachment #2: Type: TEXT/PLAIN, Size: 3991 bytes --]

Index: linux-2.6.17-rt1/kernel/rtmutex.c
===================================================================
--- linux-2.6.17-rt1.orig/kernel/rtmutex.c
+++ linux-2.6.17-rt1/kernel/rtmutex.c
@@ -625,6 +625,7 @@ rt_lock_slowlock(struct rt_mutex *lock _
 
 	debug_rt_mutex_init_waiter(&waiter);
 	waiter.task = NULL;
+	waiter.save_state = 1;
 
 	spin_lock(&lock->wait_lock);
 
@@ -687,6 +688,19 @@ rt_lock_slowlock(struct rt_mutex *lock _
 		state = xchg(&current->state, TASK_UNINTERRUPTIBLE);
 		if (unlikely(state == TASK_RUNNING))
 			saved_state = TASK_RUNNING;
+
+		if (unlikely(waiter.task) &&
+		    waiter.list_entry.prio != current->prio) {
+			/*
+			 * We still not have the lock, but we are woken up with
+			 * a different prio than the one we waited with
+			 * originally. We remove the wait entry now and then
+			 * reinsert ourselves with the right priority
+			 */
+			remove_waiter(lock, &waiter __IP__);
+			waiter.task = NULL;
+		}
+
 	}
 
 	state = xchg(&current->state, saved_state);
@@ -798,6 +812,7 @@ rt_mutex_slowlock(struct rt_mutex *lock,
 
 	debug_rt_mutex_init_waiter(&waiter);
 	waiter.task = NULL;
+	waiter.save_state = 0;
 
 	spin_lock(&lock->wait_lock);
 
@@ -877,6 +892,18 @@ rt_mutex_slowlock(struct rt_mutex *lock,
 
 		current->flags |= saved_flags;
 		set_current_state(state);
+
+		if (unlikely(waiter.task) &&
+		    waiter.list_entry.prio != current->prio) {
+			/*
+			 * We still not have the lock, but we are woken up with
+			 * a different prio than the one we waited with
+			 * originally. We remove the wait entry now and then
+			 * reinsert ourselves with the right priority
+			 */
+			remove_waiter(lock, &waiter __IP__);
+			waiter.task = NULL;
+		}
 	}
 
 	set_current_state(TASK_RUNNING);
Index: linux-2.6.17-rt1/kernel/sched.c
===================================================================
--- linux-2.6.17-rt1.orig/kernel/sched.c
+++ linux-2.6.17-rt1/kernel/sched.c
@@ -57,6 +57,8 @@
 
 #include <asm/unistd.h>
 
+#include "rtmutex_common.h"
+
 /*
  * Convert user-nice values [ -20 ... 0 ... 19 ]
  * to static priority [ MAX_RT_PRIO..MAX_PRIO-1 ],
@@ -646,7 +648,9 @@ static inline void sched_info_switch(tas
 #define sched_info_switch(t, next)	do { } while (0)
 #endif /* CONFIG_SCHEDSTATS */
 
+#ifdef CONFIG_SMP
 static __cacheline_aligned_in_smp atomic_t rt_overload;
+#endif
 
 static inline void inc_rt_tasks(task_t *p, runqueue_t *rq)
 {
@@ -1473,10 +1477,11 @@ static inline int wake_idle(int cpu, tas
 static int try_to_wake_up(task_t *p, unsigned int state, int sync, int mutex)
 {
 	int cpu, this_cpu, success = 0;
-	runqueue_t *this_rq, *rq;
+	runqueue_t *rq;
 	unsigned long flags;
 	long old_state;
 #ifdef CONFIG_SMP
+	runqueue_t *this_rq;
 	unsigned long load, this_load;
 	struct sched_domain *sd, *this_sd = NULL;
 	int new_cpu;
@@ -4351,6 +4356,18 @@ int setscheduler(struct task_struct *p, 
 			resched_task(rq->curr);
 	}
 	task_rq_unlock(rq, &flags);
+
+	/*
+	 * If the process is blocked on rt-mutex, it will now wake up and
+	 * reinsert itself into the wait list and boost the owner correctly
+	 */
+	if (p->pi_blocked_on) {
+		if (p->pi_blocked_on->save_state)
+			wake_up_process_mutex(p);
+		else
+			wake_up_process(p);
+	}
+
 	spin_unlock_irqrestore(&p->pi_lock, fp);
 	return 0;
 }
@@ -7086,4 +7103,3 @@ void notrace preempt_enable_no_resched(v
 
 EXPORT_SYMBOL(preempt_enable_no_resched);
 #endif
-
Index: linux-2.6.17-rt1/kernel/rtmutex_common.h
===================================================================
--- linux-2.6.17-rt1.orig/kernel/rtmutex_common.h
+++ linux-2.6.17-rt1/kernel/rtmutex_common.h
@@ -49,6 +49,7 @@ struct rt_mutex_waiter {
 	struct plist_node	pi_list_entry;
 	struct task_struct	*task;
 	struct rt_mutex		*lock;
+	int                     save_state;
 #ifdef CONFIG_DEBUG_RT_MUTEXES
 	unsigned long		ip;
 	pid_t			deadlock_task_pid;

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
  2006-06-21 15:51                         ` Steven Rostedt
@ 2006-06-21 17:14                           ` Esben Nielsen
  0 siblings, 0 replies; 51+ messages in thread
From: Esben Nielsen @ 2006-06-21 17:14 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Esben Nielsen, Esben Nielsen, Thomas Gleixner, Ingo Molnar, linux-kernel

Odd, ...

On Wed, 21 Jun 2006, Steven Rostedt wrote:

>
> On Wed, 21 Jun 2006, Esben Nielsen wrote:
>
>>> The below patch has whitespace damage.  Could you resend?
>>>
>>
>> Does it help if I send it as an attachment?
>>
>
> Don't know.  See below:
>
>>
>>
>>>>
>>>> Index: linux-2.6.17-rt1/kernel/rtmutex.c
>>>> ===================================================================
>>>> --- linux-2.6.17-rt1.orig/kernel/rtmutex.c
>>>> +++ linux-2.6.17-rt1/kernel/rtmutex.c
>>>> @@ -625,6 +625,7 @@ rt_lock_slowlock(struct rt_mutex *lock _
>>>>
>
> The below has two spaces before the tab.  Is that from your mailer?

When I open the patch file in emacs it isn't like that, so I guess so; but 
when I open my mailbox in emacs it is like that, so it has to be Pine. I 
thought Ctrl-R did the trick of inserting a patch from quilt?

Anyway, could you apply the attached patch from the previous mail?

Esben


>
>>>>   	debug_rt_mutex_init_waiter(&waiter);
>>>>   	waiter.task = NULL;
>>>> +	waiter.save_state = 1;
>>>>
>>>>   	spin_lock(&lock->wait_lock);
>>>>
>
> The rest of the patch is the same.  The two spaces kill patch.
>
> -- Steve
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
  2006-06-21 16:26                     ` Thomas Gleixner
@ 2006-06-21 18:30                       ` Ingo Molnar
  2006-06-22 10:28                         ` Esben Nielsen
  2006-06-21 21:29                       ` Esben Nielsen
  2006-06-22 13:45                       ` Steven Rostedt
  2 siblings, 1 reply; 51+ messages in thread
From: Ingo Molnar @ 2006-06-21 18:30 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Esben Nielsen, Esben Nielsen, Steven Rostedt, linux-kernel


* Thomas Gleixner <tglx@linutronix.de> wrote:

> On Wed, 2006-06-21 at 16:43 +0100, Esben Nielsen wrote:
> > What about the patch below. It compiles and my UP labtop runs fine, but I 
> > haven't otherwise tested it.  My labtop runs RTExec without hichups 
> > until now.
> 
> NAK, it puts the burden into the lock path and also does a remove / 
> add which results in walking the chain twice.
> 
> Find an version against the code in -mm below. Not too much tested 
> yet.

i like this one - it does the prio fixup where it should be done.

	Ingo

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
  2006-06-21 21:29                       ` Esben Nielsen
@ 2006-06-21 20:33                         ` Thomas Gleixner
  2006-06-21 23:35                           ` Esben Nielsen
  0 siblings, 1 reply; 51+ messages in thread
From: Thomas Gleixner @ 2006-06-21 20:33 UTC (permalink / raw)
  To: Esben Nielsen; +Cc: Esben Nielsen, Steven Rostedt, Ingo Molnar, linux-kernel

On Wed, 2006-06-21 at 22:29 +0100, Esben Nielsen wrote:
> > Find an version against the code in -mm below. Not too much tested yet.
>
> What if setscheduler is called from interrup context as in the hrt timers?

It simply gets stuff going, nothing else. 

	tglx



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
  2006-06-21 16:26                     ` Thomas Gleixner
  2006-06-21 18:30                       ` Ingo Molnar
@ 2006-06-21 21:29                       ` Esben Nielsen
  2006-06-21 20:33                         ` Thomas Gleixner
  2006-06-22 13:45                       ` Steven Rostedt
  2 siblings, 1 reply; 51+ messages in thread
From: Esben Nielsen @ 2006-06-21 21:29 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Esben Nielsen, Esben Nielsen, Steven Rostedt, Ingo Molnar, linux-kernel



On Wed, 21 Jun 2006, Thomas Gleixner wrote:

> On Wed, 2006-06-21 at 16:43 +0100, Esben Nielsen wrote:
>> What about the patch below. It compiles and my UP labtop runs fine, but I
>> haven't otherwise tested it.  My labtop runs RTExec without hichups
>> until now.
>
> NAK, it puts the burden into the lock path and
What burden? An extra if (waiter.task) is the only thing!

> also does a remove / add
> which results in walking the chain twice.

The double traversal can be fixed.

>
> Find an version against the code in -mm below. Not too much tested yet.
>

What if setscheduler is called from interrupt context, as in the hrt timers?

Esben

> 	tglx
>
> Index: linux-2.6.17-rc6-mm/kernel/rtmutex.c
> ===================================================================
> --- linux-2.6.17-rc6-mm.orig/kernel/rtmutex.c	2006-06-21 13:34:34.000000000 +0200
> +++ linux-2.6.17-rc6-mm/kernel/rtmutex.c	2006-06-21 15:45:21.000000000 +0200
> @@ -160,7 +160,8 @@
> static int rt_mutex_adjust_prio_chain(task_t *task,
> 				      int deadlock_detect,
> 				      struct rt_mutex *orig_lock,
> -				      struct rt_mutex_waiter *orig_waiter)
> +				      struct rt_mutex_waiter *orig_waiter,
> +				      struct task_struct *top_task)
> {
> 	struct rt_mutex *lock;
> 	struct rt_mutex_waiter *waiter, *top_waiter = orig_waiter;
> @@ -188,7 +189,7 @@
> 			prev_max = max_lock_depth;
> 			printk(KERN_WARNING "Maximum lock depth %d reached "
> 			       "task: %s (%d)\n", max_lock_depth,
> -			       current->comm, current->pid);
> +			       top_task->comm, top_task->pid);
> 		}
> 		put_task_struct(task);
>
> @@ -228,7 +229,7 @@
> 	}
>
> 	/* Deadlock detection */
> -	if (lock == orig_lock || rt_mutex_owner(lock) == current) {
> +	if (lock == orig_lock || rt_mutex_owner(lock) == top_task) {
> 		debug_rt_mutex_deadlock(deadlock_detect, orig_waiter, lock);
> 		spin_unlock(&lock->wait_lock);
> 		ret = deadlock_detect ? -EDEADLK : 0;
> @@ -448,7 +449,8 @@
>
> 	spin_unlock(&lock->wait_lock);
>
> -	res = rt_mutex_adjust_prio_chain(owner, detect_deadlock, lock, waiter);
> +	res = rt_mutex_adjust_prio_chain(owner, detect_deadlock, lock, waiter,
> +					 current);
>
> 	spin_lock(&lock->wait_lock);
>
> @@ -561,12 +563,35 @@
>
> 	spin_unlock(&lock->wait_lock);
>
> -	rt_mutex_adjust_prio_chain(owner, 0, lock, NULL);
> +	rt_mutex_adjust_prio_chain(owner, 0, lock, NULL, current);
>
> 	spin_lock(&lock->wait_lock);
> }
>
> /*
> + * Recheck the pi chain, in case we got a priority setting
> + *
> + * Called from sched_setscheduler
> + */
> +void rt_mutex_adjust_pi(struct task_struct *task)
> +{
> +	struct rt_mutex_waiter *waiter = task->pi_blocked_on;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&task->pi_lock, flags);
> +
> +	if (!waiter || waiter->list_entry.prio == task->prio) {
> +		spin_unlock_irqrestore(&task->pi_lock, flags);
> +		return;
> +	}
> +
> +	get_task_struct(task);
> +	spin_unlock_irqrestore(&task->pi_lock, flags);
> +
> +	rt_mutex_adjust_prio_chain(task, 0, NULL, NULL, task);
> +}
> +
> +/*
>  * Slow path lock function:
>  */
> static int __sched
> @@ -633,6 +658,7 @@
> 			if (unlikely(ret))
> 				break;
> 		}
> +
> 		spin_unlock(&lock->wait_lock);
>
> 		debug_rt_mutex_print_deadlock(&waiter);
> Index: linux-2.6.17-rc6-mm/kernel/sched.c
> ===================================================================
> --- linux-2.6.17-rc6-mm.orig/kernel/sched.c	2006-06-21 13:34:34.000000000 +0200
> +++ linux-2.6.17-rc6-mm/kernel/sched.c	2006-06-21 18:21:16.000000000 +0200
> @@ -4028,8 +4028,7 @@
> int sched_setscheduler(struct task_struct *p, int policy,
> 		       struct sched_param *param)
> {
> -	int retval;
> -	int oldprio, oldpolicy = -1;
> +	int retval, oldprio, oldpolicy = -1;
> 	prio_array_t *array;
> 	unsigned long flags;
> 	runqueue_t *rq;
> @@ -4119,6 +4118,8 @@
> 	__task_rq_unlock(rq);
> 	spin_unlock_irqrestore(&p->pi_lock, flags);
>
> +	rt_mutex_adjust_pi(p);
> +
> 	return 0;
> }
> EXPORT_SYMBOL_GPL(sched_setscheduler);
> Index: linux-2.6.17-rc6-mm/include/linux/sched.h
> ===================================================================
> --- linux-2.6.17-rc6-mm.orig/include/linux/sched.h	2006-06-21 13:37:21.000000000 +0200
> +++ linux-2.6.17-rc6-mm/include/linux/sched.h	2006-06-21 15:43:23.000000000 +0200
> @@ -1123,11 +1123,13 @@
> #ifdef CONFIG_RT_MUTEXES
> extern int rt_mutex_getprio(task_t *p);
> extern void rt_mutex_setprio(task_t *p, int prio);
> +extern void rt_mutex_adjust_pi(task_t *p);
> #else
> static inline int rt_mutex_getprio(task_t *p)
> {
> 	return p->normal_prio;
> }
> +# define rt_mutex_adjust_pi(p)		do { } while (0)
> #endif
>
> extern void set_user_nice(task_t *p, long nice);
>
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
  2006-06-21 20:33                         ` Thomas Gleixner
@ 2006-06-21 23:35                           ` Esben Nielsen
  2006-06-22  7:06                             ` Thomas Gleixner
  0 siblings, 1 reply; 51+ messages in thread
From: Esben Nielsen @ 2006-06-21 23:35 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Esben Nielsen, Esben Nielsen, Steven Rostedt, Ingo Molnar, linux-kernel

On Wed, 21 Jun 2006, Thomas Gleixner wrote:

> On Wed, 2006-06-21 at 22:29 +0100, Esben Nielsen wrote:
>>> Find an version against the code in -mm below. Not too much tested yet.
>>
>> What if setscheduler is called from interrup context as in the hrt timers?
>
> It simply gets stuff going, nothing else.
>
What I mean is that we will then do the full priority inheritance boost 
with interrupts off.

Before, setscheduler() was O(1); now it is O(<lock depth of whatever lock 
the target task might be blocked on>).

This is not a problem for your use of setscheduler(), as the task involved 
can only be blocked on kernel mutexes, but when the function is used on a 
userspace process the lock depth can be deep.
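
To illustrate where the cost comes from (a rough sketch of my own, not code
from any of the patches; it ignores locking and the owner == NULL case): the
chain walk has to follow pi_blocked_on from lock owner to lock owner, so the
work grows with the depth of the blocking chain:

#include <linux/sched.h>
#include "rtmutex_common.h"

/* Hypothetical helper, illustration only. */
static int pi_chain_depth(struct task_struct *task)
{
	struct rt_mutex_waiter *w = task->pi_blocked_on;
	int depth = 0;

	while (w) {
		depth++;
		/* next link: the owner of the lock this task is blocked on */
		w = rt_mutex_owner(w->lock)->pi_blocked_on;
	}
	return depth;
}

setscheduler() used to be independent of this depth; with the synchronous
chain walk it is not.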


Esben


> 	tglx
>
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: 2.6.17-rt1
  2006-06-18  7:06 2.6.17-rt1 Ingo Molnar
  2006-06-18 16:13 ` 2.6.17-rt1 Michal Piotrowski
       [not found] ` <Pine.LNX.4.64.0606201656230.11643@localhost.localdomain>
@ 2006-06-22  0:57 ` Lee Revell
  2006-06-22  2:51   ` More weird latency trace output (was Re: 2.6.17-rt1) Lee Revell
  2006-06-23 20:56 ` 2.6.17-rt1 - mm_struct leak Vernon Mauery
  3 siblings, 1 reply; 51+ messages in thread
From: Lee Revell @ 2006-06-22  0:57 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Thomas Gleixner, Steven Rostedt

On Sun, 2006-06-18 at 09:06 +0200, Ingo Molnar wrote:
> i have released the 2.6.17-rt1 tree, which can be downloaded from the 
> usual place:
> 
>    http://redhat.com/~mingo/realtime-preempt/
> 
> this is a fixes-only release: merged to 2.6.17-final and smaller fixes.
> 
> to build a 2.6.17-rc6-rt1 tree, the following patches should be applied:
> 
>   http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.17.tar.bz2
>   http://redhat.com/~mingo/realtime-preempt/patch-2.6.17-rt1

I'm getting some strange latency tracer output on a dual core AMD box.
Why does it start at 404us?

Lee

preemption latency trace v1.1.5 on 2.6.17-rt1-smp-latency-trace
--------------------------------------------------------------------
 latency: 415 us, #26/26, CPU#1 | (M:rt VP:0, KP:0, SP:1 HP:1 #P:2)
    -----------------
    | task: kjournald-2043 (uid:0 nice:-5 policy:0 rt_prio:0)
    -----------------

                 _------=> CPU#            
                / _-----=> irqs-off        
               | / _----=> need-resched    
               || / _---=> hardirq/softirq 
               ||| / _--=> preempt-depth   
               |||| /                      
               |||||     delay             
   cmd     pid ||||| time  |   caller      
      \   /    |||||   \   |   /           
  <idle>-0     1Dn.1  404us : smp_reschedule_interrupt (reschedule_interrupt)
  <idle>-0     1.n.1  405us : __exit_idle (cpu_idle)
  <idle>-0     1.n.1  405us : atomic_notifier_call_chain (__exit_idle)
  <idle>-0     1.n.1  406us : rcu_read_lock (atomic_notifier_call_chain)
  <idle>-0     1Dn.1  406us : rcu_read_lock (0 0 badc)
  <idle>-0     1Dn.1  406us : rcu_read_lock (ffff810002eb7228 0 badc)
  <idle>-0     1.n.1  406us : rcu_read_unlock (atomic_notifier_call_chain)
  <idle>-0     1Dn.1  407us : rcu_read_unlock (ffff810002eb7228 0 1)
  <idle>-0     1Dn.1  407us : rcu_read_unlock (0 0 0)
  <idle>-0     1Dn..  407us : __schedule (cpu_idle)
  <idle>-0     1Dn..  407us : profile_hit (__schedule)
  <idle>-0     1Dn.1  408us : sched_clock (__schedule)
  <idle>-0     1Dn.1  408us : _raw_spin_lock_irq (__schedule)
  <idle>-0     1Dn.2  409us : recalc_task_prio (__schedule)
  <idle>-0     1Dn.2  410us : effective_prio (recalc_task_prio)
  <idle>-0     1Dn.2  410us : effective_prio <<...>-2043> (-10 -10)
  <idle>-0     1D..2  412us : __switch_to (thread_return)
   <...>-2043  1D..2  413us : thread_return <<idle>-0> (20 -10)
   <...>-2043  1D..2  413us : _raw_spin_unlock_irq (thread_return)
   <...>-2043  1...1  413us : constant_test_bit (_raw_spin_unlock_irq)
   <...>-2043  1...1  413us : trace_stop_sched_switched (thread_return)
   <...>-2043  1D..1  414us : _raw_spin_lock (trace_stop_sched_switched)
   <...>-2043  1D..2  414us : trace_stop_sched_switched <<...>-2043> (110 1)
   <...>-2043  1D..2  415us : thread_return (thread_return)




^ permalink raw reply	[flat|nested] 51+ messages in thread

* More weird latency trace output (was Re: 2.6.17-rt1)
  2006-06-22  0:57 ` 2.6.17-rt1 Lee Revell
@ 2006-06-22  2:51   ` Lee Revell
  2006-06-23  1:24     ` Lee Revell
  0 siblings, 1 reply; 51+ messages in thread
From: Lee Revell @ 2006-06-22  2:51 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Thomas Gleixner, Steven Rostedt

On Wed, 2006-06-21 at 20:57 -0400, Lee Revell wrote:
> I'm getting some strange latency tracer output on a dual core AMD box.
> Why does it start at 404us?
> 

After setting trace_all_cpus to 1, I get the following:

( posix_cpu_timer-15   |#1): new 8 us maximum-latency wakeup.
(         IRQ 233-2446 |#1): new 14 us maximum-latency wakeup.
( posix_cpu_timer-15   |#1): new 19 us maximum-latency wakeup.
(         IRQ 233-2446 |#0): new 861 us maximum-latency wakeup.
(         IRQ 233-2446 |#0): new 862 us maximum-latency wakeup.
(         IRQ 233-2446 |#0): new 862 us maximum-latency wakeup.
(         IRQ 233-2446 |#0): new 868 us maximum-latency wakeup.
(         IRQ 233-2446 |#0): new 876 us maximum-latency wakeup.
(         IRQ 233-2446 |#0): new 879 us maximum-latency wakeup.
(         IRQ 233-2446 |#0): new 880 us maximum-latency wakeup.
(         IRQ 233-2446 |#0): new 882 us maximum-latency wakeup.
(         IRQ 233-2446 |#0): new 884 us maximum-latency wakeup.
(         IRQ 233-2446 |#0): new 884 us maximum-latency wakeup.
(         IRQ 233-2446 |#0): new 885 us maximum-latency wakeup.

[ max latencies continue to creep up by 1-2 us at a time ]

(          IRQ 14-1783 |#0): new 1826 us maximum-latency wakeup.
(         IRQ 233-2446 |#0): new 1827 us maximum-latency wakeup.
(         IRQ 233-2446 |#0): new 1828 us maximum-latency wakeup.
(         IRQ 233-2446 |#0): new 1831 us maximum-latency wakeup.
(         IRQ 233-2446 |#0): new 1832 us maximum-latency wakeup.
(         IRQ 233-2446 |#0): new 1834 us maximum-latency wakeup.
(         IRQ 233-2446 |#0): new 1836 us maximum-latency wakeup.
(         IRQ 233-2446 |#0): new 1838 us maximum-latency wakeup.

Here is a typical trace:

preemption latency trace v1.1.5 on 2.6.17-rt1-smp-latency-trace
--------------------------------------------------------------------
 latency: 1898 us, #78/78, CPU#1 | (M:rt VP:0, KP:0, SP:1 HP:1 #P:2)
    -----------------
    | task: IRQ 233-2446 (uid:0 nice:-5 policy:1 rt_prio:40)
    -----------------

                 _------=> CPU#            
                / _-----=> irqs-off        
               | / _----=> need-resched    
               || / _---=> hardirq/softirq 
               ||| / _--=> preempt-depth   
               |||| /                      
               |||||     delay             
   cmd     pid ||||| time  |   caller      
      \   /    |||||   \   |   /           
softirq--5     0....    0us : __wake_up (ksoftirqd)
softirq--5     0....    0us : __lock_text_start (__wake_up)
softirq--5     0....    0us : __wake_up_common (__wake_up)
softirq--5     0....    0us : rt_unlock (__wake_up)
softirq--5     0....    0us : kthread_should_stop (ksoftirqd)
softirq--5     0....    0us : schedule (ksoftirqd)
softirq--5     0....    1us : __schedule (schedule)
softirq--5     0....    1us : profile_hit (__schedule)
softirq--5     0...1    1us : sched_clock (__schedule)
softirq--5     0...1    1us : _raw_spin_lock_irq (__schedule)
softirq--5     0D..2    2us : touch_softlockup_watchdog (__schedule)
softirq--5     0D..2    2us : deactivate_task (__schedule)
softirq--5     0D..2    2us : deactivate_task <softirq--5> (101 1)
softirq--5     0D..2    2us : dequeue_task (deactivate_task)
softirq--5     0D..2    3us : __first_cpu (__schedule)
softirq--5     0D..2    3us : __next_cpu (__schedule)
softirq--5     0D..2    3us : double_lock_balance (__schedule)
softirq--5     0D..2    3us : _raw_spin_trylock (double_lock_balance)
softirq--5     0D..3    4us : trace_change_sched_cpu (__schedule)
softirq--5     0D..3    5us : trace_change_sched_cpu (1 1 0)
softirq--5     0D..3    5us : _raw_spin_lock (trace_change_sched_cpu)
softirq--5     0D..4    5us : trace_change_sched_cpu (1 0 0)
softirq--5     0D..4    6us : _raw_spin_unlock (trace_change_sched_cpu)
softirq--5     0D..3    6us : constant_test_bit (_raw_spin_unlock)
softirq--5     0D..3    6us : deactivate_task (__schedule)
softirq--5     0D..3    6us : deactivate_task <<...>-2446> (140 3)
softirq--5     0D..3    7us : dequeue_task (deactivate_task)
softirq--5     0D..3    7us : activate_task (__schedule)
softirq--5     0D..3    7us : sched_clock (activate_task)
softirq--5     0D..3    7us : __activate_task (activate_task)
softirq--5     0D..3    8us : __activate_task <<...>-2446> (59 0)
softirq--5     0D..3    8us : enqueue_task (__activate_task)
softirq--5     0D..3    8us : _raw_spin_unlock (__schedule)
softirq--5     0D..2    8us : constant_test_bit (_raw_spin_unlock)
softirq--5     0D..2    8us : __next_cpu (__schedule)
softirq--5     0D..2    9us : __switch_to (thread_return)
   <...>-2446  0D..2   10us : thread_return <softirq--5> (101 140)
   <...>-2446  0D..2   10us : _raw_spin_unlock_irq (thread_return)
   <...>-2446  0...1   10us : constant_test_bit (_raw_spin_unlock_irq)
   <...>-2446  0...1   11us : trace_stop_sched_switched (thread_return)
   <...>-2446  0D..1   11us : _raw_spin_lock (trace_stop_sched_switched)
   <...>-2446  0D..2   11us : trace_stop_sched_switched <<...>-2446> (59 0)
   <...>-2446  0D..3   11us : _raw_spin_unlock (trace_stop_sched_switched)
   <...>-2446  0D..2   12us : constant_test_bit (_raw_spin_unlock)
   <...>-2446  0D..2   12us : report_latency (trace_stop_sched_switched)
   <...>-2446  0D..2   12us!: thread_return (thread_return)

How can the latency tracer be reporting 1898us max latency while the
trace is of a 12us latency?  This must be a bug, correct?

Lee



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
  2006-06-21 23:35                           ` Esben Nielsen
@ 2006-06-22  7:06                             ` Thomas Gleixner
  2006-06-22 10:32                               ` Esben Nielsen
  2006-06-22 13:33                               ` Steven Rostedt
  0 siblings, 2 replies; 51+ messages in thread
From: Thomas Gleixner @ 2006-06-22  7:06 UTC (permalink / raw)
  To: Esben Nielsen; +Cc: Esben Nielsen, Steven Rostedt, Ingo Molnar, linux-kernel

On Thu, 2006-06-22 at 00:35 +0100, Esben Nielsen wrote:
> On Wed, 21 Jun 2006, Thomas Gleixner wrote:
> 
> > On Wed, 2006-06-21 at 22:29 +0100, Esben Nielsen wrote:
> >>> Find an version against the code in -mm below. Not too much tested yet.
> >>
> >> What if setscheduler is called from interrup context as in the hrt timers?
> >
> > It simply gets stuff going, nothing else.
> >
> What I mean is that we will then do the full priority inheritance boost 
> with interrupts off.

Only in the case when its called from IRQ context.

> Before setscheduler() was O(1), now it is O(<lock depth of what ever lock 
> the target task might be locked on>).
>
> This is not a problem for your use of setscheduler() as the task involved 
> only can be blocked on kernel mutexes, but when the function is used on a 
> userspace process the lock depth can be deep.

Damn, I missed that this is still in the irq off section, when called
from do_sched_setscheduler().

Good catch. I'll fix that.

	tglx



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
  2006-06-21 18:30                       ` Ingo Molnar
@ 2006-06-22 10:28                         ` Esben Nielsen
  0 siblings, 0 replies; 51+ messages in thread
From: Esben Nielsen @ 2006-06-22 10:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, Esben Nielsen, Esben Nielsen, Steven Rostedt,
	linux-kernel

On Wed, 21 Jun 2006, Ingo Molnar wrote:

>
> * Thomas Gleixner <tglx@linutronix.de> wrote:
>
>> On Wed, 2006-06-21 at 16:43 +0100, Esben Nielsen wrote:
>>> What about the patch below. It compiles and my UP labtop runs fine, but I
>>> haven't otherwise tested it.  My labtop runs RTExec without hichups
>>> until now.
>>
>> NAK, it puts the burden into the lock path and also does a remove /
>> add which results in walking the chain twice.
>>
>> Find an version against the code in -mm below. Not too much tested
>> yet.
>
> i like this one - it does the prio fixup where it should be done.

I disagree. I think the work should be done by the task which is blocked.
I can't see why the work belongs to the one calling setscheduler() - especially 
if that is in interrupt context.

A good principle in real-time system programming is to push as much work 
as possible to as low a priority as possible while still meeting the required 
deadlines. If you don't, you needlessly increase the latency associated 
with the higher priorities.

Let us say you increase the priority of a task with setscheduler():
The expected latency can never be better than the one associated with the 
current task, nor can it be better than the target priority. Therefore the 
job of fixing the priorities should be done at the lower of current's 
priority and the target priority.

Let us say you decrease the priority of a task with setscheduler():
You need to boost tasks out of the high priority with the latency 
associated with that priority, but the overall latency can never be 
expected to be higher than the priority at which setscheduler() is 
running. Therefore the job of fixing the priorities should be done at the 
lower of the current priority and the previous priority of the target task.

Neither Thomas's patch nor mine does this optimally in all cases.

Esben

>
> 	Ingo
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
  2006-06-22  7:06                             ` Thomas Gleixner
@ 2006-06-22 10:32                               ` Esben Nielsen
  2006-06-22 13:33                               ` Steven Rostedt
  1 sibling, 0 replies; 51+ messages in thread
From: Esben Nielsen @ 2006-06-22 10:32 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Esben Nielsen, Esben Nielsen, Steven Rostedt, Ingo Molnar, linux-kernel



On Thu, 22 Jun 2006, Thomas Gleixner wrote:

> On Thu, 2006-06-22 at 00:35 +0100, Esben Nielsen wrote:
>> On Wed, 21 Jun 2006, Thomas Gleixner wrote:
>>
>>> On Wed, 2006-06-21 at 22:29 +0100, Esben Nielsen wrote:
>>>>> Find an version against the code in -mm below. Not too much tested yet.
>>>>
>>>> What if setscheduler is called from interrup context as in the hrt timers?
>>>
>>> It simply gets stuff going, nothing else.
>>>
>> What I mean is that we will then do the full priority inheritance boost
>> with interrupts off.
>
> Only in the case when its called from IRQ context.
>
>> Before setscheduler() was O(1), now it is O(<lock depth of what ever lock
>> the target task might be locked on>).
>>
>> This is not a problem for your use of setscheduler() as the task involved
>> only can be blocked on kernel mutexes, but when the function is used on a
>> userspace process the lock depth can be deep.
>
> Damn, I missed that this is still in the irq off section, when called
> from do_sched_setscheduler().
>

There is more to it than that:
What if you for some reason try to set the priority of a low-priority task 
from a high-priority one? It could be that your application has some kind 
of watchdog or manager process dynamically adjusting the priorities of the 
other tasks. You simply don't want to be concerned about those tasks being 
blocked in some strange (maybe even buggy) logging structure. You want to 
do it in O(1) and then let those tasks handle it.

> Good catch. I fix that.
>

I know you hate to let the scheduler do the work for you, but I find it 
very elegant and easy once in a while :-)

Esben

> 	tglx
>
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
  2006-06-22  7:06                             ` Thomas Gleixner
  2006-06-22 10:32                               ` Esben Nielsen
@ 2006-06-22 13:33                               ` Steven Rostedt
  1 sibling, 0 replies; 51+ messages in thread
From: Steven Rostedt @ 2006-06-22 13:33 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Esben Nielsen, Esben Nielsen, Ingo Molnar, linux-kernel


On Thu, 22 Jun 2006, Thomas Gleixner wrote:

> On Thu, 2006-06-22 at 00:35 +0100, Esben Nielsen wrote:
> > On Wed, 21 Jun 2006, Thomas Gleixner wrote:
> >
> > > On Wed, 2006-06-21 at 22:29 +0100, Esben Nielsen wrote:
> > >>> Find an version against the code in -mm below. Not too much tested yet.
> > >>
> > >> What if setscheduler is called from interrup context as in the hrt timers?
> > >
> > > It simply gets stuff going, nothing else.
> > >
> > What I mean is that we will then do the full priority inheritance boost
> > with interrupts off.
>
> Only in the case when its called from IRQ context.
>

Uh, Thomas, we can't call rt_mutex_adjust_prio_chain from interrupt
context.  It grabs the wait_lock, which is documented as never being taken
from interrupt context.  So this can easily deadlock the system.
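
To make that concrete (a rough sketch, not the actual rtmutex code): the
wait_lock is taken with plain spin_lock() in the hunks above, i.e. without
disabling interrupts, so re-taking it from an interrupt handler on the same
CPU spins forever:

#include <linux/spinlock.h>

/* Illustration only -- hypothetical lock and functions. */
static DEFINE_SPINLOCK(wait_lock);

static void process_context_path(void)
{
	spin_lock(&wait_lock);		/* IRQs stay enabled here */
	/* ... an interrupt fires on this CPU ... */
	spin_unlock(&wait_lock);
}

static void irq_context_path(void)	/* e.g. setscheduler() from a timer interrupt */
{
	spin_lock(&wait_lock);		/* spins forever: the holder cannot run on this CPU */
	spin_unlock(&wait_lock);
}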

-- Steve


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
  2006-06-21 16:26                     ` Thomas Gleixner
  2006-06-21 18:30                       ` Ingo Molnar
  2006-06-21 21:29                       ` Esben Nielsen
@ 2006-06-22 13:45                       ` Steven Rostedt
  2006-06-22 14:20                         ` Thomas Gleixner
  2 siblings, 1 reply; 51+ messages in thread
From: Steven Rostedt @ 2006-06-22 13:45 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Esben Nielsen, Ingo Molnar, LKML



On Wed, 21 Jun 2006, Thomas Gleixner wrote:

> On Wed, 2006-06-21 at 16:43 +0100, Esben Nielsen wrote:
> > What about the patch below. It compiles and my UP labtop runs fine, but I
> > haven't otherwise tested it.  My labtop runs RTExec without hichups
> > until now.
>
> NAK, it puts the burden into the lock path and also does a remove / add
> which results in walking the chain twice.

I think the burden is OK.  It only happens when the process is actually
blocked, so it's the slow path anyway.

>
>  /*
> + * Recheck the pi chain, in case we got a priority setting
> + *
> + * Called from sched_setscheduler
> + */
> +void rt_mutex_adjust_pi(struct task_struct *task)
> +{
> +	struct rt_mutex_waiter *waiter = task->pi_blocked_on;
> +	unsigned long flags;

Hmm, what if the process wakes up here and unblocks?  Since waiter is
on the stack, you could have a nasty race here.

> +
> +	spin_lock_irqsave(&task->pi_lock, flags);
> +
> +	if (!waiter || waiter->list_entry.prio == task->prio) {
> +		spin_unlock_irqrestore(&task->pi_lock, flags);
> +		return;
> +	}
> +
> +	get_task_struct(task);
> +	spin_unlock_irqrestore(&task->pi_lock, flags);
> +
> +	rt_mutex_adjust_prio_chain(task, 0, NULL, NULL, task);

Nasty nasty nasty!

> +}
> +

-- Steve


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
  2006-06-22 13:45                       ` Steven Rostedt
@ 2006-06-22 14:20                         ` Thomas Gleixner
  2006-06-22 14:23                           ` Steven Rostedt
  0 siblings, 1 reply; 51+ messages in thread
From: Thomas Gleixner @ 2006-06-22 14:20 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Esben Nielsen, Ingo Molnar, LKML

On Thu, 2006-06-22 at 09:45 -0400, Steven Rostedt wrote:
> > + * Recheck the pi chain, in case we got a priority setting
> > + *
> > + * Called from sched_setscheduler
> > + */
> > +void rt_mutex_adjust_pi(struct task_struct *task)
> > +{
> > +	struct rt_mutex_waiter *waiter = task->pi_blocked_on;
> > +	unsigned long flags;
> 
> Hmm, what if the process wakes up here and unblocks?  Since waiter is
> on the stack, you could have a nasty race here.

That's fixed already. Moved into the pi_lock section.

> > +
> > +	spin_lock_irqsave(&task->pi_lock, flags);
> > +
> > +	if (!waiter || waiter->list_entry.prio == task->prio) {
> > +		spin_unlock_irqrestore(&task->pi_lock, flags);
> > +		return;
> > +	}
> > +
> > +	get_task_struct(task);
> > +	spin_unlock_irqrestore(&task->pi_lock, flags);
> > +
> > +	rt_mutex_adjust_prio_chain(task, 0, NULL, NULL, task);
> 
> Nasty nasty nasty!

What's nasty ? 

	tglx



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
  2006-06-22 14:20                         ` Thomas Gleixner
@ 2006-06-22 14:23                           ` Steven Rostedt
  2006-06-22 14:26                             ` Thomas Gleixner
  0 siblings, 1 reply; 51+ messages in thread
From: Steven Rostedt @ 2006-06-22 14:23 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Esben Nielsen, Ingo Molnar, LKML


On Thu, 22 Jun 2006, Thomas Gleixner wrote:

> On Thu, 2006-06-22 at 09:45 -0400, Steven Rostedt wrote:
> > > + * Recheck the pi chain, in case we got a priority setting
> > > + *
> > > + * Called from sched_setscheduler
> > > + */
> > > +void rt_mutex_adjust_pi(struct task_struct *task)
> > > +{
> > > +	struct rt_mutex_waiter *waiter = task->pi_blocked_on;
> > > +	unsigned long flags;
> >
> > Hmm, what if the process wakes up here and unblocks?  Since waiter is
> > on the stack, you could have a nasty race here.
>
> Thats fixed already. Moved into te pi_lock section

Yeah I saw that.

>
> > > +
> > > +	spin_lock_irqsave(&task->pi_lock, flags);
> > > +
> > > +	if (!waiter || waiter->list_entry.prio == task->prio) {
> > > +		spin_unlock_irqrestore(&task->pi_lock, flags);
> > > +		return;
> > > +	}
> > > +
> > > +	get_task_struct(task);
> > > +	spin_unlock_irqrestore(&task->pi_lock, flags);
> > > +
> > > +	rt_mutex_adjust_prio_chain(task, 0, NULL, NULL, task);
> >
> > Nasty nasty nasty!
>
> What's nasty ?
>

The fact that sched_setscheduler can never be called from interrupt context.
So I don't know how you're going to handle the high-res dynamic priority
now.

-- Steve


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
  2006-06-22 14:23                           ` Steven Rostedt
@ 2006-06-22 14:26                             ` Thomas Gleixner
  2006-06-22 18:06                               ` Esben Nielsen
  0 siblings, 1 reply; 51+ messages in thread
From: Thomas Gleixner @ 2006-06-22 14:26 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Esben Nielsen, Ingo Molnar, LKML

On Thu, 2006-06-22 at 10:23 -0400, Steven Rostedt wrote:
> > What's nasty ?
> >
> 
> The fact that sched_setscheduler can never be called by interrupt context.
> So I don't know how you're going to handle the high_res dynamic priority
> now.

That's a separate issue. Though you are right.

	tglx



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
  2006-06-22 18:06                               ` Esben Nielsen
@ 2006-06-22 18:05                                 ` Thomas Gleixner
  2006-06-23 11:23                                   ` Esben Nielsen
  0 siblings, 1 reply; 51+ messages in thread
From: Thomas Gleixner @ 2006-06-22 18:05 UTC (permalink / raw)
  To: Esben Nielsen; +Cc: Steven Rostedt, Ingo Molnar, LKML

On Thu, 2006-06-22 at 19:06 +0100, Esben Nielsen wrote:
> >
> > Thats a seperate issue. Though you are right.
> 
> Why not use my original patch and solve both issues?
> I have even updated it to avoid the double traversal. It also removes 
> one other traversal which shouldn't be needed. (I have not had time 
> to boot the kernel with it, though, but it does compile...:-)

Simply because it does not solve the following scenario:

A high prio task is blocked on a lock and boosts 3 other tasks. Now the
higher prio watchdog detects that the high prio task is stuck and
lowers its priority. You can wake it up as long as you want, the boosted
task is still busy looping. We want an immediate propagation.

And I do not like the idea of invoking the scheduler to do those
propagations. setscheduler has a synchronous effect in all other cases.
So it has to be synchronous in the propagation case too.

Preempt-RT and the dynamic priority adjustment of high resolution timers
are a different playground and we have to think about that separately.

	tglx



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
  2006-06-22 14:26                             ` Thomas Gleixner
@ 2006-06-22 18:06                               ` Esben Nielsen
  2006-06-22 18:05                                 ` Thomas Gleixner
  0 siblings, 1 reply; 51+ messages in thread
From: Esben Nielsen @ 2006-06-22 18:06 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Steven Rostedt, Esben Nielsen, Ingo Molnar, LKML



On Thu, 22 Jun 2006, Thomas Gleixner wrote:

> On Thu, 2006-06-22 at 10:23 -0400, Steven Rostedt wrote:
>>> What's nasty ?
>>>
>>
>> The fact that sched_setscheduler can never be called by interrupt context.
>> So I don't know how you're going to handle the high_res dynamic priority
>> now.
>
> Thats a seperate issue. Though you are right.

Why not use my original patch and solve both issues?
I have even updated it to avoid the double traversal. It also removes 
one other traversal which shouldn't be needed. (I have not had time 
to boot the kernel with it, though, but it does compile...:-)

Esben

>
> 	tglx
>
>

Index: linux-2.6.17-rt1/kernel/rtmutex.c
===================================================================
--- linux-2.6.17-rt1.orig/kernel/rtmutex.c
+++ linux-2.6.17-rt1/kernel/rtmutex.c
@@ -539,7 +539,8 @@ static void wakeup_next_waiter(struct rt
   * Must be called with lock->wait_lock held
   */
  static void remove_waiter(struct rt_mutex *lock,
-			  struct rt_mutex_waiter *waiter  __IP_DECL__)
+			  struct rt_mutex_waiter *waiter,
+			  int fix_prio __IP_DECL__)
  {
  	int first = (waiter == rt_mutex_top_waiter(lock));
  	int boost = 0;
@@ -564,10 +565,12 @@ static void remove_waiter(struct rt_mute
  			next = rt_mutex_top_waiter(lock);
  			plist_add(&next->pi_list_entry, &owner->pi_waiters);
  		}
-		__rt_mutex_adjust_prio(owner);
+		if (fix_prio) {
+			__rt_mutex_adjust_prio(owner);
+		}

  		if (owner->pi_blocked_on) {
-			boost = 1;
+			boost = fix_prio;
  			get_task_struct(owner);
  		}
  		spin_unlock_irqrestore(&owner->pi_lock, flags);
@@ -625,6 +628,7 @@ rt_lock_slowlock(struct rt_mutex *lock _

  	debug_rt_mutex_init_waiter(&waiter);
  	waiter.task = NULL;
+	waiter.save_state = 1;

  	spin_lock(&lock->wait_lock);

@@ -687,6 +691,19 @@ rt_lock_slowlock(struct rt_mutex *lock _
  		state = xchg(&current->state, TASK_UNINTERRUPTIBLE);
  		if (unlikely(state == TASK_RUNNING))
  			saved_state = TASK_RUNNING;
+
+		if (unlikely(waiter.task) &&
+		    waiter.list_entry.prio != current->prio) {
+			/*
+			 * We still not have the lock, but we are woken up with
+			 * a different prio than the one we waited with
+			 * originally. We remove the wait entry now and then
+			 * reinsert ourselves with the right priority
+			 */
+			remove_waiter(lock, &waiter, 0 __IP__);
+			waiter.task = NULL;
+		}
+
  	}

  	state = xchg(&current->state, saved_state);
@@ -700,7 +717,7 @@ rt_lock_slowlock(struct rt_mutex *lock _
  	 * can end up with a non-NULL waiter.task:
  	 */
  	if (unlikely(waiter.task))
-		remove_waiter(lock, &waiter __IP__);
+		remove_waiter(lock, &waiter, 0 __IP__);
  	/*
  	 * try_to_take_rt_mutex() sets the waiter bit
  	 * unconditionally. We might have to fix that up:
@@ -798,6 +815,7 @@ rt_mutex_slowlock(struct rt_mutex *lock,

  	debug_rt_mutex_init_waiter(&waiter);
  	waiter.task = NULL;
+	waiter.save_state = 0;

  	spin_lock(&lock->wait_lock);

@@ -877,12 +895,24 @@ rt_mutex_slowlock(struct rt_mutex *lock,

  		current->flags |= saved_flags;
  		set_current_state(state);
+
+		if (unlikely(waiter.task) &&
+		    waiter.list_entry.prio != current->prio) {
+			/*
+			 * We still not have the lock, but we are woken up with
+			 * a different prio than the one we waited with
+			 * originally. We remove the wait entry now and then
+			 * reinsert ourselves with the right priority
+			 */
+			remove_waiter(lock, &waiter, 0 __IP__);
+			waiter.task = NULL;
+		}
  	}

  	set_current_state(TASK_RUNNING);

  	if (unlikely(waiter.task))
-		remove_waiter(lock, &waiter __IP__);
+		remove_waiter(lock, &waiter, 1 __IP__);

  	/*
  	 * try_to_take_rt_mutex() sets the waiter bit
Index: linux-2.6.17-rt1/kernel/sched.c
===================================================================
--- linux-2.6.17-rt1.orig/kernel/sched.c
+++ linux-2.6.17-rt1/kernel/sched.c
@@ -57,6 +57,8 @@

  #include <asm/unistd.h>

+#include "rtmutex_common.h"
+
  /*
   * Convert user-nice values [ -20 ... 0 ... 19 ]
   * to static priority [ MAX_RT_PRIO..MAX_PRIO-1 ],
@@ -646,7 +648,9 @@ static inline void sched_info_switch(tas
  #define sched_info_switch(t, next)	do { } while (0)
  #endif /* CONFIG_SCHEDSTATS */

+#ifdef CONFIG_SMP
  static __cacheline_aligned_in_smp atomic_t rt_overload;
+#endif

  static inline void inc_rt_tasks(task_t *p, runqueue_t *rq)
  {
@@ -1473,10 +1477,11 @@ static inline int wake_idle(int cpu, tas
  static int try_to_wake_up(task_t *p, unsigned int state, int sync, int mutex)
  {
  	int cpu, this_cpu, success = 0;
-	runqueue_t *this_rq, *rq;
+	runqueue_t *rq;
  	unsigned long flags;
  	long old_state;
  #ifdef CONFIG_SMP
+	runqueue_t *this_rq;
  	unsigned long load, this_load;
  	struct sched_domain *sd, *this_sd = NULL;
  	int new_cpu;
@@ -4351,6 +4356,18 @@ int setscheduler(struct task_struct *p,
  			resched_task(rq->curr);
  	}
  	task_rq_unlock(rq, &flags);
+
+	/*
+	 * If the process is blocked on rt-mutex, it will now wake up and
+	 * reinsert itself into the wait list and boost the owner correctly
+	 */
+	if (p->pi_blocked_on) {
+		if (p->pi_blocked_on->save_state)
+			wake_up_process_mutex(p);
+		else
+			wake_up_process(p);
+	}
+
  	spin_unlock_irqrestore(&p->pi_lock, fp);
  	return 0;
  }
@@ -7086,4 +7103,3 @@ void notrace preempt_enable_no_resched(v

  EXPORT_SYMBOL(preempt_enable_no_resched);
  #endif
-
Index: linux-2.6.17-rt1/kernel/rtmutex_common.h
===================================================================
--- linux-2.6.17-rt1.orig/kernel/rtmutex_common.h
+++ linux-2.6.17-rt1/kernel/rtmutex_common.h
@@ -49,6 +49,7 @@ struct rt_mutex_waiter {
  	struct plist_node	pi_list_entry;
  	struct task_struct	*task;
  	struct rt_mutex		*lock;
+	int                     save_state;
  #ifdef CONFIG_DEBUG_RT_MUTEXES
  	unsigned long		ip;
  	pid_t			deadlock_task_pid;

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: More weird latency trace output (was Re: 2.6.17-rt1)
  2006-06-22  2:51   ` More weird latency trace output (was Re: 2.6.17-rt1) Lee Revell
@ 2006-06-23  1:24     ` Lee Revell
  2006-06-24 22:15       ` Lee Revell
  0 siblings, 1 reply; 51+ messages in thread
From: Lee Revell @ 2006-06-23  1:24 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Thomas Gleixner, Steven Rostedt

On Wed, 2006-06-21 at 22:51 -0400, Lee Revell wrote:
> How can the latency tracer be reporting 1898us max latency while the
> trace is of a 12us latency?  This must be a bug, correct?

I've found the bug.  The latency tracer uses get_cycles() for
timestamping, which uses rdtsc, which is unusable for timing on dual-core
AMD64 machines due to the well-known "unsynced TSCs" hardware bug.

Would a patch to convert the latency tracer to use gettimeofday() be
acceptable?
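
Something like this rough sketch is what I have in mind (illustration only;
the helper name is made up), replacing the raw cycle counter with wall-clock
microseconds that are consistent across both cores:

#include <linux/time.h>

/* Hypothetical timestamp helper for the latency tracer -- sketch only. */
static unsigned long long trace_now_us(void)
{
	struct timeval tv;

	do_gettimeofday(&tv);	/* consistent across CPUs, unlike raw rdtsc */
	return (unsigned long long)tv.tv_sec * 1000000ULL + tv.tv_usec;
}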

Lee


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
  2006-06-23 11:23                                   ` Esben Nielsen
@ 2006-06-23 11:06                                     ` Steven Rostedt
  2006-07-03 11:48                                     ` Esben Nielsen
  1 sibling, 0 replies; 51+ messages in thread
From: Steven Rostedt @ 2006-06-23 11:06 UTC (permalink / raw)
  To: Esben Nielsen; +Cc: Thomas Gleixner, Ingo Molnar, LKML


On Fri, 23 Jun 2006, Esben Nielsen wrote:

>
> I was bonked for using the other thread for preempt-realtime stuff, so I
> assume this thread is for preempt-realtime stuff only :-)
>

Yep, please use this thread.  We don't want to bother any mainline people
with -rt only issues until it's in mainline ;)

-- Steve


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
  2006-06-22 18:05                                 ` Thomas Gleixner
@ 2006-06-23 11:23                                   ` Esben Nielsen
  2006-06-23 11:06                                     ` Steven Rostedt
  2006-07-03 11:48                                     ` Esben Nielsen
  0 siblings, 2 replies; 51+ messages in thread
From: Esben Nielsen @ 2006-06-23 11:23 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Esben Nielsen, Steven Rostedt, Ingo Molnar, LKML

On Thu, 22 Jun 2006, Thomas Gleixner wrote:

> On Thu, 2006-06-22 at 19:06 +0100, Esben Nielsen wrote:
>>>
>>> Thats a seperate issue. Though you are right.
>>
>> Why not use my original patch and solve both issues?
>> I have even updated it to avoid the double traversal. It also removes
>> one other traversal which shouldn't be needed. (I have not had time
>> to boot the kernel with it, though, but it does compile...:-)
>

Let me comment on your last sentence first:

> We want an immidate propagation.
There is no such thing as "immediate". The most you can say is that by the 
time you return from setscheduler() everything is taken care of. I agree it 
has to be like that. But I just want to add: _effectively_ taken care of. 
I.e. things get fixed within the time setscheduler() has returned _and_ the 
relevant priorities have had time to run. No, the users should not have to 
worry about the priorities hanging around, but the system can wait with 
fixing them until it is relevant.

> Simply because it does not solve following scenario:
>
> High prio task is blocked on lock and boosts 3 other tasks. Now the
> higher prrio watchdog detects that the high prio task is stuck and
> lowers the priority. You can wake it up as long as you want, the boosted
> task is still busy looping.

Yes, you are right. I have thought about how to fix it, but it will be 
rather ugly. I'll try to see what I can do though, but I am afraid the 
patch might get too big. :-(

(
The same problem may actually always have been present in another corner 
situation:
    A boosts B, which is blocked interruptibly on a lock and boosts C, which 
again boosts D (etc.). A and B are signalled at roughly the same time. A 
wakes up first and decreases the priority of B but not of C, because B runs 
on another CPU; B, now with the lower priority, fixes C. A then stops because 
C is fixed, but D still has A's priority and preempts B, which should fix 
it but now has its low priority back.
  This situation is not that bad, because A has already been in there, and 
therefore the designer has to assume that C, D etc. already got its 
priority. That A is later on interrupted shouldn't matter.
)

May I suggest a compromise:

if (in_interrupt() ||
     ( increasing priority && current->prio < prio ) )
            do it Esben's way;
else
            do it Thomas's way;

Then we will have fixed most of the problems. We still have the problem of 
the high priority watchdog task being unbounded in setscheduler(), though. 
And if it is called from interrupt context and used to decrease the priority, 
it won't work correctly. But it will work for the use in hrtimers, where the 
priority is increased from interrupt context.


> And I do not like the idea of invoking the scheduler to do those
> propagations. setscheduler is a synchronous effect in all other cases.
> So it has to be synchronous in the propagation case too.

See my note above about being "immediate". Synchronous is more precise, but 
the same argument holds.

>
> Preempt-RT and the dynamic priority adjustment of high resolution timers
> is a different playground and we have to think about that seperately.

*nod*
I agree, although we definitely don't want to have too many #ifdef 
CONFIG_PREEMPT_RT around in the code. That makes it far harder to get into 
the mainline tree.

I was bonked for using the other thread for preempt-realtime stuff, so I 
assume this thread is for preempt-realtime stuff only :-)

>
> 	tglx
>
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: 2.6.17-rt1 - mm_struct leak
  2006-06-18  7:06 2.6.17-rt1 Ingo Molnar
                   ` (2 preceding siblings ...)
  2006-06-22  0:57 ` 2.6.17-rt1 Lee Revell
@ 2006-06-23 20:56 ` Vernon Mauery
  2006-06-24  9:24   ` Mark Hounschell
  2006-06-30 16:02   ` [PATCH -RT]Re: " Vernon Mauery
  3 siblings, 2 replies; 51+ messages in thread
From: Vernon Mauery @ 2006-06-23 20:56 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

On Sunday 18 June 2006 00:06, Ingo Molnar wrote:
> i have released the 2.6.17-rt1 tree, which can be downloaded from the
> usual place:

I was given a test case to run that seemed to cause the machine to run the 
OOM-killer after a bunch of iterations.  The test was run like:

$ for ((i=0;i<50000;i++)); do ./test; done

and somewhere in there, the LowFree would drop very low and the test and the 
bash shell running it would get killed.  And then since that didn't free up 
much memory, the machine would become very unresponsive and would have to be 
rebooted.

I don't have the source for the test myself and I am still trying to reproduce 
it, but from what I have gathered, there is a problem cleaning up after 
tasks.  I monitored the machine with slabtop while running the loop and found 
that the size-32, pgd, pgm, and mm_struct slab caches were constantly 
growing while task_struct stayed where it should be.  At one point, there 
were 127 task_structs and 3530 mm_structs.
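
(For reference, a minimal stand-in for the slabtop monitoring -- an 
illustrative sketch, not something from the original report -- could look 
like this; it just prints the /proc/slabinfo line for one cache once a 
second so its growth can be logged alongside the test loop:)

#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	const char *cache = argc > 1 ? argv[1] : "mm_struct";
	char line[512];

	for (;;) {
		FILE *f = fopen("/proc/slabinfo", "r");

		if (!f) {
			perror("/proc/slabinfo");
			return 1;
		}
		while (fgets(line, sizeof(line), f))
			if (!strncmp(line, cache, strlen(cache)))
				fputs(line, stdout);
		fclose(f);
		fflush(stdout);
		sleep(1);
	}
}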

I am still trying to figure out what is going on here, but I thought I might 
throw this out there to see if anyone else has seen anything like this, and 
possibly get pointers on how to track it down.  Right now I am trying to 
trace down a possible mismatch between mmget and mmput in the process exit 
code.  I will let you know what I find.

--Vernon

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: 2.6.17-rt1 - mm_struct leak
  2006-06-23 20:56 ` 2.6.17-rt1 - mm_struct leak Vernon Mauery
@ 2006-06-24  9:24   ` Mark Hounschell
  2006-06-24  9:32     ` Mark Hounschell
  2006-06-30 16:02   ` [PATCH -RT]Re: " Vernon Mauery
  1 sibling, 1 reply; 51+ messages in thread
From: Mark Hounschell @ 2006-06-24  9:24 UTC (permalink / raw)
  To: Vernon Mauery; +Cc: Ingo Molnar, linux-kernel

Vernon Mauery wrote:
> On Sunday 18 June 2006 00:06, Ingo Molnar wrote:
>> i have released the 2.6.17-rt1 tree, which can be downloaded from the
>> usual place:
> 
> I was given a test case to run that seemed to cause the machine to run the 
> OOM-killer after a bunch of iterations.  The test was run like:
> 
> $ for ((i=0;i<50000;i++)); do ./test; done
> 
> and somewhere in there, the LowFree would drop very low and the test and the 
> bash shell running it would get killed.  And then since that didn't free up 
> much memory, the machine would become very unresponsive and would have to be 
> rebooted.
> 
> I don't have the source for the test myself and I am still trying to reproduce 
> it, but from what I have gathered, there is a problem cleaning up after 
> tasks.  I monitored the machine with slabtop while running the loop and found 
> that the size-32, pgd, pgm, and mm_struct slabs caches were constantly 
> growing while task_struct stayed where it should be.  At one point, there 
> were  127 task_structs and 3530 mm_structs.
> 
> I am still trying to figure out what is going on here, but I thought I might 
> throw this out there to see if anyone else has seen anything like this.  And 
> possibly pointers for how to track it down.  Right now I am trying to trace 
> down a possible mismatch between mmget and mmput in the process exit code.  I 
> will let you know what I find.
> 
> --Vernon

I reported a similar, if not the same, problem against 2.6.16-rt23.
See this thread, subject "possible 2.6.16-rt23 OOM problem":

http://lkml.org/lkml/2006/5/25/239

I found a way around the problem, so I didn't pursue it. Maybe "my way around
it" might help in tracking it down.

Mark

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: 2.6.17-rt1 - mm_struct leak
  2006-06-24  9:24   ` Mark Hounschell
@ 2006-06-24  9:32     ` Mark Hounschell
  0 siblings, 0 replies; 51+ messages in thread
From: Mark Hounschell @ 2006-06-24  9:32 UTC (permalink / raw)
  To: dmarkh; +Cc: Vernon Mauery, Ingo Molnar, linux-kernel

Mark Hounschell wrote:
> Vernon Mauery wrote:
>> On Sunday 18 June 2006 00:06, Ingo Molnar wrote:
>>> i have released the 2.6.17-rt1 tree, which can be downloaded from the
>>> usual place:
>> I was given a test case to run that seemed to cause the machine to run the 
>> OOM-killer after a bunch of iterations.  The test was run like:
>>
>> $ for ((i=0;i<50000;i++)); do ./test; done
>>
>> and somewhere in there, the LowFree would drop very low and the test and the 
>> bash shell running it would get killed.  And then since that didn't free up 
>> much memory, the machine would become very unresponsive and would have to be 
>> rebooted.
>>
>> I don't have the source for the test myself and I am still trying to reproduce 
>> it, but from what I have gathered, there is a problem cleaning up after 
>> tasks.  I monitored the machine with slabtop while running the loop and found 
>> that the size-32, pgd, pgm, and mm_struct slabs caches were constantly 
>> growing while task_struct stayed where it should be.  At one point, there 
>> were  127 task_structs and 3530 mm_structs.
>>
>> I am still trying to figure out what is going on here, but I thought I might 
>> throw this out there to see if anyone else has seen anything like this.  And 
>> possibly pointers for how to track it down.  Right now I am trying to trace 
>> down a possible mismatch between mmget and mmput in the process exit code.  I 
>> will let you know what I find.
>>
>> --Vernon
> 
> I reported a similar if not the same problem against 2.6.16-rt23
> See this thread. Subject "possible 2.6.16-rt23 OOM problem"
> 
> http://lkml.org/lkml/2006/5/25/239
> 
> I found a way around the problem so didn't pursue it. Maybe "my way around it"
> might help in tracking it down.
> 
> Mark
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

See this one instead. It's my own follow-up.

http://lkml.org/lkml/2006/6/7/58

Mark

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: More weird latency trace output (was Re: 2.6.17-rt1)
  2006-06-24 22:15       ` Lee Revell
@ 2006-06-24 22:12         ` Ingo Molnar
  2006-06-24 22:31           ` Lee Revell
  2006-06-24 23:49           ` Lee Revell
  0 siblings, 2 replies; 51+ messages in thread
From: Ingo Molnar @ 2006-06-24 22:12 UTC (permalink / raw)
  To: Lee Revell; +Cc: linux-kernel, Thomas Gleixner, Steven Rostedt, john stultz


* Lee Revell <rlrevell@joe-job.com> wrote:

> On Thu, 2006-06-22 at 21:24 -0400, Lee Revell wrote:
> > On Wed, 2006-06-21 at 22:51 -0400, Lee Revell wrote:
> > > How can the latency tracer be reporting 1898us max latency while the
> > > trace is of a 12us latency?  This must be a bug, correct?
> > 
> > I've found the bug.  The latency tracer uses get_cycles() for
> > timestamping, which uses rdtsc, which is unusable for timing on dual
> > core AMD64 machines due to the well known "unsynced TSCs" hardware bug.
> > 
> > Would a patch to convert the latency tracer to use gettimeofday() be
> > acceptable?
> 
> OK, I tried that and it oopses on boot - presumably the latency tracer 
> runs before clocksource infrastructure is initialized.
> 
> Does anyone have any suggestions at all as to what a proper solution 
> would look like?  Is no one interested in this problem?

does it get better if you boot with idle=poll? [that could work around 
the rdtsc drifting problem] Calling gettimeofday() from within the 
tracer is close to impossible - way too many opportunities for 
recursion. It's also pretty slow that way.

	Ingo

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: More weird latency trace output (was Re: 2.6.17-rt1)
  2006-06-23  1:24     ` Lee Revell
@ 2006-06-24 22:15       ` Lee Revell
  2006-06-24 22:12         ` Ingo Molnar
  0 siblings, 1 reply; 51+ messages in thread
From: Lee Revell @ 2006-06-24 22:15 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Thomas Gleixner, Steven Rostedt, john stultz

On Thu, 2006-06-22 at 21:24 -0400, Lee Revell wrote:
> On Wed, 2006-06-21 at 22:51 -0400, Lee Revell wrote:
> > How can the latency tracer be reporting 1898us max latency while the
> > trace is of a 12us latency?  This must be a bug, correct?
> 
> I've found the bug.  The latency tracer uses get_cycles() for
> timestamping, which uses rdtsc, which is unusable for timing on dual
> core AMD64 machines due to the well known "unsynced TSCs" hardware bug.
> 
> Would a patch to convert the latency tracer to use gettimeofday() be
> acceptable?

OK, I tried that and it oopses on boot - presumably the latency tracer
runs before clocksource infrastructure is initialized.

Does anyone have any suggestions at all as to what a proper solution
would look like?  Is no one interested in this problem?

Lee


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: More weird latency trace output (was Re: 2.6.17-rt1)
  2006-06-24 22:12         ` Ingo Molnar
@ 2006-06-24 22:31           ` Lee Revell
  2006-06-24 23:49           ` Lee Revell
  1 sibling, 0 replies; 51+ messages in thread
From: Lee Revell @ 2006-06-24 22:31 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Thomas Gleixner, Steven Rostedt, john stultz

On Sun, 2006-06-25 at 00:12 +0200, Ingo Molnar wrote:
> * Lee Revell <rlrevell@joe-job.com> wrote:
> 
> > On Thu, 2006-06-22 at 21:24 -0400, Lee Revell wrote:
> > > On Wed, 2006-06-21 at 22:51 -0400, Lee Revell wrote:
> > > > How can the latency tracer be reporting 1898us max latency while the
> > > > trace is of a 12us latency?  This must be a bug, correct?
> > > 
> > > I've found the bug.  The latency tracer uses get_cycles() for
> > > timestamping, which uses rdtsc, which is unusable for timing on dual
> > > core AMD64 machines due to the well known "unsynced TSCs" hardware bug.
> > > 
> > > Would a patch to convert the latency tracer to use gettimeofday() be
> > > acceptable?
> > 
> > OK, I tried that and it oopses on boot - presumably the latency tracer 
> > runs before clocksource infrastructure is initialized.
> > 
> > Does anyone have any suggestions at all as to what a proper solution 
> > would look like?  Is no one interested in this problem?
> 
> does it get better if you boot with idle=poll? [that could work around 
> the rdtsc drifting problem] Calling gettimeofday() from within the 
> tracer is close to impossible - way too many opportunities for 
> recursion. It's also pretty slow that way.

Probably, but that seems like an extremely heavy solution.  Won't it
turn my box into a space heater?

I'm thinking of replacing get_cycles() with low-level code to just read
the PM timer.  I can deal with the slowness; this is just a debug
feature after all.
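
(A rough sketch of what that could look like -- an assumption on my part, 
not code from the -rt tree; it presumes the pmtmr_ioport variable probed by 
the i386 ACPI PM timer setup is visible here, and falls back to get_cycles() 
when no PM timer was found:)

#include <linux/types.h>
#include <asm/io.h>
#include <asm/timex.h>			/* get_cycles(), cycles_t */

#define ACPI_PM_MASK	0xffffffUL	/* the PM timer is a 24-bit counter */

extern u32 pmtmr_ioport;		/* probed by the PM timer setup code */

static inline cycles_t trace_timestamp(void)
{
	/*
	 * The PM timer is a single chipset-wide counter at ~3.58 MHz,
	 * so it cannot drift between cores the way the TSC does, but
	 * every read is a slow port-I/O access.
	 */
	if (pmtmr_ioport)
		return inl(pmtmr_ioport) & ACPI_PM_MASK;

	return get_cycles();		/* TSC fallback for early boot */
}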

It seems that there might be a solution involving per CPU data, but I'm
not nearly a good enough kernel hacker to attempt that.

(I hope AMD fixes this issue in a future generation of CPUs.  At least,
they seem to acknowledge that it's a bug...)

Lee


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: More weird latency trace output (was Re: 2.6.17-rt1)
  2006-06-24 22:12         ` Ingo Molnar
  2006-06-24 22:31           ` Lee Revell
@ 2006-06-24 23:49           ` Lee Revell
  1 sibling, 0 replies; 51+ messages in thread
From: Lee Revell @ 2006-06-24 23:49 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Thomas Gleixner, Steven Rostedt, john stultz

On Sun, 2006-06-25 at 00:12 +0200, Ingo Molnar wrote:
> * Lee Revell <rlrevell@joe-job.com> wrote:
> 
> The latency tracer uses get_cycles() for
> > > timestamping, which uses rdtsc, which is unusable for timing on dual
> > > core AMD64 machines

> does it get better if you boot with idle=poll? [that could work around 
> the rdtsc drifting problem] Calling gettimeofday() from within the 
> tracer is close to impossible - way too many opportunities for 
> recursion. It's also pretty slow that way.

That seemed to solve the drifting problem, but I still get some weird
behavior.  The maximum latency reported in dmesg still does not
match /proc/latency_trace:

( posix_cpu_timer-3    |#0): new 38 us maximum-latency wakeup.


preemption latency trace v1.1.5 on 2.6.17-rt1-smp-latency-trace
--------------------------------------------------------------------
 latency: 19 us, #74/74, CPU#0 | (M:rt VP:0, KP:0, SP:1 HP:1 #P:2)
    -----------------
    | task: posix_cpu_timer-3 (uid:0 nice:0 policy:1 rt_prio:99)
    -----------------

Lee


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH -RT]Re: 2.6.17-rt1 - mm_struct leak
  2006-06-23 20:56 ` 2.6.17-rt1 - mm_struct leak Vernon Mauery
  2006-06-24  9:24   ` Mark Hounschell
@ 2006-06-30 16:02   ` Vernon Mauery
  1 sibling, 0 replies; 51+ messages in thread
From: Vernon Mauery @ 2006-06-30 16:02 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1950 bytes --]

On Friday 23 June 2006 13:56, Vernon Mauery wrote:
> On Sunday 18 June 2006 00:06, Ingo Molnar wrote:
> > i have released the 2.6.17-rt1 tree, which can be downloaded from the
> > usual place:
>
> I was given a test case to run that seemed to cause the machine to run the
> OOM-killer after a bunch of iterations.  The test was run like:
>
> $ for ((i=0;i<50000;i++)); do ./test; done
>
> and somewhere in there, the LowFree would drop very low and the test and
> the bash shell running it would get killed.  And then since that didn't
> free up much memory, the machine would become very unresponsive and would
> have to be rebooted.

I found the problem -- it was in the PI futex code -- queue_lock was getting 
called essentially twice in a row and was continually incrementing the 
mm_count ref count, thus causing the memory leak.  Not completely 
understanding all the intricacies of the PI futex code, I just suggested not 
making the second call to queue_lock (not realizing __queue_me unlocks the 
queue).  Dinakar Guniguntala provided a proper fix for the problem that 
simply grabs the spinlock for the hash bucket queue rather than calling 
queue_lock.  His fix is below and applies to 2.6.17-rt4.

Attached is a test case I wrote to reproduce the problem.

--Vernon

The second time we do a queue_lock in futex_lock_pi, we really only need to 
take the hash bucket lock.

Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Signed-off-by: Vernon Mauery <vernux@us.ibm.com>
Acked-by: Paul E. McKenney <paulmck@us.ibm.com>

Index: linux-2.6.17-rt4/kernel/futex.c
===================================================================
--- a/kernel/futex.c	2006-06-30 04:39:29.000000000 -0700
+++ b/kernel/futex.c	2006-06-30 04:46:52.000000000 -0700
@@ -1269,7 +1269,7 @@
 	}
 
 	down_read(&curr->mm->mmap_sem);
-	hb = queue_lock(&q, -1, NULL);
+	spin_lock(q.lock_ptr);
 
 	/*
 	 * Got the lock. We might not be the anticipated owner if we




[-- Attachment #2: mm_leak.c --]
[-- Type: text/plain, Size: 4138 bytes --]

/*    Filename: mm_leak.c
 *      Author: Vernon Mauery <vernux@us.ibm.com>
 * Description: force a kernel memory leak using pi mutexes
 * Compilation: gcc mm_leak.c -lm -L/usr/lib/nptl -lpthread -lrt -D_GNU_SOURCE -I/usr/include/nptl -o mm_leak
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
 *
 * Copyright (C) IBM Corporation, 2006
 *
 * 2006-May-3:	Initial version by Darren Hart <dvhltc@us.ibm.com>
 * 2006-May-4:  Timing fixes reported by Vernon Mauery <vernux@us.ibm.com>
 * 2006-May-4:  Made the med prio threads RT by Darren Hart <dvhltc@us.ibm.com>
 * 2006-May-5:  Modified to use flagging by Vernon Mauery <vernux@us.ibm.com>
 */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>
#include <sched.h>

#define HIGH_PRIO 15
#define ITERATIONS 100

static int count = 0;
static pthread_mutex_t pi_mutex;

struct thread {
	pthread_t pthread;
	pthread_attr_t attr;
};

int create_fifo_thread(struct thread *thread, void*(*func)(void*), void *arg, int prio)
{
	struct sched_param param;

	param.sched_priority = sched_get_priority_min(SCHED_FIFO) + prio;

	pthread_attr_init(&thread->attr);
	pthread_attr_setinheritsched(&thread->attr, PTHREAD_EXPLICIT_SCHED);
	pthread_attr_setschedparam(&thread->attr, &param);
	pthread_attr_setschedpolicy(&thread->attr, SCHED_FIFO);

	if (pthread_create(&thread->pthread, &thread->attr, func, (void*)arg)) {
		perror("pthread_create failed");
		return 1;
	}

	pthread_attr_destroy(&thread->attr);
	return 0;
}

void init_pi_mutex(pthread_mutex_t *m, int use_pi)
{
	pthread_mutexattr_t attr;
	int ret;
	int protocol;

	if (pthread_mutexattr_init(&attr) != 0) {
		printf("Failed to init mutexattr\n");
	}
	if (use_pi) {
		if (pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT) != 0) {
			printf("Can't set protocol prio inherit\n");
		}
	}
	if (pthread_mutexattr_getprotocol(&attr, &protocol) != 0) {
		printf("Can't get mutexattr protocol\n");
	}
	if ((ret = pthread_mutex_init(m, &attr)) != 0) {
		printf("Failed to init mutex: %d\n", ret);
	}
	
	/* FIXME: does any of this need to be destroyed ? */
}

#define THREADS 2
void *thread_one(void *arg)
{
	int iterations = (int)arg;
	while (count < iterations) {
		while (1) {
			pthread_mutex_lock(&pi_mutex);
			if (count % THREADS == 1)
				break;
			pthread_mutex_unlock(&pi_mutex);
			usleep(10);
		}
		printf("1"); fflush(NULL);
		count++;
		pthread_mutex_unlock(&pi_mutex);
	}
	return NULL;
}

void *thread_zero(void *arg) {
	int iterations = (int)arg;
	while (count < iterations) {
		while (1) {
			pthread_mutex_lock(&pi_mutex);
			if (count % THREADS == 0)
				break;
			pthread_mutex_unlock(&pi_mutex);
			usleep(1);
		}
		printf("0"); fflush(NULL);
		count++;
		pthread_mutex_unlock(&pi_mutex);
	}
	return NULL;
}

int main(int argc, char *argv[])
{
	struct thread one;
	int iterations = ITERATIONS;
	int pi = 1, c;

	while ((c=getopt(argc, argv, "i:ph")) >= 0) {
		if (c == 'i') {
			iterations = atoi(optarg);
		} else if (c == 'p') {
			pi = 0;
		} else {
			printf("usage: %s [-i iterations] [-p] [-h]\n", argv[0]);
			printf("\t -i iterations\n");
			printf("\t -p -- do not use pi mutexes\n");
			printf("\t -h -- print this message\n");
			exit(0);
		}
	}

	/* creating the mutex as a PI mutex causes a kernel memory leak */
	init_pi_mutex(&pi_mutex, pi);

	create_fifo_thread(&one, thread_one, (void*)iterations, HIGH_PRIO);

	thread_zero((void*)iterations);

	pthread_join(one.pthread, NULL);
	printf("\n");
	return 0;
}

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1)
  2006-06-23 11:23                                   ` Esben Nielsen
  2006-06-23 11:06                                     ` Steven Rostedt
@ 2006-07-03 11:48                                     ` Esben Nielsen
  1 sibling, 0 replies; 51+ messages in thread
From: Esben Nielsen @ 2006-07-03 11:48 UTC (permalink / raw)
  To: Esben Nielsen
  Cc: Thomas Gleixner, Esben Nielsen, Steven Rostedt, Ingo Molnar, LKML

[-- Attachment #1: Type: TEXT/PLAIN, Size: 24242 bytes --]



On Fri, 23 Jun 2006, Esben Nielsen wrote:

> On Thu, 22 Jun 2006, Thomas Gleixner wrote:
>
>>  On Thu, 2006-06-22 at 19:06 +0100, Esben Nielsen wrote:
>> > > 
>> > >  Thats a seperate issue. Though you are right.
>> > 
>> >  Why not use my original patch and solve both issues?
>> >  I have even updated it to avoid the double traversal. It also removes
>> >  one other traversal which shouldn't be needed. (I have not had time
>> >  to boot the kernel with it, though, but it does compile...:-)
>> 
>
> Let me comment on your last sentence first:
>
>>  We want an immidate propagation.
> There is no such thing as "immidiate". The most you can say is that when you 
> return from setscheduler() everything is taken care off. I agree - it has to 
> be like that. But I just want to add _effectively_ taken care off. I.e. when 
> the within the time setscheduler() has returned _and_ the the relevant 
> priorities have time to run things get fixed. No, the users should not have 
> to worry about the priorities hanging around, but the system can wait with 
> fixing them it is relavant.
>
>>  Simply because it does not solve following scenario:
>>
>>  High prio task is blocked on lock and boosts 3 other tasks. Now the
>>  higher prrio watchdog detects that the high prio task is stuck and
>>  lowers the priority. You can wake it up as long as you want, the boosted
>>  task is still busy looping.
>
> Yes, you are right. I have thought about how to fix it, but it will be rather 
> ugly. I'll try to see what I can do though, but I am afraid the patch might 
> get too big. :-(
>
> (
> The same problem may actually always have been present in another corner 
> situation:
>   A boosts B whichs is blocked interruptible on a lock a boosts C, which 
> again boosts D (etc.). A and B are signalled roughly at the same time. A 
> wakes up first a decrease the priority of B but not C, because the B runs on 
> another CPU, but with lower priority and fixes C. A then stops because C is
> fixed. but D still have the A's priority and preempts B, which should fix it, 
> but has it's low priority back.
> This situation is not bad because A has already been in the and therefore 
> the designer has to think that C D etc already got it's priority. That A is 
> later on interrupted shouldn't matter.
> )
>


I did get around to making something. Besides solving the problems already 
discussed, it solves an obvious problem not yet discussed:

Let us say you have an RT and a non-RT task sharing a mutex on a UP 
machine.
1. The non-RT task takes the mutex.
2. The RT task tries to lock the mutex with a 10 ms timeout.
3. The non-RT task is boosted to the RT task's prio.
4. The non-RT task runs for 20 ms.
5. The non-RT task unlocks the mutex and is deboosted.
6. Now the RT task wakes up - after 20 ms.

The problem is clear: the RT task never gets the CPU, even though it has 
timed out, because the boosted non-RT task has the same prio and is already 
running. The timeout sends the RT task into the scheduler behind the boosted 
non-RT task, because everything is scheduled in FIFO order.
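
(A user-space sketch of this scenario, in the spirit of the mm_leak.c test 
case posted earlier in the thread -- illustrative only and untested; it 
needs root for SCHED_FIFO and a UP machine, or taskset to one CPU, to show 
the effect:)

/* pi_timeout.c -- illustrative only
 * gcc -D_GNU_SOURCE pi_timeout.c -lpthread -lrt -o pi_timeout
 */
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static pthread_mutex_t pi_mutex;

static double now(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* SCHED_OTHER owner: take the mutex and spin for ~20 ms while boosted */
static void *owner(void *arg)
{
	double t0;

	pthread_mutex_lock(&pi_mutex);
	t0 = now();
	while (now() - t0 < 0.020)
		;
	pthread_mutex_unlock(&pi_mutex);
	return NULL;
}

/* SCHED_FIFO waiter: timed lock with a 10 ms timeout */
static void *rt_waiter(void *arg)
{
	struct timespec abs;
	double t0 = now();

	clock_gettime(CLOCK_REALTIME, &abs);
	abs.tv_nsec += 10 * 1000 * 1000;	/* +10 ms, nsec overflow ignored */

	if (pthread_mutex_timedlock(&pi_mutex, &abs) == ETIMEDOUT)
		printf("timed out, noticed after %.1f ms\n", (now() - t0) * 1e3);
	else
		pthread_mutex_unlock(&pi_mutex);
	return NULL;
}

int main(void)
{
	pthread_t o, w;
	pthread_attr_t fifo;
	struct sched_param sp = { .sched_priority = 15 };
	pthread_mutexattr_t mattr;

	pthread_mutexattr_init(&mattr);
	pthread_mutexattr_setprotocol(&mattr, PTHREAD_PRIO_INHERIT);
	pthread_mutex_init(&pi_mutex, &mattr);

	pthread_create(&o, NULL, owner, NULL);	/* stays SCHED_OTHER */
	usleep(1000);				/* let the owner grab the mutex */

	pthread_attr_init(&fifo);
	pthread_attr_setinheritsched(&fifo, PTHREAD_EXPLICIT_SCHED);
	pthread_attr_setschedpolicy(&fifo, SCHED_FIFO);
	pthread_attr_setschedparam(&fifo, &sp);
	pthread_create(&w, &fifo, rt_waiter, NULL);

	pthread_join(o, NULL);
	pthread_join(w, NULL);
	return 0;
}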

Notice: I have not yet tested this, because I haven't gotten around to 
getting a glibc with support for PI futexes. Where do I download that, by 
the way?

So what does my patch do:

1) A new priority attribute, boosting_prio, is added to the task struct. 
This priority is the maximum priority the task is boosting another task to. 
Neither the PI code itself nor setscheduler() will ever lower the priority 
below this point. boosting_prio is initialized to MAX_PRIO and then only 
raised (numerically lowered :-), never lowered, each time the task boosts 
another task, until the task leaves the lock() operation due to a signal or 
timeout, or because it is woken up.

2) The scheduler is modified such that tasks with boosting_prio!=MAX_PRIO 
are scheduled in LIFO order rather than FIFO. In other words: tasks which 
are inside an rtmutex/lock locking operation effectively get their priority 
raised by half a step and preempt other tasks having the same priority.

The net effect of all this is that the priority deboosting, which is 
needed when a task times out, is signalled, or has setscheduler() called on 
it while it is in a lock operation, runs in the task's own context, and the 
task is guaranteed to be able to preempt any of the tasks it has boosted.
If the task does get the lock it loses this advantage: it will not be woken 
up in LIFO order, because it stops boosting anything before being woken up.
An odd effect is that after setscheduler() is used to lower a task's 
priority, it stays at a seemingly high priority while it is still blocked, 
but the task actually running in the critical section has the correct low 
priority. This is because boosting_prio is only set back to MAX_PRIO when 
the task leaves the lock operation.

Missing:
task->boosting_prio is protected by task->pi_lock. However, the scheduler 
reads task->boosting_prio without holding that lock. I do not know enough 
about memory barriers to be able to insert them correctly. It might be that 
the implicit memory barriers from the pi_lock and the runqueue lock are 
enough for all actual paths through the code.

I used Thomas's patch for the rt-tester, slightly modified. This is 
supposed to be applied first, as you really ought to apply the test before 
fixing the problem, to at least see that your test catches the problem.
I have attached my slightly modified version; Thomas's version might be ok 
too, but then the main patch might not apply cleanly to the rt-tester 
scripts.

I will try to inline the main patch here so we can discuss it. I hope the 
whitespace is ok.

Esben


Index: linux-2.6.17-rt4/kernel/rtmutex.c
===================================================================
--- linux-2.6.17-rt4.orig/kernel/rtmutex.c
+++ linux-2.6.17-rt4/kernel/rtmutex.c
@@ -121,7 +121,7 @@ static inline void init_lists(struct rt_
   * Return task->normal_prio when the waiter list is empty or when
   * the waiter is not allowed to do priority boosting
   */
-int rt_mutex_getprio(struct task_struct *task)
+static int rt_mutex_getprio_real(struct task_struct *task)
  {
  	if (likely(!task_has_pi_waiters(task)))
  		return task->normal_prio;
@@ -130,6 +130,11 @@ int rt_mutex_getprio(struct task_struct
  		   task->normal_prio);
  }

+int rt_mutex_getprio(struct task_struct *task)
+{
+	return min(rt_mutex_getprio_real(task), task->boosting_prio);
+}
+
  /*
   * Adjust the priority of a task, after its pi_waiters got modified.
   *
@@ -152,11 +157,12 @@ static void __rt_mutex_adjust_prio(struc
   * of task. We do not use the spin_xx_mutex() variants here as we are
   * outside of the debug path.)
   */
-static void rt_mutex_adjust_prio(struct task_struct *task)
+static void rt_mutex_adjust_prio_final(struct task_struct *task)
  {
  	unsigned long flags;

  	spin_lock_irqsave(&task->pi_lock, flags);
+	task->boosting_prio = MAX_PRIO;
  	__rt_mutex_adjust_prio(task);
  	spin_unlock_irqrestore(&task->pi_lock, flags);
  }
@@ -232,7 +238,8 @@ static int rt_mutex_adjust_prio_chain(ta
  	 * When deadlock detection is off then we check, if further
  	 * priority adjustment is necessary.
  	 */
-	if (!detect_deadlock && waiter->list_entry.prio == task->prio)
+	if (!detect_deadlock &&
+	    waiter->list_entry.prio == rt_mutex_getprio_real(task))
  		goto out_unlock_pi;

  	lock = waiter->lock;
@@ -254,7 +261,9 @@ static int rt_mutex_adjust_prio_chain(ta

  	/* Requeue the waiter */
  	plist_del(&waiter->list_entry, &lock->wait_list);
-	waiter->list_entry.prio = task->prio;
+	waiter->list_entry.prio = rt_mutex_getprio_real(task);
+	task->boosting_prio = min(task->boosting_prio,
+				  waiter->list_entry.prio);
  	plist_add(&waiter->list_entry, &lock->wait_list);

  	/* Release the task */
@@ -300,6 +309,18 @@ static int rt_mutex_adjust_prio_chain(ta
  }

  /*
+ * Calls the rt_mutex_adjust_prio_chain() above
+ * whith unlocking and locking lock->wait_lock.
+ */
+static void rt_mutex_adjust_prio_chain_unlock(struct rt_mutex *lock)
+{
+	spin_unlock(&lock->wait_lock);
+	get_task_struct(current);
+	rt_mutex_adjust_prio_chain(current, 0, NULL, NULL __IP__);
+	spin_lock(&lock->wait_lock);
+}
+
+/*
   * Optimization: check if we can steal the lock from the
   * assigned pending owner [which might not have taken the
   * lock yet]:
@@ -309,6 +330,7 @@ static inline int try_to_steal_lock(stru
  	struct task_struct *pendowner = rt_mutex_owner(lock);
  	struct rt_mutex_waiter *next;
  	unsigned long flags;
+	int current_prio;

  	if (!rt_mutex_owner_pending(lock))
  		return 0;
@@ -316,8 +338,12 @@ static inline int try_to_steal_lock(stru
  	if (pendowner == current)
  		return 1;

+	spin_lock_irqsave(&current->pi_lock, flags);
+	current_prio = rt_mutex_getprio_real(current);
+	spin_unlock_irqrestore(&current->pi_lock, flags);
+
  	spin_lock_irqsave(&pendowner->pi_lock, flags);
-	if (current->prio >= pendowner->prio) {
+	if (current_prio >= rt_mutex_getprio_real(pendowner)) {
  		spin_unlock_irqrestore(&pendowner->pi_lock, flags);
  		return 0;
  	}
@@ -335,6 +361,7 @@ static inline int try_to_steal_lock(stru
  	/* No chain handling, pending owner is not blocked on anything: */
  	next = rt_mutex_top_waiter(lock);
  	plist_del(&next->pi_list_entry, &pendowner->pi_waiters);
+	pendowner->boosting_prio = MAX_PRIO;
  	__rt_mutex_adjust_prio(pendowner);
  	spin_unlock_irqrestore(&pendowner->pi_lock, flags);

@@ -426,8 +453,8 @@ static int task_blocks_on_rt_mutex(struc
  	__rt_mutex_adjust_prio(current);
  	waiter->task = current;
  	waiter->lock = lock;
-	plist_node_init(&waiter->list_entry, current->prio);
-	plist_node_init(&waiter->pi_list_entry, current->prio);
+	plist_node_init(&waiter->list_entry, rt_mutex_getprio_real(current));
+	plist_node_init(&waiter->pi_list_entry, waiter->list_entry.prio);

  	/* Get the top priority waiter on the lock */
  	if (rt_mutex_has_waiters(lock))
@@ -435,7 +462,8 @@ static int task_blocks_on_rt_mutex(struc
  	plist_add(&waiter->list_entry, &lock->wait_list);

  	current->pi_blocked_on = waiter;
-
+	current->boosting_prio = min(current->boosting_prio,
+				     waiter->list_entry.prio);
  	spin_unlock_irqrestore(&current->pi_lock, flags);

  	if (waiter == rt_mutex_top_waiter(lock)) {
@@ -499,7 +527,6 @@ static void wakeup_next_waiter(struct rt
  	plist_del(&waiter->pi_list_entry, &current->pi_waiters);
  	pendowner = waiter->task;
  	waiter->task = NULL;
-
  	rt_mutex_set_owner(lock, pendowner, RT_MUTEX_OWNER_PENDING);

  	spin_unlock_irqrestore(&current->pi_lock, flags);
@@ -518,6 +545,7 @@ static void wakeup_next_waiter(struct rt
  	WARN_ON(pendowner->pi_blocked_on->lock != lock);

  	pendowner->pi_blocked_on = NULL;
+	pendowner->boosting_prio = MAX_PRIO;

  	if (rt_mutex_has_waiters(lock)) {
  		struct rt_mutex_waiter *next;
@@ -525,6 +553,7 @@ static void wakeup_next_waiter(struct rt
  		next = rt_mutex_top_waiter(lock);
  		plist_add(&next->pi_list_entry, &pendowner->pi_waiters);
  	}
+	__rt_mutex_adjust_prio(pendowner);
  	spin_unlock_irqrestore(&pendowner->pi_lock, flags);

  	if (savestate)
@@ -539,7 +568,8 @@ static void wakeup_next_waiter(struct rt
   * Must be called with lock->wait_lock held
   */
  static void remove_waiter(struct rt_mutex *lock,
-			  struct rt_mutex_waiter *waiter  __IP_DECL__)
+			  struct rt_mutex_waiter *waiter
+			  __IP_DECL__)
  {
  	int first = (waiter == rt_mutex_top_waiter(lock));
  	int boost = 0;
@@ -621,7 +651,8 @@ static void fastcall noinline __sched
  rt_lock_slowlock(struct rt_mutex *lock __IP_DECL__)
  {
  	struct rt_mutex_waiter waiter;
-	unsigned long saved_state, state;      ;
+	unsigned long saved_state, state;
+	int adjust_prio_final = 0;

  	debug_rt_mutex_init_waiter(&waiter);
  	waiter.task = NULL;
@@ -661,12 +692,17 @@ rt_lock_slowlock(struct rt_mutex *lock _
  		 * when we have been woken up by the previous owner
  		 * but the lock got stolen by an higher prio task.
  		 */
-		if (!waiter.task) {
+		if (!waiter.task)
  			task_blocks_on_rt_mutex(lock, &waiter, 0 __IP__);
  			/* Wakeup during boost ? */
-			if (unlikely(!waiter.task))
+		else
+			rt_mutex_adjust_prio_chain_unlock(lock);
+
+		/*
+		 * We got it while lock->wait_lock was unlocked ?
+		 */
+		if (unlikely(!waiter.task))
  				continue;
-		}

  		/*
  		 * Prevent schedule() to drop BKL, while waiting for
@@ -699,8 +735,10 @@ rt_lock_slowlock(struct rt_mutex *lock _
  	 * highest-prio waiter (due to SCHED_OTHER changing prio), then we
  	 * can end up with a non-NULL waiter.task:
  	 */
-	if (unlikely(waiter.task))
-		remove_waiter(lock, &waiter __IP__);
+	if (unlikely(waiter.task)) {
+		remove_waiter(lock, &waiter  __IP__);
+		adjust_prio_final = 1;
+	}
  	/*
  	 * try_to_take_rt_mutex() sets the waiter bit
  	 * unconditionally. We might have to fix that up:
@@ -709,7 +747,11 @@ rt_lock_slowlock(struct rt_mutex *lock _

  	spin_unlock(&lock->wait_lock);

+	if (adjust_prio_final)
+		rt_mutex_adjust_prio_final(current);
+
  	debug_rt_mutex_free_waiter(&waiter);
+	BUG_ON(current->boosting_prio != MAX_PRIO);
  }

  /*
@@ -735,7 +777,8 @@ rt_lock_slowunlock(struct rt_mutex *lock
  	spin_unlock(&lock->wait_lock);

  	/* Undo pi boosting.when necessary */
-	rt_mutex_adjust_prio(current);
+	rt_mutex_adjust_prio_final(current);
+	BUG_ON(current->boosting_prio != MAX_PRIO);
  }

  void __lockfunc rt_lock(struct rt_mutex *lock)
@@ -795,6 +838,7 @@ rt_mutex_slowlock(struct rt_mutex *lock,
  {
  	int ret = 0, saved_lock_depth = -1;
  	struct rt_mutex_waiter waiter;
+	int adjust_prio_final = 0;

  	debug_rt_mutex_init_waiter(&waiter);
  	waiter.task = NULL;
@@ -848,21 +892,28 @@ rt_mutex_slowlock(struct rt_mutex *lock,
  		 * waiter.task is NULL the first time we come here and
  		 * when we have been woken up by the previous owner
  		 * but the lock got stolen by a higher prio task.
+		 *
+		 * If we get another kind of  wakeup but can't get the lock
+		 * we should try to adjust the priorities
  		 */
-		if (!waiter.task) {
+		if (!waiter.task)
  			ret = task_blocks_on_rt_mutex(lock, &waiter,
  						      detect_deadlock __IP__);
-			/*
-			 * If we got woken up by the owner then start loop
-			 * all over without going into schedule to try
-			 * to get the lock now:
-			 */
-			if (unlikely(!waiter.task))
-				continue;
+		else
+			rt_mutex_adjust_prio_chain_unlock(lock);
+
+
+		/*
+		 * If we got woken up by the owner then start loop
+		 * all over without going into schedule to try
+		 * to get the lock now:
+		 */
+		if (unlikely(!waiter.task))
+			continue;
+
+		if (unlikely(ret))
+			break;

-			if (unlikely(ret))
-				break;
-		}
  		saved_flags = current->flags & PF_NOSCHED;
  		current->flags &= ~PF_NOSCHED;

@@ -881,8 +932,13 @@ rt_mutex_slowlock(struct rt_mutex *lock,

  	set_current_state(TASK_RUNNING);

-	if (unlikely(waiter.task))
+	if (unlikely(waiter.task)) {
  		remove_waiter(lock, &waiter __IP__);
+		adjust_prio_final = 1;
+	}
+	else if (unlikely(ret))
+		adjust_prio_final = 1;
+

  	/*
  	 * try_to_take_rt_mutex() sets the waiter bit
@@ -892,17 +948,13 @@ rt_mutex_slowlock(struct rt_mutex *lock,

  	spin_unlock(&lock->wait_lock);

+	if (adjust_prio_final)
+		rt_mutex_adjust_prio_final(current);
+
  	/* Remove pending timer: */
  	if (unlikely(timeout))
  		hrtimer_cancel(&timeout->timer);

-	/*
-	 * Readjust priority, when we did not get the lock. We might
-	 * have been the pending owner and boosted. Since we did not
-	 * take the lock, the PI boost has to go.
-	 */
-	if (unlikely(ret))
-		rt_mutex_adjust_prio(current);

  	/* Must we reaquire the BKL? */
  	if (unlikely(saved_lock_depth >= 0))
@@ -910,6 +962,8 @@ rt_mutex_slowlock(struct rt_mutex *lock,

  	debug_rt_mutex_free_waiter(&waiter);

+	BUG_ON(current->boosting_prio != MAX_PRIO);
+
  	return ret;
  }

@@ -962,7 +1016,8 @@ rt_mutex_slowunlock(struct rt_mutex *loc
  	spin_unlock(&lock->wait_lock);

  	/* Undo pi boosting if necessary: */
-	rt_mutex_adjust_prio(current);
+	rt_mutex_adjust_prio_final(current);
+	BUG_ON(current->boosting_prio != MAX_PRIO);
  }

  /*
Index: linux-2.6.17-rt4/kernel/sched.c
===================================================================
--- linux-2.6.17-rt4.orig/kernel/sched.c
+++ linux-2.6.17-rt4/kernel/sched.c
@@ -57,6 +57,8 @@

  #include <asm/unistd.h>

+#include "rtmutex_common.h"
+
  /*
   * Convert user-nice values [ -20 ... 0 ... 19 ]
   * to static priority [ MAX_RT_PRIO..MAX_PRIO-1 ],
@@ -162,9 +164,8 @@
  	(JIFFIES_TO_NS(MAX_SLEEP_AVG * \
  		(MAX_BONUS / 2 + DELTA((p)) + 1) / MAX_BONUS - 1))

-#define TASK_PREEMPTS_CURR(p, rq) \
-	((p)->prio < (rq)->curr->prio)
-
+#define TASK_PREEMPTS(p,q) task_preempts(p,q)
+#define TASK_PREEMPTS_CURR(p, rq) TASK_PREEMPTS(p, (rq)->curr)
  /*
   * task_timeslice() scales user-nice values [ -20 ... 0 ... 19 ]
   * to time slice values: [800ms ... 100ms ... 5ms]
@@ -399,6 +400,22 @@ static inline void finish_lock_switch(ru
  }
  #endif /* __ARCH_WANT_UNLOCKED_CTXSW */

+
+static inline int task_preempts(task_t *p, task_t *q)
+{
+	if (p->prio < q->prio)
+		return 1;
+
+#ifdef CONFIG_RT_MUTEXES
+	if (p->prio > q->prio)
+		return 0;
+
+	return p->boosting_prio != MAX_PRIO-1;
+#else
+	return 0;
+#endif
+}
+
  /*
   * task_rq_lock - lock the runqueue a given task resides on and disable
   * interrupts.  Note the ordering: we can safely lookup the task_rq without
@@ -646,7 +663,9 @@ static inline void sched_info_switch(tas
  #define sched_info_switch(t, next)	do { } while (0)
  #endif /* CONFIG_SCHEDSTATS */

+#ifdef CONFIG_SMP
  static __cacheline_aligned_in_smp atomic_t rt_overload;
+#endif

  static inline void inc_rt_tasks(task_t *p, runqueue_t *rq)
  {
@@ -710,7 +729,13 @@ static void enqueue_task(struct task_str
  		dump_stack();
  	}
  	sched_info_queued(p);
-	list_add_tail(&p->run_list, array->queue + p->prio);
+#ifdef CONFIG_RT_MUTEXES
+	if (p->boosting_prio != MAX_PRIO)
+		list_add(&p->run_list, array->queue + p->prio);
+	else
+#endif
+		list_add_tail(&p->run_list, array->queue + p->prio);
+
  	__set_bit(p->prio, array->bitmap);
  	array->nr_active++;
  	p->array = array;
@@ -723,7 +748,12 @@ static void enqueue_task(struct task_str
   */
  static void requeue_task(struct task_struct *p, prio_array_t *array)
  {
-	list_move_tail(&p->run_list, array->queue + p->prio);
+#ifdef CONFIG_RT_MUTEXES
+	if (p->boosting_prio != MAX_PRIO)
+		list_move(&p->run_list, array->queue + p->prio);
+	else
+#endif
+		list_move_tail(&p->run_list, array->queue + p->prio);
  }

  static inline void enqueue_task_head(struct task_struct *p, prio_array_t *array)
@@ -1313,7 +1343,7 @@ static void balance_rt_tasks(runqueue_t
  		 * Do we have an RT task that preempts
  		 * the to-be-scheduled task?
  		 */
-		if (p && (p->prio < next->prio)) {
+		if (p && TASK_PREEMPTS(p, next)) {
  			WARN_ON(p == src_rq->curr);
  			WARN_ON(!p->array);
  			schedstat_inc(this_rq, rto_pulled);
@@ -1473,10 +1503,11 @@ static inline int wake_idle(int cpu, tas
  static int try_to_wake_up(task_t *p, unsigned int state, int sync, int mutex)
  {
  	int cpu, this_cpu, success = 0;
-	runqueue_t *this_rq, *rq;
+	runqueue_t *rq;
  	unsigned long flags;
  	long old_state;
  #ifdef CONFIG_SMP
+	runqueue_t *this_rq;
  	unsigned long load, this_load;
  	struct sched_domain *sd, *this_sd = NULL;
  	int new_cpu;
@@ -4351,6 +4382,14 @@ int setscheduler(struct task_struct *p,
  			resched_task(rq->curr);
  	}
  	task_rq_unlock(rq, &flags);
+
+	/*
+	 * If the process is blocked on rt-mutex, it will now wake up and
+	 * reinsert itself into the wait list and boost the owner correctly
+	 */
+	if (p->pi_blocked_on)
+		wake_up_process_mutex(p);
+
  	spin_unlock_irqrestore(&p->pi_lock, fp);
  	return 0;
  }
@@ -7086,4 +7125,3 @@ void notrace preempt_enable_no_resched(v

  EXPORT_SYMBOL(preempt_enable_no_resched);
  #endif
-
Index: linux-2.6.17-rt4/include/linux/sched.h
===================================================================
--- linux-2.6.17-rt4.orig/include/linux/sched.h
+++ linux-2.6.17-rt4/include/linux/sched.h
@@ -994,6 +994,13 @@ struct task_struct {
  	struct plist_head pi_waiters;
  	/* Deadlock detection and priority inheritance handling */
  	struct rt_mutex_waiter *pi_blocked_on;
+
+	/*
+	 * Priority to which this task is boosting something. Don't lower the
+	 * priority below this before the boosting is resolved;
+	 */
+	int boosting_prio;
+
  # ifdef CONFIG_DEBUG_RT_MUTEXES
  	raw_spinlock_t held_list_lock;
  	struct list_head held_list_head;
Index: linux-2.6.17-rt4/scripts/rt-tester/t5-l4-pi-boost-deboost.tst
===================================================================
--- linux-2.6.17-rt4.orig/scripts/rt-tester/t5-l4-pi-boost-deboost.tst
+++ linux-2.6.17-rt4/scripts/rt-tester/t5-l4-pi-boost-deboost.tst
@@ -113,22 +113,22 @@ T: prioeq:		3:	84
  C: signal:		4: 	0
  W: unlocked:		4: 	3
  T: prioeq:		0: 	83
-T: prioeq:		1:	83
-T: prioeq:		2:	83
-T: prioeq:		3:	83
+T: prioeq:		1:	84
+T: prioeq:		2:	84
+T: prioeq:		3:	84

  # Signal T3
  C: signal:		3: 	0
  W: unlocked:		3: 	2
  T: prioeq:		0: 	82
-T: prioeq:		1:	82
-T: prioeq:		2:	82
+T: prioeq:		1:	84
+T: prioeq:		2:	84

  # Signal T2
  C: signal:		2: 	0
  W: unlocked:		2: 	1
  T: prioeq:		0: 	81
-T: prioeq:		1:	81
+T: prioeq:		1:	84

  # Signal T1
  C: signal:		1: 	0
Index: linux-2.6.17-rt4/kernel/fork.c
===================================================================
--- linux-2.6.17-rt4.orig/kernel/fork.c
+++ linux-2.6.17-rt4/kernel/fork.c
@@ -955,6 +955,7 @@ static inline void rt_mutex_init_task(st
  	spin_lock_init(&p->pi_lock);
  	plist_head_init(&p->pi_waiters, &p->pi_lock);
  	p->pi_blocked_on = NULL;
+	p->boosting_prio = MAX_PRIO;
  # ifdef CONFIG_DEBUG_RT_MUTEXES
  	spin_lock_init(&p->held_list_lock);
  	INIT_LIST_HEAD(&p->held_list_head);
Index: linux-2.6.17-rt4/include/linux/rtmutex.h
===================================================================
--- linux-2.6.17-rt4.orig/include/linux/rtmutex.h
+++ linux-2.6.17-rt4/include/linux/rtmutex.h
@@ -116,6 +116,7 @@ extern void rt_mutex_unlock(struct rt_mu
  # define INIT_RT_MUTEXES(tsk)						\
  	.pi_waiters	= PLIST_HEAD_INIT(tsk.pi_waiters, tsk.pi_lock),	\
  	.pi_lock	= RAW_SPIN_LOCK_UNLOCKED,			\
+        .boosting_prio  = MAX_PRIO                                      \
  	INIT_RT_MUTEX_DEBUG(tsk)
  #else
  # define INIT_RT_MUTEXES(tsk)
Index: linux-2.6.17-rt4/scripts/rt-tester/t5-l4-pi-boost-deboost-setsched.tst
===================================================================
--- linux-2.6.17-rt4.orig/scripts/rt-tester/t5-l4-pi-boost-deboost-setsched.tst
+++ linux-2.6.17-rt4/scripts/rt-tester/t5-l4-pi-boost-deboost-setsched.tst
@@ -107,10 +107,10 @@ T: prioeq:		3:	84
  # Reduce prio of T4
  C: schedfifo:		4: 	80
  T: prioeq:		0: 	83
-T: prioeq:		1:	83
-T: prioeq:		2:	83
-T: prioeq:		3:	83
-T: prioeq:		4:	80
+T: prioeq:		1:	84
+T: prioeq:		2:	84
+T: prioeq:		3:	84
+T: prioeq:		4:	84

  # Increase prio of T4
  C: schedfifo:		4: 	84
@@ -139,36 +139,46 @@ T: prioeq:		4:	84
  # Reduce prio of T3
  C: schedfifo:		3: 	83
  T: prioeq:		0: 	84
-T: prioeq:		1:	84
-T: prioeq:		2:	84
-T: prioeq:		3:	84
+T: prioeq:		1:	85
+T: prioeq:		2:	85
+T: prioeq:		3:	85
  T: prioeq:		4:	84

  # Signal T4
  C: signal:		4: 	0
  W: unlocked:		4: 	3
  T: prioeq:		0: 	83
-T: prioeq:		1:	83
-T: prioeq:		2:	83
-T: prioeq:		3:	83
+T: prioeq:		1:	85
+T: prioeq:		2:	85
+T: prioeq:		3:	85
+T: prioeq:              4:      84

  # Signal T3
  C: signal:		3: 	0
  W: unlocked:		3: 	2
  T: prioeq:		0: 	82
-T: prioeq:		1:	82
-T: prioeq:		2:	82
+T: prioeq:		1:	85
+T: prioeq:		2:	85
+T: prioeq:		3:	83
+T: prioeq:              4:      84

  # Signal T2
  C: signal:		2: 	0
  W: unlocked:		2: 	1
  T: prioeq:		0: 	81
-T: prioeq:		1:	81
+T: prioeq:		1:	85
+T: prioeq:		2:	82
+T: prioeq:		3:	83
+T: prioeq:              4:      84

  # Signal T1
  C: signal:		1: 	0
  W: unlocked:		1: 	0
  T: priolt:		0: 	1
+T: prioeq:		1:	81
+T: prioeq:		2:	82
+T: prioeq:		3:	83
+T: prioeq:              4:      84

  # Unlock and exit
  C: unlock:		3:	3
@@ -180,3 +190,9 @@ W: unlocked:		3:	3
  W: unlocked:		2:	2
  W: unlocked:		1:	1
  W: unlocked:		0:	0
+
+# Reset the -4 opcode from the signal
+C: reset:               1:      0
+C: reset:               2:      0
+C: reset:               3:      0
+C: reset:               4:      0
\ No newline at end of file

[-- Attachment #2: Type: TEXT/PLAIN, Size: 8691 bytes --]

Make the test suite issue setscheduler calls asynchronously. Remove the waits
in the test cases and add a new test case to verify the correctness of the
setscheduler priority propagation.
This is based on, and almost identical to, the patch Thomas Gleixner sent out
earlier.
Index: linux-2.6.17-rt4/kernel/rtmutex-tester.c
===================================================================
--- linux-2.6.17-rt4.orig/kernel/rtmutex-tester.c
+++ linux-2.6.17-rt4/kernel/rtmutex-tester.c
@@ -55,7 +55,6 @@ enum test_opcodes {
 
 static int handle_op(struct test_thread_data *td, int lockwakeup)
 {
-	struct sched_param schedpar;
 	int i, id, ret = -EINVAL;
 
 	switch(td->opcode) {
@@ -63,17 +62,6 @@ static int handle_op(struct test_thread_
 	case RTTEST_NOP:
 		return 0;
 
-	case RTTEST_SCHEDOT:
-		schedpar.sched_priority = 0;
-		ret = sched_setscheduler(current, SCHED_NORMAL, &schedpar);
-		if (!ret)
-			set_user_nice(current, 0);
-		return ret;
-
-	case RTTEST_SCHEDRT:
-		schedpar.sched_priority = td->opdata;
-		return sched_setscheduler(current, SCHED_FIFO, &schedpar);
-
 	case RTTEST_LOCKCONT:
 		td->mutexes[td->opdata] = 1;
 		td->event = atomic_add_return(1, &rttest_event);
@@ -310,9 +298,10 @@ static int test_func(void *data)
 static ssize_t sysfs_test_command(struct sys_device *dev, const char *buf,
 				  size_t count)
 {
+	struct sched_param schedpar;
 	struct test_thread_data *td;
 	char cmdbuf[32];
-	int op, dat, tid;
+	int op, dat, tid, ret;
 
 	td = container_of(dev, struct test_thread_data, sysdev);
 	tid = td->sysdev.id;
@@ -334,6 +323,22 @@ static ssize_t sysfs_test_command(struct
 		return -EINVAL;
 
 	switch (op) {
+	case RTTEST_SCHEDOT:
+		schedpar.sched_priority = 0;
+		ret = sched_setscheduler(threads[tid], SCHED_NORMAL,
+					 &schedpar);
+		if (ret)
+			return ret;
+		set_user_nice(current, 0);
+		break;
+
+	case RTTEST_SCHEDRT:
+		schedpar.sched_priority = dat;
+		ret = sched_setscheduler(threads[tid], SCHED_FIFO, &schedpar);
+		if (ret)
+			return ret;
+		break;
+
 	case RTTEST_SIGNAL:
 		send_sig(SIGHUP, threads[tid], 0);
 		break;
Index: linux-2.6.17-rt4/scripts/rt-tester/reset-tester.py
===================================================================
--- /dev/null
+++ linux-2.6.17-rt4/scripts/rt-tester/reset-tester.py
@@ -0,0 +1,18 @@
+#!/usr/bin/env python
+
+sysfsprefix = "/sys/devices/system/rttest/rttest"
+statusfile = "/status"
+commandfile = "/command"
+
+for i in range(0,8):
+    cmdstr = "%s:%s" %("99", "0")
+    fname = "%s%d%s" %(sysfsprefix, i, commandfile)
+
+    try:
+        fcmd = open(fname, 'w')
+        fcmd.write(cmdstr)
+        fcmd.close()
+    except Exception,ex:
+        print i
+        print ex
+
Index: linux-2.6.17-rt4/scripts/rt-tester/t2-l1-signal.tst
===================================================================
--- linux-2.6.17-rt4.orig/scripts/rt-tester/t2-l1-signal.tst
+++ linux-2.6.17-rt4/scripts/rt-tester/t2-l1-signal.tst
@@ -77,3 +77,6 @@ T: opcodeeq:		1:	-4
 # Unlock and exit
 C: unlock:		0: 	0
 W: unlocked:		0: 	0
+
+# Reset the -4 opcode from the signal
+C: reset:               1:      0
\ No newline at end of file
Index: linux-2.6.17-rt4/scripts/rt-tester/t3-l1-pi-signal.tst
===================================================================
--- linux-2.6.17-rt4.orig/scripts/rt-tester/t3-l1-pi-signal.tst
+++ linux-2.6.17-rt4/scripts/rt-tester/t3-l1-pi-signal.tst
@@ -98,4 +98,5 @@ C: unlock:		1: 	0
 W: unlocked:		1: 	0
 
 
-
+# Reset the -4 opcode from the signal
+C: reset:               2:      0
\ No newline at end of file
Index: linux-2.6.17-rt4/scripts/rt-tester/t4-l2-pi-deboost.tst
===================================================================
--- linux-2.6.17-rt4.orig/scripts/rt-tester/t4-l2-pi-deboost.tst
+++ linux-2.6.17-rt4/scripts/rt-tester/t4-l2-pi-deboost.tst
@@ -125,3 +125,5 @@ W: unlocked:		2:	1
 C: unlock:		0:	0
 W: unlocked:		0:	0
 
+# Reset the -4 opcode from the signal
+C: reset:               3:      0
\ No newline at end of file
Index: linux-2.6.17-rt4/scripts/rt-tester/t5-l4-pi-boost-deboost-setsched.tst
===================================================================
--- /dev/null
+++ linux-2.6.17-rt4/scripts/rt-tester/t5-l4-pi-boost-deboost-setsched.tst
@@ -0,0 +1,182 @@
+#
+# rt-mutex test
+#
+# Op: C(ommand)/T(est)/W(ait)
+# |  opcode
+# |  |     threadid: 0-7
+# |  |     |  opcode argument
+# |  |     |  |
+# C: lock: 0: 0
+#
+# Commands
+#
+# opcode	opcode argument
+# schedother	nice value
+# schedfifo	priority
+# lock		lock nr (0-7)
+# locknowait	lock nr (0-7)
+# lockint	lock nr (0-7)
+# lockintnowait	lock nr (0-7)
+# lockcont	lock nr (0-7)
+# unlock	lock nr (0-7)
+# lockbkl	lock nr (0-7)
+# unlockbkl	lock nr (0-7)
+# signal	thread to signal (0-7)
+# reset		0
+# resetevent	0
+#
+# Tests / Wait
+#
+# opcode	opcode argument
+#
+# prioeq	priority
+# priolt	priority
+# priogt	priority
+# nprioeq	normal priority
+# npriolt	normal priority
+# npriogt	normal priority
+# locked	lock nr (0-7)
+# blocked	lock nr (0-7)
+# blockedwake	lock nr (0-7)
+# unlocked	lock nr (0-7)
+# lockedbkl	dont care
+# blockedbkl	dont care
+# unlockedbkl	dont care
+# opcodeeq	command opcode or number
+# opcodelt	number
+# opcodegt	number
+# eventeq	number
+# eventgt	number
+# eventlt	number
+
+#
+# 5 threads 4 lock PI - modify priority of blocked threads
+#
+C: resetevent:		0: 	0
+W: opcodeeq:		0: 	0
+
+# Set schedulers
+C: schedother:		0: 	0
+C: schedfifo:		1: 	81
+C: schedfifo:		2: 	82
+C: schedfifo:		3: 	83
+C: schedfifo:		4: 	84
+
+# T0 lock L0
+C: locknowait:		0: 	0
+W: locked:		0: 	0
+
+# T1 lock L1
+C: locknowait:		1: 	1
+W: locked:		1: 	1
+
+# T1 lock L0
+C: lockintnowait:	1: 	0
+W: blocked:		1: 	0
+T: prioeq:		0: 	81
+
+# T2 lock L2
+C: locknowait:		2: 	2
+W: locked:		2: 	2
+
+# T2 lock L1
+C: lockintnowait:	2: 	1
+W: blocked:		2: 	1
+T: prioeq:		0: 	82
+T: prioeq:		1:	82
+
+# T3 lock L3
+C: locknowait:		3: 	3
+W: locked:		3: 	3
+
+# T3 lock L2
+C: lockintnowait:	3: 	2
+W: blocked:		3: 	2
+T: prioeq:		0: 	83
+T: prioeq:		1:	83
+T: prioeq:		2:	83
+
+# T4 lock L3
+C: lockintnowait:	4:	3
+W: blocked:		4: 	3
+T: prioeq:		0: 	84
+T: prioeq:		1:	84
+T: prioeq:		2:	84
+T: prioeq:		3:	84
+
+# Reduce prio of T4
+C: schedfifo:		4: 	80
+T: prioeq:		0: 	83
+T: prioeq:		1:	83
+T: prioeq:		2:	83
+T: prioeq:		3:	83
+T: prioeq:		4:	80
+
+# Increase prio of T4
+C: schedfifo:		4: 	84
+T: prioeq:		0: 	84
+T: prioeq:		1:	84
+T: prioeq:		2:	84
+T: prioeq:		3:	84
+T: prioeq:		4:	84
+
+# Reduce prio of T3
+C: schedfifo:		3: 	80
+T: prioeq:		0: 	84
+T: prioeq:		1:	84
+T: prioeq:		2:	84
+T: prioeq:		3:	84
+T: prioeq:		4:	84
+
+# Increase prio of T3
+C: schedfifo:		3: 	85
+T: prioeq:		0: 	85
+T: prioeq:		1:	85
+T: prioeq:		2:	85
+T: prioeq:		3:	85
+T: prioeq:		4:	84
+
+# Reduce prio of T3
+C: schedfifo:		3: 	83
+T: prioeq:		0: 	84
+T: prioeq:		1:	84
+T: prioeq:		2:	84
+T: prioeq:		3:	84
+T: prioeq:		4:	84
+
+# Signal T4
+C: signal:		4: 	0
+W: unlocked:		4: 	3
+T: prioeq:		0: 	83
+T: prioeq:		1:	83
+T: prioeq:		2:	83
+T: prioeq:		3:	83
+
+# Signal T3
+C: signal:		3: 	0
+W: unlocked:		3: 	2
+T: prioeq:		0: 	82
+T: prioeq:		1:	82
+T: prioeq:		2:	82
+
+# Signal T2
+C: signal:		2: 	0
+W: unlocked:		2: 	1
+T: prioeq:		0: 	81
+T: prioeq:		1:	81
+
+# Signal T1
+C: signal:		1: 	0
+W: unlocked:		1: 	0
+T: priolt:		0: 	1
+
+# Unlock and exit
+C: unlock:		3:	3
+C: unlock:		2:	2
+C: unlock:		1:	1
+C: unlock:		0:	0
+
+W: unlocked:		3:	3
+W: unlocked:		2:	2
+W: unlocked:		1:	1
+W: unlocked:		0:	0
Index: linux-2.6.17-rt4/scripts/rt-tester/t5-l4-pi-boost-deboost.tst
===================================================================
--- linux-2.6.17-rt4.orig/scripts/rt-tester/t5-l4-pi-boost-deboost.tst
+++ linux-2.6.17-rt4/scripts/rt-tester/t5-l4-pi-boost-deboost.tst
@@ -146,3 +146,5 @@ W: unlocked:		2:	2
 W: unlocked:		1:	1
 W: unlocked:		0:	0
 
+# Reset the -4 opcode from the signal
+C: reset:               4:      0
\ No newline at end of file
Index: linux-2.6.17-rt4/scripts/rt-tester/check-all.sh
===================================================================
--- linux-2.6.17-rt4.orig/scripts/rt-tester/check-all.sh
+++ linux-2.6.17-rt4/scripts/rt-tester/check-all.sh
@@ -18,4 +18,4 @@ testit t3-l1-pi-steal.tst
 testit t3-l2-pi.tst
 testit t4-l2-pi-deboost.tst
 testit t5-l4-pi-boost-deboost.tst
-
+testit t5-l4-pi-boost-deboost-setsched.tst
\ No newline at end of file

^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2006-07-03 10:48 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-06-18  7:06 2.6.17-rt1 Ingo Molnar
2006-06-18 16:13 ` 2.6.17-rt1 Michal Piotrowski
     [not found] ` <Pine.LNX.4.64.0606201656230.11643@localhost.localdomain>
2006-06-20 15:13   ` Why can't I set the priority of softirq-hrt? (Re: 2.6.17-rt1) Thomas Gleixner
2006-06-20 17:09     ` Esben Nielsen
2006-06-20 16:35       ` Thomas Gleixner
2006-06-20 21:16         ` Esben Nielsen
2006-06-20 20:35           ` Thomas Gleixner
2006-06-20 23:19             ` Esben Nielsen
2006-06-20 16:39       ` Steven Rostedt
2006-06-20 18:12         ` Esben Nielsen
2006-06-20 17:21           ` Thomas Gleixner
2006-06-20 21:26             ` Esben Nielsen
2006-06-20 20:51               ` Thomas Gleixner
2006-06-21  8:20               ` Steven Rostedt
2006-06-21 11:05                 ` Esben Nielsen
2006-06-21 15:43                   ` Esben Nielsen
2006-06-21 15:21                     ` Steven Rostedt
2006-06-21 16:37                       ` Esben Nielsen
2006-06-21 15:51                         ` Steven Rostedt
2006-06-21 17:14                           ` Esben Nielsen
2006-06-21 16:26                     ` Thomas Gleixner
2006-06-21 18:30                       ` Ingo Molnar
2006-06-22 10:28                         ` Esben Nielsen
2006-06-21 21:29                       ` Esben Nielsen
2006-06-21 20:33                         ` Thomas Gleixner
2006-06-21 23:35                           ` Esben Nielsen
2006-06-22  7:06                             ` Thomas Gleixner
2006-06-22 10:32                               ` Esben Nielsen
2006-06-22 13:33                               ` Steven Rostedt
2006-06-22 13:45                       ` Steven Rostedt
2006-06-22 14:20                         ` Thomas Gleixner
2006-06-22 14:23                           ` Steven Rostedt
2006-06-22 14:26                             ` Thomas Gleixner
2006-06-22 18:06                               ` Esben Nielsen
2006-06-22 18:05                                 ` Thomas Gleixner
2006-06-23 11:23                                   ` Esben Nielsen
2006-06-23 11:06                                     ` Steven Rostedt
2006-07-03 11:48                                     ` Esben Nielsen
2006-06-21  8:13           ` Steven Rostedt
2006-06-21 11:03             ` Esben Nielsen
2006-06-22  0:57 ` 2.6.17-rt1 Lee Revell
2006-06-22  2:51   ` More weird latency trace output (was Re: 2.6.17-rt1) Lee Revell
2006-06-23  1:24     ` Lee Revell
2006-06-24 22:15       ` Lee Revell
2006-06-24 22:12         ` Ingo Molnar
2006-06-24 22:31           ` Lee Revell
2006-06-24 23:49           ` Lee Revell
2006-06-23 20:56 ` 2.6.17-rt1 - mm_struct leak Vernon Mauery
2006-06-24  9:24   ` Mark Hounschell
2006-06-24  9:32     ` Mark Hounschell
2006-06-30 16:02   ` [PATCH -RT]Re: " Vernon Mauery

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).