* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15
@ 2004-12-10 17:49 Mark_H_Johnson
2004-12-10 21:09 ` Ingo Molnar
` (3 more replies)
0 siblings, 4 replies; 30+ messages in thread
From: Mark_H_Johnson @ 2004-12-10 17:49 UTC (permalink / raw)
To: Ingo Molnar
Cc: Mark_H_Johnson, Amit Shah, Karsten Wiese, Bill Huey, Adam Heath,
emann, Gunther Persoons, K.R. Foley, linux-kernel,
Florian Schmidt, Fernando Pablo Lopez-Lezcano, Lee Revell,
Rui Nuno Capela, Shane Shrybman, Esben Nielsen, Thomas Gleixner,
Michal Schmidt
>The -32-15 kernel can be downloaded from the
>usual place:
>
> http://redhat.com/~mingo/realtime-preempt/
By the time I started today, the directory had -18, so that is what I built
with. Here are some initial results from running cpu_delay (the simple
RT test / user tracing) on a -18 PREEMPT_RT kernel. I have not started
any of the stress tests yet.
To recap: all IRQ # tasks, ksoftirqd/# and events/# tasks are RT FIFO
priority 99. The test program runs at RT FIFO 30 and should use about
70% of the CPU time on the two-CPU SMP system under test.
[1] I still do not get traces where cpu_delay switches CPUs. I only
get trace output if it starts and ends on a single CPU. I also had
several cases where I "triggered" a trace but got no output - I assume
those are related symptoms. For example:
# ./cpu_delay 0.000100
Delay limit set to 0.00010000 seconds
calibrating loop ....
time diff= 0.504598 or 396354830 loops/sec.
Trace activated with 0.000100 second delay.
Trace triggered with 0.000102 second delay. [not recorded]
Trace activated with 0.000100 second delay.
Trace triggered with 0.000164 second delay. [not recorded]
Trace activated with 0.000100 second delay.
Trace triggered with 0.000132 second delay. [00]
Trace activated with 0.000100 second delay.
Trace triggered with 0.000128 second delay. [01]
Trace activated with 0.000100 second delay.
Trace triggered with 0.000144 second delay. [not recorded]
Trace triggered with 0.000355 second delay. [02]
Trace activated with 0.000100 second delay.
Trace triggered with 0.000101 second delay. [not recorded]
Trace activated with 0.000100 second delay.
Trace triggered with 0.000126 second delay. [not recorded]
Trace triggered with 0.000205 second delay. [not recorded]
Trace activated with 0.000100 second delay.
Trace triggered with 0.000147 second delay. [03]
Trace activated with 0.000100 second delay.
Trace triggered with 0.000135 second delay. [04]
Trace triggered with 0.000110 second delay. [not recorded]
Trace triggered with 0.000247 second delay. [05]
Trace triggered with 0.000120 second delay. [06]
Trace activated with 0.000100 second delay.
Trace triggered with 0.000107 second delay. [07]
Trace triggered with 0.000104 second delay. [not recorded]
Trace activated with 0.000100 second delay.
Trace triggered with 0.000100 second delay. [08]
Trace activated with 0.000100 second delay.
Trace triggered with 0.000201 second delay. [09]
# chrt -f 1 ./get_ltrace.sh 50
Current Maximum is 4965280, limit will be 50.
Resetting max latency from 4965280 to 50.
No new latency samples at Fri Dec 10 11:01:22 CST 2004.
No new latency samples at Fri Dec 10 11:01:32 CST 2004.
No new latency samples at Fri Dec 10 11:01:42 CST 2004.
No new latency samples at Fri Dec 10 11:01:53 CST 2004.
No new latency samples at Fri Dec 10 11:02:03 CST 2004.
No new latency samples at Fri Dec 10 11:02:13 CST 2004.
New trace 0 w/ 117 usec latency.
Resetting max latency from 117 to 50.
No new latency samples at Fri Dec 10 11:02:35 CST 2004.
No new latency samples at Fri Dec 10 11:02:45 CST 2004.
New trace 1 w/ 99 usec latency.
Resetting max latency from 99 to 50.
New trace 2 w/ 248 usec latency.
Resetting max latency from 248 to 50.
New trace 3 w/ 146 usec latency.
Resetting max latency from 146 to 50.
New trace 4 w/ 134 usec latency.
Resetting max latency from 134 to 50.
New trace 5 w/ 250 usec latency.
Resetting max latency from 250 to 50.
New trace 6 w/ 120 usec latency.
Resetting max latency from 120 to 50.
New trace 7 w/ 105 usec latency.
Resetting max latency from 105 to 50.
New trace 8 w/ 91 usec latency.
Resetting max latency from 91 to 50.
New trace 9 w/ 200 usec latency.
For the most part, the elapsed times in the traces closely match what
was measured by the application.
[2] The all-CPU traces appear to show some cases where both ksoftirqd
tasks are active during a preemption of the RT task. That is expected
with my priority settings [though the first trace shows BOTH getting
activated within 2 usec!].
[3] Some traces show information on both CPUs and then a long period
with no traces from one of them. Here is an example...
preemption latency trace v1.1.4 on 2.6.10-rc2-mm3-V0.7.32-18RT
--------------------------------------------------------------------
latency: 99 us, #275/275, CPU#0 | (M:rt VP:0, KP:1, SP:1 HP:1 #P:2)
-----------------
| task: cpu_delay-3556 (uid:0 nice:0 policy:1 rt_prio:30)
-----------------
=> started at: <00000000>
=> ended at: <00000000>
_------=> CPU#
/ _-----=> irqs-off
| / _----=> need-resched
|| / _---=> hardirq/softirq
||| / _--=> preempt-depth
|||| /
||||| delay
cmd pid ||||| time | caller
\ / ||||| \ | /
<unknown-2847 1d... 0µs : smp_apic_timer_interrupt (86a89bf 0 0)
cpu_dela-3556 0d.h1 0µs : update_process_times
(smp_apic_timer_interrupt)
cpu_dela-3556 0d.h1 0µs : account_system_time (update_process_times)
cpu_dela-3556 0d.h1 0µs : check_rlimit (account_system_time)
<unknown-2847 1d.h. 0µs : update_process_times
(smp_apic_timer_interrupt)
cpu_dela-3556 0d.h1 0µs : account_it_prof (account_system_time)
...
<unknown-2847 1d.h1 4µs : _raw_spin_unlock (scheduler_tick)
cpu_dela-3556 0d.h1 4µs : irq_exit (apic_timer_interrupt)
<unknown-2847 1d.h. 4µs : rebalance_tick (scheduler_tick)
cpu_dela-3556 0d..2 4µs : do_softirq (irq_exit)
cpu_dela-3556 0d..2 4µs : __do_softirq (do_softirq)
<unknown-2847 1d.h. 5µs : irq_exit (apic_timer_interrupt)
cpu_dela-3556 0d..2 5µs : wake_up_process (do_softirq)
cpu_dela-3556 0d..2 5µs : try_to_wake_up (wake_up_process)
cpu_dela-3556 0d..2 5µs : task_rq_lock (try_to_wake_up)
<unknown-2847 1d... 5µs < (0)
cpu_dela-3556 0d..2 5µs : _raw_spin_lock (task_rq_lock)
cpu_dela-3556 0d..3 6µs : activate_task (try_to_wake_up)
cpu_dela-3556 0d..3 6µs : sched_clock (activate_task)
cpu_dela-3556 0d..3 7µs : recalc_task_prio (activate_task)
cpu_dela-3556 0d..3 7µs : effective_prio (recalc_task_prio)
cpu_dela-3556 0d..3 7µs : activate_task <ksoftirq-4> (0 1):
cpu_dela-3556 0d..3 7µs : enqueue_task (activate_task)
cpu_dela-3556 0d..3 8µs : resched_task (try_to_wake_up)
cpu_dela-3556 0dn.3 8µs : __trace_start_sched_wakeup (try_to_wake_up)
cpu_dela-3556 0dn.3 8µs : try_to_wake_up <ksoftirq-4> (0 45):
cpu_dela-3556 0dn.3 9µs : _raw_spin_unlock_irqrestore (try_to_wake_up)
cpu_dela-3556 0dn.2 9µs : preempt_schedule (try_to_wake_up)
cpu_dela-3556 0dn.2 9µs : wake_up_process (do_softirq)
cpu_dela-3556 0dn.1 10µs < (0)
cpu_dela-3556 0.n.1 10µs : preempt_schedule (up)
cpu_dela-3556 0.n.. 10µs : preempt_schedule (user_trace_start)
cpu_dela-3556 0dn.. 11µs : __sched_text_start (preempt_schedule)
cpu_dela-3556 0dn.1 11µs : sched_clock (__sched_text_start)
cpu_dela-3556 0dn.1 11µs : _raw_spin_lock_irq (__sched_text_start)
cpu_dela-3556 0dn.1 12µs : _raw_spin_lock_irqsave (__sched_text_start)
cpu_dela-3556 0dn.2 12µs : pull_rt_tasks (__sched_text_start)
cpu_dela-3556 0dn.2 12µs : find_next_bit (pull_rt_tasks)
cpu_dela-3556 0dn.2 13µs : find_next_bit (pull_rt_tasks)
cpu_dela-3556 0d..2 13µs : trace_array (__sched_text_start)
cpu_dela-3556 0d..2 13µs : trace_array <ksoftirq-4> (0 1):
cpu_dela-3556 0d..2 15µs : trace_array <cpu_dela-3556> (45 46):
cpu_dela-3556 0d..2 16µs+: trace_array (__sched_text_start)
ksoftirq-4 0d..2 19µs : __switch_to (__sched_text_start)
ksoftirq-4 0d..2 20µs : __sched_text_start <cpu_dela-3556> (45 0):
ksoftirq-4 0d..2 20µs : finish_task_switch (__sched_text_start)
ksoftirq-4 0d..2 20µs : smp_send_reschedule_allbutself
(finish_task_switch)
ksoftirq-4 0d..2 20µs : send_IPI_allbutself
(smp_send_reschedule_allbutself)
ksoftirq-4 0d..2 21µs : __bitmap_weight (send_IPI_allbutself)
ksoftirq-4 0d..2 21µs : __send_IPI_shortcut (send_IPI_allbutself)
ksoftirq-4 0d..2 21µs : _raw_spin_unlock (finish_task_switch)
ksoftirq-4 0d..1 22µs : trace_stop_sched_switched
(finish_task_switch)
ksoftirq-4 0.... 23µs : _do_softirq (ksoftirqd)
ksoftirq-4 0d... 23µs : ___do_softirq (_do_softirq)
ksoftirq-4 0.... 23µs : run_timer_softirq (___do_softirq)
ksoftirq-4 0.... 24µs : _spin_lock (run_timer_softirq)
ksoftirq-4 0.... 24µs : __spin_lock (_spin_lock)
ksoftirq-4 0.... 24µs : __might_sleep (__spin_lock)
<unknown-2847 1d... 24µs : smp_reschedule_interrupt (86a8bd8 0 0)
ksoftirq-4 0.... 24µs : _down_mutex (__spin_lock)
<unknown-2847 1d... 25µs < (0)
ksoftirq-4 0.... 25µs : __down_mutex (__spin_lock)
ksoftirq-4 0.... 25µs : __might_sleep (__down_mutex)
ksoftirq-4 0d... 25µs : _raw_spin_lock (__down_mutex)
ksoftirq-4 0d..1 25µs : _raw_spin_lock (__down_mutex)
ksoftirq-4 0d..2 26µs : _raw_spin_lock (__down_mutex)
ksoftirq-4 0d..3 26µs : set_new_owner (__down_mutex)
ksoftirq-4 0d..3 26µs : set_new_owner <ksoftirq-4> (0 0):
ksoftirq-4 0d..3 27µs : _raw_spin_unlock (__down_mutex)
ksoftirq-4 0d..2 27µs : _raw_spin_unlock (__down_mutex)
ksoftirq-4 0d..1 27µs : _raw_spin_unlock (__down_mutex)
... no more traces from CPU 1 ...
ksoftirq-4 0.... 77µs : rcu_check_quiescent_state
(__rcu_process_callbacks)
ksoftirq-4 0.... 77µs : cond_resched_all (___do_softirq)
ksoftirq-4 0.... 77µs : cond_resched (___do_softirq)
ksoftirq-4 0.... 78µs : cond_resched (ksoftirqd)
ksoftirq-4 0.... 78µs : kthread_should_stop (ksoftirqd)
ksoftirq-4 0.... 78µs : schedule (ksoftirqd)
ksoftirq-4 0.... 78µs : __sched_text_start (schedule)
ksoftirq-4 0...1 79µs : sched_clock (__sched_text_start)
ksoftirq-4 0...1 79µs : _raw_spin_lock_irq (__sched_text_start)
ksoftirq-4 0...1 79µs : _raw_spin_lock_irqsave (__sched_text_start)
ksoftirq-4 0d..2 80µs : deactivate_task (__sched_text_start)
ksoftirq-4 0d..2 80µs : dequeue_task (deactivate_task)
ksoftirq-4 0d..2 81µs : trace_array (__sched_text_start)
ksoftirq-4 0d..2 82µs : trace_array <cpu_dela-3556> (45 46):
ksoftirq-4 0d..2 84µs+: trace_array (__sched_text_start)
cpu_dela-3556 0d..2 86µs : __switch_to (__sched_text_start)
cpu_dela-3556 0d..2 87µs : __sched_text_start <ksoftirq-4> (0 45):
cpu_dela-3556 0d..2 87µs : finish_task_switch (__sched_text_start)
cpu_dela-3556 0d..2 87µs : _raw_spin_unlock (finish_task_switch)
cpu_dela-3556 0d..1 88µs : trace_stop_sched_switched
(finish_task_switch)
cpu_dela-3556 0d... 89µs+< (0)
cpu_dela-3556 0d... 92µs : math_state_restore (device_not_available)
cpu_dela-3556 0d... 92µs : restore_fpu (math_state_restore)
cpu_dela-3556 0d... 93µs < (0)
cpu_dela-3556 0.... 93µs > sys_gettimeofday (00000000 00000000 0000007b)
cpu_dela-3556 0.... 93µs : sys_gettimeofday (sysenter_past_esp)
cpu_dela-3556 0.... 94µs : user_trace_stop (sys_gettimeofday)
cpu_dela-3556 0...1 94µs : user_trace_stop (sys_gettimeofday)
cpu_dela-3556 0...1 95µs : _raw_spin_lock_irqsave (user_trace_stop)
cpu_dela-3556 0d..2 95µs : _raw_spin_unlock_irqrestore (user_trace_stop)
If I read this right, we tried to reschedule cpu_delay on CPU #1 (at
24 usec) but it never happened, and cpu_delay was still "ready to run"
on CPU #0 some 70 usec later.
[4] I have a trace where cpu_delay was bumped off of CPU #1 at 20 usec
while the X server (not RT) was the active process on CPU #0 for another
130 usec (several traces with preempt-depth ==0) when it finally gets
bumped by IRQ 0.
[5] More of a cosmetic problem, several traces still show the
application name as "unknown" - even for long lived processes like
ksoftirqd/0 and X.
Due to the file size, I will send the traces separately.
--Mark
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15
  2004-12-10 17:49 [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15 Mark_H_Johnson
@ 2004-12-10 21:09 ` Ingo Molnar
  2004-12-10 21:12 ` Ingo Molnar
  ` (2 subsequent siblings)
  3 siblings, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2004-12-10 21:09 UTC (permalink / raw)
To: Mark_H_Johnson
Cc: Amit Shah, Karsten Wiese, Bill Huey, Adam Heath, emann,
    Gunther Persoons, K.R. Foley, linux-kernel, Florian Schmidt,
    Fernando Pablo Lopez-Lezcano, Lee Revell, Rui Nuno Capela,
    Shane Shrybman, Esben Nielsen, Thomas Gleixner, Michal Schmidt

* Mark_H_Johnson@raytheon.com <Mark_H_Johnson@raytheon.com> wrote:

> [1] I still do not get traces where cpu_delay switches CPU's. I only
> get trace output if it starts and ends on a single CPU. [...]

lt001.18RT/lt.02 is such a trace. It starts on CPU#1:

 <unknown-3556 1...1 0µs : find_next_bit (user_trace_start)

and ends on CPU#0:

 <unknown-3556 1...1 247µs : _raw_spin_lock_irqsave (user_trace_stop)

the trace shows a typical migration of an RT task.

(but ... i have to say the debugging overhead is horrible. Please try a
completely-non-debug-non-tracing kernel just to see the difference.)

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15
  2004-12-10 17:49 [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15 Mark_H_Johnson
  2004-12-10 21:09 ` Ingo Molnar
@ 2004-12-10 21:12 ` Ingo Molnar
  2004-12-10 21:24 ` Ingo Molnar
  2004-12-13  0:16 ` Fernando Lopez-Lezcano
  3 siblings, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2004-12-10 21:12 UTC (permalink / raw)
To: Mark_H_Johnson
Cc: Amit Shah, Karsten Wiese, Bill Huey, Adam Heath, emann,
    Gunther Persoons, K.R. Foley, linux-kernel, Florian Schmidt,
    Fernando Pablo Lopez-Lezcano, Lee Revell, Rui Nuno Capela,
    Shane Shrybman, Esben Nielsen, Thomas Gleixner, Michal Schmidt

* Mark_H_Johnson@raytheon.com <Mark_H_Johnson@raytheon.com> wrote:

> [3] Some traces show information on both CPU's and then a long period
> with no traces from the other. Here is an example...

> <unknown-2847 1d.h. 4µs : rebalance_tick (scheduler_tick)
> <unknown-2847 1d.h. 5µs : irq_exit (apic_timer_interrupt)
> <unknown-2847 1d... 5µs < (0)
> ... no more traces from CPU 1 ...

PID 2847 returned to userspace at timestamp 5µs. Userspace then can take
an arbitrary amount of time until it calls the kernel again.

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15
  2004-12-10 17:49 [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15 Mark_H_Johnson
  2004-12-10 21:09 ` Ingo Molnar
  2004-12-10 21:12 ` Ingo Molnar
@ 2004-12-10 21:24 ` Ingo Molnar
  2004-12-13  0:16 ` Fernando Lopez-Lezcano
  3 siblings, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2004-12-10 21:24 UTC (permalink / raw)
To: Mark_H_Johnson
Cc: Amit Shah, Karsten Wiese, Bill Huey, Adam Heath, emann,
    Gunther Persoons, K.R. Foley, linux-kernel, Florian Schmidt,
    Fernando Pablo Lopez-Lezcano, Lee Revell, Rui Nuno Capela,
    Shane Shrybman, Esben Nielsen, Thomas Gleixner, Michal Schmidt

* Mark_H_Johnson@raytheon.com <Mark_H_Johnson@raytheon.com> wrote:

> [...] I also had several cases where I "triggered" a trace but no
> output - I assume those are related symptoms. For example:
>
> # ./cpu_delay 0.000100
> Delay limit set to 0.00010000 seconds
> calibrating loop ....
> time diff= 0.504598 or 396354830 loops/sec.
> Trace activated with 0.000100 second delay.
> Trace triggered with 0.000102 second delay. [not recorded]
> Trace activated with 0.000100 second delay.
> Trace triggered with 0.000164 second delay. [not recorded]

is the userspace delay measurement nested inside the kernel-based
method? I.e. is it something like:

	gettimeofday(0,1);
	timestamp1 = cycles();
	... loop some ...
	timestamp2 = cycles();
	gettimeofday(0,0);

and do you get 'unreported' latencies in such a case too? If yes then
that would indeed indicate a tracer bug. But if the measurement is done
like this:

	gettimeofday(0,1);
	timestamp1 = cycles();
	... loop some ...
	gettimeofday(0,0);	// [1]
	timestamp2 = cycles();	// [2]

then a delay could get inbetween [1] and [2]. OTOH if the 'loop some'
time is long enough then the [1]-[2] window is too small to be
significant statistically, while your logs show a near 50% 'miss rate'.

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15
  2004-12-10 17:49 [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15 Mark_H_Johnson
  ` (2 preceding siblings ...)
  2004-12-10 21:24 ` Ingo Molnar
@ 2004-12-13  0:16 ` Fernando Lopez-Lezcano
  2004-12-13  6:47 ` Ingo Molnar
  3 siblings, 1 reply; 30+ messages in thread
From: Fernando Lopez-Lezcano @ 2004-12-13 0:16 UTC (permalink / raw)
To: Ingo Molnar, Mark_H_Johnson
Cc: Amit Shah, Karsten Wiese, Bill Huey, Adam Heath, emann,
    Gunther Persoons, K.R. Foley, linux-kernel, Florian Schmidt,
    Lee Revell, Rui Nuno Capela, Shane Shrybman, Esben Nielsen,
    Thomas Gleixner, Michal Schmidt

On Fri, 2004-12-10 at 09:49, Mark_H_Johnson@raytheon.com wrote:
> >The -32-15 kernel can be downloaded from the
> >usual place:
> >
> >  http://redhat.com/~mingo/realtime-preempt/
>
> By the time I started today, the directory had -18 so that is what I built
> with. Here are some initial results from running cpu_delay (the simple
> RT test / user tracing) on a -18 PREEMPT_RT kernel. Have not started
> any of the stress tests yet.

Something that just happened to me: running 0.7.32-14 (PREEMPT_DESKTOP)
and trying to install 0.7.32-19 from a custom built rpm package
completely hangs the machine (p4 laptop - I tried twice). No clues left
behind. If I boot into 0.7.32-9 I can install 0.7.32-19 with no
problems.

-- Fernando

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15
  2004-12-13  0:16 ` Fernando Lopez-Lezcano
@ 2004-12-13  6:47 ` Ingo Molnar
  2004-12-14  0:46 ` Fernando Lopez-Lezcano
  0 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2004-12-13 6:47 UTC (permalink / raw)
To: Fernando Lopez-Lezcano
Cc: Mark_H_Johnson, Amit Shah, Karsten Wiese, Bill Huey, Adam Heath,
    emann, Gunther Persoons, K.R. Foley, linux-kernel, Florian Schmidt,
    Lee Revell, Rui Nuno Capela, Shane Shrybman, Esben Nielsen,
    Thomas Gleixner, Michal Schmidt

* Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU> wrote:

> Something that just happened to me: running 0.7.32-14
> (PREEMPT_DESKTOP) and trying to install 0.7.32-19 from a custom built
> rpm package completely hangs the machine (p4 laptop - I tried twice).
> No clues left behind. If I boot into 0.7.32-9 I can install 0.7.32-19
> with no problems.

does 0.7.32-19 work better if you reverse (patch -R) the loop.h and
loop.c bits (see below)?

	Ingo

--- linux/drivers/block/loop.c.orig
+++ linux/drivers/block/loop.c
@@ -378,7 +378,7 @@ static void loop_add_bio(struct loop_dev
 	lo->lo_bio = lo->lo_biotail = bio;
 	spin_unlock_irqrestore(&lo->lo_lock, flags);
 
-	up(&lo->lo_bh_mutex);
+	complete(&lo->lo_bh_done);
 }
 
 /*
@@ -427,7 +427,7 @@ static int loop_make_request(request_que
 	return 0;
 err:
 	if (atomic_dec_and_test(&lo->lo_pending))
-		up(&lo->lo_bh_mutex);
+		complete(&lo->lo_bh_done);
 out:
 	bio_io_error(old_bio, old_bio->bi_size);
 	return 0;
@@ -495,12 +495,12 @@ static int loop_thread(void *data)
 	/*
 	 * up sem, we are running
 	 */
-	up(&lo->lo_sem);
+	complete(&lo->lo_done);
 
 	for (;;) {
-		down_interruptible(&lo->lo_bh_mutex);
+		wait_for_completion_interruptible(&lo->lo_bh_done);
 		/*
-		 * could be upped because of tear-down, not because of
+		 * could be completed because of tear-down, not because of
 		 * pending work
 		 */
 		if (!atomic_read(&lo->lo_pending))
@@ -521,7 +521,7 @@ static int loop_thread(void *data)
 			break;
 	}
 
-	up(&lo->lo_sem);
+	complete(&lo->lo_done);
 
 	return 0;
 }
 
@@ -708,7 +708,7 @@ static int loop_set_fd(struct loop_devic
 	set_blocksize(bdev, lo_blocksize);
 
 	kernel_thread(loop_thread, lo, CLONE_KERNEL);
-	down(&lo->lo_sem);
+	wait_for_completion(&lo->lo_done);
 	return 0;
 
 out_putf:
@@ -773,10 +773,10 @@ static int loop_clr_fd(struct loop_devic
 	spin_lock_irq(&lo->lo_lock);
 	lo->lo_state = Lo_rundown;
 	if (atomic_dec_and_test(&lo->lo_pending))
-		up(&lo->lo_bh_mutex);
+		complete(&lo->lo_bh_done);
 	spin_unlock_irq(&lo->lo_lock);
 
-	down(&lo->lo_sem);
+	wait_for_completion(&lo->lo_done);
 
 	lo->lo_backing_file = NULL;
@@ -1153,8 +1153,8 @@ int __init loop_init(void)
 		if (!lo->lo_queue)
 			goto out_mem4;
 		init_MUTEX(&lo->lo_ctl_mutex);
-		init_MUTEX_LOCKED(&lo->lo_sem);
-		init_MUTEX_LOCKED(&lo->lo_bh_mutex);
+		init_completion(&lo->lo_done);
+		init_completion(&lo->lo_bh_done);
 		lo->lo_number = i;
 		spin_lock_init(&lo->lo_lock);
 		disk->major = LOOP_MAJOR;
--- linux/include/linux/loop.h.orig
+++ linux/include/linux/loop.h
@@ -58,9 +58,9 @@ struct loop_device {
 	struct bio		*lo_bio;
 	struct bio		*lo_biotail;
 	int			lo_state;
-	struct semaphore	lo_sem;
+	struct completion	lo_done;
+	struct completion	lo_bh_done;
 	struct semaphore	lo_ctl_mutex;
-	struct semaphore	lo_bh_mutex;
 	atomic_t		lo_pending;
 	request_queue_t		*lo_queue;

^ permalink raw reply	[flat|nested] 30+ messages in thread
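[Editor's note: the patch above replaces semaphores that are signalled from a different task than the one that waits on them - a completion-style pattern that does not fit -RT's ownership-based priority-inheritance mutexes. A rough userspace analogue of the completion API, built on a pthread condition variable, can illustrate the pattern; this is a sketch only, the real kernel primitives differ.]

```c
#include <pthread.h>

/* Minimal userspace model of the completion pattern loop.c switches to:
 * one thread signals "done", another waits for it.  Unlike a mutex-style
 * semaphore there is no owner, so cross-thread signalling is well defined
 * even under priority-inheritance locking. */

struct completion {
        pthread_mutex_t lock;
        pthread_cond_t  wait;
        int             done;
};

static void init_completion(struct completion *c)
{
        pthread_mutex_init(&c->lock, NULL);
        pthread_cond_init(&c->wait, NULL);
        c->done = 0;
}

static void complete(struct completion *c)
{
        pthread_mutex_lock(&c->lock);
        c->done++;                      /* record the event ...        */
        pthread_cond_signal(&c->wait);  /* ... and wake one waiter     */
        pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
        pthread_mutex_lock(&c->lock);
        while (!c->done)                /* no event yet: sleep         */
                pthread_cond_wait(&c->wait, &c->lock);
        c->done--;                      /* consume the event           */
        pthread_mutex_unlock(&c->lock);
}

/* Example signaller, standing in for loop_thread() announcing startup. */
static void *worker(void *arg)
{
        complete(arg);                  /* signalled from another thread */
        return NULL;
}
```

Note that `complete()` before any waiter arrives is not lost - the `done` count remembers it - which matches how `loop_set_fd()` can safely `wait_for_completion()` after the kernel thread has already started.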
* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15
  2004-12-13  6:47 ` Ingo Molnar
@ 2004-12-14  0:46 ` Fernando Lopez-Lezcano
  2004-12-14  4:42 ` K.R. Foley
  0 siblings, 1 reply; 30+ messages in thread
From: Fernando Lopez-Lezcano @ 2004-12-14 0:46 UTC (permalink / raw)
To: Ingo Molnar
Cc: Mark_H_Johnson, Amit Shah, Karsten Wiese, Bill Huey, Adam Heath,
    emann, Gunther Persoons, K.R. Foley, linux-kernel, Florian Schmidt,
    Lee Revell, Rui Nuno Capela, Shane Shrybman, Esben Nielsen,
    Thomas Gleixner, Michal Schmidt

On Sun, 2004-12-12 at 22:47, Ingo Molnar wrote:
> * Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU> wrote:
>
> > Something that just happened to me: running 0.7.32-14
> > (PREEMPT_DESKTOP) and trying to install 0.7.32-19 from a custom built
> > rpm package completely hangs the machine (p4 laptop - I tried twice).
> > No clues left behind. If I boot into 0.7.32-9 I can install 0.7.32-19
> > with no problems.
>
> does 0.7.32-19 work better if you reverse (patch -R) the loop.h and
> loop.c bits (see below)?

Running 0.7.32-19 (no changes) I managed to install 0.7.32-20 with no
problems... probably a problem in -14 that was somehow fixed in later
releases.

-- Fernando

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15
  2004-12-14  0:46 ` Fernando Lopez-Lezcano
@ 2004-12-14  4:42 ` K.R. Foley
  2004-12-14  8:47 ` Rui Nuno Capela
  0 siblings, 1 reply; 30+ messages in thread
From: K.R. Foley @ 2004-12-14 4:42 UTC (permalink / raw)
To: Fernando Lopez-Lezcano
Cc: Ingo Molnar, Mark_H_Johnson, Amit Shah, Karsten Wiese, Bill Huey,
    Adam Heath, emann, Gunther Persoons, linux-kernel, Florian Schmidt,
    Lee Revell, Rui Nuno Capela, Shane Shrybman, Esben Nielsen,
    Thomas Gleixner, Michal Schmidt

Fernando Lopez-Lezcano wrote:
> On Sun, 2004-12-12 at 22:47, Ingo Molnar wrote:
>
>>* Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU> wrote:
>>
>>
>>>Something that just happened to me: running 0.7.32-14
>>>(PREEMPT_DESKTOP) and trying to install 0.7.32-19 from a custom built
>>>rpm package completely hangs the machine (p4 laptop - I tried twice).
>>>No clues left behind. If I boot into 0.7.32-9 I can install 0.7.32-19
>>>with no problems.
>>
>>does 0.7.32-19 work better if you reverse (patch -R) the loop.h and
>>loop.c bits (see below)?
>
>
> Running 0.7.32-19 (no changes) I managed to install 0.7.32-20 with no
> problems... probably a problem in -14 that was somehow fixed in later
> releases.
>
> -- Fernando

Possibly. I have had the occasional problem with running make install
locking one of my systems. Rebooting and running make install again
works fine in my case. It is by no means a regular occurrence, even
installing 2 or 3 new kernels daily on 3 different machines.

kr

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15
  2004-12-14  4:42 ` K.R. Foley
@ 2004-12-14  8:47 ` Rui Nuno Capela
  2004-12-14 11:35 ` Ingo Molnar
  0 siblings, 1 reply; 30+ messages in thread
From: Rui Nuno Capela @ 2004-12-14 8:47 UTC (permalink / raw)
To: K.R. Foley
Cc: Fernando Lopez-Lezcano, Ingo Molnar, mark_h_johnson, Amit Shah,
    Karsten Wiese, Bill Huey, Adam Heath, emann, Gunther Persoons,
    linux-kernel, Florian Schmidt, Lee Revell, Shane Shrybman,
    Esben Nielsen, Thomas Gleixner, Michal Schmidt

K.R. Foley wrote:
> Fernando Lopez-Lezcano wrote:
>> On Sun, 2004-12-12 at 22:47, Ingo Molnar wrote:
>>
>>>* Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU> wrote:
>>>
>>>
>>>>Something that just happened to me: running 0.7.32-14
>>>>(PREEMPT_DESKTOP) and trying to install 0.7.32-19 from a custom built
>>>>rpm package completely hangs the machine (p4 laptop - I tried twice).
>>>>No clues left behind. If I boot into 0.7.32-9 I can install 0.7.32-19
>>>>with no problems.
>>>
>>>does 0.7.32-19 work better if you reverse (patch -R) the loop.h and
>>>loop.c bits (see below)?
>>
>>
>> Running 0.7.32-19 (no changes) I managed to install 0.7.32-20 with no
>> problems... probably a problem in -14 that was somehow fixed in later
>> releases.
>>
>> -- Fernando
>
> Possibly. I have had the occasional problem with running make install
> locking one of my systems. Rebooting and running make install again
> works fine in my case. It is by no means a regular occurrence, even
> installing 2 or 3 new kernels daily on 3 different machines.
>

Isn't this tightly related to mkinitrd sometimes hanging while on
mount -o loop, which I've reported a couple of times before? It used
to hang every other time I did a new kernel install, but lately it
seems to be OK (RT-V0.9.32-19 and -20).

--
rncbc aka Rui Nuno Capela
rncbc@rncbc.org

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15
  2004-12-14  8:47 ` Rui Nuno Capela
@ 2004-12-14 11:35 ` Ingo Molnar
  2004-12-27 14:35 ` Real-time rw-locks (Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15) Esben Nielsen
  0 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2004-12-14 11:35 UTC (permalink / raw)
To: Rui Nuno Capela
Cc: K.R. Foley, Fernando Lopez-Lezcano, mark_h_johnson, Amit Shah,
    Karsten Wiese, Bill Huey, Adam Heath, emann, Gunther Persoons,
    linux-kernel, Florian Schmidt, Lee Revell, Shane Shrybman,
    Esben Nielsen, Thomas Gleixner, Michal Schmidt

* Rui Nuno Capela <rncbc@rncbc.org> wrote:

> Isn't this tightly related to mkinitrd sometimes hanging while on
> mount -o loop, that I've been reporting a couple of times before? It
> used to hang on any other time I do a new kernel install, but lately
> it seems to be OK (RT-V0.9.32-19 and -20).

yeah, i've added Thomas Gleixner's earlier semaphore->completion
conversion to the loop device, to -19 or -18.

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Real-time rw-locks (Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15)
  2004-12-14 11:35 ` Ingo Molnar
@ 2004-12-27 14:35 ` Esben Nielsen
  2004-12-27 15:27 ` Steven Rostedt
  2005-01-28  7:38 ` Ingo Molnar
  0 siblings, 2 replies; 30+ messages in thread
From: Esben Nielsen @ 2004-12-27 14:35 UTC (permalink / raw)
To: Ingo Molnar
Cc: Rui Nuno Capela, K.R. Foley, Fernando Lopez-Lezcano,
    mark_h_johnson, Amit Shah, Karsten Wiese, Bill Huey, Adam Heath,
    emann, Gunther Persoons, linux-kernel, Florian Schmidt, Lee Revell,
    Shane Shrybman, Thomas Gleixner, Michal Schmidt

I haven't seen much traffic on real-time preemption lately. Is it due
to Christmas or lost interest?

I noticed that you changed rw-locks to behave quite differently under
real-time preemption: they basically work like normal locks now. I.e.
there can only be one reader task within each region. This reader can,
however, lock the region recursively. I wanted to start looking at
fixing that because it ought to hurt scalability quite a bit - and even
on UP create a few unneeded task switches. However, the more I think
about it, the bigger the problem:

First, let me describe how I see a read-write lock. It has 3 states:
 a) unlocked
 b) locked by n readers
 c) locked by 1 writer
There can either be 1 writer within the protected region or n>=0
readers within the region. When a writer wants to take the lock,
calling down_write(), it has to wait until the read count is 0. When a
reader wants to take the lock, calling down_read(), it has only to wait
until the writer is done - there is no need to wait for the other
readers.

Now in a real-time system down_X() ought to have a deterministic
blocking time. It should be easy to make down_read() deterministic: if
there is a writer, let it inherit the calling reader's priority.
However, down_write() is hard to make deterministic. Even if we assume
that the lock not only keeps track of the number of readers but keeps a
list of all the reader threads within the region, it can traverse the
list and boost the priority of all those threads. If there are n
readers when down_write() is called, the blocking time would be
 O(ceil(n/#cpus))
time - which is unbounded as n is not known!

Having a rw-lock with a deterministic down_read() but non-deterministic
down_write() would be very useful in a lot of cases. The characteristic
is that the protected data structure is relatively static, is going to
be used by a lot of RT readers, and the updates don't have to be done
with any real-time requirements. However, there is no way to know in
general which locks in the kernel can be allowed to work like that and
which can't. A good compromise would be to limit the number of readers
in a lock by the number of CPUs on the system. That would make the
system scale over several CPUs without hitting unneeded congestion on
read-locks and still have a deterministic down_write().

down_write() shall then do the following: boost the priority of all the
active readers to the priority of the caller. This will in turn
distribute the readers over the CPUs of the system, assuming no higher
priority RT tasks are running. All the reader tasks will then run to
up_read() in time O(1) as they can all run in parallel - assuming there
is no ugly nested locking of course! down_read() should first check if
there is a writer. If there is, boost it and wait. If there isn't but
there isn't room for another reader, boost one of the readers so it
will run to up_read().

An extra bonus of having the number of readers bounded: the various
structures needed for making the list of readers can be allocated once.
There is no need to call kmalloc() from within down_read() to get a
list element for the lock's list of readers.

I don't know whether I have time for coding this soon. Under all
circumstances I do not have an SMP system so I can't really test it if
I get time to code it :-(

Esben

On Tue, 14 Dec 2004, Ingo Molnar wrote:

>
> * Rui Nuno Capela <rncbc@rncbc.org> wrote:
>
> > Isn't this tightly related to mkinitrd sometimes hanging while on
> > mount -o loop, that I've been reporting a couple of times before? It
> > used to hang on any other time I do a new kernel install, but lately
> > it seems to be OK (RT-V0.9.32-19 and -20).
>
> yeah, i've added Thomas Gleixner's earlier semaphore->completion
> conversion to the loop device, to -19 or -18.
>
> 	Ingo
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

^ permalink raw reply	[flat|nested] 30+ messages in thread
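[Editor's note: Esben's "limit readers to #cpus" compromise can be modelled with a counting semaphore - readers take one slot each, a writer takes all slots, so down_write() waits for at most a known number of readers. The following is a hedged userspace sketch only: priority boosting, the reader list, and writer-writer exclusion are all omitted, and NR_SLOTS is a stand-in for the CPU count.]

```c
#include <semaphore.h>

/* Model of a reader-limited rw-lock: at most NR_SLOTS readers may hold
 * the lock concurrently; a writer must acquire every slot, bounding its
 * wait to at most NR_SLOTS in-flight readers.  Illustration only - no
 * priority inheritance, and concurrent writers would need an extra
 * mutex to avoid taking slots against each other. */

#define NR_SLOTS 4                      /* stand-in for the CPU count */

struct rt_rwlock {
        sem_t slots;
};

static void rt_rwlock_init(struct rt_rwlock *l)
{
        sem_init(&l->slots, 0, NR_SLOTS);
}

static void down_read_ltd(struct rt_rwlock *l)
{
        sem_wait(&l->slots);            /* one slot per reader; the
                                           NR_SLOTS+1'th reader blocks */
}

static void up_read_ltd(struct rt_rwlock *l)
{
        sem_post(&l->slots);
}

static void down_write_ltd(struct rt_rwlock *l)
{
        int i;

        /* take every slot: blocks behind at most NR_SLOTS readers,
         * giving the known bound O(ceil(NR_SLOTS/#cpus)) */
        for (i = 0; i < NR_SLOTS; i++)
                sem_wait(&l->slots);
}

static void up_write_ltd(struct rt_rwlock *l)
{
        int i;

        for (i = 0; i < NR_SLOTS; i++)
                sem_post(&l->slots);
}
```

Because NR_SLOTS is fixed, the per-reader bookkeeping Esben mentions can be preallocated at init time, avoiding any kmalloc() in the down_read() path; a real implementation would also serialize writers against each other so two writers cannot deadlock while collecting slots.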
* Re: Real-time rw-locks (Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15) 2004-12-27 14:35 ` Real-time rw-locks (Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15) Esben Nielsen @ 2004-12-27 15:27 ` Steven Rostedt 2004-12-27 16:23 ` Esben Nielsen 2005-01-28 7:38 ` Ingo Molnar 1 sibling, 1 reply; 30+ messages in thread From: Steven Rostedt @ 2004-12-27 15:27 UTC (permalink / raw) To: Esben Nielsen Cc: Ingo Molnar, Rui Nuno Capela, K.R. Foley, Fernando Lopez-Lezcano, Mark Johnson, Amit Shah, Karsten Wiese, Bill Huey, Adam Heath, emann, Gunther Persoons, LKML, Florian Schmidt, Lee Revell, Shane Shrybman, Thomas Gleixner, Michal Schmidt On Mon, 2004-12-27 at 15:35 +0100, Esben Nielsen wrote: > I haven't seen much traffic on real-time preemption lately. Is it due > to Christmas or lost interest? > I think they are on vacation :-) > I noticed that you changed rw-locks to behave quite diferently under > real-time preemption: They basicly works like normal locks now. I.e. there > can only be one reader task within each region. This can can however lock > the region recursively. I wanted to start looking at fixing that because > it ought to hurt scalability quite a bit - and even on UP create a few > unneeded task-switchs. However, the more I think about it the bigger the > problem: > > First, let me describe how I see a read-write lock. It has 3 states: > a) unlocked > b) locked by n readers > c) locked by 1 writer > There can either be 1 writer within the protected region or n>=0 > readers within the region. When a writer wants to take the lock, > calling down_write(), it has to wait until the read count is 0. When a > reader wants to take the lock, calling down_read(), he has only to wait > until the the writer is done - there is no need to wait for the other > readers. > > Now in a real-time system down_X() ought to have a deterministic > blocking time. 
> It should be easy to make down_read() deterministic: if
> there is a writer, let it inherit the calling reader's priority.
> However, down_write() is hard to make deterministic. Even if we assume
> that the lock not only keeps track of the number of readers but keeps a
> list of all the reader threads within the region, it can traverse the list
> and boost the priority of all those threads. If there are n readers when
> down_write() is called, the blocking time would be
> O(ceil(n/#cpus))
> - which is unbounded, as n is not known!
>
> Having a rw-lock with a deterministic down_read() but a non-deterministic
> down_write() would be very useful in a lot of cases. The characteristic is
> that the data structure being protected is relatively static, is going
> to be used by a lot of RT readers, and the updates don't have to be done
> with any real-time requirements.
> However, there is no way to know in general which locks in the kernel can
> be allowed to work like that and which can't. A good compromise would be to
> limit the number of readers in a lock to the number of CPUs on the
> system. That would make the system scale over several CPUs without hitting
> unneeded contention on read-locks and still have a deterministic
> down_write().

Why limit it to just the number of CPUs? Make it a configurable limit; I would say the default could be 2*CPUs. The reason is that once you limit the number of readers, you have bounded down_write(). Even if the number of readers allowed is 100, down_write() is now bound to O(ceil(n/#cpus)) as you said, but now n is known. Make it a CONFIG_ALLOWED_READERS or something to that effect, and it would be easy to see what is a good optimal configuration (assuming you have the proper tests).

> down_write() shall then do the following: boost the priority of all the
> active readers to the priority of the caller. This will in turn distribute
> the readers over the CPUs of the system, assuming no higher priority RT
> tasks are running.
> All the reader tasks will then run to up_read() in
> time O(1), as they can all run in parallel - assuming there is no ugly
> nested locking, of course!
> down_read() should first check if there is a writer. If there is,
> boost it and wait. If there isn't, but there is no room for another reader,
> boost one of the readers so it will run to up_read().
>
> An extra bonus of having the number of readers bounded: the various
> structures needed for making the list of readers can be allocated once.
> There is no need to call kmalloc() from within down_read() to get a list
> element for the lock's list of readers.
>
> I don't know whether I have time for coding this soon. In any case
> I do not have an SMP system, so I can't really test it if I
> get time to code it :-(

I have two SMP machines that I can test on; unfortunately, they both have NVIDIA cards, so I can't use them with X unless I go back to the default driver. Which I would do, but I really like the 3D graphics ;-)

-- Steve

> Esben
>
> On Tue, 14 Dec 2004, Ingo Molnar wrote:
> >
> > * Rui Nuno Capela <rncbc@rncbc.org> wrote:
> >
> > > Isn't this tightly related to mkinitrd sometimes hanging on
> > > mount -o loop, which I've reported a couple of times before? It
> > > used to hang almost every time I did a new kernel install, but lately
> > > it seems to be OK (RT-V0.9.32-19 and -20).
> >
> > yeah, i've added Thomas Gleixner's earlier semaphore->completion
> > conversion to the loop device, in -19 or -18.
> >
> > Ingo

--
Steven Rostedt
Senior Engineer
Kihon Technologies
* Re: Real-time rw-locks (Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15)
  2004-12-27 15:27 ` Steven Rostedt
@ 2004-12-27 16:23 ` Esben Nielsen
  2004-12-27 16:39   ` Steven Rostedt
  2004-12-28 21:42   ` Lee Revell
  0 siblings, 2 replies; 30+ messages in thread
From: Esben Nielsen @ 2004-12-27 16:23 UTC (permalink / raw)
To: Steven Rostedt
Cc: Ingo Molnar, Rui Nuno Capela, K.R. Foley, Fernando Lopez-Lezcano,
    Mark Johnson, Amit Shah, Karsten Wiese, Bill Huey, Adam Heath, emann,
    Gunther Persoons, LKML, Florian Schmidt, Lee Revell, Shane Shrybman,
    Thomas Gleixner, Michal Schmidt

On Mon, 27 Dec 2004, Steven Rostedt wrote:
> On Mon, 2004-12-27 at 15:35 +0100, Esben Nielsen wrote:
> > I haven't seen much traffic on real-time preemption lately. Is it due
> > to Christmas or lost interest?
>
> I think they are on vacation :-)

Oh, I thought I would have fun with some kernel programming in my vacation! :-)

> [...]
> Why limit it to just the number of CPUs? Make it a configurable limit; I
> would say the default could be 2*CPUs. The reason is that once you
> limit the number of readers, you have bounded down_write(). Even if the
> number of readers allowed is 100, down_write() is now bound to
> O(ceil(n/#cpus)) as you said, but now n is known. Make it a
> CONFIG_ALLOWED_READERS or something to that effect, and it would be easy
> to see what is a good optimal configuration (assuming you have the
> proper tests).

I agree with you that it ought to be configurable. But if you set it to something like 100, it is _not_ deterministic in practice. I.e. during your tests you have a really hard time getting 100 readers at the critical point; most likely you only have a handful. Then you deploy the system, where a webserver presenting data read under an rw-lock gets overloaded and spawns 100 worker tasks. Bang! Your RT application doesn't run!

It doesn't help you to have something bounded if the bound is so insanely high that you never reach it in tests.
And if you try to calculate the worst-case scenarios for such a system, you will find it doesn't schedule - i.e. you have to buy a bigger CPU or do some other drastic thing!

A limit like 1 reader per CPU, on the other hand, behaves like a normal mutex with priority inheritance wrt. determinism. And it scales like the stock Linux rw-lock, which in practice is also limited to 1 task per CPU, as preemption is switched off :-)

> [...]
> I have two SMP machines that I can test on; unfortunately, they both
> have NVIDIA cards, so I can't use them with X unless I go back to the
> default driver. Which I would do, but I really like the 3D graphics ;-)

Well, this kind of thing isn't something you want to combine with 3D graphics right away anyway! I have even had crashes on 2.4.xx with an NVidia card and hyperthreading on a machine I helped install :-( I am afraid NVidia cards and preempt realtime are far in the future :-(

> -- Steve

Esben
* Re: Real-time rw-locks (Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15)
  2004-12-27 16:23 ` Esben Nielsen
@ 2004-12-27 16:39 ` Steven Rostedt
  2004-12-27 21:06   ` Bill Huey
1 sibling, 1 reply; 30+ messages in thread
From: Steven Rostedt @ 2004-12-27 16:39 UTC (permalink / raw)
To: Esben Nielsen
Cc: Ingo Molnar, Rui Nuno Capela, K.R. Foley, Fernando Lopez-Lezcano,
    Mark Johnson, Amit Shah, Karsten Wiese, Bill Huey, Adam Heath, emann,
    Gunther Persoons, LKML, Florian Schmidt, Lee Revell, Shane Shrybman,
    Thomas Gleixner, Michal Schmidt

On Mon, 2004-12-27 at 17:23 +0100, Esben Nielsen wrote:
> > [...]
> > Why limit it to just the number of CPUs? Make it a configurable limit; I
> > would say the default could be 2*CPUs. The reason is that once you
> > limit the number of readers, you have bounded down_write(). Even if the
> > number of readers allowed is 100, down_write() is now bound to
> > O(ceil(n/#cpus)) as you said, but now n is known. Make it a
> > CONFIG_ALLOWED_READERS or something to that effect, and it would be easy
> > to see what is a good optimal configuration (assuming you have the
> > proper tests).
>
> I agree with you that it ought to be configurable. But if you set it to
> something like 100, it is _not_ deterministic in practice. I.e. during your
> tests you have a really hard time getting 100 readers at the critical
> point; most likely you only have a handful. Then you deploy the
> system, where a webserver presenting data read under an rw-lock gets
> overloaded and spawns 100 worker tasks. Bang! Your RT application
> doesn't run!
> It doesn't help you to have something bounded if the bound is so insanely
> high that you never reach it in tests. And if you try to calculate the
> worst-case scenarios for such a system, you will find it doesn't
> schedule - i.e. you have to buy a bigger CPU or do some other drastic
> thing!

I wasn't saying that 100 was a GOOD number; that was just an example.
I'm one of those who don't like the program, application, or kernel limiting how you want to shoot yourself in the foot. But always have the default set to something that prevents the idiot from doing something too idiotic.

> A limit like 1 reader per CPU, on the other hand, behaves like a
> normal mutex with priority inheritance wrt. determinism. And it scales like the
> stock Linux rw-lock, which in practice is also limited to 1 task per CPU, as
> preemption is switched off :-)

Do you think 2 per CPU is too high? I'd figure that reads usually happen way more often than writes, and writes can usually last longer than reads; but with the default at two, you can test your code on a UP.

> > [...]
> > I have two SMP machines that I can test on; unfortunately, they both
> > have NVIDIA cards, so I can't use them with X unless I go back to the
> > default driver. Which I would do, but I really like the 3D graphics ;-)
>
> Well, this kind of thing isn't something you want to combine with 3D
> graphics right away anyway!
> I have even had crashes on 2.4.xx with an NVidia card and
> hyperthreading on a machine I helped install :-(
> I am afraid NVidia cards and preempt realtime are far in the future :-(

Actually, I've had some success with NVIDIA on all my kernels (except, of course, the realtime ones). Unfortunately, there are little crashes here and there, but those usually happen with screen savers, so I don't lose data. I just come back to the machine and say "WTF, I'm at a login prompt!" I am one of those who save everything within seconds of modifying it, and use subversion commits like crazy :-)

-- Steve

^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Real-time rw-locks (Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15)
  2004-12-27 16:39 ` Steven Rostedt
@ 2004-12-27 21:06 ` Bill Huey
  2004-12-27 21:48   ` Valdis.Kletnieks
  ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Bill Huey @ 2004-12-27 21:06 UTC (permalink / raw)
To: Steven Rostedt
Cc: Esben Nielsen, Ingo Molnar, Rui Nuno Capela, K.R. Foley,
    Fernando Lopez-Lezcano, Mark Johnson, Amit Shah, Karsten Wiese,
    Bill Huey, Adam Heath, emann, Gunther Persoons, LKML, Florian Schmidt,
    Lee Revell, Shane Shrybman, Thomas Gleixner, Michal Schmidt

On Mon, Dec 27, 2004 at 11:39:20AM -0500, Steven Rostedt wrote:
> Actually, I've had some success with NVIDIA on all my kernels (except of

Doesn't the NVidia driver use its own version of DRM/DRI? If so, did you tell it to use the Linux kernel version of that driver?

> course the realtime ones). Unfortunately, there are little crashes
> here and there, but those usually happen with screen savers so I don't

I was just having a discussion about this last night with a friend of mine, and I'm going to pose this question to you and others.

Is a real-time enabled kernel still relevant for high-performance video, even with GPUs being as fast as they are these days?

The context I'm working with is that I was told (I've been out of gaming for a long time now) that GPUs are so fast these days that shortage of frame rate isn't a problem any more. An RTOS would be able to deliver data/instructions to the GPU within a much tighter time period and could deliver better, more consistent frame rates.

Does this assertion still apply or not? And why? (for either answer)

bill

^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Real-time rw-locks (Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15)
  2004-12-27 21:06 ` Bill Huey
@ 2004-12-27 21:48 ` Valdis.Kletnieks
  2004-12-28 21:59   ` Lee Revell
  2005-01-04 15:25   ` Andrew McGregor
  2 siblings, 0 replies; 30+ messages in thread
From: Valdis.Kletnieks @ 2004-12-27 21:48 UTC (permalink / raw)
To: Bill Huey
Cc: Steven Rostedt, Esben Nielsen, Ingo Molnar, Rui Nuno Capela,
    K.R. Foley, Fernando Lopez-Lezcano, Mark Johnson, Amit Shah,
    Karsten Wiese, Adam Heath, emann, Gunther Persoons, LKML,
    Florian Schmidt, Lee Revell, Shane Shrybman, Thomas Gleixner,
    Michal Schmidt

On Mon, 27 Dec 2004 13:06:14 PST, Bill Huey said:
> Is a real-time enabled kernel still relevant for high performance
> video even with GPUs being as fast as they are these days ?

More to the point - can an RT kernel help *last* year's model? My laptop only has a GeForce4 440 Go (which is closer to a GeForce2 in reality) and a 1.6GHz Pentium 4. So it isn't any problem at all to find xscreensaver GL hacks that bring it to its knees. Even the venerable 'glxgears' drops down to about 40FPS in an 800x600 window. I'm sure the average game has a *lot* more polygons in it than glxgears does. xscreensaver's 'sierpinski3d' drops down to 18FPS when it gets up to 16K polygons.

Linux has long been renowned for its ability to Get Stuff Done on much older and less capable hardware than the stuff from Redmond. Got an old box that crawled under W2K and Win/XP won't even install? Toss the current RedHat or SuSE on it, and it goes... It would be *really* nice if we could find similar tricks on the graphics side. ;)

> The context that I'm working with is that I was told (been out of
> gaming for a long time now) that GPUs are so fast these days that
> shortage of frame rate isn't a problem any more.
> An RTOS would be
> able to deliver data/instructions to the GPU within a much tighter
> time period and could deliver better, more consistent frame rates.
>
> Does this assertion still apply or not ? why ? (for either answer)

Shortage of frame rate is *always* a problem. No matter how many millions of polygons/sec you can push out the door, somebody will want to do N+25% per second. Ask yourself why SGI was *EVER* able to sell a machine with more than one InfiniteReality graphics engine on it, and then ask yourself what the people who were using 3 IR pipes 5-6 years ago are looking to use for graphics *this* year.

^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Real-time rw-locks (Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15)
  2004-12-27 21:06 ` Bill Huey
  2004-12-27 21:48   ` Valdis.Kletnieks
@ 2004-12-28 21:59 ` Lee Revell
  2005-01-04 15:25   ` Andrew McGregor
  2 siblings, 0 replies; 30+ messages in thread
From: Lee Revell @ 2004-12-28 21:59 UTC (permalink / raw)
To: Bill Huey
Cc: Steven Rostedt, Esben Nielsen, Ingo Molnar, Rui Nuno Capela,
    K.R. Foley, Fernando Lopez-Lezcano, Mark Johnson, Amit Shah,
    Karsten Wiese, Adam Heath, emann, Gunther Persoons, LKML,
    Florian Schmidt, Shane Shrybman, Thomas Gleixner, Michal Schmidt

On Mon, 2004-12-27 at 13:06 -0800, Bill Huey wrote:
> I was just having a discussion about this last night with a friend
> of mine and I'm going to pose this question to you and others.
>
> Is a real-time enabled kernel still relevant for high performance
> video even with GPUs being as fast as they are these days ?
>
> The context that I'm working with is that I was told (been out of
> gaming for a long time now) that GPUs are so fast these days that
> shortage of frame rate isn't a problem any more. An RTOS would be
> able to deliver data/instructions to the GPU within a much tighter
> time period and could deliver better, more consistent frame rates.
>
> Does this assertion still apply or not ? why ? (for either answer)

Yes, an RTOS certainly helps. Otherwise you cannot guarantee a minimum frame rate - if a long-running disk ISR fires, then you are screwed, because just as with low-latency audio, you have a SCHED_FIFO userspace process that is feeding data to the GPU at a constant rate. It's a worse problem with audio, because you absolutely cannot drop a frame or you will hear it. With video it's likely to be imperceptible.

Lee

^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Real-time rw-locks (Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15)
  2004-12-27 21:06 ` Bill Huey
  2004-12-27 21:48   ` Valdis.Kletnieks
  2004-12-28 21:59   ` Lee Revell
@ 2005-01-04 15:25 ` Andrew McGregor
  2 siblings, 0 replies; 30+ messages in thread
From: Andrew McGregor @ 2005-01-04 15:25 UTC (permalink / raw)
To: Bill Huey
Cc: Rui Nuno Capela, LKML, Ingo Molnar, Gunther Persoons,
    Thomas Gleixner, Karsten Wiese, Michal Schmidt, Shane Shrybman,
    Mark Johnson, emann, Adam Heath, Steven Rostedt, K.R. Foley,
    Lee Revell, Esben Nielsen, Amit Shah, Fernando Lopez-Lezcano,
    Florian Schmidt

On 28/12/2004, at 10:06 AM, Bill Huey (hui) wrote:
> On Mon, Dec 27, 2004 at 11:39:20AM -0500, Steven Rostedt wrote:
>> Actually, I've had some success with NVIDIA on all my kernels (except
>> of
>
> Doesn't the NVidia driver use its own version of DRM/DRI?
> If so, did you tell it to use the Linux kernel version of that
> driver?
>
>> course the realtime ones). Unfortunately, there are little crashes
>> here and there, but those usually happen with screen savers so I don't
>
> I was just having a discussion about this last night with a friend
> of mine and I'm going to pose this question to you and others.
>
> Is a real-time enabled kernel still relevant for high performance
> video even with GPUs being as fast as they are these days ?

It is if you want to do any audio at the same time, as you usually do.

Andrew

^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Real-time rw-locks (Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15)
  2004-12-27 16:23 ` Esben Nielsen
  2004-12-27 16:39   ` Steven Rostedt
@ 2004-12-28 21:42 ` Lee Revell
  1 sibling, 0 replies; 30+ messages in thread
From: Lee Revell @ 2004-12-28 21:42 UTC (permalink / raw)
To: Esben Nielsen
Cc: Steven Rostedt, Ingo Molnar, Rui Nuno Capela, K.R. Foley,
    Fernando Lopez-Lezcano, Mark Johnson, Amit Shah, Karsten Wiese,
    Bill Huey, Adam Heath, emann, Gunther Persoons, LKML,
    Florian Schmidt, Shane Shrybman, Thomas Gleixner, Michal Schmidt

On Mon, 2004-12-27 at 17:23 +0100, Esben Nielsen wrote:
> I am afraid NVidia cards and preempt realtime are far in the future :-(

Not necessarily. I am sure many of Nvidia's potential customers want to do high-end simulation, and that requires an RT-capable OS and good graphics hardware. If many people end up using RT preempt for high-end simulator applications, like several who have posted on this list, then it seems like it would be a good fit.

Lee

^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Real-time rw-locks (Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15)
  2004-12-27 14:35 ` Real-time rw-locks (Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15) Esben Nielsen
  2004-12-27 15:27   ` Steven Rostedt
@ 2005-01-28  7:38 ` Ingo Molnar
  2005-01-28 11:56   ` William Lee Irwin III
  ` (2 more replies)
  1 sibling, 3 replies; 30+ messages in thread
From: Ingo Molnar @ 2005-01-28 7:38 UTC (permalink / raw)
To: Esben Nielsen
Cc: Rui Nuno Capela, K.R. Foley, Fernando Lopez-Lezcano, mark_h_johnson,
    Amit Shah, Karsten Wiese, Bill Huey, Adam Heath, emann,
    Gunther Persoons, linux-kernel, Florian Schmidt, Lee Revell,
    Shane Shrybman, Thomas Gleixner, Michal Schmidt

* Esben Nielsen <simlo@phys.au.dk> wrote:

> I noticed that you changed rw-locks to behave quite differently under
> real-time preemption: they basically work like normal locks now, i.e.
> there can only be one reader task within each region. That task can,
> however, lock the region recursively. [...]

correct.

> [...] I wanted to start looking at fixing that because it ought to
> hurt scalability quite a bit - and even on UP create a few unneeded
> task switches. [...]

no, it's not a big scalability problem. rwlocks are really a mistake - if you want scalability and spinlocks/semaphores are not enough then one should either use per-CPU locks or lockless structures. rwlocks/rwsems will very unlikely help much.

> However, the more I think about it, the bigger the problem:

yes, that complexity needed to make it perform in a deterministic manner is why i introduced this (major!) simplification of locking. It turns out that most of the time the actual use of rwlocks matches this simplified 'owner-recursive exclusive lock' semantics, so we are lucky.

look at what kind of worst-case scenarios there may already be with multiple spinlocks (blocker.c). With rwlocks that just gets insane.

Ingo

^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Real-time rw-locks (Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15)
  2005-01-28  7:38 ` Ingo Molnar
@ 2005-01-28 11:56 ` William Lee Irwin III
  2005-01-28 15:28   ` Ingo Molnar
  2005-01-28 19:18   ` Trond Myklebust
  2005-01-30 22:03   ` Esben Nielsen
  2 siblings, 1 reply; 30+ messages in thread
From: William Lee Irwin III @ 2005-01-28 11:56 UTC (permalink / raw)
To: Ingo Molnar
Cc: Esben Nielsen, Rui Nuno Capela, K.R. Foley, Fernando Lopez-Lezcano,
    mark_h_johnson, Amit Shah, Karsten Wiese, Bill Huey, Adam Heath,
    emann, Gunther Persoons, linux-kernel, Florian Schmidt, Lee Revell,
    Shane Shrybman, Thomas Gleixner, Michal Schmidt

* Esben Nielsen <simlo@phys.au.dk> wrote:
>> [...] I wanted to start looking at fixing that because it ought to
>> hurt scalability quite a bit - and even on UP create a few unneeded
>> task switches. [...]

On Fri, Jan 28, 2005 at 08:38:56AM +0100, Ingo Molnar wrote:
> no, it's not a big scalability problem. rwlocks are really a mistake -
> if you want scalability and spinlocks/semaphores are not enough then one
> should either use per-CPU locks or lockless structures. rwlocks/rwsems
> will very unlikely help much.

I wouldn't be so sure about that. SGI is already implicitly relying on the parallel holding of rwsems for lockless pagefaulting, and Oracle has been pushing for mapping->tree_lock to become an rwlock for a while, both for large performance gains.

* Esben Nielsen <simlo@phys.au.dk> wrote:
>> However, the more I think about it, the bigger the problem:

On Fri, Jan 28, 2005 at 08:38:56AM +0100, Ingo Molnar wrote:
> yes, that complexity needed to make it perform in a deterministic manner is why
> i introduced this (major!) simplification of locking. It turns out that
> most of the time the actual use of rwlocks matches this simplified
> 'owner-recursive exclusive lock' semantics, so we are lucky.
> look at what kind of worst-case scenarios there may already be with
> multiple spinlocks (blocker.c). With rwlocks that just gets insane.
tasklist_lock is one large exception; it's meant for concurrency there, and it even gets sufficient concurrency to starve the write side. Try test_remap.c on mainline vs. -mm to get a microbenchmark-level notion of the importance of mapping->tree_lock being an rwlock (IIRC you were cc:'d on at least some of those threads). net/ has numerous rwlocks, which appear frequently to be associated with hashtables, and at least some of them have some relevance to performance.

Are you suggesting that lockless alternatives to mapping->tree_lock, mm->mmap_sem, and tasklist_lock should be pursued now?

-- wli

^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Real-time rw-locks (Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15)
  2005-01-28 11:56 ` William Lee Irwin III
@ 2005-01-28 15:28 ` Ingo Molnar
  2005-01-28 15:55   ` William Lee Irwin III
  0 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2005-01-28 15:28 UTC (permalink / raw)
To: William Lee Irwin III
Cc: Esben Nielsen, Rui Nuno Capela, K.R. Foley, Fernando Lopez-Lezcano,
    mark_h_johnson, Amit Shah, Karsten Wiese, Bill Huey, Adam Heath,
    emann, Gunther Persoons, linux-kernel, Florian Schmidt, Lee Revell,
    Shane Shrybman, Thomas Gleixner, Michal Schmidt

* William Lee Irwin III <wli@holomorphy.com> wrote:

> On Fri, Jan 28, 2005 at 08:38:56AM +0100, Ingo Molnar wrote:
> > no, it's not a big scalability problem. rwlocks are really a mistake -
> > if you want scalability and spinlocks/semaphores are not enough then one
> > should either use per-CPU locks or lockless structures. rwlocks/rwsems
> > will very unlikely help much.
>
> I wouldn't be so sure about that. SGI is already implicitly relying on
> the parallel holding of rwsems for lockless pagefaulting, and
> Oracle has been pushing for mapping->tree_lock to become an rwlock for a
> while, both for large performance gains.

i dont really buy it. Any rwlock-type of locking causes global cacheline bounces. It can make a positive scalability difference only if the read-lock hold time is large, at which point RCU could likely have significantly higher performance. There _may_ be an intermediate locking pattern that is both long-held and has a higher mix of write-locking, where rwlocks/rwsems may have a performance advantage over RCU or spinlocks.

Also, this is about PREEMPT_RT, mainly aimed towards embedded systems, and at most towards small (dual-CPU) SMP systems, not the really big systems.

But the main argument wrt.
PREEMPT_RT stands and is independent of any scalability properties: rwlocks/rwsems have such bad deterministic behavior that they are almost impossible to implement in a sane way.

Ingo

^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Real-time rw-locks (Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15)
  2005-01-28 15:28 ` Ingo Molnar
@ 2005-01-28 15:55 ` William Lee Irwin III
  2005-01-28 16:16   ` Ingo Molnar
  0 siblings, 1 reply; 30+ messages in thread
From: William Lee Irwin III @ 2005-01-28 15:55 UTC (permalink / raw)
To: Ingo Molnar
Cc: Esben Nielsen, Rui Nuno Capela, K.R. Foley, Fernando Lopez-Lezcano,
    mark_h_johnson, Amit Shah, Karsten Wiese, Bill Huey, Adam Heath,
    emann, Gunther Persoons, linux-kernel, Florian Schmidt, Lee Revell,
    Shane Shrybman, Thomas Gleixner, Michal Schmidt

* William Lee Irwin III <wli@holomorphy.com> wrote:
>> I wouldn't be so sure about that. SGI is already implicitly relying on
>> the parallel holding of rwsems for lockless pagefaulting, and
>> Oracle has been pushing for mapping->tree_lock to become an rwlock for a
>> while, both for large performance gains.

On Fri, Jan 28, 2005 at 04:28:02PM +0100, Ingo Molnar wrote:
> i dont really buy it. Any rwlock-type of locking causes global cacheline
> bounces. It can make a positive scalability difference only if the
> read-lock hold time is large, at which point RCU could likely have
> significantly higher performance. There _may_ be an intermediate locking
> pattern that is both long-held and has a higher mix of write-locking,
> where rwlocks/rwsems may have a performance advantage over RCU or
> spinlocks.

The performance relative to mutual exclusion is quantifiable and very reproducible. These results have baffled people making arguments similar to the one you made above. Systems as small as 4 logical CPUs feel these effects strongly, and the effect appears to scale almost linearly with the number of CPUs. It may be worth consulting an x86 processor architect or similar to get an idea of why the counterargument fails. I'm rather interested in hearing why as well, as I believed the cacheline-bounce argument until presented with incontrovertible evidence to the contrary.
As far as performance relative to RCU goes, I suspect cases will arise for these locks where write-side latency is important. Other lockless methods are probably more appropriate, and are more likely to dominate rwlocks as expected. For instance, a reimplementation of the radix trees for lockless insertion and traversal (cf. the lockless pagetable patches for examples of how that's carried out) is plausible, where RCU memory overhead in struct page is not.

On Fri, Jan 28, 2005 at 04:28:02PM +0100, Ingo Molnar wrote:
> Also, this is about PREEMPT_RT, mainly aimed towards embedded systems,
> and at most towards small (dual-CPU) SMP systems, not the really
> big systems.
> But the main argument wrt. PREEMPT_RT stands and is independent of any
> scalability properties: rwlocks/rwsems have such bad deterministic
> behavior that they are almost impossible to implement in a sane way.

I suppose if it's not headed toward mainline, the counterexamples don't really matter. I don't have much to say about the RT-related issues, though I'm aware of priority inheritance's infeasibility for the rwlock/rwsem case.

-- wli

^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Real-time rw-locks (Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15)
  2005-01-28 15:55 ` William Lee Irwin III
@ 2005-01-28 16:16 ` Ingo Molnar
  0 siblings, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2005-01-28 16:16 UTC (permalink / raw)
To: William Lee Irwin III
Cc: Esben Nielsen, Rui Nuno Capela, K.R. Foley, Fernando Lopez-Lezcano,
    mark_h_johnson, Amit Shah, Karsten Wiese, Bill Huey, Adam Heath,
    emann, Gunther Persoons, linux-kernel, Florian Schmidt, Lee Revell,
    Shane Shrybman, Thomas Gleixner, Michal Schmidt

* William Lee Irwin III <wli@holomorphy.com> wrote:

> The performance relative to mutual exclusion is quantifiable and very
> reproducible. [...]

yes, i dont doubt the results - my point is that it's not proven that the other, more read-friendly types of locking underperform rwlocks. Obviously spinlocks and rwlocks have the same cache-bounce properties, so rwlocks can outperform spinlocks if the read-path overhead is higher than that of a bounce and reads are dominant. But it's still a poor form of scalability. In fact, when the read path is really expensive (larger than say 10-20 usecs) an rwlock can produce the appearance of linear scalability, when compared to spinlocks.

> As far as performance relative to RCU goes, I suspect cases will arise
> for these locks where write-side latency is important. Other lockless
> methods are probably more appropriate, and are more likely to dominate
> rwlocks as expected. For instance, a reimplementation of the radix
> trees for lockless insertion and traversal (cf. the lockless pagetable
> patches for examples of how that's carried out) is plausible, where
> RCU memory overhead in struct page is not.

yeah.

Ingo

^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Real-time rw-locks (Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15)
  2005-01-28  7:38 ` Ingo Molnar
  2005-01-28 11:56   ` William Lee Irwin III
@ 2005-01-28 19:18 ` Trond Myklebust
  2005-01-28 19:45   ` Ingo Molnar
  2005-01-28 21:13   ` Lee Revell
  2005-01-30 22:03   ` Esben Nielsen
  2 siblings, 2 replies; 30+ messages in thread
From: Trond Myklebust @ 2005-01-28 19:18 UTC (permalink / raw)
To: Ingo Molnar
Cc: Esben Nielsen, Rui Nuno Capela, K.R. Foley, Fernando Lopez-Lezcano,
    mark_h_johnson, Amit Shah, Karsten Wiese, Bill Huey, Adam Heath,
    emann, Gunther Persoons, linux-kernel, Florian Schmidt, Lee Revell,
    Shane Shrybman, Thomas Gleixner, Michal Schmidt

On Friday 28.01.2005 at 08:38 (+0100), Ingo Molnar wrote:
> no, it's not a big scalability problem. rwlocks are really a mistake -
> if you want scalability and spinlocks/semaphores are not enough then one
> should either use per-CPU locks or lockless structures. rwlocks/rwsems
> will very unlikely help much.

If you do have a highest interrupt case that causes all activity to block, then rwsems may indeed fit the bill.

In the NFS client code we may use rwsems in order to protect stateful operations against the (very infrequently used) server reboot recovery code. The point is that when the server reboots, it forces us to block *all* requests that involve adding new state (e.g. opening an NFSv4 file, or setting up a lock) while our client and others are re-establishing their existing state on the server.

IOW: if you are planning on converting rwsems into semaphores, you will screw us over most royally, by converting the currently highly infrequent scenario of a single task being able to access the server into the common case.

Cheers,
Trond
--
Trond Myklebust <trond.myklebust@fys.uio.no>

^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Real-time rw-locks (Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15)
  2005-01-28 19:18           ` Trond Myklebust
@ 2005-01-28 19:45             ` Ingo Molnar
  2005-01-28 23:25               ` Bill Huey
  2005-01-28 21:13             ` Lee Revell
  1 sibling, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2005-01-28 19:45 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Esben Nielsen, Rui Nuno Capela, K.R. Foley, Fernando Lopez-Lezcano,
      mark_h_johnson, Amit Shah, Karsten Wiese, Bill Huey, Adam Heath,
      emann, Gunther Persoons, linux-kernel, Florian Schmidt, Lee Revell,
      Shane Shrybman, Thomas Gleixner, Michal Schmidt

* Trond Myklebust <trond.myklebust@fys.uio.no> wrote:

> If you do have a highest interrupt case that causes all activity to
> block, then rwsems may indeed fit the bill.
>
> In the NFS client code we may use rwsems in order to protect stateful
> operations against the (very infrequently used) server reboot recovery
> code. The point is that when the server reboots, the server forces us
> to block *all* requests that involve adding new state (e.g. opening an
> NFSv4 file, or setting up a lock) while our client and others are
> re-establishing their existing state on the server.

it seems the most scalable solution for this would be a global flag plus
per-CPU spinlocks (or per-CPU mutexes) to make this totally scalable and
still support the requirements of this rare event. An rwsem really
bounces around on SMP, and it seems very unnecessary in the case you
described.

possibly this could be formalised as an rwlock/rwsem implementation that
scales better. brlocks were such an attempt.

> IOW: If you are planning on converting rwsems into a semaphore, you
> will screw us over most royally, by converting the currently highly
> infrequent scenario of a single task being able to access the server
> into the common case.

nono, i have no such plans.

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: Real-time rw-locks (Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15)
  2005-01-28 19:45             ` Ingo Molnar
@ 2005-01-28 23:25               ` Bill Huey
  0 siblings, 0 replies; 30+ messages in thread
From: Bill Huey @ 2005-01-28 23:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Trond Myklebust, Esben Nielsen, Rui Nuno Capela, K.R. Foley,
      Fernando Lopez-Lezcano, mark_h_johnson, Amit Shah, Karsten Wiese,
      Bill Huey, Adam Heath, emann, Gunther Persoons, linux-kernel,
      Florian Schmidt, Lee Revell, Shane Shrybman, Thomas Gleixner,
      Michal Schmidt, Igor Manyilov (auriga)

On Fri, Jan 28, 2005 at 08:45:46PM +0100, Ingo Molnar wrote:
> * Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> > If you do have a highest interrupt case that causes all activity to
> > block, then rwsems may indeed fit the bill.
> >
> > In the NFS client code we may use rwsems in order to protect stateful
> > operations against the (very infrequently used) server reboot recovery
> > code. The point is that when the server reboots, the server forces us
> > to block *all* requests that involve adding new state (e.g. opening an
> > NFSv4 file, or setting up a lock) while our client and others are
> > re-establishing their existing state on the server.
>
> it seems the most scalable solution for this would be a global flag plus
> per-CPU spinlocks (or per-CPU mutexes) to make this totally scalable and
> still support the requirements of this rare event. An rwsem really
> bounces around on SMP, and it seems very unnecessary in the case you
> described.
>
> possibly this could be formalised as an rwlock/rwlock implementation
> that scales better. brlocks were such an attempt.

From how I understand it, you'll have to have a global structure to
denote an exclusive operation and then take some additional cpumask_t
representing the spinlock set and use it to iterate over when doing a
PI chain operation.

Locking each individual parametrically typed spinlock might require a
raw_spinlock to manipulate list structures, which, added up, is rather
heavyweight. Not only that, you'd have to introduce a notion of it
being counted, since it could also be acquired/preempted by another
higher-priority thread on that same processor. Not having this
semantic would make the thread in that specific circumstance
effectively non-preemptible (PI scheduler indeterminacy), where the
multiple-readers portion of a real read/write (shared-exclusive) lock
would have permitted this.

http://people.lynuxworks.com/~bhuey/rt-share-exclusive-lock/rtsem.tgz.1208

is our attempt at getting real shared-exclusive lock semantics into a
blocking lock; it may still be incomplete and buggy. Igor is still
working on this and this is the latest that I have of his work.
Getting comments on this approach would be a good thing, as I/we
(me/Igor) believed from the start that this approach is correct.
Assuming that this is possible with the current approach, optimizing
it to avoid CPU ping-ponging is an important next step.

bill

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: Real-time rw-locks (Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15)
  2005-01-28 19:18           ` Trond Myklebust
  2005-01-28 19:45             ` Ingo Molnar
@ 2005-01-28 21:13             ` Lee Revell
  1 sibling, 0 replies; 30+ messages in thread
From: Lee Revell @ 2005-01-28 21:13 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Ingo Molnar, Esben Nielsen, Rui Nuno Capela, K.R. Foley,
      Fernando Lopez-Lezcano, mark_h_johnson, Amit Shah, Karsten Wiese,
      Bill Huey, Adam Heath, emann, Gunther Persoons, linux-kernel,
      Florian Schmidt, Shane Shrybman, Thomas Gleixner, Michal Schmidt

On Fri, 2005-01-28 at 11:18 -0800, Trond Myklebust wrote:
> In the NFS client code we may use rwsems in order to protect stateful
> operations against the (very infrequently used) server reboot recovery
> code. The point is that when the server reboots, the server forces us to
> block *all* requests that involve adding new state (e.g. opening an
> NFSv4 file, or setting up a lock) while our client and others are
> re-establishing their existing state on the server.

Hmm, when I was an ISP sysadmin I used to use this all the time. NFS
mounts from the BSD/OS clients would start to act up under heavy web
server load, and the cleanest way to get them to recover was to
simulate a reboot on the NetApp. Of course Linux clients were
unaffected; they were just along for the ride ;-)

Lee

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: Real-time rw-locks (Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15)
  2005-01-28  7:38         ` Ingo Molnar
  2005-01-28 11:56           ` William Lee Irwin III
  2005-01-28 19:18           ` Trond Myklebust
@ 2005-01-30 22:03           ` Esben Nielsen
  2005-01-30 23:59             ` Kyle Moffett
  2 siblings, 1 reply; 30+ messages in thread
From: Esben Nielsen @ 2005-01-30 22:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rui Nuno Capela, K.R. Foley, Fernando Lopez-Lezcano,
      mark_h_johnson, Amit Shah, Karsten Wiese, Bill Huey, Adam Heath,
      emann, Gunther Persoons, linux-kernel, Florian Schmidt, Lee Revell,
      Shane Shrybman, Thomas Gleixner, Michal Schmidt

On Fri, 28 Jan 2005, Ingo Molnar wrote:

> * Esben Nielsen <simlo@phys.au.dk> wrote:
>
> > I noticed that you changed rw-locks to behave quite differently under
> > real-time preemption: they basically work like normal locks now, i.e.
> > there can only be one reader task within each region. This task can,
> > however, lock the region recursively. [...]
>
> correct.
>
> > [...] I wanted to start looking at fixing that because it ought to
> > hurt scalability quite a bit - and even on UP create a few unneeded
> > task switches. [...]
>
> no, it's not a big scalability problem. rwlocks are really a mistake -
> if you want scalability and spinlocks/semaphores are not enough then one
> should either use per-CPU locks or lockless structures. rwlocks/rwsems
> will very unlikely help much.

I agree that RCU ought to do the trick in a lot of cases.
Unfortunately, in a lot of code people have used an rwlock rather than
RCU. I also like the idea of rwlocks: many readers or just one writer.
I don't see the need to take that away from people.

Here is an example which even on UP will give problems without it: you
have a shared data structure, rarely updated, with many readers. A
low-priority task is reading it. It is preempted by a high-priority
task which finds out it can't read it -> priority inheritance, task
switch. The low-priority task finishes the job -> priority set back,
task switch. If it was done with an rwlock, two task switches would
have been saved.

> > However, the more I think about it the bigger the problem:
>
> yes, that complexity to get it to perform in a deterministic manner is
> why i introduced this (major!) simplification of locking. It turns out
> that most of the time the actual use of rwlocks matches this simplified
> 'owner-recursive exclusive lock' semantics, so we are lucky.
>
> look at what kind of worst-case scenarios there may already be with
> multiple spinlocks (blocker.c). With rwlocks that just gets insane.

Yes it does. But one could make a compromise: the up_write() should
_not_ be deterministic. In that case it would be very simple to
implement. up_read() could still be deterministic, as it would only
involve boosting one writer in the rare case such a one exists. That
kind of locking would be very useful in many real-time systems. Of
course, RCU can do the job as well, but it puts a lot of constraints
on the code. However, as Linux is a general OS, there is no way to
know whether a specific lock needs to be deterministic wrt. writing or
not, as the actual application is not known at the time the lock type
is specified.

> 	Ingo

Esben

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: Real-time rw-locks (Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15)
  2005-01-30 22:03           ` Esben Nielsen
@ 2005-01-30 23:59             ` Kyle Moffett
  0 siblings, 0 replies; 30+ messages in thread
From: Kyle Moffett @ 2005-01-30 23:59 UTC (permalink / raw)
  To: Esben Nielsen
  Cc: Shane Shrybman, K.R. Foley, Thomas Gleixner, emann, Rui Nuno Capela,
      Adam Heath, Gunther Persoons, Florian Schmidt, mark_h_johnson,
      linux-kernel, Fernando Lopez-Lezcano, Ingo Molnar, Karsten Wiese,
      Bill Huey, Amit Shah, Lee Revell, Michal Schmidt

For anybody who wants a good executive summary of RCU, see these:
http://lse.sourceforge.net/locking/rcupdate.html
http://lse.sourceforge.net/locking/rcu/HOWTO/intro.html#WHATIS

On Jan 30, 2005, at 17:03, Esben Nielsen wrote:
> I agree that RCU ought to do the trick in a lot of cases.
> Unfortunately, in a lot of code people have used an rwlock rather
> than RCU. I also like the idea of rwlocks: many readers or just one
> writer.

Well, RCU is nice because as long as there are no processes attempting
to modify the data, the performance is as though there was no locking
at all, which is better than the cacheline bouncing of rwlock
read-acquires, which must modify the rwlock data every time you
acquire. It's only when you need to modify the data that readers or
other writers must repeat their calculations when they find out that
the data has changed. In the case of a reader and a writer, the
performance reduction is the same as a cmpxchg plus the reader redoing
its calculations (if necessary).

> You have a shared data structure, rarely updated, with many readers.
> A low-priority task is reading it. It is preempted by a high-priority
> task which finds out it can't read it -> priority inheritance, task
> switch. The low-priority task finishes the job -> priority set back,
> task switch. If it was done with an rwlock, two task switches would
> have been saved.

With RCU the high-priority task (unlikely to be preempted) gets to run
all the way through with its calculation, and any low-priority tasks
are the ones that will probably need to redo their calculations.

> Yes it does. But one could make a compromise: the up_write() should
> _not_ be deterministic. In that case it would be very simple to
> implement. up_read() could still be deterministic, as it would only
> involve boosting one writer in the rare case such a one exists. That
> kind of locking would be very useful in many real-time systems. Of
> course, RCU can do the job as well, but it puts a lot of constraints
> on the code.

Yeah, unfortunately it's harder to write good reliable RCU code than
good reliable rwlock code, because the semantics of RCU WRT memory
access are much more difficult, so more people write rwlock code that
needs to be cleaned up. It's not like normal locking is easily
comprehensible either. :-\

Cheers,
Kyle Moffett

-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCM/CS/IT/U d- s++: a18 C++++>$ UB/L/X/*++++(+)>$ P+++(++++)>$ L++++(+++)
E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+ PGP+++ t+(+++) 5 X
R? tv-(--) b++++(++) DI+ D+ G e->++++$ h!*()>++$ r !y?(-)
------END GEEK CODE BLOCK------

^ permalink raw reply	[flat|nested] 30+ messages in thread
end of thread, other threads: [~2005-01-30 23:59 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-12-10 17:49 [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15 Mark_H_Johnson
2004-12-10 21:09 ` Ingo Molnar
2004-12-10 21:12   ` Ingo Molnar
2004-12-10 21:24     ` Ingo Molnar
2004-12-13  0:16 ` Fernando Lopez-Lezcano
2004-12-13  6:47   ` Ingo Molnar
2004-12-14  0:46     ` Fernando Lopez-Lezcano
2004-12-14  4:42 ` K.R. Foley
2004-12-14  8:47 ` Rui Nuno Capela
2004-12-14 11:35   ` Ingo Molnar
2004-12-27 14:35 ` Real-time rw-locks (Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15) Esben Nielsen
2004-12-27 15:27   ` Steven Rostedt
2004-12-27 16:23     ` Esben Nielsen
2004-12-27 16:39       ` Steven Rostedt
2004-12-27 21:06         ` Bill Huey
2004-12-27 21:48           ` Valdis.Kletnieks
2004-12-28 21:59             ` Lee Revell
2005-01-04 15:25               ` Andrew McGregor
2004-12-28 21:42           ` Lee Revell
2005-01-28  7:38         ` Ingo Molnar
2005-01-28 11:56           ` William Lee Irwin III
2005-01-28 15:28             ` Ingo Molnar
2005-01-28 15:55               ` William Lee Irwin III
2005-01-28 16:16                 ` Ingo Molnar
2005-01-28 19:18           ` Trond Myklebust
2005-01-28 19:45             ` Ingo Molnar
2005-01-28 23:25               ` Bill Huey
2005-01-28 21:13             ` Lee Revell
2005-01-30 22:03           ` Esben Nielsen
2005-01-30 23:59             ` Kyle Moffett