* CPUfreq - udelay() interaction issues
@ 2010-04-22 3:34 Saravana Kannan
2010-04-22 21:22 ` Saravana Kannan
2010-04-22 23:21 ` Saravana Kannan
0 siblings, 2 replies; 16+ messages in thread
From: Saravana Kannan @ 2010-04-22 3:34 UTC (permalink / raw)
To: cpufreq, linux-arm-msm; +Cc: skannan
Hi,
I think there are a couple of issues with cpufreq and udelay
interaction. But that's based on my understanding of cpufreq. I have
worked with it for some time now, so hopefully I am not completely wrong.
So, I will list my assumptions and what I think is/are the issue(s) and
their solutions.
Please correct me if I'm wrong and let me know what you think.
Assumptions:
============
* Let's assume ondemand governor is being used.
* Ondemand uses one timer per core and they have CPU affinity set.
* For SMP, CPUfreq core expects the CPUfreq driver to adjust the per-CPU
jiffies.
* P1 indicates a lower CPU performance level and P2 indicates a much
higher CPU perf level (say 10 times faster).
Issue 1: UP (non-SMP) scenario
==============================
This issue is also present for SMP case, but I don't want to complicate
this example with it. For future reference in this thread, let's call
this "Context switch issue".
Steps:
- CPU running at P1
- Driver context calls udelay
- udelay does loop calculation and starts looping
- Context switches to ondemand gov timer function
- Ondemand gov changes CPU to P2
- Context switches back to Driver context
- udelay does a delay that's 10 times shorter.
The last point is obviously a bad thing. I'm more concerned about ARM
arch for the moment, but considering x86 takes a max of 20ms (20000us)
for udelay, the above scenario looks very possible.
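To put numbers on the steps above, here is a minimal sketch in C. It assumes a simplified model in which udelay() converts the requested time into a fixed loop count once, at call time, and the CPU then changes speed part-way through the loop. The function names and the loops-per-microsecond figures are hypothetical and chosen only for illustration; they are not taken from any kernel source.

```c
/* Loop count a delay routine would compute for `us` microseconds when the
 * CPU executes `loops_per_us` delay-loop iterations per microsecond. */
unsigned long delay_loops(unsigned long us, unsigned long loops_per_us)
{
	return us * loops_per_us;
}

/* Wall-clock time (in us) the loop really takes if the CPU switches from
 * lpus_old to lpus_new after `done` iterations have already been executed.
 * The loop count itself is never recomputed. */
unsigned long actual_delay_us(unsigned long loops, unsigned long done,
			      unsigned long lpus_old, unsigned long lpus_new)
{
	if (done > loops)
		done = loops;
	return done / lpus_old + (loops - done) / lpus_new;
}
```

With P1 at 100 loops/us and P2 at 1000 loops/us, a 1000 us request becomes 100000 loops. If the switch to P2 happens right after the calculation, those loops finish in 100 us, which is the 10 times shorter delay described in the last step.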
Is there anything I missed that prevents this from happening?
If this really is an issue, then one solution is to make cpufreq defer
the freq change if some flag indicates that udelay is active. Basically,
some kind of r/w semaphore or spinlock.
Does this sound like a reasonable solution?
Issue 2: SMP scenario
=====================
For future reference in this thread, let's call this "CPU affinity issue".
Steps:
- CPU0 running at P1
- CPU1 running at P2
- Driver context calls udelay in CPU0
- udelay does loop calculation and starts looping
- Driver context/thread is moved from CPU0 to CPU1
- udelay does a delay that's 10 times shorter.
Again, the last point is obviously a bad thing. Am I missing anything
here too? Again, I care more about ARM, but x86 (which a lot more people
might care about) also seems to be broken if it doesn't use the TSC
method for the delay.
Assuming we fix Issue 1 (or it's not present) I think an ideal solution
for this issue is to do something like:
udelay(us)
{
	set cpu affinity to current CPU;
	do the usual udelay code;
	restore cpu affinity status;
}
Does this sound like a reasonable solution?
Thanks,
Saravana
* Re: CPUfreq - udelay() interaction issues
2010-04-22 3:34 CPUfreq - udelay() interaction issues Saravana Kannan
@ 2010-04-22 21:22 ` Saravana Kannan
2010-04-22 23:18 ` Thomas Renninger
2010-04-22 23:21 ` Saravana Kannan
1 sibling, 1 reply; 16+ messages in thread
From: Saravana Kannan @ 2010-04-22 21:22 UTC (permalink / raw)
To: cpufreq, linux-arm-msm
Dave, Venkatesh and other maintainers,
Any comments?
Thanks,
Saravana
* Re: CPUfreq - udelay() interaction issues
2010-04-22 21:22 ` Saravana Kannan
@ 2010-04-22 23:18 ` Thomas Renninger
2010-04-22 23:37 ` Saravana Kannan
0 siblings, 1 reply; 16+ messages in thread
From: Thomas Renninger @ 2010-04-22 23:18 UTC (permalink / raw)
To: Saravana Kannan; +Cc: cpufreq, linux-arm-msm
On Thursday 22 April 2010 11:22:20 pm Saravana Kannan wrote:
> Dave, Venkatesh and other maintainers,
>
> Any comments?
From adjust_jiffies in cpufreq.c:
* adjust_jiffies - adjust the system "loops_per_jiffy"
*
* This function alters the system "loops_per_jiffy" for the clock
* speed change. Note that loops_per_jiffy cannot be updated on SMP
* systems as each CPU might be scaled differently. So, use the arch
* per-CPU loops_per_jiffy value wherever possible.
For SMP case adjust_jiffies is just empty.
udelay on x86 uses the per cpu loops_per_jiffy:
cpu_data(raw_smp_processor_id()).loops_per_jiffy
which does not get adjusted via adjust_jiffies()
To me it looks as if udelay is always wrong and delays too long
at lower frequencies, but I may be overlooking something.
It shouldn't be that hard to test this with a tiny test module
that measures real udelay delay times via TSC reads on an x86 machine
with a stable TSC. Doing that in a loop, printing the difference from how
long it should have delayed, and running it under lowered freq or whatever
bad circumstances, should show the worst cases after some time.
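The measurement idea can be tried out in user space first. The following is only an analogue of the proposed test module, not the module itself: it times a fixed-count busy loop (the same shape as a loops_per_jiffy based delay) against a clock that does not depend on the loop speed, and records the shortest and longest run. All names are hypothetical. The standard C11 timespec_get() wall clock is an assumption made to keep the sketch portable; a real test would read a stable TSC or a monotonic clock.

```c
#include <stdint.h>
#include <time.h>

/* Wall-clock time in nanoseconds; stands in for a stable TSC read. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	timespec_get(&ts, TIME_UTC);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Fixed-count busy loop; volatile keeps the compiler from removing it. */
static void busy_loop(unsigned long loops)
{
	volatile unsigned long i;

	for (i = 0; i < loops; i++)
		;
}

/* Time `loops` iterations `runs` times; report shortest and longest run. */
void measure_spread(unsigned long loops, int runs,
		    uint64_t *min_ns, uint64_t *max_ns)
{
	int r;

	*min_ns = UINT64_MAX;
	*max_ns = 0;
	for (r = 0; r < runs; r++) {
		uint64_t t0 = now_ns();
		uint64_t dt;

		busy_loop(loops);
		dt = now_ns() - t0;
		if (dt < *min_ns)
			*min_ns = dt;
		if (dt > *max_ns)
			*max_ns = dt;
	}
}
```

Running measure_spread() repeatedly while the governor changes frequency should show the minimum dropping whenever the same loop count runs on a faster clock.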
Thomas
* Re: CPUfreq - udelay() interaction issues
2010-04-22 3:34 CPUfreq - udelay() interaction issues Saravana Kannan
2010-04-22 21:22 ` Saravana Kannan
@ 2010-04-22 23:21 ` Saravana Kannan
2010-04-23 18:40 ` Mathieu Desnoyers
1 sibling, 1 reply; 16+ messages in thread
From: Saravana Kannan @ 2010-04-22 23:21 UTC (permalink / raw)
To: cpufreq, linux-arm-msm, Dave Jones, Venkatesh Pallipadi,
Mathieu Desnoyers, Thomas Renninger
Resending email to "cc" the maintainers.
Maintainers,
Any comments?
-Saravana
* Re: CPUfreq - udelay() interaction issues
2010-04-22 23:18 ` Thomas Renninger
@ 2010-04-22 23:37 ` Saravana Kannan
0 siblings, 0 replies; 16+ messages in thread
From: Saravana Kannan @ 2010-04-22 23:37 UTC (permalink / raw)
To: Thomas Renninger, Dave Jones; +Cc: cpufreq, linux-arm-msm
Looks like our emails just crossed each other.
Thomas Renninger wrote:
> On Thursday 22 April 2010 11:22:20 pm Saravana Kannan wrote:
>> Dave, Venkatesh and other maintainers,
>>
>> Any comments?
> From adjust_jiffies in cpufreq.c:
> * adjust_jiffies - adjust the system "loops_per_jiffy"
> *
> * This function alters the system "loops_per_jiffy" for the clock
> * speed change. Note that loops_per_jiffy cannot be updated on SMP
> * systems as each CPU might be scaled differently. So, use the arch
> * per-CPU loops_per_jiffy value wherever possible.
> For SMP case adjust_jiffies is just empty.
>
> udelay on x86 uses the per cpu loops_per_jiffy:
> cpu_data(raw_smp_processor_id()).loops_per_jiffy
>
> which does not get adjusted via adjust_jiffies()
Yes. That's why one of my questions/assumptions in my prev email was
about adjusting jiffies in SMP case.
So, can I get a confirmation on the following from the maintainers?
* For proper operation in SMP case, CPUfreq core expects the CPUfreq
driver to adjust the per-CPU jiffies.
Without that confirmation, I really can't claim to have found any
issues. Although, if the above is not the expectation, then turning on
CPUfreq should mandate booting up at the highest freq (see next point)
-- which is not realistic.
> To me it looks as if udelay is always wrong and delays too long
> at lower frequencies, but I may be overlooking something.
Not only that, but if the boot up freq is not the highest freq, then
udelay is going to sleep shorter than the requested period on any freq
higher than the boot up freq.
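Both directions of the error follow from one ratio. The sketch below assumes loops_per_jiffy was calibrated once at the boot frequency and never adjusted afterwards, which is the SMP situation described above. scale_lpj() shows the kind of rescaling an adjustment would have to do (new = old * new_freq / old_freq); both function names are hypothetical.

```c
/* Rescale a loops_per_jiffy value for a frequency change. */
unsigned long scale_lpj(unsigned long lpj, unsigned int old_khz,
			unsigned int new_khz)
{
	return (unsigned long)((unsigned long long)lpj * new_khz / old_khz);
}

/* Delay actually obtained (in us) when the loop count is derived from an
 * lpj calibrated at calib_khz but the CPU is running at run_khz. */
unsigned long stale_lpj_delay_us(unsigned long requested_us,
				 unsigned int calib_khz, unsigned int run_khz)
{
	return (unsigned long)((unsigned long long)requested_us *
			       calib_khz / run_khz);
}
```

For a CPU that boots at 500 MHz, a 100 us request lasts 200 us at 250 MHz (too long, but harmless) and only 50 us at 1 GHz (too short, which breaks the "at least" expectation).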
> It shouldn't be that hard to test this with a tiny test module
> that measures real udelay delay times via TSC reads on an x86 machine
> with a stable TSC. Doing that in a loop, printing the difference from how
> long it should have delayed, and running it under lowered freq or whatever
> bad circumstances, should show the worst cases after some time.
I'm not sure there is a real need to test here. If the maintainers can
confirm that the cpufreq core expects the cpufreq drivers to handle the
jiffy adjusting for SMP cases, then it's clear that there are a few bugs.
Also, for the "context switch issue" (issue 1), it's going to be hard to
produce the exact scenario during testing and we may never hit it, but
it would still be an issue.
Any comments on the "CPU affinity issue"?
Thanks,
Saravana
* Re: CPUfreq - udelay() interaction issues
2010-04-22 23:21 ` Saravana Kannan
@ 2010-04-23 18:40 ` Mathieu Desnoyers
2010-04-23 19:22 ` Arjan van de Ven
2010-04-24 2:49 ` Saravana Kannan
0 siblings, 2 replies; 16+ messages in thread
From: Mathieu Desnoyers @ 2010-04-23 18:40 UTC (permalink / raw)
To: Saravana Kannan
Cc: cpufreq, linux-arm-msm, Dave Jones, Venkatesh Pallipadi,
Thomas Renninger, Arjan van de Ven, linux-kernel, Ingo Molnar,
Peter Zijlstra
[CCing Arjan, who seems to have played a lot with ondemand lately]
* Saravana Kannan (skannan@codeaurora.org) wrote:
> Resending email to "cc" the maintainers.
>
> Maintainers,
>
> Any comments?
>
> -Saravana
>
> Saravana Kannan wrote:
>> Hi,
>>
>> I think there are a couple of issues with cpufreq and udelay
>> interaction. But that's based on my understanding of cpufreq. I have
>> worked with it for some time now, so hopefully I am not completely wrong.
>> So, I will list my assumptions and what I think is/are the issue(s) and
>> their solutions.
>>
>> Please correct me if I'm wrong and let me know what you think.
>>
>> Assumptions:
>> ============
>> * Let's assume ondemand governor is being used.
>> * Ondemand uses one timer per core and they have CPU affinity set.
>> * For SMP, CPUfreq core expects the CPUfreq driver to adjust the
>> per-CPU jiffies.
>> * P1 indicates a lower CPU performance level and P2 indicates a much
>> higher CPU perf level (say 10 times faster).
>>
>> Issue 1: UP (non-SMP) scenario
>> ==============================
>>
>> This issue is also present for SMP case, but I don't want to complicate
>> this example with it. For future reference in this thread, let's call
>> this "Context switch issue".
>>
>> Steps:
>> - CPU running at P1
>> - Driver context calls udelay
>> - udelay does loop calculation and starts looping
>> - Context switches to ondemand gov timer function
>> - Ondemand gov changes CPU to P2
>> - Context switches back to Driver context
>> - udelay does a delay that's 10 times shorter.
>>
>> The last point is obviously a bad thing. I'm more concerned about ARM
>> arch for the moment, but considering x86 takes a max of 20ms (20000us)
>> for udelay, the above scenario looks very possible.
I think your point is valid: if the CPU suddenly goes faster, the udelay
duration could be below the requested value.
I am not certain that there is any guarantee that udelay will delay for
the exact amount requested, but I suppose it's generally assumed that it
will delay for _at least_ the amount requested. Then on top of that
interrupts, scheduler activity, etc. may make the delay longer.
Doing mutual exclusion between udelay and ondemand (as you propose
below) seems to be a solution that will complicate kernel locking a lot
for not much added value. A spinlock is out of the question because it
would disable preemption for 20ms durations. Any mutex or semaphore-based
solution will likely be a problem, because I suspect that udelay() is
used with preemption off somewhere.
One thing we could do, though, is to keep a per-cpu counter of the
number of frequency changes performed by ondemand. We sample the local
counter at the beginning of udelay, execute the correct number of loops,
re-sample the same counter, and if the frequency has changed while we
were executing the loops, we could go to a "slow-path" that would ensure
that we execute at least the minimum amount of loops to fill the requested
time, possibly assuming the fastest frequency available on the system.
This counter could also be incremented by the scheduler migration code
so thread migrations between CPUs while udelay is running will also
trigger the "slow-path". This counter approach would take care of A-B-A
problems where frequency would go from A to B and then back to A while we
execute udelay, and also for migration from CPU A to B and back to A.
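The counter scheme can be modelled in a few lines of C. This is a user-space simulation under stated assumptions, not kernel code: struct cpu_sim, the hook mechanism and every function name are hypothetical, and the slow path shown here (run the full request again at the fastest speed) is just one conservative choice that satisfies the "at least" requirement.

```c
#include <stddef.h>

/* Hypothetical per-CPU state for the simulation. */
struct cpu_sim {
	unsigned long change_count;	/* bumped on every freq change/migration */
	unsigned long loops_per_us;	/* current delay-loop speed */
};

/* Hook invoked "during" the delay so a caller can simulate an event. */
typedef void (*mid_delay_hook)(struct cpu_sim *cpu);

/* Simulated frequency increase: 10 times faster, one counter bump. */
void sim_speed_up(struct cpu_sim *cpu)
{
	cpu->loops_per_us *= 10;
	cpu->change_count++;
}

/* Simulated A-B-A: speed goes up and back down, two counter bumps. */
void sim_aba(struct cpu_sim *cpu)
{
	cpu->loops_per_us *= 10;
	cpu->change_count++;
	cpu->loops_per_us /= 10;
	cpu->change_count++;
}

/* Returns the total number of loop iterations the delay would execute. */
unsigned long udelay_counted(struct cpu_sim *cpu, unsigned long us,
			     unsigned long max_loops_per_us,
			     mid_delay_hook hook)
{
	unsigned long seen = cpu->change_count;	/* sample before looping */
	unsigned long loops = us * cpu->loops_per_us;

	if (hook)
		hook(cpu);	/* stands in for events during the fast-path loop */

	if (cpu->change_count == seen)
		return loops;	/* counter unchanged: fast path was enough */

	/* Slow path: redo the whole request assuming the fastest speed. */
	return loops + us * max_loops_per_us;
}
```

Because the comparison is on the counter and not on the frequency value, the A-B-A case still takes the slow path even though loops_per_us ends up where it started. The cost of this particular slow path is a delay that can be much longer than requested.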
How does that sound?
Thanks,
Mathieu
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
* Re: CPUfreq - udelay() interaction issues
2010-04-23 18:40 ` Mathieu Desnoyers
@ 2010-04-23 19:22 ` Arjan van de Ven
2010-04-23 19:55 ` Mathieu Desnoyers
2010-04-24 2:57 ` Saravana Kannan
2010-04-24 2:49 ` Saravana Kannan
1 sibling, 2 replies; 16+ messages in thread
From: Arjan van de Ven @ 2010-04-23 19:22 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Saravana Kannan, cpufreq, linux-arm-msm, Dave Jones,
Venkatesh Pallipadi, Thomas Renninger, linux-kernel, Ingo Molnar,
Peter Zijlstra
On Fri, 23 Apr 2010 14:40:42 -0400
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> [CCing Arjan, who seems to have played a lot with ondemand lately]
>
> * Saravana Kannan (skannan@codeaurora.org) wrote:
> > Resending email to "cc" the maintainers.
> >
> > Maintainers,
> >
> > Any comments?
> >
> > -Saravana
> >
> > Saravana Kannan wrote:
> >> Hi,
> >>
> >> I think there are a couple of issues with cpufreq and udelay
> >> interaction. But that's based on my understanding of cpufreq. I
> >> have worked with it for some time now, so hopefully I am not
> >> completely wrong. So, I will list my assumptions and what I think
> >> is/are the issue(s) and their solutions.
> >>
> >> Please correct me if I'm wrong and let me know what you think.
> >>
> >> Assumptions:
> >> ============
> >> * Let's assume ondemand governor is being used.
> >> * Ondemand uses one timer per core and they have CPU affinity set.
> >> * For SMP, CPUfreq core expects the CPUfreq driver to adjust the
> >> per-CPU jiffies.
> >> * P1 indicates a lower CPU performance level and P2 indicates a
> >> much higher CPU perf level (say 10 times faster).
> >>
so in reality, all hardware that does coordination between cores/etc
like this also has a tsc that is invariant of the actual P state.
If there are exceptions, those have a problem, but I can't think of any
right now.
Once the TSC is invariant of P state, udelay() is fine, since that goes
off the TSC, not off some delay loop kind of thing....
* Re: CPUfreq - udelay() interaction issues
2010-04-23 19:22 ` Arjan van de Ven
@ 2010-04-23 19:55 ` Mathieu Desnoyers
2010-04-24 18:56 ` Arjan van de Ven
2010-04-24 2:57 ` Saravana Kannan
1 sibling, 1 reply; 16+ messages in thread
From: Mathieu Desnoyers @ 2010-04-23 19:55 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Saravana Kannan, cpufreq, linux-arm-msm, Dave Jones,
Venkatesh Pallipadi, Thomas Renninger, linux-kernel, Ingo Molnar,
Peter Zijlstra
* Arjan van de Ven (arjan@infradead.org) wrote:
> On Fri, 23 Apr 2010 14:40:42 -0400
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>
> > [CCing Arjan, who seems to have played a lot with ondemand lately]
> >
> > * Saravana Kannan (skannan@codeaurora.org) wrote:
> > > Resending email to "cc" the maintainers.
> > >
> > > Maintainers,
> > >
> > > Any comments?
> > >
> > > -Saravana
> > >
> > > Saravana Kannan wrote:
> > >> Hi,
> > >>
> > >> I think there are a couple of issues with cpufreq and udelay
> > >> interaction. But that's based on my understanding of cpufreq. I
> > >> have worked with it for some time now, so hopefully I am not
> > >> completely wrong. So, I will list my assumptions and what I think
> > >> is/are the issue(s) and their solutions.
> > >>
> > >> Please correct me if I'm wrong and let me know what you think.
> > >>
> > >> Assumptions:
> > >> ============
> > >> * Let's assume ondemand governor is being used.
> > >> * Ondemand uses one timer per core and they have CPU affinity set.
> > >> * For SMP, CPUfreq core expects the CPUfreq driver to adjust the
> > >> per-CPU jiffies.
> > >> * P1 indicates a lower CPU performance level and P2 indicates a
> > >> much higher CPU perf level (say 10 times faster).
> > >>
>
>
> so in reality, all hardware that does coordination between cores/etc
> like this also has a tsc that is invariant of the actual P state.
> If there are exceptions, those have a problem, but I can't think of any
> right now.
> Once the TSC is invariant of P state, udelay() is fine, since that goes
> off the TSC, not off some delay loop kind of thing....
I did an overview, back in 2007, of AMD and Intel processors that had either tsc
rate depending on P state and/or tsc rate changed by idle and/or tsc values
influenced by STPCLK-Throttling. Here are some notes, along with pointers to the
reference documents (please excuse the ad-hoc style of these notes):
http://git.dorsal.polymtl.ca/?p=lttv.git;a=blob_plain;f=doc/developer/tsc.txt
So I might be missing something about your statement "all hardware that does
coordination between cores/etc like this also has a tsc that is invariant of the
actual P state.". Do you mean that all udelay callers do not rely on it to
provide a guaranteed lower-bound, except for some sub-architectures ?
ARM currently does not rely on the c0_count register for udelay, but it could do
so in the near future, at least on the omap3. This register follows the CPU
frequency. I suspect that the current udelay loop implementation in
arch/arm/lib/delay.S, being calibrated on loops_per_jiffy, does not work that
well with ondemand cpufreq right now.
Thanks,
Mathieu
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
* Re: CPUfreq - udelay() interaction issues
2010-04-23 18:40 ` Mathieu Desnoyers
2010-04-23 19:22 ` Arjan van de Ven
@ 2010-04-24 2:49 ` Saravana Kannan
2010-04-24 5:56 ` Pavel Machek
2010-04-24 13:58 ` Mathieu Desnoyers
1 sibling, 2 replies; 16+ messages in thread
From: Saravana Kannan @ 2010-04-24 2:49 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: cpufreq, linux-arm-msm, Dave Jones, Thomas Renninger,
Arjan van de Ven, linux-kernel, Ingo Molnar, Peter Zijlstra
- Venkatesh since his email is bouncing with "unknown address".
Mathieu Desnoyers wrote:
>> Saravana Kannan wrote:
>>>
>>> Assumptions:
>>> ============
>>> * Let's assume ondemand governor is being used.
>>> * Ondemand uses one timer per core and they have CPU affinity set.
>>> * For SMP, CPUfreq core expects the CPUfreq driver to adjust the
>>> per-CPU jiffies.
>>> * P1 indicates a lower CPU performance level and P2 indicates a much
>>> higher CPU perf level (say 10 times faster).
>>>
>>> Issue 1: UP (non-SMP) scenario
>>> ==============================
>>>
>>> This issue is also present for SMP case, but I don't want to complicate
>>> this example with it. For future reference in this thread, let's call
>>> this "Context switch issue".
>>>
>>> Steps:
>>> - CPU running at P1
>>> - Driver context calls udelay
>>> - udelay does loop calculation and starts looping
>>> - Context switches to ondemand gov timer function
>>> - Ondemand gov changes CPU to P2
>>> - Context switches back to Driver context
>>> - udelay does a delay that's 10 times shorter.
>>>
>>> The last point is obviously a bad thing. I'm more concerned about ARM
>>> arch for the moment, but considering x86 takes a max of 20ms (20000us)
>>> for udelay, the above scenario looks very possible.
>
> I think your point is valid: if the CPU suddenly goes faster, the udelay
> duration could be below the requested value.
>
> I am not certain that there is any guarantee that udelay will delay for
> the exact amount requested, but I suppose it's generally assumed that it
> will delay for _at least_ the amount requested. Then on top of that
> interrupts, scheduler activity, etc... may make the delay longer.
Yes, udelay has an _at least_ guarantee. Extra delay is not much of a
concern and is probably unavoidable.
> Doing mutual exclusion between udelay and ondemand (as you propose
> below) seems to be a solution that will complicate kernel locking a lot
> for not much added value.
A lot of device drivers use udelay to meet h/w or protocol spec
requirements, and if we randomly fail to meet them, we would hit issues
that are hard to debug. So I think udelay working properly is quite
important.
> A spinlock is out of the question because it would
> disable preemption for 20ms durations. Any mutex or semaphore-based
> solution will likely be a problem, because I suspect that udelay() is
> used with preemption off somewhere.
I agree, a spin lock would be a no-no for many reasons. A semaphore is not
really a problem in atomic contexts because cpufreq can't interrupt us
there either, so we don't need to grab the semaphore.
I was testing the waters about the actual existence of the bug before I
spent time proposing a clear solution. I guess I could have explained
the solution better; it is explained further down.
> One thing we could do, though, is to keep a per-cpu counter of the
> number of frequency changes performed by ondemand. We sample the local
> counter at the beginning of udelay, execute the correct number of loops,
> re-sample the same counter, and if the frequency has changed while we
> were executing the loops, we could go to a "slow-path" that would ensure
> that we execute at least the minimum amount of loops to fill requested
> time, possibly assuming the fastest frequency available on the system.
> This counter could also be incremented by the scheduler migration code
> so thread migrations between CPUs while udelay is running will also
> trigger the "slow-path". This counter approach would take care of A-B-A
> problems where frequency would go from A to B and then back to A while we
> execute udelay, and also for migration from CPU A to B and back to A.
>
> How does that sound?
Seems a bit more complicated than what I had in mind. This touches the
scheduler, which I think we can avoid. Also, there is no simple
implementation for the "slowpath" that can guarantee the delay without
starting the loop over and hoping not to get interrupted, or just
giving up and doing a massively inaccurate delay (like msleep, etc).
I was thinking of something along the lines of this:
udelay()
{
	if (!is_atomic())
		down_read(&freq_sem);
	/* else
	 *	do nothing since cpufreq can't interrupt you.
	 */
	call usual code since cpufreq is not going to preempt you.
	if (!is_atomic())
		up_read(&freq_sem);
}
__cpufreq_driver_target(...)
{
	down_write(&freq_sem);
	cpufreq_driver->target(...);
	up_write(&freq_sem);
}
In the implementation of the cpufreq driver, they just need to make sure
they always increase the LPJ _before_ increasing the freq and decrease
the LPJ _after_ decreasing the freq. This is to make sure that when an
interrupt handler preempts the cpufreq driver code (since atomic
contexts aren't looking at the r/w semaphore) the LPJ value will be good
enough to satisfy the _at least_ guarantee of udelay().
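The ordering rule can be checked with a small model. The sketch below is a user-space simulation, with a hypothetical struct cpu_clk and the assumption that the required LPJ is simply proportional to the frequency. It verifies that, at every intermediate step of a transition, the LPJ visible to udelay() is never lower than what the current frequency needs.

```c
/* Hypothetical model of one CPU's clock state. */
struct cpu_clk {
	unsigned int freq_khz;		/* current hardware frequency */
	unsigned long lpj;		/* value udelay() would read */
	unsigned long lpj_per_khz;	/* assumed constant: lpj = khz * this */
};

/* True when lpj is at least what the current frequency requires. */
int lpj_is_safe(const struct cpu_clk *c)
{
	return c->lpj >= (unsigned long)c->freq_khz * c->lpj_per_khz;
}

/* Change frequency in the proposed order; returns 1 if the invariant
 * held before, in the middle of, and after the transition. */
int set_freq_ordered(struct cpu_clk *c, unsigned int new_khz)
{
	unsigned long new_lpj = (unsigned long)new_khz * c->lpj_per_khz;
	int ok = lpj_is_safe(c);

	if (new_khz > c->freq_khz) {
		c->lpj = new_lpj;	/* raise LPJ first ... */
		ok &= lpj_is_safe(c);
		c->freq_khz = new_khz;	/* ... then speed up */
	} else {
		c->freq_khz = new_khz;	/* slow down first ... */
		ok &= lpj_is_safe(c);
		c->lpj = new_lpj;	/* ... then lower LPJ */
	}
	return ok && lpj_is_safe(c);
}
```

An interrupt handler that calls udelay() in the middle of either branch sees an LPJ that is too high at worst, so its delay can only come out longer than requested, never shorter.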
For the CPU switching issue, I think the solution I proposed is quite
simple and should work.
Does my better explained solution look palatable?
Thanks,
Saravana
* Re: CPUfreq - udelay() interaction issues
2010-04-23 19:22 ` Arjan van de Ven
2010-04-23 19:55 ` Mathieu Desnoyers
@ 2010-04-24 2:57 ` Saravana Kannan
1 sibling, 0 replies; 16+ messages in thread
From: Saravana Kannan @ 2010-04-24 2:57 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Mathieu Desnoyers, cpufreq, linux-arm-msm, Dave Jones,
Venkatesh Pallipadi, Thomas Renninger, linux-kernel, Ingo Molnar,
Peter Zijlstra
Arjan van de Ven wrote:
> so in reality, all hardware that does coordination between cores/etc
> like this also has a tsc that is invariant of the actual P state.
> If there are exceptions, those have a problem, but I can't think of any
> right now.
> Once the TSC is invariant of P state, udelay() is fine, since that goes
> off the TSC, not off some delay loop kind of thing....
I assume you are talking specifically about x86. I want x86 to be
correct, but also want ARM to be correct. So, at this point I might as
well try to put in an arch independent fix.
Thanks,
Saravana
* Re: CPUfreq - udelay() interaction issues
2010-04-24 2:49 ` Saravana Kannan
@ 2010-04-24 5:56 ` Pavel Machek
2010-04-24 13:58 ` Mathieu Desnoyers
1 sibling, 0 replies; 16+ messages in thread
From: Pavel Machek @ 2010-04-24 5:56 UTC (permalink / raw)
To: Saravana Kannan
Cc: Mathieu Desnoyers, cpufreq, linux-arm-msm, Dave Jones,
Thomas Renninger, Arjan van de Ven, linux-kernel, Ingo Molnar,
Peter Zijlstra
Hi!
> Seems a bit more complicated than what I had in mind. This is
> touching the scheduler I think we can get away without having to.
> Also, there is no simple implementation for the "slowpath" that can
> guarantee the delay without starting over the loop and hoping not to
> get interrupted or just giving up and doing a massively inaccurate
> delay (like msleep, etc).
>
> I was thinking of something along the lines of this:
>
> udelay()
> {
> if (!is_atomic())
> down_read(&freq_sem);
> /* else
> do nothing since cpufreq can't interrupt you.
> */
>
> call usual code since cpufreq is not going to preempt you.
>
> if (!is_atomic())
> up_read(&freq_sem);
> }
Well, most delays are very short, so...
What about... we decide that a cpufreq interruption or a switch to a
different cpu takes 100usec minimum, and only try to do complex magic
for delays >100usec? Hopefully there's a minimum of those :-).
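
A rough sketch of that cutoff, in plain userspace C rather than kernel code. The 100 usec figure is the one suggested above; pick_delay_path(), DELAY_FAST and DELAY_GUARDED are made-up names that only illustrate the dispatch, and the reasoning in the comment is one reading of the idea, not something anyone has measured.

```c
/* Cutoff from the suggestion above: a cpufreq change or a CPU migration is
 * assumed to cost at least this long, so a shorter delay that gets
 * interrupted by one has already waited long enough. */
#define UDELAY_GUARD_THRESHOLD_US 100UL

enum delay_path {
    DELAY_FAST,    /* plain calibrated loop, no extra protection */
    DELAY_GUARDED  /* the "complex magic" path for long delays */
};

/* Hypothetical helper: choose the code path for a requested delay. */
static enum delay_path pick_delay_path(unsigned long usecs)
{
    return usecs > UDELAY_GUARD_THRESHOLD_US ? DELAY_GUARDED : DELAY_FAST;
}
```

With this split, only callers asking for more than 100 usec would pay for whatever protection scheme ends up being chosen.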
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: CPUfreq - udelay() interaction issues
2010-04-24 2:49 ` Saravana Kannan
2010-04-24 5:56 ` Pavel Machek
@ 2010-04-24 13:58 ` Mathieu Desnoyers
2010-04-27 23:41 ` Saravana Kannan
1 sibling, 1 reply; 16+ messages in thread
From: Mathieu Desnoyers @ 2010-04-24 13:58 UTC (permalink / raw)
To: Saravana Kannan
Cc: cpufreq, linux-arm-msm, Dave Jones, Thomas Renninger,
Arjan van de Ven, linux-kernel, Ingo Molnar, Peter Zijlstra
* Saravana Kannan (skannan@codeaurora.org) wrote:
[...]
>
> Seems a bit more complicated than what I had in mind. This is touching
> the scheduler I think we can get away without having to. Also, there is
> no simple implementation for the "slowpath" that can guarantee the delay
> without starting over the loop and hoping not to get interrupted or just
> giving up and doing a massively inaccurate delay (like msleep, etc).
Not necessarily. Another way to do it: we could keep the udelay loop counter in
the task struct. When ondemand changes frequency, and upon migration, this
counter would be adapted to the current cpu frequency.
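
A minimal userspace sketch of that rescaling step, assuming the remaining loop count scales linearly with CPU frequency. struct task_delay_state and rescale_delay() are hypothetical; the real task_struct has no such field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-task state for an in-flight udelay(). */
struct task_delay_state {
    uint64_t loops_left;  /* busy-loop iterations still to run */
    uint32_t freq_khz;    /* CPU frequency loops_left was computed for */
};

/* Called on a frequency change or a migration: convert the remaining
 * iterations to the new frequency. The division rounds up so the
 * rescaled delay is never shorter than the original one. */
static void rescale_delay(struct task_delay_state *t, uint32_t new_khz)
{
    assert(t->freq_khz != 0);
    t->loops_left = (t->loops_left * new_khz + t->freq_khz - 1) / t->freq_khz;
    t->freq_khz = new_khz;
}
```

Going from P1 to a P2 that is 10 times faster turns 1000 remaining loops into 10000, so the wall-clock delay stays the same.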
>
> I was thinking of something along the lines of this:
>
> udelay()
> {
> if (!is_atomic())
see hardirq.h:
/*
* Are we running in atomic context? WARNING: this macro cannot
* always detect atomic context; in particular, it cannot know about
* held spinlocks in non-preemptible kernels. Thus it should not be
* used in the general case to determine whether sleeping is possible.
* Do not use in_atomic() in driver code.
*/
#define in_atomic() ((preempt_count() & ~PREEMPT_ACTIVE) != PREEMPT_INATOMIC_BASE)
Sorry, your scheme is broken on !PREEMPT kernels.
> down_read(&freq_sem);
> /* else
> do nothing since cpufreq can't interrupt you.
> */
This comment seems broken. in_atomic() can return true because preemption is
disabled, which still lets cpufreq interrupts come in.
>
> call usual code since cpufreq is not going to preempt you.
>
> if (!is_atomic())
> up_read(&freq_sem);
> }
>
> __cpufreq_driver_target(...)
> {
> down_write(&freq_sem);
> cpufreq_driver->target(...);
> up_write(&freq_sem);
> }
>
> In the implementation of the cpufreq driver, they just need to make sure
> they always increase the LPJ _before_ increasing the freq and decrease
> the LPJ _after_ decreasing the freq. This is to make sure that when an
> interrupt handler preempts the cpufreq driver code (since atomic
> contexts aren't looking at the r/w semaphore) the LPJ value will be good
> enough to satisfy the _at least_ guarantee of udelay().
>
> For the CPU switching issue, I think the solution I proposed is quite
> simple and should work.
You mean this ?
>>>> udelay(us)
>>>> {
>>>> set cpu affinity to current CPU;
>>>> Do the usual udelay code;
>>>> restore cpu affinity status;
>>>> }
Things like lock scalability and performance degradation come to mind. We
can expect some drivers to make very heavy use of udelay(). This should not
bring a 4096-core box to its knees. sched_setaffinity() is very far from being
lightweight, as it locks cpu hotplug (that's a global mutex protecting a
refcount), allocates memory, manipulates cpumasks, etc...
>
> Does my better explained solution look palatable?
Nope, not on a multiprocessor system.
Thanks,
Mathieu
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
* Re: CPUfreq - udelay() interaction issues
2010-04-23 19:55 ` Mathieu Desnoyers
@ 2010-04-24 18:56 ` Arjan van de Ven
2010-04-24 21:00 ` Mathieu Desnoyers
0 siblings, 1 reply; 16+ messages in thread
From: Arjan van de Ven @ 2010-04-24 18:56 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Saravana Kannan, cpufreq, linux-arm-msm, Dave Jones,
Venkatesh Pallipadi, Thomas Renninger, linux-kernel, Ingo Molnar,
Peter Zijlstra
>
> I did an overview, back in 2007, of AMD and Intel processors that had
> either tsc rate depending on P state and/or tsc rate changed by idle
> and/or tsc values influenced by STPCLK-Throttling. Here are some
> notes, along with pointers to the reference documents (please excuse
> the ad-hoc style of these notes):
>
> http://git.dorsal.polymtl.ca/?p=lttv.git;a=blob_plain;f=doc/developer/tsc.txt
>
> So I might be missing something about your statement "all hardware
> that does coordination between cores/etc like this also has a tsc
> that is invariant of the actual P state.". Do you mean that all
> udelay callers do not rely on it to provide a guaranteed lower-bound,
> except for some sub-architectures ?
ok there's basically 3 cases
Case 1: single core, no hyperthreading. It does not matter what tsc
does, since the kernel knows what it does and will either scale it or
not for udelay depending on that.
(this case includes single core SMP configurations)
Case 2: multi core or HT, TSC is variable with CPU frequency.
This is the really sucky case, since logical CPU 0's tsc frequency in
part depends on what logical CPU 1 will do etc. No good answer
for this other than assuming the worst. Based on your document these do
actually exist in early P4 cpus.
Case 3: multi core/HT but with fixed rate TSC; no problem whatsoever,
tsc is a good measure for udelay.
Only case 2 sucks :-(
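
The three cases, written down as a small C classifier. This only encodes the split as stated above; it is not a verified property of any particular CPU, and the enum and function names are invented for the example.

```c
#include <assert.h>

enum tsc_delay_trust {
    TSC_DELAY_KERNEL_SCALED,  /* case 1: single core, no HT */
    TSC_DELAY_ASSUME_WORST,   /* case 2: multi core/HT, variable TSC */
    TSC_DELAY_FIXED_RATE      /* case 3: multi core/HT, fixed rate TSC */
};

/* Hypothetical helper: map a system description to one of the cases.
 * logical_cpus counts cores and hyperthreads together. */
static enum tsc_delay_trust classify_tsc_delay(unsigned int logical_cpus,
                                               int tsc_fixed_rate)
{
    assert(logical_cpus >= 1);
    if (logical_cpus == 1)
        return TSC_DELAY_KERNEL_SCALED;
    return tsc_fixed_rate ? TSC_DELAY_FIXED_RATE : TSC_DELAY_ASSUME_WORST;
}
```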
--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
* Re: CPUfreq - udelay() interaction issues
2010-04-24 18:56 ` Arjan van de Ven
@ 2010-04-24 21:00 ` Mathieu Desnoyers
2010-04-24 23:20 ` Arjan van de Ven
0 siblings, 1 reply; 16+ messages in thread
From: Mathieu Desnoyers @ 2010-04-24 21:00 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Saravana Kannan, cpufreq, linux-arm-msm, Dave Jones,
Venkatesh Pallipadi, Thomas Renninger, linux-kernel, Ingo Molnar,
Peter Zijlstra
* Arjan van de Ven (arjan@infradead.org) wrote:
> >
> > I did an overview, back in 2007, of AMD and Intel processors that had
> > either tsc rate depending on P state and/or tsc rate changed by idle
> > and/or tsc values influenced by STPCLK-Throttling. Here are some
> > notes, along with pointers to the reference documents (please excuse
> > the ad-hoc style of these notes):
> >
> > http://git.dorsal.polymtl.ca/?p=lttv.git;a=blob_plain;f=doc/developer/tsc.txt
> >
> > So I might be missing something about your statement "all hardware
> > that does coordination between cores/etc like this also has a tsc
> > that is invariant of the actual P state.". Do you mean that all
> > udelay callers do not rely on it to provide a guaranteed lower-bound,
> > except for some sub-architectures ?
>
> ok there's basically 3 cases
>
> Case 1: single core, no hyperthreading. It does not matter what tsc
> does, since the kernel knows what it does and will either scale it or
> not for udelay depending on that.
> (this case includes single core SMP configurations)
The kernel will scale things like gettimeofday and the monotonic clocks in these
cases, but which piece of code takes care of arch/x86/lib/delay.c:delay_tsc()
exactly ? AFAIK, it reads the cycle counter with rdtscl() directly.
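
To make the concern concrete, here is the arithmetic for a delay that waits for a fixed number of counter ticks. This is a userspace sketch, not the delay_tsc() code; delay_ticks() and ticks_to_usecs() are made-up helpers, and the counter rates are example values.

```c
#include <assert.h>
#include <stdint.h>

/* Ticks a cycle-counting delay waits for, given the counter rate that was
 * calibrated. kHz is ticks per millisecond, hence the division by 1000. */
static uint64_t delay_ticks(uint64_t usecs, uint64_t calib_khz)
{
    return usecs * calib_khz / 1000;
}

/* Wall-clock microseconds those ticks take if the counter really runs
 * at actual_khz. */
static uint64_t ticks_to_usecs(uint64_t ticks, uint64_t actual_khz)
{
    assert(actual_khz != 0);
    return ticks * 1000 / actual_khz;
}
```

A 100 usec request calibrated at 1 GHz is 100000 ticks. If the counter then runs at 2 GHz and nothing rescales the tick count, those ticks pass in 50 usec, which breaks the lower bound.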
>
> Case 2: multi core or HT, TSC is variable with CPU frequency.
> This is the really sucky case, since logical CPU 0's tsc frequency in
> part depends on what logical CPU 1 will do etc. No good answer
> for this other than assuming the worst. Based on your document these do
> actually exist in early P4 cpus.
Keeping track of the cpu frequency changes can help here. Along with periodic
resynchronization if cpu clocks drift too far apart. I've done that for the
LTTng omap3 trace clock.
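
Roughly, the bookkeeping looks like this. It is a simplified userspace sketch and not the LTTng code: struct scaled_clock and both helpers are invented for illustration, resynchronization between CPUs is left out, and it assumes every rate change is reported to the code.

```c
#include <assert.h>
#include <stdint.h>

/* Time accumulated from a cycle counter whose rate can change. */
struct scaled_clock {
    uint64_t ns;           /* time accumulated so far */
    uint64_t last_cycles;  /* counter value at the last update */
    uint32_t khz;          /* counter rate since the last update */
};

/* Account for the cycles elapsed since the last update at the current rate. */
static void clock_advance(struct scaled_clock *c, uint64_t now_cycles)
{
    assert(c->khz != 0);
    c->ns += (now_cycles - c->last_cycles) * 1000000ULL / c->khz;
    c->last_cycles = now_cycles;
}

/* On a frequency change: close the segment at the old rate, then switch. */
static void clock_set_rate(struct scaled_clock *c, uint64_t now_cycles,
                           uint32_t new_khz)
{
    clock_advance(c, now_cycles);
    c->khz = new_khz;
}
```

One million cycles at 1 GHz followed by half a million cycles at 500 MHz adds up to 2 ms, which is what a fixed-rate clock would have reported.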
>
> Case 3: multi core/HT but with fixed rate TSC; no problem whatsoever,
> tsc is a good measure for udelay.
Agreed for case 3.
>
> Only case 2 sucks :-(
Not sure about case 1 specifically for udelay. I might have missed something
though.
Thanks,
Mathieu
>
>
> --
> Arjan van de Ven Intel Open Source Technology Centre
> For development, discussion and tips for power savings,
> visit http://www.lesswatts.org
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
* Re: CPUfreq - udelay() interaction issues
2010-04-24 21:00 ` Mathieu Desnoyers
@ 2010-04-24 23:20 ` Arjan van de Ven
0 siblings, 0 replies; 16+ messages in thread
From: Arjan van de Ven @ 2010-04-24 23:20 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Saravana Kannan, cpufreq, linux-arm-msm, Dave Jones,
Venkatesh Pallipadi, Thomas Renninger, linux-kernel, Ingo Molnar,
Peter Zijlstra
On Sat, 24 Apr 2010 17:00:42 -0400
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> > Case 2: multi core or HT, TSC is variable with CPU frequency.
> > This is the really sucky case, since logical CPU 0's tsc frequency
> > in part depends on what logical CPU 1 will do etc. No good answer
> > for this other than assuming the worst. Based on your document
> > these do actually exist in early P4 cpus.
>
> Keeping track of the cpu frequency changes can help here. Along with
> periodic resynchronization if cpu clocks drift too far apart. I've
> done that for the LTTng omap3 trace clock.
it's not enough; voting does not work that way.
the way voting ends up working is that the hardware runs the maximum of
the various frequency requests ... but for the threads that are not
idle.
so if cpu 1 goes to a high frequency, cpu 0 goes up too.. until cpu 1
goes idle; then only cpu 0's value is in use.
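
As a sketch of that voting rule in plain C: the frequency in effect is the largest request among the threads that are not idle. struct cpu_vote and effective_khz() are names made up for the example, and real hardware may differ in the details.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One logical CPU's frequency request and idle state. */
struct cpu_vote {
    uint32_t requested_khz;
    int idle;  /* nonzero if this logical CPU is idle */
};

/* Frequency in effect: maximum request among non-idle logical CPUs.
 * Returns 0 when every logical CPU is idle. */
static uint32_t effective_khz(const struct cpu_vote *votes, size_t n)
{
    uint32_t max = 0;

    assert(votes != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!votes[i].idle && votes[i].requested_khz > max)
            max = votes[i].requested_khz;
    }
    return max;
}
```

Note that cpu 0's effective frequency changes when cpu 1 goes idle even though cpu 0 never made a new request, which is why tracking cpu 0's own requests is not enough.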
--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
* Re: CPUfreq - udelay() interaction issues
2010-04-24 13:58 ` Mathieu Desnoyers
@ 2010-04-27 23:41 ` Saravana Kannan
0 siblings, 0 replies; 16+ messages in thread
From: Saravana Kannan @ 2010-04-27 23:41 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: cpufreq, linux-arm-msm, Dave Jones, Thomas Renninger,
Arjan van de Ven, linux-kernel, Ingo Molnar, Peter Zijlstra
Hi Mathieu,
Thanks for taking the time to provide your input. More responses below.
Mathieu Desnoyers wrote:
> * Saravana Kannan (skannan@codeaurora.org) wrote:
> [...]
>> Seems a bit more complicated than what I had in mind. This is touching
>> the scheduler I think we can get away without having to. Also, there is
>> no simple implementation for the "slowpath" that can guarantee the delay
>> without starting over the loop and hoping not to get interrupted or just
>> giving up and doing a massively inaccurate delay (like msleep, etc).
>
> Not necessarily. Another way to do it: we could keep the udelay loop counter in
> the task struct. When ondemand changes frequency, and upon migration, this
> counter would be adapted to the current cpu frequency.
This will take us back to the scalability problem because we now have to
go through every process running on a CPU to update their udelay loop
counters whenever the CPU freq changes.
>> I was thinking of something along the lines of this:
>>
>> udelay()
>> {
>> if (!is_atomic())
>
> see hardirq.h:
>
> /*
> * Are we running in atomic context? WARNING: this macro cannot
> * always detect atomic context; in particular, it cannot know about
> * held spinlocks in non-preemptible kernels. Thus it should not be
> * used in the general case to determine whether sleeping is possible.
> * Do not use in_atomic() in driver code.
> */
> #define in_atomic() ((preempt_count() & ~PREEMPT_ACTIVE) != PREEMPT_INATOMIC_BASE)
>
> Sorry, your scheme is broken on !PREEMPT kernels.
If it's a !PREEMPT kernel, we don't have to worry about the CPUfreq
changing on us. CPU freq is changed in a deferrable work queue context.
>> down_read(&freq_sem);
>> /* else
>> do nothing since cpufreq can't interrupt you.
>> */
>
> This comment seems broken. in_atomic() can return true because preemption is
> disabled, which still lets cpufreq interrupts come in.
As mentioned earlier, cpufreq change can't happen when udelay is running
in !PREEMPT kernel (which is where in_atomic() won't work). Btw, I
actually wasn't referring to the real in_atomic() macro (I remembered it
having limitations). But now that you mentioned the limitation, it might
not be a problem after all.
>> call usual code since cpufreq is not going to preempt you.
>>
>> if (!is_atomic())
>> up_read(&freq_sem);
>> }
>>
>> __cpufreq_driver_target(...)
>> {
>> down_write(&freq_sem);
>> cpufreq_driver->target(...);
>> up_write(&freq_sem);
>> }
>>
>> In the implementation of the cpufreq driver, they just need to make sure
>> they always increase the LPJ _before_ increasing the freq and decrease
>> the LPJ _after_ decreasing the freq. This is to make sure that when an
>> interrupt handler preempts the cpufreq driver code (since atomic
>> contexts aren't looking at the r/w semaphore) the LPJ value will be good
>> enough to satisfy the _at least_ guarantee of udelay().
>>
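
To spell out the LPJ ordering rule quoted just above, here is a small userspace model. It assumes one loop iteration per CPU cycle and a tick rate of 100; EXAMPLE_HZ, struct cpu_state and the helpers are invented for the example and are not cpufreq driver code.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_HZ 100ULL  /* assumed tick rate, for illustration only */

struct cpu_state {
    uint64_t lpj;  /* loops per jiffy currently published */
    uint64_t khz;  /* frequency the core is actually running at */
};

/* Busy-loop iterations udelay would run for usecs at the published LPJ. */
static uint64_t udelay_loops(uint64_t usecs, uint64_t lpj)
{
    return usecs * lpj * EXAMPLE_HZ / 1000000ULL;
}

/* Real microseconds those loops take, assuming one loop per CPU cycle. */
static uint64_t loops_to_usecs(uint64_t loops, uint64_t khz)
{
    assert(khz != 0);
    return loops * 1000ULL / khz;
}

/* State an interrupt handler sees halfway through a transition that
 * follows the ordering rule. */
static struct cpu_state mid_transition(struct cpu_state from,
                                       struct cpu_state to)
{
    struct cpu_state mid;

    if (to.khz > from.khz) {
        /* Speeding up: LPJ is raised first, the core is still slow. */
        mid.lpj = to.lpj;
        mid.khz = from.khz;
    } else {
        /* Slowing down: the core is slowed first, LPJ is still high. */
        mid.lpj = from.lpj;
        mid.khz = to.khz;
    }
    return mid;
}
```

With P1 at 100 MHz and P2 at 1 GHz, a 50 usec udelay() that lands in the middle of either transition spins for about 500 usec in this model. That is too long, but never too short, which is all the _at least_ guarantee asks for.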
>> For the CPU switching issue, I think the solution I proposed is quite
>> simple and should work.
>
> You mean this ?
>
>>>>> udelay(us)
>>>>> {
>>>>> set cpu affinity to current CPU;
>>>>> Do the usual udelay code;
>>>>> restore cpu affinity status;
>>>>> }
>
> Things like lock scalability and performance degradation come to mind. We
> can expect some drivers to make very heavy use of udelay(). This should not
> bring a 4096-core box to its knees. sched_setaffinity() is very far from being
> lightweight, as it locks cpu hotplug (that's a global mutex protecting a
> refcount), allocates memory, manipulates cpumasks, etc...
Hmm... set affinity does seem more complicated than what I expected.
>> Does my better explained solution look palatable?
>
> Nope, not on a multiprocessor system.
Yes, set affinity seems to be a problem.
Didn't get to work on this for the past few days. Let me think more
about this before I get back. In the meantime, if you can come up with
a relatively simple solution without scalability issues, I would be glad
to drop my existing solution.
Thanks again for the input.
-Saravana
Thread overview: 16+ messages
2010-04-22 3:34 CPUfreq - udelay() interaction issues Saravana Kannan
2010-04-22 21:22 ` Saravana Kannan
2010-04-22 23:18 ` Thomas Renninger
2010-04-22 23:37 ` Saravana Kannan
2010-04-22 23:21 ` Saravana Kannan
2010-04-23 18:40 ` Mathieu Desnoyers
2010-04-23 19:22 ` Arjan van de Ven
2010-04-23 19:55 ` Mathieu Desnoyers
2010-04-24 18:56 ` Arjan van de Ven
2010-04-24 21:00 ` Mathieu Desnoyers
2010-04-24 23:20 ` Arjan van de Ven
2010-04-24 2:57 ` Saravana Kannan
2010-04-24 2:49 ` Saravana Kannan
2010-04-24 5:56 ` Pavel Machek
2010-04-24 13:58 ` Mathieu Desnoyers
2010-04-27 23:41 ` Saravana Kannan