linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems
@ 2014-05-07 23:23 Doug Anderson
  2014-05-08 10:41 ` Viresh Kumar
  0 siblings, 1 reply; 31+ messages in thread
From: Doug Anderson @ 2014-05-07 23:23 UTC (permalink / raw)
  To: Russell King, Will Deacon
  Cc: John Stultz, David Riley, olof, Sonny Rao, Richard Zhao,
	Santosh Shilimkar, Shawn Guo, Rafael J. Wysocki, Doug Anderson,
	nicolas.pitre, sboyd, marc.zyngier, swarren, paul.gortmaker,
	linux-arm-kernel, linux-kernel

Downscaling loops_per_jiffy on SMP ARM systems really doesn't work.
You could really only do this if:

* Each CPU is has independent frequency changes (changing one CPU
  doesn't affect another).
* We change the generic ARM udelay() code to actually look at percpu
  loops_per_jiffy.

I don't know of any ARM CPUs that are totally independent that don't
just use a timer-based delay anyway.  For those that don't have a
timer-based delay, we should be conservative and overestimate
loops_per_jiffy.

Note that on some systems you might sometimes see (in the extreme case
when we're all the way downclocked) a udelay(100) become a
udelay(1000) now.

Signed-off-by: Doug Anderson <dianders@chromium.org>
---
Note that I don't have an board that has cpufreq enabled upstream so
I'm relying on the testing I did on our local kernel-3.8.  Hopefully
someone out there can test using David's nifty udelay tests.  In order
to see this you'd need to make sure that you _don't_ have arch timers
enabled.  See:
* https://patchwork.kernel.org/patch/4124721/
* https://patchwork.kernel.org/patch/4124731/

 arch/arm/kernel/smp.c | 45 ++++++++++++++++++++++++++++-----------------
 1 file changed, 28 insertions(+), 17 deletions(-)

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 7c4fada..9d944f6 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -649,39 +649,50 @@ int setup_profiling_timer(unsigned int multiplier)
 
 #ifdef CONFIG_CPU_FREQ
 
-static DEFINE_PER_CPU(unsigned long, l_p_j_ref);
-static DEFINE_PER_CPU(unsigned long, l_p_j_ref_freq);
 static unsigned long global_l_p_j_ref;
 static unsigned long global_l_p_j_ref_freq;
+static unsigned long global_l_p_j_max_freq;
+
+/**
+ * cpufreq_callback - Adjust loops_per_jiffies when frequency changes
+ *
+ * When the CPU frequency changes we need to adjust loops_per_jiffies, which
+ * we assume scales linearly with frequency.
+ *
+ * This function is fairly castrated and only ever adjust loops_per_jiffies
+ * upward.  It also doesn't adjust the PER_CPU loops_per_jiffies.  Here's why:
+ * 1. The ARM udelay only ever looks at the global loops_per_jiffy not the
+ *    percpu one.  If your CPUs _are not_ changed in lockstep you could run
+ *    into problems by decreasing loops_per_jiffies since one of the other
+ *    processors might still be running slower.
+ * 2. The ARM udelay reads the loops_per_jiffy at the beginning of its loop and
+ *    no other times.  If your CPUs _are_ changed in lockstep you could run
+ *    into a race where one CPU has started its loop with old (slower)
+ *    loops_per_jiffy and then suddenly is running faster.
+ *
+ * Anyone who wants a good udelay() should be using a timer-based solution
+ * anyway.  If you don't have a timer solution, you just gotta be conservative.
+ */
 
 static int cpufreq_callback(struct notifier_block *nb,
 					unsigned long val, void *data)
 {
 	struct cpufreq_freqs *freq = data;
-	int cpu = freq->cpu;
 
 	if (freq->flags & CPUFREQ_CONST_LOOPS)
 		return NOTIFY_OK;
 
-	if (!per_cpu(l_p_j_ref, cpu)) {
-		per_cpu(l_p_j_ref, cpu) =
-			per_cpu(cpu_data, cpu).loops_per_jiffy;
-		per_cpu(l_p_j_ref_freq, cpu) = freq->old;
-		if (!global_l_p_j_ref) {
-			global_l_p_j_ref = loops_per_jiffy;
-			global_l_p_j_ref_freq = freq->old;
-		}
+	if (!global_l_p_j_ref) {
+		global_l_p_j_ref = loops_per_jiffy;
+		global_l_p_j_ref_freq = freq->old;
+		global_l_p_j_max_freq = freq->old;
 	}
 
-	if ((val == CPUFREQ_PRECHANGE  && freq->old < freq->new) ||
-	    (val == CPUFREQ_POSTCHANGE && freq->old > freq->new)) {
+	if (freq->new > global_l_p_j_max_freq) {
 		loops_per_jiffy = cpufreq_scale(global_l_p_j_ref,
 						global_l_p_j_ref_freq,
 						freq->new);
-		per_cpu(cpu_data, cpu).loops_per_jiffy =
-			cpufreq_scale(per_cpu(l_p_j_ref, cpu),
-					per_cpu(l_p_j_ref_freq, cpu),
-					freq->new);
+		global_l_p_j_max_freq = freq->new;
 	}
 	return NOTIFY_OK;
 }
-- 
1.9.1.423.g4596e3a


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems
  2014-05-07 23:23 [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems Doug Anderson
@ 2014-05-08 10:41 ` Viresh Kumar
  2014-05-08 15:25   ` Doug Anderson
  0 siblings, 1 reply; 31+ messages in thread
From: Viresh Kumar @ 2014-05-08 10:41 UTC (permalink / raw)
  To: Doug Anderson, Rafael J. Wysocki
  Cc: Russell King, Will Deacon, John Stultz, David Riley, olof,
	Sonny Rao, Richard Zhao, Santosh Shilimkar, Shawn Guo,
	Nicolas Pitre, Stephen Boyd, Marc Zyngier, Stephen Warren,
	Paul Gortmaker, linux-arm-kernel, linux-kernel

Fixing Rafael's id.

On Thu, May 8, 2014 at 4:53 AM, Doug Anderson <dianders@chromium.org> wrote:
> Downscaling loops_per_jiffy on SMP ARM systems really doesn't work.
> You could really only do this if:
>
> * Each CPU is has independent frequency changes (changing one CPU
>   doesn't affect another).
> * We change the generic ARM udelay() code to actually look at percpu
>   loops_per_jiffy.

There is one more case I believe:
All CPUs share a single clock line and generic udelay() uses global
loops_per_jiffy, as it would never happen that some other CPU is running
faster and udelay would complete early

> I don't know of any ARM CPUs that are totally independent that don't
> just use a timer-based delay anyway.  For those that don't have a
> timer-based delay, we should be conservative and overestimate
> loops_per_jiffy.
>
> Note that on some systems you might sometimes see (in the extreme case
> when we're all the way downclocked) a udelay(100) become a
> udelay(1000) now.
>
> Signed-off-by: Doug Anderson <dianders@chromium.org>

> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> index 7c4fada..9d944f6 100644
> --- a/arch/arm/kernel/smp.c
> +++ b/arch/arm/kernel/smp.c
> @@ -649,39 +649,50 @@ int setup_profiling_timer(unsigned int multiplier)
>
>  #ifdef CONFIG_CPU_FREQ
>
> -static DEFINE_PER_CPU(unsigned long, l_p_j_ref);
> -static DEFINE_PER_CPU(unsigned long, l_p_j_ref_freq);
>  static unsigned long global_l_p_j_ref;
>  static unsigned long global_l_p_j_ref_freq;
> +static unsigned long global_l_p_j_max_freq;
> +
> +/**
> + * cpufreq_callback - Adjust loops_per_jiffies when frequency changes
> + *
> + * When the CPU frequency changes we need to adjust loops_per_jiffies, which
> + * we assume scales linearly with frequency.
> + *
> + * This function is fairly castrated and only ever adjust loops_per_jiffies
> + * upward.  It also doesn't adjust the PER_CPU loops_per_jiffies.  Here's why:
> + * 1. The ARM udelay only ever looks at the global loops_per_jiffy not the
> + *    percpu one.  If your CPUs _are not_ changed in lockstep you could run
> + *    into problems by decreasing loops_per_jiffies since one of the other
> + *    processors might still be running slower.
> + * 2. The ARM udelay reads the loops_per_jiffy at the beginning of its loop and
> + *    no other times.  If your CPUs _are_ changed in lockstep you could run
> + *    into a race where one CPU has started its loop with old (slower)
> + *    loops_per_jiffy and then suddenly is running faster.
> + *
> + * Anyone who wants a good udelay() should be using a timer-based solution
> + * anyway.  If you don't have a timer solution, you just gotta be conservative.
> + */
>
>  static int cpufreq_callback(struct notifier_block *nb,
>                                         unsigned long val, void *data)
>  {
>         struct cpufreq_freqs *freq = data;
> -       int cpu = freq->cpu;
>
>         if (freq->flags & CPUFREQ_CONST_LOOPS)
>                 return NOTIFY_OK;
>
> -       if (!per_cpu(l_p_j_ref, cpu)) {
> -               per_cpu(l_p_j_ref, cpu) =
> -                       per_cpu(cpu_data, cpu).loops_per_jiffy;
> -               per_cpu(l_p_j_ref_freq, cpu) = freq->old;
> -               if (!global_l_p_j_ref) {
> -                       global_l_p_j_ref = loops_per_jiffy;
> -                       global_l_p_j_ref_freq = freq->old;
> -               }
> +       if (!global_l_p_j_ref) {
> +               global_l_p_j_ref = loops_per_jiffy;
> +               global_l_p_j_ref_freq = freq->old;
> +               global_l_p_j_max_freq = freq->old;
>         }
>
> -       if ((val == CPUFREQ_PRECHANGE  && freq->old < freq->new) ||
> -           (val == CPUFREQ_POSTCHANGE && freq->old > freq->new)) {
> +       if (freq->new > global_l_p_j_max_freq) {
>                 loops_per_jiffy = cpufreq_scale(global_l_p_j_ref,
>                                                 global_l_p_j_ref_freq,
>                                                 freq->new);
> -               per_cpu(cpu_data, cpu).loops_per_jiffy =
> -                       cpufreq_scale(per_cpu(l_p_j_ref, cpu),
> -                                       per_cpu(l_p_j_ref_freq, cpu),
> -                                       freq->new);
> +               global_l_p_j_max_freq = freq->new;
>         }
>         return NOTIFY_OK;
>  }

So, the end of this will happen when a CPU is set to its highest frequency
for the first time. After that this routine will simply be noise and nothing
more. Isn't it?

And if that's the case, why don't we get rid of it completely and set
global-loop-per-jiffy for the max freq at boot ?

--
viresh

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems
  2014-05-08 10:41 ` Viresh Kumar
@ 2014-05-08 15:25   ` Doug Anderson
  2014-05-08 16:04     ` Nicolas Pitre
  2014-05-09  9:25     ` [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems Viresh Kumar
  0 siblings, 2 replies; 31+ messages in thread
From: Doug Anderson @ 2014-05-08 15:25 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Russell King, Will Deacon, John Stultz,
	David Riley, olof, Sonny Rao, Richard Zhao, Santosh Shilimkar,
	Shawn Guo, Nicolas Pitre, Stephen Boyd, Marc Zyngier,
	Stephen Warren, Paul Gortmaker, linux-arm-kernel, linux-kernel

Viresh,

On Thu, May 8, 2014 at 3:41 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> Fixing Rafael's id.
>
> On Thu, May 8, 2014 at 4:53 AM, Doug Anderson <dianders@chromium.org> wrote:
>> Downscaling loops_per_jiffy on SMP ARM systems really doesn't work.
>> You could really only do this if:
>>
>> * Each CPU is has independent frequency changes (changing one CPU
>>   doesn't affect another).
>> * We change the generic ARM udelay() code to actually look at percpu
>>   loops_per_jiffy.
>
> There is one more case I believe:
> All CPUs share a single clock line and generic udelay() uses global
> loops_per_jiffy, as it would never happen that some other CPU is running
> faster and udelay would complete early

I'm not sure I understood your case.  Maybe you can give me an
example?  I can give you an example myself of the single clock case
and you can tell me how your example differs:

1. Initially CPU1 and CPU2 at 200MHz.  Pretend loops_per_jiffy is 1000.

2. CPU1 starts a delay.  It reads global lpj (1000) and sets up its
local registers up for the loop.

3. At the same time, CPU2 is transitioning the system to 2000MHz.
Right after CPU1 reads lpj CPU2 stores it as 10000.

4. Now CPU1 and CPU2 are running at 2000MHz but CPU1 is only looping
1000 times.  It will complete too fast.

...you could possibly try to account for this in the delay loop code
(being careful to handle all of the corner cases and races).  ...or we
could make the delay loop super conservative and suggest that people
should be using a real timer.


> So, the end of this will happen when a CPU is set to its highest frequency
> for the first time. After that this routine will simply be noise and nothing
> more. Isn't it?

Yup.


> And if that's the case, why don't we get rid of it completely and set
> global-loop-per-jiffy for the max freq at boot ?

How exactly do you do this in a generic way?  I know that our systems
don't always boot up at full speed.  The HP Chromebook 11 might boot
up at 900MHz and stay that way for a while until the battery gets
enough juice.  The Samsung Chromebook 2 will boot up at 1.8GHz
although some of them can go to 1.9GHz and others to 2.0GHz.  Waiting
to actually see the cpufreq transition is a safe way, though it does
end up with some extra function calls.

-Doug

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems
  2014-05-08 15:25   ` Doug Anderson
@ 2014-05-08 16:04     ` Nicolas Pitre
  2014-05-08 16:41       ` Doug Anderson
  2014-05-09  9:25     ` [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems Viresh Kumar
  1 sibling, 1 reply; 31+ messages in thread
From: Nicolas Pitre @ 2014-05-08 16:04 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Viresh Kumar, Rafael J. Wysocki, Russell King, Will Deacon,
	John Stultz, David Riley, olof, Sonny Rao, Richard Zhao,
	Santosh Shilimkar, Shawn Guo, Stephen Boyd, Marc Zyngier,
	Stephen Warren, Paul Gortmaker, linux-arm-kernel, linux-kernel

On Thu, 8 May 2014, Doug Anderson wrote:

> 1. Initially CPU1 and CPU2 at 200MHz.  Pretend loops_per_jiffy is 1000.
> 
> 2. CPU1 starts a delay.  It reads global lpj (1000) and sets up its
> local registers up for the loop.
> 
> 3. At the same time, CPU2 is transitioning the system to 2000MHz.
> Right after CPU1 reads lpj CPU2 stores it as 10000.
> 
> 4. Now CPU1 and CPU2 are running at 2000MHz but CPU1 is only looping
> 1000 times.  It will complete too fast.
> 
> ...you could possibly try to account for this in the delay loop code
> (being careful to handle all of the corner cases and races).  ...or we
> could make the delay loop super conservative and suggest that people
> should be using a real timer.

I don't see how you can possibly solve this issue without a timer based 
delay.  Even if you scale the loop count in only one direction, it will 
still have this problem even though the window for the race would happen 
much less often.  Yet having a delay which is way longer than expected 
might cause problems in some cases.

Yet clock frequency changes with the kind of magnitude you give in your 
example are usually not instantaneous.  Given udelay should be used for 
very short delays, it is likely that CPU1 will complete its count before 
the higher clock frequency is effective in most cases.

> How exactly do you do this in a generic way?  I know that our systems
> don't always boot up at full speed.  The HP Chromebook 11 might boot
> up at 900MHz and stay that way for a while until the battery gets
> enough juice.  The Samsung Chromebook 2 will boot up at 1.8GHz
> although some of them can go to 1.9GHz and others to 2.0GHz.  Waiting
> to actually see the cpufreq transition is a safe way, though it does
> end up with some extra function calls.

The Samsung Chromebook uses an A15 which does use a timer based udelay.  
What about the others?  IOW is this a real problem in practice or a 
theoretical one?

SMP with shared clock for DVFS simply doesn't allow pure loop counts to 
always be accurate.  Trying to fix a broken implementation with 
something that is still broken to some extent, and maybe more in 
some cases, doesn't look like much progress to me.


Nicolas

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems
  2014-05-08 16:04     ` Nicolas Pitre
@ 2014-05-08 16:41       ` Doug Anderson
  2014-05-08 17:43         ` Nicolas Pitre
  0 siblings, 1 reply; 31+ messages in thread
From: Doug Anderson @ 2014-05-08 16:41 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Viresh Kumar, Rafael J. Wysocki, Russell King, Will Deacon,
	John Stultz, David Riley, olof, Sonny Rao, Richard Zhao,
	Santosh Shilimkar, Shawn Guo, Stephen Boyd, Marc Zyngier,
	Stephen Warren, Paul Gortmaker, linux-arm-kernel, linux-kernel

Nicolas,

On Thu, May 8, 2014 at 9:04 AM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> On Thu, 8 May 2014, Doug Anderson wrote:
>
>> 1. Initially CPU1 and CPU2 at 200MHz.  Pretend loops_per_jiffy is 1000.
>>
>> 2. CPU1 starts a delay.  It reads global lpj (1000) and sets up its
>> local registers up for the loop.
>>
>> 3. At the same time, CPU2 is transitioning the system to 2000MHz.
>> Right after CPU1 reads lpj CPU2 stores it as 10000.
>>
>> 4. Now CPU1 and CPU2 are running at 2000MHz but CPU1 is only looping
>> 1000 times.  It will complete too fast.
>>
>> ...you could possibly try to account for this in the delay loop code
>> (being careful to handle all of the corner cases and races).  ...or we
>> could make the delay loop super conservative and suggest that people
>> should be using a real timer.
>
> I don't see how you can possibly solve this issue without a timer based
> delay.  Even if you scale the loop count in only one direction, it will
> still have this problem even though the window for the race would happen
> much less often.  Yet having a delay which is way longer than expected
> might cause problems in some cases.

You could possibly try to do something excessively clever by checking
the loops per jiffy after the loop was done (and perhaps register for
cpufreq changes so you know if it changed and then changed back)?  As
I said, I don't think it's a good use of anyone's time.

Longer delays aren't very good, but IMHO having some delays of 100 =>
1000 is better than having delays of 100 => 75.  The former will cause
mostly performance problems and the later will cause real correctness
problems.

I'm not saying that 100 => 1000 is good, it's just less bad.
Specifically even in a timer-based system you can't guarantee that a
udelay(100) won't end up a udelay(1000) if the kernel finds something
better to do than to run your code.  I agree that there might be code
that breaks when a udelay(100) becomes a udelay(1000), but probably
that code needs to be fixed to be more tolerant anyway.

When you've got a udelay(100) => udelay(75) then suddenly you're
returning from regulator code before the regulator has fully ramped up
to its final voltage.  You're talking to peripherals faster than they
were intended to be talked to.  ...etc.


> Yet clock frequency changes with the kind of magnitude you give in your
> example are usually not instantaneous.  Given udelay should be used for
> very short delays, it is likely that CPU1 will complete its count before
> the higher clock frequency is effective in most cases.

In practice we found that the frequency changed fast enough in a real
system that we were seeing real problems, at least with a udelay(50).
If you've got a system using lpj-based udelay() that supports cpufreq,
grab David's test patches and try for yourself.  Perhaps I was
mistaken?


>> How exactly do you do this in a generic way?  I know that our systems
>> don't always boot up at full speed.  The HP Chromebook 11 might boot
>> up at 900MHz and stay that way for a while until the battery gets
>> enough juice.  The Samsung Chromebook 2 will boot up at 1.8GHz
>> although some of them can go to 1.9GHz and others to 2.0GHz.  Waiting
>> to actually see the cpufreq transition is a safe way, though it does
>> end up with some extra function calls.
>
> The Samsung Chromebook uses an A15 which does use a timer based udelay.
> What about the others?  IOW is this a real problem in practice or a
> theoretical one?

Correct, the Samsung Chromebook uses a timer-based udelay upstream (it
still doesn't in our tree BTW, but we're working to rectify that).
...that's why I didn't send the patch up initially.  I sent it up at
the request of John Stultz in a discussion about David's test code.


> SMP with shared clock for DVFS simply doesn't allow pure loop counts to
> always be accurate.  Trying to fix a broken implementation with
> something that is still broken to some extent, and maybe more in
> some cases, doesn't look like much progress to me.

I totally agree that this doesn't really fix the problem nicely which
is why I didn't send it initially.

I will make the argument that this patch makes things less broken
overall on any systems that actually end up running this code, but if
you want NAK it then it won't cause me any heartache.  ;)

-Doug

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems
  2014-05-08 16:41       ` Doug Anderson
@ 2014-05-08 17:43         ` Nicolas Pitre
  2014-05-08 18:06           ` Doug Anderson
  2014-05-08 19:22           ` Russell King - ARM Linux
  0 siblings, 2 replies; 31+ messages in thread
From: Nicolas Pitre @ 2014-05-08 17:43 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Viresh Kumar, Rafael J. Wysocki, Russell King, Will Deacon,
	John Stultz, David Riley, olof, Sonny Rao, Richard Zhao,
	Santosh Shilimkar, Shawn Guo, Stephen Boyd, Marc Zyngier,
	Stephen Warren, Paul Gortmaker, linux-arm-kernel, linux-kernel

On Thu, 8 May 2014, Doug Anderson wrote:

> Nicolas,
> 
> On Thu, May 8, 2014 at 9:04 AM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> > On Thu, 8 May 2014, Doug Anderson wrote:
> >
> >> 1. Initially CPU1 and CPU2 at 200MHz.  Pretend loops_per_jiffy is 1000.
> >>
> >> 2. CPU1 starts a delay.  It reads global lpj (1000) and sets up its
> >> local registers up for the loop.
> >>
> >> 3. At the same time, CPU2 is transitioning the system to 2000MHz.
> >> Right after CPU1 reads lpj CPU2 stores it as 10000.
> >>
> >> 4. Now CPU1 and CPU2 are running at 2000MHz but CPU1 is only looping
> >> 1000 times.  It will complete too fast.
> >>
> >> ...you could possibly try to account for this in the delay loop code
> >> (being careful to handle all of the corner cases and races).  ...or we
> >> could make the delay loop super conservative and suggest that people
> >> should be using a real timer.
> >
> > I don't see how you can possibly solve this issue without a timer based
> > delay.  Even if you scale the loop count in only one direction, it will
> > still have this problem even though the window for the race would happen
> > much less often.  Yet having a delay which is way longer than expected
> > might cause problems in some cases.
> 
> You could possibly try to do something excessively clever by checking
> the loops per jiffy after the loop was done (and perhaps register for
> cpufreq changes so you know if it changed and then changed back)?  As
> I said, I don't think it's a good use of anyone's time.

Agreed.

> Longer delays aren't very good, but IMHO having some delays of 100 =>
> 1000 is better than having delays of 100 => 75.  The former will cause
> mostly performance problems and the later will cause real correctness
> problems.
> I'm not saying that 100 => 1000 is good, it's just less bad.

There might be some cases where precise timing is needed though.
I thought I came across one such case in the past but I can't remember 
which.

> Specifically even in a timer-based system you can't guarantee that a
> udelay(100) won't end up a udelay(1000) if the kernel finds something
> better to do than to run your code.  I agree that there might be code
> that breaks when a udelay(100) becomes a udelay(1000), but probably
> that code needs to be fixed to be more tolerant anyway.

The timer based udelay implementation does poll the timer, and if exact 
timing is important then you'll certainly turn off IRQs during the 
critical sequence.

[...]
> > SMP with shared clock for DVFS simply doesn't allow pure loop counts to
> > always be accurate.  Trying to fix a broken implementation with
> > something that is still broken to some extent, and maybe more in
> > some cases, doesn't look like much progress to me.
> 
> I totally agree that this doesn't really fix the problem nicely which
> is why I didn't send it initially.
> 
> I will make the argument that this patch makes things less broken
> overall on any systems that actually end up running this code, but if
> you want NAK it then it won't cause me any heartache.  ;)

What I insist on is for this issue to be solved using a stable counter 
such a timer when available.  It _is_ available on one of the target you 
mentioned so that is the solution you should add to your tree.  
Investigating a similar solution for your other target should be 
preferred to hacking the udelay loop. This way you're guaranteed to 
solve this problem fully.


Nicolas

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems
  2014-05-08 17:43         ` Nicolas Pitre
@ 2014-05-08 18:06           ` Doug Anderson
  2014-05-08 19:59             ` Nicolas Pitre
  2014-05-08 20:55             ` Russell King - ARM Linux
  2014-05-08 19:22           ` Russell King - ARM Linux
  1 sibling, 2 replies; 31+ messages in thread
From: Doug Anderson @ 2014-05-08 18:06 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Viresh Kumar, Rafael J. Wysocki, Russell King, Will Deacon,
	John Stultz, David Riley, olof, Sonny Rao, Richard Zhao,
	Santosh Shilimkar, Shawn Guo, Stephen Boyd, Marc Zyngier,
	Stephen Warren, Paul Gortmaker, linux-arm-kernel, linux-kernel

Nicolas,

On Thu, May 8, 2014 at 10:43 AM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> On Thu, 8 May 2014, Doug Anderson wrote:

>> Longer delays aren't very good, but IMHO having some delays of 100 =>
>> 1000 is better than having delays of 100 => 75.  The former will cause
>> mostly performance problems and the later will cause real correctness
>> problems.
>> I'm not saying that 100 => 1000 is good, it's just less bad.
>
> There might be some cases where precise timing is needed though.
> I thought I came across one such case in the past but I can't remember
> which.

If precise timing is needed, shouldn't it be using ktime?


>> I will make the argument that this patch makes things less broken
>> overall on any systems that actually end up running this code, but if
>> you want NAK it then it won't cause me any heartache.  ;)
>
> What I insist on is for this issue to be solved using a stable counter
> such a timer when available.  It _is_ available on one of the target you
> mentioned so that is the solution you should add to your tree.

Yup, we're working on it.


> Investigating a similar solution for your other target should be
> preferred to hacking the udelay loop. This way you're guaranteed to
> solve this problem fully.

I have no other target in mind.  I'm merely sending this up there just
in case there is some cpufreq running ARM board that is SMP and has no
timer-based udelay.  Those are the only boards that could possibly be
running this code anyway.

I guess I would say that my patch is unhacking the this code.  The
code after my patch is simpler.  I would perhaps argue that (ec971ea
ARM: add cpufreq transiton notifier to adjust loops_per_jiffy for smp)
should never have landed to begin with.

-Doug

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems
  2014-05-08 17:43         ` Nicolas Pitre
  2014-05-08 18:06           ` Doug Anderson
@ 2014-05-08 19:22           ` Russell King - ARM Linux
  2014-05-08 20:12             ` Nicolas Pitre
  1 sibling, 1 reply; 31+ messages in thread
From: Russell King - ARM Linux @ 2014-05-08 19:22 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Doug Anderson, Viresh Kumar, Rafael J. Wysocki, Will Deacon,
	John Stultz, David Riley, olof, Sonny Rao, Richard Zhao,
	Santosh Shilimkar, Shawn Guo, Stephen Boyd, Marc Zyngier,
	Stephen Warren, Paul Gortmaker, linux-arm-kernel, linux-kernel

On Thu, May 08, 2014 at 01:43:29PM -0400, Nicolas Pitre wrote:
> There might be some cases where precise timing is needed though.
> I thought I came across one such case in the past but I can't remember 
> which.

Anything which is expecting precise timings from udelay() is broken.
Firstly, udelay() does _not_ guarantee to give you a delay of at least
the requested period - it tries to give an _approximate_ delay.

The first thing to realise is that loops_per_jiffy is calibrated with
interrupts _on_, which means that the calculated loops_per_jiffy is
the number of iterations in a jiffy _minus_ the time it takes for the
timer interrupt to be processed.  This means loops_per_jiffy will
always be smaller than the number of loops that would be executed
within the same period.

This leads to udelay() always producing slightly shorter than
requested delays - this is quite measurable.

It gets worse when you consider the utter mess that the L2 cache code
is in - where on many platforms we calibrate udelay() with the cache
off, which results in loops_per_jiffy being smaller than it would
otherwise be (meaning shorter delays.)

So, that's two reasons there why loops_per_jiffy will be smaller than
it should be at boot, and therefore udelay() will be inaccurate.

Another reason udelay() can be unaccurate is if interrupts are on, and
you have USB present.  USB interrupt processing can take on the order
of 10s of milliseconds even on 800MHz or faster ARM CPUs.  If you
receive one of those mid-udelay(), your CPU will be occupied elsewhere.

Another reason is preempt.  If the kernel can preempt during udelay(),
your delay will also be much longer than you requested.  No, disabling
preemption in udelay() is not on, that would increase preemption
latency.

So, the only /real/ solution if you want proper delays is for udelay()
to use a timer or counter, and this is should always the preferred
method where it's available.  Quite rightly, we're not hacking udelay()
stuff to work around not having that, or if someone configures it out.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems
  2014-05-08 18:06           ` Doug Anderson
@ 2014-05-08 19:59             ` Nicolas Pitre
  2014-05-08 20:55             ` Russell King - ARM Linux
  1 sibling, 0 replies; 31+ messages in thread
From: Nicolas Pitre @ 2014-05-08 19:59 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Viresh Kumar, Rafael J. Wysocki, Russell King, Will Deacon,
	John Stultz, David Riley, olof, Sonny Rao, Richard Zhao,
	Santosh Shilimkar, Shawn Guo, Stephen Boyd, Marc Zyngier,
	Stephen Warren, Paul Gortmaker, linux-arm-kernel, linux-kernel

On Thu, 8 May 2014, Doug Anderson wrote:

> Nicolas,
> 
> On Thu, May 8, 2014 at 10:43 AM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> > On Thu, 8 May 2014, Doug Anderson wrote:
> 
> >> Longer delays aren't very good, but IMHO having some delays of 100 =>
> >> 1000 is better than having delays of 100 => 75.  The former will cause
> >> mostly performance problems and the later will cause real correctness
> >> problems.
> >> I'm not saying that 100 => 1000 is good, it's just less bad.
> >
> > There might be some cases where precise timing is needed though.
> > I thought I came across one such case in the past but I can't remember
> > which.
> 
> If precise timing is needed, shouldn't it be using ktime?

I meant "precise" in the sense that you might have to poke at some 
hardware within some deadline e.g. set bit, wait 13us, clear bit, and 
not exceed 15us between both bit events.  Arguably this is best hangled 
with actual FIQs (when they're available).  But I don't have any actual 
case of this to bring as example.

> >> I will make the argument that this patch makes things less broken
> >> overall on any systems that actually end up running this code, but if
> >> you want NAK it then it won't cause me any heartache.  ;)
> >
> > What I insist on is for this issue to be solved using a stable counter
> > such a timer when available.  It _is_ available on one of the target you
> > mentioned so that is the solution you should add to your tree.
> 
> Yup, we're working on it.
> 
> 
> > Investigating a similar solution for your other target should be
> > preferred to hacking the udelay loop. This way you're guaranteed to
> > solve this problem fully.
> 
> I have no other target in mind.  I'm merely sending this up there just
> in case there is some cpufreq running ARM board that is SMP and has no
> timer-based udelay.  Those are the only boards that could possibly be
> running this code anyway.
> 
> I guess I would say that my patch is unhacking the this code.  The
> code after my patch is simpler.  I would perhaps argue that (ec971ea
> ARM: add cpufreq transiton notifier to adjust loops_per_jiffy for smp)
> should never have landed to begin with.

I would agree to qualify your patch as brokenness mitigation and that 
might be better than nothing.  But this should probably print a warning 
to that effect.

Yet, wouldn't using ktime to implement udelay() a better solution in 
that case?  Suffice to make sure a clock source with microsecs 
resolution is available.


Nicolas

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems
  2014-05-08 19:22           ` Russell King - ARM Linux
@ 2014-05-08 20:12             ` Nicolas Pitre
  2014-05-08 20:39               ` John Stultz
  2014-05-08 20:52               ` Russell King - ARM Linux
  0 siblings, 2 replies; 31+ messages in thread
From: Nicolas Pitre @ 2014-05-08 20:12 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Doug Anderson, Viresh Kumar, Rafael J. Wysocki, Will Deacon,
	John Stultz, David Riley, olof, Sonny Rao, Richard Zhao,
	Santosh Shilimkar, Shawn Guo, Stephen Boyd, Marc Zyngier,
	Stephen Warren, Paul Gortmaker, linux-arm-kernel, linux-kernel

On Thu, 8 May 2014, Russell King - ARM Linux wrote:

> Anything which is expecting precise timings from udelay() is broken.
> Firstly, udelay() does _not_ guarantee to give you a delay of at least
> the requested period - it tries to give an _approximate_ delay.
> 
> The first thing to realise is that loops_per_jiffy is calibrated with
> interrupts _on_, which means that the calculated loops_per_jiffy is
> the number of iterations in a jiffy _minus_ the time it takes for the
> timer interrupt to be processed.  This means loops_per_jiffy will
> always be smaller than the number of loops that would be executed
> within the same period.
> 
> This leads to udelay() always producing slightly shorter than
> requested delays - this is quite measurable.

OK, this is certainly bad.  Hopefully it won't be that far off like it 
would when the CPU is in the middle of a clock freq transition.

> It gets worse when you consider the utter mess that the L2 cache code
> is in - where on many platforms we calibrate udelay() with the cache
> off, which results in loops_per_jiffy being smaller than it would
> otherwise be (meaning shorter delays.)
> 
> So, that's two reasons there why loops_per_jiffy will be smaller than
> it should be at boot, and therefore udelay() will be inaccurate.
> 
> Another reason udelay() can be unaccurate is if interrupts are on, and
> you have USB present.  USB interrupt processing can take on the order
> of 10s of milliseconds even on 800MHz or faster ARM CPUs.  If you
> receive one of those mid-udelay(), your CPU will be occupied elsewhere.
> 
> Another reason is preempt.  If the kernel can preempt during udelay(),
> your delay will also be much longer than you requested.  No, disabling
> preemption in udelay() is not on, that would increase preemption
> latency.
> 
> So, the only /real/ solution if you want proper delays is for udelay()
> to use a timer or counter, and this is should always the preferred
> method where it's available.  Quite rightly, we're not hacking udelay()
> stuff to work around not having that, or if someone configures it out.

What about using a default based on ktime_get(), or even sched_clock(), 
when SMP and cpufreq are configured in?


Nicolas

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems
  2014-05-08 20:12             ` Nicolas Pitre
@ 2014-05-08 20:39               ` John Stultz
  2014-05-08 20:52               ` Russell King - ARM Linux
  1 sibling, 0 replies; 31+ messages in thread
From: John Stultz @ 2014-05-08 20:39 UTC (permalink / raw)
  To: Nicolas Pitre, Russell King - ARM Linux
  Cc: Doug Anderson, Viresh Kumar, Rafael J. Wysocki, Will Deacon,
	David Riley, olof, Sonny Rao, Richard Zhao, Santosh Shilimkar,
	Shawn Guo, Stephen Boyd, Marc Zyngier, Stephen Warren,
	Paul Gortmaker, linux-arm-kernel, linux-kernel

On 05/08/2014 01:12 PM, Nicolas Pitre wrote:
> On Thu, 8 May 2014, Russell King - ARM Linux wrote:
>
>> So, the only /real/ solution if you want proper delays is for udelay()
>> to use a timer or counter, and this is should always the preferred
>> method where it's available.  Quite rightly, we're not hacking udelay()
>> stuff to work around not having that, or if someone configures it out.
> What about using a default based on ktime_get(), or even sched_clock(), 
> when SMP and cpufreq are configured in?

While somewhat rare these days, there are some systems that still only
have tick(jiffies)-granular time (as well as sched_clock), which might
be far too coarse for udelay, and in addition depends on irqs being
enabled to be able for time to progress.

The loops based delay is sort of a hack that allows fine grained (but
somewhat inaccurate) time delays in just about any context.

The problem being the loop based delays are somewhat fragile,
particularly as many of the assumptions for that code have been broken
(cpufreq scaling, ASMP, SMIs or other calibration issues, etc).  On
these systems where the assumptions are broken, time-based counter
delays are needed, but not always in place.

Thus I suspect we need something that provides proper warnings when we
are using loop delays on system that breaks its assumptions.

thanks
-john




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems
  2014-05-08 20:12             ` Nicolas Pitre
  2014-05-08 20:39               ` John Stultz
@ 2014-05-08 20:52               ` Russell King - ARM Linux
  2014-05-09  1:37                 ` Nicolas Pitre
  1 sibling, 1 reply; 31+ messages in thread
From: Russell King - ARM Linux @ 2014-05-08 20:52 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Doug Anderson, Viresh Kumar, Rafael J. Wysocki, Will Deacon,
	John Stultz, David Riley, olof, Sonny Rao, Richard Zhao,
	Santosh Shilimkar, Shawn Guo, Stephen Boyd, Marc Zyngier,
	Stephen Warren, Paul Gortmaker, linux-arm-kernel, linux-kernel

On Thu, May 08, 2014 at 04:12:14PM -0400, Nicolas Pitre wrote:
> On Thu, 8 May 2014, Russell King - ARM Linux wrote:
> 
> > Anything which is expecting precise timings from udelay() is broken.
> > Firstly, udelay() does _not_ guarantee to give you a delay of at least
> > the requested period - it tries to give an _approximate_ delay.
> > 
> > The first thing to realise is that loops_per_jiffy is calibrated with
> > interrupts _on_, which means that the calculated loops_per_jiffy is
> > the number of iterations in a jiffy _minus_ the time it takes for the
> > timer interrupt to be processed.  This means loops_per_jiffy will
> > always be smaller than the number of loops that would be executed
> > within the same period.
> > 
> > This leads to udelay() always producing slightly shorter than
> > requested delays - this is quite measurable.
> 
> OK, this is certainly bad.  Hopefully it won't be that far off like it 
> would when the CPU is in the middle of a clock freq transition.

It depends on the system, but my point is that assumption that udelay()
gives a delay of at least the requested time is something that is false,
and has *always* been false.

It's not "broken" either - it's just how the thing works, and the "fix"
for it is to use a timer based implementation which isn't affected by
interrupts.

> > So, the only /real/ solution if you want proper delays is for udelay()
> > to use a timer or counter, and this is should always the preferred
> > method where it's available.  Quite rightly, we're not hacking udelay()
> > stuff to work around not having that, or if someone configures it out.
> 
> What about using a default based on ktime_get(), or even sched_clock(), 
> when SMP and cpufreq are configured in?

I see no reason to play those kinds of games.  Keep the message simple.
If you're in a preempt or SMP environment, provide a timer for udelay().
IF you're in an environment with IRQs which can take a long time, use
a timer for udelay().  If you're in an environment where the CPU clock
can change unexpectedly, use a timer for udelay().

The very last thing we want to do is to sit around doing expensive calls
into various time keeping code which themselves add conversion on top,
which then end up making udelay() latency even worse than the loop-based
versions on slower machines.

So... the message is nice and simple: where possible, use a timer for
udelay().

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems
  2014-05-08 18:06           ` Doug Anderson
  2014-05-08 19:59             ` Nicolas Pitre
@ 2014-05-08 20:55             ` Russell King - ARM Linux
  2014-05-09  0:02               ` Doug Anderson
  1 sibling, 1 reply; 31+ messages in thread
From: Russell King - ARM Linux @ 2014-05-08 20:55 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Nicolas Pitre, Viresh Kumar, Rafael J. Wysocki, Will Deacon,
	John Stultz, David Riley, olof, Sonny Rao, Richard Zhao,
	Santosh Shilimkar, Shawn Guo, Stephen Boyd, Marc Zyngier,
	Stephen Warren, Paul Gortmaker, linux-arm-kernel, linux-kernel

On Thu, May 08, 2014 at 11:06:24AM -0700, Doug Anderson wrote:
> I guess I would say that my patch is unhacking the this code.  The
> code after my patch is simpler.  I would perhaps argue that (ec971ea
> ARM: add cpufreq transiton notifier to adjust loops_per_jiffy for smp)
> should never have landed to begin with.

That depends on your point of view.  As I've already pointed out through
the examples of why udelay() is inaccurate, for driver authors, they
should assume that udelay() just gives you an "approximate" delay and
it has no accuracy.

When you start from that point, rather than (as you seem to be) believing
that it has some kind of accuracy, then the implementation of scaling the
loops_per_jiffy is entirely reasonable - it's a best effort implementation.

Again, just use a timer for your udelay() implementation.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems
  2014-05-08 20:55             ` Russell King - ARM Linux
@ 2014-05-09  0:02               ` Doug Anderson
  2014-05-09  0:23                 ` Russell King - ARM Linux
  0 siblings, 1 reply; 31+ messages in thread
From: Doug Anderson @ 2014-05-09  0:02 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Nicolas Pitre, Viresh Kumar, Rafael J. Wysocki, Will Deacon,
	John Stultz, David Riley, olof, Sonny Rao, Richard Zhao,
	Santosh Shilimkar, Shawn Guo, Stephen Boyd, Marc Zyngier,
	Stephen Warren, Paul Gortmaker, linux-arm-kernel, linux-kernel,
	Thomas Gleixner, Mark Brown

Russel,

On Thu, May 8, 2014 at 1:55 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Thu, May 08, 2014 at 11:06:24AM -0700, Doug Anderson wrote:
>> I guess I would say that my patch is unhacking the this code.  The
>> code after my patch is simpler.  I would perhaps argue that (ec971ea
>> ARM: add cpufreq transiton notifier to adjust loops_per_jiffy for smp)
>> should never have landed to begin with.
>
> That depends on your point of view.  As I've already pointed out through
> the examples of why udelay() is inaccurate, for driver authors, they
> should assume that udelay() just gives you an "approximate" delay and
> it has no accuracy.

That disagrees with what Thomas Gleixner says at
<http://lkml.iu.edu//hypermail/linux/kernel/1203.1/01034.html>.  It
also seems like perhaps the regulator core is broken, then...  If a
udelay(30) can end up as a udelay(20) then we may return from a
regulator code 10us earlier than we should and we'll assume that a
regulator is ramped before it really is...

I'm out tomorrow but I can confirm on Monday that I was really seeing
udelay(30) be a udelay(20) without this patch.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems
  2014-05-09  0:02               ` Doug Anderson
@ 2014-05-09  0:23                 ` Russell King - ARM Linux
  2014-05-09  4:41                   ` Doug Anderson
  0 siblings, 1 reply; 31+ messages in thread
From: Russell King - ARM Linux @ 2014-05-09  0:23 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Nicolas Pitre, Viresh Kumar, Rafael J. Wysocki, Will Deacon,
	John Stultz, David Riley, olof, Sonny Rao, Richard Zhao,
	Santosh Shilimkar, Shawn Guo, Stephen Boyd, Marc Zyngier,
	Stephen Warren, Paul Gortmaker, linux-arm-kernel, linux-kernel,
	Thomas Gleixner, Mark Brown

On Thu, May 08, 2014 at 05:02:02PM -0700, Doug Anderson wrote:
> Russel,
> 
> On Thu, May 8, 2014 at 1:55 PM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
> > On Thu, May 08, 2014 at 11:06:24AM -0700, Doug Anderson wrote:
> >> I guess I would say that my patch is unhacking the this code.  The
> >> code after my patch is simpler.  I would perhaps argue that (ec971ea
> >> ARM: add cpufreq transiton notifier to adjust loops_per_jiffy for smp)
> >> should never have landed to begin with.
> >
> > That depends on your point of view.  As I've already pointed out through
> > the examples of why udelay() is inaccurate, for driver authors, they
> > should assume that udelay() just gives you an "approximate" delay and
> > it has no accuracy.
> 
> That disagrees with what Thomas Gleixner says at
> <http://lkml.iu.edu//hypermail/linux/kernel/1203.1/01034.html>.  It
> also seems like perhaps the regulator core is broken, then...  If a
> udelay(30) can end up as a udelay(20) then we may return from a
> regulator code 10us earlier than we should and we'll assume that a
> regulator is ramped before it really is...
> 
> I'm out tomorrow but I can confirm on Monday that I was really seeing
> udelay(30) be a udelay(20) without this patch.

Thomas is wrong - when I researched this topic, I ended up finding
that udelay() does delay _less_ than requested, and I mailed Linus
about it...  This is whe way udelay() is - it's an approximate delay,
it's *not* accurate.

It's also fairly obvious when you stop and consider how it's calibrated.

Take a moment to wonder when we used to recalibrate each CPU individally,
why the boot CPU bogomips would be slightly lower than the secondary
CPU bogomips.  This is all down to the boot CPU having to run timer
interrupts while the secondary CPUs weren't at the time they calibrated.

Linus doesn't give a damn about udelay() being slightly short in this
way.  Neither do I.

Let me repeat again: use a timer.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems
  2014-05-08 20:52               ` Russell King - ARM Linux
@ 2014-05-09  1:37                 ` Nicolas Pitre
  2014-05-09  4:43                   ` Doug Anderson
  2014-05-09  9:18                   ` [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems# Russell King - ARM Linux
  0 siblings, 2 replies; 31+ messages in thread
From: Nicolas Pitre @ 2014-05-09  1:37 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Doug Anderson, Viresh Kumar, Rafael J. Wysocki, Will Deacon,
	John Stultz, David Riley, olof, Sonny Rao, Richard Zhao,
	Santosh Shilimkar, Shawn Guo, Stephen Boyd, Marc Zyngier,
	Stephen Warren, Paul Gortmaker, linux-arm-kernel, linux-kernel

On Thu, 8 May 2014, Russell King - ARM Linux wrote:

> If you're in a preempt or SMP environment, provide a timer for udelay().
> IF you're in an environment with IRQs which can take a long time, use
> a timer for udelay().  If you're in an environment where the CPU clock
> can change unexpectedly, use a timer for udelay().

Longer delays are normally not a problem.  If they are, then simply 
disabling IRQs may solve it if absolutely required.  With much shorter 
delays than expected this is another story.

What about the following:

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 7c4fada440..10030cc5a0 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -682,6 +682,15 @@ static int cpufreq_callback(struct notifier_block *nb,
 			cpufreq_scale(per_cpu(l_p_j_ref, cpu),
 					per_cpu(l_p_j_ref_freq, cpu),
 					freq->new);
+		/*
+		 * Another CPU might have called udelay() just before LPJ
+		 * and a shared CPU clock is increased.  That other CPU still
+		 * looping on the old LPJ value would return significantly
+		 * sooner than expected.  The actual fix is to provide a
+		 * timer based udelay() implementation instead.
+		 */
+		if (freq->old < freq->new)
+			pr_warn_once("*** udelay() on SMP is racy and may be much shorter than expected ***\n");
 	}
 	return NOTIFY_OK;
 }


Nicolas


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems
  2014-05-09  0:23                 ` Russell King - ARM Linux
@ 2014-05-09  4:41                   ` Doug Anderson
  0 siblings, 0 replies; 31+ messages in thread
From: Doug Anderson @ 2014-05-09  4:41 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Nicolas Pitre, Viresh Kumar, Rafael J. Wysocki, Will Deacon,
	John Stultz, David Riley, olof, Sonny Rao, Richard Zhao,
	Santosh Shilimkar, Shawn Guo, Stephen Boyd, Marc Zyngier,
	Stephen Warren, Paul Gortmaker, linux-arm-kernel, linux-kernel,
	Thomas Gleixner, Mark Brown

Russell,

On Thu, May 8, 2014 at 5:23 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Thu, May 08, 2014 at 05:02:02PM -0700, Doug Anderson wrote:
>> Russel,
>>
>> On Thu, May 8, 2014 at 1:55 PM, Russell King - ARM Linux
>> <linux@arm.linux.org.uk> wrote:
>> > On Thu, May 08, 2014 at 11:06:24AM -0700, Doug Anderson wrote:
>> >> I guess I would say that my patch is unhacking the this code.  The
>> >> code after my patch is simpler.  I would perhaps argue that (ec971ea
>> >> ARM: add cpufreq transiton notifier to adjust loops_per_jiffy for smp)
>> >> should never have landed to begin with.
>> >
>> > That depends on your point of view.  As I've already pointed out through
>> > the examples of why udelay() is inaccurate, for driver authors, they
>> > should assume that udelay() just gives you an "approximate" delay and
>> > it has no accuracy.
>>
>> That disagrees with what Thomas Gleixner says at
>> <http://lkml.iu.edu//hypermail/linux/kernel/1203.1/01034.html>.  It
>> also seems like perhaps the regulator core is broken, then...  If a
>> udelay(30) can end up as a udelay(20) then we may return from a
>> regulator code 10us earlier than we should and we'll assume that a
>> regulator is ramped before it really is...
>>
>> I'm out tomorrow but I can confirm on Monday that I was really seeing
>> udelay(30) be a udelay(20) without this patch.
>
> Thomas is wrong - when I researched this topic, I ended up finding
> that udelay() does delay _less_ than requested, and I mailed Linus
> about it...  This is whe way udelay() is - it's an approximate delay,
> it's *not* accurate.
>
> It's also fairly obvious when you stop and consider how it's calibrated.
>
> Take a moment to wonder when we used to recalibrate each CPU individally,
> why the boot CPU bogomips would be slightly lower than the secondary
> CPU bogomips.  This is all down to the boot CPU having to run timer
> interrupts while the secondary CPUs weren't at the time they calibrated.
>
> Linus doesn't give a damn about udelay() being slightly short in this
> way.  Neither do I.

Can you define what you mean by "slightly"?  I'm personally not super
concerned by udelay(50) becoming udelay(49), but...

I've got a test that hammers on udelay and tests it against ktime.  It
runs for _multiple minutes_ and give nearly spot on udelay(50) values.
 Then, it gets unlucky and gives something like this (values reported
in nanoseconds):

[  467.034522] 0. want=50000, got=308708
[  467.034522] 1. want=50000, got=51917
[  467.034522] 2. want=50000, got=51167
[  467.034522] 3. want=50000, got=76291
[  467.034522] 4. want=50000, got=149959
[  467.034522] 5. want=50000, got=76458
[  467.034522] 6. want=50000, got=39583
[  467.034522] 7. want=50000, got=19459
[  467.034522] 8. want=50000, got=20625
[  467.034522] 9. want=50000, got=20375

It's things like "want 50000, got 19459" that concern me.  It's even
more concerning that it only happens after multiple minutes, meaning
that it will lead to random, impossible to reproduce bugs.

NOTE: I haven't traced through to figure out exactly what scenario
causes the above to happen.  In my case I'm simply looping over some
test code using
<https://chromium-review.googlesource.com/#/c/189760/8/init/calibrate.c>.
 ...but if we stop screwing with loops_per_jiffy it definitely doesn't
happen.


> Let me repeat again: use a timer.

Totally agree.  We are using a timer going forward.  I was posting
this patch to help those less fortunate who happen to be running on a
system where the timer isn't available for whatever reason.

-Doug

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems
  2014-05-09  1:37                 ` Nicolas Pitre
@ 2014-05-09  4:43                   ` Doug Anderson
  2014-05-09  9:18                   ` [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems# Russell King - ARM Linux
  1 sibling, 0 replies; 31+ messages in thread
From: Doug Anderson @ 2014-05-09  4:43 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Russell King - ARM Linux, Viresh Kumar, Rafael J. Wysocki,
	Will Deacon, John Stultz, David Riley, olof, Sonny Rao,
	Richard Zhao, Santosh Shilimkar, Shawn Guo, Stephen Boyd,
	Marc Zyngier, Stephen Warren, Paul Gortmaker, linux-arm-kernel,
	linux-kernel

Nicolas,

On Thu, May 8, 2014 at 6:37 PM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> On Thu, 8 May 2014, Russell King - ARM Linux wrote:
>
>> If you're in a preempt or SMP environment, provide a timer for udelay().
>> IF you're in an environment with IRQs which can take a long time, use
>> a timer for udelay().  If you're in an environment where the CPU clock
>> can change unexpectedly, use a timer for udelay().
>
> Longer delays are normally not a problem.  If they are, then simply
> disabling IRQs may solve it if absolutely required.  With much shorter
> delays than expected this is another story.
>
> What about the following:
>
> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> index 7c4fada440..10030cc5a0 100644
> --- a/arch/arm/kernel/smp.c
> +++ b/arch/arm/kernel/smp.c
> @@ -682,6 +682,15 @@ static int cpufreq_callback(struct notifier_block *nb,
>                         cpufreq_scale(per_cpu(l_p_j_ref, cpu),
>                                         per_cpu(l_p_j_ref_freq, cpu),
>                                         freq->new);
> +               /*
> +                * Another CPU might have called udelay() just before LPJ
> +                * and a shared CPU clock is increased.  That other CPU still
> +                * looping on the old LPJ value would return significantly
> +                * sooner than expected.  The actual fix is to provide a
> +                * timer based udelay() implementation instead.
> +                */
> +               if (freq->old < freq->new)
> +                       pr_warn_once("*** udelay() on SMP is racy and may be much shorter than expected ***\n");

I would be OK with that.  At least someone would have a clue what to do.

-Doug

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems#
  2014-05-09  1:37                 ` Nicolas Pitre
  2014-05-09  4:43                   ` Doug Anderson
@ 2014-05-09  9:18                   ` Russell King - ARM Linux
  2014-05-09 18:00                     ` Nicolas Pitre
  1 sibling, 1 reply; 31+ messages in thread
From: Russell King - ARM Linux @ 2014-05-09  9:18 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Doug Anderson, Viresh Kumar, Rafael J. Wysocki, Will Deacon,
	John Stultz, David Riley, olof, Sonny Rao, Richard Zhao,
	Santosh Shilimkar, Shawn Guo, Stephen Boyd, Marc Zyngier,
	Stephen Warren, Paul Gortmaker, linux-arm-kernel, linux-kernel

On Thu, May 08, 2014 at 09:37:15PM -0400, Nicolas Pitre wrote:
> On Thu, 8 May 2014, Russell King - ARM Linux wrote:
> 
> > If you're in a preempt or SMP environment, provide a timer for udelay().
> > IF you're in an environment with IRQs which can take a long time, use
> > a timer for udelay().  If you're in an environment where the CPU clock
> > can change unexpectedly, use a timer for udelay().
> 
> Longer delays are normally not a problem.  If they are, then simply 
> disabling IRQs may solve it if absolutely required.  With much shorter 
> delays than expected this is another story.
> 
> What about the following:
> 
> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> index 7c4fada440..10030cc5a0 100644
> --- a/arch/arm/kernel/smp.c
> +++ b/arch/arm/kernel/smp.c
> @@ -682,6 +682,15 @@ static int cpufreq_callback(struct notifier_block *nb,
>  			cpufreq_scale(per_cpu(l_p_j_ref, cpu),
>  					per_cpu(l_p_j_ref_freq, cpu),
>  					freq->new);
> +		/*
> +		 * Another CPU might have called udelay() just before LPJ
> +		 * and a shared CPU clock is increased.  That other CPU still
> +		 * looping on the old LPJ value would return significantly
> +		 * sooner than expected.  The actual fix is to provide a
> +		 * timer based udelay() implementation instead.
> +		 */
> +		if (freq->old < freq->new)
> +			pr_warn_once("*** udelay() on SMP is racy and may be much shorter than expected ***\n");
>  	}
>  	return NOTIFY_OK;
>  }

No, because you're assuming this is just a SMP problem.  What about
preempt, where you could preempt away from a udelay loop to change
the CPU frequency, and then back again, possibly resulting in the
CPU clock rate increasing and maybe a shorter delay if the switch
from-change-clock-and-back is fast enough?  Remember that udelay()
can be used for up to 2ms delays.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems
  2014-05-08 15:25   ` Doug Anderson
  2014-05-08 16:04     ` Nicolas Pitre
@ 2014-05-09  9:25     ` Viresh Kumar
  1 sibling, 0 replies; 31+ messages in thread
From: Viresh Kumar @ 2014-05-09  9:25 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Rafael J. Wysocki, Russell King, Will Deacon, John Stultz,
	David Riley, olof, Sonny Rao, Richard Zhao, Santosh Shilimkar,
	Shawn Guo, Nicolas Pitre, Stephen Boyd, Marc Zyngier,
	Stephen Warren, Paul Gortmaker, linux-arm-kernel, linux-kernel

On 8 May 2014 20:55, Doug Anderson <dianders@chromium.org> wrote:
> On Thu, May 8, 2014 at 3:41 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:

>> And if that's the case, why don't we get rid of it completely and set
>> global-loop-per-jiffy for the max freq at boot ?
>
> How exactly do you do this in a generic way?

We could have requested for the highest possible frequency value
from cpufreq and adjust accordingly at boot.

But probably thread is turning in another direction now, so lets wait
to see what others have in box.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems#
  2014-05-09  9:18                   ` [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems# Russell King - ARM Linux
@ 2014-05-09 18:00                     ` Nicolas Pitre
  2014-05-09 18:22                       ` Russell King - ARM Linux
  0 siblings, 1 reply; 31+ messages in thread
From: Nicolas Pitre @ 2014-05-09 18:00 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Doug Anderson, Viresh Kumar, Rafael J. Wysocki, Will Deacon,
	John Stultz, David Riley, olof, Sonny Rao, Richard Zhao,
	Santosh Shilimkar, Shawn Guo, Stephen Boyd, Marc Zyngier,
	Stephen Warren, Paul Gortmaker, linux-arm-kernel, linux-kernel

On Fri, 9 May 2014, Russell King - ARM Linux wrote:

> On Thu, May 08, 2014 at 09:37:15PM -0400, Nicolas Pitre wrote:
> > On Thu, 8 May 2014, Russell King - ARM Linux wrote:
> > 
> > > If you're in a preempt or SMP environment, provide a timer for udelay().
> > > IF you're in an environment with IRQs which can take a long time, use
> > > a timer for udelay().  If you're in an environment where the CPU clock
> > > can change unexpectedly, use a timer for udelay().
> > 
> > Longer delays are normally not a problem.  If they are, then simply 
> > disabling IRQs may solve it if absolutely required.  With much shorter 
> > delays than expected this is another story.
> > 
> > What about the following:
> > 
> > diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> > index 7c4fada440..10030cc5a0 100644
> > --- a/arch/arm/kernel/smp.c
> > +++ b/arch/arm/kernel/smp.c
> > @@ -682,6 +682,15 @@ static int cpufreq_callback(struct notifier_block *nb,
> >  			cpufreq_scale(per_cpu(l_p_j_ref, cpu),
> >  					per_cpu(l_p_j_ref_freq, cpu),
> >  					freq->new);
> > +		/*
> > +		 * Another CPU might have called udelay() just before LPJ
> > +		 * and a shared CPU clock is increased.  That other CPU still
> > +		 * looping on the old LPJ value would return significantly
> > +		 * sooner than expected.  The actual fix is to provide a
> > +		 * timer based udelay() implementation instead.
> > +		 */
> > +		if (freq->old < freq->new)
> > +			pr_warn_once("*** udelay() on SMP is racy and may be much shorter than expected ***\n");
> >  	}
> >  	return NOTIFY_OK;
> >  }
> 
> No, because you're assuming this is just a SMP problem.  What about
> preempt, where you could preempt away from a udelay loop to change
> the CPU frequency, and then back again, possibly resulting in the
> CPU clock rate increasing and maybe a shorter delay if the switch
> from-change-clock-and-back is fast enough?  Remember that udelay()
> can be used for up to 2ms delays.

Well... that would be somewhat less likely but still possible yes.

So the only way to "solve" this might look similar in spirit to what 
Doug alluded to earlier i.e. increase a sequence number on 
CPUFREQ_PRECHANGE and increase it again on CPUFREQ_POSTCHANGE, and have 
udelay() compare the count sampled before reading lpj and after 
returning from the loop code.  When the sequence count doesn't match 
then suffice to perform some arbitrarily large extra loops.


Nicolas

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems#
  2014-05-09 18:00                     ` Nicolas Pitre
@ 2014-05-09 18:22                       ` Russell King - ARM Linux
  2014-05-09 21:05                         ` Nicolas Pitre
  0 siblings, 1 reply; 31+ messages in thread
From: Russell King - ARM Linux @ 2014-05-09 18:22 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Doug Anderson, Viresh Kumar, Rafael J. Wysocki, Will Deacon,
	John Stultz, David Riley, olof, Sonny Rao, Richard Zhao,
	Santosh Shilimkar, Shawn Guo, Stephen Boyd, Marc Zyngier,
	Stephen Warren, Paul Gortmaker, linux-arm-kernel, linux-kernel

On Fri, May 09, 2014 at 02:00:54PM -0400, Nicolas Pitre wrote:
> On Fri, 9 May 2014, Russell King - ARM Linux wrote:
> 
> > On Thu, May 08, 2014 at 09:37:15PM -0400, Nicolas Pitre wrote:
> > > On Thu, 8 May 2014, Russell King - ARM Linux wrote:
> > > 
> > > > If you're in a preempt or SMP environment, provide a timer for udelay().
> > > > IF you're in an environment with IRQs which can take a long time, use
> > > > a timer for udelay().  If you're in an environment where the CPU clock
> > > > can change unexpectedly, use a timer for udelay().
> > > 
> > > Longer delays are normally not a problem.  If they are, then simply 
> > > disabling IRQs may solve it if absolutely required.  With much shorter 
> > > delays than expected this is another story.
> > > 
> > > What about the following:
> > > 
> > > diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> > > index 7c4fada440..10030cc5a0 100644
> > > --- a/arch/arm/kernel/smp.c
> > > +++ b/arch/arm/kernel/smp.c
> > > @@ -682,6 +682,15 @@ static int cpufreq_callback(struct notifier_block *nb,
> > >  			cpufreq_scale(per_cpu(l_p_j_ref, cpu),
> > >  					per_cpu(l_p_j_ref_freq, cpu),
> > >  					freq->new);
> > > +		/*
> > > +		 * Another CPU might have called udelay() just before LPJ
> > > +		 * and a shared CPU clock is increased.  That other CPU still
> > > +		 * looping on the old LPJ value would return significantly
> > > +		 * sooner than expected.  The actual fix is to provide a
> > > +		 * timer based udelay() implementation instead.
> > > +		 */
> > > +		if (freq->old < freq->new)
> > > +			pr_warn_once("*** udelay() on SMP is racy and may be much shorter than expected ***\n");
> > >  	}
> > >  	return NOTIFY_OK;
> > >  }
> > 
> > No, because you're assuming this is just a SMP problem.  What about
> > preempt, where you could preempt away from a udelay loop to change
> > the CPU frequency, and then back again, possibly resulting in the
> > CPU clock rate increasing and maybe a shorter delay if the switch
> > from-change-clock-and-back is fast enough?  Remember that udelay()
> > can be used for up to 2ms delays.
> 
> Well... that would be somewhat less likely but still possible yes.
> 
> So the only way to "solve" this might look similar in spirit to what 
> Doug alluded to earlier i.e. increase a sequence number on 
> CPUFREQ_PRECHANGE and increase it again on CPUFREQ_POSTCHANGE, and have 
> udelay() compare the count sampled before reading lpj and after 
> returning from the loop code.  When the sequence count doesn't match 
> then suffice to perform some arbitrarily large extra loops.

I'd much prefer just printing a warning at kernel boot time to report
that the kernel is running with features which would make udelay() less
than accurate.

Remember, it should be usable for _short_ delays on slow machines as
well as other stuff, and if we're going to start throwing stuff like
the above at it, it's going to become very inefficient.

And... I go back to what I've been saying all along: use a timer in
this situation, don't rely on the loops-based udelay if you have
preempt, USB interrupts, SMP etc.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems#
  2014-05-09 18:22                       ` Russell King - ARM Linux
@ 2014-05-09 21:05                         ` Nicolas Pitre
  2014-05-12 23:51                           ` Doug Anderson
  0 siblings, 1 reply; 31+ messages in thread
From: Nicolas Pitre @ 2014-05-09 21:05 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Doug Anderson, Viresh Kumar, Rafael J. Wysocki, Will Deacon,
	John Stultz, David Riley, olof, Sonny Rao, Richard Zhao,
	Santosh Shilimkar, Shawn Guo, Stephen Boyd, Marc Zyngier,
	Stephen Warren, Paul Gortmaker, linux-arm-kernel, linux-kernel

On Fri, 9 May 2014, Russell King - ARM Linux wrote:

> I'd much prefer just printing a warning at kernel boot time to report
> that the kernel is running with features which would make udelay() less
> than accurate.

What if there is simply no timer to rely upon, as in those cases where 
interrupts are needed for time keeping to make progress?  We should do 
better than simply saying "sorry your kernel should irradicate every 
udelay() usage to be reliable".

And I mean "reliable" which is not exactly the same as "accurate".  
Reliable means "never *significantly* shorter".

> Remember, it should be usable for _short_ delays on slow machines as
> well as other stuff, and if we're going to start throwing stuff like
> the above at it, it's going to become very inefficient.

You said that udelay can be much longer than expected due to various 
reasons.

You also said that the IRQ handler overhead during udelay calibration 
makes actual delays slightli shorter than expected.

I'm suggesting the addition of a slight overhead that is much smaller 
than the IRQ handler here.  That shouldn't impact things masurably.  
I'd certainly like Doug to run his udelay timing test with the following 
patch to see if it solves the problem.

diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
index dff714d886..74fb571a55 100644
--- a/arch/arm/include/asm/delay.h
+++ b/arch/arm/include/asm/delay.h
@@ -57,11 +57,6 @@ extern void __bad_udelay(void);
 			__const_udelay((n) * UDELAY_MULT)) :		\
 	  __udelay(n))
 
-/* Loop-based definitions for assembly code. */
-extern void __loop_delay(unsigned long loops);
-extern void __loop_udelay(unsigned long usecs);
-extern void __loop_const_udelay(unsigned long);
-
 /* Delay-loop timer registration. */
 #define ARCH_HAS_READ_CURRENT_TIMER
 extern void register_current_timer_delay(const struct delay_timer *timer);
diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
index 5306de3501..9150d31c2d 100644
--- a/arch/arm/lib/delay.c
+++ b/arch/arm/lib/delay.c
@@ -25,6 +25,11 @@
 #include <linux/module.h>
 #include <linux/timex.h>
 
+/* Loop-based definitions for assembly code. */
+extern void __loop_delay(unsigned long loops);
+extern void __loop_udelay(unsigned long usecs);
+extern void __loop_const_udelay(unsigned long);
+
 /*
  * Default to the loop-based delay implementation.
  */
@@ -34,6 +39,85 @@ struct arm_delay_ops arm_delay_ops = {
 	.udelay		= __loop_udelay,
 };
 
+#if defined(CONFIG_CPU_FREQ) && (defined(CONFIG_SMP) || defined(CONFIG_PREEMPT))
+
+#include <linux/cpufreq.h>
+
+/*
+ * Another CPU/thread might increase the CPU clock in the middle of
+ * the loop based delay routine and the newly scaled LPJ value won't be
+ * accounted for, resulting in a possibly significantly shorter delay than
+ * expected.  Let's make sure this occurrence is trapped and compensated.
+ */
+
+static int __loop_seq;
+static unsigned int __loop_security_factor;
+
+#define __safe_loop_(type) \
+static void __safe_loop_##type(unsigned long val) \
+{ \
+	int seq_count = __loop_seq; \
+	__loop_##type(val); \
+	if (seq_count != __loop_seq) \
+		__loop_##type(val * __loop_security_factor); \
+}
+
+__safe_loop_(delay)
+__safe_loop_(const_udelay)
+__safe_loop_(udelay)
+
+static int cpufreq_callback(struct notifier_block *nb,
+			    unsigned long val, void *data)
+{
+	struct cpufreq_freqs *freq = data;
+	unsigned int f;
+
+	if ((freq->flags & CPUFREQ_CONST_LOOPS) ||
+	    freq->old >= freq->new)
+		return NOTIFY_OK;
+
+	switch (val) {
+	case CPUFREQ_PRECHANGE:
+		/* Remember the largest security factor ever needed */
+		f = DIV_ROUND_UP(freq->new, freq->old) - 1;
+		if (__loop_security_factor < f)
+			__loop_security_factor = f;
+		/* fallthrough */
+	case CPUFREQ_POSTCHANGE:
+		__loop_seq++;
+	}
+	return NOTIFY_OK;
+}
+
+static struct notifier_block cpufreq_notifier = {
+	.notifier_call  = cpufreq_callback,
+};
+
+static int __init init_safe_loop_delays(void)
+{
+	int err;
+
+	/*
+	 * Bail out if the default loop based implementation has
+	 * already been replaced by something better.
+	 */
+	if (arm_delay_ops.udelay != __loop_udelay)
+		return 0;
+
+	__loop_security_factor = 1;
+	err = cpufreq_register_notifier(&cpufreq_notifier,
+					CPUFREQ_TRANSITION_NOTIFIER);
+	if (!err) {
+		arm_delay_ops.delay		= __safe_loop_delay;
+		arm_delay_ops.const_udelay	= __safe_loop_const_udelay;
+		arm_delay_ops.udelay		= __safe_loop_udelay;
+	}
+	return err;
+}
+core_initcall(init_safe_loop_delays);
+
+#endif
+
 static const struct delay_timer *delay_timer;
 static bool delay_calibrated;
 

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems#
  2014-05-09 21:05                         ` Nicolas Pitre
@ 2014-05-12 23:51                           ` Doug Anderson
  2014-05-13 21:50                             ` Doug Anderson
  0 siblings, 1 reply; 31+ messages in thread
From: Doug Anderson @ 2014-05-12 23:51 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Russell King - ARM Linux, Viresh Kumar, Rafael J. Wysocki,
	Will Deacon, John Stultz, David Riley, olof, Sonny Rao,
	Richard Zhao, Santosh Shilimkar, Shawn Guo, Stephen Boyd,
	Marc Zyngier, Stephen Warren, Paul Gortmaker, linux-arm-kernel,
	linux-kernel

Hi,

On Fri, May 9, 2014 at 2:05 PM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> On Fri, 9 May 2014, Russell King - ARM Linux wrote:
>
>> I'd much prefer just printing a warning at kernel boot time to report
>> that the kernel is running with features which would make udelay() less
>> than accurate.
>
> What if there is simply no timer to rely upon, as in those cases where
> interrupts are needed for time keeping to make progress?  We should do
> better than simply saying "sorry your kernel should irradicate every
> udelay() usage to be reliable".
>
> And I mean "reliable" which is not exactly the same as "accurate".
> Reliable means "never *significantly* shorter".
>
>> Remember, it should be usable for _short_ delays on slow machines as
>> well as other stuff, and if we're going to start throwing stuff like
>> the above at it, it's going to become very inefficient.
>
> You said that udelay can be much longer than expected due to various
> reasons.
>
> You also said that the IRQ handler overhead during udelay calibration
> makes actual delays slightli shorter than expected.
>
> I'm suggesting the addition of a slight overhead that is much smaller
> than the IRQ handler here.  That shouldn't impact things masurably.
> I'd certainly like Doug to run his udelay timing test with the following
> patch to see if it solves the problem.

...so I spent a whole chunk of time debugging this problem today.  I'm
out of time today (more tomorrow), but it looks like the theory I
proposed about why udelay() is giving bad results _might_ have more to
do with bugs in the exynos cpufreq driver and less to do with the
theoretical race we've been talking about.  It looks possible that the
driver is not setting the "old" frequency properly, which would
certainly cause problems.

I'll post more when I figure this out for sure.

-Doug

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems#
  2014-05-12 23:51                           ` Doug Anderson
@ 2014-05-13 21:50                             ` Doug Anderson
  2014-05-13 22:15                               ` Stephen Warren
  2014-05-15  6:12                               ` Viresh Kumar
  0 siblings, 2 replies; 31+ messages in thread
From: Doug Anderson @ 2014-05-13 21:50 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Russell King - ARM Linux, Viresh Kumar, Rafael J. Wysocki,
	Will Deacon, John Stultz, David Riley, olof, Sonny Rao,
	Santosh Shilimkar, Shawn Guo, Stephen Boyd, Marc Zyngier,
	Stephen Warren, Paul Gortmaker, linux-arm-kernel, linux-kernel

Hi,

On Mon, May 12, 2014 at 4:51 PM, Doug Anderson <dianders@chromium.org> wrote:
> Hi,
>
> On Fri, May 9, 2014 at 2:05 PM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
>> On Fri, 9 May 2014, Russell King - ARM Linux wrote:
>>
>>> I'd much prefer just printing a warning at kernel boot time to report
>>> that the kernel is running with features which would make udelay() less
>>> than accurate.
>>
>> What if there is simply no timer to rely upon, as in those cases where
>> interrupts are needed for time keeping to make progress?  We should do
>> better than simply saying "sorry your kernel should irradicate every
>> udelay() usage to be reliable".
>>
>> And I mean "reliable" which is not exactly the same as "accurate".
>> Reliable means "never *significantly* shorter".
>>
>>> Remember, it should be usable for _short_ delays on slow machines as
>>> well as other stuff, and if we're going to start throwing stuff like
>>> the above at it, it's going to become very inefficient.
>>
>> You said that udelay can be much longer than expected due to various
>> reasons.
>>
>> You also said that the IRQ handler overhead during udelay calibration
>> makes actual delays slightli shorter than expected.
>>
>> I'm suggesting the addition of a slight overhead that is much smaller
>> than the IRQ handler here.  That shouldn't impact things masurably.
>> I'd certainly like Doug to run his udelay timing test with the following
>> patch to see if it solves the problem.
>
> ...so I spent a whole chunk of time debugging this problem today.  I'm
> out of time today (more tomorrow), but it looks like the theory I
> proposed about why udelay() is giving bad results _might_ have more to
> do with bugs in the exynos cpufreq driver and less to do with the
> theoretical race we've been talking about.  It looks possible that the
> driver is not setting the "old" frequency properly, which would
> certainly cause problems.

Argh.  It turns out that I spent a whole lot of time tracking down the
fact that cpufreq_out_of_sync() running.  As part of debugging this
problem I added a cpufreq_get(0).  That would periodically notice that
the driver's reported frequency didn't match "policy->cur" and call
cpufreq_out_of_sync().  cpufreq_out_of_sync() would "thoughtfully"
send out its own CPUFREQ_PRECHANGE / CPUFREQ_POSTCHANGE but without
any sort of mutexes (at least in our tree).  Ugh.

Overall cpufreq_out_of_sync() seems incredibly racy since there will
inevitably be some period of time where the cpufreq driver has changed
the real CPU frequency but hasn't yet sent out the
cpufreq_notify_transition().  ...and there is no locking between the
two that I see.  ...but that's getting pretty far afield from my
original bug and it's been that way forever, so I guess I'll ignore
it.

--

...but then I found the true problem shows up when we transition
between very low frequencies on exynos, like between 200MHz and
300MHz.  While transitioning between frequencies the system
temporarily bumps over to the "switcher" PLL running at 800MHz while
waiting for the main PLL to stabilize.  No CPUFREQ notification is
sent for that.  That means there's a period of time when we're running
at 800MHz but loops_per_jiffy is calibrated at between 200MHz and
300MHz.


I'm welcome to any suggestions for how to address this.  It sorta
feels like it would be a common thing to have a temporary PLL during
the transition, so my inclination would be to add a "temp" field to
"struct cpufreq_freqs".  Anyone who cared about the fact that cpufreq
might transition through a different frequency on its way from old to
new could look at this field.


What do people think?


-Doug

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems#
  2014-05-13 21:50                             ` Doug Anderson
@ 2014-05-13 22:15                               ` Stephen Warren
  2014-05-13 23:15                                 ` Nicolas Pitre
  2014-05-15  6:12                               ` Viresh Kumar
  1 sibling, 1 reply; 31+ messages in thread
From: Stephen Warren @ 2014-05-13 22:15 UTC (permalink / raw)
  To: Doug Anderson, Nicolas Pitre
  Cc: Russell King - ARM Linux, Viresh Kumar, Rafael J. Wysocki,
	Will Deacon, John Stultz, David Riley, olof, Sonny Rao,
	Santosh Shilimkar, Shawn Guo, Stephen Boyd, Marc Zyngier,
	Stephen Warren, Paul Gortmaker, linux-arm-kernel, linux-kernel

On 05/13/2014 03:50 PM, Doug Anderson wrote:
...
> ...but then I found the true problem shows up when we transition
> between very low frequencies on exynos, like between 200MHz and
> 300MHz.  While transitioning between frequencies the system
> temporarily bumps over to the "switcher" PLL running at 800MHz while
> waiting for the main PLL to stabilize.  No CPUFREQ notification is
> sent for that.  That means there's a period of time when we're running
> at 800MHz but loops_per_jiffy is calibrated at between 200MHz and
> 300MHz.
> 
> 
> I'm welcome to any suggestions for how to address this.  It sorta
> feels like it would be a common thing to have a temporary PLL during
> the transition, ...

We definitely do that on Tegra for some cpufreq transitions.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems#
  2014-05-13 22:15                               ` Stephen Warren
@ 2014-05-13 23:15                                 ` Nicolas Pitre
  2014-05-13 23:29                                   ` Nicolas Pitre
  0 siblings, 1 reply; 31+ messages in thread
From: Nicolas Pitre @ 2014-05-13 23:15 UTC (permalink / raw)
  To: Stephen Warren
  Cc: Doug Anderson, Russell King - ARM Linux, Viresh Kumar,
	Rafael J. Wysocki, Will Deacon, John Stultz, David Riley, olof,
	Sonny Rao, Santosh Shilimkar, Shawn Guo, Stephen Boyd,
	Marc Zyngier, Stephen Warren, Paul Gortmaker, linux-arm-kernel,
	linux-kernel

On Tue, 13 May 2014, Stephen Warren wrote:

> On 05/13/2014 03:50 PM, Doug Anderson wrote:
> ...
> > ...but then I found the true problem shows up when we transition
> > between very low frequencies on exynos, like between 200MHz and
> > 300MHz.  While transitioning between frequencies the system
> > temporarily bumps over to the "switcher" PLL running at 800MHz while
> > waiting for the main PLL to stabilize.  No CPUFREQ notification is
> > sent for that.  That means there's a period of time when we're running
> > at 800MHz but loops_per_jiffy is calibrated at between 200MHz and
> > 300MHz.
> > 
> > 
> > I'm welcome to any suggestions for how to address this.  It sorta
> > feels like it would be a common thing to have a temporary PLL during
> > the transition, ...
> 
> We definitely do that on Tegra for some cpufreq transitions.

Ouch...  If this is a common strategy to use a third frequency during a 
transition phase, especially if that frequency is way off (800MHz vs 
200-300MHz) then it is something the cpufreq layer must capture and 
advertise.


Nicolas

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems#
  2014-05-13 23:15                                 ` Nicolas Pitre
@ 2014-05-13 23:29                                   ` Nicolas Pitre
  2014-05-13 23:36                                     ` Russell King - ARM Linux
  2014-05-14 21:42                                     ` Doug Anderson
  0 siblings, 2 replies; 31+ messages in thread
From: Nicolas Pitre @ 2014-05-13 23:29 UTC (permalink / raw)
  To: Stephen Warren
  Cc: Doug Anderson, Russell King - ARM Linux, Viresh Kumar,
	Rafael J. Wysocki, Will Deacon, John Stultz, David Riley, olof,
	Sonny Rao, Santosh Shilimkar, Shawn Guo, Stephen Boyd,
	Marc Zyngier, Stephen Warren, Paul Gortmaker, linux-arm-kernel,
	linux-kernel

On Tue, 13 May 2014, Nicolas Pitre wrote:

> On Tue, 13 May 2014, Stephen Warren wrote:
> 
> > On 05/13/2014 03:50 PM, Doug Anderson wrote:
> > ...
> > > ...but then I found the true problem shows up when we transition
> > > between very low frequencies on exynos, like between 200MHz and
> > > 300MHz.  While transitioning between frequencies the system
> > > temporarily bumps over to the "switcher" PLL running at 800MHz while
> > > waiting for the main PLL to stabilize.  No CPUFREQ notification is
> > > sent for that.  That means there's a period of time when we're running
> > > at 800MHz but loops_per_jiffy is calibrated at between 200MHz and
> > > 300MHz.
> > > 
> > > 
> > > I'm welcome to any suggestions for how to address this.  It sorta
> > > feels like it would be a common thing to have a temporary PLL during
> > > the transition, ...
> > 
> > We definitely do that on Tegra for some cpufreq transitions.
> 
> Ouch...  If this is a common strategy to use a third frequency during a 
> transition phase, especially if that frequency is way off (800MHz vs 
> 200-300MHz) then it is something the cpufreq layer must capture and 
> advertise.

Of course if only the loops_per_jiffy scaling does care about frequency 
changes these days, and if in those cases udelay() can instead be moved 
to a timer source on those hick-up prone platforms, then all this is 
fairly theoretical and may not be worth pursuing.


Nicolas

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems#
  2014-05-13 23:29                                   ` Nicolas Pitre
@ 2014-05-13 23:36                                     ` Russell King - ARM Linux
  2014-05-14 21:42                                     ` Doug Anderson
  1 sibling, 0 replies; 31+ messages in thread
From: Russell King - ARM Linux @ 2014-05-13 23:36 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Stephen Warren, Doug Anderson, Viresh Kumar, Rafael J. Wysocki,
	Will Deacon, John Stultz, David Riley, olof, Sonny Rao,
	Santosh Shilimkar, Shawn Guo, Stephen Boyd, Marc Zyngier,
	Stephen Warren, Paul Gortmaker, linux-arm-kernel, linux-kernel

On Tue, May 13, 2014 at 07:29:52PM -0400, Nicolas Pitre wrote:
> On Tue, 13 May 2014, Nicolas Pitre wrote:
> 
> > On Tue, 13 May 2014, Stephen Warren wrote:
> > 
> > > On 05/13/2014 03:50 PM, Doug Anderson wrote:
> > > ...
> > > > ...but then I found the true problem shows up when we transition
> > > > between very low frequencies on exynos, like between 200MHz and
> > > > 300MHz.  While transitioning between frequencies the system
> > > > temporarily bumps over to the "switcher" PLL running at 800MHz while
> > > > waiting for the main PLL to stabilize.  No CPUFREQ notification is
> > > > sent for that.  That means there's a period of time when we're running
> > > > at 800MHz but loops_per_jiffy is calibrated at between 200MHz and
> > > > 300MHz.
> > > > 
> > > > 
> > > > I'm welcome to any suggestions for how to address this.  It sorta
> > > > feels like it would be a common thing to have a temporary PLL during
> > > > the transition, ...
> > > 
> > > We definitely do that on Tegra for some cpufreq transitions.
> > 
> > Ouch...  If this is a common strategy to use a third frequency during a 
> > transition phase, especially if that frequency is way off (800MHz vs 
> > 200-300MHz) then it is something the cpufreq layer must capture and 
> > advertise.
> 
> Of course if only the loops_per_jiffy scaling does care about frequency 
> changes these days, and if in those cases udelay() can instead be moved 
> to a timer source on those hick-up prone platforms, then all this is 
> fairly theoretical and may not be worth pursuing.

As I've been saying... use a bloody timer. :)

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems#
  2014-05-13 23:29                                   ` Nicolas Pitre
  2014-05-13 23:36                                     ` Russell King - ARM Linux
@ 2014-05-14 21:42                                     ` Doug Anderson
  1 sibling, 0 replies; 31+ messages in thread
From: Doug Anderson @ 2014-05-14 21:42 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Stephen Warren, Russell King - ARM Linux, Viresh Kumar,
	Rafael J. Wysocki, Will Deacon, John Stultz, David Riley, olof,
	Sonny Rao, Santosh Shilimkar, Shawn Guo, Stephen Boyd,
	Marc Zyngier, Stephen Warren, Paul Gortmaker, linux-arm-kernel,
	linux-kernel

Hi,

On Tue, May 13, 2014 at 4:29 PM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> On Tue, 13 May 2014, Nicolas Pitre wrote:
>
>> On Tue, 13 May 2014, Stephen Warren wrote:
>>
>> > On 05/13/2014 03:50 PM, Doug Anderson wrote:
>> > ...
>> > > ...but then I found the true problem shows up when we transition
>> > > between very low frequencies on exynos, like between 200MHz and
>> > > 300MHz.  While transitioning between frequencies the system
>> > > temporarily bumps over to the "switcher" PLL running at 800MHz while
>> > > waiting for the main PLL to stabilize.  No CPUFREQ notification is
>> > > sent for that.  That means there's a period of time when we're running
>> > > at 800MHz but loops_per_jiffy is calibrated at between 200MHz and
>> > > 300MHz.
>> > >
>> > >
>> > > I'm welcome to any suggestions for how to address this.  It sorta
>> > > feels like it would be a common thing to have a temporary PLL during
>> > > the transition, ...
>> >
>> > We definitely do that on Tegra for some cpufreq transitions.
>>
>> Ouch...  If this is a common strategy to use a third frequency during a
>> transition phase, especially if that frequency is way off (800MHz vs
>> 200-300MHz) then it is something the cpufreq layer must capture and
>> advertise.
>
> Of course if only the loops_per_jiffy scaling does care about frequency
> changes these days, and if in those cases udelay() can instead be moved
> to a timer source on those hick-up prone platforms, then all this is
> fairly theoretical and may not be worth pursuing.

Yup, I think I'm going to abandon this.  If anyone ever runs into this
problem later and can't use a bloody timer, hopefully they'll stumble
on this thread and at least end up with a good starting point.

For the record, the fix I made locally (which actually worked) was:

1. Add a new "temp" field to cpufreq_freqs:

--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -133,6 +133,7 @@ struct cpufreq_freqs {
        unsigned int cpu;       /* cpu nr */
        unsigned int old;
        unsigned int new;
+       unsigned int temp;
        u8 flags;               /* flags of cpufreq_driver, see below. */
 };

...depending on whether all users of this structure zero-initialize it
(I didn't check) you might also need to add a flag to say whether the
temp field is actually valid.


2. Set the "temp" to the switching PLL frequency in the cpufreq driver.


3. Change the ARM cpufreq_callback() to take the temp freq into account:

+       if (val == CPUFREQ_PRECHANGE) {
+               old_freq = freq->old;
+               new_freq = max(freq->new, freq->temp);
+       } else if (val == CPUFREQ_POSTCHANGE) {
+               old_freq = max(freq->old, freq->temp);
+               new_freq = freq->new;
+       }

...then use old_freq and new_freq in the place of freq->old and freq->new



-Doug

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems#
  2014-05-13 21:50                             ` Doug Anderson
  2014-05-13 22:15                               ` Stephen Warren
@ 2014-05-15  6:12                               ` Viresh Kumar
  1 sibling, 0 replies; 31+ messages in thread
From: Viresh Kumar @ 2014-05-15  6:12 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Nicolas Pitre, Russell King - ARM Linux, Rafael J. Wysocki,
	Will Deacon, John Stultz, David Riley, olof, Sonny Rao,
	Santosh Shilimkar, Shawn Guo, Stephen Boyd, Marc Zyngier,
	Stephen Warren, Paul Gortmaker, linux-arm-kernel, linux-kernel

On 14 May 2014 03:20, Doug Anderson <dianders@chromium.org> wrote:
> ...but then I found the true problem shows up when we transition
> between very low frequencies on exynos, like between 200MHz and
> 300MHz.  While transitioning between frequencies the system
> temporarily bumps over to the "switcher" PLL running at 800MHz while
> waiting for the main PLL to stabilize.  No CPUFREQ notification is
> sent for that.  That means there's a period of time when we're running
> at 800MHz but loops_per_jiffy is calibrated at between 200MHz and
> 300MHz.
>
>
> I'm welcome to any suggestions for how to address this.

I have attempted to fix this in a generic way and sent an RFC patch for
this. I have cc'd only few people from this list which I thought would be
interested in cpufreq stuff, sorry if I missed anyone.

https://lkml.org/lkml/2014/5/15/40

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2014-05-15  6:12 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-07 23:23 [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems Doug Anderson
2014-05-08 10:41 ` Viresh Kumar
2014-05-08 15:25   ` Doug Anderson
2014-05-08 16:04     ` Nicolas Pitre
2014-05-08 16:41       ` Doug Anderson
2014-05-08 17:43         ` Nicolas Pitre
2014-05-08 18:06           ` Doug Anderson
2014-05-08 19:59             ` Nicolas Pitre
2014-05-08 20:55             ` Russell King - ARM Linux
2014-05-09  0:02               ` Doug Anderson
2014-05-09  0:23                 ` Russell King - ARM Linux
2014-05-09  4:41                   ` Doug Anderson
2014-05-08 19:22           ` Russell King - ARM Linux
2014-05-08 20:12             ` Nicolas Pitre
2014-05-08 20:39               ` John Stultz
2014-05-08 20:52               ` Russell King - ARM Linux
2014-05-09  1:37                 ` Nicolas Pitre
2014-05-09  4:43                   ` Doug Anderson
2014-05-09  9:18                   ` [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems# Russell King - ARM Linux
2014-05-09 18:00                     ` Nicolas Pitre
2014-05-09 18:22                       ` Russell King - ARM Linux
2014-05-09 21:05                         ` Nicolas Pitre
2014-05-12 23:51                           ` Doug Anderson
2014-05-13 21:50                             ` Doug Anderson
2014-05-13 22:15                               ` Stephen Warren
2014-05-13 23:15                                 ` Nicolas Pitre
2014-05-13 23:29                                   ` Nicolas Pitre
2014-05-13 23:36                                     ` Russell King - ARM Linux
2014-05-14 21:42                                     ` Doug Anderson
2014-05-15  6:12                               ` Viresh Kumar
2014-05-09  9:25     ` [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems Viresh Kumar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).