linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHv5 0/3] Constant udelay() for SMP and non-SMP systems
@ 2011-04-05 23:56 Stephen Boyd
  2011-04-05 23:56 ` [PATCHv5 1/3] ARM: Translate delay.S into (mostly) C Stephen Boyd
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Stephen Boyd @ 2011-04-05 23:56 UTC (permalink / raw)
  To: Russell King
  Cc: linux-kernel, linux-arm-msm, linux-arm-kernel, Saravana Kannan,
	Nicolas Pitre, Andrew Morton, Mattias Wallin, Linus Walleij

Note: I will submit this to the patch tracker in 48 hours.
Tested by are appreciated.

These patches fix the udelay() issue pointed out on
arm-lkml[1][2]. A quick recap: some SMP machines can scale
their CPU frequencies independent of one another. loops_per_jiffy
is calibrated globally and used in __const_udelay(). If one CPU
is running faster than what the loops_per_jiffy is calculated
(or scaled) for, udelay() will be incorrect and not wait long
enough (or too long). A similar problem occurs if the cpu
frequency is scaled during a udelay() call.

We could fix this issue a couple ways, wholesale replacement
of __udelay() and __const_udelay() (see [2] for that approach),
or replacement of __delay() (this series). Option 1 can fail if
anybody uses udelay() before memory is mapped and also duplicates
most of the code in asm/delay.h. It also needs to hardcode the
timer tick frequency, which can sometimes be inaccurate. The
benefit is that loops_per_jiffy stays the same and thus BogoMIPS
is unchanged.  Option 2 cannot fail since the __delay() loop is
repointed after memory is mapped in, but it suffers from a low
BogoMIPS when timers are clocked slowly. It also more accurately
calculates the timer tick frequency through the use of
calibrate_delay_direct().

-- Reference --
[1] http://article.gmane.org/gmane.linux.kernel/977567
[2] http://article.gmane.org/gmane.linux.ports.arm.kernel/78496 

Changes since v4:
 * Rebased against changes to udelay.S

Changes since v3:
 * Inlined set_delay_fn()

Changes since v2:
 * Additional patch using the timer based delay

Changes since v1:
 * likely() in delay.c
 * comment fixup for read_current_timer_delay_loop()
 * cosmetic improvements to commit text

Stephen Boyd (3):
  ARM: Translate delay.S into (mostly) C
  ARM: Allow machines to override __delay()
  ARM: Implement a timer based __delay() loop

 arch/arm/include/asm/delay.h |   11 +++++-
 arch/arm/kernel/armksyms.c   |    4 --
 arch/arm/lib/delay.S         |   69 ---------------------------------
 arch/arm/lib/delay.c         |   87 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 97 insertions(+), 74 deletions(-)
 delete mode 100644 arch/arm/lib/delay.S
 create mode 100644 arch/arm/lib/delay.c

-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCHv5 1/3] ARM: Translate delay.S into (mostly) C
  2011-04-05 23:56 [PATCHv5 0/3] Constant udelay() for SMP and non-SMP systems Stephen Boyd
@ 2011-04-05 23:56 ` Stephen Boyd
  2011-04-06  8:49   ` Mattias Wallin
  2011-04-07  7:29   ` Mattias Wallin
  2011-04-05 23:56 ` [PATCHv5 2/3] ARM: Allow machines to override __delay() Stephen Boyd
  2011-04-05 23:56 ` [PATCHv5 3/3] ARM: Implement a timer based __delay() loop Stephen Boyd
  2 siblings, 2 replies; 11+ messages in thread
From: Stephen Boyd @ 2011-04-05 23:56 UTC (permalink / raw)
  To: Russell King
  Cc: linux-kernel, linux-arm-msm, linux-arm-kernel, Saravana Kannan,
	Andrew Morton, Mattias Wallin, Linus Walleij

We want to allow machines to override the __delay() implementation
at runtime so they can use a timer based __delay() routine. It's
easier to do this using C, so let's write udelay and friends in C.

We lose the #if 0 code, which according to Russell is used "to
make the delay loop more stable and predictable on older CPUs"
(see http://article.gmane.org/gmane.linux.kernel/888867 for more
info). We shouldn't be too worried though, since we'll soon add
functionality allowing a machine to set the __delay() loop
themselves, thus allowing machines to resurrect the commented out
code should they need it.

Nicolas expressed concern that fixed lpj cmdlines will break due to
compiler optimizations. That doesn't seem to be the case since
before and after this patch I get the same lpj value when running
my CPU at 19.2 MHz. That should be sufficiently slow enough to
cover any machine running Linux.

Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>,
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
---
 arch/arm/include/asm/delay.h |    2 +-
 arch/arm/kernel/armksyms.c   |    4 --
 arch/arm/lib/delay.S         |   69 ------------------------------------------
 arch/arm/lib/delay.c         |   62 +++++++++++++++++++++++++++++++++++++
 4 files changed, 63 insertions(+), 74 deletions(-)
 delete mode 100644 arch/arm/lib/delay.S
 create mode 100644 arch/arm/lib/delay.c

diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
index b2deda1..ccc5ed5 100644
--- a/arch/arm/include/asm/delay.h
+++ b/arch/arm/include/asm/delay.h
@@ -8,7 +8,7 @@
 
 #include <asm/param.h>	/* HZ */
 
-extern void __delay(int loops);
+extern void __delay(unsigned long loops);
 
 /*
  * This function intentionally does not exist; if you see references to
diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
index e5e1e538..220dce6 100644
--- a/arch/arm/kernel/armksyms.c
+++ b/arch/arm/kernel/armksyms.c
@@ -52,10 +52,6 @@ extern void fpundefinstr(void);
 
 EXPORT_SYMBOL(__backtrace);
 
-	/* platform dependent support */
-EXPORT_SYMBOL(__udelay);
-EXPORT_SYMBOL(__const_udelay);
-
 	/* networking */
 EXPORT_SYMBOL(csum_partial);
 EXPORT_SYMBOL(csum_partial_copy_from_user);
diff --git a/arch/arm/lib/delay.S b/arch/arm/lib/delay.S
deleted file mode 100644
index 3c9a05c..0000000
--- a/arch/arm/lib/delay.S
+++ /dev/null
@@ -1,69 +0,0 @@
-/*
- *  linux/arch/arm/lib/delay.S
- *
- *  Copyright (C) 1995, 1996 Russell King
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- */
-#include <linux/linkage.h>
-#include <asm/assembler.h>
-#include <asm/param.h>
-		.text
-
-.LC0:		.word	loops_per_jiffy
-.LC1:		.word	(2199023*HZ)>>11
-
-/*
- * r0  <= 2000
- * lpj <= 0x01ffffff (max. 3355 bogomips)
- * HZ  <= 1000
- */
-
-ENTRY(__udelay)
-		ldr	r2, .LC1
-		mul	r0, r2, r0
-ENTRY(__const_udelay)				@ 0 <= r0 <= 0x7fffff06
-		mov	r1, #-1
-		ldr	r2, .LC0
-		ldr	r2, [r2]		@ max = 0x01ffffff
-		add	r0, r0, r1, lsr #32-14
-		mov	r0, r0, lsr #14		@ max = 0x0001ffff
-		add	r2, r2, r1, lsr #32-10
-		mov	r2, r2, lsr #10		@ max = 0x00007fff
-		mul	r0, r2, r0		@ max = 2^32-1
-		add	r0, r0, r1, lsr #32-6
-		movs	r0, r0, lsr #6
-		moveq	pc, lr
-
-/*
- * loops = r0 * HZ * loops_per_jiffy / 1000000
- *
- * Oh, if only we had a cycle counter...
- */
-
-@ Delay routine
-ENTRY(__delay)
-		subs	r0, r0, #1
-#if 0
-		movls	pc, lr
-		subs	r0, r0, #1
-		movls	pc, lr
-		subs	r0, r0, #1
-		movls	pc, lr
-		subs	r0, r0, #1
-		movls	pc, lr
-		subs	r0, r0, #1
-		movls	pc, lr
-		subs	r0, r0, #1
-		movls	pc, lr
-		subs	r0, r0, #1
-		movls	pc, lr
-		subs	r0, r0, #1
-#endif
-		bhi	__delay
-		mov	pc, lr
-ENDPROC(__udelay)
-ENDPROC(__const_udelay)
-ENDPROC(__delay)
diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
new file mode 100644
index 0000000..daca2be
--- /dev/null
+++ b/arch/arm/lib/delay.c
@@ -0,0 +1,62 @@
+/*
+ *  Originally from linux/arch/arm/lib/delay.S
+ *
+ *  Copyright (C) 1995, 1996 Russell King
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <linux/module.h>
+#include <linux/delay.h>
+
+/*
+ * loops = usecs * HZ * loops_per_jiffy / 1000000
+ *
+ * Oh, if only we had a cycle counter...
+ */
+void __delay(unsigned long loops)
+{
+	asm volatile(
+	"1:	subs %0, %0, #1 \n"
+	"	bhi 1b		\n"
+	: /* No output */
+	: "r" (loops)
+	);
+}
+EXPORT_SYMBOL(__delay);
+
+/*
+ * 0 <= xloops <= 0x7fffff06
+ * loops_per_jiffy <= 0x01ffffff (max. 3355 bogomips)
+ */
+void __const_udelay(unsigned long xloops)
+{
+	unsigned long mask = ULONG_MAX;
+	unsigned long lpj = loops_per_jiffy;
+	unsigned long loops;
+
+	xloops += mask >> (32 - 14);
+	xloops >>= 14;			/* max = 0x01ffffff */
+
+	lpj += mask >> (32 - 10);
+	lpj >>= 10;			/* max = 0x0001ffff */
+
+	loops = lpj * xloops;		/* max = 0x00007fff */
+	loops += mask >> (32 - 6);
+	loops >>= 6;			/* max = 2^32-1 */
+
+	if (likely(loops))
+		__delay(loops);
+}
+EXPORT_SYMBOL(__const_udelay);
+
+/*
+ * usecs  <= 2000
+ * HZ  <= 1000
+ */
+void __udelay(unsigned long usecs)
+{
+	__const_udelay(usecs * ((2199023*HZ)>>11));
+}
+EXPORT_SYMBOL(__udelay);
-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCHv5 2/3] ARM: Allow machines to override __delay()
  2011-04-05 23:56 [PATCHv5 0/3] Constant udelay() for SMP and non-SMP systems Stephen Boyd
  2011-04-05 23:56 ` [PATCHv5 1/3] ARM: Translate delay.S into (mostly) C Stephen Boyd
@ 2011-04-05 23:56 ` Stephen Boyd
  2011-04-07  7:30   ` Mattias Wallin
  2011-04-05 23:56 ` [PATCHv5 3/3] ARM: Implement a timer based __delay() loop Stephen Boyd
  2 siblings, 1 reply; 11+ messages in thread
From: Stephen Boyd @ 2011-04-05 23:56 UTC (permalink / raw)
  To: Russell King
  Cc: linux-kernel, linux-arm-msm, linux-arm-kernel, Saravana Kannan,
	Nicolas Pitre, Andrew Morton, Mattias Wallin, Linus Walleij

Some machines want to implement their own __delay() routine based
on fixed rate timers. Expose functionality to set the __delay()
routine at runtime. This should allow two machines with different
__delay() routines to happily co-exist within the same kernel
with minimal overhead.

Russell expressed concern that using a timer based __delay()
would cause problems when an iomapped device isn't mapped in
prior to a delay call being made (see
http://article.gmane.org/gmane.linux.ports.arm.kernel/78543 for
more info). We can sidestep that issue with this approach since
the __delay() routine _should_ only be pointed to a timer based
delay once the timer has been properly mapped. Up until that
point __delay() and udelay() will use delay_loop() which is
always safe to call.

This patch is inspired by x86's delay.c

Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
---
 arch/arm/include/asm/delay.h |    7 +++++++
 arch/arm/lib/delay.c         |   14 +++++++++++---
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
index ccc5ed5..82ef82a 100644
--- a/arch/arm/include/asm/delay.h
+++ b/arch/arm/include/asm/delay.h
@@ -40,5 +40,12 @@ extern void __const_udelay(unsigned long);
 			__const_udelay((n) * ((2199023U*HZ)>>11))) :	\
 	  __udelay(n))
 
+extern void (*delay_fn)(unsigned long);
+
+static inline void set_delay_fn(void (*fn)(unsigned long))
+{
+	delay_fn = fn;
+}
+
 #endif /* defined(_ARM_DELAY_H) */
 
diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
index daca2be..42cda15 100644
--- a/arch/arm/lib/delay.c
+++ b/arch/arm/lib/delay.c
@@ -11,11 +11,9 @@
 #include <linux/delay.h>
 
 /*
- * loops = usecs * HZ * loops_per_jiffy / 1000000
- *
  * Oh, if only we had a cycle counter...
  */
-void __delay(unsigned long loops)
+static void delay_loop(unsigned long loops)
 {
 	asm volatile(
 	"1:	subs %0, %0, #1 \n"
@@ -24,6 +22,16 @@ void __delay(unsigned long loops)
 	: "r" (loops)
 	);
 }
+
+void (*delay_fn)(unsigned long) = delay_loop;
+
+/*
+ * loops = usecs * HZ * loops_per_jiffy / 1000000
+ */
+void __delay(unsigned long loops)
+{
+	delay_fn(loops);
+}
 EXPORT_SYMBOL(__delay);
 
 /*
-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCHv5 3/3] ARM: Implement a timer based __delay() loop
  2011-04-05 23:56 [PATCHv5 0/3] Constant udelay() for SMP and non-SMP systems Stephen Boyd
  2011-04-05 23:56 ` [PATCHv5 1/3] ARM: Translate delay.S into (mostly) C Stephen Boyd
  2011-04-05 23:56 ` [PATCHv5 2/3] ARM: Allow machines to override __delay() Stephen Boyd
@ 2011-04-05 23:56 ` Stephen Boyd
  2011-04-07  7:30   ` Mattias Wallin
  2 siblings, 1 reply; 11+ messages in thread
From: Stephen Boyd @ 2011-04-05 23:56 UTC (permalink / raw)
  To: Russell King
  Cc: linux-kernel, linux-arm-msm, linux-arm-kernel, Saravana Kannan,
	Nicolas Pitre, Andrew Morton, Mattias Wallin, Linus Walleij

udelay() can be incorrect on SMP machines that scale their CPU
frequencies independently of one another (as pointed out here
http://article.gmane.org/gmane.linux.kernel/977567). The delay
loop can either be too fast or too slow depending on which CPU the
loops_per_jiffy counter is calibrated on and which CPU the delay
loop is running on. udelay() can also be incorrect if the
CPU frequency switches during the __delay() loop, causing the loop
to either terminate too early, or too late.

Forcing udelay() to run on one CPU is unreasonable and taking the
penalty of a rather large loops_per_jiffy in udelay() when the
CPU is actually running slower is bad for performance. Solve the
problem by adding a timer based__delay() loop unaffected by CPU
frequency scaling. Machines should set this loop as their
__delay() implementation by calling set_timer_fn() during their
timer initialization.

The kernel is already prepared for a timer based approach
(evident by the read_current_timer() function). If an arch
implements read_current_timer(), calibrate_delay() will use
calibrate_delay_direct() to calculate loops_per_jiffy (in which
case loops_per_jiffy should really be renamed to
timer_ticks_per_jiffy). Since the loops_per_jiffy will be based
on timer ticks, __delay() should be implemented as a loop around
read_current_timer().

Doing this makes the expensive loops_per_jiffy calculation go
away (saving ~150ms on boot time on my machine) and fixes
udelay() by making it safe in the face of independently scaling
CPUs. The only prerequisite is that read_current_timer() is
monotonically increasing across calls (and doesn't overflow
within ~2000us).

There is a downside to this approach though. BogoMIPS is no
longer "accurate" in that it reflects the BogoMIPS of the timer
and not the CPU. On most SoC's the timer isn't running anywhere
near as fast as the CPU so BogoMIPS will be ridiculously low (my
timer runs at 4.8 MHz and thus my BogoMIPS is 9.6 compared to my
CPU's 800). This shouldn't be too much of a concern though since
BogoMIPS are bogus anyway (hence the name).

This loop is pretty much a copy of AVR's version.

Reported-and-reviewed-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
---
 arch/arm/include/asm/delay.h |    2 ++
 arch/arm/lib/delay.c         |   17 +++++++++++++++++
 2 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
index 82ef82a..91063a3 100644
--- a/arch/arm/include/asm/delay.h
+++ b/arch/arm/include/asm/delay.h
@@ -47,5 +47,7 @@ static inline void set_delay_fn(void (*fn)(unsigned long))
 	delay_fn = fn;
 }
 
+extern void read_current_timer_delay_loop(unsigned long loops);
+
 #endif /* defined(_ARM_DELAY_H) */
 
diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
index 42cda15..b8825e9 100644
--- a/arch/arm/lib/delay.c
+++ b/arch/arm/lib/delay.c
@@ -9,6 +9,7 @@
  */
 #include <linux/module.h>
 #include <linux/delay.h>
+#include <linux/timex.h>
 
 /*
  * Oh, if only we had a cycle counter...
@@ -23,6 +24,22 @@ static void delay_loop(unsigned long loops)
 	);
 }
 
+#ifdef ARCH_HAS_READ_CURRENT_TIMER
+/*
+ * Assumes read_current_timer() is monotonically increasing
+ * across calls and wraps at most once within MAX_UDELAY_MS.
+ */
+void read_current_timer_delay_loop(unsigned long loops)
+{
+	unsigned long bclock, now;
+
+	read_current_timer(&bclock);
+	do {
+		read_current_timer(&now);
+	} while ((now - bclock) < loops);
+}
+#endif
+
 void (*delay_fn)(unsigned long) = delay_loop;
 
 /*
-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCHv5 1/3] ARM: Translate delay.S into (mostly) C
  2011-04-05 23:56 ` [PATCHv5 1/3] ARM: Translate delay.S into (mostly) C Stephen Boyd
@ 2011-04-06  8:49   ` Mattias Wallin
  2011-04-06 17:34     ` Stephen Boyd
  2011-04-07  1:27     ` Saravana Kannan
  2011-04-07  7:29   ` Mattias Wallin
  1 sibling, 2 replies; 11+ messages in thread
From: Mattias Wallin @ 2011-04-06  8:49 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Russell King, linux-kernel, linux-arm-msm, linux-arm-kernel,
	Saravana Kannan, Andrew Morton, Linus Walleij

On 04/06/2011 01:56 AM, Stephen Boyd wrote:
> +void __udelay(unsigned long usecs)
> +{
> +	__const_udelay(usecs * ((2199023*HZ)>>11));
This will overflow if HZ=1000.
If I remember correct I fixed it like this:
__const_udelay(usecs * ((2199023UL*HZ)>>11));

/Mattias Wallin

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCHv5 1/3] ARM: Translate delay.S into (mostly) C
  2011-04-06  8:49   ` Mattias Wallin
@ 2011-04-06 17:34     ` Stephen Boyd
  2011-04-07  1:27     ` Saravana Kannan
  1 sibling, 0 replies; 11+ messages in thread
From: Stephen Boyd @ 2011-04-06 17:34 UTC (permalink / raw)
  To: Mattias Wallin
  Cc: Russell King, linux-kernel, linux-arm-msm, linux-arm-kernel,
	Saravana Kannan, Andrew Morton, Linus Walleij

On 04/06/2011 01:49 AM, Mattias Wallin wrote:
> On 04/06/2011 01:56 AM, Stephen Boyd wrote:
>> +void __udelay(unsigned long usecs)
>> +{
>> +    __const_udelay(usecs * ((2199023*HZ)>>11));
>
> This will overflow if HZ=1000.
> If I remember correct I fixed it like this:
> __const_udelay(usecs * ((2199023UL*HZ)>>11));
>

Thanks. I see that my compiler spits out a warning when HZ is 1000.

I'll squash this in before sending to the tracker. Care to send a Tested-by?

----->8-----8<-----

diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
index f664002..e7229bf 100644
--- a/arch/arm/lib/delay.c
+++ b/arch/arm/lib/delay.c
@@ -80,6 +80,6 @@ EXPORT_SYMBOL(__const_udelay);
  */
 void __udelay(unsigned long usecs)
 {
-       __const_udelay(usecs * ((2199023*HZ)>>11));
+       __const_udelay(usecs * ((2199023UL*HZ)>>11));
 }
 EXPORT_SYMBOL(__udelay);


-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCHv5 1/3] ARM: Translate delay.S into (mostly) C
  2011-04-06  8:49   ` Mattias Wallin
  2011-04-06 17:34     ` Stephen Boyd
@ 2011-04-07  1:27     ` Saravana Kannan
  2011-04-07  7:27       ` Mattias Wallin
  1 sibling, 1 reply; 11+ messages in thread
From: Saravana Kannan @ 2011-04-07  1:27 UTC (permalink / raw)
  To: Mattias Wallin
  Cc: Stephen Boyd, Russell King, linux-arm-msm, Linus Walleij,
	linux-kernel, Andrew Morton, linux-arm-kernel

On 04/06/2011 01:49 AM, Mattias Wallin wrote:
> On 04/06/2011 01:56 AM, Stephen Boyd wrote:
>> +void __udelay(unsigned long usecs)
>> +{
>> + __const_udelay(usecs * ((2199023*HZ)>>11));
> This will overflow if HZ=1000.
> If I remember correct I fixed it like this:
> __const_udelay(usecs * ((2199023UL*HZ)>>11));
>
> /Mattias Wallin
>

Mattias,

Isn't this a big in the existing code too? I would prefer not combining 
that with this patch series. That bug should be fixed before/after this 
patch series. Not as part of it.

I would vote for after this patch series since this patch series has 
been trying to get in for quite a while now.

-Saravana

-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCHv5 1/3] ARM: Translate delay.S into (mostly) C
  2011-04-07  1:27     ` Saravana Kannan
@ 2011-04-07  7:27       ` Mattias Wallin
  0 siblings, 0 replies; 11+ messages in thread
From: Mattias Wallin @ 2011-04-07  7:27 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Stephen Boyd, Russell King, linux-arm-msm, Linus Walleij,
	linux-kernel, Andrew Morton, linux-arm-kernel

On 04/07/2011 03:27 AM, Saravana Kannan wrote:
> Isn't this a big in the existing code too? I would prefer not combining
> that with this patch series. That bug should be fixed before/after this
> patch series. Not as part of it.
>
> I would vote for after this patch series since this patch series has
> been trying to get in for quite a while now.
Ok for me.
Leave it as is and I'll send a patch on top of it.

Yours,
Mattias Wallin

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCHv5 1/3] ARM: Translate delay.S into (mostly) C
  2011-04-05 23:56 ` [PATCHv5 1/3] ARM: Translate delay.S into (mostly) C Stephen Boyd
  2011-04-06  8:49   ` Mattias Wallin
@ 2011-04-07  7:29   ` Mattias Wallin
  1 sibling, 0 replies; 11+ messages in thread
From: Mattias Wallin @ 2011-04-07  7:29 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Russell King, linux-kernel, linux-arm-msm, linux-arm-kernel,
	Saravana Kannan, Andrew Morton, Linus Walleij

On 04/06/2011 01:56 AM, Stephen Boyd wrote:
> We want to allow machines to override the __delay() implementation
> at runtime so they can use a timer based __delay() routine. It's
> easier to do this using C, so let's write udelay and friends in C.
>
> We lose the #if 0 code, which according to Russell is used "to
> make the delay loop more stable and predictable on older CPUs"
> (seehttp://article.gmane.org/gmane.linux.kernel/888867  for more
> info). We shouldn't be too worried though, since we'll soon add
> functionality allowing a machine to set the __delay() loop
> themselves, thus allowing machines to resurrect the commented out
> code should they need it.
>
> Nicolas expressed concern that fixed lpj cmdlines will break due to
> compiler optimizations. That doesn't seem to be the case since
> before and after this patch I get the same lpj value when running
> my CPU at 19.2 MHz. That should be sufficiently slow enough to
> cover any machine running Linux.

Tested-by: Mattias Wallin <mattias.wallin@stericsson.com>

Yours,
Mattias Wallin


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCHv5 2/3] ARM: Allow machines to override __delay()
  2011-04-05 23:56 ` [PATCHv5 2/3] ARM: Allow machines to override __delay() Stephen Boyd
@ 2011-04-07  7:30   ` Mattias Wallin
  0 siblings, 0 replies; 11+ messages in thread
From: Mattias Wallin @ 2011-04-07  7:30 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Russell King, linux-kernel, linux-arm-msm, linux-arm-kernel,
	Saravana Kannan, Nicolas Pitre, Andrew Morton, Linus Walleij

On 04/06/2011 01:56 AM, Stephen Boyd wrote:
> Some machines want to implement their own __delay() routine based
> on fixed rate timers. Expose functionality to set the __delay()
> routine at runtime. This should allow two machines with different
> __delay() routines to happily co-exist within the same kernel
> with minimal overhead.
>
> Russell expressed concern that using a timer based __delay()
> would cause problems when an iomapped device isn't mapped in
> prior to a delay call being made (see
> http://article.gmane.org/gmane.linux.ports.arm.kernel/78543  for
> more info). We can sidestep that issue with this approach since
> the __delay() routine_should_  only be pointed to a timer based
> delay once the timer has been properly mapped. Up until that
> point __delay() and udelay() will use delay_loop() which is
> always safe to call.
>
> This patch is inspired by x86's delay.c
>

Tested-by: Mattias Wallin <mattias.wallin@stericsson.com>

Yours,
Mattias Wallin


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCHv5 3/3] ARM: Implement a timer based __delay() loop
  2011-04-05 23:56 ` [PATCHv5 3/3] ARM: Implement a timer based __delay() loop Stephen Boyd
@ 2011-04-07  7:30   ` Mattias Wallin
  0 siblings, 0 replies; 11+ messages in thread
From: Mattias Wallin @ 2011-04-07  7:30 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Russell King, linux-kernel, linux-arm-msm, linux-arm-kernel,
	Saravana Kannan, Nicolas Pitre, Andrew Morton, Linus Walleij

On 04/06/2011 01:56 AM, Stephen Boyd wrote:
> udelay() can be incorrect on SMP machines that scale their CPU
> frequencies independently of one another (as pointed out here
> http://article.gmane.org/gmane.linux.kernel/977567). The delay
> loop can either be too fast or too slow depending on which CPU the
> loops_per_jiffy counter is calibrated on and which CPU the delay
> loop is running on. udelay() can also be incorrect if the
> CPU frequency switches during the __delay() loop, causing the loop
> to either terminate too early, or too late.
>
> Forcing udelay() to run on one CPU is unreasonable and taking the
> penalty of a rather large loops_per_jiffy in udelay() when the
> CPU is actually running slower is bad for performance. Solve the
> problem by adding a timer based__delay() loop unaffected by CPU
> frequency scaling. Machines should set this loop as their
> __delay() implementation by calling set_timer_fn() during their
> timer initialization.
>
> The kernel is already prepared for a timer based approach
> (evident by the read_current_timer() function). If an arch
> implements read_current_timer(), calibrate_delay() will use
> calibrate_delay_direct() to calculate loops_per_jiffy (in which
> case loops_per_jiffy should really be renamed to
> timer_ticks_per_jiffy). Since the loops_per_jiffy will be based
> on timer ticks, __delay() should be implemented as a loop around
> read_current_timer().
>
> Doing this makes the expensive loops_per_jiffy calculation go
> away (saving ~150ms on boot time on my machine) and fixes
> udelay() by making it safe in the face of independently scaling
> CPUs. The only prerequisite is that read_current_timer() is
> monotonically increasing across calls (and doesn't overflow
> within ~2000us).
>
> There is a downside to this approach though. BogoMIPS is no
> longer "accurate" in that it reflects the BogoMIPS of the timer
> and not the CPU. On most SoC's the timer isn't running anywhere
> near as fast as the CPU so BogoMIPS will be ridiculously low (my
> timer runs at 4.8 MHz and thus my BogoMIPS is 9.6 compared to my
> CPU's 800). This shouldn't be too much of a concern though since
> BogoMIPS are bogus anyway (hence the name).
>
> This loop is pretty much a copy of AVR's version.

Tested-by: Mattias Wallin <mattias.wallin@stericsson.com>

Yours,
Mattias Wallin


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2011-04-07  7:58 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-04-05 23:56 [PATCHv5 0/3] Constant udelay() for SMP and non-SMP systems Stephen Boyd
2011-04-05 23:56 ` [PATCHv5 1/3] ARM: Translate delay.S into (mostly) C Stephen Boyd
2011-04-06  8:49   ` Mattias Wallin
2011-04-06 17:34     ` Stephen Boyd
2011-04-07  1:27     ` Saravana Kannan
2011-04-07  7:27       ` Mattias Wallin
2011-04-07  7:29   ` Mattias Wallin
2011-04-05 23:56 ` [PATCHv5 2/3] ARM: Allow machines to override __delay() Stephen Boyd
2011-04-07  7:30   ` Mattias Wallin
2011-04-05 23:56 ` [PATCHv5 3/3] ARM: Implement a timer based __delay() loop Stephen Boyd
2011-04-07  7:30   ` Mattias Wallin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).