All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] Fixing udelay() on SMP (and non-SMP too)
@ 2010-09-07 18:23 ` Stephen Boyd
  0 siblings, 0 replies; 21+ messages in thread
From: Stephen Boyd @ 2010-09-07 18:23 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arm-msm, linux-kernel, Russell King, Saravana Kannan,
	Colin Cross, Kevin Hilman, Santosh Shilimkar

(Sorry, resending due to stray comma...)

These patches are another attempt at fixing the udelay()
issue pointed out on arm-lkml[1][2]. A quick recap: some SMP
machines can scale their CPU frequencies independent of one
another. loops_per_jiffy is calibrated globally and used in
__const_udelay(). If one CPU is running faster than what the
loops_per_jiffy is calculated (or scaled) for, udelay() will
be incorrect and not wait long enough (or too long). A similar
problem occurs if the cpu frequency is scaled during a udelay()
call.

We could fix this issue a couple ways, wholesale replacement
of __udelay() and __const_udelay() (see [2] for that approach),
or replacement of __delay() (this series). Option 1 can fail if
anybody uses udelay() before memory is mapped and also duplicates
most of the code in asm/delay.h. It also needs to hardcode the
timer tick frequency, which can sometimes be inaccurate. The
benefit is that loops_per_jiffy stays the same and thus BogoMIPS
is unchanged.  Option 2 can't fail since the __delay() loop is
replaced after memory is mapped in, but it suffers from a low
BogoMIPS when timers are clocked slowly. It also more accurately
calculates the timer tick frequency through the use of
calibrate_delay_direct().

-- Reference --
[1] http://article.gmane.org/gmane.linux.kernel/977567
[2] http://article.gmane.org/gmane.linux.ports.arm.kernel/78496 

Stephen Boyd (3):
  [ARM] Translate delay.S into (mostly) C
  [ARM] Allow machines to override __delay()
  [ARM] Implement a timer based __delay() loop

 arch/arm/include/asm/delay.h |    5 ++-
 arch/arm/kernel/armksyms.c   |    4 --
 arch/arm/lib/delay.S         |   65 ------------------------------
 arch/arm/lib/delay.c         |   90 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 94 insertions(+), 70 deletions(-)
 delete mode 100644 arch/arm/lib/delay.S
 create mode 100644 arch/arm/lib/delay.c

-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 0/3] Fixing udelay() on SMP (and non-SMP too)
@ 2010-09-07 18:23 ` Stephen Boyd
  0 siblings, 0 replies; 21+ messages in thread
From: Stephen Boyd @ 2010-09-07 18:23 UTC (permalink / raw)
  To: linux-arm-kernel

(Sorry, resending due to stray comma...)

These patches are another attempt at fixing the udelay()
issue pointed out on arm-lkml[1][2]. A quick recap: some SMP
machines can scale their CPU frequencies independent of one
another. loops_per_jiffy is calibrated globally and used in
__const_udelay(). If one CPU is running faster than what the
loops_per_jiffy is calculated (or scaled) for, udelay() will
be incorrect and not wait long enough (or too long). A similar
problem occurs if the cpu frequency is scaled during a udelay()
call.

We could fix this issue a couple ways, wholesale replacement
of __udelay() and __const_udelay() (see [2] for that approach),
or replacement of __delay() (this series). Option 1 can fail if
anybody uses udelay() before memory is mapped and also duplicates
most of the code in asm/delay.h. It also needs to hardcode the
timer tick frequency, which can sometimes be inaccurate. The
benefit is that loops_per_jiffy stays the same and thus BogoMIPS
is unchanged.  Option 2 can't fail since the __delay() loop is
replaced after memory is mapped in, but it suffers from a low
BogoMIPS when timers are clocked slowly. It also more accurately
calculates the timer tick frequency through the use of
calibrate_delay_direct().

-- Reference --
[1] http://article.gmane.org/gmane.linux.kernel/977567
[2] http://article.gmane.org/gmane.linux.ports.arm.kernel/78496 

Stephen Boyd (3):
  [ARM] Translate delay.S into (mostly) C
  [ARM] Allow machines to override __delay()
  [ARM] Implement a timer based __delay() loop

 arch/arm/include/asm/delay.h |    5 ++-
 arch/arm/kernel/armksyms.c   |    4 --
 arch/arm/lib/delay.S         |   65 ------------------------------
 arch/arm/lib/delay.c         |   90 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 94 insertions(+), 70 deletions(-)
 delete mode 100644 arch/arm/lib/delay.S
 create mode 100644 arch/arm/lib/delay.c

-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/3] [ARM] Translate delay.S into (mostly) C
  2010-09-07 18:23 ` Stephen Boyd
@ 2010-09-07 18:23   ` Stephen Boyd
  -1 siblings, 0 replies; 21+ messages in thread
From: Stephen Boyd @ 2010-09-07 18:23 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arm-msm, linux-kernel, Russell King, Saravana Kannan,
	Colin Cross, Kevin Hilman, Santosh Shilimkar

In the next patch we're going to allow machines to override the
__delay() implementation at runtime so they can implement a timer
based __delay() routine. It's easier to do this using C, so lets
write udelay and friends in C.

We lose the #if 0 code, which according to Russell is used "to
make the delay loop more stable and predictable on older CPUs"
(see http://article.gmane.org/gmane.linux.kernel/888867 for more
info). We shouldn't be too worried though, since the next patch
adds functionality to allow a machine to set the __delay() loop
themselves, therefore allowing machines to resurrect the
commented out code if they need it.

bloat-o-meter shows an increase of 12 bytes. Further inspection
of the assembly shows GCC copying the loops_per_jiffy pointer and
the magic HZ value to the ends of __const_udelay() and _delay()
thus contributing an extra 4 and 8 bytes of data to each
function. These two values weren't taken into account in the
delay.S version since they weren't part of the function in nm's
eyes. This means we only really gained an extra 4 bytes due to
GCC's decision to duplicate the loops_per_jiffy pointer in
__const_udelay.

 $ scripts/bloat-o-meter vmlinux.orig vmlinux.new
 add/remove: 0/0 grow/shrink: 2/0 up/down: 12/0 (12)
 function                                     old     new   delta
 __udelay                                      48      56      +8
 __const_udelay                                40      44      +4

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
---

So maybe we can make the magic HZ value into a #define? UDELAY_SCALAR?
UDELAY_MAGIC?

 arch/arm/include/asm/delay.h |    2 +-
 arch/arm/kernel/armksyms.c   |    4 --
 arch/arm/lib/delay.S         |   65 ------------------------------------------
 arch/arm/lib/delay.c         |   57 ++++++++++++++++++++++++++++++++++++
 4 files changed, 58 insertions(+), 70 deletions(-)
 delete mode 100644 arch/arm/lib/delay.S
 create mode 100644 arch/arm/lib/delay.c

diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
index b2deda1..ccc5ed5 100644
--- a/arch/arm/include/asm/delay.h
+++ b/arch/arm/include/asm/delay.h
@@ -8,7 +8,7 @@
 
 #include <asm/param.h>	/* HZ */
 
-extern void __delay(int loops);
+extern void __delay(unsigned long loops);
 
 /*
  * This function intentionally does not exist; if you see references to
diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
index 8214bfe..259e549 100644
--- a/arch/arm/kernel/armksyms.c
+++ b/arch/arm/kernel/armksyms.c
@@ -52,10 +52,6 @@ extern void fpundefinstr(void);
 
 EXPORT_SYMBOL(__backtrace);
 
-	/* platform dependent support */
-EXPORT_SYMBOL(__udelay);
-EXPORT_SYMBOL(__const_udelay);
-
 	/* networking */
 EXPORT_SYMBOL(csum_partial);
 EXPORT_SYMBOL(csum_partial_copy_from_user);
diff --git a/arch/arm/lib/delay.S b/arch/arm/lib/delay.S
deleted file mode 100644
index 8d6a876..0000000
--- a/arch/arm/lib/delay.S
+++ /dev/null
@@ -1,65 +0,0 @@
-/*
- *  linux/arch/arm/lib/delay.S
- *
- *  Copyright (C) 1995, 1996 Russell King
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- */
-#include <linux/linkage.h>
-#include <asm/assembler.h>
-#include <asm/param.h>
-		.text
-
-.LC0:		.word	loops_per_jiffy
-.LC1:		.word	(2199023*HZ)>>11
-
-/*
- * r0  <= 2000
- * lpj <= 0x01ffffff (max. 3355 bogomips)
- * HZ  <= 1000
- */
-
-ENTRY(__udelay)
-		ldr	r2, .LC1
-		mul	r0, r2, r0
-ENTRY(__const_udelay)				@ 0 <= r0 <= 0x7fffff06
-		ldr	r2, .LC0
-		ldr	r2, [r2]		@ max = 0x01ffffff
-		mov	r0, r0, lsr #14		@ max = 0x0001ffff
-		mov	r2, r2, lsr #10		@ max = 0x00007fff
-		mul	r0, r2, r0		@ max = 2^32-1
-		movs	r0, r0, lsr #6
-		moveq	pc, lr
-
-/*
- * loops = r0 * HZ * loops_per_jiffy / 1000000
- *
- * Oh, if only we had a cycle counter...
- */
-
-@ Delay routine
-ENTRY(__delay)
-		subs	r0, r0, #1
-#if 0
-		movls	pc, lr
-		subs	r0, r0, #1
-		movls	pc, lr
-		subs	r0, r0, #1
-		movls	pc, lr
-		subs	r0, r0, #1
-		movls	pc, lr
-		subs	r0, r0, #1
-		movls	pc, lr
-		subs	r0, r0, #1
-		movls	pc, lr
-		subs	r0, r0, #1
-		movls	pc, lr
-		subs	r0, r0, #1
-#endif
-		bhi	__delay
-		mov	pc, lr
-ENDPROC(__udelay)
-ENDPROC(__const_udelay)
-ENDPROC(__delay)
diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
new file mode 100644
index 0000000..5ee0adc
--- /dev/null
+++ b/arch/arm/lib/delay.c
@@ -0,0 +1,57 @@
+/*
+ *  Originally from linux/arch/arm/lib/delay.S
+ *
+ *  Copyright (C) 1995, 1996 Russell King
+ *  Copyright (c) 2010, Code Aurora Forum. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <linux/module.h>
+#include <linux/delay.h>
+
+/*
+ * loops = usecs * HZ * loops_per_jiffy / 1000000
+ *
+ * Oh, if only we had a cycle counter...
+ */
+void __delay(unsigned long loops)
+{
+	asm volatile(
+	"1:	subs %0, %0, #1 \n"
+	"	bhi 1b		\n"
+	: /* No output */
+	: "r" (loops)
+	);
+}
+EXPORT_SYMBOL(__delay);
+
+/*
+ * 0 <= xloops <= 0x7fffff06
+ * loops_per_jiffy <= 0x01ffffff (max. 3355 bogomips)
+ */
+void __const_udelay(unsigned long xloops)
+{
+	unsigned long lpj;
+	unsigned long loops;
+
+	xloops >>= 14;			/* max = 0x01ffffff */
+	lpj = loops_per_jiffy >> 10;	/* max = 0x0001ffff */
+	loops = lpj * xloops;		/* max = 0x00007fff */
+	loops >>= 6;			/* max = 2^32-1 */
+
+	if (loops)
+		__delay(loops);
+}
+EXPORT_SYMBOL(__const_udelay);
+
+/*
+ * usecs  <= 2000
+ * HZ  <= 1000
+ */
+void __udelay(unsigned long usecs)
+{
+	__const_udelay(usecs * ((2199023*HZ)>>11));
+}
+EXPORT_SYMBOL(__udelay);
-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 1/3] [ARM] Translate delay.S into (mostly) C
@ 2010-09-07 18:23   ` Stephen Boyd
  0 siblings, 0 replies; 21+ messages in thread
From: Stephen Boyd @ 2010-09-07 18:23 UTC (permalink / raw)
  To: linux-arm-kernel

In the next patch we're going to allow machines to override the
__delay() implementation at runtime so they can implement a timer
based __delay() routine. It's easier to do this using C, so lets
write udelay and friends in C.

We lose the #if 0 code, which according to Russell is used "to
make the delay loop more stable and predictable on older CPUs"
(see http://article.gmane.org/gmane.linux.kernel/888867 for more
info). We shouldn't be too worried though, since the next patch
adds functionality to allow a machine to set the __delay() loop
themselves, therefore allowing machines to resurrect the
commented out code if they need it.

bloat-o-meter shows an increase of 12 bytes. Further inspection
of the assembly shows GCC copying the loops_per_jiffy pointer and
the magic HZ value to the ends of __const_udelay() and _delay()
thus contributing an extra 4 and 8 bytes of data to each
function. These two values weren't taken into account in the
delay.S version since they weren't part of the function in nm's
eyes. This means we only really gained an extra 4 bytes due to
GCC's decision to duplicate the loops_per_jiffy pointer in
__const_udelay.

 $ scripts/bloat-o-meter vmlinux.orig vmlinux.new
 add/remove: 0/0 grow/shrink: 2/0 up/down: 12/0 (12)
 function                                     old     new   delta
 __udelay                                      48      56      +8
 __const_udelay                                40      44      +4

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
---

So maybe we can make the magic HZ value into a #define? UDELAY_SCALAR?
UDELAY_MAGIC?

 arch/arm/include/asm/delay.h |    2 +-
 arch/arm/kernel/armksyms.c   |    4 --
 arch/arm/lib/delay.S         |   65 ------------------------------------------
 arch/arm/lib/delay.c         |   57 ++++++++++++++++++++++++++++++++++++
 4 files changed, 58 insertions(+), 70 deletions(-)
 delete mode 100644 arch/arm/lib/delay.S
 create mode 100644 arch/arm/lib/delay.c

diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
index b2deda1..ccc5ed5 100644
--- a/arch/arm/include/asm/delay.h
+++ b/arch/arm/include/asm/delay.h
@@ -8,7 +8,7 @@
 
 #include <asm/param.h>	/* HZ */
 
-extern void __delay(int loops);
+extern void __delay(unsigned long loops);
 
 /*
  * This function intentionally does not exist; if you see references to
diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
index 8214bfe..259e549 100644
--- a/arch/arm/kernel/armksyms.c
+++ b/arch/arm/kernel/armksyms.c
@@ -52,10 +52,6 @@ extern void fpundefinstr(void);
 
 EXPORT_SYMBOL(__backtrace);
 
-	/* platform dependent support */
-EXPORT_SYMBOL(__udelay);
-EXPORT_SYMBOL(__const_udelay);
-
 	/* networking */
 EXPORT_SYMBOL(csum_partial);
 EXPORT_SYMBOL(csum_partial_copy_from_user);
diff --git a/arch/arm/lib/delay.S b/arch/arm/lib/delay.S
deleted file mode 100644
index 8d6a876..0000000
--- a/arch/arm/lib/delay.S
+++ /dev/null
@@ -1,65 +0,0 @@
-/*
- *  linux/arch/arm/lib/delay.S
- *
- *  Copyright (C) 1995, 1996 Russell King
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- */
-#include <linux/linkage.h>
-#include <asm/assembler.h>
-#include <asm/param.h>
-		.text
-
-.LC0:		.word	loops_per_jiffy
-.LC1:		.word	(2199023*HZ)>>11
-
-/*
- * r0  <= 2000
- * lpj <= 0x01ffffff (max. 3355 bogomips)
- * HZ  <= 1000
- */
-
-ENTRY(__udelay)
-		ldr	r2, .LC1
-		mul	r0, r2, r0
-ENTRY(__const_udelay)				@ 0 <= r0 <= 0x7fffff06
-		ldr	r2, .LC0
-		ldr	r2, [r2]		@ max = 0x01ffffff
-		mov	r0, r0, lsr #14		@ max = 0x0001ffff
-		mov	r2, r2, lsr #10		@ max = 0x00007fff
-		mul	r0, r2, r0		@ max = 2^32-1
-		movs	r0, r0, lsr #6
-		moveq	pc, lr
-
-/*
- * loops = r0 * HZ * loops_per_jiffy / 1000000
- *
- * Oh, if only we had a cycle counter...
- */
-
-@ Delay routine
-ENTRY(__delay)
-		subs	r0, r0, #1
-#if 0
-		movls	pc, lr
-		subs	r0, r0, #1
-		movls	pc, lr
-		subs	r0, r0, #1
-		movls	pc, lr
-		subs	r0, r0, #1
-		movls	pc, lr
-		subs	r0, r0, #1
-		movls	pc, lr
-		subs	r0, r0, #1
-		movls	pc, lr
-		subs	r0, r0, #1
-		movls	pc, lr
-		subs	r0, r0, #1
-#endif
-		bhi	__delay
-		mov	pc, lr
-ENDPROC(__udelay)
-ENDPROC(__const_udelay)
-ENDPROC(__delay)
diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
new file mode 100644
index 0000000..5ee0adc
--- /dev/null
+++ b/arch/arm/lib/delay.c
@@ -0,0 +1,57 @@
+/*
+ *  Originally from linux/arch/arm/lib/delay.S
+ *
+ *  Copyright (C) 1995, 1996 Russell King
+ *  Copyright (c) 2010, Code Aurora Forum. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <linux/module.h>
+#include <linux/delay.h>
+
+/*
+ * loops = usecs * HZ * loops_per_jiffy / 1000000
+ *
+ * Oh, if only we had a cycle counter...
+ */
+void __delay(unsigned long loops)
+{
+	asm volatile(
+	"1:	subs %0, %0, #1 \n"
+	"	bhi 1b		\n"
+	: /* No output */
+	: "r" (loops)
+	);
+}
+EXPORT_SYMBOL(__delay);
+
+/*
+ * 0 <= xloops <= 0x7fffff06
+ * loops_per_jiffy <= 0x01ffffff (max. 3355 bogomips)
+ */
+void __const_udelay(unsigned long xloops)
+{
+	unsigned long lpj;
+	unsigned long loops;
+
+	xloops >>= 14;			/* max = 0x01ffffff */
+	lpj = loops_per_jiffy >> 10;	/* max = 0x0001ffff */
+	loops = lpj * xloops;		/* max = 0x00007fff */
+	loops >>= 6;			/* max = 2^32-1 */
+
+	if (loops)
+		__delay(loops);
+}
+EXPORT_SYMBOL(__const_udelay);
+
+/*
+ * usecs  <= 2000
+ * HZ  <= 1000
+ */
+void __udelay(unsigned long usecs)
+{
+	__const_udelay(usecs * ((2199023*HZ)>>11));
+}
+EXPORT_SYMBOL(__udelay);
-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/3] [ARM] Allow machines to override __delay()
  2010-09-07 18:23 ` Stephen Boyd
@ 2010-09-07 18:23   ` Stephen Boyd
  -1 siblings, 0 replies; 21+ messages in thread
From: Stephen Boyd @ 2010-09-07 18:23 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arm-msm, linux-kernel, Russell King, Saravana Kannan,
	Colin Cross, Kevin Hilman, Santosh Shilimkar

Some machines want to implement their own __delay() routine based
on fixed timers. Expose functionality to set the __delay()
routine at runtime. This should allow two machines with different
__delay() routines to happily co-exist within the same kernel
with minimal overhead.

Russell expressed concern that using a timer based __delay()
would cause problems where an iomapped device isn't mapped in
before a delay call was made (see
http://article.gmane.org/gmane.linux.ports.arm.kernel/78543 for
more info). We can sidestep that issue with this approach since
the __delay() routine _should_ only be pointed to a timer based
delay once the timer has been properly mapped. Up until that
point __delay() and udelay() will use delay_loop() which is
always safe to call.

This patch is inspired by x86's delay.c

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
---
 arch/arm/include/asm/delay.h |    2 ++
 arch/arm/lib/delay.c         |   21 ++++++++++++++++++---
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
index ccc5ed5..7c732b5 100644
--- a/arch/arm/include/asm/delay.h
+++ b/arch/arm/include/asm/delay.h
@@ -40,5 +40,7 @@ extern void __const_udelay(unsigned long);
 			__const_udelay((n) * ((2199023U*HZ)>>11))) :	\
 	  __udelay(n))
 
+extern void set_delay_fn(void (*fn)(unsigned long));
+
 #endif /* defined(_ARM_DELAY_H) */
 
diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
index 5ee0adc..b307fcc 100644
--- a/arch/arm/lib/delay.c
+++ b/arch/arm/lib/delay.c
@@ -3,6 +3,8 @@
  *
  *  Copyright (C) 1995, 1996 Russell King
  *  Copyright (c) 2010, Code Aurora Forum. All rights reserved.
+ *  Copyright (C) 1993 Linus Torvalds
+ *  Copyright (C) 1997 Martin Mares <mj@atrey.karlin.mff.cuni.cz>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -12,11 +14,9 @@
 #include <linux/delay.h>
 
 /*
- * loops = usecs * HZ * loops_per_jiffy / 1000000
- *
  * Oh, if only we had a cycle counter...
  */
-void __delay(unsigned long loops)
+void delay_loop(unsigned long loops)
 {
 	asm volatile(
 	"1:	subs %0, %0, #1 \n"
@@ -25,6 +25,21 @@ void __delay(unsigned long loops)
 	: "r" (loops)
 	);
 }
+
+static void (*delay_fn)(unsigned long) = delay_loop;
+
+void set_delay_fn(void (*fn)(unsigned long))
+{
+	delay_fn = fn;
+}
+
+/*
+ * loops = usecs * HZ * loops_per_jiffy / 1000000
+ */
+void __delay(unsigned long loops)
+{
+	delay_fn(loops);
+}
 EXPORT_SYMBOL(__delay);
 
 /*
-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/3] [ARM] Allow machines to override __delay()
@ 2010-09-07 18:23   ` Stephen Boyd
  0 siblings, 0 replies; 21+ messages in thread
From: Stephen Boyd @ 2010-09-07 18:23 UTC (permalink / raw)
  To: linux-arm-kernel

Some machines want to implement their own __delay() routine based
on fixed timers. Expose functionality to set the __delay()
routine at runtime. This should allow two machines with different
__delay() routines to happily co-exist within the same kernel
with minimal overhead.

Russell expressed concern that using a timer based __delay()
would cause problems where an iomapped device isn't mapped in
before a delay call was made (see
http://article.gmane.org/gmane.linux.ports.arm.kernel/78543 for
more info). We can sidestep that issue with this approach since
the __delay() routine _should_ only be pointed to a timer based
delay once the timer has been properly mapped. Up until that
point __delay() and udelay() will use delay_loop() which is
always safe to call.

This patch is inspired by x86's delay.c

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
---
 arch/arm/include/asm/delay.h |    2 ++
 arch/arm/lib/delay.c         |   21 ++++++++++++++++++---
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
index ccc5ed5..7c732b5 100644
--- a/arch/arm/include/asm/delay.h
+++ b/arch/arm/include/asm/delay.h
@@ -40,5 +40,7 @@ extern void __const_udelay(unsigned long);
 			__const_udelay((n) * ((2199023U*HZ)>>11))) :	\
 	  __udelay(n))
 
+extern void set_delay_fn(void (*fn)(unsigned long));
+
 #endif /* defined(_ARM_DELAY_H) */
 
diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
index 5ee0adc..b307fcc 100644
--- a/arch/arm/lib/delay.c
+++ b/arch/arm/lib/delay.c
@@ -3,6 +3,8 @@
  *
  *  Copyright (C) 1995, 1996 Russell King
  *  Copyright (c) 2010, Code Aurora Forum. All rights reserved.
+ *  Copyright (C) 1993 Linus Torvalds
+ *  Copyright (C) 1997 Martin Mares <mj@atrey.karlin.mff.cuni.cz>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -12,11 +14,9 @@
 #include <linux/delay.h>
 
 /*
- * loops = usecs * HZ * loops_per_jiffy / 1000000
- *
  * Oh, if only we had a cycle counter...
  */
-void __delay(unsigned long loops)
+void delay_loop(unsigned long loops)
 {
 	asm volatile(
 	"1:	subs %0, %0, #1 \n"
@@ -25,6 +25,21 @@ void __delay(unsigned long loops)
 	: "r" (loops)
 	);
 }
+
+static void (*delay_fn)(unsigned long) = delay_loop;
+
+void set_delay_fn(void (*fn)(unsigned long))
+{
+	delay_fn = fn;
+}
+
+/*
+ * loops = usecs * HZ * loops_per_jiffy / 1000000
+ */
+void __delay(unsigned long loops)
+{
+	delay_fn(loops);
+}
 EXPORT_SYMBOL(__delay);
 
 /*
-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 3/3] [ARM] Implement a timer based __delay() loop
  2010-09-07 18:23 ` Stephen Boyd
@ 2010-09-07 18:23   ` Stephen Boyd
  -1 siblings, 0 replies; 21+ messages in thread
From: Stephen Boyd @ 2010-09-07 18:23 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arm-msm, linux-kernel, Russell King, Saravana Kannan,
	Colin Cross, Kevin Hilman, Santosh Shilimkar

udelay() can be incorrect on SMP machines that scale their CPU
frequencies independently of one another (as pointed out here
http://article.gmane.org/gmane.linux.kernel/977567). The delay
loop can either be too fast or too slow depending on which CPU the
loops_per_jiffy counter is calibrated on and which CPU the delay
loop is running on. udelay() can also be incorrect if the
CPU frequency switches during the __delay() loop, causing the loop
to either terminate too early, or too late. 

We'd rather not fix udelay() by forcing it to run on one CPU or
take the penalty of a rather large loops_per_jiffy being used in
udelay() when the CPU is actually running slower. Instead we
solve the problem by making __delay() into a timer based loop
which should be unaffected by CPU frequency scaling. Therefore,
implement a timer based __delay() loop which can be used in place
of the current register based __delay() if a machine so chooses.

The kernel is already prepared for a timer based approach
(evident by the read_current_timer() function). If an arch
implements read_current_timer(), calibrate_delay() will use a
timer based function, calibrate_delay_direct(), to calculate
loops_per_jiffy (in which case loops_per_jiffy should really be
renamed to timer_ticks_per_jiffy). Since the loops_per_jiffy will
be based on timer ticks, __delay() should be implemented as a
loop around read_current_timer().

Doing this makes the expensive loops_per_jiffy calculation go
away (saving ~150ms on boot time on my machine) and fixes
udelay() by making it safe in the face of independently scaling
CPUs. The only prerequisite is that read_current_timer() is
monotonically increasing across calls (and doesn't overflow
within ~2000us).

There is a downside to this approach though. BogoMIPS is no
longer "accurate" in that it reflects the BogoMIPS of the timer
and not the CPU. On most SoC's the timer isn't running anywhere
near as fast as the CPU so BogoMIPS will be ridiculously low (my
timer runs at 4.8 MHz and thus my BogoMIPS is 9.6 compared to my
CPU's 800). This shouldn't be too much of a concern though since
BogoMIPS are bogus anyway (hence the name).

This loop is pretty much a copy of AVR's version made more generic.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
---
 arch/arm/include/asm/delay.h |    1 +
 arch/arm/lib/delay.c         |   18 ++++++++++++++++++
 2 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
index 7c732b5..5c6b9a3 100644
--- a/arch/arm/include/asm/delay.h
+++ b/arch/arm/include/asm/delay.h
@@ -41,6 +41,7 @@ extern void __const_udelay(unsigned long);
 	  __udelay(n))
 
 extern void set_delay_fn(void (*fn)(unsigned long));
+extern void read_current_timer_delay_loop(unsigned long loops);
 
 #endif /* defined(_ARM_DELAY_H) */
 
diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
index b307fcc..59fdf64 100644
--- a/arch/arm/lib/delay.c
+++ b/arch/arm/lib/delay.c
@@ -5,6 +5,7 @@
  *  Copyright (c) 2010, Code Aurora Forum. All rights reserved.
  *  Copyright (C) 1993 Linus Torvalds
  *  Copyright (C) 1997 Martin Mares <mj@atrey.karlin.mff.cuni.cz>
+ *  Copyright (C) 2005-2006 Atmel Corporation
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -12,6 +13,7 @@
  */
 #include <linux/module.h>
 #include <linux/delay.h>
+#include <linux/timex.h>
 
 /*
  * Oh, if only we had a cycle counter...
@@ -26,6 +28,22 @@ void delay_loop(unsigned long loops)
 	);
 }
 
+#ifdef ARCH_HAS_READ_CURRENT_TIMER
+/*
+ * Assuming read_current_timer() is monotonically increasing
+ * across calls.
+ */
+void read_current_timer_delay_loop(unsigned long loops)
+{
+	unsigned long bclock, now;
+
+	read_current_timer(&bclock);
+	do {
+		read_current_timer(&now);
+	} while ((now - bclock) < loops);
+}
+#endif
+
 static void (*delay_fn)(unsigned long) = delay_loop;
 
 void set_delay_fn(void (*fn)(unsigned long))
-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 3/3] [ARM] Implement a timer based __delay() loop
@ 2010-09-07 18:23   ` Stephen Boyd
  0 siblings, 0 replies; 21+ messages in thread
From: Stephen Boyd @ 2010-09-07 18:23 UTC (permalink / raw)
  To: linux-arm-kernel

udelay() can be incorrect on SMP machines that scale their CPU
frequencies independently of one another (as pointed out here
http://article.gmane.org/gmane.linux.kernel/977567). The delay
loop can either be too fast or too slow depending on which CPU the
loops_per_jiffy counter is calibrated on and which CPU the delay
loop is running on. udelay() can also be incorrect if the
CPU frequency switches during the __delay() loop, causing the loop
to either terminate too early, or too late. 

We'd rather not fix udelay() by forcing it to run on one CPU or
take the penalty of a rather large loops_per_jiffy being used in
udelay() when the CPU is actually running slower. Instead we
solve the problem by making __delay() into a timer based loop
which should be unaffected by CPU frequency scaling. Therefore,
implement a timer based __delay() loop which can be used in place
of the current register based __delay() if a machine so chooses.

The kernel is already prepared for a timer based approach
(evident by the read_current_timer() function). If an arch
implements read_current_timer(), calibrate_delay() will use a
timer based function, calibrate_delay_direct(), to calculate
loops_per_jiffy (in which case loops_per_jiffy should really be
renamed to timer_ticks_per_jiffy). Since the loops_per_jiffy will
be based on timer ticks, __delay() should be implemented as a
loop around read_current_timer().

Doing this makes the expensive loops_per_jiffy calculation go
away (saving ~150ms on boot time on my machine) and fixes
udelay() by making it safe in the face of independently scaling
CPUs. The only prerequisite is that read_current_timer() is
monotonically increasing across calls (and doesn't overflow
within ~2000us).

There is a downside to this approach though. BogoMIPS is no
longer "accurate" in that it reflects the BogoMIPS of the timer
and not the CPU. On most SoC's the timer isn't running anywhere
near as fast as the CPU so BogoMIPS will be ridiculously low (my
timer runs at 4.8 MHz and thus my BogoMIPS is 9.6 compared to my
CPU's 800). This shouldn't be too much of a concern though since
BogoMIPS are bogus anyway (hence the name).

This loop is pretty much a copy of AVR's version made more generic.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
---
 arch/arm/include/asm/delay.h |    1 +
 arch/arm/lib/delay.c         |   18 ++++++++++++++++++
 2 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
index 7c732b5..5c6b9a3 100644
--- a/arch/arm/include/asm/delay.h
+++ b/arch/arm/include/asm/delay.h
@@ -41,6 +41,7 @@ extern void __const_udelay(unsigned long);
 	  __udelay(n))
 
 extern void set_delay_fn(void (*fn)(unsigned long));
+extern void read_current_timer_delay_loop(unsigned long loops);
 
 #endif /* defined(_ARM_DELAY_H) */
 
diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
index b307fcc..59fdf64 100644
--- a/arch/arm/lib/delay.c
+++ b/arch/arm/lib/delay.c
@@ -5,6 +5,7 @@
  *  Copyright (c) 2010, Code Aurora Forum. All rights reserved.
  *  Copyright (C) 1993 Linus Torvalds
  *  Copyright (C) 1997 Martin Mares <mj@atrey.karlin.mff.cuni.cz>
+ *  Copyright (C) 2005-2006 Atmel Corporation
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -12,6 +13,7 @@
  */
 #include <linux/module.h>
 #include <linux/delay.h>
+#include <linux/timex.h>
 
 /*
  * Oh, if only we had a cycle counter...
@@ -26,6 +28,22 @@ void delay_loop(unsigned long loops)
 	);
 }
 
+#ifdef ARCH_HAS_READ_CURRENT_TIMER
+/*
+ * Assuming read_current_timer() is monotonically increasing
+ * across calls.
+ */
+void read_current_timer_delay_loop(unsigned long loops)
+{
+	unsigned long bclock, now;
+
+	read_current_timer(&bclock);
+	do {
+		read_current_timer(&now);
+	} while ((now - bclock) < loops);
+}
+#endif
+
 static void (*delay_fn)(unsigned long) = delay_loop;
 
 void set_delay_fn(void (*fn)(unsigned long))
-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/3] Fixing udelay() on SMP (and non-SMP too)
  2010-09-07 18:23 ` Stephen Boyd
@ 2010-09-22 21:54   ` Daniel Walker
  -1 siblings, 0 replies; 21+ messages in thread
From: Daniel Walker @ 2010-09-22 21:54 UTC (permalink / raw)
  To: Russell King
  Cc: linux-arm-kernel, Kevin Hilman, linux-arm-msm, linux-kernel,
	Saravana Kannan, Santosh Shilimkar, Colin Cross, Stephen Boyd


Russell

Is this series something you would be willing to pull into your tree?

Daniel

On Tue, 2010-09-07 at 11:23 -0700, Stephen Boyd wrote:
> (Sorry, resending due to stray comma...)
> 
> These patches are another attempt at fixing the udelay()
> issue pointed out on arm-lkml[1][2]. A quick recap: some SMP
> machines can scale their CPU frequencies independent of one
> another. loops_per_jiffy is calibrated globally and used in
> __const_udelay(). If one CPU is running faster than what the
> loops_per_jiffy is calculated (or scaled) for, udelay() will
> be incorrect and not wait long enough (or too long). A similar
> problem occurs if the cpu frequency is scaled during a udelay()
> call.
> 
> We could fix this issue a couple ways, wholesale replacement
> of __udelay() and __const_udelay() (see [2] for that approach),
> or replacement of __delay() (this series). Option 1 can fail if
> anybody uses udelay() before memory is mapped and also duplicates
> most of the code in asm/delay.h. It also needs to hardcode the
> timer tick frequency, which can sometimes be inaccurate. The
> benefit is that loops_per_jiffy stays the same and thus BogoMIPS
> is unchanged.  Option 2 can't fail since the __delay() loop is
> replaced after memory is mapped in, but it suffers from a low
> BogoMIPS when timers are clocked slowly. It also more accurately
> calculates the timer tick frequency through the use of
> calibrate_delay_direct().
> 
> -- Reference --
> [1] http://article.gmane.org/gmane.linux.kernel/977567
> [2] http://article.gmane.org/gmane.linux.ports.arm.kernel/78496 
> 
> Stephen Boyd (3):
>   [ARM] Translate delay.S into (mostly) C
>   [ARM] Allow machines to override __delay()
>   [ARM] Implement a timer based __delay() loop
> 
>  arch/arm/include/asm/delay.h |    5 ++-
>  arch/arm/kernel/armksyms.c   |    4 --
>  arch/arm/lib/delay.S         |   65 ------------------------------
>  arch/arm/lib/delay.c         |   90 ++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 94 insertions(+), 70 deletions(-)
>  delete mode 100644 arch/arm/lib/delay.S
>  create mode 100644 arch/arm/lib/delay.c
> 

-- 

Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 0/3] Fixing udelay() on SMP (and non-SMP too)
@ 2010-09-22 21:54   ` Daniel Walker
  0 siblings, 0 replies; 21+ messages in thread
From: Daniel Walker @ 2010-09-22 21:54 UTC (permalink / raw)
  To: linux-arm-kernel


Russell

Is this series something you would be willing to pull into your tree?

Daniel

On Tue, 2010-09-07 at 11:23 -0700, Stephen Boyd wrote:
> (Sorry, resending due to stray comma...)
> 
> These patches are another attempt at fixing the udelay()
> issue pointed out on arm-lkml[1][2]. A quick recap: some SMP
> machines can scale their CPU frequencies independent of one
> another. loops_per_jiffy is calibrated globally and used in
> __const_udelay(). If one CPU is running faster than what the
> loops_per_jiffy is calculated (or scaled) for, udelay() will
> be incorrect and not wait long enough (or too long). A similar
> problem occurs if the cpu frequency is scaled during a udelay()
> call.
> 
> We could fix this issue a couple ways, wholesale replacement
> of __udelay() and __const_udelay() (see [2] for that approach),
> or replacement of __delay() (this series). Option 1 can fail if
> anybody uses udelay() before memory is mapped and also duplicates
> most of the code in asm/delay.h. It also needs to hardcode the
> timer tick frequency, which can sometimes be inaccurate. The
> benefit is that loops_per_jiffy stays the same and thus BogoMIPS
> is unchanged.  Option 2 can't fail since the __delay() loop is
> replaced after memory is mapped in, but it suffers from a low
> BogoMIPS when timers are clocked slowly. It also more accurately
> calculates the timer tick frequency through the use of
> calibrate_delay_direct().
> 
> -- Reference --
> [1] http://article.gmane.org/gmane.linux.kernel/977567
> [2] http://article.gmane.org/gmane.linux.ports.arm.kernel/78496 
> 
> Stephen Boyd (3):
>   [ARM] Translate delay.S into (mostly) C
>   [ARM] Allow machines to override __delay()
>   [ARM] Implement a timer based __delay() loop
> 
>  arch/arm/include/asm/delay.h |    5 ++-
>  arch/arm/kernel/armksyms.c   |    4 --
>  arch/arm/lib/delay.S         |   65 ------------------------------
>  arch/arm/lib/delay.c         |   90 ++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 94 insertions(+), 70 deletions(-)
>  delete mode 100644 arch/arm/lib/delay.S
>  create mode 100644 arch/arm/lib/delay.c
> 

-- 

Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/3] [ARM] Implement a timer based __delay() loop
  2010-10-06  3:36       ` Stephen Boyd
@ 2010-10-06 13:44         ` Daniel Walker
  -1 siblings, 0 replies; 21+ messages in thread
From: Daniel Walker @ 2010-10-06 13:44 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Russell King, linux-kernel, linux-arm-msm, linux-arm-kernel,
	Saravana Kannan, Colin Cross, Kevin Hilman, Santosh Shilimkar

On Tue, 2010-10-05 at 20:36 -0700, Stephen Boyd wrote:
> On 10/05/2010 10:38 AM, Daniel Walker wrote:
> >>  
> >> +#ifdef ARCH_HAS_READ_CURRENT_TIMER
> >> +/*
> >> + * Assuming read_current_timer() is monotonically increasing
> >> + * across calls.
> > You should add more comments here. You assuming that it's monotonic over
> > a 2000us (2ms) period .. I'm not sure this is a good assumption this
> > timer may not be monotonically increasing all the time, what happens
> > then?
> 
> Ok I'll add that it shouldn't wrap more than once within 2000us (or
> should I say 5ms since mdelay uses udelay?). Is that what you're saying
> by it not being monotonically increasing? If a timer isn't increasing
> the tick count it's broken and this call will loop forever. If the timer
> wraps, we'll be safe due to unsigned maths as long as it wraps only once.

I was saying it could be near wrapping when this process starts, then
wrap in the middle. The unsigned math thing should work I think, but add
a comment .. Your basically saying it's OK if it wraps, but the comment
says it needs to be "monotonically increasing" which actually doesn't
appear to be true. It needs to not wrap twice during this process.

> >> +void read_current_timer_delay_loop(unsigned long loops)
> >> +{
> >> +	unsigned long bclock, now;
> >> +
> >> +	read_current_timer(&bclock);
> >> +	do {
> >> +		read_current_timer(&now);
> >> +	} while ((now - bclock) < loops);
> > Have you looked at time_before()/time_after() ?
> 
> Nope. Wouldn't that require an addition though to make it work? I'd
> rather just leave it like it is.

I'm not saying you should change it, I'm saying look at those macro's to
make sure your doing thinks correctly.

Daniel


-- 
Sent by an consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 3/3] [ARM] Implement a timer based __delay() loop
@ 2010-10-06 13:44         ` Daniel Walker
  0 siblings, 0 replies; 21+ messages in thread
From: Daniel Walker @ 2010-10-06 13:44 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 2010-10-05 at 20:36 -0700, Stephen Boyd wrote:
> On 10/05/2010 10:38 AM, Daniel Walker wrote:
> >>  
> >> +#ifdef ARCH_HAS_READ_CURRENT_TIMER
> >> +/*
> >> + * Assuming read_current_timer() is monotonically increasing
> >> + * across calls.
> > You should add more comments here. You assuming that it's monotonic over
> > a 2000us (2ms) period .. I'm not sure this is a good assumption this
> > timer may not be monotonically increasing all the time, what happens
> > then?
> 
> Ok I'll add that it shouldn't wrap more than once within 2000us (or
> should I say 5ms since mdelay uses udelay?). Is that what you're saying
> by it not being monotonically increasing? If a timer isn't increasing
> the tick count it's broken and this call will loop forever. If the timer
> wraps, we'll be safe due to unsigned maths as long as it wraps only once.

I was saying it could be near wrapping when this process starts, then
wrap in the middle. The unsigned math thing should work I think, but add
a comment .. Your basically saying it's OK if it wraps, but the comment
says it needs to be "monotonically increasing" which actually doesn't
appear to be true. It needs to not wrap twice during this process.

> >> +void read_current_timer_delay_loop(unsigned long loops)
> >> +{
> >> +	unsigned long bclock, now;
> >> +
> >> +	read_current_timer(&bclock);
> >> +	do {
> >> +		read_current_timer(&now);
> >> +	} while ((now - bclock) < loops);
> > Have you looked at time_before()/time_after() ?
> 
> Nope. Wouldn't that require an addition though to make it work? I'd
> rather just leave it like it is.

I'm not saying you should change it, I'm saying look at those macro's to
make sure your doing thinks correctly.

Daniel


-- 
Sent by an consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/3] [ARM] Implement a timer based __delay() loop
  2010-10-05 17:38     ` Daniel Walker
@ 2010-10-06  3:36       ` Stephen Boyd
  -1 siblings, 0 replies; 21+ messages in thread
From: Stephen Boyd @ 2010-10-06  3:36 UTC (permalink / raw)
  To: Daniel Walker
  Cc: Russell King, linux-kernel, linux-arm-msm, linux-arm-kernel,
	Saravana Kannan, Colin Cross, Kevin Hilman, Santosh Shilimkar

 On 10/05/2010 10:38 AM, Daniel Walker wrote:
>>  
>> +#ifdef ARCH_HAS_READ_CURRENT_TIMER
>> +/*
>> + * Assuming read_current_timer() is monotonically increasing
>> + * across calls.
> You should add more comments here. You assuming that it's monotonic over
> a 2000us (2ms) period .. I'm not sure this is a good assumption this
> timer may not be monotonically increasing all the time, what happens
> then?

Ok I'll add that it shouldn't wrap more than once within 2000us (or
should I say 5ms since mdelay uses udelay?). Is that what you're saying
by it not being monotonically increasing? If a timer isn't increasing
the tick count it's broken and this call will loop forever. If the timer
wraps, we'll be safe due to unsigned maths as long as it wraps only once.

>> +void read_current_timer_delay_loop(unsigned long loops)
>> +{
>> +	unsigned long bclock, now;
>> +
>> +	read_current_timer(&bclock);
>> +	do {
>> +		read_current_timer(&now);
>> +	} while ((now - bclock) < loops);
> Have you looked at time_before()/time_after() ?

Nope. Wouldn't that require an addition though to make it work? I'd
rather just leave it like it is.

-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 3/3] [ARM] Implement a timer based __delay() loop
@ 2010-10-06  3:36       ` Stephen Boyd
  0 siblings, 0 replies; 21+ messages in thread
From: Stephen Boyd @ 2010-10-06  3:36 UTC (permalink / raw)
  To: linux-arm-kernel

 On 10/05/2010 10:38 AM, Daniel Walker wrote:
>>  
>> +#ifdef ARCH_HAS_READ_CURRENT_TIMER
>> +/*
>> + * Assuming read_current_timer() is monotonically increasing
>> + * across calls.
> You should add more comments here. You assuming that it's monotonic over
> a 2000us (2ms) period .. I'm not sure this is a good assumption this
> timer may not be monotonically increasing all the time, what happens
> then?

Ok I'll add that it shouldn't wrap more than once within 2000us (or
should I say 5ms since mdelay uses udelay?). Is that what you're saying
by it not being monotonically increasing? If a timer isn't increasing
the tick count it's broken and this call will loop forever. If the timer
wraps, we'll be safe due to unsigned maths as long as it wraps only once.

>> +void read_current_timer_delay_loop(unsigned long loops)
>> +{
>> +	unsigned long bclock, now;
>> +
>> +	read_current_timer(&bclock);
>> +	do {
>> +		read_current_timer(&now);
>> +	} while ((now - bclock) < loops);
> Have you looked at time_before()/time_after() ?

Nope. Wouldn't that require an addition though to make it work? I'd
rather just leave it like it is.

-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/3] [ARM] Implement a timer based __delay() loop
  2010-09-28  3:33   ` Stephen Boyd
@ 2010-10-05 17:38     ` Daniel Walker
  -1 siblings, 0 replies; 21+ messages in thread
From: Daniel Walker @ 2010-10-05 17:38 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Russell King, linux-kernel, linux-arm-msm, linux-arm-kernel,
	Saravana Kannan, Colin Cross, Kevin Hilman, Santosh Shilimkar

On Mon, 2010-09-27 at 20:33 -0700, Stephen Boyd wrote:
> udelay() can be incorrect on SMP machines that scale their CPU
> frequencies independently of one another (as pointed out here
> http://article.gmane.org/gmane.linux.kernel/977567). The delay
> loop can either be too fast or too slow depending on which CPU the
> loops_per_jiffy counter is calibrated on and which CPU the delay
> loop is running on. udelay() can also be incorrect if the
> CPU frequency switches during the __delay() loop, causing the loop
> to either terminate too early, or too late. 
> 
> We'd rather not fix udelay() by forcing it to run on one CPU or
> take the penalty of a rather large loops_per_jiffy being used in
> udelay() when the CPU is actually running slower. Instead we
> solve the problem by making __delay() into a timer based loop
> which should be unaffected by CPU frequency scaling. Therefore,
> implement a timer based __delay() loop which can be used in place
> of the current register based __delay() if a machine so chooses.
> 
> The kernel is already prepared for a timer based approach
> (evident by the read_current_timer() function). If an arch
> implements read_current_timer(), calibrate_delay() will use a
> timer based function, calibrate_delay_direct(), to calculate
> loops_per_jiffy (in which case loops_per_jiffy should really be
> renamed to timer_ticks_per_jiffy). Since the loops_per_jiffy will
> be based on timer ticks, __delay() should be implemented as a
> loop around read_current_timer().
> 
> Doing this makes the expensive loops_per_jiffy calculation go
> away (saving ~150ms on boot time on my machine) and fixes
> udelay() by making it safe in the face of independently scaling
> CPUs. The only prerequisite is that read_current_timer() is
> monotonically increasing across calls (and doesn't overflow
> within ~2000us).
> 
> There is a downside to this approach though. BogoMIPS is no
> longer "accurate" in that it reflects the BogoMIPS of the timer
> and not the CPU. On most SoC's the timer isn't running anywhere
> near as fast as the CPU so BogoMIPS will be ridiculously low (my
> timer runs at 4.8 MHz and thus my BogoMIPS is 9.6 compared to my
> CPU's 800). This shouldn't be too much of a concern though since
> BogoMIPS are bogus anyway (hence the name).
> 
> This loop is pretty much a copy of AVR's version made more generic.
> 
> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
> Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
> ---
>  arch/arm/include/asm/delay.h |    1 +
>  arch/arm/lib/delay.c         |   18 ++++++++++++++++++
>  2 files changed, 19 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
> index 7c732b5..5c6b9a3 100644
> --- a/arch/arm/include/asm/delay.h
> +++ b/arch/arm/include/asm/delay.h
> @@ -41,6 +41,7 @@ extern void __const_udelay(unsigned long);
>  	  __udelay(n))
>  
>  extern void set_delay_fn(void (*fn)(unsigned long));
> +extern void read_current_timer_delay_loop(unsigned long loops);
>  
>  #endif /* defined(_ARM_DELAY_H) */
>  
> diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
> index b307fcc..59fdf64 100644
> --- a/arch/arm/lib/delay.c
> +++ b/arch/arm/lib/delay.c
> @@ -5,6 +5,7 @@
>   *  Copyright (c) 2010, Code Aurora Forum. All rights reserved.
>   *  Copyright (C) 1993 Linus Torvalds
>   *  Copyright (C) 1997 Martin Mares <mj@atrey.karlin.mff.cuni.cz>
> + *  Copyright (C) 2005-2006 Atmel Corporation
>   *
>   * This program is free software; you can redistribute it and/or modify
>   * it under the terms of the GNU General Public License version 2 as
> @@ -12,6 +13,7 @@
>   */
>  #include <linux/module.h>
>  #include <linux/delay.h>
> +#include <linux/timex.h>
>  
>  /*
>   * Oh, if only we had a cycle counter...
> @@ -26,6 +28,22 @@ void delay_loop(unsigned long loops)
>  	);
>  }
>  
> +#ifdef ARCH_HAS_READ_CURRENT_TIMER
> +/*
> + * Assuming read_current_timer() is monotonically increasing
> + * across calls.

You should add more comments here. You assuming that it's monotonic over
a 2000us (2ms) period .. I'm not sure this is a good assumption this
timer may not be monotonically increasing all the time, what happens
then?

> +void read_current_timer_delay_loop(unsigned long loops)
> +{
> +	unsigned long bclock, now;
> +
> +	read_current_timer(&bclock);
> +	do {
> +		read_current_timer(&now);
> +	} while ((now - bclock) < loops);

Have you looked at time_before()/time_after() ?

Daniel

-- 

Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 3/3] [ARM] Implement a timer based __delay() loop
@ 2010-10-05 17:38     ` Daniel Walker
  0 siblings, 0 replies; 21+ messages in thread
From: Daniel Walker @ 2010-10-05 17:38 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 2010-09-27 at 20:33 -0700, Stephen Boyd wrote:
> udelay() can be incorrect on SMP machines that scale their CPU
> frequencies independently of one another (as pointed out here
> http://article.gmane.org/gmane.linux.kernel/977567). The delay
> loop can either be too fast or too slow depending on which CPU the
> loops_per_jiffy counter is calibrated on and which CPU the delay
> loop is running on. udelay() can also be incorrect if the
> CPU frequency switches during the __delay() loop, causing the loop
> to either terminate too early, or too late. 
> 
> We'd rather not fix udelay() by forcing it to run on one CPU or
> take the penalty of a rather large loops_per_jiffy being used in
> udelay() when the CPU is actually running slower. Instead we
> solve the problem by making __delay() into a timer based loop
> which should be unaffected by CPU frequency scaling. Therefore,
> implement a timer based __delay() loop which can be used in place
> of the current register based __delay() if a machine so chooses.
> 
> The kernel is already prepared for a timer based approach
> (evident by the read_current_timer() function). If an arch
> implements read_current_timer(), calibrate_delay() will use a
> timer based function, calibrate_delay_direct(), to calculate
> loops_per_jiffy (in which case loops_per_jiffy should really be
> renamed to timer_ticks_per_jiffy). Since the loops_per_jiffy will
> be based on timer ticks, __delay() should be implemented as a
> loop around read_current_timer().
> 
> Doing this makes the expensive loops_per_jiffy calculation go
> away (saving ~150ms on boot time on my machine) and fixes
> udelay() by making it safe in the face of independently scaling
> CPUs. The only prerequisite is that read_current_timer() is
> monotonically increasing across calls (and doesn't overflow
> within ~2000us).
> 
> There is a downside to this approach though. BogoMIPS is no
> longer "accurate" in that it reflects the BogoMIPS of the timer
> and not the CPU. On most SoC's the timer isn't running anywhere
> near as fast as the CPU so BogoMIPS will be ridiculously low (my
> timer runs at 4.8 MHz and thus my BogoMIPS is 9.6 compared to my
> CPU's 800). This shouldn't be too much of a concern though since
> BogoMIPS are bogus anyway (hence the name).
> 
> This loop is pretty much a copy of AVR's version made more generic.
> 
> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
> Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
> ---
>  arch/arm/include/asm/delay.h |    1 +
>  arch/arm/lib/delay.c         |   18 ++++++++++++++++++
>  2 files changed, 19 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
> index 7c732b5..5c6b9a3 100644
> --- a/arch/arm/include/asm/delay.h
> +++ b/arch/arm/include/asm/delay.h
> @@ -41,6 +41,7 @@ extern void __const_udelay(unsigned long);
>  	  __udelay(n))
>  
>  extern void set_delay_fn(void (*fn)(unsigned long));
> +extern void read_current_timer_delay_loop(unsigned long loops);
>  
>  #endif /* defined(_ARM_DELAY_H) */
>  
> diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
> index b307fcc..59fdf64 100644
> --- a/arch/arm/lib/delay.c
> +++ b/arch/arm/lib/delay.c
> @@ -5,6 +5,7 @@
>   *  Copyright (c) 2010, Code Aurora Forum. All rights reserved.
>   *  Copyright (C) 1993 Linus Torvalds
>   *  Copyright (C) 1997 Martin Mares <mj@atrey.karlin.mff.cuni.cz>
> + *  Copyright (C) 2005-2006 Atmel Corporation
>   *
>   * This program is free software; you can redistribute it and/or modify
>   * it under the terms of the GNU General Public License version 2 as
> @@ -12,6 +13,7 @@
>   */
>  #include <linux/module.h>
>  #include <linux/delay.h>
> +#include <linux/timex.h>
>  
>  /*
>   * Oh, if only we had a cycle counter...
> @@ -26,6 +28,22 @@ void delay_loop(unsigned long loops)
>  	);
>  }
>  
> +#ifdef ARCH_HAS_READ_CURRENT_TIMER
> +/*
> + * Assuming read_current_timer() is monotonically increasing
> + * across calls.

You should add more comments here. You assuming that it's monotonic over
a 2000us (2ms) period .. I'm not sure this is a good assumption this
timer may not be monotonically increasing all the time, what happens
then?

> +void read_current_timer_delay_loop(unsigned long loops)
> +{
> +	unsigned long bclock, now;
> +
> +	read_current_timer(&bclock);
> +	do {
> +		read_current_timer(&now);
> +	} while ((now - bclock) < loops);

Have you looked@time_before()/time_after() ?

Daniel

-- 

Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 3/3] [ARM] Implement a timer based __delay() loop
  2010-09-28  3:33 Stephen Boyd
@ 2010-09-28  3:33   ` Stephen Boyd
  0 siblings, 0 replies; 21+ messages in thread
From: Stephen Boyd @ 2010-09-28  3:33 UTC (permalink / raw)
  To: Russell King
  Cc: linux-kernel, linux-arm-msm, linux-arm-kernel, Daniel Walker,
	Saravana Kannan, Colin Cross, Kevin Hilman, Santosh Shilimkar

udelay() can be incorrect on SMP machines that scale their CPU
frequencies independently of one another (as pointed out here
http://article.gmane.org/gmane.linux.kernel/977567). The delay
loop can either be too fast or too slow depending on which CPU the
loops_per_jiffy counter is calibrated on and which CPU the delay
loop is running on. udelay() can also be incorrect if the
CPU frequency switches during the __delay() loop, causing the loop
to either terminate too early, or too late. 

We'd rather not fix udelay() by forcing it to run on one CPU or
take the penalty of a rather large loops_per_jiffy being used in
udelay() when the CPU is actually running slower. Instead we
solve the problem by making __delay() into a timer based loop
which should be unaffected by CPU frequency scaling. Therefore,
implement a timer based __delay() loop which can be used in place
of the current register based __delay() if a machine so chooses.

The kernel is already prepared for a timer based approach
(evident by the read_current_timer() function). If an arch
implements read_current_timer(), calibrate_delay() will use a
timer based function, calibrate_delay_direct(), to calculate
loops_per_jiffy (in which case loops_per_jiffy should really be
renamed to timer_ticks_per_jiffy). Since the loops_per_jiffy will
be based on timer ticks, __delay() should be implemented as a
loop around read_current_timer().

Doing this makes the expensive loops_per_jiffy calculation go
away (saving ~150ms on boot time on my machine) and fixes
udelay() by making it safe in the face of independently scaling
CPUs. The only prerequisite is that read_current_timer() is
monotonically increasing across calls (and doesn't overflow
within ~2000us).

There is a downside to this approach though. BogoMIPS is no
longer "accurate" in that it reflects the BogoMIPS of the timer
and not the CPU. On most SoC's the timer isn't running anywhere
near as fast as the CPU so BogoMIPS will be ridiculously low (my
timer runs at 4.8 MHz and thus my BogoMIPS is 9.6 compared to my
CPU's 800). This shouldn't be too much of a concern though since
BogoMIPS are bogus anyway (hence the name).

This loop is pretty much a copy of AVR's version made more generic.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
---
 arch/arm/include/asm/delay.h |    1 +
 arch/arm/lib/delay.c         |   18 ++++++++++++++++++
 2 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
index 7c732b5..5c6b9a3 100644
--- a/arch/arm/include/asm/delay.h
+++ b/arch/arm/include/asm/delay.h
@@ -41,6 +41,7 @@ extern void __const_udelay(unsigned long);
 	  __udelay(n))
 
 extern void set_delay_fn(void (*fn)(unsigned long));
+extern void read_current_timer_delay_loop(unsigned long loops);
 
 #endif /* defined(_ARM_DELAY_H) */
 
diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
index b307fcc..59fdf64 100644
--- a/arch/arm/lib/delay.c
+++ b/arch/arm/lib/delay.c
@@ -5,6 +5,7 @@
  *  Copyright (c) 2010, Code Aurora Forum. All rights reserved.
  *  Copyright (C) 1993 Linus Torvalds
  *  Copyright (C) 1997 Martin Mares <mj@atrey.karlin.mff.cuni.cz>
+ *  Copyright (C) 2005-2006 Atmel Corporation
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -12,6 +13,7 @@
  */
 #include <linux/module.h>
 #include <linux/delay.h>
+#include <linux/timex.h>
 
 /*
  * Oh, if only we had a cycle counter...
@@ -26,6 +28,22 @@ void delay_loop(unsigned long loops)
 	);
 }
 
+#ifdef ARCH_HAS_READ_CURRENT_TIMER
+/*
+ * Assuming read_current_timer() is monotonically increasing
+ * across calls.
+ */
+void read_current_timer_delay_loop(unsigned long loops)
+{
+	unsigned long bclock, now;
+
+	read_current_timer(&bclock);
+	do {
+		read_current_timer(&now);
+	} while ((now - bclock) < loops);
+}
+#endif
+
 static void (*delay_fn)(unsigned long) = delay_loop;
 
 void set_delay_fn(void (*fn)(unsigned long))
-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 3/3] [ARM] Implement a timer based __delay() loop
@ 2010-09-28  3:33   ` Stephen Boyd
  0 siblings, 0 replies; 21+ messages in thread
From: Stephen Boyd @ 2010-09-28  3:33 UTC (permalink / raw)
  To: linux-arm-kernel

udelay() can be incorrect on SMP machines that scale their CPU
frequencies independently of one another (as pointed out here
http://article.gmane.org/gmane.linux.kernel/977567). The delay
loop can either be too fast or too slow depending on which CPU the
loops_per_jiffy counter is calibrated on and which CPU the delay
loop is running on. udelay() can also be incorrect if the
CPU frequency switches during the __delay() loop, causing the loop
to either terminate too early, or too late. 

We'd rather not fix udelay() by forcing it to run on one CPU or
take the penalty of a rather large loops_per_jiffy being used in
udelay() when the CPU is actually running slower. Instead we
solve the problem by making __delay() into a timer based loop
which should be unaffected by CPU frequency scaling. Therefore,
implement a timer based __delay() loop which can be used in place
of the current register based __delay() if a machine so chooses.

The kernel is already prepared for a timer based approach
(evident by the read_current_timer() function). If an arch
implements read_current_timer(), calibrate_delay() will use a
timer based function, calibrate_delay_direct(), to calculate
loops_per_jiffy (in which case loops_per_jiffy should really be
renamed to timer_ticks_per_jiffy). Since the loops_per_jiffy will
be based on timer ticks, __delay() should be implemented as a
loop around read_current_timer().

Doing this makes the expensive loops_per_jiffy calculation go
away (saving ~150ms on boot time on my machine) and fixes
udelay() by making it safe in the face of independently scaling
CPUs. The only prerequisite is that read_current_timer() is
monotonically increasing across calls (and doesn't overflow
within ~2000us).

There is a downside to this approach though. BogoMIPS is no
longer "accurate" in that it reflects the BogoMIPS of the timer
and not the CPU. On most SoC's the timer isn't running anywhere
near as fast as the CPU so BogoMIPS will be ridiculously low (my
timer runs at 4.8 MHz and thus my BogoMIPS is 9.6 compared to my
CPU's 800). This shouldn't be too much of a concern though since
BogoMIPS are bogus anyway (hence the name).

This loop is pretty much a copy of AVR's version made more generic.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
---
 arch/arm/include/asm/delay.h |    1 +
 arch/arm/lib/delay.c         |   18 ++++++++++++++++++
 2 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
index 7c732b5..5c6b9a3 100644
--- a/arch/arm/include/asm/delay.h
+++ b/arch/arm/include/asm/delay.h
@@ -41,6 +41,7 @@ extern void __const_udelay(unsigned long);
 	  __udelay(n))
 
 extern void set_delay_fn(void (*fn)(unsigned long));
+extern void read_current_timer_delay_loop(unsigned long loops);
 
 #endif /* defined(_ARM_DELAY_H) */
 
diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
index b307fcc..59fdf64 100644
--- a/arch/arm/lib/delay.c
+++ b/arch/arm/lib/delay.c
@@ -5,6 +5,7 @@
  *  Copyright (c) 2010, Code Aurora Forum. All rights reserved.
  *  Copyright (C) 1993 Linus Torvalds
  *  Copyright (C) 1997 Martin Mares <mj@atrey.karlin.mff.cuni.cz>
+ *  Copyright (C) 2005-2006 Atmel Corporation
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -12,6 +13,7 @@
  */
 #include <linux/module.h>
 #include <linux/delay.h>
+#include <linux/timex.h>
 
 /*
  * Oh, if only we had a cycle counter...
@@ -26,6 +28,22 @@ void delay_loop(unsigned long loops)
 	);
 }
 
+#ifdef ARCH_HAS_READ_CURRENT_TIMER
+/*
+ * Assuming read_current_timer() is monotonically increasing
+ * across calls.
+ */
+void read_current_timer_delay_loop(unsigned long loops)
+{
+	unsigned long bclock, now;
+
+	read_current_timer(&bclock);
+	do {
+		read_current_timer(&now);
+	} while ((now - bclock) < loops);
+}
+#endif
+
 static void (*delay_fn)(unsigned long) = delay_loop;
 
 void set_delay_fn(void (*fn)(unsigned long))
-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 3/3] [ARM] Implement a timer based __delay() loop
  2010-09-04  4:28 [PATCH 0/3] Fixing udelay() on SMP (and non-SMP too) Stephen Boyd
@ 2010-09-04  4:28 ` Stephen Boyd
  0 siblings, 0 replies; 21+ messages in thread
From: Stephen Boyd @ 2010-09-04  4:28 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arm-msm, linux-kernel, Russell King, Saravana Kannan,
	Colin Cross, Kevin Hilman, Shilimkar, Santosh

udelay() can be incorrect on SMP machines that scale their CPU
frequencies independently of one another (as pointed out here
http://article.gmane.org/gmane.linux.kernel/977567). The delay
loop can either be too fast or too slow depending on which CPU the
loops_per_jiffy counter is calibrated on and which CPU the delay
loop is running on. udelay() can also be incorrect if the
CPU frequency switches during the __delay() loop, causing the loop
to either terminate too early, or too late. 

We'd rather not fix udelay() by forcing it to run on one CPU or
take the penalty of a rather large loops_per_jiffy being used in
udelay() when the CPU is actually running slower. Instead we
solve the problem by making __delay() into a timer based loop
which should be unaffected by CPU frequency scaling. Therefore,
implement a timer based __delay() loop which can be used in place
of the current register based __delay() if a machine so chooses.

The kernel is already prepared for a timer based approach
(evident by the read_current_timer() function). If an arch
implements read_current_timer(), calibrate_delay() will use a
timer based function, calibrate_delay_direct(), to calculate
loops_per_jiffy (in which case loops_per_jiffy should really be
renamed to timer_ticks_per_jiffy). Since the loops_per_jiffy will
be based on timer ticks, __delay() should be implemented as a
loop around read_current_timer().

Doing this makes the expensive loops_per_jiffy calculation go
away (saving ~150ms on boot time on my machine) and fixes
udelay() by making it safe in the face of independently scaling
CPUs. The only prerequisite is that read_current_timer() is
monotonically increasing across calls (and doesn't overflow
within ~2000us).

There is a downside to this approach though. BogoMIPS is no
longer "accurate" in that it reflects the BogoMIPS of the timer
and not the CPU. On most SoC's the timer isn't running anywhere
near as fast as the CPU so BogoMIPS will be ridiculously low (my
timer runs at 4.8 MHz and thus my BogoMIPS is 9.6 compared to my
CPU's 800). This shouldn't be too much of a concern though since
BogoMIPS are bogus anyway (hence the name).

This loop is pretty much a copy of AVR's version made more generic.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
---
 arch/arm/include/asm/delay.h |    1 +
 arch/arm/lib/delay.c         |   18 ++++++++++++++++++
 2 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
index 7c732b5..5c6b9a3 100644
--- a/arch/arm/include/asm/delay.h
+++ b/arch/arm/include/asm/delay.h
@@ -41,6 +41,7 @@ extern void __const_udelay(unsigned long);
 	  __udelay(n))
 
 extern void set_delay_fn(void (*fn)(unsigned long));
+extern void read_current_timer_delay_loop(unsigned long loops);
 
 #endif /* defined(_ARM_DELAY_H) */
 
diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
index b307fcc..59fdf64 100644
--- a/arch/arm/lib/delay.c
+++ b/arch/arm/lib/delay.c
@@ -5,6 +5,7 @@
  *  Copyright (c) 2010, Code Aurora Forum. All rights reserved.
  *  Copyright (C) 1993 Linus Torvalds
  *  Copyright (C) 1997 Martin Mares <mj@atrey.karlin.mff.cuni.cz>
+ *  Copyright (C) 2005-2006 Atmel Corporation
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -12,6 +13,7 @@
  */
 #include <linux/module.h>
 #include <linux/delay.h>
+#include <linux/timex.h>
 
 /*
  * Oh, if only we had a cycle counter...
@@ -26,6 +28,22 @@ void delay_loop(unsigned long loops)
 	);
 }
 
+#ifdef ARCH_HAS_READ_CURRENT_TIMER
+/*
+ * Assuming read_current_timer() is monotonically increasing
+ * across calls.
+ */
+void read_current_timer_delay_loop(unsigned long loops)
+{
+	unsigned long bclock, now;
+
+	read_current_timer(&bclock);
+	do {
+		read_current_timer(&now);
+	} while ((now - bclock) < loops);
+}
+#endif
+
 static void (*delay_fn)(unsigned long) = delay_loop;
 
 void set_delay_fn(void (*fn)(unsigned long))
-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 3/3] [ARM] Implement a timer based __delay() loop
  2010-08-19  2:24 [PATCH 0/3] Fixing udelay() on SMP (and non-SMP too) Stephen Boyd
@ 2010-08-19  2:24   ` Stephen Boyd
  0 siblings, 0 replies; 21+ messages in thread
From: Stephen Boyd @ 2010-08-19  2:24 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arm-msm, linux-kernel, Russell King, Saravana Kannan

udelay() can be incorrect on SMP machines that scale their CPU
frequencies independently of one another (as pointed out here
http://article.gmane.org/gmane.linux.kernel/977567). The delay
loop can either be too fast or too slow depending on which CPU the
loops_per_jiffy counter is calibrated on and which CPU the delay
loop is running on. udelay() can also be incorrect if the
CPU frequency switches during the __delay() loop, causing the loop
to either terminate too early, or too late. 

We'd rather not fix udelay() by forcing it to run on one CPU or
take the penalty of a rather large loops_per_jiffy being used in
udelay() when the CPU is actually running slower. Instead we
solve the problem by making __delay() into a timer based loop
which should be unaffected by CPU frequency scaling. Therefore,
implement a timer based __delay() loop which can be used in place
of the current register based __delay() if a machine so chooses.

The kernel is already prepared for a timer based approach
(evident by the read_current_timer() function). If an arch
implements read_current_timer(), calibrate_delay() will use a
timer based function, calibrate_delay_direct(), to calculate
loops_per_jiffy (in which case loops_per_jiffy should really be
renamed to timer_ticks_per_jiffy). Since the loops_per_jiffy will
be based on timer ticks, __delay() should be implemented as a
loop around read_current_timer().

Doing this makes the expensive loops_per_jiffy calculation go
away (saving ~150ms on boot time on my machine) and fixes
udelay() by making it safe in the face of independently scaling
CPUs. The only prerequisite is that read_current_timer() is
monotonically increasing across calls (and doesn't overflow
within ~2000us).

There is a downside to this approach though. BogoMIPS is no
longer "accurate" in that it reflects the BogoMIPS of the timer
and not the CPU. On most SoC's the timer isn't running anywhere
near as fast as the CPU so BogoMIPS will be ridiculously low (my
timer runs at 4.8 MHz and thus my BogoMIPS is 9.6 compared to my
CPU's 800). This shouldn't be too much of a concern though since
BogoMIPS are bogus anyway (hence the name).

This loop is pretty much a copy of AVR's version made more generic.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
---
 arch/arm/include/asm/delay.h |    1 +
 arch/arm/lib/delay.c         |   18 ++++++++++++++++++
 2 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
index 7c732b5..5c6b9a3 100644
--- a/arch/arm/include/asm/delay.h
+++ b/arch/arm/include/asm/delay.h
@@ -41,6 +41,7 @@ extern void __const_udelay(unsigned long);
 	  __udelay(n))
 
 extern void set_delay_fn(void (*fn)(unsigned long));
+extern void read_current_timer_delay_loop(unsigned long loops);
 
 #endif /* defined(_ARM_DELAY_H) */
 
diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
index b307fcc..59fdf64 100644
--- a/arch/arm/lib/delay.c
+++ b/arch/arm/lib/delay.c
@@ -5,6 +5,7 @@
  *  Copyright (c) 2010, Code Aurora Forum. All rights reserved.
  *  Copyright (C) 1993 Linus Torvalds
  *  Copyright (C) 1997 Martin Mares <mj@atrey.karlin.mff.cuni.cz>
+ *  Copyright (C) 2005-2006 Atmel Corporation
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -12,6 +13,7 @@
  */
 #include <linux/module.h>
 #include <linux/delay.h>
+#include <linux/timex.h>
 
 /*
  * Oh, if only we had a cycle counter...
@@ -26,6 +28,22 @@ void delay_loop(unsigned long loops)
 	);
 }
 
+#ifdef ARCH_HAS_READ_CURRENT_TIMER
+/*
+ * Assuming read_current_timer() is monotonically increasing
+ * across calls.
+ */
+void read_current_timer_delay_loop(unsigned long loops)
+{
+	unsigned long bclock, now;
+
+	read_current_timer(&bclock);
+	do {
+		read_current_timer(&now);
+	} while ((now - bclock) < loops);
+}
+#endif
+
 static void (*delay_fn)(unsigned long) = delay_loop;
 
 void set_delay_fn(void (*fn)(unsigned long))
-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 3/3] [ARM] Implement a timer based __delay() loop
@ 2010-08-19  2:24   ` Stephen Boyd
  0 siblings, 0 replies; 21+ messages in thread
From: Stephen Boyd @ 2010-08-19  2:24 UTC (permalink / raw)
  To: linux-arm-kernel

udelay() can be incorrect on SMP machines that scale their CPU
frequencies independently of one another (as pointed out here
http://article.gmane.org/gmane.linux.kernel/977567). The delay
loop can either be too fast or too slow depending on which CPU the
loops_per_jiffy counter is calibrated on and which CPU the delay
loop is running on. udelay() can also be incorrect if the
CPU frequency switches during the __delay() loop, causing the loop
to either terminate too early, or too late. 

We'd rather not fix udelay() by forcing it to run on one CPU or
take the penalty of a rather large loops_per_jiffy being used in
udelay() when the CPU is actually running slower. Instead we
solve the problem by making __delay() into a timer based loop
which should be unaffected by CPU frequency scaling. Therefore,
implement a timer based __delay() loop which can be used in place
of the current register based __delay() if a machine so chooses.

The kernel is already prepared for a timer based approach
(evident by the read_current_timer() function). If an arch
implements read_current_timer(), calibrate_delay() will use a
timer based function, calibrate_delay_direct(), to calculate
loops_per_jiffy (in which case loops_per_jiffy should really be
renamed to timer_ticks_per_jiffy). Since the loops_per_jiffy will
be based on timer ticks, __delay() should be implemented as a
loop around read_current_timer().

Doing this makes the expensive loops_per_jiffy calculation go
away (saving ~150ms on boot time on my machine) and fixes
udelay() by making it safe in the face of independently scaling
CPUs. The only prerequisite is that read_current_timer() is
monotonically increasing across calls (and doesn't overflow
within ~2000us).

There is a downside to this approach though. BogoMIPS is no
longer "accurate" in that it reflects the BogoMIPS of the timer
and not the CPU. On most SoC's the timer isn't running anywhere
near as fast as the CPU so BogoMIPS will be ridiculously low (my
timer runs at 4.8 MHz and thus my BogoMIPS is 9.6 compared to my
CPU's 800). This shouldn't be too much of a concern though since
BogoMIPS are bogus anyway (hence the name).

This loop is pretty much a copy of AVR's version made more generic.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
---
 arch/arm/include/asm/delay.h |    1 +
 arch/arm/lib/delay.c         |   18 ++++++++++++++++++
 2 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
index 7c732b5..5c6b9a3 100644
--- a/arch/arm/include/asm/delay.h
+++ b/arch/arm/include/asm/delay.h
@@ -41,6 +41,7 @@ extern void __const_udelay(unsigned long);
 	  __udelay(n))
 
 extern void set_delay_fn(void (*fn)(unsigned long));
+extern void read_current_timer_delay_loop(unsigned long loops);
 
 #endif /* defined(_ARM_DELAY_H) */
 
diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
index b307fcc..59fdf64 100644
--- a/arch/arm/lib/delay.c
+++ b/arch/arm/lib/delay.c
@@ -5,6 +5,7 @@
  *  Copyright (c) 2010, Code Aurora Forum. All rights reserved.
  *  Copyright (C) 1993 Linus Torvalds
  *  Copyright (C) 1997 Martin Mares <mj@atrey.karlin.mff.cuni.cz>
+ *  Copyright (C) 2005-2006 Atmel Corporation
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -12,6 +13,7 @@
  */
 #include <linux/module.h>
 #include <linux/delay.h>
+#include <linux/timex.h>
 
 /*
  * Oh, if only we had a cycle counter...
@@ -26,6 +28,22 @@ void delay_loop(unsigned long loops)
 	);
 }
 
+#ifdef ARCH_HAS_READ_CURRENT_TIMER
+/*
+ * Assuming read_current_timer() is monotonically increasing
+ * across calls.
+ */
+void read_current_timer_delay_loop(unsigned long loops)
+{
+	unsigned long bclock, now;
+
+	read_current_timer(&bclock);
+	do {
+		read_current_timer(&now);
+	} while ((now - bclock) < loops);
+}
+#endif
+
 static void (*delay_fn)(unsigned long) = delay_loop;
 
 void set_delay_fn(void (*fn)(unsigned long))
-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply related	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2010-10-06 13:44 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-09-07 18:23 [PATCH 0/3] Fixing udelay() on SMP (and non-SMP too) Stephen Boyd
2010-09-07 18:23 ` Stephen Boyd
2010-09-07 18:23 ` [PATCH 1/3] [ARM] Translate delay.S into (mostly) C Stephen Boyd
2010-09-07 18:23   ` Stephen Boyd
2010-09-07 18:23 ` [PATCH 2/3] [ARM] Allow machines to override __delay() Stephen Boyd
2010-09-07 18:23   ` Stephen Boyd
2010-09-07 18:23 ` [PATCH 3/3] [ARM] Implement a timer based __delay() loop Stephen Boyd
2010-09-07 18:23   ` Stephen Boyd
2010-09-22 21:54 ` [PATCH 0/3] Fixing udelay() on SMP (and non-SMP too) Daniel Walker
2010-09-22 21:54   ` Daniel Walker
  -- strict thread matches above, loose matches on Subject: below --
2010-09-28  3:33 Stephen Boyd
2010-09-28  3:33 ` [PATCH 3/3] [ARM] Implement a timer based __delay() loop Stephen Boyd
2010-09-28  3:33   ` Stephen Boyd
2010-10-05 17:38   ` Daniel Walker
2010-10-05 17:38     ` Daniel Walker
2010-10-06  3:36     ` Stephen Boyd
2010-10-06  3:36       ` Stephen Boyd
2010-10-06 13:44       ` Daniel Walker
2010-10-06 13:44         ` Daniel Walker
2010-09-04  4:28 [PATCH 0/3] Fixing udelay() on SMP (and non-SMP too) Stephen Boyd
2010-09-04  4:28 ` [PATCH 3/3] [ARM] Implement a timer based __delay() loop Stephen Boyd
2010-08-19  2:24 [PATCH 0/3] Fixing udelay() on SMP (and non-SMP too) Stephen Boyd
2010-08-19  2:24 ` [PATCH 3/3] [ARM] Implement a timer based __delay() loop Stephen Boyd
2010-08-19  2:24   ` Stephen Boyd

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.