All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHv2 0/3] ARM: Fixing udelay() for SMP and non-SMP systems
@ 2010-10-28 21:19 ` Stephen Boyd
  0 siblings, 0 replies; 32+ messages in thread
From: Stephen Boyd @ 2010-10-28 21:19 UTC (permalink / raw)
  To: Russell King
  Cc: linux-kernel, linux-arm-msm, linux-arm-kernel, Daniel Walker,
	Saravana Kannan, Colin Cross, Kevin Hilman, Santosh Shilimkar,
	Nicolas Pitre

These patches are another attempt at fixing the udelay()
issue pointed out on arm-lkml[1][2]. A quick recap: some SMP
machines can scale their CPU frequencies independent of one
another. loops_per_jiffy is calibrated globally and used in
__const_udelay(). If one CPU is running faster than what the
loops_per_jiffy is calculated (or scaled) for, udelay() will
be incorrect and not wait long enough (or too long). A similar
problem occurs if the cpu frequency is scaled during a udelay()
call.

We could fix this issue a couple ways, wholesale replacement
of __udelay() and __const_udelay() (see [2] for that approach),
or replacement of __delay() (this series). Option 1 can fail if
anybody uses udelay() before memory is mapped and also duplicates
most of the code in asm/delay.h. It also needs to hardcode the
timer tick frequency, which can sometimes be inaccurate. The
benefit is that loops_per_jiffy stays the same and thus BogoMIPS
is unchanged.  Option 2 can't fail since the __delay() loop is
replaced after memory is mapped in, but it suffers from a low
BogoMIPS when timers are clocked slowly. It also more accurately
calculates the timer tick frequency through the use of
calibrate_delay_direct().

Changes since v1:
 * likely() in delay.c
 * comment fixup for read_current_timer_delay_loop() 
 * cosmetic improvements to commit text

I didn't add any more comments to delay.c's functions since I
couldn't come up with anything worthwhile to say.

-- Reference --
[1] http://article.gmane.org/gmane.linux.kernel/977567
[2] http://article.gmane.org/gmane.linux.ports.arm.kernel/78496 

Stephen Boyd (3):
  ARM: Translate delay.S into (mostly) C
  ARM: Allow machines to override __delay()
  ARM: Implement a timer based __delay() loop

 arch/arm/include/asm/delay.h |    5 ++-
 arch/arm/kernel/armksyms.c   |    4 --
 arch/arm/lib/delay.S         |   65 ------------------------------
 arch/arm/lib/delay.c         |   90 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 94 insertions(+), 70 deletions(-)
 delete mode 100644 arch/arm/lib/delay.S
 create mode 100644 arch/arm/lib/delay.c

-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCHv2 0/3] ARM: Fixing udelay() for SMP and non-SMP systems
@ 2010-10-28 21:19 ` Stephen Boyd
  0 siblings, 0 replies; 32+ messages in thread
From: Stephen Boyd @ 2010-10-28 21:19 UTC (permalink / raw)
  To: linux-arm-kernel

These patches are another attempt at fixing the udelay()
issue pointed out on arm-lkml[1][2]. A quick recap: some SMP
machines can scale their CPU frequencies independent of one
another. loops_per_jiffy is calibrated globally and used in
__const_udelay(). If one CPU is running faster than what the
loops_per_jiffy is calculated (or scaled) for, udelay() will
be incorrect and not wait long enough (or too long). A similar
problem occurs if the cpu frequency is scaled during a udelay()
call.

We could fix this issue a couple ways, wholesale replacement
of __udelay() and __const_udelay() (see [2] for that approach),
or replacement of __delay() (this series). Option 1 can fail if
anybody uses udelay() before memory is mapped and also duplicates
most of the code in asm/delay.h. It also needs to hardcode the
timer tick frequency, which can sometimes be inaccurate. The
benefit is that loops_per_jiffy stays the same and thus BogoMIPS
is unchanged.  Option 2 can't fail since the __delay() loop is
replaced after memory is mapped in, but it suffers from a low
BogoMIPS when timers are clocked slowly. It also more accurately
calculates the timer tick frequency through the use of
calibrate_delay_direct().

Changes since v1:
 * likely() in delay.c
 * comment fixup for read_current_timer_delay_loop() 
 * cosmetic improvements to commit text

I didn't add any more comments to delay.c's functions since I
couldn't come up with anything worthwhile to say.

-- Reference --
[1] http://article.gmane.org/gmane.linux.kernel/977567
[2] http://article.gmane.org/gmane.linux.ports.arm.kernel/78496 

Stephen Boyd (3):
  ARM: Translate delay.S into (mostly) C
  ARM: Allow machines to override __delay()
  ARM: Implement a timer based __delay() loop

 arch/arm/include/asm/delay.h |    5 ++-
 arch/arm/kernel/armksyms.c   |    4 --
 arch/arm/lib/delay.S         |   65 ------------------------------
 arch/arm/lib/delay.c         |   90 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 94 insertions(+), 70 deletions(-)
 delete mode 100644 arch/arm/lib/delay.S
 create mode 100644 arch/arm/lib/delay.c

-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCHv2 1/3] ARM: Translate delay.S into (mostly) C
  2010-10-28 21:19 ` Stephen Boyd
@ 2010-10-28 21:19   ` Stephen Boyd
  -1 siblings, 0 replies; 32+ messages in thread
From: Stephen Boyd @ 2010-10-28 21:19 UTC (permalink / raw)
  To: Russell King
  Cc: linux-kernel, linux-arm-msm, linux-arm-kernel, Daniel Walker,
	Saravana Kannan, Colin Cross, Kevin Hilman, Santosh Shilimkar,
	Nicolas Pitre

We want to allow machines to override the __delay() implementation
at runtime so they can use a timer based __delay() routine. It's
easier to do this using C, so let's write udelay and friends in C.

We lose the #if 0 code, which according to Russell is used "to
make the delay loop more stable and predictable on older CPUs"
(see http://article.gmane.org/gmane.linux.kernel/888867 for more
info). We shouldn't be too worried though, since we'll soon add
functionality allowing a machine to set the __delay() loop
themselves, thus allowing machines to resurrect the commented out
code should they need it.

Nico expressed concern that fixed lpj cmdlines will break due to
compiler optimizations. That doesn't seem to be the case since
before and after this patch I get the same lpj value when running
my CPU at 19.2 MHz. That should be sufficiently slow enough to
cover any machine running Linux.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
---
 arch/arm/include/asm/delay.h |    2 +-
 arch/arm/kernel/armksyms.c   |    4 --
 arch/arm/lib/delay.S         |   65 ------------------------------------------
 arch/arm/lib/delay.c         |   57 ++++++++++++++++++++++++++++++++++++
 4 files changed, 58 insertions(+), 70 deletions(-)
 delete mode 100644 arch/arm/lib/delay.S
 create mode 100644 arch/arm/lib/delay.c

diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
index b2deda1..ccc5ed5 100644
--- a/arch/arm/include/asm/delay.h
+++ b/arch/arm/include/asm/delay.h
@@ -8,7 +8,7 @@
 
 #include <asm/param.h>	/* HZ */
 
-extern void __delay(int loops);
+extern void __delay(unsigned long loops);
 
 /*
  * This function intentionally does not exist; if you see references to
diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
index e5e1e538..220dce6 100644
--- a/arch/arm/kernel/armksyms.c
+++ b/arch/arm/kernel/armksyms.c
@@ -52,10 +52,6 @@ extern void fpundefinstr(void);
 
 EXPORT_SYMBOL(__backtrace);
 
-	/* platform dependent support */
-EXPORT_SYMBOL(__udelay);
-EXPORT_SYMBOL(__const_udelay);
-
 	/* networking */
 EXPORT_SYMBOL(csum_partial);
 EXPORT_SYMBOL(csum_partial_copy_from_user);
diff --git a/arch/arm/lib/delay.S b/arch/arm/lib/delay.S
deleted file mode 100644
index 8d6a876..0000000
--- a/arch/arm/lib/delay.S
+++ /dev/null
@@ -1,65 +0,0 @@
-/*
- *  linux/arch/arm/lib/delay.S
- *
- *  Copyright (C) 1995, 1996 Russell King
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- */
-#include <linux/linkage.h>
-#include <asm/assembler.h>
-#include <asm/param.h>
-		.text
-
-.LC0:		.word	loops_per_jiffy
-.LC1:		.word	(2199023*HZ)>>11
-
-/*
- * r0  <= 2000
- * lpj <= 0x01ffffff (max. 3355 bogomips)
- * HZ  <= 1000
- */
-
-ENTRY(__udelay)
-		ldr	r2, .LC1
-		mul	r0, r2, r0
-ENTRY(__const_udelay)				@ 0 <= r0 <= 0x7fffff06
-		ldr	r2, .LC0
-		ldr	r2, [r2]		@ max = 0x01ffffff
-		mov	r0, r0, lsr #14		@ max = 0x0001ffff
-		mov	r2, r2, lsr #10		@ max = 0x00007fff
-		mul	r0, r2, r0		@ max = 2^32-1
-		movs	r0, r0, lsr #6
-		moveq	pc, lr
-
-/*
- * loops = r0 * HZ * loops_per_jiffy / 1000000
- *
- * Oh, if only we had a cycle counter...
- */
-
-@ Delay routine
-ENTRY(__delay)
-		subs	r0, r0, #1
-#if 0
-		movls	pc, lr
-		subs	r0, r0, #1
-		movls	pc, lr
-		subs	r0, r0, #1
-		movls	pc, lr
-		subs	r0, r0, #1
-		movls	pc, lr
-		subs	r0, r0, #1
-		movls	pc, lr
-		subs	r0, r0, #1
-		movls	pc, lr
-		subs	r0, r0, #1
-		movls	pc, lr
-		subs	r0, r0, #1
-#endif
-		bhi	__delay
-		mov	pc, lr
-ENDPROC(__udelay)
-ENDPROC(__const_udelay)
-ENDPROC(__delay)
diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
new file mode 100644
index 0000000..1ab117d
--- /dev/null
+++ b/arch/arm/lib/delay.c
@@ -0,0 +1,57 @@
+/*
+ *  Originally from linux/arch/arm/lib/delay.S
+ *
+ *  Copyright (C) 1995, 1996 Russell King
+ *  Copyright (c) 2010, Code Aurora Forum. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <linux/module.h>
+#include <linux/delay.h>
+
+/*
+ * loops = usecs * HZ * loops_per_jiffy / 1000000
+ *
+ * Oh, if only we had a cycle counter...
+ */
+void __delay(unsigned long loops)
+{
+	asm volatile(
+	"1:	subs %0, %0, #1 \n"
+	"	bhi 1b		\n"
+	: /* No output */
+	: "r" (loops)
+	);
+}
+EXPORT_SYMBOL(__delay);
+
+/*
+ * 0 <= xloops <= 0x7fffff06
+ * loops_per_jiffy <= 0x01ffffff (max. 3355 bogomips)
+ */
+void __const_udelay(unsigned long xloops)
+{
+	unsigned long lpj;
+	unsigned long loops;
+
+	xloops >>= 14;			/* max = 0x01ffffff */
+	lpj = loops_per_jiffy >> 10;	/* max = 0x0001ffff */
+	loops = lpj * xloops;		/* max = 0x00007fff */
+	loops >>= 6;			/* max = 2^32-1 */
+
+	if (likely(loops))
+		__delay(loops);
+}
+EXPORT_SYMBOL(__const_udelay);
+
+/*
+ * usecs  <= 2000
+ * HZ  <= 1000
+ */
+void __udelay(unsigned long usecs)
+{
+	__const_udelay(usecs * ((2199023*HZ)>>11));
+}
+EXPORT_SYMBOL(__udelay);
-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCHv2 1/3] ARM: Translate delay.S into (mostly) C
@ 2010-10-28 21:19   ` Stephen Boyd
  0 siblings, 0 replies; 32+ messages in thread
From: Stephen Boyd @ 2010-10-28 21:19 UTC (permalink / raw)
  To: linux-arm-kernel

We want to allow machines to override the __delay() implementation
at runtime so they can use a timer based __delay() routine. It's
easier to do this using C, so let's write udelay and friends in C.

We lose the #if 0 code, which according to Russell is used "to
make the delay loop more stable and predictable on older CPUs"
(see http://article.gmane.org/gmane.linux.kernel/888867 for more
info). We shouldn't be too worried though, since we'll soon add
functionality allowing a machine to set the __delay() loop
themselves, thus allowing machines to resurrect the commented out
code should they need it.

Nico expressed concern that fixed lpj cmdlines will break due to
compiler optimizations. That doesn't seem to be the case since
before and after this patch I get the same lpj value when running
my CPU at 19.2 MHz. That should be sufficiently slow enough to
cover any machine running Linux.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
---
 arch/arm/include/asm/delay.h |    2 +-
 arch/arm/kernel/armksyms.c   |    4 --
 arch/arm/lib/delay.S         |   65 ------------------------------------------
 arch/arm/lib/delay.c         |   57 ++++++++++++++++++++++++++++++++++++
 4 files changed, 58 insertions(+), 70 deletions(-)
 delete mode 100644 arch/arm/lib/delay.S
 create mode 100644 arch/arm/lib/delay.c

diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
index b2deda1..ccc5ed5 100644
--- a/arch/arm/include/asm/delay.h
+++ b/arch/arm/include/asm/delay.h
@@ -8,7 +8,7 @@
 
 #include <asm/param.h>	/* HZ */
 
-extern void __delay(int loops);
+extern void __delay(unsigned long loops);
 
 /*
  * This function intentionally does not exist; if you see references to
diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
index e5e1e538..220dce6 100644
--- a/arch/arm/kernel/armksyms.c
+++ b/arch/arm/kernel/armksyms.c
@@ -52,10 +52,6 @@ extern void fpundefinstr(void);
 
 EXPORT_SYMBOL(__backtrace);
 
-	/* platform dependent support */
-EXPORT_SYMBOL(__udelay);
-EXPORT_SYMBOL(__const_udelay);
-
 	/* networking */
 EXPORT_SYMBOL(csum_partial);
 EXPORT_SYMBOL(csum_partial_copy_from_user);
diff --git a/arch/arm/lib/delay.S b/arch/arm/lib/delay.S
deleted file mode 100644
index 8d6a876..0000000
--- a/arch/arm/lib/delay.S
+++ /dev/null
@@ -1,65 +0,0 @@
-/*
- *  linux/arch/arm/lib/delay.S
- *
- *  Copyright (C) 1995, 1996 Russell King
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- */
-#include <linux/linkage.h>
-#include <asm/assembler.h>
-#include <asm/param.h>
-		.text
-
-.LC0:		.word	loops_per_jiffy
-.LC1:		.word	(2199023*HZ)>>11
-
-/*
- * r0  <= 2000
- * lpj <= 0x01ffffff (max. 3355 bogomips)
- * HZ  <= 1000
- */
-
-ENTRY(__udelay)
-		ldr	r2, .LC1
-		mul	r0, r2, r0
-ENTRY(__const_udelay)				@ 0 <= r0 <= 0x7fffff06
-		ldr	r2, .LC0
-		ldr	r2, [r2]		@ max = 0x01ffffff
-		mov	r0, r0, lsr #14		@ max = 0x0001ffff
-		mov	r2, r2, lsr #10		@ max = 0x00007fff
-		mul	r0, r2, r0		@ max = 2^32-1
-		movs	r0, r0, lsr #6
-		moveq	pc, lr
-
-/*
- * loops = r0 * HZ * loops_per_jiffy / 1000000
- *
- * Oh, if only we had a cycle counter...
- */
-
-@ Delay routine
-ENTRY(__delay)
-		subs	r0, r0, #1
-#if 0
-		movls	pc, lr
-		subs	r0, r0, #1
-		movls	pc, lr
-		subs	r0, r0, #1
-		movls	pc, lr
-		subs	r0, r0, #1
-		movls	pc, lr
-		subs	r0, r0, #1
-		movls	pc, lr
-		subs	r0, r0, #1
-		movls	pc, lr
-		subs	r0, r0, #1
-		movls	pc, lr
-		subs	r0, r0, #1
-#endif
-		bhi	__delay
-		mov	pc, lr
-ENDPROC(__udelay)
-ENDPROC(__const_udelay)
-ENDPROC(__delay)
diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
new file mode 100644
index 0000000..1ab117d
--- /dev/null
+++ b/arch/arm/lib/delay.c
@@ -0,0 +1,57 @@
+/*
+ *  Originally from linux/arch/arm/lib/delay.S
+ *
+ *  Copyright (C) 1995, 1996 Russell King
+ *  Copyright (c) 2010, Code Aurora Forum. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <linux/module.h>
+#include <linux/delay.h>
+
+/*
+ * loops = usecs * HZ * loops_per_jiffy / 1000000
+ *
+ * Oh, if only we had a cycle counter...
+ */
+void __delay(unsigned long loops)
+{
+	asm volatile(
+	"1:	subs %0, %0, #1 \n"
+	"	bhi 1b		\n"
+	: /* No output */
+	: "r" (loops)
+	);
+}
+EXPORT_SYMBOL(__delay);
+
+/*
+ * 0 <= xloops <= 0x7fffff06
+ * loops_per_jiffy <= 0x01ffffff (max. 3355 bogomips)
+ */
+void __const_udelay(unsigned long xloops)
+{
+	unsigned long lpj;
+	unsigned long loops;
+
+	xloops >>= 14;			/* max = 0x01ffffff */
+	lpj = loops_per_jiffy >> 10;	/* max = 0x0001ffff */
+	loops = lpj * xloops;		/* max = 0x00007fff */
+	loops >>= 6;			/* max = 2^32-1 */
+
+	if (likely(loops))
+		__delay(loops);
+}
+EXPORT_SYMBOL(__const_udelay);
+
+/*
+ * usecs  <= 2000
+ * HZ  <= 1000
+ */
+void __udelay(unsigned long usecs)
+{
+	__const_udelay(usecs * ((2199023*HZ)>>11));
+}
+EXPORT_SYMBOL(__udelay);
-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCHv2 2/3] ARM: Allow machines to override __delay()
  2010-10-28 21:19 ` Stephen Boyd
@ 2010-10-28 21:19   ` Stephen Boyd
  -1 siblings, 0 replies; 32+ messages in thread
From: Stephen Boyd @ 2010-10-28 21:19 UTC (permalink / raw)
  To: Russell King
  Cc: linux-kernel, linux-arm-msm, linux-arm-kernel, Daniel Walker,
	Saravana Kannan, Colin Cross, Kevin Hilman, Santosh Shilimkar,
	Nicolas Pitre

Some machines want to implement their own __delay() routine based
on fixed rate timers. Expose functionality to set the __delay()
routine at runtime. This should allow two machines with different
__delay() routines to happily co-exist within the same kernel
with minimal overhead.

Russell expressed concern that using a timer based __delay()
would cause problems when an iomapped device isn't mapped in
prior to a delay call being made (see
http://article.gmane.org/gmane.linux.ports.arm.kernel/78543 for
more info). We can sidestep that issue with this approach since
the __delay() routine _should_ only be pointed to a timer based
delay once the timer has been properly mapped. Up until that
point __delay() and udelay() will use delay_loop() which is
always safe to call.

This patch is inspired by x86's delay.c

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
---
 arch/arm/include/asm/delay.h |    2 ++
 arch/arm/lib/delay.c         |   21 ++++++++++++++++++---
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
index ccc5ed5..7c732b5 100644
--- a/arch/arm/include/asm/delay.h
+++ b/arch/arm/include/asm/delay.h
@@ -40,5 +40,7 @@ extern void __const_udelay(unsigned long);
 			__const_udelay((n) * ((2199023U*HZ)>>11))) :	\
 	  __udelay(n))
 
+extern void set_delay_fn(void (*fn)(unsigned long));
+
 #endif /* defined(_ARM_DELAY_H) */
 
diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
index 1ab117d..116a853 100644
--- a/arch/arm/lib/delay.c
+++ b/arch/arm/lib/delay.c
@@ -3,6 +3,8 @@
  *
  *  Copyright (C) 1995, 1996 Russell King
  *  Copyright (c) 2010, Code Aurora Forum. All rights reserved.
+ *  Copyright (C) 1993 Linus Torvalds
+ *  Copyright (C) 1997 Martin Mares <mj@atrey.karlin.mff.cuni.cz>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -12,11 +14,9 @@
 #include <linux/delay.h>
 
 /*
- * loops = usecs * HZ * loops_per_jiffy / 1000000
- *
  * Oh, if only we had a cycle counter...
  */
-void __delay(unsigned long loops)
+void delay_loop(unsigned long loops)
 {
 	asm volatile(
 	"1:	subs %0, %0, #1 \n"
@@ -25,6 +25,21 @@ void __delay(unsigned long loops)
 	: "r" (loops)
 	);
 }
+
+static void (*delay_fn)(unsigned long) = delay_loop;
+
+void set_delay_fn(void (*fn)(unsigned long))
+{
+	delay_fn = fn;
+}
+
+/*
+ * loops = usecs * HZ * loops_per_jiffy / 1000000
+ */
+void __delay(unsigned long loops)
+{
+	delay_fn(loops);
+}
 EXPORT_SYMBOL(__delay);
 
 /*
-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCHv2 2/3] ARM: Allow machines to override __delay()
@ 2010-10-28 21:19   ` Stephen Boyd
  0 siblings, 0 replies; 32+ messages in thread
From: Stephen Boyd @ 2010-10-28 21:19 UTC (permalink / raw)
  To: linux-arm-kernel

Some machines want to implement their own __delay() routine based
on fixed rate timers. Expose functionality to set the __delay()
routine at runtime. This should allow two machines with different
__delay() routines to happily co-exist within the same kernel
with minimal overhead.

Russell expressed concern that using a timer based __delay()
would cause problems when an iomapped device isn't mapped in
prior to a delay call being made (see
http://article.gmane.org/gmane.linux.ports.arm.kernel/78543 for
more info). We can sidestep that issue with this approach since
the __delay() routine _should_ only be pointed to a timer based
delay once the timer has been properly mapped. Up until that
point __delay() and udelay() will use delay_loop() which is
always safe to call.

This patch is inspired by x86's delay.c

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
---
 arch/arm/include/asm/delay.h |    2 ++
 arch/arm/lib/delay.c         |   21 ++++++++++++++++++---
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
index ccc5ed5..7c732b5 100644
--- a/arch/arm/include/asm/delay.h
+++ b/arch/arm/include/asm/delay.h
@@ -40,5 +40,7 @@ extern void __const_udelay(unsigned long);
 			__const_udelay((n) * ((2199023U*HZ)>>11))) :	\
 	  __udelay(n))
 
+extern void set_delay_fn(void (*fn)(unsigned long));
+
 #endif /* defined(_ARM_DELAY_H) */
 
diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
index 1ab117d..116a853 100644
--- a/arch/arm/lib/delay.c
+++ b/arch/arm/lib/delay.c
@@ -3,6 +3,8 @@
  *
  *  Copyright (C) 1995, 1996 Russell King
  *  Copyright (c) 2010, Code Aurora Forum. All rights reserved.
+ *  Copyright (C) 1993 Linus Torvalds
+ *  Copyright (C) 1997 Martin Mares <mj@atrey.karlin.mff.cuni.cz>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -12,11 +14,9 @@
 #include <linux/delay.h>
 
 /*
- * loops = usecs * HZ * loops_per_jiffy / 1000000
- *
  * Oh, if only we had a cycle counter...
  */
-void __delay(unsigned long loops)
+void delay_loop(unsigned long loops)
 {
 	asm volatile(
 	"1:	subs %0, %0, #1 \n"
@@ -25,6 +25,21 @@ void __delay(unsigned long loops)
 	: "r" (loops)
 	);
 }
+
+static void (*delay_fn)(unsigned long) = delay_loop;
+
+void set_delay_fn(void (*fn)(unsigned long))
+{
+	delay_fn = fn;
+}
+
+/*
+ * loops = usecs * HZ * loops_per_jiffy / 1000000
+ */
+void __delay(unsigned long loops)
+{
+	delay_fn(loops);
+}
 EXPORT_SYMBOL(__delay);
 
 /*
-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCHv2 3/3] ARM: Implement a timer based __delay() loop
  2010-10-28 21:19 ` Stephen Boyd
@ 2010-10-28 21:19   ` Stephen Boyd
  -1 siblings, 0 replies; 32+ messages in thread
From: Stephen Boyd @ 2010-10-28 21:19 UTC (permalink / raw)
  To: Russell King
  Cc: linux-kernel, linux-arm-msm, linux-arm-kernel, Daniel Walker,
	Saravana Kannan, Colin Cross, Kevin Hilman, Santosh Shilimkar,
	Nicolas Pitre

udelay() can be incorrect on SMP machines that scale their CPU
frequencies independently of one another (as pointed out here
http://article.gmane.org/gmane.linux.kernel/977567). The delay
loop can either be too fast or too slow depending on which CPU the
loops_per_jiffy counter is calibrated on and which CPU the delay
loop is running on. udelay() can also be incorrect if the
CPU frequency switches during the __delay() loop, causing the loop
to either terminate too early, or too late.

Forcing udelay() to run on one CPU is unreasonable and taking the
penalty of a rather large loops_per_jiffy in udelay() when the
CPU is actually running slower is bad for performance. Solve the
problem by adding a timer based__delay() loop unaffected by CPU
frequency scaling. Machines should set this loop as their
__delay() implementation by calling set_timer_fn() during their
timer initialization.

The kernel is already prepared for a timer based approach
(evident by the read_current_timer() function). If an arch
implements read_current_timer(), calibrate_delay() will use
calibrate_delay_direct() to calculate loops_per_jiffy (in which
case loops_per_jiffy should really be renamed to
timer_ticks_per_jiffy). Since the loops_per_jiffy will be based
on timer ticks, __delay() should be implemented as a loop around
read_current_timer().

Doing this makes the expensive loops_per_jiffy calculation go
away (saving ~150ms on boot time on my machine) and fixes
udelay() by making it safe in the face of independently scaling
CPUs. The only prerequisite is that read_current_timer() is
monotonically increasing across calls (and doesn't overflow
within ~2000us).

There is a downside to this approach though. BogoMIPS is no
longer "accurate" in that it reflects the BogoMIPS of the timer
and not the CPU. On most SoC's the timer isn't running anywhere
near as fast as the CPU so BogoMIPS will be ridiculously low (my
timer runs at 4.8 MHz and thus my BogoMIPS is 9.6 compared to my
CPU's 800). This shouldn't be too much of a concern though since
BogoMIPS are bogus anyway (hence the name).

This loop is pretty much a copy of AVR's version.

Reported-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
---
 arch/arm/include/asm/delay.h |    1 +
 arch/arm/lib/delay.c         |   18 ++++++++++++++++++
 2 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
index 7c732b5..5c6b9a3 100644
--- a/arch/arm/include/asm/delay.h
+++ b/arch/arm/include/asm/delay.h
@@ -41,6 +41,7 @@ extern void __const_udelay(unsigned long);
 	  __udelay(n))
 
 extern void set_delay_fn(void (*fn)(unsigned long));
+extern void read_current_timer_delay_loop(unsigned long loops);
 
 #endif /* defined(_ARM_DELAY_H) */
 
diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
index 116a853..c92ae49 100644
--- a/arch/arm/lib/delay.c
+++ b/arch/arm/lib/delay.c
@@ -5,6 +5,7 @@
  *  Copyright (c) 2010, Code Aurora Forum. All rights reserved.
  *  Copyright (C) 1993 Linus Torvalds
  *  Copyright (C) 1997 Martin Mares <mj@atrey.karlin.mff.cuni.cz>
+ *  Copyright (C) 2005-2006 Atmel Corporation
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -12,6 +13,7 @@
  */
 #include <linux/module.h>
 #include <linux/delay.h>
+#include <linux/timex.h>
 
 /*
  * Oh, if only we had a cycle counter...
@@ -26,6 +28,22 @@ void delay_loop(unsigned long loops)
 	);
 }
 
+#ifdef ARCH_HAS_READ_CURRENT_TIMER
+/*
+ * Assumes read_current_timer() is monotonically increasing
+ * across calls and wraps at most once within MAX_UDELAY_MS.
+ */
+void read_current_timer_delay_loop(unsigned long loops)
+{
+	unsigned long bclock, now;
+
+	read_current_timer(&bclock);
+	do {
+		read_current_timer(&now);
+	} while ((now - bclock) < loops);
+}
+#endif
+
 static void (*delay_fn)(unsigned long) = delay_loop;
 
 void set_delay_fn(void (*fn)(unsigned long))
-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCHv2 3/3] ARM: Implement a timer based __delay() loop
@ 2010-10-28 21:19   ` Stephen Boyd
  0 siblings, 0 replies; 32+ messages in thread
From: Stephen Boyd @ 2010-10-28 21:19 UTC (permalink / raw)
  To: linux-arm-kernel

udelay() can be incorrect on SMP machines that scale their CPU
frequencies independently of one another (as pointed out here
http://article.gmane.org/gmane.linux.kernel/977567). The delay
loop can either be too fast or too slow depending on which CPU the
loops_per_jiffy counter is calibrated on and which CPU the delay
loop is running on. udelay() can also be incorrect if the
CPU frequency switches during the __delay() loop, causing the loop
to either terminate too early, or too late.

Forcing udelay() to run on one CPU is unreasonable and taking the
penalty of a rather large loops_per_jiffy in udelay() when the
CPU is actually running slower is bad for performance. Solve the
problem by adding a timer based__delay() loop unaffected by CPU
frequency scaling. Machines should set this loop as their
__delay() implementation by calling set_timer_fn() during their
timer initialization.

The kernel is already prepared for a timer based approach
(evident by the read_current_timer() function). If an arch
implements read_current_timer(), calibrate_delay() will use
calibrate_delay_direct() to calculate loops_per_jiffy (in which
case loops_per_jiffy should really be renamed to
timer_ticks_per_jiffy). Since the loops_per_jiffy will be based
on timer ticks, __delay() should be implemented as a loop around
read_current_timer().

Doing this makes the expensive loops_per_jiffy calculation go
away (saving ~150ms on boot time on my machine) and fixes
udelay() by making it safe in the face of independently scaling
CPUs. The only prerequisite is that read_current_timer() is
monotonically increasing across calls (and doesn't overflow
within ~2000us).

There is a downside to this approach though. BogoMIPS is no
longer "accurate" in that it reflects the BogoMIPS of the timer
and not the CPU. On most SoC's the timer isn't running anywhere
near as fast as the CPU so BogoMIPS will be ridiculously low (my
timer runs at 4.8 MHz and thus my BogoMIPS is 9.6 compared to my
CPU's 800). This shouldn't be too much of a concern though since
BogoMIPS are bogus anyway (hence the name).

This loop is pretty much a copy of AVR's version.

Reported-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
---
 arch/arm/include/asm/delay.h |    1 +
 arch/arm/lib/delay.c         |   18 ++++++++++++++++++
 2 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
index 7c732b5..5c6b9a3 100644
--- a/arch/arm/include/asm/delay.h
+++ b/arch/arm/include/asm/delay.h
@@ -41,6 +41,7 @@ extern void __const_udelay(unsigned long);
 	  __udelay(n))
 
 extern void set_delay_fn(void (*fn)(unsigned long));
+extern void read_current_timer_delay_loop(unsigned long loops);
 
 #endif /* defined(_ARM_DELAY_H) */
 
diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
index 116a853..c92ae49 100644
--- a/arch/arm/lib/delay.c
+++ b/arch/arm/lib/delay.c
@@ -5,6 +5,7 @@
  *  Copyright (c) 2010, Code Aurora Forum. All rights reserved.
  *  Copyright (C) 1993 Linus Torvalds
  *  Copyright (C) 1997 Martin Mares <mj@atrey.karlin.mff.cuni.cz>
+ *  Copyright (C) 2005-2006 Atmel Corporation
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -12,6 +13,7 @@
  */
 #include <linux/module.h>
 #include <linux/delay.h>
+#include <linux/timex.h>
 
 /*
  * Oh, if only we had a cycle counter...
@@ -26,6 +28,22 @@ void delay_loop(unsigned long loops)
 	);
 }
 
+#ifdef ARCH_HAS_READ_CURRENT_TIMER
+/*
+ * Assumes read_current_timer() is monotonically increasing
+ * across calls and wraps at most once within MAX_UDELAY_MS.
+ */
+void read_current_timer_delay_loop(unsigned long loops)
+{
+	unsigned long bclock, now;
+
+	read_current_timer(&bclock);
+	do {
+		read_current_timer(&now);
+	} while ((now - bclock) < loops);
+}
+#endif
+
 static void (*delay_fn)(unsigned long) = delay_loop;
 
 void set_delay_fn(void (*fn)(unsigned long))
-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCHv2 1/3] ARM: Translate delay.S into (mostly) C
  2010-10-28 21:19   ` Stephen Boyd
@ 2010-11-03 17:57     ` Daniel Walker
  -1 siblings, 0 replies; 32+ messages in thread
From: Daniel Walker @ 2010-11-03 17:57 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Stephen Boyd, Russell King, linux-kernel, linux-arm-msm,
	linux-arm-kernel, Saravana Kannan, Colin Cross, Kevin Hilman,
	Santosh Shilimkar

On Thu, 2010-10-28 at 14:19 -0700, Stephen Boyd wrote:
> We want to allow machines to override the __delay() implementation
> at runtime so they can use a timer based __delay() routine. It's
> easier to do this using C, so let's write udelay and friends in C.
> 
> We lose the #if 0 code, which according to Russell is used "to
> make the delay loop more stable and predictable on older CPUs"
> (see http://article.gmane.org/gmane.linux.kernel/888867 for more
> info). We shouldn't be too worried though, since we'll soon add
> functionality allowing a machine to set the __delay() loop
> themselves, thus allowing machines to resurrect the commented out
> code should they need it.
> 
> Nico expressed concern that fixed lpj cmdlines will break due to
> compiler optimizations. That doesn't seem to be the case since
> before and after this patch I get the same lpj value when running
> my CPU at 19.2 MHz. That should be sufficiently slow enough to
> cover any machine running Linux.

Nico, are you ready to sign off on this?

Daniel

-- 

Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCHv2 1/3] ARM: Translate delay.S into (mostly) C
@ 2010-11-03 17:57     ` Daniel Walker
  0 siblings, 0 replies; 32+ messages in thread
From: Daniel Walker @ 2010-11-03 17:57 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 2010-10-28 at 14:19 -0700, Stephen Boyd wrote:
> We want to allow machines to override the __delay() implementation
> at runtime so they can use a timer based __delay() routine. It's
> easier to do this using C, so let's write udelay and friends in C.
> 
> We lose the #if 0 code, which according to Russell is used "to
> make the delay loop more stable and predictable on older CPUs"
> (see http://article.gmane.org/gmane.linux.kernel/888867 for more
> info). We shouldn't be too worried though, since we'll soon add
> functionality allowing a machine to set the __delay() loop
> themselves, thus allowing machines to resurrect the commented out
> code should they need it.
> 
> Nico expressed concern that fixed lpj cmdlines will break due to
> compiler optimizations. That doesn't seem to be the case since
> before and after this patch I get the same lpj value when running
> my CPU at 19.2 MHz. That should be sufficiently slow enough to
> cover any machine running Linux.

Nico, are you ready to sign off on this?

Daniel

-- 

Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCHv2 1/3] ARM: Translate delay.S into (mostly) C
  2010-10-28 21:19   ` Stephen Boyd
@ 2010-11-03 18:27     ` Will Deacon
  -1 siblings, 0 replies; 32+ messages in thread
From: Will Deacon @ 2010-11-03 18:27 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Russell King, Daniel Walker, Nicolas Pitre, Kevin Hilman,
	linux-arm-msm, linux-kernel, Saravana Kannan, Santosh Shilimkar,
	Colin Cross, linux-arm-kernel

Hi Stephen,

On Thu, 2010-10-28 at 21:19 +0000, Stephen Boyd wrote:
> Nico expressed concern that fixed lpj cmdlines will break due to
> compiler optimizations. That doesn't seem to be the case since
> before and after this patch I get the same lpj value when running
> my CPU at 19.2 MHz. That should be sufficiently slow enough to
> cover any machine running Linux.

I appreciate this is an exceptional case, but there are some lucky
guys at ARM who (as routinely as they can) boot Linux on sub 1MHz
hardware. The delay loop is something they're keen to avoid so they do
make use of the lpj= command line option and would rather it didn't
break on them.

Will

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCHv2 1/3] ARM: Translate delay.S into (mostly) C
@ 2010-11-03 18:27     ` Will Deacon
  0 siblings, 0 replies; 32+ messages in thread
From: Will Deacon @ 2010-11-03 18:27 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Stephen,

On Thu, 2010-10-28 at 21:19 +0000, Stephen Boyd wrote:
> Nico expressed concern that fixed lpj cmdlines will break due to
> compiler optimizations. That doesn't seem to be the case since
> before and after this patch I get the same lpj value when running
> my CPU at 19.2 MHz. That should be sufficiently slow enough to
> cover any machine running Linux.

I appreciate this is an exceptional case, but there are some lucky
guys at ARM who (as routinely as they can) boot Linux on sub 1MHz
hardware. The delay loop is something they're keen to avoid so they do
make use of the lpj= command line option and would rather it didn't
break on them.

Will

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCHv2 1/3] ARM: Translate delay.S into (mostly) C
  2010-11-03 18:27     ` Will Deacon
@ 2010-11-03 23:15       ` Stephen Boyd
  -1 siblings, 0 replies; 32+ messages in thread
From: Stephen Boyd @ 2010-11-03 23:15 UTC (permalink / raw)
  To: Will Deacon
  Cc: Russell King, Daniel Walker, Nicolas Pitre, Kevin Hilman,
	linux-arm-msm, linux-kernel, Saravana Kannan, Santosh Shilimkar,
	Colin Cross, linux-arm-kernel

On 11/03/2010 11:27 AM, Will Deacon wrote:
> Hi Stephen,
>
> On Thu, 2010-10-28 at 21:19 +0000, Stephen Boyd wrote:
>> Nico expressed concern that fixed lpj cmdlines will break due to
>> compiler optimizations. That doesn't seem to be the case since
>> before and after this patch I get the same lpj value when running
>> my CPU at 19.2 MHz. That should be sufficiently slow enough to
>> cover any machine running Linux.
>
> I appreciate this is an exceptional case, but there are some lucky
> guys at ARM who (as routinely as they can) boot Linux on sub 1MHz
> hardware. The delay loop is something they're keen to avoid so they do
> make use of the lpj= command line option and would rather it didn't
> break on them.

Do you know if it breaks at that frequency? I don't have any hardware to
test with that goes lower than the stated 19.2 MHz.

-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCHv2 1/3] ARM: Translate delay.S into (mostly) C
@ 2010-11-03 23:15       ` Stephen Boyd
  0 siblings, 0 replies; 32+ messages in thread
From: Stephen Boyd @ 2010-11-03 23:15 UTC (permalink / raw)
  To: linux-arm-kernel

On 11/03/2010 11:27 AM, Will Deacon wrote:
> Hi Stephen,
>
> On Thu, 2010-10-28 at 21:19 +0000, Stephen Boyd wrote:
>> Nico expressed concern that fixed lpj cmdlines will break due to
>> compiler optimizations. That doesn't seem to be the case since
>> before and after this patch I get the same lpj value when running
>> my CPU at 19.2 MHz. That should be sufficiently slow enough to
>> cover any machine running Linux.
>
> I appreciate this is an exceptional case, but there are some lucky
> guys at ARM who (as routinely as they can) boot Linux on sub 1MHz
> hardware. The delay loop is something they're keen to avoid so they do
> make use of the lpj= command line option and would rather it didn't
> break on them.

Do you know if it breaks at that frequency? I don't have any hardware to
test with that goes lower than the stated 19.2 MHz.

-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCHv2 1/3] ARM: Translate delay.S into (mostly) C
  2010-11-03 23:15       ` Stephen Boyd
@ 2010-11-03 23:17         ` Daniel Walker
  -1 siblings, 0 replies; 32+ messages in thread
From: Daniel Walker @ 2010-11-03 23:17 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Will Deacon, Russell King, Nicolas Pitre, Kevin Hilman,
	linux-arm-msm, linux-kernel, Saravana Kannan, Santosh Shilimkar,
	Colin Cross, linux-arm-kernel

On Wed, 2010-11-03 at 16:15 -0700, Stephen Boyd wrote:
> On 11/03/2010 11:27 AM, Will Deacon wrote:
> > Hi Stephen,
> >
> > On Thu, 2010-10-28 at 21:19 +0000, Stephen Boyd wrote:
> >> Nico expressed concern that fixed lpj cmdlines will break due to
> >> compiler optimizations. That doesn't seem to be the case since
> >> before and after this patch I get the same lpj value when running
> >> my CPU at 19.2 MHz. That should be sufficiently slow enough to
> >> cover any machine running Linux.
> >
> > I appreciate this is an exceptional case, but there are some lucky
> > guys at ARM who (as routinely as they can) boot Linux on sub 1MHz
> > hardware. The delay loop is something they're keen to avoid so they do
> > make use of the lpj= command line option and would rather it didn't
> > break on them.
> 
> Do you know if it breaks at that frequency? I don't have any hardware to
> test with that goes lower than the stated 19.2 MHz.

Isn't it possible that a new compiler could optimize the code
differently, and then end up breaking lpj= ?

Daniel

-- 

Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCHv2 1/3] ARM: Translate delay.S into (mostly) C
@ 2010-11-03 23:17         ` Daniel Walker
  0 siblings, 0 replies; 32+ messages in thread
From: Daniel Walker @ 2010-11-03 23:17 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 2010-11-03 at 16:15 -0700, Stephen Boyd wrote:
> On 11/03/2010 11:27 AM, Will Deacon wrote:
> > Hi Stephen,
> >
> > On Thu, 2010-10-28 at 21:19 +0000, Stephen Boyd wrote:
> >> Nico expressed concern that fixed lpj cmdlines will break due to
> >> compiler optimizations. That doesn't seem to be the case since
> >> before and after this patch I get the same lpj value when running
> >> my CPU at 19.2 MHz. That should be sufficiently slow enough to
> >> cover any machine running Linux.
> >
> > I appreciate this is an exceptional case, but there are some lucky
> > guys at ARM who (as routinely as they can) boot Linux on sub 1MHz
> > hardware. The delay loop is something they're keen to avoid so they do
> > make use of the lpj= command line option and would rather it didn't
> > break on them.
> 
> Do you know if it breaks at that frequency? I don't have any hardware to
> test with that goes lower than the stated 19.2 MHz.

Isn't it possible that a new compiler could optimize the code
differently, and then end up breaking lpj= ?

Daniel

-- 

Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCHv2 1/3] ARM: Translate delay.S into (mostly) C
  2010-11-03 17:57     ` Daniel Walker
@ 2010-11-04  2:40       ` Nicolas Pitre
  -1 siblings, 0 replies; 32+ messages in thread
From: Nicolas Pitre @ 2010-11-04  2:40 UTC (permalink / raw)
  To: Daniel Walker
  Cc: Stephen Boyd, Russell King, linux-kernel, linux-arm-msm,
	linux-arm-kernel, Saravana Kannan, Colin Cross, Kevin Hilman,
	Santosh Shilimkar

On Wed, 3 Nov 2010, Daniel Walker wrote:

> On Thu, 2010-10-28 at 14:19 -0700, Stephen Boyd wrote:
> > We want to allow machines to override the __delay() implementation
> > at runtime so they can use a timer based __delay() routine. It's
> > easier to do this using C, so let's write udelay and friends in C.
> > 
> > We lose the #if 0 code, which according to Russell is used "to
> > make the delay loop more stable and predictable on older CPUs"
> > (see http://article.gmane.org/gmane.linux.kernel/888867 for more
> > info). We shouldn't be too worried though, since we'll soon add
> > functionality allowing a machine to set the __delay() loop
> > themselves, thus allowing machines to resurrect the commented out
> > code should they need it.
> > 
> > Nico expressed concern that fixed lpj cmdlines will break due to
> > compiler optimizations. That doesn't seem to be the case since
> > before and after this patch I get the same lpj value when running
> > my CPU at 19.2 MHz. That should be sufficiently slow enough to
> > cover any machine running Linux.
> 
> Nico, are you ready to sign off on this?

Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>

The compiled code looks trivial enough.  I don't think gcc will find 
ways to optimize it further.  And if gcc regresses then the delay would 
just be longer.


Nicolas

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCHv2 1/3] ARM: Translate delay.S into (mostly) C
@ 2010-11-04  2:40       ` Nicolas Pitre
  0 siblings, 0 replies; 32+ messages in thread
From: Nicolas Pitre @ 2010-11-04  2:40 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 3 Nov 2010, Daniel Walker wrote:

> On Thu, 2010-10-28 at 14:19 -0700, Stephen Boyd wrote:
> > We want to allow machines to override the __delay() implementation
> > at runtime so they can use a timer based __delay() routine. It's
> > easier to do this using C, so let's write udelay and friends in C.
> > 
> > We lose the #if 0 code, which according to Russell is used "to
> > make the delay loop more stable and predictable on older CPUs"
> > (see http://article.gmane.org/gmane.linux.kernel/888867 for more
> > info). We shouldn't be too worried though, since we'll soon add
> > functionality allowing a machine to set the __delay() loop
> > themselves, thus allowing machines to resurrect the commented out
> > code should they need it.
> > 
> > Nico expressed concern that fixed lpj cmdlines will break due to
> > compiler optimizations. That doesn't seem to be the case since
> > before and after this patch I get the same lpj value when running
> > my CPU at 19.2 MHz. That should be sufficiently slow enough to
> > cover any machine running Linux.
> 
> Nico, are you ready to sign off on this?

Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>

The compiled code looks trivial enough.  I don't think gcc will find 
ways to optimize it further.  And if gcc regresses then the delay would 
just be longer.


Nicolas

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCHv2 2/3] ARM: Allow machines to override __delay()
  2010-10-28 21:19   ` Stephen Boyd
@ 2010-11-04 19:30     ` Daniel Walker
  -1 siblings, 0 replies; 32+ messages in thread
From: Daniel Walker @ 2010-11-04 19:30 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Russell King, linux-kernel, linux-arm-msm, linux-arm-kernel,
	Saravana Kannan, Colin Cross, Kevin Hilman, Santosh Shilimkar,
	Nicolas Pitre

On Thu, 2010-10-28 at 14:19 -0700, Stephen Boyd wrote:
> +
> +static void (*delay_fn)(unsigned long) = delay_loop;
> +
> +void set_delay_fn(void (*fn)(unsigned long))
> +{
> +       delay_fn = fn;
> +}

This needs to be a static inline in the header file.

> +/*
> + * loops = usecs * HZ * loops_per_jiffy / 1000000
> + */
> +void __delay(unsigned long loops)
> +{
> +       delay_fn(loops);
> +}
>  EXPORT_SYMBOL(__delay);

Can we make this static inline also? I'm sure about the module issues..

Daniel

-- 

Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCHv2 2/3] ARM: Allow machines to override __delay()
@ 2010-11-04 19:30     ` Daniel Walker
  0 siblings, 0 replies; 32+ messages in thread
From: Daniel Walker @ 2010-11-04 19:30 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 2010-10-28 at 14:19 -0700, Stephen Boyd wrote:
> +
> +static void (*delay_fn)(unsigned long) = delay_loop;
> +
> +void set_delay_fn(void (*fn)(unsigned long))
> +{
> +       delay_fn = fn;
> +}

This needs to be a static inline in the header file.

> +/*
> + * loops = usecs * HZ * loops_per_jiffy / 1000000
> + */
> +void __delay(unsigned long loops)
> +{
> +       delay_fn(loops);
> +}
>  EXPORT_SYMBOL(__delay);

Can we make this static inline also? I'm sure about the module issues..

Daniel

-- 

Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCHv2 2/3] ARM: Allow machines to override __delay()
  2010-11-04 19:30     ` Daniel Walker
@ 2010-11-04 20:58       ` Stephen Boyd
  -1 siblings, 0 replies; 32+ messages in thread
From: Stephen Boyd @ 2010-11-04 20:58 UTC (permalink / raw)
  To: Daniel Walker
  Cc: Russell King, linux-kernel, linux-arm-msm, linux-arm-kernel,
	Saravana Kannan, Colin Cross, Kevin Hilman, Santosh Shilimkar,
	Nicolas Pitre

On 11/04/2010 12:30 PM, Daniel Walker wrote:
> On Thu, 2010-10-28 at 14:19 -0700, Stephen Boyd wrote:
>> +
>> +static void (*delay_fn)(unsigned long) = delay_loop;
>> +
>> +void set_delay_fn(void (*fn)(unsigned long))
>> +{
>> +       delay_fn = fn;
>> +}
>
> This needs to be a static inline in the header file.

Wouldn't that mean delay_fn needs to be exposed in the header file too?
I like the fact that it's static and scoped to this file.

>> +/*
>> + * loops = usecs * HZ * loops_per_jiffy / 1000000
>> + */
>> +void __delay(unsigned long loops)
>> +{
>> +       delay_fn(loops);
>> +}
>>  EXPORT_SYMBOL(__delay);
>
> Can we make this static inline also? I'm sure about the module issues..

Do you mean in the header file or in this file?

I think it won't work because there actually needs to be a __delay
symbol and it can't just be inlined away at all the call sites.


-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCHv2 2/3] ARM: Allow machines to override __delay()
@ 2010-11-04 20:58       ` Stephen Boyd
  0 siblings, 0 replies; 32+ messages in thread
From: Stephen Boyd @ 2010-11-04 20:58 UTC (permalink / raw)
  To: linux-arm-kernel

On 11/04/2010 12:30 PM, Daniel Walker wrote:
> On Thu, 2010-10-28 at 14:19 -0700, Stephen Boyd wrote:
>> +
>> +static void (*delay_fn)(unsigned long) = delay_loop;
>> +
>> +void set_delay_fn(void (*fn)(unsigned long))
>> +{
>> +       delay_fn = fn;
>> +}
>
> This needs to be a static inline in the header file.

Wouldn't that mean delay_fn needs to be exposed in the header file too?
I like the fact that it's static and scoped to this file.

>> +/*
>> + * loops = usecs * HZ * loops_per_jiffy / 1000000
>> + */
>> +void __delay(unsigned long loops)
>> +{
>> +       delay_fn(loops);
>> +}
>>  EXPORT_SYMBOL(__delay);
>
> Can we make this static inline also? I'm sure about the module issues..

Do you mean in the header file or in this file?

I think it won't work because there actually needs to be a __delay
symbol and it can't just be inlined away at all the call sites.


-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCHv2 2/3] ARM: Allow machines to override __delay()
  2010-11-04 20:58       ` Stephen Boyd
@ 2010-11-04 21:16         ` Daniel Walker
  -1 siblings, 0 replies; 32+ messages in thread
From: Daniel Walker @ 2010-11-04 21:16 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Russell King, linux-kernel, linux-arm-msm, linux-arm-kernel,
	Saravana Kannan, Colin Cross, Kevin Hilman, Santosh Shilimkar,
	Nicolas Pitre

On Thu, 2010-11-04 at 13:58 -0700, Stephen Boyd wrote:
> On 11/04/2010 12:30 PM, Daniel Walker wrote:
> > On Thu, 2010-10-28 at 14:19 -0700, Stephen Boyd wrote:
> >> +
> >> +static void (*delay_fn)(unsigned long) = delay_loop;
> >> +
> >> +void set_delay_fn(void (*fn)(unsigned long))
> >> +{
> >> +       delay_fn = fn;
> >> +}
> >
> > This needs to be a static inline in the header file.
> 
> Wouldn't that mean delay_fn needs to be exposed in the header file too?
> I like the fact that it's static and scoped to this file.

Yeah you would need to make an extern for that. Why is it better to have
it static in this file?

If you make it inline you should have smaller code size since all your
doing is "delay_fn = fn;" and that's not expected to be used much (if at
all).

> >> +/*
> >> + * loops = usecs * HZ * loops_per_jiffy / 1000000
> >> + */
> >> +void __delay(unsigned long loops)
> >> +{
> >> +       delay_fn(loops);
> >> +}
> >>  EXPORT_SYMBOL(__delay);
> >
> > Can we make this static inline also? I'm sure about the module issues..
> 
> Do you mean in the header file or in this file?

header file.

> I think it won't work because there actually needs to be a __delay
> symbol and it can't just be inlined away at all the call sites.

It would be inlined into delay_fn(loops); . So it's calling a function.
I think you can export the delay_fn symbol and use it that way.

Daniel

-- 

Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCHv2 2/3] ARM: Allow machines to override __delay()
@ 2010-11-04 21:16         ` Daniel Walker
  0 siblings, 0 replies; 32+ messages in thread
From: Daniel Walker @ 2010-11-04 21:16 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 2010-11-04 at 13:58 -0700, Stephen Boyd wrote:
> On 11/04/2010 12:30 PM, Daniel Walker wrote:
> > On Thu, 2010-10-28 at 14:19 -0700, Stephen Boyd wrote:
> >> +
> >> +static void (*delay_fn)(unsigned long) = delay_loop;
> >> +
> >> +void set_delay_fn(void (*fn)(unsigned long))
> >> +{
> >> +       delay_fn = fn;
> >> +}
> >
> > This needs to be a static inline in the header file.
> 
> Wouldn't that mean delay_fn needs to be exposed in the header file too?
> I like the fact that it's static and scoped to this file.

Yeah you would need to make an extern for that. Why is it better to have
it static in this file?

If you make it inline you should have smaller code size since all your
doing is "delay_fn = fn;" and that's not expected to be used much (if at
all).

> >> +/*
> >> + * loops = usecs * HZ * loops_per_jiffy / 1000000
> >> + */
> >> +void __delay(unsigned long loops)
> >> +{
> >> +       delay_fn(loops);
> >> +}
> >>  EXPORT_SYMBOL(__delay);
> >
> > Can we make this static inline also? I'm sure about the module issues..
> 
> Do you mean in the header file or in this file?

header file.

> I think it won't work because there actually needs to be a __delay
> symbol and it can't just be inlined away at all the call sites.

It would be inlined into delay_fn(loops); . So it's calling a function.
I think you can export the delay_fn symbol and use it that way.

Daniel

-- 

Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCHv2 2/3] ARM: Allow machines to override __delay()
  2010-11-04 21:16         ` Daniel Walker
@ 2010-11-05 21:51           ` Stephen Boyd
  -1 siblings, 0 replies; 32+ messages in thread
From: Stephen Boyd @ 2010-11-05 21:51 UTC (permalink / raw)
  To: Daniel Walker
  Cc: Russell King, Nicolas Pitre, Kevin Hilman, linux-arm-msm,
	linux-kernel, Saravana Kannan, Santosh Shilimkar, Colin Cross,
	linux-arm-kernel

On 11/04/2010 02:16 PM, Daniel Walker wrote:
> On Thu, 2010-11-04 at 13:58 -0700, Stephen Boyd wrote:
>> Wouldn't that mean delay_fn needs to be exposed in the header file too?
>> I like the fact that it's static and scoped to this file.
>
> Yeah you would need to make an extern for that. Why is it better to have
> it static in this file?
>
> If you make it inline you should have smaller code size since all your
> doing is "delay_fn = fn;" and that's not expected to be used much (if at
> all).
>

Ok. Doing that increases the size of my vmlinux.

$ size vmlinux.orig vmlinux.new
   text    data     bss     dec     hex filename
7091426  594512 1244648 8930586  88451a vmlinux.orig
7091514  594512 1244648 8930674  884572 vmlinux.new

Here's the interdiff.

diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
index 5c6b9a3..ef0ea48 100644
--- a/arch/arm/include/asm/delay.h
+++ b/arch/arm/include/asm/delay.h
@@ -8,7 +8,20 @@
 
 #include <asm/param.h>	/* HZ */
 
-extern void __delay(unsigned long loops);
+extern void (*delay_fn)(unsigned long);
+
+static inline void set_delay_fn(void (*fn)(unsigned long))
+{
+	delay_fn = fn;
+}
+
+/*
+ * loops = usecs * HZ * loops_per_jiffy / 1000000
+ */
+static inline void __delay(unsigned long loops)
+{
+	delay_fn(loops);
+}
 
 /*
  * This function intentionally does not exist; if you see references to
@@ -40,7 +53,6 @@ extern void __const_udelay(unsigned long);
 			__const_udelay((n) * ((2199023U*HZ)>>11))) :	\
 	  __udelay(n))
 
-extern void set_delay_fn(void (*fn)(unsigned long));
 extern void read_current_timer_delay_loop(unsigned long loops);
 
 #endif /* defined(_ARM_DELAY_H) */
diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
index 59fdf64..a4b1ecd 100644
--- a/arch/arm/lib/delay.c
+++ b/arch/arm/lib/delay.c
@@ -44,20 +44,8 @@ void read_current_timer_delay_loop(unsigned long loops)
 }
 #endif
 
-static void (*delay_fn)(unsigned long) = delay_loop;
+void (*delay_fn)(unsigned long) = delay_loop;
 
-void set_delay_fn(void (*fn)(unsigned long))
-{
-	delay_fn = fn;
-}
-
-/*
- * loops = usecs * HZ * loops_per_jiffy / 1000000
- */
-void __delay(unsigned long loops)
-{
-	delay_fn(loops);
-}
 EXPORT_SYMBOL(__delay);
 
 /*


Do you get similar results?

-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCHv2 2/3] ARM: Allow machines to override __delay()
@ 2010-11-05 21:51           ` Stephen Boyd
  0 siblings, 0 replies; 32+ messages in thread
From: Stephen Boyd @ 2010-11-05 21:51 UTC (permalink / raw)
  To: linux-arm-kernel

On 11/04/2010 02:16 PM, Daniel Walker wrote:
> On Thu, 2010-11-04 at 13:58 -0700, Stephen Boyd wrote:
>> Wouldn't that mean delay_fn needs to be exposed in the header file too?
>> I like the fact that it's static and scoped to this file.
>
> Yeah you would need to make an extern for that. Why is it better to have
> it static in this file?
>
> If you make it inline you should have smaller code size since all your
> doing is "delay_fn = fn;" and that's not expected to be used much (if at
> all).
>

Ok. Doing that increases the size of my vmlinux.

$ size vmlinux.orig vmlinux.new
   text    data     bss     dec     hex filename
7091426  594512 1244648 8930586  88451a vmlinux.orig
7091514  594512 1244648 8930674  884572 vmlinux.new

Here's the interdiff.

diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
index 5c6b9a3..ef0ea48 100644
--- a/arch/arm/include/asm/delay.h
+++ b/arch/arm/include/asm/delay.h
@@ -8,7 +8,20 @@
 
 #include <asm/param.h>	/* HZ */
 
-extern void __delay(unsigned long loops);
+extern void (*delay_fn)(unsigned long);
+
+static inline void set_delay_fn(void (*fn)(unsigned long))
+{
+	delay_fn = fn;
+}
+
+/*
+ * loops = usecs * HZ * loops_per_jiffy / 1000000
+ */
+static inline void __delay(unsigned long loops)
+{
+	delay_fn(loops);
+}
 
 /*
  * This function intentionally does not exist; if you see references to
@@ -40,7 +53,6 @@ extern void __const_udelay(unsigned long);
 			__const_udelay((n) * ((2199023U*HZ)>>11))) :	\
 	  __udelay(n))
 
-extern void set_delay_fn(void (*fn)(unsigned long));
 extern void read_current_timer_delay_loop(unsigned long loops);
 
 #endif /* defined(_ARM_DELAY_H) */
diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
index 59fdf64..a4b1ecd 100644
--- a/arch/arm/lib/delay.c
+++ b/arch/arm/lib/delay.c
@@ -44,20 +44,8 @@ void read_current_timer_delay_loop(unsigned long loops)
 }
 #endif
 
-static void (*delay_fn)(unsigned long) = delay_loop;
+void (*delay_fn)(unsigned long) = delay_loop;
 
-void set_delay_fn(void (*fn)(unsigned long))
-{
-	delay_fn = fn;
-}
-
-/*
- * loops = usecs * HZ * loops_per_jiffy / 1000000
- */
-void __delay(unsigned long loops)
-{
-	delay_fn(loops);
-}
 EXPORT_SYMBOL(__delay);
 
 /*


Do you get similar results?

-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCHv2 2/3] ARM: Allow machines to override __delay()
  2010-11-05 21:51           ` Stephen Boyd
@ 2010-11-05 23:43             ` Daniel Walker
  -1 siblings, 0 replies; 32+ messages in thread
From: Daniel Walker @ 2010-11-05 23:43 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Russell King, Nicolas Pitre, Kevin Hilman, linux-arm-msm,
	linux-kernel, Saravana Kannan, Santosh Shilimkar, Colin Cross,
	linux-arm-kernel

On Fri, 2010-11-05 at 14:51 -0700, Stephen Boyd wrote:
> On 11/04/2010 02:16 PM, Daniel Walker wrote:
> > On Thu, 2010-11-04 at 13:58 -0700, Stephen Boyd wrote:
> >> Wouldn't that mean delay_fn needs to be exposed in the header file too?
> >> I like the fact that it's static and scoped to this file.
> >
> > Yeah you would need to make an extern for that. Why is it better to have
> > it static in this file?
> >
> > If you make it inline you should have smaller code size since all your
> > doing is "delay_fn = fn;" and that's not expected to be used much (if at
> > all).
> >
> 
> Ok. Doing that increases the size of my vmlinux.
> 
> $ size vmlinux.orig vmlinux.new
>    text    data     bss     dec     hex filename
> 7091426  594512 1244648 8930586  88451a vmlinux.orig
> 7091514  594512 1244648 8930674  884572 vmlinux.new

This is what I get,

   text    data     bss     dec     hex filename
2168427  104288  186176 2458891  25850b ../build-test/vmlinux.orig
2168379  104288  186176 2458843  2584db ../build-test/vmlinux.new

Your patch has something wrong with it, which I fixed. Details below,

> Here's the interdiff.
> 
> diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
> index 5c6b9a3..ef0ea48 100644
> --- a/arch/arm/include/asm/delay.h
> +++ b/arch/arm/include/asm/delay.h
> @@ -8,7 +8,20 @@
>  
>  #include <asm/param.h>	/* HZ */
>  
> -extern void __delay(unsigned long loops);
> +extern void (*delay_fn)(unsigned long);
> +
> +static inline void set_delay_fn(void (*fn)(unsigned long))
> +{
> +	delay_fn = fn;
> +}
> +
> +/*
> + * loops = usecs * HZ * loops_per_jiffy / 1000000
> + */
> +static inline void __delay(unsigned long loops)
> +{
> +	delay_fn(loops);
> +}
>  
>  /*
>   * This function intentionally does not exist; if you see references to
> @@ -40,7 +53,6 @@ extern void __const_udelay(unsigned long);
>  			__const_udelay((n) * ((2199023U*HZ)>>11))) :	\
>  	  __udelay(n))
>  
> -extern void set_delay_fn(void (*fn)(unsigned long));
>  extern void read_current_timer_delay_loop(unsigned long loops);
>  
>  #endif /* defined(_ARM_DELAY_H) */
> diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
> index 59fdf64..a4b1ecd 100644
> --- a/arch/arm/lib/delay.c
> +++ b/arch/arm/lib/delay.c
> @@ -44,20 +44,8 @@ void read_current_timer_delay_loop(unsigned long loops)
>  }
>  #endif
>  
> -static void (*delay_fn)(unsigned long) = delay_loop;
> +void (*delay_fn)(unsigned long) = delay_loop;
>  
> -void set_delay_fn(void (*fn)(unsigned long))
> -{
> -	delay_fn = fn;
> -}
> -
> -/*
> - * loops = usecs * HZ * loops_per_jiffy / 1000000
> - */
> -void __delay(unsigned long loops)
> -{
> -	delay_fn(loops);
> -}
>  EXPORT_SYMBOL(__delay);

You need to modify this EXPORT_SYMBOL to delay_fn since __delay doesn't
exist anymore.

Daniel

-- 
Sent by an consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCHv2 2/3] ARM: Allow machines to override __delay()
@ 2010-11-05 23:43             ` Daniel Walker
  0 siblings, 0 replies; 32+ messages in thread
From: Daniel Walker @ 2010-11-05 23:43 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 2010-11-05 at 14:51 -0700, Stephen Boyd wrote:
> On 11/04/2010 02:16 PM, Daniel Walker wrote:
> > On Thu, 2010-11-04 at 13:58 -0700, Stephen Boyd wrote:
> >> Wouldn't that mean delay_fn needs to be exposed in the header file too?
> >> I like the fact that it's static and scoped to this file.
> >
> > Yeah you would need to make an extern for that. Why is it better to have
> > it static in this file?
> >
> > If you make it inline you should have smaller code size since all your
> > doing is "delay_fn = fn;" and that's not expected to be used much (if at
> > all).
> >
> 
> Ok. Doing that increases the size of my vmlinux.
> 
> $ size vmlinux.orig vmlinux.new
>    text    data     bss     dec     hex filename
> 7091426  594512 1244648 8930586  88451a vmlinux.orig
> 7091514  594512 1244648 8930674  884572 vmlinux.new

This is what I get,

   text    data     bss     dec     hex filename
2168427  104288  186176 2458891  25850b ../build-test/vmlinux.orig
2168379  104288  186176 2458843  2584db ../build-test/vmlinux.new

Your patch has something wrong with it, which I fixed. Details below,

> Here's the interdiff.
> 
> diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
> index 5c6b9a3..ef0ea48 100644
> --- a/arch/arm/include/asm/delay.h
> +++ b/arch/arm/include/asm/delay.h
> @@ -8,7 +8,20 @@
>  
>  #include <asm/param.h>	/* HZ */
>  
> -extern void __delay(unsigned long loops);
> +extern void (*delay_fn)(unsigned long);
> +
> +static inline void set_delay_fn(void (*fn)(unsigned long))
> +{
> +	delay_fn = fn;
> +}
> +
> +/*
> + * loops = usecs * HZ * loops_per_jiffy / 1000000
> + */
> +static inline void __delay(unsigned long loops)
> +{
> +	delay_fn(loops);
> +}
>  
>  /*
>   * This function intentionally does not exist; if you see references to
> @@ -40,7 +53,6 @@ extern void __const_udelay(unsigned long);
>  			__const_udelay((n) * ((2199023U*HZ)>>11))) :	\
>  	  __udelay(n))
>  
> -extern void set_delay_fn(void (*fn)(unsigned long));
>  extern void read_current_timer_delay_loop(unsigned long loops);
>  
>  #endif /* defined(_ARM_DELAY_H) */
> diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
> index 59fdf64..a4b1ecd 100644
> --- a/arch/arm/lib/delay.c
> +++ b/arch/arm/lib/delay.c
> @@ -44,20 +44,8 @@ void read_current_timer_delay_loop(unsigned long loops)
>  }
>  #endif
>  
> -static void (*delay_fn)(unsigned long) = delay_loop;
> +void (*delay_fn)(unsigned long) = delay_loop;
>  
> -void set_delay_fn(void (*fn)(unsigned long))
> -{
> -	delay_fn = fn;
> -}
> -
> -/*
> - * loops = usecs * HZ * loops_per_jiffy / 1000000
> - */
> -void __delay(unsigned long loops)
> -{
> -	delay_fn(loops);
> -}
>  EXPORT_SYMBOL(__delay);

You need to modify this EXPORT_SYMBOL to delay_fn since __delay doesn't
exist anymore.

Daniel

-- 
Sent by an consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCHv2 2/3] ARM: Allow machines to override __delay()
  2010-11-05 23:43             ` Daniel Walker
@ 2010-11-06  3:36               ` Stephen Boyd
  -1 siblings, 0 replies; 32+ messages in thread
From: Stephen Boyd @ 2010-11-06  3:36 UTC (permalink / raw)
  To: Daniel Walker
  Cc: Russell King, Nicolas Pitre, Kevin Hilman, linux-arm-msm,
	linux-kernel, Saravana Kannan, Santosh Shilimkar, Colin Cross,
	linux-arm-kernel

On 11/05/2010 04:43 PM, Daniel Walker wrote:
> On Fri, 2010-11-05 at 14:51 -0700, Stephen Boyd wrote:
>> Ok. Doing that increases the size of my vmlinux.
>>
>> $ size vmlinux.orig vmlinux.new
>>    text    data     bss     dec     hex filename
>> 7091426  594512 1244648 8930586  88451a vmlinux.orig
>> 7091514  594512 1244648 8930674  884572 vmlinux.new
>
> This is what I get,
>
>    text    data     bss     dec     hex filename
> 2168427  104288  186176 2458891  25850b ../build-test/vmlinux.orig
> 2168379  104288  186176 2458843  2584db ../build-test/vmlinux.new
>
> Your patch has something wrong with it, which I fixed. Details below,
>
[snip]
>> - */
>> -void __delay(unsigned long loops)
>> -{
>> -	delay_fn(loops);
>> -}
>>  EXPORT_SYMBOL(__delay);
>
> You need to modify this EXPORT_SYMBOL to delay_fn since __delay doesn't
> exist anymore.

Wait. Doesn't this mean we're exporting delay_fn instead of __delay now?
i.e. the symbol name has changed and modules can no longer call __delay?
That sounds bad.

If I make that change, my kernel size is exactly the same before and
after. It may sound like a win since you got a decrease and I got a net
zero, but I'm not sure since the symbol has changed. I could make
__delay a function pointer and assign it directly but I'm not very
interested to expose a function pointer to modules allowing them to
modify it at any time (easily). Actually, I should probably mark
set_delay_fn __init so it gets thrown away after init when its far too
late to switch the delay function anyway. That would give you the space
savings you want and allow me to keep the delay_fn static to delay.c

-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCHv2 2/3] ARM: Allow machines to override __delay()
@ 2010-11-06  3:36               ` Stephen Boyd
  0 siblings, 0 replies; 32+ messages in thread
From: Stephen Boyd @ 2010-11-06  3:36 UTC (permalink / raw)
  To: linux-arm-kernel

On 11/05/2010 04:43 PM, Daniel Walker wrote:
> On Fri, 2010-11-05 at 14:51 -0700, Stephen Boyd wrote:
>> Ok. Doing that increases the size of my vmlinux.
>>
>> $ size vmlinux.orig vmlinux.new
>>    text    data     bss     dec     hex filename
>> 7091426  594512 1244648 8930586  88451a vmlinux.orig
>> 7091514  594512 1244648 8930674  884572 vmlinux.new
>
> This is what I get,
>
>    text    data     bss     dec     hex filename
> 2168427  104288  186176 2458891  25850b ../build-test/vmlinux.orig
> 2168379  104288  186176 2458843  2584db ../build-test/vmlinux.new
>
> Your patch has something wrong with it, which I fixed. Details below,
>
[snip]
>> - */
>> -void __delay(unsigned long loops)
>> -{
>> -	delay_fn(loops);
>> -}
>>  EXPORT_SYMBOL(__delay);
>
> You need to modify this EXPORT_SYMBOL to delay_fn since __delay doesn't
> exist anymore.

Wait. Doesn't this mean we're exporting delay_fn instead of __delay now?
i.e. the symbol name has changed and modules can no longer call __delay?
That sounds bad.

If I make that change, my kernel size is exactly the same before and
after. It may sound like a win since you got a decrease and I got a net
zero, but I'm not sure since the symbol has changed. I could make
__delay a function pointer and assign it directly but I'm not very
interested to expose a function pointer to modules allowing them to
modify it at any time (easily). Actually, I should probably mark
set_delay_fn __init so it gets thrown away after init when its far too
late to switch the delay function anyway. That would give you the space
savings you want and allow me to keep the delay_fn static to delay.c

-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCHv2 2/3] ARM: Allow machines to override __delay()
  2010-11-06  3:36               ` Stephen Boyd
@ 2010-11-08 18:11                 ` Daniel Walker
  -1 siblings, 0 replies; 32+ messages in thread
From: Daniel Walker @ 2010-11-08 18:11 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Russell King, Nicolas Pitre, Kevin Hilman, linux-arm-msm,
	linux-kernel, Saravana Kannan, Santosh Shilimkar, Colin Cross,
	linux-arm-kernel

On Fri, 2010-11-05 at 20:36 -0700, Stephen Boyd wrote:
> On 11/05/2010 04:43 PM, Daniel Walker wrote:
> > On Fri, 2010-11-05 at 14:51 -0700, Stephen Boyd wrote:
> >> Ok. Doing that increases the size of my vmlinux.
> >>
> >> $ size vmlinux.orig vmlinux.new
> >>    text    data     bss     dec     hex filename
> >> 7091426  594512 1244648 8930586  88451a vmlinux.orig
> >> 7091514  594512 1244648 8930674  884572 vmlinux.new
> >
> > This is what I get,
> >
> >    text    data     bss     dec     hex filename
> > 2168427  104288  186176 2458891  25850b ../build-test/vmlinux.orig
> > 2168379  104288  186176 2458843  2584db ../build-test/vmlinux.new
> >
> > Your patch has something wrong with it, which I fixed. Details below,
> >
> [snip]
> >> - */
> >> -void __delay(unsigned long loops)
> >> -{
> >> -	delay_fn(loops);
> >> -}
> >>  EXPORT_SYMBOL(__delay);
> >
> > You need to modify this EXPORT_SYMBOL to delay_fn since __delay doesn't
> > exist anymore.
> 
> Wait. Doesn't this mean we're exporting delay_fn instead of __delay now?
> i.e. the symbol name has changed and modules can no longer call __delay?
> That sounds bad.

The modules would just call the new symbol. It would be a problem for
binary modules, but we don't really cater to binary modules. Like you
suggest below you could change the name to __delay().

> If I make that change, my kernel size is exactly the same before and
> after. It may sound like a win since you got a decrease and I got a net
> zero, but I'm not sure since the symbol has changed. I could make

I can't imagine how it's a net zero change for you. The change is
removing two global functions.

> __delay a function pointer and assign it directly but I'm not very
> interested to expose a function pointer to modules allowing them to
> modify it at any time (easily). Actually, I should probably mark
> set_delay_fn __init so it gets thrown away after init when its far too
> late to switch the delay function anyway. That would give you the space
> savings you want and allow me to keep the delay_fn static to delay.c

I don't think we need to protect other code authors to that degree. You
could put down a comment letting people know it's bad to alter __delay
after bootup .. I doubt this API will be used all that often..

Marking it __init only saves run time space it doesn't reduce the image
size. However, since set_delay_fn is likely to be used in __init
sections already, and it's inline, means the code is likely to get
removed in that case too. So doing it the way I'm suggesting give you a
smaller image size, and smaller runtime size.

Daniel

-- 

Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCHv2 2/3] ARM: Allow machines to override __delay()
@ 2010-11-08 18:11                 ` Daniel Walker
  0 siblings, 0 replies; 32+ messages in thread
From: Daniel Walker @ 2010-11-08 18:11 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 2010-11-05 at 20:36 -0700, Stephen Boyd wrote:
> On 11/05/2010 04:43 PM, Daniel Walker wrote:
> > On Fri, 2010-11-05 at 14:51 -0700, Stephen Boyd wrote:
> >> Ok. Doing that increases the size of my vmlinux.
> >>
> >> $ size vmlinux.orig vmlinux.new
> >>    text    data     bss     dec     hex filename
> >> 7091426  594512 1244648 8930586  88451a vmlinux.orig
> >> 7091514  594512 1244648 8930674  884572 vmlinux.new
> >
> > This is what I get,
> >
> >    text    data     bss     dec     hex filename
> > 2168427  104288  186176 2458891  25850b ../build-test/vmlinux.orig
> > 2168379  104288  186176 2458843  2584db ../build-test/vmlinux.new
> >
> > Your patch has something wrong with it, which I fixed. Details below,
> >
> [snip]
> >> - */
> >> -void __delay(unsigned long loops)
> >> -{
> >> -	delay_fn(loops);
> >> -}
> >>  EXPORT_SYMBOL(__delay);
> >
> > You need to modify this EXPORT_SYMBOL to delay_fn since __delay doesn't
> > exist anymore.
> 
> Wait. Doesn't this mean we're exporting delay_fn instead of __delay now?
> i.e. the symbol name has changed and modules can no longer call __delay?
> That sounds bad.

The modules would just call the new symbol. It would be a problem for
binary modules, but we don't really cater to binary modules. Like you
suggest below you could change the name to __delay().

> If I make that change, my kernel size is exactly the same before and
> after. It may sound like a win since you got a decrease and I got a net
> zero, but I'm not sure since the symbol has changed. I could make

I can't imagine how it's a net zero change for you. The change is
removing two global functions.

> __delay a function pointer and assign it directly but I'm not very
> interested to expose a function pointer to modules allowing them to
> modify it at any time (easily). Actually, I should probably mark
> set_delay_fn __init so it gets thrown away after init when its far too
> late to switch the delay function anyway. That would give you the space
> savings you want and allow me to keep the delay_fn static to delay.c

I don't think we need to protect other code authors to that degree. You
could put down a comment letting people know it's bad to alter __delay
after bootup .. I doubt this API will be used all that often..

Marking it __init only saves run time space it doesn't reduce the image
size. However, since set_delay_fn is likely to be used in __init
sections already, and it's inline, means the code is likely to get
removed in that case too. So doing it the way I'm suggesting give you a
smaller image size, and smaller runtime size.

Daniel

-- 

Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2010-11-08 18:11 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-10-28 21:19 [PATCHv2 0/3] ARM: Fixing udelay() for SMP and non-SMP systems Stephen Boyd
2010-10-28 21:19 ` Stephen Boyd
2010-10-28 21:19 ` [PATCHv2 1/3] ARM: Translate delay.S into (mostly) C Stephen Boyd
2010-10-28 21:19   ` Stephen Boyd
2010-11-03 17:57   ` Daniel Walker
2010-11-03 17:57     ` Daniel Walker
2010-11-04  2:40     ` Nicolas Pitre
2010-11-04  2:40       ` Nicolas Pitre
2010-11-03 18:27   ` Will Deacon
2010-11-03 18:27     ` Will Deacon
2010-11-03 23:15     ` Stephen Boyd
2010-11-03 23:15       ` Stephen Boyd
2010-11-03 23:17       ` Daniel Walker
2010-11-03 23:17         ` Daniel Walker
2010-10-28 21:19 ` [PATCHv2 2/3] ARM: Allow machines to override __delay() Stephen Boyd
2010-10-28 21:19   ` Stephen Boyd
2010-11-04 19:30   ` Daniel Walker
2010-11-04 19:30     ` Daniel Walker
2010-11-04 20:58     ` Stephen Boyd
2010-11-04 20:58       ` Stephen Boyd
2010-11-04 21:16       ` Daniel Walker
2010-11-04 21:16         ` Daniel Walker
2010-11-05 21:51         ` Stephen Boyd
2010-11-05 21:51           ` Stephen Boyd
2010-11-05 23:43           ` Daniel Walker
2010-11-05 23:43             ` Daniel Walker
2010-11-06  3:36             ` Stephen Boyd
2010-11-06  3:36               ` Stephen Boyd
2010-11-08 18:11               ` Daniel Walker
2010-11-08 18:11                 ` Daniel Walker
2010-10-28 21:19 ` [PATCHv2 3/3] ARM: Implement a timer based __delay() loop Stephen Boyd
2010-10-28 21:19   ` Stephen Boyd

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.