All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 1/3] arm64: arch_timer: Add device tree binding for A-008585 erratum
@ 2016-07-01 22:41 ` Scott Wood
  0 siblings, 0 replies; 24+ messages in thread
From: Scott Wood @ 2016-07-01 22:41 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Marc Zyngier
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	devicetree-u79uwXL29TY76Z2rM5mHXA, stuart.yoder-3arQi8VN3Tc,
	Scott Wood

This erratum describes a bug in logic outside the core, so MIDR can't be
used to identify its presence, and reading an SoC-specific revision
register from common arch timer code would be awkward.  So, describe it
in the device tree.

Signed-off-by: Scott Wood <oss-fOR+EgIDQEHk1uMJSBkQmQ@public.gmane.org>
---
 Documentation/devicetree/bindings/arm/arch_timer.txt | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/devicetree/bindings/arm/arch_timer.txt b/Documentation/devicetree/bindings/arm/arch_timer.txt
index e774128..ef5fbe9 100644
--- a/Documentation/devicetree/bindings/arm/arch_timer.txt
+++ b/Documentation/devicetree/bindings/arm/arch_timer.txt
@@ -25,6 +25,12 @@ to deliver its interrupts via SPIs.
 - always-on : a boolean property. If present, the timer is powered through an
   always-on power domain, therefore it never loses context.
 
+- fsl,erratum-a008585 : A boolean property. Indicates the presence of
+  QorIQ erratum A-008585, which says that reading the counter is
+  unreliable unless the same value is returned by back-to-back reads.
+  This also affects writes to the tval register, due to the implicit
+  counter read.
+
 ** Optional properties:
 
 - arm,cpu-registers-not-fw-configured : Firmware does not initialize
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 1/3] arm64: arch_timer: Add device tree binding for A-008585 erratum
@ 2016-07-01 22:41 ` Scott Wood
  0 siblings, 0 replies; 24+ messages in thread
From: Scott Wood @ 2016-07-01 22:41 UTC (permalink / raw)
  To: linux-arm-kernel

This erratum describes a bug in logic outside the core, so MIDR can't be
used to identify its presence, and reading an SoC-specific revision
register from common arch timer code would be awkward.  So, describe it
in the device tree.

Signed-off-by: Scott Wood <oss@buserror.net>
---
 Documentation/devicetree/bindings/arm/arch_timer.txt | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/devicetree/bindings/arm/arch_timer.txt b/Documentation/devicetree/bindings/arm/arch_timer.txt
index e774128..ef5fbe9 100644
--- a/Documentation/devicetree/bindings/arm/arch_timer.txt
+++ b/Documentation/devicetree/bindings/arm/arch_timer.txt
@@ -25,6 +25,12 @@ to deliver its interrupts via SPIs.
 - always-on : a boolean property. If present, the timer is powered through an
   always-on power domain, therefore it never loses context.
 
+- fsl,erratum-a008585 : A boolean property. Indicates the presence of
+  QorIQ erratum A-008585, which says that reading the counter is
+  unreliable unless the same value is returned by back-to-back reads.
+  This also affects writes to the tval register, due to the implicit
+  counter read.
+
 ** Optional properties:
 
 - arm,cpu-registers-not-fw-configured : Firmware does not initialize
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 2/3] arm64: dts: Add timer erratum property for LS2080A and LS1043A
  2016-07-01 22:41 ` Scott Wood
@ 2016-07-01 22:41     ` Scott Wood
  -1 siblings, 0 replies; 24+ messages in thread
From: Scott Wood @ 2016-07-01 22:41 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Marc Zyngier
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	devicetree-u79uwXL29TY76Z2rM5mHXA, stuart.yoder-3arQi8VN3Tc,
	Scott Wood

Signed-off-by: Scott Wood <oss-fOR+EgIDQEHk1uMJSBkQmQ@public.gmane.org>
---
 arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi | 1 +
 arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
index be72bf5..596420c 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
@@ -115,6 +115,7 @@
 			     <1 14 0x1>, /* Physical Non-Secure PPI */
 			     <1 11 0x1>, /* Virtual PPI */
 			     <1 10 0x1>; /* Hypervisor PPI */
+		fsl,erratum-a008585;
 	};
 
 	pmu {
diff --git a/arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi
index 9d746c6..0270ccf 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi
@@ -171,6 +171,7 @@
 			     <1 14 0x8>, /* Physical Non-Secure PPI, active-low */
 			     <1 11 0x8>, /* Virtual PPI, active-low */
 			     <1 10 0x8>; /* Hypervisor PPI, active-low */
+		fsl,erratum-a008585;
 	};
 
 	pmu {
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 2/3] arm64: dts: Add timer erratum property for LS2080A and LS1043A
@ 2016-07-01 22:41     ` Scott Wood
  0 siblings, 0 replies; 24+ messages in thread
From: Scott Wood @ 2016-07-01 22:41 UTC (permalink / raw)
  To: linux-arm-kernel

Signed-off-by: Scott Wood <oss@buserror.net>
---
 arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi | 1 +
 arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
index be72bf5..596420c 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
@@ -115,6 +115,7 @@
 			     <1 14 0x1>, /* Physical Non-Secure PPI */
 			     <1 11 0x1>, /* Virtual PPI */
 			     <1 10 0x1>; /* Hypervisor PPI */
+		fsl,erratum-a008585;
 	};
 
 	pmu {
diff --git a/arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi
index 9d746c6..0270ccf 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi
@@ -171,6 +171,7 @@
 			     <1 14 0x8>, /* Physical Non-Secure PPI, active-low */
 			     <1 11 0x8>, /* Virtual PPI, active-low */
 			     <1 10 0x8>; /* Hypervisor PPI, active-low */
+		fsl,erratum-a008585;
 	};
 
 	pmu {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 3/3] arm64: arch_timer: Work around QorIQ Erratum A-008585
  2016-07-01 22:41 ` Scott Wood
@ 2016-07-01 22:41     ` Scott Wood
  -1 siblings, 0 replies; 24+ messages in thread
From: Scott Wood @ 2016-07-01 22:41 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Marc Zyngier
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	devicetree-u79uwXL29TY76Z2rM5mHXA, stuart.yoder-3arQi8VN3Tc,
	Scott Wood

Erratum A-008585 says that the ARM generic timer counter "has the
potential to contain an erroneous value for a small number of core
clock cycles every time the timer value changes".  Accesses to TVAL
(both read and write) are also affected due to the implicit counter
read.  Accesses to CVAL are not affected.

The workaround is to reread TVAL and count registers until successive reads
return the same value, and when writing TVAL to retry until counter
reads before and after the write return the same value.

This erratum can be found on LS1043A and LS2080A.

Signed-off-by: Scott Wood <oss-fOR+EgIDQEHk1uMJSBkQmQ@public.gmane.org>
---
v3:
- Used cval rather than a loop for the write side of the erratum
- Added a Kconfig control
- Moved the device tree binding into its own patch
- Added erratum to silicon-errata.txt
- Changed function names to contain the erratum name
- Factored out the setting of erratum versions of set_next_event
  to improve readability
- Added a comment clarifying that the timeout is arbitrary

v2:
Significant rework based on feedback, including using static_key,
disabling VDSO counter access rather than adding the workaround to the
VDSO, and uninlining the loops.

Dropped the separate property for indicating that writes to TVAL are
affected, as I believe that's just a side effect of the implicit
counter read being corrupted, and thus a chip that is affected by one
will always be affected by the other.

Dropped the arm32 portion as it seems there was confusion about whether
LS1021A is affected.  Currently I am being told that it is not
affected.

I considered writing to CVAL rather than looping on TVAL writes, but
that would still have required separate set_next_event() code for the
erratum, and adding CVAL to the enum would have required a bunch of
extra handlers in switch statements (even where unused, due to compiler
warnings about unhandled enum values) including in an arm32 header.  It
seemed better to avoid the arm32 interaction and new untested
accessors.
---
 Documentation/arm64/silicon-errata.txt |   2 +
 arch/arm64/include/asm/arch_timer.h    |  48 ++++++++++++---
 drivers/clocksource/Kconfig            |  10 ++++
 drivers/clocksource/arm_arch_timer.c   | 103 +++++++++++++++++++++++++++++++++
 4 files changed, 154 insertions(+), 9 deletions(-)

diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
index ba4b6ac..5778f62 100644
--- a/Documentation/arm64/silicon-errata.txt
+++ b/Documentation/arm64/silicon-errata.txt
@@ -57,3 +57,5 @@ stable kernels.
 | Cavium         | ThunderX ITS    | #22375, #24313  | CAVIUM_ERRATUM_22375    |
 | Cavium         | ThunderX GICv3  | #23154          | CAVIUM_ERRATUM_23154    |
 | Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456    |
+|                |                 |                 |                         |
+| Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585     |
diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
index fbe0ca3..70fbad9 100644
--- a/arch/arm64/include/asm/arch_timer.h
+++ b/arch/arm64/include/asm/arch_timer.h
@@ -23,10 +23,34 @@
 
 #include <linux/bug.h>
 #include <linux/init.h>
+#include <linux/jump_label.h>
 #include <linux/types.h>
 
 #include <clocksource/arm_arch_timer.h>
 
+extern struct static_key_false arch_timer_read_ool_enabled;
+
+#define ARCH_TIMER_REG_READ(reg, func) \
+extern u64 func##_ool(void); \
+static inline u64 __##func(void) \
+{ \
+	u64 val; \
+	asm volatile("mrs %0, " reg : "=r" (val)); \
+	return val; \
+} \
+static inline u64 _##func(void) \
+{ \
+	if (IS_ENABLED(CONFIG_FSL_ERRATUM_A008585) && \
+	    static_branch_unlikely(&arch_timer_read_ool_enabled)) \
+		return func##_ool(); \
+	else \
+		return __##func(); \
+}
+
+ARCH_TIMER_REG_READ("cntp_tval_el0", arch_timer_get_ptval)
+ARCH_TIMER_REG_READ("cntv_tval_el0", arch_timer_get_vtval)
+ARCH_TIMER_REG_READ("cntvct_el0", arch_counter_get_cntvct)
+
 /*
  * These register accessors are marked inline so the compiler can
  * nicely work out which register we want, and chuck away the rest of
@@ -58,6 +82,16 @@ void arch_timer_reg_write_cp15(int access, enum arch_timer_reg reg, u32 val)
 	isb();
 }
 
+static __always_inline void arch_timer_cval_write_cp15(int access, u64 val)
+{
+	if (access == ARCH_TIMER_PHYS_ACCESS)
+		asm volatile("msr cntp_cval_el0, %0" : : "r" (val));
+	else if (access == ARCH_TIMER_VIRT_ACCESS)
+		asm volatile("msr cntv_cval_el0, %0" : : "r" (val));
+
+	isb();
+}
+
 static __always_inline
 u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
 {
@@ -66,19 +100,19 @@ u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
 	if (access == ARCH_TIMER_PHYS_ACCESS) {
 		switch (reg) {
 		case ARCH_TIMER_REG_CTRL:
-			asm volatile("mrs %0,  cntp_ctl_el0" : "=r" (val));
+			asm volatile("mrs %0, cntp_ctl_el0" : "=r" (val));
 			break;
 		case ARCH_TIMER_REG_TVAL:
-			asm volatile("mrs %0, cntp_tval_el0" : "=r" (val));
+			val = _arch_timer_get_ptval();
 			break;
 		}
 	} else if (access == ARCH_TIMER_VIRT_ACCESS) {
 		switch (reg) {
 		case ARCH_TIMER_REG_CTRL:
-			asm volatile("mrs %0,  cntv_ctl_el0" : "=r" (val));
+			asm volatile("mrs %0, cntv_ctl_el0" : "=r" (val));
 			break;
 		case ARCH_TIMER_REG_TVAL:
-			asm volatile("mrs %0, cntv_tval_el0" : "=r" (val));
+			val = _arch_timer_get_vtval();
 			break;
 		}
 	}
@@ -116,12 +150,8 @@ static inline u64 arch_counter_get_cntpct(void)
 
 static inline u64 arch_counter_get_cntvct(void)
 {
-	u64 cval;
-
 	isb();
-	asm volatile("mrs %0, cntvct_el0" : "=r" (cval));
-
-	return cval;
+	return _arch_counter_get_cntvct();
 }
 
 static inline int arch_timer_arch_init(void)
diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
index c346be6..672ddc3 100644
--- a/drivers/clocksource/Kconfig
+++ b/drivers/clocksource/Kconfig
@@ -207,6 +207,16 @@ config ARM_ARCH_TIMER_EVTSTREAM
 	  This must be disabled for hardware validation purposes to detect any
 	  hardware anomalies of missing events.
 
+config FSL_ERRATUM_A008585
+	bool "Workaround for Freescale/NXP Erratum A-008585"
+	default y
+	depends on ARM_ARCH_TIMER && ARM64
+	help
+	  This option enables a workaround for Freescale/NXP Erratum
+	  A-008585 ("ARM generic timer may contain an erroneous
+	  value").  The workaround will only be active if the
+	  fsl,erratum-a008585 property is found in the timer node.
+
 config ARM_GLOBAL_TIMER
 	bool
 	select CLKSRC_OF if OF
diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index 5152b38..7ead4eb 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -83,6 +83,51 @@ static bool arch_timer_mem_use_virtual;
  * Architected system timer support.
  */
 
+#ifdef CONFIG_FSL_ERRATUM_A008585
+DEFINE_STATIC_KEY_FALSE(arch_timer_read_ool_enabled);
+EXPORT_SYMBOL_GPL(arch_timer_read_ool_enabled);
+
+/*
+ * __always_inline is used to ensure that func() is not an actual function
+ * pointer, which would result in the register accesses potentially being too
+ * far apart for the loop to work.
+ *
+ * The timeout is an arbitrary value well beyond the highest number
+ * of iterations the loop has been observed to take.
+ */
+static __always_inline u64 fsl_a008585_reread_counter(u64 (*func)(void))
+{
+	u64 cval_old, cval_new;
+	int timeout = 200;
+
+	do {
+		isb();
+		cval_old = func();
+		cval_new = func();
+		timeout--;
+	} while (unlikely(cval_old != cval_new) && timeout);
+
+	WARN_ON_ONCE(!timeout);
+	return cval_new;
+}
+
+u64 arch_counter_get_cntvct_ool(void)
+{
+	return fsl_a008585_reread_counter(__arch_counter_get_cntvct);
+}
+
+u64 arch_timer_get_vtval_ool(void)
+{
+	return fsl_a008585_reread_counter(__arch_timer_get_vtval);
+}
+
+u64 arch_timer_get_ptval_ool(void)
+{
+	return fsl_a008585_reread_counter(__arch_timer_get_ptval);
+}
+
+#endif /* CONFIG_FSL_ERRATUM_A008585 */
+
 static __always_inline
 void arch_timer_reg_write(int access, enum arch_timer_reg reg, u32 val,
 			  struct clock_event_device *clk)
@@ -232,6 +277,35 @@ static __always_inline void set_next_event(const int access, unsigned long evt,
 	arch_timer_reg_write(access, ARCH_TIMER_REG_CTRL, ctrl, clk);
 }
 
+#ifdef CONFIG_FSL_ERRATUM_A008585
+static __always_inline void fsl_a008585_set_next_event(const int access,
+		unsigned long evt, struct clock_event_device *clk)
+{
+	unsigned long ctrl;
+	u64 cval = evt + arch_counter_get_cntvct();
+
+	ctrl = arch_timer_reg_read(access, ARCH_TIMER_REG_CTRL, clk);
+	ctrl |= ARCH_TIMER_CTRL_ENABLE;
+	ctrl &= ~ARCH_TIMER_CTRL_IT_MASK;
+	arch_timer_cval_write_cp15(access, cval);
+	arch_timer_reg_write(access, ARCH_TIMER_REG_CTRL, ctrl, clk);
+}
+
+static int fsl_a008585_set_next_event_virt(unsigned long evt,
+					   struct clock_event_device *clk)
+{
+	fsl_a008585_set_next_event(ARCH_TIMER_VIRT_ACCESS, evt, clk);
+	return 0;
+}
+
+static int fsl_a008585_set_next_event_phys(unsigned long evt,
+					   struct clock_event_device *clk)
+{
+	fsl_a008585_set_next_event(ARCH_TIMER_PHYS_ACCESS, evt, clk);
+	return 0;
+}
+#endif /* CONFIG_FSL_ERRATUM_A008585 */
+
 static int arch_timer_set_next_event_virt(unsigned long evt,
 					  struct clock_event_device *clk)
 {
@@ -260,6 +334,19 @@ static int arch_timer_set_next_event_phys_mem(unsigned long evt,
 	return 0;
 }
 
+static void fsl_a008585_set_sne(struct clock_event_device *clk)
+{
+#ifdef CONFIG_FSL_ERRATUM_A008585
+	if (!static_branch_unlikely(&arch_timer_read_ool_enabled))
+		return;
+
+	if (arch_timer_uses_ppi == VIRT_PPI)
+		clk->set_next_event = fsl_a008585_set_next_event_virt;
+	else
+		clk->set_next_event = fsl_a008585_set_next_event_phys;
+#endif
+}
+
 static void __arch_timer_setup(unsigned type,
 			       struct clock_event_device *clk)
 {
@@ -288,6 +375,8 @@ static void __arch_timer_setup(unsigned type,
 		default:
 			BUG();
 		}
+
+		fsl_a008585_set_sne(clk);
 	} else {
 		clk->features |= CLOCK_EVT_FEAT_DYNIRQ;
 		clk->name = "arch_mem_timer";
@@ -485,6 +574,15 @@ static void __init arch_counter_register(unsigned type)
 			arch_timer_read_counter = arch_counter_get_cntvct;
 		else
 			arch_timer_read_counter = arch_counter_get_cntpct;
+
+#ifdef CONFIG_FSL_ERRATUM_A008585
+		/*
+		 * Don't use the vdso fastpath if errata require using
+		 * the out-of-line counter accessor.
+		 */
+		if (static_branch_unlikely(&arch_timer_read_ool_enabled))
+			clocksource_counter.name = "arch_sys_counter_ool";
+#endif
 	} else {
 		arch_timer_read_counter = arch_counter_get_cntvct_mem;
 
@@ -763,6 +861,11 @@ static void __init arch_timer_of_init(struct device_node *np)
 
 	arch_timer_c3stop = !of_property_read_bool(np, "always-on");
 
+#ifdef CONFIG_FSL_ERRATUM_A008585
+	if (of_property_read_bool(np, "fsl,erratum-a008585"))
+		static_branch_enable(&arch_timer_read_ool_enabled);
+#endif
+
 	/*
 	 * If we cannot rely on firmware initializing the timer registers then
 	 * we should use the physical timers instead.
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 3/3] arm64: arch_timer: Work around QorIQ Erratum A-008585
@ 2016-07-01 22:41     ` Scott Wood
  0 siblings, 0 replies; 24+ messages in thread
From: Scott Wood @ 2016-07-01 22:41 UTC (permalink / raw)
  To: linux-arm-kernel

Erratum A-008585 says that the ARM generic timer counter "has the
potential to contain an erroneous value for a small number of core
clock cycles every time the timer value changes".  Accesses to TVAL
(both read and write) are also affected due to the implicit counter
read.  Accesses to CVAL are not affected.

The workaround is to reread TVAL and count registers until successive reads
return the same value, and when writing TVAL to retry until counter
reads before and after the write return the same value.

This erratum can be found on LS1043A and LS2080A.

Signed-off-by: Scott Wood <oss@buserror.net>
---
v3:
- Used cval rather than a loop for the write side of the erratum
- Added a Kconfig control
- Moved the device tree binding into its own patch
- Added erratum to silicon-errata.txt
- Changed function names to contain the erratum name
- Factored out the setting of erratum versions of set_next_event
  to improve readability
- Added a comment clarifying that the timeout is arbitrary

v2:
Significant rework based on feedback, including using static_key,
disabling VDSO counter access rather than adding the workaround to the
VDSO, and uninlining the loops.

Dropped the separate property for indicating that writes to TVAL are
affected, as I believe that's just a side effect of the implicit
counter read being corrupted, and thus a chip that is affected by one
will always be affected by the other.

Dropped the arm32 portion as it seems there was confusion about whether
LS1021A is affected.  Currently I am being told that it is not
affected.

I considered writing to CVAL rather than looping on TVAL writes, but
that would still have required separate set_next_event() code for the
erratum, and adding CVAL to the enum would have required a bunch of
extra handlers in switch statements (even where unused, due to compiler
warnings about unhandled enum values) including in an arm32 header.  It
seemed better to avoid the arm32 interaction and new untested
accessors.
---
 Documentation/arm64/silicon-errata.txt |   2 +
 arch/arm64/include/asm/arch_timer.h    |  48 ++++++++++++---
 drivers/clocksource/Kconfig            |  10 ++++
 drivers/clocksource/arm_arch_timer.c   | 103 +++++++++++++++++++++++++++++++++
 4 files changed, 154 insertions(+), 9 deletions(-)

diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
index ba4b6ac..5778f62 100644
--- a/Documentation/arm64/silicon-errata.txt
+++ b/Documentation/arm64/silicon-errata.txt
@@ -57,3 +57,5 @@ stable kernels.
 | Cavium         | ThunderX ITS    | #22375, #24313  | CAVIUM_ERRATUM_22375    |
 | Cavium         | ThunderX GICv3  | #23154          | CAVIUM_ERRATUM_23154    |
 | Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456    |
+|                |                 |                 |                         |
+| Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585     |
diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
index fbe0ca3..70fbad9 100644
--- a/arch/arm64/include/asm/arch_timer.h
+++ b/arch/arm64/include/asm/arch_timer.h
@@ -23,10 +23,34 @@
 
 #include <linux/bug.h>
 #include <linux/init.h>
+#include <linux/jump_label.h>
 #include <linux/types.h>
 
 #include <clocksource/arm_arch_timer.h>
 
+extern struct static_key_false arch_timer_read_ool_enabled;
+
+#define ARCH_TIMER_REG_READ(reg, func) \
+extern u64 func##_ool(void); \
+static inline u64 __##func(void) \
+{ \
+	u64 val; \
+	asm volatile("mrs %0, " reg : "=r" (val)); \
+	return val; \
+} \
+static inline u64 _##func(void) \
+{ \
+	if (IS_ENABLED(CONFIG_FSL_ERRATUM_A008585) && \
+	    static_branch_unlikely(&arch_timer_read_ool_enabled)) \
+		return func##_ool(); \
+	else \
+		return __##func(); \
+}
+
+ARCH_TIMER_REG_READ("cntp_tval_el0", arch_timer_get_ptval)
+ARCH_TIMER_REG_READ("cntv_tval_el0", arch_timer_get_vtval)
+ARCH_TIMER_REG_READ("cntvct_el0", arch_counter_get_cntvct)
+
 /*
  * These register accessors are marked inline so the compiler can
  * nicely work out which register we want, and chuck away the rest of
@@ -58,6 +82,16 @@ void arch_timer_reg_write_cp15(int access, enum arch_timer_reg reg, u32 val)
 	isb();
 }
 
+static __always_inline void arch_timer_cval_write_cp15(int access, u64 val)
+{
+	if (access == ARCH_TIMER_PHYS_ACCESS)
+		asm volatile("msr cntp_cval_el0, %0" : : "r" (val));
+	else if (access == ARCH_TIMER_VIRT_ACCESS)
+		asm volatile("msr cntv_cval_el0, %0" : : "r" (val));
+
+	isb();
+}
+
 static __always_inline
 u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
 {
@@ -66,19 +100,19 @@ u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
 	if (access == ARCH_TIMER_PHYS_ACCESS) {
 		switch (reg) {
 		case ARCH_TIMER_REG_CTRL:
-			asm volatile("mrs %0,  cntp_ctl_el0" : "=r" (val));
+			asm volatile("mrs %0, cntp_ctl_el0" : "=r" (val));
 			break;
 		case ARCH_TIMER_REG_TVAL:
-			asm volatile("mrs %0, cntp_tval_el0" : "=r" (val));
+			val = _arch_timer_get_ptval();
 			break;
 		}
 	} else if (access == ARCH_TIMER_VIRT_ACCESS) {
 		switch (reg) {
 		case ARCH_TIMER_REG_CTRL:
-			asm volatile("mrs %0,  cntv_ctl_el0" : "=r" (val));
+			asm volatile("mrs %0, cntv_ctl_el0" : "=r" (val));
 			break;
 		case ARCH_TIMER_REG_TVAL:
-			asm volatile("mrs %0, cntv_tval_el0" : "=r" (val));
+			val = _arch_timer_get_vtval();
 			break;
 		}
 	}
@@ -116,12 +150,8 @@ static inline u64 arch_counter_get_cntpct(void)
 
 static inline u64 arch_counter_get_cntvct(void)
 {
-	u64 cval;
-
 	isb();
-	asm volatile("mrs %0, cntvct_el0" : "=r" (cval));
-
-	return cval;
+	return _arch_counter_get_cntvct();
 }
 
 static inline int arch_timer_arch_init(void)
diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
index c346be6..672ddc3 100644
--- a/drivers/clocksource/Kconfig
+++ b/drivers/clocksource/Kconfig
@@ -207,6 +207,16 @@ config ARM_ARCH_TIMER_EVTSTREAM
 	  This must be disabled for hardware validation purposes to detect any
 	  hardware anomalies of missing events.
 
+config FSL_ERRATUM_A008585
+	bool "Workaround for Freescale/NXP Erratum A-008585"
+	default y
+	depends on ARM_ARCH_TIMER && ARM64
+	help
+	  This option enables a workaround for Freescale/NXP Erratum
+	  A-008585 ("ARM generic timer may contain an erroneous
+	  value").  The workaround will only be active if the
+	  fsl,erratum-a008585 property is found in the timer node.
+
 config ARM_GLOBAL_TIMER
 	bool
 	select CLKSRC_OF if OF
diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index 5152b38..7ead4eb 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -83,6 +83,51 @@ static bool arch_timer_mem_use_virtual;
  * Architected system timer support.
  */
 
+#ifdef CONFIG_FSL_ERRATUM_A008585
+DEFINE_STATIC_KEY_FALSE(arch_timer_read_ool_enabled);
+EXPORT_SYMBOL_GPL(arch_timer_read_ool_enabled);
+
+/*
+ * __always_inline is used to ensure that func() is not an actual function
+ * pointer, which would result in the register accesses potentially being too
+ * far apart for the loop to work.
+ *
+ * The timeout is an arbitrary value well beyond the highest number
+ * of iterations the loop has been observed to take.
+ */
+static __always_inline u64 fsl_a008585_reread_counter(u64 (*func)(void))
+{
+	u64 cval_old, cval_new;
+	int timeout = 200;
+
+	do {
+		isb();
+		cval_old = func();
+		cval_new = func();
+		timeout--;
+	} while (unlikely(cval_old != cval_new) && timeout);
+
+	WARN_ON_ONCE(!timeout);
+	return cval_new;
+}
+
+u64 arch_counter_get_cntvct_ool(void)
+{
+	return fsl_a008585_reread_counter(__arch_counter_get_cntvct);
+}
+
+u64 arch_timer_get_vtval_ool(void)
+{
+	return fsl_a008585_reread_counter(__arch_timer_get_vtval);
+}
+
+u64 arch_timer_get_ptval_ool(void)
+{
+	return fsl_a008585_reread_counter(__arch_timer_get_ptval);
+}
+
+#endif /* CONFIG_FSL_ERRATUM_A008585 */
+
 static __always_inline
 void arch_timer_reg_write(int access, enum arch_timer_reg reg, u32 val,
 			  struct clock_event_device *clk)
@@ -232,6 +277,35 @@ static __always_inline void set_next_event(const int access, unsigned long evt,
 	arch_timer_reg_write(access, ARCH_TIMER_REG_CTRL, ctrl, clk);
 }
 
+#ifdef CONFIG_FSL_ERRATUM_A008585
+static __always_inline void fsl_a008585_set_next_event(const int access,
+		unsigned long evt, struct clock_event_device *clk)
+{
+	unsigned long ctrl;
+	u64 cval = evt + arch_counter_get_cntvct();
+
+	ctrl = arch_timer_reg_read(access, ARCH_TIMER_REG_CTRL, clk);
+	ctrl |= ARCH_TIMER_CTRL_ENABLE;
+	ctrl &= ~ARCH_TIMER_CTRL_IT_MASK;
+	arch_timer_cval_write_cp15(access, cval);
+	arch_timer_reg_write(access, ARCH_TIMER_REG_CTRL, ctrl, clk);
+}
+
+static int fsl_a008585_set_next_event_virt(unsigned long evt,
+					   struct clock_event_device *clk)
+{
+	fsl_a008585_set_next_event(ARCH_TIMER_VIRT_ACCESS, evt, clk);
+	return 0;
+}
+
+static int fsl_a008585_set_next_event_phys(unsigned long evt,
+					   struct clock_event_device *clk)
+{
+	fsl_a008585_set_next_event(ARCH_TIMER_PHYS_ACCESS, evt, clk);
+	return 0;
+}
+#endif /* CONFIG_FSL_ERRATUM_A008585 */
+
 static int arch_timer_set_next_event_virt(unsigned long evt,
 					  struct clock_event_device *clk)
 {
@@ -260,6 +334,19 @@ static int arch_timer_set_next_event_phys_mem(unsigned long evt,
 	return 0;
 }
 
+static void fsl_a008585_set_sne(struct clock_event_device *clk)
+{
+#ifdef CONFIG_FSL_ERRATUM_A008585
+	if (!static_branch_unlikely(&arch_timer_read_ool_enabled))
+		return;
+
+	if (arch_timer_uses_ppi == VIRT_PPI)
+		clk->set_next_event = fsl_a008585_set_next_event_virt;
+	else
+		clk->set_next_event = fsl_a008585_set_next_event_phys;
+#endif
+}
+
 static void __arch_timer_setup(unsigned type,
 			       struct clock_event_device *clk)
 {
@@ -288,6 +375,8 @@ static void __arch_timer_setup(unsigned type,
 		default:
 			BUG();
 		}
+
+		fsl_a008585_set_sne(clk);
 	} else {
 		clk->features |= CLOCK_EVT_FEAT_DYNIRQ;
 		clk->name = "arch_mem_timer";
@@ -485,6 +574,15 @@ static void __init arch_counter_register(unsigned type)
 			arch_timer_read_counter = arch_counter_get_cntvct;
 		else
 			arch_timer_read_counter = arch_counter_get_cntpct;
+
+#ifdef CONFIG_FSL_ERRATUM_A008585
+		/*
+		 * Don't use the vdso fastpath if errata require using
+		 * the out-of-line counter accessor.
+		 */
+		if (static_branch_unlikely(&arch_timer_read_ool_enabled))
+			clocksource_counter.name = "arch_sys_counter_ool";
+#endif
 	} else {
 		arch_timer_read_counter = arch_counter_get_cntvct_mem;
 
@@ -763,6 +861,11 @@ static void __init arch_timer_of_init(struct device_node *np)
 
 	arch_timer_c3stop = !of_property_read_bool(np, "always-on");
 
+#ifdef CONFIG_FSL_ERRATUM_A008585
+	if (of_property_read_bool(np, "fsl,erratum-a008585"))
+		static_branch_enable(&arch_timer_read_ool_enabled);
+#endif
+
 	/*
 	 * If we cannot rely on firmware initializing the timer registers then
 	 * we should use the physical timers instead.
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 3/3] arm64: arch_timer: Work around QorIQ Erratum A-008585
  2016-07-01 22:41     ` Scott Wood
@ 2016-07-04  9:58         ` Will Deacon
  -1 siblings, 0 replies; 24+ messages in thread
From: Will Deacon @ 2016-07-04  9:58 UTC (permalink / raw)
  To: Scott Wood
  Cc: Catalin Marinas, Marc Zyngier,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	devicetree-u79uwXL29TY76Z2rM5mHXA, stuart.yoder-3arQi8VN3Tc

On Fri, Jul 01, 2016 at 05:41:37PM -0500, Scott Wood wrote:
> Erratum A-008585 says that the ARM generic timer counter "has the
> potential to contain an erroneous value for a small number of core
> clock cycles every time the timer value changes".  Accesses to TVAL
> (both read and write) are also affected due to the implicit counter
> read.  Accesses to CVAL are not affected.
> 
> The workaround is to reread TVAL and count registers until successive reads
> return the same value, and when writing TVAL to retry until counter
> reads before and after the write return the same value.
> 
> This erratum can be found on LS1043A and LS2080A.
> 
> Signed-off-by: Scott Wood <oss-fOR+EgIDQEHk1uMJSBkQmQ@public.gmane.org>
> ---
> v3:
> - Used cval rather than a loop for the write side of the erratum
> - Added a Kconfig control
> - Moved the device tree binding into its own patch
> - Added erratum to silicon-errata.txt
> - Changed function names to contain the erratum name
> - Factored out the setting of erratum versions of set_next_event
>   to improve readability
> - Added a comment clarifying that the timeout is arbitrary
> 
> v2:
> Significant rework based on feedback, including using static_key,
> disabling VDSO counter access rather than adding the workaround to the
> VDSO, and uninlining the loops.
> 
> Dropped the separate property for indicating that writes to TVAL are
> affected, as I believe that's just a side effect of the implicit
> counter read being corrupted, and thus a chip that is affected by one
> will always be affected by the other.
> 
> Dropped the arm32 portion as it seems there was confusion about whether
> LS1021A is affected.  Currently I am being told that it is not
> affected.
> 
> I considered writing to CVAL rather than looping on TVAL writes, but
> that would still have required separate set_next_event() code for the
> erratum, and adding CVAL to the enum would have required a bunch of
> extra handlers in switch statements (even where unused, due to compiler
> warnings about unhandled enum values) including in an arm32 header.  It
> seemed better to avoid the arm32 interaction and new untested
> accessors.
> ---
>  Documentation/arm64/silicon-errata.txt |   2 +
>  arch/arm64/include/asm/arch_timer.h    |  48 ++++++++++++---
>  drivers/clocksource/Kconfig            |  10 ++++
>  drivers/clocksource/arm_arch_timer.c   | 103 +++++++++++++++++++++++++++++++++
>  4 files changed, 154 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
> index ba4b6ac..5778f62 100644
> --- a/Documentation/arm64/silicon-errata.txt
> +++ b/Documentation/arm64/silicon-errata.txt
> @@ -57,3 +57,5 @@ stable kernels.
>  | Cavium         | ThunderX ITS    | #22375, #24313  | CAVIUM_ERRATUM_22375    |
>  | Cavium         | ThunderX GICv3  | #23154          | CAVIUM_ERRATUM_23154    |
>  | Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456    |
> +|                |                 |                 |                         |
> +| Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585     |
> diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
> index fbe0ca3..70fbad9 100644
> --- a/arch/arm64/include/asm/arch_timer.h
> +++ b/arch/arm64/include/asm/arch_timer.h
> @@ -23,10 +23,34 @@
>  
>  #include <linux/bug.h>
>  #include <linux/init.h>
> +#include <linux/jump_label.h>
>  #include <linux/types.h>
>  
>  #include <clocksource/arm_arch_timer.h>
>  
> +extern struct static_key_false arch_timer_read_ool_enabled;
> +
> +#define ARCH_TIMER_REG_READ(reg, func) \
> +extern u64 func##_ool(void); \
> +static inline u64 __##func(void) \
> +{ \
> +	u64 val; \
> +	asm volatile("mrs %0, " reg : "=r" (val)); \
> +	return val; \
> +} \
> +static inline u64 _##func(void) \
> +{ \
> +	if (IS_ENABLED(CONFIG_FSL_ERRATUM_A008585) && \
> +	    static_branch_unlikely(&arch_timer_read_ool_enabled)) \
> +		return func##_ool(); \
> +	else \
> +		return __##func(); \
> +}
> +
> +ARCH_TIMER_REG_READ("cntp_tval_el0", arch_timer_get_ptval)
> +ARCH_TIMER_REG_READ("cntv_tval_el0", arch_timer_get_vtval)
> +ARCH_TIMER_REG_READ("cntvct_el0", arch_counter_get_cntvct)

It's probably worth #undefing the macro here, since it really shouldn't
grow any users outside of this header file.

> +
>  /*
>   * These register accessors are marked inline so the compiler can
>   * nicely work out which register we want, and chuck away the rest of
> @@ -58,6 +82,16 @@ void arch_timer_reg_write_cp15(int access, enum arch_timer_reg reg, u32 val)
>  	isb();
>  }
>  
> +static __always_inline void arch_timer_cval_write_cp15(int access, u64 val)
> +{
> +	if (access == ARCH_TIMER_PHYS_ACCESS)
> +		asm volatile("msr cntp_cval_el0, %0" : : "r" (val));
> +	else if (access == ARCH_TIMER_VIRT_ACCESS)
> +		asm volatile("msr cntv_cval_el0, %0" : : "r" (val));
> +
> +	isb();
> +}
> +
>  static __always_inline
>  u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
>  {
> @@ -66,19 +100,19 @@ u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
>  	if (access == ARCH_TIMER_PHYS_ACCESS) {
>  		switch (reg) {
>  		case ARCH_TIMER_REG_CTRL:
> -			asm volatile("mrs %0,  cntp_ctl_el0" : "=r" (val));
> +			asm volatile("mrs %0, cntp_ctl_el0" : "=r" (val));
>  			break;
>  		case ARCH_TIMER_REG_TVAL:
> -			asm volatile("mrs %0, cntp_tval_el0" : "=r" (val));
> +			val = _arch_timer_get_ptval();
>  			break;
>  		}
>  	} else if (access == ARCH_TIMER_VIRT_ACCESS) {
>  		switch (reg) {
>  		case ARCH_TIMER_REG_CTRL:
> -			asm volatile("mrs %0,  cntv_ctl_el0" : "=r" (val));
> +			asm volatile("mrs %0, cntv_ctl_el0" : "=r" (val));
>  			break;
>  		case ARCH_TIMER_REG_TVAL:
> -			asm volatile("mrs %0, cntv_tval_el0" : "=r" (val));
> +			val = _arch_timer_get_vtval();
>  			break;
>  		}
>  	}
> @@ -116,12 +150,8 @@ static inline u64 arch_counter_get_cntpct(void)
>  
>  static inline u64 arch_counter_get_cntvct(void)
>  {
> -	u64 cval;
> -
>  	isb();
> -	asm volatile("mrs %0, cntvct_el0" : "=r" (cval));
> -
> -	return cval;
> +	return _arch_counter_get_cntvct();
>  }
>  
>  static inline int arch_timer_arch_init(void)
> diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
> index c346be6..672ddc3 100644
> --- a/drivers/clocksource/Kconfig
> +++ b/drivers/clocksource/Kconfig
> @@ -207,6 +207,16 @@ config ARM_ARCH_TIMER_EVTSTREAM
>  	  This must be disabled for hardware validation purposes to detect any
>  	  hardware anomalies of missing events.
>  
> +config FSL_ERRATUM_A008585
> +	bool "Workaround for Freescale/NXP Erratum A-008585"
> +	default y
> +	depends on ARM_ARCH_TIMER && ARM64
> +	help
> +	  This option enables a workaround for Freescale/NXP Erratum
> +	  A-008585 ("ARM generic timer may contain an erroneous
> +	  value").  The workaround will only be active if the
> +	  fsl,erratum-a008585 property is found in the timer node.
> +
>  config ARM_GLOBAL_TIMER
>  	bool
>  	select CLKSRC_OF if OF
> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
> index 5152b38..7ead4eb 100644
> --- a/drivers/clocksource/arm_arch_timer.c
> +++ b/drivers/clocksource/arm_arch_timer.c
> @@ -83,6 +83,51 @@ static bool arch_timer_mem_use_virtual;
>   * Architected system timer support.
>   */
>  
> +#ifdef CONFIG_FSL_ERRATUM_A008585
> +DEFINE_STATIC_KEY_FALSE(arch_timer_read_ool_enabled);
> +EXPORT_SYMBOL_GPL(arch_timer_read_ool_enabled);
> +
> +/*
> + * __always_inline is used to ensure that func() is not an actual function
> + * pointer, which would result in the register accesses potentially being too
> + * far apart for the loop to work.
> + *
> + * The timeout is an arbitrary value well beyond the highest number
> + * of iterations the loop has been observed to take.
> + */
> +static __always_inline u64 fsl_a008585_reread_counter(u64 (*func)(void))
> +{
> +	u64 cval_old, cval_new;
> +	int timeout = 200;
> +
> +	do {
> +		isb();
> +		cval_old = func();
> +		cval_new = func();
> +		timeout--;
> +	} while (unlikely(cval_old != cval_new) && timeout);
> +
> +	WARN_ON_ONCE(!timeout);
> +	return cval_new;
> +}
> +
> +u64 arch_counter_get_cntvct_ool(void)
> +{
> +	return fsl_a008585_reread_counter(__arch_counter_get_cntvct);
> +}
> +
> +u64 arch_timer_get_vtval_ool(void)
> +{
> +	return fsl_a008585_reread_counter(__arch_timer_get_vtval);
> +}
> +
> +u64 arch_timer_get_ptval_ool(void)
> +{
> +	return fsl_a008585_reread_counter(__arch_timer_get_ptval);
> +}
> +
> +#endif /* CONFIG_FSL_ERRATUM_A008585 */
> +
>  static __always_inline
>  void arch_timer_reg_write(int access, enum arch_timer_reg reg, u32 val,
>  			  struct clock_event_device *clk)
> @@ -232,6 +277,35 @@ static __always_inline void set_next_event(const int access, unsigned long evt,
>  	arch_timer_reg_write(access, ARCH_TIMER_REG_CTRL, ctrl, clk);
>  }
>  
> +#ifdef CONFIG_FSL_ERRATUM_A008585
> +static __always_inline void fsl_a008585_set_next_event(const int access,
> +		unsigned long evt, struct clock_event_device *clk)
> +{
> +	unsigned long ctrl;
> +	u64 cval = evt + arch_counter_get_cntvct();
> +
> +	ctrl = arch_timer_reg_read(access, ARCH_TIMER_REG_CTRL, clk);
> +	ctrl |= ARCH_TIMER_CTRL_ENABLE;
> +	ctrl &= ~ARCH_TIMER_CTRL_IT_MASK;
> +	arch_timer_cval_write_cp15(access, cval);
> +	arch_timer_reg_write(access, ARCH_TIMER_REG_CTRL, ctrl, clk);
> +}
> +
> +static int fsl_a008585_set_next_event_virt(unsigned long evt,
> +					   struct clock_event_device *clk)
> +{
> +	fsl_a008585_set_next_event(ARCH_TIMER_VIRT_ACCESS, evt, clk);
> +	return 0;
> +}
> +
> +static int fsl_a008585_set_next_event_phys(unsigned long evt,
> +					   struct clock_event_device *clk)
> +{
> +	fsl_a008585_set_next_event(ARCH_TIMER_PHYS_ACCESS, evt, clk);
> +	return 0;
> +}
> +#endif /* CONFIG_FSL_ERRATUM_A008585 */
> +
>  static int arch_timer_set_next_event_virt(unsigned long evt,
>  					  struct clock_event_device *clk)
>  {
> @@ -260,6 +334,19 @@ static int arch_timer_set_next_event_phys_mem(unsigned long evt,
>  	return 0;
>  }
>  
> +static void fsl_a008585_set_sne(struct clock_event_device *clk)
> +{
> +#ifdef CONFIG_FSL_ERRATUM_A008585
> +	if (!static_branch_unlikely(&arch_timer_read_ool_enabled))
> +		return;
> +
> +	if (arch_timer_uses_ppi == VIRT_PPI)
> +		clk->set_next_event = fsl_a008585_set_next_event_virt;
> +	else
> +		clk->set_next_event = fsl_a008585_set_next_event_phys;
> +#endif
> +}
> +
>  static void __arch_timer_setup(unsigned type,
>  			       struct clock_event_device *clk)
>  {
> @@ -288,6 +375,8 @@ static void __arch_timer_setup(unsigned type,
>  		default:
>  			BUG();
>  		}
> +
> +		fsl_a008585_set_sne(clk);
>  	} else {
>  		clk->features |= CLOCK_EVT_FEAT_DYNIRQ;
>  		clk->name = "arch_mem_timer";
> @@ -485,6 +574,15 @@ static void __init arch_counter_register(unsigned type)
>  			arch_timer_read_counter = arch_counter_get_cntvct;
>  		else
>  			arch_timer_read_counter = arch_counter_get_cntpct;
> +
> +#ifdef CONFIG_FSL_ERRATUM_A008585
> +		/*
> +		 * Don't use the vdso fastpath if errata require using
> +		 * the out-of-line counter accessor.
> +		 */
> +		if (static_branch_unlikely(&arch_timer_read_ool_enabled))
> +			clocksource_counter.name = "arch_sys_counter_ool";

We could avoid this entirely by providing an arch_clocksource_data structure
like x86 does, and get rid of the ugly strcmp magic from update_vsyscall.

Will
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v3 3/3] arm64: arch_timer: Work around QorIQ Erratum A-008585
@ 2016-07-04  9:58         ` Will Deacon
  0 siblings, 0 replies; 24+ messages in thread
From: Will Deacon @ 2016-07-04  9:58 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jul 01, 2016 at 05:41:37PM -0500, Scott Wood wrote:
> Erratum A-008585 says that the ARM generic timer counter "has the
> potential to contain an erroneous value for a small number of core
> clock cycles every time the timer value changes".  Accesses to TVAL
> (both read and write) are also affected due to the implicit counter
> read.  Accesses to CVAL are not affected.
> 
> The workaround is to reread TVAL and count registers until successive reads
> return the same value, and when writing TVAL to retry until counter
> reads before and after the write return the same value.
> 
> This erratum can be found on LS1043A and LS2080A.
> 
> Signed-off-by: Scott Wood <oss@buserror.net>
> ---
> v3:
> - Used cval rather than a loop for the write side of the erratum
> - Added a Kconfig control
> - Moved the device tree binding into its own patch
> - Added erratum to silicon-errata.txt
> - Changed function names to contain the erratum name
> - Factored out the setting of erratum versions of set_next_event
>   to improve readability
> - Added a comment clarifying that the timeout is arbitrary
> 
> v2:
> Significant rework based on feedback, including using static_key,
> disabling VDSO counter access rather than adding the workaround to the
> VDSO, and uninlining the loops.
> 
> Dropped the separate property for indicating that writes to TVAL are
> affected, as I believe that's just a side effect of the implicit
> counter read being corrupted, and thus a chip that is affected by one
> will always be affected by the other.
> 
> Dropped the arm32 portion as it seems there was confusion about whether
> LS1021A is affected.  Currently I am being told that it is not
> affected.
> 
> I considered writing to CVAL rather than looping on TVAL writes, but
> that would still have required separate set_next_event() code for the
> erratum, and adding CVAL to the enum would have required a bunch of
> extra handlers in switch statements (even where unused, due to compiler
> warnings about unhandled enum values) including in an arm32 header.  It
> seemed better to avoid the arm32 interaction and new untested
> accessors.
> ---
>  Documentation/arm64/silicon-errata.txt |   2 +
>  arch/arm64/include/asm/arch_timer.h    |  48 ++++++++++++---
>  drivers/clocksource/Kconfig            |  10 ++++
>  drivers/clocksource/arm_arch_timer.c   | 103 +++++++++++++++++++++++++++++++++
>  4 files changed, 154 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
> index ba4b6ac..5778f62 100644
> --- a/Documentation/arm64/silicon-errata.txt
> +++ b/Documentation/arm64/silicon-errata.txt
> @@ -57,3 +57,5 @@ stable kernels.
>  | Cavium         | ThunderX ITS    | #22375, #24313  | CAVIUM_ERRATUM_22375    |
>  | Cavium         | ThunderX GICv3  | #23154          | CAVIUM_ERRATUM_23154    |
>  | Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456    |
> +|                |                 |                 |                         |
> +| Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585     |
> diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
> index fbe0ca3..70fbad9 100644
> --- a/arch/arm64/include/asm/arch_timer.h
> +++ b/arch/arm64/include/asm/arch_timer.h
> @@ -23,10 +23,34 @@
>  
>  #include <linux/bug.h>
>  #include <linux/init.h>
> +#include <linux/jump_label.h>
>  #include <linux/types.h>
>  
>  #include <clocksource/arm_arch_timer.h>
>  
> +extern struct static_key_false arch_timer_read_ool_enabled;
> +
> +#define ARCH_TIMER_REG_READ(reg, func) \
> +extern u64 func##_ool(void); \
> +static inline u64 __##func(void) \
> +{ \
> +	u64 val; \
> +	asm volatile("mrs %0, " reg : "=r" (val)); \
> +	return val; \
> +} \
> +static inline u64 _##func(void) \
> +{ \
> +	if (IS_ENABLED(CONFIG_FSL_ERRATUM_A008585) && \
> +	    static_branch_unlikely(&arch_timer_read_ool_enabled)) \
> +		return func##_ool(); \
> +	else \
> +		return __##func(); \
> +}
> +
> +ARCH_TIMER_REG_READ("cntp_tval_el0", arch_timer_get_ptval)
> +ARCH_TIMER_REG_READ("cntv_tval_el0", arch_timer_get_vtval)
> +ARCH_TIMER_REG_READ("cntvct_el0", arch_counter_get_cntvct)

It's probably worth #undefing the macro here, since it really shouldn't
grow any users outside of this header file.

> +
>  /*
>   * These register accessors are marked inline so the compiler can
>   * nicely work out which register we want, and chuck away the rest of
> @@ -58,6 +82,16 @@ void arch_timer_reg_write_cp15(int access, enum arch_timer_reg reg, u32 val)
>  	isb();
>  }
>  
> +static __always_inline void arch_timer_cval_write_cp15(int access, u64 val)
> +{
> +	if (access == ARCH_TIMER_PHYS_ACCESS)
> +		asm volatile("msr cntp_cval_el0, %0" : : "r" (val));
> +	else if (access == ARCH_TIMER_VIRT_ACCESS)
> +		asm volatile("msr cntv_cval_el0, %0" : : "r" (val));
> +
> +	isb();
> +}
> +
>  static __always_inline
>  u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
>  {
> @@ -66,19 +100,19 @@ u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
>  	if (access == ARCH_TIMER_PHYS_ACCESS) {
>  		switch (reg) {
>  		case ARCH_TIMER_REG_CTRL:
> -			asm volatile("mrs %0,  cntp_ctl_el0" : "=r" (val));
> +			asm volatile("mrs %0, cntp_ctl_el0" : "=r" (val));
>  			break;
>  		case ARCH_TIMER_REG_TVAL:
> -			asm volatile("mrs %0, cntp_tval_el0" : "=r" (val));
> +			val = _arch_timer_get_ptval();
>  			break;
>  		}
>  	} else if (access == ARCH_TIMER_VIRT_ACCESS) {
>  		switch (reg) {
>  		case ARCH_TIMER_REG_CTRL:
> -			asm volatile("mrs %0,  cntv_ctl_el0" : "=r" (val));
> +			asm volatile("mrs %0, cntv_ctl_el0" : "=r" (val));
>  			break;
>  		case ARCH_TIMER_REG_TVAL:
> -			asm volatile("mrs %0, cntv_tval_el0" : "=r" (val));
> +			val = _arch_timer_get_vtval();
>  			break;
>  		}
>  	}
> @@ -116,12 +150,8 @@ static inline u64 arch_counter_get_cntpct(void)
>  
>  static inline u64 arch_counter_get_cntvct(void)
>  {
> -	u64 cval;
> -
>  	isb();
> -	asm volatile("mrs %0, cntvct_el0" : "=r" (cval));
> -
> -	return cval;
> +	return _arch_counter_get_cntvct();
>  }
>  
>  static inline int arch_timer_arch_init(void)
> diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
> index c346be6..672ddc3 100644
> --- a/drivers/clocksource/Kconfig
> +++ b/drivers/clocksource/Kconfig
> @@ -207,6 +207,16 @@ config ARM_ARCH_TIMER_EVTSTREAM
>  	  This must be disabled for hardware validation purposes to detect any
>  	  hardware anomalies of missing events.
>  
> +config FSL_ERRATUM_A008585
> +	bool "Workaround for Freescale/NXP Erratum A-008585"
> +	default y
> +	depends on ARM_ARCH_TIMER && ARM64
> +	help
> +	  This option enables a workaround for Freescale/NXP Erratum
> +	  A-008585 ("ARM generic timer may contain an erroneous
> +	  value").  The workaround will only be active if the
> +	  fsl,erratum-a008585 property is found in the timer node.
> +
>  config ARM_GLOBAL_TIMER
>  	bool
>  	select CLKSRC_OF if OF
> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
> index 5152b38..7ead4eb 100644
> --- a/drivers/clocksource/arm_arch_timer.c
> +++ b/drivers/clocksource/arm_arch_timer.c
> @@ -83,6 +83,51 @@ static bool arch_timer_mem_use_virtual;
>   * Architected system timer support.
>   */
>  
> +#ifdef CONFIG_FSL_ERRATUM_A008585
> +DEFINE_STATIC_KEY_FALSE(arch_timer_read_ool_enabled);
> +EXPORT_SYMBOL_GPL(arch_timer_read_ool_enabled);
> +
> +/*
> + * __always_inline is used to ensure that func() is not an actual function
> + * pointer, which would result in the register accesses potentially being too
> + * far apart for the loop to work.
> + *
> + * The timeout is an arbitrary value well beyond the highest number
> + * of iterations the loop has been observed to take.
> + */
> +static __always_inline u64 fsl_a008585_reread_counter(u64 (*func)(void))
> +{
> +	u64 cval_old, cval_new;
> +	int timeout = 200;
> +
> +	do {
> +		isb();
> +		cval_old = func();
> +		cval_new = func();
> +		timeout--;
> +	} while (unlikely(cval_old != cval_new) && timeout);
> +
> +	WARN_ON_ONCE(!timeout);
> +	return cval_new;
> +}
> +
> +u64 arch_counter_get_cntvct_ool(void)
> +{
> +	return fsl_a008585_reread_counter(__arch_counter_get_cntvct);
> +}
> +
> +u64 arch_timer_get_vtval_ool(void)
> +{
> +	return fsl_a008585_reread_counter(__arch_timer_get_vtval);
> +}
> +
> +u64 arch_timer_get_ptval_ool(void)
> +{
> +	return fsl_a008585_reread_counter(__arch_timer_get_ptval);
> +}
> +
> +#endif /* CONFIG_FSL_ERRATUM_A008585 */
> +
>  static __always_inline
>  void arch_timer_reg_write(int access, enum arch_timer_reg reg, u32 val,
>  			  struct clock_event_device *clk)
> @@ -232,6 +277,35 @@ static __always_inline void set_next_event(const int access, unsigned long evt,
>  	arch_timer_reg_write(access, ARCH_TIMER_REG_CTRL, ctrl, clk);
>  }
>  
> +#ifdef CONFIG_FSL_ERRATUM_A008585
> +static __always_inline void fsl_a008585_set_next_event(const int access,
> +		unsigned long evt, struct clock_event_device *clk)
> +{
> +	unsigned long ctrl;
> +	u64 cval = evt + arch_counter_get_cntvct();
> +
> +	ctrl = arch_timer_reg_read(access, ARCH_TIMER_REG_CTRL, clk);
> +	ctrl |= ARCH_TIMER_CTRL_ENABLE;
> +	ctrl &= ~ARCH_TIMER_CTRL_IT_MASK;
> +	arch_timer_cval_write_cp15(access, cval);
> +	arch_timer_reg_write(access, ARCH_TIMER_REG_CTRL, ctrl, clk);
> +}
> +
> +static int fsl_a008585_set_next_event_virt(unsigned long evt,
> +					   struct clock_event_device *clk)
> +{
> +	fsl_a008585_set_next_event(ARCH_TIMER_VIRT_ACCESS, evt, clk);
> +	return 0;
> +}
> +
> +static int fsl_a008585_set_next_event_phys(unsigned long evt,
> +					   struct clock_event_device *clk)
> +{
> +	fsl_a008585_set_next_event(ARCH_TIMER_PHYS_ACCESS, evt, clk);
> +	return 0;
> +}
> +#endif /* CONFIG_FSL_ERRATUM_A008585 */
> +
>  static int arch_timer_set_next_event_virt(unsigned long evt,
>  					  struct clock_event_device *clk)
>  {
> @@ -260,6 +334,19 @@ static int arch_timer_set_next_event_phys_mem(unsigned long evt,
>  	return 0;
>  }
>  
> +static void fsl_a008585_set_sne(struct clock_event_device *clk)
> +{
> +#ifdef CONFIG_FSL_ERRATUM_A008585
> +	if (!static_branch_unlikely(&arch_timer_read_ool_enabled))
> +		return;
> +
> +	if (arch_timer_uses_ppi == VIRT_PPI)
> +		clk->set_next_event = fsl_a008585_set_next_event_virt;
> +	else
> +		clk->set_next_event = fsl_a008585_set_next_event_phys;
> +#endif
> +}
> +
>  static void __arch_timer_setup(unsigned type,
>  			       struct clock_event_device *clk)
>  {
> @@ -288,6 +375,8 @@ static void __arch_timer_setup(unsigned type,
>  		default:
>  			BUG();
>  		}
> +
> +		fsl_a008585_set_sne(clk);
>  	} else {
>  		clk->features |= CLOCK_EVT_FEAT_DYNIRQ;
>  		clk->name = "arch_mem_timer";
> @@ -485,6 +574,15 @@ static void __init arch_counter_register(unsigned type)
>  			arch_timer_read_counter = arch_counter_get_cntvct;
>  		else
>  			arch_timer_read_counter = arch_counter_get_cntpct;
> +
> +#ifdef CONFIG_FSL_ERRATUM_A008585
> +		/*
> +		 * Don't use the vdso fastpath if errata require using
> +		 * the out-of-line counter accessor.
> +		 */
> +		if (static_branch_unlikely(&arch_timer_read_ool_enabled))
> +			clocksource_counter.name = "arch_sys_counter_ool";

We could avoid this entirely by providing an arch_clocksource_data structure
like x86 does, and get rid of the ugly strcmp magic from update_vsyscall.

Will

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 1/3] arm64: arch_timer: Add device tree binding for A-008585 erratum
  2016-07-01 22:41 ` Scott Wood
@ 2016-07-05 15:43     ` Rob Herring
  -1 siblings, 0 replies; 24+ messages in thread
From: Rob Herring @ 2016-07-05 15:43 UTC (permalink / raw)
  To: Scott Wood
  Cc: Catalin Marinas, Will Deacon, Marc Zyngier,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	devicetree-u79uwXL29TY76Z2rM5mHXA, stuart.yoder-3arQi8VN3Tc

On Fri, Jul 01, 2016 at 05:41:35PM -0500, Scott Wood wrote:
> This erratum describes a bug in logic outside the core, so MIDR can't be
> used to identify its presence, and reading an SoC-specific revision
> register from common arch timer code would be awkward.  So, describe it
> in the device tree.
> 
> Signed-off-by: Scott Wood <oss-fOR+EgIDQEHk1uMJSBkQmQ@public.gmane.org>
> ---
>  Documentation/devicetree/bindings/arm/arch_timer.txt | 6 ++++++
>  1 file changed, 6 insertions(+)

Acked-by: Rob Herring <robh-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v3 1/3] arm64: arch_timer: Add device tree binding for A-008585 erratum
@ 2016-07-05 15:43     ` Rob Herring
  0 siblings, 0 replies; 24+ messages in thread
From: Rob Herring @ 2016-07-05 15:43 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jul 01, 2016 at 05:41:35PM -0500, Scott Wood wrote:
> This erratum describes a bug in logic outside the core, so MIDR can't be
> used to identify its presence, and reading an SoC-specific revision
> register from common arch timer code would be awkward.  So, describe it
> in the device tree.
> 
> Signed-off-by: Scott Wood <oss@buserror.net>
> ---
>  Documentation/devicetree/bindings/arm/arch_timer.txt | 6 ++++++
>  1 file changed, 6 insertions(+)

Acked-by: Rob Herring <robh@kernel.org>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 3/3] arm64: arch_timer: Work around QorIQ Erratum A-008585
  2016-07-01 22:41     ` Scott Wood
@ 2016-07-07  9:34       ` Ding Tianhong
  -1 siblings, 0 replies; 24+ messages in thread
From: Ding Tianhong @ 2016-07-07  9:34 UTC (permalink / raw)
  To: Scott Wood, Catalin Marinas, Will Deacon, Marc Zyngier
  Cc: devicetree, stuart.yoder, linux-arm-kernel

On 2016/7/2 6:41, Scott Wood wrote:
> Erratum A-008585 says that the ARM generic timer counter "has the
> potential to contain an erroneous value for a small number of core
> clock cycles every time the timer value changes".  Accesses to TVAL
> (both read and write) are also affected due to the implicit counter
> read.  Accesses to CVAL are not affected.
> 
> The workaround is to reread TVAL and count registers until successive reads
> return the same value, and when writing TVAL to retry until counter
> reads before and after the write return the same value.
> 
> This erratum can be found on LS1043A and LS2080A.
> 
> Signed-off-by: Scott Wood <oss@buserror.net>
> ---
> v3:
> - Used cval rather than a loop for the write side of the erratum
> - Added a Kconfig control
> - Moved the device tree binding into its own patch
> - Added erratum to silicon-errata.txt
> - Changed function names to contain the erratum name
> - Factored out the setting of erratum versions of set_next_event
>   to improve readability
> - Added a comment clarifying that the timeout is arbitrary
> 
> v2:
> Significant rework based on feedback, including using static_key,
> disabling VDSO counter access rather than adding the workaround to the
> VDSO, and uninlining the loops.
> 
> Dropped the separate property for indicating that writes to TVAL are
> affected, as I believe that's just a side effect of the implicit
> counter read being corrupted, and thus a chip that is affected by one
> will always be affected by the other.
> 
> Dropped the arm32 portion as it seems there was confusion about whether
> LS1021A is affected.  Currently I am being told that it is not
> affected.
> 
> I considered writing to CVAL rather than looping on TVAL writes, but
> that would still have required separate set_next_event() code for the
> erratum, and adding CVAL to the enum would have required a bunch of
> extra handlers in switch statements (even where unused, due to compiler
> warnings about unhandled enum values) including in an arm32 header.  It
> seemed better to avoid the arm32 interaction and new untested
> accessors.
> ---
>  Documentation/arm64/silicon-errata.txt |   2 +
>  arch/arm64/include/asm/arch_timer.h    |  48 ++++++++++++---
>  drivers/clocksource/Kconfig            |  10 ++++
>  drivers/clocksource/arm_arch_timer.c   | 103 +++++++++++++++++++++++++++++++++
>  4 files changed, 154 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
> index ba4b6ac..5778f62 100644
> --- a/Documentation/arm64/silicon-errata.txt
> +++ b/Documentation/arm64/silicon-errata.txt
> @@ -57,3 +57,5 @@ stable kernels.
>  | Cavium         | ThunderX ITS    | #22375, #24313  | CAVIUM_ERRATUM_22375    |
>  | Cavium         | ThunderX GICv3  | #23154          | CAVIUM_ERRATUM_23154    |
>  | Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456    |
> +|                |                 |                 |                         |
> +| Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585     |
> diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
> index fbe0ca3..70fbad9 100644
> --- a/arch/arm64/include/asm/arch_timer.h
> +++ b/arch/arm64/include/asm/arch_timer.h
> @@ -23,10 +23,34 @@
>  
>  #include <linux/bug.h>
>  #include <linux/init.h>
> +#include <linux/jump_label.h>
>  #include <linux/types.h>
>  
>  #include <clocksource/arm_arch_timer.h>
>  
> +extern struct static_key_false arch_timer_read_ool_enabled;
> +
> +#define ARCH_TIMER_REG_READ(reg, func) \
> +extern u64 func##_ool(void); \
> +static inline u64 __##func(void) \
> +{ \
> +	u64 val; \
> +	asm volatile("mrs %0, " reg : "=r" (val)); \
> +	return val; \
> +} \
> +static inline u64 _##func(void) \
> +{ \
> +	if (IS_ENABLED(CONFIG_FSL_ERRATUM_A008585) && \
> +	    static_branch_unlikely(&arch_timer_read_ool_enabled)) \
> +		return func##_ool(); \
> +	else \
> +		return __##func(); \
> +}
> +
> +ARCH_TIMER_REG_READ("cntp_tval_el0", arch_timer_get_ptval)
> +ARCH_TIMER_REG_READ("cntv_tval_el0", arch_timer_get_vtval)
> +ARCH_TIMER_REG_READ("cntvct_el0", arch_counter_get_cntvct)
> +
>  /*
>   * These register accessors are marked inline so the compiler can
>   * nicely work out which register we want, and chuck away the rest of
> @@ -58,6 +82,16 @@ void arch_timer_reg_write_cp15(int access, enum arch_timer_reg reg, u32 val)
>  	isb();
>  }
>  
> +static __always_inline void arch_timer_cval_write_cp15(int access, u64 val)
> +{
> +	if (access == ARCH_TIMER_PHYS_ACCESS)
> +		asm volatile("msr cntp_cval_el0, %0" : : "r" (val));
> +	else if (access == ARCH_TIMER_VIRT_ACCESS)
> +		asm volatile("msr cntv_cval_el0, %0" : : "r" (val));
> +
> +	isb();
> +}
> +
>  static __always_inline
>  u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
>  {
> @@ -66,19 +100,19 @@ u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
>  	if (access == ARCH_TIMER_PHYS_ACCESS) {
>  		switch (reg) {
>  		case ARCH_TIMER_REG_CTRL:
> -			asm volatile("mrs %0,  cntp_ctl_el0" : "=r" (val));
> +			asm volatile("mrs %0, cntp_ctl_el0" : "=r" (val));
>  			break;
>  		case ARCH_TIMER_REG_TVAL:
> -			asm volatile("mrs %0, cntp_tval_el0" : "=r" (val));
> +			val = _arch_timer_get_ptval();
>  			break;
>  		}
>  	} else if (access == ARCH_TIMER_VIRT_ACCESS) {
>  		switch (reg) {
>  		case ARCH_TIMER_REG_CTRL:
> -			asm volatile("mrs %0,  cntv_ctl_el0" : "=r" (val));
> +			asm volatile("mrs %0, cntv_ctl_el0" : "=r" (val));
>  			break;
>  		case ARCH_TIMER_REG_TVAL:
> -			asm volatile("mrs %0, cntv_tval_el0" : "=r" (val));
> +			val = _arch_timer_get_vtval();
>  			break;
>  		}
>  	}
> @@ -116,12 +150,8 @@ static inline u64 arch_counter_get_cntpct(void)
>  
>  static inline u64 arch_counter_get_cntvct(void)
>  {
> -	u64 cval;
> -
>  	isb();
> -	asm volatile("mrs %0, cntvct_el0" : "=r" (cval));
> -
> -	return cval;
> +	return _arch_counter_get_cntvct();
>  }
>  
>  static inline int arch_timer_arch_init(void)
> diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
> index c346be6..672ddc3 100644
> --- a/drivers/clocksource/Kconfig
> +++ b/drivers/clocksource/Kconfig
> @@ -207,6 +207,16 @@ config ARM_ARCH_TIMER_EVTSTREAM
>  	  This must be disabled for hardware validation purposes to detect any
>  	  hardware anomalies of missing events.
>  
> +config FSL_ERRATUM_A008585
> +	bool "Workaround for Freescale/NXP Erratum A-008585"
> +	default y
> +	depends on ARM_ARCH_TIMER && ARM64
> +	help
> +	  This option enables a workaround for Freescale/NXP Erratum
> +	  A-008585 ("ARM generic timer may contain an erroneous
> +	  value").  The workaround will only be active if the
> +	  fsl,erratum-a008585 property is found in the timer node.
> +
>  config ARM_GLOBAL_TIMER
>  	bool
>  	select CLKSRC_OF if OF
> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
> index 5152b38..7ead4eb 100644
> --- a/drivers/clocksource/arm_arch_timer.c
> +++ b/drivers/clocksource/arm_arch_timer.c
> @@ -83,6 +83,51 @@ static bool arch_timer_mem_use_virtual;
>   * Architected system timer support.
>   */
>  
> +#ifdef CONFIG_FSL_ERRATUM_A008585
> +DEFINE_STATIC_KEY_FALSE(arch_timer_read_ool_enabled);
> +EXPORT_SYMBOL_GPL(arch_timer_read_ool_enabled);
> +
> +/*
> + * __always_inline is used to ensure that func() is not an actual function
> + * pointer, which would result in the register accesses potentially being too
> + * far apart for the loop to work.
> + *
> + * The timeout is an arbitrary value well beyond the highest number
> + * of iterations the loop has been observed to take.
> + */
> +static __always_inline u64 fsl_a008585_reread_counter(u64 (*func)(void))
> +{
> +	u64 cval_old, cval_new;
> +	int timeout = 200;
> +
> +	do {
> +		isb();
> +		cval_old = func();
> +		cval_new = func();
> +		timeout--;
> +	} while (unlikely(cval_old != cval_new) && timeout);
> +
> +	WARN_ON_ONCE(!timeout);
> +	return cval_new;
> +}
Hi Scott:

I have test this patch, this solution looks will break the performance a little more than I expected.
it will have more than 10% that the cval will read again, we could sure that the cval_old always equal to the
cval_new in the normal circumstances, so I prefer this way:

	do {
		isb();
		cval_old = func();
		cval_new = func();
		timeout--;	
	} while (unlikely((cval_new - cval_old) >> 2) && timeout);

Thanks.
Ding

> +
> +u64 arch_counter_get_cntvct_ool(void)
> +{
> +	return fsl_a008585_reread_counter(__arch_counter_get_cntvct);
> +}
> +
> +u64 arch_timer_get_vtval_ool(void)
> +{
> +	return fsl_a008585_reread_counter(__arch_timer_get_vtval);
> +}
> +
> +u64 arch_timer_get_ptval_ool(void)
> +{
> +	return fsl_a008585_reread_counter(__arch_timer_get_ptval);
> +}
> +
> +#endif /* CONFIG_FSL_ERRATUM_A008585 */
> +
>  static __always_inline
>  void arch_timer_reg_write(int access, enum arch_timer_reg reg, u32 val,
>  			  struct clock_event_device *clk)
> @@ -232,6 +277,35 @@ static __always_inline void set_next_event(const int access, unsigned long evt,
>  	arch_timer_reg_write(access, ARCH_TIMER_REG_CTRL, ctrl, clk);
>  }
>  
> +#ifdef CONFIG_FSL_ERRATUM_A008585
> +static __always_inline void fsl_a008585_set_next_event(const int access,
> +		unsigned long evt, struct clock_event_device *clk)
> +{
> +	unsigned long ctrl;
> +	u64 cval = evt + arch_counter_get_cntvct();
> +
> +	ctrl = arch_timer_reg_read(access, ARCH_TIMER_REG_CTRL, clk);
> +	ctrl |= ARCH_TIMER_CTRL_ENABLE;
> +	ctrl &= ~ARCH_TIMER_CTRL_IT_MASK;
> +	arch_timer_cval_write_cp15(access, cval);
> +	arch_timer_reg_write(access, ARCH_TIMER_REG_CTRL, ctrl, clk);
> +}
> +
> +static int fsl_a008585_set_next_event_virt(unsigned long evt,
> +					   struct clock_event_device *clk)
> +{
> +	fsl_a008585_set_next_event(ARCH_TIMER_VIRT_ACCESS, evt, clk);
> +	return 0;
> +}
> +
> +static int fsl_a008585_set_next_event_phys(unsigned long evt,
> +					   struct clock_event_device *clk)
> +{
> +	fsl_a008585_set_next_event(ARCH_TIMER_PHYS_ACCESS, evt, clk);
> +	return 0;
> +}
> +#endif /* CONFIG_FSL_ERRATUM_A008585 */
> +
>  static int arch_timer_set_next_event_virt(unsigned long evt,
>  					  struct clock_event_device *clk)
>  {
> @@ -260,6 +334,19 @@ static int arch_timer_set_next_event_phys_mem(unsigned long evt,
>  	return 0;
>  }
>  
> +static void fsl_a008585_set_sne(struct clock_event_device *clk)
> +{
> +#ifdef CONFIG_FSL_ERRATUM_A008585
> +	if (!static_branch_unlikely(&arch_timer_read_ool_enabled))
> +		return;
> +
> +	if (arch_timer_uses_ppi == VIRT_PPI)
> +		clk->set_next_event = fsl_a008585_set_next_event_virt;
> +	else
> +		clk->set_next_event = fsl_a008585_set_next_event_phys;
> +#endif
> +}
> +
>  static void __arch_timer_setup(unsigned type,
>  			       struct clock_event_device *clk)
>  {
> @@ -288,6 +375,8 @@ static void __arch_timer_setup(unsigned type,
>  		default:
>  			BUG();
>  		}
> +
> +		fsl_a008585_set_sne(clk);
>  	} else {
>  		clk->features |= CLOCK_EVT_FEAT_DYNIRQ;
>  		clk->name = "arch_mem_timer";
> @@ -485,6 +574,15 @@ static void __init arch_counter_register(unsigned type)
>  			arch_timer_read_counter = arch_counter_get_cntvct;
>  		else
>  			arch_timer_read_counter = arch_counter_get_cntpct;
> +
> +#ifdef CONFIG_FSL_ERRATUM_A008585
> +		/*
> +		 * Don't use the vdso fastpath if errata require using
> +		 * the out-of-line counter accessor.
> +		 */
> +		if (static_branch_unlikely(&arch_timer_read_ool_enabled))
> +			clocksource_counter.name = "arch_sys_counter_ool";
> +#endif
>  	} else {
>  		arch_timer_read_counter = arch_counter_get_cntvct_mem;
>  
> @@ -763,6 +861,11 @@ static void __init arch_timer_of_init(struct device_node *np)
>  
>  	arch_timer_c3stop = !of_property_read_bool(np, "always-on");
>  
> +#ifdef CONFIG_FSL_ERRATUM_A008585
> +	if (of_property_read_bool(np, "fsl,erratum-a008585"))
> +		static_branch_enable(&arch_timer_read_ool_enabled);
> +#endif
> +
>  	/*
>  	 * If we cannot rely on firmware initializing the timer registers then
>  	 * we should use the physical timers instead.
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v3 3/3] arm64: arch_timer: Work around QorIQ Erratum A-008585
@ 2016-07-07  9:34       ` Ding Tianhong
  0 siblings, 0 replies; 24+ messages in thread
From: Ding Tianhong @ 2016-07-07  9:34 UTC (permalink / raw)
  To: linux-arm-kernel

On 2016/7/2 6:41, Scott Wood wrote:
> Erratum A-008585 says that the ARM generic timer counter "has the
> potential to contain an erroneous value for a small number of core
> clock cycles every time the timer value changes".  Accesses to TVAL
> (both read and write) are also affected due to the implicit counter
> read.  Accesses to CVAL are not affected.
> 
> The workaround is to reread TVAL and count registers until successive reads
> return the same value, and when writing TVAL to retry until counter
> reads before and after the write return the same value.
> 
> This erratum can be found on LS1043A and LS2080A.
> 
> Signed-off-by: Scott Wood <oss@buserror.net>
> ---
> v3:
> - Used cval rather than a loop for the write side of the erratum
> - Added a Kconfig control
> - Moved the device tree binding into its own patch
> - Added erratum to silicon-errata.txt
> - Changed function names to contain the erratum name
> - Factored out the setting of erratum versions of set_next_event
>   to improve readability
> - Added a comment clarifying that the timeout is arbitrary
> 
> v2:
> Significant rework based on feedback, including using static_key,
> disabling VDSO counter access rather than adding the workaround to the
> VDSO, and uninlining the loops.
> 
> Dropped the separate property for indicating that writes to TVAL are
> affected, as I believe that's just a side effect of the implicit
> counter read being corrupted, and thus a chip that is affected by one
> will always be affected by the other.
> 
> Dropped the arm32 portion as it seems there was confusion about whether
> LS1021A is affected.  Currently I am being told that it is not
> affected.
> 
> I considered writing to CVAL rather than looping on TVAL writes, but
> that would still have required separate set_next_event() code for the
> erratum, and adding CVAL to the enum would have required a bunch of
> extra handlers in switch statements (even where unused, due to compiler
> warnings about unhandled enum values) including in an arm32 header.  It
> seemed better to avoid the arm32 interaction and new untested
> accessors.
> ---
>  Documentation/arm64/silicon-errata.txt |   2 +
>  arch/arm64/include/asm/arch_timer.h    |  48 ++++++++++++---
>  drivers/clocksource/Kconfig            |  10 ++++
>  drivers/clocksource/arm_arch_timer.c   | 103 +++++++++++++++++++++++++++++++++
>  4 files changed, 154 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
> index ba4b6ac..5778f62 100644
> --- a/Documentation/arm64/silicon-errata.txt
> +++ b/Documentation/arm64/silicon-errata.txt
> @@ -57,3 +57,5 @@ stable kernels.
>  | Cavium         | ThunderX ITS    | #22375, #24313  | CAVIUM_ERRATUM_22375    |
>  | Cavium         | ThunderX GICv3  | #23154          | CAVIUM_ERRATUM_23154    |
>  | Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456    |
> +|                |                 |                 |                         |
> +| Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585     |
> diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
> index fbe0ca3..70fbad9 100644
> --- a/arch/arm64/include/asm/arch_timer.h
> +++ b/arch/arm64/include/asm/arch_timer.h
> @@ -23,10 +23,34 @@
>  
>  #include <linux/bug.h>
>  #include <linux/init.h>
> +#include <linux/jump_label.h>
>  #include <linux/types.h>
>  
>  #include <clocksource/arm_arch_timer.h>
>  
> +extern struct static_key_false arch_timer_read_ool_enabled;
> +
> +#define ARCH_TIMER_REG_READ(reg, func) \
> +extern u64 func##_ool(void); \
> +static inline u64 __##func(void) \
> +{ \
> +	u64 val; \
> +	asm volatile("mrs %0, " reg : "=r" (val)); \
> +	return val; \
> +} \
> +static inline u64 _##func(void) \
> +{ \
> +	if (IS_ENABLED(CONFIG_FSL_ERRATUM_A008585) && \
> +	    static_branch_unlikely(&arch_timer_read_ool_enabled)) \
> +		return func##_ool(); \
> +	else \
> +		return __##func(); \
> +}
> +
> +ARCH_TIMER_REG_READ("cntp_tval_el0", arch_timer_get_ptval)
> +ARCH_TIMER_REG_READ("cntv_tval_el0", arch_timer_get_vtval)
> +ARCH_TIMER_REG_READ("cntvct_el0", arch_counter_get_cntvct)
> +
>  /*
>   * These register accessors are marked inline so the compiler can
>   * nicely work out which register we want, and chuck away the rest of
> @@ -58,6 +82,16 @@ void arch_timer_reg_write_cp15(int access, enum arch_timer_reg reg, u32 val)
>  	isb();
>  }
>  
> +static __always_inline void arch_timer_cval_write_cp15(int access, u64 val)
> +{
> +	if (access == ARCH_TIMER_PHYS_ACCESS)
> +		asm volatile("msr cntp_cval_el0, %0" : : "r" (val));
> +	else if (access == ARCH_TIMER_VIRT_ACCESS)
> +		asm volatile("msr cntv_cval_el0, %0" : : "r" (val));
> +
> +	isb();
> +}
> +
>  static __always_inline
>  u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
>  {
> @@ -66,19 +100,19 @@ u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
>  	if (access == ARCH_TIMER_PHYS_ACCESS) {
>  		switch (reg) {
>  		case ARCH_TIMER_REG_CTRL:
> -			asm volatile("mrs %0,  cntp_ctl_el0" : "=r" (val));
> +			asm volatile("mrs %0, cntp_ctl_el0" : "=r" (val));
>  			break;
>  		case ARCH_TIMER_REG_TVAL:
> -			asm volatile("mrs %0, cntp_tval_el0" : "=r" (val));
> +			val = _arch_timer_get_ptval();
>  			break;
>  		}
>  	} else if (access == ARCH_TIMER_VIRT_ACCESS) {
>  		switch (reg) {
>  		case ARCH_TIMER_REG_CTRL:
> -			asm volatile("mrs %0,  cntv_ctl_el0" : "=r" (val));
> +			asm volatile("mrs %0, cntv_ctl_el0" : "=r" (val));
>  			break;
>  		case ARCH_TIMER_REG_TVAL:
> -			asm volatile("mrs %0, cntv_tval_el0" : "=r" (val));
> +			val = _arch_timer_get_vtval();
>  			break;
>  		}
>  	}
> @@ -116,12 +150,8 @@ static inline u64 arch_counter_get_cntpct(void)
>  
>  static inline u64 arch_counter_get_cntvct(void)
>  {
> -	u64 cval;
> -
>  	isb();
> -	asm volatile("mrs %0, cntvct_el0" : "=r" (cval));
> -
> -	return cval;
> +	return _arch_counter_get_cntvct();
>  }
>  
>  static inline int arch_timer_arch_init(void)
> diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
> index c346be6..672ddc3 100644
> --- a/drivers/clocksource/Kconfig
> +++ b/drivers/clocksource/Kconfig
> @@ -207,6 +207,16 @@ config ARM_ARCH_TIMER_EVTSTREAM
>  	  This must be disabled for hardware validation purposes to detect any
>  	  hardware anomalies of missing events.
>  
> +config FSL_ERRATUM_A008585
> +	bool "Workaround for Freescale/NXP Erratum A-008585"
> +	default y
> +	depends on ARM_ARCH_TIMER && ARM64
> +	help
> +	  This option enables a workaround for Freescale/NXP Erratum
> +	  A-008585 ("ARM generic timer may contain an erroneous
> +	  value").  The workaround will only be active if the
> +	  fsl,erratum-a008585 property is found in the timer node.
> +
>  config ARM_GLOBAL_TIMER
>  	bool
>  	select CLKSRC_OF if OF
> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
> index 5152b38..7ead4eb 100644
> --- a/drivers/clocksource/arm_arch_timer.c
> +++ b/drivers/clocksource/arm_arch_timer.c
> @@ -83,6 +83,51 @@ static bool arch_timer_mem_use_virtual;
>   * Architected system timer support.
>   */
>  
> +#ifdef CONFIG_FSL_ERRATUM_A008585
> +DEFINE_STATIC_KEY_FALSE(arch_timer_read_ool_enabled);
> +EXPORT_SYMBOL_GPL(arch_timer_read_ool_enabled);
> +
> +/*
> + * __always_inline is used to ensure that func() is not an actual function
> + * pointer, which would result in the register accesses potentially being too
> + * far apart for the loop to work.
> + *
> + * The timeout is an arbitrary value well beyond the highest number
> + * of iterations the loop has been observed to take.
> + */
> +static __always_inline u64 fsl_a008585_reread_counter(u64 (*func)(void))
> +{
> +	u64 cval_old, cval_new;
> +	int timeout = 200;
> +
> +	do {
> +		isb();
> +		cval_old = func();
> +		cval_new = func();
> +		timeout--;
> +	} while (unlikely(cval_old != cval_new) && timeout);
> +
> +	WARN_ON_ONCE(!timeout);
> +	return cval_new;
> +}
Hi Scott:

I have test this patch, this solution looks will break the performance a little more than I expected.
it will have more than 10% that the cval will read again, we could sure that the cval_old always equal to the
cval_new in the normal circumstances, so I prefer this way:

	do {
		isb();
		cval_old = func();
		cval_new = func();
		timeout--;	
	} while (unlikely((cval_new - cval_old) >> 2) && timeout);

Thanks.
Ding

> +
> +u64 arch_counter_get_cntvct_ool(void)
> +{
> +	return fsl_a008585_reread_counter(__arch_counter_get_cntvct);
> +}
> +
> +u64 arch_timer_get_vtval_ool(void)
> +{
> +	return fsl_a008585_reread_counter(__arch_timer_get_vtval);
> +}
> +
> +u64 arch_timer_get_ptval_ool(void)
> +{
> +	return fsl_a008585_reread_counter(__arch_timer_get_ptval);
> +}
> +
> +#endif /* CONFIG_FSL_ERRATUM_A008585 */
> +
>  static __always_inline
>  void arch_timer_reg_write(int access, enum arch_timer_reg reg, u32 val,
>  			  struct clock_event_device *clk)
> @@ -232,6 +277,35 @@ static __always_inline void set_next_event(const int access, unsigned long evt,
>  	arch_timer_reg_write(access, ARCH_TIMER_REG_CTRL, ctrl, clk);
>  }
>  
> +#ifdef CONFIG_FSL_ERRATUM_A008585
> +static __always_inline void fsl_a008585_set_next_event(const int access,
> +		unsigned long evt, struct clock_event_device *clk)
> +{
> +	unsigned long ctrl;
> +	u64 cval = evt + arch_counter_get_cntvct();
> +
> +	ctrl = arch_timer_reg_read(access, ARCH_TIMER_REG_CTRL, clk);
> +	ctrl |= ARCH_TIMER_CTRL_ENABLE;
> +	ctrl &= ~ARCH_TIMER_CTRL_IT_MASK;
> +	arch_timer_cval_write_cp15(access, cval);
> +	arch_timer_reg_write(access, ARCH_TIMER_REG_CTRL, ctrl, clk);
> +}
> +
> +static int fsl_a008585_set_next_event_virt(unsigned long evt,
> +					   struct clock_event_device *clk)
> +{
> +	fsl_a008585_set_next_event(ARCH_TIMER_VIRT_ACCESS, evt, clk);
> +	return 0;
> +}
> +
> +static int fsl_a008585_set_next_event_phys(unsigned long evt,
> +					   struct clock_event_device *clk)
> +{
> +	fsl_a008585_set_next_event(ARCH_TIMER_PHYS_ACCESS, evt, clk);
> +	return 0;
> +}
> +#endif /* CONFIG_FSL_ERRATUM_A008585 */
> +
>  static int arch_timer_set_next_event_virt(unsigned long evt,
>  					  struct clock_event_device *clk)
>  {
> @@ -260,6 +334,19 @@ static int arch_timer_set_next_event_phys_mem(unsigned long evt,
>  	return 0;
>  }
>  
> +static void fsl_a008585_set_sne(struct clock_event_device *clk)
> +{
> +#ifdef CONFIG_FSL_ERRATUM_A008585
> +	if (!static_branch_unlikely(&arch_timer_read_ool_enabled))
> +		return;
> +
> +	if (arch_timer_uses_ppi == VIRT_PPI)
> +		clk->set_next_event = fsl_a008585_set_next_event_virt;
> +	else
> +		clk->set_next_event = fsl_a008585_set_next_event_phys;
> +#endif
> +}
> +
>  static void __arch_timer_setup(unsigned type,
>  			       struct clock_event_device *clk)
>  {
> @@ -288,6 +375,8 @@ static void __arch_timer_setup(unsigned type,
>  		default:
>  			BUG();
>  		}
> +
> +		fsl_a008585_set_sne(clk);
>  	} else {
>  		clk->features |= CLOCK_EVT_FEAT_DYNIRQ;
>  		clk->name = "arch_mem_timer";
> @@ -485,6 +574,15 @@ static void __init arch_counter_register(unsigned type)
>  			arch_timer_read_counter = arch_counter_get_cntvct;
>  		else
>  			arch_timer_read_counter = arch_counter_get_cntpct;
> +
> +#ifdef CONFIG_FSL_ERRATUM_A008585
> +		/*
> +		 * Don't use the vdso fastpath if errata require using
> +		 * the out-of-line counter accessor.
> +		 */
> +		if (static_branch_unlikely(&arch_timer_read_ool_enabled))
> +			clocksource_counter.name = "arch_sys_counter_ool";
> +#endif
>  	} else {
>  		arch_timer_read_counter = arch_counter_get_cntvct_mem;
>  
> @@ -763,6 +861,11 @@ static void __init arch_timer_of_init(struct device_node *np)
>  
>  	arch_timer_c3stop = !of_property_read_bool(np, "always-on");
>  
> +#ifdef CONFIG_FSL_ERRATUM_A008585
> +	if (of_property_read_bool(np, "fsl,erratum-a008585"))
> +		static_branch_enable(&arch_timer_read_ool_enabled);
> +#endif
> +
>  	/*
>  	 * If we cannot rely on firmware initializing the timer registers then
>  	 * we should use the physical timers instead.
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 3/3] arm64: arch_timer: Work around QorIQ Erratum A-008585
  2016-07-07  9:34       ` Ding Tianhong
@ 2016-07-07  9:49           ` Marc Zyngier
  -1 siblings, 0 replies; 24+ messages in thread
From: Marc Zyngier @ 2016-07-07  9:49 UTC (permalink / raw)
  To: Ding Tianhong, Scott Wood, Catalin Marinas, Will Deacon
  Cc: devicetree-u79uwXL29TY76Z2rM5mHXA, stuart.yoder-3arQi8VN3Tc,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On 07/07/16 10:34, Ding Tianhong wrote:
> On 2016/7/2 6:41, Scott Wood wrote:
>> Erratum A-008585 says that the ARM generic timer counter "has the
>> potential to contain an erroneous value for a small number of core
>> clock cycles every time the timer value changes".  Accesses to TVAL
>> (both read and write) are also affected due to the implicit counter
>> read.  Accesses to CVAL are not affected.
>>
>> The workaround is to reread TVAL and count registers until successive reads
>> return the same value, and when writing TVAL to retry until counter
>> reads before and after the write return the same value.
>>
>> This erratum can be found on LS1043A and LS2080A.
>>
>> Signed-off-by: Scott Wood <oss-fOR+EgIDQEHk1uMJSBkQmQ@public.gmane.org>
>> ---
>> v3:
>> - Used cval rather than a loop for the write side of the erratum
>> - Added a Kconfig control
>> - Moved the device tree binding into its own patch
>> - Added erratum to silicon-errata.txt
>> - Changed function names to contain the erratum name
>> - Factored out the setting of erratum versions of set_next_event
>>   to improve readability
>> - Added a comment clarifying that the timeout is arbitrary
>>
>> v2:
>> Significant rework based on feedback, including using static_key,
>> disabling VDSO counter access rather than adding the workaround to the
>> VDSO, and uninlining the loops.
>>
>> Dropped the separate property for indicating that writes to TVAL are
>> affected, as I believe that's just a side effect of the implicit
>> counter read being corrupted, and thus a chip that is affected by one
>> will always be affected by the other.
>>
>> Dropped the arm32 portion as it seems there was confusion about whether
>> LS1021A is affected.  Currently I am being told that it is not
>> affected.
>>
>> I considered writing to CVAL rather than looping on TVAL writes, but
>> that would still have required separate set_next_event() code for the
>> erratum, and adding CVAL to the enum would have required a bunch of
>> extra handlers in switch statements (even where unused, due to compiler
>> warnings about unhandled enum values) including in an arm32 header.  It
>> seemed better to avoid the arm32 interaction and new untested
>> accessors.
>> ---
>>  Documentation/arm64/silicon-errata.txt |   2 +
>>  arch/arm64/include/asm/arch_timer.h    |  48 ++++++++++++---
>>  drivers/clocksource/Kconfig            |  10 ++++
>>  drivers/clocksource/arm_arch_timer.c   | 103 +++++++++++++++++++++++++++++++++
>>  4 files changed, 154 insertions(+), 9 deletions(-)
>>
>> diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
>> index ba4b6ac..5778f62 100644
>> --- a/Documentation/arm64/silicon-errata.txt
>> +++ b/Documentation/arm64/silicon-errata.txt
>> @@ -57,3 +57,5 @@ stable kernels.
>>  | Cavium         | ThunderX ITS    | #22375, #24313  | CAVIUM_ERRATUM_22375    |
>>  | Cavium         | ThunderX GICv3  | #23154          | CAVIUM_ERRATUM_23154    |
>>  | Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456    |
>> +|                |                 |                 |                         |
>> +| Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585     |
>> diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
>> index fbe0ca3..70fbad9 100644
>> --- a/arch/arm64/include/asm/arch_timer.h
>> +++ b/arch/arm64/include/asm/arch_timer.h
>> @@ -23,10 +23,34 @@
>>  
>>  #include <linux/bug.h>
>>  #include <linux/init.h>
>> +#include <linux/jump_label.h>
>>  #include <linux/types.h>
>>  
>>  #include <clocksource/arm_arch_timer.h>
>>  
>> +extern struct static_key_false arch_timer_read_ool_enabled;
>> +
>> +#define ARCH_TIMER_REG_READ(reg, func) \
>> +extern u64 func##_ool(void); \
>> +static inline u64 __##func(void) \
>> +{ \
>> +	u64 val; \
>> +	asm volatile("mrs %0, " reg : "=r" (val)); \
>> +	return val; \
>> +} \
>> +static inline u64 _##func(void) \
>> +{ \
>> +	if (IS_ENABLED(CONFIG_FSL_ERRATUM_A008585) && \
>> +	    static_branch_unlikely(&arch_timer_read_ool_enabled)) \
>> +		return func##_ool(); \
>> +	else \
>> +		return __##func(); \
>> +}
>> +
>> +ARCH_TIMER_REG_READ("cntp_tval_el0", arch_timer_get_ptval)
>> +ARCH_TIMER_REG_READ("cntv_tval_el0", arch_timer_get_vtval)
>> +ARCH_TIMER_REG_READ("cntvct_el0", arch_counter_get_cntvct)
>> +
>>  /*
>>   * These register accessors are marked inline so the compiler can
>>   * nicely work out which register we want, and chuck away the rest of
>> @@ -58,6 +82,16 @@ void arch_timer_reg_write_cp15(int access, enum arch_timer_reg reg, u32 val)
>>  	isb();
>>  }
>>  
>> +static __always_inline void arch_timer_cval_write_cp15(int access, u64 val)
>> +{
>> +	if (access == ARCH_TIMER_PHYS_ACCESS)
>> +		asm volatile("msr cntp_cval_el0, %0" : : "r" (val));
>> +	else if (access == ARCH_TIMER_VIRT_ACCESS)
>> +		asm volatile("msr cntv_cval_el0, %0" : : "r" (val));
>> +
>> +	isb();
>> +}
>> +
>>  static __always_inline
>>  u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
>>  {
>> @@ -66,19 +100,19 @@ u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
>>  	if (access == ARCH_TIMER_PHYS_ACCESS) {
>>  		switch (reg) {
>>  		case ARCH_TIMER_REG_CTRL:
>> -			asm volatile("mrs %0,  cntp_ctl_el0" : "=r" (val));
>> +			asm volatile("mrs %0, cntp_ctl_el0" : "=r" (val));
>>  			break;
>>  		case ARCH_TIMER_REG_TVAL:
>> -			asm volatile("mrs %0, cntp_tval_el0" : "=r" (val));
>> +			val = _arch_timer_get_ptval();
>>  			break;
>>  		}
>>  	} else if (access == ARCH_TIMER_VIRT_ACCESS) {
>>  		switch (reg) {
>>  		case ARCH_TIMER_REG_CTRL:
>> -			asm volatile("mrs %0,  cntv_ctl_el0" : "=r" (val));
>> +			asm volatile("mrs %0, cntv_ctl_el0" : "=r" (val));
>>  			break;
>>  		case ARCH_TIMER_REG_TVAL:
>> -			asm volatile("mrs %0, cntv_tval_el0" : "=r" (val));
>> +			val = _arch_timer_get_vtval();
>>  			break;
>>  		}
>>  	}
>> @@ -116,12 +150,8 @@ static inline u64 arch_counter_get_cntpct(void)
>>  
>>  static inline u64 arch_counter_get_cntvct(void)
>>  {
>> -	u64 cval;
>> -
>>  	isb();
>> -	asm volatile("mrs %0, cntvct_el0" : "=r" (cval));
>> -
>> -	return cval;
>> +	return _arch_counter_get_cntvct();
>>  }
>>  
>>  static inline int arch_timer_arch_init(void)
>> diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
>> index c346be6..672ddc3 100644
>> --- a/drivers/clocksource/Kconfig
>> +++ b/drivers/clocksource/Kconfig
>> @@ -207,6 +207,16 @@ config ARM_ARCH_TIMER_EVTSTREAM
>>  	  This must be disabled for hardware validation purposes to detect any
>>  	  hardware anomalies of missing events.
>>  
>> +config FSL_ERRATUM_A008585
>> +	bool "Workaround for Freescale/NXP Erratum A-008585"
>> +	default y
>> +	depends on ARM_ARCH_TIMER && ARM64
>> +	help
>> +	  This option enables a workaround for Freescale/NXP Erratum
>> +	  A-008585 ("ARM generic timer may contain an erroneous
>> +	  value").  The workaround will only be active if the
>> +	  fsl,erratum-a008585 property is found in the timer node.
>> +
>>  config ARM_GLOBAL_TIMER
>>  	bool
>>  	select CLKSRC_OF if OF
>> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
>> index 5152b38..7ead4eb 100644
>> --- a/drivers/clocksource/arm_arch_timer.c
>> +++ b/drivers/clocksource/arm_arch_timer.c
>> @@ -83,6 +83,51 @@ static bool arch_timer_mem_use_virtual;
>>   * Architected system timer support.
>>   */
>>  
>> +#ifdef CONFIG_FSL_ERRATUM_A008585
>> +DEFINE_STATIC_KEY_FALSE(arch_timer_read_ool_enabled);
>> +EXPORT_SYMBOL_GPL(arch_timer_read_ool_enabled);
>> +
>> +/*
>> + * __always_inline is used to ensure that func() is not an actual function
>> + * pointer, which would result in the register accesses potentially being too
>> + * far apart for the loop to work.
>> + *
>> + * The timeout is an arbitrary value well beyond the highest number
>> + * of iterations the loop has been observed to take.
>> + */
>> +static __always_inline u64 fsl_a008585_reread_counter(u64 (*func)(void))
>> +{
>> +	u64 cval_old, cval_new;
>> +	int timeout = 200;
>> +
>> +	do {
>> +		isb();
>> +		cval_old = func();
>> +		cval_new = func();
>> +		timeout--;
>> +	} while (unlikely(cval_old != cval_new) && timeout);
>> +
>> +	WARN_ON_ONCE(!timeout);
>> +	return cval_new;
>> +}
> Hi Scott:
> 
> I have test this patch, this solution looks will break the performance a little more than I expected.
> it will have more than 10% that the cval will read again, we could sure that the cval_old always equal to the
> cval_new in the normal circumstances, so I prefer this way:
> 
> 	do {
> 		isb();
> 		cval_old = func();
> 		cval_new = func();
> 		timeout--;	
> 	} while (unlikely((cval_new - cval_old) >> 2) && timeout);

What makes you think that ignoring the two bottom bits is a safe thing
to do? Talking about performance when the HW has such a dramatic bug is
like putting a bigger engine on a car that has no brakes: you just hit
the wall quicker.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v3 3/3] arm64: arch_timer: Work around QorIQ Erratum A-008585
@ 2016-07-07  9:49           ` Marc Zyngier
  0 siblings, 0 replies; 24+ messages in thread
From: Marc Zyngier @ 2016-07-07  9:49 UTC (permalink / raw)
  To: linux-arm-kernel

On 07/07/16 10:34, Ding Tianhong wrote:
> On 2016/7/2 6:41, Scott Wood wrote:
>> Erratum A-008585 says that the ARM generic timer counter "has the
>> potential to contain an erroneous value for a small number of core
>> clock cycles every time the timer value changes".  Accesses to TVAL
>> (both read and write) are also affected due to the implicit counter
>> read.  Accesses to CVAL are not affected.
>>
>> The workaround is to reread TVAL and count registers until successive reads
>> return the same value, and when writing TVAL to retry until counter
>> reads before and after the write return the same value.
>>
>> This erratum can be found on LS1043A and LS2080A.
>>
>> Signed-off-by: Scott Wood <oss@buserror.net>
>> ---
>> v3:
>> - Used cval rather than a loop for the write side of the erratum
>> - Added a Kconfig control
>> - Moved the device tree binding into its own patch
>> - Added erratum to silicon-errata.txt
>> - Changed function names to contain the erratum name
>> - Factored out the setting of erratum versions of set_next_event
>>   to improve readability
>> - Added a comment clarifying that the timeout is arbitrary
>>
>> v2:
>> Significant rework based on feedback, including using static_key,
>> disabling VDSO counter access rather than adding the workaround to the
>> VDSO, and uninlining the loops.
>>
>> Dropped the separate property for indicating that writes to TVAL are
>> affected, as I believe that's just a side effect of the implicit
>> counter read being corrupted, and thus a chip that is affected by one
>> will always be affected by the other.
>>
>> Dropped the arm32 portion as it seems there was confusion about whether
>> LS1021A is affected.  Currently I am being told that it is not
>> affected.
>>
>> I considered writing to CVAL rather than looping on TVAL writes, but
>> that would still have required separate set_next_event() code for the
>> erratum, and adding CVAL to the enum would have required a bunch of
>> extra handlers in switch statements (even where unused, due to compiler
>> warnings about unhandled enum values) including in an arm32 header.  It
>> seemed better to avoid the arm32 interaction and new untested
>> accessors.
>> ---
>>  Documentation/arm64/silicon-errata.txt |   2 +
>>  arch/arm64/include/asm/arch_timer.h    |  48 ++++++++++++---
>>  drivers/clocksource/Kconfig            |  10 ++++
>>  drivers/clocksource/arm_arch_timer.c   | 103 +++++++++++++++++++++++++++++++++
>>  4 files changed, 154 insertions(+), 9 deletions(-)
>>
>> diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
>> index ba4b6ac..5778f62 100644
>> --- a/Documentation/arm64/silicon-errata.txt
>> +++ b/Documentation/arm64/silicon-errata.txt
>> @@ -57,3 +57,5 @@ stable kernels.
>>  | Cavium         | ThunderX ITS    | #22375, #24313  | CAVIUM_ERRATUM_22375    |
>>  | Cavium         | ThunderX GICv3  | #23154          | CAVIUM_ERRATUM_23154    |
>>  | Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456    |
>> +|                |                 |                 |                         |
>> +| Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585     |
>> diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
>> index fbe0ca3..70fbad9 100644
>> --- a/arch/arm64/include/asm/arch_timer.h
>> +++ b/arch/arm64/include/asm/arch_timer.h
>> @@ -23,10 +23,34 @@
>>  
>>  #include <linux/bug.h>
>>  #include <linux/init.h>
>> +#include <linux/jump_label.h>
>>  #include <linux/types.h>
>>  
>>  #include <clocksource/arm_arch_timer.h>
>>  
>> +extern struct static_key_false arch_timer_read_ool_enabled;
>> +
>> +#define ARCH_TIMER_REG_READ(reg, func) \
>> +extern u64 func##_ool(void); \
>> +static inline u64 __##func(void) \
>> +{ \
>> +	u64 val; \
>> +	asm volatile("mrs %0, " reg : "=r" (val)); \
>> +	return val; \
>> +} \
>> +static inline u64 _##func(void) \
>> +{ \
>> +	if (IS_ENABLED(CONFIG_FSL_ERRATUM_A008585) && \
>> +	    static_branch_unlikely(&arch_timer_read_ool_enabled)) \
>> +		return func##_ool(); \
>> +	else \
>> +		return __##func(); \
>> +}
>> +
>> +ARCH_TIMER_REG_READ("cntp_tval_el0", arch_timer_get_ptval)
>> +ARCH_TIMER_REG_READ("cntv_tval_el0", arch_timer_get_vtval)
>> +ARCH_TIMER_REG_READ("cntvct_el0", arch_counter_get_cntvct)
>> +
>>  /*
>>   * These register accessors are marked inline so the compiler can
>>   * nicely work out which register we want, and chuck away the rest of
>> @@ -58,6 +82,16 @@ void arch_timer_reg_write_cp15(int access, enum arch_timer_reg reg, u32 val)
>>  	isb();
>>  }
>>  
>> +static __always_inline void arch_timer_cval_write_cp15(int access, u64 val)
>> +{
>> +	if (access == ARCH_TIMER_PHYS_ACCESS)
>> +		asm volatile("msr cntp_cval_el0, %0" : : "r" (val));
>> +	else if (access == ARCH_TIMER_VIRT_ACCESS)
>> +		asm volatile("msr cntv_cval_el0, %0" : : "r" (val));
>> +
>> +	isb();
>> +}
>> +
>>  static __always_inline
>>  u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
>>  {
>> @@ -66,19 +100,19 @@ u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
>>  	if (access == ARCH_TIMER_PHYS_ACCESS) {
>>  		switch (reg) {
>>  		case ARCH_TIMER_REG_CTRL:
>> -			asm volatile("mrs %0,  cntp_ctl_el0" : "=r" (val));
>> +			asm volatile("mrs %0, cntp_ctl_el0" : "=r" (val));
>>  			break;
>>  		case ARCH_TIMER_REG_TVAL:
>> -			asm volatile("mrs %0, cntp_tval_el0" : "=r" (val));
>> +			val = _arch_timer_get_ptval();
>>  			break;
>>  		}
>>  	} else if (access == ARCH_TIMER_VIRT_ACCESS) {
>>  		switch (reg) {
>>  		case ARCH_TIMER_REG_CTRL:
>> -			asm volatile("mrs %0,  cntv_ctl_el0" : "=r" (val));
>> +			asm volatile("mrs %0, cntv_ctl_el0" : "=r" (val));
>>  			break;
>>  		case ARCH_TIMER_REG_TVAL:
>> -			asm volatile("mrs %0, cntv_tval_el0" : "=r" (val));
>> +			val = _arch_timer_get_vtval();
>>  			break;
>>  		}
>>  	}
>> @@ -116,12 +150,8 @@ static inline u64 arch_counter_get_cntpct(void)
>>  
>>  static inline u64 arch_counter_get_cntvct(void)
>>  {
>> -	u64 cval;
>> -
>>  	isb();
>> -	asm volatile("mrs %0, cntvct_el0" : "=r" (cval));
>> -
>> -	return cval;
>> +	return _arch_counter_get_cntvct();
>>  }
>>  
>>  static inline int arch_timer_arch_init(void)
>> diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
>> index c346be6..672ddc3 100644
>> --- a/drivers/clocksource/Kconfig
>> +++ b/drivers/clocksource/Kconfig
>> @@ -207,6 +207,16 @@ config ARM_ARCH_TIMER_EVTSTREAM
>>  	  This must be disabled for hardware validation purposes to detect any
>>  	  hardware anomalies of missing events.
>>  
>> +config FSL_ERRATUM_A008585
>> +	bool "Workaround for Freescale/NXP Erratum A-008585"
>> +	default y
>> +	depends on ARM_ARCH_TIMER && ARM64
>> +	help
>> +	  This option enables a workaround for Freescale/NXP Erratum
>> +	  A-008585 ("ARM generic timer may contain an erroneous
>> +	  value").  The workaround will only be active if the
>> +	  fsl,erratum-a008585 property is found in the timer node.
>> +
>>  config ARM_GLOBAL_TIMER
>>  	bool
>>  	select CLKSRC_OF if OF
>> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
>> index 5152b38..7ead4eb 100644
>> --- a/drivers/clocksource/arm_arch_timer.c
>> +++ b/drivers/clocksource/arm_arch_timer.c
>> @@ -83,6 +83,51 @@ static bool arch_timer_mem_use_virtual;
>>   * Architected system timer support.
>>   */
>>  
>> +#ifdef CONFIG_FSL_ERRATUM_A008585
>> +DEFINE_STATIC_KEY_FALSE(arch_timer_read_ool_enabled);
>> +EXPORT_SYMBOL_GPL(arch_timer_read_ool_enabled);
>> +
>> +/*
>> + * __always_inline is used to ensure that func() is not an actual function
>> + * pointer, which would result in the register accesses potentially being too
>> + * far apart for the loop to work.
>> + *
>> + * The timeout is an arbitrary value well beyond the highest number
>> + * of iterations the loop has been observed to take.
>> + */
>> +static __always_inline u64 fsl_a008585_reread_counter(u64 (*func)(void))
>> +{
>> +	u64 cval_old, cval_new;
>> +	int timeout = 200;
>> +
>> +	do {
>> +		isb();
>> +		cval_old = func();
>> +		cval_new = func();
>> +		timeout--;
>> +	} while (unlikely(cval_old != cval_new) && timeout);
>> +
>> +	WARN_ON_ONCE(!timeout);
>> +	return cval_new;
>> +}
> Hi Scott:
> 
> I have test this patch, this solution looks will break the performance a little more than I expected.
> it will have more than 10% that the cval will read again, we could sure that the cval_old always equal to the
> cval_new in the normal circumstances, so I prefer this way:
> 
> 	do {
> 		isb();
> 		cval_old = func();
> 		cval_new = func();
> 		timeout--;	
> 	} while (unlikely((cval_new - cval_old) >> 2) && timeout);

What makes you think that ignoring the two bottom bits is a safe thing
to do? Talking about performance when the HW has such a dramatic bug is
like putting a bigger engine on a car that has no brakes: you just hit
the wall quicker.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 3/3] arm64: arch_timer: Work around QorIQ Erratum A-008585
  2016-07-07  9:49           ` Marc Zyngier
@ 2016-07-07 11:37               ` Ding Tianhong
  -1 siblings, 0 replies; 24+ messages in thread
From: Ding Tianhong @ 2016-07-07 11:37 UTC (permalink / raw)
  To: Marc Zyngier, Scott Wood, Catalin Marinas, Will Deacon
  Cc: devicetree-u79uwXL29TY76Z2rM5mHXA, stuart.yoder-3arQi8VN3Tc,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On 2016/7/7 17:49, Marc Zyngier wrote:
> On 07/07/16 10:34, Ding Tianhong wrote:
>> On 2016/7/2 6:41, Scott Wood wrote:
>>> Erratum A-008585 says that the ARM generic timer counter "has the
>>> potential to contain an erroneous value for a small number of core
>>> clock cycles every time the timer value changes".  Accesses to TVAL
>>> (both read and write) are also affected due to the implicit counter
>>> read.  Accesses to CVAL are not affected.
>>>
>>> The workaround is to reread TVAL and count registers until successive reads
>>> return the same value, and when writing TVAL to retry until counter
>>> reads before and after the write return the same value.
>>>
>>> This erratum can be found on LS1043A and LS2080A.
>>>
>>> Signed-off-by: Scott Wood <oss-fOR+EgIDQEHk1uMJSBkQmQ@public.gmane.org>
>>> ---
>>> v3:
>>> - Used cval rather than a loop for the write side of the erratum
>>> - Added a Kconfig control
>>> - Moved the device tree binding into its own patch
>>> - Added erratum to silicon-errata.txt
>>> - Changed function names to contain the erratum name
>>> - Factored out the setting of erratum versions of set_next_event
>>>   to improve readability
>>> - Added a comment clarifying that the timeout is arbitrary
>>>
>>> v2:
>>> Significant rework based on feedback, including using static_key,
>>> disabling VDSO counter access rather than adding the workaround to the
>>> VDSO, and uninlining the loops.
>>>
>>> Dropped the separate property for indicating that writes to TVAL are
>>> affected, as I believe that's just a side effect of the implicit
>>> counter read being corrupted, and thus a chip that is affected by one
>>> will always be affected by the other.
>>>
>>> Dropped the arm32 portion as it seems there was confusion about whether
>>> LS1021A is affected.  Currently I am being told that it is not
>>> affected.
>>>
>>> I considered writing to CVAL rather than looping on TVAL writes, but
>>> that would still have required separate set_next_event() code for the
>>> erratum, and adding CVAL to the enum would have required a bunch of
>>> extra handlers in switch statements (even where unused, due to compiler
>>> warnings about unhandled enum values) including in an arm32 header.  It
>>> seemed better to avoid the arm32 interaction and new untested
>>> accessors.
>>> ---
>>>  Documentation/arm64/silicon-errata.txt |   2 +
>>>  arch/arm64/include/asm/arch_timer.h    |  48 ++++++++++++---
>>>  drivers/clocksource/Kconfig            |  10 ++++
>>>  drivers/clocksource/arm_arch_timer.c   | 103 +++++++++++++++++++++++++++++++++
>>>  4 files changed, 154 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
>>> index ba4b6ac..5778f62 100644
>>> --- a/Documentation/arm64/silicon-errata.txt
>>> +++ b/Documentation/arm64/silicon-errata.txt
>>> @@ -57,3 +57,5 @@ stable kernels.
>>>  | Cavium         | ThunderX ITS    | #22375, #24313  | CAVIUM_ERRATUM_22375    |
>>>  | Cavium         | ThunderX GICv3  | #23154          | CAVIUM_ERRATUM_23154    |
>>>  | Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456    |
>>> +|                |                 |                 |                         |
>>> +| Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585     |
>>> diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
>>> index fbe0ca3..70fbad9 100644
>>> --- a/arch/arm64/include/asm/arch_timer.h
>>> +++ b/arch/arm64/include/asm/arch_timer.h
>>> @@ -23,10 +23,34 @@
>>>  
>>>  #include <linux/bug.h>
>>>  #include <linux/init.h>
>>> +#include <linux/jump_label.h>
>>>  #include <linux/types.h>
>>>  
>>>  #include <clocksource/arm_arch_timer.h>
>>>  
>>> +extern struct static_key_false arch_timer_read_ool_enabled;
>>> +
>>> +#define ARCH_TIMER_REG_READ(reg, func) \
>>> +extern u64 func##_ool(void); \
>>> +static inline u64 __##func(void) \
>>> +{ \
>>> +	u64 val; \
>>> +	asm volatile("mrs %0, " reg : "=r" (val)); \
>>> +	return val; \
>>> +} \
>>> +static inline u64 _##func(void) \
>>> +{ \
>>> +	if (IS_ENABLED(CONFIG_FSL_ERRATUM_A008585) && \
>>> +	    static_branch_unlikely(&arch_timer_read_ool_enabled)) \
>>> +		return func##_ool(); \
>>> +	else \
>>> +		return __##func(); \
>>> +}
>>> +
>>> +ARCH_TIMER_REG_READ("cntp_tval_el0", arch_timer_get_ptval)
>>> +ARCH_TIMER_REG_READ("cntv_tval_el0", arch_timer_get_vtval)
>>> +ARCH_TIMER_REG_READ("cntvct_el0", arch_counter_get_cntvct)
>>> +
>>>  /*
>>>   * These register accessors are marked inline so the compiler can
>>>   * nicely work out which register we want, and chuck away the rest of
>>> @@ -58,6 +82,16 @@ void arch_timer_reg_write_cp15(int access, enum arch_timer_reg reg, u32 val)
>>>  	isb();
>>>  }
>>>  
>>> +static __always_inline void arch_timer_cval_write_cp15(int access, u64 val)
>>> +{
>>> +	if (access == ARCH_TIMER_PHYS_ACCESS)
>>> +		asm volatile("msr cntp_cval_el0, %0" : : "r" (val));
>>> +	else if (access == ARCH_TIMER_VIRT_ACCESS)
>>> +		asm volatile("msr cntv_cval_el0, %0" : : "r" (val));
>>> +
>>> +	isb();
>>> +}
>>> +
>>>  static __always_inline
>>>  u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
>>>  {
>>> @@ -66,19 +100,19 @@ u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
>>>  	if (access == ARCH_TIMER_PHYS_ACCESS) {
>>>  		switch (reg) {
>>>  		case ARCH_TIMER_REG_CTRL:
>>> -			asm volatile("mrs %0,  cntp_ctl_el0" : "=r" (val));
>>> +			asm volatile("mrs %0, cntp_ctl_el0" : "=r" (val));
>>>  			break;
>>>  		case ARCH_TIMER_REG_TVAL:
>>> -			asm volatile("mrs %0, cntp_tval_el0" : "=r" (val));
>>> +			val = _arch_timer_get_ptval();
>>>  			break;
>>>  		}
>>>  	} else if (access == ARCH_TIMER_VIRT_ACCESS) {
>>>  		switch (reg) {
>>>  		case ARCH_TIMER_REG_CTRL:
>>> -			asm volatile("mrs %0,  cntv_ctl_el0" : "=r" (val));
>>> +			asm volatile("mrs %0, cntv_ctl_el0" : "=r" (val));
>>>  			break;
>>>  		case ARCH_TIMER_REG_TVAL:
>>> -			asm volatile("mrs %0, cntv_tval_el0" : "=r" (val));
>>> +			val = _arch_timer_get_vtval();
>>>  			break;
>>>  		}
>>>  	}
>>> @@ -116,12 +150,8 @@ static inline u64 arch_counter_get_cntpct(void)
>>>  
>>>  static inline u64 arch_counter_get_cntvct(void)
>>>  {
>>> -	u64 cval;
>>> -
>>>  	isb();
>>> -	asm volatile("mrs %0, cntvct_el0" : "=r" (cval));
>>> -
>>> -	return cval;
>>> +	return _arch_counter_get_cntvct();
>>>  }
>>>  
>>>  static inline int arch_timer_arch_init(void)
>>> diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
>>> index c346be6..672ddc3 100644
>>> --- a/drivers/clocksource/Kconfig
>>> +++ b/drivers/clocksource/Kconfig
>>> @@ -207,6 +207,16 @@ config ARM_ARCH_TIMER_EVTSTREAM
>>>  	  This must be disabled for hardware validation purposes to detect any
>>>  	  hardware anomalies of missing events.
>>>  
>>> +config FSL_ERRATUM_A008585
>>> +	bool "Workaround for Freescale/NXP Erratum A-008585"
>>> +	default y
>>> +	depends on ARM_ARCH_TIMER && ARM64
>>> +	help
>>> +	  This option enables a workaround for Freescale/NXP Erratum
>>> +	  A-008585 ("ARM generic timer may contain an erroneous
>>> +	  value").  The workaround will only be active if the
>>> +	  fsl,erratum-a008585 property is found in the timer node.
>>> +
>>>  config ARM_GLOBAL_TIMER
>>>  	bool
>>>  	select CLKSRC_OF if OF
>>> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
>>> index 5152b38..7ead4eb 100644
>>> --- a/drivers/clocksource/arm_arch_timer.c
>>> +++ b/drivers/clocksource/arm_arch_timer.c
>>> @@ -83,6 +83,51 @@ static bool arch_timer_mem_use_virtual;
>>>   * Architected system timer support.
>>>   */
>>>  
>>> +#ifdef CONFIG_FSL_ERRATUM_A008585
>>> +DEFINE_STATIC_KEY_FALSE(arch_timer_read_ool_enabled);
>>> +EXPORT_SYMBOL_GPL(arch_timer_read_ool_enabled);
>>> +
>>> +/*
>>> + * __always_inline is used to ensure that func() is not an actual function
>>> + * pointer, which would result in the register accesses potentially being too
>>> + * far apart for the loop to work.
>>> + *
>>> + * The timeout is an arbitrary value well beyond the highest number
>>> + * of iterations the loop has been observed to take.
>>> + */
>>> +static __always_inline u64 fsl_a008585_reread_counter(u64 (*func)(void))
>>> +{
>>> +	u64 cval_old, cval_new;
>>> +	int timeout = 200;
>>> +
>>> +	do {
>>> +		isb();
>>> +		cval_old = func();
>>> +		cval_new = func();
>>> +		timeout--;
>>> +	} while (unlikely(cval_old != cval_new) && timeout);
>>> +
>>> +	WARN_ON_ONCE(!timeout);
>>> +	return cval_new;
>>> +}
>> Hi Scott:
>>
>> I have test this patch, this solution looks will break the performance a little more than I expected.
>> it will have more than 10% that the cval will read again, we could sure that the cval_old always equal to the
>> cval_new in the normal circumstances, so I prefer this way:
>>
>> 	do {
>> 		isb();
>> 		cval_old = func();
>> 		cval_new = func();
>> 		timeout--;	
>> 	} while (unlikely((cval_new - cval_old) >> 2) && timeout);
> 
> What makes you think that ignoring the two bottom bits is a safe thing
> to do? Talking about performance when the HW has such a dramatic bug is
> like putting a bigger engine on a car that has no brakes: you just hit
> the wall quicker.
> 
> Thanks,
> 

I have a chip which has the same problem like Scott's chip, and I wish to solve this problem in the same way, 
our chip designer told me that if you got a wrong value from the cntvct_el0, you would not get a wrong value
until 8 cycles later, so I could ignoring the lowest 3 bits if I reading twice together.

The key problem is the probability of this bug, my chip has 1/100000 chance to met this bug, so use 10% performance
to fix this bug looks more expensive.

Thanks.
Ding

> 	M.
> 


--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v3 3/3] arm64: arch_timer: Work around QorIQ Erratum A-008585
@ 2016-07-07 11:37               ` Ding Tianhong
  0 siblings, 0 replies; 24+ messages in thread
From: Ding Tianhong @ 2016-07-07 11:37 UTC (permalink / raw)
  To: linux-arm-kernel

On 2016/7/7 17:49, Marc Zyngier wrote:
> On 07/07/16 10:34, Ding Tianhong wrote:
>> On 2016/7/2 6:41, Scott Wood wrote:
>>> Erratum A-008585 says that the ARM generic timer counter "has the
>>> potential to contain an erroneous value for a small number of core
>>> clock cycles every time the timer value changes".  Accesses to TVAL
>>> (both read and write) are also affected due to the implicit counter
>>> read.  Accesses to CVAL are not affected.
>>>
>>> The workaround is to reread TVAL and count registers until successive reads
>>> return the same value, and when writing TVAL to retry until counter
>>> reads before and after the write return the same value.
>>>
>>> This erratum can be found on LS1043A and LS2080A.
>>>
>>> Signed-off-by: Scott Wood <oss@buserror.net>
>>> ---
>>> v3:
>>> - Used cval rather than a loop for the write side of the erratum
>>> - Added a Kconfig control
>>> - Moved the device tree binding into its own patch
>>> - Added erratum to silicon-errata.txt
>>> - Changed function names to contain the erratum name
>>> - Factored out the setting of erratum versions of set_next_event
>>>   to improve readability
>>> - Added a comment clarifying that the timeout is arbitrary
>>>
>>> v2:
>>> Significant rework based on feedback, including using static_key,
>>> disabling VDSO counter access rather than adding the workaround to the
>>> VDSO, and uninlining the loops.
>>>
>>> Dropped the separate property for indicating that writes to TVAL are
>>> affected, as I believe that's just a side effect of the implicit
>>> counter read being corrupted, and thus a chip that is affected by one
>>> will always be affected by the other.
>>>
>>> Dropped the arm32 portion as it seems there was confusion about whether
>>> LS1021A is affected.  Currently I am being told that it is not
>>> affected.
>>>
>>> I considered writing to CVAL rather than looping on TVAL writes, but
>>> that would still have required separate set_next_event() code for the
>>> erratum, and adding CVAL to the enum would have required a bunch of
>>> extra handlers in switch statements (even where unused, due to compiler
>>> warnings about unhandled enum values) including in an arm32 header.  It
>>> seemed better to avoid the arm32 interaction and new untested
>>> accessors.
>>> ---
>>>  Documentation/arm64/silicon-errata.txt |   2 +
>>>  arch/arm64/include/asm/arch_timer.h    |  48 ++++++++++++---
>>>  drivers/clocksource/Kconfig            |  10 ++++
>>>  drivers/clocksource/arm_arch_timer.c   | 103 +++++++++++++++++++++++++++++++++
>>>  4 files changed, 154 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
>>> index ba4b6ac..5778f62 100644
>>> --- a/Documentation/arm64/silicon-errata.txt
>>> +++ b/Documentation/arm64/silicon-errata.txt
>>> @@ -57,3 +57,5 @@ stable kernels.
>>>  | Cavium         | ThunderX ITS    | #22375, #24313  | CAVIUM_ERRATUM_22375    |
>>>  | Cavium         | ThunderX GICv3  | #23154          | CAVIUM_ERRATUM_23154    |
>>>  | Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456    |
>>> +|                |                 |                 |                         |
>>> +| Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585     |
>>> diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
>>> index fbe0ca3..70fbad9 100644
>>> --- a/arch/arm64/include/asm/arch_timer.h
>>> +++ b/arch/arm64/include/asm/arch_timer.h
>>> @@ -23,10 +23,34 @@
>>>  
>>>  #include <linux/bug.h>
>>>  #include <linux/init.h>
>>> +#include <linux/jump_label.h>
>>>  #include <linux/types.h>
>>>  
>>>  #include <clocksource/arm_arch_timer.h>
>>>  
>>> +extern struct static_key_false arch_timer_read_ool_enabled;
>>> +
>>> +#define ARCH_TIMER_REG_READ(reg, func) \
>>> +extern u64 func##_ool(void); \
>>> +static inline u64 __##func(void) \
>>> +{ \
>>> +	u64 val; \
>>> +	asm volatile("mrs %0, " reg : "=r" (val)); \
>>> +	return val; \
>>> +} \
>>> +static inline u64 _##func(void) \
>>> +{ \
>>> +	if (IS_ENABLED(CONFIG_FSL_ERRATUM_A008585) && \
>>> +	    static_branch_unlikely(&arch_timer_read_ool_enabled)) \
>>> +		return func##_ool(); \
>>> +	else \
>>> +		return __##func(); \
>>> +}
>>> +
>>> +ARCH_TIMER_REG_READ("cntp_tval_el0", arch_timer_get_ptval)
>>> +ARCH_TIMER_REG_READ("cntv_tval_el0", arch_timer_get_vtval)
>>> +ARCH_TIMER_REG_READ("cntvct_el0", arch_counter_get_cntvct)
>>> +
>>>  /*
>>>   * These register accessors are marked inline so the compiler can
>>>   * nicely work out which register we want, and chuck away the rest of
>>> @@ -58,6 +82,16 @@ void arch_timer_reg_write_cp15(int access, enum arch_timer_reg reg, u32 val)
>>>  	isb();
>>>  }
>>>  
>>> +static __always_inline void arch_timer_cval_write_cp15(int access, u64 val)
>>> +{
>>> +	if (access == ARCH_TIMER_PHYS_ACCESS)
>>> +		asm volatile("msr cntp_cval_el0, %0" : : "r" (val));
>>> +	else if (access == ARCH_TIMER_VIRT_ACCESS)
>>> +		asm volatile("msr cntv_cval_el0, %0" : : "r" (val));
>>> +
>>> +	isb();
>>> +}
>>> +
>>>  static __always_inline
>>>  u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
>>>  {
>>> @@ -66,19 +100,19 @@ u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
>>>  	if (access == ARCH_TIMER_PHYS_ACCESS) {
>>>  		switch (reg) {
>>>  		case ARCH_TIMER_REG_CTRL:
>>> -			asm volatile("mrs %0,  cntp_ctl_el0" : "=r" (val));
>>> +			asm volatile("mrs %0, cntp_ctl_el0" : "=r" (val));
>>>  			break;
>>>  		case ARCH_TIMER_REG_TVAL:
>>> -			asm volatile("mrs %0, cntp_tval_el0" : "=r" (val));
>>> +			val = _arch_timer_get_ptval();
>>>  			break;
>>>  		}
>>>  	} else if (access == ARCH_TIMER_VIRT_ACCESS) {
>>>  		switch (reg) {
>>>  		case ARCH_TIMER_REG_CTRL:
>>> -			asm volatile("mrs %0,  cntv_ctl_el0" : "=r" (val));
>>> +			asm volatile("mrs %0, cntv_ctl_el0" : "=r" (val));
>>>  			break;
>>>  		case ARCH_TIMER_REG_TVAL:
>>> -			asm volatile("mrs %0, cntv_tval_el0" : "=r" (val));
>>> +			val = _arch_timer_get_vtval();
>>>  			break;
>>>  		}
>>>  	}
>>> @@ -116,12 +150,8 @@ static inline u64 arch_counter_get_cntpct(void)
>>>  
>>>  static inline u64 arch_counter_get_cntvct(void)
>>>  {
>>> -	u64 cval;
>>> -
>>>  	isb();
>>> -	asm volatile("mrs %0, cntvct_el0" : "=r" (cval));
>>> -
>>> -	return cval;
>>> +	return _arch_counter_get_cntvct();
>>>  }
>>>  
>>>  static inline int arch_timer_arch_init(void)
>>> diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
>>> index c346be6..672ddc3 100644
>>> --- a/drivers/clocksource/Kconfig
>>> +++ b/drivers/clocksource/Kconfig
>>> @@ -207,6 +207,16 @@ config ARM_ARCH_TIMER_EVTSTREAM
>>>  	  This must be disabled for hardware validation purposes to detect any
>>>  	  hardware anomalies of missing events.
>>>  
>>> +config FSL_ERRATUM_A008585
>>> +	bool "Workaround for Freescale/NXP Erratum A-008585"
>>> +	default y
>>> +	depends on ARM_ARCH_TIMER && ARM64
>>> +	help
>>> +	  This option enables a workaround for Freescale/NXP Erratum
>>> +	  A-008585 ("ARM generic timer may contain an erroneous
>>> +	  value").  The workaround will only be active if the
>>> +	  fsl,erratum-a008585 property is found in the timer node.
>>> +
>>>  config ARM_GLOBAL_TIMER
>>>  	bool
>>>  	select CLKSRC_OF if OF
>>> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
>>> index 5152b38..7ead4eb 100644
>>> --- a/drivers/clocksource/arm_arch_timer.c
>>> +++ b/drivers/clocksource/arm_arch_timer.c
>>> @@ -83,6 +83,51 @@ static bool arch_timer_mem_use_virtual;
>>>   * Architected system timer support.
>>>   */
>>>  
>>> +#ifdef CONFIG_FSL_ERRATUM_A008585
>>> +DEFINE_STATIC_KEY_FALSE(arch_timer_read_ool_enabled);
>>> +EXPORT_SYMBOL_GPL(arch_timer_read_ool_enabled);
>>> +
>>> +/*
>>> + * __always_inline is used to ensure that func() is not an actual function
>>> + * pointer, which would result in the register accesses potentially being too
>>> + * far apart for the loop to work.
>>> + *
>>> + * The timeout is an arbitrary value well beyond the highest number
>>> + * of iterations the loop has been observed to take.
>>> + */
>>> +static __always_inline u64 fsl_a008585_reread_counter(u64 (*func)(void))
>>> +{
>>> +	u64 cval_old, cval_new;
>>> +	int timeout = 200;
>>> +
>>> +	do {
>>> +		isb();
>>> +		cval_old = func();
>>> +		cval_new = func();
>>> +		timeout--;
>>> +	} while (unlikely(cval_old != cval_new) && timeout);
>>> +
>>> +	WARN_ON_ONCE(!timeout);
>>> +	return cval_new;
>>> +}
>> Hi Scott:
>>
>> I have test this patch, this solution looks will break the performance a little more than I expected.
>> it will have more than 10% that the cval will read again, we could sure that the cval_old always equal to the
>> cval_new in the normal circumstances, so I prefer this way:
>>
>> 	do {
>> 		isb();
>> 		cval_old = func();
>> 		cval_new = func();
>> 		timeout--;	
>> 	} while (unlikely((cval_new - cval_old) >> 2) && timeout);
> 
> What makes you think that ignoring the two bottom bits is a safe thing
> to do? Talking about performance when the HW has such a dramatic bug is
> like putting a bigger engine on a car that has no brakes: you just hit
> the wall quicker.
> 
> Thanks,
> 

I have a chip which has the same problem like Scott's chip, and I wish to solve this problem in the same way, 
our chip designer told me that if you got a wrong value from the cntvct_el0, you would not get a wrong value
until 8 cycles later, so I could ignoring the lowest 3 bits if I reading twice together.

The key problem is the probability of this bug, my chip has 1/100000 chance to met this bug, so use 10% performance
to fix this bug looks more expensive.

Thanks.
Ding

> 	M.
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 3/3] arm64: arch_timer: Work around QorIQ Erratum A-008585
  2016-07-07 11:37               ` Ding Tianhong
@ 2016-07-07 11:51                   ` Marc Zyngier
  -1 siblings, 0 replies; 24+ messages in thread
From: Marc Zyngier @ 2016-07-07 11:51 UTC (permalink / raw)
  To: Ding Tianhong, Scott Wood, Catalin Marinas, Will Deacon
  Cc: devicetree-u79uwXL29TY76Z2rM5mHXA, stuart.yoder-3arQi8VN3Tc,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On 07/07/16 12:37, Ding Tianhong wrote:
> On 2016/7/7 17:49, Marc Zyngier wrote:
>> On 07/07/16 10:34, Ding Tianhong wrote:
>>> On 2016/7/2 6:41, Scott Wood wrote:
>>>> Erratum A-008585 says that the ARM generic timer counter "has the
>>>> potential to contain an erroneous value for a small number of core
>>>> clock cycles every time the timer value changes".  Accesses to TVAL
>>>> (both read and write) are also affected due to the implicit counter
>>>> read.  Accesses to CVAL are not affected.
>>>>
>>>> The workaround is to reread TVAL and count registers until successive reads
>>>> return the same value, and when writing TVAL to retry until counter
>>>> reads before and after the write return the same value.
>>>>
>>>> This erratum can be found on LS1043A and LS2080A.
>>>>
>>>> Signed-off-by: Scott Wood <oss-fOR+EgIDQEHk1uMJSBkQmQ@public.gmane.org>
>>>> ---
>>>> v3:
>>>> - Used cval rather than a loop for the write side of the erratum
>>>> - Added a Kconfig control
>>>> - Moved the device tree binding into its own patch
>>>> - Added erratum to silicon-errata.txt
>>>> - Changed function names to contain the erratum name
>>>> - Factored out the setting of erratum versions of set_next_event
>>>>   to improve readability
>>>> - Added a comment clarifying that the timeout is arbitrary
>>>>
>>>> v2:
>>>> Significant rework based on feedback, including using static_key,
>>>> disabling VDSO counter access rather than adding the workaround to the
>>>> VDSO, and uninlining the loops.
>>>>
>>>> Dropped the separate property for indicating that writes to TVAL are
>>>> affected, as I believe that's just a side effect of the implicit
>>>> counter read being corrupted, and thus a chip that is affected by one
>>>> will always be affected by the other.
>>>>
>>>> Dropped the arm32 portion as it seems there was confusion about whether
>>>> LS1021A is affected.  Currently I am being told that it is not
>>>> affected.
>>>>
>>>> I considered writing to CVAL rather than looping on TVAL writes, but
>>>> that would still have required separate set_next_event() code for the
>>>> erratum, and adding CVAL to the enum would have required a bunch of
>>>> extra handlers in switch statements (even where unused, due to compiler
>>>> warnings about unhandled enum values) including in an arm32 header.  It
>>>> seemed better to avoid the arm32 interaction and new untested
>>>> accessors.
>>>> ---
>>>>  Documentation/arm64/silicon-errata.txt |   2 +
>>>>  arch/arm64/include/asm/arch_timer.h    |  48 ++++++++++++---
>>>>  drivers/clocksource/Kconfig            |  10 ++++
>>>>  drivers/clocksource/arm_arch_timer.c   | 103 +++++++++++++++++++++++++++++++++
>>>>  4 files changed, 154 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
>>>> index ba4b6ac..5778f62 100644
>>>> --- a/Documentation/arm64/silicon-errata.txt
>>>> +++ b/Documentation/arm64/silicon-errata.txt
>>>> @@ -57,3 +57,5 @@ stable kernels.
>>>>  | Cavium         | ThunderX ITS    | #22375, #24313  | CAVIUM_ERRATUM_22375    |
>>>>  | Cavium         | ThunderX GICv3  | #23154          | CAVIUM_ERRATUM_23154    |
>>>>  | Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456    |
>>>> +|                |                 |                 |                         |
>>>> +| Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585     |
>>>> diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
>>>> index fbe0ca3..70fbad9 100644
>>>> --- a/arch/arm64/include/asm/arch_timer.h
>>>> +++ b/arch/arm64/include/asm/arch_timer.h
>>>> @@ -23,10 +23,34 @@
>>>>  
>>>>  #include <linux/bug.h>
>>>>  #include <linux/init.h>
>>>> +#include <linux/jump_label.h>
>>>>  #include <linux/types.h>
>>>>  
>>>>  #include <clocksource/arm_arch_timer.h>
>>>>  
>>>> +extern struct static_key_false arch_timer_read_ool_enabled;
>>>> +
>>>> +#define ARCH_TIMER_REG_READ(reg, func) \
>>>> +extern u64 func##_ool(void); \
>>>> +static inline u64 __##func(void) \
>>>> +{ \
>>>> +	u64 val; \
>>>> +	asm volatile("mrs %0, " reg : "=r" (val)); \
>>>> +	return val; \
>>>> +} \
>>>> +static inline u64 _##func(void) \
>>>> +{ \
>>>> +	if (IS_ENABLED(CONFIG_FSL_ERRATUM_A008585) && \
>>>> +	    static_branch_unlikely(&arch_timer_read_ool_enabled)) \
>>>> +		return func##_ool(); \
>>>> +	else \
>>>> +		return __##func(); \
>>>> +}
>>>> +
>>>> +ARCH_TIMER_REG_READ("cntp_tval_el0", arch_timer_get_ptval)
>>>> +ARCH_TIMER_REG_READ("cntv_tval_el0", arch_timer_get_vtval)
>>>> +ARCH_TIMER_REG_READ("cntvct_el0", arch_counter_get_cntvct)
>>>> +
>>>>  /*
>>>>   * These register accessors are marked inline so the compiler can
>>>>   * nicely work out which register we want, and chuck away the rest of
>>>> @@ -58,6 +82,16 @@ void arch_timer_reg_write_cp15(int access, enum arch_timer_reg reg, u32 val)
>>>>  	isb();
>>>>  }
>>>>  
>>>> +static __always_inline void arch_timer_cval_write_cp15(int access, u64 val)
>>>> +{
>>>> +	if (access == ARCH_TIMER_PHYS_ACCESS)
>>>> +		asm volatile("msr cntp_cval_el0, %0" : : "r" (val));
>>>> +	else if (access == ARCH_TIMER_VIRT_ACCESS)
>>>> +		asm volatile("msr cntv_cval_el0, %0" : : "r" (val));
>>>> +
>>>> +	isb();
>>>> +}
>>>> +
>>>>  static __always_inline
>>>>  u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
>>>>  {
>>>> @@ -66,19 +100,19 @@ u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
>>>>  	if (access == ARCH_TIMER_PHYS_ACCESS) {
>>>>  		switch (reg) {
>>>>  		case ARCH_TIMER_REG_CTRL:
>>>> -			asm volatile("mrs %0,  cntp_ctl_el0" : "=r" (val));
>>>> +			asm volatile("mrs %0, cntp_ctl_el0" : "=r" (val));
>>>>  			break;
>>>>  		case ARCH_TIMER_REG_TVAL:
>>>> -			asm volatile("mrs %0, cntp_tval_el0" : "=r" (val));
>>>> +			val = _arch_timer_get_ptval();
>>>>  			break;
>>>>  		}
>>>>  	} else if (access == ARCH_TIMER_VIRT_ACCESS) {
>>>>  		switch (reg) {
>>>>  		case ARCH_TIMER_REG_CTRL:
>>>> -			asm volatile("mrs %0,  cntv_ctl_el0" : "=r" (val));
>>>> +			asm volatile("mrs %0, cntv_ctl_el0" : "=r" (val));
>>>>  			break;
>>>>  		case ARCH_TIMER_REG_TVAL:
>>>> -			asm volatile("mrs %0, cntv_tval_el0" : "=r" (val));
>>>> +			val = _arch_timer_get_vtval();
>>>>  			break;
>>>>  		}
>>>>  	}
>>>> @@ -116,12 +150,8 @@ static inline u64 arch_counter_get_cntpct(void)
>>>>  
>>>>  static inline u64 arch_counter_get_cntvct(void)
>>>>  {
>>>> -	u64 cval;
>>>> -
>>>>  	isb();
>>>> -	asm volatile("mrs %0, cntvct_el0" : "=r" (cval));
>>>> -
>>>> -	return cval;
>>>> +	return _arch_counter_get_cntvct();
>>>>  }
>>>>  
>>>>  static inline int arch_timer_arch_init(void)
>>>> diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
>>>> index c346be6..672ddc3 100644
>>>> --- a/drivers/clocksource/Kconfig
>>>> +++ b/drivers/clocksource/Kconfig
>>>> @@ -207,6 +207,16 @@ config ARM_ARCH_TIMER_EVTSTREAM
>>>>  	  This must be disabled for hardware validation purposes to detect any
>>>>  	  hardware anomalies of missing events.
>>>>  
>>>> +config FSL_ERRATUM_A008585
>>>> +	bool "Workaround for Freescale/NXP Erratum A-008585"
>>>> +	default y
>>>> +	depends on ARM_ARCH_TIMER && ARM64
>>>> +	help
>>>> +	  This option enables a workaround for Freescale/NXP Erratum
>>>> +	  A-008585 ("ARM generic timer may contain an erroneous
>>>> +	  value").  The workaround will only be active if the
>>>> +	  fsl,erratum-a008585 property is found in the timer node.
>>>> +
>>>>  config ARM_GLOBAL_TIMER
>>>>  	bool
>>>>  	select CLKSRC_OF if OF
>>>> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
>>>> index 5152b38..7ead4eb 100644
>>>> --- a/drivers/clocksource/arm_arch_timer.c
>>>> +++ b/drivers/clocksource/arm_arch_timer.c
>>>> @@ -83,6 +83,51 @@ static bool arch_timer_mem_use_virtual;
>>>>   * Architected system timer support.
>>>>   */
>>>>  
>>>> +#ifdef CONFIG_FSL_ERRATUM_A008585
>>>> +DEFINE_STATIC_KEY_FALSE(arch_timer_read_ool_enabled);
>>>> +EXPORT_SYMBOL_GPL(arch_timer_read_ool_enabled);
>>>> +
>>>> +/*
>>>> + * __always_inline is used to ensure that func() is not an actual function
>>>> + * pointer, which would result in the register accesses potentially being too
>>>> + * far apart for the loop to work.
>>>> + *
>>>> + * The timeout is an arbitrary value well beyond the highest number
>>>> + * of iterations the loop has been observed to take.
>>>> + */
>>>> +static __always_inline u64 fsl_a008585_reread_counter(u64 (*func)(void))
>>>> +{
>>>> +	u64 cval_old, cval_new;
>>>> +	int timeout = 200;
>>>> +
>>>> +	do {
>>>> +		isb();
>>>> +		cval_old = func();
>>>> +		cval_new = func();
>>>> +		timeout--;
>>>> +	} while (unlikely(cval_old != cval_new) && timeout);
>>>> +
>>>> +	WARN_ON_ONCE(!timeout);
>>>> +	return cval_new;
>>>> +}
>>> Hi Scott:
>>>
>>> I have test this patch, this solution looks will break the performance a little more than I expected.
>>> it will have more than 10% that the cval will read again, we could sure that the cval_old always equal to the
>>> cval_new in the normal circumstances, so I prefer this way:
>>>
>>> 	do {
>>> 		isb();
>>> 		cval_old = func();
>>> 		cval_new = func();
>>> 		timeout--;	
>>> 	} while (unlikely((cval_new - cval_old) >> 2) && timeout);
>>
>> What makes you think that ignoring the two bottom bits is a safe thing
>> to do? Talking about performance when the HW has such a dramatic bug is
>> like putting a bigger engine on a car that has no brakes: you just hit
>> the wall quicker.
>>
>> Thanks,
>>
> 
> I have a chip which has the same problem like Scott's chip, and I
> wish to solve this problem in the same way, our chip designer told me
> that if you got a wrong value from the cntvct_el0, you would not get
> a wrong value until 8 cycles later, so I could ignoring the lowest 3
> bits if I reading twice together.

Is that CPU cycles? Or timer cycles? What guarantees do you have that
the two reads are *always* done in the right timing window?

> The key problem is the probability of this bug, my chip has 1/100000
> chance to met this bug, so use 10% performance to fix this bug looks
> more expensive.

You care about performance, I care about correctness. If 10% of your CPU
is wasted on a correct workaround, tough. Next time, your HW guy will
spend more time getting it right.

Anyway, what you are describing is a different bug from what FSL has, so
don't try and shoehorn your own constraints in another workaround.
Please implement your own.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v3 3/3] arm64: arch_timer: Work around QorIQ Erratum A-008585
@ 2016-07-07 11:51                   ` Marc Zyngier
  0 siblings, 0 replies; 24+ messages in thread
From: Marc Zyngier @ 2016-07-07 11:51 UTC (permalink / raw)
  To: linux-arm-kernel

On 07/07/16 12:37, Ding Tianhong wrote:
> On 2016/7/7 17:49, Marc Zyngier wrote:
>> On 07/07/16 10:34, Ding Tianhong wrote:
>>> On 2016/7/2 6:41, Scott Wood wrote:
>>>> Erratum A-008585 says that the ARM generic timer counter "has the
>>>> potential to contain an erroneous value for a small number of core
>>>> clock cycles every time the timer value changes".  Accesses to TVAL
>>>> (both read and write) are also affected due to the implicit counter
>>>> read.  Accesses to CVAL are not affected.
>>>>
>>>> The workaround is to reread TVAL and count registers until successive reads
>>>> return the same value, and when writing TVAL to retry until counter
>>>> reads before and after the write return the same value.
>>>>
>>>> This erratum can be found on LS1043A and LS2080A.
>>>>
>>>> Signed-off-by: Scott Wood <oss@buserror.net>
>>>> ---
>>>> v3:
>>>> - Used cval rather than a loop for the write side of the erratum
>>>> - Added a Kconfig control
>>>> - Moved the device tree binding into its own patch
>>>> - Added erratum to silicon-errata.txt
>>>> - Changed function names to contain the erratum name
>>>> - Factored out the setting of erratum versions of set_next_event
>>>>   to improve readability
>>>> - Added a comment clarifying that the timeout is arbitrary
>>>>
>>>> v2:
>>>> Significant rework based on feedback, including using static_key,
>>>> disabling VDSO counter access rather than adding the workaround to the
>>>> VDSO, and uninlining the loops.
>>>>
>>>> Dropped the separate property for indicating that writes to TVAL are
>>>> affected, as I believe that's just a side effect of the implicit
>>>> counter read being corrupted, and thus a chip that is affected by one
>>>> will always be affected by the other.
>>>>
>>>> Dropped the arm32 portion as it seems there was confusion about whether
>>>> LS1021A is affected.  Currently I am being told that it is not
>>>> affected.
>>>>
>>>> I considered writing to CVAL rather than looping on TVAL writes, but
>>>> that would still have required separate set_next_event() code for the
>>>> erratum, and adding CVAL to the enum would have required a bunch of
>>>> extra handlers in switch statements (even where unused, due to compiler
>>>> warnings about unhandled enum values) including in an arm32 header.  It
>>>> seemed better to avoid the arm32 interaction and new untested
>>>> accessors.
>>>> ---
>>>>  Documentation/arm64/silicon-errata.txt |   2 +
>>>>  arch/arm64/include/asm/arch_timer.h    |  48 ++++++++++++---
>>>>  drivers/clocksource/Kconfig            |  10 ++++
>>>>  drivers/clocksource/arm_arch_timer.c   | 103 +++++++++++++++++++++++++++++++++
>>>>  4 files changed, 154 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
>>>> index ba4b6ac..5778f62 100644
>>>> --- a/Documentation/arm64/silicon-errata.txt
>>>> +++ b/Documentation/arm64/silicon-errata.txt
>>>> @@ -57,3 +57,5 @@ stable kernels.
>>>>  | Cavium         | ThunderX ITS    | #22375, #24313  | CAVIUM_ERRATUM_22375    |
>>>>  | Cavium         | ThunderX GICv3  | #23154          | CAVIUM_ERRATUM_23154    |
>>>>  | Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456    |
>>>> +|                |                 |                 |                         |
>>>> +| Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585     |
>>>> diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
>>>> index fbe0ca3..70fbad9 100644
>>>> --- a/arch/arm64/include/asm/arch_timer.h
>>>> +++ b/arch/arm64/include/asm/arch_timer.h
>>>> @@ -23,10 +23,34 @@
>>>>  
>>>>  #include <linux/bug.h>
>>>>  #include <linux/init.h>
>>>> +#include <linux/jump_label.h>
>>>>  #include <linux/types.h>
>>>>  
>>>>  #include <clocksource/arm_arch_timer.h>
>>>>  
>>>> +extern struct static_key_false arch_timer_read_ool_enabled;
>>>> +
>>>> +#define ARCH_TIMER_REG_READ(reg, func) \
>>>> +extern u64 func##_ool(void); \
>>>> +static inline u64 __##func(void) \
>>>> +{ \
>>>> +	u64 val; \
>>>> +	asm volatile("mrs %0, " reg : "=r" (val)); \
>>>> +	return val; \
>>>> +} \
>>>> +static inline u64 _##func(void) \
>>>> +{ \
>>>> +	if (IS_ENABLED(CONFIG_FSL_ERRATUM_A008585) && \
>>>> +	    static_branch_unlikely(&arch_timer_read_ool_enabled)) \
>>>> +		return func##_ool(); \
>>>> +	else \
>>>> +		return __##func(); \
>>>> +}
>>>> +
>>>> +ARCH_TIMER_REG_READ("cntp_tval_el0", arch_timer_get_ptval)
>>>> +ARCH_TIMER_REG_READ("cntv_tval_el0", arch_timer_get_vtval)
>>>> +ARCH_TIMER_REG_READ("cntvct_el0", arch_counter_get_cntvct)
>>>> +
>>>>  /*
>>>>   * These register accessors are marked inline so the compiler can
>>>>   * nicely work out which register we want, and chuck away the rest of
>>>> @@ -58,6 +82,16 @@ void arch_timer_reg_write_cp15(int access, enum arch_timer_reg reg, u32 val)
>>>>  	isb();
>>>>  }
>>>>  
>>>> +static __always_inline void arch_timer_cval_write_cp15(int access, u64 val)
>>>> +{
>>>> +	if (access == ARCH_TIMER_PHYS_ACCESS)
>>>> +		asm volatile("msr cntp_cval_el0, %0" : : "r" (val));
>>>> +	else if (access == ARCH_TIMER_VIRT_ACCESS)
>>>> +		asm volatile("msr cntv_cval_el0, %0" : : "r" (val));
>>>> +
>>>> +	isb();
>>>> +}
>>>> +
>>>>  static __always_inline
>>>>  u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
>>>>  {
>>>> @@ -66,19 +100,19 @@ u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
>>>>  	if (access == ARCH_TIMER_PHYS_ACCESS) {
>>>>  		switch (reg) {
>>>>  		case ARCH_TIMER_REG_CTRL:
>>>> -			asm volatile("mrs %0,  cntp_ctl_el0" : "=r" (val));
>>>> +			asm volatile("mrs %0, cntp_ctl_el0" : "=r" (val));
>>>>  			break;
>>>>  		case ARCH_TIMER_REG_TVAL:
>>>> -			asm volatile("mrs %0, cntp_tval_el0" : "=r" (val));
>>>> +			val = _arch_timer_get_ptval();
>>>>  			break;
>>>>  		}
>>>>  	} else if (access == ARCH_TIMER_VIRT_ACCESS) {
>>>>  		switch (reg) {
>>>>  		case ARCH_TIMER_REG_CTRL:
>>>> -			asm volatile("mrs %0,  cntv_ctl_el0" : "=r" (val));
>>>> +			asm volatile("mrs %0, cntv_ctl_el0" : "=r" (val));
>>>>  			break;
>>>>  		case ARCH_TIMER_REG_TVAL:
>>>> -			asm volatile("mrs %0, cntv_tval_el0" : "=r" (val));
>>>> +			val = _arch_timer_get_vtval();
>>>>  			break;
>>>>  		}
>>>>  	}
>>>> @@ -116,12 +150,8 @@ static inline u64 arch_counter_get_cntpct(void)
>>>>  
>>>>  static inline u64 arch_counter_get_cntvct(void)
>>>>  {
>>>> -	u64 cval;
>>>> -
>>>>  	isb();
>>>> -	asm volatile("mrs %0, cntvct_el0" : "=r" (cval));
>>>> -
>>>> -	return cval;
>>>> +	return _arch_counter_get_cntvct();
>>>>  }
>>>>  
>>>>  static inline int arch_timer_arch_init(void)
>>>> diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
>>>> index c346be6..672ddc3 100644
>>>> --- a/drivers/clocksource/Kconfig
>>>> +++ b/drivers/clocksource/Kconfig
>>>> @@ -207,6 +207,16 @@ config ARM_ARCH_TIMER_EVTSTREAM
>>>>  	  This must be disabled for hardware validation purposes to detect any
>>>>  	  hardware anomalies of missing events.
>>>>  
>>>> +config FSL_ERRATUM_A008585
>>>> +	bool "Workaround for Freescale/NXP Erratum A-008585"
>>>> +	default y
>>>> +	depends on ARM_ARCH_TIMER && ARM64
>>>> +	help
>>>> +	  This option enables a workaround for Freescale/NXP Erratum
>>>> +	  A-008585 ("ARM generic timer may contain an erroneous
>>>> +	  value").  The workaround will only be active if the
>>>> +	  fsl,erratum-a008585 property is found in the timer node.
>>>> +
>>>>  config ARM_GLOBAL_TIMER
>>>>  	bool
>>>>  	select CLKSRC_OF if OF
>>>> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
>>>> index 5152b38..7ead4eb 100644
>>>> --- a/drivers/clocksource/arm_arch_timer.c
>>>> +++ b/drivers/clocksource/arm_arch_timer.c
>>>> @@ -83,6 +83,51 @@ static bool arch_timer_mem_use_virtual;
>>>>   * Architected system timer support.
>>>>   */
>>>>  
>>>> +#ifdef CONFIG_FSL_ERRATUM_A008585
>>>> +DEFINE_STATIC_KEY_FALSE(arch_timer_read_ool_enabled);
>>>> +EXPORT_SYMBOL_GPL(arch_timer_read_ool_enabled);
>>>> +
>>>> +/*
>>>> + * __always_inline is used to ensure that func() is not an actual function
>>>> + * pointer, which would result in the register accesses potentially being too
>>>> + * far apart for the loop to work.
>>>> + *
>>>> + * The timeout is an arbitrary value well beyond the highest number
>>>> + * of iterations the loop has been observed to take.
>>>> + */
>>>> +static __always_inline u64 fsl_a008585_reread_counter(u64 (*func)(void))
>>>> +{
>>>> +	u64 cval_old, cval_new;
>>>> +	int timeout = 200;
>>>> +
>>>> +	do {
>>>> +		isb();
>>>> +		cval_old = func();
>>>> +		cval_new = func();
>>>> +		timeout--;
>>>> +	} while (unlikely(cval_old != cval_new) && timeout);
>>>> +
>>>> +	WARN_ON_ONCE(!timeout);
>>>> +	return cval_new;
>>>> +}
>>> Hi Scott:
>>>
>>> I have test this patch, this solution looks will break the performance a little more than I expected.
>>> it will have more than 10% that the cval will read again, we could sure that the cval_old always equal to the
>>> cval_new in the normal circumstances, so I prefer this way:
>>>
>>> 	do {
>>> 		isb();
>>> 		cval_old = func();
>>> 		cval_new = func();
>>> 		timeout--;	
>>> 	} while (unlikely((cval_new - cval_old) >> 2) && timeout);
>>
>> What makes you think that ignoring the two bottom bits is a safe thing
>> to do? Talking about performance when the HW has such a dramatic bug is
>> like putting a bigger engine on a car that has no brakes: you just hit
>> the wall quicker.
>>
>> Thanks,
>>
> 
> I have a chip which has the same problem like Scott's chip, and I
> wish to solve this problem in the same way, our chip designer told me
> that if you got a wrong value from the cntvct_el0, you would not get
> a wrong value until 8 cycles later, so I could ignoring the lowest 3
> bits if I reading twice together.

Is that CPU cycles? Or timer cycles? What guarantees do you have that
the two reads are *always* done in the right timing window?

> The key problem is the probability of this bug, my chip has 1/100000
> chance to met this bug, so use 10% performance to fix this bug looks
> more expensive.

You care about performance, I care about correctness. If 10% of your CPU
is wasted on a correct workaround, tough. Next time, your HW guy will
spend more time getting it right.

Anyway, what you are describing is a different bug from what FSL has, so
don't try and shoehorn your own constraints in another workaround.
Please implement your own.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 3/3] arm64: arch_timer: Work around QorIQ Erratum A-008585
  2016-07-07 11:51                   ` Marc Zyngier
@ 2016-07-07 12:59                     ` Ding Tianhong
  -1 siblings, 0 replies; 24+ messages in thread
From: Ding Tianhong @ 2016-07-07 12:59 UTC (permalink / raw)
  To: Marc Zyngier, Scott Wood, Catalin Marinas, Will Deacon
  Cc: devicetree, stuart.yoder, linux-arm-kernel

On 2016/7/7 19:51, Marc Zyngier wrote:
> On 07/07/16 12:37, Ding Tianhong wrote:
>> On 2016/7/7 17:49, Marc Zyngier wrote:
>>> On 07/07/16 10:34, Ding Tianhong wrote:
>>>> On 2016/7/2 6:41, Scott Wood wrote:
>>>>> Erratum A-008585 says that the ARM generic timer counter "has the
>>>>> potential to contain an erroneous value for a small number of core
>>>>> clock cycles every time the timer value changes".  Accesses to TVAL
>>>>> (both read and write) are also affected due to the implicit counter
>>>>> read.  Accesses to CVAL are not affected.
>>>>>
>>>>> The workaround is to reread TVAL and count registers until successive reads
>>>>> return the same value, and when writing TVAL to retry until counter
>>>>> reads before and after the write return the same value.
>>>>>
>>>>> This erratum can be found on LS1043A and LS2080A.
>>>>>
>>>>> Signed-off-by: Scott Wood <oss@buserror.net>
>>>>> ---
>>>>> v3:
>>>>> - Used cval rather than a loop for the write side of the erratum
>>>>> - Added a Kconfig control
>>>>> - Moved the device tree binding into its own patch
>>>>> - Added erratum to silicon-errata.txt
>>>>> - Changed function names to contain the erratum name
>>>>> - Factored out the setting of erratum versions of set_next_event
>>>>>   to improve readability
>>>>> - Added a comment clarifying that the timeout is arbitrary
>>>>>
>>>>> v2:
>>>>> Significant rework based on feedback, including using static_key,
>>>>> disabling VDSO counter access rather than adding the workaround to the
>>>>> VDSO, and uninlining the loops.
>>>>>
>>>>> Dropped the separate property for indicating that writes to TVAL are
>>>>> affected, as I believe that's just a side effect of the implicit
>>>>> counter read being corrupted, and thus a chip that is affected by one
>>>>> will always be affected by the other.
>>>>>
>>>>> Dropped the arm32 portion as it seems there was confusion about whether
>>>>> LS1021A is affected.  Currently I am being told that it is not
>>>>> affected.
>>>>>
>>>>> I considered writing to CVAL rather than looping on TVAL writes, but
>>>>> that would still have required separate set_next_event() code for the
>>>>> erratum, and adding CVAL to the enum would have required a bunch of
>>>>> extra handlers in switch statements (even where unused, due to compiler
>>>>> warnings about unhandled enum values) including in an arm32 header.  It
>>>>> seemed better to avoid the arm32 interaction and new untested
>>>>> accessors.
>>>>> ---
>>>>>  Documentation/arm64/silicon-errata.txt |   2 +
>>>>>  arch/arm64/include/asm/arch_timer.h    |  48 ++++++++++++---
>>>>>  drivers/clocksource/Kconfig            |  10 ++++
>>>>>  drivers/clocksource/arm_arch_timer.c   | 103 +++++++++++++++++++++++++++++++++
>>>>>  4 files changed, 154 insertions(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
>>>>> index ba4b6ac..5778f62 100644
>>>>> --- a/Documentation/arm64/silicon-errata.txt
>>>>> +++ b/Documentation/arm64/silicon-errata.txt
>>>>> @@ -57,3 +57,5 @@ stable kernels.
>>>>>  | Cavium         | ThunderX ITS    | #22375, #24313  | CAVIUM_ERRATUM_22375    |
>>>>>  | Cavium         | ThunderX GICv3  | #23154          | CAVIUM_ERRATUM_23154    |
>>>>>  | Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456    |
>>>>> +|                |                 |                 |                         |
>>>>> +| Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585     |
>>>>> diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
>>>>> index fbe0ca3..70fbad9 100644
>>>>> --- a/arch/arm64/include/asm/arch_timer.h
>>>>> +++ b/arch/arm64/include/asm/arch_timer.h
>>>>> @@ -23,10 +23,34 @@
>>>>>  
>>>>>  #include <linux/bug.h>
>>>>>  #include <linux/init.h>
>>>>> +#include <linux/jump_label.h>
>>>>>  #include <linux/types.h>
>>>>>  
>>>>>  #include <clocksource/arm_arch_timer.h>
>>>>>  
>>>>> +extern struct static_key_false arch_timer_read_ool_enabled;
>>>>> +
>>>>> +#define ARCH_TIMER_REG_READ(reg, func) \
>>>>> +extern u64 func##_ool(void); \
>>>>> +static inline u64 __##func(void) \
>>>>> +{ \
>>>>> +	u64 val; \
>>>>> +	asm volatile("mrs %0, " reg : "=r" (val)); \
>>>>> +	return val; \
>>>>> +} \
>>>>> +static inline u64 _##func(void) \
>>>>> +{ \
>>>>> +	if (IS_ENABLED(CONFIG_FSL_ERRATUM_A008585) && \
>>>>> +	    static_branch_unlikely(&arch_timer_read_ool_enabled)) \
>>>>> +		return func##_ool(); \
>>>>> +	else \
>>>>> +		return __##func(); \
>>>>> +}
>>>>> +
>>>>> +ARCH_TIMER_REG_READ("cntp_tval_el0", arch_timer_get_ptval)
>>>>> +ARCH_TIMER_REG_READ("cntv_tval_el0", arch_timer_get_vtval)
>>>>> +ARCH_TIMER_REG_READ("cntvct_el0", arch_counter_get_cntvct)
>>>>> +
>>>>>  /*
>>>>>   * These register accessors are marked inline so the compiler can
>>>>>   * nicely work out which register we want, and chuck away the rest of
>>>>> @@ -58,6 +82,16 @@ void arch_timer_reg_write_cp15(int access, enum arch_timer_reg reg, u32 val)
>>>>>  	isb();
>>>>>  }
>>>>>  
>>>>> +static __always_inline void arch_timer_cval_write_cp15(int access, u64 val)
>>>>> +{
>>>>> +	if (access == ARCH_TIMER_PHYS_ACCESS)
>>>>> +		asm volatile("msr cntp_cval_el0, %0" : : "r" (val));
>>>>> +	else if (access == ARCH_TIMER_VIRT_ACCESS)
>>>>> +		asm volatile("msr cntv_cval_el0, %0" : : "r" (val));
>>>>> +
>>>>> +	isb();
>>>>> +}
>>>>> +
>>>>>  static __always_inline
>>>>>  u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
>>>>>  {
>>>>> @@ -66,19 +100,19 @@ u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
>>>>>  	if (access == ARCH_TIMER_PHYS_ACCESS) {
>>>>>  		switch (reg) {
>>>>>  		case ARCH_TIMER_REG_CTRL:
>>>>> -			asm volatile("mrs %0,  cntp_ctl_el0" : "=r" (val));
>>>>> +			asm volatile("mrs %0, cntp_ctl_el0" : "=r" (val));
>>>>>  			break;
>>>>>  		case ARCH_TIMER_REG_TVAL:
>>>>> -			asm volatile("mrs %0, cntp_tval_el0" : "=r" (val));
>>>>> +			val = _arch_timer_get_ptval();
>>>>>  			break;
>>>>>  		}
>>>>>  	} else if (access == ARCH_TIMER_VIRT_ACCESS) {
>>>>>  		switch (reg) {
>>>>>  		case ARCH_TIMER_REG_CTRL:
>>>>> -			asm volatile("mrs %0,  cntv_ctl_el0" : "=r" (val));
>>>>> +			asm volatile("mrs %0, cntv_ctl_el0" : "=r" (val));
>>>>>  			break;
>>>>>  		case ARCH_TIMER_REG_TVAL:
>>>>> -			asm volatile("mrs %0, cntv_tval_el0" : "=r" (val));
>>>>> +			val = _arch_timer_get_vtval();
>>>>>  			break;
>>>>>  		}
>>>>>  	}
>>>>> @@ -116,12 +150,8 @@ static inline u64 arch_counter_get_cntpct(void)
>>>>>  
>>>>>  static inline u64 arch_counter_get_cntvct(void)
>>>>>  {
>>>>> -	u64 cval;
>>>>> -
>>>>>  	isb();
>>>>> -	asm volatile("mrs %0, cntvct_el0" : "=r" (cval));
>>>>> -
>>>>> -	return cval;
>>>>> +	return _arch_counter_get_cntvct();
>>>>>  }
>>>>>  
>>>>>  static inline int arch_timer_arch_init(void)
>>>>> diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
>>>>> index c346be6..672ddc3 100644
>>>>> --- a/drivers/clocksource/Kconfig
>>>>> +++ b/drivers/clocksource/Kconfig
>>>>> @@ -207,6 +207,16 @@ config ARM_ARCH_TIMER_EVTSTREAM
>>>>>  	  This must be disabled for hardware validation purposes to detect any
>>>>>  	  hardware anomalies of missing events.
>>>>>  
>>>>> +config FSL_ERRATUM_A008585
>>>>> +	bool "Workaround for Freescale/NXP Erratum A-008585"
>>>>> +	default y
>>>>> +	depends on ARM_ARCH_TIMER && ARM64
>>>>> +	help
>>>>> +	  This option enables a workaround for Freescale/NXP Erratum
>>>>> +	  A-008585 ("ARM generic timer may contain an erroneous
>>>>> +	  value").  The workaround will only be active if the
>>>>> +	  fsl,erratum-a008585 property is found in the timer node.
>>>>> +
>>>>>  config ARM_GLOBAL_TIMER
>>>>>  	bool
>>>>>  	select CLKSRC_OF if OF
>>>>> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
>>>>> index 5152b38..7ead4eb 100644
>>>>> --- a/drivers/clocksource/arm_arch_timer.c
>>>>> +++ b/drivers/clocksource/arm_arch_timer.c
>>>>> @@ -83,6 +83,51 @@ static bool arch_timer_mem_use_virtual;
>>>>>   * Architected system timer support.
>>>>>   */
>>>>>  
>>>>> +#ifdef CONFIG_FSL_ERRATUM_A008585
>>>>> +DEFINE_STATIC_KEY_FALSE(arch_timer_read_ool_enabled);
>>>>> +EXPORT_SYMBOL_GPL(arch_timer_read_ool_enabled);
>>>>> +
>>>>> +/*
>>>>> + * __always_inline is used to ensure that func() is not an actual function
>>>>> + * pointer, which would result in the register accesses potentially being too
>>>>> + * far apart for the loop to work.
>>>>> + *
>>>>> + * The timeout is an arbitrary value well beyond the highest number
>>>>> + * of iterations the loop has been observed to take.
>>>>> + */
>>>>> +static __always_inline u64 fsl_a008585_reread_counter(u64 (*func)(void))
>>>>> +{
>>>>> +	u64 cval_old, cval_new;
>>>>> +	int timeout = 200;
>>>>> +
>>>>> +	do {
>>>>> +		isb();
>>>>> +		cval_old = func();
>>>>> +		cval_new = func();
>>>>> +		timeout--;
>>>>> +	} while (unlikely(cval_old != cval_new) && timeout);
>>>>> +
>>>>> +	WARN_ON_ONCE(!timeout);
>>>>> +	return cval_new;
>>>>> +}
>>>> Hi Scott:
>>>>
>>>> I have test this patch, this solution looks will break the performance a little more than I expected.
>>>> it will have more than 10% that the cval will read again, we could sure that the cval_old always equal to the
>>>> cval_new in the normal circumstances, so I prefer this way:
>>>>
>>>> 	do {
>>>> 		isb();
>>>> 		cval_old = func();
>>>> 		cval_new = func();
>>>> 		timeout--;	
>>>> 	} while (unlikely((cval_new - cval_old) >> 2) && timeout);
>>>
>>> What makes you think that ignoring the two bottom bits is a safe thing
>>> to do? Talking about performance when the HW has such a dramatic bug is
>>> like putting a bigger engine on a car that has no brakes: you just hit
>>> the wall quicker.
>>>
>>> Thanks,
>>>
>>
>> I have a chip which has the same problem like Scott's chip, and I
>> wish to solve this problem in the same way, our chip designer told me
>> that if you got a wrong value from the cntvct_el0, you would not get
>> a wrong value until 8 cycles later, so I could ignoring the lowest 3
>> bits if I reading twice together.
> 
> Is that CPU cycles? Or timer cycles? What guarantees do you have that
> the two reads are *always* done in the right timing window?
> 

The timer counter only use 56 bits in aarch64, my chip would change one of the higher 
bit(55 to 3) to a wrong value when occur bug, so there will be more than 8 cycles between
correct value and wrong value from the timer counter. Maybe Scott's problem is not just like
mine.

>> The key problem is the probability of this bug, my chip has 1/100000
>> chance to met this bug, so use 10% performance to fix this bug looks
>> more expensive.
> 
> You care about performance, I care about correctness. If 10% of your CPU
> is wasted on a correct workaround, tough. Next time, your HW guy will
> spend more time getting it right.
> 
> Anyway, what you are describing is a different bug from what FSL has, so
> don't try and shoehorn your own constraints in another workaround.
> Please implement your own.
> 

OK, In deed I should, thanks.

Ding

> Thanks,
> 
> 	M.
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v3 3/3] arm64: arch_timer: Work around QorIQ Erratum A-008585
@ 2016-07-07 12:59                     ` Ding Tianhong
  0 siblings, 0 replies; 24+ messages in thread
From: Ding Tianhong @ 2016-07-07 12:59 UTC (permalink / raw)
  To: linux-arm-kernel

On 2016/7/7 19:51, Marc Zyngier wrote:
> On 07/07/16 12:37, Ding Tianhong wrote:
>> On 2016/7/7 17:49, Marc Zyngier wrote:
>>> On 07/07/16 10:34, Ding Tianhong wrote:
>>>> On 2016/7/2 6:41, Scott Wood wrote:
>>>>> Erratum A-008585 says that the ARM generic timer counter "has the
>>>>> potential to contain an erroneous value for a small number of core
>>>>> clock cycles every time the timer value changes".  Accesses to TVAL
>>>>> (both read and write) are also affected due to the implicit counter
>>>>> read.  Accesses to CVAL are not affected.
>>>>>
>>>>> The workaround is to reread TVAL and count registers until successive reads
>>>>> return the same value, and when writing TVAL to retry until counter
>>>>> reads before and after the write return the same value.
>>>>>
>>>>> This erratum can be found on LS1043A and LS2080A.
>>>>>
>>>>> Signed-off-by: Scott Wood <oss@buserror.net>
>>>>> ---
>>>>> v3:
>>>>> - Used cval rather than a loop for the write side of the erratum
>>>>> - Added a Kconfig control
>>>>> - Moved the device tree binding into its own patch
>>>>> - Added erratum to silicon-errata.txt
>>>>> - Changed function names to contain the erratum name
>>>>> - Factored out the setting of erratum versions of set_next_event
>>>>>   to improve readability
>>>>> - Added a comment clarifying that the timeout is arbitrary
>>>>>
>>>>> v2:
>>>>> Significant rework based on feedback, including using static_key,
>>>>> disabling VDSO counter access rather than adding the workaround to the
>>>>> VDSO, and uninlining the loops.
>>>>>
>>>>> Dropped the separate property for indicating that writes to TVAL are
>>>>> affected, as I believe that's just a side effect of the implicit
>>>>> counter read being corrupted, and thus a chip that is affected by one
>>>>> will always be affected by the other.
>>>>>
>>>>> Dropped the arm32 portion as it seems there was confusion about whether
>>>>> LS1021A is affected.  Currently I am being told that it is not
>>>>> affected.
>>>>>
>>>>> I considered writing to CVAL rather than looping on TVAL writes, but
>>>>> that would still have required separate set_next_event() code for the
>>>>> erratum, and adding CVAL to the enum would have required a bunch of
>>>>> extra handlers in switch statements (even where unused, due to compiler
>>>>> warnings about unhandled enum values) including in an arm32 header.  It
>>>>> seemed better to avoid the arm32 interaction and new untested
>>>>> accessors.
>>>>> ---
>>>>>  Documentation/arm64/silicon-errata.txt |   2 +
>>>>>  arch/arm64/include/asm/arch_timer.h    |  48 ++++++++++++---
>>>>>  drivers/clocksource/Kconfig            |  10 ++++
>>>>>  drivers/clocksource/arm_arch_timer.c   | 103 +++++++++++++++++++++++++++++++++
>>>>>  4 files changed, 154 insertions(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
>>>>> index ba4b6ac..5778f62 100644
>>>>> --- a/Documentation/arm64/silicon-errata.txt
>>>>> +++ b/Documentation/arm64/silicon-errata.txt
>>>>> @@ -57,3 +57,5 @@ stable kernels.
>>>>>  | Cavium         | ThunderX ITS    | #22375, #24313  | CAVIUM_ERRATUM_22375    |
>>>>>  | Cavium         | ThunderX GICv3  | #23154          | CAVIUM_ERRATUM_23154    |
>>>>>  | Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456    |
>>>>> +|                |                 |                 |                         |
>>>>> +| Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585     |
>>>>> diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
>>>>> index fbe0ca3..70fbad9 100644
>>>>> --- a/arch/arm64/include/asm/arch_timer.h
>>>>> +++ b/arch/arm64/include/asm/arch_timer.h
>>>>> @@ -23,10 +23,34 @@
>>>>>  
>>>>>  #include <linux/bug.h>
>>>>>  #include <linux/init.h>
>>>>> +#include <linux/jump_label.h>
>>>>>  #include <linux/types.h>
>>>>>  
>>>>>  #include <clocksource/arm_arch_timer.h>
>>>>>  
>>>>> +extern struct static_key_false arch_timer_read_ool_enabled;
>>>>> +
>>>>> +#define ARCH_TIMER_REG_READ(reg, func) \
>>>>> +extern u64 func##_ool(void); \
>>>>> +static inline u64 __##func(void) \
>>>>> +{ \
>>>>> +	u64 val; \
>>>>> +	asm volatile("mrs %0, " reg : "=r" (val)); \
>>>>> +	return val; \
>>>>> +} \
>>>>> +static inline u64 _##func(void) \
>>>>> +{ \
>>>>> +	if (IS_ENABLED(CONFIG_FSL_ERRATUM_A008585) && \
>>>>> +	    static_branch_unlikely(&arch_timer_read_ool_enabled)) \
>>>>> +		return func##_ool(); \
>>>>> +	else \
>>>>> +		return __##func(); \
>>>>> +}
>>>>> +
>>>>> +ARCH_TIMER_REG_READ("cntp_tval_el0", arch_timer_get_ptval)
>>>>> +ARCH_TIMER_REG_READ("cntv_tval_el0", arch_timer_get_vtval)
>>>>> +ARCH_TIMER_REG_READ("cntvct_el0", arch_counter_get_cntvct)
>>>>> +
>>>>>  /*
>>>>>   * These register accessors are marked inline so the compiler can
>>>>>   * nicely work out which register we want, and chuck away the rest of
>>>>> @@ -58,6 +82,16 @@ void arch_timer_reg_write_cp15(int access, enum arch_timer_reg reg, u32 val)
>>>>>  	isb();
>>>>>  }
>>>>>  
>>>>> +static __always_inline void arch_timer_cval_write_cp15(int access, u64 val)
>>>>> +{
>>>>> +	if (access == ARCH_TIMER_PHYS_ACCESS)
>>>>> +		asm volatile("msr cntp_cval_el0, %0" : : "r" (val));
>>>>> +	else if (access == ARCH_TIMER_VIRT_ACCESS)
>>>>> +		asm volatile("msr cntv_cval_el0, %0" : : "r" (val));
>>>>> +
>>>>> +	isb();
>>>>> +}
>>>>> +
>>>>>  static __always_inline
>>>>>  u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
>>>>>  {
>>>>> @@ -66,19 +100,19 @@ u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
>>>>>  	if (access == ARCH_TIMER_PHYS_ACCESS) {
>>>>>  		switch (reg) {
>>>>>  		case ARCH_TIMER_REG_CTRL:
>>>>> -			asm volatile("mrs %0,  cntp_ctl_el0" : "=r" (val));
>>>>> +			asm volatile("mrs %0, cntp_ctl_el0" : "=r" (val));
>>>>>  			break;
>>>>>  		case ARCH_TIMER_REG_TVAL:
>>>>> -			asm volatile("mrs %0, cntp_tval_el0" : "=r" (val));
>>>>> +			val = _arch_timer_get_ptval();
>>>>>  			break;
>>>>>  		}
>>>>>  	} else if (access == ARCH_TIMER_VIRT_ACCESS) {
>>>>>  		switch (reg) {
>>>>>  		case ARCH_TIMER_REG_CTRL:
>>>>> -			asm volatile("mrs %0,  cntv_ctl_el0" : "=r" (val));
>>>>> +			asm volatile("mrs %0, cntv_ctl_el0" : "=r" (val));
>>>>>  			break;
>>>>>  		case ARCH_TIMER_REG_TVAL:
>>>>> -			asm volatile("mrs %0, cntv_tval_el0" : "=r" (val));
>>>>> +			val = _arch_timer_get_vtval();
>>>>>  			break;
>>>>>  		}
>>>>>  	}
>>>>> @@ -116,12 +150,8 @@ static inline u64 arch_counter_get_cntpct(void)
>>>>>  
>>>>>  static inline u64 arch_counter_get_cntvct(void)
>>>>>  {
>>>>> -	u64 cval;
>>>>> -
>>>>>  	isb();
>>>>> -	asm volatile("mrs %0, cntvct_el0" : "=r" (cval));
>>>>> -
>>>>> -	return cval;
>>>>> +	return _arch_counter_get_cntvct();
>>>>>  }
>>>>>  
>>>>>  static inline int arch_timer_arch_init(void)
>>>>> diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
>>>>> index c346be6..672ddc3 100644
>>>>> --- a/drivers/clocksource/Kconfig
>>>>> +++ b/drivers/clocksource/Kconfig
>>>>> @@ -207,6 +207,16 @@ config ARM_ARCH_TIMER_EVTSTREAM
>>>>>  	  This must be disabled for hardware validation purposes to detect any
>>>>>  	  hardware anomalies of missing events.
>>>>>  
>>>>> +config FSL_ERRATUM_A008585
>>>>> +	bool "Workaround for Freescale/NXP Erratum A-008585"
>>>>> +	default y
>>>>> +	depends on ARM_ARCH_TIMER && ARM64
>>>>> +	help
>>>>> +	  This option enables a workaround for Freescale/NXP Erratum
>>>>> +	  A-008585 ("ARM generic timer may contain an erroneous
>>>>> +	  value").  The workaround will only be active if the
>>>>> +	  fsl,erratum-a008585 property is found in the timer node.
>>>>> +
>>>>>  config ARM_GLOBAL_TIMER
>>>>>  	bool
>>>>>  	select CLKSRC_OF if OF
>>>>> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
>>>>> index 5152b38..7ead4eb 100644
>>>>> --- a/drivers/clocksource/arm_arch_timer.c
>>>>> +++ b/drivers/clocksource/arm_arch_timer.c
>>>>> @@ -83,6 +83,51 @@ static bool arch_timer_mem_use_virtual;
>>>>>   * Architected system timer support.
>>>>>   */
>>>>>  
>>>>> +#ifdef CONFIG_FSL_ERRATUM_A008585
>>>>> +DEFINE_STATIC_KEY_FALSE(arch_timer_read_ool_enabled);
>>>>> +EXPORT_SYMBOL_GPL(arch_timer_read_ool_enabled);
>>>>> +
>>>>> +/*
>>>>> + * __always_inline is used to ensure that func() is not an actual function
>>>>> + * pointer, which would result in the register accesses potentially being too
>>>>> + * far apart for the loop to work.
>>>>> + *
>>>>> + * The timeout is an arbitrary value well beyond the highest number
>>>>> + * of iterations the loop has been observed to take.
>>>>> + */
>>>>> +static __always_inline u64 fsl_a008585_reread_counter(u64 (*func)(void))
>>>>> +{
>>>>> +	u64 cval_old, cval_new;
>>>>> +	int timeout = 200;
>>>>> +
>>>>> +	do {
>>>>> +		isb();
>>>>> +		cval_old = func();
>>>>> +		cval_new = func();
>>>>> +		timeout--;
>>>>> +	} while (unlikely(cval_old != cval_new) && timeout);
>>>>> +
>>>>> +	WARN_ON_ONCE(!timeout);
>>>>> +	return cval_new;
>>>>> +}
>>>> Hi Scott:
>>>>
>>>> I have test this patch, this solution looks will break the performance a little more than I expected.
>>>> it will have more than 10% that the cval will read again, we could sure that the cval_old always equal to the
>>>> cval_new in the normal circumstances, so I prefer this way:
>>>>
>>>> 	do {
>>>> 		isb();
>>>> 		cval_old = func();
>>>> 		cval_new = func();
>>>> 		timeout--;	
>>>> 	} while (unlikely((cval_new - cval_old) >> 2) && timeout);
>>>
>>> What makes you think that ignoring the two bottom bits is a safe thing
>>> to do? Talking about performance when the HW has such a dramatic bug is
>>> like putting a bigger engine on a car that has no brakes: you just hit
>>> the wall quicker.
>>>
>>> Thanks,
>>>
>>
>> I have a chip which has the same problem like Scott's chip, and I
>> wish to solve this problem in the same way, our chip designer told me
>> that if you got a wrong value from the cntvct_el0, you would not get
>> a wrong value until 8 cycles later, so I could ignoring the lowest 3
>> bits if I reading twice together.
> 
> Is that CPU cycles? Or timer cycles? What guarantees do you have that
> the two reads are *always* done in the right timing window?
> 

The timer counter only use 56 bits in aarch64, my chip would change one of the higher 
bit(55 to 3) to a wrong value when occur bug, so there will be more than 8 cycles between
correct value and wrong value from the timer counter. Maybe Scott's problem is not just like
mine.

>> The key problem is the probability of this bug, my chip has 1/100000
>> chance to met this bug, so use 10% performance to fix this bug looks
>> more expensive.
> 
> You care about performance, I care about correctness. If 10% of your CPU
> is wasted on a correct workaround, tough. Next time, your HW guy will
> spend more time getting it right.
> 
> Anyway, what you are describing is a different bug from what FSL has, so
> don't try and shoehorn your own constraints in another workaround.
> Please implement your own.
> 

OK, In deed I should, thanks.

Ding

> Thanks,
> 
> 	M.
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 3/3] arm64: arch_timer: Work around QorIQ Erratum A-008585
  2016-07-07 12:59                     ` Ding Tianhong
@ 2016-07-07 17:39                         ` Scott Wood
  -1 siblings, 0 replies; 24+ messages in thread
From: Scott Wood @ 2016-07-07 17:39 UTC (permalink / raw)
  To: Ding Tianhong, Marc Zyngier, Catalin Marinas, Will Deacon
  Cc: devicetree-u79uwXL29TY76Z2rM5mHXA, stuart.yoder-3arQi8VN3Tc,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Thu, 2016-07-07 at 20:59 +0800, Ding Tianhong wrote:
> On 2016/7/7 19:51, Marc Zyngier wrote:
> > 
> > On 07/07/16 12:37, Ding Tianhong wrote:
> > > 
> > > On 2016/7/7 17:49, Marc Zyngier wrote:
> > > > 
> > > > What makes you think that ignoring the two bottom bits is a safe thing
> > > > to do? Talking about performance when the HW has such a dramatic bug
> > > > is
> > > > like putting a bigger engine on a car that has no brakes: you just hit
> > > > the wall quicker.
> > > > 
> > > > Thanks,
> > > > 
> > > I have a chip which has the same problem like Scott's chip, and I
> > > wish to solve this problem in the same way, our chip designer told me
> > > that if you got a wrong value from the cntvct_el0, you would not get
> > > a wrong value until 8 cycles later, so I could ignoring the lowest 3
> > > bits if I reading twice together.
> > Is that CPU cycles? Or timer cycles? What guarantees do you have that
> > the two reads are *always* done in the right timing window?
> > 
> The timer counter only use 56 bits in aarch64, my chip would change one of
> the higher 
> bit(55 to 3) to a wrong value when occur bug, so there will be more than 8
> cycles between
> correct value and wrong value from the timer counter. Maybe Scott's problem
> is not just like
> mine.

It's not like yours.  Most errors I saw were time going backwards by 1, 3, or
7 cycles (with occasional larger errors).

-Scott

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v3 3/3] arm64: arch_timer: Work around QorIQ Erratum A-008585
@ 2016-07-07 17:39                         ` Scott Wood
  0 siblings, 0 replies; 24+ messages in thread
From: Scott Wood @ 2016-07-07 17:39 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 2016-07-07 at 20:59 +0800, Ding Tianhong wrote:
> On 2016/7/7 19:51, Marc Zyngier wrote:
> > 
> > On 07/07/16 12:37, Ding Tianhong wrote:
> > > 
> > > On 2016/7/7 17:49, Marc Zyngier wrote:
> > > > 
> > > > What makes you think that ignoring the two bottom bits is a safe thing
> > > > to do? Talking about performance when the HW has such a dramatic bug
> > > > is
> > > > like putting a bigger engine on a car that has no brakes: you just hit
> > > > the wall quicker.
> > > > 
> > > > Thanks,
> > > > 
> > > I have a chip which has the same problem like Scott's chip, and I
> > > wish to solve this problem in the same way, our chip designer told me
> > > that if you got a wrong value from the cntvct_el0, you would not get
> > > a wrong value until 8 cycles later, so I could ignoring the lowest 3
> > > bits if I reading twice together.
> > Is that CPU cycles? Or timer cycles? What guarantees do you have that
> > the two reads are *always* done in the right timing window?
> > 
> The timer counter only use 56 bits in aarch64, my chip would change one of
> the higher?
> bit(55 to 3) to a wrong value when occur bug, so there will be more than 8
> cycles between
> correct value and wrong value from the timer counter. Maybe Scott's problem
> is not just like
> mine.

It's not like yours. ?Most errors I saw were time going backwards by 1, 3, or
7 cycles (with occasional larger errors).

-Scott

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 3/3] arm64: arch_timer: Work around QorIQ Erratum A-008585
  2016-07-07 17:39                         ` Scott Wood
@ 2016-07-08  0:51                             ` Ding Tianhong
  -1 siblings, 0 replies; 24+ messages in thread
From: Ding Tianhong @ 2016-07-08  0:51 UTC (permalink / raw)
  To: Scott Wood, Marc Zyngier, Catalin Marinas, Will Deacon
  Cc: devicetree-u79uwXL29TY76Z2rM5mHXA, stuart.yoder-3arQi8VN3Tc,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On 2016/7/8 1:39, Scott Wood wrote:
> On Thu, 2016-07-07 at 20:59 +0800, Ding Tianhong wrote:
>> On 2016/7/7 19:51, Marc Zyngier wrote:
>>>
>>> On 07/07/16 12:37, Ding Tianhong wrote:
>>>>
>>>> On 2016/7/7 17:49, Marc Zyngier wrote:
>>>>>
>>>>> What makes you think that ignoring the two bottom bits is a safe thing
>>>>> to do? Talking about performance when the HW has such a dramatic bug
>>>>> is
>>>>> like putting a bigger engine on a car that has no brakes: you just hit
>>>>> the wall quicker.
>>>>>
>>>>> Thanks,
>>>>>
>>>> I have a chip which has the same problem like Scott's chip, and I
>>>> wish to solve this problem in the same way, our chip designer told me
>>>> that if you got a wrong value from the cntvct_el0, you would not get
>>>> a wrong value until 8 cycles later, so I could ignoring the lowest 3
>>>> bits if I reading twice together.
>>> Is that CPU cycles? Or timer cycles? What guarantees do you have that
>>> the two reads are *always* done in the right timing window?
>>>
>> The timer counter only use 56 bits in aarch64, my chip would change one of
>> the higher 
>> bit(55 to 3) to a wrong value when occur bug, so there will be more than 8
>> cycles between
>> correct value and wrong value from the timer counter. Maybe Scott's problem
>> is not just like
>> mine.
> 
> It's not like yours.  Most errors I saw were time going backwards by 1, 3, or
> 7 cycles (with occasional larger errors).
> 
Looks more series, agree with your solution, thanks。

Thanks.
Ding
> -Scott
> 
> 
> .
> 


--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v3 3/3] arm64: arch_timer: Work around QorIQ Erratum A-008585
@ 2016-07-08  0:51                             ` Ding Tianhong
  0 siblings, 0 replies; 24+ messages in thread
From: Ding Tianhong @ 2016-07-08  0:51 UTC (permalink / raw)
  To: linux-arm-kernel

On 2016/7/8 1:39, Scott Wood wrote:
> On Thu, 2016-07-07 at 20:59 +0800, Ding Tianhong wrote:
>> On 2016/7/7 19:51, Marc Zyngier wrote:
>>>
>>> On 07/07/16 12:37, Ding Tianhong wrote:
>>>>
>>>> On 2016/7/7 17:49, Marc Zyngier wrote:
>>>>>
>>>>> What makes you think that ignoring the two bottom bits is a safe thing
>>>>> to do? Talking about performance when the HW has such a dramatic bug
>>>>> is
>>>>> like putting a bigger engine on a car that has no brakes: you just hit
>>>>> the wall quicker.
>>>>>
>>>>> Thanks,
>>>>>
>>>> I have a chip which has the same problem like Scott's chip, and I
>>>> wish to solve this problem in the same way, our chip designer told me
>>>> that if you got a wrong value from the cntvct_el0, you would not get
>>>> a wrong value until 8 cycles later, so I could ignoring the lowest 3
>>>> bits if I reading twice together.
>>> Is that CPU cycles? Or timer cycles? What guarantees do you have that
>>> the two reads are *always* done in the right timing window?
>>>
>> The timer counter only use 56 bits in aarch64, my chip would change one of
>> the higher 
>> bit(55 to 3) to a wrong value when occur bug, so there will be more than 8
>> cycles between
>> correct value and wrong value from the timer counter. Maybe Scott's problem
>> is not just like
>> mine.
> 
> It's not like yours.  Most errors I saw were time going backwards by 1, 3, or
> 7 cycles (with occasional larger errors).
> 
Looks more series, agree with your solution, thanks?

Thanks.
Ding
> -Scott
> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2016-07-08  0:51 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-01 22:41 [PATCH v3 1/3] arm64: arch_timer: Add device tree binding for A-008585 erratum Scott Wood
2016-07-01 22:41 ` Scott Wood
     [not found] ` <1467412897-15220-1-git-send-email-oss-fOR+EgIDQEHk1uMJSBkQmQ@public.gmane.org>
2016-07-01 22:41   ` [PATCH v3 2/3] arm64: dts: Add timer erratum property for LS2080A and LS1043A Scott Wood
2016-07-01 22:41     ` Scott Wood
2016-07-01 22:41   ` [PATCH v3 3/3] arm64: arch_timer: Work around QorIQ Erratum A-008585 Scott Wood
2016-07-01 22:41     ` Scott Wood
     [not found]     ` <1467412897-15220-3-git-send-email-oss-fOR+EgIDQEHk1uMJSBkQmQ@public.gmane.org>
2016-07-04  9:58       ` Will Deacon
2016-07-04  9:58         ` Will Deacon
2016-07-07  9:34     ` Ding Tianhong
2016-07-07  9:34       ` Ding Tianhong
     [not found]       ` <577E2226.3020902-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2016-07-07  9:49         ` Marc Zyngier
2016-07-07  9:49           ` Marc Zyngier
     [not found]           ` <577E25A4.2010800-5wv7dgnIgG8@public.gmane.org>
2016-07-07 11:37             ` Ding Tianhong
2016-07-07 11:37               ` Ding Tianhong
     [not found]               ` <577E3EF2.9080105-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2016-07-07 11:51                 ` Marc Zyngier
2016-07-07 11:51                   ` Marc Zyngier
2016-07-07 12:59                   ` Ding Tianhong
2016-07-07 12:59                     ` Ding Tianhong
     [not found]                     ` <577E524F.3030403-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2016-07-07 17:39                       ` Scott Wood
2016-07-07 17:39                         ` Scott Wood
     [not found]                         ` <1467913182.32358.68.camel-fOR+EgIDQEHk1uMJSBkQmQ@public.gmane.org>
2016-07-08  0:51                           ` Ding Tianhong
2016-07-08  0:51                             ` Ding Tianhong
2016-07-05 15:43   ` [PATCH v3 1/3] arm64: arch_timer: Add device tree binding for A-008585 erratum Rob Herring
2016-07-05 15:43     ` Rob Herring

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.