* [PATCH v3 0/3] Exynos MCT udelay, MCT cleanup, MCT to 32-bits
From: Doug Anderson @ 2014-06-20 17:47 UTC
  To: Daniel Lezcano, Kukjin Kim, Tomasz Figa
  Cc: Vincent Guittot, Chirantan Ekbote, David Riley, olof,
	linux-samsung-soc, Amit Daniel Kachhap, javier.martinez,
	Doug Anderson, linux-kernel, tglx, linux-arm-kernel

This is a series of 3 patches related to the exynos MCT (multi core
timer).  The first allows MCT to function as a udelay() timer which
fixes broken udelay on 5400, 5800, and even (to a lesser extent) on
5250.  The second is some general cleanup.  The third moves MCT to
32-bits where possible to give us a nice speedup.

The first probably ought to be destined for 3.16 as a bugfix whereas
the others could land in a future kernel release.

This series is based on the patch "clocksource: exynos_mct: Fix ftrace".

With this series we can drop the following patches I previously submitted:
- clocksource: exynos_mct: cache mct upper count
- clocksource: exynos_mct: Optimize register reads with ldmia

Changes in v3:
- Back to exynos_frc_read for now until 32/64 is resolved.
- Now returns cycles_t which matches arch/arm/include/asm/timex.h.
- Rebased.
- Moved registration to its own function.
- __raw_readl / __raw_writel patch new for version 3
- Now 32-bit version instead of ldmia version

Changes in v2:
- Added #defines for ARM and ARM64 as pointed out by Doug Anderson.

Amit Daniel Kachhap (1):
  clocksource: exynos_mct: Register the timer for stable udelay

Doug Anderson (2):
  clocksource: exynos_mct: __raw_readl/__raw_writel =>
    readl_relaxed/writel_relaxed
  clocksource: exynos_mct: Only use 32-bits where possible

 drivers/clocksource/Kconfig      |  1 +
 drivers/clocksource/exynos_mct.c | 72 ++++++++++++++++++++++++++++++----------
 2 files changed, 55 insertions(+), 18 deletions(-)

-- 
2.0.0.526.g5318336


* [PATCH v3 1/3] clocksource: exynos_mct: Register the timer for stable udelay
From: Doug Anderson @ 2014-06-20 17:47 UTC
  To: Daniel Lezcano, Kukjin Kim, Tomasz Figa
  Cc: Vincent Guittot, Chirantan Ekbote, David Riley, olof,
	linux-samsung-soc, Amit Daniel Kachhap, javier.martinez,
	Doug Anderson, tglx, linux-kernel, linux-arm-kernel

From: Amit Daniel Kachhap <amit.daniel@samsung.com>

This patch registers the exynos MCT clocksource as the current delay
timer, since it has a constant clock rate. This gives correct udelay()
timing on exynos platforms and avoids relying on a calibrated
loops_per_jiffy. This change has been tested on an exynos5420-based
board and udelay is very close to expected.

Without this patch udelay() on exynos5400 / exynos5800 is wildly
inaccurate due to big.LITTLE not adjusting loops_per_jiffy correctly.
Also without this patch udelay() on exynos5250 can be inaccurate
during transitions between frequencies < 800 MHz (you'll go 200 MHz ->
800 MHz -> 300 MHz and will run at 800 MHz for a time with the wrong
loops_per_jiffy).

[dianders: reworked and created version 3]

Signed-off-by: Amit Daniel Kachhap <amit.daniel@samsung.com>
Signed-off-by: Doug Anderson <dianders@chromium.org>
---
Changes in v3:
- Back to exynos_frc_read for now until 32/64 is resolved.
- Now returns cycles_t which matches arch/arm/include/asm/timex.h.
- Rebased.
- Moved registration to its own function.

Changes in v2:
- Added #defines for ARM and ARM64 as pointed out by Doug Anderson.

 drivers/clocksource/exynos_mct.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/clocksource/exynos_mct.c b/drivers/clocksource/exynos_mct.c
index 5ce99c0..ab51bf20a 100644
--- a/drivers/clocksource/exynos_mct.c
+++ b/drivers/clocksource/exynos_mct.c
@@ -200,10 +200,21 @@ static u64 notrace exynos4_read_sched_clock(void)
 	return _exynos4_frc_read();
 }
 
+static struct delay_timer exynos4_delay_timer;
+
+static cycles_t exynos4_read_current_timer(void)
+{
+	return _exynos4_frc_read();
+}
+
 static void __init exynos4_clocksource_init(void)
 {
 	exynos4_mct_frc_start();
 
+	exynos4_delay_timer.read_current_timer = &exynos4_read_current_timer;
+	exynos4_delay_timer.freq = clk_rate;
+	register_current_timer_delay(&exynos4_delay_timer);
+
 	if (clocksource_register_hz(&mct_frc, clk_rate))
 		panic("%s: can't register clocksource\n", mct_frc.name);
 
-- 
2.0.0.526.g5318336


* [PATCH v3 2/3] clocksource: exynos_mct: __raw_readl/__raw_writel => readl_relaxed/writel_relaxed
From: Doug Anderson @ 2014-06-20 17:47 UTC
  To: Daniel Lezcano, Kukjin Kim, Tomasz Figa
  Cc: Vincent Guittot, Chirantan Ekbote, David Riley, olof,
	linux-samsung-soc, Amit Daniel Kachhap, javier.martinez,
	Doug Anderson, tglx, linux-kernel, linux-arm-kernel

Using the __raw accessors is discouraged.  Update the file to
consistently use readl_relaxed() / writel_relaxed() instead.

Signed-off-by: Doug Anderson <dianders@chromium.org>
---
Changes in v3:
- __raw_readl / __raw_writel patch new for version 3

 drivers/clocksource/exynos_mct.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/clocksource/exynos_mct.c b/drivers/clocksource/exynos_mct.c
index ab51bf20a..2df03e2 100644
--- a/drivers/clocksource/exynos_mct.c
+++ b/drivers/clocksource/exynos_mct.c
@@ -94,7 +94,7 @@ static void exynos4_mct_write(unsigned int value, unsigned long offset)
 	u32 mask;
 	u32 i;
 
-	__raw_writel(value, reg_base + offset);
+	writel_relaxed(value, reg_base + offset);
 
 	if (likely(offset >= EXYNOS4_MCT_L_BASE(0))) {
 		stat_addr = (offset & ~EXYNOS4_MCT_L_MASK) + MCT_L_WSTAT_OFFSET;
@@ -144,8 +144,8 @@ static void exynos4_mct_write(unsigned int value, unsigned long offset)
 
 	/* Wait maximum 1 ms until written values are applied */
 	for (i = 0; i < loops_per_jiffy / 1000 * HZ; i++)
-		if (__raw_readl(reg_base + stat_addr) & mask) {
-			__raw_writel(mask, reg_base + stat_addr);
+		if (readl_relaxed(reg_base + stat_addr) & mask) {
+			writel_relaxed(mask, reg_base + stat_addr);
 			return;
 		}
 
@@ -157,7 +157,7 @@ static void exynos4_mct_frc_start(void)
 {
 	u32 reg;
 
-	reg = __raw_readl(reg_base + EXYNOS4_MCT_G_TCON);
+	reg = readl_relaxed(reg_base + EXYNOS4_MCT_G_TCON);
 	reg |= MCT_G_TCON_START;
 	exynos4_mct_write(reg, EXYNOS4_MCT_G_TCON);
 }
@@ -165,12 +165,12 @@ static void exynos4_mct_frc_start(void)
 static cycle_t notrace _exynos4_frc_read(void)
 {
 	unsigned int lo, hi;
-	u32 hi2 = __raw_readl(reg_base + EXYNOS4_MCT_G_CNT_U);
+	u32 hi2 = readl_relaxed(reg_base + EXYNOS4_MCT_G_CNT_U);
 
 	do {
 		hi = hi2;
-		lo = __raw_readl(reg_base + EXYNOS4_MCT_G_CNT_L);
-		hi2 = __raw_readl(reg_base + EXYNOS4_MCT_G_CNT_U);
+		lo = readl_relaxed(reg_base + EXYNOS4_MCT_G_CNT_L);
+		hi2 = readl_relaxed(reg_base + EXYNOS4_MCT_G_CNT_U);
 	} while (hi != hi2);
 
 	return ((cycle_t)hi << 32) | lo;
@@ -225,7 +225,7 @@ static void exynos4_mct_comp0_stop(void)
 {
 	unsigned int tcon;
 
-	tcon = __raw_readl(reg_base + EXYNOS4_MCT_G_TCON);
+	tcon = readl_relaxed(reg_base + EXYNOS4_MCT_G_TCON);
 	tcon &= ~(MCT_G_TCON_COMP0_ENABLE | MCT_G_TCON_COMP0_AUTO_INC);
 
 	exynos4_mct_write(tcon, EXYNOS4_MCT_G_TCON);
@@ -238,7 +238,7 @@ static void exynos4_mct_comp0_start(enum clock_event_mode mode,
 	unsigned int tcon;
 	cycle_t comp_cycle;
 
-	tcon = __raw_readl(reg_base + EXYNOS4_MCT_G_TCON);
+	tcon = readl_relaxed(reg_base + EXYNOS4_MCT_G_TCON);
 
 	if (mode == CLOCK_EVT_MODE_PERIODIC) {
 		tcon |= MCT_G_TCON_COMP0_AUTO_INC;
@@ -327,7 +327,7 @@ static void exynos4_mct_tick_stop(struct mct_clock_event_device *mevt)
 	unsigned long mask = MCT_L_TCON_INT_START | MCT_L_TCON_TIMER_START;
 	unsigned long offset = mevt->base + MCT_L_TCON_OFFSET;
 
-	tmp = __raw_readl(reg_base + offset);
+	tmp = readl_relaxed(reg_base + offset);
 	if (tmp & mask) {
 		tmp &= ~mask;
 		exynos4_mct_write(tmp, offset);
@@ -349,7 +349,7 @@ static void exynos4_mct_tick_start(unsigned long cycles,
 	/* enable MCT tick interrupt */
 	exynos4_mct_write(0x1, mevt->base + MCT_L_INT_ENB_OFFSET);
 
-	tmp = __raw_readl(reg_base + mevt->base + MCT_L_TCON_OFFSET);
+	tmp = readl_relaxed(reg_base + mevt->base + MCT_L_TCON_OFFSET);
 	tmp |= MCT_L_TCON_INT_START | MCT_L_TCON_TIMER_START |
 	       MCT_L_TCON_INTERVAL_MODE;
 	exynos4_mct_write(tmp, mevt->base + MCT_L_TCON_OFFSET);
@@ -401,7 +401,7 @@ static int exynos4_mct_tick_clear(struct mct_clock_event_device *mevt)
 		exynos4_mct_tick_stop(mevt);
 
 	/* Clear the MCT tick interrupt */
-	if (__raw_readl(reg_base + mevt->base + MCT_L_INT_CSTAT_OFFSET) & 1) {
+	if (readl_relaxed(reg_base + mevt->base + MCT_L_INT_CSTAT_OFFSET) & 1) {
 		exynos4_mct_write(0x1, mevt->base + MCT_L_INT_CSTAT_OFFSET);
 		return 1;
 	} else {
-- 
2.0.0.526.g5318336


* [PATCH v3 3/3] clocksource: exynos_mct: Only use 32-bits where possible
From: Doug Anderson @ 2014-06-20 17:47 UTC
  To: Daniel Lezcano, Kukjin Kim, Tomasz Figa
  Cc: Vincent Guittot, Chirantan Ekbote, David Riley, olof,
	linux-samsung-soc, Amit Daniel Kachhap, javier.martinez,
	Doug Anderson, tglx, linux-kernel, linux-arm-kernel

The MCT has a nice 64-bit counter.  That means that we _can_ register
as a 64-bit clocksource and sched_clock.  ...but that doesn't mean we
should.

The 64-bit counter is read by reading two 32-bit registers.  That
means reading needs to be something like:
- Read upper half
- Read lower half
- Read upper half and confirm that it hasn't changed.

That wouldn't be terrible, but:
- The MCT isn't very fast to access (hundreds of nanoseconds).
- The clocksource is queried _all the time_.

In whole-system profiles of real workloads on ChromeOS, we've seen
exynos_frc_read() taking 2% or more of CPU time even after optimizing
the 3 reads above to 2 (see below).

The MCT is clocked at ~24MHz on all known systems.  That means that
the lower 32-bit half of the counter rolls over every ~178 seconds.
This inspired an optimization in ChromeOS to cache the upper half
between calls, reducing 3 reads to 2.  ...but we can do better!  A
32-bit counter that rolls over every ~178 seconds is more than
sufficient for Linux.  Let's just use the lower half of the MCT.

Times on 5420 to do 1000000 gettimeofday() calls from userspace:
* Original code:                      1323852 us
* ChromeOS cache upper half:          1173084 us
* ChromeOS + ldmia to optimize:       1045674 us
* Use lower 32-bit only (this code):  1014429 us

As you can see, the time used doesn't increase linearly with the
number of reads and we can make 64-bit work almost as fast as 32-bit
with a bit of assembly code.  But since there's no real gain for
64-bit, let's go with the simplest and fastest implementation.

Note: with this change roughly half the time for gettimeofday() is
spent in exynos_frc_read().  The rest is timer / system call overhead.

Also note: this patch disables the use of the MCT on ARM64 systems
until we've sorted out how to make "cycles_t" always 32-bit.  Really
ARM64 systems should be using arch timers anyway.

Signed-off-by: Doug Anderson <dianders@chromium.org>
---
Changes in v3:
- Now 32-bit version instead of ldmia version

Changes in v2: None

 drivers/clocksource/Kconfig      |  1 +
 drivers/clocksource/exynos_mct.c | 39 ++++++++++++++++++++++++++++++++-------
 2 files changed, 33 insertions(+), 7 deletions(-)

diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
index 065131c..a7aeee8 100644
--- a/drivers/clocksource/Kconfig
+++ b/drivers/clocksource/Kconfig
@@ -125,6 +125,7 @@ config CLKSRC_METAG_GENERIC
 
 config CLKSRC_EXYNOS_MCT
 	def_bool y if ARCH_EXYNOS
+	depends on !ARM64
 	help
 	  Support for Multi Core Timer controller on Exynos SoCs.
 
diff --git a/drivers/clocksource/exynos_mct.c b/drivers/clocksource/exynos_mct.c
index 2df03e2..9403061 100644
--- a/drivers/clocksource/exynos_mct.c
+++ b/drivers/clocksource/exynos_mct.c
@@ -162,7 +162,17 @@ static void exynos4_mct_frc_start(void)
 	exynos4_mct_write(reg, EXYNOS4_MCT_G_TCON);
 }
 
-static cycle_t notrace _exynos4_frc_read(void)
+/**
+ * exynos4_read_count_64 - Read all 64-bits of the global counter
+ *
+ * This will read all 64-bits of the global counter taking care to make sure
+ * that the upper and lower half match.  Note that reading the MCT can be quite
+ * slow (hundreds of nanoseconds) so you should use the 32-bit (lower half
+ * only) version when possible.
+ *
+ * Returns the number of cycles in the global counter.
+ */
+static u64 exynos4_read_count_64(void)
 {
 	unsigned int lo, hi;
 	u32 hi2 = readl_relaxed(reg_base + EXYNOS4_MCT_G_CNT_U);
@@ -176,9 +186,22 @@ static cycle_t notrace _exynos4_frc_read(void)
 	return ((cycle_t)hi << 32) | lo;
 }
 
+/**
+ * exynos4_read_count_32 - Read the lower 32-bits of the global counter
+ *
+ * This will read just the lower 32-bits of the global counter.  This is marked
+ * as notrace so it can be used by the scheduler clock.
+ *
+ * Returns the number of cycles in the global counter (lower 32 bits).
+ */
+static u32 notrace exynos4_read_count_32(void)
+{
+	return readl_relaxed(reg_base + EXYNOS4_MCT_G_CNT_L);
+}
+
 static cycle_t exynos4_frc_read(struct clocksource *cs)
 {
-	return _exynos4_frc_read();
+	return exynos4_read_count_32();
 }
 
 static void exynos4_frc_resume(struct clocksource *cs)
@@ -190,21 +213,23 @@ struct clocksource mct_frc = {
 	.name		= "mct-frc",
 	.rating		= 400,
 	.read		= exynos4_frc_read,
-	.mask		= CLOCKSOURCE_MASK(64),
+	.mask		= CLOCKSOURCE_MASK(32),
 	.flags		= CLOCK_SOURCE_IS_CONTINUOUS,
 	.resume		= exynos4_frc_resume,
 };
 
 static u64 notrace exynos4_read_sched_clock(void)
 {
-	return _exynos4_frc_read();
+	return exynos4_read_count_32();
 }
 
 static struct delay_timer exynos4_delay_timer;
 
 static cycles_t exynos4_read_current_timer(void)
 {
-	return _exynos4_frc_read();
+	BUILD_BUG_ON_MSG(sizeof(cycles_t) != sizeof(u32),
+			 "cycles_t needs to move to 32-bit for ARM64 usage");
+	return exynos4_read_count_32();
 }
 
 static void __init exynos4_clocksource_init(void)
@@ -218,7 +243,7 @@ static void __init exynos4_clocksource_init(void)
 	if (clocksource_register_hz(&mct_frc, clk_rate))
 		panic("%s: can't register clocksource\n", mct_frc.name);
 
-	sched_clock_register(exynos4_read_sched_clock, 64, clk_rate);
+	sched_clock_register(exynos4_read_sched_clock, 32, clk_rate);
 }
 
 static void exynos4_mct_comp0_stop(void)
@@ -245,7 +270,7 @@ static void exynos4_mct_comp0_start(enum clock_event_mode mode,
 		exynos4_mct_write(cycles, EXYNOS4_MCT_G_COMP0_ADD_INCR);
 	}
 
-	comp_cycle = exynos4_frc_read(&mct_frc) + cycles;
+	comp_cycle = exynos4_read_count_64() + cycles;
 	exynos4_mct_write((u32)comp_cycle, EXYNOS4_MCT_G_COMP0_L);
 	exynos4_mct_write((u32)(comp_cycle >> 32), EXYNOS4_MCT_G_COMP0_U);
 
-- 
2.0.0.526.g5318336


* Re: [PATCH v3 3/3] clocksource: exynos_mct: Only use 32-bits where possible
From: Vincent Guittot @ 2014-06-23  9:53 UTC
  To: Doug Anderson
  Cc: Daniel Lezcano, Kukjin Kim, Tomasz Figa, Chirantan Ekbote,
	David Riley, Olof Johansson, linux-samsung-soc,
	Amit Daniel Kachhap, javier.martinez, Thomas Gleixner,
	linux-kernel, LAK

Hi Doug,

Acked-by: Vincent Guittot <vincent.guittot@linaro.org>

Vincent

On 20 June 2014 19:47, Doug Anderson <dianders@chromium.org> wrote:
> The MCT has a nice 64-bit counter.  That means that we _can_ register
> as a 64-bit clocksource and sched_clock.  ...but that doesn't mean we
> should.
>
> The 64-bit counter is read by reading two 32-bit registers.  That
> means reading needs to be something like:
> - Read upper half
> - Read lower half
> - Read upper half and confirm that it hasn't changed.
>
> That wouldn't be terrible, but:
> - The MCT isn't very fast to access (hundreds of nanoseconds).
> - The clocksource is queried _all the time_.
>
> In total system profiles of real workloads on ChromeOS, we've seen
> exynos_frc_read() taking 2% or more of CPU time even after optimizing
> the 3 reads above to 2 (see below).
>
> The MCT is clocked at ~24MHz on all known systems.  That means that
> the 32-bit half of the counter rolls over every ~178 seconds.  This
> inspired an optimization in ChromeOS to cache the upper half between
> calls, moving 3 reads to 2.  ...but we can do better!  Having a 32-bit
> timer that flips every 178 seconds is more than sufficient for Linux.
> Let's just use the lower half of the MCT.
>
> Times on 5420 to do 1000000 gettimeofday() calls from userspace:
> * Original code:                      1323852 us
> * ChromeOS cache upper half:          1173084 us
> * ChromeOS + ldmia to optimize:       1045674 us
> * Use lower 32-bit only (this code):  1014429 us
>
> As you can see, the time used doesn't increase linearly with the
> number of reads and we can make 64-bit work almost as fast as 32-bit
> with a bit of assembly code.  But since there's no real gain for
> 64-bit, let's go with the simplest and fastest implementation.
>
> Note: with this change roughly half the time for gettimeofday() is
> spent in exynos_frc_read().  The rest is timer / system call overhead.
>
> Also note: this patch disables the use of the MCT on ARM64 systems
> until we've sorted out how to make "cycles_t" always 32-bit.  Really
> ARM64 systems should be using arch timers anyway.
>
> Signed-off-by: Doug Anderson <dianders@chromium.org>
> ---
> Changes in v3:
> - Now 32-bit version instead of ldmia version
>
> Changes in v2: None
>
>  drivers/clocksource/Kconfig      |  1 +
>  drivers/clocksource/exynos_mct.c | 39 ++++++++++++++++++++++++++++++++-------
>  2 files changed, 33 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
> index 065131c..a7aeee8 100644
> --- a/drivers/clocksource/Kconfig
> +++ b/drivers/clocksource/Kconfig
> @@ -125,6 +125,7 @@ config CLKSRC_METAG_GENERIC
>
>  config CLKSRC_EXYNOS_MCT
>         def_bool y if ARCH_EXYNOS
> +       depends on !ARM64
>         help
>           Support for Multi Core Timer controller on Exynos SoCs.
>
> diff --git a/drivers/clocksource/exynos_mct.c b/drivers/clocksource/exynos_mct.c
> index 2df03e2..9403061 100644
> --- a/drivers/clocksource/exynos_mct.c
> +++ b/drivers/clocksource/exynos_mct.c
> @@ -162,7 +162,17 @@ static void exynos4_mct_frc_start(void)
>         exynos4_mct_write(reg, EXYNOS4_MCT_G_TCON);
>  }
>
> -static cycle_t notrace _exynos4_frc_read(void)
> +/**
> + * exynos4_read_count_64 - Read all 64-bits of the global counter
> + *
> + * This will read all 64-bits of the global counter taking care to make sure
> + * that the upper and lower half match.  Note that reading the MCT can be quite
> + * slow (hundreds of nanoseconds) so you should use the 32-bit (lower half
> + * only) version when possible.
> + *
> + * Returns the number of cycles in the global counter.
> + */
> +static u64 exynos4_read_count_64(void)
>  {
>         unsigned int lo, hi;
>         u32 hi2 = readl_relaxed(reg_base + EXYNOS4_MCT_G_CNT_U);
> @@ -176,9 +186,22 @@ static cycle_t notrace _exynos4_frc_read(void)
>         return ((cycle_t)hi << 32) | lo;
>  }
>
> +/**
> + * exynos4_read_count_32 - Read the lower 32-bits of the global counter
> + *
> + * This will read just the lower 32-bits of the global counter.  This is marked
> + * as notrace so it can be used by the scheduler clock.
> + *
> + * Returns the number of cycles in the global counter (lower 32 bits).
> + */
> +static u32 notrace exynos4_read_count_32(void)
> +{
> +       return readl_relaxed(reg_base + EXYNOS4_MCT_G_CNT_L);
> +}
> +
>  static cycle_t exynos4_frc_read(struct clocksource *cs)
>  {
> -       return _exynos4_frc_read();
> +       return exynos4_read_count_32();
>  }
>
>  static void exynos4_frc_resume(struct clocksource *cs)
> @@ -190,21 +213,23 @@ struct clocksource mct_frc = {
>         .name           = "mct-frc",
>         .rating         = 400,
>         .read           = exynos4_frc_read,
> -       .mask           = CLOCKSOURCE_MASK(64),
> +       .mask           = CLOCKSOURCE_MASK(32),
>         .flags          = CLOCK_SOURCE_IS_CONTINUOUS,
>         .resume         = exynos4_frc_resume,
>  };
>
>  static u64 notrace exynos4_read_sched_clock(void)
>  {
> -       return _exynos4_frc_read();
> +       return exynos4_read_count_32();
>  }
>
>  static struct delay_timer exynos4_delay_timer;
>
>  static cycles_t exynos4_read_current_timer(void)
>  {
> -       return _exynos4_frc_read();
> +       BUILD_BUG_ON_MSG(sizeof(cycles_t) != sizeof(u32),
> +                        "cycles_t needs to move to 32-bit for ARM64 usage");
> +       return exynos4_read_count_32();
>  }
>
>  static void __init exynos4_clocksource_init(void)
> @@ -218,7 +243,7 @@ static void __init exynos4_clocksource_init(void)
>         if (clocksource_register_hz(&mct_frc, clk_rate))
>                 panic("%s: can't register clocksource\n", mct_frc.name);
>
> -       sched_clock_register(exynos4_read_sched_clock, 64, clk_rate);
> +       sched_clock_register(exynos4_read_sched_clock, 32, clk_rate);
>  }
>
>  static void exynos4_mct_comp0_stop(void)
> @@ -245,7 +270,7 @@ static void exynos4_mct_comp0_start(enum clock_event_mode mode,
>                 exynos4_mct_write(cycles, EXYNOS4_MCT_G_COMP0_ADD_INCR);
>         }
>
> -       comp_cycle = exynos4_frc_read(&mct_frc) + cycles;
> +       comp_cycle = exynos4_read_count_64() + cycles;
>         exynos4_mct_write((u32)comp_cycle, EXYNOS4_MCT_G_COMP0_L);
>         exynos4_mct_write((u32)(comp_cycle >> 32), EXYNOS4_MCT_G_COMP0_U);
>
> --
> 2.0.0.526.g5318336
>
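Registering with CLOCKSOURCE_MASK(32) and a 32-bit sched_clock works because consumers use the difference between two raw reads masked to the counter width, and they sample often enough (well within the ~178 s wrap period) that at most one wrap happens between samples. The user-space sketch below illustrates that idea under those assumptions. `SKETCH_MASK()` is modeled on, but is not, the kernel's CLOCKSOURCE_MASK(), and `struct ext_counter` / `ext_update()` are hypothetical illustrations, not the kernel's timekeeping or sched_clock code.

```c
#include <stdint.h>

/* Mask covering the low `bits` bits (valid for 1 <= bits <= 64);
 * modeled on CLOCKSOURCE_MASK(). */
#define SKETCH_MASK(bits) (~(uint64_t)0 >> (64 - (bits)))

/* Cycles elapsed between two raw reads of a counter `mask` wide.
 * Unsigned subtraction plus masking absorbs a single wraparound. */
static uint64_t cycles_delta(uint64_t now, uint64_t last, uint64_t mask)
{
	return (now - last) & mask;
}

/* Hypothetical software extension of a 32-bit counter to 64 bits. */
struct ext_counter {
	uint32_t last_raw;	/* previous raw 32-bit sample */
	uint64_t total;		/* accumulated 64-bit cycle count */
};

/* Must be called at least once per wrap period (2^32 cycles),
 * otherwise whole wraps are lost from `total`. */
static uint64_t ext_update(struct ext_counter *c, uint32_t raw)
{
	c->total += cycles_delta(raw, c->last_raw, SKETCH_MASK(32));
	c->last_raw = raw;
	return c->total;
}
```

A read of 0x10 following 0xFFFFFFF0 yields a delta of 0x20 cycles even though the raw value went down. Note that the comparator path in exynos4_mct_comp0_start() still uses exynos4_read_count_64(), since it programs both the lower and upper comparator registers.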

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 0/3] Exynos MCT udelay, MCT cleanup, MCT to 32-bits
  2014-06-20 17:47 ` Doug Anderson
@ 2014-07-04 21:44   ` Kukjin Kim
  -1 siblings, 0 replies; 12+ messages in thread
From: Kukjin Kim @ 2014-07-04 21:44 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Daniel Lezcano, Kukjin Kim, Tomasz Figa, linux-samsung-soc,
	David Riley, Chirantan Ekbote, linux-kernel, Amit Daniel Kachhap,
	olof, Vincent Guittot, tglx, javier.martinez, linux-arm-kernel

On 06/21/14 02:47, Doug Anderson wrote:
> This is a series of 3 patches related to the exynos MCT (multi core
> timer).  The first allows MCT to function as a udelay() timer which
> fixes broken udelay on 5400, 5800, and even (to a lesser extent) on
> 5250.  The second is some general cleanup.  The third moves MCT to
> 32-bits where possible to give us a nice speedup.
>
> The first probably ought to be destined for 3.16 as a bugfix whereas
> the others could land in a future kernel release.
>
> This series is based on (clocksource: exynos_mct: Fix ftrace).
>
> With this series we can drop the patches I submitted:
> - clocksource: exynos_mct: cache mct upper count
> - clocksource: exynos_mct: Optimize register reads with ldmia
>
> Changes in v3:
> - Back to exynos_frc_read for now until 32/64 is resolved.
> - Now returns cycles_t which matches arch/arm/include/asm/timex.h.
> - Rebased.
> - Moved registration to its own function.
> - __raw_readl / __raw_writel patch new for version 3
> - Now 32-bit version instead of ldmia version
>
> Changes in v2:
> - Added #defines for ARM and ARM64 as pointed out by Doug Anderson.
>
> Amit Daniel Kachhap (1):
>    clocksource: exynos_mct: Register the timer for stable udelay
>
> Doug Anderson (2):
>    clocksource: exynos_mct: __raw_readl/__raw_writel =>
>      readl_relaxed/writel_relaxed
>    clocksource: exynos_mct: Only use 32-bits where possible
>
>   drivers/clocksource/Kconfig      |  1 +
>   drivers/clocksource/exynos_mct.c | 72 ++++++++++++++++++++++++++++++----------
>   2 files changed, 55 insertions(+), 18 deletions(-)
>

Sorry for picking up this series late... Looks good to me; applied,
including the previous 'fix ftrace' patch.

Thanks,
Kukjin

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2014-07-04 21:44 UTC | newest]

Thread overview: 12+ messages
2014-06-20 17:47 [PATCH v3 0/3] Exynos MCT udelay, MCT cleanup, MCT to 32-bits Doug Anderson
2014-06-20 17:47 ` Doug Anderson
2014-06-20 17:47 ` [PATCH v3 1/3] clocksource: exynos_mct: Register the timer for stable udelay Doug Anderson
2014-06-20 17:47   ` Doug Anderson
2014-06-20 17:47 ` [PATCH v3 2/3] clocksource: exynos_mct: __raw_readl/__raw_writel => readl_relaxed/writel_relaxed Doug Anderson
2014-06-20 17:47   ` Doug Anderson
2014-06-20 17:47 ` [PATCH v3 3/3] clocksource: exynos_mct: Only use 32-bits where possible Doug Anderson
2014-06-20 17:47   ` Doug Anderson
2014-06-23  9:53   ` Vincent Guittot
2014-06-23  9:53     ` Vincent Guittot
2014-07-04 21:44 ` [PATCH v3 0/3] Exynos MCT udelay, MCT cleanup, MCT to 32-bits Kukjin Kim
2014-07-04 21:44   ` Kukjin Kim
