All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/3] clocksource: exynos_mct: Fix ftrace
@ 2014-06-04 17:30 ` Doug Anderson
  0 siblings, 0 replies; 43+ messages in thread
From: Doug Anderson @ 2014-06-04 17:30 UTC (permalink / raw)
  To: Kukjin Kim, Tomasz Figa
  Cc: Daniel Lezcano, Vincent Guittot, Chirantan Ekbote, David Riley,
	olof, linux-samsung-soc, Doug Anderson, tglx, linux-kernel,
	linux-arm-kernel

In (93bfb76 clocksource: exynos_mct: register sched_clock callback) we
supported using the MCT as a scheduler clock.  We properly marked
exynos4_read_sched_clock() as notrace.  However, we then went and
called another function that _wasn't_ notrace.  That means if you do:

  cd /sys/kernel/debug/tracing/
  echo function_graph > current_tracer

You'll get a crash.

Fix this (but still let other readers of the MCT be trace-enabled) by
adding an extra function.  It's important to keep other users of MCT
traceable because the MCT is actually quite slow.

Signed-off-by: Doug Anderson <dianders@chromium.org>
---
 drivers/clocksource/exynos_mct.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/clocksource/exynos_mct.c b/drivers/clocksource/exynos_mct.c
index 8d64200..ba3a683 100644
--- a/drivers/clocksource/exynos_mct.c
+++ b/drivers/clocksource/exynos_mct.c
@@ -165,7 +165,7 @@ static void exynos4_mct_frc_start(u32 hi, u32 lo)
 	exynos4_mct_write(reg, EXYNOS4_MCT_G_TCON);
 }
 
-static cycle_t exynos4_frc_read(struct clocksource *cs)
+static inline cycle_t notrace _exynos4_frc_read(void)
 {
 	unsigned int lo, hi;
 	u32 hi2 = __raw_readl(reg_base + EXYNOS4_MCT_G_CNT_U);
@@ -179,6 +179,11 @@ static cycle_t exynos4_frc_read(struct clocksource *cs)
 	return ((cycle_t)hi << 32) | lo;
 }
 
+static cycle_t exynos4_frc_read(struct clocksource *cs)
+{
+	return _exynos4_frc_read();
+}
+
 static void exynos4_frc_resume(struct clocksource *cs)
 {
 	exynos4_mct_frc_start(0, 0);
@@ -195,7 +200,7 @@ struct clocksource mct_frc = {
 
 static u64 notrace exynos4_read_sched_clock(void)
 {
-	return exynos4_frc_read(&mct_frc);
+	return _exynos4_frc_read();
 }
 
 static void __init exynos4_clocksource_init(void)
-- 
2.0.0.526.g5318336


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 1/3] clocksource: exynos_mct: Fix ftrace
@ 2014-06-04 17:30 ` Doug Anderson
  0 siblings, 0 replies; 43+ messages in thread
From: Doug Anderson @ 2014-06-04 17:30 UTC (permalink / raw)
  To: Kukjin Kim, Tomasz Figa
  Cc: linux-samsung-soc, David Riley, Chirantan Ekbote, Daniel Lezcano,
	Doug Anderson, linux-kernel, olof, Vincent Guittot, tglx,
	linux-arm-kernel

In (93bfb76 clocksource: exynos_mct: register sched_clock callback) we
supported using the MCT as a scheduler clock.  We properly marked
exynos4_read_sched_clock() as notrace.  However, we then went and
called another function that _wasn't_ notrace.  That means if you do:

  cd /sys/kernel/debug/tracing/
  echo function_graph > current_tracer

You'll get a crash.

Fix this (but still let other readers of the MCT be trace-enabled) by
adding an extra function.  It's important to keep other users of MCT
traceable because the MCT is actually quite slow.

Signed-off-by: Doug Anderson <dianders@chromium.org>
---
 drivers/clocksource/exynos_mct.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/clocksource/exynos_mct.c b/drivers/clocksource/exynos_mct.c
index 8d64200..ba3a683 100644
--- a/drivers/clocksource/exynos_mct.c
+++ b/drivers/clocksource/exynos_mct.c
@@ -165,7 +165,7 @@ static void exynos4_mct_frc_start(u32 hi, u32 lo)
 	exynos4_mct_write(reg, EXYNOS4_MCT_G_TCON);
 }
 
-static cycle_t exynos4_frc_read(struct clocksource *cs)
+static inline cycle_t notrace _exynos4_frc_read(void)
 {
 	unsigned int lo, hi;
 	u32 hi2 = __raw_readl(reg_base + EXYNOS4_MCT_G_CNT_U);
@@ -179,6 +179,11 @@ static cycle_t exynos4_frc_read(struct clocksource *cs)
 	return ((cycle_t)hi << 32) | lo;
 }
 
+static cycle_t exynos4_frc_read(struct clocksource *cs)
+{
+	return _exynos4_frc_read();
+}
+
 static void exynos4_frc_resume(struct clocksource *cs)
 {
 	exynos4_mct_frc_start(0, 0);
@@ -195,7 +200,7 @@ struct clocksource mct_frc = {
 
 static u64 notrace exynos4_read_sched_clock(void)
 {
-	return exynos4_frc_read(&mct_frc);
+	return _exynos4_frc_read();
 }
 
 static void __init exynos4_clocksource_init(void)
-- 
2.0.0.526.g5318336

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 1/3] clocksource: exynos_mct: Fix ftrace
@ 2014-06-04 17:30 ` Doug Anderson
  0 siblings, 0 replies; 43+ messages in thread
From: Doug Anderson @ 2014-06-04 17:30 UTC (permalink / raw)
  To: linux-arm-kernel

In (93bfb76 clocksource: exynos_mct: register sched_clock callback) we
supported using the MCT as a scheduler clock.  We properly marked
exynos4_read_sched_clock() as notrace.  However, we then went and
called another function that _wasn't_ notrace.  That means if you do:

  cd /sys/kernel/debug/tracing/
  echo function_graph > current_tracer

You'll get a crash.

Fix this (but still let other readers of the MCT be trace-enabled) by
adding an extra function.  It's important to keep other users of MCT
traceable because the MCT is actually quite slow.

Signed-off-by: Doug Anderson <dianders@chromium.org>
---
 drivers/clocksource/exynos_mct.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/clocksource/exynos_mct.c b/drivers/clocksource/exynos_mct.c
index 8d64200..ba3a683 100644
--- a/drivers/clocksource/exynos_mct.c
+++ b/drivers/clocksource/exynos_mct.c
@@ -165,7 +165,7 @@ static void exynos4_mct_frc_start(u32 hi, u32 lo)
 	exynos4_mct_write(reg, EXYNOS4_MCT_G_TCON);
 }
 
-static cycle_t exynos4_frc_read(struct clocksource *cs)
+static inline cycle_t notrace _exynos4_frc_read(void)
 {
 	unsigned int lo, hi;
 	u32 hi2 = __raw_readl(reg_base + EXYNOS4_MCT_G_CNT_U);
@@ -179,6 +179,11 @@ static cycle_t exynos4_frc_read(struct clocksource *cs)
 	return ((cycle_t)hi << 32) | lo;
 }
 
+static cycle_t exynos4_frc_read(struct clocksource *cs)
+{
+	return _exynos4_frc_read();
+}
+
 static void exynos4_frc_resume(struct clocksource *cs)
 {
 	exynos4_mct_frc_start(0, 0);
@@ -195,7 +200,7 @@ struct clocksource mct_frc = {
 
 static u64 notrace exynos4_read_sched_clock(void)
 {
-	return exynos4_frc_read(&mct_frc);
+	return _exynos4_frc_read();
 }
 
 static void __init exynos4_clocksource_init(void)
-- 
2.0.0.526.g5318336

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 2/3] clocksource: exynos_mct: cache mct upper count
  2014-06-04 17:30 ` Doug Anderson
  (?)
@ 2014-06-04 17:30   ` Doug Anderson
  -1 siblings, 0 replies; 43+ messages in thread
From: Doug Anderson @ 2014-06-04 17:30 UTC (permalink / raw)
  To: Kukjin Kim, Tomasz Figa
  Cc: Daniel Lezcano, Vincent Guittot, Chirantan Ekbote, David Riley,
	olof, linux-samsung-soc, Mandeep Singh Baines, Andrew Bresticker,
	Doug Anderson, tglx, linux-kernel, linux-arm-kernel

From: Mandeep Singh Baines <msb@chromium.org>

Saves one register read.  Note that the upper count only changes every
~178 seconds with a 24MHz source clock, so it's likely it hasn't
changed from call to call.

Before: 1323852 us for 1000000 gettimeofday in userspace
After:  1173084 us for 1000000 gettimeofday in userspace

Note that even with this change the CPU is in exynos_frc_read() more
than 2% of the time in real world profiles of ChromeOS.  That
indicates that it's important to optimize.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Signed-off-by: Andrew Bresticker <abrestic@chromium.org>
Signed-off-by: Doug Anderson <dianders@chromium.org>
---
 drivers/clocksource/exynos_mct.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/clocksource/exynos_mct.c b/drivers/clocksource/exynos_mct.c
index ba3a683..7cbe4aa 100644
--- a/drivers/clocksource/exynos_mct.c
+++ b/drivers/clocksource/exynos_mct.c
@@ -167,8 +167,8 @@ static void exynos4_mct_frc_start(u32 hi, u32 lo)
 
 static inline cycle_t notrace _exynos4_frc_read(void)
 {
-	unsigned int lo, hi;
-	u32 hi2 = __raw_readl(reg_base + EXYNOS4_MCT_G_CNT_U);
+	u32 lo, hi;
+	static u32 hi2;
 
 	do {
 		hi = hi2;
-- 
2.0.0.526.g5318336


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 2/3] clocksource: exynos_mct: cache mct upper count
@ 2014-06-04 17:30   ` Doug Anderson
  0 siblings, 0 replies; 43+ messages in thread
From: Doug Anderson @ 2014-06-04 17:30 UTC (permalink / raw)
  To: Kukjin Kim, Tomasz Figa
  Cc: linux-samsung-soc, David Riley, Chirantan Ekbote,
	Mandeep Singh Baines, Daniel Lezcano, Doug Anderson,
	linux-kernel, Andrew Bresticker, olof, Vincent Guittot, tglx,
	linux-arm-kernel

From: Mandeep Singh Baines <msb@chromium.org>

Saves one register read.  Note that the upper count only changes every
~178 seconds with a 24MHz source clock, so it's likely it hasn't
changed from call to call.

Before: 1323852 us for 1000000 gettimeofday in userspace
After:  1173084 us for 1000000 gettimeofday in userspace

Note that even with this change the CPU is in exynos_frc_read() more
than 2% of the time in real world profiles of ChromeOS.  That
indicates that it's important to optimize.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Signed-off-by: Andrew Bresticker <abrestic@chromium.org>
Signed-off-by: Doug Anderson <dianders@chromium.org>
---
 drivers/clocksource/exynos_mct.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/clocksource/exynos_mct.c b/drivers/clocksource/exynos_mct.c
index ba3a683..7cbe4aa 100644
--- a/drivers/clocksource/exynos_mct.c
+++ b/drivers/clocksource/exynos_mct.c
@@ -167,8 +167,8 @@ static void exynos4_mct_frc_start(u32 hi, u32 lo)
 
 static inline cycle_t notrace _exynos4_frc_read(void)
 {
-	unsigned int lo, hi;
-	u32 hi2 = __raw_readl(reg_base + EXYNOS4_MCT_G_CNT_U);
+	u32 lo, hi;
+	static u32 hi2;
 
 	do {
 		hi = hi2;
-- 
2.0.0.526.g5318336

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 2/3] clocksource: exynos_mct: cache mct upper count
@ 2014-06-04 17:30   ` Doug Anderson
  0 siblings, 0 replies; 43+ messages in thread
From: Doug Anderson @ 2014-06-04 17:30 UTC (permalink / raw)
  To: linux-arm-kernel

From: Mandeep Singh Baines <msb@chromium.org>

Saves one register read.  Note that the upper count only changes every
~178 seconds with a 24MHz source clock, so it's likely it hasn't
changed from call to call.

Before: 1323852 us for 1000000 gettimeofday in userspace
After:  1173084 us for 1000000 gettimeofday in userspace

Note that even with this change the CPU is in exynos_frc_read() more
than 2% of the time in real world profiles of ChromeOS.  That
indicates that it's important to optimize.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Signed-off-by: Andrew Bresticker <abrestic@chromium.org>
Signed-off-by: Doug Anderson <dianders@chromium.org>
---
 drivers/clocksource/exynos_mct.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/clocksource/exynos_mct.c b/drivers/clocksource/exynos_mct.c
index ba3a683..7cbe4aa 100644
--- a/drivers/clocksource/exynos_mct.c
+++ b/drivers/clocksource/exynos_mct.c
@@ -167,8 +167,8 @@ static void exynos4_mct_frc_start(u32 hi, u32 lo)
 
 static inline cycle_t notrace _exynos4_frc_read(void)
 {
-	unsigned int lo, hi;
-	u32 hi2 = __raw_readl(reg_base + EXYNOS4_MCT_G_CNT_U);
+	u32 lo, hi;
+	static u32 hi2;
 
 	do {
 		hi = hi2;
-- 
2.0.0.526.g5318336

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 3/3] clocksource: exynos_mct: Optimize register reads with ldmia
  2014-06-04 17:30 ` Doug Anderson
  (?)
@ 2014-06-04 17:30   ` Doug Anderson
  -1 siblings, 0 replies; 43+ messages in thread
From: Doug Anderson @ 2014-06-04 17:30 UTC (permalink / raw)
  To: Kukjin Kim, Tomasz Figa
  Cc: Daniel Lezcano, Vincent Guittot, Chirantan Ekbote, David Riley,
	olof, linux-samsung-soc, Doug Anderson, tglx, linux-kernel,
	linux-arm-kernel

As we saw in (clocksource: exynos_mct: cache mct upper count), the
time spent reading the MCT shows up fairly high in real-world
profiles.  That means that it's worth some optimization.

We get a roughly 10% speedup in userspace gettimeofday() by using an
ldmia to read the two halfs of the MCT.  That seems like a worthwhile
thing to do.

Before: 1173084 us for 1000000 gettimeofday in userspace
After:  1045674 us for 1000000 gettimeofday in userspace

NOTE: we could actually do better than this if we really wanted to.
Technically we could register the clocksource as a 32-bit timer and
only use the "lower" half.  Doing so brings us down to 1014429 us for
1000000 gettimeofday in userspace (and doesn't even require assembly
code).  That would be an alternative to this change.

Signed-off-by: Doug Anderson <dianders@chromium.org>
---
 drivers/clocksource/exynos_mct.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/clocksource/exynos_mct.c b/drivers/clocksource/exynos_mct.c
index 7cbe4aa..6e3017b 100644
--- a/drivers/clocksource/exynos_mct.c
+++ b/drivers/clocksource/exynos_mct.c
@@ -172,8 +172,9 @@ static inline cycle_t notrace _exynos4_frc_read(void)
 
 	do {
 		hi = hi2;
-		lo = __raw_readl(reg_base + EXYNOS4_MCT_G_CNT_L);
-		hi2 = __raw_readl(reg_base + EXYNOS4_MCT_G_CNT_U);
+		asm volatile("ldmia  %2, {%0, %1}"
+			     : "=r" (lo), "=r" (hi2)
+			     : "r" (reg_base + EXYNOS4_MCT_G_CNT_L));
 	} while (hi != hi2);
 
 	return ((cycle_t)hi << 32) | lo;
-- 
2.0.0.526.g5318336


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 3/3] clocksource: exynos_mct: Optimize register reads with ldmia
@ 2014-06-04 17:30   ` Doug Anderson
  0 siblings, 0 replies; 43+ messages in thread
From: Doug Anderson @ 2014-06-04 17:30 UTC (permalink / raw)
  To: Kukjin Kim, Tomasz Figa
  Cc: linux-samsung-soc, David Riley, Chirantan Ekbote, Daniel Lezcano,
	Doug Anderson, linux-kernel, olof, Vincent Guittot, tglx,
	linux-arm-kernel

As we saw in (clocksource: exynos_mct: cache mct upper count), the
time spent reading the MCT shows up fairly high in real-world
profiles.  That means that it's worth some optimization.

We get a roughly 10% speedup in userspace gettimeofday() by using an
ldmia to read the two halfs of the MCT.  That seems like a worthwhile
thing to do.

Before: 1173084 us for 1000000 gettimeofday in userspace
After:  1045674 us for 1000000 gettimeofday in userspace

NOTE: we could actually do better than this if we really wanted to.
Technically we could register the clocksource as a 32-bit timer and
only use the "lower" half.  Doing so brings us down to 1014429 us for
1000000 gettimeofday in userspace (and doesn't even require assembly
code).  That would be an alternative to this change.

Signed-off-by: Doug Anderson <dianders@chromium.org>
---
 drivers/clocksource/exynos_mct.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/clocksource/exynos_mct.c b/drivers/clocksource/exynos_mct.c
index 7cbe4aa..6e3017b 100644
--- a/drivers/clocksource/exynos_mct.c
+++ b/drivers/clocksource/exynos_mct.c
@@ -172,8 +172,9 @@ static inline cycle_t notrace _exynos4_frc_read(void)
 
 	do {
 		hi = hi2;
-		lo = __raw_readl(reg_base + EXYNOS4_MCT_G_CNT_L);
-		hi2 = __raw_readl(reg_base + EXYNOS4_MCT_G_CNT_U);
+		asm volatile("ldmia  %2, {%0, %1}"
+			     : "=r" (lo), "=r" (hi2)
+			     : "r" (reg_base + EXYNOS4_MCT_G_CNT_L));
 	} while (hi != hi2);
 
 	return ((cycle_t)hi << 32) | lo;
-- 
2.0.0.526.g5318336

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 3/3] clocksource: exynos_mct: Optimize register reads with ldmia
@ 2014-06-04 17:30   ` Doug Anderson
  0 siblings, 0 replies; 43+ messages in thread
From: Doug Anderson @ 2014-06-04 17:30 UTC (permalink / raw)
  To: linux-arm-kernel

As we saw in (clocksource: exynos_mct: cache mct upper count), the
time spent reading the MCT shows up fairly high in real-world
profiles.  That means that it's worth some optimization.

We get a roughly 10% speedup in userspace gettimeofday() by using an
ldmia to read the two halfs of the MCT.  That seems like a worthwhile
thing to do.

Before: 1173084 us for 1000000 gettimeofday in userspace
After:  1045674 us for 1000000 gettimeofday in userspace

NOTE: we could actually do better than this if we really wanted to.
Technically we could register the clocksource as a 32-bit timer and
only use the "lower" half.  Doing so brings us down to 1014429 us for
1000000 gettimeofday in userspace (and doesn't even require assembly
code).  That would be an alternative to this change.

Signed-off-by: Doug Anderson <dianders@chromium.org>
---
 drivers/clocksource/exynos_mct.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/clocksource/exynos_mct.c b/drivers/clocksource/exynos_mct.c
index 7cbe4aa..6e3017b 100644
--- a/drivers/clocksource/exynos_mct.c
+++ b/drivers/clocksource/exynos_mct.c
@@ -172,8 +172,9 @@ static inline cycle_t notrace _exynos4_frc_read(void)
 
 	do {
 		hi = hi2;
-		lo = __raw_readl(reg_base + EXYNOS4_MCT_G_CNT_L);
-		hi2 = __raw_readl(reg_base + EXYNOS4_MCT_G_CNT_U);
+		asm volatile("ldmia  %2, {%0, %1}"
+			     : "=r" (lo), "=r" (hi2)
+			     : "r" (reg_base + EXYNOS4_MCT_G_CNT_L));
 	} while (hi != hi2);
 
 	return ((cycle_t)hi << 32) | lo;
-- 
2.0.0.526.g5318336

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH 3/3] clocksource: exynos_mct: Optimize register reads with ldmia
  2014-06-04 17:30   ` Doug Anderson
@ 2014-06-04 18:05     ` Thomas Gleixner
  -1 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2014-06-04 18:05 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Kukjin Kim, Tomasz Figa, Daniel Lezcano, Vincent Guittot,
	Chirantan Ekbote, David Riley, olof, linux-samsung-soc,
	linux-kernel, linux-arm-kernel

On Wed, 4 Jun 2014, Doug Anderson wrote:

> As we saw in (clocksource: exynos_mct: cache mct upper count), the
> time spent reading the MCT shows up fairly high in real-world
> profiles.  That means that it's worth some optimization.
> 
> We get a roughly 10% speedup in userspace gettimeofday() by using an
> ldmia to read the two halfs of the MCT.  That seems like a worthwhile
> thing to do.
> 
> Before: 1173084 us for 1000000 gettimeofday in userspace
> After:  1045674 us for 1000000 gettimeofday in userspace
> 
> NOTE: we could actually do better than this if we really wanted to.
> Technically we could register the clocksource as a 32-bit timer and
> only use the "lower" half.  Doing so brings us down to 1014429 us for
> 1000000 gettimeofday in userspace (and doesn't even require assembly
> code).  That would be an alternative to this change.

I was about to ask exactly that question: What's the advantage of the
64 bit dance there? AFAICT nothing.
 
Thanks,

	tglx

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 3/3] clocksource: exynos_mct: Optimize register reads with ldmia
@ 2014-06-04 18:05     ` Thomas Gleixner
  0 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2014-06-04 18:05 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 4 Jun 2014, Doug Anderson wrote:

> As we saw in (clocksource: exynos_mct: cache mct upper count), the
> time spent reading the MCT shows up fairly high in real-world
> profiles.  That means that it's worth some optimization.
> 
> We get a roughly 10% speedup in userspace gettimeofday() by using an
> ldmia to read the two halfs of the MCT.  That seems like a worthwhile
> thing to do.
> 
> Before: 1173084 us for 1000000 gettimeofday in userspace
> After:  1045674 us for 1000000 gettimeofday in userspace
> 
> NOTE: we could actually do better than this if we really wanted to.
> Technically we could register the clocksource as a 32-bit timer and
> only use the "lower" half.  Doing so brings us down to 1014429 us for
> 1000000 gettimeofday in userspace (and doesn't even require assembly
> code).  That would be an alternative to this change.

I was about to ask exactly that question: What's the advantage of the
64 bit dance there? AFAICT nothing.
 
Thanks,

	tglx

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 3/3] clocksource: exynos_mct: Optimize register reads with ldmia
  2014-06-04 18:05     ` Thomas Gleixner
  (?)
@ 2014-06-04 18:49       ` Doug Anderson
  -1 siblings, 0 replies; 43+ messages in thread
From: Doug Anderson @ 2014-06-04 18:49 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Kukjin Kim, Tomasz Figa, Daniel Lezcano, Vincent Guittot,
	Chirantan Ekbote, David Riley, Olof Johansson, linux-samsung-soc,
	linux-kernel, linux-arm-kernel

Thomas,

On Wed, Jun 4, 2014 at 11:05 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Wed, 4 Jun 2014, Doug Anderson wrote:
>
>> As we saw in (clocksource: exynos_mct: cache mct upper count), the
>> time spent reading the MCT shows up fairly high in real-world
>> profiles.  That means that it's worth some optimization.
>>
>> We get a roughly 10% speedup in userspace gettimeofday() by using an
>> ldmia to read the two halfs of the MCT.  That seems like a worthwhile
>> thing to do.
>>
>> Before: 1173084 us for 1000000 gettimeofday in userspace
>> After:  1045674 us for 1000000 gettimeofday in userspace
>>
>> NOTE: we could actually do better than this if we really wanted to.
>> Technically we could register the clocksource as a 32-bit timer and
>> only use the "lower" half.  Doing so brings us down to 1014429 us for
>> 1000000 gettimeofday in userspace (and doesn't even require assembly
>> code).  That would be an alternative to this change.
>
> I was about to ask exactly that question: What's the advantage of the
> 64 bit dance there? AFAICT nothing.

I debated whether to send out the 32-bit version, since I'd
implemented both.  I'm happy to send out the 32-bit version too and
people can pick which they like better.  Let me know.

The final thing that pushed me over the edge to send the 64-bit
version was that I didn't know enough about how MCT was used with
respect to low power modes (we don't use AFTR / LPA modes on our
system).  I could actually believe that we might want to set a timer
for more than 178 seconds into the future for these low power modes.
If that's the case, we still need to keep around the 64-bit read code
for that case.  ...and once we have the 64-bit code then we might as
well use it for the rest of the timers.

Perhaps Tomasz will comment on this.

-Doug

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 3/3] clocksource: exynos_mct: Optimize register reads with ldmia
@ 2014-06-04 18:49       ` Doug Anderson
  0 siblings, 0 replies; 43+ messages in thread
From: Doug Anderson @ 2014-06-04 18:49 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Kukjin Kim, Tomasz Figa, Daniel Lezcano, Vincent Guittot,
	Chirantan Ekbote, David Riley, Olof Johansson, linux-samsung-soc,
	linux-kernel, linux-arm-kernel

Thomas,

On Wed, Jun 4, 2014 at 11:05 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Wed, 4 Jun 2014, Doug Anderson wrote:
>
>> As we saw in (clocksource: exynos_mct: cache mct upper count), the
>> time spent reading the MCT shows up fairly high in real-world
>> profiles.  That means that it's worth some optimization.
>>
>> We get a roughly 10% speedup in userspace gettimeofday() by using an
>> ldmia to read the two halfs of the MCT.  That seems like a worthwhile
>> thing to do.
>>
>> Before: 1173084 us for 1000000 gettimeofday in userspace
>> After:  1045674 us for 1000000 gettimeofday in userspace
>>
>> NOTE: we could actually do better than this if we really wanted to.
>> Technically we could register the clocksource as a 32-bit timer and
>> only use the "lower" half.  Doing so brings us down to 1014429 us for
>> 1000000 gettimeofday in userspace (and doesn't even require assembly
>> code).  That would be an alternative to this change.
>
> I was about to ask exactly that question: What's the advantage of the
> 64 bit dance there? AFAICT nothing.

I debated whether to send out the 32-bit version, since I'd
implemented both.  I'm happy to send out the 32-bit version too and
people can pick which they like better.  Let me know.

The final thing that pushed me over the edge to send the 64-bit
version was that I didn't know enough about how MCT was used with
respect to low power modes (we don't use AFTR / LPA modes on our
system).  I could actually believe that we might want to set a timer
for more than 178 seconds into the future for these low power modes.
If that's the case, we still need to keep around the 64-bit read code
for that case.  ...and once we have the 64-bit code then we might as
well use it for the rest of the timers.

Perhaps Tomasz will comment on this.

-Doug

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 3/3] clocksource: exynos_mct: Optimize register reads with ldmia
@ 2014-06-04 18:49       ` Doug Anderson
  0 siblings, 0 replies; 43+ messages in thread
From: Doug Anderson @ 2014-06-04 18:49 UTC (permalink / raw)
  To: linux-arm-kernel

Thomas,

On Wed, Jun 4, 2014 at 11:05 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Wed, 4 Jun 2014, Doug Anderson wrote:
>
>> As we saw in (clocksource: exynos_mct: cache mct upper count), the
>> time spent reading the MCT shows up fairly high in real-world
>> profiles.  That means that it's worth some optimization.
>>
>> We get a roughly 10% speedup in userspace gettimeofday() by using an
>> ldmia to read the two halfs of the MCT.  That seems like a worthwhile
>> thing to do.
>>
>> Before: 1173084 us for 1000000 gettimeofday in userspace
>> After:  1045674 us for 1000000 gettimeofday in userspace
>>
>> NOTE: we could actually do better than this if we really wanted to.
>> Technically we could register the clocksource as a 32-bit timer and
>> only use the "lower" half.  Doing so brings us down to 1014429 us for
>> 1000000 gettimeofday in userspace (and doesn't even require assembly
>> code).  That would be an alternative to this change.
>
> I was about to ask exactly that question: What's the advantage of the
> 64 bit dance there? AFAICT nothing.

I debated whether to send out the 32-bit version, since I'd
implemented both.  I'm happy to send out the 32-bit version too and
people can pick which they like better.  Let me know.

The final thing that pushed me over the edge to send the 64-bit
version was that I didn't know enough about how MCT was used with
respect to low power modes (we don't use AFTR / LPA modes on our
system).  I could actually believe that we might want to set a timer
for more than 178 seconds into the future for these low power modes.
If that's the case, we still need to keep around the 64-bit read code
for that case.  ...and once we have the 64-bit code then we might as
well use it for the rest of the timers.

Perhaps Tomasz will comment on this.

-Doug

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 2/3] clocksource: exynos_mct: cache mct upper count
  2014-06-04 17:30   ` Doug Anderson
@ 2014-06-05  7:55     ` Vincent Guittot
  -1 siblings, 0 replies; 43+ messages in thread
From: Vincent Guittot @ 2014-06-05  7:55 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Kukjin Kim, Tomasz Figa, Daniel Lezcano, Chirantan Ekbote,
	David Riley, Olof Johansson, linux-samsung-soc,
	Mandeep Singh Baines, Andrew Bresticker, Thomas Gleixner,
	linux-kernel, LAK

On 4 June 2014 19:30, Doug Anderson <dianders@chromium.org> wrote:
> From: Mandeep Singh Baines <msb@chromium.org>
>
> Saves one register read.  Note that the upper count only changes every
> ~178 seconds with a 24MHz source clock, so it's likely it hasn't
> changed from call to call.

Hi Doug,

Have you checked that the time saved in using a 32bits counter instead
of a 64bits one is not lost in the handle of the wrap of sched_clock
which occurs every 178s instead of never ?

sched_clock_poll function will be called every 178s

Regards,
Vincent


>
> Before: 1323852 us for 1000000 gettimeofday in userspace
> After:  1173084 us for 1000000 gettimeofday in userspace
>
> Note that even with this change the CPU is in exynos_frc_read() more
> than 2% of the time in real world profiles of ChromeOS.  That
> indicates that it's important to optimize.
>
> Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
> Signed-off-by: Andrew Bresticker <abrestic@chromium.org>
> Signed-off-by: Doug Anderson <dianders@chromium.org>
> ---
>  drivers/clocksource/exynos_mct.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/clocksource/exynos_mct.c b/drivers/clocksource/exynos_mct.c
> index ba3a683..7cbe4aa 100644
> --- a/drivers/clocksource/exynos_mct.c
> +++ b/drivers/clocksource/exynos_mct.c
> @@ -167,8 +167,8 @@ static void exynos4_mct_frc_start(u32 hi, u32 lo)
>
>  static inline cycle_t notrace _exynos4_frc_read(void)
>  {
> -       unsigned int lo, hi;
> -       u32 hi2 = __raw_readl(reg_base + EXYNOS4_MCT_G_CNT_U);
> +       u32 lo, hi;
> +       static u32 hi2;
>
>         do {
>                 hi = hi2;
> --
> 2.0.0.526.g5318336
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 2/3] clocksource: exynos_mct: cache mct upper count
@ 2014-06-05  7:55     ` Vincent Guittot
  0 siblings, 0 replies; 43+ messages in thread
From: Vincent Guittot @ 2014-06-05  7:55 UTC (permalink / raw)
  To: linux-arm-kernel

On 4 June 2014 19:30, Doug Anderson <dianders@chromium.org> wrote:
> From: Mandeep Singh Baines <msb@chromium.org>
>
> Saves one register read.  Note that the upper count only changes every
> ~178 seconds with a 24MHz source clock, so it's likely it hasn't
> changed from call to call.

Hi Doug,

Have you checked that the time saved in using a 32bits counter instead
of a 64bits one is not lost in the handle of the wrap of sched_clock
which occurs every 178s instead of never ?

sched_clock_poll function will be called every 178s

Regards,
Vincent


>
> Before: 1323852 us for 1000000 gettimeofday in userspace
> After:  1173084 us for 1000000 gettimeofday in userspace
>
> Note that even with this change the CPU is in exynos_frc_read() more
> than 2% of the time in real world profiles of ChromeOS.  That
> indicates that it's important to optimize.
>
> Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
> Signed-off-by: Andrew Bresticker <abrestic@chromium.org>
> Signed-off-by: Doug Anderson <dianders@chromium.org>
> ---
>  drivers/clocksource/exynos_mct.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/clocksource/exynos_mct.c b/drivers/clocksource/exynos_mct.c
> index ba3a683..7cbe4aa 100644
> --- a/drivers/clocksource/exynos_mct.c
> +++ b/drivers/clocksource/exynos_mct.c
> @@ -167,8 +167,8 @@ static void exynos4_mct_frc_start(u32 hi, u32 lo)
>
>  static inline cycle_t notrace _exynos4_frc_read(void)
>  {
> -       unsigned int lo, hi;
> -       u32 hi2 = __raw_readl(reg_base + EXYNOS4_MCT_G_CNT_U);
> +       u32 lo, hi;
> +       static u32 hi2;
>
>         do {
>                 hi = hi2;
> --
> 2.0.0.526.g5318336
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 3/3] clocksource: exynos_mct: Optimize register reads with ldmia
  2014-06-04 18:49       ` Doug Anderson
  (?)
@ 2014-06-05 11:18         ` Tomasz Figa
  -1 siblings, 0 replies; 43+ messages in thread
From: Tomasz Figa @ 2014-06-05 11:18 UTC (permalink / raw)
  To: Doug Anderson, Thomas Gleixner
  Cc: Kukjin Kim, Tomasz Figa, Daniel Lezcano, Vincent Guittot,
	Chirantan Ekbote, David Riley, Olof Johansson, linux-samsung-soc,
	linux-kernel, linux-arm-kernel, Bartlomiej Zolnierkiewicz,
	Krzysztof Kozlowski

On 04.06.2014 20:49, Doug Anderson wrote:
> Thomas,
> 
> On Wed, Jun 4, 2014 at 11:05 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> On Wed, 4 Jun 2014, Doug Anderson wrote:
>>
>>> As we saw in (clocksource: exynos_mct: cache mct upper count), the
>>> time spent reading the MCT shows up fairly high in real-world
>>> profiles.  That means that it's worth some optimization.
>>>
>>> We get a roughly 10% speedup in userspace gettimeofday() by using an
>>> ldmia to read the two halfs of the MCT.  That seems like a worthwhile
>>> thing to do.
>>>
>>> Before: 1173084 us for 1000000 gettimeofday in userspace
>>> After:  1045674 us for 1000000 gettimeofday in userspace
>>>
>>> NOTE: we could actually do better than this if we really wanted to.
>>> Technically we could register the clocksource as a 32-bit timer and
>>> only use the "lower" half.  Doing so brings us down to 1014429 us for
>>> 1000000 gettimeofday in userspace (and doesn't even require assembly
>>> code).  That would be an alternative to this change.
>>
>> I was about to ask exactly that question: What's the advantage of the
>> 64 bit dance there? AFAICT nothing.
> 
> I debated whether to send out the 32-bit version, since I'd
> implemented both.  I'm happy to send out the 32-bit version too and
> people can pick which they like better.  Let me know.
> 
> The final thing that pushed me over the edge to send the 64-bit
> version was that I didn't know enough about how MCT was used with
> respect to low power modes (we don't use AFTR / LPA modes on our
> system).  I could actually believe that we might want to set a timer
> for more than 178 seconds into the future for these low power modes.
> If that's the case, we still need to keep around the 64-bit read code
> for that case.  ...and once we have the 64-bit code then we might as
> well use it for the rest of the timers.
> 
> Perhaps Tomasz will comment on this.

I can't say definitely that we won't need those extra 32-bits, but the
low power modes you mentioned are (and probably will always be)
implemented as CPU idle modes. This means that most likely having a
wake-up every 178 second wouldn't hurt that much, especially considering
the fact that in normal use cases, when the system is not fully
suspended it usually has things to do and will need to be woken up more
frequently than that.

Adding Bart and Krzysztof to the discussion as they are currently
working on power management.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 3/3] clocksource: exynos_mct: Optimize register reads with ldmia
@ 2014-06-05 11:18         ` Tomasz Figa
  0 siblings, 0 replies; 43+ messages in thread
From: Tomasz Figa @ 2014-06-05 11:18 UTC (permalink / raw)
  To: Doug Anderson, Thomas Gleixner
  Cc: Kukjin Kim, Tomasz Figa, Daniel Lezcano, Vincent Guittot,
	Chirantan Ekbote, David Riley, Olof Johansson, linux-samsung-soc,
	linux-kernel, linux-arm-kernel, Bartlomiej Zolnierkiewicz,
	Krzysztof Kozlowski

On 04.06.2014 20:49, Doug Anderson wrote:
> Thomas,
> 
> On Wed, Jun 4, 2014 at 11:05 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> On Wed, 4 Jun 2014, Doug Anderson wrote:
>>
>>> As we saw in (clocksource: exynos_mct: cache mct upper count), the
>>> time spent reading the MCT shows up fairly high in real-world
>>> profiles.  That means that it's worth some optimization.
>>>
>>> We get a roughly 10% speedup in userspace gettimeofday() by using an
>>> ldmia to read the two halfs of the MCT.  That seems like a worthwhile
>>> thing to do.
>>>
>>> Before: 1173084 us for 1000000 gettimeofday in userspace
>>> After:  1045674 us for 1000000 gettimeofday in userspace
>>>
>>> NOTE: we could actually do better than this if we really wanted to.
>>> Technically we could register the clocksource as a 32-bit timer and
>>> only use the "lower" half.  Doing so brings us down to 1014429 us for
>>> 1000000 gettimeofday in userspace (and doesn't even require assembly
>>> code).  That would be an alternative to this change.
>>
>> I was about to ask exactly that question: What's the advantage of the
>> 64 bit dance there? AFAICT nothing.
> 
> I debated whether to send out the 32-bit version, since I'd
> implemented both.  I'm happy to send out the 32-bit version too and
> people can pick which they like better.  Let me know.
> 
> The final thing that pushed me over the edge to send the 64-bit
> version was that I didn't know enough about how MCT was used with
> respect to low power modes (we don't use AFTR / LPA modes on our
> system).  I could actually believe that we might want to set a timer
> for more than 178 seconds into the future for these low power modes.
> If that's the case, we still need to keep around the 64-bit read code
> for that case.  ...and once we have the 64-bit code then we might as
> well use it for the rest of the timers.
> 
> Perhaps Tomasz will comment on this.

I can't say definitely that we won't need those extra 32-bits, but the
low power modes you mentioned are (and probably will always be)
implemented as CPU idle modes. This means that most likely having a
wake-up every 178 second wouldn't hurt that much, especially considering
the fact that in normal use cases, when the system is not fully
suspended it usually has things to do and will need to be woken up more
frequently than that.

Adding Bart and Krzysztof to the discussion as they are currently
working on power management.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 3/3] clocksource: exynos_mct: Optimize register reads with ldmia
@ 2014-06-05 11:18         ` Tomasz Figa
  0 siblings, 0 replies; 43+ messages in thread
From: Tomasz Figa @ 2014-06-05 11:18 UTC (permalink / raw)
  To: linux-arm-kernel

On 04.06.2014 20:49, Doug Anderson wrote:
> Thomas,
> 
> On Wed, Jun 4, 2014 at 11:05 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> On Wed, 4 Jun 2014, Doug Anderson wrote:
>>
>>> As we saw in (clocksource: exynos_mct: cache mct upper count), the
>>> time spent reading the MCT shows up fairly high in real-world
>>> profiles.  That means that it's worth some optimization.
>>>
>>> We get a roughly 10% speedup in userspace gettimeofday() by using an
>>> ldmia to read the two halfs of the MCT.  That seems like a worthwhile
>>> thing to do.
>>>
>>> Before: 1173084 us for 1000000 gettimeofday in userspace
>>> After:  1045674 us for 1000000 gettimeofday in userspace
>>>
>>> NOTE: we could actually do better than this if we really wanted to.
>>> Technically we could register the clocksource as a 32-bit timer and
>>> only use the "lower" half.  Doing so brings us down to 1014429 us for
>>> 1000000 gettimeofday in userspace (and doesn't even require assembly
>>> code).  That would be an alternative to this change.
>>
>> I was about to ask exactly that question: What's the advantage of the
>> 64 bit dance there? AFAICT nothing.
> 
> I debated whether to send out the 32-bit version, since I'd
> implemented both.  I'm happy to send out the 32-bit version too and
> people can pick which they like better.  Let me know.
> 
> The final thing that pushed me over the edge to send the 64-bit
> version was that I didn't know enough about how MCT was used with
> respect to low power modes (we don't use AFTR / LPA modes on our
> system).  I could actually believe that we might want to set a timer
> for more than 178 seconds into the future for these low power modes.
> If that's the case, we still need to keep around the 64-bit read code
> for that case.  ...and once we have the 64-bit code then we might as
> well use it for the rest of the timers.
> 
> Perhaps Tomasz will comment on this.

I can't say definitely that we won't need those extra 32-bits, but the
low power modes you mentioned are (and probably will always be)
implemented as CPU idle modes. This means that most likely having a
wake-up every 178 second wouldn't hurt that much, especially considering
the fact that in normal use cases, when the system is not fully
suspended it usually has things to do and will need to be woken up more
frequently than that.

Adding Bart and Krzysztof to the discussion as they are currently
working on power management.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 2/3] clocksource: exynos_mct: cache mct upper count
  2014-06-05  7:55     ` Vincent Guittot
@ 2014-06-05 17:14       ` Doug Anderson
  -1 siblings, 0 replies; 43+ messages in thread
From: Doug Anderson @ 2014-06-05 17:14 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Kukjin Kim, Tomasz Figa, Daniel Lezcano, Chirantan Ekbote,
	David Riley, Olof Johansson, linux-samsung-soc,
	Mandeep Singh Baines, Andrew Bresticker, Thomas Gleixner,
	linux-kernel, LAK

Vincent,

On Thu, Jun 5, 2014 at 12:55 AM, Vincent Guittot
<vincent.guittot@linaro.org> wrote:
> On 4 June 2014 19:30, Doug Anderson <dianders@chromium.org> wrote:
>> From: Mandeep Singh Baines <msb@chromium.org>
>>
>> Saves one register read.  Note that the upper count only changes every
>> ~178 seconds with a 24MHz source clock, so it's likely it hasn't
>> changed from call to call.
>
> Hi Doug,
>
> Have you checked that the time saved in using a 32bits counter instead
> of a 64bits one is not lost in the handle of the wrap of sched_clock
> which occurs every 178s instead of never ?
>
> sched_clock_poll function will be called every 178s

Probably should move this to the discussion in the other patch where
we're talking about moving to 32-bits?

...but in general I have no benchmarks at all that how much the
scheduler is affected by this change.  All of my benchmarks are based
on gettimeofday(), where the improvement is clear.  Do you have a
suggested way to check?

I think that it's pretty clear that we'll want to take either the
32-bit conversion or the 64-bit optimizations I sent out.  If you take
out gettimeofday() overhead, I believe that the actual read of the MCT
(exynos_frc_read) took about:
* ~800ns per call with the original 64-bit code
* ~500ns per call with the optimized 64-bit code (both patch 2 and 3)
* ~500ns per call with the 32-bit code

The 32-bit code is slightly faster but not significantly (like 530ns
vs 500ns).  ...but it is definitely simpler.

If we find that the schedule clock wrap overhead is real then that
would be a good reason to take the 64-bit version.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 2/3] clocksource: exynos_mct: cache mct upper count
@ 2014-06-05 17:14       ` Doug Anderson
  0 siblings, 0 replies; 43+ messages in thread
From: Doug Anderson @ 2014-06-05 17:14 UTC (permalink / raw)
  To: linux-arm-kernel

Vincent,

On Thu, Jun 5, 2014 at 12:55 AM, Vincent Guittot
<vincent.guittot@linaro.org> wrote:
> On 4 June 2014 19:30, Doug Anderson <dianders@chromium.org> wrote:
>> From: Mandeep Singh Baines <msb@chromium.org>
>>
>> Saves one register read.  Note that the upper count only changes every
>> ~178 seconds with a 24MHz source clock, so it's likely it hasn't
>> changed from call to call.
>
> Hi Doug,
>
> Have you checked that the time saved in using a 32bits counter instead
> of a 64bits one is not lost in the handle of the wrap of sched_clock
> which occurs every 178s instead of never ?
>
> sched_clock_poll function will be called every 178s

Probably should move this to the discussion in the other patch where
we're talking about moving to 32-bits?

...but in general I have no benchmarks at all that how much the
scheduler is affected by this change.  All of my benchmarks are based
on gettimeofday(), where the improvement is clear.  Do you have a
suggested way to check?

I think that it's pretty clear that we'll want to take either the
32-bit conversion or the 64-bit optimizations I sent out.  If you take
out gettimeofday() overhead, I believe that the actual read of the MCT
(exynos_frc_read) took about:
* ~800ns per call with the original 64-bit code
* ~500ns per call with the optimized 64-bit code (both patch 2 and 3)
* ~500ns per call with the 32-bit code

The 32-bit code is slightly faster but not significantly (like 530ns
vs 500ns).  ...but it is definitely simpler.

If we find that the schedule clock wrap overhead is real then that
would be a good reason to take the 64-bit version.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 3/3] clocksource: exynos_mct: Optimize register reads with ldmia
  2014-06-05 11:18         ` Tomasz Figa
  (?)
@ 2014-06-05 18:21           ` Doug Anderson
  -1 siblings, 0 replies; 43+ messages in thread
From: Doug Anderson @ 2014-06-05 18:21 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Thomas Gleixner, Kukjin Kim, Tomasz Figa, Daniel Lezcano,
	Vincent Guittot, Chirantan Ekbote, David Riley, Olof Johansson,
	linux-samsung-soc, linux-kernel, linux-arm-kernel,
	Bartlomiej Zolnierkiewicz, Krzysztof Kozlowski

Tomasz,

On Thu, Jun 5, 2014 at 4:18 AM, Tomasz Figa <tomasz.figa@gmail.com> wrote:
> On 04.06.2014 20:49, Doug Anderson wrote:
>> Thomas,
>>
>> On Wed, Jun 4, 2014 at 11:05 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>>> On Wed, 4 Jun 2014, Doug Anderson wrote:
>>>
>>>> As we saw in (clocksource: exynos_mct: cache mct upper count), the
>>>> time spent reading the MCT shows up fairly high in real-world
>>>> profiles.  That means that it's worth some optimization.
>>>>
>>>> We get a roughly 10% speedup in userspace gettimeofday() by using an
>>>> ldmia to read the two halfs of the MCT.  That seems like a worthwhile
>>>> thing to do.
>>>>
>>>> Before: 1173084 us for 1000000 gettimeofday in userspace
>>>> After:  1045674 us for 1000000 gettimeofday in userspace
>>>>
>>>> NOTE: we could actually do better than this if we really wanted to.
>>>> Technically we could register the clocksource as a 32-bit timer and
>>>> only use the "lower" half.  Doing so brings us down to 1014429 us for
>>>> 1000000 gettimeofday in userspace (and doesn't even require assembly
>>>> code).  That would be an alternative to this change.
>>>
>>> I was about to ask exactly that question: What's the advantage of the
>>> 64 bit dance there? AFAICT nothing.
>>
>> I debated whether to send out the 32-bit version, since I'd
>> implemented both.  I'm happy to send out the 32-bit version too and
>> people can pick which they like better.  Let me know.
>>
>> The final thing that pushed me over the edge to send the 64-bit
>> version was that I didn't know enough about how MCT was used with
>> respect to low power modes (we don't use AFTR / LPA modes on our
>> system).  I could actually believe that we might want to set a timer
>> for more than 178 seconds into the future for these low power modes.
>> If that's the case, we still need to keep around the 64-bit read code
>> for that case.  ...and once we have the 64-bit code then we might as
>> well use it for the rest of the timers.
>>
>> Perhaps Tomasz will comment on this.
>
> I can't say definitely that we won't need those extra 32-bits, but the
> low power modes you mentioned are (and probably will always be)
> implemented as CPU idle modes. This means that most likely having a
> wake-up every 178 second wouldn't hurt that much, especially considering
> the fact that in normal use cases, when the system is not fully
> suspended it usually has things to do and will need to be woken up more
> frequently than that.
>
> Adding Bart and Krzysztof to the discussion as they are currently
> working on power management.

OK, I'm happy to send out the 32-bit version of the patches.  In the
other series Vincent was questioning whether the performance gain of
32-bit would be offset by the extra overhead of needing to handle the
wraparound.  I have no idea what the answer is there.  Who on this
thread would be willing to pick between the two?

My own personal opinion is to take the existing optimized 64-bit
versions I sent out since:
* there's no question that they work and won't regress performance or
functionality
* the speed is nearly equal to the 32-bit version
* the single assembly instruction isn't _too_ ugly.


-Doug

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 3/3] clocksource: exynos_mct: Optimize register reads with ldmia
@ 2014-06-05 18:21           ` Doug Anderson
  0 siblings, 0 replies; 43+ messages in thread
From: Doug Anderson @ 2014-06-05 18:21 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Thomas Gleixner, Kukjin Kim, Tomasz Figa, Daniel Lezcano,
	Vincent Guittot, Chirantan Ekbote, David Riley, Olof Johansson,
	linux-samsung-soc, linux-kernel, linux-arm-kernel,
	Bartlomiej Zolnierkiewicz, Krzysztof Kozlowski

Tomasz,

On Thu, Jun 5, 2014 at 4:18 AM, Tomasz Figa <tomasz.figa@gmail.com> wrote:
> On 04.06.2014 20:49, Doug Anderson wrote:
>> Thomas,
>>
>> On Wed, Jun 4, 2014 at 11:05 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>>> On Wed, 4 Jun 2014, Doug Anderson wrote:
>>>
>>>> As we saw in (clocksource: exynos_mct: cache mct upper count), the
>>>> time spent reading the MCT shows up fairly high in real-world
>>>> profiles.  That means that it's worth some optimization.
>>>>
>>>> We get a roughly 10% speedup in userspace gettimeofday() by using an
>>>> ldmia to read the two halfs of the MCT.  That seems like a worthwhile
>>>> thing to do.
>>>>
>>>> Before: 1173084 us for 1000000 gettimeofday in userspace
>>>> After:  1045674 us for 1000000 gettimeofday in userspace
>>>>
>>>> NOTE: we could actually do better than this if we really wanted to.
>>>> Technically we could register the clocksource as a 32-bit timer and
>>>> only use the "lower" half.  Doing so brings us down to 1014429 us for
>>>> 1000000 gettimeofday in userspace (and doesn't even require assembly
>>>> code).  That would be an alternative to this change.
>>>
>>> I was about to ask exactly that question: What's the advantage of the
>>> 64 bit dance there? AFAICT nothing.
>>
>> I debated whether to send out the 32-bit version, since I'd
>> implemented both.  I'm happy to send out the 32-bit version too and
>> people can pick which they like better.  Let me know.
>>
>> The final thing that pushed me over the edge to send the 64-bit
>> version was that I didn't know enough about how MCT was used with
>> respect to low power modes (we don't use AFTR / LPA modes on our
>> system).  I could actually believe that we might want to set a timer
>> for more than 178 seconds into the future for these low power modes.
>> If that's the case, we still need to keep around the 64-bit read code
>> for that case.  ...and once we have the 64-bit code then we might as
>> well use it for the rest of the timers.
>>
>> Perhaps Tomasz will comment on this.
>
> I can't say definitely that we won't need those extra 32-bits, but the
> low power modes you mentioned are (and probably will always be)
> implemented as CPU idle modes. This means that most likely having a
> wake-up every 178 second wouldn't hurt that much, especially considering
> the fact that in normal use cases, when the system is not fully
> suspended it usually has things to do and will need to be woken up more
> frequently than that.
>
> Adding Bart and Krzysztof to the discussion as they are currently
> working on power management.

OK, I'm happy to send out the 32-bit version of the patches.  In the
other series Vincent was questioning whether the performance gain of
32-bit would be offset by the extra overhead of needing to handle the
wraparound.  I have no idea what the answer is there.  Who on this
thread would be willing to pick between the two?

My own personal opinion is to take the existing optimized 64-bit
versions I sent out since:
* there's no question that they work and won't regress performance or
functionality
* the speed is nearly equal to the 32-bit version
* the single assembly instruction isn't _too_ ugly.


-Doug

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 3/3] clocksource: exynos_mct: Optimize register reads with ldmia
@ 2014-06-05 18:21           ` Doug Anderson
  0 siblings, 0 replies; 43+ messages in thread
From: Doug Anderson @ 2014-06-05 18:21 UTC (permalink / raw)
  To: linux-arm-kernel

Tomasz,

On Thu, Jun 5, 2014 at 4:18 AM, Tomasz Figa <tomasz.figa@gmail.com> wrote:
> On 04.06.2014 20:49, Doug Anderson wrote:
>> Thomas,
>>
>> On Wed, Jun 4, 2014 at 11:05 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>>> On Wed, 4 Jun 2014, Doug Anderson wrote:
>>>
>>>> As we saw in (clocksource: exynos_mct: cache mct upper count), the
>>>> time spent reading the MCT shows up fairly high in real-world
>>>> profiles.  That means that it's worth some optimization.
>>>>
>>>> We get a roughly 10% speedup in userspace gettimeofday() by using an
>>>> ldmia to read the two halfs of the MCT.  That seems like a worthwhile
>>>> thing to do.
>>>>
>>>> Before: 1173084 us for 1000000 gettimeofday in userspace
>>>> After:  1045674 us for 1000000 gettimeofday in userspace
>>>>
>>>> NOTE: we could actually do better than this if we really wanted to.
>>>> Technically we could register the clocksource as a 32-bit timer and
>>>> only use the "lower" half.  Doing so brings us down to 1014429 us for
>>>> 1000000 gettimeofday in userspace (and doesn't even require assembly
>>>> code).  That would be an alternative to this change.
>>>
>>> I was about to ask exactly that question: What's the advantage of the
>>> 64 bit dance there? AFAICT nothing.
>>
>> I debated whether to send out the 32-bit version, since I'd
>> implemented both.  I'm happy to send out the 32-bit version too and
>> people can pick which they like better.  Let me know.
>>
>> The final thing that pushed me over the edge to send the 64-bit
>> version was that I didn't know enough about how MCT was used with
>> respect to low power modes (we don't use AFTR / LPA modes on our
>> system).  I could actually believe that we might want to set a timer
>> for more than 178 seconds into the future for these low power modes.
>> If that's the case, we still need to keep around the 64-bit read code
>> for that case.  ...and once we have the 64-bit code then we might as
>> well use it for the rest of the timers.
>>
>> Perhaps Tomasz will comment on this.
>
> I can't say definitely that we won't need those extra 32-bits, but the
> low power modes you mentioned are (and probably will always be)
> implemented as CPU idle modes. This means that most likely having a
> wake-up every 178 second wouldn't hurt that much, especially considering
> the fact that in normal use cases, when the system is not fully
> suspended it usually has things to do and will need to be woken up more
> frequently than that.
>
> Adding Bart and Krzysztof to the discussion as they are currently
> working on power management.

OK, I'm happy to send out the 32-bit version of the patches.  In the
other series Vincent was questioning whether the performance gain of
32-bit would be offset by the extra overhead of needing to handle the
wraparound.  I have no idea what the answer is there.  Who on this
thread would be willing to pick between the two?

My own personal opinion is to take the existing optimized 64-bit
versions I sent out since:
* there's no question that they work and won't regress performance or
functionality
* the speed is nearly equal to the 32-bit version
* the single assembly instruction isn't _too_ ugly.


-Doug

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 3/3] clocksource: exynos_mct: Optimize register reads with ldmia
  2014-06-04 18:05     ` Thomas Gleixner
  (?)
@ 2014-06-12 16:53       ` Doug Anderson
  -1 siblings, 0 replies; 43+ messages in thread
From: Doug Anderson @ 2014-06-12 16:53 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Kukjin Kim, Tomasz Figa, Daniel Lezcano, Vincent Guittot,
	Chirantan Ekbote, David Riley, Olof Johansson, linux-samsung-soc,
	linux-kernel, linux-arm-kernel

Thomas,

On Wed, Jun 4, 2014 at 11:05 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Wed, 4 Jun 2014, Doug Anderson wrote:
>
>> As we saw in (clocksource: exynos_mct: cache mct upper count), the
>> time spent reading the MCT shows up fairly high in real-world
>> profiles.  That means that it's worth some optimization.
>>
>> We get a roughly 10% speedup in userspace gettimeofday() by using an
>> ldmia to read the two halfs of the MCT.  That seems like a worthwhile
>> thing to do.
>>
>> Before: 1173084 us for 1000000 gettimeofday in userspace
>> After:  1045674 us for 1000000 gettimeofday in userspace
>>
>> NOTE: we could actually do better than this if we really wanted to.
>> Technically we could register the clocksource as a 32-bit timer and
>> only use the "lower" half.  Doing so brings us down to 1014429 us for
>> 1000000 gettimeofday in userspace (and doesn't even require assembly
>> code).  That would be an alternative to this change.
>
> I was about to ask exactly that question: What's the advantage of the
> 64 bit dance there? AFAICT nothing.

Are you explicitly naking the 64-bit version of these patches?  If so
I'll send out the 32-bit version right away.  If nothing else we
should get the ftrace fix (patch 1 in this series) landed ASAP since
that fixes a regression.  I'd like to make the 32-bit decision first
though since fixing the regression is easier in the 32-bit version.

Roughly: with the 64-bit version there's no question that we're not
regressing anyone's functionality or performance and it's nearly as
fast as the 32-bit version.  The 32-bit version is simpler / faster
but has the potential to cause (unknown) problems.

-Doug

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 3/3] clocksource: exynos_mct: Optimize register reads with ldmia
@ 2014-06-12 16:53       ` Doug Anderson
  0 siblings, 0 replies; 43+ messages in thread
From: Doug Anderson @ 2014-06-12 16:53 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Kukjin Kim, Tomasz Figa, Daniel Lezcano, Vincent Guittot,
	Chirantan Ekbote, David Riley, Olof Johansson, linux-samsung-soc,
	linux-kernel, linux-arm-kernel

Thomas,

On Wed, Jun 4, 2014 at 11:05 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Wed, 4 Jun 2014, Doug Anderson wrote:
>
>> As we saw in (clocksource: exynos_mct: cache mct upper count), the
>> time spent reading the MCT shows up fairly high in real-world
>> profiles.  That means that it's worth some optimization.
>>
>> We get a roughly 10% speedup in userspace gettimeofday() by using an
>> ldmia to read the two halfs of the MCT.  That seems like a worthwhile
>> thing to do.
>>
>> Before: 1173084 us for 1000000 gettimeofday in userspace
>> After:  1045674 us for 1000000 gettimeofday in userspace
>>
>> NOTE: we could actually do better than this if we really wanted to.
>> Technically we could register the clocksource as a 32-bit timer and
>> only use the "lower" half.  Doing so brings us down to 1014429 us for
>> 1000000 gettimeofday in userspace (and doesn't even require assembly
>> code).  That would be an alternative to this change.
>
> I was about to ask exactly that question: What's the advantage of the
> 64 bit dance there? AFAICT nothing.

Are you explicitly naking the 64-bit version of these patches?  If so
I'll send out the 32-bit version right away.  If nothing else we
should get the ftrace fix (patch 1 in this series) landed ASAP since
that fixes a regression.  I'd like to make the 32-bit decision first
though since fixing the regression is easier in the 32-bit version.

Roughly: with the 64-bit version there's no question that we're not
regressing anyone's functionality or performance and it's nearly as
fast as the 32-bit version.  The 32-bit version is simpler / faster
but has the potential to cause (unknown) problems.

-Doug

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 3/3] clocksource: exynos_mct: Optimize register reads with ldmia
@ 2014-06-12 16:53       ` Doug Anderson
  0 siblings, 0 replies; 43+ messages in thread
From: Doug Anderson @ 2014-06-12 16:53 UTC (permalink / raw)
  To: linux-arm-kernel

Thomas,

On Wed, Jun 4, 2014 at 11:05 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Wed, 4 Jun 2014, Doug Anderson wrote:
>
>> As we saw in (clocksource: exynos_mct: cache mct upper count), the
>> time spent reading the MCT shows up fairly high in real-world
>> profiles.  That means that it's worth some optimization.
>>
>> We get a roughly 10% speedup in userspace gettimeofday() by using an
>> ldmia to read the two halfs of the MCT.  That seems like a worthwhile
>> thing to do.
>>
>> Before: 1173084 us for 1000000 gettimeofday in userspace
>> After:  1045674 us for 1000000 gettimeofday in userspace
>>
>> NOTE: we could actually do better than this if we really wanted to.
>> Technically we could register the clocksource as a 32-bit timer and
>> only use the "lower" half.  Doing so brings us down to 1014429 us for
>> 1000000 gettimeofday in userspace (and doesn't even require assembly
>> code).  That would be an alternative to this change.
>
> I was about to ask exactly that question: What's the advantage of the
> 64 bit dance there? AFAICT nothing.

Are you explicitly naking the 64-bit version of these patches?  If so
I'll send out the 32-bit version right away.  If nothing else we
should get the ftrace fix (patch 1 in this series) landed ASAP since
that fixes a regression.  I'd like to make the 32-bit decision first
though since fixing the regression is easier in the 32-bit version.

Roughly: with the 64-bit version there's no question that we're not
regressing anyone's functionality or performance and it's nearly as
fast as the 32-bit version.  The 32-bit version is simpler / faster
but has the potential to cause (unknown) problems.

-Doug

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/3] clocksource: exynos_mct: Fix ftrace
  2014-06-04 17:30 ` Doug Anderson
@ 2014-06-15 21:18   ` Daniel Lezcano
  -1 siblings, 0 replies; 43+ messages in thread
From: Daniel Lezcano @ 2014-06-15 21:18 UTC (permalink / raw)
  To: Doug Anderson, Kukjin Kim, Tomasz Figa
  Cc: Vincent Guittot, Chirantan Ekbote, David Riley, olof,
	linux-samsung-soc, tglx, linux-kernel, linux-arm-kernel

On 06/04/2014 07:30 PM, Doug Anderson wrote:
> In (93bfb76 clocksource: exynos_mct: register sched_clock callback) we
> supported using the MCT as a scheduler clock.  We properly marked
> exynos4_read_sched_clock() as notrace.  However, we then went and
> called another function that _wasn't_ notrace.  That means if you do:
>
>    cd /sys/kernel/debug/tracing/
>    echo function_graph > current_tracer
>
> You'll get a crash.
>
> Fix this (but still let other readers of the MCT be trace-enabled) by
> adding an extra function.  It's important to keep other users of MCT
> traceable because the MCT is actually quite slow.


Hi Doug,

could you elaborate ? I don't get the 'because the MCT ... slow'

Thanks

   -- Daniel

> Signed-off-by: Doug Anderson <dianders@chromium.org>
> ---
>   drivers/clocksource/exynos_mct.c | 9 +++++++--
>   1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/clocksource/exynos_mct.c b/drivers/clocksource/exynos_mct.c
> index 8d64200..ba3a683 100644
> --- a/drivers/clocksource/exynos_mct.c
> +++ b/drivers/clocksource/exynos_mct.c
> @@ -165,7 +165,7 @@ static void exynos4_mct_frc_start(u32 hi, u32 lo)
>   	exynos4_mct_write(reg, EXYNOS4_MCT_G_TCON);
>   }
>
> -static cycle_t exynos4_frc_read(struct clocksource *cs)
> +static inline cycle_t notrace _exynos4_frc_read(void)
>   {
>   	unsigned int lo, hi;
>   	u32 hi2 = __raw_readl(reg_base + EXYNOS4_MCT_G_CNT_U);
> @@ -179,6 +179,11 @@ static cycle_t exynos4_frc_read(struct clocksource *cs)
>   	return ((cycle_t)hi << 32) | lo;
>   }
>
> +static cycle_t exynos4_frc_read(struct clocksource *cs)
> +{
> +	return _exynos4_frc_read();
> +}
> +
>   static void exynos4_frc_resume(struct clocksource *cs)
>   {
>   	exynos4_mct_frc_start(0, 0);
> @@ -195,7 +200,7 @@ struct clocksource mct_frc = {
>
>   static u64 notrace exynos4_read_sched_clock(void)
>   {
> -	return exynos4_frc_read(&mct_frc);
> +	return _exynos4_frc_read();
>   }
>
>   static void __init exynos4_clocksource_init(void)
>


-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 1/3] clocksource: exynos_mct: Fix ftrace
@ 2014-06-15 21:18   ` Daniel Lezcano
  0 siblings, 0 replies; 43+ messages in thread
From: Daniel Lezcano @ 2014-06-15 21:18 UTC (permalink / raw)
  To: linux-arm-kernel

On 06/04/2014 07:30 PM, Doug Anderson wrote:
> In (93bfb76 clocksource: exynos_mct: register sched_clock callback) we
> supported using the MCT as a scheduler clock.  We properly marked
> exynos4_read_sched_clock() as notrace.  However, we then went and
> called another function that _wasn't_ notrace.  That means if you do:
>
>    cd /sys/kernel/debug/tracing/
>    echo function_graph > current_tracer
>
> You'll get a crash.
>
> Fix this (but still let other readers of the MCT be trace-enabled) by
> adding an extra function.  It's important to keep other users of MCT
> traceable because the MCT is actually quite slow.


Hi Doug,

could you elaborate ? I don't get the 'because the MCT ... slow'

Thanks

   -- Daniel

> Signed-off-by: Doug Anderson <dianders@chromium.org>
> ---
>   drivers/clocksource/exynos_mct.c | 9 +++++++--
>   1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/clocksource/exynos_mct.c b/drivers/clocksource/exynos_mct.c
> index 8d64200..ba3a683 100644
> --- a/drivers/clocksource/exynos_mct.c
> +++ b/drivers/clocksource/exynos_mct.c
> @@ -165,7 +165,7 @@ static void exynos4_mct_frc_start(u32 hi, u32 lo)
>   	exynos4_mct_write(reg, EXYNOS4_MCT_G_TCON);
>   }
>
> -static cycle_t exynos4_frc_read(struct clocksource *cs)
> +static inline cycle_t notrace _exynos4_frc_read(void)
>   {
>   	unsigned int lo, hi;
>   	u32 hi2 = __raw_readl(reg_base + EXYNOS4_MCT_G_CNT_U);
> @@ -179,6 +179,11 @@ static cycle_t exynos4_frc_read(struct clocksource *cs)
>   	return ((cycle_t)hi << 32) | lo;
>   }
>
> +static cycle_t exynos4_frc_read(struct clocksource *cs)
> +{
> +	return _exynos4_frc_read();
> +}
> +
>   static void exynos4_frc_resume(struct clocksource *cs)
>   {
>   	exynos4_mct_frc_start(0, 0);
> @@ -195,7 +200,7 @@ struct clocksource mct_frc = {
>
>   static u64 notrace exynos4_read_sched_clock(void)
>   {
> -	return exynos4_frc_read(&mct_frc);
> +	return _exynos4_frc_read();
>   }
>
>   static void __init exynos4_clocksource_init(void)
>


-- 
  <http://www.linaro.org/> Linaro.org ? Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/3] clocksource: exynos_mct: Fix ftrace
  2014-06-15 21:18   ` Daniel Lezcano
  (?)
@ 2014-06-16  4:40     ` Doug Anderson
  -1 siblings, 0 replies; 43+ messages in thread
From: Doug Anderson @ 2014-06-16  4:40 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Kukjin Kim, Tomasz Figa, Vincent Guittot, Chirantan Ekbote,
	David Riley, Olof Johansson, linux-samsung-soc, Thomas Gleixner,
	linux-kernel, linux-arm-kernel

Daniel,

On Sun, Jun 15, 2014 at 2:18 PM, Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
> On 06/04/2014 07:30 PM, Doug Anderson wrote:
>>
>> In (93bfb76 clocksource: exynos_mct: register sched_clock callback) we
>> supported using the MCT as a scheduler clock.  We properly marked
>> exynos4_read_sched_clock() as notrace.  However, we then went and
>> called another function that _wasn't_ notrace.  That means if you do:
>>
>>    cd /sys/kernel/debug/tracing/
>>    echo function_graph > current_tracer
>>
>> You'll get a crash.
>>
>> Fix this (but still let other readers of the MCT be trace-enabled) by
>> adding an extra function.  It's important to keep other users of MCT
>> traceable because the MCT is actually quite slow.
>
>
>
> Hi Doug,
>
> could you elaborate ? I don't get the 'because the MCT ... slow'

Sorry, I was trying to avoid duplication in the series and it's more
obvious when you look at parts 2 and 3 of the series.  ;)

Doing the math (please correct any miscalculations) using the numbers
from the other patches: You can see that the existing code takes
1323852 us for 1000000 gettimeofday in userspace.  The fastest
implementation (just shaving to a 32-bit timer) gets us as fast as
~1000000 us for 1000000 gettimeofday in userspace.

>From profiling, I believe that gettimeofday from userspace is about
50% overhead (system call, multiplication, copies, etc) and about 50%
MCT read.  That means that the fastest you can possibly do an MCT read
is in .5us or 500ns.

I believe an A15 has something like 1 or 2 cycles per instruction.  If
it were 2 cycles per instruction, it can execute a normal instruction
on a 2GHz machine in .5ns.  That means we can execute 1000 normal
instructions in the time it takes to do a since MCT access.

...so I guess that's what I'd call slow.  ;)  What do you think?  I
know that the MCT read shows up in whole system profiles of
gettimeofday.

-Doug

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/3] clocksource: exynos_mct: Fix ftrace
@ 2014-06-16  4:40     ` Doug Anderson
  0 siblings, 0 replies; 43+ messages in thread
From: Doug Anderson @ 2014-06-16  4:40 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Kukjin Kim, Tomasz Figa, Vincent Guittot, Chirantan Ekbote,
	David Riley, Olof Johansson, linux-samsung-soc, Thomas Gleixner,
	linux-kernel, linux-arm-kernel

Daniel,

On Sun, Jun 15, 2014 at 2:18 PM, Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
> On 06/04/2014 07:30 PM, Doug Anderson wrote:
>>
>> In (93bfb76 clocksource: exynos_mct: register sched_clock callback) we
>> supported using the MCT as a scheduler clock.  We properly marked
>> exynos4_read_sched_clock() as notrace.  However, we then went and
>> called another function that _wasn't_ notrace.  That means if you do:
>>
>>    cd /sys/kernel/debug/tracing/
>>    echo function_graph > current_tracer
>>
>> You'll get a crash.
>>
>> Fix this (but still let other readers of the MCT be trace-enabled) by
>> adding an extra function.  It's important to keep other users of MCT
>> traceable because the MCT is actually quite slow.
>
>
>
> Hi Doug,
>
> could you elaborate ? I don't get the 'because the MCT ... slow'

Sorry, I was trying to avoid duplication in the series and it's more
obvious when you look at parts 2 and 3 of the series.  ;)

Doing the math (please correct any miscalculations) using the numbers
from the other patches: You can see that the existing code takes
1323852 us for 1000000 gettimeofday in userspace.  The fastest
implementation (just shaving to a 32-bit timer) gets us as fast as
~1000000 us for 1000000 gettimeofday in userspace.

>From profiling, I believe that gettimeofday from userspace is about
50% overhead (system call, multiplication, copies, etc) and about 50%
MCT read.  That means that the fastest you can possibly do an MCT read
is in .5us or 500ns.

I believe an A15 has something like 1 or 2 cycles per instruction.  If
it were 2 cycles per instruction, it can execute a normal instruction
on a 2GHz machine in .5ns.  That means we can execute 1000 normal
instructions in the time it takes to do a since MCT access.

...so I guess that's what I'd call slow.  ;)  What do you think?  I
know that the MCT read shows up in whole system profiles of
gettimeofday.

-Doug

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 1/3] clocksource: exynos_mct: Fix ftrace
@ 2014-06-16  4:40     ` Doug Anderson
  0 siblings, 0 replies; 43+ messages in thread
From: Doug Anderson @ 2014-06-16  4:40 UTC (permalink / raw)
  To: linux-arm-kernel

Daniel,

On Sun, Jun 15, 2014 at 2:18 PM, Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
> On 06/04/2014 07:30 PM, Doug Anderson wrote:
>>
>> In (93bfb76 clocksource: exynos_mct: register sched_clock callback) we
>> supported using the MCT as a scheduler clock.  We properly marked
>> exynos4_read_sched_clock() as notrace.  However, we then went and
>> called another function that _wasn't_ notrace.  That means if you do:
>>
>>    cd /sys/kernel/debug/tracing/
>>    echo function_graph > current_tracer
>>
>> You'll get a crash.
>>
>> Fix this (but still let other readers of the MCT be trace-enabled) by
>> adding an extra function.  It's important to keep other users of MCT
>> traceable because the MCT is actually quite slow.
>
>
>
> Hi Doug,
>
> could you elaborate ? I don't get the 'because the MCT ... slow'

Sorry, I was trying to avoid duplication in the series and it's more
obvious when you look at parts 2 and 3 of the series.  ;)

Doing the math (please correct any miscalculations) using the numbers
from the other patches: You can see that the existing code takes
1323852 us for 1000000 gettimeofday in userspace.  The fastest
implementation (just shaving to a 32-bit timer) gets us as fast as
~1000000 us for 1000000 gettimeofday in userspace.

>From profiling, I believe that gettimeofday from userspace is about
50% overhead (system call, multiplication, copies, etc) and about 50%
MCT read.  That means that the fastest you can possibly do an MCT read
is in .5us or 500ns.

I believe an A15 has something like 1 or 2 cycles per instruction.  If
it were 2 cycles per instruction, it can execute a normal instruction
on a 2GHz machine in .5ns.  That means we can execute 1000 normal
instructions in the time it takes to do a since MCT access.

...so I guess that's what I'd call slow.  ;)  What do you think?  I
know that the MCT read shows up in whole system profiles of
gettimeofday.

-Doug

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/3] clocksource: exynos_mct: Fix ftrace
  2014-06-16  4:40     ` Doug Anderson
  (?)
@ 2014-06-16  8:52       ` Daniel Lezcano
  -1 siblings, 0 replies; 43+ messages in thread
From: Daniel Lezcano @ 2014-06-16  8:52 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Kukjin Kim, Tomasz Figa, Vincent Guittot, Chirantan Ekbote,
	David Riley, Olof Johansson, linux-samsung-soc, Thomas Gleixner,
	linux-kernel, linux-arm-kernel

On 06/16/2014 06:40 AM, Doug Anderson wrote:
> Daniel,
>
> On Sun, Jun 15, 2014 at 2:18 PM, Daniel Lezcano
> <daniel.lezcano@linaro.org> wrote:
>> On 06/04/2014 07:30 PM, Doug Anderson wrote:
>>>
>>> In (93bfb76 clocksource: exynos_mct: register sched_clock callback) we
>>> supported using the MCT as a scheduler clock.  We properly marked
>>> exynos4_read_sched_clock() as notrace.  However, we then went and
>>> called another function that _wasn't_ notrace.  That means if you do:
>>>
>>>     cd /sys/kernel/debug/tracing/
>>>     echo function_graph > current_tracer
>>>
>>> You'll get a crash.
>>>
>>> Fix this (but still let other readers of the MCT be trace-enabled) by
>>> adding an extra function.  It's important to keep other users of MCT
>>> traceable because the MCT is actually quite slow.
>>
>>
>>
>> Hi Doug,
>>
>> could you elaborate ? I don't get the 'because the MCT ... slow'
>
> Sorry, I was trying to avoid duplication in the series and it's more
> obvious when you look at parts 2 and 3 of the series.  ;)
>
> Doing the math (please correct any miscalculations) using the numbers
> from the other patches: You can see that the existing code takes
> 1323852 us for 1000000 gettimeofday in userspace.  The fastest
> implementation (just shaving to a 32-bit timer) gets us as fast as
> ~1000000 us for 1000000 gettimeofday in userspace.
>
>  From profiling, I believe that gettimeofday from userspace is about
> 50% overhead (system call, multiplication, copies, etc) and about 50%
> MCT read.  That means that the fastest you can possibly do an MCT read
> is in .5us or 500ns.
>
> I believe an A15 has something like 1 or 2 cycles per instruction.  If
> it were 2 cycles per instruction, it can execute a normal instruction
> on a 2GHz machine in .5ns.  That means we can execute 1000 normal
> instructions in the time it takes to do a since MCT access.
>
> ...so I guess that's what I'd call slow.  ;)  What do you think?  I
> know that the MCT read shows up in whole system profiles of
> gettimeofday.

Hi Dough,

thanks for the explanation. I still don't get why it is important to 
keep others users of mct traceable because it is quite slow ? May be it 
is what you explained here, but I miss the connection between 'the other 
users' <-> 'traceable' <-> 'because slow'.

-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/3] clocksource: exynos_mct: Fix ftrace
@ 2014-06-16  8:52       ` Daniel Lezcano
  0 siblings, 0 replies; 43+ messages in thread
From: Daniel Lezcano @ 2014-06-16  8:52 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Kukjin Kim, Tomasz Figa, Vincent Guittot, Chirantan Ekbote,
	David Riley, Olof Johansson, linux-samsung-soc, Thomas Gleixner,
	linux-kernel, linux-arm-kernel

On 06/16/2014 06:40 AM, Doug Anderson wrote:
> Daniel,
>
> On Sun, Jun 15, 2014 at 2:18 PM, Daniel Lezcano
> <daniel.lezcano@linaro.org> wrote:
>> On 06/04/2014 07:30 PM, Doug Anderson wrote:
>>>
>>> In (93bfb76 clocksource: exynos_mct: register sched_clock callback) we
>>> supported using the MCT as a scheduler clock.  We properly marked
>>> exynos4_read_sched_clock() as notrace.  However, we then went and
>>> called another function that _wasn't_ notrace.  That means if you do:
>>>
>>>     cd /sys/kernel/debug/tracing/
>>>     echo function_graph > current_tracer
>>>
>>> You'll get a crash.
>>>
>>> Fix this (but still let other readers of the MCT be trace-enabled) by
>>> adding an extra function.  It's important to keep other users of MCT
>>> traceable because the MCT is actually quite slow.
>>
>>
>>
>> Hi Doug,
>>
>> could you elaborate ? I don't get the 'because the MCT ... slow'
>
> Sorry, I was trying to avoid duplication in the series and it's more
> obvious when you look at parts 2 and 3 of the series.  ;)
>
> Doing the math (please correct any miscalculations) using the numbers
> from the other patches: You can see that the existing code takes
> 1323852 us for 1000000 gettimeofday in userspace.  The fastest
> implementation (just shaving to a 32-bit timer) gets us as fast as
> ~1000000 us for 1000000 gettimeofday in userspace.
>
>  From profiling, I believe that gettimeofday from userspace is about
> 50% overhead (system call, multiplication, copies, etc) and about 50%
> MCT read.  That means that the fastest you can possibly do an MCT read
> is in .5us or 500ns.
>
> I believe an A15 has something like 1 or 2 cycles per instruction.  If
> it were 2 cycles per instruction, it can execute a normal instruction
> on a 2GHz machine in .5ns.  That means we can execute 1000 normal
> instructions in the time it takes to do a since MCT access.
>
> ...so I guess that's what I'd call slow.  ;)  What do you think?  I
> know that the MCT read shows up in whole system profiles of
> gettimeofday.

Hi Dough,

thanks for the explanation. I still don't get why it is important to 
keep others users of mct traceable because it is quite slow ? May be it 
is what you explained here, but I miss the connection between 'the other 
users' <-> 'traceable' <-> 'because slow'.

-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 1/3] clocksource: exynos_mct: Fix ftrace
@ 2014-06-16  8:52       ` Daniel Lezcano
  0 siblings, 0 replies; 43+ messages in thread
From: Daniel Lezcano @ 2014-06-16  8:52 UTC (permalink / raw)
  To: linux-arm-kernel

On 06/16/2014 06:40 AM, Doug Anderson wrote:
> Daniel,
>
> On Sun, Jun 15, 2014 at 2:18 PM, Daniel Lezcano
> <daniel.lezcano@linaro.org> wrote:
>> On 06/04/2014 07:30 PM, Doug Anderson wrote:
>>>
>>> In (93bfb76 clocksource: exynos_mct: register sched_clock callback) we
>>> supported using the MCT as a scheduler clock.  We properly marked
>>> exynos4_read_sched_clock() as notrace.  However, we then went and
>>> called another function that _wasn't_ notrace.  That means if you do:
>>>
>>>     cd /sys/kernel/debug/tracing/
>>>     echo function_graph > current_tracer
>>>
>>> You'll get a crash.
>>>
>>> Fix this (but still let other readers of the MCT be trace-enabled) by
>>> adding an extra function.  It's important to keep other users of MCT
>>> traceable because the MCT is actually quite slow.
>>
>>
>>
>> Hi Doug,
>>
>> could you elaborate ? I don't get the 'because the MCT ... slow'
>
> Sorry, I was trying to avoid duplication in the series and it's more
> obvious when you look at parts 2 and 3 of the series.  ;)
>
> Doing the math (please correct any miscalculations) using the numbers
> from the other patches: You can see that the existing code takes
> 1323852 us for 1000000 gettimeofday in userspace.  The fastest
> implementation (just shaving to a 32-bit timer) gets us as fast as
> ~1000000 us for 1000000 gettimeofday in userspace.
>
>  From profiling, I believe that gettimeofday from userspace is about
> 50% overhead (system call, multiplication, copies, etc) and about 50%
> MCT read.  That means that the fastest you can possibly do an MCT read
> is in .5us or 500ns.
>
> I believe an A15 has something like 1 or 2 cycles per instruction.  If
> it were 2 cycles per instruction, it can execute a normal instruction
> on a 2GHz machine in .5ns.  That means we can execute 1000 normal
> instructions in the time it takes to do a since MCT access.
>
> ...so I guess that's what I'd call slow.  ;)  What do you think?  I
> know that the MCT read shows up in whole system profiles of
> gettimeofday.

Hi Dough,

thanks for the explanation. I still don't get why it is important to 
keep others users of mct traceable because it is quite slow ? May be it 
is what you explained here, but I miss the connection between 'the other 
users' <-> 'traceable' <-> 'because slow'.

-- 
  <http://www.linaro.org/> Linaro.org ? Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/3] clocksource: exynos_mct: Fix ftrace
  2014-06-16  8:52       ` Daniel Lezcano
  (?)
@ 2014-06-16 16:35         ` Doug Anderson
  -1 siblings, 0 replies; 43+ messages in thread
From: Doug Anderson @ 2014-06-16 16:35 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Kukjin Kim, Tomasz Figa, Vincent Guittot, Chirantan Ekbote,
	David Riley, Olof Johansson, linux-samsung-soc, Thomas Gleixner,
	linux-kernel, linux-arm-kernel

Daniel,

On Mon, Jun 16, 2014 at 1:52 AM, Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
> Hi Dough,
>
> thanks for the explanation. I still don't get why it is important to keep
> others users of mct traceable because it is quite slow ? May be it is what
> you explained here, but I miss the connection between 'the other users' <->
> 'traceable' <-> 'because slow'.

It stems from the fact that one use of trace is to figure out why
things are slow.  In that case you want to make sure that the slow
things are traceable.

I can run the following:

cd /sys/kernel/debug/tracing
echo nop > current_tracer
echo 0 > options/sleep-time
echo 0 > options/graph-time
echo 1 > function_profile_enabled
~/gettimeofday 100000
echo 0 > function_profile_enabled

Then I can see this:

  Function                               Hit    Time            Avg
         s^2
  --------                               ---    ----            ---
         ---
 __getnstimeofday                    100005    258267.8 us     2.582
us        447196.9 us
  do_gettimeofday                     100004    252395.6 us     2.523
us        462549.6 us
  SyS_gettimeofday                    100004    249353.6 us     2.493
us        1223308 us
  getnstimeofday                      100005    246217.2 us     2.462
us        369149.8 us
  exynos4_frc_read                    101096    181620.0 us     1.796
us        157413.8 us

...if I mark exynos4_frc_read() as notrace then it doesn't show up in
the profile, which is unfortunate.

NOTE: the above profile is a bit misleading in terms of actual time
taken my belief is that we end up with some confusing numbers since
profiling uses the MCT itself (I think) to record how long a function
took to execute.  Despite this it's still awfully nice that
exynos4_frc_read() actually shows up in the trace...

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/3] clocksource: exynos_mct: Fix ftrace
@ 2014-06-16 16:35         ` Doug Anderson
  0 siblings, 0 replies; 43+ messages in thread
From: Doug Anderson @ 2014-06-16 16:35 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Kukjin Kim, Tomasz Figa, Vincent Guittot, Chirantan Ekbote,
	David Riley, Olof Johansson, linux-samsung-soc, Thomas Gleixner,
	linux-kernel, linux-arm-kernel

Daniel,

On Mon, Jun 16, 2014 at 1:52 AM, Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
> Hi Dough,
>
> thanks for the explanation. I still don't get why it is important to keep
> others users of mct traceable because it is quite slow ? May be it is what
> you explained here, but I miss the connection between 'the other users' <->
> 'traceable' <-> 'because slow'.

It stems from the fact that one use of trace is to figure out why
things are slow.  In that case you want to make sure that the slow
things are traceable.

I can run the following:

cd /sys/kernel/debug/tracing
echo nop > current_tracer
echo 0 > options/sleep-time
echo 0 > options/graph-time
echo 1 > function_profile_enabled
~/gettimeofday 100000
echo 0 > function_profile_enabled

Then I can see this:

  Function                               Hit    Time            Avg
         s^2
  --------                               ---    ----            ---
         ---
 __getnstimeofday                    100005    258267.8 us     2.582
us        447196.9 us
  do_gettimeofday                     100004    252395.6 us     2.523
us        462549.6 us
  SyS_gettimeofday                    100004    249353.6 us     2.493
us        1223308 us
  getnstimeofday                      100005    246217.2 us     2.462
us        369149.8 us
  exynos4_frc_read                    101096    181620.0 us     1.796
us        157413.8 us

...if I mark exynos4_frc_read() as notrace then it doesn't show up in
the profile, which is unfortunate.

NOTE: the above profile is a bit misleading in terms of actual time
taken my belief is that we end up with some confusing numbers since
profiling uses the MCT itself (I think) to record how long a function
took to execute.  Despite this it's still awfully nice that
exynos4_frc_read() actually shows up in the trace...

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 1/3] clocksource: exynos_mct: Fix ftrace
@ 2014-06-16 16:35         ` Doug Anderson
  0 siblings, 0 replies; 43+ messages in thread
From: Doug Anderson @ 2014-06-16 16:35 UTC (permalink / raw)
  To: linux-arm-kernel

Daniel,

On Mon, Jun 16, 2014 at 1:52 AM, Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
> Hi Dough,
>
> thanks for the explanation. I still don't get why it is important to keep
> others users of mct traceable because it is quite slow ? May be it is what
> you explained here, but I miss the connection between 'the other users' <->
> 'traceable' <-> 'because slow'.

It stems from the fact that one use of trace is to figure out why
things are slow.  In that case you want to make sure that the slow
things are traceable.

I can run the following:

cd /sys/kernel/debug/tracing
echo nop > current_tracer
echo 0 > options/sleep-time
echo 0 > options/graph-time
echo 1 > function_profile_enabled
~/gettimeofday 100000
echo 0 > function_profile_enabled

Then I can see this:

  Function                               Hit    Time            Avg
         s^2
  --------                               ---    ----            ---
         ---
 __getnstimeofday                    100005    258267.8 us     2.582
us        447196.9 us
  do_gettimeofday                     100004    252395.6 us     2.523
us        462549.6 us
  SyS_gettimeofday                    100004    249353.6 us     2.493
us        1223308 us
  getnstimeofday                      100005    246217.2 us     2.462
us        369149.8 us
  exynos4_frc_read                    101096    181620.0 us     1.796
us        157413.8 us

...if I mark exynos4_frc_read() as notrace then it doesn't show up in
the profile, which is unfortunate.

NOTE: the above profile is a bit misleading in terms of actual time
taken my belief is that we end up with some confusing numbers since
profiling uses the MCT itself (I think) to record how long a function
took to execute.  Despite this it's still awfully nice that
exynos4_frc_read() actually shows up in the trace...

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/3] clocksource: exynos_mct: Fix ftrace
  2014-06-04 17:30 ` Doug Anderson
@ 2014-06-17 12:13   ` Daniel Lezcano
  -1 siblings, 0 replies; 43+ messages in thread
From: Daniel Lezcano @ 2014-06-17 12:13 UTC (permalink / raw)
  To: Doug Anderson, Kukjin Kim, Tomasz Figa
  Cc: Vincent Guittot, Chirantan Ekbote, David Riley, olof,
	linux-samsung-soc, tglx, linux-kernel, linux-arm-kernel

On 06/04/2014 07:30 PM, Doug Anderson wrote:
> In (93bfb76 clocksource: exynos_mct: register sched_clock callback) we
> supported using the MCT as a scheduler clock.  We properly marked
> exynos4_read_sched_clock() as notrace.  However, we then went and
> called another function that _wasn't_ notrace.  That means if you do:
>
>    cd /sys/kernel/debug/tracing/
>    echo function_graph > current_tracer
>
> You'll get a crash.
>
> Fix this (but still let other readers of the MCT be trace-enabled) by
> adding an extra function.  It's important to keep other users of MCT
> traceable because the MCT is actually quite slow.

Thanks for the explanation in the other email.

I think the last sentence is a bit confusing because you are implicitly 
saying you need these traces to investigate why the timer is slow which 
is referring to something not related to this fix.

> Signed-off-by: Doug Anderson <dianders@chromium.org>
> ---
>   drivers/clocksource/exynos_mct.c | 9 +++++++--
>   1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/clocksource/exynos_mct.c b/drivers/clocksource/exynos_mct.c
> index 8d64200..ba3a683 100644
> --- a/drivers/clocksource/exynos_mct.c
> +++ b/drivers/clocksource/exynos_mct.c
> @@ -165,7 +165,7 @@ static void exynos4_mct_frc_start(u32 hi, u32 lo)
>   	exynos4_mct_write(reg, EXYNOS4_MCT_G_TCON);
>   }
>
> -static cycle_t exynos4_frc_read(struct clocksource *cs)
> +static inline cycle_t notrace _exynos4_frc_read(void)

Why inline ?

>   {
>   	unsigned int lo, hi;
>   	u32 hi2 = __raw_readl(reg_base + EXYNOS4_MCT_G_CNT_U);
> @@ -179,6 +179,11 @@ static cycle_t exynos4_frc_read(struct clocksource *cs)
>   	return ((cycle_t)hi << 32) | lo;
>   }
>
> +static cycle_t exynos4_frc_read(struct clocksource *cs)
> +{
> +	return _exynos4_frc_read();
> +}
> +
>   static void exynos4_frc_resume(struct clocksource *cs)
>   {
>   	exynos4_mct_frc_start(0, 0);
> @@ -195,7 +200,7 @@ struct clocksource mct_frc = {
>
>   static u64 notrace exynos4_read_sched_clock(void)
>   {
> -	return exynos4_frc_read(&mct_frc);
> +	return _exynos4_frc_read();
>   }
>
>   static void __init exynos4_clocksource_init(void)
>


-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 1/3] clocksource: exynos_mct: Fix ftrace
@ 2014-06-17 12:13   ` Daniel Lezcano
  0 siblings, 0 replies; 43+ messages in thread
From: Daniel Lezcano @ 2014-06-17 12:13 UTC (permalink / raw)
  To: linux-arm-kernel

On 06/04/2014 07:30 PM, Doug Anderson wrote:
> In (93bfb76 clocksource: exynos_mct: register sched_clock callback) we
> supported using the MCT as a scheduler clock.  We properly marked
> exynos4_read_sched_clock() as notrace.  However, we then went and
> called another function that _wasn't_ notrace.  That means if you do:
>
>    cd /sys/kernel/debug/tracing/
>    echo function_graph > current_tracer
>
> You'll get a crash.
>
> Fix this (but still let other readers of the MCT be trace-enabled) by
> adding an extra function.  It's important to keep other users of MCT
> traceable because the MCT is actually quite slow.

Thanks for the explanation in the other email.

I think the last sentence is a bit confusing because you are implicitly 
saying you need these traces to investigate why the timer is slow which 
is referring to something not related to this fix.

> Signed-off-by: Doug Anderson <dianders@chromium.org>
> ---
>   drivers/clocksource/exynos_mct.c | 9 +++++++--
>   1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/clocksource/exynos_mct.c b/drivers/clocksource/exynos_mct.c
> index 8d64200..ba3a683 100644
> --- a/drivers/clocksource/exynos_mct.c
> +++ b/drivers/clocksource/exynos_mct.c
> @@ -165,7 +165,7 @@ static void exynos4_mct_frc_start(u32 hi, u32 lo)
>   	exynos4_mct_write(reg, EXYNOS4_MCT_G_TCON);
>   }
>
> -static cycle_t exynos4_frc_read(struct clocksource *cs)
> +static inline cycle_t notrace _exynos4_frc_read(void)

Why inline ?

>   {
>   	unsigned int lo, hi;
>   	u32 hi2 = __raw_readl(reg_base + EXYNOS4_MCT_G_CNT_U);
> @@ -179,6 +179,11 @@ static cycle_t exynos4_frc_read(struct clocksource *cs)
>   	return ((cycle_t)hi << 32) | lo;
>   }
>
> +static cycle_t exynos4_frc_read(struct clocksource *cs)
> +{
> +	return _exynos4_frc_read();
> +}
> +
>   static void exynos4_frc_resume(struct clocksource *cs)
>   {
>   	exynos4_mct_frc_start(0, 0);
> @@ -195,7 +200,7 @@ struct clocksource mct_frc = {
>
>   static u64 notrace exynos4_read_sched_clock(void)
>   {
> -	return exynos4_frc_read(&mct_frc);
> +	return _exynos4_frc_read();
>   }
>
>   static void __init exynos4_clocksource_init(void)
>


-- 
  <http://www.linaro.org/> Linaro.org ? Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/3] clocksource: exynos_mct: Fix ftrace
  2014-06-17 12:13   ` Daniel Lezcano
  (?)
@ 2014-06-19 17:07     ` Doug Anderson
  -1 siblings, 0 replies; 43+ messages in thread
From: Doug Anderson @ 2014-06-19 17:07 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Kukjin Kim, Tomasz Figa, Vincent Guittot, Chirantan Ekbote,
	David Riley, Olof Johansson, linux-samsung-soc, Thomas Gleixner,
	linux-kernel, linux-arm-kernel

Daniel,

On Tue, Jun 17, 2014 at 5:13 AM, Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
> On 06/04/2014 07:30 PM, Doug Anderson wrote:
>>
>> In (93bfb76 clocksource: exynos_mct: register sched_clock callback) we
>> supported using the MCT as a scheduler clock.  We properly marked
>> exynos4_read_sched_clock() as notrace.  However, we then went and
>> called another function that _wasn't_ notrace.  That means if you do:
>>
>>    cd /sys/kernel/debug/tracing/
>>    echo function_graph > current_tracer
>>
>> You'll get a crash.
>>
>> Fix this (but still let other readers of the MCT be trace-enabled) by
>> adding an extra function.  It's important to keep other users of MCT
>> traceable because the MCT is actually quite slow.
>
>
> Thanks for the explanation in the other email.
>
> I think the last sentence is a bit confusing because you are implicitly
> saying you need these traces to investigate why the timer is slow which is
> referring to something not related to this fix.

Done

>> Signed-off-by: Doug Anderson <dianders@chromium.org>
>> ---
>>   drivers/clocksource/exynos_mct.c | 9 +++++++--
>>   1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/clocksource/exynos_mct.c
>> b/drivers/clocksource/exynos_mct.c
>> index 8d64200..ba3a683 100644
>> --- a/drivers/clocksource/exynos_mct.c
>> +++ b/drivers/clocksource/exynos_mct.c
>> @@ -165,7 +165,7 @@ static void exynos4_mct_frc_start(u32 hi, u32 lo)
>>         exynos4_mct_write(reg, EXYNOS4_MCT_G_TCON);
>>   }
>>
>> -static cycle_t exynos4_frc_read(struct clocksource *cs)
>> +static inline cycle_t notrace _exynos4_frc_read(void)
>
>
> Why inline ?

Somehow I thought that without the "inline" that somehow
exynos4_frc_read() would not be traceable.  ...but this makes no sense
so I've removed.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/3] clocksource: exynos_mct: Fix ftrace
@ 2014-06-19 17:07     ` Doug Anderson
  0 siblings, 0 replies; 43+ messages in thread
From: Doug Anderson @ 2014-06-19 17:07 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Kukjin Kim, Tomasz Figa, Vincent Guittot, Chirantan Ekbote,
	David Riley, Olof Johansson, linux-samsung-soc, Thomas Gleixner,
	linux-kernel, linux-arm-kernel

Daniel,

On Tue, Jun 17, 2014 at 5:13 AM, Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
> On 06/04/2014 07:30 PM, Doug Anderson wrote:
>>
>> In (93bfb76 clocksource: exynos_mct: register sched_clock callback) we
>> supported using the MCT as a scheduler clock.  We properly marked
>> exynos4_read_sched_clock() as notrace.  However, we then went and
>> called another function that _wasn't_ notrace.  That means if you do:
>>
>>    cd /sys/kernel/debug/tracing/
>>    echo function_graph > current_tracer
>>
>> You'll get a crash.
>>
>> Fix this (but still let other readers of the MCT be trace-enabled) by
>> adding an extra function.  It's important to keep other users of MCT
>> traceable because the MCT is actually quite slow.
>
>
> Thanks for the explanation in the other email.
>
> I think the last sentence is a bit confusing because you are implicitly
> saying you need these traces to investigate why the timer is slow which is
> referring to something not related to this fix.

Done

>> Signed-off-by: Doug Anderson <dianders@chromium.org>
>> ---
>>   drivers/clocksource/exynos_mct.c | 9 +++++++--
>>   1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/clocksource/exynos_mct.c
>> b/drivers/clocksource/exynos_mct.c
>> index 8d64200..ba3a683 100644
>> --- a/drivers/clocksource/exynos_mct.c
>> +++ b/drivers/clocksource/exynos_mct.c
>> @@ -165,7 +165,7 @@ static void exynos4_mct_frc_start(u32 hi, u32 lo)
>>         exynos4_mct_write(reg, EXYNOS4_MCT_G_TCON);
>>   }
>>
>> -static cycle_t exynos4_frc_read(struct clocksource *cs)
>> +static inline cycle_t notrace _exynos4_frc_read(void)
>
>
> Why inline ?

Somehow I thought that without the "inline" that somehow
exynos4_frc_read() would not be traceable.  ...but this makes no sense
so I've removed.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 1/3] clocksource: exynos_mct: Fix ftrace
@ 2014-06-19 17:07     ` Doug Anderson
  0 siblings, 0 replies; 43+ messages in thread
From: Doug Anderson @ 2014-06-19 17:07 UTC (permalink / raw)
  To: linux-arm-kernel

Daniel,

On Tue, Jun 17, 2014 at 5:13 AM, Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
> On 06/04/2014 07:30 PM, Doug Anderson wrote:
>>
>> In (93bfb76 clocksource: exynos_mct: register sched_clock callback) we
>> supported using the MCT as a scheduler clock.  We properly marked
>> exynos4_read_sched_clock() as notrace.  However, we then went and
>> called another function that _wasn't_ notrace.  That means if you do:
>>
>>    cd /sys/kernel/debug/tracing/
>>    echo function_graph > current_tracer
>>
>> You'll get a crash.
>>
>> Fix this (but still let other readers of the MCT be trace-enabled) by
>> adding an extra function.  It's important to keep other users of MCT
>> traceable because the MCT is actually quite slow.
>
>
> Thanks for the explanation in the other email.
>
> I think the last sentence is a bit confusing because you are implicitly
> saying you need these traces to investigate why the timer is slow which is
> referring to something not related to this fix.

Done

>> Signed-off-by: Doug Anderson <dianders@chromium.org>
>> ---
>>   drivers/clocksource/exynos_mct.c | 9 +++++++--
>>   1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/clocksource/exynos_mct.c
>> b/drivers/clocksource/exynos_mct.c
>> index 8d64200..ba3a683 100644
>> --- a/drivers/clocksource/exynos_mct.c
>> +++ b/drivers/clocksource/exynos_mct.c
>> @@ -165,7 +165,7 @@ static void exynos4_mct_frc_start(u32 hi, u32 lo)
>>         exynos4_mct_write(reg, EXYNOS4_MCT_G_TCON);
>>   }
>>
>> -static cycle_t exynos4_frc_read(struct clocksource *cs)
>> +static inline cycle_t notrace _exynos4_frc_read(void)
>
>
> Why inline ?

Somehow I thought that without the "inline" that somehow
exynos4_frc_read() would not be traceable.  ...but this makes no sense
so I've removed.

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2014-06-19 17:07 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-04 17:30 [PATCH 1/3] clocksource: exynos_mct: Fix ftrace Doug Anderson
2014-06-04 17:30 ` Doug Anderson
2014-06-04 17:30 ` Doug Anderson
2014-06-04 17:30 ` [PATCH 2/3] clocksource: exynos_mct: cache mct upper count Doug Anderson
2014-06-04 17:30   ` Doug Anderson
2014-06-04 17:30   ` Doug Anderson
2014-06-05  7:55   ` Vincent Guittot
2014-06-05  7:55     ` Vincent Guittot
2014-06-05 17:14     ` Doug Anderson
2014-06-05 17:14       ` Doug Anderson
2014-06-04 17:30 ` [PATCH 3/3] clocksource: exynos_mct: Optimize register reads with ldmia Doug Anderson
2014-06-04 17:30   ` Doug Anderson
2014-06-04 17:30   ` Doug Anderson
2014-06-04 18:05   ` Thomas Gleixner
2014-06-04 18:05     ` Thomas Gleixner
2014-06-04 18:49     ` Doug Anderson
2014-06-04 18:49       ` Doug Anderson
2014-06-04 18:49       ` Doug Anderson
2014-06-05 11:18       ` Tomasz Figa
2014-06-05 11:18         ` Tomasz Figa
2014-06-05 11:18         ` Tomasz Figa
2014-06-05 18:21         ` Doug Anderson
2014-06-05 18:21           ` Doug Anderson
2014-06-05 18:21           ` Doug Anderson
2014-06-12 16:53     ` Doug Anderson
2014-06-12 16:53       ` Doug Anderson
2014-06-12 16:53       ` Doug Anderson
2014-06-15 21:18 ` [PATCH 1/3] clocksource: exynos_mct: Fix ftrace Daniel Lezcano
2014-06-15 21:18   ` Daniel Lezcano
2014-06-16  4:40   ` Doug Anderson
2014-06-16  4:40     ` Doug Anderson
2014-06-16  4:40     ` Doug Anderson
2014-06-16  8:52     ` Daniel Lezcano
2014-06-16  8:52       ` Daniel Lezcano
2014-06-16  8:52       ` Daniel Lezcano
2014-06-16 16:35       ` Doug Anderson
2014-06-16 16:35         ` Doug Anderson
2014-06-16 16:35         ` Doug Anderson
2014-06-17 12:13 ` Daniel Lezcano
2014-06-17 12:13   ` Daniel Lezcano
2014-06-19 17:07   ` Doug Anderson
2014-06-19 17:07     ` Doug Anderson
2014-06-19 17:07     ` Doug Anderson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.