linux-rt-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/3] oslat: rename cpu_mhz/cpu_hz to timer_mhz/cpu_hz
@ 2021-09-08 10:02 Nicolas Saenz Julienne
  2021-09-08 10:02 ` [PATCH 2/3] oslat: Add aarch64 support Nicolas Saenz Julienne
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Nicolas Saenz Julienne @ 2021-09-08 10:02 UTC (permalink / raw)
  To: linux-rt-users, peterx; +Cc: williams, jkacur, nsaenzju

'cpu_mhz' in oslat actually represents the frequency at which the high
frequency timer we measure with ticks. There is no need for it to match
the CPU frequency, nor will do on all supported architectures. So rename
it to 'timer_mhz' in order to better match reality.

Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
---
 src/oslat/oslat.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/src/oslat/oslat.c b/src/oslat/oslat.c
index 6ff5ba8..a4aa5f1 100644
--- a/src/oslat/oslat.c
+++ b/src/oslat/oslat.c
@@ -123,7 +123,7 @@ struct thread {
 	pthread_t            thread_id;
 
 	/* NOTE! this is also how many ticks per us */
-	unsigned int         cpu_mhz;
+	unsigned int         timer_mhz;
 	cycles_t             int_total;
 	stamp_t              frc_start;
 	stamp_t              frc_stop;
@@ -228,7 +228,7 @@ static int move_to_core(int core_i)
 	return sched_setaffinity(0, sizeof(cpus), &cpus);
 }
 
-static cycles_t __measure_cpu_hz(void)
+static cycles_t __measure_timer_hz(void)
 {
 	struct timeval tvs, tve;
 	stamp_t s, e;
@@ -244,13 +244,13 @@ static cycles_t __measure_cpu_hz(void)
 	return (cycles_t) ((e - s) / sec);
 }
 
-static unsigned int measure_cpu_mhz(void)
+static unsigned int measure_timer_mhz(void)
 {
 	cycles_t m, mprev, d;
 
-	mprev = __measure_cpu_hz();
+	mprev = __measure_timer_hz();
 	do {
-		m = __measure_cpu_hz();
+		m = __measure_timer_hz();
 		if (m > mprev)
 			d = m - mprev;
 		else
@@ -263,7 +263,7 @@ static unsigned int measure_cpu_mhz(void)
 
 static void thread_init(struct thread *t)
 {
-	t->cpu_mhz = measure_cpu_mhz();
+	t->timer_mhz = measure_timer_mhz();
 	t->maxlat = 0;
 	t->overflow_sum = 0;
 	t->minlat = (uint64_t)-1;
@@ -288,7 +288,7 @@ static void thread_init(struct thread *t)
 
 static float cycles_to_sec(const struct thread *t, uint64_t cycles)
 {
-	return cycles / (t->cpu_mhz * 1e6);
+	return cycles / (t->timer_mhz * 1e6);
 }
 
 static void insert_bucket(struct thread *t, stamp_t value)
@@ -296,7 +296,7 @@ static void insert_bucket(struct thread *t, stamp_t value)
 	int index, us;
 	uint64_t extra;
 
-	index = value / t->cpu_mhz;
+	index = value / t->timer_mhz;
 	assert(index >= 0);
 	us = index + 1;
 	assert(us > 0);
@@ -450,7 +450,7 @@ static void write_summary(struct thread *t)
 	calculate(t);
 
 	putfield("Core", t[i].core_i, "d", "");
-	putfield("CPU Freq", t[i].cpu_mhz, "u", " (Mhz)");
+	putfield("Timer Freq", t[i].timer_mhz, "u", " (Mhz)");
 
 	for (j = 0; j < g.bucket_size; j++) {
 		if (j < g.bucket_size-1 && g.output_omit_zero_buckets) {
@@ -494,7 +494,7 @@ static void write_summary_json(FILE *f, void *data)
 	for (i = 0; i < g.n_threads; ++i) {
 		fprintf(f, "    \"%u\": {\n", i);
 		fprintf(f, "      \"cpu\": %d,\n", t[i].core_i);
-		fprintf(f, "      \"freq\": %d,\n", t[i].cpu_mhz);
+		fprintf(f, "      \"freq\": %d,\n", t[i].timer_mhz);
 		fprintf(f, "      \"min\": %" PRIu64 ",\n", t[i].minlat);
 		fprintf(f, "      \"avg\": %3lf,\n", t[i].average);
 		fprintf(f, "      \"max\": %" PRIu64 ",\n", t[i].maxlat);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/3] oslat: Add aarch64 support
  2021-09-08 10:02 [PATCH 1/3] oslat: rename cpu_mhz/cpu_hz to timer_mhz/cpu_hz Nicolas Saenz Julienne
@ 2021-09-08 10:02 ` Nicolas Saenz Julienne
  2021-09-08 18:09   ` Peter Xu
  2021-09-08 10:02 ` [PATCH 3/3] oslat: Allow for arch specific timer frequency measurements Nicolas Saenz Julienne
  2021-09-08 14:40 ` [PATCH 1/3] oslat: rename cpu_mhz/cpu_hz to timer_mhz/cpu_hz Peter Xu
  2 siblings, 1 reply; 18+ messages in thread
From: Nicolas Saenz Julienne @ 2021-09-08 10:02 UTC (permalink / raw)
  To: linux-rt-users, peterx; +Cc: williams, jkacur, nsaenzju

The callbacks are based on Linux's implementation:
 - CNTVCT_EL0 provides direct access to the system virtual timer[1].
 - 'yield' serves as a CPU hint with similar semantics as x86's
   'pause'[2].

[1] See Linux's '__arch_get_hw_counter()' in arch/arm64/include/asm/vdso/gettimeofday.h
[2] See Linux's 1baa82f4803 ("arm64: Implement cpu_relax as yield").
Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
---
 src/oslat/oslat.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/src/oslat/oslat.c b/src/oslat/oslat.c
index a4aa5f1..bd155a6 100644
--- a/src/oslat/oslat.c
+++ b/src/oslat/oslat.c
@@ -71,6 +71,19 @@ static inline void frc(uint64_t *pval)
 {
 	__asm__ __volatile__("mfspr %0, 268\n" : "=r" (*pval));
 }
+# elif defined(__aarch64__)
+#  define relax()          __asm__ __volatile("yield" : : : "memory")
+
+static inline void frc(uint64_t *pval)
+{
+
+	/*
+	 * This isb() is required to prevent that the counter value
+	 * is speculated.
+	 */
+	__asm__ __volatile__("isb; mrs %0, cntvct_el0" : "=r" (*pval));
+
+}
 # else
 #  define relax()          do { } while (0)
 #  define frc(x)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/3] oslat: Allow for arch specific timer frequency measurements
  2021-09-08 10:02 [PATCH 1/3] oslat: rename cpu_mhz/cpu_hz to timer_mhz/cpu_hz Nicolas Saenz Julienne
  2021-09-08 10:02 ` [PATCH 2/3] oslat: Add aarch64 support Nicolas Saenz Julienne
@ 2021-09-08 10:02 ` Nicolas Saenz Julienne
  2021-09-08 18:16   ` Peter Xu
  2021-09-08 14:40 ` [PATCH 1/3] oslat: rename cpu_mhz/cpu_hz to timer_mhz/cpu_hz Peter Xu
  2 siblings, 1 reply; 18+ messages in thread
From: Nicolas Saenz Julienne @ 2021-09-08 10:02 UTC (permalink / raw)
  To: linux-rt-users, peterx; +Cc: williams, jkacur, nsaenzju

Some architectures have special purpose registers to query the system
timer's frequency. Let's use that when available.

Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
---
 src/oslat/oslat.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/src/oslat/oslat.c b/src/oslat/oslat.c
index bd155a6..23ca9b6 100644
--- a/src/oslat/oslat.c
+++ b/src/oslat/oslat.c
@@ -51,6 +51,9 @@
 # define atomic_inc(ptr)   __sync_add_and_fetch((ptr), 1)
 # if defined(__x86_64__)
 #  define relax()          __asm__ __volatile__("pause" ::: "memory")
+
+#define measure_timer_mhz generic_measure_timer_mhz
+
 static inline void frc(uint64_t *pval)
 {
 	uint32_t low, high;
@@ -61,12 +64,18 @@ static inline void frc(uint64_t *pval)
 }
 # elif defined(__i386__)
 #  define relax()          __asm__ __volatile__("pause" ::: "memory")
+
+#define measure_timer_mhz generic_measure_timer_mhz
+
 static inline void frc(uint64_t *pval)
 {
 	__asm__ __volatile__("rdtsc" : "=A" (*pval));
 }
 # elif defined(__PPC64__)
 #  define relax()          do { } while (0)
+
+#define measure_timer_mhz generic_measure_timer_mhz
+
 static inline void frc(uint64_t *pval)
 {
 	__asm__ __volatile__("mfspr %0, 268\n" : "=r" (*pval));
@@ -74,6 +83,15 @@ static inline void frc(uint64_t *pval)
 # elif defined(__aarch64__)
 #  define relax()          __asm__ __volatile("yield" : : : "memory")
 
+static inline unsigned int measure_timer_mhz(void)
+{
+	unsigned int val;
+
+	__asm__ __volatile__("mrs %0, cntfrq_el0" : "=r" (val));
+
+	return val / 1e6;
+}
+
 static inline void frc(uint64_t *pval)
 {
 
@@ -257,7 +275,7 @@ static cycles_t __measure_timer_hz(void)
 	return (cycles_t) ((e - s) / sec);
 }
 
-static unsigned int measure_timer_mhz(void)
+static unsigned int __attribute__((unused)) generic_measure_timer_mhz(void)
 {
 	cycles_t m, mprev, d;
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] oslat: rename cpu_mhz/cpu_hz to timer_mhz/cpu_hz
  2021-09-08 10:02 [PATCH 1/3] oslat: rename cpu_mhz/cpu_hz to timer_mhz/cpu_hz Nicolas Saenz Julienne
  2021-09-08 10:02 ` [PATCH 2/3] oslat: Add aarch64 support Nicolas Saenz Julienne
  2021-09-08 10:02 ` [PATCH 3/3] oslat: Allow for arch specific timer frequency measurements Nicolas Saenz Julienne
@ 2021-09-08 14:40 ` Peter Xu
  2021-09-08 16:30   ` nsaenzju
  2 siblings, 1 reply; 18+ messages in thread
From: Peter Xu @ 2021-09-08 14:40 UTC (permalink / raw)
  To: Nicolas Saenz Julienne; +Cc: linux-rt-users, williams, jkacur

Hello, Nicolas,

On Wed, Sep 08, 2021 at 12:02:07PM +0200, Nicolas Saenz Julienne wrote:
> 'cpu_mhz' in oslat actually represents the frequency at which the high
> frequency timer we measure with ticks. There is no need for it to match
> the CPU frequency, nor will do on all supported architectures. So rename
> it to 'timer_mhz' in order to better match reality.

But right now "cpu_mhz" is indeed the cpu frequency per mhz, isn't it?  As I
believe that's how "time stamp counter" defined on x86. :)

I don't know what's the corresponding register for aarch64 to read the
processor clock cycles out, I did a quick google and it tells me PMCCNTR, but
I've no solid idea.  Or do you mean for some reason we can't read that info out
from aarch64?

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] oslat: rename cpu_mhz/cpu_hz to timer_mhz/cpu_hz
  2021-09-08 14:40 ` [PATCH 1/3] oslat: rename cpu_mhz/cpu_hz to timer_mhz/cpu_hz Peter Xu
@ 2021-09-08 16:30   ` nsaenzju
  2021-09-08 17:51     ` Peter Xu
  0 siblings, 1 reply; 18+ messages in thread
From: nsaenzju @ 2021-09-08 16:30 UTC (permalink / raw)
  To: Peter Xu; +Cc: linux-rt-users, williams, jkacur

Hi Peter, thanks for having a look at this! And sorry in advance for the long
documentation dump.

On Wed, 2021-09-08 at 10:40 -0400, Peter Xu wrote:
> Hello, Nicolas,
> 
> On Wed, Sep 08, 2021 at 12:02:07PM +0200, Nicolas Saenz Julienne wrote:
> > 'cpu_mhz' in oslat actually represents the frequency at which the high
> > frequency timer we measure with ticks. There is no need for it to match
> > the CPU frequency, nor will do on all supported architectures. So rename
> > it to 'timer_mhz' in order to better match reality.
> 
> But right now "cpu_mhz" is indeed the cpu frequency per mhz, isn't it?  As I
> believe that's how "time stamp counter" defined on x86. :)

Sadly I don't think this is really the case. In some cases TSC might match
CPU's base frequency, but it depends on the processor family and other
factors[1]. Also, the same applies for PPC64[2].

My reading is that, in general, we are only safe to assume we're getting a
constant monotonically increasing timer unrelated from CPU's actual frequency.

> I don't know what's the corresponding register for aarch64 to read the
> processor clock cycles out, I did a quick google and it tells me PMCCNTR, but
> I've no solid idea.  Or do you mean for some reason we can't read that info out
> from aarch64?

Sadly PMCCNTR isn't available at Exception Level 0 (user-space) and AFAIU we're
stuck with CNTVCT_EL0.

Regards,
Nicolas

[1] Intel TRM Vol 3B, 17.17 Time Stamp Counter

Processor families increment the time-stamp counter differently:

• For Pentium M processors (family [06H], models [09H, 0DH]); for Pentium 4
  processors, Intel Xeon processors (family [0FH], models [00H, 01H, or 02H]);
  and for P6 family processors: the time-stamp counter increments with every
  internal processor clock cycle. The internal processor clock cycle is
  determined by the current core-clock to bus-clock ratio. Intel® SpeedStep®
  technology transitions may also impact the processor clock.

• For Pentium 4 processors, Intel Xeon processors (family [0FH], models [03H
  and higher]); for Intel Core Solo and Intel Core Duo processors (family [06H],
  model [0EH]); for the Intel Xeon processor 5100 series and Intel Core 2 Duo
  processors (family [06H], model [0FH]); for Intel Core 2 and Intel Xeon
  processors (family [06H], DisplayModel [17H]); for Intel Atom processors
  (family [06H], DisplayModel [1CH]): the time-stamp counter increments at a
  constant rate. That rate may be set by the maximum core-clock to bus-clock
  ratio of the processor or may be set by the maximum resolved frequency at which
  the processor is booted. The maximum resolved frequency may differ from the
  processor base frequency, see Section 18.7.2 for more detail. On certain
  processors, the TSC frequency may not be the same as the frequency in the brand
  string. The specific processor configuration determines the behavior. Constant
  TSC behavior ensures that the duration of each clock tick is uniform and
  supports the use of the TSC as a wall clock timer even if the processor core
  changes frequency. This is the architectural behavior moving forward.

The specific processor configuration determines the behavior. Constant TSC
behavior ensures that the duration of each clock tick is uniform and supports
the use of the TSC as a wall clock timer even if the processor core changes
frequency. This is the architectural behavior moving forward.

[2] From __ppc_get_timebase() manpages: The Time Base Register is a 64-bit
register provided by Power Architecture processors. It stores a monotonically
incremented value that is updated at a system-dependent frequency that may be
different from the processor frequency.

Note that glibc's __ppc_get_timebase() and oslat's ppc frc() implementations
are the same:
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/powerpc/sys/platform/ppc.h;h=8b0a66de1b93a56e3edf56c31a0c1505d7a8fc08;hb=HEAD#l27


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] oslat: rename cpu_mhz/cpu_hz to timer_mhz/cpu_hz
  2021-09-08 16:30   ` nsaenzju
@ 2021-09-08 17:51     ` Peter Xu
  2021-09-09  9:41       ` nsaenzju
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Xu @ 2021-09-08 17:51 UTC (permalink / raw)
  To: nsaenzju; +Cc: linux-rt-users, williams, jkacur

On Wed, Sep 08, 2021 at 06:30:50PM +0200, nsaenzju@redhat.com wrote:
> Hi Peter, thanks for having a look at this! And sorry in advance for the long
> documentation dump.

It's actually great to reference the documents, thanks for that. I'd appreciate
if you can also add some more explanation into commit message too, may not be
the full copy-paste of the document though, just with the explicit x86/ppc
examples to show this is not for arm-only idea?

Note there's a typo in subject too:

  oslat: rename cpu_mhz/cpu_hz to timer_mhz/cpu_hz
                                            ^^^ timer

> 
> On Wed, 2021-09-08 at 10:40 -0400, Peter Xu wrote:
> > Hello, Nicolas,
> > 
> > On Wed, Sep 08, 2021 at 12:02:07PM +0200, Nicolas Saenz Julienne wrote:
> > > 'cpu_mhz' in oslat actually represents the frequency at which the high
> > > frequency timer we measure with ticks. There is no need for it to match
> > > the CPU frequency, nor will do on all supported architectures. So rename
> > > it to 'timer_mhz' in order to better match reality.
> > 
> > But right now "cpu_mhz" is indeed the cpu frequency per mhz, isn't it?  As I
> > believe that's how "time stamp counter" defined on x86. :)
> 
> Sadly I don't think this is really the case. In some cases TSC might match
> CPU's base frequency, but it depends on the processor family and other
> factors[1]. Also, the same applies for PPC64[2].
> 
> My reading is that, in general, we are only safe to assume we're getting a
> constant monotonically increasing timer unrelated from CPU's actual frequency.
> 
> > I don't know what's the corresponding register for aarch64 to read the
> > processor clock cycles out, I did a quick google and it tells me PMCCNTR, but
> > I've no solid idea.  Or do you mean for some reason we can't read that info out
> > from aarch64?
> 
> Sadly PMCCNTR isn't available at Exception Level 0 (user-space) and AFAIU we're
> stuck with CNTVCT_EL0.

Fair enough.

Though I still think the name "timer" is vague - timer normally means to me
that there's a setup+invoke procedure, and there can be overhead.  While all
these monotonically increasing registers (either real or virtual) should not
have those, we just read some counter out.

How about "clock_[m]hz"?  It does not necessarily to be a property of the
processor, but it's indeed a clock at least in electronics terms.

Thanks,

> 
> Regards,
> Nicolas
> 
> [1] Intel TRM Vol 3B, 17.17 Time Stamp Counter
> 
> Processor families increment the time-stamp counter differently:
> 
> • For Pentium M processors (family [06H], models [09H, 0DH]); for Pentium 4
>   processors, Intel Xeon processors (family [0FH], models [00H, 01H, or 02H]);
>   and for P6 family processors: the time-stamp counter increments with every
>   internal processor clock cycle. The internal processor clock cycle is
>   determined by the current core-clock to bus-clock ratio. Intel® SpeedStep®
>   technology transitions may also impact the processor clock.
> 
> • For Pentium 4 processors, Intel Xeon processors (family [0FH], models [03H
>   and higher]); for Intel Core Solo and Intel Core Duo processors (family [06H],
>   model [0EH]); for the Intel Xeon processor 5100 series and Intel Core 2 Duo
>   processors (family [06H], model [0FH]); for Intel Core 2 and Intel Xeon
>   processors (family [06H], DisplayModel [17H]); for Intel Atom processors
>   (family [06H], DisplayModel [1CH]): the time-stamp counter increments at a
>   constant rate. That rate may be set by the maximum core-clock to bus-clock
>   ratio of the processor or may be set by the maximum resolved frequency at which
>   the processor is booted. The maximum resolved frequency may differ from the
>   processor base frequency, see Section 18.7.2 for more detail. On certain
>   processors, the TSC frequency may not be the same as the frequency in the brand
>   string. The specific processor configuration determines the behavior. Constant
>   TSC behavior ensures that the duration of each clock tick is uniform and
>   supports the use of the TSC as a wall clock timer even if the processor core
>   changes frequency. This is the architectural behavior moving forward.
> 
> The specific processor configuration determines the behavior. Constant TSC
> behavior ensures that the duration of each clock tick is uniform and supports
> the use of the TSC as a wall clock timer even if the processor core changes
> frequency. This is the architectural behavior moving forward.
> 
> [2] From __ppc_get_timebase() manpages: The Time Base Register is a 64-bit
> register provided by Power Architecture processors. It stores a monotonically
> incremented value that is updated at a system-dependent frequency that may be
> different from the processor frequency.
> 
> Note that glibc's __ppc_get_timebase() and oslat's ppc frc() implementations
> are the same:
> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/powerpc/sys/platform/ppc.h;h=8b0a66de1b93a56e3edf56c31a0c1505d7a8fc08;hb=HEAD#l27
> 

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] oslat: Add aarch64 support
  2021-09-08 10:02 ` [PATCH 2/3] oslat: Add aarch64 support Nicolas Saenz Julienne
@ 2021-09-08 18:09   ` Peter Xu
  2021-09-09 10:10     ` nsaenzju
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Xu @ 2021-09-08 18:09 UTC (permalink / raw)
  To: Nicolas Saenz Julienne; +Cc: linux-rt-users, williams, jkacur

On Wed, Sep 08, 2021 at 12:02:08PM +0200, Nicolas Saenz Julienne wrote:
> The callbacks are based on Linux's implementation:
>  - CNTVCT_EL0 provides direct access to the system virtual timer[1].
>  - 'yield' serves as a CPU hint with similar semantics as x86's
>    'pause'[2].
> 
> [1] See Linux's '__arch_get_hw_counter()' in arch/arm64/include/asm/vdso/gettimeofday.h
> [2] See Linux's 1baa82f4803 ("arm64: Implement cpu_relax as yield").
> Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
> ---
>  src/oslat/oslat.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/src/oslat/oslat.c b/src/oslat/oslat.c
> index a4aa5f1..bd155a6 100644
> --- a/src/oslat/oslat.c
> +++ b/src/oslat/oslat.c
> @@ -71,6 +71,19 @@ static inline void frc(uint64_t *pval)
>  {
>  	__asm__ __volatile__("mfspr %0, 268\n" : "=r" (*pval));
>  }
> +# elif defined(__aarch64__)
> +#  define relax()          __asm__ __volatile("yield" : : : "memory")
> +
> +static inline void frc(uint64_t *pval)
> +{
> +

newline to drop?

> +	/*
> +	 * This isb() is required to prevent that the counter value
> +	 * is speculated.
> +	 */
> +	__asm__ __volatile__("isb; mrs %0, cntvct_el0" : "=r" (*pval));

I saw that commit 27e11a9fe2e2e added two isbs, one before, one after.  Then
commit 77ec462536a1 replaced the 2nd isb into another magic.  This function
dropped the 2nd barrier.  Also, the same to compiler barrier "memory" that's
gone too.

Is it on purpose to drop them?

No experience on arm, so just raise this up.  Thanks,

> +
> +}
>  # else
>  #  define relax()          do { } while (0)
>  #  define frc(x)
> -- 
> 2.31.1
> 

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] oslat: Allow for arch specific timer frequency measurements
  2021-09-08 10:02 ` [PATCH 3/3] oslat: Allow for arch specific timer frequency measurements Nicolas Saenz Julienne
@ 2021-09-08 18:16   ` Peter Xu
  2021-09-09 10:29     ` nsaenzju
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Xu @ 2021-09-08 18:16 UTC (permalink / raw)
  To: Nicolas Saenz Julienne; +Cc: linux-rt-users, williams, jkacur

On Wed, Sep 08, 2021 at 12:02:09PM +0200, Nicolas Saenz Julienne wrote:
> Some architectures have special purpose registers to query the system
> timer's frequency. Let's use that when available.
> 
> Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
> ---
>  src/oslat/oslat.c | 20 +++++++++++++++++++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/src/oslat/oslat.c b/src/oslat/oslat.c
> index bd155a6..23ca9b6 100644
> --- a/src/oslat/oslat.c
> +++ b/src/oslat/oslat.c
> @@ -51,6 +51,9 @@
>  # define atomic_inc(ptr)   __sync_add_and_fetch((ptr), 1)
>  # if defined(__x86_64__)
>  #  define relax()          __asm__ __volatile__("pause" ::: "memory")
> +
> +#define measure_timer_mhz generic_measure_timer_mhz
> +
>  static inline void frc(uint64_t *pval)
>  {
>  	uint32_t low, high;
> @@ -61,12 +64,18 @@ static inline void frc(uint64_t *pval)
>  }
>  # elif defined(__i386__)
>  #  define relax()          __asm__ __volatile__("pause" ::: "memory")
> +
> +#define measure_timer_mhz generic_measure_timer_mhz
> +
>  static inline void frc(uint64_t *pval)
>  {
>  	__asm__ __volatile__("rdtsc" : "=A" (*pval));
>  }
>  # elif defined(__PPC64__)
>  #  define relax()          do { } while (0)
> +
> +#define measure_timer_mhz generic_measure_timer_mhz
> +
>  static inline void frc(uint64_t *pval)
>  {
>  	__asm__ __volatile__("mfspr %0, 268\n" : "=r" (*pval));
> @@ -74,6 +83,15 @@ static inline void frc(uint64_t *pval)
>  # elif defined(__aarch64__)
>  #  define relax()          __asm__ __volatile("yield" : : : "memory")
>  
> +static inline unsigned int measure_timer_mhz(void)
> +{
> +	unsigned int val;
> +
> +	__asm__ __volatile__("mrs %0, cntfrq_el0" : "=r" (val));
> +
> +	return val / 1e6;
> +}
> +
>  static inline void frc(uint64_t *pval)
>  {
>  
> @@ -257,7 +275,7 @@ static cycles_t __measure_timer_hz(void)
>  	return (cycles_t) ((e - s) / sec);
>  }
>  
> -static unsigned int measure_timer_mhz(void)
> +static unsigned int __attribute__((unused)) generic_measure_timer_mhz(void)

This is okay I guess, but does not look nice, as it's marked unused even if
it's used..

How about for any arch that supports a faster version to read the freq, do:

#define measure_timer_mhz_fast
unsigned int measure_timer_mhz_fast(void)
{
    ...
}

Then at entry of measure_timer_mhz():

#ifdef measure_timer_mhz_fast
    return measure_timer_mhz_fast();
#endif

I'm also wondering the diff between the two methods on arm64 and whether
there's a huge difference on the numbers.

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] oslat: rename cpu_mhz/cpu_hz to timer_mhz/cpu_hz
  2021-09-08 17:51     ` Peter Xu
@ 2021-09-09  9:41       ` nsaenzju
  2021-09-09 18:05         ` Peter Xu
  0 siblings, 1 reply; 18+ messages in thread
From: nsaenzju @ 2021-09-09  9:41 UTC (permalink / raw)
  To: Peter Xu; +Cc: linux-rt-users, williams, jkacur

On Wed, 2021-09-08 at 13:51 -0400, Peter Xu wrote:
> On Wed, Sep 08, 2021 at 06:30:50PM +0200, nsaenzju@redhat.com wrote:
> > Hi Peter, thanks for having a look at this! And sorry in advance for the long
> > documentation dump.
> 
> It's actually great to reference the documents, thanks for that. I'd appreciate
> if you can also add some more explanation into commit message too, may not be
> the full copy-paste of the document though, just with the explicit x86/ppc
> examples to show this is not for arm-only idea?

Will do.

> Note there's a typo in subject too:
> 
>   oslat: rename cpu_mhz/cpu_hz to timer_mhz/cpu_hz
>                                             ^^^ timer

Noted.

> > 
> > On Wed, 2021-09-08 at 10:40 -0400, Peter Xu wrote:
> > > Hello, Nicolas,
> > > 
> > > On Wed, Sep 08, 2021 at 12:02:07PM +0200, Nicolas Saenz Julienne wrote:
> > > > 'cpu_mhz' in oslat actually represents the frequency at which the high
> > > > frequency timer we measure with ticks. There is no need for it to match
> > > > the CPU frequency, nor will do on all supported architectures. So rename
> > > > it to 'timer_mhz' in order to better match reality.
> > > 
> > > But right now "cpu_mhz" is indeed the cpu frequency per mhz, isn't it?  As I
> > > believe that's how "time stamp counter" defined on x86. :)
> > 
> > Sadly I don't think this is really the case. In some cases TSC might match
> > CPU's base frequency, but it depends on the processor family and other
> > factors[1]. Also, the same applies for PPC64[2].
> > 
> > My reading is that, in general, we are only safe to assume we're getting a
> > constant monotonically increasing timer unrelated from CPU's actual frequency.
> > 
> > > I don't know what's the corresponding register for aarch64 to read the
> > > processor clock cycles out, I did a quick google and it tells me PMCCNTR, but
> > > I've no solid idea.  Or do you mean for some reason we can't read that info out
> > > from aarch64?
> > 
> > Sadly PMCCNTR isn't available at Exception Level 0 (user-space) and AFAIU we're
> > stuck with CNTVCT_EL0.
> 
> Fair enough.
> 
> Though I still think the name "timer" is vague - timer normally means to me
> that there's a setup+invoke procedure, and there can be overhead.  While all
> these monotonically increasing registers (either real or virtual) should not
> have those, we just read some counter out.
> 
> How about "clock_[m]hz"?  It does not necessarily to be a property of the
> processor, but it's indeed a clock at least in electronics terms.

How about 'counter_[m]hz'? If not I'm fine with 'clock_[m]hz'.

-- 
Nicolás Sáenz


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] oslat: Add aarch64 support
  2021-09-08 18:09   ` Peter Xu
@ 2021-09-09 10:10     ` nsaenzju
  2021-09-09 18:03       ` Peter Xu
  0 siblings, 1 reply; 18+ messages in thread
From: nsaenzju @ 2021-09-09 10:10 UTC (permalink / raw)
  To: Peter Xu; +Cc: linux-rt-users, williams, jkacur

On Wed, 2021-09-08 at 14:09 -0400, Peter Xu wrote:
> On Wed, Sep 08, 2021 at 12:02:08PM +0200, Nicolas Saenz Julienne wrote:
> > The callbacks are based on Linux's implementation:
> >  - CNTVCT_EL0 provides direct access to the system virtual timer[1].
> >  - 'yield' serves as a CPU hint with similar semantics as x86's
> >    'pause'[2].
> > 
> > [1] See Linux's '__arch_get_hw_counter()' in arch/arm64/include/asm/vdso/gettimeofday.h
> > [2] See Linux's 1baa82f4803 ("arm64: Implement cpu_relax as yield").
> > Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
> > ---
> >  src/oslat/oslat.c | 13 +++++++++++++
> >  1 file changed, 13 insertions(+)
> > 
> > diff --git a/src/oslat/oslat.c b/src/oslat/oslat.c
> > index a4aa5f1..bd155a6 100644
> > --- a/src/oslat/oslat.c
> > +++ b/src/oslat/oslat.c
> > @@ -71,6 +71,19 @@ static inline void frc(uint64_t *pval)
> >  {
> >  	__asm__ __volatile__("mfspr %0, 268\n" : "=r" (*pval));
> >  }
> > +# elif defined(__aarch64__)
> > +#  define relax()          __asm__ __volatile("yield" : : : "memory")
> > +
> > +static inline void frc(uint64_t *pval)
> > +{
> > +
> 
> newline to drop?

Noted.

> > +	/*
> > +	 * This isb() is required to prevent that the counter value
> > +	 * is speculated.
> > +	 */
> > +	__asm__ __volatile__("isb; mrs %0, cntvct_el0" : "=r" (*pval));
> 
> I saw that commit 27e11a9fe2e2e added two isbs, one before, one after.  Then
> commit 77ec462536a1 replaced the 2nd isb into another magic.  This function
> dropped the 2nd barrier.  Also, the same to compiler barrier "memory" that's
> gone too.
> 
> Is it on purpose to drop them?

Yes, I removed it on purpose. VDSO's gettimeofday implementation uses a seqlock
to protect against changes to the counter's properties/state: you want to make
sure access to the counter register is ordered WRT access to the seqlock
protecting it. We don't really care for all this, as we trust our counters to
be stable.

-- 
Nicolás Sáenz


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] oslat: Allow for arch specific timer frequency measurements
  2021-09-08 18:16   ` Peter Xu
@ 2021-09-09 10:29     ` nsaenzju
  2021-09-09 18:04       ` Peter Xu
  0 siblings, 1 reply; 18+ messages in thread
From: nsaenzju @ 2021-09-09 10:29 UTC (permalink / raw)
  To: Peter Xu; +Cc: linux-rt-users, williams, jkacur

On Wed, 2021-09-08 at 14:16 -0400, Peter Xu wrote:
> On Wed, Sep 08, 2021 at 12:02:09PM +0200, Nicolas Saenz Julienne wrote:
> > Some architectures have special purpose registers to query the system
> > timer's frequency. Let's use that when available.
> > 
> > Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
> > ---
> >  src/oslat/oslat.c | 20 +++++++++++++++++++-
> >  1 file changed, 19 insertions(+), 1 deletion(-)
> > 
> > diff --git a/src/oslat/oslat.c b/src/oslat/oslat.c
> > index bd155a6..23ca9b6 100644
> > --- a/src/oslat/oslat.c
> > +++ b/src/oslat/oslat.c
> > @@ -51,6 +51,9 @@
> >  # define atomic_inc(ptr)   __sync_add_and_fetch((ptr), 1)
> >  # if defined(__x86_64__)
> >  #  define relax()          __asm__ __volatile__("pause" ::: "memory")
> > +
> > +#define measure_timer_mhz generic_measure_timer_mhz
> > +
> >  static inline void frc(uint64_t *pval)
> >  {
> >  	uint32_t low, high;
> > @@ -61,12 +64,18 @@ static inline void frc(uint64_t *pval)
> >  }
> >  # elif defined(__i386__)
> >  #  define relax()          __asm__ __volatile__("pause" ::: "memory")
> > +
> > +#define measure_timer_mhz generic_measure_timer_mhz
> > +
> >  static inline void frc(uint64_t *pval)
> >  {
> >  	__asm__ __volatile__("rdtsc" : "=A" (*pval));
> >  }
> >  # elif defined(__PPC64__)
> >  #  define relax()          do { } while (0)
> > +
> > +#define measure_timer_mhz generic_measure_timer_mhz
> > +
> >  static inline void frc(uint64_t *pval)
> >  {
> >  	__asm__ __volatile__("mfspr %0, 268\n" : "=r" (*pval));
> > @@ -74,6 +83,15 @@ static inline void frc(uint64_t *pval)
> >  # elif defined(__aarch64__)
> >  #  define relax()          __asm__ __volatile("yield" : : : "memory")
> >  
> > +static inline unsigned int measure_timer_mhz(void)
> > +{
> > +	unsigned int val;
> > +
> > +	__asm__ __volatile__("mrs %0, cntfrq_el0" : "=r" (val));
> > +
> > +	return val / 1e6;
> > +}
> > +
> >  static inline void frc(uint64_t *pval)
> >  {
> >  
> > @@ -257,7 +275,7 @@ static cycles_t __measure_timer_hz(void)
> >  	return (cycles_t) ((e - s) / sec);
> >  }
> >  
> > -static unsigned int measure_timer_mhz(void)
> > +static unsigned int __attribute__((unused)) generic_measure_timer_mhz(void)
> 
> This is okay I guess, but does not look nice, as it's marked unused even if
> it's used..

Yes the compiler attribute naming is unfortunate. That's why the kernel renamed
it to '__maybe_unused'.

> How about for any arch that supports a faster version to read the freq, do:
> 
> #define measure_timer_mhz_fast
> unsigned int measure_timer_mhz_fast(void)
> {
>     ...
> }
> 
> Then at entry of measure_timer_mhz():
> 
> #ifdef measure_timer_mhz_fast
>     return measure_timer_mhz_fast();
> #endif

Sounds good to me.

> I'm also wondering the diff between the two methods on arm64 and whether
> there's a huge difference on the numbers.

There isn't much, after rounding the end result in MHz is the same regardless
of the method used. IIUC, gettimeoftheday() uses the frequency register's value
to scale the counter measurements. So it makes sense for the results to be
coherent.

I figured it's cleaner to get the data from the source instead of inferring it
with extra math.

-- 
Nicolás Sáenz


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] oslat: Add aarch64 support
  2021-09-09 10:10     ` nsaenzju
@ 2021-09-09 18:03       ` Peter Xu
  2021-09-10 12:19         ` nsaenzju
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Xu @ 2021-09-09 18:03 UTC (permalink / raw)
  To: nsaenzju; +Cc: linux-rt-users, williams, jkacur

On Thu, Sep 09, 2021 at 12:10:49PM +0200, nsaenzju@redhat.com wrote:
> On Wed, 2021-09-08 at 14:09 -0400, Peter Xu wrote:
> > On Wed, Sep 08, 2021 at 12:02:08PM +0200, Nicolas Saenz Julienne wrote:
> > > The callbacks are based on Linux's implementation:
> > >  - CNTVCT_EL0 provides direct access to the system virtual timer[1].
> > >  - 'yield' serves as a CPU hint with similar semantics as x86's
> > >    'pause'[2].
> > > 
> > > [1] See Linux's '__arch_get_hw_counter()' in arch/arm64/include/asm/vdso/gettimeofday.h
> > > [2] See Linux's 1baa82f4803 ("arm64: Implement cpu_relax as yield").
> > > Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
> > > ---
> > >  src/oslat/oslat.c | 13 +++++++++++++
> > >  1 file changed, 13 insertions(+)
> > > 
> > > diff --git a/src/oslat/oslat.c b/src/oslat/oslat.c
> > > index a4aa5f1..bd155a6 100644
> > > --- a/src/oslat/oslat.c
> > > +++ b/src/oslat/oslat.c
> > > @@ -71,6 +71,19 @@ static inline void frc(uint64_t *pval)
> > >  {
> > >  	__asm__ __volatile__("mfspr %0, 268\n" : "=r" (*pval));
> > >  }
> > > +# elif defined(__aarch64__)
> > > +#  define relax()          __asm__ __volatile("yield" : : : "memory")
> > > +
> > > +static inline void frc(uint64_t *pval)
> > > +{
> > > +
> > 
> > newline to drop?
> 
> Noted.
> 
> > > +	/*
> > > +	 * This isb() is required to prevent that the counter value
> > > +	 * is speculated.
> > > +	 */
> > > +	__asm__ __volatile__("isb; mrs %0, cntvct_el0" : "=r" (*pval));
> > 
> > I saw that commit 27e11a9fe2e2e added two isbs, one before, one after.  Then
> > commit 77ec462536a1 replaced the 2nd isb into another magic.  This function
> > dropped the 2nd barrier.  Also, the same to compiler barrier "memory" that's
> > gone too.
> > 
> > Is it on purpose to drop them?
> 
> Yes, I removed it on purpose. VDSO's gettimeofday implementation uses a seqlock
> to protect against changes to the counter's properties/state: you want to make
> sure access to the counter register is ordered WRT access to the seqlock
> protecting it. We don't really care for all this, as we trust our counters to
> be stable.

OK, since you've referenced the code, would you mind add these into the commit
message too?

I also don't understand why you explicitly removed the compiler barrier.  IIUC
when without it the compiler could move these instructions to be before/after
other instructions generated in the c code.  That may not really happen in
practise, but just curious why the explicit removal.

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] oslat: Allow for arch specific timer frequency measurements
  2021-09-09 10:29     ` nsaenzju
@ 2021-09-09 18:04       ` Peter Xu
  0 siblings, 0 replies; 18+ messages in thread
From: Peter Xu @ 2021-09-09 18:04 UTC (permalink / raw)
  To: nsaenzju; +Cc: linux-rt-users, williams, jkacur

On Thu, Sep 09, 2021 at 12:29:17PM +0200, nsaenzju@redhat.com wrote:
> > I'm also wondering the diff between the two methods on arm64 and whether
> > there's a huge difference on the numbers.
> 
> There isn't much, after rounding the end result in MHz is the same regardless
> of the method used. IIUC, gettimeoftheday() uses the frequency register's value
> to scale the counter measurements. So it makes sense for the results to be
> coherent.
> 
> I figured it's cleaner to get the data from the source instead of inferring it
> with extra math.

Fair enough.  Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] oslat: rename cpu_mhz/cpu_hz to timer_mhz/cpu_hz
  2021-09-09  9:41       ` nsaenzju
@ 2021-09-09 18:05         ` Peter Xu
  0 siblings, 0 replies; 18+ messages in thread
From: Peter Xu @ 2021-09-09 18:05 UTC (permalink / raw)
  To: nsaenzju; +Cc: linux-rt-users, williams, jkacur

On Thu, Sep 09, 2021 at 11:41:22AM +0200, nsaenzju@redhat.com wrote:
> How about 'counter_[m]hz'? If not I'm fine with 'clock_[m]hz'.

'counter_[m]hz' looks good to me too.  Thanks.

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] oslat: Add aarch64 support
  2021-09-09 18:03       ` Peter Xu
@ 2021-09-10 12:19         ` nsaenzju
  2021-09-10 13:33           ` Peter Xu
  0 siblings, 1 reply; 18+ messages in thread
From: nsaenzju @ 2021-09-10 12:19 UTC (permalink / raw)
  To: Peter Xu; +Cc: linux-rt-users, williams, jkacur

On Thu, 2021-09-09 at 14:03 -0400, Peter Xu wrote:
> On Thu, Sep 09, 2021 at 12:10:49PM +0200, nsaenzju@redhat.com wrote:
> > On Wed, 2021-09-08 at 14:09 -0400, Peter Xu wrote:
> > > On Wed, Sep 08, 2021 at 12:02:08PM +0200, Nicolas Saenz Julienne wrote:
> > > > The callbacks are based on Linux's implementation:
> > > >  - CNTVCT_EL0 provides direct access to the system virtual timer[1].
> > > >  - 'yield' serves as a CPU hint with similar semantics as x86's
> > > >    'pause'[2].
> > > > 
> > > > [1] See Linux's '__arch_get_hw_counter()' in arch/arm64/include/asm/vdso/gettimeofday.h
> > > > [2] See Linux's 1baa82f4803 ("arm64: Implement cpu_relax as yield").
> > > > Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
> > > > ---
> > > >  src/oslat/oslat.c | 13 +++++++++++++
> > > >  1 file changed, 13 insertions(+)
> > > > 
> > > > diff --git a/src/oslat/oslat.c b/src/oslat/oslat.c
> > > > index a4aa5f1..bd155a6 100644
> > > > --- a/src/oslat/oslat.c
> > > > +++ b/src/oslat/oslat.c
> > > > @@ -71,6 +71,19 @@ static inline void frc(uint64_t *pval)
> > > >  {
> > > >  	__asm__ __volatile__("mfspr %0, 268\n" : "=r" (*pval));
> > > >  }
> > > > +# elif defined(__aarch64__)
> > > > +#  define relax()          __asm__ __volatile("yield" : : : "memory")
> > > > +
> > > > +static inline void frc(uint64_t *pval)
> > > > +{
> > > > +
> > > 
> > > newline to drop?
> > 
> > Noted.
> > 
> > > > +	/*
> > > > +	 * This isb() is required to prevent that the counter value
> > > > +	 * is speculated.
> > > > +	 */
> > > > +	__asm__ __volatile__("isb; mrs %0, cntvct_el0" : "=r" (*pval));
> > > 
> > > I saw that commit 27e11a9fe2e2e added two isbs, one before, one after.  Then
> > > commit 77ec462536a1 replaced the 2nd isb into another magic.  This function
> > > dropped the 2nd barrier.  Also, the same to compiler barrier "memory" that's
> > > gone too.
> > > 
> > > Is it on purpose to drop them?
> > 
> > Yes, I removed it on purpose. VDSO's gettimeofday implementation uses a seqlock
> > to protect against changes to the counter's properties/state: you want to make
> > sure access to the counter register is ordered WRT access to the seqlock
> > protecting it. We don't really care for all this, as we trust our counters to
> > be stable.
> 
> OK, since you've referenced the code, would you mind add these into the commit
> message too?

Will do!

> I also don't understand why you explicitly removed the compiler barrier.  IIUC
> when without it the compiler could move these instructions to be before/after
> other instructions generated in the c code.  That may not really happen in
> practise, but just curious why the explicit removal.

I removed it too as I see no justification for it. There is nothing, except for
the actual timestamp values (which are safe as they come from an mrs), that
could suffer from the compiler prefetching the value, or reordering accesses.
I'll add a comment on the commit message.

-- 
Nicolás Sáenz


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] oslat: Add aarch64 support
  2021-09-10 12:19         ` nsaenzju
@ 2021-09-10 13:33           ` Peter Xu
  2021-09-10 14:27             ` nsaenzju
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Xu @ 2021-09-10 13:33 UTC (permalink / raw)
  To: nsaenzju; +Cc: linux-rt-users, williams, jkacur

On Fri, Sep 10, 2021 at 02:19:40PM +0200, nsaenzju@redhat.com wrote:
> > I also don't understand why you explicitly removed the compiler barrier.  IIUC
> > when without it the compiler could move these instructions to be before/after
> > other instructions generated in the c code.  That may not really happen in
> > practise, but just curious why the explicit removal.
> 
> I removed it too as I see no justification for it. There is nothing, except for
> the actual timestamp values (which are safe as they come from an mrs), that
> could suffer from the compiler prefetching the value, or reordering accesses.
> I'll add a comment on the commit message.

Again I have no solid example, but wondering whether when without compiler
barrier the compiler would be legal to compile this code clip:

  t1 = frc();
  a = 1;
  t2 = frc();

into something like:

  t1 = frc();
  t2 = frc();
  a = 1;

It's just that iiuc compiler barrier has 0 overhead to us.  No strong opinion
anyways.

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] oslat: Add aarch64 support
  2021-09-10 13:33           ` Peter Xu
@ 2021-09-10 14:27             ` nsaenzju
  2021-09-10 17:00               ` Peter Xu
  0 siblings, 1 reply; 18+ messages in thread
From: nsaenzju @ 2021-09-10 14:27 UTC (permalink / raw)
  To: Peter Xu; +Cc: linux-rt-users, williams, jkacur

On Fri, 2021-09-10 at 09:33 -0400, Peter Xu wrote:
> On Fri, Sep 10, 2021 at 02:19:40PM +0200, nsaenzju@redhat.com wrote:
> > > I also don't understand why you explicitly removed the compiler barrier.  IIUC
> > > when without it the compiler could move these instructions to be before/after
> > > other instructions generated in the c code.  That may not really happen in
> > > practise, but just curious why the explicit removal.
> > 
> > I removed it too as I see no justification for it. There is nothing, except for
> > the actual timestamp values (which are safe as they come from an mrs), that
> > could suffer from the compiler prefetching the value, or reordering accesses.
> > I'll add a comment on the commit message.
> 
> Again I have no solid example, but wondering whether when without compiler
> barrier the compiler would be legal to compile this code clip:
> 
>   t1 = frc();
>   a = 1;
>   t2 = frc();
> 
> into something like:
> 
>   t1 = frc();
>   t2 = frc();
>   a = 1;
> 
> It's just that iiuc compiler barrier has 0 overhead to us.  No strong opinion
> anyways.

My understanding is that can only happen when the expression to right of 'a'
has no dependencies with t2. And at that stage, do we really care? Well, yes,
for example if you're measuring how long it took to assign a, which is similar
to what we do:

	workload_fn workload = g.workload->w_fn;
	frc(&t1)
	workload()
	frc(&t2)

But in that case the compiler is forced to handle the indirect function call
and IIUC there is little optimization todo, so we're safe without barriers.

That said, I don't feel overly confident about all this, and there is little
downside to the compiler barriers, so I'll just add them.

-- 
Nicolás Sáenz


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] oslat: Add aarch64 support
  2021-09-10 14:27             ` nsaenzju
@ 2021-09-10 17:00               ` Peter Xu
  0 siblings, 0 replies; 18+ messages in thread
From: Peter Xu @ 2021-09-10 17:00 UTC (permalink / raw)
  To: nsaenzju; +Cc: linux-rt-users, williams, jkacur

On Fri, Sep 10, 2021 at 04:27:05PM +0200, nsaenzju@redhat.com wrote:
> My understanding is that can only happen when the expression to right of 'a'
> has no dependencies with t2. And at that stage, do we really care? Well, yes,
> for example if you're measuring how long it took to assign a, which is similar
> to what we do:
> 
> 	workload_fn workload = g.workload->w_fn;
> 	frc(&t1)
> 	workload()
> 	frc(&t2)
> 
> But in that case the compiler is forced to handle the indirect function call
> and IIUC there is little optimization todo, so we're safe without barriers.

Yep.

> 
> That said, I don't feel overly confident about all this, and there is little
> downside to the compiler barriers, so I'll just add them.

Yep, too.

IMHO that's the point (even I know it's trivial and I'm harsh :-D) - we don't
have a reason to drop it if it helps us to make sure there's no surprise.

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2021-09-10 17:00 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-08 10:02 [PATCH 1/3] oslat: rename cpu_mhz/cpu_hz to timer_mhz/cpu_hz Nicolas Saenz Julienne
2021-09-08 10:02 ` [PATCH 2/3] oslat: Add aarch64 support Nicolas Saenz Julienne
2021-09-08 18:09   ` Peter Xu
2021-09-09 10:10     ` nsaenzju
2021-09-09 18:03       ` Peter Xu
2021-09-10 12:19         ` nsaenzju
2021-09-10 13:33           ` Peter Xu
2021-09-10 14:27             ` nsaenzju
2021-09-10 17:00               ` Peter Xu
2021-09-08 10:02 ` [PATCH 3/3] oslat: Allow for arch specific timer frequency measurements Nicolas Saenz Julienne
2021-09-08 18:16   ` Peter Xu
2021-09-09 10:29     ` nsaenzju
2021-09-09 18:04       ` Peter Xu
2021-09-08 14:40 ` [PATCH 1/3] oslat: rename cpu_mhz/cpu_hz to timer_mhz/cpu_hz Peter Xu
2021-09-08 16:30   ` nsaenzju
2021-09-08 17:51     ` Peter Xu
2021-09-09  9:41       ` nsaenzju
2021-09-09 18:05         ` Peter Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).