* [v2 8/8] sparc64: optimize functions that access tick
@ 2017-06-12 16:48 Pavel Tatashin
From: Pavel Tatashin @ 2017-06-12 16:48 UTC
To: sparclinux
Replace the read-tick function-pointer calls with the new hot-patched get_tick().
This optimizes the performance of functions such as sched_clock().
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
---
arch/sparc/kernel/time_64.c | 22 +++++++++++++---------
1 files changed, 13 insertions(+), 9 deletions(-)
diff --git a/arch/sparc/kernel/time_64.c b/arch/sparc/kernel/time_64.c
index 5fe595a..4a0bd18 100644
--- a/arch/sparc/kernel/time_64.c
+++ b/arch/sparc/kernel/time_64.c
@@ -752,12 +752,10 @@ void setup_sparc64_timer(void)
void __delay(unsigned long loops)
{
- unsigned long bclock, now;
+ unsigned long bclock = get_tick();
- bclock = tick_operations.get_tick();
- do {
- now = tick_operations.get_tick();
- } while ((now-bclock) < loops);
+ while ((get_tick() - bclock) < loops)
+ ;
}
EXPORT_SYMBOL(__delay);
@@ -769,7 +767,7 @@ void udelay(unsigned long usecs)
static u64 clocksource_tick_read(struct clocksource *cs)
{
- return tick_operations.get_tick();
+ return get_tick();
}
static void __init get_tick_patch(void)
@@ -853,13 +851,19 @@ unsigned long long sched_clock(void)
{
unsigned long quotient = tick_operations.ticks_per_nsec_quotient;
unsigned long offset = tick_operations.offset;
- unsigned long ticks = tick_operations.get_tick();
- return ((ticks * quotient) >> SPARC64_NSEC_PER_CYC_SHIFT) - offset;
+ /* Use wmb so the compiler emits the loads first and overlaps load
+ * latency with reading tick, because reading %tick/%stick is a
+ * post-sync instruction that will flush and restart subsequent
+ * instructions after it commits.
+ */
+ wmb();
+
+ return ((get_tick() * quotient) >> SPARC64_NSEC_PER_CYC_SHIFT) - offset;
}
int read_current_timer(unsigned long *timer_val)
{
- *timer_val = tick_operations.get_tick();
+ *timer_val = get_tick();
return 0;
}
--
1.7.1
* Re: [v2 8/8] sparc64: optimize functions that access tick
From: David Miller @ 2017-06-12 19:13 UTC
To: sparclinux
From: Pavel Tatashin <pasha.tatashin@oracle.com>
Date: Mon, 12 Jun 2017 12:48:27 -0400
> @@ -853,13 +851,19 @@ unsigned long long sched_clock(void)
> {
> unsigned long quotient = tick_operations.ticks_per_nsec_quotient;
> unsigned long offset = tick_operations.offset;
> - unsigned long ticks = tick_operations.get_tick();
>
> - return ((ticks * quotient) >> SPARC64_NSEC_PER_CYC_SHIFT) - offset;
> + /* Use wmb so the compiler emits the loads first and overlaps load
> + * latency with reading tick, because reading %tick/%stick is a
> + * post-sync instruction that will flush and restart subsequent
> + * instructions after it commits.
> + */
> + wmb();
> +
> + return ((get_tick() * quotient) >> SPARC64_NSEC_PER_CYC_SHIFT) - offset;
> }
I think you need to use barrier() here not wmb().
wmb() orders memory operations wrt. other memory operations.
get_tick() neither modifies nor accesses memory, so as far as the
compiler is concerned it can still legally order the loads after
get_tick() if it really wanted to.
barrier() emits a volatile empty asm, which strictly orders all
operations before and after the barrier().