* [PATCH 0/3] x86/vdso: Minor improvements
From: Andy Lutomirski @ 2018-10-04 21:44 UTC
To: x86; +Cc: LKML, Andy Lutomirski

Hi tglx,

This little series applies on top of your patches and, at least on my
laptop, gets back all the performance you lost and then some :)

--Andy

Andy Lutomirski (3):
  x86/vdso: Remove "memory" clobbers in the vDSO syscall fallbacks
  x86/vdso: Rearrange do_hres() to improve code generation
  x86/vdso: Document vgtod_ts better

 arch/x86/entry/vdso/vclock_gettime.c | 9 ++++-----
 arch/x86/include/asm/vgtod.h         | 9 +++++++++
 2 files changed, 13 insertions(+), 5 deletions(-)

-- 
2.17.1
* [PATCH 1/3] x86/vdso: Remove "memory" clobbers in the vDSO syscall fallbacks
From: Andy Lutomirski @ 2018-10-04 21:44 UTC
To: x86; +Cc: LKML, Andy Lutomirski

When a vDSO clock function falls back to the syscall, no special barriers
or ordering is needed, and the syscall fallbacks don't clobber any memory
that is not explicitly listed in the asm constraints.  Remove the "memory"
clobber.

This causes minor changes to the generated code, but otherwise has no
obvious performance impact.  I think it's nice to have, though, since it
may help the optimizer in the future.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/vdso/vclock_gettime.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
index b7ccbff26a3f..18c8a78d1ec9 100644
--- a/arch/x86/entry/vdso/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vclock_gettime.c
@@ -45,7 +45,7 @@ notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
 	long ret;
 	asm ("syscall" : "=a" (ret), "=m" (*ts) :
 	     "0" (__NR_clock_gettime), "D" (clock), "S" (ts) :
-	     "memory", "rcx", "r11");
+	     "rcx", "r11");
 	return ret;
 }
 
@@ -62,7 +62,7 @@ notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
 		"mov %%edx, %%ebx \n"
 		: "=a" (ret), "=m" (*ts)
 		: "0" (__NR_clock_gettime), [clock] "g" (clock), "c" (ts)
-		: "memory", "edx");
+		: "edx");
 	return ret;
 }

-- 
2.17.1
* [tip:x86/vdso] x86/vdso: Remove "memory" clobbers in the vDSO syscall fallbacks
From: tip-bot for Andy Lutomirski @ 2018-10-05 8:27 UTC
To: linux-tip-commits; +Cc: mingo, torvalds, linux-kernel, hpa, tglx, luto, peterz

Commit-ID:  89fe0a1f1c694a3b0b3cfa8c0952d603753f36df
Gitweb:     https://git.kernel.org/tip/89fe0a1f1c694a3b0b3cfa8c0952d603753f36df
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 4 Oct 2018 14:44:43 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 5 Oct 2018 10:12:18 +0200

x86/vdso: Remove "memory" clobbers in the vDSO syscall fallbacks

When a vDSO clock function falls back to the syscall, no special barriers
or ordering is needed, and the syscall fallbacks don't clobber any memory
that is not explicitly listed in the asm constraints.  Remove the "memory"
clobber.

This causes minor changes to the generated code, but otherwise has no
obvious performance impact.  I think it's nice to have, though, since it
may help the optimizer in the future.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/3a7438f5fb2422ed881683d2ccffd7f987b2dc44.1538689401.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/vdso/vclock_gettime.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
index b7ccbff26a3f..18c8a78d1ec9 100644
--- a/arch/x86/entry/vdso/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vclock_gettime.c
@@ -45,7 +45,7 @@ notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
 	long ret;
 	asm ("syscall" : "=a" (ret), "=m" (*ts) :
 	     "0" (__NR_clock_gettime), "D" (clock), "S" (ts) :
-	     "memory", "rcx", "r11");
+	     "rcx", "r11");
 	return ret;
 }
 
@@ -62,7 +62,7 @@ notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
 		"mov %%edx, %%ebx \n"
 		: "=a" (ret), "=m" (*ts)
 		: "0" (__NR_clock_gettime), [clock] "g" (clock), "c" (ts)
-		: "memory", "edx");
+		: "edx");
 	return ret;
 }
* [PATCH 2/3] x86/vdso: Rearrange do_hres() to improve code generation
From: Andy Lutomirski @ 2018-10-04 21:44 UTC
To: x86; +Cc: LKML, Andy Lutomirski

vgetcyc() is full of barriers, so fetching values out of the vvar page
before vgetcyc() for use after vgetcyc() results in poor code generation.
Put vgetcyc() first to avoid this problem.

Also, pull the tv_sec division into the loop and put all the ts writes
together.  The old code wrote ts->tv_sec on each iteration before the
syscall fallback check and then added in the offset afterwards, which
forced the compiler to pointlessly copy base->sec to ts->tv_sec on each
iteration.  The new version seems to generate sensible code.

Saves several cycles.  With this patch applied, the result is faster
than before the clock_gettime() rewrite.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/vdso/vclock_gettime.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
index 18c8a78d1ec9..419de7552c2f 100644
--- a/arch/x86/entry/vdso/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vclock_gettime.c
@@ -147,10 +147,9 @@ notrace static int do_hres(clockid_t clk, struct timespec *ts)
 
 	do {
 		seq = gtod_read_begin(gtod);
-		ts->tv_sec = base->sec;
+		cycles = vgetcyc(gtod->vclock_mode);
 		ns = base->nsec;
 		last = gtod->cycle_last;
-		cycles = vgetcyc(gtod->vclock_mode);
 		if (unlikely((s64)cycles < 0))
 			return vdso_fallback_gettime(clk, ts);
 		if (cycles > last)
@@ -158,7 +157,7 @@ notrace static int do_hres(clockid_t clk, struct timespec *ts)
 		ns >>= gtod->shift;
 	} while (unlikely(gtod_read_retry(gtod, seq)));
 
-	ts->tv_sec += __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
+	ts->tv_sec = base->sec + __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
 	ts->tv_nsec = ns;
 
 	return 0;

-- 
2.17.1
* Re: [PATCH 2/3] x86/vdso: Rearrange do_hres() to improve code generation
From: Thomas Gleixner @ 2018-10-05 6:00 UTC
To: Andy Lutomirski; +Cc: x86, LKML

On Thu, 4 Oct 2018, Andy Lutomirski wrote:
> index 18c8a78d1ec9..419de7552c2f 100644
> --- a/arch/x86/entry/vdso/vclock_gettime.c
> +++ b/arch/x86/entry/vdso/vclock_gettime.c
> @@ -147,10 +147,9 @@ notrace static int do_hres(clockid_t clk, struct timespec *ts)
>  
>  	do {
>  		seq = gtod_read_begin(gtod);
> -		ts->tv_sec = base->sec;
> +		cycles = vgetcyc(gtod->vclock_mode);
>  		ns = base->nsec;
>  		last = gtod->cycle_last;
> -		cycles = vgetcyc(gtod->vclock_mode);
>  		if (unlikely((s64)cycles < 0))
>  			return vdso_fallback_gettime(clk, ts);
>  		if (cycles > last)
> @@ -158,7 +157,7 @@ notrace static int do_hres(clockid_t clk, struct timespec *ts)
>  		ns >>= gtod->shift;
>  	} while (unlikely(gtod_read_retry(gtod, seq)));
>  
> -	ts->tv_sec += __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
> +	ts->tv_sec = base->sec + __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);

You cannot access base->sec outside of the seqcount protected region. It
might have been incremented by now and you'll get a time jump by a full
second.

Thanks,

	tglx
* Re: [PATCH 2/3] x86/vdso: Rearrange do_hres() to improve code generation
From: Andy Lutomirski @ 2018-10-05 13:05 UTC
To: Thomas Gleixner; +Cc: Andy Lutomirski, x86, LKML

> On Oct 4, 2018, at 11:00 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>
>> On Thu, 4 Oct 2018, Andy Lutomirski wrote:
>> index 18c8a78d1ec9..419de7552c2f 100644
>> --- a/arch/x86/entry/vdso/vclock_gettime.c
>> +++ b/arch/x86/entry/vdso/vclock_gettime.c
>> @@ -147,10 +147,9 @@ notrace static int do_hres(clockid_t clk, struct timespec *ts)
>>  
>>  	do {
>>  		seq = gtod_read_begin(gtod);
>> -		ts->tv_sec = base->sec;
>> +		cycles = vgetcyc(gtod->vclock_mode);
>>  		ns = base->nsec;
>>  		last = gtod->cycle_last;
>> -		cycles = vgetcyc(gtod->vclock_mode);
>>  		if (unlikely((s64)cycles < 0))
>>  			return vdso_fallback_gettime(clk, ts);
>>  		if (cycles > last)
>> @@ -158,7 +157,7 @@ notrace static int do_hres(clockid_t clk, struct timespec *ts)
>>  		ns >>= gtod->shift;
>>  	} while (unlikely(gtod_read_retry(gtod, seq)));
>>  
>> -	ts->tv_sec += __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
>> +	ts->tv_sec = base->sec + __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
>
> You cannot access base->sec outside of the seqcount protected region. It
> might have been incremented by now and you'll get a time jump by a full
> second.

Duh.  Let me try this again.

>
> Thanks,
>
> 	tglx
* [PATCH 3/3] x86/vdso: Document vgtod_ts better
From: Andy Lutomirski @ 2018-10-04 21:44 UTC
To: x86; +Cc: LKML, Andy Lutomirski

After reading do_hres() and do_coarse() and scratching my head a bit, I
figured out why the arithmetic is strange.  Document it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/vgtod.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/include/asm/vgtod.h b/arch/x86/include/asm/vgtod.h
index d17b092b9f1b..69d05c6d47f5 100644
--- a/arch/x86/include/asm/vgtod.h
+++ b/arch/x86/include/asm/vgtod.h
@@ -13,6 +13,15 @@ typedef u64 gtod_long_t;
 typedef unsigned long gtod_long_t;
 #endif
 
+/*
+ * There is one of these objects in the vvar page for each
+ * vDSO-accelerated clockid.  For high-resolution clocks, this encodes
+ * the time corresponding to vsyscall_gtod_data.cycle_last.  For coarse
+ * clocks, this encodes the actual time.
+ *
+ * To confuse the reader, for high-resolution clocks, nsec is left-shifted
+ * by vsyscall_gtod_data.shift.
+ */
 struct vgtod_ts {
 	u64 sec;
 	u64 nsec;

-- 
2.17.1
* [tip:x86/vdso] x86/vdso: Document vgtod_ts better
From: tip-bot for Andy Lutomirski @ 2018-10-05 8:28 UTC
To: linux-tip-commits; +Cc: mingo, tglx, luto, linux-kernel, torvalds, peterz, hpa

Commit-ID:  bcc4a62a73cb65327d7268fbfa3a786d603f52dc
Gitweb:     https://git.kernel.org/tip/bcc4a62a73cb65327d7268fbfa3a786d603f52dc
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 4 Oct 2018 14:44:45 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 5 Oct 2018 10:12:18 +0200

x86/vdso: Document vgtod_ts better

After reading do_hres() and do_coarse() and scratching my head a bit, I
figured out why the arithmetic is strange.  Document it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f66f53d81150bbad47d7b282c9207a71a3ce1c16.1538689401.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/vgtod.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/include/asm/vgtod.h b/arch/x86/include/asm/vgtod.h
index d17b092b9f1b..69d05c6d47f5 100644
--- a/arch/x86/include/asm/vgtod.h
+++ b/arch/x86/include/asm/vgtod.h
@@ -13,6 +13,15 @@ typedef u64 gtod_long_t;
 typedef unsigned long gtod_long_t;
 #endif
 
+/*
+ * There is one of these objects in the vvar page for each
+ * vDSO-accelerated clockid.  For high-resolution clocks, this encodes
+ * the time corresponding to vsyscall_gtod_data.cycle_last.  For coarse
+ * clocks, this encodes the actual time.
+ *
+ * To confuse the reader, for high-resolution clocks, nsec is left-shifted
+ * by vsyscall_gtod_data.shift.
+ */
 struct vgtod_ts {
 	u64 sec;
 	u64 nsec;