* [PATCH 0/3] x86/vdso: Minor improvements
@ 2018-10-04 21:44 Andy Lutomirski
2018-10-04 21:44 ` [PATCH 1/3] x86/vdso: Remove "memory" clobbers in the vDSO syscall fallbacks Andy Lutomirski
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Andy Lutomirski @ 2018-10-04 21:44 UTC (permalink / raw)
To: x86; +Cc: LKML, Andy Lutomirski
Hi tglx,
This little series applies on top of your patches and, at least on
my laptop, gets back all the performance you lost and then some :)
--Andy
Andy Lutomirski (3):
x86/vdso: Remove "memory" clobbers in the vDSO syscall fallbacks
x86/vdso: Rearrange do_hres() to improve code generation
x86/vdso: Document vgtod_ts better
arch/x86/entry/vdso/vclock_gettime.c | 9 ++++-----
arch/x86/include/asm/vgtod.h | 9 +++++++++
2 files changed, 13 insertions(+), 5 deletions(-)
--
2.17.1
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/3] x86/vdso: Remove "memory" clobbers in the vDSO syscall fallbacks
2018-10-04 21:44 [PATCH 0/3] x86/vdso: Minor improvements Andy Lutomirski
@ 2018-10-04 21:44 ` Andy Lutomirski
2018-10-05 8:27 ` [tip:x86/vdso] " tip-bot for Andy Lutomirski
2018-10-04 21:44 ` [PATCH 2/3] x86/vdso: Rearrange do_hres() to improve code generation Andy Lutomirski
2018-10-04 21:44 ` [PATCH 3/3] x86/vdso: Document vgtod_ts better Andy Lutomirski
2 siblings, 1 reply; 8+ messages in thread
From: Andy Lutomirski @ 2018-10-04 21:44 UTC (permalink / raw)
To: x86; +Cc: LKML, Andy Lutomirski
When a vDSO clock function falls back to the syscall, no special
barriers or ordering is needed, and the syscall fallbacks don't
clobber any memory that is not explicitly listed in the asm
constraints. Remove the "memory" clobber.
This causes minor changes to the generated code, but otherwise has
no obvious performance impact. I think it's nice to have, though,
since it may help the optimizer in the future.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
arch/x86/entry/vdso/vclock_gettime.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
index b7ccbff26a3f..18c8a78d1ec9 100644
--- a/arch/x86/entry/vdso/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vclock_gettime.c
@@ -45,7 +45,7 @@ notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
long ret;
asm ("syscall" : "=a" (ret), "=m" (*ts) :
"0" (__NR_clock_gettime), "D" (clock), "S" (ts) :
- "memory", "rcx", "r11");
+ "rcx", "r11");
return ret;
}
@@ -62,7 +62,7 @@ notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
"mov %%edx, %%ebx \n"
: "=a" (ret), "=m" (*ts)
: "0" (__NR_clock_gettime), [clock] "g" (clock), "c" (ts)
- : "memory", "edx");
+ : "edx");
return ret;
}
--
2.17.1
* [PATCH 2/3] x86/vdso: Rearrange do_hres() to improve code generation
2018-10-04 21:44 [PATCH 0/3] x86/vdso: Minor improvements Andy Lutomirski
2018-10-04 21:44 ` [PATCH 1/3] x86/vdso: Remove "memory" clobbers in the vDSO syscall fallbacks Andy Lutomirski
@ 2018-10-04 21:44 ` Andy Lutomirski
2018-10-05 6:00 ` Thomas Gleixner
2018-10-04 21:44 ` [PATCH 3/3] x86/vdso: Document vgtod_ts better Andy Lutomirski
2 siblings, 1 reply; 8+ messages in thread
From: Andy Lutomirski @ 2018-10-04 21:44 UTC (permalink / raw)
To: x86; +Cc: LKML, Andy Lutomirski
vgetcyc() is full of barriers, so fetching values out of the vvar
page before vgetcyc() for use after vgetcyc() results in poor code
generation. Put vgetcyc() first to avoid this problem.
Also, pull the tv_sec division into the loop and put all the ts
writes together. The old code wrote ts->tv_sec on each iteration
before the syscall fallback check and then added in the offset
afterwards, which forced the compiler to pointlessly copy base->sec
to ts->tv_sec on each iteration. The new version seems to generate
sensible code.
Saves several cycles. With this patch applied, the result is faster
than before the clock_gettime() rewrite.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
arch/x86/entry/vdso/vclock_gettime.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
index 18c8a78d1ec9..419de7552c2f 100644
--- a/arch/x86/entry/vdso/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vclock_gettime.c
@@ -147,10 +147,9 @@ notrace static int do_hres(clockid_t clk, struct timespec *ts)
do {
seq = gtod_read_begin(gtod);
- ts->tv_sec = base->sec;
+ cycles = vgetcyc(gtod->vclock_mode);
ns = base->nsec;
last = gtod->cycle_last;
- cycles = vgetcyc(gtod->vclock_mode);
if (unlikely((s64)cycles < 0))
return vdso_fallback_gettime(clk, ts);
if (cycles > last)
@@ -158,7 +157,7 @@ notrace static int do_hres(clockid_t clk, struct timespec *ts)
ns >>= gtod->shift;
} while (unlikely(gtod_read_retry(gtod, seq)));
- ts->tv_sec += __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
+ ts->tv_sec = base->sec + __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
ts->tv_nsec = ns;
return 0;
--
2.17.1
* [PATCH 3/3] x86/vdso: Document vgtod_ts better
2018-10-04 21:44 [PATCH 0/3] x86/vdso: Minor improvements Andy Lutomirski
2018-10-04 21:44 ` [PATCH 1/3] x86/vdso: Remove "memory" clobbers in the vDSO syscall fallbacks Andy Lutomirski
2018-10-04 21:44 ` [PATCH 2/3] x86/vdso: Rearrange do_hres() to improve code generation Andy Lutomirski
@ 2018-10-04 21:44 ` Andy Lutomirski
2018-10-05 8:28 ` [tip:x86/vdso] " tip-bot for Andy Lutomirski
2 siblings, 1 reply; 8+ messages in thread
From: Andy Lutomirski @ 2018-10-04 21:44 UTC (permalink / raw)
To: x86; +Cc: LKML, Andy Lutomirski
After reading do_hres() and do_coarse() and scratching my head a
bit, I figured out why the arithmetic is strange. Document it.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
arch/x86/include/asm/vgtod.h | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/arch/x86/include/asm/vgtod.h b/arch/x86/include/asm/vgtod.h
index d17b092b9f1b..69d05c6d47f5 100644
--- a/arch/x86/include/asm/vgtod.h
+++ b/arch/x86/include/asm/vgtod.h
@@ -13,6 +13,15 @@ typedef u64 gtod_long_t;
typedef unsigned long gtod_long_t;
#endif
+/*
+ * There is one of these objects in the vvar page for each
+ * vDSO-accelerated clockid. For high-resolution clocks, this encodes
+ * the time corresponding to vsyscall_gtod_data.cycle_last. For coarse
+ * clocks, this encodes the actual time.
+ *
+ * To confuse the reader, for high-resolution clocks, nsec is left-shifted
+ * by vsyscall_gtod_data.shift.
+ */
struct vgtod_ts {
u64 sec;
u64 nsec;
--
2.17.1
* Re: [PATCH 2/3] x86/vdso: Rearrange do_hres() to improve code generation
2018-10-04 21:44 ` [PATCH 2/3] x86/vdso: Rearrange do_hres() to improve code generation Andy Lutomirski
@ 2018-10-05 6:00 ` Thomas Gleixner
2018-10-05 13:05 ` Andy Lutomirski
0 siblings, 1 reply; 8+ messages in thread
From: Thomas Gleixner @ 2018-10-05 6:00 UTC (permalink / raw)
To: Andy Lutomirski; +Cc: x86, LKML
On Thu, 4 Oct 2018, Andy Lutomirski wrote:
> index 18c8a78d1ec9..419de7552c2f 100644
> --- a/arch/x86/entry/vdso/vclock_gettime.c
> +++ b/arch/x86/entry/vdso/vclock_gettime.c
> @@ -147,10 +147,9 @@ notrace static int do_hres(clockid_t clk, struct timespec *ts)
>
> do {
> seq = gtod_read_begin(gtod);
> - ts->tv_sec = base->sec;
> + cycles = vgetcyc(gtod->vclock_mode);
> ns = base->nsec;
> last = gtod->cycle_last;
> - cycles = vgetcyc(gtod->vclock_mode);
> if (unlikely((s64)cycles < 0))
> return vdso_fallback_gettime(clk, ts);
> if (cycles > last)
> @@ -158,7 +157,7 @@ notrace static int do_hres(clockid_t clk, struct timespec *ts)
> ns >>= gtod->shift;
> } while (unlikely(gtod_read_retry(gtod, seq)));
>
> - ts->tv_sec += __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
> + ts->tv_sec = base->sec + __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
You cannot access base->sec outside of the seqcount protected region. It
might have been incremented by now and you'll get a time jump by a full
second.
Thanks,
tglx
* [tip:x86/vdso] x86/vdso: Remove "memory" clobbers in the vDSO syscall fallbacks
2018-10-04 21:44 ` [PATCH 1/3] x86/vdso: Remove "memory" clobbers in the vDSO syscall fallbacks Andy Lutomirski
@ 2018-10-05 8:27 ` tip-bot for Andy Lutomirski
0 siblings, 0 replies; 8+ messages in thread
From: tip-bot for Andy Lutomirski @ 2018-10-05 8:27 UTC (permalink / raw)
To: linux-tip-commits; +Cc: mingo, torvalds, linux-kernel, hpa, tglx, luto, peterz
Commit-ID: 89fe0a1f1c694a3b0b3cfa8c0952d603753f36df
Gitweb: https://git.kernel.org/tip/89fe0a1f1c694a3b0b3cfa8c0952d603753f36df
Author: Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 4 Oct 2018 14:44:43 -0700
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 5 Oct 2018 10:12:18 +0200
x86/vdso: Remove "memory" clobbers in the vDSO syscall fallbacks
When a vDSO clock function falls back to the syscall, no special
barriers or ordering is needed, and the syscall fallbacks don't
clobber any memory that is not explicitly listed in the asm
constraints. Remove the "memory" clobber.
This causes minor changes to the generated code, but otherwise has
no obvious performance impact. I think it's nice to have, though,
since it may help the optimizer in the future.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/3a7438f5fb2422ed881683d2ccffd7f987b2dc44.1538689401.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/entry/vdso/vclock_gettime.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
index b7ccbff26a3f..18c8a78d1ec9 100644
--- a/arch/x86/entry/vdso/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vclock_gettime.c
@@ -45,7 +45,7 @@ notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
long ret;
asm ("syscall" : "=a" (ret), "=m" (*ts) :
"0" (__NR_clock_gettime), "D" (clock), "S" (ts) :
- "memory", "rcx", "r11");
+ "rcx", "r11");
return ret;
}
@@ -62,7 +62,7 @@ notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
"mov %%edx, %%ebx \n"
: "=a" (ret), "=m" (*ts)
: "0" (__NR_clock_gettime), [clock] "g" (clock), "c" (ts)
- : "memory", "edx");
+ : "edx");
return ret;
}
* [tip:x86/vdso] x86/vdso: Document vgtod_ts better
2018-10-04 21:44 ` [PATCH 3/3] x86/vdso: Document vgtod_ts better Andy Lutomirski
@ 2018-10-05 8:28 ` tip-bot for Andy Lutomirski
0 siblings, 0 replies; 8+ messages in thread
From: tip-bot for Andy Lutomirski @ 2018-10-05 8:28 UTC (permalink / raw)
To: linux-tip-commits; +Cc: mingo, tglx, luto, linux-kernel, torvalds, peterz, hpa
Commit-ID: bcc4a62a73cb65327d7268fbfa3a786d603f52dc
Gitweb: https://git.kernel.org/tip/bcc4a62a73cb65327d7268fbfa3a786d603f52dc
Author: Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 4 Oct 2018 14:44:45 -0700
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 5 Oct 2018 10:12:18 +0200
x86/vdso: Document vgtod_ts better
After reading do_hres() and do_coarse() and scratching my head a
bit, I figured out why the arithmetic is strange. Document it.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f66f53d81150bbad47d7b282c9207a71a3ce1c16.1538689401.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/include/asm/vgtod.h | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/arch/x86/include/asm/vgtod.h b/arch/x86/include/asm/vgtod.h
index d17b092b9f1b..69d05c6d47f5 100644
--- a/arch/x86/include/asm/vgtod.h
+++ b/arch/x86/include/asm/vgtod.h
@@ -13,6 +13,15 @@ typedef u64 gtod_long_t;
typedef unsigned long gtod_long_t;
#endif
+/*
+ * There is one of these objects in the vvar page for each
+ * vDSO-accelerated clockid. For high-resolution clocks, this encodes
+ * the time corresponding to vsyscall_gtod_data.cycle_last. For coarse
+ * clocks, this encodes the actual time.
+ *
+ * To confuse the reader, for high-resolution clocks, nsec is left-shifted
+ * by vsyscall_gtod_data.shift.
+ */
struct vgtod_ts {
u64 sec;
u64 nsec;
* Re: [PATCH 2/3] x86/vdso: Rearrange do_hres() to improve code generation
2018-10-05 6:00 ` Thomas Gleixner
@ 2018-10-05 13:05 ` Andy Lutomirski
0 siblings, 0 replies; 8+ messages in thread
From: Andy Lutomirski @ 2018-10-05 13:05 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: Andy Lutomirski, x86, LKML
> On Oct 4, 2018, at 11:00 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>
>> On Thu, 4 Oct 2018, Andy Lutomirski wrote:
>> index 18c8a78d1ec9..419de7552c2f 100644
>> --- a/arch/x86/entry/vdso/vclock_gettime.c
>> +++ b/arch/x86/entry/vdso/vclock_gettime.c
>> @@ -147,10 +147,9 @@ notrace static int do_hres(clockid_t clk, struct timespec *ts)
>>
>> do {
>> seq = gtod_read_begin(gtod);
>> - ts->tv_sec = base->sec;
>> + cycles = vgetcyc(gtod->vclock_mode);
>> ns = base->nsec;
>> last = gtod->cycle_last;
>> - cycles = vgetcyc(gtod->vclock_mode);
>> if (unlikely((s64)cycles < 0))
>> return vdso_fallback_gettime(clk, ts);
>> if (cycles > last)
>> @@ -158,7 +157,7 @@ notrace static int do_hres(clockid_t clk, struct timespec *ts)
>> ns >>= gtod->shift;
>> } while (unlikely(gtod_read_retry(gtod, seq)));
>>
>> - ts->tv_sec += __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
>> + ts->tv_sec = base->sec + __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
>
> You cannot access base->sec outside of the seqcount protected region. It
> might have been incremented by now and you'll get a time jump by a full
> second.
Duh. Let me try this again.
>
> Thanks,
>
> tglx
>
>
Thread overview: 8+ messages
2018-10-04 21:44 [PATCH 0/3] x86/vdso: Minor improvements Andy Lutomirski
2018-10-04 21:44 ` [PATCH 1/3] x86/vdso: Remove "memory" clobbers in the vDSO syscall fallbacks Andy Lutomirski
2018-10-05 8:27 ` [tip:x86/vdso] " tip-bot for Andy Lutomirski
2018-10-04 21:44 ` [PATCH 2/3] x86/vdso: Rearrange do_hres() to improve code generation Andy Lutomirski
2018-10-05 6:00 ` Thomas Gleixner
2018-10-05 13:05 ` Andy Lutomirski
2018-10-04 21:44 ` [PATCH 3/3] x86/vdso: Document vgtod_ts better Andy Lutomirski
2018-10-05 8:28 ` [tip:x86/vdso] " tip-bot for Andy Lutomirski