* [PATCH 0/2] x86-64: Simplify and speed up vdso clock_gettime
@ 2012-03-23 4:15 Andy Lutomirski
From: Andy Lutomirski @ 2012-03-23 4:15 UTC (permalink / raw)
To: Thomas Gleixner, x86; +Cc: linux-kernel, john.stultz, Andy Lutomirski
I think clock_gettime is already almost as fast as possible, but every
little bit helps. Also, I think the diffstat is pretty good for a
speedup. Here are some approximate timings.
                         Before   After
CLOCK_REALTIME             16.7    15.2
CLOCK_MONOTONIC            17.3    15.5
CLOCK_REALTIME_COARSE       3.6     3.0
CLOCK_MONOTONIC_COARSE      4.2     3.6
These are extracted from an earlier series that's mostly abandoned now [1-2].
They apply to tip/timers/core commit 57779dc2b3b75bee05ef5d1ada47f615f7a13932.
For the git-inclined, the patches are here:
https://git.kernel.org/?p=linux/kernel/git/luto/linux.git;a=shortlog;h=refs/heads/timing/vclock_speedup/patch_v1
I'm not sure whether these are 3.4 material. On the pro side, they've
technically been floating around since long before the merge window.
They're also quite straightforward, and they're based on other -tip
changes (which is why I'm submitting now). On the con side, they don't
fix anything, and they're a little later than ideal.
[1] https://lkml.org/lkml/2011/12/25/26
[2] https://lkml.org/lkml/2011/12/25/27
Andy Lutomirski (2):
x86-64: Simplify and optimize vdso clock_gettime monotonic variants
x86-64: Inline vdso clock_gettime helpers
arch/x86/include/asm/vgtod.h | 15 +++++++-----
arch/x86/kernel/vsyscall_64.c | 10 +++++++-
arch/x86/vdso/vclock_gettime.c | 47 +++++++++++----------------------------
3 files changed, 31 insertions(+), 41 deletions(-)
--
1.7.7.6
* [PATCH 1/2] x86-64: Simplify and optimize vdso clock_gettime monotonic variants
From: Andy Lutomirski @ 2012-03-23 4:15 UTC (permalink / raw)
To: Thomas Gleixner, x86
Cc: linux-kernel, john.stultz, Andy Lutomirski, Andy Lutomirski
From: Andy Lutomirski <luto@mit.edu>
We used to store the wall-to-monotonic offset and the realtime base.
It's faster to precompute the monotonic base.
This is about a 3% speedup on Sandy Bridge for CLOCK_MONOTONIC.
It's much more impressive for CLOCK_MONOTONIC_COARSE.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
arch/x86/include/asm/vgtod.h | 15 +++++++++------
arch/x86/kernel/vsyscall_64.c | 10 +++++++++-
arch/x86/vdso/vclock_gettime.c | 38 ++++++++------------------------------
3 files changed, 26 insertions(+), 37 deletions(-)
diff --git a/arch/x86/include/asm/vgtod.h b/arch/x86/include/asm/vgtod.h
index 1f00717..8b38be2 100644
--- a/arch/x86/include/asm/vgtod.h
+++ b/arch/x86/include/asm/vgtod.h
@@ -7,11 +7,6 @@
struct vsyscall_gtod_data {
seqcount_t seq;
- /* open coded 'struct timespec' */
- time_t wall_time_sec;
- u32 wall_time_nsec;
-
- struct timezone sys_tz;
struct { /* extract of a clocksource struct */
int vclock_mode;
cycle_t cycle_last;
@@ -19,8 +14,16 @@ struct vsyscall_gtod_data {
u32 mult;
u32 shift;
} clock;
- struct timespec wall_to_monotonic;
+
+ /* open coded 'struct timespec' */
+ time_t wall_time_sec;
+ u32 wall_time_nsec;
+ u32 monotonic_time_nsec;
+ time_t monotonic_time_sec;
+
+ struct timezone sys_tz;
struct timespec wall_time_coarse;
+ struct timespec monotonic_time_coarse;
};
extern struct vsyscall_gtod_data vsyscall_gtod_data;
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index cdc95a7..4285f1f 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -84,6 +84,7 @@ void update_vsyscall(struct timespec *wall_time, struct timespec *wtm,
struct clocksource *clock, u32 mult)
{
+ struct timespec monotonic;
write_seqcount_begin(&vsyscall_gtod_data.seq);
/* copy vsyscall data */
vsyscall_gtod_data.clock.vclock_mode = clock->archdata.vclock_mode;
@@ -91,10 +92,17 @@ void update_vsyscall(struct timespec *wall_time, struct timespec *wtm,
vsyscall_gtod_data.clock.mask = clock->mask;
vsyscall_gtod_data.clock.mult = mult;
vsyscall_gtod_data.clock.shift = clock->shift;
+
vsyscall_gtod_data.wall_time_sec = wall_time->tv_sec;
vsyscall_gtod_data.wall_time_nsec = wall_time->tv_nsec;
- vsyscall_gtod_data.wall_to_monotonic = *wtm;
+
+ monotonic = timespec_add(*wall_time, *wtm);
+ vsyscall_gtod_data.monotonic_time_sec = monotonic.tv_sec;
+ vsyscall_gtod_data.monotonic_time_nsec = monotonic.tv_nsec;
+
vsyscall_gtod_data.wall_time_coarse = __current_kernel_time();
+ vsyscall_gtod_data.monotonic_time_coarse =
+ timespec_add(vsyscall_gtod_data.wall_time_coarse, *wtm);
write_seqcount_end(&vsyscall_gtod_data.seq);
}
diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
index 944c5e5..6eea70b8 100644
--- a/arch/x86/vdso/vclock_gettime.c
+++ b/arch/x86/vdso/vclock_gettime.c
@@ -113,27 +113,17 @@ notrace static noinline int do_realtime(struct timespec *ts)
notrace static noinline int do_monotonic(struct timespec *ts)
{
- unsigned long seq, ns, secs;
+ unsigned long seq, ns;
int mode;
do {
seq = read_seqcount_begin(&gtod->seq);
mode = gtod->clock.vclock_mode;
- secs = gtod->wall_time_sec;
- ns = gtod->wall_time_nsec + vgetns();
- secs += gtod->wall_to_monotonic.tv_sec;
- ns += gtod->wall_to_monotonic.tv_nsec;
+ ts->tv_sec = gtod->monotonic_time_sec;
+ ts->tv_nsec = gtod->monotonic_time_nsec;
+ ns = vgetns();
} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
-
- /* wall_time_nsec, vgetns(), and wall_to_monotonic.tv_nsec
- * are all guaranteed to be nonnegative.
- */
- while (ns >= NSEC_PER_SEC) {
- ns -= NSEC_PER_SEC;
- ++secs;
- }
- ts->tv_sec = secs;
- ts->tv_nsec = ns;
+ timespec_add_ns(ts, ns);
return mode;
}
@@ -151,25 +141,13 @@ notrace static noinline int do_realtime_coarse(struct timespec *ts)
notrace static noinline int do_monotonic_coarse(struct timespec *ts)
{
- unsigned long seq, ns, secs;
+ unsigned long seq;
do {
seq = read_seqcount_begin(&gtod->seq);
- secs = gtod->wall_time_coarse.tv_sec;
- ns = gtod->wall_time_coarse.tv_nsec;
- secs += gtod->wall_to_monotonic.tv_sec;
- ns += gtod->wall_to_monotonic.tv_nsec;
+ ts->tv_sec = gtod->monotonic_time_coarse.tv_sec;
+ ts->tv_nsec = gtod->monotonic_time_coarse.tv_nsec;
} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
- /* wall_time_nsec and wall_to_monotonic.tv_nsec are
- * guaranteed to be between 0 and NSEC_PER_SEC.
- */
- if (ns >= NSEC_PER_SEC) {
- ns -= NSEC_PER_SEC;
- ++secs;
- }
- ts->tv_sec = secs;
- ts->tv_nsec = ns;
-
return 0;
}
--
1.7.7.6
* [PATCH 2/2] x86-64: Inline vdso clock_gettime helpers
From: Andy Lutomirski @ 2012-03-23 4:15 UTC (permalink / raw)
To: Thomas Gleixner, x86
Cc: linux-kernel, john.stultz, Andy Lutomirski, Andy Lutomirski
From: Andy Lutomirski <luto@mit.edu>
This is about a 3% speedup on Sandy Bridge.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
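The tradeoff here is the classic one: forced inlining removes call/return overhead at the cost of code size, which is irrelevant when the vdso occupies 4k regardless. In userspace C the same annotation can be spelled with a GCC/Clang attribute; the names below are illustrative, not the vdso's:

```c
/* Userspace analogue of the kernel's __always_inline: with gcc or
 * clang this forces inlining even when the optimizer's cost model
 * would otherwise decline. */
#define my_always_inline static inline __attribute__((always_inline))

my_always_inline int helper(int x)
{
	return x + 1;	/* stand-in for a small, hot clock helper */
}

int caller(int x)
{
	/* After inlining, no call instruction remains here; the
	 * dispatch switch in the real __vdso_clock_gettime() similarly
	 * collapses into straight-line code. */
	return helper(x);
}
```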
---
arch/x86/vdso/vclock_gettime.c | 9 +++++----
1 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
index 6eea70b8..885eff4 100644
--- a/arch/x86/vdso/vclock_gettime.c
+++ b/arch/x86/vdso/vclock_gettime.c
@@ -94,7 +94,8 @@ notrace static inline long vgetns(void)
return (v * gtod->clock.mult) >> gtod->clock.shift;
}
-notrace static noinline int do_realtime(struct timespec *ts)
+/* Code size doesn't matter (vdso is 4k anyway) and this is faster. */
+notrace static int __always_inline do_realtime(struct timespec *ts)
{
unsigned long seq, ns;
int mode;
@@ -111,7 +112,7 @@ notrace static noinline int do_realtime(struct timespec *ts)
return mode;
}
-notrace static noinline int do_monotonic(struct timespec *ts)
+notrace static int do_monotonic(struct timespec *ts)
{
unsigned long seq, ns;
int mode;
@@ -128,7 +129,7 @@ notrace static noinline int do_monotonic(struct timespec *ts)
return mode;
}
-notrace static noinline int do_realtime_coarse(struct timespec *ts)
+notrace static int do_realtime_coarse(struct timespec *ts)
{
unsigned long seq;
do {
@@ -139,7 +140,7 @@ notrace static noinline int do_realtime_coarse(struct timespec *ts)
return 0;
}
-notrace static noinline int do_monotonic_coarse(struct timespec *ts)
+notrace static int do_monotonic_coarse(struct timespec *ts)
{
unsigned long seq;
do {
--
1.7.7.6
* Re: [PATCH 0/2] x86-64: Simplify and speed up vdso clock_gettime
From: John Stultz @ 2012-03-23 23:47 UTC (permalink / raw)
To: Andy Lutomirski; +Cc: Thomas Gleixner, x86, linux-kernel
On 03/22/2012 09:15 PM, Andy Lutomirski wrote:
> I think clock_gettime is already almost as fast as possible, but every
> little bit helps. Also, I think the diffstat is pretty good for a
> speedup. Here are some approximate timings.
>
>                          Before   After
> CLOCK_REALTIME             16.7    15.2
> CLOCK_MONOTONIC            17.3    15.5
> CLOCK_REALTIME_COARSE       3.6     3.0
> CLOCK_MONOTONIC_COARSE      4.2     3.6
>
> These are extracted from an earlier series that's mostly abandoned now [1-2].
> They apply to tip/timers/core commit 57779dc2b3b75bee05ef5d1ada47f615f7a13932.
>
> For the git-inclined, the patches are here:
> https://git.kernel.org/?p=linux/kernel/git/luto/linux.git;a=shortlog;h=refs/heads/timing/vclock_speedup/patch_v1
>
> I'm not sure whether these are 3.4 material. On the pro side, they've
> technically been floating around since long before the merge window.
> They're also quite straightforward, and they're based on other -tip
> changes (which is why I'm submitting now). On the con side, they don't
> fix anything, and they're a little later than ideal.
They look straightforward enough. I've queued them at the end of my 3.4
queue. If Thomas or anyone wants to wait on them, I'll push them off to 3.5.

Thanks,
-john