* [PATCH 0/2] x86-64: Simplify and speed up vdso clock_gettime
@ 2012-03-23  4:15 Andy Lutomirski
From: Andy Lutomirski @ 2012-03-23  4:15 UTC (permalink / raw)
  To: Thomas Gleixner, x86; +Cc: linux-kernel, john.stultz, Andy Lutomirski

I think clock_gettime is already almost as fast as possible, but every
little bit helps.  Also, I think the diffstat is pretty good for a
speedup.  Here are some approximate timings.

                             Before       After
CLOCK_REALTIME                 16.7        15.2
CLOCK_MONOTONIC                17.3        15.5
CLOCK_REALTIME_COARSE           3.6         3.0
CLOCK_MONOTONIC_COARSE          4.2         3.6
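
For a rough idea of how numbers like these can be gathered (this is only an
illustrative harness, not the one used for the table above; the units there
are presumably nanoseconds per call), a trivial user-space loop suffices:

/* clockbench.c -- illustration only, not the benchmark behind the table
 * above.  Calls clock_gettime() in a tight loop and prints ns per call.
 * Build: gcc -O2 -o clockbench clockbench.c -lrt
 */
#include <stdio.h>
#include <time.h>

static void bench(clockid_t id, const char *name)
{
	const long iters = 10 * 1000 * 1000;
	struct timespec ts, t0, t1;
	double ns;
	long i;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < iters; i++)
		clock_gettime(id, &ts);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
	printf("%-24s %6.1f ns/call\n", name, ns / iters);
}

int main(void)
{
	bench(CLOCK_REALTIME, "CLOCK_REALTIME");
	bench(CLOCK_MONOTONIC, "CLOCK_MONOTONIC");
	bench(CLOCK_REALTIME_COARSE, "CLOCK_REALTIME_COARSE");
	bench(CLOCK_MONOTONIC_COARSE, "CLOCK_MONOTONIC_COARSE");
	return 0;
}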

These are extracted from an earlier series that's mostly abandoned now [1-2].
They apply to tip/timers/core commit 57779dc2b3b75bee05ef5d1ada47f615f7a13932.

For the git-inclined, the patches are here:
https://git.kernel.org/?p=linux/kernel/git/luto/linux.git;a=shortlog;h=refs/heads/timing/vclock_speedup/patch_v1

I'm not sure whether these are 3.4 material.  On the pro side, they've
technically been floating around since long before the merge window.
They're also quite straightforward, and they're based on other -tip
changes (which is why I'm submitting now).  On the con side, they don't
fix anything, and they're a little later than ideal.

[1] https://lkml.org/lkml/2011/12/25/26
[2] https://lkml.org/lkml/2011/12/25/27

Andy Lutomirski (2):
  x86-64: Simplify and optimize vdso clock_gettime monotonic variants
  x86-64: Inline vdso clock_gettime helpers

 arch/x86/include/asm/vgtod.h   |   15 +++++++-----
 arch/x86/kernel/vsyscall_64.c  |   10 +++++++-
 arch/x86/vdso/vclock_gettime.c |   47 +++++++++++----------------------------
 3 files changed, 31 insertions(+), 41 deletions(-)

-- 
1.7.7.6



* [PATCH 1/2] x86-64: Simplify and optimize vdso clock_gettime monotonic variants
@ 2012-03-23  4:15 ` Andy Lutomirski
From: Andy Lutomirski @ 2012-03-23  4:15 UTC (permalink / raw)
  To: Thomas Gleixner, x86
  Cc: linux-kernel, john.stultz, Andy Lutomirski, Andy Lutomirski

From: Andy Lutomirski <luto@mit.edu>

We used to store the wall-to-monotonic offset and the realtime base.
It's faster to precompute the monotonic base.

This is about a 3% speedup on Sandy Bridge for CLOCK_MONOTONIC.
It's much more impressive for CLOCK_MONOTONIC_COARSE.
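
In user-space terms the change looks roughly like this (a sketch only: no
seqcount, no vgetns(), and the field names merely mirror vsyscall_gtod_data;
the real change is the diff below):

/* gtod_sketch.c -- illustration of "precompute the monotonic base". */
#include <stdio.h>
#include <time.h>

#define NSEC_PER_SEC	1000000000L

struct gtod {
	struct timespec wall_time;		/* written on every update */
	struct timespec wall_to_monotonic;	/* old scheme: readers added this */
	struct timespec monotonic_time;		/* new scheme: precomputed base */
};

/* Updater side (update_vsyscall): one extra addition per timekeeping update. */
static void gtod_update(struct gtod *g, struct timespec wall, struct timespec wtm)
{
	g->wall_time = wall;
	g->wall_to_monotonic = wtm;
	g->monotonic_time.tv_sec  = wall.tv_sec + wtm.tv_sec;
	g->monotonic_time.tv_nsec = wall.tv_nsec + wtm.tv_nsec;
	if (g->monotonic_time.tv_nsec >= NSEC_PER_SEC) {
		g->monotonic_time.tv_nsec -= NSEC_PER_SEC;
		g->monotonic_time.tv_sec++;
	}
}

/* Reader side (do_monotonic): the old code added wall + wall_to_monotonic +
 * delta and then normalized in a loop; the new code copies the base and only
 * folds in the nanoseconds elapsed since the last update. */
static void gtod_read_monotonic(const struct gtod *g, long delta_ns,
				struct timespec *ts)
{
	*ts = g->monotonic_time;
	ts->tv_nsec += delta_ns;
	while (ts->tv_nsec >= NSEC_PER_SEC) {
		ts->tv_nsec -= NSEC_PER_SEC;
		ts->tv_sec++;
	}
}

int main(void)
{
	struct gtod g;
	struct timespec ts;

	gtod_update(&g, (struct timespec){ 1332476100, 900000000 },
			(struct timespec){ -1332470000, 250000000 });
	gtod_read_monotonic(&g, 1234, &ts);
	printf("monotonic (sketch): %ld.%09ld\n", (long)ts.tv_sec, ts.tv_nsec);
	return 0;
}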

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
 arch/x86/include/asm/vgtod.h   |   15 +++++++++------
 arch/x86/kernel/vsyscall_64.c  |   10 +++++++++-
 arch/x86/vdso/vclock_gettime.c |   38 ++++++++------------------------------
 3 files changed, 26 insertions(+), 37 deletions(-)

diff --git a/arch/x86/include/asm/vgtod.h b/arch/x86/include/asm/vgtod.h
index 1f00717..8b38be2 100644
--- a/arch/x86/include/asm/vgtod.h
+++ b/arch/x86/include/asm/vgtod.h
@@ -7,11 +7,6 @@
 struct vsyscall_gtod_data {
 	seqcount_t	seq;
 
-	/* open coded 'struct timespec' */
-	time_t		wall_time_sec;
-	u32		wall_time_nsec;
-
-	struct timezone sys_tz;
 	struct { /* extract of a clocksource struct */
 		int vclock_mode;
 		cycle_t	cycle_last;
@@ -19,8 +14,16 @@ struct vsyscall_gtod_data {
 		u32	mult;
 		u32	shift;
 	} clock;
-	struct timespec wall_to_monotonic;
+
+	/* open coded 'struct timespec' */
+	time_t		wall_time_sec;
+	u32		wall_time_nsec;
+	u32		monotonic_time_nsec;
+	time_t		monotonic_time_sec;
+
+	struct timezone sys_tz;
 	struct timespec wall_time_coarse;
+	struct timespec monotonic_time_coarse;
 };
 extern struct vsyscall_gtod_data vsyscall_gtod_data;
 
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index cdc95a7..4285f1f 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -84,6 +84,7 @@ void update_vsyscall(struct timespec *wall_time, struct timespec *wtm,
 			struct clocksource *clock, u32 mult)
 {
+	struct timespec monotonic;
 	write_seqcount_begin(&vsyscall_gtod_data.seq);
 
 	/* copy vsyscall data */
 	vsyscall_gtod_data.clock.vclock_mode	= clock->archdata.vclock_mode;
@@ -91,10 +92,17 @@ void update_vsyscall(struct timespec *wall_time, struct timespec *wtm,
 	vsyscall_gtod_data.clock.mask		= clock->mask;
 	vsyscall_gtod_data.clock.mult		= mult;
 	vsyscall_gtod_data.clock.shift		= clock->shift;
+
 	vsyscall_gtod_data.wall_time_sec	= wall_time->tv_sec;
 	vsyscall_gtod_data.wall_time_nsec	= wall_time->tv_nsec;
-	vsyscall_gtod_data.wall_to_monotonic	= *wtm;
+
+	monotonic = timespec_add(*wall_time, *wtm);
+	vsyscall_gtod_data.monotonic_time_sec	= monotonic.tv_sec;
+	vsyscall_gtod_data.monotonic_time_nsec	= monotonic.tv_nsec;
+
 	vsyscall_gtod_data.wall_time_coarse	= __current_kernel_time();
+	vsyscall_gtod_data.monotonic_time_coarse =
+		timespec_add(vsyscall_gtod_data.wall_time_coarse, *wtm);
 
 	write_seqcount_end(&vsyscall_gtod_data.seq);
 }
diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
index 944c5e5..6eea70b8 100644
--- a/arch/x86/vdso/vclock_gettime.c
+++ b/arch/x86/vdso/vclock_gettime.c
@@ -113,27 +113,17 @@ notrace static noinline int do_realtime(struct timespec *ts)
 
 notrace static noinline int do_monotonic(struct timespec *ts)
 {
-	unsigned long seq, ns, secs;
+	unsigned long seq, ns;
 	int mode;
 
 	do {
 		seq = read_seqcount_begin(&gtod->seq);
 		mode = gtod->clock.vclock_mode;
-		secs = gtod->wall_time_sec;
-		ns = gtod->wall_time_nsec + vgetns();
-		secs += gtod->wall_to_monotonic.tv_sec;
-		ns += gtod->wall_to_monotonic.tv_nsec;
+		ts->tv_sec = gtod->monotonic_time_sec;
+		ts->tv_nsec = gtod->monotonic_time_nsec;
+		ns = vgetns();
 	} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
-
-	/* wall_time_nsec, vgetns(), and wall_to_monotonic.tv_nsec
-	 * are all guaranteed to be nonnegative.
-	 */
-	while (ns >= NSEC_PER_SEC) {
-		ns -= NSEC_PER_SEC;
-		++secs;
-	}
-	ts->tv_sec = secs;
-	ts->tv_nsec = ns;
+	timespec_add_ns(ts, ns);
 
 	return mode;
 }
@@ -151,25 +141,13 @@ notrace static noinline int do_realtime_coarse(struct timespec *ts)
 
 notrace static noinline int do_monotonic_coarse(struct timespec *ts)
 {
-	unsigned long seq, ns, secs;
+	unsigned long seq;
 	do {
 		seq = read_seqcount_begin(&gtod->seq);
-		secs = gtod->wall_time_coarse.tv_sec;
-		ns = gtod->wall_time_coarse.tv_nsec;
-		secs += gtod->wall_to_monotonic.tv_sec;
-		ns += gtod->wall_to_monotonic.tv_nsec;
+		ts->tv_sec = gtod->monotonic_time_coarse.tv_sec;
+		ts->tv_nsec = gtod->monotonic_time_coarse.tv_nsec;
 	} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
 
-	/* wall_time_nsec and wall_to_monotonic.tv_nsec are
-	 * guaranteed to be between 0 and NSEC_PER_SEC.
-	 */
-	if (ns >= NSEC_PER_SEC) {
-		ns -= NSEC_PER_SEC;
-		++secs;
-	}
-	ts->tv_sec = secs;
-	ts->tv_nsec = ns;
-
 	return 0;
 }
 
-- 
1.7.7.6



* [PATCH 2/2] x86-64: Inline vdso clock_gettime helpers
@ 2012-03-23  4:15 ` Andy Lutomirski
From: Andy Lutomirski @ 2012-03-23  4:15 UTC (permalink / raw)
  To: Thomas Gleixner, x86
  Cc: linux-kernel, john.stultz, Andy Lutomirski, Andy Lutomirski

From: Andy Lutomirski <luto@mit.edu>

This is about a 3% speedup on Sandy Bridge.
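
For context, __vdso_clock_gettime() is a small switch with one helper per
clock id, so inlining the helpers drops a call/return from every read.  A
stub illustration of the pattern (the fake_* names are made up and the
bodies stand in for the real seqcount loops; the actual change is below):

#include <time.h>

static inline __attribute__((always_inline)) int fake_do_realtime(struct timespec *ts)
{
	ts->tv_sec = 0;			/* stand-in for the seqcount read loop */
	ts->tv_nsec = 0;
	return 0;
}

static inline __attribute__((always_inline)) int fake_do_monotonic(struct timespec *ts)
{
	ts->tv_sec = 0;			/* ditto */
	ts->tv_nsec = 0;
	return 0;
}

int fake_clock_gettime(clockid_t clock, struct timespec *ts)
{
	switch (clock) {
	case CLOCK_REALTIME:
		return fake_do_realtime(ts);	/* body inlined, no call emitted */
	case CLOCK_MONOTONIC:
		return fake_do_monotonic(ts);	/* body inlined, no call emitted */
	default:
		return -1;
	}
}

int main(void)
{
	struct timespec ts;
	return fake_clock_gettime(CLOCK_MONOTONIC, &ts);
}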

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
 arch/x86/vdso/vclock_gettime.c |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
index 6eea70b8..885eff4 100644
--- a/arch/x86/vdso/vclock_gettime.c
+++ b/arch/x86/vdso/vclock_gettime.c
@@ -94,7 +94,8 @@ notrace static inline long vgetns(void)
 	return (v * gtod->clock.mult) >> gtod->clock.shift;
 }
 
-notrace static noinline int do_realtime(struct timespec *ts)
+/* Code size doesn't matter (vdso is 4k anyway) and this is faster. */
+notrace static int __always_inline do_realtime(struct timespec *ts)
 {
 	unsigned long seq, ns;
 	int mode;
@@ -111,7 +112,7 @@ notrace static noinline int do_realtime(struct timespec *ts)
 	return mode;
 }
 
-notrace static noinline int do_monotonic(struct timespec *ts)
+notrace static int do_monotonic(struct timespec *ts)
 {
 	unsigned long seq, ns;
 	int mode;
@@ -128,7 +129,7 @@ notrace static noinline int do_monotonic(struct timespec *ts)
 	return mode;
 }
 
-notrace static noinline int do_realtime_coarse(struct timespec *ts)
+notrace static int do_realtime_coarse(struct timespec *ts)
 {
 	unsigned long seq;
 	do {
@@ -139,7 +140,7 @@ notrace static noinline int do_realtime_coarse(struct timespec *ts)
 	return 0;
 }
 
-notrace static noinline int do_monotonic_coarse(struct timespec *ts)
+notrace static int do_monotonic_coarse(struct timespec *ts)
 {
 	unsigned long seq;
 	do {
-- 
1.7.7.6



* Re: [PATCH 0/2] x86-64: Simplify and speed up vdso clock_gettime
@ 2012-03-23 23:47 ` John Stultz
From: John Stultz @ 2012-03-23 23:47 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Thomas Gleixner, x86, linux-kernel

On 03/22/2012 09:15 PM, Andy Lutomirski wrote:
> I think clock_gettime is already almost as fast as possible, but every
> little bit helps.  Also, I think the diffstat is pretty good for a
> speedup.  Here are some approximate timings.
>
>                               Before       After
> CLOCK_REALTIME                 16.7        15.2
> CLOCK_MONOTONIC                17.3        15.5
> CLOCK_REALTIME_COARSE           3.6         3.0
> CLOCK_MONOTONIC_COARSE          4.2         3.6
>
> These are extracted from an earlier series that's mostly abandoned now [1-2].
> They apply to tip/timers/core commit 57779dc2b3b75bee05ef5d1ada47f615f7a13932.
>
> For the git-inclined, the patches are here:
> https://git.kernel.org/?p=linux/kernel/git/luto/linux.git;a=shortlog;h=refs/heads/timing/vclock_speedup/patch_v1
>
> I'm not sure whether these are 3.4 material.  On the pro side, they've
> technically been floating around since long before the merge window.
> They're also quite straightforward, and they're based on other -tip
> changes (which is why I'm submitting now).  On the con side, they don't
> fix anything, and they're a little later than ideal.

They look straightforward enough. I've queued them at the end of my 3.4
queue. If Thomas or anyone wants to wait on them, I'll push them off to 3.5.

thanks
-john


