* [PATCH v3] x86/vdso: Handle clock_gettime(CLOCK_TAI) in vDSO
@ 2018-09-01  1:59 Matt Rickard
  2018-09-01  3:39 ` Andy Lutomirski
  0 siblings, 1 reply; 16+ messages in thread
From: Matt Rickard @ 2018-09-01  1:59 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Andy Lutomirski, Stephen Boyd, John Stultz, X86 ML, LKML

Process clock_gettime(CLOCK_TAI) in vDSO.
This makes the call about as fast as CLOCK_REALTIME and CLOCK_MONOTONIC:

  nanoseconds
 before after clockname
   ---- ----- ---------
    233    87 CLOCK_TAI
     96    93 CLOCK_REALTIME
     88    87 CLOCK_MONOTONIC

Signed-off-by: Matt Rickard <matt@softrans.com.au>
---
 arch/x86/entry/vdso/vclock_gettime.c    | 25 +++++++++++++++++++++++++
 arch/x86/entry/vsyscall/vsyscall_gtod.c |  2 ++
 arch/x86/include/asm/vgtod.h            |  1 +
 3 files changed, 28 insertions(+)

diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
index f19856d95c60..91ed1bb2a3bb 100644
--- a/arch/x86/entry/vdso/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vclock_gettime.c
@@ -246,6 +246,27 @@ notrace static int __always_inline do_monotonic(struct timespec *ts)
 	return mode;
 }
 
+notrace static int __always_inline do_tai(struct timespec *ts)
+{
+	unsigned long seq;
+	u64 ns;
+	int mode;
+
+	do {
+		seq = gtod_read_begin(gtod);
+		mode = gtod->vclock_mode;
+		ts->tv_sec = gtod->tai_time_sec;
+		ns = gtod->wall_time_snsec;
+		ns += vgetsns(&mode);
+		ns >>= gtod->shift;
+	} while (unlikely(gtod_read_retry(gtod, seq)));
+
+	ts->tv_sec += __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
+	ts->tv_nsec = ns;
+
+	return mode;
+}
+
 notrace static void do_realtime_coarse(struct timespec *ts)
 {
 	unsigned long seq;
@@ -277,6 +298,10 @@ notrace int __vdso_clock_gettime(clockid_t clock, struct timespec *ts)
 		if (do_monotonic(ts) == VCLOCK_NONE)
 			goto fallback;
 		break;
+	case CLOCK_TAI:
+		if (do_tai(ts) == VCLOCK_NONE)
+			goto fallback;
+		break;
 	case CLOCK_REALTIME_COARSE:
 		do_realtime_coarse(ts);
 		break;
diff --git a/arch/x86/entry/vsyscall/vsyscall_gtod.c b/arch/x86/entry/vsyscall/vsyscall_gtod.c
index e1216dd95c04..d61392fe17f6 100644
--- a/arch/x86/entry/vsyscall/vsyscall_gtod.c
+++ b/arch/x86/entry/vsyscall/vsyscall_gtod.c
@@ -53,6 +53,8 @@ void update_vsyscall(struct timekeeper *tk)
 	vdata->monotonic_time_snsec	= tk->tkr_mono.xtime_nsec
 					+ ((u64)tk->wall_to_monotonic.tv_nsec
 						<< tk->tkr_mono.shift);
+	vdata->tai_time_sec	        = tk->xtime_sec
+					+ tk->tai_offset;
 	while (vdata->monotonic_time_snsec >=
 					(((u64)NSEC_PER_SEC) << tk->tkr_mono.shift)) {
 		vdata->monotonic_time_snsec -=
diff --git a/arch/x86/include/asm/vgtod.h b/arch/x86/include/asm/vgtod.h
index fb856c9f0449..adc9f7b20b9c 100644
--- a/arch/x86/include/asm/vgtod.h
+++ b/arch/x86/include/asm/vgtod.h
@@ -32,6 +32,7 @@ struct vsyscall_gtod_data {
 	gtod_long_t	wall_time_coarse_nsec;
 	gtod_long_t	monotonic_time_coarse_sec;
 	gtod_long_t	monotonic_time_coarse_nsec;
+	gtod_long_t	tai_time_sec;
 
 	int		tz_minuteswest;
 	int		tz_dsttime;


* Re: [PATCH v3] x86/vdso: Handle clock_gettime(CLOCK_TAI) in vDSO
  2018-09-01  1:59 [PATCH v3] x86/vdso: Handle clock_gettime(CLOCK_TAI) in vDSO Matt Rickard
@ 2018-09-01  3:39 ` Andy Lutomirski
  2018-09-01  9:33   ` Florian Weimer
  2018-09-09 20:05   ` Thomas Gleixner
  0 siblings, 2 replies; 16+ messages in thread
From: Andy Lutomirski @ 2018-09-01  3:39 UTC (permalink / raw)
  To: Matt Rickard, Florian Weimer
  Cc: Thomas Gleixner, Andy Lutomirski, Stephen Boyd, John Stultz,
	X86 ML, LKML


(Hi, Florian!)

On Fri, Aug 31, 2018 at 6:59 PM, Matt Rickard <matt@softrans.com.au> wrote:
> Process clock_gettime(CLOCK_TAI) in vDSO.
> This makes the call about as fast as CLOCK_REALTIME and CLOCK_MONOTONIC:
>
>   nanoseconds
>  before after clockname
>    ---- ----- ---------
>     233    87 CLOCK_TAI
>      96    93 CLOCK_REALTIME
>      88    87 CLOCK_MONOTONIC

Are you sure you did this right?  With the clocksource set to TSC
(which is the only reasonable choice unless KVM has seriously cleaned
up its act), with retpolines enabled, I get 24ns for CLOCK_MONOTONIC
without your patch and 32ns with your patch.  And there is indeed a
retpoline in the disassembled output:

  e5:   e8 07 00 00 00          callq  f1 <__vdso_clock_gettime+0x31>
  ea:   f3 90                   pause
  ec:   0f ae e8                lfence
  ef:   eb f9                   jmp    ea <__vdso_clock_gettime+0x2a>
  f1:   48 89 04 24             mov    %rax,(%rsp)
  f5:   c3                      retq

You're probably going to have to set -fno-jump-tables or do something
clever like adding a whole array of (seconds, nsec) in gtod and
indexing that array by the clock id.
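
A minimal user-space sketch of that array idea (names are illustrative,
not the kernel's; the patch Thomas posts later in this thread is the
full version):

#include <stdint.h>
#include <time.h>

/* One precomputed (sec, nsec) pair per clock id.  The reader indexes
 * the array instead of switching on the clock id, so the compiler has
 * no reason to emit a jump table or an indirect branch. */
struct base_ts {
	int64_t  sec;
	uint64_t nsec;
};

static struct base_ts basetime[4];

static int read_clock(clockid_t clk, struct timespec *ts)
{
	const struct base_ts *b = &basetime[clk & 3];

	ts->tv_sec  = (time_t)b->sec;
	ts->tv_nsec = (long)b->nsec;
	return 0;
}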

Meanwhile, I wrote the following trivial patch to add a
__vdso_clock_gettime_monotonic export.  It runs in 21ns, and I suspect
that the speedup is even a bit bigger when cache-cold because it
avoids some branches.  What do you all think?  Florian, do you think
glibc would be willing to add some magic to turn
clock_gettime(CLOCK_MONOTONIC, t) into
__vdso_clock_gettime_monotonic(t) when CLOCK_MONOTONIC is a constant?
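
For reference, the "magic" could be as small as a macro in glibc's
headers. A hypothetical sketch, nothing glibc actually ships:

#include <time.h>

extern int __vdso_clock_gettime_monotonic(struct timespec *ts);

/* If the clock id is a compile-time constant equal to CLOCK_MONOTONIC,
 * call the specialized vDSO entry point; otherwise call the ordinary
 * function (the parenthesized name suppresses macro expansion). */
#define clock_gettime(clk, ts)						\
	(__builtin_constant_p(clk) && (clk) == CLOCK_MONOTONIC		\
		? __vdso_clock_gettime_monotonic(ts)			\
		: (clock_gettime)(clk, ts))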

[-- Attachment #2: vclock_gettime_monotonic.patch --]

diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
index 91ed1bb2a3bb..4f22e9cb97a5 100644
--- a/arch/x86/entry/vdso/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vclock_gettime.c
@@ -319,6 +319,14 @@ notrace int __vdso_clock_gettime(clockid_t clock, struct timespec *ts)
 int clock_gettime(clockid_t, struct timespec *)
 	__attribute__((weak, alias("__vdso_clock_gettime")));
 
+notrace int __vdso_clock_gettime_monotonic(struct timespec *ts)
+{
+	if (likely(do_monotonic(ts) != VCLOCK_NONE))
+		return 0;
+
+	return vdso_fallback_gettime(CLOCK_MONOTONIC, ts);
+}
+
 notrace int __vdso_gettimeofday(struct timeval *tv, struct timezone *tz)
 {
 	if (likely(tv != NULL)) {
diff --git a/arch/x86/entry/vdso/vdso.lds.S b/arch/x86/entry/vdso/vdso.lds.S
index d3a2dce4cfa9..28e23cbc02c9 100644
--- a/arch/x86/entry/vdso/vdso.lds.S
+++ b/arch/x86/entry/vdso/vdso.lds.S
@@ -15,6 +15,11 @@
  * This controls what userland symbols we export from the vDSO.
  */
 VERSION {
+	LINUX_4.19 {
+	global:
+		__vdso_clock_gettime_monotonic;
+	};
+
 	LINUX_2.6 {
 	global:
 		clock_gettime;


* Re: [PATCH v3] x86/vdso: Handle clock_gettime(CLOCK_TAI) in vDSO
  2018-09-01  3:39 ` Andy Lutomirski
@ 2018-09-01  9:33   ` Florian Weimer
  2018-09-01 17:37     ` Andy Lutomirski
  2018-09-09 20:05   ` Thomas Gleixner
  1 sibling, 1 reply; 16+ messages in thread
From: Florian Weimer @ 2018-09-01  9:33 UTC (permalink / raw)
  To: Andy Lutomirski, Matt Rickard
  Cc: Thomas Gleixner, Stephen Boyd, John Stultz, X86 ML, LKML

On 09/01/2018 05:39 AM, Andy Lutomirski wrote:
> Florian, do you think
> glibc would be willing to add some magic to turn
> clock_gettime(CLOCK_MONOTONIC, t) into
> __vdso_clock_gettime_monotonic(t) when CLOCK_MONOTONIC is a constant?

What's the goal here?  Turn the indirect call/conditional jump/indirect 
call sequence into a single indirect call, purely for performance reasons?

Thanks,
Florian


* Re: [PATCH v3] x86/vdso: Handle clock_gettime(CLOCK_TAI) in vDSO
  2018-09-01  9:33   ` Florian Weimer
@ 2018-09-01 17:37     ` Andy Lutomirski
  0 siblings, 0 replies; 16+ messages in thread
From: Andy Lutomirski @ 2018-09-01 17:37 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Andy Lutomirski, Matt Rickard, Thomas Gleixner, Stephen Boyd,
	John Stultz, X86 ML, LKML

On Sat, Sep 1, 2018 at 2:33 AM, Florian Weimer <fweimer@redhat.com> wrote:
> On 09/01/2018 05:39 AM, Andy Lutomirski wrote:
>>
>> Florian, do you think
>> glibc would be willing to add some magic to turn
>> clock_gettime(CLOCK_MONOTONIC, t) into
>> __vdso_clock_gettime_monotonic(t) when CLOCK_MONOTONIC is a constant?
>
>
> What's the goal here?  Turn the indirect call/conditional jump/indirect call
> sequence into a single indirect call, purely for performance reasons?

Almost.  It's to bypass some of the branches in
__vdso_clock_gettime(), which is supposed to be very fast.  AFAIK most
user code that uses clock_gettime() passes a constant for the first
argument, and we can squeeze out some performance by optimizing that
case.  The indirect branches internal to the vDSO are a separate issue
and should be solved separately.

(It's really too bad that x86 doesn't have a 64-bit call instruction.
If it did, then the PLT could get rewritten at dynamic link time to
avoid indirect calls entirely, and presumably glibc could use the same
technique to call into the vDSO without indirect calls.)


* Re: [PATCH v3] x86/vdso: Handle clock_gettime(CLOCK_TAI) in vDSO
  2018-09-01  3:39 ` Andy Lutomirski
  2018-09-01  9:33   ` Florian Weimer
@ 2018-09-09 20:05   ` Thomas Gleixner
  2018-09-10 10:38     ` Thomas Gleixner
  2018-09-12 13:01     ` Florian Weimer
  1 sibling, 2 replies; 16+ messages in thread
From: Thomas Gleixner @ 2018-09-09 20:05 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Matt Rickard, Florian Weimer, Stephen Boyd, John Stultz, X86 ML, LKML

On Fri, 31 Aug 2018, Andy Lutomirski wrote:

> (Hi, Florian!)
> 
> On Fri, Aug 31, 2018 at 6:59 PM, Matt Rickard <matt@softrans.com.au> wrote:
> > Process clock_gettime(CLOCK_TAI) in vDSO.
> > This makes the call about as fast as CLOCK_REALTIME and CLOCK_MONOTONIC:
> >
> >   nanoseconds
> >  before after clockname
> >    ---- ----- ---------
> >     233    87 CLOCK_TAI
> >      96    93 CLOCK_REALTIME
> >      88    87 CLOCK_MONOTONIC
> 
> Are you sure you did this right?  With the clocksource set to TSC
> (which is the only reasonable choice unless KVM has seriously cleaned
> up its act), with retpolines enabled, I get 24ns for CLOCK_MONOTONIC
> without your patch and 32ns with your patch.  And there is indeed a
> retpoline in the disassembled output:
> 
>   e5:   e8 07 00 00 00          callq  f1 <__vdso_clock_gettime+0x31>
>   ea:   f3 90                   pause
>   ec:   0f ae e8                lfence
>   ef:   eb f9                   jmp    ea <__vdso_clock_gettime+0x2a>
>   f1:   48 89 04 24             mov    %rax,(%rsp)
>   f5:   c3                      retq
> 
> You're probably going to have to set -fno-jump-tables or do something
> clever like adding a whole array of (seconds, nsec) in gtod and
> indexing that array by the clock id.

See the patch below. It's integrating TAI without slowing down everything
and it definitely does not result in indirect calls.

On a HSW it slows down clock_gettime() by ~0.5ns. On a SKL I get a speedup
by ~0.5ns. On an AMD Epyc server it's a 1.2ns speedup. So it somehow depends
on the uarch, and I also observed compiler-version-dependent variations.

Thanks,

	tglx
----
--- a/arch/x86/entry/vdso/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vclock_gettime.c
@@ -203,39 +203,18 @@ notrace static inline u64 vgetsns(int *m
 	return v * gtod->mult;
 }
 
-/* Code size doesn't matter (vdso is 4k anyway) and this is faster. */
-notrace static int __always_inline do_realtime(struct timespec *ts)
+notrace static int __always_inline do_hres(struct timespec *ts, clockid_t clk)
 {
-	unsigned long seq;
-	u64 ns;
+	struct vgtod_ts *base = &gtod->basetime[clk & VGTOD_HRES_MASK];
+	unsigned int seq;
 	int mode;
-
-	do {
-		seq = gtod_read_begin(gtod);
-		mode = gtod->vclock_mode;
-		ts->tv_sec = gtod->wall_time_sec;
-		ns = gtod->wall_time_snsec;
-		ns += vgetsns(&mode);
-		ns >>= gtod->shift;
-	} while (unlikely(gtod_read_retry(gtod, seq)));
-
-	ts->tv_sec += __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
-	ts->tv_nsec = ns;
-
-	return mode;
-}
-
-notrace static int __always_inline do_monotonic(struct timespec *ts)
-{
-	unsigned long seq;
 	u64 ns;
-	int mode;
 
 	do {
 		seq = gtod_read_begin(gtod);
 		mode = gtod->vclock_mode;
-		ts->tv_sec = gtod->monotonic_time_sec;
-		ns = gtod->monotonic_time_snsec;
+		ts->tv_sec = base->sec;
+		ns = base->nsec;
 		ns += vgetsns(&mode);
 		ns >>= gtod->shift;
 	} while (unlikely(gtod_read_retry(gtod, seq)));
@@ -246,58 +225,50 @@ notrace static int __always_inline do_mo
 	return mode;
 }
 
-notrace static void do_realtime_coarse(struct timespec *ts)
+notrace static void do_coarse(struct timespec *ts, clockid_t clk)
 {
+	struct vgtod_ts *base = &gtod->basetime[clk];
 	unsigned long seq;
-	do {
-		seq = gtod_read_begin(gtod);
-		ts->tv_sec = gtod->wall_time_coarse_sec;
-		ts->tv_nsec = gtod->wall_time_coarse_nsec;
-	} while (unlikely(gtod_read_retry(gtod, seq)));
-}
 
-notrace static void do_monotonic_coarse(struct timespec *ts)
-{
-	unsigned long seq;
 	do {
 		seq = gtod_read_begin(gtod);
-		ts->tv_sec = gtod->monotonic_time_coarse_sec;
-		ts->tv_nsec = gtod->monotonic_time_coarse_nsec;
+		ts->tv_sec = base->sec;
+		ts->tv_nsec = base->nsec;
 	} while (unlikely(gtod_read_retry(gtod, seq)));
 }
 
 notrace int __vdso_clock_gettime(clockid_t clock, struct timespec *ts)
 {
-	switch (clock) {
-	case CLOCK_REALTIME:
-		if (do_realtime(ts) == VCLOCK_NONE)
-			goto fallback;
-		break;
-	case CLOCK_MONOTONIC:
-		if (do_monotonic(ts) == VCLOCK_NONE)
-			goto fallback;
-		break;
-	case CLOCK_REALTIME_COARSE:
-		do_realtime_coarse(ts);
-		break;
-	case CLOCK_MONOTONIC_COARSE:
-		do_monotonic_coarse(ts);
-		break;
-	default:
-		goto fallback;
-	}
+	unsigned int msk;
 
-	return 0;
-fallback:
+	/* Sort out negative (CPU/FD) and invalid clocks */
+	if (unlikely((unsigned int) clock >= MAX_CLOCKS))
+		return vdso_fallback_gettime(clock, ts);
+
+	/*
+	 * Convert the clockid to a bitmask and use it to check which
+	 * clocks are handled in the VDSO directly.
+	 */
+	msk = 1U << clock;
+	if (likely(msk & VGTOD_HRES)) {
+		if (do_hres(ts, clock) != VCLOCK_NONE)
+			return 0;
+	} else if (msk & VGTOD_COARSE) {
+		do_coarse(ts, clock);
+		return 0;
+	}
 	return vdso_fallback_gettime(clock, ts);
 }
+
 int clock_gettime(clockid_t, struct timespec *)
 	__attribute__((weak, alias("__vdso_clock_gettime")));
 
 notrace int __vdso_gettimeofday(struct timeval *tv, struct timezone *tz)
 {
 	if (likely(tv != NULL)) {
-		if (unlikely(do_realtime((struct timespec *)tv) == VCLOCK_NONE))
+		struct timespec *ts = (struct timespec *) tv;
+
+		if (unlikely(do_hres(ts, CLOCK_REALTIME) == VCLOCK_NONE))
 			return vdso_fallback_gtod(tv, tz);
 		tv->tv_usec /= 1000;
 	}
@@ -318,7 +289,7 @@ int gettimeofday(struct timeval *, struc
 notrace time_t __vdso_time(time_t *t)
 {
 	/* This is atomic on x86 so we don't need any locks. */
-	time_t result = READ_ONCE(gtod->wall_time_sec);
+	time_t result = READ_ONCE(gtod->basetime[CLOCK_REALTIME].sec);
 
 	if (t)
 		*t = result;
--- a/arch/x86/entry/vsyscall/vsyscall_gtod.c
+++ b/arch/x86/entry/vsyscall/vsyscall_gtod.c
@@ -31,6 +31,8 @@ void update_vsyscall(struct timekeeper *
 {
 	int vclock_mode = tk->tkr_mono.clock->archdata.vclock_mode;
 	struct vsyscall_gtod_data *vdata = &vsyscall_gtod_data;
+	struct vgtod_ts *base;
+	u64 nsec;
 
 	/* Mark the new vclock used. */
 	BUILD_BUG_ON(VCLOCK_MAX >= 32);
@@ -45,34 +47,37 @@ void update_vsyscall(struct timekeeper *
 	vdata->mult		= tk->tkr_mono.mult;
 	vdata->shift		= tk->tkr_mono.shift;
 
-	vdata->wall_time_sec		= tk->xtime_sec;
-	vdata->wall_time_snsec		= tk->tkr_mono.xtime_nsec;
-
-	vdata->monotonic_time_sec	= tk->xtime_sec
-					+ tk->wall_to_monotonic.tv_sec;
-	vdata->monotonic_time_snsec	= tk->tkr_mono.xtime_nsec
-					+ ((u64)tk->wall_to_monotonic.tv_nsec
-						<< tk->tkr_mono.shift);
-	while (vdata->monotonic_time_snsec >=
-					(((u64)NSEC_PER_SEC) << tk->tkr_mono.shift)) {
-		vdata->monotonic_time_snsec -=
-					((u64)NSEC_PER_SEC) << tk->tkr_mono.shift;
-		vdata->monotonic_time_sec++;
+	base = &vdata->basetime[CLOCK_REALTIME];
+	base->sec = (gtod_long_t)tk->xtime_sec;
+	base->nsec = tk->tkr_mono.xtime_nsec;
+
+	base = &vdata->basetime[VGTOD_TAI];
+	base->sec = (gtod_long_t)(tk->xtime_sec + (s64)tk->tai_offset);
+	base->nsec = tk->tkr_mono.xtime_nsec;
+
+	base = &vdata->basetime[CLOCK_MONOTONIC];
+	base->sec = (gtod_long_t)(tk->xtime_sec + tk->wall_to_monotonic.tv_sec);
+	nsec = tk->tkr_mono.xtime_nsec;
+	nsec +=	((u64)tk->wall_to_monotonic.tv_nsec << tk->tkr_mono.shift);
+	while (nsec >= (((u64)NSEC_PER_SEC) << tk->tkr_mono.shift)) {
+		nsec -= ((u64)NSEC_PER_SEC) << tk->tkr_mono.shift;
+		base->sec++;
 	}
+	base->nsec = nsec;
 
-	vdata->wall_time_coarse_sec	= tk->xtime_sec;
-	vdata->wall_time_coarse_nsec	= (long)(tk->tkr_mono.xtime_nsec >>
-						 tk->tkr_mono.shift);
-
-	vdata->monotonic_time_coarse_sec =
-		vdata->wall_time_coarse_sec + tk->wall_to_monotonic.tv_sec;
-	vdata->monotonic_time_coarse_nsec =
-		vdata->wall_time_coarse_nsec + tk->wall_to_monotonic.tv_nsec;
-
-	while (vdata->monotonic_time_coarse_nsec >= NSEC_PER_SEC) {
-		vdata->monotonic_time_coarse_nsec -= NSEC_PER_SEC;
-		vdata->monotonic_time_coarse_sec++;
+	base = &vdata->basetime[CLOCK_REALTIME_COARSE];
+	base->sec = (gtod_long_t)tk->xtime_sec;
+	base->nsec = tk->tkr_mono.xtime_nsec >> tk->tkr_mono.shift;
+
+	base = &vdata->basetime[CLOCK_MONOTONIC_COARSE];
+	base->sec = (gtod_long_t)(tk->xtime_sec + tk->wall_to_monotonic.tv_sec);
+	nsec = tk->tkr_mono.xtime_nsec >> tk->tkr_mono.shift;
+	nsec += tk->wall_to_monotonic.tv_nsec;
+	while (nsec >= NSEC_PER_SEC) {
+		nsec -= NSEC_PER_SEC;
+		base->sec++;
 	}
+	base->nsec = nsec;
 
 	gtod_write_end(vdata);
 }
--- a/arch/x86/include/asm/vgtod.h
+++ b/arch/x86/include/asm/vgtod.h
@@ -5,33 +5,41 @@
 #include <linux/compiler.h>
 #include <linux/clocksource.h>
 
+#include <uapi/linux/time.h>
+
 #ifdef BUILD_VDSO32_64
 typedef u64 gtod_long_t;
 #else
 typedef unsigned long gtod_long_t;
 #endif
+
+struct vgtod_ts {
+	gtod_long_t	sec;
+	u64		nsec;
+};
+
+#define VGTOD_BASES	(CLOCK_MONOTONIC_COARSE + 1)
+#define VGTOD_HRES	(BIT(CLOCK_REALTIME) | BIT(CLOCK_MONOTONIC) | BIT(CLOCK_TAI))
+#define VGTOD_COARSE	(BIT(CLOCK_REALTIME_COARSE) | BIT(CLOCK_MONOTONIC_COARSE))
+
+/* Abuse CLOCK_THREAD_CPUTIME_ID for VGTOD CLOCK TAI */
+#define VGTOD_HRES_MASK	0x3
+#define VGTOD_TAI	(CLOCK_TAI & VGTOD_HRES_MASK)
+
 /*
  * vsyscall_gtod_data will be accessed by 32 and 64 bit code at the same time
  * so be carefull by modifying this structure.
  */
 struct vsyscall_gtod_data {
-	unsigned seq;
+	unsigned int	seq;
+
+	int		vclock_mode;
+	u64		cycle_last;
+	u64		mask;
+	u32		mult;
+	u32		shift;
 
-	int vclock_mode;
-	u64	cycle_last;
-	u64	mask;
-	u32	mult;
-	u32	shift;
-
-	/* open coded 'struct timespec' */
-	u64		wall_time_snsec;
-	gtod_long_t	wall_time_sec;
-	gtod_long_t	monotonic_time_sec;
-	u64		monotonic_time_snsec;
-	gtod_long_t	wall_time_coarse_sec;
-	gtod_long_t	wall_time_coarse_nsec;
-	gtod_long_t	monotonic_time_coarse_sec;
-	gtod_long_t	monotonic_time_coarse_nsec;
+	struct vgtod_ts	basetime[VGTOD_BASES];
 
 	int		tz_minuteswest;
 	int		tz_dsttime;
@@ -44,9 +52,9 @@ static inline bool vclock_was_used(int v
 	return READ_ONCE(vclocks_used) & (1 << vclock);
 }
 
-static inline unsigned gtod_read_begin(const struct vsyscall_gtod_data *s)
+static inline unsigned int gtod_read_begin(const struct vsyscall_gtod_data *s)
 {
-	unsigned ret;
+	unsigned int ret;
 
 repeat:
 	ret = READ_ONCE(s->seq);


* Re: [PATCH v3] x86/vdso: Handle clock_gettime(CLOCK_TAI) in vDSO
  2018-09-09 20:05   ` Thomas Gleixner
@ 2018-09-10 10:38     ` Thomas Gleixner
  2018-09-12 13:01     ` Florian Weimer
  1 sibling, 0 replies; 16+ messages in thread
From: Thomas Gleixner @ 2018-09-10 10:38 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Matt Rickard, Florian Weimer, Stephen Boyd, John Stultz, X86 ML, LKML

On Sun, 9 Sep 2018, Thomas Gleixner wrote:
>  #ifdef BUILD_VDSO32_64
>  typedef u64 gtod_long_t;
>  #else
>  typedef unsigned long gtod_long_t;
>  #endif
> +
> +struct vgtod_ts {
> +	gtod_long_t	sec;

and actually this wants to become u64 unconditionally as we need to provide
the full seconds even on 32bit for the upcoming y2038 support. We still
have to truncate it for the current 32bit interface, but the core code
can be made ready now.
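
Something like this, reading the suggestion literally (a sketch, not
the final commit):

#include <stdint.h>

struct vgtod_ts {
	uint64_t sec;	/* full seconds, y2038-safe even on 32-bit */
	uint64_t nsec;
};

/* Only the legacy 32-bit interface truncates, at the ABI boundary. */
static inline int32_t legacy_sec(const struct vgtod_ts *base)
{
	return (int32_t)base->sec;
}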

Thanks,

	tglx


* Re: [PATCH v3] x86/vdso: Handle clock_gettime(CLOCK_TAI) in vDSO
  2018-09-09 20:05   ` Thomas Gleixner
  2018-09-10 10:38     ` Thomas Gleixner
@ 2018-09-12 13:01     ` Florian Weimer
  2018-09-12 14:17       ` Thomas Gleixner
  1 sibling, 1 reply; 16+ messages in thread
From: Florian Weimer @ 2018-09-12 13:01 UTC (permalink / raw)
  To: Thomas Gleixner, Andy Lutomirski
  Cc: Matt Rickard, Stephen Boyd, John Stultz, X86 ML, LKML

On 09/09/2018 10:05 PM, Thomas Gleixner wrote:
> See the patch below. It's integrating TAI without slowing down everything
> and it definitely does not result in indirect calls.
> 
> On a HSW it slows down clock_gettime() by ~0.5ns. On a SKL I get a speedup
> by ~0.5ns. On an AMD Epyc server it's a 1.2ns speedup. So it somehow depends
> on the uarch, and I also observed compiler-version-dependent variations.

Does this mean glibc can keep using a single vDSO entrypoint, the one we 
have today?

Thanks,
Florian


* Re: [PATCH v3] x86/vdso: Handle clock_gettime(CLOCK_TAI) in vDSO
  2018-09-12 13:01     ` Florian Weimer
@ 2018-09-12 14:17       ` Thomas Gleixner
  2018-09-12 14:20         ` Florian Weimer
  0 siblings, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2018-09-12 14:17 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Andy Lutomirski, Matt Rickard, Stephen Boyd, John Stultz, X86 ML, LKML

On Wed, 12 Sep 2018, Florian Weimer wrote:
> On 09/09/2018 10:05 PM, Thomas Gleixner wrote:
> > See the patch below. It's integrating TAI without slowing down everything
> > and it definitely does not result in indirect calls.
> > 
> > On a HSW it slows down clock_gettime() by ~0.5ns. On a SKL I get a speedup
> > by ~0.5ns. On an AMD Epyc server it's a 1.2ns speedup. So it somehow depends
> > on the uarch, and I also observed compiler-version-dependent variations.
> 
> Does this mean glibc can keep using a single vDSO entrypoint, the one we
> have today?

We have no intention to change that.

But we surely could provide separate entry points as an extra to avoid a
bunch of conditionals.

Thanks,

	tglx



* Re: [PATCH v3] x86/vdso: Handle clock_gettime(CLOCK_TAI) in vDSO
  2018-09-12 14:17       ` Thomas Gleixner
@ 2018-09-12 14:20         ` Florian Weimer
  2018-09-12 14:29           ` Thomas Gleixner
  0 siblings, 1 reply; 16+ messages in thread
From: Florian Weimer @ 2018-09-12 14:20 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andy Lutomirski, Matt Rickard, Stephen Boyd, John Stultz, X86 ML, LKML

On 09/12/2018 04:17 PM, Thomas Gleixner wrote:
> On Wed, 12 Sep 2018, Florian Weimer wrote:
>> On 09/09/2018 10:05 PM, Thomas Gleixner wrote:
>>> See the patch below. It's integrating TAI without slowing down everything
>>> and it definitely does not result in indirect calls.
>>>
>>> On a HSW it slows down clock_gettime() by ~0.5ns. On a SKL I get a speedup
>>> by ~0.5ns. On an AMD Epyc server it's a 1.2ns speedup. So it somehow depends
>>> on the uarch, and I also observed compiler-version-dependent variations.
>>
>> Does this mean glibc can keep using a single vDSO entrypoint, the one we
>> have today?
> 
> We have no intention to change that.

Okay, I was wondering because Andy seemed to have proposed just that.

> But we surely could provide separate entry points as an extra to avoid a
> bunch of conditionals.

We could adjust to that, but the benefit would be long-term because it's 
an ABI change for glibc, and they tend to take a long time to propagate.

But I must say that clock_gettime is an odd place to start.  I would 
have expected any of the type-polymorphic multiplexer interfaces (fcntl, 
ioctl, ptrace, futex) to be a more natural starting point. 8-)

Thanks,
Florian


* Re: [PATCH v3] x86/vdso: Handle clock_gettime(CLOCK_TAI) in vDSO
  2018-09-12 14:20         ` Florian Weimer
@ 2018-09-12 14:29           ` Thomas Gleixner
  2018-09-12 17:11             ` Andy Lutomirski
  0 siblings, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2018-09-12 14:29 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Andy Lutomirski, Matt Rickard, Stephen Boyd, John Stultz, X86 ML, LKML

On Wed, 12 Sep 2018, Florian Weimer wrote:
> On 09/12/2018 04:17 PM, Thomas Gleixner wrote:
> > On Wed, 12 Sep 2018, Florian Weimer wrote:
> > > Does this mean glibc can keep using a single vDSO entrypoint, the one we
> > > have today?
> > 
> > We have no intention to change that.
> 
> Okay, I was wondering because Andy seemed to have proposed just that.
> 
> > But we surely could provide separate entry points as an extra to avoid a
> > bunch of conditionals.
> 
> We could adjust to that, but the benefit would be long-term because it's an
> ABI change for glibc, and they tend to take a long time to propagate.
> 
> But I must say that clock_gettime is an odd place to start.  I would have
> expected any of the type-polymorphic multiplexer interfaces (fcntl, ioctl,
> ptrace, futex) to be a more natural starting point. 8-)

Well, the starting point of this was to provide clock_tai support in the
vdso. clock_gettime() in the vdso vs. the real syscall is a factor of 10 in
speed. clock_gettime() is a pretty hot function in some workloads.

Andy then noticed that some conditionals could be avoided entirely by using
a different entry point and offered one along with a 10% speedup. We don't
have to go there, but we can.

The multiplexer interfaces need much more surgery and talking about futex,
we'd need to sit down with quite some people and identify the things they
actually care about before just splitting it up and keeping the existing
overloaded trainwreck the same.

Thanks,

	tglx





* Re: [PATCH v3] x86/vdso: Handle clock_gettime(CLOCK_TAI) in vDSO
  2018-09-12 14:29           ` Thomas Gleixner
@ 2018-09-12 17:11             ` Andy Lutomirski
  2018-09-13  8:07               ` Florian Weimer
  0 siblings, 1 reply; 16+ messages in thread
From: Andy Lutomirski @ 2018-09-12 17:11 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Florian Weimer, Andy Lutomirski, Matt Rickard, Stephen Boyd,
	John Stultz, X86 ML, LKML



> On Sep 12, 2018, at 7:29 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> 
>> On Wed, 12 Sep 2018, Florian Weimer wrote:
>>> On 09/12/2018 04:17 PM, Thomas Gleixner wrote:
>>>> On Wed, 12 Sep 2018, Florian Weimer wrote:
>>>> Does this mean glibc can keep using a single vDSO entrypoint, the one we
>>>> have today?
>>> 
>>> We have no intention to change that.
>> 
>> Okay, I was wondering because Andy seemed to have proposed just that.
>> 
>>> But we surely could provide separate entry points as an extra to avoid a
>>> bunch of conditionals.
>> 
>> We could adjust to that, but the benefit would be long-term because it's an
>> ABI change for glibc, and they tend to take a long time to propagate.
>> 
>> But I must say that clock_gettime is an odd place to start.  I would have
>> expected any of the type-polymorphic multiplexer interfaces (fcntl, ioctl,
>> ptrace, futex) to be a more natural starting point. 8-)
> 
> Well, the starting point of this was to provide clock_tai support in the
> vdso. clock_gettime() in the vdso vs. the real syscall is a factor of 10 in
> speed. clock_gettime() is a pretty hot function in some workloads.
> 
> Andy then noticed that some conditionals could be avoided entirely by using
> a different entry point and offered one along with a 10% speedup. We don't
> have to go there, we can.
> 
> The multiplexer interfaces need much more surgery and talking about futex,
> we'd need to sit down with quite some people and identify the things they
> actually care about before just splitting it up and keeping the existing
> overloaded trainwreck the same.
> 

There’s also the issue of how much the speedup matters. For futex, maybe a better interface saves 3ns, but a futex syscall is hundreds of ns. clock_gettime() is called at high frequency and can be ~25ns. Saving a few ns is a bigger deal.
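
Numbers of that magnitude come from a tight measurement loop; a minimal
harness along these lines (a sketch, not the benchmark anyone in this
thread actually ran):

#include <stdio.h>
#include <time.h>

int main(void)
{
	enum { N = 10 * 1000 * 1000 };
	struct timespec t0, t1, ts;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int i = 0; i < N; i++)
		clock_gettime(CLOCK_MONOTONIC, &ts);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	/* Average cost per call, in nanoseconds. */
	double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
	printf("%.1f ns/call\n", ns / N);
	return 0;
}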


* Re: [PATCH v3] x86/vdso: Handle clock_gettime(CLOCK_TAI) in vDSO
  2018-09-12 17:11             ` Andy Lutomirski
@ 2018-09-13  8:07               ` Florian Weimer
  2018-09-13 15:22                 ` Andy Lutomirski
  0 siblings, 1 reply; 16+ messages in thread
From: Florian Weimer @ 2018-09-13  8:07 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner
  Cc: Andy Lutomirski, Matt Rickard, Stephen Boyd, John Stultz, X86 ML, LKML

On 09/12/2018 07:11 PM, Andy Lutomirski wrote:
>> The multiplexer interfaces need much more surgery and talking about futex,
>> we'd need to sit down with quite some people and identify the things they
>> actually care about before just splitting it up and keeping the existing
>> overloaded trainwreck the same.
>>
> There’s also the issue of how much the speedup matters. For futex, maybe a better interface saves 3ns, but a futex syscall is hundreds of ns. clock_gettime() is called at high frequency and can be ~25ns. Saving a few ns is a bigger deal.

My concern is that the userspace system call wrappers currently do not 
know how many arguments the individual operations take and what types 
the arguments have (hence the “type-polymorphic” nature I mentioned). 
This could be a problem for on-stack argument passing (where you might 
read values beyond the end of the stack, and glibc avoids that most of 
the time by having enough cruft on the stack), and for architectures 
which pass pointers and integers in different registers (like some m68k 
ABIs do for the return value).
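
A contrived illustration of the hazard (not glibc code): a generic
variadic wrapper has to fetch some fixed maximum number of arguments,
and fetching ones the caller never passed is undefined behavior that
can read past the caller's frame.

#define _GNU_SOURCE
#include <stdarg.h>
#include <unistd.h>

/* 'op' selects a multiplexed operation whose real arity the wrapper
 * does not know.  Fetching four arguments when the caller passed two
 * is exactly the out-of-bounds read described above. */
long generic_wrapper(long op, ...)
{
	va_list ap;
	long a, b, c, d;

	va_start(ap, op);
	a = va_arg(ap, long);
	b = va_arg(ap, long);
	c = va_arg(ap, long);	/* undefined if the op takes only two */
	d = va_arg(ap, long);
	va_end(ap);
	return syscall(op, a, b, c, d);
}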

Thanks,
Florian


* Re: [PATCH v3] x86/vdso: Handle clock_gettime(CLOCK_TAI) in vDSO
  2018-09-13  8:07               ` Florian Weimer
@ 2018-09-13 15:22                 ` Andy Lutomirski
  2018-09-13 19:07                   ` Florian Weimer
  0 siblings, 1 reply; 16+ messages in thread
From: Andy Lutomirski @ 2018-09-13 15:22 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Thomas Gleixner, Andy Lutomirski, Matt Rickard, Stephen Boyd,
	John Stultz, X86 ML, LKML



> On Sep 13, 2018, at 1:07 AM, Florian Weimer <fweimer@redhat.com> wrote:
> 
> On 09/12/2018 07:11 PM, Andy Lutomirski wrote:
>>> The multiplexer interfaces need much more surgery and talking about futex,
>>> we'd need to sit down with quite some people and identify the things they
>>> actually care about before just splitting it up and keeping the existing
>>> overloaded trainwreck the same.
>>> 
>> There’s also the issue of how much the speedup matters. For futex, maybe a better interface saves 3ns, but a futex syscall is hundreds of ns. clock_gettime() is called at high frequency and can be ~25ns. Saving a few ns is a bigger deal.
> 
> My concern is that the userspace system call wrappers currently do not know how many arguments the individual operations take and what types the arguments have (hence the “type-polymorphic” nature I mentioned). This could be a problem for on-stack argument passing (where you might read values beyond the end of the stack, and glibc avoids that most of the time by having enough cruft on the stack), and for architectures which pass pointers and integers in different registers (like some m68k ABIs do for the return value).
> 
> 

Isn’t clock_gettime already special because of the vDSO entry point, though?


* Re: [PATCH v3] x86/vdso: Handle clock_gettime(CLOCK_TAI) in vDSO
  2018-09-13 15:22                 ` Andy Lutomirski
@ 2018-09-13 19:07                   ` Florian Weimer
  2018-09-13 19:35                     ` Andy Lutomirski
  0 siblings, 1 reply; 16+ messages in thread
From: Florian Weimer @ 2018-09-13 19:07 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Andy Lutomirski, Matt Rickard, Stephen Boyd,
	John Stultz, X86 ML, LKML

On 09/13/2018 05:22 PM, Andy Lutomirski wrote:
> 
> 
>> On Sep 13, 2018, at 1:07 AM, Florian Weimer <fweimer@redhat.com> wrote:
>>
>> On 09/12/2018 07:11 PM, Andy Lutomirski wrote:
>>>> The multiplexer interfaces need much more surgery and talking about futex,
>>>> we'd need to sit down with quite some people and identify the things they
>>>> actually care about before just splitting it up and keeping the existing
>>>> overloaded trainwreck the same.
>>>>
>>> There’s also the issue of how much the speedup matters. For futex, maybe a better interface saves 3ns, but a futex syscall is hundreds of ns. clock_gettime() is called at high frequency and can be ~25ns. Saving a few ns is a bigger deal.
>>
>> My concern is that the userspace system call wrappers currently do not know how many arguments the individual operations take and what types the arguments have (hence the “type-polymorphic” nature I mentioned). This could be a problem for on-stack argument passing (where you might read values beyond the end of the stack, and glibc avoids that most of the time by having enough cruft on the stack), and for architectures which pass pointers and integers in different registers (like some m68k ABIs do for the return value).

> Isn’t clock_gettime already special because of the vDSO entry point, though?

Somewhat special, yes, but not overly so, and not in the
type-polymorphic sense.  We can't give applications direct access to
the vDSO implementation because the kernel does not know about the
userspace errno variable.  We do that for time on x86_64, where
applications call into the vDSO directly, bypassing glibc completely
after binding.

I suspect most Linux libcs that know about the vDSO at all have generic 
vsyscall support, just like they have generic support for plain system 
calls.

Thanks,
Florian


* Re: [PATCH v3] x86/vdso: Handle clock_gettime(CLOCK_TAI) in vDSO
  2018-09-13 19:07                   ` Florian Weimer
@ 2018-09-13 19:35                     ` Andy Lutomirski
  2018-09-13 19:53                       ` Florian Weimer
  0 siblings, 1 reply; 16+ messages in thread
From: Andy Lutomirski @ 2018-09-13 19:35 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Thomas Gleixner, Andy Lutomirski, Matt Rickard, Stephen Boyd,
	John Stultz, X86 ML, LKML



> On Sep 13, 2018, at 12:07 PM, Florian Weimer <fweimer@redhat.com> wrote:
> 
> On 09/13/2018 05:22 PM, Andy Lutomirski wrote:
>>> On Sep 13, 2018, at 1:07 AM, Florian Weimer <fweimer@redhat.com> wrote:
>>> 
>>> On 09/12/2018 07:11 PM, Andy Lutomirski wrote:
>>>>> The multiplexer interfaces need much more surgery and talking about futex,
>>>>> we'd need to sit down with quite some people and identify the things they
>>>>> actually care about before just splitting it up and keeping the existing
>>>>> overloaded trainwreck the same.
>>>>> 
>>>> There’s also the issue of how much the speedup matters. For futex, maybe a better interface saves 3ns, but a futex syscall is hundreds of ns. clock_gettime() is called at high frequency and can be ~25ns. Saving a few ns is a bigger deal.
>>> 
>>> My concern is that the userspace system call wrappers currently do not know how many arguments the individual operations take and what types the arguments have (hence the “type-polymorphic” nature I mentioned). This could be a problem for on-stack argument passing (where you might read values beyond the end of the stack, and glibc avoids that most of the time by having enough cruft on the stack), and for architectures which pass pointers and integers in different registers (like some m68k ABIs do for the return value).
> 
>> Isn’t clock_gettime already special because of the vDSO entry point, though?
> 
> Somewhat special, yes, but not overly so, and not in the type-polymorphic sense.  We can't give applications direct access to the vDSO implementation because the kernel does not know about the userspace errno variable.  We do that for time on x86_64, where applications call into the vDSO directly, bypassing glibc completely after binding.

If the vDSO adds special helpers for CLOCK_MONOTONIC and CLOCK_REALTIME, I think we can reasonably safely promise that they never fail. (seccomp can obviously break that promise if there’s no TSC, but I think that seccomp users who do that get to keep both pieces.)


* Re: [PATCH v3] x86/vdso: Handle clock_gettime(CLOCK_TAI) in vDSO
  2018-09-13 19:35                     ` Andy Lutomirski
@ 2018-09-13 19:53                       ` Florian Weimer
  0 siblings, 0 replies; 16+ messages in thread
From: Florian Weimer @ 2018-09-13 19:53 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Andy Lutomirski, Matt Rickard, Stephen Boyd,
	John Stultz, X86 ML, LKML

On 09/13/2018 09:35 PM, Andy Lutomirski wrote:

>> Somewhat special, yes, but not overly so, and not in the type-polymorphic sense.  We can't give applications direct access to the vDSO implementation because the kernel does not know about the userspace errno variable.  We do that for time on x86_64, where applications call into the vDSO directly, bypassing glibc completely after binding.
> 
> If the vDSO adds special helpers for CLOCK_MONOTONIC and CLOCK_REALTIME, I think we can reasonably safely promise that they never fail. (seccomp can obviously break that promise if there’s no TSC, but I think that seccomp users who do that get to keep both pieces.)

I agree, I thought about the same thing.  We already do not return 
EFAULT for invalid pointers, for obvious reasons.  And if the clock ID 
is fixed, the EINVAL error is impossible.

That would shave off a few nanoseconds more if the calling convention is 
identical to what glibc exposes to applications.  If the vDSO is not 
available or the symbol is missing, we can provide an implementation 
based on the current clock_gettime in glibc.
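
What "bypassing glibc after binding" looks like from application code,
as a sketch (the dlopen()/dlvsym() route works on glibc because the
already-mapped vDSO sits in the link map; the fully portable route is
walking AT_SYSINFO_EHDR from the auxiliary vector):

#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <time.h>

typedef int (*vgettime_fn)(clockid_t, struct timespec *);

int main(void)
{
	/* RTLD_NOLOAD hands back a handle to the already-mapped vDSO;
	 * "LINUX_2.6" is the symbol version from vdso.lds.S. */
	void *vdso = dlopen("linux-vdso.so.1", RTLD_LAZY | RTLD_NOLOAD);
	vgettime_fn vgt = vdso ?
		(vgettime_fn)dlvsym(vdso, "__vdso_clock_gettime", "LINUX_2.6") : 0;
	struct timespec ts;

	if (!vgt || vgt(CLOCK_MONOTONIC, &ts) != 0)
		clock_gettime(CLOCK_MONOTONIC, &ts);	/* ordinary fallback */
	printf("%lld.%09ld\n", (long long)ts.tv_sec, ts.tv_nsec);
	return 0;
}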

Thanks,
Florian

