From: Vincenzo Frascino <vincenzo.frascino@arm.com>
To: Dmitry Safonov <dima@arista.com>, linux-kernel@vger.kernel.org
Cc: Dmitry Safonov <0x7f454c46@gmail.com>,
Thomas Gleixner <tglx@linutronix.de>,
Adrian Reber <adrian@lisas.de>, Andrei Vagin <avagin@openvz.org>,
Andy Lutomirski <luto@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
Christian Brauner <christian.brauner@ubuntu.com>,
Cyrill Gorcunov <gorcunov@openvz.org>,
"Eric W. Biederman" <ebiederm@xmission.com>,
"H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
Jann Horn <jannh@google.com>, Jeff Dike <jdike@addtoit.com>,
Oleg Nesterov <oleg@redhat.com>,
Pavel Emelyanov <xemul@virtuozzo.com>,
Shuah Khan <shuah@kernel.org>,
containers@lists.linux-foundation.org, criu@openvz.org,
linux-api@vger.kernel.org, x86@kernel.org,
Andrei Vagin <avagin@gmail.com>
Subject: Re: [PATCHv7 19/33] lib/vdso: Prepare for time namespace support
Date: Wed, 16 Oct 2019 15:37:10 +0100
Message-ID: <a726e64f-bf73-4eca-6acf-75926898d88a@arm.com>
In-Reply-To: <20191011012341.846266-20-dima@arista.com>

Hi Dmitry,

On 10/11/19 2:23 AM, Dmitry Safonov wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
>
> To support time namespaces in the vdso with a minimal impact on regular non
> time namespace affected tasks, the namespace handling needs to be hidden in
> a slow path.
>
> The most obvious place is vdso_seq_begin(). If a task belongs to a time
> namespace then the VVAR page which contains the system wide vdso data is
> replaced with a namespace specific page which has the same layout as the
> VVAR page. That page has vdso_data->seq set to 1 to enforce the slow path
> and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time
> namespace handling path.
>
> The extra check in the case that vdso_data->seq is odd, e.g. a concurrent
> update of the vdso data is in progress, is not really affecting regular
> tasks which are not part of a time namespace as the task is spin waiting
> for the update to finish and vdso_data->seq to become even again.
>
> If a time namespace task hits that code path, it invokes the corresponding
> time getter function which retrieves the real VVAR page, reads host time
> and then adds the offset for the requested clock which is stored in the
> special VVAR page.
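
A minimal userspace sketch of the mechanism described above, not the kernel
code itself. The struct and function names (demo_page, demo_read,
demo_read_timens) are hypothetical, the structure is reduced to a single
"sec" field, and the real page is passed in explicitly instead of being
looked up through __arch_get_timens_vdso_data(). The point is only to show
how an odd seq combined with clock_mode == VCLOCK_TIMENS diverts the reader
into the namespace path:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define VCLOCK_TIMENS UINT_MAX  /* same sentinel value as in the patch */

/* Hypothetical, heavily reduced stand-in for struct vdso_data. */
struct demo_page {
	uint32_t seq;        /* even: stable; odd: update in progress or timens page */
	uint32_t clock_mode;
	int64_t  sec;        /* host seconds on the real page, offset on a timens page */
};

/*
 * Slow path: read host time from the real page and add the offset stored
 * in the namespace page. The patch obtains the real page via
 * __arch_get_timens_vdso_data(); here it is passed in explicitly.
 */
int64_t demo_read_timens(const struct demo_page *vdns,
			 const struct demo_page *host)
{
	assert(vdns->clock_mode == VCLOCK_TIMENS);
	return host->sec + vdns->sec;
}

/*
 * Reader entry point. Single-threaded sketch: READ_ONCE(), smp_rmb() and
 * cpu_relax() from the real code are deliberately left out.
 */
int64_t demo_read(const struct demo_page *vd, const struct demo_page *host)
{
	uint32_t seq;
	int64_t sec;

	do {
		for (;;) {
			seq = vd->seq;
			if (!(seq & 1))
				break;	/* fast path: no update in progress */
			if (vd->clock_mode == VCLOCK_TIMENS)
				return demo_read_timens(vd, host);
			/* otherwise keep spinning until the writer is done */
		}
		sec = vd->sec;
	} while (vd->seq != seq);

	return sec;
}
```

With a host page holding sec = 1000 and a namespace page holding seq = 1,
clock_mode = VCLOCK_TIMENS and an offset of -250, demo_read() returns 1000
for the host page and 750 for the namespace page. A task outside a time
namespace only ever sees the extra clock_mode comparison while seq is odd.
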
>
> If VDSO time namespace support is disabled the whole magic is compiled out.
>
> Initial testing shows that the disabled case is almost identical to the
> host case which does not take the slow timens path. With the special timens
> page installed the performance hit is constant time and in the range of
> 5-7%.
>
> For the vdso functions which are not using the sequence count an
> unconditional check for vdso_data->clock_mode is added which switches to
> the real vdso when the clock_mode is VCLOCK_TIMENS.
>
> Suggested-by: Andy Lutomirski <luto@kernel.org>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Andrei Vagin <avagin@gmail.com>
> Signed-off-by: Dmitry Safonov <dima@arista.com>
> ---
> include/linux/time.h | 6 ++
> include/vdso/datapage.h | 19 +++++-
> lib/vdso/Kconfig | 6 ++
> lib/vdso/gettimeofday.c | 128 +++++++++++++++++++++++++++++++++++++++-
> 4 files changed, 155 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/time.h b/include/linux/time.h
> index 27d83fd2ae61..b1a592638d7d 100644
> --- a/include/linux/time.h
> +++ b/include/linux/time.h
> @@ -96,4 +96,10 @@ static inline bool itimerspec64_valid(const struct itimerspec64 *its)
> */
> #define time_after32(a, b) ((s32)((u32)(b) - (u32)(a)) < 0)
> #define time_before32(b, a) time_after32(a, b)
> +
> +struct timens_offset {
> + s64 sec;
> + u64 nsec;
> +};
> +
> #endif
> diff --git a/include/vdso/datapage.h b/include/vdso/datapage.h
> index 2e302c0f41f7..65a38acce27e 100644
> --- a/include/vdso/datapage.h
> +++ b/include/vdso/datapage.h
> @@ -21,6 +21,8 @@
> #define CS_RAW 1
> #define CS_BASES (CS_RAW + 1)
>
> +#define VCLOCK_TIMENS UINT_MAX
> +
> /**
> * struct vdso_timestamp - basetime per clock_id
> * @sec: seconds
> @@ -48,6 +50,7 @@ struct vdso_timestamp {
> * @mult: clocksource multiplier
> * @shift: clocksource shift
> * @basetime[clock_id]: basetime per clock_id
> + * @offset[clock_id]: time namespace offset per clock_id
> * @tz_minuteswest: minutes west of Greenwich
> * @tz_dsttime: type of DST correction
> * @hrtimer_res: hrtimer resolution
> @@ -55,6 +58,17 @@ struct vdso_timestamp {
> *
> * vdso_data will be accessed by 64 bit and compat code at the same time
> * so we should be careful before modifying this structure.
> + *
> + * @basetime is used to store the base time for the system wide time getter
> + * VVAR page.
> + *
> + * @offset is used by the special time namespace VVAR pages which are
> + * installed instead of the real VVAR page. These namespace pages must set
> + * @seq to 1 and @clock_mode to VCLOCK_TIMENS to force the code into the
> + * time namespace slow path. The namespace aware functions retrieve the
> + * real system wide VVAR page, read host time and add the per clock offset.
> + * For clocks which are not affected by time namespace adjustment the
> + * offset must be zero.
> */
> struct vdso_data {
> u32 seq;
> @@ -65,7 +79,10 @@ struct vdso_data {
> u32 mult;
> u32 shift;
>
> - struct vdso_timestamp basetime[VDSO_BASES];
> + union {
> + struct vdso_timestamp basetime[VDSO_BASES];
> + struct timens_offset offset[VDSO_BASES];
> + };
>
> s32 tz_minuteswest;
> s32 tz_dsttime;
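
A small sketch of why wrapping basetime[] in a union with offset[] should
leave the page layout untouched, which matters given the note above about
64 bit and compat code sharing the structure. The types here are
hypothetical reductions: demo_timestamp assumes struct vdso_timestamp is
two u64 fields, and DEMO_BASES is a stand-in for VDSO_BASES. The checks
are compile-time only:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_BASES 12	/* hypothetical stand-in for VDSO_BASES */

/* Assumed shape of struct vdso_timestamp: two unsigned 64-bit fields. */
struct demo_timestamp {
	uint64_t sec;
	uint64_t nsec;
};

/* Same shape as struct timens_offset added to time.h in this patch. */
struct demo_timens_offset {
	int64_t  sec;
	uint64_t nsec;
};

/* Reduced layout before the patch. */
struct demo_old {
	uint32_t seq;
	uint32_t clock_mode;
	struct demo_timestamp basetime[DEMO_BASES];
	int32_t  tz_minuteswest;
};

/* Reduced layout after the patch: both arrays share the same storage. */
struct demo_new {
	uint32_t seq;
	uint32_t clock_mode;
	union {
		struct demo_timestamp     basetime[DEMO_BASES];
		struct demo_timens_offset offset[DEMO_BASES];
	};
	int32_t  tz_minuteswest;
};

/* Element sizes match, so the union is no larger than basetime[] alone. */
static_assert(sizeof(struct demo_timestamp) ==
	      sizeof(struct demo_timens_offset), "element size differs");
/* Fields after the union keep their position. */
static_assert(offsetof(struct demo_old, tz_minuteswest) ==
	      offsetof(struct demo_new, tz_minuteswest), "layout shifted");
static_assert(sizeof(struct demo_old) == sizeof(struct demo_new),
	      "struct size changed");
```

If the element sizes ever diverged, the static assertions would fail at
build time instead of silently moving the fields that follow.
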
> diff --git a/lib/vdso/Kconfig b/lib/vdso/Kconfig
> index 9fe698ff62ec..85276de70dba 100644
> --- a/lib/vdso/Kconfig
> +++ b/lib/vdso/Kconfig
> @@ -24,4 +24,10 @@ config GENERIC_COMPAT_VDSO
> help
> This config option enables the compat VDSO layer.
>
> +config VDSO_TIMENS

To make the naming consistent with the rest of the file and with CONFIG_TIME_NS,
can we please rename this config option to GENERIC_VDSO_TIME_NS? It should then
follow the logic explained by Thomas in patch 1 of this series.

Thanks,
Vincenzo

> + bool
> + help
> + Selected by architectures which support time namespaces in the
> + VDSO
> +
> endif
> diff --git a/lib/vdso/gettimeofday.c b/lib/vdso/gettimeofday.c
> index e630e7ff57f1..25244b677823 100644
> --- a/lib/vdso/gettimeofday.c
> +++ b/lib/vdso/gettimeofday.c
> @@ -38,6 +38,51 @@ u64 vdso_calc_delta(u64 cycles, u64 last, u64 mask, u32 mult)
> }
> #endif
>
> +#ifndef CONFIG_VDSO_TIMENS
> +static __always_inline
> +const struct vdso_data *__arch_get_timens_vdso_data(void)
> +{
> + return NULL;
> +}
> +#endif
> +
> +static int do_hres_timens(const struct vdso_data *vdns, clockid_t clk,
> + struct __kernel_timespec *ts)
> +{
> + const struct vdso_data *vd = __arch_get_timens_vdso_data();
> + const struct vdso_timestamp *vdso_ts = &vd->basetime[clk];
> + const struct timens_offset *offs = &vdns->offset[clk];
> + u64 cycles, last, ns;
> + s64 sec;
> + u32 seq;
> +
> + do {
> + seq = vdso_read_begin(vd);
> + cycles = __arch_get_hw_counter(vd->clock_mode);
> + ns = vdso_ts->nsec;
> + last = vd->cycle_last;
> + if (unlikely((s64)cycles < 0))
> + return -1;
> +
> + ns += vdso_calc_delta(cycles, last, vd->mask, vd->mult);
> + ns >>= vd->shift;
> + sec = vdso_ts->sec;
> + } while (unlikely(vdso_read_retry(vd, seq)));
> +
> + /* Add the namespace offset */
> + sec += offs->sec;
> + ns += offs->nsec;
> +
> + /*
> + * Do this outside the loop: a race inside the loop could result
> + * in __iter_div_u64_rem() being extremely slow.
> + */
> + ts->tv_sec = sec + __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
> + ts->tv_nsec = ns;
> +
> + return 0;
> +}
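
A worked sketch of the final step in do_hres_timens(): add the namespace
offset, then fold whole seconds out of the nanosecond sum. The helper
demo_iter_div_u64_rem() is modeled on __iter_div_u64_rem() (division by
repeated subtraction); demo_apply_offset() and struct demo_timespec are
hypothetical names, and the assertion that the stored nsec offset is below
one second is an assumption of this sketch, not something the patch states:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000u

/* Hypothetical stand-in for struct __kernel_timespec. */
struct demo_timespec {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/* Modeled on __iter_div_u64_rem(): division by repeated subtraction. */
uint32_t demo_iter_div_u64_rem(uint64_t dividend, uint32_t divisor,
			       uint64_t *remainder)
{
	uint32_t ret = 0;

	while (dividend >= divisor) {
		dividend -= divisor;
		ret++;
	}
	*remainder = dividend;
	return ret;
}

/* Add the per-clock namespace offset to a host reading and normalize. */
void demo_apply_offset(int64_t host_sec, uint64_t host_ns,
		       int64_t off_sec, uint64_t off_nsec,
		       struct demo_timespec *ts)
{
	int64_t sec = host_sec + off_sec;
	uint64_t ns = host_ns + off_nsec;

	assert(off_nsec < DEMO_NSEC_PER_SEC);	/* assumption of this sketch */

	/* Carry whole seconds out of ns; the loop runs once per carried second. */
	ts->tv_sec = sec + demo_iter_div_u64_rem(ns, DEMO_NSEC_PER_SEC, &ns);
	ts->tv_nsec = (int64_t)ns;
}
```

For a host reading of 100 s + 900000000 ns and an offset of -10 s +
300000000 ns, the sum is 90 s + 1200000000 ns, which normalizes to
tv_sec = 91 and tv_nsec = 200000000. The iterative division is cheap only
while the nanosecond sum stays within a few seconds, which is why the patch
keeps it outside the retry loop.
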
> +
> static int do_hres(const struct vdso_data *vd, clockid_t clk,
> struct __kernel_timespec *ts)
> {
> @@ -46,7 +91,28 @@ static int do_hres(const struct vdso_data *vd, clockid_t clk,
> u32 seq;
>
> do {
> - seq = vdso_read_begin(vd);
> + /*
> + * Open coded to handle VCLOCK_TIMENS. Time namespace
> + * enabled tasks have a special VVAR page installed which
> + * has vd->seq set to 1 and vd->clock_mode set to
> + * VCLOCK_TIMENS. For non time namespace affected tasks
> + * this does not affect performance because if vd->seq is
> + * odd, i.e. a concurrent update is in progress the extra
> + * check for vd->clock_mode is just a few extra
> + * instructions while spin waiting for vd->seq to become
> + * even again.
> + */
> + while (1) {
> + seq = READ_ONCE(vd->seq);
> + if (likely(!(seq & 1)))
> + break;
> + if (IS_ENABLED(CONFIG_VDSO_TIMENS) &&
> + vd->clock_mode == VCLOCK_TIMENS)
> + return do_hres_timens(vd, clk, ts);
> + cpu_relax();
> + }
> + smp_rmb();
> +
> cycles = __arch_get_hw_counter(vd->clock_mode);
> ns = vdso_ts->nsec;
> last = vd->cycle_last;
> @@ -68,6 +134,34 @@ static int do_hres(const struct vdso_data *vd, clockid_t clk,
> return 0;
> }
>
> +static void do_coarse_timens(const struct vdso_data *vdns, clockid_t clk,
> + struct __kernel_timespec *ts)
> +{
> + const struct vdso_data *vd = __arch_get_timens_vdso_data();
> + const struct vdso_timestamp *vdso_ts = &vd->basetime[clk];
> + const struct timens_offset *offs = &vdns->offset[clk];
> + u64 nsec;
> + s64 sec;
> + s32 seq;
> +
> + do {
> + seq = vdso_read_begin(vd);
> + sec = vdso_ts->sec;
> + nsec = vdso_ts->nsec;
> + } while (unlikely(vdso_read_retry(vd, seq)));
> +
> + /* Add the namespace offset */
> + sec += offs->sec;
> + nsec += offs->nsec;
> +
> + /*
> + * Do this outside the loop: a race inside the loop could result
> + * in __iter_div_u64_rem() being extremely slow.
> + */
> + ts->tv_sec = sec + __iter_div_u64_rem(nsec, NSEC_PER_SEC, &nsec);
> + ts->tv_nsec = nsec;
> +}
> +
> static void do_coarse(const struct vdso_data *vd, clockid_t clk,
> struct __kernel_timespec *ts)
> {
> @@ -75,7 +169,23 @@ static void do_coarse(const struct vdso_data *vd, clockid_t clk,
> u32 seq;
>
> do {
> - seq = vdso_read_begin(vd);
> + /*
> + * Open coded to handle VCLOCK_TIMENS. See comment in
> + * do_hres().
> + */
> + while (1) {
> + seq = READ_ONCE(vd->seq);
> + if (likely(!(seq & 1)))
> + break;
> + if (IS_ENABLED(CONFIG_VDSO_TIMENS) &&
> + vd->clock_mode == VCLOCK_TIMENS) {
> + do_coarse_timens(vd, clk, ts);
> + return;
> + }
> + cpu_relax();
> + }
> + smp_rmb();
> +
> ts->tv_sec = vdso_ts->sec;
> ts->tv_nsec = vdso_ts->nsec;
> } while (unlikely(vdso_read_retry(vd, seq)));
> @@ -156,6 +266,10 @@ __cvdso_gettimeofday(struct __kernel_old_timeval *tv, struct timezone *tz)
> }
>
> if (unlikely(tz != NULL)) {
> + if (IS_ENABLED(CONFIG_VDSO_TIMENS) &&
> + vd->clock_mode == VCLOCK_TIMENS)
> + vd = __arch_get_timens_vdso_data();
> +
> tz->tz_minuteswest = vd[CS_HRES_COARSE].tz_minuteswest;
> tz->tz_dsttime = vd[CS_HRES_COARSE].tz_dsttime;
> }
> @@ -167,7 +281,12 @@ __cvdso_gettimeofday(struct __kernel_old_timeval *tv, struct timezone *tz)
> static __maybe_unused time_t __cvdso_time(time_t *time)
> {
> const struct vdso_data *vd = __arch_get_vdso_data();
> - time_t t = READ_ONCE(vd[CS_HRES_COARSE].basetime[CLOCK_REALTIME].sec);
> + time_t t;
> +
> + if (IS_ENABLED(CONFIG_VDSO_TIMENS) && vd->clock_mode == VCLOCK_TIMENS)
> + vd = __arch_get_timens_vdso_data();
> +
> + t = READ_ONCE(vd[CS_HRES_COARSE].basetime[CLOCK_REALTIME].sec);
>
> if (time)
> *time = t;
> @@ -189,6 +308,9 @@ int __cvdso_clock_getres_common(clockid_t clock, struct __kernel_timespec *res)
> if (unlikely((u32) clock >= MAX_CLOCKS))
> return -1;
>
> + if (IS_ENABLED(CONFIG_VDSO_TIMENS) && vd->clock_mode == VCLOCK_TIMENS)
> + vd = __arch_get_timens_vdso_data();
> +
> hrtimer_res = READ_ONCE(vd[CS_HRES_COARSE].hrtimer_res);
> /*
> * Convert the clockid to a bitmask and use it to check which
>
--
Regards,
Vincenzo