From: Dmitry Safonov
To: linux-kernel@vger.kernel.org
Cc: Andrei Vagin, Dmitry Safonov, Adrian Reber, Andy Lutomirski,
    Arnd Bergmann, Christian Brauner, Cyrill Gorcunov,
    Dmitry Safonov <0x7f454c46@gmail.com>, "Eric W. Biederman", "H.
    Peter Anvin", Ingo Molnar, Jeff Dike, Oleg Nesterov, Pavel Emelyanov,
    Shuah Khan, Thomas Gleixner, Vincenzo Frascino,
    containers@lists.linux-foundation.org, criu@openvz.org,
    linux-api@vger.kernel.org, x86@kernel.org, Andrei Vagin
Subject: [PATCHv3 14/27] x86/vdso: Add offsets page in vvar
Date: Thu, 25 Apr 2019 17:14:03 +0100
Message-Id: <20190425161416.26600-15-dima@arista.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20190425161416.26600-1-dima@arista.com>
References: <20190425161416.26600-1-dima@arista.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Andrei Vagin

Modern applications read the time through the VDSO without entering the
kernel, so the time namespace offsets have to be available to userspace
code running inside a time namespace.

A page holding the timens offsets is allocated when a time namespace is
constructed. Map that page into VVAR for tasks inside a timens, and map
the zero page for host processes.

The VDSO code is already optimized for speed as far as possible, so any
new if-condition in it is undesirable. The goal is to provide two
.so(s), as originally suggested by Andy and Thomas:
- for host tasks, with clk_to_ns() optimized out and no penalty
- for processes inside a timens, with clk_to_ns()

For this purpose, define clk_to_ns() under CONFIG_TIME_NS.

To avoid any performance regression, follow-up patches that add support
for patching the vdso will call clk_to_ns() under a static_branch.

VDSO mappings are platform-specific, so add a Kconfig dependency for the
arch.

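
The offset addition that clk_to_ns() performs can be tried out in
userspace. What follows is only a minimal sketch under stated
assumptions, not the vDSO code itself: the sketch_* types, the enum and
the function name are hypothetical stand-ins (the enum values mirror the
Linux CLOCK_* numbers), and the offsets are passed in as a pointer
instead of being read from a mapped page. In this sketch every case that
selects an offset ends with a break, so CLOCK_BOOTTIME also reaches the
addition.

```c
#include <assert.h>

#define SKETCH_NSEC_PER_SEC 1000000000L

/* Clock ids; the values mirror the Linux CLOCK_* constants. */
enum sketch_clock {
	SKETCH_CLOCK_REALTIME = 0,
	SKETCH_CLOCK_MONOTONIC = 1,
	SKETCH_CLOCK_MONOTONIC_RAW = 4,
	SKETCH_CLOCK_MONOTONIC_COARSE = 6,
	SKETCH_CLOCK_BOOTTIME = 7,
};

/* Hypothetical stand-ins for struct timespec and the offsets page. */
struct sketch_timespec {
	long long tv_sec;
	long tv_nsec;
};

struct sketch_timens_offsets {
	struct sketch_timespec monotonic_time_offset;
	struct sketch_timespec monotonic_boottime_offset;
};

/* Add the per-namespace offset for clk to *ts, keeping tv_nsec in [0, 1e9). */
static void sketch_clk_to_ns(enum sketch_clock clk, struct sketch_timespec *ts,
			     const struct sketch_timens_offsets *offs)
{
	const struct sketch_timespec *off;

	switch (clk) {
	case SKETCH_CLOCK_MONOTONIC:
	case SKETCH_CLOCK_MONOTONIC_COARSE:
	case SKETCH_CLOCK_MONOTONIC_RAW:
		off = &offs->monotonic_time_offset;
		break;
	case SKETCH_CLOCK_BOOTTIME:
		off = &offs->monotonic_boottime_offset;
		break;
	default:
		return;	/* clocks without a namespace offset are left alone */
	}

	/* A single carry or borrow suffices only for sane nanosecond parts. */
	assert(ts->tv_nsec >= 0 && ts->tv_nsec < SKETCH_NSEC_PER_SEC);
	assert(off->tv_nsec > -SKETCH_NSEC_PER_SEC &&
	       off->tv_nsec < SKETCH_NSEC_PER_SEC);

	ts->tv_sec += off->tv_sec;
	ts->tv_nsec += off->tv_nsec;
	if (ts->tv_nsec >= SKETCH_NSEC_PER_SEC) {
		ts->tv_nsec -= SKETCH_NSEC_PER_SEC;
		ts->tv_sec++;
	}
	if (ts->tv_nsec < 0) {
		ts->tv_nsec += SKETCH_NSEC_PER_SEC;
		ts->tv_sec--;
	}
}
```

With a monotonic offset of 5.2 s, a reading of 10.9 s comes back as
16.1 s: the nanosecond sum overflows once and the carry goes into
tv_sec. A clock id that has no offset, such as the realtime one here, is
returned unchanged.
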
Signed-off-by: Andrei Vagin
Co-developed-by: Dmitry Safonov
Signed-off-by: Dmitry Safonov
---
 arch/Kconfig                          |  5 ++++
 arch/x86/Kconfig                      |  1 +
 arch/x86/entry/vdso/vclock_gettime.c  | 43 +++++++++++++++++++++++++++
 arch/x86/entry/vdso/vdso-layout.lds.S |  9 +++++-
 arch/x86/entry/vdso/vdso2c.c          |  3 ++
 arch/x86/entry/vdso/vma.c             | 12 ++++++++
 arch/x86/include/asm/vdso.h           |  1 +
 init/Kconfig                          |  1 +
 8 files changed, 74 insertions(+), 1 deletion(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 33687dddd86a..1db650ad80bc 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -717,6 +717,11 @@ config HAVE_ARCH_NVRAM_OPS
 config ISA_BUS_API
 	def_bool ISA
 
+config ARCH_HAS_VDSO_TIME_NS
+	bool
+	help
+	  VDSO can add time-ns offsets without entering kernel.
+
 #
 # ABI hall of shame
 #
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 62fc3fda1a05..e692c62f53df 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -69,6 +69,7 @@ config X86
 	select ARCH_HAS_STRICT_MODULE_RWX
 	select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
+	select ARCH_HAS_VDSO_TIME_NS
 	select ARCH_HAS_ZONE_DEVICE		if X86_64
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 	select ARCH_MIGHT_HAVE_ACPI_PDC		if ACPI
diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
index 007b3fe9d727..1d2ba6250255 100644
--- a/arch/x86/entry/vdso/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vclock_gettime.c
@@ -21,6 +21,7 @@
 #include
 #include
 #include
+#include
 
 #define gtod (&VVAR(vsyscall_gtod_data))
 
@@ -38,6 +39,11 @@ extern u8 hvclock_page
 	__attribute__((visibility("hidden")));
 #endif
 
+#ifdef CONFIG_TIME_NS
+extern u8 timens_page
+	__attribute__((visibility("hidden")));
+#endif
+
 #ifndef BUILD_VDSO32
 
 notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
@@ -139,6 +145,39 @@ notrace static inline u64 vgetcyc(int mode)
 	return U64_MAX;
 }
 
+#ifdef CONFIG_TIME_NS
+notrace static __always_inline void clk_to_ns(clockid_t clk, struct timespec *ts)
+{
+	struct timens_offsets *timens = (struct timens_offsets *) &timens_page;
+	struct timespec64 *offset64;
+
+	switch (clk) {
+	case CLOCK_MONOTONIC:
+	case CLOCK_MONOTONIC_COARSE:
+	case CLOCK_MONOTONIC_RAW:
+		offset64 = &timens->monotonic_time_offset;
+		break;
+	case CLOCK_BOOTTIME:
+		offset64 = &timens->monotonic_boottime_offset;
+	default:
+		return;
+	}
+
+	ts->tv_nsec += offset64->tv_nsec;
+	ts->tv_sec += offset64->tv_sec;
+	if (ts->tv_nsec >= NSEC_PER_SEC) {
+		ts->tv_nsec -= NSEC_PER_SEC;
+		ts->tv_sec++;
+	}
+	if (ts->tv_nsec < 0) {
+		ts->tv_nsec += NSEC_PER_SEC;
+		ts->tv_sec--;
+	}
+}
+#else
+notrace static __always_inline void clk_to_ns(clockid_t clk, struct timespec *ts) {}
+#endif
+
 notrace static int do_hres(clockid_t clk, struct timespec *ts)
 {
 	struct vgtod_ts *base = &gtod->basetime[clk];
@@ -165,6 +204,8 @@ notrace static int do_hres(clockid_t clk, struct timespec *ts)
 	ts->tv_sec = sec + __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
 	ts->tv_nsec = ns;
 
+	clk_to_ns(clk, ts);
+
 	return 0;
 }
 
@@ -178,6 +219,8 @@ notrace static void do_coarse(clockid_t clk, struct timespec *ts)
 		ts->tv_sec = base->sec;
 		ts->tv_nsec = base->nsec;
 	} while (unlikely(gtod_read_retry(gtod, seq)));
+
+	clk_to_ns(clk, ts);
 }
 
 notrace int __vdso_clock_gettime(clockid_t clock, struct timespec *ts)
diff --git a/arch/x86/entry/vdso/vdso-layout.lds.S b/arch/x86/entry/vdso/vdso-layout.lds.S
index 93c6dc7812d0..ba216527e59f 100644
--- a/arch/x86/entry/vdso/vdso-layout.lds.S
+++ b/arch/x86/entry/vdso/vdso-layout.lds.S
@@ -7,6 +7,12 @@
  * This script controls its layout.
  */
 
+#ifdef CONFIG_TIME_NS
+# define TIMENS_SZ	PAGE_SIZE
+#else
+# define TIMENS_SZ	0
+#endif
+
 SECTIONS
 {
 	/*
@@ -16,7 +22,7 @@ SECTIONS
 	 * segment.
 	 */
 
-	vvar_start = . - 3 * PAGE_SIZE;
+	vvar_start = . - (3 * PAGE_SIZE + TIMENS_SZ);
 	vvar_page = vvar_start;
 
 	/* Place all vvars at the offsets in asm/vvar.h. */
@@ -28,6 +34,7 @@ SECTIONS
 
 	pvclock_page = vvar_start + PAGE_SIZE;
 	hvclock_page = vvar_start + 2 * PAGE_SIZE;
+	timens_page = vvar_start + 3 * PAGE_SIZE;
 
 	. = SIZEOF_HEADERS;
 
diff --git a/arch/x86/entry/vdso/vdso2c.c b/arch/x86/entry/vdso/vdso2c.c
index 26d7177c119e..ed66b023d4b9 100644
--- a/arch/x86/entry/vdso/vdso2c.c
+++ b/arch/x86/entry/vdso/vdso2c.c
@@ -76,6 +76,7 @@ enum {
 	sym_hpet_page,
 	sym_pvclock_page,
 	sym_hvclock_page,
+	sym_timens_page,
 };
 
 const int special_pages[] = {
@@ -83,6 +84,7 @@ const int special_pages[] = {
 	sym_hpet_page,
 	sym_pvclock_page,
 	sym_hvclock_page,
+	sym_timens_page,
 };
 
 struct vdso_sym {
@@ -96,6 +98,7 @@ struct vdso_sym required_syms[] = {
 	[sym_hpet_page] = {"hpet_page", true},
 	[sym_pvclock_page] = {"pvclock_page", true},
 	[sym_hvclock_page] = {"hvclock_page", true},
+	[sym_timens_page] = {"timens_page", true},
 	{"VDSO32_NOTE_MASK", true},
 	{"__kernel_vsyscall", true},
 	{"__kernel_sigreturn", true},
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 35f7a1c1f4bc..80cbb2167eba 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -14,6 +14,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -23,6 +24,7 @@
 #include
 #include
 #include
+#include
 
 #if defined(CONFIG_X86_64)
 unsigned int __read_mostly vdso64_enabled = 1;
@@ -135,6 +137,16 @@ static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
 		if (tsc_pg && vclock_was_used(VCLOCK_HVCLOCK))
 			return vmf_insert_pfn(vma, vmf->address,
 					vmalloc_to_pfn(tsc_pg));
+	} else if (sym_offset == image->sym_timens_page) {
+		struct time_namespace *ns = current->nsproxy->time_ns;
+		unsigned long pfn;
+
+		if (!ns->offsets)
+			pfn = page_to_pfn(ZERO_PAGE(0));
+		else
+			pfn = page_to_pfn(virt_to_page(ns->offsets));
+
+		return vmf_insert_pfn(vma, vmf->address, pfn);
 	}
 
 	return VM_FAULT_SIGBUS;
diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index 02cb843b4c0b..b0eb59c198eb 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -22,6 +22,7 @@ struct vdso_image {
 	long sym_hpet_page;
 	long sym_pvclock_page;
 	long sym_hvclock_page;
+	long sym_timens_page;
 	long sym_VDSO32_NOTE_MASK;
 	long sym___kernel_sigreturn;
 	long sym___kernel_rt_sigreturn;
diff --git a/init/Kconfig b/init/Kconfig
index 10eebeaadfaa..e5a80278c395 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -984,6 +984,7 @@ config UTS_NS
 
 config TIME_NS
 	bool "TIME namespace"
+	depends on ARCH_HAS_VDSO_TIME_NS
 	default y
 	help
 	  In this namespace boottime and monotonic clocks can be set.
-- 
2.21.0
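
The address arithmetic in the vdso-layout.lds.S hunks can be checked
with a few lines of plain C. This is only a sketch under stated
assumptions: a 4 KiB page size, an image_base argument standing in for
the linker's location counter at the start of the vDSO image, and
hypothetical sketch_* names that do not exist in the kernel. It shows
the data area growing from three pages to four when the timens page is
present, with that page placed last, directly below the image.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Addresses of the data pages that sit below the vDSO image. */
struct sketch_vvar_layout {
	uintptr_t vvar_start;	/* also the address of the vvar page */
	uintptr_t pvclock_page;
	uintptr_t hvclock_page;
	uintptr_t timens_page;
};

/*
 * Mirror the linker script expressions: reserve three pages, plus one
 * more when time namespaces are enabled (TIMENS_SZ in the patch).
 */
static struct sketch_vvar_layout sketch_layout_vvar(uintptr_t image_base,
						    int time_ns)
{
	const uintptr_t timens_sz = time_ns ? SKETCH_PAGE_SIZE : 0;
	struct sketch_vvar_layout l;

	/* The subtraction below must not wrap around. */
	assert(image_base >= 3 * SKETCH_PAGE_SIZE + timens_sz);

	l.vvar_start = image_base - (3 * SKETCH_PAGE_SIZE + timens_sz);
	l.pvclock_page = l.vvar_start + SKETCH_PAGE_SIZE;
	l.hvclock_page = l.vvar_start + 2 * SKETCH_PAGE_SIZE;
	l.timens_page = l.vvar_start + 3 * SKETCH_PAGE_SIZE;
	return l;
}
```

For an image base of 0x70000000 and time namespaces enabled, the sketch
puts vvar_start at 0x6fffc000 and timens_page at 0x6ffff000, exactly one
page below the image. With the option disabled, the same expression for
timens_page evaluates to the image base itself, because no page is
reserved for it.
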