From: Dmitry Safonov <dima@arista.com>
To: linux-kernel@vger.kernel.org
Cc: Dmitry Safonov <0x7f454c46@gmail.com>,
	Dmitry Safonov <dima@arista.com>,
	Adrian Reber <adrian@lisas.de>,
	Andrei Vagin <avagin@openvz.org>,
	Andy Lutomirski <luto@kernel.org>,
	Arnd Bergmann <arnd@arndb.de>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	Cyrill Gorcunov <gorcunov@openvz.org>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Ingo Molnar <mingo@redhat.com>,
	Jann Horn <jannh@google.com>,
	Jeff Dike <jdike@addtoit.com>,
	Oleg Nesterov <oleg@redhat.com>,
	Pavel Emelyanov <xemul@virtuozzo.com>,
	Shuah Khan <shuah@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Vincenzo Frascino <vincenzo.frascino@arm.com>,
	containers@lists.linux-foundation.org,
	criu@openvz.org,
	linux-api@vger.kernel.org,
	x86@kernel.org,
	Andrei Vagin <avagin@gmail.com>
Subject: [PATCHv5 25/37] x86/vdso: Switch image on setns()/clone()
Date: Mon, 29 Jul 2019 22:57:07 +0100
Message-ID: <20190729215758.28405-26-dima@arista.com>
In-Reply-To: <20190729215758.28405-1-dima@arista.com>

As discussed on the timens RFC, adding a new conditional branch
`if (inside_time_ns)` to the VDSO for all processes is undesirable.
It adds a penalty for everybody, as the branch predictor may mispredict
the jump. It also wastes instruction cache lines on cmp/jmp.

Those effects of introducing the time namespace are very much unwanted,
bearing in mind how much work has been spent on micro-optimising the
vdso code.

To address those problems, there are two versions of the VDSO's .so:
one for host tasks (without any penalty) and one for processes inside
a time namespace, with clk_to_ns() that subtracts offsets from the
host's time.

Whenever a user does setns() or unshare(CLONE_TIMENS) followed
by clone(), change the VDSO image in mm and zap the VVAR/VDSO page
tables. They will be re-faulted with the corresponding image and VVAR
offsets.

Co-developed-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 arch/x86/entry/vdso/vma.c   | 23 +++++++++++++++++++++++
 arch/x86/include/asm/vdso.h |  1 +
 kernel/time_namespace.c     | 11 +++++++++++
 3 files changed, 35 insertions(+)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 8a8211fd4cfc..91cf5a5c8c9e 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -25,6 +25,7 @@
 #include <asm/cpufeature.h>
 #include <clocksource/hyperv_timer.h>
 #include <asm/page.h>
+#include <asm/tlb.h>
 
 #if defined(CONFIG_X86_64)
 unsigned int __read_mostly vdso64_enabled = 1;
@@ -266,6 +267,28 @@ static const struct vm_special_mapping vvar_mapping = {
 	.mremap = vvar_mremap,
 };
 
+#ifdef CONFIG_TIME_NS
+int vdso_join_timens(struct task_struct *task)
+{
+	struct mm_struct *mm = task->mm;
+	struct vm_area_struct *vma;
+
+	if (down_write_killable(&mm->mmap_sem))
+		return -EINTR;
+
+	for (vma = mm->mmap; vma; vma = vma->vm_next) {
+		unsigned long size = vma->vm_end - vma->vm_start;
+
+		if (vma_is_special_mapping(vma, &vvar_mapping) ||
+		    vma_is_special_mapping(vma, &vdso_mapping))
+			zap_page_range(vma, vma->vm_start, size);
+	}
+
+	up_write(&mm->mmap_sem);
+	return 0;
+}
+#endif
+
 /*
  * Add vdso and vvar mappings to current process.
  * @image - blob to map
diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index 03f468c63a24..ccf89dedd04f 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -45,6 +45,7 @@ extern struct vdso_image vdso_image_32;
 
 extern void __init init_vdso_image(struct vdso_image *image);
 
 extern int map_vdso_once(const struct vdso_image *image, unsigned long addr);
+extern int vdso_join_timens(struct task_struct *task);
 
 #endif /* __ASSEMBLER__ */
 
diff --git a/kernel/time_namespace.c b/kernel/time_namespace.c
index 9807c5c90cb2..4b2eb92ad595 100644
--- a/kernel/time_namespace.c
+++ b/kernel/time_namespace.c
@@ -15,6 +15,7 @@
 #include <linux/cred.h>
 #include <linux/err.h>
 #include <linux/mm.h>
+#include <asm/vdso.h>
 
 ktime_t do_timens_ktime_to_host(clockid_t clockid, ktime_t tim,
 				struct timens_offsets *ns_offsets)
@@ -199,6 +200,7 @@ static void timens_put(struct ns_common *ns)
 static int timens_install(struct nsproxy *nsproxy, struct ns_common *new)
 {
 	struct time_namespace *ns = to_time_ns(new);
+	int ret;
 
 	if (!thread_group_empty(current))
 		return -EINVAL;
@@ -207,6 +209,10 @@ static int timens_install(struct nsproxy *nsproxy, struct ns_common *new)
 	    !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
 		return -EPERM;
 
+	ret = vdso_join_timens(current);
+	if (ret)
+		return ret;
+
 	get_time_ns(ns);
 	get_time_ns(ns);
 	put_time_ns(nsproxy->time_ns);
@@ -221,10 +227,15 @@ int timens_on_fork(struct nsproxy *nsproxy, struct task_struct *tsk)
 {
 	struct ns_common *nsc = &nsproxy->time_ns_for_children->ns;
 	struct time_namespace *ns = to_time_ns(nsc);
+	int ret;
 
 	if (nsproxy->time_ns == nsproxy->time_ns_for_children)
 		return 0;
 
+	ret = vdso_join_timens(tsk);
+	if (ret)
+		return ret;
+
 	get_time_ns(ns);
 	put_time_ns(nsproxy->time_ns);
 	nsproxy->time_ns = ns;
-- 
2.22.0
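
Editorial illustration (not part of the patch): the zap-and-refault idea
described in the changelog can be sketched as a small userspace model.
Everything below is hypothetical and only mirrors the wording of the
commit message: the `model_*` names, the single monotonic offset, and
the function-pointer "image" are assumptions, not kernel types or APIs.
The point it tries to show is that the choice between the host image and
the timens image is made once, at (re)fault time, so the per-call path
carries no `if (inside_time_ns)` branch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for per-namespace offsets (not the kernel struct). */
struct model_offsets {
	int64_t monotonic_ns;
};

/* A "vdso image" is modelled as a single clock-read function. */
typedef int64_t (*model_image_fn)(int64_t host_ns,
				  const struct model_offsets *off);

/* Host image: returns host time untouched, no namespace check. */
static int64_t model_host_image(int64_t host_ns,
				const struct model_offsets *off)
{
	(void)off;
	return host_ns;
}

/* Timens image: subtracts the offset, following the changelog wording. */
static int64_t model_timens_image(int64_t host_ns,
				  const struct model_offsets *off)
{
	return host_ns - off->monotonic_ns;
}

/* Hypothetical model of an address space with one vdso mapping. */
struct model_mm {
	bool in_timens;
	model_image_fn mapped;	/* NULL models zapped page tables */
	struct model_offsets off;
};

/* Analogue of joining a time namespace: record offsets, zap the mapping. */
static void model_join_timens(struct model_mm *mm, int64_t monotonic_off_ns)
{
	mm->in_timens = true;
	mm->off.monotonic_ns = monotonic_off_ns;
	mm->mapped = NULL;
}

/*
 * Analogue of a vdso call. The image is selected only when the mapping
 * is absent (the "fault"); later calls go straight through the pointer.
 */
static int64_t model_clock_read(struct model_mm *mm, int64_t host_ns)
{
	if (!mm->mapped)
		mm->mapped = mm->in_timens ? model_timens_image
					   : model_host_image;
	return mm->mapped(host_ns, &mm->off);
}
```

In this model, a freshly initialised `struct model_mm` reads host time
unchanged; after `model_join_timens()` the next `model_clock_read()`
"re-faults" and picks up the timens image with the stored offset.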