linux-kernel.vger.kernel.org archive mirror
From: Andy Lutomirski <luto@kernel.org>
To: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>,
	Dmitry Safonov <dima@arista.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Dmitry Safonov <0x7f454c46@gmail.com>,
	Adrian Reber <adrian@lisas.de>, Andrei Vagin <avagin@openvz.org>,
	Arnd Bergmann <arnd@arndb.de>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	Cyrill Gorcunov <gorcunov@openvz.org>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Ingo Molnar <mingo@redhat.com>, Jann Horn <jannh@google.com>,
	Jeff Dike <jdike@addtoit.com>, Oleg Nesterov <oleg@redhat.com>,
	Pavel Emelyanov <xemul@virtuozzo.com>,
	Shuah Khan <shuah@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Vincenzo Frascino <vincenzo.frascino@arm.com>,
	Linux Containers <containers@lists.linux-foundation.org>,
	criu@openvz.org, Linux API <linux-api@vger.kernel.org>,
	X86 ML <x86@kernel.org>, Andrei Vagin <avagin@gmail.com>
Subject: Re: [PATCHv5 25/37] x86/vdso: Switch image on setns()/clone()
Date: Thu, 1 Aug 2019 14:39:51 -0700
Message-ID: <CALCETrUfGb8VcgdyCme=n755OB_qaGqS9QATpn8wqQ3XCqUgAA@mail.gmail.com>
In-Reply-To: <4D0E6734-066D-4A72-A119-2FD6482F857D@zytor.com>

On Wed, Jul 31, 2019 at 11:09 PM <hpa@zytor.com> wrote:
>
> On July 31, 2019 10:34:26 PM PDT, Andy Lutomirski <luto@kernel.org> wrote:
> >On Mon, Jul 29, 2019 at 2:58 PM Dmitry Safonov <dima@arista.com> wrote:
> >>
> >> As it has been discussed on timens RFC, adding a new conditional branch
> >> `if (inside_time_ns)` on VDSO for all processes is undesirable.
> >> It will add a penalty for everybody as branch predictor may mispredict
> >> the jump. Also there are instruction cache lines wasted on cmp/jmp.
> >
> >
> >>
> >> +#ifdef CONFIG_TIME_NS
> >> +int vdso_join_timens(struct task_struct *task)
> >> +{
> >> +       struct mm_struct *mm = task->mm;
> >> +       struct vm_area_struct *vma;
> >> +
> >> +       if (down_write_killable(&mm->mmap_sem))
> >> +               return -EINTR;
> >> +
> >> +       for (vma = mm->mmap; vma; vma = vma->vm_next) {
> >> +               unsigned long size = vma->vm_end - vma->vm_start;
> >> +
> >> +               if (vma_is_special_mapping(vma, &vvar_mapping) ||
> >> +                   vma_is_special_mapping(vma, &vdso_mapping))
> >> +                       zap_page_range(vma, vma->vm_start, size);
> >> +       }
> >
> >This is, unfortunately, fundamentally buggy.  If any thread is in the
> >vDSO or has the vDSO on the stack (due to a signal, for example), this
> >will crash it.  I can think of three solutions:
> >
> >1. Say that you can't setns() if you have other mms and ignore the
> >signal issue.  Anything with green threads will disapprove.  It's also
> >rather gross.
> >
> >2. Make it so that you can flip the static branch safely.  As in my
> >other email, you'll need to deal with CoW somehow.
> >
> >3. Make it so that you can't change timens, or at least that you can't
> >turn timens on or off, without execve() or fork().
> >
> >BTW, that static branch probably needs to be aligned to a cache line
> >or something similar to avoid all the nastiness with trying to poke
> >text that might be concurrently executing.  This will be a mess.
>
> Since we are talking about different physical addresses I believe we
> should be okay as long as they don't cross page boundaries, and even
> if they do it can be managed with proper page invalidation sequencing
> – it's not like the problems of having to deal with XMC on live pages
> like in the kernel.
>
> Still, you really need each instruction sequence to be present, with the only difference being specific patch sites.
>
> Any fundamental reason this can't be strictly data driven? Seems odd to me if it couldn't, but I might be missing something obvious.

I think it can be.  There are at least two places where vDSO slow
paths could hook without affecting fast paths: vclock_mode and the low
bit of the sequence number.
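
To make the data-driven idea concrete, here is a minimal userspace sketch,
not code from this series. It assumes a seqcount-style reader in which an odd
sequence value already sends the reader off the fast path, and it reuses that
existing test as the hook. Every name in it (struct vdata, VCLOCK_TIMENS,
offset_ns, host, read_clock_ns) is hypothetical and chosen only for
illustration; the real vDSO data layout and clock modes differ.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical clock modes; VCLOCK_TIMENS is an assumed marker value. */
enum { VCLOCK_NONE = 0, VCLOCK_TSC = 1, VCLOCK_TIMENS = 2 };

/* Hypothetical stand-in for the data page the vDSO reads. */
struct vdata {
	_Atomic uint32_t seq;   /* even: stable, odd: leave the fast path */
	int vclock_mode;
	uint64_t base_ns;       /* clock value published by the "kernel" */
	int64_t offset_ns;      /* only meaningful on a namespace page */
	struct vdata *host;     /* only meaningful on a namespace page */
};

static uint64_t read_clock_ns(struct vdata *vd);

/* Slow path: read the real data and apply the namespace offset. */
static uint64_t read_clock_timens(struct vdata *ns_page)
{
	assert(ns_page->host != NULL);
	return read_clock_ns(ns_page->host) + (uint64_t)ns_page->offset_ns;
}

static uint64_t read_clock_ns(struct vdata *vd)
{
	for (;;) {
		uint32_t seq = atomic_load_explicit(&vd->seq,
						    memory_order_acquire);

		/*
		 * The fast path already has to test the low bit to detect
		 * a writer in progress, so hooking here adds no new branch
		 * for tasks outside a time namespace.
		 */
		if (seq & 1) {
			if (vd->vclock_mode == VCLOCK_TIMENS)
				return read_clock_timens(vd);
			continue;	/* writer in progress: retry */
		}

		uint64_t ns = vd->base_ns;

		/* Order the data read before re-checking the sequence. */
		atomic_thread_fence(memory_order_acquire);
		if (seq == atomic_load_explicit(&vd->seq,
						memory_order_relaxed))
			return ns;
	}
}
```

In this sketch a task inside a time namespace would be given a data page whose
seq is permanently odd and whose vclock_mode is the marker value, while every
other task keeps reading the ordinary page and takes the unchanged fast path.
The vDSO text is identical in both cases, so nothing needs to be patched or
zapped while a thread may be executing in it.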

