From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Andy Lutomirski <luto@kernel.org>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
	Dave Hansen <dave.hansen@intel.com>, X86 ML <x86@kernel.org>,
	Borislav Petkov <bp@alien8.de>,
	Neil Berrington <neil.berrington@datacore.com>,
	LKML <linux-kernel@vger.kernel.org>,
	stable <stable@vger.kernel.org>
Subject: Re: [PATCH v2 1/2] x86/mm/64: Fix vmapped stack syncing on very-large-memory 4-level systems
Date: Fri, 26 Jan 2018 23:50:03 +0300
Message-ID: <20180126205003.7dpkewl23qn2v5il@node.shutemov.name>
In-Reply-To: <CALCETrUaHysYacCF1t_Sap0jHhqBUb7dUKjaVDtPyM-kUMR3sw@mail.gmail.com>

On Fri, Jan 26, 2018 at 11:02:08AM -0800, Andy Lutomirski wrote:
> On Fri, Jan 26, 2018 at 10:51 AM, Kirill A. Shutemov
> <kirill@shutemov.name> wrote:
> > On Thu, Jan 25, 2018 at 01:12:14PM -0800, Andy Lutomirski wrote:
> >> Neil Berrington reported a double-fault on a VM with 768GB of RAM that
> >> uses large amounts of vmalloc space with PTI enabled.
> >>
> >> The cause is that load_new_mm_cr3() was never fixed to take the
> >> 5-level pgd folding code into account, so, on a 4-level kernel, the
> >> pgd synchronization logic compiles away to exactly nothing.
> >
> > Ouch. Sorry for this.
> >
> >>
> >> Interestingly, the problem doesn't trigger with nopti.  I assume this
> >> is because the kernel is mapped with global pages if we boot with
> >> nopti.  The sequence of operations when we create a new task is that
> >> we first load its mm while still running on the old stack (which
> >> crashes if the old stack is unmapped in the new mm unless the TLB
> >> saves us), then we call prepare_switch_to(), and then we switch to the
> >> new stack.  prepare_switch_to() pokes the new stack directly, which
> >> will populate the mapping through vmalloc_fault().  I assume that
> >> we're getting lucky on non-PTI systems -- the old stack's TLB entry
> >> stays alive long enough to make it all the way through
> >> prepare_switch_to() and switch_to() so that we make it to a valid
> >> stack.
> >>
> >> Fixes: b50858ce3e2a ("x86/mm/vmalloc: Add 5-level paging support")
> >> Cc: stable@vger.kernel.org
> >> Reported-and-tested-by: Neil Berrington <neil.berrington@datacore.com>
> >> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> >> ---
> >>  arch/x86/mm/tlb.c | 34 +++++++++++++++++++++++++++++-----
> >>  1 file changed, 29 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> >> index a1561957dccb..5bfe61a5e8e3 100644
> >> --- a/arch/x86/mm/tlb.c
> >> +++ b/arch/x86/mm/tlb.c
> >> @@ -151,6 +151,34 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next,
> >>       local_irq_restore(flags);
> >>  }
> >>
> >> +static void sync_current_stack_to_mm(struct mm_struct *mm)
> >> +{
> >> +     unsigned long sp = current_stack_pointer;
> >> +     pgd_t *pgd = pgd_offset(mm, sp);
> >> +
> >> +     if (CONFIG_PGTABLE_LEVELS > 4) {
> >
> > Can we have
> >
> >         if (PTRS_PER_P4D > 1)
> >
> > here instead? This way I wouldn't need to touch the code again for
> > boot-time switching support.
> 
> Want to send a patch?

I'll send it with the rest of the boot-time switching stuff.
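
To illustrate the difference between the two checks discussed above, here is
a minimal userspace sketch. It is not kernel code: CONFIG_PGTABLE_LEVELS and
PTRS_PER_P4D are redefined locally as hypothetical stand-ins, the
ptrs_per_p4d variable and both helper names are assumptions made for the
example, and the values 1 and 512 are the usual folded and unfolded p4d
table sizes on x86-64.

```c
#include <stdbool.h>

/* Hypothetical stand-ins, not the kernel's real definitions. */
#define CONFIG_PGTABLE_LEVELS 5	/* kernel built with 5-level support */

/*
 * Assumed to be filled in during early boot: 512 when 5-level paging is
 * enabled, 1 when the p4d level is folded into the pgd.
 */
static unsigned int ptrs_per_p4d = 1;
#define PTRS_PER_P4D ptrs_per_p4d

/* Fixed when the kernel is built; cannot follow the mode chosen at boot. */
static bool p4d_level_present_build_time(void)
{
	return CONFIG_PGTABLE_LEVELS > 4;
}

/* Evaluated on every call; follows the mode chosen at boot. */
static bool p4d_level_present_boot_time(void)
{
	return PTRS_PER_P4D > 1;
}
```

With ptrs_per_p4d left at 1 (a 5-level-capable kernel booted in 4-level
mode), the build-time check still reports a p4d level while the boot-time
check does not. That mismatch is why a PTRS_PER_P4D test would not need to
be touched again once the paging mode is selected at boot.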

> (Also, I haven't noticed a patch to fix up the SYSRET checking for
> boot-time switching.  Have I just missed it?)

It's not upstream yet.

There are two patches: the initial boot-time switching support and an
optimization on top of it.

https://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git/commit/?h=la57/boot-switching/wip&id=c35fc0af7a4fe9b5369134d7485d95427a0a039b
https://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git/commit/?h=la57/boot-switching/wip&id=fae0e6c3eb253e63532f4ecfa6705aac2c5d710c

-- 
 Kirill A. Shutemov
