Date: Fri, 26 Jan 2018 21:51:43 +0300
From: "Kirill A. Shutemov"
To: Andy Lutomirski
Cc: Konstantin Khlebnikov, Dave Hansen, X86 ML, Borislav Petkov,
	Neil Berrington, LKML, stable@vger.kernel.org
Subject: Re: [PATCH v2 1/2] x86/mm/64: Fix vmapped stack syncing on very-large-memory 4-level systems
Message-ID: <20180126185143.dx7emh7cq5pbrkxn@node.shutemov.name>
References: <346541c56caed61abbe693d7d2742b4a380c5001.1516914529.git.luto@kernel.org>
In-Reply-To: <346541c56caed61abbe693d7d2742b4a380c5001.1516914529.git.luto@kernel.org>
User-Agent: NeoMutt/20171215
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jan 25, 2018 at 01:12:14PM -0800, Andy Lutomirski wrote:
> Neil Berrington reported a double-fault on a VM with 768GB of RAM that
> uses large amounts of vmalloc space with PTI enabled.
>
> The cause is that load_new_mm_cr3() was never fixed to take the
> 5-level pgd folding code into account, so, on a 4-level kernel, the
> pgd synchronization logic compiles away to exactly nothing.

Ouch. Sorry for this.

> Interestingly, the problem doesn't trigger with nopti. I assume this
> is because the kernel is mapped with global pages if we boot with
> nopti. The sequence of operations when we create a new task is that
> we first load its mm while still running on the old stack (which
> crashes if the old stack is unmapped in the new mm unless the TLB
> saves us), then we call prepare_switch_to(), and then we switch to the
> new stack. prepare_switch_to() pokes the new stack directly, which
> will populate the mapping through vmalloc_fault(). I assume that
> we're getting lucky on non-PTI systems -- the old stack's TLB entry
> stays alive long enough to make it all the way through
> prepare_switch_to() and switch_to() so that we make it to a valid
> stack.
>
> Fixes: b50858ce3e2a ("x86/mm/vmalloc: Add 5-level paging support")
> Cc: stable@vger.kernel.org
> Reported-and-tested-by: Neil Berrington
> Signed-off-by: Andy Lutomirski
> ---
>  arch/x86/mm/tlb.c | 34 +++++++++++++++++++++++++++++-----
>  1 file changed, 29 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index a1561957dccb..5bfe61a5e8e3 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -151,6 +151,34 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next,
>  	local_irq_restore(flags);
>  }
>
> +static void sync_current_stack_to_mm(struct mm_struct *mm)
> +{
> +	unsigned long sp = current_stack_pointer;
> +	pgd_t *pgd = pgd_offset(mm, sp);
> +
> +	if (CONFIG_PGTABLE_LEVELS > 4) {

Can we have

	if (PTRS_PER_P4D > 1)

here instead? This way I wouldn't need to touch the code again for
boot-time switching support.

-- 
 Kirill A. Shutemov