From mboxrd@z Thu Jan 1 00:00:00 1970
From: Yu-cheng Yu
To: x86@kernel.org, "H. Peter Anvin", Thomas Gleixner, Ingo Molnar,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org,
	Arnd Bergmann, Andy Lutomirski, Balbir Singh, Borislav Petkov,
	Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov, Florian Weimer,
	"H.J. Lu", Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz,
	Nadav Amit, Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	"Ravi V. Shankar", Vedvyas Shanbhogue, Dave Martin, x86-patch-review@intel.com
Cc: Yu-cheng Yu
Subject: [RFC PATCH v9 12/27] x86/mm: Modify ptep_set_wrprotect and pmdp_set_wrprotect for _PAGE_DIRTY_SW
Date: Wed, 5 Feb 2020 10:19:20 -0800
Message-Id: <20200205181935.3712-13-yu-cheng.yu@intel.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20200205181935.3712-1-yu-cheng.yu@intel.com>
References: <20200205181935.3712-1-yu-cheng.yu@intel.com>
MIME-Version: 1.0

When Shadow Stack (SHSTK) is enabled, the [R/O + PAGE_DIRTY_HW] setting is
reserved only for SHSTK.  Non-Shadow Stack R/O PTEs are
[R/O + PAGE_DIRTY_SW].

When a PTE goes from [R/W + PAGE_DIRTY_HW] to [R/O + PAGE_DIRTY_SW], it
could become a transient SHSTK PTE in two cases.

The first case is that some processors can start a write but end up seeing
a read-only PTE by the time they get to the Dirty bit, creating a transient
SHSTK PTE.  However, this will not occur on processors supporting SHSTK,
and therefore a TLB flush is not needed here.

The second case is that, when software tests and replaces PAGE_DIRTY_HW
with PAGE_DIRTY_SW non-atomically, a transient SHSTK PTE can exist.  This
is prevented with cmpxchg.

Dave Hansen, Jann Horn, Andy Lutomirski, and Peter Zijlstra provided many
insights into the issue.  Jann Horn provided the cmpxchg solution.

v9:
- Change compile-time conditionals to runtime checks.
- Fix parameters of try_cmpxchg(): change pte_t/pmd_t to pte_t.pte/pmd_t.pmd.

v4:
- Implement try_cmpxchg().

Signed-off-by: Yu-cheng Yu
---
 arch/x86/include/asm/pgtable.h | 66 ++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 2733e7ec16b3..43cb27379208 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1253,6 +1253,39 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
 static inline void ptep_set_wrprotect(struct mm_struct *mm,
 				      unsigned long addr, pte_t *ptep)
 {
+	/*
+	 * Some processors can start a write, but end up seeing a read-only
+	 * PTE by the time they get to the Dirty bit.  In this case, they
+	 * will set the Dirty bit, leaving a read-only, Dirty PTE which
+	 * looks like a Shadow Stack PTE.
+	 *
+	 * However, this behavior has been improved and will not occur on
+	 * processors supporting Shadow Stack.  Without this guarantee, a
+	 * transition to a non-present PTE and a TLB flush would be
+	 * needed.
+	 *
+	 * When changing a writable PTE to read-only and if the PTE has
+	 * _PAGE_DIRTY_HW set, we move that bit to _PAGE_DIRTY_SW so that
+	 * the PTE is not a valid Shadow Stack PTE.
+	 */
+#ifdef CONFIG_X86_64
+	if (static_cpu_has(X86_FEATURE_SHSTK)) {
+		pte_t new_pte, pte = READ_ONCE(*ptep);
+
+		do {
+			/*
+			 * This is the same as moving _PAGE_DIRTY_HW
+			 * to _PAGE_DIRTY_SW.
+			 */
+			new_pte = pte_wrprotect(pte);
+			new_pte.pte |= (new_pte.pte & _PAGE_DIRTY_HW) >>
+					_PAGE_BIT_DIRTY_HW << _PAGE_BIT_DIRTY_SW;
+			new_pte.pte &= ~_PAGE_DIRTY_HW;
+		} while (!try_cmpxchg(&ptep->pte, &pte.pte, new_pte.pte));
+
+		return;
+	}
+#endif
 	clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
 }
 
@@ -1303,6 +1336,39 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm,
 static inline void pmdp_set_wrprotect(struct mm_struct *mm,
 				      unsigned long addr, pmd_t *pmdp)
 {
+	/*
+	 * Some processors can start a write, but end up seeing a read-only
+	 * PMD by the time they get to the Dirty bit.  In this case, they
+	 * will set the Dirty bit, leaving a read-only, Dirty PMD which
+	 * looks like a Shadow Stack PMD.
+	 *
+	 * However, this behavior has been improved and will not occur on
+	 * processors supporting Shadow Stack.  Without this guarantee, a
+	 * transition to a non-present PMD and a TLB flush would be
+	 * needed.
+	 *
+	 * When changing a writable PMD to read-only and if the PMD has
+	 * _PAGE_DIRTY_HW set, we move that bit to _PAGE_DIRTY_SW so that
+	 * the PMD is not a valid Shadow Stack PMD.
+	 */
+#ifdef CONFIG_X86_64
+	if (static_cpu_has(X86_FEATURE_SHSTK)) {
+		pmd_t new_pmd, pmd = READ_ONCE(*pmdp);
+
+		do {
+			/*
+			 * This is the same as moving _PAGE_DIRTY_HW
+			 * to _PAGE_DIRTY_SW.
+			 */
+			new_pmd = pmd_wrprotect(pmd);
+			new_pmd.pmd |= (new_pmd.pmd & _PAGE_DIRTY_HW) >>
+					_PAGE_BIT_DIRTY_HW << _PAGE_BIT_DIRTY_SW;
+			new_pmd.pmd &= ~_PAGE_DIRTY_HW;
+		} while (!try_cmpxchg(&pmdp->pmd, &pmd.pmd, new_pmd.pmd));
+
+		return;
+	}
+#endif
 	clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp);
 }
 
-- 
2.21.0
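
As an editor's aside (not part of the submission), the following minimal
user-space C sketch mirrors the update pattern the patch adds above: clear
the writable bit, migrate the hardware Dirty bit into the software Dirty
position, and publish the result with a compare-and-exchange retry loop so
that no transient [R/O + DIRTY_HW] value is ever stored.  The XPTE_* bit
positions, the toy_wrprotect() helper, and the use of C11 atomics are
assumptions made for the example only; the kernel code operates on
pte_t/pmd_t with try_cmpxchg() and the real _PAGE_BIT_* definitions.

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed, illustrative bit positions -- not the kernel's definitions. */
#define XPTE_BIT_RW		1
#define XPTE_BIT_DIRTY_HW	6
#define XPTE_BIT_DIRTY_SW	11

#define XPTE_RW			(1ULL << XPTE_BIT_RW)
#define XPTE_DIRTY_HW		(1ULL << XPTE_BIT_DIRTY_HW)
#define XPTE_DIRTY_SW		(1ULL << XPTE_BIT_DIRTY_SW)

/*
 * Write-protect a toy 64-bit "PTE": clear R/W, move the hardware Dirty
 * bit to the software Dirty position, and retry if another thread
 * changed the entry in the meantime, so a transient [R/O + DIRTY_HW]
 * (shadow-stack-looking) value is never made visible.
 */
static void toy_wrprotect(_Atomic uint64_t *ptep)
{
	uint64_t old_pte = atomic_load(ptep);
	uint64_t new_pte;

	do {
		new_pte = old_pte & ~XPTE_RW;
		new_pte |= (new_pte & XPTE_DIRTY_HW) >>
				XPTE_BIT_DIRTY_HW << XPTE_BIT_DIRTY_SW;
		new_pte &= ~XPTE_DIRTY_HW;
		/* On failure, old_pte is refreshed and the loop retries. */
	} while (!atomic_compare_exchange_weak(ptep, &old_pte, new_pte));
}

int main(void)
{
	_Atomic uint64_t pte = XPTE_RW | XPTE_DIRTY_HW;

	toy_wrprotect(&pte);
	printf("rw=%d dirty_hw=%d dirty_sw=%d\n",
	       !!(pte & XPTE_RW), !!(pte & XPTE_DIRTY_HW),
	       !!(pte & XPTE_DIRTY_SW));
	return 0;
}

The weak form of compare-and-exchange is sufficient here because spurious
failures simply cause another pass through the retry loop.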