From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 69FD7C61DD8 for ; Tue, 10 Nov 2020 16:23:29 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 0407020780 for ; Tue, 10 Nov 2020 16:23:28 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0407020780 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id E11106B0092; Tue, 10 Nov 2020 11:22:58 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id CE1D16B0099; Tue, 10 Nov 2020 11:22:58 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 981456B0095; Tue, 10 Nov 2020 11:22:58 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0066.hostedemail.com [216.40.44.66]) by kanga.kvack.org (Postfix) with ESMTP id 1E3956B0080 for ; Tue, 10 Nov 2020 11:22:58 -0500 (EST) Received: from smtpin13.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id AAA1B82499A8 for ; Tue, 10 Nov 2020 16:22:57 +0000 (UTC) X-FDA: 77469027594.13.fifth28_0813a55272f6 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin13.hostedemail.com (Postfix) with ESMTP id 81B4318140B69 for ; Tue, 10 Nov 2020 16:22:57 +0000 (UTC) X-HE-Tag: fifth28_0813a55272f6 X-Filterd-Recvd-Size: 6371 Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by imf40.hostedemail.com (Postfix) with ESMTP for ; Tue, 10 Nov 2020 16:22:56 +0000 (UTC) IronPort-SDR: zfTbVCxD+JicNpEH+05xk1yitX+HPnhG5A8SLlpc3LoUUp8egAT9Dn0WeIphI3zcQyNJQajs2E 8+R+xWduqKzw== X-IronPort-AV: E=McAfee;i="6000,8403,9801"; a="231630857" X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="231630857" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:51 -0800 IronPort-SDR: btnedAJXepRgTCIznTycPfe/5Ka1MFwIT0ZAGB0RSyB+Fqksri5RnOESYNHQ/Z3lZSvCu+co0n TatqdnoiAZcA== X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="365572845" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:51 -0800 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v15 11/26] x86/mm: Update ptep_set_wrprotect() and pmdp_set_wrprotect() for transition from _PAGE_DIRTY_HW to _PAGE_COW Date: Tue, 10 Nov 2020 08:21:56 -0800 Message-Id: <20201110162211.9207-12-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201110162211.9207-1-yu-cheng.yu@intel.com> References: <20201110162211.9207-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: When shadow stack is introduced, [R/O + _PAGE_DIRTY_HW] PTE is reserved for shadow stack. Copy-on-write PTEs have [R/O + _PAGE_COW]. When a PTE goes from [R/W + _PAGE_DIRTY_HW] to [R/O + _PAGE_COW], it coul= d become a transient shadow stack PTE in two cases: The first case is that some processors can start a write but end up seein= g a read-only PTE by the time they get to the Dirty bit, creating a transie= nt shadow stack PTE. However, this will not occur on processors supporting shadow stack, therefore we don't need a TLB flush here. The second case is that when the software, without atomic, tests & replac= es _PAGE_DIRTY_HW with _PAGE_COW, a transient shadow stack PTE can exist. This is prevented with cmpxchg. Dave Hansen, Jann Horn, Andy Lutomirski, and Peter Zijlstra provided many insights to the issue. Jann Horn provided the cmpxchg solution. Signed-off-by: Yu-cheng Yu Reviewed-by: Kees Cook --- arch/x86/include/asm/pgtable.h | 52 ++++++++++++++++++++++++++++++++++ 1 file changed, 52 insertions(+) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtabl= e.h index 07a08e763bce..5fd4d6b60383 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1229,6 +1229,32 @@ static inline pte_t ptep_get_and_clear_full(struct= mm_struct *mm, static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { + /* + * Some processors can start a write, but end up seeing a read-only + * PTE by the time they get to the Dirty bit. In this case, they + * will set the Dirty bit, leaving a read-only, Dirty PTE which + * looks like a shadow stack PTE. + * + * However, this behavior has been improved and will not occur on + * processors supporting shadow stack. Without this guarantee, a + * transition to a non-present PTE and flush the TLB would be + * needed. + * + * When changing a writable PTE to read-only and if the PTE has + * _PAGE_DIRTY_HW set, move that bit to _PAGE_COW so that the + * PTE is not a shadow stack PTE. + */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { + pte_t old_pte, new_pte; + + do { + old_pte =3D READ_ONCE(*ptep); + new_pte =3D pte_wrprotect(old_pte); + + } while (!try_cmpxchg(&ptep->pte, &old_pte.pte, new_pte.pte)); + + return; + } clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte); } =20 @@ -1285,6 +1311,32 @@ static inline pud_t pudp_huge_get_and_clear(struct= mm_struct *mm, static inline void pmdp_set_wrprotect(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp) { + /* + * Some processors can start a write, but end up seeing a read-only + * PMD by the time they get to the Dirty bit. In this case, they + * will set the Dirty bit, leaving a read-only, Dirty PMD which + * looks like a Shadow Stack PMD. + * + * However, this behavior has been improved and will not occur on + * processors supporting Shadow Stack. Without this guarantee, a + * transition to a non-present PMD and flush the TLB would be + * needed. + * + * When changing a writable PMD to read-only and if the PMD has + * _PAGE_DIRTY_HW set, we move that bit to _PAGE_COW so that the + * PMD is not a shadow stack PMD. + */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { + pmd_t old_pmd, new_pmd; + + do { + old_pmd =3D READ_ONCE(*pmdp); + new_pmd =3D pmd_wrprotect(old_pmd); + + } while (!try_cmpxchg((pmdval_t *)pmdp, (pmdval_t *)&old_pmd, pmd_val(= new_pmd))); + + return; + } clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp); } =20 --=20 2.21.0