From: Rick Edgecombe <rick.p.edgecombe@intel.com>
To: x86@kernel.org, "H. Peter Anvin", Thomas Gleixner, Ingo Molnar,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-mm@kvack.org, linux-arch@vger.kernel.org,
	linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski,
	Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, "H. J. Lu", Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Weijiang Yang, "Kirill A. Shutemov", John Allen, kcc@google.com,
	eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com,
	dethoma@microsoft.com, akpm@linux-foundation.org,
	Andrew.Cooper3@citrix.com, christina.schimpe@intel.com,
	david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com
Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu
Subject: [PATCH v8 16/40] x86/mm: Start actually marking _PAGE_SAVED_DIRTY
Date: Sat, 18 Mar 2023 17:15:11 -0700
Message-Id: <20230319001535.23210-17-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com>
References: <20230319001535.23210-1-rick.p.edgecombe@intel.com>

The recently introduced _PAGE_SAVED_DIRTY should be used instead of the
HW Dirty bit whenever a PTE is Write=0, in order to not inadvertently
create shadow stack PTEs.
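To make the hazard concrete, the following stand-alone sketch models it
with illustrative placeholder masks; the RW/DIRTY/SAVED_DIRTY values
below are assumptions for demonstration, not the real x86 _PAGE_* bit
layout:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define RW          (1ULL << 1)   /* stand-in for _PAGE_RW */
#define DIRTY       (1ULL << 6)   /* stand-in for _PAGE_DIRTY */
#define SAVED_DIRTY (1ULL << 57)  /* stand-in for _PAGE_SAVED_DIRTY */

/* HW with shadow stacks enabled reads Write=0,Dirty=1 as shadow stack */
static bool is_shstk(uint64_t pte)
{
	return (pte & (RW | DIRTY)) == DIRTY;
}

/* Pre-series behavior: clearing RW can leave the shadow stack encoding */
static uint64_t wrprotect_blind(uint64_t pte)
{
	return pte & ~RW;
}

/* Patched behavior: park the Dirty value in the software bit instead */
static uint64_t wrprotect_saveddirty(uint64_t pte)
{
	pte &= ~RW;
	if (pte & DIRTY)
		pte = (pte & ~DIRTY) | SAVED_DIRTY;
	return pte;
}

int main(void)
{
	uint64_t pte = RW | DIRTY;	/* an ordinary writable, dirty PTE */

	printf("blind:      shstk=%d\n", is_shstk(wrprotect_blind(pte)));
	printf("saveddirty: shstk=%d\n", is_shstk(wrprotect_saveddirty(pte)));
	return 0;
}

Compiled as plain C, the first call reports shstk=1 (the accidental
shadow stack encoding) and the second reports shstk=0.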
Update pte_mk*() helpers to do this, and apply the same changes to pmd
and pud.

For pte_modify() this is a bit trickier. It takes a "raw" pgprot_t which
was not necessarily created with any of the existing PTE bit helpers.
That means that it can return a pte_t with Write=0,Dirty=1, a shadow
stack PTE, when it did not intend to create one. Modify it to also move
_PAGE_DIRTY to _PAGE_SAVED_DIRTY.

To avoid creating Write=0,Dirty=1 PTEs, pte_modify() needs to avoid:
 1. Marking Write=0 PTEs Dirty=1
 2. Marking Dirty=1 PTEs Write=0

The first case cannot happen as the existing behavior of pte_modify()
is to filter out any Dirty bit passed in newprot. Handle the second
case by shifting _PAGE_DIRTY=1 to _PAGE_SAVED_DIRTY=1 if the PTE was
write protected by the pte_modify() call; a toy model of this fixup is
included after the patch. Apply the same changes to pmd_modify().

Co-developed-by: Yu-cheng Yu
Signed-off-by: Yu-cheng Yu
Signed-off-by: Rick Edgecombe
Reviewed-by: Kees Cook
Acked-by: Mike Rapoport (IBM)
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
---
v6:
 - Rename _PAGE_COW to _PAGE_SAVED_DIRTY (David Hildenbrand)
 - Open code _PAGE_SAVED_DIRTY part in pte_modify() (Boris)
 - Change the logic so the open coded part is not too ugly
 - Merge pte_modify() patch with this one because of the above

v4:
 - Break apart patch for better bisectability
---
 arch/x86/include/asm/pgtable.h | 168 ++++++++++++++++++++++++++++-----
 1 file changed, 145 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 349fcab0405a..05dfdbdf96b4 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -124,9 +124,17 @@ extern pmdval_t early_pmd_flags;
  * The following only work if pte_present() is true.
  * Undefined behaviour if not..
  */
-static inline int pte_dirty(pte_t pte)
+static inline bool pte_dirty(pte_t pte)
 {
-	return pte_flags(pte) & _PAGE_DIRTY;
+	return pte_flags(pte) & _PAGE_DIRTY_BITS;
+}
+
+static inline bool pte_shstk(pte_t pte)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK))
+		return false;
+
+	return (pte_flags(pte) & (_PAGE_RW | _PAGE_DIRTY)) == _PAGE_DIRTY;
 }
 
 static inline int pte_young(pte_t pte)
@@ -134,9 +142,18 @@ static inline int pte_young(pte_t pte)
 	return pte_flags(pte) & _PAGE_ACCESSED;
 }
 
-static inline int pmd_dirty(pmd_t pmd)
+static inline bool pmd_dirty(pmd_t pmd)
 {
-	return pmd_flags(pmd) & _PAGE_DIRTY;
+	return pmd_flags(pmd) & _PAGE_DIRTY_BITS;
+}
+
+static inline bool pmd_shstk(pmd_t pmd)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK))
+		return false;
+
+	return (pmd_flags(pmd) & (_PAGE_RW | _PAGE_DIRTY | _PAGE_PSE)) ==
+	       (_PAGE_DIRTY | _PAGE_PSE);
 }
 
 #define pmd_young pmd_young
@@ -145,9 +162,9 @@ static inline int pmd_young(pmd_t pmd)
 	return pmd_flags(pmd) & _PAGE_ACCESSED;
 }
 
-static inline int pud_dirty(pud_t pud)
+static inline bool pud_dirty(pud_t pud)
 {
-	return pud_flags(pud) & _PAGE_DIRTY;
+	return pud_flags(pud) & _PAGE_DIRTY_BITS;
 }
 
 static inline int pud_young(pud_t pud)
@@ -157,13 +174,21 @@ static inline int pud_young(pud_t pud)
 
 static inline int pte_write(pte_t pte)
 {
-	return pte_flags(pte) & _PAGE_RW;
+	/*
+	 * Shadow stack pages are logically writable, but do not have
+	 * _PAGE_RW. Check for them separately from _PAGE_RW itself.
+	 */
+	return (pte_flags(pte) & _PAGE_RW) || pte_shstk(pte);
 }
 
 #define pmd_write pmd_write
 static inline int pmd_write(pmd_t pmd)
 {
-	return pmd_flags(pmd) & _PAGE_RW;
+	/*
+	 * Shadow stack pages are logically writable, but do not have
+	 * _PAGE_RW. Check for them separately from _PAGE_RW itself.
+	 */
+	return (pmd_flags(pmd) & _PAGE_RW) || pmd_shstk(pmd);
 }
 
 #define pud_write pud_write
@@ -342,7 +367,16 @@ static inline pte_t pte_clear_saveddirty(pte_t pte)
 
 static inline pte_t pte_wrprotect(pte_t pte)
 {
-	return pte_clear_flags(pte, _PAGE_RW);
+	pte = pte_clear_flags(pte, _PAGE_RW);
+
+	/*
+	 * Blindly clearing _PAGE_RW might accidentally create
+	 * a shadow stack PTE (Write=0,Dirty=1). Move the hardware
+	 * dirty value to the software bit.
+	 */
+	if (pte_dirty(pte))
+		pte = pte_mksaveddirty(pte);
+	return pte;
 }
 
 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
@@ -380,7 +414,7 @@ static inline pte_t pte_clear_uffd_wp(pte_t pte)
 
 static inline pte_t pte_mkclean(pte_t pte)
 {
-	return pte_clear_flags(pte, _PAGE_DIRTY);
+	return pte_clear_flags(pte, _PAGE_DIRTY_BITS);
 }
 
 static inline pte_t pte_mkold(pte_t pte)
@@ -395,7 +429,19 @@ static inline pte_t pte_mkexec(pte_t pte)
 
 static inline pte_t pte_mkdirty(pte_t pte)
 {
-	return pte_set_flags(pte, _PAGE_DIRTY | _PAGE_SOFT_DIRTY);
+	pteval_t dirty = _PAGE_DIRTY;
+
+	/* Avoid creating Dirty=1,Write=0 PTEs */
+	if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK) && !pte_write(pte))
+		dirty = _PAGE_SAVED_DIRTY;
+
+	return pte_set_flags(pte, dirty | _PAGE_SOFT_DIRTY);
+}
+
+static inline pte_t pte_mkwrite_shstk(pte_t pte)
+{
+	/* pte_clear_saveddirty() also sets Dirty=1 */
+	return pte_clear_saveddirty(pte);
 }
 
 static inline pte_t pte_mkyoung(pte_t pte)
@@ -412,7 +458,12 @@ struct vm_area_struct;
 
 static inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma)
 {
-	return pte_mkwrite_kernel(pte);
+	pte = pte_mkwrite_kernel(pte);
+
+	if (pte_dirty(pte))
+		pte = pte_clear_saveddirty(pte);
+
+	return pte;
 }
 
 static inline pte_t pte_mkhuge(pte_t pte)
@@ -481,7 +532,15 @@ static inline pmd_t pmd_clear_saveddirty(pmd_t pmd)
 
 static inline pmd_t pmd_wrprotect(pmd_t pmd)
 {
-	return pmd_clear_flags(pmd, _PAGE_RW);
+	pmd = pmd_clear_flags(pmd, _PAGE_RW);
+	/*
+	 * Blindly clearing _PAGE_RW might accidentally create
+	 * a shadow stack PMD (RW=0, Dirty=1). Move the hardware
+	 * dirty value to the software bit.
+	 */
+	if (pmd_dirty(pmd))
+		pmd = pmd_mksaveddirty(pmd);
+	return pmd;
 }
 
 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
@@ -508,12 +567,23 @@ static inline pmd_t pmd_mkold(pmd_t pmd)
 
 static inline pmd_t pmd_mkclean(pmd_t pmd)
 {
-	return pmd_clear_flags(pmd, _PAGE_DIRTY);
+	return pmd_clear_flags(pmd, _PAGE_DIRTY_BITS);
 }
 
 static inline pmd_t pmd_mkdirty(pmd_t pmd)
 {
-	return pmd_set_flags(pmd, _PAGE_DIRTY | _PAGE_SOFT_DIRTY);
+	pmdval_t dirty = _PAGE_DIRTY;
+
+	/* Avoid creating (HW)Dirty=1, Write=0 PMDs */
+	if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK) && !pmd_write(pmd))
+		dirty = _PAGE_SAVED_DIRTY;
+
+	return pmd_set_flags(pmd, dirty | _PAGE_SOFT_DIRTY);
+}
+
+static inline pmd_t pmd_mkwrite_shstk(pmd_t pmd)
+{
+	return pmd_clear_saveddirty(pmd);
 }
 
 static inline pmd_t pmd_mkdevmap(pmd_t pmd)
@@ -533,7 +603,12 @@ static inline pmd_t pmd_mkyoung(pmd_t pmd)
 
 static inline pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
 {
-	return pmd_set_flags(pmd, _PAGE_RW);
+	pmd = pmd_set_flags(pmd, _PAGE_RW);
+
+	if (pmd_dirty(pmd))
+		pmd = pmd_clear_saveddirty(pmd);
+
+	return pmd;
 }
 
 static inline pud_t pud_set_flags(pud_t pud, pudval_t set)
@@ -577,17 +652,32 @@ static inline pud_t pud_mkold(pud_t pud)
 
 static inline pud_t pud_mkclean(pud_t pud)
 {
-	return pud_clear_flags(pud, _PAGE_DIRTY);
+	return pud_clear_flags(pud, _PAGE_DIRTY_BITS);
 }
 
 static inline pud_t pud_wrprotect(pud_t pud)
 {
-	return pud_clear_flags(pud, _PAGE_RW);
+	pud = pud_clear_flags(pud, _PAGE_RW);
+
+	/*
+	 * Blindly clearing _PAGE_RW might accidentally create
+	 * a shadow stack PUD (RW=0, Dirty=1). Move the hardware
+	 * dirty value to the software bit.
+	 */
+	if (pud_dirty(pud))
+		pud = pud_mksaveddirty(pud);
+	return pud;
 }
 
 static inline pud_t pud_mkdirty(pud_t pud)
 {
-	return pud_set_flags(pud, _PAGE_DIRTY | _PAGE_SOFT_DIRTY);
+	pudval_t dirty = _PAGE_DIRTY;
+
+	/* Avoid creating (HW)Dirty=1, Write=0 PUDs */
+	if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK) && !pud_write(pud))
+		dirty = _PAGE_SAVED_DIRTY;
+
+	return pud_set_flags(pud, dirty | _PAGE_SOFT_DIRTY);
 }
 
 static inline pud_t pud_mkdevmap(pud_t pud)
@@ -607,7 +697,11 @@ static inline pud_t pud_mkyoung(pud_t pud)
 
 static inline pud_t pud_mkwrite(pud_t pud)
 {
-	return pud_set_flags(pud, _PAGE_RW);
+	pud = pud_set_flags(pud, _PAGE_RW);
+
+	if (pud_dirty(pud))
+		pud = pud_clear_saveddirty(pud);
+	return pud;
 }
 
 #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
@@ -724,6 +818,8 @@ static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask);
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
 	pteval_t val = pte_val(pte), oldval = val;
+	bool wr_protected;
+	pte_t pte_result;
 
 	/*
 	 * Chop off the NX bit (if present), and add the NX portion of
@@ -732,17 +828,43 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 	val &= _PAGE_CHG_MASK;
 	val |= check_pgprot(newprot) & ~_PAGE_CHG_MASK;
 	val = flip_protnone_guard(oldval, val, PTE_PFN_MASK);
-	return __pte(val);
+
+	pte_result = __pte(val);
+
+	/*
+	 * Do the saveddirty fixup if the PTE was just write protected and
+	 * it's dirty.
+	 */
+	wr_protected = (oldval & _PAGE_RW) && !(val & _PAGE_RW);
+	if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK) && wr_protected &&
+	    (val & _PAGE_DIRTY))
+		pte_result = pte_mksaveddirty(pte_result);
+
+	return pte_result;
 }
 
 static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
 {
 	pmdval_t val = pmd_val(pmd), oldval = val;
+	bool wr_protected;
+	pmd_t pmd_result;
 
-	val &= _HPAGE_CHG_MASK;
+	val &= (_HPAGE_CHG_MASK & ~_PAGE_DIRTY);
 	val |= check_pgprot(newprot) & ~_HPAGE_CHG_MASK;
 	val = flip_protnone_guard(oldval, val, PHYSICAL_PMD_PAGE_MASK);
-	return __pmd(val);
+
+	pmd_result = __pmd(val);
+
+	/*
+	 * Do the saveddirty fixup if the PMD was just write protected and
+	 * it's dirty.
+	 */
+	wr_protected = (oldval & _PAGE_RW) && !(val & _PAGE_RW);
+	if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK) && wr_protected &&
+	    (val & _PAGE_DIRTY))
+		pmd_result = pmd_mksaveddirty(pmd_result);
+
+	return pmd_result;
 }
 
 /*
-- 
2.17.1
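
For reference, the saveddirty fixup in pte_modify()/pmd_modify() above
can be modeled the same way. This is a deliberately simplified sketch
reusing the placeholder masks from the earlier example (not the real
_PAGE_* layout); the real helpers preserve bits via _PAGE_CHG_MASK /
_HPAGE_CHG_MASK and gate the fixup on
cpu_feature_enabled(X86_FEATURE_USER_SHSTK):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define RW          (1ULL << 1)   /* illustrative, as before */
#define DIRTY       (1ULL << 6)
#define SAVED_DIRTY (1ULL << 57)

static uint64_t toy_pte_modify(uint64_t pte, uint64_t newprot)
{
	uint64_t oldval = pte;
	/* Case 1 cannot happen: Dirty is never taken from newprot */
	uint64_t val = (oldval & DIRTY) | (newprot & ~DIRTY);
	/* Case 2: did this call itself write protect the PTE? */
	bool wr_protected = (oldval & RW) && !(val & RW);

	if (wr_protected && (val & DIRTY))
		val = (val & ~DIRTY) | SAVED_DIRTY;

	return val;
}

int main(void)
{
	uint64_t pte = RW | DIRTY;		/* writable and dirty */
	uint64_t ro = toy_pte_modify(pte, 0);	/* newprot drops RW */

	printf("dirty=%d saved=%d\n", !!(ro & DIRTY), !!(ro & SAVED_DIRTY));
	return 0;
}

Here newprot dropping RW on a dirty PTE yields dirty=0 saved=1: the
Dirty value survives in the software bit instead of leaving a
Write=0,Dirty=1 PTE behind.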