From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-mm@kvack.org, linux-arch@vger.kernel.org,
	linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski,
	Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, "H . J . Lu", Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Weijiang Yang, "Kirill A . Shutemov", John Allen, kcc@google.com,
	eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com,
	dethoma@microsoft.com, akpm@linux-foundation.org,
	Andrew.Cooper3@citrix.com, christina.schimpe@intel.com
Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu
Subject: [PATCH v5 13/39] x86/mm: Start actually marking _PAGE_COW
Date: Thu, 19 Jan 2023 13:22:51 -0800
Message-Id: <20230119212317.8324-14-rick.p.edgecombe@intel.com>
In-Reply-To: <20230119212317.8324-1-rick.p.edgecombe@intel.com>
References: <20230119212317.8324-1-rick.p.edgecombe@intel.com>

The recently introduced _PAGE_COW should be used instead of the HW Dirty
bit whenever a PTE is Write=0, in order to not inadvertently create
shadow stack PTEs. Update pte_mk*() helpers to do this, and apply the same
changes to pmd and pud.

Reviewed-by: Kees Cook
Tested-by: Pengfei Xu
Tested-by: John Allen
Co-developed-by: Yu-cheng Yu
Signed-off-by: Yu-cheng Yu
Signed-off-by: Rick Edgecombe
---

v4:
 - Break part patch for better bisectability

 arch/x86/include/asm/pgtable.h | 125 ++++++++++++++++++++++++++++-----
 1 file changed, 107 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index c5047eb5f406..e96558abc8ec 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -124,9 +124,17 @@ extern pmdval_t early_pmd_flags;
  * The following only work if pte_present() is true.
  * Undefined behaviour if not..
  */
-static inline int pte_dirty(pte_t pte)
+static inline bool pte_dirty(pte_t pte)
 {
-	return pte_flags(pte) & _PAGE_DIRTY;
+	return pte_flags(pte) & _PAGE_DIRTY_BITS;
+}
+
+static inline bool pte_shstk(pte_t pte)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK))
+		return false;
+
+	return (pte_flags(pte) & (_PAGE_RW | _PAGE_DIRTY)) == _PAGE_DIRTY;
 }
 
 static inline int pte_young(pte_t pte)
@@ -134,9 +142,18 @@ static inline int pte_young(pte_t pte)
 	return pte_flags(pte) & _PAGE_ACCESSED;
 }
 
-static inline int pmd_dirty(pmd_t pmd)
+static inline bool pmd_dirty(pmd_t pmd)
 {
-	return pmd_flags(pmd) & _PAGE_DIRTY;
+	return pmd_flags(pmd) & _PAGE_DIRTY_BITS;
+}
+
+static inline bool pmd_shstk(pmd_t pmd)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK))
+		return false;
+
+	return (pmd_flags(pmd) & (_PAGE_RW | _PAGE_DIRTY | _PAGE_PSE)) ==
+	       (_PAGE_DIRTY | _PAGE_PSE);
 }
 
 #define pmd_young pmd_young
@@ -145,9 +162,9 @@ static inline int pmd_young(pmd_t pmd)
 	return pmd_flags(pmd) & _PAGE_ACCESSED;
 }
 
-static inline int pud_dirty(pud_t pud)
+static inline bool pud_dirty(pud_t pud)
 {
-	return pud_flags(pud) & _PAGE_DIRTY;
+	return pud_flags(pud) & _PAGE_DIRTY_BITS;
 }
 
 static inline int pud_young(pud_t pud)
@@ -157,13 +174,21 @@ static inline int pud_young(pud_t pud)
 
 static inline int pte_write(pte_t pte)
 {
-	return pte_flags(pte) & _PAGE_RW;
+	/*
+	 * Shadow stack pages are logically writable, but do not have
+	 * _PAGE_RW. Check for them separately from _PAGE_RW itself.
+	 */
+	return (pte_flags(pte) & _PAGE_RW) || pte_shstk(pte);
 }
 
 #define pmd_write pmd_write
 static inline int pmd_write(pmd_t pmd)
 {
-	return pmd_flags(pmd) & _PAGE_RW;
+	/*
+	 * Shadow stack pages are logically writable, but do not have
+	 * _PAGE_RW. Check for them separately from _PAGE_RW itself.
+	 */
+	return (pmd_flags(pmd) & _PAGE_RW) || pmd_shstk(pmd);
 }
 
 #define pud_write pud_write
@@ -374,7 +399,7 @@ static inline pte_t pte_clear_uffd_wp(pte_t pte)
 
 static inline pte_t pte_mkclean(pte_t pte)
 {
-	return pte_clear_flags(pte, _PAGE_DIRTY);
+	return pte_clear_flags(pte, _PAGE_DIRTY_BITS);
 }
 
 static inline pte_t pte_mkold(pte_t pte)
@@ -384,7 +409,16 @@ static inline pte_t pte_mkold(pte_t pte)
 
 static inline pte_t pte_wrprotect(pte_t pte)
 {
-	return pte_clear_flags(pte, _PAGE_RW);
+	pte = pte_clear_flags(pte, _PAGE_RW);
+
+	/*
+	 * Blindly clearing _PAGE_RW might accidentally create
+	 * a shadow stack PTE (Write=0,Dirty=1). Move the hardware
+	 * dirty value to the software bit.
+	 */
+	if (pte_dirty(pte))
+		pte = pte_mkcow(pte);
+	return pte;
 }
 
 static inline pte_t pte_mkexec(pte_t pte)
@@ -396,6 +430,10 @@ static inline pte_t __pte_mkdirty(pte_t pte, bool soft)
 {
 	pteval_t dirty = _PAGE_DIRTY;
 
+	/* Avoid creating Dirty=1,Write=0 PTEs */
+	if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK) && !pte_write(pte))
+		dirty = _PAGE_COW;
+
 	if (soft)
 		dirty |= _PAGE_SOFT_DIRTY;
 
@@ -407,6 +445,12 @@ static inline pte_t pte_mkdirty(pte_t pte)
 	return __pte_mkdirty(pte, true);
 }
 
+static inline pte_t pte_mkwrite_shstk(pte_t pte)
+{
+	/* pte_clear_cow() also sets Dirty=1 */
+	return pte_clear_cow(pte);
+}
+
 static inline pte_t pte_mkyoung(pte_t pte)
 {
 	return pte_set_flags(pte, _PAGE_ACCESSED);
@@ -414,7 +458,12 @@ static inline pte_t pte_mkyoung(pte_t pte)
 
 static inline pte_t pte_mkwrite(pte_t pte)
 {
-	return pte_set_flags(pte, _PAGE_RW);
+	pte = pte_set_flags(pte, _PAGE_RW);
+
+	if (pte_dirty(pte))
+		pte = pte_clear_cow(pte);
+
+	return pte;
 }
 
 static inline pte_t pte_mkhuge(pte_t pte)
@@ -505,18 +554,30 @@ static inline pmd_t pmd_mkold(pmd_t pmd)
 
 static inline pmd_t pmd_mkclean(pmd_t pmd)
 {
-	return pmd_clear_flags(pmd, _PAGE_DIRTY);
+	return pmd_clear_flags(pmd, _PAGE_DIRTY_BITS);
 }
 
 static inline pmd_t pmd_wrprotect(pmd_t pmd)
 {
-	return pmd_clear_flags(pmd, _PAGE_RW);
+	pmd = pmd_clear_flags(pmd, _PAGE_RW);
+	/*
+	 * Blindly clearing _PAGE_RW might accidentally create
+	 * a shadow stack PMD (RW=0, Dirty=1). Move the hardware
+	 * dirty value to the software bit.
+	 */
+	if (pmd_dirty(pmd))
+		pmd = pmd_mkcow(pmd);
+	return pmd;
 }
 
 static inline pmd_t __pmd_mkdirty(pmd_t pmd, bool soft)
 {
 	pmdval_t dirty = _PAGE_DIRTY;
 
+	/* Avoid creating (HW)Dirty=1, Write=0 PMDs */
+	if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK) && !pmd_write(pmd))
+		dirty = _PAGE_COW;
+
 	if (soft)
 		dirty |= _PAGE_SOFT_DIRTY;
 
@@ -528,6 +589,11 @@ static inline pmd_t pmd_mkdirty(pmd_t pmd)
 	return __pmd_mkdirty(pmd, true);
 }
 
+static inline pmd_t pmd_mkwrite_shstk(pmd_t pmd)
+{
+	return pmd_clear_cow(pmd);
+}
+
 static inline pmd_t pmd_mkdevmap(pmd_t pmd)
 {
 	return pmd_set_flags(pmd, _PAGE_DEVMAP);
@@ -545,7 +611,11 @@ static inline pmd_t pmd_mkyoung(pmd_t pmd)
 
 static inline pmd_t pmd_mkwrite(pmd_t pmd)
 {
-	return pmd_set_flags(pmd, _PAGE_RW);
+	pmd = pmd_set_flags(pmd, _PAGE_RW);
+
+	if (pmd_dirty(pmd))
+		pmd = pmd_clear_cow(pmd);
+	return pmd;
 }
 
 static inline pud_t pud_set_flags(pud_t pud, pudval_t set)
@@ -589,17 +659,32 @@ static inline pud_t pud_mkold(pud_t pud)
 
 static inline pud_t pud_mkclean(pud_t pud)
 {
-	return pud_clear_flags(pud, _PAGE_DIRTY);
+	return pud_clear_flags(pud, _PAGE_DIRTY_BITS);
 }
 
 static inline pud_t pud_wrprotect(pud_t pud)
 {
-	return pud_clear_flags(pud, _PAGE_RW);
+	pud = pud_clear_flags(pud, _PAGE_RW);
+
+	/*
+	 * Blindly clearing _PAGE_RW might accidentally create
+	 * a shadow stack PUD (RW=0, Dirty=1). Move the hardware
+	 * dirty value to the software bit.
+	 */
+	if (pud_dirty(pud))
+		pud = pud_mkcow(pud);
+	return pud;
 }
 
 static inline pud_t pud_mkdirty(pud_t pud)
 {
-	return pud_set_flags(pud, _PAGE_DIRTY | _PAGE_SOFT_DIRTY);
+	pudval_t dirty = _PAGE_DIRTY;
+
+	/* Avoid creating (HW)Dirty=1, Write=0 PUDs */
+	if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK) && !pud_write(pud))
+		dirty = _PAGE_COW;
+
+	return pud_set_flags(pud, dirty | _PAGE_SOFT_DIRTY);
 }
 
 static inline pud_t pud_mkdevmap(pud_t pud)
@@ -619,7 +704,11 @@ static inline pud_t pud_mkyoung(pud_t pud)
 
 static inline pud_t pud_mkwrite(pud_t pud)
 {
-	return pud_set_flags(pud, _PAGE_RW);
+	pud = pud_set_flags(pud, _PAGE_RW);
+
+	if (pud_dirty(pud))
+		pud = pud_clear_cow(pud);
+	return pud;
 }
 
 #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
-- 
2.17.1
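
For readers following the dirty-bit shuffling, here is a rough user-space
model of the PTE helpers touched by this patch. It is only a sketch and not
kernel code: the m_* names, the mpte_t type and the bit positions are
invented for illustration, and m_mkcow()/m_clear_cow() encode an assumption
about how pte_mkcow() and pte_clear_cow() (introduced elsewhere in the
series) behave, based on the comments in this patch. The feature flag is
modelled as always enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t mpte_t;                 /* toy stand-in for pte_t */

/* Illustrative bit positions only; the kernel headers define the real ones. */
#define M_RW         (UINT64_C(1) << 1)
#define M_DIRTY      (UINT64_C(1) << 6)
#define M_COW        (UINT64_C(1) << 58) /* hypothetical software bit */
#define M_DIRTY_BITS (M_DIRTY | M_COW)

/* Stand-in for cpu_feature_enabled(X86_FEATURE_USER_SHSTK). */
static bool m_shstk_enabled = true;

/* Dirty if either the hardware bit or the software bit is set. */
static bool m_dirty(mpte_t p) { return (p & M_DIRTY_BITS) != 0; }

/* Write=0, Dirty=1 is the shadow stack encoding. */
static bool m_shstk(mpte_t p)
{
	return m_shstk_enabled && (p & (M_RW | M_DIRTY)) == M_DIRTY;
}

static bool m_write(mpte_t p) { return (p & M_RW) || m_shstk(p); }

/* Assumed pte_mkcow() behaviour: move HW Dirty into the software bit. */
static mpte_t m_mkcow(mpte_t p)
{
	if (m_shstk_enabled && (p & M_DIRTY))
		p = (p & ~M_DIRTY) | M_COW;
	return p;
}

/* Assumed pte_clear_cow() behaviour: move the software bit back to HW Dirty. */
static mpte_t m_clear_cow(mpte_t p)
{
	if (p & M_COW)
		p = (p & ~M_COW) | M_DIRTY;
	return p;
}

static mpte_t m_wrprotect(mpte_t p)
{
	p &= ~M_RW;
	if (m_dirty(p))
		p = m_mkcow(p);   /* never leave Write=0, Dirty=1 behind */
	return p;
}

static mpte_t m_mkwrite(mpte_t p)
{
	p |= M_RW;
	if (m_dirty(p))
		p = m_clear_cow(p);
	return p;
}

static mpte_t m_mkdirty(mpte_t p)
{
	mpte_t dirty = M_DIRTY;

	if (m_shstk_enabled && !m_write(p))
		dirty = M_COW;    /* read-only entry: use the software bit */
	return p | dirty;
}

/* Walk one entry through write-protect and back. */
void m_selfcheck(void)
{
	mpte_t p = m_wrprotect(M_RW | M_DIRTY);

	assert(m_dirty(p) && !m_shstk(p)); /* still dirty, not shadow stack */
	assert(m_mkwrite(p) == (M_RW | M_DIRTY));
	assert(m_mkdirty(0) == M_COW);
}
```

Calling m_selfcheck() from a small test program exercises the round trip:
write-protecting a dirty, writable entry parks the dirty state in the
software bit, and making it writable again restores the hardware Dirty bit.
The pmd and pud variants in the patch follow the same pattern.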