From mboxrd@z Thu Jan 1 00:00:00 1970
From: Rick
 Edgecombe
To: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-mm@kvack.org, linux-arch@vger.kernel.org,
	linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski,
	Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, "H . J . Lu", Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov,
	Pavel Machek, Peter Zijlstra, Randy Dunlap, Weijiang Yang,
	"Kirill A . Shutemov", John Allen, kcc@google.com, eranian@google.com,
	rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com,
	akpm@linux-foundation.org, Andrew.Cooper3@citrix.com,
	christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com
Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu
Subject: [PATCH v7 19/41] x86/mm: Check shadow stack page fault errors
Date: Mon, 27 Feb 2023 14:29:35 -0800
Message-Id: <20230227222957.24501-20-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20230227222957.24501-1-rick.p.edgecombe@intel.com>
References: <20230227222957.24501-1-rick.p.edgecombe@intel.com>

From: Yu-cheng Yu

The CPU performs "shadow stack accesses" when it expects to encounter
shadow stack mappings. These accesses can be implicit (via CALL/RET
instructions) or explicit (instructions like WRSS).

Shadow stack accesses to shadow-stack mappings can result in faults in
normal, valid operation just like regular accesses to regular mappings.
Shadow stacks need some of the same features like delayed allocation, swap
and copy-on-write.
The kernel needs to use faults to implement those features.

The architecture has concepts of both shadow stack reads and shadow stack
writes. Any shadow stack access to non-shadow stack memory will generate a
fault with the shadow stack error code bit set.

This means that, unlike normal write protection, the fault handler needs to
create a type of memory that can be written to (with instructions that
generate shadow stack writes), even to fulfill a read access. So in the
case of COW memory, the COW needs to take place even with a shadow stack
read. Otherwise the page will be left (shadow stack) writable in userspace.
So to trigger the appropriate behavior, set FAULT_FLAG_WRITE for shadow
stack accesses, even if the access was a shadow stack read.

To make this clearer, consider the following example. If a process has a
shadow stack and forks, the shadow stack PTEs will become read-only due to
COW. If the CPU in one process performs a shadow stack read access to the
shadow stack, for example executing a RET and causing the CPU to read the
shadow stack copy of the return address, then in order for the fault to be
resolved the PTE will need to be set with shadow stack permissions. But
then the memory would be changeable from userspace (from CALL, RET, WRSS,
etc.). So this scenario needs to trigger COW, otherwise the shared page
would be changeable from both processes.

Shadow stack accesses can also result in errors, such as when a shadow
stack overflows, or if a shadow stack access occurs to a non-shadow-stack
mapping. Also generate errors for these invalid shadow stack accesses.
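
The flag selection described above can be sketched as a small userspace
model. This is only an illustration, not the kernel code: the X86_PF_*
values follow the x86 page fault error code layout (bit 1 write, bit 4
instruction fetch, bit 6 shadow stack), while the MODEL_FLAG_* names, their
values and model_fault_flags() are hypothetical stand-ins for the real
FAULT_FLAG_* machinery.

```c
#include <assert.h>

/* Page fault error code bits (bit 6 is the one this patch adds). */
enum x86_pf_error_code {
	X86_PF_WRITE = 1 << 1,
	X86_PF_INSTR = 1 << 4,
	X86_PF_SHSTK = 1 << 6,
};

/* Hypothetical stand-ins for FAULT_FLAG_*; the values are illustrative. */
enum model_fault_flag {
	MODEL_FLAG_WRITE       = 1 << 0,
	MODEL_FLAG_INSTRUCTION = 1 << 1,
};

static_assert(X86_PF_SHSTK == 0x40, "shadow stack bit is bit 6");

/*
 * Derive the fault flags from the error code. A shadow stack access is
 * treated as a write even when the CPU was only reading the shadow
 * stack, so that COW is broken before a shadow stack PTE is installed.
 */
static unsigned int model_fault_flags(unsigned long error_code)
{
	unsigned int flags = 0;

	if (error_code & X86_PF_SHSTK)
		flags |= MODEL_FLAG_WRITE;
	if (error_code & X86_PF_WRITE)
		flags |= MODEL_FLAG_WRITE;
	if (error_code & X86_PF_INSTR)
		flags |= MODEL_FLAG_INSTRUCTION;

	return flags;
}
```

In this model a shadow stack read (X86_PF_SHSTK set, X86_PF_WRITE clear)
yields the same flags as an ordinary write fault, which is the behavior the
commit message asks for.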
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
Acked-by: Mike Rapoport (IBM)
Reviewed-by: Kees Cook
Signed-off-by: Yu-cheng Yu
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
---
v7:
 - Update comment in fault handler (David Hildenbrand)

v6:
 - Update comment due to rename of Cow bit to SavedDirty

v5:
 - Add description of COW example (Boris)
 - Replace "permissioned" (Boris)
 - Remove capitalization of shadow stack (Boris)

v4:
 - Further improve comment talking about FAULT_FLAG_WRITE (Peterz)

v3:
 - Improve comment talking about using FAULT_FLAG_WRITE (Peterz)
---
 arch/x86/include/asm/trap_pf.h |  2 ++
 arch/x86/mm/fault.c            | 31 +++++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/arch/x86/include/asm/trap_pf.h b/arch/x86/include/asm/trap_pf.h
index 10b1de500ab1..afa524325e55 100644
--- a/arch/x86/include/asm/trap_pf.h
+++ b/arch/x86/include/asm/trap_pf.h
@@ -11,6 +11,7 @@
  *   bit 3 ==				1: use of reserved bit detected
  *   bit 4 ==				1: fault was an instruction fetch
  *   bit 5 ==				1: protection keys block access
+ *   bit 6 ==				1: shadow stack access fault
  *   bit 15 ==				1: SGX MMU page-fault
  */
 enum x86_pf_error_code {
@@ -20,6 +21,7 @@ enum x86_pf_error_code {
 	X86_PF_RSVD	=		1 << 3,
 	X86_PF_INSTR	=		1 << 4,
 	X86_PF_PK	=		1 << 5,
+	X86_PF_SHSTK	=		1 << 6,
 	X86_PF_SGX	=		1 << 15,
 };
 
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index a498ae1fbe66..776b92339cfe 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1117,8 +1117,22 @@ access_error(unsigned long error_code, struct vm_area_struct *vma)
 				       (error_code & X86_PF_INSTR), foreign))
 		return 1;
 
+	/*
+	 * Shadow stack accesses (PF_SHSTK=1) are only permitted to
+	 * shadow stack VMAs. All other accesses result in an error.
+	 */
+	if (error_code & X86_PF_SHSTK) {
+		if (unlikely(!(vma->vm_flags & VM_SHADOW_STACK)))
+			return 1;
+		if (unlikely(!(vma->vm_flags & VM_WRITE)))
+			return 1;
+		return 0;
+	}
+
 	if (error_code & X86_PF_WRITE) {
 		/* write, present and write, not present: */
+		if (unlikely(vma->vm_flags & VM_SHADOW_STACK))
+			return 1;
 		if (unlikely(!(vma->vm_flags & VM_WRITE)))
 			return 1;
 		return 0;
@@ -1310,6 +1324,23 @@ void do_user_addr_fault(struct pt_regs *regs,
 
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
 
+	/*
+	 * For conventionally writable pages, a read can be serviced with a
+	 * read-only PTE. But for shadow stack, there isn't a concept of
+	 * read-only shadow stack memory. If a PTE has the shadow stack
+	 * permission, it can be modified via CALL and RET instructions. So
+	 * core MM needs to fault in a writable PTE and do things it already
+	 * does for write faults.
+	 *
+	 * Shadow stack accesses (read or write) need to be serviced with
+	 * shadow stack permission memory, which always includes write
+	 * permissions. So in the case of a shadow stack read access, treat it
+	 * as a WRITE fault. This will make sure that MM will prepare
+	 * everything (e.g., break COW) such that maybe_mkwrite() can create a
+	 * proper shadow stack PTE.
+	 */
+	if (error_code & X86_PF_SHSTK)
+		flags |= FAULT_FLAG_WRITE;
 	if (error_code & X86_PF_WRITE)
 		flags |= FAULT_FLAG_WRITE;
 	if (error_code & X86_PF_INSTR)
-- 
2.17.1
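
For readers who want to experiment with the permission checks added to
access_error(), the following is a minimal userspace sketch of just those
two branches. It is a model under stated assumptions, not the kernel
function: the MODEL_VM_* bits and model_access_error() are hypothetical
names with illustrative values, and every other check the real function
performs (protection keys, read permissions and so on) is left out, so any
access that reaches the end is simply reported as allowed.

```c
#include <assert.h>
#include <stdbool.h>

/* Page fault error code bits used by the two branches modeled here. */
enum x86_pf_error_code {
	X86_PF_WRITE = 1 << 1,
	X86_PF_SHSTK = 1 << 6,
};

/* Hypothetical stand-ins for VM_WRITE and VM_SHADOW_STACK. */
enum model_vm_flag {
	MODEL_VM_WRITE        = 1 << 0,
	MODEL_VM_SHADOW_STACK = 1 << 1,
};

static_assert(X86_PF_SHSTK == 0x40, "shadow stack bit is bit 6");

/* Returns true when the access must be rejected. */
static bool model_access_error(unsigned long error_code,
			       unsigned long vm_flags)
{
	/* Shadow stack accesses are only valid on writable shadow stack VMAs. */
	if (error_code & X86_PF_SHSTK) {
		if (!(vm_flags & MODEL_VM_SHADOW_STACK))
			return true;
		if (!(vm_flags & MODEL_VM_WRITE))
			return true;
		return false;
	}

	/* Ordinary writes must never land in a shadow stack VMA. */
	if (error_code & X86_PF_WRITE) {
		if (vm_flags & MODEL_VM_SHADOW_STACK)
			return true;
		if (!(vm_flags & MODEL_VM_WRITE))
			return true;
		return false;
	}

	/* Remaining checks of the real function are outside this model. */
	return false;
}
```

The ordering mirrors the hunk above: the shadow stack branch returns before
the ordinary write branch is reached, so a shadow stack access is never
judged by the rules for normal writes.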