Subject: Re: [PATCH v20 08/25] x86/mm: Introduce _PAGE_COW
To: Kees Cook
Cc: x86@kernel.org, "H. Peter Anvin", Thomas Gleixner, Ingo Molnar, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov, Florian Weimer, "H.J. Lu", Jann Horn, Jonathan Corbet, Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap, "Ravi V.
Shankar", Vedvyas Shanbhogue, Dave Martin, Weijiang Yang, Pengfei Xu, haitao.huang@intel.com
References: <20210210175703.12492-1-yu-cheng.yu@intel.com> <20210210175703.12492-9-yu-cheng.yu@intel.com> <202102101137.E109C9FE6@keescook>
From: "Yu, Yu-cheng"
Message-ID: <819b6d6a-64ea-d908-76ad-0a6366ed0d53@intel.com>
Date: Wed, 10 Feb 2021 12:28:39 -0800
In-Reply-To: <202102101137.E109C9FE6@keescook>

On 2/10/2021 11:42 AM, Kees Cook wrote:
> On Wed, Feb 10, 2021 at 09:56:46AM -0800, Yu-cheng Yu wrote:
>> There is essentially no room left in the x86 hardware PTEs on some OSes
>> (not Linux). That left the hardware architects looking for a way to
>> represent a new memory type (shadow stack) within the existing bits.
>> They chose to repurpose a lightly-used state: Write=0, Dirty=1.
>>
>> The reason it's lightly used is that Dirty=1 is normally set by hardware
>> and cannot normally be set by hardware on a Write=0 PTE. Software must
>> normally be involved to create one of these PTEs, so software can simply
>> opt to not create them.
>>
>> In places where Linux normally creates Write=0, Dirty=1, it can use the
>> software-defined _PAGE_COW in place of the hardware _PAGE_DIRTY. In other
>> words, whenever Linux needs to create Write=0, Dirty=1, it instead creates
>> Write=0, Cow=1, except for shadow stack, which is Write=0, Dirty=1. This
>> clearly separates shadow stack from other data, and results in the
>> following:
>>
>> (a) A modified, copy-on-write (COW) page: (Write=0, Cow=1)
>> (b) A R/O page that has been COW'ed: (Write=0, Cow=1)
>>     The user page is in a R/O VMA, and get_user_pages() needs a writable
>>     copy. The page fault handler creates a copy of the page and sets
>>     the new copy's PTE as Write=0 and Cow=1.
>> (c) A shadow stack PTE: (Write=0, Dirty=1)
>> (d) A shared shadow stack PTE: (Write=0, Cow=1)
>>     When a shadow stack page is being shared among processes (this happens
>>     at fork()), its PTE is made Dirty=0, so the next shadow stack access
>>     causes a fault, and the page is duplicated and Dirty=1 is set again.
>>     This is the COW equivalent for shadow stack pages, even though it's
>>     copy-on-access rather than copy-on-write.
>> (e) A page where the processor observed a Write=1 PTE, started a write, set
>>     Dirty=1, but then observed a Write=0 PTE. That's possible today, but
>>     will not happen on processors that support shadow stack.
>>
>> Define _PAGE_COW and update pte_*() helpers and apply the same changes to
>> pmd and pud.
>
> I still find this commit confusing mostly due to _PAGE_COW being 0
> without CET enabled. Shouldn't this just get changed universally? Why
> should this change depend on CET?
>

For example, in...

static inline int pte_write(pte_t pte)
{
	if (cpu_feature_enabled(X86_FEATURE_SHSTK))
		return pte_flags(pte) & (_PAGE_RW | _PAGE_DIRTY);
	else
		return pte_flags(pte) & _PAGE_RW;
}

There are four cases:

(a) RW=1, Dirty=1 -> writable
(b) RW=1, Dirty=0 -> writable
(c) RW=0, Dirty=0 -> not writable
(d) RW=0, Dirty=1 -> shadow stack, or not writable if !X86_FEATURE_SHSTK

In case (d), pte_write() is true only when shadow stack is enabled;
otherwise the PTE is not writable. With the shadow stack feature enabled,
the usual dirty, copy-on-write PTE becomes RW=0, Cow=1.

We could make this change universal, but then every usual dirty,
copy-on-write PTE would need the Dirty/Cow swap, always. Is that
desirable?
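
To make the four cases concrete, here is a minimal user-space sketch. It is not kernel code and not the patch itself. The model_* names, the shstk flag (standing in for cpu_feature_enabled(X86_FEATURE_SHSTK)), and the bit picked for MODEL_PAGE_COW are assumptions for illustration only. model_pte_wrprotect() is a hypothetical helper that shows the Dirty/Cow swap described in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* User-space model only; none of this is the kernel's real API. */
typedef uint64_t model_pte_t;

#define MODEL_PAGE_RW    (1ULL << 1)   /* x86 R/W bit position */
#define MODEL_PAGE_DIRTY (1ULL << 6)   /* x86 Dirty bit position */
#define MODEL_PAGE_COW   (1ULL << 58)  /* ASSUMPTION: placeholder software bit */

/*
 * Mirrors the pte_write() quoted above. The shstk argument stands in
 * for cpu_feature_enabled(X86_FEATURE_SHSTK).
 */
static bool model_pte_write(model_pte_t pte, bool shstk)
{
	if (shstk)
		return (pte & (MODEL_PAGE_RW | MODEL_PAGE_DIRTY)) != 0;
	return (pte & MODEL_PAGE_RW) != 0;
}

/*
 * Hypothetical write-protect helper: clear RW and, with shadow stack
 * enabled, move Dirty to Cow so that ordinary data never ends up as
 * RW=0, Dirty=1 (the shadow stack encoding).
 */
static model_pte_t model_pte_wrprotect(model_pte_t pte, bool shstk)
{
	pte &= ~MODEL_PAGE_RW;
	if (shstk && (pte & MODEL_PAGE_DIRTY)) {
		pte &= ~MODEL_PAGE_DIRTY;
		pte |= MODEL_PAGE_COW;
	}
	return pte;
}

/* Walks cases (a)-(d) from the list above, with and without shstk. */
void model_check_cases(void)
{
	model_pte_t cow;

	/* (a) RW=1, Dirty=1 and (b) RW=1, Dirty=0: writable either way */
	assert(model_pte_write(MODEL_PAGE_RW | MODEL_PAGE_DIRTY, true));
	assert(model_pte_write(MODEL_PAGE_RW | MODEL_PAGE_DIRTY, false));
	assert(model_pte_write(MODEL_PAGE_RW, true));
	assert(model_pte_write(MODEL_PAGE_RW, false));

	/* (c) RW=0, Dirty=0: never writable */
	assert(!model_pte_write(0, true));
	assert(!model_pte_write(0, false));

	/* (d) RW=0, Dirty=1: writable (shadow stack) only with shstk */
	assert(model_pte_write(MODEL_PAGE_DIRTY, true));
	assert(!model_pte_write(MODEL_PAGE_DIRTY, false));

	/* With shstk, a dirty COW page becomes RW=0, Cow=1: not writable */
	cow = model_pte_wrprotect(MODEL_PAGE_RW | MODEL_PAGE_DIRTY, true);
	assert(cow == MODEL_PAGE_COW);
	assert(!model_pte_write(cow, true));

	/* Without shstk, Dirty is left in place and no swap happens */
	assert(model_pte_wrprotect(MODEL_PAGE_RW | MODEL_PAGE_DIRTY, false)
	       == MODEL_PAGE_DIRTY);
}
```

Calling model_check_cases() from a small test program exercises all four cases. The last two checks show the trade-off in question: making the change universal would mean taking the swap branch in the write-protect path unconditionally.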
-- Yu-cheng [...]