From: David Hildenbrand
Organization: Red Hat
Date: Tue, 24 Jan 2023 17:24:02 +0100
Subject: Re: [PATCH v5 18/39] mm: Handle faultless write upgrades for shstk
Message-ID: <1327c608-1473-af4f-d962-c24f04f3952c@redhat.com>
In-Reply-To: <50cf64932507ba60639eca28692e7df285bcc0a7.camel@intel.com>
References: <20230119212317.8324-1-rick.p.edgecombe@intel.com>
 <20230119212317.8324-19-rick.p.edgecombe@intel.com>
 <7f63d13d-7940-afb6-8b25-26fdf3804e00@redhat.com>
 <50cf64932507ba60639eca28692e7df285bcc0a7.camel@intel.com>
To: "Edgecombe, Rick P", "bsingharora@gmail.com", "hpa@zytor.com",
 "Syromiatnikov, Eugene", "peterz@infradead.org", "rdunlap@infradead.org",
 "keescook@chromium.org", "dave.hansen@linux.intel.com",
 "kirill.shutemov@linux.intel.com", "Eranian, Stephane",
 "linux-mm@kvack.org", "fweimer@redhat.com", "nadav.amit@gmail.com",
 "jannh@google.com", "dethoma@microsoft.com", "linux-arch@vger.kernel.org",
 "kcc@google.com", "pavel@ucw.cz", "oleg@redhat.com", "hjl.tools@gmail.com",
 "bp@alien8.de", "Lutomirski, Andy", "linux-doc@vger.kernel.org",
 "arnd@arndb.de", "tglx@linutronix.de", "Schimpe, Christina",
 "x86@kernel.org", "mike.kravetz@oracle.com", "Yang, Weijiang",
 "jamorris@linux.microsoft.com", "john.allen@amd.com", "rppt@kernel.org",
 "andrew.cooper3@citrix.com", "mingo@redhat.com", "corbet@lwn.net",
 "linux-kernel@vger.kernel.org", "linux-api@vger.kernel.org",
 "gorcunov@gmail.com", "akpm@linux-foundation.org"
Cc: "Yu, Yu-cheng"

On 23.01.23 21:47, Edgecombe, Rick P wrote:
> On Mon,
> 2023-01-23 at 10:50 +0100, David Hildenbrand wrote:
>> On 19.01.23 22:22, Rick Edgecombe wrote:
>>> The x86 Control-flow Enforcement Technology (CET) feature includes a new
>>> type of memory called shadow stack. This shadow stack memory has some
>>> unusual properties, which requires some core mm changes to function
>>> properly.
>>>
>>> Since shadow stack memory can be changed from userspace, is both
>>> VM_SHADOW_STACK and VM_WRITE. But it should not be made conventionally
>>> writable (i.e. pte_mkwrite()). So some code that calls pte_mkwrite() needs
>>> to be adjusted.
>>>
>>> One such case is when memory is made writable without an actual write
>>> fault. This happens in some mprotect operations, and also prot_numa
>>> faults. In both cases code checks whether it should be made
>>> (conventionally) writable by calling
>>> vma_wants_manual_pte_write_upgrade().
>>>
>>> One way to fix this would be have code actually check if memory is also
>>> VM_SHADOW_STACK and in that case call pte_mkwrite_shstk(). But since
>>> most memory won't be shadow stack, just have simpler logic and skip this
>>> optimization by changing vma_wants_manual_pte_write_upgrade() to not
>>> return true for VM_SHADOW_STACK_MEMORY. This will simply handle all
>>> cases of this type.
>>>
>>> Cc: David Hildenbrand
>>> Tested-by: Pengfei Xu
>>> Tested-by: John Allen
>>> Signed-off-by: Yu-cheng Yu
>>> Reviewed-by: Kirill A. Shutemov
>>> Signed-off-by: Rick Edgecombe
>>> ---
>>
>> Instead of having these x86-shadow stack details all over the MM space,
>> was the option explored to handle this more in arch specific code?
>>
>> IIUC, one way to get it working would be
>>
>> 1) Have a SW "shadowstack" PTE flag.
>> 2) Have an "SW-dirty" PTE flag, to store "dirty=1" when "write=0".
>
> I don't think that idea came up. So vma->vm_page_prot would have the SW
> shadow stack flag for VM_SHADOW_STACK, and pte_mkwrite() could do
> Write=0,Dirty=1 part. It seems like it should work.
>

Right, if we include it in vma->vm_page_prot, mk_pte() would handle
that for us right away.

Otherwise, we'd have to refactor e.g., mk_pte() to consume a vma instead
of the vma->vm_page_prot. Let's see if we can avoid that for now.

>>
>> pte_mkwrite(), pte_write(), pte_dirty ... can then make decisions based
>> on the "shadowstack" PTE flag and hide all these details from core-mm.
>>
>> When mapping a shadowstack page (new page, migration, swapin, ...),
>> which can be obtained by looking at the VMA flags, the first thing you'd
>> do is set the "shadowstack" PTE flag.
>
> I guess the downside is that it uses an extra software bit. But the
> other positive is that it's less error prone, so that someone writing
> core-mm code won't introduce a change that makes shadow stack VMAs
> Write=1 if they don't know to also check for VM_SHADOW_STACK.

Right. And I think this mimics what I would have expected HW to
provide: a dedicated HW bit, not somehow mangling this into the
semantics of existing bits.

Roughly speaking: if we abstract it that way and get all of the "how to
set it writable now?" out of core-mm, it is not only cleaner and less
error prone, it might even allow other architectures that implement
something comparable (e.g., using a dedicated HW bit) to actually reuse
some of that work. Otherwise most of that "shstk" is really just x86
specific ...

I guess the only cases we'd have to special-case would be page pinning
code, where pte_write() would indicate that the PTE is writable (well,
it is, just not directly from "ordinary CPU instruction" context): but
you do that already, so ... :)

Sorry for stumbling over this so late, I only started looking into this
when you CCed me on that one patch.

-- 
Thanks,

David / dhildenb
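
For illustration only, the two-flag idea discussed above (a SW "shadowstack" PTE flag plus a "SW-dirty" flag) could be modelled in plain user-space C roughly as below. This is a toy sketch, not kernel code: the bit positions, the pte_t typedef and the helpers vm_page_prot_for() and shstk_model_selfcheck() are assumptions made up for the example, and the real x86 helpers have to deal with far more state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy PTE model. Bit positions are hypothetical, chosen for this sketch only. */
typedef uint64_t pte_t;

#define PTE_HW_WRITE (1ULL << 1)  /* hardware Write bit */
#define PTE_HW_DIRTY (1ULL << 6)  /* hardware Dirty bit */
#define PTE_SW_DIRTY (1ULL << 57) /* software: stores "dirty=1" while "write=0" */
#define PTE_SW_SHSTK (1ULL << 58) /* software: PTE maps shadow stack memory */

/* Hypothetical stand-in for vma->vm_page_prot: carries the SW shadowstack flag. */
static pte_t vm_page_prot_for(bool shadow_stack)
{
    return shadow_stack ? PTE_SW_SHSTK : 0;
}

static bool pte_shstk(pte_t pte)
{
    return (pte & PTE_SW_SHSTK) != 0;
}

/* Writable: Write=1 for normal memory, Write=0,Dirty=1 for shadow stack. */
static bool pte_write(pte_t pte)
{
    if (pte_shstk(pte))
        return !(pte & PTE_HW_WRITE) && (pte & PTE_HW_DIRTY);
    return (pte & PTE_HW_WRITE) != 0;
}

static bool pte_dirty(pte_t pte)
{
    return (pte & (PTE_HW_DIRTY | PTE_SW_DIRTY)) != 0;
}

/* Park a hardware dirty bit in SW-dirty so Write=0,Dirty=1 never appears by accident. */
static pte_t pte_wrprotect(pte_t pte)
{
    if (pte & PTE_HW_DIRTY)
        pte = (pte & ~PTE_HW_DIRTY) | PTE_SW_DIRTY;
    return pte & ~PTE_HW_WRITE;
}

/* The caller does not need to know whether the mapping is shadow stack. */
static pte_t pte_mkwrite(pte_t pte)
{
    if (pte_shstk(pte)) {
        /* Shadow stack: writability is encoded as Write=0,Dirty=1. */
        pte &= ~(PTE_HW_WRITE | PTE_SW_DIRTY);
        return pte | PTE_HW_DIRTY;
    }
    /* Normal memory: set Write=1 and move a parked dirty bit back. */
    if (pte & PTE_SW_DIRTY)
        pte = (pte & ~PTE_SW_DIRTY) | PTE_HW_DIRTY;
    return pte | PTE_HW_WRITE;
}

/* Small self-check exercising the helpers; returns 0 when all checks pass. */
int shstk_model_selfcheck(void)
{
    pte_t normal = pte_mkwrite(vm_page_prot_for(false));
    pte_t shstk = pte_mkwrite(vm_page_prot_for(true));

    assert(pte_write(normal) && (normal & PTE_HW_WRITE));
    assert(pte_write(shstk) && !(shstk & PTE_HW_WRITE));

    /* A dirty, write-protected normal PTE must not look like shadow stack. */
    pte_t ro = pte_wrprotect(normal | PTE_HW_DIRTY);
    assert(!pte_write(ro) && pte_dirty(ro) && !(ro & PTE_HW_DIRTY));
    return 0;
}
```

In this model, generic code only ever calls pte_mkwrite(), pte_write() and pte_dirty(); the shadow stack encoding stays behind those helpers, which is the property the proposal is after.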