From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 054C7C433EF for ; Thu, 31 Mar 2022 17:13:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240532AbiCaRO5 (ORCPT ); Thu, 31 Mar 2022 13:14:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43000 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232034AbiCaRO4 (ORCPT ); Thu, 31 Mar 2022 13:14:56 -0400 Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 768F71BE128 for ; Thu, 31 Mar 2022 10:13:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1648746788; x=1680282788; h=date:from:to:cc:subject:message-id:references: mime-version:in-reply-to; bh=vKD+FSo7ycenmTLPLG5I9AmNhjGi8PvaANfGgxaPLrI=; b=C7eWHZ/kZcq3JTbOV0ooAaFLXBdL04b0oQ0HTi3xm2mbDRN6J9r4qJbW W2inCy2sqxAL4hMOXORi5Np9NSiOV6T/jiDJapfIOeIP6KkABFXBUPpQV 2Y0dX23Aj8QLVimF9GlTWWklCYHTUPW2LSmbTRrKHsPLt3sfpwfqwYOZs OwKt7rHgRzwXMa1vG68CxsiQhNbt7/GJoNuQDR0uxMy9uSy6uhQePZwym vSf7j6G6lFp5l5HMO3PQljYc4DYg+oC0sjhl8JajW3Q61ZhDX/WGectck ybooFncO/icWuxnBN6k4dB76mOKFbuNDPM2FfOZw8j8tGZGyvi0cLcGrG Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10302"; a="258730997" X-IronPort-AV: E=Sophos;i="5.90,225,1643702400"; d="scan'208";a="258730997" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 31 Mar 2022 10:13:08 -0700 X-IronPort-AV: E=Sophos;i="5.90,225,1643702400"; d="scan'208";a="566658737" Received: from jccornej-mobl.amr.corp.intel.com (HELO localhost) ([10.213.179.11]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 31 Mar 2022 10:13:07 -0700 Date: Thu, 31 Mar 2022 10:13:07 -0700 From: Ira Weiny To: Dave Hansen , "H. Peter Anvin" , Dan Williams Cc: Fenghua Yu , Rick Edgecombe , "Shankar, Ravi V" , linux-kernel@vger.kernel.org Subject: Re: [PATCH V9 00/45] PKS/PMEM: Add Stray Write Protection Message-ID: References: <20220310172019.850939-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20220310172019.850939-1-ira.weiny@intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Mar 10, 2022 at 09:19:34AM -0800, Ira wrote: > From: Ira Weiny > > > I'm looking for Intel acks on the series prior to submitting to maintainers. > Most of the changes from V8 to V9 was in getting the tests straightened out. > But there are some improvements in the actual code. Is there any feedback on this? Ira > > > Changes for V9 > > Review and update all commit messages. > Update cover letter below > > PKS Core > Separate user and supervisor pkey code in the headers > create linux/pks.h for supervisor calls > This facilitated making the pmem code more efficient > Completely rearchitect the test code > [After Dave Hansen and Rick Edgecombe found issues in the test > code it was easier to rearchitect the code completely > rather than attempt to fix it.] > Remove pks_test_callback in favor of using fault hooks > Fault hooks also isolate the fault callbacks from being > false positives if non-test consumers are running > Make additional PKS_TEST_RUN_ALL Kconfig option which is > mutually exclusive to any non-test PKS consumer > PKS_TEST_RUN_ALL takes over all pkey callbacks > Ensure that each test runs within it's own context and is > mutually exclusive from running while any other test is > running. > Ensure test session and context memory is cleaned up on file > close > Use pr_debug() and dynamic debug for in kernel debug messages > Enhance test_pks selftest > Add the ability to run all tests not just the context > switch test > Standardize output [PASS][FAIL][SKIP] > Add '-d' option enables dynamic debug to see the kernel > debug messages > > Incorporate feedback from Rick Edgecombe > Update all pkey types to u8 > Fix up test code barriers > Move patch declaring PKS_INIT_VALUE ahead of the patch which enables > PKS so that PKS_INIT_VALUE can be used when pks_setup() is > first created > From Dan Williams > Use macros instead of an enum for a pkey allocation scheme > which is predicated on the config options of consumers > This almost worked perfectly. It required a bit of > tweeking to be able to allocate all of the keys. > > From Dave Hansen > Reposition some code to be near/similar to user pkeys > s/pks_write_current/x86_pkrs_load > s/pks_saved_pkrs/pkrs > Update Documentation > s/PKR_{RW,AD,WD}_KEY/PKR_{RW,AD,WD}_MASK > Consistently use lower case for pkey > Update commit messages > Add Acks > > PMEM Stray Write > Building on the change to the pks_mk_*() function rename > s/pgmap_mk_*/pgmap_set_*/ > s/dax_mk_*/dax_set_*/ > From Dan Williams > Avoid adding new dax operations by teaching dax_device about pgmap > Remove pgmap_protection_flag_invalid() patch (Just let > kmap'ings fail) > > > PKS/PMEM Stray write protection > =============================== > > This series is broken into 2 parts. > > 1) Introduce Protection Key Supervisor (PKS), testing, and > documentation > 2) Use PKS to protect PMEM from stray writes > > Introduce Protection Key Supervisor (PKS) > ----------------------------------------- > > PKS enables protections on 'domains' of supervisor pages to limit supervisor > mode access to pages beyond the normal paging protections. PKS works in a > similar fashion to user space pkeys, PKU. As with PKU, supervisor pkeys are > checked in addition to normal paging protections. And page mappings are > assigned to a domain by setting a 4 bit pkey in the PTE of that mapping. > > Unlike PKU, permissions are changed via a MSR update. This update avoids TLB > flushes making this an efficient way to alter protections vs PTE updates. > > Also, unlike PTE updates PKS permission changes apply only to the current > processor. Therefore changing permissions apply only to that thread and not > any other cpu/process. This allows protections to remain in place on other > cpus for additional protection and isolation. > > Even though PKS updates are thread local, XSAVE is not supported for the PKRS > MSR. Therefore this implementation saves and restores the MSR across context > switches and during exceptions within software. Nested exceptions are > supported by each exception getting a new PKS state. > > For consistent behavior with current paging protections, pkey 0 is reserved and > configured to allow full access via the pkey mechanism, thus preserving the > default paging protections because PTEs naturally have a pkey value of 0. > > Other keys, (1-15) are statically allocated by kernel consumers when > configured. This is done by adding the appropriate PKS_NEW_KEY and > PKS_DECLARE_INIT_VALUE macros to pks-keys.h. > > Two PKS consumers, PKS_TEST and PMEM stray write protection, are included in > this series. When the number of users grows larger the sharing of keys will > need to be resolved depending on the needs of the users at that time. Many > methods have been contemplated but the number of kernel users and use cases > envisioned is still quite small, much less than the 15 available keys. > > To summarize, the following are key attributes of PKS. > > 1) Fast switching of permissions > 1a) Prevents access without page table manipulations > 1b) No TLB flushes required > 2) Works on a per thread basis, thus allowing protections to be > preserved on threads which are not actively accessing data through > the mapping. > > PKS is available with 4 and 5 level paging. For this and simplicity of > implementation, the feature is restricted to x86_64. > > > Use PKS to protect PMEM from stray writes > ----------------------------------------- > > DAX leverages the direct-map to enable 'struct page' services for PMEM. Given > that PMEM capacity may be an order of magnitude higher capacity than System RAM > it presents a large vulnerability surface to stray writes. Such a stray write > becomes a silent data corruption bug. > > Stray pointers to System RAM may result in a crash or other undesirable > behavior which, while unfortunate, are usually recoverable with a reboot. > Stray writes to PMEM are permanent in nature and thus are more likely to result > in permanent user data loss. Given that PMEM access from the kernel is limited > to a constrained set of locations (PMEM driver, Filesystem-DAX, direct-I/O, and > any properly kmap'ed page), it is amenable to PKS protection. > > Set up an infrastructure for extra device access protection. Then implement the > protection using the new Protection Keys Supervisor (PKS) on architectures > which support it. > > Because PMEM pages are all associated with a struct dev_pagemap and flags in > struct page are valuable the flag of protecting memory can be stored in struct > dev_pagemap. All PMEM is protected by the same pkey. So a single flag is all > that is needed in each dev_pagemap to indicate protection. > > General access in the kernel is supported by modifying the kmap infrastructure > which can detect if a page is pks protected and enable access until the > corresponding unmap is called. > > Because PKS is a thread local mechanism and because kmap was never really > intended to create a long term mapping, this implementation does not support > the kmap()/kunmap() calls. Calling kmap() on a PMEM protected page is allowed > but accessing that mapping will cause a fault. > > Originally this series modified many of the kmap call sites to indicate they > were thread local.[1] And an attempt to support kmap()[2] was made. But now > that kmap_local_page() has been developed[3] and in more wide spread use, > kmap() can safely be left unsupported. > > How the fault is handled is configurable via a new module parameter > memremap.pks_fault_mode. Two modes are supported. > > 'relaxed' (default) -- WARN_ONCE, disable the protection and allow > access > > 'strict' -- prevent any unguarded access to a protected dev_pagemap > range > > This 'safety valve' feature has already been useful in the development of this > feature. > > > [1] https://lore.kernel.org/lkml/20201009195033.3208459-1-ira.weiny@intel.com/ > > [2] https://lore.kernel.org/lkml/87mtycqcjf.fsf@nanos.tec.linutronix.de/ > > [3] https://lore.kernel.org/lkml/20210128061503.1496847-1-ira.weiny@intel.com/ > https://lore.kernel.org/lkml/20210210062221.3023586-1-ira.weiny@intel.com/ > https://lore.kernel.org/lkml/20210205170030.856723-1-ira.weiny@intel.com/ > https://lore.kernel.org/lkml/20210217024826.3466046-1-ira.weiny@intel.com/ > > > ---------------------------------------------------------------------------- > Changes for V8 > > Feedback from Thomas > * clean up noinstr mess > * Fix static PKEY allocation mess > * Ensure all functions are consistently named. > * Split up patches to do 1 thing per patch > * pkey_update_pkval() implementation > * Streamline the use of pks_write_pkrs() by not disabling preemption > - Leave this to the callers who require it. > - Use documentation and lockdep to prevent errors > * Clean up commit messages to explain in detail _why_ each patch is > there. > > Feedback from Dave H. > * Leave out pks_mk_readonly() as it is not used by the PMEM use case > > Feedback from Peter Anvin > * Replace pks_abandon_pkey() with pks_update_exception() > This is an even greater simplification in that it no longer > attempts to shield users from faults. As the main use case for > abandoning a key was to allow a system to continue running even > with an error. This should be a rare event so the performance > should not be an issue. > > * Simplify ARCH_ENABLE_SUPERVISOR_PKEYS > > * Update PKS Test code > - Add default value test > - Split up the test code into patches which follow each feature > addition > - simplify test code processing > - ensure consistent reporting of errors. > > * Ensure all entry points to the PKS code are protected by > cpu_feature_enabled(X86_FEATURE_PKS) > - At the same time make sure non-entry points or sub-functions to the > PKS code are not _unnecessarily_ protected by the feature check > > * Update documentation > - Use kernel docs to place the docs with the code for easier internal > developer use > > * Adjust the PMEM use cases for the core changes > > * Split the PMEM patches up to be 1 change per patch and help clarify review > > * Review all header files and remove those no longer needed > > * Review/update/clarify all commit messages > > Fenghua Yu (1): > mm/pkeys: Define PKS page table macros > > Ira Weiny (43): > entry: Create an internal irqentry_exit_cond_resched() call > Documentation/protection-keys: Clean up documentation for User Space > pkeys > x86/pkeys: Clarify PKRU_AD_KEY macro > x86/pkeys: Make PKRU macros generic > x86/fpu: Refactor arch_set_user_pkey_access() > mm/pkeys: Add Kconfig options for PKS > x86/pkeys: Add PKS CPU feature bit > x86/fault: Adjust WARN_ON for pkey fault > Documentation/pkeys: Add initial PKS documentation > mm/pkeys: Provide for PKS key allocation > x86/pkeys: Enable PKS on cpus which support it > mm/pkeys: PKS testing, add initial test code > x86/selftests: Add test_pks > x86/pkeys: Introduce pks_write_pkrs() > x86/pkeys: Preserve the PKS MSR on context switch > mm/pkeys: Introduce pks_set_readwrite() > mm/pkeys: Introduce pks_set_noaccess() > mm/pkeys: PKS testing, add a fault call back > mm/pkeys: PKS testing, add pks_set_*() tests > mm/pkeys: PKS testing, test context switching > x86/entry: Add auxiliary pt_regs space > entry: Split up irqentry_exit_cond_resched() > entry: Add calls for save/restore auxiliary pt_regs > x86/entry: Define arch_{save|restore}_auxiliary_pt_regs() > x86/pkeys: Preserve PKRS MSR across exceptions > x86/fault: Print PKS MSR on fault > mm/pkeys: PKS testing, Add exception test > mm/pkeys: Introduce pks_update_exception() > mm/pkeys: PKS testing, test pks_update_exception() > mm/pkeys: PKS testing, add test for all keys > mm/pkeys: Add pks_available() > memremap_pages: Add Kconfig for DEVMAP_ACCESS_PROTECTION > memremap_pages: Introduce pgmap_protection_available() > memremap_pages: Introduce a PGMAP_PROTECTION flag > memremap_pages: Introduce devmap_protected() > memremap_pages: Reserve a PKS pkey for eventual use by PMEM > memremap_pages: Set PKS pkey in PTEs if requested > memremap_pages: Define pgmap_set_{readwrite|noaccess}() calls > memremap_pages: Add memremap.pks_fault_mode > kmap: Make kmap work for devmap protected pages > dax: Stray access protection for dax_direct_access() > nvdimm/pmem: Enable stray access protection > devdax: Enable stray access protection > > Rick Edgecombe (1): > mm/pkeys: Introduce PKS fault callbacks > > .../admin-guide/kernel-parameters.txt | 12 + > Documentation/core-api/protection-keys.rst | 130 ++- > arch/x86/Kconfig | 6 + > arch/x86/entry/calling.h | 20 + > arch/x86/entry/common.c | 2 +- > arch/x86/entry/entry_64.S | 22 + > arch/x86/entry/entry_64_compat.S | 6 + > arch/x86/include/asm/cpufeatures.h | 1 + > arch/x86/include/asm/disabled-features.h | 8 +- > arch/x86/include/asm/entry-common.h | 15 + > arch/x86/include/asm/msr-index.h | 1 + > arch/x86/include/asm/pgtable_types.h | 22 + > arch/x86/include/asm/pkeys.h | 2 + > arch/x86/include/asm/pkeys_common.h | 18 + > arch/x86/include/asm/pkru.h | 20 +- > arch/x86/include/asm/pks.h | 46 ++ > arch/x86/include/asm/processor.h | 15 +- > arch/x86/include/asm/ptrace.h | 21 + > arch/x86/include/uapi/asm/processor-flags.h | 2 + > arch/x86/kernel/asm-offsets_64.c | 15 + > arch/x86/kernel/cpu/common.c | 2 + > arch/x86/kernel/dumpstack.c | 32 +- > arch/x86/kernel/fpu/xstate.c | 22 +- > arch/x86/kernel/head_64.S | 6 + > arch/x86/kernel/process_64.c | 3 + > arch/x86/mm/fault.c | 17 +- > arch/x86/mm/pkeys.c | 320 +++++++- > drivers/dax/device.c | 2 + > drivers/dax/super.c | 59 ++ > drivers/md/dm-writecache.c | 8 +- > drivers/nvdimm/pmem.c | 26 + > fs/dax.c | 8 + > fs/fuse/virtio_fs.c | 2 + > include/linux/dax.h | 5 + > include/linux/entry-common.h | 15 +- > include/linux/highmem-internal.h | 4 + > include/linux/memremap.h | 1 + > include/linux/mm.h | 72 ++ > include/linux/pgtable.h | 4 + > include/linux/pks-keys.h | 92 +++ > include/linux/pks.h | 73 ++ > include/linux/sched.h | 7 + > include/uapi/asm-generic/mman-common.h | 1 + > init/init_task.c | 3 + > kernel/entry/common.c | 44 +- > kernel/sched/core.c | 40 +- > lib/Kconfig.debug | 33 + > lib/Makefile | 3 + > lib/pks/Makefile | 3 + > lib/pks/pks_test.c | 755 ++++++++++++++++++ > mm/Kconfig | 32 + > mm/memremap.c | 132 +++ > tools/testing/selftests/x86/Makefile | 2 +- > tools/testing/selftests/x86/test_pks.c | 514 ++++++++++++ > 54 files changed, 2617 insertions(+), 109 deletions(-) > create mode 100644 arch/x86/include/asm/pkeys_common.h > create mode 100644 arch/x86/include/asm/pks.h > create mode 100644 include/linux/pks-keys.h > create mode 100644 include/linux/pks.h > create mode 100644 lib/pks/Makefile > create mode 100644 lib/pks/pks_test.c > create mode 100644 tools/testing/selftests/x86/test_pks.c > > -- > 2.35.1 >