linux-kernel.vger.kernel.org archive mirror
From: ira.weiny@intel.com
To: Dave Hansen <dave.hansen@linux.intel.com>,
	Dan Williams <dan.j.williams@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Andy Lutomirski <luto@kernel.org>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Fenghua Yu <fenghua.yu@intel.com>,
	Rick Edgecombe <rick.p.edgecombe@intel.com>,
	x86@kernel.org, linux-kernel@vger.kernel.org,
	nvdimm@lists.linux.dev, linux-mm@kvack.org
Subject: [PATCH V7 00/18] PKS/PMEM: Add Stray Write Protection
Date: Tue,  3 Aug 2021 21:32:13 -0700
Message-ID: <20210804043231.2655537-1-ira.weiny@intel.com>

From: Ira Weiny <ira.weiny@intel.com>

NOTE: x86 maintainers, I'm submitting this for ack/review by Dave Hansen and
Dan Williams.  Feel free to ignore it but we have had a lot of internal debate
on a number of design decisions so we would like to have the remaining reviews
public such that everyone can see the remaining debate/decisions.

Furthermore, this gives a public reference for Rick to build other PKS use
cases on.


PKS/PMEM Stray write protection
===============================

This series is broken into 2 parts.

	1) Introduce Protection Key Supervisor (PKS)
	2) Use PKS to protect PMEM from stray writes

Introduce Protection Key Supervisor (PKS)
-----------------------------------------

PKS enables protections on 'domains' of supervisor pages to limit supervisor
mode access to pages beyond the normal paging protections.  PKS works in a
similar fashion to user space pkeys, PKU.  As with PKU, supervisor pkeys are
checked in addition to normal paging protections, and access or writes can be
disabled via an MSR update, with no TLB flush needed when permissions change.

Also like PKU, a page mapping is assigned to a domain by setting pkey bits in
the page table entry for that mapping.

Access is controlled through the PKRS register, an MSR which is read via RDMSR
and updated via WRMSR.
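
For illustration only (this is not code from the series), the sketch below
models the PKRS layout in user space.  It assumes the architectural encoding of
two bits per pkey, access-disable in the low bit and write-disable in the high
bit.  The helper names are hypothetical and a plain integer stands in for the
MSR, so no WRMSR is issued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PKS_NUM_KEYS		16
#define PKR_BITS_PER_PKEY	2	/* two permission bits per key */
#define PKEY_DISABLE_ACCESS	0x1u	/* no reads or writes */
#define PKEY_DISABLE_WRITE	0x2u	/* reads allowed, writes blocked */
#define PKEY_PERM_MASK		0x3u

/* Return a copy of 'pkrs' with the permission bits of 'pkey' set to 'perm'. */
static uint32_t pkrs_update(uint32_t pkrs, unsigned int pkey, uint32_t perm)
{
	unsigned int shift = pkey * PKR_BITS_PER_PKEY;

	assert(pkey < PKS_NUM_KEYS && (perm & ~PKEY_PERM_MASK) == 0);
	pkrs &= ~(PKEY_PERM_MASK << shift);
	return pkrs | (perm << shift);
}

/* A read is allowed unless the access-disable bit is set for the key. */
static bool pkrs_allows_read(uint32_t pkrs, unsigned int pkey)
{
	assert(pkey < PKS_NUM_KEYS);
	return !((pkrs >> (pkey * PKR_BITS_PER_PKEY)) & PKEY_DISABLE_ACCESS);
}

/* A write is allowed only if neither disable bit is set for the key. */
static bool pkrs_allows_write(uint32_t pkrs, unsigned int pkey)
{
	assert(pkey < PKS_NUM_KEYS);
	return !((pkrs >> (pkey * PKR_BITS_PER_PKEY)) & PKEY_PERM_MASK);
}
```

Changing a permission is then a single register-value update, which is why no
page table walk or TLB flush is involved.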

XSAVE is not supported for the PKRS MSR.  Therefore the implementation
saves/restores the MSR across context switches and during exceptions.  Nested
exceptions are supported by each exception getting a new PKS state.
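
A minimal user space model of that save/restore behavior follows.  It is a
sketch under assumptions, not the implementation in these patches: the struct
and function names are hypothetical, a global variable stands in for the
per-CPU MSR, and the value loaded on exception entry (all keys except pkey 0
access-disabled) is assumed for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed default: pkey 0 open, pkeys 1-15 access-disabled. */
#define PKRS_EXCEPTION_DEFAULT	0x55555554u

static uint32_t cpu_pkrs;	/* stands in for the PKRS MSR of one CPU */

/* Hypothetical per-exception save area, one instance per nesting level. */
struct saved_pks_state {
	uint32_t pkrs;
};

/* Hypothetical per-task save area used across context switches. */
struct model_task {
	uint32_t saved_pkrs;
};

/* On entry: stash the interrupted context's value, start from the default. */
static void model_exception_enter(struct saved_pks_state *s)
{
	assert(s);
	s->pkrs = cpu_pkrs;
	cpu_pkrs = PKRS_EXCEPTION_DEFAULT;
}

/* On exit: put back exactly what the interrupted context had. */
static void model_exception_exit(const struct saved_pks_state *s)
{
	assert(s);
	cpu_pkrs = s->pkrs;
}

/* On context switch: the value follows the task, not the CPU. */
static void model_context_switch(struct model_task *prev,
				 struct model_task *next)
{
	assert(prev && next);
	prev->saved_pkrs = cpu_pkrs;
	cpu_pkrs = next->saved_pkrs;
}
```

Because every nesting level owns its own save area, an exception that interrupts
another exception restores the inner handler's value first and the original
thread's value last.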

For consistent behavior with current paging protections, pkey 0 is reserved and
configured to allow full access via the pkey mechanism, thus preserving the
default paging protections.

Other keys (1-15) are statically allocated by kernel users adding an entry to
'enum pks_pkey_consumers' and adding a corresponding default value in
consumer_defaults in create_initial_pkrs_value().  This patch series allocates
a single key for use by persistent memory stray write protection.  Once the
number of users outgrows the available keys, key sharing will need to be
resolved according to the needs of the users at that time.
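
The static allocation scheme can be pictured roughly as below.  This is a user
space sketch, not the code in the patches: the enum and function names come
from the description above, but the enum entries, the table contents, and the
choice to leave unallocated keys access-disabled are assumptions made for the
example.

```c
#include <assert.h>
#include <stdint.h>

#define PKS_NUM_KEYS		16
#define PKR_BITS_PER_PKEY	2
#define PKEY_DISABLE_ACCESS	0x1u

enum pks_pkey_consumers {
	PKS_KEY_DEFAULT = 0,	/* pkey 0: reserved, always full access */
	PKS_KEY_PMEM_EXAMPLE,	/* hypothetical entry for the PMEM user */
	PKS_KEY_NR_CONSUMERS
};

static uint32_t create_initial_pkrs_value(void)
{
	/* One default per allocated key, indexed by the enum above. */
	static const uint32_t consumer_defaults[PKS_KEY_NR_CONSUMERS] = {
		[PKS_KEY_DEFAULT]	= 0,
		[PKS_KEY_PMEM_EXAMPLE]	= PKEY_DISABLE_ACCESS,
	};
	uint32_t pkrs = 0;
	unsigned int i;

	assert(PKS_KEY_NR_CONSUMERS <= PKS_NUM_KEYS);

	for (i = 0; i < PKS_NUM_KEYS; i++) {
		/* Assumption: keys nobody allocated stay access-disabled. */
		uint32_t perm = i < PKS_KEY_NR_CONSUMERS ?
				consumer_defaults[i] : PKEY_DISABLE_ACCESS;

		pkrs |= perm << (i * PKR_BITS_PER_PKEY);
	}
	return pkrs;
}
```

With these example defaults the initial value is 0x55555554: the two bits for
pkey 0 are clear and every other key has only its access-disable bit set.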

More usage details can be found in the documentation.

The following are key attributes of PKS.

	1) Fast switching of permissions
		1a) Prevents access without page table manipulations
		1b) No TLB flushes required
	2) Works on a per thread basis

PKS is available with 4 and 5 level paging.  Like PKU it consumes 4 bits from
the PTE to store the pkey within the entry.
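
As a small illustration, the helpers below pack and unpack a pkey in a 64-bit
PTE value.  They assume the x86-64 placement of the pkey field in PTE bits
59-62; the helper names are hypothetical and operate on a plain integer rather
than on real page tables.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PKEY_SHIFT	59	/* lowest of the four pkey bits */
#define PTE_PKEY_MASK	(UINT64_C(0xf) << PTE_PKEY_SHIFT)

/* Return 'pte' with its pkey field replaced by 'pkey' (0-15). */
static uint64_t pte_set_pkey(uint64_t pte, unsigned int pkey)
{
	assert(pkey < 16);
	return (pte & ~PTE_PKEY_MASK) | ((uint64_t)pkey << PTE_PKEY_SHIFT);
}

/* Extract the pkey field from 'pte'. */
static unsigned int pte_get_pkey(uint64_t pte)
{
	return (unsigned int)((pte & PTE_PKEY_MASK) >> PTE_PKEY_SHIFT);
}
```

Four bits give the 16 possible keys, with pkey 0 being the value every mapping
has when the field is left untouched.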


Use PKS to protect PMEM from stray writes
-----------------------------------------

DAX leverages the direct-map to enable 'struct page' services for PMEM.  Given
that PMEM capacity may be an order of magnitude larger than System RAM, it
presents a large vulnerability surface to stray writes.  Such a stray write
becomes a silent data corruption bug.

Given that PMEM access from the kernel is limited to a constrained set of
locations (PMEM driver, Filesystem-DAX, and direct-I/O), it is amenable to PKS
protection.  Set up an infrastructure for extra device access protection. Then
implement the protection using the new Protection Keys Supervisor (PKS) on
architectures which support it.

Because PMEM pages are all associated with a struct dev_pagemap, a flag marking
the memory as protected can be stored there.  All PMEM is protected by the same
pkey, so a single flag is all that is needed to indicate protection.

General access in the kernel is supported by modifying the kmap infrastructure,
which can detect if a page is PMEM and PKS protected.  If so, kmap_local_page()
and kmap_atomic() enable access until the corresponding unmap is called.
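
The intended behavior of the map/unmap pair can be modeled in user space as
below.  This is only a sketch: the model_* names and the struct are
hypothetical, the unmap helper takes the page rather than a mapped address to
keep the example short, and the per-thread nesting counter is an assumption
about how nested mappings could be handled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page and its dev_pagemap protection flag. */
struct model_page {
	bool pks_protected_pmem;
};

static bool thread_pmem_access;		/* this thread's PMEM pkey is open */
static unsigned int thread_map_depth;	/* nested mappings of protected pages */

/* Map: open access on the first protected mapping held by this thread. */
static void model_kmap_local_page(const struct model_page *page)
{
	if (!page->pks_protected_pmem)
		return;		/* ordinary pages need no pkey change */
	if (thread_map_depth++ == 0)
		thread_pmem_access = true;
}

/* Unmap: close access again once the last protected mapping is dropped. */
static void model_kunmap_local(const struct model_page *page)
{
	if (!page->pks_protected_pmem)
		return;
	assert(thread_map_depth > 0);
	if (--thread_map_depth == 0)
		thread_pmem_access = false;
}
```

Since the state is per thread, a mapping taken in one thread grants nothing to
another, which matches the short-lived, thread-local nature of these mappings.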

This implementation avoids supporting kmap()/kunmap() for a number of
reasons.  First, kmap() was never really intended to create long term mappings.
Second, no known kernel users of PMEM use kmap().  Third, PKS is a thread local
mechanism.

Originally this series modified many of the kmap call sites to indicate they
were thread local.[1]  And an attempt to support kmap()[2] was made.  But now
that kmap_local_page() has been developed[3] and is in more widespread use,
kmap() should be safe to leave unsupported and is considered an invalid access.

Handling invalid access to these pages is configurable via a new module
parameter memremap.pks_fault_mode.  Two modes are supported.

	'relaxed' (default) -- WARN_ONCE, disable the protection and allow
	                       access

	'strict' -- prevent any unguarded access to a protected dev_pagemap
		    range

The fault handler detects the PMEM fault and applies the above configuration to
the faulting thread.  The kmap() call is a special case.  It is considered an
invalid access, but the configuration is applied at the kmap() call itself,
before any access occurs, so that the offending kmap() code path can be more
easily evaluated and fixed.
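
The decision made for the two modes can be sketched as follows.  This is a
user space model under assumptions, not the fault handler from the patches: the
enum, struct, and function names are hypothetical, and a boolean stands in for
the warning that WARN_ONCE would print.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding of the memremap.pks_fault_mode values. */
enum model_pks_fault_mode {
	MODEL_PKS_MODE_RELAXED,
	MODEL_PKS_MODE_STRICT
};

struct model_fault_state {
	bool protection_enabled;	/* PMEM pkey still enforced */
	bool warned;			/* the one-time warning was issued */
};

/*
 * Returns true if the invalid access is tolerated and may be retried,
 * false if it must be treated as a real fault.
 */
static bool model_handle_pmem_fault(enum model_pks_fault_mode mode,
				    struct model_fault_state *st)
{
	assert(st);

	if (mode == MODEL_PKS_MODE_STRICT)
		return false;		/* no unguarded access allowed */

	/* 'relaxed': warn once, drop the protection, allow the access. */
	if (!st->warned)
		st->warned = true;	/* WARN_ONCE would fire here */
	st->protection_enabled = false;
	return true;
}
```

In 'relaxed' mode the first stray access is reported and then allowed, while
'strict' mode leaves the protection intact and lets the fault stand.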


[1] https://lore.kernel.org/lkml/20201009195033.3208459-1-ira.weiny@intel.com/

[2] https://lore.kernel.org/lkml/87mtycqcjf.fsf@nanos.tec.linutronix.de/

[3] https://lore.kernel.org/lkml/20210128061503.1496847-1-ira.weiny@intel.com/
    https://lore.kernel.org/lkml/20210210062221.3023586-1-ira.weiny@intel.com/
    https://lore.kernel.org/lkml/20210205170030.856723-1-ira.weiny@intel.com/
    https://lore.kernel.org/lkml/20210217024826.3466046-1-ira.weiny@intel.com/

[4] https://lore.kernel.org/lkml/20201106232908.364581-1-ira.weiny@intel.com/

[5] https://lore.kernel.org/lkml/20210322053020.2287058-1-ira.weiny@intel.com/

[6] https://lore.kernel.org/lkml/20210331191405.341999-1-ira.weiny@intel.com/


Fenghua Yu (1):
  x86/pks: Add PKS kernel API

Ira Weiny (16):
  x86/pkeys: Create pkeys_common.h
  x86/fpu: Refactor arch_set_user_pkey_access()
  x86/pks: Add additional PKEY helper macros
  x86/pks: Add PKS defines and Kconfig options
  x86/pks: Add PKS setup code
  x86/fault: Adjust WARN_ON for PKey fault
  x86/pks: Preserve the PKRS MSR on context switch
  x86/entry: Preserve PKRS MSR across exceptions
  x86/pks: Introduce pks_abandon_protections()
  x86/pks: Add PKS Test code
  memremap_pages: Add access protection via supervisor Protection Keys
    (PKS)
  memremap_pages: Add memremap.pks_fault_mode
  kmap: Add stray access protection for devmap pages
  dax: Stray access protection for dax_direct_access()
  nvdimm/pmem: Enable stray access protection
  devdax: Enable stray access protection

Rick Edgecombe (1):
  x86/pks: Add PKS fault callbacks

 .../admin-guide/kernel-parameters.txt         |  14 +
 Documentation/core-api/protection-keys.rst    | 153 +++-
 arch/x86/Kconfig                              |   1 +
 arch/x86/entry/calling.h                      |  26 +
 arch/x86/entry/common.c                       |  56 ++
 arch/x86/entry/entry_64.S                     |  22 +-
 arch/x86/entry/entry_64_compat.S              |   6 +-
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/disabled-features.h      |   8 +-
 arch/x86/include/asm/msr-index.h              |   1 +
 arch/x86/include/asm/pgtable_types.h          |  12 +
 arch/x86/include/asm/pkeys.h                  |   2 +
 arch/x86/include/asm/pkeys_common.h           |  19 +
 arch/x86/include/asm/pkru.h                   |  16 +-
 arch/x86/include/asm/pks.h                    |  67 ++
 arch/x86/include/asm/processor-flags.h        |   2 +
 arch/x86/include/asm/processor.h              |  19 +-
 arch/x86/include/uapi/asm/processor-flags.h   |   2 +
 arch/x86/kernel/cpu/common.c                  |   2 +
 arch/x86/kernel/fpu/xstate.c                  |  22 +-
 arch/x86/kernel/head_64.S                     |   7 +-
 arch/x86/kernel/process.c                     |   3 +
 arch/x86/kernel/process_64.c                  |   3 +
 arch/x86/mm/fault.c                           |  82 +-
 arch/x86/mm/pkeys.c                           | 277 +++++-
 drivers/dax/device.c                          |   2 +
 drivers/dax/super.c                           |  54 ++
 drivers/md/dm-writecache.c                    |   8 +-
 drivers/nvdimm/pmem.c                         |  55 +-
 fs/dax.c                                      |   8 +
 fs/fuse/virtio_fs.c                           |   2 +
 include/linux/dax.h                           |   8 +
 include/linux/highmem-internal.h              |   5 +
 include/linux/memremap.h                      |   1 +
 include/linux/mm.h                            |  88 ++
 include/linux/pgtable.h                       |   4 +
 include/linux/pkeys.h                         |  36 +
 include/linux/sched.h                         |   7 +
 init/init_task.c                              |   3 +
 kernel/entry/common.c                         |  14 +-
 kernel/fork.c                                 |   3 +
 lib/Kconfig.debug                             |  13 +
 lib/Makefile                                  |   3 +
 lib/pks/Makefile                              |   3 +
 lib/pks/pks_test.c                            | 864 ++++++++++++++++++
 mm/Kconfig                                    |  26 +
 mm/memremap.c                                 | 158 ++++
 tools/testing/selftests/x86/Makefile          |   2 +-
 tools/testing/selftests/x86/test_pks.c        | 157 ++++
 49 files changed, 2261 insertions(+), 86 deletions(-)
 create mode 100644 arch/x86/include/asm/pkeys_common.h
 create mode 100644 arch/x86/include/asm/pks.h
 create mode 100644 lib/pks/Makefile
 create mode 100644 lib/pks/pks_test.c
 create mode 100644 tools/testing/selftests/x86/test_pks.c

-- 
2.28.0.rc0.12.gb6a658bd00c9

