From: Randy Dunlap <rdunlap@infradead.org>
To: Igor Stoppa <igor.stoppa@gmail.com>,
	Mimi Zohar <zohar@linux.vnet.ibm.com>,
	Kees Cook <keescook@chromium.org>,
	Matthew Wilcox <willy@infradead.org>,
	Dave Chinner <david@fromorbit.com>,
	James Morris <jmorris@namei.org>,
	Michal Hocko <mhocko@kernel.org>,
	kernel-hardening@lists.openwall.com,
	linux-integrity@vger.kernel.org,
	linux-security-module@vger.kernel.org
Cc: igor.stoppa@huawei.com, Dave Hansen <dave.hansen@linux.intel.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Laura Abbott <labbott@redhat.com>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 10/17] prmem: documentation
Date: Tue, 23 Oct 2018 20:48:33 -0700
Message-ID: <b09b3feb-8e3d-5fab-4e2b-35e8c252b27c@infradead.org>
In-Reply-To: <20181023213504.28905-11-igor.stoppa@huawei.com>

Hi,

On 10/23/18 2:34 PM, Igor Stoppa wrote:
> Documentation for protected memory.
> 
> Topics covered:
> * static memory allocation
> * dynamic memory allocation
> * write-rare
> 
> Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
> CC: Jonathan Corbet <corbet@lwn.net>
> CC: Randy Dunlap <rdunlap@infradead.org>
> CC: Mike Rapoport <rppt@linux.vnet.ibm.com>
> CC: linux-doc@vger.kernel.org
> CC: linux-kernel@vger.kernel.org
> ---
>  Documentation/core-api/index.rst |   1 +
>  Documentation/core-api/prmem.rst | 172 +++++++++++++++++++++++++++++++
>  MAINTAINERS                      |   1 +
>  3 files changed, 174 insertions(+)
>  create mode 100644 Documentation/core-api/prmem.rst


> diff --git a/Documentation/core-api/prmem.rst b/Documentation/core-api/prmem.rst
> new file mode 100644
> index 000000000000..16d7edfe327a
> --- /dev/null
> +++ b/Documentation/core-api/prmem.rst
> @@ -0,0 +1,172 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +.. _prmem:
> +
> +Memory Protection
> +=================
> +
> +:Date: October 2018
> +:Author: Igor Stoppa <igor.stoppa@huawei.com>
> +
> +Foreword
> +--------
> +- In a typical system using some sort of RAM as execution environment,
> +  **all** memory is initially writable.
> +
> +- It must be initialized with the appropriate content, be it code or data.
> +
> +- Said content typically undergoes modifications, i.e. relocations or
> +  relocation-induced changes.
> +
> +- The present document doesn't address such transient.

                                               transience.

> +
> +- Kernel code is protected at system level and, unlike data, it doesn't
> +  require special attention.
> +
> +Protection mechanism
> +--------------------
> +
> +- When available, the MMU can write protect memory pages that would be
> +  otherwise writable.
> +
> +- The protection has page-level granularity.
> +
> +- An attempt to overwrite a protected page will trigger an exception.
> +- **Write protected data must go exclusively to write protected pages**
> +- **Writable data must go exclusively to writable pages**
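
A small userspace analogy might help readers of this section. The sketch
below is not prmem code: it uses mmap()/mprotect() as a stand-in for the
kernel's page table handling, and the helper names (page_alloc_writable,
page_write_protect, write_faults) are made up for illustration. It shows
the two points made above: protection is applied to a whole page, and a
write to a protected page raises an exception (here, a fatal signal in a
forked child).

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: map one anonymous page, initially writable. */
static void *page_alloc_writable(void)
{
	size_t size = (size_t)sysconf(_SC_PAGESIZE);
	void *page = mmap(NULL, size, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return page == MAP_FAILED ? NULL : page;
}

/* Hypothetical helper: write protect the whole page (page granularity). */
static int page_write_protect(void *page)
{
	return mprotect(page, (size_t)sysconf(_SC_PAGESIZE), PROT_READ);
}

/*
 * Attempt a write from a forked child, so that the caller survives.
 * Returns 1 if the child was killed by a signal, 0 if the write went
 * through, -1 on fork/wait failure.
 */
static int write_faults(volatile char *addr)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		*addr = 'x';	/* faults if the page is write protected */
		_exit(0);
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFSIGNALED(status) ? 1 : 0;
}
```

Calling write_faults() on a freshly mapped page is expected to return 0,
and to return 1 once page_write_protect() has succeeded on it; reads keep
working in both states.
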
> +
> +Available protections for kernel data
> +-------------------------------------
> +
> +- **constant**
> +   Labelled as **const**, the data is never supposed to be altered.
> +   It is statically allocated - if it has any memory footprint at all.
> +   The compiler can even optimize it away, where possible, by replacing
> +   references to a **const** with its actual value.
> +
> +- **read only after init**
> +   By tagging an otherwise ordinary statically allocated variable with
> +   **__ro_after_init**, it is placed in a special segment that will
> +   become write protected, at the end of the kernel init phase.
> +   The compiler has no notion of this restriction and it will treat any
> +   write operation on such variable as legal. However, assignments that
> +   are attempted after the write protection is in place, will cause

no comma.

> +   exceptions.
> +
> +- **write rare after init**
> +   This can be seen as a variant of read only after init, which uses the
> +   tag **__wr_after_init**. It is also limited to statically allocated
> +   memory. It is still possible to alter variables of this type after
> +   the kernel init phase is complete, however it can be done exclusively
> +   with special functions, instead of the assignment operator. Using the
> +   assignment operator after conclusion of the init phase will still
> +   trigger an exception. It is not possible to transition a certain
> +   variable from __wr_after_init to a permanent read-only status, at
> +   runtime.
> +
> +- **dynamically allocated write-rare / read-only**
> +   After defining a pool, memory can be obtained through it, primarily
> +   through the **pmalloc()** allocator. The exact writability state of the
> +   memory obtained from **pmalloc()** and friends can be configured when
> +   creating the pool. At any point it is possible to transition the memory
> +   currently associated with the pool to a less permissive write status.
> +   Once memory has become read-only, the only valid operation, besides
> +   reading, is to release it, by destroying the pool it belongs to.
> +
> +
> +Protecting dynamically allocated memory
> +---------------------------------------
> +
> +When dealing with dynamically allocated memory, three options are
> + available for configuring its writability state:
> +
> +- **Options selected when creating a pool**
> +   When creating the pool, it is possible to choose one of the following:
> +    - **PMALLOC_MODE_RO**
> +       - Writability at allocation time: *WRITABLE*
> +       - Writability at protection time: *NONE*
> +    - **PMALLOC_MODE_WR**
> +       - Writability at allocation time: *WRITABLE*
> +       - Writability at protection time: *WRITE-RARE*
> +    - **PMALLOC_MODE_AUTO_RO**
> +       - Writability at allocation time:
> +           - the latest allocation: *WRITABLE*
> +           - every other allocation: *NONE*
> +       - Writability at protection time: *NONE*
> +    - **PMALLOC_MODE_AUTO_WR**
> +       - Writability at allocation time:
> +           - the latest allocation: *WRITABLE*
> +           - every other allocation: *WRITE-RARE*
> +       - Writability at protection time: *WRITE-RARE*
> +    - **PMALLOC_MODE_START_WR**
> +       - Writability at allocation time: *WRITE-RARE*
> +       - Writability at protection time: *WRITE-RARE*
> +
> +   **Remarks:**
> +    - The "AUTO" modes perform automatic protection of the content, whenever
> +       the current vmap_area is used up and a new one is allocated.
> +        - At that point, the vmap_area being phased out is protected.
> +        - The size of the vmap_area depends on various parameters.
> +        - It might not be possible to know for sure *when* certain data will
> +          be protected.
> +        - The functionality is provided as tradeoff between hardening and speed.
> +        - Its usefulness depends on the specific use case at hand

end above sentence with a period, please, like all of the others above it.

> +    - The "START_WR" mode is the only one which provides immediate protection, at the cost of speed.

Please try to keep the line above and a few below to < 80 characters in length.
(because some of us read rst files as text files, with a text editor, and line
wrap is ugly)

> +
> +- **Protecting the pool**
> +   This is achieved with **pmalloc_protect_pool()**
> +    - Any vmap_area currently in the pool is write-protected according to its initial configuration.
> +    - Any residual space still available from the current vmap_area is lost, as the area is protected.
> +    - **protecting a pool after every allocation will likely be very wasteful**
> +    - Using PMALLOC_MODE_START_WR is likely a better choice.
> +
> +- **Upgrading the protection level**
> +   This is achieved with **pmalloc_make_pool_ro()**
> +    - it turns the present content of a write-rare pool into read-only
> +    - can be useful when the content of the memory has settled
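
The mode list above might be easier to check at a glance as a table. The
following is only a sketch of that list in C: the PMALLOC_MODE_* names come
from the text, but the enum values, the writability enum, the table layout
and pool_can_become_ro() are assumptions made for illustration, not the
definitions in include/linux/prmem.h.

```c
/* Writability states named in the documentation. */
enum writability {
	WR_NONE,	/* read-only */
	WR_WRITE_RARE,	/* writable only through write rare functions */
	WR_WRITABLE,	/* ordinary writes allowed */
};

/* Local stand-ins; the values are NOT taken from the real header. */
enum pmalloc_mode {
	PMALLOC_MODE_RO,
	PMALLOC_MODE_WR,
	PMALLOC_MODE_AUTO_RO,
	PMALLOC_MODE_AUTO_WR,
	PMALLOC_MODE_START_WR,
	PMALLOC_MODE_COUNT,
};

struct mode_desc {
	enum writability latest_alloc;	/* latest allocation, pre-protection */
	enum writability older_allocs;	/* every other allocation */
	enum writability after_protect;	/* once the pool is protected */
};

static const struct mode_desc mode_table[PMALLOC_MODE_COUNT] = {
	[PMALLOC_MODE_RO]       = { WR_WRITABLE,   WR_WRITABLE,   WR_NONE },
	[PMALLOC_MODE_WR]       = { WR_WRITABLE,   WR_WRITABLE,   WR_WRITE_RARE },
	[PMALLOC_MODE_AUTO_RO]  = { WR_WRITABLE,   WR_NONE,       WR_NONE },
	[PMALLOC_MODE_AUTO_WR]  = { WR_WRITABLE,   WR_WRITE_RARE, WR_WRITE_RARE },
	[PMALLOC_MODE_START_WR] = { WR_WRITE_RARE, WR_WRITE_RARE, WR_WRITE_RARE },
};

/*
 * Upgrading to read-only only makes sense for pools whose protected
 * state is write rare; read-only pools are already at the final state.
 */
static int pool_can_become_ro(enum pmalloc_mode mode)
{
	return mode_table[mode].after_protect == WR_WRITE_RARE;
}
```

Laid out this way, it is visible that START_WR is the only mode where the
latest allocation is never plainly writable.
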
> +
> +
> +Caveats
> +-------
> +- Freeing of memory is not supported. Pages will be returned to the
> +  system upon destruction of their memory pool.
> +
> +- The address range available for vmalloc (and thus for pmalloc too) is
> +  limited, on 32-bit systems. However it shouldn't be an issue, since not
> +  much data is expected to be dynamically allocated and turned into
> +  write-protected.
> +
> +- Regarding SMP systems, changing state of pages and altering mappings
> +  requires performing cross-processor synchronizations of page tables.
> +  This is an additional reason for limiting the use of write rare.
> +
> +- Not only the pmalloc memory must be protected, but also any reference to
> +  it that might become the target for an attack. The attack would replace
> +  a reference to the protected memory with a reference to some other,
> +  unprotected, memory.
> +
> +- The users of rare write must take care of ensuring the atomicity of the

s/rare write/write rare/ ?

> +  action, respect to the way they use the data being altered; for example,

"respect to the way" is awkward here, but I don't know what to change it to.

> +  take a lock before making a copy of the value to modify (if it's
> +  relevant), then alter it, issue the call to rare write and finally
> +  release the lock. Some special scenario might be exempt from the need
> +  for locking, but in general rare-write must be treated as an operation

It seemed to me that "write-rare" (or write rare) was the going name, but now
it's being called "rare write" (or rare-write).  Just be consistent, please.


> +  that can incur into races.
> +
> +- pmalloc relies on virtual memory areas and will therefore use more
> +  tlb entries. It still does a better job of it, compared to invoking

     TLB

> +  vmalloc for each allocation, but it is undeniably less optimized wrt to

s/wrt/with respect to/

> +  TLB use than using the physmap directly, through kmalloc or similar.
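
For the locking caveat above (take a lock, copy, alter, write rare,
release), a userspace sketch of the sequence could look like the following.
It is an analogy only: mprotect() toggling stands in for whatever mechanism
the write rare primitive really uses, a pthread mutex stands in for the
kernel lock, and every name here (struct settings, wr_copy_sketch,
settings_set_limit, ...) is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

struct settings {
	int limit;
	int generation;
};

static pthread_mutex_t settings_lock = PTHREAD_MUTEX_INITIALIZER;
static struct settings *settings;	/* lives on a write protected page */

static size_t page_size(void)
{
	return (size_t)sysconf(_SC_PAGESIZE);
}

/* Initialize the data while the page is writable, then protect it. */
static int settings_init(int limit)
{
	void *page = mmap(NULL, page_size(), PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (page == MAP_FAILED)
		return -1;
	settings = page;
	settings->limit = limit;
	settings->generation = 0;
	return mprotect(page, page_size(), PROT_READ);
}

/* Stand-in for a write rare copy: open a write window, copy, close it. */
static int wr_copy_sketch(void *dst, const void *src, size_t len)
{
	uintptr_t start = (uintptr_t)dst & ~((uintptr_t)page_size() - 1);
	size_t span = ((uintptr_t)dst - start) + len;

	if (mprotect((void *)start, span, PROT_READ | PROT_WRITE))
		return -1;
	memcpy(dst, src, len);
	return mprotect((void *)start, span, PROT_READ);
}

/* The sequence from the caveat: lock, copy, alter, write rare, unlock. */
static int settings_set_limit(int limit)
{
	struct settings copy;
	int ret;

	pthread_mutex_lock(&settings_lock);
	copy = *settings;		/* copy of the value to modify */
	copy.limit = limit;		/* alter the copy */
	copy.generation++;
	ret = wr_copy_sketch(settings, &copy, sizeof(copy));
	pthread_mutex_unlock(&settings_lock);
	return ret;
}
```

Without the lock, two concurrent callers could both copy the same old
value and one update (here, a generation increment) would be lost, which
is the kind of race the caveat warns about.
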
> +
> +
> +Utilization
> +-----------
> +
> +**add examples here**
> +
> +API
> +---
> +
> +.. kernel-doc:: include/linux/prmem.h
> +.. kernel-doc:: mm/prmem.c
> +.. kernel-doc:: include/linux/prmemextra.h


Thanks for the documentation.

-- 
~Randy
