All of lore.kernel.org
 help / color / mirror / Atom feed
From: Borislav Petkov <bp@alien8.de>
To: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-mm@kvack.org, linux-arch@vger.kernel.org,
	linux-api@vger.kernel.org, Arnd Bergmann <arnd@arndb.de>,
	Andy Lutomirski <luto@kernel.org>,
	Balbir Singh <bsingharora@gmail.com>,
	Cyrill Gorcunov <gorcunov@gmail.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Eugene Syromiatnikov <esyr@redhat.com>,
	Florian Weimer <fweimer@redhat.com>,
	"H.J. Lu" <hjl.tools@gmail.com>, Jann Horn <jannh@google.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Kees Cook <keescook@chromium.org>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Nadav Amit <nadav.amit@gmail.com>,
	Oleg Nesterov <oleg@redhat.com>, Pavel Machek <pavel@ucw.cz>,
	Peter Zijlstra <peterz@infradead.org>,
	Randy Dunlap <rdunlap@infradead.org>,
	"Ravi V. Shankar" <ravi.v.shankar@intel.com>,
	Vedvyas Shanbhogue <vedvyas.shanbhogue@intel.com>,
	Dave Martin <Dave.Martin@arm.com>,
	Weijiang Yang <weijiang.yang@intel.com>,
	Pengfei Xu <pengfei.xu@intel.com>
Subject: Re: [PATCH v15 08/26] x86/mm: Introduce _PAGE_COW
Date: Tue, 8 Dec 2020 18:50:14 +0100	[thread overview]
Message-ID: <20201208175014.GD27920@zn.tnic> (raw)
In-Reply-To: <20201110162211.9207-9-yu-cheng.yu@intel.com>

On Tue, Nov 10, 2020 at 08:21:53AM -0800, Yu-cheng Yu wrote:
> There is essentially no room left in the x86 hardware PTEs on some OSes
> (not Linux).  That left the hardware architects looking for a way to
> represent a new memory type (shadow stack) within the existing bits.
> They chose to repurpose a lightly-used state: Write=0,Dirty=1.

It is not clear to me what the definition and semantics of that bit is.

+#define _PAGE_BIT_COW          _PAGE_BIT_SOFTW5 /* copy-on-write */

Is it set by hw or by sw and hw uses it to know it is a shadow stack
page, and so on.

I think you should lead with its definition.

> The reason it's lightly used is that Dirty=1 is normally set by hardware
> and cannot normally be set by hardware on a Write=0 PTE.  Software must
> normally be involved to create one of these PTEs, so software can simply
> opt to not create them.
> 
> But that leaves us with a Linux problem: we need to ensure we never create

Please use passive voice in your commit message: no "we" or "I", etc.

> Write=0,Dirty=1 PTEs.  In places where we do create them, we need to find
> an alternative way to represent them _without_ using the same hardware bit
> combination.  Thus, enter _PAGE_COW.  This results in the following:
> 
> (a) A modified, copy-on-write (COW) page: (R/O + _PAGE_COW)
> (b) A R/O page that has been COW'ed: (R/O + _PAGE_COW)

Both are "R/O + _PAGE_COW". Where's the difference? The dirty bit?

>     The user page is in a R/O VMA, and get_user_pages() needs a writable
>     copy.  The page fault handler creates a copy of the page and sets
>     the new copy's PTE as R/O and _PAGE_COW.
> (c) A shadow stack PTE: (R/O + _PAGE_DIRTY_HW)

So W=0, D=1 ?

> (d) A shared shadow stack PTE: (R/O + _PAGE_COW)
>     When a shadow stack page is being shared among processes (this happens
>     at fork()), its PTE is cleared of _PAGE_DIRTY_HW, so the next shadow
>     stack access causes a fault, and the page is duplicated and
>     _PAGE_DIRTY_HW is set again.  This is the COW equivalent for shadow
>     stack pages, even though it's copy-on-access rather than copy-on-write.
> (e) A page where the processor observed a Write=1 PTE, started a write, set
>     Dirty=1, but then observed a Write=0 PTE.

How does that happen? Something changed the PTE's W bit to 0 in-between?

> That's possible today, but
>     will not happen on processors that support shadow stack.
> 
> Use _PAGE_COW in pte_wrprotect() and _PAGE_DIRTY_HW in pte_mkwrite().
> Apply the same changes to pmd and pud.
> 
> When this patch is applied, there are six free bits left in the 64-bit PTE.

s/When this patch is applied/After this/

Avoid having "This patch" or "This commit" in the commit message. It is
tautologically useless.

Also, do

$ git grep 'This patch' Documentation/process

for more details.

> There are no more free bits in the 32-bit PTE (except for PAE) and shadow
> stack is not implemented for the 32-bit kernel.
> 
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> ---
>  arch/x86/include/asm/pgtable.h       | 120 ++++++++++++++++++++++++---
>  arch/x86/include/asm/pgtable_types.h |  41 ++++++++-
>  2 files changed, 150 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index b23697658b28..c88c7ccf0318 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -121,9 +121,9 @@ extern pmdval_t early_pmd_flags;
>   * The following only work if pte_present() is true.
>   * Undefined behaviour if not..
>   */
> -static inline int pte_dirty(pte_t pte)
> +static inline bool pte_dirty(pte_t pte)
>  {
> -	return pte_flags(pte) & _PAGE_DIRTY_HW;
> +	return pte_flags(pte) & _PAGE_DIRTY_BITS;

Why?

Does _PAGE_COW mean dirty too?

> @@ -343,6 +349,17 @@ static inline pte_t pte_mkold(pte_t pte)
>  
>  static inline pte_t pte_wrprotect(pte_t pte)
>  {
> +	/*
> +	 * Blindly clearing _PAGE_RW might accidentally create
> +	 * a shadow stack PTE (RW=0,Dirty=1).  Move the hardware
> +	 * dirty value to the software bit.
> +	 */
> +	if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
> +		pte.pte |= (pte.pte & _PAGE_DIRTY_HW) >>
> +			   _PAGE_BIT_DIRTY_HW << _PAGE_BIT_COW;

Let that line stick out. And that shifting is not grokkable at a quick
glance, at least not to me. Simplify?

>  static inline pmd_t pmd_wrprotect(pmd_t pmd)
>  {
> +	/*
> +	 * Blindly clearing _PAGE_RW might accidentally create
> +	 * a shadow stack PMD (RW=0,Dirty=1).  Move the hardware
> +	 * dirty value to the software bit.

This whole carefully sidestepping the possiblity of creating a shadow
stack pXd is kinda sucky...

> diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
> index 7462a574fc93..5f764d8d9bae 100644
> --- a/arch/x86/include/asm/pgtable_types.h
> +++ b/arch/x86/include/asm/pgtable_types.h
> @@ -23,7 +23,8 @@
>  #define _PAGE_BIT_SOFTW2	10	/* " */
>  #define _PAGE_BIT_SOFTW3	11	/* " */
>  #define _PAGE_BIT_PAT_LARGE	12	/* On 2MB or 1GB pages */
> -#define _PAGE_BIT_SOFTW4	58	/* available for programmer */
> +#define _PAGE_BIT_SOFTW4	57	/* available for programmer */
> +#define _PAGE_BIT_SOFTW5	58	/* available for programmer */
>  #define _PAGE_BIT_PKEY_BIT0	59	/* Protection Keys, bit 1/4 */
>  #define _PAGE_BIT_PKEY_BIT1	60	/* Protection Keys, bit 2/4 */
>  #define _PAGE_BIT_PKEY_BIT2	61	/* Protection Keys, bit 3/4 */
> @@ -36,6 +37,16 @@
>  #define _PAGE_BIT_SOFT_DIRTY	_PAGE_BIT_SOFTW3 /* software dirty tracking */
>  #define _PAGE_BIT_DEVMAP	_PAGE_BIT_SOFTW4
>  
> +/*
> + * This bit indicates a copy-on-write page, and is different from
> + * _PAGE_BIT_SOFT_DIRTY, which tracks which pages a task writes to.
> + */
> +#ifdef CONFIG_X86_64

CONFIG_X86_64 ? Do all x86 machines out there support CET?

If anything, CONFIG_X86_CET...

> +#define _PAGE_BIT_COW		_PAGE_BIT_SOFTW5 /* copy-on-write */
> +#else
> +#define _PAGE_BIT_COW		0
> +#endif
> +
>  /* If _PAGE_BIT_PRESENT is clear, we use these: */
>  /* - if the user mapped it with PROT_NONE; pte_present gives true */
-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

  reply	other threads:[~2020-12-08 17:51 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-10 16:21 [PATCH v15 00/26] Control-flow Enforcement: Shadow Stack Yu-cheng Yu
2020-11-10 16:21 ` [PATCH v15 01/26] Documentation/x86: Add CET description Yu-cheng Yu
2020-11-30 18:26   ` Nick Desaulniers
2020-11-30 18:26     ` Nick Desaulniers
2020-11-30 18:34     ` Yu, Yu-cheng
2020-11-30 19:38       ` Fāng-ruì Sòng
2020-11-30 19:38         ` Fāng-ruì Sòng
2020-11-30 19:47         ` Yu, Yu-cheng
2020-11-10 16:21 ` [PATCH v15 02/26] x86/cpufeatures: Add CET CPU feature flags for Control-flow Enforcement Technology (CET) Yu-cheng Yu
2020-11-10 16:21 ` [PATCH v15 03/26] x86/fpu/xstate: Introduce CET MSR XSAVES supervisor states Yu-cheng Yu
2020-11-26 11:02   ` Borislav Petkov
2020-11-30 17:45   ` [NEEDS-REVIEW] " Dave Hansen
2020-11-30 18:06     ` Yu, Yu-cheng
2020-11-30 18:12       ` Dave Hansen
2020-11-30 18:17         ` Yu, Yu-cheng
2020-11-30 23:16     ` Yu, Yu-cheng
2020-12-01 22:26       ` Dave Hansen
2020-12-01 22:35         ` Yu, Yu-cheng
2020-11-10 16:21 ` [PATCH v15 04/26] x86/cet: Add control-protection fault handler Yu-cheng Yu
2020-11-26 18:49   ` Borislav Petkov
2020-11-10 16:21 ` [PATCH v15 05/26] x86/cet/shstk: Add Kconfig option for user-mode Shadow Stack Yu-cheng Yu
2020-11-27 17:10   ` Borislav Petkov
2020-11-28 16:23     ` Yu, Yu-cheng
2020-11-30 18:15       ` Borislav Petkov
2020-11-30 22:48         ` Yu, Yu-cheng
2020-12-01 16:02           ` Borislav Petkov
2020-11-30 19:56   ` Nick Desaulniers
2020-11-30 19:56     ` Nick Desaulniers
2020-11-30 20:30     ` Yu, Yu-cheng
2020-11-10 16:21 ` [PATCH v15 06/26] x86/mm: Change _PAGE_DIRTY to _PAGE_DIRTY_HW Yu-cheng Yu
2020-12-03  9:19   ` Borislav Petkov
2020-12-03 15:12     ` Dave Hansen
2020-12-03 15:56       ` Yu, Yu-cheng
2020-11-10 16:21 ` [PATCH v15 07/26] x86/mm: Remove _PAGE_DIRTY_HW from kernel RO pages Yu-cheng Yu
2020-12-07 16:36   ` Borislav Petkov
2020-12-07 17:11     ` Yu, Yu-cheng
2020-11-10 16:21 ` [PATCH v15 08/26] x86/mm: Introduce _PAGE_COW Yu-cheng Yu
2020-12-08 17:50   ` Borislav Petkov [this message]
2020-12-08 18:25     ` Yu, Yu-cheng
2020-12-08 18:47       ` Borislav Petkov
2020-12-08 19:24         ` Yu, Yu-cheng
2020-12-10 17:41           ` Borislav Petkov
2020-12-10 18:10             ` Yu, Yu-cheng
2020-11-10 16:21 ` [PATCH v15 09/26] drm/i915/gvt: Change _PAGE_DIRTY to _PAGE_DIRTY_BITS Yu-cheng Yu
2020-11-10 16:21 ` [PATCH v15 10/26] x86/mm: Update pte_modify for _PAGE_COW Yu-cheng Yu
2020-11-10 16:21 ` [PATCH v15 11/26] x86/mm: Update ptep_set_wrprotect() and pmdp_set_wrprotect() for transition from _PAGE_DIRTY_HW to _PAGE_COW Yu-cheng Yu
2020-11-10 16:21 ` [PATCH v15 12/26] mm: Introduce VM_SHSTK for shadow stack memory Yu-cheng Yu
2020-11-10 16:21 ` [PATCH v15 13/26] x86/mm: Shadow Stack page fault error checking Yu-cheng Yu
2020-11-10 16:21 ` [PATCH v15 14/26] x86/mm: Update maybe_mkwrite() for shadow stack Yu-cheng Yu
2020-11-10 16:22 ` [PATCH v15 15/26] mm: Fixup places that call pte_mkwrite() directly Yu-cheng Yu
2020-11-10 16:22 ` [PATCH v15 16/26] mm: Add guard pages around a shadow stack Yu-cheng Yu
2020-11-10 16:22 ` [PATCH v15 17/26] mm/mmap: Add shadow stack pages to memory accounting Yu-cheng Yu
2020-11-10 16:22 ` [PATCH v15 18/26] mm: Update can_follow_write_pte() for shadow stack Yu-cheng Yu
2020-11-10 16:22 ` [PATCH v15 19/26] mm: Re-introduce vm_flags to do_mmap() Yu-cheng Yu
2020-11-10 16:22 ` [PATCH v15 20/26] x86/cet/shstk: User-mode shadow stack support Yu-cheng Yu
2020-11-10 16:22 ` [PATCH v15 21/26] x86/cet/shstk: Handle signals for shadow stack Yu-cheng Yu
2020-11-10 16:22 ` [PATCH v15 22/26] binfmt_elf: Define GNU_PROPERTY_X86_FEATURE_1_AND properties Yu-cheng Yu
2020-11-10 16:22 ` [PATCH v15 23/26] ELF: Introduce arch_setup_elf_property() Yu-cheng Yu
2020-11-10 16:22 ` [PATCH v15 24/26] x86/cet/shstk: Handle thread shadow stack Yu-cheng Yu
2020-11-10 16:22 ` [PATCH v15 25/26] x86/cet/shstk: Add arch_prctl functions for " Yu-cheng Yu
2020-11-10 16:22 ` [PATCH v15 26/26] mm: Introduce PROT_SHSTK " Yu-cheng Yu
2020-11-27  9:29 ` [PATCH v15 00/26] Control-flow Enforcement: Shadow Stack Balbir Singh
2020-11-28 16:31   ` Yu, Yu-cheng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201208175014.GD27920@zn.tnic \
    --to=bp@alien8.de \
    --cc=Dave.Martin@arm.com \
    --cc=arnd@arndb.de \
    --cc=bsingharora@gmail.com \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=esyr@redhat.com \
    --cc=fweimer@redhat.com \
    --cc=gorcunov@gmail.com \
    --cc=hjl.tools@gmail.com \
    --cc=hpa@zytor.com \
    --cc=jannh@google.com \
    --cc=keescook@chromium.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mike.kravetz@oracle.com \
    --cc=mingo@redhat.com \
    --cc=nadav.amit@gmail.com \
    --cc=oleg@redhat.com \
    --cc=pavel@ucw.cz \
    --cc=pengfei.xu@intel.com \
    --cc=peterz@infradead.org \
    --cc=ravi.v.shankar@intel.com \
    --cc=rdunlap@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=vedvyas.shanbhogue@intel.com \
    --cc=weijiang.yang@intel.com \
    --cc=x86@kernel.org \
    --cc=yu-cheng.yu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.