From: Mel Gorman <mgorman@techsingularity.net>
To: Dave Hansen <dave@sr71.net>
Cc: linux-kernel@vger.kernel.org, x86@kernel.org,
	linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-mm@kvack.org, torvalds@linux-foundation.org,
	akpm@linux-foundation.org, dave.hansen@linux.intel.com,
	arnd@arndb.de, hughd@google.com, viro@zeniv.linux.org.uk
Subject: Re: [PATCH 5/9] x86, pkeys: allocation/free syscalls
Date: Thu, 7 Jul 2016 15:40:17 +0100
Message-ID: <20160707144017.GW11498@techsingularity.net>
In-Reply-To: <20160707124727.62F2BEE0@viggo.jf.intel.com>

On Thu, Jul 07, 2016 at 05:47:27AM -0700, Dave Hansen wrote:
> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> This patch adds two new system calls:
> 
> 	int pkey_alloc(unsigned long flags, unsigned long init_access_rights)
> 	int pkey_free(int pkey);
> 
> These implement an "allocator" for the protection keys
> themselves, which can be thought of as analogous to the allocator
> that the kernel has for file descriptors.  The kernel tracks
> which numbers are in use, and only allows operations on keys that
> are valid.  A key which was not obtained by pkey_alloc() may not,
> for instance, be passed to pkey_mprotect() (or the forthcoming
> get/set syscalls).
> 

Ok, so the last patch wired up the system call before the kernel was
tracking which numbers were in use. It doesn't really matter as such, but
the patches should be swapped around so that the system call is only
exposed when it's actually safe.

> These system calls are also very important given the kernel's use
> of pkeys to implement execute-only support.  These help ensure
> that userspace can never assume that it has control of a key
> unless it first asks the kernel.
> 
> The 'init_access_rights' argument to pkey_alloc() specifies the
> rights that will be established for the returned pkey.  For
> instance:
> 
> 	pkey = pkey_alloc(flags, PKEY_DENY_WRITE);
> 
> will allocate 'pkey', but also sets the bits in PKRU[1] such that
> writing to 'pkey' is already denied.  This keeps userspace from
> needing to have knowledge about manipulating PKRU with the
> RDPKRU/WRPKRU instructions.  Userspace is still free to use these
> instructions as it wishes, but this facility ensures it is no
> longer required.
> 
> The kernel does _not_ enforce that this interface must be used for
> changes to PKRU, even for keys it does not control.
> 
> The kernel does not prevent pkey_free() from successfully freeing
> in-use pkeys (those still assigned to a memory range by
> pkey_mprotect()).  It would be expensive to implement the checks
> for this, so we instead say, "Just don't do it" since sane
> software will never do it anyway.
> 

Unfortunately, freeing an in-use key could manifest either as corruption,
because an area expected to be protected is accessible, or as an
unexpected SEGV.

I accept the expense argument, but it opens a new class of problems
that userspace debuggers will need to evaluate.
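
To make the hazard concrete, here is a minimal userspace model of the
allocation bitmap. It is only a sketch: struct pkey_model and the
model_*() helpers are hypothetical names, and the lowest-free-bit
allocation policy is an assumption, not something taken from the patch.

```c
#include <stdint.h>

#define NR_PKEYS 16	/* assumed number of keys in this model */

/* Hypothetical model of the per-mm state; not kernel code. */
struct pkey_model {
	uint16_t allocation_map;	/* bit n set => key n is allocated */
	int region_pkey;		/* key assigned to one modelled mapping */
};

static void model_init(struct pkey_model *m)
{
	m->allocation_map = 0x1;	/* pkey 0 is the default and always allocated */
	m->region_pkey = 0;
}

/* Return the lowest free key, or -1 if every key is in use (assumed policy). */
static int model_pkey_alloc(struct pkey_model *m)
{
	for (int pkey = 0; pkey < NR_PKEYS; pkey++) {
		if (!(m->allocation_map & (1u << pkey))) {
			m->allocation_map |= 1u << pkey;
			return pkey;
		}
	}
	return -1;
}

/* Free a key WITHOUT checking whether a mapping still uses it. */
static int model_pkey_free(struct pkey_model *m, int pkey)
{
	if (pkey <= 0 || pkey >= NR_PKEYS)
		return -1;
	if (!(m->allocation_map & (1u << pkey)))
		return -1;
	m->allocation_map &= ~(1u << pkey);
	return 0;
}
```

In this model, if a key is allocated, recorded in region_pkey, freed and
then allocated again, the second allocation hands back the same number
while region_pkey still refers to it. The new owner's access rights then
silently apply to the old region.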

> diff -puN arch/x86/include/asm/mmu_context.h~pkeys-116-syscalls-allocation arch/x86/include/asm/mmu_context.h
> --- a/arch/x86/include/asm/mmu_context.h~pkeys-116-syscalls-allocation	2016-07-07 05:47:01.435831049 -0700
> +++ b/arch/x86/include/asm/mmu_context.h	2016-07-07 05:47:01.454831911 -0700
> @@ -108,7 +108,16 @@ static inline void enter_lazy_tlb(struct
>  static inline int init_new_context(struct task_struct *tsk,
>  				   struct mm_struct *mm)
>  {
> +	#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
> +	if (cpu_feature_enabled(X86_FEATURE_OSPKE)) {
> +		/* pkey 0 is the default and always allocated */
> +		mm->context.pkey_allocation_map = 0x1;
> +		/* -1 means unallocated or invalid */
> +		mm->context.execute_only_pkey = -1;
> +	}
> +	#endif
>  	init_new_context_ldt(tsk, mm);
> +
>  	return 0;
>  }
>  static inline void destroy_context(struct mm_struct *mm)

Nothing prevents userspace from modifying the default key with WRPKRU,
or an unallocated key for that matter.  However, I also cannot find a case
where it really matters. An application screwing it up may ask mprotect
to do something very unexpected but that's about it.

> +static inline
> +bool mm_pkey_is_allocated(struct mm_struct *mm, unsigned long pkey)
> +{
> +	if (!validate_pkey(pkey))
> +		return true;
> +
> +	return mm_pkey_allocation_map(mm) & (1 << pkey);
> +}
> +

We flip-flop between whether pkey is signed or unsigned: it is an
unsigned long in this helper but an int in the syscalls.
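
As a rough illustration of why the mix matters, the sketch below compares
range checks on a signed and an unsigned key. The helper names and
NR_PKEYS are hypothetical and are not taken from the patch.

```c
#include <stdbool.h>

#define NR_PKEYS 16	/* assumed number of keys for this sketch */

/* Upper-bound-only check on a signed key: wrongly accepts -1. */
static bool signed_upper_bound_only(int pkey)
{
	return pkey < NR_PKEYS;
}

/* The same comparison on an unsigned key rejects a converted -1. */
static bool unsigned_upper_bound_only(unsigned long pkey)
{
	return pkey < NR_PKEYS;
}

/* A signed key needs both bounds checked before it is used as a shift count. */
static bool pkey_in_range(int pkey)
{
	return pkey >= 0 && pkey < NR_PKEYS;
}
```

A -1 sentinel passes the signed upper-bound check but, once converted to
unsigned long, fails the unsigned one. Which checks are sufficient
therefore depends on the type at each call site, so settling on one type
throughout would avoid having to reason about it.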

> +SYSCALL_DEFINE2(pkey_alloc, unsigned long, flags, unsigned long, init_val)
> +{
> +	int pkey;
> +	int ret;
> +
> +	/* No flags supported yet. */
> +	if (flags)
> +		return -EINVAL;
> +	/* check for unsupported init values */
> +	if (init_val & ~PKEY_ACCESS_MASK)
> +		return -EINVAL;
> +
> +	down_write(&current->mm->mmap_sem);
> +	pkey = mm_pkey_alloc(current->mm);
> +
> +	ret = -ENOSPC;
> +	if (pkey == -1)
> +		goto out;
> +
> +	ret = arch_set_user_pkey_access(current, pkey, init_val);
> +	if (ret) {
> +		mm_pkey_free(current->mm, pkey);
> +		goto out;
> +	}
> +	ret = pkey;
> +out:
> +	up_write(&current->mm->mmap_sem);
> +	return ret;
> +}

It's not wrong as such but mmap_sem taken for write seems *extremely*
heavy to protect the allocation mask. If userspace is racing a key
allocation with mprotect, it's already game over in terms of random
behaviour.

I've no idea what the frequency of pkey alloc/free is expected to be. If
it's really low then maybe it doesn't matter but if it's high this is
going to be a bottleneck later.
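
For comparison, a lighter-weight scheme could update the bitmap with a
compare-and-swap instead of taking a lock. This is only a userspace sketch
using C11 atomics, under the assumption that the allocation mask is the
only state that needs protecting. The names allocation_map,
pkey_alloc_lockless() and pkey_free_lockless() are hypothetical, and none
of this is proposed by the patch.

```c
#include <stdatomic.h>
#include <stdint.h>

#define NR_PKEYS 16	/* assumed number of keys for this sketch */

/* Hypothetical stand-in for the per-mm allocation map; pkey 0 starts allocated. */
static _Atomic uint32_t allocation_map = 0x1;

/* Claim the lowest free key with a CAS loop; returns -1 if none is free. */
static int pkey_alloc_lockless(void)
{
	uint32_t old = atomic_load(&allocation_map);

	for (;;) {
		int pkey = -1;

		for (int i = 0; i < NR_PKEYS; i++) {
			if (!(old & (UINT32_C(1) << i))) {
				pkey = i;
				break;
			}
		}
		if (pkey < 0)
			return -1;

		/* On failure 'old' is refreshed with the current map and we retry. */
		if (atomic_compare_exchange_weak(&allocation_map, &old,
						 old | (UINT32_C(1) << pkey)))
			return pkey;
	}
}

/* Clear the key's bit; returns -1 if it was not allocated or is pkey 0. */
static int pkey_free_lockless(int pkey)
{
	if (pkey <= 0 || pkey >= NR_PKEYS)
		return -1;

	uint32_t bit = UINT32_C(1) << pkey;
	uint32_t old = atomic_fetch_and(&allocation_map, ~bit);

	return (old & bit) ? 0 : -1;
}
```

Whether something like this is viable in the kernel depends on what else
mmap_sem is ordering against here, which this sketch does not model.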

> +
> +SYSCALL_DEFINE1(pkey_free, int, pkey)
> +{
> +	int ret;
> +
> +	down_write(&current->mm->mmap_sem);
> +	ret = mm_pkey_free(current->mm, pkey);
> +	up_write(&current->mm->mmap_sem);
> +
> +	/*
> +	 * We could provide warnings or errors if any VMA still
> +	 * has the pkey set here.
> +	 */
> +	return ret;
> +}
> _

-- 
Mel Gorman
SUSE Labs
