All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Hansen <dave@sr71.net>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: linux-kernel@vger.kernel.org, x86@kernel.org,
	linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-mm@kvack.org, torvalds@linux-foundation.org,
	akpm@linux-foundation.org, dave.hansen@linux.intel.com,
	arnd@arndb.de, hughd@google.com, viro@zeniv.linux.org.uk
Subject: Re: [PATCH 2/9] mm: implement new pkey_mprotect() system call
Date: Thu, 7 Jul 2016 09:51:52 -0700	[thread overview]
Message-ID: <577E88A8.8030909@sr71.net> (raw)
In-Reply-To: <20160707144031.GY11498@techsingularity.net>

On 07/07/2016 07:40 AM, Mel Gorman wrote:
> On Thu, Jul 07, 2016 at 05:47:22AM -0700, Dave Hansen wrote:
>> +#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
>>  static inline int vma_pkey(struct vm_area_struct *vma)
>>  {
>> -	u16 pkey = 0;
>> -#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
>>  	unsigned long vma_pkey_mask = VM_PKEY_BIT0 | VM_PKEY_BIT1 |
>>  				      VM_PKEY_BIT2 | VM_PKEY_BIT3;
>> -	pkey = (vma->vm_flags & vma_pkey_mask) >> VM_PKEY_SHIFT;
>> -#endif
>> -	return pkey;
>> +
>> +	return (vma->vm_flags & vma_pkey_mask) >> VM_PKEY_SHIFT;
>> +}
>> +#else
>> +static inline int vma_pkey(struct vm_area_struct *vma)
>> +{
>> +	return 0;
>>  }
>> +#endif
>>  
>>  static inline bool __pkru_allows_pkey(u16 pkey, bool write)
>>  {
> 
> Looks like MASK could have been statically defined and be a simple shift
> and mask known at compile time. Minor though.

The VM_PKEY_BIT*'s are only ever defined as masks and not bit numbers.
So, if you want to use a mask, you end up doing something like:

	unsigned long mask = (NR_PKEYS-1) << ffz(~VM_PKEY_BIT0);

Which ends up with the same thing, but I think ends up being pretty on
par for ugliness.

...
>> +/*
>> + * When setting a userspace-provided value, we need to ensure
>> + * that it is valid.  The __ version can get used by
>> + * kernel-internal uses like the execute-only support.
>> + */
>> +int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
>> +		unsigned long init_val)
>> +{
>> +	if (!validate_pkey(pkey))
>> +		return -EINVAL;
>> +	return __arch_set_user_pkey_access(tsk, pkey, init_val);
>> +}
> 
> There appears to be a subtle bug fixed for validate_key. It appears
> there wasn't protection of the dedicated key before but nothing could
> reach it.

Right.  There was no user interface that took a key and we trusted that
the kernel knew what it was doing.

> The arch_max_pkey and PKEY_DEDICATE_EXECUTE_ONLY interaction is subtle
> but I can't find a problem with it either.
> 
> That aside, the validate_pkey check looks weak. It might be a number
> that works but no guarantee it's an allocated key or initialised
> properly. At this point, garbage can be handed into the system call
> potentially but maybe that gets fixed later.

It's called in three paths:
1. by the kernel when setting up execute-only support
2. by pkey_alloc() on the pkey we just allocated
3. by pkey_set() on a pkey we just checked was allocated

So, it isn't broken, but it's also not clear at all why it is safe and
what validate_pkey() is actually validating.

But, that said, this does make me realize that with
pkey_alloc()/pkey_free(), this is probably redundant.  We verify that
the key is allocated, and we only allow valid keys to be allocated.

IOW, I think I can remove validate_pkey(), but only if we keep pkey_alloc().

...
>> -		newflags = calc_vm_prot_bits(prot, pkey);
>> +		new_vma_pkey = arch_override_mprotect_pkey(vma, prot, pkey);
>> +		newflags = calc_vm_prot_bits(prot, new_vma_pkey);
>>  		newflags |= (vma->vm_flags & ~(VM_READ | VM_WRITE | VM_EXEC));
>>  
> 
> On CPUs that do not support the feature, arch_override_mprotect_pkey
> returns 0 and the normal protections are used. It's not clear how an
> application is meant to detect if the operation succeeded or not. What
> if the application relies on pkeys to be working?

It actually shows up as -ENOSPC from pkey_alloc().  This sounds goofy,
but it teaches programs something very important: they always have to
look for ENOSPC, and must always be prepared to function without
protection keys.  A library might have stolen all the keys, or an
LD_PRELOAD, so an app can never be sure what is available.

If we teach them to check for ENOSPC from day one, they'll never be
surprised.

I've tried to spell this out a bit more clearly in the manpages.  I'll
also add it to the changelog.

WARNING: multiple messages have this Message-ID (diff)
From: Dave Hansen <dave@sr71.net>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: linux-kernel@vger.kernel.org, x86@kernel.org,
	linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-mm@kvack.org, torvalds@linux-foundation.org,
	akpm@linux-foundation.org, dave.hansen@linux.intel.com,
	arnd@arndb.de, hughd@google.com, viro@zeniv.linux.org.uk
Subject: Re: [PATCH 2/9] mm: implement new pkey_mprotect() system call
Date: Thu, 7 Jul 2016 09:51:52 -0700	[thread overview]
Message-ID: <577E88A8.8030909@sr71.net> (raw)
In-Reply-To: <20160707144031.GY11498@techsingularity.net>

On 07/07/2016 07:40 AM, Mel Gorman wrote:
> On Thu, Jul 07, 2016 at 05:47:22AM -0700, Dave Hansen wrote:
>> +#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
>>  static inline int vma_pkey(struct vm_area_struct *vma)
>>  {
>> -	u16 pkey = 0;
>> -#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
>>  	unsigned long vma_pkey_mask = VM_PKEY_BIT0 | VM_PKEY_BIT1 |
>>  				      VM_PKEY_BIT2 | VM_PKEY_BIT3;
>> -	pkey = (vma->vm_flags & vma_pkey_mask) >> VM_PKEY_SHIFT;
>> -#endif
>> -	return pkey;
>> +
>> +	return (vma->vm_flags & vma_pkey_mask) >> VM_PKEY_SHIFT;
>> +}
>> +#else
>> +static inline int vma_pkey(struct vm_area_struct *vma)
>> +{
>> +	return 0;
>>  }
>> +#endif
>>  
>>  static inline bool __pkru_allows_pkey(u16 pkey, bool write)
>>  {
> 
> Looks like MASK could have been statically defined and be a simple shift
> and mask known at compile time. Minor though.

The VM_PKEY_BIT*'s are only ever defined as masks and not bit numbers.
So, if you want to use a mask, you end up doing something like:

	unsigned long mask = (NR_PKEYS-1) << ffz(~VM_PKEY_BIT0);

Which ends up with the same thing, but I think ends up being pretty on
par for ugliness.

...
>> +/*
>> + * When setting a userspace-provided value, we need to ensure
>> + * that it is valid.  The __ version can get used by
>> + * kernel-internal uses like the execute-only support.
>> + */
>> +int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
>> +		unsigned long init_val)
>> +{
>> +	if (!validate_pkey(pkey))
>> +		return -EINVAL;
>> +	return __arch_set_user_pkey_access(tsk, pkey, init_val);
>> +}
> 
> There appears to be a subtle bug fixed for validate_key. It appears
> there wasn't protection of the dedicated key before but nothing could
> reach it.

Right.  There was no user interface that took a key and we trusted that
the kernel knew what it was doing.

> The arch_max_pkey and PKEY_DEDICATE_EXECUTE_ONLY interaction is subtle
> but I can't find a problem with it either.
> 
> That aside, the validate_pkey check looks weak. It might be a number
> that works but no guarantee it's an allocated key or initialised
> properly. At this point, garbage can be handed into the system call
> potentially but maybe that gets fixed later.

It's called in three paths:
1. by the kernel when setting up execute-only support
2. by pkey_alloc() on the pkey we just allocated
3. by pkey_set() on a pkey we just checked was allocated

So, it isn't broken, but it's also not clear at all why it is safe and
what validate_pkey() is actually validating.

But, that said, this does make me realize that with
pkey_alloc()/pkey_free(), this is probably redundant.  We verify that
the key is allocated, and we only allow valid keys to be allocated.

IOW, I think I can remove validate_pkey(), but only if we keep pkey_alloc().

...
>> -		newflags = calc_vm_prot_bits(prot, pkey);
>> +		new_vma_pkey = arch_override_mprotect_pkey(vma, prot, pkey);
>> +		newflags = calc_vm_prot_bits(prot, new_vma_pkey);
>>  		newflags |= (vma->vm_flags & ~(VM_READ | VM_WRITE | VM_EXEC));
>>  
> 
> On CPUs that do not support the feature, arch_override_mprotect_pkey
> returns 0 and the normal protections are used. It's not clear how an
> application is meant to detect if the operation succeeded or not. What
> if the application relies on pkeys to be working?

It actually shows up as -ENOSPC from pkey_alloc().  This sounds goofy,
but it teaches programs something very important: they always have to
look for ENOSPC, and must always be prepared to function without
protection keys.  A library might have stolen all the keys, or an
LD_PRELOAD, so an app can never be sure what is available.

If we teach them to check for ENOSPC from day one, they'll never be
surprised.

I've tried to spell this out a bit more clearly in the manpages.  I'll
also add it to the changelog.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2016-07-07 16:52 UTC|newest]

Thread overview: 108+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-07 12:47 [PATCH 0/9] [REVIEW-REQUEST] [v4] System Calls for Memory Protection Keys Dave Hansen
2016-07-07 12:47 ` Dave Hansen
2016-07-07 12:47 ` [PATCH 1/9] x86, pkeys: add fault handling for PF_PK page fault bit Dave Hansen
2016-07-07 12:47   ` Dave Hansen
2016-07-07 14:40   ` Mel Gorman
2016-07-07 14:40     ` Mel Gorman
2016-07-07 15:42     ` Dave Hansen
2016-07-07 15:42       ` Dave Hansen
2016-07-07 12:47 ` [PATCH 2/9] mm: implement new pkey_mprotect() system call Dave Hansen
2016-07-07 12:47   ` Dave Hansen
2016-07-07 14:40   ` Mel Gorman
2016-07-07 14:40     ` Mel Gorman
2016-07-07 16:51     ` Dave Hansen [this message]
2016-07-07 16:51       ` Dave Hansen
2016-07-08 10:15       ` Mel Gorman
2016-07-08 10:15         ` Mel Gorman
2016-07-07 12:47 ` [PATCH 3/9] x86, pkeys: make mprotect_key() mask off additional vm_flags Dave Hansen
2016-07-07 12:47   ` Dave Hansen
2016-07-07 12:47 ` [PATCH 4/9] x86: wire up mprotect_key() system call Dave Hansen
2016-07-07 12:47   ` Dave Hansen
2016-07-07 12:47 ` [PATCH 5/9] x86, pkeys: allocation/free syscalls Dave Hansen
2016-07-07 12:47   ` Dave Hansen
2016-07-07 14:40   ` Mel Gorman
2016-07-07 14:40     ` Mel Gorman
2016-07-07 15:38     ` Dave Hansen
2016-07-07 15:38       ` Dave Hansen
2016-07-07 12:47 ` [PATCH 6/9] x86, pkeys: add pkey set/get syscalls Dave Hansen
2016-07-07 12:47   ` Dave Hansen
2016-07-07 14:45   ` Mel Gorman
2016-07-07 14:45     ` Mel Gorman
2016-07-07 17:33     ` Dave Hansen
2016-07-07 17:33       ` Dave Hansen
2016-07-08  7:18       ` Ingo Molnar
2016-07-08  7:18         ` Ingo Molnar
2016-07-08 16:32         ` Dave Hansen
2016-07-08 16:32           ` Dave Hansen
2016-07-08 16:32           ` Dave Hansen
2016-07-09  8:37           ` Ingo Molnar
2016-07-09  8:37             ` Ingo Molnar
2016-07-09  8:37             ` Ingo Molnar
2016-07-11  4:25             ` Andy Lutomirski
2016-07-11  4:25               ` Andy Lutomirski
2016-07-11  7:35               ` Ingo Molnar
2016-07-11  7:35                 ` Ingo Molnar
2016-07-11  7:35                 ` Ingo Molnar
2016-07-11 14:28                 ` Dave Hansen
2016-07-11 14:28                   ` Dave Hansen
2016-07-12  7:13                   ` Ingo Molnar
2016-07-12  7:13                     ` Ingo Molnar
2016-07-12 15:39                     ` Dave Hansen
2016-07-12 15:39                       ` Dave Hansen
2016-07-11 14:50                 ` Andy Lutomirski
2016-07-11 14:50                   ` Andy Lutomirski
2016-07-11 14:34               ` Dave Hansen
2016-07-11 14:34                 ` Dave Hansen
2016-07-11 14:34                 ` Dave Hansen
2016-07-11 14:45                 ` Andy Lutomirski
2016-07-11 14:45                   ` Andy Lutomirski
2016-07-11 15:48                   ` Dave Hansen
2016-07-11 15:48                     ` Dave Hansen
2016-07-12 16:32                     ` Andy Lutomirski
2016-07-12 16:32                       ` Andy Lutomirski
2016-07-12 17:12                       ` Dave Hansen
2016-07-12 17:12                         ` Dave Hansen
2016-07-12 22:55                         ` Andy Lutomirski
2016-07-12 22:55                           ` Andy Lutomirski
2016-07-13  7:56                       ` Ingo Molnar
2016-07-13  7:56                         ` Ingo Molnar
2016-07-13 18:43                         ` Andy Lutomirski
2016-07-13 18:43                           ` Andy Lutomirski
2016-07-14  8:07                           ` Ingo Molnar
2016-07-14  8:07                             ` Ingo Molnar
2016-07-18  4:43                             ` Andy Lutomirski
2016-07-18  4:43                               ` Andy Lutomirski
2016-07-18  9:56                               ` Ingo Molnar
2016-07-18  9:56                                 ` Ingo Molnar
2016-07-18 18:02             ` Dave Hansen
2016-07-18 18:02               ` Dave Hansen
2016-07-18 18:02               ` Dave Hansen
2016-07-18 20:12             ` Dave Hansen
2016-07-18 20:12               ` Dave Hansen
2016-07-18 20:12               ` Dave Hansen
2016-07-08 19:26         ` Dave Hansen
2016-07-08 19:26           ` Dave Hansen
2016-07-08 10:22       ` Mel Gorman
2016-07-08 10:22         ` Mel Gorman
2016-07-08 10:22         ` Mel Gorman
2016-07-07 12:47 ` [PATCH 7/9] generic syscalls: wire up memory protection keys syscalls Dave Hansen
2016-07-07 12:47   ` Dave Hansen
2016-07-07 12:47 ` [PATCH 8/9] pkeys: add details of system call use to Documentation/ Dave Hansen
2016-07-07 12:47   ` Dave Hansen
2016-07-07 12:47 ` [PATCH 9/9] x86, pkeys: add self-tests Dave Hansen
2016-07-07 12:47   ` Dave Hansen
2016-07-07 14:47 ` [PATCH 0/9] [REVIEW-REQUEST] [v4] System Calls for Memory Protection Keys Mel Gorman
2016-07-07 14:47   ` Mel Gorman
2016-07-07 14:47   ` Mel Gorman
2016-07-08 18:38 ` Hugh Dickins
2016-07-08 18:38   ` Hugh Dickins
2016-07-08 18:38   ` Hugh Dickins
  -- strict thread matches above, loose matches on Subject: below --
2016-06-09  0:01 [PATCH 0/9] [v3] " Dave Hansen
2016-06-09  0:01 ` [PATCH 2/9] mm: implement new pkey_mprotect() system call Dave Hansen
2016-06-09  0:01   ` Dave Hansen
2016-06-11  9:47   ` Thomas Gleixner
2016-06-11  9:47     ` Thomas Gleixner
2016-06-13 16:03     ` Dave Hansen
2016-06-13 16:03       ` Dave Hansen
2016-06-13 16:03       ` Dave Hansen
2016-06-07 20:47 [PATCH 0/9] [v2] System Calls for Memory Protection Keys Dave Hansen
2016-06-07 20:47 ` [PATCH 2/9] mm: implement new pkey_mprotect() system call Dave Hansen
2016-06-07 20:47   ` Dave Hansen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=577E88A8.8030909@sr71.net \
    --to=dave@sr71.net \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=hughd@google.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.