All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mm/mmap: Map MAP_STACK to VM_STACK
@ 2023-04-18 21:02 Waiman Long
  2023-04-18 21:18 ` Andrew Morton
  0 siblings, 1 reply; 13+ messages in thread
From: Waiman Long @ 2023-04-18 21:02 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Joe Mario, Barry Marson, Rafael Aquini,
	Waiman Long

One of the flags of mmap(2) is MAP_STACK to request a memory segment
suitable for a process or thread stack. The kernel currently ignores
this flags. Glibc uses MAP_STACK when mmapping a thread stack. However,
selinux has an execstack check in selinux_file_mprotect() which disallows
a stack VMA to be made executable.

Since MAP_STACK is a noop, it is possible for a stack VMA to be merged
with an adjacent anonymous VMA. With that merging, using mprotect(2)
to change a part of the merged anonymous VMA to make it executable may
fail. This can lead to sporadic failure of applications that need to
make those changes.

One possible fix is to make sure that a stack VMA will not be merged
with a non-stack anonymous VMA. That requires a vm flag that can be
used to distinguish a stack VMA from a regular anonymous VMA. One
can add a new dummy vm flag for that purpose. However, there is only
1 bit left in the lower 32 bits of vm_flags. Another alternative is
to use an existing vm flag. VM_STACK (= VM_GROWSDOWN for most arches)
can certainly be used for this purpose. The downside is that it is a
slight change in existing behavior.

Making a stack VMA growable by default certainly fits the need of a
process or thread stack. This patch now maps MAP_STACK to VM_STACK to
prevent unwanted merging with adjacent non-stack VMAs and make the VMA
more suitable for being used as a stack.

Reported-by: Joe Mario <jmario@redhat.com>
Signed-off-by: Waiman Long <longman@redhat.com>
---
 include/linux/mman.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/mman.h b/include/linux/mman.h
index cee1e4b566d8..4b621d30ace9 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -152,6 +152,7 @@ calc_vm_flag_bits(unsigned long flags)
 	return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
 	       _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
 	       _calc_vm_trans(flags, MAP_SYNC,	     VM_SYNC      ) |
+	       _calc_vm_trans(flags, MAP_STACK,	     VM_STACK     ) |
 	       arch_calc_vm_flag_bits(flags);
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm/mmap: Map MAP_STACK to VM_STACK
  2023-04-18 21:02 [PATCH] mm/mmap: Map MAP_STACK to VM_STACK Waiman Long
@ 2023-04-18 21:18 ` Andrew Morton
  2023-04-19  1:16   ` Waiman Long
  2023-04-19 23:21   ` Jane Chu
  0 siblings, 2 replies; 13+ messages in thread
From: Andrew Morton @ 2023-04-18 21:18 UTC (permalink / raw)
  To: Waiman Long
  Cc: linux-mm, linux-kernel, Joe Mario, Barry Marson, Rafael Aquini

On Tue, 18 Apr 2023 17:02:30 -0400 Waiman Long <longman@redhat.com> wrote:

> One of the flags of mmap(2) is MAP_STACK to request a memory segment
> suitable for a process or thread stack. The kernel currently ignores
> this flags. Glibc uses MAP_STACK when mmapping a thread stack. However,
> selinux has an execstack check in selinux_file_mprotect() which disallows
> a stack VMA to be made executable.
> 
> Since MAP_STACK is a noop, it is possible for a stack VMA to be merged
> with an adjacent anonymous VMA. With that merging, using mprotect(2)
> to change a part of the merged anonymous VMA to make it executable may
> fail. This can lead to sporadic failure of applications that need to
> make those changes.

"Sporadic failure of applications" sounds quite serious.  Can you
provide more details?

Did you consider a -stable backport?  I'm unable to judge, because the
description of the userspace effects is so thin,

> One possible fix is to make sure that a stack VMA will not be merged
> with a non-stack anonymous VMA. That requires a vm flag that can be
> used to distinguish a stack VMA from a regular anonymous VMA. One
> can add a new dummy vm flag for that purpose. However, there is only
> 1 bit left in the lower 32 bits of vm_flags. Another alternative is
> to use an existing vm flag. VM_STACK (= VM_GROWSDOWN for most arches)
> can certainly be used for this purpose. The downside is that it is a
> slight change in existing behavior.
> 
> Making a stack VMA growable by default certainly fits the need of a
> process or thread stack. This patch now maps MAP_STACK to VM_STACK to
> prevent unwanted merging with adjacent non-stack VMAs and make the VMA
> more suitable for being used as a stack.
> 
> ...
>
> --- a/include/linux/mman.h
> +++ b/include/linux/mman.h
> @@ -152,6 +152,7 @@ calc_vm_flag_bits(unsigned long flags)
>  	return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
>  	       _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
>  	       _calc_vm_trans(flags, MAP_SYNC,	     VM_SYNC      ) |
> +	       _calc_vm_trans(flags, MAP_STACK,	     VM_STACK     ) |
>  	       arch_calc_vm_flag_bits(flags);
>  }

The mmap(2) manpage says

  This flag is currently a no-op on Linux.  However, by employing
  this flag, applications can ensure that they transparently ob- tain
  support if the flag is implemented in the future.  Thus, it is used
  in the glibc threading implementation to allow for the fact that some
  architectures may (later) require special treat- ment for stack
  allocations.  A further reason to employ this flag is portability:
  MAP_STACK exists (and has an effect) on some other systems (e.g.,
  some of the BSDs).

so please propose an update for this?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm/mmap: Map MAP_STACK to VM_STACK
  2023-04-18 21:18 ` Andrew Morton
@ 2023-04-19  1:16   ` Waiman Long
  2023-04-19  1:36     ` Hugh Dickins
  2023-04-19  3:46     ` Matthew Wilcox
  2023-04-19 23:21   ` Jane Chu
  1 sibling, 2 replies; 13+ messages in thread
From: Waiman Long @ 2023-04-19  1:16 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Joe Mario, Barry Marson, Rafael Aquini


On 4/18/23 17:18, Andrew Morton wrote:
> On Tue, 18 Apr 2023 17:02:30 -0400 Waiman Long <longman@redhat.com> wrote:
>
>> One of the flags of mmap(2) is MAP_STACK to request a memory segment
>> suitable for a process or thread stack. The kernel currently ignores
>> this flags. Glibc uses MAP_STACK when mmapping a thread stack. However,
>> selinux has an execstack check in selinux_file_mprotect() which disallows
>> a stack VMA to be made executable.
>>
>> Since MAP_STACK is a noop, it is possible for a stack VMA to be merged
>> with an adjacent anonymous VMA. With that merging, using mprotect(2)
>> to change a part of the merged anonymous VMA to make it executable may
>> fail. This can lead to sporadic failure of applications that need to
>> make those changes.
> "Sporadic failure of applications" sounds quite serious.  Can you
> provide more details?

The problem boils down to the fact that it is possible for user code to 
mmap a region of memory and then for the kernel to merge the VMA for 
that memory with the VMA for one of the application's thread stacks. 
This is causing random SEGVs with one of our large customer application.

At a high level, this is what's happening:

  1) App runs creating lots of threads.
  2) It mmap's 256K pages of anonymous memory.
  3) It writes executable code to that memory.
  4) It calls mprotect() with PROT_EXEC on that memory so
     it can subsequently execute the code.

The above mprotect() will fail if the mmap'd region's VMA gets merged 
with the VMA for one of the thread stacks.  That's because the default 
RHEL SELinux policy is to not allow executable stacks.

>
> Did you consider a -stable backport?  I'm unable to judge, because the
> description of the userspace effects is so thin,

Yes, stable backport can be considered.


>
>> One possible fix is to make sure that a stack VMA will not be merged
>> with a non-stack anonymous VMA. That requires a vm flag that can be
>> used to distinguish a stack VMA from a regular anonymous VMA. One
>> can add a new dummy vm flag for that purpose. However, there is only
>> 1 bit left in the lower 32 bits of vm_flags. Another alternative is
>> to use an existing vm flag. VM_STACK (= VM_GROWSDOWN for most arches)
>> can certainly be used for this purpose. The downside is that it is a
>> slight change in existing behavior.
>>
>> Making a stack VMA growable by default certainly fits the need of a
>> process or thread stack. This patch now maps MAP_STACK to VM_STACK to
>> prevent unwanted merging with adjacent non-stack VMAs and make the VMA
>> more suitable for being used as a stack.
>>
>> ...
>>
>> --- a/include/linux/mman.h
>> +++ b/include/linux/mman.h
>> @@ -152,6 +152,7 @@ calc_vm_flag_bits(unsigned long flags)
>>   	return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
>>   	       _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
>>   	       _calc_vm_trans(flags, MAP_SYNC,	     VM_SYNC      ) |
>> +	       _calc_vm_trans(flags, MAP_STACK,	     VM_STACK     ) |
>>   	       arch_calc_vm_flag_bits(flags);
>>   }
> The mmap(2) manpage says
>
>    This flag is currently a no-op on Linux.  However, by employing
>    this flag, applications can ensure that they transparently ob- tain
>    support if the flag is implemented in the future.  Thus, it is used
>    in the glibc threading implementation to allow for the fact that some
>    architectures may (later) require special treat- ment for stack
>    allocations.  A further reason to employ this flag is portability:
>    MAP_STACK exists (and has an effect) on some other systems (e.g.,
>    some of the BSDs).
>
> so please propose an update for this?

OK, will do.

Thanks,
Longman


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm/mmap: Map MAP_STACK to VM_STACK
  2023-04-19  1:16   ` Waiman Long
@ 2023-04-19  1:36     ` Hugh Dickins
  2023-04-19  1:45       ` Waiman Long
  2023-04-19  3:46     ` Matthew Wilcox
  1 sibling, 1 reply; 13+ messages in thread
From: Hugh Dickins @ 2023-04-19  1:36 UTC (permalink / raw)
  To: Waiman Long
  Cc: Andrew Morton, linux-mm, linux-kernel, Joe Mario, Barry Marson,
	Rafael Aquini

[-- Attachment #1: Type: text/plain, Size: 2036 bytes --]

On Tue, 18 Apr 2023, Waiman Long wrote:
> On 4/18/23 17:18, Andrew Morton wrote:
> > On Tue, 18 Apr 2023 17:02:30 -0400 Waiman Long <longman@redhat.com> wrote:
> >
> >> One of the flags of mmap(2) is MAP_STACK to request a memory segment
> >> suitable for a process or thread stack. The kernel currently ignores
> >> this flags. Glibc uses MAP_STACK when mmapping a thread stack. However,
> >> selinux has an execstack check in selinux_file_mprotect() which disallows
> >> a stack VMA to be made executable.
> >>
> >> Since MAP_STACK is a noop, it is possible for a stack VMA to be merged
> >> with an adjacent anonymous VMA. With that merging, using mprotect(2)
> >> to change a part of the merged anonymous VMA to make it executable may
> >> fail. This can lead to sporadic failure of applications that need to
> >> make those changes.
> > "Sporadic failure of applications" sounds quite serious.  Can you
> > provide more details?
> 
> The problem boils down to the fact that it is possible for user code to mmap a
> region of memory and then for the kernel to merge the VMA for that memory with
> the VMA for one of the application's thread stacks. This is causing random
> SEGVs with one of our large customer application.
> 
> At a high level, this is what's happening:
> 
>  1) App runs creating lots of threads.
>  2) It mmap's 256K pages of anonymous memory.
>  3) It writes executable code to that memory.
>  4) It calls mprotect() with PROT_EXEC on that memory so
>     it can subsequently execute the code.
> 
> The above mprotect() will fail if the mmap'd region's VMA gets merged with the
> VMA for one of the thread stacks.  That's because the default RHEL SELinux
> policy is to not allow executable stacks.

Then wouldn't the bug be at the SELinux end?  VMAs may have been merged
already, but the mprotect() with PROT_EXEC of the good non-stack range
will then split that area off from the stack again - maybe the SELinux
check does not understand that must happen?

Hugh

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm/mmap: Map MAP_STACK to VM_STACK
  2023-04-19  1:36     ` Hugh Dickins
@ 2023-04-19  1:45       ` Waiman Long
  2023-04-19  3:24         ` Matthew Wilcox
  0 siblings, 1 reply; 13+ messages in thread
From: Waiman Long @ 2023-04-19  1:45 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, linux-mm, linux-kernel, Joe Mario, Barry Marson,
	Rafael Aquini


On 4/18/23 21:36, Hugh Dickins wrote:
> On Tue, 18 Apr 2023, Waiman Long wrote:
>> On 4/18/23 17:18, Andrew Morton wrote:
>>> On Tue, 18 Apr 2023 17:02:30 -0400 Waiman Long <longman@redhat.com> wrote:
>>>
>>>> One of the flags of mmap(2) is MAP_STACK to request a memory segment
>>>> suitable for a process or thread stack. The kernel currently ignores
>>>> this flags. Glibc uses MAP_STACK when mmapping a thread stack. However,
>>>> selinux has an execstack check in selinux_file_mprotect() which disallows
>>>> a stack VMA to be made executable.
>>>>
>>>> Since MAP_STACK is a noop, it is possible for a stack VMA to be merged
>>>> with an adjacent anonymous VMA. With that merging, using mprotect(2)
>>>> to change a part of the merged anonymous VMA to make it executable may
>>>> fail. This can lead to sporadic failure of applications that need to
>>>> make those changes.
>>> "Sporadic failure of applications" sounds quite serious.  Can you
>>> provide more details?
>> The problem boils down to the fact that it is possible for user code to mmap a
>> region of memory and then for the kernel to merge the VMA for that memory with
>> the VMA for one of the application's thread stacks. This is causing random
>> SEGVs with one of our large customer application.
>>
>> At a high level, this is what's happening:
>>
>>   1) App runs creating lots of threads.
>>   2) It mmap's 256K pages of anonymous memory.
>>   3) It writes executable code to that memory.
>>   4) It calls mprotect() with PROT_EXEC on that memory so
>>      it can subsequently execute the code.
>>
>> The above mprotect() will fail if the mmap'd region's VMA gets merged with the
>> VMA for one of the thread stacks.  That's because the default RHEL SELinux
>> policy is to not allow executable stacks.
> Then wouldn't the bug be at the SELinux end?  VMAs may have been merged
> already, but the mprotect() with PROT_EXEC of the good non-stack range
> will then split that area off from the stack again - maybe the SELinux
> check does not understand that must happen?

The SELinux check is done per VMA, not a region within a VMA. After VMA 
merging, SELinux is probably not able to determine which part of a VMA 
is a stack unless we keep that information somewhere and provide an API 
for SELinux to query. That can be quite a lot of work. So the easiest 
way to prevent this problem is to avoid merging a stack VMA with a 
regular anonymous VMA.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm/mmap: Map MAP_STACK to VM_STACK
  2023-04-19  1:45       ` Waiman Long
@ 2023-04-19  3:24         ` Matthew Wilcox
  2023-04-19 14:38           ` Paul Moore
  0 siblings, 1 reply; 13+ messages in thread
From: Matthew Wilcox @ 2023-04-19  3:24 UTC (permalink / raw)
  To: Waiman Long
  Cc: Hugh Dickins, Andrew Morton, linux-mm, linux-kernel, Joe Mario,
	Barry Marson, Rafael Aquini, Paul Moore, Stephen Smalley,
	Eric Paris, selinux, James Morris, Serge E. Hallyn,
	linux-security-module

On Tue, Apr 18, 2023 at 09:45:34PM -0400, Waiman Long wrote:
> 
> On 4/18/23 21:36, Hugh Dickins wrote:
> > On Tue, 18 Apr 2023, Waiman Long wrote:
> > > On 4/18/23 17:18, Andrew Morton wrote:
> > > > On Tue, 18 Apr 2023 17:02:30 -0400 Waiman Long <longman@redhat.com> wrote:
> > > > 
> > > > > One of the flags of mmap(2) is MAP_STACK to request a memory segment
> > > > > suitable for a process or thread stack. The kernel currently ignores
> > > > > this flags. Glibc uses MAP_STACK when mmapping a thread stack. However,
> > > > > selinux has an execstack check in selinux_file_mprotect() which disallows
> > > > > a stack VMA to be made executable.
> > > > > 
> > > > > Since MAP_STACK is a noop, it is possible for a stack VMA to be merged
> > > > > with an adjacent anonymous VMA. With that merging, using mprotect(2)
> > > > > to change a part of the merged anonymous VMA to make it executable may
> > > > > fail. This can lead to sporadic failure of applications that need to
> > > > > make those changes.
> > > > "Sporadic failure of applications" sounds quite serious.  Can you
> > > > provide more details?
> > > The problem boils down to the fact that it is possible for user code to mmap a
> > > region of memory and then for the kernel to merge the VMA for that memory with
> > > the VMA for one of the application's thread stacks. This is causing random
> > > SEGVs with one of our large customer application.
> > > 
> > > At a high level, this is what's happening:
> > > 
> > >   1) App runs creating lots of threads.
> > >   2) It mmap's 256K pages of anonymous memory.
> > >   3) It writes executable code to that memory.
> > >   4) It calls mprotect() with PROT_EXEC on that memory so
> > >      it can subsequently execute the code.
> > > 
> > > The above mprotect() will fail if the mmap'd region's VMA gets merged with the
> > > VMA for one of the thread stacks.  That's because the default RHEL SELinux
> > > policy is to not allow executable stacks.
> > Then wouldn't the bug be at the SELinux end?  VMAs may have been merged
> > already, but the mprotect() with PROT_EXEC of the good non-stack range
> > will then split that area off from the stack again - maybe the SELinux
> > check does not understand that must happen?
> 
> The SELinux check is done per VMA, not a region within a VMA. After VMA
> merging, SELinux is probably not able to determine which part of a VMA is a
> stack unless we keep that information somewhere and provide an API for
> SELinux to query. That can be quite a lot of work. So the easiest way to
> prevent this problem is to avoid merging a stack VMA with a regular
> anonymous VMA.

To paraphrase you, "Yes, SELinux is buggy, but we don't want to fix it".

Cc'ing the SELinux people so it can be fixed properly.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm/mmap: Map MAP_STACK to VM_STACK
  2023-04-19  1:16   ` Waiman Long
  2023-04-19  1:36     ` Hugh Dickins
@ 2023-04-19  3:46     ` Matthew Wilcox
  2023-04-19 15:07       ` Waiman Long
  1 sibling, 1 reply; 13+ messages in thread
From: Matthew Wilcox @ 2023-04-19  3:46 UTC (permalink / raw)
  To: Waiman Long
  Cc: Andrew Morton, linux-mm, linux-kernel, Joe Mario, Barry Marson,
	Rafael Aquini

On Tue, Apr 18, 2023 at 09:16:37PM -0400, Waiman Long wrote:
>  1) App runs creating lots of threads.
>  2) It mmap's 256K pages of anonymous memory.
>  3) It writes executable code to that memory.
>  4) It calls mprotect() with PROT_EXEC on that memory so
>     it can subsequently execute the code.
> 
> The above mprotect() will fail if the mmap'd region's VMA gets merged with
> the VMA for one of the thread stacks.  That's because the default RHEL
> SELinux policy is to not allow executable stacks.

By the way, this is a daft policy.  The policy you really want is
EXEC|WRITE is not allowed.  A non-writable stack is useless, so it's
actually a superset of your current policy.  Forbidding _simultaneous_
write and executable is just good programming.  This way, you don't need
to care about the underlying VMA's current permissions, you just need
to do:

	if ((prot & (PROT_EXEC|PROT_WRITE)) == (PROT_EXEC|PROT_WRITE))
		return -EACCESS;


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm/mmap: Map MAP_STACK to VM_STACK
  2023-04-19  3:24         ` Matthew Wilcox
@ 2023-04-19 14:38           ` Paul Moore
  0 siblings, 0 replies; 13+ messages in thread
From: Paul Moore @ 2023-04-19 14:38 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Waiman Long, Hugh Dickins, Andrew Morton, linux-mm, linux-kernel,
	Joe Mario, Barry Marson, Rafael Aquini, Stephen Smalley,
	Eric Paris, selinux, James Morris, Serge E. Hallyn,
	linux-security-module

On Tue, Apr 18, 2023 at 11:24 PM Matthew Wilcox <willy@infradead.org> wrote:
> On Tue, Apr 18, 2023 at 09:45:34PM -0400, Waiman Long wrote:
> > On 4/18/23 21:36, Hugh Dickins wrote:
> > > On Tue, 18 Apr 2023, Waiman Long wrote:
> > > > On 4/18/23 17:18, Andrew Morton wrote:
> > > > > On Tue, 18 Apr 2023 17:02:30 -0400 Waiman Long <longman@redhat.com> wrote:
> > > > >
> > > > > > One of the flags of mmap(2) is MAP_STACK to request a memory segment
> > > > > > suitable for a process or thread stack. The kernel currently ignores
> > > > > > this flags. Glibc uses MAP_STACK when mmapping a thread stack. However,
> > > > > > selinux has an execstack check in selinux_file_mprotect() which disallows
> > > > > > a stack VMA to be made executable.
> > > > > >
> > > > > > Since MAP_STACK is a noop, it is possible for a stack VMA to be merged
> > > > > > with an adjacent anonymous VMA. With that merging, using mprotect(2)
> > > > > > to change a part of the merged anonymous VMA to make it executable may
> > > > > > fail. This can lead to sporadic failure of applications that need to
> > > > > > make those changes.
> > > > > "Sporadic failure of applications" sounds quite serious.  Can you
> > > > > provide more details?
> > > > The problem boils down to the fact that it is possible for user code to mmap a
> > > > region of memory and then for the kernel to merge the VMA for that memory with
> > > > the VMA for one of the application's thread stacks. This is causing random
> > > > SEGVs with one of our large customer application.
> > > >
> > > > At a high level, this is what's happening:
> > > >
> > > >   1) App runs creating lots of threads.
> > > >   2) It mmap's 256K pages of anonymous memory.
> > > >   3) It writes executable code to that memory.
> > > >   4) It calls mprotect() with PROT_EXEC on that memory so
> > > >      it can subsequently execute the code.
> > > >
> > > > The above mprotect() will fail if the mmap'd region's VMA gets merged with the
> > > > VMA for one of the thread stacks.  That's because the default RHEL SELinux
> > > > policy is to not allow executable stacks.
> > > Then wouldn't the bug be at the SELinux end?  VMAs may have been merged
> > > already, but the mprotect() with PROT_EXEC of the good non-stack range
> > > will then split that area off from the stack again - maybe the SELinux
> > > check does not understand that must happen?
> >
> > The SELinux check is done per VMA, not a region within a VMA. After VMA
> > merging, SELinux is probably not able to determine which part of a VMA is a
> > stack unless we keep that information somewhere and provide an API for
> > SELinux to query. That can be quite a lot of work. So the easiest way to
> > prevent this problem is to avoid merging a stack VMA with a regular
> > anonymous VMA.
>
> To paraphrase you, "Yes, SELinux is buggy, but we don't want to fix it".
>
> Cc'ing the SELinux people so it can be fixed properly.

SELinux needs some way to determine what memory region is currently
being used by an application's stacks.  The current logic can be found
in selinux_file_mprotect(), the relevant snippet is below:

int selinux_file_mprotect(struct vm_area_struct *vma, ...)
{
  ...
  } else if (!vma->vm_file &&
    ((vma->vm_start <= vma->vm_mm->start_stack &&
      vma->vm_end >= vma->vm_mm->start_stack) ||
    vma_is_stack_for_current(vma))) {
      rc = avc_has_perm(&selinux_state,
                        sid, sid, SECCLASS_PROCESS,
                        PROCESS__EXECSTACK, NULL);
 }
  ...
}

If someone has a better, more foolproof way to determine an
application's stack please let us know, or better yet submit a patch
:)

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm/mmap: Map MAP_STACK to VM_STACK
  2023-04-19  3:46     ` Matthew Wilcox
@ 2023-04-19 15:07       ` Waiman Long
  2023-04-19 15:09         ` Matthew Wilcox
  0 siblings, 1 reply; 13+ messages in thread
From: Waiman Long @ 2023-04-19 15:07 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, linux-mm, linux-kernel, Joe Mario, Barry Marson,
	Rafael Aquini

On 4/18/23 23:46, Matthew Wilcox wrote:
> On Tue, Apr 18, 2023 at 09:16:37PM -0400, Waiman Long wrote:
>>   1) App runs creating lots of threads.
>>   2) It mmap's 256K pages of anonymous memory.
>>   3) It writes executable code to that memory.
>>   4) It calls mprotect() with PROT_EXEC on that memory so
>>      it can subsequently execute the code.
>>
>> The above mprotect() will fail if the mmap'd region's VMA gets merged with
>> the VMA for one of the thread stacks.  That's because the default RHEL
>> SELinux policy is to not allow executable stacks.
> By the way, this is a daft policy.  The policy you really want is
> EXEC|WRITE is not allowed.  A non-writable stack is useless, so it's
> actually a superset of your current policy.  Forbidding _simultaneous_
> write and executable is just good programming.  This way, you don't need
> to care about the underlying VMA's current permissions, you just need
> to do:
>
> 	if ((prot & (PROT_EXEC|PROT_WRITE)) == (PROT_EXEC|PROT_WRITE))
> 		return -EACCESS;

I am not totally sure if the application changes the VMA to read-only 
first. Even if it does that, it highlights another possible issue when 
an anonymous VMA is merged with a stack VMA. Either the mprotect() to 
write-protect the VMA will fail or the application will segfault if it 
writes stuff to the stack. This particular issue is not related to 
SELinux. It provides another good idea why we should avoid merging stack 
VMA to anonymous VMA.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm/mmap: Map MAP_STACK to VM_STACK
  2023-04-19 15:07       ` Waiman Long
@ 2023-04-19 15:09         ` Matthew Wilcox
  2023-04-19 16:00           ` Joe Mario
  0 siblings, 1 reply; 13+ messages in thread
From: Matthew Wilcox @ 2023-04-19 15:09 UTC (permalink / raw)
  To: Waiman Long
  Cc: Andrew Morton, linux-mm, linux-kernel, Joe Mario, Barry Marson,
	Rafael Aquini

On Wed, Apr 19, 2023 at 11:07:04AM -0400, Waiman Long wrote:
> On 4/18/23 23:46, Matthew Wilcox wrote:
> > On Tue, Apr 18, 2023 at 09:16:37PM -0400, Waiman Long wrote:
> > >   1) App runs creating lots of threads.
> > >   2) It mmap's 256K pages of anonymous memory.
> > >   3) It writes executable code to that memory.
> > >   4) It calls mprotect() with PROT_EXEC on that memory so
> > >      it can subsequently execute the code.
> > > 
> > > The above mprotect() will fail if the mmap'd region's VMA gets merged with
> > > the VMA for one of the thread stacks.  That's because the default RHEL
> > > SELinux policy is to not allow executable stacks.
> > By the way, this is a daft policy.  The policy you really want is
> > EXEC|WRITE is not allowed.  A non-writable stack is useless, so it's
> > actually a superset of your current policy.  Forbidding _simultaneous_
> > write and executable is just good programming.  This way, you don't need
> > to care about the underlying VMA's current permissions, you just need
> > to do:
> > 
> > 	if ((prot & (PROT_EXEC|PROT_WRITE)) == (PROT_EXEC|PROT_WRITE))
> > 		return -EACCESS;
> 
> I am not totally sure if the application changes the VMA to read-only first.
> Even if it does that, it highlights another possible issue when an anonymous
> VMA is merged with a stack VMA. Either the mprotect() to write-protect the
> VMA will fail or the application will segfault if it writes stuff to the
> stack. This particular issue is not related to SELinux. It provides another
> good idea why we should avoid merging stack VMA to anonymous VMA.

mprotect will split the VMA into two VMAs, one that is
PROT_READ|PROT_WRITE and one the is PROT_READ|PROT_EXEC.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm/mmap: Map MAP_STACK to VM_STACK
  2023-04-19 15:09         ` Matthew Wilcox
@ 2023-04-19 16:00           ` Joe Mario
  0 siblings, 0 replies; 13+ messages in thread
From: Joe Mario @ 2023-04-19 16:00 UTC (permalink / raw)
  To: Matthew Wilcox, Waiman Long
  Cc: Andrew Morton, linux-mm, linux-kernel, Barry Marson, Rafael Aquini



On 4/19/23 11:09 AM, Matthew Wilcox wrote:
> On Wed, Apr 19, 2023 at 11:07:04AM -0400, Waiman Long wrote:
>> On 4/18/23 23:46, Matthew Wilcox wrote:
>>> On Tue, Apr 18, 2023 at 09:16:37PM -0400, Waiman Long wrote:
>>>>   1) App runs creating lots of threads.
>>>>   2) It mmap's 256K pages of anonymous memory.
>>>>   3) It writes executable code to that memory.
>>>>   4) It calls mprotect() with PROT_EXEC on that memory so
>>>>      it can subsequently execute the code.
>>>>
>>>> The above mprotect() will fail if the mmap'd region's VMA gets merged with
>>>> the VMA for one of the thread stacks.  That's because the default RHEL
>>>> SELinux policy is to not allow executable stacks.
>>> By the way, this is a daft policy.  The policy you really want is
>>> EXEC|WRITE is not allowed.  A non-writable stack is useless, so it's
>>> actually a superset of your current policy.  Forbidding _simultaneous_
>>> write and executable is just good programming.  This way, you don't need
>>> to care about the underlying VMA's current permissions, you just need
>>> to do:
>>>
>>> 	if ((prot & (PROT_EXEC|PROT_WRITE)) == (PROT_EXEC|PROT_WRITE))
>>> 		return -EACCESS;
>>
>> I am not totally sure if the application changes the VMA to read-only first.
>> Even if it does that, it highlights another possible issue when an anonymous
>> VMA is merged with a stack VMA. Either the mprotect() to write-protect the
>> VMA will fail or the application will segfault if it writes stuff to the
>> stack. This particular issue is not related to SELinux. It provides another
>> good idea why we should avoid merging stack VMA to anonymous VMA.
> 
> mprotect will split the VMA into two VMAs, one that is
> PROT_READ|PROT_WRITE and one the is PROT_READ|PROT_EXEC.
> 

But in this case, the latter still has PROT_WRITE.  

This was reported by a large data analytics customer.  They started getting infrequent random crashes in code they haven't touched in 10 years.

One of the threads in their program mmaps a large region using PROT_READ|PROT_WRITE, and that region just happens to be merged with the thread's stack.

Then they copy a small snipit of code to a location somewhere within that mapped region. For the one page that contains that code, they mprotect it to PROT_READ|PROT_WRITE|PROT_EXEC.  I recall they're still reading and writing data elsewhere on that page.

Joe




  


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm/mmap: Map MAP_STACK to VM_STACK
  2023-04-18 21:18 ` Andrew Morton
  2023-04-19  1:16   ` Waiman Long
@ 2023-04-19 23:21   ` Jane Chu
  2023-04-20  0:00     ` Jane Chu
  1 sibling, 1 reply; 13+ messages in thread
From: Jane Chu @ 2023-04-19 23:21 UTC (permalink / raw)
  To: Andrew Morton, Waiman Long
  Cc: linux-mm, linux-kernel, Joe Mario, Barry Marson, Rafael Aquini

On 4/18/2023 2:18 PM, Andrew Morton wrote:
> On Tue, 18 Apr 2023 17:02:30 -0400 Waiman Long <longman@redhat.com> wrote:
[..]
>> ...
>>
>> --- a/include/linux/mman.h
>> +++ b/include/linux/mman.h
>> @@ -152,6 +152,7 @@ calc_vm_flag_bits(unsigned long flags)
>>   	return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
>>   	       _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
>>   	       _calc_vm_trans(flags, MAP_SYNC,	     VM_SYNC      ) |
>> +	       _calc_vm_trans(flags, MAP_STACK,	     VM_STACK     ) |
>>   	       arch_calc_vm_flag_bits(flags);
>>   }
> 
> The mmap(2) manpage says
> 
>    This flag is currently a no-op on Linux.  However, by employing
>    this flag, applications can ensure that they transparently ob- tain
>    support if the flag is implemented in the future.  Thus, it is used
>    in the glibc threading implementation to allow for the fact that some
>    architectures may (later) require special treat- ment for stack
>    allocations.  A further reason to employ this flag is portability:
>    MAP_STACK exists (and has an effect) on some other systems (e.g.,
>    some of the BSDs).
> 
> so please propose an update for this?
> 

Just curious, why isn't MAP_STACK implemented in Linux kernel? what does 
it take to implement it?

Also, could there be other potential issue with the vma merge, such as, 
the user process start to truncate half of the anonymous memory vma 
range oblivious to the fact that the vma has 'grown' into its stack and 
it might be attempting to unmap some of its stack range?

If the vma merge is otherwise harmless, does it bring benefit other than 
being one vma less?

thanks!
-jane


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm/mmap: Map MAP_STACK to VM_STACK
  2023-04-19 23:21   ` Jane Chu
@ 2023-04-20  0:00     ` Jane Chu
  0 siblings, 0 replies; 13+ messages in thread
From: Jane Chu @ 2023-04-20  0:00 UTC (permalink / raw)
  To: Andrew Morton, Waiman Long
  Cc: linux-mm, linux-kernel, Joe Mario, Barry Marson, Rafael Aquini



On 4/19/2023 4:21 PM, Jane Chu wrote:
> On 4/18/2023 2:18 PM, Andrew Morton wrote:
>> On Tue, 18 Apr 2023 17:02:30 -0400 Waiman Long <longman@redhat.com> 
>> wrote:
> [..]
>>> ...
>>>
>>> --- a/include/linux/mman.h
>>> +++ b/include/linux/mman.h
>>> @@ -152,6 +152,7 @@ calc_vm_flag_bits(unsigned long flags)
>>>       return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
>>>              _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
>>>              _calc_vm_trans(flags, MAP_SYNC,         VM_SYNC      ) |
>>> +           _calc_vm_trans(flags, MAP_STACK,         VM_STACK     ) |
>>>              arch_calc_vm_flag_bits(flags);
>>>   }
>>
>> The mmap(2) manpage says
>>
>>    This flag is currently a no-op on Linux.  However, by employing
>>    this flag, applications can ensure that they transparently ob- tain
>>    support if the flag is implemented in the future.  Thus, it is used
>>    in the glibc threading implementation to allow for the fact that some
>>    architectures may (later) require special treat- ment for stack
>>    allocations.  A further reason to employ this flag is portability:
>>    MAP_STACK exists (and has an effect) on some other systems (e.g.,
>>    some of the BSDs).
>>
>> so please propose an update for this?
>>
> 
> Just curious, why isn't MAP_STACK implemented in Linux kernel? what does 
> it take to implement it?
> 
> Also, could there be other potential issue with the vma merge, such as, 
> the user process start to truncate half of the anonymous memory vma 
> range oblivious to the fact that the vma has 'grown' into its stack and 
> it might be attempting to unmap some of its stack range?

Sorry, not 'oblivious'. how about a malicious user process get an fd via 
memfd_create() and attempt to truncate more than it mmap'ed?

> 
> If the vma merge is otherwise harmless, does it bring benefit other than 
> being one vma less?
> 
> thanks!
> -jane
> 
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2023-04-20  0:00 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-18 21:02 [PATCH] mm/mmap: Map MAP_STACK to VM_STACK Waiman Long
2023-04-18 21:18 ` Andrew Morton
2023-04-19  1:16   ` Waiman Long
2023-04-19  1:36     ` Hugh Dickins
2023-04-19  1:45       ` Waiman Long
2023-04-19  3:24         ` Matthew Wilcox
2023-04-19 14:38           ` Paul Moore
2023-04-19  3:46     ` Matthew Wilcox
2023-04-19 15:07       ` Waiman Long
2023-04-19 15:09         ` Matthew Wilcox
2023-04-19 16:00           ` Joe Mario
2023-04-19 23:21   ` Jane Chu
2023-04-20  0:00     ` Jane Chu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.