linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] mm: align larger anonymous mappings on THP boundaries
@ 2022-08-09 18:24 Rik van Riel
  2022-08-10 17:06 ` Yang Shi
  2024-01-16 11:53 ` Jiri Slaby
  0 siblings, 2 replies; 29+ messages in thread
From: Rik van Riel @ 2022-08-09 18:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, kernel-team, Matthew Wilcox, Yang Shi

Align larger anonymous memory mappings on THP boundaries by
going through thp_get_unmapped_area if THPs are enabled for
the current process.

With this patch, larger anonymous mappings are now THP aligned.
When a malloc library allocates a 2MB or larger arena, that
arena can now be mapped with THPs right from the start, which
can result in better TLB hit rates and execution time.

Signed-off-by: Rik van Riel <riel@surriel.com>
---
v2: avoid the chicken & egg issue with MMF_VM_HUGEPAGE (Yang Shi)

 mm/mmap.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/mmap.c b/mm/mmap.c
index c035020d0c89..1d859893436d 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2229,6 +2229,9 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
 		 */
 		pgoff = 0;
 		get_area = shmem_get_unmapped_area;
+	} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
+		/* Ensures that larger anonymous mappings are THP aligned. */
+		get_area = thp_get_unmapped_area;
 	}
 
 	addr = get_area(file, addr, len, pgoff, flags);
-- 
2.37.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] mm: align larger anonymous mappings on THP boundaries
  2022-08-09 18:24 [PATCH v2] mm: align larger anonymous mappings on THP boundaries Rik van Riel
@ 2022-08-10 17:06 ` Yang Shi
  2024-01-16 11:53 ` Jiri Slaby
  1 sibling, 0 replies; 29+ messages in thread
From: Yang Shi @ 2022-08-10 17:06 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, linux-mm, linux-kernel, kernel-team, Matthew Wilcox

On Tue, Aug 9, 2022 at 11:25 AM Rik van Riel <riel@surriel.com> wrote:
>
> Align larger anonymous memory mappings on THP boundaries by
> going through thp_get_unmapped_area if THPs are enabled for
> the current process.
>
> With this patch, larger anonymous mappings are now THP aligned.
> When a malloc library allocates a 2MB or larger arena, that
> arena can now be mapped with THPs right from the start, which
> can result in better TLB hit rates and execution time.
>
> Signed-off-by: Rik van Riel <riel@surriel.com>
> ---
> v2: avoid the chicken & egg issue with MMF_VM_HUGEPAGE (Yang Shi)

Reviewed-by: Yang Shi <shy828301@gmail.com>

>
>  mm/mmap.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index c035020d0c89..1d859893436d 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2229,6 +2229,9 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
>                  */
>                 pgoff = 0;
>                 get_area = shmem_get_unmapped_area;
> +       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> +               /* Ensures that larger anonymous mappings are THP aligned. */
> +               get_area = thp_get_unmapped_area;
>         }
>
>         addr = get_area(file, addr, len, pgoff, flags);
> --
> 2.37.1
>
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] mm: align larger anonymous mappings on THP boundaries
  2022-08-09 18:24 [PATCH v2] mm: align larger anonymous mappings on THP boundaries Rik van Riel
  2022-08-10 17:06 ` Yang Shi
@ 2024-01-16 11:53 ` Jiri Slaby
  2024-01-16 12:09   ` Jiri Slaby
                     ` (2 more replies)
  1 sibling, 3 replies; 29+ messages in thread
From: Jiri Slaby @ 2024-01-16 11:53 UTC (permalink / raw)
  To: Rik van Riel, Andrew Morton
  Cc: linux-mm, linux-kernel, kernel-team, Matthew Wilcox, Yang Shi,
	Christoph Lameter

Hi,

On 09. 08. 22, 20:24, Rik van Riel wrote:
> Align larger anonymous memory mappings on THP boundaries by
> going through thp_get_unmapped_area if THPs are enabled for
> the current process.
> 
> With this patch, larger anonymous mappings are now THP aligned.
> When a malloc library allocates a 2MB or larger arena, that
> arena can now be mapped with THPs right from the start, which
> can result in better TLB hit rates and execution time.

This appears to break 32bit processes on x86_64 (at least). In 
particular, 32bit kernel or firefox builds in our build system.

Reverting this on top of 6.7 makes it work again.

Downstream report:
  https://bugzilla.suse.com/show_bug.cgi?id=1218841

So running:
pahole -J --btf_gen_floats -j --lang_exclude=rust 
--skip_encoding_btf_inconsistent_proto --btf_gen_optimized .tmp_vmlinux.btf

crashes or errors out with some random errors:
[182671] STRUCT idr's field 'idr_next' offset=128 bit_size=0 type=181346 
Error emitting field

strace shows mmap() fails with ENOMEM right before the errors:
1223  mmap2(NULL, 5783552, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
...
1223  <... mmap2 resumed>)              = -1 ENOMEM (Cannot allocate memory)

Note the .tmp_vmlinux.btf above can be arbitrary, but likely large 
enough. For reference, one is available at:
https://decibel.fi.muni.cz/~xslaby/n/btf

Any ideas?

> Signed-off-by: Rik van Riel <riel@surriel.com>
> ---
> v2: avoid the chicken & egg issue with MMF_VM_HUGEPAGE (Yang Shi)
> 
>   mm/mmap.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index c035020d0c89..1d859893436d 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2229,6 +2229,9 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
>   		 */
>   		pgoff = 0;
>   		get_area = shmem_get_unmapped_area;
> +	} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> +		/* Ensures that larger anonymous mappings are THP aligned. */
> +		get_area = thp_get_unmapped_area;
>   	}
>   
>   	addr = get_area(file, addr, len, pgoff, flags);

thanks,
-- 
js
suse labs


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] mm: align larger anonymous mappings on THP boundaries
  2024-01-16 11:53 ` Jiri Slaby
@ 2024-01-16 12:09   ` Jiri Slaby
  2024-01-16 19:16     ` Suren Baghdasaryan
                       ` (2 more replies)
  2024-01-20 13:43   ` Linux regression tracking #adding (Thorsten Leemhuis)
  2024-01-20 15:47   ` Linux regression tracking #adding (Thorsten Leemhuis)
  2 siblings, 3 replies; 29+ messages in thread
From: Jiri Slaby @ 2024-01-16 12:09 UTC (permalink / raw)
  To: Rik van Riel, Andrew Morton
  Cc: linux-mm, linux-kernel, kernel-team, Matthew Wilcox, Yang Shi,
	Christoph Lameter

On 16. 01. 24, 12:53, Jiri Slaby wrote:
> Hi,
> 
> On 09. 08. 22, 20:24, Rik van Riel wrote:
>> Align larger anonymous memory mappings on THP boundaries by
>> going through thp_get_unmapped_area if THPs are enabled for
>> the current process.
>>
>> With this patch, larger anonymous mappings are now THP aligned.
>> When a malloc library allocates a 2MB or larger arena, that
>> arena can now be mapped with THPs right from the start, which
>> can result in better TLB hit rates and execution time.
> 
> This appears to break 32bit processes on x86_64 (at least). In 
> particular, 32bit kernel or firefox builds in our build system.
> 
> Reverting this on top of 6.7 makes it work again.
> 
> Downstream report:
>   https://bugzilla.suse.com/show_bug.cgi?id=1218841
> 
> So running:
> pahole -J --btf_gen_floats -j --lang_exclude=rust 
> --skip_encoding_btf_inconsistent_proto --btf_gen_optimized .tmp_vmlinux.btf
> 
> crashes or errors out with some random errors:
> [182671] STRUCT idr's field 'idr_next' offset=128 bit_size=0 type=181346 
> Error emitting field
> 
> strace shows mmap() fails with ENOMEM right before the errors:
> 1223  mmap2(NULL, 5783552, PROT_READ|PROT_WRITE, 
> MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> ...
> 1223  <... mmap2 resumed>)              = -1 ENOMEM (Cannot allocate 
> memory)
> 
> Note the .tmp_vmlinux.btf above can be arbitrary, but likely large 
> enough. For reference, one is available at:
> https://decibel.fi.muni.cz/~xslaby/n/btf
> 
> Any ideas?

This works around the problem, of course (but is a band-aid, not a fix):

--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1829,7 +1829,7 @@ get_unmapped_area(struct file *file, unsigned long 
addr, unsigned long len,
                  */
                 pgoff = 0;
                 get_area = shmem_get_unmapped_area;
-       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
+       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && 
!in_32bit_syscall()) {
                 /* Ensures that larger anonymous mappings are THP 
aligned. */
                 get_area = thp_get_unmapped_area;
         }


thp_get_unmapped_area() does not take care of the legacy stuff...

regards,
-- 
js
suse labs


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] mm: align larger anonymous mappings on THP boundaries
  2024-01-16 12:09   ` Jiri Slaby
@ 2024-01-16 19:16     ` Suren Baghdasaryan
  2024-01-16 20:56       ` Yang Shi
  2024-01-16 20:55     ` Yang Shi
  2024-01-18  0:07     ` Yang Shi
  2 siblings, 1 reply; 29+ messages in thread
From: Suren Baghdasaryan @ 2024-01-16 19:16 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Rik van Riel, Andrew Morton, linux-mm, linux-kernel, kernel-team,
	Matthew Wilcox, Yang Shi, Christoph Lameter

On Tue, Jan 16, 2024 at 4:09 AM Jiri Slaby <jirislaby@kernel.org> wrote:
>
> On 16. 01. 24, 12:53, Jiri Slaby wrote:
> > Hi,
> >
> > On 09. 08. 22, 20:24, Rik van Riel wrote:
> >> Align larger anonymous memory mappings on THP boundaries by
> >> going through thp_get_unmapped_area if THPs are enabled for
> >> the current process.
> >>
> >> With this patch, larger anonymous mappings are now THP aligned.
> >> When a malloc library allocates a 2MB or larger arena, that
> >> arena can now be mapped with THPs right from the start, which
> >> can result in better TLB hit rates and execution time.
> >
> > This appears to break 32bit processes on x86_64 (at least). In
> > particular, 32bit kernel or firefox builds in our build system.
> >
> > Reverting this on top of 6.7 makes it work again.
> >
> > Downstream report:
> >   https://bugzilla.suse.com/show_bug.cgi?id=1218841
> >
> > So running:
> > pahole -J --btf_gen_floats -j --lang_exclude=rust
> > --skip_encoding_btf_inconsistent_proto --btf_gen_optimized .tmp_vmlinux.btf
> >
> > crashes or errors out with some random errors:
> > [182671] STRUCT idr's field 'idr_next' offset=128 bit_size=0 type=181346
> > Error emitting field
> >
> > strace shows mmap() fails with ENOMEM right before the errors:
> > 1223  mmap2(NULL, 5783552, PROT_READ|PROT_WRITE,
> > MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> > ...
> > 1223  <... mmap2 resumed>)              = -1 ENOMEM (Cannot allocate
> > memory)
> >
> > Note the .tmp_vmlinux.btf above can be arbitrary, but likely large
> > enough. For reference, one is available at:
> > https://decibel.fi.muni.cz/~xslaby/n/btf
> >
> > Any ideas?
>
> This works around the problem, of course (but is a band-aid, not a fix):
>
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1829,7 +1829,7 @@ get_unmapped_area(struct file *file, unsigned long
> addr, unsigned long len,
>                   */
>                  pgoff = 0;
>                  get_area = shmem_get_unmapped_area;
> -       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> +       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
> !in_32bit_syscall()) {
>                  /* Ensures that larger anonymous mappings are THP
> aligned. */
>                  get_area = thp_get_unmapped_area;
>          }
>
>
> thp_get_unmapped_area() does not take care of the legacy stuff...

This change also affects the entropy of allocations. With this patch
Android test [1] started failing and it requires only 8 bits of
entropy. The feedback from our security team:

8 bits of entropy is already embarrassingly low, but was necessary for
32 bit ARM targets with low RAM at the time. It's definitely not
acceptable for 64 bit targets.

Could this change be either reverted or made optional (opt-in/opt-out)?
Thanks,
Suren.

[1] https://cs.android.com/android/platform/superproject/main/+/main:cts/tests/aslr/src/AslrMallocTest.cpp;l=130

>
> regards,
> --
> js
> suse labs
>
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] mm: align larger anonymous mappings on THP boundaries
  2024-01-16 12:09   ` Jiri Slaby
  2024-01-16 19:16     ` Suren Baghdasaryan
@ 2024-01-16 20:55     ` Yang Shi
  2024-01-18  0:07     ` Yang Shi
  2 siblings, 0 replies; 29+ messages in thread
From: Yang Shi @ 2024-01-16 20:55 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Rik van Riel, Andrew Morton, linux-mm, linux-kernel, kernel-team,
	Matthew Wilcox, Christoph Lameter

On Tue, Jan 16, 2024 at 4:09 AM Jiri Slaby <jirislaby@kernel.org> wrote:
>
> On 16. 01. 24, 12:53, Jiri Slaby wrote:
> > Hi,
> >
> > On 09. 08. 22, 20:24, Rik van Riel wrote:
> >> Align larger anonymous memory mappings on THP boundaries by
> >> going through thp_get_unmapped_area if THPs are enabled for
> >> the current process.
> >>
> >> With this patch, larger anonymous mappings are now THP aligned.
> >> When a malloc library allocates a 2MB or larger arena, that
> >> arena can now be mapped with THPs right from the start, which
> >> can result in better TLB hit rates and execution time.
> >
> > This appears to break 32bit processes on x86_64 (at least). In
> > particular, 32bit kernel or firefox builds in our build system.
> >
> > Reverting this on top of 6.7 makes it work again.
> >
> > Downstream report:
> >   https://bugzilla.suse.com/show_bug.cgi?id=1218841
> >
> > So running:
> > pahole -J --btf_gen_floats -j --lang_exclude=rust
> > --skip_encoding_btf_inconsistent_proto --btf_gen_optimized .tmp_vmlinux.btf
> >
> > crashes or errors out with some random errors:
> > [182671] STRUCT idr's field 'idr_next' offset=128 bit_size=0 type=181346
> > Error emitting field
> >
> > strace shows mmap() fails with ENOMEM right before the errors:
> > 1223  mmap2(NULL, 5783552, PROT_READ|PROT_WRITE,
> > MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> > ...
> > 1223  <... mmap2 resumed>)              = -1 ENOMEM (Cannot allocate
> > memory)

It should be due to the virtual address space size on 32-bit machine.

> >
> > Note the .tmp_vmlinux.btf above can be arbitrary, but likely large
> > enough. For reference, one is available at:
> > https://decibel.fi.muni.cz/~xslaby/n/btf
> >
> > Any ideas?
>
> This works around the problem, of course (but is a band-aid, not a fix):
>
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1829,7 +1829,7 @@ get_unmapped_area(struct file *file, unsigned long
> addr, unsigned long len,
>                   */
>                  pgoff = 0;
>                  get_area = shmem_get_unmapped_area;
> -       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> +       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
> !in_32bit_syscall()) {
>                  /* Ensures that larger anonymous mappings are THP
> aligned. */
>                  get_area = thp_get_unmapped_area;
>          }
>
>
> thp_get_unmapped_area() does not take care of the legacy stuff...

Thanks for the report. I think we can just make
thp_get_unmapped_area() no-op on 32 bit.

>
> regards,
> --
> js
> suse labs
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] mm: align larger anonymous mappings on THP boundaries
  2024-01-16 19:16     ` Suren Baghdasaryan
@ 2024-01-16 20:56       ` Yang Shi
  2024-01-16 21:57         ` Suren Baghdasaryan
  0 siblings, 1 reply; 29+ messages in thread
From: Yang Shi @ 2024-01-16 20:56 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Jiri Slaby, Rik van Riel, Andrew Morton, linux-mm, linux-kernel,
	kernel-team, Matthew Wilcox, Christoph Lameter

On Tue, Jan 16, 2024 at 11:16 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Tue, Jan 16, 2024 at 4:09 AM Jiri Slaby <jirislaby@kernel.org> wrote:
> >
> > On 16. 01. 24, 12:53, Jiri Slaby wrote:
> > > Hi,
> > >
> > > On 09. 08. 22, 20:24, Rik van Riel wrote:
> > >> Align larger anonymous memory mappings on THP boundaries by
> > >> going through thp_get_unmapped_area if THPs are enabled for
> > >> the current process.
> > >>
> > >> With this patch, larger anonymous mappings are now THP aligned.
> > >> When a malloc library allocates a 2MB or larger arena, that
> > >> arena can now be mapped with THPs right from the start, which
> > >> can result in better TLB hit rates and execution time.
> > >
> > > This appears to break 32bit processes on x86_64 (at least). In
> > > particular, 32bit kernel or firefox builds in our build system.
> > >
> > > Reverting this on top of 6.7 makes it work again.
> > >
> > > Downstream report:
> > >   https://bugzilla.suse.com/show_bug.cgi?id=1218841
> > >
> > > So running:
> > > pahole -J --btf_gen_floats -j --lang_exclude=rust
> > > --skip_encoding_btf_inconsistent_proto --btf_gen_optimized .tmp_vmlinux.btf
> > >
> > > crashes or errors out with some random errors:
> > > [182671] STRUCT idr's field 'idr_next' offset=128 bit_size=0 type=181346
> > > Error emitting field
> > >
> > > strace shows mmap() fails with ENOMEM right before the errors:
> > > 1223  mmap2(NULL, 5783552, PROT_READ|PROT_WRITE,
> > > MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> > > ...
> > > 1223  <... mmap2 resumed>)              = -1 ENOMEM (Cannot allocate
> > > memory)
> > >
> > > Note the .tmp_vmlinux.btf above can be arbitrary, but likely large
> > > enough. For reference, one is available at:
> > > https://decibel.fi.muni.cz/~xslaby/n/btf
> > >
> > > Any ideas?
> >
> > This works around the problem, of course (but is a band-aid, not a fix):
> >
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -1829,7 +1829,7 @@ get_unmapped_area(struct file *file, unsigned long
> > addr, unsigned long len,
> >                   */
> >                  pgoff = 0;
> >                  get_area = shmem_get_unmapped_area;
> > -       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> > +       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
> > !in_32bit_syscall()) {
> >                  /* Ensures that larger anonymous mappings are THP
> > aligned. */
> >                  get_area = thp_get_unmapped_area;
> >          }
> >
> >
> > thp_get_unmapped_area() does not take care of the legacy stuff...
>
> This change also affects the entropy of allocations. With this patch
> Android test [1] started failing and it requires only 8 bits of
> entropy. The feedback from our security team:
>
> 8 bits of entropy is already embarrassingly low, but was necessary for
> 32 bit ARM targets with low RAM at the time. It's definitely not
> acceptable for 64 bit targets.

Thanks for the report. Is it 32 bit only or 64 bit is also impacted?
If I understand the code correctly, it expects the address allocated
by malloc() is kind of randomized, right?

>
> Could this change be either reverted or made optional (opt-in/opt-out)?
> Thanks,
> Suren.
>
> [1] https://cs.android.com/android/platform/superproject/main/+/main:cts/tests/aslr/src/AslrMallocTest.cpp;l=130
>
> >
> > regards,
> > --
> > js
> > suse labs
> >
> >

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] mm: align larger anonymous mappings on THP boundaries
  2024-01-16 20:56       ` Yang Shi
@ 2024-01-16 21:57         ` Suren Baghdasaryan
  2024-01-16 22:25           ` Yang Shi
  0 siblings, 1 reply; 29+ messages in thread
From: Suren Baghdasaryan @ 2024-01-16 21:57 UTC (permalink / raw)
  To: Yang Shi
  Cc: Jiri Slaby, Rik van Riel, Andrew Morton, linux-mm, linux-kernel,
	kernel-team, Matthew Wilcox, Christoph Lameter

On Tue, Jan 16, 2024 at 12:56 PM Yang Shi <shy828301@gmail.com> wrote:
>
> On Tue, Jan 16, 2024 at 11:16 AM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Tue, Jan 16, 2024 at 4:09 AM Jiri Slaby <jirislaby@kernel.org> wrote:
> > >
> > > On 16. 01. 24, 12:53, Jiri Slaby wrote:
> > > > Hi,
> > > >
> > > > On 09. 08. 22, 20:24, Rik van Riel wrote:
> > > >> Align larger anonymous memory mappings on THP boundaries by
> > > >> going through thp_get_unmapped_area if THPs are enabled for
> > > >> the current process.
> > > >>
> > > >> With this patch, larger anonymous mappings are now THP aligned.
> > > >> When a malloc library allocates a 2MB or larger arena, that
> > > >> arena can now be mapped with THPs right from the start, which
> > > >> can result in better TLB hit rates and execution time.
> > > >
> > > > This appears to break 32bit processes on x86_64 (at least). In
> > > > particular, 32bit kernel or firefox builds in our build system.
> > > >
> > > > Reverting this on top of 6.7 makes it work again.
> > > >
> > > > Downstream report:
> > > >   https://bugzilla.suse.com/show_bug.cgi?id=1218841
> > > >
> > > > So running:
> > > > pahole -J --btf_gen_floats -j --lang_exclude=rust
> > > > --skip_encoding_btf_inconsistent_proto --btf_gen_optimized .tmp_vmlinux.btf
> > > >
> > > > crashes or errors out with some random errors:
> > > > [182671] STRUCT idr's field 'idr_next' offset=128 bit_size=0 type=181346
> > > > Error emitting field
> > > >
> > > > strace shows mmap() fails with ENOMEM right before the errors:
> > > > 1223  mmap2(NULL, 5783552, PROT_READ|PROT_WRITE,
> > > > MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> > > > ...
> > > > 1223  <... mmap2 resumed>)              = -1 ENOMEM (Cannot allocate
> > > > memory)
> > > >
> > > > Note the .tmp_vmlinux.btf above can be arbitrary, but likely large
> > > > enough. For reference, one is available at:
> > > > https://decibel.fi.muni.cz/~xslaby/n/btf
> > > >
> > > > Any ideas?
> > >
> > > This works around the problem, of course (but is a band-aid, not a fix):
> > >
> > > --- a/mm/mmap.c
> > > +++ b/mm/mmap.c
> > > @@ -1829,7 +1829,7 @@ get_unmapped_area(struct file *file, unsigned long
> > > addr, unsigned long len,
> > >                   */
> > >                  pgoff = 0;
> > >                  get_area = shmem_get_unmapped_area;
> > > -       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> > > +       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
> > > !in_32bit_syscall()) {
> > >                  /* Ensures that larger anonymous mappings are THP
> > > aligned. */
> > >                  get_area = thp_get_unmapped_area;
> > >          }
> > >
> > >
> > > thp_get_unmapped_area() does not take care of the legacy stuff...
> >
> > This change also affects the entropy of allocations. With this patch
> > Android test [1] started failing and it requires only 8 bits of
> > entropy. The feedback from our security team:
> >
> > 8 bits of entropy is already embarrassingly low, but was necessary for
> > 32 bit ARM targets with low RAM at the time. It's definitely not
> > acceptable for 64 bit targets.
>
> Thanks for the report. Is it 32 bit only or 64 bit is also impacted?
> If I understand the code correctly, it expects the address allocated
> by malloc() is kind of randomized, right?

Yes, correct, the test expects a certain level of address randomization.
The test failure was reported while running kernel_virt_x86_64 target
(Android emulator on x86), so it does impact 64bit targets.

>
> >
> > Could this change be either reverted or made optional (opt-in/opt-out)?
> > Thanks,
> > Suren.
> >
> > [1] https://cs.android.com/android/platform/superproject/main/+/main:cts/tests/aslr/src/AslrMallocTest.cpp;l=130
> >
> > >
> > > regards,
> > > --
> > > js
> > > suse labs
> > >
> > >

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] mm: align larger anonymous mappings on THP boundaries
  2024-01-16 21:57         ` Suren Baghdasaryan
@ 2024-01-16 22:25           ` Yang Shi
  2024-01-16 22:30             ` Suren Baghdasaryan
  0 siblings, 1 reply; 29+ messages in thread
From: Yang Shi @ 2024-01-16 22:25 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Jiri Slaby, Rik van Riel, Andrew Morton, linux-mm, linux-kernel,
	kernel-team, Matthew Wilcox, Christoph Lameter

On Tue, Jan 16, 2024 at 1:58 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Tue, Jan 16, 2024 at 12:56 PM Yang Shi <shy828301@gmail.com> wrote:
> >
> > On Tue, Jan 16, 2024 at 11:16 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > >
> > > On Tue, Jan 16, 2024 at 4:09 AM Jiri Slaby <jirislaby@kernel.org> wrote:
> > > >
> > > > On 16. 01. 24, 12:53, Jiri Slaby wrote:
> > > > > Hi,
> > > > >
> > > > > On 09. 08. 22, 20:24, Rik van Riel wrote:
> > > > >> Align larger anonymous memory mappings on THP boundaries by
> > > > >> going through thp_get_unmapped_area if THPs are enabled for
> > > > >> the current process.
> > > > >>
> > > > >> With this patch, larger anonymous mappings are now THP aligned.
> > > > >> When a malloc library allocates a 2MB or larger arena, that
> > > > >> arena can now be mapped with THPs right from the start, which
> > > > >> can result in better TLB hit rates and execution time.
> > > > >
> > > > > This appears to break 32bit processes on x86_64 (at least). In
> > > > > particular, 32bit kernel or firefox builds in our build system.
> > > > >
> > > > > Reverting this on top of 6.7 makes it work again.
> > > > >
> > > > > Downstream report:
> > > > >   https://bugzilla.suse.com/show_bug.cgi?id=1218841
> > > > >
> > > > > So running:
> > > > > pahole -J --btf_gen_floats -j --lang_exclude=rust
> > > > > --skip_encoding_btf_inconsistent_proto --btf_gen_optimized .tmp_vmlinux.btf
> > > > >
> > > > > crashes or errors out with some random errors:
> > > > > [182671] STRUCT idr's field 'idr_next' offset=128 bit_size=0 type=181346
> > > > > Error emitting field
> > > > >
> > > > > strace shows mmap() fails with ENOMEM right before the errors:
> > > > > 1223  mmap2(NULL, 5783552, PROT_READ|PROT_WRITE,
> > > > > MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> > > > > ...
> > > > > 1223  <... mmap2 resumed>)              = -1 ENOMEM (Cannot allocate
> > > > > memory)
> > > > >
> > > > > Note the .tmp_vmlinux.btf above can be arbitrary, but likely large
> > > > > enough. For reference, one is available at:
> > > > > https://decibel.fi.muni.cz/~xslaby/n/btf
> > > > >
> > > > > Any ideas?
> > > >
> > > > This works around the problem, of course (but is a band-aid, not a fix):
> > > >
> > > > --- a/mm/mmap.c
> > > > +++ b/mm/mmap.c
> > > > @@ -1829,7 +1829,7 @@ get_unmapped_area(struct file *file, unsigned long
> > > > addr, unsigned long len,
> > > >                   */
> > > >                  pgoff = 0;
> > > >                  get_area = shmem_get_unmapped_area;
> > > > -       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> > > > +       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
> > > > !in_32bit_syscall()) {
> > > >                  /* Ensures that larger anonymous mappings are THP
> > > > aligned. */
> > > >                  get_area = thp_get_unmapped_area;
> > > >          }
> > > >
> > > >
> > > > thp_get_unmapped_area() does not take care of the legacy stuff...
> > >
> > > This change also affects the entropy of allocations. With this patch
> > > Android test [1] started failing and it requires only 8 bits of
> > > entropy. The feedback from our security team:
> > >
> > > 8 bits of entropy is already embarrassingly low, but was necessary for
> > > 32 bit ARM targets with low RAM at the time. It's definitely not
> > > acceptable for 64 bit targets.
> >
> > Thanks for the report. Is it 32 bit only or 64 bit is also impacted?
> > If I understand the code correctly, it expects the address allocated
> > by malloc() is kind of randomized, right?
>
> Yes, correct, the test expects a certain level of address randomization.
> The test failure was reported while running kernel_virt_x86_64 target
> (Android emulator on x86), so it does impact 64bit targets.

IIUC this breaks the "expectation" for randomized addresses returned
by malloc(), but it doesn't break any real Android application, right?
So this is a security concern instead of a real regression.

I think we can make this opt-in with a knob. Anyone who outweighs
security could opt this feature out. However I'm wondering whether
Android should implement a general address randomization mechanism
instead of depending on "luck" if you do care about it.

>
> >
> > >
> > > Could this change be either reverted or made optional (opt-in/opt-out)?
> > > Thanks,
> > > Suren.
> > >
> > > [1] https://cs.android.com/android/platform/superproject/main/+/main:cts/tests/aslr/src/AslrMallocTest.cpp;l=130
> > >
> > > >
> > > > regards,
> > > > --
> > > > js
> > > > suse labs
> > > >
> > > >

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] mm: align larger anonymous mappings on THP boundaries
  2024-01-16 22:25           ` Yang Shi
@ 2024-01-16 22:30             ` Suren Baghdasaryan
  2024-01-16 23:14               ` Yang Shi
  2024-01-17 17:40               ` Kees Cook
  0 siblings, 2 replies; 29+ messages in thread
From: Suren Baghdasaryan @ 2024-01-16 22:30 UTC (permalink / raw)
  To: Yang Shi, Jeffrey Vander Stoep, Kees Cook
  Cc: Jiri Slaby, Rik van Riel, Andrew Morton, linux-mm, linux-kernel,
	kernel-team, Matthew Wilcox, Christoph Lameter

On Tue, Jan 16, 2024 at 2:25 PM Yang Shi <shy828301@gmail.com> wrote:
>
> On Tue, Jan 16, 2024 at 1:58 PM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Tue, Jan 16, 2024 at 12:56 PM Yang Shi <shy828301@gmail.com> wrote:
> > >
> > > On Tue, Jan 16, 2024 at 11:16 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > > >
> > > > On Tue, Jan 16, 2024 at 4:09 AM Jiri Slaby <jirislaby@kernel.org> wrote:
> > > > >
> > > > > On 16. 01. 24, 12:53, Jiri Slaby wrote:
> > > > > > Hi,
> > > > > >
> > > > > > On 09. 08. 22, 20:24, Rik van Riel wrote:
> > > > > >> Align larger anonymous memory mappings on THP boundaries by
> > > > > >> going through thp_get_unmapped_area if THPs are enabled for
> > > > > >> the current process.
> > > > > >>
> > > > > >> With this patch, larger anonymous mappings are now THP aligned.
> > > > > >> When a malloc library allocates a 2MB or larger arena, that
> > > > > >> arena can now be mapped with THPs right from the start, which
> > > > > >> can result in better TLB hit rates and execution time.
> > > > > >
> > > > > > This appears to break 32bit processes on x86_64 (at least). In
> > > > > > particular, 32bit kernel or firefox builds in our build system.
> > > > > >
> > > > > > Reverting this on top of 6.7 makes it work again.
> > > > > >
> > > > > > Downstream report:
> > > > > >   https://bugzilla.suse.com/show_bug.cgi?id=1218841
> > > > > >
> > > > > > So running:
> > > > > > pahole -J --btf_gen_floats -j --lang_exclude=rust
> > > > > > --skip_encoding_btf_inconsistent_proto --btf_gen_optimized .tmp_vmlinux.btf
> > > > > >
> > > > > > crashes or errors out with some random errors:
> > > > > > [182671] STRUCT idr's field 'idr_next' offset=128 bit_size=0 type=181346
> > > > > > Error emitting field
> > > > > >
> > > > > > strace shows mmap() fails with ENOMEM right before the errors:
> > > > > > 1223  mmap2(NULL, 5783552, PROT_READ|PROT_WRITE,
> > > > > > MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> > > > > > ...
> > > > > > 1223  <... mmap2 resumed>)              = -1 ENOMEM (Cannot allocate
> > > > > > memory)
> > > > > >
> > > > > > Note the .tmp_vmlinux.btf above can be arbitrary, but likely large
> > > > > > enough. For reference, one is available at:
> > > > > > https://decibel.fi.muni.cz/~xslaby/n/btf
> > > > > >
> > > > > > Any ideas?
> > > > >
> > > > > This works around the problem, of course (but is a band-aid, not a fix):
> > > > >
> > > > > --- a/mm/mmap.c
> > > > > +++ b/mm/mmap.c
> > > > > @@ -1829,7 +1829,7 @@ get_unmapped_area(struct file *file, unsigned long
> > > > > addr, unsigned long len,
> > > > >                   */
> > > > >                  pgoff = 0;
> > > > >                  get_area = shmem_get_unmapped_area;
> > > > > -       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> > > > > +       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
> > > > > !in_32bit_syscall()) {
> > > > >                  /* Ensures that larger anonymous mappings are THP
> > > > > aligned. */
> > > > >                  get_area = thp_get_unmapped_area;
> > > > >          }
> > > > >
> > > > >
> > > > > thp_get_unmapped_area() does not take care of the legacy stuff...
> > > >
> > > > This change also affects the entropy of allocations. With this patch
> > > > Android test [1] started failing and it requires only 8 bits of
> > > > entropy. The feedback from our security team:
> > > >
> > > > 8 bits of entropy is already embarrassingly low, but was necessary for
> > > > 32 bit ARM targets with low RAM at the time. It's definitely not
> > > > acceptable for 64 bit targets.
> > >
> > > Thanks for the report. Is it 32 bit only or 64 bit is also impacted?
> > > If I understand the code correctly, it expects the address allocated
> > > by malloc() is kind of randomized, right?
> >
> > Yes, correct, the test expects a certain level of address randomization.
> > The test failure was reported while running kernel_virt_x86_64 target
> > (Android emulator on x86), so it does impact 64bit targets.
>
> IIUC this breaks the "expectation" for randomized addresses returned
> by malloc(), but it doesn't break any real Android application, right?
> So this is a security concern instead of a real regression.

How is making a system move vulnerabile not a real regression?

>
> I think we can make this opt-in with a knob. Anyone who outweighs
> security could opt this feature out. However I'm wondering whether
> Android should implement a general address randomization mechanism
> instead of depending on "luck" if you do care about it.

This is not depending on luck. This is checking for possible
vulnerabilities in the system.
I admit I'm not a security expert, so I'm looping in Jeff and Kees to chime in.
Thanks,
Suren.

>
> >
> > >
> > > >
> > > > Could this change be either reverted or made optional (opt-in/opt-out)?
> > > > Thanks,
> > > > Suren.
> > > >
> > > > [1] https://cs.android.com/android/platform/superproject/main/+/main:cts/tests/aslr/src/AslrMallocTest.cpp;l=130
> > > >
> > > > >
> > > > > regards,
> > > > > --
> > > > > js
> > > > > suse labs
> > > > >
> > > > >

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] mm: align larger anonymous mappings on THP boundaries
  2024-01-16 22:30             ` Suren Baghdasaryan
@ 2024-01-16 23:14               ` Yang Shi
  2024-01-17 17:40               ` Kees Cook
  1 sibling, 0 replies; 29+ messages in thread
From: Yang Shi @ 2024-01-16 23:14 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Jeffrey Vander Stoep, Kees Cook, Jiri Slaby, Rik van Riel,
	Andrew Morton, linux-mm, linux-kernel, kernel-team,
	Matthew Wilcox, Christoph Lameter

On Tue, Jan 16, 2024 at 2:30 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Tue, Jan 16, 2024 at 2:25 PM Yang Shi <shy828301@gmail.com> wrote:
> >
> > On Tue, Jan 16, 2024 at 1:58 PM Suren Baghdasaryan <surenb@google.com> wrote:
> > >
> > > On Tue, Jan 16, 2024 at 12:56 PM Yang Shi <shy828301@gmail.com> wrote:
> > > >
> > > > On Tue, Jan 16, 2024 at 11:16 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > > > >
> > > > > On Tue, Jan 16, 2024 at 4:09 AM Jiri Slaby <jirislaby@kernel.org> wrote:
> > > > > >
> > > > > > On 16. 01. 24, 12:53, Jiri Slaby wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > On 09. 08. 22, 20:24, Rik van Riel wrote:
> > > > > > >> Align larger anonymous memory mappings on THP boundaries by
> > > > > > >> going through thp_get_unmapped_area if THPs are enabled for
> > > > > > >> the current process.
> > > > > > >>
> > > > > > >> With this patch, larger anonymous mappings are now THP aligned.
> > > > > > >> When a malloc library allocates a 2MB or larger arena, that
> > > > > > >> arena can now be mapped with THPs right from the start, which
> > > > > > >> can result in better TLB hit rates and execution time.
> > > > > > >
> > > > > > > This appears to break 32bit processes on x86_64 (at least). In
> > > > > > > particular, 32bit kernel or firefox builds in our build system.
> > > > > > >
> > > > > > > Reverting this on top of 6.7 makes it work again.
> > > > > > >
> > > > > > > Downstream report:
> > > > > > >   https://bugzilla.suse.com/show_bug.cgi?id=1218841
> > > > > > >
> > > > > > > So running:
> > > > > > > pahole -J --btf_gen_floats -j --lang_exclude=rust
> > > > > > > --skip_encoding_btf_inconsistent_proto --btf_gen_optimized .tmp_vmlinux.btf
> > > > > > >
> > > > > > > crashes or errors out with some random errors:
> > > > > > > [182671] STRUCT idr's field 'idr_next' offset=128 bit_size=0 type=181346
> > > > > > > Error emitting field
> > > > > > >
> > > > > > > strace shows mmap() fails with ENOMEM right before the errors:
> > > > > > > 1223  mmap2(NULL, 5783552, PROT_READ|PROT_WRITE,
> > > > > > > MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> > > > > > > ...
> > > > > > > 1223  <... mmap2 resumed>)              = -1 ENOMEM (Cannot allocate
> > > > > > > memory)
> > > > > > >
> > > > > > > Note the .tmp_vmlinux.btf above can be arbitrary, but likely large
> > > > > > > enough. For reference, one is available at:
> > > > > > > https://decibel.fi.muni.cz/~xslaby/n/btf
> > > > > > >
> > > > > > > Any ideas?
> > > > > >
> > > > > > This works around the problem, of course (but is a band-aid, not a fix):
> > > > > >
> > > > > > --- a/mm/mmap.c
> > > > > > +++ b/mm/mmap.c
> > > > > > @@ -1829,7 +1829,7 @@ get_unmapped_area(struct file *file, unsigned long
> > > > > > addr, unsigned long len,
> > > > > >                   */
> > > > > >                  pgoff = 0;
> > > > > >                  get_area = shmem_get_unmapped_area;
> > > > > > -       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> > > > > > +       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
> > > > > > !in_32bit_syscall()) {
> > > > > >                  /* Ensures that larger anonymous mappings are THP
> > > > > > aligned. */
> > > > > >                  get_area = thp_get_unmapped_area;
> > > > > >          }
> > > > > >
> > > > > >
> > > > > > thp_get_unmapped_area() does not take care of the legacy stuff...
> > > > >
> > > > > This change also affects the entropy of allocations. With this patch
> > > > > Android test [1] started failing and it requires only 8 bits of
> > > > > entropy. The feedback from our security team:
> > > > >
> > > > > 8 bits of entropy is already embarrassingly low, but was necessary for
> > > > > 32 bit ARM targets with low RAM at the time. It's definitely not
> > > > > acceptable for 64 bit targets.
> > > >
> > > > Thanks for the report. Is it 32 bit only or 64 bit is also impacted?
> > > > If I understand the code correctly, it expects the address allocated
> > > > by malloc() is kind of randomized, right?
> > >
> > > Yes, correct, the test expects a certain level of address randomization.
> > > The test failure was reported while running kernel_virt_x86_64 target
> > > (Android emulator on x86), so it does impact 64bit targets.
> >
> > IIUC this breaks the "expectation" for randomized addresses returned
> > by malloc(), but it doesn't break any real Android application, right?
> > So this is a security concern instead of a real regression.
>
> How is making a system move vulnerabile not a real regression?
>
> >
> > I think we can make this opt-in with a knob. Anyone who outweighs
> > security could opt this feature out. However I'm wondering whether
> > Android should implement a general address randomization mechanism
> > instead of depending on "luck" if you do care about it.
>
> This is not depending on luck. This is checking for possible
> vulnerabilities in the system.

I don't think the kernel guarantees address randomization if I read
the kernel code correctly although it may look like it has a certain
level of randomization somehow.

If I remember correctly PaX has an ASLR patch, but it is out of tree.

> I admit I'm not a security expert, so I'm looping in Jeff and Kees to chime in.
> Thanks,
> Suren.
>
> >
> > >
> > > >
> > > > >
> > > > > Could this change be either reverted or made optional (opt-in/opt-out)?
> > > > > Thanks,
> > > > > Suren.
> > > > >
> > > > > [1] https://cs.android.com/android/platform/superproject/main/+/main:cts/tests/aslr/src/AslrMallocTest.cpp;l=130
> > > > >
> > > > > >
> > > > > > regards,
> > > > > > --
> > > > > > js
> > > > > > suse labs
> > > > > >
> > > > > >

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] mm: align larger anonymous mappings on THP boundaries
  2024-01-16 22:30             ` Suren Baghdasaryan
  2024-01-16 23:14               ` Yang Shi
@ 2024-01-17 17:40               ` Kees Cook
  2024-01-17 23:32                 ` Yang Shi
  1 sibling, 1 reply; 29+ messages in thread
From: Kees Cook @ 2024-01-17 17:40 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Yang Shi, Jeffrey Vander Stoep, Jiri Slaby, Rik van Riel,
	Andrew Morton, linux-mm, linux-kernel, kernel-team,
	Matthew Wilcox, Christoph Lameter, linux-hardening

On Tue, Jan 16, 2024 at 02:30:36PM -0800, Suren Baghdasaryan wrote:
> On Tue, Jan 16, 2024 at 2:25 PM Yang Shi <shy828301@gmail.com> wrote:
> >
> > On Tue, Jan 16, 2024 at 1:58 PM Suren Baghdasaryan <surenb@google.com> wrote:
> > >
> > > On Tue, Jan 16, 2024 at 12:56 PM Yang Shi <shy828301@gmail.com> wrote:
> > > >
> > > > On Tue, Jan 16, 2024 at 11:16 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > > > >
> > > > > On Tue, Jan 16, 2024 at 4:09 AM Jiri Slaby <jirislaby@kernel.org> wrote:
> > > > > >
> > > > > > On 16. 01. 24, 12:53, Jiri Slaby wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > On 09. 08. 22, 20:24, Rik van Riel wrote:
> > > > > > >> Align larger anonymous memory mappings on THP boundaries by
> > > > > > >> going through thp_get_unmapped_area if THPs are enabled for
> > > > > > >> the current process.
> > > > > > >>
> > > > > > >> With this patch, larger anonymous mappings are now THP aligned.
> > > > > > >> When a malloc library allocates a 2MB or larger arena, that
> > > > > > >> arena can now be mapped with THPs right from the start, which
> > > > > > >> can result in better TLB hit rates and execution time.
> > > > > > >
> > > > > > > This appears to break 32bit processes on x86_64 (at least). In
> > > > > > > particular, 32bit kernel or firefox builds in our build system.
> > > > > > >
> > > > > > > Reverting this on top of 6.7 makes it work again.
> > > > > > >
> > > > > > > Downstream report:
> > > > > > >   https://bugzilla.suse.com/show_bug.cgi?id=1218841
> > > > > > >
> > > > > > > So running:
> > > > > > > pahole -J --btf_gen_floats -j --lang_exclude=rust
> > > > > > > --skip_encoding_btf_inconsistent_proto --btf_gen_optimized .tmp_vmlinux.btf
> > > > > > >
> > > > > > > crashes or errors out with some random errors:
> > > > > > > [182671] STRUCT idr's field 'idr_next' offset=128 bit_size=0 type=181346
> > > > > > > Error emitting field
> > > > > > >
> > > > > > > strace shows mmap() fails with ENOMEM right before the errors:
> > > > > > > 1223  mmap2(NULL, 5783552, PROT_READ|PROT_WRITE,
> > > > > > > MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> > > > > > > ...
> > > > > > > 1223  <... mmap2 resumed>)              = -1 ENOMEM (Cannot allocate
> > > > > > > memory)
> > > > > > >
> > > > > > > Note the .tmp_vmlinux.btf above can be arbitrary, but likely large
> > > > > > > enough. For reference, one is available at:
> > > > > > > https://decibel.fi.muni.cz/~xslaby/n/btf
> > > > > > >
> > > > > > > Any ideas?
> > > > > >
> > > > > > This works around the problem, of course (but is a band-aid, not a fix):
> > > > > >
> > > > > > --- a/mm/mmap.c
> > > > > > +++ b/mm/mmap.c
> > > > > > @@ -1829,7 +1829,7 @@ get_unmapped_area(struct file *file, unsigned long
> > > > > > addr, unsigned long len,
> > > > > >                   */
> > > > > >                  pgoff = 0;
> > > > > >                  get_area = shmem_get_unmapped_area;
> > > > > > -       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> > > > > > +       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
> > > > > > !in_32bit_syscall()) {
> > > > > >                  /* Ensures that larger anonymous mappings are THP
> > > > > > aligned. */
> > > > > >                  get_area = thp_get_unmapped_area;
> > > > > >          }
> > > > > >
> > > > > >
> > > > > > thp_get_unmapped_area() does not take care of the legacy stuff...
> > > > >
> > > > > This change also affects the entropy of allocations. With this patch
> > > > > Android test [1] started failing and it requires only 8 bits of
> > > > > entropy. The feedback from our security team:
> > > > >
> > > > > 8 bits of entropy is already embarrassingly low, but was necessary for
> > > > > 32 bit ARM targets with low RAM at the time. It's definitely not
> > > > > acceptable for 64 bit targets.
> > > >
> > > > Thanks for the report. Is it 32 bit only or 64 bit is also impacted?
> > > > If I understand the code correctly, it expects the address allocated
> > > > by malloc() is kind of randomized, right?
> > >
> > > Yes, correct, the test expects a certain level of address randomization.
> > > The test failure was reported while running kernel_virt_x86_64 target
> > > (Android emulator on x86), so it does impact 64bit targets.
> >
> > IIUC this breaks the "expectation" for randomized addresses returned
> > by malloc(), but it doesn't break any real Android application, right?
> > So this is a security concern instead of a real regression.
> 
> How is making a system move vulnerabile not a real regression?
> 
> >
> > I think we can make this opt-in with a knob. Anyone who outweighs
> > security could opt this feature out. However I'm wondering whether
> > Android should implement a general address randomization mechanism
> > instead of depending on "luck" if you do care about it.
> 
> This is not depending on luck. This is checking for possible
> vulnerabilities in the system.
> I admit I'm not a security expert, so I'm looping in Jeff and Kees to chime in.

Hi!

Just to chime in, but reduction in ASLR entropy is absolutely a
regression. This is userspace visible (via /proc/sys/kernel/randomize_va_space,
/proc/sys/vm/mmap_rnd*_bits) with expectations that it work as
advertised. So, while 32-bit might be already ASLR-weak, we don't want
to make things super bad nor break ASLR in compat mode under 64-bit
systems.

Having an opt-in sounds reasonable, but we need to wire it to the ASLR
sysctls in some way so nothing lying about the ASLR entropy.

-Kees

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] mm: align larger anonymous mappings on THP boundaries
  2024-01-17 17:40               ` Kees Cook
@ 2024-01-17 23:32                 ` Yang Shi
  2024-01-18  0:01                   ` Suren Baghdasaryan
  0 siblings, 1 reply; 29+ messages in thread
From: Yang Shi @ 2024-01-17 23:32 UTC (permalink / raw)
  To: Kees Cook
  Cc: Suren Baghdasaryan, Jeffrey Vander Stoep, Jiri Slaby,
	Rik van Riel, Andrew Morton, linux-mm, linux-kernel, kernel-team,
	Matthew Wilcox, Christoph Lameter, linux-hardening

On Wed, Jan 17, 2024 at 9:40 AM Kees Cook <keescook@chromium.org> wrote:
>
> On Tue, Jan 16, 2024 at 02:30:36PM -0800, Suren Baghdasaryan wrote:
> > On Tue, Jan 16, 2024 at 2:25 PM Yang Shi <shy828301@gmail.com> wrote:
> > >
> > > On Tue, Jan 16, 2024 at 1:58 PM Suren Baghdasaryan <surenb@google.com> wrote:
> > > >
> > > > On Tue, Jan 16, 2024 at 12:56 PM Yang Shi <shy828301@gmail.com> wrote:
> > > > >
> > > > > On Tue, Jan 16, 2024 at 11:16 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > > > > >
> > > > > > On Tue, Jan 16, 2024 at 4:09 AM Jiri Slaby <jirislaby@kernel.org> wrote:
> > > > > > >
> > > > > > > On 16. 01. 24, 12:53, Jiri Slaby wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > On 09. 08. 22, 20:24, Rik van Riel wrote:
> > > > > > > >> Align larger anonymous memory mappings on THP boundaries by
> > > > > > > >> going through thp_get_unmapped_area if THPs are enabled for
> > > > > > > >> the current process.
> > > > > > > >>
> > > > > > > >> With this patch, larger anonymous mappings are now THP aligned.
> > > > > > > >> When a malloc library allocates a 2MB or larger arena, that
> > > > > > > >> arena can now be mapped with THPs right from the start, which
> > > > > > > >> can result in better TLB hit rates and execution time.
> > > > > > > >
> > > > > > > > This appears to break 32bit processes on x86_64 (at least). In
> > > > > > > > particular, 32bit kernel or firefox builds in our build system.
> > > > > > > >
> > > > > > > > Reverting this on top of 6.7 makes it work again.
> > > > > > > >
> > > > > > > > Downstream report:
> > > > > > > >   https://bugzilla.suse.com/show_bug.cgi?id=1218841
> > > > > > > >
> > > > > > > > So running:
> > > > > > > > pahole -J --btf_gen_floats -j --lang_exclude=rust
> > > > > > > > --skip_encoding_btf_inconsistent_proto --btf_gen_optimized .tmp_vmlinux.btf
> > > > > > > >
> > > > > > > > crashes or errors out with some random errors:
> > > > > > > > [182671] STRUCT idr's field 'idr_next' offset=128 bit_size=0 type=181346
> > > > > > > > Error emitting field
> > > > > > > >
> > > > > > > > strace shows mmap() fails with ENOMEM right before the errors:
> > > > > > > > 1223  mmap2(NULL, 5783552, PROT_READ|PROT_WRITE,
> > > > > > > > MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> > > > > > > > ...
> > > > > > > > 1223  <... mmap2 resumed>)              = -1 ENOMEM (Cannot allocate
> > > > > > > > memory)
> > > > > > > >
> > > > > > > > Note the .tmp_vmlinux.btf above can be arbitrary, but likely large
> > > > > > > > enough. For reference, one is available at:
> > > > > > > > https://decibel.fi.muni.cz/~xslaby/n/btf
> > > > > > > >
> > > > > > > > Any ideas?
> > > > > > >
> > > > > > > This works around the problem, of course (but is a band-aid, not a fix):
> > > > > > >
> > > > > > > --- a/mm/mmap.c
> > > > > > > +++ b/mm/mmap.c
> > > > > > > @@ -1829,7 +1829,7 @@ get_unmapped_area(struct file *file, unsigned long
> > > > > > > addr, unsigned long len,
> > > > > > >                   */
> > > > > > >                  pgoff = 0;
> > > > > > >                  get_area = shmem_get_unmapped_area;
> > > > > > > -       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> > > > > > > +       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
> > > > > > > !in_32bit_syscall()) {
> > > > > > >                  /* Ensures that larger anonymous mappings are THP
> > > > > > > aligned. */
> > > > > > >                  get_area = thp_get_unmapped_area;
> > > > > > >          }
> > > > > > >
> > > > > > >
> > > > > > > thp_get_unmapped_area() does not take care of the legacy stuff...
> > > > > >
> > > > > > This change also affects the entropy of allocations. With this patch
> > > > > > Android test [1] started failing and it requires only 8 bits of
> > > > > > entropy. The feedback from our security team:
> > > > > >
> > > > > > 8 bits of entropy is already embarrassingly low, but was necessary for
> > > > > > 32 bit ARM targets with low RAM at the time. It's definitely not
> > > > > > acceptable for 64 bit targets.
> > > > >
> > > > > Thanks for the report. Is it 32 bit only or 64 bit is also impacted?
> > > > > If I understand the code correctly, it expects the address allocated
> > > > > by malloc() is kind of randomized, right?
> > > >
> > > > Yes, correct, the test expects a certain level of address randomization.
> > > > The test failure was reported while running kernel_virt_x86_64 target
> > > > (Android emulator on x86), so it does impact 64bit targets.
> > >
> > > IIUC this breaks the "expectation" for randomized addresses returned
> > > by malloc(), but it doesn't break any real Android application, right?
> > > So this is a security concern instead of a real regression.
> >
> > How is making a system move vulnerabile not a real regression?
> >
> > >
> > > I think we can make this opt-in with a knob. Anyone who outweighs
> > > security could opt this feature out. However I'm wondering whether
> > > Android should implement a general address randomization mechanism
> > > instead of depending on "luck" if you do care about it.
> >
> > This is not depending on luck. This is checking for possible
> > vulnerabilities in the system.
> > I admit I'm not a security expert, so I'm looping in Jeff and Kees to chime in.
>
> Hi!
>
> Just to chime in, but reduction in ASLR entropy is absolutely a
> regression. This is userspace visible (via /proc/sys/kernel/randomize_va_space,
> /proc/sys/vm/mmap_rnd*_bits) with expectations that it work as
> advertised. So, while 32-bit might be already ASLR-weak, we don't want
> to make things super bad nor break ASLR in compat mode under 64-bit
> systems.
>
> Having an opt-in sounds reasonable, but we need to wire it to the ASLR
> sysctls in some way so nothing lying about the ASLR entropy.

Thanks for the explanation. IIUC the randomiza_va_space and
mmap_rnd_bits randomize the mmap_base and start_brk for each exec()
call. So the heap allocation is randomized. But it seems the formula
doesn't take into account huge page. ARM64 adjusts the mmap_rnd_bits
based on page size.

I did a simple test, which conceptually does:
1. call mmap to allocate 8M heap
2. print out the allocated address
3. run the program 1000 times (launch/exit/re-launch)
4. check how many unique addresses

With the default config on my arm64 VM (mmap_rnd_bits is 18), I saw
134 unique addresses. Without the patch, I saw 945 unique addresses.
So I think the test could replicate what your test does.

When I increased the mmap_rnd_bits to 24, I saw 988 unique addresses
with the patch. x86_64 should have 28 bits by default, it should
randomize the address quite well. I don't know why you still saw this,
or you have a different setting for mmap_rnd_bits?

I'm wondering whether we should take into account huge page alignment
for mmap_rnd_bits. And I think this is a huge page common problem, we
have file mapping huge page aligned as well.

32 bit is easy, I think I can just make thp_get_unmapped_area() a
no-op on 32 bit system.

>
> -Kees
>
> --
> Kees Cook

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] mm: align larger anonymous mappings on THP boundaries
  2024-01-17 23:32                 ` Yang Shi
@ 2024-01-18  0:01                   ` Suren Baghdasaryan
  2024-01-18  0:13                     ` Yang Shi
  0 siblings, 1 reply; 29+ messages in thread
From: Suren Baghdasaryan @ 2024-01-18  0:01 UTC (permalink / raw)
  To: Yang Shi
  Cc: Kees Cook, Jeffrey Vander Stoep, Jiri Slaby, Rik van Riel,
	Andrew Morton, linux-mm, linux-kernel, kernel-team,
	Matthew Wilcox, Christoph Lameter, linux-hardening

On Wed, Jan 17, 2024 at 3:32 PM Yang Shi <shy828301@gmail.com> wrote:
>
> On Wed, Jan 17, 2024 at 9:40 AM Kees Cook <keescook@chromium.org> wrote:
> >
> > On Tue, Jan 16, 2024 at 02:30:36PM -0800, Suren Baghdasaryan wrote:
> > > On Tue, Jan 16, 2024 at 2:25 PM Yang Shi <shy828301@gmail.com> wrote:
> > > >
> > > > On Tue, Jan 16, 2024 at 1:58 PM Suren Baghdasaryan <surenb@google.com> wrote:
> > > > >
> > > > > On Tue, Jan 16, 2024 at 12:56 PM Yang Shi <shy828301@gmail.com> wrote:
> > > > > >
> > > > > > On Tue, Jan 16, 2024 at 11:16 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > > > > > >
> > > > > > > On Tue, Jan 16, 2024 at 4:09 AM Jiri Slaby <jirislaby@kernel.org> wrote:
> > > > > > > >
> > > > > > > > On 16. 01. 24, 12:53, Jiri Slaby wrote:
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > On 09. 08. 22, 20:24, Rik van Riel wrote:
> > > > > > > > >> Align larger anonymous memory mappings on THP boundaries by
> > > > > > > > >> going through thp_get_unmapped_area if THPs are enabled for
> > > > > > > > >> the current process.
> > > > > > > > >>
> > > > > > > > >> With this patch, larger anonymous mappings are now THP aligned.
> > > > > > > > >> When a malloc library allocates a 2MB or larger arena, that
> > > > > > > > >> arena can now be mapped with THPs right from the start, which
> > > > > > > > >> can result in better TLB hit rates and execution time.
> > > > > > > > >
> > > > > > > > > This appears to break 32bit processes on x86_64 (at least). In
> > > > > > > > > particular, 32bit kernel or firefox builds in our build system.
> > > > > > > > >
> > > > > > > > > Reverting this on top of 6.7 makes it work again.
> > > > > > > > >
> > > > > > > > > Downstream report:
> > > > > > > > >   https://bugzilla.suse.com/show_bug.cgi?id=1218841
> > > > > > > > >
> > > > > > > > > So running:
> > > > > > > > > pahole -J --btf_gen_floats -j --lang_exclude=rust
> > > > > > > > > --skip_encoding_btf_inconsistent_proto --btf_gen_optimized .tmp_vmlinux.btf
> > > > > > > > >
> > > > > > > > > crashes or errors out with some random errors:
> > > > > > > > > [182671] STRUCT idr's field 'idr_next' offset=128 bit_size=0 type=181346
> > > > > > > > > Error emitting field
> > > > > > > > >
> > > > > > > > > strace shows mmap() fails with ENOMEM right before the errors:
> > > > > > > > > 1223  mmap2(NULL, 5783552, PROT_READ|PROT_WRITE,
> > > > > > > > > MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> > > > > > > > > ...
> > > > > > > > > 1223  <... mmap2 resumed>)              = -1 ENOMEM (Cannot allocate
> > > > > > > > > memory)
> > > > > > > > >
> > > > > > > > > Note the .tmp_vmlinux.btf above can be arbitrary, but likely large
> > > > > > > > > enough. For reference, one is available at:
> > > > > > > > > https://decibel.fi.muni.cz/~xslaby/n/btf
> > > > > > > > >
> > > > > > > > > Any ideas?
> > > > > > > >
> > > > > > > > This works around the problem, of course (but is a band-aid, not a fix):
> > > > > > > >
> > > > > > > > --- a/mm/mmap.c
> > > > > > > > +++ b/mm/mmap.c
> > > > > > > > @@ -1829,7 +1829,7 @@ get_unmapped_area(struct file *file, unsigned long
> > > > > > > > addr, unsigned long len,
> > > > > > > >                   */
> > > > > > > >                  pgoff = 0;
> > > > > > > >                  get_area = shmem_get_unmapped_area;
> > > > > > > > -       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> > > > > > > > +       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
> > > > > > > > !in_32bit_syscall()) {
> > > > > > > >                  /* Ensures that larger anonymous mappings are THP
> > > > > > > > aligned. */
> > > > > > > >                  get_area = thp_get_unmapped_area;
> > > > > > > >          }
> > > > > > > >
> > > > > > > >
> > > > > > > > thp_get_unmapped_area() does not take care of the legacy stuff...
> > > > > > >
> > > > > > > This change also affects the entropy of allocations. With this patch
> > > > > > > Android test [1] started failing and it requires only 8 bits of
> > > > > > > entropy. The feedback from our security team:
> > > > > > >
> > > > > > > 8 bits of entropy is already embarrassingly low, but was necessary for
> > > > > > > 32 bit ARM targets with low RAM at the time. It's definitely not
> > > > > > > acceptable for 64 bit targets.
> > > > > >
> > > > > > Thanks for the report. Is it 32 bit only or 64 bit is also impacted?
> > > > > > If I understand the code correctly, it expects the address allocated
> > > > > > by malloc() is kind of randomized, right?
> > > > >
> > > > > Yes, correct, the test expects a certain level of address randomization.
> > > > > The test failure was reported while running kernel_virt_x86_64 target
> > > > > (Android emulator on x86), so it does impact 64bit targets.
> > > >
> > > > IIUC this breaks the "expectation" for randomized addresses returned
> > > > by malloc(), but it doesn't break any real Android application, right?
> > > > So this is a security concern instead of a real regression.
> > >
> > > How is making a system move vulnerabile not a real regression?
> > >
> > > >
> > > > I think we can make this opt-in with a knob. Anyone who outweighs
> > > > security could opt this feature out. However I'm wondering whether
> > > > Android should implement a general address randomization mechanism
> > > > instead of depending on "luck" if you do care about it.
> > >
> > > This is not depending on luck. This is checking for possible
> > > vulnerabilities in the system.
> > > I admit I'm not a security expert, so I'm looping in Jeff and Kees to chime in.
> >
> > Hi!
> >
> > Just to chime in, but reduction in ASLR entropy is absolutely a
> > regression. This is userspace visible (via /proc/sys/kernel/randomize_va_space,
> > /proc/sys/vm/mmap_rnd*_bits) with expectations that it work as
> > advertised. So, while 32-bit might be already ASLR-weak, we don't want
> > to make things super bad nor break ASLR in compat mode under 64-bit
> > systems.
> >
> > Having an opt-in sounds reasonable, but we need to wire it to the ASLR
> > sysctls in some way so nothing lying about the ASLR entropy.
>
> Thanks for the explanation. IIUC the randomiza_va_space and
> mmap_rnd_bits randomize the mmap_base and start_brk for each exec()
> call. So the heap allocation is randomized. But it seems the formula
> doesn't take into account huge page. ARM64 adjusts the mmap_rnd_bits
> based on page size.
>
> I did a simple test, which conceptually does:
> 1. call mmap to allocate 8M heap
> 2. print out the allocated address
> 3. run the program 1000 times (launch/exit/re-launch)
> 4. check how many unique addresses
>
> With the default config on my arm64 VM (mmap_rnd_bits is 18), I saw
> 134 unique addresses. Without the patch, I saw 945 unique addresses.
> So I think the test could replicate what your test does.
>
> When I increased the mmap_rnd_bits to 24, I saw 988 unique addresses
> with the patch. x86_64 should have 28 bits by default, it should
> randomize the address quite well. I don't know why you still saw this,
> or you have a different setting for mmap_rnd_bits?

I checked the configuration on our test harness where the test failed:

/proc/sys/vm/mmap_rnd_bits = 32
/proc/sys/vm/mmap_rnd_compat_bits = 16

The failure logs are:

10-20 14:37:52.123  7029  7029 V AslrMallocTest: 7 bits of entropy for
allocation size 8388608 (minimum 8)
10-20 14:37:52.123  7029  7029 E AslrMallocTest: insufficient entropy
for malloc(8388608)

which come from here:
https://cs.android.com/android/platform/superproject/main/+/main:cts/tests/aslr/src/AslrMallocTest.cpp;l=127
So, the allocation size for which this test failed was 2^23.


> I'm wondering whether we should take into account huge page alignment
> for mmap_rnd_bits. And I think this is a huge page common problem, we
> have file mapping huge page aligned as well.
>
> 32 bit is easy, I think I can just make thp_get_unmapped_area() a
> no-op on 32 bit system.
>
> >
> > -Kees
> >
> > --
> > Kees Cook

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] mm: align larger anonymous mappings on THP boundaries
  2024-01-16 12:09   ` Jiri Slaby
  2024-01-16 19:16     ` Suren Baghdasaryan
  2024-01-16 20:55     ` Yang Shi
@ 2024-01-18  0:07     ` Yang Shi
  2024-01-18  0:09       ` Suren Baghdasaryan
  2024-01-18  7:04       ` Jiri Slaby
  2 siblings, 2 replies; 29+ messages in thread
From: Yang Shi @ 2024-01-18  0:07 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Rik van Riel, Andrew Morton, linux-mm, linux-kernel, kernel-team,
	Matthew Wilcox, Christoph Lameter

On Tue, Jan 16, 2024 at 4:09 AM Jiri Slaby <jirislaby@kernel.org> wrote:
>
> On 16. 01. 24, 12:53, Jiri Slaby wrote:
> > Hi,
> >
> > On 09. 08. 22, 20:24, Rik van Riel wrote:
> >> Align larger anonymous memory mappings on THP boundaries by
> >> going through thp_get_unmapped_area if THPs are enabled for
> >> the current process.
> >>
> >> With this patch, larger anonymous mappings are now THP aligned.
> >> When a malloc library allocates a 2MB or larger arena, that
> >> arena can now be mapped with THPs right from the start, which
> >> can result in better TLB hit rates and execution time.
> >
> > This appears to break 32bit processes on x86_64 (at least). In
> > particular, 32bit kernel or firefox builds in our build system.
> >
> > Reverting this on top of 6.7 makes it work again.
> >
> > Downstream report:
> >   https://bugzilla.suse.com/show_bug.cgi?id=1218841
> >
> > So running:
> > pahole -J --btf_gen_floats -j --lang_exclude=rust
> > --skip_encoding_btf_inconsistent_proto --btf_gen_optimized .tmp_vmlinux.btf
> >
> > crashes or errors out with some random errors:
> > [182671] STRUCT idr's field 'idr_next' offset=128 bit_size=0 type=181346
> > Error emitting field
> >
> > strace shows mmap() fails with ENOMEM right before the errors:
> > 1223  mmap2(NULL, 5783552, PROT_READ|PROT_WRITE,
> > MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> > ...
> > 1223  <... mmap2 resumed>)              = -1 ENOMEM (Cannot allocate
> > memory)
> >
> > Note the .tmp_vmlinux.btf above can be arbitrary, but likely large
> > enough. For reference, one is available at:
> > https://decibel.fi.muni.cz/~xslaby/n/btf
> >
> > Any ideas?
>
> This works around the problem, of course (but is a band-aid, not a fix):
>
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1829,7 +1829,7 @@ get_unmapped_area(struct file *file, unsigned long
> addr, unsigned long len,
>                   */
>                  pgoff = 0;
>                  get_area = shmem_get_unmapped_area;
> -       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> +       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
> !in_32bit_syscall()) {
>                  /* Ensures that larger anonymous mappings are THP
> aligned. */
>                  get_area = thp_get_unmapped_area;
>          }
>
>
> thp_get_unmapped_area() does not take care of the legacy stuff...

Could you please help test the below patch? It is compiled, but I
don't have 32 bit userspace or machine to test it.

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 94ef5c02b459..a4d0839e5f31 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -811,6 +811,9 @@ static unsigned long
__thp_get_unmapped_area(struct file *filp,
        loff_t off_align = round_up(off, size);
        unsigned long len_pad, ret;

+       if (IS_ENABLED(CONFIG_32BIT) || in_compat_syscall())
+               return 0;
+
        if (off_end <= off_align || (off_end - off_align) < size)
                return 0;


>
> regards,
> --
> js
> suse labs
>

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] mm: align larger anonymous mappings on THP boundaries
  2024-01-18  0:07     ` Yang Shi
@ 2024-01-18  0:09       ` Suren Baghdasaryan
  2024-01-18  0:11         ` Suren Baghdasaryan
  2024-01-18  7:04       ` Jiri Slaby
  1 sibling, 1 reply; 29+ messages in thread
From: Suren Baghdasaryan @ 2024-01-18  0:09 UTC (permalink / raw)
  To: Yang Shi
  Cc: Jiri Slaby, Rik van Riel, Andrew Morton, linux-mm, linux-kernel,
	kernel-team, Matthew Wilcox, Christoph Lameter

On Wed, Jan 17, 2024 at 4:07 PM Yang Shi <shy828301@gmail.com> wrote:
>
> On Tue, Jan 16, 2024 at 4:09 AM Jiri Slaby <jirislaby@kernel.org> wrote:
> >
> > On 16. 01. 24, 12:53, Jiri Slaby wrote:
> > > Hi,
> > >
> > > On 09. 08. 22, 20:24, Rik van Riel wrote:
> > >> Align larger anonymous memory mappings on THP boundaries by
> > >> going through thp_get_unmapped_area if THPs are enabled for
> > >> the current process.
> > >>
> > >> With this patch, larger anonymous mappings are now THP aligned.
> > >> When a malloc library allocates a 2MB or larger arena, that
> > >> arena can now be mapped with THPs right from the start, which
> > >> can result in better TLB hit rates and execution time.
> > >
> > > This appears to break 32bit processes on x86_64 (at least). In
> > > particular, 32bit kernel or firefox builds in our build system.
> > >
> > > Reverting this on top of 6.7 makes it work again.
> > >
> > > Downstream report:
> > >   https://bugzilla.suse.com/show_bug.cgi?id=1218841
> > >
> > > So running:
> > > pahole -J --btf_gen_floats -j --lang_exclude=rust
> > > --skip_encoding_btf_inconsistent_proto --btf_gen_optimized .tmp_vmlinux.btf
> > >
> > > crashes or errors out with some random errors:
> > > [182671] STRUCT idr's field 'idr_next' offset=128 bit_size=0 type=181346
> > > Error emitting field
> > >
> > > strace shows mmap() fails with ENOMEM right before the errors:
> > > 1223  mmap2(NULL, 5783552, PROT_READ|PROT_WRITE,
> > > MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> > > ...
> > > 1223  <... mmap2 resumed>)              = -1 ENOMEM (Cannot allocate
> > > memory)
> > >
> > > Note the .tmp_vmlinux.btf above can be arbitrary, but likely large
> > > enough. For reference, one is available at:
> > > https://decibel.fi.muni.cz/~xslaby/n/btf
> > >
> > > Any ideas?
> >
> > This works around the problem, of course (but is a band-aid, not a fix):
> >
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -1829,7 +1829,7 @@ get_unmapped_area(struct file *file, unsigned long
> > addr, unsigned long len,
> >                   */
> >                  pgoff = 0;
> >                  get_area = shmem_get_unmapped_area;
> > -       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> > +       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
> > !in_32bit_syscall()) {
> >                  /* Ensures that larger anonymous mappings are THP
> > aligned. */
> >                  get_area = thp_get_unmapped_area;
> >          }
> >
> >
> > thp_get_unmapped_area() does not take care of the legacy stuff...
>
> Could you please help test the below patch? It is compiled, but I
> don't have 32 bit userspace or machine to test it.

Hmm. I think you misunderstood me. This is happening on x86_64 emulated system.

>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 94ef5c02b459..a4d0839e5f31 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -811,6 +811,9 @@ static unsigned long
> __thp_get_unmapped_area(struct file *filp,
>         loff_t off_align = round_up(off, size);
>         unsigned long len_pad, ret;
>
> +       if (IS_ENABLED(CONFIG_32BIT) || in_compat_syscall())
> +               return 0;
> +
>         if (off_end <= off_align || (off_end - off_align) < size)
>                 return 0;
>
>
> >
> > regards,
> > --
> > js
> > suse labs
> >
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] mm: align larger anonymous mappings on THP boundaries
  2024-01-18  0:09       ` Suren Baghdasaryan
@ 2024-01-18  0:11         ` Suren Baghdasaryan
  2024-01-18  0:15           ` Yang Shi
  0 siblings, 1 reply; 29+ messages in thread
From: Suren Baghdasaryan @ 2024-01-18  0:11 UTC (permalink / raw)
  To: Yang Shi
  Cc: Jiri Slaby, Rik van Riel, Andrew Morton, linux-mm, linux-kernel,
	kernel-team, Matthew Wilcox, Christoph Lameter

On Wed, Jan 17, 2024 at 4:09 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Wed, Jan 17, 2024 at 4:07 PM Yang Shi <shy828301@gmail.com> wrote:
> >
> > On Tue, Jan 16, 2024 at 4:09 AM Jiri Slaby <jirislaby@kernel.org> wrote:
> > >
> > > On 16. 01. 24, 12:53, Jiri Slaby wrote:
> > > > Hi,
> > > >
> > > > On 09. 08. 22, 20:24, Rik van Riel wrote:
> > > >> Align larger anonymous memory mappings on THP boundaries by
> > > >> going through thp_get_unmapped_area if THPs are enabled for
> > > >> the current process.
> > > >>
> > > >> With this patch, larger anonymous mappings are now THP aligned.
> > > >> When a malloc library allocates a 2MB or larger arena, that
> > > >> arena can now be mapped with THPs right from the start, which
> > > >> can result in better TLB hit rates and execution time.
> > > >
> > > > This appears to break 32bit processes on x86_64 (at least). In
> > > > particular, 32bit kernel or firefox builds in our build system.
> > > >
> > > > Reverting this on top of 6.7 makes it work again.
> > > >
> > > > Downstream report:
> > > >   https://bugzilla.suse.com/show_bug.cgi?id=1218841
> > > >
> > > > So running:
> > > > pahole -J --btf_gen_floats -j --lang_exclude=rust
> > > > --skip_encoding_btf_inconsistent_proto --btf_gen_optimized .tmp_vmlinux.btf
> > > >
> > > > crashes or errors out with some random errors:
> > > > [182671] STRUCT idr's field 'idr_next' offset=128 bit_size=0 type=181346
> > > > Error emitting field
> > > >
> > > > strace shows mmap() fails with ENOMEM right before the errors:
> > > > 1223  mmap2(NULL, 5783552, PROT_READ|PROT_WRITE,
> > > > MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> > > > ...
> > > > 1223  <... mmap2 resumed>)              = -1 ENOMEM (Cannot allocate
> > > > memory)
> > > >
> > > > Note the .tmp_vmlinux.btf above can be arbitrary, but likely large
> > > > enough. For reference, one is available at:
> > > > https://decibel.fi.muni.cz/~xslaby/n/btf
> > > >
> > > > Any ideas?
> > >
> > > This works around the problem, of course (but is a band-aid, not a fix):
> > >
> > > --- a/mm/mmap.c
> > > +++ b/mm/mmap.c
> > > @@ -1829,7 +1829,7 @@ get_unmapped_area(struct file *file, unsigned long
> > > addr, unsigned long len,
> > >                   */
> > >                  pgoff = 0;
> > >                  get_area = shmem_get_unmapped_area;
> > > -       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> > > +       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
> > > !in_32bit_syscall()) {
> > >                  /* Ensures that larger anonymous mappings are THP
> > > aligned. */
> > >                  get_area = thp_get_unmapped_area;
> > >          }
> > >
> > >
> > > thp_get_unmapped_area() does not take care of the legacy stuff...
> >
> > Could you please help test the below patch? It is compiled, but I
> > don't have 32 bit userspace or machine to test it.
>
> Hmm. I think you misunderstood me. This is happening on x86_64 emulated system.

I mean it's x86_64 Android emulator, so kernel is 64bit.

>
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 94ef5c02b459..a4d0839e5f31 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -811,6 +811,9 @@ static unsigned long
> > __thp_get_unmapped_area(struct file *filp,
> >         loff_t off_align = round_up(off, size);
> >         unsigned long len_pad, ret;
> >
> > +       if (IS_ENABLED(CONFIG_32BIT) || in_compat_syscall())
> > +               return 0;
> > +
> >         if (off_end <= off_align || (off_end - off_align) < size)
> >                 return 0;
> >
> >
> > >
> > > regards,
> > > --
> > > js
> > > suse labs
> > >
> >

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] mm: align larger anonymous mappings on THP boundaries
  2024-01-18  0:01                   ` Suren Baghdasaryan
@ 2024-01-18  0:13                     ` Yang Shi
  2024-01-18  0:29                       ` Suren Baghdasaryan
  0 siblings, 1 reply; 29+ messages in thread
From: Yang Shi @ 2024-01-18  0:13 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Kees Cook, Jeffrey Vander Stoep, Jiri Slaby, Rik van Riel,
	Andrew Morton, linux-mm, linux-kernel, kernel-team,
	Matthew Wilcox, Christoph Lameter, linux-hardening

On Wed, Jan 17, 2024 at 4:02 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Wed, Jan 17, 2024 at 3:32 PM Yang Shi <shy828301@gmail.com> wrote:
> >
> > On Wed, Jan 17, 2024 at 9:40 AM Kees Cook <keescook@chromium.org> wrote:
> > >
> > > On Tue, Jan 16, 2024 at 02:30:36PM -0800, Suren Baghdasaryan wrote:
> > > > On Tue, Jan 16, 2024 at 2:25 PM Yang Shi <shy828301@gmail.com> wrote:
> > > > >
> > > > > On Tue, Jan 16, 2024 at 1:58 PM Suren Baghdasaryan <surenb@google.com> wrote:
> > > > > >
> > > > > > On Tue, Jan 16, 2024 at 12:56 PM Yang Shi <shy828301@gmail.com> wrote:
> > > > > > >
> > > > > > > On Tue, Jan 16, 2024 at 11:16 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > > > > > > >
> > > > > > > > On Tue, Jan 16, 2024 at 4:09 AM Jiri Slaby <jirislaby@kernel.org> wrote:
> > > > > > > > >
> > > > > > > > > On 16. 01. 24, 12:53, Jiri Slaby wrote:
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > On 09. 08. 22, 20:24, Rik van Riel wrote:
> > > > > > > > > >> Align larger anonymous memory mappings on THP boundaries by
> > > > > > > > > >> going through thp_get_unmapped_area if THPs are enabled for
> > > > > > > > > >> the current process.
> > > > > > > > > >>
> > > > > > > > > >> With this patch, larger anonymous mappings are now THP aligned.
> > > > > > > > > >> When a malloc library allocates a 2MB or larger arena, that
> > > > > > > > > >> arena can now be mapped with THPs right from the start, which
> > > > > > > > > >> can result in better TLB hit rates and execution time.
> > > > > > > > > >
> > > > > > > > > > This appears to break 32bit processes on x86_64 (at least). In
> > > > > > > > > > particular, 32bit kernel or firefox builds in our build system.
> > > > > > > > > >
> > > > > > > > > > Reverting this on top of 6.7 makes it work again.
> > > > > > > > > >
> > > > > > > > > > Downstream report:
> > > > > > > > > >   https://bugzilla.suse.com/show_bug.cgi?id=1218841
> > > > > > > > > >
> > > > > > > > > > So running:
> > > > > > > > > > pahole -J --btf_gen_floats -j --lang_exclude=rust
> > > > > > > > > > --skip_encoding_btf_inconsistent_proto --btf_gen_optimized .tmp_vmlinux.btf
> > > > > > > > > >
> > > > > > > > > > crashes or errors out with some random errors:
> > > > > > > > > > [182671] STRUCT idr's field 'idr_next' offset=128 bit_size=0 type=181346
> > > > > > > > > > Error emitting field
> > > > > > > > > >
> > > > > > > > > > strace shows mmap() fails with ENOMEM right before the errors:
> > > > > > > > > > 1223  mmap2(NULL, 5783552, PROT_READ|PROT_WRITE,
> > > > > > > > > > MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> > > > > > > > > > ...
> > > > > > > > > > 1223  <... mmap2 resumed>)              = -1 ENOMEM (Cannot allocate
> > > > > > > > > > memory)
> > > > > > > > > >
> > > > > > > > > > Note the .tmp_vmlinux.btf above can be arbitrary, but likely large
> > > > > > > > > > enough. For reference, one is available at:
> > > > > > > > > > https://decibel.fi.muni.cz/~xslaby/n/btf
> > > > > > > > > >
> > > > > > > > > > Any ideas?
> > > > > > > > >
> > > > > > > > > This works around the problem, of course (but is a band-aid, not a fix):
> > > > > > > > >
> > > > > > > > > --- a/mm/mmap.c
> > > > > > > > > +++ b/mm/mmap.c
> > > > > > > > > @@ -1829,7 +1829,7 @@ get_unmapped_area(struct file *file, unsigned long
> > > > > > > > > addr, unsigned long len,
> > > > > > > > >                   */
> > > > > > > > >                  pgoff = 0;
> > > > > > > > >                  get_area = shmem_get_unmapped_area;
> > > > > > > > > -       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> > > > > > > > > +       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
> > > > > > > > > !in_32bit_syscall()) {
> > > > > > > > >                  /* Ensures that larger anonymous mappings are THP
> > > > > > > > > aligned. */
> > > > > > > > >                  get_area = thp_get_unmapped_area;
> > > > > > > > >          }
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > thp_get_unmapped_area() does not take care of the legacy stuff...
> > > > > > > >
> > > > > > > > This change also affects the entropy of allocations. With this patch
> > > > > > > > Android test [1] started failing and it requires only 8 bits of
> > > > > > > > entropy. The feedback from our security team:
> > > > > > > >
> > > > > > > > 8 bits of entropy is already embarrassingly low, but was necessary for
> > > > > > > > 32 bit ARM targets with low RAM at the time. It's definitely not
> > > > > > > > acceptable for 64 bit targets.
> > > > > > >
> > > > > > > Thanks for the report. Is it 32 bit only or 64 bit is also impacted?
> > > > > > > If I understand the code correctly, it expects the address allocated
> > > > > > > by malloc() is kind of randomized, right?
> > > > > >
> > > > > > Yes, correct, the test expects a certain level of address randomization.
> > > > > > The test failure was reported while running kernel_virt_x86_64 target
> > > > > > (Android emulator on x86), so it does impact 64bit targets.
> > > > >
> > > > > IIUC this breaks the "expectation" for randomized addresses returned
> > > > > by malloc(), but it doesn't break any real Android application, right?
> > > > > So this is a security concern instead of a real regression.
> > > >
> > > > How is making a system move vulnerabile not a real regression?
> > > >
> > > > >
> > > > > I think we can make this opt-in with a knob. Anyone who outweighs
> > > > > security could opt this feature out. However I'm wondering whether
> > > > > Android should implement a general address randomization mechanism
> > > > > instead of depending on "luck" if you do care about it.
> > > >
> > > > This is not depending on luck. This is checking for possible
> > > > vulnerabilities in the system.
> > > > I admit I'm not a security expert, so I'm looping in Jeff and Kees to chime in.
> > >
> > > Hi!
> > >
> > > Just to chime in, but reduction in ASLR entropy is absolutely a
> > > regression. This is userspace visible (via /proc/sys/kernel/randomize_va_space,
> > > /proc/sys/vm/mmap_rnd*_bits) with expectations that it work as
> > > advertised. So, while 32-bit might be already ASLR-weak, we don't want
> > > to make things super bad nor break ASLR in compat mode under 64-bit
> > > systems.
> > >
> > > Having an opt-in sounds reasonable, but we need to wire it to the ASLR
> > > sysctls in some way so nothing lying about the ASLR entropy.
> >
> > Thanks for the explanation. IIUC the randomiza_va_space and
> > mmap_rnd_bits randomize the mmap_base and start_brk for each exec()
> > call. So the heap allocation is randomized. But it seems the formula
> > doesn't take into account huge page. ARM64 adjusts the mmap_rnd_bits
> > based on page size.
> >
> > I did a simple test, which conceptually does:
> > 1. call mmap to allocate 8M heap
> > 2. print out the allocated address
> > 3. run the program 1000 times (launch/exit/re-launch)
> > 4. check how many unique addresses
> >
> > With the default config on my arm64 VM (mmap_rnd_bits is 18), I saw
> > 134 unique addresses. Without the patch, I saw 945 unique addresses.
> > So I think the test could replicate what your test does.
> >
> > When I increased the mmap_rnd_bits to 24, I saw 988 unique addresses
> > with the patch. x86_64 should have 28 bits by default, it should
> > randomize the address quite well. I don't know why you still saw this,
> > or you have a different setting for mmap_rnd_bits?
>
> I checked the configuration on our test harness where the test failed:

Thanks, Suren.

>
> /proc/sys/vm/mmap_rnd_bits = 32

It is surprising 32 bits still fail. 24 bits on arm64 works for me. Or
the compat bits is used?

> /proc/sys/vm/mmap_rnd_compat_bits = 16
>
> The failure logs are:
>
> 10-20 14:37:52.123  7029  7029 V AslrMallocTest: 7 bits of entropy for
> allocation size 8388608 (minimum 8)
> 10-20 14:37:52.123  7029  7029 E AslrMallocTest: insufficient entropy
> for malloc(8388608)
>
> which come from here:
> https://cs.android.com/android/platform/superproject/main/+/main:cts/tests/aslr/src/AslrMallocTest.cpp;l=127
> So, the allocation size for which this test failed was 2^23.

The patch just tries to align >= 2M allocations. It looks like your
test allocates 256 bytes, 64K and 8M. So just 8M is impacted.


>
>
> > I'm wondering whether we should take into account huge page alignment
> > for mmap_rnd_bits. And I think this is a huge page common problem, we
> > have file mapping huge page aligned as well.
> >
> > 32 bit is easy, I think I can just make thp_get_unmapped_area() a
> > no-op on 32 bit system.
> >
> > >
> > > -Kees
> > >
> > > --
> > > Kees Cook

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] mm: align larger anonymous mappings on THP boundaries
  2024-01-18  0:11         ` Suren Baghdasaryan
@ 2024-01-18  0:15           ` Yang Shi
  2024-01-18  0:28             ` Suren Baghdasaryan
  0 siblings, 1 reply; 29+ messages in thread
From: Yang Shi @ 2024-01-18  0:15 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Jiri Slaby, Rik van Riel, Andrew Morton, linux-mm, linux-kernel,
	kernel-team, Matthew Wilcox, Christoph Lameter

On Wed, Jan 17, 2024 at 4:11 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Wed, Jan 17, 2024 at 4:09 PM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Wed, Jan 17, 2024 at 4:07 PM Yang Shi <shy828301@gmail.com> wrote:
> > >
> > > On Tue, Jan 16, 2024 at 4:09 AM Jiri Slaby <jirislaby@kernel.org> wrote:
> > > >
> > > > On 16. 01. 24, 12:53, Jiri Slaby wrote:
> > > > > Hi,
> > > > >
> > > > > On 09. 08. 22, 20:24, Rik van Riel wrote:
> > > > >> Align larger anonymous memory mappings on THP boundaries by
> > > > >> going through thp_get_unmapped_area if THPs are enabled for
> > > > >> the current process.
> > > > >>
> > > > >> With this patch, larger anonymous mappings are now THP aligned.
> > > > >> When a malloc library allocates a 2MB or larger arena, that
> > > > >> arena can now be mapped with THPs right from the start, which
> > > > >> can result in better TLB hit rates and execution time.
> > > > >
> > > > > This appears to break 32bit processes on x86_64 (at least). In
> > > > > particular, 32bit kernel or firefox builds in our build system.
> > > > >
> > > > > Reverting this on top of 6.7 makes it work again.
> > > > >
> > > > > Downstream report:
> > > > >   https://bugzilla.suse.com/show_bug.cgi?id=1218841
> > > > >
> > > > > So running:
> > > > > pahole -J --btf_gen_floats -j --lang_exclude=rust
> > > > > --skip_encoding_btf_inconsistent_proto --btf_gen_optimized .tmp_vmlinux.btf
> > > > >
> > > > > crashes or errors out with some random errors:
> > > > > [182671] STRUCT idr's field 'idr_next' offset=128 bit_size=0 type=181346
> > > > > Error emitting field
> > > > >
> > > > > strace shows mmap() fails with ENOMEM right before the errors:
> > > > > 1223  mmap2(NULL, 5783552, PROT_READ|PROT_WRITE,
> > > > > MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> > > > > ...
> > > > > 1223  <... mmap2 resumed>)              = -1 ENOMEM (Cannot allocate
> > > > > memory)
> > > > >
> > > > > Note the .tmp_vmlinux.btf above can be arbitrary, but likely large
> > > > > enough. For reference, one is available at:
> > > > > https://decibel.fi.muni.cz/~xslaby/n/btf
> > > > >
> > > > > Any ideas?
> > > >
> > > > This works around the problem, of course (but is a band-aid, not a fix):
> > > >
> > > > --- a/mm/mmap.c
> > > > +++ b/mm/mmap.c
> > > > @@ -1829,7 +1829,7 @@ get_unmapped_area(struct file *file, unsigned long
> > > > addr, unsigned long len,
> > > >                   */
> > > >                  pgoff = 0;
> > > >                  get_area = shmem_get_unmapped_area;
> > > > -       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> > > > +       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
> > > > !in_32bit_syscall()) {
> > > >                  /* Ensures that larger anonymous mappings are THP
> > > > aligned. */
> > > >                  get_area = thp_get_unmapped_area;
> > > >          }
> > > >
> > > >
> > > > thp_get_unmapped_area() does not take care of the legacy stuff...
> > >
> > > Could you please help test the below patch? It is compiled, but I
> > > don't have 32 bit userspace or machine to test it.
> >
> > Hmm. I think you misunderstood me. This is happening on x86_64 emulated system.
>
> I mean it's x86_64 Android emulator, so kernel is 64bit.

No, I didn't. This patch is aimed to solve the -ENOMEM problem
reported by Jiri Slaby. I believe his test was run on 32 bit machine
or 32 bit userspace on 64 bit kernel.

>
> >
> > >
> > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > index 94ef5c02b459..a4d0839e5f31 100644
> > > --- a/mm/huge_memory.c
> > > +++ b/mm/huge_memory.c
> > > @@ -811,6 +811,9 @@ static unsigned long
> > > __thp_get_unmapped_area(struct file *filp,
> > >         loff_t off_align = round_up(off, size);
> > >         unsigned long len_pad, ret;
> > >
> > > +       if (IS_ENABLED(CONFIG_32BIT) || in_compat_syscall())
> > > +               return 0;
> > > +
> > >         if (off_end <= off_align || (off_end - off_align) < size)
> > >                 return 0;
> > >
> > >
> > > >
> > > > regards,
> > > > --
> > > > js
> > > > suse labs
> > > >
> > >

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] mm: align larger anonymous mappings on THP boundaries
  2024-01-18  0:15           ` Yang Shi
@ 2024-01-18  0:28             ` Suren Baghdasaryan
  0 siblings, 0 replies; 29+ messages in thread
From: Suren Baghdasaryan @ 2024-01-18  0:28 UTC (permalink / raw)
  To: Yang Shi
  Cc: Jiri Slaby, Rik van Riel, Andrew Morton, linux-mm, linux-kernel,
	kernel-team, Matthew Wilcox, Christoph Lameter

On Wed, Jan 17, 2024 at 4:15 PM Yang Shi <shy828301@gmail.com> wrote:
>
> On Wed, Jan 17, 2024 at 4:11 PM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Wed, Jan 17, 2024 at 4:09 PM Suren Baghdasaryan <surenb@google.com> wrote:
> > >
> > > On Wed, Jan 17, 2024 at 4:07 PM Yang Shi <shy828301@gmail.com> wrote:
> > > >
> > > > On Tue, Jan 16, 2024 at 4:09 AM Jiri Slaby <jirislaby@kernel.org> wrote:
> > > > >
> > > > > On 16. 01. 24, 12:53, Jiri Slaby wrote:
> > > > > > Hi,
> > > > > >
> > > > > > On 09. 08. 22, 20:24, Rik van Riel wrote:
> > > > > >> Align larger anonymous memory mappings on THP boundaries by
> > > > > >> going through thp_get_unmapped_area if THPs are enabled for
> > > > > >> the current process.
> > > > > >>
> > > > > >> With this patch, larger anonymous mappings are now THP aligned.
> > > > > >> When a malloc library allocates a 2MB or larger arena, that
> > > > > >> arena can now be mapped with THPs right from the start, which
> > > > > >> can result in better TLB hit rates and execution time.
> > > > > >
> > > > > > This appears to break 32bit processes on x86_64 (at least). In
> > > > > > particular, 32bit kernel or firefox builds in our build system.
> > > > > >
> > > > > > Reverting this on top of 6.7 makes it work again.
> > > > > >
> > > > > > Downstream report:
> > > > > >   https://bugzilla.suse.com/show_bug.cgi?id=1218841
> > > > > >
> > > > > > So running:
> > > > > > pahole -J --btf_gen_floats -j --lang_exclude=rust
> > > > > > --skip_encoding_btf_inconsistent_proto --btf_gen_optimized .tmp_vmlinux.btf
> > > > > >
> > > > > > crashes or errors out with some random errors:
> > > > > > [182671] STRUCT idr's field 'idr_next' offset=128 bit_size=0 type=181346
> > > > > > Error emitting field
> > > > > >
> > > > > > strace shows mmap() fails with ENOMEM right before the errors:
> > > > > > 1223  mmap2(NULL, 5783552, PROT_READ|PROT_WRITE,
> > > > > > MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> > > > > > ...
> > > > > > 1223  <... mmap2 resumed>)              = -1 ENOMEM (Cannot allocate
> > > > > > memory)
> > > > > >
> > > > > > Note the .tmp_vmlinux.btf above can be arbitrary, but likely large
> > > > > > enough. For reference, one is available at:
> > > > > > https://decibel.fi.muni.cz/~xslaby/n/btf
> > > > > >
> > > > > > Any ideas?
> > > > >
> > > > > This works around the problem, of course (but is a band-aid, not a fix):
> > > > >
> > > > > --- a/mm/mmap.c
> > > > > +++ b/mm/mmap.c
> > > > > @@ -1829,7 +1829,7 @@ get_unmapped_area(struct file *file, unsigned long
> > > > > addr, unsigned long len,
> > > > >                   */
> > > > >                  pgoff = 0;
> > > > >                  get_area = shmem_get_unmapped_area;
> > > > > -       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> > > > > +       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
> > > > > !in_32bit_syscall()) {
> > > > >                  /* Ensures that larger anonymous mappings are THP
> > > > > aligned. */
> > > > >                  get_area = thp_get_unmapped_area;
> > > > >          }
> > > > >
> > > > >
> > > > > thp_get_unmapped_area() does not take care of the legacy stuff...
> > > >
> > > > Could you please help test the below patch? It is compiled, but I
> > > > don't have 32 bit userspace or machine to test it.
> > >
> > > Hmm. I think you misunderstood me. This is happening on x86_64 emulated system.
> >
> > I mean it's x86_64 Android emulator, so kernel is 64bit.
>
> No, I didn't. This patch is aimed to solve the -ENOMEM problem
> reported by Jiri Slaby. I believe his test was run on 32 bit machine
> or 32 bit userspace on 64 bit kernel.

Oh, sorry, I didn't realize you were asking Jiri to test for the
problem he reported :)

>
> >
> > >
> > > >
> > > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > > index 94ef5c02b459..a4d0839e5f31 100644
> > > > --- a/mm/huge_memory.c
> > > > +++ b/mm/huge_memory.c
> > > > @@ -811,6 +811,9 @@ static unsigned long
> > > > __thp_get_unmapped_area(struct file *filp,
> > > >         loff_t off_align = round_up(off, size);
> > > >         unsigned long len_pad, ret;
> > > >
> > > > +       if (IS_ENABLED(CONFIG_32BIT) || in_compat_syscall())
> > > > +               return 0;
> > > > +
> > > >         if (off_end <= off_align || (off_end - off_align) < size)
> > > >                 return 0;
> > > >
> > > >
> > > > >
> > > > > regards,
> > > > > --
> > > > > js
> > > > > suse labs
> > > > >
> > > >

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] mm: align larger anonymous mappings on THP boundaries
  2024-01-18  0:13                     ` Yang Shi
@ 2024-01-18  0:29                       ` Suren Baghdasaryan
  2024-01-18  1:34                         ` Suren Baghdasaryan
  0 siblings, 1 reply; 29+ messages in thread
From: Suren Baghdasaryan @ 2024-01-18  0:29 UTC (permalink / raw)
  To: Yang Shi
  Cc: Kees Cook, Jeffrey Vander Stoep, Jiri Slaby, Rik van Riel,
	Andrew Morton, linux-mm, linux-kernel, kernel-team,
	Matthew Wilcox, Christoph Lameter, linux-hardening

On Wed, Jan 17, 2024 at 4:13 PM Yang Shi <shy828301@gmail.com> wrote:
>
> On Wed, Jan 17, 2024 at 4:02 PM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Wed, Jan 17, 2024 at 3:32 PM Yang Shi <shy828301@gmail.com> wrote:
> > >
> > > On Wed, Jan 17, 2024 at 9:40 AM Kees Cook <keescook@chromium.org> wrote:
> > > >
> > > > On Tue, Jan 16, 2024 at 02:30:36PM -0800, Suren Baghdasaryan wrote:
> > > > > On Tue, Jan 16, 2024 at 2:25 PM Yang Shi <shy828301@gmail.com> wrote:
> > > > > >
> > > > > > On Tue, Jan 16, 2024 at 1:58 PM Suren Baghdasaryan <surenb@google.com> wrote:
> > > > > > >
> > > > > > > On Tue, Jan 16, 2024 at 12:56 PM Yang Shi <shy828301@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Tue, Jan 16, 2024 at 11:16 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > > > > > > > >
> > > > > > > > > On Tue, Jan 16, 2024 at 4:09 AM Jiri Slaby <jirislaby@kernel.org> wrote:
> > > > > > > > > >
> > > > > > > > > > On 16. 01. 24, 12:53, Jiri Slaby wrote:
> > > > > > > > > > > Hi,
> > > > > > > > > > >
> > > > > > > > > > > On 09. 08. 22, 20:24, Rik van Riel wrote:
> > > > > > > > > > >> Align larger anonymous memory mappings on THP boundaries by
> > > > > > > > > > >> going through thp_get_unmapped_area if THPs are enabled for
> > > > > > > > > > >> the current process.
> > > > > > > > > > >>
> > > > > > > > > > >> With this patch, larger anonymous mappings are now THP aligned.
> > > > > > > > > > >> When a malloc library allocates a 2MB or larger arena, that
> > > > > > > > > > >> arena can now be mapped with THPs right from the start, which
> > > > > > > > > > >> can result in better TLB hit rates and execution time.
> > > > > > > > > > >
> > > > > > > > > > > This appears to break 32bit processes on x86_64 (at least). In
> > > > > > > > > > > particular, 32bit kernel or firefox builds in our build system.
> > > > > > > > > > >
> > > > > > > > > > > Reverting this on top of 6.7 makes it work again.
> > > > > > > > > > >
> > > > > > > > > > > Downstream report:
> > > > > > > > > > >   https://bugzilla.suse.com/show_bug.cgi?id=1218841
> > > > > > > > > > >
> > > > > > > > > > > So running:
> > > > > > > > > > > pahole -J --btf_gen_floats -j --lang_exclude=rust
> > > > > > > > > > > --skip_encoding_btf_inconsistent_proto --btf_gen_optimized .tmp_vmlinux.btf
> > > > > > > > > > >
> > > > > > > > > > > crashes or errors out with some random errors:
> > > > > > > > > > > [182671] STRUCT idr's field 'idr_next' offset=128 bit_size=0 type=181346
> > > > > > > > > > > Error emitting field
> > > > > > > > > > >
> > > > > > > > > > > strace shows mmap() fails with ENOMEM right before the errors:
> > > > > > > > > > > 1223  mmap2(NULL, 5783552, PROT_READ|PROT_WRITE,
> > > > > > > > > > > MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> > > > > > > > > > > ...
> > > > > > > > > > > 1223  <... mmap2 resumed>)              = -1 ENOMEM (Cannot allocate
> > > > > > > > > > > memory)
> > > > > > > > > > >
> > > > > > > > > > > Note the .tmp_vmlinux.btf above can be arbitrary, but likely large
> > > > > > > > > > > enough. For reference, one is available at:
> > > > > > > > > > > https://decibel.fi.muni.cz/~xslaby/n/btf
> > > > > > > > > > >
> > > > > > > > > > > Any ideas?
> > > > > > > > > >
> > > > > > > > > > This works around the problem, of course (but is a band-aid, not a fix):
> > > > > > > > > >
> > > > > > > > > > --- a/mm/mmap.c
> > > > > > > > > > +++ b/mm/mmap.c
> > > > > > > > > > @@ -1829,7 +1829,7 @@ get_unmapped_area(struct file *file, unsigned long
> > > > > > > > > > addr, unsigned long len,
> > > > > > > > > >                   */
> > > > > > > > > >                  pgoff = 0;
> > > > > > > > > >                  get_area = shmem_get_unmapped_area;
> > > > > > > > > > -       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> > > > > > > > > > +       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
> > > > > > > > > > !in_32bit_syscall()) {
> > > > > > > > > >                  /* Ensures that larger anonymous mappings are THP
> > > > > > > > > > aligned. */
> > > > > > > > > >                  get_area = thp_get_unmapped_area;
> > > > > > > > > >          }
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > thp_get_unmapped_area() does not take care of the legacy stuff...
> > > > > > > > >
> > > > > > > > > This change also affects the entropy of allocations. With this patch
> > > > > > > > > Android test [1] started failing and it requires only 8 bits of
> > > > > > > > > entropy. The feedback from our security team:
> > > > > > > > >
> > > > > > > > > 8 bits of entropy is already embarrassingly low, but was necessary for
> > > > > > > > > 32 bit ARM targets with low RAM at the time. It's definitely not
> > > > > > > > > acceptable for 64 bit targets.
> > > > > > > >
> > > > > > > > Thanks for the report. Is it 32 bit only or 64 bit is also impacted?
> > > > > > > > If I understand the code correctly, it expects the address allocated
> > > > > > > > by malloc() is kind of randomized, right?
> > > > > > >
> > > > > > > Yes, correct, the test expects a certain level of address randomization.
> > > > > > > The test failure was reported while running kernel_virt_x86_64 target
> > > > > > > (Android emulator on x86), so it does impact 64bit targets.
> > > > > >
> > > > > > IIUC this breaks the "expectation" for randomized addresses returned
> > > > > > by malloc(), but it doesn't break any real Android application, right?
> > > > > > So this is a security concern instead of a real regression.
> > > > >
> > > > > How is making a system move vulnerabile not a real regression?
> > > > >
> > > > > >
> > > > > > I think we can make this opt-in with a knob. Anyone who outweighs
> > > > > > security could opt this feature out. However I'm wondering whether
> > > > > > Android should implement a general address randomization mechanism
> > > > > > instead of depending on "luck" if you do care about it.
> > > > >
> > > > > This is not depending on luck. This is checking for possible
> > > > > vulnerabilities in the system.
> > > > > I admit I'm not a security expert, so I'm looping in Jeff and Kees to chime in.
> > > >
> > > > Hi!
> > > >
> > > > Just to chime in, but reduction in ASLR entropy is absolutely a
> > > > regression. This is userspace visible (via /proc/sys/kernel/randomize_va_space,
> > > > /proc/sys/vm/mmap_rnd*_bits) with expectations that it work as
> > > > advertised. So, while 32-bit might be already ASLR-weak, we don't want
> > > > to make things super bad nor break ASLR in compat mode under 64-bit
> > > > systems.
> > > >
> > > > Having an opt-in sounds reasonable, but we need to wire it to the ASLR
> > > > sysctls in some way so nothing lying about the ASLR entropy.
> > >
> > > Thanks for the explanation. IIUC the randomiza_va_space and
> > > mmap_rnd_bits randomize the mmap_base and start_brk for each exec()
> > > call. So the heap allocation is randomized. But it seems the formula
> > > doesn't take into account huge page. ARM64 adjusts the mmap_rnd_bits
> > > based on page size.
> > >
> > > I did a simple test, which conceptually does:
> > > 1. call mmap to allocate 8M heap
> > > 2. print out the allocated address
> > > 3. run the program 1000 times (launch/exit/re-launch)
> > > 4. check how many unique addresses
> > >
> > > With the default config on my arm64 VM (mmap_rnd_bits is 18), I saw
> > > 134 unique addresses. Without the patch, I saw 945 unique addresses.
> > > So I think the test could replicate what your test does.
> > >
> > > When I increased the mmap_rnd_bits to 24, I saw 988 unique addresses
> > > with the patch. x86_64 should have 28 bits by default, it should
> > > randomize the address quite well. I don't know why you still saw this,
> > > or you have a different setting for mmap_rnd_bits?
> >
> > I checked the configuration on our test harness where the test failed:
>
> Thanks, Suren.
>
> >
> > /proc/sys/vm/mmap_rnd_bits = 32
>
> It is surprising 32 bits still fail. 24 bits on arm64 works for me. Or
> the compat bits is used?

Hmm. Let me verify to exclude that possibility.

>
> > /proc/sys/vm/mmap_rnd_compat_bits = 16
> >
> > The failure logs are:
> >
> > 10-20 14:37:52.123  7029  7029 V AslrMallocTest: 7 bits of entropy for
> > allocation size 8388608 (minimum 8)
> > 10-20 14:37:52.123  7029  7029 E AslrMallocTest: insufficient entropy
> > for malloc(8388608)
> >
> > which come from here:
> > https://cs.android.com/android/platform/superproject/main/+/main:cts/tests/aslr/src/AslrMallocTest.cpp;l=127
> > So, the allocation size for which this test failed was 2^23.
>
> The patch just tries to align >= 2M allocations. It looks like your
> test allocates 256 bytes, 64K and 8M. So just 8M is impacted.

Correct.

>
>
> >
> >
> > > I'm wondering whether we should take into account huge page alignment
> > > for mmap_rnd_bits. And I think this is a huge page common problem, we
> > > have file mapping huge page aligned as well.
> > >
> > > 32 bit is easy, I think I can just make thp_get_unmapped_area() a
> > > no-op on 32 bit system.
> > >
> > > >
> > > > -Kees
> > > >
> > > > --
> > > > Kees Cook

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] mm: align larger anonymous mappings on THP boundaries
  2024-01-18  0:29                       ` Suren Baghdasaryan
@ 2024-01-18  1:34                         ` Suren Baghdasaryan
  2024-01-18  2:10                           ` Suren Baghdasaryan
  0 siblings, 1 reply; 29+ messages in thread
From: Suren Baghdasaryan @ 2024-01-18  1:34 UTC (permalink / raw)
  To: Yang Shi
  Cc: Kees Cook, Jeffrey Vander Stoep, Jiri Slaby, Rik van Riel,
	Andrew Morton, linux-mm, linux-kernel, kernel-team,
	Matthew Wilcox, Christoph Lameter, linux-hardening

On Wed, Jan 17, 2024 at 4:29 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Wed, Jan 17, 2024 at 4:13 PM Yang Shi <shy828301@gmail.com> wrote:
> >
> > On Wed, Jan 17, 2024 at 4:02 PM Suren Baghdasaryan <surenb@google.com> wrote:
> > >
> > > On Wed, Jan 17, 2024 at 3:32 PM Yang Shi <shy828301@gmail.com> wrote:
> > > >
> > > > On Wed, Jan 17, 2024 at 9:40 AM Kees Cook <keescook@chromium.org> wrote:
> > > > >
> > > > > On Tue, Jan 16, 2024 at 02:30:36PM -0800, Suren Baghdasaryan wrote:
> > > > > > On Tue, Jan 16, 2024 at 2:25 PM Yang Shi <shy828301@gmail.com> wrote:
> > > > > > >
> > > > > > > On Tue, Jan 16, 2024 at 1:58 PM Suren Baghdasaryan <surenb@google.com> wrote:
> > > > > > > >
> > > > > > > > On Tue, Jan 16, 2024 at 12:56 PM Yang Shi <shy828301@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > On Tue, Jan 16, 2024 at 11:16 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, Jan 16, 2024 at 4:09 AM Jiri Slaby <jirislaby@kernel.org> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On 16. 01. 24, 12:53, Jiri Slaby wrote:
> > > > > > > > > > > > Hi,
> > > > > > > > > > > >
> > > > > > > > > > > > On 09. 08. 22, 20:24, Rik van Riel wrote:
> > > > > > > > > > > >> Align larger anonymous memory mappings on THP boundaries by
> > > > > > > > > > > >> going through thp_get_unmapped_area if THPs are enabled for
> > > > > > > > > > > >> the current process.
> > > > > > > > > > > >>
> > > > > > > > > > > >> With this patch, larger anonymous mappings are now THP aligned.
> > > > > > > > > > > >> When a malloc library allocates a 2MB or larger arena, that
> > > > > > > > > > > >> arena can now be mapped with THPs right from the start, which
> > > > > > > > > > > >> can result in better TLB hit rates and execution time.
> > > > > > > > > > > >
> > > > > > > > > > > > This appears to break 32bit processes on x86_64 (at least). In
> > > > > > > > > > > > particular, 32bit kernel or firefox builds in our build system.
> > > > > > > > > > > >
> > > > > > > > > > > > Reverting this on top of 6.7 makes it work again.
> > > > > > > > > > > >
> > > > > > > > > > > > Downstream report:
> > > > > > > > > > > >   https://bugzilla.suse.com/show_bug.cgi?id=1218841
> > > > > > > > > > > >
> > > > > > > > > > > > So running:
> > > > > > > > > > > > pahole -J --btf_gen_floats -j --lang_exclude=rust
> > > > > > > > > > > > --skip_encoding_btf_inconsistent_proto --btf_gen_optimized .tmp_vmlinux.btf
> > > > > > > > > > > >
> > > > > > > > > > > > crashes or errors out with some random errors:
> > > > > > > > > > > > [182671] STRUCT idr's field 'idr_next' offset=128 bit_size=0 type=181346
> > > > > > > > > > > > Error emitting field
> > > > > > > > > > > >
> > > > > > > > > > > > strace shows mmap() fails with ENOMEM right before the errors:
> > > > > > > > > > > > 1223  mmap2(NULL, 5783552, PROT_READ|PROT_WRITE,
> > > > > > > > > > > > MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> > > > > > > > > > > > ...
> > > > > > > > > > > > 1223  <... mmap2 resumed>)              = -1 ENOMEM (Cannot allocate
> > > > > > > > > > > > memory)
> > > > > > > > > > > >
> > > > > > > > > > > > Note the .tmp_vmlinux.btf above can be arbitrary, but likely large
> > > > > > > > > > > > enough. For reference, one is available at:
> > > > > > > > > > > > https://decibel.fi.muni.cz/~xslaby/n/btf
> > > > > > > > > > > >
> > > > > > > > > > > > Any ideas?
> > > > > > > > > > >
> > > > > > > > > > > This works around the problem, of course (but is a band-aid, not a fix):
> > > > > > > > > > >
> > > > > > > > > > > --- a/mm/mmap.c
> > > > > > > > > > > +++ b/mm/mmap.c
> > > > > > > > > > > @@ -1829,7 +1829,7 @@ get_unmapped_area(struct file *file, unsigned long
> > > > > > > > > > > addr, unsigned long len,
> > > > > > > > > > >                   */
> > > > > > > > > > >                  pgoff = 0;
> > > > > > > > > > >                  get_area = shmem_get_unmapped_area;
> > > > > > > > > > > -       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> > > > > > > > > > > +       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
> > > > > > > > > > > !in_32bit_syscall()) {
> > > > > > > > > > >                  /* Ensures that larger anonymous mappings are THP
> > > > > > > > > > > aligned. */
> > > > > > > > > > >                  get_area = thp_get_unmapped_area;
> > > > > > > > > > >          }
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > thp_get_unmapped_area() does not take care of the legacy stuff...
> > > > > > > > > >
> > > > > > > > > > This change also affects the entropy of allocations. With this patch
> > > > > > > > > > Android test [1] started failing and it requires only 8 bits of
> > > > > > > > > > entropy. The feedback from our security team:
> > > > > > > > > >
> > > > > > > > > > 8 bits of entropy is already embarrassingly low, but was necessary for
> > > > > > > > > > 32 bit ARM targets with low RAM at the time. It's definitely not
> > > > > > > > > > acceptable for 64 bit targets.
> > > > > > > > >
> > > > > > > > > Thanks for the report. Is it 32 bit only or 64 bit is also impacted?
> > > > > > > > > If I understand the code correctly, it expects the address allocated
> > > > > > > > > by malloc() is kind of randomized, right?
> > > > > > > >
> > > > > > > > Yes, correct, the test expects a certain level of address randomization.
> > > > > > > > The test failure was reported while running kernel_virt_x86_64 target
> > > > > > > > (Android emulator on x86), so it does impact 64bit targets.
> > > > > > >
> > > > > > > IIUC this breaks the "expectation" for randomized addresses returned
> > > > > > > by malloc(), but it doesn't break any real Android application, right?
> > > > > > > So this is a security concern instead of a real regression.
> > > > > >
> > > > > > How is making a system move vulnerabile not a real regression?
> > > > > >
> > > > > > >
> > > > > > > I think we can make this opt-in with a knob. Anyone who outweighs
> > > > > > > security could opt this feature out. However I'm wondering whether
> > > > > > > Android should implement a general address randomization mechanism
> > > > > > > instead of depending on "luck" if you do care about it.
> > > > > >
> > > > > > This is not depending on luck. This is checking for possible
> > > > > > vulnerabilities in the system.
> > > > > > I admit I'm not a security expert, so I'm looping in Jeff and Kees to chime in.
> > > > >
> > > > > Hi!
> > > > >
> > > > > Just to chime in, but reduction in ASLR entropy is absolutely a
> > > > > regression. This is userspace visible (via /proc/sys/kernel/randomize_va_space,
> > > > > /proc/sys/vm/mmap_rnd*_bits) with expectations that it work as
> > > > > advertised. So, while 32-bit might be already ASLR-weak, we don't want
> > > > > to make things super bad nor break ASLR in compat mode under 64-bit
> > > > > systems.
> > > > >
> > > > > Having an opt-in sounds reasonable, but we need to wire it to the ASLR
> > > > > sysctls in some way so nothing lying about the ASLR entropy.
> > > >
> > > > Thanks for the explanation. IIUC the randomiza_va_space and
> > > > mmap_rnd_bits randomize the mmap_base and start_brk for each exec()
> > > > call. So the heap allocation is randomized. But it seems the formula
> > > > doesn't take into account huge page. ARM64 adjusts the mmap_rnd_bits
> > > > based on page size.
> > > >
> > > > I did a simple test, which conceptually does:
> > > > 1. call mmap to allocate 8M heap
> > > > 2. print out the allocated address
> > > > 3. run the program 1000 times (launch/exit/re-launch)
> > > > 4. check how many unique addresses
> > > >
> > > > With the default config on my arm64 VM (mmap_rnd_bits is 18), I saw
> > > > 134 unique addresses. Without the patch, I saw 945 unique addresses.
> > > > So I think the test could replicate what your test does.
> > > >
> > > > When I increased the mmap_rnd_bits to 24, I saw 988 unique addresses
> > > > with the patch. x86_64 should have 28 bits by default, it should
> > > > randomize the address quite well. I don't know why you still saw this,
> > > > or you have a different setting for mmap_rnd_bits?
> > >
> > > I checked the configuration on our test harness where the test failed:
> >
> > Thanks, Suren.
> >
> > >
> > > /proc/sys/vm/mmap_rnd_bits = 32
> >
> > It is surprising 32 bits still fail. 24 bits on arm64 works for me. Or
> > the compat bits is used?
>
> Hmm. Let me verify to exclude that possibility.

Aha! You are correct, the test is using compat syscalls and your
suggestion at https://lore.kernel.org/all/CAHbLzkoL6sCDciHqVMJga288853CHdOTa5thOPQ9SHKSqjGGPQ@mail.gmail.com/
seems to fix it. I started a complete set of presubmit tests at
https://android-review.googlesource.com/c/kernel/common/+/2916065 and
will report the results tomorrow morning but I expect them to pass
now.
Thanks,
Suren.

>
> >
> > > /proc/sys/vm/mmap_rnd_compat_bits = 16
> > >
> > > The failure logs are:
> > >
> > > 10-20 14:37:52.123  7029  7029 V AslrMallocTest: 7 bits of entropy for
> > > allocation size 8388608 (minimum 8)
> > > 10-20 14:37:52.123  7029  7029 E AslrMallocTest: insufficient entropy
> > > for malloc(8388608)
> > >
> > > which come from here:
> > > https://cs.android.com/android/platform/superproject/main/+/main:cts/tests/aslr/src/AslrMallocTest.cpp;l=127
> > > So, the allocation size for which this test failed was 2^23.
> >
> > The patch just tries to align >= 2M allocations. It looks like your
> > test allocates 256 bytes, 64K and 8M. So just 8M is impacted.
>
> Correct.
>
> >
> >
> > >
> > >
> > > > I'm wondering whether we should take into account huge page alignment
> > > > for mmap_rnd_bits. And I think this is a huge page common problem, we
> > > > have file mapping huge page aligned as well.
> > > >
> > > > 32 bit is easy, I think I can just make thp_get_unmapped_area() a
> > > > no-op on 32 bit system.
> > > >
> > > > >
> > > > > -Kees
> > > > >
> > > > > --
> > > > > Kees Cook

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] mm: align larger anonymous mappings on THP boundaries
  2024-01-18  1:34                         ` Suren Baghdasaryan
@ 2024-01-18  2:10                           ` Suren Baghdasaryan
  0 siblings, 0 replies; 29+ messages in thread
From: Suren Baghdasaryan @ 2024-01-18  2:10 UTC (permalink / raw)
  To: Yang Shi
  Cc: Kees Cook, Jeffrey Vander Stoep, Jiri Slaby, Rik van Riel,
	Andrew Morton, linux-mm, linux-kernel, kernel-team,
	Matthew Wilcox, Christoph Lameter, linux-hardening

On Wed, Jan 17, 2024 at 5:34 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Wed, Jan 17, 2024 at 4:29 PM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Wed, Jan 17, 2024 at 4:13 PM Yang Shi <shy828301@gmail.com> wrote:
> > >
> > > On Wed, Jan 17, 2024 at 4:02 PM Suren Baghdasaryan <surenb@google.com> wrote:
> > > >
> > > > On Wed, Jan 17, 2024 at 3:32 PM Yang Shi <shy828301@gmail.com> wrote:
> > > > >
> > > > > On Wed, Jan 17, 2024 at 9:40 AM Kees Cook <keescook@chromium.org> wrote:
> > > > > >
> > > > > > On Tue, Jan 16, 2024 at 02:30:36PM -0800, Suren Baghdasaryan wrote:
> > > > > > > On Tue, Jan 16, 2024 at 2:25 PM Yang Shi <shy828301@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Tue, Jan 16, 2024 at 1:58 PM Suren Baghdasaryan <surenb@google.com> wrote:
> > > > > > > > >
> > > > > > > > > On Tue, Jan 16, 2024 at 12:56 PM Yang Shi <shy828301@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, Jan 16, 2024 at 11:16 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Jan 16, 2024 at 4:09 AM Jiri Slaby <jirislaby@kernel.org> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On 16. 01. 24, 12:53, Jiri Slaby wrote:
> > > > > > > > > > > > > Hi,
> > > > > > > > > > > > >
> > > > > > > > > > > > > On 09. 08. 22, 20:24, Rik van Riel wrote:
> > > > > > > > > > > > >> Align larger anonymous memory mappings on THP boundaries by
> > > > > > > > > > > > >> going through thp_get_unmapped_area if THPs are enabled for
> > > > > > > > > > > > >> the current process.
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> With this patch, larger anonymous mappings are now THP aligned.
> > > > > > > > > > > > >> When a malloc library allocates a 2MB or larger arena, that
> > > > > > > > > > > > >> arena can now be mapped with THPs right from the start, which
> > > > > > > > > > > > >> can result in better TLB hit rates and execution time.
> > > > > > > > > > > > >
> > > > > > > > > > > > > This appears to break 32bit processes on x86_64 (at least). In
> > > > > > > > > > > > > particular, 32bit kernel or firefox builds in our build system.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Reverting this on top of 6.7 makes it work again.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Downstream report:
> > > > > > > > > > > > >   https://bugzilla.suse.com/show_bug.cgi?id=1218841
> > > > > > > > > > > > >
> > > > > > > > > > > > > So running:
> > > > > > > > > > > > > pahole -J --btf_gen_floats -j --lang_exclude=rust
> > > > > > > > > > > > > --skip_encoding_btf_inconsistent_proto --btf_gen_optimized .tmp_vmlinux.btf
> > > > > > > > > > > > >
> > > > > > > > > > > > > crashes or errors out with some random errors:
> > > > > > > > > > > > > [182671] STRUCT idr's field 'idr_next' offset=128 bit_size=0 type=181346
> > > > > > > > > > > > > Error emitting field
> > > > > > > > > > > > >
> > > > > > > > > > > > > strace shows mmap() fails with ENOMEM right before the errors:
> > > > > > > > > > > > > 1223  mmap2(NULL, 5783552, PROT_READ|PROT_WRITE,
> > > > > > > > > > > > > MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> > > > > > > > > > > > > ...
> > > > > > > > > > > > > 1223  <... mmap2 resumed>)              = -1 ENOMEM (Cannot allocate
> > > > > > > > > > > > > memory)
> > > > > > > > > > > > >
> > > > > > > > > > > > > Note the .tmp_vmlinux.btf above can be arbitrary, but likely large
> > > > > > > > > > > > > enough. For reference, one is available at:
> > > > > > > > > > > > > https://decibel.fi.muni.cz/~xslaby/n/btf
> > > > > > > > > > > > >
> > > > > > > > > > > > > Any ideas?
> > > > > > > > > > > >
> > > > > > > > > > > > This works around the problem, of course (but is a band-aid, not a fix):
> > > > > > > > > > > >
> > > > > > > > > > > > --- a/mm/mmap.c
> > > > > > > > > > > > +++ b/mm/mmap.c
> > > > > > > > > > > > @@ -1829,7 +1829,7 @@ get_unmapped_area(struct file *file, unsigned long
> > > > > > > > > > > > addr, unsigned long len,
> > > > > > > > > > > >                   */
> > > > > > > > > > > >                  pgoff = 0;
> > > > > > > > > > > >                  get_area = shmem_get_unmapped_area;
> > > > > > > > > > > > -       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> > > > > > > > > > > > +       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
> > > > > > > > > > > > !in_32bit_syscall()) {
> > > > > > > > > > > >                  /* Ensures that larger anonymous mappings are THP
> > > > > > > > > > > > aligned. */
> > > > > > > > > > > >                  get_area = thp_get_unmapped_area;
> > > > > > > > > > > >          }
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > thp_get_unmapped_area() does not take care of the legacy stuff...
> > > > > > > > > > >
> > > > > > > > > > > This change also affects the entropy of allocations. With this patch
> > > > > > > > > > > Android test [1] started failing and it requires only 8 bits of
> > > > > > > > > > > entropy. The feedback from our security team:
> > > > > > > > > > >
> > > > > > > > > > > 8 bits of entropy is already embarrassingly low, but was necessary for
> > > > > > > > > > > 32 bit ARM targets with low RAM at the time. It's definitely not
> > > > > > > > > > > acceptable for 64 bit targets.
> > > > > > > > > >
> > > > > > > > > > Thanks for the report. Is it 32 bit only or 64 bit is also impacted?
> > > > > > > > > > If I understand the code correctly, it expects the address allocated
> > > > > > > > > > by malloc() is kind of randomized, right?
> > > > > > > > >
> > > > > > > > > Yes, correct, the test expects a certain level of address randomization.
> > > > > > > > > The test failure was reported while running kernel_virt_x86_64 target
> > > > > > > > > (Android emulator on x86), so it does impact 64bit targets.
> > > > > > > >
> > > > > > > > IIUC this breaks the "expectation" for randomized addresses returned
> > > > > > > > by malloc(), but it doesn't break any real Android application, right?
> > > > > > > > So this is a security concern instead of a real regression.
> > > > > > >
> > > > > > > How is making a system move vulnerabile not a real regression?
> > > > > > >
> > > > > > > >
> > > > > > > > I think we can make this opt-in with a knob. Anyone who outweighs
> > > > > > > > security could opt this feature out. However I'm wondering whether
> > > > > > > > Android should implement a general address randomization mechanism
> > > > > > > > instead of depending on "luck" if you do care about it.
> > > > > > >
> > > > > > > This is not depending on luck. This is checking for possible
> > > > > > > vulnerabilities in the system.
> > > > > > > I admit I'm not a security expert, so I'm looping in Jeff and Kees to chime in.
> > > > > >
> > > > > > Hi!
> > > > > >
> > > > > > Just to chime in, but reduction in ASLR entropy is absolutely a
> > > > > > regression. This is userspace visible (via /proc/sys/kernel/randomize_va_space,
> > > > > > /proc/sys/vm/mmap_rnd*_bits) with expectations that it work as
> > > > > > advertised. So, while 32-bit might be already ASLR-weak, we don't want
> > > > > > to make things super bad nor break ASLR in compat mode under 64-bit
> > > > > > systems.
> > > > > >
> > > > > > Having an opt-in sounds reasonable, but we need to wire it to the ASLR
> > > > > > sysctls in some way so nothing lying about the ASLR entropy.
> > > > >
> > > > > Thanks for the explanation. IIUC the randomiza_va_space and
> > > > > mmap_rnd_bits randomize the mmap_base and start_brk for each exec()
> > > > > call. So the heap allocation is randomized. But it seems the formula
> > > > > doesn't take into account huge page. ARM64 adjusts the mmap_rnd_bits
> > > > > based on page size.
> > > > >
> > > > > I did a simple test, which conceptually does:
> > > > > 1. call mmap to allocate 8M heap
> > > > > 2. print out the allocated address
> > > > > 3. run the program 1000 times (launch/exit/re-launch)
> > > > > 4. check how many unique addresses
> > > > >
> > > > > With the default config on my arm64 VM (mmap_rnd_bits is 18), I saw
> > > > > 134 unique addresses. Without the patch, I saw 945 unique addresses.
> > > > > So I think the test could replicate what your test does.
> > > > >
> > > > > When I increased the mmap_rnd_bits to 24, I saw 988 unique addresses
> > > > > with the patch. x86_64 should have 28 bits by default, it should
> > > > > randomize the address quite well. I don't know why you still saw this,
> > > > > or you have a different setting for mmap_rnd_bits?
> > > >
> > > > I checked the configuration on our test harness where the test failed:
> > >
> > > Thanks, Suren.
> > >
> > > >
> > > > /proc/sys/vm/mmap_rnd_bits = 32
> > >
> > > It is surprising 32 bits still fail. 24 bits on arm64 works for me. Or
> > > the compat bits is used?
> >
> > Hmm. Let me verify to exclude that possibility.
>
> Aha! You are correct, the test is using compat syscalls and your
> suggestion at https://lore.kernel.org/all/CAHbLzkoL6sCDciHqVMJga288853CHdOTa5thOPQ9SHKSqjGGPQ@mail.gmail.com/
> seems to fix it. I started a complete set of presubmit tests at
> https://android-review.googlesource.com/c/kernel/common/+/2916065 and
> will report the results tomorrow morning but I expect them to pass
> now.

nit: You will need to #include <linux/compat.h> in your fix. Most
configurations build fine but one failed. It has only CONFIG_COMPAT_32
enabled:

#
# Binary Emulations
#
CONFIG_COMPAT_32=y
# end of Binary Emulations

Adding the missing include into mm/huge_memory.c fixes the issue.


> Thanks,
> Suren.
>
> >
> > >
> > > > /proc/sys/vm/mmap_rnd_compat_bits = 16
> > > >
> > > > The failure logs are:
> > > >
> > > > 10-20 14:37:52.123  7029  7029 V AslrMallocTest: 7 bits of entropy for
> > > > allocation size 8388608 (minimum 8)
> > > > 10-20 14:37:52.123  7029  7029 E AslrMallocTest: insufficient entropy
> > > > for malloc(8388608)
> > > >
> > > > which come from here:
> > > > https://cs.android.com/android/platform/superproject/main/+/main:cts/tests/aslr/src/AslrMallocTest.cpp;l=127
> > > > So, the allocation size for which this test failed was 2^23.
> > >
> > > The patch just tries to align >= 2M allocations. It looks like your
> > > test allocates 256 bytes, 64K and 8M. So just 8M is impacted.
> >
> > Correct.
> >
> > >
> > >
> > > >
> > > >
> > > > > I'm wondering whether we should take into account huge page alignment
> > > > > for mmap_rnd_bits. And I think this is a huge page common problem, we
> > > > > have file mapping huge page aligned as well.
> > > > >
> > > > > 32 bit is easy, I think I can just make thp_get_unmapped_area() a
> > > > > no-op on 32 bit system.
> > > > >
> > > > > >
> > > > > > -Kees
> > > > > >
> > > > > > --
> > > > > > Kees Cook

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] mm: align larger anonymous mappings on THP boundaries
  2024-01-18  0:07     ` Yang Shi
  2024-01-18  0:09       ` Suren Baghdasaryan
@ 2024-01-18  7:04       ` Jiri Slaby
  2024-01-18 17:48         ` Suren Baghdasaryan
  2024-01-18 18:42         ` Yang Shi
  1 sibling, 2 replies; 29+ messages in thread
From: Jiri Slaby @ 2024-01-18  7:04 UTC (permalink / raw)
  To: Yang Shi
  Cc: Rik van Riel, Andrew Morton, linux-mm, linux-kernel, kernel-team,
	Matthew Wilcox, Christoph Lameter

On 18. 01. 24, 1:07, Yang Shi wrote:
>> This works around the problem, of course (but is a band-aid, not a fix):
>>
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -1829,7 +1829,7 @@ get_unmapped_area(struct file *file, unsigned long
>> addr, unsigned long len,
>>                    */
>>                   pgoff = 0;
>>                   get_area = shmem_get_unmapped_area;
>> -       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
>> +       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
>> !in_32bit_syscall()) {
>>                   /* Ensures that larger anonymous mappings are THP
>> aligned. */
>>                   get_area = thp_get_unmapped_area;
>>           }
>>
>>
>> thp_get_unmapped_area() does not take care of the legacy stuff...
> 
> Could you please help test the below patch? It is compiled, but I
> don't have 32 bit userspace or machine to test it.

Yeah, for x86_64, it's semantically the same as the above, so this works 
too:

Tested-by: Jiri Slaby <jirislaby@kernel.org>

> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -811,6 +811,9 @@ static unsigned long
> __thp_get_unmapped_area(struct file *filp,
>          loff_t off_align = round_up(off, size);
>          unsigned long len_pad, ret;
> 
> +       if (IS_ENABLED(CONFIG_32BIT) || in_compat_syscall())
> +               return 0;
> +
>          if (off_end <= off_align || (off_end - off_align) < size)
>                  return 0;

thanks,
-- 
js
suse labs


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] mm: align larger anonymous mappings on THP boundaries
  2024-01-18  7:04       ` Jiri Slaby
@ 2024-01-18 17:48         ` Suren Baghdasaryan
  2024-01-18 18:42           ` Yang Shi
  2024-01-18 18:42         ` Yang Shi
  1 sibling, 1 reply; 29+ messages in thread
From: Suren Baghdasaryan @ 2024-01-18 17:48 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Yang Shi, Rik van Riel, Andrew Morton, linux-mm, linux-kernel,
	kernel-team, Matthew Wilcox, Christoph Lameter

On Wed, Jan 17, 2024 at 11:05 PM Jiri Slaby <jirislaby@kernel.org> wrote:
>
> On 18. 01. 24, 1:07, Yang Shi wrote:
> >> This works around the problem, of course (but is a band-aid, not a fix):
> >>
> >> --- a/mm/mmap.c
> >> +++ b/mm/mmap.c
> >> @@ -1829,7 +1829,7 @@ get_unmapped_area(struct file *file, unsigned long
> >> addr, unsigned long len,
> >>                    */
> >>                   pgoff = 0;
> >>                   get_area = shmem_get_unmapped_area;
> >> -       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> >> +       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
> >> !in_32bit_syscall()) {
> >>                   /* Ensures that larger anonymous mappings are THP
> >> aligned. */
> >>                   get_area = thp_get_unmapped_area;
> >>           }
> >>
> >>
> >> thp_get_unmapped_area() does not take care of the legacy stuff...
> >
> > Could you please help test the below patch? It is compiled, but I
> > don't have 32 bit userspace or machine to test it.
>
> Yeah, for x86_64, it's semantically the same as the above, so this works
> too:
>
> Tested-by: Jiri Slaby <jirislaby@kernel.org>

With the addition of  #include <linux/compat.h> it builds and passes
our tests as well. Thanks!

Tested-by: Suren Baghdasaryan <surenb@google.com>

>
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -811,6 +811,9 @@ static unsigned long
> > __thp_get_unmapped_area(struct file *filp,
> >          loff_t off_align = round_up(off, size);
> >          unsigned long len_pad, ret;
> >
> > +       if (IS_ENABLED(CONFIG_32BIT) || in_compat_syscall())
> > +               return 0;
> > +
> >          if (off_end <= off_align || (off_end - off_align) < size)
> >                  return 0;
>
> thanks,
> --
> js
> suse labs
>
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] mm: align larger anonymous mappings on THP boundaries
  2024-01-18 17:48         ` Suren Baghdasaryan
@ 2024-01-18 18:42           ` Yang Shi
  0 siblings, 0 replies; 29+ messages in thread
From: Yang Shi @ 2024-01-18 18:42 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Jiri Slaby, Rik van Riel, Andrew Morton, linux-mm, linux-kernel,
	kernel-team, Matthew Wilcox, Christoph Lameter

On Thu, Jan 18, 2024 at 9:48 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Wed, Jan 17, 2024 at 11:05 PM Jiri Slaby <jirislaby@kernel.org> wrote:
> >
> > On 18. 01. 24, 1:07, Yang Shi wrote:
> > >> This works around the problem, of course (but is a band-aid, not a fix):
> > >>
> > >> --- a/mm/mmap.c
> > >> +++ b/mm/mmap.c
> > >> @@ -1829,7 +1829,7 @@ get_unmapped_area(struct file *file, unsigned long
> > >> addr, unsigned long len,
> > >>                    */
> > >>                   pgoff = 0;
> > >>                   get_area = shmem_get_unmapped_area;
> > >> -       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> > >> +       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
> > >> !in_32bit_syscall()) {
> > >>                   /* Ensures that larger anonymous mappings are THP
> > >> aligned. */
> > >>                   get_area = thp_get_unmapped_area;
> > >>           }
> > >>
> > >>
> > >> thp_get_unmapped_area() does not take care of the legacy stuff...
> > >
> > > Could you please help test the below patch? It is compiled, but I
> > > don't have 32 bit userspace or machine to test it.
> >
> > Yeah, for x86_64, it's semantically the same as the above, so this works
> > too:
> >
> > Tested-by: Jiri Slaby <jirislaby@kernel.org>
>
> With the addition of  #include <linux/compat.h> it builds and passes
> our tests as well. Thanks!
>
> Tested-by: Suren Baghdasaryan <surenb@google.com>

Thanks for checking the test case and testing the patch. I will fix it
and post the formal patch.

>
> >
> > > --- a/mm/huge_memory.c
> > > +++ b/mm/huge_memory.c
> > > @@ -811,6 +811,9 @@ static unsigned long
> > > __thp_get_unmapped_area(struct file *filp,
> > >          loff_t off_align = round_up(off, size);
> > >          unsigned long len_pad, ret;
> > >
> > > +       if (IS_ENABLED(CONFIG_32BIT) || in_compat_syscall())
> > > +               return 0;
> > > +
> > >          if (off_end <= off_align || (off_end - off_align) < size)
> > >                  return 0;
> >
> > thanks,
> > --
> > js
> > suse labs
> >
> >

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] mm: align larger anonymous mappings on THP boundaries
  2024-01-18  7:04       ` Jiri Slaby
  2024-01-18 17:48         ` Suren Baghdasaryan
@ 2024-01-18 18:42         ` Yang Shi
  1 sibling, 0 replies; 29+ messages in thread
From: Yang Shi @ 2024-01-18 18:42 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Rik van Riel, Andrew Morton, linux-mm, linux-kernel, kernel-team,
	Matthew Wilcox, Christoph Lameter

On Wed, Jan 17, 2024 at 11:04 PM Jiri Slaby <jirislaby@kernel.org> wrote:
>
> On 18. 01. 24, 1:07, Yang Shi wrote:
> >> This works around the problem, of course (but is a band-aid, not a fix):
> >>
> >> --- a/mm/mmap.c
> >> +++ b/mm/mmap.c
> >> @@ -1829,7 +1829,7 @@ get_unmapped_area(struct file *file, unsigned long
> >> addr, unsigned long len,
> >>                    */
> >>                   pgoff = 0;
> >>                   get_area = shmem_get_unmapped_area;
> >> -       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> >> +       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
> >> !in_32bit_syscall()) {
> >>                   /* Ensures that larger anonymous mappings are THP
> >> aligned. */
> >>                   get_area = thp_get_unmapped_area;
> >>           }
> >>
> >>
> >> thp_get_unmapped_area() does not take care of the legacy stuff...
> >
> > Could you please help test the below patch? It is compiled, but I
> > don't have 32 bit userspace or machine to test it.
>
> Yeah, for x86_64, it's semantically the same as the above, so this works
> too:
>
> Tested-by: Jiri Slaby <jirislaby@kernel.org>

Thanks!

>
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -811,6 +811,9 @@ static unsigned long
> > __thp_get_unmapped_area(struct file *filp,
> >          loff_t off_align = round_up(off, size);
> >          unsigned long len_pad, ret;
> >
> > +       if (IS_ENABLED(CONFIG_32BIT) || in_compat_syscall())
> > +               return 0;
> > +
> >          if (off_end <= off_align || (off_end - off_align) < size)
> >                  return 0;
>
> thanks,
> --
> js
> suse labs
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] mm: align larger anonymous mappings on THP boundaries
  2024-01-16 11:53 ` Jiri Slaby
  2024-01-16 12:09   ` Jiri Slaby
@ 2024-01-20 13:43   ` Linux regression tracking #adding (Thorsten Leemhuis)
  2024-01-20 15:47   ` Linux regression tracking #adding (Thorsten Leemhuis)
  2 siblings, 0 replies; 29+ messages in thread
From: Linux regression tracking #adding (Thorsten Leemhuis) @ 2024-01-20 13:43 UTC (permalink / raw)
  To: Linux kernel regressions list; +Cc: linux-mm, linux-kernel

On 16.01.24 12:53, Jiri Slaby wrote:

> On 09. 08. 22, 20:24, Rik van Riel wrote:
>> Align larger anonymous memory mappings on THP boundaries by
>> going through thp_get_unmapped_area if THPs are enabled for
>> the current process.
>>
>> With this patch, larger anonymous mappings are now THP aligned.
>> When a malloc library allocates a 2MB or larger arena, that
>> arena can now be mapped with THPs right from the start, which
>> can result in better TLB hit rates and execution time.
> 
> This appears to break 32bit processes on x86_64 (at least). In
> particular, 32bit kernel or firefox builds in our build system.
> 
> Reverting this on top of 6.7 makes it work again.
> 
> Downstream report:
>  https://bugzilla.suse.com/show_bug.cgi?id=1218841
> [...]
#regzbot ^introduced efa7df3e3bb5
#regzbot title mm: huge_memory: 32 bit systems or compat
userspace broke
#regzbot link: https://bugzilla.suse.com/show_bug.cgi?id=1218841
#regzbot fix: mm: huge_memory: don't force huge page alignment on 32 bit
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] mm: align larger anonymous mappings on THP boundaries
  2024-01-16 11:53 ` Jiri Slaby
  2024-01-16 12:09   ` Jiri Slaby
  2024-01-20 13:43   ` Linux regression tracking #adding (Thorsten Leemhuis)
@ 2024-01-20 15:47   ` Linux regression tracking #adding (Thorsten Leemhuis)
  2 siblings, 0 replies; 29+ messages in thread
From: Linux regression tracking #adding (Thorsten Leemhuis) @ 2024-01-20 15:47 UTC (permalink / raw)
  To: Linux kernel regressions list; +Cc: linux-mm, linux-kernel

On 16.01.24 12:53, Jiri Slaby wrote:

> On 09. 08. 22, 20:24, Rik van Riel wrote:
>> Align larger anonymous memory mappings on THP boundaries by
>> going through thp_get_unmapped_area if THPs are enabled for
>> the current process.
>>
>> With this patch, larger anonymous mappings are now THP aligned.
>> When a malloc library allocates a 2MB or larger arena, that
>> arena can now be mapped with THPs right from the start, which
>> can result in better TLB hit rates and execution time.
> 
> This appears to break 32bit processes on x86_64 (at least). In
> particular, 32bit kernel or firefox builds in our build system.
> 
> Reverting this on top of 6.7 makes it work again.
> 
> Downstream report:
>  https://bugzilla.suse.com/show_bug.cgi?id=1218841
> [...]

Trying this again, sorry for the noise:

#regzbot ^introduced efa7df3e3bb5
#regzbot title mm: huge_memory: 32 bit systems or compat userspace broke
#regzbot link: https://bugzilla.suse.com/show_bug.cgi?id=1218841
#regzbot fix: mm: huge_memory: don't force huge page alignment on 32 bit
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2024-01-20 15:47 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-09 18:24 [PATCH v2] mm: align larger anonymous mappings on THP boundaries Rik van Riel
2022-08-10 17:06 ` Yang Shi
2024-01-16 11:53 ` Jiri Slaby
2024-01-16 12:09   ` Jiri Slaby
2024-01-16 19:16     ` Suren Baghdasaryan
2024-01-16 20:56       ` Yang Shi
2024-01-16 21:57         ` Suren Baghdasaryan
2024-01-16 22:25           ` Yang Shi
2024-01-16 22:30             ` Suren Baghdasaryan
2024-01-16 23:14               ` Yang Shi
2024-01-17 17:40               ` Kees Cook
2024-01-17 23:32                 ` Yang Shi
2024-01-18  0:01                   ` Suren Baghdasaryan
2024-01-18  0:13                     ` Yang Shi
2024-01-18  0:29                       ` Suren Baghdasaryan
2024-01-18  1:34                         ` Suren Baghdasaryan
2024-01-18  2:10                           ` Suren Baghdasaryan
2024-01-16 20:55     ` Yang Shi
2024-01-18  0:07     ` Yang Shi
2024-01-18  0:09       ` Suren Baghdasaryan
2024-01-18  0:11         ` Suren Baghdasaryan
2024-01-18  0:15           ` Yang Shi
2024-01-18  0:28             ` Suren Baghdasaryan
2024-01-18  7:04       ` Jiri Slaby
2024-01-18 17:48         ` Suren Baghdasaryan
2024-01-18 18:42           ` Yang Shi
2024-01-18 18:42         ` Yang Shi
2024-01-20 13:43   ` Linux regression tracking #adding (Thorsten Leemhuis)
2024-01-20 15:47   ` Linux regression tracking #adding (Thorsten Leemhuis)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).