* Re: [Regression] mmap with MAP_32BIT randomly fails since 6.1
@ 2023-05-12  1:02 Robert Hensing
  2023-05-15 14:39 ` Liam R. Howlett
  0 siblings, 1 reply; 6+ messages in thread
From: Robert Hensing @ 2023-05-12  1:02 UTC (permalink / raw)
  To: Liam R. Howlett, Snild Dolkow, Matthew Wilcox (Oracle),
	regressions, LKML, Linux-MM, maple-tree, Andrew Morton

It appears that commit 58c5d0d6d522112577c7eeb71d382ea642ed7be4 causes
another regression of allocations with MAP_32BIT.
Reverting it fixes the reproducer from
https://lore.kernel.org/linux-mm/cb8dc31a-fef2-1d09-f133-e9f7b9f9e77a@sony.com/

Do you think this commit is somewhat safe to revert?

The following may be superfluous, but it adds some context and might help
someone find this thread. It merely confirms the observation of this
regression in
https://lore.kernel.org/linux-mm/e6108286ac025c268964a7ead3aab9899f9bc6e9.camel@spotco.us/

From what I can tell, reverting it also fixes my own use case, and

  - The program I maintain,
    https://github.com/hercules-ci/hercules-ci-agent/issues/514

  - Another program, also Haskell:
    https://github.com/aristanetworks/nix-serve-ng/issues/27

  - An FPGA interface process. I've found them because they list the same
    commit id on their blog.
    https://jia.je/software/2023/05/06/linux-regression-vivado-en/



On 3/2/23 19:43, Liam R. Howlett wrote:
> * Snild Dolkow <snild@sony.com> [230302 10:33]:
>> After upgrading a machine from 5.17.4 to 6.1.12 a couple of weeks ago, I
>> started getting (inconsistent) failures when building Android:
>> While it claims to be using 0x22 (MAP_PRIVATE | MAP_ANONYMOUS) for the
>> flags, it really uses 0x40 (MAP_32BIT) as well, as shown by strace:
>>

The same applies to the dynamic linker in the GHC Haskell runtime system.
It also uses MAP_32BIT and reports the error:

ghc: mmap 4096 bytes at (nil): Cannot allocate memory


I hope this was a somewhat useful contribution to the regressions
thread. (also hi, I'm new here)

Cheers,

Robert Hensing




* Re: [Regression] mmap with MAP_32BIT randomly fails since 6.1
  2023-05-12  1:02 [Regression] mmap with MAP_32BIT randomly fails since 6.1 Robert Hensing
@ 2023-05-15 14:39 ` Liam R. Howlett
  0 siblings, 0 replies; 6+ messages in thread
From: Liam R. Howlett @ 2023-05-15 14:39 UTC (permalink / raw)
  To: Robert Hensing
  Cc: Snild Dolkow, Matthew Wilcox (Oracle),
	regressions, LKML, Linux-MM, maple-tree, Andrew Morton

* Robert Hensing <robert@hercules-ci.com> [230511 21:02]:
> It appears that commit 58c5d0d6d522112577c7eeb71d382ea642ed7be4 causes
> another regression of allocations with MAP_32BIT.
> Reverting it fixes the reproducer from
> https://lore.kernel.org/linux-mm/cb8dc31a-fef2-1d09-f133-e9f7b9f9e77a@sony.com/
> 
> Do you think this commit is somewhat safe to revert?

No, don't do that.

Add this [1] instead.  The patch is currently in mm-unstable and will
make its way through the normal channels to stable and mainline.

[1] https://lore.kernel.org/linux-mm/20230505145829.74574-1-zhangpeng.00@bytedance.com/

Thanks,
Liam

> 
> The following may be superfluous, but it adds some context and might help
> someone find this thread. It merely confirms the observation of this
> regression in
> https://lore.kernel.org/linux-mm/e6108286ac025c268964a7ead3aab9899f9bc6e9.camel@spotco.us/
> 
> From what I can tell, reverting it also fixes my own use case, and
> 
>   - The program I maintain,
>     https://github.com/hercules-ci/hercules-ci-agent/issues/514
> 
>   - Another program, also Haskell:
>     https://github.com/aristanetworks/nix-serve-ng/issues/27
> 
>   - An FPGA interface process. I've found them because they list the same
>     commit id on their blog.
>     https://jia.je/software/2023/05/06/linux-regression-vivado-en/
> 
> 
> 
> On 3/2/23 19:43, Liam R. Howlett wrote:
> > * Snild Dolkow <snild@sony.com> [230302 10:33]:
> >> After upgrading a machine from 5.17.4 to 6.1.12 a couple of weeks ago, I
> >> started getting (inconsistent) failures when building Android:
> >> While it claims to be using 0x22 (MAP_PRIVATE | MAP_ANONYMOUS) for the
> >> flags, it really uses 0x40 (MAP_32BIT) as well, as shown by strace:
> >>
> 
> The same applies to the dynamic linker in the GHC Haskell runtime system.
> It also uses MAP_32BIT and reports the error:
> 
> ghc: mmap 4096 bytes at (nil): Cannot allocate memory
> 
> 
> I hope this was a somewhat useful contribution to the regressions
> thread. (also hi, I'm new here)
> 
> Cheers,
> 
> Robert Hensing
> 
> 



* Re: [Regression] mmap with MAP_32BIT randomly fails since 6.1
  2023-03-03  8:31 ` Linux regression tracking #adding (Thorsten Leemhuis)
@ 2023-03-03 20:11   ` Liam R. Howlett
  0 siblings, 0 replies; 6+ messages in thread
From: Liam R. Howlett @ 2023-03-03 20:11 UTC (permalink / raw)
  To: Linux regressions mailing list
  Cc: Snild Dolkow, Matthew Wilcox (Oracle),
	LKML, Linux-MM, maple-tree, Andrew Morton

* Linux regression tracking #adding (Thorsten Leemhuis) <regressions@leemhuis.info> [230303 03:31]:
> [TLDR: I'm adding this report to the list of tracked Linux kernel
> regressions; the text you find below is based on a few template
> paragraphs you might have encountered already in similar form.
> See link in footer if these mails annoy you.]
> 
> On 02.03.23 16:32, Snild Dolkow wrote:
> > After upgrading a machine from 5.17.4 to 6.1.12 a couple of weeks ago, I
> > started getting (inconsistent) failures when building Android:
> > [...]
> > I have checked a more recent master commit (ee3f96b1, from March 1st),
> > and the problem is still there. Bisecting shows that e15e06a8 is the
> > last good commit, and that 524e00b3 is the first one failing in this
> > way. The 10 or so commits in between run into a page fault BUG down in
> > vma_merge() instead.
> > 
> > This range of commits is about the same as mentioned in
> > https://lore.kernel.org/lkml/0b9f5425-08d4-8013-aa4c-e620c3b10bb2@leemhuis.info/, so I assume that my problem, too, was introduced with the Maple Tree changes. Sending this to the same people and lists.
> > 
> 
> Thanks for the report. To be sure the issue doesn't fall through the
> cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
> tracking bot:
> 
> #regzbot ^introduced e15e06a8..524e00b3
> #regzbot title mm: mmap with MAP_32BIT randomly fails since 6.1
> #regzbot ignore-activity

Thanks!

> 
> This isn't a regression? This issue or a fix for it are already
> discussed somewhere else? It was fixed already? You want to clarify when
> the regression started to happen? Or point out I got the title or
> something else totally wrong? Then just reply and tell me -- ideally
> while also telling regzbot about it, as explained by the page listed in
> the footer of this mail.

I've sent a patch which has been tested by Snild and does not fully fix
the issue [1].  I am continuing work on this problem.

> 
> Developers: When fixing the issue, remember to add 'Link:' tags pointing
> to the report (the parent of this mail). See page linked in footer for
> details.

Pretty sure I did this part, so maybe the discussion was already picked
up.

[1] https://lore.kernel.org/linux-mm/20230303021540.1056603-1-Liam.Howlett@oracle.com/

Cheers,
Liam


* Re: [Regression] mmap with MAP_32BIT randomly fails since 6.1
  2023-03-02 15:32 Snild Dolkow
  2023-03-02 18:43 ` Liam R. Howlett
@ 2023-03-03  8:31 ` Linux regression tracking #adding (Thorsten Leemhuis)
  2023-03-03 20:11   ` Liam R. Howlett
  1 sibling, 1 reply; 6+ messages in thread
From: Linux regression tracking #adding (Thorsten Leemhuis) @ 2023-03-03  8:31 UTC (permalink / raw)
  To: Snild Dolkow, Liam R. Howlett, Matthew Wilcox (Oracle)
  Cc: regressions, LKML, Linux-MM, maple-tree, Andrew Morton

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 02.03.23 16:32, Snild Dolkow wrote:
> After upgrading a machine from 5.17.4 to 6.1.12 a couple of weeks ago, I
> started getting (inconsistent) failures when building Android:
> [...]
> I have checked a more recent master commit (ee3f96b1, from March 1st),
> and the problem is still there. Bisecting shows that e15e06a8 is the
> last good commit, and that 524e00b3 is the first one failing in this
> way. The 10 or so commits in between run into a page fault BUG down in
> vma_merge() instead.
> 
> This range of commits is about the same as mentioned in
> https://lore.kernel.org/lkml/0b9f5425-08d4-8013-aa4c-e620c3b10bb2@leemhuis.info/, so I assume that my problem, too, was introduced with the Maple Tree changes. Sending this to the same people and lists.
> 

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced e15e06a8..524e00b3
#regzbot title mm: mmap with MAP_32BIT randomly fails since 6.1
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.


* Re: [Regression] mmap with MAP_32BIT randomly fails since 6.1
  2023-03-02 15:32 Snild Dolkow
@ 2023-03-02 18:43 ` Liam R. Howlett
  2023-03-03  8:31 ` Linux regression tracking #adding (Thorsten Leemhuis)
  1 sibling, 0 replies; 6+ messages in thread
From: Liam R. Howlett @ 2023-03-02 18:43 UTC (permalink / raw)
  To: Snild Dolkow
  Cc: Matthew Wilcox (Oracle),
	regressions, LKML, Linux-MM, maple-tree, Andrew Morton

* Snild Dolkow <snild@sony.com> [230302 10:33]:
> After upgrading a machine from 5.17.4 to 6.1.12 a couple of weeks ago, I
> started getting (inconsistent) failures when building Android:

Thanks for reporting this.

> 
> > dex2oatd F 02-28 11:49:44 40098 40098 mem_map_arena_pool.cc:65] Check failed: map.IsValid() Failed anonymous mmap((nil), 131072, 0x3, 0x22, -1, 0): Cannot allocate memory. See process maps in the log.
> 
> While it claims to be using 0x22 (MAP_PRIVATE | MAP_ANONYMOUS) for the
> flags, it really uses 0x40 (MAP_32BIT) as well, as shown by strace:
> 
> > mmap(NULL, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0) = 0x40720000
> > mmap(NULL, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0) = 0x4124e000
> > mmap(NULL, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0) = -1 ENOMEM (Cannot allocate memory)
> > dex2oatd F 03-01 10:32:33 74063 74063 mem_map_arena_pool.cc:65] Check failed: map.IsValid() Failed anonymous mmap((nil), 131072, 0x3, 0x22, -1, 0): Cannot allocate memory. See process maps in the log.
> 
> Here's a simple reproducer, which (if my math is correct) tries to mmap a
> total of ~600MiB in increasing chunk sizes:
> 
> #include <sys/mman.h>
> #include <stdio.h>
> #include <errno.h>
> 
> int main(void) {
>     size_t total_leaks = 0;  /* bytes mapped so far; never unmapped */
>     /* Chunk sizes from 4 KiB (1<<12) to 64 KiB (1<<16), 5000 maps each. */
>     for (int shift=12; shift<=16; shift++) {
>         size_t size = ((size_t)1)<<shift;
>         for (int i=0; i<5000; ++i) {
>             void* m = mmap(NULL, size, PROT_READ | PROT_WRITE,
>                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
>             if (m == MAP_FAILED || m == NULL) {
>                 printf(
>                     "Failed. m=%p size=%zu (1<<%d) i=%d "
>                     " errno=%d total_leaks=%zu (%zu MiB)\n",
>                     m, size, shift, i, errno,
>                     total_leaks, total_leaks / 1024 / 1024);
>                 return 1;
>             }
>             total_leaks += size;
>         }
>     }
>     printf("Success.\n");
>     return 0;
> }

Very useful, thanks!

> 
> Older kernels fail very consistently at almost exactly 1GiB total_leaks, if
> you change the test program to go that far. On 6.1.12, it fails much
> earlier, after an arbitrary amount of successful mmaps:
> 
> > $ ./mmap-test
> > Failed. m=0xffffffffffffffff size=4096 (1<<12) i=1500  errno=12 total_leaks=6144000 (5 MiB)
> > $ ./mmap-test
> > Failed. m=0xffffffffffffffff size=4096 (1<<12) i=620  errno=12 total_leaks=2539520 (2 MiB)
> > $ ./mmap-test
> > Failed. m=0xffffffffffffffff size=4096 (1<<12) i=2408  errno=12 total_leaks=9863168 (9 MiB)
> > $ ./mmap-test
> > Failed. m=0xffffffffffffffff size=4096 (1<<12) i=774  errno=12 total_leaks=3170304 (3 MiB)
> > $ ./mmap-test
> > Failed. m=0xffffffffffffffff size=4096 (1<<12) i=1648  errno=12 total_leaks=6750208 (6 MiB)
> > $ ./mmap-test
> 
> 
> I have checked a more recent master commit (ee3f96b1, from March 1st), and
> the problem is still there. Bisecting shows that e15e06a8 is the last good
> commit, and that 524e00b3 is the first one failing in this way. The 10 or so
> commits in between run into a page fault BUG down in vma_merge() instead.

It does look like it's the maple tree.  I am working on this issue now.

> 
> This range of commits is about the same as mentioned in https://lore.kernel.org/lkml/0b9f5425-08d4-8013-aa4c-e620c3b10bb2@leemhuis.info/,
> so I assume that my problem, too, was introduced with the Maple Tree
> changes. Sending this to the same people and lists.

These are the right people to email.

Hopefully I'll have an update for you soon.

Regards,
Liam


* [Regression] mmap with MAP_32BIT randomly fails since 6.1
@ 2023-03-02 15:32 Snild Dolkow
  2023-03-02 18:43 ` Liam R. Howlett
  2023-03-03  8:31 ` Linux regression tracking #adding (Thorsten Leemhuis)
  0 siblings, 2 replies; 6+ messages in thread
From: Snild Dolkow @ 2023-03-02 15:32 UTC (permalink / raw)
  To: Liam R. Howlett, Matthew Wilcox (Oracle)
  Cc: regressions, LKML, Linux-MM, maple-tree, Andrew Morton

After upgrading a machine from 5.17.4 to 6.1.12 a couple of weeks ago, I 
started getting (inconsistent) failures when building Android:

> dex2oatd F 02-28 11:49:44 40098 40098 mem_map_arena_pool.cc:65] Check failed: map.IsValid() Failed anonymous mmap((nil), 131072, 0x3, 0x22, -1, 0): Cannot allocate memory. See process maps in the log.

While it claims to be using 0x22 (MAP_PRIVATE | MAP_ANONYMOUS) for the 
flags, it really uses 0x40 (MAP_32BIT) as well, as shown by strace:

> mmap(NULL, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0) = 0x40720000
> mmap(NULL, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0) = 0x4124e000
> mmap(NULL, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0) = -1 ENOMEM (Cannot allocate memory)
> dex2oatd F 03-01 10:32:33 74063 74063 mem_map_arena_pool.cc:65] Check failed: map.IsValid() Failed anonymous mmap((nil), 131072, 0x3, 0x22, -1, 0): Cannot allocate memory. See process maps in the log.

Here's a simple reproducer, which (if my math is correct) tries to mmap 
a total of ~600MiB in increasing chunk sizes:

#include <sys/mman.h>
#include <stdio.h>
#include <errno.h>

int main(void) {
    size_t total_leaks = 0;  /* bytes mapped so far; never unmapped */
    /* Chunk sizes from 4 KiB (1<<12) to 64 KiB (1<<16), 5000 maps each. */
    for (int shift=12; shift<=16; shift++) {
        size_t size = ((size_t)1)<<shift;
        for (int i=0; i<5000; ++i) {
            void* m = mmap(NULL, size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
            if (m == MAP_FAILED || m == NULL) {
                printf(
                    "Failed. m=%p size=%zu (1<<%d) i=%d "
                    " errno=%d total_leaks=%zu (%zu MiB)\n",
                    m, size, shift, i, errno,
                    total_leaks, total_leaks / 1024 / 1024);
                return 1;
            }
            total_leaks += size;
        }
    }
    printf("Success.\n");
    return 0;
}

Older kernels fail very consistently at almost exactly 1GiB total_leaks, 
if you change the test program to go that far. On 6.1.12, it fails much 
earlier, after an arbitrary amount of successful mmaps:

> $ ./mmap-test 
> Failed. m=0xffffffffffffffff size=4096 (1<<12) i=1500  errno=12 total_leaks=6144000 (5 MiB)
> $ ./mmap-test 
> Failed. m=0xffffffffffffffff size=4096 (1<<12) i=620  errno=12 total_leaks=2539520 (2 MiB)
> $ ./mmap-test 
> Failed. m=0xffffffffffffffff size=4096 (1<<12) i=2408  errno=12 total_leaks=9863168 (9 MiB)
> $ ./mmap-test 
> Failed. m=0xffffffffffffffff size=4096 (1<<12) i=774  errno=12 total_leaks=3170304 (3 MiB)
> $ ./mmap-test 
> Failed. m=0xffffffffffffffff size=4096 (1<<12) i=1648  errno=12 total_leaks=6750208 (6 MiB)
> $ ./mmap-test 


I have checked a more recent master commit (ee3f96b1, from March 1st), 
and the problem is still there. Bisecting shows that e15e06a8 is the 
last good commit, and that 524e00b3 is the first one failing in this 
way. The 10 or so commits in between run into a page fault BUG down in 
vma_merge() instead.

This range of commits is about the same as mentioned in 
https://lore.kernel.org/lkml/0b9f5425-08d4-8013-aa4c-e620c3b10bb2@leemhuis.info/, 
so I assume that my problem, too, was introduced with the Maple Tree 
changes. Sending this to the same people and lists.

//Snild

