linux-kernel.vger.kernel.org archive mirror
* Re: [PATCH] mapletree-vs-khugepaged
@ 2022-04-28 17:20 Guenter Roeck
  2022-04-28 19:27 ` Liam Howlett
  2022-04-29 12:09 ` Heiko Carstens
  0 siblings, 2 replies; 32+ messages in thread
From: Guenter Roeck @ 2022-04-28 17:20 UTC
  To: Andrew Morton; +Cc: Andrew Morton, linux-mm, linux-kernel, Liam R. Howlett

On Wed, Apr 27, 2022 at 03:10:45PM -0700, Andrew Morton wrote:
> Fix mapletree for patch series "Make khugepaged collapse readonly FS THP
> more consistent", v3.
> 
> Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

This patch causes all my sparc64 boot tests to fail. Bisect and crash logs
attached.

Guenter

---
[   12.624703] Unable to handle kernel paging request at virtual address 0e00000000000000
[   12.624793] tsk->{mm,active_mm}->context = 0000000000000005
[   12.624823] tsk->{mm,active_mm}->pgd = fffff800048b8000
[   12.624849]               \|/ ____ \|/
[   12.624849]               "@'/ .. \`@"
[   12.624849]               /_| \__/ |_\
[   12.624849]                  \__U_/
[   12.624874] init(1): Oops [#1]
[   12.625194] CPU: 0 PID: 1 Comm: init Not tainted 5.18.0-rc4-next-20220428 #1
[   12.625421] TSTATE: 0000009911001606 TPC: 00000000005e6330 TNPC: 00000000005e6334 Y: 00000000    Not tainted
[   12.625455] TPC: <mmap_region+0x150/0x700>
[   12.625503] g0: 0000000000619a00 g1: 0000000000000000 g2: fffff8000488b200 g3: 0000000000000000
[   12.625537] g4: fffff8000414a9a0 g5: fffff8001dd3e000 g6: fffff8000414c000 g7: 0000000000000000
[   12.625569] o0: 0000000000000000 o1: 0000000000000000 o2: 0000000001167b68 o3: 0000000000f51bb8
[   12.625601] o4: fffff80100301fff o5: fffff8000414fc20 sp: fffff8000414f341 ret_pc: 00000000005e6310
[   12.625630] RPC: <mmap_region+0x130/0x700>
[   12.625692] l0: fffff8000488b260 l1: 000000000000008b l2: fffff80100302000 l3: 0000000000000000
[   12.625725] l4: fffff80100301fff l5: 0000000000000000 l6: 30812c2a1dd8556f l7: fffff8000414b438
[   12.625762] i0: fffff800044f58a0 i1: fffff801001ec000 i2: 0e00000000000000 i3: 0000000000000075
[   12.625795] i4: 0000000000000000 i5: fffff8000414fde0 i6: fffff8000414f461 i7: 00000000005e6c58
[   12.625833] I7: <do_mmap+0x378/0x500>
[   12.625906] Call Trace:
[   12.626006] [<00000000005e6c58>] do_mmap+0x378/0x500
[   12.626092] [<00000000005bdc98>] vm_mmap_pgoff+0x78/0x100
[   12.626112] [<00000000005e3d24>] ksys_mmap_pgoff+0x164/0x1c0
[   12.626129] [<0000000000406294>] linux_sparc_syscall+0x34/0x44
[   12.626198] Disabling lock debugging due to kernel taint
[   12.626286] Caller[00000000005e6c58]: do_mmap+0x378/0x500
[   12.626335] Caller[00000000005bdc98]: vm_mmap_pgoff+0x78/0x100
[   12.626354] Caller[00000000005e3d24]: ksys_mmap_pgoff+0x164/0x1c0
[   12.626371] Caller[0000000000406294]: linux_sparc_syscall+0x34/0x44
[   12.626390] Caller[fffff8010001d88c]: 0xfffff8010001d88c
[   12.626537] Instruction DUMP:
[   12.626567]  a6100008
[   12.626678]  02c68006
[   12.626685]  01000000
[   12.626690] <c25e8000>
[   12.626696]  80a04012
[   12.626701]  22600077
[   12.626707]  c25ea088
[   12.626712]  22c4c00a
[   12.626717]  f277a7c7
[   12.626728]
[   12.627169] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009

---
# bad: [bdc61aad77faf67187525028f1f355eff3849f22] Add linux-next specific files for 20220428
# good: [af2d861d4cd2a4da5137f795ee3509e6f944a25b] Linux 5.18-rc4
git bisect start 'HEAD' 'v5.18-rc4'
# good: [a6ffa4aa7e81a54632f3370f4c93fce603160192] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
git bisect good a6ffa4aa7e81a54632f3370f4c93fce603160192
# good: [cd63f17e3bb63006f9f88bf7f5947b8e1601bcd9] Merge branch 'edac-for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git
git bisect good cd63f17e3bb63006f9f88bf7f5947b8e1601bcd9
# good: [cee7bbed3e5cc089b5c364ac8ad4a186c2a28bb6] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine.git
git bisect good cee7bbed3e5cc089b5c364ac8ad4a186c2a28bb6
# good: [d5a23156ea99f10b584221893a6a7d6f6554cde8] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git
git bisect good d5a23156ea99f10b584221893a6a7d6f6554cde8
# good: [2f1fde90d983bc404503100c9c4bbbf1e191bcf4] selftests: cgroup: fix alloc_anon_noexit() instantly freeing memory
git bisect good 2f1fde90d983bc404503100c9c4bbbf1e191bcf4
# good: [fca1db6ff251278c532231552e840c7dc36dfa76] Merge branch 'bitmap-for-next' of https://github.com/norov/linux.git
git bisect good fca1db6ff251278c532231552e840c7dc36dfa76
# good: [40b39116fe8e6fb66e3166ea40138eec506dfd91] perf: use VMA iterator
git bisect good 40b39116fe8e6fb66e3166ea40138eec506dfd91
# bad: [33ef257872566922df2b6bcfdb5330b2388aef53] Docs/{ABI,admin-guide}/damon: update for fixed virtual address ranges monitoring
git bisect bad 33ef257872566922df2b6bcfdb5330b2388aef53
# good: [2d8640f244c1ea6c40acde911d339dabc2ac765d] mm/oom_kill: use maple tree iterators instead of vma linked list
git bisect good 2d8640f244c1ea6c40acde911d339dabc2ac765d
# good: [49d281fa016f2906346f1707e5059b6f7674a948] mm/mmap.c: pass in mapping to __vma_link_file()
git bisect good 49d281fa016f2906346f1707e5059b6f7674a948
# bad: [778ae6914961a857596ccdddb69f34ad1d597cd0] selftets/damon/sysfs: test existence and permission of avail_operations
git bisect bad 778ae6914961a857596ccdddb69f34ad1d597cd0
# bad: [14031cb11d7f48cc0cb19084537e378fa8ce020d] mm/damon/core: add a function for damon_operations registration checks
git bisect bad 14031cb11d7f48cc0cb19084537e378fa8ce020d
# bad: [41fd8be857ee43f2f466fca7c2b66fea39f6540d] mapletree-vs-khugepaged
git bisect bad 41fd8be857ee43f2f466fca7c2b66fea39f6540d
# first bad commit: [41fd8be857ee43f2f466fca7c2b66fea39f6540d] mapletree-vs-khugepaged

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] mapletree-vs-khugepaged
  2022-04-28 17:20 [PATCH] mapletree-vs-khugepaged Guenter Roeck
@ 2022-04-28 19:27 ` Liam Howlett
  2022-04-29 12:09 ` Heiko Carstens
  1 sibling, 0 replies; 32+ messages in thread
From: Liam Howlett @ 2022-04-28 19:27 UTC
  To: Guenter Roeck; +Cc: Andrew Morton, linux-mm, linux-kernel

* Guenter Roeck <linux@roeck-us.net> [220428 13:20]:
> On Wed, Apr 27, 2022 at 03:10:45PM -0700, Andrew Morton wrote:
> > Fix mapletree for patch series "Make khugepaged collapse readonly FS THP
> > more consistent", v3.
> > 
> > Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> 
> This patch causes all my sparc64 boot tests to fail. Bisect and crash logs
> attached.

This is very interesting.  If 49d281fa016f2906346f1707e5059b6f7674a948
"mm/mmap.c: pass in mapping to __vma_link_file()" is okay, I would
expect this one to be okay as well.  Could address randomization at boot
be causing some bad commits to be reported as good?

I'll try to get set up to test all these architectures, but a lot of
them are frustrating to get going, so it might take a while.  Note that
progress may be slower due to events scheduled for next week.

Thanks,
Liam



* Re: [PATCH] mapletree-vs-khugepaged
  2022-04-28 17:20 [PATCH] mapletree-vs-khugepaged Guenter Roeck
  2022-04-28 19:27 ` Liam Howlett
@ 2022-04-29 12:09 ` Heiko Carstens
  2022-04-29 13:01   ` Liam Howlett
  2022-05-13 14:46   ` Sven Schnelle
  1 sibling, 2 replies; 32+ messages in thread
From: Heiko Carstens @ 2022-04-29 12:09 UTC
  To: Guenter Roeck; +Cc: Andrew Morton, linux-mm, linux-kernel, Liam R. Howlett

On Thu, Apr 28, 2022 at 10:20:40AM -0700, Guenter Roeck wrote:
> On Wed, Apr 27, 2022 at 03:10:45PM -0700, Andrew Morton wrote:
> > Fix mapletree for patch series "Make khugepaged collapse readonly FS THP
> > more consistent", v3.
> > 
> > Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> 
> This patch causes all my sparc64 boot tests to fail. Bisect and crash logs
> attached.
> 
> Guenter
...

FWIW, same on s390 - linux-next is completely broken. Note: I didn't
bisect, but given that the call trace and even the failing address
match, I'm quite confident it has the same cause.

Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 0e00000000000000 TEID: 0e00000000000803
Fault in home space mode while using kernel ASCE.
AS:00000000bac44007 R3:00000001ffff0007 S:00000001fffef800 P:000000000000003d
Oops: 0038 ilc:3 [#1] SMP
CPU: 3 PID: 79757 Comm: pt_upgrade_race Tainted: G            E K   5.18.0-20220428.rc4.git500.bdc61aad77fa.300.fc35.s390x+next #1
Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0)
Krnl PSW : 0704c00180000000 00000000b912c9a2 (mmap_region+0x1a2/0x8a8)
           R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
Krnl GPRS: 0000000000000000 0e00000000000000 0000000000000000 0000000000000000
           ffffffffffffffff 000000000000000f 00000380016b3d98 0000080000100000
           000000008364c100 0000080000000000 0000000000100077 0e00000000000000
           00000000909da100 00000380016b3c58 00000000b912c982 00000380016b3b40
Krnl Code: 00000000b912c992: a774002c          brc     7,00000000b912c9ea
           00000000b912c996: ecb80225007c      cgij    %r11,0,8,00000000b912cde0
          #00000000b912c99c: e310f0f80004      lg      %r1,248(%r15)
          >00000000b912c9a2: e37010000020      cg      %r7,0(%r1)
           00000000b912c9a8: a784010b          brc     8,00000000b912cbbe
           00000000b912c9ac: e310f0e80004      lg      %r1,232(%r15)
           00000000b912c9b2: ec180013007c      cgij    %r1,0,8,00000000b912c9d8
           00000000b912c9b8: e310f0e80004      lg      %r1,232(%r15)
Call Trace:
 [<00000000b912c9a2>] mmap_region+0x1a2/0x8a8
([<00000000b912c982>] mmap_region+0x182/0x8a8)
 [<00000000b912d492>] do_mmap+0x3ea/0x4c8
 [<00000000b90fb9cc>] vm_mmap_pgoff+0xd4/0x170
 [<00000000b9129c9a>] ksys_mmap_pgoff+0x62/0x238
 [<00000000b912a034>] __s390x_sys_old_mmap+0x74/0x98
 [<00000000b9a78ff8>] __do_syscall+0x1d8/0x200
 [<00000000b9a872a2>] system_call+0x82/0xb0
Last Breaking-Event-Address:
 [<00000000b9b9e678>] __s390_indirect_jump_r14+0x0/0xc
Kernel panic - not syncing: Fatal exception: panic_on_oops

* Re: [PATCH] mapletree-vs-khugepaged
  2022-04-29 12:09 ` Heiko Carstens
@ 2022-04-29 13:01   ` Liam Howlett
  2022-04-29 13:10     ` Heiko Carstens
  2022-05-13 14:46   ` Sven Schnelle
  1 sibling, 1 reply; 32+ messages in thread
From: Liam Howlett @ 2022-04-29 13:01 UTC
  To: Heiko Carstens; +Cc: Guenter Roeck, Andrew Morton, linux-mm, linux-kernel

* Heiko Carstens <hca@linux.ibm.com> [220429 08:10]:
> On Thu, Apr 28, 2022 at 10:20:40AM -0700, Guenter Roeck wrote:
> > On Wed, Apr 27, 2022 at 03:10:45PM -0700, Andrew Morton wrote:
> > > Fix mapletree for patch series "Make khugepaged collapse readonly FS THP
> > > more consistent", v3.
> > > 
> > > Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
> > > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> > 
> > This patch causes all my sparc64 boot tests to fail. Bisect and crash logs
> > attached.
> > 
> > Guenter
...
> 
> FWIW, same on s390 - linux-next is completely broken. Note: I didn't
> bisect, but given that the call trace and even the failing address
> match, I'm quite confident it has the same cause.

This is worth a lot to me.  Thanks for the report and the testing.

Regards,
Liam


* Re: [PATCH] mapletree-vs-khugepaged
  2022-04-29 13:01   ` Liam Howlett
@ 2022-04-29 13:10     ` Heiko Carstens
  2022-04-29 16:18       ` Liam Howlett
  0 siblings, 1 reply; 32+ messages in thread
From: Heiko Carstens @ 2022-04-29 13:10 UTC
  To: Liam Howlett; +Cc: Guenter Roeck, Andrew Morton, linux-mm, linux-kernel

On Fri, Apr 29, 2022 at 01:01:53PM +0000, Liam Howlett wrote:
> * Heiko Carstens <hca@linux.ibm.com> [220429 08:10]:
> > On Thu, Apr 28, 2022 at 10:20:40AM -0700, Guenter Roeck wrote:
> > > On Wed, Apr 27, 2022 at 03:10:45PM -0700, Andrew Morton wrote:
> > > > Fix mapletree for patch series "Make khugepaged collapse readonly FS THP
> > > > more consistent", v3.
> > > > 
> > > > Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
> > > > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> > > 
> > > This patch causes all my sparc64 boot tests to fail. Bisect and crash logs
> > > attached.
> > > 
> > > Guenter
...
> > 
> > FWIW, same on s390 - linux-next is completely broken. Note: I didn't
> > bisect, but given that the call trace and even the failing address
> > match, I'm quite confident it has the same cause.
> 
> This is worth a lot to me.  Thanks for the report and the testing.

Not sure if it is of any relevance, and you are probably aware of it
anyway, but both sparc64 and s390 are big endian; and there have been no
reports from little endian architectures yet.

* Re: [PATCH] mapletree-vs-khugepaged
  2022-04-29 13:10     ` Heiko Carstens
@ 2022-04-29 16:18       ` Liam Howlett
  2022-05-02  9:10         ` Geert Uytterhoeven
  0 siblings, 1 reply; 32+ messages in thread
From: Liam Howlett @ 2022-04-29 16:18 UTC
  To: Heiko Carstens; +Cc: Guenter Roeck, Andrew Morton, linux-mm, linux-kernel

* Heiko Carstens <hca@linux.ibm.com> [220429 09:19]:
> On Fri, Apr 29, 2022 at 01:01:53PM +0000, Liam Howlett wrote:
> > * Heiko Carstens <hca@linux.ibm.com> [220429 08:10]:
> > > On Thu, Apr 28, 2022 at 10:20:40AM -0700, Guenter Roeck wrote:
> > > > On Wed, Apr 27, 2022 at 03:10:45PM -0700, Andrew Morton wrote:
> > > > > Fix mapletree for patch series "Make khugepaged collapse readonly FS THP
> > > > > more consistent", v3.
> > > > > 
> > > > > Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
> > > > > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> > > > 
> > > > This patch causes all my sparc64 boot tests to fail. Bisect and crash logs
> > > > attached.
> > > > 
> > > > Guenter
> ...
> > > 
> > > FWIW, same on s390 - linux-next is completely broken. Note: I didn't
> > > bisect, but given that the call trace and even the failing address
> > > match, I'm quite confident it has the same cause.
> > 
> > This is worth a lot to me.  Thanks for the report and the testing.
> 
> Not sure if it is of any relevance, and you are probably aware of it
> anyway, but both sparc64 and s390 are big endian; and there have been no
> reports from little endian architectures yet.

I was aware they are big endian, but thanks - the more info the better.
sparc64 is technically bi-endian but I think everyone runs it in big
endian mode?  Is alpha the same?  There was a report of alpha having
issues too. m68k is also big endian - but also nommu, so that makes
testing difficult.

What I liked about the s390 report is that the s390 is very good at
finding vma issues, since it seems to use move_vma (among others) a lot
more than other architectures. I've built my maple tree + v5.18-rc2 and
successfully booted with KASAN and poison on the s390. Andrew asked me
to respin maple on top of one of his branches with the fixes rolled in
so I'm going to work on that while the m68k buildroot compiles.

In parallel, I'm running a (very slow qemu) install of sparc64 and
trying to figure out how to get a qemu setup for those. I'm trying to
follow what Guenter has in his repo[1] and have found debian ISOs[2]
that may help with some of these targets.


1. https://github.com/groeck/linux-build-test
2. https://cdimage.debian.org/cdimage/ports/snapshots/

* Re: [PATCH] mapletree-vs-khugepaged
  2022-04-29 16:18       ` Liam Howlett
@ 2022-05-02  9:10         ` Geert Uytterhoeven
  0 siblings, 0 replies; 32+ messages in thread
From: Geert Uytterhoeven @ 2022-05-02  9:10 UTC
  To: Liam Howlett
  Cc: Heiko Carstens, Guenter Roeck, Andrew Morton, linux-mm, linux-kernel

Hi Liam,

On Sat, Apr 30, 2022 at 1:58 AM Liam Howlett <liam.howlett@oracle.com> wrote:
> * Heiko Carstens <hca@linux.ibm.com> [220429 09:19]:
> > On Fri, Apr 29, 2022 at 01:01:53PM +0000, Liam Howlett wrote:
> > > * Heiko Carstens <hca@linux.ibm.com> [220429 08:10]:
> > > > On Thu, Apr 28, 2022 at 10:20:40AM -0700, Guenter Roeck wrote:
> > > > > On Wed, Apr 27, 2022 at 03:10:45PM -0700, Andrew Morton wrote:
> > > > > > Fix mapletree for patch series "Make khugepaged collapse readonly FS THP
> > > > > > more consistent", v3.
> > > > > >
> > > > > > Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
> > > > > > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> > > > >
> > > > > This patch causes all my sparc64 boot tests to fail. Bisect and crash logs
> > > > > attached.
> > > > >
> > > > > Guenter
> > ...
> > > >
> > > > FWIW, same on s390 - linux-next is completely broken. Note: I didn't
> > > > bisect, but given that the call trace and even the failing address
> > > > match, I'm quite confident it has the same cause.
> > >
> > > This is worth a lot to me.  Thanks for the report and the testing.
> >
> > Not sure if it is of any relevance, and you are probably aware of it
> > anyway, but both sparc64 and s390 are big endian; and there have been no
> > reports from little endian architectures yet.
>
> I was aware they are big endian, but thanks - the more info the better.
> sparc64 is technically bi-endian but I think everyone runs it in big

Sparc64 is big-endian. It has support for accessing little endian data
in memory, but that's merely an optimization.

> endian mode?  Is alpha the same?  There was a report of alpha having

Alpha is little-endian.

> issues too. m68k is also big endian - but also nommu, so that makes
> testing difficult.

M68k exists with and without MMU.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

* Re: [PATCH] mapletree-vs-khugepaged
  2022-04-29 12:09 ` Heiko Carstens
  2022-04-29 13:01   ` Liam Howlett
@ 2022-05-13 14:46   ` Sven Schnelle
  2022-05-13 14:51     ` Sven Schnelle
                       ` (4 more replies)
  1 sibling, 5 replies; 32+ messages in thread
From: Sven Schnelle @ 2022-05-13 14:46 UTC
  To: Liam R. Howlett
  Cc: Heiko Carstens, Guenter Roeck, Andrew Morton, linux-mm, linux-kernel

Heiko Carstens <hca@linux.ibm.com> writes:

> On Thu, Apr 28, 2022 at 10:20:40AM -0700, Guenter Roeck wrote:
>> On Wed, Apr 27, 2022 at 03:10:45PM -0700, Andrew Morton wrote:
>> > Fix mapletree for patch series "Make khugepaged collapse readonly FS THP
>> > more consistent", v3.
>> > 
>> > Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
>> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>> 
>> This patch causes all my sparc64 boot tests to fail. Bisect and crash logs
>> attached.
>> 
>> Guenter
>> 
>> ---
>> [   12.624703] Unable to handle kernel paging request at virtual address 0e00000000000000
>> [   12.624793] tsk->{mm,active_mm}->context = 0000000000000005
>> [   12.624823] tsk->{mm,active_mm}->pgd = fffff800048b8000
>> [   12.624849]               \|/ ____ \|/
>> [   12.624849]               "@'/ .. \`@"
>> [   12.624849]               /_| \__/ |_\
>> [   12.624849]                  \__U_/
>> [   12.624874] init(1): Oops [#1]
>> [   12.625194] CPU: 0 PID: 1 Comm: init Not tainted 5.18.0-rc4-next-20220428 #1
>> [   12.625421] TSTATE: 0000009911001606 TPC: 00000000005e6330 TNPC: 00000000005e6334 Y: 00000000    Not tainted
>> [   12.625455] TPC: <mmap_region+0x150/0x700>
>> [   12.625503] g0: 0000000000619a00 g1: 0000000000000000 g2: fffff8000488b200 g3: 0000000000000000
>> [   12.625537] g4: fffff8000414a9a0 g5: fffff8001dd3e000 g6: fffff8000414c000 g7: 0000000000000000
>> [   12.625569] o0: 0000000000000000 o1: 0000000000000000 o2: 0000000001167b68 o3: 0000000000f51bb8
>> [   12.625601] o4: fffff80100301fff o5: fffff8000414fc20 sp: fffff8000414f341 ret_pc: 00000000005e6310
>> [   12.625630] RPC: <mmap_region+0x130/0x700>
>> [   12.625692] l0: fffff8000488b260 l1: 000000000000008b l2: fffff80100302000 l3: 0000000000000000
>> [   12.625725] l4: fffff80100301fff l5: 0000000000000000 l6: 30812c2a1dd8556f l7: fffff8000414b438
>> [   12.625762] i0: fffff800044f58a0 i1: fffff801001ec000 i2: 0e00000000000000 i3: 0000000000000075
>> [   12.625795] i4: 0000000000000000 i5: fffff8000414fde0 i6: fffff8000414f461 i7: 00000000005e6c58
>> [   12.625833] I7: <do_mmap+0x378/0x500>
>> [   12.625906] Call Trace:
>> [   12.626006] [<00000000005e6c58>] do_mmap+0x378/0x500
>> [   12.626092] [<00000000005bdc98>] vm_mmap_pgoff+0x78/0x100
>> [   12.626112] [<00000000005e3d24>] ksys_mmap_pgoff+0x164/0x1c0
>> [   12.626129] [<0000000000406294>] linux_sparc_syscall+0x34/0x44
>> [   12.626198] Disabling lock debugging due to kernel taint
>> [   12.626286] Caller[00000000005e6c58]: do_mmap+0x378/0x500
>> [   12.626335] Caller[00000000005bdc98]: vm_mmap_pgoff+0x78/0x100
>> [   12.626354] Caller[00000000005e3d24]: ksys_mmap_pgoff+0x164/0x1c0
>> [   12.626371] Caller[0000000000406294]: linux_sparc_syscall+0x34/0x44
>> [   12.626390] Caller[fffff8010001d88c]: 0xfffff8010001d88c
>> [   12.626537] Instruction DUMP:
>> [   12.626567]  a6100008
>> [   12.626678]  02c68006
>> [   12.626685]  01000000
>> [   12.626690] <c25e8000>
>> [   12.626696]  80a04012
>> [   12.626701]  22600077
>> [   12.626707]  c25ea088
>> [   12.626712]  22c4c00a
>> [   12.626717]  f277a7c7
>> [   12.626728]
>> [   12.627169] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
>
> FWIW, same on s390 - linux-next is completely broken. Note: I didn't
> bisect, but given that the call trace, and even the failing address
> match, I'm quite confident it is the same reason.
>
> Unable to handle kernel pointer dereference in virtual kernel address space
> Failing address: 0e00000000000000 TEID: 0e00000000000803
> Fault in home space mode while using kernel ASCE.
> AS:00000000bac44007 R3:00000001ffff0007 S:00000001fffef800 P:000000000000003d
> Oops: 0038 ilc:3 [#1] SMP
> CPU: 3 PID: 79757 Comm: pt_upgrade_race Tainted: G            E K   5.18.0-20220428.rc4.git500.bdc61aad77fa.300.fc35.s390x+next #1
> Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0)
> Krnl PSW : 0704c00180000000 00000000b912c9a2 (mmap_region+0x1a2/0x8a8)
>            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> Krnl GPRS: 0000000000000000 0e00000000000000 0000000000000000 0000000000000000
>            ffffffffffffffff 000000000000000f 00000380016b3d98 0000080000100000
>            000000008364c100 0000080000000000 0000000000100077 0e00000000000000
>            00000000909da100 00000380016b3c58 00000000b912c982 00000380016b3b40
> Krnl Code: 00000000b912c992: a774002c          brc     7,00000000b912c9ea
>            00000000b912c996: ecb80225007c      cgij    %r11,0,8,00000000b912cde0
>           #00000000b912c99c: e310f0f80004      lg      %r1,248(%r15)
>           >00000000b912c9a2: e37010000020      cg      %r7,0(%r1)
>            00000000b912c9a8: a784010b          brc     8,00000000b912cbbe
>            00000000b912c9ac: e310f0e80004      lg      %r1,232(%r15)
>            00000000b912c9b2: ec180013007c      cgij    %r1,0,8,00000000b912c9d8
>            00000000b912c9b8: e310f0e80004      lg      %r1,232(%r15)
> Call Trace:
>  [<00000000b912c9a2>] mmap_region+0x1a2/0x8a8
> ([<00000000b912c982>] mmap_region+0x182/0x8a8)
>  [<00000000b912d492>] do_mmap+0x3ea/0x4c8
>  [<00000000b90fb9cc>] vm_mmap_pgoff+0xd4/0x170
>  [<00000000b9129c9a>] ksys_mmap_pgoff+0x62/0x238
>  [<00000000b912a034>] __s390x_sys_old_mmap+0x74/0x98
>  [<00000000b9a78ff8>] __do_syscall+0x1d8/0x200
>  [<00000000b9a872a2>] system_call+0x82/0xb0
> Last Breaking-Event-Address:
>  [<00000000b9b9e678>] __s390_indirect_jump_r14+0x0/0xc
> Kernel panic - not syncing: Fatal exception: panic_on_oops

Starting today we're still seeing the same crash with linux-next from
(next-20220513):

[  211.937897] CPU: 7 PID: 535 Comm: pt_upgrade Not tainted 5.18.0-rc6-11648-g76535d42eb53-dirty #732
[  211.937902] Unable to handle kernel pointer dereference in virtual kernel address space
[  211.937903] Hardware name: IBM 3906 M04 704 (z/VM 7.1.0)
[  211.937906] Failing address: 0e00000000000000 TEID: 0e00000000000803
[  211.937909] Krnl PSW : 0704c00180000000 0000001ca52f06d6
[  211.937910] Fault in home space mode while using kernel ASCE.
[  211.937917] AS:0000001ca6e24007 R3:0000001fffff0007 S:0000001ffffef800 P:000000000000003d
[  211.937914]  (mmap_region+0x19e/0x848)
[  211.937929]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[  211.937939] Krnl GPRS: 0000000000000000 0e00000000000000 0000000000000000 0000000000000000
[  211.937942]            ffffffff00000f0f ffffffffffffffff 0e00000000000000 0000040000001000
[  211.937945]            0000000083551900 0000040000000000 00000000000000fb 000003800070fc58
[  211.937947]            000000008f490000 0000000000000000 0000001ca52f06b6 000003800070fb48
[  211.937959] Krnl Code: 0000001ca52f06c6: a7740021            brc     7,0000001ca52f0708
[  211.937959]            0000001ca52f06ca: ec6801b3007c        cgij    %r6,0,8,0000001ca52f0a30
[  211.937959]           #0000001ca52f06d0: e310f0f80004        lg      %r1,248(%r15)
[  211.937959]           >0000001ca52f06d6: e37010000020        cg      %r7,0(%r1)
[  211.937959]            0000001ca52f06dc: a78400ea            brc     8,0000001ca52f08b0
[  211.937959]            0000001ca52f06e0: e310f0f00004        lg      %r1,240(%r15)
[  211.937959]            0000001ca52f06e6: ec180008007c        cgij    %r1,0,8,0000001ca52f06f6
[  211.937959]            0000001ca52f06ec: e39010080020        cg      %r9,8(%r1)
[  211.937973] Call Trace:
[  211.937975]  [<0000001ca52f06d6>] mmap_region+0x19e/0x848
[  211.937978] ([<0000001ca52f06b6>] mmap_region+0x17e/0x848)
[  211.937981]  [<0000001ca52f116a>] do_mmap+0x3ea/0x4c8
[  211.937983]  [<0000001ca52bed12>] vm_mmap_pgoff+0xda/0x178
[  211.937987]  [<0000001ca52ed5ea>] ksys_mmap_pgoff+0x62/0x238
[  211.937989]  [<0000001ca52ed992>] __s390x_sys_old_mmap+0x7a/0xa0
[  211.937993]  [<0000001ca5c4ef5c>] __do_syscall+0x1d4/0x200
[  211.937999]  [<0000001ca5c5d572>] system_call+0x82/0xb0
[  211.938002] Last Breaking-Event-Address:
[  211.938003]  [<0000001ca5888616>] mas_prev+0xb6/0xc0
[  211.938010] Oops: 0038 ilc:3 [#2]
[  211.938011] Kernel panic - not syncing: Fatal exception: panic_on_oops
[  211.938012] SMP
[  211.938014] Modules linked in:
07: HCPGIR450W CP entered; disabled wait PSW 00020001 80000000 0000001C
A50679A6

Is that issue supposed to be fixed? git bisect pointed me to

# bad: [76535d42eb53485775a8c54ea85725812b75543f] Merge branch
  'mm-everything' of
  git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

which isn't really helpful.

Anything we could help with debugging this?

Thanks
Sven

* Re: [PATCH] mapletree-vs-khugepaged
  2022-05-13 14:46   ` Sven Schnelle
@ 2022-05-13 14:51     ` Sven Schnelle
  2022-05-13 16:49     ` Andrew Morton
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 32+ messages in thread
From: Sven Schnelle @ 2022-05-13 14:51 UTC (permalink / raw)
  To: Liam R. Howlett
  Cc: Heiko Carstens, Guenter Roeck, Andrew Morton, linux-mm, linux-kernel


Sven Schnelle <svens@linux.ibm.com> writes:

> Starting today we're still seeing the same crash with linux-next from
> (next-20220513):

Small correction: This also happened the last two days, so it started
with next-20220511.

* Re: [PATCH] mapletree-vs-khugepaged
  2022-05-13 14:46   ` Sven Schnelle
  2022-05-13 14:51     ` Sven Schnelle
@ 2022-05-13 16:49     ` Andrew Morton
  2022-05-13 17:00     ` Liam Howlett
                       ` (2 subsequent siblings)
  4 siblings, 0 replies; 32+ messages in thread
From: Andrew Morton @ 2022-05-13 16:49 UTC (permalink / raw)
  To: Sven Schnelle
  Cc: Liam R. Howlett, Heiko Carstens, Guenter Roeck, linux-mm, linux-kernel

On Fri, 13 May 2022 16:46:41 +0200 Sven Schnelle <svens@linux.ibm.com> wrote:

> Is that issue supposed to be fixed? git bisect pointed me to
> 
> # bad: [76535d42eb53485775a8c54ea85725812b75543f] Merge branch
>   'mm-everything' of
>   git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> 
> which isn't really helpful.
> 
> Anything we could help with debugging this?

git-bisect is still doing that?  There's a way of nudging it, but I
forget.

Perhaps you could grab the mm-unstable branch from
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and bisect that?
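As background on why a bisect like the one above can stop at a merge: git bisect is essentially a binary search for the first bad commit, and it narrows cleanly to a single patch when the tested range is a linear series in which every commit after the culprit is also bad. If each parent of a merge is good on its own and only the combination fails, the merge itself is the first bad commit, and that is what bisect reports. A minimal sketch of the search in C, assuming a hypothetical array of boot-test results (`is_bad`) in place of real build-and-boot runs; `first_bad()` is an illustrative name, not part of git:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model of git bisect on a linear series (hypothetical helper):
 * is_bad[i] is the result of testing commit i, is_bad[0] is known good,
 * is_bad[n - 1] is known bad, and every commit after the culprit is bad.
 * Returns the index of the first bad commit.
 */
static size_t first_bad(const bool *is_bad, size_t n)
{
	assert(n >= 2 && !is_bad[0] && is_bad[n - 1]);

	size_t good = 0, bad = n - 1;	/* invariant: good is good, bad is bad */

	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;	/* next commit to test */

		if (is_bad[mid])
			bad = mid;	/* culprit is at or before mid */
		else
			good = mid;	/* culprit is after mid */
	}
	return bad;
}
```

With the hypothetical series `{ false, false, false, false, false, true, true, true }`, `first_bad(series, 8)` probes indices 3, 5 and 4 and returns 5, so it needs about log2(n) test boots instead of n.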

* Re: [PATCH] mapletree-vs-khugepaged
  2022-05-13 14:46   ` Sven Schnelle
  2022-05-13 14:51     ` Sven Schnelle
  2022-05-13 16:49     ` Andrew Morton
@ 2022-05-13 17:00     ` Liam Howlett
  2022-05-15 20:02       ` Sven Schnelle
  2022-05-17 11:53       ` Heiko Carstens
  2022-05-13 17:28     ` Guenter Roeck
  2022-05-13 20:12     ` Yang Shi
  4 siblings, 2 replies; 32+ messages in thread
From: Liam Howlett @ 2022-05-13 17:00 UTC (permalink / raw)
  To: Sven Schnelle
  Cc: Heiko Carstens, Guenter Roeck, Andrew Morton, linux-mm, linux-kernel

* Sven Schnelle <svens@linux.ibm.com> [220513 10:46]:
> Heiko Carstens <hca@linux.ibm.com> writes:
> 
> > On Thu, Apr 28, 2022 at 10:20:40AM -0700, Guenter Roeck wrote:
> >> On Wed, Apr 27, 2022 at 03:10:45PM -0700, Andrew Morton wrote:
> >> > Fix mapletree for patch series "Make khugepaged collapse readonly FS THP
> >> > more consistent", v3.
> >> > 
> >> > Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
> >> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> >> 
> >> This patch causes all my sparc64 boot tests to fail. Bisect and crash logs
> >> attached.
> >> 
> >> Guenter
> >> 

....

> >
> > FWIW, same on s390 - linux-next is completely broken. Note: I didn't
> > bisect, but given that the call trace, and even the failing address
> > match, I'm quite confident it is the same reason.
> >
> > Unable to handle kernel pointer dereference in virtual kernel address space
> > Failing address: 0e00000000000000 TEID: 0e00000000000803
> > Fault in home space mode while using kernel ASCE.
> > AS:00000000bac44007 R3:00000001ffff0007 S:00000001fffef800 P:000000000000003d
> > Oops: 0038 ilc:3 [#1] SMP
> > CPU: 3 PID: 79757 Comm: pt_upgrade_race Tainted: G            E K   5.18.0-20220428.rc4.git500.bdc61aad77fa.300.fc35.s390x+next #1
> > Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0)
> > Krnl PSW : 0704c00180000000 00000000b912c9a2 (mmap_region+0x1a2/0x8a8)
> >            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> > Krnl GPRS: 0000000000000000 0e00000000000000 0000000000000000 0000000000000000
> >            ffffffffffffffff 000000000000000f 00000380016b3d98 0000080000100000
> >            000000008364c100 0000080000000000 0000000000100077 0e00000000000000
> >            00000000909da100 00000380016b3c58 00000000b912c982 00000380016b3b40
> > Krnl Code: 00000000b912c992: a774002c          brc     7,00000000b912c9ea
> >            00000000b912c996: ecb80225007c      cgij    %r11,0,8,00000000b912cde0
> >           #00000000b912c99c: e310f0f80004      lg      %r1,248(%r15)
> >           >00000000b912c9a2: e37010000020      cg      %r7,0(%r1)
> >            00000000b912c9a8: a784010b          brc     8,00000000b912cbbe
> >            00000000b912c9ac: e310f0e80004      lg      %r1,232(%r15)
> >            00000000b912c9b2: ec180013007c      cgij    %r1,0,8,00000000b912c9d8
> >            00000000b912c9b8: e310f0e80004      lg      %r1,232(%r15)
> > Call Trace:
> >  [<00000000b912c9a2>] mmap_region+0x1a2/0x8a8
> > ([<00000000b912c982>] mmap_region+0x182/0x8a8)
> >  [<00000000b912d492>] do_mmap+0x3ea/0x4c8
> >  [<00000000b90fb9cc>] vm_mmap_pgoff+0xd4/0x170
> >  [<00000000b9129c9a>] ksys_mmap_pgoff+0x62/0x238
> >  [<00000000b912a034>] __s390x_sys_old_mmap+0x74/0x98
> >  [<00000000b9a78ff8>] __do_syscall+0x1d8/0x200
> >  [<00000000b9a872a2>] system_call+0x82/0xb0
> > Last Breaking-Event-Address:
> >  [<00000000b9b9e678>] __s390_indirect_jump_r14+0x0/0xc
> > Kernel panic - not syncing: Fatal exception: panic_on_oops
> 
> Starting today we're still seeing the same crash with linux-next from
> (next-20220513):
> 
> [  211.937897] CPU: 7 PID: 535 Comm: pt_upgrade Not tainted 5.18.0-rc6-11648-g76535d42eb53-dirty #732
> [  211.937902] Unable to handle kernel pointer dereference in virtual kernel address space
> [  211.937903] Hardware name: IBM 3906 M04 704 (z/VM 7.1.0)
> [  211.937906] Failing address: 0e00000000000000 TEID: 0e00000000000803
> [  211.937909] Krnl PSW : 0704c00180000000 0000001ca52f06d6
> [  211.937910] Fault in home space mode while using kernel ASCE.
> [  211.937917] AS:0000001ca6e24007 R3:0000001fffff0007 S:0000001ffffef800 P:000000000000003d
> [  211.937914]  (mmap_region+0x19e/0x848)
> [  211.937929]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> [  211.937939] Krnl GPRS: 0000000000000000 0e00000000000000 0000000000000000 0000000000000000
> [  211.937942]            ffffffff00000f0f ffffffffffffffff 0e00000000000000 0000040000001000
> [  211.937945]            0000000083551900 0000040000000000 00000000000000fb 000003800070fc58
> [  211.937947]            000000008f490000 0000000000000000 0000001ca52f06b6 000003800070fb48
> [  211.937959] Krnl Code: 0000001ca52f06c6: a7740021            brc     7,0000001ca52f0708
> [  211.937959]            0000001ca52f06ca: ec6801b3007c        cgij    %r6,0,8,0000001ca52f0a30
> [  211.937959]           #0000001ca52f06d0: e310f0f80004        lg      %r1,248(%r15)
> [  211.937959]           >0000001ca52f06d6: e37010000020        cg      %r7,0(%r1)
> [  211.937959]            0000001ca52f06dc: a78400ea            brc     8,0000001ca52f08b0
> [  211.937959]            0000001ca52f06e0: e310f0f00004        lg      %r1,240(%r15)
> [  211.937959]            0000001ca52f06e6: ec180008007c        cgij    %r1,0,8,0000001ca52f06f6
> [  211.937959]            0000001ca52f06ec: e39010080020        cg      %r9,8(%r1)
> [  211.937973] Call Trace:
> [  211.937975]  [<0000001ca52f06d6>] mmap_region+0x19e/0x848
> [  211.937978] ([<0000001ca52f06b6>] mmap_region+0x17e/0x848)
> [  211.937981]  [<0000001ca52f116a>] do_mmap+0x3ea/0x4c8
> [  211.937983]  [<0000001ca52bed12>] vm_mmap_pgoff+0xda/0x178
> [  211.937987]  [<0000001ca52ed5ea>] ksys_mmap_pgoff+0x62/0x238
> [  211.937989]  [<0000001ca52ed992>] __s390x_sys_old_mmap+0x7a/0xa0
> [  211.937993]  [<0000001ca5c4ef5c>] __do_syscall+0x1d4/0x200
> [  211.937999]  [<0000001ca5c5d572>] system_call+0x82/0xb0
> [  211.938002] Last Breaking-Event-Address:
> [  211.938003]  [<0000001ca5888616>] mas_prev+0xb6/0xc0
> [  211.938010] Oops: 0038 ilc:3 [#2]
> [  211.938011] Kernel panic - not syncing: Fatal exception: panic_on_oops
> [  211.938012] SMP
> [  211.938014] Modules linked in:
> 07: HCPGIR450W CP entered; disabled wait PSW 00020001 80000000 0000001C
> A50679A6
> 
> Is that issue supposed to be fixed? git bisect pointed me to
> 
> # bad: [76535d42eb53485775a8c54ea85725812b75543f] Merge branch
>   'mm-everything' of
>   git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> 
> which isn't really helpful.
> 
> Anything we could help with debugging this?

I tested the maple tree on top of the s390 as it was the same crash and
it was okay.  I haven't tested the mm-everything branch though.  Can you
test mm-unstable?

I'll continue setting up a sparc VM for testing here and test
mm-everything on that and on s390.

Thanks,
Liam

* Re: [PATCH] mapletree-vs-khugepaged
  2022-05-13 14:46   ` Sven Schnelle
                       ` (2 preceding siblings ...)
  2022-05-13 17:00     ` Liam Howlett
@ 2022-05-13 17:28     ` Guenter Roeck
  2022-05-13 20:12     ` Yang Shi
  4 siblings, 0 replies; 32+ messages in thread
From: Guenter Roeck @ 2022-05-13 17:28 UTC (permalink / raw)
  To: Sven Schnelle, Liam R. Howlett
  Cc: Heiko Carstens, Andrew Morton, linux-mm, linux-kernel

On 5/13/22 07:46, Sven Schnelle wrote:
[ ... ]
> 
> Is that issue supposed to be fixed? git bisect pointed me to
> 
> # bad: [76535d42eb53485775a8c54ea85725812b75543f] Merge branch
>    'mm-everything' of
>    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> 
> which isn't really helpful.
> 

sparc64 is still broken for me, as are several other platforms/
architectures. Summary for next-20220513:

Build results:
	total: 146 pass: 136 fail: 10
Qemu test results:
	total: 489 pass: 406 fail: 83

The failures are spread too widely for a single person to analyze.
Anyone interested may have a look at the 'next' column at
https://kerneltests.org/builders for details.

Guenter

* Re: [PATCH] mapletree-vs-khugepaged
  2022-05-13 14:46   ` Sven Schnelle
                       ` (3 preceding siblings ...)
  2022-05-13 17:28     ` Guenter Roeck
@ 2022-05-13 20:12     ` Yang Shi
  4 siblings, 0 replies; 32+ messages in thread
From: Yang Shi @ 2022-05-13 20:12 UTC (permalink / raw)
  To: Sven Schnelle
  Cc: Liam R. Howlett, Heiko Carstens, Guenter Roeck, Andrew Morton,
	Linux MM, Linux Kernel Mailing List

On Fri, May 13, 2022 at 7:47 AM Sven Schnelle <svens@linux.ibm.com> wrote:
>
> Heiko Carstens <hca@linux.ibm.com> writes:
>
> > On Thu, Apr 28, 2022 at 10:20:40AM -0700, Guenter Roeck wrote:
> >> On Wed, Apr 27, 2022 at 03:10:45PM -0700, Andrew Morton wrote:
> >> > Fix mapletree for patch series "Make khugepaged collapse readonly FS THP
> >> > more consistent", v3.
> >> >
> >> > Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
> >> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> >>
> >> This patch causes all my sparc64 boot tests to fail. Bisect and crash logs
> >> attached.
> >>
> >> Guenter
> >>
> >> ---
> >> [   12.624703] Unable to handle kernel paging request at virtual address 0e00000000000000
> >> [   12.624793] tsk->{mm,active_mm}->context = 0000000000000005
> >> [   12.624823] tsk->{mm,active_mm}->pgd = fffff800048b8000
> >> [   12.624849]               \|/ ____ \|/
> >> [   12.624849]               "@'/ .. \`@"
> >> [   12.624849]               /_| \__/ |_\
> >> [   12.624849]                  \__U_/
> >> [   12.624874] init(1): Oops [#1]
> >> [   12.625194] CPU: 0 PID: 1 Comm: init Not tainted 5.18.0-rc4-next-20220428 #1
> >> [   12.625421] TSTATE: 0000009911001606 TPC: 00000000005e6330 TNPC: 00000000005e6334 Y: 00000000    Not tainted
> >> [   12.625455] TPC: <mmap_region+0x150/0x700>
> >> [   12.625503] g0: 0000000000619a00 g1: 0000000000000000 g2: fffff8000488b200 g3: 0000000000000000
> >> [   12.625537] g4: fffff8000414a9a0 g5: fffff8001dd3e000 g6: fffff8000414c000 g7: 0000000000000000
> >> [   12.625569] o0: 0000000000000000 o1: 0000000000000000 o2: 0000000001167b68 o3: 0000000000f51bb8
> >> [   12.625601] o4: fffff80100301fff o5: fffff8000414fc20 sp: fffff8000414f341 ret_pc: 00000000005e6310
> >> [   12.625630] RPC: <mmap_region+0x130/0x700>
> >> [   12.625692] l0: fffff8000488b260 l1: 000000000000008b l2: fffff80100302000 l3: 0000000000000000
> >> [   12.625725] l4: fffff80100301fff l5: 0000000000000000 l6: 30812c2a1dd8556f l7: fffff8000414b438
> >> [   12.625762] i0: fffff800044f58a0 i1: fffff801001ec000 i2: 0e00000000000000 i3: 0000000000000075
> >> [   12.625795] i4: 0000000000000000 i5: fffff8000414fde0 i6: fffff8000414f461 i7: 00000000005e6c58
> >> [   12.625833] I7: <do_mmap+0x378/0x500>
> >> [   12.625906] Call Trace:
> >> [   12.626006] [<00000000005e6c58>] do_mmap+0x378/0x500
> >> [   12.626092] [<00000000005bdc98>] vm_mmap_pgoff+0x78/0x100
> >> [   12.626112] [<00000000005e3d24>] ksys_mmap_pgoff+0x164/0x1c0
> >> [   12.626129] [<0000000000406294>] linux_sparc_syscall+0x34/0x44
> >> [   12.626198] Disabling lock debugging due to kernel taint
> >> [   12.626286] Caller[00000000005e6c58]: do_mmap+0x378/0x500
> >> [   12.626335] Caller[00000000005bdc98]: vm_mmap_pgoff+0x78/0x100
> >> [   12.626354] Caller[00000000005e3d24]: ksys_mmap_pgoff+0x164/0x1c0
> >> [   12.626371] Caller[0000000000406294]: linux_sparc_syscall+0x34/0x44
> >> [   12.626390] Caller[fffff8010001d88c]: 0xfffff8010001d88c
> >> [   12.626537] Instruction DUMP:
> >> [   12.626567]  a6100008
> >> [   12.626678]  02c68006
> >> [   12.626685]  01000000
> >> [   12.626690] <c25e8000>
> >> [   12.626696]  80a04012
> >> [   12.626701]  22600077
> >> [   12.626707]  c25ea088
> >> [   12.626712]  22c4c00a
> >> [   12.626717]  f277a7c7
> >> [   12.626728]
> >> [   12.627169] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
> >
> > FWIW, same on s390 - linux-next is completely broken. Note: I didn't
> > bisect, but given that the call trace, and even the failing address
> > match, I'm quite confident it is the same reason.
> >
> > Unable to handle kernel pointer dereference in virtual kernel address space
> > Failing address: 0e00000000000000 TEID: 0e00000000000803
> > Fault in home space mode while using kernel ASCE.
> > AS:00000000bac44007 R3:00000001ffff0007 S:00000001fffef800 P:000000000000003d
> > Oops: 0038 ilc:3 [#1] SMP
> > CPU: 3 PID: 79757 Comm: pt_upgrade_race Tainted: G            E K   5.18.0-20220428.rc4.git500.bdc61aad77fa.300.fc35.s390x+next #1
> > Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0)
> > Krnl PSW : 0704c00180000000 00000000b912c9a2 (mmap_region+0x1a2/0x8a8)
> >            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> > Krnl GPRS: 0000000000000000 0e00000000000000 0000000000000000 0000000000000000
> >            ffffffffffffffff 000000000000000f 00000380016b3d98 0000080000100000
> >            000000008364c100 0000080000000000 0000000000100077 0e00000000000000
> >            00000000909da100 00000380016b3c58 00000000b912c982 00000380016b3b40
> > Krnl Code: 00000000b912c992: a774002c          brc     7,00000000b912c9ea
> >            00000000b912c996: ecb80225007c      cgij    %r11,0,8,00000000b912cde0
> >           #00000000b912c99c: e310f0f80004      lg      %r1,248(%r15)
> >           >00000000b912c9a2: e37010000020      cg      %r7,0(%r1)
> >            00000000b912c9a8: a784010b          brc     8,00000000b912cbbe
> >            00000000b912c9ac: e310f0e80004      lg      %r1,232(%r15)
> >            00000000b912c9b2: ec180013007c      cgij    %r1,0,8,00000000b912c9d8
> >            00000000b912c9b8: e310f0e80004      lg      %r1,232(%r15)
> > Call Trace:
> >  [<00000000b912c9a2>] mmap_region+0x1a2/0x8a8
> > ([<00000000b912c982>] mmap_region+0x182/0x8a8)
> >  [<00000000b912d492>] do_mmap+0x3ea/0x4c8
> >  [<00000000b90fb9cc>] vm_mmap_pgoff+0xd4/0x170
> >  [<00000000b9129c9a>] ksys_mmap_pgoff+0x62/0x238
> >  [<00000000b912a034>] __s390x_sys_old_mmap+0x74/0x98
> >  [<00000000b9a78ff8>] __do_syscall+0x1d8/0x200
> >  [<00000000b9a872a2>] system_call+0x82/0xb0
> > Last Breaking-Event-Address:
> >  [<00000000b9b9e678>] __s390_indirect_jump_r14+0x0/0xc
> > Kernel panic - not syncing: Fatal exception: panic_on_oops
>
> Starting today we're still seeing the same crash with linux-next from
> (next-20220513):
>
> [  211.937897] CPU: 7 PID: 535 Comm: pt_upgrade Not tainted 5.18.0-rc6-11648-g76535d42eb53-dirty #732
> [  211.937902] Unable to handle kernel pointer dereference in virtual kernel address space
> [  211.937903] Hardware name: IBM 3906 M04 704 (z/VM 7.1.0)
> [  211.937906] Failing address: 0e00000000000000 TEID: 0e00000000000803
> [  211.937909] Krnl PSW : 0704c00180000000 0000001ca52f06d6
> [  211.937910] Fault in home space mode while using kernel ASCE.
> [  211.937917] AS:0000001ca6e24007 R3:0000001fffff0007 S:0000001ffffef800 P:000000000000003d
> [  211.937914]  (mmap_region+0x19e/0x848)
> [  211.937929]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> [  211.937939] Krnl GPRS: 0000000000000000 0e00000000000000 0000000000000000 0000000000000000
> [  211.937942]            ffffffff00000f0f ffffffffffffffff 0e00000000000000 0000040000001000
> [  211.937945]            0000000083551900 0000040000000000 00000000000000fb 000003800070fc58
> [  211.937947]            000000008f490000 0000000000000000 0000001ca52f06b6 000003800070fb48
> [  211.937959] Krnl Code: 0000001ca52f06c6: a7740021            brc     7,0000001ca52f0708
> [  211.937959]            0000001ca52f06ca: ec6801b3007c        cgij    %r6,0,8,0000001ca52f0a30
> [  211.937959]           #0000001ca52f06d0: e310f0f80004        lg      %r1,248(%r15)
> [  211.937959]           >0000001ca52f06d6: e37010000020        cg      %r7,0(%r1)
> [  211.937959]            0000001ca52f06dc: a78400ea            brc     8,0000001ca52f08b0
> [  211.937959]            0000001ca52f06e0: e310f0f00004        lg      %r1,240(%r15)
> [  211.937959]            0000001ca52f06e6: ec180008007c        cgij    %r1,0,8,0000001ca52f06f6
> [  211.937959]            0000001ca52f06ec: e39010080020        cg      %r9,8(%r1)
> [  211.937973] Call Trace:
> [  211.937975]  [<0000001ca52f06d6>] mmap_region+0x19e/0x848
> [  211.937978] ([<0000001ca52f06b6>] mmap_region+0x17e/0x848)
> [  211.937981]  [<0000001ca52f116a>] do_mmap+0x3ea/0x4c8
> [  211.937983]  [<0000001ca52bed12>] vm_mmap_pgoff+0xda/0x178
> [  211.937987]  [<0000001ca52ed5ea>] ksys_mmap_pgoff+0x62/0x238
> [  211.937989]  [<0000001ca52ed992>] __s390x_sys_old_mmap+0x7a/0xa0
> [  211.937993]  [<0000001ca5c4ef5c>] __do_syscall+0x1d4/0x200
> [  211.937999]  [<0000001ca5c5d572>] system_call+0x82/0xb0
> [  211.938002] Last Breaking-Event-Address:
> [  211.938003]  [<0000001ca5888616>] mas_prev+0xb6/0xc0
> [  211.938010] Oops: 0038 ilc:3 [#2]
> [  211.938011] Kernel panic - not syncing: Fatal exception: panic_on_oops
> [  211.938012] SMP
> [  211.938014] Modules linked in:
> 07: HCPGIR450W CP entered; disabled wait PSW 00020001 80000000 0000001C
> A50679A6
>
> Is that issue supposed to be fixed? git bisect pointed me to
>
> # bad: [76535d42eb53485775a8c54ea85725812b75543f] Merge branch
>   'mm-everything' of
>   git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
>
> which isn't really helpful.
>
> Anything we could help with debugging this?

I think this is the same issue. In the initial report, Guenter bisected
the failure to "Fix mapletree for patch series "Make khugepaged collapse
readonly FS THP more consistent", v3.", which was used to fix the build
error when applying "Make khugepaged collapse readonly FS THP more
consistent" on top of Liam's maple tree series. You can find the patch at:
https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next-history/+/41fd8be857ee43f2f466fca7c2b66fea39f6540d%5E%21/mm/mmap.c.
It converted two new khugepaged_enter_vma_merge() calls from Liam's
series to khugepaged_enter_vma().

Then the "Make khugepaged collapse readonly FS THP more consistent"
series was dropped for other reasons, and it was re-added around May 11.

Liam's maple tree series added two new khugepaged_enter_vma_merge()
calls, then my series converted khugepaged_enter_vma_merge() to
khugepaged_enter_vma() in the patches "mm: khugepaged: make
hugepage_vma_check() non-static" and "mm: khugepaged: introduce
khugepaged_enter_vma() helper".

But those two patches should not change how khugepaged_enter_vma_merge()
works; they just rearrange some code. That said, khugepaged_enter_vma()
checks a lot of vma fields, so if the vma is not fully initialized it
may hit a NULL pointer dereference. The weird thing is that it works on
x86 (sorry, I only have an x86 machine for testing).

So I'm not sure whether the failure is caused by Liam's patch or mine.
Could you please apply the debug patch below and then try to boot?
Thanks.

diff --git a/mm/mmap.c b/mm/mmap.c
index 67aa1d2a959b..7b860ad9d847 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2651,10 +2651,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr,

        /* Actually expand, if possible */
        if (vma &&
-           !vma_expand(&mas, vma, merge_start, merge_end, vm_pgoff, next)) {
-               khugepaged_enter_vma(vma, vm_flags);
+           !vma_expand(&mas, vma, merge_start, merge_end, vm_pgoff, next))
                goto expanded;
-       }

        mas.index = addr;
        mas.last = end - 1;
@@ -3074,7 +3072,6 @@ static int do_brk_flags(struct ma_state *mas, struct vm_area_struct *vma,
                        anon_vma_interval_tree_post_update_vma(vma);
                        anon_vma_unlock_write(vma->anon_vma);
                }
-               khugepaged_enter_vma(vma, flags);
                goto out;
        }

>
> Thanks
> Sven
>

* Re: [PATCH] mapletree-vs-khugepaged
  2022-05-13 17:00     ` Liam Howlett
@ 2022-05-15 20:02       ` Sven Schnelle
  2022-05-16 14:02         ` Liam Howlett
  2022-05-17 11:53       ` Heiko Carstens
  1 sibling, 1 reply; 32+ messages in thread
From: Sven Schnelle @ 2022-05-15 20:02 UTC (permalink / raw)
  To: Liam Howlett
  Cc: Heiko Carstens, Guenter Roeck, Andrew Morton, linux-mm, linux-kernel

Liam Howlett <liam.howlett@oracle.com> writes:

> * Sven Schnelle <svens@linux.ibm.com> [220513 10:46]:
>> Starting today we're still seeing the same crash with linux-next from
>> (next-20220513):
>>
>> [  211.937897] CPU: 7 PID: 535 Comm: pt_upgrade Not tainted 5.18.0-rc6-11648-g76535d42eb53-dirty #732
>> [  211.937902] Unable to handle kernel pointer dereference in virtual kernel address space
>> [  211.937903] Hardware name: IBM 3906 M04 704 (z/VM 7.1.0)
>> [  211.937906] Failing address: 0e00000000000000 TEID: 0e00000000000803
>> [  211.937909] Krnl PSW : 0704c00180000000 0000001ca52f06d6
>> [  211.937910] Fault in home space mode while using kernel ASCE.
>> [  211.937917] AS:0000001ca6e24007 R3:0000001fffff0007 S:0000001ffffef800 P:000000000000003d
>> [  211.937914]  (mmap_region+0x19e/0x848)
>> [  211.937929]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
>> [  211.937939] Krnl GPRS: 0000000000000000 0e00000000000000 0000000000000000 0000000000000000
>> [  211.937942]            ffffffff00000f0f ffffffffffffffff 0e00000000000000 0000040000001000
>> [  211.937945]            0000000083551900 0000040000000000 00000000000000fb 000003800070fc58
>> [  211.937947]            000000008f490000 0000000000000000 0000001ca52f06b6 000003800070fb48
>> [  211.937959] Krnl Code: 0000001ca52f06c6: a7740021            brc     7,0000001ca52f0708
>> [  211.937959]            0000001ca52f06ca: ec6801b3007c        cgij    %r6,0,8,0000001ca52f0a30
>> [  211.937959]           #0000001ca52f06d0: e310f0f80004        lg      %r1,248(%r15)
>> [  211.937959]           >0000001ca52f06d6: e37010000020        cg      %r7,0(%r1)
>> [  211.937959]            0000001ca52f06dc: a78400ea            brc     8,0000001ca52f08b0
>> [  211.937959]            0000001ca52f06e0: e310f0f00004        lg      %r1,240(%r15)
>> [  211.937959]            0000001ca52f06e6: ec180008007c        cgij    %r1,0,8,0000001ca52f06f6
>> [  211.937959]            0000001ca52f06ec: e39010080020        cg      %r9,8(%r1)
>> [  211.937973] Call Trace:
>> [  211.937975]  [<0000001ca52f06d6>] mmap_region+0x19e/0x848
>> [  211.937978] ([<0000001ca52f06b6>] mmap_region+0x17e/0x848)
>> [  211.937981]  [<0000001ca52f116a>] do_mmap+0x3ea/0x4c8
>> [  211.937983]  [<0000001ca52bed12>] vm_mmap_pgoff+0xda/0x178
>> [  211.937987]  [<0000001ca52ed5ea>] ksys_mmap_pgoff+0x62/0x238
>> [  211.937989]  [<0000001ca52ed992>] __s390x_sys_old_mmap+0x7a/0xa0
>> [  211.937993]  [<0000001ca5c4ef5c>] __do_syscall+0x1d4/0x200
>> [  211.937999]  [<0000001ca5c5d572>] system_call+0x82/0xb0
>> [  211.938002] Last Breaking-Event-Address:
>> [  211.938003]  [<0000001ca5888616>] mas_prev+0xb6/0xc0
>> [  211.938010] Oops: 0038 ilc:3 [#2]
>> [  211.938011] Kernel panic - not syncing: Fatal exception: panic_on_oops
>> [  211.938012] SMP
>> [  211.938014] Modules linked in:
>> 07: HCPGIR450W CP entered; disabled wait PSW 00020001 80000000 0000001C A50679A6
>>
>> Is that issue supposed to be fixed? git bisect pointed me to
>>
>> # bad: [76535d42eb53485775a8c54ea85725812b75543f] Merge branch
>>   'mm-everything' of
>>   git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
>>
>> which isn't really helpful.
>>
>> Anything we could help with debugging this?
>
> I tested the maple tree on top of the s390 as it was the same crash and
> it was okay.  I haven't tested the mm-everything branch though.  Can you
> test mm-unstable?

Yes, I tested mm-unstable, but I wasn't able to reproduce the issue.

> I'll continue setting up a sparc VM for testing here and test
> mm-everything on that and the s390

One thing that differs from x86 is that both sparc and s390 are
big-endian. I'm not sure whether, or where, that would make a difference.

The code to trigger the crash on s390 is rather simple: just force a
paging level upgrade to 5 levels by calling mmap() with an address that
doesn't fit in 3 levels. I haven't tested whether an upgrade to 4 levels
would be sufficient. I've condensed our test case that triggers this;
basically all that is required is:

--------------------------------8<---------------------------------------
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <stdio.h>

#define PAGE_SIZE       0x1000
#define _REGION1_SIZE   (1UL << 54)

int main(int argc, char *argv[])
{
        int pid, status;
        void *addr;

        pid = fork();
        if (pid == 0) {
                /*
                 * Trigger page table level upgrade
                 */
                addr = mmap((void *)_REGION1_SIZE, PAGE_SIZE, PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
                if (addr == MAP_FAILED)
                        return 1;
                *(int *)addr = 1;
                return 0;
        }
        wait(&status);
        return 0;
}
--------------------------------8<---------------------------------------

I've added a few debug statements to the maple tree code:

[   27.769641] mas_next_entry: offset=14
[   27.769642] mas_next_nentry: entry = 0e00000000000000, slots=0000000090249f80, mas->offset=15 count=14

I see that mas_next_nentry() has a while loop that iterates over the
(used?) slots until count is reached. After that loop, mas_next_entry()
just picks the next (unused?) entry, which is slot 15 in this case.

What I noticed while scanning include/linux/maple_tree.h is:

struct maple_range_64 {
	struct maple_pnode *parent;
	unsigned long pivot[MAPLE_RANGE64_SLOTS - 1];
	union {
		void __rcu *slot[MAPLE_RANGE64_SLOTS];
		struct {
			void __rcu *pad[MAPLE_RANGE64_SLOTS - 1];
			struct maple_metadata meta;
		};
	};
};

and struct maple_metadata is:

struct maple_metadata {
	unsigned char end;
	unsigned char gap;
};

If I swap the gap and end members, 0x0e00000000000000 becomes
0x000e000000000000. And 0xe matches our mas->offset of 14 above.
So it looks like mas_next() in mmap_region() returns the metadata
for the node.

So from the lines above you've likely already guessed that I have no
clue how the maple tree works, and I didn't have enough time today to
read through all the magic and understand it. But I thought I'd just
drop my observation here in case someone has an idea.

Thanks,
Sven

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] mapletree-vs-khugepaged
  2022-05-15 20:02       ` Sven Schnelle
@ 2022-05-16 14:02         ` Liam Howlett
  2022-05-16 15:37           ` Sven Schnelle
  0 siblings, 1 reply; 32+ messages in thread
From: Liam Howlett @ 2022-05-16 14:02 UTC (permalink / raw)
  To: Sven Schnelle
  Cc: Heiko Carstens, Guenter Roeck, Andrew Morton, linux-mm, linux-kernel

* Sven Schnelle <svens@linux.ibm.com> [220515 16:02]:
> Liam Howlett <liam.howlett@oracle.com> writes:
> 
> > * Sven Schnelle <svens@linux.ibm.com> [220513 10:46]:
> >> Starting today we're still seeing the same crash with linux-next from
> >> (next-20220513):
> >>
> >> [  211.937897] CPU: 7 PID: 535 Comm: pt_upgrade Not tainted 5.18.0-rc6-11648-g76535d42eb53-dirty #732
> >> [  211.937902] Unable to handle kernel pointer dereference in virtual kernel address space
> >> [  211.937903] Hardware name: IBM 3906 M04 704 (z/VM 7.1.0)
> >> [  211.937906] Failing address: 0e00000000000000 TEID: 0e00000000000803
> >> [  211.937909] Krnl PSW : 0704c00180000000 0000001ca52f06d6
> >> [  211.937910] Fault in home space mode while using kernel ASCE.
> >> [  211.937917] AS:0000001ca6e24007 R3:0000001fffff0007 S:0000001ffffef800 P:000000000000003d
> >> [  211.937914]  (mmap_region+0x19e/0x848)
> >> [  211.937929]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> >> [  211.937939] Krnl GPRS: 0000000000000000 0e00000000000000 0000000000000000 0000000000000000
> >> [  211.937942]            ffffffff00000f0f ffffffffffffffff 0e00000000000000 0000040000001000
> >> [  211.937945]            0000000083551900 0000040000000000 00000000000000fb 000003800070fc58
> >> [  211.937947]            000000008f490000 0000000000000000 0000001ca52f06b6 000003800070fb48
> >> [  211.937959] Krnl Code: 0000001ca52f06c6: a7740021            brc     7,0000001ca52f0708
> >> [  211.937959]            0000001ca52f06ca: ec6801b3007c        cgij    %r6,0,8,0000001ca52f0a30
> >> [  211.937959]           #0000001ca52f06d0: e310f0f80004        lg      %r1,248(%r15)
> >> [  211.937959]           >0000001ca52f06d6: e37010000020        cg      %r7,0(%r1)
> >> [  211.937959]            0000001ca52f06dc: a78400ea            brc     8,0000001ca52f08b0
> >> [  211.937959]            0000001ca52f06e0: e310f0f00004        lg      %r1,240(%r15)
> >> [  211.937959]            0000001ca52f06e6: ec180008007c        cgij    %r1,0,8,0000001ca52f06f6
> >> [  211.937959]            0000001ca52f06ec: e39010080020        cg      %r9,8(%r1)
> >> [  211.937973] Call Trace:
> >> [  211.937975]  [<0000001ca52f06d6>] mmap_region+0x19e/0x848
> >> [  211.937978] ([<0000001ca52f06b6>] mmap_region+0x17e/0x848)
> >> [  211.937981]  [<0000001ca52f116a>] do_mmap+0x3ea/0x4c8
> >> [  211.937983]  [<0000001ca52bed12>] vm_mmap_pgoff+0xda/0x178
> >> [  211.937987]  [<0000001ca52ed5ea>] ksys_mmap_pgoff+0x62/0x238
> >> [  211.937989]  [<0000001ca52ed992>] __s390x_sys_old_mmap+0x7a/0xa0
> >> [  211.937993]  [<0000001ca5c4ef5c>] __do_syscall+0x1d4/0x200
> >> [  211.937999]  [<0000001ca5c5d572>] system_call+0x82/0xb0
> >> [  211.938002] Last Breaking-Event-Address:
> >> [  211.938003]  [<0000001ca5888616>] mas_prev+0xb6/0xc0
> >> [  211.938010] Oops: 0038 ilc:3 [#2]
> >> [  211.938011] Kernel panic - not syncing: Fatal exception: panic_on_oops
> >> [  211.938012] SMP
> >> [  211.938014] Modules linked in:
> >> 07: HCPGIR450W CP entered; disabled wait PSW 00020001 80000000 0000001C A50679A6
> >>
> >> Is that issue supposed to be fixed? git bisect pointed me to
> >>
> >> # bad: [76535d42eb53485775a8c54ea85725812b75543f] Merge branch
> >>   'mm-everything' of
> >>   git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> >>
> >> which isn't really helpful.
> >>
> >> Anything we could help with debugging this?
> >
> > I tested the maple tree on top of the s390 as it was the same crash and
> > it was okay.  I haven't tested the mm-everything branch though.  Can you
> > test mm-unstable?
> 
> Yes, I tested mm-unstable, but I wasn't able to reproduce the issue.
> 
> > I'll continue setting up a sparc VM for testing here and test
> > mm-everything on that and the s390
> 
> One thing that differs from x86 is that both sparc and s390 are
> big-endian. I'm not sure whether, or where, that would make a difference.
>
> The code to trigger the crash on s390 is rather simple: just force a
> paging level upgrade to 5 levels by calling mmap() with an address that
> doesn't fit in 3 levels. I haven't tested whether an upgrade to 4 levels
> would be sufficient. I've condensed our test case that triggers this;
> basically all that is required is:
> 
> --------------------------------8<---------------------------------------
> #include <stdlib.h>
> #include <unistd.h>
> #include <sys/mman.h>
> #include <sys/wait.h>
> #include <stdio.h>
> 
> #define PAGE_SIZE       0x1000
> #define _REGION1_SIZE   (1UL << 54)
> 
> int main(int argc, char *argv[])
> {
>         int pid, status;
>         void *addr;
> 
>         pid = fork();
>         if (pid == 0) {
>                 /*
>                  * Trigger page table level upgrade
>                  */
>                 addr = mmap((void *)_REGION1_SIZE, PAGE_SIZE, PROT_READ | PROT_WRITE,
>                             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
>                 if (addr == MAP_FAILED)
>                         return 1;
>                 *(int *)addr = 1;
>                 return 0;
>         }
>         wait(&status);
>         return 0;
> }
> --------------------------------8<---------------------------------------
> 

I tried the above on my qemu s390 with kernel 5.18.0-rc6-next-20220513,
but it runs without issue, return code is 0.  Is there something the VM
needs to have for this to trigger?

> I've added a few debug statements to the maple tree code:
> 
> [   27.769641] mas_next_entry: offset=14
> [   27.769642] mas_next_nentry: entry = 0e00000000000000, slots=0000000090249f80, mas->offset=15 count=14

Where exactly are you printing this?

> 
> I see that mas_next_nentry() has a while loop that iterates over the
> (used?) slots until count is reached.

Yes, mas_next_nentry() looks for the next non-null entry in the current
node.

> After that loop, mas_next_entry()
> just picks the next (unused?) entry, which is slot 15 in this case.

mas_next_entry() returns the next non-null entry.  If there isn't one
returned by mas_next_nentry(), then it will advance to the next node by
calling mas_next_node().  There are checks in there for detecting dead
nodes for RCU use and limit checking as well.

> 
> What I noticed while scanning include/linux/maple_tree.h is:
> 
> struct maple_range_64 {
> 	struct maple_pnode *parent;
> 	unsigned long pivot[MAPLE_RANGE64_SLOTS - 1];
> 	union {
> 		void __rcu *slot[MAPLE_RANGE64_SLOTS];
> 		struct {
> 			void __rcu *pad[MAPLE_RANGE64_SLOTS - 1];
> 			struct maple_metadata meta;
> 		};
> 	};
> };
> 
> and struct maple_metadata is:
> 
> struct maple_metadata {
> 	unsigned char end;
> 	unsigned char gap;
> };
> 
> If I swap the gap and end members, 0x0e00000000000000 becomes
> 0x000e000000000000. And 0xe matches our mas->offset of 14 above.
> So it looks like mas_next() in mmap_region() returns the metadata
> for the node.

If this is the case, then I think any task that has more than 14 VMAs
would have issues.  I also use mas_next_entry() in mas_find() which is
used for the mas_for_each() macro/iterator.  Can you please enable
CONFIG_DEBUG_VM_MAPLE_TREE?  mmap.c tests the tree after pretty much
any change and will dump useful information if there is an issue -
including the entire tree. See validate_mm_mt() for details.

You can find CONFIG_DEBUG_VM_MAPLE_TREE in the config:
kernel hacking -> Memory debugging -> Debug VM -> Debug VM maple trees

> 
> So from the lines above you've likely already guessed that I have no
> clue how the maple tree works, and I didn't have enough time today to
> read through all the magic and understand it. But I thought I'd just
> drop my observation here in case someone has an idea.

Thanks for sharing.  I'm having a hard time recreating the issue so I
cannot fully dig in myself.



I was able to boot sparc64 with mm-unstable.  I did get an error:
[    5.002625] Kernel unaligned access at TPC[59bae8] mmap_region+0x168/0xb00

faddr2line is less than useful, though; it reports the line as "at ??:?".

I'll keep digging into that.

Thanks,
Liam

* Re: [PATCH] mapletree-vs-khugepaged
  2022-05-16 14:02         ` Liam Howlett
@ 2022-05-16 15:37           ` Sven Schnelle
  2022-05-16 15:50             ` Liam Howlett
  0 siblings, 1 reply; 32+ messages in thread
From: Sven Schnelle @ 2022-05-16 15:37 UTC (permalink / raw)
  To: Liam Howlett
  Cc: Heiko Carstens, Guenter Roeck, Andrew Morton, linux-mm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2874 bytes --]

Hi Liam,

Liam Howlett <liam.howlett@oracle.com> writes:

> * Sven Schnelle <svens@linux.ibm.com> [220515 16:02]:
>
> I tried the above on my qemu s390 with kernel 5.18.0-rc6-next-20220513,
> but it runs without issue, return code is 0.  Is there something the VM
> needs to have for this to trigger?

A coworker said the same. The reason seems to be that I ran the code
in a unit test environment, which apparently makes a difference. When I
compile the code above with gcc on my system, it doesn't crash either.
So I have to figure out what makes this unit test binary special.

>> I've added a few debug statements to the maple tree code:
>> 
>> [   27.769641] mas_next_entry: offset=14
>> [   27.769642] mas_next_nentry: entry = 0e00000000000000, slots=0000000090249f80, mas->offset=15 count=14
>
> Where exactly are you printing this?

I added a lot of debug statements to the code while trying to
understand it. I'll attach them to this mail.

>> 
>> I see that mas_next_nentry() has a while loop that iterates over the
>> (used?) slots until count is reached.
>
> Yes, mas_next_nentry() looks for the next non-null entry in the current
> node.
>
>> After that loop, mas_next_entry()
>> just picks the next (unused?) entry, which is slot 15 in this case.
>
> mas_next_entry() returns the next non-null entry.  If there isn't one
> returned by mas_next_nentry(), then it will advance to the next node by
> calling mas_next_node().  There are checks in there for detecting dead
> nodes for RCU use and limit checking as well.
>
>> 
>> What I noticed while scanning include/linux/maple_tree.h is:
>> 
>> struct maple_range_64 {
>> 	struct maple_pnode *parent;
>> 	unsigned long pivot[MAPLE_RANGE64_SLOTS - 1];
>> 	union {
>> 		void __rcu *slot[MAPLE_RANGE64_SLOTS];
>> 		struct {
>> 			void __rcu *pad[MAPLE_RANGE64_SLOTS - 1];
>> 			struct maple_metadata meta;
>> 		};
>> 	};
>> };
>> 
>> and struct maple_metadata is:
>> 
>> struct maple_metadata {
>> 	unsigned char end;
>> 	unsigned char gap;
>> };
>> 
>> If I swap the gap and end members, 0x0e00000000000000 becomes
>> 0x000e000000000000. And 0xe matches our mas->offset of 14 above.
>> So it looks like mas_next() in mmap_region() returns the metadata
>> for the node.
>
> If this is the case, then I think any task that has more than 14 VMAs
> would have issues.  I also use mas_next_entry() in mas_find() which is
> used for the mas_for_each() macro/iterator.  Can you please enable
> CONFIG_DEBUG_VM_MAPLE_TREE?  mmap.c tests the tree after pretty much
> any change and will dump useful information if there is an issue -
> including the entire tree. See validate_mm_mt() for details.
>
> You can find CONFIG_DEBUG_VM_MAPLE_TREE in the config:
> kernel hacking -> Memory debugging -> Debug VM -> Debug VM maple trees

I have both DEBUG_MAPLE_TREE and DEBUG_VM_MAPLE_TREE enabled, but I don't
see anything printed.

[-- Attachment #2: mapple-debug.diff --]
[-- Type: text/x-diff, Size: 172 bytes --]

 lib/maple_tree.c | 37 +++++++++++++++++++++++++++++++++++--
 mm/mmap.c        | 36 +++++++++++++++++++++++++++++-------
 2 files changed, 64 insertions(+), 9 deletions(-)

* Re: [PATCH] mapletree-vs-khugepaged
  2022-05-16 15:37           ` Sven Schnelle
@ 2022-05-16 15:50             ` Liam Howlett
  2022-05-16 17:10               ` Sven Schnelle
  0 siblings, 1 reply; 32+ messages in thread
From: Liam Howlett @ 2022-05-16 15:50 UTC (permalink / raw)
  To: Sven Schnelle
  Cc: Heiko Carstens, Guenter Roeck, Andrew Morton, linux-mm, linux-kernel

* Sven Schnelle <svens@linux.ibm.com> [220516 11:37]:
> Hi Liam,
> 
> Liam Howlett <liam.howlett@oracle.com> writes:
> 
> > * Sven Schnelle <svens@linux.ibm.com> [220515 16:02]:
> >
> > I tried the above on my qemu s390 with kernel 5.18.0-rc6-next-20220513,
> > but it runs without issue, return code is 0.  Is there something the VM
> > needs to have for this to trigger?
> 
> A coworker said the same. The reason seems to be that I ran the code
> in a unit test environment, which apparently makes a difference. When I
> compile the code above with gcc on my system, it doesn't crash either.
> So I have to figure out what makes this unit test binary special.
> 
> >> I've added a few debug statements to the maple tree code:
> >> 
> >> [   27.769641] mas_next_entry: offset=14
> >> [   27.769642] mas_next_nentry: entry = 0e00000000000000, slots=0000000090249f80, mas->offset=15 count=14
> >
> > Where exactly are you printing this?
> 
> I added a lot of debug statements to the code while trying to
> understand it. I'll attach them to this mail.

Thanks.  Can you check to see if that diff you sent was the correct
file?  It appears to be the git stats and not the changes themselves.

> 
> >> 
> >> I see that mas_next_nentry() has a while loop that iterates over the
> >> (used?) slots until count is reached.
> >
> > Yes, mas_next_nentry() looks for the next non-null entry in the current
> > node.
> >
> >> After that loop, mas_next_entry()
> >> just picks the next (unused?) entry, which is slot 15 in this case.
> >
> > mas_next_entry() returns the next non-null entry.  If there isn't one
> > returned by mas_next_nentry(), then it will advance to the next node by
> > calling mas_next_node().  There are checks in there for detecting dead
> > nodes for RCU use and limit checking as well.
> >
> >> 
> >> What I noticed while scanning include/linux/maple_tree.h is:
> >> 
> >> struct maple_range_64 {
> >> 	struct maple_pnode *parent;
> >> 	unsigned long pivot[MAPLE_RANGE64_SLOTS - 1];
> >> 	union {
> >> 		void __rcu *slot[MAPLE_RANGE64_SLOTS];
> >> 		struct {
> >> 			void __rcu *pad[MAPLE_RANGE64_SLOTS - 1];
> >> 			struct maple_metadata meta;
> >> 		};
> >> 	};
> >> };
> >> 
> >> and struct maple_metadata is:
> >> 
> >> struct maple_metadata {
> >> 	unsigned char end;
> >> 	unsigned char gap;
> >> };
> >> 
> >> If I swap the gap and end members, 0x0e00000000000000 becomes
> >> 0x000e000000000000. And 0xe matches our mas->offset of 14 above.
> >> So it looks like mas_next() in mmap_region() returns the metadata
> >> for the node.
> >
> > If this is the case, then I think any task that has more than 14 VMAs
> > would have issues.  I also use mas_next_entry() in mas_find() which is
> > used for the mas_for_each() macro/iterator.  Can you please enable
> > CONFIG_DEBUG_VM_MAPLE_TREE?  mmap.c tests the tree after pretty much
> > any change and will dump useful information if there is an issue -
> > including the entire tree. See validate_mm_mt() for details.
> >
> > You can find CONFIG_DEBUG_VM_MAPLE_TREE in the config:
> > kernel hacking -> Memory debugging -> Debug VM -> Debug VM maple trees
> 
> I have both DEBUG_MAPLE_TREE and DEBUG_VM_MAPLE_TREE enabled, but I don't
> see anything printed.

I suspect that this means the issue is actually in the mmap code and not
the tree.  It may be sensitive to the task-specific layout.  Do you have
randomization on as well?  I'm thinking maybe I return NULL because I'm
asking for the next element after the last VMA and not checking that
correctly or similar.

Does ./scripts/faddr2line work for you?  What does it say about
mmap_region+0x19e/0x848 ?

Thanks,
Liam

* Re: [PATCH] mapletree-vs-khugepaged
  2022-05-16 15:50             ` Liam Howlett
@ 2022-05-16 17:10               ` Sven Schnelle
  2022-05-17 14:52                 ` Liam Howlett
  0 siblings, 1 reply; 32+ messages in thread
From: Sven Schnelle @ 2022-05-16 17:10 UTC (permalink / raw)
  To: Liam Howlett
  Cc: Heiko Carstens, Guenter Roeck, Andrew Morton, linux-mm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3143 bytes --]

Liam Howlett <liam.howlett@oracle.com> writes:

> * Sven Schnelle <svens@linux.ibm.com> [220516 11:37]:
>> Hi Liam,
>> 
>> Liam Howlett <liam.howlett@oracle.com> writes:
>> 
>> > * Sven Schnelle <svens@linux.ibm.com> [220515 16:02]:
>> >
>> > I tried the above on my qemu s390 with kernel 5.18.0-rc6-next-20220513,
>> > but it runs without issue, return code is 0.  Is there something the VM
>> > needs to have for this to trigger?
>> 
>> A coworker said the same. The reason seems to be that I ran the code
>> in a unit test environment, which apparently makes a difference. When I
>> compile the code above with gcc on my system, it doesn't crash either.
>> So I have to figure out what makes this unit test binary special.
>> 
>> >> I've added a few debug statements to the maple tree code:
>> >> 
>> >> [   27.769641] mas_next_entry: offset=14
>> >> [   27.769642] mas_next_nentry: entry = 0e00000000000000, slots=0000000090249f80, mas->offset=15 count=14
>> >
>> > Where exactly are you printing this?
>> 
>> I added a lot of debug statements to the code while trying to
>> understand it. I'll attach them to this mail.
>
> Thanks.  Can you check to see if that diff you sent was the correct
> file?  It appears to be the git stats and not the changes themselves.

No, it wasn't. I ran git stash show without -p, so it only produced the
diffstat. I've attached the correct diff.

>> > If this is the case, then I think any task that has more than 14 VMAs
>> > would have issues.  I also use mas_next_entry() in mas_find() which is
>> > used for the mas_for_each() macro/iterator.  Can you please enable
>> > CONFIG_DEBUG_VM_MAPLE_TREE?  mmap.c tests the tree after pretty much
>> > any change and will dump useful information if there is an issue -
>> > including the entire tree. See validate_mm_mt() for details.
>> >
>> > You can find CONFIG_DEBUG_VM_MAPLE_TREE in the config:
>> > kernel hacking -> Memory debugging -> Debug VM -> Debug VM maple trees
>> 
>> I have both DEBUG_MAPLE_TREE and DEBUG_VM_MAPLE_TREE enabled, but I don't
>> see anything printed.
>
> I suspect that this means the issue is actually in the mmap code and not
> the tree.  It may be sensitive to the task-specific layout.  Do you have
> randomization on as well?  I'm thinking maybe I return NULL because I'm
> asking for the next element after the last VMA and not checking that
> correctly or similar.


> Does ./scripts/faddr2line work for you?  What does it say about
> mmap_region+0x19e/0x848 ?

2629        next = mas_next(&mas, ULONG_MAX);
2630        prev = mas_prev(&mas, 0);
2631        if (vm_flags & VM_SPECIAL)
2632               goto cannot_expand;
2633
2634        /* Attempt to expand an old mapping */
2635        /* Check next */
2636        if (next && next->vm_start == end && !vma_policy(next) &&

next above is 0x0e00000000000000, so the next->vm_start dereference triggers the crash.

2637            can_vma_merge_before(next, vm_flags, NULL, file, pgoff+pglen,
2638                              NULL_VM_UFFD_CTX, NULL)) {
2639               merge_end = next->vm_end;
2640               vma = next;
2641               vm_pgoff = next->vm_pgoff - pglen;
2642        }

[-- Attachment #2: mapple-debug.diff --]
[-- Type: text/x-diff, Size: 6469 bytes --]

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 967631055210..a621d17e872d 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -140,6 +140,7 @@ struct maple_subtree_state {
 	struct maple_big_node *bn;
 };
 
+extern int mas_debug;
 /* Functions */
 static inline struct maple_node *mt_alloc_one(gfp_t gfp)
 {
@@ -4550,6 +4551,9 @@ static inline void *mas_next_nentry(struct ma_state *mas,
 	while (mas->offset < count) {
 		pivot = pivots[mas->offset];
 		entry = mas_slot(mas, slots, mas->offset);
+		if (mas_debug)
+			pr_err("%s: entry = %px, slots=%px, mas->offset=%d\n",
+			       __func__, entry, slots, mas->offset);
 		if (ma_dead_node(node))
 			return NULL;
 
@@ -4570,6 +4574,10 @@ static inline void *mas_next_nentry(struct ma_state *mas,
 
 	pivot = mas_safe_pivot(mas, pivots, mas->offset, type);
 	entry = mas_slot(mas, slots, mas->offset);
+	if (mas_debug)
+		pr_err("%s: entry = %px, slots=%px, mas->offset=%d count=%d\n",
+		       __func__, entry, slots, mas->offset, count);
+
 	if (ma_dead_node(node))
 		return NULL;
 
@@ -4580,6 +4588,8 @@ static inline void *mas_next_nentry(struct ma_state *mas,
 		return NULL;
 
 found:
+	if (mas_debug)
+		pr_err("found pivot = %lx, entry = %px\n", pivot, entry);
 	mas->last = pivot;
 	return entry;
 }
@@ -4618,6 +4628,9 @@ static inline void *mas_next_entry(struct ma_state *mas, unsigned long limit)
 	unsigned long last;
 	enum maple_type mt;
 
+	if (mas_debug)
+		pr_err("%s: entry\n", __func__);
+
 	last = mas->last;
 retry:
 	offset = mas->offset;
@@ -4625,10 +4638,17 @@ static inline void *mas_next_entry(struct ma_state *mas, unsigned long limit)
 	node = mas_mn(mas);
 	mt = mte_node_type(mas->node);
 	mas->offset++;
-	if (unlikely(mas->offset >= mt_slots[mt]))
+	if (mas_debug)
+		pr_err("%s: offset=%d\n", __func__, offset);
+	if (unlikely(mas->offset >= mt_slots[mt])) {
+		if (mas_debug)
+			pr_err("%s: next node\n", __func__);
 		goto next_node;
+	}
 
 	while (!mas_is_none(mas)) {
+		if (mas_debug)
+			pr_err("%s: !none\n", __func__);
 		entry = mas_next_nentry(mas, node, limit, mt);
 		if (unlikely(ma_dead_node(node))) {
 			mas_rewalk(mas, last);
@@ -4656,6 +4676,8 @@ static inline void *mas_next_entry(struct ma_state *mas, unsigned long limit)
 	mas->index = mas->last = limit;
 	mas->offset = offset;
 	mas->node = prev_node;
+	if (mas_debug)
+		pr_err("%s: return NULL, mas->node = %px\n", __func__, prev_node);
 	return NULL;
 }
 
@@ -4914,6 +4936,8 @@ static inline bool mas_anode_descend(struct ma_state *mas, unsigned long size)
 void *mas_walk(struct ma_state *mas)
 {
 	void *entry;
+	if (mas_debug)
+		pr_err("%s\n", __func__);
 
 retry:
 	entry = mas_state_walk(mas);
@@ -5838,7 +5862,12 @@ EXPORT_SYMBOL_GPL(mas_pause);
  */
 void *mas_find(struct ma_state *mas, unsigned long max)
 {
+	if (mas_debug)
+		pr_err("%s: max=%lx\n", __func__, max);
+
 	if (unlikely(mas_is_paused(mas))) {
+		if (mas_debug)
+			pr_err("%s: paused\n", __func__);
 		if (unlikely(mas->last == ULONG_MAX)) {
 			mas->node = MAS_NONE;
 			return NULL;
@@ -5848,6 +5877,8 @@ void *mas_find(struct ma_state *mas, unsigned long max)
 	}
 
 	if (unlikely(mas_is_start(mas))) {
+		if (mas_debug)
+			pr_err("%s: start\n", __func__);
 		/* First run or continue */
 		void *entry;
 
@@ -5859,8 +5890,10 @@ void *mas_find(struct ma_state *mas, unsigned long max)
 			return entry;
 	}
 
-	if (unlikely(!mas_searchable(mas)))
+	if (unlikely(!mas_searchable(mas))) {
+		pr_err("%s: not searchable\n", __func__);
 		return NULL;
+	}
 
 	/* Retries on dead nodes handled by mas_next_entry */
 	return mas_next_entry(mas, max);
diff --git a/mm/mmap.c b/mm/mmap.c
index 2049500931ae..16ee834c0a4c 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -57,6 +57,7 @@
 
 #include "internal.h"
 
+int mas_debug;
 #ifndef arch_mmap_check
 #define arch_mmap_check(addr, len, flags)	(0)
 #endif
@@ -2375,6 +2376,10 @@ do_mas_align_munmap(struct ma_state *mas, struct vm_area_struct *vma,
 	int count = 0;
 	int error = -ENOMEM;
 	MA_STATE(mas_detach, &mt_detach, start, end - 1);
+
+	if (mas_debug)
+		pr_err("%s:%d\n", __func__, __LINE__);
+
 	mt_init_flags(&mt_detach, MM_MT_FLAGS);
 	mt_set_external_lock(&mt_detach, &mm->mmap_lock);
 
@@ -2541,28 +2546,37 @@ do_mas_align_munmap(struct ma_state *mas, struct vm_area_struct *vma,
  *
  * Returns: -EINVAL on failure, 1 on success and unlock, 0 otherwise.
  */
+
+
+
 int do_mas_munmap(struct ma_state *mas, struct mm_struct *mm,
 		  unsigned long start, size_t len, struct list_head *uf,
 		  bool downgrade)
 {
 	unsigned long end;
 	struct vm_area_struct *vma;
+	if (mas_debug)
+		pr_err("%s: %lx %lx\n", __func__, start, len);
 
 	if ((offset_in_page(start)) || start > TASK_SIZE || len > TASK_SIZE-start)
 		return -EINVAL;
-
+	if (mas_debug)
+		pr_err("%s:%d\n", __func__, __LINE__);
 	end = start + PAGE_ALIGN(len);
 	if (end == start)
 		return -EINVAL;
-
+	if (mas_debug)
+		pr_err("%s:%d\n", __func__, __LINE__);
 	 /* arch_unmap() might do unmaps itself.  */
 	arch_unmap(mm, start, end);
-
+	if (mas_debug)
+		pr_err("%s:%d\n", __func__, __LINE__);
 	/* Find the first overlapping VMA */
 	vma = mas_find(mas, end - 1);
 	if (!vma)
 		return 0;
-
+	if (mas_debug)
+		pr_err("vma=%px\n", vma);
 	return do_mas_align_munmap(mas, vma, mm, start, end, uf, downgrade);
 }
 
@@ -2594,6 +2608,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 	pgoff_t vm_pgoff;
 	int error;
 	MA_STATE(mas, &mm->mm_mt, addr, end - 1);
+	mas_debug = (addr == (1UL << 54));
 
 	/* Check against address space limit. */
 	if (!may_expand_vm(mm, vm_flags, len >> PAGE_SHIFT)) {
@@ -2609,23 +2624,30 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 					(len >> PAGE_SHIFT) - nr_pages))
 			return -ENOMEM;
 	}
-
 	/* Unmap any existing mapping in the area */
-	if (do_mas_munmap(&mas, mm, addr, len, uf, false))
+
+	if (do_mas_munmap(&mas, mm, addr, len, uf, false)) {
+		mas_debug = 0;
 		return -ENOMEM;
+	}
 
 	/*
 	 * Private writable mapping: check memory availability
 	 */
 	if (accountable_mapping(file, vm_flags)) {
 		charged = len >> PAGE_SHIFT;
-		if (security_vm_enough_memory_mm(mm, charged))
+		if (security_vm_enough_memory_mm(mm, charged)) {
+			mas_debug = 0;
 			return -ENOMEM;
+		}
 		vm_flags |= VM_ACCOUNT;
 	}
 
 	next = mas_next(&mas, ULONG_MAX);
 	prev = mas_prev(&mas, 0);
+	if (mas_debug)
+		pr_err("%s: next %px\n", __func__, next);
+	mas_debug = 0;
 	if (vm_flags & VM_SPECIAL)
 		goto cannot_expand;
 

* Re: [PATCH] mapletree-vs-khugepaged
  2022-05-13 17:00     ` Liam Howlett
  2022-05-15 20:02       ` Sven Schnelle
@ 2022-05-17 11:53       ` Heiko Carstens
  2022-05-17 12:26         ` Heiko Carstens
                           ` (2 more replies)
  1 sibling, 3 replies; 32+ messages in thread
From: Heiko Carstens @ 2022-05-17 11:53 UTC (permalink / raw)
  To: Liam Howlett
  Cc: Sven Schnelle, Guenter Roeck, Andrew Morton, linux-mm, linux-kernel

On Fri, May 13, 2022 at 05:00:31PM +0000, Liam Howlett wrote:
> * Sven Schnelle <svens@linux.ibm.com> [220513 10:46]:
> > Heiko Carstens <hca@linux.ibm.com> writes:
> > > FWIW, same on s390 - linux-next is completely broken. Note: I didn't
> > > bisect, but given that the call trace, and even the failing address
> > > match, I'm quite confident it is the same reason.
> > Is that issue supposed to be fixed? git bisect pointed me to
> > 
> > # bad: [76535d42eb53485775a8c54ea85725812b75543f] Merge branch
> >   'mm-everything' of
> >   git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> > 
> > which isn't really helpful.
> > 
> > Anything we could help with debugging this?
> 
> I tested the maple tree on top of the s390 as it was the same crash and
> it was okay.  I haven't tested the mm-everything branch though.  Can you
> test mm-unstable?
> 
> I'll continue setting up a sparc VM for testing here and test
> mm-everything on that and the s390

So due to reports here I did some sort of "special bisect": with today's
linux-next I did a hard reset to commit 562340595cbb ("Merge branch
'for-next/kspp' of
git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux.git"),
started a bisect on Andrew's tree between mm-stable and mm-unstable, and
merged whatever commit was about to be bisected into 562340595cbb.

This finally led to commit f1297d3a2cb7 ("mm/mmap: reorganize munmap to
use maple states") as "first bad commit".

So given that we are shortly before the merge window and linux-next is
completely broken for s390, how do we proceed? Right now I have no idea if
there is anything else in linux-next that would break s390 because of this.

I'm sure you won't like to hear this, but I'd appreciate it if
this code could be removed from linux-next again.

* Re: [PATCH] mapletree-vs-khugepaged
  2022-05-17 11:53       ` Heiko Carstens
@ 2022-05-17 12:26         ` Heiko Carstens
  2022-05-17 13:23         ` Guenter Roeck
  2022-05-17 14:32         ` Guenter Roeck
  2 siblings, 0 replies; 32+ messages in thread
From: Heiko Carstens @ 2022-05-17 12:26 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Liam Howlett, Sven Schnelle, Guenter Roeck, Andrew Morton,
	linux-mm, linux-kernel

On Tue, May 17, 2022 at 01:53:22PM +0200, Heiko Carstens wrote:
> On Fri, May 13, 2022 at 05:00:31PM +0000, Liam Howlett wrote:
> > * Sven Schnelle <svens@linux.ibm.com> [220513 10:46]:
> > > Heiko Carstens <hca@linux.ibm.com> writes:
> > > > FWIW, same on s390 - linux-next is completely broken. Note: I didn't
> > > > bisect, but given that the call trace, and even the failing address
> > > > match, I'm quite confident it is the same reason.
> > > Is that issue supposed to be fixed? git bisect pointed me to
> > > 
> > > # bad: [76535d42eb53485775a8c54ea85725812b75543f] Merge branch
> > >   'mm-everything' of
> > >   git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> > > 
> > > which isn't really helpful.
> > > 
> > > Anything we could help with debugging this?
> > 
> > I tested the maple tree on top of the s390 as it was the same crash and
> > it was okay.  I haven't tested the mm-everything branch though.  Can you
> > test mm-unstable?
> > 
> > I'll continue setting up a sparc VM for testing here and test
> > mm-everything on that and the s390
> 
> So due to reports here I did some sort of "special bisect": with today's
> linux-next I did a hard reset to commit 562340595cbb ("Merge branch
> 'for-next/kspp' of
> git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux.git"),

Forgot to mention: this is the last merge commit before the akpm
trees are merged; those trees come last in linux-next.

> started a bisect on Andrew's tree between mm-stable and mm-unstable, and
> merged whatever commit was about to be bisected into 562340595cbb.
> 
> This finally led to commit f1297d3a2cb7 ("mm/mmap: reorganize munmap to
> use maple states") as "first bad commit".
> 
> So given that we are shortly before the merge window and linux-next is
> completely broken for s390, how do we proceed? Right now I have no idea if
> there is anything else in linux-next that would break s390 because of this.
> 
> I'm sure you won't like to hear this, but I'd appreciate it if
> this code could be removed from linux-next again.

* Re: [PATCH] mapletree-vs-khugepaged
  2022-05-17 11:53       ` Heiko Carstens
  2022-05-17 12:26         ` Heiko Carstens
@ 2022-05-17 13:23         ` Guenter Roeck
  2022-05-17 15:03           ` Liam Howlett
  2022-05-17 14:32         ` Guenter Roeck
  2 siblings, 1 reply; 32+ messages in thread
From: Guenter Roeck @ 2022-05-17 13:23 UTC (permalink / raw)
  To: Heiko Carstens, Liam Howlett
  Cc: Sven Schnelle, Andrew Morton, linux-mm, linux-kernel

On 5/17/22 04:53, Heiko Carstens wrote:
> On Fri, May 13, 2022 at 05:00:31PM +0000, Liam Howlett wrote:
>> * Sven Schnelle <svens@linux.ibm.com> [220513 10:46]:
>>> Heiko Carstens <hca@linux.ibm.com> writes:
>>>> FWIW, same on s390 - linux-next is completely broken. Note: I didn't
>>>> bisect, but given that the call trace, and even the failing address
>>>> match, I'm quite confident it is the same reason.
>>> Is that issue supposed to be fixed? git bisect pointed me to
>>>
>>> # bad: [76535d42eb53485775a8c54ea85725812b75543f] Merge branch
>>>    'mm-everything' of
>>>    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
>>>
>>> which isn't really helpful.
>>>
>>> Anything we could help with debugging this?
>>
>> I tested the maple tree on top of the s390 as it was the same crash and
>> it was okay.  I haven't tested the mm-everything branch though.  Can you
>> test mm-unstable?
>>
>> I'll continue setting up a sparc VM for testing here and test
>> mm-everything on that and the s390
> 
> So due to reports here I did some sort of "special bisect": with today's
> linux-next I did a hard reset to commit 562340595cbb ("Merge branch
> 'for-next/kspp' of
> git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux.git"),
> started a bisect on Andrew's tree between mm-stable and mm-unstable, and
> merged whatever commit was about to be bisected into 562340595cbb.
> 
> This finally led to commit f1297d3a2cb7 ("mm/mmap: reorganize munmap to
> use maple states") as "first bad commit".
> 
> So given that we are shortly before the merge window and linux-next is
> completely broken for s390, how do we proceed? Right now I have no idea if
> there is anything else in linux-next that would break s390 because of this.
> 
> I'm sure you won't like to hear this, but I'd appreciate it if
> this code could be removed from linux-next again.

I finally found some time to bisect the alpha boot failures in -next.
Bisect results below.

Guenter

---
# bad: [47c1c54d1bcd0a69a56b49473bc20f17b70e5242] Add linux-next specific files for 20220517
# good: [42226c989789d8da4af1de0c31070c96726d990c] Linux 5.18-rc7
git bisect start 'HEAD' 'v5.18-rc7'
# good: [27d9fca0814b912f762dd5adeb81d9ab250705a9] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
git bisect good 27d9fca0814b912f762dd5adeb81d9ab250705a9
# good: [c57a4cecd8f02b365109fc326c7cde5ec6020a54] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
git bisect good c57a4cecd8f02b365109fc326c7cde5ec6020a54
# good: [0389ffdcdd4eacf5cfcef2a5e25b49a1662f8a8b] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git
git bisect good 0389ffdcdd4eacf5cfcef2a5e25b49a1662f8a8b
# good: [c506c5c327711f8bb522c10262ec3a932fe91cd2] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux.git
git bisect good c506c5c327711f8bb522c10262ec3a932fe91cd2
# bad: [b24964a75c5c0129fd6a0d9bcb0fb6e84fbac126] mm/mmap: drop range_has_overlap() function
git bisect bad b24964a75c5c0129fd6a0d9bcb0fb6e84fbac126
# good: [3cbab4ca1ea8752912d8719d72059869b4c18045] mm/damon/sysfs: use enum for 'state' input handling
git bisect good 3cbab4ca1ea8752912d8719d72059869b4c18045
# good: [9113eaf331bf44579882c001867773cf1b3364fd] mm/memory-failure.c: add hwpoison_filter for soft offline
git bisect good 9113eaf331bf44579882c001867773cf1b3364fd
# good: [2b2adc303cebc213072520eb2e86e4beb65e9499] mm: remove rb tree.
git bisect good 2b2adc303cebc213072520eb2e86e4beb65e9499
# bad: [bbd9f664c351fbe61afa949abaaa03aed5c84af9] fs/proc/task_mmu: stop using linked list and highest_vm_end
git bisect bad bbd9f664c351fbe61afa949abaaa03aed5c84af9
# bad: [3b150cc8ab608f7a7da76b036b0ba7513628b0be] arm64: remove mmap linked list from vdso
git bisect bad 3b150cc8ab608f7a7da76b036b0ba7513628b0be
# good: [282a3e65bd1dece042bb8cf3a1666d79af1553b0] mm: use maple tree operations for find_vma_intersection()
git bisect good 282a3e65bd1dece042bb8cf3a1666d79af1553b0
# good: [d175d2cc268623e3e68181235a4aa068b50a8213] mm: convert vma_lookup() to use mtree_load()
git bisect good d175d2cc268623e3e68181235a4aa068b50a8213
# bad: [f1297d3a2cb77261c10fbbd69d92bbca700108e0] mm/mmap: reorganize munmap to use maple states
git bisect bad f1297d3a2cb77261c10fbbd69d92bbca700108e0
# good: [a897bb88e1e61df70f350a51f51cade3930adc3c] mm/mmap: move mmap_region() below do_munmap()
git bisect good a897bb88e1e61df70f350a51f51cade3930adc3c
# first bad commit: [f1297d3a2cb77261c10fbbd69d92bbca700108e0] mm/mmap: reorganize munmap to use maple states

* Re: [PATCH] mapletree-vs-khugepaged
  2022-05-17 11:53       ` Heiko Carstens
  2022-05-17 12:26         ` Heiko Carstens
  2022-05-17 13:23         ` Guenter Roeck
@ 2022-05-17 14:32         ` Guenter Roeck
  2022-05-19 14:35           ` Liam Howlett
  2 siblings, 1 reply; 32+ messages in thread
From: Guenter Roeck @ 2022-05-17 14:32 UTC (permalink / raw)
  To: Heiko Carstens, Liam Howlett
  Cc: Sven Schnelle, Andrew Morton, linux-mm, linux-kernel

On 5/17/22 04:53, Heiko Carstens wrote:
> On Fri, May 13, 2022 at 05:00:31PM +0000, Liam Howlett wrote:
>> * Sven Schnelle <svens@linux.ibm.com> [220513 10:46]:
>>> Heiko Carstens <hca@linux.ibm.com> writes:
>>>> FWIW, same on s390 - linux-next is completely broken. Note: I didn't
>>>> bisect, but given that the call trace, and even the failing address
>>>> match, I'm quite confident it is the same reason.
>>> Is that issue supposed to be fixed? git bisect pointed me to
>>>
>>> # bad: [76535d42eb53485775a8c54ea85725812b75543f] Merge branch
>>>    'mm-everything' of
>>>    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
>>>
>>> which isn't really helpful.
>>>
>>> Anything we could help with debugging this?
>>
>> I tested the maple tree on top of the s390 as it was the same crash and
>> it was okay.  I haven't tested the mm-everything branch though.  Can you
>> test mm-unstable?
>>
>> I'll continue setting up a sparc VM for testing here and test
>> mm-everything on that and the s390
> 
> So due to reports here I did some sort of "special bisect": with today's
> linux-next I did a hard reset to commit 562340595cbb ("Merge branch
> 'for-next/kspp' of
> git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux.git"),
> started a bisect on Andrew's tree between mm-stable and mm-unstable, and
> merged whatever commit was about to be bisected into 562340595cbb.
> 
> This finally led to commit f1297d3a2cb7 ("mm/mmap: reorganize munmap to
> use maple states") as "first bad commit".
> 
> So given that we are shortly before the merge window and linux-next is
> completely broken for s390, how do we proceed? Right now I have no idea if
> there is anything else in linux-next that would break s390 because of this.
> 
> I'm sure you won't like to hear this, but I'd appreciate it if
> this code could be removed from linux-next again.

Another bisect result, boot failures with nommu targets (arm:mps2-an385,
m68k:mcf5208evb). Bisect log is the same for both.

Guenter

---
# bad: [47c1c54d1bcd0a69a56b49473bc20f17b70e5242] Add linux-next specific files for 20220517
# good: [42226c989789d8da4af1de0c31070c96726d990c] Linux 5.18-rc7
git bisect start 'HEAD' 'v5.18-rc7'
# good: [27d9fca0814b912f762dd5adeb81d9ab250705a9] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
git bisect good 27d9fca0814b912f762dd5adeb81d9ab250705a9
# good: [c57a4cecd8f02b365109fc326c7cde5ec6020a54] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
git bisect good c57a4cecd8f02b365109fc326c7cde5ec6020a54
# good: [0389ffdcdd4eacf5cfcef2a5e25b49a1662f8a8b] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git
git bisect good 0389ffdcdd4eacf5cfcef2a5e25b49a1662f8a8b
# good: [c506c5c327711f8bb522c10262ec3a932fe91cd2] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux.git
git bisect good c506c5c327711f8bb522c10262ec3a932fe91cd2
# bad: [b24964a75c5c0129fd6a0d9bcb0fb6e84fbac126] mm/mmap: drop range_has_overlap() function
git bisect bad b24964a75c5c0129fd6a0d9bcb0fb6e84fbac126
# good: [3cbab4ca1ea8752912d8719d72059869b4c18045] mm/damon/sysfs: use enum for 'state' input handling
git bisect good 3cbab4ca1ea8752912d8719d72059869b4c18045
# good: [9113eaf331bf44579882c001867773cf1b3364fd] mm/memory-failure.c: add hwpoison_filter for soft offline
git bisect good 9113eaf331bf44579882c001867773cf1b3364fd
# good: [2b2adc303cebc213072520eb2e86e4beb65e9499] mm: remove rb tree.
git bisect good 2b2adc303cebc213072520eb2e86e4beb65e9499
# good: [bbd9f664c351fbe61afa949abaaa03aed5c84af9] fs/proc/task_mmu: stop using linked list and highest_vm_end
git bisect good bbd9f664c351fbe61afa949abaaa03aed5c84af9
# good: [e09906217c535bd7c2e4eb35182543214c4b8891] mm/mempolicy: use vma iterator & maple state instead of vma linked list
git bisect good e09906217c535bd7c2e4eb35182543214c4b8891
# good: [0154775a4cf2bc79c2220207816c86142f259e95] mm/pagewalk: use vma_find() instead of vma linked list
git bisect good 0154775a4cf2bc79c2220207816c86142f259e95
# bad: [bd773a78705fb58eeadd80e5b31739df4c83c559] nommu: remove uses of VMA linked list
git bisect bad bd773a78705fb58eeadd80e5b31739df4c83c559
# good: [387141902a1de92a387816ae819d6111e582c6eb] i915: use the VMA iterator
git bisect good 387141902a1de92a387816ae819d6111e582c6eb
# first bad commit: [bd773a78705fb58eeadd80e5b31739df4c83c559] nommu: remove uses of VMA linked list

* Re: [PATCH] mapletree-vs-khugepaged
  2022-05-16 17:10               ` Sven Schnelle
@ 2022-05-17 14:52                 ` Liam Howlett
  0 siblings, 0 replies; 32+ messages in thread
From: Liam Howlett @ 2022-05-17 14:52 UTC (permalink / raw)
  To: Sven Schnelle
  Cc: Heiko Carstens, Guenter Roeck, Andrew Morton, linux-mm, linux-kernel

* Sven Schnelle <svens@linux.ibm.com> [220516 13:10]:
> Liam Howlett <liam.howlett@oracle.com> writes:
> 
> > * Sven Schnelle <svens@linux.ibm.com> [220516 11:37]:
> >> Hi Liam,
> >> 
> >> Liam Howlett <liam.howlett@oracle.com> writes:
> >> 
> >> > * Sven Schnelle <svens@linux.ibm.com> [220515 16:02]:
> >> >
> >> > I tried the above on my qemu s390 with kernel 5.18.0-rc6-next-20220513,
> >> > but it runs without issue, return code is 0.  Is there something the VM
> >> > needs to have for this to trigger?
> >> 
> >> A coworker said the same. The reason seems to be that I've run the
> >> code in a unittest environment, which seems to make a difference. When
> >> compiling the code above with gcc on my system, it also doesn't crash.
> >> So I have to figure out what makes this unittest binary special.
> >> 
> >> >> I've added a few debug statements to the maple tree code:
> >> >> 
> >> >> [   27.769641] mas_next_entry: offset=14
> >> >> [   27.769642] mas_next_nentry: entry = 0e00000000000000, slots=0000000090249f80, mas->offset=15 count=14
> >> >
> >> > Where exactly are you printing this?
> >> 
> >> I added a lot of debug statements to the code trying to understand
> >> it. I'll attach it to this mail.
> >
> > Thanks.  Can you check to see if that diff you sent was the correct
> > file?  It appears to be the git stats and not the changes themselves.
> 
> No, it wasn't. I did git stash show with -p, which doesn't make sense.
> I've attached the correct diff.

Thanks for this.  I think the key is that the offset is beyond the end
of the node (count).  What is happening is that we are already at the
last entry in the node and calling next.  To optimize mas_next_nentry(),
I had moved the handling of the last entry out of the loop and done it
after the loop.  Unfortunately, I did not check for this condition on
entry to the function.  I have a patch I will send out shortly.
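
A minimal C sketch of the failure mode described above, assuming a simplified node in which slots 0..end hold entries and whatever follows them is not an entry. `struct toy_node` and `toy_next_nentry()` are hypothetical stand-ins, not the maple tree code and not the actual fix: the scan loop stops one slot short of the end, the last slot is read after the loop, and the on-entry check is what stops a caller that is already past the last entry (offset 15 with count 14 in the debug output above) from reading beyond the valid slots.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SLOTS 16	/* hypothetical slot count */

/* Hypothetical node: slots[0..end] hold entries (NULL means empty). */
struct toy_node {
	void *slots[TOY_SLOTS];
	unsigned char end;	/* index of the last valid slot */
};

/*
 * Return the first non-NULL entry at index >= *offset and leave *offset
 * pointing at it.  Returns NULL when the node has nothing left.
 */
static void *toy_next_nentry(const struct toy_node *node, unsigned char *offset)
{
	assert(node->end < TOY_SLOTS);

	/* On-entry check: the caller already stepped past the last slot. */
	if (*offset > node->end)
		return NULL;

	/* The loop covers every slot except the last one ... */
	while (*offset < node->end) {
		void *entry = node->slots[*offset];

		if (entry)
			return entry;
		(*offset)++;
	}

	/*
	 * ... and the last slot is read here, outside the loop.  Without the
	 * on-entry check, *offset could be node->end + 1 at this point and
	 * this read would land on whatever follows the valid slots.
	 */
	return node->slots[*offset];
}
```

Removing the on-entry check reproduces the bug in this toy: with `end = 2` and an offset of 3, the loop is skipped and the post-loop read returns `slots[3]`, a value that was never an entry.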

* Re: [PATCH] mapletree-vs-khugepaged
  2022-05-17 13:23         ` Guenter Roeck
@ 2022-05-17 15:03           ` Liam Howlett
  2022-05-17 16:28             ` Guenter Roeck
  0 siblings, 1 reply; 32+ messages in thread
From: Liam Howlett @ 2022-05-17 15:03 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Heiko Carstens, Sven Schnelle, Andrew Morton, linux-mm, linux-kernel

* Guenter Roeck <linux@roeck-us.net> [220517 09:23]:
> On 5/17/22 04:53, Heiko Carstens wrote:
> > On Fri, May 13, 2022 at 05:00:31PM +0000, Liam Howlett wrote:
> > > * Sven Schnelle <svens@linux.ibm.com> [220513 10:46]:
> > > > Heiko Carstens <hca@linux.ibm.com> writes:
> > > > > FWIW, same on s390 - linux-next is completely broken. Note: I didn't
> > > > > bisect, but given that the call trace, and even the failing address
> > > > > match, I'm quite confident it is the same reason.
> > > > Is that issue supposed to be fixed? git bisect pointed me to
> > > > 
> > > > # bad: [76535d42eb53485775a8c54ea85725812b75543f] Merge branch
> > > >    'mm-everything' of
> > > >    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> > > > 
> > > > which isn't really helpful.
> > > > 
> > > > Anything we could help with debugging this?
> > > 
> > > I tested the maple tree on top of the s390 as it was the same crash and
> > > it was okay.  I haven't tested the mm-everything branch though.  Can you
> > > test mm-unstable?
> > > 
> > > I'll continue setting up a sparc VM for testing here and test
> > > mm-everything on that and the s390
> > 
> > So due to reports here I did some sort of "special bisect": with today's
> > linux-next I did a hard reset to commit 562340595cbb ("Merge branch
> > 'for-next/kspp' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux.git"),
> > started a bisect on Andrew's tree between mm-stable and mm-unstable, and
> > merged whatever commit was about to be bisected into 562340595cbb.
> > 
> > This finally led to commit f1297d3a2cb7 ("mm/mmap: reorganize munmap to
> > use maple states") as "first bad commit".
> > 
> > So given that we are shortly before the merge window and linux-next is
> > completely broken for s390, how do we proceed? Right now I have no idea if
> > there is anything else in linux-next that would break s390 because of this.
> > 
> > I'm sure you won't like to hear this, but I'd appreciate it if
> > this code could be removed from linux-next again.
> 
> I finally found some time to bisect the alpha boot failures in -next.
> Bisect results below.
> 
> Guenter
> 
> ---
...
> # first bad commit: [f1297d3a2cb77261c10fbbd69d92bbca700108e0] mm/mmap: reorganize munmap to use maple states

Thanks for all the work on this.

I was able to reproduce on a sparc64 VM.  I was returning node metadata
in a rare case.  I just CC'ed you all on a patch [1].  It should apply
cleanly to whatever branch you want since it only changes
lib/maple_tree.c

1. https://lore.kernel.org/linux-mm/20220517145913.3480729-1-Liam.Howlett@oracle.com/
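
A C sketch of why an overrun by one slot can return a value like 0e00000000000000, assuming that node metadata (the last used offset and a gap index) shares storage with the last slot, as `struct maple_range_64` does in current include/linux/maple_tree.h; the layout at the time of this thread may have differed. `struct toy_range_node`, `toy_set_meta()` and `toy_raw_slot()` are hypothetical, and 16 slots is an assumed count. With end = 14 and gap = 0, the last slot read as a pointer is 0x0e00000000000000 on a 64-bit big-endian machine such as sparc64 or s390 and 0xe on a little-endian one, which is consistent with the faulting address in these reports.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_SLOTS 16	/* assumed slot count of a 64-bit range node */

/* Assumed metadata layout: offset of the last used slot, then a gap index. */
struct toy_meta {
	unsigned char end;
	unsigned char gap;
};

/*
 * Hypothetical node: the metadata overlays the storage of the last slot,
 * so slot[TOY_SLOTS - 1] and meta share the same bytes.
 */
struct toy_range_node {
	union {
		void *slot[TOY_SLOTS];
		struct {
			void *pad[TOY_SLOTS - 1];
			struct toy_meta meta;
		};
	};
};

/* Zero the node and record its metadata, as a writer might. */
static void toy_set_meta(struct toy_range_node *n, unsigned char end,
			 unsigned char gap)
{
	assert(end < TOY_SLOTS);
	memset(n, 0, sizeof(*n));
	n->meta.end = end;
	n->meta.gap = gap;
}

/* Read slot i as a raw integer, which is what an overrunning "next" sees. */
static uintptr_t toy_raw_slot(const struct toy_range_node *n, int i)
{
	return (uintptr_t)n->slot[i];
}
```

The first byte of the last slot is always `end`; only the byte order decides whether that byte lands in the top or the bottom of the pointer value.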

Thanks,
Liam

* Re: [PATCH] mapletree-vs-khugepaged
  2022-05-17 15:03           ` Liam Howlett
@ 2022-05-17 16:28             ` Guenter Roeck
  2022-05-17 20:38               ` Liam Howlett
  0 siblings, 1 reply; 32+ messages in thread
From: Guenter Roeck @ 2022-05-17 16:28 UTC (permalink / raw)
  To: Liam Howlett
  Cc: Heiko Carstens, Sven Schnelle, Andrew Morton, linux-mm, linux-kernel

On 5/17/22 08:03, Liam Howlett wrote:
> * Guenter Roeck <linux@roeck-us.net> [220517 09:23]:
>> On 5/17/22 04:53, Heiko Carstens wrote:
>>> On Fri, May 13, 2022 at 05:00:31PM +0000, Liam Howlett wrote:
>>>> * Sven Schnelle <svens@linux.ibm.com> [220513 10:46]:
>>>>> Heiko Carstens <hca@linux.ibm.com> writes:
>>>>>> FWIW, same on s390 - linux-next is completely broken. Note: I didn't
>>>>>> bisect, but given that the call trace, and even the failing address
>>>>>> match, I'm quite confident it is the same reason.
>>>>> Is that issue supposed to be fixed? git bisect pointed me to
>>>>>
>>>>> # bad: [76535d42eb53485775a8c54ea85725812b75543f] Merge branch
>>>>>     'mm-everything' of
>>>>>     git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
>>>>>
>>>>> which isn't really helpful.
>>>>>
>>>>> Anything we could help with debugging this?
>>>>
>>>> I tested the maple tree on top of the s390 as it was the same crash and
>>>> it was okay.  I haven't tested the mm-everything branch though.  Can you
>>>> test mm-unstable?
>>>>
>>>> I'll continue setting up a sparc VM for testing here and test
>>>> mm-everything on that and the s390
>>>
>>> So due to reports here I did some sort of "special bisect": with today's
>>> linux-next I did a hard reset to commit 562340595cbb ("Merge branch
>>> 'for-next/kspp' of
>>> git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux.git"),
>>> started a bisect on Andrew's tree between mm-stable and mm-unstable, and
>>> merged whatever commit was about to be bisected into 562340595cbb.
>>>
>>> This finally led to commit f1297d3a2cb7 ("mm/mmap: reorganize munmap to
>>> use maple states") as "first bad commit".
>>>
>>> So given that we are shortly before the merge window and linux-next is
>>> completely broken for s390, how do we proceed? Right now I have no idea if
>>> there is anything else in linux-next that would break s390 because of this.
>>>
>>> I'm sure you won't like to hear this, but I'd appreciate it if
>>> this code could be removed from linux-next again.
>>
>> I finally found some time to bisect the alpha boot failures in -next.
>> Bisect results below.
>>
>> Guenter
>>
>> ---
> ...
>> # first bad commit: [f1297d3a2cb77261c10fbbd69d92bbca700108e0] mm/mmap: reorganize munmap to use maple states
> 
> Thanks for all the work on this.
> 
> I was able to reproduce on a sparc64 VM.  I was returning node metadata
> in a rare case.  I just CC'ed you all on a patch [1].  It should apply
> cleanly to whatever branch you want since it only changes
> lib/maple_tree.c
> 

FWIW, below is my bisect log for the sparc64 problem. It looks like your patch
does fix this problem.

Guenter

---
# bad: [47c1c54d1bcd0a69a56b49473bc20f17b70e5242] Add linux-next specific files for 20220517
# good: [42226c989789d8da4af1de0c31070c96726d990c] Linux 5.18-rc7
git bisect start 'HEAD' 'v5.18-rc7'
# good: [27d9fca0814b912f762dd5adeb81d9ab250705a9] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
git bisect good 27d9fca0814b912f762dd5adeb81d9ab250705a9
# good: [c57a4cecd8f02b365109fc326c7cde5ec6020a54] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
git bisect good c57a4cecd8f02b365109fc326c7cde5ec6020a54
# good: [0389ffdcdd4eacf5cfcef2a5e25b49a1662f8a8b] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git
git bisect good 0389ffdcdd4eacf5cfcef2a5e25b49a1662f8a8b
# good: [c506c5c327711f8bb522c10262ec3a932fe91cd2] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux.git
git bisect good c506c5c327711f8bb522c10262ec3a932fe91cd2
# bad: [b24964a75c5c0129fd6a0d9bcb0fb6e84fbac126] mm/mmap: drop range_has_overlap() function
git bisect bad b24964a75c5c0129fd6a0d9bcb0fb6e84fbac126
# good: [3cbab4ca1ea8752912d8719d72059869b4c18045] mm/damon/sysfs: use enum for 'state' input handling
git bisect good 3cbab4ca1ea8752912d8719d72059869b4c18045
# good: [9113eaf331bf44579882c001867773cf1b3364fd] mm/memory-failure.c: add hwpoison_filter for soft offline
git bisect good 9113eaf331bf44579882c001867773cf1b3364fd
# good: [2b2adc303cebc213072520eb2e86e4beb65e9499] mm: remove rb tree.
git bisect good 2b2adc303cebc213072520eb2e86e4beb65e9499
# bad: [bbd9f664c351fbe61afa949abaaa03aed5c84af9] fs/proc/task_mmu: stop using linked list and highest_vm_end
git bisect bad bbd9f664c351fbe61afa949abaaa03aed5c84af9
# bad: [3b150cc8ab608f7a7da76b036b0ba7513628b0be] arm64: remove mmap linked list from vdso
git bisect bad 3b150cc8ab608f7a7da76b036b0ba7513628b0be
# good: [282a3e65bd1dece042bb8cf3a1666d79af1553b0] mm: use maple tree operations for find_vma_intersection()
git bisect good 282a3e65bd1dece042bb8cf3a1666d79af1553b0
# good: [d175d2cc268623e3e68181235a4aa068b50a8213] mm: convert vma_lookup() to use mtree_load()
git bisect good d175d2cc268623e3e68181235a4aa068b50a8213
# bad: [f1297d3a2cb77261c10fbbd69d92bbca700108e0] mm/mmap: reorganize munmap to use maple states
git bisect bad f1297d3a2cb77261c10fbbd69d92bbca700108e0
# good: [a897bb88e1e61df70f350a51f51cade3930adc3c] mm/mmap: move mmap_region() below do_munmap()
git bisect good a897bb88e1e61df70f350a51f51cade3930adc3c
# first bad commit: [f1297d3a2cb77261c10fbbd69d92bbca700108e0] mm/mmap: reorganize munmap to use maple states


* Re: [PATCH] mapletree-vs-khugepaged
  2022-05-17 16:28             ` Guenter Roeck
@ 2022-05-17 20:38               ` Liam Howlett
  0 siblings, 0 replies; 32+ messages in thread
From: Liam Howlett @ 2022-05-17 20:38 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Heiko Carstens, Sven Schnelle, Andrew Morton, linux-mm, linux-kernel

* Guenter Roeck <linux@roeck-us.net> [220517 12:28]:
> On 5/17/22 08:03, Liam Howlett wrote:
> > * Guenter Roeck <linux@roeck-us.net> [220517 09:23]:
> > > On 5/17/22 04:53, Heiko Carstens wrote:
> > > > On Fri, May 13, 2022 at 05:00:31PM +0000, Liam Howlett wrote:
> > > > > * Sven Schnelle <svens@linux.ibm.com> [220513 10:46]:
> > > > > > Heiko Carstens <hca@linux.ibm.com> writes:
> > > > > > > FWIW, same on s390 - linux-next is completely broken. Note: I didn't
> > > > > > > bisect, but given that the call trace, and even the failing address
> > > > > > > match, I'm quite confident it is the same reason.
> > > > > > Is that issue supposed to be fixed? git bisect pointed me to
> > > > > > 
> > > > > > # bad: [76535d42eb53485775a8c54ea85725812b75543f] Merge branch
> > > > > >     'mm-everything' of
> > > > > >     git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> > > > > > 
> > > > > > which isn't really helpful.
> > > > > > 
> > > > > > Anything we could help with debugging this?
> > > > > 
> > > > > I tested the maple tree on top of the s390 as it was the same crash and
> > > > > it was okay.  I haven't tested the mm-everything branch though.  Can you
> > > > > test mm-unstable?
> > > > > 
> > > > > I'll continue setting up a sparc VM for testing here and test
> > > > > mm-everything on that and the s390
> > > > 
> > > > So due to reports here I did some sort of "special bisect": with today's
> > > > linux-next I did a hard reset to commit 562340595cbb ("Merge branch
> > > > 'for-next/kspp' of
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux.git"),
> > > > started a bisect on Andrew's tree between mm-stable and mm-unstable, and
> > > > merged whatever commit was about to be bisected into 562340595cbb.
> > > > 
> > > > This finally led to commit f1297d3a2cb7 ("mm/mmap: reorganize munmap to
> > > > use maple states") as "first bad commit".
> > > > 
> > > > So given that we are shortly before the merge window and linux-next is
> > > > completely broken for s390, how do we proceed? Right now I have no idea if
> > > > there is anything else in linux-next that would break s390 because of this.
> > > > 
> > > > I'm sure you won't like to hear this, but I'd appreciate it if
> > > > this code could be removed from linux-next again.
> > > 
> > > I finally found some time to bisect the alpha boot failures in -next.
> > > Bisect results below.
> > > 
> > > Guenter
> > > 
> > > ---
> > ...
> > > # first bad commit: [f1297d3a2cb77261c10fbbd69d92bbca700108e0] mm/mmap: reorganize munmap to use maple states
> > 
> > Thanks for all the work on this.
> > 
> > I was able to reproduce on a sparc64 VM.  I was returning node metadata
> > in a rare case.  I just CC'ed you all on a patch [1].  It should apply
> > cleanly to whatever branch you want since it only changes
> > lib/maple_tree.c
> > 
> 
> FWIW, below is my bisect log for the sparc64 problem. It looks like your patch
> does fix this problem.
> 

Thanks, yes.  I actually tested on my sparc64 VM since I still cannot
get the s390 to crash.

* Re: [PATCH] mapletree-vs-khugepaged
  2022-05-17 14:32         ` Guenter Roeck
@ 2022-05-19 14:35           ` Liam Howlett
  2022-05-19 21:41             ` Guenter Roeck
  0 siblings, 1 reply; 32+ messages in thread
From: Liam Howlett @ 2022-05-19 14:35 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Heiko Carstens, Sven Schnelle, Andrew Morton, linux-mm, linux-kernel

* Guenter Roeck <linux@roeck-us.net> [220517 10:32]:

...
> 
> Another bisect result, boot failures with nommu targets (arm:mps2-an385,
> m68k:mcf5208evb). Bisect log is the same for both.
...
> # first bad commit: [bd773a78705fb58eeadd80e5b31739df4c83c559] nommu: remove uses of VMA linked list

I cannot reproduce this on my side, even with that specific commit.  Can
you point me to the failure log, config file, etc?  Do you still see
this with the fixes I've sent recently?

Thanks,
Liam

* Re: [PATCH] mapletree-vs-khugepaged
  2022-05-19 14:35           ` Liam Howlett
@ 2022-05-19 21:41             ` Guenter Roeck
  2022-05-19 22:38               ` Liam Howlett
  2022-05-30 17:38               ` Liam Howlett
  0 siblings, 2 replies; 32+ messages in thread
From: Guenter Roeck @ 2022-05-19 21:41 UTC (permalink / raw)
  To: Liam Howlett
  Cc: Heiko Carstens, Sven Schnelle, Andrew Morton, linux-mm, linux-kernel

On 5/19/22 07:35, Liam Howlett wrote:
> * Guenter Roeck <linux@roeck-us.net> [220517 10:32]:
> 
> ...
>>
>> Another bisect result, boot failures with nommu targets (arm:mps2-an385,
>> m68k:mcf5208evb). Bisect log is the same for both.
> ...
>> # first bad commit: [bd773a78705fb58eeadd80e5b31739df4c83c559] nommu: remove uses of VMA linked list
> 
> I cannot reproduce this on my side, even with that specific commit.  Can
> you point me to the failure log, config file, etc?  Do you still see
> this with the fixes I've sent recently?
> 

This was in linux-next; most recently with next-20220517.
I don't know if that was up-to-date with your patches.
The problem seems to be memory allocation failures.
A sample log is at
https://kerneltests.org/builders/qemu-m68k-next/builds/1065/steps/qemubuildcommand/logs/stdio
The log history at
https://kerneltests.org/builders/qemu-m68k-next?numbuilds=30
will give you a variety of logs.

The configuration is derived from m5208evb_defconfig, with initrd
and command line embedded in the image. You can see the detailed
configuration updates at
https://github.com/groeck/linux-build-test/blob/master/rootfs/m68k/run-qemu-m68k.sh

Qemu command line is

qemu-system-m68k -M mcf5208evb -kernel vmlinux \
     -cpu m5208 -no-reboot -nographic -monitor none \
     -append "rdinit=/sbin/init console=ttyS0,115200"

with initrd from
https://github.com/groeck/linux-build-test/blob/master/rootfs/m68k/rootfs-5208.cpio.gz

I use qemu v6.2, but any recent qemu version should work.

Hope this helps,
Guenter

* Re: [PATCH] mapletree-vs-khugepaged
  2022-05-19 21:41             ` Guenter Roeck
@ 2022-05-19 22:38               ` Liam Howlett
  2022-05-30 17:38               ` Liam Howlett
  1 sibling, 0 replies; 32+ messages in thread
From: Liam Howlett @ 2022-05-19 22:38 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Heiko Carstens, Sven Schnelle, Andrew Morton, linux-mm, linux-kernel

* Guenter Roeck <linux@roeck-us.net> [220519 17:42]:
> On 5/19/22 07:35, Liam Howlett wrote:
> > * Guenter Roeck <linux@roeck-us.net> [220517 10:32]:
> > 
> > ...
> > > 
> > > Another bisect result, boot failures with nommu targets (arm:mps2-an385,
> > > m68k:mcf5208evb). Bisect log is the same for both.
> > ...
> > > # first bad commit: [bd773a78705fb58eeadd80e5b31739df4c83c559] nommu: remove uses of VMA linked list
> > 
> > I cannot reproduce this on my side, even with that specific commit.  Can
> > you point me to the failure log, config file, etc?  Do you still see
> > this with the fixes I've sent recently?
> > 
> 
> This was in linux-next; most recently with next-20220517.
> I don't know if that was up-to-date with your patches.
> The problem seems to be memory allocation failures.
> A sample log is at
> https://kerneltests.org/builders/qemu-m68k-next/builds/1065/steps/qemubuildcommand/logs/stdio
> The log history at
> https://kerneltests.org/builders/qemu-m68k-next?numbuilds=30
> will give you a variety of logs.

I did hunt that down.  It looks like it's allocating 512KB and failing.
I tried the commit you see the failures on, and my qemu boots fine.

> 
> The configuration is derived from m5208evb_defconfig, with initrd
> and command line embedded in the image. You can see the detailed
> configuration updates at
> https://github.com/groeck/linux-build-test/blob/master/rootfs/m68k/run-qemu-m68k.sh
> 
> Qemu command line is
> 
> qemu-system-m68k -M mcf5208evb -kernel vmlinux \
>     -cpu m5208 -no-reboot -nographic -monitor none \
>     -append "rdinit=/sbin/init console=ttyS0,115200"
> 
> with initrd from
> https://github.com/groeck/linux-build-test/blob/master/rootfs/m68k/rootfs-5208.cpio.gz

I'm using buildroot-2022.02.1, so maybe that's the difference?
Buildroot has the following qemu line:

qemu-system-m68k -M mcf5208evb -cpu m5208 -kernel vmlinux -nographic

> 
> I use qemu v6.2, but any recent qemu version should work.
> 
> Hope this helps,
> Guenter

Thanks for the information.  I will keep digging into it.
Liam


* Re: [PATCH] mapletree-vs-khugepaged
  2022-05-19 21:41             ` Guenter Roeck
  2022-05-19 22:38               ` Liam Howlett
@ 2022-05-30 17:38               ` Liam Howlett
  2022-05-31 18:56                 ` Liam Howlett
  1 sibling, 1 reply; 32+ messages in thread
From: Liam Howlett @ 2022-05-30 17:38 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Heiko Carstens, Sven Schnelle, Andrew Morton, linux-mm, linux-kernel

* Guenter Roeck <linux@roeck-us.net> [220519 17:42]:
> On 5/19/22 07:35, Liam Howlett wrote:
> > * Guenter Roeck <linux@roeck-us.net> [220517 10:32]:
> > 
> > ...
> > > 
> > > Another bisect result, boot failures with nommu targets (arm:mps2-an385,
> > > m68k:mcf5208evb). Bisect log is the same for both.
> > ...
> > > # first bad commit: [bd773a78705fb58eeadd80e5b31739df4c83c559] nommu: remove uses of VMA linked list
> > 
> > I cannot reproduce this on my side, even with that specific commit.  Can
> > you point me to the failure log, config file, etc?  Do you still see
> > this with the fixes I've sent recently?
> > 
> 
> This was in linux-next; most recently with next-20220517.
> I don't know if that was up-to-date with your patches.
> The problem seems to be memory allocation failures.
> A sample log is at
> https://kerneltests.org/builders/qemu-m68k-next/builds/1065/steps/qemubuildcommand/logs/stdio
> The log history at
> https://kerneltests.org/builders/qemu-m68k-next?numbuilds=30
> will give you a variety of logs.
> 
> The configuration is derived from m5208evb_defconfig, with initrd
> and command line embedded in the image. You can see the detailed
> configuration updates at
> https://github.com/groeck/linux-build-test/blob/master/rootfs/m68k/run-qemu-m68k.sh
> 
> Qemu command line is
> 
> qemu-system-m68k -M mcf5208evb -kernel vmlinux \
>     -cpu m5208 -no-reboot -nographic -monitor none \
>     -append "rdinit=/sbin/init console=ttyS0,115200"
> 
> with initrd from
> https://github.com/groeck/linux-build-test/blob/master/rootfs/m68k/rootfs-5208.cpio.gz
> 
> I use qemu v6.2, but any recent qemu version should work.

I have qemu 7.0, which seems to have changed the default memory size
from 32MB to 128MB.  The 32MB setup can be seen in your log here:

Memory: 27928K/32768K available (2827K kernel code, 160K rwdata, 432K rodata, 1016K init, 66K bss, 4840K reserved, 0K cma-reserved)

With 128MB the kernel boots.  With 64MB it also boots.  32MB fails with
an OOM. Looking into it more, I see that the OOM is caused by a
contiguous page allocation of 1MB (order 7 at 8K pages).  This can be
seen in the log as well:

Running sysctl: echo: page allocation failure: order:7, mode:0xcc0(GFP_KERNEL), nodemask=(null)
...
nommu: Allocation of length 884736 from process 63 (echo) failed

This last log message above comes from the code path that uses
alloc_pages_exact().
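The order arithmetic above can be checked with the userspace sketch below. It assumes 8K pages (a page shift of 13), as the log indicates for this target. sketch_get_order() and sketch_block_bytes() are hypothetical helpers. They follow the same rounding rule as the kernel's get_order(), but they are not the kernel code. Under that assumption, the 884736-byte request rounds up to order 7, which is a 1MB block.

```c
#include <assert.h>

/* Assumption: 8K pages, as shown for the mcf5208evb target in the log. */
#define SKETCH_PAGE_SHIFT 13
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/*
 * Smallest order such that (PAGE_SIZE << order) >= size.
 * Hypothetical userspace helper using the same rounding rule as get_order().
 */
static unsigned int sketch_get_order(unsigned long size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Bytes actually taken from the buddy allocator for a request of @size. */
static unsigned long sketch_block_bytes(unsigned long size)
{
	return SKETCH_PAGE_SIZE << sketch_get_order(size);
}
```

A 512KB request, by contrast, fits in an order-6 block. That is the largest block size still present on the free lists in the log.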

I don't see why my 256-byte nodes (an order-0 allocation of one 8K page
yields 32 nodes) would fragment memory beyond use during boot.  I have
checked for some sort of massive leak by adding a static node count to
the code and have only ever hit ~12 nodes.  Consulting the OOM log from
the above link again:

DMA: 0*8kB 1*16kB (U) 9*32kB (U) 7*64kB (U) 21*128kB (U) 7*256kB (U) 6*512kB (U) 0*1024kB 0*2048kB 0*4096kB 0*8192kB = 8304kB

So to get to the point of breaking up a 1MB block, we'd need an obscene
number of nodes.
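The buddy-allocator line above can be tallied mechanically. This sketch copies the per-order counts from the DMA: line (orders 0 to 10, 8kB base pages) into a hypothetical table. From that table it computes three things:

- the total free memory,
- the largest order that still has a free block, and
- how many 256-byte nodes fit in a block of a given order.

All names are illustrative only.

```c
#include <assert.h>

#define BASE_PAGE_KB 8UL /* 8K pages, per the log */
#define NR_ORDERS    11  /* orders 0..10 appear in the DMA: line */

/* Free blocks per order, transcribed from the OOM log's DMA: line. */
static const unsigned long free_blocks[NR_ORDERS] = {
	0, 1, 9, 7, 21, 7, 6, 0, 0, 0, 0
};

/* Total free memory in kB: sum of count * block size over all orders. */
static unsigned long free_kb_total(void)
{
	unsigned long kb = 0;

	for (int order = 0; order < NR_ORDERS; order++)
		kb += free_blocks[order] * (BASE_PAGE_KB << order);
	return kb;
}

/* Highest order that still has at least one free block, or -1 if none. */
static int largest_free_order(void)
{
	for (int order = NR_ORDERS - 1; order >= 0; order--)
		if (free_blocks[order])
			return order;
	return -1;
}

/* How many objects of @obj_size bytes fit in one block of @order. */
static unsigned long objs_per_block(int order, unsigned long obj_size)
{
	assert(obj_size > 0);
	return ((BASE_PAGE_KB * 1024UL) << order) / obj_size;
}
```

With the logged counts, free_kb_total() returns 8304, which matches the log, and largest_free_order() returns 6 (512kB). So about 8MB is free, yet no order-7 block is available. objs_per_block(0, 256) is 32, so a single order-0 page already covers the ~12 nodes counted above.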

Furthermore, the OOM on boot does not always happen.  When boot
succeeds without an OOM, I checked slabinfo and saw that maple_node
has 32 active objects, which is a single order-0 allocation.  Most boots,
however, do end in an OOM.  It is worth noting that slabinfo counts
active objects lazily, so the real number is most likely lower than
this.

Does anyone have any idea why nommu would be getting this fragmented?

Thanks,
Liam


* Re: [PATCH] mapletree-vs-khugepaged
  2022-05-30 17:38               ` Liam Howlett
@ 2022-05-31 18:56                 ` Liam Howlett
  2022-06-01 19:06                   ` Liam Howlett
  0 siblings, 1 reply; 32+ messages in thread
From: Liam Howlett @ 2022-05-31 18:56 UTC (permalink / raw)
  To: Guenter Roeck, Heiko Carstens, Sven Schnelle, Andrew Morton,
	linux-mm, linux-kernel, Matthew Wilcox (Oracle)

[-- Attachment #1: Type: text/plain, Size: 5292 bytes --]

* Liam R. Howlett <Liam.Howlett@Oracle.com> [220530 13:38]:
> * Guenter Roeck <linux@roeck-us.net> [220519 17:42]:
> > On 5/19/22 07:35, Liam Howlett wrote:
> > > * Guenter Roeck <linux@roeck-us.net> [220517 10:32]:
> > > 
> > > ...
> > > > 
> > > > Another bisect result, boot failures with nommu targets (arm:mps2-an385,
> > > > m68k:mcf5208evb). Bisect log is the same for both.
> > > ...
> > > > # first bad commit: [bd773a78705fb58eeadd80e5b31739df4c83c559] nommu: remove uses of VMA linked list
> > > 
> > > I cannot reproduce this on my side, even with that specific commit.  Can
> > > you point me to the failure log, config file, etc?  Do you still see
> > > this with the fixes I've sent recently?
> > > 
> > 
> > This was in linux-next; most recently with next-20220517.
> > I don't know if that was up-to-date with your patches.
> > The problem seems to be memory allocation failures.
> > A sample log is at
> > https://kerneltests.org/builders/qemu-m68k-next/builds/1065/steps/qemubuildcommand/logs/stdio
> > The log history at
> > https://kerneltests.org/builders/qemu-m68k-next?numbuilds=30
> > will give you a variety of logs.
> > 
> > The configuration is derived from m5208evb_defconfig, with initrd
> > and command line embedded in the image. You can see the detailed
> > configuration updates at
> > https://github.com/groeck/linux-build-test/blob/master/rootfs/m68k/run-qemu-m68k.sh
> > 
> > Qemu command line is
> > 
> > qemu-system-m68k -M mcf5208evb -kernel vmlinux \
> >     -cpu m5208 -no-reboot -nographic -monitor none \
> >     -append "rdinit=/sbin/init console=ttyS0,115200"
> > 
> > with initrd from
> > https://github.com/groeck/linux-build-test/blob/master/rootfs/m68k/rootfs-5208.cpio.gz
> > 
> > I use qemu v6.2, but any recent qemu version should work.
> 
> I have qemu 7.0 which seems to change the default memory size from 32MB
> to 128MB. This can be seen on your log here:
> 
> Memory: 27928K/32768K available (2827K kernel code, 160K rwdata, 432K rodata, 1016K init, 66K bss, 4840K reserved, 0K cma-reserved)
> 
> With 128MB the kernel boots.  With 64MB it also boots.  32MB fails with
> an OOM. Looking into it more, I see that the OOM is caused by a
> contiguous page allocation of 1MB (order 7 at 8K pages).  This can be
> seen in the log as well:
> 
> Running sysctl: echo: page allocation failure: order:7, mode:0xcc0(GFP_KERNEL), nodemask=(null)
> ...
> nommu: Allocation of length 884736 from process 63 (echo) failed
> 
> This last log message above comes from the code path that uses
> alloc_pages_exact().
> 
> I don't see why my 256 byte nodes (order 0 allocations yield 32 nodes)
> would fragment the memory beyond use on boot.  I have checked for some
> sort of massive leak by adding a static node count to the code and have
> only ever hit ~12 nodes.  Consulting the OOM log from the above link
> again:
> 
> DMA: 0*8kB 1*16kB (U) 9*32kB (U) 7*64kB (U) 21*128kB (U) 7*256kB (U) 6*512kB (U) 0*1024kB 0*2048kB 0*4096kB 0*8192kB = 8304kB
> 
> So to get to the point of breaking up a 1MB block, we'd need an obscene
> number of nodes.
> 
> Furthermore, the OOM on boot is not always happening.  When boot
> succeeds without an oom,  I checked slabinfo and see that the maple_node
> has 32 active objects which is 1 order 0 allocation. The boot does
> mostly cause an OOM.  It is worth noting that the slabinfo count is lazy
> on counting the number of active objects so it is most likely lower than
> this value in reality.
> 
> Does anyone have any idea why nommu would be getting this fragmented?

Answer: Why, yes.  Matthew does.  Using alloc_pages_exact() means we
allocate the huge chunk of memory then free the leftovers immediately.
Those freed leftover pages are handed out on the next request - which
happens to be the maple tree.
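The reuse effect can be shown with a toy model. The sketch below is not the buddy allocator. It keeps a single order-0 free list and ignores buddy merging. It trims an allocated block the way alloc_pages_exact() does, then frees the leftover pages to one of two places:

- the head of the list, which is the behaviour described above, or
- the tail, which is what the FPI_TO_TAIL flag requests.

The types and functions (free_list, push_head, next_alloc_after_trim, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_FREE 64

/* Toy order-0 free list of page frame numbers; allocation takes the head. */
struct free_list {
	unsigned long pfn[MAX_FREE];
	int count;
};

static void push_head(struct free_list *fl, unsigned long pfn)
{
	assert(fl->count < MAX_FREE);
	for (int i = fl->count; i > 0; i--)
		fl->pfn[i] = fl->pfn[i - 1];
	fl->pfn[0] = pfn;
	fl->count++;
}

static void push_tail(struct free_list *fl, unsigned long pfn)
{
	assert(fl->count < MAX_FREE);
	fl->pfn[fl->count++] = pfn;
}

static unsigned long pop_head(struct free_list *fl)
{
	assert(fl->count > 0);
	unsigned long pfn = fl->pfn[0];

	for (int i = 1; i < fl->count; i++)
		fl->pfn[i - 1] = fl->pfn[i];
	fl->count--;
	return pfn;
}

/*
 * A block of @block_pages pages at @base was allocated and @used pages kept.
 * Free the leftovers one page at a time to the head or the tail, then return
 * the pfn handed to the next order-0 request (e.g. a slab page for nodes).
 */
static unsigned long next_alloc_after_trim(struct free_list *fl,
					   unsigned long base, int block_pages,
					   int used, bool to_tail)
{
	for (int i = used; i < block_pages; i++) {
		if (to_tail)
			push_tail(fl, base + i);
		else
			push_head(fl, base + i);
	}
	return pop_head(fl);
}
```

Start with one pre-existing free page (pfn 900) and an order-7 block at pfn 1000 trimmed to 108 pages. With head insertion, the next request gets pfn 1127, one of the leftovers. With tail insertion, it gets pfn 900 and the leftovers are left untouched.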

It seems nommu is already so close to OOMing that this makes a
difference.  Attached is a patch which _almost_ solves the issue by
making it less likely that those pages get reused, but whether the boot
OOMs anyway is still a matter of timing.  It reduces the failure rate by
a large margin: roughly 1 in 10 boots fail instead of 4 in 5.  This patch
is probably worth taking on its own, as it reduces memory fragmentation
from short-lived allocations that use alloc_pages_exact().

I changed the nommu code a bit to reduce memory usage as well.  During a
split event, I no longer delete and then re-add the VMA, and I preallocate
only once for the two writes associated with a split.  I also moved my
preallocation ahead of the call path that does alloc_pages_exact().  This
all but ensures we won't fragment the larger chunks of memory, as we get
enough nodes out of a single page to run at least through boot.  However,
the failure rate remained at 1/10 with this change.

I had assumed that this all just worked before, but my setup is
different from Guenter's.  I am using buildroot-2022.02.1 and qemu 7.0
for my testing.  My configuration OOMs 12 out of 13 times without the
maple tree, so I think these changes actually lowered the memory pressure
on boot.  Obviously there is an element of timing that causes variation
in the testing, so exact numbers are not possible.

Thanks,
Liam


[-- Attachment #2: 0001-mm-page_alloc-Reduce-potential-fragmentation-in-make.patch --]
[-- Type: text/x-diff; name="0001-mm-page_alloc-Reduce-potential-fragmentation-in-make.patch", Size: 1664 bytes --]

From abef6d264d2413a625670bdb873133576d5cce5f Mon Sep 17 00:00:00 2001
From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
Date: Tue, 31 May 2022 09:20:51 -0400
Subject: [PATCH] mm/page_alloc:  Reduce potential fragmentation in
 make_alloc_exact()

Try to avoid handing out the leftover split pages on the next page
request by freeing them with __free_pages_ok() and FPI_TO_TAIL.  This
increases the chance that memory which is used only for a short period
of time can be defragmented after it is freed.

Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 mm/page_alloc.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f01c71e41bcf..8b6d6cada684 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5580,14 +5580,18 @@ static void *make_alloc_exact(unsigned long addr, unsigned int order,
 		size_t size)
 {
 	if (addr) {
-		unsigned long alloc_end = addr + (PAGE_SIZE << order);
-		unsigned long used = addr + PAGE_ALIGN(size);
-
-		split_page(virt_to_page((void *)addr), order);
-		while (used < alloc_end) {
-			free_page(used);
-			used += PAGE_SIZE;
-		}
+		unsigned long nr = DIV_ROUND_UP(size, PAGE_SIZE);
+		struct page *page = virt_to_page((void *)addr);
+		struct page *last = page + nr;
+
+		split_page_owner(page, 1 << order);
+		split_page_memcg(page, 1 << order);
+		while (page < --last)
+			set_page_refcounted(last);
+
+		last = page + (1UL << order);
+		for (page += nr; page < last; page++)
+			__free_pages_ok(page, 0, FPI_TO_TAIL);
 	}
 	return (void *)addr;
 }
-- 
2.35.1
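For a request of the size seen in the earlier log, the split that make_alloc_exact() performs can be worked out by hand. This sketch assumes 8K pages and defines its own DIV_ROUND_UP, which mirrors the kernel macro. exact_split_pages() is a hypothetical helper. It counts the pages kept for the caller (nr in the patch) and the tail pages handed back, which the patch frees with FPI_TO_TAIL.

```c
#include <assert.h>

#define PAGE_SIZE_8K 8192UL /* assumption: 8K pages, as on the m68k target */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

struct exact_split {
	unsigned long kept;  /* pages returned to the caller (nr) */
	unsigned long freed; /* tail pages given back to the allocator */
};

/* Split an order-@order block serving a @size-byte exact allocation. */
static struct exact_split exact_split_pages(unsigned long size,
					    unsigned int order)
{
	struct exact_split s;
	unsigned long total = 1UL << order;

	s.kept = DIV_ROUND_UP(size, PAGE_SIZE_8K);
	assert(s.kept <= total);
	s.freed = total - s.kept;
	return s;
}
```

For 884736 bytes in an order-7 block, 108 pages are kept and 20 (160KB) are freed. A successful allocation of that size would therefore return 20 pages to the allocator immediately.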



* Re: [PATCH] mapletree-vs-khugepaged
  2022-05-31 18:56                 ` Liam Howlett
@ 2022-06-01 19:06                   ` Liam Howlett
  0 siblings, 0 replies; 32+ messages in thread
From: Liam Howlett @ 2022-06-01 19:06 UTC (permalink / raw)
  To: Guenter Roeck, Heiko Carstens, Sven Schnelle, Andrew Morton,
	linux-mm, linux-kernel, Matthew Wilcox (Oracle)

[-- Attachment #1: Type: text/plain, Size: 2992 bytes --]

* Liam R. Howlett <Liam.Howlett@Oracle.com> [220531 14:56]:
> * Liam R. Howlett <Liam.Howlett@Oracle.com> [220530 13:38]:
> > * Guenter Roeck <linux@roeck-us.net> [220519 17:42]:
> > > On 5/19/22 07:35, Liam Howlett wrote:
> > > > * Guenter Roeck <linux@roeck-us.net> [220517 10:32]:
> > > > 

...

> > I have qemu 7.0 which seems to change the default memory size from 32MB
> > to 128MB. This can be seen on your log here:
> > 
> > Memory: 27928K/32768K available (2827K kernel code, 160K rwdata, 432K rodata, 1016K init, 66K bss, 4840K reserved, 0K cma-reserved)
> > 
> > With 128MB the kernel boots.  With 64MB it also boots.  32MB fails with
> > an OOM. Looking into it more, I see that the OOM is caused by a
> > contiguous page allocation of 1MB (order 7 at 8K pages).

...

> > Does anyone have any idea why nommu would be getting this fragmented?
> 
> Answer: Why, yes.  Matthew does.  Using alloc_pages_exact() means we
> allocate the huge chunk of memory then free the leftovers immediately.
> Those freed leftover pages are handed out on the next request - which
> happens to be the maple tree.
> 
> It seems nommu is so close to OOMing already that this makes a
> difference.  Attached is a patch which _almost_ solves the issue by
> making it less likely to use those pages, but it's still a matter of
> timing on if this will OOM anyways.  It reduces the potential by a large
> margin, maybe 1/10 fail instead of 4/5 failing.  This patch is probably
> worth taking on its own as it reduces memory fragmentation on
> short-lived allocations that use alloc_pages_exact().
> 
> I changed the nommu code a bit to reduce memory usage as well.  During a
> split event, I no longer delete then re-add the VMA and I only
> preallocate a single time for the two writes associated with a split. I
> also moved my pre-allocation ahead of the call path that does
> alloc_pages_exact().  This all but ensures we won't fragment the larger
> chunks of memory as we get enough nodes out of a single page to run at
> least through boot.  However, the failure rate remained at 1/10 with
> this change.
> 
> I had accepted the scenario that this all just worked before, but my
> setup is different than that of Guenter.  I am using buildroot-2022.02.1
> and qemu 7.0 for my testing.  My configuration OOMs 12/13 times without
> maple tree, so I think we actually lowered the memory pressure on boot
> with these changes.  Obviously there is an element of timing that causes
> variation in the testing so exact numbers are not possible.

Andrew,

Please add the previous patch to the mm branch; it does not depend on
the maple tree.

Please also include the attached patch as a fix for the maple tree nommu
OOM issue, on top of "nommu: remove uses of VMA linked list".  With it,
the OOM triggers much less often for me than with a stock
buildroot-2022.02.1 build on qemu 7.0.  I believe this will fix Guenter's
issues with the maple tree.

Thanks,
Liam

[-- Attachment #2: 0001-mm-nommu-Move-preallocations-and-limit-other-allocat.patch --]
[-- Type: text/x-diff; name="0001-mm-nommu-Move-preallocations-and-limit-other-allocat.patch", Size: 8620 bytes --]

From c92dfca0283877f0c0d9b4e619e261c0da78eb74 Mon Sep 17 00:00:00 2001
From: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Date: Fri, 20 May 2022 21:51:57 -0400
Subject: [PATCH] mm/nommu: Move preallocations and limit other allocations

Move the preallocations in do_mmap() ahead of allocations that may use
alloc_pages_exact() so that reclaimed areas of larger pages don't get
reused for the maple tree nodes.  This will allow the larger blocks of
memory to reassemble after use.

Also avoid other unnecessary allocations for maple tree nodes by
overwriting the VMA on split instead of deleting and re-adding.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 mm/nommu.c | 114 ++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 86 insertions(+), 28 deletions(-)

diff --git a/mm/nommu.c b/mm/nommu.c
index b5bb12772cbf..c80797961067 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -557,25 +557,14 @@ void vma_mas_remove(struct vm_area_struct *vma, struct ma_state *mas)
 	mas_store_prealloc(mas, NULL);
 }
 
-/*
- * add a VMA into a process's mm_struct in the appropriate place in the list
- * and tree and add to the address space's page tree also if not an anonymous
- * page
- * - should be called with mm->mmap_lock held writelocked
- */
-static void add_vma_to_mm(struct mm_struct *mm, struct vm_area_struct *vma)
+static void setup_vma_to_mm(struct vm_area_struct *vma, struct mm_struct *mm)
 {
-	struct address_space *mapping;
-	MA_STATE(mas, &mm->mm_mt, vma->vm_start, vma->vm_end);
-
-	BUG_ON(!vma->vm_region);
-
 	mm->map_count++;
 	vma->vm_mm = mm;
 
 	/* add the VMA to the mapping */
 	if (vma->vm_file) {
-		mapping = vma->vm_file->f_mapping;
+		struct address_space *mapping = vma->vm_file->f_mapping;
 
 		i_mmap_lock_write(mapping);
 		flush_dcache_mmap_lock(mapping);
@@ -583,18 +572,46 @@ static void add_vma_to_mm(struct mm_struct *mm, struct vm_area_struct *vma)
 		flush_dcache_mmap_unlock(mapping);
 		i_mmap_unlock_write(mapping);
 	}
+}
 
+/*
+ * mas_add_vma_to_mm() - Maple state variant of add_vma_to_mm().
+ * @mas: The maple state with preallocations.
+ * @mm: The mm_struct
+ * @vma: The vma to add
+ *
+ */
+static void mas_add_vma_to_mm(struct ma_state *mas, struct mm_struct *mm,
+			      struct vm_area_struct *vma)
+{
+	BUG_ON(!vma->vm_region);
+
+	setup_vma_to_mm(vma, mm);
 	/* add the VMA to the tree */
-	vma_mas_store(vma, &mas);
+	vma_mas_store(vma, mas);
 }
 
 /*
- * delete a VMA from its owning mm_struct and address space
+ * add a VMA into a process's mm_struct in the appropriate place in the list
+ * and tree and add to the address space's page tree also if not an anonymous
+ * page
+ * - should be called with mm->mmap_lock held writelocked
  */
-static void delete_vma_from_mm(struct vm_area_struct *vma)
+static int add_vma_to_mm(struct mm_struct *mm, struct vm_area_struct *vma)
 {
-	MA_STATE(mas, &vma->vm_mm->mm_mt, 0, 0);
+	MA_STATE(mas, &mm->mm_mt, vma->vm_start, vma->vm_end);
+
+	if (mas_preallocate(&mas, vma, GFP_KERNEL)) {
+		pr_warn("Allocation of vma tree for process %d failed\n",
+		       current->pid);
+		return -ENOMEM;
+	}
+	mas_add_vma_to_mm(&mas, mm, vma);
+	return 0;
+}
 
+static void cleanup_vma_from_mm(struct vm_area_struct *vma)
+{
 	vma->vm_mm->map_count--;
 	/* remove the VMA from the mapping */
 	if (vma->vm_file) {
@@ -607,9 +624,24 @@ static void delete_vma_from_mm(struct vm_area_struct *vma)
 		flush_dcache_mmap_unlock(mapping);
 		i_mmap_unlock_write(mapping);
 	}
+}
+/*
+ * delete a VMA from its owning mm_struct and address space
+ */
+static int delete_vma_from_mm(struct vm_area_struct *vma)
+{
+	MA_STATE(mas, &vma->vm_mm->mm_mt, 0, 0);
+
+	if (mas_preallocate(&mas, vma, GFP_KERNEL)) {
+		pr_warn("Allocation of vma tree for process %d failed\n",
+		       current->pid);
+		return -ENOMEM;
+	}
+	cleanup_vma_from_mm(vma);
 
 	/* remove from the MM's tree and list */
 	vma_mas_remove(vma, &mas);
+	return 0;
 }
 
 /*
@@ -1019,6 +1051,7 @@ unsigned long do_mmap(struct file *file,
 	vm_flags_t vm_flags;
 	unsigned long capabilities, result;
 	int ret;
+	MA_STATE(mas, &current->mm->mm_mt, 0, 0);
 
 	*populate = 0;
 
@@ -1037,6 +1070,10 @@ unsigned long do_mmap(struct file *file,
 	 * now know into VMA flags */
 	vm_flags = determine_vm_flags(file, prot, flags, capabilities);
 
+
+	if (mas_preallocate(&mas, vma, GFP_KERNEL))
+		goto error_maple_preallocate;
+
 	/* we're going to need to record the mapping */
 	region = kmem_cache_zalloc(vm_region_jar, GFP_KERNEL);
 	if (!region)
@@ -1186,7 +1223,7 @@ unsigned long do_mmap(struct file *file,
 	current->mm->total_vm += len >> PAGE_SHIFT;
 
 share:
-	add_vma_to_mm(current->mm, vma);
+	mas_add_vma_to_mm(&mas, current->mm, vma);
 
 	/* we flush the region from the icache only when the first executable
 	 * mapping of it is made  */
@@ -1212,11 +1249,13 @@ unsigned long do_mmap(struct file *file,
 
 sharing_violation:
 	up_write(&nommu_region_sem);
+	mas_destroy(&mas);
 	pr_warn("Attempt to share mismatched mappings\n");
 	ret = -EINVAL;
 	goto error;
 
 error_getting_vma:
+	mas_destroy(&mas);
 	kmem_cache_free(vm_region_jar, region);
 	pr_warn("Allocation of vma for %lu byte allocation from process %d failed\n",
 			len, current->pid);
@@ -1224,10 +1263,17 @@ unsigned long do_mmap(struct file *file,
 	return -ENOMEM;
 
 error_getting_region:
+	mas_destroy(&mas);
 	pr_warn("Allocation of vm region for %lu byte allocation from process %d failed\n",
 			len, current->pid);
 	show_free_areas(0, NULL);
 	return -ENOMEM;
+
+error_maple_preallocate:
+	pr_warn("Allocation of vma tree for process %d failed\n", current->pid);
+	show_free_areas(0, NULL);
+	return -ENOMEM;
+
 }
 
 unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len,
@@ -1293,6 +1339,7 @@ int split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
 	struct vm_area_struct *new;
 	struct vm_region *region;
 	unsigned long npages;
+	MA_STATE(mas, &mm->mm_mt, vma->vm_start, vma->vm_end);
 
 	/* we're only permitted to split anonymous regions (these should have
 	 * only a single usage on the region) */
@@ -1328,7 +1375,6 @@ int split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
 	if (new->vm_ops && new->vm_ops->open)
 		new->vm_ops->open(new);
 
-	delete_vma_from_mm(vma);
 	down_write(&nommu_region_sem);
 	delete_nommu_region(vma->vm_region);
 	if (new_below) {
@@ -1341,8 +1387,17 @@ int split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
 	add_nommu_region(vma->vm_region);
 	add_nommu_region(new->vm_region);
 	up_write(&nommu_region_sem);
-	add_vma_to_mm(mm, vma);
-	add_vma_to_mm(mm, new);
+	if (mas_preallocate(&mas, vma, GFP_KERNEL)) {
+		pr_warn("Allocation of vma tree for process %d failed\n",
+		       current->pid);
+		return -ENOMEM;
+	}
+
+	setup_vma_to_mm(vma, mm);
+	setup_vma_to_mm(new, mm);
+	mas_set_range(&mas, vma->vm_start, vma->vm_end - 1);
+	mas_store(&mas, vma);
+	vma_mas_store(new, &mas);
 	return 0;
 }
 
@@ -1358,12 +1413,14 @@ static int shrink_vma(struct mm_struct *mm,
 
 	/* adjust the VMA's pointers, which may reposition it in the MM's tree
 	 * and list */
-	delete_vma_from_mm(vma);
+	if (delete_vma_from_mm(vma))
+		return -ENOMEM;
 	if (from > vma->vm_start)
 		vma->vm_end = from;
 	else
 		vma->vm_start = to;
-	add_vma_to_mm(mm, vma);
+	if (add_vma_to_mm(mm, vma))
+		return -ENOMEM;
 
 	/* cut the backing region down to size */
 	region = vma->vm_region;
@@ -1394,7 +1451,7 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, struct list
 	MA_STATE(mas, &mm->mm_mt, start, start);
 	struct vm_area_struct *vma;
 	unsigned long end;
-	int ret;
+	int ret = 0;
 
 	len = PAGE_ALIGN(len);
 	if (len == 0)
@@ -1444,9 +1501,10 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, struct list
 	}
 
 erase_whole_vma:
-	delete_vma_from_mm(vma);
+	if (delete_vma_from_mm(vma))
+		ret = -ENOMEM;
 	delete_vma(mm, vma);
-	return 0;
+	return ret;
 }
 
 int vm_munmap(unsigned long addr, size_t len)
@@ -1485,12 +1543,12 @@ void exit_mmap(struct mm_struct *mm)
 	 */
 	mmap_write_lock(mm);
 	for_each_vma(vmi, vma) {
-		delete_vma_from_mm(vma);
+		cleanup_vma_from_mm(vma);
 		delete_vma(mm, vma);
 		cond_resched();
 	}
-	mmap_write_unlock(mm);
 	__mt_destroy(&mm->mm_mt);
+	mmap_write_unlock(mm);
 }
 
 int vm_brk(unsigned long addr, unsigned long len)
-- 
2.35.1
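In split_vma() above, a VMA's range is half-open, [vm_start, vm_end), while the maple tree stores an inclusive [index, last] range. That is why the patch calls mas_set_range() with vma->vm_end - 1.

The sketch below uses hypothetical mini_vma and tree_range types. It reproduces only the range arithmetic of the nommu split and ignores vm_pgoff and the vm_region bookkeeping. It also checks that the two resulting tree ranges are adjacent. This shows that storing the shrunken VMA and the new one covers the original range without deleting it first.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mini-VMA: half-open [start, end), like vm_start/vm_end. */
struct mini_vma { unsigned long start, end; };

/* Inclusive range as stored in the tree: [index, last]. */
struct tree_range { unsigned long index, last; };

static struct tree_range to_tree_range(const struct mini_vma *v)
{
	assert(v->start < v->end);
	return (struct tree_range){ v->start, v->end - 1 };
}

/*
 * Split @vma at @addr.  With @new_below, @new takes the lower part and @vma
 * keeps the upper part; otherwise the reverse.  Same shape as the nommu
 * split_vma() range updates, without pgoff or region handling.
 */
static void mini_split(struct mini_vma *vma, struct mini_vma *new,
		       unsigned long addr, bool new_below)
{
	assert(vma->start < addr && addr < vma->end);
	*new = *vma;
	if (new_below) {
		new->end = addr;
		vma->start = addr;
	} else {
		new->start = addr;
		vma->end = addr;
	}
}
```

Take [0x10000, 0x18000) and split it at 0x14000 with new_below set. The new VMA becomes [0x10000, 0x14000) and the original becomes [0x14000, 0x18000). In the tree they are stored as [0x10000, 0x13fff] and [0x14000, 0x17fff].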



end of thread, other threads:[~2022-06-01 20:48 UTC | newest]

Thread overview: 32+ messages
2022-04-28 17:20 [PATCH] mapletree-vs-khugepaged Guenter Roeck
2022-04-28 19:27 ` Liam Howlett
2022-04-29 12:09 ` Heiko Carstens
2022-04-29 13:01   ` Liam Howlett
2022-04-29 13:10     ` Heiko Carstens
2022-04-29 16:18       ` Liam Howlett
2022-05-02  9:10         ` Geert Uytterhoeven
2022-05-13 14:46   ` Sven Schnelle
2022-05-13 14:51     ` Sven Schnelle
2022-05-13 16:49     ` Andrew Morton
2022-05-13 17:00     ` Liam Howlett
2022-05-15 20:02       ` Sven Schnelle
2022-05-16 14:02         ` Liam Howlett
2022-05-16 15:37           ` Sven Schnelle
2022-05-16 15:50             ` Liam Howlett
2022-05-16 17:10               ` Sven Schnelle
2022-05-17 14:52                 ` Liam Howlett
2022-05-17 11:53       ` Heiko Carstens
2022-05-17 12:26         ` Heiko Carstens
2022-05-17 13:23         ` Guenter Roeck
2022-05-17 15:03           ` Liam Howlett
2022-05-17 16:28             ` Guenter Roeck
2022-05-17 20:38               ` Liam Howlett
2022-05-17 14:32         ` Guenter Roeck
2022-05-19 14:35           ` Liam Howlett
2022-05-19 21:41             ` Guenter Roeck
2022-05-19 22:38               ` Liam Howlett
2022-05-30 17:38               ` Liam Howlett
2022-05-31 18:56                 ` Liam Howlett
2022-06-01 19:06                   ` Liam Howlett
2022-05-13 17:28     ` Guenter Roeck
2022-05-13 20:12     ` Yang Shi
