* v5.18-rc1: migratepages triggers VM_BUG_ON_FOLIO(folio_nr_pages(old) != nr_pages)
@ 2022-04-04 13:29 Naoya Horiguchi
  2022-04-04 14:05 ` Zi Yan
  0 siblings, 1 reply; 8+ messages in thread
From: Naoya Horiguchi @ 2022-04-04 13:29 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: Andrew Morton, Matthew Wilcox, Michal Hocko, Naoya Horiguchi


Hi,

I found that the VM_BUG_ON_FOLIO below is triggered on v5.18-rc1
(and is also reproducible with mmotm as of 3/31).
I have no idea about the bug's mechanism, but it does not seem to
have been reported on LKML yet, so let me share it. config.gz is
attached.

This is easily reproduced, for example, by running the
migratepages(8) command on any running process (such as PID 1).
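
For reference, migratepages(8) is essentially a thin wrapper around
the migrate_pages(2) syscall, so a minimal equivalent in C would look
something like the sketch below (it assumes libnuma, at least two NUMA
nodes, and enough privilege to act on the target PID, e.g. CAP_SYS_NICE;
the node numbers are illustrative):

/* repro.c - sketch: migrate all pages of a target process from node 0
 * to node 1, the same operation migratepages(8) performs.
 * Build with: gcc repro.c -o repro -lnuma
 */
#include <numaif.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
        int pid = (argc > 1) ? atoi(argv[1]) : 1;  /* default: PID 1 */
        unsigned long old_nodes = 1UL << 0;        /* source: node 0 */
        unsigned long new_nodes = 1UL << 1;        /* target: node 1 */
        unsigned long maxnode = 8 * sizeof(unsigned long);

        /* Returns the number of pages that could not be moved,
         * or -1 with errno set on failure. */
        long ret = migrate_pages(pid, maxnode, &old_nodes, &new_nodes);
        if (ret < 0)
                perror("migrate_pages");
        else
                printf("pages not moved: %ld\n", ret);
        return 0;
}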

Could anyone help me solve this?

Thanks,
Naoya Horiguchi

[   48.206424] page:0000000021452e3a refcount:6 mapcount:0 mapping:000000003aaf5253 index:0x0 pfn:0x14e600
[   48.213316] head:0000000021452e3a order:9 compound_mapcount:0 compound_pincount:0
[   48.218830] aops:xfs_address_space_operations [xfs] ino:dee dentry name:"libc.so.6"
[   48.225098] flags: 0x57ffffc0012027(locked|referenced|uptodate|active|private|head|node=1|zone=2|lastcpupid=0x1fffff)
[   48.232792] raw: 0057ffffc0012027 0000000000000000 dead000000000122 ffff8a0dc9a376b8
[   48.238464] raw: 0000000000000000 ffff8a0dc6b23d20 00000006ffffffff 0000000000000000
[   48.244109] page dumped because: VM_BUG_ON_FOLIO(folio_nr_pages(old) != nr_pages)
[   48.249196] ------------[ cut here ]------------
[   48.251240] kernel BUG at mm/memcontrol.c:6857!
[   48.253896] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[   48.255377] CPU: 5 PID: 844 Comm: migratepages Tainted: G            E     5.18.0-rc1-v5.18-rc1-220404-1637-000-rc1+ #39
[   48.258251] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
[   48.260535] RIP: 0010:mem_cgroup_migrate+0x217/0x320
[   48.261914] Code: 48 89 ef e8 5b 2c f7 ff 0f 0b 48 c7 c6 e8 64 5b b9 48 89 ef e8 4a 2c f7 ff 0f 0b 48 c7 c6 28 65 5b b9 48 89 ef e8 39 2c f7 ff <0f> 0b e8 12 79 e0 ff 49 8b 45 10 a8 03 0f 85 d2 00 00 00 65 48 ff
[   48.268541] RSP: 0018:ffffa19b41b77a20 EFLAGS: 00010286
[   48.270245] RAX: 0000000000000045 RBX: 0000000000000200 RCX: 0000000000000000
[   48.272494] RDX: 0000000000000001 RSI: ffffffffb9599561 RDI: 00000000ffffffff
[   48.274726] RBP: ffffe30f85398000 R08: 0000000000000000 R09: 00000000ffffdfff
[   48.276969] R10: ffffa19b41b77810 R11: ffffffffb9940d08 R12: 0000000000000000
[   48.279136] R13: ffffe30f85398000 R14: ffff8a0dc6b23d20 R15: 0000000000000200
[   48.281151] FS:  00007fadd1182740(0000) GS:ffff8a0efbc80000(0000) knlGS:0000000000000000
[   48.283422] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   48.285059] CR2: 00007fadd118b090 CR3: 0000000144432005 CR4: 0000000000170ee0
[   48.286942] Call Trace:
[   48.287665]  <TASK>
[   48.288255]  iomap_migrate_page+0x64/0x190
[   48.289366]  move_to_new_page+0xa3/0x470
[   48.290448]  ? page_not_mapped+0xa/0x20
[   48.291491]  ? rmap_walk_file+0xe1/0x1f0
[   48.292503]  ? try_to_migrate+0x8e/0xd0
[   48.293524]  migrate_pages+0x166e/0x1870
[   48.294607]  ? migrate_page+0xe0/0xe0
[   48.295761]  ? walk_page_range+0x9a/0x110
[   48.296885]  migrate_to_node+0xea/0x120
[   48.297873]  do_migrate_pages+0x23c/0x2a0
[   48.298925]  kernel_migrate_pages+0x3f5/0x470
[   48.300149]  __x64_sys_migrate_pages+0x19/0x20
[   48.301371]  do_syscall_64+0x3b/0x90
[   48.302340]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   48.303789] RIP: 0033:0x7fadd0f0af3d
[   48.304957] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d bb ee 0e 00 f7 d8 64 89 01 48
[   48.310983] RSP: 002b:00007fff5997e178 EFLAGS: 00000246 ORIG_RAX: 0000000000000100
[   48.313444] RAX: ffffffffffffffda RBX: 0000556a722bf120 RCX: 00007fadd0f0af3d
[   48.315763] RDX: 0000556a722bf140 RSI: 0000000000000401 RDI: 000000000000034a
[   48.318070] RBP: 000000000000034a R08: 0000000000000000 R09: 0000000000000003
[   48.320370] R10: 0000556a722bf1f0 R11: 0000000000000246 R12: 0000556a722bf1d0
[   48.322679] R13: 000000000000034a R14: 00007fadd11cec00 R15: 0000556a71a59d50
[   48.324998]  </TASK>

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 48497 bytes --]


* Re: v5.18-rc1: migratepages triggers VM_BUG_ON_FOLIO(folio_nr_pages(old) != nr_pages)
  2022-04-04 13:29 v5.18-rc1: migratepages triggers VM_BUG_ON_FOLIO(folio_nr_pages(old) != nr_pages) Naoya Horiguchi
@ 2022-04-04 14:05 ` Zi Yan
  2022-04-04 14:29   ` Matthew Wilcox
  0 siblings, 1 reply; 8+ messages in thread
From: Zi Yan @ 2022-04-04 14:05 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-kernel, linux-mm, Andrew Morton, Matthew Wilcox,
	Michal Hocko, Naoya Horiguchi


On 4 Apr 2022, at 9:29, Naoya Horiguchi wrote:

> Hi,
>
> I found that the VM_BUG_ON_FOLIO below is triggered on v5.18-rc1
> (and is also reproducible with mmotm as of 3/31).
> I have no idea about the bug's mechanism, but it does not seem to
> have been reported on LKML yet, so let me share it. config.gz is
> attached.
>
> This is easily reproduced, for example, by running the
> migratepages(8) command on any running process (such as PID 1).
>
> Could anyone help me solve this?
>
> Thanks,
> Naoya Horiguchi
>
> [   48.206424] page:0000000021452e3a refcount:6 mapcount:0 mapping:000000003aaf5253 index:0x0 pfn:0x14e600
> [   48.213316] head:0000000021452e3a order:9 compound_mapcount:0 compound_pincount:0
> [   48.218830] aops:xfs_address_space_operations [xfs] ino:dee dentry name:"libc.so.6"
> [   48.225098] flags: 0x57ffffc0012027(locked|referenced|uptodate|active|private|head|node=1|zone=2|lastcpupid=0x1fffff)
> [   48.232792] raw: 0057ffffc0012027 0000000000000000 dead000000000122 ffff8a0dc9a376b8
> [   48.238464] raw: 0000000000000000 ffff8a0dc6b23d20 00000006ffffffff 0000000000000000
> [   48.244109] page dumped because: VM_BUG_ON_FOLIO(folio_nr_pages(old) != nr_pages)
> [   48.249196] ------------[ cut here ]------------
> [   48.251240] kernel BUG at mm/memcontrol.c:6857!
> [   48.253896] invalid opcode: 0000 [#1] PREEMPT SMP PTI
> [   48.255377] CPU: 5 PID: 844 Comm: migratepages Tainted: G            E     5.18.0-rc1-v5.18-rc1-220404-1637-000-rc1+ #39
> [   48.258251] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
> [   48.260535] RIP: 0010:mem_cgroup_migrate+0x217/0x320
> [   48.261914] Code: 48 89 ef e8 5b 2c f7 ff 0f 0b 48 c7 c6 e8 64 5b b9 48 89 ef e8 4a 2c f7 ff 0f 0b 48 c7 c6 28 65 5b b9 48 89 ef e8 39 2c f7 ff <0f> 0b e8 12 79 e0 ff 49 8b 45 10 a8 03 0f 85 d2 00 00 00 65 48 ff
> [   48.268541] RSP: 0018:ffffa19b41b77a20 EFLAGS: 00010286
> [   48.270245] RAX: 0000000000000045 RBX: 0000000000000200 RCX: 0000000000000000
> [   48.272494] RDX: 0000000000000001 RSI: ffffffffb9599561 RDI: 00000000ffffffff
> [   48.274726] RBP: ffffe30f85398000 R08: 0000000000000000 R09: 00000000ffffdfff
> [   48.276969] R10: ffffa19b41b77810 R11: ffffffffb9940d08 R12: 0000000000000000
> [   48.279136] R13: ffffe30f85398000 R14: ffff8a0dc6b23d20 R15: 0000000000000200
> [   48.281151] FS:  00007fadd1182740(0000) GS:ffff8a0efbc80000(0000) knlGS:0000000000000000
> [   48.283422] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   48.285059] CR2: 00007fadd118b090 CR3: 0000000144432005 CR4: 0000000000170ee0
> [   48.286942] Call Trace:
> [   48.287665]  <TASK>
> [   48.288255]  iomap_migrate_page+0x64/0x190
> [   48.289366]  move_to_new_page+0xa3/0x470
> [   48.290448]  ? page_not_mapped+0xa/0x20
> [   48.291491]  ? rmap_walk_file+0xe1/0x1f0
> [   48.292503]  ? try_to_migrate+0x8e/0xd0
> [   48.293524]  migrate_pages+0x166e/0x1870
> [   48.294607]  ? migrate_page+0xe0/0xe0
> [   48.295761]  ? walk_page_range+0x9a/0x110
> [   48.296885]  migrate_to_node+0xea/0x120
> [   48.297873]  do_migrate_pages+0x23c/0x2a0
> [   48.298925]  kernel_migrate_pages+0x3f5/0x470
> [   48.300149]  __x64_sys_migrate_pages+0x19/0x20
> [   48.301371]  do_syscall_64+0x3b/0x90
> [   48.302340]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [   48.303789] RIP: 0033:0x7fadd0f0af3d
> [   48.304957] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d bb ee 0e 00 f7 d8 64 89 01 48
> [   48.310983] RSP: 002b:00007fff5997e178 EFLAGS: 00000246 ORIG_RAX: 0000000000000100
> [   48.313444] RAX: ffffffffffffffda RBX: 0000556a722bf120 RCX: 00007fadd0f0af3d
> [   48.315763] RDX: 0000556a722bf140 RSI: 0000000000000401 RDI: 000000000000034a
> [   48.318070] RBP: 000000000000034a R08: 0000000000000000 R09: 0000000000000003
> [   48.320370] R10: 0000556a722bf1f0 R11: 0000000000000246 R12: 0000556a722bf1d0
> [   48.322679] R13: 000000000000034a R14: 00007fadd11cec00 R15: 0000556a71a59d50
> [   48.324998]  </TASK>

Is it because the migration code assumes all THPs have order=HPAGE_PMD_ORDER?
Would the patch below fix the issue?

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index a2516d31db6c..358b7c11426d 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1209,7 +1209,7 @@ static struct page *new_page(struct page *page, unsigned long start)
                struct page *thp;

                thp = alloc_hugepage_vma(GFP_TRANSHUGE, vma, address,
-                                        HPAGE_PMD_ORDER);
+                                        thp_order(page));
                if (!thp)
                        return NULL;
                prep_transhuge_page(thp);
diff --git a/mm/migrate.c b/mm/migrate.c
index de175e2fdba5..79e4b36f709a 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1547,7 +1547,7 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
                 */
                gfp_mask &= ~__GFP_RECLAIM;
                gfp_mask |= GFP_TRANSHUGE;
-               order = HPAGE_PMD_ORDER;
+               order = thp_order(page);
        }
        zidx = zone_idx(page_zone(page));
        if (is_highmem_idx(zidx) || zidx == ZONE_MOVABLE)


--
Best Regards,
Yan, Zi



* Re: v5.18-rc1: migratepages triggers VM_BUG_ON_FOLIO(folio_nr_pages(old) != nr_pages)
  2022-04-04 14:05 ` Zi Yan
@ 2022-04-04 14:29   ` Matthew Wilcox
  2022-04-04 14:47     ` Zi Yan
  0 siblings, 1 reply; 8+ messages in thread
From: Matthew Wilcox @ 2022-04-04 14:29 UTC (permalink / raw)
  To: Zi Yan
  Cc: Naoya Horiguchi, linux-kernel, linux-mm, Andrew Morton,
	Michal Hocko, Naoya Horiguchi

On Mon, Apr 04, 2022 at 10:05:00AM -0400, Zi Yan wrote:
> On 4 Apr 2022, at 9:29, Naoya Horiguchi wrote:
> > I found that the VM_BUG_ON_FOLIO below is triggered on v5.18-rc1
> > (and is also reproducible with mmotm as of 3/31).
> > I have no idea about the bug's mechanism, but it does not seem to
> > have been reported on LKML yet, so let me share it. config.gz is
> > attached.
> >
> > [   48.206424] page:0000000021452e3a refcount:6 mapcount:0 mapping:000000003aaf5253 index:0x0 pfn:0x14e600
> > [   48.213316] head:0000000021452e3a order:9 compound_mapcount:0 compound_pincount:0
> > [   48.218830] aops:xfs_address_space_operations [xfs] ino:dee dentry name:"libc.so.6"
> > [   48.225098] flags: 0x57ffffc0012027(locked|referenced|uptodate|active|private|head|node=1|zone=2|lastcpupid=0x1fffff)
> > [   48.232792] raw: 0057ffffc0012027 0000000000000000 dead000000000122 ffff8a0dc9a376b8
> > [   48.238464] raw: 0000000000000000 ffff8a0dc6b23d20 00000006ffffffff 0000000000000000
> > [   48.244109] page dumped because: VM_BUG_ON_FOLIO(folio_nr_pages(old) != nr_pages)
> > [   48.249196] ------------[ cut here ]------------
> > [   48.251240] kernel BUG at mm/memcontrol.c:6857!
> > [   48.260535] RIP: 0010:mem_cgroup_migrate+0x217/0x320
> > [   48.286942] Call Trace:
> > [   48.287665]  <TASK>
> > [   48.288255]  iomap_migrate_page+0x64/0x190
> > [   48.289366]  move_to_new_page+0xa3/0x470
> 
> Is it because the migration code assumes all THPs have order=HPAGE_PMD_ORDER?
> Would the patch below fix the issue?

This looks entirely plausible to me!  I do have changes in this area,
but clearly I should have submitted them earlier.  Let's get these fixes
in as they are.

Is there a test suite that tests page migration?  I usually use xfstests
and it does no page migration at all (at least 'git grep migrate'
finds nothing useful).



* Re: v5.18-rc1: migratepages triggers VM_BUG_ON_FOLIO(folio_nr_pages(old) != nr_pages)
  2022-04-04 14:29   ` Matthew Wilcox
@ 2022-04-04 14:47     ` Zi Yan
  2022-04-04 15:18       ` Naoya Horiguchi
  0 siblings, 1 reply; 8+ messages in thread
From: Zi Yan @ 2022-04-04 14:47 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Naoya Horiguchi, linux-kernel, linux-mm, Andrew Morton,
	Michal Hocko, Naoya Horiguchi


On 4 Apr 2022, at 10:29, Matthew Wilcox wrote:

> On Mon, Apr 04, 2022 at 10:05:00AM -0400, Zi Yan wrote:
>> On 4 Apr 2022, at 9:29, Naoya Horiguchi wrote:
>>> I found that the VM_BUG_ON_FOLIO below is triggered on v5.18-rc1
>>> (and is also reproducible with mmotm as of 3/31).
>>> I have no idea about the bug's mechanism, but it does not seem to
>>> have been reported on LKML yet, so let me share it. config.gz is
>>> attached.
>>>
>>> [   48.206424] page:0000000021452e3a refcount:6 mapcount:0 mapping:000000003aaf5253 index:0x0 pfn:0x14e600
>>> [   48.213316] head:0000000021452e3a order:9 compound_mapcount:0 compound_pincount:0
>>> [   48.218830] aops:xfs_address_space_operations [xfs] ino:dee dentry name:"libc.so.6"
>>> [   48.225098] flags: 0x57ffffc0012027(locked|referenced|uptodate|active|private|head|node=1|zone=2|lastcpupid=0x1fffff)
>>> [   48.232792] raw: 0057ffffc0012027 0000000000000000 dead000000000122 ffff8a0dc9a376b8
>>> [   48.238464] raw: 0000000000000000 ffff8a0dc6b23d20 00000006ffffffff 0000000000000000
>>> [   48.244109] page dumped because: VM_BUG_ON_FOLIO(folio_nr_pages(old) != nr_pages)
>>> [   48.249196] ------------[ cut here ]------------
>>> [   48.251240] kernel BUG at mm/memcontrol.c:6857!
>>> [   48.260535] RIP: 0010:mem_cgroup_migrate+0x217/0x320
>>> [   48.286942] Call Trace:
>>> [   48.287665]  <TASK>
>>> [   48.288255]  iomap_migrate_page+0x64/0x190
>>> [   48.289366]  move_to_new_page+0xa3/0x470
>>
>> Is it because the migration code assumes all THPs have order=HPAGE_PMD_ORDER?
>> Would the patch below fix the issue?
>
> This looks entirely plausible to me!  I do have changes in this area,
> but clearly I should have submitted them earlier.  Let's get these fixes
> in as they are.
>
> Is there a test suite that tests page migration?  I usually use xfstests
> and it does no page migration at all (at least 'git grep migrate'
> finds nothing useful).

https://github.com/linux-test-project/ltp has some migrate_pages and
move_pages tests. You can run them after installing LTP:
sudo ./runltp -f syscalls -s migrate_pages
sudo ./runltp -f syscalls -s move_pages
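
For a quicker targeted check than the full suites, the underlying
syscalls can also be exercised directly. A minimal move_pages(2) call
looks roughly like the sketch below (it assumes libnuma, a 4 KiB page
size, and that node 1 exists):

/* move_one_page.c - sketch: move one page of the calling process to
 * node 1 and report where the page ended up.
 * Build with: gcc move_one_page.c -o move_one_page -lnuma
 */
#include <numaif.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
        void *page = aligned_alloc(4096, 4096);
        int node = 1;      /* destination node (assumed to exist) */
        int status = -1;   /* set to the page's node, or -errno */

        if (!page)
                return 1;
        *(volatile char *)page = 0;  /* fault the page in first */
        if (move_pages(0 /* self */, 1, &page, &node, &status,
                       MPOL_MF_MOVE) < 0)
                perror("move_pages");
        else
                printf("page is now on node %d\n", status);
        free(page);
        return 0;
}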


--
Best Regards,
Yan, Zi



* Re: v5.18-rc1: migratepages triggers VM_BUG_ON_FOLIO(folio_nr_pages(old) != nr_pages)
  2022-04-04 14:47     ` Zi Yan
@ 2022-04-04 15:18       ` Naoya Horiguchi
  2022-04-04 15:44         ` Zi Yan
  0 siblings, 1 reply; 8+ messages in thread
From: Naoya Horiguchi @ 2022-04-04 15:18 UTC (permalink / raw)
  To: Zi Yan
  Cc: Matthew Wilcox, linux-kernel, linux-mm, Andrew Morton,
	Michal Hocko, Naoya Horiguchi

On Mon, Apr 04, 2022 at 10:47:20AM -0400, Zi Yan wrote:
> On 4 Apr 2022, at 10:29, Matthew Wilcox wrote:
> 
> > On Mon, Apr 04, 2022 at 10:05:00AM -0400, Zi Yan wrote:
> >> On 4 Apr 2022, at 9:29, Naoya Horiguchi wrote:
> >>> I found that the VM_BUG_ON_FOLIO below is triggered on v5.18-rc1
> >>> (and is also reproducible with mmotm as of 3/31).
> >>> I have no idea about the bug's mechanism, but it does not seem to
> >>> have been reported on LKML yet, so let me share it. config.gz is
> >>> attached.
> >>>
> >>> [   48.206424] page:0000000021452e3a refcount:6 mapcount:0 mapping:000000003aaf5253 index:0x0 pfn:0x14e600
> >>> [   48.213316] head:0000000021452e3a order:9 compound_mapcount:0 compound_pincount:0
> >>> [   48.218830] aops:xfs_address_space_operations [xfs] ino:dee dentry name:"libc.so.6"
> >>> [   48.225098] flags: 0x57ffffc0012027(locked|referenced|uptodate|active|private|head|node=1|zone=2|lastcpupid=0x1fffff)
> >>> [   48.232792] raw: 0057ffffc0012027 0000000000000000 dead000000000122 ffff8a0dc9a376b8
> >>> [   48.238464] raw: 0000000000000000 ffff8a0dc6b23d20 00000006ffffffff 0000000000000000
> >>> [   48.244109] page dumped because: VM_BUG_ON_FOLIO(folio_nr_pages(old) != nr_pages)
> >>> [   48.249196] ------------[ cut here ]------------
> >>> [   48.251240] kernel BUG at mm/memcontrol.c:6857!
> >>> [   48.260535] RIP: 0010:mem_cgroup_migrate+0x217/0x320
> >>> [   48.286942] Call Trace:
> >>> [   48.287665]  <TASK>
> >>> [   48.288255]  iomap_migrate_page+0x64/0x190
> >>> [   48.289366]  move_to_new_page+0xa3/0x470
> >>
> >> Is it because the migration code assumes all THPs have order=HPAGE_PMD_ORDER?
> >> Would the patch below fix the issue?

I briefly confirmed that this bug didn't reproduce with your change,
thank you very much!

- Naoya Horiguchi

> >
> > This looks entirely plausible to me!  I do have changes in this area,
> > but clearly I should have submitted them earlier.  Let's get these fixes
> > in as they are.
> >
> > Is there a test suite that tests page migration?  I usually use xfstests
> > and it does no page migration at all (at least 'git grep migrate'
> > finds nothing useful).
> 
> https://github.com/linux-test-project/ltp has some migrate_pages and
> move_pages tests. You can run them after installing LTP:
> sudo ./runltp -f syscalls -s migrate_pages
> sudo ./runltp -f syscalls -s move_pages



* Re: v5.18-rc1: migratepages triggers VM_BUG_ON_FOLIO(folio_nr_pages(old) != nr_pages)
  2022-04-04 15:18       ` Naoya Horiguchi
@ 2022-04-04 15:44         ` Zi Yan
  2022-04-04 16:06           ` Matthew Wilcox
  0 siblings, 1 reply; 8+ messages in thread
From: Zi Yan @ 2022-04-04 15:44 UTC (permalink / raw)
  To: Naoya Horiguchi, Matthew Wilcox
  Cc: linux-kernel, linux-mm, Andrew Morton, Michal Hocko, Naoya Horiguchi


On 4 Apr 2022, at 11:18, Naoya Horiguchi wrote:

> On Mon, Apr 04, 2022 at 10:47:20AM -0400, Zi Yan wrote:
>> On 4 Apr 2022, at 10:29, Matthew Wilcox wrote:
>>
>>> On Mon, Apr 04, 2022 at 10:05:00AM -0400, Zi Yan wrote:
>>>> On 4 Apr 2022, at 9:29, Naoya Horiguchi wrote:
>>>>> I found that the VM_BUG_ON_FOLIO below is triggered on v5.18-rc1
>>>>> (and is also reproducible with mmotm as of 3/31).
>>>>> I have no idea about the bug's mechanism, but it does not seem to
>>>>> have been reported on LKML yet, so let me share it. config.gz is
>>>>> attached.
>>>>>
>>>>> [   48.206424] page:0000000021452e3a refcount:6 mapcount:0 mapping:000000003aaf5253 index:0x0 pfn:0x14e600
>>>>> [   48.213316] head:0000000021452e3a order:9 compound_mapcount:0 compound_pincount:0
>>>>> [   48.218830] aops:xfs_address_space_operations [xfs] ino:dee dentry name:"libc.so.6"
>>>>> [   48.225098] flags: 0x57ffffc0012027(locked|referenced|uptodate|active|private|head|node=1|zone=2|lastcpupid=0x1fffff)
>>>>> [   48.232792] raw: 0057ffffc0012027 0000000000000000 dead000000000122 ffff8a0dc9a376b8
>>>>> [   48.238464] raw: 0000000000000000 ffff8a0dc6b23d20 00000006ffffffff 0000000000000000
>>>>> [   48.244109] page dumped because: VM_BUG_ON_FOLIO(folio_nr_pages(old) != nr_pages)
>>>>> [   48.249196] ------------[ cut here ]------------
>>>>> [   48.251240] kernel BUG at mm/memcontrol.c:6857!
>>>>> [   48.260535] RIP: 0010:mem_cgroup_migrate+0x217/0x320
>>>>> [   48.286942] Call Trace:
>>>>> [   48.287665]  <TASK>
>>>>> [   48.288255]  iomap_migrate_page+0x64/0x190
>>>>> [   48.289366]  move_to_new_page+0xa3/0x470
>>>>
>>>> Is it because the migration code assumes all THPs have order=HPAGE_PMD_ORDER?
>>>> Would the patch below fix the issue?
>
> I briefly confirmed that this bug didn't reproduce with your change,
> thank you very much!
>

Thanks.


Hi Matthew,

I am wondering if my change is the right fix or not. Folios with
order > 0 are still possible when CONFIG_TRANSPARENT_HUGEPAGE is not
set, right? In that case PageTransHuge() always returns false, so
nothing allocates an order > 0 destination page, and the VM_BUG_ON
will still be triggered.

Maybe the patch below could cover !CONFIG_TRANSPARENT_HUGEPAGE too?

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index a2516d31db6c..6e60b5c4b565 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1209,7 +1209,7 @@ static struct page *new_page(struct page *page, unsigned long start)
                struct page *thp;

                thp = alloc_hugepage_vma(GFP_TRANSHUGE, vma, address,
-                                        HPAGE_PMD_ORDER);
+                                        thp_order(page));
                if (!thp)
                        return NULL;
                prep_transhuge_page(thp);
@@ -1218,8 +1218,8 @@ static struct page *new_page(struct page *page, unsigned long start)
        /*
         * if !vma, alloc_page_vma() will use task or system default policy
         */
-       return alloc_page_vma(GFP_HIGHUSER_MOVABLE | __GFP_RETRY_MAYFAIL,
-                       vma, address);
+       return alloc_pages_vma(GFP_HIGHUSER_MOVABLE | __GFP_RETRY_MAYFAIL,
+                       folio_order(page_folio(page)), vma, address);
 }
 #else

diff --git a/mm/migrate.c b/mm/migrate.c
index de175e2fdba5..b079605854d7 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1522,7 +1522,7 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
 {
        struct migration_target_control *mtc;
        gfp_t gfp_mask;
-       unsigned int order = 0;
+       unsigned int order = folio_order(page_folio(page));
        struct page *new_page = NULL;
        int nid;
        int zidx;
@@ -1547,7 +1547,7 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
                 */
                gfp_mask &= ~__GFP_RECLAIM;
                gfp_mask |= GFP_TRANSHUGE;
-               order = HPAGE_PMD_ORDER;
+               order = thp_order(page);
        }
        zidx = zone_idx(page_zone(page));
        if (is_highmem_idx(zidx) || zidx == ZONE_MOVABLE)


>>>
>>> This looks entirely plausible to me!  I do have changes in this area,
>>> but clearly I should have submitted them earlier.  Let's get these fixes
>>> in as they are.
>>>
>>> Is there a test suite that tests page migration?  I usually use xfstests
>>> and it does no page migration at all (at least 'git grep migrate'
>>> finds nothing useful).
>>
>> https://github.com/linux-test-project/ltp has some migrate_pages and
>> move_pages tests. You can run them after installing LTP:
>> sudo ./runltp -f syscalls -s migrate_pages
>> sudo ./runltp -f syscalls -s move_pages


--
Best Regards,
Yan, Zi



* Re: v5.18-rc1: migratepages triggers VM_BUG_ON_FOLIO(folio_nr_pages(old) != nr_pages)
  2022-04-04 15:44         ` Zi Yan
@ 2022-04-04 16:06           ` Matthew Wilcox
  2022-04-04 16:41             ` Zi Yan
  0 siblings, 1 reply; 8+ messages in thread
From: Matthew Wilcox @ 2022-04-04 16:06 UTC (permalink / raw)
  To: Zi Yan
  Cc: Naoya Horiguchi, linux-kernel, linux-mm, Andrew Morton,
	Michal Hocko, Naoya Horiguchi

On Mon, Apr 04, 2022 at 11:44:39AM -0400, Zi Yan wrote:
> I am wondering if my change is the right fix or not. folios with order>0
> are still available when CONFIG_TRANSPARENT_HUGEPAGE is not set, right?

That's the eventual plan, but it's not possible today.  We need to
be able to split large folios (eg in truncation) and that functionality
is still under CONFIG_TRANSPARENT_HUGEPAGE in mm/huge_memory.c.  So
large folios depend on CONFIG_TRANSPARENT_HUGEPAGE instead of having a
clean separation between functionality-to-support-PMD-mapping and
functionality-to-support-order>0.

So I preferred your earlier patch because it's more obvious.  I mean,
we could pull in the two or three patches from my tree that convert
these functions and their callers to folios ... we're only at rc1.
I can post them and see what others think.


* Re: v5.18-rc1: migratepages triggers VM_BUG_ON_FOLIO(folio_nr_pages(old) != nr_pages)
  2022-04-04 16:06           ` Matthew Wilcox
@ 2022-04-04 16:41             ` Zi Yan
  0 siblings, 0 replies; 8+ messages in thread
From: Zi Yan @ 2022-04-04 16:41 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Naoya Horiguchi, linux-kernel, linux-mm, Andrew Morton,
	Michal Hocko, Naoya Horiguchi


On 4 Apr 2022, at 12:06, Matthew Wilcox wrote:

> On Mon, Apr 04, 2022 at 11:44:39AM -0400, Zi Yan wrote:
>> I am wondering if my change is the right fix or not. folios with order>0
>> are still available when CONFIG_TRANSPARENT_HUGEPAGE is not set, right?
>
> That's the eventual plan, but it's not possible today.  We need to
> be able to split large folios (eg in truncation) and that functionality
> is still under CONFIG_TRANSPARENT_HUGEPAGE in mm/huge_memory.c.  So
> large folios depend on CONFIG_TRANSPARENT_HUGEPAGE instead of having a
> clean separation between functionality-to-support-PMD-mapping and
> functionality-to-support-order>0.
>
> So I preferred your earlier patch because it's more obvious.  I mean,
> we could pull in the two or three patches from my tree that convert
> these functions and their callers to folios ... we're only at rc1.
> I can post them and see what others think.

OK, I will send out my initial patch.

--
Best Regards,
Yan, Zi


