From: Ryan Roberts <ryan.roberts@arm.com>
To: "Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Subject: Re: [PATCH v3 10/18] mm: Allow non-hugetlb large folios to be batch processed
Date: Wed, 6 Mar 2024 13:42:06 +0000	[thread overview]
Message-ID: <367a14f7-340e-4b29-90ae-bc3fcefdd5f4@arm.com> (raw)
In-Reply-To: <20240227174254.710559-11-willy@infradead.org>

Hi Matthew,

Afraid I have another bug for you...

On 27/02/2024 17:42, Matthew Wilcox (Oracle) wrote:
> Hugetlb folios still get special treatment, but normal large folios
> can now be freed by free_unref_folios().  This should have a reasonable
> performance impact, TBD.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>

When running some swap tests with this change present (it is now in mm-stable), I see BadThings(TM). Usually I see a "bad page state" report, followed by a delay of a few seconds, then an oops or a NULL pointer dereference. Bisect points to this change, and if I revert it, the problem goes away.

Here is one example, running against mm-unstable (a7f399ae964e):

[   76.239466] BUG: Bad page state in process usemem  pfn:2554a0
[   76.240196] kernel BUG at include/linux/mm.h:1120!
[   76.240198] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
[   76.240724]  dump_backtrace+0x98/0xf8
[   76.241523] Modules linked in:
[   76.241943]  show_stack+0x20/0x38
[   76.242282] 
[   76.242680]  dump_stack_lvl+0x48/0x60
[   76.242855] CPU: 2 PID: 62 Comm: kcompactd0 Not tainted 6.8.0-rc5-00456-ga7f399ae964e #16
[   76.243278]  dump_stack+0x18/0x28
[   76.244138] Hardware name: linux,dummy-virt (DT)
[   76.244510]  bad_page+0x88/0x128
[   76.244995] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   76.245370]  free_page_is_bad_report+0xa4/0xb8
[   76.246101] pc : migrate_folio_done+0x140/0x150
[   76.246572]  __free_pages_ok+0x370/0x4b0
[   76.247048] lr : migrate_folio_done+0x140/0x150
[   76.247489]  destroy_large_folio+0x94/0x108
[   76.247971] sp : ffff800083f5b8d0
[   76.248451]  __folio_put_large+0x70/0xc0
[   76.248807] x29: ffff800083f5b8d0
[   76.249256]  __folio_put+0xac/0xc0
[   76.249260]  deferred_split_scan+0x234/0x340
[   76.249607]  x28: 0000000000000000
[   76.249997]  do_shrink_slab+0x144/0x460
[   76.250444]  x27: ffff800083f5bb30
[   76.250829]  shrink_slab+0x2e0/0x4e0
[   76.251234] 
[   76.251604]  shrink_node+0x204/0x8a0
[   76.251979] x26: 0000000000000001
[   76.252147]  do_try_to_free_pages+0xd0/0x568
[   76.252527]  x25: 0000000000000010
[   76.252881]  try_to_free_mem_cgroup_pages+0x128/0x2d0
[   76.253337]  x24: fffffc0008552800
[   76.253687]  try_charge_memcg+0x12c/0x650
[   76.254219] 
[   76.254583]  __mem_cgroup_charge+0x6c/0xd0
[   76.255013] x23: ffff0000e6f353a8
[   76.255181]  __handle_mm_fault+0xe90/0x16a8
[   76.255624]  x22: ffff0013f5fa59c0
[   76.255977]  handle_mm_fault+0x70/0x2b0
[   76.256413]  x21: 0000000000000000
[   76.256756]  do_page_fault+0x100/0x4c0
[   76.257177] 
[   76.257540]  do_translation_fault+0xb4/0xd0
[   76.257932] x20: 0000000000000007
[   76.258095]  do_mem_abort+0x4c/0xa8
[   76.258532]  x19: fffffc0008552800
[   76.258883]  el0_da+0x2c/0x78
[   76.259263]  x18: 0000000000000010
[   76.259616]  el0t_64_sync_handler+0xe4/0x158
[   76.259933] 
[   76.260286]  el0t_64_sync+0x190/0x198
[   76.260729] x17: 3030303030303020 x16: 6666666666666666 x15: 3030303030303030
[   76.262010] x14: 0000000000000000 x13: 7465732029732867 x12: 616c662045455246
[   76.262746] x11: 5f54415f4b434548 x10: ffff800082e8bff8 x9 : ffff8000801276ac
[   76.263462] x8 : 00000000ffffefff x7 : ffff800082e8bff8 x6 : 0000000000000000
[   76.264182] x5 : ffff0013f5eb9d08 x4 : 0000000000000000 x3 : 0000000000000000
[   76.264903] x2 : 0000000000000000 x1 : ffff0000c105d640 x0 : 000000000000003e
[   76.265604] Call trace:
[   76.265865]  migrate_folio_done+0x140/0x150
[   76.266278]  migrate_pages_batch+0x9ec/0xff0
[   76.266716]  migrate_pages+0xd20/0xe20
[   76.267103]  compact_zone+0x7b4/0x1000
[   76.267460]  kcompactd_do_work+0x174/0x4d8
[   76.267869]  kcompactd+0x26c/0x418
[   76.268175]  kthread+0x120/0x130
[   76.268517]  ret_from_fork+0x10/0x20
[   76.268892] Code: aa1303e0 b000d161 9100c021 97fe0465 (d4210000) 
[   76.269447] ---[ end trace 0000000000000000 ]---
[   76.269893] note: kcompactd0[62] exited with irqs disabled
[   76.269942] page: refcount:0 mapcount:1 mapping:0000000000000000 index:0xffffbd0a0 pfn:0x2554a0
[   76.270483] note: kcompactd0[62] exited with preempt_count 1
[   76.271344] head: order:0 entire_mapcount:1 nr_pages_mapped:0 pincount:0
[   76.272521] flags: 0xbfffc0000080058(uptodate|dirty|head|swapbacked|node=0|zone=2|lastcpupid=0xffff)
[   76.273265] page_type: 0xffffffff()
[   76.273542] raw: 0bfffc0000080058 dead000000000100 dead000000000122 0000000000000000
[   76.274368] raw: 0000000ffffbd0a0 0000000000000000 00000000ffffffff 0000000000000000
[   76.275043] head: 0bfffc0000080058 dead000000000100 dead000000000122 0000000000000000
[   76.275651] head: 0000000ffffbd0a0 0000000000000000 00000000ffffffff 0000000000000000
[   76.276407] head: 0bfffc0000000000 0000000000000000 fffffc0008552848 0000000000000000
[   76.277064] head: 0000001000000000 0000000000000000 00000000ffffffff 0000000000000000
[   76.277784] page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
[   76.278502] ------------[ cut here ]------------
[   76.278893] kernel BUG at include/linux/mm.h:1120!
[   76.279269] Internal error: Oops - BUG: 00000000f2000800 [#2] PREEMPT SMP
[   76.280144] Modules linked in:
[   76.280401] CPU: 6 PID: 1337 Comm: usemem Tainted: G    B D            6.8.0-rc5-00456-ga7f399ae964e #16
[   76.281214] Hardware name: linux,dummy-virt (DT)
[   76.281635] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   76.282256] pc : deferred_split_scan+0x2f0/0x340
[   76.282698] lr : deferred_split_scan+0x2f0/0x340
[   76.283082] sp : ffff80008681b830
[   76.283426] x29: ffff80008681b830 x28: ffff0000cd4fb3c0 x27: fffffc0008552800
[   76.284113] x26: 0000000000000001 x25: 00000000ffffffff x24: 0000000000000001
[   76.284914] x23: 0000000000000000 x22: fffffc0008552800 x21: ffff0000e9df7820
[   76.285590] x20: ffff80008681b898 x19: ffff0000e9df7818 x18: 0000000000000000
[   76.286271] x17: 0000000000000001 x16: 0000000000000001 x15: ffff0000c0617210
[   76.286927] x14: ffff0000c10b6558 x13: 0000000000000040 x12: 0000000000000228
[   76.287543] x11: 0000000000000040 x10: 0000000000000a90 x9 : ffff800080220ed8
[   76.288176] x8 : ffff0000cd4fbeb0 x7 : 0000000000000000 x6 : 0000000000000000
[   76.288842] x5 : ffff0013f5f35d08 x4 : 0000000000000000 x3 : 0000000000000000
[   76.289538] x2 : 0000000000000000 x1 : ffff0000cd4fb3c0 x0 : 000000000000003e
[   76.290201] Call trace:
[   76.290432]  deferred_split_scan+0x2f0/0x340
[   76.290856]  do_shrink_slab+0x144/0x460
[   76.291221]  shrink_slab+0x2e0/0x4e0
[   76.291513]  shrink_node+0x204/0x8a0
[   76.291831]  do_try_to_free_pages+0xd0/0x568
[   76.292192]  try_to_free_mem_cgroup_pages+0x128/0x2d0
[   76.292599]  try_charge_memcg+0x12c/0x650
[   76.292926]  __mem_cgroup_charge+0x6c/0xd0
[   76.293289]  __handle_mm_fault+0xe90/0x16a8
[   76.293713]  handle_mm_fault+0x70/0x2b0
[   76.294031]  do_page_fault+0x100/0x4c0
[   76.294343]  do_translation_fault+0xb4/0xd0
[   76.294694]  do_mem_abort+0x4c/0xa8
[   76.294968]  el0_da+0x2c/0x78
[   76.295202]  el0t_64_sync_handler+0xe4/0x158
[   76.295565]  el0t_64_sync+0x190/0x198
[   76.295860] Code: aa1603e0 d000d0e1 9100c021 97fdc715 (d4210000) 
[   76.296429] ---[ end trace 0000000000000000 ]---
[   76.296805] note: usemem[1337] exited with irqs disabled
[   76.297261] note: usemem[1337] exited with preempt_count 1



My test case is intended to stress swap:

  - Running in VM (on Ampere Altra) with 70 vCPUs and 80G RAM
  - Have a 35G block ram device (CONFIG_BLK_DEV_RAM & "brd.rd_nr=1 brd.rd_size=36700160")
  - the ramdisk is configured as the swap backend
  - run the test case in a memcg constrained to 40G (to force mem pressure)
  - test case has 70 processes, each allocating and writing 1G of RAM


swapoff -a
mkswap /dev/ram0
swapon -f /dev/ram0
cgcreate -g memory:/mmperfcgroup
echo 40G > /sys/fs/cgroup/mmperfcgroup/memory.max
cgexec -g memory:mmperfcgroup sudo -u $(whoami) bash
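
A couple of quick checks to confirm the setup took effect (optional; just what I glance at before starting):

--8<---
# the ramdisk should be the only active swap device, ~35G in size
swapon --show
grep ram0 /proc/swaps

# the memcg limit should read back as 40G in bytes (42949672960)
cat /sys/fs/cgroup/mmperfcgroup/memory.max
--8<---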

Then inside that second bash shell, run this script:

--8<---
function run_usemem_once {
        ./usemem -n 70 -O 1G | grep -v "free memory"
}

function run_usemem_multi {
        size=${1}
        for i in {1..2}; do
                echo "${size} THP ${i}"
                run_usemem_once
        done
}

# disable every THP size first
for f in /sys/kernel/mm/transparent_hugepage/hugepages-*/enabled; do echo never > "$f"; done
echo always > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled
run_usemem_multi "64K"
--8<---
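
Before kicking the test off, it's also worth confirming what actually ended up enabled (the exact set of hugepages-* directories depends on the base page size, so treat the expected output as indicative):

--8<---
# expect "[never]" everywhere except hugepages-64kB, which should show "[always]"
grep -H . /sys/kernel/mm/transparent_hugepage/hugepages-*/enabled
--8<---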

It will usually get through the first iteration of the loop in run_usemem_multi() and fail on the second. I've never seen it get all the way through both iterations.

"usemem" is from the vm-scalability suite. It just allocates and writes loads of anonymous memory (70 is concurrent processes, 1G is the amount of memory per process). Then the memory pressure from the cgroup causes lots of swap to happen.

> ---
>  mm/swap.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/swap.c b/mm/swap.c
> index dce5ea67ae05..6b697d33fa5b 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -1003,12 +1003,13 @@ void folios_put_refs(struct folio_batch *folios, unsigned int *refs)
>  		if (!folio_ref_sub_and_test(folio, nr_refs))
>  			continue;
>  
> -		if (folio_test_large(folio)) {
> +		/* hugetlb has its own memcg */
> +		if (folio_test_hugetlb(folio)) {

This still looks reasonable to me after re-review, so I have no idea what the problem is. I recall seeing some weird crashes when I looked at the original RFC, but didn't have time to debug them at the time. I wonder if the root cause is the same.

If you find a smoking gun, I'm happy to test it if the above is too painful to reproduce.

Thanks,
Ryan

>  			if (lruvec) {
>  				unlock_page_lruvec_irqrestore(lruvec, flags);
>  				lruvec = NULL;
>  			}
> -			__folio_put_large(folio);
> +			free_huge_folio(folio);
>  			continue;
>  		}
>  



