From: Ryan Roberts <ryan.roberts@arm.com>
To: "Matthew Wilcox (Oracle)" <willy@infradead.org>, linux-mm@kvack.org
Subject: Re: [RFC PATCH 00/14] Rearrange batched folio freeing
Date: Mon, 4 Sep 2023 14:25:41 +0100
Message-ID: <eeaf36cf-8e29-4de2-9e5a-9ec2a5e30c61@arm.com>
In-Reply-To: <20230825135918.4164671-1-willy@infradead.org>

On 25/08/2023 14:59, Matthew Wilcox (Oracle) wrote:
> Other than the obvious "remove calls to compound_head" changes, the
> fundamental belief here is that iterating a linked list is much slower
> than iterating an array (5-15x slower in my testing).  There's also
> an associated belief that since we iterate the batch of folios three
> times, we do better when the array is small (ie 15 entries) than we do
> with a batch that is hundreds of entries long, which only gives us the
> opportunity for the first pages to fall out of cache by the time we get
> to the end.
> 
> The one place where that probably falls down is "Free folios in a batch
> in shrink_folio_list()" where we'll flush the TLB once per batch instead
> of at the end.  That's going to take some benchmarking.
> 
> Matthew Wilcox (Oracle) (14):
>   mm: Make folios_put() the basis of release_pages()
>   mm: Convert free_unref_page_list() to use folios
>   mm: Add free_unref_folios()
>   mm: Use folios_put() in __folio_batch_release()
>   memcg: Add mem_cgroup_uncharge_folios()
>   mm: Remove use of folio list from folios_put()
>   mm: Use free_unref_folios() in put_pages_list()
>   mm: use __page_cache_release() in folios_put()
>   mm: Handle large folios in free_unref_folios()
>   mm: Allow non-hugetlb large folios to be batch processed
>   mm: Free folios in a batch in shrink_folio_list()
>   mm: Free folios directly in move_folios_to_lru()
>   memcg: Remove mem_cgroup_uncharge_list()
>   mm: Remove free_unref_page_list()
> 
>  include/linux/memcontrol.h |  24 ++---
>  include/linux/mm.h         |  19 +---
>  mm/internal.h              |   4 +-
>  mm/memcontrol.c            |  16 ++--
>  mm/mlock.c                 |   3 +-
>  mm/page_alloc.c            |  74 ++++++++-------
>  mm/swap.c                  | 180 ++++++++++++++++++++-----------------
>  mm/vmscan.c                |  51 +++++------
>  8 files changed, 181 insertions(+), 190 deletions(-)
> 

Hi Matthew,

I've been doing some benchmarking of this series, as promised, but have hit an oops. It doesn't appear to be easily reproducible, and I'm struggling to figure out the root cause, so I thought I would share it in case you have any ideas.


--->8---

[  205.942771] ================================================================================
[  205.951201] UBSAN: array-index-out-of-bounds in mm/page_alloc.c:668:46
[  205.957716] index 10283 is out of range for type 'list_head [6]'
[  205.963774] ================================================================================
[  205.972200] Unable to handle kernel paging request at virtual address 006808190c06670f
[  205.980103] Mem abort info:
[  205.982884]   ESR = 0x0000000096000044
[  205.986620]   EC = 0x25: DABT (current EL), IL = 32 bits
[  205.991918]   SET = 0, FnV = 0
[  205.994959]   EA = 0, S1PTW = 0
[  205.998088]   FSC = 0x04: level 0 translation fault
[  206.002952] Data abort info:
[  206.005819]   ISV = 0, ISS = 0x00000044, ISS2 = 0x00000000
[  206.011291]   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
[  206.016329]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  206.021627] [006808190c06670f] address between user and kernel address ranges
[  206.028749] Internal error: Oops: 0000000096000044 [#1] SMP
[  206.034310] Modules linked in: nfs lockd grace sunrpc fscache netfs nls_iso8859_1 scsi_dh_rdac scsi_dh_emc scsi_dh_alua drm xfs btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib ib_uverbs ib_core mlx5_core mlxfw pci_hyperv_intf crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce tls nvme psample nvme_core aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher
[  206.075579] CPU: 43 PID: 159703 Comm: git Not tainted 6.5.0-rc4-ryarob01-all #1
[  206.082875] Hardware name: WIWYNN Mt.Jade Server System B81.03001.0005/Mt.Jade Motherboard, BIOS 1.08.20220218 (SCP: 1.08.20220218) 2022/02/18
[  206.095638] pstate: 004000c9 (nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  206.102587] pc : free_pcppages_bulk+0x330/0x7f8
[  206.107104] lr : free_pcppages_bulk+0x7a8/0x7f8
[  206.111622] sp : ffff8000aeef3680
[  206.114923] x29: ffff8000aeef3680 x28: 000000000000282b x27: 00000000000000fc
[  206.122046] x26: 000000008015a39a x25: ffff08181ef9e840 x24: ffff0818836caf80
[  206.129169] x23: 0000000000000001 x22: 0000000000000000 x21: ffff08181ef9e850
[  206.136292] x20: fffffc200368e680 x19: fffffc200368e6c0 x18: 0000000000000000
[  206.143414] x17: 3d3d3d3d3d3d3d3d x16: 3d3d3d3d3d3d3d3d x15: 3d3d3d3d3d3d3d3d
[  206.150537] x14: 3d3d3d3d3d3d3d3d x13: 3d3d3d3d3d3d3d3d x12: 3d3d3d3d3d3d3d3d
[  206.157659] x11: 3d3d3d3d3d3d3d3d x10: 3d3d3d3d3d3d3d3d x9 : fffffc200368e688
[  206.164782] x8 : fffffc200368e680 x7 : 205d343737333639 x6 : ffff08181dee0000
[  206.171904] x5 : ffff0818836caf80 x4 : 0000000000000000 x3 : 0000000000000001
[  206.179027] x2 : ffff0818836f3330 x1 : ffff0818836f3230 x0 : 006808190c066707
[  206.186149] Call trace:
[  206.188582]  free_pcppages_bulk+0x330/0x7f8
[  206.192752]  free_unref_page_commit+0x15c/0x250
[  206.197270]  free_unref_folios+0x37c/0x4a8
[  206.201353]  release_unref_folios+0xac/0xf8
[  206.205524]  folios_put+0xe0/0x1f0
[  206.208914]  __folio_batch_release+0x34/0x88
[  206.213170]  truncate_inode_pages_range+0x160/0x540
[  206.218035]  truncate_inode_pages_final+0x58/0x90
[  206.222726]  ext4_evict_inode+0x164/0x900
[  206.226722]  evict+0xac/0x160
[  206.229678]  iput+0x170/0x228
[  206.232633]  do_unlinkat+0x1d0/0x290
[  206.236196]  __arm64_sys_unlinkat+0x48/0x98
[  206.240367]  invoke_syscall+0x74/0xf8
[  206.244016]  el0_svc_common.constprop.0+0x58/0x130
[  206.248793]  do_el0_svc+0x40/0xa8
[  206.252095]  el0_svc+0x2c/0xb8
[  206.255137]  el0t_64_sync_handler+0xc0/0xc8
[  206.259308]  el0t_64_sync+0x1a8/0x1b0
[  206.262958] Code: 8b0110a1 8b050305 8b010301 f9408020 (f9000409) 
[  206.269039] ---[ end trace 0000000000000000 ]---

--->8---


UBSAN is complaining about migratetype being out of range here:

/* Used for pages not on another list */
static inline void add_to_free_list(struct page *page, struct zone *zone,
				    unsigned int order, int migratetype)
{
	struct free_area *area = &zone->free_area[order];

	list_add(&page->buddy_list, &area->free_list[migratetype]);
	area->nr_free++;
}

I think that is called from __free_one_page(), which is in turn called from free_pcppages_bulk() at the top of the stack trace; the 'list_head [6]' that UBSAN reports is free_area's free_list[MIGRATE_TYPES]. migratetype originates from get_pcppage_migratetype(page), which just reads page->index. But I can't see where that might be getting corrupted, or how your changes or mine could affect it.
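
For context, the pcp migratetype helpers involved are just trivial wrappers around page->index; a rough paraphrase of what's in mm/page_alloc.c (the exact form in your tree may differ slightly):

static inline int get_pcppage_migratetype(struct page *page)
{
	/* For pages on a pcp list, the migratetype is stashed in page->index */
	return page->index;
}

static inline void set_pcppage_migratetype(struct page *page, int migratetype)
{
	page->index = migratetype;
}

So a value like 10283 presumably means page->index was scribbled on after set_pcppage_migratetype(), or the page was put on the pcp list without ever going through it.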




Config: this particular kernel actually has my large anon folios (LAF) and mmu_gather series applied, as well as this series from you. The issue is happening for file-backed memory, though, so I'm hoping it's not related to LAF.

I have modified your series in a couple of relevant ways, though:

 - I'm using `pcp_allowed_order(order)` instead of a direct compare against PAGE_ALLOC_COSTLY_ORDER in free_unref_folios(), as I suggested in the review (see the sketch after the code below).

 - I've refactored folios_put() as you suggested in another thread, to make the implementation of my free_folio_regions_and_swap_cache() simpler:


static void release_unref_folios(struct folio_batch *folios)
{
	struct lruvec *lruvec = NULL;
	unsigned long flags = 0;
	int i;

	for (i = 0; i < folios->nr; i++)
		__page_cache_release(folios->folios[i], &lruvec, &flags);

	if (lruvec)
		unlock_page_lruvec_irqrestore(lruvec, flags);

	mem_cgroup_uncharge_folios(folios);
	free_unref_folios(folios);
}

/**
 * free_folio_regions_and_swap_cache - free swap cache and put refs
 * @folios: array of `struct folio_region`s to release
 * @nr: number of folio regions in array
 *
 * Each `struct folio_region` describes the start and end pfn of a region within
 * a single folio. The folio's swap cache is freed, then the folio reference
 * count is decremented once for each pfn in the region. If it falls to zero,
 * remove the folio from the LRU and free it.
 */
void free_folio_regions_and_swap_cache(struct folio_region *folios, int nr)
{
	struct folio_batch fbatch;
	unsigned int i;

	folio_batch_init(&fbatch);
	lru_add_drain();

	for (i = 0; i < nr; i++) {
		struct folio *folio = pfn_folio(folios[i].pfn_start);
		int refs = folios[i].pfn_end - folios[i].pfn_start;

		free_swap_cache(folio);

		if (is_huge_zero_page(&folio->page))
			continue;

		if (folio_is_zone_device(folio)) {
			if (put_devmap_managed_page_refs(&folio->page, refs))
				continue;
			if (folio_ref_sub_and_test(folio, refs))
				free_zone_device_page(&folio->page);
			continue;
		}

		if (!folio_ref_sub_and_test(folio, refs))
			continue;

		/* hugetlb has its own memcg */
		if (folio_test_hugetlb(folio)) {
			free_huge_folio(folio);
			continue;
		}

		if (folio_batch_add(&fbatch, folio) == 0)
			release_unref_folios(&fbatch);
	}

	if (fbatch.nr)
		release_unref_folios(&fbatch);
}
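
For reference, the folio_region layout assumed above is just a start/end pfn pair; this is my shorthand for the structure from the pfn_range_put() discussion, so take the exact name and fields with a pinch of salt:

struct folio_region {
	unsigned long pfn_start;	/* first pfn of the region (inclusive) */
	unsigned long pfn_end;		/* one past the last pfn of the region */
};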

/**
 * folios_put - Decrement the reference count on a batch of folios.
 * @folios: The folios.
 *
 * Like folio_put(), but for a batch of folios.  This is more efficient
 * than writing the loop yourself as it will optimise the locks which need
 * to be taken if the folios are freed.  The folios batch is returned
 * empty and ready to be reused for another batch; there is no need to
 * reinitialise it.
 *
 * Context: May be called in process or interrupt context, but not in NMI
 * context.  May be called while holding a spinlock.
 */
void folios_put(struct folio_batch *folios)
{
	int i, j;

	for (i = 0, j = 0; i < folios->nr; i++) {
		struct folio *folio = folios->folios[i];

		if (is_huge_zero_page(&folio->page))
			continue;

		if (folio_is_zone_device(folio)) {
			if (put_devmap_managed_page(&folio->page))
				continue;
			if (folio_put_testzero(folio))
				free_zone_device_page(&folio->page);
			continue;
		}

		if (!folio_put_testzero(folio))
			continue;

		/* hugetlb has its own memcg */
		if (folio_test_hugetlb(folio)) {
			free_huge_folio(folio);
			continue;
		}

		if (j != i)
			folios->folios[j] = folio;
		j++;
	}

	folios->nr = j;
	if (!j)
		return;
	release_unref_folios(folios);
}
EXPORT_SYMBOL(folios_put);
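
For completeness, the pcp_allowed_order() change mentioned in the first bullet above just reuses the existing helper from mm/page_alloc.c, which (assuming CONFIG_TRANSPARENT_HUGEPAGE) looks roughly like:

static inline bool pcp_allowed_order(unsigned int order)
{
	if (order <= PAGE_ALLOC_COSTLY_ORDER)
		return true;
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
	if (order == pageblock_order)
		return true;
#endif
	return false;
}

i.e. as well as orders up to PAGE_ALLOC_COSTLY_ORDER, pageblock_order (PMD-size) folios are also allowed onto the pcp lists, which is the case free_unref_folios() needs to handle for THPs.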


Let me know if you have any thoughts.

Thanks,
Ryan



