From: Qian Cai <quic_qiancai@quicinc.com>
To: Zi Yan <ziy@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>, <linux-mm@kvack.org>,
<linux-kernel@vger.kernel.org>,
<virtualization@lists.linux-foundation.org>,
Vlastimil Babka <vbabka@suse.cz>,
Mel Gorman <mgorman@techsingularity.net>,
Eric Ren <renzhengeek@gmail.com>, Mike Rapoport <rppt@kernel.org>,
"Oscar Salvador" <osalvador@suse.de>,
Christophe Leroy <christophe.leroy@csgroup.eu>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v11 0/6] Use pageblock_order for cma and alloc_contig_range alignment.
Date: Tue, 26 Apr 2022 16:18:55 -0400
Message-ID: <20220426201855.GA1014@qian>
In-Reply-To: <20220425143118.2850746-1-zi.yan@sent.com>
On Mon, Apr 25, 2022 at 10:31:12AM -0400, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
>
> Hi David,
>
> This patchset tries to remove the MAX_ORDER-1 alignment requirement for CMA
> and alloc_contig_range(). It prepares for my upcoming changes to make
> MAX_ORDER adjustable at boot time[1]. It is on top of mmotm-2022-04-20-17-12.
>
> Changelog
> ===
> V11
> ---
> 1. Moved start_isolate_page_range()/undo_isolate_page_range() alignment
> change to a separate patch after the unmovable page check change and
> alloc_contig_range() change to avoid some unwanted memory
> hotplug/hotremove failures.
> 2. Cleaned up has_unmovable_pages() in Patch 2.
>
> V10
> ---
> 1. Reverted back to the original outer_start, outer_end range for
> test_pages_isolated() and isolate_freepages_range() in Patch 3,
> otherwise isolation will fail if start in alloc_contig_range() is in
> the middle of a free page.
>
> V9
> ---
> 1. Limited has_unmovable_pages() check within a pageblock.
> 2. Added a check to ensure page isolation is done within a single zone
> in isolate_single_pageblock().
> 3. Fixed an off-by-one bug in isolate_single_pageblock().
> 4. Fixed a NULL-dereferencing bug when the pages before the to-be-isolated
> pageblock are not online in isolate_single_pageblock().
>
> V8
> ---
> 1. Cleaned up has_unmovable_pages() to remove page argument.
>
> V7
> ---
> 1. Added page validity check in isolate_single_pageblock() to avoid out
> of zone pages.
> 2. Fixed a bug in split_free_page() to split and free pages in correct
> page order.
>
> V6
> ---
> 1. Resolved compilation error/warning reported by kernel test robot.
> 2. Tried to solve the coding concerns from Christophe Leroy.
> 3. Shortened lengthy lines (pointed out by Christoph Hellwig).
>
> V5
> ---
> 1. Moved isolation address alignment handling in start_isolate_page_range().
> 2. Rewrote and simplified how alloc_contig_range() works at pageblock
> granularity (Patch 3). Only two pageblock migratetypes need to be saved and
> restored. start_isolate_page_range() might need to migrate pages in this
> version, but it prevents the caller from worrying about
> max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) alignment after the page range
> is isolated.
>
> V4
> ---
> 1. Dropped two irrelevant patches on non-lru compound page handling, as
> it is not supported upstream.
> 2. Renamed migratetype_has_fallback() to migratetype_is_mergeable().
> 3. Always check whether two pageblocks can be merged in
> __free_one_page() when order is >= pageblock_order, as the case (not
> mergeable pageblocks are isolated, CMA, and HIGHATOMIC) becomes more common.
> 4. Moving has_unmovable_pages() is now a separate patch.
> 5. Removed MAX_ORDER-1 alignment requirement in the comment in virtio_mem code.
>
> Description
> ===
>
> The MAX_ORDER - 1 alignment requirement exists because alloc_contig_range()
> isolates pageblocks to remove free memory from the buddy allocator, but
> isolating only a subset of the pageblocks within a page that spans multiple
> pageblocks causes free page accounting issues. An isolated page might not be
> put into the right free list, since the code assumes the migratetype of the
> first pageblock is the migratetype of the whole free page. This is based on
> the discussion at [2].
>
> To remove the requirement, this patchset:
> 1. isolates pages at pageblock granularity instead of
> max(MAX_ORDER_NR_PAGES, pageblock_nr_pages);
> 2. splits free pages across the specified range or migrates in-use pages
> across the specified range then splits the freed page to avoid free page
> accounting issues (it happens when multiple pageblocks within a single page
> have different migratetypes);
> 3. only checks for unmovable pages within the range, instead of the
> MAX_ORDER - 1 aligned range, during isolation to avoid alloc_contig_range()
> failure when pageblocks within a MAX_ORDER - 1 aligned range are allocated
> separately;
> 4. returns pages not in the range as it did before.
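
To make the difference in granularity concrete, here is a minimal user-space
sketch of the range arithmetic only. The constants are assumptions
(pageblock_order = 9 and MAX_ORDER = 11, which would be typical for 4 KiB
pages on x86-64 at the time), and isolation_range() is a hypothetical helper,
not a function from this series.

```c
#include <assert.h>

/* Assumed values, for illustration only; they are not taken from this thread. */
#define PAGEBLOCK_ORDER     9UL
#define MAX_ORDER           11UL
#define PAGEBLOCK_NR_PAGES  (1UL << PAGEBLOCK_ORDER)      /* 512 pages  */
#define MAX_ORDER_NR_PAGES  (1UL << (MAX_ORDER - 1))      /* 1024 pages */

struct pfn_range {
	unsigned long start;	/* inclusive */
	unsigned long end;	/* exclusive */
};

/*
 * Hypothetical helper: widen [start, end) outward to multiples of
 * 'granule' pages. 'granule' must be a power of two.
 */
static struct pfn_range isolation_range(unsigned long start,
					unsigned long end,
					unsigned long granule)
{
	struct pfn_range r;

	assert(granule != 0 && (granule & (granule - 1)) == 0);
	r.start = start & ~(granule - 1);		/* round down */
	r.end = (end + granule - 1) & ~(granule - 1);	/* round up */
	return r;
}
```

Under these assumptions, a request for PFNs [0x10300, 0x10500) widens to
[0x10200, 0x10600) at pageblock granularity, but to [0x10000, 0x10800) at
MAX_ORDER_NR_PAGES granularity, so twice as many pages would have to be
isolated and checked.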
>
> One optimization might come later:
> 1. make MIGRATE_ISOLATE a separate bit to be able to restore the original
> migratetypes when isolation fails in the middle of the range.
>
> Feel free to give comments and suggestions. Thanks.
>
> [1] https://lore.kernel.org/linux-mm/20210805190253.2795604-1-zi.yan@sent.com/
> [2] https://lore.kernel.org/linux-mm/d19fb078-cb9b-f60f-e310-fdeea1b947d2@redhat.com/
>
> Zi Yan (6):
> mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c
> mm: page_isolation: check specified range for unmovable pages
> mm: make alloc_contig_range work at pageblock granularity
> mm: page_isolation: enable arbitrary range page isolation.
> mm: cma: use pageblock_order as the single alignment
> drivers: virtio_mem: use pageblock size as the minimum virtio_mem
> size.
>
> drivers/virtio/virtio_mem.c | 6 +-
> include/linux/cma.h | 4 +-
> include/linux/mmzone.h | 5 +-
> include/linux/page-isolation.h | 6 +-
> mm/internal.h | 6 +
> mm/memory_hotplug.c | 3 +-
> mm/page_alloc.c | 191 +++++-------------
> mm/page_isolation.c | 345 +++++++++++++++++++++++++++++++--
> 8 files changed, 392 insertions(+), 174 deletions(-)

Reverting this series fixed both a deadlock during memory offline/online
tests and a crash that followed it.

INFO: task kmemleak:1027 blocked for more than 120 seconds.
Not tainted 5.18.0-rc4-next-20220426-dirty #27
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kmemleak state:D stack:27744 pid: 1027 ppid: 2 flags:0x00000008
Call trace:
__switch_to
__schedule
schedule
percpu_rwsem_wait
__percpu_down_read
percpu_down_read.constprop.0
get_online_mems
kmemleak_scan
kmemleak_scan_thread
kthread
ret_from_fork
Showing all locks held in the system:
1 lock held by rcu_tasks_kthre/11:
#0: ffffc1e2cefc17f0 (rcu_tasks.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp
1 lock held by rcu_tasks_rude_/12:
#0: ffffc1e2cefc1a90 (rcu_tasks_rude.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp
1 lock held by rcu_tasks_trace/13:
#0: ffffc1e2cefc1db0 (rcu_tasks_trace.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp
1 lock held by khungtaskd/824:
#0: ffffc1e2cefc2820 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks
2 locks held by kmemleak/1027:
#0: ffffc1e2cf1aa628 (scan_mutex){+.+.}-{3:3}, at: kmemleak_scan_thread
#1: ffffc1e2cf14e690 (mem_hotplug_lock){++++}-{0:0}, at: get_online_mems
2 locks held by cppc_fie/1805:
1 lock held by in:imklog/2822:
8 locks held by tee/3334:
#0: ffff0816d65c9438 (sb_writers#6){.+.+}-{0:0}, at: vfs_write
#1: ffff40025438be88 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter
#2: ffff4000c8261eb0 (kn->active#298){.+.+}-{0:0}, at: kernfs_fop_write_iter
#3: ffffc1e2d0013f68 (device_hotplug_lock){+.+.}-{3:3}, at: online_store
#4: ffff0800cd8bb998 (&dev->mutex){....}-{3:3}, at: device_offline
#5: ffffc1e2ceed3750 (cpu_hotplug_lock){++++}-{0:0}, at: cpus_read_lock
#6: ffffc1e2cf14e690 (mem_hotplug_lock){++++}-{0:0}, at: offline_pages
#7: ffffc1e2cf13bf68 (pcp_batch_high_lock){+.+.}-{3:3}, at: zone_pcp_disable
__zone_set_pageset_high_and_batch at mm/page_alloc.c:7005
(inlined by) zone_pcp_disable at mm/page_alloc.c:9286

Later, running some kernel compilation workloads could trigger a crash.

Unable to handle kernel paging request at virtual address fffffbfffe000030
KASAN: maybe wild-memory-access in range [0x0003dffff0000180-0x0003dffff0000187]
Mem abort info:
ESR = 0x96000006
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x06: level 2 translation fault
Data abort info:
ISV = 0, ISS = 0x00000006
CM = 0, WnR = 0
swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000817545fd000
[fffffbfffe000030] pgd=00000817581e9003, p4d=00000817581e9003, pud=00000817581ea003, pmd=0000000000000000
Internal error: Oops: 96000006 [#1] PREEMPT SMP
Modules linked in: bridge stp llc cdc_ether usbnet ipmi_devintf ipmi_msghandler cppc_cpufreq fuse ip_tables x_tables ipv6 btrfs blake2b_generic libcrc32c xor xor_neon raid6_pq zstd_compress dm_mod nouveau drm_ttm_helper ttm crct10dif_ce mlx5_core drm_display_helper drm_kms_helper nvme mpt3sas xhci_pci nvme_core drm raid_class xhci_pci_renesas
CPU: 147 PID: 3334 Comm: tee Not tainted 5.18.0-rc4-next-20220426-dirty #27
pstate: 10400009 (nzcV daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : isolate_single_pageblock
lr : isolate_single_pageblock
sp : ffff80003e767500
x29: ffff80003e767500 x28: 0000000000000000 x27: ffff783c59963b1f
x26: dfff800000000000 x25: ffffc1e2ccb1d000 x24: ffffc1e2ccb1d8f8
x23: 00000000803bfe00 x22: ffffc1e2cee39098 x21: 0000000000000020
x20: 00000000803c0000 x19: fffffbfffe000000 x18: ffffc1e2cee37d1c
x17: 0000000000000000 x16: 1fffe8004a86f14c x15: 1fffe806c89e154a
x14: 1fffe8004a86f11c x13: 0000000000000004 x12: ffff783c5c455e6d
x11: 1ffff83c5c455e6c x10: ffff783c5c455e6c x9 : dfff800000000000
x8 : ffffc1e2e22af363 x7 : 0000000000000001 x6 : 0000000000000003
x5 : ffffc1e2e22af360 x4 : ffff783c5c455e6c x3 : ffff700007cece90
x2 : 0000000000000003 x1 : 0000000000000000 x0 : fffffbfffe000030
Call trace:
isolate_single_pageblock
PageBuddy at ./include/linux/page-flags.h:969 (discriminator 3)
(inlined by) isolate_single_pageblock at mm/page_isolation.c:414 (discriminator 3)
start_isolate_page_range
offline_pages
memory_subsys_offline
device_offline
online_store
dev_attr_store
sysfs_kf_write
kernfs_fop_write_iter
new_sync_write
vfs_write
ksys_write
__arm64_sys_write
invoke_syscall
el0_svc_common.constprop.0
do_el0_svc
el0_svc
el0t_64_sync_handler
el0t_64_sync
Code: 38fa6821 7100003f 7a411041 54000dca (b9403260)
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Oops: Fatal exception
SMP: stopping secondary CPUs
Kernel Offset: 0x41e2c0720000 from 0xffff800008000000
PHYS_OFFSET: 0x80000000
CPU features: 0x000,0021700d,19801c82
Memory Limit: none
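
As a rough cross-check of the oops above, the faulting instruction word from
the "Code:" line can be decoded by hand. The sketch below only uses values
copied from the log; the helper names are made up for illustration, and the
interpretation (a 32-bit load at offset 0x30 from the pointer in x19, which
would be consistent with the PageBuddy() check named in the trace) is an
assumption rather than a confirmed diagnosis.

```c
#include <stdint.h>

/* Values copied from the oops above. */
#define X19_VALUE	0xfffffbfffe000000ULL	/* register x19 */
#define FAULT_ADDR	0xfffffbfffe000030ULL	/* faulting virtual address */
#define FAULT_INSN	0xb9403260U		/* word in parentheses on the Code: line */

/* A64 "LDR Wt, [Xn, #imm]" (32-bit load, unsigned offset): bits 31:22 = 1011100101. */
static int is_ldr_w_uimm(uint32_t insn)
{
	return (insn & 0xffc00000U) == 0xb9400000U;
}

/* Destination register number Rt (bits 4:0). */
static unsigned int ldr_rt(uint32_t insn)
{
	return insn & 0x1f;
}

/* Base register number Rn (bits 9:5). */
static unsigned int ldr_rn(uint32_t insn)
{
	return (insn >> 5) & 0x1f;
}

/* Byte offset: imm12 (bits 21:10) scaled by the 4-byte access size. */
static unsigned int ldr_offset(uint32_t insn)
{
	return ((insn >> 10) & 0xfff) << 2;
}
```

With these helpers, FAULT_INSN decodes as "ldr w0, [x19, #48]", and
X19_VALUE + 0x30 equals FAULT_ADDR, which matches the reported address.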