From: Zi Yan <ziy@nvidia.com>
To: Qian Cai <quic_qiancai@quicinc.com>
Cc: David Hildenbrand <david@redhat.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	Vlastimil Babka <vbabka@suse.cz>,
	Mel Gorman <mgorman@techsingularity.net>,
	Eric Ren <renzhengeek@gmail.com>, Mike Rapoport <rppt@kernel.org>,
	Oscar Salvador <osalvador@suse.de>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v11 0/6] Use pageblock_order for cma and alloc_contig_range alignment.
Date: Tue, 26 Apr 2022 16:26:08 -0400
Message-ID: <B621B4DD-5D11-4F0E-AFF5-F8684AE37E57@nvidia.com>
In-Reply-To: <20220426201855.GA1014@qian>

On 26 Apr 2022, at 16:18, Qian Cai wrote:

> On Mon, Apr 25, 2022 at 10:31:12AM -0400, Zi Yan wrote:
>> From: Zi Yan <ziy@nvidia.com>
>>
>> Hi David,
>>
>> This patchset tries to remove the MAX_ORDER-1 alignment requirement for CMA
>> and alloc_contig_range(). It prepares for my upcoming changes to make
>> MAX_ORDER adjustable at boot time[1]. It is on top of mmotm-2022-04-20-17-12.
>>
>> Changelog
>> ===
>> V11
>> ---
>> 1. Moved start_isolate_page_range()/undo_isolate_page_range() alignment
>>    change to a separate patch after the unmovable page check change and
>>    alloc_contig_range() change to avoid some unwanted memory
>>    hotplug/hotremove failures.
>> 2. Cleaned up has_unmovable_pages() in Patch 2.
>>
>> V10
>> ---
>> 1. Reverted back to the original outer_start, outer_end range for
>>    test_pages_isolated() and isolate_freepages_range() in Patch 3,
>>    otherwise isolation will fail if start in alloc_contig_range() is in
>>    the middle of a free page.
>>
>> V9
>> ---
>> 1. Limited has_unmovable_pages() check within a pageblock.
>> 2. Added a check to ensure page isolation is done within a single zone
>>    in isolate_single_pageblock().
>> 3. Fixed an off-by-one bug in isolate_single_pageblock().
>> 4. Fixed a NULL-dereferencing bug when the pages before the to-be-isolated
>>    pageblock are not online in isolate_single_pageblock().
>>
>> V8
>> ---
>> 1. Cleaned up has_unmovable_pages() to remove page argument.
>>
>> V7
>> ---
>> 1. Added page validity check in isolate_single_pageblock() to avoid out
>>    of zone pages.
>> 2. Fixed a bug in split_free_page() to split and free pages in correct
>>    page order.
>>
>> V6
>> ---
>> 1. Resolved compilation error/warning reported by kernel test robot.
>> 2. Tried to solve the coding concerns from Christophe Leroy.
>> 3. Shortened lengthy lines (pointed out by Christoph Hellwig).
>>
>> V5
>> ---
>> 1. Moved isolation address alignment handling in start_isolate_page_range().
>> 2. Rewrote and simplified how alloc_contig_range() works at pageblock
>>    granularity (Patch 3). Only two pageblock migratetypes need to be saved and
>>    restored. start_isolate_page_range() might need to migrate pages in this
>>    version, but it prevents the caller from worrying about
>>    max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) alignment after the page range
>>    is isolated.
>>
>> V4
>> ---
>> 1. Dropped two irrelevant patches on non-lru compound page handling, as
>>    it is not supported upstream.
>> 2. Renamed migratetype_has_fallback() to migratetype_is_mergeable().
>> 3. Always check whether two pageblocks can be merged in
>>    __free_one_page() when order is >= pageblock_order, as the case (not
>>    mergeable pageblocks are isolated, CMA, and HIGHATOMIC) becomes more common.
>> 4. Moving has_unmovable_pages() is now a separate patch.
>> 5. Removed MAX_ORDER-1 alignment requirement in the comment in virtio_mem code.
>>
>> Description
>> ===
>>
>> The MAX_ORDER - 1 alignment requirement exists because alloc_contig_range()
>> isolates pageblocks to remove free memory from the buddy allocator, but
>> isolating only a subset of the pageblocks within a free page that spans
>> multiple pageblocks causes free page accounting issues. An isolated page
>> might not be put into the right free list, since the code assumes the
>> migratetype of the first pageblock is the migratetype of the whole free
>> page. This is based on the discussion at [2].
>>
>> To remove the requirement, this patchset:
>> 1. isolates pages at pageblock granularity instead of
>>    max(MAX_ORDER_NR_PAGES, pageblock_nr_pages);
>> 2. splits free pages across the specified range or migrates in-use pages
>>    across the specified range then splits the freed page to avoid free page
>>    accounting issues (it happens when multiple pageblocks within a single page
>>    have different migratetypes);
>> 3. only checks unmovable pages within the range instead of MAX_ORDER - 1 aligned
>>    range during isolation to avoid alloc_contig_range() failure when pageblocks
>>    within a MAX_ORDER - 1 aligned range are allocated separately.
>> 4. returns pages not in the range as it did before.
>>
>> One optimization might come later:
>> 1. make MIGRATE_ISOLATE a separate bit to be able to restore the original
>>    migratetypes when isolation fails in the middle of the range.
>>
>> Feel free to give comments and suggestions. Thanks.
>>
>> [1] https://lore.kernel.org/linux-mm/20210805190253.2795604-1-zi.yan@sent.com/
>> [2] https://lore.kernel.org/linux-mm/d19fb078-cb9b-f60f-e310-fdeea1b947d2@redhat.com/
>>
>> Zi Yan (6):
>>   mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c
>>   mm: page_isolation: check specified range for unmovable pages
>>   mm: make alloc_contig_range work at pageblock granularity
>>   mm: page_isolation: enable arbitrary range page isolation.
>>   mm: cma: use pageblock_order as the single alignment
>>   drivers: virtio_mem: use pageblock size as the minimum virtio_mem
>>     size.
>>
>>  drivers/virtio/virtio_mem.c    |   6 +-
>>  include/linux/cma.h            |   4 +-
>>  include/linux/mmzone.h         |   5 +-
>>  include/linux/page-isolation.h |   6 +-
>>  mm/internal.h                  |   6 +
>>  mm/memory_hotplug.c            |   3 +-
>>  mm/page_alloc.c                | 191 +++++-------------
>>  mm/page_isolation.c            | 345 +++++++++++++++++++++++++++++++--
>>  8 files changed, 392 insertions(+), 174 deletions(-)
>
> Reverting this series fixed a deadlock during memory offline/online
> tests and then a crash.

Hi Qian,

Thanks for reporting the issue. Do you have a reproducer I can use to debug the code?

>
>  INFO: task kmemleak:1027 blocked for more than 120 seconds.
>        Not tainted 5.18.0-rc4-next-20220426-dirty #27
>  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>  task:kmemleak        state:D stack:27744 pid: 1027 ppid:     2 flags:0x00000008
>  Call trace:
>   __switch_to
>   __schedule
>   schedule
>   percpu_rwsem_wait
>   __percpu_down_read
>   percpu_down_read.constprop.0
>   get_online_mems
>   kmemleak_scan
>   kmemleak_scan_thread
>   kthread
>   ret_from_fork
>
>  Showing all locks held in the system:
>  1 lock held by rcu_tasks_kthre/11:
>   #0: ffffc1e2cefc17f0 (rcu_tasks.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp
>  1 lock held by rcu_tasks_rude_/12:
>   #0: ffffc1e2cefc1a90 (rcu_tasks_rude.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp
>  1 lock held by rcu_tasks_trace/13:
>   #0: ffffc1e2cefc1db0 (rcu_tasks_trace.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp
>  1 lock held by khungtaskd/824:
>   #0: ffffc1e2cefc2820 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks
>  2 locks held by kmemleak/1027:
>   #0: ffffc1e2cf1aa628 (scan_mutex){+.+.}-{3:3}, at: kmemleak_scan_thread
>   #1: ffffc1e2cf14e690 (mem_hotplug_lock){++++}-{0:0}, at: get_online_mems
>  2 locks held by cppc_fie/1805:
>  1 lock held by in:imklog/2822:
>  8 locks held by tee/3334:
>   #0: ffff0816d65c9438 (sb_writers#6){.+.+}-{0:0}, at: vfs_write
>   #1: ffff40025438be88 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter
>   #2: ffff4000c8261eb0 (kn->active#298){.+.+}-{0:0}, at: kernfs_fop_write_iter
>   #3: ffffc1e2d0013f68 (device_hotplug_lock){+.+.}-{3:3}, at: online_store
>   #4: ffff0800cd8bb998 (&dev->mutex){....}-{3:3}, at: device_offline
>   #5: ffffc1e2ceed3750 (cpu_hotplug_lock){++++}-{0:0}, at: cpus_read_lock
>   #6: ffffc1e2cf14e690 (mem_hotplug_lock){++++}-{0:0}, at: offline_pages
>   #7: ffffc1e2cf13bf68 (pcp_batch_high_lock){+.+.}-{3:3}, at: zone_pcp_disable
>  __zone_set_pageset_high_and_batch at mm/page_alloc.c:7005
>  (inlined by) zone_pcp_disable at mm/page_alloc.c:9286
>
> Later, running some kernel compilation workloads could trigger a crash.
>
>  Unable to handle kernel paging request at virtual address fffffbfffe000030
>  KASAN: maybe wild-memory-access in range [0x0003dffff0000180-0x0003dffff0000187]
>  Mem abort info:
>    ESR = 0x96000006
>    EC = 0x25: DABT (current EL), IL = 32 bits
>    SET = 0, FnV = 0
>    EA = 0, S1PTW = 0
>    FSC = 0x06: level 2 translation fault
>  Data abort info:
>    ISV = 0, ISS = 0x00000006
>    CM = 0, WnR = 0
>  swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000817545fd000
>  [fffffbfffe000030] pgd=00000817581e9003, p4d=00000817581e9003, pud=00000817581ea003, pmd=0000000000000000
>  Internal error: Oops: 96000006 [#1] PREEMPT SMP
>  Modules linked in: bridge stp llc cdc_ether usbnet ipmi_devintf ipmi_msghandler cppc_cpufreq fuse ip_tables x_tables ipv6 btrfs blake2b_generic libcrc32c xor xor_neon raid6_pq zstd_compress dm_mod nouveau drm_ttm_helper ttm crct10dif_ce mlx5_core drm_display_helper drm_kms_helper nvme mpt3sas xhci_pci nvme_core drm raid_class xhci_pci_renesas
>  CPU: 147 PID: 3334 Comm: tee Not tainted 5.18.0-rc4-next-20220426-dirty #27
>  pstate: 10400009 (nzcV daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>  pc : isolate_single_pageblock
>  lr : isolate_single_pageblock
>  sp : ffff80003e767500
>  x29: ffff80003e767500 x28: 0000000000000000 x27: ffff783c59963b1f
>  x26: dfff800000000000 x25: ffffc1e2ccb1d000 x24: ffffc1e2ccb1d8f8
>  x23: 00000000803bfe00 x22: ffffc1e2cee39098 x21: 0000000000000020
>  x20: 00000000803c0000 x19: fffffbfffe000000 x18: ffffc1e2cee37d1c
>  x17: 0000000000000000 x16: 1fffe8004a86f14c x15: 1fffe806c89e154a
>  x14: 1fffe8004a86f11c x13: 0000000000000004 x12: ffff783c5c455e6d
>  x11: 1ffff83c5c455e6c x10: ffff783c5c455e6c x9 : dfff800000000000
>  x8 : ffffc1e2e22af363 x7 : 0000000000000001 x6 : 0000000000000003
>  x5 : ffffc1e2e22af360 x4 : ffff783c5c455e6c x3 : ffff700007cece90
>  x2 : 0000000000000003 x1 : 0000000000000000 x0 : fffffbfffe000030
>  Call trace:
>   isolate_single_pageblock
>   PageBuddy at ./include/linux/page-flags.h:969 (discriminator 3)
>   (inlined by) isolate_single_pageblock at mm/page_isolation.c:414 (discriminator 3)
>   start_isolate_page_range
>   offline_pages
>   memory_subsys_offline
>   device_offline
>   online_store
>   dev_attr_store
>   sysfs_kf_write
>   kernfs_fop_write_iter
>   new_sync_write
>   vfs_write
>   ksys_write
>   __arm64_sys_write
>   invoke_syscall
>   el0_svc_common.constprop.0
>   do_el0_svc
>   el0_svc
>   el0t_64_sync_handler
>   el0t_64_sync
>  Code: 38fa6821 7100003f 7a411041 54000dca (b9403260)
>  ---[ end trace 0000000000000000 ]---
>  Kernel panic - not syncing: Oops: Fatal exception
>  SMP: stopping secondary CPUs
>  Kernel Offset: 0x41e2c0720000 from 0xffff800008000000
>  PHYS_OFFSET: 0x80000000
>  CPU features: 0x000,0021700d,19801c82
>  Memory Limit: none

--
Best Regards,
Yan, Zi
