From: Qian Cai <quic_qiancai@quicinc.com>
To: Zi Yan <ziy@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>, <linux-mm@kvack.org>,
	<linux-kernel@vger.kernel.org>,
	<virtualization@lists.linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Mel Gorman <mgorman@techsingularity.net>,
	Eric Ren <renzhengeek@gmail.com>, Mike Rapoport <rppt@kernel.org>,
	"Oscar Salvador" <osalvador@suse.de>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v11 0/6] Use pageblock_order for cma and alloc_contig_range alignment.
Date: Tue, 26 Apr 2022 16:18:55 -0400
Message-ID: <20220426201855.GA1014@qian>
In-Reply-To: <20220425143118.2850746-1-zi.yan@sent.com>

On Mon, Apr 25, 2022 at 10:31:12AM -0400, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
> 
> Hi David,
> 
> This patchset tries to remove the MAX_ORDER-1 alignment requirement for CMA
> and alloc_contig_range(). It prepares for my upcoming changes to make
> MAX_ORDER adjustable at boot time[1]. It is on top of mmotm-2022-04-20-17-12.
> 
> Changelog
> ===
> V11
> ---
> 1. Moved start_isolate_page_range()/undo_isolate_page_range() alignment
>    change to a separate patch after the unmovable page check change and
>    alloc_contig_range() change to avoid some unwanted memory
>    hotplug/hotremove failures.
> 2. Cleaned up has_unmovable_pages() in Patch 2.
> 
> V10
> ---
> 1. Reverted back to the original outer_start, outer_end range for
>    test_pages_isolated() and isolate_freepages_range() in Patch 3,
>    otherwise isolation will fail if start in alloc_contig_range() is in
>    the middle of a free page.
> 
> V9
> ---
> 1. Limited has_unmovable_pages() check within a pageblock.
> 2. Added a check to ensure page isolation is done within a single zone
>    in isolate_single_pageblock().
> 3. Fixed an off-by-one bug in isolate_single_pageblock().
> 4. Fixed a NULL-dereferencing bug when the pages before the to-be-isolated
>    pageblock are not online in isolate_single_pageblock().
> 
> V8
> ---
> 1. Cleaned up has_unmovable_pages() to remove page argument.
> 
> V7
> ---
> 1. Added page validity check in isolate_single_pageblock() to avoid out
>    of zone pages.
> 2. Fixed a bug in split_free_page() to split and free pages in correct
>    page order.
> 
> V6
> ---
> 1. Resolved compilation error/warning reported by kernel test robot.
> 2. Tried to solve the coding concerns from Christophe Leroy.
> 3. Shortened lengthy lines (pointed out by Christoph Hellwig).
> 
> V5
> ---
> 1. Moved isolation address alignment handling in start_isolate_page_range().
> 2. Rewrote and simplified how alloc_contig_range() works at pageblock
>    granularity (Patch 3). Only two pageblock migratetypes need to be saved and
>    restored. start_isolate_page_range() might need to migrate pages in this
>    version, but it prevents the caller from worrying about
>    max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) alignment after the page range
>    is isolated.
> 
> V4
> ---
> 1. Dropped two irrelevant patches on non-lru compound page handling, as
>    it is not supported upstream.
> 2. Renamed migratetype_has_fallback() to migratetype_is_mergeable().
> 3. Always check whether two pageblocks can be merged in
>    __free_one_page() when order is >= pageblock_order, as the case (not
>    mergeable pageblocks are isolated, CMA, and HIGHATOMIC) becomes more common.
> 4. Moving has_unmovable_pages() is now a separate patch.
> 5. Removed MAX_ORDER-1 alignment requirement in the comment in virtio_mem code.
> 
> Description
> ===
> 
> The MAX_ORDER - 1 alignment requirement exists because alloc_contig_range()
> isolates pageblocks to remove free memory from the buddy allocator, but
> isolating only a subset of the pageblocks within a free page that spans
> multiple pageblocks causes free page accounting issues. An isolated page might
> not be put into the right free list, since the code assumes the migratetype of
> the first pageblock is the migratetype of the whole free page. This is based
> on the discussion at [2].
> 
> To remove the requirement, this patchset:
> 1. isolates pages at pageblock granularity instead of
>    max(MAX_ORDER_NR_PAGES, pageblock_nr_pages);
> 2. splits free pages across the specified range or migrates in-use pages
>    across the specified range then splits the freed page to avoid free page
>    accounting issues (it happens when multiple pageblocks within a single page
>    have different migratetypes);
> 3. only checks unmovable pages within the range instead of MAX_ORDER - 1 aligned
>    range during isolation to avoid alloc_contig_range() failure when pageblocks
>    within a MAX_ORDER - 1 aligned range are allocated separately.
> 4. returns pages not in the range as it did before.
> 
> One optimization might come later:
> 1. make MIGRATE_ISOLATE a separate bit to be able to restore the original
>    migratetypes when isolation fails in the middle of the range.
> 
> Feel free to give comments and suggestions. Thanks.
> 
> [1] https://lore.kernel.org/linux-mm/20210805190253.2795604-1-zi.yan@sent.com/
> [2] https://lore.kernel.org/linux-mm/d19fb078-cb9b-f60f-e310-fdeea1b947d2@redhat.com/
> 
> Zi Yan (6):
>   mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c
>   mm: page_isolation: check specified range for unmovable pages
>   mm: make alloc_contig_range work at pageblock granularity
>   mm: page_isolation: enable arbitrary range page isolation.
>   mm: cma: use pageblock_order as the single alignment
>   drivers: virtio_mem: use pageblock size as the minimum virtio_mem
>     size.
> 
>  drivers/virtio/virtio_mem.c    |   6 +-
>  include/linux/cma.h            |   4 +-
>  include/linux/mmzone.h         |   5 +-
>  include/linux/page-isolation.h |   6 +-
>  mm/internal.h                  |   6 +
>  mm/memory_hotplug.c            |   3 +-
>  mm/page_alloc.c                | 191 +++++-------------
>  mm/page_isolation.c            | 345 +++++++++++++++++++++++++++++++--
>  8 files changed, 392 insertions(+), 174 deletions(-)
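
To make the alignment change described in the cover letter concrete, here is
a minimal sketch of how far a requested PFN range gets widened under the old
max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) alignment versus pageblock
alignment. This is not kernel code. The values pageblock_order = 9 and
MAX_ORDER = 11 are assumptions (common for a 4 KiB-page configuration), and
align_down(), align_up() and isolation_range() are hypothetical helpers
written only for this illustration.

```python
# Assumed, illustrative configuration values (not taken from the report).
PAGEBLOCK_ORDER = 9
MAX_ORDER = 11

pageblock_nr_pages = 1 << PAGEBLOCK_ORDER      # 512 pages per pageblock
max_order_nr_pages = 1 << (MAX_ORDER - 1)      # 1024 pages in the largest buddy page


def align_down(pfn, alignment):
    """Round pfn down to a multiple of alignment (a power of two)."""
    return pfn & ~(alignment - 1)


def align_up(pfn, alignment):
    """Round pfn up to a multiple of alignment (a power of two)."""
    return (pfn + alignment - 1) & ~(alignment - 1)


def isolation_range(start_pfn, end_pfn, alignment):
    """Hypothetical helper: widen [start_pfn, end_pfn) to the given alignment."""
    return align_down(start_pfn, alignment), align_up(end_pfn, alignment)


# A made-up request that is neither pageblock nor MAX_ORDER - 1 aligned.
start, end = 0x10300, 0x10500

old = isolation_range(start, end, max(max_order_nr_pages, pageblock_nr_pages))
new = isolation_range(start, end, pageblock_nr_pages)

print("old alignment:", [hex(p) for p in old], "pages:", old[1] - old[0])
print("new alignment:", [hex(p) for p in new], "pages:", new[1] - new[0])
```

With these assumed values, the old rule widens the request to 2048 pages and
the pageblock rule to 1024 pages. That is the sense in which isolation "at
pageblock granularity" touches less memory outside the requested range.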

Reverting this series fixed a deadlock during memory offline/online
tests and a subsequent crash.

 INFO: task kmemleak:1027 blocked for more than 120 seconds.
       Not tainted 5.18.0-rc4-next-20220426-dirty #27
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:kmemleak        state:D stack:27744 pid: 1027 ppid:     2 flags:0x00000008
 Call trace:
  __switch_to
  __schedule
  schedule
  percpu_rwsem_wait
  __percpu_down_read
  percpu_down_read.constprop.0
  get_online_mems
  kmemleak_scan
  kmemleak_scan_thread
  kthread
  ret_from_fork

 Showing all locks held in the system:
 1 lock held by rcu_tasks_kthre/11:
  #0: ffffc1e2cefc17f0 (rcu_tasks.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp
 1 lock held by rcu_tasks_rude_/12:
  #0: ffffc1e2cefc1a90 (rcu_tasks_rude.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp
 1 lock held by rcu_tasks_trace/13:
  #0: ffffc1e2cefc1db0 (rcu_tasks_trace.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp
 1 lock held by khungtaskd/824:
  #0: ffffc1e2cefc2820 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks
 2 locks held by kmemleak/1027:
  #0: ffffc1e2cf1aa628 (scan_mutex){+.+.}-{3:3}, at: kmemleak_scan_thread
  #1: ffffc1e2cf14e690 (mem_hotplug_lock){++++}-{0:0}, at: get_online_mems
 2 locks held by cppc_fie/1805:
 1 lock held by in:imklog/2822:
 8 locks held by tee/3334:
  #0: ffff0816d65c9438 (sb_writers#6){.+.+}-{0:0}, at: vfs_write
  #1: ffff40025438be88 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter
  #2: ffff4000c8261eb0 (kn->active#298){.+.+}-{0:0}, at: kernfs_fop_write_iter
  #3: ffffc1e2d0013f68 (device_hotplug_lock){+.+.}-{3:3}, at: online_store
  #4: ffff0800cd8bb998 (&dev->mutex){....}-{3:3}, at: device_offline
  #5: ffffc1e2ceed3750 (cpu_hotplug_lock){++++}-{0:0}, at: cpus_read_lock
  #6: ffffc1e2cf14e690 (mem_hotplug_lock){++++}-{0:0}, at: offline_pages
  #7: ffffc1e2cf13bf68 (pcp_batch_high_lock){+.+.}-{3:3}, at: zone_pcp_disable
 __zone_set_pageset_high_and_batch at mm/page_alloc.c:7005
 (inlined by) zone_pcp_disable at mm/page_alloc.c:9286

Later, running some kernel compilation workloads could trigger a crash.

 Unable to handle kernel paging request at virtual address fffffbfffe000030
 KASAN: maybe wild-memory-access in range [0x0003dffff0000180-0x0003dffff0000187]
 Mem abort info:
   ESR = 0x96000006
   EC = 0x25: DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
   FSC = 0x06: level 2 translation fault
 Data abort info:
   ISV = 0, ISS = 0x00000006
   CM = 0, WnR = 0
 swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000817545fd000
 [fffffbfffe000030] pgd=00000817581e9003, p4d=00000817581e9003, pud=00000817581ea003, pmd=0000000000000000
 Internal error: Oops: 96000006 [#1] PREEMPT SMP
 Modules linked in: bridge stp llc cdc_ether usbnet ipmi_devintf ipmi_msghandler cppc_cpufreq fuse ip_tables x_tables ipv6 btrfs blake2b_generic libcrc32c xor xor_neon raid6_pq zstd_compress dm_mod nouveau drm_ttm_helper ttm crct10dif_ce mlx5_core drm_display_helper drm_kms_helper nvme mpt3sas xhci_pci nvme_core drm raid_class xhci_pci_renesas
 CPU: 147 PID: 3334 Comm: tee Not tainted 5.18.0-rc4-next-20220426-dirty #27
 pstate: 10400009 (nzcV daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : isolate_single_pageblock
 lr : isolate_single_pageblock
 sp : ffff80003e767500
 x29: ffff80003e767500 x28: 0000000000000000 x27: ffff783c59963b1f
 x26: dfff800000000000 x25: ffffc1e2ccb1d000 x24: ffffc1e2ccb1d8f8
 x23: 00000000803bfe00 x22: ffffc1e2cee39098 x21: 0000000000000020
 x20: 00000000803c0000 x19: fffffbfffe000000 x18: ffffc1e2cee37d1c
 x17: 0000000000000000 x16: 1fffe8004a86f14c x15: 1fffe806c89e154a
 x14: 1fffe8004a86f11c x13: 0000000000000004 x12: ffff783c5c455e6d
 x11: 1ffff83c5c455e6c x10: ffff783c5c455e6c x9 : dfff800000000000
 x8 : ffffc1e2e22af363 x7 : 0000000000000001 x6 : 0000000000000003
 x5 : ffffc1e2e22af360 x4 : ffff783c5c455e6c x3 : ffff700007cece90
 x2 : 0000000000000003 x1 : 0000000000000000 x0 : fffffbfffe000030
 Call trace:
  isolate_single_pageblock
  PageBuddy at ./include/linux/page-flags.h:969 (discriminator 3)
  (inlined by) isolate_single_pageblock at mm/page_isolation.c:414 (discriminator 3)
  start_isolate_page_range
  offline_pages
  memory_subsys_offline
  device_offline
  online_store
  dev_attr_store
  sysfs_kf_write
  kernfs_fop_write_iter
  new_sync_write
  vfs_write
  ksys_write
  __arm64_sys_write
  invoke_syscall
  el0_svc_common.constprop.0
  do_el0_svc
  el0_svc
  el0t_64_sync_handler
  el0t_64_sync
 Code: 38fa6821 7100003f 7a411041 54000dca (b9403260)
 ---[ end trace 0000000000000000 ]---
 Kernel panic - not syncing: Oops: Fatal exception
 SMP: stopping secondary CPUs
 Kernel Offset: 0x41e2c0720000 from 0xffff800008000000
 PHYS_OFFSET: 0x80000000
 CPU features: 0x000,0021700d,19801c82
 Memory Limit: none
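
For readers decoding the oops above, the following sketch recomputes two of
the reported values from the raw numbers in the log. It assumes the standard
ESR_ELx layout (EC in bits [31:26], IL in bit 25, ISS in bits [24:0], with
the fault status code in ISS[5:0] and WnR in ISS bit 6 for data aborts). For
the KASAN line it assumes a shadow scale shift of 3 and a shadow offset of
0xdfff800000000000. That offset is not stated in the report; it is inferred
from the value held in x26/x9 and from the arithmetic matching the printed
range.

```python
MASK64 = (1 << 64) - 1

# --- ESR decode, using the value printed under "Mem abort info" ---
ESR = 0x96000006
ec = (ESR >> 26) & 0x3F      # exception class
il = (ESR >> 25) & 0x1       # 1 means a 32-bit instruction
iss = ESR & 0x1FFFFFF        # instruction specific syndrome
fsc = iss & 0x3F             # fault status code
wnr = (iss >> 6) & 0x1       # 0 means the access was a read

print(f"EC={ec:#x} IL={il} ISS={iss:#010x} FSC={fsc:#04x} WnR={wnr}")

# --- KASAN "maybe wild-memory-access" range ---
# Assumed constants, see the note above.
KASAN_SHADOW_OFFSET = 0xDFFF800000000000
KASAN_SHADOW_SCALE_SHIFT = 3

fault_addr = 0xFFFFFBFFFE000030   # faulting virtual address from the log

# Treat the faulting address as if it were a shadow address and map it back
# to the memory it would describe, with 64-bit wraparound.
range_start = ((fault_addr - KASAN_SHADOW_OFFSET) << KASAN_SHADOW_SCALE_SHIFT) & MASK64
range_end = range_start + (1 << KASAN_SHADOW_SCALE_SHIFT) - 1

print(f"range [{range_start:#018x}-{range_end:#018x}]")
```

The decoded fields agree with the log (EC = 0x25, FSC = 0x06, WnR = 0), and
the computed range matches the one KASAN printed. Under the assumptions
above, the KASAN line is therefore a mechanical translation of the faulting
address, not independent evidence about which memory was touched.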
