linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mike Kravetz <mike.kravetz@oracle.com>
To: Muchun Song <songmuchun@bytedance.com>,
	corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, x86@kernel.org, hpa@zytor.com,
	dave.hansen@linux.intel.com, luto@kernel.org,
	peterz@infradead.org, viro@zeniv.linux.org.uk,
	akpm@linux-foundation.org, paulmck@kernel.org,
	pawan.kumar.gupta@linux.intel.com, rdunlap@infradead.org,
	oneukum@suse.com, anshuman.khandual@arm.com, jroedel@suse.de,
	almasrymina@google.com, rientjes@google.com, willy@infradead.org,
	osalvador@suse.de, mhocko@suse.com, song.bao.hua@hisilicon.com,
	david@redhat.com, naoya.horiguchi@nec.com,
	joao.m.martins@oracle.com
Cc: duanxiongchun@bytedance.com, fam.zheng@bytedance.com,
	zhengqi.arch@bytedance.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v21 6/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page
Date: Wed, 28 Apr 2021 19:42:39 -0700	[thread overview]
Message-ID: <956a52e4-3a2c-b0dc-3b5d-dd651a4aa54a@oracle.com> (raw)
In-Reply-To: <20210425070752.17783-7-songmuchun@bytedance.com>

On 4/25/21 12:07 AM, Muchun Song wrote:
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index d523a345dc86..d3abaaec2a22 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -525,6 +525,7 @@ unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
>   *	code knows it has only reference.  All other examinations and
>   *	modifications require hugetlb_lock.
>   * HPG_freed - Set when page is on the free lists.
> + * HPG_vmemmap_optimized - Set when the vmemmap pages of the page are freed.
>   *	Synchronization: hugetlb_lock held for examination and modification.
>   */
>  enum hugetlb_page_flags {
> @@ -532,6 +533,7 @@ enum hugetlb_page_flags {
>  	HPG_migratable,
>  	HPG_temporary,
>  	HPG_freed,
> +	HPG_vmemmap_optimized,
>  	__NR_HPAGEFLAGS,
>  };
>  
> @@ -577,6 +579,7 @@ HPAGEFLAG(RestoreReserve, restore_reserve)
>  HPAGEFLAG(Migratable, migratable)
>  HPAGEFLAG(Temporary, temporary)
>  HPAGEFLAG(Freed, freed)
> +HPAGEFLAG(VmemmapOptimized, vmemmap_optimized)
>  
>  #ifdef CONFIG_HUGETLB_PAGE
>  

During migration, the page->private field of the original page may be
cleared.  This will clear all hugetlb specific flags.  Prior to this
new flag that was OK, as the only flag which could be set during migration
was the Temporary flag and that is transfered to the target page.

If VmemmapOptimized optimized flag is cleared in the original page, we
will get an addressing exception as shown below.

We should preserve page->private with something like this:

diff --git a/mm/migrate.c b/mm/migrate.c
index b234c3f3acb7..128e3e4126a2 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -625,7 +625,9 @@ void migrate_page_states(struct page *newpage, struct page *page)
 	if (PageSwapCache(page))
 		ClearPageSwapCache(page);
 	ClearPagePrivate(page);
-	set_page_private(page, 0);
+	/* page->private contains hugetlb specific flags */
+	if (!PageHuge(page))
+		set_page_private(page, 0);
 
 	/*
 	 * If any waiters have accumulated on the new page then

-- 
Mike Kravetz


[  209.568110] BUG: unable to handle page fault for address: ffffea0004a5a000
[  209.569417] #PF: supervisor write access in kernel mode
[  209.570932] #PF: error_code(0x0003) - permissions violation
[  209.572059] PGD 23fff8067 P4D 23fff8067 PUD 23fff7067 PMD 23ffd9067 PTE 800000021c98e061
[  209.573679] Oops: 0003 [#1] SMP PTI
[  209.574410] CPU: 1 PID: 1011 Comm: bash Not tainted 5.12.0-rc8-mm1+ #3
[  209.575730] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
[  209.577530] RIP: 0010:__update_and_free_page+0x58/0x2c0
[  209.578618] Code: a3 01 00 00 49 b8 00 00 00 00 00 16 00 00 4c 89 e0 bf 01 00 00 00 49 b9 00 00 00 00 00 ea ff ff 4d 01 e0 49 c1 f8 06 83 c2 01 <48> 81 20 d4 5e ff ff 48 83 c0 40 f7 c2 ff 03 00 00 0f 84 f3 00 00
[  209.582603] RSP: 0018:ffffc90001fdfa60 EFLAGS: 00010206
[  209.583629] RAX: ffffea0004a5a000 RBX: 0000000000000000 RCX: 0000000000000009
[  209.585148] RDX: 0000000000000081 RSI: 0000000000000200 RDI: 0000000000000001
[  209.586649] RBP: ffffffff839ada30 R08: 0000000000129600 R09: ffffea0000000000
[  209.588096] R10: 0000000000000001 R11: 0000000000000001 R12: ffffea0004a58000
[  209.589643] R13: 0000000000000200 R14: ffffea0005ff8000 R15: ffffc90001fdfba0
[  209.591194] FS:  00007f1e50065740(0000) GS:ffff888237d00000(0000) knlGS:0000000000000000
[  209.592989] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  209.594222] CR2: ffffea0004a5a000 CR3: 000000018cd46004 CR4: 0000000000370ee0
[  209.595762] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  209.597302] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  209.598925] Call Trace:
[  209.599496]  migrate_pages+0xd8f/0x1030
[  209.600372]  ? trace_event_raw_event_mm_migrate_pages_start+0xa0/0xa0
[  209.601745]  ? alloc_migration_target+0x1c0/0x1c0
[  209.602787]  alloc_contig_range+0x1e3/0x3d0
[  209.603718]  cma_alloc+0x1ae/0x5f0
[  209.604486]  alloc_fresh_huge_page+0x67/0x190
[  209.605481]  alloc_pool_huge_page+0x72/0xf0
[  209.606423]  set_max_huge_pages+0x128/0x2c0
[  209.607369]  __nr_hugepages_store_common+0x3d/0xb0
[  209.608442]  ? _kstrtoull+0x35/0xd0
[  209.609225]  nr_hugepages_store+0x73/0x80
[  209.610140]  kernfs_fop_write_iter+0x127/0x1c0
[  209.611162]  new_sync_write+0x11f/0x1b0
[  209.612069]  vfs_write+0x26f/0x380
[  209.612880]  ksys_write+0x68/0xe0
[  209.613628]  do_syscall_64+0x40/0x80
[  209.614456]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  209.615589] RIP: 0033:0x7f1e50155ff8
[  209.616474] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 77 0d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
[  209.620629] RSP: 002b:00007ffd7e3f97c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  209.622319] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f1e50155ff8
[  209.623966] RDX: 0000000000000002 RSI: 00005585ef557960 RDI: 0000000000000001
[  209.625568] RBP: 00005585ef557960 R08: 000000000000000a R09: 00007f1e501e7e80
[  209.627262] R10: 000000000000000a R11: 0000000000000246 R12: 00007f1e50229780
[  209.628916] R13: 0000000000000002 R14: 00007f1e50224740 R15: 0000000000000002
[  209.630457] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack rfkill nf_defrag_ipv6 nf_defrag_ipv4 ebtable_filter ebtables 9p ip6table_filter ip6_tables sunrpc snd_hda_codec_generic crct10dif_pclmul crc32_pclmul snd_hda_intel snd_intel_dspcfg ghash_clmulni_intel snd_hda_codec snd_hwdep joydev snd_hda_core snd_seq snd_seq_device snd_pcm virtio_balloon snd_timer snd soundcore 9pnet_virtio i2c_piix4 9pnet virtio_blk virtio_console virtio_net net_failover failover 8139too qxl drm_ttm_helper ttm drm_kms_helper crc32c_intel serio_raw drm 8139cp mii ata_generic virtio_pci virtio_pci_modern_dev virtio_ring pata_acpi virtio
[  209.647105] CR2: ffffea0004a5a000
[  209.647913] ---[ end trace 48e9b007521233a7 ]---

  reply	other threads:[~2021-04-29  2:43 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-25  7:07 [PATCH v21 0/9] Free some vmemmap pages of HugeTLB page Muchun Song
2021-04-25  7:07 ` [PATCH v21 1/9] mm: memory_hotplug: factor out bootmem core functions to bootmem_info.c Muchun Song
2021-04-25  7:07 ` [PATCH v21 2/9] mm: hugetlb: introduce a new config HUGETLB_PAGE_FREE_VMEMMAP Muchun Song
2021-04-25  7:07 ` [PATCH v21 3/9] mm: hugetlb: gather discrete indexes of tail page Muchun Song
2021-04-25  7:07 ` [PATCH v21 4/9] mm: hugetlb: free the vmemmap pages associated with each HugeTLB page Muchun Song
2021-04-25  7:07 ` [PATCH v21 5/9] mm: hugetlb: defer freeing of HugeTLB pages Muchun Song
2021-04-25  7:07 ` [PATCH v21 6/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page Muchun Song
2021-04-29  2:42   ` Mike Kravetz [this message]
2021-04-29  4:05     ` [External] " Muchun Song
2021-04-25  7:07 ` [PATCH v21 7/9] mm: hugetlb: add a kernel parameter hugetlb_free_vmemmap Muchun Song
2021-04-25  7:07 ` [PATCH v21 8/9] mm: memory_hotplug: disable memmap_on_memory when hugetlb_free_vmemmap enabled Muchun Song
2021-04-25  7:07 ` [PATCH v21 9/9] mm: hugetlb: introduce nr_free_vmemmap_pages in the struct hstate Muchun Song
2021-04-27 23:47 ` [PATCH v21 0/9] Free some vmemmap pages of HugeTLB page Mike Kravetz
2021-04-28 12:26   ` [External] " Muchun Song
2021-04-29  2:31     ` Mike Kravetz
2021-04-29  4:02       ` Muchun Song
2021-04-29 22:23         ` Mike Kravetz
2021-04-29 23:19           ` Mike Kravetz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=956a52e4-3a2c-b0dc-3b5d-dd651a4aa54a@oracle.com \
    --to=mike.kravetz@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=almasrymina@google.com \
    --cc=anshuman.khandual@arm.com \
    --cc=bp@alien8.de \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=duanxiongchun@bytedance.com \
    --cc=fam.zheng@bytedance.com \
    --cc=hpa@zytor.com \
    --cc=joao.m.martins@oracle.com \
    --cc=jroedel@suse.de \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mhocko@suse.com \
    --cc=mingo@redhat.com \
    --cc=naoya.horiguchi@nec.com \
    --cc=oneukum@suse.com \
    --cc=osalvador@suse.de \
    --cc=paulmck@kernel.org \
    --cc=pawan.kumar.gupta@linux.intel.com \
    --cc=peterz@infradead.org \
    --cc=rdunlap@infradead.org \
    --cc=rientjes@google.com \
    --cc=song.bao.hua@hisilicon.com \
    --cc=songmuchun@bytedance.com \
    --cc=tglx@linutronix.de \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    --cc=zhengqi.arch@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).