From: Michael Roth <michael.roth@amd.com>
To: Chao Peng <chao.p.peng@linux.intel.com>
Cc: <kvm@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
<linux-mm@kvack.org>, <linux-fsdevel@vger.kernel.org>,
<linux-arch@vger.kernel.org>, <linux-api@vger.kernel.org>,
<linux-doc@vger.kernel.org>, <qemu-devel@nongnu.org>,
Paolo Bonzini <pbonzini@redhat.com>,
Jonathan Corbet <corbet@lwn.net>,
Sean Christopherson <seanjc@google.com>,
Vitaly Kuznetsov <vkuznets@redhat.com>,
Wanpeng Li <wanpengli@tencent.com>,
Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
<x86@kernel.org>, "H . Peter Anvin" <hpa@zytor.com>,
Hugh Dickins <hughd@google.com>, Jeff Layton <jlayton@kernel.org>,
"J . Bruce Fields" <bfields@fieldses.org>,
Andrew Morton <akpm@linux-foundation.org>,
Shuah Khan <shuah@kernel.org>, Mike Rapoport <rppt@kernel.org>,
Steven Price <steven.price@arm.com>,
"Maciej S . Szmigiero" <mail@maciej.szmigiero.name>,
Vlastimil Babka <vbabka@suse.cz>,
Vishal Annapurve <vannapurve@google.com>,
Yu Zhang <yu.c.zhang@linux.intel.com>,
"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
<luto@kernel.org>, <jun.nakajima@intel.com>,
<dave.hansen@intel.com>, <ak@linux.intel.com>, <david@redhat.com>,
<aarcange@redhat.com>, <ddutile@redhat.com>,
<dhildenb@redhat.com>, Quentin Perret <qperret@google.com>,
<tabba@google.com>, <mhocko@suse.com>,
Muchun Song <songmuchun@bytedance.com>, <wei.w.wang@intel.com>
Subject: Re: [PATCH v9 1/8] mm: Introduce memfd_restricted system call to create restricted user memory
Date: Mon, 28 Nov 2022 18:06:32 -0600
Message-ID: <20221129000632.sz6pobh6p7teouiu@amd.com>
In-Reply-To: <20221025151344.3784230-2-chao.p.peng@linux.intel.com>
On Tue, Oct 25, 2022 at 11:13:37PM +0800, Chao Peng wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>
<snip>
> +static struct file *restrictedmem_file_create(struct file *memfd)
> +{
> +	struct restrictedmem_data *data;
> +	struct address_space *mapping;
> +	struct inode *inode;
> +	struct file *file;
> +
> +	data = kzalloc(sizeof(*data), GFP_KERNEL);
> +	if (!data)
> +		return ERR_PTR(-ENOMEM);
> +
> +	data->memfd = memfd;
> +	mutex_init(&data->lock);
> +	INIT_LIST_HEAD(&data->notifiers);
> +
> +	inode = alloc_anon_inode(restrictedmem_mnt->mnt_sb);
> +	if (IS_ERR(inode)) {
> +		kfree(data);
> +		return ERR_CAST(inode);
> +	}
> +
> +	inode->i_mode |= S_IFREG;
> +	inode->i_op = &restrictedmem_iops;
> +	inode->i_mapping->private_data = data;
> +
> +	file = alloc_file_pseudo(inode, restrictedmem_mnt,
> +				 "restrictedmem", O_RDWR,
> +				 &restrictedmem_fops);
> +	if (IS_ERR(file)) {
> +		iput(inode);
> +		kfree(data);
> +		return ERR_CAST(file);
> +	}
> +
> +	file->f_flags |= O_LARGEFILE;
> +
> +	mapping = memfd->f_mapping;
> +	mapping_set_unevictable(mapping);
> +	mapping_set_gfp_mask(mapping,
> +			     mapping_gfp_mask(mapping) & ~__GFP_MOVABLE);

Is this supposed to prevent migration of pages being used for the
restrictedmem/shmem backend?

In my case I've been testing SNP support based on UPM v9, and for
large guests (128GB+), if I force 2M THPs via:

  echo always >/sys/kernel/mm/transparent_hugepage/shmem_enabled

it will in some cases trigger the trace below, which suggests that
kcompactd is trying to call migrate_folio() on a PFN that was/is
still allocated for guest private memory (and so has been removed from
the directmap as part of the shared->private conversion via the
REG_REGION KVM ioctl, leading to the crash). This trace seems to occur
during early OVMF boot, while the guest is in the middle of
pre-accepting private memory (no lazy accept in this case).

Is this expected behavior? What else needs to be done to ensure
migrations aren't attempted in this case?

Thanks!
-Mike
# Host logs with debug info for crash during SNP boot
...
[ 904.373632] kvm_restricted_mem_get_pfn: GFN: 0x1caced1, PFN: 0x156b7f, page: ffffea0006b197b0, ref_count: 2
[ 904.373634] kvm_restricted_mem_get_pfn: GFN: 0x1caced2, PFN: 0x156840, page: ffffea0006b09400, ref_count: 2
[ 904.373637] kvm_restricted_mem_get_pfn: GFN: 0x1caced3, PFN: 0x156841, page: ffffea0006b09450, ref_count: 2
[ 904.373639] kvm_restricted_mem_get_pfn: GFN: 0x1caced4, PFN: 0x156842, page: ffffea0006b094a0, ref_count: 2
[ 904.373641] kvm_restricted_mem_get_pfn: GFN: 0x1caced5, PFN: 0x156843, page: ffffea0006b094f0, ref_count: 2
[ 904.373645] kvm_restricted_mem_get_pfn: GFN: 0x1caced6, PFN: 0x156844, page: ffffea0006b09540, ref_count: 2
[ 904.373647] kvm_restricted_mem_get_pfn: GFN: 0x1caced7, PFN: 0x156845, page: ffffea0006b09590, ref_count: 2
[ 904.373649] kvm_restricted_mem_get_pfn: GFN: 0x1caced8, PFN: 0x156846, page: ffffea0006b095e0, ref_count: 2
[ 904.373652] kvm_restricted_mem_get_pfn: GFN: 0x1caced9, PFN: 0x156847, page: ffffea0006b09630, ref_count: 2
[ 904.373654] kvm_restricted_mem_get_pfn: GFN: 0x1caceda, PFN: 0x156848, page: ffffea0006b09680, ref_count: 2
[ 904.373656] kvm_restricted_mem_get_pfn: GFN: 0x1cacedb, PFN: 0x156849, page: ffffea0006b096d0, ref_count: 2
[ 904.373661] kvm_restricted_mem_get_pfn: GFN: 0x1cacedc, PFN: 0x15684a, page: ffffea0006b09720, ref_count: 2
[ 904.373663] kvm_restricted_mem_get_pfn: GFN: 0x1cacedd, PFN: 0x15684b, page: ffffea0006b09770, ref_count: 2
# PFN 0x15684c is allocated for guest private memory, and so will have been removed from the directmap as part of RMP requirements
[ 904.373665] kvm_restricted_mem_get_pfn: GFN: 0x1cacede, PFN: 0x15684c, page: ffffea0006b097c0, ref_count: 2
...
# kcompactd tries to copy PFN 0x15684c to a new folio and crashes when accessing the PFN via the directmap
[ 904.470135] Migrating restricted page, SRC pfn: 0x15684c, folio_ref_count: 2, folio_order: 0
[ 904.470154] BUG: unable to handle page fault for address: ffff88815684c000
[ 904.470314] kvm_restricted_mem_get_pfn: GFN: 0x1cafe00, PFN: 0x19f6d0, page: ffffea00081d2100, ref_count: 2
[ 904.477828] #PF: supervisor read access in kernel mode
[ 904.477831] #PF: error_code(0x0000) - not-present page
[ 904.477833] PGD 6601067 P4D 6601067 PUD 1569ad063 PMD 1569af063 PTE 800ffffea97b3060
[ 904.508806] Oops: 0000 [#1] SMP NOPTI
[ 904.512892] CPU: 52 PID: 1563 Comm: kcompactd0 Tainted: G E 6.0.0-rc7-hsnp-v7pfdv9d+ #10
[ 904.523473] Hardware name: AMD Corporation ETHANOL_X/ETHANOL_X, BIOS RXM1006B 08/20/2021
[ 904.532499] RIP: 0010:copy_page+0x7/0x10
[ 904.536877] Code: 00 66 90 48 89 f8 48 89 d1 f3 a4 31 c0 c3 cc cc cc cc 48 89 c8 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc 66 90 b9 00 02 00 00 <f3> 48 a5 c3 cc cc cc cc 90 48 83 ec 10 48 89 1c 24 4c 89 64 24 08
[ 904.557831] RSP: 0018:ffffc900106dfb78 EFLAGS: 00010286
[ 904.563661] RAX: ffff888000000000 RBX: ffffea0006b09810 RCX: 0000000000000200
[ 904.571622] RDX: ffffea0000000000 RSI: ffff88815684c000 RDI: ffff88816bc5d000
[ 904.579581] RBP: ffffc900106dfba0 R08: 0000000000000001 R09: ffffea0006b097c0
[ 904.587541] R10: 0000000000000002 R11: ffffc900106dfb38 R12: ffffea00071add60
[ 904.595502] R13: cccccccccccccccd R14: ffffea0006b09810 R15: ffff888159c1e0f8
[ 904.603462] FS: 0000000000000000(0000) GS:ffff88a04df00000(0000) knlGS:0000000000000000
[ 904.612489] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 904.618897] CR2: ffff88815684c000 CR3: 00000020eae16002 CR4: 0000000000770ee0
[ 904.626855] PKRU: 55555554
[ 904.629870] Call Trace:
[ 904.632594] <TASK>
[ 904.634928] ? folio_copy+0x8c/0xe0
[ 904.638818] migrate_folio+0x5b/0x110
[ 904.642901] move_to_new_folio+0x5b/0x150
[ 904.647371] migrate_pages+0x11bb/0x1830
[ 904.651743] ? move_freelist_tail+0xc0/0xc0
[ 904.656406] ? isolate_freepages_block+0x470/0x470
[ 904.661749] compact_zone+0x681/0xda0
[ 904.665832] kcompactd_do_work+0x1b3/0x2c0
[ 904.670400] kcompactd+0x257/0x330
[ 904.674190] ? prepare_to_wait_event+0x120/0x120
[ 904.679338] ? kcompactd_do_work+0x2c0/0x2c0
[ 904.684098] kthread+0xcf/0xf0
[ 904.687501] ? kthread_complete_and_exit+0x20/0x20
[ 904.692844] ret_from_fork+0x22/0x30
[ 904.696830] </TASK>
[ 904.699262] Modules linked in: nf_conntrack_netlink(E) xfrm_user(E) xfrm_algo(E) xt_addrtype(E) br_netfilter(E) xt_CHECKSUM(E) xt_MASQUERADE(E) xt_conntrack(E) ipt_REJECT(E) nf_reject_ipv4(E) xt_tcpudp(E) ip6table_mangle(E) ip6table_nat(E) iptable_mangle(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) nf_tables(E) nfnetlink(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) bpfilter(E) intel_rapl_msr(E) intel_rapl_common(E) amd64_edac(E) bridge(E) stp(E) llc(E) kvm_amd(E) overlay(E) nls_iso8859_1(E) kvm(E) crct10dif_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) crypto_simd(E) cryptd(E) rapl(E) ipmi_si(E) ipmi_devintf(E) wmi_bmof(E) ipmi_msghandler(E) efi_pstore(E) binfmt_misc(E) ast(E) drm_vram_helper(E) joydev(E) drm_ttm_helper(E) ttm(E) drm_kms_helper(E) input_leds(E) i2c_algo_bit(E) fb_sys_fops(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) ccp(E) k10temp(E) mac_hid(E) sch_fq_codel(E) parport_pc(E) ppdev(E) lp(E) parport(E) drm(E) ip_tables(E)
[ 904.699316] x_tables(E) autofs4(E) btrfs(E) blake2b_generic(E) zstd_compress(E) raid10(E) raid456(E) async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) async_tx(E) xor(E) raid6_pq(E) libcrc32c(E) raid1(E) raid0(E) multipath(E) linear(E) crc32_pclmul(E) hid_generic(E) usbhid(E) hid(E) e1000e(E) i2c_piix4(E) wmi(E)
[ 904.828498] CR2: ffff88815684c000
[ 904.832193] ---[ end trace 0000000000000000 ]---
[ 904.937159] RIP: 0010:copy_page+0x7/0x10
[ 904.941524] Code: 00 66 90 48 89 f8 48 89 d1 f3 a4 31 c0 c3 cc cc cc cc 48 89 c8 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc 66 90 b9 00 02 00 00 <f3> 48 a5 c3 cc cc cc cc 90 48 83 ec 10 48 89 1c 24 4c 89 64 24 08
[ 904.962478] RSP: 0018:ffffc900106dfb78 EFLAGS: 00010286
[ 904.968305] RAX: ffff888000000000 RBX: ffffea0006b09810 RCX: 0000000000000200
[ 904.976265] RDX: ffffea0000000000 RSI: ffff88815684c000 RDI: ffff88816bc5d000
[ 904.984227] RBP: ffffc900106dfba0 R08: 0000000000000001 R09: ffffea0006b097c0
[ 904.992187] R10: 0000000000000002 R11: ffffc900106dfb38 R12: ffffea00071add60
[ 905.000145] R13: cccccccccccccccd R14: ffffea0006b09810 R15: ffff888159c1e0f8
[ 905.008105] FS: 0000000000000000(0000) GS:ffff88a04df00000(0000) knlGS:0000000000000000
[ 905.017132] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 905.023540] CR2: ffff88815684c000 CR3: 00000020eae16002 CR4: 0000000000770ee0
[ 905.031501] PKRU: 55555554
[ 905.034558] kvm_restricted_mem_get_pfn: GFN: 0x1cafe01, PFN: 0x19f6d1, page: ffffea00081d2150, ref_count: 2
[ 905.045455] kvm_restricted_mem_get_pfn: GFN: 0x1cafe02, PFN: 0x19f6d2, page: ffffea00081d21a0, ref_count: 2
...