All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michael Roth <michael.roth@amd.com>
To: Chao Peng <chao.p.peng@linux.intel.com>
Cc: <kvm@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<linux-mm@kvack.org>, <linux-fsdevel@vger.kernel.org>,
	<linux-arch@vger.kernel.org>, <linux-api@vger.kernel.org>,
	<linux-doc@vger.kernel.org>, <qemu-devel@nongnu.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Sean Christopherson <seanjc@google.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	<x86@kernel.org>, "H . Peter Anvin" <hpa@zytor.com>,
	Hugh Dickins <hughd@google.com>, Jeff Layton <jlayton@kernel.org>,
	"J . Bruce Fields" <bfields@fieldses.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Shuah Khan <shuah@kernel.org>, Mike Rapoport <rppt@kernel.org>,
	Steven Price <steven.price@arm.com>,
	"Maciej S . Szmigiero" <mail@maciej.szmigiero.name>,
	Vlastimil Babka <vbabka@suse.cz>,
	Vishal Annapurve <vannapurve@google.com>,
	Yu Zhang <yu.c.zhang@linux.intel.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	<luto@kernel.org>, <jun.nakajima@intel.com>,
	<dave.hansen@intel.com>, <ak@linux.intel.com>, <david@redhat.com>,
	<aarcange@redhat.com>, <ddutile@redhat.com>,
	<dhildenb@redhat.com>, Quentin Perret <qperret@google.com>,
	<tabba@google.com>, <mhocko@suse.com>,
	Muchun Song <songmuchun@bytedance.com>, <wei.w.wang@intel.com>
Subject: Re: [PATCH v9 1/8] mm: Introduce memfd_restricted system call to create restricted user memory
Date: Mon, 28 Nov 2022 18:06:32 -0600	[thread overview]
Message-ID: <20221129000632.sz6pobh6p7teouiu@amd.com> (raw)
In-Reply-To: <20221025151344.3784230-2-chao.p.peng@linux.intel.com>

On Tue, Oct 25, 2022 at 11:13:37PM +0800, Chao Peng wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> 

<snip>

> +static struct file *restrictedmem_file_create(struct file *memfd)
> +{
> +	struct restrictedmem_data *data;
> +	struct address_space *mapping;
> +	struct inode *inode;
> +	struct file *file;
> +
> +	data = kzalloc(sizeof(*data), GFP_KERNEL);
> +	if (!data)
> +		return ERR_PTR(-ENOMEM);
> +
> +	data->memfd = memfd;
> +	mutex_init(&data->lock);
> +	INIT_LIST_HEAD(&data->notifiers);
> +
> +	inode = alloc_anon_inode(restrictedmem_mnt->mnt_sb);
> +	if (IS_ERR(inode)) {
> +		kfree(data);
> +		return ERR_CAST(inode);
> +	}
> +
> +	inode->i_mode |= S_IFREG;
> +	inode->i_op = &restrictedmem_iops;
> +	inode->i_mapping->private_data = data;
> +
> +	file = alloc_file_pseudo(inode, restrictedmem_mnt,
> +				 "restrictedmem", O_RDWR,
> +				 &restrictedmem_fops);
> +	if (IS_ERR(file)) {
> +		iput(inode);
> +		kfree(data);
> +		return ERR_CAST(file);
> +	}
> +
> +	file->f_flags |= O_LARGEFILE;
> +
> +	mapping = memfd->f_mapping;
> +	mapping_set_unevictable(mapping);
> +	mapping_set_gfp_mask(mapping,
> +			     mapping_gfp_mask(mapping) & ~__GFP_MOVABLE);

Is this supposed to prevent migration of pages being used for
restrictedmem/shmem backend?

In my case I've been testing SNP support based on UPM v9, and for
large guests (128GB+), if I force 2M THPs via:

  echo always >/sys/kernel/mm/transparent_hugepages/shmem_enabled

it will in some cases trigger the below trace, which suggests that
kcompactd is trying to call migrate_folio() on a PFN that was/is
still allocated for guest private memory (and so has been removed from
directmap as part of shared->private conversation via REG_REGION kvm
ioctl, leading to the crash). This trace seems to occur during early
OVMF boot while the guest is in the middle of pre-accepting on private
memory (no lazy accept in this case).

Is this expected behavior? What else needs to be done to ensure
migrations aren't attempted in this case?

Thanks!

-Mike


# Host logs with debug info for crash during SNP boot

...
[  904.373632] kvm_restricted_mem_get_pfn: GFN: 0x1caced1, PFN: 0x156b7f, page: ffffea0006b197b0, ref_count: 2
[  904.373634] kvm_restricted_mem_get_pfn: GFN: 0x1caced2, PFN: 0x156840, page: ffffea0006b09400, ref_count: 2
[  904.373637] kvm_restricted_mem_get_pfn: GFN: 0x1caced3, PFN: 0x156841, page: ffffea0006b09450, ref_count: 2
[  904.373639] kvm_restricted_mem_get_pfn: GFN: 0x1caced4, PFN: 0x156842, page: ffffea0006b094a0, ref_count: 2
[  904.373641] kvm_restricted_mem_get_pfn: GFN: 0x1caced5, PFN: 0x156843, page: ffffea0006b094f0, ref_count: 2
[  904.373645] kvm_restricted_mem_get_pfn: GFN: 0x1caced6, PFN: 0x156844, page: ffffea0006b09540, ref_count: 2
[  904.373647] kvm_restricted_mem_get_pfn: GFN: 0x1caced7, PFN: 0x156845, page: ffffea0006b09590, ref_count: 2
[  904.373649] kvm_restricted_mem_get_pfn: GFN: 0x1caced8, PFN: 0x156846, page: ffffea0006b095e0, ref_count: 2
[  904.373652] kvm_restricted_mem_get_pfn: GFN: 0x1caced9, PFN: 0x156847, page: ffffea0006b09630, ref_count: 2
[  904.373654] kvm_restricted_mem_get_pfn: GFN: 0x1caceda, PFN: 0x156848, page: ffffea0006b09680, ref_count: 2
[  904.373656] kvm_restricted_mem_get_pfn: GFN: 0x1cacedb, PFN: 0x156849, page: ffffea0006b096d0, ref_count: 2
[  904.373661] kvm_restricted_mem_get_pfn: GFN: 0x1cacedc, PFN: 0x15684a, page: ffffea0006b09720, ref_count: 2
[  904.373663] kvm_restricted_mem_get_pfn: GFN: 0x1cacedd, PFN: 0x15684b, page: ffffea0006b09770, ref_count: 2

# PFN 0x15684c is allocated for guest private memory, will have been removed from directmap as part of RMP requirements

[  904.373665] kvm_restricted_mem_get_pfn: GFN: 0x1cacede, PFN: 0x15684c, page: ffffea0006b097c0, ref_count: 2
...

# kcompactd crashes trying to copy PFN 0x15684c to a new folio, crashes trying to access PFN via directmap

[  904.470135] Migrating restricted page, SRC pfn: 0x15684c, folio_ref_count: 2, folio_order: 0
[  904.470154] BUG: unable to handle page fault for address: ffff88815684c000
[  904.470314] kvm_restricted_mem_get_pfn: GFN: 0x1cafe00, PFN: 0x19f6d0, page: ffffea00081d2100, ref_count: 2
[  904.477828] #PF: supervisor read access in kernel mode
[  904.477831] #PF: error_code(0x0000) - not-present page
[  904.477833] PGD 6601067 P4D 6601067 PUD 1569ad063 PMD 1569af063 PTE 800ffffea97b3060
[  904.508806] Oops: 0000 [#1] SMP NOPTI
[  904.512892] CPU: 52 PID: 1563 Comm: kcompactd0 Tainted: G            E      6.0.0-rc7-hsnp-v7pfdv9d+ #10
[  904.523473] Hardware name: AMD Corporation ETHANOL_X/ETHANOL_X, BIOS RXM1006B 08/20/2021
[  904.532499] RIP: 0010:copy_page+0x7/0x10
[  904.536877] Code: 00 66 90 48 89 f8 48 89 d1 f3 a4 31 c0 c3 cc cc cc cc 48 89 c8 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc 66 90 b9 00 02 00 00 <f3> 48 a5 c3 cc cc cc cc 90 48 83 ec 10 48 89 1c 24 4c 89 64 24 08
[  904.557831] RSP: 0018:ffffc900106dfb78 EFLAGS: 00010286
[  904.563661] RAX: ffff888000000000 RBX: ffffea0006b09810 RCX: 0000000000000200
[  904.571622] RDX: ffffea0000000000 RSI: ffff88815684c000 RDI: ffff88816bc5d000
[  904.579581] RBP: ffffc900106dfba0 R08: 0000000000000001 R09: ffffea0006b097c0
[  904.587541] R10: 0000000000000002 R11: ffffc900106dfb38 R12: ffffea00071add60
[  904.595502] R13: cccccccccccccccd R14: ffffea0006b09810 R15: ffff888159c1e0f8
[  904.603462] FS:  0000000000000000(0000) GS:ffff88a04df00000(0000) knlGS:0000000000000000
[  904.612489] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  904.618897] CR2: ffff88815684c000 CR3: 00000020eae16002 CR4: 0000000000770ee0
[  904.626855] PKRU: 55555554
[  904.629870] Call Trace:
[  904.632594]  <TASK>
[  904.634928]  ? folio_copy+0x8c/0xe0
[  904.638818]  migrate_folio+0x5b/0x110
[  904.642901]  move_to_new_folio+0x5b/0x150
[  904.647371]  migrate_pages+0x11bb/0x1830
[  904.651743]  ? move_freelist_tail+0xc0/0xc0
[  904.656406]  ? isolate_freepages_block+0x470/0x470
[  904.661749]  compact_zone+0x681/0xda0
[  904.665832]  kcompactd_do_work+0x1b3/0x2c0
[  904.670400]  kcompactd+0x257/0x330
[  904.674190]  ? prepare_to_wait_event+0x120/0x120
[  904.679338]  ? kcompactd_do_work+0x2c0/0x2c0
[  904.684098]  kthread+0xcf/0xf0
[  904.687501]  ? kthread_complete_and_exit+0x20/0x20
[  904.692844]  ret_from_fork+0x22/0x30
[  904.696830]  </TASK>
[  904.699262] Modules linked in: nf_conntrack_netlink(E) xfrm_user(E) xfrm_algo(E) xt_addrtype(E) br_netfilter(E) xt_CHECKSUM(E) xt_MASQUERADE(E) xt_conntrack(E) ipt_REJECT(E) nf_reject_ipv4(E) xt_tcpudp(E) ip6table_mangle(E) ip6table_nat(E) iptable_mangle(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) nf_tables(E) nfnetlink(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) bpfilter(E) intel_rapl_msr(E) intel_rapl_common(E) amd64_edac(E) bridge(E) stp(E) llc(E) kvm_amd(E) overlay(E) nls_iso8859_1(E) kvm(E) crct10dif_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) crypto_simd(E) cryptd(E) rapl(E) ipmi_si(E) ipmi_devintf(E) wmi_bmof(E) ipmi_msghandler(E) efi_pstore(E) binfmt_misc(E) ast(E) drm_vram_helper(E) joydev(E) drm_ttm_helper(E) ttm(E) drm_kms_helper(E) input_leds(E) i2c_algo_bit(E) fb_sys_fops(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) ccp(E) k10temp(E) mac_hid(E) sch_fq_codel(E) parport_pc(E) ppdev(E) lp(E) parport(E) drm(E) ip_tables(E)
[  904.699316]  x_tables(E) autofs4(E) btrfs(E) blake2b_generic(E) zstd_compress(E) raid10(E) raid456(E) async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) async_tx(E) xor(E) raid6_pq(E) libcrc32c(E) raid1(E) raid0(E) multipath(E) linear(E) crc32_pclmul(E) hid_generic(E) usbhid(E) hid(E) e1000e(E) i2c_piix4(E) wmi(E)
[  904.828498] CR2: ffff88815684c000
[  904.832193] ---[ end trace 0000000000000000 ]---
[  904.937159] RIP: 0010:copy_page+0x7/0x10
[  904.941524] Code: 00 66 90 48 89 f8 48 89 d1 f3 a4 31 c0 c3 cc cc cc cc 48 89 c8 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc 66 90 b9 00 02 00 00 <f3> 48 a5 c3 cc cc cc cc 90 48 83 ec 10 48 89 1c 24 4c 89 64 24 08
[  904.962478] RSP: 0018:ffffc900106dfb78 EFLAGS: 00010286
[  904.968305] RAX: ffff888000000000 RBX: ffffea0006b09810 RCX: 0000000000000200
[  904.976265] RDX: ffffea0000000000 RSI: ffff88815684c000 RDI: ffff88816bc5d000
[  904.984227] RBP: ffffc900106dfba0 R08: 0000000000000001 R09: ffffea0006b097c0
[  904.992187] R10: 0000000000000002 R11: ffffc900106dfb38 R12: ffffea00071add60
[  905.000145] R13: cccccccccccccccd R14: ffffea0006b09810 R15: ffff888159c1e0f8
[  905.008105] FS:  0000000000000000(0000) GS:ffff88a04df00000(0000) knlGS:0000000000000000
[  905.017132] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  905.023540] CR2: ffff88815684c000 CR3: 00000020eae16002 CR4: 0000000000770ee0
[  905.031501] PKRU: 55555554
[  905.034558] kvm_restricted_mem_get_pfn: GFN: 0x1cafe01, PFN: 0x19f6d1, page: ffffea00081d2150, ref_count: 2
[  905.045455] kvm_restricted_mem_get_pfn: GFN: 0x1cafe02, PFN: 0x19f6d2, page: ffffea00081d21a0, ref_count: 2
...

  parent reply	other threads:[~2022-11-29  0:08 UTC|newest]

Thread overview: 101+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-25 15:13 [PATCH v9 0/8] KVM: mm: fd-based approach for supporting KVM Chao Peng
2022-10-25 15:13 ` [PATCH v9 1/8] mm: Introduce memfd_restricted system call to create restricted user memory Chao Peng
2022-10-26 17:31   ` Isaku Yamahata
2022-10-28  6:12     ` Chao Peng
2022-10-27 10:20   ` Fuad Tabba
2022-10-31 17:47   ` Michael Roth
2022-11-01 11:37     ` Chao Peng
2022-11-01 15:19       ` Michael Roth
2022-11-01 19:30         ` Michael Roth
2022-11-02 14:53           ` Chao Peng
2022-11-02 21:19             ` Michael Roth
2022-11-14 14:02         ` Vlastimil Babka
2022-11-14 15:28           ` Kirill A. Shutemov
2022-11-14 22:16             ` Michael Roth
2022-11-15  9:48               ` Chao Peng
2022-11-14 22:16           ` Michael Roth
2022-11-02 21:14     ` Kirill A. Shutemov
2022-11-02 21:26       ` Michael Roth
2022-11-02 22:07       ` Michael Roth
2022-11-03 16:30         ` Kirill A. Shutemov
2022-11-29  0:06   ` Michael Roth [this message]
2022-11-29 11:21     ` Kirill A. Shutemov
2022-11-29 11:39       ` David Hildenbrand
2022-11-29 13:59         ` Chao Peng
2022-11-29 13:58       ` Chao Peng
2022-11-29  0:37   ` Michael Roth
2022-11-29 14:06     ` Chao Peng
2022-11-29 19:06       ` Michael Roth
2022-11-29 19:18         ` Michael Roth
2022-11-30  9:39           ` Chao Peng
2022-11-30 14:31             ` Michael Roth
2022-11-29 18:01     ` Vishal Annapurve
2022-12-02  2:16   ` Vishal Annapurve
2022-12-02  6:49     ` Chao Peng
2022-12-02 13:44       ` Kirill A . Shutemov
2022-10-25 15:13 ` [PATCH v9 2/8] KVM: Extend the memslot to support fd-based private memory Chao Peng
2022-10-27 10:25   ` Fuad Tabba
2022-10-28  7:04   ` Xiaoyao Li
2022-10-31 14:14     ` Chao Peng
2022-11-14 16:04   ` Alex Bennée
2022-11-15  9:29     ` Chao Peng
2022-10-25 15:13 ` [PATCH v9 3/8] KVM: Add KVM_EXIT_MEMORY_FAULT exit Chao Peng
2022-10-25 15:26   ` Peter Maydell
2022-10-25 16:17     ` Sean Christopherson
2022-10-27 10:27   ` Fuad Tabba
2022-10-28  6:14     ` Chao Peng
2022-11-15 16:56   ` Alex Bennée
2022-11-16  3:14     ` Chao Peng
2022-11-16 19:03       ` Alex Bennée
2022-11-17 13:45         ` Chao Peng
2022-11-17 15:08           ` Alex Bennée
2022-11-18  1:32             ` Chao Peng
2022-11-18 13:23               ` Alex Bennée
2022-11-18 15:59                 ` Sean Christopherson
2022-11-22  9:50                   ` Chao Peng
2022-11-23 18:02                     ` Sean Christopherson
2022-11-16 18:15   ` Andy Lutomirski
2022-11-16 18:48     ` Sean Christopherson
2022-11-17 13:42       ` Chao Peng
2022-10-25 15:13 ` [PATCH v9 4/8] KVM: Use gfn instead of hva for mmu_notifier_retry Chao Peng
2022-10-27 10:29   ` Fuad Tabba
2022-11-04  2:28     ` Chao Peng
2022-11-04 22:29       ` Sean Christopherson
2022-11-08  7:16         ` Chao Peng
2022-11-10 17:53           ` Sean Christopherson
2022-11-10 20:06   ` Sean Christopherson
2022-11-11  8:27     ` Chao Peng
2022-10-25 15:13 ` [PATCH v9 5/8] KVM: Register/unregister the guest private memory regions Chao Peng
2022-10-27 10:31   ` Fuad Tabba
2022-11-03 23:04   ` Sean Christopherson
2022-11-04  8:28     ` Chao Peng
2022-11-04 21:19       ` Sean Christopherson
2022-11-08  8:24         ` Chao Peng
2022-11-08  1:35   ` Yuan Yao
2022-11-08  9:41     ` Chao Peng
2022-11-09  5:52       ` Yuan Yao
2022-11-16 22:24   ` Sean Christopherson
2022-11-17 13:20     ` Chao Peng
2022-10-25 15:13 ` [PATCH v9 6/8] KVM: Update lpage info when private/shared memory are mixed Chao Peng
2022-10-26 20:46   ` Isaku Yamahata
2022-10-28  6:38     ` Chao Peng
2022-11-08 12:08   ` Yuan Yao
2022-11-09  4:13     ` Chao Peng
2022-10-25 15:13 ` [PATCH v9 7/8] KVM: Handle page fault for private memory Chao Peng
2022-10-26 21:54   ` Isaku Yamahata
2022-10-28  6:55     ` Chao Peng
2022-11-01  0:02       ` Isaku Yamahata
2022-11-01 11:38         ` Chao Peng
2022-11-16 20:50   ` Ackerley Tng
2022-11-16 22:13     ` Sean Christopherson
2022-11-17 13:25       ` Chao Peng
2022-10-25 15:13 ` [PATCH v9 8/8] KVM: Enable and expose KVM_MEM_PRIVATE Chao Peng
2022-10-27 10:31   ` Fuad Tabba
2022-11-03 12:13 ` [PATCH v9 0/8] KVM: mm: fd-based approach for supporting KVM Vishal Annapurve
2022-11-08  0:41   ` Isaku Yamahata
2022-11-09 15:54     ` Kirill A. Shutemov
2022-11-15 14:36       ` Kirill A. Shutemov
2022-11-14 11:43 ` Alex Bennée
2022-11-16  5:00   ` Chao Peng
2022-11-16  9:40     ` Alex Bennée
2022-11-17 14:16       ` Chao Peng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221129000632.sz6pobh6p7teouiu@amd.com \
    --to=michael.roth@amd.com \
    --cc=aarcange@redhat.com \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=bfields@fieldses.org \
    --cc=bp@alien8.de \
    --cc=chao.p.peng@linux.intel.com \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@intel.com \
    --cc=david@redhat.com \
    --cc=ddutile@redhat.com \
    --cc=dhildenb@redhat.com \
    --cc=hpa@zytor.com \
    --cc=hughd@google.com \
    --cc=jlayton@kernel.org \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=jun.nakajima@intel.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mail@maciej.szmigiero.name \
    --cc=mhocko@suse.com \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qperret@google.com \
    --cc=rppt@kernel.org \
    --cc=seanjc@google.com \
    --cc=shuah@kernel.org \
    --cc=songmuchun@bytedance.com \
    --cc=steven.price@arm.com \
    --cc=tabba@google.com \
    --cc=tglx@linutronix.de \
    --cc=vannapurve@google.com \
    --cc=vbabka@suse.cz \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    --cc=wei.w.wang@intel.com \
    --cc=x86@kernel.org \
    --cc=yu.c.zhang@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.