Date: Fri, 27 Aug 2021 22:18:52 +0000
From: Sean Christopherson
To: David Hildenbrand
Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
    kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Borislav Petkov,
    Andy Lutomirski, Andrew Morton, Joerg Roedel, Andi Kleen, David Rientjes,
    Vlastimil Babka, Tom Lendacky, Thomas Gleixner, Peter Zijlstra,
    Ingo Molnar, Varad Gautam, Dario Faggioli, x86@kernel.org,
    linux-mm@kvack.org, linux-coco@lists.linux.dev, "Kirill A . Shutemov",
    "Kirill A . Shutemov", Kuppuswamy Sathyanarayanan, Dave Hansen, Yu Zhang
Subject: Re: [RFC] KVM: mm: fd-based approach for supporting KVM guest private memory
References: <20210824005248.200037-1-seanjc@google.com> <307d385a-a263-276f-28eb-4bc8dd287e32@redhat.com>
In-Reply-To: <307d385a-a263-276f-28eb-4bc8dd287e32@redhat.com>

On Thu, Aug 26, 2021, David Hildenbrand wrote:
> You'll end up with a VMA that corresponds to the whole file in a single
> process only, and that cannot vanish, not even in parts.

How would userspace tell the kernel to free parts of memory that it doesn't
want assigned to the guest, e.g. to free memory that the guest has converted
to not-private?

> Define "ordinary" user memory slots as overlay on top of "encrypted" memory
> slots. Inside KVM, bail out if you encounter such a VMA inside a normal
> user memory slot. When creating an "encrypted" user memory slot, require that
> the whole VMA is covered at creation time. You know the VMA can't change
> later.

This can work for the basic use cases, but even then I'd strongly prefer not
to tie memslot correctness to the VMAs. KVM doesn't truly care what lies
behind the virtual address of a memslot, and when it does care, it tends to
do poorly, e.g. see the whole PFNMAP snafu.

KVM cares about the pfn<->gfn mappings, and that's reflected in the
infrastructure. E.g. KVM relies on the mmu_notifiers to handle
mprotect()/munmap()/etc... As is, I don't think KVM would get any kind of
notification if userspace unmaps the VMA for a private memslot that does not
have any entries in the host page tables. I'm sure it's a solvable problem,
e.g. by ensuring at least one page is touched by the backing store, but I
don't think the end result would be any prettier than a dedicated API for KVM
to consume.

Relying on VMAs, and thus the mmu_notifiers, also doesn't provide line of
sight to page migration or swap. For those types of operations, KVM currently
just reacts to invalidation notifications by zapping guest PTEs, and then gets
the new pfn when the guest re-faults on the page. That sequence doesn't work
for TDX or SEV-SNP because the trusted agent needs to do the memcpy() of the
page contents, i.e. the host needs to call into KVM for the actual migration.

There's also the memory footprint side of things; the fd-based approach
avoids having to create host page tables for memory that by definition will
never be used by the host.
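
To make the "zap on invalidation, then re-fault" sequence concrete, here is a rough userspace toy model in C. It is only a sketch: every name in it (toy_vm, toy_guest_fault(), toy_invalidate(), toy_host_migrate()) is made up for illustration and none of it is KVM code. The pfn values are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: none of these names exist in KVM. */
#define TOY_NR_GFNS     4
#define TOY_INVALID_PFN UINT64_MAX

struct toy_vm {
	uint64_t host_pfn[TOY_NR_GFNS];  /* host's current gfn -> pfn backing */
	uint64_t guest_pte[TOY_NR_GFNS]; /* guest mapping, or TOY_INVALID_PFN */
};

/* Start with arbitrary backing pfns and no guest mappings. */
static void toy_vm_init(struct toy_vm *vm)
{
	for (size_t gfn = 0; gfn < TOY_NR_GFNS; gfn++) {
		vm->host_pfn[gfn] = 100 + gfn;
		vm->guest_pte[gfn] = TOY_INVALID_PFN;
	}
}

/* Guest fault: look up whatever pfn backs the gfn right now and map it. */
static uint64_t toy_guest_fault(struct toy_vm *vm, size_t gfn)
{
	vm->guest_pte[gfn] = vm->host_pfn[gfn];
	return vm->guest_pte[gfn];
}

/* Stand-in for an invalidation notification: just zap the guest PTE. */
static void toy_invalidate(struct toy_vm *vm, size_t gfn)
{
	vm->guest_pte[gfn] = TOY_INVALID_PFN;
}

/*
 * Host-side migration in this model: notify first, then switch the backing
 * pfn. The host is assumed to have copied the page contents itself, which is
 * exactly the step it cannot do for TDX or SEV-SNP private memory.
 */
static void toy_host_migrate(struct toy_vm *vm, size_t gfn, uint64_t new_pfn)
{
	toy_invalidate(vm, gfn);
	vm->host_pfn[gfn] = new_pfn;
}
```

In this model, a fault on gfn 1 maps pfn 101, toy_host_migrate(&vm, 1, 500) zaps the mapping, and the next fault on gfn 1 picks up pfn 500. Nothing in the sequence gives KVM a hook to ask a trusted agent to move the contents, which is the gap described above.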