Date: Fri, 1 Apr 2022 18:24:44 +0000
From: Sean Christopherson
To: Quentin Perret
Cc: Andy Lutomirski, Steven Price, Chao Peng, kvm list,
    Linux Kernel Mailing List, linux-mm@kvack.org,
    linux-fsdevel@vger.kernel.org, Linux API, qemu-devel@nongnu.org,
    Paolo Bonzini, Jonathan Corbet, Vitaly Kuznetsov, Wanpeng Li,
    Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
    Borislav Petkov, the arch/x86 maintainers, "H. Peter Anvin",
    Hugh Dickins, Jeff Layton, "J . Bruce Fields", Andrew Morton,
    Mike Rapoport, "Maciej S . Szmigiero", Vlastimil Babka,
    Vishal Annapurve, Yu Zhang, "Kirill A. Shutemov", "Nakajima, Jun",
    Dave Hansen, Andi Kleen, David Hildenbrand, Marc Zyngier, Will Deacon
Subject: Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory
References: <88620519-029e-342b-0a85-ce2a20eaf41b@arm.com> <80aad2f9-9612-4e87-a27a-755d3fa97c92@www.fastmail.com>

On Fri, Apr 01, 2022, Quentin Perret wrote:
> On Friday 01 Apr 2022 at 17:14:21 (+0000), Sean Christopherson wrote:
> > On Fri, Apr 01, 2022, Quentin Perret wrote:
> > I assume there is a scenario where a page can be converted from shared=>private?
> > If so, is there a use case where that happens post-boot _and_ the contents of the
> > page are preserved?
>
> I think most our use-cases are private=>shared, but how is that
> different?

Ah, it's not really different. What I really was trying to understand is if
there are post-boot conversions that preserve data. I asked about
shared=>private because there are known pre-boot conversions, e.g. populating
the initial guest image, but AFAIK there are no use cases for post-boot
conversions, which might be more needy in terms of performance.

> > > We currently don't allow the host punching holes in the guest IPA space.
> >
> > The hole doesn't get punched in guest IPA space, it gets punched in the private
> > backing store, which is host PA space.
>
> Hmm, in a previous message I thought that you mentioned when a hole
> gets punched in the fd KVM will go and unmap the page in the private
> SPTEs, which will cause a fatal error for any subsequent access from the
> guest to the corresponding IPA?

Oooh, that was in the context of TDX. Mixing VMX and arm64 terminology...
TDX has two separate stage-2 roots, one for private IPAs and one for shared
IPAs. The guest selects private/shared by toggling a bit stolen from the guest
IPA space. Upon conversion, KVM will remove from one stage-2 tree and insert
into the other. But even then, subsequent accesses to the wrong IPA won't be
fatal, as KVM will treat them as implicit conversions. I wish they could be
fatal, but that's not "allowed" given the guest/host contract dictated by the
TDX specs.

> If that's correct, I meant that we currently don't support that - the
> host can't unmap anything from the guest stage-2, it can only tear it
> down entirely. But again, I'm not too worried about that, we could
> certainly implement that part without too many issues.

I believe for the pKVM case it wouldn't be unmapping, it would be a PFN change.

> > > Once it has donated a page to a guest, it can't have it back until the
> > > guest has been entirely torn down (at which point all of memory is
> > > poisoned by the hypervisor obviously).
> >
> > The guest doesn't have to know that it was handed back a different page. It will
> > require defining the semantics to state that the trusted hypervisor will clear
> > that page on conversion, but IMO the trusted hypervisor should be doing that
> > anyways. IMO, forcing the guest to correctly zero pages on conversion is
> > unnecessarily risky because converting private=>shared and preserving the contents
> > should be a very, very rare scenario, i.e. it's just one more thing for the guest
> > to get wrong.
>
> I'm not sure I agree. The guest is going to communicate with an
> untrusted entity via that shared page, so it better be careful. Guest
> hardening in general is a major topic, and of all problems, zeroing the
> page before sharing is probably one of the simplest to solve.

Yes, for private=>shared you're correct, the guest needs to be paranoid as
there are no guarantees as to what data may be in the shared page.

I was thinking more in the context of shared=>private conversions, e.g. the
guest is done sharing a page and wants it back. In that case, forcing the
guest to zero the private page upon re-acceptance is dicey. Hmm, but if the
guest needs to explicitly re-accept the page, then putting the onus on the
guest to zero the page isn't a big deal. The pKVM contract would just need to
make it clear that the guest cannot make any assumptions about the state of
private data.

Oh, now I remember why I'm biased toward the trusted entity doing the work.
IIRC, thanks to TDX's lovely memory poisoning and cache aliasing behavior, the
guest can't be trusted to properly initialize private memory with the guest
key, i.e. the guest could induce a #MC and crash the host.

Anywho, I agree that for performance reasons, requiring the guest to zero
private pages is preferable so long as the guest must explicitly
accept/initiate conversions.

> Also, note that in pKVM all the hypervisor code at EL2 runs with
> preemption disabled, which is a strict constraint. As such one of the
> main goals is to spend as little time as possible in that context.
> We're trying hard to keep the amount of zeroing/memcpy-ing to an
> absolute minimum. And that's especially true as we introduce support for
> huge pages. So, we'll take every opportunity we get to have the guest
> or the host do that work.

FWIW, TDX has the exact same constraints (they're actually worse as the
trusted entity runs with _all_ interrupts blocked).
And yeah, it needs to be careful when dealing with huge pages, e.g. many flows
force the guest/host to do 512 * 4KiB operations.
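
To make the 512 * 4KiB figure concrete, here is a minimal sketch of the
arithmetic. It assumes a 4KiB base page and a 2MiB huge page, which the mail
does not spell out. The function names are hypothetical and purely
illustrative; they are not KVM, TDX, or pKVM interfaces.

```python
# Illustrative arithmetic only; none of these names are real KVM, TDX, or
# pKVM APIs. The page sizes below are assumptions (4KiB base, 2MiB huge).
BASE_PAGE_SIZE = 4 * 1024          # 4KiB base page
HUGE_PAGE_SIZE = 2 * 1024 * 1024   # 2MiB huge page


def base_pages_per_huge_page(huge=HUGE_PAGE_SIZE, base=BASE_PAGE_SIZE):
    """Number of base-page-sized operations needed to cover one huge page."""
    if huge % base:
        raise ValueError("huge page size must be a multiple of the base page size")
    return huge // base


def split_into_base_pages(start, huge=HUGE_PAGE_SIZE, base=BASE_PAGE_SIZE):
    """Yield the start address of each base page inside a huge page at `start`."""
    if start % huge:
        raise ValueError("start must be huge-page aligned")
    for offset in range(0, huge, base):
        yield start + offset


if __name__ == "__main__":
    print(base_pages_per_huge_page())
    addrs = list(split_into_base_pages(0x40000000))
    print(len(addrs), hex(addrs[0]), hex(addrs[-1]))
```

Any flow that can only work on one base page at a time (accept, zero,
convert) would have to walk all of the addresses yielded above, which is why
the cost matters in a context that runs with preemption or interrupts
disabled.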