From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: "Kalra, Ashish" <Ashish.Kalra@amd.com>
Cc: "Singh, Brijesh" <brijesh.singh@amd.com>,
Steve Rutherford <srutherford@google.com>,
Sean Christopherson <seanjc@google.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
Joerg Roedel <joro@8bytes.org>, Borislav Petkov <bp@suse.de>,
"Lendacky, Thomas" <Thomas.Lendacky@amd.com>,
X86 ML <x86@kernel.org>, KVM list <kvm@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
"dovmurik@linux.vnet.ibm.com" <dovmurik@linux.vnet.ibm.com>,
"tobin@ibm.com" <tobin@ibm.com>,
"jejb@linux.ibm.com" <jejb@linux.ibm.com>,
"frankeh@us.ibm.com" <frankeh@us.ibm.com>
Subject: Re: [PATCH v2 1/9] KVM: x86: Add AMD SEV specific Hypercall3
Date: Fri, 18 Dec 2020 19:56:41 +0000 [thread overview]
Message-ID: <20201218195641.GL2956@work-vm> (raw)
In-Reply-To: <E79E09A2-F314-4B59-B7AE-07B1D422DF2B@amd.com>
* Kalra, Ashish (Ashish.Kalra@amd.com) wrote:
> Hello Dave,
>
> On Dec 18, 2020, at 1:40 PM, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
>
> * Ashish Kalra (ashish.kalra@amd.com) wrote:
> On Fri, Dec 11, 2020 at 10:55:42PM +0000, Ashish Kalra wrote:
> Hello All,
>
> On Tue, Dec 08, 2020 at 10:29:05AM -0600, Brijesh Singh wrote:
>
> On 12/7/20 9:09 PM, Steve Rutherford wrote:
> On Mon, Dec 7, 2020 at 12:42 PM Sean Christopherson <seanjc@google.com> wrote:
> On Sun, Dec 06, 2020, Paolo Bonzini wrote:
> On 03/12/20 01:34, Sean Christopherson wrote:
> On Tue, Dec 01, 2020, Ashish Kalra wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
>
> KVM hypercall framework relies on alternative framework to patch the
> VMCALL -> VMMCALL on AMD platform. If a hypercall is made before
> apply_alternative() is called then it defaults to VMCALL. The approach
> works fine on non SEV guest. A VMCALL would causes #UD, and hypervisor
> will be able to decode the instruction and do the right things. But
> when SEV is active, guest memory is encrypted with guest key and
> hypervisor will not be able to decode the instruction bytes.
>
> Add SEV specific hypercall3, it unconditionally uses VMMCALL. The hypercall
> will be used by the SEV guest to notify encrypted pages to the hypervisor.
> What if we invert KVM_HYPERCALL and X86_FEATURE_VMMCALL to default to VMMCALL
> and opt into VMCALL? It's a synthetic feature flag either way, and I don't
> think there are any existing KVM hypercalls that happen before alternatives are
> patched, i.e. it'll be a nop for sane kernel builds.
>
> I'm also skeptical that a KVM specific hypercall is the right approach for the
> encryption behavior, but I'll take that up in the patches later in the series.
> Do you think that it's the guest that should "donate" memory for the bitmap
> instead?
> No. Two things I'd like to explore:
>
> 1. Making the hypercall to announce/request private vs. shared common across
> hypervisors (KVM, Hyper-V, VMware, etc...) and technologies (SEV-* and TDX).
> I'm concerned that we'll end up with multiple hypercalls that do more or
> less the same thing, e.g. KVM+SEV, Hyper-V+SEV, TDX, etc... Maybe it's a
> pipe dream, but I'd like to at least explore options before shoving in KVM-
> only hypercalls.
>
>
> 2. Tracking shared memory via a list of ranges instead of a using bitmap to
> track all of guest memory. For most use cases, the vast majority of guest
> memory will be private, most ranges will be 2mb+, and conversions between
> private and shared will be uncommon events, i.e. the overhead to walk and
> split/merge list entries is hopefully not a big concern. I suspect a list
> would consume far less memory, hopefully without impacting performance.
> For a fancier data structure, I'd suggest an interval tree. Linux
> already has an rbtree-based interval tree implementation, which would
> likely work, and would probably assuage any performance concerns.
>
> Something like this would not be worth doing unless most of the shared
> pages were physically contiguous. A sample Ubuntu 20.04 VM on GCP had
> 60ish discontiguous shared regions. This is by no means a thorough
> search, but it's suggestive. If this is typical, then the bitmap would
> be far less efficient than most any interval-based data structure.
>
> You'd have to allow userspace to upper bound the number of intervals
> (similar to the maximum bitmap size), to prevent host OOMs due to
> malicious guests. There's something nice about the guest donating
> memory for this, since that would eliminate the OOM risk.
>
>
> Tracking the list of ranges may not be bad idea, especially if we use
> the some kind of rbtree-based data structure to update the ranges. It
> will certainly be better than bitmap which grows based on the guest
> memory size and as you guys see in the practice most of the pages will
> be guest private. I am not sure if guest donating a memory will cover
> all the cases, e.g what if we do a memory hotplug (increase the guest
> ram from 2GB to 64GB), will donated memory range will be enough to store
> the metadata.
>
> .
>
> With reference to internal discussions regarding the above, i am going
> to look into specific items as listed below :
>
> 1). "hypercall" related :
> a). Explore the SEV-SNP page change request structure (included in GHCB),
> see if there is something common there than can be re-used for SEV/SEV-ES
> page encryption status hypercalls.
> b). Explore if there is any common hypercall framework i can use in
> Linux/KVM.
>
> 2). related to the "backing" data structure - explore using a range-based
> list or something like rbtree-based interval tree data structure
> (as mentioned by Steve above) to replace the current bitmap based
> implementation.
>
>
>
> I do agree that a range-based list or an interval tree data structure is a
> really good "logical" fit for the guest page encryption status tracking.
>
> We can only keep track of the guest unencrypted shared pages in the
> range(s) list (which will keep the data structure quite compact) and all
> the guest private/encrypted memory does not really need any tracking in
> the list, anything not in the list will be encrypted/private.
>
> Also looking at a more "practical" use case, here is the current log of
> page encryption status hypercalls when booting a linux guest :
>
> ...
>
> <snip>
>
> [ 56.146336] page_enc_status_hc invoked, gpa = 1f018000, npages = 1, enc = 1
> [ 56.146351] page_enc_status_hc invoked, gpa = 1f00e000, npages = 1, enc = 0
> [ 56.147261] page_enc_status_hc invoked, gpa = 1f00e000, npages = 1, enc = 0
> [ 56.147271] page_enc_status_hc invoked, gpa = 1f018000, npages = 1, enc = 0
> ....
>
> [ 56.180730] page_enc_status_hc invoked, gpa = 1f008000, npages = 1, enc = 0
> [ 56.180741] page_enc_status_hc invoked, gpa = 1f006000, npages = 1, enc = 0
> [ 56.180768] page_enc_status_hc invoked, gpa = 1f008000, npages = 1, enc = 1
> [ 56.180782] page_enc_status_hc invoked, gpa = 1f006000, npages = 1, enc = 1
>
> ....
> [ 56.197110] page_enc_status_hc invoked, gpa = 1f007000, npages = 1, enc = 0
> [ 56.197120] page_enc_status_hc invoked, gpa = 1f005000, npages = 1, enc = 0
> [ 56.197136] page_enc_status_hc invoked, gpa = 1f007000, npages = 1, enc = 1
> [ 56.197148] page_enc_status_hc invoked, gpa = 1f005000, npages = 1, enc = 1
> ....
>
> [ 56.222679] page_enc_status_hc invoked, gpa = 1e83b000, npages = 1, enc = 0
> [ 56.222691] page_enc_status_hc invoked, gpa = 1e839000, npages = 1, enc = 0
> [ 56.222707] page_enc_status_hc invoked, gpa = 1e83b000, npages = 1, enc = 1
> [ 56.222720] page_enc_status_hc invoked, gpa = 1e839000, npages = 1, enc = 1
> ....
>
> [ 56.313747] page_enc_status_hc invoked, gpa = 1e5eb000, npages = 1, enc = 0
> [ 56.313771] page_enc_status_hc invoked, gpa = 1e5e9000, npages = 1, enc = 0
> [ 56.313789] page_enc_status_hc invoked, gpa = 1e5eb000, npages = 1, enc = 1
> [ 56.313803] page_enc_status_hc invoked, gpa = 1e5e9000, npages = 1, enc = 1
> ....
> [ 56.459276] page_enc_status_hc invoked, gpa = 1d767000, npages = 100, enc = 0
> [ 56.459428] page_enc_status_hc invoked, gpa = 1e501000, npages = 1, enc = 1
> [ 56.460037] page_enc_status_hc invoked, gpa = 1d767000, npages = 100, enc = 1
> [ 56.460216] page_enc_status_hc invoked, gpa = 1e501000, npages = 1, enc = 0
> [ 56.460299] page_enc_status_hc invoked, gpa = 1d767000, npages = 100, enc = 0
> [ 56.460448] page_enc_status_hc invoked, gpa = 1e501000, npages = 1, enc = 1
> ....
>
> As can be observed here, all guest MMIO ranges are initially setup as
> shared, and those are all contigious guest page ranges.
>
> After that the encryption status hypercalls are invoked when DMA gets
> triggered during disk i/o while booting the guest ... here again the
> guest page ranges are contigious, though mostly single page is touched
> and a lot of page re-use is observed.
>
> So a range-based list/structure will be a "good" fit for such usage
> scenarios.
>
> It seems surprisingly common to flick the same pages back and forth between
> encrypted and clear for quite a while; why is this?
>
>
> dma_alloc_coherent()'s will allocate pages and then call
> set_decrypted() on them and then at dma_free_coherent(), set_encrypted()
> is called on the pages to be freed. So these observations in the logs
> where a lot of single 4K pages are seeing C-bit transitions and
> corresponding hypercalls are the ones associated with
> dma_alloc_coherent().
It makes me wonder if it might be worth teaching it to hold onto those
DMA pages somewhere until it needs them for something else and avoid the
extra hypercalls; just something to think about.
Dave
> Thanks,
> Ashish
>
>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
next prev parent reply other threads:[~2020-12-18 19:58 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-12-01 0:45 [PATCH v2 0/9] Add AMD SEV page encryption bitmap support Ashish Kalra
2020-12-01 0:45 ` [PATCH v2 1/9] KVM: x86: Add AMD SEV specific Hypercall3 Ashish Kalra
2020-12-03 0:34 ` Sean Christopherson
2020-12-04 17:16 ` Brijesh Singh
2020-12-06 10:26 ` Paolo Bonzini
2020-12-07 20:41 ` Sean Christopherson
2020-12-08 3:09 ` Steve Rutherford
2020-12-08 4:16 ` Kalra, Ashish
2020-12-08 16:29 ` Brijesh Singh
2020-12-11 22:55 ` Ashish Kalra
2020-12-12 4:56 ` Ashish Kalra
2020-12-18 19:39 ` Dr. David Alan Gilbert
[not found] ` <E79E09A2-F314-4B59-B7AE-07B1D422DF2B@amd.com>
2020-12-18 19:56 ` Dr. David Alan Gilbert [this message]
2021-01-06 23:05 ` Ashish Kalra
2021-01-07 1:01 ` Steve Rutherford
2021-01-07 1:34 ` Ashish Kalra
2021-01-07 8:05 ` Ashish Kalra
2021-01-08 0:47 ` Ashish Kalra
2021-01-08 0:55 ` Steve Rutherford
2021-01-07 17:07 ` Ashish Kalra
2021-01-07 17:26 ` Sean Christopherson
2021-01-07 18:41 ` Ashish Kalra
2021-01-07 19:22 ` Sean Christopherson
2021-01-08 0:54 ` Steve Rutherford
2021-01-08 16:56 ` Sean Christopherson
2020-12-01 0:46 ` [PATCH v2 2/9] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall Ashish Kalra
2020-12-02 16:54 ` Dr. David Alan Gilbert
2020-12-02 21:22 ` Ashish Kalra
2020-12-06 10:25 ` Paolo Bonzini
2020-12-01 0:47 ` [PATCH v2 3/9] KVM: x86: Introduce KVM_GET_PAGE_ENC_BITMAP ioctl Ashish Kalra
2020-12-06 11:02 ` Dov Murik
2020-12-07 22:00 ` Ashish Kalra
2020-12-01 0:47 ` [PATCH v2 4/9] mm: x86: Invoke hypercall when page encryption status is changed Ashish Kalra
2020-12-01 0:47 ` [PATCH v2 5/9] KVM: x86: Introduce KVM_SET_PAGE_ENC_BITMAP ioctl Ashish Kalra
2020-12-01 0:47 ` [PATCH v2 6/9] KVM: SVM: Add support for static allocation of unified Page Encryption Bitmap Ashish Kalra
2020-12-01 0:48 ` [PATCH v2 7/9] KVM: x86: Mark _bss_decrypted section variables as decrypted in page encryption bitmap Ashish Kalra
2020-12-01 0:48 ` [PATCH v2 8/9] KVM: x86: Add kexec support for SEV " Ashish Kalra
2020-12-01 0:48 ` [PATCH v2 9/9] KVM: SVM: Bypass DBG_DECRYPT API calls for unecrypted guest memory Ashish Kalra
2020-12-08 5:18 [PATCH v2 1/9] KVM: x86: Add AMD SEV specific Hypercall3 Kalra, Ashish
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201218195641.GL2956@work-vm \
--to=dgilbert@redhat.com \
--cc=Ashish.Kalra@amd.com \
--cc=Thomas.Lendacky@amd.com \
--cc=bp@suse.de \
--cc=brijesh.singh@amd.com \
--cc=dovmurik@linux.vnet.ibm.com \
--cc=frankeh@us.ibm.com \
--cc=hpa@zytor.com \
--cc=jejb@linux.ibm.com \
--cc=joro@8bytes.org \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=pbonzini@redhat.com \
--cc=seanjc@google.com \
--cc=srutherford@google.com \
--cc=tglx@linutronix.de \
--cc=tobin@ibm.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).