From: Steve Rutherford <srutherford@google.com>
To: "Kalra, Ashish" <Ashish.Kalra@amd.com>
Cc: "Singh, Brijesh" <brijesh.singh@amd.com>,
Sean Christopherson <seanjc@google.com>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
"joro@8bytes.org" <joro@8bytes.org>,
"Lendacky, Thomas" <Thomas.Lendacky@amd.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"venu.busireddy@oracle.com" <venu.busireddy@oracle.com>,
Will Deacon <will@kernel.org>,
Quentin Perret <qperret@google.com>
Subject: Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
Date: Tue, 9 Mar 2021 19:47:26 -0800
Message-ID: <CABayD+d9DkHV9tnpPfKXgzGiQ27+K=21R1HhOpjLpks6zgoGUw@mail.gmail.com>
In-Reply-To: <F3B77ECE-8C70-47AA-98F8-0C032CB5F568@amd.com>
On Tue, Mar 9, 2021 at 7:42 PM Kalra, Ashish <Ashish.Kalra@amd.com> wrote:
>
>
>
> > On Mar 9, 2021, at 3:22 AM, Steve Rutherford <srutherford@google.com> wrote:
> >
> > On Mon, Mar 8, 2021 at 1:11 PM Brijesh Singh <brijesh.singh@amd.com> wrote:
> >>
> >>
> >>> On 3/8/21 1:51 PM, Sean Christopherson wrote:
> >>> On Mon, Mar 08, 2021, Ashish Kalra wrote:
> >>>> On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> >>>>> +Will and Quentin (arm64)
> >>>>>
> >>>>> Moving the non-KVM x86 folks to bcc, I don't think they care about KVM details
> >>>>> at this point.
> >>>>>
> >>>>> On Fri, Feb 26, 2021, Ashish Kalra wrote:
> >>>>>> On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> >>>>>>> On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> >>>>>>> Thanks for grabbing the data!
> >>>>>>>
> >>>>>>> I am fine with both paths. Sean has stated an explicit desire for
> >>>>>>> hypercall exiting, so I think that would be the current consensus.
> >>>>> Yep, though it'd be good to get Paolo's input, too.
> >>>>>
> >>>>>>> If we want to do hypercall exiting, this should be in a follow-up
> >>>>>>> series where we implement something more generic, e.g. a hypercall
> >>>>>>> exiting bitmap or hypercall exit list. If we are taking the hypercall
> >>>>>>> exit route, we can drop the kvm side of the hypercall.
> >>>>> I don't think this is a good candidate for arbitrary hypercall interception. Or
> >>>>> rather, I think hypercall interception should be an orthogonal implementation.
> >>>>>
> >>>>> The guest, including guest firmware, needs to be aware that the hypercall is
> >>>>> supported, and the ABI needs to be well-defined. Relying on userspace VMMs to
> >>>>> implement a common ABI is an unnecessary risk.
> >>>>>
> >>>>> We could make KVM's default behavior be a nop, i.e. have KVM enforce the ABI but
> >>>>> require further VMM intervention. But, I just don't see the point, it would
> >>>>> save only a few lines of code. It would also limit what KVM could do in the
> >>>>> future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to userspace,
> >>>>> then mandatory interception would essentially make it impossible for KVM to do
> >>>>> bookkeeping while still honoring the interception request.
> >>>>>
> >>>>> However, I do think it would make sense to have the userspace exit be a generic
> >>>>> exit type. But hey, we already have the necessary ABI defined for that! It's
> >>>>> just not used anywhere.
> >>>>>
> >>>>>       /* KVM_EXIT_HYPERCALL */
> >>>>>       struct {
> >>>>>               __u64 nr;
> >>>>>               __u64 args[6];
> >>>>>               __u64 ret;
> >>>>>               __u32 longmode;
> >>>>>               __u32 pad;
> >>>>>       } hypercall;
> >>>>>
> >>>>>
> >>>>>>> Userspace could also handle the MSR using MSR filters (would need to
> >>>>>>> confirm that). Then userspace could also be in control of the cpuid bit.
> >>>>> An MSR is not a great fit; it's x86 specific and limited to 64 bits of data.
> >>>>> The data limitation could be fudged by shoving data into non-standard GPRs, but
> >>>>> that will result in truly heinous guest code, and extensibility issues.
> >>>>>
> >>>>> The data limitation is a moot point, because the x86-only thing is a deal
> >>>>> breaker. arm64's pKVM work has a near-identical use case for a guest to share
> >>>>> memory with a host. I can't think of a clever way to avoid having to support
> >>>>> TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
> >>>>> multiple KVM variants.
> >>>>>
> >>>> Potentially, there is another reason for in-kernel hypercall handling
> >>>> considering SEV-SNP. In case of SEV-SNP the RMP table tracks the state
> >>>> of each guest page, for instance pages in hypervisor state, i.e., pages
> >>>> with C=0 and pages in guest valid state with C=1.
> >>>>
> >>>> Now, there shouldn't be a need for page encryption status hypercalls on
> >>>> SEV-SNP as KVM can track & reference guest page status directly using
> >>>> the RMP table.
> >>> Relying on the RMP table itself would require locking the RMP table for an
> >>> extended duration, and walking the entire RMP to find shared pages would be
> >>> very inefficient.
> >>>
> >>>> As KVM maintains the RMP table, we will need SET/GET type of
> >>>> interfaces to provide the guest page encryption status to userspace.
> >>> Hrm, somehow I temporarily forgot about SNP and TDX adding their own hypercalls
> >>> for converting between shared and private. And in the case of TDX, the hypercall
> >>> can't be trusted, i.e. is just a hint, otherwise the guest could induce a #MC in
> >>> the host.
> >>>
> >>> But, the different guest behavior doesn't require KVM to maintain a list/tree,
> >>> e.g. adding a dedicated KVM_EXIT_* for notifying userspace of page encryption
> >>> status changes would also suffice.
> >>>
> >>> Actually, that made me think of another argument against maintaining a list in
> >>> KVM: there's no way to notify userspace that a page's status has changed.
> >>> Userspace would need to query KVM to do GET_LIST after every GET_DIRTY.
> >>> Obviously not a huge issue, but it does make migration slightly less efficient.
> >>>
> >>> On a related topic, there are fatal race conditions that will require careful
> >>> coordination between guest and host, and will effectively be wired into the ABI.
> >>> SNP and TDX don't suffer these issues because host awareness of status is atomic
> >>> with respect to the guest actually writing the page with the new encryption
> >>> status.
> >>>
> >>> For SEV live migration...
> >>>
> >>> If the guest does the hypercall after writing the page, then the guest is hosed
> >>> if it gets migrated while writing the page (scenario #1):
> >>>
> >>>       vCPU                  Userspace
> >>>       zero_bytes[0:N]
> >>>                             <transfers written bytes as private instead of shared>
> >>>                             <migrates vCPU>
> >>>       zero_bytes[N+1:4095]
> >>>       set_shared (dest)
> >>>       kaboom!
> >>
> >>
> >> Maybe I am missing something, but this is not any different from a normal
> >> operation inside a guest. Making a page shared/private in the page table
> >> does not update the content of the page itself. In your above case, I
> >> assume zero_bytes[N+1:4095] are written by the destination VM. The
> >> memory region was private in the source VM page table, so those writes
> >> will be performed encrypted. The destination VM later changed the memory
> >> to shared, but nobody wrote to the memory after it was transitioned
> >> to shared, so a reader of the memory should get ciphertext
> >> unless there was a write after the set_shared (dest).
> >>
> >>
> >>> If userspace does GET_DIRTY after GET_LIST, then the host would transfer bad
> >>> data by consuming a stale list (scenario #2):
> >>>
> >>>       vCPU                  Userspace
> >>>                             get_list (from KVM or internally)
> >>>       set_shared (src)
> >>>       zero_page (src)
> >>>                             get_dirty
> >>>                             <transfers private data instead of shared>
> >>>                             <migrates vCPU>
> >>>       kaboom!
> >>
> >>
> >> I don't remember how things are done in Ashish's recent QEMU/KVM patches,
> >> but in the previous series get_dirty() happens before querying the
> >> encrypted state. There was some logic in the VMM to resync the encrypted
> >> bitmap during the final migration stage and perform any additional data
> >> transfer since the last sync.
> >>
> >>
> >>> If both guest and host order things to avoid #1 and #2, the host can still
> >>> migrate the wrong data (scenario #3):
> >>>
> >>>       vCPU                  Userspace
> >>>       set_private
> >>>       zero_bytes[0:4096]
> >>>                             get_dirty
> >>>       set_shared (src)
> >>>                             get_list
> >>>                             <transfers as shared instead of private>
> >>>                             <migrates vCPU>
> >>>       set_private (dest)
> >>>       kaboom!
> >>
> >>
> >> Since there was no write to the memory after the set_shared (src),
> >> the content of the page should not have changed. After the set_private
> >> (dest), the caller should be seeing the same content written by the
> >> zero_bytes[0:4096].
> > I think Sean was going for the situation where the VM has moved to the
> > destination, which would have changed the VEK. That way the guest
> > would be decrypting the old ciphertext with the new (wrong) key.
> >>
>
> But how can this happen? If a page is migrated as private, then when it is
> received it will be decrypted using the transport key (TEK) and re-encrypted
> using the destination VM's VEK on the destination.
>
It can happen if, as in scenario #3 above, the page is set to shared just
before being migrated. It would then be migrated in the clear, but be
interpreted on the target as encrypted (since, immediately
post-migration, the page is flipped to private without ever being
written). This is not a scenario that is expected to work, as it
requires violating (currently unspoken?) invariants.
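
To make the failure concrete, here is a toy model of scenario #3. It is
only a sketch under loud assumptions: XOR with a one-byte key stands in
for the real memory encryption, the TEK transport step is folded into a
simple decrypt/re-encrypt, and toy_crypt(), migrate_page() and
guest_sees_zeros_after_migration() are hypothetical names, not KVM or
SEV firmware interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

/* Toy stand-in for memory encryption: XOR every byte with a one-byte "key". */
static void toy_crypt(uint8_t *dst, const uint8_t *src, uint8_t key)
{
	for (size_t i = 0; i < TOY_PAGE_SIZE; i++)
		dst[i] = src[i] ^ key;
}

/*
 * Hypothetical model of the host's transfer step. A page the host believes
 * is private is decrypted with the source key and re-encrypted with the
 * destination key. A page believed to be shared is copied verbatim.
 */
static void migrate_page(uint8_t *dst_mem, const uint8_t *src_mem,
			 bool host_thinks_shared,
			 uint8_t src_vek, uint8_t dst_vek)
{
	if (host_thinks_shared) {
		memcpy(dst_mem, src_mem, TOY_PAGE_SIZE);
	} else {
		uint8_t plain[TOY_PAGE_SIZE];

		toy_crypt(plain, src_mem, src_vek);
		toy_crypt(dst_mem, plain, dst_vek);
	}
}

/* Returns true if the destination guest reads back the zeros it wrote. */
bool guest_sees_zeros_after_migration(bool set_shared_before_get_list)
{
	const uint8_t src_vek = 0x5a, dst_vek = 0xc3; /* arbitrary toy keys */
	static const uint8_t zeros[TOY_PAGE_SIZE];
	static uint8_t src_mem[TOY_PAGE_SIZE];
	static uint8_t dst_mem[TOY_PAGE_SIZE];
	static uint8_t view[TOY_PAGE_SIZE];

	/* set_private + zero_bytes[0:4096]: written through a private mapping. */
	toy_crypt(src_mem, zeros, src_vek);

	/*
	 * set_shared (src) only changes the reported status; the page is not
	 * rewritten. get_list then tells the host how to transfer the page.
	 */
	migrate_page(dst_mem, src_mem, set_shared_before_get_list,
		     src_vek, dst_vek);

	/* set_private (dest): the guest reads through a private mapping. */
	toy_crypt(view, dst_mem, dst_vek);

	return memcmp(view, zeros, TOY_PAGE_SIZE) == 0;
}
```

In this model, guest_sees_zeros_after_migration(false) returns true,
while guest_sees_zeros_after_migration(true) returns false: the
destination ends up decrypting bytes that were never re-encrypted under
its key.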
Thanks,
Steve
> Thanks,
> Ashish
>
> >>
> >>> Scenario #3 is unlikely, but plausible, e.g. if the guest bails from its
> >>> conversion flow for whatever reason, after making the initial hypercall. Maybe
> >>> it goes without saying, but to address #3, the guest must consider existing data
> >>> as lost the instant it tells the host the page has been converted to a different
> >>> type.
> >>>
> >>>> For the above reason if we do in-kernel hypercall handling for page
> >>>> encryption status (which we probably won't require for SEV-SNP &
> >>>> correspondingly there will be no hypercall exiting),
> >>> As above, that doesn't preclude KVM from exiting to userspace on conversion.
> >>>
> >>>> then we can implement a standard GET/SET ioctl interface to get/set the guest
> >>>> page encryption status for userspace, which will work across SEV, SEV-ES and
> >>>> SEV-SNP.