Re: Why guest physical addresses are not the same as the corresponding host virtual addresses in QEMU/KVM? Thanks!

From: harry harry <hiharryharryharry@gmail.com>
To: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Maxim Levitsky <mlevitsk@redhat.com>,
	qemu-devel@nongnu.org, mathieu.tarral@protonmail.com,
	stefanha@redhat.com, libvir-list@redhat.com, kvm@vger.kernel.org,
	pbonzini@redhat.com
Subject: Re: Why guest physical addresses are not the same as the corresponding host virtual addresses in QEMU/KVM? Thanks!
Date: Tue, 13 Oct 2020 18:40:19 -0400	[thread overview]
Message-ID: <CA+-xGqO37RzQDg5dnE_3NWMp6+u2L02GQDqoSr3RdedoMBugrg@mail.gmail.com> (raw)
In-Reply-To: <20201013070329.GC11344@linux.intel.com>

Hi Sean,

Thanks much for your detailed replies. It's clear to me why GPAs are
different from HVAs in QEM/KVM. Thanks! I appreciate it if you could
help with the following two more questions.

On Tue, Oct 13, 2020 at 3:03 AM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> This is where memslots come in.  Think of memslots as a one-level page tablea
> that translate GPAs to HVAs.  A memslot, set by userspace, tells KVM the
> corresponding HVA for a given GPA.
>
> Before the guest is running (assuming host userspace isn't broken), the
> userspace VMM will first allocate virtual memory (HVA) for all physical
> memory it wants to map into the guest (GPA).  It then tells KVM how to
> translate a given GPA to its HVA by creating a memslot.
>
> To avoid getting lost in a tangent about page offsets, let's assume array[0]'s
> GPA = 0xa000.  For KVM to create a GPA->HPA mapping for the guest, there _must_
> be a memslot that translates GPA 0xa000 to an HVA[*].  Let's say HVA = 0xb000.
>
> On an EPT violation, KVM does a memslot lookup to translate the GPA (0xa000) to
> its HVA (0xb000), and then walks the host page tables to translate the HVA into
> a HPA (let's say that ends up being 0xc000).  KVM then stuffs 0xc000 into the
> EPT tables, which yields:
>
>   GPA    -> HVA    (KVM memslots)
>   0xa000    0xb000
>
>   HVA    -> HPA    (host page tables)
>   0xb000    0xc000
>
>   GPA    -> HPA    (extended page tables)
>   0xa000    0xc000
>
> To keep the EPT tables synchronized with the host page tables, if HVA->HPA
> changes, e.g. HVA 0xb000 is remapped to HPA 0xd000, then KVM will get notified
> by the host kernel that the HVA has been unmapped and will find and unmap
> the corresponding GPA (again via memslots) to HPA translations.
>
> Ditto for the case where userspace moves a memslot, e.g. if HVA is changed
> to 0xe000, KVM will first unmap all old GPA->HPA translations so that accesses
> to GPA 0xa000 from the guest will take an EPT violation and see the new HVA
> (and presumably a new HPA).

Q1: Is there any file like ``/proc/pid/pagemap'' to record the
mappings between GPAs and HVAs in the host OS?

Q2: Seems that there might be extra overhead (e.g., synchronization
between EPT tables and host regular page tables; maintaining extra
regular page tables and data structures), which is caused by the extra
translation between GPAs to HVAs via memslots. Why doesn't KVM
directly use GPAs as HVAs and leverage extended/nested page tables to
translate HVAs (i.e., GPAs) to HPAs?

Thanks,
Harry